Parsing JSON with Unix tools

Posted on December 24

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' | 
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

How do I print a specific field (denoted by -v k=text)?

FAQ

Why not a pure shell solution?

The standard POSIX/Single Unix Specification shell is a very limited language which doesn't contain facilities for representing sequences (lists or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.

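Bash 4 and later, zsh, and ksh all support arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3 because of the license change from GPLv2 to GPLv3, and many Linux systems don't have zsh installed out of the box). You could write a script that works in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be hard to write a shebang line that works for such a polyglot script.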
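To illustrate what a Bash 4 associative array makes possible, here is a minimal sketch that loads the top-level fields of a JSON object into one. It assumes Bash 4+ (for `declare -A`) and `python3` on the `PATH` to do the actual parsing; the `user` array name and the sample object are illustrative only. Values containing tabs or newlines would still break the `read` loop, so this is not a general solution.

```shell
#!/usr/bin/env bash
# Sketch: requires Bash 4+ (declare -A) and python3 for the JSON parsing itself.
set -euo pipefail

json='{"text":"My status","friends_count":245,"geo_enabled":false}'

declare -A user   # associative array: field name -> value as text

# python3 prints one "key<TAB>value" line per top-level field;
# non-string values (numbers, booleans, null, nested objects) are re-encoded as JSON text.
while IFS=$'\t' read -r key value; do
  user[$key]=$value
done < <(printf '%s' "$json" | python3 -c '
import json, sys
for k, v in json.load(sys.stdin).items():
    print(k, v if isinstance(v, str) else json.dumps(v), sep="\t")
')

printf '%s\n' "${user[text]}"          # My status
printf '%s\n' "${user[geo_enabled]}"   # false
```

Note that the shell code never looks inside the JSON itself; the parsing is still delegated to an existing tool, and Bash only stores the result.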

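Finally, writing a full JSON parser in shell would be a significant enough dependency that you might as well just use an existing dependency like jq or Python instead. A good implementation is not going to be a one-liner, or even a small five-line snippet.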
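For comparison, pulling one field out with Python's standard `json` module fits on one line. This is a sketch assuming `python3` is installed; the `json_field` function name is hypothetical, and the field name is passed as an argument, much like `-v k=text` in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level field named by $1 from the JSON object on stdin.
# All parsing is done by python3's standard json module; a missing key raises KeyError.
json_field() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$1"
}

printf '%s\n' '{"geo_enabled":false,"text":"My status","favorited":false}' | json_field text
```

Non-string values are printed with Python's own formatting (for example `False`, not `false`); wrap the value in `json.dumps(...)` if you need JSON output.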

Why not use awk, sed, or grep?

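It is possible to use these tools for quick extraction from JSON that has a known shape and is formatted in a known way (such as one key per line). There are several examples of this in other answers.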
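As an example of that kind of quick extraction: the question's split output is already one `"key":value` pair per line, so `awk` can pick out the line whose key matches `k`. This is a sketch that only holds for that exact shape (no nesting, no escaped quotes, no commas inside values); the `get_field` name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: input must already be one "key":value pair per line,
# like the split output shown in the question. This is NOT a JSON parser.
get_field() {
  awk -v k="$1" '
    index($0, "\"" k "\":") == 1 {    # line starts with "k":
      v = substr($0, length(k) + 4)   # skip the quoted key and the colon
      gsub(/^"|"$/, "", v)            # drop surrounding quotes from string values
      print v
    }'
}

printf '%s\n' '"friends_count":245' '"text":"My status"' '"favorited":false' | get_field text
```

On the question's `"status":"in_reply_to_screen_name":null` line, `get_field status` would print `in_reply_to_screen_name":null`, which is exactly the kind of silent mis-parse this approach is prone to.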

However, these tools are designed for line based or record based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.

So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and break if some aspect of the input format changes, such as collapsing whitespace, or adding additional levels of nesting to the JSON objects, or an escaped quote within a string. A solution that is robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not too much different than adding another dependency on jq or Python.
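To make that concrete, here is the question's `sed`/`awk` pipeline (with the unused `-v k` dropped) wrapped in a function and fed a string value that contains a comma. The `naive_split` name is ours, for illustration only.

```shell
#!/bin/sh
# The question's pipeline: strip braces, then split each line on commas.
naive_split() {
  sed -e 's/[{}]//g' |
    awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i] }'
}

# One comma inside a string value is enough to produce a bogus "field".
printf '%s\n' '{"text":"Hello, world","favorited":false}' | naive_split
```

The `text` value is cut in two and its second half (` world"`) is emitted as if it were a separate field, whereas a real JSON parser such as jq or Python's `json` module returns `Hello, world` intact.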

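I have previously had to deal with a situation in which a large amount of customer data was deleted because of poor input parsing in a shell script, so I never recommend quick and dirty methods that can be fragile in this way. If you are doing some one-off processing, see the other answers for suggestions, but I still strongly recommend just using an existing, tested JSON parser.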

Historical notes

This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq, and depends on a standalone JavaScript interpreter being installed, which is less common than a Python interpreter, so jq or Python are probably preferable:

curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'

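This answer also originally used the Twitter API from the question, but that API no longer works, which makes it hard to copy the examples to test them, and the new Twitter API requires API keys, so I have switched to the GitHub API, which can be used easily without an API key. The first answer to the original question would be: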

curl 'http://twitter.com/users/username.json' | jq -r '.text'
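jq can also take the field name from a variable, which is the closest analogue to the question's `awk -v k=text`: `--arg` binds a string to `$k`, and `.[$k]` looks that key up. A minimal sketch that runs offline on a literal document (instead of the retired Twitter endpoint), assuming jq is installed:

```shell
#!/bin/sh
# --arg k text binds the shell string "text" to the jq variable $k;
# .[$k] indexes the top-level object by that key; -r prints strings without JSON quotes.
printf '%s\n' '{"favorited":false,"text":"My status","truncated":false}' |
  jq -r --arg k text '.[$k]'
```

If the key is absent, jq prints `null` rather than failing.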
