I have a txt file whose contents were produced by recursively running the following command: gsutil ls -r gs://bucket-test/** | while IFS= read -r key; do gsutil stat $key; done. It looks like this:

gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip:
    Creation time:          Wed, 21 Dec 2022 10:39:27 GMT
    Update time:            Wed, 21 Dec 2022 10:39:27 GMT
    Storage class:          STANDARD
    Content-Length:         0
    Content-Type:           application/zip
    Hash (crc32c):          AAAAAA==
    Hash (md5):             1B2M2Y8AsgTpgAmY7PhCfg==
    ETag:                   CM30q9XCivwCEAE=
    Generation:             1671619167320653
    Metageneration:         1
gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L:
    Creation time:          Mon, 10 Apr 2023 19:09:41 GMT
    Update time:            Mon, 10 Apr 2023 19:09:41 GMT
    Storage class:          STANDARD
    Content-Disposition:    inline; filename=James_INGREDIENTS_A3.pdf
    Content-Length:         4381797
    Content-Type:           application/pdf
    Hash (crc32c):          GOzitA==
    Hash (md5):             eUSLC/z70gjDB2WQKIPOuQ==
    ETag:                   CLGPvu+BoP4CEAE=
    Generation:             1681153781106609
    Metageneration:         1
gs://bucket-test/prova.pdf:
    Creation time:          Mon, 08 May 2023 15:37:26 GMT
    Update time:            Mon, 08 May 2023 15:40:12 GMT
    Storage class:          STANDARD
    Content-Disposition:    inline; filename=James_KEY_VISUAL_A3.pdf
    Content-Language:       ace
    Content-Length:         15407
    Content-Type:           application/pdf
    Metadata:               
        meta-1:             prova 1
        meta-2:             prova 2
    Hash (crc32c):          ZIrHPA==
    Hash (md5):             oZbD+S8y35spkNozW3hUDA==
    ETag:                   CNDj09OG5v4CEAM=
    Generation:             1683560246604240
    Metageneration:         3

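I need to convert this output to JSON: split it into groups based on the leading whitespace, put the value from the first line of each group into a "key" field, and allow for nested sub-fields, such as the ones under "Metadata":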

{
  "Key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
  "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Storage class": "STANDARD",
  "Content-Length": "0",
  "Content-Type": "application/zip",
  "Hash (crc32c)": "AAAAAA==",
  "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
  "ETag": "CM30q9XCivwCEAE=",
  "Generation": "1671619167320653",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
  "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
  "Content-Length": "4381797",
  "Content-Type": "application/pdf",
  "Hash (crc32c)": "GOzitA==",
  "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
  "ETag": "CLGPvu+BoP4CEAE=",
  "Generation": "1681153781106609",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/prova.pdf",
  "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
  "Update time": "Mon, 08 May 2023 15:40:12 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
  "Content-Language": "ace",
  "Content-Length": "15407",
  "Content-Type": "application/pdf",
  "Metadata": {
    "meta-1": "prova 1",
    "meta-2": "prova 2"
  },
  "Hash (crc32c)": "ZIrHPA==",
  "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
  "ETag": "CNDj09OG5v4CEAM=",
  "Generation": "1683560246604240",
  "Metageneration": "3"
}

I tried this command on a single group, without success:

gsutil stat gs://bucket-test/prova.pdf | printf %s "$(cat)" | jq -R -s 'split("\n") | map({key: split(": ")[0], value: split(": ")[1]})'

It turns the output into the following array instead:

[
  {
    "key": "gs://spin8-test/prova.pdf:",
    "value": null
  },
  {
    "key": "    Creation time",
    "value": "         Mon, 08 May 2023 15:37:26 GMT"
  },
  {
    "key": "    Update time",
    "value": "           Mon, 08 May 2023 15:40:12 GMT"
  },
  {
    "key": "    Storage class",
    "value": "         STANDARD"
  },
  {
    "key": "    Content-Disposition",
    "value": "   inline; filename=James_KEY_VISUAL_A3.pdf"
  },
  {
    "key": "    Content-Language",
    "value": "      ace"
  },
  {
    "key": "    Content-Length",
    "value": "        15407"
  },
  {
    "key": "    Content-Type",
    "value": "          application/pdf"
  },
  {
    "key": "    Metadata",
    "value": "              "
  },
  {
    "key": "        meta-1",
    "value": "            prova 1"
  },
  {
    "key": "        meta-2",
    "value": "            prova 2"
  },
  {
    "key": "    Hash (crc32c)",
    "value": "         ZIrHPA=="
  },
  {
    "key": "    Hash (md5)",
    "value": "            oZbD+S8y35spkNozW3hUDA=="
  },
  {
    "key": "    ETag",
    "value": "                  CNDj09OG5v4CEAM="
  },
  {
    "key": "    Generation",
    "value": "            1683560246604240"
  },
  {
    "key": "    Metageneration",
    "value": "        3"
  }
]
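The attempt's output shows two separate problems: split(": ") keeps the column padding on both the key and the value, and each line becomes its own flat entry, so nothing groups the fields under their object or nests the Metadata lines. A minimal sketch that fixes only the first problem, assuming three lines copied from the stat output above as input, strips the leading whitespace with jq's sub:

```shell
#!/bin/sh
# Sample input: three lines copied from the gsutil stat output in the question.
sample='gs://bucket-test/prova.pdf:
    Content-Language:       ace
    Content-Length:         15407'

# Same split(": ") as the attempt, then remove the leading padding of each part.
result=$(printf '%s\n' "$sample" | jq -c -R 'split(": ") | map(sub("^\\s+"; ""))')
printf '%s\n' "$result"
```

The header line still keeps its trailing colon and stays a separate one-element array, which is why splitting line by line cannot build the nested objects on its own; the answer below groups lines by their indentation instead.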

Any suggestions? Thanks.

Recommended answer

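With jq, you can use the -R flag to read the input as raw text and use reduce to iterate over the lines. Start with an empty array [], then, depending on the indentation, either add a new item, add fields to the last item, or add fields to the last item's .Metadata field. Use regular expressions with match and capture, respectively, to determine the indentation and to parse the contents of each line: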

jq -Rn '
  reduce (inputs | {
    ind: match("^\\s*").length,
    cap: capture("\\s*(?<key>.*):(\\s+(?<value>.*))?$")
  }) as {$ind, $cap} ([];
    if $ind == 0 then . + [$cap | {key}]
    elif $ind == 4 then last += ([$cap | select(.key == "Metadata").value = {}] | from_entries)
    elif $ind == 8 then last.Metadata += ([$cap] | from_entries)
    else . end
  )
'
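To try the program without a bucket, a minimal sketch can feed it a shortened sample copied from the question's stat output (the sample itself is an assumption; the jq program is the one above, unchanged) and read the result back:

```shell
#!/bin/sh
# Hypothetical sample: one object from the question's stat output, shortened.
sample='gs://bucket-test/prova.pdf:
    Content-Length:         15407
    Metadata:
        meta-1:             prova 1
        meta-2:             prova 2
    Metageneration:         3'

# The answer's program, unchanged; -c prints compact output, input comes from stdin.
result=$(printf '%s\n' "$sample" | jq -c -Rn '
  reduce (inputs | {
    ind: match("^\\s*").length,
    cap: capture("\\s*(?<key>.*):(\\s+(?<value>.*))?$")
  }) as {$ind, $cap} ([];
    if $ind == 0 then . + [$cap | {key}]
    elif $ind == 4 then last += ([$cap | select(.key == "Metadata").value = {}] | from_entries)
    elif $ind == 8 then last.Metadata += ([$cap] | from_entries)
    else . end
  )
')
printf '%s\n' "$result"
```

To process your file, put its name after the closing quote of the program instead of piping, e.g. jq -Rn '…' stat.txt (stat.txt is a hypothetical name). Blank lines are skipped because capture produces no output for them, and the program assumes exactly 4 and 8 spaces of indentation as in this gsutil output; any other indentation falls through to else . and is ignored.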

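This produces a valid JSON array (the output sketched in the question, with commas between the items but no enclosing square brackets, would not be valid JSON):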

[
  {
    "key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
    "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
    "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
    "Storage class": "STANDARD",
    "Content-Length": "0",
    "Content-Type": "application/zip",
    "Hash (crc32c)": "AAAAAA==",
    "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
    "ETag": "CM30q9XCivwCEAE=",
    "Generation": "1671619167320653",
    "Metageneration": "1"
  },
  {
    "key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
    "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
    "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
    "Storage class": "STANDARD",
    "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
    "Content-Length": "4381797",
    "Content-Type": "application/pdf",
    "Hash (crc32c)": "GOzitA==",
    "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
    "ETag": "CLGPvu+BoP4CEAE=",
    "Generation": "1681153781106609",
    "Metageneration": "1"
  },
  {
    "key": "gs://bucket-test/prova.pdf",
    "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
    "Update time": "Mon, 08 May 2023 15:40:12 GMT",
    "Storage class": "STANDARD",
    "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
    "Content-Language": "ace",
    "Content-Length": "15407",
    "Content-Type": "application/pdf",
    "Metadata": {
      "meta-1": "prova 1",
      "meta-2": "prova 2"
    },
    "Hash (crc32c)": "ZIrHPA==",
    "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
    "ETag": "CNDj09OG5v4CEAM=",
    "Generation": "1683560246604240",
    "Metageneration": "3"
  }
]
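The question's example uses a capitalised "Key", while this array uses "key". If the exact spelling matters, a minimal post-processing sketch can rename the field and keep it first in each object; the one-object array below is a shortened, hypothetical stand-in for the real output:

```shell
#!/bin/sh
# Shortened stand-in for the array produced by the answer's program.
array='[{"key":"gs://bucket-test/prova.pdf","Content-Length":"15407","Metageneration":"3"}]'

# Build {Key: ...} first, then add every other field except the old "key".
renamed=$(printf '%s\n' "$array" | jq -c 'map({Key: .key} + del(.key))')
printf '%s\n' "$renamed"
```

Alternatively, change {key} in the first branch of the reduce to {Key: .key} so the field is named correctly from the start. Appending | .[] to either filter emits the objects as a stream, one per item, instead of a single array.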
