给出雅典娜的一个原始表,它类似于下面这个人为设计的示例:
Original Table:个
a | b | cmsmessage | c |
---|---|---|---|
... | ... | {"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}} | ... |
... | ... | {"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}} | ... |
... | ... | {"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}} | ... |
我试图从cmsmessage
列中的字符串化JSON创建一个视图:
Desired Resultant View:
id | author | version |
---|---|---|
123 | Rob | 1 |
456 | John | 3 |
789 | Sally | 3 |
正如您在"Desired Resultant View"中看到的,我实际上希望根据key
的值从item
数组的每个对象中提取数据.
[{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]
个
我见过/try 过按索引访问数组中的项的各种示例,例如[0]
,如examples in Athena docs.但是,给定原始表的第cmsmessage
列中的json/字符串,数组中的索引位置可能会有所不同.
My Failed attempt:个
下面显示了我失败的try :
WITH dataset AS
(
SELECT *
FROM (VALUES
('{"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}}'),
('{"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}}'),
('{"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}}')
) AS t (cmsmessage)
)
SELECT
json_extract_scalar(objArray, '$.key') as _keys,
json_extract_scalar(objArray, '$.value') as _values
FROM dataset
CROSS JOIN UNNEST(CAST(json_extract(cmsmessage, '$.asset.metadata.item') as array(json))) as t (objArray)