我得到了一个有1列的表,它是字符串类型,但里面看起来像是json类型.
该值如下所示
包含值的'old_id'列
[{"name":"Entitas Penugasan","id":"6415","value":"HIJRA"},
{"name":"Function","id":"10594","value":"People & Culture"},
{"name":"Unit","id":"10595","value":"Organization Development"},
{"name":"Tribe","id":"10602","value":"Shared Service"}
]
'new_id'列带值
[{"name":"Entitas Penugasan","id":"6415","value":"AFS"},
{"name":"Function","id":"10594","value":"Finance"},
{"name":"Unit","id":"10595","value":"Finance Operations"},
{"name":"Tribe","id":"10602","value":"Commercial"}
]
我需要SQL Athena Query从那些json列中创建列old_name、old_id、old_value、new_name、new_id、new_value
我试过用
REGEXP_EXTRACT(old_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 1) AS old_name,
REGEXP_EXTRACT(new_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 1) AS new_name,
REGEXP_EXTRACT(old_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 2) AS old_id,
REGEXP_EXTRACT(new_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 2) AS new_id,
REGEXP_EXTRACT(old_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 3) AS old_value,
REGEXP_EXTRACT(new_id, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 3) AS new_value
但它只生成1行,即使在列中,它也显示4个"数组"
The query should generate 4 rows, looks like below
id | old_name | new_name | old_id | new_id | old_value | new_value |
---|---|---|---|---|---|---|
A | Entitas Penugasan | Entitas Penugasan | 6415 | 6415 | HIJRA | AFS |
A | Function | Function | 10594 | 10594 | People & Culture | Finance |
A | Unit | Unit | 10595 | 10595 | Organization Development | Finance Operations |
A | Tribe | Tribe | 10602 | 10602 | Shared Service | Commercial |
有没有办法在SQL Athena中做到这一点?
编辑:我在下面的查询中取得了一些进展
with raw_data as(
select id, user_id, old_custom_fields, new_custom_fields
from my_table
where
-- new_custom_fields <> '' and new_custom_fields<> 'None' and new_custom_fields is not null and
id in (A)
),
splitted_data as (
SELECT id, user_id,
split(old_custom_fields, '},{') AS old_custom_field_id,
split(new_custom_fields, '},{') AS new_custom_field_id
FROM my_table
),
old_custom_field_id_unnest as (
SELECT
*
from splitted_data
CROSS JOIN UNNEST(old_custom_field_id) AS t (_old_custom_fields)
),
new_custom_field_id_unnest as (
SELECT
*
from splitted_data
CROSS JOIN UNNEST(new_custom_field_id) AS t (_new_custom_fields)
),
old_custom_field_cleaned as (
select id, old_custom_field_id,
REGEXP_EXTRACT(_old_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 1) AS old_name,
REGEXP_EXTRACT(_old_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 2) AS old_id,
REGEXP_EXTRACT(_old_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 3) AS old_value
from old_custom_field_id_unnest
),
new_custom_field_cleaned as (
select id, new_custom_field_id,
REGEXP_EXTRACT(_new_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 1) AS new_name,
REGEXP_EXTRACT(_new_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 2) AS new_id,
REGEXP_EXTRACT(_new_custom_fields, '"name":"(.*?)","id":"(.*?)","value":"(.*?)"', 3) AS new_value
from new_custom_field_id_unnest
)
select oc.id, old_name, new_name,
old_id,new_id,
old_value,new_value
from old_custom_field_cleaned oc
join new_custom_field_cleaned nc on oc.id = nc.id
But this leads to duplicated rows, now I have 16 rows due to the join, still needs help in removing the unneeded rows