With DuckDB, a struct can be expanded into columns like this:
SELECT
parent_id,
top_level_struct.*
FROM
arrow_table AS root
However, I run into a problem when I try to unpack a nested struct (see the example code below):
SELECT
parent_id,
my_list_unnested.my_list.* EXCLUDE (list_struct), -- not working
my_list_unnested.my_list.list_struct.* -- not working
FROM
arrow_table AS root,
UNNEST(root.my_list) AS my_list_unnested
Code to reproduce the issue
import duckdb
import pyarrow as pa
test_records = [
{
"parent_id": 123,
"top_level_struct": {
"top_level_hello": "World",
"top_level_foo": "bar",
"top_level_baz": "qux"
},
"my_list": [
{
"item_id": 123,
"list_struct": {
"list_hello": "World",
"list_foo": "bar",
"list_baz": "qux"
}
}
]
}
]
arrow_table = pa.Table.from_pylist(test_records)
# this is working
WORKING_SQL = """
SELECT
parent_id,
top_level_struct.*
FROM
arrow_table AS root
"""
df = duckdb.sql(WORKING_SQL)
# this is not working
NOT_WORKING_SQL = """
SELECT
parent_id,
my_list_unnested.my_list.* EXCLUDE (list_struct), -- not working
my_list_unnested.my_list.list_struct.* -- not working
FROM
arrow_table AS root,
UNNEST(root.my_list) AS my_list_unnested
"""
df = duckdb.sql(NOT_WORKING_SQL)
# Gives
# duckdb.duckdb.ParserException: Parser Error: syntax error at or near "*"
What I am trying to achieve
I am trying to flatten the records above into the structure shown below. Because of the real-world case I am working on, I need to do this with DuckDB:
desired_structure = [
{
"parent_id": 123,
"top_level_hello": "World",
"top_level_foo": "bar",
"top_level_baz": "qux",
"item_id": 123,
"list_hello": "World",
"list_foo": "bar",
"list_baz": "qux"
}
]
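To pin down exactly which shape is expected, here is a plain-Python reference of the flattening. It produces one output row per `my_list` element, with `top_level_struct`, `item_id` and `list_struct` merged into the row. This does not replace the DuckDB query. It is a hypothetical oracle (`flatten` and `flatten_record` are illustrative names) that a test could compare DuckDB's output against. It assumes that struct field names never collide.

```python
def flatten_record(record):
    """Return one flat dict per element of record["my_list"]."""
    # Fields shared by every output row: parent_id plus the top-level struct.
    base = {"parent_id": record["parent_id"], **record["top_level_struct"]}
    rows = []
    for item in record["my_list"]:
        row = dict(base)
        row["item_id"] = item["item_id"]
        row.update(item["list_struct"])  # inline the nested struct's fields
        rows.append(row)
    return rows


def flatten(records):
    """Flatten every record; list elements become separate rows."""
    return [row for record in records for row in flatten_record(record)]


test_records = [
    {
        "parent_id": 123,
        "top_level_struct": {
            "top_level_hello": "World",
            "top_level_foo": "bar",
            "top_level_baz": "qux",
        },
        "my_list": [
            {
                "item_id": 123,
                "list_struct": {
                    "list_hello": "World",
                    "list_foo": "bar",
                    "list_baz": "qux",
                },
            }
        ],
    }
]

print(flatten(test_records))
```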
Environment/versions
- DuckDB: 0.9.2
- Python: 3.10.12
- Ubuntu 22.04