I have the following code:
import polars as pl
from typing import NamedTuple

class Event(NamedTuple):
    name: str
    description: str

def event_table(num) -> list[Event]:
    events = []
    for i in range(num):
        events.append(Event("name", "description"))
    return events

data = {"events": [1, 2]}
df = pl.DataFrame(data).select(events=pl.col("events").map_elements(event_table))
"""
shape: (2, 1)
┌───────────────────────────────────┐
│ events │
│ --- │
│ list[struct[2]] │
╞═══════════════════════════════════╡
│ [{"name","description"}] │
│ [{"name","description"}, {"name"… │
└───────────────────────────────────┘
"""
However, if the first list is empty, I get list[list[str]] instead of the list[struct[2]] I need:
data = {"events": [0, 1, 2]}
df = pl.DataFrame(data).select(events=pl.col("events").map_elements(event_table))
print(df)
"""
shape: (3, 1)
┌───────────────────────────────────┐
│ events │
│ --- │
│ list[list[str]] │
╞═══════════════════════════════════╡
│ [] │
│ [["name", "description"]] │
│ [["name", "description"], ["name… │
└───────────────────────────────────┘
"""
I tried using the return_dtype parameter of map_elements, like this:
data = {"events": [0, 1, 2]}
df = pl.DataFrame(data).select(
    events=pl.col("events").map_elements(
        event_table,
        return_dtype=pl.List(pl.Struct({"name": pl.String, "description": pl.String})),
    )
)
But this fails:
Traceback (most recent call last):
  File "script.py", line 18, in <module>
    df = pl.DataFrame(data).select(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 8193, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1943, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.SchemaError: expected output type 'List(Struct([Field { name: "name", dtype: String }, Field { name: "description", dtype: String }]))', got 'List(List(String))'; set `return_dtype` to the proper datatype
How can I get this to work? I need the type of this column to be list[struct[2]] even if the first list is empty.
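
One direction that might be worth trying, not verified against any particular polars version: the SchemaError reports List(List(String)), which suggests the Event named tuples are read as plain sequences once the first result is empty. A minimal sketch, assuming dicts are mapped to struct fields, is to return dicts instead of named tuples. The name event_table_as_dicts is hypothetical, and whether passing it to map_elements together with the return_dtype shown earlier resolves the error is an assumption.

```python
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    description: str


def event_table_as_dicts(num: int) -> list[dict[str, str]]:
    # Hypothetical variant of event_table: each Event is converted to a plain
    # dict with NamedTuple._asdict(), so field names travel with the values.
    return [Event("name", "description")._asdict() for _ in range(num)]


# A NamedTuple instance is still a tuple, which is one plausible reason it
# could be interpreted as a list of strings rather than a struct.
print(isinstance(Event("name", "description"), tuple))
print(event_table_as_dicts(0))
print(event_table_as_dicts(2))
```

The function only changes the shape of the returned Python objects; the polars call itself would stay as in the return_dtype attempt, with event_table swapped for this variant.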