I'd like to parse lines of text into multiple columns and rows in polars, using a user-defined function.
import polars as pl
df = pl.DataFrame({'file': ['aaa.txt','bbb.txt'], 'text': ['my little pony, your big pony','apple+banana, cake+coke']})
def myfunc(p_str: str) -> list:
    res = []
    for line in p_str.split(','):
        x = line.strip().split(' ')
        res.append({f'word{e+1}': w for e, w in enumerate(x)})
    return res
If I run it on a single value as a test, it produces a list of dictionaries:
myfunc(df['text'][0])
[{'word1': 'my', 'word2': 'little', 'word3': 'pony'},
{'word1': 'your', 'word2': 'big', 'word3': 'pony'}]
Creating a DataFrame from that result is easy too:
pl.DataFrame(myfunc(df['text'][0]))
But trying to apply it with map_elements() fails:
df.with_columns(
    pl.struct(['text']).map_elements(lambda x: myfunc(x['text'])).alias('aaa')
)
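thread '' panicked at crates/polars-core/src/chunked_array/builder/list/anonymous.rs:161:69:
called Result::unwrap() on an Err value: InvalidOperation(ErrString("It is not possible to concatenate arrays of different data types."))
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---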
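A likely reason for the panic, as far as I can tell, is that the dictionaries returned for the two rows do not share the same keys, so the values polars tries to combine into one column have different shapes. A quick plain-Python check shows the mismatch (no polars involved; `texts` and `key_sets` are names made up for this check):

```python
def myfunc(p_str: str) -> list:
    res = []
    for line in p_str.split(','):
        x = line.strip().split(' ')
        res.append({f'word{e+1}': w for e, w in enumerate(x)})
    return res

# The same two strings as in the 'text' column of df.
texts = ['my little pony, your big pony', 'apple+banana, cake+coke']

# For each input string, collect the sorted key names of every parsed dict.
key_sets = [[sorted(d) for d in myfunc(t)] for t in texts]

print(key_sets[0])  # keys produced for aaa.txt
print(key_sets[1])  # keys produced for bbb.txt
```

The first row yields dictionaries with word1, word2 and word3, while the second yields only word1.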
As a result, I expect something like this:
file     word1         word2   word3
aaa.txt  my            little  pony
aaa.txt  your          big     pony
bbb.txt  apple+banana
bbb.txt  cake+coke
Any ideas?