Python: How to handle multi-row results from a user-defined function in Polars

Posted on March 08

I'd like to parse lines of text into multiple columns and rows in polars, using a user-defined function.

import polars as pl
df = pl.DataFrame({'file': ['aaa.txt','bbb.txt'], 'text': ['my little pony, your big pony','apple+banana, cake+coke']})

def myfunc(p_str: str) -> list:
    res = []
    for line in p_str.split(','):
        x = line.strip().split(' ')
        res.append({f'word{e+1}': w for e, w in enumerate(x)})
    return res

If I run it on a single value as a test, it returns a list of dicts:

myfunc(df['text'][0])

[{'word1': 'my', 'word2': 'little', 'word3': 'pony'},
 {'word1': 'your', 'word2': 'big', 'word3': 'pony'}]

Even creating a DataFrame from it is easy:

pl.DataFrame(myfunc(df['text'][0]))

But trying to run it through map_elements() fails:

(df.with_columns(pl.struct(['text']).map_elements(lambda x: myfunc(x['text'])).alias('aaa')
                 )
 )

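thread '<unnamed>' panicked at crates/polars-core/src/chunked_array/builder/list/anonymous.rs:161:69:
called `Result::unwrap()` on an `Err` value: InvalidOperation(ErrString("It is not possible to concatenate arrays of different data types."))
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---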

As the result, I'd like to get something like this:

file     word1         word2   word3
aaa.txt  my            little  pony
aaa.txt  your          big     pony
bbb.txt  apple+banana
bbb.txt  cake+coke

Any ideas?

import random
import polars as pl

def ufunc(x: int):
    return [
        {f"word_{i}": "elephant" for i in range(random.randint(1, 4))}
        for _ in range(random.randint(1, 4))
    ]

pl.DataFrame({"id": [1, 2]}).with_columns(pl.col("id").map_elements(ufunc))

shape: (7, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ word_0   ┆ word_1   ┆ word_2   ┆ word_3   │
│ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str      ┆ str      ┆ str      ┆ str      │
╞══════════╪══════════╪══════════╪══════════╡
│ elephant ┆ elephant ┆ null     ┆ null     │
│ elephant ┆ elephant ┆ elephant ┆ elephant │
│ elephant ┆ null     ┆ null     ┆ null     │
│ elephant ┆ null     ┆ null     ┆ null     │
│ elephant ┆ elephant ┆ null     ┆ null     │
│ elephant ┆ elephant ┆ elephant ┆ null     │
│ elephant ┆ null     ┆ null     ┆ null     │
└──────────┴──────────┴──────────┴──────────┘
