Python: applying a pandas function to a column to create multiple new columns

Posted on April 27

How to do this in pandas:

I have a function extract_text_features that operates on a single text column and returns multiple output columns. Specifically, the function returns 6 values.

The function works; however, there doesn't seem to be any suitable return type (pandas DataFrame / numpy array / Python list) that lets the output be assigned correctly with df.ix[:, 10:16] = df.textcol.map(extract_text_features)

So I think I need to drop back to iterating with df.iterrows(), as per this?

UPDATE: Iterating with df.iterrows() is at least 20x slower, so I surrendered and split out the function into six distinct .map(lambda ...) calls.

UPDATE 2: This question was asked back around v0.11.0, before the usability of df.apply was improved and before df.assign() was added in v0.16. Hence much of the question and the answers are no longer very relevant.
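
For readers on a recent pandas, here is a minimal sketch of what the row-wise apply route can look like today. The body of extract_text_features below is a hypothetical stand-in (the original function is not shown, and it returns 6 values rather than 3), and the new column names are assumptions chosen for illustration.

```python
import pandas as pd


def extract_text_features(text):
    # Hypothetical stand-in for the question's function, which returns 6 values.
    words = text.split()
    return len(text), len(words), text.count(" ")


df = pd.DataFrame({"textcol": ["a bc", "hello world foo", ""]})

# Assumed names for the new columns.
feature_cols = ["n_chars", "n_words", "n_spaces"]

# result_type="expand" turns each returned tuple into separate columns,
# which can then be assigned to several new columns in one statement.
df[feature_cols] = df.apply(
    lambda row: extract_text_features(row["textcol"]),
    axis=1,
    result_type="expand",
)

print(df)
```

This keeps the function as a single call per row, at the cost of still running Python code row by row.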

Recommended answer

Building off of user1827356's answer, you can do the assignment in one pass using df.merge:

# Each pd.Series returned by the lambda becomes one row of a new frame, merged back on the index.
df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})),
    left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788

EDIT: Please be aware of the huge memory consumption and low speed: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ !
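
One way to sidestep that cost, sketched under assumptions: constructing a pd.Series for every row is typically the expensive part, so the function can return a plain tuple and the feature frame can be built once from the list of tuples. The helper name extract_features and the sample values are hypothetical. The feature1/feature2 columns mirror the example above.

```python
import pandas as pd


def extract_features(s):
    # Hypothetical helper: returns a plain tuple instead of a pd.Series.
    return s + 1, s - 1


df = pd.DataFrame({"textcol": [0.5, 1.5, 2.5]})

# Build the feature frame once from a list of tuples, reusing df's index
# so that the join below lines rows up correctly.
features = pd.DataFrame(
    df["textcol"].map(extract_features).tolist(),
    index=df.index,
    columns=["feature1", "feature2"],
)

result = df.join(features)
print(result)
```

Whether this is fast enough depends on the data. When the features can be expressed as vectorized column operations, that is usually faster still.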
