Python: dynamically split a DataFrame column and store the result as a new column

Posted on May 10

I am trying to split a column and store the part after the last "_" as a new column.

import pandas as pd
import numpy as np
names= ['John', 'Jane', 'Brian','Suzan', 'John']
expertise = ['primary_chat', 'follow_email', 'repeat_chat', 'primary_video_chat', 'tech_chat']

data  = list(zip(names,expertise))
df = pd.DataFrame(data, columns=['Name', 'Communication'])
df

Output

    Name       Communication
0   John        primary_chat
1   Jane        follow_email
2  Brian         repeat_chat
3  Suzan  primary_video_chat
4   John           tech_chat

When I add a new column by splitting the existing one:

df['Platform'] = df['Communication'].str.split('_', expand=True)[1]
df

Output

    Name       Communication Platform
0   John        primary_chat     chat
1   Jane        follow_email    email
2  Brian         repeat_chat     chat
3  Suzan  primary_video_chat    video
4   John           tech_chat     chat

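The problem is that [1] always takes the second piece of the split. When a value contains only one "_", that is fine, because the second piece is the one we need. But when a value contains two "_", as in row 3 (Suzan), [1] returns "video" instead of "chat"; for that row we would need index [2].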
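To see why [1] picks the wrong piece, it can help to look at the intermediate frame that split(..., expand=True) produces. This is a minimal sketch, assuming the same Communication values as above; the variable names communication and parts are only for illustration.

```python
import pandas as pd

communication = pd.Series(
    ['primary_chat', 'follow_email', 'repeat_chat', 'primary_video_chat', 'tech_chat'],
    name='Communication',
)

# expand=True returns one column per piece; rows with fewer pieces
# are padded with missing values on the right.
parts = communication.str.split('_', expand=True)
print(parts)

# Column 1 is always the second piece, so row 3 yields 'video';
# the piece wanted for that row sits in column 2.
print(parts.loc[3, 1], parts.loc[3, 2])
```

Because the wanted piece sits in a different column depending on the row, a single fixed column index cannot be right for every row.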

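We could get the number of "_" in each value dynamically and use that as the index. However, although the code below returns the correct counts, I get an error when I use it as the index inside [].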

df['Communication'].str.count('_')

0    1
1    1
2    1
3    2
4    1
Name: Communication, dtype: int64

This gives me the correct number of "_" for each row. However, when I use it in the earlier split() line to create the new column:

df['Platform'] = df['Communication'].str.split('_', expand=True)[df['Communication'].str.count('_')]

I get an error.

Maybe I should try apply() with a lambda, but I would like to know whether there is a way to make this approach work.
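For what it is worth, the count-based idea can be made to work if the lookup is done per row. As far as I can tell, a Series placed inside [] on a DataFrame is interpreted as a list of column labels, not as one position per row, which would explain why the attempt above fails. The sketch below shows two hedged alternatives: a NumPy row-wise pick that reuses the counts, and the apply() plus lambda route mentioned in the question. The names parts, counts, picked and via_apply are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Brian', 'Suzan', 'John'],
    'Communication': ['primary_chat', 'follow_email', 'repeat_chat',
                      'primary_video_chat', 'tech_chat'],
})

parts = df['Communication'].str.split('_', expand=True)
counts = df['Communication'].str.count('_').to_numpy()

# Pair each row number with its own column position (the underscore count),
# so row 3 reads column 2 while the other rows read column 1.
picked = parts.to_numpy()[np.arange(len(df)), counts]

# Plain-Python equivalent: take the last piece of each string.
via_apply = df['Communication'].apply(lambda s: s.split('_')[-1])

df['Platform'] = picked
print(df)
```

Both routes give the same column here; the apply() version is probably easier to read, while the NumPy version stays closer to the original idea of indexing by the count.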

Recommended answer

Split from the right at most once with rsplit() and take the piece after the last "_":

df['Platform'] = df['Communication'].str.rsplit('_', n=1).str[1]
print(df)

Output

    Name       Communication Platform
0   John        primary_chat     chat
1   Jane        follow_email    email
2  Brian         repeat_chat     chat
3  Suzan  primary_video_chat     chat
4   John           tech_chat     chat
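Two closely related variants of the rsplit() approach may be worth knowing, shown here as a sketch rather than as part of the original answer. Taking .str[-1] instead of .str[1] still returns the whole string when a value contains no "_", whereas .str[1] yields a missing value in that case; str.extract() with an end-anchored pattern behaves like .str[-1]. The extra value 'chat' (no underscore) is an assumption added purely to show that edge case.

```python
import pandas as pd

s = pd.Series(['primary_chat', 'primary_video_chat', 'chat'])

# Index 1 after a single right split: missing when there is no underscore.
by_index_1 = s.str.rsplit('_', n=1).str[1]

# Last element after the split: falls back to the whole string.
by_last = s.str.rsplit('_', n=1).str[-1]

# Regex alternative: capture the run of non-underscore characters at the end.
by_regex = s.str.extract(r'([^_]+)$', expand=False)

print(pd.DataFrame({'str[1]': by_index_1, 'str[-1]': by_last, 'extract': by_regex}))
```

Which variant fits best depends on whether values without an underscore should map to a missing value or to themselves.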
