I have a DataFrame that I want to group by a column and concatenate the strings of another column together. So something like the following.
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'txt': ['sth', 'sth else', 'sth', 'one more thing', 'sth else', 'sth else'],
    'status': ['open', 'open', 'closed', 'open', 'open', 'open']})

# Rows that are not 'open' become NaN, then are joined as empty strings
df.assign(output=
    df.where(df.status == 'open')
      .groupby(df.id)
      .txt.transform(lambda col: ', '.join(col.fillna(''))))
This gives me
   id             txt  status              output
0   1             sth    open       sth, sth else
1   1        sth else    open       sth, sth else
2   2             sth  closed    , one more thing
3   2  one more thing    open    , one more thing
4   3        sth else    open  sth else, sth else
5   3        sth else    open  sth else, sth else
Is there a way to
- have no duplicate values (as in rows 4 and 5)
- have no leading comma when the status is 'closed' (as in rows 2 and 3)

so that I get
   id             txt  status          output
0   1             sth    open   sth, sth else
1   1        sth else    open   sth, sth else
2   2             sth  closed  one more thing
3   2  one more thing    open  one more thing
4   3        sth else    open        sth else
5   3        sth else    open        sth else
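
One possible approach, as a sketch rather than a definitive answer: filter to the 'open' rows first, so that no empty placeholders enter the join, drop duplicate strings within each group, and then map the per-id result back onto every row. The names `joined` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'txt': ['sth', 'sth else', 'sth', 'one more thing', 'sth else', 'sth else'],
    'status': ['open', 'open', 'closed', 'open', 'open', 'open']})

# Keep only the open rows, then build one joined string per id.
# drop_duplicates() keeps the first occurrence, so the original order is preserved.
joined = (
    df.loc[df['status'] == 'open']
      .groupby('id')['txt']
      .agg(lambda col: ', '.join(col.drop_duplicates()))
)

# Broadcast the per-id string back to every row with that id,
# including rows whose own status is 'closed'.
result = df.assign(output=df['id'].map(joined))
print(result)
```

Note that an id with no 'open' rows at all would get NaN in `output` under this sketch; appending `.fillna('')` to the `map` call is one way to turn that into an empty string if preferred.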