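The DataFrame is shown on the left side of the screenshot below.
I want to group by Name and find which numbers are missing for each name (compared with [1,2,3,4,5]).
The ideal output is shown on the right side of the screenshot.
I tried the code below, but after the GroupBy the "Number of stars" column is treated as a list of strings such as '1,3,2,1,2' rather than a list of integers, so the set comparison never matches anything and the diff is always the full set.
Is there any way to fix this? Thank you very much.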
import pandas as pd
from io import StringIO
csvfile = StringIO("""
Name Number of stars
Benjamin 1,3,2,1,2
Benjamin 2,5,1,3
Emma 2,1,1,4,4,2
Ethan 2,5,4
Emma 2,2,2
Ethan 5,4,4,1,1,1
Olivia 4,1,3,5""")
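df = pd.read_csv(csvfile, sep='\t', engine='python')
# Collect every row of each name into one list (elements are still strings)
df_1 = df.groupby('Name')['Number of stars'].apply(list)
df_1 = df_1.to_frame().reset_index()
# Reference list [1, 2, 3, 4, 5] repeated on every row
df_1['all stars'] = pd.Series([list(range(1, 6)) for x in range(len(df_1.index))])
# Set difference: expected stars minus the stars that were seen
df_1['diff'] = df_1['all stars'].map(set) - df_1['Number of stars'].map(set)
print(df_1)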
Output:
       Name       Number of stars        all stars             diff
0  Benjamin  [1,3,2,1,2, 2,5,1,3]  [1, 2, 3, 4, 5]  {1, 2, 3, 4, 5}
1      Emma  [2,1,1,4,4,2, 2,2,2]  [1, 2, 3, 4, 5]  {1, 2, 3, 4, 5}
2     Ethan  [2,5,4, 5,4,4,1,1,1]  [1, 2, 3, 4, 5]  {1, 2, 3, 4, 5}
3    Olivia             [4,1,3,5]  [1, 2, 3, 4, 5]  {1, 2, 3, 4, 5}
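
For reference, here is a minimal sketch of one direction that might work, assuming the goal is one row per name with the missing numbers. The idea is to split each comma-separated string and convert the pieces to integers before taking the set difference. The helper `missing_stars`, the constant `ALL_STARS`, and the output column name `missing` are names made up for this sketch, and the frame is built directly instead of through `read_csv` so the separator does not matter here.

```python
import pandas as pd

# Same rows as in the question, built directly as a DataFrame.
df = pd.DataFrame({
    "Name": ["Benjamin", "Benjamin", "Emma", "Ethan", "Emma", "Ethan", "Olivia"],
    "Number of stars": [
        "1,3,2,1,2", "2,5,1,3", "2,1,1,4,4,2", "2,5,4",
        "2,2,2", "5,4,4,1,1,1", "4,1,3,5",
    ],
})

# Reference set to compare against (assumed to be 1..5, as in the question).
ALL_STARS = set(range(1, 6))


def missing_stars(rows):
    """Return the sorted numbers from ALL_STARS that never appear in rows.

    rows is the Series of comma-separated strings belonging to one name.
    """
    seen = {int(tok) for row in rows for tok in row.split(",")}
    return sorted(ALL_STARS - seen)


result = (
    df.groupby("Name")["Number of stars"]
      .apply(missing_stars)
      .reset_index(name="missing")
)
print(result)
```

With the sample data this gives [4] for Benjamin, [3, 5] for Emma, [3] for Ethan and [2] for Olivia. Whether that matches the layout in the screenshot is a guess, since only the description of the desired output is available here.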