I am trying to build per-student columns that count activity attempts by progress level.

The data looks like this:

STUDENT_ID  STUDENT_ACTIVITY_SESSION_ID  NODE_NAME  ACTIVITY_NAME  prog_level
FredID      gobbledeegook1               Node1      MyActivity1    pass
FredID      gobbledeegook2               Node1      MyActivity1    pass
FredID      gobbledeegook3               Node2      MyActivity2    pass
JaniceID    gobbledeegook4               Node3      MyActivity3    stay
JaniceID    gobbledeegook5               Node3      MyActivity3    stay
JaniceID    gobbledeegook5               Node3      MyActivity3    fail

Here is what I want:

STUDENT_ID  attempts_pass  attempts_fail  attempts_stay
FredID      3
JaniceID                   1              2
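  1. I tried iterating over the progress-level names so that the column names are generated automatically. I want one row per student ID, with each count in its own column.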
def std_attempts_by_prog_level(df):
    dict_fields = {}
    df_by_prog_level = df.groupby('prog_level')['STUDENT_ACTIVITY_SESSION_ID']
    for name, group in df_by_prog_level:
        x = group.count() 
        dict_fields["attempts_" + name] = x

    return pd.Series(dict_fields)     
  
df.groupby('STUDENT_ID').apply(std_attempts_by_prog_level).reset_index()

Result:

STUDENT_ID level_1 0
0   Fred    attempts_cancel 104
1   Fred    attempts_fail   96
2   Fred    attempts_in_progress    30

...so this would still need pivoting and further fiddling, which is why I tried approaching it with a pivot instead.

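  2. Pivoting and naming the fields manually: the resulting MultiIndex does not let me easily merge the counts back with other student metrics.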
df_temp=df.groupby(['STUDENT_ID', 'prog_level'],as_index=False)['STUDENT_ACTIVITY_SESSION_ID'].count().pivot(index='STUDENT_ID', columns='prog_level').rename({'cancel':'attempts_cancel', 'fail':'attempts_fail', 'in_progress':'attempts_in_progress', 'pass':'attempts_pass'}, axis=1)

print(df_temp.columns)

Result:

MultiIndex([('STUDENT_ACTIVITY_SESSION_ID',      'attempts_cancel'),
            ('STUDENT_ACTIVITY_SESSION_ID',        'attempts_fail'),
            ('STUDENT_ACTIVITY_SESSION_ID', 'attempts_in_progress'),
            ('STUDENT_ACTIVITY_SESSION_ID',        'attempts_pass')],
           names=[None, 'prog_level'])
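One likely reason for the two-level columns is that .pivot is called without a values argument, so pandas keeps the name of the counted column as an outer level. Below is a minimal sketch of the same groupby-then-pivot idea with values passed explicitly, assuming the sample data shown in the question; add_prefix is used here in place of the manual rename mapping, which is a substitution and not part of the original attempt.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "STUDENT_ID": ["FredID", "FredID", "FredID", "JaniceID", "JaniceID", "JaniceID"],
    "STUDENT_ACTIVITY_SESSION_ID": [
        "gobbledeegook1", "gobbledeegook2", "gobbledeegook3",
        "gobbledeegook4", "gobbledeegook5", "gobbledeegook5",
    ],
    "prog_level": ["pass", "pass", "pass", "stay", "stay", "fail"],
})

counts = (
    df.groupby(["STUDENT_ID", "prog_level"], as_index=False)["STUDENT_ACTIVITY_SESSION_ID"]
      .count()
      # Naming the values column keeps the resulting columns single-level.
      .pivot(index="STUDENT_ID", columns="prog_level",
             values="STUDENT_ACTIVITY_SESSION_ID")
      .add_prefix("attempts_")       # attempts_fail, attempts_pass, ...
      .rename_axis(None, axis=1)     # drop the leftover "prog_level" axis name
)

print(counts.columns)
```

Combinations that never occur (for example FredID with fail) come out as NaN in this version; call .fillna(0) if zeros are preferred.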

Recommended answer

You can use .pivot_table:

result = df.pivot_table(
    index="STUDENT_ID", columns="prog_level", values="ACTIVITY_NAME",
    aggfunc="count", fill_value=0
).rename(lambda c: f"prog_level_{c}", axis=1).rename_axis(None, axis=1)

Result:

            prog_level_fail  prog_level_pass  prog_level_stay
STUDENT_ID                                                   
FredID                    0                3                0
JaniceID                  1                0                2

If you want the index as a regular column, add .reset_index() at the end of the chain.
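To tie this back to the question, here is a self-contained sketch of the pivot_table approach that uses the attempts_ prefix the question asked for, resets the index, and merges the counts with another per-student table. The other_metrics frame and its avg_score column are hypothetical, made up purely for illustration; the rest assumes the sample data from the question.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "STUDENT_ID": ["FredID", "FredID", "FredID", "JaniceID", "JaniceID", "JaniceID"],
    "STUDENT_ACTIVITY_SESSION_ID": [
        "gobbledeegook1", "gobbledeegook2", "gobbledeegook3",
        "gobbledeegook4", "gobbledeegook5", "gobbledeegook5",
    ],
    "prog_level": ["pass", "pass", "pass", "stay", "stay", "fail"],
})

result = (
    df.pivot_table(
        index="STUDENT_ID", columns="prog_level",
        values="STUDENT_ACTIVITY_SESSION_ID",
        aggfunc="count", fill_value=0,
    )
    .add_prefix("attempts_")       # column names follow the question's naming
    .rename_axis(None, axis=1)     # drop the "prog_level" axis name
    .reset_index()                 # STUDENT_ID becomes a regular column
)

# Hypothetical second table of per-student metrics, for illustration only.
other_metrics = pd.DataFrame({
    "STUDENT_ID": ["FredID", "JaniceID"],
    "avg_score": [0.9, 0.6],
})

merged = other_metrics.merge(result, on="STUDENT_ID", how="left")
print(merged)
```

Because the columns are flat and STUDENT_ID is an ordinary column after .reset_index(), the merge needs no special handling for index levels.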
