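I am working with the dataset below, but I am having trouble calculating a total score by team ID. A team can be either the home team or the away team, and I am trying to compute a running total of the runs it has scored.
I have managed to create running totals based on the home ID and on the away ID separately, as well as rolling averages based on the home/away ID, but I am struggling to do the calculation across both columns.
For example, if a team scores 1 run as the home team in game 1 and then scores 3 runs as the away team in game 2, I want a column showing that it has scored 4 runs in total up to that point in the dataset.
My code so far is: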
import pandas as pd
game_data = pd.read_csv('game_data.csv')
# Rolling averages (window of 165 games) for each home team: runs scored and runs conceded at home
game_data['home_avg_home_games'] = game_data.groupby('home_id')['home_score'].transform(lambda x: x.rolling(165, min_periods=0).mean())
game_data['home_avg_against_home_games'] = game_data.groupby('home_id')['away_score'].transform(lambda x: x.rolling(165, min_periods=0).mean())
# The same rolling averages for each away team: runs scored and runs conceded on the road
game_data['away_avg_away_games'] = game_data.groupby('away_id')['away_score'].transform(lambda x: x.rolling(165, min_periods=0).mean())
game_data['away_avg_against_away_games'] = game_data.groupby('away_id')['home_score'].transform(lambda x: x.rolling(165, min_periods=0).mean())
# Running totals, but only within home games or only within away games (not combined)
game_data['scored_home_total'] = game_data.groupby('home_id')['home_score'].cumsum()
game_data['scored_away_total'] = game_data.groupby('away_id')['away_score'].cumsum()
game_id | away_id | home_id | away_score | home_score | home_avg_home_games | home_avg_against_home_games | away_avg_away_games | away_avg_against_away_games |
---|---|---|---|---|---|---|---|---|
446877 | 138 | 134 | 1 | 4 | 4 | 1 | 1 | 4 |
446911 | 141 | 139 | 5 | 3 | 3 | 5 | 5 | 3 |
446873 | 121 | 118 | 3 | 4 | 4 | 3 | 3 | 4 |
446875 | 137 | 158 | 12 | 3 | 3 | 12 | 12 | 3 |
446872 | 142 | 110 | 2 | 3 | 3 | 2 | 2 | 3 |
446876 | 136 | 140 | 2 | 3 | 3 | 2 | 2 | 3 |
446874 | 143 | 113 | 2 | 6 | 6 | 2 | 2 | 6 |
446879 | 120 | 144 | 4 | 3 | 3 | 4 | 4 | 3 |
446871 | 119 | 135 | 15 | 0 | 0 | 15 | 15 | 0 |
446878 | 141 | 139 | 5 | 3 | 3 | 5 | 5 | 3 |
446869 | 115 | 109 | 10 | 5 | 5 | 10 | 10 | 5 |
446889 | 112 | 108 | 9 | 0 | 0 | 9 | 9 | 0 |
446868 | 145 | 133 | 4 | 3 | 3 | 4 | 4 | 3 |
446870 | 117 | 147 | 5 | 3 | 3 | 5 | 5 | 3 |
446867 | 111 | 114 | 6 | 2 | 2 | 6 | 6 | 2 |
446896 | 121 | 118 | 2 | 0 | 2 | 2.5 | 2.5 | 2 |
446910 | 138 | 134 | 5 | 6 | 5 | 3 | 3 | 5 |
446887 | 141 | 139 | 2 | 3 | 3 | 4 | 4 | 3 |
446883 | 116 | 146 | 8 | 7 | 7 | 8 | 8 | 7 |
446886 | 136 | 140 | 10 | 2 | 2.5 | 6 | 6 | 2.5 |
446885 | 137 | 158 | 2 | 1 | 2 | 7 | 7 | 2 |
446882 | 115 | 109 | 6 | 11 | 8 | 8 | 8 | 8 |
446880 | 112 | 108 | 6 | 1 | 0.5 | 7.5 | 7.5 | 0.5 |
446881 | 145 | 133 | 5 | 4 | 3.5 | 4.5 | 4.5 | 3.5 |
446884 | 119 | 135 | 3 | 0 | 0 | 9 | 9 | 0 |
446901 | 141 | 139 | 3 | 5 | 3.5 | 3.75 | 3.75 | 3.5 |
446898 | 137 | 158 | 3 | 4 | 2.666666667 | 5.666666667 | 5.666666667 | 2.666666667 |
446899 | 136 | 140 | 9 | 5 | 3.333333333 | 7 | 7 | 3.333333333 |
446891 | 115 | 109 | 4 | 3 | 6.333333333 | 6.666666667 | 6.666666667 | 6.333333333 |
My desired output is:
game_id | away_id | home_id | away_score | home_score | home_avg_home_games | home_avg_against_home_games | away_avg_away_games | away_avg_against_away_games | home_total_score | away_total_score |
---|---|---|---|---|---|---|---|---|---|---|
446877 | 1 | 2 | 1 | 4 | 4 | 1 | 1 | 4 | 4 | 1 |
446911 | 2 | 3 | 5 | 3 | 3 | 5 | 5 | 3 | 3 | 5 |
446873 | 1 | 3 | 3 | 4 | 3.5 | 4 | 2 | 4 | 7 | 4 |
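
For reference, here is a minimal sketch of one possible way to get a running total that counts both home and away appearances. It assumes the rows are already in chronological order and that the DataFrame has a unique index (the default after `read_csv`). The helper names `home`, `away`, `long`, `team_id`, `score`, `side`, and `total` are made up for illustration, and the toy frame just reuses the three rows of the desired-output table. The idea is to stack the home and away columns into one long frame with one row per team per game, take a cumulative sum per team, and write the result back.

```python
import pandas as pd

# Toy data copied from the three rows of the desired-output table (illustrative only).
game_data = pd.DataFrame({
    "game_id": [446877, 446911, 446873],
    "away_id": [1, 2, 1],
    "home_id": [2, 3, 3],
    "away_score": [1, 5, 3],
    "home_score": [4, 3, 4],
})

# One row per team per game; the original row index is kept so results can be mapped back.
home = (game_data[["home_id", "home_score"]]
        .rename(columns={"home_id": "team_id", "home_score": "score"})
        .assign(side="home"))
away = (game_data[["away_id", "away_score"]]
        .rename(columns={"away_id": "team_id", "away_score": "score"})
        .assign(side="away"))

# Sorting by the original index restores game order across both sides.
long = pd.concat([home, away]).sort_index(kind="mergesort")

# Running total of runs scored per team, regardless of home/away.
long["total"] = long.groupby("team_id")["score"].cumsum()

# Assignment aligns on the index, so each game gets its own home and away totals.
game_data["home_total_score"] = long.loc[long["side"] == "home", "total"]
game_data["away_total_score"] = long.loc[long["side"] == "away", "total"]

print(game_data[["game_id", "home_total_score", "away_total_score"]])
```

Note that under this combined definition, team 2 in the second row ends up with 9 (4 scored at home in the first game plus 5 scored away), not the 5 shown in the table above, so the sketch only matches that table if the 5 there is an oversight.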