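The logic isn't entirely clear (the provided output doesn't appear to be the actual result of the provided input), but here are a few approaches: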
Using crosstab
Percentage per year (each year's column sums to 100):
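import pandas as pd

# reshape to long format: one row per (variable, year) pair
df2 = df.melt(value_name='year')
# normalize='columns' divides each count by its year total
df2 = pd.crosstab(df2['variable'], df2['year'], normalize='columns').mul(100)
# or
# df2 = pd.crosstab(df2['variable'], df2['year'])
# df2 = df2.div(df2.sum()).mul(100)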
Output:
year      1993   1994   1996   1997  1998  1999  2001   2002  2003
variable
A          0.0    0.0    0.0    0.0  75.0  25.0  50.0    0.0   0.0
B          0.0    0.0    0.0  100.0  25.0  50.0  50.0    0.0   0.0
C         50.0    0.0  100.0    0.0   0.0   0.0   0.0  100.0  50.0
D         50.0  100.0    0.0    0.0   0.0  25.0   0.0    0.0  50.0
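Since the question's input isn't reproduced here, the following is a minimal self-contained sketch of the per-year calculation. The DataFrame is a hypothetical reconstruction, chosen only because it yields the percentages shown above; the real data (and its row order) may differ.

```python
import pandas as pd

# Hypothetical input: reconstructed so that it reproduces the table above.
# The actual DataFrame from the question may look different.
df = pd.DataFrame({
    "A": [1998, 1998, 1998, 1999, 2001],
    "B": [1997, 1998, 1999, 1999, 2001],
    "C": [1993, 1996, 2002, 2003, 2003],
    "D": [1993, 1994, 1999, 2003, 2003],
})

# Long format: one row per (variable, year) pair.
long_df = df.melt(value_name="year")

# Share of each variable within every year, expressed as a percentage.
per_year = pd.crosstab(
    long_df["variable"], long_df["year"], normalize="columns"
).mul(100)

print(per_year)
# Every column should add up to 100.
print(per_year.sum(axis=0))
```

With this data, 1998 appears three times in A and once in B, which is where the 75/25 split in that column comes from.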
Percentage per variable (each row sums to 100):
df2 = df.melt(value_name='year')
# normalize='index' divides each count by its variable (row) total
pd.crosstab(df2['variable'], df2['year'], normalize='index').mul(100)
# or
# df2 = pd.crosstab(df2['variable'], df2['year'])
# df2.div(df2.sum(axis=1), axis=0).mul(100)
Output:
year      1993  1994  1996  1997  1998  1999  2001  2002  2003
variable
A          0.0   0.0   0.0   0.0  60.0  20.0  20.0   0.0   0.0
B          0.0   0.0   0.0  20.0  20.0  40.0  20.0   0.0   0.0
C         20.0   0.0  20.0   0.0   0.0   0.0   0.0  20.0  40.0
D         20.0  20.0   0.0   0.0   0.0  20.0   0.0   0.0  40.0
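To see that `normalize='index'` and the manual division are interchangeable, here is a small sketch that computes both and compares them. The input is again a hypothetical reconstruction that happens to match the table above, not the question's actual data.

```python
import pandas as pd

# Hypothetical input (an assumption; the original data is not shown here).
df = pd.DataFrame({
    "A": [1998, 1998, 1998, 1999, 2001],
    "B": [1997, 1998, 1999, 1999, 2001],
    "C": [1993, 1996, 2002, 2003, 2003],
    "D": [1993, 1994, 1999, 2003, 2003],
})
long_df = df.melt(value_name="year")

# Route 1: raw counts, then divide each row by its own total.
counts = pd.crosstab(long_df["variable"], long_df["year"])
manual = counts.div(counts.sum(axis=1), axis=0).mul(100)

# Route 2: let crosstab normalize over the index (rows) directly.
builtin = pd.crosstab(
    long_df["variable"], long_df["year"], normalize="index"
).mul(100)

# Largest absolute difference between the two results.
print((manual - builtin).abs().to_numpy().max())
```

The manual route is mainly useful when the raw counts are also needed, since `counts` stays available afterwards.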
Using groupby (percentage per variable, without melting first)
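(df.stack()                # long Series indexed by (row label, column name)
   .groupby(level=1)       # one group per original column (A, B, C, D)
   .apply(lambda s: s.value_counts(normalize=True))  # share of each year within the column
   .unstack(fill_value=0)  # years become columns; absent combinations become 0
   .mul(100)               # fractions to percentages
)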
Output:
   1993  1994  1996  1997  1998  1999  2001  2002  2003
A   0.0   0.0   0.0   0.0  60.0  20.0  20.0   0.0   0.0
B   0.0   0.0   0.0  20.0  20.0  40.0  20.0   0.0   0.0
C  20.0   0.0  20.0   0.0   0.0   0.0   0.0  20.0  40.0
D  20.0  20.0   0.0   0.0   0.0  20.0   0.0   0.0  40.0
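As a sanity check, the groupby chain can be run next to the crosstab version to confirm that both give the same numbers. This is only a sketch on a hypothetical DataFrame reconstructed to match the table above; the names `per_variable` and `reference` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical input (an assumption; the original data is not shown here).
df = pd.DataFrame({
    "A": [1998, 1998, 1998, 1999, 2001],
    "B": [1997, 1998, 1999, 1999, 2001],
    "C": [1993, 1996, 2002, 2003, 2003],
    "D": [1993, 1994, 1999, 2003, 2003],
})

# groupby route: stack to a Series, then count years per original column.
per_variable = (
    df.stack()
      .groupby(level=1)
      .apply(lambda s: s.value_counts(normalize=True))
      .unstack(fill_value=0)
      .mul(100)
)

# crosstab route, used here only as a reference for comparison.
long_df = df.melt(value_name="year")
reference = pd.crosstab(
    long_df["variable"], long_df["year"], normalize="index"
).mul(100)

# The axis names differ (the groupby result has none), so compare values only.
same = np.allclose(per_variable.to_numpy(), reference.to_numpy())
print(same)
```

The only visible difference between the two results is the labelling: the crosstab output carries the `variable` and `year` axis names, while the groupby output does not.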