I have a DataFrame with 55049 rows and 667 columns.
A sample of the DataFrame structure is shown below:
import pandas as pd

data = {
'g1': [1],
'g2': [2],
'g3': [3],
'st1_1': [1],
'st1_2': [1],
'st1_3': [1],
'st1_4': [1],
'st1_5': [5],
'st1_6': [5],
'st1_7': [5],
'st1_8': [5],
'st1_Next_1': [8],
'st1_Next_2': [8],
'st1_Next_3': [8],
'st1_Next_4': [8],
'st1_Next_5': [9],
'st1_Next_6': [9],
'st1_Next_7': [9],
'st1_Next_8': [9],
'st2_1': [2],
'st2_2': [2],
'st2_3': [2],
'st2_4': [2],
'st2_5': [2],
'st2_6': [2],
'st2_7': [2],
'st2_8': [2],
'ft_1': [1],
'ft_2': [0],
'ft_3': [1],
'ft_4': [1],
'ft_5': [1],
'ft_6': [0],
'ft_7': [0],
'ft_8': [1]
}
df = pd.DataFrame(data)
print(df)
To get the desired output, I used pd.wide_to_long with the following code:
ilist = ['g1', 'g2', 'g3']
stublist = ['st1', 'st1_Next', 'st2', 'ft']
df_long = pd.wide_to_long(
    df.reset_index(),
    i=['index'] + ilist,
    stubnames=stublist,
    j='j',
    sep='_',
).reset_index()
# Keep only the rows whose ft flag is set.
df_long = df_long[df_long['ft'] == 1]
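The code above works well and gives the expected result.
I did the wide-to-long reshape so that I could apply the filter
df_long[df_long['ft']==1]
This means ft_1 has to be applied to all of the _1 columns, ft_2 to all of the _2 columns, and so on up to ft_8 and all of the _8 columns.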
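To make the intended filter concrete, here is a minimal sketch in plain Python. It uses one hypothetical wide row cut down to two stubs and three suffixes (not the real data), and keeps the values for suffix k only when ft_k equals 1.

```python
# One wide row, reduced to two stubs and three suffixes for brevity.
# These values are illustrative only.
row = {
    "st1_1": 1, "st1_2": 1, "st1_3": 5,
    "st2_1": 2, "st2_2": 2, "st2_3": 2,
    "ft_1": 1, "ft_2": 0, "ft_3": 1,
}

suffixes = [1, 2, 3]
stubs = ["st1", "st2"]

# Keep suffix k only when its flag ft_k equals 1.
kept = [
    {"j": k, **{s: row[f"{s}_{k}"] for s in stubs}}
    for k in suffixes
    if row[f"ft_{k}"] == 1
]
print(kept)
```

Suffix 2 is dropped here because ft_2 is 0, which is the same effect the filter on df_long has after reshaping.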
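The problem is that the wide-to-long operation takes about 2 minutes per file. I have more than 800 source files, so the whole process takes about 1600 minutes (roughly 27 hours), which is far too long.
I am looking for alternative suggestions for reshaping the data.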
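One possible direction, offered only as a sketch: skip the generic reshape and build the filtered long frame directly from NumPy arrays. The helper name `wide_to_long_filtered` and its parameters are hypothetical, and the sketch assumes that every stub has a column for every suffix 1..n_suffix. Whether it is actually faster on the real files would need to be measured.

```python
import numpy as np
import pandas as pd


def wide_to_long_filtered(df, id_cols, stubs, n_suffix, flag="ft", sep="_"):
    """Return long-format rows where `flag` == 1 (hypothetical helper)."""
    n_rows = len(df)
    suffixes = range(1, n_suffix + 1)
    # Row-major flatten: each wide row contributes n_suffix consecutive entries.
    flag_cols = [f"{flag}{sep}{k}" for k in suffixes]
    keep = (df[flag_cols].to_numpy() == 1).ravel()

    out = {}
    for col in id_cols:
        # Repeat each id value once per suffix, then apply the mask.
        out[col] = np.repeat(df[col].to_numpy(), n_suffix)[keep]
    out["j"] = np.tile(np.arange(1, n_suffix + 1), n_rows)[keep]
    for stub in stubs:
        cols = [f"{stub}{sep}{k}" for k in suffixes]
        out[stub] = df[cols].to_numpy().ravel()[keep]
    return pd.DataFrame(out)


# Small illustrative frame (not the real data): two rows, two suffixes.
wide = pd.DataFrame({
    "g1": [1, 7],
    "st1_1": [10, 30], "st1_2": [11, 31],
    "st1_Next_1": [20, 40], "st1_Next_2": [21, 41],
    "ft_1": [1, 0], "ft_2": [0, 1],
})
long_df = wide_to_long_filtered(wide, ["g1"], ["st1", "st1_Next", "ft"], n_suffix=2)
print(long_df)
```

The rows come out grouped by original row and then by suffix, so the order may differ from what pd.wide_to_long returns. Because the column names are built exactly as stub + sep + suffix, st1 and st1_Next cannot be confused with each other.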
I tried this, but it was not very effective.
As @sammywemmy suggested, I tried the following code, but it does not produce an st1_Next column:
import janitor  # pyjanitor; registers DataFrame.pivot_longer

ilist = ['g1','g2','g3']
stublist = ['st1','st1_Next','st2','ft']
df_pvot = df.pivot_longer(index=ilist,names_to=stublist,names_pattern=stublist)
print(df_pvot)
The output is missing the st1_Next column; its values end up under st1 instead of in a new column.
Output:
g1 g2 g3 st1 st2 ft
0 1 2 3 1 2.0 1.0
1 1 2 3 1 2.0 0.0
2 1 2 3 1 2.0 1.0
3 1 2 3 1 2.0 1.0
4 1 2 3 5 2.0 1.0
5 1 2 3 5 2.0 0.0
6 1 2 3 5 2.0 0.0
7 1 2 3 5 2.0 1.0
8 1 2 3 8 NaN NaN
9 1 2 3 8 NaN NaN
10 1 2 3 8 NaN NaN
11 1 2 3 8 NaN NaN
12 1 2 3 9 NaN NaN
13 1 2 3 9 NaN NaN
14 1 2 3 9 NaN NaN
15 1 2 3 9 NaN NaN
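A possible explanation for the output above, sketched with the standard re module only: the unanchored pattern 'st1' also matches names such as st1_Next_1. The sketch assumes the patterns are tried in order and the first match wins; that is an assumption about how the list in names_pattern is applied, though it is consistent with the st1 column holding the 8 and 9 values. The helper `first_match` and the anchored patterns are hypothetical.

```python
import re

columns = ["st1_1", "st1_Next_1", "st2_1", "ft_1"]


def first_match(name, patterns):
    """Return the first pattern that matches `name`, or None if none match."""
    for pattern in patterns:
        if re.search(pattern, name):
            return pattern
    return None


# Patterns as used in the question: plain stub names, not anchored.
loose = ["st1", "st1_Next", "st2", "ft"]
# Hypothetical stricter patterns: stub, underscore, digits, nothing else.
anchored = [r"^st1_\d+$", r"^st1_Next_\d+$", r"^st2_\d+$", r"^ft_\d+$"]

loose_map = {c: first_match(c, loose) for c in columns}
anchored_map = {c: first_match(c, anchored) for c in columns}

print(loose_map["st1_Next_1"])
print(anchored_map["st1_Next_1"])
```

With the loose list, st1_Next_1 is claimed by 'st1' before 'st1_Next' is ever tried. Anchored patterns like these might keep the two stubs apart if passed as names_pattern, but that has not been verified here.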