Assuming this reproducible 4x3 input:
import pandas as pd
import numpy as np
s = [ "S" + str(i) for i in range(1,4+1)]
c = [ "C" + str(i) for i in range(1,3+1)]
arr1 = np.arange(len(c)*len(s)).reshape(len(s), len(c))
arr2 = np.arange(len(c)*len(s)).reshape(len(s), len(c))*10
I think you want numpy.repeat, numpy.tile, and numpy.ravel:
# numpy_1
df = pd.DataFrame({'S': np.repeat(s, len(c)),
                   'C': np.tile(c, len(s)),
                   'arr1': arr1.ravel(),
                   'arr2': arr2.ravel(),
                  })
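To see why these three calls line up, here is a minimal sketch on a smaller 2x3 input (the names `row_labels`, `col_labels`, and `values` are only for illustration). `np.repeat` repeats each row label once per column, `np.tile` repeats the whole column list once per row, and `ravel` flattens the array row by row, so position `i` in all three refers to the same cell:

```python
import numpy as np

s = ["S1", "S2"]
c = ["C1", "C2", "C3"]
arr = np.arange(len(s) * len(c)).reshape(len(s), len(c))  # [[0, 1, 2], [3, 4, 5]]

row_labels = np.repeat(s, len(c))  # each row label repeated once per column
col_labels = np.tile(c, len(s))    # the whole column list repeated once per row
values = arr.ravel()               # row-major (C order) flattening

print(row_labels.tolist())  # ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']
print(col_labels.tolist())  # ['C1', 'C2', 'C3', 'C1', 'C2', 'C3']
print(values.tolist())      # [0, 1, 2, 3, 4, 5]
```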
Or use concat and stack:
# pandas_1
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c)},
                axis=1)
        .stack().rename_axis(['S', 'C']).reset_index()
     )
Output:
     S   C  arr1  arr2
0   S1  C1     0     0
1   S1  C2     1    10
2   S1  C3     2    20
3   S2  C1     3    30
4   S2  C2     4    40
5   S2  C3     5    50
6   S3  C1     6    60
7   S3  C2     7    70
8   S3  C3     8    80
9   S4  C1     9    90
10  S4  C2    10   100
11  S4  C3    11   110
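If you want to convince yourself that the numpy and pandas variants agree, a quick check along these lines should do. It rebuilds the same 4x3 input so that it runs on its own; `df_numpy`, `df_pandas`, and `same` are names chosen only for this sketch. It compares values and column labels rather than dtypes, since the exact dtypes may differ between pandas versions:

```python
import numpy as np
import pandas as pd

s = ["S" + str(i) for i in range(1, 4 + 1)]
c = ["C" + str(i) for i in range(1, 3 + 1)]
arr1 = np.arange(len(c) * len(s)).reshape(len(s), len(c))
arr2 = arr1 * 10

# numpy_1: build the label columns by hand
df_numpy = pd.DataFrame({'S': np.repeat(s, len(c)),
                         'C': np.tile(c, len(s)),
                         'arr1': arr1.ravel(),
                         'arr2': arr2.ravel()})

# pandas_1: let pandas derive the labels from the index and columns
df_pandas = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                        'arr2': pd.DataFrame(arr2, index=s, columns=c)},
                       axis=1)
               .stack().rename_axis(['S', 'C']).reset_index())

# Compare column labels and cell values (dtypes are deliberately ignored)
same = (list(df_numpy.columns) == list(df_pandas.columns)
        and df_numpy.values.tolist() == df_pandas.values.tolist())
print(same)
```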
ordering by C first
# numpy_2
df = pd.DataFrame({'S': np.tile(s, len(c)),
                   'C': np.repeat(c, len(s)),
                   'arr1': arr1.ravel(order='F'),
                   'arr2': arr2.ravel(order='F'),
                  })
Or:
# pandas_2
df = (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                 'arr2': pd.DataFrame(arr2, index=s, columns=c)},
                axis=0)
        .T.stack().rename_axis(['C', 'S']).reset_index()
     )
Output (shown for numpy_2; pandas_2 returns the same rows with the label columns ordered C, S):
     S   C  arr1  arr2
0   S1  C1     0     0
1   S2  C1     3    30
2   S3  C1     6    60
3   S4  C1     9    90
4   S1  C2     1    10
5   S2  C2     4    40
6   S3  C2     7    70
7   S4  C2    10   100
8   S1  C3     2    20
9   S2  C3     5    50
10  S3  C3     8    80
11  S4  C3    11   110
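The C-first variant relies on `ravel(order='F')`, which flattens column by column instead of row by row, with `tile` and `repeat` swapped accordingly. A minimal sketch on a 2x3 array (variable names are illustrative only):

```python
import numpy as np

s = ["S1", "S2"]
c = ["C1", "C2", "C3"]
arr = np.arange(len(s) * len(c)).reshape(len(s), len(c))  # [[0, 1, 2], [3, 4, 5]]

row_major = arr.ravel()           # default order='C': walk along each row
col_major = arr.ravel(order='F')  # order='F': walk down each column
s_labels = np.tile(s, len(c))     # row labels cycle fastest
c_labels = np.repeat(c, len(s))   # each column label repeated once per row

print(row_major.tolist())  # [0, 1, 2, 3, 4, 5]
print(col_major.tolist())  # [0, 3, 1, 4, 2, 5]
print(s_labels.tolist())   # ['S1', 'S2', 'S1', 'S2', 'S1', 'S2']
print(c_labels.tolist())   # ['C1', 'C1', 'C2', 'C2', 'C3', 'C3']
```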
comparison of timings
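One way to run such a comparison yourself is sketched below with the standard-library `timeit` module. The helper `make_input` and the 200x50 size are assumptions made for this sketch, and no particular result is claimed; the numbers will depend on your machine, the input shape, and the pandas version:

```python
import timeit
import numpy as np
import pandas as pd

def make_input(n_rows, n_cols):
    # Hypothetical helper: same construction as above, for an arbitrary shape
    s = ["S" + str(i) for i in range(1, n_rows + 1)]
    c = ["C" + str(i) for i in range(1, n_cols + 1)]
    arr1 = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
    return s, c, arr1, arr1 * 10

def numpy_1(s, c, arr1, arr2):
    return pd.DataFrame({'S': np.repeat(s, len(c)), 'C': np.tile(c, len(s)),
                         'arr1': arr1.ravel(), 'arr2': arr2.ravel()})

def pandas_1(s, c, arr1, arr2):
    return (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                       'arr2': pd.DataFrame(arr2, index=s, columns=c)}, axis=1)
              .stack().rename_axis(['S', 'C']).reset_index())

def numpy_2(s, c, arr1, arr2):
    return pd.DataFrame({'S': np.tile(s, len(c)), 'C': np.repeat(c, len(s)),
                         'arr1': arr1.ravel(order='F'),
                         'arr2': arr2.ravel(order='F')})

def pandas_2(s, c, arr1, arr2):
    return (pd.concat({'arr1': pd.DataFrame(arr1, index=s, columns=c),
                       'arr2': pd.DataFrame(arr2, index=s, columns=c)}, axis=0)
              .T.stack().rename_axis(['C', 'S']).reset_index())

s, c, arr1, arr2 = make_input(200, 50)  # size chosen arbitrarily for the sketch

timings = {}
for func in (numpy_1, pandas_1, numpy_2, pandas_2):
    # Best of 3 repeats, 5 calls each, reported as seconds per call
    best = min(timeit.repeat(lambda: func(s, c, arr1, arr2), number=5, repeat=3))
    timings[func.__name__] = best / 5

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```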