How to convert a Pandas DataFrame with multi-level columns into a single-level-column DataFrame in Python

Posted on September 10

I have a Pandas DataFrame with multi-level columns, as shown below:

				23-Jan								23-Feb
Market	Product	City	Territory	VALUES	Values MARKET SHARE	VALUES GROWTH	VALUES GEO. SHARE	UNITS	UNITS MARKET SHARE	UNITS GROWTH	UNITS GEO. SHARE	VALUES	Values MARKET SHARE	VALUES GROWTH	VALUES GEO. SHARE	UNITS	UNITS MARKET SHARE	UNITS GROWTH	UNITS GEO. SHARE

I want to create a Python function that transforms this DataFrame into the following format:

Market	Product	City	Territory	VALUES	Values MARKET SHARE	VALUES GROWTH	VALUES GEO. SHARE	UNITS	UNITS MARKET SHARE	UNITS GROWTH	UNITS GEO. SHARE	Date
												23-Jan
												23-Feb

How can I achieve this transformation using Python and Pandas?

Recommended answer

It is hard to help without the MultiIndex constructor for your DataFrame. You can use stack and a few index methods to reshape the DataFrame:

>>> (df.set_index(df.columns[:4].tolist())  # Market, Product, City, Territory
       .rename_axis(index=df.columns[:4].droplevel(0),  # Flatten the index names
                    columns=['Date', None])  # Name the column levels
       .stack('Date', sort=False).reset_index())  # Move 'Date' into rows (sort= needs pandas >= 2.1)

   Market  Product  City  Territory    Date  VALUES  Values MARKET SHARE  VALUES GROWTH  VALUES GEO. SHARE  UNITS  UNITS MARKET SHARE  UNITS GROWTH  UNITS GEO. SHARE
0       0        0     0          0  23-Jan       1                    1              1                  1      1                   1             1                 1
1       0        0     0          0  23-Feb       2                    2              2                  2      2                   2             2                 2
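
Since the question asks for a function, here is a minimal sketch of one. It is not the chain above: to avoid depending on the sort argument of stack, it builds one block per date and concatenates them. The name flatten_date_columns and its parameters n_id_cols and date_name are hypothetical, and it assumes the first four columns are identifiers while every other column sits under a date label in the top level.

```python
import pandas as pd


def flatten_date_columns(df, n_id_cols=4, date_name="Date"):
    """Turn the top column level (the dates) into a single date column.

    Hypothetical helper: assumes the first `n_id_cols` columns are
    identifiers and all remaining columns sit under a date in level 0.
    """
    # Identifier columns, keeping only their second-level names.
    ids = df.iloc[:, :n_id_cols].droplevel(0, axis=1)

    # Date labels in order of first appearance.
    dates = df.columns[n_id_cols:].get_level_values(0).unique()

    pieces = []
    for date in dates:
        block = df[date].copy()           # metrics for one date, single-level columns
        block.insert(0, date_name, date)  # tag every row with its date
        pieces.append(pd.concat([ids, block], axis=1))

    # Concatenate the per-date blocks, then use a stable sort on the original
    # row labels so that each row's dates end up next to each other.
    out = pd.concat(pieces).sort_index(kind="mergesort")
    return out.reset_index(drop=True)


# Made-up two-row frame with the same layout as the question (two metrics only).
df = pd.DataFrame({
    ("", "Market"): {0: 0, 1: 1},
    ("", "Product"): {0: 0, 1: 1},
    ("", "City"): {0: 0, 1: 1},
    ("", "Territory"): {0: 0, 1: 1},
    ("23-Jan", "VALUES"): {0: 1, 1: 10},
    ("23-Jan", "UNITS"): {0: 1, 1: 10},
    ("23-Feb", "VALUES"): {0: 2, 1: 20},
    ("23-Feb", "UNITS"): {0: 2, 1: 20},
})

result = flatten_date_columns(df)
print(result)
```

The Date column lands right after the identifier columns, as in the output above. If it should be last, as in the question, result[[c for c in result.columns if c != "Date"] + ["Date"]] reorders it.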

Minimal working example:

import pandas as pd

data = {('', 'Market'): {0: 0},
 ('', 'Product'): {0: 0},
 ('', 'City'): {0: 0},
 ('', 'Territory'): {0: 0},
 ('23-Jan', 'VALUES'): {0: 1},
 ('23-Jan', 'Values MARKET SHARE'): {0: 1},
 ('23-Jan', 'VALUES GROWTH'): {0: 1},
 ('23-Jan', 'VALUES GEO. SHARE'): {0: 1},
 ('23-Jan', 'UNITS'): {0: 1},
 ('23-Jan', 'UNITS MARKET SHARE'): {0: 1},
 ('23-Jan', 'UNITS GROWTH'): {0: 1},
 ('23-Jan', 'UNITS GEO. SHARE'): {0: 1},
 ('23-Feb', 'VALUES'): {0: 2},
 ('23-Feb', 'Values MARKET SHARE'): {0: 2},
 ('23-Feb', 'VALUES GROWTH'): {0: 2},
 ('23-Feb', 'VALUES GEO. SHARE'): {0: 2},
 ('23-Feb', 'UNITS'): {0: 2},
 ('23-Feb', 'UNITS MARKET SHARE'): {0: 2},
 ('23-Feb', 'UNITS GROWTH'): {0: 2},
 ('23-Feb', 'UNITS GEO. SHARE'): {0: 2}}
df = pd.DataFrame(data)
print(df)

# Output
                                23-Jan                      ...        23-Feb                                                                         
  Market Product City Territory VALUES Values MARKET SHARE  ... VALUES GROWTH VALUES GEO. SHARE UNITS UNITS MARKET SHARE UNITS GROWTH UNITS GEO. SHARE
0      0       0    0         0      1                   1  ...             2                 2     2                  2            2                2

[1 rows x 20 columns]
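
The data dictionary above has the same shape that DataFrame.to_dict() returns, so a constructor like it can be generated from an existing frame and pasted into a question. The sketch below assumes a small made-up frame with tuple column labels; only head, to_dict and the DataFrame constructor are used.

```python
import pandas as pd

# Made-up frame with two-level columns, standing in for the real data.
df = pd.DataFrame({
    ("", "Market"): {0: 0},
    ("23-Jan", "VALUES"): {0: 1},
    ("23-Feb", "VALUES"): {0: 2},
})

# A few rows as a plain dict keyed by the column tuples.
constructor = df.head(1).to_dict()
print(constructor)

# Passing the dict back to DataFrame rebuilds the same multi-level columns.
rebuilt = pd.DataFrame(constructor)
print(rebuilt.columns.tolist())
```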