Python 使用列的 MultiIndex 将高 DataFrame 转换为宽格式

发布于09月21日

我有一个长格式的表，其中显示了哪些供应商被分配到一组地点的哪项服务.对于每个地点和服务，都有任意数量的分配级别(例如，如果第一个供应商不能提供服务，工作将升级到第二个供应商).

我正在try 将该表转换为宽格式，我知道我肯定做错了什么.在预期的宽格式表中，在位置之后，应该有一个双标题，其中0级是服务名称，1级是每个分配编号.因此，当导出到Excel时，每个服务名称应该跨越任意数量的赋值列.

实际输出并不是预期的结果.我想我要么是旋转错了，要么是pd.pivot不是合适的工具.我需要在透视后手动对多重索引进行排序和调整，这一事实很能说明问题.

如何调整代码以获得预期的输出？

示例代码

import pandas as pd
import numpy as np


sample_data = [
 {'Location': 'required',
  'ServiceSpecialty': 'required',
  'Assignment 1': 'optional',
  'Assignment 2': 'optional',
  'Assignment 3': 'optional',
  'Assignment 4': 'optional',
  'Assignment 5': 'optional'},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'}
]

df = pd.DataFrame.from_dict(sample_data)

# Remove the unneeded secondary header (required/optional), and pivot
df = df.drop(index=df.index[0], axis=0)
df = df.pivot(index='Location', columns='ServiceSpecialty').reset_index()

# Remove any fully blank columns (eg "ATM" only has 1 assignment, but "Carpentry/Handyman Svcs" has 3).
# We only want to retain the populated "Assignment" columns
df = df.dropna(axis=1, how='all')

# Clean up the MultiIndex header (remove the names, and put the ServiceSpecialty above the Assignment
# columns. This will cause the ServiceSpecialty to span all the Assignment columns when exported to Excel)
df.index.name = None
df.columns.names = (None, None)
df.columns = df.columns.swaplevel(0,1)

# Sort the MultiIndex columns by the ServiceSpecialty, not the Assignment
cols_location = [('', 'Location')]
cols_assignments = [c for c in df.columns if c[0] != '']
cols_assignments.sort(key=lambda c: c[0])
cols_updated = cols_location + cols_assignments
df.columns = pd.MultiIndex.from_tuples(cols_updated)

# Write the output to file
with pd.ExcelWriter('current_output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer)

Python 使用列的 MultiIndex 将高 DataFrame 转换为宽格式

示例代码

样本输入

电流输出

预期yields

推荐答案

Python相关问答推荐

如何随着收件箱的增加动态添加到HTML表的右下角？

在Python中，如何初始化集合列表脚本的输出

在Python中是否可以输入使用任意大小参数列表的第一个元素的函数

按日期和组增量计算总价值

带有pandas的分区列上的过滤器的多个条件read_parquet

如何将新的SQL服务器功能映射到SQL Alchemy的ORM

inspect_asm不给出输出

用Python获取HTML Span类中的数据

Python主进程和分支进程如何共享gc信息？

从包含数字和单词的文件中读取和获取数据集

在Python中为变量的缺失值创建虚拟值

使用FASTCGI在IIS上运行Django频道

时间序列分解

如何从具有不同len的列表字典中创建摘要表？

在Wayland上使用setCellWidget时，try 编辑QTable Widget中的单元格时，PyQt 6崩溃

基于字符串匹配条件合并两个帧

无法使用DBFS File API路径附加到CSV In Datricks(OSError Errno 95操作不支持)

多指标不同顺序串联大Pandas 模型

计算天数

在方法中设置属性值时，如何处理语句不可达[Unreacable]"；的问题？