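I have a long-format table showing which suppliers are assigned to which service at a set of locations. For each location and service there can be any number of assignment levels (for example, if the first supplier cannot provide the service, the work escalates to the second supplier).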

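I'm trying to convert this table to wide format, and I know I'm doing something wrong. In the expected wide-format table, the Location column should be followed by a two-level header: level 0 is the service name and level 1 is each assignment number. That way, when the table is exported to Excel, each service name spans however many assignment columns that service has.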
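To make the target layout concrete, here is a sketch of the column header described above, built by hand from the sample data. It assumes the Location column sits under an empty top-level label (as in the answer's printed output). Counting the level-0 labels shows how many columns each service should span in the exported sheet.

```python
from collections import Counter

import pandas as pd

# Illustrative target header for the sample data: level 0 is the service
# name, level 1 is the assignment number; Location sits under an empty label.
expected_columns = pd.MultiIndex.from_tuples([
    ("", "Location"),
    ("Appliances", "Assignment 1"),
    ("Carpentry/Handyman Svcs", "Assignment 1"),
    ("Carpentry/Handyman Svcs", "Assignment 2"),
    ("Carpentry/Handyman Svcs", "Assignment 3"),
    ("Doors", "Assignment 1"),
    ("Doors", "Assignment 2"),
    ("Doors", "Assignment 3"),
    ("Doors", "Assignment 4"),
    ("Doors", "Assignment 5"),
])

# Number of columns each (non-empty) service label covers.
spans = {svc: n for svc, n in Counter(expected_columns.get_level_values(0)).items() if svc}
print(spans)  # {'Appliances': 1, 'Carpentry/Handyman Svcs': 3, 'Doors': 5}
```

With `to_excel`'s default `merge_cells=True`, consecutive identical level-0 labels are written as one merged header cell, which is what produces the spanning effect.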

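The actual output is not what I expected. I suspect that either I'm pivoting incorrectly or pd.pivot isn't the right tool. The fact that I have to sort and rearrange the MultiIndex by hand after pivoting is a strong hint.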
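On whether pd.pivot is the right tool: one alternative, sketched here assuming pandas 1.1 or later (which accepts a list for `columns=` in `pivot`), is to `melt` the Assignment columns into one row per assignment level first. Pivoting on both ServiceSpecialty and Assignment then yields (service, assignment) columns directly, so no `swaplevel` is needed. The small DataFrame below is a hypothetical subset of `sample_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of sample_data (the "required/optional" row already removed).
df = pd.DataFrame({
    "Location": ["123 Main Street", "123 Main Street", "456 Broadway Ave", "456 Broadway Ave"],
    "ServiceSpecialty": ["Appliances", "Doors", "Appliances", "Doors"],
    "Assignment 1": ["John Smith", "Abugida", "John Smith", "Abugida"],
    "Assignment 2": [np.nan, "ACME Industries", np.nan, "ACME Industries"],
})

# 1. One row per (location, service, assignment level); empty levels dropped.
long_df = df.melt(
    id_vars=["Location", "ServiceSpecialty"],
    var_name="Assignment",
    value_name="Supplier",
).dropna(subset=["Supplier"])

# 2. Pivot on both columns at once, so the column MultiIndex is already
#    (service, assignment) and only needs a sort, not a swaplevel.
wide = (
    long_df.pivot(index="Location", columns=["ServiceSpecialty", "Assignment"], values="Supplier")
    .sort_index(axis=1)
    .reset_index(col_level=1)  # Location goes under an empty top-level label
    .rename_axis(index=None, columns=(None, None))
)
print(wide)
```

Because the NaN rows are dropped before pivoting, only service/assignment combinations that actually occur become columns in this sketch, which takes the place of the `dropna(axis=1, how='all')` step.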

How can I adjust the code to get the expected output?

Sample code

import pandas as pd
import numpy as np


sample_data = [
 {'Location': 'required',
  'ServiceSpecialty': 'required',
  'Assignment 1': 'optional',
  'Assignment 2': 'optional',
  'Assignment 3': 'optional',
  'Assignment 4': 'optional',
  'Assignment 5': 'optional'},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'}
]

df = pd.DataFrame.from_dict(sample_data)

# Remove the unneeded secondary header (required/optional), and pivot
df = df.drop(index=df.index[0], axis=0)
df = df.pivot(index='Location', columns='ServiceSpecialty').reset_index()

# Remove any fully blank columns (e.g. "Appliances" only has 1 assignment, but "Carpentry/Handyman Svcs" has 3).
# We only want to retain the populated "Assignment" columns
df = df.dropna(axis=1, how='all')

# Clean up the MultiIndex header (remove the names, and put the ServiceSpecialty above the Assignment
# columns. This will cause the ServiceSpecialty to span all the Assignment columns when exported to Excel)
df.index.name = None
df.columns.names = (None, None)
df.columns = df.columns.swaplevel(0,1)

# Sort the MultiIndex columns by the ServiceSpecialty, not the Assignment
cols_location = [('', 'Location')]
cols_assignments = [c for c in df.columns if c[0] != '']
cols_assignments.sort(key=lambda c: c[0])
cols_updated = cols_location + cols_assignments
df.columns = pd.MultiIndex.from_tuples(cols_updated)

# Write the output to file
with pd.ExcelWriter('current_output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer)

Sample input

[image: sample input table]

Current output

[image: current output]

Expected output

[image: expected output]

Answer

IIUC, you can just use .swaplevel():

df = df.drop(index=df.index[0], axis=0)
df = (
    df.pivot(index="Location", columns="ServiceSpecialty")
    .reset_index()
    .swaplevel(axis=1)
    .sort_index(axis=1)
    .dropna(axis=1, how="all")
    .rename_axis(index=None, columns=(None, None))
)
print(df)

Prints:

                      Appliances Carpentry/Handyman Svcs                                     Doors                                                           
           Location Assignment 1            Assignment 1    Assignment 2 Assignment 3 Assignment 1     Assignment 2    Assignment 3 Assignment 4 Assignment 5
0   123 Main Street   John Smith         ACME Supplier A  Mom & Pop Shop    Amy Smith      Abugida  ACME Industries  Mom & Pop Shop    Amy Smith   John Smith
1  456 Broadway Ave   John Smith         ACME Supplier A  Mom & Pop Shop    Amy Smith      Abugida  ACME Industries  Mom & Pop Shop    Amy Smith   John Smith
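The plain `sort_index(axis=1)` above sorts assignment labels as strings, which works for the sample because it stops at Assignment 5. With ten or more levels, "Assignment 10" would sort before "Assignment 2". A minimal sketch of a numeric sort, assuming pandas 1.1 or later (for the `key` argument of `sort_index`) and labels of the form "Assignment N"; `assignment_number` is a hypothetical helper and the data is made up.

```python
import pandas as pd


def assignment_number(level: pd.Index) -> pd.Index:
    """Sort key applied to each column level: "Assignment 10" -> 10.

    Labels that don't look like "Assignment N" (the service names) are
    returned unchanged, so that level still sorts alphabetically.
    """
    return level.map(
        lambda s: int(s.rsplit(" ", 1)[1]) if str(s).startswith("Assignment ") else s
    )


# Hypothetical data: "Doors" has 11 escalation levels, "Appliances" has 1.
doors = {f"Assignment {n}": f"Supplier {n}" for n in range(1, 12)}
df = pd.DataFrame([
    {"Location": "123 Main Street", "ServiceSpecialty": "Appliances", "Assignment 1": "John Smith"},
    {"Location": "123 Main Street", "ServiceSpecialty": "Doors", **doors},
])

wide = (
    df.pivot(index="Location", columns="ServiceSpecialty")
    .swaplevel(axis=1)
    .dropna(axis=1, how="all")
    .sort_index(axis=1, key=assignment_number)  # numeric order within each service
    .reset_index(col_level=1)  # Location goes under an empty top-level label
    .rename_axis(index=None, columns=(None, None))
)
print(list(wide.columns))
```

The sort runs before `reset_index` so that the assignment level holds only "Assignment N" labels; if "Location" were already in that level, the key would mix integers with a string and the sort would likely fail.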
