我有一个长格式的表,其中显示了哪些供应商被分配到一组地点的哪项服务.对于每个地点和服务,都有任意数量的分配级别(例如,如果第一个供应商不能提供服务,工作将升级到第二个供应商).
我正在try 将该表转换为宽格式,我知道我肯定做错了什么.在预期的宽格式表中,在位置之后,应该有一个双标题,其中0级是服务名称,1级是每个分配编号.因此,当导出到Excel时,每个服务名称应该跨越任意数量的赋值列.
实际输出并不是预期的结果.我想我要么是旋转错了,要么是pd.pivot
不是合适的工具.我需要在透视后手动对多重索引进行排序和调整,这一事实很能说明问题.
如何调整代码以获得预期的输出?
示例代码
import pandas as pd
import numpy as np
sample_data = [
{'Location': 'required',
'ServiceSpecialty': 'required',
'Assignment 1': 'optional',
'Assignment 2': 'optional',
'Assignment 3': 'optional',
'Assignment 4': 'optional',
'Assignment 5': 'optional'},
{'Location': '123 Main Street',
'ServiceSpecialty': 'Appliances',
'Assignment 1': 'John Smith',
'Assignment 2': np.nan,
'Assignment 3': np.nan,
'Assignment 4': np.nan,
'Assignment 5': np.nan},
{'Location': '123 Main Street',
'ServiceSpecialty': 'Carpentry/Handyman Svcs',
'Assignment 1': 'ACME Supplier A',
'Assignment 2': 'Mom & Pop Shop',
'Assignment 3': 'Amy Smith',
'Assignment 4': np.nan,
'Assignment 5': np.nan},
{'Location': '123 Main Street',
'ServiceSpecialty': 'Doors',
'Assignment 1': 'Abugida',
'Assignment 2': 'ACME Industries',
'Assignment 3': 'Mom & Pop Shop',
'Assignment 4': 'Amy Smith',
'Assignment 5': 'John Smith'},
{'Location': '456 Broadway Ave',
'ServiceSpecialty': 'Appliances',
'Assignment 1': 'John Smith',
'Assignment 2': np.nan,
'Assignment 3': np.nan,
'Assignment 4': np.nan,
'Assignment 5': np.nan},
{'Location': '456 Broadway Ave',
'ServiceSpecialty': 'Carpentry/Handyman Svcs',
'Assignment 1': 'ACME Supplier A',
'Assignment 2': 'Mom & Pop Shop',
'Assignment 3': 'Amy Smith',
'Assignment 4': np.nan,
'Assignment 5': np.nan},
{'Location': '456 Broadway Ave',
'ServiceSpecialty': 'Doors',
'Assignment 1': 'Abugida',
'Assignment 2': 'ACME Industries',
'Assignment 3': 'Mom & Pop Shop',
'Assignment 4': 'Amy Smith',
'Assignment 5': 'John Smith'}
]
df = pd.DataFrame.from_dict(sample_data)
# Remove the unneeded secondary header (required/optional), and pivot
df = df.drop(index=df.index[0], axis=0)
df = df.pivot(index='Location', columns='ServiceSpecialty').reset_index()
# Remove any fully blank columns (eg "ATM" only has 1 assignment, but "Carpentry/Handyman Svcs" has 3).
# We only want to retain the populated "Assignment" columns
df = df.dropna(axis=1, how='all')
# Clean up the MultiIndex header (remove the names, and put the ServiceSpecialty above the Assignment
# columns. This will cause the ServiceSpecialty to span all the Assignment columns when exported to Excel)
df.index.name = None
df.columns.names = (None, None)
df.columns = df.columns.swaplevel(0,1)
# Sort the MultiIndex columns by the ServiceSpecialty, not the Assignment
cols_location = [('', 'Location')]
cols_assignments = [c for c in df.columns if c[0] != '']
cols_assignments.sort(key=lambda c: c[0])
cols_updated = cols_location + cols_assignments
df.columns = pd.MultiIndex.from_tuples(cols_updated)
# Write the output to file
with pd.ExcelWriter('current_output.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer)