Restructuring Excel table data in Python

Posted on November 24

I have a huge Excel table and I'm working in Python. An example:

Date	id_m00	mprice	id_m01	mprice
01.01.2023	aa-bb-cc	12,05	dd-ee-fr	8,80
02.01.2023	aa-dd-ee	09,55	ff-gg-gg	7,50

This column pattern repeats 46 times, e.g. id_m02 and mprice; id_q1 and mprice...

What I want:

Date	id	mprice
01.01.2023	aa-bb-cc	12,05
02.01.2023	aa-dd-ee	09,55
01.01.2023	dd-ee-fr	8,80
02.01.2023	ff-gg-gg	7,50

Do you know how to do this in Python? I first tried the melt function, but it didn't work well: it ended up with some extra columns and lots of null values.
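For reference, the target layout amounts to stacking each (id, price) column pair underneath Date. A minimal sketch of that idea with plain pd.concat (not the answer's method), assuming Date is the first column and the remaining columns strictly alternate id, price; the sample DataFrame and the helper name stack_pairs are hypothetical, typed in from the example above:

```python
import pandas as pd

# Hypothetical sample mirroring the question; the duplicate "mprice" header is kept on purpose.
df = pd.DataFrame(
    [["01.01.2023", "aa-bb-cc", 12.05, "dd-ee-fr", 8.80],
     ["02.01.2023", "aa-dd-ee", 9.55, "ff-gg-gg", 7.50]],
    columns=["Date", "id_m00", "mprice", "id_m01", "mprice"],
)


def stack_pairs(frame: pd.DataFrame) -> pd.DataFrame:
    """Stack (id, price) column pairs under the Date column.

    Assumes column 0 is Date and the remaining columns strictly alternate id, price.
    """
    parts = []
    for i in range(1, frame.shape[1], 2):
        # iloc selects by position, so duplicate header names do not matter here.
        part = frame.iloc[:, [0, i, i + 1]].copy()
        part.columns = ["Date", "id", "mprice"]
        parts.append(part)
    # ignore_index gives a fresh 0..n-1 index instead of repeating 0, 1, 0, 1.
    return pd.concat(parts, ignore_index=True)


out = stack_pairs(df)
print(out)
```

Because each pair is selected by position, this avoids the extra columns and nulls that melt produces when it treats every column independently.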

Recommended answer

A possible solution with lreshape:

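import pandas as pd

# df holds the sheet with duplicate "mprice" headers; pop() removes them all
# and returns them as one DataFrame, whose columns are then renamed 0, 1, 2, ...
prices = (df.pop("mprice").pipe(lambda x:
      x.set_axis(range(len(x.columns)), axis=1)))

# Pair each id column with the price column at the same position;
# the remaining column (Date) is repeated for every pair.
out = (
    pd.lreshape(
        pd.concat([df, prices], axis=1),
        {"id": df.filter(like="id_m").columns, "mprice": prices.columns})
)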
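A self-contained run of the approach above, assuming the sheet has already been loaded with its duplicate mprice headers intact; the in-memory DataFrame below is a hypothetical stand-in for the spreadsheet:

```python
import pandas as pd

# Hypothetical in-memory stand-in for the sheet, keeping the duplicate "mprice" headers.
df = pd.DataFrame(
    [["01.01.2023", "aa-bb-cc", 12.05, "dd-ee-fr", 8.80],
     ["02.01.2023", "aa-dd-ee", 9.55, "ff-gg-gg", 7.50]],
    columns=["Date", "id_m00", "mprice", "id_m01", "mprice"],
)

# With duplicate labels, pop("mprice") returns a DataFrame holding every price column.
prices = df.pop("mprice")
prices = prices.set_axis(range(len(prices.columns)), axis=1)  # rename to 0, 1, ...

# lreshape stacks id_m00 with price 0, then id_m01 with price 1; Date is repeated.
out = pd.lreshape(
    pd.concat([df, prices], axis=1),
    {"id": list(df.filter(like="id_m").columns), "mprice": list(prices.columns)},
)
print(out)
```

Note that lreshape requires every column list to have the same length, so the id filter has to match exactly as many columns as there are price columns.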

NB: The code above can be simplified if the example you shared corresponds to the actual table in the spreadsheet. If so, when building the initial DataFrame, pandas de-duplicates the repeated mprice headers and names them mprice, mprice.1, mprice.2, and so on:

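import pandas as pd

# read_excel renames the duplicate headers to mprice, mprice.1, ... so both
# groups can be selected by substring; the walrus operator (:=) needs Python 3.8+.
out = (
    pd.lreshape((df:=pd.read_excel("file.xlsx")), # << feel free to adjust
        {"id": df.filter(like="id_m").columns,
         "mprice": df.filter(like="price").columns})
)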


print(out)

         Date        id  mprice
0  01.01.2023  aa-bb-cc   12.05
1  02.01.2023  aa-dd-ee    9.55
2  01.01.2023  dd-ee-fr    8.80
3  02.01.2023  ff-gg-gg    7.50
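To see the de-duplication from the NB without an Excel file, here is a sketch that swaps in read_csv on a hypothetical tab-separated copy of the sample (read_csv mangles duplicate headers the same way). decimal="," is an assumption matching the comma decimals shown in the question:

```python
import io

import pandas as pd

# Hypothetical tab-separated copy of the sample sheet (a stand-in for file.xlsx).
raw = (
    "Date\tid_m00\tmprice\tid_m01\tmprice\n"
    "01.01.2023\taa-bb-cc\t12,05\tdd-ee-fr\t8,80\n"
    "02.01.2023\taa-dd-ee\t09,55\tff-gg-gg\t7,50\n"
)
# decimal="," turns "12,05" into 12.05; the duplicate header becomes "mprice.1".
df = pd.read_csv(io.StringIO(raw), sep="\t", decimal=",")
print(list(df.columns))  # ['Date', 'id_m00', 'mprice', 'id_m01', 'mprice.1']

# Both groups can now be picked by substring, as in the simplified answer.
out = pd.lreshape(
    df,
    {"id": list(df.filter(like="id_m").columns),
     "mprice": list(df.filter(like="price").columns)},
)
print(out)
```

The substring filters assume every id header contains "id_m" and every price header contains "price"; headers such as id_q1 would need a broader pattern.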