This is a follow-up to an earlier question I had asked a little over a year ago.

I have a "master" dataframe containing the product codes and names of many materials, along with their monthly consumption, similar to:

product code  Name  Consumption A  Consumption B  Consumption C  Consumption D
123           AA    100            120            130            140
456           BB    5              7              9              11
789           CC    12             5              33             89
134           AD    4              17             37             57
467           BD    1              3              5              7
179           ED    6              19             30             61
426           FD    8              5              2              13

I also have another table listing product codes that are "alternatives" or substitute products, for example:

Product Code  Alt Code
123           134
123           179
123           426
456           467

(The difference from before is that the same "Product Code" can now have several different "Alt Code"s.)

How can I use the second dataframe to process the first one so that it becomes:

product code           Name               Consumption A  Consumption B  Consumption C  Consumption D
123 / 134 / 179 / 426  AA / AD / ED / FD  118            161            199            271
456 / 467              BB / BD            6              10             14             18
789                    CC                 12             5              33             89

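where the product codes and names have been concatenated into a single cell, the quantities have been summed, and the "duplicate" rows for the alternatives have been removed?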

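Unlike last time, some "main" codes now have multiple alternative codes, but they are always listed with the main code in the first column and the alternative code in the "Alt Code" column. Ideally, I would like to use the same code to merge all the alternative rows into a single row.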

This is the code I have been trying to use, based on the answer I got last time:

    if alt_name != "":
        altf = pd.read_excel(io=alt_name)
        group = df['Material'].map(lambda x: altf.set_index('Material')['Alt Material'].get(x, x))
        d = {c: 'sum' for c in df.columns}
        out = (df
               .astype({'Material': str})
               .groupby([group], as_index=False)
               .agg({**d, **{'Material': ' / '.join, 'Description': ' / '.join}})
               )
        df = out

alt_name is the name of the Excel sheet containing the columns, which are saved as "Material" and "Alt Material".

When I try to run it with a table that has multiple alternatives, I get the following error:

Traceback (most recent call last):
  File "/Users/[Location]/function_code_v2.py", line 285, in <module>
    trendcharts(file_name, zone, combinezones, month, numMonths, percent_wanted, trendgen, alt_name, save_directory)
  File "/Users/[Location]/function_code_v2.py", line 106, in trendcharts
    .agg({**d, **{'Material': ' / '.join, 'Description': ' / '.join}})
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 1445, in aggregate
    result = op.agg()
             ^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 175, in agg
    return self.agg_dict_like()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 406, in agg_dict_like
    return self.agg_or_apply_dict_like(op_name="agg")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 1388, in agg_or_apply_dict_like
    result_index, result_data = self.compute_dict_like(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/apply.py", line 480, in compute_dict_like
    getattr(obj._gotitem(key, ndim=1), op_name)(how, **kwargs)
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 275, in aggregate
    if self.ngroups == 0:
       ^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 825, in ngroups
    return self.grouper.ngroups
           ^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 758, in ngroups
    return len(self.result_index)
               ^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 769, in result_index
    return self.groupings[0].result_index.rename(self.names[0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 718, in result_index
    return self.group_index
           ^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 722, in group_index
    codes, uniques = self._codes_and_uniques
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 801, in _codes_and_uniques
    codes, uniques = algorithms.factorize(  # type: ignore[assignment]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 795, in factorize
    codes, uniques = factorize_array(
                     ^^^^^^^^^^^^^^^^
  File "/Users/[Location]/.venv/lib/python3.12/site-packages/pandas/core/algorithms.py", line 595, in factorize_array
    uniques, codes = table.factorize(
                     ^^^^^^^^^^^^^^^^
  File "pandas/_libs/hashtable_class_helper.pxi", line 7280, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 7194, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'Series'

Recommended answer

You can use the same logic, aggregating by a common id:

d = {c: 'sum' for c in df.columns[1:]}
d['Name'] = lambda x: '/'.join(map(str, x))

out = (df
   .groupby(df['product code'].replace(altdf.set_index('Alt Code')['Product Code']))
   .agg(d).reset_index()
)
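A note on why the direction of the lookup matters. In the question's code the alternatives table is indexed by the main code, which is now duplicated, so looking up a main code most likely returns a whole Series instead of a single value, and groupby cannot hash it. Indexing by the alt code, as above, keeps the index unique. The following is a minimal sketch using the sample data from the question; the variable names main_to_alt, alt_to_main and codes are illustrative and not from the original code.

```python
import pandas as pd

altdf = pd.DataFrame({
    "Product Code": [123, 123, 123, 456],
    "Alt Code": [134, 179, 426, 467],
})

# Main -> alt lookup: the index (Product Code) contains 123 three times,
# so .get(123, ...) returns a Series of all matches, not a scalar.
main_to_alt = altdf.set_index("Product Code")["Alt Code"]
lookup = main_to_alt.get(123, 123)
print(type(lookup).__name__)

# Alt -> main lookup: every alt code appears once, so each code maps to
# exactly one main code and codes without an entry are left unchanged.
alt_to_main = altdf.set_index("Alt Code")["Product Code"].to_dict()
codes = pd.Series([123, 456, 789, 134, 467, 179, 426])
group = codes.replace(alt_to_main)
print(group.tolist())
```

The resulting group values are plain scalars (the main code for every row), which is what groupby needs as a key.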

Then fix up the product code:

mapper = altdf.groupby('Product Code')['Alt Code'].agg(set)
out['product code'] = out['product code'].map(lambda x: '/'.join(map(str, sorted(mapper.get(x, set())|{x}))))

Output:

     product code         Name  Consumption A  Consumption B  Consumption C  Consumption D
0  123/134/179/426  AA/AD/ED/FD            118            161            199            271
1          456/467        BB/BD              6             10             14             18
2              789           CC             12              5             33             89
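For reference, the two steps above can be combined into one self-contained script. This is a sketch under a few assumptions: the dataframes are rebuilt by hand from the sample tables in the question, the consumption columns are selected by their "Consumption" prefix, and " / " is used as the separator to match the layout requested in the question rather than the "/" used in the answer. The names alt_to_main, group and alts are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product code": [123, 456, 789, 134, 467, 179, 426],
    "Name": ["AA", "BB", "CC", "AD", "BD", "ED", "FD"],
    "Consumption A": [100, 5, 12, 4, 1, 6, 8],
    "Consumption B": [120, 7, 5, 17, 3, 19, 5],
    "Consumption C": [130, 9, 33, 37, 5, 30, 2],
    "Consumption D": [140, 11, 89, 57, 7, 61, 13],
})
altdf = pd.DataFrame({
    "Product Code": [123, 123, 123, 456],
    "Alt Code": [134, 179, 426, 467],
})

# Step 1: replace every alt code by its main code to get one group key per row.
alt_to_main = altdf.set_index("Alt Code")["Product Code"].to_dict()
group = df["product code"].replace(alt_to_main)

# Sum the consumption columns and join the names within each group.
agg = {c: "sum" for c in df.columns if c.startswith("Consumption")}
agg["Name"] = " / ".join
out = df.groupby(group).agg(agg).reset_index()

# Step 2: rebuild the product code label from the main code plus its alt codes.
alts = altdf.groupby("Product Code")["Alt Code"].agg(set)
out["product code"] = out["product code"].map(
    lambda code: " / ".join(map(str, sorted(alts.get(code, set()) | {code})))
)

# Restore the original column order.
out = out[df.columns.tolist()]
print(out)
```

With the sample data this should give the three rows requested in the question. Note that the codes in each label are sorted numerically, while the names follow the row order of df, so the two only line up when the main product appears before its alternatives and the codes happen to sort the same way.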
