如何将分组后的数据框的所有产品名称聚合到一个新列中,作为列表或集合:
import pandas as pd # 2.0.3
df = pd.DataFrame(
{
"customer_id": [1, 2, 3, 2, 1],
"order_id": [1, 2, 3, 4, 1],
"products": ["foo", "bar", "baz", "foo", "bar"],
"amount": [1, 1, 1, 1, 1]
}
)
print(df)
grouped = df.groupby(["customer_id", "order_id"])
df["product_order_count"] = grouped["amount"].transform("sum")
df["all_products"] = grouped["products"].agg(list).reset_index()
print(df)
尽管我回答了另一个问题(Pandas groupby: How to get a union of strings),但抛出了一个异常:
Traceback (most recent call last):
File "C:\temp\tt.py", line 15, in <module>
df["all_orders"] = grouped["products"].agg(list).reset_index()
File "c:\Users\foo\.venvs\kapa_monitor-38\lib\site-packages\pandas\core\frame.py", line 3940, in __setitem__
self._set_item_frame_value(key, value)
File "c:\Users\foo\.venvs\kapa_monitor-38\lib\site-packages\pandas\core\frame.py", line 4094, in _set_item_frame_value
raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column all_products
预期输出(all_products
,作为list
或set
):
customer_id order_id products amount product_order_count all_products
0 1 1 foo 1 2 'foo', 'bar'
1 2 2 bar 1 1 'bar'
2 3 3 baz 1 1 'baz'
3 2 4 foo 1 1 'foo'
4 1 1 bar 1 2 'foo', 'bar'