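Python Pandas: total sum for each unique word

Posted on August 14

Updated: the input data is now a DataFrame rather than a dict.

I am analyzing a DataFrame with about 10,000 rows and 2 columns.

My analysis criteria are based on whether certain words appear in a given cell.

I believe I would be more successful if I knew which words are most associated with the values...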

Foo data to be used as an example:

import pandas as pd

data = {'product': ['Dell Notebook I7', 'Dell Notebook I3', 'Logitech mx keys', 'Logitech mx 2'],
        'cost': [1000, 1200, 300, 100]}

df_data = pd.DataFrame(data)

	product	cost
0	Dell Notebook I7	1000
1	Dell Notebook I3	1200
2	Logitech mx keys	300
3	Logitech mx 2	100

Basically, the product column holds the product description, and the cost column holds the product's cost.

What I want:

I want to create another DataFrame, like this:

Desired Output:

	unique_words	total_cost_for_unique_word
1	Dell	2200
5	Notebook	2200
2	I3	1200
3	I7	1000
4	Logitech	400
7	mx	400
6	keys	300
0	2	100

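The unique_words column lists every word that appears in the product column.
The total_cost_for_unique_word column holds the sum of the cost values of the products that contain that word.

I searched the posts here on StackOverflow and also searched Google, but have not found a solution. Perhaps I still lack the knowledge needed to find the answer.

If this has already been answered, please let me know and I will delete this post.

Thanks, everyone.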

Recommended answer:

df_data = pd.DataFrame(data)

new_df = (
    df_data
    # split each description into a list of words
    .assign(unique_words=df_data['product'].str.split())
    # one row per (word, cost) pair
    .explode('unique_words')
    .groupby('unique_words', as_index=False)
    .agg(**{'total cost': ('cost', 'sum')})
    .sort_values('total cost', ascending=False, ignore_index=True)
)

	unique_words	total cost
0	Dell	2200
1	Notebook	2200
2	I3	1200
3	I7	1000
4	Logitech	400
5	mx	400
6	keys	300
7	2	100
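
One thing to watch with the split/explode approach: if a word occurs twice in the same description, that product's cost is added twice for that word. The question asks for the sum over products that contain the word, so the sketch below drops repeated words within each row before exploding. This is only a possible variant, not part of the original answer. The helper name cost_per_word and its parameters are made up for illustration, and it assumes the text column has no missing values.

```python
import pandas as pd


def cost_per_word(df, text_col="product", value_col="cost"):
    """Sum value_col over the rows whose text_col contains each word.

    Hypothetical helper for illustration; assumes text_col has no NaN values.
    """
    words = (
        df[text_col]
        .str.split()  # split on whitespace
        # drop words repeated within one row, keeping first-seen order
        .apply(lambda ws: list(dict.fromkeys(ws)))
    )
    return (
        df.assign(unique_words=words)
        .explode("unique_words")  # one row per (word, value) pair
        .groupby("unique_words", as_index=False)[value_col]
        .sum()
        .rename(columns={value_col: "total_cost_for_unique_word"})
        .sort_values(
            "total_cost_for_unique_word",
            ascending=False,
            ignore_index=True,
        )
    )


data = {
    "product": ["Dell Notebook I7", "Dell Notebook I3",
                "Logitech mx keys", "Logitech mx 2"],
    "cost": [1000, 1200, 300, 100],
}
df_data = pd.DataFrame(data)

result = cost_per_word(df_data)
print(result)
```

On the sample data this gives the same totals as the answer above, because no description repeats a word. The split is case-sensitive, so "Dell" and "dell" would be counted separately unless the column is lowercased first.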
