I have 2 dataframes:

DF1

ID Name Category
1 Apple Fruit
2 Orange Fruit
3 brocolli Vegetable
4 Spinach Vegetable

DF2

UserID Date UserName Description
111 01/01/2020 AAA Ordered 1 Box Apples
111 01/02/2021 AAA Ordered 1KG spinach
222 15/03/2021 BBB Ordered 3 boxes of Orange

Can anyone help me match the "Description" in DF2 that contains a "Name" string from DF1, and add the respective "Category" column to DF2?

Expected output:

UserID Date UserName Description Category
111 01/01/2020 AAA Ordered 1 Box Apples Fruit
111 01/02/2021 AAA Ordered 1KG spinach Vegetable
222 15/03/2021 BBB Ordered 3 boxes of Orange Fruit

Recommended answer

Edit - Second solution below as per OP comments

First: this code uses merge to accomplish the same task

import pandas as pd

# Input Data
df1 = pd.DataFrame({'Name':['Apple','Orange','Brocolli','Spinach'], 'Category':['Fruit', 'Fruit','Vegetable','Vegetable']})
df2 = pd.DataFrame({'Date':['01/01/2020','02/02/2021','03/03/2022'], 'Description':['Ordered 1 Box Apple', 'Ordered 1 KG spinach','Ordered 3 Box Orange']})

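# Data Processing
# The join key is the fourth word of each description (index 3 after splitting on spaces),
# so this only works when the product name sits in that position, as it does in the sample data.
pd.merge(df2, df1, left_on = df2['Description'].str.lower().str.split(' ', expand=True)[3], right_on = df1['Name'].str.lower(), how='left' ).drop('key_0', axis=1)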

Second Solution

Updated code as per the OP's comments below

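# Build a Name -> Category lookup from df1
fruit_cat_mapping = dict(zip(df1['Name'], df1['Category']))

def mapper_func(x):
    # Return the category of the first name found in the description, or None if nothing matches
    for key, category in fruit_cat_mapping.items():
        if key.lower() in x:
            return category
    return None

# Search the lower-cased descriptions and store the result as a new column
df2['Category'] = df2['Description'].str.lower().apply(mapper_func)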
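
For larger frames, the same substring lookup could probably be done without a Python-level loop per row. The following is only a sketch, not part of the original answer: it assumes the names in DF1 contain no overlapping substrings, and the names name_to_category, pattern and matched_name, as well as the extra 'Ordered 2 pens' row, are made up for illustration.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Apple', 'Orange', 'Brocolli', 'Spinach'],
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
})
df2 = pd.DataFrame({
    'Description': ['Ordered 1 Box Apples', 'Ordered 1KG spinach',
                    'Ordered 3 boxes of Orange', 'Ordered 2 pens'],
})

# Lower-cased name -> category lookup (hypothetical helper name)
name_to_category = dict(zip(df1['Name'].str.lower(), df1['Category']))

# One alternation pattern covering every name: (apple|orange|brocolli|spinach)
pattern = '(' + '|'.join(re.escape(name) for name in name_to_category) + ')'

# Pull out the first matching name from each lower-cased description;
# rows with no match become NaN
matched_name = df2['Description'].str.lower().str.extract(pattern, expand=False)

# Translate the matched name into its category; NaN stays NaN
df2['Category'] = matched_name.map(name_to_category)
print(df2)
```

Descriptions that mention none of the names end up with a missing Category, which makes unmatched rows easy to spot with df2['Category'].isna().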
