Python: look up Col1 items in Col2 and annotate the match percentage

Posted on October 29

My DataFrame:

import pandas as pd

data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen',
                 'Großkrotzenburg', 'Jesewitz/Weg', 'Kirchen (Sieg)', 'Laudenbach a. M.',
                 'Nachrodt-Wiblingwerde', 'Rehburg-Loccum', 'Dingen', 'Burg (Dithmarschen)'],
        'Col2': ['Rehburg-Loccum', 'Grabfeld', 'Laudenbach', 'Kirchen',
                 'Jesewitz', 'Großkrotzenburg', 'Nachrodt-', 'Essen/Stadt',
                 'Bischofferode', 'Bad Homburg', 'Münster', 'Burg']}

df = pd.DataFrame(data)

My df has two columns, as shown below:

col1	col2
Bad Homburg	Rehburg-Loccum
Bischofferode	Grabfeld
Essen	Laudenbach
Grabfeld OT Rentwertshausen	Kirchen
Großkrotzenburg	Jesewitz
Jesewitz/Weg	Großkrotzenburg
Kirchen (Sieg)	Nachrodt-
Laudenbach a. M.	Essen/Stadt
Nachrodt-Wiblingwerde	Bischofferode
Rehburg-Loccum	Bad Homburg
Dingen	Münster
Burg (Dithmarschen)	Burg

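I want to look up the Col1 values in Col2. If an item exists, I want to write it in the same row under the column Lookup_value, and I also want to add a comment describing the match percentage. Here is my expected result: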

col1	col2	Lookup_value	Comment
Bad Homburg	Rehburg-Loccum	Bad Homburg	100% Matched
Bischofferode	Grabfeld	Bischofferode	100% Matched
Essen	Laudenbach a. M.	Essen/Stadt	Best Possible Match
Grabfeld OT Rentwertshausen	Kirchen	Grabfeld	Best Possible Match
Großkrotzenburg	Jesewitz	Großkrotzenburg	100% Matched
Jesewitz/Weg	Großkrotzenburg	Jesewitz	Best Possible Match
Kirchen (Sieg)	Nachrodt-	Kirchen	Best Possible Match
Laudenbach	Essen/Stadt	Laudenbach a. M.	Best Possible Match
Nachrodt-Wiblingwerde	Bischofferode	Nachrodt-	Best Possible Match
Rehburg-Loccum	Bad Homburg	Rehburg-Loccum	100% Matched
Dingen	Münster		No Match
Burg (Dithmarschen)	Burg	Burg	Best Possible Match

I am trying it this way, but it doesn't work:

def lookup_value_and_comment(row):
    col1_value = row['Col1']
    col2_value = row['Col2']
    
    if col1_value in col2_value:
        if col1_value == col2_value:
            return pd.Series([col1_value, '100% Matched'], index=['Lookup_value', 'Comment'])
        else:
            return pd.Series([col2_value, 'Best Possible Match'], index=['Lookup_value', 'Comment'])
    else:
        return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])

df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)

print(df)
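
A likely reason the attempt above reports "No Match" for most rows: `col1_value in col2_value` is a substring test against the Col2 cell of the same row only, so a value that sits in a different row of Col2 is never seen. The sketch below contrasts the two checks on a small sample of values taken from the data above; the names `same_row` and `any_row` are made up for illustration.

```python
import pandas as pd

# A small sample of values from the question's data, arranged so that
# no Col1 value shares a row with its counterpart in Col2.
df = pd.DataFrame({
    'Col1': ['Bad Homburg', 'Essen', 'Dingen'],
    'Col2': ['Essen/Stadt', 'Bad Homburg', 'Münster'],
})

# Row-wise test (what the attempt does): is Col1 a substring of Col2
# in the SAME row?
same_row = df.apply(lambda row: row['Col1'] in row['Col2'], axis=1)

# Column-wide test: does the exact Col1 value appear ANYWHERE in Col2?
any_row = df['Col1'].isin(df['Col2'])

print(same_row.tolist())
print(any_row.tolist())
```

Note that the column-wide `isin` check only finds exact matches, so partial matches such as 'Essen' against 'Essen/Stadt' still need a separate rule.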

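Recommended answer

import pandas as pd

# Assumes the columns are named 'col1' and 'col2' (lowercase), as in the table above.
# Key for partial matching: the first token before a '-', '/' or space.
df['col3'] = df['col1'].str.split('-|/| ').str[0]
df['col4'] = df['col2'].str.split('-|/| ').str[0]

def lookup_value_and_comment(row):
    col1_value = row['col1']
    col3_value = row['col3']
    # Rows of col2 whose first token equals the first token of this col1 value
    ind = df['col4'].isin([col3_value])
    if (df['col2'].isin([col1_value])).any():
        # The full col1 value appears somewhere in col2
        return pd.Series([col1_value, '100% Matched'], index=['Lookup_value', 'Comment'])
    elif ind.any():
        # Only the first token matches: take the first such col2 value
        return pd.Series([df.loc[ind, 'col2'].values[0], 'Best Possible Match'], index=['Lookup_value', 'Comment'])
    else:
        return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])

df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)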

Output:

	col1	col2	col3	col4	Lookup_value	Comment
0	Bad Homburg	Rehburg-Loccum	[Bad, Homburg]	[Rehburg, Loccum]	Bad Homburg	100% Matched
1	Bischofferode	Grabfeld	[Bischofferode]	[Grabfeld]	Bischofferode	100% Matched
2	Essen	Laudenbach a. M.	[Essen]	[Laudenbach, a., M.]	Essen/Stadt	Best Possible Match
3	Grabfeld OT Rentwertshausen	Kirchen	[Grabfeld, OT, Rentwertshausen]	[Kirchen]	Grabfeld	Best Possible Match
4	Großkrotzenburg	Jesewitz	[Großkrotzenburg]	[Jesewitz]	Großkrotzenburg	100% Matched
5	Jesewitz/Weg	Großkrotzenburg	[Jesewitz, Weg]	[Großkrotzenburg]	Jesewitz	Best Possible Match
6	Kirchen (Sieg)	Nachrodt-	[Kirchen, (Sieg)]	[Nachrodt, ]	Kirchen	Best Possible Match
7	Laudenbach	Essen/Stadt	[Laudenbach]	[Essen, Stadt]	Laudenbach a. M.	Best Possible Match
8	Nachrodt-Wiblingwerde	Bischofferode	[Nachrodt, Wiblingwerde]	[Bischofferode]	Nachrodt-	Best Possible Match
9	Rehburg-Loccum	Bad Homburg	[Rehburg, Loccum]	[Bad, Homburg]	Rehburg-Loccum	100% Matched
10	Dingen	Münster	[Dingen]	[Münster]		No Match
11	Burg (Dithmarschen)	Burg	[Burg, (Dithmarschen)]	[Burg]	Burg	Best Possible Match
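
The labels above describe the kind of match but not an actual percentage. If a numeric score is wanted, one possible approach (not part of the answer above) is the standard library's `difflib.SequenceMatcher`, whose `ratio()` method returns a similarity between 0 and 1. The following is a minimal sketch under stated assumptions: it uses the `Col1`/`Col2` names from the question's dictionary on a small sample of its values, the helper `best_match` and the column `Match_pct` are hypothetical names, and the 0.6 cutoff is an arbitrary threshold that would need tuning on real data.

```python
import difflib

import pandas as pd

# Small sample of values from the question's data (assumed column names).
df = pd.DataFrame({
    'Col1': ['Bad Homburg', 'Jesewitz/Weg', 'Dingen'],
    'Col2': ['Jesewitz', 'Münster', 'Bad Homburg'],
})


def best_match(value, candidates, cutoff=0.6):
    """Return (candidate, percent) for the most similar candidate.

    Returns ('', 0.0) when no candidate reaches the cutoff.
    """
    best, best_ratio = '', 0.0
    for cand in candidates:
        # ratio() is 1.0 for identical strings and approaches 0.0
        # for strings with nothing in common.
        ratio = difflib.SequenceMatcher(None, value, cand).ratio()
        if ratio > best_ratio:
            best, best_ratio = cand, ratio
    if best_ratio < cutoff:
        return '', 0.0
    return best, round(best_ratio * 100, 1)


# Compare every Col1 value against the whole Col2 column.
candidates = df['Col2'].tolist()
matches = [best_match(value, candidates) for value in df['Col1']]
df['Lookup_value'] = [m[0] for m in matches]
df['Match_pct'] = [m[1] for m in matches]
print(df)
```

Unlike the first-token rule, this scores whole strings, so a long suffix such as "(Dithmarschen)" lowers the score and may push a plausible match below the cutoff.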
