Suppose I have two lists of dictionaries:

v = [{'call 1': 'debit card'},
 {'call 2': 'debit card'},
 {'call 3': 'payment limit'},
 {'call 1': 'bond'},
 {'call 2': 'mortgage'},
 {'call 3': 'debit card'},
 {'call 1': nan},
 {'call 2': 'spending limit'},
 {'call 3': nan}]

w = [{'cluster 1': 'payment limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'bond'},
 {'cluster 1': 'spending limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'mortgage'},
 {'cluster 1': None},
 {'cluster 2': 'debit card'},
 {'cluster 3': None}]

I want to drop the null values from both lists and merge the two lists on the values of the dictionaries, such that I get:

# desired outcome 
    [{'call 3':{'cluster 1': 'payment limit'}},
     {'call 1':{'cluster 2': 'debit card'}},
     {'call 1':{'cluster 3': 'bond'}},
     {'call 2':{'cluster 1': 'spending limit'}},
     {'call 2':{'cluster 2': 'debit card'}},
     {'call 3':{'cluster 2': 'debit card'}}]

The puzzling part for me is how to assign the calls to each cluster. As you can see, 'debit card' appears in call 1, call 2, and call 3, so in general I should be able to assign a distinct key to each cluster.

Recommended answer

A simple approach:

v = [{'call 1': 'debit card'},
 {'call 2': 'debit card'},
 {'call 3': 'payment limit'},
 {'call 1': 'bond'},
 {'call 2': 'mortgage'},
 {'call 3': 'debit card'},
 {'call 1': None},
 {'call 2': 'spending limit'},
 {'call 3': None}]

w = [{'cluster 1': 'payment limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'bond'},
 {'cluster 1': 'spending limit'},
 {'cluster 2': 'debit card'},
 {'cluster 3': 'mortgage'},
 {'cluster 1': None},
 {'cluster 2': 'debit card'},
 {'cluster 3': None}]

def join(d1, d2):
    # Step 1: drop every entry whose value is None
    updated_d1 = []
    for ls in d1:
        for k, v in ls.items():
            if v is None:
                continue
            else:
                updated_d1.append({k: v})
    updated_d2 = []
    for ls in d2:
        for k, v in ls.items():
            if v is None:
                continue
            else:
                updated_d2.append({k: v})
    # Step 2: invert each list into {value: [keys that had that value]}
    d1_dict = {}
    for ls in updated_d1:
        for k, v in ls.items():
            if v in d1_dict:
                d1_dict[v].append(k)
            else:
                d1_dict[v] = [k]
    d2_dict = {}
    for ls in updated_d2:
        for k, v in ls.items():
            if v in d2_dict:
                d2_dict[v].append(k)
            else:
                d2_dict[v] = [k]
    # Step 3: for each value present in both, pair every call key with
    # every cluster key, skipping pairs that were already added
    ls_results = []
    for k, v in d1_dict.items():
        if k in d2_dict:
            for i in v:
                for j in d2_dict[k]:
                    if {i: {j: k}} not in ls_results:
                        ls_results.append({i: {j: k}})
        else:
            continue
    return ls_results
print(join(v, w))

Output:

[
    {'call 1': {'cluster 2': 'debit card'}}, 
    {'call 2': {'cluster 2': 'debit card'}}, 
    {'call 3': {'cluster 2': 'debit card'}}, 
    {'call 3': {'cluster 1': 'payment limit'}}, 
    {'call 1': {'cluster 3': 'bond'}}, 
    {'call 2': {'cluster 3': 'mortgage'}}, 
    {'call 2': {'cluster 1': 'spending limit'}}
]

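What is this actually doing?

Step 1. First, drop every dictionary whose value is None.

Step 2. Build new dictionaries in which the original values become the keys, and each value maps to the list of all keys that carried it in the original dictionaries.

Step 3. All that remains is to find the keys that appear in both new dictionaries and add every possible combination of their values, while skipping duplicates.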
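The same three steps can also be written more compactly. The following is only a sketch, not the answer's own code: the names `is_missing`, `invert`, and `join_compact` are made up for illustration, and it assumes that the `nan` in the question's first list is a float NaN (for example `float('nan')`), which a plain comparison with `None` would not catch.

```python
import math
from collections import defaultdict
from itertools import product


def is_missing(value):
    """Return True for None and for float NaN (assumed meaning of `nan`)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def invert(records):
    """Steps 1 and 2: skip missing values, then map value -> list of keys."""
    index = defaultdict(list)
    for record in records:
        for key, value in record.items():
            # Keep each key once per value so step 3 yields no duplicates.
            if not is_missing(value) and key not in index[value]:
                index[value].append(key)
    return index


def join_compact(calls, clusters):
    """Step 3: pair every call key with every cluster key sharing a value."""
    call_index = invert(calls)
    cluster_index = invert(clusters)
    return [
        {call: {cluster: value}}
        for value, call_keys in call_index.items()
        if value in cluster_index
        for call, cluster in product(call_keys, cluster_index[value])
    ]
```

Calling `join_compact(v, w)` on the lists above should give the same pairs, in the same order, as `join(v, w)`. Because duplicate keys are dropped while the index is built, no separate duplicate check is needed in the last step.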
