Python3.x 根据条件过滤元组列表

发布于09月02日

对于给定的元组列表，如果列表中的多个元组的第一个元素相同，则只 Select 最后一个元素最大的元组.

例如:

sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]

在上面的sample_list个元组中，因为前3个元组具有类似的第一个元素5，在这种情况下，其中只有第二个元组应该保留，因为它具有max last element=&gt；3

预期op:

op = [(5,10,3),(21,24,1)]

代码:

op = []
for m in range(len(sample_list)):
    li = [sample_list[m]]
    for n in range(len(sample_list)):
        if(sample_list[m][0] == sample_list[n][0]
           and sample_list[m][2] != sample_list[n][2]):
            li.append(sample_list[n])
    op.append(sorted(li,key=lambda dd:dd[2],reverse=True)[0])

print (list(set(op)))

这很管用.这张单子很长，但很慢.有没有一种更为通灵或有效的方法来做到这一点？

TL；博士

使用collections.defaultdict是最快的 Select ，也可以说是最简单的 Select :

from collections import defaultdict

sample_list = [(5, 16, 2), (5, 10, 3), (5, 8, 1), (21, 24, 1)]

d = defaultdict(lambda: (0, 0, float("-inf")))
for e in sample_list:
    first, _, last = e
    if d[first][2] < last:
        d[first] = e

res = [*d.values()]
print(res)

Output

[(5, 10, 3), (21, 24, 1)]

这是一个单通O(n)，它不仅是渐近最优的，而且在实践中也表现良好.

详细解释

表演

为了证明这一点，我们可以设计一个实验，考虑问题的两个主要变量，唯一键的数量(元组第一个位置的值)和输入列表的长度，以及以下替代方法:

def defaultdict_max_approach(lst):
    d = defaultdict(lambda: (0, 0, float("-inf")))
    for e in lst:
        first, _, last = e
        if d[first][2] < last:
            d[first] = e
    return [*d.values()]


def dict_max_approach(lst):
    # https://stackoverflow.com/a/69025193/4001592
    d = {}
    for tpl in lst:
        first, *_, last = tpl
        if first not in d or last > d[first][-1]:
            d[first] = tpl

    return [*d.values()]


def groupby_max_approach(lst):
    # https://stackoverflow.com/a/69025193/4001592
    return [max(g, key=ig(-1)) for _, g in groupby(sorted(lst), key=ig(0))]

如下图所示，对于不同数量的唯一键(500、1000、5000、10000)以及多达1000000个元素的集合(请注意，中的x轴以千为单位)，使用defaultdict的方法是最有效的方法.

上述实验与其他人(1 2)的实验一致.重现实验的代码可以在here中找到.

Python 的

说最多pythonic个是主观的，但以下是赞成的主要论点:

Is a well known Python idiom

使用defaultdict对序列键值对进行分组，然后进行聚合，这是一个众所周知的Python习惯用法.

雷蒙德·赫廷格(Raymond Hettinger)在PyCon 2013 talk Transforming Code into Beautiful, Idiomatic Python中还表示，使用defaultdict进行此类操作是最重要的.

Is compliant with the Zen of Python

在Python的禅宗中可以这样理解

Flat is better than nested.
Sparse is better than dense.

使用defaultdict就像只使用for-loop和简单if语句的普通dict一样简单.在defaultdict的情况下，if条件更简单.

这两种解决方案都是sparser而不是itertools.groupby，请注意，这种方法还包括在列表中调用sorted、itemgetter和max.

原始答案

您可以使用collections.defaultdict对具有相同第一个元素的元组进行分组，然后根据第三个元素对每个组取最大值:

from collections import defaultdict

sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]

d = defaultdict(list)
for e in sample_list:
    d[e[0]].append(e)

res = [max(val, key=lambda x: x[2]) for val in d.values()]
print(res)

Output