I have two lists.

L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop']  # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always']  # trigrams list

Looking closely, L1 contains 'not worry' and 'good very', which are the same words as 'worry not' and 'very good' in reverse order.

I need to remove such reversed elements from the list. Similarly, in L2 'happy be always' is the reverse of 'always be happy' and should be removed as well.

The final output I'm looking for is:

L1 = ['worry not', 'be happy', 'very good', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend']

I tried the following:

[[max(zip(map(set, map(str.split, group)), group))[1]] for group in L1]

But it does not give the correct output. Should I write separate functions for removing reversed bigrams and reversed trigrams, or is there a faster, Pythonic way to do this? I will have to run it on 10K+ strings.
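A quick check shows two reasons the attempt above cannot work. Iterating `for group in L1` yields whole strings, so `map(str.split, group)` splits each individual character. Comparing sets of words also throws away word order, so it would merge any permutation of the same words, not just reversals ('be always happy' below is a made-up example, not from the lists):

```python
# `group` is a whole string such as 'be happy', so str.split is applied
# to every single character (a space character splits to an empty list).
group = 'be happy'
print(list(map(str.split, group)))

# A set of words ignores order, so a non-reversed permutation
# compares equal to the original trigram as well.
print(set('always be happy'.split()) == set('be always happy'.split()))  # True
```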

Recommended answer

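You can do it with a list comprehension if you iterate over the list from the end. A string is kept only if its word-reversed form does not appear earlier in the original list, so the first phrase of each reversed pair survives. The same code works for bigrams and trigrams: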

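lst = L1[::-1]  # reverse the list; use L2[::-1] for the trigrams
# keep s only if its word-reversed form does not occur earlier in the original list
x = [s for i, s in enumerate(lst) if ' '.join(s.split()[::-1]) not in lst[i+1:]][::-1]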

# L1: ['worry not', 'be happy', 'very good', 'full stop']
# L2: ['take into account', 'always be happy', 'stay safe friend']
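The comprehension above rescans the slice `lst[i+1:]` for every element, so its running time grows quadratically with the list length. For 10K+ strings, a set of already-seen word tuples should give the same result in a single pass. This is a minimal sketch assuming phrases are separated by whitespace; `remove_reversed` is a hypothetical helper name, and because it compares word tuples rather than raw strings, it also ignores extra spaces between words:

```python
def remove_reversed(phrases):
    """Drop every phrase whose word-reversed form occurs earlier in `phrases`.

    Works for bigrams, trigrams or any n-gram length in one pass over the
    list, using set lookups instead of rescanning the list each time.
    """
    seen = set()  # word tuples of every phrase visited so far
    result = []
    for phrase in phrases:
        words = tuple(phrase.split())
        # Keep the phrase only if its reversal has not appeared earlier.
        if words[::-1] not in seen:
            result.append(phrase)
        # Record every phrase, dropped or not, so this matches the
        # comprehension: "reversal appears anywhere earlier" means drop.
        seen.add(words)
    return result


L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always']

print(remove_reversed(L1))  # ['worry not', 'be happy', 'very good', 'full stop']
print(remove_reversed(L2))  # ['take into account', 'always be happy', 'stay safe friend']
```

Adding every phrase to `seen`, including the dropped ones, keeps the behaviour identical to the list comprehension for inputs such as ['a b', 'b a', 'a b'], where both later phrases are removed.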
