I have two lists.
L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] # trigrams list
If I look closely, L1 has 'not worry' and 'good very', which are exact reversals of 'worry not' and 'very good'.
I need to remove such reversed elements from the list. Similarly, in L2, 'happy be always' is the reverse of 'always be happy', so it should be removed as well.
The final output I'm looking for is:
L1 = ['worry not', 'be happy', 'very good', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend']
I tried one solution:
[[max(zip(map(set, map(str.split, group)), group))[1]] for group in L1]
But it does not give the correct output. Should I write separate functions for removing reversed repetitions from bigrams and trigrams, or is there a Pythonic, faster way of doing this? I'll have to run this on about 10K+ strings.
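
To make the requirement concrete, here is a rough sketch of the behaviour I'm after, written as one function that does not care whether the items are bigrams or trigrams. It is only a sketch under some assumptions: words are separated by whitespace, the first occurrence of a phrase is the one to keep, and the function name drop_reversed_duplicates is hypothetical (made up for illustration). It makes a single pass and uses a set for lookups, so 10K+ strings should not be a problem.

```python
def drop_reversed_duplicates(phrases):
    """Keep phrases in order, dropping any phrase whose words are the
    exact reverse of a phrase that was already kept.

    Assumption: words are whitespace-separated and the first occurrence wins.
    """
    seen = set()   # word tuples of the phrases kept so far
    kept = []
    for phrase in phrases:
        words = tuple(phrase.split())
        # Skip this phrase if its reversed word order was kept earlier.
        if words[::-1] in seen:
            continue
        seen.add(words)
        kept.append(phrase)
    return kept


L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always']

print(drop_reversed_duplicates(L1))
# ['worry not', 'be happy', 'very good', 'full stop']
print(drop_reversed_duplicates(L2))
# ['take into account', 'always be happy', 'stay safe friend']
```

Note that this only drops reversed repetitions. Exact duplicates such as a second 'be happy' would be kept, unless the phrase reads the same in both directions (for example 'a b a'), in which case the repeat is dropped too.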