I want to find the pairs of people who have been in contact with each other in both directions. The input is:
K -> M, H
M -> K, E
H -> F
B -> T, H
E -> K, H
F -> K, H, E
A -> Z
The expected output is:
K, M   // K has supplied goods to M, and M has also supplied goods to K
H, F
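To make the expected result concrete, here is a minimal plain-Python sketch (no Spark) of the rule described above: a pair is reported only when the edge exists in both directions. The names `parse_line`, `mutual_pairs`, and `sample_lines` are hypothetical helpers introduced for illustration, and the sketch assumes every line has the form `X -> A, B, ...`.

```python
def parse_line(line):
    """Split a line such as 'K -> M, H' into ('K', ['M', 'H']).

    Hypothetical helper; assumes the 'source -> t1, t2' layout shown above.
    """
    source, _, targets = line.partition("->")
    return source.strip(), [t.strip() for t in targets.split(",") if t.strip()]


def mutual_pairs(lines):
    """Return pairs (a, b) where both a -> b and b -> a appear in the input."""
    # Keep the directed edges in input order so results follow first appearance.
    ordered = []
    for line in lines:
        source, targets = parse_line(line)
        for target in targets:
            ordered.append((source, target))

    edges = set(ordered)   # fast lookup of the reverse edge
    seen = set()           # unordered pairs already reported
    result = []
    for a, b in ordered:
        key = frozenset((a, b))
        # Report once per pair, and ignore self-references such as 'A -> A'.
        if a != b and (b, a) in edges and key not in seen:
            seen.add(key)
            result.append((a, b))
    return result


# Sample data copied from the input above.
sample_lines = [
    "K -> M, H",
    "M -> K, E",
    "H -> F",
    "B -> T, H",
    "E -> K, H",
    "F -> K, H, E",
    "A -> Z",
]

print(mutual_pairs(sample_lines))  # prints [('K', 'M'), ('H', 'F')]
```

Each pair is listed in the direction in which it first appears in the input, which is why the second pair comes out as H, F rather than F, H.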
This is the code I wrote:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession, SQLContext
from pyspark.ml.regression import LinearRegression
import re
from itertools import combinations

spark = SparkContext("local", "DoubleRDD")

def findpairs(ls):
    lst = []
    for i in range(0,len(ls)-1):
        for j in range(i+1, len(ls)):
            if ls[i] == tuple(reversed(ls[j])):
                lst.append(ls[i])
    return(lst)

text = spark.textFile("path to the .txt")
text = text.map(lambda s: s.replace("->",","))
text = text.map(lambda s: s.replace(",",""))
text = text.map(lambda s: s.replace(" ",""))
pairs = text.flatMap(lambda x: [(x[0],y) for y in x[1:]])
commonpairs = pairs.filter(lambda x: findpairs(x))
pairs.collect()
The output is: []