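Read the DataFrame as a directed graph with from_pandas_edgelist, then order the nodes with topological_sort and, for each child node, keep only the edge coming from the parent with the greatest topological index: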
import networkx as nx

# `data` is the DataFrame holding the manager_id / employee_id columns
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)
# topological order
order = {n: i for i,n in enumerate(nx.topological_sort(G))}
# {'A': 0, 'B': 1, 'D': 2, 'C': 3, 'E': 4}
# for each node, only keep the parent that has the greatest topological index
parents = {}
for source, target in G.edges():
    p = parents.setdefault(target, source)
    if order[source] > order[p]:
        parents[target] = source
# parents
# {'C': 'B', 'E': 'C', 'B': 'A', 'D': 'A'}
# remove the shorter edges (every edge that is not a kept parent -> child pair)
G.remove_edges_from(G.edges - set(zip(parents.values(), parents.keys())))
# or
# G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))
Output: [figure: graph after filtering]
Graph before filtering: [figure]
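For reference, here is the same approach as a self-contained sketch. The DataFrame below is a hypothetical edge list made up for illustration (it is not the data from the question), and `keep_latest_parent` is just an illustrative helper name. It assumes the graph is acyclic; `topological_sort` raises `NetworkXUnfeasible` otherwise.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list (an assumption, not the original data):
# A -> C and A -> E are shortcuts around the longer chain A -> B -> C -> E.
data = pd.DataFrame({
    "manager_id":  ["A", "A", "B", "A", "C", "A"],
    "employee_id": ["B", "C", "C", "D", "E", "E"],
})


def keep_latest_parent(df):
    """Keep, for each child, only the edge from the parent that comes
    last in a topological order of the graph."""
    G = nx.from_pandas_edgelist(df, source="manager_id", target="employee_id",
                                create_using=nx.DiGraph)
    order = {n: i for i, n in enumerate(nx.topological_sort(G))}

    parents = {}
    for source, target in G.edges():
        p = parents.setdefault(target, source)
        if order[source] > order[p]:
            parents[target] = source

    # (parent, child) pairs to keep; everything else is removed
    kept = {(p, c) for c, p in parents.items()}
    G.remove_edges_from([e for e in G.edges if e not in kept])
    return G


filtered = keep_latest_parent(data)
print(sorted(filtered.edges()))
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E')]
```

With this input every node ends up with at most one incoming edge, so the result is a tree rooted at A.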
Variant
You can also compute the order from topological_generations, so that nodes of the same generation get the same number.
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
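To make the difference between the two orderings concrete, here is a small sketch on a hypothetical graph (the edge list is an assumption chosen for illustration). `topological_sort` gives every node a distinct index, while `topological_generations` gives siblings the same number, so the generation gap along an edge tells you how many levels that edge skips.

```python
import networkx as nx

# Hypothetical hierarchy: B and D both report to A; A -> C is a shortcut.
G = nx.DiGraph([("A", "B"), ("A", "D"), ("B", "C"), ("C", "E"), ("A", "C")])

# one distinct index per node
index = {n: i for i, n in enumerate(nx.topological_sort(G))}

# one number per generation, shared by all nodes in that generation
generation = {n: i
              for i, layer in enumerate(nx.topological_generations(G))
              for n in layer}

# generation gap along each edge: the A -> C shortcut has a gap of 2,
# every other edge has a gap of 1
for u, v in G.edges:
    print(u, v, generation[v] - generation[u])
```

In a DAG the gap is always at least 1, since a node can only appear in a generation after all of its parents.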
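In the context of your other question, you can also compute the relative generation difference and keep only the edges whose two nodes are exactly one generation apart: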
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
# compute the relative generation difference
# keep the edges with a difference of 1
keep = [e for e in G.edges if order[e[1]]-order[e[0]] == 1]
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E'), ('F', 'G')]
G = G.edge_subgraph(keep)
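One thing worth knowing about this variant: it can keep more than one parent per node when several parents sit in the same generation, whereas the first approach always keeps exactly one. A minimal sketch on a hypothetical graph (the edges are an assumption for illustration, not the question's data):

```python
import networkx as nx

# Hypothetical edges: C has two parents (B and D) in the same generation,
# plus a shortcut A -> C that spans two generations.
G = nx.DiGraph([("A", "B"), ("A", "D"), ("B", "C"), ("D", "C"), ("A", "C")])

generation = {n: i
              for i, layer in enumerate(nx.topological_generations(G))
              for n in layer}

# keep only the edges that go down exactly one generation
keep = [(u, v) for u, v in G.edges if generation[v] - generation[u] == 1]

# edge_subgraph returns a read-only view; copy() gives an independent graph
H = G.edge_subgraph(keep).copy()
print(sorted(H.edges()))
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('D', 'C')]
```

Here the A -> C shortcut is dropped, but both B -> C and D -> C survive. If you need a strict tree, the topological-index approach is probably the safer choice.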