I am trying to learn more about graph search algorithms on networks. To illustrate, I created the following example.

Step 1: Suppose there are 100 countries (country_1 ... country_100) that are randomly connected to each other:

set.seed(123)


library(igraph)


countries <- paste0("country_", 1:100)


g <- make_empty_graph(100)


num_edges <- 200
edge_list <- sample(countries, size = num_edges * 2, replace = TRUE)
edge_list <- matrix(edge_list, ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edge_list, directed = FALSE)


V(g)$name <- countries

plot(g, vertex.label.cex = 0.7, vertex.label.color = "black", vertex.label.dist = 2)

Step 2: Now suppose 20 people (person_A ... person_T) live in these countries (each country can hold at most one person, so 80 of the countries will be empty):

edge_list <- as_edgelist(g)

df <- as.data.frame(edge_list)

colnames(df) <- c("from", "to")

people <- paste0("person_", LETTERS[1:20])

assignment <- sample(countries, size = length(people), replace = FALSE)
names(assignment) <- people

df2 <- data.frame(country = countries)


df2$person <- ifelse(df2$country %in% assignment, names(assignment)[match(df2$country, assignment)], "empty")

Step 3: As an optional step, we can visualize the result:

library(visNetwork)

df2$color <- ifelse(df2$person == "empty", "grey", "red")

df2$label <- ifelse(df2$person == "empty", df2$country, paste0(df2$person, "\n", df2$country))

nodes <- data.frame(id = df2$country, label = df2$label, color = df2$color)

edges <- df

visNetwork(nodes, edges) %>% 
    visInteraction(navigationButtons = TRUE)


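Suppose we take "person_A" as an example: I want to find out who is closest to person_A and which country that person lives in. I am interested in learning how to write a BFS algorithm for this problem by hand. For example: search for everyone within radius 1 of person_A; if nobody is found, search radius 2, and so on, until the first person (or people) is found.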

I know how to use a pre-built implementation of this algorithm:

adj_matrix <- as_adjacency_matrix(g)

diag(adj_matrix) <- 0

shortest_paths <- distances(g)  # distances() replaces the deprecated shortest.paths()

df2_filtered <- subset(df2, person != "empty")
selected_countries <- intersect(rownames(shortest_paths), df2_filtered$country)

filtered_paths <- shortest_paths[selected_countries, selected_countries]

item = df2[df2$person %in% c("person_A"), ]

#answer (exclude distance = 0, i.e. the same country itself)
sort(filtered_paths[rownames(filtered_paths) == item$country, ])[2]

Can someone please show me how I could write a search algorithm (by hand) to accomplish this task which starts with a person's name - and then prints the results of the search at each step until a person is found?

Recommended answer

Background/Overview

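The general idea of breadth-first search is to start at one point in the graph (call it a) and add all of its neighbours to a list of unexplored points (call this the frontier). You then work through that list one point at a time: for each point, you add its not-yet-seen neighbours to the end of the queue, and so on, until either you find the point you are looking for (a specific point b, or any point that meets a criterion you set) or you run out of places to look (because you have explored everything).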
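As a minimal sketch of that idea, here is a breadth-first search over a small, made-up graph stored as a named list of neighbours. The data `toy_graph` and the function name `bfs_order` are hypothetical and exist only for illustration; the code uses base R and returns the order in which points are taken off the queue.

```r
# Toy undirected graph as an adjacency list (hypothetical data for illustration)
toy_graph <- list(
  a = c("b", "c"),
  b = c("a", "d"),
  c = c("a", "d"),
  d = c("b", "c", "e"),
  e = c("d")
)

# Return the order in which BFS takes points off the queue, starting from `start`
bfs_order <- function(graph, start) {
  seen <- start              # points that have been queued at some stage
  queue <- start             # the frontier: points waiting to be explored
  explored <- character(0)   # points in the order they were explored

  while (length(queue) > 0) {
    current <- queue[1]      # take the point at the front of the queue
    queue <- queue[-1]
    explored <- c(explored, current)

    # queue only the neighbours we have not seen before
    new_points <- setdiff(graph[[current]], seen)
    seen <- c(seen, new_points)
    queue <- c(queue, new_points)
  }
  explored
}

print(bfs_order(toy_graph, "a"))
```

Because new points always go to the back of the queue, every point one step from the start is explored before any point two steps away, which is what makes the first match the nearest one.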

Tidying the data

First, I'll tidy up the data a little. I created a data frame that contains only the people who exist (no empty rows):

library(dplyr)  # provides filter(), select(), mutate(), bind_rows(), group_by(), summarise()

people_df <- df2 %>% 
    filter(person != "empty") %>%
    select(person, country)

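Next, I convert the country-connections data frame df into a data frame neighbours_df, which gives me the neighbours of each point. Because of how the data frame is structured, it has (for example) a row: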

      from         to
country_31 country_79

but no row for the reverse direction, i.e.

      from         to
country_79 country_31

So I swap the columns, append the reversed rows to the end of the original ones, and group each point's neighbours into a list to keep things tidy:

reversed_df <- df %>% 
    mutate(new_from = to, to = from, from = new_from) %>% 
    select(from, to)

neighbours_df <- df %>% 
    bind_rows(reversed_df) %>% 
    filter(from != to) %>%
    group_by(from) %>%
    summarise(to = list(to))
    
#          from                            to
# 1    country_1   c("country_8", "country_92")
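For comparison, the same kind of neighbour lookup can be sketched in base R with `split()`. The small edge list `toy_edges` is made up for illustration and is not the `df` from the question; the result here is a plain named list rather than a data frame with a list column.

```r
# Hypothetical edge list: each undirected edge stored once, plus one self-loop
toy_edges <- data.frame(
  from = c("country_1", "country_1", "country_2", "country_3"),
  to   = c("country_2", "country_3", "country_3", "country_3"),
  stringsAsFactors = FALSE
)

# Stack the edges on top of their reversed copies so each edge appears in both directions
both_ways <- data.frame(
  from = c(toy_edges$from, toy_edges$to),
  to   = c(toy_edges$to, toy_edges$from),
  stringsAsFactors = FALSE
)

# Drop self-loops, then collect the neighbours of each country into a named list
both_ways <- both_ways[both_ways$from != both_ways$to, ]
toy_neighbours <- split(both_ways$to, both_ways$from)

print(toy_neighbours[["country_3"]])
```

A named list lets you look up neighbours with `toy_neighbours[[country]]`, whereas the list column in `neighbours_df` needs a row filter first; which one to use is mostly a matter of taste.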

Implementation

breadth_first_search <- function(person, neighbours_df, people_df) {
  # get the country of the person
  starting_country <- people_df$country[people_df$person == person]

  # initialise the visited list
  visited <- c()

  # initialise the frontier with the starting point
  # (names are countries, values are distances from the start)
  frontier <- list()
  frontier[[starting_country]] <- 0

  # while the frontier is not empty
  while (length(frontier) > 0) {

    # get the first element of the frontier and its distance from the start
    current <- names(frontier)[1]
    distance_from_start <- frontier[[1]]

    # remove the first element of the frontier (the one we just selected)
    frontier <- frontier[-1]

    # a country can be queued more than once; skip it if it was already handled
    if (current %in% visited) next

    print(paste0("Current point: ", current, " (", distance_from_start, " steps from start)"))

    # if someone lives in the current country, and it is not the starting
    # country, return the person who lives there
    if (current %in% people_df$country && current != starting_country) {
      found_person <- people_df$person[people_df$country == current]
      print(paste0("Found person: ", found_person, ", ", distance_from_start,
                   " steps from start, in country ", current))
      return(found_person)
    }

    # add the current point to the visited list
    visited <- c(visited, current)

    # get the neighbours of the current point (NULL if it has no edges)
    neighbs <- unlist(neighbours_df$to[neighbours_df$from == current])

    # add the neighbours to the frontier if they haven't been visited already
    neighbs <- neighbs[!neighbs %in% visited]
    frontier <- c(frontier, setNames(rep(distance_from_start + 1, length(neighbs)), neighbs))
  }

  # if we search through all the points, and didn't find anyone, return NA
  return(NA)
}

print(breadth_first_search("person_R", neighbours_df, people_df))
# [1] "person_J"
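
The function above stops at the first person it takes off the queue, so if several people are equally close only one of them is reported. A sketch of a layer-by-layer variant, which follows the "radius 1, then radius 2" description in the question and returns everyone at the smallest distance, might look like this. The toy data and the name `nearest_people` are hypothetical, and the neighbours are kept in a plain named list rather than in `neighbours_df`.

```r
# Hypothetical toy data: which countries border each other, and who lives where
toy_neighbours <- list(
  c1 = c("c2", "c3"),
  c2 = c("c1", "c4"),
  c3 = c("c1", "c5"),
  c4 = c("c2"),
  c5 = c("c3")
)
toy_people <- c(c1 = "person_A", c4 = "person_B", c5 = "person_C")

# Search outwards one radius at a time and return everyone at the nearest radius
nearest_people <- function(start, neighbours, people) {
  visited <- start
  layer <- start
  radius <- 0

  while (length(layer) > 0) {
    # the next layer is every unvisited neighbour of the current layer
    next_layer <- setdiff(unlist(neighbours[layer], use.names = FALSE), visited)
    radius <- radius + 1
    cat("Radius", radius, ":", paste(next_layer, collapse = ", "), "\n")

    # people living in that layer (the start country is never part of it)
    found <- people[intersect(next_layer, names(people))]
    if (length(found) > 0) {
      return(list(distance = radius, people = found))
    }

    visited <- c(visited, next_layer)
    layer <- next_layer
  }
  NULL  # nobody else is reachable from the start
}

result <- nearest_people("c1", toy_neighbours, toy_people)
print(result$distance)
print(result$people)
```

On this toy data the search finds nobody at radius 1 and then reports both person_B and person_C at radius 2.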

References/Further reading

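I borrowed quite heavily from this article by Red Blob Games (a companion to this other piece, which gives a very good introduction to breadth-first search, as well as similar graph search algorithms such as A*). If you want a fuller understanding of how they work, the pros and cons of BFS, and/or want to play with something interactive, I recommend you go and have a look!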
