How to match file names in a directory with names in a CSV column in R

Posted on July 31

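I am trying to write an R script that takes the file names in a directory and compares them with the file names listed in a CSV file, so that I can tell which files have already been downloaded and which data still needs to be downloaded. I have written code that reads the files from the directory and lists them as a data frame, and that reads the CSV file. However, I am having trouble changing the file names to extract the string I want, and matching the file names against the name column in the CSV file. Ideally, I would also like to create a new spreadsheet that tells me which files matched, so I know what has already been downloaded. This is what I have so far.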

# read files from directory and list as df
library(dplyr)   # provides %>%
library(readxl)  # provides read_excel()

file_names <- list.files(path = "KOMP/",
                         pattern = "nrrd",
                         all.files = TRUE,
                         full.names = TRUE,
                         recursive = TRUE) %>%
  # turn into df
  as.data.frame()

# read in xl file 
name_data <- read_excel("KOMP/all_data.xlsx")

# change the file_name from the string KOMP//icbm/agtc1/12dsfs.nrrd.txt  to -> 12dsfs
# match the file name with the name column in name_data
# create a new spreadsheet that pulls the id and row if it has been downloaded

Recommended answer

Example files/directory

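Let's create an example directory containing some example files. This lets us demonstrate that the solution works, and it is key to making the solution reproducible.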

library(dplyr)
library(writexl)
library(readxl)

# Example directory with example files
dir.create(path = "KOMP")
write.csv(data.frame(x = 5), file = "KOMP/foo.csv")
write.csv(data.frame(x = 20), file = "KOMP/foo.nrrd.csv")
write.csv(data.frame(x = 1), file = "KOMP/foo2.nrrd.csv")
write.csv(data.frame(z = 2), file = "KOMP/bar.csv")
write.csv(data.frame(z = 5), file = "KOMP/bar.rrdr.csv")

# Example Excel file
write_xlsx(data.frame(name = c("foo", "hotdog")), path = "KOMP/all_data.xlsx")

Solution

Now we can use our example files and directory to demonstrate a solution to the problem.

# Get file paths in a data.frame for those that contain ".nrrd"
# Use data.frame() to avoid row names instead of as.data.frame()
# Need to use \\ to escape the period in the regular expression
file_names <- list.files(
  path = "KOMP/",
  pattern = "\\.nrrd",
  all.files = TRUE,
  full.names = TRUE,
  recursive = TRUE
) %>%
  data.frame(paths = .)

# Extract part of file name (i.e. removing directory substrings) that
# comes before .nrrd and add a column. Can get file name with basename()
# and use regular expressions for the other part.
file_names$match_string <- file_names %>%
  pull(paths) %>%
  basename() %>%
  gsub(pattern = "\\.nrrd.*", replacement = "")

file_names$match_string
#> [1] "foo"  "foo2"

# Read in excel file with file names to match (if possible)
name_data <- read_excel("KOMP/all_data.xlsx")
name_data$name
#> [1] "foo"    "hotdog"

# Create match indicator and row number
name_data <- name_data %>%
  mutate(
    matched = case_when(name %in% file_names$match_string ~ 1, TRUE ~ 0),
    rowID = row_number()
  )

# Create excel spreadsheet of files already downloaded
name_data %>%
  filter(matched == 1) %>%
  write_xlsx(path = "KOMP/already_downloaded.xlsx")
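The question also asks which data still needs to be downloaded, which the code above does not list directly. The following is a minimal base-R sketch of that step, not part of the original answer. The helper name extract_id and the values in wanted are hypothetical, chosen only for illustration; the first path is the one quoted in the question. It works on plain character vectors, so it touches no files.

```r
# Hypothetical helper: drop the directory part, then everything from ".nrrd" on
extract_id <- function(paths) {
  sub("\\.nrrd.*$", "", basename(paths))
}

# The path quoted in the question, plus the two example .nrrd files
paths <- c("KOMP//icbm/agtc1/12dsfs.nrrd.txt",
           "KOMP/foo.nrrd.csv",
           "KOMP/foo2.nrrd.csv")
downloaded <- extract_id(paths)

# Names as they might appear in the spreadsheet's name column (made-up values)
wanted <- c("foo", "hotdog", "12dsfs")

# One row per wanted name, flagged TRUE when a matching file exists
status <- data.frame(
  name = wanted,
  downloaded = wanted %in% downloaded
)

# Names with no matching file, i.e. what still needs to be downloaded
still_needed <- setdiff(wanted, downloaded)
```

Here still_needed could be written out with write_xlsx() in the same way as the already-downloaded list, if a second spreadsheet is wanted.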
