I am trying to import multiple CSV files in RStudio while preserving their file names.
library(readr)
library(dplyr)
library(purrr)
#importing all csv files at once
csv_files = list.files(pattern ="*Con.csv")
myfiles = lapply(csv_files , read.delim, header = TRUE, sep = "," )
#merging all files by identifiers
Samp_merg <- myfiles %>% reduce(full_join, by=c("chr", "start","end"))
With this I can import the files, but the file names are missing from the list myfiles.
myfiles <- dir(pattern = "*Con.csv", full.names = FALSE)
myfiles_data <- lapply(myfiles, data.table::fread)
# assign names to list items
names(myfiles_data) <- myfiles
#merging the files
dat_merg <- myfiles_data %>% reduce(full_join, by=c("chr", "start", "end"))
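With this second script, I can import the files and keep their names in the myfiles_data object. However, after joining on the three identifiers, the file names are not carried over as column names. I want each value column of the merged df to be named after its source file, without the extension (.csv).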
There are about 90 CSV files in the directory, all with the same header.
$ ls
01AvPMPpCon.csv
02AvPMPpCon.csv
03AvPMPpCon.csv
04AvPMPpCon.csv
05AvPMPpCon.csv
$ head 01AvPMPpCon.csv
chr,start,end,CpG
chr1,2017424,2017750,10
chr1,24901325,24901700,11
chr1,24902268,24902701,25
chr1,24927215,24927416,4
chr1,26861926,26862173,5
chr1,26864186,26864613,15
chr1,35576334,35576451,3
chr1,36304606,36304817,7
Currently, the merged data looks like this:
$head(dat_merg)
chr start end CpG.x CpG.y CpG.x.x CpG.y.y CpG.x.x.x CpG.y.y.y
1: chr1 3903250 3903277 4 NA NA NA 4 NA
2: chr1 4657240 4657314 3 NA NA NA NA NA
3: chr1 24900249 24900468 5 NA 5 NA NA NA
4: chr1 46484938 46485047 4 NA 4 NA NA NA
5: chr1 47223634 47223758 4 NA NA NA 4 4
6: chr1 66752822 66753167 12 12 NA NA 12 NA
My expected output would look like this:
$head(dat_merg)
chr start end 01Av 02Av 03Av 04Av 05Av 06Av
1: chr1 3903250 3903277 4 NA NA NA 4 NA
2: chr1 4657240 4657314 3 NA NA NA NA NA
3: chr1 24900249 24900468 5 NA 5 NA NA NA
4: chr1 46484938 46485047 4 NA 4 NA NA NA
5: chr1 47223634 47223758 4 NA NA NA 4 4
6: chr1 66752822 66753167 12 12 NA NA 12 NA
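One possible direction, as a minimal sketch rather than a definitive solution: the CpG.x / CpG.y suffixes appear because every file has a value column with the same name, so renaming CpG to the file name before the join should avoid them. The sketch below assumes the value column is always called CpG and that the join keys are chr, start, and end, as in the sample above. The temporary directory and the two small demo files are hypothetical stand-ins so the code runs without the real data.

```r
library(dplyr)
library(purrr)

# Hypothetical demo files standing in for the real *Con.csv files
tmp <- tempfile("csvdemo")
dir.create(tmp)
writeLines(c("chr,start,end,CpG", "chr1,100,200,4", "chr1,300,400,7"),
           file.path(tmp, "01AvPMPpCon.csv"))
writeLines(c("chr,start,end,CpG", "chr1,100,200,5", "chr1,500,600,3"),
           file.path(tmp, "02AvPMPpCon.csv"))

# list.files() takes a regular expression, so the dot is escaped and the end anchored
files <- list.files(tmp, pattern = "Con\\.csv$", full.names = TRUE)

# Name each path after its file, dropping the directory and the .csv extension
names(files) <- tools::file_path_sans_ext(basename(files))

# Read each file and rename its CpG column to the file's name before joining
dat_list <- imap(files, function(path, nm) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  names(df)[names(df) == "CpG"] <- nm
  df
})

# Full join on the three identifiers; value columns no longer collide
dat_merg <- reduce(dat_list, full_join, by = c("chr", "start", "end"))
print(dat_merg)
```

For the real data, tmp would be replaced by the directory holding the 90 files. If shorter labels such as 01Av are wanted, the names could be trimmed further before the rename, for example with sub("PMPpCon$", "", names(files)), assuming every file name ends with that suffix.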