I am trying to import multiple CSV files in RStudio while keeping their file names.

library(readr)
library(dplyr)
library(purrr)

#importing all csv files at once
csv_files = list.files(pattern ="*Con.csv")
myfiles = lapply(csv_files , read.delim, header = TRUE, sep = "," )

#merging all files by identifiers
Samp_merg <- myfiles %>% reduce(full_join, by=c("chr", "start","end"))

This imports the files, but the resulting list myfiles does not carry the file names.

myfiles <- dir(pattern = "*Con.csv", full.names = FALSE) 
myfiles_data <- lapply(myfiles, data.table::fread) 

# assign names to list items
names(myfiles_data) <- myfiles 

#merging the files
dat_merg <- myfiles_data %>% reduce(full_join, by=c("chr", "start", "end"))

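With this second script I can import the files and keep their names in the myfiles_data object. However, after joining on the three identifiers, the file names are not kept as column names. I would like each value column of the merged data frame to be named after its source file, without the extension (.csv).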
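If the join-based approach is to be kept, one option is to rename the CpG column in each data frame to the file's base name before joining, so full_join() never has to invent .x/.y suffixes. The following is a minimal sketch, assuming every file has exactly the columns chr, start, end and CpG; the temporary directory and the two tiny demo files are made up for illustration.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical demo data: two small files written to a temporary directory
tmp <- tempfile("csvdemo")
dir.create(tmp)
write_csv(tibble(chr = "chr1", start = c(1, 10), end = c(5, 15), CpG = c(4, 3)),
          file.path(tmp, "01AvPMPpCon.csv"))
write_csv(tibble(chr = "chr1", start = c(10, 20), end = c(15, 25), CpG = c(7, 9)),
          file.path(tmp, "02AvPMPpCon.csv"))

files <- list.files(tmp, pattern = "Con\\.csv$", full.names = TRUE)

# Name each path after its file, minus directory and extension
names(files) <- tools::file_path_sans_ext(basename(files))

dat_merg <- files |>
  # imap() passes the path (f) and its name (nm); CpG is renamed to nm
  imap(\(f, nm) read_csv(f, show_col_types = FALSE) |> rename(!!nm := CpG)) |>
  reduce(full_join, by = c("chr", "start", "end"))

names(dat_merg)
```

With roughly 90 files this performs about 90 successive joins, which may be noticeably slower than a single reshape.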

The directory contains about 90 CSV files, all with the same header.

$ls
01AvPMPpCon.csv
02AvPMPpCon.csv
03AvPMPpCon.csv
04AvPMPpCon.csv
05AvPMPpCon.csv

$head 01AvPMPpCon.csv 
chr,start,end,CpG
chr1,2017424,2017750,10
chr1,24901325,24901700,11
chr1,24902268,24902701,25
chr1,24927215,24927416,4
chr1,26861926,26862173,5
chr1,26864186,26864613,15
chr1,35576334,35576451,3
chr1,36304606,36304817,7

Currently, the merged data looks like this:

$head(dat_merg)
    chr    start      end CpG.x CpG.y CpG.x.x CpG.y.y CpG.x.x.x CpG.y.y.y
1: chr1  3903250  3903277     4    NA      NA      NA         4        NA
2: chr1  4657240  4657314     3    NA      NA      NA        NA        NA
3: chr1 24900249 24900468     5    NA       5      NA        NA        NA
4: chr1 46484938 46485047     4    NA       4      NA        NA        NA
5: chr1 47223634 47223758     4    NA      NA      NA         4         4
6: chr1 66752822 66753167    12    12      NA      NA        12        NA

My expected output would look like this:

 $head(dat_merg)
        chr    start      end   01Av  02Av    03Av    04Av      05Av      06Av
    1: chr1  3903250  3903277     4    NA      NA      NA         4        NA
    2: chr1  4657240  4657314     3    NA      NA      NA        NA        NA
    3: chr1 24900249 24900468     5    NA       5      NA        NA        NA
    4: chr1 46484938 46485047     4    NA       4      NA        NA        NA
    5: chr1 47223634 47223758     4    NA      NA      NA         4         4
    6: chr1 66752822 66753167    12    12      NA      NA        12        NA

Recommended answer

How about pivot_wider() instead of reduce(full_join, ...)?

Preparing a reprex with 99 four-row CSV files:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(readr)
library(purrr)

csv_ <- read_csv("chr,start,end,CpG
chr1,2017424,2017750,10
chr1,24901325,24901700,11
chr1,24902268,24902701,25
chr1,24927215,24927416,4
chr1,26861926,26862173,5
chr1,26864186,26864613,15
chr1,35576334,35576451,3
chr1,36304606,36304817,7", show_col_types = FALSE)

sprintf("%.2dAvPMPpCon.csv", 1:99) |>
  walk(\(f_) slice_sample(csv_, n = 4) |> write_csv(f_))

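read_csv() can read a whole vector of files in one call and record each row's source file name in the column named by its id argument. The remaining columns, c(chr, start, end), then serve as the id_cols for pivot_wider():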

list.files(pattern = "Con\\.csv$") |>
  read_csv(id = "src") |>
  mutate(src = substr(src, 1, 4)) |>
  pivot_wider(names_from = src, values_from = CpG)
#> Rows: 396 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): chr
#> dbl (3): start, end, CpG
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Result:

#> # A tibble: 8 × 102
#>   chr      start     end `01Av` `02Av` `03Av` `04Av` `05Av` `06Av` `07Av` `08Av`
#>   <chr>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 chr1  26864186  2.69e7     15     NA     15     15     NA     NA     NA     NA
#> 2 chr1  26861926  2.69e7      5      5     NA      5      5     NA     NA     NA
#> 3 chr1  24902268  2.49e7     25     25     25     25     25     25     NA     NA
#> 4 chr1  24927215  2.49e7      4      4     NA     NA      4     NA      4      4
#> 5 chr1  35576334  3.56e7     NA      3      3     NA     NA      3      3     NA
#> 6 chr1   2017424  2.02e6     NA     NA     10     10     NA     NA     10     10
#> 7 chr1  24901325  2.49e7     NA     NA     NA     NA     11     11     11     11
#> 8 chr1  36304606  3.63e7     NA     NA     NA     NA     NA      7     NA      7
#> # ℹ 91 more variables: `09Av` <dbl>, `10Av` <dbl>, `11Av` <dbl>, `12Av` <dbl>,
#> #   `13Av` <dbl>, `14Av` <dbl>, `15Av` <dbl>, `16Av` <dbl>, `17Av` <dbl>,
#> #   `18Av` <dbl>, `19Av` <dbl>, `20Av` <dbl>, `21Av` <dbl>, `22Av` <dbl>,
#> #   `23Av` <dbl>, `24Av` <dbl>, `25Av` <dbl>, `26Av` <dbl>, `27Av` <dbl>,
#> #   `28Av` <dbl>, `29Av` <dbl>, `30Av` <dbl>, `31Av` <dbl>, `32Av` <dbl>,
#> #   `33Av` <dbl>, `34Av` <dbl>, `35Av` <dbl>, `36Av` <dbl>, `37Av` <dbl>,
#> #   `38Av` <dbl>, `39Av` <dbl>, `40Av` <dbl>, `41Av` <dbl>, `42Av` <dbl>, …

Created on 2024-01-18 with reprex v2.0.2
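The answer above shortens the column names to the first four characters of the file name, which matches the expected output in the question. If the full file name without the .csv extension is wanted instead, a variant along the following lines should work. It is a sketch under the same assumptions (columns chr, start, end, CpG); the temporary directory and demo files are hypothetical.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical demo data: two small files written to a temporary directory
tmp <- tempfile("csvdemo")
dir.create(tmp)
write_csv(tibble(chr = "chr1", start = c(1, 10), end = c(5, 15), CpG = c(4, 3)),
          file.path(tmp, "01AvPMPpCon.csv"))
write_csv(tibble(chr = "chr1", start = c(10, 20), end = c(15, 25), CpG = c(7, 9)),
          file.path(tmp, "02AvPMPpCon.csv"))

wide <- list.files(tmp, pattern = "Con\\.csv$", full.names = TRUE) |>
  # id = "src" stores the path of the source file in a column called src
  read_csv(id = "src", show_col_types = FALSE) |>
  # strip the directory and the extension so src holds the bare file name
  mutate(src = tools::file_path_sans_ext(basename(src))) |>
  pivot_wider(names_from = src, values_from = CpG)

names(wide)
```

Using basename() first matters when full.names = TRUE, since the id column would otherwise contain the whole path.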