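I have a large CSV file without a header row; the header is available to me as a separate vector. I want to use a subset of the file's columns without loading the entire file. The subset of required columns is given as a separate list.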
Edit: in this case, the column names provided in the `header` vector matter. This MRE has only 4 column names, but the solution should work for a large dataset with pre-specified column names. The catch is that the column names are only provided externally, not as a header row in the CSV file.
Example `sample-data.csv` (no header row):
1,2,3,4
5,6,7,8
9,10,11,12
header <- c("A", "B", "C", "D")
subset <- c("D", "B")
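Because the file has no header row, the external names only matter through their positions in `header`. A small base R sketch of translating the requested names into column positions with `match()`; `idx` is a hypothetical helper variable, not part of the original code:

```r
# Column names supplied outside the file, and the columns we want
header <- c("A", "B", "C", "D")
subset <- c("D", "B")

# Positions of the wanted columns within the file, in the requested order
idx <- match(subset, header)
idx
#> [1] 4 2

# Fail early if a requested name is not in the external header
stopifnot(!anyNA(idx))
```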
So far I have been reading the data as follows, which gives the result I want but loads the entire file first.
# Setup
library(readr)
write.table(
structure(list(V1 = c(1L, 5L, 9L), V2 = c(2L, 6L, 10L), V3 = c(3L, 7L, 11L), V4 = c(4L, 8L, 12L)), class = "data.frame", row.names = c(NA, -3L)),
file="sample-data.csv",
row.names=FALSE,
col.names=FALSE,
sep=","
)
header <- c("A", "B", "C", "D")
subset <- c("D", "B")
# Current approach
df1 <- read_csv(
"sample-data.csv",
col_names = header
)[subset]
df1
#> # A tibble: 3 × 2
#>       D     B
#>   <dbl> <dbl>
#> 1     4     2
#> 2     8     6
#> 3    12    10
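One option that may fit here, assuming readr 2.0 or later (where `read_csv()` gained `col_select`): keep `col_names = header` so every column in the file gets its external name, and pass the wanted names to `col_select` so only those columns should be materialised. This is a sketch, not a benchmarked answer; `df2` is a hypothetical name, and `tidyselect::all_of()` relies on tidyselect being installed (it comes in as a readr 2.x dependency).

```r
library(readr)

header <- c("A", "B", "C", "D")
subset <- c("D", "B")

# Recreate the sample file so the sketch is self-contained
writeLines(c("1,2,3,4", "5,6,7,8", "9,10,11,12"), "sample-data.csv")

# col_names names every column in the file; col_select keeps only the wanted
# ones. all_of() errors if a name in `subset` is missing from `header`.
df2 <- read_csv(
  "sample-data.csv",
  col_names = header,
  col_select = tidyselect::all_of(subset),
  show_col_types = FALSE
)[subset]  # reorder to match `subset`; only the selected columns are in memory
df2
```

The trailing `[subset]` mirrors the original approach but now operates on the two selected columns only.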
How can I get the same result without loading the entire file first?
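If readr 2.0+ is not available, a base R sketch: `read.csv()`'s `colClasses` accepts `"NULL"` to skip a column and `NA` to let R guess the type, so building that vector from `header` and `subset` drops the unwanted columns during parsing. The file is still scanned in full, but skipped columns are not stored. `classes` and `df3` are hypothetical names.

```r
header <- c("A", "B", "C", "D")
subset <- c("D", "B")

# Recreate the sample file so the sketch is self-contained
writeLines(c("1,2,3,4", "5,6,7,8", "9,10,11,12"), "sample-data.csv")

# "NULL" skips a column; NA lets read.csv() guess that column's type
classes <- ifelse(header %in% subset, NA_character_, "NULL")

df3 <- read.csv(
  "sample-data.csv",
  header = FALSE,
  col.names = header,   # external names, one per column in the file
  colClasses = classes
)[subset]               # keep the requested column order
df3
```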
Related questions:
- Only read selected columns: the header is included in the first row of the file.
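- Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?) [duplicate]: does not specify column names outside the file, so the answers do not apply to this case.
- how to skip reading certain columns in readr [duplicate]: different, because it seems to skip an unknown first column and read known second and third columns across multiple files. In this question, the data types are not necessarily known in advance.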
- Is there a way to omit the first column when reading a csv [duplicate]: skips columns by fixed position, rather than by looking up names in an externally supplied list of column names.
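The readr question in the list above fixes column types in advance; since the types here are not known beforehand, one possible adaptation is `cols_only()` with `col_guess()` for each wanted name, built programmatically from the external vector. A sketch, assuming readr is installed; `spec` and `df4` are hypothetical names.

```r
library(readr)

header <- c("A", "B", "C", "D")
subset <- c("D", "B")

# Recreate the sample file so the sketch is self-contained
writeLines(c("1,2,3,4", "5,6,7,8", "9,10,11,12"), "sample-data.csv")

# Equivalent to cols_only(D = col_guess(), B = col_guess()):
# listed columns are kept with guessed types, all others are skipped
spec <- do.call(cols_only, setNames(rep(list(col_guess()), length(subset)), subset))

df4 <- read_csv(
  "sample-data.csv",
  col_names = header,
  col_types = spec
)[subset]  # reorder to match `subset` regardless of the order returned
df4
```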