Here is my data (the download takes a few seconds, so please be patient):
library(dplyr)
mydata <- "https://pxdata.stat.fi:443/PxWeb/sq/87e44319-48f8-41b4-bd0d-a6629dc7829c" %>%
  paste0(".relational_table") %>%
  read.table(sep = "\t", header = TRUE)
Most rows look the way they should, for example:
> head(mydata)
  Underlying.cause.of.death..ICD.10..3.character.level.   Age Year     Sex Information Deaths
1                                         A00-Y89 Total Total 2022   Total      Deaths  63172
2                                         A00-Y89 Total Total 2022   Males      Deaths  31703
3                                         A00-Y89 Total Total 2022 Females      Deaths  31469
4                                         A00-Y89 Total     0 2022   Total      Deaths     91
5                                         A00-Y89 Total     0 2022   Males      Deaths     52
6                                         A00-Y89 Total     0 2022 Females      Deaths     39
However, some rows do not look so good:
> mydata %>% filter(grepl("\t",Underlying.cause.of.death..ICD.10..3.character.level.)) %>% head
Underlying.cause.of.death..ICD.10..3.character.level. Age Year Sex Information Deaths
1 A30 Leprosy (Hansens disease)\tTotal\t2022\tTotal\tDeaths\t0\nA30 Leprosy (Hansens disease) Total 2022 Males Deaths 0
2 A30 Leprosy (Hansens disease)\tTotal\t2022\tFemales\tDeaths\t0\nA30 Leprosy (Hansens disease) 0 2022 Total Deaths 0
3 A30 Leprosy (Hansens disease)\t0\t2022\tMales\tDeaths\t0\nA30 Leprosy (Hansens disease) 0 2022 Females Deaths 0
4 A30 Leprosy (Hansens disease)\t1 - 4\t2022\tTotal\tDeaths\t0\nA30 Leprosy (Hansens disease) 1 - 4 2022 Males Deaths 0
5 A30 Leprosy (Hansens disease)\t1 - 4\t2022\tFemales\tDeaths\t0\nA30 Leprosy (Hansens disease) 5 - 9 2022 Total Deaths 0
6 A30 Leprosy (Hansens disease)\t5 - 9\t2022\tMales\tDeaths\t0\nA30 Leprosy (Hansens disease) 5 - 9 2022 Females Deaths 0
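Any idea why this happens? If read.table is supposed to use "\t" as the column separator, why does it paste raw lines together into a single field as shown above, and why does this happen only for some rows?
Is there a better function that reads this data into a table correctly?
(I am using Windows 10, in case that is relevant to the problem.)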
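One possible lead, as a minimal sketch rather than a confirmed diagnosis: the broken rows all contain "Hansens disease", which makes me suspect the raw file actually says "Hansen's" and that read.table treats the apostrophe as a quote character (its default is quote = "\"'"). The tiny file below is made up for illustration; its column names, rows and apostrophes are assumptions, not the real data.

```r
# Hypothetical miniature of the downloaded file. The apostrophes in
# "Hansen's" are an assumption about what the raw data contains.
lines <- c(
  "cause\tAge\tDeaths",
  "A30 Leprosy (Hansen's disease)\tTotal\t0",
  "A30 Leprosy (Hansen's disease)\t0\t0",
  "A31 Other\tTotal\t5"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(lines, tmp)

# Default quoting: an apostrophe opens a quoted string that runs until the
# next apostrophe, so tabs and newlines in between end up inside one field.
bad <- read.table(tmp, sep = "\t", header = TRUE)

# Quoting disabled: every tab and newline acts as a separator again.
good <- read.table(tmp, sep = "\t", header = TRUE, quote = "")

# Compare the row counts and look for tabs embedded in the first column.
print(nrow(bad))
print(nrow(good))
print(any(grepl("\t", bad$cause, fixed = TRUE)))
```

If that assumption holds for the real file, passing quote = "" in the original read.table call, or using read.delim (whose default quote is only the double quote), might be enough to read all rows correctly.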