I have a dataset with three columns that, in theory, should have the same number of unique observations.
Here is a sample of the data:
speciesID  common_name       species
s001       common lizard     Zootoca vivipara
s002       social tuco-tuco  Ctenomys sociabilis
s002       social tuco-tuco  Ctenomys sociabilis
s002       social tuco-tuco  Ctenomys sociabilis
s002       social tuco-tuco  Ctenomys sociabilis
s002       social tuco-tuco  Ctenomys sociabilis
s003       red grouse        Lagopus lagopus scoticus
s003       red grouse        Lagopus lagopus scoticus
s004       elk               Cervus elaphus
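To make the problem reproducible, the sample above can be typed in as a tibble. This covers only the nine sample rows, not the full dataset, and assumes dplyr (which re-exports `tibble()`) is installed:

```r
library(dplyr)

# The nine sample rows from the question, typed in by hand
df <- tibble(
  speciesID   = c("s001", rep("s002", 5), rep("s003", 2), "s004"),
  common_name = c("common lizard", rep("social tuco-tuco", 5),
                  rep("red grouse", 2), "elk"),
  species     = c("Zootoca vivipara", rep("Ctenomys sociabilis", 5),
                  rep("Lagopus lagopus scoticus", 2), "Cervus elaphus")
)

df
```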
The full dataset can be found here.
However, when I count the unique observations in each column, the numbers don't match:
df %>% as_tibble() %>% count(speciesID) %>% nrow() #148 unique values
df %>% as_tibble() %>% count(common_name) %>% nrow() #150 unique values
df %>% as_tibble() %>% count(species) %>% nrow() #147 unique values
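As a side note, `n_distinct()` counts unique values directly, so the three `count() %>% nrow()` pipelines can be collapsed into a single `summarise()`. A minimal sketch on the nine sample rows, assuming dplyr 1.0 or later for `across()`; on the full data it should give the same 148/150/147 figures, since both approaches treat `NA` as its own value:

```r
library(dplyr)

# The nine sample rows from the question
df <- tibble(
  speciesID   = c("s001", rep("s002", 5), rep("s003", 2), "s004"),
  common_name = c("common lizard", rep("social tuco-tuco", 5),
                  rep("red grouse", 2), "elk"),
  species     = c("Zootoca vivipara", rep("Ctenomys sociabilis", 5),
                  rep("Lagopus lagopus scoticus", 2), "Cervus elaphus")
)

# One row, one column per original column, each holding its number of
# unique values (NA counts as a value, as it does in count())
unique_counts <- df %>%
  summarise(across(everything(), n_distinct))

unique_counts
```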
Is there a way to figure out where the 2 missing unique values in the speciesID column and the 3 missing unique values in the species column come from?
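Ideally, I'd like to identify the problem rows so that I can go back to the original data and fix the errors (i.e., there should be 150 unique records).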
I'm hoping there's a way to do this in R rather than checking roughly 700 rows by hand.
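One possible way to locate the problem rows: if `speciesID` has fewer unique values than `common_name`, then at least one ID must be paired with more than one common name, so grouping by the ID and keeping groups where `n_distinct(common_name) > 1` isolates those rows. The same logic applies to `species`. This is a sketch on hypothetical toy data, invented purely for illustration (an `s002` reused for red grouse, and elk and red deer sharing `Cervus elaphus`), not on the real dataset:

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only:
#  - row 3 reuses "s002" for red grouse (an ID typo)
#  - rows 4 and 5 share the species "Cervus elaphus"
df <- tibble(
  speciesID   = c("s001", "s002", "s002", "s004", "s005"),
  common_name = c("common lizard", "social tuco-tuco", "red grouse",
                  "elk", "red deer"),
  species     = c("Zootoca vivipara", "Ctenomys sociabilis",
                  "Lagopus lagopus scoticus", "Cervus elaphus",
                  "Cervus elaphus")
)

# speciesID values that are attached to more than one common name
problem_ids <- df %>%
  mutate(row = row_number()) %>%          # remember original row positions
  group_by(speciesID) %>%
  filter(n_distinct(common_name) > 1) %>%
  ungroup()

# species values that are attached to more than one common name
problem_species <- df %>%
  mutate(row = row_number()) %>%
  group_by(species) %>%
  filter(n_distinct(common_name) > 1) %>%
  ungroup()

problem_ids
problem_species
```

The `row` column records each row's position in the original data frame, which should make it easier to find the record in the source file. A flagged pair may be a genuine error or a legitimately shared or inconsistently spelled name (e.g. trailing spaces or different capitalization), so each hit still needs a manual look.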
I've tried using anti_join, but without success.
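`anti_join()` keeps the rows of one table that have no match in another, so if it is used to compare the data frame with itself, every key matches and nothing comes back; it does not look for one value paired with several partners. A hedged alternative is to wrap the grouping check in a small helper and run it over every pair of columns, in both directions. `find_conflicts()`, the `grp_key`/`grp_value` helper columns and the toy data are all hypothetical, for illustration only:

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
df <- tibble(
  speciesID   = c("s001", "s002", "s002", "s004", "s005"),
  common_name = c("common lizard", "social tuco-tuco", "red grouse",
                  "elk", "red deer"),
  species     = c("Zootoca vivipara", "Ctenomys sociabilis",
                  "Lagopus lagopus scoticus", "Cervus elaphus",
                  "Cervus elaphus")
)

# Hypothetical helper: rows whose `key` column value pairs with more than
# one distinct value of the `value` column (both given as strings)
find_conflicts <- function(data, key, value) {
  data %>%
    mutate(row = row_number(),                 # original row position
           grp_key = .data[[key]],             # copy columns under fixed names
           grp_value = .data[[value]]) %>%
    group_by(grp_key) %>%
    filter(n_distinct(grp_value) > 1) %>%
    ungroup() %>%
    select(-grp_key, -grp_value) %>%
    mutate(key_col = key, value_col = value)   # which check flagged the row
}

# Every ordered pair of different columns, i.e. both directions
cols  <- c("speciesID", "common_name", "species")
pairs <- expand.grid(key = cols, value = cols, stringsAsFactors = FALSE)
pairs <- pairs[pairs$key != pairs$value, ]

conflicts <- bind_rows(
  lapply(seq_len(nrow(pairs)), function(i) {
    find_conflicts(df, pairs$key[i], pairs$value[i])
  })
)

conflicts
```

A row can appear more than once in `conflicts` if several checks flag it; `key_col` and `value_col` show which check did.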
I work in R and am most comfortable with dplyr.