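This question grew out of the discussion here. I think the further elaboration warrants posting a new question, since it differs from the problem in the original post. Apologies if I am wrong.
Given a data frame similar to the following: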
mydf <- data.frame(X1=c("0_times","3-10_times", "11-20_times", "1-2_times","3-10_times",
"0_times","3-10_times", "11-20_times", "1-2_times","3-10_times" ),
X2=c('ab','bb','cb','db','eb','ab','bb','cb','db','eb'),
X3=c("11-20_times", "3-10_times","1-2_times","21-30_times","more_than_30_times",
"11-20_times", "3-10_times","1-2_times","21-30_times","more_than_30_times"),
X4=c("foo", "bar","fizz","buzz","weee","foo", "bar","fizz","buzz","weee"),
X5=c("3-10_times","1-2_times","0_times","more_than_30_times","11-20_times",
"21-30_times","1-2_times","0_times","3-10_times","11-20_times")
)
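I would like to create a second data frame that stores the column names together with a list/vector of the unique values from each column of the first data frame.
The result would look like this: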
  names vals
1 X1    0_times, 1-2_times, 3-10_times, 11-20_times
2 X2    ab, bb, cb, db, eb
3 X3    1-2_times, 3-10_times, 11-20_times, 21-30_times, more_than_30_times
4 X4    foo, bar, fizz, buzz, weee
5 X5    0_times, 1-2_times, 3-10_times, 11-20_times, 21-30_times
I created the second data frame as follows:
mydf2 <- data.frame(names = colnames(mydf))
mydf2$vals <- lapply(mydf, unique)
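I think this is fine so far. However, the challenge I face is that I need the vectors that contain numbers (in this example only mydf2$X1) to be sorted in ascending numeric order, rather than simply by the first digit of each item.
With a great deal of help from a Stack user, here is a way to sort a vector containing numbers, which works well on a single vector: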
mylist <- c('0_times','3-10_times','11_20_times','1-2_times','more_than_20_times')
o <- sapply(strsplit(mylist, '\\D+'), function(x) min(as.numeric(x[nzchar(x)])))
mylist[order(o)]
When I try to apply it to the whole mydf2$vals column by substituting the column name:
o <- sapply(strsplit(mydf2$vals, '\\D+'), function(x) min(as.numeric(x[nzchar(x)])))
mydf2$vals[order(o)]
I get the error: error in evaluating the argument 'X' in selecting a method for function 'sapply': non-character argument
I have two questions:
- Is there a simpler way to achieve my goal?
- How can I modify the suggested sorting function to avoid the error?
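For reference, a minimal sketch of one possible direction, not a definitive answer. The error seems to arise because mydf2$vals is a list, while strsplit expects a character vector, so the sort probably has to be applied to each list element separately. The helper name sort_by_number and the reduced copy of mydf are assumptions made only for this illustration:

```r
# Reduced, hypothetical copy of mydf so that the sketch runs on its own.
mydf <- data.frame(
  X1 = c("0_times", "3-10_times", "11-20_times", "1-2_times", "3-10_times"),
  X2 = c("ab", "bb", "cb", "db", "eb"),
  X5 = c("3-10_times", "1-2_times", "0_times", "more_than_30_times", "11-20_times"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order a vector by the smallest number found in each item.
sort_by_number <- function(v) {
  v <- as.character(v)  # also handles factor columns (the default before R 4.0)
  key <- vapply(strsplit(v, "\\D+"), function(x) {
    x <- x[nzchar(x)]                     # drop empty pieces left by the split
    if (length(x) == 0) NA_real_ else min(as.numeric(x))
  }, numeric(1))
  # Items without any digits get NA as key; order() puts them last
  # and keeps their original relative order.
  v[order(key)]
}

mydf2 <- data.frame(names = colnames(mydf), stringsAsFactors = FALSE)
# Apply unique() and the sort to each column, one list element at a time.
mydf2$vals <- lapply(mydf, function(col) sort_by_number(unique(col)))
```

Columns without digits, such as X2, should come back in their original order, so the same call can be used for every column.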