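As the title says, I want to convert a vector of strings into a new vector that holds whichever of two values appears in each string. Here is a very simple example data frame: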

data <- tibble::tibble(
  w = c("Strongly disagree", "Somewhat disagree", "Disagree", "Somewhat agree", "Strongly agree", "Agree"),
  x = c("Definitely true", "Probably true", "Somewhat false", "Definitely false", "Definitely true", "Definitely false"),
  y = c("Definitely not doing enough", "Definitely doing enough", "Possibly not doing enough", "Possibly doing enough", "Definitely not doing enough", "Somewhat doing enough"),
  z = c("Very comfortable", "Comfortable", "Somewhat comfortable", "Very uncomfortable", "Somewhat uncomfortable", "Comfortable")
)

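We can see that every string in w contains either "agree" or "disagree"; every string in x is either "true" or "false"; every string in y is either "doing enough" or "not doing enough"; and every string in z is either "comfortable" or "uncomfortable". Is there a function that lets me create a new vector based on which of the two values appears in each column? Let me illustrate what I mean.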

# write up a function
some_function <- function(arguments) {
  "function text goes here"
}

# use new function to create a vector based on `w` from `data`
data %>% some_function(w)

# resulting vector would be:
[1] "Disagree" "Disagree" "Disagree" "Agree"    "Agree"    "Agree"

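The closest I have gotten is the function below. However, it removes the first word of the string. That would be fine if the first word of every string were an adjective describing the rest of the string, but when the string is just a single word, it gives me NA.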

# write function
make_dicho <- function(df = data, var) {
  
  df %>% 
    # pick out the column (equivalent to df[[var]])
    dplyr::pull({{ var }}) %>% 
    # convert to a factor
    haven::as_factor() %>% 
    # remove the first part of the factor
    stringr::str_extract("(?<=\\s).+") %>%
    # make the first letter uppercase
    stringr::str_to_sentence()
  
}
# test this on the fake data
data %>% make_dicho(., w)
[1] "Disagree" "Disagree" NA         "Agree"    "Agree"    NA  

The reason I have the df argument in there is that I want to use this function inside dplyr::mutate(), as in data %>% mutate(new_a = make_dicho(., w)).

Recommended answer

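From your description, it sounds like you are happy to drop the first word as long as the string has more than one word. If a string contains no whitespace, we can assume it is a single word and leave it unchanged.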

remove_first_word  <- function(x) {
    ifelse(
        grepl("\\s", x),
        # Drop only the first word and the whitespace that follows it
        sub("^\\S+\\s+", "", x),
        x
    )  |>
    # Make first letter upper case
    gsub("^([a-z])", "\\U\\1", x = _, perl = TRUE)
}
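
As a quick sanity check of the idea above, here is a minimal sketch that applies the same "drop the first word only when there is whitespace" rule to a plain character vector. The name drop_leading_word is hypothetical, and the pattern ^\\S+\\s+ is one assumed way of matching the first word; strings without whitespace simply do not match and come back unchanged.

```r
# Hypothetical helper illustrating the rule on a plain character vector
drop_leading_word <- function(x) {
  # Remove the first run of non-space characters plus the whitespace after it;
  # single-word strings do not match the pattern and are returned as-is
  out <- sub("^\\S+\\s+", "", x)
  # Capitalise the first character of each result
  paste0(toupper(substring(out, 1, 1)), substring(out, 2))
}

res <- drop_leading_word(
  c("Strongly disagree", "Disagree", "Definitely not doing enough")
)
print(res)
```

Because sub() only replaces the first match and the pattern is anchored with ^, multi-word remainders such as "not doing enough" are kept whole.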

You can then use it inside mutate() as needed:

library(dplyr)  # provides mutate() and across()

data  |>
    mutate(
        across(w:z, remove_first_word)
    )
# # A tibble: 6 × 4
#   w        x     y                z            
#   <chr>    <chr> <chr>            <chr>        
# 1 Disagree True  Not doing enough Comfortable  
# 2 Disagree True  Doing enough     Comfortable  
# 3 Disagree False Not doing enough Comfortable  
# 4 Agree    False Doing enough     Uncomfortable
# 5 Agree    True  Not doing enough Uncomfortable
# 6 Agree    False Doing enough     Comfortable  

tidyverse version

In response to your comment, here is a stringr version of the original function:

remove_first_word_tidy  <- function(x) {
    dplyr::if_else(
        stringr::str_detect(x, "\\s"),
        stringr::str_replace(x, "\\w+\\s", ""),
        x
    )  |>
    stringr::str_to_title()
}

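You can write a function that takes a data frame and a selection of columns and applies this function to them. To stay within the tidyverse, we can use tidy-select semantics for the columns and purrr::map() to apply the function to every requested column, producing a list of vectors: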

make_dicho  <- function(dat, cols) {

    out  <- dat  |>
        dplyr::select({{cols}})  |>
        purrr::map(remove_first_word_tidy)
    
    # Return vector if only one column supplied
    if(length(out)==1) return(out[[1]])
    # Otherwise return list of vectors
    out
}


make_dicho(data, w) 
# [1] "Disagree" "Disagree" "Disagree" "Agree"    "Agree"    "Agree"   

make_dicho(data, y:z)
# $y
# [1] "Not Doing Enough" "Doing Enough"     "Not Doing Enough" "Doing Enough"     "Not Doing Enough" "Doing Enough"    

# $z
# [1] "Comfortable"   "Comfortable"   "Comfortable"   "Uncomfortable" "Uncomfortable" "Comfortable"  
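
Since the question mentions wanting to use this inside dplyr::mutate(), here is a hedged sketch of one way to add the dichotomised values as new columns while keeping the originals. The names toy, strip_first_word and the _dicho suffix are hypothetical; the helper repeats the stringr logic so the block runs on its own, and it uses str_to_sentence() (an assumption) so only the first letter is capitalised.

```r
library(dplyr)

# Toy data: a small subset of the question's columns
toy <- tibble::tibble(
  w = c("Strongly disagree", "Disagree", "Somewhat agree"),
  z = c("Very comfortable", "Comfortable", "Very uncomfortable")
)

# Hypothetical helper: drop the first word when the string has whitespace
strip_first_word <- function(x) {
  dplyr::if_else(
    stringr::str_detect(x, "\\s"),
    stringr::str_replace(x, "^\\S+\\s+", ""),
    x
  ) |>
    stringr::str_to_sentence()
}

# Add w_dicho and z_dicho next to the original columns
result <- toy |>
  mutate(across(c(w, z), strip_first_word, .names = "{.col}_dicho"))

print(result)
```

The .names argument of across() controls the names of the new columns, so the original w and z are left untouched.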
