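R: Is there a way to recode a vector of strings into a new vector with just two values, based on which of two keywords or phrases appears in each value?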

Posted on February 9

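As the title says, I want to convert a vector of strings into a new vector in which each element is whichever of two values appears in the corresponding string. Here is a very simple example data frame: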

data <- tibble::tibble(
  w = c("Strongly disagree", "Somewhat disagree", "Disagree", "Somewhat agree", "Strongly agree", "Agree"),
  x = c("Definitely true", "Probably true", "Somewhat false", "Definitely false", "Definitely true", "Definitely false"),
  y = c("Definitely not doing enough", "Definitely doing enough", "Possibly not doing enough", "Possibly doing enough", "Definitely not doing enough", "Somewhat doing enough"),
  z = c("Very comfortable", "Comfortable", "Somewhat comfortable", "Very uncomfortable", "Somewhat uncomfortable", "Comfortable")
)

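We can see that each string in w is either agree or disagree; each string in x is either true or false; each string in y is either doing enough or not doing enough; and each string in z is either comfortable or uncomfortable. Is there a function that lets me create a new vector based on which of the two values appears in each column? Let me illustrate what I mean.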

# write up a function
some_function <- function(arguments) {
  "function text goes here"
}

# use new function to create a vector based on `w` from `data`
data %>% some_function(w)

# resulting vector would be:
[1] "Disagree" "Disagree" "Disagree" "Agree"    "Agree"    "Agree"
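Read literally, the question asks to label each string by which of two keywords it contains. A minimal base R sketch of that idea follows; `recode_dicho()` and `capitalise_first()` are hypothetical names, not from the thread. It assumes the "negative" keyword contains the "positive" one ("disagree" contains "agree", "uncomfortable" contains "comfortable", "not doing enough" contains "doing enough"), so the negative match is applied last and overwrites the weaker one. Strings containing neither keyword stay NA.

```r
# Capitalise the first character of each string (base R only)
capitalise_first <- function(s) {
  paste0(toupper(substr(s, 1, 1)), substr(s, 2, nchar(s)))
}

# Hypothetical helper: label each string by which of two keywords it contains.
# `neg` is assumed to contain `pos`, so it is assigned last and wins.
recode_dicho <- function(x, neg, pos) {
  out <- rep(NA_character_, length(x))  # strings with neither keyword stay NA
  out[grepl(pos, x, ignore.case = TRUE)] <- capitalise_first(pos)
  out[grepl(neg, x, ignore.case = TRUE)] <- capitalise_first(neg)
  out
}

w <- c("Strongly disagree", "Somewhat disagree", "Disagree",
       "Somewhat agree", "Strongly agree", "Agree")
recode_dicho(w, neg = "disagree", pos = "agree")
# [1] "Disagree" "Disagree" "Disagree" "Agree"    "Agree"    "Agree"

z <- c("Very comfortable", "Comfortable", "Very uncomfortable")
recode_dicho(z, neg = "uncomfortable", pos = "comfortable")
# [1] "Comfortable"   "Comfortable"   "Uncomfortable"
```

Unlike dropping the first word, this does not depend on the position of the keyword, but it does require you to name the two keywords for each column.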

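The closest I have got is this function. However, it removes the first word of each string. That works when the first word is an adjective modifying the rest of the string, but when the string is a single word it gives me NA.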

# write function
make_dicho <- function(df = data, var) {
  
  df %>% 
    # pick out the column (equivalent to df[[var]])
    dplyr::pull({{ var }}) %>% 
    # convert to a factor
    haven::as_factor() %>% 
    # keep everything after the first whitespace (NA if there is none)
    stringr::str_extract("(?<=\\s).+") %>%
    # make the first letter uppercase
    stringr::str_to_sentence()
  
}
# test this on the fake data
data %>% make_dicho(., w)
[1] "Disagree" "Disagree" NA         "Agree"    "Agree"    NA
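The NA appears because `str_extract()` returns NA when its pattern does not match, and the lookbehind `(?<=\\s)` can only match in a string that contains whitespace. A short sketch contrasting it with `stringr::str_remove()`, which returns the input unchanged when nothing matches; swapping one for the other inside `make_dicho()` is one possible fix, assuming single-word values should be kept as they are:

```r
library(stringr)

x <- c("Strongly disagree", "Disagree")

# The lookbehind needs a whitespace character before the match,
# so a one-word string has no match and str_extract() returns NA
str_extract(x, "(?<=\\s).+")
# [1] "disagree" NA

# str_remove() deletes the first word plus the following whitespace;
# when there is no match the string is returned unchanged
str_remove(x, "^\\S+\\s+")
# [1] "disagree" "Disagree"
```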

The reason I have the df argument in there is that I want to use this function inside dplyr::mutate(), like data %>% mutate(new_a = make_dicho(., w)).
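Inside `mutate()`, a bare column name already evaluates to that column's vector, so a helper that takes a character vector and returns one of the same length can be called directly, with no df argument. A minimal sketch, assuming dplyr and stringr are installed; `drop_first_word()` and the `survey` tibble are hypothetical. The `.names` argument of `across()` adds new columns instead of overwriting the originals:

```r
library(dplyr)
library(stringr)

# Hypothetical helper: vector in, vector out, so it works inside mutate()
drop_first_word <- function(x) {
  x |>
    str_remove("^\\S+\\s+") |>  # one-word strings have no match and are kept
    str_to_sentence()           # first letter upper case, rest lower case
}

survey <- tibble(
  w = c("Strongly disagree", "Disagree", "Somewhat agree", "Agree"),
  z = c("Very comfortable", "Comfortable", "Very uncomfortable", "Somewhat uncomfortable")
)

# A single column: pass it by name
survey |> mutate(new_w = drop_first_word(w))

# Several columns: keep the originals and add w_dicho / z_dicho
survey |> mutate(across(c(w, z), drop_first_word, .names = "{.col}_dicho"))
```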

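Recommended answer

remove_first_word <- function(x) {
  ifelse(
    grepl("\\s", x),
    # Drop the first word and the whitespace after it
    sub("^\\S+\\s+", "", x),
    # One-word strings are returned unchanged
    x
  ) |>
    # Make first letter upper case
    gsub("^([a-z])", "\\U\\1", x = _, perl = TRUE)
}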

library(dplyr)

data |>
  mutate(across(w:z, remove_first_word))
# # A tibble: 6 × 4
#   w        x     y                z
#   <chr>    <chr> <chr>            <chr>
# 1 Disagree True  Not doing enough Comfortable
# 2 Disagree True  Doing enough     Comfortable
# 3 Disagree False Not doing enough Comfortable
# 4 Agree    False Doing enough     Uncomfortable
# 5 Agree    True  Not doing enough Uncomfortable
# 6 Agree    False Doing enough     Comfortable

make_dicho <- function(dat, cols) {
  out <- dat |>
    select({{ cols }}) |>
    # remove_first_word_tidy() is the answer's tidyverse counterpart of
    # remove_first_word(); its definition is not included here
    purrr::map(remove_first_word_tidy)
  # Return vector if only one column supplied
  if (length(out) == 1) return(out[[1]])
  # Otherwise return list of vectors
  out
}

make_dicho(data, w)
# [1] "Disagree" "Disagree" "Disagree" "Agree"    "Agree"    "Agree"

make_dicho(data, y:z)
# $y
# [1] "Not Doing Enough" "Doing Enough"     "Not Doing Enough" "Doing Enough"     "Not Doing Enough" "Doing Enough"
#
# $z
# [1] "Comfortable"   "Comfortable"   "Comfortable"   "Uncomfortable" "Uncomfortable" "Comfortable"
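`make_dicho()` calls `remove_first_word_tidy()`, which is not defined in the answer as captured on this page. The definition below is a hypothetical stringr-based stand-in, our own guess chosen only because it reproduces the title-case output shown above ("Not Doing Enough"); it is not the answer author's code.

```r
library(stringr)

# Hypothetical definition (not from the answer): drop the first word when the
# string has more than one word, then convert the result to title case
remove_first_word_tidy <- function(x) {
  x |>
    str_remove("^\\S+\\s+") |>  # one-word strings have no match and are kept
    str_to_title()
}

remove_first_word_tidy(c("Definitely not doing enough", "Possibly doing enough", "Comfortable"))
# [1] "Not Doing Enough" "Doing Enough"     "Comfortable"
```

If you prefer sentence case, as in the base R version, `str_to_sentence()` can replace `str_to_title()`.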
