R: mutate to create multiple new columns based on existing column names and the presence of a string

Posted on March 12

I have three questions, but they are related.

My starting point is the data frame `table_2`, whose `dput()` output is given after the questions.

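Question 1: I want to create five new columns: `# Enterprises_Sig`, `Revenues_Sig`, `Costs_Sig`, `Net Revenues_Sig` and `Assets_Sig`. Each of these should contain the same number of ∗ as the corresponding original column, so that the original columns end up containing only numbers. For example, the code below does what I need, but only for one column.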

table_2 <- table_2 %>%
  mutate(
    "Net Revenues_Sig" = ifelse(
      str_count(table_2$`Net Revenues`, "∗") == 1, "∗",
      ifelse(
        str_count(table_2$`Net Revenues`, "∗") == 2, "∗∗",
        ifelse(str_count(table_2$`Net Revenues`, "∗") == 3, "∗∗∗", "")
      )
    )
  )

table_2$`Net Revenues` <- str_replace_all(table_2$`Net Revenues`, "[∗]", "")

to produce

Of course, I could repeat this process four more times, but surely there is a more efficient way to do it?

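Question 2: I would like to do something similar for the square brackets. How would I create five new columns holding the respective standard errors without the brackets (for example, the new column `Revenues_SE` would be numeric and contain the values 24346.05, 16080.92, 34895.03), and then delete those three rows (so that `Revenues` keeps only the three values for the long term, short term and lumpsum arms)?

Question 3: How do I convert all columns except the five Sig columns to numeric (they are currently character because of the brackets and asterisks)?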

structure(list(Treatment = c("Long Term Arm", "", "Short Term Arm", 
"", "Lumpsum Arm", ""), `# Enterprises` = c("9.93∗∗", "[3.96]", 
"3.39", "[3.57]", "14.67∗∗∗", "[3.92]"), Revenues = c("61379.40∗∗", 
"[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", 
"[34895.03]"), Costs = c("32055.29∗", "[16478.13]", "8497.42", 
"[10462.44]", "71903.23∗∗∗", "[24360.84]"), `Net Revenues` =
c("28226.05∗∗",  "[12334.27]", "14824.71∗", "[8143.69]",
"35576.39∗∗∗",  "[13382.81]"), Assets = c("36050.66∗∗∗", "[12589.11]",
"16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")), row.names =
3:8, class = "data.frame")

# structure
table_2 <- structure(
  list(
    Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
    `# Enterprises` = c("9.93∗∗", "[3.96]", "3.39", "[3.57]", "14.67∗∗∗", "[3.92]"),
    Revenues = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", "[34895.03]"),
    Costs = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]", "71903.23∗∗∗", "[24360.84]"),
    `Net Revenues` = c("28226.05∗∗", "[12334.27]", "14824.71∗", "[8143.69]", "35576.39∗∗∗", "[13382.81]"),
    Assets = c("36050.66∗∗∗", "[12589.11]", "16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")
  ),
  row.names = 3:8,
  class = "data.frame"
)

# turn to dt
table_2 <- data.table::as.data.table(table_2)

# redo treatment: carry the previous treatment label down to the empty (SE) rows
trt <- table_2$Treatment
for (i in seq_len(length(trt))) {
  t <- trt[i]
  if (t == "") {
    trt[i] <- trt[i - 1]
  }
}
table_2[, Treatment := trt]

# get columns to change
to_change <- colnames(table_2)[colnames(table_2) != "Treatment"]

# add on Sig to each column
to_change_sig <- paste0(to_change, " Sig")

# first function - extract all ∗ and collapse
fun <- function(y) {
  z <- stringr::str_extract_all(y, "∗")
  lapply(z, function(x) paste0(x, collapse = "")) |>
    unlist()
}

# extracts the stars
table_2[, (to_change_sig) := lapply(.SD, fun), .SDcols = to_change]

# melt down
table_2 <- data.table::melt(table_2, id.vars = "Treatment")
table_2[, grp := ifelse(stringr::str_detect(value, "\\["), "SE", "Estimate")]
table_2[variable %in% to_change_sig, grp := "Sig"]

# function to remove all brackets and stars, turn to numeric
fun <- function(y) {
  z <- stringr::str_remove_all(y, "\\[|\\]|∗")
  as.numeric(z)
}

# apply it only to estimate and se
table_2[grp != "Sig", value := fun(value)]

# remove " Sig" from the significance variables (for casting wide)
table_2[, variable := stringr::str_remove_all(variable, " Sig")]

# order the table (like dplyr::arrange)
data.table::setorder(table_2, Treatment, variable, grp)

# remove values where there are no stars
table_2 <- table_2[value != ""]

# cast wide
table_2 <- data.table::dcast(
  table_2,
  Treatment + variable ~ grp,
  value.var = "value"
)

r$> table_2
Key: <Treatment, variable>
         Treatment      variable  Estimate       SE    Sig
            <char>        <char>    <char>   <char> <char>
 1:  Long Term Arm # Enterprises      9.93     3.96     ∗∗
 2:  Long Term Arm        Assets  36050.66 12589.11    ∗∗∗
 3:  Long Term Arm         Costs  32055.29 16478.13      ∗
 4:  Long Term Arm  Net Revenues  28226.05 12334.27     ∗∗
 5:  Long Term Arm      Revenues   61379.4 24346.05     ∗∗
 6:    Lumpsum Arm # Enterprises     14.67     3.92    ∗∗∗
 7:    Lumpsum Arm        Assets  29404.54 10977.68    ∗∗∗
 8:    Lumpsum Arm         Costs  71903.23 24360.84    ∗∗∗
 9:    Lumpsum Arm  Net Revenues  35576.39 13382.81    ∗∗∗
10:    Lumpsum Arm      Revenues 107746.75 34895.03    ∗∗∗
11: Short Term Arm # Enterprises      3.39     3.57   <NA>
12: Short Term Arm        Assets  16441.81 10029.27   <NA>
13: Short Term Arm         Costs   8497.42 10462.44   <NA>
14: Short Term Arm  Net Revenues  14824.71  8143.69      ∗
15: Short Term Arm      Revenues  23177.47 16080.92   <NA>

# redo treatment: carry the previous treatment label down to the empty (SE) rows
trt <- table_2$Treatment
for (i in seq_len(length(trt))) {
  t <- trt[i]
  if (t == "") {
    trt[i] <- trt[i - 1]
  }
}
table_2[, Treatment := trt]
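The fill-down step above can also be written without an explicit loop. The following is a minimal base R sketch, not part of the original answer; the name `filled` is illustrative, and it assumes the first element of `trt` is never empty.

```r
# Treatment labels as they appear in the data: every second entry is empty.
trt <- c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", "")

# cumsum() over the non-empty flags gives, for each position, the index of
# the most recent non-empty label (1, 1, 2, 2, 3, 3 here).
# Assumption: trt[1] is non-empty, otherwise the leading index would be 0.
filled <- trt[trt != ""][cumsum(trt != "")]

print(filled)
```

If `tidyr` is already loaded, replacing the empty strings with `NA` and calling `tidyr::fill()` would be another way to get the same effect.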

# first function - extract all ∗ and collapse
fun <- function(y) {
  z <- stringr::str_extract_all(y, "∗")
  lapply(z, function(x) paste0(x, collapse = "")) |>
    unlist()
}

# extracts the stars
table_2[, (to_change_sig) := lapply(.SD, fun), .SDcols = to_change]

r$> head(table_2)
        Treatment # Enterprises     Revenues       Costs Net Revenues      Assets
           <char>        <char>       <char>      <char>       <char>      <char>
1:  Long Term Arm        9.93∗∗   61379.40∗∗   32055.29∗   28226.05∗∗ 36050.66∗∗∗
2:  Long Term Arm        [3.96]   [24346.05]  [16478.13]   [12334.27]  [12589.11]
3: Short Term Arm          3.39     23177.47     8497.42    14824.71∗    16441.81
4: Short Term Arm        [3.57]   [16080.92]  [10462.44]    [8143.69]  [10029.27]
5:    Lumpsum Arm      14.67∗∗∗ 107746.75∗∗∗ 71903.23∗∗∗  35576.39∗∗∗ 29404.54∗∗∗
6:    Lumpsum Arm        [3.92]   [34895.03]  [24360.84]   [13382.81]  [10977.68]
   # Enterprises Sig Revenues Sig Costs Sig Net Revenues Sig Assets Sig
              <char>       <char>    <char>           <char>     <char>
1:                 1            1         1                1          1
2:                 1            1         1                1          1
3:                 1            1         1                1          1
4:                 1            1         1                1          1
5:                 1            1         1                1          1
6:                 1            1         1                1          1
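The answer above returns a long table with one row per treatment and variable. For the wide layout described in the three questions (`_Sig` and `_SE` columns next to numeric estimates), a `dplyr::across()` sketch along the following lines may work. It is not from the original answer: the names `value_cols`, `est`, `se` and `wide` are illustrative, only two of the five value columns are included to keep it short, and it assumes estimate and standard-error rows strictly alternate.

```r
library(dplyr)
library(stringr)

# Reduced copy of the question's data (two value columns only).
table_2 <- data.frame(
  Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
  Revenues  = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]",
                "107746.75∗∗∗", "[34895.03]"),
  Costs     = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]",
                "71903.23∗∗∗", "[24360.84]"),
  check.names = FALSE
)

value_cols <- setdiff(names(table_2), "Treatment")

# Assumption: estimates sit on odd rows, their standard errors on the even rows.
est <- table_2[seq(1, nrow(table_2), by = 2), ]
se  <- table_2[seq(2, nrow(table_2), by = 2), value_cols, drop = FALSE]

wide <- est %>%
  mutate(
    # Question 1: one *_Sig column per value column, holding only the stars.
    across(all_of(value_cols),
           ~ str_dup("∗", str_count(.x, fixed("∗"))),
           .names = "{.col}_Sig"),
    # Question 3: strip the stars and convert the estimates to numeric.
    across(all_of(value_cols),
           ~ as.numeric(str_remove_all(.x, fixed("∗"))))
  ) %>%
  # Question 2: bracket-free numeric standard errors as *_SE columns.
  bind_cols(
    se %>%
      mutate(across(everything(),
                    ~ as.numeric(str_remove_all(.x, "\\[|\\]")))) %>%
      rename_with(~ paste0(.x, "_SE"))
  )

print(wide)
```

The `_Sig` columns are computed first inside `mutate()`, so they still see the original strings before the second `across()` overwrites the value columns with numbers.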
