I have three questions, but they are related.

My starting point is the original table, reproduced as a structure() call at the end of this question.

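Question 1: I would like to create five new columns: # Enterprises_Sig, Revenues_Sig, Costs_Sig, Net Revenues_Sig, and Assets_Sig. Each would hold the same number of ∗ characters as appear in its respective column, so that the original columns contain only numbers. For example, the code below does what I need, but only for one column.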

table_2 <- table_2 %>% mutate("Net Revenues_Sig" = ifelse(str_count(table_2$`Net Revenues`,"∗")==1, "∗",ifelse(str_count(table_2$`Net Revenues`,"∗")==2, "∗∗",ifelse(str_count(table_2$`Net Revenues`,"∗")==3, "∗∗∗",""))))

table_2$`Net Revenues` <- str_replace_all(table_2$`Net Revenues`, "[∗]", "")

This produces the desired result for that one column.

Of course, I could repeat this process four more times, but surely there is a more efficient way to do it?

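Question 2: I would like to do something similar, but for the square brackets. How would I create five new columns holding the respective standard errors without brackets (for example, a new column Revenues_SE would be numeric, with the values 24346.05, 16080.92, 34895.03), and then delete those three rows (so that Revenues has only three values: Long Term, Short Term, and Lumpsum)?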

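Question 3: How can I convert all columns other than the five Sig columns to numeric (they are currently character because of the brackets and asterisks)?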

structure(list(Treatment = c("Long Term Arm", "", "Short Term Arm", 
"", "Lumpsum Arm", ""), `# Enterprises` = c("9.93∗∗", "[3.96]", 
"3.39", "[3.57]", "14.67∗∗∗", "[3.92]"), Revenues = c("61379.40∗∗", 
"[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", 
"[34895.03]"), Costs = c("32055.29∗", "[16478.13]", "8497.42", 
"[10462.44]", "71903.23∗∗∗", "[24360.84]"), `Net Revenues` =
c("28226.05∗∗",  "[12334.27]", "14824.71∗", "[8143.69]",
"35576.39∗∗∗",  "[13382.81]"), Assets = c("36050.66∗∗∗", "[12589.11]",
"16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")), row.names =
3:8, class = "data.frame")

Recommended answer

For what it's worth, this does what you want. You need library(data.table) and library(stringr).

# structure
table_2 <- structure(
    list(
        Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
        `# Enterprises` = c("9.93∗∗", "[3.96]", "3.39", "[3.57]", "14.67∗∗∗", "[3.92]"),
        Revenues = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", "[34895.03]"),
        Costs = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]", "71903.23∗∗∗", "[24360.84]"),
        `Net Revenues` = c("28226.05∗∗", "[12334.27]", "14824.71∗", "[8143.69]", "35576.39∗∗∗", "[13382.81]"),
        Assets = c("36050.66∗∗∗", "[12589.11]", "16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")
    ),
    row.names = 3:8, class = "data.frame"
)

# turn to dt
table_2 <- data.table::as.data.table(table_2)

# redo treatment: fill each blank label with the label from the row above
trt <- table_2$Treatment
for (i in seq_along(trt)) {
    if (trt[i] == "" && i > 1) {
        trt[i] <- trt[i - 1]
    }
}
table_2[, Treatment := trt]

# get columns to change
to_change <- colnames(table_2)[colnames(table_2) != "Treatment"]

# add on Sig to each column
to_change_sig <- paste0(to_change, " Sig")

# first function - extract all ∗ and collapse
fun <- function(y) {
    z <- stringr::str_extract_all(y, "∗")
    lapply(z, function(x) paste0(x, collapse = "")) |> unlist()
}

# extracts the stars
table_2[, (to_change_sig) := lapply(.SD, fun), .SDcols = to_change]

# melt down
table_2 <- data.table::melt(table_2, id.vars = "Treatment")
table_2[, grp := ifelse(stringr::str_detect(value, "\\["), "SE", "Estimate")]
table_2[variable %in% to_change_sig, grp := "Sig"]

# function to remove all brackets and stars, turn to numeric
fun <- function(y) {
    z <- stringr::str_remove_all(y, "\\[|\\]|∗")
    as.numeric(z)
}

# apply it only to estimate and se
table_2[grp != "Sig", value := fun(value)]

# remove " Sig" from the significance variables (for casting wide)
table_2[, variable := stringr::str_remove_all(variable, " Sig")]

# order the table (like dplyr::arrange)
data.table::setorder(table_2, Treatment, variable, grp)

# remove values where there are no stars
table_2 <- table_2[value != ""]

# cast wide
table_2 <- data.table::dcast(
    table_2,
    Treatment + variable ~ grp,
    value.var = "value"
)
r$> table_2
Key: <Treatment, variable>
         Treatment      variable  Estimate       SE    Sig
            <char>        <char>    <char>   <char> <char>
 1:  Long Term Arm # Enterprises      9.93     3.96     ∗∗
 2:  Long Term Arm        Assets  36050.66 12589.11    ∗∗∗
 3:  Long Term Arm         Costs  32055.29 16478.13      ∗
 4:  Long Term Arm  Net Revenues  28226.05 12334.27     ∗∗
 5:  Long Term Arm      Revenues   61379.4 24346.05     ∗∗
 6:    Lumpsum Arm # Enterprises     14.67     3.92    ∗∗∗
 7:    Lumpsum Arm        Assets  29404.54 10977.68    ∗∗∗
 8:    Lumpsum Arm         Costs  71903.23 24360.84    ∗∗∗
 9:    Lumpsum Arm  Net Revenues  35576.39 13382.81    ∗∗∗
10:    Lumpsum Arm      Revenues 107746.75 34895.03    ∗∗∗
11: Short Term Arm # Enterprises      3.39     3.57   <NA>
12: Short Term Arm        Assets  16441.81 10029.27   <NA>
13: Short Term Arm         Costs   8497.42 10462.44   <NA>
14: Short Term Arm  Net Revenues  14824.71  8143.69      ∗
15: Short Term Arm      Revenues  23177.47 16080.92   <NA>
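
The printed result shows Estimate and SE as <char>, presumably because the melted value column was character (it also held the stars). If numeric columns are needed, as Question 3 asks, a conversion along these lines should work. This is a minimal sketch: `res` is a small hypothetical stand-in for the cast table (values copied from two of the Revenues rows above), and `num_cols` is just an illustrative name.

```r
library(data.table)

# Hypothetical stand-in for the cast result, with character Estimate/SE columns
res <- data.table(
    Treatment = c("Long Term Arm", "Short Term Arm"),
    variable  = c("Revenues", "Revenues"),
    Estimate  = c("61379.4", "23177.47"),
    SE        = c("24346.05", "16080.92"),
    Sig       = c("∗∗", NA)
)

# Convert only the estimate and standard-error columns; Sig stays character
num_cols <- c("Estimate", "SE")
res[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]

str(res)
```

The same two lines could be applied to table_2 itself right after the dcast() call.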

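It looks like you are working with some kind of model output that was converted to a data.frame. If that is the case, you may want to see whether you can extract the same information directly from the model instead of using this approach.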
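
As a rough illustration of that idea, here is how estimates, standard errors, and stars could be pulled straight from a fitted model. Everything here is an assumption for the sake of the example: the model is an lm() fit on the built-in mtcars data (not your model), and the star thresholds (0.01, 0.05, 0.1) are just one common convention.

```r
# Illustrative model only; substitute your own fitted object
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: estimate, standard error, t value, p value
coefs <- as.data.frame(coef(summary(fit)))
names(coefs) <- c("Estimate", "SE", "t", "p")

# Assumed star convention: *** p <= 0.01, ** p <= 0.05, * p <= 0.1
coefs$Sig <- as.character(cut(
    coefs$p,
    breaks = c(0, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ""),
    include.lowest = TRUE
))

coefs
```

With this route the numbers never pass through strings, so no star or bracket stripping is needed.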

Anyway, hope this helps! :)

EDIT

I had not seen the second part of the question above, where you also want to add SE columns. The solution now does that.

Explanation:

To add the SE columns, we need to fill in the corresponding treatments, because they are blank. That is what these lines of code do:

# redo treatment: fill each blank label with the label from the row above
trt <- table_2$Treatment
for (i in seq_along(trt)) {
    if (trt[i] == "" && i > 1) {
        trt[i] <- trt[i - 1]
    }
}
table_2[, Treatment := trt]
r$> table_2$Treatment
[1] "Long Term Arm"  "Long Term Arm"  "Short Term Arm"
[4] "Short Term Arm" "Lumpsum Arm"    "Lumpsum Arm" 
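
If you prefer to avoid the loop, the same fill-down can be sketched in base R with cumsum(). This assumes the first label is never blank; `filled_idx` and `trt_filled` are illustrative names.

```r
trt <- c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", "")

# For each position, the running count of non-blank labels seen so far,
# i.e. the index of the most recent non-blank label
filled_idx <- cumsum(trt != "")

# Look that index up among the non-blank labels
trt_filled <- trt[trt != ""][filled_idx]
trt_filled
```

If the vector could start with a blank, the index would be 0 for those positions and they would be dropped, so the loop (or an explicit check) is safer in that case.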

To extract the stars, I use

# first function - extract all ∗ and collapse
fun <- function(y) {
    z <- stringr::str_extract_all(y, "∗")
    lapply(z, function(x) paste0(x, collapse = "")) |> unlist()
}

The reason this looks complicated is that str_extract_all returns a list. If we set

y <- table_2$Costs  # the original character column, before melting
r$> stringr::str_extract_all(y, "∗")
[[1]]
[1] "∗"

[[2]]
character(0)

[[3]]
character(0)

[[4]]
character(0)

[[5]]
[1] "∗" "∗" "∗"

[[6]]
character(0)

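You can see that in element 5 the stars are not joined together but form a vector. Think c("*", "*", "*") rather than "***". This is where the collapse argument of paste0 comes in (with z holding the list returned above):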

r$> lapply(z, function(x) paste0(x, collapse = ""))
[[1]]
[1] "∗"

[[2]]
[1] ""

[[3]]
[1] ""

[[4]]
[1] ""

[[5]]
[1] "∗∗∗"

[[6]]
[1] ""

Then we just need to unlist, so that we can put it back into the data.frame.
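
A shorter route to the same star strings, offered as an alternative sketch rather than what the solution above uses, is to count the stars and rebuild them with base R's strrep(). The `costs` vector is copied from the question's data; `n_stars` and `stars` are illustrative names.

```r
library(stringr)

costs <- c("32055.29∗", "[16478.13]", "8497.42",
           "[10462.44]", "71903.23∗∗∗", "[24360.84]")

# Count literal star characters per element (fixed() avoids regex interpretation)
n_stars <- str_count(costs, fixed("∗"))

# Repeat the star that many times; zero repeats gives ""
stars <- strrep("∗", n_stars)
stars
```

This returns a plain character vector directly, so no lapply() or unlist() step is needed.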

As for

# first function - extract all ∗ and collapse
fun <- function(y) {
    z <- stringr::str_extract_all(y, "∗")
    lapply(z, function(x) paste0(x, collapse = "")) |> unlist()
}

# extracts the stars
table_2[, (to_change_sig) := lapply(.SD, fun), .SDcols = to_change]

This is some of the more confusing data.table syntax, but basically we define to_change_sig as the column names with " Sig" appended. If we did

table_2[, (to_change_sig) := 1]

it would create all of those columns, filled with 1s:

r$> head(table_2)
        Treatment # Enterprises     Revenues       Costs Net Revenues      Assets
           <char>        <char>       <char>      <char>       <char>      <char>
1:  Long Term Arm        9.93∗∗   61379.40∗∗   32055.29∗   28226.05∗∗ 36050.66∗∗∗
2:  Long Term Arm        [3.96]   [24346.05]  [16478.13]   [12334.27]  [12589.11]
3: Short Term Arm          3.39     23177.47     8497.42    14824.71∗    16441.81
4: Short Term Arm        [3.57]   [16080.92]  [10462.44]    [8143.69]  [10029.27]
5:    Lumpsum Arm      14.67∗∗∗ 107746.75∗∗∗ 71903.23∗∗∗  35576.39∗∗∗ 29404.54∗∗∗
6:    Lumpsum Arm        [3.92]   [34895.03]  [24360.84]   [13382.81]  [10977.68]
   # Enterprises Sig Revenues Sig Costs Sig Net Revenues Sig Assets Sig
              <char>       <char>    <char>           <char>     <char>
1:                 1            1         1                1          1
2:                 1            1         1                1          1
3:                 1            1         1                1          1
4:                 1            1         1                1          1
5:                 1            1         1                1          1
6:                 1            1         1                1          1

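That is the first part. What := lapply(.SD, fun), .SDcols = to_change does is say "apply the function fun to the columns in .SDcols". So we create the columns in to_change_sig by applying fun to each of the original columns specified in to_change. As you can probably imagine, the order of to_change must match the order of to_change_sig.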
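
The pattern may be easier to see on a toy table. In this sketch the table `dt`, the column names, and the use of toupper() as the function are all made up for illustration:

```r
library(data.table)

dt <- data.table(a = c("x", "y"), b = c("p", "q"))

src_cols <- c("a", "b")
new_cols <- paste0(src_cols, "_up")  # "a_up", "b_up", same order as src_cols

# .SD holds only the columns named in .SDcols; lapply() returns one result per
# column, and := assigns them to new_cols position by position
dt[, (new_cols) := lapply(.SD, toupper), .SDcols = src_cols]
dt
```

Because the assignment is positional, a_up comes from a and b_up comes from b only because the two name vectors are in the same order.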

Since you also want SE as a separate column, I think the best approach is to melt and then cast wide again. Hence the update.
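
Stripped down to a toy table, the melt-then-cast step looks roughly like this. The table `tbl` and its values are invented for illustration; only the shape mirrors the real data (one row per treatment and row type, one column per outcome).

```r
library(data.table)

tbl <- data.table(
    Treatment = c("A", "A", "B", "B"),
    grp       = c("Estimate", "SE", "Estimate", "SE"),
    Revenues  = c(10, 1, 20, 2),
    Costs     = c(5, 0.5, 8, 0.8)
)

# Long form: one row per Treatment x grp x outcome
long <- melt(tbl, id.vars = c("Treatment", "grp"))

# Wide again, but now with Estimate and SE side by side for each outcome
out <- dcast(long, Treatment + variable ~ grp, value.var = "value")
out
```

Each Treatment and outcome pair ends up on a single row with its Estimate and SE as separate columns, which is the layout shown in the result above.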
