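I'm a researcher running a binomial regression (and coding, and doing statistics) for work for the first time ever, and it has been an experience! I took over this project midway, so I did not develop the original code myself. I have never programmed before, so I have been learning R as I go. However, I have run into an error that I cannot resolve (although I suspect it is probably simple), and any help would be greatly appreciated. I have shown it in more detail below and can attach screenshots if that helps.

The initial dataset was 1,276 individuals (rows), each answering a selection of 188 questions (columns). Since then I have been asked to add responses to 8 more questions to this initial dataset, which means 196 questions (columns) in the final dataset. Overall there are only 9 columns, and that has stayed the same. However, I am having trouble adjusting the code to account for the new columns.

Any suggestions as to what might be causing the row mismatch are welcome!

For example, my first version of the code runs: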

Ans_Data = read_xlsx("DSM Data 15.2.23 IB v4.xlsx",
  sheet = "CHANGED Tab 2 - AR weighted",
  range = "A12:GG1290", col_names = F, col_types = c("text",rep("numeric",188)))
Question_Data = t(read_xlsx("DSM Data 15.2.23 IB v4.xlsx",
  sheet = "CHANGED Tab 2 - AR weighted",
  range = "A1:GG10", col_names = T))

colnames(Question_Data) = Question_Data[1,] 
Question_Data = Question_Data[-1,] 
Question_Data = data.table(Question_Data)

Ans_Data_2 = Ans_Data %>%
  pivot_longer(cols = colnames(Ans_Data)[2:189])

for (i in 1:1278) {
  if (i==1) {
    Question_Data_2 = rbind(Question_Data,Question_Data)
  } else {
    Question_Data_2 = rbind(Question_Data_2,Question_Data)
  }
}

Ans_Data_3 = cbind(Ans_Data_2, Question_Data_2)

However, my updated code, which is as follows:

Ans_Data = read_xlsx("DSM Data 15.2.23 DP v5.xlsx",
  sheet = "CHANGED Tab 2 - AR weighted",
  range = "A12:GO1287", col_names = F,col_types = c("text",rep("numeric",196)))
Question_Data = t(read_xlsx("DSM Data 15.2.23 DP v5.xlsx", 
  sheet = "CHANGED Tab 2 - AR weighted",
  range = "A1:GO10", col_names = T))

colnames(Question_Data) = Question_Data[1,] 
Question_Data = Question_Data[-1,] 
Question_Data = data.table(Question_Data)

Ans_Data_2 = Ans_Data %>%
  pivot_longer(cols = colnames(Ans_Data)[2:197])

for (i in 1:1278) {
  if (i==1) {
    Question_Data_2 = rbind(Question_Data,Question_Data)
  } else {
    Question_Data_2 = rbind(Question_Data_2,Question_Data)
  }
}

Ans_Data_3 = cbind(Ans_Data_2, Question_Data_2)

produces the following error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 250096, 250684

Recommended answer

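Sorry for writing this as an answer (I can't comment yet). I stumbled across your code and it somehow caught my attention. Anyway, your error is quite simple: you are trying to "column bind" (cbind) two data frames that have different numbers of rows. Why that happens is another question.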
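To see the error in isolation, here is a minimal sketch with two made-up data frames (`a` and `b` are placeholders, not your data). It should raise the same kind of message as your `cbind()` call, only with smaller numbers:

```r
# Two toy data frames with different row counts (hypothetical stand-ins)
a <- data.frame(x = 1:3)
b <- data.frame(y = 1:4)

# cbind() on data frames requires equal row counts; capture the error text
msg <- tryCatch(cbind(a, b), error = function(e) conditionMessage(e))
msg
```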

So, reading through the code, you import two datasets:

Ans_Data = read_xlsx("DSM Data 15.2.23 DP v5.xlsx", sheet = "CHANGED Tab 2 - AR weighted", range = "A12:GO1287", col_names = F,col_types = c("text",rep("numeric",196)))

Question_Data = t(read_xlsx("DSM Data 15.2.23 DP v5.xlsx", sheet = "CHANGED Tab 2 - AR weighted", range = "A1:GO10", col_names = T))

From the naming of the datasets I assume that Ans_Data holds the responses. It is a dataset of 197 columns (A to GO) and 1276 rows (12 to 1287). You later pivot that data frame into long format; in your case that creates a data frame with 250096 rows, which is the 196 pivoted columns (from 2:197) times the 1276 rows.
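As a rough check of that arithmetic, here is a small sketch. The `toy` data frame is invented for illustration (3 respondents, 4 questions); only the range numbers at the end come from your code:

```r
library(tidyr)

# Toy stand-in for Ans_Data: one ID column plus 4 answer columns, 3 respondents
toy <- data.frame(id = c("a", "b", "c"),
                  q1 = 1:3, q2 = 4:6, q3 = 7:9, q4 = 10:12)

# Same pattern as in the question: pivot every column except the first
toy_long <- pivot_longer(toy, cols = colnames(toy)[2:5])
nrow(toy_long)                 # 3 respondents * 4 questions = 12

# Same arithmetic for the ranges in the updated code
n_people    <- 1287 - 12 + 1   # spreadsheet rows 12 to 1287
n_questions <- 197 - 1         # columns B to GO, i.e. 2:197
n_people * n_questions         # 250096
```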

The second dataset (Question_Data) is read from a 10-row by 197-column range (A1:GO10) and then transposed (the t()), so it ends up with 197 rows, one per spreadsheet column. You then use the first row of that as the column names and drop it, leaving 196 rows. Later you run a loop that, for i = 1, row-binds Question_Data to itself, giving 392 rows in Question_Data_2. For each i > 1 it appends another 196 rows, and that happens 1277 times. The resulting data frame Question_Data_2 therefore has 392 + 196 * 1277 = 250684 rows.
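The loop's off-by-one behaviour is easier to see at a small scale. This is only a sketch: `question_data` is a made-up 5-row table and `n_iter` stands in for the 1278 in your loop, but the loop body has the same shape as yours:

```r
# Hypothetical stand-in for Question_Data: 5 question rows instead of 196
question_data <- data.frame(question = paste0("Q", 1:5))
n_iter <- 10   # stands in for the 1278 in `for (i in 1:1278)`

for (i in 1:n_iter) {
  if (i == 1) {
    question_data_2 <- rbind(question_data, question_data)    # 2 copies at once
  } else {
    question_data_2 <- rbind(question_data_2, question_data)  # 1 more copy
  }
}

# The loop always yields one more copy than the number of iterations
copies <- nrow(question_data_2) / nrow(question_data)
copies              # n_iter + 1 = 11

# Same formula with the numbers from the question
196 * (1278 + 1)    # 250684
```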

Your datasets have 250096 and 250684 rows, so as mentioned cbind gives an error. Assuming Question_Data is the design matrix and Ans_Data the responses, the code was probably built to match the design matrix to the responses. Given you want 196 responses from 1276 individuals, this should be 250096 rows (196 times 1276). So I would suggest that the sequence you loop through is too long and should be 1:1275. It is 1275 rather than 1276 because the first pass already adds two copies in the if clause. That also explains why the first version ran: its range A12:GG1290 covers 1279 rows, which is exactly the 1279 copies that 1:1278 produces.
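If you would rather not hard-code the loop length at all, one option is to size the repeated question table from the number of respondents. This is a sketch under assumptions, not the original author's method: `ans_data`, `question_data` and the `topic` column are invented toy data, and plain data frames are used instead of your data.table (with a data.table, `Question_Data[idx]` should do the same row selection):

```r
library(tidyr)

# Toy stand-ins (hypothetical): 4 respondents, 3 questions
ans_data <- data.frame(id = c("p1", "p2", "p3", "p4"),
                       q1 = c(1, 0, 1, 1),
                       q2 = c(0, 0, 1, 0),
                       q3 = c(1, 1, 0, 0))
question_data <- data.frame(question = c("q1", "q2", "q3"),
                            topic    = c("sleep", "mood", "diet"))

# Long format: one row per respondent and question, questions cycling fastest
ans_data_2 <- pivot_longer(ans_data, cols = colnames(ans_data)[2:4])

# Repeat the whole question table once per respondent, with no loop
idx <- rep(seq_len(nrow(question_data)), times = nrow(ans_data))
question_data_2 <- question_data[idx, , drop = FALSE]
rownames(question_data_2) <- NULL

# Fail early with a clear check before binding
stopifnot(nrow(ans_data_2) == nrow(question_data_2))
ans_data_3 <- cbind(ans_data_2, question_data_2)

# Sanity check: each answer row sits next to its own question's metadata
all(ans_data_3$name == ans_data_3$question)
```

Because the number of copies comes from `nrow(ans_data)`, changing the spreadsheet range later would not require touching a loop bound.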
