我正在try 对包含五个因素的数据集使用library(mgcv)来拟合GAM模型.有些因素是嵌套的.有一个名为x的自变量和一个名为y的响应变量.该分析的目标是预测y作为x的函数,对于不同级别的固定因子(在本例中,因子4和因子5是固定的,其余的是随机的).我想包括随机因素,以便在模型中考虑它们,但我希望预测在固定因素(如re_form = NA)的水平上.

以下是我的数据集:

data_exp <- read.csv(text = "
, factor1, factor2, factor3, factor4, factor5, x, y
1, A, 1, A15, DE, RS1, 1, 20.6
2, B, 1, A15, DE, RS1, 1.5, 26.2
3, C, 1, A15, DE, RS1, 1.5, 24.2
4, D, 1, A15, DE, RS1, 1.8, 31
5, E, 1, A15, DE, RS1, 2.2, 30.7
6, F, 1, A15, DE, RS1, 2.2, 28.4
7, A, 2, A15, DE, RS1, 2.2, 31
8, B, 2, A15, DE, RS1, 3.5, 36.3
9, C, 2, A15, DE, RS1, 3.6, 36.2
10, D, 2, A15, DE, RS1, 3.7, 34.7
11, E, 2, A15, DE, RS1, 5.4, 41
12, F, 2, A15, DE, RS1, 6.4, 39.6
13, A, 1, A16, DE, RS1, 6.7, 37.7
14, B, 1, A16, DE, RS1, 6.7, 34.5
15, C, 1, A16, DE, RS1, 6.7, 41.2
16, D, 1, A16, DE, RS1, 6.7, 38.9
17, E, 1, A16, DE, RS1, 6.7, 38.9
18, F, 1, A16, DE, RS1, 7.8, 41.3
19, A, 2, A16, DE, RS1, 8, 41.1
20, B, 2, A16, DE, RS1, 8.5, 40.8
21, C, 2, A16, DE, RS1, 8.7, 41.5
22, D, 2, A16, DE, RS1, 9.5, 41.3
23, E, 2, A16, DE, RS1, 10, 39.9
24, F, 2, A16, DE, RS1, 10, 35.8
25, A, 1, A15, DE, RS2, 1, 24
26, B, 1, A15, DE, RS2, 1.5, 31.1
27, C, 1, A15, DE, RS2, 1.8, 36.3
28, D, 1, A15, DE, RS2, 1.8, 30.9
29, E, 1, A15, DE, RS2, 1.8, 33
30, F, 1, A15, DE, RS2, 2.5, 37.5
31, A, 2, A15, DE, RS2, 2.7, 39.5
32, B, 2, A15, DE, RS2, 3.5, 40.2
33, C, 2, A15, DE, RS2, 3.6, 44.9
34, D, 2, A15, DE, RS2, 3.7, 40.9
35, E, 2, A15, DE, RS2, 3.7, 42
36, F, 2, A15, DE, RS2, 3.7, 43.2
37, A, 1, A16, DE, RS2, 4.5, 41.9
38, B, 1, A16, DE, RS2, 4.8, 41
39, C, 1, A16, DE, RS2, 5, 46.8
40, D, 1, A16, DE, RS2, 5, 38.6
41, E, 1, A16, DE, RS2, 5, 45
42, F, 1, A16, DE, RS2, 5.5, 43.8
43, A, 2, A16, DE, RS2, 6.2, 46.2
44, B, 2, A16, DE, RS2, 6.2, 45
45, C, 2, A16, DE, RS2, 6.2, 48.3
46, D, 2, A16, DE, RS2, 7.5, 42.2
47, E, 2, A16, DE, RS2, 7.5, 47.8
48, F, 2, A16, DE, RS2, 7.5, 48.5
49, A, 1, A15, PE, RS1, 1, 27.2
50, B, 1, A15, PE, RS1, 1.5, 34.1
51, C, 1, A15, PE, RS1, 2, 39.1
52, D, 1, A15, PE, RS1, 2.5, 39.1
53, E, 1, A15, PE, RS1, 2.5, 41.8
54, F, 1, A15, PE, RS1, 2.5, 41.5
55, A, 2, A15, PE, RS1, 2.5, 41.6
56, B, 2, A15, PE, RS1, 3.5, 47
57, C, 2, A15, PE, RS1, 4, 43.3
58, D, 2, A15, PE, RS1, 4.1, 48.4
59, E, 2, A15, PE, RS1, 4.2, 44.5
60, F, 2, A15, PE, RS1, 4.5, 44.1
61, A, 1, A16, PE, RS1, 4.5, 44.9
62, B, 1, A16, PE, RS1, 4.5, 46.9
63, C, 1, A16, PE, RS1, 4.5, 46.7
64, D, 1, A16, PE, RS1, 5, 46.2
65, E, 1, A16, PE, RS1, 5, 44.9
66, F, 1, A16, PE, RS1, 5.2, 45.7
67, A, 2, A16, PE, RS1, 5.3, 49.5
68, B, 2, A16, PE, RS1, 5.4, 47.5
69, C, 2, A16, PE, RS1, 5.5, 46
70, D, 2, A16, PE, RS1, 5.6, 45.4
71, E, 2, A16, PE, RS1, 5.8, 46.9
72, F, 2, A16, PE, RS1, 6.5, 44.3
73, A, 1, A15, PE, RS2, 1, 32.5
74, B, 1, A15, PE, RS2, 1, 30.9
75, C, 1, A15, PE, RS2, 1.8, 42.3
76, D, 1, A15, PE, RS2, 1.9, 42.4
77, E, 1, A15, PE, RS2, 2, 45.2
78, F, 1, A15, PE, RS2, 2.3, 45
79, A, 2, A15, PE, RS2, 2.3, 41.1
80, B, 2, A15, PE, RS2, 2.3, 43.2
81, C, 2, A15, PE, RS2, 2.3, 47.4
82, D, 2, A15, PE, RS2, 3.2, 50.5
83, E, 2, A15, PE, RS2, 3.5, 48.7
84, F, 2, A15, PE, RS2, 3.6, 47.8
85, A, 1, A16, PE, RS2, 3.6, 48.5
86, B, 1, A16, PE, RS2, 3.6, 47.7
87, C, 1, A16, PE, RS2, 3.7, 48
88, D, 1, A16, PE, RS2, 3.8, 48.4
89, E, 1, A16, PE, RS2, 3.9, 50.6
90, F, 1, A16, PE, RS2, 4, 48.3
91, A, 2, A16, PE, RS2, 4, 48.2
92, B, 2, A16, PE, RS2, 4, 47.8
93, C, 2, A16, PE, RS2, 4, 51.5
94, D, 2, A16, PE, RS2, 4.5, 49.2
95, E, 2, A16, PE, RS2, 4.6, 51.6
96, F, 2, A16, PE, RS2, 4.8, 51.7") %>%
  mutate(factor1 = as.factor(factor1),
         factor2 = as.factor(factor2),
         factor3 = as.factor(factor3),
         factor4 = as.factor(factor4),
         factor5 = as.factor(factor5))

这就是我try 过的:

library(mgcv)
bam_ex.2 <- bam (y ~
                   factor5 +
                   factor4 + 
                   s(x, by = interaction(factor5, factor4), k= 3) + # one smooth per each level of the two factors
                   s(factor1, by = factor4, bs = "re") + # factor1 is nested in factor4, factor 1 is random
                   s(factor2, by = factor3, bs = "re") + # factor2 is nested in factor3,  factor 2 is random       
                   s(factor3, bs = "re"), # factor 3 is random
                 data = data_exp,
                 discrete = TRUE)

pred_ex.2 <- data_exp %>% 
  mutate(y.p = predict(bam_ex.2, 
                       re_form = NA,
                       type="response"))

在绘制预测时,例如:

library(ggplot2)
ggplot(pred_ex.2, aes(x = x, y = y.p, color = factor4))+
  geom_line(aes(linetype= factor5), size=1) +
  scale_linetype_manual(values=c("solid", "dotted", "dashed"))

这就是我得到的:

enter image description here

我所期待的是,作为x的函数,y.p的四条流畅的线,没有间断.仔细观察预测,我可以看到,对于相同的factor4factor5x级别,我有不同的y.p个预测.

如下所示:

enter image description here

在那些情况下,由于factor4factor5x是相同的,所以我预计y.p的预测值是相同的.但事实并非如此.

推荐答案

您不适合随机(或混合)效果模型,请非常仔细地注意mgcv::smooth.termsbs="re"的文档(加粗我的):

这些参数项被脊罚罚(即单位矩阵)惩罚.当这种光滑具有多个自变量时,它代表了这些自变量的参数相互作用,其中系数受到脊罚的惩罚.在系数为I.I.D.的情况下,脊罚等于an assumption.正常的随机效果.

这些并不是真正的随机效果,任何predict呼叫都会将其视为固定效果.此外,mgcv::predict.gam不接受re_form参数,而是在...内悄悄丢弃该参数.

解决方案是通过一个该方法确实理解的论点,从预测中排除你认为是"随机"的影响:

pred_ex.2 <- data_exp %>% 
   mutate(y.p = predict(bam_ex.2,
                        exclude=c("s(factor1):factor4 DE",
                                  "s(factor1):factor4 PE",
                                  "s(factor2):factor3 A15",
                                  "s(factor2):factor3 A16",
                                  "s(factor3)"),
                        type="response"))

R相关问答推荐

在ComplexHeatmap中,如何更改anno_barplot()标题的Angular ?

使用R中的gt对R中的html rmarkdown文件进行条件格式设置表的单元格

警告:lmdif:info = 0. nls. lm()函数的输入参数不正确

将嵌套列表子集化为嵌套列表

在R中使用download. file().奇怪的URL?

根据元素和前一个值之间的差值过滤矩阵的元素

R中的时间序列(Ts)函数计数不正确

为了网络分析目的,将数据框转换为长格式列联表

仅 Select 超过9行的CSV文件

展开对数比例绘图的轴(添加填充)

如何识别倒排的行并在R中删除它们?

为什么在BASE R中绘制线条时会看到线上的点?

如何创建累加到现有列累计和的新列?

变异以按组从其他列创建具有最大和最小值的新列

如何移动点以使它们的打印不重叠

策略表单连接两个非常大的箭头数据集,而不会 destruct 内存使用

将CSV转换为R中的自定义JSON格式

向内存不足的数据帧添加唯一行

用逗号拆分字符串,并删除一些字符

用LOOCV进行K近邻问题