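My data have a binary outcome and binary independent variables. The problem is that one of the variables has zero events (no deaths) at one of its levels.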

I need to run a binary logistic regression and report OR (95% CI) estimates, but I can't because of the zero events.
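When every observation at one level of a predictor has the same outcome (here, no deaths when NACA_cat = 0), the likelihood keeps increasing as that coefficient moves towards infinity. The maximum-likelihood OR therefore does not exist (quasi-complete separation), which is why glm() reports an absurd OR with a (0, Inf) interval. Below is a minimal base-R check for this. It assumes the aggregated counts from the xtabs() table posted below; the data frame `agg`, its column names and the helper `check_levels()` are illustrative and do not come from the original post. It only flags zero-event levels of single predictors, and separation can also be caused by combinations of predictors:

```r
# Aggregated counts from the xtabs() table in the question;
# its columns are interaction(volo_1, NACA_cat) = 0.0, 1.0, 0.1, 1.1.
# `agg` and its column names are illustrative.
agg <- data.frame(
  volo_1   = c(0, 1, 0, 1),
  NACA_cat = c(0, 0, 1, 1),
  dead     = c(0, 0, 86, 21),
  alive    = c(2340, 273, 303, 81)
)

# Hypothetical helper: deaths and totals per level of one binary predictor
check_levels <- function(var) {
  events <- tapply(agg$dead, agg[[var]], sum)
  totals <- tapply(agg$dead + agg$alive, agg[[var]], sum)
  data.frame(
    predictor = var,
    level     = names(events),
    events    = as.vector(events),
    total     = as.vector(totals),
    # a level where nobody (or everybody) died makes the ML estimate infinite
    separated = as.vector(events == 0 | events == totals)
  )
}

sep <- do.call(rbind, lapply(c("volo_1", "NACA_cat"), check_levels))
sep
```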

I tried exact inference for logistic regression models (the elrm R package), but got no result.

How can I solve this?

This is the elrm code:

> x <- xtabs(~morto + interaction(volo_1, NACA_cat), data = db)
> x
     interaction(volo_1, NACA_cat)
morto  0.0  1.0  0.1  1.1
    0 2340  273  303   81
    1    0    0   86   21 

> cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_cat = rep(0:1, each = 2), admit = x[2, ], ntrials = colSums(x))
> cdat 
    volo_1 NACA_cat admit ntrials
0.0      0        0     0    2340
1.0      1        0     0     273
0.1      0        1    86     389
1.1      1        1    21     102
> m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2000)
Progress: 100%

Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
> ## summary of model including estimates and CIs
> summary(m.volo_1)

Call:

elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

       estimate p-value p-value_se mc_size
volo_1  0.63666 0.02295    0.00166   20000


95% Confidence Intervals for Parameters

            lower    upper
volo_1 0.06414017 1.352148
> m.NACA_cat <- elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)
Progress: 100%

Generation of the Markov Chain required 4 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_cat' observed value of the sufficient statistic was not sampled 
> ## summary of model including estimates and CIs
> summary(m.NACA_cat)

Call:

elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

         estimate p-value p-value_se mc_size
NACA_cat       NA       0          0   20000


95% Confidence Intervals for Parameters

         lower upper
NACA_cat    NA    NA
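The elrm warning says that the observed value of the sufficient statistic for NACA_cat (the number of deaths with NACA_cat = 1, which here is all 107 deaths) was never drawn by the Markov chain, so no estimate or interval could be computed. With every death at one level, that statistic most likely sits at the edge of its conditional distribution, where the exact conditional MLE is infinite and an exact interval can only have a finite lower limit. Since the m.NACA_cat model has NACA_cat as its only predictor, the same exact conditional inference can be illustrated in base R with fisher.test() on the crude 2x2 table. This sketch only illustrates the behaviour and does not replace an adjusted model; `tab` and `ft` are illustrative names:

```r
# Crude table built from the xtabs() counts: NACA_cat (rows) by death (columns)
tab <- matrix(c(2340 + 273, 303 + 81,   # alive: NACA_cat = 0, 1
                0 + 0,      86 + 21),   # dead:  NACA_cat = 0, 1
              nrow = 2,
              dimnames = list(NACA_cat = c("0", "1"), dead = c("0", "1")))

# Exact conditional inference on the odds of death, NACA_cat 1 vs 0
ft <- fisher.test(tab)
ft$estimate   # conditional MLE of the OR: Inf
ft$conf.int   # exact 95% CI: finite lower limit, upper limit Inf
ft$p.value
```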

This is the code for the binary logistic regression (logistic.display() is from the epiDisplay package):

> full.model <- glm(morto ~ volo_1  + NACA_cat , data = db,family=binomial())
> logistic.display(full.model)

Logistic regression predicting morto : 1 vs 0 
 
                 crude OR(95%CI)       adj. OR(95%CI)        P(Wald's test) P(LR-test)
volo_1: 1 vs 0   1.82 (1.12,2.98)      0.91 (0.53,1.56)      0.741          0.739     
                                                                                      
NACA_cat: 1 vs 0 647257526.44 (0,Inf)  653272824.33 (0,Inf)  0.972          < 0.001   
                                                                                      
Log-likelihood = -257.3593
No. of observations = 3104
AIC value = 520.7187
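This output shows the typical signature of separation. The adjusted OR for NACA_cat is astronomically large, its Wald interval is (0, Inf), and the Wald p-value is close to 1, yet the likelihood-ratio test is highly significant. The Wald statistic breaks down because the standard error grows faster than the estimate as the coefficient diverges. The following base-R sketch reproduces this on the aggregated counts (`agg`, `fit`, `wald_p` and `lr_p` are illustrative names). The exact size of the estimate depends on where glm()'s iterations stop, so only its order of magnitude means anything:

```r
agg <- data.frame(
  volo_1   = c(0, 1, 0, 1),
  NACA_cat = c(0, 0, 1, 1),
  dead     = c(0, 0, 86, 21),
  alive    = c(2340, 273, 303, 81)
)

# Same model as full.model, fitted to deaths/survivors per cell.
# glm() may warn that fitted probabilities are numerically 0; that is the
# expected symptom of separation, so warnings are suppressed here.
fit <- suppressWarnings(
  glm(cbind(dead, alive) ~ volo_1 + NACA_cat, family = binomial, data = agg)
)

wald_p <- coef(summary(fit))["NACA_cat", "Pr(>|z|)"]      # close to 1
lr     <- suppressWarnings(drop1(fit, test = "Chisq"))  # likelihood-ratio tests
lr_p   <- lr["NACA_cat", "Pr(>Chi)"]                      # very small

c(log_OR = unname(coef(fit)["NACA_cat"]), wald_p = wald_p, lr_p = lr_p)
```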

Edit: here is an MRE (minimal reproducible example) with fewer records but the same problem:

db<- data.frame( morto =  c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                 volo_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                 NACA_dic = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))



db$volo_1 <- as.factor(db$volo_1)
db$morto <- as.factor(db$morto)
db$NACA_dic <- as.factor(db$NACA_dic)

library(epiDisplay)  # provides logistic.display()

full.model <- glm(morto ~ volo_1 + NACA_dic, data = db, family = binomial())
logistic.display(full.model)

library(elrm)

# Aggregate to one row per volo_1 x NACA_dic cell:
# admit = number of deaths (morto = 1), ntrials = number of patients
x <- xtabs(~morto + interaction(volo_1, NACA_dic), data = db)

cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_dic = rep(0:1, each = 2),
                   admit = x[2, ], ntrials = colSums(x))
cdat

m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000,
                 dataset = cdat, burnIn = 2000)
summary(m.volo_1)

m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000,
                   dataset = cdat, burnIn = 2000)
summary(m.NACA_dic)

The problem is:

> m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000, 
+                    dataset = cdat, burnIn = 2000)
Progress: 100%

Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_dic' observed value of the sufficient statistic was not sampled 
> summary(m.NACA_cat)

Call:

elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)


Results:

         estimate p-value p-value_se mc_size
NACA_cat       NA       0          0   20000


95% Confidence Intervals for Parameters

         lower upper
NACA_cat    NA    NA

Recommended answer

Solution:

As @BenBolker suggested, I used bias reduction in binomial-response generalized linear models, as implemented in the brglm package:
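For logistic regression, brglm's default fitting method (method = "brglm.fit" with pl = FALSE) solves Firth's (1993) bias-reducing modified score equations. For the logit link this is equivalent to maximising the likelihood penalised by Jeffreys' prior, which keeps estimates finite even under separation. To show the idea without the package, here is a minimal base-R sketch of the modified-score iteration for grouped binomial data. The function `firth_logit()` is a hypothetical helper written for illustration, not brglm's code, and it has no step-halving or other safeguards:

```r
# Hypothetical helper: Firth-type bias-reduced logistic regression for
# grouped binomial data (y events out of n trials per row of X).
# Fisher scoring on the modified score
#   U*(beta) = X' (y + h/2 - (n + h) * p),
# where h is the diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2),
# W = diag(n * p * (1 - p)).
firth_logit <- function(X, y, n, tol = 1e-10, maxit = 100) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    p     <- as.vector(plogis(X %*% beta))
    w     <- n * p * (1 - p)
    info  <- crossprod(X, w * X)             # Fisher information X'WX
    vinv  <- solve(info)
    h     <- w * rowSums((X %*% vinv) * X)   # hat values
    score <- crossprod(X, y + h / 2 - (n + h) * p)
    step  <- as.vector(vinv %*% score)
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  # Standard errors from the Fisher information at the final estimate
  p    <- as.vector(plogis(X %*% beta))
  vcov <- solve(crossprod(X, (n * p * (1 - p)) * X))
  names(beta) <- colnames(X)
  list(coef = beta, se = sqrt(diag(vcov)), iterations = it)
}

# Crude NACA_cat model on the counts from the question:
# 0 deaths out of 2613 (NACA_cat = 0), 107 out of 491 (NACA_cat = 1)
X   <- cbind("(Intercept)" = 1, NACA_cat = c(0, 1))
fit <- firth_logit(X, y = c(0, 107), n = c(2613, 491))
exp(fit$coef)   # finite OR despite the zero cell

# Adjusted for volo_1, using the four cells of the xtabs() table
X_adj   <- cbind("(Intercept)" = 1, volo_1 = c(0, 1, 0, 1), NACA_cat = c(0, 0, 1, 1))
fit_adj <- firth_logit(X_adj, y = c(0, 0, 86, 21), n = c(2340, 273, 389, 102))
exp(fit_adj$coef)
```

With a single binary predictor the model is saturated, and Firth's estimate reduces to the log odds ratio after adding 0.5 to every cell, here log((107.5 × 2613.5) / (0.5 × 384.5)), which makes a convenient sanity check for the sketch.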

> model<-brglm(morto ~ volo_1 + NACA_dic , data = db,family=binomial, model = TRUE, method = "brglm.fit",
+              pl = FALSE, x = FALSE, y = TRUE, contrasts = NULL)
> 
> summary(model)

Call:
brglm(formula = morto ~ volo_1 + NACA_dic, family = binomial, 
    data = db, model = TRUE, method = "brglm.fit", pl = FALSE, 
    x = FALSE, y = TRUE, contrasts = NULL)


Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -4.5591     1.4144  -3.223  0.00127 **
volo_11      -1.3200     0.7845  -1.683  0.09244 . 
NACA_dic1     4.6583     1.4325   3.252  0.00115 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.942  on 133  degrees of freedom
Residual deviance:  92.711  on 131  degrees of freedom
Penalized deviance: 90.11255 
AIC:  98.711 

> exp(coef(model))
 (Intercept)      volo_11    NACA_dic1 
  0.01047146   0.26712352 105.45428986 
> exp(confint(model))
Profiling the ordinary deviance for the corresponding ML fit...
Profiling the penalized deviance for the supplied fit...
Calculating confidence intervals for the ML fit using deviance profiles...
Calculating confidence intervals for the BR fit using penalized likelihood profiles...
                  2.5 %     97.5 %
(Intercept)  0.00000000 0.07274017
volo_11      0.03224977 1.06917625
NACA_dic1   14.10469036        Inf
> coef(summary(model))[,'Pr(>|z|)']
(Intercept)     volo_11   NACA_dic1 
0.001266712 0.092437418 0.001146851 
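The upper limit for NACA_dic1 from confint() is still Inf. The printed messages show that confint() profiles both the ordinary (ML) deviance and the penalised deviance. Because the ML profile is unbounded under separation, it is the likely source of that Inf; the brglm documentation for its confint() method describes how the two profiles are combined. If a finite interval is needed, one option is a Wald interval built from the bias-reduced estimate and its standard error, keeping in mind that Wald intervals can be unreliable in small or separated samples. Below is a base-R sketch that uses the estimates and standard errors printed by summary(model); `or_table()` is a hypothetical helper that produces an "OR (95% CI)" column similar to logistic.display():

```r
# Hypothetical helper: ORs with Wald confidence intervals from a coefficient
# matrix laid out like coef(summary(fit)) (columns "Estimate", "Std. Error").
or_table <- function(coef_mat, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)
  est <- coef_mat[, "Estimate"]
  se  <- coef_mat[, "Std. Error"]
  out <- data.frame(
    OR    = exp(est),
    lower = exp(est - z * se),
    upper = exp(est + z * se),
    row.names = rownames(coef_mat)
  )
  out$`OR (95% CI)` <- sprintf("%.2f (%.2f, %.2f)", out$OR, out$lower, out$upper)
  out
}

# Bias-reduced estimates and standard errors copied from summary(model)
cm <- rbind(
  volo_11   = c(Estimate = -1.3200, `Std. Error` = 0.7845),
  NACA_dic1 = c(Estimate =  4.6583, `Std. Error` = 1.4325)
)
res <- or_table(cm)
res
```

With brglm loaded, the same helper could be applied directly to coef(summary(model)).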
