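Using the code below, I drew a violin plot. I also tried to apply a Games-Howell ANOVA post hoc test to my data. It completes successfully and shows significant differences among the A, B, C, and D subgenomes. However, I cannot get those significant differences to show up in my plot. I think only the last part of my code, under the "# Annotate significant differences" comment, needs fixing.
The ANOVA post hoc test reports the significance levels below. All I am trying to do is draw the significance marker (*) between A and B, A and C, and A and D.
Comparison between treatments means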
        difference pvalue signif.          LCL         UCL
A - B -0.037101857 0.0000       * -0.046188341 -0.02801537
A - C -0.028211022 0.0000       * -0.037394783 -0.01902726
A - D -0.030163234 0.0000       * -0.039466699 -0.02085977
B - C  0.008890835 0.1059          -0.00118665  0.01896834
B - D  0.006938623 0.2979         -0.003248084  0.01712533
C - D -0.001952212 0.9618         -0.012225785  0.00832136
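Note that the table above carries the pair labels only as row names ("A - B"), not as columns. A minimal base-R sketch of turning them into group1 and group2 columns follows. It assumes the comparison table has the same shape as the printed output; the data frame `comparison` is typed in by hand from that output and is not the real result object.

```r
# Hand-typed copy of the printed comparison table (difference and pvalue only).
comparison <- data.frame(
  difference = c(-0.037101857, -0.028211022, -0.030163234,
                 0.008890835, 0.006938623, -0.001952212),
  pvalue = c(0.0000, 0.0000, 0.0000, 0.1059, 0.2979, 0.9618),
  row.names = c("A - B", "A - C", "A - D", "B - C", "B - D", "C - D")
)

# Keep the significant pairs, then split each "X - Y" row name into two columns.
sig_diff <- comparison[comparison$pvalue < 0.05, ]
pairs <- strsplit(row.names(sig_diff), " - ", fixed = TRUE)
sig_diff$group1 <- vapply(pairs, `[`, character(1), 1)
sig_diff$group2 <- vapply(pairs, `[`, character(1), 2)

print(sig_diff[, c("group1", "group2", "pvalue")])
```

Splitting on the literal " - " separator avoids regular expressions; it relies on the group names themselves not containing that separator.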
I tried adjusting the ymax limit, but got the following warning messages:
Warning messages:
1: Removed 1 row containing missing values or values outside the scale range
(`geom_text()`).
2: Removed 1 row containing missing values or values outside the scale range
(`geom_text()`).
3: Removed 1 row containing missing values or values outside the scale range
(`geom_text()`).
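One plausible reading of these warnings, assuming `sig_diff` is taken straight from the comparison table and therefore has no `group1`/`group2` columns: the level lookup returns an empty index, the midpoint of two empty indices is `NaN`, and ggplot2 then drops the text annotation because its x position is missing. A base-R sketch of that chain (the data frame here is a hypothetical stand-in, not the real result):

```r
# Hypothetical stand-in: pair labels live in the row names, not in columns.
sig_diff <- data.frame(pvalue = c(0, 0, 0),
                       row.names = c("A - B", "A - C", "A - D"))
subgenome_levels <- c("A", "B", "C", "D")

# sig_diff$group1 does not exist, so this is NULL ...
group1 <- sig_diff$group1[1]
# ... the comparison yields logical(0), and which() returns integer(0) ...
sub1_index <- which(subgenome_levels == group1)
# ... so the mean of the (empty) indices is NaN.
mid_x <- mean(c(sub1_index, sub1_index))

print(is.null(group1))
print(length(sub1_index))
print(is.nan(mid_x))
```

If this is what is happening, changing the y limits alone would not make the asterisks appear, because the problem is in the x position.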
My code:
library(dplyr)
library(agricolae)
library(stringr)
library(ggplot2)
library(Hmisc)
# Read the data
df <- read.table('ABCD-meth-r1.tsv', header = TRUE, sep = "\t")
df$subgenome <- factor(df$subgenome)
# Create the x-axis labels (my_xlab): subgenome name plus group size
my_xlab <- paste(levels(df$subgenome), "\n ", table(df$subgenome), sep = "")
# Conduct ANOVA
anova_result <- aov(value ~ subgenome, data = df)
summary(anova_result)
# Initialize sig_diff as a data frame to avoid scoping issues if no significant results
sig_diff <- data.frame(group1 = character(), group2 = character(), pvalue = numeric())
# Check if the ANOVA is significant and then perform Games-Howell test
# (note: agricolae::HSD.test() runs Tukey's HSD, not Games-Howell)
if (summary(anova_result)[[1]][["Pr(>F)"]][1] < 0.05) {
  print("Significant differences detected, performing Games-Howell test")
  games_howell_result <- HSD.test(anova_result, "subgenome", group = FALSE, console = TRUE)
  # Extracting pairs with significant differences
  if (nrow(games_howell_result$comparison) > 0) {
    sig_diff <- games_howell_result$comparison[games_howell_result$comparison$pvalue < 0.05, ]
  }
}
# Calculate ymax
ymax <- max(df$value, na.rm = TRUE) * 1.1
print(paste("ymax for annotation:", ymax))
# Plotting code
p <- ggplot(df, aes(x = subgenome, y = value, fill = subgenome)) +
  geom_violin() +
  geom_boxplot(width = 0.08, fill = "white") +
  stat_summary(fun = mean, geom = "point", shape = 20, size = 1, color = "darkgreen") +
  scale_x_discrete(labels = my_xlab) +
  xlab("") +
  ylab("CDS methylation") +
  theme_bw() +
  ylim(0, ymax)
# Annotate significant differences
if (nrow(sig_diff) > 0) {
  for (i in 1:nrow(sig_diff)) {
    sub1_index <- which(levels(df$subgenome) == sig_diff$group1[i])
    sub2_index <- which(levels(df$subgenome) == sig_diff$group2[i])
    mid_x <- mean(c(sub1_index, sub2_index))
    print(paste("Annotating between:", sig_diff$group1[i], "and", sig_diff$group2[i],
                "at", mid_x))
    p <- p + annotate("text", label = "*", x = mid_x, y = ymax * 0.95, size = 5,
                      vjust = 0)
    p <- p + annotate("segment", x = sub1_index, xend = sub2_index, y = ymax,
                      yend = ymax, color = "red", linewidth = 1)
  }
}
# Print the plot
print(p)
Output plot:
I made the following change to the last line of the plotting code. Everything else stays the same:
# Plotting code
p <- ggplot(df, aes(x = subgenome, y = value, fill = subgenome)) +
  geom_violin() +
  geom_boxplot(width = 0.08, fill = "white") +
  stat_summary(fun = mean, geom = "point", shape = 20, size = 1, color = "darkgreen") +
  scale_x_discrete(labels = my_xlab) +
  xlab("") +
  ylab("CDS methylation") +
  theme_bw() +
  coord_cartesian(ylim = c(0, ymax), clip = "off")
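Some background on why this swap can matter: ylim() sets scale limits, and ggplot2 turns observations outside those limits into missing values, whereas coord_cartesian() only zooms the view and keeps all the data; clip = "off" additionally lets layers draw past the panel edge. A small sketch on made-up data (the data frame `toy` and its values are invented for illustration):

```r
library(ggplot2)

# Invented data: one value (10) lies above the limit of 5 used below.
toy <- data.frame(g = c("A", "A", "B", "B"), v = c(1, 2, 3, 10))

base <- ggplot(toy, aes(x = g, y = v)) + geom_point()

# Scale limits: the point at v = 10 falls outside the scale range and is
# treated as missing, so it is not drawn.
p_scale <- base + ylim(0, 5)

# Coordinate limits: the view is zoomed to 0..5, but no data are removed.
p_zoom <- base + coord_cartesian(ylim = c(0, 5), clip = "off")

# Compare how many non-missing y values each built plot still holds.
n_scale <- sum(!is.na(ggplot_build(p_scale)$data[[1]]$y))
n_zoom  <- sum(!is.na(ggplot_build(p_zoom)$data[[1]]$y))
print(c(scale_limits = n_scale, coord_limits = n_zoom))
```

For violin and box plots the distinction is more than cosmetic, since values removed by scale limits also drop out of the density and quartile calculations.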
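I also made the following change where sig_diff is built, so that it has the group1 and group2 columns the annotation loop looks up:
# Initialize sig_diff as a data frame to avoid scoping issues if no significant results
sig_diff <- data.frame(group1 = character(), group2 = character(), pvalue = numeric())
# Row names of the comparison table look like "A - B"; split them into two columns.
# These two lines need to run after sig_diff has been filled from
# games_howell_result$comparison, otherwise there are no row names to parse.
sig_diff$group2 <- sub(".+ \\- (.+)", "\\1", row.names(sig_diff))
sig_diff$group1 <- sub("(.+) \\- .+", "\\1", row.names(sig_diff))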
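Putting the pieces together, here is a hedged, self-contained sketch of the annotation step. It is not the original script: the data are simulated, `sig_diff` is written out by hand in the shape it would have once the row names are split, and giving each bracket its own height is a suggested design choice so that the three brackets starting at A do not sit on top of each other.

```r
library(ggplot2)

# Invented data standing in for the methylation table.
set.seed(1)
df <- data.frame(
  subgenome = factor(rep(c("A", "B", "C", "D"), each = 30)),
  value = c(rnorm(30, 0.10, 0.02), rnorm(30, 0.14, 0.02),
            rnorm(30, 0.13, 0.02), rnorm(30, 0.13, 0.02))
)

# Significant pairs, written by hand in the shape expected after the split.
sig_diff <- data.frame(group1 = c("A", "A", "A"),
                       group2 = c("B", "C", "D"))

# Convert group labels to x positions and give every pair its own height.
lev  <- levels(df$subgenome)
top  <- max(df$value, na.rm = TRUE)
step <- 0.08 * diff(range(df$value, na.rm = TRUE))
sig_diff$x    <- match(sig_diff$group1, lev)
sig_diff$xend <- match(sig_diff$group2, lev)
sig_diff$y    <- top + step * seq_len(nrow(sig_diff))

p <- ggplot(df, aes(x = subgenome, y = value, fill = subgenome)) +
  geom_violin() +
  geom_boxplot(width = 0.08, fill = "white") +
  # One red bracket per significant pair, each at its own height.
  annotate("segment", x = sig_diff$x, xend = sig_diff$xend,
           y = sig_diff$y, yend = sig_diff$y, color = "red") +
  # An asterisk centred just above each bracket.
  annotate("text", x = (sig_diff$x + sig_diff$xend) / 2, y = sig_diff$y,
           label = "*", size = 5, vjust = 0) +
  theme_bw()

print(p)
```

Because no y limits are set here, ggplot2 expands the axis to include the brackets by itself; with fixed limits, the top bracket height would need to stay inside them or be drawn with clipping turned off.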