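I'm using ggplot2 to create a lollipop chart comparing US university tuition with median household income (for all races, and specifically for Black households). To make the chart easier to read, I want the two household-income bars to have a line width of 1.3 and a point size of 5, and the other bars (tuition and fees) to have a line width of 0.7 and a point size of 2. However, for some reason R applies these settings to the Black household income and the University of Florida (see graph) instead of to the two income measures, and I can't figure out how to fix it.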

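Also, the x and y labels I set are not applied to the graph. Instead, a tiny "namerank" hovers in the top-left corner and a "y" in the bottom-right corner. I don't know how to get rid of them either.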

Current graph

Here is my code:

clg_fee|>
  arrange(costatt)|>
  mutate(namerank = factor(namerank, namerank))|>
  ggplot() +
  geom_segment(
    aes(x=namerank, 
        xend=namerank, 
        y=0, 
        yend=costatt, 
       color = ifelse(namerank %in% c("Real Median Household Income (2022)", 
                                   "Real Median Household Income (Black, 2022)"), 
                   "Median Household Income","Cost of Attendance (out-of-state)")),
    linewidth = ifelse(clg_fee$namerank %in% c("Real Median Household Income (2022)", 
                                       "Real Median Household Income (Black, 2022)"), 
                       1.3,0.7) #cost of attendance and income
  )+
  geom_segment(
    aes(x=namerank, 
        xend=namerank, 
        y=0, 
        yend=out_state, 
        color = "Tuition (out-of-state)"),
    linewidth = 0.7 #out_state tuition
    )+
  geom_point(aes(x = namerank, 
                 y=out_state, 
                 color="Tuition (out-of-state)"),
             size = 2)+ #out_state tuition
  geom_point(aes(x = namerank, y = costatt,
                 color = ifelse(namerank %in% c("Real Median Household Income (2022)", 
                                            "Real Median Household Income (Black, 2022)"), 
                            "Median Household Income","Cost of Attendance (out-of-state)")),
             size = ifelse(clg_fee$namerank %in% c("Real Median Household Income (2022)", 
                                            "Real Median Household Income (Black, 2022)"), 
                            5, 2))+ #cost of attendance and income
  geom_segment(
    aes(x=namerank, 
        xend=namerank, 
        y=0, 
        yend=in_state, 
        color = "Tuition (in-state)"),
    linewidth = 0.7 #in_state
  )+ 
  geom_point(aes(x = namerank, y=in_state, color = "Tuition (in-state)"), size = 2)+ #in_state
  coord_flip() +
  scale_y_continuous(labels = scales::label_number(scale_cut = scales::cut_short_scale(), suffix = "$"))+
  scale_color_manual(
    values = c(
      "Tuition (out-of-state)" = "#779ECB",
      "Tuition (in-state)" = "#77DD77",
      "Median Household Income" = "orange",
      "Cost of Attendance (out-of-state)" = "#757575"
    )
  )+
  theme_ipsum()+
  theme(legend.position = "top")+
  labs(
    xlab = "",
    ylab = "Undergraduate costs and tuition",
    color = "",
    title = "University costs are far from affordable",
    caption = "Tuition fees source: Visual Capitalist
    Note that in-state tuition data is unavailable for most universities \n
    Cost of attendance source: University websites
    Note that official estimations of cost of attendance are unavailable for Boston College and Northeastern"
  )

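Here is my data. Because I want to rank the universities by cost of attendance in descending order and place household income within the same ranking, I inserted the US median household incomes as two rows in the data frame and put the income values in costatt (the cost-of-attendance column):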

structure(list(namerank = c("Real Median Household Income (2022)", 
"Real Median Household Income (Black, 2022)", "University of Southern California(Rank28)", 
"Brown University(Rank9)", "Duke University(Rank7)", "University of Pennsylvania(Rank6)", 
"Cornell University(Rank12)", "Northwestern University(Rank9)", 
"University of Chicago(Rank12)", "Columbia University(Rank12)", 
"Dartmouth College(Rank18)", "Georgetown University(Rank22)", 
"Yale University(Rank5)", "Vanderbilt University(Rank18)", "Carnegie Mellon University(Rank24)", 
"Johns Hopkins University(Rank9)", "California Institute of Technology(Rank7)", 
"Washington University, St. Louis(Rank24)", "University of Notre Dame(Rank20)", 
"Stanford University(Rank3)", "Emory University(Rank24)", "Massachusetts Institute of Technology(Rank2)", 
"Princeton University(Rank1)", "Harvard University(Rank3)", "University of Virginia(Rank24)", 
"Rice University(Rank17)", "University of Michigan, Ann Arbor(Rank21)", 
"University of California, San Diego(Rank28)", "University of California, Berkeley(Rank15)", 
"University of California, LA(Rank15)", "University of California, Davis(Rank28)", 
"University of North Carolina at Chapel Hill(Rank22)", "University of Florida(Rank28)"
), rank = c(NA, NA, 28, 9, 7, 6, 12, 9, 12, 12, 18, 22, 5, 18, 
24, 9, 7, 24, 20, 3, 24, 2, 1, 3, 24, 17, 21, 28, 15, 15, 28, 
22, 28), school_name = c(NA, NA, "University of\r\r\r\nSouthern California", 
"Brown University", "Duke University", "University of\r\r\r\nPennsylvania", 
"Cornell University", "Northwestern University", "University of Chicago", 
"Columbia University", "Dartmouth College", "Georgetown University", 
"Yale University", "Vanderbilt University", "Carnegie Mellon University", 
"Johns Hopkins\r\r\r\nUniversity", "California Institute\r\r\r\nof Technology", 
"Washington\r\r\r\nUniversity, St. Louis", "University of Notre Dame", 
"Stanford University", "Emory University", "Massachusetts\r\r\r\nInstitute of\r\r\r\nTechnology", 
"Princeton University", "Harvard University", "University of Virginia", 
"Rice University", "University of\r\r\r\nMichigan, Ann Arbor", 
"University of\r\r\r\nCalifornia, San Diego", "University of\r\r\r\nCalifornia, Berkeley", 
"University of\r\r\r\nCalifornia, LA", "University of\r\r\r\nCalifornia, Davis", 
"University of North\r\r\r\nCarolina at Chapel Hill", "University of Florida"
), state = c(NA, NA, "California", "Rhode Island", "North Carolina", 
"Pennsylvania", "New York", "Illinois", "Illinois", "New York", 
"New Hampshire", "Washington, DC", "Connecticut", "Tennessee", 
"Pennsylvania", "Maryland", "California", "Missouri", "Indiana", 
"California", "Georgia", "Massachusetts", "New Jersey", "Massachusetts", 
"Virginia", "Texas", "Michigan", "California", "California", 
"California", "California", "North Carolina", "Florida"), out_state = c(NA, 
NA, 68237, 68230, 66172, 66104, 66014, 65997, 65619, 65524, 65511, 
65082, 64700, 63946, 63829, 63340, 63255, 62982, 62693, 62484, 
60774, 60156, 59710, 59076, 58950, 58128, 57273, 48630, 48465, 
46326, 46043, 39338, 28658), in_state = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 22323, NA, 17786, 16056, 15891, 13752, 15266, 8998, 
6381), costatt = c(74580, 52860, 95225, 91676, 88938, 92228, 
83296, 91290, 89040, 88942, 91312, 88782, 90975, 89590, 73000, 
86065, 80028, 87644, 86125, 92892, 88414, 82720, 86700, 91166, 
91440, 86279, 76294, 77886, 78582, 67959, 78996, 66372, 45808
)), class = "data.frame", row.names = c(NA, -33L))

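At first I didn't have the linewidth and size arguments, and the plot was created without any problems. After I added them, R started complaining that it couldn't find namerank, even though I had piped the data through. I added clg_fee$ to those arguments (hence size = ifelse(clg_fee$namerank, ...)), which solved that problem, but now the Black median household income and the University of Florida get the highlight settings instead of the two median incomes.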
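Both symptoms can be reproduced on a small toy data frame. This is a minimal sketch; the names toy and income_rows and all values are hypothetical, chosen so that, as in clg_fee, the two income rows come first and one school has a lower cost than one of the incomes. Arguments passed outside aes() are evaluated immediately in the calling environment, not inside the data, so namerank is not found there. Prefixing the data frame name avoids the error, but the resulting vector follows the original row order while the plotted data has been re-sorted by arrange().

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data mimicking clg_fee: two income rows first, then schools
toy <- data.frame(
  namerank = c("Income A", "Income B", "School 1", "School 2", "School 3"),
  costatt  = c(70, 50, 90, 60, 40)
)
income_rows <- c("Income A", "Income B")

# 1. A bare column name outside aes() is looked up in the calling
#    environment, where no `namerank` object exists, so this errors.
not_found <- tryCatch(
  geom_point(size = ifelse(namerank %in% income_rows, 5, 2)),
  error = function(e) conditionMessage(e)
)
print(not_found)

# 2. toy$namerank avoids the error, but `sizes` is in the ORIGINAL row order...
sizes <- ifelse(toy$namerank %in% income_rows, 5, 2)
# ...while ggplot() receives the rows sorted by costatt.
sorted <- arrange(toy, costatt)
# Pairing them position by position shows which rows actually get size 5
print(data.frame(namerank = sorted$namerank, size = sizes))
```

In the pairing, size 5 lands on the cheapest school and on "Income B", the same pattern as University of Florida and the Black household income in the real data.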

For the labels, I tried setting xlab = NULL inside labs(), but that didn't work.
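A likely cause of the label problem, sketched on hypothetical toy data (toy, item and value are made-up names): labs() matches its arguments to aesthetic names, so it expects x and y. xlab and ylab are separate helper functions, xlab() and ylab(), not labs() arguments. Under coord_flip() the x label belongs to the vertical (category) axis and the y label to the horizontal (value) axis.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(item = c("A", "B", "C"), value = c(10, 20, 30))

p <- ggplot(toy, aes(x = item, y = value)) +
  geom_col() +
  coord_flip() +
  # x / y are aesthetic names; after coord_flip() x is drawn vertically
  labs(x = NULL, y = "Undergraduate costs and tuition")

# The standalone helpers set the same labels
p2 <- ggplot(toy, aes(x = item, y = value)) +
  geom_col() +
  coord_flip() +
  xlab(NULL) +
  ylab("Undergraduate costs and tuition")
```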

Answer

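The problem is that you reorder the data before passing it to ggplot(), but set linewidth and size with ifelse() based on the original, unordered dataset (clg_fee$namerank). Those vectors are in the original row order, where the two income rows come first, so after arrange(costatt) the highlight values end up on the first two rows of the sorted data: the University of Florida (lowest cost of attendance) and the Black household income. Instead, I would suggest mapping on aesthetics and setting the desired values for linewidth and size with scale_xxx_identity or scale_xxx_manual; the code below uses scale_xxx_manual. Both approaches require slightly more work but are less error-prone: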

Note: Because the category labels are long, I aligned the title and the legend with the "plot" (instead of the "panel"). For legend.location this requires ggplot2 >= 3.5.0.
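If the code also has to run on an older ggplot2, one option is to add legend.location only when the installed version supports it. A minimal sketch; the helper name top_legend_theme and the toy data are hypothetical:

```r
library(ggplot2)

# Hypothetical helper: top legend and plot-aligned title; legend.location is
# added only on ggplot2 >= 3.5.0, where that theme element exists.
top_legend_theme <- function() {
  th <- theme(legend.position = "top", plot.title.position = "plot")
  if (utils::packageVersion("ggplot2") >= "3.5.0") {
    th <- th + theme(legend.location = "plot")
  }
  th
}

# Hypothetical toy data to show the helper in use
toy <- data.frame(x = 1:3, y = c(2, 4, 3), g = c("a", "b", "c"))
p <- ggplot(toy, aes(x, y, colour = g)) +
  geom_point() +
  labs(title = "Toy title") +
  top_legend_theme()
```

On older versions the legend then simply stays aligned with the panel.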

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(hrbrthemes)

clg_fee |>
  arrange(costatt) |>
  mutate(namerank = factor(namerank, namerank)) |>
  mutate(
    costatt_or_income = ifelse(
      grepl("^Real Median", namerank),
      "Median Household Income", "Cost of Attendance (out-of-state)"
    )
  ) |>
  ggplot(aes(x = namerank, xend = namerank)) +
  geom_segment(
    aes(
      y = 0,
      yend = costatt,
      color = costatt_or_income,
      linewidth = costatt_or_income
    )
  ) +
  geom_point(
    aes(
      y = costatt,
      color = costatt_or_income,
      size = costatt_or_income
    )
  ) +
  geom_segment(
    aes(
      y = 0,
      yend = out_state,
      color = "Tuition (out-of-state)",
      linewidth = "Tuition (out-of-state)"
    )
  ) +
  geom_point(
    aes(
      y = out_state,
      color = "Tuition (out-of-state)",
      size = "Tuition (out-of-state)"
    )
  ) +
  geom_segment(
    aes(
      y = 0,
      yend = in_state,
      color = "Tuition (in-state)",
      linewidth = "Tuition (in-state)"
    )
  ) +
  geom_point(aes(
    y = in_state, color = "Tuition (in-state)", size = "Tuition (in-state)"
  )) +
  coord_flip() +
  scale_y_continuous(labels = scales::label_number(
    scale_cut = scales::cut_short_scale(), prefix = "$"
  )) +
  scale_color_manual(
    values = c(
      "Tuition (out-of-state)" = "#779ECB",
      "Tuition (in-state)" = "#77DD77",
      "Median Household Income" = "orange",
      "Cost of Attendance (out-of-state)" = "#757575"
    )
  ) +
  scale_linewidth_manual(
    values = c(
      "Median Household Income" = 1.3,
      "Cost of Attendance (out-of-state)" = .7,
      "Tuition (out-of-state)" = .7,
      "Tuition (in-state)" = .7
    ),
    guide = "none"
  ) +
  scale_size_manual(
    values = c(
      "Median Household Income" = 5,
      "Cost of Attendance (out-of-state)" = 2,
      "Tuition (out-of-state)" = 2,
      "Tuition (in-state)" = 2
    ),
    guide = "none"
  ) +
  theme_ipsum() +
  theme(
    legend.position = "top",
    plot.title.position = "plot",
    legend.location = "plot"
  ) +
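  labs(
    # labs() matches arguments to aesthetic names; xlab/ylab are not
    # aesthetics, so they have no effect. Use x and y instead. With
    # coord_flip(), x (namerank) is drawn vertically and y (dollars) horizontally.
    x = NULL,
    y = "Undergraduate costs and tuition",
    color = NULL,
    title = "University costs are far from affordable",
    # paste(sep = "\n") avoids the stray indentation of a multi-line literal
    caption = paste(
      "Tuition fees source: Visual Capitalist",
      "Note that in-state tuition data is unavailable for most universities",
      "",
      "Cost of attendance source: University websites",
      "Note that official estimates of cost of attendance are unavailable for Boston College and Northeastern",
      sep = "\n"
    )
  )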
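The other option mentioned, scale_xxx_identity(), can be sketched on hypothetical toy data (toy, income_rows, lw and pt are made-up names): compute the widths and sizes as columns after arrange(), so each value travels with its row, and let identity scales use those values as-is. scale_linewidth_identity() assumes ggplot2 >= 3.4.0.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data standing in for clg_fee
toy <- data.frame(
  namerank = c("Income A", "Income B", "School 1", "School 2", "School 3"),
  costatt  = c(70, 50, 90, 60, 40)
)
income_rows <- c("Income A", "Income B")  # hypothetical highlight set

p <- toy |>
  arrange(costatt) |>
  mutate(
    namerank = factor(namerank, namerank),
    # Computed after arrange(), so each value stays attached to its row
    lw = ifelse(namerank %in% income_rows, 1.3, 0.7),
    pt = ifelse(namerank %in% income_rows, 5, 2)
  ) |>
  ggplot(aes(x = namerank, xend = namerank)) +
  geom_segment(aes(y = 0, yend = costatt, linewidth = lw)) +
  geom_point(aes(y = costatt, size = pt)) +
  # Identity scales use the column values directly and draw no legend
  scale_linewidth_identity() +
  scale_size_identity() +
  coord_flip()
```

Because lw and pt are columns of the data rather than free-standing vectors, re-sorting cannot separate a value from its row. Compared with scale_*_manual(), the values live in the data instead of in the scale definition.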
