Abbreviate certain strings (dplyr syntax)

Posted on July 12

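I am working with Eurostat data in R. One of its variables is Geopolitical entity (reporting), which often takes values such as "Euro Area - 12 countries (2001-2006)" or "European Union - 27 countries (from 2020)". I want to abbreviate every value of Geopolitical entity (reporting) that starts with "Euro", so that I am left only with values like EA12 or EU27. That means keeping the first letter of each of the first two words, followed by the number of countries. I know I need mutate and case_when together with gsub or sub, but I have never been good at regular expressions. For context: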

library(tidyverse)
library(eurostat) 
df <- get_eurostat("spr_exp_pens",type = "label")
colnames(df)[1:5] <- label_eurostat_vars(df)
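If the download is not available, you can build a hypothetical offline stand-in with the same column name from the labels quoted in this question. This is only an assumption for experimenting with the recoding. It has a single column and is not the real dataset.

```r
# Hypothetical stand-in for `df`: only the column discussed in this question,
# filled with labels quoted in the question. The 2023 label is written with an
# en dash (U+2013), as it appears in the question.
df_sample <- data.frame(
  `Geopolitical entity (reporting)` = c(
    "Euro Area - 12 countries (2001-2006)",
    "European Union - 27 countries (from 2020)",
    "Euro area \u2013 20 countries (from 2023)",
    "Germany (until 1990 former territory of the FRG)",
    "European Economic Area (EEA18-1995, EEA28-2004, EEA30-2007, EEA31-2013, EEA30-2020)"
  ),
  check.names = FALSE  # keep spaces and parentheses in the column name
)

names(df_sample)  # "Geopolitical entity (reporting)"
nrow(df_sample)   # 5
```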

I also know that I need to continue with:

df %>% mutate(`Geopolitical entity (reporting)` = 
       case_when( ...

Can anyone help me?

EDIT: I realised that Geopolitical entity (reporting) also takes the values Germany (until 1990 former territory of the FRG) and European Economic Area (EEA18-1995, EEA28-2004, EEA30-2007, EEA31-2013, EEA30-2020). I would like to shorten these to Germany and EEA as well.

I have made some progress:

df <- df %>% mutate(`Geopolitical entity (reporting)` =
           case_when(
             grepl("Germany", `Geopolitical entity (reporting)`, ignore.case = TRUE) ~ "Germany",
             startsWith(`Geopolitical entity (reporting)`, "Euro") ~
               sub("^([A-Za-z])[A-Za-z]*\\s([A-Za-z])[A-Za-z]*\\s-\\s(\\d+).*", "\\U\\1\\U\\2\\3", `Geopolitical entity (reporting)`, perl = TRUE),
             startsWith(`Geopolitical entity (reporting)`, "European Economic Area") ~ "EEA",
             TRUE ~ `Geopolitical entity (reporting)`))

However, two values are still left unchanged:

European Economic Area (EEA18-1995, EEA28-2004, EEA30-2007, EEA31-2013, EEA30-2020)
Euro area – 20 countries (from 2023)

Can anyone explain why, and how to fix these last ones?

Recommended answer:

Instead of rebuilding each abbreviation with a single regular expression, detect each group of labels with str_detect(). Then append the first number found in the label (str_extract(..., "\\d+")) to a fixed prefix:

library(tidyverse)
# install.packages("eurostat")
library(eurostat)

df <- get_eurostat("spr_exp_pens", type = "label")
colnames(df)[1:5] <- label_eurostat_vars(df)

df %>%
  mutate(`Geopolitical entity (reporting)` =
           case_when(
             str_detect(`Geopolitical entity (reporting)`, "Euro area") ~
               paste0("EA", str_extract(`Geopolitical entity (reporting)`, "\\d+")),
             str_detect(`Geopolitical entity (reporting)`, "European Union") ~
               paste0("EU", str_extract(`Geopolitical entity (reporting)`, "\\d+")),
             str_detect(`Geopolitical entity (reporting)`, "Germany") ~ "Germany",
             str_detect(`Geopolitical entity (reporting)`, "European Economic Area") ~ "EEA",
             TRUE ~ `Geopolitical entity (reporting)`)
  ) %>%
  select(`Geopolitical entity (reporting)`) %>%
  distinct() %>%
  print(n = 50)
#> # A tibble: 45 × 1
#>    `Geopolitical entity (reporting)`
#>    <chr>
#>  1 Austria
#>  2 Ireland
#>  3 Lithuania
#>  4 Latvia
#>  5 Malta
#>  6 Albania
#>  7 Belgium
#>  8 Bulgaria
#>  9 Switzerland
#> 10 Cyprus
#> 11 Czechia
#> 12 Germany
#> 13 Denmark
#> 14 EA12
#> 15 EA19
#> 16 EA20
#> 17 Estonia
#> 18 Greece
#> 19 Spain
#> 20 EU27
#> 21 Finland
#> 22 France
#> 23 Croatia
#> 24 Hungary
#> 25 Iceland
#> 26 Italy
#> 27 Luxembourg
#> 28 Montenegro
#> 29 Netherlands
#> 30 Norway
#> 31 Poland
#> 32 Portugal
#> 33 Romania
#> 34 Serbia
#> 35 Sweden
#> 36 Slovenia
#> 37 Slovakia
#> 38 Türkiye
#> 39 Bosnia and Herzegovina
#> 40 EA18
#> 41 EU15
#> 42 EU28
#> 43 United Kingdom
#> 44 EEA
#> 45 North Macedonia
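Why did the question's own case_when() attempt leave two labels unchanged? Both causes can be checked on plain strings in base R. This is a minimal sketch. It assumes the labels are exactly as quoted in the question and that the 2023 label really contains an en dash (written below as \u2013).

The first cause is branch order. case_when() uses the first condition that is TRUE, and the EEA label also starts with "Euro", so it is routed to the sub() branch before the EEA branch is ever reached.

The second cause is the pattern itself. It requires an ASCII hyphen between two spaces, so it matches neither the EEA label nor the 2023 label. sub() returns its input unchanged when nothing matches.

```r
# Labels as quoted in the question; the 2023 label uses an en dash (U+2013).
eea  <- "European Economic Area (EEA18-1995, EEA28-2004, EEA30-2007, EEA31-2013, EEA30-2020)"
ea20 <- "Euro area \u2013 20 countries (from 2023)"

# The pattern and replacement used in the question's sub() call.
pattern <- "^([A-Za-z])[A-Za-z]*\\s([A-Za-z])[A-Za-z]*\\s-\\s(\\d+).*"

# Control: a label with an ASCII hyphen is abbreviated as intended.
sub(pattern, "\\U\\1\\U\\2\\3", "Euro Area - 12 countries (2001-2006)", perl = TRUE)  # "EA12"

# Cause 1: the EEA label also starts with "Euro", so the startsWith(..., "Euro")
# branch, listed before the EEA branch, wins in case_when().
startsWith(eea, "Euro")                      # TRUE

# Cause 2: neither leftover label matches the pattern ...
grepl(pattern, c(eea, ea20), perl = TRUE)    # FALSE FALSE
# ... and sub() returns a non-matching string unchanged.
identical(sub(pattern, "\\U\\1\\U\\2\\3", ea20, perl = TRUE), ea20)  # TRUE

# The character after "Euro area " is not the ASCII hyphen-minus (code point 45).
utf8ToInt(substr(ea20, 11, 11))              # 8211, i.e. U+2013 EN DASH
```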

To store the recoded data instead of printing it, assign the same case_when() to a new data frame:

df_2 <- df %>%
  mutate(`Geopolitical entity (reporting)` =
           case_when(
             str_detect(`Geopolitical entity (reporting)`, "Euro area") ~
               paste0("EA", str_extract(`Geopolitical entity (reporting)`, "\\d+")),
             str_detect(`Geopolitical entity (reporting)`, "European Union") ~
               paste0("EU", str_extract(`Geopolitical entity (reporting)`, "\\d+")),
             str_detect(`Geopolitical entity (reporting)`, "Germany") ~ "Germany",
             str_detect(`Geopolitical entity (reporting)`, "European Economic Area") ~ "EEA",
             TRUE ~ `Geopolitical entity (reporting)`)
  )
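To keep the question's sub() approach instead, one possible repair is to test the EEA label before the generic "Euro" branch, and to accept either a hyphen or an en dash before the country count. This is a sketch, not taken from the answer. abbreviate_geo() and the sample vector x are hypothetical names, and x only reuses labels quoted in the question.

```r
library(dplyr)

# Hypothetical sample built from labels quoted in the question.
x <- c(
  "Euro Area - 12 countries (2001-2006)",
  "European Union - 27 countries (from 2020)",
  "Euro area \u2013 20 countries (from 2023)",
  "Germany (until 1990 former territory of the FRG)",
  "European Economic Area (EEA18-1995, EEA28-2004, EEA30-2007, EEA31-2013, EEA30-2020)",
  "Austria"
)

# Hypothetical helper: recode one character vector of labels.
abbreviate_geo <- function(geo) {
  case_when(
    # Specific labels first, because case_when() uses the first TRUE condition.
    startsWith(geo, "European Economic Area") ~ "EEA",
    grepl("Germany", geo, ignore.case = TRUE) ~ "Germany",
    # [-\u2013] accepts an ASCII hyphen or an en dash; \\U upper-cases the initials.
    startsWith(geo, "Euro") ~ sub(
      "^([A-Za-z])[A-Za-z]*\\s([A-Za-z])[A-Za-z]*\\s[-\u2013]\\s(\\d+).*",
      "\\U\\1\\2\\3", geo, perl = TRUE
    ),
    TRUE ~ geo
  )
}

abbreviate_geo(x)
# expected: "EA12" "EU27" "EA20" "Germany" "EEA" "Austria"
```

With the real data this would be used as df %>% mutate(`Geopolitical entity (reporting)` = abbreviate_geo(`Geopolitical entity (reporting)`)). Because the pattern uses [A-Za-z] and \\U, it handles both "Euro Area" and "Euro area". By contrast, the answer's str_detect(..., "Euro area") is case-sensitive.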
