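[Image: screenshot of the Excel sheet]
- Skill stream top row, columns (Employer Sponsored, State/Territory Nominated, Regional, Business Innovation, Global Talent, Skilled Independent, Distinguished Talent, Skilled Regional, November Onshore (should be removed))
- Family stream top row, columns (Partner, Parent, Child, Other Family)
How can I make Skill stream and Family stream into one column, with the subcategories in front?
I need to convert this into columns in R. Here is a link to download my data.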
# load libraries
library(readxl)
library(tidyverse)
library(zoo)
file_path <- "~/Downloads/migration_trends_statistical_package_2021_22.xlsx"
# Pull out the two header rows. On this sheet that means skipping the first 5 rows;
# I don't know why the sheet is laid out that way. The offset is likely to change
# from sheet to sheet, so adjust it until you get the same result.
categories <- read_excel(path = file_path, sheet = "1.1", col_names = FALSE, skip = 5, n_max = 2) %>%
  # drop any digits from the header text and trim surrounding whitespace
  mutate(across(everything(), ~ str_trim(str_remove_all(., "\\d+"))))
# Then turn the messy wide header rows into a clean long lookup table
categories <- data.frame(
  # fill in the blanks with the last non-NA value;
  # na.rm = FALSE keeps any leading NA so both columns stay the same length
  category = na.locf(as.character(categories[1, ]), na.rm = FALSE),
  name = as.character(categories[2, ])
)
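# Read the data rows using the second header row as column names, reshape to long,
# and attach the stream each column belongs to
read_excel(path = file_path, sheet = "1.1", skip = 7, col_names = categories$name) %>%
  pivot_longer(-Year, values_transform = list(value = as.numeric)) %>%
  left_join(categories, by = "name")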
# A tibble: 468 × 4
Year name value category
<chr> <chr> <dbl> <chr>
1 2012–13 "Employer Sponsored" 47740 Skill stream
2 2012–13 "State/Territory Nominated" 21637 Skill stream
3 2012–13 "Regional" NA Skill stream
4 2012–13 "Business Innovation and Investment" 7010 Skill stream
5 2012–13 "Global Talent (Independent)" NA Skill stream
6 2012–13 "Skilled Independent" 44251 Skill stream
7 2012–13 "Distinguished\r\n Talent" 200 Skill stream
8 2012–13 "Skilled Regional" 8132 Skill stream
9 2012–13 "November\r\nOnshore" NA Skill stream
10 2012–13 "Skill stream total" 128973 Skill stream
# ℹ 458 more rows
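The question also asks for the November Onshore column to be dropped, and the output above still has "\r\n" line breaks inside some names. Here is a minimal sketch of one way to tidy that up. It uses a small hand-built data frame (`long`, a hypothetical stand-in for the result above, with values copied from its first rows) rather than the real spreadsheet, so adjust the names to whatever your sheet actually contains:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the long result above (a few rows copied from the output)
long <- data.frame(
  Year     = "2012–13",
  name     = c("Employer Sponsored", "Distinguished\r\n Talent",
               "November\r\nOnshore", "Skill stream total"),
  value    = c(47740, 200, NA, 128973),
  category = "Skill stream"
)

cleaned <- long %>%
  mutate(name = str_squish(name)) %>%      # collapse "\r\n" and repeated spaces into one space
  filter(name != "November Onshore") %>%   # the column the question wants removed
  select(category, name, Year, value)      # stream first, then the subcategory

cleaned
```

The same three steps can be piped straight onto the end of the `left_join()` call. Whether to also filter out the "total" rows depends on what you want to do with the data next.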