I have a dataset with the columns Year, Quarter, and QY, plus many numeric variables.
# example dataset
Year = c("2019", "2020", "2021",
         "2019", "2020", "2021",
         "2019", "2020", "2021",
         "2019", "2020", "2021")
Quarter = c("1Q", "1Q", "1Q",
            "2Q", "2Q", "2Q",
            "3Q", "3Q", "3Q",
            "4Q", "4Q", "4Q")
QY = c("1Q19", "1Q20", "1Q21",
       "2Q19", "2Q20", "2Q21",
       "3Q19", "3Q20", "3Q21",
       "4Q19", "4Q20", "4Q21")
VAR1 = c(10, 20, 30,
         30, 20, 25,
         27, 10, 15,
         13, 34, 25)
df <- data.frame(Year, Quarter, QY, VAR1)
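I need to convert all character columns to factors. Year and Quarter appear to get the correct levels after conversion, but QY does not, so I defined its levels manually.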
df$Year <- as.factor(df$Year)
df$Quarter <- as.factor(df$Quarter)
df$QY <- as.factor(df$QY)
# check the levels: QY is sorted alphabetically ("1Q19", "1Q20", "1Q21", "2Q19", ...), not chronologically
str(df)
levels(df$QY)
# manually define the QY levels in chronological order
df$QY <- factor(df$QY,
                levels = c("1Q19", "2Q19", "3Q19", "4Q19",
                           "1Q20", "2Q20", "3Q20", "4Q20",
                           "1Q21", "2Q21", "3Q21", "4Q21"))
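Is there a more efficient way to have R determine the chronological order of the QY levels automatically, without my defining them by hand? This matters especially as QY grows, because I have to list every level starting from 2019.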
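One way to avoid typing the levels is to derive their order from the labels themselves. The sketch below assumes every QY value has the fixed form `<quarter digit>Q<two-digit year>` (as in `"3Q19"`); the helper name `order_qy_levels()` is hypothetical. It sorts the unique labels by year and then by quarter, so new quarters land in the right place without editing a hard-coded list.

```r
# Rebuild the example data from the question (character columns)
df <- data.frame(
  Year    = rep(c("2019", "2020", "2021"), times = 4),
  Quarter = rep(c("1Q", "2Q", "3Q", "4Q"), each = 3),
  VAR1    = c(10, 20, 30, 30, 20, 25, 27, 10, 15, 13, 34, 25),
  stringsAsFactors = FALSE
)
df$QY <- paste0(df$Quarter, substr(df$Year, 3, 4))  # "1Q19", "1Q20", ...

# Hypothetical helper: sort QY labels chronologically (year first, then quarter).
# Assumes every label looks like "<quarter digit>Q<two-digit year>", e.g. "3Q19".
order_qy_levels <- function(qy) {
  qy <- unique(as.character(qy))
  quarter <- as.integer(substr(qy, 1, 1))  # "3Q19" -> 3
  year    <- as.integer(substr(qy, 3, 4))  # "3Q19" -> 19
  qy[order(year, quarter)]
}

df$QY <- factor(df$QY, levels = order_qy_levels(df$QY))
levels(df$QY)
```

If the Year and Quarter columns are always present, ordering by them instead (`unique(as.character(df$QY[order(df$Year, df$Quarter)]))`) should give the same result without parsing the string, because both columns already sort correctly as text.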
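I don't think it's necessary to convert Year, Quarter, or QY to a time (date) variable, but please clarify whether I should. My calculations and visualizations are mainly about changes by year or by QY.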
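Whether to add a date column is a judgment call; for year-over-year and quarter-over-quarter comparisons, the existing columns may be enough. If a real time axis is wanted (for example, evenly spaced quarters in a plot), one base-R option is to map each quarter to its first calendar day. The sketch below does that with a hypothetical helper `quarter_start()`, then computes quarter-over-quarter and year-over-year differences of VAR1. It assumes no quarters are missing, since `diff()` simply compares adjacent rows, and it reorders the rows of `df`.

```r
# Rebuild the example data from the question (character columns)
df <- data.frame(
  Year    = rep(c("2019", "2020", "2021"), times = 4),
  Quarter = rep(c("1Q", "2Q", "3Q", "4Q"), each = 3),
  VAR1    = c(10, 20, 30, 30, 20, 25, 27, 10, 15, 13, 34, 25),
  stringsAsFactors = FALSE
)
df$QY <- paste0(df$Quarter, substr(df$Year, 3, 4))  # "1Q19", "1Q20", ...

# Hypothetical helper: first calendar day of each quarter as a Date,
# which plotting functions can place on a continuous time axis.
quarter_start <- function(year, quarter) {
  q <- as.integer(substr(as.character(quarter), 1, 1))  # "3Q" -> 3L
  month <- (q - 1L) * 3L + 1L                            # 1, 4, 7 or 10
  as.Date(sprintf("%s-%02d-01", as.character(year), month))
}
df$QDate <- quarter_start(df$Year, df$Quarter)

# Quarter-over-quarter change: sort chronologically, then difference adjacent rows.
df <- df[order(df$QDate), ]
df$VAR1_qoq <- c(NA, diff(df$VAR1))

# Year-over-year change: within each quarter (rows already in date order),
# difference consecutive years.
df$VAR1_yoy <- ave(df$VAR1, df$Quarter, FUN = function(v) c(NA, diff(v)))

df[, c("QY", "QDate", "VAR1", "VAR1_qoq", "VAR1_yoy")]
```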