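My goal is to find the hourly wage for each observation. The problem is that monthly pay is recorded as a band (a lower and an upper bound), while monthly working hours are a single value. How can I divide the lower and upper bounds of monthly pay by the number of hours worked?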

Sample data

dput(joint.time)
structure(list(totpinc = c(2, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 5, 
4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 3, 
3, 3, 3, 4, 4, 4, 4, 4, 2, 2, 4, 3, 4, 3, 4, 3, 2, 4, 4, 4, 4, 
5, 5, 4, 2, 2, 4, 4, 2, 2, 3, 3, 3, 2, 5, 2, 5, 2, 5, 2, 5, 6, 
5, 3, 5, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 5, 5, 5, 
3, 3, 3, 9, 4, 3, 3, 3, 3, 3, 4, 4, 3, 4, 4, 4, 3, 2, 3, 2, 3, 
2, 4, 5, 4, 5, 3, 2, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 4, 1, 5, 4, 
1, 5, 4, 1, 5, 4, 6, 6, 6, 2, 2, 5, 5, 5, 5, 4, 3, 3, 3, 7, 4, 
7, 4, 7, 4, 7, 7, 6, 5, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 
6, 5, 1, 1, 2, 2, 2, 2, 2, 2, 6, 1, 2, 1, 2, 6, 6, 6, 2, 6, 6, 
6, 2, 2, 2, 2, 3, 3, 1, 4, 5, 2, 2, 2, 2, 3, 2, 3, 5, 5, 5, 3, 
3, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 4, 4, 3, 2, 2, 3, 
3, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 6, 4, 6, 2, 2, 2, 8, 7, 5, 
5, 5, 3, 10, 2, 1, 4, 4, 1, 1, 1, 1, 2, 1, 1, 1, 6, 3, 3, 3, 
3, 2, 3, 2, 3, 2, 4, 6, 4, 2, 6, 4, 2, 2, 2, 2, 2, 4, 4, 3, 3, 
3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 8, 8, 8, 8, 5, 5, 3, 10, 4, 4, 
4, 4, 1, 4, 4, 4, 5, 5, 5, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 2, 6, 
6, 6, 3, 3, 4, 4, 3, 3, 3, 3, 3, 5, 3, 5, 5, 5, 5, 2, 3, 2, 3, 
4, 6, 6, 6, 5, 5, 5, 5, 2, 2, 4, 3, 6, 4, 4, 4, 4, 4, 3, 3, 3, 
2, 2, 2, 2, 2, 4, 6, 4, 4, 4, 5, 5, 5, 5, 5, 5, 3, 6, 6, 6, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 10, 5, 10, 5, 10, 2, 
2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 11, 11, 
1, 3, 6, 2, 2, 2, 5, 5, 5, 5, 5, 3, 5, 3, 4, 3, 8, 8, 3, 1, 3, 
1, 1, 4, 4, 4, 4, 1, 6, 4, 4, 4, 4, 4, 4, 4, 5, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 4, 4, 4, 4), mthhrs = c(150.5, 193.5, 86, 193.5, 
86, 193.5, 86, 193.5, 86, 215, 172, 172, 154.8, 154.8, 154.8, 
150.5, 150.5, 150.5, 150.5, 150.5, 150.5, 150.5, 43, 43, 43, 
43, 258, 258, 172, 172, 172, 172, 129, 129, 129, 129, 150.5, 
150.5, 150.5, 150.5, 150.5, 60.2, 60.2, 172, 167.7, 172, 167.7, 
172, 167.7, 51.6, 159.1, 159.1, 159.1, 159.1, 167.7, 167.7, 150.5, 
68.8, 68.8, 197.8, 197.8, 86, 68.8, 141.9, 141.9, 141.9, 64.5, 
258, 64.5, 258, 64.5, 258, 64.5, 258, 193.5, 258, 64.5, 172, 
172, 86, 86, 86, 86, 86, 86, 86, 193.5, 193.5, 391.3, 391.3, 
391.3, 391.3, 391.3, 159.1, 159.1, 159.1, 86, 86, 86, 202.1, 
163.4, 60.2, 60.2, 60.2, 60.2, 60.2, 103.2, 103.2, 120.4, 159.1, 
159.1, 159.1, 150.5, 86, 150.5, 86, 150.5, 86, 180.6, 193.5, 
180.6, 193.5, 159.1, 34.4, 34.4, 43, 43, 236.5, 236.5, 236.5, 
124.7, 124.7, 90.3, 172, 150.5, 90.3, 172, 150.5, 90.3, 172, 
150.5, 90.3, 172, 150.5, 172, 172, 172, 150.5, 150.5, 167.7, 
167.7, 167.7, 167.7, 159.1, 215, 215, 215, 193.5, 154.8, 193.5, 
154.8, 193.5, 154.8, 159.1, 159.1, 159.1, 129, 129, 150.5, 172, 
150.5, 172, 150.5, 172, 150.5, 172, 150.5, 172, 150.5, 172, 150.5, 
172, 86, 86, 47.3, 47.3, 47.3, 81.7, 47.3, 47.3, 150.5, 55.9, 
107.5, 55.9, 107.5, 172, 172, 172, 64.5, 159.1, 159.1, 159.1, 
172, 172, 172, 172, 86, 86, 64.5, 163.4, 150.5, 81.7, 81.7, 81.7, 
172, 103.2, 172, 103.2, 172, 172, 172, 86, 86, 68.8, 150.5, 236.5, 
159.1, 150.5, 159.1, 150.5, 159.1, 150.5, 159.1, 64.5, 64.5, 
64.5, 64.5, 172, 159.1, 103.2, 86, 86, 137.6, 137.6, 64.5, 64.5, 
64.5, 94.6, 94.6, 94.6, 94.6, 159.1, 159.1, 159.1, 159.1, 301, 
159.1, 301, 60.2, 60.2, 60.2, 258, 215, 150.5, 150.5, 150.5, 
120.4, 387, 51.6, 30.1, 159.1, 150.5, 43, 43, 43, 43, 12.9, 154.8, 
154.8, 154.8, 159.1, 77.4, 150.5, 150.5, 77.4, 64.5, 77.4, 64.5, 
77.4, 64.5, 193.5, 193.5, 172, 266.6, 193.5, 172, 266.6, 107.5, 
107.5, 107.5, 107.5, 150.5, 150.5, 129, 129, 81.7, 81.7, 159.1, 
159.1, 159.1, 150.5, 159.1, 150.5, 159.1, 150.5, 258, 258, 258, 
258, 172, 172, 129, 193.5, 167.7, 172, 172, 159.1, 129, 94.6, 
94.6, 94.6, 258, 258, 258, 150.5, 150.5, 150.5, 154.8, 154.8, 
236.5, 236.5, 236.5, 236.5, 236.5, 68.8, 159.1, 159.1, 215, 133.3, 
133.3, 172, 172, 8.6, 8.6, 8.6, 167.7, 129, 129, 129, 129, 129, 
129, 129, 60.2, 107.5, 60.2, 107.5, 55.9, 154.8, 154.8, 154.8, 
129, 129, 129, 129, 68.8, 68.8, 107.5, 120.4, 193.5, 184.9, 94.6, 
94.6, 159.1, 159.1, 167.7, 167.7, 167.7, 68.8, 68.8, 68.8, 86, 
86, 361.2, 258, 150.5, 150.5, 150.5, 206.4, 206.4, 206.4, 206.4, 
206.4, 206.4, 159.1, 129, 129, 129, 154.8, 154.8, 150.5, 77.4, 
150.5, 77.4, 86, 86, 86, 172, 172, 172, 172, 172, 146.2, 236.5, 
146.2, 236.5, 146.2, 236.5, 86, 86, 150.5, 60.2, 150.5, 60.2, 
150.5, 60.2, 150.5, 60.2, 150.5, 60.2, 150.5, 60.2, 150.5, 60.2, 
150.5, 60.2, 103.2, 202.1, 202.1, 172, 120.4, 154.8, 86, 86, 
283.8, 180.6, 180.6, 180.6, 180.6, 172, 107.5, 172, 107.5, 159.1, 
258, 150.5, 150.5, 116.1, 47.3, 124.7, 129, 129, 301, 301, 159.1, 
159.1, 34.4, 172, 215, 215, 150.5, 150.5, 150.5, 150.5, 150.5, 
159.1, 197.8, 197.8, 172, 159.1, 172, 159.1, 172, 159.1, 172, 
159.1, 193.5, 193.5, 193.5, 193.5)), row.names = c(NA, 500L), class = "data.frame")

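Goals

  1. Divide the lower and upper bounds of monthly pay by the hours worked. This gives me the bounds of the hourly wage.
  2. Using the hourly wage bounds, find the midpoint value.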

Please note that the table below is only illustrative; it does not exactly match the data.

| totpinc (band) | mthhrs | hrwage_bound (£/hour) | hrwage (£/hour) |
| -------------- | ------ | --------------------- | --------------- |
| £215 - £435    | 150.5  | 1.4 - 2.9             | 2.2             |
| £870 - £1305   | 193.5  | 4.5 - 6.7             | 5.6             |
| £1305 - £1740  | 86     | 15.2 - 20.2           | 17.7            |
| £870 - £1305   | 193.5  | 4.5 - 6.7             | 5.6             |
| £1305 - £1740  | 86     | 15.2 - 20.2           | 17.7            |
| £870 - £1305   | 193.5  | 4.5 - 6.7             | 5.6             |
| £1305 - £1740  | 86     | 15.2 - 20.2           | 17.7            |
| £870 - £1305   | 193.5  | 4.5 - 6.7             | 5.6             |
| £1305 - £1740  | 86     | 15.2 - 20.2           | 17.7            |
| £870 - £1305   | 193.5  | 4.5 - 6.7             | 5.6             |
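A minimal sketch of the two goals in plain R, assuming the band bounds are copied by hand from the value labels. The vectors band_min and band_max and the function hourly_wage are hypothetical names, and code 1 is given an assumed lower bound of 0. Like the table, it uses the label bounds as they are, without adjusting the "less than" upper bound:

```r
# Hypothetical: monthly pay bands (GBP) for codes 1 to 10, copied from the
# value labels; code 1 ("less than £215") gets an assumed lower bound of 0
band_min <- c(0, 215, 435, 870, 1305, 1740, 2820, 3420, 3830, 4580)
band_max <- c(215, 435, 870, 1305, 1740, 2820, 3420, 3830, 4580, 6670)

# Goal 1: divide both monthly bounds by the monthly hours
# Goal 2: take the midpoint of the two hourly bounds
hourly_wage <- function(code, hours) {
  lo <- band_min[code] / hours   # lower bound per hour
  hi <- band_max[code] / hours   # upper bound per hour
  data.frame(lo = lo, hi = hi, mid = (lo + hi) / 2)
}

# First three rows of the sample data: totpinc = 2, 4, 5
res <- hourly_wage(c(2, 4, 5), c(150.5, 193.5, 86))
round(res, 1)
```

Rounded to one decimal, this reproduces the first three rows of the table (midpoints 2.2, 5.6 and 17.7). Codes outside 1 to 10 index past the end of the vectors and therefore return NA.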

Recommended answer

Disclaimer

> class(data_compressed$totpinc)
[1] "haven_labelled" "vctrs_vctr"    
[3] "double" 

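I'm not an expert in handling haven_labelled data, so I used a base-R approach that will hopefully get you started. It is possible to write more concise and less hard-coded code, use external libraries, and so on. Please adapt this answer to your actual problem.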

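labels() returns ten values here:

> labels(data_compressed$totpinc)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9" 
[10] "10"

Note, however, that base R's labels() falls back to the element positions for an unnamed vector, so these ten strings reflect the length of data_compressed$totpinc (10 here), not its value labels. The value labels themselves are stored in the labels attribute.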

This answer helps us get at the following attributes:

> attributes(data_compressed$totpinc)$labels
         ineligible - not currently employed/ self employed 
                                                         -2 
                                   ineligible - under 16yrs 
                                                         -1 
                                           less than £  215 
                                                          1 
                                 £  215 to less than £  435 
                                                          2 
                                 £  435 to less than £  870 
                                                          3 
                                  £  870 to less than £1305 
                                                          4 
                                   £1305 to less than £1740 
                                                          5 
                                   £1740 to less than £2820 
                                                          6 
                                   £2820 to less than £3420 
                                                          7 
                                   £3420 to less than £3830 
                                                          8 
                                   £3830 to less than £4580 
                                                          9 
                                   £4580 to less than £6670 
                                                         10 
                                              £6670 or more 
                                                         11 
eligible (current employee or self-emp) - dk/ refuse income 
                                                         12 
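To see the difference between the labels attribute and base R's labels(), here is a small self-contained sketch. The vector x is a hypothetical stand-in for data_compressed$totpinc: plain doubles carrying a labels attribute, with only three of the value labels copied, so haven is not needed. attr(x, "labels") reads the same named vector as attributes(x)$labels:

```r
# Hypothetical stand-in for data_compressed$totpinc (no haven class needed);
# the £ sign is written as \u{00a3} to avoid source-encoding issues
x <- structure(
  c(2, 4, 5),
  labels = setNames(
    c(1, 2, 4),
    c("less than \u{00a3}  215",
      "\u{00a3}  215 to less than \u{00a3}  435",
      "\u{00a3}  870 to less than \u{00a3}1305")
  )
)

value_labels <- attr(x, "labels")   # same as attributes(x)$labels
codes <- unname(value_labels)       # the numeric codes: 1 2 4
label_text <- names(value_labels)   # the strings that hold the band bounds

# base labels() is a different function: x has no names, so it falls back
# to the element positions and reflects length(x), not the value labels
pos <- labels(x)                    # "1" "2" "3"
```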

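Getting and organising the attributes

In the following, I use the subset attributes(data_compressed$totpinc)$labels[3:12], i.e. the codes 1 to 10. From its names() I extract the information given in the strings: with Map() + nchar() + strsplit() I get the numbers. I prepend an artificial lower bound "0" to label "1" ("less than £215"); please change it to whatever you consider more appropriate for your actual analysis. Finally, I coerce the character values to numeric. Codes outside 1 to 10 (-2, -1, 11 "£6670 or more" and 12) are not covered by this approach, so those observations end up with NA.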

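y = names(attributes(data_compressed$totpinc)$labels[3:12])  # label strings for codes 1 to 10
yy = Map(\(x) x[nchar(x) > 0L], strsplit(y, "\\D+"))        # split on non-digits, drop empty strings
yy[[1]] = append(yy[[1]], "0", after = 0)                     # artificial lower bound for code 1
yy = lapply(yy, as.numeric)                                   # character -> numeric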

This gives us

> yy
[[1]]
[1]   0 215

[[2]]
[1] 215 435

[[3]]
[1] 435 870

[[4]]
[1]  870 1305

[[5]]
[1] 1305 1740

[[6]]
[1] 1740 2820

[[7]]
[1] 2820 3420

[[8]]
[1] 3420 3830

[[9]]
[1] 3830 4580

[[10]]
[1] 4580 6670
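An alternative way to pull the numbers out of the label strings, sketched with base R's regmatches() and gregexpr() on three of the label strings copied from above (the £ sign is written as \u{00a3} so the snippet does not depend on the source-file encoding):

```r
# Three label strings copied from the attribute output above
y <- c("less than \u{00a3}  215",
       "\u{00a3}  215 to less than \u{00a3}  435",
       "\u{00a3}4580 to less than \u{00a3}6670")

# gregexpr() finds every run of digits; regmatches() extracts them as strings
nums <- regmatches(y, gregexpr("[0-9]+", y))
nums <- lapply(nums, as.numeric)

# Only the first band lacks a lower bound; prepend the same artificial 0
nums[[1]] <- c(0, nums[[1]])
nums
```

Matching digit runs directly avoids the empty strings that strsplit() produces, so no nchar() filter is needed.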

Lookup table

Then I create a simple lookup table. This is a verbose but easy-to-read approach:

lookup = data.frame(unname(attributes(data_compressed$totpinc)$labels[3:12]), # codes 1 to 10
                    t(list2DF(yy))) |>
  `colnames<-`(c("label", "minb", "maxb")) |>
  `row.names<-`(NULL)
lookup$maxb = lookup$maxb - 1 # "less than": subtract 1 from the upper bound

which looks like

> lookup
   label minb maxb
1      1    0  214
2      2  215  434
3      3  435  869
...
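A sketch of the same lookup built with do.call(rbind, ...) instead of t(list2DF(...)), using only the first three bounds from yy above. It assumes the bounds are listed in code order, so seq_along() can stand in for the label codes:

```r
# First three elements of yy as printed above (code 1 has the artificial 0)
yy <- list(c(0, 215), c(215, 435), c(435, 870))

# do.call(rbind, ...) stacks the length-2 vectors into a 3 x 2 matrix
m <- do.call(rbind, yy)

# Assumption: element i of yy belongs to code i
lookup <- data.frame(label = seq_along(yy),
                     minb  = m[, 1],
                     maxb  = m[, 2] - 1)   # "less than": subtract 1
lookup
```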

Matching

Use match() to attach the minb and maxb values to the toy data (data_compressed):

data_compressed[c("minb", "maxb")] = 
  lookup[match(data_compressed$totpinc, lookup$label), c("minb", "maxb")]

I do this only for clarity (obviously, there is no need to bind these columns to data_compressed).

Calculation
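A self-contained sketch of the match() step with a hypothetical lookup for codes 1 to 10 and four made-up observations (the hours are invented). match() returns the row position of each code in lookup; code 11 ("£6670 or more"), which the lookup does not cover, gets NA:

```r
# Hypothetical lookup with the "less than" upper bounds already decremented
lookup <- data.frame(
  label = 1:10,
  minb  = c(0, 215, 435, 870, 1305, 1740, 2820, 3420, 3830, 4580),
  maxb  = c(214, 434, 869, 1304, 1739, 2819, 3419, 3829, 4579, 6669)
)

# Made-up observations, including one with code 11
toy <- data.frame(totpinc = c(2, 4, 5, 11), mthhrs = c(150.5, 193.5, 86, 172))

idx <- match(toy$totpinc, lookup$label)   # row positions in lookup, NA if absent
toy[c("minb", "maxb")] <- lookup[idx, c("minb", "maxb")]
toy
```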

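Finally, I perform the following calculation: (1) divide the bounds (minb, maxb) by the working hours (mthhrs), and (2) average the two resulting hourly bounds for each observation: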

data_compressed[c("min_mthhrs", "max_mthhrs")] = lapply(data_compressed[c("minb", "maxb")], \(x) x / data_compressed$mthhrs)
data_compressed$avg_per_hour = with(data_compressed, (min_mthhrs + max_mthhrs) / 2L)

This results in

> data_compressed
   totpinc mthhrs minb maxb min_mthhrs max_mthhrs avg_per_hour
1        2  150.5  215  434   1.428571   2.883721     2.156146
2        4  193.5  870 1304   4.496124   6.739018     5.617571
3        5   86.0 1305 1739  15.174419  20.220930    17.697674
4        4  193.5  870 1304   4.496124   6.739018     5.617571
5        5   86.0 1305 1739  15.174419  20.220930    17.697674
6        4  193.5  870 1304   4.496124   6.739018     5.617571
7        5   86.0 1305 1739  15.174419  20.220930    17.697674
8        4  193.5  870 1304   4.496124   6.739018     5.617571
9        5   86.0 1305 1739  15.174419  20.220930    17.697674
10       4  215.0  870 1304   4.046512   6.065116     5.055814
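A quick check on row 1 of the output above (totpinc = 2, mthhrs = 150.5): because dividing by mthhrs is linear, averaging the two hourly bounds gives the same result as dividing the midpoint of the monthly band by the hours, so either order of operations works:

```r
# Values from row 1 of the output above
minb <- 215; maxb <- 434; mthhrs <- 150.5

avg_a <- (minb / mthhrs + maxb / mthhrs) / 2   # divide first, then average
avg_b <- ((minb + maxb) / 2) / mthhrs          # average first, then divide

all.equal(avg_a, avg_b)
```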
