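I'm working with a dataset in R and trying to compute the number of children each woman has, based on her relationship to the household head. The dataset includes variables such as household ID, individual ID, relationship to the household head, age, gender, and income.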
HouseholdID IndividualID Relationshiptothehouseholdhead Age Gender Income
<dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 1 1 C 80 male 150
2 1 2 D 81 female 120
3 1 3 A 60 male 630
4 1 4 B 59 female 500
5 1 5 E3 35 male 380
6 1 6 F3 30 female 220
7 1 7 E5 33 female 170
8 1 8 F5 30 male 160
9 1 9 G32 20 female 290
10 1 10 G51 15 female 200
11 1 11 G52 12 female 100
12 1 12 G55 8 male 80
13 2 1 A 58 male 380
14 2 2 B 55 female 220
15 2 3 E1 35 male 170
16 2 4 F1 37 female 160
17 2 5 E2 33 male 290
18 2 6 F2 30 female 110
19 2 7 G21 17 female 210
20 2 8 G22 15 female 750
21 2 9 G23 12 female 350
The data structure in the table includes the following variables:
Household ID: This is a unique identifier for a family household.
Individual ID: This is a unique number assigned to each individual within the household.
Relationship to the household head: Specific symbols are used to represent the relationship of an individual to the head of the household.
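- "A" is the household head;
- "B" is the head's spouse;
- "C" is the head's father;
- "D" is the head's mother;
- For the head's children and their descendants, "E1" denotes the first child, "E2" the second child, and so on; "F1" denotes the first child's spouse, "F2" the second child's spouse, and so on.
- For grandchildren, "G11" denotes the first child of the first child (E1), "G12" the second child of the first child (E1), "G21" the first child of the second child (E2), and so on; "H11" denotes the spouse of the first grandchild (G11), and so on.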
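To make the coding scheme concrete, here is a minimal base R sketch of how a relationship code could be split into its parts. The helper name `parse_code` and the column names are hypothetical, and the sketch assumes every index is a single digit (so "G32" reads as child 3, birth order 2).

```r
# Hypothetical helper: split a relationship code into its letter and digits.
parse_code <- function(code) {
  data.frame(
    code   = code,
    letter = sub("[0-9]*$", "", code),      # generation/role letter, e.g. "G"
    digits = sub("^[A-Za-z]+", "", code),   # trailing digits, e.g. "32"
    stringsAsFactors = FALSE
  )
}

parsed <- parse_code(c("A", "B", "E3", "F3", "G32", "G51"))

# For grandchildren ("G"), the first digit identifies the parent (E3 -> "3").
# Assumption: single-digit indices only.
parsed$parent_index <- ifelse(parsed$letter == "G",
                              substr(parsed$digits, 1, 1), NA)

# For children ("E") and grandchildren ("G"), the last digit is the birth order.
parsed$birth_order <- ifelse(parsed$letter %in% c("E", "G"),
                             substring(parsed$digits, nchar(parsed$digits)), NA)

print(parsed)
```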
Age: The age of the individual.
Gender: The gender of the individual, represented by "male" or "female".
Income: The income situation of the individual.
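Please generate a dataset similar to Table 2 from the data in Table 1, meeting the following requirements:
- Include only female individuals.
- Compute the number of children each woman has given birth to.
Note that the number of children is mainly determined by the highest number after the letter, not by simply counting the observations in the data. For example, in household 1, the individual with ID 4 should be considered to have 5 children, not 2.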
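To illustrate the difference between the highest index and a plain row count, here is a small base R sketch using the relationship codes of household 1. The helper `kids_of_child` is hypothetical, and single-digit indices are assumed; this is only an illustration of the rule, not a full solution.

```r
# Relationship codes of household 1, copied from the data.
codes <- c("C", "D", "A", "B", "E3", "F3", "E5", "F5",
           "G32", "G51", "G52", "G55")

# Children of the household head: take the number after "E".
e_index <- as.integer(sub("^E", "", grep("^E[0-9]+$", codes, value = TRUE)))

max(e_index)     # 5: highest child number, the value wanted for ID 4 ("B")
length(e_index)  # 2: number of "E" rows actually observed

# Hypothetical helper: highest birth order among the children of child k,
# i.e. the largest second digit among codes of the form "Gk?".
kids_of_child <- function(codes, k) {
  g <- grep(paste0("^G", k, "[0-9]$"), codes, value = TRUE)
  if (length(g) == 0) return(0L)
  max(as.integer(substr(g, 3, 3)))
}

kids_of_child(codes, 3)  # 2: only "G32" is observed, but its birth order is 2
```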
The result should look like this:
HouseholdID IndividualID Age Gender Income Numofkids
1 2 81 female 120 1
1 4 59 female 500 5
1 6 30 female 220 2
1 7 33 female 170 3
1 9 20 female 290 0
1 10 15 female 200 0
1 11 12 female 100 0
2 2 55 female 220 2
2 4 37 female 160 0
2 6 30 female 110 3
2 7 17 female 210 0
2 8 15 female 750 0
2 9 12 female 350 0
Here is the data:
data = structure(list(HouseholdID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2), IndividualID = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9), Relationshiptothehouseholdhead = c("C",
"D", "A", "B", "E3", "F3", "E5", "F5", "G32", "G51", "G52", "G55",
"A", "B", "E1", "F1", "E2", "F2", "G21", "G22", "G23"), Age = c(80,
81, 60, 59, 35, 30, 33, 30, 20, 15, 12, 8, 58, 55, 35, 37, 33,
30, 17, 15, 12), Gender = c("male", "female", "male", "female",
"male", "female", "female", "male", "female", "female", "female",
"male", "male", "female", "male", "female", "male", "female",
"female", "female", "female"), Income = c(150, 120, 630, 500,
380, 220, 170, 160, 290, 200, 100, 80, 380, 220, 170, 160, 290,
110, 210, 750, 350)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -21L))
Thanks!