我有一个数据源,看起来大致像这样:

      location  available  sold     name  local_id    more_data
0   1001 - BBB          1     0    Alpha        24   'DJQ3DD3y'
1   1001 - BBB          1     0    Alpha        24   'aB3joXQy'
2   1001 - BBB          1     0    Alpha        24   'ZJve572B'
3   1001 - BBB          1     0    Alpha        24   'DJEkx8Dy'
4   1001 - BBB          1     0    Alpha        24   'VyVaWLYp'
5   1001 - BBB          1     0    Bravo        19   'Rpr7AvVy'
6   1001 - BBB          1     0    Bravo        19   'ZJlO0VmB'
7   1001 - BBB          1     0    Bravo        19   'OBb6NrrB'
8   1001 - BBB          1     0    Bravo        19   'ZJvPPEXy'
9   1001 - BBB          1     0  Charlie         6   'Vy9MMOEy'
10  1001 - BBB          1     0  Charlie         6   'MJ8AALKp'
11  1001 - BBB          1     0    Delta        17   'vpWmN1kB'
12  1001 - BBB          1     0    Delta        17   'DJEb9qQp'
13  1001 - BBB          1     0     Echo         7   'qyZ1zn1p'
14  1001 - BBB          1     0     Echo         7   'bBqaYoMB'
15  1001 - BBB          1     0     Golf        22   'AJgLr9qp'
16  1001 - BBB          1     0     Golf        22   'vBdV57Ap'
17  1001 - BBB          1     0     Golf        22   'VJYxnLZB'
18   1001 - GG       1029   237  Charlie         6   'VJYxnGXB'
19   1001 - GG       1029   237  Charlie         6   'Vy9Mo52y'
20   1001 - GG       1029   237    Delta        17   'aB3zxYWy'
21   1001 - GG       1029   237    Delta        17   'MJ8A3z0p'
22   1001 - GG       1029   237     Echo         7   'YpLMPwNy'
23   1001 - GG       1029   237     Echo         7   '8Bwev1ep'
24   1001 - GG       1029   237     Golf        22   'MJXm6bLp'
25   1001 - GG       1029   237     Golf        22   'oye7XR0J'
26   1001 - GG       1029   237     Golf        22   'vpWmDDYB'
27    1001 - P        873   375  Charlie         6   'DJEbjjkp'
28    1001 - P        873   375  Charlie         6   'aB3z66zy'
29    1001 - P        873   375    Delta        17   'Kp4zrrKB'
30    1001 - P        873   375    Delta        17   'oyxqMMAB'
31    1001 - P        873   375     Echo         7   'zJ1KMMZy'
32    1001 - P        873   375     Echo         7   'ZJlOzz6B'
33    1001 - P        873   375  Foxtrot        20   'YpLMbbay'
34    1001 - P        873   375  Foxtrot        20   'ZJnmzzYB'
35    1001 - P        873   375     Golf        22   'Kp4zr5LB'
36    1001 - P        873   375     Golf        22   'oye7jg8J'
37    1001 - P        873   375     Golf        22   'OBb6jE3B'
38   1002 - GG         37    11  Charlie         6   'EyGMWPbJ'
39   1002 - GG         37    11  Charlie         6   'aB3zOoDy'
40   1002 - GG         37    11    Delta        17   'DJQ4laLB'
41   1002 - GG         37    11    Delta        17   'ZJlOvNXB'
42   1002 - GG         37    11     Echo         7   'Rpr7a8Dy'
43   1002 - GG         37    11     Echo         7   'zJjYNR4B'
44   1002 - GG         37    11     Golf        22   'Vy9MqkRy'
45   1002 - GG         37    11     Golf        22   'oye7Y0kJ'
46   1002 - GG         37    11     Golf        22   '8BweZbnp'
47    1002 - P       1854   826  Charlie         6   'Rpr7Z5by'
48    1002 - P       1854   826  Charlie         6   'vBdVK1Ap'
49    1002 - P       1854   826    Delta        17   '4Jkae8Dy'
50    1002 - P       1854   826    Delta        17   'YpLM3nxy'
51    1002 - P       1854   826     Echo         7   'VB7vD6Py'
52    1002 - P       1854   826     Echo         7   'ZJlOXbzB'
53    1002 - P       1854   826  Foxtrot        20   'MpNqezKJ'
54    1002 - P       1854   826  Foxtrot        20   '9pOWo39p'
55    1002 - P       1854   826     Golf        22   'MJXm5qnp'
56    1002 - P       1854   826     Golf        22   'oy5vxd4B'
57    1002 - P       1854   826     Golf        22   'DJQ4qz3B'

如您所见,availablesold列中的数据与location列相关地呈现,并且与该列中数字和字母的独特组合相关. 我试图做的是按照location列的第一部分将这些分组(即,"available1 - XXX"的"available1"部分),并对availablesold的唯一值进行相加,同时保留各行唯一的其他数据,并包括属于键的数字类型,我不想更改.

因此,输出如下所示:

    location  available  sold     name  local_id    more_data
0       1001       1903   612    Alpha        24   'DJQ3DD3y'
1       1001       1903   612    Alpha        24   'aB3joXQy'
2       1001       1903   612    Alpha        24   'ZJve572B'
3       1001       1903   612    Alpha        24   'DJEkx8Dy'
4       1001       1903   612    Alpha        24   'VyVaWLYp'
5       1001       1903   612    Bravo        19   'Rpr7AvVy'
6       1001       1903   612    Bravo        19   'ZJlO0VmB'
7       1001       1903   612    Bravo        19   'OBb6NrrB'
8       1001       1903   612    Bravo        19   'ZJvPPEXy'
9       1001       1903   612  Charlie         6   'Vy9MMOEy'
10      1001       1903   612  Charlie         6   'MJ8AALKp'
11      1001       1903   612    Delta        17   'vpWmN1kB'
12      1001       1903   612    Delta        17   'DJEb9qQp'
13      1001       1903   612     Echo         7   'qyZ1zn1p'
14      1001       1903   612     Echo         7   'bBqaYoMB'
15      1001       1903   612     Golf        22   'AJgLr9qp'
16      1001       1903   612     Golf        22   'vBdV57Ap'
17      1001       1903   612     Golf        22   'VJYxnLZB'
18      1001       1903   612  Charlie         6   'VJYxnGXB'
19      1001       1903   612  Charlie         6   'Vy9Mo52y'
20      1001       1903   612    Delta        17   'aB3zxYWy'
21      1001       1903   612    Delta        17   'MJ8A3z0p'
22      1001       1903   612     Echo         7   'YpLMPwNy'
23      1001       1903   612     Echo         7   '8Bwev1ep'
24      1001       1903   612     Golf        22   'MJXm6bLp'
25      1001       1903   612     Golf        22   'oye7XR0J'
26      1001       1903   612     Golf        22   'vpWmDDYB'
27      1001       1903   612  Charlie         6   'DJEbjjkp'
28      1001       1903   612  Charlie         6   'aB3z66zy'
29      1001       1903   612    Delta        17   'Kp4zrrKB'
30      1001       1903   612    Delta        17   'oyxqMMAB'
31      1001       1903   612     Echo         7   'zJ1KMMZy'
32      1001       1903   612     Echo         7   'ZJlOzz6B'
33      1001       1903   612  Foxtrot        20   'YpLMbbay'
34      1001       1903   612  Foxtrot        20   'ZJnmzzYB'
35      1001       1903   612     Golf        22   'Kp4zr5LB'
36      1001       1903   612     Golf        22   'oye7jg8J'
37      1001       1903   612     Golf        22   'OBb6jE3B'
38      1002       1891   837  Charlie         6   'EyGMWPbJ'
39      1002       1891   837  Charlie         6   'aB3zOoDy'
40      1002       1891   837    Delta        17   'DJQ4laLB'
41      1002       1891   837    Delta        17   'ZJlOvNXB'
42      1002       1891   837     Echo         7   'Rpr7a8Dy'
43      1002       1891   837     Echo         7   'zJjYNR4B'
44      1002       1891   837     Golf        22   'Vy9MqkRy'
45      1002       1891   837     Golf        22   'oye7Y0kJ'
46      1002       1891   837     Golf        22   '8BweZbnp'
47      1002       1891   837  Charlie         6   'Rpr7Z5by'
48      1002       1891   837  Charlie         6   'vBdVK1Ap'
49      1002       1891   837    Delta        17   '4Jkae8Dy'
50      1002       1891   837    Delta        17   'YpLM3nxy'
51      1002       1891   837     Echo         7   'VB7vD6Py'
52      1002       1891   837     Echo         7   'ZJlOXbzB'
53      1002       1891   837  Foxtrot        20   'MpNqezKJ'
54      1002       1891   837  Foxtrot        20   '9pOWo39p'
55      1002       1891   837     Golf        22   'MJXm5qnp'
56      1002       1891   837     Golf        22   'oy5vxd4B'
57      1002       1891   837     Golf        22   'DJQ4qz3B'

我知道我错过了一些超级简单的事情,因为这种问题是Pandas 的主导权.但遗憾的是,这正是我处于学习曲线上的阶段;我希望有人能 bootstrap 我走向正确的方向.

推荐答案

您可以在位置上将 bootstrap 部分计算为extract,然后在唯一行(使用drop_duplicates获得)上计算groupby.sum,并在原始形状上计算reindex:

df['location2'] = df['location'].str.extract(r'(\d+)')
df[['available', 'sold']] = (df.drop_duplicates('location')
                               .groupby('location2')[['available', 'sold']].sum()
                               .reindex(df['location2']).to_numpy()
                            )
df['location'] = df.pop('location2')

输出:

   location  available  sold     name  local_id   more_data
0      1001       1903   612    Alpha        24  'DJQ3DD3y'
1      1001       1903   612    Alpha        24  'aB3joXQy'
2      1001       1903   612    Alpha        24  'ZJve572B'
3      1001       1903   612    Alpha        24  'DJEkx8Dy'
4      1001       1903   612    Alpha        24  'VyVaWLYp'
5      1001       1903   612    Bravo        19  'Rpr7AvVy'
6      1001       1903   612    Bravo        19  'ZJlO0VmB'
7      1001       1903   612    Bravo        19  'OBb6NrrB'
8      1001       1903   612    Bravo        19  'ZJvPPEXy'
9      1001       1903   612  Charlie         6  'Vy9MMOEy'
10     1001       1903   612  Charlie         6  'MJ8AALKp'
11     1001       1903   612    Delta        17  'vpWmN1kB'
12     1001       1903   612    Delta        17  'DJEb9qQp'
13     1001       1903   612     Echo         7  'qyZ1zn1p'
14     1001       1903   612     Echo         7  'bBqaYoMB'
15     1001       1903   612     Golf        22  'AJgLr9qp'
16     1001       1903   612     Golf        22  'vBdV57Ap'
17     1001       1903   612     Golf        22  'VJYxnLZB'
18     1001       1903   612  Charlie         6  'VJYxnGXB'
19     1001       1903   612  Charlie         6  'Vy9Mo52y'
20     1001       1903   612    Delta        17  'aB3zxYWy'
21     1001       1903   612    Delta        17  'MJ8A3z0p'
22     1001       1903   612     Echo         7  'YpLMPwNy'
23     1001       1903   612     Echo         7  '8Bwev1ep'
24     1001       1903   612     Golf        22  'MJXm6bLp'
25     1001       1903   612     Golf        22  'oye7XR0J'
26     1001       1903   612     Golf        22  'vpWmDDYB'
27     1001       1903   612  Charlie         6  'DJEbjjkp'
28     1001       1903   612  Charlie         6  'aB3z66zy'
29     1001       1903   612    Delta        17  'Kp4zrrKB'
30     1001       1903   612    Delta        17  'oyxqMMAB'
31     1001       1903   612     Echo         7  'zJ1KMMZy'
32     1001       1903   612     Echo         7  'ZJlOzz6B'
33     1001       1903   612  Foxtrot        20  'YpLMbbay'
34     1001       1903   612  Foxtrot        20  'ZJnmzzYB'
35     1001       1903   612     Golf        22  'Kp4zr5LB'
36     1001       1903   612     Golf        22  'oye7jg8J'
37     1001       1903   612     Golf        22  'OBb6jE3B'
38     1002       1891   837  Charlie         6  'EyGMWPbJ'
39     1002       1891   837  Charlie         6  'aB3zOoDy'
40     1002       1891   837    Delta        17  'DJQ4laLB'
41     1002       1891   837    Delta        17  'ZJlOvNXB'
42     1002       1891   837     Echo         7  'Rpr7a8Dy'
43     1002       1891   837     Echo         7  'zJjYNR4B'
44     1002       1891   837     Golf        22  'Vy9MqkRy'
45     1002       1891   837     Golf        22  'oye7Y0kJ'
46     1002       1891   837     Golf        22  '8BweZbnp'
47     1002       1891   837  Charlie         6  'Rpr7Z5by'
48     1002       1891   837  Charlie         6  'vBdVK1Ap'
49     1002       1891   837    Delta        17  '4Jkae8Dy'
50     1002       1891   837    Delta        17  'YpLM3nxy'
51     1002       1891   837     Echo         7  'VB7vD6Py'
52     1002       1891   837     Echo         7  'ZJlOXbzB'
53     1002       1891   837  Foxtrot        20  'MpNqezKJ'
54     1002       1891   837  Foxtrot        20  '9pOWo39p'
55     1002       1891   837     Golf        22  'MJXm5qnp'
56     1002       1891   837     Golf        22  'oy5vxd4B'
57     1002       1891   837     Golf        22  'DJQ4qz3B'

Python相关问答推荐

如何将桌子刮成带有Se的筷子/要求/Beautiful Soup ?

在Pandas框架中截短至固定数量的列

使用scipy. optimate.least_squares()用可变数量的参数匹配两条曲线

列表上值总和最多为K(以O(log n))的最大元素数

使用mySQL的SQlalchemy过滤重叠时间段

2维数组9x9,不使用numpy.数组(MutableSequence的子类)

删除最后一个pip安装的包

在Polars(Python库)中将二进制转换为具有非UTF-8字符的字符串变量

有没有一种方法可以从python的pussompy比较结果中提取文本?

Python中的变量每次增加超过1

在Python 3中,如何让客户端打开一个套接字到服务器,发送一行JSON编码的数据,读回一行JSON编码的数据,然后继续?

如何使用Numpy. stracards重新编写滚动和?

在极中解析带有数字和SI前缀的字符串

在输入行运行时停止代码

将标签移动到matplotlib饼图中楔形块的开始处

BeautifulSoup:超过24个字符(从a到z)的迭代失败:降低了首次深入了解数据集的复杂性:

GPT python SDK引入了大量开销/错误超时

如何在一组行中找到循环?

numpy数组和数组标量之间的不同行为

Pandas数据框上的滚动平均值,其中平均值的中心基于另一数据框的时间