Python: adding a new column based on combined criteria in a Pandas groupby

Posted on May 13

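Following on from my previous question (thanks to those who answered), I am stuck again and cannot achieve something I suspect is possible with groupby in pandas. Here is what I am trying to do, using the following sample dataframe: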

data_initial = {
"account_id": ['1001', '1001', '1001', '1002', '1002', '1002', '1002', '1002', '1002', '1002', '1002', '1002', '1002', '1003', '1003', '1003', '1003', '1003', '1003',],
"data_type": ['payment', 'payment', 'payment', 'payment', 'payment', 'plan', 'payment', 'plan', 'plan', 'payment', 'payment', 'payment', 'payment', 'payment', 'plan', 'payment', 'payment', 'payment', 'payment',],
"transaction_date": ['2022-04-01', '2022-04-12', '2022-05-02', '2022-02-02', '2022-03-01', '2022-03-15', '2022-04-01', '2022-04-01', '2022-04-13', '2022-04-26', '2022-05-01', '2022-05-04', '2022-05-10', '2022-03-10', '2022-03-25', '2022-04-05', '2022-04-16', '2022-04-24', '2022-05-05',],
"amount": ['-50', '-40', '-60', '-30', '-25', '250', '-50', '200', '200', '-25', '-25', '-25', '-25', '-20', '100', '-25', '-25', '-25', '-25',],}

[Image: initial dataframe]

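I would like to group by account_id and then apply the following logic:

If data_type is "payment" and the account_id has no data_type = "plan" record, or the record's transaction_date is before every data_type = "plan" record, then the new column classification = "receipt_not_plan_related"
If data_type is "payment" and the account_id has a data_type = "plan" record whose transaction_date is earlier than this record's, then the new column classification = "receipt_on_plan"
If data_type is "plan" and it is the only instance of "plan", then the new column classification = "only"
If data_type is "plan" and it is the first instance of "plan", then the new column classification = "initial"
If data_type is "plan" and it is neither the first nor the last instance of "plan", then the new column classification = "expired"
If data_type is "plan" and it is the last instance of "plan", then the new column classification = "current"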

So the result for the sample dataframe would be as follows:

[Image: expected result dataframe]

Thanks again in advance to anyone who can help. Much appreciated.

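Recommended answer

import numpy as np
import pandas as pd

# Build the dataframe from the question's sample data.
df = pd.DataFrame(data_initial)

# Running count of 'plan' rows seen so far in each account (follows row order).
df['plans'] = df.groupby('account_id')['data_type'].transform(lambda x: x.eq('plan').cumsum())
# Total number of 'plan' rows in each account.
df['n_plans'] = df.groupby('account_id')['plans'].transform('max')

is_payment = df['data_type'].eq('payment')
is_plan = df['data_type'].eq('plan')

# np.select uses the first matching condition, so 'only' is listed before 'initial'.
df['classification'] = np.select(
    [is_payment & df['plans'].eq(0),
     is_payment & df['plans'].gt(0),
     is_plan & df['n_plans'].eq(1),
     is_plan & df['plans'].eq(1),
     is_plan & df['plans'].gt(1) & df['plans'].lt(df['n_plans']),
     is_plan & df['plans'].eq(df['n_plans'])],
    ['receipt_not_plan_related', 'receipt_on_plan', 'only',
     'initial', 'expired', 'current'],
    # String default: newer NumPy versions may reject mixing the implicit 0 with strings.
    default='')

print(df.drop(columns=['plans', 'n_plans']))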
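The snippet above comes without an explanation, so here is a minimal sketch of what its two helper columns, plans and n_plans, contain. It uses a shortened, hypothetical subset of account 1002 (the name demo is illustrative only) and assumes the rows are already in chronological order within each account, because the running count follows row order and never looks at transaction_date.

```python
import pandas as pd

# Hypothetical subset of account 1002 from the question's sample data,
# assumed to be sorted by transaction_date already.
demo = pd.DataFrame({
    "account_id": ["1002"] * 6,
    "data_type": ["payment", "plan", "payment", "plan", "plan", "payment"],
})

is_plan = demo["data_type"].eq("plan")

# Running count of 'plan' rows seen so far within each account (row order).
demo["plans"] = is_plan.astype(int).groupby(demo["account_id"]).cumsum()

# Total number of 'plan' rows in the account, repeated on every row.
demo["n_plans"] = demo.groupby("account_id")["plans"].transform("max")

print(demo)
```

A payment with plans == 0 precedes every plan in its account. A plan row with plans == 1 is the first plan, one with plans == n_plans is the last, and anything in between is classified as expired.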

Output:

    account_id  data_type  transaction_date  amount            classification
 0        1001    payment        2022-04-01     -50  receipt_not_plan_related
 1        1001    payment        2022-04-12     -40  receipt_not_plan_related
 2        1001    payment        2022-05-02     -60  receipt_not_plan_related
 3        1002    payment        2022-02-02     -30  receipt_not_plan_related
 4        1002    payment        2022-03-01     -25  receipt_not_plan_related
 5        1002       plan        2022-03-15     250                   initial
 6        1002    payment        2022-04-01     -50           receipt_on_plan
 7        1002       plan        2022-04-01     200                   expired
 8        1002       plan        2022-04-13     200                   current
 9        1002    payment        2022-04-26     -25           receipt_on_plan
10        1002    payment        2022-05-01     -25           receipt_on_plan
11        1002    payment        2022-05-04     -25           receipt_on_plan
12        1002    payment        2022-05-10     -25           receipt_on_plan
13        1003    payment        2022-03-10     -20  receipt_not_plan_related
14        1003       plan        2022-03-25     100                      only
15        1003    payment        2022-04-05     -25           receipt_on_plan
16        1003    payment        2022-04-16     -25           receipt_on_plan
17        1003    payment        2022-04-24     -25           receipt_on_plan
18        1003    payment        2022-05-05     -25           receipt_on_plan
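As a cross-check, the six rules from the question can also be written out one at a time. The following is only a readable sketch, not the answer's method: the function name classify and the small sample frame are hypothetical, it assumes rows are in date order within each account and that data_type is only ever 'payment' or 'plan', and it will likely be slower than the vectorised np.select approach on large data.

```python
import pandas as pd

def classify(types: pd.Series) -> pd.Series:
    """Label the rows of one account; `types` is its data_type column in date order."""
    values = list(types)
    plan_rows = [i for i, t in enumerate(values) if t == "plan"]
    labels = []
    for i, t in enumerate(values):
        # Assumption: data_type is only ever 'payment' or 'plan', as in the question.
        if t == "payment":
            # No plan at all, or this payment comes before the first plan.
            unrelated = not plan_rows or i < plan_rows[0]
            labels.append("receipt_not_plan_related" if unrelated else "receipt_on_plan")
        elif len(plan_rows) == 1:
            labels.append("only")
        elif i == plan_rows[0]:
            labels.append("initial")
        elif i == plan_rows[-1]:
            labels.append("current")
        else:
            labels.append("expired")
    return pd.Series(labels, index=types.index)

# Small hypothetical sample: one account without plans, one with a single plan.
sample = pd.DataFrame({
    "account_id": ["1001", "1001", "1003", "1003", "1003"],
    "data_type": ["payment", "payment", "payment", "plan", "payment"],
})
sample["classification"] = (
    sample.groupby("account_id")["data_type"].transform(classify)
)
print(sample)
```

Writing the rules this way makes the precedence explicit: the single-plan case is tested before the first and last cases, which mirrors the order of conditions in the np.select call.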

