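I am having trouble specifying a lambda function. I want something like the lambda below, but not exactly this. The code should compare rejected_time with every paid_out_time within the group and return True if rejected_time occurs within 5 minutes after any paid_out_time.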
f = lambda x: ((x['rejected_time'].dropna() - x['paid_out_time'].dropna()).between(pd.Timedelta(0), pd.Timedelta(minutes=5)))
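Using x['paid_out_time'].min() yields a count of True values in the thousands, but removing .min() reduces that count significantly. I don't know how to compare each row's rejected_time against all paid_out_time values in the group and check whether the rejection happened 0 to 5 minutes after a paid_out_time.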
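One likely explanation for the drop, sketched with made-up values (not taken from the real data): subtracting one Series from another pairs values by index label, so after dropna() only rows that have both timestamps on the same row can match. Subtracting a scalar such as .min() broadcasts to every row instead.

```python
import pandas as pd

# Hypothetical values for illustration only.
rejected = pd.Series(pd.to_datetime(['2022-01-01 10:01:00', None, '2022-01-01 10:02:00']))
paid = pd.Series(pd.to_datetime([None, '2022-01-01 10:00:00', None]))

lo, hi = pd.Timedelta(0), pd.Timedelta(minutes=5)

# Series - Series aligns on index labels. After dropna() the labels are
# {0, 2} and {1}, which do not overlap, so every difference is NaT.
aligned = rejected.dropna() - paid.dropna()
aligned_hits = aligned.between(lo, hi)

# Series - scalar broadcasts the single timestamp to every remaining row.
broadcast = rejected.dropna() - paid.dropna().min()
broadcast_hits = broadcast.between(lo, hi)

print(aligned_hits.sum(), broadcast_hits.sum())
```

Neither form compares a rejected_time with every paid_out_time in the group: the first compares row by row, the second compares against the earliest payout only.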
I have been testing this code:
cols = ['paid_out_time', 'rejected_time']
df[cols] = df[cols].apply(pd.to_datetime, errors='coerce')
f = lambda x: ((x['rejected_time'].dropna() - x['paid_out_time'].dropna().min()).between(pd.Timedelta(0), pd.Timedelta(minutes=5)))
df['paid_out_auto_rejection'] = df.groupby('personal_id', group_keys=False).apply(f).astype(int)
Here is some test data:
personal_id | application_id | rejected_time | paid_out_time | expected |
---|---|---|---|---|
26A | 1ab | 2022-09-12 09:20:40.592 | NaT | 1 |
26A | 1ab | 2022-08-23 07:40:03.447463 | NaT | 0 |
26A | 1ab | 2022-08-02 23:16:59.545392 | NaT | 1 |
26A | 1ab | 2022-08-02 23:16:59.545392 | NaT | 1 |
26A | 1ab | 2022-09-12 09:20:40.592000 | 2022-08-02 23:16:59.545392 | 1 |
26A | 1ab | 2022-09-02 18:33:42.226000 | NaT | 0 |
26A | 8f0 | 2022-09-12 09:20:40.592000 | NaT | 1 |
26A | 8f0 | 2022-09-12 09:20:40.592000 | NaT | 1 |
26A | 8f0 | NaT | 2022-09-12 09:20:40.592 | 0 |
26A | 8f0 | 2022-09-12 09:21:08.604000 | NaT | 1 |
26A | 8f0 | 2022-09-22 08:27:45.693060 | NaT | 0 |
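
A minimal sketch of one way the comparison described above could be written, not a definitive solution. It assumes the grouping key is personal_id, that both time columns are timezone-naive, that the 0 to 5 minute window is inclusive at both ends, and that the index is unique. The helper name within_window is hypothetical. It builds a matrix of differences between every rejected_time and every non-null paid_out_time in a group with NumPy broadcasting, then flags rows with at least one difference inside the window.

```python
import numpy as np
import pandas as pd

# Test data from the table above (fractional seconds padded to six digits).
rows = [
    ('26A', '1ab', '2022-09-12 09:20:40.592000', None, 1),
    ('26A', '1ab', '2022-08-23 07:40:03.447463', None, 0),
    ('26A', '1ab', '2022-08-02 23:16:59.545392', None, 1),
    ('26A', '1ab', '2022-08-02 23:16:59.545392', None, 1),
    ('26A', '1ab', '2022-09-12 09:20:40.592000', '2022-08-02 23:16:59.545392', 1),
    ('26A', '1ab', '2022-09-02 18:33:42.226000', None, 0),
    ('26A', '8f0', '2022-09-12 09:20:40.592000', None, 1),
    ('26A', '8f0', '2022-09-12 09:20:40.592000', None, 1),
    ('26A', '8f0', None, '2022-09-12 09:20:40.592000', 0),
    ('26A', '8f0', '2022-09-12 09:21:08.604000', None, 1),
    ('26A', '8f0', '2022-09-22 08:27:45.693060', None, 0),
]
df = pd.DataFrame(rows, columns=['personal_id', 'application_id',
                                 'rejected_time', 'paid_out_time', 'expected'])
for col in ['paid_out_time', 'rejected_time']:
    df[col] = pd.to_datetime(df[col], errors='coerce')


def within_window(rejected, paid, window=pd.Timedelta(minutes=5)):
    """Return a boolean array: True where a rejected time falls 0..window
    after at least one non-null paid time."""
    paid_values = paid.dropna().to_numpy()
    rejected_values = rejected.to_numpy()
    # diff[i, j] = rejected[i] - paid[j]; NaT rows compare as False below.
    diff = rejected_values[:, None] - paid_values[None, :]
    hit = (diff >= np.timedelta64(0, 'ns')) & (diff <= window.to_timedelta64())
    return hit.any(axis=1)


# Fill the flag group by group; an explicit loop avoids depending on how
# groupby.apply reshapes its result.
flags = pd.Series(0, index=df.index)
for _, group in df.groupby('personal_id'):
    hits = within_window(group['rejected_time'], group['paid_out_time'])
    flags.loc[group.index] = hits.astype(int)

df['paid_out_auto_rejection'] = flags
print(df[['rejected_time', 'paid_out_time', 'expected', 'paid_out_auto_rejection']])
```

The difference matrix has one cell per (rejected, paid) pair inside a group, so memory grows with the product of the two counts. For very large groups a sorted lookup such as pd.merge_asof may be a better fit.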