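I am trying to set up an indicator that flags when a new application caused an older application to be rejected.

If any rejected_time within a personal_id falls within 5 minutes after the creation_timestamp of a newer application, the older application is considered rejected because of the new one. Based on this, I need to create the column "new_app_causes_rejecting" (named new_application_causes_rejection in the sample below).

There are hundreds of thousands of personal IDs, most of them have several application IDs, and the number of rows per application ID varies.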

personal_id application_id creation_timestamp approved_amount rejected_time new_application_causes_rejection
5a 694f 2023-01-24 13:01:07.939534 8000.0 2023-01-24 13:13:15.499000 0
5a 694f 2023-01-24 13:01:07.939534 8000.0 2023-01-24 14:38:02.359000 1
5a 694f 2023-01-24 13:01:07.939534 8000.0 2023-01-24 14:37:18.616000 1
5a 694f 2023-01-24 13:01:07.939534 NaN 2023-01-24 13:03:59.626000 0
5a 43fa 2023-01-24 14:36:08.287521 NaN 2023-01-24 14:37:22.096000 0
5a 43fa 2023-01-24 14:36:08.287521 13000.0 2023-01-24 14:39:31.750000 1
5a 43fa 2023-01-24 14:36:08.287521 13000.0 2023-02-02 08:42:26.980106 1
5a 43fa 2023-01-24 14:36:08.287521 NaN 2023-01-24 14:37:22.948214 0
5a a4b6 2023-01-24 14:38:42.625969 5000.0 2023-02-02 08:42:26.980106 0
5a a4b7 2023-01-24 14:38:42.625969 NaN 2023-01-24 14:38:46.922000 0
5a a4b8 2023-01-24 14:38:42.625969 8000.0 2023-02-02 08:42:26.980106 0
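To experiment with the sample, the rows above can be loaded into a DataFrame. This is only a minimal sketch for reproduction: the variable names `rows` and `columns` are assumptions, and the timestamps are deliberately left as strings so that any later `pd.to_datetime` conversion still has something to do.

```python
import pandas as pd

nan = float("nan")

# Column names copied from the sample header.
columns = ["personal_id", "application_id", "creation_timestamp",
           "approved_amount", "rejected_time",
           "new_application_causes_rejection"]

# The eleven sample rows, in the order shown in the question.
rows = [
    ("5a", "694f", "2023-01-24 13:01:07.939534", 8000.0, "2023-01-24 13:13:15.499000", 0),
    ("5a", "694f", "2023-01-24 13:01:07.939534", 8000.0, "2023-01-24 14:38:02.359000", 1),
    ("5a", "694f", "2023-01-24 13:01:07.939534", 8000.0, "2023-01-24 14:37:18.616000", 1),
    ("5a", "694f", "2023-01-24 13:01:07.939534", nan, "2023-01-24 13:03:59.626000", 0),
    ("5a", "43fa", "2023-01-24 14:36:08.287521", nan, "2023-01-24 14:37:22.096000", 0),
    ("5a", "43fa", "2023-01-24 14:36:08.287521", 13000.0, "2023-01-24 14:39:31.750000", 1),
    ("5a", "43fa", "2023-01-24 14:36:08.287521", 13000.0, "2023-02-02 08:42:26.980106", 1),
    ("5a", "43fa", "2023-01-24 14:36:08.287521", nan, "2023-01-24 14:37:22.948214", 0),
    ("5a", "a4b6", "2023-01-24 14:38:42.625969", 5000.0, "2023-02-02 08:42:26.980106", 0),
    ("5a", "a4b7", "2023-01-24 14:38:42.625969", nan, "2023-01-24 14:38:46.922000", 0),
    ("5a", "a4b8", "2023-01-24 14:38:42.625969", 8000.0, "2023-02-02 08:42:26.980106", 0),
]

df = pd.DataFrame(rows, columns=columns)
```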

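Recommended answer

You can shift creation_timestamp by one application within each personal_id, subtract that value from rejected_time, and check whether the difference lies between 0 and 5 minutes: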

import pandas as pd

df['creation_timestamp'] = pd.to_datetime(df['creation_timestamp'])
df['rejected_time'] = pd.to_datetime(df['rejected_time'])

# sort so that each person's applications appear in creation order
df.sort_values(by=['personal_id', 'creation_timestamp'], inplace=True)

td = pd.Timedelta(minutes=5)

# one row per application, then take the creation timestamp of the next
# application of the same personal_id (grouping keeps the shift from
# leaking across different persons)
s = (df[['personal_id', 'application_id', 'creation_timestamp']]
         .drop_duplicates()
         .set_index(['personal_id', 'application_id'])['creation_timestamp']
         .groupby(level='personal_id')
         .shift(-1))

# subtract the next creation timestamp from rejected_time and flag
# differences that lie between 0 and 5 minutes
df['new'] = (df['rejected_time'].sub(df.join(s.rename('new'),
                                             on=['personal_id', 'application_id'])['new'])
                                .between(pd.Timedelta(0), td).astype(int))

print(df)
   personal_id application_id         creation_timestamp approved_amount  \
0          5a           694f  2023-01-24 13:01:07.939534         8000.0    
1          5a           694f  2023-01-24 13:01:07.939534         8000.0    
2          5a           694f  2023-01-24 13:01:07.939534         8000.0    
3          5a           694f  2023-01-24 13:01:07.939534            NaN    
4          5a           43fa  2023-01-24 14:36:08.287521            NaN    
5          5a           43fa  2023-01-24 14:36:08.287521        13000.0    
6          5a           43fa  2023-01-24 14:36:08.287521        13000.0    
7          5a           43fa  2023-01-24 14:36:08.287521            NaN    
8          5a           a4b6  2023-01-24 14:38:42.625969         5000.0    
9          5a           a4b7  2023-01-24 14:38:42.625969            NaN    
10         5a           a4b8  2023-01-24 14:38:42.625969         8000.0    

                rejected_time  new_application_causes_rejection  new  
0  2023-01-24 13:13:15.499000                                 0    0  
1  2023-01-24 14:38:02.359000                                 1    1  
2  2023-01-24 14:37:18.616000                                 1    1  
3  2023-01-24 13:03:59.626000                                 0    0  
4  2023-01-24 14:37:22.096000                                 0    0  
5  2023-01-24 14:39:31.750000                                 1    1  
6  2023-02-02 08:42:26.980106                                 1    0  
7  2023-01-24 14:37:22.948214                                 0    0  
8  2023-02-02 08:42:26.980106                                 0    0  
9  2023-01-24 14:38:46.922000                                 0    1  
10 2023-02-02 08:42:26.980106                                 0    0  
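In the output above, the computed column new differs from the expected column in rows 6 and 9, so the sample does not fully pin down the intended rule. As a point of comparison, here is a sketch of a more literal reading of the question: a row is flagged when its rejected_time falls within 5 minutes after the creation_timestamp of any other application of the same person, not only the next one. This is a different technique from the shift-based answer, not a restatement of it. The helper name `flag_rejections`, the `window` parameter, and the small `demo` frame are all hypothetical and exist only for illustration.

```python
import pandas as pd


def flag_rejections(df, window=pd.Timedelta(minutes=5)):
    """Return a 0/1 array, one entry per row of df (hypothetical helper).

    A row gets 1 when its rejected_time is between 0 and `window` after the
    creation_timestamp of a *different* application of the same personal_id.
    """
    # One row per application, renamed so it can be paired with every row.
    apps = (df[["personal_id", "application_id", "creation_timestamp"]]
            .drop_duplicates()
            .rename(columns={"application_id": "other_app",
                             "creation_timestamp": "other_created"}))

    # Positional row id, so the result does not depend on df's index.
    rows = (df[["personal_id", "application_id", "rejected_time"]]
            .assign(row=range(len(df))))

    # Pair each row with every application of the same person,
    # then drop the pairing with its own application.
    pairs = rows.merge(apps, on="personal_id")
    pairs = pairs[pairs["application_id"] != pairs["other_app"]]

    delta = pairs["rejected_time"] - pairs["other_created"]
    hit = delta.between(pd.Timedelta(0), window)
    flagged = pairs.loc[hit, "row"].unique()

    return pd.Series(range(len(df))).isin(flagged).astype(int).to_numpy()


# Tiny made-up example with two persons (not the asker's data).
demo = pd.DataFrame({
    "personal_id": ["5a", "5a", "5a", "7b"],
    "application_id": ["694f", "694f", "43fa", "c001"],
    "creation_timestamp": pd.to_datetime([
        "2023-01-24 13:01:07", "2023-01-24 13:01:07",
        "2023-01-24 14:36:08", "2023-01-24 13:02:00"]),
    "rejected_time": pd.to_datetime([
        "2023-01-24 13:13:15", "2023-01-24 14:38:02",
        "2023-01-24 14:37:22", "2023-01-24 13:03:00"]),
})
demo["new"] = flag_rejections(demo)
print(demo["new"].tolist())  # [0, 1, 0, 0]
```

The merge pairs every row with every application of the same person, so memory use grows with the number of applications per person; with only a handful of applications per personal_id that should stay manageable, but it is worth checking on the real data.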
