我有两个表,其中包含零售商用户在前期和后期使用的信息.我的目标是找出零售商在后期对用户来说是什么 fresh 事.我想知道如何在user_id
窗口中做到这一点,因为is_in()
和loc[]
解决方案不太适合我的任务.此外,我还在考虑过滤联接(特别是反联接),但效果并不好.以下是示例数据:
sample1 = pd.DataFrame(
{
'user_id': [45, 556, 556, 556, 556, 556, 556, 1344, 1588, 2063, 2063, 2063, 2673, 2982, 2982],
'retailer': ['retailer_1', 'retailer_1', 'retailer_2', 'retailer_3', 'retailer_4', 'retailer_5', 'retailer_6',
'retailer_3', 'retailer_2', 'retailer_2', 'retailer_3', 'retailer_7', 'retailer_1', 'retailer_1', 'retailer_2']
}
)
sample2 = pd.DataFrame(
{
'user_id': [45, 45, 556, 556, 556, 556, 556, 556, 1344, 1588, 2063, 2063, 2063, 2673, 2673, 2982, 2982],
'retailer': ['retailer_1', 'retailer_6', 'retailer_1', 'retailer_2', 'retailer_3', 'retailer_4', 'retailer_5', 'retailer_6',
'retailer_3', 'retailer_2', 'retailer_2', 'retailer_3', 'retailer_7', 'retailer_1', 'retailer_2', 'retailer_1', 'retailer_2']
}
)
我想要的结果是这样的:
{'user_id': {0: 45, 1: 45, 2: 556, 3: 556, 4: 556, 5: 556, 6: 556, 7: 556, 8: 1344, 9: 1588, 10: 2063, 11: 2063, 12: 2063, 13: 2673, 14: 2673, 15: 2982, 16: 2982}, 'retailer': {0: 'retailer_1', 1: 'retailer_6', 2: 'retailer_1', 3: 'retailer_2', 4: 'retailer_3', 5: 'retailer_4', 6: 'retailer_5', 7: 'retailer_6', 8: 'retailer_3', 9: 'retailer_2', 10: 'retailer_2', 11: 'retailer_3', 12: 'retailer_7', 13: 'retailer_1', 14: 'retailer_2', 15: 'retailer_1', 16: 'retailer_2'}, 'is_new_retailer': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 1, 15: 0, 16: 0}}