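I have two tables: one ('Sales') holds sales data (fruit type, sale date, and quantity), and the other ('Ref') holds a fruit type and a reference date.
I want to add a column to the second table that shows the total quantity of the corresponding fruit sold within seven days either side of the reference date.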
Here is some sample data:
import pandas as pd
sales = pd.DataFrame({'Fruit': {0: 'apples', 1: 'oranges', 2: 'pears', 3: 'apples', 4: 'apples', 5: 'bananas', 6: 'oranges', 7: 'pears', 8: 'pears', 9: 'oranges', 10: 'bananas', 11: 'apples', 12: 'pears', 13: 'pears', 14: 'apples', 15: 'pears', 16: 'oranges', 17: 'oranges', 18: 'pears'},
'Date': {0: '2023-07-07', 1: '2023-02-05', 2: '2023-08-16', 3: '2023-07-26', 4: '2023-07-14', 5: '2024-02-01', 6: '2023-09-19', 7: '2023-04-08', 8: '2023-06-08', 9: '2023-05-15', 10: '2023-10-20', 11: '2023-07-25', 12: '2023-07-31', 13: '2023-10-08', 14: '2023-06-28', 15: '2023-08-15', 16: '2023-05-14', 17: '2023-07-28', 18: '2023-07-29'},
'Quantity': {0: 18, 1: 10, 2: 10, 3: 20, 4: 16, 5: 14, 6: 18, 7: 18, 8: 14, 9: 19, 10: 16, 11: 16, 12: 17, 13: 10, 14: 16, 15: 15, 16: 18, 17: 20, 18: 19}})
sales['Date'] = pd.to_datetime(sales['Date'])
ref = pd.DataFrame({'Fruit': {0: 'apples', 1: 'bananas', 2: 'oranges', 3: 'apples', 4: 'pears', 5: 'oranges', 6: 'bananas', 7: 'oranges', 8: 'oranges'},
'Date': {0: '2023-07-25', 1: '2023-12-27', 2: '2023-07-13', 3: '2023-06-27', 4: '2023-07-08', 5: '2023-09-17', 6: '2023-10-25', 7: '2023-10-05', 8: '2023-04-14'}})
ref['Date'] = pd.to_datetime(ref['Date'])
For example, the first row of ref should show 36 (20 apples on 2023-07-26 and 16 apples on 2023-07-25).
If I were using Excel, I would use this formula: =SUMIFS(sales.Quantity,sales.Fruit,ref.Fruit,sales.Date,">="&ref.Date-7,sales.Date,"<="&ref.Date+7)
In Python, I can get the result I want for a single row like this:
sales[(sales['Fruit']=='apples')&
(sales['Date']>=pd.to_datetime('2023-07-25')-pd.to_timedelta(7, unit='d'))&
(sales['Date']<=pd.to_datetime('2023-07-25')+pd.to_timedelta(7, unit='d'))]['Quantity'].sum()
and, using iloc:
sales[(sales['Fruit']==ref.iloc[0,0])&
(sales['Date']>=ref.iloc[0,1]-pd.to_timedelta(7, unit='d'))&
(sales['Date']<=ref.iloc[0,1]+pd.to_timedelta(7, unit='d'))]['Quantity'].sum()
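One way to generalise the single-row filter, offered as a sketch rather than a confirmed answer: wrap it in a function that receives one row of ref and let DataFrame.apply with axis=1 call it for every row. Each comparison is then against scalars (row['Fruit'], row['Date']) instead of a whole column, which mirrors what SUMIFS does per cell. The function name total_in_window and the WINDOW constant are hypothetical; the data is the sample above rewritten as lists.

```python
import pandas as pd

sales = pd.DataFrame({
    'Fruit': ['apples', 'oranges', 'pears', 'apples', 'apples', 'bananas', 'oranges',
              'pears', 'pears', 'oranges', 'bananas', 'apples', 'pears', 'pears',
              'apples', 'pears', 'oranges', 'oranges', 'pears'],
    'Date': pd.to_datetime(['2023-07-07', '2023-02-05', '2023-08-16', '2023-07-26',
                            '2023-07-14', '2024-02-01', '2023-09-19', '2023-04-08',
                            '2023-06-08', '2023-05-15', '2023-10-20', '2023-07-25',
                            '2023-07-31', '2023-10-08', '2023-06-28', '2023-08-15',
                            '2023-05-14', '2023-07-28', '2023-07-29']),
    'Quantity': [18, 10, 10, 20, 16, 14, 18, 18, 14, 19, 16, 16, 17, 10, 16, 15, 18, 20, 19],
})
ref = pd.DataFrame({
    'Fruit': ['apples', 'bananas', 'oranges', 'apples', 'pears',
              'oranges', 'bananas', 'oranges', 'oranges'],
    'Date': pd.to_datetime(['2023-07-25', '2023-12-27', '2023-07-13', '2023-06-27',
                            '2023-07-08', '2023-09-17', '2023-10-25', '2023-10-05',
                            '2023-04-14']),
})

# Hypothetical name: half-width of the date window (inclusive on both ends)
WINDOW = pd.Timedelta(days=7)


def total_in_window(row):
    """Sum sales of row's fruit dated within +/- WINDOW of row's reference date."""
    mask = (
        (sales['Fruit'] == row['Fruit'])          # scalar comparison, no label alignment
        & (sales['Date'] >= row['Date'] - WINDOW)
        & (sales['Date'] <= row['Date'] + WINDOW)
    )
    return sales.loc[mask, 'Quantity'].sum()       # empty selection sums to 0


# axis=1 passes each ref row (as a Series) to the function
ref['Total'] = ref.apply(total_in_window, axis=1)
print(ref)
```

This loops in Python once per ref row, which is simple and readable but may be slow if ref is very large.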
However, when I try to add a new column that applies this calculation to every row, I get 'ValueError: Can only compare identically-labeled Series objects':
ref['Total'] = sales[(sales['Fruit']==ref.iloc[ref.index,0])&
(sales['Date']>=ref.iloc[ref.index,1]-pd.to_timedelta(7, unit='d'))&
(sales['Date']<=ref.iloc[ref.index,1]+pd.to_timedelta(7, unit='d'))]['Quantity'].sum()
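A likely explanation for the error: ref.iloc[ref.index, 0] selects the entire Fruit column of ref (9 rows, labels 0 to 8) rather than one value, so sales['Fruit'] == ... compares a 19-row Series with a 9-row Series. pandas only allows == between two Series whose index labels match. The minimal reproduction below uses hypothetical three-row and two-row Series as stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins: index 0..2 vs index 0..1, like sales (0..18) vs ref (0..8)
sales_fruit = pd.Series(['apples', 'oranges', 'pears'])
ref_fruit = pd.Series(['apples', 'bananas'])

# Series-to-Series comparison requires identical index labels, so this raises
try:
    sales_fruit == ref_fruit
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Comparing against a single scalar (one ref value) broadcasts over every row
scalar_match = sales_fruit == ref_fruit.iloc[0]
print(scalar_match.tolist())  # [True, False, False]
```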
I suspect that using ref.index in place of 0 in iloc is the wrong way to get the numbers I need. What should I use instead?
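A vectorised alternative, again only a sketch under assumptions: merge ref with sales on Fruit so each reference row is paired with every sale of the same fruit, keep the pairs whose dates are at most seven days apart, then sum the quantities per reference row. The column name ref_row is made up for illustration; reference rows with no matching sales are filled with 0.

```python
import pandas as pd

sales = pd.DataFrame({
    'Fruit': ['apples', 'oranges', 'pears', 'apples', 'apples', 'bananas', 'oranges',
              'pears', 'pears', 'oranges', 'bananas', 'apples', 'pears', 'pears',
              'apples', 'pears', 'oranges', 'oranges', 'pears'],
    'Date': pd.to_datetime(['2023-07-07', '2023-02-05', '2023-08-16', '2023-07-26',
                            '2023-07-14', '2024-02-01', '2023-09-19', '2023-04-08',
                            '2023-06-08', '2023-05-15', '2023-10-20', '2023-07-25',
                            '2023-07-31', '2023-10-08', '2023-06-28', '2023-08-15',
                            '2023-05-14', '2023-07-28', '2023-07-29']),
    'Quantity': [18, 10, 10, 20, 16, 14, 18, 18, 14, 19, 16, 16, 17, 10, 16, 15, 18, 20, 19],
})
ref = pd.DataFrame({
    'Fruit': ['apples', 'bananas', 'oranges', 'apples', 'pears',
              'oranges', 'bananas', 'oranges', 'oranges'],
    'Date': pd.to_datetime(['2023-07-25', '2023-12-27', '2023-07-13', '2023-06-27',
                            '2023-07-08', '2023-09-17', '2023-10-25', '2023-10-05',
                            '2023-04-14']),
})

window = pd.Timedelta(days=7)

# Keep ref's row label as a column (hypothetical name 'ref_row') so results can be mapped back
pairs = (
    ref.rename_axis('ref_row')
       .reset_index()
       .merge(sales, on='Fruit', suffixes=('_ref', '_sale'))
)

# Keep only sales within +/- 7 days of the reference date (inclusive)
in_window = (pairs['Date_sale'] - pairs['Date_ref']).abs() <= window

# Sum per reference row; rows with no sales in the window get 0
totals = pairs[in_window].groupby('ref_row')['Quantity'].sum()
ref['Total'] = totals.reindex(ref.index, fill_value=0)
print(ref)
```

The merge creates one row per (reference row, same-fruit sale) pair, so its memory use grows with the number of sales per fruit. That is negligible for the sample data; on large tables it is a trade-off against the per-row apply approach.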