Python Pandas: modified rolling average

Posted on February 27

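Below is my outlier detection code in pandas. I am currently using a rolling window of 15. What I would like instead is a window of 5 that is based on the weekday of the center date, i.e. if the center date is a Monday, take the 2 Mondays before it and the 2 Mondays after it. Rolling has no built-in support for this. How can I do it?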

import pandas as pd
import numpy as np

np.random.seed(0)

dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')

prices1 = np.random.randint(10, 100, size=len(dates))
prices2 = np.random.randint(20, 120, size=len(dates)).astype(float)

data = {'Date': dates, 'Price1': prices1, 'Price2': prices2}
df = pd.DataFrame(data)

r = df.Price1.rolling(window=15, center=True)
price_up, price_low = r.mean() + 2 * r.std(), r.mean() - 2 * r.std()

mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low

df.loc[mask_upper, 'Price1'] = r.mean()
df.loc[mask_lower, 'Price1'] = r.mean()

Recommended answer

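One option is to use groupby.rolling with dayofweek as the grouping key, so that each rolling window only contains dates that fall on the same weekday. Within a weekday group, consecutive rows are 7 days apart, so a centered window of 5*7 = 35 days covers the center date plus the two same-weekday dates on either side: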

r = (df.set_index('Date')
       .groupby(df['Date'].dt.dayofweek.values) # avoid index alignment
       .rolling(f'{5*7}D', center=True)
       ['Price1']
    )
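# groupby.rolling returns rows ordered by group (all Mondays, then all Tuesdays, ...),
# so drop the weekday level and realign on Date before restoring df's index
avg = r.mean().droplevel(0).reindex(df['Date']).set_axis(df.index)
std = r.std().droplevel(0).reindex(df['Date']).set_axis(df.index)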
price_up, price_low = avg + 2 * std, avg - 2 * std

mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low

df.loc[mask_upper, 'Price1'] = avg
df.loc[mask_lower, 'Price1'] = avg
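
As a cross-check, the same-weekday window can also be built by hand. The sketch below assumes the data is strictly daily with no missing dates (as in the example), so that shifting by a multiple of 7 rows always lands on the same weekday. The helper name weekday_window_stats is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def weekday_window_stats(s, weeks_each_side=2):
    """Mean and sample std over the same weekday, a few weeks back and forward.

    Assumes `s` holds one value per calendar day with no gaps, so that
    shifting by 7 rows always lands on the same weekday.
    """
    offsets = range(-7 * weeks_each_side, 7 * weeks_each_side + 1, 7)
    # One column per offset; rows near the edges get NaN for missing neighbours.
    window = pd.concat([s.shift(k) for k in offsets], axis=1)
    # mean/std skip NaN by default, so edge windows simply use fewer points.
    return window.mean(axis=1), window.std(axis=1)

np.random.seed(0)
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')
df = pd.DataFrame({'Date': dates,
                   'Price1': np.random.randint(10, 100, size=len(dates))})

avg, std = weekday_window_stats(df['Price1'])

# Spot check: row 14 is 2022-01-15, so its window is rows 0, 7, 14, 21, 28.
expected = df['Price1'].iloc[[0, 7, 14, 21, 28]]
assert np.isclose(avg.iloc[14], expected.mean())
assert np.isclose(std.iloc[14], expected.std())
```

Because the result keeps the index of the input Series, avg and std line up with df row by row without any reindexing. If dates can be missing, the time-based groupby.rolling approach is the safer choice.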

Example output:

          Date  Price1  Price2
0   2022-01-01    54.0    86.0
1   2022-01-02    57.0   117.0
2   2022-01-03    74.0    32.0
3   2022-01-04    77.0    35.0
4   2022-01-05    77.0    53.0
..         ...     ...     ...
725 2023-12-27    44.0    37.0
726 2023-12-28    60.0    65.0
727 2023-12-29    30.0   116.0
728 2023-12-30    53.0    82.0
729 2023-12-31    10.0    42.0

[730 rows x 3 columns]
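
A caveat that may be worth checking with such a short window, assuming the window includes the center value itself and the default sample standard deviation (ddof=1) is used: a value that is one of the n points in a window can lie at most (n-1)/sqrt(n) sample standard deviations from that window's mean. For n = 5 this is about 1.79, so a 2-sigma threshold cannot fire on a complete 5-point window. The sketch below checks the bound numerically on random data; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# 100,000 random 5-point windows of integer "prices".
windows = rng.integers(10, 100, size=(100_000, n)).astype(float)

mean = windows.mean(axis=1, keepdims=True)
std = windows.std(axis=1, ddof=1, keepdims=True)  # sample std, the pandas default

# Skip constant windows, where std is 0 and the z-score is undefined.
valid = std[:, 0] > 0
z = np.abs(windows[valid] - mean[valid]) / std[valid]

bound = (n - 1) / np.sqrt(n)  # about 1.789 for n = 5
print(z.max(), bound)
assert z.max() <= bound + 1e-9
```

If this applies to your data, possible ways around it include a lower threshold, a longer window, or computing the mean and standard deviation from the neighbouring weeks only, without the center value.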