我想计算一个金融数据时间序列的滚动"1m"
统计数据.假设每个月的计算行数不会总是相等,除非您有足够的数据来划分月份,这是不常见的.
我try 分配一个列window_index
来跟踪计算中包含的行,因为我将使用.rolling().over()
表达式来计算每个窗口上的统计数据.我希望从最近的日期开始添加window_index
整数,并以其方式返回.
Here is an image to help explain:
目前,我正在try 做的是将红笔和蓝笔所表示的window_index
按组相加.虽然我希望数据按黄色钢笔标记进行分组.数据集在2023-02-07
结束,一个月前将是2023-01-07
,或最接近的值2023-01-06
.
这是我用来实现这一点的代码,但我不确定如何获得我想要的分组窗口.
df_window_index = (
data.group_by_dynamic(
index_column="date", every="1m", by="symbol"
)
.agg()
.with_columns(
pl.int_range(0, pl.len()).over("symbol").alias("window_index")
)
)
data = data.join_asof(df_window_index, on="date", by="symbol").sort(
"symbol"
)
使用timedelta
而不是字符串似乎并不能解决问题.
左边是上面的代码,右边的图像是当我赚every=timedelta(days=31)
的时候.我还是不明白为什么北极星要把那些日期改成31天的时间差.
任何帮助都将不胜感激,或者任何能给我指明正确方向的小贴士!谢谢!
生效日期:
df = pl.read_csv(b"""
date,open,high,low,close,volume,dividends,stock_splits,symbol,window_index
2021-01-04T00:00:00.000000000,133.52,133.61,126.76,129.41,143301900,0.0,0.0,AAPL,0
2021-01-05T00:00:00.000000000,128.89,131.74,128.43,131.01,97664900,0.0,0.0,AAPL,0
2021-01-06T00:00:00.000000000,127.72,131.05,126.38,126.6,155088000,0.0,0.0,AAPL,0
2021-01-07T00:00:00.000000000,128.36,131.63,127.86,130.92,109578200,0.0,0.0,AAPL,1
2021-01-08T00:00:00.000000000,132.43,132.63,130.23,132.05,105158200,0.0,0.0,AAPL,1
2021-01-11T00:00:00.000000000,129.19,130.17,128.5,128.98,100384500,0.0,0.0,AAPL,1
2021-01-12T00:00:00.000000000,128.5,129.69,126.86,128.8,91951100,0.0,0.0,AAPL,1
2021-01-13T00:00:00.000000000,128.76,131.45,128.49,130.89,88636800,0.0,0.0,AAPL,1
2021-01-14T00:00:00.000000000,130.8,131.0,128.76,128.91,90221800,0.0,0.0,AAPL,1
2021-01-15T00:00:00.000000000,128.78,130.22,127.0,127.14,111598500,0.0,0.0,AAPL,1
2021-01-19T00:00:00.000000000,127.78,128.71,126.94,127.83,90757300,0.0,0.0,AAPL,1
2021-01-20T00:00:00.000000000,128.66,132.49,128.55,132.03,104319500,0.0,0.0,AAPL,1
2021-01-21T00:00:00.000000000,133.8,139.67,133.59,136.87,120150900,0.0,0.0,AAPL,1
2021-01-22T00:00:00.000000000,136.28,139.85,135.02,139.07,114459400,0.0,0.0,AAPL,1
2021-01-25T00:00:00.000000000,143.07,145.09,136.54,142.92,157611700,0.0,0.0,AAPL,1
2021-01-26T00:00:00.000000000,143.6,144.3,141.37,143.16,98390600,0.0,0.0,AAPL,1
2021-01-27T00:00:00.000000000,143.43,144.3,140.41,142.06,140843800,0.0,0.0,AAPL,1
2021-01-28T00:00:00.000000000,139.52,141.99,136.7,137.09,142621100,0.0,0.0,AAPL,1
2021-01-29T00:00:00.000000000,135.83,136.74,130.21,131.96,177523800,0.0,0.0,AAPL,1
2021-02-01T00:00:00.000000000,133.75,135.38,130.93,134.14,106239800,0.0,0.0,AAPL,1
2021-02-02T00:00:00.000000000,135.73,136.31,134.61,134.99,83305400,0.0,0.0,AAPL,1
2021-02-03T00:00:00.000000000,135.76,135.77,133.61,133.94,89880900,0.0,0.0,AAPL,1
2021-02-04T00:00:00.000000000,136.3,137.4,134.59,137.39,84183100,0.0,0.0,AAPL,1
2021-02-05T00:00:00.000000000,137.35,137.42,135.86,136.76,75693800,0.2,0.0,AAPL,1
""".strip(), try_parse_dates=True)