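freq='infer' means that the frequency is inferred from the index metadata.
freq : DateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. 'EOM').
If freq is specified then the index values are shifted but the data is
not realigned. That is, use freq if you would like to extend the index
when shifting and preserve the original data. If freq is specified as
"infer" then it will be inferred from the freq or inferred_freq
attributes of the index. If neither of those attributes exist, a
ValueError is thrown.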
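To illustrate the last sentence of that description, here is a minimal sketch of the failure case. The frame name `irregular` and its dates are made up for illustration; the point is only that an irregularly spaced index has neither a freq nor an inferred_freq, so freq='infer' has nothing to work with.

```python
import pandas as pd

# Hypothetical data: irregularly spaced dates, so no frequency can be inferred.
irregular = pd.DataFrame(
    {"Col1": [10, 20, 15, 30]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-11"]),
)

print(irregular.index.freq)           # None
print(irregular.index.inferred_freq)  # None

try:
    irregular.shift(periods=2, freq="infer")
    raised = False
except ValueError as exc:
    # Neither attribute is available, so pandas refuses to guess.
    raised = True
    print(f"ValueError: {exc}")
```

In that situation you would have to pass an explicit offset such as freq='D' instead.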
Suppose this example has an index with a frequency of 2 weeks:
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", periods=5, freq='2W'))
df.index
# DatetimeIndex(['2020-01-05', ..., '2020-03-01'], dtype='datetime64[ns]',
# freq='2W-SUN') # <- the important part
df.shift(periods=2, freq='infer') then shifts the index by 2 periods of 2W each, i.e. 4 weeks:
df.shift(periods=2, freq='infer')
Col1 Col2 Col3
2020-02-02 10 13 17 # 4W after 2020-01-05
2020-02-16 20 23 27
2020-03-01 15 18 22
2020-03-15 30 33 37
2020-03-29 45 48 52 # 4W after 2020-03-01
In contrast, a plain df.shift(periods=2) only shifts the data down by 2 rows and leaves the index unchanged:
df.shift(periods=2)
Col1 Col2 Col3
2020-01-05 NaN NaN NaN
2020-01-19 NaN NaN NaN
2020-02-02 10.0 13.0 17.0
2020-02-16 20.0 23.0 27.0
2020-03-01 15.0 18.0 22.0
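In your example, the default frequency of date_range is D, so freq='infer' and freq='D' do indeed give the same output. Change it to freq='3D' to see the difference: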
df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", periods=5, freq='3D'))
# shift by 2 * D
df.shift(periods=2, freq='D')
Col1 Col2 Col3
2020-01-03 10 13 17 # 2*1D after 2020-01-01
2020-01-06 20 23 27
2020-01-09 15 18 22
2020-01-12 30 33 37
2020-01-15 45 48 52 # 2*1D after 2020-01-13
# shift by 2 * 3D = 6D
df.shift(periods=2, freq='infer')
Col1 Col2 Col3
2020-01-07 10 13 17 # 2*3D after 2020-01-01
2020-01-10 20 23 27
2020-01-13 15 18 22
2020-01-16 30 33 37
2020-01-19 45 48 52
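The examples above all use date_range, which sets the freq attribute on the index directly. As a hedged sketch of the other path mentioned in the documentation, the snippet below builds an index from explicit timestamps (the names `idx`, `df2` and `shifted` are illustrative only), so freq is not set and freq='infer' should fall back to inferred_freq.

```python
import pandas as pd

# Built from explicit timestamps, so index.freq is not set ...
idx = pd.DatetimeIndex(["2020-01-01", "2020-01-04", "2020-01-07", "2020-01-10"])
df2 = pd.DataFrame({"Col1": [10, 20, 15, 30]}, index=idx)

print(df2.index.freq)           # None
print(df2.index.inferred_freq)  # ... but the regular 3-day spacing can be inferred

# freq='infer' uses the inferred frequency: 2 periods * 3D = 6 days.
shifted = df2.shift(periods=2, freq="infer")
print(shifted)
```

The data column is untouched; only the index labels move forward by 6 days.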