Python: what does freq='infer' do in pandas, and how is it different from freq='D'?

Posted on March 20

I can't see any difference between the freq options of .shift():

import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))

df.shift(periods=2, freq="infer")

            Col1  Col2  Col3
2020-01-03    10    13    17
2020-01-04    20    23    27
2020-01-05    15    18    22
2020-01-06    30    33    37
2020-01-07    45    48    52

returns the same result as

df.shift(periods=2, freq="D")

            Col1  Col2  Col3
2020-01-03    10    13    17
2020-01-04    20    23    27
2020-01-05    15    18    22
2020-01-06    30    33    37
2020-01-07    45    48    52

Can someone explain what the freq='infer' argument actually does?

Accepted answer

freq='infer' means that the frequency is inferred from the index's metadata.

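freq : DateOffset, tseries.offsets, timedelta, or str, optional

Offset to use from the tseries module or time rule (e.g. 'EOM').

If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as "infer" then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.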
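The two fallbacks described in the docs can be checked directly. Below is a minimal sketch, assuming a reasonably recent pandas: the first index is evenly spaced but built from a plain list, so index.freq is None and freq='infer' has to fall back to inferred_freq; the second index is irregular, so neither attribute is available and shift raises ValueError. The frame and variable names are only illustrative.

```python
import pandas as pd

# Evenly spaced every 3 days, but built from a list, so index.freq is None
regular = pd.DataFrame(
    {"Col1": [10, 20, 30]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-04", "2020-01-07"]),
)
print(regular.index.freq)           # None: no frequency was attached
print(regular.index.inferred_freq)  # a 3-day rule guessed from the spacing

# freq='infer' falls back to inferred_freq, so the index moves by 1 * 3 days
shifted = regular.shift(periods=1, freq="infer")
print(shifted.index)

# Irregular spacing: neither freq nor inferred_freq is available
irregular = pd.DataFrame(
    {"Col1": [10, 20, 30]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-10"]),
)
error_message = None
try:
    irregular.shift(periods=1, freq="infer")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the error message may differ between pandas versions, which is why the sketch only prints it.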

Suppose the index in this example has a frequency of 2 weeks:

import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", periods=5, freq='2W'))

df.index
# DatetimeIndex(['2020-01-05', ..., '2020-03-01'], dtype='datetime64[ns]',
#               freq='2W-SUN')  # <- the important part
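Since freq='infer' simply picks up that freq attribute, the shifted index can be reproduced by hand as index + periods * index.freq. A minimal sketch using a single column for brevity (the names step and manual_index are just illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45]},
    index=pd.date_range("2020-01-01", periods=5, freq="2W"),
)

# The DateOffset attached to the index; this is what freq='infer' reads
step = df.index.freq
print(step)

shifted = df.shift(periods=2, freq="infer")

# Same timestamps computed manually: move each one by 2 * (2 weeks) = 4 weeks
manual_index = df.index + 2 * step
print(shifted.index.equals(manual_index))
```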

df.shift(periods=2, freq='infer') will shift the index by 2 periods of 2W each, i.e. 4 weeks:

df.shift(periods=2, freq='infer')
            Col1  Col2  Col3
2020-02-02    10    13    17   # 4W after 2020-01-05
2020-02-16    20    23    27
2020-03-01    15    18    22
2020-03-15    30    33    37
2020-03-29    45    48    52   # 4W after 2020-03-01

By contrast, a plain df.shift(periods=2) just moves the data down by 2 rows and leaves the index unchanged:

df.shift(periods=2)

            Col1  Col2  Col3
2020-01-05   NaN   NaN   NaN
2020-01-19   NaN   NaN   NaN
2020-02-02  10.0  13.0  17.0
2020-02-16  20.0  23.0  27.0
2020-03-01  15.0  18.0  22.0
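One way to see how the two results relate: on an evenly spaced index, shifting the timestamps with freq='infer' and then reindexing back onto the original timestamps should reproduce the plain row shift. This is a minimal sketch under that assumption (a regular index whose spacing matches its freq), with one column for brevity:

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45]},
    index=pd.date_range("2020-01-01", periods=5, freq="2W"),
)

moved_index = df.shift(periods=2, freq="infer")  # timestamps move, values stay
moved_rows = df.shift(periods=2)                  # values move, timestamps stay

# Put the index-shifted frame back onto the original timestamps;
# original timestamps with no matching shifted row become NaN
realigned = moved_index.reindex(df.index)

print(realigned.equals(moved_rows))
```

On an irregular index the two approaches would generally not line up like this, since "2 rows" and "2 steps of the frequency" then cover different time spans.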

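In your example, date_range defaults to a daily frequency (D), so both calls do give the same output. Change the index to freq='3D' to see the difference: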

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", periods=5, freq='3D'))

# shift by 2 * D
df.shift(periods=2, freq='D')
            Col1  Col2  Col3
2020-01-03    10    13    17  # 2*1D after 2020-01-01
2020-01-06    20    23    27
2020-01-09    15    18    22
2020-01-12    30    33    37
2020-01-15    45    48    52

# shift by 2 * 3D = 6D
df.shift(periods=2, freq='infer')
            Col1  Col2  Col3
2020-01-07    10    13    17  # 2*3D after 2020-01-01
2020-01-10    20    23    27
2020-01-13    15    18    22
2020-01-16    30    33    37
2020-01-19    45    48    52
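To confirm that 'infer' is just shorthand for the index's own frequency, one can pass that frequency explicitly and compare. A minimal sketch, again with one column (the names inferred, explicit and daily are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45]},
    index=pd.date_range("2020-01-01", periods=5, freq="3D"),
)

inferred = df.shift(periods=2, freq="infer")  # step taken from df.index.freq (3D)
explicit = df.shift(periods=2, freq="3D")     # same step, spelled out by hand
daily = df.shift(periods=2, freq="D")         # fixed 1-day step, ignores the spacing

print(inferred.equals(explicit))
print(inferred.index[0], daily.index[0])
```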