Python Pandas查找给定时间戳之前的最后一个值

发布于03月05日

对于下面的数据帧，我try 在每一行中添加一列，以不同的时间间隔捕获ASK_SIZE，例如1毫秒.

因此，例如，对于行1，1ms之前的大小应该是165ms，因为这是1ms之前的主流ASK大小--尽管之前的时间戳(2024-02-12 09:00:00.178941829)早了很多，但它仍然是之前的**主流**大小1毫秒.

再举一个例子，第3行到第8行应该都是203，因为这是时间戳2024-02-12 09:00:00.334723166的大小，这将是第3行到第8行之前1ms的最后一个时间戳.

一直在阅读merge_asof，try 了下面的一些东西，但没有运气.任何帮助赞赏！

Table example个

idx event_timestamp                 ask_size
0   2024-02-12 09:00:00.178941829   165
1   2024-02-12 09:00:00.334673928   166
2   2024-02-12 09:00:00.334723166   203
3   2024-02-12 09:00:00.339505589   203
4   2024-02-12 09:00:00.339517572   241
5   2024-02-12 09:00:00.339585194   276
6   2024-02-12 09:00:00.339597200   276
7   2024-02-12 09:00:00.339679756   277
8   2024-02-12 09:00:00.339705796   312
9   2024-02-12 09:00:00.343967540   275
10  2024-02-12 09:00:00.393306026   275

Raw DATA个

data = {
    'event_timestamp': ['2024-02-12 09:00:00.178941829', '2024-02-12 09:00:00.334673928',
                        '2024-02-12 09:00:00.334723166', '2024-02-12 09:00:00.339505589',
                        '2024-02-12 09:00:00.339517572', '2024-02-12 09:00:00.339585194',
                        '2024-02-12 09:00:00.339597200', '2024-02-12 09:00:00.339679756',
                        '2024-02-12 09:00:00.339705796', '2024-02-12 09:00:00.343967540'],
    'ask_size_1_x': [165.0, 166.0, 203.0, 203.0, 241.0, 276.0, 276.0, 277.0, 312.0, 275.0]
}

df = pd.DataFrame(data)

Attempt个

data['1ms'] = data['event_timestamp'] - pd.Timedelta(milliseconds=1)

temp = data[['event_timestamp','ask_size_1']]
temp_time_shift = data[['1ms','ask_size_1']]



temp2 = pd.merge_asof(
            temp,
            temp_time_shift,
            left_on = 'event_timestamp',
            right_on = '1ms',
            direction='backward'
        )

EDIT个建议:

import pandas as pd

data = {
    'event_timestamp': [
        '2024-02-12 09:00:00.393306026',
        '2024-02-12 09:00:00.393347792',
        '2024-02-12 09:00:00.393351971',
        '2024-02-12 09:00:00.393355738',
        '2024-02-12 09:00:00.393389724',
        '2024-02-12 09:00:00.542780521',
        '2024-02-12 09:00:00.542841917',
        '2024-02-12 09:00:00.714845055',
        '2024-02-12 09:00:00.714908862',
        '2024-02-12 09:00:00.747016524'
    ],
    'ask_size_1': [275.0, 275.0, 237.0, 237.0, 202.0, 202.0, 202.0, 262.0, 261.0, 263.0]
}

df = pd.DataFrame(data)
df['event_timestamp'] = pd.to_datetime(df['event_timestamp'])  # Convert 'event_timestamp' to datetime format

tolerance = pd.Timedelta('1ms')
df['out'] = pd.merge_asof(df['event_timestamp'].sub(tolerance),
                          df[['event_timestamp', 'ask_size_1']],
                          direction='forward', tolerance=tolerance
                         )['ask_size_1']

输出如下所示，您可以看到第7行，ASK_SIZE和OUT都是相同的.输出应该是第7行之前至少1毫秒的最后一个ASK_SIZE，即第6行，值为202.

从技术上看，黄色可能是NaN，因为在此之前没有大于1ms的时间戳的值.

    event_timestamp             ask_size_1  out
0   2024-02-12 09:00:00.393306026   275.0   275.0
1   2024-02-12 09:00:00.393347792   275.0   275.0
2   2024-02-12 09:00:00.393351971   237.0   275.0
3   2024-02-12 09:00:00.393355738   237.0   275.0
4   2024-02-12 09:00:00.393389724   202.0   275.0
5   2024-02-12 09:00:00.542780521   202.0   202.0
6   2024-02-12 09:00:00.542841917   202.0   202.0
7   2024-02-12 09:00:00.714845055   262.0   262.0
8   2024-02-12 09:00:00.714908862   261.0   262.0
9   2024-02-12 09:00:00.747016524   263.0   263.0

Expected output:

event_timestamp ask_size_1 out 0 2024-02-12 09:00:00.393306026 271.0 NaN 1 2024-02-12 09:00:00.393347792 275.0 NaN 2 2024-02-12 09:00:00.393351971 237.0 NaN 3 2024-02-12 09:00:00.393355738 237.0 NaN 4 2024-02-12 09:00:00.393389724 202.0 NaN 5 2024-02-12 09:00:00.542780521 206.0 202.0 6 2024-02-12 09:00:00.542841917 51.0 202.0 7 2024-02-12 09:00:00.714845055 262.0 51.0 8 2024-02-12 09:00:00.714908862 261.0 51.0 9 2024-02-12 09:00:00.747016524 263.0 261.0

tmp = pd.concat([pd.DataFrame({'event_timestamp': [df['event_timestamp'].iloc[0]-delta], 'ask_size_1': [df['ask_size_1'].iloc[0]]}), df]) delta = pd.Timedelta('1ms') df['out'] = pd.merge_asof(df['event_timestamp'].sub(delta), tmp, direction='backward', allow_exact_matches=False)['ask_size_1']

event_timestamp ask_size_1 out 0 2024-02-12 09:00:00.393306026 271.0 NaN 1 2024-02-12 09:00:00.393347792 275.0 271.0 2 2024-02-12 09:00:00.393351971 237.0 271.0 3 2024-02-12 09:00:00.393355738 237.0 271.0 4 2024-02-12 09:00:00.393389724 202.0 271.0 5 2024-02-12 09:00:00.542780521 206.0 202.0 6 2024-02-12 09:00:00.542841917 51.0 202.0 7 2024-02-12 09:00:00.714845055 262.0 51.0 8 2024-02-12 09:00:00.714908862 261.0 51.0 9 2024-02-12 09:00:00.747016524 263.0 261.0

Python Pandas查找给定时间戳之前的最后一个值

推荐答案

Python相关问答推荐

如何使用symy打印方程？

在Polars(Python库)中将二进制转换为具有非UTF-8字符的字符串变量

如何制作10，000年及以后的日期时间对象？

使用@ guardlasses. guardlass和注释的Python继承

如何在python xsModel库中定义一个可选[December]字段，以产生受约束的SON模式

OR—Tools中CP—SAT求解器的IntVar设置值

如何让这个星型模式在Python中只使用一个for循环？

如何在Polars中从列表中的所有 struct 中 Select 字段？

当递归函数的返回值未绑定到变量时，非局部变量不更新：

如何使用scipy的curve_fit与约束，其中拟合的曲线总是在观测值之下？

将scipy. sparse矩阵直接保存为常规txt文件

在Python中计算连续天数

如何在Python请求中组合多个适配器？

如何在信号的FFT中获得正确的频率幅值

合并相似列表

用来自另一个数据框的列特定标量划分Polars数据框中的每一列，

从列表中分离数据的最佳方式

在聚合中使用python-polars时如何计算模式

奇怪的Base64 Python解码

大Pandas 每月重新抽样200万只和300万只