对于下面的数据帧,我try 在每一行中添加一列,以不同的时间间隔捕获ASK_SIZE,例如1毫秒.
因此,例如,对于行1,1ms之前的大小应该是165ms,因为这是1ms之前的主流ASK大小--尽管之前的时间戳(2024-02-12 09:00:00.178941829)早了很多,但它仍然是之前的**主流**大小1毫秒.
再举一个例子,第3行到第8行应该都是203,因为这是时间戳2024-02-12 09:00:00.334723166的大小,这将是第3行到第8行之前1ms的最后一个时间戳.
一直在阅读merge_asof,try 了下面的一些东西,但没有运气.任何帮助赞赏!
Table example个
idx event_timestamp ask_size
0 2024-02-12 09:00:00.178941829 165
1 2024-02-12 09:00:00.334673928 166
2 2024-02-12 09:00:00.334723166 203
3 2024-02-12 09:00:00.339505589 203
4 2024-02-12 09:00:00.339517572 241
5 2024-02-12 09:00:00.339585194 276
6 2024-02-12 09:00:00.339597200 276
7 2024-02-12 09:00:00.339679756 277
8 2024-02-12 09:00:00.339705796 312
9 2024-02-12 09:00:00.343967540 275
10 2024-02-12 09:00:00.393306026 275
Raw DATA个
data = {
'event_timestamp': ['2024-02-12 09:00:00.178941829', '2024-02-12 09:00:00.334673928',
'2024-02-12 09:00:00.334723166', '2024-02-12 09:00:00.339505589',
'2024-02-12 09:00:00.339517572', '2024-02-12 09:00:00.339585194',
'2024-02-12 09:00:00.339597200', '2024-02-12 09:00:00.339679756',
'2024-02-12 09:00:00.339705796', '2024-02-12 09:00:00.343967540'],
'ask_size_1_x': [165.0, 166.0, 203.0, 203.0, 241.0, 276.0, 276.0, 277.0, 312.0, 275.0]
}
df = pd.DataFrame(data)
Attempt个
data['1ms'] = data['event_timestamp'] - pd.Timedelta(milliseconds=1)
temp = data[['event_timestamp','ask_size_1']]
temp_time_shift = data[['1ms','ask_size_1']]
temp2 = pd.merge_asof(
temp,
temp_time_shift,
left_on = 'event_timestamp',
right_on = '1ms',
direction='backward'
)
EDIT个 建议:
import pandas as pd
data = {
'event_timestamp': [
'2024-02-12 09:00:00.393306026',
'2024-02-12 09:00:00.393347792',
'2024-02-12 09:00:00.393351971',
'2024-02-12 09:00:00.393355738',
'2024-02-12 09:00:00.393389724',
'2024-02-12 09:00:00.542780521',
'2024-02-12 09:00:00.542841917',
'2024-02-12 09:00:00.714845055',
'2024-02-12 09:00:00.714908862',
'2024-02-12 09:00:00.747016524'
],
'ask_size_1': [275.0, 275.0, 237.0, 237.0, 202.0, 202.0, 202.0, 262.0, 261.0, 263.0]
}
df = pd.DataFrame(data)
df['event_timestamp'] = pd.to_datetime(df['event_timestamp']) # Convert 'event_timestamp' to datetime format
tolerance = pd.Timedelta('1ms')
df['out'] = pd.merge_asof(df['event_timestamp'].sub(tolerance),
df[['event_timestamp', 'ask_size_1']],
direction='forward', tolerance=tolerance
)['ask_size_1']
输出如下所示,您可以看到第7行,ASK_SIZE和OUT都是相同的.输出应该是第7行之前至少1毫秒的最后一个ASK_SIZE,即第6行,值为202.
从技术上看,黄色可能是NaN,因为在此之前没有大于1ms的时间戳的值.
event_timestamp ask_size_1 out
0 2024-02-12 09:00:00.393306026 275.0 275.0
1 2024-02-12 09:00:00.393347792 275.0 275.0
2 2024-02-12 09:00:00.393351971 237.0 275.0
3 2024-02-12 09:00:00.393355738 237.0 275.0
4 2024-02-12 09:00:00.393389724 202.0 275.0
5 2024-02-12 09:00:00.542780521 202.0 202.0
6 2024-02-12 09:00:00.542841917 202.0 202.0
7 2024-02-12 09:00:00.714845055 262.0 262.0
8 2024-02-12 09:00:00.714908862 261.0 262.0
9 2024-02-12 09:00:00.747016524 263.0 263.0