Consider the following two DataFrames:

date,price
2022-07-23 02:00:00,22834.24
2022-07-23 03:00:00,22808.55
2022-07-23 04:00:00,22895.41
2022-07-23 05:00:00,22902.46
2022-07-23 06:00:00,22827.46
2022-07-23 19:00:00,22272.57
2022-07-23 20:00:00,22325.82
2022-07-23 21:00:00,22243.32
2022-07-23 22:00:00,22469.08
2022-07-23 23:00:00,22451.07
2022-07-24 00:00:00,22549.18
2022-07-24 01:00:00,22423.58
2022-07-24 02:00:00,22469.09
2022-07-24 04:00:00,22396.51
2022-07-24 05:00:00,22749.98
2022-07-24 06:00:00,22679.01
2022-07-24 07:00:00,22701.61

date,price,passed_bars
2022-07-23 02:00:00,22834.24,30.0
2022-07-23 19:00:00,22272.57,13.0
2022-07-24 04:00:00,22396.51,4.0

The DataFrames can be recreated with the following snippet:

import pandas as pd

li1 = [{'date': '2022-07-23 02:00:00', 'price': 22834.24}, {'date': '2022-07-23 03:00:00', 'price': 22808.55},
       {'date': '2022-07-23 04:00:00', 'price': 22895.41}, {'date': '2022-07-23 05:00:00', 'price': 22902.46},
       {'date': '2022-07-23 06:00:00', 'price': 22827.46}, {'date': '2022-07-23 19:00:00', 'price': 22272.57},
       {'date': '2022-07-23 20:00:00', 'price': 22325.82}, {'date': '2022-07-23 21:00:00', 'price': 22243.32},
       {'date': '2022-07-23 22:00:00', 'price': 22469.08}, {'date': '2022-07-23 23:00:00', 'price': 22451.07},
       {'date': '2022-07-24 00:00:00', 'price': 22549.18}, {'date': '2022-07-24 01:00:00', 'price': 22423.58},
       {'date': '2022-07-24 02:00:00', 'price': 22469.09}, {'date': '2022-07-24 04:00:00', 'price': 22396.51},
       {'date': '2022-07-24 05:00:00', 'price': 22749.98}, {'date': '2022-07-24 06:00:00', 'price': 22679.01},
       {'date': '2022-07-24 07:00:00', 'price': 22701.61}]

li2 = [{'date': '2022-07-23 02:00:00', 'price': 22834.24, 'passed_bars': 30.0},
       {'date': '2022-07-23 19:00:00', 'price': 22272.57, 'passed_bars': 13.0},
       {'date': '2022-07-24 04:00:00', 'price': 22396.51, 'passed_bars': 4.0}]

df1 = pd.DataFrame.from_records(li1)

df2 = pd.DataFrame.from_records(li2)

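The goal is to add a new column to the first DataFrame, df1, where each value is computed with the following logic:

The new column is the time elapsed between the current record in df1 and the nearest record in df2 that is not later than it, i.e. df1.date.iloc[i] >= nearest_to_current(df2.date).

Based on that logic, the desired DataFrame should look like this: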

date,price, passed_time
2022-07-23 02:00:00,22834.24, 0 hours
2022-07-23 03:00:00,22808.55, 1 hours
2022-07-23 04:00:00,22895.41, 2 hours
2022-07-23 05:00:00,22902.46, 3 hours
2022-07-23 06:00:00,22827.46, 4 hours
2022-07-23 19:00:00,22272.57, 0 hours
2022-07-23 20:00:00,22325.82, 1 hours
2022-07-23 21:00:00,22243.32, 2 hours
2022-07-23 22:00:00,22469.08, 3 hours
2022-07-23 23:00:00,22451.07, 4 hours
2022-07-24 00:00:00,22549.18, 5 hours
2022-07-24 01:00:00,22423.58, 6 hours
2022-07-24 02:00:00,22469.09, 7 hours
2022-07-24 04:00:00,22396.51, 0 hours
2022-07-24 05:00:00,22749.98, 1 hours
2022-07-24 06:00:00,22679.01, 2 hours
2022-07-24 07:00:00,22701.61, 3 hours
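To make the rule concrete, here is a minimal brute-force sketch of it. The function name `passed_hours_bruteforce` is hypothetical, and returning NaN when no df2 date is at or before the current date is an assumption, since the question does not cover that case. It is meant as a readable reference for checking results, not as an efficient solution.

```python
import pandas as pd


def passed_hours_bruteforce(dates, anchors):
    """For each date, hours since the latest anchor at or before it.

    Hypothetical helper; NaN for dates with no earlier anchor is an assumption.
    """
    dates = pd.to_datetime(pd.Series(dates))
    anchors = pd.to_datetime(pd.Series(anchors))
    out = []
    for d in dates:
        # All anchor timestamps that are not later than the current date
        earlier = anchors[anchors <= d]
        if earlier.empty:
            out.append(float("nan"))
        else:
            # Distance to the most recent of them, expressed in hours
            out.append(float((d - earlier.max()) / pd.Timedelta(hours=1)))
    return out


# Three dates from df1 checked against the three dates in df2
sample = passed_hours_bruteforce(
    ["2022-07-23 06:00:00", "2022-07-23 19:00:00", "2022-07-24 02:00:00"],
    ["2022-07-23 02:00:00", "2022-07-23 19:00:00", "2022-07-24 04:00:00"],
)
print(sample)  # [4.0, 0.0, 7.0]
```

The three values match the corresponding rows of the desired output above (4, 0 and 7 hours).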

Recommended answer

Try pd.merge_asof (both DataFrames must be sorted by the merge key!):

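# merge_asof needs real datetimes (and sorted keys), so parse the strings first
df1["date"] = pd.to_datetime(df1["date"])
df2["date"] = pd.to_datetime(df2["date"])
# Keep a copy of each df2 timestamp so it survives the merge as a regular column
df2["passed_time"] = df2["date"]

# For each df1 row, pick the last df2 row whose date is <= the df1 date
x = pd.merge_asof(df1, df2[["date", "passed_time"]], on="date")
# Elapsed time since that df2 timestamp, expressed in hours
x["passed_time"] = (x["date"] - x["passed_time"]) / pd.Timedelta("1 hour")
print(x)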

Prints:

                  date     price  passed_time
0  2022-07-23 02:00:00  22834.24          0.0
1  2022-07-23 03:00:00  22808.55          1.0
2  2022-07-23 04:00:00  22895.41          2.0
3  2022-07-23 05:00:00  22902.46          3.0
4  2022-07-23 06:00:00  22827.46          4.0
5  2022-07-23 19:00:00  22272.57          0.0
6  2022-07-23 20:00:00  22325.82          1.0
7  2022-07-23 21:00:00  22243.32          2.0
8  2022-07-23 22:00:00  22469.08          3.0
9  2022-07-23 23:00:00  22451.07          4.0
10 2022-07-24 00:00:00  22549.18          5.0
11 2022-07-24 01:00:00  22423.58          6.0
12 2022-07-24 02:00:00  22469.09          7.0
13 2022-07-24 04:00:00  22396.51          0.0
14 2022-07-24 05:00:00  22749.98          1.0
15 2022-07-24 06:00:00  22679.01          2.0
16 2022-07-24 07:00:00  22701.61          3.0
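If the inputs might arrive unsorted, or you would rather not modify df1 and df2 in place, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name `add_passed_time` and the temporary `anchor` column are hypothetical, df1 is assumed not to already contain a column called `anchor`, and rows with no earlier df2 date are left as NaN.

```python
import pandas as pd


def add_passed_time(df1, df2, on="date"):
    """Return a sorted copy of df1 with hours since the latest df2[on] <= df1[on]."""
    left = df1.copy()
    left[on] = pd.to_datetime(left[on])
    # merge_asof requires both sides to be sorted by the key
    left = left.sort_values(on).reset_index(drop=True)

    # Only the key is needed from df2; duplicate it so it survives the merge
    right = pd.DataFrame({on: pd.to_datetime(df2[on])})
    right["anchor"] = right[on]
    right = right.sort_values(on)

    # direction="backward" (the default) matches the last right row at or before each left row
    merged = pd.merge_asof(left, right, on=on, direction="backward")
    merged["passed_time"] = (merged[on] - merged["anchor"]) / pd.Timedelta(hours=1)
    return merged.drop(columns="anchor")


# Unsorted rows taken from df1, with a single anchor taken from df2
left_demo = pd.DataFrame({
    "date": ["2022-07-23 20:00:00", "2022-07-23 02:00:00", "2022-07-23 21:00:00"],
    "price": [22325.82, 22834.24, 22243.32],
})
right_demo = pd.DataFrame({"date": ["2022-07-23 19:00:00"]})

result = add_passed_time(left_demo, right_demo)
print(result)
```

Here the 02:00 row comes before the only anchor, so its passed_time is NaN, while the 20:00 and 21:00 rows get 1.0 and 2.0 hours. Note that the result is ordered by date rather than by the original row order.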
