I have two DataFrames, df and df1. Using the date ranges defined in df, I want to sum the Distance column of df1.

import pandas as pd
from io import StringIO
data = """IGN,External,Voltage,date_time
0,ON,53.06,13-11-2023 10:01:14
743,ON,47.80,13-11-2023 15:42:11
745,ON,51.20,13-11-2023 16:22:41
1152,ON,50.44,13-11-2023 19:18:26
1155,ON,52.88,14-11-2023 09:39:54
2191,ON,53.66,14-11-2023 14:45:53
"""
df = pd.read_csv(StringIO(data), parse_dates=['date_time'], dayfirst=True)
data1 = """IGN,External,Distance,date_time
0,ON,1,13-11-2023 10:01:14
300,ON,5,13-11-2023 10:42:11
400,ON,6,13-11-2023 11:42:11
743,ON,9,13-11-2023 15:42:11
745,ON,10,13-11-2023 16:22:41
1000,ON,12,13-11-2023 17:13:43
1152,ON,89,13-11-2023 19:18:26
1155,ON,52.88,14-11-2023 09:39:54
1189,ON,54.33,14-11-2023 10:39:54
2191,ON,34,14-11-2023 14:45:53
"""
df1 = pd.read_csv(StringIO(data1), parse_dates=['date_time'], dayfirst=True)

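Each consecutive pair of rows in df gives a start and an end time, and I want the sum of df1's Distance values that fall within each range (inclusive). The expected output is below; I produced it by hand: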

           start_date            end_date  Distance
0 2023-11-13 10:01:14 2023-11-13 15:42:11     21.00
1 2023-11-13 16:22:41 2023-11-13 19:18:26    111.00
2 2023-11-14 09:39:54 2023-11-14 14:45:53    141.21
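One vectorized alternative, offered as a sketch rather than as either answer's method. It assumes that df's rows alternate start, end, start, end (as in the sample), that the resulting ranges are sorted and non-overlapping, and that a row exactly on a boundary belongs to that range. `numpy.searchsorted` finds, for each df1 timestamp, the last range start at or before it, and `numpy.bincount` with `weights=` sums the distances per range. The helper names `sum_by_ranges` and `read_sample` are hypothetical.

```python
from io import StringIO

import numpy as np
import pandas as pd


def sum_by_ranges(ranges: pd.DataFrame, rows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum rows["Distance"] inside each [start, end] range.

    Assumes `ranges` has an even number of rows alternating start/end,
    sorted by time, with non-overlapping ranges.
    """
    times = ranges["date_time"].to_numpy()
    if len(times) % 2:
        raise ValueError("expected an even number of rows (start/end pairs)")
    starts, ends = times[0::2], times[1::2]

    t = rows["date_time"].to_numpy()
    # Position of the last start <= t; -1 means t is before every start.
    idx = np.searchsorted(starts, t, side="right") - 1
    valid = idx >= 0
    # Replace -1 with 0 only so the lookup stays in bounds; `valid` masks it out.
    safe_idx = np.where(valid, idx, 0)
    inside = valid & (t <= ends[safe_idx])

    # Sum Distance per range; ranges with no matching rows get 0.0.
    totals = np.bincount(
        idx[inside],
        weights=rows["Distance"].to_numpy(dtype=float)[inside],
        minlength=len(starts),
    )
    return pd.DataFrame({"start_date": starts, "end_date": ends, "Distance": totals})


def read_sample(text: str) -> pd.DataFrame:
    return pd.read_csv(StringIO(text), parse_dates=["date_time"], dayfirst=True)


df = read_sample("""IGN,External,Voltage,date_time
0,ON,53.06,13-11-2023 10:01:14
743,ON,47.80,13-11-2023 15:42:11
745,ON,51.20,13-11-2023 16:22:41
1152,ON,50.44,13-11-2023 19:18:26
1155,ON,52.88,14-11-2023 09:39:54
2191,ON,53.66,14-11-2023 14:45:53
""")
df1 = read_sample("""IGN,External,Distance,date_time
0,ON,1,13-11-2023 10:01:14
300,ON,5,13-11-2023 10:42:11
400,ON,6,13-11-2023 11:42:11
743,ON,9,13-11-2023 15:42:11
745,ON,10,13-11-2023 16:22:41
1000,ON,12,13-11-2023 17:13:43
1152,ON,89,13-11-2023 19:18:26
1155,ON,52.88,14-11-2023 09:39:54
1189,ON,54.33,14-11-2023 10:39:54
2191,ON,34,14-11-2023 14:45:53
""")

result = sum_by_ranges(df, df1)
print(result)
```

On the sample data the Distance totals come out as 21, 111 and 141.21 (up to floating-point rounding), matching the table above. Unlike filtering df1 once per range, this builds no per-range boolean mask.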

Recommended answer

mozway's answer already solves this, but here is another approach.

import pandas as pd
from io import StringIO


def get_df(data):
    return pd.read_csv(
        StringIO(data), parse_dates=["date_time"], dayfirst=True
    )


if __name__ == "__main__":
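    # Note the renaming: df1 holds the Voltage rows whose consecutive pairs
    # define the ranges (the question's df); df2 holds the Distance rows
    # to be summed (the question's df1).
    df1 = get_df(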
        """IGN,External,Voltage,date_time
        0,ON,53.06,13-11-2023 10:01:14
        743,ON,47.80,13-11-2023 15:42:11
        745,ON,51.20,13-11-2023 16:22:41
        1152,ON,50.44,13-11-2023 19:18:26
        1155,ON,52.88,14-11-2023 09:39:54
        2191,ON,53.66,14-11-2023 14:45:53
        """
    )
    df2 = get_df(
        """IGN,External,Distance,date_time
        0,ON,1,13-11-2023 10:01:14
        300,ON,5,13-11-2023 10:42:11
        400,ON,6,13-11-2023 11:42:11
        743,ON,9,13-11-2023 15:42:11
        745,ON,10,13-11-2023 16:22:41
        1000,ON,12,13-11-2023 17:13:43
        1152,ON,89,13-11-2023 19:18:26
        1155,ON,52.88,14-11-2023 09:39:54
        1189,ON,54.33,14-11-2023 10:39:54
        2191,ON,34,14-11-2023 14:45:53
        """
    )

    # Pair consecutive rows of df1 as (start, stop): rows (0, 1), (2, 3), (4, 5)
    group_times = zip(df1.date_time[:-1:2], df1.date_time[1::2])
    new_df = []
    for start_date, stop_date in group_times:
        new_df.append(
            {
                "start_date": start_date,
                "stop_date": stop_date,
                "distance": df2[
                    (start_date <= df2.date_time)
                    & (df2.date_time <= stop_date)
                ].Distance.sum(),
            }
        )
    result = pd.DataFrame(new_df)
    print(result)

    
    # Or, equivalently, build the same DataFrame with a list comprehension:
    result = pd.DataFrame(
        [
            {
                "start_date": s1,
                "stop_date": s2,
                "distance": df2[
                    (s1 <= df2.date_time) & (df2.date_time <= s2)
                ].Distance.sum(),
            }
            for s1, s2 in zip(df1.date_time[:-1:2], df1.date_time[1::2])
        ]
    )
    print(result)


           start_date           stop_date  distance
0 2023-11-13 10:01:14 2023-11-13 15:42:11     21.00
1 2023-11-13 16:22:41 2023-11-13 19:18:26    111.00
2 2023-11-14 09:39:54 2023-11-14 14:45:53    141.21
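The pairing step in the answer relies on slicing: `df1.date_time[:-1:2]` yields rows 0, 2, 4, … and `df1.date_time[1::2]` yields rows 1, 3, 5, …, and `zip` stops at the shorter input. With an odd number of rows, the last start therefore has no partner and is silently ignored. A small sketch that makes the pairing explicit and raises in that case instead (`pair_rows` is a hypothetical helper, not part of pandas):

```python
import pandas as pd


def pair_rows(times: pd.Series) -> list:
    """Hypothetical helper: pair consecutive values as (start, stop) tuples."""
    if len(times) % 2:
        raise ValueError(f"odd number of rows ({len(times)}): the last start has no stop")
    # Positional slices: rows 0, 2, 4, ... are starts; rows 1, 3, 5, ... are stops.
    return list(zip(times.iloc[0::2], times.iloc[1::2]))


times = pd.Series(
    pd.to_datetime(
        [
            "2023-11-13 10:01:14",
            "2023-11-13 15:42:11",
            "2023-11-13 16:22:41",
            "2023-11-13 19:18:26",
        ]
    )
)
pairs = pair_rows(times)
print(pairs)

# Three timestamps cannot be paired, so this raises instead of dropping a row.
try:
    pair_rows(times.iloc[:3])
except ValueError as exc:
    print(exc)
```

Raising is a deliberate choice here; if an unmatched final start should be handled some other way, that needs its own rule.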
