我有一个Pandas DataFrame,表示带有开始和结束时间的时间表,以及可选的休息和用餐间隔.我的目标是用正确的间隔将单行扩展为多行.值得注意的是:

  • 时间轴可能没有任何间歇.
  • 时间表可能只有休息时间,也可能只有用餐时间,或者两者都有.
  • 用餐和休息时间的顺序可以不同(用餐时间可以在休息时间之前和之后).

下面是一个DataFrame示例:

Id Start Time End Time Rest Break Start Time Rest Break End Time Meal Break Start Time Meal Break End Time
1 2024-01-26 07:59 2024-01-26 12:33 2024-01-26 10:43 2024-01-26 10:53 2024-01-26 12:03 2024-01-26 12:33
2 2024-01-26 14:29 2024-01-26 17:35 2024-01-26 16:33 2024-01-26 16:44 NaN NaN
3 2024-01-26 08:02 2024-01-26 12:45 NaN NaN NaN NaN
4 2024-01-26 09:15 2024-01-26 16:15 NaN NaN 2024-01-26 12:15 2024-01-26 12:45
5 2024-01-26 09:10 2024-01-26 16:37 2024-01-26 15:43 2024-01-26 15:55 2024-01-26 13:06 2024-01-26 13:37

我需要的输出是:

Id Category Start Time End Time
1 Session 2024-01-26 07:59 2024-01-26 10:43
1 Rest Break 2024-01-26 10:43 2024-01-26 10:53
1 Session 2024-01-26 10:53 2024-01-26 12:03
1 Meal Break 2024-01-26 12:03 2024-01-26 12:33
2 Session 2024-01-26 14:29 2024-01-26 16:33
2 Rest Break 2024-01-26 16:33 2024-01-26 16:44
2 Session 2024-01-26 16:44 2024-01-26 17:35
3 Session 2024-01-26 08:02 2024-01-26 12:45
4 Session 2024-01-26 09:15 2024-01-26 12:15
4 Meal Break 2024-01-26 12:15 2024-01-26 12:45
4 Session 2024-01-26 12:45 2024-01-26 16:15
5 Session 2024-01-26 09:10 2024-01-26 13:06
5 Meal Break 2024-01-26 13:06 2024-01-26 13:37
5 Session 2024-01-26 13:37 2024-01-26 15:43
5 Rest Break 2024-01-26 15:43 2024-01-26 15:55
5 Session 2024-01-26 15:55 2024-01-26 16:37

我当前的逻辑包括从每一行提取所有唯一的日期时间,对它们进行排序,并创建间隔.虽然这种方法在一定程度上是可行的,但我很难 for each 间隔分配一个"类别"列.

import pandas as pd

# Your original DataFrame

data = {'Id': [1, 2, 3, 4, 5],
    'Start Time': ['2024-01-26 07:59', '2024-01-26 14:29', '2024-01-26 08:02', '2024-01-26 09:15', '2024-01-26 09:10'],
    'End Time': ['2024-01-26 12:33', '2024-01-26 17:35', '2024-01-26 12:45', '2024-01-26 16:15', '2024-01-26 16:37'],
    'Rest Break Start Time': ['2024-01-26 10:43', '2024-01-26 16:33', None, None, '2024-01-26 15:43'],
    'Rest Break End Time': ['2024-01-26 10:53', '2024-01-26 16:44', None, None, '2024-01-26 15:55'],
    'Meal Break Start Time': ['2024-01-26 12:03', None, None, '2024-01-26 12:15', '2024-01-26 13:06'],
    'Meal Break End Time': ['2024-01-26 12:33', None, None, '2024-01-26 12:45', '2024-01-26 13:37']}

df = pd.DataFrame(data)

# Create an empty DataFrame to store the expanded rows

expanded_df = pd.DataFrame(columns=['Id', 'Start Time', 'End Time'])

# Iterate through each row of the original DataFrame

for index, row in df.iterrows():

    id_value = row['Id']
    start_time = pd.to_datetime(row['Start Time'])
    end_time = pd.to_datetime(row['End Time'])

    # Collect all times

    times = {start_time, end_time}
    for column in ['Rest Break Start Time', 'Rest Break End Time', 'Meal Break Start Time', 'Meal Break End Time']:
        if not pd.isna(row[column]):
            times.add(pd.to_datetime(row[column]))

    # Sort the times

    sorted_times = sorted(times)

    # Create intervals

    for i in range(len(sorted_times) - 1):
        if sorted_times[i] != sorted_times[i + 1]:
            expanded_df = expanded_df.append({'Id': id_value, 'Start Time': sorted_times[i], 'End Time': sorted_times[i + 1]}, ignore_index=True)

# Sort the expanded DataFrame by 'Id' and 'Start Time'

expanded_df = expanded_df.sort_values(by=['Id', 'Start Time']).reset_index(drop=True)

# Show the result

expanded_df

以下是输出:

enter image description here

推荐答案

以下是通过判断"开始时间"/"结束时间"的列名来计算类别的更新版本:

rows = []
for index, row in df.iterrows():
    id_value = row["Id"]
    start_time = pd.to_datetime(row["Start Time"])
    end_time = pd.to_datetime(row["End Time"])

    # Collect all times

    times = {(start_time, "Start Time"), (end_time, "End Time")}
    for column in [
        "Rest Break Start Time",
        "Rest Break End Time",
        "Meal Break Start Time",
        "Meal Break End Time",
    ]:
        if not pd.isna(row[column]):
            times.add((pd.to_datetime(row[column]), column))

    # Sort the times

    sorted_times = sorted(times)

    # Create intervals

    for i in range(len(sorted_times) - 1):
        c1 = sorted_times[i][1]

        if "Start Time" in c1:
            cat = c1.replace("Start Time", "").strip() or "Session"
        elif "End Time" in c1:
            cat = "Session"

        if sorted_times[i][0] != sorted_times[i + 1][0]:
            rows.append(
                {
                    "Id": id_value,
                    "Category": cat,
                    "Start Time": sorted_times[i][0],
                    "End Time": sorted_times[i + 1][0],
                }
            )

expanded_df = pd.DataFrame(rows)
# Sort the expanded DataFrame by 'Id' and 'Start Time'

expanded_df = expanded_df.sort_values(by=["Id", "Start Time"]).reset_index(drop=True)

# Show the result

print(expanded_df)

打印:

    Id    Category          Start Time            End Time
0    1     Session 2024-01-26 07:59:00 2024-01-26 10:43:00
1    1  Rest Break 2024-01-26 10:43:00 2024-01-26 10:53:00
2    1     Session 2024-01-26 10:53:00 2024-01-26 12:03:00
3    1  Meal Break 2024-01-26 12:03:00 2024-01-26 12:33:00
4    2     Session 2024-01-26 14:29:00 2024-01-26 16:33:00
5    2  Rest Break 2024-01-26 16:33:00 2024-01-26 16:44:00
6    2     Session 2024-01-26 16:44:00 2024-01-26 17:35:00
7    3     Session 2024-01-26 08:02:00 2024-01-26 12:45:00
8    4     Session 2024-01-26 09:15:00 2024-01-26 12:15:00
9    4  Meal Break 2024-01-26 12:15:00 2024-01-26 12:45:00
10   4     Session 2024-01-26 12:45:00 2024-01-26 16:15:00
11   5     Session 2024-01-26 09:10:00 2024-01-26 13:06:00
12   5  Meal Break 2024-01-26 13:06:00 2024-01-26 13:37:00
13   5     Session 2024-01-26 13:37:00 2024-01-26 15:43:00
14   5  Rest Break 2024-01-26 15:43:00 2024-01-26 15:55:00
15   5     Session 2024-01-26 15:55:00 2024-01-26 16:37:00

Python相关问答推荐

如何在msgraph.GraphServiceClient上进行身份验证?

Django管理面板显示字段最大长度而不是字段名称

替换字符串中的多个重叠子字符串

Polars LazyFrame在收集后未返回指定的模式顺序

难以在Manim中正确定位对象

如何将双框框列中的成对变成两个新列

发生异常:TclMessage命令名称无效.!listbox"

Polars:用氨纶的其他部分替换氨纶的部分

如何从pandas的rame类继承并使用filepath实例化

多处理队列在与Forking http.server一起使用时随机跳过项目

在Python中从嵌套的for循环中获取插值

处理Gekko的非最优解

判断Python操作:如何从字面上得到所有decorator ?

Pythonquests.get(Url)返回Colab中的空内容

Django抛出重复的键值违反唯一约束错误

用0填充没有覆盖范围的垃圾箱

启动线程时,Python键盘模块冻结/不工作

为什么在更新Pandas 2.x中的列时,数据类型不会更改,而在Pandas 1.x中会更改?

PYODBC错误(SQL包含-26272个参数标记,但提供了235872个参数,HY 000)

401使用有效的OAuth令牌向Google Apps脚本Web App发出POST请求时出现未经授权的错误(";