我有一个Pandas DataFrame,表示带有开始和结束时间的时间表,以及可选的休息和用餐间隔.我的目标是用正确的间隔将单行扩展为多行.值得注意的是:
- 时间轴可能没有任何间歇.
- 时间表可能只有休息时间,也可能只有用餐时间,或者两者都有.
- 用餐和休息时间的顺序可以不同(用餐时间可以在休息时间之前和之后).
下面是一个DataFrame示例:
Id | Start Time | End Time | Rest Break Start Time | Rest Break End Time | Meal Break Start Time | Meal Break End Time |
---|---|---|---|---|---|---|
1 | 2024-01-26 07:59 | 2024-01-26 12:33 | 2024-01-26 10:43 | 2024-01-26 10:53 | 2024-01-26 12:03 | 2024-01-26 12:33 |
2 | 2024-01-26 14:29 | 2024-01-26 17:35 | 2024-01-26 16:33 | 2024-01-26 16:44 | NaN | NaN |
3 | 2024-01-26 08:02 | 2024-01-26 12:45 | NaN | NaN | NaN | NaN |
4 | 2024-01-26 09:15 | 2024-01-26 16:15 | NaN | NaN | 2024-01-26 12:15 | 2024-01-26 12:45 |
5 | 2024-01-26 09:10 | 2024-01-26 16:37 | 2024-01-26 15:43 | 2024-01-26 15:55 | 2024-01-26 13:06 | 2024-01-26 13:37 |
我需要的输出是:
Id | Category | Start Time | End Time |
---|---|---|---|
1 | Session | 2024-01-26 07:59 | 2024-01-26 10:43 |
1 | Rest Break | 2024-01-26 10:43 | 2024-01-26 10:53 |
1 | Session | 2024-01-26 10:53 | 2024-01-26 12:03 |
1 | Meal Break | 2024-01-26 12:03 | 2024-01-26 12:33 |
2 | Session | 2024-01-26 14:29 | 2024-01-26 16:33 |
2 | Rest Break | 2024-01-26 16:33 | 2024-01-26 16:44 |
2 | Session | 2024-01-26 16:44 | 2024-01-26 17:35 |
3 | Session | 2024-01-26 08:02 | 2024-01-26 12:45 |
4 | Session | 2024-01-26 09:15 | 2024-01-26 12:15 |
4 | Meal Break | 2024-01-26 12:15 | 2024-01-26 12:45 |
4 | Session | 2024-01-26 12:45 | 2024-01-26 16:15 |
5 | Session | 2024-01-26 09:10 | 2024-01-26 13:06 |
5 | Meal Break | 2024-01-26 13:06 | 2024-01-26 13:37 |
5 | Session | 2024-01-26 13:37 | 2024-01-26 15:43 |
5 | Rest Break | 2024-01-26 15:43 | 2024-01-26 15:55 |
5 | Session | 2024-01-26 15:55 | 2024-01-26 16:37 |
我当前的逻辑包括从每一行提取所有唯一的日期时间,对它们进行排序,并创建间隔.虽然这种方法在一定程度上是可行的,但我很难 for each 间隔分配一个"类别"列.
import pandas as pd
# Your original DataFrame
data = {'Id': [1, 2, 3, 4, 5],
'Start Time': ['2024-01-26 07:59', '2024-01-26 14:29', '2024-01-26 08:02', '2024-01-26 09:15', '2024-01-26 09:10'],
'End Time': ['2024-01-26 12:33', '2024-01-26 17:35', '2024-01-26 12:45', '2024-01-26 16:15', '2024-01-26 16:37'],
'Rest Break Start Time': ['2024-01-26 10:43', '2024-01-26 16:33', None, None, '2024-01-26 15:43'],
'Rest Break End Time': ['2024-01-26 10:53', '2024-01-26 16:44', None, None, '2024-01-26 15:55'],
'Meal Break Start Time': ['2024-01-26 12:03', None, None, '2024-01-26 12:15', '2024-01-26 13:06'],
'Meal Break End Time': ['2024-01-26 12:33', None, None, '2024-01-26 12:45', '2024-01-26 13:37']}
df = pd.DataFrame(data)
# Create an empty DataFrame to store the expanded rows
expanded_df = pd.DataFrame(columns=['Id', 'Start Time', 'End Time'])
# Iterate through each row of the original DataFrame
for index, row in df.iterrows():
id_value = row['Id']
start_time = pd.to_datetime(row['Start Time'])
end_time = pd.to_datetime(row['End Time'])
# Collect all times
times = {start_time, end_time}
for column in ['Rest Break Start Time', 'Rest Break End Time', 'Meal Break Start Time', 'Meal Break End Time']:
if not pd.isna(row[column]):
times.add(pd.to_datetime(row[column]))
# Sort the times
sorted_times = sorted(times)
# Create intervals
for i in range(len(sorted_times) - 1):
if sorted_times[i] != sorted_times[i + 1]:
expanded_df = expanded_df.append({'Id': id_value, 'Start Time': sorted_times[i], 'End Time': sorted_times[i + 1]}, ignore_index=True)
# Sort the expanded DataFrame by 'Id' and 'Start Time'
expanded_df = expanded_df.sort_values(by=['Id', 'Start Time']).reset_index(drop=True)
# Show the result
expanded_df
以下是输出: