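I'm trying to merge two pandas Series that have different datetime indexes, but I'm having trouble getting the right values in the merged result. I have seen some posts that keep the two series side by side in a DataFrame, but I want to return a single Series containing the sum of the two.
Background: I have pandas Series holding the number of people detected in each room, and I want to merge them into a count for the building (which contains 2 rooms in the example below). I can do this by aggregating rooms pairwise until all rooms are merged into one series, which is then the building.
I feel I have to sort the series and then compare the two row by row to get the right count. So far I have used zip() to iterate over the (sorted) series row by row, but I suspect there is a better way. Any ideas?
Here is some code: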
import pandas as pd
# The input data
room1_idx = pd.to_datetime([
'2023-08-11T17:00:44', # 6 people counted
'2023-08-11T17:06:47', # 7 people counted
'2023-08-11T17:06:49', # 8 people counted
'2023-08-11T17:07:00', # 10 people counted
'2023-08-11T17:07:20', # 8 people counted
])
room1 = pd.Series([6, 7, 8, 10, 8], index=room1_idx, name="Room 1")
room2_idx = pd.to_datetime([
'2023-08-11T17:06:45', # 1 person counted
'2023-08-11T17:06:46', # 4 people counted
'2023-08-11T17:06:47', # 5 people counted
'2023-08-11T17:07:02', # 10 people counted
'2023-08-11T17:07:10', # 7 people counted
'2023-08-11T17:07:30', # 2 people counted
])
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=room2_idx, name="Room 2")
print(room1)
print(room2)
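A rough sketch of the row-by-row idea described above, assuming each room reports 0 until its first reading and keeps its last reading until the next one. The name `merge_rowwise` is hypothetical and this is only an illustration, not the exact code in use:

```python
import pandas as pd

room1 = pd.Series([6, 7, 8, 10, 8], index=pd.to_datetime([
    '2023-08-11T17:00:44', '2023-08-11T17:06:47', '2023-08-11T17:06:49',
    '2023-08-11T17:07:00', '2023-08-11T17:07:20']), name="Room 1")
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=pd.to_datetime([
    '2023-08-11T17:06:45', '2023-08-11T17:06:46', '2023-08-11T17:06:47',
    '2023-08-11T17:07:02', '2023-08-11T17:07:10', '2023-08-11T17:07:30']), name="Room 2")

def merge_rowwise(a: pd.Series, b: pd.Series, name: str = "Building") -> pd.Series:
    """Walk both series in timestamp order, summing the latest value of each."""
    # Tag every reading with its source (0 = a, 1 = b) and sort by timestamp.
    events = sorted(
        [(ts, 0, value) for ts, value in zip(a.index, a.values)]
        + [(ts, 1, value) for ts, value in zip(b.index, b.values)]
    )
    latest = [0, 0]   # last known count per source; 0 before the first reading
    totals = {}
    for ts, source, value in events:
        latest[source] = int(value)
        # If both sources report at the same timestamp, the second event
        # overwrites the first, so the stored total reflects both updates.
        totals[ts] = latest[0] + latest[1]
    return pd.Series(totals, name=name)

building_rowwise = merge_rowwise(room1, room2)
print(building_rowwise)
```

This works on the sample data, but it loops in Python over every reading, which is why I suspect a vectorized alternative exists.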
What I want is a function that outputs the following:
building_idx = pd.to_datetime([
'2023-08-11 17:00:44', # 6+0 people counted
'2023-08-11 17:06:45', # 6+1 people counted
'2023-08-11 17:06:46', # 6+4 people counted
'2023-08-11 17:06:47', # 7+5 people counted
'2023-08-11 17:06:49', # 8+5 people counted
'2023-08-11 17:07:00', # 10+5 people counted
'2023-08-11 17:07:02', # 10+10 people counted
'2023-08-11 17:07:10', # 10+7 people counted
'2023-08-11 17:07:20', # 8+7 people counted
'2023-08-11 17:07:30', # 8+2 people counted
])
building = pd.Series([6, 7, 10, 12, 13, 15, 20, 17, 15, 10], index=building_idx, name="Building")
print(building)
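A minimal sketch of such a function, assuming that a room keeps its last reading until a new one arrives and counts as 0 before its first reading. The name `combine_counts` is hypothetical. It aligns the series on the union of their timestamps with `pd.concat`, forward-fills each column, and sums across columns:

```python
import pandas as pd

def combine_counts(*series: pd.Series, name: str = "Building") -> pd.Series:
    """Sum several count series on the union of their timestamps."""
    # One column per room, one row per timestamp seen in any room.
    frame = pd.concat(list(series), axis=1).sort_index()
    # Carry each room's last known count forward; a room with no reading
    # yet is treated as 0 (assumption).
    total = frame.ffill().fillna(0).sum(axis=1).astype("int64")
    return total.rename(name)

room1 = pd.Series([6, 7, 8, 10, 8], index=pd.to_datetime([
    '2023-08-11T17:00:44', '2023-08-11T17:06:47', '2023-08-11T17:06:49',
    '2023-08-11T17:07:00', '2023-08-11T17:07:20']), name="Room 1")
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=pd.to_datetime([
    '2023-08-11T17:06:45', '2023-08-11T17:06:46', '2023-08-11T17:06:47',
    '2023-08-11T17:07:02', '2023-08-11T17:07:10', '2023-08-11T17:07:30']), name="Room 2")

building_sum = combine_counts(room1, room2)
print(building_sum)
```

Under these assumptions the final timestamp (17:07:30) comes out as 8 + 2 = 10, because Room 1's last reading is 8. The same function accepts more than two rooms, so the pairwise aggregation step would not be needed.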
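Edit:
I implemented the suggestion below (thanks @mozway), but I have a problem with the index: I want it to be a DatetimeIndex so that I can call .strftime() on the index values when exporting to JSON. However, I get a
ValueError: cannot reindex on an axis with duplicate labels
Here is the code: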
count = pd.concat([count1, count2], axis=1).ffill().sum(axis=1).astype('int64')
count.index = pd.to_datetime(count.index)
And here are the series passed in and the series produced:
count1:
Series([], dtype: object)
count2:
2023-08-11 17:06:47.079497+00:00 5
2023-08-11 17:07:10.101966+00:00 3
2023-08-11 17:08:19.128688+00:00 5
2023-08-11 17:08:48.139546+00:00 2
2023-08-11 17:09:18.160378+00:00 6
2023-08-11 17:09:54.197841+00:00 2
2023-08-11 17:10:04.213910+00:00 5
2023-08-11 17:12:01.281620+00:00 5
2023-08-11 17:13:07.305747+00:00 2
2023-08-12 05:44:03.925516+00:00 1
2023-08-12 05:44:26.918318+00:00 8
2023-08-12 05:44:53.931560+00:00 2
2023-08-12 05:45:18.957140+00:00 8
2023-08-12 05:45:36.968685+00:00 7
2023-08-12 05:45:53.976605+00:00 1
2023-08-12 05:46:14.982210+00:00 1
2023-08-12 05:46:28.989177+00:00 1
2023-08-12 05:46:53.016045+00:00 7
2023-08-12 05:48:30.037841+00:00 6
2023-08-14 06:51:29.096539+00:00 10
2023-08-14 06:53:03.127933+00:00 7
2023-08-14 06:53:49.153529+00:00 5
2023-08-14 06:54:31.169191+00:00 3
2023-08-14 06:54:56.184129+00:00 2
2023-08-14 06:55:20.191304+00:00 2
2023-08-14 06:56:19.219434+00:00 8
2023-08-14 06:57:00.247351+00:00 1
2023-08-14 07:42:37.251053+00:00 2
Name: totalHuman, dtype: int64
Output:
2023-08-11 17:06:47.079497+00:00 5
2023-08-11 17:07:10.101966+00:00 3
2023-08-11 17:08:19.128688+00:00 5
2023-08-11 17:08:48.139546+00:00 2
2023-08-11 17:09:18.160378+00:00 6
2023-08-11 17:09:54.197841+00:00 2
2023-08-11 17:10:04.213910+00:00 5
2023-08-11 17:12:01.281620+00:00 5
2023-08-11 17:13:07.305747+00:00 2
2023-08-12 05:44:03.925516+00:00 1
2023-08-12 05:44:26.918318+00:00 8
2023-08-12 05:44:53.931560+00:00 2
2023-08-12 05:45:18.957140+00:00 8
2023-08-12 05:45:36.968685+00:00 7
2023-08-12 05:45:53.976605+00:00 1
2023-08-12 05:46:14.982210+00:00 1
2023-08-12 05:46:28.989177+00:00 1
2023-08-12 05:46:53.016045+00:00 7
2023-08-12 05:48:30.037841+00:00 6
2023-08-14 06:51:29.096539+00:00 10
2023-08-14 06:53:03.127933+00:00 7
2023-08-14 06:53:49.153529+00:00 5
2023-08-14 06:54:31.169191+00:00 3
2023-08-14 06:54:56.184129+00:00 2
2023-08-14 06:55:20.191304+00:00 2
2023-08-14 06:56:19.219434+00:00 8
2023-08-14 06:57:00.247351+00:00 1
2023-08-14 07:42:37.251053+00:00 2
dtype: int64
and
count.index = pd.to_datetime(count.index)
raises a
ValueError: cannot reindex on an axis with duplicate labels
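One thing I could try, sketched below under assumptions: normalise each series before concatenating, so that an empty object-dtype series gets a numeric dtype and a DatetimeIndex, and any repeated timestamp is reduced to its last reading. Whether duplicates are really the cause here is a guess, since none are visible in the printed data. The helper name `prepare`, the choice of UTC, and the duplicated row in the sample are all hypothetical:

```python
import pandas as pd

def prepare(counts: pd.Series) -> pd.Series:
    """Hypothetical helper: normalise one count series before merging."""
    # Cast to float so that an empty object-dtype series becomes numeric.
    out = counts.astype("float64")
    # Force a timezone-aware DatetimeIndex (UTC is an assumption).
    out.index = pd.to_datetime(out.index, utc=True)
    # Keep only the last reading for any repeated timestamp.
    out = out[~out.index.duplicated(keep="last")]
    return out.sort_index()

count1 = pd.Series([], dtype=object)
count2 = pd.Series([5, 3, 4], index=pd.to_datetime([
    "2023-08-11 17:06:47.079497+00:00",
    "2023-08-11 17:07:10.101966+00:00",
    "2023-08-11 17:07:10.101966+00:00",  # made-up duplicate, for illustration only
]), name="totalHuman")

count = (
    pd.concat([prepare(count1), prepare(count2)], axis=1)
    .sort_index()
    .ffill()
    .sum(axis=1)
    .astype("int64")
)
# The index is already a DatetimeIndex, so .strftime() is available directly.
labels = count.index.strftime("%Y-%m-%dT%H:%M:%S")
print(count)
print(list(labels))
```

Because the index is converted on each input before `pd.concat`, the separate `count.index = pd.to_datetime(count.index)` step is no longer needed in this sketch.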