Python: Summing two Pandas Series that contain count events and have different rows

Posted on August 14

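I'm trying to merge two Pandas Series that have different datetimes, but I'm having trouble getting the right values in the final merge. I've seen posts that keep the two series in a DataFrame, but I want to return a single Series containing the sum of the two.

Background: I have Pandas Series holding counts of people detected in rooms, and I want to merge them into a count for the building (which contains 2 rooms in the example below). I can do this by aggregating rooms together until all rooms are merged into one (which is then the building).

I feel like I have to sort the series and then compare the two row by row to get the correct count. So far I've used the zip() function to iterate row by row over the (sorted) series, but I suspect there's a better way. Any ideas?

Here is some code: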

import pandas as pd

# Input data: one count series per room
room1_idx = pd.to_datetime([
    '2023-08-11T17:00:44',  # 6 people counted
    '2023-08-11T17:06:47',  # 7 people counted
    '2023-08-11T17:06:49',  # 8 people counted
    '2023-08-11T17:07:00',  # 10 people counted
    '2023-08-11T17:07:20',  # 8 people counted
    ])
room1 = pd.Series([6, 7, 8, 10, 8], index=room1_idx, name="Room 1")

room2_idx = pd.to_datetime([
    '2023-08-11T17:06:45',  # 1 person counted
    '2023-08-11T17:06:46',  # 4 people counted
    '2023-08-11T17:06:47',  # 5 people counted
    '2023-08-11T17:07:02',  # 10 people counted
    '2023-08-11T17:07:10',  # 7 people counted
    '2023-08-11T17:07:30',  # 2 people counted
    ])
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=room2_idx, name="Room 2")

print(room1)
print(room2)

What I'd like is a function that outputs the following:

building_idx = pd.to_datetime([
    '2023-08-11 17:00:44',  # 6+0 people counted
    '2023-08-11 17:06:45',  # 6+1 people counted
    '2023-08-11 17:06:46',  # 6+4 people counted
    '2023-08-11 17:06:47',  # 7+5 people counted
    '2023-08-11 17:06:49',  # 8+5 people counted
    '2023-08-11 17:07:00',  # 10+5 people counted
    '2023-08-11 17:07:02',  # 10+10 people counted
    '2023-08-11 17:07:10',  # 10+7 people counted
    '2023-08-11 17:07:20',  # 8+7 people counted
    '2023-08-11 17:07:30',  # 8+2 people counted
    ])
building = pd.Series([6, 7, 10, 12, 13, 15, 20, 17, 15, 10], index=building_idx, name="Building")
print(building)
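For reference, here is a minimal sketch of the concat-and-forward-fill idea applied to the two rooms above. It assumes each count stays valid until the next reading in the same room, and that a room with no reading yet counts as 0. The function name `sum_counts` is made up for illustration.

```python
import pandas as pd

room1 = pd.Series(
    [6, 7, 8, 10, 8],
    index=pd.to_datetime([
        '2023-08-11T17:00:44', '2023-08-11T17:06:47', '2023-08-11T17:06:49',
        '2023-08-11T17:07:00', '2023-08-11T17:07:20',
    ]),
    name="Room 1",
)
room2 = pd.Series(
    [1, 4, 5, 10, 7, 2],
    index=pd.to_datetime([
        '2023-08-11T17:06:45', '2023-08-11T17:06:46', '2023-08-11T17:06:47',
        '2023-08-11T17:07:02', '2023-08-11T17:07:10', '2023-08-11T17:07:30',
    ]),
    name="Room 2",
)


def sum_counts(a: pd.Series, b: pd.Series, name: str = "Building") -> pd.Series:
    """Sum two step-wise count series on the union of their timestamps.

    Hypothetical helper: assumes a count holds until the next reading.
    """
    # Align both series on the union of timestamps, one column per room.
    both = pd.concat([a, b], axis=1).sort_index()
    # Carry each room's last known count forward; sum() skips the NaN
    # that remains before a room's first reading, so it counts as 0.
    return both.ffill().sum(axis=1).astype("int64").rename(name)


building = sum_counts(room1, room2)
print(building)
```

With this data the last row works out to 8 + 2 = 10, because the latest Room 1 reading (17:07:20) is 8.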

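Edit:

I implemented the suggestion below (thanks @mozway), but I have a problem with the index (I want it as a DateTime so that I can use .strftime() on the index values for export to JSON). However, I get a

ValueError: cannot reindex on an axis with duplicate labels

Here is the code: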

count = pd.concat([count1, count2], axis=1).ffill().sum(axis=1).astype('int64')
count.index = pd.to_datetime(count.index)

And the incoming and resulting series:

count1: 
Series([], dtype: object)

count2: 
2023-08-11 17:06:47.079497+00:00     5
2023-08-11 17:07:10.101966+00:00     3
2023-08-11 17:08:19.128688+00:00     5
2023-08-11 17:08:48.139546+00:00     2
2023-08-11 17:09:18.160378+00:00     6
2023-08-11 17:09:54.197841+00:00     2
2023-08-11 17:10:04.213910+00:00     5
2023-08-11 17:12:01.281620+00:00     5
2023-08-11 17:13:07.305747+00:00     2
2023-08-12 05:44:03.925516+00:00     1
2023-08-12 05:44:26.918318+00:00     8
2023-08-12 05:44:53.931560+00:00     2
2023-08-12 05:45:18.957140+00:00     8
2023-08-12 05:45:36.968685+00:00     7
2023-08-12 05:45:53.976605+00:00     1
2023-08-12 05:46:14.982210+00:00     1
2023-08-12 05:46:28.989177+00:00     1
2023-08-12 05:46:53.016045+00:00     7
2023-08-12 05:48:30.037841+00:00     6
2023-08-14 06:51:29.096539+00:00    10
2023-08-14 06:53:03.127933+00:00     7
2023-08-14 06:53:49.153529+00:00     5
2023-08-14 06:54:31.169191+00:00     3
2023-08-14 06:54:56.184129+00:00     2
2023-08-14 06:55:20.191304+00:00     2
2023-08-14 06:56:19.219434+00:00     8
2023-08-14 06:57:00.247351+00:00     1
2023-08-14 07:42:37.251053+00:00     2
Name: totalHuman, dtype: int64

Output:

2023-08-11 17:06:47.079497+00:00     5
2023-08-11 17:07:10.101966+00:00     3
2023-08-11 17:08:19.128688+00:00     5
2023-08-11 17:08:48.139546+00:00     2
2023-08-11 17:09:18.160378+00:00     6
2023-08-11 17:09:54.197841+00:00     2
2023-08-11 17:10:04.213910+00:00     5
2023-08-11 17:12:01.281620+00:00     5
2023-08-11 17:13:07.305747+00:00     2
2023-08-12 05:44:03.925516+00:00     1
2023-08-12 05:44:26.918318+00:00     8
2023-08-12 05:44:53.931560+00:00     2
2023-08-12 05:45:18.957140+00:00     8
2023-08-12 05:45:36.968685+00:00     7
2023-08-12 05:45:53.976605+00:00     1
2023-08-12 05:46:14.982210+00:00     1
2023-08-12 05:46:28.989177+00:00     1
2023-08-12 05:46:53.016045+00:00     7
2023-08-12 05:48:30.037841+00:00     6
2023-08-14 06:51:29.096539+00:00    10
2023-08-14 06:53:03.127933+00:00     7
2023-08-14 06:53:49.153529+00:00     5
2023-08-14 06:54:31.169191+00:00     3
2023-08-14 06:54:56.184129+00:00     2
2023-08-14 06:55:20.191304+00:00     2
2023-08-14 06:56:19.219434+00:00     8
2023-08-14 06:57:00.247351+00:00     1
2023-08-14 07:42:37.251053+00:00     2
dtype: int64

and

count.index = pd.to_datetime(count.index)

raises a

ValueError: cannot reindex on an axis with duplicate labels
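The data shown above doesn't pin down where the duplicate labels come from, so the following is only a guess at a more defensive version. It assumes the error is related to repeated timestamps in one of the inputs, or to the empty `count1` (dtype object, no datetime index) being mixed into the concat. The helper name `combine_counts` and the "last reading wins" rule for repeated timestamps are assumptions for illustration.

```python
import json

import pandas as pd


def combine_counts(*counts: pd.Series) -> pd.Series:
    """Sum step-wise count series, tolerating empty inputs and repeated
    timestamps (hypothetical helper)."""
    cleaned = []
    for s in counts:
        if s.empty:
            # An empty series adds nothing and may not carry a datetime index.
            continue
        # Assumption: when a timestamp repeats, the last reading wins.
        cleaned.append(s[~s.index.duplicated(keep="last")])
    if not cleaned:
        return pd.Series(dtype="int64")
    aligned = pd.concat(cleaned, axis=1).sort_index()
    return aligned.ffill().sum(axis=1).astype("int64")


# Small made-up inputs: an empty series and one with a repeated timestamp.
count1 = pd.Series([], dtype=object)
count2 = pd.Series(
    [5, 6, 3],
    index=pd.to_datetime(
        ['2023-08-11T17:06:47', '2023-08-11T17:06:47', '2023-08-11T17:07:10'],
        utc=True,
    ),
)

total = combine_counts(count1, count2)

# The index is still a DatetimeIndex, so .strftime() is available for export.
labels = total.index.strftime('%Y-%m-%dT%H:%M:%S')
payload = json.dumps(dict(zip(labels, total.tolist())))
print(payload)
```

Skipping the empty series keeps the result's index as a DatetimeIndex, so the separate `pd.to_datetime(count.index)` step may not be needed at all.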
