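I have a DataFrame with x and y coordinates whose index holds timestamps. Think of it as an object that moves at every time step, so its position is expected to change between consecutive timestamps. If the distance moved does not exceed a certain threshold, however, I treat that step as a potential "waiting" position. I say potential because the data is quite noisy, and a single "waiting" condition is not enough to be sure the object did not move. I therefore need at least 3 consecutive "waiting" conditions before concluding that the object really was stationary.

I want to detect these waiting positions and flag them in a new column.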

Example:
                    x         y
timestamp                       
2023-07-01 00:00:00   1         5
2023-07-01 00:01:00   2         6
2023-07-01 00:02:00   3         7
2023-07-01 00:03:00   4         8
2023-07-01 00:04:00   4         8
2023-07-01 00:05:00   5         9
2023-07-01 00:06:00   6         9
2023-07-01 00:07:00   7        10
2023-07-01 00:08:00   7        10
2023-07-01 00:09:00   7        10
2023-07-01 00:10:00   7        10
2023-07-01 00:11:00   8        11
2023-07-01 00:12:00   9        11

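To compute the distance, I shifted the DataFrame by 1 and calculated the distance between each position and the previous one: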

                    x         y  distance  
timestamp                                                    
2023-07-01 00:00:00   1         5       NaN   
2023-07-01 00:01:00   2         6  1.414214    
2023-07-01 00:02:00   3         7  1.414214   
2023-07-01 00:03:00   4         8  1.414214   
2023-07-01 00:04:00   4         8  0.000000   
2023-07-01 00:05:00   5         9  1.414214  
2023-07-01 00:06:00   6         9  1.000000   
2023-07-01 00:07:00   7        10  1.414214   
2023-07-01 00:08:00   7        10  0.000000   
2023-07-01 00:09:00   7        10  0.000000   
2023-07-01 00:10:00   7        10  0.000000   
2023-07-01 00:11:00   8        11  1.414214   
2023-07-01 00:12:00   9        11  1.000000    
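
The question does not show the code for this step, so here is a minimal sketch of how it could look. It assumes the frame is called `df` and that the distance is Euclidean, which matches the values in the table above:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the name `df` is an assumption).
idx = pd.date_range("2023-07-01 00:00:00", periods=13, freq="min", name="timestamp")
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 7, 8, 9],
        "y": [5, 6, 7, 8, 8, 9, 9, 10, 10, 10, 10, 11, 11],
    },
    index=idx,
)

# diff() is equivalent to subtracting the frame shifted by one row.
step = df[["x", "y"]].diff()

# Euclidean distance to the previous position.
# The first row has no predecessor, so its distance is NaN.
df["distance"] = np.hypot(step["x"], step["y"])

print(df)
```

Using `diff()` instead of an explicit `shift(1)` is only a shorthand; both give the same per-step displacement.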

Now, assuming that a distance of less than 1 marks a potential waiting position:

                    x         y  distance  condition_fulfilled  
timestamp                                                    
2023-07-01 00:00:00   1         5       NaN    NaN    
2023-07-01 00:01:00   2         6  1.414214    False    
2023-07-01 00:02:00   3         7  1.414214    False    
2023-07-01 00:03:00   4         8  1.414214    False    
2023-07-01 00:04:00   4         8  0.000000    True   
2023-07-01 00:05:00   5         9  1.414214    False   
2023-07-01 00:06:00   6         9  1.000000    False   
2023-07-01 00:07:00   7        10  1.414214    False   
2023-07-01 00:08:00   7        10  0.000000    True   
2023-07-01 00:09:00   7        10  0.000000    True   
2023-07-01 00:10:00   7        10  0.000000    True    
2023-07-01 00:11:00   8        11  1.414214    False    
2023-07-01 00:12:00   9        11  1.000000    False    
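
A possible way to produce this column is sketched below. The name `THRESHOLD` is my own, and the distances are copied from the table above. Note that a plain comparison turns the leading NaN into False, so `where()` is used here to keep it as NaN as shown in the table:

```python
import numpy as np
import pandas as pd

# Distances copied from the table above.
distance = pd.Series(
    [np.nan, 1.414214, 1.414214, 1.414214, 0.0, 1.414214, 1.0,
     1.414214, 0.0, 0.0, 0.0, 1.414214, 1.0],
    index=pd.date_range("2023-07-01", periods=13, freq="min", name="timestamp"),
    name="distance",
)

THRESHOLD = 1  # hypothetical name; the question uses "distance less than 1"

# lt() alone yields False for NaN; where() puts NaN back for rows
# that have no distance, so the first row stays undefined.
condition_fulfilled = distance.lt(THRESHOLD).where(distance.notna())
condition_fulfilled.name = "condition_fulfilled"

print(condition_fulfilled)
```

If the first row should simply count as "not waiting", the `where()` call can be dropped and the column stays a clean boolean.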

Since I require at least 3 consecutive fulfilled conditions, the expected output would be:

                    x         y  distance    status  
timestamp                                                    
2023-07-01 00:00:00   1         5       NaN    moving    
2023-07-01 00:01:00   2         6  1.414214    moving    
2023-07-01 00:02:00   3         7  1.414214    moving    
2023-07-01 00:03:00   4         8  1.414214    moving    
2023-07-01 00:04:00   4         8  0.000000    moving   
2023-07-01 00:05:00   5         9  1.414214    moving   
2023-07-01 00:06:00   6         9  1.000000    moving   
2023-07-01 00:07:00   7        10  1.414214    moving   
2023-07-01 00:08:00   7        10  0.000000    waiting   
2023-07-01 00:09:00   7        10  0.000000    waiting   
2023-07-01 00:10:00   7        10  0.000000    waiting    
2023-07-01 00:11:00   8        11  1.414214    moving    
2023-07-01 00:12:00   9        11  1.000000    moving    

Recommended answer

Try:

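# The first row has no previous position, so its flag is NaN; back-fill it from the next row
df['condition_fulfilled'] = df['condition_fulfilled'].bfill()

# Give every run of consecutive identical flags its own group id
tmp = (df['condition_fulfilled'] != df['condition_fulfilled'].shift()).cumsum()
# A run is 'waiting' only if all of its flags are True and it spans at least 3 rows
df['status'] = df.groupby(tmp)['condition_fulfilled'].transform(lambda x: 'waiting' if x.all() and len(x) >= 3 else 'moving')

print(df)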

Prints:

                     x   y  distance  condition_fulfilled   status
timestamp                                                         
2023-07-01 00:00:00  1   5       NaN                False   moving
2023-07-01 00:01:00  2   6  1.414214                False   moving
2023-07-01 00:02:00  3   7  1.414214                False   moving
2023-07-01 00:03:00  4   8  1.414214                False   moving
2023-07-01 00:04:00  4   8  0.000000                 True   moving
2023-07-01 00:05:00  5   9  1.414214                False   moving
2023-07-01 00:06:00  6   9  1.000000                False   moving
2023-07-01 00:07:00  7  10  1.414214                False   moving
2023-07-01 00:08:00  7  10  0.000000                 True  waiting
2023-07-01 00:09:00  7  10  0.000000                 True  waiting
2023-07-01 00:10:00  7  10  0.000000                 True  waiting
2023-07-01 00:11:00  8  11  1.414214                False   moving
2023-07-01 00:12:00  9  11  1.000000                False   moving
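
For reuse, the same run-labelling idea can be wrapped in a function that starts from the raw coordinates. This is a sketch rather than the answer's exact code: the function name `label_waiting` and its parameters `threshold` and `min_run` are my own, and it uses `transform("size")` to get each run's length instead of a lambda:

```python
import numpy as np
import pandas as pd


def label_waiting(df, threshold=1.0, min_run=3):
    """Label rows 'waiting' when they belong to a run of at least
    `min_run` consecutive steps shorter than `threshold`, else 'moving'."""
    step = df[["x", "y"]].diff()
    distance = np.hypot(step["x"], step["y"])
    # NaN < threshold is False, so the first row counts as moving.
    slow = distance < threshold
    # A new run id starts whenever the flag changes value.
    run_id = (slow != slow.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = slow.groupby(run_id).transform("size")
    status = np.where(slow & (run_len >= min_run), "waiting", "moving")
    return pd.Series(status, index=df.index, name="status")


df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 7, 8, 9],
        "y": [5, 6, 7, 8, 8, 9, 9, 10, 10, 10, 10, 11, 11],
    },
    index=pd.date_range("2023-07-01", periods=13, freq="min", name="timestamp"),
)
df["status"] = label_waiting(df)
print(df)
```

With the defaults, only the three rows from 00:08 to 00:10 come out as waiting; the isolated stop at 00:04 stays moving because its run has length 1.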
