Python: Assign unique values to a new column based on conditions

Posted on April 26

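I have a dataset that summarizes car trips, but it does not identify how many unique cars there are. I want to write a loop/if statement that assigns a unique number based on where and when each trip starts, so that I can estimate the number of unique cars.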

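For example, if the dropoff location of the first trip matches the pickup location of the second trip, and the second pickup happens within 2 minutes of the first dropoff, the second trip should get the same car number as the first. If the trips do not match, assign a new number.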
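The matching rule above can be written as a small predicate. This is a sketch under stated assumptions: `same_car` is a hypothetical name, and "within 2 minutes" is read as the next pickup happening 0 to 120 seconds after the previous dropoff, so a pickup that starts before the dropoff never matches.

```python
from datetime import datetime, timedelta


def same_car(prev_dropoff_loc, prev_dropoff_time, next_pickup_loc, next_pickup_time,
             threshold=timedelta(minutes=2)):
    """Hypothetical helper: can the next trip be a continuation of the previous one?

    Assumption: the next pickup must be at the previous dropoff location and
    start 0..threshold after the previous dropoff (inclusive on both ends).
    """
    gap = next_pickup_time - prev_dropoff_time
    return prev_dropoff_loc == next_pickup_loc and timedelta(0) <= gap <= threshold


fmt = "%Y-%m-%d %H:%M:%S"
# Row 1 drops off at B at 21:13:08; row 3 picks up at B at 21:13:45 (37 s later).
print(same_car("B", datetime.strptime("2016-06-09 21:13:08", fmt),
               "B", datetime.strptime("2016-06-09 21:13:45", fmt)))  # True
# Row 3 drops off at C at 21:26:29; row 5 picks up at C earlier, at 21:24:49.
print(same_car("C", datetime.strptime("2016-06-09 21:26:29", fmt),
               "C", datetime.strptime("2016-06-09 21:24:49", fmt)))  # False
```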

I tried a few different approaches but could not get it to work (I am a beginner). Any help would be greatly appreciated. (R or Python)

This is roughly what I have:

Pickup time	Dropoff time	Pickup location	Dropoff location
2016-06-09 21:06:36	2016-06-09 21:13:08	A	B
2016-06-09 21:13:31	2016-06-09 21:23:59	A	C
2016-06-09 21:13:45	2016-06-09 21:26:29	B	C
2016-06-09 21:15:33	2016-06-09 21:44:31	A	B
2016-06-09 21:24:49	2016-06-09 21:39:29	C	D

This is what I want to achieve:

Pickup time	Dropoff time	Pickup location	Dropoff location	Car #
2016-06-09 21:06:36	2016-06-09 21:13:08	A	B	1
2016-06-09 21:13:31	2016-06-09 21:23:59	A	C	2
2016-06-09 21:13:45	2016-06-09 21:26:29	B	C	1
2016-06-09 21:15:33	2016-06-09 21:44:31	A	B	3
2016-06-09 21:24:49	2016-06-09 21:39:29	C	D	2
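Since the question asks for a loop/if approach, here is a minimal pure-Python sketch that walks the trips in pickup order and reuses a car number whenever the 2-minute rule is met. Assumptions, not taken from the question: timestamps are strings in `%Y-%m-%d %H:%M:%S` format, the window is 0 to 120 seconds after the dropoff, each dropoff can be continued by at most one later trip, and if several cars qualify the one with the earliest dropoff is reused. `assign_cars` and the tuple layout are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
THRESHOLD = timedelta(minutes=2)  # assumed: next pickup 0-120 s after the dropoff

# Sample rows from the question: (pickup time, dropoff time, pickup loc, dropoff loc)
trips = [
    ("2016-06-09 21:06:36", "2016-06-09 21:13:08", "A", "B"),
    ("2016-06-09 21:13:31", "2016-06-09 21:23:59", "A", "C"),
    ("2016-06-09 21:13:45", "2016-06-09 21:26:29", "B", "C"),
    ("2016-06-09 21:15:33", "2016-06-09 21:44:31", "A", "B"),
    ("2016-06-09 21:24:49", "2016-06-09 21:39:29", "C", "D"),
]


def assign_cars(rows, threshold=THRESHOLD):
    """Return one car number per row, in the original row order."""
    parsed = [(datetime.strptime(p, FMT), datetime.strptime(d, FMT), pl, dl)
              for p, d, pl, dl in rows]
    order = sorted(range(len(parsed)), key=lambda i: parsed[i][0])  # by pickup time
    cars = [None] * len(parsed)
    last_seen = []  # (dropoff_time, dropoff_loc, car): where each car's latest trip ends
    next_car = 1
    for i in order:
        pickup, dropoff, ploc, dloc = parsed[i]
        # Cars whose latest dropoff is at this pickup location, 0..threshold earlier
        candidates = [k for k, (d_time, d_loc, _) in enumerate(last_seen)
                      if d_loc == ploc and timedelta(0) <= pickup - d_time <= threshold]
        if candidates:
            k = min(candidates, key=lambda c: last_seen[c][0])  # earliest dropoff wins
            cars[i] = last_seen.pop(k)[2]
        else:
            cars[i] = next_car  # no match: this is a new car
            next_car += 1
        last_seen.append((dropoff, dloc, cars[i]))
    return cars


car_ids = assign_cars(trips)
print(car_ids)            # [1, 2, 1, 3, 2]
print(len(set(car_ids)))  # 3 (approximate number of unique cars)
```

Because a car's entry is replaced each time it is reused, chains such as A→B, B→C, C→D keep a single car number.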

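Recommended answer (R, data.table):

    library(data.table)
    library(magrittr)  # provides %>%, which data.table does not export

    # Set threshold (in seconds)
    threshold <- 120

    # Get the car identifier: join each trip's dropoff location to other trips'
    # pickup location, keep pairs whose pickup starts 0-120 s after the dropoff
    result <- melt(
      setDT(df)[, trip := .I][df, on = .(`Dropoff location` = `Pickup location`), nomatch = 0] %>%
        .[between(as.numeric(`i.Pickup time` - `Dropoff time`, units = "secs"), 0, threshold),
          .(trip, i.trip)] %>%
        .[, car := .I],
      id.vars = "car", value.name = "trip"
    )[, variable := NULL][df, on = "trip"]

    # Add any other single-instance cars
    result[is.na(car), car := seq(max(result$car, na.rm = TRUE) + 1,
                                  length.out = result[is.na(car), .N])]
    result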

Output:

          car  trip         Pickup time        Dropoff time Pickup location Dropoff location
        <int> <int>              <POSc>              <POSc>          <char>           <char>
     1:     1     1 2016-06-09 21:06:36 2016-06-09 21:13:08               A                B
     2:     2     2 2016-06-09 21:13:31 2016-06-09 21:23:59               A                C
     3:     1     3 2016-06-09 21:13:45 2016-06-09 21:26:29               B                C
     4:     3     4 2016-06-09 21:15:33 2016-06-09 21:44:31               A                B
     5:     2     5 2016-06-09 21:24:49 2016-06-09 21:39:29               C                D

Input data (dput output):

    df <- structure(list(
      `Pickup time` = structure(c(1465506396, 1465506811, 1465506825, 1465506933, 1465507489),
                                class = c("POSIXct", "POSIXt"), tzone = "UTC"),
      `Dropoff time` = structure(c(1465506788, 1465507439, 1465507589, 1465508671, 1465508369),
                                 class = c("POSIXct", "POSIXt"), tzone = "UTC"),
      `Pickup location` = c("A", "A", "B", "A", "C"),
      `Dropoff location` = c("B", "C", "C", "B", "D")),
      row.names = c(NA, -5L), class = "data.frame")
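The data.table answer gives each matched (trip, next trip) pair its own car number. As far as I can tell, a three-trip chain such as A→B, B→C, C→D would produce two pairs, so the middle trip would appear twice with two different car numbers. The sketch below keeps the same pairwise self-join idea but merges overlapping pairs with a union-find, so a chain ends up as one car. It is an illustration under assumptions: `cars_from_pairs` is a hypothetical name, timestamps are `%Y-%m-%d %H:%M:%S` strings, and the window is 0 to 120 seconds after the dropoff.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def cars_from_pairs(rows, threshold=timedelta(minutes=2)):
    """Self-join trips on dropoff == pickup, then merge connected pairs into cars."""
    parsed = [(datetime.strptime(p, FMT), datetime.strptime(d, FMT), pl, dl)
              for p, d, pl, dl in rows]
    parent = list(range(len(parsed)))  # union-find: every trip starts as its own car

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Pairwise check, analogous to the data.table join plus the between() filter
    for i, (_, drop_t, _, drop_loc) in enumerate(parsed):
        for j, (pick_t, _, pick_loc, _) in enumerate(parsed):
            if i != j and drop_loc == pick_loc and timedelta(0) <= pick_t - drop_t <= threshold:
                parent[find(j)] = find(i)  # trip j continues trip i: same car

    # Number the groups 1, 2, 3, ... in order of first appearance
    labels, cars = {}, []
    for i in range(len(parsed)):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        cars.append(labels[root])
    return cars


# Hypothetical three-trip chain with 60 s gaps between dropoff and next pickup
chain = [
    ("2016-06-09 10:00:00", "2016-06-09 10:10:00", "A", "B"),
    ("2016-06-09 10:11:00", "2016-06-09 10:20:00", "B", "C"),
    ("2016-06-09 10:21:00", "2016-06-09 10:30:00", "C", "D"),
]
print(cars_from_pairs(chain))  # [1, 1, 1]
```

Two trade-offs of this design: the double loop is O(n²), and if one dropoff is followed by two qualifying pickups, both are merged into the same car, which a single physical car cannot do. A greedy loop that lets each dropoff be claimed only once avoids the second problem.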
