I have some dummy data that looks like this:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
...
more rows with different dates
...

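Suppose this data is stored in a file called mwe.csv. Using polars, I now want to compute the mean of the second column over one-hour windows. I would like to use group_by_dynamic to get one data point every 10 minutes. When I run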

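import polars as pl

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(pl.Datetime))
    .sort("datetime")  # group_by_dynamic needs a sorted index column
    .group_by_dynamic(
        index_column="datetime",
        every="10m",  # a new window starts every 10 minutes
        period="1h",  # each window spans one hour
    )
    .agg(pl.col("duration_in_traffic_s").mean())
)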

the result looks like this: [image: query output]

However, I don't want the mean to take the date into account, only the time of day. For example, 2023-12-20 10:40 and 2023-12-21 10:40 should fall into the same bin.

I hoped that adding .with_columns(pl.col("datetime").dt.time()) to the pipeline would help, but group_by_dynamic does not work on time data.

I could manually compute the time of day as a float column (in hours):

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(dtype=pl.Datetime))
    .with_columns(
        t=pl.col("datetime").dt.hour().cast(pl.Float64)
        + pl.col("datetime").dt.minute().cast(pl.Float64) / 60
        + pl.col("datetime").dt.second().cast(pl.Float64) / 60 / 60
    )
).sort("t")

but then I'm not sure how to do the grouping. Also, I do like the time format, so I would prefer to keep it.

Is there a way to do the dynamic grouping on the time data only, ignoring the date?

Here is the full mwe.csv file:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
2023-12-20T11:12:42.347660000,206.0
2023-12-20T11:17:40.084821000,200.0
2023-12-20T11:31:14.957092000,222.0
2023-12-20T11:46:08.886872000,209.0
2023-12-20T12:00:02.024328000,198.0
2023-12-20T12:15:01.910446000,251.0
2023-12-20T12:30:01.447496000,229.0
2023-12-20T12:45:02.761839000,206.0
2023-12-20T14:00:01.456811000,262.0
2023-12-20T14:15:01.718898000,226.0
2023-12-20T14:30:02.452185000,194.0
2023-12-20T14:45:01.717522000,191.0
2023-12-20T14:49:10.150735000,196.0
2023-12-20T14:50:55.800417000,194.0
2023-12-20T14:57:05.230577000,202.0
2023-12-20T14:59:23.005408000,192.0
2023-12-20T15:00:01.316240000,193.0
2023-12-20T15:00:14.842233000,193.33333333333334
2023-12-20T15:00:49.370172000,193.66666666666666
2023-12-20T15:01:06.300133000,193.66666666666666
2023-12-20T15:15:01.943587000,183.0
2023-12-20T15:20:01.567126000,184.0
2023-12-20T15:30:01.784686000,197.0
2023-12-20T15:40:02.468132000,188.0
2023-12-20T15:50:01.968746000,226.0
2023-12-20T16:00:01.864652000,233.0
2023-12-20T16:10:01.185016000,213.0
2023-12-20T16:20:01.544796000,252.0
2023-12-20T16:30:01.621331000,224.0
2023-12-20T16:40:03.567996000,228.0
2023-12-20T16:50:01.014911000,220.0
2023-12-20T17:00:01.723306000,232.0
2023-12-20T17:10:02.490695000,215.0
2023-12-20T17:20:01.844304000,214.0
2023-12-20T17:30:02.147457000,204.0
2023-12-20T17:40:02.217333000,198.0
2023-12-20T17:50:01.741479000,193.0
2023-12-20T18:00:01.665714000,193.0
2023-12-20T18:10:02.334926000,182.0
2023-12-20T18:26:43.135849000,185.0
2023-12-20T18:30:02.434296000,184.0
2023-12-20T18:32:41.033250000,175.0
2023-12-20T18:40:02.941171000,176.0
2023-12-20T19:36:47.313925000,175.0
2023-12-20T19:40:01.895983000,171.0
2023-12-20T19:50:02.049567000,167.0
2023-12-20T20:00:08.284378000,166.0
2023-12-20T20:10:02.727202000,166.0
2023-12-20T20:40:02.407489000,161.0
2023-12-20T21:10:02.100392000,158.0
2023-12-20T21:21:56.063346000,157.0
2023-12-20T21:30:02.005594000,159.0
2023-12-20T21:40:01.915306000,153.0
2023-12-20T21:50:02.318419000,152.0
2023-12-20T22:00:02.369086000,154.0
2023-12-20T22:10:02.704019000,154.0
2023-12-20T22:20:01.968418000,160.0
2023-12-20T22:30:01.965742000,159.0
2023-12-20T22:40:02.718295000,164.0
2023-12-20T22:50:02.347303000,160.0
2023-12-21T05:00:02.595535000,164.0
2023-12-21T05:10:02.642932000,163.0
2023-12-21T05:20:02.390676000,164.0
2023-12-21T05:30:01.971166000,165.0
2023-12-21T05:40:01.874958000,169.0
2023-12-21T05:50:01.806441000,167.0
2023-12-21T06:00:02.396094000,169.0
2023-12-21T06:10:02.350196000,169.0
2023-12-21T06:20:02.041357000,169.0
2023-12-21T06:33:43.895397000,177.0
2023-12-21T07:30:02.240918000,210.0
2023-12-21T07:47:16.654805000,200.0
2023-12-21T07:50:02.960362000,199.0
2023-12-21T08:10:16.746286000,194.0
2023-12-21T08:20:02.218056000,198.0
2023-12-21T08:30:01.729418000,198.0
2023-12-21T08:40:02.345477000,194.0
2023-12-21T08:50:01.464156000,190.0
2023-12-21T09:00:02.476057000,188.0
2023-12-21T09:10:02.130653000,213.0
2023-12-21T09:20:02.364758000,188.0
2023-12-21T09:30:02.499917000,188.0
2023-12-21T09:40:01.911754000,188.0
2023-12-21T09:50:01.885705000,197.0
2023-12-21T10:00:01.633757000,198.0
2023-12-21T10:10:02.531765000,200.0
2023-12-21T10:20:01.685657000,221.0
2023-12-21T10:30:01.567600000,207.0
2023-12-21T10:40:02.279429000,203.0
2023-12-21T10:50:02.548892000,191.0
2023-12-21T11:00:01.622794000,219.0
2023-12-21T11:10:01.435424000,200.0
2023-12-21T11:20:01.849114000,234.0
2023-12-21T11:30:02.391425000,222.0
2023-12-21T11:40:01.796607000,191.0
2023-12-21T11:50:01.776906000,205.0
2023-12-21T12:00:02.485984000,239.0

Recommended answer

You can first use dt.combine to build a column in which all the times fall on the same date.

Then run group_by_dynamic on that column and convert the result back with dt.time:

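# df is mwe.csv read as in the question, with "datetime" cast to pl.Datetime
(
    df
    # Put every time of day on one arbitrary date so different days share bins
    .with_columns(time=pl.date(2024, 1, 1).dt.combine(pl.col("datetime").dt.time()))
    .sort("time")
    .group_by_dynamic("time", every="10m", period="1h")
    .agg(pl.col("duration_in_traffic_s").mean())
    # Drop the placeholder date again, keeping only the time of day
    .with_columns(time=pl.col("time").dt.time())
)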
Out[26]:
shape: (107, 2)
┌──────────┬───────────────────────┐
│ time     ┆ duration_in_traffic_s │
│ ---      ┆ ---                   │
│ time     ┆ f64                   │
╞══════════╪═══════════════════════╡
│ 04:50:00 ┆ 165.0                 │
│ 05:00:00 ┆ 165.333333            │
│ 05:10:00 ┆ 166.166667            │
│ 05:20:00 ┆ 167.166667            │
│ …        ┆ …                     │
│ 22:20:00 ┆ 160.75                │
│ 22:30:00 ┆ 161.0                 │
│ 22:40:00 ┆ 162.0                 │
│ 22:50:00 ┆ 160.0                 │
└──────────┴───────────────────────┘
