>>> n = 3
>>> x = range(n ** 2)
>>> xn = list(zip(*[iter(x)] * n))

PEP 618 gives this example of how zip can be used to chunk data into equal-sized groups.

How does it work?

I think it relies on an implementation detail of zip. The list [iter(x)] * n holds n references to the same iterator, so when zip takes one element from each entry in turn, it actually takes the first n elements of x, because every call advances the state of that shared iterator.

I believe this because the following code reproduces the behavior above:

n = 3
x = range(n ** 2)
xn = [iter(x)] * n

res = []

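while True:
    try:
        col = []
        for element in xn:
            # Every entry in xn is the same iterator, so each next() call advances it.
            col.append(next(element))
        res.append(col)
    except StopIteration:
        # The shared iterator is exhausted; a partially filled col is discarded.
        break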

However, I would like to confirm that this is indeed what happens, and that it is reliable behavior that can be used to chunk the elements of an iterable.

Recommended answer

It's not really specific to zip, but you basically have that right. In effect, it's zipping 3 references to the same iterator, causing it to round-robin between them. Each time zip pulls a value through one of the references, one more element is consumed from the shared iterator, so every output tuple holds consecutive elements.

Effectively, it's the same as doing this:

>>> n = 3
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
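
To see why the shared reference matters, here is a small sketch contrasting one shared iterator with n independent ones. The variable names (iters, shared, independent) are only illustrative and do not come from the PEP.

```python
n = 3
x = range(n ** 2)

# List multiplication copies the reference, not the iterator itself,
# so all n entries are the very same object.
iters = [iter(x)] * n
same_iterator = all(it is iters[0] for it in iters)

# One shared iterator: each tuple takes n consecutive items.
shared = list(zip(*iters))

# n independent iterators: each tuple repeats a single item n times.
independent = list(zip(*[iter(x) for _ in range(n)]))

print(same_iterator)    # True
print(shared)           # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(independent[:2])  # [(0, 0, 0), (1, 1, 1)]
```

The chunking only happens in the first case, because every position in the tuple draws from the same underlying state.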

Note that it only produces equal-sized groups and may drop trailing elements. That part is a characteristic of zip, which stops at the shortest iterable; use itertools.zip_longest if you want to keep the leftovers. Here range(16) is grouped in threes, so 15 is dropped:

>>> n = 4
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
