Python: converting an np.ndarray of tuples (dtype=object) to an array with dtype=int

Posted on September 13

I need to convert (short) NumPy arrays of tuples into NumPy arrays of ints.

The most obvious approach does not work:

import numpy as np

# array_of_tuples is given, this is just an example:
array_of_tuples = np.zeros(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)

np.array(array_of_tuples, dtype=int)

ValueError: setting an array element with a sequence.

Recommended answer

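It looks like placing the tuples into a pre-allocated buffer of fixed size and dtype is the way to go. It seems to avoid much of the overhead associated with working out the size, raggedness and dtype.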
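The answer text here does not include code for that approach, so the following is only a minimal sketch of what filling a pre-allocated buffer could look like, using the two-tuple example from the question. It assumes every tuple has the same length; the loop variable `row` is an illustrative, more readable spelling of the loop that is timed in the benchmark.

```python
import numpy as np

# Example input from the question: a 1-D object array holding tuples.
array_of_tuples = np.empty(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)

# Pre-allocate the output with a fixed shape and dtype
# (assumption: all tuples share the length of the first one).
res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), dtype=int)

# Copy each tuple into its row; NumPy converts the tuple to ints on assignment.
for i, row in enumerate(array_of_tuples):
    res[i] = row

print(res)
```

If the tuples had different lengths, the row assignment would raise an error instead of silently producing a ragged result.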

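Here are some slower alternatives and a benchmark:

You can cheat and create a dtype with the required number of fields, since NumPy supports converting tuples to custom (structured) dtypes: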

 dt = np.dtype([('', int) for _ in range(len(array_of_tuples[0]))])
 res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
 res.view(dt).ravel()[:] = array_of_tuples
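To make the mechanics of the view trick easier to see, here is a small sketch on the two-tuple example from the question. It assigns the records one at a time instead of in bulk like the snippet above, purely for illustration; the names `n`, `m` and `records` are not from the original answer.

```python
import numpy as np

# Small stand-in for the array from the question (two tuples of length 2).
array_of_tuples = np.empty(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)

n, m = len(array_of_tuples), len(array_of_tuples[0])

# Fields declared with an empty name get default names f0, f1, ...
dt = np.dtype([('', int) for _ in range(m)])

res = np.zeros((n, m), dtype=int)

# Viewing the (n, m) int array as records gives shape (n, 1);
# ravel() flattens that to one record per row, still sharing res's memory.
records = res.view(dt).ravel()

# Writing a tuple into a record fills the matching row of res.
for i, tup in enumerate(array_of_tuples):
    records[i] = tup

print(dt.names)
print(res)
```

Because `records` is a view rather than a copy, nothing has to be copied back into `res` afterwards.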

You can stack the array:
```
 np.stack(array_of_tuples, axis=0)
```
Unfortunately, this is even slower than the other proposed methods.

Pre-allocating the output does not help much:

 res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
 np.stack(array_of_tuples, out=res, axis=0)

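Trying to cheat with np.concatenate, which lets you specify the output dtype (the dtype argument requires NumPy 1.20 or later), does not help much either: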

 np.concatenate(array_of_tuples, dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))

Neither does pre-allocating the array:

 res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
 np.concatenate(array_of_tuples, out=res.ravel())

You can also try to do the concatenation in Python space, which is slow too:

 np.array(sum(array_of_tuples, start=()), dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))

Or

 np.reshape(np.sum(array_of_tuples), (len(array_of_tuples), len(array_of_tuples[0])))
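Before comparing speed, it can be worth confirming that the alternatives actually agree. The sketch below checks three of them on the small example from the question; the `candidates` dictionary and the `expected` array are illustrative names, not part of the original answer.

```python
import numpy as np

array_of_tuples = np.empty(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)
shape = (len(array_of_tuples), len(array_of_tuples[0]))

expected = np.array([[1, 2], [2, 3]])

candidates = {
    # Convert to a list of tuples first, then let NumPy infer the shape.
    "tolist": np.array(array_of_tuples.tolist()),
    # Treat each tuple as a row and stack the rows.
    "stack": np.stack(array_of_tuples, axis=0),
    # Concatenate the tuples in Python, then reshape
    # (the start= keyword of sum needs Python 3.8 or later).
    "python_sum": np.array(sum(array_of_tuples, start=()), dtype=int).reshape(shape),
}

for name, result in candidates.items():
    assert np.array_equal(result, expected), name
    print(name, result.dtype, result.shape)
```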

Benchmark, run on a 100-element object array holding tuples of length 100:

array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))

%%timeit
res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
for i, res[i] in enumerate(array_of_tuples):
    pass
305 µs ± 8.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

dt = np.dtype([('', 'int',) for _ in range(100)])
%%timeit
res = np.empty((100, 100), int)
res.view(dt).ravel()[:] = array_of_tuples
334 µs ± 5.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(array_of_tuples.tolist())
478 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.concatenate(array_of_tuples, out=res.ravel())
500 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.concatenate(array_of_tuples, dtype=int).reshape(100, 100)
504 µs ± 7.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.stack(array_of_tuples, out=res, axis=0)
557 µs ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.stack(array_of_tuples, axis=0)
577 µs ± 6.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(sum(array_of_tuples, start=()), dtype=int).reshape(100, 100)
1.06 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.reshape(np.sum(array_of_tuples), (100, 100))
1.26 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
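The timings above rely on IPython's `%timeit` magic. Outside IPython, a comparable measurement could be sketched with the standard `timeit` module, as below. Only two of the approaches are included, the helper names `fill_preallocated` and `via_tolist` are made up for this example, and the numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

# Same benchmark input as above: 100 tuples of length 100.
array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))


def fill_preallocated():
    """Copy each tuple into a row of a pre-allocated int array."""
    res = np.empty((100, 100), dtype=int)
    for i, row in enumerate(array_of_tuples):
        res[i] = row
    return res


def via_tolist():
    """Convert to a list of tuples and let NumPy build the array."""
    return np.array(array_of_tuples.tolist())


timings = {}
for func in (fill_preallocated, via_tolist):
    # Best of 3 repeats, 100 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3))
    timings[func.__name__] = best / 100
    print(f"{func.__name__}: {timings[func.__name__] * 1e6:.1f} µs per call")
```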
