Python: how does numpy.unique eliminate duplicate columns?

Posted on March 27

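I can't quite understand how NumPy's unique function works on multi-dimensional arrays. More precisely, I can't follow how the documentation for unique describes its operation when the axis parameter is used: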

https://numpy.org/doc/stable/reference/generated/numpy.unique.html

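When an axis is specified the subarrays indexed by the axis are sorted. This is done by making the specified axis the first dimension of the array (move the axis to the first dimension to keep the order of the other axes) and then flattening the subarrays in C order. The flattened subarrays are then viewed as a structured type with each element given a label, with the effect that we end up with a 1-D array of structured types that can be treated in the same way as any other 1-D array. The result is that the flattened subarrays are sorted in lexicographic order starting with the first element.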

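I have read the paragraph above many times, but unfortunately I still don't have a clear picture of the process, especially the parts I highlighted in the quoted passage. For example, suppose we have the following 2-D array: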

import numpy as np

myarray = np.array(
    [
        [ 1,  3,  7,  8,  3], 
        [-5,  0,  9,  2,  0], 
        [10, 11, 12, 85, 11]
    ]
)

As you can see, the 2nd and 5th columns, which both contain the values {3, 0, 11}, are duplicates. To remove the duplicate columns with numpy.unique, I run:

np.unique(myarray, axis=1)

which gives the expected result:

array([[ 1,  3,  7,  8],
       [-5,  0,  9,  2],
       [10, 11, 12, 85]])

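The 5th column is indeed removed, as expected, because it duplicates the 2nd column. So visually the result makes sense. However, when I read the documentation quoted above and try to follow it (move the selected axis to be the first dimension of the array, then flatten the resulting subarrays), I can't work out exactly how NumPy takes the array apart and reassembles it to arrive at the final result.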

Could you walk through, step by step and in terms of the documentation above, how NumPy arrives at this result?

In [104]: orig_shape, orig_dtype = ar.shape, ar.dtype
     ...: ar = ar.reshape(orig_shape[0], np.prod(orig_shape[1:], dtype=np.intp))
     ...: ar = np.ascontiguousarray(ar)

In [105]: ar
Out[105]:
array([[ 1, -5, 10],
       [ 3,  0, 11],
       [ 7,  9, 12],
       [ 8,  2, 85],
       [ 3,  0, 11]])
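The session above starts with `ar` already defined. Presumably it is the question's `myarray` with the chosen axis moved to the front, as the documentation describes. A minimal sketch of that first step, assuming `np.moveaxis` is used for the move (for a 2-D array this is just a transpose):

```python
import numpy as np

myarray = np.array([[1, 3, 7, 8, 3],
                    [-5, 0, 9, 2, 0],
                    [10, 11, 12, 85, 11]])

# Make axis 1 (the columns) the first dimension, so each "row" of `ar`
# is one column of the original array.
ar = np.moveaxis(myarray, 1, 0)

# Flatten everything after the first axis in C order. For 2-D input this
# changes nothing, but it matters for arrays with 3 or more dimensions.
ar = np.ascontiguousarray(ar.reshape(ar.shape[0], -1))

print(ar.shape)
print(ar)
```

Each of the five rows of `ar` is now one of the five columns of `myarray`, which is why the duplicate `[3, 0, 11]` appears twice.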

In [106]: dtype = [('f{i}'.format(i=i), ar.dtype) for i in range(ar.shape[1])]; dtype
Out[106]: [('f0', dtype('int32')), ('f1', dtype('int32')), ('f2', dtype('int32'))]

In [107]: consolidated = ar.view(dtype); consolidated
Out[107]:
array([[(1, -5, 10)],
       [(3, 0, 11)],
       [(7, 9, 12)],
       [(8, 2, 85)],
       [(3, 0, 11)]],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
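Once every flattened subarray is a single structured record, the problem is one-dimensional. The sketch below (an illustration, not necessarily the exact internal code path of your NumPy version) calls `np.unique` on the records directly; `records` and `first_idx` are names chosen here for the example:

```python
import numpy as np

# The transposed, contiguous array from the previous step.
ar = np.ascontiguousarray(np.array([[1, -5, 10],
                                    [3, 0, 11],
                                    [7, 9, 12],
                                    [8, 2, 85],
                                    [3, 0, 11]]))

# One field per element of a flattened subarray: f0, f1, f2.
dtype = [(f'f{i}', ar.dtype) for i in range(ar.shape[1])]

# Viewing with the structured dtype gives shape (5, 1); ravel to a 1-D
# array holding one record per original column.
records = ar.view(dtype).ravel()

# Records are compared field by field (f0 first, then f1, then f2),
# i.e. in lexicographic order, so the ordinary 1-D unique applies.
uniq, first_idx = np.unique(records, return_index=True)

print(uniq)
print(first_idx)
```

The two identical records `(3, 0, 11)` collapse into one, leaving four unique records.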

In [121]: x = [f'{row}' for row in ar]; x
Out[121]: ['[ 1 -5 10]', '[ 3  0 11]', '[ 7  9 12]', '[ 8  2 85]', '[ 3  0 11]']

In [122]: x.sort(); x
Out[122]: ['[ 1 -5 10]', '[ 3  0 11]', '[ 3  0 11]', '[ 7  9 12]', '[ 8  2 85]']
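The string sort above is only an analogy: structured records are compared numerically, field by field, which does not always give the same order as sorting their printed form. Putting the documented steps together, a rough end-to-end sketch might look like the following. `unique_along_axis` is a hypothetical helper written for this explanation, not a NumPy function, and it assumes a non-empty array with a plain numeric dtype:

```python
import numpy as np

def unique_along_axis(a, axis):
    """Hypothetical helper following the documented steps of np.unique(a, axis=axis)."""
    # 1. Move the chosen axis to the front.
    ar = np.moveaxis(a, axis, 0)
    orig_shape, orig_dtype = ar.shape, ar.dtype
    # 2. Flatten each subarray in C order.
    ar = np.ascontiguousarray(ar.reshape(orig_shape[0], -1))
    # 3. View each flattened subarray as one structured record.
    dtype = [(f'f{i}', ar.dtype) for i in range(ar.shape[1])]
    records = ar.view(dtype).ravel()
    # 4. Ordinary 1-D unique: sorts lexicographically and drops duplicates.
    uniq = np.unique(records)
    # 5. Undo the view and the reshape, then move the axis back.
    out = uniq.view(orig_dtype).reshape(-1, *orig_shape[1:])
    return np.moveaxis(out, 0, axis)

myarray = np.array([[1, 3, 7, 8, 3],
                    [-5, 0, 9, 2, 0],
                    [10, 11, 12, 85, 11]])

result = unique_along_axis(myarray, axis=1)
print(result)
print(np.array_equal(result, np.unique(myarray, axis=1)))
```

For the example array the sketch and `np.unique(myarray, axis=1)` agree. Note that the surviving columns come back in sorted order, which here happens to match their original order.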
