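I have a large multidimensional array here, and it contains many NaN values.

I would like to compute the mean or interpolated values along the first axis (5124).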

import numpy as np

data = np.load('data.npy')

mean = np.nanmean(data, axis=1)

Now the shape of mean is (5124, 112) and the shape of data is (5124, 112, 112), so I tried:

data[np.any(np.isnan(data))][-1, :, :, -1] = mean

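But data is still full of NaN values.

I am not sure how to fill the mean values into the data array.

I also tried some interpolation methods, but they were very slow and used a lot of memory, so I wonder whether there is a better way to fill the NaN values.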

Recommended answer

It is a bit slow, but you can try:

data2 = np.nan_to_num(data) + np.isnan(data) * mean[:, None]
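
To see how the line above works, here is a minimal sketch on a small made-up array (the values and the (2, 3, 2) shape are assumptions standing in for the real (5124, 112, 112) data). It also checks why the attempt in the question has no effect: np.any(np.isnan(data)) collapses to a single boolean, so indexing with it returns a copy, and assigning into that copy never reaches data. The np.where form at the end is an equivalent way to write the same fill, not something taken from the original answer.

```python
import numpy as np

# Toy stand-in for the real array; the values are made up for illustration.
data = np.array(
    [[[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]],
     [[np.nan, 2.0], [5.0, np.nan], [7.0, 6.0]]]
)  # shape (2, 3, 2)

# Indexing with a single boolean is advanced indexing: it returns a copy
# with an extra leading axis, so writes to it do not modify `data`.
selected = data[np.any(np.isnan(data))]
print(selected.shape)                    # (1, 2, 3, 2)
print(np.shares_memory(selected, data))  # False

# NaN-ignoring mean along axis 1, shape (2, 2).
mean = np.nanmean(data, axis=1)

# mean[:, None] has shape (2, 1, 2) and broadcasts along axis 1.
# NaNs become 0 via nan_to_num, then the mean is added only where the mask is True.
filled = np.nan_to_num(data) + np.isnan(data) * mean[:, None]

# The same selection written with np.where.
filled_where = np.where(np.isnan(data), mean[:, None], data)

print(np.array_equal(filled, filled_where))  # True
```

Note that np.nan_to_num also replaces infinities with large finite values, so the two forms agree only when data contains no infinities.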

As @Stef suggested, you can use bottleneck to speed this up:

# pip install bottleneck
import bottleneck as bn

mean = bn.nanmean(data, axis=1)
data2 = bn.replace(data, np.nan, 0) + np.isnan(data) * mean[:, None]
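
If memory is the main concern, one possible variant is to fill the NaNs in place rather than building a second full-size array. The sketch below uses only NumPy; the helper name fill_nan_with_axis_mean is hypothetical and this is not part of the original answer. It still allocates a boolean mask the size of data, and a slice that is entirely NaN along the chosen axis stays NaN (NumPy emits a RuntimeWarning for it).

```python
import numpy as np

def fill_nan_with_axis_mean(arr, axis=1):
    """Fill NaNs in `arr` in place with the NaN-ignoring mean along `axis`.

    Hypothetical helper for illustration; returns the same array object.
    """
    # keepdims=True keeps the reduced axis with length 1, so the mean
    # broadcasts against `arr` without manual indexing.
    mean = np.nanmean(arr, axis=axis, keepdims=True)
    # Copy the broadcast mean into `arr` only where `arr` is NaN.
    np.copyto(arr, mean, where=np.isnan(arr))
    return arr

# Small made-up example in float32, matching the dtype shown in the output.
data = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
data[0, 1, 2] = np.nan
data[1, 0, 0] = np.nan
data[1, 2, 0] = np.nan

result = fill_nan_with_axis_mean(data, axis=1)
print(np.isnan(data).any())  # False
```

Because the fill happens in place, keep a copy of the original array if the NaN positions are needed later.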

Output:

>>> data2
array([[[277.02652, 276.253  , 276.36276, ..., 272.2693 , 271.90436,
         271.64706],
        [277.02652, 276.253  , 276.36276, ..., 272.2693 , 271.90436,
         271.64706],
        [277.02652, 276.253  , 276.36276, ..., 272.2693 , 271.90436,
         271.64706],
        ...,
        [277.12585, 275.2982 , 275.56424, ..., 272.2693 , 271.90436,
         271.64706],
        [277.12585, 275.2982 , 275.56424, ..., 272.2693 , 271.90436,
         271.64706],
        [275.11438, 274.27878, 275.17032, ..., 272.2693 , 271.90436,
         271.64706]],

       [[277.4939 , 277.1011 , 277.1529 , ..., 271.71024, 271.51944,
         271.41312],
        [277.4939 , 277.1011 , 277.1529 , ..., 271.71024, 271.51944,
         271.41312],
        [277.4939 , 277.1011 , 277.1529 , ..., 271.71024, 271.51944,
         271.41312],
        ...,
        [277.50073, 276.22455, 276.3818 , ..., 271.71024, 271.51944,
         271.41312],
        [277.50073, 276.22455, 276.3818 , ..., 271.71024, 271.51944,
         271.41312],
        [275.67734, 275.02505, 275.5379 , ..., 271.71024, 271.51944,
         271.41312]],

       [[280.99646, 280.2319 , 280.23727, ..., 272.60663, 272.44424,
         272.37073],
        [280.99646, 280.2319 , 280.23727, ..., 272.60663, 272.44424,
         272.37073],
        [280.99646, 280.2319 , 280.23727, ..., 272.60663, 272.44424,
         272.37073],
        ...,
        [281.111  , 279.33786, 279.4811 , ..., 272.60663, 272.44424,
         272.37073],
        [281.111  , 279.33786, 279.4811 , ..., 272.60663, 272.44424,
         272.37073],
        [279.05643, 277.9778 , 278.5424 , ..., 272.60663, 272.44424,
         272.37073]],

       ...,

       [[299.3109 , 298.8816 , 299.19708, ..., 291.41086, 290.98898,
         290.52472],
        [299.3109 , 298.8816 , 299.19708, ..., 291.41086, 290.98898,
         290.52472],
        [299.3109 , 298.8816 , 299.19708, ..., 291.41086, 290.98898,
         290.52472],
        ...,
        [299.31546, 298.22787, 298.71487, ..., 291.41086, 290.98898,
         290.52472],
        [299.31546, 298.22787, 298.71487, ..., 291.41086, 290.98898,
         290.52472],
        [297.89618, 297.6253 , 298.5444 , ..., 291.41086, 290.98898,
         290.52472]],

       [[298.63446, 298.0783 , 298.38287, ..., 291.76425, 291.33102,
         290.88724],
        [298.63446, 298.0783 , 298.38287, ..., 291.76425, 291.33102,
         290.88724],
        [298.63446, 298.0783 , 298.38287, ..., 291.76425, 291.33102,
         290.88724],
        ...,
        [298.59354, 297.55087, 298.04398, ..., 291.76425, 291.33102,
         290.88724],
        [298.59354, 297.55087, 298.04398, ..., 291.76425, 291.33102,
         290.88724],
        [297.37015, 297.07132, 297.82864, ..., 291.76425, 291.33102,
         290.88724]],

       [[297.76532, 297.54745, 297.9761 , ..., 292.29636, 291.87845,
         291.44046],
        [297.76532, 297.54745, 297.9761 , ..., 292.29636, 291.87845,
         291.44046],
        [297.76532, 297.54745, 297.9761 , ..., 292.29636, 291.87845,
         291.44046],
        ...,
        [297.75772, 296.86356, 297.46335, ..., 292.29636, 291.87845,
         291.44046],
        [297.75772, 296.86356, 297.46335, ..., 292.29636, 291.87845,
         291.44046],
        [296.43677, 296.22397, 297.20667, ..., 292.29636, 291.87845,
         291.44046]]], dtype=float32)
