I am performing a row reduction on a 3-D ndarray (KxMxN): for each column, I take all of its values and apply a reduce function to produce a scalar; in the end, the KxMxN array becomes a 2-D ndarray of shape KxN.

The problem I want to solve is:

  1. Take a 3-D matrix and split it into 16 sub-matrices (or any even number of splits greater than 2; 16 is optimal for me);
  2. Generate all combinations of selecting half of the sub-matrices (8 out of 16) and concatenating those 8 sub-matrices into one matrix, so that each combination yields one matrix;
  3. For each column of each new matrix, across all combinations, compute a scalar value, preferably one scalar at a time.
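The three steps can be sketched at toy scale (the sizes here and the column-sum reducer are placeholders for illustration, not my real dimensions or reduce function):

```python
import itertools
import numpy as np

# Toy-sized sketch of the three steps (4 blocks, choose 2),
# using column sum as a placeholder reducer.
rng = np.random.default_rng(0)
m = rng.random((8, 3))          # K*M rows, N columns
mr = m.reshape(4, 2, 3)         # step 1: split into 4 sub-matrices
combos = list(itertools.combinations(range(4), 2))  # step 2: all "choose 2" index sets
results = np.array([
    mr[list(c)].reshape(-1, 3).sum(axis=0)          # step 3: reduce each column to a scalar
    for c in combos
])
print(results.shape)            # (6, 3): one row per combination, one scalar per column
```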

There are more implementation details, which I will explain along the way.

The 3-D ndarray contains floats.

In the example below, an njit-compiled function (numba) with numpy is the best I have achieved so far. I would like to know whether there is room for further improvement, from any angle.

cupy (GPU parallelism), dask (CPU parallelism) and numba's own parallelization have all failed to beat the version below (my use case is apparently too trivial to exploit the GPU's power, and I only have an 8 GB GPU). It is quite possible that these tools can be used in more advanced ways that I am not aware of.

from numba import njit, guvectorize, float64, int64
from math import sqrt
import numba as nb
import numpy as np
import itertools


# Create a 2D ndarray
m = np.random.rand(800,100)
# Reshape it into a list of sub-matrices
mr = m.reshape(16,50,100)

# Create an indices matrix from combinatorics 
# a typical one for me "select 8 from 16", 12870 combinations
# I do have a custom combination generator, but this is not what I wanted to optimise and itertools really has done a decent job already.
x = np.array( list(itertools.combinations(np.arange(16),8)) )

# Now we are going to select 8 sub-matrices from `mr` and reshape them to become one bigger sub-matrix; we do this in list comprehension.
# This is the matrix we are going to reduce.

# Bottleneck 1: This line takes the longest and I'd hope to improve on this line, but I am not sure there's much we could do here.

m3d = np.array([mr[idx_arr].reshape(400,100) for idx_arr in x])

# We create different versions of the same reduce function. 

# Bottleneck 2: The reduce function is another place I'd want to improve on.

# col - column values
# days - trading days in a year
# rf - risk free rate

# njit version with instance function `mean`, `std`, and python `sqrt`
@njit
def nb_sr(col, days, rf):
    mean = (col.mean() * days) - rf
    std = col.std() * sqrt(days)
    return mean / std

# njit version with numpy
@njit
def nb_sr_np(col, days, rf):
    mean = (np.mean(col) * days) - rf
    std = np.std(col) * np.sqrt(days)
    return mean / std

# guvectorize with numpy
@guvectorize([(float64[:],int64,float64,float64[:])], '(n),(),()->()', nopython=True)
def gu_sr_np(col,days,rf,res):
    mean = (np.mean(col) * days) - rf
    std = np.std(col) * np.sqrt(days)
    res[0] = mean / std

# We wrap them such that they can be applied on 2-D matrix with list comprehension.

# Bottleneck 3: I was thinking to probably vectorize this wrapper, but the closest I can get is list comprehension, which isn't really vectorization.

def nb_sr_wrapper(m2d):
    return [nb_sr(r, 252, .25) for r in m2d.T]

def nb_sr_np_wrapper(m2d):
    return [nb_sr_np(r, 252, .25) for r in m2d.T]

def gu_sr_np_wrapper(m2d):
    return [gu_sr_np(r, 252, .25) for r in m2d.T]

# Finally! here's our performance benchmarking step.

%timeit np.array( [nb_sr_wrapper(m) for m in m3d.T] )
# output: 4.26 s ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.array( [nb_sr_np_wrapper(m) for m in m3d.T] )
# output: 4.33 s ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.array( [gu_sr_np_wrapper(m) for m in m3d.T] )
# output: 6.06 s ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Accepted Answer

TL;DR: The final version below doesn't need m3d at all. Instead, it works directly from mr and x. This achieves an 88x speedup for steps 1, 2 and 3 combined (or a 54x speedup for steps 2 and 3 alone). The final solution runs in 78 ms.

Bottleneck 1 is relatively easy to fix:

a, b = x.shape
_, _, N = mr.shape

new_m3d = mr[x].reshape(a, -1, N)

>>> np.array_equal(m3d, new_m3d)
True

With the dimensions given (mr of shape (16, 50, 100) and x of shape (12870, 8)), this is about 3.5x faster.
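On toy-sized data (shapes made up for illustration), the equivalence of the fancy-indexing version and the original list comprehension can be seen directly:

```python
import numpy as np

# Toy-scale illustration: indexing mr with the whole (n_combos, k) index
# array at once gives shape (n_combos, k, M, N), and a single reshape
# flattens it to (n_combos, k*M, N).
mr_small = np.arange(4 * 2 * 3).reshape(4, 2, 3)
x_small = np.array([[0, 1], [2, 3], [1, 3]])
stacked = mr_small[x_small]                    # shape (3, 2, 2, 3)
flat = stacked.reshape(len(x_small), -1, 3)    # shape (3, 4, 3)
loop = np.array([mr_small[idx].reshape(4, 3) for idx in x_small])
print(np.array_equal(flat, loop))              # True
```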

For bottlenecks #2 and #3, the functionality can be expressed much more simply (below, out refers to the original result, np.array([nb_sr_wrapper(m) for m in m3d.T]), with days = 252 and rf = .25):

out2 = ((np.mean(m3d, axis=1) - rf/days) / np.std(m3d, axis=1)).T * sqrt(days)

>>> np.allclose(out, out2)
True

And timing-wise, this is already about 2x faster than np.array([nb_sr_wrapper(m) for m in m3d.T]).
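The rearrangement behind out2 is pure algebra: (mean*days - rf) / (std*sqrt(days)) equals ((mean - rf/days) / std) * sqrt(days). A quick numeric check (random data, same constants):

```python
import numpy as np

# Verify the algebraic rearrangement used by out2 on one random column.
rng = np.random.default_rng(3)
col = rng.random(400)
days, rf = 252, .25
lhs = (col.mean() * days - rf) / (col.std() * np.sqrt(days))
rhs = ((col.mean() - rf / days) / col.std()) * np.sqrt(days)
print(np.isclose(lhs, rhs))  # True
```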

Note that most of the time in the out2 computation is spent in np.std(m3d, axis=1). Let's try to speed that up as well:

def optstd(mr, x, ddof=0):
    n = mr.shape[1] * x.shape[1] - ddof
    m2 = np.sum(mr**2, axis=1) / n
    m = np.sum(mr, axis=1) / n
    return np.sqrt(m2[x].sum(1) - m[x].sum(1)**2)

out3 = ((np.mean(m3d, axis=1) - rf/days) / optstd(mr, x)).T * sqrt(days)

>>> np.allclose(out, out3)
True
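The identity behind optstd can be sanity-checked on small random data (sizes here are arbitrary): the std of a concatenation of blocks is sqrt(E[X^2] - E[X]^2), with both moments assembled from per-block sums alone:

```python
import itertools
import numpy as np

# Small-scale check of the identity used by optstd (ddof=0 case):
# std of the concatenated blocks == sqrt(E[X^2] - E[X]^2), where both
# moments are built from per-block sums only.
rng = np.random.default_rng(1)
mr_s = rng.random((4, 5, 3))
x_s = np.array(list(itertools.combinations(range(4), 2)))

n = mr_s.shape[1] * x_s.shape[1]          # elements per concatenated column
m2 = np.sum(mr_s**2, axis=1) / n          # per-block sum of squares / n
m1 = np.sum(mr_s, axis=1) / n             # per-block sum / n
fast_std = np.sqrt(m2[x_s].sum(1) - m1[x_s].sum(1)**2)

# Reference: materialize the concatenation and call np.std directly.
slow_std = np.std(mr_s[x_s].reshape(len(x_s), -1, mr_s.shape[2]), axis=1)
print(np.allclose(fast_std, slow_std))    # True
```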

Now we are getting some real speedup:

%timeit ((np.mean(m3d, axis=1) - rf/days) / optstd(mr, x)).T * sqrt(days)
358 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is roughly 12x faster than np.array([nb_sr_wrapper(m) for m in m3d.T]).

Final version

Finally, we notice that the code above still does repeated work (we could reuse m above to obtain np.mean()). So we can combine several of the elements above and eliminate m3d from this computation entirely:

def optcalc(mr, x, days, rf, ddof=0):
    n = mr.shape[1] * x.shape[1]
    m2 = np.sum(mr**2, axis=1)
    m = np.sum(mr, axis=1)
    A = m[x].sum(1) / n
    B = np.sqrt(m2[x].sum(1) / (n - ddof) - m[x].sum(1)**2 / (n - ddof)**2)
    return ((A - rf/days) / B).T * sqrt(days)

out4 = optcalc(mr, x, days, rf)

>>> np.allclose(out, out4)
True

%timeit optcalc(mr, x, days, rf)
# 78.1 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The timing is now 54x faster than the original out, and we no longer need m3d for it. So, counting the time of that first step as well, the overall speedup is 88.89x.
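As a self-contained sanity check on toy data (toy shapes; same days/rf constants as the question), optcalc can be compared against a straightforward per-column loop:

```python
import itertools
from math import sqrt
import numpy as np

def optcalc(mr, x, days, rf, ddof=0):
    # Final version from above: mean and std from per-block sums.
    n = mr.shape[1] * x.shape[1]
    m2 = np.sum(mr**2, axis=1)
    m = np.sum(mr, axis=1)
    A = m[x].sum(1) / n
    B = np.sqrt(m2[x].sum(1) / (n - ddof) - m[x].sum(1)**2 / (n - ddof)**2)
    return ((A - rf/days) / B).T * sqrt(days)

def sharpe(col, days, rf):
    # Per-column scalar, equivalent to the original nb_sr.
    return (col.mean() * days - rf) / (col.std() * sqrt(days))

rng = np.random.default_rng(2)
mr_s = rng.random((4, 5, 3))
x_s = np.array(list(itertools.combinations(range(4), 2)))
m3d_s = mr_s[x_s].reshape(len(x_s), -1, mr_s.shape[2])

# Reference result, column by column, combination by combination.
ref = np.array([[sharpe(r, 252, .25) for r in m.T] for m in m3d_s.T])
print(np.allclose(ref, optcalc(mr_s, x_s, 252, .25)))  # True
```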

Why is this faster?

  1. No huge Python loops.
  2. Multiplications and subtractions are moved as far "outside" as possible, to where the dimensionality has already been reduced.
  3. Instead of computing the std over every combination of columns, we operate on mr and compute the equivalent std of the concatenation.
  4. In the final proposal, optcalc does both mean and std from first principles.
