Can this matrix math be done faster?

I'm rendering 3D points with perspective in Python. Speed matters, because it will translate directly into frame rate later on.

I've tried to use NumPy functions, but I can't get rid of two nasty list comprehensions. 90% of my program's runtime is spent in those list comprehensions, which makes sense since they contain all the math, so I'd like to find a faster approach if possible.

  1. The first list comprehension happens when building pos: it does an addition and a matrix multiplication on vert_array.
  2. The second, persp, scales each row's x and y values by a factor derived from that row's z value.

Can these list comprehensions be replaced with something from NumPy? I've read about numpy.einsum and numpy.fromfunction, but I'm having trouble working out whether they're relevant to my problem.

Here is the function that does the main rendering calculations:

I'd like to make pos and persp faster:

import time
from random import randint
import numpy as np

def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with that sum (1x3) and unit_vec (3x3).
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos])
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp

I wrote this code to time render_all_verts and to generate a list of random vertex coordinates to run it on repeatedly.

# TESTING RENDERING SPEED
start_time = time.time()

# The next few lines make an array, similar to the 3D points I'd be rendering.
# It contains n vertices with random coordinate values from -m to m
n = 1000
m = 50
example_vertices = np.array([(randint(-m, m), randint(-m, m), randint(-m, m)) for i in range(n)])
# This is an empty array with the same shape as example_vertices; the results are saved here.
rendered_verts = np.empty(example_vertices.shape)

print('Example vertices:\n', example_vertices)

# This loop will render the example vertices many times
render_times = 2000
for i in range(render_times):
    rendered_verts = render_all_verts(example_vertices)
    
print('\n\nLast calculated render of vertices:\n', rendered_verts)

print(f'\n\nThis program took an array of {n} vertices with randomized coordinate')
print(f'values between {-m} and {m} and rendered them {render_times} times.')
print(f'--- {time.time() - start_time} seconds ---')

Finally, here's an example of the terminal output:

C:\...\simplified_demo.py 
Example vertices:
 [[-45   4 -43]
 [ 42  27  28]
 [-33  24 -18]
 ...
 [ -5  48   5]
 [-17 -17  29]
 [ -5 -46 -24]]
C:\...\simplified_demo.py:45: RuntimeWarning: divide by zero encountered in divide
  persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1]))


Last calculated render of vertices:
 [[ -6.81818182   0.60606061 -33.        ]
 [  5.52631579   3.55263158  38.        ]
 [-20.625       15.          -8.        ]
 ...
 [ -1.66666667  16.          15.        ]
 [ -2.17948718  -2.17948718  39.        ]
 [ -1.78571429 -16.42857143 -14.        ]]


This program took an array of 1000 vertices with randomized coordinate
values between -50 and 50 and rendered them 2000 times.
--- 15.910243272781372 seconds ---

Process finished with exit code 0

P.S. NumPy seems to handle the divide-by-zero and overflow values just fine for now, so I'm not worried about the RuntimeWarnings. I've redacted my file paths...
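(In case the warnings ever do need silencing, here's a minimal sketch of wrapping just the perspective division in np.errstate; the vertex data is made up for illustration.)

import numpy as np

focus = 5
pos = np.array([[1.0, 2.0, 0.0],    # a vertex sitting exactly at z == 0
                [3.0, 4.0, 10.0]], dtype=np.float32)

# Suppress the divide-by-zero RuntimeWarning only around this division;
# the z == 0 row still produces inf in the result.
with np.errstate(divide='ignore'):
    scale = focus / np.abs(pos[:, 2])
print(scale)  # [inf 0.5]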

P.P.S. Yes, I know I could just use OpenGL or any other existing rendering engine that already handles all this math, but I'm more interested in reinventing this particular wheel. For me this is mainly an experiment to learn Python and NumPy.

Answer

An initial speedup can be obtained by vectorizing the two list comprehensions:


def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with that sum (1x3) and unit_vec (3x3).
    pos2 = np.matmul(unit_vec, (vert_array + shift).T).T
    """
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    print(vert_array.shape, unit_vec.shape)
    assert pos2.shape == pos.shape, (pos2.shape, pos.shape)
    assert np.all(pos2 == pos), np.sum(pos - pos2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    x = focus / np.abs(pos2[:,2])
    persp2 = np.multiply(pos2, np.dstack([x, x, np.ones(x.shape)]))
    """
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos2])
    assert persp.shape == persp2.shape, (persp.shape, persp2.shape)
    assert np.all(persp == persp2), np.sum(persp - persp2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp2

This runs in 0.18 seconds.
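As an aside, since the question mentions numpy.einsum: the camera-space step can also be written with it. A sketch that should be equivalent to the matmul line above, using an identity rotation, the same shift, and made-up vertices:

import numpy as np

unit_vec = np.eye(3, dtype=np.float32)            # camera rotation (identity, no rotation)
shift = np.array([0, 0, 10], dtype=np.float32)    # camera position
verts = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# For every vertex n: pos[n, i] = sum over j of unit_vec[i, j] * (verts + shift)[n, j]
pos_einsum = np.einsum('ij,nj->ni', unit_vec, verts + shift)
pos_matmul = np.matmul(unit_vec, (verts + shift).T).T
assert np.allclose(pos_einsum, pos_matmul)

With an identity matrix the rotation does nothing, but for a general 3x3 rotation both forms compute the same per-row matrix product, so this is mostly a readability choice.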

We can then remove the unneeded transposes, which improves performance by roughly another 50%:

def render_all_verts_2(vert_array):
    # Camera rotation matrix (identity, i.e. no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # Camera position in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    # World space to camera space; the result stays in 3 x n layout
    pos2 = np.matmul(unit_vec, (vert_array + shift).T)
    focus = 5
    # Per-vertex perspective scale factor derived from the z row
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack([x, x, np.ones(x.shape)]))
    # Transpose back to n x 3 and add a leading axis to match the earlier 3D output shape
    return persp2.T[np.newaxis,]

This version runs in 0.12 seconds on my system.

Finally, we can use Numba to get roughly another 4x speedup, down to 0.03 s:

from numba import njit
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)

    shift = np.array([0, 0, 10], dtype=np.float32)
    pos2 = np.dot(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack((x, x, np.ones(x.shape, dtype=np.float32))))
    return persp2.T.reshape((1, persp2.shape[1], persp2.shape[0]))

That is about 800x faster than the 26 seconds the original code took on my machine.
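One caveat when timing the @njit version: the first call also compiles the function, so it's worth calling it once outside the timed loop. A sketch, reusing the test setup from the question but with float32 vertices:

import time
import numpy as np

example_vertices = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# Warm-up call: triggers the JIT compilation so it isn't counted as render time
render_all_verts_2(example_vertices)

start_time = time.time()
for _ in range(2000):
    rendered_verts = render_all_verts_2(example_vertices)
print(f'--- {time.time() - start_time} seconds ---')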

Below are some further tweaks along the lines @NIN17 suggested (although I couldn't measure any effect from them), and the 3D reshape is dropped as well. Somewhere in my testing I had asserted that the output shape should match the 3D one, but that has since changed.

# Best njit version with transpose - 0.03s
@njit
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Best normal version with transpose - 0.10s
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Without transpose is slower, probably because of BLAS implementation / memory contiguity etc.
# Without transpose normal - 0.14s (slowest)
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

# njit without transpose (second fastest) - 0.06s
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

Here is also the fastest pure NumPy implementation I could manage (it relies on features that Numba doesn't support); for larger vertex arrays it is basically on par with the Numba versions, and for even larger ones it is faster than Numba.

def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    # Work on a float32, C-contiguous 3 x n copy of the input
    data = vert_array.T.astype(np.float32, order='C')
    data += shift[:,np.newaxis]
    # Apply the rotation, writing the result back into data via out=
    np.dot(unit_vec, data, out=data)
    # Perspective-scale the x and y rows by focus / |z|
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
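
To sanity-check that any of these variants still agrees with the others, a quick comparison along these lines can help (a sketch; the variants above reuse the same function names, so rename whichever two you want to compare):

import numpy as np

verts = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# np.squeeze drops the extra leading axis that some of the versions add
# to their output, so the shapes become directly comparable.
out_a = np.squeeze(render_all_verts(verts.copy()))
out_b = np.squeeze(render_all_verts_2(verts.copy()))
print(out_a.shape, out_b.shape)
print(np.allclose(out_a, out_b))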
