Can this matrix math be done faster?

I'm rendering 3D points with perspective in Python. Speed matters, because it will translate directly into frame rate at some later point.

I've tried to use NumPy functions, but I can't get rid of two nasty list comprehensions. 90% of my program's runtime is spent in these list comprehensions, which makes sense since they contain all of the math, so I'd like to find a faster approach if possible.

  1. The first list comprehension happens when pos is computed: for each row of vert_array, it does an addition and a matrix multiplication.
  2. The second one, persp, multiplies the x and y values of each row by a factor based on that row's z value.

Can these list comprehensions be replaced with something from NumPy? I've read about numpy.einsum and numpy.fromfunction, but I'm having trouble understanding whether they're relevant to my problem.
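
As a point of reference, np.einsum can express exactly this kind of batched matrix-vector product in a single call. A minimal sketch, assuming an n x 3 vertex array and placeholder rotation/shift values:

import numpy as np

rot = np.eye(3, dtype=np.float32)                  # stand-in for the camera rotation matrix
shift = np.array([0, 0, 10], dtype=np.float32)     # stand-in for the camera position shift
verts = np.random.rand(1000, 3).astype(np.float32)

# 'ij,nj->ni' multiplies every shifted row (index n) by the 3x3 matrix (indices i, j)
cam_space = np.einsum('ij,nj->ni', rot, verts + shift)
print(cam_space.shape)  # (1000, 3)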

Here is the function that does the main rendering calculations:

I want to make the pos and persp calculations faster:

import time
from random import randint
import numpy as np

def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # Raw Shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with unit_vec (3x3) and that sum (a 3-vector).
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos])
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp

I wrote this code to time render_all_verts; it generates a list of random vertex coordinates and runs the function on them repeatedly.

# TESTING RENDERING SPEED
start_time = time.time()

# The next few lines make an array, similar to the 3D points I'd be rendering.
# It contains n vertices with random coordinate values from -m to m
n = 1000
m = 50
example_vertices = np.array([(randint(-m, m), randint(-m, m), randint(-m, m)) for i in range(n)])
# This empty array has the same shape as example_vertices; the results are saved here.
rendered_verts = np.empty(example_vertices.shape)

print('Example vertices:\n', example_vertices)

# This loop will render the example vertices many times
render_times = 2000
for i in range(render_times):
    rendered_verts = render_all_verts(example_vertices)
    
print('\n\nLast calculated render of vertices:\n', rendered_verts)

print(f'\n\nThis program took an array of {n} vertices with randomized coordinate')
print(f'values between {-m} and {m} and rendered them {render_times} times.')
print(f'--- {time.time() - start_time} seconds ---')

Finally, here is an example of the terminal output:

C:\...\simplified_demo.py 
Example vertices:
 [[-45   4 -43]
 [ 42  27  28]
 [-33  24 -18]
 ...
 [ -5  48   5]
 [-17 -17  29]
 [ -5 -46 -24]]
C:\...\simplified_demo.py:45: RuntimeWarning: divide by zero encountered in divide
  persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1]))


Last calculated render of vertices:
 [[ -6.81818182   0.60606061 -33.        ]
 [  5.52631579   3.55263158  38.        ]
 [-20.625       15.          -8.        ]
 ...
 [ -1.66666667  16.          15.        ]
 [ -2.17948718  -2.17948718  39.        ]
 [ -1.78571429 -16.42857143 -14.        ]]


This program took an array of 1000 vertices with randomized coordinate
values between -50 and 50 and rendered them 2000 times.
--- 15.910243272781372 seconds ---

Process finished with exit code 0

P.S.: NumPy seems to handle the divide-by-zero and overflow values fine for now, so I'm not worried about the RuntimeWarning. I've edited out my file paths...
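
For completeness, if the warning ever becomes a nuisance, it can be silenced locally or avoided outright. A minimal sketch (the 1e-6 floor is an arbitrary choice):

import numpy as np

z = np.array([0.0, 2.0, -4.0], dtype=np.float32)
focus = 5

# Option 1: silence the warning locally; focus / 0 still evaluates to inf
with np.errstate(divide='ignore'):
    scale = focus / np.abs(z)

# Option 2: avoid the division by zero by clamping very small |z| values
scale_safe = focus / np.maximum(np.abs(z), 1e-6)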

P.P.S.: Yes, I know I could just use OpenGL or any other existing rendering engine that already handles all this math, but I'm more interested in reinventing the wheel here. For me this is mostly an experiment for learning Python and NumPy.

Accepted Answer

An initial speedup can be obtained by using vectorization:


def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # Raw Shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with unit_vec (3x3) and that sum (a 3-vector).
    pos2 = np.matmul(unit_vec, (vert_array + shift).T).T
    """
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    print(vert_array.shape, unit_vec.shape)
    assert pos2.shape == pos.shape, (pos2.shape, pos.shape)
    assert np.all(pos2 == pos), np.sum(pos - pos2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    x = focus / np.abs(pos2[:,2])
    persp2 = np.multiply(pos2, np.dstack([x, x, np.ones(x.shape)]))
    """
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos2])
    assert persp.shape == persp2.shape, (persp.shape, persp2.shape)
    assert np.all(persp == persp2), np.sum(persp - persp2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp2

This runs in 0.18 seconds.

Then we can remove the unneeded transposes, improving performance by another 50%:

def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    shift = np.array([0, 0, 10], dtype='float32')
    pos2 = np.matmul(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack([x, x, np.ones(x.shape)]))
    return persp2.T[np.newaxis,]

This version runs in 0.12 seconds on my system.

Finally, we can use Numba to get another 4x speedup, down to 0.03 s:

from numba import njit
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)

    shift = np.array([0, 0, 10], dtype=np.float32)
    pos2 = np.dot(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack((x, x, np.ones(x.shape, dtype=np.float32))))
    return persp2.T.reshape((1, persp2.shape[1], persp2.shape[0]))

That's about 800x faster than the original 26 seconds on my machine.

More optimization as suggested by @NIN17 (although I can't see any effect from it), and the 3D reshape is also removed. Somewhere in my testing I had asserted that the shapes should match the 3D one, but that has since changed.

# Best njit version with transpose - 0.03
@njit
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Best normal version with transpose - 0.10
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Without transpose is slower, probably because of BLAS implementation / memory contiguity etc.
# Without transpose normal - 0.14s (slowest)
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

# njit without transpose (second fastest) - 0.06s
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

Here is also the fastest pure Python implementation I could manage (it uses features that Numba doesn't support); for larger vertex arrays it's basically on par with Numba, and for even larger ones it's faster than Numba.

def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = vert_array.T.astype(np.float32, order='C')
    data += shift[:,np.newaxis]
    np.dot(unit_vec, data, out=data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
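
To compare the variants above on your own machine, a small timing and sanity-check harness along these lines should do. The function names and the 1000-vertex / 2000-render workload are carried over from the question, and the allclose check assumes both functions return an n x 3 result (squeeze drops any leading 1-sized axis):

import timeit
import numpy as np

verts = np.random.uniform(-50, 50, size=(1000, 3)).astype(np.float32)

# Sanity check: the two implementations should agree on the identity rotation
ref = render_all_verts(verts.copy())
alt = render_all_verts_2(verts.copy())
assert np.allclose(np.squeeze(ref), np.squeeze(alt))

# Time 2000 renders of each; calling the njit version once above also keeps
# Numba's one-time compilation cost out of the timed loop
for fn in (render_all_verts, render_all_verts_2):
    t = timeit.timeit(lambda: fn(verts.copy()), number=2000)
    print(f'{fn.__name__}: {t:.3f} s for 2000 renders')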
