Can this matrix math be done faster?

I'm rendering 3D points with perspective in Python. Speed matters, because it will translate directly into frame rate later on.

I've tried to use NumPy functions, but I can't get rid of two nasty list comprehensions. 90% of my program's runtime is spent in those list comprehensions, which makes sense since they contain all the math, so I'd like to find a faster approach if possible.

  1. The first list comprehension happens when building pos: it does an addition and a matrix multiplication on vert_array.
  2. The second, persp, scales each row's x and y values by a factor derived from that row's z value.

Can these list comprehensions be replaced with something from NumPy? I've read about numpy.einsum and numpy.fromfunction, but I'm having trouble working out whether they're relevant to my problem.

Here is the function that does the main rendering calculations:

I'd like to make pos and persp faster:

import time
from random import randint
import numpy as np

def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with that sum (1x3) and unit_vec (3x3).
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos])
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp

I wrote this code to time render_all_verts and to generate a list of random vertex coordinates to run it on repeatedly.

# TESTING RENDERING SPEED
start_time = time.time()

# The next few lines make an array, similar to the 3D points I'd be rendering.
# It contains n vertices with random coordinate values from -m to m
n = 1000
m = 50
example_vertices = np.array([(randint(-m, m), randint(-m, m), randint(-m, m)) for i in range(n)])
# This is an empty array with the same shape as example_vertices; the results are saved here.
rendered_verts = np.empty(example_vertices.shape)

print('Example vertices:\n', example_vertices)

# This loop will render the example vertices many times
render_times = 2000
for i in range(render_times):
    rendered_verts = render_all_verts(example_vertices)
    
print('\n\nLast calculated render of vertices:\n', rendered_verts)

print(f'\n\nThis program took an array of {n} vertices with randomized coordinate')
print(f'values between {-m} and {m} and rendered them {render_times} times.')
print(f'--- {time.time() - start_time} seconds ---')

Finally, here's an example of the terminal output:

C:\...\simplified_demo.py 
Example vertices:
 [[-45   4 -43]
 [ 42  27  28]
 [-33  24 -18]
 ...
 [ -5  48   5]
 [-17 -17  29]
 [ -5 -46 -24]]
C:\...\simplified_demo.py:45: RuntimeWarning: divide by zero encountered in divide
  persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1]))


Last calculated render of vertices:
 [[ -6.81818182   0.60606061 -33.        ]
 [  5.52631579   3.55263158  38.        ]
 [-20.625       15.          -8.        ]
 ...
 [ -1.66666667  16.          15.        ]
 [ -2.17948718  -2.17948718  39.        ]
 [ -1.78571429 -16.42857143 -14.        ]]


This program took an array of 1000 vertices with randomized coordinate
values between -50 and 50 and rendered them 2000 times.
--- 15.910243272781372 seconds ---

Process finished with exit code 0

P.S. NumPy seems to handle the divide-by-zero and overflow values just fine for now, so I'm not worried about the RuntimeWarnings. I've redacted my file paths...
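(In case the warnings ever do need silencing, here's a minimal sketch of wrapping just the perspective division in np.errstate; the vertex data is made up for illustration.)

import numpy as np

focus = 5
pos = np.array([[1.0, 2.0, 0.0],    # a vertex sitting exactly at z == 0
                [3.0, 4.0, 10.0]], dtype=np.float32)

# Suppress the divide-by-zero RuntimeWarning only around this division;
# the z == 0 row still produces inf in the result.
with np.errstate(divide='ignore'):
    scale = focus / np.abs(pos[:, 2])
print(scale)  # [inf 0.5]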

P.P.S. Yes, I know I could just use OpenGL or any other existing rendering engine that already handles all this math, but I'm more interested in reinventing this particular wheel. For me this is mainly an experiment to learn Python and NumPy.

Answer

An initial speedup can be obtained by vectorizing the two list comprehensions:


def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values and
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # Unit Vector is a 9 element, 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # shift is a 3 element, 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    
    # PURPOSE: This converts vert_array, with its coordinates relative
    #   to the world-space axes and origin, into coordinates relative
    #   to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    #   multiplication is performed with that sum (1x3) and unit_vec (3x3).
    pos2 = np.matmul(unit_vec, (vert_array + shift).T).T
    """
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    print(vert_array.shape, unit_vec.shape)
    assert pos2.shape == pos.shape, (pos2.shape, pos.shape)
    assert np.all(pos2 == pos), np.sum(pos - pos2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    #   which are relative to the camera, into a representation of how they'll
    #   appear on screen in perspective. The x and y values are scaled based on
    #   the z value (distance from the camera)
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    #   by a scalar, which is derived from that row's third column value.
    x = focus / np.abs(pos2[:,2])
    persp2 = np.multiply(pos2, np.dstack([x, x, np.ones(x.shape)]))
    """
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos2])
    assert persp.shape == persp2.shape, (persp.shape, persp2.shape)
    assert np.all(persp == persp2), np.sum(persp - persp2)
    """
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    return persp2

This runs in 0.18 seconds.
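As an aside, since the question mentions numpy.einsum: the camera-space step can also be written with it. A sketch that should be equivalent to the matmul line above, using an identity rotation, the same shift, and made-up vertices:

import numpy as np

unit_vec = np.eye(3, dtype=np.float32)            # camera rotation (identity, no rotation)
shift = np.array([0, 0, 10], dtype=np.float32)    # camera position
verts = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# For every vertex n: pos[n, i] = sum over j of unit_vec[i, j] * (verts + shift)[n, j]
pos_einsum = np.einsum('ij,nj->ni', unit_vec, verts + shift)
pos_matmul = np.matmul(unit_vec, (verts + shift).T).T
assert np.allclose(pos_einsum, pos_matmul)

With an identity matrix the rotation does nothing, but for a general 3x3 rotation both forms compute the same per-row matrix product, so this is mostly a readability choice.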

We can then remove the unneeded transposes, which improves performance by roughly another 50%:

def render_all_verts_2(vert_array):
    # Camera rotation matrix (identity, i.e. no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')

    # Camera position in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    # World space to camera space; the result stays in 3 x n layout
    pos2 = np.matmul(unit_vec, (vert_array + shift).T)
    focus = 5
    # Per-vertex perspective scale factor derived from the z row
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack([x, x, np.ones(x.shape)]))
    # Transpose back to n x 3 and add a leading axis to match the earlier 3D output shape
    return persp2.T[np.newaxis,]

This version runs in 0.12 seconds on my system.

Finally, we can use Numba to get roughly another 4x speedup, down to 0.03 s:

from numba import njit
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)

    shift = np.array([0, 0, 10], dtype=np.float32)
    pos2 = np.dot(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack((x, x, np.ones(x.shape, dtype=np.float32))))
    return persp2.T.reshape((1, persp2.shape[1], persp2.shape[0]))

That is about 800x faster than the 26 seconds the original code took on my machine.
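One caveat when timing the @njit version: the first call also compiles the function, so it's worth calling it once outside the timed loop. A sketch, reusing the test setup from the question but with float32 vertices:

import time
import numpy as np

example_vertices = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# Warm-up call: triggers the JIT compilation so it isn't counted as render time
render_all_verts_2(example_vertices)

start_time = time.time()
for _ in range(2000):
    rendered_verts = render_all_verts_2(example_vertices)
print(f'--- {time.time() - start_time} seconds ---')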

Below are some further tweaks along the lines @NIN17 suggested (although I couldn't measure any effect from them), and the 3D reshape is dropped as well. Somewhere in my testing I had asserted that the output shape should match the 3D one, but that has since changed.

# Best njit version with transpose - 0.03s
@njit
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Best normal version with transpose - 0.10s
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T

# Without transpose is slower, probably because of BLAS implementation / memory contiguity etc.
# Without transpose normal - 0.14s (slowest)
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

# njit without transpose (second fastest) - 0.06s
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:,:2] *= focus/np.abs(pos2[:,2:3])
    return pos2

Here is also the fastest pure NumPy implementation I could manage (it relies on features that Numba doesn't support); for larger vertex arrays it is basically on par with the Numba versions, and for even larger ones it is faster than Numba.

def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    # Work on a float32, C-contiguous 3 x n copy of the input
    data = vert_array.T.astype(np.float32, order='C')
    data += shift[:,np.newaxis]
    # Apply the rotation, writing the result back into data via out=
    np.dot(unit_vec, data, out=data)
    # Perspective-scale the x and y rows by focus / |z|
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
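
To sanity-check that any of these variants still agrees with the others, a quick comparison along these lines can help (a sketch; the variants above reuse the same function names, so rename whichever two you want to compare):

import numpy as np

verts = np.random.uniform(-50, 50, (1000, 3)).astype(np.float32)

# np.squeeze drops the extra leading axis that some of the versions add
# to their output, so the shapes become directly comparable.
out_a = np.squeeze(render_all_verts(verts.copy()))
out_b = np.squeeze(render_all_verts_2(verts.copy()))
print(out_a.shape, out_b.shape)
print(np.allclose(out_a, out_b))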
