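Using Python, I send 50 images to a CUDA routine that computes the positions of 50 beads in each image. Below is a diagram (I only drew 4 beads):
The CUDA routine expects a flattened array, so I have to extract the regions of interest from each image (say 64x64), flatten them, and then concatenate them.
Here is what I did: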
import numpy as np
import time
from numpy.lib.stride_tricks import sliding_window_view
# Create a numpy array of 1024 x 1280 (the image)
image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)
# x_values and y_values hold the starting indices of the 64 x 64 regions of interest
# (in the slicing below, x indexes rows and y indexes columns).
x_values = np.random.randint(100, 900, 50)
y_values = np.random.randint(100, 900, 50)
# Create one flat destination array per method; this is the array passed to the CUDA routine.
main_array_1 = np.zeros((50*50*64*64))
main_array_2 = np.zeros((50*50*64*64))
print("#########################")
# First method
print("#########################")
start_time = time.time()
sub_images = np.array([image[x:x+64, y:y+64] for x, y in zip(x_values, y_values)])
flattened_sub_arrays = sub_images.ravel()
main_array_1[:len(flattened_sub_arrays)] = flattened_sub_arrays
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays.shape)
print("#########################")
# second method (Thanks to hpaulj, see comments).
print("#########################")
# Create an array of indices for x and y
x_indices = np.array([np.arange(x, x+64) for x in x_values])
y_indices = np.array([np.arange(y, y+64) for y in y_values])
start_time = time.time()
# Create a sliding window view of the image
window_view = sliding_window_view(image, (64, 64))
# Extract the sub-images using the x and y values
sub_images = window_view[x_values, y_values]
# Flatten the sub-images
flattened_sub_arrays_2 = sub_images.ravel()
main_array_2[:len(flattened_sub_arrays_2)] = flattened_sub_arrays_2
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays_2.shape)
print("#########################")
print("#########################")
# compare the two methods
print(np.array_equal(main_array_1, main_array_2))
Is there any way to make this faster?
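One variant that may be worth timing, given as a sketch rather than a measured result: allocate the destination buffer with the same dtype as the image and write the regions straight into a reshaped view of it. This avoids the separate `ravel()` step and the uint8 to float64 conversion that happens when copying into a default `np.zeros` array. The helper `fill_rois`, its `image_index` parameter, and the per-image slot layout are hypothetical names and assumptions for illustration. Whether the CUDA routine accepts uint8 input is also an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ROI = 64        # side length of each region of interest
N_BEADS = 50    # regions per image
N_IMAGES = 50   # images per batch sent to the CUDA routine

image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)
x_values = np.random.randint(100, 900, N_BEADS)
y_values = np.random.randint(100, 900, N_BEADS)

# Assumption: the CUDA routine can take uint8 input. If it needs another
# dtype, allocate that dtype here; the conversion then happens during the copy.
main_array = np.zeros(N_IMAGES * N_BEADS * ROI * ROI, dtype=np.uint8)


def fill_rois(buffer, image, x_values, y_values, image_index=0):
    """Copy the regions of one image into its slot of the flat buffer, in place.

    Hypothetical helper: assumes image number `image_index` owns a
    contiguous slot of len(x_values) * ROI * ROI elements.
    """
    n = len(x_values)
    start = image_index * n * ROI * ROI
    # Reshaping a contiguous 1-D slice returns a view, so writes land in `buffer`.
    dest = buffer[start:start + n * ROI * ROI].reshape(n, ROI, ROI)
    windows = sliding_window_view(image, (ROI, ROI))
    # Advanced indexing gathers the selected windows, then assigns them into the view.
    dest[...] = windows[x_values, y_values]
    return dest


fill_rois(main_array, image, x_values, y_values)

# Check against the list-comprehension version from the first method.
reference = np.array(
    [image[x:x + ROI, y:y + ROI] for x, y in zip(x_values, y_values)]
).ravel()
print(np.array_equal(main_array[:reference.size], reference))
```

Calling `fill_rois(main_array, next_image, xs, ys, image_index=k)` for each of the 50 images would fill the whole buffer without reallocating it. Timing it with `time.perf_counter()` or `timeit` over many repetitions should give more stable numbers than a single `time.time()` difference.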