Extracting sub-arrays and then concatenating them in Python

Posted on February 27

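Using Python, I send 50 images to a CUDA routine that computes the positions of 50 beads in each image. Here is a sketch (I only drew 4 beads):

The CUDA routine requires a flattened array, so for each bead I have to extract a region of interest from the image (say 64x64), flatten it, and concatenate the results.

Here is what I did: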

import numpy as np
import time
from numpy.lib.stride_tricks import sliding_window_view

# Create a 1024 x 1280 uint8 array (the image)
image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)


# x_values and y_values hold the top-left coordinates of the 64 x 64 regions of interest.
x_values = np.random.randint(100, 900, 50)
y_values = np.random.randint(100, 900, 50)

# Create one output array per method. This is the array passed to CUDA.
main_array_1 = np.zeros((50*50*64*64))
main_array_2 = np.zeros((50*50*64*64))


print("#########################")
# First method 
print("#########################")
start_time = time.time()
sub_images = np.array([image[x:x+64, y:y+64] for x, y in zip(x_values, y_values)])
flattened_sub_arrays = sub_images.ravel()
main_array_1[:len(flattened_sub_arrays)] = flattened_sub_arrays
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays.shape)


print("#########################")
# second method (Thanks to  hpaulj, see comments).
print("#########################")
# Index arrays for x and y (built here but not used by the code below)
x_indices = np.array([np.arange(x, x+64) for x in x_values])
y_indices = np.array([np.arange(y, y+64) for y in y_values])

start_time = time.time()
# Create a sliding window view of the image (a view, no data is copied)
window_view = sliding_window_view(image, (64, 64))
# Extract the sub-images whose top-left corners are given by x_values and y_values
sub_images = window_view[x_values, y_values]
# Flatten the sub-images
flattened_sub_arrays_2 = sub_images.ravel()
main_array_2[:len(flattened_sub_arrays_2)] = flattened_sub_arrays_2
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays_2.shape)


print("#########################")
print("#########################")

# compare the two methods
print(np.array_equal(main_array_1, main_array_2))
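
The second method relies on `sliding_window_view` returning a view with one window per valid top-left corner, which integer-array indexing then copies out. A minimal sketch of that behavior, using small assumed sizes (a 32 x 40 image and 8 x 8 windows rather than the sizes in the post):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Small stand-in image; these sizes are assumptions chosen to keep the example quick.
rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(32, 40), dtype=np.uint8)
win = 8  # window edge length (64 in the post)

windows = sliding_window_view(image, (win, win))
# One window per valid top-left corner: (32 - 8 + 1, 40 - 8 + 1, 8, 8)
print(windows.shape)

# The result is a view onto the image buffer, so building it copies no pixel data.
print(np.may_share_memory(windows, image))

# Indexing with two integer arrays pairs rows[i] with cols[i] and returns a copy.
rows = np.array([0, 5, 24])
cols = np.array([3, 10, 32])
picked = windows[rows, cols]  # shape (3, 8, 8)

# Reference built with plain slicing, as in the first method.
expected = np.stack([image[r:r + win, c:c + win] for r, c in zip(rows, cols)])
print(np.array_equal(picked, expected))
```

The first index of the view selects the row of the top-left corner and the second selects the column, which matches the `image[x:x+64, y:y+64]` slicing used in the first method.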

Is there any way to make this faster?

Recommended answer

from numba import njit, prange


@njit(parallel=True)
def store_array_numba(image, x_values, y_values, main_array):
    for i in prange(len(x_values)):
        x = x_values[i]
        y = y_values[i]
        idx = 64 * 64 * i
        for j in range(64):
            main_array[idx + 64 * j : idx + 64 * (j + 1)] = image[x + j, y : y + 64]
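
The index arithmetic in the function above places region `i` at offset `64 * 64 * i` and row `j` of that region a further `64 * j` elements in, which is row-major (C) order. As a sketch of that layout without Numba, the same placement can be expressed by viewing the flat buffer as `(N, 64, 64)`. The function name `store_array_view`, the small image size, and the choice of a `uint8` buffer are assumptions made for illustration.

```python
import numpy as np


def store_array_view(image, x_values, y_values, main_array, size=64):
    """Copy each size x size region into main_array through a 3-D view of it."""
    n = len(x_values)
    # Reshaping a contiguous 1-D slice returns a view, so writes land in main_array.
    blocks = main_array[: n * size * size].reshape(n, size, size)
    for i, (x, y) in enumerate(zip(x_values, y_values)):
        blocks[i] = image[x : x + size, y : y + size]
    return main_array


rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(128, 160), dtype=np.uint8)
x_values = rng.integers(0, 128 - 64, size=5)
y_values = rng.integers(0, 160 - 64, size=5)

flat = np.zeros(5 * 64 * 64, dtype=image.dtype)
store_array_view(image, x_values, y_values, flat)

# Spot-check the offset formula from the Numba loop for region i, row j.
i, j = 3, 10
start = 64 * 64 * i + 64 * j
x, y = x_values[i], y_values[i]
row_ok = np.array_equal(flat[start : start + 64], image[x + j, y : y + 64])
print(row_ok)
```

Writing through the view avoids building the intermediate `(N, 64, 64)` array, though the Python-level loop over regions remains.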

from timeit import timeit

import numpy as np
from numba import njit, prange

# Create a numpy array of 1024 x 1280
image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)

N = 50

# Assuming x_values and y_values are lists of x and y coordinates
# (that define the top-left corner of each sub-image)
x_values = np.random.randint(100, 900, N)
y_values = np.random.randint(100, 900, N)

# This is the array that will store the sub-images and which will be used by cuda
main_array_1 = np.zeros((N * 64 * 64))
main_array_2 = np.zeros((N * 64 * 64))


def store_array(image, x_values, y_values, main_array):
    sub_images = np.array(
        [image[x : x + 64, y : y + 64] for x, y in zip(x_values, y_values)]
    )
    flattened_sub_arrays = sub_images.ravel()
    main_array[: len(flattened_sub_arrays)] = flattened_sub_arrays
    return main_array


@njit(parallel=True)
def store_array_numba(image, x_values, y_values, main_array):
    for i in prange(len(x_values)):
        x = x_values[i]
        y = y_values[i]
        idx = 64 * 64 * i
        for j in range(64):
            main_array[idx + 64 * j : idx + 64 * (j + 1)] = image[x + j, y : y + 64]


main_array_1 = store_array(image, x_values, y_values, main_array_1)
store_array_numba(image, x_values, y_values, main_array_2)

assert np.allclose(main_array_1, main_array_2)

t1 = timeit(
    "store_array(image, x_values, y_values, main_array_1)",
    number=50,
    globals=globals(),
)
t2 = timeit(
    "store_array_numba(image, x_values, y_values, main_array_2)",
    number=50,
    globals=globals(),
)

print(f"time normal = {t1}")
print(f"time numba = {t2}")
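
Where Numba is not available, one pure NumPy option is to gather every region with a single advanced-indexing operation built from broadcast index arrays. The sketch below is an illustration under assumptions: the helper name `gather_rois` is hypothetical, and no timing is claimed here, since whether it beats the list comprehension depends on the number and size of the regions.

```python
import numpy as np


def gather_rois(image, x_values, y_values, size=64):
    """Return all size x size regions as one flat array using broadcast indices."""
    offsets = np.arange(size)
    # rows has shape (N, size, 1) and cols has shape (N, 1, size);
    # together they broadcast to (N, size, size).
    rows = np.asarray(x_values)[:, None, None] + offsets[None, :, None]
    cols = np.asarray(y_values)[:, None, None] + offsets[None, None, :]
    return image[rows, cols].ravel()


rng = np.random.default_rng(1)
image = rng.integers(0, 255, size=(1024, 1280), dtype=np.uint8)
x_values = rng.integers(100, 900, size=50)
y_values = rng.integers(100, 900, size=50)

flat = gather_rois(image, x_values, y_values)

# Reference result built the same way as store_array in the benchmark above.
reference = np.array(
    [image[x : x + 64, y : y + 64] for x, y in zip(x_values, y_values)]
).ravel()
print(flat.shape, np.array_equal(flat, reference))
```

The result keeps the image's `uint8` dtype. Copying it into a `float64` buffer, as the benchmark's `np.zeros` arrays do, adds a type conversion, so matching the buffer dtype to whatever the CUDA routine expects may be worth checking.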
