所附的脚本针对不同大小的矩阵上的不同数量的并行进程计算numpy.congiate routine ,并记录相应的运行时间. 矩阵形状只在其第一维上变化(从1,64,64到256,64,64).共轭调用总是在1,64,64个子矩阵上进行,以确保正在处理的部分适合我系统上的L2缓存(每个内核256KB,在我的例子中L3缓存是25MB).运行该脚本将生成下图(具有略微不同的AX标签和 colored颜色 ).
正如您所看到的,从大约100、64、64的形状开始,运行时间取决于所使用的并行进程的数量.
What could be the cause of this ?
Or why is the dependence on the number of processes for matrices below (100,64,64) so low?
My main goal is to find a modification to this script such that the runtime becomes as independent as possible from the number of processes for matrices 'a' of arbitrary size.
In case of 20 Processes:
all 'a' matrices take at most: 20 * 16 * 256 * 64 * 64 Byte = 320MB
all 'b' sub matrices take at most: 20 * 16 * 1 * 64 * 64 Byte = 1.25MB
So all sub matrices fit simultaneously in L3 cache as well as individually in the L2 cache per core of my CPU.
I did only use physical cores no hyper-threading for these tests.
以下是脚本:
from multiprocessing import Process, Queue
import time
import numpy as np
import os
from matplotlib import pyplot as plt
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
def f(q,size):
a = np.random.rand(size,64,64) + 1.j*np.random.rand(size,64,64)
start = time.time()
n=a.shape[0]
for i in range(20):
for b in a:
b.conj()
duration = time.time()-start
q.put(duration)
def speed_test(number_of_processes=1,size=1):
number_of_processes = number_of_processes
process_list=[]
queue = Queue()
#Start processes
for p_id in range(number_of_processes):
p = Process(target=f,args=(queue,size))
process_list.append(p)
p.start()
#Wait until all processes are finished
for p in process_list:
p.join()
output = []
while queue.qsize() != 0:
output.append(queue.get())
return np.mean(output)
if __name__ == '__main__':
processes=np.arange(1,20,3)
data=[[] for i in processes]
for p_id,p in enumerate(processes):
for size_0 in range(1,257):
data[p_id].append(speed_test(number_of_processes=p,size=size_0))
fig,ax = plt.subplots()
for d in data:
ax.plot(d)
ax.set_xlabel('Matrix Size: 1-256,64,64')
ax.set_ylabel('Runtime in seconds')
fig.savefig('result.png')