我用concurrent.futures.ProcessPoolExecutor
从一个数字范围中找出一个数字的出现.其目的是调查从并发性中获得的加速性能.为了测试性能,我有一个控件——一个执行上述任务的串行代码(如下所示).我已经编写了两个并发代码,一个使用concurrent.futures.ProcessPoolExecutor.submit()
,另一个使用concurrent.futures.ProcessPoolExecutor.map()
来执行相同的任务.它们如下所示.关于起草前者和后者的建议分别见第here和here页.
分配给所有三个代码的任务是查找数字5在0到1E8范围内的出现次数..submit()
人和.map()
人都被分配了6名工人,.map()
人的区块大小为.submit()
00.在并发代码中,离散化工作负载的方式是相同的.然而,用于查找两个代码中出现的情况的函数是不同的.这是因为参数传递给.submit()
和.map()
调用的函数的方式不同.
所有3个代码报告的发生次数相同,即56953279次.然而,完成这项任务所需的时间却大不相同..submit()
的执行速度是对照组的2倍,而.map()
完成任务的时间是对照组的两倍.
Questions:
- 我想知道
.map()
的缓慢性能是我的代码造成的还是天生的缓慢?"如果是前者,我该如何改进它.我只是惊讶于它的表现比控制慢,因为没有太多动力使用它. - 我想知道是否有办法让
.submit()
个代码执行得更快.我有一个条件,函数_concurrent_submit()
必须返回一个iterable,其中的数字/出现次数包含数字5.
concurrent.futures.ProcessPoolExecutor.submit()
#!/usr/bin/python3.5
# -*- coding: utf-8 -*-
import concurrent.futures as cf
from time import time
from traceback import print_exc
def _findmatch(nmin, nmax, number):
'''Function to find the occurrence of number in range nmin to nmax and return
the found occurrences in a list.'''
print('\n def _findmatch', nmin, nmax, number)
start = time()
match=[]
for n in range(nmin, nmax):
if number in str(n):
match.append(n)
end = time() - start
print("found {0} in {1:.4f}sec".format(len(match),end))
return match
def _concurrent_submit(nmax, number, workers):
'''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
find the occurences of a given number in a number range in a parallelised
manner.'''
# 1. Local variables
start = time()
chunk = nmax // workers
futures = []
found =[]
#2. Parallelization
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
# 2.1. Discretise workload and submit to worker pool
for i in range(workers):
cstart = chunk * i
cstop = chunk * (i + 1) if i != workers - 1 else nmax
futures.append(executor.submit(_findmatch, cstart, cstop, number))
# 2.2. Instruct workers to process results as they come, when all are
# completed or .....
cf.as_completed(futures) # faster than cf.wait()
# 2.3. Consolidate result as a list and return this list.
for future in futures:
for f in future.result():
try:
found.append(f)
except:
print_exc()
foundsize = len(found)
end = time() - start
print('within statement of def _concurrent_submit():')
print("found {0} in {1:.4f}sec".format(foundsize, end))
return found
if __name__ == '__main__':
nmax = int(1E8) # Number range maximum.
number = str(5) # Number to be found in number range.
workers = 6 # Pool of workers
start = time()
a = _concurrent_submit(nmax, number, workers)
end = time() - start
print('\n main')
print('workers = ', workers)
print("found {0} in {1:.4f}sec".format(len(a),end))
concurrent.futures.ProcessPoolExecutor.map()
#!/usr/bin/python3.5
# -*- coding: utf-8 -*-
import concurrent.futures as cf
import itertools
from time import time
from traceback import print_exc
def _findmatch(listnumber, number):
'''Function to find the occurrence of number in another number and return
a string value.'''
#print('def _findmatch(listnumber, number):')
#print('listnumber = {0} and ref = {1}'.format(listnumber, number))
if number in str(listnumber):
x = listnumber
#print('x = {0}'.format(x))
return x
def _concurrent_map(nmax, number, workers):
'''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
find the occurrences of a given number in a number range in a parallelised
manner.'''
# 1. Local variables
start = time()
chunk = nmax // workers
futures = []
found =[]
#2. Parallelization
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
# 2.1. Discretise workload and submit to worker pool
for i in range(workers):
cstart = chunk * i
cstop = chunk * (i + 1) if i != workers - 1 else nmax
numberlist = range(cstart, cstop)
futures.append(executor.map(_findmatch, numberlist,
itertools.repeat(number),
chunksize=10000))
# 2.3. Consolidate result as a list and return this list.
for future in futures:
for f in future:
if f:
try:
found.append(f)
except:
print_exc()
foundsize = len(found)
end = time() - start
print('within statement of def _concurrent(nmax, number):')
print("found {0} in {1:.4f}sec".format(foundsize, end))
return found
if __name__ == '__main__':
nmax = int(1E8) # Number range maximum.
number = str(5) # Number to be found in number range.
workers = 6 # Pool of workers
start = time()
a = _concurrent_map(nmax, number, workers)
end = time() - start
print('\n main')
print('workers = ', workers)
print("found {0} in {1:.4f}sec".format(len(a),end))
Serial Code:
#!/usr/bin/python3.5
# -*- coding: utf-8 -*-
from time import time
def _serial(nmax, number):
start = time()
match=[]
nlist = range(nmax)
for n in nlist:
if number in str(n):match.append(n)
end=time()-start
print("found {0} in {1:.4f}sec".format(len(match),end))
return match
if __name__ == '__main__':
nmax = int(1E8) # Number range maximum.
number = str(5) # Number to be found in number range.
start = time()
a = _serial(nmax, number)
end = time() - start
print('\n main')
print("found {0} in {1:.4f}sec".format(len(a),end))
Update 13th Feb 2017:
除了@niemmi-answer,我还提供了以下个人研究的答案:
- 如何进一步加快@niemmi的
.map()
和.submit()
解决方案,以及 - 当
ProcessPoolExecutor.map()
可以导致比ProcessPoolExecutor.submit()
更快的速度.