I tried using this question to answer mine, but without success.
I'm using Python 3.10.
My dictionary is structured as follows (where each list of strings is a review of the product):
{storeNameA : {productA  : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string],
               productB  : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string],
               ...,
               product_n : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string]},
 storeNameB : {productA  : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string],
               productB  : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string],
               ...,
               product_n : 0 [string, string, ..., string]
                           1 [string, string, ..., string]
                           2 [string, string, ..., string]
                           ...
                           n [string, string, ..., string]}}
So I would access a single 'review' like dictionary['storeNameA']['productB'][0] or dictionary['storeNameB']['productB'][2]. The products are the same in each store.
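To make the layout concrete, here is a toy version of the structure. It is only a sketch: the review text is made up, and plain lists stand in for whatever indexed container actually holds the reviews of each product.

```python
# Toy stand-in for the real data; every review string here is made up.
# Each product maps to an indexed collection of reviews, and each review
# is a list of strings. Plain lists are an assumption so this runs anywhere.
dictionary = {
    "storeNameA": {
        "productA": [["great", "value"], ["arrived", "late"]],
        "productB": [["works", "fine"], ["too", "small"], ["would", "buy", "again"]],
    },
    "storeNameB": {
        "productA": [["not", "bad"]],
        "productB": [["broke", "quickly"], ["ok"], ["love", "it"]],
    },
}

# A single review is reached by store, then product, then position.
first_review = dictionary["storeNameA"]["productB"][0]
third_review = dictionary["storeNameB"]["productB"][2]

# The same product keys appear under every store.
same_products = dictionary["storeNameA"].keys() == dictionary["storeNameB"].keys()
```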
I am trying to run a process on every review in the entire dictionary.
def mapAllValues(nestedDict, func):
    # Apply func to the reviews of every product in every store.
    return {storeName: {product: func(prodFile) for product, prodFile in storeDict.items()}
            for storeName, storeDict in nestedDict.items()}

new_dictionary = mapAllValues(dictionary, lambda reviews: reviews.apply(processFunction))
# processFunction takes a list of strings and returns a list of tuples.
# So I end up with a new dictionary where there is now a list of tuples, where there was a list of strings.
# {storeName : {product : 0 [(str, str), (str, str), ..., (str, str)] and so on...
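As a small runnable illustration of the sequential version, here is mapAllValues applied to a toy dictionary. This is a sketch under assumptions: the processFunction below is a hypothetical stand-in that pairs each word with the next one (the real function is not shown), and a list comprehension replaces `.apply` because the toy data uses plain lists.

```python
def mapAllValues(nestedDict, func):
    # Apply func to the reviews of every product in every store.
    return {storeName: {product: func(prodFile) for product, prodFile in storeDict.items()}
            for storeName, storeDict in nestedDict.items()}


def processFunction(review):
    # Hypothetical stand-in: turn a list of strings into a list of
    # (word, next_word) tuples. The real processing is not shown here.
    return list(zip(review, review[1:]))


# Made-up toy data with the same nesting as the real dictionary.
dictionary = {
    "storeNameA": {"productA": [["great", "value"], ["arrived", "very", "late"]]},
    "storeNameB": {"productA": [["not", "bad"]]},
}

# With plain lists, a comprehension plays the role of reviews.apply(...).
new_dictionary = mapAllValues(
    dictionary, lambda reviews: [processFunction(review) for review in reviews]
)
```

The nesting is unchanged; only the innermost lists of strings become lists of tuples.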
It's a pretty long dictionary, and takes ~606 seconds to complete.
So I tried to run this in parallel, but it's obviously not working as I expect, because the parallel version takes ~2170 seconds. I do get the right output, though.
My question is: what am I doing wrong in the code below?
import multiprocessing

manager = multiprocessing.Manager()
d = manager.dict(dictionary)
container = manager.dict()

# Pre-create a shared dict for every store, then for every product in each store.
for key in d:
    container[key] = manager.dict()
for key in d['storeNameA']:
    container['storeNameA'][key] = manager.dict()
for key in d['storeNameB']:
    container['storeNameB'][key] = manager.dict()

# Here processFunction is called with (storeName, product, d, container).
with multiprocessing.Pool() as pool:
    pool.starmap(processFunction, [('storeNameA', product, d, container) for product in d['storeNameA']], chunksize=round(42739 / multiprocessing.cpu_count()))
    pool.starmap(processFunction, [('storeNameB', product, d, container) for product in d['storeNameB']], chunksize=round(198560 / multiprocessing.cpu_count()))

new_dictionary = dict(container)
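I'm sure I've misunderstood how this actually works, but it seems to me that this should split each store's products into chunks and process them in parallel?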
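To spell out the chunking I have in mind, here is a sketch of how a task list gets cut up for a given chunksize. It only mimics the splitting in plain Python and does not start any processes. The helper name split_into_chunks is made up for illustration, and the core count of 8 is an assumption, not my actual machine.

```python
from itertools import islice


def split_into_chunks(tasks, chunksize):
    # Hypothetical helper: yield successive lists of at most `chunksize` tasks,
    # roughly the way a pool groups tasks before handing them to workers.
    iterator = iter(tasks)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk


cpu_count = 8  # assumed core count, for illustration only
n_products = 42739  # the figure used for storeNameA in the chunksize above

chunksize = round(n_products / cpu_count)
tasks = [("storeNameA", i) for i in range(n_products)]
chunks = list(split_into_chunks(tasks, chunksize))

print(chunksize, len(chunks), len(chunks[-1]))
```

Under that assumption each chunk holds 5342 tasks, so there are eight full chunks plus one short leftover chunk of 3.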
Anyway, I think I've explained it as well as I can. If I need to clarify anything, please let me know.
Thank you in advance!