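I have CSV data with 65K rows. I need to do some processing on each CSV row, which produces a string at the end. I then have to write/append that string to a file.
Pseudocode: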
for row in csv_data:
    processed_string = ...
    file_pointer.write(processed_string + '\n')
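How can I make this write operation run in parallel, so that the main processing loop does not have to include the time spent writing to the file? I tried batched writes (store n rows, then write them all at once), but it would be great if you could suggest an approach that does the writing concurrently. Thanks.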
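For reference, a minimal sketch of what the batched-write approach could look like. This is an illustration under assumptions, not the actual code: `write_batched`, `process`, and `batch_size` are hypothetical names, and an in-memory `io.StringIO` stands in for the real output file.

```python
import io

def write_batched(rows, process, out, batch_size=1000):
    """Process each row and write the results to `out` in batches."""
    batch = []
    for row in rows:
        batch.append(process(row) + '\n')
        if len(batch) >= batch_size:
            out.writelines(batch)  # one call per batch instead of one per row
            batch.clear()
    if batch:                      # write the final, partially filled batch
        out.writelines(batch)

# Usage with a stand-in processing function and an in-memory file
buf = io.StringIO()
write_batched(range(5), lambda r: f"row {r}", buf, batch_size=2)
```

Batching like this reduces the number of write calls, but the writes still happen on the main thread, which is why it does not fully answer the concurrency question.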
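Edit: The CSV file contains 65K records. I process each record and generate a multiline string (roughly 10-12 lines), which I have to write to a file. Across the 65K records, each record produces 10-15 lines of output. The code normally takes about 10 minutes to run, but adding this file operation increases the run time by another 2-3 minutes. So, can I do the writing in parallel without affecting the rest of the code's execution?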
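One way the writing could be moved off the main loop is a background writer thread fed by a queue. The following is a sketch under assumptions, not code from the project: `run_with_background_writer`, `writer_loop`, `process`, and `max_pending` are hypothetical names, and `io.StringIO` stands in for the real file object.

```python
import io
import queue
import threading

_STOP = object()  # sentinel that tells the writer thread to finish

def writer_loop(q, out):
    """Take strings off the queue and write them until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _STOP:
            break
        out.write(item)

def run_with_background_writer(rows, process, out, max_pending=1000):
    q = queue.Queue(maxsize=max_pending)  # bounded, so pending output stays limited
    t = threading.Thread(target=writer_loop, args=(q, out))
    t.start()
    try:
        for row in rows:
            q.put(process(row) + '\n')    # the main loop only enqueues
    finally:
        q.put(_STOP)
        t.join()                          # wait until everything has been written

out = io.StringIO()
run_with_background_writer(range(3), lambda r: f"result {r}", out)
```

A single writer thread keeps the output in the same order as the input. Because CPython releases the GIL during blocking file I/O, this can overlap the writes with processing, but it will not help if the extra time is actually spent building the strings rather than writing them. The sketch also does not handle a failing writer: if `out.write` raises, the thread exits and a full queue would block the main loop.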
Here is the relevant part of the code.
for i in range(len(queries)):  # 65K runs
    Logs.log_query(i, name, version)
    # processed_results = Some processing ...
    # Final Answer
    s = final_results(name, version, processed_results)  # Returns a multiline string
    f.write(s + '\n')
"""
EXAMPLE OUTPUT:
-----------------
[0] NAME: Adobe Acrobat Reader DC | VERSION: 21.005
FAISS RESULTS (with cutoff 0.63)
id name version eol_date extended_eol_date major_version minor_version score
1486469 Adobe Acrobat Reader DC 21.005.20054 07-04-2020 07-07-2020 21 005 0.966597
327901 Adobe Acrobat Reader DC 21.005.20048 07-04-2020 07-07-2020 21 005 0.961541
327904 Adobe Acrobat Reader DC 21.007.20095 07-04-2020 07-07-2020 21 007 0.960825
327905 Adobe Acrobat Reader DC 21.007.20099 07-04-2020 07-07-2020 21 007 0.960557
327902 Adobe Acrobat Reader DC 21.005.20060 07-04-2020 07-07-2020 21 005 0.958580
327900 Adobe Acrobat Reader DC 21.001.20145 07-04-2020 07-07-2020 21 001 0.956085
327903 Adobe Acrobat Reader DC 21.007.20091 07-04-2020 07-07-2020 21 007 0.954148
1486465 Adobe Acrobat Reader DC 20.006.20034 07-04-2020 07-07-2020 20 006 0.941820
1486459 Adobe Acrobat Reader DC 19.012.20035 07-04-2020 07-07-2020 19 012 0.928502
1486466 Adobe Acrobat Reader DC 20.012.20048 07-04-2020 07-07-2020 20 012 0.928366
1486458 Adobe Acrobat Reader DC 19.012.20034 07-04-2020 07-07-2020 19 012 0.925761
1486461 Adobe Acrobat Reader DC 19.021.20047 07-04-2020 07-07-2020 19 021 0.922519
1486463 Adobe Acrobat Reader DC 19.021.20049 07-04-2020 07-07-2020 19 021 0.919659
1486462 Adobe Acrobat Reader DC 19.021.20048 07-04-2020 07-07-2020 19 021 0.917590
1486464 Adobe Acrobat Reader DC 19.021.20061 07-04-2020 07-07-2020 19 021 0.912260
1486460 Adobe Acrobat Reader DC 19.012.20040 07-04-2020 07-07-2020 19 012 0.909160
1486457 Adobe Acrobat Reader DC 15.008.20082 07-04-2020 07-07-2020 15 008 0.902536
327899 Adobe Acrobat DC 21.007.20099 07-04-2020 07-07-2020 21 007 0.895940
1277732 Acrobat Reader DC (classic) 2015 07-07-2020 * 2015 NaN 0.875471
OPEN SEARCH RESULTS (with cutoff 13)
{ "score": 67.98198, "id": 327901, "name": Adobe Acrobat Reader DC, "version": 21.005.20048, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
{ "score": 66.63623, "id": 327902, "name": Adobe Acrobat Reader DC, "version": 21.005.20060, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
{ "score": 65.96028, "id": 1486469, "name": Adobe Acrobat Reader DC, "version": 21.005.20054, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
FINAL ANSWER [OPENSEARCH]
{ "score": 67.98198, "id": 327901, "name": Adobe Acrobat Reader DC, "version": 21.005.20048, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
----------------------------------------------------------------------------------------------------
"""
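To check whether the extra 2-3 minutes really come from the file write rather than from building the multiline string, the two steps could be timed separately. This is a rough sketch under assumptions: `timed_loop` and `build_string` are hypothetical names, with `build_string` standing in for `final_results`, and `io.StringIO` stands in for the real file.

```python
import io
import time

def timed_loop(items, build_string, out):
    """Return (seconds spent building strings, seconds spent writing)."""
    build_s = 0.0
    write_s = 0.0
    for item in items:
        t0 = time.perf_counter()
        s = build_string(item)
        t1 = time.perf_counter()
        out.write(s + '\n')
        t2 = time.perf_counter()
        build_s += t1 - t0
        write_s += t2 - t1
    return build_s, write_s

sink = io.StringIO()
build_s, write_s = timed_loop(range(1000), lambda i: f"[{i}] NAME: example", sink)
print(f"building: {build_s:.4f}s, writing: {write_s:.4f}s")
```

If most of the time turns out to be in the string-building step, moving the write to another thread would likely change little, and the formatting itself would be the part to optimize.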