Python3.x: Writing to a file in parallel while processing in a loop in Python

Posted on February 23

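I have 65K rows of CSV data. I need to do some processing on each CSV row, which produces a string at the end, and I have to write/append that string to a file.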

Pseudocode:

for row in csv_data:
    processed_string = ...
    file_pointer.write(processed_string + '\n')

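How can I make this write operation run in parallel, so that the main processing does not have to include the time spent writing to the file? I tried batching the writes (storing n lines and then writing them all at once), but it would be great if you could recommend a way to do the writing concurrently. Thanks.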
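The batching approach mentioned above can be sketched as follows. This is a minimal sketch, assuming a hypothetical `process_row()` in place of the real per-row processing. It only reduces the number of `write()` calls; the writes still happen on the main thread. Because files opened with `open()` are already buffered by default, the gain from manual batching may be small.

```python
import csv
import io


def process_row(row):
    # Hypothetical stand-in for the real per-row processing.
    return " | ".join(row)


def write_in_batches(rows, out, batch_size=1000):
    """Buffer processed strings and write them with one writelines() call per batch."""
    batch = []
    for row in rows:
        batch.append(process_row(row) + "\n")
        if len(batch) >= batch_size:
            out.writelines(batch)
            batch.clear()
    if batch:  # write whatever is left after the last full batch
        out.writelines(batch)


if __name__ == "__main__":
    # In-memory CSV input and output so the example runs without touching the disk.
    rows = csv.reader(io.StringIO("a,b\nc,d\ne,f\n"))
    out = io.StringIO()
    write_in_batches(rows, out, batch_size=2)
    print(out.getvalue(), end="")
```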

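Edit: The CSV file has 65K records. I process each one and generate a string (multiline, roughly 10-12 lines) that I have to write to a file, so each of the 65K records produces 10-15 lines of output. The code normally takes about 10 minutes to run, but adding this file operation increases that by 2-3 minutes. Can I do the writing in parallel so that it doesn't slow down the rest of the code?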
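One common way to take the writing off the main loop is a producer/consumer setup: the main loop puts finished strings on a `queue.Queue`, and a single background thread drains the queue and writes to the file. This is a minimal sketch, assuming the per-record processing is passed in as a function (`process` and `run` are hypothetical names) and using `None` as an end-of-stream sentinel. CPython releases the GIL while the underlying OS write call is in progress, so some overlap with processing is possible, but the writer thread's string handling still competes with the main loop for the GIL, so the actual saving should be measured.

```python
import os
import queue
import tempfile
import threading

_SENTINEL = None  # put on the queue to tell the writer thread to stop


def writer(path, q):
    """Drain strings from q and write them to path until the sentinel arrives."""
    with open(path, "w", encoding="utf-8") as f:
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            f.write(item)


def run(records, path, process):
    # Bounded queue: if the writer falls behind, put() blocks instead of
    # letting memory grow without limit. (Sketch: if the writer thread
    # itself crashes, put() may block forever; that case is not handled.)
    q = queue.Queue(maxsize=10_000)
    t = threading.Thread(target=writer, args=(path, q))
    t.start()
    try:
        for rec in records:
            q.put(process(rec) + "\n")  # the main loop only enqueues
    finally:
        q.put(_SENTINEL)  # stop the writer even if processing raised
        t.join()          # wait until every queued string has been written


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.txt")
    run(range(3), path, lambda r: f"record {r}")
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```

Using a single writer thread keeps the output in the same order as the input, and the bounded queue caps memory use if writing falls behind processing.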

Here is the relevant part of the code:

for i in range(len(queries)): # 65K runs
    Logs.log_query(i, name, version)

    # processed_results = Some processing ...

    # Final Answer
    s = final_results(name, version, processed_results) # Returns a multiline string
    f.write(s + '\n')

"""
EXAMPLE OUTPUT:
-----------------
[0] NAME: Adobe Acrobat Reader DC | VERSION: 21.005
FAISS RESULTS (with cutoff 0.63)
     id                                               name                         version   eol_date extended_eol_date                   major_version minor_version    score
1486469                            Adobe Acrobat Reader DC                    21.005.20054 07-04-2020        07-07-2020                              21           005 0.966597
 327901                            Adobe Acrobat Reader DC                    21.005.20048 07-04-2020        07-07-2020                              21           005 0.961541
 327904                            Adobe Acrobat Reader DC                    21.007.20095 07-04-2020        07-07-2020                              21           007 0.960825
 327905                            Adobe Acrobat Reader DC                    21.007.20099 07-04-2020        07-07-2020                              21           007 0.960557
 327902                            Adobe Acrobat Reader DC                    21.005.20060 07-04-2020        07-07-2020                              21           005 0.958580
 327900                            Adobe Acrobat Reader DC                    21.001.20145 07-04-2020        07-07-2020                              21           001 0.956085
 327903                            Adobe Acrobat Reader DC                    21.007.20091 07-04-2020        07-07-2020                              21           007 0.954148
1486465                            Adobe Acrobat Reader DC                    20.006.20034 07-04-2020        07-07-2020                              20           006 0.941820
1486459                            Adobe Acrobat Reader DC                    19.012.20035 07-04-2020        07-07-2020                              19           012 0.928502
1486466                            Adobe Acrobat Reader DC                    20.012.20048 07-04-2020        07-07-2020                              20           012 0.928366
1486458                            Adobe Acrobat Reader DC                    19.012.20034 07-04-2020        07-07-2020                              19           012 0.925761
1486461                            Adobe Acrobat Reader DC                    19.021.20047 07-04-2020        07-07-2020                              19           021 0.922519
1486463                            Adobe Acrobat Reader DC                    19.021.20049 07-04-2020        07-07-2020                              19           021 0.919659
1486462                            Adobe Acrobat Reader DC                    19.021.20048 07-04-2020        07-07-2020                              19           021 0.917590
1486464                            Adobe Acrobat Reader DC                    19.021.20061 07-04-2020        07-07-2020                              19           021 0.912260
1486460                            Adobe Acrobat Reader DC                    19.012.20040 07-04-2020        07-07-2020                              19           012 0.909160
1486457                            Adobe Acrobat Reader DC                    15.008.20082 07-04-2020        07-07-2020                              15           008 0.902536
 327899                                   Adobe Acrobat DC                    21.007.20099 07-04-2020        07-07-2020                              21           007 0.895940
1277732                        Acrobat Reader DC (classic)                            2015 07-07-2020                 *                            2015           NaN 0.875471

OPEN SEARCH RESULTS (with cutoff 13)
{ "score": 67.98198, "id": 327901, "name": Adobe Acrobat Reader DC, "version": 21.005.20048, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
{ "score": 66.63623, "id": 327902, "name": Adobe Acrobat Reader DC, "version": 21.005.20060, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
{ "score": 65.96028, "id": 1486469, "name": Adobe Acrobat Reader DC, "version": 21.005.20054, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
FINAL ANSWER [OPENSEARCH]
{ "score": 67.98198, "id": 327901, "name": Adobe Acrobat Reader DC, "version": 21.005.20048, "eol_date": 2020-04-07, "extended_eol_date": 2020-07-07 }
----------------------------------------------------------------------------------------------------

"""

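Applied to the loop shape above, a similar effect can be had with `concurrent.futures.ThreadPoolExecutor(max_workers=1)`: the loop submits each `f.write` call and moves on, and the single worker thread performs the writes in submission order. In this sketch, `final_results()` and the processing step are hypothetical stubs, `queries` is assumed to be a list of `(name, version)` pairs, and `Logs.log_query()` is omitted.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def final_results(name, version, processed_results):
    # Hypothetical stub for the question's final_results(); returns a multiline string.
    return f"NAME: {name} | VERSION: {version}\n{processed_results}"


def process_all(queries, out_path):
    # queries is assumed (hypothetically) to be a list of (name, version) pairs.
    with open(out_path, "w", encoding="utf-8") as f:
        # max_workers=1: a single thread runs the writes one at a time,
        # in the order they were submitted, so output order matches input order.
        with ThreadPoolExecutor(max_workers=1) as pool:
            for i, (name, version) in enumerate(queries):
                processed_results = f"result for query {i}"  # placeholder processing
                s = final_results(name, version, processed_results)
                pool.submit(f.write, s + "\n")  # returns immediately
        # Leaving the inner block calls pool.shutdown(wait=True), so every
        # pending write has finished before the outer block closes f.


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.txt")
    process_all([("Adobe Acrobat Reader DC", "21.005")], path)
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```

Two caveats with this design: the executor's internal work queue is unbounded, so if processing is much faster than writing, pending strings accumulate in memory; and an exception raised by `f.write` is stored in the returned `Future`, so it goes unnoticed unless you keep the futures and call `result()` on them.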

