Python: detect data in txt files, parse the strings, and output a CSV file

Posted on July 30

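Here is my code. My task is to scan a folder containing a bunch of text files, then parse the strings and write the data to a CSV file. Can you give me some hints on how to do this? I'm struggling.

The first step of my code is to detect where the data sits in each txt file. I found that every data block starts with 'Read', so I locate the line where the data begins in each file. After that, I got stuck on how to export the data to a CSV file.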

import os
import argparse
import csv
from typing import List


def validate_directory(path):
    if os.path.isdir(path):
        return path
    else:
        raise NotADirectoryError(path)


def get_data_from_file(file) -> List[str]:
    ignore_list = ["Read Segment", "Read Disk", "Read a line", "Read in"]
    data = []
    with open(file, "r", encoding="latin1") as f:
        try:
            lines = f.readlines()
        except Exception as e:
            print(f"Unable to process {file}: {e}")
            return []
        for line_number, line in enumerate(lines, start=1):
            if not any(variation in line for variation in ignore_list):
                if line.strip().startswith("Read ") and not line.strip().startswith("Read ("): # TODO: fix this with better regex
                    data.append(f'Found "Read" at line {line_number} in {file}')
                    print(f'Found "Read" at {file}:{line_number}')
                    print(lines[line_number-1])
    return data


def list_read_data(directory_path: str) -> List[str]:
    total_data = []
    for root, _, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".txt"):
                data = get_data_from_file(os.path.join(root, file_name))
                total_data.extend(data)

    return total_data


def write_results_to_csv(output_file: str, data: List[str]):
    with open(output_file, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Results"])
        for line in data:
            writer.writerow([line])


def main(directory_path: str, output_file: str):
    data = list_read_data(directory_path)
    write_results_to_csv(output_file, data)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Process the 2020Model folder for input data."
    )
    parser.add_argument(
        "--directory", type=validate_directory, help="folder to be processed"
    )
    parser.add_argument("--output", type=str, help="Output file name (e.g., outputfile.csv)", default="outputfile.csv")

    args = parser.parse_args()
    main(os.path.abspath(args.directory), args.output)
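
The TODO in get_data_from_file asks for a better regex. A minimal sketch of one possibility follows, assuming a data marker is always the word Read followed by exactly one identifier (as in "Read TotPkSav"). The names READ_MARKER, IGNORED_NAMES and match_read_marker are made up for illustration, and the ignore set is only a guess derived from the ignore_list above.

```python
import re

# Assumption: a marker line is "Read <Name>" with a single identifier after it.
# Lines such as "Read (..." or "Read a line" fail the pattern on their own.
READ_MARKER = re.compile(r"^Read\s+([A-Za-z_]\w*)$")

# Assumption: single words that follow "Read" in prose lines, not in data markers.
IGNORED_NAMES = {"Segment", "Disk", "in"}


def match_read_marker(line: str):
    """Return the name after 'Read', or None if the line is not a data marker."""
    match = READ_MARKER.match(line.strip())
    if match is None or match.group(1) in IGNORED_NAMES:
        return None
    return match.group(1)


print(match_read_marker("Read TotPkSav"))
print(match_read_marker("Read (some comment)"))
```

Anchoring the pattern to the whole line is stricter than the startswith test in the code above, so it should be checked against the real files before it replaces that test.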

Here is my ideal CSV output:

1985	1986	1986	1987	1988	1989	1990	1991	1992	1993	1994
37839	36962	37856	41971	40838	44640.87	42826.34	44883.03	43077.59	45006.49	46789

Can you give me some hints?

Where should the string parsing go?
How do I write the output to a CSV file?

Below is a sample txt file:

Select Year(2007-2025)
Read TotPkSav
/2007     2008     2009     2010     2011     2012     2013     2014     2015     2016     2017     2018     2019     2020     2021     2022     2023     2024     2025 
   00       27       53       78      108      133      151      161      169      177      186      195      205      216      229      242      257      273      288
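
As a rough sketch of the string parsing for a block like this sample: the line after the Read marker holds the years (with a leading slash) and the next line holds the values, so both can be split on whitespace. The helper name parse_block is hypothetical, and the layout assumption (exactly one header line and one value line) comes only from this one sample.

```python
SAMPLE = """Select Year(2007-2025)
Read TotPkSav
/2007     2008     2009     2010     2011     2012     2013     2014     2015     2016     2017     2018     2019     2020     2021     2022     2023     2024     2025
   00       27       53       78      108      133      151      161      169      177      186      195      205      216      229      242      257      273      288
"""


def parse_block(header_line: str, value_line: str):
    """Split a year line and a value line into two equally long lists of strings."""
    # Drop the leading "/" on the year line, then split on any run of whitespace.
    header = header_line.strip().lstrip("/").split()
    values = value_line.split()
    if len(header) != len(values):
        raise ValueError(f"{len(header)} years but {len(values)} values")
    return header, values


lines = SAMPLE.splitlines()
header, values = parse_block(lines[2], lines[3])
print(header[:3], values[:3])  # ['2007', '2008', '2009'] ['00', '27', '53']
```

The values stay as strings here, so "00" is written out unchanged; convert with float() only if numeric output is needed.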

Recommended answer

import csv
import glob

all_rows: list[list[str]] = []

# Walk every .txt file below the current directory.
for fname in glob.glob("**/*.txt", recursive=True):
    with open(fname, encoding="iso-8859-1") as f:
        print(f"reading {fname}")
        lines = [x.strip() for x in f]

    # The sample file has four lines: title, "Read ...", years, values.
    if len(lines) < 4:
        print(f"skipping {fname} with too few lines")
        continue

    line2 = lines[1]
    if line2[:4] != "Read" or line2[:6] == "Read (":
        print(f'skipping {fname} with line2 = "{line2}"')
        continue

    line3, line4 = lines[2:4]
    # The year line starts with a slash that is not part of the data.
    if line3.startswith("/"):
        line3 = line3[1:]

    header = line3.split()
    data = line4.split()
    all_rows.append(header)
    all_rows.append(data)

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Result"])
    writer.writerows(all_rows)

| Result |
|--------|
| 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
| 00 | 27 | 53 | 78 | 108 | 133 | 151 | 161 | 169 | 177 | 186 | 195 | 205 | 216 | 229 | 242 | 257 | 273 | 288 |
| 2099 | 2098 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
| 00 | 27 | 53 | 78 | 108 | 133 | 151 | 161 | 169 | 177 | 186 | 195 | 205 | 216 | 229 | 242 | 257 | 273 | 288 |
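
On the question of where the parsing belongs, one option is to do it in the function that reads each file, so that it returns rows instead of "Found ..." messages. The sketch below keeps the os.walk layout of the question's code. The function names extract_rows, write_rows_to_csv and convert_folder are hypothetical, and it assumes every Read marker is followed directly by one year line and one value line. It also leaves out the question's ignore_list and relies on the length check instead, which may not be enough for real files.

```python
import csv
import os
from typing import List


def extract_rows(path: str) -> List[List[str]]:
    """Return [years, values, years, values, ...] for each 'Read <name>' block."""
    with open(path, "r", encoding="latin1") as f:
        lines = [line.strip() for line in f]
    rows: List[List[str]] = []
    for i, line in enumerate(lines):
        # Same marker test as in the question, without the ignore list.
        if not line.startswith("Read ") or line.startswith("Read ("):
            continue
        if i + 2 >= len(lines):
            continue  # marker too close to the end of the file
        header = lines[i + 1].lstrip("/").split()
        values = lines[i + 2].split()
        # Keep the block only if years and values line up one to one.
        if header and len(header) == len(values):
            rows.extend([header, values])
    return rows


def write_rows_to_csv(output_file: str, rows: List[List[str]]) -> None:
    """Write each list as one CSV row."""
    with open(output_file, "w", newline="", encoding="utf-8") as csvfile:
        csv.writer(csvfile).writerows(rows)


def convert_folder(directory_path: str, output_file: str) -> int:
    """Parse every .txt file under directory_path; return the number of rows written."""
    all_rows: List[List[str]] = []
    for root, _, files in os.walk(directory_path):
        for file_name in sorted(files):
            if file_name.endswith(".txt"):
                all_rows.extend(extract_rows(os.path.join(root, file_name)))
    write_rows_to_csv(output_file, all_rows)
    return len(all_rows)
```

Calling convert_folder("2020Model", "outputfile.csv") would then take the place of main() in the question, with the folder name being whatever is passed to --directory.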
