Python: how can I split a text file into columns, but only for specific rows?

Posted on February 13

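I have a .psa file that I need to work with, but as far as I know the file has to be in .txt or .csv format before it can be manipulated with pandas. To get around this, I am reading the original file and writing its contents to another .txt file, which my code is then applied to.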

The original .psa file contains text data, all separated by commas. I am trying to organize this data into columns and pull out only the data I need. Each line has 30+ comma-separated values, but I only need the 3rd value, which should go into a column.
I will also have a zip folder, and this code needs to run through it and do the same thing to each file inside. Each file will have a different store number in its title.

For example:

File name: 1 Area 2 - store 15 group.psa

prod,123,456,abc,def, etc...
pla,124,uhj,jop,etc. 
prod,321,789,ghi,jkl, etc...
...

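Expected output: I only want to pull the third item from the rows that start with prod and put it into a .csv file. I would also like to keep the original file's title in another column (it would be great if that column contained only the store number, but that is not required). For example: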

nums	store #
456	15
789	15

Here is the code I have so far:

import pandas as pd

with open('1 Area 2 - store 15 group.psa','r') as firstfile, open('test.txt','a') as secondfile:
    # read content from first file 
    for line in firstfile: 
         # append content to second file 
         secondfile.write(line)

file = pd.read_csv("test.txt", sep=',', usecols=[0,1,2], header=0, names=['col 1','col 2','col 3'])
file.to_csv("output.csv", index=False)

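This code does produce columns as output, but the rows end up including lines that do not start with prod, and I get 3 columns instead of just the nums column (I get an error when I start with only usecols=2). So the data is still messy, and I don't know how to get the original file's title into the second column.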

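Recommended answer

import re
import pandas as pd

fname = '1 Area 2 - store 15 group.psa'

# Read only the first and third fields; the file has no header row
df = pd.read_csv(fname, usecols=[0, 2], header=None, names=['type', 'num'])

# Pull the store number out of the file name
store = re.search(r'store\s+(\d+)', fname).group(1)

# Keep the 'prod' rows, drop the helper column, and add the store number
df = df[df['type'] == 'prod'].drop(columns='type').assign(store=store)
df.to_csv("output.csv", index=False)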
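The question also asks for the same processing on every file inside a zip folder. The following is only a minimal sketch of one way to do that, and it uses the standard library (`csv`, `zipfile`) instead of pandas. The function names `extract_prod_rows` and `zip_to_csv`, the UTF-8 encoding, and the `.psa` extension filter are assumptions made for illustration and do not come from the original post.

```python
import csv
import io
import re
import zipfile


def extract_prod_rows(lines, filename):
    """Yield (num, store) for each row whose first field is 'prod'."""
    # Pull the store number out of the file name; None if the pattern is absent.
    match = re.search(r'store\s+(\d+)', filename)
    store = match.group(1) if match else None
    for row in csv.reader(lines):
        # Skip short rows and anything that is not a 'prod' row.
        if len(row) >= 3 and row[0].strip() == 'prod':
            yield row[2].strip(), store


def zip_to_csv(zip_path, out_path):
    """Write the third field of every 'prod' row from each .psa file in a zip."""
    with zipfile.ZipFile(zip_path) as archive, \
            open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['nums', 'store #'])
        for name in archive.namelist():
            # Assumption: only members ending in .psa should be processed.
            if not name.lower().endswith('.psa'):
                continue
            with archive.open(name) as raw:
                # ZipFile.open yields bytes; wrap it so csv can read text.
                # Assumption: the files are UTF-8 encoded.
                text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                writer.writerows(extract_prod_rows(text, name))
```

With a hypothetical archive called `stores.zip`, calling `zip_to_csv('stores.zip', 'output.csv')` would write one combined file with a `nums` column and a `store #` column. Reading the archive members directly also avoids copying each .psa file to a temporary .txt file first.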
