I am trying to search for a string inside a zip file that has the following structure:

├───some_zip_file.zip
│   ├──some_directory
│   │   ├──zip1.zip
│   │   ├──zip2.zip
│   │   ├──zip3.zip
│   │   │   ├──File1    <-- search in this file

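I am using the code below, but it does not search inside the nested zip files in the folder. I have looked through many online forums and could not find a solution. Can anyone help?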

import io
import zipfile

search_string = 'ERROR '
exclude_file = 'Test'
with zipfile.ZipFile("C:\\Python testing\\Logs-node1.zip") as zf:
    for filename in zf.infolist():
        if filename.find(exclude_file) == -1:
            with io.TextIOWrapper(zf.open(filename), encoding="utf-8", errors='ignore') as f:
                for line_no, line in enumerate(f, 1):
                    if search_string.lower() in line.lower():
                        # print(filename + ' : ' + str(line_no) + ' : ' + line)
                        file1 = open("C:\\Python testing\\my file.txt", "w")
                        file1.write(filename + ' : ' + str(line_no) + ' : ' + line)
                        file1.close()
            f.close()

Recommended answer

You should be able to follow what is happening in the code.

import zipfile
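
search_string = 'ERROR '
exclude_file = 'Test.txt' # include filename with extension
outputfile = 'C:\\Python testing\\my file.txt'
rootzipfile = 'C:\\Python testing\\Logs-node1.zip'
# Entry names inside a zip archive always use '/', whatever the host OS,
# so os.sep ('\\' on Windows) would never match them.
PATH_SEPARATOR = '/'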
                
def find_string_in_nested_zip(filepath, zf=None, extra_path_info=''):
    if not(is_zip_or_nested_zip(filepath, zf)) and not(is_dir(filepath)) and \
        not(filepath.endswith(exclude_file)) and file_contains_text(filepath, zf):
        write_text_with_line_no(filepath, zf, extra_path_info)
    
    elif zipfile.is_zipfile(filepath):
        make_recursive_call_for_zip(filepath)
    
    elif is_nested_zip(filepath, zf):
        with zf.open(filepath, 'r') as f:
            make_recursive_call_for_zip(f, create_extra_path_info(filepath))

# must open the nested zip file before checking otherwise doesn't give the correct result             
def is_nested_zip(filename, zf):
    with zf.open(filename, 'r') as f:
        if zipfile.is_zipfile(f):
            return True
        return False
    
def is_zip_or_nested_zip(filename, zf):
    return zipfile.is_zipfile(filename) or is_nested_zip(filename, zf)

# os.path.isdir() does not work for dirs inside zips so we write our own
def is_dir(filepath):
    return filepath.endswith(PATH_SEPARATOR)
                
def file_contains_text(filename, zf):
    with zf.open(filename, 'r') as f:
        if search_string.encode() in f.read():
            return True
        return False

def write_text_with_line_no(filename, zf, extra_path_info):
    with zf.open(filename, 'r') as f, open(outputfile, 'a+') as op:
        for (i, line) in enumerate(f):
            if search_string.encode() in line:
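                # errors='ignore' keeps a stray non-UTF-8 byte in a log from raising UnicodeDecodeError
                op.write(f'{extra_path_info + PATH_SEPARATOR + filename}: line{str(i+1)} : {line.decode(errors="ignore")}')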

# extra_path_info makes sure when accessing nested zips the path up to the nested zip is preserved
def create_extra_path_info(filepath):
    return PATH_SEPARATOR.join(filepath.split(PATH_SEPARATOR)[:-1])

def make_recursive_call_for_zip(filepath, extra_path_info=''):
    with zipfile.ZipFile(filepath, 'r') as zf:
        for filename in zf.namelist():
            find_string_in_nested_zip(filename, zf, extra_path_info)

find_string_in_nested_zip(rootzipfile)
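
If you prefer something more compact, the same recursive idea can be sketched as a single generator. This is only an illustration under a few assumptions, not a drop-in replacement for the code above: the names `search_nested_zip`, `prefix` and `exclude` are made up for this example, every entry is read fully into memory (fine for logs, not for very large files), and the match is case-insensitive as in the question's code.

```python
import io
import zipfile


def search_nested_zip(source, needle, prefix="", exclude=()):
    """Yield (path, line_no, line) for every line that contains needle.

    source is a filesystem path or a seekable binary file object.
    prefix and exclude are hypothetical helper parameters for this sketch.
    """
    needle = needle.lower()
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no content to search
            path = prefix + info.filename
            if any(path.endswith(name) for name in exclude):
                continue
            data = zf.read(info)  # whole entry in memory (assumption: it fits)
            buffer = io.BytesIO(data)
            if zipfile.is_zipfile(buffer):
                # Nested archive: recurse on the in-memory copy.
                buffer.seek(0)
                yield from search_nested_zip(buffer, needle, path + "/", exclude)
                continue
            text = data.decode("utf-8", errors="ignore")
            for line_no, line in enumerate(text.splitlines(), 1):
                if needle in line.lower():
                    yield path, line_no, line
```

You would call it with the root archive, for example `search_nested_zip(rootzipfile, search_string, exclude=(exclude_file,))`, and write each yielded tuple to the output file, opening that file once rather than once per match.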
