I am trying to parse a log file with the following format:

===================
DateTimeStamp - ShortSummaryOfEntry
===================

Line 1 of text
Line 2 of text 
...
Line last of text

===================
DateTimeStamp - ShortSummaryOfEntry
===================

Line 1 of text
Line 2 of text 
...
Line last of text

===================
DateTimeStamp - ShortSummaryOfEntry
===================

Line 1 of text
Line 2 of text 
...
Line last of text

....

I tried the pattern below, in many variations, but none of them worked:

(={19}\n(.*)\n={19}\n)(\n.*)+(?=={19})

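The lookahead at the end seems to be overridden by the preceding "+". Group 0 contains the entire file; groups 1 and 2 are correct; group 3 (the multi-line text data) is empty.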
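The behaviour described above can be reproduced with a small sketch. The two-entry log below is made-up sample data, and the names SEP, log and pat are only for illustration. Without re.S, "." does not match a newline, but (\n.*)+ still consumes whole lines one at a time, including the "=" rules. Because "+" is greedy, it runs to the end of the text and then backtracks only as far as the last place where 19 "=" characters follow. A repeated group also keeps only its final iteration, which here is a lone newline.

```python
import re

SEP = "=" * 19
# Made-up two-entry log in the format from the question.
log = (
    f"{SEP}\nA - first\n{SEP}\n\nline 1\nline 2\n\n"
    f"{SEP}\nB - second\n{SEP}\n\nline 3\nline 4\n"
)

# The pattern from the question, unchanged.
pat = r"(={19}\n(.*)\n={19}\n)(\n.*)+(?=={19})"

m = re.search(pat, log)
print(repr(m.group(2)))             # 'A - first'
print(repr(m.group(3)))             # '\n' (only the last repetition is kept)
print(repr(m.group(0)[-11:]))       # 'B - second\n' (the match ran into entry B)
```

So the lookahead is honoured, but only at the last rule in the file, which is why group 0 swallows almost everything and group 3 looks empty.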

Recommended answer

You can rewrite the regex with non-greedy quantifiers and fewer capture groups (regex101):

import re

text = """\
===================
DateTimeStamp - ShortSummaryOfEntry
===================

1 Line 1 of text
2 Line 2 of text
3 Line last of text

===================
DateTimeStamp - ShortSummaryOfEntry
===================

4 Line 1 of text
5 Line 2 of text
6 Line last of text

===================
DateTimeStamp - ShortSummaryOfEntry
===================

7 Line 1 of text
8 Line 2 of text
9 Line last of text"""


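# Header: a title line between two rules of 19 "=" characters.
# Body: matched lazily up to the next rule or the end of the string (\Z).
pat = r"={19}\n(.+?)\n={19}\s*(.+?)\s*(?=={19}|\Z)"

# re.S lets "." match newlines, so a body can span several lines.
for title, body in re.findall(pat, text, flags=re.S):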
    print(title)
    print(body)
    print("-" * 80)

Prints:

DateTimeStamp - ShortSummaryOfEntry
1 Line 1 of text
2 Line 2 of text
3 Line last of text
--------------------------------------------------------------------------------
DateTimeStamp - ShortSummaryOfEntry
4 Line 1 of text
5 Line 2 of text
6 Line last of text
--------------------------------------------------------------------------------
DateTimeStamp - ShortSummaryOfEntry
7 Line 1 of text
8 Line 2 of text
9 Line last of text
--------------------------------------------------------------------------------
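If the entries are needed as structured records rather than printed text, the same pattern can be wrapped in a small generator. This is only a sketch: LogEntry, ENTRY_RE and iter_entries are hypothetical names, the sample timestamps are invented, and it assumes the header line separates the timestamp from the summary with the first " - ", as the format in the question suggests.

```python
import re
from typing import Iterator, NamedTuple


class LogEntry(NamedTuple):
    """One parsed log entry (hypothetical record type for this sketch)."""
    timestamp: str
    summary: str
    body: str


# Same pattern as above, with named groups for readability.
ENTRY_RE = re.compile(
    r"={19}\n(?P<title>.+?)\n={19}\s*(?P<body>.+?)\s*(?=={19}|\Z)",
    flags=re.S,
)


def iter_entries(text: str) -> Iterator[LogEntry]:
    """Yield one LogEntry per header/body pair found in text."""
    for m in ENTRY_RE.finditer(text):
        # Assumption: the first " - " separates timestamp from summary.
        timestamp, _, summary = m.group("title").partition(" - ")
        yield LogEntry(timestamp, summary, m.group("body"))


SEP = "=" * 19
# Made-up sample data in the format from the question.
sample = (
    f"{SEP}\n2024-01-01 10:00 - Service started\n{SEP}\n\n"
    "first line\nsecond line\n\n"
    f"{SEP}\n2024-01-01 10:05 - Disk warning\n{SEP}\n\nonly line\n"
)

entries = list(iter_entries(sample))
for entry in entries:
    print(entry.timestamp, "|", entry.summary, "|", entry.body.splitlines())
```

Note that a body containing a run of 19 or more "=" characters would end the entry early, because the lookahead does not require the rule to sit on a line of its own.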
