Python: How to fix this RegEx pattern so that it extracts every occurrence of the matching substrings in a string

Posted on August 20

My goal with this code is to replace only the substrings that are preceded and followed by a specific pattern (I use a regex to define that pattern).

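I have tried many approaches without getting good results. Here I use the compile() method to compile the RegEx pattern into a regex pattern object and search the input string with it (basically, I extract the matches of the substrings that satisfy the RegEx pattern and that I want to modify).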

Then I can simply use the replace() function (pardon the redundancy) to replace each extracted substring with the one I want.

import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

#the string with which I will replace the desired substrings in the original input string
content_fix = " "

##This is the regex pattern that tries to establish the condition in which the substring should be replaced by the other
#pat = re.compile(r"\b(?:previous string)\s*string that i need\s*(?:string below)?", flags=re.I, )
#pat = re.compile(r"\d\s*(?:y)\s*\d", flags=re.I, )
pat = re.compile(r"\d\s*(?:, y |,y |y )\s*(?:las \d|la \d|\d)", flags=re.I, )

x = pat.findall(input_text)
print(*map(str.strip, x), sep="\n") #prints the substrings that the for loop below will try to replace
content_list = []
content_list.append(list(map(str.strip, x)))
for content in content_list[0]:
    input_text = input_text.replace(content, content_fix) # "\d y \d"  ---> "\d \d"

print(repr(input_text))

This is the output I get:

'y creo que hay 5  casas, y quizas alguna mas... yo creo que empezaria entre la  7'

This is the correct output that I need:

'y creo que hay 55 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 27'
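Comparing the two outputs, the lost "5" and "2" point at the pattern itself rather than at the loop. A quick check, reusing the question's pattern and input unchanged, shows what findall() actually hands to the replace() loop:

```python
import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

# The question's pattern, unchanged.
pat = re.compile(r"\d\s*(?:, y |,y |y )\s*(?:las \d|la \d|\d)", flags=re.I)

matches = pat.findall(input_text)
print(matches)  # ['5 y 6', '1 ,y las 2']

# Each \d matches exactly one digit, and that digit is part of the match.
# So the first match begins at the second "5" of "55" and the second match
# ends at the "2" of "27"; replacing a whole match with " " also deletes
# every digit inside it.
result = input_text
for m in matches:
    result = result.replace(m, " ")
print(repr(result))
# 'y creo que hay 5  casas, y quizas alguna mas... yo creo que empezaria entre la  7'
```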

What changes should I make to the RegEx so that it extracts the correct substrings and achieves the goal of this code?

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... \
yo creo que empezaria entre la 1 ,y las 27"
re.sub(r'((\d+\s+)y\s+(\d+))| ((\d+\s+),y\s+\w{3}\s+(\d+))', r'\2\3 \5\6', input_text)
y creo que hay 55 6  casas, y quizas alguna mas... yo creo que empezaria entre la 1 27
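Another way to reach the exact desired output, sketched here under the assumption that the connector is always a standalone "y", optionally preceded by a comma and optionally followed by "la" or "las": match only the connector and wrap it in the lookarounds (?<=\d) and (?=\d), so a digit must be present on each side but is never part of the match. Replacing each match with a single space then leaves "55", "6", "1" and "27" untouched, and no extra space is added after "6" (the r'\2\3 \5\6' template above always appends a literal space).

```python
import re

input_text = "y creo que hay 55 y 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 ,y las 27"

# (?<=\d)       a digit must come right before the match (not consumed)
# \s*,?\s*y\s+  the word "y", optionally preceded by a comma
# (?:las?\s+)?  optional "la " or "las " after it
# (?=\d)        a digit must come right after the match (not consumed)
pat = re.compile(r"(?<=\d)\s*,?\s*y\s+(?:las?\s+)?(?=\d)", flags=re.I)

# Only the connector is matched, so replacing it with " " keeps the numbers.
output = pat.sub(" ", input_text)
print(repr(output))
# 'y creo que hay 55 6 casas, y quizas alguna mas... yo creo que empezaria entre la 1 27'
```

Because re.sub() replaces each match at the position where it was found, this also avoids a side effect of the str.replace() loop in the question, which rewrites every identical substring anywhere in the text, not just the matched one.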
