import re

name = "John"

#In these examples it works fine
input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
#input_sense_aux = "Do you know if John with the others could come this afternoon?"

#In these examples it does not work well
#input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
#input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"



regex_patron_m1 = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE) # Check whether the regex matches, i.e. whether the block below is entered
if m1:
    something_1, something_2 = m1.groups()

    something_1 = something_1.strip()
    something_2 = something_2.strip()

    print(repr(something_1))
    print(repr(something_2))

I need the regex to grab the content before "John", like this:

(start of sentence|¿|¡|,|;|:|(|[|.) \s* "content for something_1" \s* John

And then the content after it:

John \s* "content for something_2" \s* (end of sentence|?|!|,|;|:|)|]|.)

In the first examples, the regex works well:

'these teams are too many but I know that'
'can help us'
'Do you know if'
'with the others could come this afternoon'

But for the last 3 examples, the regex does not return anything.
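The failure can be reproduced with a short check. This is a minimal sketch that reuses the pattern from the question unchanged; the names `original_pattern`, `failing_inputs` and `matches` are only illustrative. Both capture groups are built from `(?:\w\s*)+`, and `+` means "one or more", so each side of the name must contain at least one word character. When the name is the first or the last word of the string, one side is empty and `re.search` returns `None`.

```python
import re

name = "John"

# Pattern from the question: each group requires at least one word character.
original_pattern = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"

failing_inputs = [
    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
    "sorry! can you help us? otherwise it will be waiting for a while longer for John",
]

# Collect the search result for every input; None means "no match".
matches = [re.search(original_pattern, text, re.IGNORECASE) for text in failing_inputs]

for text, match in zip(failing_inputs, matches):
    print(repr(text), "->", match)
```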

I need help generalizing my regex to all of these cases, while respecting the condition that it must extract the content of something_1 and something_2.

For the last three examples, the expected results are:

''
' can help us'
' otherwise it will be waiting for a while longer for '
''
' otherwise it will be waiting for a while longer for '
''

Recommended answer

You can use

import re

name = "John"

input_sense_auxs = [
    "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer",
    "Do you know if John with the others could come this afternoon?",

    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
    "sorry! can you help us? otherwise it will be waiting for a while longer for John"]

regex_patron_m1 = fr'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?{name}(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
# r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
for input_sense_aux in input_sense_auxs:
    print(f'--- {input_sense_aux} ---')
    m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE) # Check whether the regex matches, i.e. whether the block below is entered
    if m1:
        something_1, something_2 = m1.groups()

        something_1 = something_1.strip() if something_1 else ""
        something_2 = something_2.strip() if something_2 else ""

        print(repr(something_1))
        print(repr(something_2))

Output:

--- These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer ---
'I think'
'can help us'
--- These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- Do you know if John with the others could come this afternoon? ---
'Do you know if'
'with the others could come this afternoon'
--- John can help us, otherwise it will be waiting for a while longer ---
''
'can help us'
--- Can you help us, otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
--- sorry! can you help us? otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''

See the Python demo.

Details:

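  • (?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)? - the prefix, the left-hand part, which matches
    • (?:^|[?!¿¡,;:([.]) - start of string or a character from the ?!¿¡,;:([. set
    • \s* - zero or more whitespaces
    • (?:(\w+(?:\s+\w+)*)\s*)? - an optional occurrence of
      • (\w+(?:\s+\w+)*) - Group 1: one or more word characters, then zero or more sequences of one or more whitespaces followed by one or more word characters
      • \s* - zero or more whitespaces
  • John - the name
  • (?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).]) - the right-hand part:
    • (?:\s*(\w+(?:\s+\w+)*))? - an optional occurrence of
      • \s* - zero or more whitespaces
      • (\w+(?:\s+\w+)*) - Group 2: one or more word characters, then zero or more sequences of one or more whitespaces followed by one or more word characters
    • \s* - zero or more whitespaces
    • (?:$|[]?!,;:).]) - end of string or a character from the ]?!,;:). set.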
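The pieces described above can also be assembled into a small reusable function. This is only a sketch under a few assumptions: the helper name `split_around_name` is hypothetical, the name is passed through `re.escape` so that a name containing regex metacharacters (for example a dot) is matched literally, and the square brackets inside the character classes are written escaped (`\[`, `\]`), which is equivalent to the unescaped form used in the answer.

```python
import re

def split_around_name(text, name):
    """Return (before, after) around the first match of `name`, or None."""
    pattern = (
        r'(?:^|[?!¿¡,;:(\[.])\s*'        # start of string or an opening delimiter
        r'(?:(\w+(?:\s+\w+)*)\s*)?'      # optional Group 1: words before the name
        + re.escape(name) +              # the name itself, matched literally
        r'(?:\s*(\w+(?:\s+\w+)*))?\s*'   # optional Group 2: words after the name
        r'(?:$|[\]?!,;:).])'             # end of string or a closing delimiter
    )
    m = re.search(pattern, text, re.IGNORECASE)
    if m is None:
        return None
    before, after = m.groups()
    # An unmatched optional group is None, so fall back to an empty string.
    return (before or "").strip(), (after or "").strip()

print(split_around_name(
    "John can help us, otherwise it will be waiting for a while longer", "John"))
print(split_around_name(
    "Do you know if John with the others could come this afternoon?", "John"))
```

Returning `None` when nothing matches keeps the "no name found" case distinct from the case where the name is found but both sides are empty.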

See the regex demo.
