I don't understand what is wrong with the regular expression I wrote.

import re

# Define the input strings
inputs = [
    "exmpl-staging-1234-e2e-1707336983872",
    "exmpl-staging-1234-e2e-1706336983875",
    "exmpl-staging-main-e2e-1707336983878",
    "exmpl-demo-e2e-1707336983878",
    "exmpl-production-e2e-1707336983875",
    "exmpl-staging-2345",
    "exmpl-staging-1234",
    "exmpl-staging-1234-my-case-title",
    "exmpl-staging-1234-my-case-title-e2e-1707336983872"
]

# Define the regex pattern
pattern = re.compile(r'^exmpl-(?P<type>main|staging|demo|production)(?:-(?P<case>\d+|main))?(?:-(?P<title>(?!e2e-\d+).+))?(?:-e2e-(?P<timestamp>\d+))?$')

# Initialize a list to store the extracted data
extracted_data = []

# Loop through the input strings
for input_str in inputs:
    # Match the pattern against the input string
    match = pattern.match(input_str)

    # Extract the required information
    if match:
        extracted_data.append({
            'Input': input_str,
            'Type': match.group('type'),
            'Case': match.group('case') if match.group('case') else 'none',
            'Title': match.group('title') if match.group('title') else 'none',
            'Timestamp': match.group('timestamp') if match.group('timestamp') else 'none'
        })
    else:
        extracted_data.append({
            'Input': input_str,
            'Type': 'No match found',
            'Case': 'No match found',
            'Title': 'No match found',
            'Timestamp': 'No match found'
        })

print("| Input                                              | Type       | Case    | Title                                | Timestamp       |")
print("|----------------------------------------------------|------------|---------|--------------------------------------|-----------------|")
for data in extracted_data:
    print("| {:<50} | {:<10} | {:<7} | {:<36} | {:<15} |".format(data['Input'], data['Type'], data['Case'], data['Title'], data['Timestamp']))

This is the output it produces:

| Input                                              | Type       | Case    | Title                                | Timestamp       |
|----------------------------------------------------|------------|---------|--------------------------------------|-----------------|
| exmpl-staging-1234-e2e-1707336983872               | staging    | 1234    | none                                 | 1707336983872   |
| exmpl-staging-1234-e2e-1706336983875               | staging    | 1234    | none                                 | 1706336983875   |
| exmpl-staging-main-e2e-1707336983878               | staging    | main    | none                                 | 1707336983878   |
| exmpl-demo-e2e-1707336983878                       | demo       | none    | none                                 | 1707336983878   |
| exmpl-production-e2e-1707336983875                 | production | none    | none                                 | 1707336983875   |
| exmpl-staging-2345                                 | staging    | 2345    | none                                 | none            |
| exmpl-staging-1234                                 | staging    | 1234    | none                                 | none            |
| exmpl-staging-1234-my-case-title                   | staging    | 1234    | my-case-title                        | none            |
| exmpl-staging-1234-my-case-title-e2e-1707336983872 | staging    | 1234    | my-case-title-e2e-1707336983872      | none            |

It works as expected until the last input, where the timestamp comes back empty because it is wrongly captured as part of the title group. What am I doing wrong?

Recommended answer

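Make the title group non-greedy (use .+? instead of .+). The greedy .+ consumes everything up to the end of the string, and because the timestamp group that follows is optional, the overall match already succeeds without the engine backtracking to hand the -e2e-<digits> suffix over to it. The lookahead (?!e2e-\d+) does not prevent this, since it is only checked once, at the start of the title. With the lazy quantifier, the title grows one character at a time and stops as soon as the rest of the pattern can match: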

^exmpl-(?P<type>main|staging|demo|production)(?:-(?P<case>\d+|main))?(?:-(?P<title>(?!e2e-\d+).+?))?(?:-e2e-(?P<timestamp>\d+))?$
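To see the difference in isolation, here is a minimal sketch that runs both variants against the last input from the question. The names PREFIX, SUFFIX, greedy and lazy are only illustrative; they split the pattern above into pieces and are not part of the original code.

```python
import re

# Illustrative split of the pattern from the question: the parts before and
# after the title group are identical in both variants.
PREFIX = r"^exmpl-(?P<type>main|staging|demo|production)(?:-(?P<case>\d+|main))?"
SUFFIX = r"(?:-e2e-(?P<timestamp>\d+))?$"

# Greedy title: .+ runs to the end of the string.
greedy = re.compile(PREFIX + r"(?:-(?P<title>(?!e2e-\d+).+))?" + SUFFIX)
# Lazy title: .+? stops as soon as the rest of the pattern can match.
lazy = re.compile(PREFIX + r"(?:-(?P<title>(?!e2e-\d+).+?))?" + SUFFIX)

s = "exmpl-staging-1234-my-case-title-e2e-1707336983872"

greedy_match = greedy.match(s)
lazy_match = lazy.match(s)

# Greedy: the title swallows the suffix and the timestamp group is unset (None).
print(greedy_match.group("title"), greedy_match.group("timestamp"))
# Lazy: the title ends before -e2e-, so the timestamp group captures the digits.
print(lazy_match.group("title"), lazy_match.group("timestamp"))
```

The first line printed shows the whole suffix inside the title with a timestamp of None; the second shows my-case-title and 1707336983872 as separate values.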

import re

# Define the input strings
inputs = [
    "exmpl-staging-1234-e2e-1707336983872",
    "exmpl-staging-1234-e2e-1706336983875",
    "exmpl-staging-main-e2e-1707336983878",
    "exmpl-demo-e2e-1707336983878",
    "exmpl-production-e2e-1707336983875",
    "exmpl-staging-2345",
    "exmpl-staging-1234",
    "exmpl-staging-1234-my-case-title",
    "exmpl-staging-1234-my-case-title-e2e-1707336983872",
]

# Define the regex pattern
pattern = re.compile(
    r"^exmpl-(?P<type>main|staging|demo|production)(?:-(?P<case>\d+|main))?(?:-(?P<title>(?!e2e-\d+).+?))?(?:-e2e-(?P<timestamp>\d+))?$"
)

# Initialize a list to store the extracted data
extracted_data = []

# Loop through the input strings
for input_str in inputs:
    # Match the pattern against the input string
    match = pattern.match(input_str)

    # Extract the required information
    if match:
        extracted_data.append(
            {
                "Input": input_str,
                "Type": match.group("type"),
                "Case": match.group("case") if match.group("case") else "none",
                "Title": match.group("title") if match.group("title") else "none",
                "Timestamp": match.group("timestamp")
                if match.group("timestamp")
                else "none",
            }
        )
    else:
        extracted_data.append(
            {
                "Input": input_str,
                "Type": "No match found",
                "Case": "No match found",
                "Title": "No match found",
                "Timestamp": "No match found",
            }
        )

print(
    "| Input                                              | Type       | Case    | Title                                | Timestamp       |"
)
print(
    "|----------------------------------------------------|------------|---------|--------------------------------------|-----------------|"
)
for data in extracted_data:
    print(
        "| {:<50} | {:<10} | {:<7} | {:<36} | {:<15} |".format(
            data["Input"], data["Type"], data["Case"], data["Title"], data["Timestamp"]
        )
    )

Prints:

| Input                                              | Type       | Case    | Title                                | Timestamp       |
|----------------------------------------------------|------------|---------|--------------------------------------|-----------------|
| exmpl-staging-1234-e2e-1707336983872               | staging    | 1234    | none                                 | 1707336983872   |
| exmpl-staging-1234-e2e-1706336983875               | staging    | 1234    | none                                 | 1706336983875   |
| exmpl-staging-main-e2e-1707336983878               | staging    | main    | none                                 | 1707336983878   |
| exmpl-demo-e2e-1707336983878                       | demo       | none    | none                                 | 1707336983878   |
| exmpl-production-e2e-1707336983875                 | production | none    | none                                 | 1707336983875   |
| exmpl-staging-2345                                 | staging    | 2345    | none                                 | none            |
| exmpl-staging-1234                                 | staging    | 1234    | none                                 | none            |
| exmpl-staging-1234-my-case-title                   | staging    | 1234    | my-case-title                        | none            |
| exmpl-staging-1234-my-case-title-e2e-1707336983872 | staging    | 1234    | my-case-title                        | 1707336983872   |
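As a side note, the per-group conditionals in the loop could probably be shortened with Match.groupdict, which accepts a default for groups that did not take part in the match. The sketch below assumes the same non-greedy pattern; parse is a hypothetical helper name, not something from the question.

```python
import re

# Same non-greedy pattern as in the answer, split across lines for readability.
pattern = re.compile(
    r"^exmpl-(?P<type>main|staging|demo|production)"
    r"(?:-(?P<case>\d+|main))?"
    r"(?:-(?P<title>(?!e2e-\d+).+?))?"
    r"(?:-e2e-(?P<timestamp>\d+))?$"
)


def parse(name):
    """Hypothetical helper: return the named groups as a dict, or None if no match."""
    match = pattern.match(name)
    if match is None:
        return None
    # Groups that did not participate in the match get the string "none".
    return match.groupdict(default="none")


result = parse("exmpl-staging-1234-my-case-title-e2e-1707336983872")
demo = parse("exmpl-demo-e2e-1707336983878")
unknown = parse("exmpl-unknown")

print(result)
print(demo)
print(unknown)
```

This keeps the "none" placeholder in one place and leaves the no-match case explicit, at the cost of lowercase keys (the group names) instead of the capitalised column names used in the table.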
