Python regex to add a comma after abbreviations

Posted on April 14

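I want to add a comma and a space (", ") after abbreviations. An abbreviation is defined as one or more letters followed by a dot, followed again by one or more letters and a dot, so that the letters-plus-dot unit repeats two or more times. For example, these count as abbreviations: A.b.C. a.b. ab.cd. ab.cde. ab.cd.ef.gh., whereas a.b and A. B do not. I do not want to add a comma:

if the last dot of the abbreviation is the end of the given text,
if the abbreviation is followed by optional whitespace and an uppercase letter, or
if the abbreviation is followed by optional whitespace and another punctuation mark.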
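One way to read the definition above as a pattern, offered as a sketch rather than a definitive rule: treat "one or more letters plus a dot" as a unit and require at least two units in a row. The names ABBREVIATION and is_abbreviation are made up for this illustration, and restricting "letters" to ASCII is an assumption.

```python
import re

# Assumption: "letters" means ASCII letters only.
# ABBREVIATION is an illustrative name, not something from the question.
ABBREVIATION = re.compile(r"(?:[A-Za-z]+\.){2,}")


def is_abbreviation(token):
    """Return True if the whole token is two or more letters-plus-dot units."""
    return ABBREVIATION.fullmatch(token) is not None


# The examples listed in the question.
for token in ["A.b.C.", "a.b.", "ab.cd.", "ab.cde.", "ab.cd.ef.gh.", "a.b", "A. B"]:
    print(repr(token), is_abbreviation(token))
```

This only checks the shape of a single token. The three exclusion rules about what follows the abbreviation are a separate concern.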

Given the following test sentences:

test_str = """This is an example e.g. sentence and this is with i.e. text and two abbreviations S.T.R. and K.LM.NO.P. as example with acronym.
            but in here it shouldn't catch it because after that there is space and dot g.k. . Also here it shouldn't detect because the next sentence starts with capital A.BC.D.
            And this is a normal sentence. Followed by another normal sentence. This contains only one letter A. and is not abbreviation.
            This shouldn't match i.e., since it contains already a comma. I like to read books such as e.g. book 1 or i.e. book2.
            A.B.C. is an abbreviation that should match. A.B.! is an abbreviation that shouldn't match because it has ! after the abbreviation. 
            A.B.? is an abbreviation that shouldn't match because it has ? after the abbreviation. 
            A.B. ; is an abbreviation that shouldn't match because it has a space and ; after the abbreviation.
            a.b.c.d. is an abbreviation that should match.
            a.b.c., is an abbreviation that shouldn't match because it already has a comma. A.B is not an abbreviation because it contains only one dot.
            Another abbreviation that should not match j.j.L.o.U.h."""

I would like the output to be:

output_text = """This is an example e.g., sentence and this is with i.e., text and two abbreviations S.T.R., and K.LM.NO.P., as example with acronym.
            but in here it shouldn't catch it because after that there is space and dot g.k. . Also here it shouldn't detect because the next sentence starts with capital A.BC.D.
            And this is a normal sentence. Followed by another normal sentence. This contains only one letter A. and is not abbreviation.
            This shouldn't match i.e., since it contains already a comma. I like to read books such as e.g., book 1 or i.e., book2.
            A.B.C., is an abbreviation that should match. A.B.! is an abbreviation that shouldn't match because it has ! after the abbreviation. 
            A.B.? is an abbreviation that shouldn't match because it has ? after the abbreviation. 
            A.B. ; is an abbreviation that shouldn't match because it has a space and ; after the abbreviation.
            a.b.c.d., is an abbreviation that should match.
            a.b.c., is an abbreviation that shouldn't match because it already has a comma. A.B is not an abbreviation because it contains only one dot.
            Another abbreviation that should not match j.j.L.o.U.h."""

This is what I am using at the moment:

regex = r"(\b(?:[A-Za-z]\.){2,}(?!\s*[,.;?!-]))"

But it produces the following output:

This is an example e.g., sentence and this is with i.e., text and two abbreviations S.T.R., and K.LM.NO.P. as example with acronym. but in here it shouldn't catch it because after that there is space and dot g.k. . Also here it shouldn't detect because the next sentence starts with capital A.BC.D. And this is a normal sentence. Followed by another normal sentence. This contains only one letter A. and is not abbreviation. This shouldn't match i.e., since it contains already a comma. I like to read books such as e.g., book 1 or i.e., book2. A.B.C., is an abbreviation that should match. A.B.! is an abbreviation that shouldn't match because it has ! after the abbreviation. A.B.? is an abbreviation that shouldn't match because it has ? after the abbreviation. A.B. ; is an abbreviation that shouldn't match because it has a space and ; after the abbreviation. a.b.c.d., is an abbreviation that should match. a.b., c., is an abbreviation that shouldn't match because it already has a comma. A.B is not an abbreviation because it contains only one dot.
Another abbreviation that should not match j.j.L.o.U.h.,

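My regex fails in three places in the output above (shown in bold in the original post): K.LM.NO.P. (no comma added), a.b., c., and j.j.L.o.U.h., They should be K.LM.NO.P.,, a.b.c., and j.j.L.o.U.h. respectively: the first should be detected as an abbreviation, the second already has a punctuation mark after its last dot, and the last one is the end of the given text.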
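The three failures can be reproduced in isolation. The sketch below uses the pattern from the question unchanged and only inspects what it matches with re.findall; the helper name matched and the short input strings are illustrative, cut down from the test sentences.

```python
import re

# The pattern from the question, unchanged.
current = re.compile(r"(\b(?:[A-Za-z]\.){2,}(?!\s*[,.;?!-]))")


def matched(text):
    """Return every substring the current pattern would put a comma after."""
    return current.findall(text)


# [A-Za-z]\. allows only ONE letter per unit, so "LM." and "NO." never fit.
print(matched("K.LM.NO.P. as example"))   # []

# The lookahead rejects "a.b.c." (a comma follows), so the engine backtracks
# to the shorter "a.b.", where the next character is "c" and the lookahead passes.
print(matched("a.b.c., is an"))           # ['a.b.']

# Nothing in the lookahead rules out the end of the text.
print(matched("match j.j.L.o.U.h."))      # ['j.j.L.o.U.h.']
```

So the three problems seem to be independent: single-letter units only, backtracking into a shorter match, and no end-of-text check.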

Is there a way to achieve this? Any help is much appreciated!

Recommended answer
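One possible approach, given as a sketch that has only been checked against cases taken from the question: allow several letters per unit with [A-Za-z]+, add a lookahead that forbids stopping right before another letter (so the engine cannot backtrack to a shorter match such as a.b. inside a.b.c.,), and extend the negative lookahead to cover an uppercase letter, any punctuation mark, or the end of the text. PATTERN and add_commas are hypothetical names. Treating "punctuation" as any character that is neither a word character nor whitespace is an assumption, and only the comma is inserted because the space is already present in the expected output.

```python
import re

# Hypothetical names; a sketch, not a definitive implementation.
PATTERN = re.compile(
    r"""
    \b((?:[A-Za-z]+\.){2,})   # two or more letters-plus-dot units
    (?![A-Za-z])              # never stop right before another letter
    (?!\s*(?:[A-Z]            # skip: optional whitespace, then uppercase
           |[^\w\s]           # skip: optional whitespace, then punctuation
           |\Z))              # skip: the text ends here
    """,
    re.VERBOSE,
)


def add_commas(text):
    """Append a comma to each abbreviation that passes the lookaheads."""
    return PATTERN.sub(r"\1,", text)


# Short cases cut from the question's test sentences.
samples = [
    "abbreviations S.T.R. and K.LM.NO.P. as example",
    "a.b.c.d. is an abbreviation that should match.",
    "a.b.c., is an abbreviation that shouldn't match",
    "starts with capital A.BC.D.\n    And this is",
    "space and dot g.k. . Also here",
    "should not match j.j.L.o.U.h.",
]
for sample in samples:
    print(repr(add_commas(sample)))
```

With these inputs only the first two lines change (S.T.R., and K.LM.NO.P., and a.b.c.d.,); the rest come back untouched. Running it over the full test_str and comparing with output_text would be the next check.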
