Custom Named Entity Recognition in Python

Posted on August 12

I have the following sentence:

text="The weather is extremely severe in England"

I want to run a custom Named Entity Recognition (NER) step on it.

First, the normal NER process outputs England with the GPE label:

pip install spacy

!python -m spacy download en_core_web_lg

import spacy
nlp = spacy.load('en_core_web_lg')

doc = nlp(text)

for ent in doc.ents:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Result: England - GPE - Countries, cities, states

However, I want the whole sentence to carry the label High_Severity.

So I am performing the following steps:

from spacy.strings import StringStore

new_hash = StringStore([u'High_Severity']) # <-- match id
nlp.vocab.strings.add('High_Severity')

from spacy.tokens import Span

# Get the hash value of the High_Severity entity label
High_Severity = doc.vocab.strings[u'High_Severity']  

# Create a Span for the new entity
new_ent = Span(doc, 0, 7, label=High_Severity)

# Add the entity to the existing Doc object
doc.ents = list(doc.ents) + [new_ent]

I get the following error:

ValueError: [E1010] Unable to set entity information for token 6 which is included in more than one span in entities, blocked, missing or outside.

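As far as I understand, this happens because NER has already recognized England as GPE, and a second label cannot be added on top of an existing one: the new span covers tokens 0 to 6, and token 6 (England) already belongs to an entity.

I tried running only the custom NER code (that is, without running the normal NER code first), but that did not solve my problem.

Any ideas on how to solve this?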
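For illustration, here is a minimal sketch of two possible ways around the E1010 error, assuming spaCy v3. It is not taken from an answer to this question. To avoid a model download it uses a blank English pipeline and sets the GPE entity by hand, standing in for what en_core_web_lg predicted. The span group key "severity" is an arbitrary name chosen for this example.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Assumption: a blank pipeline replaces en_core_web_lg so the sketch runs
# without a model download; the GPE entity is therefore set manually.
nlp = spacy.blank("en")
text = "The weather is extremely severe in England"

# Option 1: doc.ents cannot hold overlapping spans, so drop the overlap.
# filter_spans keeps the longest span when several spans overlap.
doc = nlp(text)
doc.ents = [Span(doc, 6, 7, label="GPE")]
sentence_span = Span(doc, 0, len(doc), label="High_Severity")
doc.ents = filter_spans(list(doc.ents) + [sentence_span])
print([(ent.text, ent.label_) for ent in doc.ents])

# Option 2: keep the GPE entity in doc.ents and store the sentence-level
# label in a span group, which does allow overlapping spans.
# "severity" is a hypothetical key picked for this sketch.
doc2 = nlp(text)
doc2.ents = [Span(doc2, 6, 7, label="GPE")]
doc2.spans["severity"] = [Span(doc2, 0, len(doc2), label="High_Severity")]
print([(span.text, span.label_) for span in doc2.spans["severity"]])
```

With option 1 the England entity is discarded in favor of the longer span. Option 2 keeps both annotations, which may be the better fit when the label describes the whole sentence rather than a named entity.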
