Python spaCy: how to avoid removing "not" when cleaning text with spaCy

Posted on August 2

I use this spaCy code and later apply it to my text, but I need negation words such as "not" to stay in the text.

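import spacy
from tqdm import tqdm

nlp = spacy.load("en_core_web_sm")

def my_tokenizer(sentence):
    # Lowercase the text, then keep the lemmas of alphabetic tokens that are not stop words
    return [token.lemma_ for token in tqdm(nlp(sentence.lower()), leave=False) if not token.is_stop and token.is_alpha and token.lemma_]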

When I apply it, the result I get is:

[hello, earphone, work]

However, the original sentence was

hello,my earphones are still not working.

So I would like to see the following instead: [earphone, still, not, work] Thanks

Solution

To solve this, remove the words you want to keep, such as "not", from the STOP_WORDS set. You can do it like this:

spacy.lang.en.stop_words.STOP_WORDS.remove("not")

Then re-run your code and you should get the expected result:

import spacy
from tqdm import tqdm

# Remove "not" from the stop-word set before loading the pipeline
spacy.lang.en.stop_words.STOP_WORDS.remove("not")
nlp = spacy.load("en_core_web_sm")

def my_tokenizer(sentence):
    return [token.lemma_ for token in tqdm(nlp(sentence.lower()), leave=False) if not token.is_stop and token.is_alpha and token.lemma_]

sentence = "hello,my earphones are still not working."
results = my_tokenizer(sentence)
print(results)

# ['hello', 'earphone', 'not', 'work']
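
Note that the output above still drops "still", which the question also wanted to keep, because it remains a stop word. As an alternative that leaves spaCy's global stop-word set untouched, the filter itself can whitelist the words to keep. The following is only a minimal sketch and not part of the original answer: KEEP_WORDS and filter_tokens are hypothetical names, and the function relies only on the is_stop, is_alpha, lower_ and lemma_ attributes that spaCy tokens expose.

```python
# Hypothetical whitelist: stop words that should survive the cleaning step.
KEEP_WORDS = {"not", "still"}


def filter_tokens(tokens, keep=KEEP_WORDS):
    """Return lemmas of alphabetic tokens, dropping stop words unless whitelisted.

    `tokens` is any iterable of objects with spaCy-style attributes
    (is_stop, is_alpha, lower_, lemma_), for example a spaCy Doc.
    """
    return [
        token.lemma_
        for token in tokens
        if token.is_alpha
        and token.lemma_
        and (not token.is_stop or token.lower_ in keep)
    ]
```

With the pipeline from the answer, this would be called as filter_tokens(nlp(sentence.lower())). Because the stop-word set itself is not modified, other code running in the same process keeps the default behaviour.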