Python - Stemming and Lemmatization

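In natural language processing, we often encounter two or more words that share a common root. A search involving any of these words should treat them as the same word, namely the root. Reducing every word to its root therefore becomes essential, and the NLTK library provides methods that perform this reduction and output the root word.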

The program below uses the Porter stemming algorithm to perform stemming.

import nltk
from nltk.stem.porter import PorterStemmer

# Note: nltk.word_tokenize needs the NLTK tokenizer data to be installed
porter_stemmer = PorterStemmer()

word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
# First, tokenize the sentence into words
nltk_tokens = nltk.word_tokenize(word_data)
# Next, find the stem of each word
for w in nltk_tokens:
    print("Actual: %s  Stem: %s" % (w, porter_stemmer.stem(w)))

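Running the code above produces the following output. (Recent NLTK releases lowercase the stem by default, so the first line may show Stem: it instead.)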

Actual: It  Stem: It
Actual: originated  Stem: origin
Actual: from  Stem: from
Actual: the  Stem: the
Actual: idea  Stem: idea
Actual: that  Stem: that
Actual: there  Stem: there
Actual: are  Stem: are
Actual: readers  Stem: reader
Actual: who  Stem: who
Actual: prefer  Stem: prefer
Actual: learning  Stem: learn
Actual: new  Stem: new
Actual: skills  Stem: skill
Actual: from  Stem: from
Actual: the  Stem: the
Actual: comforts  Stem: comfort
Actual: of  Stem: of
Actual: their  Stem: their
Actual: drawing  Stem: draw
Actual: rooms  Stem: room
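To connect this output back to the search use case described at the start, here is a minimal sketch that groups words by their Porter stem, so that a lookup on any variant reaches the same entry. The helper name group_by_stem and the sample vocabulary are hypothetical and chosen only for illustration; only PorterStemmer.stem comes from NLTK.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the list of original words that reduce to it."""
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


# Hypothetical sample vocabulary, used only for illustration
sample_words = ["connect", "connected", "connecting", "connection",
                "connections", "rooms", "room"]
groups = group_by_stem(sample_words)
for stem, members in groups.items():
    print("%s: %s" % (stem, ", ".join(members)))
```

A search for "connections" can then be answered by stemming the query and looking up the resulting key, which also matches documents that contain "connected" or "connecting".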

The program below uses the WordNet lexical database to perform lemmatization.

import nltk
from nltk.stem import WordNetLemmatizer

# Note: WordNetLemmatizer needs the WordNet corpus data to be installed
wordnet_lemmatizer = WordNetLemmatizer()

word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
# Tokenize the sentence, then look up the lemma of each word
nltk_tokens = nltk.word_tokenize(word_data)
for w in nltk_tokens:
    print("Actual: %s  Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

Running the code above produces the following output.

Actual: It  Lemma: It
Actual: originated  Lemma: originated
Actual: from  Lemma: from
Actual: the  Lemma: the
Actual: idea  Lemma: idea
Actual: that  Lemma: that
Actual: there  Lemma: there
Actual: are  Lemma: are
Actual: readers  Lemma: reader
Actual: who  Lemma: who
Actual: prefer  Lemma: prefer
Actual: learning  Lemma: learning
Actual: new  Lemma: new
Actual: skills  Lemma: skill
Actual: from  Lemma: from
Actual: the  Lemma: the
Actual: comforts  Lemma: comfort
Actual: of  Lemma: of
Actual: their  Lemma: their
Actual: drawing  Lemma: drawing
Actual: rooms  Lemma: room
