How can I get the stem form of a single-word token? Here is my code. It works for some words but not for others.
import NaturalLanguage

let text = "people" // works
// let text = "geese" // doesn't work
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
let (tag, range) = tagger.tag(at: text.startIndex, unit: .word, scheme: .lemma)
let stemForm = tag?.rawValue ?? String(text[range])
However, if I lemmatize an entire passage of text, it finds the stem forms of all the words.
let text = "This is text with plurals such as geese, people, and millennia."
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
var words: [String] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma, options: [.omitWhitespace, .omitPunctuation]) { tag, range in
    // Fall back to the original word when no lemma is found
    let stemForm = tag?.rawValue ?? String(text[range])
    words.append(stemForm)
    return true
}
// this be text with plural such as goose person and millennium
words.joined(separator: " ")
Also, is it possible to reverse the process and find the plural version of a stem?