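I'm using the TextRank pipeline (pytextrank) in spaCy to summarize documents, and I need to summarize both long and short documents. Can you suggest a good way to choose the right value for the limit_phrases parameter?
This is the approach I'm currently using, but I believe it can be improved: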
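import spacy
import pytextrank  # registers the "textrank" pipeline component with spaCy

spacy_model = "en_core_web_sm"  # example: any installed spaCy pipeline that sets sentence boundaries
text = "Your document text goes here."  # example placeholder for the input document
percentage = 0.2  # example: fraction of the document's sentences to keep

nlp = spacy.load(spacy_model)
nlp.add_pipe("textrank", last=True)
# Process the input text
doc = nlp(text)
doc_sentences = len(list(doc.sents))
print(f"Number of document sentences = {doc_sentences}")
# Keep at least one sentence so that short documents still get a summary
limit_sentences = max(1, int(doc_sentences * percentage))
limit_phrases = limit_sentences * 2
top_sentences = doc._.textrank.summary(limit_phrases=limit_phrases, limit_sentences=limit_sentences, preserve_order=True)
# summary() produces spaCy Span objects, one per selected sentence
for sent in top_sentences:
    print(sent.text)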
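As I understand pytextrank's implementation, summary() takes the top limit_phrases ranked phrases, builds a vector from them, and picks the limit_sentences sentences closest to that vector. So limit_phrases roughly controls how many key topics the summary tries to cover. One possible refinement of the fixed "2 phrases per sentence" rule is to also cap limit_phrases by the number of phrases TextRank actually ranked, and to cap the summary length for very long documents. The helper below is a hypothetical heuristic, not part of pytextrank; the name choose_textrank_limits and the defaults max_sentences=20 and phrases_per_sentence=2 are assumptions to tune on your own data.

```python
from typing import Tuple


def choose_textrank_limits(
    num_sentences: int,
    num_phrases: int,
    percentage: float,
    max_sentences: int = 20,
    phrases_per_sentence: int = 2,
) -> Tuple[int, int]:
    """Hypothetical heuristic for pytextrank's summary() limits.

    Returns (limit_sentences, limit_phrases).
    """
    # Nothing to summarize, or no ranked phrases to build the summary from
    if num_sentences <= 0 or num_phrases <= 0:
        return 0, 0

    # Target a fraction of the document's sentences, rounded to the nearest integer
    limit_sentences = round(num_sentences * percentage)

    # At least one sentence, at most max_sentences, never more than the document has
    limit_sentences = min(max(1, limit_sentences), max_sentences, num_sentences)

    # Scale phrases with summary length, but never ask for more phrases than were ranked
    limit_phrases = min(limit_sentences * phrases_per_sentence, num_phrases)

    return limit_sentences, limit_phrases


# Short document: 0.25 * 3 rounds to 0, so the floor of one sentence applies
print(choose_textrank_limits(3, 10, 0.25))
# Long document: 0.25 * 200 = 50 sentences, capped at max_sentences = 20
print(choose_textrank_limits(200, 500, 0.25))
```

In the spaCy pipeline, the inputs would be num_sentences = len(list(doc.sents)) and num_phrases = len(doc._.phrases), with the returned pair passed to doc._.textrank.summary(limit_phrases=..., limit_sentences=...).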