Python: extracting sentences that contain keywords using set()

Posted on June 30

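I am trying to extract sentences that contain selected keywords using set.intersection().

So far I only get the sentence containing the word 'van'. I cannot get the sentences containing the phrases 'blue tinge' or 'off the road', because the code below only handles single-word keywords.

Why does this happen, and what can I do to fix it? Many thanks.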

from textblob import TextBlob
import nltk
nltk.download('punkt')

search_words = set(["off the road", "blue tinge", "van"])

# Adjacent string literals are concatenated, so the text stays one string.
blob = TextBlob(
    "That is the off the road vehicle I had in mind for my adventure. "
    "Which one? The one with the blue tinge. Oh, I'd use the money for a van."
)

matches = []

for sentence in blob.sentences:
    blobwords = set(sentence.words) 
    if search_words.intersection(blobwords):  
        matches.append(str(sentence))

print(matches)

Output: ["Oh, I'd use the money for a van."]

Recommended answer
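
On the "why": sentence.words yields single-word tokens, so a multi-word phrase such as "off the road" can never be an element of that set, and the intersection only ever finds one-word keywords like "van". A minimal sketch of the effect, assuming a simple regex tokenizer as a stand-in for TextBlob's sentence.words (the helper name token_hits is hypothetical, not part of any library):

```python
import re

search_words = {"off the road", "blue tinge", "van"}

def token_hits(sentence):
    """Intersect the keyword set with the sentence's single-word tokens."""
    # Stand-in for TextBlob's sentence.words: one token per word.
    tokens = set(re.findall(r"[A-Za-z']+", sentence))
    return search_words & tokens

# A phrase is never equal to a single token, so nothing is found here.
print(token_hits("That is the off the road vehicle I had in mind for my adventure."))  # → set()
# A one-word keyword is a token, so it is found.
print(token_hits("Oh, I'd use the money for a van."))  # → {'van'}
```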

To test for exact matches of the search keywords, including multi-word phrases, you can do the following:

from nltk.tokenize import sent_tokenize

# sent_tokenize needs the NLTK 'punkt' data, as downloaded in the question.
text = "That is the off the road vehicle I had in mind for my adventure. Which one? The one with the blue tinge. Oh, I'd use the money for a van."
search_words = ["off the road", "blue tinge", "van"]

sentences = sent_tokenize(text)
# Keep each sentence once, in document order, if any keyword occurs in it.
# `word in sentence` is a substring test, so phrases with spaces work too.
matches = [s for s in sentences if any(word in s for word in search_words)]
print(matches)

The output is:

['That is the off the road vehicle I had in mind for my adventure.',
 'The one with the blue tinge.',
 "Oh, I'd use the money for a van."]
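
One caveat: a plain substring test also matches inside longer words, so "van" would match a sentence containing "vanilla". If whole-word matching is wanted, a regular expression with word boundaries is one option. The following is only a sketch under that assumption; the helper name find_sentences is hypothetical, and a naive regex sentence splitter is used in place of sent_tokenize so the example needs nothing beyond the standard library:

```python
import re

def find_sentences(text, keywords):
    """Return sentences containing any keyword as a whole word or phrase."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # One alternation pattern; \b keeps "van" from matching "vanilla".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]

text = ("That is the off the road vehicle I had in mind for my adventure. "
        "Which one? The one with the blue tinge. Oh, I'd use the money for a van.")
print(find_sentences(text, ["off the road", "blue tinge", "van"]))
```

Each sentence is returned at most once and in document order, even if it contains several of the keywords.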

If you need partial (approximate) matches, use fuzzywuzzy, which scores how closely two strings match as a percentage.
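
To illustrate the idea of percentage matching without a third-party install, here is a rough sketch that uses difflib from the standard library as a stand-in. It is not fuzzywuzzy and its scores will differ from fuzzywuzzy's. The helper names best_score and fuzzy_matches and the 0.8 threshold are assumptions chosen for the example:

```python
import re
from difflib import SequenceMatcher

def best_score(keyword, sentence):
    """Best similarity (0.0 to 1.0) between the keyword and any
    window of the sentence with the same number of words."""
    key_words = keyword.lower().split()
    words = re.findall(r"[\w']+", sentence.lower())
    n = len(key_words)
    target = " ".join(key_words)
    # Slide an n-word window over the sentence (at least one window).
    windows = [" ".join(words[i:i + n])
               for i in range(max(len(words) - n + 1, 1))]
    return max(SequenceMatcher(None, target, w).ratio() for w in windows)

def fuzzy_matches(sentences, keyword, threshold=0.8):
    """Sentences whose best window scores at or above the threshold."""
    return [s for s in sentences if best_score(keyword, s) >= threshold]

sentences = [
    "That is the off the road vehicle I had in mind for my adventure.",
    "Which one?",
    "The one with the blue tinge.",
    "Oh, I'd use the money for a van.",
]
# The misspelled keyword still finds the intended sentence.
print(fuzzy_matches(sentences, "blue tnge"))
```

The threshold trades recall for precision: lowering it tolerates more typos but also admits more unrelated sentences.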
