Python BeautifulSoup is not extracting <a> tags from a dynamic website

Posted on April 3

I have a website, https://dip.bundestag.de/aktivit%C3%A4t/Dr--Holger-Becker-MdB-SPD/1628877, and I want to extract the link attached to "BT-Plenarprotokoll 20/86, S. 10313C". The HTML block is:

<a title="PDF Bundestags-Plenarprotokoll öffnen" aria-label="BT-Plenarprotokoll" href="https://dserver.bundestag.de/btp/20/20086.pdf#P.10313" target="_self" class="hsbfb4-0 sc-1xaeas4-1 hTYfHF FZiNn"><svg viewBox="0 0 10 12" class="sc-1c5ggr5-17 cYBAUx"><g stroke="currentColor" fill="none" fill-rule="evenodd"><path d="M6.14.5H.5v11h9V3.86z"></path><path d="M5.56 2.01v2.51H9.5"></path></g></svg><span class="sc-1xaeas4-3 iZuhXx">BT-Plenarprotokoll 20/86, S. 10313C</span></a>

For some reason, BeautifulSoup does not recognize any tags on this page. I have tried different code:

import requests
from bs4 import BeautifulSoup

url = "https://dip.bundestag.de/aktivit%C3%A4t/Dr--Holger-Becker-MdB-SPD/1628877"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the anchor tag with title and aria-label attributes and extract the href attribute
a_tag = soup.findAll('a', {'title': 'PDF Bundestags-Plenarprotokoll öffnen', 'aria-label': 'BT-Plenarprotokoll'})

and

url = "https://dip.bundestag.de/aktivit%C3%A4t/Dr--Holger-Becker-MdB-SPD/1628877"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find every anchor tag on the page
a_tag = soup.findAll('a')

In both cases, a_tag is empty, and I don't understand why, since this page has more than one link.

Recommended answer

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Authorization': 'ApiKey GmEPb1B.bfqJLIhcGAsH9fTJevTglhFpCoZyAAAdhp',
}

# Query the JSON API directly instead of parsing the page HTML
url = "https://search.dip.bundestag.de/api/v1/aktivitaet?f.id=1628877"
documents = requests.get(url, headers=headers).json()["documents"]

# Print the PDF link stored under the first document's "fundstelle" entry
print(documents[0]["fundstelle"]["pdf_url"])
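The answer's snippet queries a JSON endpoint instead of the page itself, presumably because the links are inserted by JavaScript after load and so never appear in the HTML that requests receives. It also indexes straight into documents[0]["fundstelle"]["pdf_url"], which would raise an error if a key is missing or the result list is empty. A slightly more defensive way to read the same structure might look like the sketch below. The helper name extract_pdf_urls and the sample payload are made up for illustration; only the key names "documents", "fundstelle" and "pdf_url" come from the answer.

```python
from typing import Any


def extract_pdf_urls(payload: dict[str, Any]) -> list[str]:
    """Return every non-empty fundstelle.pdf_url found in an API payload."""
    urls: list[str] = []
    for doc in payload.get("documents", []):
        # "fundstelle" and "pdf_url" are treated as optional (an assumption).
        fundstelle = doc.get("fundstelle") or {}
        pdf_url = fundstelle.get("pdf_url")
        if pdf_url:
            urls.append(pdf_url)
    return urls


# Made-up payload for illustration; only the key names come from the answer.
sample_payload = {
    "documents": [
        {"fundstelle": {"pdf_url": "https://dserver.bundestag.de/btp/20/20086.pdf#P.10313"}},
        {"fundstelle": {}},  # document without a PDF link
        {},                  # document without a fundstelle entry
    ]
}

print(extract_pdf_urls(sample_payload))
```

With the live API, the same helper could be given the parsed response from the answer, i.e. requests.get(url, headers=headers).json(), in place of the sample payload.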
