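I'm trying to scrape all the match report links from a page, but the page has a "Load more" button and I don't want to use Selenium. Is there any way to collect all the links without Selenium? Thanks in advance.

Here is what I tried: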

 from bs4 import BeautifulSoup as bs
 import requests

 r = requests.get('https://www.iplt20.com/news/match-reports')
 soup = bs(r.text, 'lxml')

 # Print the first link found inside each slider wrapper div
 for match in soup.find_all('div', class_='latest-slider-wrap position-relative'):
     link = match.find('a')
     print(link['href'])

Recommended answer

Try:

import requests
from bs4 import BeautifulSoup

url = "https://www.iplt20.com/news/match-reports"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select("#div-match-report a:has(li)"):
    print(a["href"])

Prints:

https://www.iplt20.com/news/4014/tata-ipl-2024-match-11-lsg-vs-pbks-match-report
https://www.iplt20.com/news/4012/tata-ipl-2024-match-10-rcb-vs-kkr-match-report
https://www.iplt20.com/news/4011/tata-ipl-2024-match-09-rr-vs-dc-match-report
https://www.iplt20.com/news/4009/tata-ipl-2024-match-08-srh-vs-mi-match-report
https://www.iplt20.com/news/4007/tata-ipl-2024-match-07-csk-vs-gt-match-report
https://www.iplt20.com/news/4006/tata-ipl-2024-match-06-rcb-vs-pbks-match-report
https://www.iplt20.com/news/4004/tata-ipl-2024-match-05-gt-vs-mi-match-report
https://www.iplt20.com/news/4003/tata-ipl-2024-match-04-rr-vs-lsg-match-report
https://www.iplt20.com/news/4001/tata-ipl-2024-match-03-kkr-vs-srh-match-report
https://www.iplt20.com/news/4000/tata-ipl-2024-match-02-pbks-vs-dc-match-report
https://www.iplt20.com/news/3999/tata-ipl-2024-match-01-csk-vs-rcb-match-report
https://www.iplt20.com/news/3976/tata-ipl-2023-final-csk-vs-gt-match-report
https://www.iplt20.com/news/3974/tata-ipl-2023-qualifier-2-gt-vs-mi-match-report
https://www.iplt20.com/news/3973/tata-ipl-2023-eliminator-lsg-vs-mi-match-report
https://www.iplt20.com/news/3972/tata-ipl-2023-qualifier-1-gt-vs-csk-match-report
https://www.iplt20.com/news/3971/tata-ipl-2023-match-70-rcb-vs-gt-match-report
https://www.iplt20.com/news/3970/tata-ipl-2023-match-69-mi-vs-srh-match-report
https://www.iplt20.com/news/3969/tata-ipl-2023-match-68-kkr-vs-lsg-match-report
https://www.iplt20.com/news/3968/tata-ipl-2023-match-67-dc-vs-csk-match-report
https://www.iplt20.com/news/3967/tata-ipl-2023-match-66-pbks-vs-rr-match-report
https://www.iplt20.com/news/3966/tata-ipl-2023-match-65-srh-vs-rcb-match-report
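To see what the `a:has(li)` selector is doing, here is a small offline sketch. The HTML fragment is made up for illustration and is not the real markup of the page; only the `#div-match-report` id and the selector come from the snippet above. The selector keeps anchors inside that container that have an `li` descendant and ignores everything else.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only loosely modelled on the page structure.
html = """
<div id="div-match-report">
  <ul>
    <a href="/news/1/report-one"><li>Report one</li></a>
    <a href="/news/2/report-two"><li>Report two</li></a>
  </ul>
  <a href="/news/archive">Archive</a>
</div>
<a href="/elsewhere"><li>Outside the container</li></a>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <a> tags inside #div-match-report that contain an <li>.
hrefs = [a["href"] for a in soup.select("#div-match-report a:has(li)")]
print(hrefs)
```

The "Archive" link is skipped because it has no `li` inside it, and the last link is skipped because it sits outside `#div-match-report`.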

Edit: To get all the links, you can use the site's Ajax pagination API:

import requests

api_url = "https://www.iplt20.com/add-more-match-report?page={page}&type=match-reports"

for page in range(1, 4):  # <-- adjust number of pages here
    print(f"{page=}")
    data = requests.get(api_url.format(page=page)).json()
    for d in data["newsResponce"]["data"]:
        print(f'https://www.iplt20.com/news/{d["id"]}/{d["titleUrlSegment"]}')

Prints:


...

page=2
https://www.iplt20.com/news/3964/tata-ipl-2023-match-64-pbks-vs-dc-match-report
https://www.iplt20.com/news/3963/tata-ipl-2023-match-63-lsg-vs-mi-match-report
https://www.iplt20.com/news/3962/tata-ipl-2023-match-62-gt-vs-srh-match-report
https://www.iplt20.com/news/3960/tata-ipl-2023-match-61-csk-vs-kkr-match-report
https://www.iplt20.com/news/3959/tata-ipl-2023-match-60-rr-vs-rcb-match-report
https://www.iplt20.com/news/3958/tata-ipl-2023-match-59-dc-vs-pbks-match-report
https://www.iplt20.com/news/3956/tata-ipl-2023-match-58-srh-vs-lsg-match-report
https://www.iplt20.com/news/3955/tata-ipl-2023-match-57-mi-vs-gt-match-report
https://www.iplt20.com/news/3953/tata-ipl-2023-match-56-kkr-vs-rr-match-report
https://www.iplt20.com/news/3952/tata-ipl-2023-match-55-csk-vs-dc-match-report
https://www.iplt20.com/news/3951/tata-ipl-2023-match-54-mi-vs-rcb-match-report
https://www.iplt20.com/news/3947/tata-ipl-2023-match-53-kkr-vs-pbks-match-report
https://www.iplt20.com/news/3946/tata-ipl-2023-match-52-rr-vs-srh-match-report
https://www.iplt20.com/news/3945/tata-ipl-2023-match-51-gt-vs-lsg-match-report
https://www.iplt20.com/news/3944/tata-ipl-2023-match-50-dc-vs-rcb-match-report
https://www.iplt20.com/news/3943/tata-ipl-2023-match-49-csk-vs-mi-match-report
https://www.iplt20.com/news/3942/tata-ipl-2023-match-48-rr-vs-gt-match-report
https://www.iplt20.com/news/3940/tata-ipl-2023-match-47-srh-vs-kkr-match-report
https://www.iplt20.com/news/3938/tata-ipl-2023-match-46-pbks-vs-mi-match-report
https://www.iplt20.com/news/3937/tata-ipl-2023-match-45-lsg-vs-csk-match-report
https://www.iplt20.com/news/3936/tata-ipl-2023-match-44-gt-vs-dc-match-report
page=3
https://www.iplt20.com/news/3934/tata-ipl-2023-match-43-lsg-vs-rcb-match-report
https://www.iplt20.com/news/3932/tata-ipl-2023-match-42-mi-vs-rr-match-report
https://www.iplt20.com/news/3931/tata-ipl-2023-match-41-csk-vs-pbks-match-report
https://www.iplt20.com/news/3930/tata-ipl-2023-match-40-dc-vs-srh-match-report

...
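If you'd rather not hard-code the number of pages, one option is to keep requesting pages until one comes back empty. This is only a sketch: it assumes the endpoint returns an empty `data` list once you go past the last page, which I have not verified. The names `fetch_page`, `collect_links` and `max_pages` are my own, and `max_pages` is just a safety cap in case that assumption is wrong.

```python
API_URL = "https://www.iplt20.com/add-more-match-report?page={page}&type=match-reports"


def fetch_page(page):
    """Download one page of the Ajax API and return the decoded JSON."""
    # Imported here so collect_links() can be tried with a stub fetcher.
    import requests

    resp = requests.get(API_URL.format(page=page), timeout=10)
    resp.raise_for_status()
    return resp.json()


def collect_links(fetch=fetch_page, max_pages=100):
    """Collect report links page by page until a page has no items.

    Assumption: a page past the end has an empty (or missing) "data" list.
    """
    links = []
    for page in range(1, max_pages + 1):
        payload = fetch(page)
        items = (payload.get("newsResponce") or {}).get("data") or []
        if not items:
            break  # nothing more to load
        for d in items:
            links.append(
                f'https://www.iplt20.com/news/{d["id"]}/{d["titleUrlSegment"]}'
            )
    return links


# Usage:
#     for link in collect_links():
#         print(link)
```

Passing the fetcher in as a parameter keeps the stopping logic separate from the network call, so you can check it against canned responses before hitting the site.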
