I have been using BeautifulSoup to scrape TopCashBack links for a few years, but when I changed the URL to a Screwfix link, I got no data back.

import requests
from bs4 import BeautifulSoup

s = requests.get("https://www.screwfix.com/p/128hf")
soup = BeautifulSoup(s.text, 'lxml')

print(soup)

My specific question: am I getting empty data because the Screwfix site is detecting and blocking scraping, or because the site needs a parser other than 'lxml'?

Recommended answer

You just need to pass a User-Agent header when making the request, and it will work.

import requests
from bs4 import BeautifulSoup

s = requests.get("https://www.screwfix.com/p/128hf", headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"})
soup = BeautifulSoup(s.text,'lxml')
print(soup.prettify())

Output:

<!DOCTYPE html>
<html lang="en-GB" prefix="og: https://ogp.me/ns#">
 <head>
  <link href="//tags.tiqcdn.com" rel="dns-prefetch"/>
  <link href="//media.screwfix.com" rel="dns-prefetch"/>
  <meta charset="utf-8"/>
  <script type="text/javascript">
   /*
   .......
   .......
}
  </style>
  <link href="https://media.screwfix.com/" rel="dns-prefetch"/>
  <link crossorigin="anonymous" href="https://media.screwfix.com/" rel="preconnect"/>
  <script async="" src="https://tags.tiqcdn.com/utag/kingfisher/screwfix-fusionx/prod/utag.js">
  </script>
  <meta content="telephone=no" name="format-detection"/>
  <link href="/favicon.ico" rel="icon"/>
  <meta content="Product Details Page" name="page-name"/>
  <meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport"/>
  <title>
   Evolution R255SMS-DB 255mm  Electric Double-Bevel Sliding Multi-Material Mitre Saw 220-240V - Screwfix
  </title>
  <link href="https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf" rel="canonical"/>
  <meta content="https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf" property="og:url"/>
  <meta content="Screwfix.com" property="og:site_name"/>
  <meta content="product" property="og:type"/>
  <meta content="Evolution R255SMS-DB 255mm  Electric Double-Bevel Sliding Multi-Material Mitre Saw 220-240V - Screwfix" property="og:title"/>
  <meta content="Order online at Screwfix.com. Uses a single blade to cut mild steel, non-ferrous metals, plastic and wood, even if nails are embedded in the material. Provides clean and precise cuts no matter the material. Bevels to 45° in both directions and offers a maximum cross cut of 300 x 80mm both ways. Integrated laser cutting guide and positive bevel stops provide accuracy with every cut. Durable die-cast aluminium base supports a variety of materials. Features powerful 2000W motor and ergonomic over-moulded, in-line handle. On-board tool storage for convenient storage of the blade change hex key. FREE next day delivery available, free collection in 1 minute." property="og:description"/>
  <meta content="221855807852136" property="fb:app_id"/>
  <meta content="https://media.screwfix.com/is/image/ae235/128HF_P" property="og:image"/>
  <meta content="Order Evolution R255SMS-DB 255mm  Electric Double-Bevel Sliding Multi-Material Mitre Saw 220-240V at Screwfix.com. Screwfix customers rate this product 4.7/5. FREE next day delivery available, free collection in 1 minute." name="description"/>
  <script data-qaid="seo-properties" type="application/ld+json">
   {"@context":"https://schema.org/","@type":"Product","@id":"https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf","image":["https://media.screwfix.com/is/image/ae235/128HF_P","https://media.screwfix.com/is/image/ae235/128HF_A1","https://media.screwfix.com/is/image/ae235/128HF_A2","https://media.screwfix.com/is/image/ae235/128HF_A3"],"name":"Evolution R255SMS-DB 255mm  Electric Double-Bevel Sliding Multi-Material Mitre Saw 220-240V","url":"https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf","description":"Uses a single blade to cut mild steel, non-ferrous metals, plastic and wood, even if nails are embedded in the material. Provides clean and precise cuts no matter the material. Bevels to 45° in both directions and offers a maximum cross cut of 300 x 80mm both ways. Integrated laser cutting guide and positive bevel stops provide accuracy with every cut. Durable die-cast aluminium base supports a variety of materials. Features powerful 2000W motor and ergonomic over-moulded, in-line handle. 
On-board tool storage for convenient storage of the blade change hex key.","sku":"128HF","brand":{"@type":"brand","name":"Evolution"},"offers":[{"@type":"Offer","url":"https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf","itemCondition":"https://schema.org/NewCondition","price":199.99,"priceCurrency":"GBP","potentialAction":{"@type":"Action","url":"https://schema.org/BuyAction"},"availableDeliveryMethod":"https://schema.org/OnSitePickup","availability":"https://schema.org/InStock"},{"@type":"Offer","url":"https://www.screwfix.com/p/evolution-r255sms-db-255mm-electric-double-bevel-sliding-multi-material-mitre-saw-220-240v/128hf","itemCondition":"https://schema.org/NewCondition","price":199.99,"priceCurrency":"GBP","potentialAction":{"@type":"Action","url":"https://schema.org/BuyAction"},"availableDeliveryMethod":"https://schema.org/ParcelService","availability":"https://schema.org/InStock"}]}
  </script>
  <script type="application/ld+json">
   {"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https://www.screwfix.com/"},{"@type":"ListItem","position":2,"name":"Tools","item":"https://www.screwfix.com/c/tools/cat830034"},{"@type":"ListItem","position":3,"name":"Power Tools","item":"https://www.screwfix.com/c/tools/power-tools/cat830692"},{"@type":"ListItem","position":4,"name":"Saws","item":"https://www.screwfix.com/c/tools/saws/cat830716"},{"@type":"ListItem","position":5,"name":"Mitre Saws","item":"https://www.screwfix.com/c/tools/mitre-saws/cat830858"}]}
  </script>
  <meta content="26" name="next-head-count"/>
  <script type="text/javascript">
   /* Polyfill service v4.5.0
 * Disable minification (remove `.min` from URL path) for more info */
  </script>
  <script async="" src="/fusionx/srvinit.js">
  </script>
  <link as="style" href="/_next/static/css/a67c635984c510c9.css" rel="preload"/>
  <link data-n-g="" href="/_next/static/css/a67c635984c510c9.css" rel="stylesheet"/>
  <link as="style" href="/_next/static/css/740fd7f34264ccbf.css" rel="preload"/>
  <link data-n-p="" href="/_next/static/css/740fd7f34264ccbf.css" rel="stylesheet"/>
  <link as="style" href="/_next/static/css/1e3ede1424ebbe1e.css" rel="preload"/>
  <link data-n-p="" href="/_next/static/css/1e3ede1424ebbe1e.css" rel="stylesheet"/>
  <link as="style" href="/_next/static/css/ffb741840bbe8528.css" rel="preload"/>
  <link data-n-p="" href="/_next/static/css/ffb741840bbe8528.css" rel="stylesheet"/>
  <noscript data-n-css="">
  </noscript>
  <script defer="" nomodule="" src="/_next/static/chunks/polyfills-78c92fac7aa8fdd8.js">
  </script>
  <script defer="" src="/_next/static/chunks/webpack-2c90c422d382d168.js">
  </script>
  <script defer="" src="/_next/static/chunks/framework-3299c5364e0ec6ff.js">
  </script>
  <script defer="" src="/_next/static/chunks/main-b4a7336d1ecee5f3.js">
  </script>
  <script defer="" src="/_next/static/chunks/pages/_app-c2b792dea6a1ff69.js">
  </script>
  <script defer="" src="/_next/static/chunks/3467-6975bab37d2da2eb.js">
  </script>
  <script defer="" src="/_next/static/chunks/8814-159c3e8774f70836.js">
  </script>
  <script defer="" src="/_next/static/chunks/8052-47a55f18f0e34837.js">
  </script>
  <script defer="" src="/_next/static/chunks/4839-2cca491ccf37c23b.js">
  </script>
  <script defer="" src="/_next/static/chunks/2429-658db6f90127981f.js">
  </script>
  <script defer="" src="/_next/static/chunks/9139-0b6116c7425866d9.js">
  </script>
  <script defer="" src="/_next/static/chunks/pages/p/%5B...id%5D-edf4957b337aff93.js">
  </script>
  <script defer="" src="/_next/static/4b0be1b1738b369fc073e17e157b925034c5bb88/_buildManifest.js">
  </script>
  <script defer="" src="/_next/static/4b0be1b1738b369fc073e17e157b925034c5bb88/_ssgManifest.js">
  </script>
 </head>
 <body>
  <div id="__next">
   <div>
    <div class="x_28qp">
     <a href="#container-main">
      Skip to content
     </a>
    </div>
    <header class="hRCyfb">
     <div class="rBy7NP">
     </div>
     <div class="_0R2uIG">
      <div class="tYP_3K">
       <div class="mnfaDp U_v2oc">
    ..
    ..
  </body>
</html>
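
The output above includes a script tag of type application/ld+json holding the product data (SKU, price, currency), so those fields can probably be read from it without walking the page's generated class names. The following is only a sketch: the helper name extract_product_json_ld is hypothetical, the sample HTML is a trimmed stand-in for the real page, and "html.parser" is used here just to keep the example free of the lxml dependency. Whether the live page keeps this structure is an assumption.

```python
import json

from bs4 import BeautifulSoup


def extract_product_json_ld(html):
    """Return every JSON-LD object of @type "Product" found in the HTML.

    Hypothetical helper for illustration; not part of any library.
    """
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
        raw = script.string
        if not raw:
            continue
        try:
            # strict=False tolerates raw newlines inside JSON strings,
            # which the description field in the output above appears to contain.
            data = json.loads(raw, strict=False)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            products.append(data)
    return products


# Trimmed stand-in for the page; values copied from the output above.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context":"https://schema.org/","@type":"Product","sku":"128HF",
 "offers":[{"@type":"Offer","price":199.99,"priceCurrency":"GBP"}]}
</script>
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[]}
</script>
</head><body></body></html>
"""

products = extract_product_json_ld(sample_html)
print(products[0]["sku"], products[0]["offers"][0]["price"])  # 128HF 199.99
```

On the real page you would pass s.text from the request above instead of sample_html.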

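To tell the two causes raised in the question apart, it may help to look at the raw response before any parser touches it: if the status code is an error or the body is already empty, the choice of parser cannot be the problem. This is a rough sketch only. The names FetchReport, diagnose and fetch_and_diagnose are hypothetical, and what the site actually returns to a request without a browser-like User-Agent is not stated in the answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchReport:
    status_code: int
    body_length: int
    likely_cause: str


def diagnose(status_code, body):
    """Classify a response using only its status code and raw body text."""
    if status_code >= 400:
        cause = "server refused the request"
    elif not body.strip():
        cause = "server returned an empty body"
    else:
        cause = "server returned content; inspect the parsed tree next"
    return FetchReport(status_code, len(body), cause)


def fetch_and_diagnose(url, user_agent: Optional[str] = None):
    """Fetch the URL, optionally with a User-Agent header, and classify the result."""
    import requests  # imported here so diagnose() stays usable offline

    headers = {"User-Agent": user_agent} if user_agent else {}
    response = requests.get(url, headers=headers, timeout=10)
    return diagnose(response.status_code, response.text)


# Offline illustration with made-up inputs (not real responses from the site).
print(diagnose(403, "").likely_cause)
print(diagnose(200, "<html></html>").body_length)
```

Calling fetch_and_diagnose once without a user_agent and once with the browser string from the answer would show whether the header alone changes the result.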