How to handle the country/region selection on Best Buy's landing page using Python without Selenium

Posted on May 1

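I'm trying to fetch content from the Best Buy website with Python, but I hit an initial roadblock at the country selection page. On the first visit, Best Buy asks the user to choose a country/region, and this selection appears to be handled by JavaScript. I want to get past this page automatically so I can reach the site's main content.

I'm currently scraping with BeautifulSoup, but I know it doesn't handle JavaScript. If possible, I'd like to avoid Selenium or other browser automation tools.

Is there a way to simulate the country selection from Python with a library other than Selenium, for example through direct HTTP requests?

Any guidance or alternative suggestions for bypassing or simulating the country selection would be greatly appreciated!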

My code snippet:

import requests
from bs4 import BeautifulSoup

def scrape_bestbuy(product_name):
    # Build the search URL, replacing spaces with '+' for the query string
    url = f"https://www.bestbuy.com/site/searchpage.jsp?st={product_name.replace(' ', '+')}"
    # get_random_user_agent() is my own helper, defined elsewhere (not shown)
    response = requests.get(url, headers=get_random_user_agent())
    soup = BeautifulSoup(response.text, 'html.parser')
    try:
        product = soup.select_one('.sku-title a').text.strip()
        price = soup.select_one(".pricing-price div[data-testid='large-price'] .priceView-customer-price > span:nth-child(1)").text
        return {'Site': 'Bestbuy.com', 'Item title name': product, 'Price(USD)': price}
    except AttributeError:
        # select_one() returned None, so the expected elements were not on the page
        return {'Site': 'Bestbuy.com', 'Item title name': 'No Product Found', 'Price(USD)': 'N/A'}

Recommended answer

from bs4 import BeautifulSoup
import requests

product_name = "vacuum"
# Same search URL as in the question, with the intl=nosplash query parameter appended
url = f"https://www.bestbuy.com/site/searchpage.jsp?st={product_name.replace(' ', '+')}&intl=nosplash"
response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')
product = soup.select_one('.sku-title a').text.strip()
price = soup.select_one(".pricing-price div[data-testid='large-price'] .priceView-customer-price > span:nth-child(1)").text
print(price)
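
The `intl=nosplash` parameter from the answer's code can be folded back into the question's `scrape_bestbuy` function. The following is a minimal sketch, not a verified solution: `build_search_url` and `first_text` are hypothetical helper names introduced here, the CSS selectors are copied from the question and may not match the current page markup, and whether `intl=nosplash` still skips the country selection page depends on the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.bestbuy.com/site/searchpage.jsp"
PRICE_SELECTOR = (
    ".pricing-price div[data-testid='large-price'] "
    ".priceView-customer-price > span:nth-child(1)"
)


def build_search_url(product_name, skip_splash=True):
    """Build the search URL; urlencode turns spaces into '+'."""
    params = {"st": product_name}
    if skip_splash:
        # Parameter taken from the answer; its effect is an assumption about the site.
        params["intl"] = "nosplash"
    return f"{BASE_URL}?{urlencode(params)}"


def first_text(soup, selector, default):
    """Return the stripped text of the first match, or default if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def scrape_bestbuy(product_name):
    # Third-party imports are local so build_search_url works without them installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        build_search_url(product_name),
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "Site": "Bestbuy.com",
        "Item title name": first_text(soup, ".sku-title a", "No Product Found"),
        "Price(USD)": first_text(soup, PRICE_SELECTOR, "N/A"),
    }
```

Checking the result of `select_one` for `None` replaces the `try/except AttributeError` from the question, so a missing price no longer discards a title that was found.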
