Python Scrapy Shell Error While Scraping Walmart

Posted on February 15

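I am using Scrapy to scrape walmart.com. Fetching https://www.walmart.com/ works without errors, but when I try to fetch "https://www.walmart.com/search?q=tablets&typeahead=tabltes" I get the error below. I have disabled robots.txt compliance (ROBOTSTXT_OBEY) and am using a fake user agent.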

2024-02-14 09:42:25 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.walmart.com/search?q=tablets&typeahead=tabltes>
2024-02-14 09:42:25 [py.warnings] WARNING: C:\USERS\SADAM1\PycharmProjects\Untitled4\venv\lib\site-packages\scrapy_fake_useragent\middleware.py:95: ScrapyDeprecationWarning: Attribute RetryMiddleware.EXCEPTIONS_TO_RETRY is deprecated. Use the RETRY_EXCEPTIONS setting instead.
  if isinstance(exception, self.EXCEPTIONS_TO_RETRY)

import scrapy


class Wal1Spider(scrapy.Spider):
    name = "wal1"
    allowed_domains = ["walmart.com"]
    start_urls = ["https://walmart.com"]

    # Per-spider overrides of the project settings.
    custom_settings = {
        "DOWNLOAD_DELAY": 6.3,
        "RANDOMIZE_DOWNLOAD_DELAY": True,
        "COOKIES_ENABLED": False,
        "AUTOTHROTTLE_ENABLED": True,
        "AUTOTHROTTLE_START_DELAY": 2,  # key must not contain trailing whitespace
        "AUTOTHROTTLE_MAX_DELAY": 11.7,
        "AUTOTHROTTLE_TARGET_CONCURRENCY": 1,
        "CONCURRENT_REQUESTS": 4,
        "ROBOTSTXT_OBEY": False,
    }

    def parse(self, response):
        # No parsing logic yet.
        pass

[enter image description here](https://i.stack.imgur.com/oeTg0.png)

I tried disabling robots.txt compliance and using scrapy-fake-useragent.
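As background for the robots.txt DEBUG line in the log, here is a minimal sketch of the kind of check a robots.txt filter performs. It uses Python's standard urllib.robotparser rather than Scrapy's own middleware, and the rules in HYPOTHETICAL_ROBOTS_TXT are an assumption made up for illustration, not Walmart's actual robots.txt. The helper name is_allowed is hypothetical as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real walmart.com robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    home = "https://www.walmart.com/"
    search = "https://www.walmart.com/search?q=tablets&typeahead=tabltes"
    print("home allowed:", is_allowed(HYPOTHETICAL_ROBOTS_TXT, home))
    print("search allowed:", is_allowed(HYPOTHETICAL_ROBOTS_TXT, search))
```

Under rules like these, the home page passes while the search URL is rejected, which would match the behaviour described in the question. If a check of this kind is what blocks the request, one possibility worth verifying is that ROBOTSTXT_OBEY is still enabled for the shell session, for example because the shell is using the project settings rather than the spider's custom_settings.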

fetch("https://www.walmart.com/search?q=tablets&typeahead=tablte")
# 2024-02-14 19:24:29 [scrapy.core.engine] INFO: Spider opened
# 2024-02-14 19:24:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/search?q=tablets&typeahead=tablte> (referer: None)
