I am using Scrapy to crawl walmart.com. Fetching https://www.walmart.com/ works without errors, but when I try to fetch "https://www.walmart.com/search?q=tablets&typeahead=tabltes" I get the error below. I have already disabled robots.txt obedience and am using a fake user agent.
```
2024-02-14 09:42:25 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.walmart.com/search?q=tablets&typeahead=tabltes>
2024-02-14 09:42:25 [py.warnings] WARNING: C:\USERS\SADAM1\PycharmProjects\Untitled4\venv\lib\site-packages\scrapy_fake_useragent\middleware.py:95: ScrapyDeprecationWarning: Attribute RetryMiddleware.EXCEPTIONS_TO_RETRY is deprecated. Use the RETRY_EXCEPTIONS setting instead.
  if isinstance(exception, self.EXCEPTIONS_TO_RETRY)
```

[enter image description here](https://i.stack.imgur.com/oeTg0.png)

My spider:

```python
import scrapy


class Wal1Spider(scrapy.Spider):
    name = "wal1"
    allowed_domains = ["walmart.com"]
    start_urls = ["https://walmart.com"]
    custom_settings = {
        "DOWNLOAD_DELAY": 6.3,
        "RANDOMIZE_DOWNLOAD_DELAY": True,
        "COOKIES_ENABLED": False,
        "AUTOTHROTTLE_ENABLED": True,
        # note: this key originally had a trailing space ("AUTOTHROTTLE_START_DELAY "),
        # which Scrapy would silently ignore
        "AUTOTHROTTLE_START_DELAY": 2,
        "AUTOTHROTTLE_MAX_DELAY": 11.7,
        "AUTOTHROTTLE_TARGET_CONCURRENCY": 1,
        "CONCURRENT_REQUESTS": 4,
        "ROBOTSTXT_OBEY": False,
    }

    def parse(self, response):
        pass
```
I have tried disabling robots.txt obedience and using scrapy-fake-useragent.
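For context, my understanding is that per-spider `custom_settings` carry "spider" priority and override project-level settings, so the `ROBOTSTXT_OBEY: False` above should win over any `ROBOTSTXT_OBEY = True` in `settings.py`. A minimal sketch of that precedence using plain dicts (an illustration of the override order only, not Scrapy's actual settings machinery; the project-level values are assumed):

```python
# Illustration only: Scrapy resolves settings by priority
# (built-in defaults < project settings.py < spider custom_settings).
# Plain dicts mimic that override order here.
project_settings = {"ROBOTSTXT_OBEY": True, "CONCURRENT_REQUESTS": 16}  # assumed settings.py values
spider_custom_settings = {"ROBOTSTXT_OBEY": False, "CONCURRENT_REQUESTS": 4}

# The later dict wins, like a higher-priority Scrapy setting.
effective = {**project_settings, **spider_custom_settings}
print(effective["ROBOTSTXT_OBEY"])  # -> False
```

If this precedence holds, the "Forbidden by robots.txt" line should not be produced by this spider's own settings, which is why the log confuses me.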