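I am scraping restaurant reviews from Yelp, and I do this by calling each restaurant's review API. I am currently collecting 4-star reviews; for example, this restaurant page has this corresponding API.
This is the code block that sends the HTTP request to the API while the crawler is on the restaurant page: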
bizId = response.xpath("//meta[@name='yelp-biz-id']/@content").extract_first()
api_url = 'https://www.yelp.it/biz/' + bizId + '/review_feed?rr=' + str(n_star_filter)
yield response.follow(url=api_url, callback=self.parse_yelp_restaurant_api)
Sometimes the API is accessed correctly and I am able to scrape the reviews. Most of the time, however, I get an error like this:
2023-10-27 15:57:39 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.yelp.it/biz/78t73jTxdUw5C-v44lj4Iw/review_feed?rr=4>
Traceback (most recent call last):
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
result = context.run(gen.send, result)
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/scrapy/core/downloader/middleware.py", line 64, in process_response
method(request=request, response=response, spider=spider)
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 63, in process_response
decoded_body = self._decode(response.body, encoding.lower())
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 102, in _decode
body = brotli.decompress(body)
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/brotli/brotli.py", line 90, in decompress
d.finish()
File "/Users/mauri/anaconda3/lib/python3.11/site-packages/brotli/brotli.py", line 464, in finish
raise Error("Decompression error: incomplete compressed stream.")
brotli.brotli.Error: Decompression error: incomplete compressed stream.
I don't understand what this means. It is really strange that some API responses download fine while others, which look no different, produce this error.
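One thing I am considering, as a minimal sketch rather than a confirmed fix: the traceback fails inside `brotli.decompress`, so if the server is sending Brotli-compressed bodies that arrive truncated, asking it not to use Brotli might sidestep the problem. The helper below builds the same URL as my code above and adds an `Accept-Encoding` header without `br`. The name `build_review_feed_request` is hypothetical, and whether Yelp honors this header is an assumption I have not verified.

```python
from urllib.parse import urlencode


def build_review_feed_request(biz_id, n_star_filter, base="https://www.yelp.it"):
    """Hypothetical helper: return the URL and headers for the review-feed call."""
    # Same URL shape as in the spider: /biz/<id>/review_feed?rr=<stars>
    query = urlencode({"rr": n_star_filter})
    url = f"{base}/biz/{biz_id}/review_feed?{query}"
    # Assumption: leaving "br" out makes the server fall back to gzip/deflate,
    # so the Brotli decoder is never invoked on the response body.
    headers = {"Accept-Encoding": "gzip, deflate"}
    return {"url": url, "headers": headers}


kwargs = build_review_feed_request("78t73jTxdUw5C-v44lj4Iw", 4)
print(kwargs["url"])
print(kwargs["headers"])
```

In the spider this would be used as `yield response.follow(callback=self.parse_yelp_restaurant_api, **build_review_feed_request(bizId, n_star_filter))`, since `response.follow` accepts a `headers` argument. If the error persists with gzip as well, the truncation is probably happening somewhere other than the Brotli decoding step.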