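I'm learning Scrapy. As an example, there is the site http://quotes.toscrape.com. I'm creating a simple spider (scrapy genspider quotes). I want to parse the quotes, and also follow the link to each author's page and parse the author's birth date. I tried to do it like this, but it doesn't work.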
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        quotes = response.xpath('//div[@class="quote"]')
        item = {}
        for quote in quotes:
            item['name'] = quote.xpath('.//span[@class="text"]/text()').get()
            item['author'] = quote.xpath('.//small[@class="author"]/text()').get()
            item['tags'] = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').getall()
            url = quote.xpath('.//small[@class="author"]/../a/@href').get()
            response.follow(url, self.parse_additional_page, item)
        new_page = response.xpath('//li[@class="next"]/a/@href').get()
        if new_page is not None:
            yield response.follow(new_page, self.parse)

    def parse_additional_page(self, response, item):
        item['additional_data'] = response.xpath('//span[@class="author-born-date"]/text()').get()
        yield item
The code without the birth date (this works correctly):
import scrapy

class QuotesSpiderSpider(scrapy.Spider):
    name = "quotes_spider"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        quotes = response.xpath('//div[@class="quote"]')
        for quote in quotes:
            yield {
                'name': quote.xpath('.//span[@class="text"]/text()').get(),
                'author': quote.xpath('.//small[@class="author"]/text()').get(),
                'tags': quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').getall()
            }
        new_page = response.xpath('//li[@class="next"]/a/@href').get()
        if new_page is not None:
            yield response.follow(new_page, self.parse)
Question: How do I visit the author's page for each quote and parse the birth date?