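I am trying to scrape the brand, name, and image URL of every product from https://www.mecca.com.au/skin-care/
I am having trouble scraping the image URL, because the image element has no href.
For example, this is the relevant markup on one product page: https://www.mecca.com.au/tatcha/indigo-cleansing-balm/I-059912.html?cgpath=skincare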
<picture class="css-1azgcry-container" id="image-reponsive-container" data-testid="imageReponsiveContainer">
<img class="" title="Tatcha - Indigo Cleansing Balm" alt="Tatcha - Indigo Cleansing Balm" src="https://www.mecca.com.au/on/demandware.static/-/Sites-mecca-online-catalog/default/dw0a0707d8/product/tatcha/hr/i-059912-indigo-cleansing-balm-7-1-940.jpg" data-testid="imageReponsiveImg"></picture>
The image URL I want to extract is just the value of the src attribute, for example:
src="https://www.mecca.com.au/on/demandware.static/-/Sites-mecca-online-catalog/default/dw0a0707d8/product/tatcha/hr/i-059912-indigo-cleansing-balm-7-1-940.jpg"
Here is my code:
import requests
from bs4 import BeautifulSoup

url = "https://www.mecca.com.au/skin-care/"
params = {"start": "0", "sz": "36", "format": "ajax"}

for params["start"] in range(0, 36 * 5, 36):
    productlinks = []
    cataloguelist = []
    soup = BeautifulSoup(requests.get(url, params=params).content, "html.parser")
    products = soup.find_all('div', class_="grid-product-info")
    for item in products:
        for link in item.find_all('a', href=True):
            productlinks.append(url + link['href'])

    for link in productlinks:
        response = requests.get(link)
        soup = BeautifulSoup(response.content, "html.parser")
        brand = soup.find('a', class_='css-1p371np-size5-size5-sansSerif-sansSerif-brandNameLink').text
        name = soup.find('span', class_='css-1noela6-paragraph-paragraph-sansSerif-sansSerif-productName').text
        ### This is where I'm struggling, I tried 'picture', class_='css-1azgcry-container' as well
        image = soup.find('picture', attrs={'css-1azgcry-container', 'img', 'alt', 'src'})
        print(brand, name, image)
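For reference, here is a minimal sketch of what I think the lookup should do, run against the markup quoted earlier rather than the live site. The helper name extract_image_src is made up for illustration. Matching on data-testid instead of the css-... class is my own choice, on the assumption that the class name is auto-generated and may change. It also assumes the picture element is present in the HTML that requests receives. If the page builds it with JavaScript, this would return None.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Sample markup copied from the product page snippet quoted in the question.
SAMPLE_HTML = """
<picture class="css-1azgcry-container" id="image-reponsive-container" data-testid="imageReponsiveContainer">
<img class="" title="Tatcha - Indigo Cleansing Balm" alt="Tatcha - Indigo Cleansing Balm" src="https://www.mecca.com.au/on/demandware.static/-/Sites-mecca-online-catalog/default/dw0a0707d8/product/tatcha/hr/i-059912-indigo-cleansing-balm-7-1-940.jpg" data-testid="imageReponsiveImg"></picture>
"""


def extract_image_src(html: str) -> Optional[str]:
    """Hypothetical helper: return the src of the <img> inside the product <picture>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: data-testid is more stable than the generated css-... class name.
    picture = soup.find("picture", attrs={"data-testid": "imageReponsiveContainer"})
    if picture is None:
        return None
    # The URL lives on the nested <img>, not on <picture> itself.
    img = picture.find("img", src=True)
    return img["src"] if img is not None else None


print(extract_image_src(SAMPLE_HTML))
```

The key difference from my attempt is that attrs takes a dict of attribute names to values, and the src has to be read from the img tag nested inside picture.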
Thanks for your help!