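Usually, when I want to scrape a specific piece of text from a website, I right-click the text and select Inspect. Then, in the HTML code, I find the element containing the text I'm interested in and copy its CSS selector.
I then paste that copied string into soup.select("<copied selector here>") and save the result to a variable. After that, I strip the text to get the key value I need.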
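A minimal sketch of that workflow, run against a small hypothetical HTML snippet instead of a live page (the `div.listing` / `span.price` markup is invented for illustration). Note that `select()` always returns a list, so an empty list means "no element matched":

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<html><body>
  <div class="listing">
    <span class="price">  $35,990  </span>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of matching Tag objects (empty if nothing matches).
matches = soup.select("body > div.listing > span.price")

# get_text(strip=True) removes the surrounding whitespace from the text.
price_text = matches[0].get_text(strip=True) if matches else None
print(price_text)  # $35,990
```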
In the case I'm working on now, I want to get the total number of cars shown in the page heading (h1) of carsales.com.au/cars/used/toyota/rav4/.
As of now, that number is 1712.
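Once the heading text is in hand, the count still has to be pulled out of it. A small sketch, assuming the total is the first number in the heading and may contain a thousands separator; the heading wording below is hypothetical, and `parse_car_count` is an illustrative helper name:

```python
import re
from typing import Optional


def parse_car_count(heading_text: str) -> Optional[int]:
    """Return the first integer in heading_text, ignoring comma thousands separators."""
    match = re.search(r"\d[\d,]*", heading_text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Hypothetical heading text; the real h1 wording on the site may differ.
print(parse_car_count("1,712 Toyota Rav4 cars for sale"))  # 1712
```

If the real heading puts the model name (with its "4") before the count, the "first number" assumption breaks and the pattern would need anchoring to the surrounding words.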
Here is my code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.carsales.com.au/cars/used/toyota/rav4/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36'}
res = requests.get(url, headers=headers)
res.raise_for_status()
# # prints entire website
# print(res.text)
# # if this gives 200, then you're good to go.
#print(res.status_code)
soup = bs(res.text, 'html.parser')
# # This one gets how many cars are available from the search link.
# # This is the alternate way as the soup.select method is not working.
# header_h1 = soup.find_all('h1')
# print(header_h1)
total_cars_element = soup.select('body > div.listing > div.container.listing-container.has-header-sticky > div.row.flex-nowrap.no-gutters > div:nth-child(1) > div:nth-child(1) > div')
print(total_cars_element)
# the above prints an empty list.
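One likely reason for the empty list (hedged, since the live page can't be checked here): the selector DevTools copies describes the DOM as the browser renders it, after JavaScript has run. The raw HTML that `requests` receives can differ, with other class names, missing wrappers, or even a bot-check page, and a single mismatched step anywhere in a long `>` chain makes `select()` return `[]`. The hypothetical helper below walks the selector one step at a time to show where matching stops. It assumes the selector uses only `>` combinators and has no `>` inside attribute values; the HTML is invented for illustration:

```python
from bs4 import BeautifulSoup


def first_failing_step(soup, selector):
    """Hypothetical helper: test each prefix of an 'a > b > c' selector.

    Returns (step_index, partial_selector) for the first prefix that matches
    nothing, or None if the full selector matches.
    """
    steps = [part.strip() for part in selector.split(">")]
    for i in range(1, len(steps) + 1):
        partial = " > ".join(steps[:i])
        if not soup.select(partial):
            return i - 1, partial
    return None


# Hypothetical raw HTML whose classes differ from what the browser shows.
raw_html = """
<html><body>
  <div class="listing">
    <div class="container">
      <h1>1,712 Toyota Rav4 cars for sale</h1>
    </div>
  </div>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
selector = "body > div.listing > div.container.listing-container > h1"
print(first_failing_step(soup, selector))
# (2, 'body > div.listing > div.container.listing-container')

# A short selector survives layout differences better than a copied full path.
print(soup.select_one("h1").get_text(strip=True))
```

Running the same check on `res.text` from the real request shows which step of the copied selector has no counterpart in the HTML that `requests` actually downloaded.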
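I really just want to know why this doesn't work. I know there are other workarounds, such as the one I left commented out in the code above, but I'd really like to stick with the soup.select method.
Many thanks for any insight!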