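I'm trying to scrape several values from a website (https://carone.com.uy/autos-usados-y-0km?p=21). Some of them work, but others don't. For example, I can scrape the name, model, price, and fuel, but I can't correctly scrape the "Año" or "Kilómetros" fields: the code always returns "N/A" as the value.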

Here is the code I'm using:

import pandas as pd
from datetime import date
import os
import socket
import requests
from bs4 import BeautifulSoup

def scrape_product_data(url):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
        }

        product_data = []

        # Make the request to get the HTML content
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Check if the request was successful

        soup = BeautifulSoup(response.text, 'html.parser')
        product_elements = soup.find_all('div', class_='product-item-info')
        for product_element in product_elements:
            # Extract product name, price, model, and attributes as before (same code as previous version)
            product_name_element = product_element.select_one('p.carone-car-info-data-brand.cursor-pointer')
            product_name = product_name_element.text.strip() if product_name_element else "N/A"

            product_price_element = product_element.find('span', class_='price')
            product_price = product_price_element.text.strip() if product_price_element else "N/A"

            product_model_element = product_element.select_one('p.carone-car-info-data-model')
            product_model = product_model_element.get('title').strip() if product_model_element else "N/A"

            # Extract product attributes
            attributes_div = product_element.find('div', class_='carone-car-attributes')
            
            year_element = attributes_div.find('p', class_='carone-car-attribute-title', text='Año')
            year_value = year_element.find_previous_sibling('p', class_='carone-car-attribute-value').text if year_element else "N/A"

            kilometers_element = attributes_div.find('p', class_='carone-car-attribute-title', text='Kilómetros')
            kilometers_value = kilometers_element.find_previous_sibling('p', class_='carone-car-attribute-value').text if kilometers_element else "N/A"

            fuel_element = attributes_div.find('p', class_='carone-car-attribute-title', text='Combustible')
            fuel_value = fuel_element.find_previous_sibling('p', class_='carone-car-attribute-value').text if fuel_element else "N/A"

            # Append product data as a tuple (name, price, model, year, kilometers, fuel) to the list
            product_data.append((product_name, product_price, product_model, year_value, kilometers_value, fuel_value))

The result looks like this: [image]

I don't understand why the values mentioned above always come back as N/A when the others work fine and the method is the same.

Recommended answer

The problem is that the site uses Kil&oacutemetros, not Kilómetros, as the text of the element (the same goes for Año, which appears as A&ntildeo):

import pandas as pd
import requests
from bs4 import BeautifulSoup


def scrape_product_data(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }

    product_data = []

    response = requests.get(url, headers=headers)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    product_elements = soup.find_all("div", class_="product-item-info")
    for product_element in product_elements:
        product_name_element = product_element.select_one(
            "p.carone-car-info-data-brand.cursor-pointer"
        )
        product_name = (
            product_name_element.text.strip() if product_name_element else "N/A"
        )

        product_price_element = product_element.find("span", class_="price")
        product_price = (
            product_price_element.text.strip() if product_price_element else "N/A"
        )

        product_model_element = product_element.select_one(
            "p.carone-car-info-data-model"
        )
        product_model = (
            product_model_element.get("title").strip()
            if product_model_element
            else "N/A"
        )

        attributes_div = product_element.find("div", class_="carone-car-attributes")

        year_element = attributes_div.find(
            "p", class_="carone-car-attribute-title", string="A&ntildeo"
        )
        year_value = (
            year_element.find_previous_sibling(
                "p", class_="carone-car-attribute-value"
            ).text
            if year_element
            else "N/A"
        )

        kilometers_element = attributes_div.find(
            "p", class_="carone-car-attribute-title", string="Kil&oacutemetros"
        )
        kilometers_value = (
            kilometers_element.find_previous_sibling(
                "p", class_="carone-car-attribute-value"
            ).text
            if kilometers_element
            else "N/A"
        )

        fuel_element = attributes_div.find(
            "p", class_="carone-car-attribute-title", string="Combustible"
        )
        fuel_value = (
            fuel_element.find_previous_sibling(
                "p", class_="carone-car-attribute-value"
            ).text
            if fuel_element
            else "N/A"
        )

        product_data.append(
            (
                product_name,
                product_price,
                product_model,
                year_value,
                kilometers_value,
                fuel_value,
            )
        )

    return pd.DataFrame(
        product_data, columns=["Name", "Price", "Model", "Year", "KM", "Fuel"]
    )


df = scrape_product_data("https://carone.com.uy/autos-usados-y-0km?p=2")
print(df)

Prints:

                 Name      Price                                 Model  Year      KM   Fuel
0        Renault Kwid  US$12.000               KWID 1.0 INTENSE TACTIL  2018  82.390  NAFTA
1      Chevrolet Onix  US$20.800                   NEW ONIX 1.0T RS MT  2021  46.000  NAFTA
2        Suzuki Swift  US$17.800                 NUEVO SWIFT 1.2 GL AT  2020  63.641  NAFTA
3           Fiat Toro  US$23.800                TORO 1.8 FREEDOM DC MT  2021  15.330  NAFTA
4       Renault Oroch  US$26.300  NEW OROCH INTENS OUTSIDER 1.3T DC AT  2023  21.360  NAFTA
5     Renault Stepway  US$15.100                 STEPWAY PRIVILEGE 1.6  2017  60.010  NAFTA
6        Renault Kwid  US$13.100                         KWID 1.0 LIFE  2022      14  NAFTA
7      Chevrolet Onix  US$22.800              NEW ONIX 1.0T PREMIER AT  2021  14.780  NAFTA
8   Nissan SENTRA B18  US$34.000           SENTRA B18 2.0 EXCLUSIVE AT  2022  30.430  NAFTA
9        Renault Kwid  US$13.500                   KWID 1.0 INTENSE MT  2020  37.660  NAFTA
10  Chevrolet Tracker  US$16.300                TRACKER 1.8 LTZ 4X4 AT  2014  91.689  NAFTA
11     Chevrolet Onix  US$18.600            NEW ONIX PLUS 1.2 LS 4P MT  2022  24.658  NAFTA
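
The hard-coded strings above match only the raw, undecoded labels. As a possible alternative, the sketch below compares labels after decoding HTML entities, so the same check passes whether the text arrives as Kil&oacutemetros or as Kilómetros. This is only a sketch: the helper name label_is is hypothetical, and it assumes the labels differ only by entity encoding, surrounding whitespace, or letter case.

```python
import html
import unicodedata


def label_is(expected):
    """Return a predicate that is true when a label equals `expected`
    after entity decoding, whitespace stripping, and Unicode normalization."""
    target = unicodedata.normalize("NFC", expected).casefold()

    def matches(text):
        if text is None:  # nothing to compare against
            return False
        # html.unescape also decodes legacy entities written without ';'
        decoded = html.unescape(str(text)).strip()
        return unicodedata.normalize("NFC", decoded).casefold() == target

    return matches


is_km = label_is("Kilómetros")
print(is_km("Kil&oacutemetros"))  # True
print(is_km("Kilómetros"))        # True
print(is_km("Combustible"))       # False

# Hypothetical use inside the scraping loop (not run here):
# attributes_div.find(
#     "p", class_="carone-car-attribute-title", string=label_is("Año")
# )
```

Beautiful Soup's string argument accepts a function as well as a literal string, so the predicate can replace string="A&ntildeo" and string="Kil&oacutemetros" without changing the rest of the loop.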
