BeautifulSoup with Python: cannot get the text of h2 tags

Posted on October 12

I am trying to get the values from the "Economia" section of this web page:

I want to get all of the headlines. This is my current code:

html = client.get("http://larepublica.pe/")
soup = BeautifulSoup(html.text, 'html.parser')

# Collect the headlines, one per line
economyNews = ""
for h2 in soup.findAll('h2', attrs={'class':'ItemSection_itemSection__title__PleA9'}):
    n = h2.text
    economyNews += n + "\n"

print(economyNews)

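I have tried many ways to get this, but the web page seems to be blocking it. Any ideas for solving this would be much appreciated. Thanks a lot.

Recommended answer

You can try: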

import requests
from bs4 import BeautifulSoup

url = "https://larepublica.pe/"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")


for h2 in soup.select("div:has(*:-soup-contains(Economía)) + div h2"):
    print(h2.text)

Prints:

Banco Mundial: tasas de interés se mantendrán altas por más tiempo
Precio del dólar cierra al alza y se ubica en S/3,831 este miércoles 11 de octubre
Retiro AFP: ¿cuándo fue la última vez que se autorizó la liberación de fondos y cuánto se devolvió?
Debate sobre RETIRO AFP 2023: SBS y Congreso deliberaron sobre posible medida
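
To see how that selector works without hitting the live site, here is a minimal offline sketch. The markup in SAMPLE_HTML and the helper name section_headlines are made up for illustration; the real page's structure may differ and can change at any time. It assumes a soupsieve version that supports :-soup-contains (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page: each section label
# sits in one <div>, and its headlines sit in the next sibling <div>.
SAMPLE_HTML = """
<section>
  <div><h3>Deportes</h3></div>
  <div><h2>Sports headline</h2></div>
  <div><h3>Economía</h3></div>
  <div>
    <h2>Economy headline A</h2>
    <h2>Economy headline B</h2>
  </div>
</section>
"""


def section_headlines(html, label):
    """Return the h2 texts in the <div> that directly follows the labelled <div>."""
    soup = BeautifulSoup(html, "html.parser")
    # div:has(*:-soup-contains("..."))  -> a <div> with a descendant containing the label
    # + div                             -> the element sibling right after it, if it is a <div>
    # h2                                -> every <h2> inside that sibling
    selector = f'div:has(*:-soup-contains("{label}")) + div h2'
    return [h2.get_text(strip=True) for h2 in soup.select(selector)]


headlines = section_headlines(SAMPLE_HTML, "Economía")
print(headlines)
```

Matching on the visible section label instead of a class name like ItemSection_itemSection__title__PleA9 avoids depending on a suffix that looks auto-generated, though it still breaks if the site changes its layout.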