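I'm having trouble scraping the player hyperlinks from the web page below: my code only prints the players from the menu at the bottom of the page, not the players listed in the box score for the game. What do I need to change to get the Minnesota Twins and Angels players?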

import requests
from bs4 import BeautifulSoup

# URL of the webpage
url = "https://www.baseball-reference.com/boxes/ANA/ANA202305210.shtml"

# Send a GET request to the webpage
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the webpage using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all hyperlink elements on the page with "/players/" in the href attribute
    links = soup.find_all('a', href=lambda href: href and '/players/' in href)
    
    # Extract and print the href attribute of each matching hyperlink
    for link in links:
        href = link.get('href')
        print(href)
else:
    print("Failed to fetch the webpage.")

Recommended answer

If you inspect the page source in your browser (Ctrl + U), you will see that the box score tables are stored inside HTML comments (<!-- ... -->), so BeautifulSoup treats them as comment text rather than tags and find_all() never sees the links inside them.

You can load the page, find all the relevant comment sections, and convert them into a new BeautifulSoup object. The player links then become visible:

import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.baseball-reference.com/boxes/ANA/ANA202305210.shtml"

response = requests.get(url)
response.raise_for_status()

soup = BeautifulSoup(response.content, "html.parser")

# the key part:
# convert the HTML comment section <!-- ... --> to new BeautifulSoup object
new_soup = ""
for c in soup.find_all(string=Comment):
    new_soup += c if c.strip().startswith("<") else ""

new_soup = BeautifulSoup(new_soup, "html.parser")
links = new_soup.find_all("a", href=lambda href: href and "/players/" in href)

for link in links:
    href = link.get("href")
    print(f"{link.text:<30} {href}")

Prints:

Joey Gallo                     /players/g/gallojo01.shtml
Carlos Correa                  /players/c/correca01.shtml
Alex Kirilloff                 /players/k/kirilal01.shtml
Edouard Julien                 /players/j/julieed01.shtml
Kyle Farmer                    /players/f/farmeky01.shtml
Trevor Larnach                 /players/l/larnatr01.shtml
Willi Castro                   /players/c/castrwi01.shtml
Donovan Solano                 /players/s/solando01.shtml
Ryan Jeffers                   /players/j/jeffery01.shtml
Pablo López                    /players/l/lopezpa01.shtml
Jorge López                    /players/l/lopezjo02.shtml
José De León                   /players/d/deleojo03.shtml

...and so on.
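If you want the players separated by team rather than in one flat list, one possible approach is to parse each comment on its own and group the links by the id of the table they sit in. The sketch below is an assumption-laden illustration: player_links_by_table is a hypothetical helper, and the table ids "TeamAbatting" and "TeamBbatting" in the sample are invented, so check the page source for the ids the site actually uses.

```python
from collections import defaultdict

from bs4 import BeautifulSoup, Comment


def player_links_by_table(html):
    """Map each commented-out table's id to its (name, href) player links."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = defaultdict(list)
    for comment in soup.find_all(string=Comment):
        # Skip comments that are plain notes rather than hidden markup.
        if "<table" not in comment:
            continue
        inner = BeautifulSoup(comment, "html.parser")
        for table in inner.find_all("table"):
            table_id = table.get("id", "unknown")
            for a in table.find_all("a", href=lambda h: h and "/players/" in h):
                entry = (a.get_text(strip=True), a["href"])
                # A player can appear more than once; keep the first occurrence.
                if entry not in grouped[table_id]:
                    grouped[table_id].append(entry)
    return dict(grouped)


# Invented sample markup; the ids are placeholders, not the site's real ids.
sample = """
<!--
<table id="TeamAbatting">
<tr><td><a href="/players/g/gallojo01.shtml">Joey Gallo</a></td></tr>
<tr><td><a href="/players/c/correca01.shtml">Carlos Correa</a></td></tr>
<tr><td><a href="/players/g/gallojo01.shtml">Joey Gallo</a></td></tr>
</table>
-->
<!--
<table id="TeamBbatting">
<tr><td><a href="/players/x/example01.shtml">Example Player</a></td></tr>
</table>
-->
<!-- a plain note, no markup -->
"""

result = player_links_by_table(sample)
for table_id, players in result.items():
    print(table_id)
    for name, href in players:
        print(f"  {name:<30} {href}")
```

On the real page you would pass response.text to the helper and then pick out the entries whose table id belongs to the team you are interested in.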
