Python web scraping: multiple tables from a single web page

Posted on June 22

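Looking for help! I found some code from a question similar to mine. At a high level, I want to pull multiple tables (for example "Per Game" and "Totals") from the same web page.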

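Not sure if this matters, but I'm using JupyterLab for this. My knowledge of writing Python is very limited (but I'm trying to learn!), so I'm struggling to get what I want from these two sites: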

https://www.sports-reference.com/cbb/players/jaden-ivey-1.html

or

https://basketball.realgm.com/player/Jaden-Ivey/Summary/148740

Essentially, the code below works for the fbref web page, but when I replace that source link with either of the two sites above, I can't figure out how to get what I want.

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The table sits inside an HTML comment, so locate that comment and parse it as HTML.
table = BeautifulSoup(soup.select_one('#all_stats_standard').find_next(string=lambda x: isinstance(x, Comment)), 'html.parser')

# Print some information from the table to the screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))

I know there are similar questions on Stack Overflow, so I understand this may be considered a duplicate, but I need some further help because I'm a novice.

Thanks

[   Season  School     Conf   G  GS    MP   FG  ...  STL  BLK  TOV   PF   PTS  Unnamed: 27    SOS
0  2020-21  Purdue  Big Ten  23  12  24.2  3.9  ...  0.7  0.7  1.3  1.7  11.1          NaN  11.23
1  2021-22  Purdue  Big Ten  36  34  31.4  5.6  ...  0.9  0.6  2.6  1.8  17.3          NaN   8.23
2   Career  Purdue      NaN  59  46  28.6  4.9  ...  0.8  0.6  2.1  1.7  14.9          NaN   9.73

[3 rows x 29 columns],
    Season  School     Conf   G  GS    MP   FG   FGA  ...  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0  2020-21  Purdue  Big Ten  19  10  23.3  3.5   9.2  ...  2.7  3.6  2.1  0.8  0.7  1.4  1.6  10.3
1  2021-22  Purdue  Big Ten  19  17  32.6  5.5  12.8  ...  3.3  4.2  2.9  0.9  0.5  2.5  1.9  17.5
2   Career  Purdue      NaN  38  27  27.9  4.5  11.0  ...  3.0  3.9  2.5  0.9  0.6  1.9  1.8  13.9

[3 rows x 27 columns],
    Season  School     Conf   G  GS    MP   FG  FGA  ...  DRB  TRB  AST  STL  BLK  TOV   PF  PTS
0  2020-21  Purdue  Big Ten  23  12   557   89  223  ...   57   76   43   17   16   31   39  256
1  2021-22  Purdue  Big Ten  36  34  1132  203  441  ...  152  176  110   33   20   94   63  624
2   Career  Purdue      NaN  59  46  1689  292  664  ...  209  252  153   50   36  125  102  880

[3 rows x 27 columns],
    Season  School     Conf   G  GS    MP   FG  FGA  ...  DRB  TRB  AST  STL  BLK  TOV   PF  PTS
0  2020-21  Purdue  Big Ten  19  10   442   66  174  ...   51   68   39   15   13   26   31  195
1  2021-22  Purdue  Big Ten  19  17   620  104  244  ...   62   79   55   18   10   47   36  333
2   Career  Purdue      NaN  38  27  1062  170  418  ...  113  147   94   33   23   73   67  528

[3 rows x 27 columns],
    Season  School     Conf   G  GS    MP   FG  ...  TRB  AST  STL  BLK  TOV   PF   PTS
0  2020-21  Purdue  Big Ten  23  12   557  6.4  ...  5.5  3.1  1.2  1.1  2.2  2.8  18.4
1  2021-22  Purdue  Big Ten  36  34  1132  7.2  ...  6.2  3.9  1.2  0.7  3.3  2.2  22.0
2   Career  Purdue      NaN  59  46  1689  6.9  ...  6.0  3.6  1.2  0.9  3.0  2.4  20.8

[3 rows x 25 columns]]
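
One possible way to collect several tables from one page, whether they sit in the normal markup or inside HTML comments (the pattern the fbref snippet in the question unwraps), is to gather the visible tables first and then parse every comment as HTML. The following is only a minimal sketch under assumptions: the inline `SAMPLE_HTML`, the table ids `per_game` and `totals`, and the helper name `collect_tables` are made up for illustration, and the real markup on the two sites in the question may differ.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample page (an assumption, not the real site markup):
# one table in the live document and one hidden inside an HTML comment.
SAMPLE_HTML = """
<div id="all_per_game">
  <table id="per_game">
    <tr><th>Season</th><th>G</th><th>PTS</th></tr>
    <tr><td>2020-21</td><td>23</td><td>11.1</td></tr>
    <tr><td>2021-22</td><td>36</td><td>17.3</td></tr>
  </table>
</div>
<div id="all_totals">
  <!--
  <table id="totals">
    <tr><th>Season</th><th>G</th><th>PTS</th></tr>
    <tr><td>2020-21</td><td>23</td><td>256</td></tr>
    <tr><td>2021-22</td><td>36</td><td>624</td></tr>
  </table>
  -->
</div>
"""


def collect_tables(html):
    """Return {table id: list of rows}, including tables inside comments."""
    soup = BeautifulSoup(html, "html.parser")

    # Start with tables that are part of the normal document tree.
    found = list(soup.find_all("table"))

    # Then parse every comment as HTML and pick up any tables inside it.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        found.extend(BeautifulSoup(comment, "html.parser").find_all("table"))

    tables = {}
    for index, table in enumerate(found):
        # Fall back to a positional name when a table has no id attribute.
        key = table.get("id") or f"table_{index}"
        tables[key] = [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]
    return tables


tables = collect_tables(SAMPLE_HTML)
for name, rows in tables.items():
    print(name)
    for row in rows:
        print("  ", row)
```

For a live page, the HTML string would come from `requests.get(url).content` as in the question. If pandas and a parser such as lxml are installed, passing each found table's HTML to `pandas.read_html` (wrapped in `io.StringIO`) returns a list of DataFrames, which is the shape of the output printed above.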
