Python: Web scraping a Wikipedia table with Beautiful Soup returns nothing

Posted on March 7

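I'm new to both web scraping and coding in general. For someone more experienced this is probably a simple question. Or maybe not... Here it is:

I'm trying to scrape a table from Wikipedia. I have located the table in the HTML and added that information to my code. However, when I run it, it returns 'None' instead of confirming that the table was found.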

from bs4 import BeautifulSoup
from urllib.request import urlopen


url = 'https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')            

table = soup.find('table',{'class':'wikitable sortable plainrowheaders jquery-tablesorter'})
print(table)

Returns: None

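Recommended answer

Remove jquery-tablesorter from the "class" string. That class is added by JavaScript, so BeautifulSoup does not see it. (Note: always look at the real HTML document the server sends you, because that is what BeautifulSoup sees. Press Ctrl-U in the browser to view it.)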

from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", {"class": "wikitable sortable plainrowheaders"})
print(table)

Prints:

<table class="wikitable sortable plainrowheaders" style="text-align:center">
<caption>Name of song, core catalogue release, songwriter, lead vocalist and year of original release
</caption>
<tbody><tr>
<th scope="col">Song
</th>
<th scope="col">Core catalogue release(s)
</th>
<th scope="col">Songwriter(s)
</th>

...
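
The behaviour described in the answer can be reproduced offline. The sketch below is only an illustration: SERVER_HTML is a made-up stand-in for what the server sends before any JavaScript runs, not the real Wikipedia page. It shows that a multi-word class string has to match the whole attribute value, and that searching on a single class name or with a CSS selector avoids depending on the exact class list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server's HTML (no JavaScript has run yet),
# so the table does not carry the jquery-tablesorter class.
SERVER_HTML = """
<table class="wikitable sortable plainrowheaders" style="text-align:center">
<caption>Example caption</caption>
<tbody><tr><th scope="col">Song</th></tr></tbody>
</table>
"""

soup = BeautifulSoup(SERVER_HTML, "html.parser")

# A multi-word class string must equal the whole class attribute exactly,
# so the extra JavaScript-added class makes this lookup fail.
with_js_class = soup.find(
    "table", {"class": "wikitable sortable plainrowheaders jquery-tablesorter"}
)

# A single class name matches any table that carries that class.
by_single_class = soup.find("table", class_="wikitable")

# A CSS selector matches the listed classes regardless of order or extras.
by_selector = soup.select_one("table.wikitable.sortable")

print(with_js_class)             # None
print(by_single_class["class"])  # ['wikitable', 'sortable', 'plainrowheaders']
print(by_selector is not None)   # True
```

On the real page there may be several tables with the wikitable class, so find_all or a more specific selector could be needed to pick the right one.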