Below is the simple HTML source I am working with:
<html>
<head>
<title>Welcome to the comments assignment from www.py4e.com</title>
</head>
<body>
<h1>This file contains the actual data for your assignment - good luck!</h1>
<table border="2">
<tr>
<td>Name</td><td>Comments</td>
</tr>
<tr><td>Melodie</td><td><span class="comments">100</span></td></tr>
<tr><td>Machaela</td><td><span class="comments">100</span></td></tr>
<tr><td>Rhoan</td><td><span class="comments">99</span></td></tr>
Here is the code I tried in order to get the <td>Melodie</td> row:
from bs4 import BeautifulSoup

html = 'html text file above'  # placeholder for the HTML source shown above
soup = BeautifulSoup(html, 'html.parser')
for tag in soup.find_all('td'):
    print(tag)
    print('----')
# Result:
#===============================================================================
# <td>Name</td>
# ----
# <td>Comments</td>
# ----
# <td>Melodie</td>
# ----
# <td><span class="comments">100</span></td>
# ----
# <td>Machaela</td>
# ----
# <td><span class="comments">100</span></td>
# ----
# <td>Rhoan</td>
# ----
#.........
#===============================================================================
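Now I only want the <td>name</td> rows, not the rows that contain "span" and "class". I tried two filters, soup.find_all('td' and not 'span') and soup.find_all('td', attrs={'class': None}), but neither works. I know there are other ways to do this, but I want to use a filter in soup.find_all(). The output I want is: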
# <td>Name</td>
# ----
# <td>Comments</td>
# ----
# <td>Melodie</td>
# ----
# <td>Machaela</td>
# ----
# <td>Rhoan</td>
# ----
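
A possible explanation and a minimal sketch, not necessarily the only way to do it. The expression 'td' and not 'span' is evaluated by Python before find_all() ever sees it, and it evaluates to False, so no "td but not span" filter is passed. The second attempt matches every <td> because the class attribute sits on the nested <span>, not on the <td> itself. find_all() also accepts a function that takes a tag and returns True or False, which seems to fit here. In the sketch below, the helper name td_without_span is hypothetical, and the html string is a shortened copy of the table above with the closing </table> added as an assumption.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (closing tag added as an assumption).
html = """
<table border="2">
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Melodie</td><td><span class="comments">100</span></td></tr>
<tr><td>Machaela</td><td><span class="comments">100</span></td></tr>
<tr><td>Rhoan</td><td><span class="comments">99</span></td></tr>
</table>
"""


def td_without_span(tag):
    # Hypothetical helper: keep <td> tags that have no <span> descendant.
    return tag.name == 'td' and tag.find('span') is None


soup = BeautifulSoup(html, 'html.parser')

# Passing a function makes find_all() call it once per tag in the document.
plain_cells = soup.find_all(td_without_span)
for tag in plain_cells:
    print(tag)
    print('----')
```

With this input, the loop should print only the Name, Comments, Melodie, Machaela and Rhoan cells, since the cells wrapping a <span> are rejected by the function.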