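How can I pare the structure below down to only the h1, h2, and h3 elements that sit above the <a> tags?
I want to get every <a> tag together with the heading placed above it, using Beautiful Soup.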
HTML code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Document</title>
</head>
<body>
<h1>Heading H1</h1>
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<a href="#">Button</a>
<hr>
<h2>Heading H2</h2>
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>
<a href="#">Button</a>
</p>
<hr>
<h3>Heading H3</h3>
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>
<a href="#">Button</a>
</p>
<hr>
</body>
</html>
My code:
from bs4 import BeautifulSoup
import requests

website = 'http://127.0.0.1:5500/test.html'
result = requests.get(website)
content = result.text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content, 'html.parser')
# print(soup.prettify())
href_tags = ["a"]
for tags in soup.find_all(href_tags):
    print(tags.name + ' -> ' + tags.text.strip())
When I try the code above, it only prints the text of the <a> tags. I also want to get the <h1>, <h2>, and <h3> tags that are placed above each <a> tag.
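One possible way to pair each <a> tag with the heading above it is sketched here. It assumes the heading always comes earlier in the document than its link, and it relies on Beautiful Soup's `find_previous`, which walks backwards through the document from a given tag. The HTML is a trimmed copy of the page above, inlined so the sketch runs without the local server; the names `heading_tags` and `pairs` are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page from the question, inlined so this runs offline.
html = """
<body>
<h1>Heading H1</h1>
<p>Lorem Ipsum</p>
<a href="#">Button</a>
<hr>
<h2>Heading H2</h2>
<p>Lorem Ipsum</p>
<p>
<a href="#">Button</a>
</p>
<hr>
<h3>Heading H3</h3>
<p>Lorem Ipsum</p>
<p>
<a href="#">Button</a>
</p>
<hr>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
heading_tags = ["h1", "h2", "h3"]  # illustrative name, not from the question

pairs = []
for link in soup.find_all("a"):
    # Nearest h1/h2/h3 that appears before this link in document order.
    heading = link.find_previous(heading_tags)
    if heading is None:
        continue  # a link with no heading before it is skipped
    pairs.append(
        (heading.name, heading.get_text(strip=True), link.get_text(strip=True))
    )

for name, heading_text, link_text in pairs:
    print(f"{name} -> {heading_text} | a -> {link_text}")
```

Because `find_previous` searches the whole preceding document rather than only siblings, it should work both for the first link, which sits directly in <body>, and for the later ones wrapped in <p>. If a link can appear before any heading, the `None` check decides what happens to it.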