Python BeautifulSoup: find the <h1>, <h2> and <h3> elements placed above an <a> tag

Posted on July 24

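How can I reduce the structure below to only the <h1>, <h2> and <h3> elements that appear above an <a> tag?

For every <a> tag, I want to get the heading placed above it, using Beautiful Soup.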

HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <h1>Heading H1</h1>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <a href="#">Button</a>

    <hr>

    <h2>Heading H2</h2>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <p>
        <a href="#">Button</a>
    </p>

    <hr>

    <h3>Heading H3</h3>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <p>
        <a href="#">Button</a>
    </p>
    
    <hr>
</body>
</html>

My code:

from bs4 import BeautifulSoup
import requests

website = 'http://127.0.0.1:5500/test.html'
result = requests.get(website)
content = result.text

soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
# print(soup.prettify())

href_tags = ["a"]
for tags in soup.find_all(href_tags):
    print(tags.name + ' -> ' + tags.text.strip())

When I try the code above, it only shows the text of the <a> tags. I also want to get the <h1>, <h2> and <h3> headings that are placed above each <a> tag.

Recommended answer:

from bs4 import BeautifulSoup as bs
import pandas as pd

html = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <h1>Heading H1</h1>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <a href="#">Button</a>
    <hr>
    <h2>Heading H2</h2>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <p>
        <a href="#">Button</a>
    </p>
    <hr>
    <h3>Heading H3</h3>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <p>
        <a href="#">Button</a>
    </p>
    <hr>
</body>
</html>
'''

big_list = []
soup = bs(html, 'html.parser')
for link in soup.select('a'):
    link_text = link.get_text(strip=True)
    link_url = link.get('href')
    # find_all_previous() walks backwards through the document,
    # so the first heading it yields is the one closest above the link
    previous_header = [x.get_text(strip=True) for x in link.find_all_previous() if x.name in ['h1', 'h2', 'h3']][0]
    big_list.append((link_text, link_url, previous_header))

df = pd.DataFrame(big_list, columns=['link_text', 'link_url', 'previous_header_text'])
print(df)
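The answer's list comprehension collects every earlier element and then takes the first heading, which raises an IndexError when a link has no heading above it. A possible variant, offered only as a sketch, is to ask Beautiful Soup for the nearest earlier heading directly with find_previous() and fall back to None. The function name links_with_headings, the HEADING_TAGS constant and the shortened sample markup are assumptions made for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Heading tags to look for above each link (hypothetical constant name).
HEADING_TAGS = ["h1", "h2", "h3"]


def links_with_headings(html):
    """Return (link_text, link_url, heading_text) for every <a> in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.find_all("a"):
        # find_previous() returns the closest earlier matching tag
        # in document order, or None when nothing matches.
        heading = link.find_previous(HEADING_TAGS)
        rows.append((
            link.get_text(strip=True),
            link.get("href"),
            heading.get_text(strip=True) if heading else None,
        ))
    return rows


# Shortened sample: the first link deliberately has no heading above it.
sample = """
<a href="/top">Orphan</a>
<h1>Heading H1</h1>
<p>text</p>
<a href="#">Button</a>
<h2>Heading H2</h2>
<p><a href="#two">Button 2</a></p>
"""

for row in links_with_headings(sample):
    print(row)
```

The returned list of tuples can be passed to pd.DataFrame with the same column names as in the answer if a table is wanted.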

