Python BeautifulSoup: iterating over a member list from A to Z and parsing the data so it can be stored in a PDF

Posted on March 21

I am currently putting together a very simple parser that iterates over a member list from A to Z. The member list is here:

https://vvonet.vvo.at/vvonet_mitgliederverzeichnisneu

Note: for each member we have to open the "kontaktinformationen" link and copy the data found there into a pandas DataFrame.

I think I can do this in Python with requests and BeautifulSoup, and then either print the result to the screen or store it in a PDF.

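First, the script should fetch the member list page and extract the links to the individual member pages. It should then visit each member's "kontaktinformationen" page and extract the contact information. Finally, I think it is best to store the contact information in a DataFrame, which I can then print to the screen or save as a CSV file.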

Here is my attempt:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# First, send a GET request to the member list page
url = "https://vvonet.vvo.at/vvonet_mitgliederverzeichnisneu"
response = requests.get(url)

# Check whether the request was successful
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Find all member links
    member_links = soup.find_all("a", class_="font1")
    
    # Initialize a list to store the data
    member_data = []
    
    # Iterate over member links
    for member_link in member_links:
        # Get the URL of the "kontaktinformationen" page
        member_url = "https://vvonet.vvo.at" + member_link["href"] + "/kontaktinformationen"
        
        # Send a GET request to the member's "kontaktinformationen" page
        member_response = requests.get(member_url)
        
        # Check if the request was successful
        if member_response.status_code == 200:
            # Parse the HTML content of the page
            member_soup = BeautifulSoup(member_response.content, "html.parser")
            
            # Find the contact information section
            contact_info_div = member_soup.find("div", class_="contact")
            
            # Check if contact information section exists
            if contact_info_div:
                # Extract the contact information
                contact_info_text = contact_info_div.get_text(separator="\n", strip=True)
                member_data.append(contact_info_text)
            else:
                member_data.append("Contact information not found")
        else:
            member_data.append(f"Failed to retrieve contact information for {member_link.text.strip()}")
    
    # Create a DataFrame
    df = pd.DataFrame(member_data, columns=["Contact Information"])
    
    # Display the DataFrame
    print(df)
    
    # Alternatively, you can save the DataFrame to a CSV file
    # df.to_csv("member_contact_information.csv", index=False)
else:
    print("Failed to retrieve the member list page.")

But now I get an empty DataFrame:

Empty DataFrame
Columns: [Contact Information]
Index: []
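
A note on reading this result: every pass through the loop appends exactly one entry to member_data, so a DataFrame with no rows means the loop body never ran, i.e. soup.find_all("a", class_="font1") matched nothing in the HTML that requests received. A minimal sketch of how that can be checked in isolation follows. The helper name count_member_links and both HTML snippets are made up for illustration, and whether the real page fills in its list after loading is an assumption this kind of check helps to test.

```python
from bs4 import BeautifulSoup


def count_member_links(html: str, css_class: str = "font1") -> int:
    """Return how many <a> tags with the given CSS class occur in html."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("a", class_=css_class))


# Hypothetical markup in which the links are part of the static HTML.
sample_with_links = (
    '<a class="font1" href="/member/1">A</a>'
    '<a class="font1" href="/member/2">B</a>'
)

# Hypothetical markup in which the list is only a placeholder that
# a script would fill in later; requests never runs that script.
sample_without_links = '<div id="list"></div><script>/* fills the list */</script>'

print(count_member_links(sample_with_links))
print(count_member_links(sample_without_links))
```

Running the same helper on response.text from the member list page would show whether the selector finds anything at all before any member page is requested.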

Recommended answer

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://vvonet.vvo.at/vvo/vvonet_website.nsf/allMitglieder?ReadViewEntries="
soup = BeautifulSoup(requests.get(url).content, "xml")

data = []
for e in soup.select("viewentry"):
    t = {}
    for d in e.select("entrydata"):
        t[d["name"]] = d.get_text(strip=True, separator=" ")
    data.append(t)

df = pd.DataFrame(data)
print(df)

Prints:

docTitle globUNID docBundesland docFachbereich docFax docInternet docMail docOrt docStrasse docTelefon docUnternehmen
0 Acredia Versicherung AG x095DA0F5F9E395B0C1258A4A0037FD66 Wien Schadenversicherer http://www.acredia.at office@acredia.at 1010 Wien Himmelpfortgasse 29 +43/(0)5 01 02-0 Acredia Versicherung AG
1 AIG Europe S.A. - Direktion für Österreich xFB78910F9D0D7FC7C1258A4A0037FCED Wien +43/1/533 25 00-80 http://www.aig.co.at info.oesterreich@aig.com 1010 Wien Herrengasse 1 - 3 +43/1/533 25 00 AIG Europe S.A.\nDirektion für Österreich
2 Allianz Care x7B60F7881DF6129AC1258A4A0037FD61 Außerordentliches Mitglied http://www.allianz-care.com IRL-Dublin 12 15 Joyce Way, Park West Business +44/7825/510 814 Allianz Care
3 Allianz Commercial xA362FDC72B76421DC1258A4A0037FD43 Wien +43/(0)59009-402 14 http://www.commercial.allianz.com stefanie.thiem@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)59009-88700 Allianz Commercial
4 Allianz Elementar Lebensversicherungs-Aktiengesellschaft xD41EC8AF73F7ED93C1258A4A0037FCE5 Wien +43/(0)5 9009-70700 http://www.allianz.at feedback@allianz.at, schaden@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)5 9009-0 Allianz Elementar Lebensversicherungs-Aktiengesellschaft
5 Allianz Elementar Versicherungs-Aktiengesellschaft x95C137D749F4EB97C1258A4A0037FCE6 Wien Kfz-Versicherer Krankenversicherer Schadenversicherer Unfallversicherer +43/(0)5 9009-70000 http://www.allianz.at feedback@allianz.at, schaden@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)5 9009-0 Allianz Elementar Versicherungs-Aktiengesellschaft
6 APK Versicherung AG x7C2E8BD3E6C46C0BC1258A4A0037FD34 Wien Lebensversicherer +43/(0)50 275-3709 http://www.apk-versicherung.at versicherung@apk.at 1030 Wien Thomas-Klestil-Platz 13 +43/(0)50 275-3700 APK Versicherung AG
7 ARAG SE - Direktion für Österreich xFA932E61C5D638E3C1258A4A0037FD41 Wien +43/1/531 02-1923 http://www.arag.at info@arag.at 1041 Wien Favoritenstraße 36, Postfach 182 +43/1/531 02-0 ARAG SE \nDirektion für Österreich
8 Atradius Kreditversicherung - Zweigniederlassung der Atradius Crédito y Caución S.A. de Seguros y Reaseguros xAA2CB3183BE03937C1258A4A0037FD4B Wien http://www.atradius.at versicherung.kredit@atradius.com 1220 Wien Vienna DC Tower 1, Donau-City-Straße 7 +43/1/813 0313 Atradius Kreditversicherung\nZweigniederlassung der Atradius Crédito y Caución S.A. de Seguros y Reaseguros
9 Atzbacher Versicherung V.a.G. xF489570D29A92438C1258A4A0037FD68 Oberösterreich Sachversicherungsverein +43/7673/75488-20 http://www.atzbacher-versicherung.at info@atzbacher-versicherung.at 4690 Oberndorf bei Schwanenstadt Atzbacher Straße 23 +43/7673/75488-0 Atzbacher Versicherung V.a.G.
10 AWP P&C S.A., Niederlassung für Österreich x3173A1D948F040D7C1258A4A0037FCEA Wien Unfallversicherer http://www.allianz-partners.com service.at@allianz.com 1130 Wien Hietzinger Kai 101-105 +43/1/525 03-6945 (Service Center) AWP P&C S.A., Niederlassung für Österreich (Allianz Partners)
...
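
To make the shape of that data easier to see, here is a minimal offline sketch of the same idea: each viewentry becomes one row, and each entrydata element contributes one column keyed by its name attribute. It is not the code from the answer. It swaps BeautifulSoup and pandas for the standard library (xml.etree.ElementTree and csv) so that it runs without a network connection, and SAMPLE_XML is an assumed, simplified shape of the feed built from three of the field names and two of the rows shown in the output. The function names parse_view_entries and rows_to_csv are hypothetical, and if the real feed declares an XML namespace the tag lookups would need adjusting.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed, simplified shape of the feed; the real response may differ.
SAMPLE_XML = """<viewentries>
  <viewentry position="1">
    <entrydata name="docTitle"><text>Acredia Versicherung AG</text></entrydata>
    <entrydata name="docMail"><text>office@acredia.at</text></entrydata>
    <entrydata name="docOrt"><text>1010 Wien</text></entrydata>
  </viewentry>
  <viewentry position="2">
    <entrydata name="docTitle"><text>APK Versicherung AG</text></entrydata>
    <entrydata name="docMail"><text>versicherung@apk.at</text></entrydata>
    <entrydata name="docOrt"><text>1030 Wien</text></entrydata>
  </viewentry>
</viewentries>"""


def parse_view_entries(xml_text):
    """Turn every <viewentry> into a dict keyed by the entrydata names."""
    rows = []
    root = ET.fromstring(xml_text)
    for entry in root.iter("viewentry"):
        row = {}
        for cell in entry.iter("entrydata"):
            # Join all text fragments in the cell and collapse whitespace.
            row[cell.get("name")] = " ".join("".join(cell.itertext()).split())
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, using the union of all keys as header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_view_entries(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

With pandas available, the list returned by parse_view_entries can be passed straight to pd.DataFrame, and df.to_csv covers the CSV step mentioned in the question.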
