Recommended answer
If all of the <div class='ah-content'> elements follow the same pattern as in the example, you can use this script to create a dataframe:
import pandas as pd
from bs4 import BeautifulSoup
html_doc = """\
<div class='ah-content'>
<h4>XYZ Community</h4>
<p>123 Street</p>
<p>Atlanta, Georgia, 12345</p>
<p>1234567890</p>
</div>"""
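soup = BeautifulSoup(html_doc, "html.parser")
# One row per .ah-content block; the text of each tag inside it becomes one cell.
strings = [[t.text for t in c.find_all()] for c in soup.select(".ah-content")]
df = pd.DataFrame(strings, columns=["Name", "Address", "Address2", "Phone"])
# DataFrame.to_markdown() needs the optional "tabulate" package installed.
print(df.to_markdown(index=False))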
Prints:
| Name | Address | Address2 | Phone |
|---|---|---|---|
| XYZ Community | 123 Street | Atlanta, Georgia, 12345 | 1234567890 |
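
The script above relies on every block having exactly four tags. If some blocks might be shorter or longer, one possible workaround is to normalize each row to a fixed width before building the dataframe. The following is only a sketch under that assumption: the helper `block_to_row`, the `COLUMNS` constant, and the second "ABC Community" block are made up for illustration and are not part of the original page.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Name", "Address", "Address2", "Phone"]


def block_to_row(block, width=len(COLUMNS)):
    """Hypothetical helper: return exactly `width` cells for one block."""
    # Only read the <h4> and <p> tags, ignoring any other markup in the block.
    cells = [t.get_text(strip=True) for t in block.find_all(["h4", "p"])]
    cells = cells[:width]                         # drop unexpected extra cells
    return cells + [None] * (width - len(cells))  # pad missing cells with None


# The second block is invented sample data with two <p> tags missing.
html_doc = """\
<div class='ah-content'>
<h4>XYZ Community</h4>
<p>123 Street</p>
<p>Atlanta, Georgia, 12345</p>
<p>1234567890</p>
</div>
<div class='ah-content'>
<h4>ABC Community</h4>
<p>456 Avenue</p>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")
rows = [block_to_row(c) for c in soup.select(".ah-content")]
df = pd.DataFrame(rows, columns=COLUMNS)
print(df)
```

Note that padding always fills from the right, so a block that is missing a middle line (for example the street but not the phone number) would still end up with values in the wrong columns. In that case the cells would need to be matched by content rather than by position.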