Given the following sample HTML page:
<div class="ia-secondary-content">
<div class="plugin_pagetree conf-macro output-inline" data-hasbody="false" data-macro-name="pagetree">
<div class="plugin_pagetree_children_list plugin_pagetree_children_list_noleftspace">
<div class="plugin_pagetree_children" id="children1326817570-0">
<ul class="plugin_pagetree_children_list" id="child_ul1326817570-0">
<li>
<div class="plugin_pagetree_childtoggle_container">
<a aria-expanded="false" aria-label="Expand item Topic 1" class="plugin_pagetree_childtoggle aui-icon aui-icon-small aui-iconfont-chevron-right" data-page-id="1630374642" data-tree-id="0" data-type="toggle" href="" id="plusminus1630374642-0"></a>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span" id="childrenspan1630374642-0"> <a href="#">Topic 1</a></span>
</div>
<div class="plugin_pagetree_children_container" id="children1630374642-0"></div>
</li>
<li>
<div class="plugin_pagetree_childtoggle_container">
<a aria-expanded="false" aria-label="Expand item Topic 2" class="plugin_pagetree_childtoggle aui-icon aui-icon-small aui-iconfont-chevron-right" data-page-id="1565544568" data-tree-id="0" data-type="toggle" href="" id="plusminus1565544568-0"></a>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span" id="childrenspan1565544568-0"> <a href="#">Topic 2</a></span>
</div>
<div class="plugin_pagetree_children_container" id="children1565544568-0"></div>
</li>
<li>
<div class="plugin_pagetree_childtoggle_container">
<a aria-expanded="true" aria-label="Expand item Topic 3" class="plugin_pagetree_childtoggle aui-icon aui-icon-small aui-iconfont-chevron-down" data-children-loaded="true" data-expanded="true" data-page-id="3733362288" data-tree-id="0" data-type="toggle"
href="" id="plusminus3733362288-0"></a>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span" id="childrenspan3733362288-0"> <a href="#">Topic 3</a></span>
</div>
<div class="plugin_pagetree_children_container" id="children3733362288-0">
<ul class="plugin_pagetree_children_list" id="child_ul3733362288-0">
<li>
<div class="plugin_pagetree_childtoggle_container">
<span class="no-children icon"></span>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span"> <a href="#">Subtopic 1</a></span>
</div>
<div class="plugin_pagetree_children_container"></div>
</li>
<li>
<div class="plugin_pagetree_childtoggle_container">
<span class="no-children icon"></span>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span"> <a href="#">Subtopic 2</a></span>
</div>
<div class="plugin_pagetree_children_container"></div>
</li>
</ul>
</div>
</li>
<li>
<div class="plugin_pagetree_childtoggle_container">
<a aria-expanded="false" aria-label="Expand item Topic 4" class="plugin_pagetree_childtoggle aui-icon aui-icon-small aui-iconfont-chevron-right" data-page-id="2238798992" data-tree-id="0" data-type="toggle" href="" id="plusminus2238798992-0"></a>
</div>
<div class="plugin_pagetree_children_content">
<span class="plugin_pagetree_children_span" id="childrenspan2238798992-0"> <a href="#">Topic 4</a></span>
</div>
<div class="plugin_pagetree_children_container" id="children2238798992-0"></div>
</li>
</ul>
</div>
</div>
<fieldset class="hidden">
</fieldset>
</div>
</div>
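I need to extract the innermost nested links from a page tree like this one. Given a title, I need to get all the links nested under it. How can I find all of the innermost nested links? I want to write a Python script that extracts the innermost nested links dynamically from a variety of HTML pages. Note that the nesting depth may vary.
For example, for the title "Topic 3" I should get: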
<a href="#">Subtopic 1</a>
<a href="#">Subtopic 2</a>
I tried to extract all the links within the same nested structure, but it did not work:
from bs4 import BeautifulSoup

# Assumption: `html` holds the page source shown above
soup = BeautifulSoup(html, 'html.parser')

# Step 1: Find the span with the given title
title = "Topic 3"
target_div = soup.find('span', class_='plugin_pagetree_children_span', text=title)
# Step 2: Extract the next div with class "plugin_pagetree_children_container"
if target_div:
    container_div = target_div.find_next_sibling('div', class_='plugin_pagetree_children_container')
    # Step 3: Extract all links within the container and print them
    if container_div:
        links = container_div.find_all('a')
        for link in links:
            print(link['href'])
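
Two things in the attempt above look like likely causes of the failure. First, the span contains a whitespace text node plus the <a> element, so matching the span by its text probably finds nothing. Second, the container div is a sibling of the span's parent div, not of the span itself. The following is a minimal sketch of one possible approach, not a definitive solution. It assumes the pages follow the same class names as the sample; `SAMPLE_HTML` and `innermost_links` are hypothetical names introduced for illustration. It matches the title on the <a> text, climbs to the enclosing <li>, and keeps only links whose <li> has no nested <ul>, so the nesting depth does not matter.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page tree from the question (toggle icons omitted).
SAMPLE_HTML = """
<ul class="plugin_pagetree_children_list">
  <li>
    <div class="plugin_pagetree_children_content">
      <span class="plugin_pagetree_children_span"> <a href="#">Topic 1</a></span>
    </div>
    <div class="plugin_pagetree_children_container"></div>
  </li>
  <li>
    <div class="plugin_pagetree_children_content">
      <span class="plugin_pagetree_children_span"> <a href="#">Topic 3</a></span>
    </div>
    <div class="plugin_pagetree_children_container">
      <ul class="plugin_pagetree_children_list">
        <li>
          <div class="plugin_pagetree_children_content">
            <span class="plugin_pagetree_children_span"> <a href="#">Subtopic 1</a></span>
          </div>
          <div class="plugin_pagetree_children_container"></div>
        </li>
        <li>
          <div class="plugin_pagetree_children_content">
            <span class="plugin_pagetree_children_span"> <a href="#">Subtopic 2</a></span>
          </div>
          <div class="plugin_pagetree_children_container"></div>
        </li>
      </ul>
    </div>
  </li>
</ul>
"""


def innermost_links(html, title):
    """Return the <a> tags at the deepest level under the item named `title`."""
    soup = BeautifulSoup(html, "html.parser")

    # Match on the <a> text rather than the span text, and require the
    # <a> to sit inside a title span so toggle icons are ignored.
    anchor = soup.find(
        lambda tag: tag.name == "a"
        and tag.get_text(strip=True) == title
        and tag.find_parent("span", class_="plugin_pagetree_children_span")
    )
    if anchor is None:
        return []

    # The children container is a direct child of the enclosing <li>.
    item = anchor.find_parent("li")
    if item is None:
        return []
    container = item.find(
        "div", class_="plugin_pagetree_children_container", recursive=False
    )
    if container is None:
        return []

    leaves = []
    for li in container.find_all("li"):
        # An <li> without a nested <ul> is a leaf of its branch.
        if li.find("ul") is None:
            link = li.select_one("span.plugin_pagetree_children_span a")
            if link is not None:
                leaves.append(link)
    return leaves


for link in innermost_links(SAMPLE_HTML, "Topic 3"):
    print(link)
```

Judging from the sample, only expanded items (those marked data-children-loaded="true") have their children present in the HTML, so for a collapsed item such as "Topic 1" this sketch returns an empty list.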