How can I count the number of details elements before the next h3?
I've prepared a basic test snippet, shown below:
import lxml
from lxml import html
test_html="""
<main>
<h3 id="test1">
<p>test1 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
<h3 id="test2">
<p>test2 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
<h3 id="test3">
<p>test3 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
</main>
"""
main_content=html.fromstring(test_html)
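I can get the id of each h3, but I don't know how to write an XPath that counts only the details elements belonging to the h3 with that id. This is what I tried with preceding-sibling: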
# count the number of details before each h3
for idx, h3 in enumerate(main_content.xpath("//h3")):
    # print the h3 id
    print(h3.get("id"))
    # is there any way with XPath to count the siblings for this specific id?
    print(len(h3.xpath("preceding-sibling::details")))
This produces the following output:
test1
0
test2
4
test3
8
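For what it's worth, the cumulative totals make sense: `preceding-sibling::details` selects every `details` sibling that comes before the `h3`, not just the ones in its own section, and it looks backwards rather than forwards. A minimal sketch of one way around that, without any XPath tricks, is to walk forward with lxml's `itersiblings()` and stop at the next `h3`. The `sample` markup (explicitly closed `h3` tags, fewer `details`) and the helper name `details_per_h3` are assumptions made for illustration only.

```python
from lxml import html

# Assumption: same layout as the test snippet, but with the h3 tags closed
# explicitly so the result does not depend on parser recovery.
sample = """
<main>
<h3 id="test1">test1</h3>
<p>test1 desc</p>
<details></details>
<details></details>
<h3 id="test2">test2</h3>
<p>test2 desc</p>
<details></details>
<details></details>
<details></details>
</main>
"""


def details_per_h3(root):
    """Map each h3 id to the number of details siblings before the next h3."""
    counts = {}
    for h3 in root.xpath("//h3"):
        n = 0
        # itersiblings() yields the following siblings in document order.
        for sibling in h3.itersiblings():
            if sibling.tag == "h3":
                break  # the next section starts here
            if sibling.tag == "details":
                n += 1
        counts[h3.get("id")] = n
    return counts


root = html.fromstring(sample)
section_counts = details_per_h3(root)
print(section_counts)
```

This trades a single XPath expression for a short loop, which may be easier to read when the sections are all flat siblings as they are here.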
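Edit: this can perhaps be solved by subtracting the details that follow any later h3 from the details that follow the current h3: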
print(main_content.xpath(f"count(//h3[@id='{h3.get('id')}']/following-sibling::details)-count(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)"))
It seems to work!
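As a side note, the same subtraction can probably be written without building the expression from an f-string, since lxml accepts XPath variables as keyword arguments to `xpath()`. The sketch below assumes a reduced `sample` document with closed `h3` tags; the names `COUNT_EXPR`, `count_details` and the variable `$target` are made up for illustration.

```python
from lxml import html

# Assumption: reduced version of the test markup with explicitly closed h3 tags.
sample = """
<main>
<h3 id="test1">test1</h3>
<p>test1 desc</p>
<details></details>
<details></details>
<h3 id="test2">test2</h3>
<p>test2 desc</p>
<details></details>
<details></details>
<details></details>
</main>
"""

# Same idea as the edit above: all details after this h3, minus the details
# that come after any later h3. $target is bound when xpath() is called.
COUNT_EXPR = (
    "count(//h3[@id=$target]/following-sibling::details)"
    " - count(//h3[@id=$target]/following-sibling::h3/following-sibling::details)"
)


def count_details(root, h3_id):
    """Count the details between the h3 with this id and the next h3."""
    # count() returns a float in lxml, so convert it for display.
    return int(root.xpath(COUNT_EXPR, target=h3_id))


root = html.fromstring(sample)
counts = [count_details(root, h3.get("id")) for h3 in root.xpath("//h3")]
print(counts)
```

Passing the id as a variable avoids quoting problems if an id ever contains a quote character.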
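Is there a way to get the elements themselves instead of the count? I know I can use not() in XPath, but I don't know where to put it in the expression. Here is my attempt: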
import lxml
from lxml import html
test_html="""
<main>
<h3 id="test1">
<p>test1 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
<h3 id="test2">
<p>test2 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
<h3 id="test3">
<p>test3 desc</p>
<details></details>
<details></details>
<details></details>
<details></details>
</main>
"""
main_content=html.fromstring(test_html)
# count the number of details after each h3
for idx, h3 in enumerate(main_content.xpath("//h3")):
    # print the h3 id
    print(h3.get("id"))
    print(main_content.xpath(f"count(//h3[@id='{h3.get('id')}']/following-sibling::details)-count(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)"))
    # get the actual details
    list_of_details= main_content.xpath(f"//h3[@id='{h3.get('id')}']/following-sibling::details)[not(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)]")
This raises an exception:
lxml.etree.XPathEvalError: Invalid expression
Here is a link where you can try it: link. Thanks in advance!
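A possible direction, offered as a sketch rather than a definitive answer: the expression above has a `)` after `details` with no matching `(`, which would explain the "Invalid expression" error. Even with the parentheses balanced, the `not(...)` predicate starts with `//`, so it is evaluated against the whole document instead of relative to each `details` element. One alternative is to keep only the `details` whose nearest preceding `h3` sibling is the heading you started from, using `preceding-sibling::h3[1]`. The `sample` markup, the `id` attributes on the `details` elements, and the helper name `details_after` are assumptions added for illustration.

```python
from lxml import html

# Assumption: reduced test markup with closed h3 tags; the ids on the
# details elements are hypothetical and only make the result easy to read.
sample = """
<main>
<h3 id="test1">test1</h3>
<p>test1 desc</p>
<details id="a1"></details>
<details id="a2"></details>
<h3 id="test2">test2</h3>
<p>test2 desc</p>
<details id="b1"></details>
<details id="b2"></details>
<details id="b3"></details>
</main>
"""


def details_after(root, h3_id):
    """Return the details siblings between the given h3 and the next h3."""
    # On the reverse axis preceding-sibling, h3[1] is the closest earlier h3.
    # A details element is kept only if that closest h3 has the wanted id.
    expr = (
        f"//h3[@id='{h3_id}']/following-sibling::details"
        f"[preceding-sibling::h3[1][@id='{h3_id}']]"
    )
    return root.xpath(expr)


root = html.fromstring(sample)
for h3 in root.xpath("//h3"):
    found = details_after(root, h3.get("id"))
    print(h3.get("id"), [d.get("id") for d in found])
```

Because this returns the elements themselves, `len()` of the result gives the same per-section count as the subtraction approach.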