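How to count preceding HTML tags without a root element (lxml, Python 3). Edit: also get the elements

Posted on November 05

How can I count the number of details elements that come before the next h3?

I prepared a basic test snippet, shown below: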

import lxml
from lxml import html
test_html="""
<main>
   <h3 id="test1">
   <p>test1 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
   <h3 id="test2">
   <p>test2 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
   <h3 id="test3">
   <p>test3 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
</main>
"""

main_content=html.fromstring(test_html)

I can capture the id, but I don't know how to write an XPath expression that counts only the sibling elements belonging to the h3 with that id.

#count number of details before each h3
for idx, h3 in enumerate(main_content.xpath("//h3")):
    #print h3 id
    print(h3.get("id"))
    #here is there any way with the xpath to count the preceding-sibling specifying the id?
    print(len(h3.xpath("preceding-sibling::details")))

This produces the following output:

test1
0
test2
4
test3
8

Edit: this can perhaps be solved with:

print(main_content.xpath(f"count(//h3[@id='{h3.get('id')}']/following-sibling::details)-count(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)"))

It seems to work!
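One way to avoid building the expression with an f-string is lxml's XPath variables, which are passed to `xpath()` as keyword arguments. The sketch below is an illustration rather than part of the original post: the helper name `count_details`, the variable name `$hid` and the uneven sample markup (2, 3 and 4 details elements, with explicitly closed h3 tags) are assumptions, chosen so that each section gives a different count.

```python
from lxml import html

# Hypothetical sample markup with a different number of <details> per section.
SAMPLE = """
<main>
  <h3 id="test1">one</h3><p>test1 desc</p>
  <details></details><details></details>
  <h3 id="test2">two</h3><p>test2 desc</p>
  <details></details><details></details><details></details>
  <h3 id="test3">three</h3><p>test3 desc</p>
  <details></details><details></details><details></details><details></details>
</main>
"""

# Same subtraction as in the edit above, with the id supplied as the XPath variable $hid.
COUNT_EXPR = (
    "count(//h3[@id=$hid]/following-sibling::details)"
    " - count(//h3[@id=$hid]/following-sibling::h3/following-sibling::details)"
)


def count_details(root, h3_id):
    """Count the <details> siblings between the given h3 and the next h3."""
    # XPath count() comes back as a float in lxml, so convert it to int.
    return int(root.xpath(COUNT_EXPR, hid=h3_id))


root = html.fromstring(SAMPLE)
counts = {h3.get("id"): count_details(root, h3.get("id")) for h3 in root.xpath("//h3")}
print(counts)
```

The subtraction works because the second node-set is the union of all details elements that follow any later h3, so what remains is only the run between the chosen h3 and the next one.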

Is there a way to get the elements instead of the count? I know I can use not() in XPath, but I don't know how to put it in the right place. Here is my attempt:

import lxml
from lxml import html
test_html="""
<main>
   <h3 id="test1">
   <p>test1 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
   <h3 id="test2">
   <p>test2 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
   <h3 id="test3">
   <p>test3 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
</main>
"""

main_content=html.fromstring(test_html)
#count number of details before each h3
for idx, h3 in enumerate(main_content.xpath("//h3")):
    #print h3 id
    print(h3.get("id"))
    print(main_content.xpath(f"count(//h3[@id='{h3.get('id')}']/following-sibling::details)-count(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)"))
    #get the actual details
    list_of_details= main_content.xpath(f"//h3[@id='{h3.get('id')}']/following-sibling::details)[not(//h3[@id='{h3.get('id')}']/following-sibling::h3/following-sibling::details)]")

This raises an exception:

lxml.etree.XPathEvalError: Invalid expression
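The expression is rejected because it contains a closing parenthesis with no matching opening one. Balancing it would still not give the intended result, since a path starting with `//` inside `not(...)` is evaluated against the whole document, not against the details element currently being tested. One possible alternative, offered as a sketch and not as a definitive answer, is a relative predicate that keeps only the details elements whose nearest preceding h3 sibling has the wanted id. The helper name `details_for`, the variable `$hid`, the `data-n` attributes and the sample markup are assumptions made for illustration.

```python
from lxml import html

# Hypothetical sample markup; data-n labels make the returned elements easy to identify.
SAMPLE = """
<main>
  <h3 id="test1">one</h3><p>test1 desc</p>
  <details data-n="a"></details><details data-n="b"></details>
  <h3 id="test2">two</h3><p>test2 desc</p>
  <details data-n="c"></details><details data-n="d"></details><details data-n="e"></details>
  <h3 id="test3">three</h3><p>test3 desc</p>
  <details data-n="f"></details>
</main>
"""

# preceding-sibling is a reverse axis, so [1] selects the nearest preceding h3;
# the second predicate then checks that this h3 is the one we asked for.
ELEMENTS_EXPR = (
    "//h3[@id=$hid]/following-sibling::details"
    "[preceding-sibling::h3[1][@id=$hid]]"
)


def details_for(root, h3_id):
    """Return the <details> elements between the given h3 and the next h3."""
    return root.xpath(ELEMENTS_EXPR, hid=h3_id)


root = html.fromstring(SAMPLE)
result = {
    h3.get("id"): [d.get("data-n") for d in details_for(root, h3.get("id"))]
    for h3 in root.xpath("//h3")
}
print(result)
```

Because the predicate is relative, it is evaluated once per candidate details element, which is what the absolute `not(//h3...)` form could not do.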

Here is a link where you can try it: link. Thanks in advance!

from lxml import html

test_html = """
<main>
   <h3 id="test1">
   <p>test1 desc</p>
   <details></details>
   <details></details>
   <h3 id="test2">
   <p>test2 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <h3 id="test3">
   <p>test3 desc</p>
   <details></details>
   <details></details>
   <details></details>
   <details></details>
</main>
"""

main_content = html.fromstring(test_html)

# For each h3, count the details elements that follow it, stopping at the next h3
for de in main_content.xpath('//h3'):
    count = 0
    for child in de.xpath('.//following-sibling::*'):
        if child.tag == "h3":
            break
        elif child.tag == "details":
            count += 1
    print(count)

# For each h3, count the details elements that precede it, stopping at the previous h3
for de in main_content.xpath('//h3'):
    count = 0
    # Results come back in document order, so reverse them to start from the nearest sibling
    for child in reversed(de.xpath('.//preceding-sibling::*')):
        if child.tag == "h3":
            break
        elif child.tag == "details":
            count += 1
    print(count)
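The loops above print counts only. Since the question also asks for the elements themselves, the same walk can collect them instead. The following is a sketch, not the answerer's code: `group_details` is a hypothetical helper that stops at the next h3 just as the first loop does, and it uses lxml's `itersiblings()` in place of an XPath call. The sample markup closes each h3 explicitly, which is an assumption made so that the result does not depend on how the parser repairs unclosed tags.

```python
from lxml import html

# Hypothetical sample markup with 2, 3 and 4 <details> elements per section.
SAMPLE = """
<main>
  <h3 id="test1">one</h3><p>test1 desc</p>
  <details></details><details></details>
  <h3 id="test2">two</h3><p>test2 desc</p>
  <details></details><details></details><details></details>
  <h3 id="test3">three</h3><p>test3 desc</p>
  <details></details><details></details><details></details><details></details>
</main>
"""


def group_details(root):
    """Map each h3 id to the list of <details> elements before the next h3."""
    groups = {}
    for h3 in root.iter("h3"):
        found = []
        # itersiblings() yields the following siblings in document order.
        for sibling in h3.itersiblings():
            if sibling.tag == "h3":
                break  # the next section starts here
            if sibling.tag == "details":
                found.append(sibling)
        groups[h3.get("id")] = found
    return groups


groups = group_details(html.fromstring(SAMPLE))
for h3_id, elements in groups.items():
    print(h3_id, len(elements))
```

Each h3 is walked only up to the next heading, so the elements of one section are never visited again for an earlier section.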


