I have the following abridged HTML code (html_file.html):
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<div class="listing-wrapper__content">
<section class="card__amenities ">
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="floorSize"><span data-testid="l-icon" role="document" aria-label="Tamanho do imóvel" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span> 94 - 100 m² </p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="numberOfRooms"><span data-testid="l-icon" role="document" aria-label="Quantidade de quartos" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span> 3 </p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity" itemprop="numberOfBathroomsTotal"><span data-testid="l-icon" role="document" aria-label="Quantidade de banheiros" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span>3</p>
<p class="l-text l-u-color-neutral-28 l-text--variant-body-small l-text--weight-regular card__amenity"><span data-testid="l-icon" role="document" aria-label="Quantidade de vagas de garagem" class="l-icon l-u-color-undefined"><svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">...</svg></span>2</p>
</section>
</div>
</body>
</html>
I managed to extract the first three elements. For example:
library(rvest)
pagee <- read_html("html_file.html")
nofrooms <- html_elements(pagee, ".listing-wrapper__content") %>% html_elements("[itemprop='numberOfRooms']") %>% html_text()
nofrooms
The output is:
" 3 "
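Since every amenity `<p>`, including the garage one, wraps a `<span>` that carries an `aria-label`, one possible approach is to read all amenities in a single pass and name each value by that label, so the missing `itemprop` no longer matters. This is a minimal sketch, not the only way: it uses a hypothetical, simplified inline copy of the markup (classes trimmed, SVG bodies empty, the floor-size entry left out to keep the string ASCII) and the rvest functions `html_elements()`, `html_element()`, `html_attr()` and `html_text()`.

```r
library(rvest)

# Hypothetical, simplified inline copy of html_file.html (SVG bodies and the
# floor-size entry omitted); in practice use read_html("html_file.html")
page <- read_html('
<div class="listing-wrapper__content">
  <section class="card__amenities">
    <p class="card__amenity" itemprop="numberOfRooms"><span aria-label="Quantidade de quartos"><svg></svg></span> 3 </p>
    <p class="card__amenity" itemprop="numberOfBathroomsTotal"><span aria-label="Quantidade de banheiros"><svg></svg></span>3</p>
    <p class="card__amenity"><span aria-label="Quantidade de vagas de garagem"><svg></svg></span>2</p>
  </section>
</div>')

# One <p> per amenity, whether or not it has an itemprop attribute
amenities <- html_elements(page, ".listing-wrapper__content p.card__amenity")

# The label lives on the child <span>; the number is the <p>'s own text
labels <- amenities %>% html_element("span") %>% html_attr("aria-label")
values <- amenities %>% html_text(trim = TRUE)

# Named character vector: names are the aria-labels, values the trimmed numbers
amenity_values <- setNames(values, labels)
amenity_values
```

`html_text(trim = TRUE)` also strips the surrounding spaces seen in `" 3 "` above.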
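The problem is the last p tag. As far as I can tell, it offers no criterion I can use to extract its content (unlike the others, it has no itemprop attribute). I have tried the following, without success: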
nofgarage <- html_elements(pagee, ".listing-wrapper__content") %>% html_elements("[aria-label='Quantidade de vagas de garagem']") %>% html_text()
nofgarage
The output is:
""
As expected, the result is empty, because the value I want to extract is not between the span tags.
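Because the number sits in the parent `<p>` rather than in the `<span>`, one option is to select the `<p>` that *contains* the labelled span, which XPath expresses directly with a predicate on a child element. A minimal sketch, assuming the same structure as in the question (here a hypothetical, simplified inline copy instead of html_file.html):

```r
library(rvest)

# Hypothetical, simplified inline copy of the relevant part of html_file.html
page <- read_html('
<div class="listing-wrapper__content">
  <section class="card__amenities">
    <p class="card__amenity" itemprop="numberOfRooms"><span aria-label="Quantidade de quartos"><svg></svg></span> 3 </p>
    <p class="card__amenity"><span aria-label="Quantidade de vagas de garagem"><svg></svg></span>2</p>
  </section>
</div>')

# Match the <p> that has a child <span> with the garage aria-label, then read
# the <p>'s text (the <span> itself only holds the icon, hence the "" before)
nofgarage <- html_elements(page, ".listing-wrapper__content") %>%
  html_elements(xpath = ".//p[span[@aria-label='Quantidade de vagas de garagem']]") %>%
  html_text(trim = TRUE)
nofgarage
```

An alternative with the same idea is to keep the original span selector and move up one level with `xml2::xml_parent()` before calling `html_text()`.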
Thanks for your help.