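Say I want to get the transcript from a page like this one. If you scroll down, you'll see an h2 element that has both the text "Transcript" and the attribute id='transcript'. If I'm not mistaken, the p elements that appear below the h2 header are actually its sibling elements, not its children, which is why neither of the following two attempts works: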

# using rvest

t %>% 
  html_elements('#transcript') %>% 
  html_children()

t %>% 
  html_elements('#transcript p')

So how can I get just these p elements?

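I searched for earlier answers and only found (somewhat) similar questions from BeautifulSoup users. This seems like it should be a basic problem, though, so maybe I'm further off track than I think.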

Recommended answer

Does this work for you? See the comments in the code for an explanation.

library(rvest)
library(xml2)

#read the page
url <- "https://80000hours.org/podcast/episodes/kevin-esvelt-stealth-wildfire-pandemics/"
page <- read_html(url)

#find the h2 elements
h2_elements <- page %>% html_elements('h2')
h2_text <- h2_elements %>% html_text()

#select the node with the word "Transcript"
desired_h2 <- h2_elements[grep("Transcript", h2_text)]

#find the parent node of the desired h2
parent <- xml_parent(desired_h2)

#find all of the descendant "p" nodes under the parent (this also picks up p nodes that come before the h2)
answer <- parent %>% html_elements("p") %>% html_text()

head(answer, 5)

[1] "Table of Contents"                                                                                                                                                                                                                                                                                                                                                            
[2] "Kevin Esvelt: So scientists correctly appreciate that, when there is controversy, you can get a paper in Nature, Science, or Cell — the top journals which are the best for your career."                                                                                                                                                                                     
[3] "Therefore, the incentives favour scientists identifying pandemic-capable viruses and determining whether posited cataclysmically destructive viruses and other forms of attack would actually function."                                                                                                                                                                      
[4] "And I have not seen any appreciable counter-incentives that could be anywhere near as powerful as the ones favouring our desire to know. Because almost all the time, it is better for us to know."                                                                                                                                                                           
[5] "So I don’t see many plausible futures in which we do not learn how to build agents that would bring down civilisation today. We just know that in the limit, if you get good enough at programming biology, we can do anything t
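
Note that the first element returned above is "Table of Contents", so going through the parent also collects p nodes that are not part of the transcript. If the p elements really are siblings of the h2, as the question suggests, one alternative is to select them directly with the XPath following-sibling axis. The following is a minimal sketch on a made-up toy document (the HTML, its "other" heading, and the variable names are assumptions for illustration, not the real page), so the selectors may need adjusting for the actual markup:

```r
library(rvest)

# Hypothetical toy document mimicking the layout described in the question:
# the h2 and the p elements are siblings inside one container.
doc <- minimal_html('
  <div>
    <p>Table of Contents</p>
    <h2 id="transcript">Transcript</h2>
    <p>First line.</p>
    <p>Second line.</p>
    <h2 id="other">Other section</h2>
    <p>Not part of the transcript.</p>
  </div>')

# Descendant selector from the question: empty, because the p elements
# are not nested inside the h2.
inside_h2 <- doc %>% html_elements("#transcript p")

# following-sibling axis: every p sibling that comes after the h2,
# including those that follow later h2 headings.
after_h2 <- doc %>%
  html_elements(xpath = "//h2[@id='transcript']/following-sibling::p") %>%
  html_text2()

# Restrict to p siblings whose nearest preceding h2 is the transcript heading,
# i.e. stop at the next h2.
section_only <- doc %>%
  html_elements(
    xpath = "//h2[@id='transcript']/following-sibling::p[preceding-sibling::h2[1][@id='transcript']]"
  ) %>%
  html_text2()

print(length(inside_h2))
print(after_h2)
print(section_only)
```

The second XPath is only useful if another h2 follows the transcript on the real page; if the transcript runs to the end of its container, the simpler following-sibling query should be enough.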
