JavaScript: Scraping a website with embedded JavaScript using Selenium

Posted on January 8

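I'm new to Selenium and am trying to scrape the content of this website with it. However, the site appears to be built from a template that a running JavaScript fills in, and I don't know how to use Selenium to access what I see on the page, such as the title (Auf dem Bahnhof) or the objectives.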

I can find the tags of the elements I need by browsing the web developer tools, but the sample script below returns nothing for them:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select,WebDriverWait


class Demo():

    def demo_get_contents(self):

        # create webdriver object
        service = Service(executable_path=ChromeDriverManager().install())
        driver = webdriver.Chrome(service=service)

        driver.get('https://gloss.dliflc.edu/LessonViewer.aspx?lessonId=26143&lessonName=ger_soc434&linkTypeId=0')
        element = WebDriverWait(driver, 2).until(EC.visibility_of_all_elements_located((By.CLASS_NAME,'gloss_Overview')))
        print(element.get_attribute('text'))


demo = Demo()
demo.demo_get_contents()

I'm using Python 3.8.

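Looking at the page source, I can see JavaScript that presumably runs an accesActivity() function, as well as an IFRAME, but I don't know how to use Selenium to reach the actual page content.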

Recommended answer

import requests
import xml.etree.ElementTree as ET

url = 'https://gloss.dliflc.edu/GlossHtml/templates/linksLO/glossLOs/ger_soc434.xml'


def get_element_text(element):
    return ''.join(element.itertext()).strip()


def find_elements_texts(root, tag):
    elements = root.findall(f".//{tag}[@dir='ltr'][@esbox='0']")
    return [get_element_text(elem) for elem in elements]


response = requests.get(url).content
root = ET.fromstring(response)

objectives_texts = find_elements_texts(root, "OBJECTIVES")
descriptions_texts = find_elements_texts(root, "ACTY_DESCRIPTION")

print(f"Objective:\n {''.join(objectives_texts)}\n")
print(f"Descriptions:\n {descriptions_texts}")

Objective:
 Strengthen listening skills and improve comprehension by focusing on terms related to train travel in an audio about a family at a train station before a trip.

Descriptions:
 ['Identify relevant vocabulary and get a more detailed idea of the topic.', 'Preview useful terms and expressions that appear in the upcoming dialogue.', 'Become familiar with the specifics of the situation by listening to several dialogues.', 'Transcribe portions of another dialogue.', 'Assess your knowledge by matching questions with answers.']
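
The script above does not drive a browser at all. It downloads an XML file named after the lesson (ger_soc434), presumably the data the page's JavaScript uses to fill the template, and filters elements by tag and attributes. To try the filtering logic without network access, here is a minimal sketch that runs the same two helper functions on a made-up fragment. Only the tag names (OBJECTIVES, ACTY_DESCRIPTION) and the attributes (dir, esbox) come from the answer. The LESSON and ACTIVITY wrappers, the nesting, and the texts are assumptions for illustration and may not match the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the nesting and the texts are invented;
# only the tag and attribute names are taken from the answer above.
SAMPLE_XML = """
<LESSON>
  <OBJECTIVES dir="ltr" esbox="0">Strengthen <b>listening</b> skills.</OBJECTIVES>
  <OBJECTIVES dir="rtl" esbox="0">Skipped: dir is not ltr.</OBJECTIVES>
  <ACTIVITY>
    <ACTY_DESCRIPTION dir="ltr" esbox="0">Identify relevant vocabulary.</ACTY_DESCRIPTION>
  </ACTIVITY>
  <ACTIVITY>
    <ACTY_DESCRIPTION dir="ltr" esbox="1">Skipped: esbox is not 0.</ACTY_DESCRIPTION>
    <ACTY_DESCRIPTION dir="ltr" esbox="0">  Transcribe a dialogue.  </ACTY_DESCRIPTION>
  </ACTIVITY>
</LESSON>
""".strip()


def get_element_text(element):
    # itertext() also yields the text of nested child tags such as <b>,
    # so inline markup does not truncate the result.
    return ''.join(element.itertext()).strip()


def find_elements_texts(root, tag):
    # ".//" searches at any depth; the two chained predicates keep only
    # elements whose dir is "ltr" and whose esbox is "0".
    elements = root.findall(f".//{tag}[@dir='ltr'][@esbox='0']")
    return [get_element_text(elem) for elem in elements]


root = ET.fromstring(SAMPLE_XML)
objectives = find_elements_texts(root, "OBJECTIVES")
descriptions = find_elements_texts(root, "ACTY_DESCRIPTION")

print(objectives)
print(descriptions)
```

Reading the data file directly avoids waiting for scripts to run or switching into frames, but it depends on the file's location and layout staying the same, which the site does not guarantee.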
