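Good morning,
This is a duplicate of a similar post on StackOverflow that had no answer that solved the problem for me.
For the past few days, my Python Selenium script (ChromeDriver 104) has been failing while scrolling down an infinitely scrolling, dynamically loaded page. The script scrolls Facebook and performs certain RPA actions, such as sending messages (I have only attached the code snippet related to the error).
In short, the user enters a number of posts to reach; the script scrolls until it has reached that many posts (for example, the first 1000 posts) and then performs certain actions (that do not violate Facebook's TOS).
This script is not running in a Docker instance or any kind of container; it uses my full PC resources. It has also been tested on the following platforms: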
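1- Windows 11 PC with 16 GB RAM and an i7 processor
2- MacBook, 16 GB RAM
3- Windows Server 2019, 32 GB RAM, i7 processor
4- Linux Ubuntu 22.0 server, 16 GB RAM (/dev/shm increased to 30 GB on this server)
5- Google Colab kernel (with /dev/shm increased)
All of the above fail in exactly the same way, with the same traceback: the session is deleted because of a page crash.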
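When the script reaches 800-900 posts (the number varies: once it reached 12,000 posts, and the next run failed at 400), the page becomes very slow and then crashes. Note that when I scroll manually on my PC, I can go well past 1500 posts and it does not crash. So I am fairly sure this is a bug in my script rather than a memory problem (possibly a memory leak in the script, but not a hardware limitation). When the script breaks, RAM usage is still below 80% of the total.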
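One way to test the leak hypothesis is to log how much memory the page itself uses after each scroll, instead of looking at system RAM. A minimal sketch, assuming Chrome (`performance.memory` is a non-standard, Chrome-only API) and a hypothetical helper name `page_stats`; `driver` is any Selenium WebDriver:

```python
def page_stats(driver):
    """Return the page's JS heap usage (MB) and its DOM element count.

    Assumes Chrome: performance.memory is non-standard and Chrome-only,
    so heap_mb is None in browsers that do not expose it.
    """
    stats = driver.execute_script(
        """
        const mem = performance.memory;
        return {
            heap: mem ? mem.usedJSHeapSize : null,
            nodes: document.getElementsByTagName('*').length
        };
        """
    )
    heap = stats["heap"]
    return {
        "heap_mb": None if heap is None else round(heap / (1024 * 1024), 1),
        "dom_nodes": stats["nodes"],
    }
```

Calling `print(len(posts), page_stats(driver))` inside the scrolling loop would show whether the heap and DOM size grow roughly in step with the number of loaded posts, which would point at the feed itself rather than the hardware.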
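If I run the script in non-headless mode, Chrome shows an error page saying:
"Aw, Snap! Chrome ran out of memory"
To save you time, I have already read the following StackOverflow posts, none of which helped: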
2-selenium.WebDriverException: unknown error: session deleted because of page crash from tab crashed
4-Getting "org.openqa.selenium.WebDriverException: unknown error: session deleted because of page crash" error when executing automation scripts (Java, but still relevant)
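Things I have tried to fix the problem (none of which worked):
1- Resizing the window, as suggested in this post.
2- Using the Chrome options --no-sandbox and --disable-dev-shm-usage
3- Using --js-flags (--max_old_space_size=8096)
4- Disabling all notifications, geolocation prompts, and images
5- Making sure /dev/shm is large enough on Mac and Linux, and that the temp folder is large enough on Windows
6- Adding long time.sleep() pauses between scrolls
7- Using a different scrolling method (jumping to the bottom of the page with JavaScript via driver.execute_script())
8- Using Firefox (GeckoDriver), Edge, and Opera
9- Counting the posts on the page in different ways (BS4, LXML); this does not seem to be the issue, since the crash happens during scrolling.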
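Instead of fixed time.sleep() pauses between scrolls (item 6), the loop can wait only as long as new posts keep arriving and give up when the feed stops growing. Without such a check, the `while postsNo < noOfPosts+1` loop in `reachPosts` never ends if no more posts load. A minimal sketch; `wait_for_more` and `get_count` are hypothetical names, where `get_count` is any callable returning the current number of posts:

```python
import time


def wait_for_more(get_count, previous, timeout=10.0, poll=0.5):
    """Poll get_count() until it exceeds `previous` or `timeout` seconds pass.

    Returns the new count, or None if nothing new appeared in time
    (e.g. the feed has ended or stopped loading).
    """
    deadline = time.monotonic() + timeout
    while True:
        count = get_count()
        if count > previous:
            return count
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the scraper, `get_count` could be `lambda: len(driver.find_elements(By.CSS_SELECTOR, selector))`, and the loop would `break` when `wait_for_more(...)` returns None.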
The code snippet that causes the problem (the Chrome options are not in the code because I load them from a separate file; I list them after the code):
```python
import time

# Start Selenium Imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Selenium Imports Finished
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.action_chains import ActionChains

# Driver setup: the Chrome options listed below are loaded from a separate file
options = Options()
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
actions = ActionChains(driver)

# CSS classes Facebook currently uses for a single post in the group feed
POST_SELECTOR = ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv"


def login(email, password):
    driver.get('https://www.facebook.com/')
    # Email
    driver.find_element(By.NAME, 'email').send_keys(email)
    # Password
    driver.find_element(By.NAME, 'pass').send_keys(password, Keys.RETURN)
    time.sleep(2)


def reachPosts(noOfPosts=50) -> None:
    """Scroll the feed until more than noOfPosts posts are loaded."""
    posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, POST_SELECTOR)
    postsNo = len(posts)
    posts = None
    while postsNo < noOfPosts + 1:
        scroll_down()
        posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, POST_SELECTOR)
        time.sleep(1)
        print(len(posts))
        postsNo = len(posts)
        if postsNo >= 1000:
            time.sleep(10)
        posts = None  # drop the WebElement references before the next scroll
    posts = None


#----------------Scroll Function!-----------------------------#
def scroll_down():
    """A method for scrolling the page."""
    # Scroll down to the bottom.
    #driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    for _ in range(3):
        actions.send_keys(Keys.SPACE).perform()
#-----------------End-----------------------------------------#


def openGroup(facebookUrl, inputDate):
    print("Opening Facebook Link")
    driver.get(f'{facebookUrl}?sorting_setting=CHRONOLOGICAL')
    time.sleep(2)
    reachPosts(creds["Number of posts"])
    posts = driver.find_element(By.XPATH, "//div[@role='feed']").find_elements(By.CSS_SELECTOR, POST_SELECTOR)
    noOfPosts = creds["Number of posts"]
    # ... RPA actions on `posts` (not included in this snippet)


def main():
    global creds
    creds = openCredentials()  # defined elsewhere in the script; loads the user's settings
    login(creds["email"], creds["password"])
    for group in creds['Facebook Groups']:
        openGroup(group, creds["Date"])
        time.sleep(3)


if __name__ == "__main__":
    main()
```
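A workaround often suggested for infinite feeds is to delete posts from the DOM once they have been counted, so the page never holds thousands of posts at the same time. A minimal sketch under stated assumptions: `reach_posts_pruned` and its `keep` parameter are hypothetical names, the post selector is passed in (the class names in the snippet change whenever Facebook redeploys), and Facebook's feed is assumed to keep loading after older nodes are removed, which is not guaranteed:

```python
import time

# JavaScript run in the page: remove all but the last `keep` matching posts,
# scroll to the bottom, and report how many were removed and how many remain.
PRUNE_AND_SCROLL_JS = """
const selector = arguments[0];
const keep = arguments[1];
const posts = Array.from(document.querySelectorAll(selector));
const stale = posts.slice(0, Math.max(0, posts.length - keep));
stale.forEach(p => p.remove());
window.scrollTo(0, document.body.scrollHeight);
return {removed: stale.length, remaining: posts.length - stale.length};
"""


def reach_posts_pruned(driver, target, selector, keep=20, pause=1.0, max_idle=5):
    """Scroll until `target` posts have been seen, pruning old posts as we go.

    Returns the total number of posts seen (removed + still in the DOM).
    Stops early after `max_idle` consecutive scrolls that add nothing new.
    """
    removed_total = 0
    seen = 0
    idle = 0
    while seen < target and idle < max_idle:
        result = driver.execute_script(PRUNE_AND_SCROLL_JS, selector, keep)
        removed_total += result["removed"]
        new_seen = removed_total + result["remaining"]
        idle = idle + 1 if new_seen == seen else 0
        seen = new_seen
        time.sleep(pause)
    return seen
```

Usage would look like `reach_posts_pruned(driver, creds["Number of posts"], ".g4tp4svg.mfclru0v.om3e55n1.p8bdhjjv")`. Because posts are deleted, any per-post RPA actions would have to run between scrolls, before the JavaScript removes those posts.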
Chrome options used:
```python
[
    "--disable-extensions",
    "--disable-application-cache",
    "--headless",
    "window-size=600,450",
    "--disable-blink-features=AutomationControlled",
    "--enable-javascript",
    "disable-infobars",
    "--js-flags='--max_old_space_size=8196'",
    "--max_old_space_size=4096",
    "max_old_space_size=9000",
    "--disable-dev-shm-usage",
    "--incognito",
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
]
```
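A hand-maintained list of flag strings is easy to get subtly wrong: Python joins adjacent string literals, so an entry with a missing comma silently becomes one bogus flag. Also, as far as I know, V8 options such as the heap limit only reach the JavaScript engine through `--js-flags`; as top-level Chrome switches they are not forwarded to V8, and because no shell is involved, quotes around the value are likely passed through literally. A minimal sketch of building the list in code; `chrome_args` and its parameters are hypothetical:

```python
def chrome_args(headless=True, js_heap_mb=None):
    """Build a Chrome argument list (hypothetical helper).

    The V8 heap limit is wrapped in --js-flags, without quotes, because the
    arguments go straight to the process rather than through a shell.
    """
    args = [
        "--disable-extensions",
        "--disable-dev-shm-usage",
        "--disable-blink-features=AutomationControlled",
        "--window-size=600,450",
    ]
    if headless:
        args.append("--headless")
    if js_heap_mb is not None:
        args.append(f"--js-flags=--max-old-space-size={js_heap_mb}")
    return args
```

Each entry can then be passed to `options.add_argument()` in a loop. Whether a larger JS heap helps at all is uncertain, since a renderer crash is not limited to JS heap exhaustion.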
The error:
```
Traceback (most recent call last):
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 313, in <module>
    main()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 302, in main
    openGroup(group, creds["Date"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 254, in openGroup
    reachPosts(creds["Number of posts"])
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 84, in reachPosts
    scroll_down()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\src\facebookScraperIndiv.py", line 104, in scroll_down
    actions.send_keys(Keys.SPACE).perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\action_chains.py", line 78, in perform
    self.w3c_actions.perform()
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\common\actions\action_builder.py", line 88, in perform
    self.driver.execute(Command.W3C_ACTIONS, enc)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 434, in execute
    self.error_handler.check_response(response)
  File "D:\Work & Projects\Work\Upwork\Facebook Groups Scraper\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
    Ordinal0 [0x0024DF13+2219795]
    Ordinal0 [0x001E2841+1779777]
    Ordinal0 [0x000F4100+803072]
    Ordinal0 [0x000E6F18+749336]
    Ordinal0 [0x000E5F94+745364]
    Ordinal0 [0x000E6528+746792]
    Ordinal0 [0x000EF42F+783407]
    Ordinal0 [0x000FA938+829752]
    Ordinal0 [0x0014F3CF+1176527]
    Ordinal0 [0x0013E616+1107478]
    Ordinal0 [0x00117F89+950153]
    Ordinal0 [0x00118F56+954198]
    GetHandleVerifier [0x00542CB2+3040210]
    GetHandleVerifier [0x00532BB4+2974420]
    GetHandleVerifier [0x002E6A0A+565546]
    GetHandleVerifier [0x002E5680+560544]
    Ordinal0 [0x001E9A5C+1808988]
    Ordinal0 [0x001EE3A8+1827752]
    Ordinal0 [0x001EE495+1827989]
    Ordinal0 [0x001F80A4+1867940]
    BaseThreadInitThunk [0x76236739+25]
    RtlGetFullPathName_UEx [0x774D90AF+1215]
    RtlGetFullPathName_UEx [0x774D907D+1165]
    (No symbol) [0x00000000]
```
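Whatever the root cause turns out to be, a long run can be made more resilient by treating a tab crash as recoverable: quit the dead session, start a fresh driver, and continue from the progress recorded so far. A minimal sketch; `run_with_restart`, `is_page_crash`, `make_driver`, `task` and the `state` dict are hypothetical names, and the crash is recognised by matching the message text from the traceback above ("page crash" / "tab crashed"). In the real script you would catch `selenium.common.exceptions.WebDriverException` instead of `Exception`:

```python
def is_page_crash(exc):
    """True if the exception text looks like ChromeDriver's tab-crash error."""
    text = str(exc).lower()
    return "page crash" in text or "tab crashed" in text


def run_with_restart(make_driver, task, max_restarts=3):
    """Run task(driver, state), restarting the browser after tab crashes.

    `task` should update state["done"] as it makes progress, so a restarted
    run can resume from there. Returns state["done"] once the task finishes.
    """
    state = {"done": 0}
    for attempt in range(max_restarts + 1):
        driver = make_driver()
        try:
            task(driver, state)
            return state["done"]
        except Exception as exc:  # real code: catch WebDriverException
            if not is_page_crash(exc) or attempt == max_restarts:
                raise
        finally:
            try:
                driver.quit()
            except Exception:
                pass  # the session may already be gone after a crash
```

On an infinite feed, "resuming" still means scrolling past the first `state["done"]` posts again, so this pairs best with keeping the DOM small while scrolling.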