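In the code below, I am trying to use XPath to extract the text that follows the bolded politician name. I can extract the politician's name and the URL to their profile, but how do I extract the text that comes after it?
I tried to extract it with the following call: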
desctext = elem.find_element(By.XPATH,".//b/following-sibling::text()")
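I have tried a million other approaches, but none of them worked. For example, the site reads: "Corey Stapleton (R), former Montana Secretary of State, announced his candidacy on November 11, 2022.[35] Stapleton withdrew from the race on October 13, 2023."
I want the text that comes after "Corey Stapleton". The bold tag contains an anchor tag with the href, and the text follows immediately after the bold tag.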
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
pres_candidates_url = "https://ballotpedia.org/Presidential_candidates,_2024"
driver.get(pres_candidates_url)
elems = driver.find_elements(By.XPATH, "//div[@class='mw-parser-output']//ul//li")
all_members = []
for elem in elems:
    member = {}
    try:
        linktext = elem.find_element(By.XPATH, ".//b//a")
    except:
        continue
    words = linktext.text.split()
    # words = elem.text.split()
    count = 0
    for w in words:  # some links are not names, so count words that start with a lowercase letter
        if w[0].islower():
            count += 1
    if count < 1:
        name = linktext.text
        member_url = linktext.get_attribute("href")
        try:
            # This is the call that fails: I want the text right after the <b> tag
            desctext = elem.find_element(By.XPATH, ".//b/following-sibling::text()")
        except:
            print("error")
        if "(D)" in desctext:
            party = "Democrat"
        elif "(R)" in desctext:
            party = "Republican"
        else:
            party = desctext
        metadata = {"Party:": party}
        print(name, member_url, metadata)
        member["name"], member["url"], member["metadata"] = name, member_url, metadata
    else:
        continue
    all_members.append(member)
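For reference, here is a minimal sketch of one workaround I am considering. As far as I can tell, find_element can only return element nodes, so an XPath that ends in text() is rejected. The idea is to read the whole list item's text and cut off everything up to and including the name. The helper names text_after_name and party_from_description and the sample string are hypothetical, made up for illustration, and the sketch assumes the name appears verbatim at the start of the list item's text.

```python
def text_after_name(full_text: str, name: str) -> str:
    """Return the part of full_text that follows the first occurrence of name."""
    if not name:
        # Nothing to cut on, so return the text unchanged (minus outer whitespace).
        return full_text.strip()
    _, found, rest = full_text.partition(name)
    # If the name is absent, fall back to the whole text rather than guessing.
    return rest.strip() if found else full_text.strip()


def party_from_description(description: str) -> str:
    """Map a "(D)" or "(R)" marker to a party name; otherwise return the text as-is."""
    if "(D)" in description:
        return "Democrat"
    if "(R)" in description:
        return "Republican"
    return description


# Hypothetical sample modeled on the list item quoted above (not scraped data).
sample_li_text = (
    "Corey Stapleton (R), former Montana Secretary of State, "
    "announced his candidacy on November 11, 2022."
)
description = text_after_name(sample_li_text, "Corey Stapleton")
print(description)
print(party_from_description(description))
```

Inside the loop this would presumably be called as text_after_name(elem.text, linktext.text) in place of the failing find_element call, so that desctext is a plain string and the "(D)" / "(R)" checks can work on it.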