I have an HTML file with the following structure. As you can see, the headers and the matches under each header are not grouped into their own divs.
<div class="basketball">
<div class="header">
<span class="event_title">Playoff</span>
</div>
<div class="match">
<div class="home">Bakken</div>
<div class="away">Akken</div>
<div class="home score">90</div>
<div class="away score">70</div>
</div>
<div class="match">
<div class="home">Monaco</div>
<div class="away">Strasbourg</div>
<div class="home score">80</div>
<div class="away score">65</div>
</div>
<div class="header">
<span class="event_title">Semi Finals</span>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="header">
<span class="event_title">Quarter Finals</span>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="header">
<span class="event_title">Normal Season Matches</span>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
<div class="match">
<div class="home">Randers</div>
<div class="away">Celtics</div>
<div class="home score">60</div>
<div class="away score">90</div>
</div>
</div>
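I would like to group each header and the matches below it into a separate div, but doing that by hand is not practical because the original file has more than 1,000 lines of HTML markup.
I need to extract the data so that the output looks like this: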
data = {"Playoff": [["Bakken", "Akken", 90, 70], ["Monaco", "Strasbourg", 80, 65]],
        "Semi Finals": [["Randers", "Celtics", 60, 90], [...]],
        "Quarter Finals": [.... ],
        "Normal Season Matches": [.... ]}
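This is the first part, which I already have working:
data = {}
# soup is the BeautifulSoup object parsed from the HTML above
for header in soup.find_all("div", class_="header"):
    title = header.find("span", class_="event_title").get_text()
    data[title] = []
data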
# output
{'Playoff': [],
'Semi Finals': [],
'Quarter Finals': [],
'Normal Season Matches': []}
I can't figure out how to fill the lists with the correct matches. Any help would be greatly appreciated.
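
For reference, one possible direction is sketched below. It is only a sketch and rests on assumptions about the real file: that every header and match div is a direct child of the `basketball` container, and that each match holds exactly four child divs in the order home, away, home score, away score. The function name `group_matches` and the `sample` string are made up for illustration. The idea is to walk the container's children in document order and remember the most recent header, so that each match lands in the list of the header above it.

```python
from bs4 import BeautifulSoup


def group_matches(html):
    """Map each event title to the matches that follow its header div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="basketball")
    data = {}
    current = None  # list belonging to the most recently seen header

    # recursive=False keeps only direct children, in document order,
    # so headers and matches arrive interleaved exactly as in the file.
    for div in container.find_all("div", recursive=False):
        classes = div.get("class", [])
        if "header" in classes:
            title = div.find("span", class_="event_title").get_text(strip=True)
            current = data.setdefault(title, [])
        elif "match" in classes and current is not None:
            # Assumption: four child divs ordered home, away, home score, away score.
            home, away, home_score, away_score = [
                d.get_text(strip=True) for d in div.find_all("div", recursive=False)
            ]
            current.append([home, away, int(home_score), int(away_score)])
    return data


# Shortened, hypothetical sample in the same shape as the file above.
sample = """
<div class="basketball">
<div class="header"><span class="event_title">Playoff</span></div>
<div class="match">
<div class="home">Bakken</div><div class="away">Akken</div>
<div class="home score">90</div><div class="away score">70</div>
</div>
<div class="header"><span class="event_title">Semi Finals</span></div>
<div class="match">
<div class="home">Randers</div><div class="away">Celtics</div>
<div class="home score">60</div><div class="away score">90</div>
</div>
</div>
"""

print(group_matches(sample))
# {'Playoff': [['Bakken', 'Akken', 90, 70]], 'Semi Finals': [['Randers', 'Celtics', 60, 90]]}
```

If the real markup nests the matches differently or a match is missing a score, the four-way unpacking would raise a `ValueError`, so that part would need adjusting to the actual file.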