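I am currently trying to use pandas and BeautifulSoup to read HTML tables, but I have run into a problem.
Here is the URL: https://ciffc.net/en/ciffc/ext/member/sitrep/
Because the page is dynamic, with tables added or removed every day, I cannot rely on a table's position in the list of DataFrames that pandas returns. That said, here is the output I want to extract from the table, using today's table index of 7 as an example: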
display(df[7].iloc[1,2])
>> 'Yukon is at a level 3 prep level - but will trend upwards with the forecasted hot and dry weather.'
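I would not have this problem if I could use the match parameter of pandas.read_html, but this table has no title. The data in the table is also very dynamic; the only unique element I have been able to identify is the "Comments" column. Here is my attempt at identifying this table: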
import pandas as pd
url = 'https://ciffc.net/en/ciffc/ext/member/sitrep/'
APLtable = pd.read_html(url, match='Comments')[0].head(14)
display(APLtable)
Unfortunately, this does not work and raises the following error:
ValueError: No tables found matching pattern 'Comments'
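I have also tried BeautifulSoup, without success. Given the peculiarities of the page, I would like to know whether anyone knows how to reference this specific table.
Here is the HTML table in question: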
</div></div><div id="section-apl" class="section-wrapper" data-title="E: Preparedness Levels"><div id="apl_table_wrapper"><table class="sticky-enabled">
<thead><tr><th class="">Agency</th><th title="Agency Preparedness Level" class=" tooltip">APL</th><th class="">Comments</th> </tr></thead>
<tbody>
<tr id="apl-table-row-0" class="odd"><td>BC</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-1" class="even"><td>YT</td><td>3</td><td>Yukon is at a level 3 prep level - but will trend upwards with the forecasted hot and dry weather.</td> </tr>
<tr id="apl-table-row-2" class="odd"><td>AB</td><td>2</td><td></td> </tr>
<tr id="apl-table-row-3" class="even"><td>SK</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-4" class="odd"><td>MB</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-5" class="even"><td>ON</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-6" class="odd"><td>QC</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-7" class="even"><td>NL</td><td>2</td><td></td> </tr>
<tr id="apl-table-row-8" class="odd"><td>NB</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-9" class="even"><td>NS</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-10" class="odd"><td>PE</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-11" class="even"><td>PC</td><td>1</td><td></td> </tr>
</tbody>
</table>
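
For illustration, one possible way to reference this table without relying on its position is to go through the wrapper `div`, whose `id="apl_table_wrapper"` looks stable in the excerpt above. The sketch below is only that, a sketch: it runs against a shortened copy of the excerpt rather than the live page, the helper name `extract_apl_table` is made up for this example, and it assumes the table is present in the HTML that is actually downloaded (if the page fills it in with JavaScript, the raw source would not contain it and this would not help).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened copy of the excerpt from the question. In practice `html` would be
# the downloaded page source (an assumption: the table is in the raw HTML).
html = """
<div id="section-apl" class="section-wrapper" data-title="E: Preparedness Levels"><div id="apl_table_wrapper"><table class="sticky-enabled">
<thead><tr><th class="">Agency</th><th title="Agency Preparedness Level" class=" tooltip">APL</th><th class="">Comments</th> </tr></thead>
<tbody>
<tr id="apl-table-row-0" class="odd"><td>BC</td><td>1</td><td></td> </tr>
<tr id="apl-table-row-1" class="even"><td>YT</td><td>3</td><td>Yukon is at a level 3 prep level - but will trend upwards with the forecasted hot and dry weather.</td> </tr>
<tr id="apl-table-row-2" class="odd"><td>AB</td><td>2</td><td></td> </tr>
</tbody>
</table></div></div>
"""


def extract_apl_table(page_html: str) -> pd.DataFrame:
    """Hypothetical helper: locate the table by its wrapper id, not by index."""
    soup = BeautifulSoup(page_html, "html.parser")

    # The wrapper div carries a unique id, unlike the table itself.
    wrapper = soup.find("div", id="apl_table_wrapper")
    table = wrapper.find("table") if wrapper is not None else None
    if table is None:
        raise ValueError("no table found inside #apl_table_wrapper")

    # Column names come from the header row, data from the body rows.
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find("tbody").find_all("tr")
    ]
    return pd.DataFrame(rows, columns=headers)


apl = extract_apl_table(html)

# Look the row up by agency code rather than by a fixed row number.
comment = apl.loc[apl["Agency"] == "YT", "Comments"].iloc[0]
print(comment)
```

Building the DataFrame by hand keeps every cell as a string (so `APL` holds `"3"`, not `3`), and it avoids depending on which parser backend `pd.read_html` has available. Selecting the row with `apl["Agency"] == "YT"` also sidesteps the `iloc[1,2]` lookup, which would break if the row order changed.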