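The data you see is loaded by an Ajax request, so BeautifulSoup never sees it. You can simulate that request with requests. To load the data into a pandas DataFrame, you can use the following example: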
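import io

import pandas as pd
import requests

url = "https://flightaware.com/ajax/airport/cancelled_count.rvt"
params = {
    "type": "airline",  # overwritten by the loop below
    "timeFilter": "((b.sch_block_out BETWEEN '2022-04-02 8:00' AND '2022-04-03 8:00') OR (b.sch_block_out IS NULL AND b.filed_departuretime BETWEEN '2022-04-02 8:00' AND '2022-04-03 8:00'))",
    "timePeriod": "today",
    "airportFilter": "",
}

all_dfs = []
# Query the same Ajax endpoint once per table type; each pass writes the
# current value straight into params["type"].
for params["type"] in ("airline", "destination", "origin"):
    html = requests.get(url, params=params).text
    # Wrap the HTML in StringIO: newer pandas deprecates passing literal HTML to read_html.
    df = pd.read_html(io.StringIO(html))[0]
    df["type"] = params["type"]  # remember which table each row came from
    all_dfs.append(df)

df_final = pd.concat(all_dfs)
print(df_final)
df_final.to_csv("data.csv", index=False)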
Prints:
             Airline  Airport  Cancelled       Delayed          type
             Airline  Airport          #    %        #    %
0      China Eastern      NaN        509  45%       28   2%  airline
1    Spring Airlines      NaN        443  82%        5   0%  airline
2          Southwest      NaN        428  12%     1369  39%  airline
3  American Airlines      NaN        317  10%      472  16%  airline
4              Delta      NaN        229   8%      444  16%  airline
5             Spirit      NaN        190  23%      207  26%  airline
6    Hainan Airlines      NaN        167  41%        9   2%  airline
7            JetBlue      NaN        144  14%      494  48%  airline
8           Lion Air      NaN        129  20%       53   8%  airline
9            easyJet      NaN        121   8%      471  32%  airline
...
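The two header lines in the printed output suggest that read_html returned MultiIndex columns, one level per header row of the HTML table. The offline sketch below is an assumption inferred from that output, not from the site's actual markup. It builds such a frame by hand from two of the rows above and shows why the added type column ends up with an empty second level, and how to select a single sub-column with a tuple key.

```python
import pandas as pd

# Two-level columns mirroring the two header lines of the printed output
# (structure inferred from the output, not from FlightAware's HTML).
columns = pd.MultiIndex.from_tuples([
    ("Airline", "Airline"),
    ("Cancelled", "#"),
    ("Cancelled", "%"),
    ("Delayed", "#"),
    ("Delayed", "%"),
])
df = pd.DataFrame(
    [["China Eastern", 509, "45%", 28, "2%"],
     ["Delta", 229, "8%", 444, "16%"]],
    columns=columns,
)

# Assigning with a plain string key on MultiIndex columns pads the
# missing level with "", giving the column ("type", "").
df["type"] = "airline"
print(df.columns.tolist()[-1])

# A tuple key selects one sub-column, e.g. the cancelled counts.
print(df[("Cancelled", "#")].tolist())
```

The same tuple keys work on df_final, for example df_final[("Delayed", "%")].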
It also saves data.csv (screenshot from LibreOffice).
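With MultiIndex columns, to_csv writes one header line per column level, so data.csv starts with two header rows. If a single header row is preferred, the levels can be joined before saving. The sketch below uses a hypothetical helper, flatten, and a tiny frame built from the Delta row above instead of the live data. The joining rule (drop an empty or repeated second level, otherwise join with a space) is only one possible choice.

```python
import io

import pandas as pd

# Small stand-in for df_final, built from the Delta row shown above.
columns = pd.MultiIndex.from_tuples([
    ("Airline", "Airline"),
    ("Cancelled", "#"),
    ("Cancelled", "%"),
    ("type", ""),
])
df = pd.DataFrame([["Delta", 229, "8%", "airline"]], columns=columns)


def flatten(col):
    """Hypothetical helper: join two header levels, dropping empty or repeated parts."""
    top, sub = col
    if not sub or sub == top:
        return top
    return f"{top} {sub}"


# Iterating MultiIndex columns yields (level0, level1) tuples.
df.columns = [flatten(c) for c in df.columns]

# Write to an in-memory buffer here; use a filename such as "data.csv" in practice.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

Applied to the original script, the flatten step would go right before df_final.to_csv("data.csv", index=False).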