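You can try:

import pandas as pd

# Assumes `soup` is a BeautifulSoup object already built from the playlist page
station = soup.find('h1').text.strip()
all_tracks = soup.find_all("div", attrs={'class': 'item'})
tracks = []
for track in all_tracks:
    try:
        # Expects two lines: the artist, then a line with '-' before the track name
        artist, title = track.find('div', attrs={'class': 'title'}).text.split('\n')
        # Keep only the text after the first '-' as the track name
        title = title[title.index('-')+1:].strip()
        # Parse the span's data-time attribute as a UTC timestamp
        played_time = pd.Timestamp(track.find('span')['data-time'], tz='UTC')
        tracks.append([station, artist, title, played_time])
    except ValueError:
        # Skip entries that don't match the expected layout
        pass
df = pd.DataFrame(tracks, columns=['Station', 'Artist', 'Track', 'Played Time'])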
Output:
>>> df
                      Station             Artist                  Track                Played Time
0    Playlist LoversrockRadio       Sylvia Tella          Plastic Smile  2023-10-28 08:36:00+00:00
1    Playlist LoversrockRadio  01.Latoya Jackson        Camp Kutchi Kai  2023-10-28 08:31:00+00:00
2    Playlist LoversrockRadio     Gregory Isaacs               Margaret  2023-10-28 08:27:00+00:00
3    Playlist LoversrockRadio       Jerry Harris       New Love For You  2023-10-28 08:24:00+00:00
4    Playlist LoversrockRadio        Gappy Ranks    Heaven In Your Eyes  2023-10-28 08:21:00+00:00
..                        ...                ...                    ...                        ...
118  Playlist LoversrockRadio      Beres Hammond          Settling Down  2023-10-28 00:16:00+00:00
119  Playlist LoversrockRadio            Gyptain  You're The Best Thing  2023-10-28 00:12:00+00:00
120  Playlist LoversrockRadio       Jerry Harris       New Love For You  2023-10-28 00:09:00+00:00
121  Playlist LoversrockRadio    Everton Blender      Lift Up Your Head  2023-10-28 00:04:00+00:00
122  Playlist LoversrockRadio       Sugar Minott       Make It With You  2023-10-28 00:00:00+00:00

[123 rows x 4 columns]
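To see why the loop catches ValueError, here is a minimal sketch of the same string parsing run on made-up sample text, with no requests or BeautifulSoup involved. The layout of the site's title div ("artist", newline, then something like "time - track") is an assumption inferred from the split('\n') and index('-') calls; parse_title_block and the sample strings are hypothetical.

```python
# Minimal sketch of the per-track parsing from the loop, on hypothetical
# sample strings instead of live HTML.

def parse_title_block(text):
    """Split the title text into (artist, track).

    Assumes exactly two lines: the artist, then a line containing '-'
    followed by the track name. Anything else raises ValueError.
    """
    artist, title = text.split('\n')              # ValueError unless exactly 2 lines
    title = title[title.index('-') + 1:].strip()  # ValueError if there is no '-'
    return artist, title


samples = [
    "Sylvia Tella\n08:36 - Plastic Smile",  # hypothetical well-formed entry
    "Station ID jingle",                    # hypothetical: only one line
    "Gregory Isaacs\nMargaret",             # hypothetical: no '-' separator
]

tracks = []
for text in samples:
    try:
        tracks.append(parse_title_block(text))
    except ValueError:
        pass  # skip entries that don't match the expected layout

print(tracks)  # [('Sylvia Tella', 'Plastic Smile')]
```

Only the first sample survives: the second fails the two-value unpacking and the third fails index('-'), and both raise ValueError, which the loop silently skips.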
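Notes:
- If needed, you will have to clean up the `artist` field (for example, row 1 above has a leading `01.`).
- `played_time` is set to UTC. You have to convert it to your local timezone.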
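Something like this (on Linux/macOS; `%-H`, the hour without a leading zero, is a platform-specific strftime extension that Windows does not support):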
>>> df['Played Time'].dt.tz_convert('Europe/Paris').dt.strftime('%-H:%M')
0      10:36
1      10:31
2      10:27
3      10:24
4      10:21
...
118     2:16
119     2:12
120     2:09
121     2:04
122     2:00
Name: Played Time, Length: 123, dtype: object
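Both notes can also be sketched with the standard library alone. The cleanup rule below is a hypothetical one: the output only shows a `01.` prefix, so stripping "digits plus a separator" is an assumption about what else might need cleaning. The time conversion uses zoneinfo (Python 3.9+, which needs the OS timezone database or the tzdata package), and it builds the hour without a leading zero portably instead of relying on `%-H`. The names clean_artist, to_local_hhmm and TRACK_NUMBER_PREFIX are illustrative only.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs the OS tz database or the tzdata package

# Hypothetical cleanup rule: strip a leading track number followed by a
# separator, e.g. "01.Latoya Jackson" -> "Latoya Jackson". The separator is
# required so names that merely start with digits (e.g. "10cc") are kept.
TRACK_NUMBER_PREFIX = re.compile(r'^\s*\d+\s*[.)\-]\s*')


def clean_artist(artist):
    """Remove a leading track-number prefix and surrounding whitespace."""
    return TRACK_NUMBER_PREFIX.sub('', artist).strip()


def to_local_hhmm(utc_text, tz_name='Europe/Paris'):
    """Convert a UTC timestamp string to local 'H:MM' in the given zone."""
    played = datetime.fromisoformat(utc_text)     # tz-aware because of '+00:00'
    local = played.astimezone(ZoneInfo(tz_name))  # DST rules come from the tz database
    # Hour without a leading zero, built portably instead of using '%-H'
    return f"{local.hour}:{local.minute:02d}"


print(clean_artist("01.Latoya Jackson"))           # Latoya Jackson
print(to_local_hhmm("2023-10-28 08:36:00+00:00"))  # 10:36
print(to_local_hhmm("2023-10-28 00:16:00+00:00"))  # 2:16
```

On the DataFrame, the cleanup could be applied with `df['Artist'].map(clean_artist)`. Using a named zone rather than a fixed +2 offset matters here: Paris was still on summer time (UTC+2) on 2023-10-28, but switches to UTC+1 the next day.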