I've just started using Python and am trying to scrape the track list for one of my favourite radio stations.

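I've managed to get the track titles. My initial problem is that the artist and the track come out on separate lines; I need them on a single line.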

I'm also trying to get the time the song was played (08:52) from the span, but I can't find a way to get just the inner text.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://raddio.net/112842-loversrock-radio/playlist/'

r = requests.get(url)

soup = BeautifulSoup(r.text, "lxml")

station = soup.find("h1")
print(station.text.strip())

all_tracks = soup.find_all("div", class_ = "title")
#print(all_tracks).text

for track in all_tracks:
    individual_track_data = track.text.strip()
    print(individual_track_data)

The artist and track need to be on the same line.

<div class="title"><b>Beres Hammond</b>
                    - Settling Down                </div>

I'd like to get the time the track was played.

<span data-name="translated-song-time" data-time="2023-10-28 07:52">08:52</span>

My end goal is to display the result in a Pandas DataFrame.


Recommended answer

You can try:

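# Reuses `soup` and `pd` from the question's code
station = soup.find('h1').text.strip()
# Each playlist entry is a div.item holding both the title div and the time span
all_tracks = soup.find_all("div", attrs={'class': 'item'})

tracks = []
for track in all_tracks:
    try:
        # The title div's text is "Artist\n   - Title", so split on the newline
        artist, title = track.find('div', attrs={'class': 'title'}).text.split('\n')
        title = title[title.index('-')+1:].strip()
        # data-time holds the play time; it is interpreted here as UTC
        played_time = pd.Timestamp(track.find('span')['data-time'], tz='UTC')
        tracks.append([station, artist, title, played_time])
    except ValueError:
        # Skip items whose title text does not match the expected layout
        pass

df = pd.DataFrame(tracks, columns=['Station', 'Artist', 'Track', 'Played Time'])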

Output:

>>> df
                      Station             Artist                  Track               Played Time
0    Playlist LoversrockRadio       Sylvia Tella          Plastic Smile 2023-10-28 08:36:00+00:00
1    Playlist LoversrockRadio  01.Latoya Jackson        Camp Kutchi Kai 2023-10-28 08:31:00+00:00
2    Playlist LoversrockRadio     Gregory Isaacs               Margaret 2023-10-28 08:27:00+00:00
3    Playlist LoversrockRadio       Jerry Harris       New Love For You 2023-10-28 08:24:00+00:00
4    Playlist LoversrockRadio        Gappy Ranks    Heaven In Your Eyes 2023-10-28 08:21:00+00:00
..                        ...                ...                    ...                       ...
118  Playlist LoversrockRadio      Beres Hammond          Settling Down 2023-10-28 00:16:00+00:00
119  Playlist LoversrockRadio            Gyptain  You're The Best Thing 2023-10-28 00:12:00+00:00
120  Playlist LoversrockRadio       Jerry Harris       New Love For You 2023-10-28 00:09:00+00:00
121  Playlist LoversrockRadio    Everton Blender      Lift Up Your Head 2023-10-28 00:04:00+00:00
122  Playlist LoversrockRadio       Sugar Minott       Make It With You 2023-10-28 00:00:00+00:00

[123 rows x 4 columns]
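
To check the parsing logic without hitting the site, the same steps can be run against the HTML fragment from the question. This is only a sketch: `SAMPLE_HTML` and `parse_item` are hypothetical names, the wrapping `<div class="item">` is an assumption about how the page nests the two snippets, and `html.parser` is used instead of `lxml` to avoid the extra dependency. It also shows the two ways of reading the span: the inner text (the time as displayed on the page) and the `data-time` attribute.

```python
from datetime import datetime, timezone

from bs4 import BeautifulSoup

# Hypothetical fragment: the two snippets from the question wrapped in a
# <div class="item">, which is how the code above assumes the page nests them.
SAMPLE_HTML = """
<div class="item">
  <span data-name="translated-song-time" data-time="2023-10-28 07:52">08:52</span>
  <div class="title"><b>Beres Hammond</b>
                    - Settling Down                </div>
</div>
"""


def parse_item(item):
    """Return (artist, title, played_at, shown_time) for one playlist item."""
    # The div text is "Artist\n   - Title", so split on the single newline.
    artist, title = item.find("div", class_="title").get_text().split("\n")
    title = title[title.index("-") + 1:].strip()
    span = item.find("span", attrs={"data-time": True})
    # data-time is treated as UTC here, following the answer above.
    played_at = datetime.strptime(span["data-time"], "%Y-%m-%d %H:%M")
    played_at = played_at.replace(tzinfo=timezone.utc)
    # get_text() returns only the inner text of the span, e.g. "08:52".
    return artist.strip(), title, played_at, span.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
artist, title, played_at, shown_time = parse_item(soup.find("div", class_="item"))
print(f"{artist} - {title}")
print(played_at.isoformat(), shown_time)
```

Printing `f"{artist} - {title}"` puts the artist and track on one line, which is what the question asked for.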

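Notes:

  1. You may need to clean up the Artist column (see "01.Latoya Jackson" in the output).
  2. played_time is set to UTC. You have to convert it to your local time zone.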
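
For the first note, a minimal sketch of one possible clean-up. It assumes the only noise is a leading track number followed by a dot, as in "01.Latoya Jackson" from the output above; `clean_artist` is a hypothetical helper, not something from the original answer. The dot is required so that artist names that legitimately start with digits are left alone.

```python
import re

# Assumption: the noise is a leading number immediately followed by a dot,
# e.g. "01.Latoya Jackson". Names such as "50 Cent" have no dot and are kept.
_LEADING_TRACK_NUMBER = re.compile(r"^\s*\d+\.\s*")


def clean_artist(name: str) -> str:
    """Strip a leading 'NN.' prefix and surrounding whitespace."""
    return _LEADING_TRACK_NUMBER.sub("", name).strip()


print(clean_artist("01.Latoya Jackson"))
print(clean_artist("Gregory Isaacs"))
```

With the DataFrame from the answer, this could be applied as `df['Artist'] = df['Artist'].map(clean_artist)`.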

Something like:

>>> df['Played Time'].dt.tz_convert('Europe/Paris').dt.strftime('%-H:%M')
0      10:36
1      10:31
2      10:27
3      10:24
4      10:21
       ...  
118     2:16
119     2:12
120     2:09
121     2:04
122     2:00
Name: Played Time, Length: 123, dtype: object
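
One caveat about the snippet above: `%-H` (hour without a leading zero) is a platform-specific `strftime` extension and is not accepted on Windows. A portable alternative is to build the string from the `hour` and `minute` attributes, as in this sketch. `format_hour_minute` is a hypothetical helper, and the fixed UTC+2 offset is an assumption standing in for Europe/Paris on 2023-10-28 so that the example needs no time zone database.

```python
from datetime import datetime, timedelta, timezone


def format_hour_minute(moment: datetime) -> str:
    """Format as H:MM with no leading zero on the hour, on any platform."""
    return f"{moment.hour}:{moment.minute:02d}"


# Assumption: a fixed UTC+2 offset stands in for Europe/Paris on 2023-10-28
# (summer time was still in effect); real code would use a named time zone.
paris_like = timezone(timedelta(hours=2))

played_utc = datetime(2023, 10, 28, 0, 16, tzinfo=timezone.utc)
print(format_hour_minute(played_utc.astimezone(paris_like)))
```

Since `pd.Timestamp` also exposes `hour` and `minute`, the helper could likely be used on the DataFrame as `df['Played Time'].dt.tz_convert('Europe/Paris').map(format_hour_minute)`.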
