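I'm trying to create a separate key-value pair for each set of a show (set1, set2, encore) scraped from setlist.fm, instead of just one list of songs with no separation. What I can't figure out is how to access the element that marks a set, and then append songs to that set until the next set begins. Here is the HTML I'm accessing: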

Currently, my JSON file looks like this:

{
    "artist": "Sample Artist",
    "day": 20,
    "month": 1,
    "songs": ["Song A", "Song B", "Song C"],
    "tour": "2000 U.S. Tour",
    "venue": "Sample Venue, Atlanta, GA, USA",
    "year": 2000
}

I'd like it to look like this instead:

{
    "artist": "Sample Artist",
    "day": 20,
    "month": 1,
    "songs": ["Song A", "Song B", "Song C"],
    "set1": ["Song A"],
    "set2": ["Song B"],
    "encore": ["Song C"],
    "tour": "2000 U.S. Tour",
    "venue": "Sample Venue, Atlanta, GA, USA",
    "year": 2000
}

Below is the code I use to generate the JSON song list, but I'm not sure how to get each set separately:

def getConcertData(i, url, concerts):
    try:
        soup = getSoup(url)  # getSoup() is defined elsewhere (not shown)

        dateBlock = soup.find_all("div", {"class": "dateBlock"})[0]
        infoContainer = soup.find_all("div", {"class": "infoContainer"})[0]
        headLineDiv = infoContainer.find_all("div", {"class": "setlistHeadline"})[0]
        setlistDiv = soup.find_all("div", {"class": "setlistList"})[0]

        # removed unrelated code for question
        # (artist, year, month, day, venue and tour are assigned there)

        # Flat list of every song on the page, with no set information
        songs = []
        for a in setlistDiv.find_all("a", {"class": "songLabel"}):
            songs.append(a.getText().strip())

        print(str(year) + "." + str(month).zfill(2) + "." + str(day).zfill(2) + ": " + venue)

        data = dict()
        data["artist"] = artist
        data["year"] = year
        data["month"] = month
        data["day"] = day
        data["venue"] = venue
        data["tour"] = tour
        data["songs"] = songs
        # data["set1"] = 0
        # data["set2"] = 0
        # data["encore"] = 0

        concerts[i] = data
    except IndexError as e:
        # find_all(...)[0] raises IndexError when an expected div is missing
        print("Could not parse " + url + ": " + str(e))
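
The approach described in the question (find the element that marks a set, then keep appending songs to it until the next set marker appears) can be done in a single pass over the list items. A minimal sketch, assuming a simplified, hypothetical markup in which set headings are `<li class="highlight">` items and songs are `<li class="song">` items containing an `a.songLabel` link. The `highlight`/`song` class names are borrowed from the selectors in the recommended answer and `songLabel` from the question's code; the live page may differ, and `split_sets` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup; the real setlist.fm page may differ.
HTML = """
<div class="setlistList"><ul>
  <li class="highlight">Set 1:</li>
  <li class="song"><a class="songLabel">Song A</a></li>
  <li class="highlight">Set 2:</li>
  <li class="song"><a class="songLabel">Song B</a></li>
  <li class="highlight">Encore:</li>
  <li class="song"><a class="songLabel">Song C</a></li>
</ul></div>
"""


def split_sets(setlist_div):
    """Walk the <li> items in document order. A heading starts a new set,
    and every following song is appended to it until the next heading."""
    sets = {}
    current = None  # name of the set we are currently filling
    for li in setlist_div.find_all("li"):
        classes = li.get("class", [])  # bs4 returns "class" as a list
        if "highlight" in classes:
            # "Set 1:" -> "Set 1"
            current = li.get_text(strip=True).strip(" :")
            sets.setdefault(current, [])
        elif "song" in classes and current is not None:
            label = li.find("a", class_="songLabel")
            if label is not None:
                sets[current].append(label.get_text(strip=True))
    return sets


soup = BeautifulSoup(HTML, "html.parser")
print(split_sets(soup.find("div", class_="setlistList")))
```

Inside `getConcertData`, the same function could be called on `setlistDiv`. Songs that appear before the first heading are skipped in this sketch.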
    

Recommended answer

If I understand you correctly, you want to "group" the songs by the section they belong to:

import requests
from bs4 import BeautifulSoup


url = "https://www.setlist.fm/setlist/phish/2022/ruoff-home-mortgage-music-center-noblesville-in-3b4e5a7.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")


out = {}
out["artist"] = soup.h1.a.get_text(strip=True)
out["month"] = soup.select_one(".month").text
out["day"] = soup.select_one(".day").text
out["year"] = soup.select_one(".year").text
out["venue"] = soup.select_one('a[href*="/venue/"]').text

for li in soup.select(".setlistList li.song"):
    song_name = li.a.get_text(strip=True)
    section = (
        li.find_previous("li", class_="highlight")
        .get_text(strip=True)
        .strip(" :")
    )

    out.setdefault("songs", []).append(song_name)
    out.setdefault(section, []).append(song_name)

print(out)

Prints (output reformatted for readability):

{
    "artist": "Phish",
    "month": "Jun",
    "day": "5",
    "year": "2022",
    "venue": "Ruoff Home Mortgage Music Center, Noblesville, IN, USA",
    "songs": [
        "While My Guitar Gently Weeps",
        "My Soul",
        "Rift",
        "Horn",
        "Wombat",
        "Evolve",
        "Guyute",
        "Limb by Limb",
        "Mercury",
        "The Moma Dance",
        "Sand",
        "Sigma Oasis",
        "Twenty Years Later",
        "The Mango Song",
        "Rise/Come Together",
        "Free",
        "Grind",
        "Slave to the Traffic Light",
    ],
    "Set 1": [
        "While My Guitar Gently Weeps",
        "My Soul",
        "Rift",
        "Horn",
        "Wombat",
        "Evolve",
        "Guyute",
        "Limb by Limb",
        "Mercury",
        "The Moma Dance",
    ],
    "Set 2": [
        "Sand",
        "Sigma Oasis",
        "Twenty Years Later",
        "The Mango Song",
        "Rise/Come Together",
        "Free",
    ],
    "Encore": ["Grind", "Slave to the Traffic Light"],
}
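
The keys in this output are the page's own headings ("Set 1", "Set 2", "Encore"), while the question asked for `set1`, `set2` and `encore`. One way to get those exact keys is to normalize each heading before using it as a dict key. A minimal sketch; `section_key` is a hypothetical helper, and its lowercase-and-remove-whitespace rule is an assumption that happens to cover the three headings shown here. The `pairs` list is an abbreviated excerpt of the output above:

```python
import json


def section_key(heading):
    """Map a setlist heading such as 'Set 1' or 'Encore' to 'set1' / 'encore'
    (assumed rule: drop all whitespace, then lowercase)."""
    return "".join(heading.split()).lower()


# Abbreviated (section, song) pairs taken from the output above.
pairs = [
    ("Set 1", "While My Guitar Gently Weeps"),
    ("Set 2", "Sand"),
    ("Encore", "Grind"),
]

out = {}
for section, song_name in pairs:
    out.setdefault("songs", []).append(song_name)              # flat list, as before
    out.setdefault(section_key(section), []).append(song_name)  # one list per set

print(json.dumps(out, indent=4))
```

In the answer's loop this amounts to replacing `out.setdefault(section, [])` with `out.setdefault(section_key(section), [])`. Under this rule a heading like "Encore 2" would become `encore2`.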
