I have a sample JSON file, shown below:

data = {
    "type": "video",
    "videoID": "vid001",
    "links": [
        {"type": "video", "videoID": "vid002", "links": []},
        {"type": "video",
         "videoID": "vid003",
         "links": [
             {"type": "video", "videoID": "vid004"},
             {"type": "video", "videoID": "vid005"},
         ]
         },
        {"type": "video", "videoID": "vid006"},
        {"type": "video",
         "videoID": "vid007",
         "links": [
             {"type": "video", "videoID": "vid008", "links": [
                 {"type": "video",
                  "videoID": "vid009",
                  "links": [{"type": "video", "videoID": "vid010"}]
                  }
             ]}
         ]},
    ]
}

I only need to extract specific keys and values from the JSON file and convert them to a CSV file.
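The sample above is a Python literal, not strict JSON. Its trailing commas before the closing ] brackets are valid Python, but a JSON parser rejects them. Below is a minimal sketch of loading a real .json file with the standard json module, assuming the file holds strictly valid JSON. The helper name load_json and the temporary demo file are illustrative only.

```python
import json
import os
import tempfile

def load_json(path):
    """Parse a JSON file into nested Python dicts and lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical demo file: a strictly valid JSON fragment of the sample.
text = ('{"type": "video", "videoID": "vid001", '
        '"links": [{"type": "video", "videoID": "vid002", "links": []}]}')
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(text)
data = load_json(tmp.name)
os.remove(tmp.name)
print(data["links"][0]["videoID"])  # vid002

# Trailing commas, as in the Python literal above, are rejected by json.
try:
    json.loads('[{"videoID": "vid004"},]')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
print(trailing_comma_ok)  # False
```

Once loaded, data has the same nested dict/list shape as the literal, so extract() can walk it unchanged.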

Reference: Extracting Specific Keys/Values From A Messed-Up JSON File (Python)

def extract(data, keys):
    out = []
    queue = [data]
    while len(queue) > 0:
        current = queue.pop(0)
        if type(current) == dict:
            for key in keys:
                if key in current:
                    out.append({key: current[key]})

            for val in current.values():
                if type(val) in [list, dict]:
                    queue.append(val)
        elif type(current) == list:
            queue.extend(current)
    return out

import pandas as pd

x = extract(data, ["videoID", "type"])
print(pd.DataFrame.from_dict(x))

When I pass two keys to extract(), the result has NaN values interleaved:

   videoID   type
0   vid001    NaN
1      NaN  video
2   vid002    NaN
3      NaN  video
4   vid003    NaN
5      NaN  video
6   vid006    NaN
7      NaN  video
8   vid007    NaN
9      NaN  video
10  vid004    NaN
11     NaN  video
12  vid005    NaN
13     NaN  video
14  vid008    NaN
15     NaN  video
16  vid009    NaN
17     NaN  video
18  vid010    NaN
19     NaN  video

I need to get output like this:

   videoID   type
0   vid001  video
1   vid002  video
2   vid003  video
3   vid004  video
etc...

and then write it to a CSV file. Can anyone help me with this?

Recommended Answer

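I think your approach is fine. You just made a mistake in the for key in keys loop. Right now you create a new dict ({key: current[key]}) for each key. So at the end, out is a list of unrelated dictionaries: each videoID sits in one dictionary and its type sits in a different one, like this: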

[{'videoID': 'vid001'}, {'type': 'video'}, {'videoID': 'vid002'}, {'type': 'video'}, {'videoID': 'vid003'}, {'type': 'video'}, {'videoID': 'vid006'}, {'type': 'video'}, {'videoID': 'vid007'}, {'type': 'video'}, {'videoID': 'vid004'}, {'type': 'video'}, {'videoID': 'vid005'}, {'type': 'video'}, {'videoID': 'vid008'}, {'type': 'video'}, {'videoID': 'vid009'}, {'type': 'video'}, {'videoID': 'vid010'}, {'type': 'video'}]
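This explains why each of these fragments becomes a half-empty row. When pandas builds a DataFrame from a list of dicts, it creates one row per dict and one column per key seen anywhere in the list. Any key a given dict lacks is filled with NaN. The stdlib-only sketch below mimics that alignment on the first two fragments, with None standing in for NaN. It illustrates the behaviour and is not pandas' actual implementation.

```python
fragments = [{"videoID": "vid001"}, {"type": "video"}]

# Columns: every key that appears in any dict, in first-seen order.
columns = []
for row in fragments:
    for key in row:
        if key not in columns:
            columns.append(key)

# One output row per dict; a missing key becomes None (pandas uses NaN).
table = [[row.get(col) for col in columns] for row in fragments]
print(columns)  # ['videoID', 'type']
print(table)    # [['vid001', None], [None, 'video']]
```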

What you want instead is:

[{'videoID': 'vid001', 'type': 'video'}, {'videoID': 'vid002', 'type': 'video'}, {'videoID': 'vid003', 'type': 'video'}, {'videoID': 'vid006', 'type': 'video'}, {'videoID': 'vid007', 'type': 'video'}, {'videoID': 'vid004', 'type': 'video'}, {'videoID': 'vid005', 'type': 'video'}, {'videoID': 'vid008', 'type': 'video'}, {'videoID': 'vid009', 'type': 'video'}, {'videoID': 'vid010', 'type': 'video'}]

where each videoID is paired with its type.

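To fix this, create a single dictionary before looping over the keys and add each matching key/value pair to it inside the loop. Once the key loop has finished, append that dictionary to the out list.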

Here is what I would do:

data_couple = {}  # one dict per node, shared by all keys
for key in keys:
    if key in current:
        data_couple[key] = current[key]
        # previously: out.append({key: current[key]})
out.append(data_couple)  # append once, after the key loop
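The same per-node dict can also be built in a single expression with a dict comprehension. Here is a small sketch; pick_keys is a hypothetical helper name.

```python
def pick_keys(node, keys):
    """Return ONE dict with every requested key that `node` contains."""
    return {key: node[key] for key in keys if key in node}

node = {"type": "video", "videoID": "vid001", "links": []}
print(pick_keys(node, ["videoID", "type"]))
# {'videoID': 'vid001', 'type': 'video'}
```

Inside extract, out.append(pick_keys(current, keys)) would then replace the whole key loop.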

So the whole extract function becomes:

def extract(data, keys):
    out = []
    queue = [data]  # nodes still to visit (breadth-first)
    while queue:
        current = queue.pop(0)
        if isinstance(current, dict):
            # collect all requested keys of this node into ONE dict
            data_couple = {}
            for key in keys:
                if key in current:
                    data_couple[key] = current[key]
            out.append(data_couple)
            # enqueue nested containers (e.g. "links") for later
            for val in current.values():
                if isinstance(val, (list, dict)):
                    queue.append(val)
        elif isinstance(current, list):
            queue.extend(current)
    return out
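Note that the queue makes this a breadth-first traversal. Rows therefore come out level by level (vid001, vid002, vid003, vid006, vid007, vid004, ...), not in the vid001, vid002, vid003, vid004 order shown in the question's desired output. If that order matters, a depth-first (pre-order) recursion visits each node's links right after the node itself. The sketch below uses that swapped-in traversal. It rests on three assumptions: extract_dfs is a hypothetical name, it skips dicts that contain none of the requested keys, and very deep nesting could hit Python's recursion limit.

```python
def extract_dfs(node, keys, out=None):
    """Depth-first, pre-order: record a node, then recurse into its children."""
    if out is None:
        out = []
    if isinstance(node, dict):
        row = {k: node[k] for k in keys if k in node}
        if row:  # assumption: skip nodes with none of the keys
            out.append(row)
        for val in node.values():
            if isinstance(val, (dict, list)):
                extract_dfs(val, keys, out)
    elif isinstance(node, list):
        for item in node:
            extract_dfs(item, keys, out)
    return out

sample = {
    "type": "video", "videoID": "vid001", "links": [
        {"type": "video", "videoID": "vid002", "links": []},
        {"type": "video", "videoID": "vid003", "links": [
            {"type": "video", "videoID": "vid004"},
            {"type": "video", "videoID": "vid005"}]},
        {"type": "video", "videoID": "vid006"},
        {"type": "video", "videoID": "vid007", "links": [
            {"type": "video", "videoID": "vid008", "links": [
                {"type": "video", "videoID": "vid009",
                 "links": [{"type": "video", "videoID": "vid010"}]}]}]},
    ],
}

rows = extract_dfs(sample, ["videoID", "type"])
print([r["videoID"] for r in rows])  # vid001 ... vid010, in order
```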

Finally, to write the dicts to a CSV file, I would simply use csv.DictWriter:

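import csv

def writeToCsv(rows, col_names):
    # the csv docs ask for newline=""; without it, Windows gets an
    # extra blank line between rows
    with open("file.csv", "w", newline="") as f:
        wr = csv.DictWriter(f, fieldnames=col_names)
        wr.writeheader()
        for elem in rows:  # rows is a list of dicts, one per video
            wr.writerow(elem)

writeToCsv(extract(data, ["videoID", "type"]), ["videoID", "type"])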
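You can check what DictWriter produces without creating a file by writing the rows to an in-memory io.StringIO buffer. The sketch below uses a hypothetical helper, rows_to_csv_text. It also shows how DictWriter handles a row that lacks one of the fieldnames: the gap is filled with restval, which defaults to an empty string.

```python
import csv
import io

def rows_to_csv_text(rows, fieldnames):
    """Render a list of dicts as CSV text instead of writing a file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [{"videoID": "vid001", "type": "video"}, {"videoID": "vid002"}]
text = rows_to_csv_text(rows, ["videoID", "type"])
print(repr(text))
# 'videoID,type\r\nvid001,video\r\nvid002,\r\n'
```

The opposite case raises an error. If a row has a key that is not in fieldnames, DictWriter raises ValueError unless extrasaction="ignore" is passed. With extract() this cannot happen, because it only collects the listed keys.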

This creates a CSV file with videoID and type columns. Alternatively, as suggested in Ivan Calderon's answer here, you can do it in one line with the pandas to_csv method:

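Note that extract() returns a list of dicts, not a single dict. So x has no .items() method, and it does not need to be reshaped first. The list can go straight into the DataFrame constructor:

import pandas as pd

x = extract(data, ["videoID", "type"])
# one row per dict; index=False leaves out the 0..n-1 index column
pd.DataFrame(x).to_csv('dict_file.csv', index=False)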

I'm not entirely sure about this one, though.
