I'm trying to convert the following deeply nested JSON into (eventually) a CSV. Below is just a sample; the full JSON is huge (12 GB).
```
{'reporting_entity_name': 'Blue Cross and Blue Shield of Alabama',
 'reporting_entity_type': 'health insurance issuer',
 'last_updated_on': '2022-06-10',
 'version': '1.1.0',
 'in_network': [
   {'negotiation_arrangement': 'ffs',
    'name': 'xploration of Kidney',
    'billing_code_type': 'CPT',
    'billing_code_type_version': '2022',
    'billing_code': '50010',
    'description': 'Renal Exploration, Not Necessitating Other Specific Procedures',
    'negotiated_rates': [{
      'negotiated_prices': [
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 993.0,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'},
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 1180.68,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'},
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 1283.95,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'},
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 1042.65,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'},
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 1290.9,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'},
        {'negotiated_type': 'negotiated',
         'negotiated_rate': 1241.25,
         'expiration_date': '2022-06-30',
         'service_code': ['21', '22', '24'],
         'billing_class': 'professional'}
      ]}]}]}
```
The end goal is a DataFrame or dictionary that I can then write to CSV. I want each row to have these columns:

```
{'reporting_entity_name':'','reporting_entity_type':'','last_updated_on':'','version':'','negotiation_arrangement':'','name':'','billing_code_type':'','billing_code_type_version':'','billing_code':'','description':'','provider_groups':'','negotiated_type':'','negotiated_rate':'','expiration_date':'','service_code':'','billing_class':''}
```
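One way to get "one row per price" is to walk the nesting with fixed-depth loops and copy the parent fields down into each row. The sketch below is a minimal illustration, assuming the structure in the sample above; `iter_rows`, `FIELDNAMES` and the key tuples are hypothetical names. `provider_groups` does not appear in the sample, so the code *assumes* it sits next to `negotiated_prices` inside each `negotiated_rates` entry and leaves the cell empty when it is missing. For data that fits in memory, `pandas.json_normalize` with a nested `record_path` such as `['in_network', 'negotiated_rates', 'negotiated_prices']` and the parent fields listed in `meta` should produce the same row shape.

```python
from typing import Any, Dict, Iterable, Iterator

# Top-level scalar fields repeated on every output row.
HEADER_KEYS = ("reporting_entity_name", "reporting_entity_type",
               "last_updated_on", "version")
# Per-procedure fields taken from each in_network item.
ITEM_KEYS = ("negotiation_arrangement", "name", "billing_code_type",
             "billing_code_type_version", "billing_code", "description")
# Per-price fields taken from each negotiated_prices entry.
PRICE_KEYS = ("negotiated_type", "negotiated_rate", "expiration_date",
              "service_code", "billing_class")
# Output column order, matching the target column list.
FIELDNAMES = HEADER_KEYS + ITEM_KEYS + ("provider_groups",) + PRICE_KEYS


def iter_rows(header: Dict[str, Any],
              in_network: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per negotiated price, copying parent fields down."""
    base = {k: header.get(k, "") for k in HEADER_KEYS}
    for item in in_network:
        item_part = {k: item.get(k, "") for k in ITEM_KEYS}
        for rate in item.get("negotiated_rates", []):
            # Assumed location of provider_groups; absent from the sample.
            groups = rate.get("provider_groups", "")
            for price in rate.get("negotiated_prices", []):
                row = {**base, **item_part, "provider_groups": groups}
                for k in PRICE_KEYS:
                    row[k] = price.get(k, "")
                # Join list-valued service codes so they fit in one CSV cell.
                if isinstance(row["service_code"], list):
                    row["service_code"] = ",".join(row["service_code"])
                yield row


# Usage on a trimmed copy of the sample (two prices instead of six).
sample = {
    "reporting_entity_name": "Blue Cross and Blue Shield of Alabama",
    "reporting_entity_type": "health insurance issuer",
    "last_updated_on": "2022-06-10",
    "version": "1.1.0",
    "in_network": [{
        "negotiation_arrangement": "ffs",
        "name": "xploration of Kidney",
        "billing_code_type": "CPT",
        "billing_code_type_version": "2022",
        "billing_code": "50010",
        "description": "Renal Exploration, Not Necessitating Other Specific Procedures",
        "negotiated_rates": [{"negotiated_prices": [
            {"negotiated_type": "negotiated", "negotiated_rate": 993.0,
             "expiration_date": "2022-06-30", "service_code": ["21", "22", "24"],
             "billing_class": "professional"},
            {"negotiated_type": "negotiated", "negotiated_rate": 1180.68,
             "expiration_date": "2022-06-30", "service_code": ["21", "22", "24"],
             "billing_class": "professional"},
        ]}],
    }],
}
rows = list(iter_rows(sample, sample["in_network"]))
```

Because `iter_rows` is a generator with a fixed nesting depth rather than a recursive function, it only ever holds the current `in_network` item plus one row; the memory problem comes from loading the whole document, not from the loops themselves.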
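So far I've tried pandas `json_normalize`, `flatten`, and some custom modules I found on GitHub, but none of them seem to normalize/flatten the data into new rows, only into new columns. Because the dataset is so large, I'm wary of doing this recursively in a bunch of nested loops, since I'm worried it will quickly eat up all my memory. Thanks in advance for any advice!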
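On the memory concern: the key is to never build the full DataFrame or list of rows, and instead write each row to the CSV as soon as it is produced. A minimal sketch, assuming the rows come from some generator (`fake_rows` below is a hypothetical stand-in): `csv.DictWriter` writes one dict at a time, so memory stays flat no matter how many rows pass through. To avoid loading the 12 GB file itself, the rows would need to come from a streaming JSON parser; for example, the third-party `ijson` library's `ijson.items(f, "in_network.item")` yields the `in_network` entries one by one (not used in the code below, to keep it stdlib-only).

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence, TextIO


def stream_to_csv(rows: Iterable[Dict[str, Any]],
                  out: TextIO,
                  fieldnames: Sequence[str]) -> int:
    """Write rows one at a time without accumulating them; return the row count."""
    # extrasaction="ignore" drops any keys not listed in fieldnames.
    writer = csv.DictWriter(out, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def fake_rows():
    # Hypothetical stand-in for a generator of already-flattened rows.
    for i in range(3):
        yield {"billing_code": "50010", "negotiated_rate": 1000.0 + i}


buf = io.StringIO()
n = stream_to_csv(fake_rows(), buf, ["billing_code", "negotiated_rate"])
```

For a real file, open the output with `open("out.csv", "w", newline="")`, as the `csv` module documentation recommends, and pass the file handle in place of `buf`.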