[英] Convert a list of "dictionary of dictionaries" to a dataframe
I have a list of "dictionary of dictionaries" that looks like this:
lis = [{'Health and Welfare Plan + Change Notification': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Computations': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Agreements': {'evidence_capture': 'null',
'test_result_justification': 'Due to the Policy',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Elements': {'evidence_capture': 'null',
'test_result_justification': 'xxx',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Data Quality Monitoring': {'evidence_capture': 'null',
'test_result_justification': 'xxx',
'latest_test_result_date': '2019-08-09',
'last_updated_by': 'null',
'test_execution_status': 'Completed',
'test_result': 'xxx'}},
{'Health and Welfare Plan + HPU Source Reliability': {'evidence_capture': 'null',
'test_result_justification': 'xxx.',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}},
{'Health and Welfare Plan + Lineage': {'evidence_capture': 'null',
'test_result_justification': 'null',
'latest_test_result_date': 'null',
'last_updated_by': 'null',
'test_execution_status': 'Not Started',
'test_result': 'null'}},
{'Health and Welfare Plan + Metadata': {'evidence_capture': 'null',
'test_result_justification': 'Valid',
'latest_test_result_date': '2020-07-02',
'last_updated_by': 'null',
'test_execution_status': 'Completed',
'test_result': 'xxx'}},
{'Health and Welfare Plan + Usage Reconciliation': {'evidence_capture': 'null',
'test_result_justification': 'Test out of scope',
'latest_test_result_date': '2019-10-02',
'last_updated_by': 'null',
'test_execution_status': 'In Progress',
'test_result': 'null'}}]
I would like to convert the list into a dataframe that looks like this:
evidence_capture last_updated_by latest_test_result_date test_execution_status test_result test_result_justification test_category
Change Notification null null null Not Started null null Health and Welfare Plan
Computations null null null Not Started null null Health and Welfare Plan
Data Agreements null null 2019-10-02 In Progress null Due to the Policy Health and Welfare Plan
Data Elements null null 2019-10-02 In Progress null xxx Health and Welfare Plan
Data Quality Monitoring null null 2019-08-09 Completed xxx xxx Health and Welfare Plan
HPU Source Reliability null null 2019-10-02 In Progress null xxx. Health and Welfare Plan
Lineage null null null Not Started null null Health and Welfare Plan
Metadata null null 2020-07-02 Completed xxx Valid Health and Welfare Plan
Usage Reconciliation null null 2019-10-02 In Progress null Test out of scope Health and Welfare Plan
My code to build the dataframe is using a for-loop to concat the records column by column. After that to process the column names, and then transpose it. The final output would have the repeated string "Health and Welfare Plan" removed from each row index, but appended as a new column.
df3 = pd.DataFrame(lis[0])
for i in range(1, len(lis)):
df3 = pd.concat([df3, pd.DataFrame(lis[i])], axis=1)
df3.columns = [col.split(' + ')[1] for col in df3.columns]
df3 = df3.T
df3['test_category'] = 'Health and Welfare Plan'
print(df3)
The code is able to produce the final output, but using "expensive" functions of both for-loop and dataframe concat. So I was wondering if there is a better way to output the same results?