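I am trying to write a Python program that accepts any XML file as input and converts it to a CSV file without losing any XML tags/elements. I am open to any approach, as long as it uses Python.
I have tried the xmltodict, json, csv, and pandas Python modules, and I can read the XML and convert it to a dictionary. However, I cannot turn that dictionary into a list that can be written to a CSV file in a way that captures all of the XML fields.
My sample XML file: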
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tag_1>
    <tag_2>
        <date value="06-30-2023">
            <data>
                <tag_3>val_3</tag_3>
                <tag_4>val_4</tag_4>
                <tag_5>val_5_1 &amp; val_5_2</tag_5>
                <tag_6>-0.157</tag_6>
            </data>
            <data>
                <tag_3>val_3</tag_3>
                <tag_4>val_4_2</tag_4>
                <tag_5>val_5_1</tag_5>
                <tag_6>-0.173</tag_6>
            </data>
        </date>
    </tag_2>
    <tag_7>
        <date value="06-30-2023">
            <data><tag_3>val_3</tag_3><tag_4>val_4</tag_4><tag_5>val_5_1 &amp; val_5_2</tag_5><tag_6>-0.157</tag_6>
            </data>
            <data><tag_3>val_3</tag_3><tag_4>val_4_2</tag_4><tag_5>val_5_1</tag_5><tag_6>-0.173</tag_6>
            </data>
        </date>
    </tag_7>
</tag_1>
After reading the XML above, I was able to convert it to a dictionary:
{'tag_1':
    {'tag_2':
        {'date':
            {'@value': '06-30-2023',
             'data': [{'tag_3': 'val_3', 'tag_4': 'val_4', 'tag_5': 'val_5_1 & val_5_2', 'tag_6': '-0.157'},
                      {'tag_3': 'val_3', 'tag_4': 'val_4_2', 'tag_5': 'val_5_1', 'tag_6': '-0.173'}
                     ]
            }
        },
     'tag_7':
        {'date':
            {'@value': '06-30-2023',
             'data': [{'tag_3': 'val_3', 'tag_4': 'val_4', 'tag_5': 'val_5_1 & val_5_2', 'tag_6': '-0.157'},
                      {'tag_3': 'val_3', 'tag_4': 'val_4_2', 'tag_5': 'val_5_1', 'tag_6': '-0.173'}
                     ]
            }
        }
    }
}
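A nested dictionary of this shape can be walked recursively so that every attribute and leaf value ends up in a flat row. The following is only a minimal sketch, assuming the xmltodict conventions visible above (an '@' prefix for attributes, a list for repeated elements). The `flatten` helper and the dotted column names are hypothetical, not part of xmltodict, and the sample dictionary is trimmed to the tag_2 branch to keep the example short.

```python
def flatten(node, path=(), context=None):
    """Yield one flat dict per innermost record of an xmltodict-style dict."""
    context = dict(context or {})
    nested = []
    for key, value in node.items():
        if isinstance(value, (dict, list)):
            nested.append((key, value))
        else:
            # Attributes ('@value') and leaf text become dotted-path columns.
            context[".".join(path + (key,))] = value
    if not nested:
        # Nothing deeper to visit: the accumulated context is one complete row.
        yield context
        return
    for key, value in nested:
        # xmltodict uses a list for repeated elements and a dict for a single one.
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                yield from flatten(item, path + (key,), context)
            else:
                # Repeated leaf elements arrive as plain values inside a list.
                yield {**context, ".".join(path + (key,)): item}


# Trimmed copy of the dictionary shown above (tag_2 branch only).
dict_xml = {"tag_1": {"tag_2": {"date": {
    "@value": "06-30-2023",
    "data": [
        {"tag_3": "val_3", "tag_4": "val_4", "tag_5": "val_5_1 & val_5_2", "tag_6": "-0.157"},
        {"tag_3": "val_3", "tag_4": "val_4_2", "tag_5": "val_5_1", "tag_6": "-0.173"},
    ],
}}}}

rows = list(flatten(dict_xml))
for row in rows:
    print(row)
```

Each row is a plain dict, so it could be passed to `csv.DictWriter` with the union of all keys as field names. Because the column names include the full path, rows from different branches (tag_2 versus tag_7) would land in different columns, which is not the same layout as a single shared header.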
My expected output (in the CSV file) is:
tag_1,tag_2,date,data,tag_3,tag_4,tag_5,tag_6
tag_1,tag_2,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_2,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
tag_1,tag_7,06-30-2023,data,val_3,val_4,val_5_1 & val_5_2,-0.157
tag_1,tag_7,06-30-2023,data,val_3,val_4_2,val_5_1,-0.173
So far, I have tried the following:
import xmltodict
import json
import csv
import pandas as pd

# Read the whole XML file into a string.
with open("file_01.xml", "r", encoding="utf-8") as xml_fh:
    str_xml = xml_fh.read()
print(f"str_xml={type(str_xml)}={str_xml}")

# Parse the XML string into a nested dictionary.
dict_xml = xmltodict.parse(str_xml)
print(f"dict_xml={type(dict_xml)}={dict_xml}")

# Build a DataFrame from the dictionary and write it to CSV.
df = pd.DataFrame.from_dict(dict_xml, orient='index')
df.to_csv('file_01.csv', index=False)
The actual result I get is:
tag_2,tag_7
"{'date': {'@value': '06-30-2023', 'data': [{'tag_3': 'val_3', 'tag_4': 'val_4', 'tag_5': 'val_5_1 & val_5_2', 'tag_6': '-0.157'}, {'tag_3': 'val_3', 'tag_4': 'val_4_2', 'tag_5': 'val_5_1', 'tag_6': '-0.173'}]}}","{'date': {'@value': '06-30-2023', 'data': [{'tag_3': 'val_3', 'tag_4': 'val_4', 'tag_5': 'val_5_1 & val_5_2', 'tag_6': '-0.157'}, {'tag_3': 'val_3', 'tag_4': 'val_4_2', 'tag_5': 'val_5_1', 'tag_6': '-0.173'}]}}"
Am I missing something?
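For reference, one possible way to get the row layout described above is to skip the dictionary step and walk the element tree directly with the standard library. This is a rough sketch under stated assumptions, not a general XML-to-CSV converter: it treats any element whose children are all leaf elements as one "record", writes an ancestor's attribute value when it has attributes and its tag name otherwise, and takes the header from the first record only. The helper names (`is_record`, `label`, `iter_rows`, `xml_to_csv_text`) are made up for illustration. Leaf elements that sit outside a record, and records with differing child tags, are not handled.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Compact copy of the sample XML from the question.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tag_1>
  <tag_2><date value="06-30-2023">
    <data><tag_3>val_3</tag_3><tag_4>val_4</tag_4><tag_5>val_5_1 &amp; val_5_2</tag_5><tag_6>-0.157</tag_6></data>
    <data><tag_3>val_3</tag_3><tag_4>val_4_2</tag_4><tag_5>val_5_1</tag_5><tag_6>-0.173</tag_6></data>
  </date></tag_2>
  <tag_7><date value="06-30-2023">
    <data><tag_3>val_3</tag_3><tag_4>val_4</tag_4><tag_5>val_5_1 &amp; val_5_2</tag_5><tag_6>-0.157</tag_6></data>
    <data><tag_3>val_3</tag_3><tag_4>val_4_2</tag_4><tag_5>val_5_1</tag_5><tag_6>-0.173</tag_6></data>
  </date></tag_7>
</tag_1>
"""


def is_record(elem):
    """Assumption: a record is an element whose children are all leaves."""
    children = list(elem)
    return bool(children) and all(len(child) == 0 for child in children)


def label(elem):
    """Cell value for an ancestor: its attribute values if any, else its tag."""
    return " ".join(elem.attrib.values()) if elem.attrib else elem.tag


def iter_rows(elem, ancestors=()):
    """Yield (header, row) pairs, one per record element."""
    path = ancestors + (elem,)
    if is_record(elem):
        header = [e.tag for e in path] + [child.tag for child in elem]
        row = [label(e) for e in path]
        row += [(child.text or "").strip() for child in elem]
        yield header, row
    else:
        for child in elem:
            yield from iter_rows(child, path)


def xml_to_csv_text(xml_bytes):
    """Return CSV text; the header comes from the first record found."""
    root = ET.fromstring(xml_bytes)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for header, row in iter_rows(root):
        if not header_written:
            writer.writerow(header)
            header_written = True
        writer.writerow(row)
    return out.getvalue()


csv_text = xml_to_csv_text(SAMPLE_XML.encode("utf-8"))
print(csv_text)
```

To write a file instead of a string, the same `csv.writer` calls could target a file opened with `newline=""`. For XML with a different shape, the definition of a record in `is_record` is the part most likely to need changing.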