I need to read the following XML file:

https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml

I tried this code:

import requests

from lxml import objectify

url = requests.get("https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml")
parsed = objectify.parse((url))

When I run it, I get the following error:

TypeError: cannot parse from 'Response'

I don't understand why.

Can anyone help me?

Recommended answer

Here is one way to get this data:

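from io import StringIO

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml'
# Parse with BeautifulSoup's 'lxml' (HTML) parser, which lower-cases tag names
soup = bs(requests.get(url, headers=headers).text, 'lxml')
# One row per <individual> element; wrapping the string in StringIO avoids the
# FutureWarning pandas 2.1+ emits when read_xml is given literal XML
df = pd.read_xml(StringIO(str(soup)), xpath='.//individual')
print(df)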

Output in the terminal:

    dataid  versionnum  first_name  second_name     third_name  un_list_type    reference_number    listed_on   comments1   designation     ...     individual_date_of_birth    individual_place_of_birth   individual_document     sort_key    sort_key_last_mod   name_original_script    fourth_name     gender  title   submitted_by
0   6908555     1   RI  WON HO  None    DPRK    KPi.033     2016-11-30  Ri Won Ho is a DPRK Ministry of State Security...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
1   6908570     1   CHANG   CHANG HA    None    DPRK    KPi.037     2016-11-30  None    NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
2   6908571     1   CHO     CHUN RYONG  None    DPRK    KPi.038     2016-11-30  None    NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
3   6908858     1   EMRAAN  ALI     None    Al-Qaida    QDi.430     2021-11-23  Senior member of Islamic State in Iraq and the...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
4   6908565     1   JO  YONG CHOL   None    DPRK    KPi.034     2016-11-30  Jo Yong Chol is a DPRK Ministry of State Secur...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
697     6908704     1   Ahmad   Oumar   Imhamad     Libya   LYi.023     2018-06-07  Listed pursuant to paragraphs 15 and 17 of res...   NaN     ...     NaN     NaN     NaN     NaN     NaN     احمد عمر امحمد الفيتوري     al-Fitouri  None    NaN     None
698     6908707     1   Abd     Al-Rahman   al-Milad    Libya   LYi.026     2018-06-07  Listed pursuant to paragraphs 15 and 17 of res...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
699     6908841     1   Amir    Muhammad Sa’id  Abdal-Rahman    Al-Qaida    QDi.426     2020-05-21  Leader of Islamic State in Iraq and the\n ...   NaN     ...     NaN     NaN     NaN     NaN     NaN     أمیر محمد سعید عبد\n ...    al-Salbi    None    NaN     None
700     2975510     1   FAIZULLAH   KHAN    NOORZAI     Taliban     TAi.153     2011-10-04  Prominent Taliban financier. As of mid-2009, s...   NaN     ...     NaN     NaN     NaN     NaN     NaN     فیض الله خان نورزی  na  None    NaN     None
701     2959427     1   SAID JAN    ‘ABD AL-SALAM   None    Al-Qaida    QDi.289     2011-02-09  In approximately 2005, ran a "basic training" ...   NaN     ...     NaN     NaN     NaN     NaN     NaN     سعید جان عبد السلام     None    None    NaN     None

702 rows × 25 columns
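If you would rather skip BeautifulSoup, pandas can read the downloaded bytes directly. This is a minimal sketch, assuming the raw feed uses upper-case tag names such as INDIVIDUAL (BeautifulSoup's HTML parser lower-cases them, which is why the answer's XPath is './/individual'). The function name individuals_frame is hypothetical, and the default parser requires lxml to be installed:

```python
from io import BytesIO

import pandas as pd


def individuals_frame(xml_bytes: bytes) -> pd.DataFrame:
    """Build a DataFrame with one row per <INDIVIDUAL> element.

    Assumption: the raw XML uses upper-case tags; adjust the XPath if not.
    """
    # read_xml accepts a file-like object, so wrap the raw bytes in BytesIO;
    # passing bytes lets the parser honour the document's own encoding.
    return pd.read_xml(BytesIO(xml_bytes), xpath=".//INDIVIDUAL")
```

Usage, reusing url and headers from the answer: df = individuals_frame(requests.get(url, headers=headers).content). Only direct child elements of each INDIVIDUAL become columns; nested blocks would need their own XPath.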

For more on reading XML documents, see also the pandas documentation for pandas.read_xml.
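As for the original TypeError: objectify.parse() expects a filename, URL or file-like object, and a requests Response object is none of those, so lxml refuses it. A minimal sketch that keeps lxml.objectify, assuming you pass response.content (the raw bytes) instead of the Response itself; the helper name parse_xml_bytes is hypothetical:

```python
from io import BytesIO

from lxml import objectify


def parse_xml_bytes(data: bytes) -> objectify.ObjectifiedElement:
    """Parse raw XML bytes (for example response.content from requests)."""
    # objectify.parse() accepts a file-like object, so wrap the bytes in BytesIO.
    # objectify.fromstring(data) would be an equivalent one-liner.
    tree = objectify.parse(BytesIO(data))
    return tree.getroot()
```

Usage: root = parse_xml_bytes(requests.get(url).content). Child elements are then reachable as attributes, e.g. root.INDIVIDUALS.INDIVIDUAL, assuming those are the tag names in the feed.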
