I am trying to scrape table data from the website https://rss.knf.gov.pl/rss_pub/ using Python requests.

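In the browser's DevTools I can see that when the page is refreshed (or when a value is selected from the Limit drop-down below the table), a POST request is sent to retrieve the data, and the response contains that data as JSON. I can see it on the 'Response' tab in DevTools.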

However, when I try to mimic this request, the result is an empty JSON object. The code I am running is:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    'Referer': 'https://rss.knf.gov.pl/rss_pub/'
}

payload = {
    "cmd":"get",
    "language":"en",
    "search":[],
    "limit":10,
    "offset":0,
    "method":"Default",
    "sort":[{"field":"HOLDER_FULL_NAME","direction":"asc"}],
    "searchLogic":"AND",
    "searchValue":""
}

url = 'https://rss.knf.gov.pl/rss_pub/'
s = requests.session()
s.headers.update(headers)
s.get(url)
r = s.post(url+'JSON', json=payload)

After running this code, all I get back is an empty JSON object:

>>> r.status_code
200
>>> r.text
'{}'

Am I missing something in the request that leads to this result?

Recommended answer

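Use the data= parameter instead of json=. Judging by the payload that works, the endpoint appears to expect a form-encoded body in which the whole query is passed as a JSON string in a single request field, rather than a raw JSON body: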

import requests

url = "https://rss.knf.gov.pl/rss_pub/JSON"

payload = {
    "request": '{"cmd":"get","language":"pl","search":[],"limit":10000,"offset":0,"method":"Default","sort":[{"field":"HOLDER_FULL_NAME","direction":"asc"}],"searchLogic":"AND","searchValue":""}'
}

data = requests.post(url, data=payload).json()
print(data)

Prints:

{
    "total": 9,
    "records": [
        {
            "HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
            "POSITION_DATE": "2023-09-08",
            "ISSUER_NAME": "CDPROJEKT",
            "MODIFY_DATE": "2023-09-12",
            "ISIN": "PLOPTTC00011",
            "recid": 1,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "AQR Capital Management, LLC ",
            "POSITION_DATE": "2023-10-23",
            "ISSUER_NAME": "ALLEGRO",
            "MODIFY_DATE": "2023-10-24",
            "ISIN": "LU2237380790",
            "recid": 2,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "GSA Capital Partners LLP ",
            "POSITION_DATE": "2023-07-07",
            "ISSUER_NAME": "TSGAMES",
            "MODIFY_DATE": "2023-08-21",
            "ISIN": "PLTSQGM00016",
            "recid": 3,
            "NET_SHORT_POSITION_O": "0.6",
        },
        {
            "HOLDER_FULL_NAME": "Insignis FIZ ",
            "POSITION_DATE": "2023-01-13",
            "ISSUER_NAME": "GPW",
            "MODIFY_DATE": "2023-08-21",
            "ISIN": "PLGPW0000017",
            "recid": 4,
            "NET_SHORT_POSITION_O": "0.59",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-24",
            "ISSUER_NAME": "KETY",
            "MODIFY_DATE": "2023-10-25",
            "ISIN": "PLKETY000011",
            "recid": 5,
            "NET_SHORT_POSITION_O": "0.78",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-18",
            "ISSUER_NAME": "CDPROJEKT",
            "MODIFY_DATE": "2023-10-19",
            "ISIN": "PLOPTTC00011",
            "recid": 6,
            "NET_SHORT_POSITION_O": "0.72",
        },
        {
            "HOLDER_FULL_NAME": "Marshall Wace LLP ",
            "POSITION_DATE": "2023-10-12",
            "ISSUER_NAME": "JSW",
            "MODIFY_DATE": "2023-10-13",
            "ISIN": "PLJSW0000015",
            "recid": 7,
            "NET_SHORT_POSITION_O": "0.71",
        },
        {
            "HOLDER_FULL_NAME": "PSquared Asset Management AG ",
            "POSITION_DATE": "2023-10-17",
            "ISSUER_NAME": "LPP",
            "MODIFY_DATE": "2023-10-18",
            "ISIN": "PLLPP0000011",
            "recid": 8,
            "NET_SHORT_POSITION_O": "0.5",
        },
        {
            "HOLDER_FULL_NAME": "Silver Point Capital, L.P. ",
            "POSITION_DATE": "2023-10-24",
            "ISSUER_NAME": "CCC",
            "MODIFY_DATE": "2023-10-25",
            "ISIN": "PLCCC0000016",
            "recid": 9,
            "NET_SHORT_POSITION_O": "0.62",
        },
    ],
    "status": "success",
}
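
If hand-writing the JSON string is inconvenient, the same form field can be built from a regular dict with json.dumps. The following is only a sketch: the helper names build_query, as_form_payload and fetch_records are made up for illustration, and the assumption that the server wants exactly one form field called request comes from the working payload above, not from any documentation. The last two lines show roughly what each variant puts in the request body, which is why json= and data= behave differently here.

```python
import json
from urllib.parse import urlencode

URL = "https://rss.knf.gov.pl/rss_pub/JSON"


def build_query(limit=10, offset=0, language="en"):
    """Return the query dict copied from the browser's POST request."""
    return {
        "cmd": "get",
        "language": language,
        "search": [],
        "limit": limit,
        "offset": offset,
        "method": "Default",
        "sort": [{"field": "HOLDER_FULL_NAME", "direction": "asc"}],
        "searchLogic": "AND",
        "searchValue": "",
    }


def as_form_payload(query):
    """Wrap the query as one form field named "request" (assumed field name)."""
    return {"request": json.dumps(query, separators=(",", ":"))}


def fetch_records(limit=10, offset=0, language="en"):
    """POST the form payload and return the decoded JSON (needs network access)."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    payload = as_form_payload(build_query(limit, offset, language))
    response = requests.post(URL, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()


query = build_query(limit=10)

# Roughly what json=query sends: the raw JSON document as the body.
json_body = json.dumps(query)

# Roughly what data={"request": ...} sends: a URL-encoded form with one field.
form_body = urlencode(as_form_payload(query))
```

Calling fetch_records(limit=10000, language="pl") should then correspond to the request in the answer above, and the rows would be under the "records" key of the returned dict.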
