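I'm trying to search for a movie title here: https://classindportal.mj.gov.br/consulta-filmes and scrape the resulting page. I know this involves an intermediate step of sending a specific request to the site with my search term, which I currently can't get to work.

Using Chrome DevTools, the Network tab shows the following information: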

Request URL: https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list
Request Method: POST
Status Code: 200 OK
Referrer Policy: strict-origin-when-cross-origin

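The request payload contains the key tituloBr, whose value equals the search term (for example, {'tituloBr': 'shrek'} if I type 'shrek' into the search bar and press Enter).

I assume the search involves sending a POST request to the Request URL shown above with the data {'tituloBr': 'shrek'}, so I used the requests library as follows: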

import requests

payload = {'tituloBr': 'shrek'}
r = requests.post('https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list', data=payload)

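But this gives a 400 status code, with r.reason showing 'Bad Request'.

I don't think there is anything wrong with the URL or the data I'm sending, so I'm not sure what the problem is.

Recommended answer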

I've looked at the page, and it seems you need to provide a token, which can be obtained by sending a POST request to the following address:

https://sso.mj.gov.br/auth/realms/PRD/protocol/openid-connect/token

So, get the token, and then use it to send another request to the API to search for the movie you want:

import requests


SEARCH_TERM = "shrek"

token_url = "https://sso.mj.gov.br/auth/realms/PRD/protocol/openid-connect/token"
movies_url = (
    "https://classindportal.mj.gov.br/api/solicitacao-classificacao-consultas/list"
)


headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9,he;q=0.8",
    # "Authorization" is filled in below, once a fresh token has been fetched.
    "Connection": "keep-alive",
    "Origin": "https://classindportal.mj.gov.br",
    "Referer": "https://classindportal.mj.gov.br/consulta-filmes",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
    "sec-ch-ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
}


json_data = {
    "currentPage": 0,
    "pageSize": 10,
    "sortItem": None,
    "totalResults": None,
    "itens": None,
    "tituloBr": SEARCH_TERM,
    "tituloOr": "",
    "requerente": "",
    "produtor": "",
    "editora": "",
    "idModulo": 1,
}


token_data = {
    "client_id": "classind-consultapublica-frontend",
    "client_secret": "4PmaBa8bBeVow40SKFNb7qNHzAxuLoqz",
    "grant_type": "client_credentials",
    "scope": "classind-backend",
}


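with requests.Session() as session:
    # Step 1: fetch an access token using the client-credentials grant.
    token = session.post(token_url, data=token_data).json()["access_token"]
    headers["Authorization"] = f"Bearer {token}"
    # Step 2: send the search as a JSON body (json=, not data=).
    response = session.post(movies_url, json=json_data, headers=headers)
    print(response.json())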
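
Note that the attempt in the question passed the payload with data=, whereas the search request above uses json=. Given a dict, requests form-encodes data= and JSON-encodes json=. Since the browser sends a JSON payload, that difference may have contributed to the 400, although I haven't confirmed how the server decides. Here is a small standard-library sketch of the two request bodies; it only approximates what requests sends and makes no network call:

```python
import json
from urllib.parse import urlencode

payload = {"tituloBr": "shrek"}

# Roughly the body requests builds for data=payload
# (sent with Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(payload)

# Roughly the body requests builds for json=payload
# (sent with Content-Type: application/json).
json_body = json.dumps(payload)

print(form_body)  # tituloBr=shrek
print(json_body)  # {"tituloBr": "shrek"}
```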

If you like, you can even convert the data to a Pandas DataFrame:

import pandas as pd
# ...

with requests.Session() as session:
    token = session.post(token_url, data=token_data).json()["access_token"]
    headers["Authorization"] = f"Bearer {token}"
    response = session.post(movies_url, json=json_data, headers=headers)
    data = response.json()["itens"]
    df = pd.DataFrame(data)
    print(df)

This prints the following:

       id       tituloBrasil  ... classificacaoAtribuida classificacaoPretendida
0  164346              SHREK  ...                  Livre                    None
1  164345            SHREK 2  ...                  Livre                    None
2  164344  SHREK PARA SEMPRE  ...                  Livre                    None
3  164343     SHREK TERCEIRO  ...                  Livre                    None
4  146845            SHREK 2  ...                  Livre                    None
5  146844     SHREK TERCEIRO  ...                  Livre                    None
6  135770              SHREK  ...                  Livre                    None
7  135769            SHREK 2  ...                  Livre                    None
8  135768  SHREK PARA SEMPRE  ...                  Livre                    None
9  135767     SHREK TERCEIRO  ...                  Livre                    None

[10 rows x 8 columns]

It looks like there is some pagination work left to do, but I'll leave that to the OP.
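
As a starting point for that pagination, here is a minimal sketch. It rests on assumptions I have not verified against the API: that currentPage is zero-based, that incrementing it returns the next page, and that a page with fewer than pageSize items is the last one. The helper iter_all_items and the fake_pages stand-in are hypothetical names introduced only for illustration; the stand-in replaces the real API so the sketch runs offline.

```python
from typing import Callable, Iterator


def iter_all_items(
    fetch_page: Callable[[int], dict],
    page_size: int = 10,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield every item across pages until a short or empty page appears.

    ASSUMPTIONS (not confirmed by the API): currentPage is zero-based,
    and a page with fewer than page_size items is the last one.
    """
    for page in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        body = fetch_page(page)
        items = body.get("itens") or []  # treat a missing or None "itens" as empty
        yield from items
        if len(items) < page_size:
            break


# Offline stand-in for the real API: one full page, then a short one.
fake_pages = {
    0: {"itens": [{"id": i} for i in range(10)]},
    1: {"itens": [{"id": i} for i in range(10, 14)]},
}
all_items = list(iter_all_items(lambda page: fake_pages.get(page, {"itens": None})))
print(len(all_items))  # 14
```

Against the real API, fetch_page could be something like lambda page: session.post(movies_url, json={**json_data, "currentPage": page}, headers=headers).json(), called inside the session block above. If the response populates totalResults, that would probably be a more reliable stopping condition than the short-page check.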
