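I want to send a POST request to SIGAA UnB that does the same thing as selecting a department, year and level (graduação) in the form and pressing Search.

I used the Firefox (Mozilla) Web Inspector to look at the details of that POST request. I copied its headers and body, extracted the ViewState ID and the session cookie, and saved the HTML response to a file. Every time I try this, I only get the default page, as if the POST had no effect. Can someone explain what I am missing?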

from bs4 import BeautifulSoup as BS4 
import requests

page = requests.Session()

# GET Method to search for Cookies and IDs
answer = page.get("https://sigaa.unb.br/sigaa/public/turmas/listar.jsf?aba=p-ensino")
my_cookie = answer.cookies
j_id = BS4(answer.text, "html.parser").find(id="javax.faces.ViewState")["value"]

post_header = {
    "Host": "sigaa.unb.br",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "pt-BR,pt;q=0.8,en-US;q=0.5,en;q=0.3",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": "194",
    "Origin": "https://sigaa.unb.br",
    "Connection": "keep-alive",
    "Referer": "https://sigaa.unb.br/sigaa/public/turmas/listar.jsf?aba=p-ensino",
    "Cookie": str(my_cookie["JSESSIONID"]),
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
}

post_form = {
    "formTurma": "formTurma",
    "formTurma:inputNivel" :    "G",
    "formTurma:inputDepto" :    "673",
    "formTurma:inputAno" :  "2023",
    "formTurma:inputPeriodo" :  "1",
    "formTurma:j_id_jsp_1370969402_11" :    "Buscar",
    "javax.faces.ViewState": str(j_id),
}

answer = page.post("https://sigaa.unb.br/sigaa/public/turmas/listar.jsf?aba=p-ensino", data=post_form, headers=post_header, cookies=my_cookie)
with open("sigaa.html", "w", encoding="utf-8") as profile_file:
    profile_file.write(answer.text)
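One plausible cause, which the answer below avoids by not setting any headers by hand: `my_cookie["JSESSIONID"]` returns only the cookie's value, so the hand-written `Cookie` header is sent as something like `ABC123` instead of `JSESSIONID=ABC123`. When a `Cookie` header is set explicitly, requests does not add the session's cookies on top of it, so the server probably never sees the session that the ViewState belongs to and falls back to the default page. The sketch below reproduces this offline with a made-up cookie value (`ABC123`) and a dummy form body (`{"a": "b"}`); it only prepares the requests and sends nothing. It also shows that the hard-coded `Content-Length: 194` is harmless, because requests recomputes it from the actual body.

```python
import requests

URL = "https://sigaa.unb.br/sigaa/public/turmas/listar.jsf"

session = requests.Session()
# Pretend the first GET set this cookie (the value is made up for the demo).
session.cookies.set("JSESSIONID", "ABC123")

# As in the question: a hand-written Cookie header holding only the value,
# plus a hard-coded Content-Length.
manual = session.prepare_request(requests.Request(
    "POST", URL, data={"a": "b"},
    headers={"Cookie": str(session.cookies["JSESSIONID"]),
             "Content-Length": "194"},
))

# Letting the session attach its own cookies instead.
automatic = session.prepare_request(requests.Request("POST", URL, data={"a": "b"}))

print(manual.headers["Cookie"])          # ABC123 (cookie name is missing)
print(manual.headers["Content-Length"])  # 3 (recomputed from the body "a=b")
print(automatic.headers["Cookie"])       # JSESSIONID=ABC123
```

In practice, dropping the `Cookie` (and `Content-Length`) entries from `post_header` and relying on the `Session` object should be enough for the cookie part.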

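Recommended answer

To get a correct response from the server, send the current javax.faces.ViewState value, which changes over time, and do the GET and the POST in the same session: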

import io

import requests
import pandas as pd
from bs4 import BeautifulSoup

data = {
    "formTurma": "formTurma",
    "formTurma:inputNivel": "G",
    "formTurma:inputDepto": "643",
    "formTurma:inputAno": "2022",
    "formTurma:inputPeriodo": "2",
    "formTurma:j_id_jsp_1370969402_11": "Buscar",
    "javax.faces.ViewState": ""
}

api_url = 'https://sigaa.unb.br/sigaa/public/turmas/listar.jsf'

with requests.Session() as s:
    # First GET: the session stores JSESSIONID and we read the current ViewState
    soup = BeautifulSoup(s.get(api_url).text, 'html.parser')

    state = soup.select_one('[id="javax.faces.ViewState"]')['value']
    data['javax.faces.ViewState'] = state

    # POST in the same session so the ViewState matches the session cookie.
    # The HTML is wrapped in StringIO because newer pandas deprecates
    # passing literal HTML strings to read_html.
    df = pd.read_html(io.StringIO(s.post(api_url, data=data).text))[1][1:-1]
    print(df)

Prints:

  Código Ano-Período                               Docente                                                       Horário Horário.1 Qtde Vagas Ofertadas Qtde Vagas Ocupadas                 Local
1     01      2022.2    TANIA CRISTINA DA SILVA CRUZ (30h)  6T1234 (25/10/2022 - 18/02/2023)  Sexta-feira 12:55 às 16:55       NaN                   40                   0  CDT - Sala Interação
2     01      2022.2  JONATHAS FELIPE AIRES FERREIRA (30h)  6T1234 (25/10/2022 - 18/02/2023)  Sexta-feira 12:55 às 16:55       NaN                   40                   0  CDT - Sala Interação
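The same flow can be split into small functions so that department, year, period and level become parameters, matching the "select a department, year, level and press Search" goal from the question. This is a sketch under assumptions: the function names `extract_view_state`, `build_search_form` and `search_classes` are hypothetical, and the button field `formTurma:j_id_jsp_1370969402_11` is an auto-generated JSF id copied from the captured request, so it may change if SIGAA redeploys the page. The table index `[1]` and the `[1:-1]` slice are taken from the answer above.

```python
import io

import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://sigaa.unb.br/sigaa/public/turmas/listar.jsf"


def extract_view_state(html: str) -> str:
    """Return the javax.faces.ViewState token embedded in a JSF page."""
    soup = BeautifulSoup(html, "html.parser")
    field = soup.select_one('[id="javax.faces.ViewState"]')
    if field is None:
        raise ValueError("javax.faces.ViewState not found in page")
    return field["value"]


def build_search_form(view_state: str, dept: str, year: str, period: str,
                      level: str = "G") -> dict:
    """Assemble the form fields seen in the browser's POST request."""
    return {
        "formTurma": "formTurma",
        "formTurma:inputNivel": level,
        "formTurma:inputDepto": dept,
        "formTurma:inputAno": year,
        "formTurma:inputPeriodo": period,
        # Hypothetical stability: auto-generated JSF id, may change over time.
        "formTurma:j_id_jsp_1370969402_11": "Buscar",
        "javax.faces.ViewState": view_state,
    }


def search_classes(dept: str, year: str, period: str,
                   level: str = "G") -> pd.DataFrame:
    """GET the page, then POST the search within the same session."""
    with requests.Session() as s:
        view_state = extract_view_state(s.get(URL).text)
        form = build_search_form(view_state, dept, year, period, level)
        html = s.post(URL, data=form).text
    return pd.read_html(io.StringIO(html))[1][1:-1]
```

Usage would look like `print(search_classes("643", "2022", "2"))`. Keeping parsing and form building as pure functions means they can be checked against a saved HTML file without hitting the server.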
