Python requests.get(url) returns empty content in Colab

Posted on March 12

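I'm scraping a website with requests, but even though response.status_code returns 200, neither response.text nor response.content contains the page content.

The same code works fine on other sites, and it also works fine in a local Jupyter environment, but for some reason I can't get past the firewall for the URL below when running in Colab.

Can you give me some advice?

Problem URL: https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1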

import requests
from bs4 import BeautifulSoup as bs

url = 'https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1'
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Whale/3.25.232.19 Safari/537.36'}
response = requests.get(url, headers=headers, data={'buscar':100000})
soup = bs(response.content, "html.parser")
soup

<br/>
<br/>
<center>
<h2>
The request / response that are contrary to the Web firewall security policies have been blocked.
</h2>
<table>
<tr>
<td>Detect time</td>
<td>2024-03-12 21:52:05</td>
</tr>
<tr>
<td>Detect client IP</td>
<td>35.236.245.49</td>
</tr>
<tr>
<td>Detect URL</td>
<td>https://gall.dcinside.com/board/view/</td>
</tr>
</table>
</center>
<br/>

I tried changing the User-Agent, switching HTTPS to HTTP, and other suggestions from similar questions, but none of them worked.
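Because the block page shown above comes back with status 200, checking response.status_code alone will not reveal the problem. The following is a minimal sketch of a body check, assuming the firewall always uses the wording seen in that output; the names WAF_MARKER and looks_blocked are hypothetical and are not part of requests or BeautifulSoup.

```python
# Marker text copied from the block page in the question.
# Assumption: the firewall keeps using this wording; other firewalls will differ.
WAF_MARKER = "contrary to the Web firewall security policies"


def looks_blocked(html: str) -> bool:
    """Return True if the response body looks like the firewall block page."""
    # Collapse whitespace so line breaks inside the HTML do not hide the marker.
    normalized = " ".join(html.split()).lower()
    return WAF_MARKER.lower() in normalized


blocked_sample = """
<h2>
The request / response that are contrary to the Web firewall security policies have been blocked.
</h2>
"""
normal_sample = "<html><body><p>An ordinary page</p></body></html>"

print(looks_blocked(blocked_sample))
print(looks_blocked(normal_sample))
```

In a scraper this check could be applied to response.text right after the request, so a blocked request is reported as a failure instead of being parsed as if it were the real page.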

Recommended answer

import requests
from bs4 import BeautifulSoup as bs
from urllib.parse import urlparse, parse_qs
import os

# Please add your proxy address and port to use the given proxy while making a request.
# Note: I'm using the scrapeops proxy here; you can also get a trial plan and replace the api_key with a valid key.
api_key = "0565b10e-c1b5-418c-b15d-02d4ebd5d6a2"
proxy_value = f"http://scrapeops:{api_key}@proxy.scrapeops.io:5353"
os.environ['HTTP_PROXY'] = proxy_value
os.environ['HTTPS_PROXY'] = proxy_value


def get_response_by_passing_headers(url):
    # Parse the query parameters from the URL so they can be passed to the request
    parsed_url = urlparse(url)
    query_params = parse_qs(parsed_url.query)
    params = {key: value[0] for key, value in query_params.items()}

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'Accept-Language': 'en-GB,en;q=0.9',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Pragma': 'no-cache',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
        'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Linux"',
    }

    # Make the request with all the headers and parameters.
    # verify=False disables TLS certificate verification.
    response = requests.get('https://gall.dcinside.com/board/view/', params=params, headers=headers, verify=False)
    return response


url = 'https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1'
response = get_response_by_passing_headers(url)
soup = bs(response.content, "html.parser")
print(soup)
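To make the query-parameter step in the answer easier to follow, here is a small standalone sketch that uses only the standard library and sends no request. It shows what urlparse and parse_qs produce for the problem URL; the base_url variable is an illustrative addition and does not appear in the answer's code.

```python
from urllib.parse import urlparse, parse_qs

url = 'https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1'

parsed_url = urlparse(url)

# parse_qs maps each key to a list of values, e.g. {'id': ['piano'], ...}
query_params = parse_qs(parsed_url.query)

# Keep only the first value for each key, as the answer does
params = {key: value[0] for key, value in query_params.items()}

# Illustrative addition: rebuild the URL without its query string
base_url = f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path}"

print(base_url)
print(params)
```

Passing the resulting dictionary as params= to requests.get lets requests rebuild the query string itself, which is what the answer relies on when it requests the bare /board/view/ path.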

