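I've been developing a Python script that downloads images from various URLs and uploads them to AWS S3. The script works fine for several domains, but I get a timeout error when trying to download an image from one specific URL (https://www.net-a-porter.com/variants/images/17266703523615883/in/w920_a3-4_q60.jpg).
I tried troubleshooting by increasing the timeout and adding headers, but the problem persists.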
import requests
import tempfile
import os

def upload_image_to_s3_from_url(self, image_url, filename, download_timeout=120):
    """
    Downloads an image from the given URL to a temporary file and uploads it to AWS S3,
    then returns the S3 file URL.
    """
    try:
        # Browser-like headers, added while troubleshooting the timeout.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
            'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8'
        }
        response = requests.get(image_url, timeout=download_timeout, stream=True, headers=headers)
        response.raise_for_status()
        # Stream the body to a temporary file in 8 KiB chunks.
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)
        # The file is closed (and flushed) here, so it is safe to upload.
        file_url = self.upload_image_to_s3(tmp_file.name, filename)
        os.unlink(tmp_file.name)
        return file_url
    except requests.RequestException as e:
        raise Exception(f"Failed to download or upload image. Error: {e}")
Error encountered: Exception: Failed to download or upload image. Error: HTTPSConnectionPool(host='www.net-a-porter.com', port=443): Read timed out. (read timeout=60)
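What I've tried:
- Increasing DOWNLOAD_TIMEOUT to a higher value
- Modifying the request headers to mimic a real browser session
This approach works for images from other domains, but not for the URL mentioned above.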
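To narrow down where the request stalls, here is a minimal diagnostic sketch. It assumes the `(connect, read)` timeout tuple accepted by `requests.get`, and `diagnose_fetch` is a hypothetical helper that is not part of my script. The idea is only to tell a connection timeout apart from a read timeout or an HTTP error; it does not fix anything on its own.

```python
import requests


def diagnose_fetch(url, connect_timeout=5, read_timeout=30, headers=None):
    """Hypothetical helper: return a short label describing how the request ended."""
    try:
        # A (connect, read) tuple sets the two timeouts separately.
        response = requests.get(
            url,
            timeout=(connect_timeout, read_timeout),
            stream=True,
            headers=headers,
        )
        response.raise_for_status()
        return f"ok: HTTP {response.status_code}"
    except requests.exceptions.ConnectTimeout:
        # The TCP/TLS connection was not established in time.
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent no data within read_timeout seconds.
        return "read timeout"
    except requests.exceptions.HTTPError as e:
        # raise_for_status() saw a 4xx/5xx response.
        return f"http error: {e.response.status_code}"
    except requests.exceptions.RequestException as e:
        return f"other: {type(e).__name__}"
```

Calling `diagnose_fetch(image_url, headers=headers)` with the same headers as above should show whether the failure is the same read timeout reported in the error message or something that happens earlier.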
Any insights or suggestions would be greatly appreciated. Thanks in advance for your help!