I am trying to scrape the following page: http://www.nasdaqtrader.com/Trader.aspx?id=archiveheadlines&cat_id=105

I am using the following code:

import requests
from bs4 import BeautifulSoup

headers = {"Referer": "http://www.nasdaqtrader.com/Trader.aspx?id=archiveheadlines&cat_id=105"}
data = {"id":2,"cat_id": 105, "method":"BL_NewsListing.GetCurrentHeadlinesPageData","params":"[]","version":"1.1"}
url = "https://www.nasdaqtrader.com/RPCHandler.axd"
req = requests.post(url, json=data, headers=headers)
result = req.json()['result']
soup = BeautifulSoup(result, 'html.parser')
table = soup.find('table')

print(table)

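As a result, I receive the data from this page instead: http://www.nasdaqtrader.com/Trader.aspx?id=currentheadlines. That differs slightly from the target page. Since it does not show all of the data from the target page, I am looking for a solution.

I tried changing the method from BL_NewsListing.GetCurrentHeadlinesPageData to BL_NewsListing.GetArchiveHeadlinesPageData, but that did not work for me at all.

Any ideas? Thanks in advance.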

Recommended answer

Try:

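from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    "Referer": "http://www.nasdaqtrader.com/Trader.aspx?id=archiveheadlines&cat_id=105"
}

# The archive method takes its arguments in "params" as a JSON-encoded string.
# Here they appear to be the category ID ("105"), the year (2023), and a third value of 0.
data = {
    "id": 2,
    "method": "BL_NewsListing.GetArchiveHeadlinesPageData",
    "params": "[\"105\",2023,0]",
    "version": "1.1"
}

url = "https://www.nasdaqtrader.com/RPCHandler.axd"
req = requests.post(url, json=data, headers=headers)
result = req.json()["result"]  # HTML fragment that contains the headlines table
soup = BeautifulSoup(result, "html.parser")
table = soup.find("table")

# Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings.
df = pd.read_html(StringIO(str(table)))[0]
print(df)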

Prints:

             Date          Market    Alert #                                                                                                                                                         Headline
0    Mar 31, 2023          NASDAQ  #2023-196                                                                                             Information regarding the redemption of TCV Acquisition Corp. (TCVA)
1    Mar 31, 2023          NASDAQ  #2023-195                                                   Information Regarding the Reverse Stock Split and CUSIP Number Change for China Natural Resources, Inc. (CHNR)
2    Mar 30, 2023          NASDAQ  #2023-194       Information Regarding the Business Combination of DiamondHead Holdings Corp. (DHHC/W/U) and Great Southern Homes, Inc. (Re-named United Homes Group, Inc.)
3    Mar 30, 2023          NASDAQ  #2023-193                                                  Information Regarding the Reverse Stock Split, Name, Symbol & CUSIP Number Change for Helbiz, Inc. (HLBZ/HLBZW)
4    Mar 30, 2023          NASDAQ  #2023-192                                                       Information Regarding the Reverse Stock Split and CUSIP Number Change for Ensysce Biosciences, Inc. (ENSC)
5    Mar 29, 2023          NASDAQ  #2023-191                                                                        Information regarding the redemption of Aries I Acquisition Corporation (RAM/RAMMU/RAMMW)
6    Mar 29, 2023          NASDAQ  #2023-190                                                  Information Regarding the Business Combination of Maxpro Capital Acquisition Corp. (JMAC/W/U) & Apollomics Inc.
7    Mar 29, 2023          NASDAQ  #2023-189                                                         Information Regarding the Reverse Stock Split , Ratio and CUSIP Number Changes for Biophytis S.A. (BPTS)
8    Mar 28, 2023          NASDAQ  #2023-188                                                                 (UPDATED: Merger Effective) Information Regarding the Merger of AgroFresh Solutions, Inc. (AGFS)

...
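
The key difference from the question's attempt is that the archive method is called with a non-empty "params" string. As a minimal sketch of how that string could be generated rather than typed by hand, the helper below builds the same request body with json.dumps. The function name build_archive_payload and its argument names are hypothetical, and the interpretation of the first two values as category ID and year is an assumption based on the URL and the dates in the output above. The third value is passed through unchanged because the answer does not say what it means.

```python
import json

# Method name taken from the answer above.
ARCHIVE_METHOD = "BL_NewsListing.GetArchiveHeadlinesPageData"


def build_archive_payload(cat_id, year, third_arg=0, request_id=2):
    """Build the request body used in the answer (hypothetical helper).

    The argument order (category ID as a string, year, third value) is
    copied from the answer; what the third value means is not stated there.
    """
    # Compact separators reproduce the exact string from the answer.
    params = json.dumps([str(cat_id), year, third_arg], separators=(",", ":"))
    return {
        "id": request_id,
        "method": ARCHIVE_METHOD,
        "params": params,
        "version": "1.1",
    }


# Same payload as in the answer.
payload = build_archive_payload(105, 2023)
print(payload["params"])

# Assumption: other years are requested the same way; this is untested here.
payloads_by_year = {year: build_archive_payload(105, year) for year in (2021, 2022, 2023)}
```

Each payload can then be sent exactly as in the answer, with requests.post(url, json=payload, headers=headers). Whether the endpoint accepts other categories or years in this form is not confirmed by the source.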
