I am trying to fix this code, which scrapes the filings table from this company's page on the SEC EDGAR site.

Libraries:

# import our libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup

Definition of parameters for search:

# base URL for the SEC EDGAR browser
endpoint = r"https://www.sec.gov/cgi-bin/browse-edgar"

# define our parameters dictionary
param_dict = {'action':'getcompany',
              'CIK':'1265107',
              'type':'10-k',
              'dateb':'20190101',
              'owner':'exclude',
              'start':'',
              'output':'',
              'count':'100'}

# request the url, and then parse the response.
response = requests.get(url = endpoint, params = param_dict)
soup = BeautifulSoup(response.content, 'html.parser')

# Let the user know it was successful.
print('Request Successful')
print(response.url)

This is where the problem is: when I try to loop over the contents of the table, I get the error shown below, as if the table does not exist.

# find the document table with our data
doc_table = soup.find_all('table', class_='tableFile2')

# define a base url that will be used for link building.
base_url_sec = r"https://www.sec.gov"

master_list = []

# loop through each row in the table.
for row in doc_table[0].find_all('tr'):

The error:

(screenshot of the traceback; the original image was not preserved)

Here is the link to the site I am trying to scrape.

Thanks for your help.

Recommended answer

The table data is static, so you can scrape the table with pandas alone, without calling the API URL.

import pandas as pd
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
}
url = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=1265107&type=10-k&dateb=20190101&owner=exclude&start=&output=&count=100'

# EDGAR blocks requests that lack a User-Agent header, so send one explicitly.
req = requests.get(url, headers=headers).text

# The filings table is the third <table> on the page.
df = pd.read_html(req)[2]
print(df)

Output:

Filings                      Format  ... Filing Date     File/Film Number
0    10-K  Documents Interactive Data  ...  2018-03-07   333-11002518671437   
1    10-K  Documents Interactive Data  ...  2017-03-13   333-11002517683575   
2    10-K  Documents Interactive Data  ...  2016-03-08  333-110025161489854   
3    10-K  Documents Interactive Data  ...  2015-03-06   333-11002515681017   
4    10-K  Documents Interactive Data  ...  2014-03-04   333-11002514664345   
5    10-K  Documents Interactive Data  ...  2013-03-01   333-11002513655933   
6    10-K                   Documents  ...  2006-09-20  333-110025061099734   
7    10-K                   Documents  ...  2005-09-23  333-110025051099353   

[8 rows x 5 columns]
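Note that the hard-coded index `[2]` breaks if the page layout ever changes. As a sketch (using a made-up HTML snippet in place of the live EDGAR response), `read_html` can instead select the table by its class attribute via the `attrs` parameter:

```python
import io
import pandas as pd

# Minimal HTML standing in for the EDGAR page (hypothetical data), to show
# selecting a table by class rather than by positional index.
html = """
<table><tr><td>sidebar</td></tr></table>
<table class="tableFile2">
  <tr><th>Filings</th><th>Filing Date</th></tr>
  <tr><td>10-K</td><td>2018-03-07</td></tr>
</table>
"""

# attrs filters candidate tables by their HTML attributes; only the
# tableFile2 table is parsed, regardless of its position on the page.
df = pd.read_html(io.StringIO(html), attrs={"class": "tableFile2"})[0]
print(df)
```

This way the sidebar and navigation tables never shift your index.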

Alternatively, your code also works fine; just inject a User-Agent as a header.

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
}
endpoint = r"https://www.sec.gov/cgi-bin/browse-edgar"

# define our parameters dictionary
param_dict = {'action':'getcompany',
              'CIK':'1265107',
              'type':'10-k',
              'dateb':'20190101',
              'owner':'exclude',
              'start':'',
              'output':'',
              'count':'100'}

# request the url, and then parse the response.
response = requests.get(url=endpoint, params=param_dict, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Let the user know it was successful.
print('Request Successful')
print(response.url)


doc_table = soup.find_all('table', class_='tableFile2')

# define a base url that will be used for link building.
base_url_sec = r"https://www.sec.gov"

master_list = []

# loop through each row in the table.
for row in doc_table[0].find_all('tr'):
    print(list(row.stripped_strings))

Output:

['Filings', 'Format', 'Description', 'Filing Date', 'File/Film Number']
['10-K', 'Documents', 'Interactive Data', 'Annual report [Section 13 and 15(d), not S-K Item 405]', 'Acc-no: 0001265107-18-000013\xa0(34 Act)\xa0 Size: 11 MB', '2018-03-07', '333-110025', 
'18671437']
['10-K', 'Documents', 'Interactive Data', 'Annual report [Section 13 and 15(d), not S-K Item 405]', 'Acc-no: 0001265107-17-000007\xa0(34 Act)\xa0 Size: 11 MB', '2017-03-13', '333-110025', 
'17683575']
['10-K', 'Documents', 'Interactive Data', 'Annual report [Section 13 and 15(d), not S-K Item 405]', 'Acc-no: 0001265107-16-000052\xa0(34 Act)\xa0 Size: 9 MB', '2016-03-08', '333-110025', '161489854']
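The code defines `base_url_sec` but never uses it; it is presumably meant for link building. A minimal sketch of that step, using a hypothetical table row and href modeled on the EDGAR markup (not taken from the live page):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url_sec = "https://www.sec.gov"

# One hypothetical row, shaped like the rows in the tableFile2 table;
# the real "Documents" anchors carry site-relative hrefs.
row_html = '<tr><td>10-K</td><td><a href="/Archives/edgar/data/1265107/index.json">Documents</a></td></tr>'
row = BeautifulSoup(row_html, "html.parser")

# urljoin resolves each relative href against the base to an absolute URL.
links = [urljoin(base_url_sec, a["href"]) for a in row.find_all("a")]
print(links)
```

Inside the `for row in doc_table[0].find_all('tr'):` loop, the same `urljoin` call would turn each row's document links into fetchable URLs.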
