Scraping data with Python and requests_html and exporting it to an Excel file

Posted on February 7

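I wrote some code that scrapes data from a website. It runs fine, but I would like to export the results to an Excel file.

I'm new to Python, so I don't know what I should do.

I thought of pandas, but my output is a printed string built with join, so I haven't found a good solution.

Here is my code: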

from requests_html import HTMLSession
import pandas as pd
from tabulate import tabulate
 
matchlink = 'https://www.betexplorer.com/football/serbia/prva-liga/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)

allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')

odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    
    idx += 3

Thanks for your help.

Recommended answer

from typing import Generator

from requests_html import HTMLSession
import pandas as pd

matchlink = "https://www.betexplorer.com/football/serbia/prva-liga/results/"


def _get_rows(url: str) -> Generator[dict[str, str], None, None]:
    session = HTMLSession()
    r = session.get(url)

    allmatch = r.html.find(".in-match")
    results = r.html.find(".h-text-center a")
    # search for elements containing "data-odd" attribute
    matchodds = r.html.find("[data-odd]")

    odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

    idx = 0
    for match, res in zip(allmatch, results):
        # skip postponed matches; idx is not advanced for them
        if res.text == "POSTP.":
            continue

        # yield one row per match instead of printing it
        yield {
            "match": match.text,
            "result": res.text,
            "odds": ", ".join(odds[idx : idx + 3]),
        }

        idx += 3


if __name__ == "__main__":
    df = pd.DataFrame(_get_rows(matchlink)).set_index("match")
    print(df)

Output:

                                     result              odds
match
Dubocica - FK Indjija                   2:1  2.18, 2.93, 3.31
Mladost GAT - Smederevo                 1:1  1.63, 3.37, 5.17
Graficar Beograd - RFK Novi Sad         2:1  1.41, 4.31, 6.28
Tekstilac Odzaci - Radnicki Beograd     5:0  1.53, 3.79, 5.49
FK Indjija - Vrsac                      2:1  1.72, 3.16, 4.90
...                                     ...               ...
Jedinstvo U. - RFK Novi Sad             4:0  1.45, 4.42, 5.59
Metalac - Graficar Beograd              1:3  2.17, 3.14, 3.11
Sloboda - OFK Beograd                   0:2  1.87, 3.15, 4.02
Smederevo - FK Indjija                  2:0  2.76, 2.83, 2.59
Vrsac - Kolubara                        1:0  2.73, 2.92, 2.57

[160 rows x 2 columns]
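The answer's code stops at printing the DataFrame, while the question asks for an Excel file. A minimal sketch of that last step, using pandas' `DataFrame.to_excel`, might look like the following. The helper names `rows_to_frame` and `export_to_excel`, the column labels `odd_1`, `odd_x`, `odd_2`, the sheet name and the file name `results.xlsx` are assumptions made for illustration and are not part of the original answer. The two sample rows are taken from the answer's printed output so the sketch runs without a network connection.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from dicts with "match", "result" and "odds" keys."""
    df = pd.DataFrame(list(rows))
    # Split a string such as "2.18, 2.93, 3.31" into three numeric columns.
    # The labels odd_1, odd_x, odd_2 are hypothetical names for this sketch.
    odds = df["odds"].str.split(", ", expand=True).astype(float)
    odds.columns = ["odd_1", "odd_x", "odd_2"]
    return pd.concat([df[["match", "result"]], odds], axis=1)


def export_to_excel(df, path="results.xlsx"):
    """Write the frame to an .xlsx file (assumed file and sheet names)."""
    df.to_excel(path, index=False, sheet_name="results")


# Sample rows standing in for the scraped data.
sample_rows = [
    {"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"},
    {"match": "Mladost GAT - Smederevo", "result": "1:1", "odds": "1.63, 3.37, 5.17"},
]
frame = rows_to_frame(sample_rows)
# export_to_excel(frame)  # uncomment to write results.xlsx
```

With the scraper from the answer, the same helpers could be fed the generator directly, for example `export_to_excel(rows_to_frame(_get_rows(matchlink)))`. Writing `.xlsx` files requires an Excel engine such as openpyxl to be installed alongside pandas. Splitting the odds into separate numeric columns is optional, but it lets Excel sort and filter on them.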
