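I have an Excel spreadsheet with an image in each row, and this example works for extracting the images. However, rather than scraping the images from the spreadsheet, I want to extract the URL associated with each image. If I open the Excel file, I can click an image and be taken to the given URL. Is there no way to extract this URL with Python?
I have gone through the openpyxl documentation to see whether there are any examples of extracting embedded URLs from images, but I couldn't find anything helpful.
Any help would be appreciated. Thanks.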
As a good start, you can read/unzip the spreadsheet:
    import io
    import zipfile

    import pandas as pd

    # An .xlsx file is a zip archive; hyperlink targets of images are stored
    # in the relationship files under xl/drawings/_rels/.
    with zipfile.ZipFile("file.xlsx", "r") as zf:
        xmls = [zf.read(fn) for fn in zf.infolist()
                if fn.filename.startswith("xl/drawings/_rels/")]

    urls = (
        pd.concat([pd.read_xml(io.BytesIO(data)).assign(SheetNumber=i)
                   for i, data in enumerate(xmls, start=1)])
        .sort_values(by=["SheetNumber", "Id"])
        # Keep only external targets, i.e. the URLs
        .loc[lambda x: x["TargetMode"].eq("External"), ["SheetNumber", "Target"]]
        .reset_index(drop=True)
    )
Output:
    print(urls)

       SheetNumber                           Target
    0            1       https://stackoverflow.com/
    1            1   https://gis.stackexchange.com/
    2            2  https://meta.stackexchange.com/
    3            2           https://askubuntu.com/
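Note that `SheetNumber` simply counts the relationship files in archive order, which may not match the order of the worksheet tabs. Here is a minimal sketch, using only the standard library and assuming a standard (non-strict) .xlsx layout, of how each sheet name could be resolved to its drawing part. The helper names `read_rels` and `drawings_by_sheet` are made up for illustration and are not part of any library.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the transitional OOXML format (an assumption; strict files differ)
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def read_rels(zf, part):
    """Return {Id: (Type, target)} from the .rels file that belongs to `part`."""
    folder, base = posixpath.split(part)
    rels_part = posixpath.join(folder, "_rels", base + ".rels")
    if rels_part not in zf.namelist():
        return {}
    rels = {}
    for rel in ET.fromstring(zf.read(rels_part)).iter(REL):
        target = rel.get("Target")
        if rel.get("TargetMode") != "External":
            # Internal targets are relative to the folder of the owning part
            target = posixpath.normpath(posixpath.join(folder, target)).lstrip("/")
        rels[rel.get("Id")] = (rel.get("Type"), target)
    return rels


def drawings_by_sheet(xlsx):
    """Return {sheet name: drawing part} for every sheet that has a drawing."""
    result = {}
    with zipfile.ZipFile(xlsx) as zf:
        wb_rels = read_rels(zf, "xl/workbook.xml")
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
        for sheet in workbook.iter(MAIN + "sheet"):
            _, sheet_part = wb_rels[sheet.get(R_ID)]
            for rel_type, target in read_rels(zf, sheet_part).values():
                if rel_type.endswith("/drawing"):
                    result[sheet.get("name")] = target
    return result
```

Calling `drawings_by_sheet("file.xlsx")` should then give a dictionary keyed by sheet name, which could be matched against the rels file names to replace the positional `SheetNumber`.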
To go further, we can use openpyxl and the pandas Styler to put the images next to their URLs:
    import base64
    from collections import defaultdict

    from IPython.display import display
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter

    workbook = load_workbook("file.xlsx")

    images = defaultdict(list)
    for ws in workbook:
        # https://github.com/ultr4nerd/openpyxl-image-loader
        for image in ws._images:
            # Anchor indices are 0-based; Excel coordinates are 1-based
            row = image.anchor._from.row + 1
            col = get_column_letter(image.anchor._from.col + 1)
            images[ws.title].append({f"{col}{row}": image._data()})

    def tag_img(ser):
        # Embed the raw image bytes as a base64 <img> tag
        return (
            '<div style="display: flex; justify-content: center;">'
            '<img src="data:image/png;base64,{}" width="200" height="50"></div>'
        ).format(base64.b64encode(ser).decode("utf-8"))

    imgs = pd.concat(
        [pd.DataFrame(v).stack().apply(tag_img)
         .reset_index(level=1, name="Image")
         .assign(SheetName=k).rename(columns={"level_1": "CellCoord"})
         for k, v in images.items()], ignore_index=True
    )

    (
        imgs.join(urls)[["SheetNumber", "SheetName", "CellCoord", "Image", "Target"]].style
        .set_properties(**{"border": "1px solid",
                           "text-align": "center", "background-color": "white"})
        .format(hyperlinks="html").pipe(display)
    )
Spreadsheet used: file.xlsx
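One caveat: `imgs.join(urls)` pairs images and URLs by position, which only lines up when every image has a link and both lists happen to be in the same order. As a hedged alternative, the sketch below reads each drawing part directly and matches a picture's `hlinkClick` relationship id against the rels file, so each URL comes with the anchor cell of its own image. It assumes the transitional OOXML namespaces and pictures that sit directly in a cell anchor (grouped pictures are not handled). `image_links` and `column_letter` are hypothetical helper names.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def column_letter(index):
    """Convert a 0-based column index to an Excel column name (0 -> A, 26 -> AA)."""
    name = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def image_links(xlsx):
    """Return (drawing part, anchor cell, URL) for every hyperlinked picture."""
    found = []
    with zipfile.ZipFile(xlsx) as zf:
        names = set(zf.namelist())
        for part in sorted(names):
            folder, base = posixpath.split(part)
            if folder != "xl/drawings" or not base.endswith(".xml"):
                continue
            rels_part = f"{folder}/_rels/{base}.rels"
            if rels_part not in names:
                continue
            # Relationship Id -> Target, external targets (URLs) only
            targets = {
                rel.get("Id"): rel.get("Target")
                for rel in ET.fromstring(zf.read(rels_part)).iter(REL)
                if rel.get("TargetMode") == "External"
            }
            # Each child of the drawing root is one anchor (one- or two-cell)
            for anchor in ET.fromstring(zf.read(part)):
                start = anchor.find("xdr:from", NS)
                link = anchor.find("xdr:pic/xdr:nvPicPr/xdr:cNvPr/a:hlinkClick", NS)
                if start is None or link is None:
                    continue
                url = targets.get(link.get(R_ID))
                if url is None:
                    continue
                col = int(start.findtext("xdr:col", namespaces=NS))
                row = int(start.findtext("xdr:row", namespaces=NS))
                found.append((part, f"{column_letter(col)}{row + 1}", url))
    return found
```

The result of `image_links("file.xlsx")` could be loaded into a DataFrame and merged with `imgs` on the cell coordinate instead of relying on row order.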