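I developed a function that converts an xlsb file to xlsx and tested it successfully on my local machine. Once I deployed it and ran it in the Azure portal, it failed with:

Result: Failure Exception: OSError: [Errno 30] Read-only file system: 'TEST.xlsx'

From what I have read, Python Azure Functions run on Linux, so files can only be saved to the temp directory. I modified my function to use the temp directory, but then I got a new error:

Result: Failure Exception: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/TEST.xlsb'

Any suggestions on how I can achieve this: a blob-triggered Azure Function that converts an xlsb file (in a blob container) to xlsx and saves the result to a blob container? Below are my first attempt and the follow-up change that uses the temp directory.

First attempt: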

import os
import logging
import pandas as pd
#from io import BytesIO
import azure.functions as func
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
 
app = func.FunctionApp()
 
@app.blob_trigger(arg_name="myblob", path="{containerName}/{name}.xlsb",
                               connection="BlobStorageConnectionString")
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                f"Name: {myblob.name}"
                f"Blob Size: {myblob.length} bytes")
   
    accountName = "name"
    accountKey = "key"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"
   
    containerName = "{containerName}"
    inputBlobname = myblob.name.replace({containerName}, "")
    outputBlobname = inputBlobname.replace(".xlsb", ".xlsx")
 
    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)
    blob_client = container_client.get_blob_client(inputBlobname)
    blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=containerName, blob_name=outputBlobname)
 
    df = pd.read_excel(blob_client.download_blob().readall(), engine="pyxlsb")
 
    df.to_excel(outputBlobname, index=False)
 
    with open(outputBlobname, "rb") as data:
        blob.upload_blob(data, overwrite=True)

Second attempt, writing to the temp directory:

import os
import logging
import pandas as pd
#from io import BytesIO
import azure.functions as func
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient

app = func.FunctionApp()

@app.blob_trigger(arg_name="myblob", path="{containerName}/{name}.xlsb",
                               connection="BlobStorageConnectionString") 
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                f"Name: {myblob.name}"
                f"Blob Size: {myblob.length} bytes")
    
    accountName = "name"
    accountKey = "key"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"
    
    containerName = "{containerName}"
    inputBlobname = myblob.name.replace({containerName}, "")
    localBlobname = "/tmp/" + inputBlobname
    outputBlobname = inputBlobname.replace(".xlsb", ".xlsx")

    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)
    blob_client = container_client.get_blob_client(inputBlobname)
    blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=containerName, blob_name=outputBlobname)

    df = pd.read_excel(blob_client.download_blob().readall(), engine="pyxlsb")

    df.to_excel("/tmp/" + outputBlobname, index=False)
    ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

    with open(file = os.path.join(ROOT_DIR, localBlobname), mode="rb") as data:
        blob.upload_blob(data, overwrite=True)

Recommended answer
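
Both errors in the question come from local file paths. The first attempt writes 'TEST.xlsx' relative to the working directory, which the OSError reports as read-only in the deployed app. The second attempt writes the converted workbook to /tmp/TEST.xlsx but then opens localBlobname, which still ends in .xlsb; because that path is absolute, os.path.join also discards ROOT_DIR. The path arithmetic can be sketched as follows, using posixpath so it behaves the same on any OS. The root_dir value is a made-up stand-in for illustration, and the blob name is the one from the error message:

```python
import posixpath

input_blob_name = "TEST.xlsb"  # name taken from the error message in the question
local_blob_name = "/tmp/" + input_blob_name
output_blob_name = input_blob_name.replace(".xlsb", ".xlsx")

# Path that df.to_excel() wrote to in the second attempt.
written_path = "/tmp/" + output_blob_name

# Path that open() received. root_dir is a hypothetical value; it is dropped
# anyway because the second argument to join() is an absolute path.
root_dir = "/home/site/wwwroot"
opened_path = posixpath.join(root_dir, local_blob_name)

print(written_path)  # /tmp/TEST.xlsx
print(opened_path)   # /tmp/TEST.xlsb
```

The file that was written and the file that was opened are different, which matches the FileNotFoundError for '/tmp/TEST.xlsb'.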

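The function below reads the triggering blob from the input stream, converts it in memory with BytesIO, and uploads the result, so nothing is written to the function app's file system.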

Code :

import logging
import pandas as pd  # reading .xlsb requires pyxlsb; writing .xlsx requires openpyxl or XlsxWriter
import azure.functions as func
from azure.storage.blob import BlobServiceClient
from io import BytesIO

app = func.FunctionApp()

# "connection" names the app setting that holds the storage connection string.
@app.blob_trigger(arg_name="myblob", path="<container_name>/<file_name>.xlsb",
                   connection="kamblobstr_STORAGE")
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                 f"Blob Size: {myblob.length} bytes")

    accountName = "<storage_name>"
    accountKey = "<storage_key>"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"

    containerName = "<container_name>"
    outputBlobname = "<file_name>.xlsx"

    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)

    # Read the triggering blob straight from the input stream; no local file is needed.
    input_data = myblob.read()
    df = pd.read_excel(BytesIO(input_data), engine="pyxlsb")

    # Write the workbook to an in-memory buffer instead of the read-only file system.
    output_data = BytesIO()
    df.to_excel(output_data, index=False)

    # Upload the buffer contents as the .xlsx blob.
    blob_client = container_client.get_blob_client(outputBlobname)
    blob_client.upload_blob(output_data.getvalue(), overwrite=True)
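
The function above hardcodes outputBlobname. One possible way to derive it from the triggering blob instead is sketched below. It assumes myblob.name has the form <container>/<blob path>, as the "kamcontainer/kamb.xlsb" entry in the log output suggests, and split_blob_name is a hypothetical helper, not part of the Azure SDK:

```python
from pathlib import PurePosixPath


def split_blob_name(full_name: str) -> tuple[str, str]:
    """Split 'container/path/file.xlsb' into (container, 'path/file.xlsx')."""
    # Everything before the first slash is treated as the container name.
    container, _, blob_path = full_name.partition("/")
    # Swap only the final extension, leaving any virtual folders intact.
    output_name = str(PurePosixPath(blob_path).with_suffix(".xlsx"))
    return container, output_name


print(split_blob_name("kamcontainer/kamb.xlsb"))  # ('kamcontainer', 'kamb.xlsx')
```

Combined with a trigger path pattern such as the {name}.xlsb one used in the question, this would let a single function handle any uploaded .xlsb file rather than one fixed file name.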

local.settings.json :

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage_connec_string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    "kamblobstr_STORAGE": "<storage_connec_string>"
  }
}
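
Since kamblobstr_STORAGE already holds the storage connection string, the function could read it from the environment rather than rebuilding it from a hardcoded account key. Azure Functions exposes application settings, and the Values section of local.settings.json during local runs, as environment variables. A minimal sketch, in which get_connection_string is a hypothetical helper:

```python
import os


def get_connection_string(setting_name: str = "kamblobstr_STORAGE") -> str:
    """Read a storage connection string from the app settings (environment)."""
    value = os.environ.get(setting_name)
    if not value:
        # Fail early with a clear message instead of a confusing SDK error later.
        raise RuntimeError(f"App setting {setting_name!r} is not configured")
    return value
```

Inside the function this would be used as `BlobServiceClient.from_connection_string(get_connection_string())`, which keeps the account key out of the source code.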

Output :

I ran the blob trigger function locally and uploaded the kamb.xlsb file to the Azure Blob Storage container.

The run output shows that the blob kamb.xlsb was converted to kamb.xlsx, as shown below:

 *  Executing task: .venv\Scripts\activate && func host start 

Found Python version 3.10.11 (py).

Azure Functions Core Tools
Core Tools Version:       4.0.5030 Commit hash: N/A  (64-bit)
Function Runtime Version: 4.15.2.20177

[2024-03-17T04:00:10.684Z] Host lock lease acquired by instance ID '000000xxxxxxxxxxxx'.
[2024-03-17T04:00:22.921Z] Worker process started and initialized.

Functions:

        blob_trigger: blobTrigger

For detailed output, run func with --verbose flag.
[2024-03-17T04:00:43.865Z] Executing 'Functions.blob_trigger' (Reason='New blob detected(LogsAndContainerScan): kamcontainer/kamb.xlsb', Id=4c9d45e5xxxxxxxxxxxxxxxx)
[2024-03-17T04:00:43.870Z] Trigger Details: MessageId: a1416e18xxxxxxxxxxxxxx, DequeueCount: 1, InsertedOn: 2024-03-17T04:00:43.000+00:00, BlobCreated: 2024-03-17T04:00:39.000+00:00, BlobLastModified: 2024-03-17T04:00:39.000+00:00
[2024-03-17T04:00:44.005Z] Python blob trigger function processed blobBlob Size: None bytes
[2024-03-17T04:00:47.081Z] Request URL: 'https://kamblobstr.blob.core.windows.net/kamcontainer/kamb.xlsx'
Request method: 'PUT'
Request headers:
    'Content-Length': '4976'
    'x-ms-blob-type': 'REDACTED'
    'x-ms-version': 'REDACTED'
    'Content-Type': 'application/octet-stream'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.19.1 Python/3.10.11 (Windows-10-10.0.22631-SP0)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': 'ef51c49exxxxxxxxxxxxxxx'
    'Authorization': 'REDACTED'
A body is sent with the request
[2024-03-17T04:00:49.113Z] Response status: 201
Response headers:
    'Content-Length': '0'
    'Content-MD5': 'REDACTED'
    'Last-Modified': 'Sun, 17 Mar 2024 04:00:49 GMT'
    'ETag': '"0x8DC4636D53FDEE9"'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': '94236a36xxxxxxxxxxxxxxxxx'
    'x-ms-client-request-id': 'ef51c49exxxxxxxxxxxxxxxxxx'
    'x-ms-version': 'REDACTED'
    'x-ms-content-crc64': 'REDACTED'
    'x-ms-request-server-encrypted': 'REDACTED'
    'Date': 'Sun, 17 Mar 2024 04:00:49 GMT'
[2024-03-17T04:00:49.167Z] Executed 'Functions.blob_trigger' (Succeeded, Id=4c9d45e5xxxxxxxxxxxxxxx, Duration=6129ms)


After that, I successfully deployed the project to the Azure Function app.

I uploaded the kamb.xlsb file to the Azure Blob Storage container, and the function ran successfully in the function app on the Azure portal.

The blob kamb.xlsb was converted to kamb.xlsx in the storage container.

[Screenshot: kamb.xlsx data]

[Screenshot: Function app Monitor logs]
