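For the past few months I have been working on an export project. The application needs to fetch files from Blob Storage, combine them into a single file, compress that into a zip file, and upload it back to Blob Storage. I split the process into several steps. Performance is very good and the whole process works, but when I export a large number of files the last step crashes (my environment has only 15 GB of memory, and the files are larger than that). Any ideas?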

Here is a brief description of the last step, along with the code:

  1. Fetch all the relevant files from Blob Storage and store them in a dictionary that maps each file's path to its byte[]
public async Task<Dictionary<string, byte[]>> DownloadManyAsync(Guid exportId)
{
    var tasks = new Queue<Task>();
    var files = new ConcurrentDictionary<string, byte[]>();

    var container = _blobServiceClient.GetBlobContainerClient("");
    var blobs = container.GetBlobs(prefix: "");
    var options = BlobStorageTools.GetOptions();


    foreach (var blob in blobs)
    {
        tasks.Enqueue(DownloadAndEnlist(container.GetBlobClient(blob.Name), files, options, exportId));
    }

    await Task.WhenAll(tasks);

    return files.ToDictionary(x => x.Key,
                              x => x.Value, 
                              files.Comparer);
}

public async Task DownloadAndEnlist(BlobClient blob, ConcurrentDictionary<string, byte[]> files, StorageTransferOptions options, Guid exportId)
{
    using var memoryStream = new MemoryStream();

    await blob.DownloadToAsync(memoryStream, default, options);

    files.TryAdd(blob.Name, memoryStream.ToArray());
}


  2. Create a zip archive and write the bytes into it
using var memoryStream = new MemoryTributary();

using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
{
    for (int i = files.Count - 1; i >= 0; i--)
    {
        var file = files.ElementAt(i);

        var zipArchiveEntry = archive.CreateEntry(file.Key, CompressionLevel.Fastest);

        using var zipStream = zipArchiveEntry.Open();

        zipStream.Write(file.Value, 0, file.Value.Length);

        files.Remove(file.Key);
    }
}

  3. Save the zip file to Blob Storage
public async Task<string> SaveExport(string fileName, Stream file)
{
    var cloudBlockBlob = _blobClient.GetContainerReference("").GetBlockBlobReference($"{fileName}.zip");

    // Block IDs in read order: PutBlockList commits the blocks in list order.
    List<string> blockList = new();
    Queue<Task> tasks = new();

    int bytesRead;
    int blockNumber = 0;


    if (file.Position != 0) file.Position = 0;

    do
    {
        byte[] buffer = new byte[8000000];
        bytesRead = await file.ReadAsync(buffer);

        // Nothing left to upload (the stream length was an exact multiple of the block size).
        if (bytesRead == 0) break;

        blockNumber++;
        string blockId = $"{blockNumber:000000000}";
        string base64BlockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockId));

        blockList.Add(base64BlockId);

        // bytesRead is evaluated here, at call time, so a later read cannot change this block's length.
        tasks.Enqueue(cloudBlockBlob.PutBlockAsync(base64BlockId, new MemoryStream(buffer, 0, bytesRead), null));

    } while (bytesRead == 8000000);

    await Task.WhenAll(tasks);

    await cloudBlockBlob.PutBlockListAsync(blockList);

    return cloudBlockBlob.Uri.ToString();
}

I thought about using Azure Functions, but functions have a 15 GB memory limit, so I would run into the same problem.

Recommended answer

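This article helped me a lot. Basically, I used a blob stream to write the zip file directly to storage, so memory usage stays very low. I hope this helps future developers who run into the same problem.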

New code:

using (var zipFileStream = await _shareExportStorage.OpenZipFileStreamAsync(filename))
{
    using (var zipOutputStream = new ZipOutputStream(zipFileStream) {IsStreamOwner = false})
    {
        zipOutputStream.SetLevel(4);

        foreach (var file in await filesListTask)
        {
            var properties = await _exportFilesStorage.GetBlobProperties(file);

            var zipEntry = new ZipEntry(file)
            {
                Size = properties.ContentLength
            };

            zipOutputStream.PutNextEntry(zipEntry);
            await _exportFilesStorage.DownloadOneToStreamAsync(zipOutputStream, file);
            zipOutputStream.CloseEntry();
        }
    }
}

public async Task<Stream> OpenZipFileStreamAsync(string fileName)
{
    var zipBlobClient = new BlockBlobClient(_configuration["AzureBlobStorage:ConnectionString"], "", fileName);

    return await zipBlobClient.OpenWriteAsync(true, options: new BlockBlobOpenWriteOptions
    {
        HttpHeaders = new BlobHttpHeaders
        {
            ContentType = "application/zip"
        }
    });
}

public async Task DownloadOneToStreamAsync(Stream destination, string blobName)
{
    var container = _blobServiceClient.GetBlobContainerClient("");

    var blobClient = container.GetBlobClient(blobName);

    await blobClient.DownloadToAsync(destination, new BlobDownloadToOptions 
    { 
        TransferOptions = BlobStorageTools.GetOptions()
    });
}
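
For readers who would rather stay with the built-in System.IO.Compression types used in the question, the same streaming idea can be sketched without SharpZipLib. This is only a minimal sketch under assumptions: `StreamingZip` and `WriteAsync` are hypothetical names, not part of the code above, and the output is treated as a plain writable `Stream`. In the setup above that stream would presumably be the one returned by `OpenZipFileStreamAsync`, which I have not tested with `ZipArchive`. Each source is opened, copied into its entry, and closed before the next one starts, so only the copy buffer is held in memory at any time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper (not from the original answer): streams sources into a zip
// one entry at a time instead of buffering whole files in memory.
public static class StreamingZip
{
    public static async Task WriteAsync(
        Stream output,
        IEnumerable<(string Name, Func<Task<Stream>> OpenAsync)> sources)
    {
        // leaveOpen: true so the caller decides when to close (and commit) the output.
        using var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);

        foreach (var (name, openAsync) in sources)
        {
            var entry = archive.CreateEntry(name, CompressionLevel.Fastest);

            // Both streams are disposed at the end of each iteration, which closes
            // the entry before the next one is created (Create mode allows one open entry).
            using var entryStream = entry.Open();
            using var source = await openAsync();

            await source.CopyToAsync(entryStream);
        }
    }
}
```

With the helpers from the answer, each `OpenAsync` delegate could open a read stream on one blob, and `output` could be the blob write stream. Disposing that output stream after `WriteAsync` returns is what would finish the upload.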
