Combine and Zip Multiple Large Files From Azure Blob Storage - c#

I have some code running behind an API that loops through a list of files in Azure Blob Storage, zips them up, and saves the final zip to the same storage account. I then provide a link to the zip file for my users to access.
This solution works fine provided the files are small. However, there are many files in the 2-5 GB range, and as soon as these are tested I get an out-of-memory exception:
'Array dimensions exceeded supported range.'
I've seen systems like OneDrive and Google Drive create these files very quickly, and I aspire to create that experience for my users. But I am also fine with notifying the user when the archive is ready to download, even if it is a few minutes later, as I will have their email.
Here is a simplified version of the code, running in a console app:
using Microsoft.WindowsAzure.Storage;
using System.IO.Compression;
var account = CloudStorageAccount.Parse("ConnectionString");
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("ContainerName");
var blob = container.GetBlockBlobReference("ZipArchive.zip");
using (var stream = await blob.OpenWriteAsync())
using (var zip = new ZipArchive(stream, ZipArchiveMode.Create))
{
    var files = new string[] {
        "files/psds/VeryLargePsd_1.psd",
        "files/psds/VeryLargePsd_2.psd",
        "files/psds/VeryLargePsd_3.psd",
        "files/zips/VeryLargeZip_1.zip",
        "files/zips/VeryLargeZip_2.zip"
    };

    foreach (var file in files)
    {
        var sourceBlob = container.GetBlockBlobReference(file);
        var index = file.LastIndexOf('/') + 1;
        var fileName = file.Substring(index, file.Length - index);
        var entry = zip.CreateEntry(fileName, CompressionLevel.Optimal);

        await sourceBlob.FetchAttributesAsync();
        byte[] imageBytes = new byte[sourceBlob.Properties.Length];
        await sourceBlob.DownloadToByteArrayAsync(imageBytes, 0);

        using (var zipStream = entry.Open())
            zipStream.Write(imageBytes, 0, imageBytes.Length);
    }
}

As you mentioned, it works for small files but throws the error for large ones. The 'Array dimensions exceeded supported range' exception occurs because each file is buffered into a single byte array, and a .NET array cannot hold much more than 2 GB, so the 2-5 GB files exceed that limit.
Workarounds:
1) Upload the large files in small chunks and then zip them. For more details refer to this SO thread: Upload a zip file in small chunks to azure cloud blob storage.
2) This tutorial shows you how to deploy an application that uploads large amounts of random data to an Azure storage account: Upload large amounts of random data in parallel to Azure storage.
3) For uploading large files, you can use the Microsoft Azure Storage Data Movement Library for better performance. It is designed for high-performance uploading, downloading, and copying of Azure Storage blobs and files; a rough sketch follows below.
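The sketch below is a minimal illustration of uploading one large local file with the Data Movement Library. It assumes the Microsoft.Azure.Storage.DataMovement NuGet package together with its Microsoft.Azure.Storage.Blob client; the connection string, container, blob name, and local path are placeholders, not values from the question.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

class Program
{
    static async Task Main()
    {
        var account = CloudStorageAccount.Parse("ConnectionString");
        var container = account.CreateCloudBlobClient().GetContainerReference("ContainerName");
        var destBlob = container.GetBlockBlobReference("files/psds/VeryLargePsd_1.psd");

        // The library splits the file into blocks and uploads them in parallel,
        // so the whole file is never held in memory at once.
        TransferManager.Configurations.ParallelOperations = 8;
        await TransferManager.UploadAsync(@"C:\temp\VeryLargePsd_1.psd", destBlob);

        Console.WriteLine("Upload complete.");
    }
}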

Related

Deleting files in Azure Synapse Notebook

This should have been simple but turned out to require a bit of GoogleFu.
I have an Azure Synapse Spark Notebook written in C# that:
Receives a list of Deflate-compressed IIS files.
Reads the files as binary into a DataFrame.
Decompresses these files one at a time and writes them into Parquet format.
Now, after all of them have been successfully processed, I need to delete the compressed files.
This is my proof of concept, but it works perfectly.
Create a linked service pointing to the storage account that contains the files you want to delete; see Configure access to Azure Blob Storage.
See the code sample below.
#r "nuget:Azure.Storage.Files.DataLake,12.0.0-preview.9"
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Utils;
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Notebook.MSSparkUtils;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
string blob_sas_token = Credentials.GetConnectionStringOrCreds("your linked service name here");
Uri uri = new Uri($"https://<your storage account name here>.blob.core.windows.net/<your container name here>{blob_sas_token}");
DataLakeServiceClient _serviceClient = new DataLakeServiceClient(uri);
DataLakeFileSystemClient fileClient = _serviceClient.GetFileSystemClient("path to directory containing the file here");
fileClient.DeleteFile("file name here");
The call to Credentials.GetConnectionStringOrCreds returns a signed SAS token that is ready for your code to attach to a storage resource URI.
You could of course use the DeleteFileAsync method if you so desire.
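For example, the async form of the same delete (reusing the fileClient built above) is a one-liner:

// Awaits the delete instead of blocking; fileClient is the DataLakeFileSystemClient created above.
await fileClient.DeleteFileAsync("file name here");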
Hope this saves someone else a few hours of GoogleFu.

How to extract Thumbnail of MP4 Video located in azure storage

I want to extract a thumbnail from an MP4 video hosted in Azure Storage. My current method in C# uses the NReco NuGet package:
But that works on a local file. How do I extract the thumbnail from an Azure Storage file?
string mp4inputpath = Server.MapPath("~/testfolder/myvideo.mp4");
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
That works! But I need to use an Azure Storage file URL for mp4inputpath.
I can download the mp4 file from Azure Storage and save it temporarily into my Azure web app. I can do that programmatically.
Then extract the thumb, i.e.,
ffMpeg.GetVideoThumbnail(mp4inputpath, thumbOutputPath, 1);
Then delete the temporary mp4 within my app.
This works, but I don't know if it is advisable to download mp4 files into my Azure web app, and I don't know if it will scale. This is the only solution I have so far.
string mp4Url = @"https://mysorageaccount.blob.core.windows.net/mp4/vacation/summer/dogbarking.mp4";
string thumbOutputPath = Server.MapPath("~/testfolder/mythumb.jpg");
var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
// Get the thumb at the frame 1 second into the video
ffMpeg.GetVideoThumbnail(mp4Url, thumbOutputPath, 1);
This does not seem to work. No error, but the thumbOutputPath file is empty.
What you've done is pretty much what you have to do, since you cannot open an object in Azure Storage as you would a local file. So, grabbing the file into a local file or a stream is what you'd need to do.
As far as scaling: that will depend on the size (and number of instances) you're running in your Web App. Just be aware that you should have both your storage account and your Web App in the same region, to reduce latency and avoid egress charges for bandwidth.
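As a rough sketch of that download-then-extract approach (assuming the Microsoft.WindowsAzure.Storage and NReco.VideoConverter packages already used in the question; the connection string, container, and blob names are placeholders):

using System.IO;
using Microsoft.WindowsAzure.Storage;

var account = CloudStorageAccount.Parse("ConnectionString");
var container = account.CreateCloudBlobClient().GetContainerReference("mp4");
var blob = container.GetBlockBlobReference("vacation/summer/dogbarking.mp4");

// Download the video to a temp file, grab the frame, then clean up the temp file.
string tempMp4 = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".mp4");
string thumbOutputPath = Path.Combine(Path.GetTempPath(), "mythumb.jpg");
try
{
    await blob.DownloadToFileAsync(tempMp4, FileMode.Create);

    var ffMpeg = new NReco.VideoConverter.FFMpegConverter();
    ffMpeg.GetVideoThumbnail(tempMp4, thumbOutputPath, 1);
}
finally
{
    if (File.Exists(tempMp4))
        File.Delete(tempMp4);
}

Deleting the temp file right after extraction keeps the per-request disk footprint small, which helps when several requests run on the same Web App instance.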

Proper way to upload image files to Azure blob C#

In my application, we're uploading a large amount of image data at a time. The request is made through an Angular portal and received by an ASP.NET Web API; both are hosted on Azure. From the API I'm directly converting the image data to bytes and uploading to Azure Blob Storage.
Is this a proper way to upload, or do I need to save those images on my server first (like on some path 'C:/ImagesToUpload') and then upload them to the Azure blob from there?
I'm concerned because we're uploading a large amount of data, and I have no idea whether the way I'm doing it right now will create memory issues.
So if someone could advise on the right approach, that would be appreciated.
I have developed the same thing. We had the same requirement with a large number of files. I think you have to compress the file on the API side first and then send it to the blob using a SAS token. But make sure that you pass the data to Azure Blob Storage in pieces smaller than 5 MB; I found a solution for that as well.
Here is sample code that worked pretty well for me after some testing.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(SettingsProvider.Get("CloudStorageConnectionString", SettingType.AppSetting));
var blobClient = storageAccount.CreateCloudBlobClient();
var filesContainer = blobClient.GetContainerReference("your_containername");
filesContainer.CreateIfNotExists();

var durationHours = 24;

// Generate a SAS token valid for 24 hours with read/write permissions
var sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddHours(durationHours),
    Permissions = SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Read
};

// Generate a unique blob name from a GUID plus a timestamp
var StorageFileName = Guid.NewGuid() + DateTime.Now.ToString("yyyyMMddHHmmss");
var blob = filesContainer.GetBlockBlobReference(StorageFileName);

// Build a second reference that carries the SAS token in its URI and upload through it
var blobs = new CloudBlockBlob(new Uri(string.Format("{0}/{1}{2}", filesContainer.Uri.AbsoluteUri, StorageFileName, blob.GetSharedAccessSignature(sasConstraints))));

// Split the upload into 4 MB blocks when the data is larger than 4 MB
BlobRequestOptions blobRequestOptions = new BlobRequestOptions()
{
    SingleBlobUploadThresholdInBytes = 4 * 1024 * 1024, // anything above 4 MB is uploaded in blocks
    ParallelOperationThreadCount = 5,
    ServerTimeout = TimeSpan.FromMinutes(30)
};
blobs.StreamWriteSizeInBytes = 4 * 1024 * 1024;

// Upload to Azure Storage
await blobs.UploadFromByteArrayAsync(item.Document_Bytes, 0, item.Document_Bytes.Length, AccessCondition.GenerateEmptyCondition(), blobRequestOptions, new OperationContext());
But before calling this function, if you have a huge amount of data, use some compression technology first. I have used the "zlib" library; a free C#/.NET port is available at http://www.componentace.com/zlib_.NET.htm, and you can read more about zlib itself at https://www.zlib.net/.
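As an illustration only, here is a small sketch of compressing the byte array before upload. It uses the built-in System.IO.Compression.DeflateStream as a stand-in for the zlib.NET library mentioned above; the zlib.NET API itself differs, but the idea (compress before upload, decompress after download) is the same.

using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] input)
{
    using (var output = new MemoryStream())
    {
        // Disposing the DeflateStream flushes the remaining compressed bytes;
        // MemoryStream.ToArray still works after the stream has been closed.
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal))
        {
            deflate.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }
}

// Hypothetical usage: compress the document bytes before handing them to the upload code above.
// byte[] compressed = Compress(item.Document_Bytes);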
Per my understanding, you could also leverage Fine Uploader to upload your files directly to Azure Blob Storage without sending them to your server first. For a detailed description, you could follow Uploading Directly to Azure Blob Storage.
The script would look like as follows:
var uploader = new qq.azure.FineUploader({
    element: document.getElementById('fine-uploader'),
    request: {
        endpoint: 'https://<your-storage-account-name>.blob.core.windows.net/<container-name>'
    },
    signature: {
        endpoint: 'https://yourapp.com/uploadimage/signature'
    },
    uploadSuccess: {
        endpoint: 'https://yourapp.com/uploadimage/done'
    }
});
You could follow Getting Started with Fine Uploader and install the fine-uploader package, then follow here for initializing FineUploader for Azure Blob Storage, then follow here to configure CORS for your blob container and expose the endpoint for creating the SAS token. Moreover, here is a similar issue for using FineUploader.
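On the server side, that signature endpoint essentially hands Fine Uploader a short-lived SAS for the blob it is about to write. Below is a rough sketch of generating such a SAS with the same storage client used earlier in this thread; the helper shape and names are assumptions, and the exact request/response contract the uploader expects is described in Fine Uploader's Azure documentation.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobSasHelper
{
    // Returns the blob URI with a short-lived write SAS appended.
    public static string GetWriteSasUri(string connectionString, string containerName, string blobName)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var container = account.CreateCloudBlobClient().GetContainerReference(containerName);
        var blob = container.GetBlockBlobReference(blobName);

        var policy = new SharedAccessBlobPolicy
        {
            SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15),
            Permissions = SharedAccessBlobPermissions.Write
        };

        return blob.Uri.AbsoluteUri + blob.GetSharedAccessSignature(policy);
    }
}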
From the API I'm directly converting the image data to bytes and uploading to Azure blob.
I'm concerned because we're uploading a large amount of data and the way I'm using right now, will create memory issue or not
For the approach of uploading the file to your Web API endpoint first and then to an Azure Storage blob, I would prefer to use MultipartFormDataStreamProvider, which stores the uploaded file in a temp file on the server, rather than MultipartMemoryStreamProvider, which buffers it in memory. For details you could follow the related code snippet in this issue. Moreover, you could follow the GitHub sample for uploading files using the Web API.
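A minimal sketch of such a Web API action follows. It streams the multipart request body to temp files on disk via MultipartFormDataStreamProvider instead of buffering it in memory; the controller name, route, and upload folder are assumptions, not code from the linked sample.

using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web;
using System.Web.Http;

public class UploadController : ApiController
{
    [HttpPost]
    public async Task<IHttpActionResult> PostFile()
    {
        if (!Request.Content.IsMimeMultipartContent())
            return StatusCode(HttpStatusCode.UnsupportedMediaType);

        string root = HttpContext.Current.Server.MapPath("~/App_Data/uploads");
        Directory.CreateDirectory(root);

        // Streams each multipart section to a temp file under 'root' as it is read.
        var provider = new MultipartFormDataStreamProvider(root);
        await Request.Content.ReadAsMultipartAsync(provider);

        foreach (MultipartFileData fileData in provider.FileData)
        {
            // fileData.LocalFileName is the temp file on disk; from here you could stream it
            // to blob storage (e.g. blob.UploadFromFileAsync(fileData.LocalFileName)) and then delete it.
        }

        return Ok();
    }
}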

Upload to ADLS from file stream

I am making a custom activity in ADF, which involves reading multiple files from Azure Storage Blobs, doing some work on them, and then finally writing a resulting file to the Azure Data Lake Store.
The last step is where I'm stuck, because as far as I can see, the .NET SDK only allows uploading from a local file.
Is there any way to (programmatically) upload a file to ADL Store where the source is not a local file? It could be a blob or a stream. If not, are there any workarounds?
Yes, it's possible to upload from a Stream; the trick is to create the file first and then append your stream to it:
using Microsoft.Azure.Management.DataLake.Store;

string dataLakeAccount = "DLSAccountName";
var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(credentials);

// Create an empty file at the target path, then append the stream's contents to it.
adlsFileSystemClient.FileSystem.Create(dataLakeAccount, filepath, overwrite: true);
adlsFileSystemClient.FileSystem.Append(dataLakeAccount, filepath, stream);
See also this article.
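For the blob case in the question, a rough sketch that avoids a local temp file could open a read stream over the source blob and pass it straight to Append. It assumes the Microsoft.WindowsAzure.Storage blob client used elsewhere on this page plus the Data Lake Store management client from the answer; the credentials object, account name, container, and paths are placeholders.

using Microsoft.Azure.Management.DataLake.Store;
using Microsoft.WindowsAzure.Storage;

var account = CloudStorageAccount.Parse("ConnectionString");
var container = account.CreateCloudBlobClient().GetContainerReference("ContainerName");
var sourceBlob = container.GetBlockBlobReference("output/result.csv"); // hypothetical source blob

string dataLakeAccount = "DLSAccountName";
string filepath = "/output/result.csv"; // hypothetical destination path in ADLS
var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(credentials); // your ADLS credentials

// Stream the blob's contents directly into the Data Lake Store file, no local temp file needed.
using (var stream = await sourceBlob.OpenReadAsync())
{
    adlsFileSystemClient.FileSystem.Create(dataLakeAccount, filepath, overwrite: true);
    adlsFileSystemClient.FileSystem.Append(dataLakeAccount, filepath, stream);
}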

Azure file Storage SMB slow to list files in directory

We have an app that lists files in a folder through Azure Files. When we use the C# method:
Directory.GetFiles(@"\\account.file.core.windows.net\xyz")
It takes around a minute when there are 2000 files.
If we use CloudStorageAccount to do the same:
CloudFileClient fileClient = storageAccount.CreateCloudFileClient();
CloudFileDirectory directory = fileClient.GetShareReference("account").GetRootDirectoryReference().GetDirectoryReference("abc");
Int64 totalLength = 0;
foreach (IListFileItem fileAndDirectory in directory.ListFilesAndDirectories())
{
    CloudFile file = fileAndDirectory as CloudFile;
    if (file == null) // must be a directory if null
        continue;

    totalLength += file.Properties.Length;
}
It returns all the files, but takes around 10 seconds. Why is there such a large difference in performance?
When you use Directory.GetFiles (the System.IO file API), it talks to Azure File Storage via the SMB protocol (v2.1 or v3.0, depending on the client OS version). When you switch to CloudStorageAccount, it talks to File Storage via REST. If you capture the traffic with Wireshark, you will see that SMB requires several back-and-forth requests between client and server due to the nature of the protocol. The reason Azure File Storage supports both SMB and REST access is so that legacy code/applications (which used to access file shares hosted by file servers) can talk to a file share in the cloud without code changes.
So the recommendation in your case is to use REST calls to access Azure File Storage for better performance.
