In my application we upload a large amount of image data at a time. The request is made through an Angular portal and received by an ASP.NET Web API; both are hosted on Azure. From the API I convert the image data to bytes and upload it directly to Azure Blob Storage.
Is this a proper way to upload, or do I need to save those images on my server first (on some path like 'C:/ImagesToUpload') and then upload them to Azure Blob Storage from there?
I'm concerned because we're uploading a large amount of data, and I have no idea whether the approach I'm using right now will create memory issues.
So if someone could advise on the right approach, that would be great.
I have developed the same thing, with the same requirement of handling a large number of files. You have to compress the file on the API side first and then send it to the blob using a SAS token. Also keep in mind that with this approach each block you push to Azure Blob Storage has to be smaller than about 5 MB, so the upload is split into chunks; I found a solution for that as well.
Here is sample code that worked well for me after some testing.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(SettingsProvider.Get("CloudStorageConnectionString", SettingType.AppSetting));
var blobClient = storageAccount.CreateCloudBlobClient();
var filesContainer = blobClient.GetContainerReference("your_containername");
filesContainer.CreateIfNotExists();
var durationHours = 24;

// Generate SAS token
var sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddHours(durationHours),
    Permissions = SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Read
};

// Generate a unique blob name using a GUID and a timestamp
// (a fixed timestamp format avoids '/' and ':' characters in the blob name)
var storageFileName = Guid.NewGuid() + DateTime.Now.ToString("yyyyMMddHHmmss");
var blob = filesContainer.GetBlockBlobReference(storageFileName);
var blobs = new CloudBlockBlob(new Uri(string.Format("{0}/{1}{2}", filesContainer.Uri.AbsoluteUri, storageFileName, blob.GetSharedAccessSignature(sasConstraints))));

// Split the upload into 4 MB blocks when the file is larger than 4 MB
BlobRequestOptions blobRequestOptions = new BlobRequestOptions()
{
    SingleBlobUploadThresholdInBytes = 4 * 1024 * 1024, // 4 MB
    ParallelOperationThreadCount = 5,
    ServerTimeout = TimeSpan.FromMinutes(30)
};
blobs.StreamWriteSizeInBytes = 4 * 1024 * 1024;

// Upload to Azure Storage; note the third argument is the number of bytes, not the last index
await blobs.UploadFromByteArrayAsync(item.Document_Bytes, 0, item.Document_Bytes.Length, AccessCondition.GenerateEmptyCondition(), blobRequestOptions, new OperationContext());
But before calling this function, if you have a huge amount of data, make sure you compress it with some compression technology first. I used the "zlib" library; you can find the C#/.NET port at http://www.componentace.com/zlib_.NET.htm (it's freeware). If you want to know more, visit https://www.zlib.net/.
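If you would rather avoid the third-party dependency, here is a minimal sketch of the same idea using the built-in System.IO.Compression.GZipStream instead of zlib.NET (my substitution, not the library mentioned above); CompressBytes is a hypothetical helper name:
using System.IO;
using System.IO.Compression;

// Hypothetical helper: GZip-compresses a byte array before it is handed to the upload code.
static byte[] CompressBytes(byte[] input)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            gzip.Write(input, 0, input.Length);
        }
        // The GZipStream must be disposed before reading the buffer so all data is flushed.
        return output.ToArray();
    }
}

// Usage with the snippet above (assumption): item.Document_Bytes = CompressBytes(item.Document_Bytes);
Remember that whatever reads the blob later has to decompress it with the matching algorithm.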
Per my understanding, you could also leverage Fine Uploader to upload your files directly to Azure Blob Storage without sending them to your server first. For a detailed description, you could follow Uploading Directly to Azure Blob Storage.
The script would look like as follows:
var uploader = new qq.azure.FineUploader({
    element: document.getElementById('fine-uploader'),
    request: {
        endpoint: 'https://<your-storage-account-name>.blob.core.windows.net/<container-name>'
    },
    signature: {
        endpoint: 'https://yourapp.com/uploadimage/signature'
    },
    uploadSuccess: {
        endpoint: 'https://yourapp.com/uploadimage/done'
    }
});
You could follow Getting Started with Fine Uploader and install the fine-uploader package, then follow here for initializing FineUploader for Azure Blob Storage, then follow here to configure CORS for your blob container and expose the endpoint for creating the SAS token. Moreover, here is a similar issue for using FineUploader.
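For the signature endpoint referenced above (https://yourapp.com/uploadimage/signature), the server side essentially hands back a short-lived write SAS for the target blob. Below is a minimal sketch assuming ASP.NET Web API 2 and the same storage classes as the first answer; the route, the blobName parameter and the container name are placeholders of mine, and the exact request/response contract Fine Uploader expects is described in its Azure documentation:
// Assumes: using System; using System.Configuration; using System.Web.Http;
// using Microsoft.WindowsAzure.Storage; using Microsoft.WindowsAzure.Storage.Blob;
[HttpGet]
[Route("uploadimage/signature")]
public IHttpActionResult GetSasForBlob(string blobName)
{
    var account = CloudStorageAccount.Parse(
        ConfigurationManager.AppSettings["CloudStorageConnectionString"]);
    var container = account.CreateCloudBlobClient().GetContainerReference("your_containername");
    var blob = container.GetBlockBlobReference(blobName);

    var policy = new SharedAccessBlobPolicy
    {
        SharedAccessStartTime = DateTime.UtcNow.AddMinutes(-5),  // allow for clock skew
        SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15), // keep the SAS short-lived
        Permissions = SharedAccessBlobPermissions.Write
    };

    // Full blob URI with the SAS query string appended - this is what the client uploads to.
    return Ok(blob.Uri + blob.GetSharedAccessSignature(policy));
}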
From the API I convert the image data to bytes and upload it directly to Azure Blob Storage.
I'm concerned because we're uploading a large amount of data, and I have no idea whether the approach I'm using right now will create memory issues
For the approach of uploading the file to your Web API endpoint first and then to an Azure Storage blob, I would prefer using MultipartFormDataStreamProvider, which stores the uploaded file in a temp file on the server, instead of MultipartMemoryStreamProvider, which buffers it in memory. For details, you could follow the related code snippet in this issue. Moreover, you could follow the GitHub sample for uploading files using the Web API.
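A minimal sketch of that approach, assuming ASP.NET Web API 2 and the classic storage SDK used elsewhere in this thread; the temp folder, container name and action name are placeholders of mine:
// Assumes: using System; using System.Configuration; using System.IO; using System.Linq;
// using System.Net.Http; using System.Threading.Tasks; using System.Web.Http;
// using Microsoft.WindowsAzure.Storage;
public async Task<IHttpActionResult> Upload()
{
    if (!Request.Content.IsMimeMultipartContent())
        return BadRequest("Expected multipart/form-data.");

    // Buffer the upload to temp files on disk instead of into memory.
    var tempRoot = Path.Combine(Path.GetTempPath(), "ImagesToUpload");
    Directory.CreateDirectory(tempRoot);
    var provider = new MultipartFormDataStreamProvider(tempRoot);
    await Request.Content.ReadAsMultipartAsync(provider);

    var storageAccount = CloudStorageAccount.Parse(
        ConfigurationManager.AppSettings["CloudStorageConnectionString"]);
    var container = storageAccount.CreateCloudBlobClient().GetContainerReference("your_containername");

    foreach (var fileData in provider.FileData)
    {
        var blob = container.GetBlockBlobReference(Guid.NewGuid().ToString("N"));
        // UploadFromFileAsync streams from disk, so the image never has to sit in memory in full.
        await blob.UploadFromFileAsync(fileData.LocalFileName);
        File.Delete(fileData.LocalFileName); // clean up the temp file
    }
    return Ok();
}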
Related
This should have been simple but turned out to require a bit of GoogleFu.
I have an Azure Synapse Spark Notebook written in C# that
Receives a list of Deflate compressed IIS files.
Reads the files as binary into a DataFrame.
Decompresses these files one at a time and writes them into Parquet format.
Now after all of them have been successfully processed I need to delete the compressed files.
This is my proof of concept but it works perfectly.
Create a linked service pointing to the storage account that contains the files you want to delete (see Configure access to Azure Blob Storage).
See the code sample below.
#r "nuget:Azure.Storage.Files.DataLake,12.0.0-preview.9"
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Utils;
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Notebook.MSSparkUtils;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
string blob_sas_token = Credentials.GetConnectionStringOrCreds('your linked service name here');
Uri uri = new Uri($"https://'your storage account name here'.blob.core.windows.net/'your container name here'{blob_sas_token}") ;
DataLakeServiceClient _serviceClient = new DataLakeServiceClient(uri);
DataLakeFileSystemClient fileClient = _serviceClient.GetFileSystemClient("'path to directory containing the file here'") ;
fileClient.DeleteFile("'file name here'") ;
The call to Credentials.GetConnectionStringOrCreds returns a signed SAS token that is ready for your code to attach to a storage resource uri.
You could of course use the DeleteFileAsync method if you so desire.
Hope this saves someone else a few hours of GoogleFu.
I have some code running behind an API that loops through a list of files on Azure Blob Storage, Zips them up and saves the final Zip to the same storage account. I then provide a link to the Zip file for my users to access.
This solution works fine provided the files are small. However, there are many files in the 2-5 GB range, and as soon as these are tested I get an out of memory exception error:
'Array dimensions exceeded supported range.'
I've seen systems like OneDrive and GoogleDrive create these files very quickly and I aspire to creating that experience for my users. But I am also fine with notifying the user when the archive is ready to download even if it is a few minutes later as I will have their email.
Here is a version of the code simplified and running in a console app:
using Microsoft.WindowsAzure.Storage;
using System.IO.Compression;
var account = CloudStorageAccount.Parse("ConnectionString");
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("ContainerName");
var blob = container.GetBlockBlobReference("ZipArchive.zip");
using (var stream = await blob.OpenWriteAsync())
using (var zip = new ZipArchive(stream, ZipArchiveMode.Create))
{
    var files = new string[] {
        "files/psds/VeryLargePsd_1.psd",
        "files/psds/VeryLargePsd_2.psd",
        "files/psds/VeryLargePsd_3.psd",
        "files/zips/VeryLargeZip_1.zip",
        "files/zips/VeryLargeZip_2.zip"
    };

    foreach (var file in files)
    {
        var sourceBlob = container.GetBlockBlobReference(file);
        var index = file.LastIndexOf('/') + 1;
        var fileName = file.Substring(index, file.Length - index);
        var entry = zip.CreateEntry(fileName, CompressionLevel.Optimal);

        await sourceBlob.FetchAttributesAsync();
        byte[] imageBytes = new byte[sourceBlob.Properties.Length];
        await sourceBlob.DownloadToByteArrayAsync(imageBytes, 0);

        using (var zipStream = entry.Open())
            zipStream.Write(imageBytes, 0, imageBytes.Length);
    }
}
As you mentioned, it works for small files but throws that error for large files.
Workarounds:
1) Upload the large files in small chunks and then zip them.
For more details refer to this SO thread: Upload a zip file in small chunks to azure cloud blob storage
2) This tutorial shows you how to deploy an application that uploads a large amount of random data to an Azure storage account: Upload large amounts of random data in parallel to Azure storage
3) For uploading large files, you can use the Microsoft Azure Storage Data Movement Library for better performance. It is designed for high-performance uploading, downloading and copying of Azure Storage blobs and files (a rough sketch follows below).
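The sketch below assumes the Microsoft.Azure.Storage.DataMovement package (which pairs with the Microsoft.Azure.Storage.* client rather than the older WindowsAzure.Storage namespace used in the question); the connection string, container, blob name and local path are placeholders:
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.DataMovement;

var account = CloudStorageAccount.Parse("ConnectionString");
var container = account.CreateCloudBlobClient().GetContainerReference("ContainerName");
var destBlob = container.GetBlockBlobReference("files/psds/VeryLargePsd_1.psd");

// Optionally raise the number of parallel block transfers used by the library.
TransferManager.Configurations.ParallelOperations = 16;

// The library splits the file into blocks and uploads them in parallel,
// so the whole file is never loaded into memory at once.
await TransferManager.UploadAsync(@"C:\temp\VeryLargePsd_1.psd", destBlob);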
We have files stored in an Azure Storage account (Gen2).
We are using the API approach to create, delete and read the files [as mentioned here
Read File ]
We are trying to copy files from one storage account to another using the API approach. Can someone suggest a fast way to achieve this?
Note:
I am looking for a code approach in C# without AzCopy.
For Gen1 there is the Data Movement Library, but I am looking for Gen2.
You could use AzCopy to transfer data. You can copy data between a file system and a storage account, or between storage accounts, with AzCopy.
For details on how to use AzCopy, refer to this official doc. In the doc there are a download link and tutorials.
Update:
For transferring files between file shares, you could refer to this command:
AzCopy /Source:https://myaccount1.file.core.windows.net/myfileshare1/ /Dest:https://myaccount2.file.core.windows.net/myfileshare2/ /SourceKey:key1 /DestKey:key2 /S
For more about copying files in File Storage, you could refer to the doc.
If you still have other questions, please let me know. Hope this helps.
Actually, it's very hard to find the working solution because the official documentation is outdated and there is a lack of any up-to-date examples there.
Outdated way
An outdated example of working with blob containers could be found here: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet?tabs=windows
That example uses the WindowsAzure.Storage NuGet package, which was renamed to Microsoft.Azure.Storage.* and split into separate packages.
Up-to-date solution
I'm currently working on deploying a static SPA to Azure Blob Storage. It has a very nice "Static website" feature that serves the files.
There is a working example that could be used to copy all contents from one blob container to another. Please consider it as a hint (not production ready).
All you need is to:
Have an existing blob container.
Install Microsoft.Azure.Storage.DataMovement NuGet package.
Provide a proper connection string.
Here is the code:
// I left fully qualified names of the types to make example clear.
var connectionString = "Connection string from `Azure Portal > Storage account > Access Keys`";
var sourceContainerName = "<source>";
var destinationContainerName = "<destination>";
var storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
var client = storageAccount.CreateCloudBlobClient();
var sourceContainer = client.GetContainerReference(sourceContainerName);
var destinationContainer = client.GetContainerReference(destinationContainerName);
// Create destination container if needed
await destinationContainer.CreateIfNotExistsAsync();
var sourceBlobDir = sourceContainer.GetDirectoryReference(""); // Root directory
var destBlobDir = destinationContainer.GetDirectoryReference("");
// Use CopyDirectoryOptions to control the copy (here: copy the whole directory tree recursively)
var options = new Microsoft.Azure.Storage.DataMovement.CopyDirectoryOptions
{
Recursive = true,
};
var context = new Microsoft.Azure.Storage.DataMovement.DirectoryTransferContext();
// Perform the copy (the 'true' argument requests a server-side copy)
var transferStatus = await Microsoft.Azure.Storage.DataMovement.TransferManager
.CopyDirectoryAsync(sourceBlobDir, destBlobDir, true, options, context);
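If you want to verify what was copied, the returned TransferStatus exposes a few counters you can log; a small sketch (the property names come from the DataMovement library):
Console.WriteLine($"Copied {transferStatus.NumberOfFilesTransferred} files " +
                  $"({transferStatus.BytesTransferred} bytes), " +
                  $"failed: {transferStatus.NumberOfFilesFailed}, skipped: {transferStatus.NumberOfFilesSkipped}");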
I've just started working with Data Lake and I'm currently trying to figure out the real workflow steps and how to automate the whole process.
Say I have some files as an input and I would like to process them and download output files in order to push into my data warehouse or/and SSAS.
I've found an absolutely lovely API and it's all good, but I can't find a way to get all the file names in a directory so I can download them further.
Please correct my thoughts regarding workflow. Is there another, more elegant way to automatically get all the processed data (outputs) into a storage (like conventional SQL Server, SSAS, data warehouse and etc)?
If you have a working solution based on Data Lake, please describe the workflow (from "raw" files to reports for end-users) with a few words.
Here is my example of a .NET Core application:
using Microsoft.Azure.DataLake.Store;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest.Azure.Authentication;
var creds = new ClientCredential(ApplicationId, Secret);
var clientCreds = ApplicationTokenProvider.LoginSilentAsync(Tenant, creds).GetAwaiter().GetResult();
var client = AdlsClient.CreateClient("myfirstdatalakeservice.azuredatalakestore.net", clientCreds);
var result = client.GetDirectoryEntry("/mynewfolder", UserGroupRepresentation.ObjectID);
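Since the question is specifically about getting all the file names in a directory, here is a small sketch using the same client; EnumerateDirectory is the listing call in Microsoft.Azure.DataLake.Store (the folder name is a placeholder):
// Walk the directory listing and print the full path of every file in it.
foreach (var entry in client.EnumerateDirectory("/mynewfolder"))
{
    if (entry.Type == DirectoryEntryType.FILE)
        Console.WriteLine(entry.FullName);
}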
Say I have some files as an input and I would like to process them and download output files in order to push into my data warehouse or/and SSAS.
If you want to download the files from a folder in Azure Data Lake to a local path, you could use the following code:
client.BulkDownload("/mynewfolder", @"D:\Tom\xx"); // local path
But based on my understanding, you could also use Azure Data Factory to push your data from Data Lake Store to Azure Blob Storage or Azure Files.
I am working on an application where file uploads happen often, and can be pretty large in size.
Those files are being uploaded to a Web API, which will then get the Stream from the request, and pass it on to my storage service, that then uploads it to Azure Blob Storage.
I need to make sure that:
No temp files are written on the Web API instance
The request stream is not fully read into memory before passing it on to the storage service (to prevent OutOfMemoryExceptions).
I've looked at this article, which describes how to disable input stream buffering, but because many file uploads from many different users happen simultaneously, it's important that it actually does what it says on the tin.
This is what I have in my controller at the moment:
if (this.Request.Content.IsMimeMultipartContent())
{
    var provider = new MultipartMemoryStreamProvider();
    await this.Request.Content.ReadAsMultipartAsync(provider);

    var fileContent = provider.Contents.SingleOrDefault();
    if (fileContent == null)
    {
        throw new ArgumentException("No filename.");
    }

    var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);

    // I need to make sure this stream is ready to be processed by
    // the Azure client lib, but not buffered fully, to prevent OoM.
    var stream = await fileContent.ReadAsStreamAsync();
}
I don't know how I can reliably test this.
EDIT: I forgot to mention that uploading directly to Blob Storage (circumventing my API) won't work, as I am doing some size checking (e.g. can this user upload 500 MB? Has this user used up his quota?).
Solved it, with the help of this Gist.
Here's how I am using it, along with a clever "hack" to get the actual file size, without copying the file into memory first. Oh, and it's twice as fast (obviously).
// Create an instance of our provider.
// See https://gist.github.com/JamesRandall/11088079#file-blobstoragemultipartstreamprovider-cs for implementation.
var provider = new BlobStorageMultipartStreamProvider ();
// This is where the uploading is happening, by writing to the Azure stream
// as the file stream from the request is being read, leaving almost no memory footprint.
await this.Request.Content.ReadAsMultipartAsync(provider);
// We want to know the exact size of the file, but this info is not available to us before
// we've uploaded everything - which has just happened.
// We get the stream from the content (and that stream is the same instance we wrote to).
var stream = await provider.Contents.First().ReadAsStreamAsync();
// Problem: If you try to use stream.Length, you'll get an exception, because BlobWriteStream
// does not support it.
// But this is where we get fancy.
// Position == size, because the file has just been written to it, leaving the
// position at the end of the file.
var sizeInBytes = stream.Position;
Voilà, you've got your uploaded file's size without having to copy the file into your web instance's memory.
As for getting the file length before the file is uploaded, that's not as easy, and I had to resort to some rather unpleasant methods in order to get just an approximation.
In the BlobStorageMultipartStreamProvider:
var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;
This gives me a pretty close file size, off by a few hundred bytes (depends on the HTTP header I guess). This is good enough for me, as my quota enforcement can accept a few bytes being shaved off.
Just for showing off, here's the memory footprint, reported by the insanely accurate and advanced Performance Tab in Task Manager.
Before - using MemoryStream, reading it into memory before uploading
After - writing directly to Blob Storage
I think a better approach is for you to go directly to Azure Blob Storage from your client. By leveraging the CORS support in Azure Storage you eliminate load on your Web API server resulting in better overall scale for your application.
Basically, you will create a Shared Access Signature (SAS) URL that your client can use to upload the file directly to Azure storage. For security reasons, it is recommended that you limit the time period for which the SAS is valid. Best practices guidance for generating the SAS URL is available here.
For your specific scenario check out this blog from the Azure Storage team where they discuss using CORS and SAS for this exact scenario. There is also a sample application so this should give you everything you need.
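To make the browser-to-storage upload work, the storage account's Blob service also needs a CORS rule that allows your site's origin. A minimal sketch with the classic SDK, where the allowed origin and connection string are placeholders:
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Shared.Protocol;

var account = CloudStorageAccount.Parse("ConnectionString");
var blobClient = account.CreateCloudBlobClient();

// Fetch the current service properties, add a CORS rule for the web app's origin, and save them back.
ServiceProperties properties = await blobClient.GetServicePropertiesAsync();
properties.Cors.CorsRules.Add(new CorsRule
{
    AllowedOrigins = new List<string> { "https://yourapp.com" },
    AllowedMethods = CorsHttpMethods.Put | CorsHttpMethods.Options,
    AllowedHeaders = new List<string> { "*" },
    ExposedHeaders = new List<string> { "*" },
    MaxAgeInSeconds = 3600
});
await blobClient.SetServicePropertiesAsync(properties);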