Azure Storage Data Movement Library copy is much slower than AzCopy - c#

I have the following test setup:
One test Azure Blob Storage account
A local folder with ~3000 small files (200 bytes each)
When I run the following azcopy command:
azcopy copy --recursive "c:\localDir\*" "https://BLOBConnectionString"
it takes ~2 seconds to copy the data.
When I run the following C# code:
ServicePointManager.Expect100Continue = false;
ServicePointManager.DefaultConnectionLimit = 32;
TransferManager.Configurations.ParallelOperations = 32;
var account = CloudStorageAccount.Parse("https://BLOBConnectionString");
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("container");
await container.CreateIfNotExistsAsync();
CloudBlobDirectory destinationBlob = container.GetDirectoryReference("data");
await TransferManager.UploadDirectoryAsync(@"c:\localDir\", destinationBlob);
it takes ~1 minute to copy the same amount of data.
I expected the C# code to take approximately the same time.

I tried this in my environment and got the results below:
Code:
using System;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;
namespace fastercpy
{
    class program
    {
        public static void Main()
        {
            string storageConnectionString = "< Connection string >";
            CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
            CloudBlobClient blobClient = account.CreateCloudBlobClient();
            CloudBlobContainer blobContainer = blobClient.GetContainerReference("test");
            blobContainer.CreateIfNotExists();
            string sourcePath = "C:\\Users\\v-vsettu\\Documents\\Venkat";
            CloudBlobDirectory destinationBlob = blobContainer.GetDirectoryReference("data");
            // Raise the number of parallel transfer operations (the question used 32)
            TransferManager.Configurations.ParallelOperations = 64;
            // Set up the transfer context and track the upload progress
            DirectoryTransferContext context = new DirectoryTransferContext();
            context.ProgressHandler = new Progress<TransferStatus>((progress) =>
            {
                Console.WriteLine("Bytes uploaded: {0}", progress.BytesTransferred);
            });
            // Pass the context so that the progress handler is actually invoked
            var task = TransferManager.UploadDirectoryAsync(sourcePath, destinationBlob, null, context);
            task.Wait();
        }
    }
}
You are using TransferManager.Configurations.ParallelOperations = 32. Try TransferManager.Configurations.ParallelOperations = 64 in your code; it should speed up the upload.
The Microsoft Azure Storage Data Movement Library was created for fast uploading, downloading, and copying of Azure Storage Blob and File.
Reference:
Transfer data with the Data Movement library for .NET - Azure Storage | Microsoft Learn

Related

Upload files to blob storage via Azure Front Door

The app is set up on multiple on-premise services and uploads regularly some files to Azure Blob Storage placed in East US. But now it's necessary to place an instance of the app in the Australian region. As a result, upload time to the cloud increased drastically.
I have tested if Azure Front Door can help to improve it and I found that download from blob storage works 5 times faster if I use the Azure Front Door link. Now I struggle to change C# code to upload files via Azure Front Door. I have tried to use the suffix "azurefd.net" instead of "core.windows.net" in the connection string but it does not help. Could somebody give me a hint on how to upload files to Azure blob storage via Azure Front Door in C#?
Because the storage connection string accepts only the storage endpoint (core.windows.net), we cannot use the Front Door endpoint (azurefd.net) in the connection string.
I integrated an Azure Storage account with Front Door, and I am able to access the blob files in the storage account through the Front Door URL.
However, we cannot upload files to Blob Storage via Azure Front Door using the C# storage client,
because it accepts a connection string only for the storage endpoint.
Unfortunately, Azure Front Door does not provide any benefit for uploads. For the test, I used PUT requests as described here: https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob
PUT https://<entityName>.azurefd.net/<containerName>/<blobName>?<sharedAccessSignature>
x-ms-version: 2020-10-02
x-ms-blob-type: BlockBlob
< C:\Downloads\t1.txt
I then compared the upload times for the storage account endpoint and the Azure Front Door endpoint. There was no difference in upload speed.
Code that I used for the test:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
namespace SandboxV2
{
    class Program
    {
        static async Task Main()
        {
            string frontDoorUrl = "https://<FRONT-DOOR>.azurefd.net";
            string storageUrl = "https://{STORAGE-ACCOUNT}.blob.core.windows.net";
            string sasString = "...";
            Console.Write("File Path: ");
            string filePath = Console.ReadLine();
            await RunUploadTestAsync(filePath, frontDoorUrl, sasString, "-fd");
            await RunUploadTestAsync(filePath, storageUrl, sasString, "-storage");
        }
        private static async Task RunUploadTestAsync(string filePath, string rootUrl, string sasString, string suffix)
        {
            string blobName = Path.GetFileNameWithoutExtension(filePath) + suffix + Path.GetExtension(filePath);
            Console.WriteLine(rootUrl);
            string containerName = "testaccess";
            var speeds = new List<double>();
            var times = new List<TimeSpan>();
            for (int i = 0; i < 5; i++)
            {
                var t1 = DateTime.UtcNow;
                var statusCode = await UploadAsSingleBlock(filePath, rootUrl, blobName, containerName, sasString);
                var t2 = DateTime.UtcNow;
                var time = t2 - t1;
                var speed = new FileInfo(filePath).Length / time.TotalSeconds / 1024 / 1024 * 8;
                speeds.Add(speed);
                times.Add(time);
                Console.WriteLine($"Code: {statusCode}. Time: {time}. Speed: {speed}");
            }
            Console.WriteLine($"Average time: {TimeSpan.FromTicks((long)times.Select(t => t.Ticks).Average())}. Average speed: {speeds.Average()}.");
        }
        private static async Task<HttpStatusCode> UploadAsSingleBlock(string filePath, string rootUrl, string blobName, string containerName, string sasString)
        {
            var request = new HttpRequestMessage(HttpMethod.Put, $"{rootUrl}/{containerName}/{blobName}?{sasString}");
            request.Headers.Add("x-ms-version", "2020-10-02");
            request.Headers.Add("x-ms-blob-type", "BlockBlob");
            HttpResponseMessage response;
            using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                request.Content = new StreamContent(fileStream);
                using (var client = new HttpClient())
                    response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
            }
            return response.StatusCode;
        }
    }
}

Downloading Image from Azure Blob C#

I am trying to download an image (.jpg) from an Azure storage blob using:
public static async Task DownloadToTemp(string path, string fileName)
{
    string storageAccount_connectionString = "CONNECTION STRING";
    CloudStorageAccount mycloudStorageAccount = CloudStorageAccount.Parse(storageAccount_connectionString);
    CloudBlobClient blobClient = mycloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference(CONTAINER);
    CloudBlockBlob cloudBlockBlob = container.GetBlockBlobReference(fileName);
    // provide the file download location below
    Stream file = File.OpenWrite(path);
    await cloudBlockBlob.DownloadToStreamAsync(file);
    file.Close();
    return;
}
But when I try to open the image as a bitmap with Bitmap image = new Bitmap(path), I get the error System.ArgumentException: 'Parameter is not valid.'. I am calling BlobHandler.DownloadToTemp(path, file).GetAwaiter() to ensure the file has been downloaded.
The problem was that the code did not wait for the image to be fully written before opening it.
Awaiting the call:
await BlobHandler.DownloadToTemp(path, file);
ensures that the file is fully downloaded (this meant that the calling function had to be async).
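To illustrate why GetAwaiter() alone was not enough, here is a minimal sketch. It assumes a hypothetical DownloadToTempAsync that stands in for BlobHandler.DownloadToTemp and simulates the network delay with Task.Delay; no Azure call is made. GetAwaiter() only returns an awaiter object and does not wait, whereas await resumes after the task has completed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for BlobHandler.DownloadToTemp: writes the file after a delay.
    public static async Task DownloadToTempAsync(string path)
    {
        await Task.Delay(200);                          // simulates network latency
        File.WriteAllBytes(path, new byte[] { 1, 2, 3 });
    }

    public static async Task<(bool beforeAwait, bool afterAwait)> RunAsync(string path)
    {
        if (File.Exists(path)) File.Delete(path);

        Task download = DownloadToTempAsync(path);
        download.GetAwaiter();                          // returns immediately; does not wait
        bool beforeAwait = File.Exists(path);           // typically false: still downloading

        await download;                                 // continues only after completion
        bool afterAwait = File.Exists(path);            // the file has been written by now

        return (beforeAwait, afterAwait);
    }
}
```

In code that cannot be made async, .GetAwaiter().GetResult() blocks until the task finishes, but awaiting is usually the better choice in UI applications.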

Download Blob file into Memory Stream from Azure using C#

I am trying to read a blob from Azure using C# Console application. I can download the blob file to my local machine and read it but when I am trying to download it to memory stream it is throwing an error as attached.
I want to directly read my blob file instead of downloading it to local machine as I have to deploy it in Azure. Any suggestions on it?
var storageaccount = new CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(StorageAccountName, StorageAccountKey), true);
var blobclient = storageaccount.CreateCloudBlobClient();
var container = blobclient.GetContainerReference("");
// var blobs = container.ListBlobs();
var blockBlobReference = container.GetBlockBlobReference(sourceBlobFileName);
using (var memorystream = new MemoryStream())
{
    blockBlobReference.DownloadToStream(memorystream);
}
I think the problem is because you're wrapping it in a using statement. Try the following:
var storageaccount = new CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(StorageAccountName, StorageAccountKey), true);
var blobclient = storageaccount.CreateCloudBlobClient();
var container = blobclient.GetContainerReference("");
// var blobs = container.ListBlobs();
var blockBlobReference = container.GetBlockBlobReference(sourceBlobFileName);
var memorystream = new MemoryStream();
blockBlobReference.DownloadToStream(memorystream);
byte[] content = memorystream.ToArray();
I tried your code, and the blob downloaded successfully.
Your screenshot just shows the variable in the Visual Studio debugger, and it looks like the error did not actually occur.
In the screenshot, the MemoryStream's CanTimeout property is set to false.
So it seems that ReadTimeout and WriteTimeout show exceptions only because MemoryStream does not support them.
Reference:
Stream.ReadTimeout Property
Please note that the exception will not occur unless you use the ReadTimeout or WriteTimeout properties.
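The behaviour described above can be checked without any Azure call. The sketch below (the class and method names are illustrative, not part of any SDK) shows that MemoryStream reports CanTimeout as false and that reading ReadTimeout throws InvalidOperationException, which is what the debugger displays. It also shows rewinding the stream before reading it, since a stream that has just been written to is positioned at its end.

```csharp
using System;
using System.IO;

public static class MemoryStreamDemo
{
    // Returns true when touching ReadTimeout throws, as the debugger screenshot suggests.
    public static bool ReadTimeoutThrows()
    {
        using (var ms = new MemoryStream())
        {
            if (ms.CanTimeout) return false;            // MemoryStream does not support timeouts
            try
            {
                int unused = ms.ReadTimeout;            // the base Stream implementation throws
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }

    // Reads the whole stream as text; the position is reset first because
    // a download (or any write) leaves it at the end of the data.
    public static string ReadAllText(MemoryStream ms)
    {
        ms.Position = 0;
        using (var reader = new StreamReader(ms))
            return reader.ReadToEnd();
    }
}
```

After blockBlobReference.DownloadToStream(memorystream), a call such as MemoryStreamDemo.ReadAllText(memorystream) would return the blob content as text, assuming the blob holds UTF-8 text.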

CONTENT-MD5 is missing AZURE portal

I'm uploading files in AZURE blob storage using c# library. If I upload a file with byte[], it calculates MD5 (verified in AZURE portal - displays CONTENT-MD5 value).
CloudBlockBlob blockBlob = GetUserProjectsBlob(blobName);
blockBlob.UploadFromByteArray(file, 0, file.Length);
I need to upload a large file into AZURE. So I'm using PutBlock & PutBlockList methods..
string blockHash = GetMd5FromStream(new MemoryStream(file));
blockBlob.PutBlock(blockId, new MemoryStream(file, true), blockHash);
// To commit transaction
blockBlob.PutBlockList(blockIDsBase64);
But in the above scenario, CONTENT-MD5 is missing in the AZURE portal. I have also tried this..
BlobRequestOptions opt = new BlobRequestOptions();
opt.StoreBlobContentMD5 = true;
opt.UseTransactionalMD5 = true;
blockBlob.PutBlockList(blockIDsBase64, null, opt);
But still no luck. Any ideas about how to resolve this?
In the following lines of code:
string blockHash = GetMd5FromStream(new MemoryStream(file));
blockBlob.PutBlock(blockId, new MemoryStream(file, true), blockHash);
// To commit transaction
blockBlob.PutBlockList(blockIDsBase64);
You're actually calculating the MD5 hash of the block data. When Storage Service receives this data, it does the hash verification to ensure that block data is not corrupted.
BlobRequestOptions opt = new BlobRequestOptions();
opt.StoreBlobContentMD5 = true;
opt.UseTransactionalMD5 = true;
blockBlob.PutBlockList(blockIDsBase64, null, opt);
The above code does not instruct the Storage Service to calculate the hash of the blob you're uploading. You need to calculate the MD5 hash of the whole blob yourself and send it as part of the blob's properties, doing something like:
blockBlob.Properties.ContentMD5 = "computed hash";
blockBlob.PutBlockList(blockIDsBase64, null, opt);
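As a sketch of how the "computed hash" could be produced: the Content-MD5 value is the Base64-encoded MD5 digest of the content, so it can be computed over the whole file with System.Security.Cryptography. The helper name ComputeContentMd5 is hypothetical and not part of the storage SDK.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class BlobMd5
{
    // Computes the MD5 digest of the entire stream (from its current position)
    // and returns it Base64-encoded, the format used for Content-MD5.
    public static string ComputeContentMd5(Stream content)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(content);
            return Convert.ToBase64String(hash);
        }
    }
}
```

With this helper, the assignment above could become blockBlob.Properties.ContentMD5 = BlobMd5.ComputeContentMd5(fileStream); where fileStream is a stream over the complete file rather than a single block.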

azure blob storage async download with progress bar

I am trying to get a COMPLETE example of downloading a file from Azure Blob Storage using the .DownloadToStreamAsync() method wired up to a progress bar.
I've found references to old implementations of the Azure Storage SDK, but they don't compile with the newer SDK (which has implemented these async methods) or don't work with current NuGet packages.
https://blogs.msdn.microsoft.com/avkashchauhan/2010/11/03/uploading-a-blob-to-azure-storage-with-progress-bar-and-variable-upload-block-size/
https://blogs.msdn.microsoft.com/kwill/2013/03/05/asynchronous-parallel-blob-transfers-with-progress-change-notification-2-0/
I'm a newbie to async/await threading in .NET, and was wondering if someone could help me take the code below (in a Windows Forms app) and show how I can 'hook' into the progress of the file download. I see some examples don't use the .DownloadToStream method and instead download chunks of the blob file, but since these new ...Async() methods exist in the newer Storage SDKs, I wondered if something smarter could be done.
So assuming the code below is working (non-async), what additionally would I have to do to use the blockBlob.DownloadToStreamAsync(fileStream); method, is this even the right use of it, and how can I get the progress?
Ideally I am after any way to hook into the progress of the blob download so I can update a Windows Forms UI on big downloads, so if the code below is not the right way, please enlighten me :)
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
// Retrieve reference to a blob named "photo1.jpg".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("photo1.jpg");
// Save blob contents to a file.
using (var fileStream = System.IO.File.OpenWrite(@"path\myfile"))
{
    blockBlob.DownloadToStream(fileStream);
}
Using the method kindly suggested by Gaurav (downloading 1 MB chunks), I have implemented the download with a background worker so I can update the UI as I go.
The main part inside the do loop, which downloads the range to a stream and then writes the stream to the file system, I haven't touched from the original example, but I have added code to update the worker progress and to listen for worker cancellation (to abort the download). Not sure if this could be the issue?
For completeness, below is everything inside the worker_DoWork method:
public void worker_DoWork(object sender, DoWorkEventArgs e)
{
    object[] parameters = e.Argument as object[];
    string localFile = (string)parameters[0];
    string blobName = (string)parameters[1];
    string blobContainerName = (string)parameters[2];
    CloudBlobClient client = (CloudBlobClient)parameters[3];
    try
    {
        int segmentSize = 1 * 1024 * 1024; //1 MB chunk
        var blobContainer = client.GetContainerReference(blobContainerName);
        var blob = blobContainer.GetBlockBlobReference(blobName);
        blob.FetchAttributes();
        blobLengthRemaining = blob.Properties.Length;
        blobLength = blob.Properties.Length;
        long startPosition = 0;
        do
        {
            long blockSize = Math.Min(segmentSize, blobLengthRemaining);
            byte[] blobContents = new byte[blockSize];
            using (MemoryStream ms = new MemoryStream())
            {
                blob.DownloadRangeToStream(ms, startPosition, blockSize);
                ms.Position = 0;
                ms.Read(blobContents, 0, blobContents.Length);
                using (FileStream fs = new FileStream(localFile, FileMode.OpenOrCreate))
                {
                    fs.Position = startPosition;
                    fs.Write(blobContents, 0, blobContents.Length);
                }
            }
            startPosition += blockSize;
            blobLengthRemaining -= blockSize;
            if (blobLength > 0)
            {
                decimal totalSize = Convert.ToDecimal(blobLength);
                decimal downloaded = totalSize - Convert.ToDecimal(blobLengthRemaining);
                decimal blobPercent = (downloaded / totalSize) * 100;
                worker.ReportProgress(Convert.ToInt32(blobPercent));
            }
            if (worker.CancellationPending)
            {
                e.Cancel = true;
                blobDownloadCancelled = true;
                return;
            }
        }
        while (blobLengthRemaining > 0);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
This is working, but on bigger files (30 MB, for example), I sometimes get a 'can't write to file as open in another process' error and the process fails.
Using your code:
using (var fileStream = System.IO.File.OpenWrite(@"path\myfile"))
{
    blockBlob.DownloadToStream(fileStream);
}
It is not possible to show progress with this code because the call returns only when the download is complete. Internally, the DownloadToStream function splits a large blob into chunks and downloads the chunks.
What you need to do is download these chunks yourself using the DownloadRangeToStream method. I answered a similar question some time back that you may find useful: Azure download blob part.
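A minimal sketch of the chunked approach follows. To keep it runnable without the storage SDK, the range download is passed in as a delegate: rangeReader is a hypothetical stand-in for a call such as blob.DownloadRangeToStream(output, offset, length), and reportPercent stands in for something like worker.ReportProgress. The names ChunkedCopy and Download are illustrative only.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Downloads totalLength bytes in segments and reports the percentage after each one.
    // rangeReader(offset, length, destination) must append that byte range to destination.
    public static void Download(
        long totalLength,
        Action<long, long, Stream> rangeReader,
        Stream output,
        Action<int> reportPercent,
        int segmentSize = 1024 * 1024)
    {
        long position = 0;
        while (position < totalLength)
        {
            long count = Math.Min(segmentSize, totalLength - position);
            rangeReader(position, count, output);       // one range per iteration
            position += count;
            reportPercent?.Invoke((int)(position * 100 / totalLength));
        }
    }
}
```

In this sketch the output stream is opened once by the caller and reused for every chunk, rather than reopening the file on each iteration. The total length would come from blob.Properties.Length after FetchAttributes(), as in the worker code above.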
