I tested and found that the fastest way to upload a folder full of files to an Azure Blob Storage container (with the v12 Azure Storage SDK) is to build a Queue (Queue<Task<Response<BlobContentInfo>>>), fire off an UploadAsync() for each file, and finish with a Task.WhenAll(). The blob client coordinates the configured maximum concurrency and so on. However, I need to show total progress across all files, and when I use the same ProgressHandler for each blob client, the Progress<long> event only reports the cumulative bytes uploaded for a single file, with no context identifying which file's progress is being reported. How can I get the cumulative progress of multiple async uploads?
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
public async Task UploadBlobs(string sourcePath)
{
var blobContainerClient = new BlobContainerClient(connectionString, containerName);
var tasks = new Queue<Task<Response<BlobContentInfo>>>();
var progressHandler = new Progress<long>();
progressHandler.ProgressChanged += UploadProgressChanged;
var options = new BlobUploadOptions()
{
ProgressHandler = progressHandler,
TransferOptions = new StorageTransferOptions()
{
MaximumConcurrency = Environment.ProcessorCount * 2,
MaximumTransferSize = 50 * 1024 * 1024
}
};
foreach (string filePath in Directory.GetFiles(sourcePath))
{
string fileName = Path.GetFileName(filePath);
Console.WriteLine($"Uploading {fileName}\r\n");
var blobClient = blobContainerClient.GetBlobClient(fileName);
tasks.Enqueue(blobClient.UploadAsync(filePath, options));
}
await Task.WhenAll(tasks);
}
private void UploadProgressChanged(object sender, long bytesUploaded)
{
// This handles every progress change for every file, but with no file context info!
}
One possible solution for tracking an individual blob's upload progress is to create your own progress handler and pass the BlobClient to it.
Here's a rudimentary implementation of this:
public class BlobUploadProgressChange : Progress<long>
{
private readonly BlobClient _blobClient;
public BlobUploadProgressChange(BlobClient blobClient) : base()
{
_blobClient = blobClient;
}
public BlobClient BlobClient
{
get { return _blobClient; }
}
}
This is how you would modify the UploadProgressChanged event handler:
void UploadProgressChanged(object? sender, long bytesUploaded)
{
    if (sender is not BlobUploadProgressChange item) return;
    Console.WriteLine(item.BlobClient.Name);
    Console.WriteLine($"Bytes uploaded: {bytesUploaded}");
    Console.WriteLine("====================================");
}
and this is how you can use it:
foreach (string filePath in Directory.GetFiles(sourcePath))
{
string fileName = Path.GetFileName(filePath);
Console.WriteLine($"Uploading {fileName}\r\n");
var blobClient = blobContainerClient.GetBlobClient(fileName);
var progressHandler = new BlobUploadProgressChange(blobClient);
progressHandler.ProgressChanged += UploadProgressChanged;
var options = new BlobUploadOptions()
{
ProgressHandler = progressHandler
};
tasks.Enqueue(blobClient.UploadAsync(filePath, options));
}
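To get at the cumulative total the original question asks about: each ProgressChanged report is the running byte count for that one blob, so you can store the latest value per blob and sum them. A minimal sketch, assuming you record the combined size of all files up front (the _bytesPerBlob and _totalBytesToUpload fields are hypothetical bookkeeping, not part of the SDK):

// at the top of the file
using System.Collections.Concurrent;
using System.Linq;

private readonly ConcurrentDictionary<string, long> _bytesPerBlob = new ConcurrentDictionary<string, long>();
private long _totalBytesToUpload; // e.g. sum of new FileInfo(filePath).Length while enqueueing

void UploadProgressChanged(object? sender, long bytesUploaded)
{
    if (sender is not BlobUploadProgressChange item) return;
    // Each report is cumulative for this one blob, so overwrite rather than add.
    _bytesPerBlob[item.BlobClient.Name] = bytesUploaded;
    long totalUploaded = _bytesPerBlob.Values.Sum();
    Console.WriteLine($"Overall: {totalUploaded}/{_totalBytesToUpload} bytes ({100.0 * totalUploaded / _totalBytesToUpload:F1}%)");
}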
Related
I am using an Azure File share. I want to create a zip file only once, but update it multiple times (upload more files after it is created).
Is it possible to create the .zip file only once and add more files to it later without overwriting the existing files in the zip?
When I try to add more files to the .zip, it overwrites the existing files in the zip with the new file.
private static async Task OpenZipFile()
{
try
{
using (var zipFileStream = await OpenZipFileStream())
{
using (var zipFileOutputStream = CreateZipOutputStream(zipFileStream))
{
var level = 0;
zipFileOutputStream.SetLevel(level);
BlobClient blob = new BlobClient(new Uri(String.Format("https://{0}.blob.core.windows.net/{1}", "rtsatestdata", "comm/2/10029.txt")), _currentTenantTokenCredential);
var zipEntry = new ZipEntry("newtestdata")
{
Size = 1170
};
zipFileOutputStream.PutNextEntry(zipEntry);
await blob.DownloadToAsync(zipFileOutputStream);
zipFileOutputStream.CloseEntry();
}
}
}
catch (TaskCanceledException)
{
throw;
}
}
private static async Task<Stream> OpenZipFileStream()
{
BlobContainerClient mainContainer = _blobServiceClient.GetBlobContainerClient("comm");
var blobItems = mainContainer.GetBlobs(BlobTraits.Metadata, BlobStates.None);
foreach (var item in blobItems)
{
if (item.Name == "testdata.zip")
{
BlobClient blob = new BlobClient(new Uri(String.Format("https://{0}.blob.core.windows.net/{1}", "rtsatestdata", "comm/testdata.zip")), _currentTenantTokenCredential);
return await blob.OpenWriteAsync(true
, options: new BlobOpenWriteOptions
{
HttpHeaders = new BlobHttpHeaders
{
ContentType = "application/zip"
}
}
);
}
    }
    throw new FileNotFoundException("testdata.zip was not found in the container.");
}
private static ZipOutputStream CreateZipOutputStream(Stream zipFileStream)
{
return new ZipOutputStream(zipFileStream)
{
IsStreamOwner = false,
};
}
This is not possible in Azure storage. The workaround would be to download the zip, unzip it, add more files, re-zip it, and re-upload to storage.
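If you would rather patch the archive in place than fully unzip and re-zip by hand, here is a minimal sketch of that workaround using the v12 SDK and System.IO.Compression's ZipArchiveMode.Update; the blob reference and file path are placeholders:

using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

static async Task AppendToZipBlobAsync(BlobClient zipBlob, string fileToAdd)
{
    // Download the whole archive into memory (fine for small zips).
    using var buffer = new MemoryStream();
    await zipBlob.DownloadToAsync(buffer);

    // Open in Update mode and append a new entry without touching existing ones.
    using (var archive = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
    {
        ZipArchiveEntry entry = archive.CreateEntry(Path.GetFileName(fileToAdd));
        using Stream entryStream = entry.Open();
        using FileStream fileStream = File.OpenRead(fileToAdd);
        await fileStream.CopyToAsync(entryStream);
    }

    // Replace the old blob with the modified archive.
    buffer.Position = 0;
    await zipBlob.UploadAsync(buffer, overwrite: true);
}

Note that this still rewrites the whole blob on upload; it only saves you from re-compressing the existing entries yourself.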
I'm trying to print the names of the blobs in a container; however, nothing prints after the line
List<BlobItem> segment = await blobContainer.GetBlobsAsync().ToListAsync();
Full code:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using System.Data.Entity;
using Azure.Storage.Sas;
using System.Linq;
namespace IterateBlobs
{
/*Main code for iterating over blobs*/
class Program
{
static void Main(string[] args)
{
Task iterateBlobs = IterateBlobsAsync();
}
private static async Task IterateBlobsAsync()
{
String connect = "connection string";
BlobServiceClient sourceClient = new BlobServiceClient(connect);
BlobContainerClient blobContainer = sourceClient.GetBlobContainerClient("container name");
// Iterate through the blobs in a container
List<BlobItem> segment = await blobContainer.GetBlobsAsync().ToListAsync();
foreach (BlobItem blobItem in segment)
{
Console.WriteLine(blobItem.Name + " ");
BlobClient blob = blobContainer.GetBlobClient(blobItem.Name);
// Check the source file's metadata
Response<BlobProperties> propertiesResponse = await blob.GetPropertiesAsync();
BlobProperties properties = propertiesResponse.Value;
}
}
}
}
Glad that you have fixed the issue. Thanks @n0rd; posting your suggestion as an answer so that it benefits other community members.
If you are iterating over blob containers and files, you have to await the async Task. In your code you kick off the iteration, but Main does not wait for it to complete: it exits as soon as IterateBlobsAsync hands back its Task, which is why you are not seeing any blob information.
To avoid this, make Main async and await the call:
Modified Code
class Program
{
static async Task Main(string[] args)
{
    await IterateBlobsAsync();
}
private static async Task IterateBlobsAsync()
{
String connect = "connection string";
BlobServiceClient sourceClient = new BlobServiceClient(connect);
BlobContainerClient blobContainer = sourceClient.GetBlobContainerClient("container name");
// Iterate through the blobs in a container
List<BlobItem> segment = await blobContainer.GetBlobsAsync().ToListAsync(); // ToListAsync() comes from the System.Linq.Async package
foreach (BlobItem blobItem in segment)
{
Console.WriteLine(blobItem.Name + " ");
BlobClient blob = blobContainer.GetBlobClient(blobItem.Name);
// Check the source file's metadata
Response<BlobProperties> propertiesResponse = await blob.GetPropertiesAsync();
BlobProperties properties = propertiesResponse.Value;
}
}
}
Refer to the Microsoft docs for more ways to list the blobs in a container.
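If you don't want to take a dependency on System.Linq.Async for ToListAsync(), the AsyncPageable<BlobItem> returned by GetBlobsAsync() can also be consumed page by page. A small sketch (container setup as above):

await foreach (Page<BlobItem> page in blobContainer.GetBlobsAsync().AsPages(pageSizeHint: 100))
{
    foreach (BlobItem blobItem in page.Values)
    {
        Console.WriteLine(blobItem.Name);
    }
}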
I am trying to insert 100K records from blob storage (a tab-delimited file) into Azure Table Storage using an Azure Function. The insertion of those records is very unreliable: at times it inserts all 100K, and sometimes as few as 7,500 records. I am struggling to understand why it fails to insert the rest of the records, and I am not seeing any exception being thrown. Here is the code that I am using. I am not sure if I am doing something wrong here.
This doesn't happen when I run it locally; it only happens when the function is published.
[FunctionName("LargeFileProcessor")]
public async Task Run([BlobTrigger("%InputContainer%/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name, ILogger log)
{
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("AzureCloudSettings.StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("adeptra");
CloudBlockBlob cbb = container.GetBlockBlobReference(name);
using (var memoryStream = new MemoryStream())
{
await cbb.DownloadToStreamAsync(memoryStream);
string[] result = Encoding.ASCII.GetString(memoryStream.ToArray())
    .Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
var largeFiles = new List<LargeFile>();
foreach (var str in result)
{
if (string.IsNullOrWhiteSpace(str.Trim())) continue;
var largeFile = MapFlatFile(str);
largeFiles.Add(largeFile);
}
//Delete the blob
await cbb.DeleteAsync();
for (int i = 0; i < largeFiles.Count; i += 100)
{
var items = largeFiles.Skip(i).Take(100);
var success = await _service.BulkInsertLargeFileAsync(items);
if(!string.IsNullOrWhiteSpace(success))
{
throw new Exception($"Failed at loop i:{i} for reason: {success}");
}
}
}
}
public async Task<string> BulkInsertLargeFileAsync(IEnumerable<LargeFile> rows)
{
try
{
var largeFileEntity = _mapper.Map<IEnumerable<LargeFileEntity>>(rows);
var result = await _largeFileDataProvider.BulkAddCommsTemplateAsync(largeFileEntity);
return null;
}
catch (Exception exception)
{
return $"Exception at BulkInsertLargeFileAsync {exception.Message}";
}
}
public async Task<IList<TableResult>> BulkAddCommsTemplateAsync(IEnumerable<LargeFileEntity> templates)
{
return await _largeRepository.BatchInsertAsync(templates);
}
public async Task<IList<TableResult>> BatchInsertAsync(IEnumerable<TEntity> rows)
{
TableBatchOperation batchOperation = new TableBatchOperation();
foreach (var entity in rows)
{
batchOperation.InsertOrMerge(entity);
}
var result = await _table.ExecuteBatchAsync(batchOperation);
return result;
}
EDIT:
If I turn off the "elastic scale out" option in the Azure Function (minimum instances = 1, maximum burst = 1), the results are consistent (100K records inserted every time).
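As an aside, on Consumption plans the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting is, to the best of my knowledge, the usual way to cap instance count the same way. Separately, one thing worth ruling out in the table code: a TableBatchOperation must target a single PartitionKey and at most 100 operations, so chunking purely by index will throw if a chunk spans partitions. Here is a sketch of grouping by partition first; it assumes TEntity implements ITableEntity (as the code above implies) and is a guess about your data layout, not a confirmed fix:

public async Task<IList<TableResult>> BatchInsertAsync(IEnumerable<TEntity> rows)
{
    var results = new List<TableResult>();
    // Entity group transactions require one PartitionKey per batch, max 100 operations.
    foreach (var partition in rows.GroupBy(r => r.PartitionKey))
    {
        foreach (var chunk in partition.Select((e, i) => new { e, i })
                                       .GroupBy(x => x.i / 100, x => x.e))
        {
            var batch = new TableBatchOperation();
            foreach (var entity in chunk)
            {
                batch.InsertOrMerge(entity);
            }
            results.AddRange(await _table.ExecuteBatchAsync(batch));
        }
    }
    return results;
}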
While trying to access all files in an Azure blob folder, I found sample code using container.ListBlobs(); however, that looks like the old API.
Old code: container.ListBlobs();
New code I am trying: container.ListBlobsSegmentedAsync(continuationToken);
Folders are like :
Container/F1/file.json
Container/F1/F2/file.json
Container/F2/file.json
Looking for the updated version to get all files from an Azure folder.
Any sample code would help, thanks!
C# code:
//connection string
string storageAccount_connectionString = "**NOTE: CONNECTION STRING**";
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccount_connectionString);
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("**NOTE:NAME OF CONTAINER**");
//The specified container does not exist
try
{
//root directory
CloudBlobDirectory dira = container.GetDirectoryReference(string.Empty);
//true for all sub directories else false
var rootDirFolders = dira.ListBlobsSegmentedAsync(true, BlobListingDetails.Metadata, null, null, null, null).Result;
foreach (var blob in rootDirFolders.Results)
{
Console.WriteLine("Blob", blob);
}
}
catch (Exception e)
{
// Block of code to handle errors
Console.WriteLine("Error", e);
}
Here is the code for the answer:
private async Task<List<IListBlobItem>> ListBlobsAsync(CloudBlobContainer container)
{
BlobContinuationToken continuationToken = null;
List<IListBlobItem> results = new List<IListBlobItem>();
do
{
bool useFlatBlobListing = true;
BlobListingDetails blobListingDetails = BlobListingDetails.None;
int maxBlobsPerRequest = 500;
var response = await container.ListBlobsSegmentedAsync(BOAppSettings.ConfigServiceEnvironment, useFlatBlobListing, blobListingDetails, maxBlobsPerRequest, continuationToken, null, null);
continuationToken = response.ContinuationToken;
results.AddRange(response.Results);
}
while (continuationToken != null);
return results;
}
And then you can return values like:
IEnumerable<IListBlobItem> listBlobs = await this.ListBlobsAsync(container);
foreach(CloudBlockBlob cloudBlockBlob in listBlobs)
{
BOBlobFilesViewModel boBlobFilesViewModel = new BOBlobFilesViewModel
{
CacheKey = cloudBlockBlob.Name,
Name = cloudBlockBlob.Name
};
listBOBlobFilesViewModel.Add(boBlobFilesViewModel);
}
//return listBOBlobFilesViewModel;
Update:
Getting all file names from a directory with the Azure.Storage.Blobs v12 package:
var storageConnectionString = "DefaultEndpointsProtocol=...........=core.windows.net";
var blobServiceClient = new BlobServiceClient(storageConnectionString);
//get container
var container = blobServiceClient.GetBlobContainerClient("container_name");
List<string> blobNames = new List<string>();
//Enumerating the blobs may make multiple requests to the service while fetching all the values
//Blobs are ordered lexicographically by name
//if you want metadata, pass BlobTraits.Metadata
var blobHierarchyItems = container.GetBlobsByHierarchyAsync(BlobTraits.None, BlobStates.None, "/");
await foreach (var blobHierarchyItem in blobHierarchyItems)
{
//check if the blob is a virtual directory.
if (blobHierarchyItem.IsPrefix)
{
// You can also access files under nested folders in this way;
// you will need to create a function accordingly (a recursive one works; see the sketch below)
// var prefix = blobHierarchyItem.Prefix;   // e.g. "folderA/"
// var nestedItems = container.GetBlobsByHierarchyAsync(BlobTraits.None, BlobStates.None, "/", prefix);
}
else
{
blobNames.Add(blobHierarchyItem.Blob.Name);
}
}
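The recursive walk mentioned in the comments could look roughly like this (a sketch; the method name is illustrative):

static async Task CollectBlobNamesAsync(BlobContainerClient container, string prefix, List<string> blobNames)
{
    var items = container.GetBlobsByHierarchyAsync(BlobTraits.None, BlobStates.None, "/", prefix);
    await foreach (var item in items)
    {
        if (item.IsPrefix)
        {
            // Virtual directory: recurse into it.
            await CollectBlobNamesAsync(container, item.Prefix, blobNames);
        }
        else
        {
            blobNames.Add(item.Blob.Name);
        }
    }
}

That said, if you just want every file regardless of folder depth, a flat listing with container.GetBlobsAsync(prefix: "folderA/") avoids the recursion entirely.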
You can find more options and examples in the Azure.Storage.Blobs documentation, and the package itself is on NuGet (Azure.Storage.Blobs).
The method CloudBlobClient.ListBlobsSegmentedAsync returns a result segment containing a collection of the blob items in the container.
To list all blobs with the legacy SDK, we can use the ListBlobs method.
Here is a demo for your reference:
public static List<V> ListAllBlobs<T, V>(Expression<Func<T, V>> expression, string containerName,string prefix)
{
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("YourConnectionString;");
CloudBlobClient cloudBlobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = cloudBlobClient.GetContainerReference(containerName);
container.CreateIfNotExists();
var list = container.ListBlobs(prefix: prefix,useFlatBlobListing: true);
List<V> data = list.OfType<T>().Select(expression.Compile()).ToList();
return data;
}
Usage (screenshots omitted): the same method can list all blobs' names under one folder or all blobs' URLs under one folder, depending on the selector expression you pass.
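For example, calls along these lines would produce those two listings (container and folder names are placeholders):

// All blob names under one folder:
List<string> names = ListAllBlobs<CloudBlockBlob, string>(b => b.Name, "mycontainer", "myfolder/");

// All blob URLs under one folder:
List<Uri> urls = ListAllBlobs<CloudBlockBlob, Uri>(b => b.Uri, "mycontainer", "myfolder/");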
I am trying to upload an image to our S3 service. The following code executes without errors, but I can't access the file I uploaded, and the file isn't in the bucket either.
Another problem is that I don't get any real progress feedback beyond the percentage done.
private AmazonUploader()
{
_access = "MyKey###";
_secret = "MySecret###";
AmazonS3Config config = new AmazonS3Config ();
config.ServiceURL = "s3-eu-west-1.amazonaws.com";
config.UseHttp = true;
config.RegionEndpoint = Amazon.RegionEndpoint.EUWest1;
_client = new AmazonS3Client (_access, _secret, config);
_trans = new TransferUtility (_client);
}
public async Task UploadImage(string path, string key)
{
TransferUtilityUploadRequest up = new TransferUtilityUploadRequest();
up.BucketName = "myapp/uploads";
up.FilePath = path;
up.Key = key;
up.UploadProgressEvent += up_UploadProgressEvent;
await _trans.UploadAsync(up);
}
private void up_UploadProgressEvent(object sender, UploadProgressArgs e)
{
if (e.PercentDone == 100)
{
System.Diagnostics.Debug.WriteLine ("Done uploading");
if (OnUploadComplete != null) OnUploadComplete ();
}
}
Your BucketName seems invalid. Bucket names can't contain slashes. Maybe you want "uploads" to be part of the Key?
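Under that assumption, the corrected setup would look something like this (names are illustrative):

up.BucketName = "myapp";        // the bucket name alone, no slashes
up.Key = "uploads/" + key;      // the "folder" becomes a prefix on the object key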