With Microsoft.WindowsAzure.Storage.Blob, one could enumerate a large blob container by passing a BlobContinuationToken to the ListBlobsSegmentedAsync method; the returned BlobResultSegment would include a ContinuationToken property whenever a subsequent call was required to retrieve more of the container.
With Azure.Storage.Blobs, the enumeration API appears to be BlobContainerClient.GetBlobsAsync, but the method signature provides no means of supplying a continuation token, and the Azure.AsyncPageable<BlobItem> result does not appear to expose a continuation token property.
Does anyone have a code snippet for enumerating large BLOB containers on Azure, using continuation tokens, via the Azure.Storage.Blobs package?
If you really want to use a continuation token you can do so:
string continuationToken = null;
var container = new BlobContainerClient("xxx", "test");
var blobPages = container.GetBlobsAsync().AsPages();
await foreach (var page in blobPages)
{
    continuationToken = page.ContinuationToken;
    foreach (var blob in page.Values)
    {
        // process the blob
    }
}
You can pass the token to AsPages to continue iteration:
blobPages = container.GetBlobsAsync().AsPages(continuationToken: continuationToken);
or you can simply iterate over the blobs and let the SDK do the work of fetching all the data, treating the listing as a single stream of blobs:
var container = new BlobContainerClient("xxx", "test");
var blobs = container.GetBlobsAsync();
await foreach (var blob in blobs)
{
    // process the blob
}
Iterating over the blobs can make multiple calls to the blob service.
For more about pagination in the SDK see the docs.
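Putting the two pieces together, here is a minimal sketch (the connectionString variable, container name "test", and page size of 100 are illustrative) that enumerates a large container in pages and can resume from a persisted token:
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

string continuationToken = null; // load a previously persisted token here to resume

var container = new BlobContainerClient(connectionString, "test");
await foreach (var page in container.GetBlobsAsync()
    .AsPages(continuationToken: continuationToken, pageSizeHint: 100))
{
    foreach (BlobItem blob in page.Values)
    {
        Console.WriteLine(blob.Name);
    }

    // Persist page.ContinuationToken here if the process may stop before finishing.
    continuationToken = page.ContinuationToken;
}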
I'm a complete noob at C# and know very little about Azure APIs; I'm a CS student doing a project for work. Following YouTube tutorials, I built some middleware that authenticates with a storage account using a connection string and enumerates, uploads, downloads, and deletes blobs within a container. The issue I'm having lies ONLY with the downloading functionality, and ONLY when the storage account access is set to private; the function works fine with anonymous access. I suspect the issue is with how I build the URL, and I'm not sure how to fix it. The blobs are mainly CSV data, if that matters. Any help or direction to resources would be greatly appreciated 🙂 Here is the relevant code:
The URL function
public async Task<string> GetBlob(string name, string containerName)
{
    var containerClient = _blobClient.GetBlobContainerClient(containerName);
    var blobClient = containerClient.GetBlobClient(name);
    return blobClient.Uri.AbsoluteUri;
}
The config file
"AllowedHosts": "*",
"BlobConnection" : "<mystringconnection>******==;EndpointSuffix=core.windows.net"
action request
[HttpGet]
public async Task<IActionResult> ViewFile(string name)
{
    var res = await _blobService.GetBlob(name, "<mystorageacc>");
    return Redirect(res);
}
The reason you are not able to download the blobs from a private container is that your method simply returns the blob's URL without any authorization information. Requests for blobs in a private container must be authorized.
What you would need to do is create a Shared Access Signature (SAS) with at least Read permission and then return that SAS URL. The method you would want to use is GenerateSasUri. Your code would be something like:
public async Task<string> GetBlob(string name, string containerName)
{
    var containerClient = _blobClient.GetBlobContainerClient(containerName);
    var blobClient = containerClient.GetBlobClient(name);
    // GenerateSasUri returns the SAS URL directly as a Uri.
    return blobClient.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddMinutes(5)).AbsoluteUri;
}
This will give you a link that is valid for 5 minutes from the time of creation and grants permission to read (download) the blob. Note that GenerateSasUri only works when the BlobClient is constructed with a connection string or a StorageSharedKeyCredential; otherwise the client has no key to sign the SAS with.
If you want to download the blob's contents from the blob service:
public async Task<byte[]> ReadFileAsync(string path)
{
    using var ms = new MemoryStream();
    var blob = _client.GetBlobClient(path);
    await blob.DownloadToAsync(ms);
    return ms.ToArray();
}
If you want to return the file's byte array from a controller, you can check this:
https://stackoverflow.com/a/3605510/3024129
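For the controller side (the approach the linked answer describes), a minimal sketch, assuming the ReadFileAsync method above is exposed through a hypothetical _blobService and the blobs are CSVs as in the question:
[HttpGet]
public async Task<IActionResult> DownloadFile(string name)
{
    // ReadFileAsync is the helper shown above; _blobService is an assumed wrapper.
    byte[] bytes = await _blobService.ReadFileAsync(name);

    // "text/csv" matches the CSV blobs mentioned in the question; adjust as needed.
    return File(bytes, "text/csv", name);
}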
If you want to configure a container's public access level instead:
https://learn.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure.
Or you can connect with Azure Storage Explorer and change the access level there, which is the easy way.
This worked for me by returning a byte array:
byte[] base64ImageRepresentation = Array.Empty<byte>();
// BlobClient takes (connectionString, containerName, blobName); there is no need to build a path by hand.
BlobClient blobClient = new BlobClient(blobConnectionString, blobContainerUserDocs, fileName);
if (await blobClient.ExistsAsync())
{
    using var ms = new MemoryStream();
    await blobClient.DownloadToAsync(ms);
    return ms.ToArray();
}
return base64ImageRepresentation;
I'm struggling with fetching metadata from BlobItem when fetching blobs from Azure Storage.
I'm definitely missing something but can't figure out what or where.
Here is a simple block of code where I fetch BlobItems from a test container.
Everything is fine with var properties, as I successfully fetch the container's metadata, but when I try to read the metadata of a blob item inside the while loop, it returns null.
var containerClient = _blobServiceClient.GetBlobContainerClient(AzureStorageHelpers.BlobContainers.Files);
var properties = await containerClient.GetPropertiesAsync();
var blobs = containerClient.GetBlobsAsync();
var enumerator = blobs.GetAsyncEnumerator();
var blobList = new List<BlobItem>();
try
{
    while (await enumerator.MoveNextAsync())
    {
        var blobItem = enumerator.Current;
        var metaData = enumerator.Current.Metadata;
        var dwaw = blobItem.Metadata["Name"];
        blobList.Add(blobItem);
    }
}
finally
{
    await enumerator.DisposeAsync();
}
I'm getting a null value, yet on Azure I can clearly see that I have defined some test metadata properties.
I'm using .NET Core 2.2 with the NuGet package Azure.Storage.Blobs (12.5.1).
Try passing BlobTraits.Metadata when fetching the blobs:
var blobs = containerClient.GetBlobsAsync(BlobTraits.Metadata);
By default, GetBlobsAsync does not populate metadata; you have to request it explicitly via BlobTraits.
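With that change in place, the loop can also be simplified to await foreach; a minimal sketch reusing the container from the question:
// Request metadata explicitly; without BlobTraits.Metadata the dictionary stays empty.
var containerClient = _blobServiceClient.GetBlobContainerClient(AzureStorageHelpers.BlobContainers.Files);

await foreach (BlobItem blobItem in containerClient.GetBlobsAsync(BlobTraits.Metadata))
{
    // TryGetValue avoids a KeyNotFoundException for blobs without a "Name" entry.
    if (blobItem.Metadata.TryGetValue("Name", out var name))
    {
        Console.WriteLine($"{blobItem.Name}: {name}");
    }
}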
We have a Parquet format file (500 MB) located in Azure Blob Storage. How can we read the file directly from the blob and save it in memory in C#, e.g. in a DataTable?
I am able to read a Parquet file that is physically located in a folder using the code below.
public void ReadParquetFile()
{
    using (Stream fileStream = System.IO.File.OpenRead("D:/../userdata1.parquet"))
    {
        using (var parquetReader = new ParquetReader(fileStream))
        {
            DataField[] dataFields = parquetReader.Schema.GetDataFields();
            for (int i = 0; i < parquetReader.RowGroupCount; i++)
            {
                using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                {
                    DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();
                    DataColumn firstColumn = columns[0];
                    Array data = firstColumn.Data;
                    //int[] ids = (int[])data;
                }
            }
        }
    }
}
(I am able to read a CSV file directly from the blob using a source stream.) Please suggest the fastest method to read the Parquet file directly from the blob.
In my experience, the solution for reading the Parquet file directly from the blob is to first generate the blob URL with a SAS token, then get a stream from HttpClient for that URL, and finally read the HTTP response stream via ParquetReader.
First, please refer to the sample code below from the section Create a service SAS for a blob of the official document Create a service SAS for a container or blob with .NET, using the Azure Blob Storage SDK for .NET Core.
private static string GetBlobSasUri(CloudBlobContainer container, string blobName, string policyName = null)
{
    string sasBlobToken;

    // Get a reference to a blob within the container.
    // Note that the blob may not exist yet, but a SAS can still be created for it.
    CloudBlockBlob blob = container.GetBlockBlobReference(blobName);

    if (policyName == null)
    {
        // Create a new access policy and define its constraints.
        // Note that the SharedAccessBlobPolicy class is used both to define the parameters of an ad hoc SAS, and
        // to construct a shared access policy that is saved to the container's shared access policies.
        SharedAccessBlobPolicy adHocSAS = new SharedAccessBlobPolicy()
        {
            // When the start time for the SAS is omitted, the start time is assumed to be the time when the storage service receives the request.
            // Omitting the start time for a SAS that is effective immediately helps to avoid clock skew.
            SharedAccessExpiryTime = DateTime.UtcNow.AddHours(24),
            Permissions = SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.Create
        };

        // Generate the shared access signature on the blob, setting the constraints directly on the signature.
        sasBlobToken = blob.GetSharedAccessSignature(adHocSAS);

        Console.WriteLine("SAS for blob (ad hoc): {0}", sasBlobToken);
        Console.WriteLine();
    }
    else
    {
        // Generate the shared access signature on the blob. In this case, all of the constraints for the
        // shared access signature are specified on the container's stored access policy.
        sasBlobToken = blob.GetSharedAccessSignature(null, policyName);

        Console.WriteLine("SAS for blob (stored access policy): {0}", sasBlobToken);
        Console.WriteLine();
    }

    // Return the URI string for the container, including the SAS token.
    return blob.Uri + sasBlobToken;
}
Then get the HTTP response stream from HttpClient using the URL with the SAS token:
var blobUrlWithSAS = GetBlobSasUri(container, blobName);
var client = new HttpClient();
var stream = await client.GetStreamAsync(blobUrlWithSAS);
Finally, read it via ParquetReader; the code comes from the Reading Data section of the GitHub repo aloneguid/parquet-dotnet:
var options = new ParquetOptions { TreatByteArrayAsString = true };
var reader = new ParquetReader(stream, options);
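To land the data in a DataTable, as the question asked, here is a hedged sketch using the same Parquet.Net API as above; ToDataTable is a name I made up, and it reads every row group of the stream into a System.Data.DataTable:
using System;
using System.Data;
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;
// Alias to avoid clashing with System.Data.DataColumn.
using ParquetColumn = Parquet.Data.DataColumn;

public static DataTable ToDataTable(Stream parquetStream)
{
    var table = new DataTable();
    var options = new ParquetOptions { TreatByteArrayAsString = true };

    using (var reader = new ParquetReader(parquetStream, options))
    {
        DataField[] fields = reader.Schema.GetDataFields();

        for (int g = 0; g < reader.RowGroupCount; g++)
        {
            using (ParquetRowGroupReader groupReader = reader.OpenRowGroupReader(g))
            {
                ParquetColumn[] columns = fields.Select(groupReader.ReadColumn).ToArray();

                // Create the DataTable columns once, using the CLR element types
                // of the arrays returned for the first row group.
                if (table.Columns.Count == 0)
                {
                    for (int i = 0; i < fields.Length; i++)
                    {
                        Type elementType = columns[i].Data.GetType().GetElementType();
                        table.Columns.Add(fields[i].Name,
                            Nullable.GetUnderlyingType(elementType) ?? elementType);
                    }
                }

                int rowCount = columns.Min(c => c.Data.Length);
                for (int r = 0; r < rowCount; r++)
                {
                    // Nulls in the Parquet data become DBNull in the DataTable.
                    table.Rows.Add(columns
                        .Select(c => c.Data.GetValue(r) ?? (object)DBNull.Value)
                        .ToArray());
                }
            }
        }
    }

    return table;
}
Keep in mind that materializing a 500 MB Parquet file as a DataTable can consume considerably more memory than the file itself, since DataTable rows box every value.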
Assembly in use: Microsoft.WindowsAzure.Storage, Version=9.3.1.0
What I want to do:
In my Azure storage, I have images stored as blobs under virtual folders.
I want to get the URLs of all the image blobs along with their last modified timestamp.
Please note that Image1 and Image4 might have the same name.
What I have tried:
I tried ListBlobsSegmentedAsync(BlobContinuationToken currentToken) from the root of the container and by using GetDirectoryReference(string relativeAddress) but couldn't get the desired result.
Though a bit off track, I am able to get the blob details by GetBlockBlobReference(string blobName);
What should I do?
Thanks in advance.
The ListBlobsSegmentedAsync method has 2 overloads that contain the useFlatBlobListing argument. These overloads accept 7 or 8 arguments, and I count 6 in your code.
Use the following code to list all blobs in the container.
public static async Task test()
{
    StorageCredentials storageCredentials = new StorageCredentials("xxx", "xxxxx");
    CloudStorageAccount storageAccount = new CloudStorageAccount(storageCredentials, true);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference("container");

    BlobContinuationToken blobContinuationToken = null;
    do
    {
        var resultSegment = await container.ListBlobsSegmentedAsync(
            prefix: null,
            useFlatBlobListing: true,
            blobListingDetails: BlobListingDetails.None,
            maxResults: null,
            currentToken: blobContinuationToken,
            options: null,
            operationContext: null
        );

        // Get the value of the continuation token returned by the listing call.
        blobContinuationToken = resultSegment.ContinuationToken;

        foreach (IListBlobItem item in resultSegment.Results)
        {
            Console.WriteLine(item.Uri);
        }
    } while (blobContinuationToken != null); // Loop until no more segments remain.
}
Please try this override of ListBlobsSegmentedAsync with the following parameters:
prefix: ""
useFlatBlobListing: true
blobListingDetails: BlobListingDetails.All
maxResults: 5000
currentToken: null or the continuation token returned
This will return a list of all blobs (including those inside virtual folders).
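Since the question also asked for each blob's last-modified timestamp, here is a hedged sketch combining the parameters above with the classic SDK (the container variable is assumed to be set up as in the previous answer):
BlobContinuationToken token = null;
do
{
    var segment = await container.ListBlobsSegmentedAsync(
        prefix: "",
        useFlatBlobListing: true,
        blobListingDetails: BlobListingDetails.All,
        maxResults: 5000,
        currentToken: token,
        options: null,
        operationContext: null);

    foreach (IListBlobItem item in segment.Results)
    {
        // With a flat listing there are no CloudBlobDirectory items,
        // so every result is a concrete blob with populated properties.
        if (item is CloudBlob blob)
        {
            Console.WriteLine($"{blob.Uri} last modified {blob.Properties.LastModified}");
        }
    }

    token = segment.ContinuationToken;
} while (token != null);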
I have an Azure Function which extracts the names and file URLs from a blob container and then sends this information to another function that processes these files (unpacks them and saves them to Data Lake).
For extraction of the blobs:
string storageConnectionString = @"myconnstring";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("Container");

IEnumerable<IListBlobItem> blobs = new IListBlobItem[0];
foreach (IListBlobItem blobItem in container.ListBlobs())
{
    if (blobItem is CloudBlobDirectory)
    {
        CloudBlobDirectory directory = (CloudBlobDirectory)blobItem;
        blobs = directory.ListBlobs(true);
    }
}
await ProcessBlobs(blobs);
and the function to process the blobs:
public static async Task ProcessBlobs(IEnumerable<IListBlobItem> blobs)
{
    var tasks = blobs.Select(currentblob =>
    {
        string FileUrl = currentblob.Uri.ToString();
        string FileName = currentblob.Uri.Segments.Last();

        //string content = "{ \"fileUri\": \"" + currentblob.Uri.ToString() + "\" , \"fileName\": \"" + currentblob.Uri.Segments.Last() + "\"}";
        var values = new Dictionary<string, string>
        {
            { "fileUri", currentblob.Uri.ToString() },
            { "fileName", currentblob.Uri.Segments.Last() }
        };
        var content = new FormUrlEncodedContent(values);

        string baseURL = @"https://<afu>.azurewebsites.net/api/process_zip_files_by_http_trigger?code=45";
        //string urlToInvoke = string.Format("{0}&name={1}", baseURL, FileUrl, FileName);
        return RunAsync(baseURL, content);
    });

    await Task.WhenAll(tasks);
}

public static async Task RunAsync(string i_URL, FormUrlEncodedContent content)
{
    var response = await client.PostAsync(i_URL, content);
    var responseString = await response.Content.ReadAsStringAsync();
    log.LogInformation(responseString); // assuming 'log' is an ILogger instance
}
The RunAsync function processes the files asynchronously.
My question now:
Is it generally possible to process blobs in parallel, but in a coordinated way? Do you have a better or simpler idea for implementing my aim?
This is one of the best use cases for Durable Functions. Specifically, the Fan-Out/Fan-In Pattern.
You would need 4 functions in total
GetBlobList Activity Function
This is where you get the list of blobs to process. You would return just the blob paths rather than the actual blob contents.
ProcessBlob Activity Function
This function takes a blob path, fetches the blob and processes it.
Orchestrator Function
This is the function that calls the GetBlobList function, loops over the list of blob paths returned and calls the ProcessBlob function for each blob path.
Starter Function (Client Function)
This function simply triggers a run of the orchestration. It is usually an HTTP-triggered function.
If you are new to Durable Functions, it's best to walk through the quickstart doc to understand the different types of functions required.
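For illustration, a minimal hedged sketch of the pattern with Durable Functions 2.x; the function names GetBlobList and ProcessBlob are placeholders, and the activity bodies are stubs where the enumeration and processing logic from the question would go:
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class BlobOrchestration
{
    [FunctionName("Orchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Fan-out: get the blob paths, then start one activity per blob.
        var blobPaths = await context.CallActivityAsync<List<string>>("GetBlobList", null);

        var tasks = new List<Task>();
        foreach (var path in blobPaths)
        {
            tasks.Add(context.CallActivityAsync("ProcessBlob", path));
        }

        // Fan-in: wait until every blob has been processed.
        await Task.WhenAll(tasks);
    }

    [FunctionName("GetBlobList")]
    public static List<string> GetBlobList([ActivityTrigger] object input)
    {
        // Enumerate the container here (as in the question) and return paths only.
        return new List<string>();
    }

    [FunctionName("ProcessBlob")]
    public static Task ProcessBlob([ActivityTrigger] string blobPath)
    {
        // Fetch and process a single blob here (unpack and save to Data Lake).
        return Task.CompletedTask;
    }

    [FunctionName("HttpStart")]
    public static async Task<IActionResult> HttpStart(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        string instanceId = await starter.StartNewAsync("Orchestrator");
        return new OkObjectResult($"Started orchestration {instanceId}");
    }
}
This gives you the parallelism of Task.WhenAll with the added benefits of checkpointing and automatic retries, so a crash midway does not reprocess already-completed blobs.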