How to set ContentMD5 in DataLakeFileClient? - c#

When uploading to Azure Data Lake with Microsoft Azure Storage Explorer, a value for the file's ContentMD5 property is generated and stored automatically. The same happens in a function app that uses a Blob binding.
However, the value is not generated automatically when uploading from a C# DLL.
I want to use this value to compare files in the future.
My code for the upload is very simple.
DataLakeFileClient fileClient = await directoryClient.CreateFileAsync("testfile.txt");
await fileClient.UploadAsync(fileStream);
I also know I can generate an MD5 using the below code, but I'm not certain if this is the same way that Azure Storage Explorer does it.
using (var md5gen = MD5.Create())
{
    md5hash = md5gen.ComputeHash(fileStream);
}
but I have no idea how to set this value to the ContentMD5 property of the file.

I have found the solution.
The UploadAsync method has an overload that accepts a parameter of type DataLakeFileUploadOptions. This class contains an HttpHeaders object, which in turn has a ContentHash property that is stored with the file.
var uploadOptions = new DataLakeFileUploadOptions();
uploadOptions.HttpHeaders = new PathHttpHeaders();
uploadOptions.HttpHeaders.ContentHash = md5hash;
await fileClient.UploadAsync(fileStream, uploadOptions);
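One detail worth noting: MD5.ComputeHash reads the stream to its end, so the stream has to be rewound before uploading or zero bytes get written. A minimal end-to-end sketch of the whole flow, assuming directoryClient and a seekable fileStream already exist:
byte[] md5hash;
using (var md5gen = MD5.Create())
{
    md5hash = md5gen.ComputeHash(fileStream);
}

// ComputeHash leaves the stream positioned at the end; rewind it
// so UploadAsync reads the actual content (requires a seekable stream).
fileStream.Position = 0;

DataLakeFileClient fileClient = await directoryClient.CreateFileAsync("testfile.txt");

var uploadOptions = new DataLakeFileUploadOptions
{
    HttpHeaders = new PathHttpHeaders { ContentHash = md5hash }
};
await fileClient.UploadAsync(fileStream, uploadOptions);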

Related

Getting big data through SignalR - Blazor

I have a component library that uses JS code to generate an image as a base64 string, and the image needs to be transferred to C#. The image size is larger than MaximumReceiveMessageSize.
Can I get the value of the MaximumReceiveMessageSize property in C#? I need a way to correctly split the picture into chunks, or some other way to transfer it.
My component can be used in a WASM or Server application. I can't change the value of the MaximumReceiveMessageSize property.
Thanks
Using a stream, as described in Stream from JavaScript to .NET, solved my problem.
From Microsoft docs:
In JavaScript:
function streamToDotNet() {
    return new Uint8Array(10000000);
}
In C# code:
var dataReference = await JS.InvokeAsync<IJSStreamReference>("streamToDotNet");
using var dataReferenceStream = await dataReference.OpenReadStreamAsync(maxAllowedSize: 10_000_000);
var outputPath = Path.Combine(Path.GetTempPath(), "file.txt");
using var outputFileStream = File.OpenWrite(outputPath);
await dataReferenceStream.CopyToAsync(outputFileStream);
In the preceding example: JS is an injected IJSRuntime instance. The dataReferenceStream is written to disk (file.txt) at the current user's temporary folder path (GetTempPath).
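For reference, a minimal sketch of how this might look inside a Blazor component that keeps the image bytes in memory instead of writing them to disk. The streamToDotNet function name and the 10 MB cap are carried over from the docs example; imageBytes and LoadImageAsync are names made up for this sketch:
@inject IJSRuntime JS

@code {
    private byte[] imageBytes;

    private async Task LoadImageAsync()
    {
        // The payload is streamed in chunks, which is why it is not
        // blocked by MaximumReceiveMessageSize.
        var dataReference = await JS.InvokeAsync<IJSStreamReference>("streamToDotNet");
        await using var dataStream = await dataReference.OpenReadStreamAsync(maxAllowedSize: 10_000_000);

        using var buffer = new MemoryStream();
        await dataStream.CopyToAsync(buffer);
        imageBytes = buffer.ToArray();
    }
}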

Get how much disk is being used in Azure Data Lake

I'm using Azure.Storage.Files.DataLake to programmatically interact with Azure.
It looks like I can get the size of a given file with
var props = await fileClient.GetPropertiesAsync();
var fileSizeInBytes = props.Value.ContentLength;
(where fileClient is an instance of DataLakeFileClient)
The DataLakeDirectoryClient class offers a similar API, but there ContentLength is always zero.
I would like to know how to get the size of:
a directory (with DataLakeDirectoryClient)
a file system (with DataLakeFileSystemClient)
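One possible approach for both cases is to enumerate every entry recursively and sum the file sizes. A sketch, assuming the GetPathsAsync enumeration from Azure.Storage.Files.DataLake and a helper name chosen for illustration:
using System.Threading.Tasks;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Sums the sizes of all files under a path; pass path: null to measure
// the whole file system. Directory entries report no ContentLength,
// so only file entries contribute to the total.
static async Task<long> GetTotalSizeAsync(DataLakeFileSystemClient fileSystemClient, string path = null)
{
    long totalBytes = 0;
    await foreach (PathItem item in fileSystemClient.GetPathsAsync(path, recursive: true))
    {
        if (item.IsDirectory != true)
        {
            totalBytes += item.ContentLength ?? 0;
        }
    }
    return totalBytes;
}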

Firebase Storage read or query existing data with .NET C#

Environment: VS project, .NET, C#
I've implemented uploading documents to my Firebase Storage Bucket via the example in the link below:
How to Upload File to Firebase Storage in .Net C# Windows Form?
I'm trying to find documentation on how to use the same library/functionality to read a file that I've manually uploaded to my Bucket.
In essence: how to 'peek' or 'read' a file that is already on Storage? I basically want to query data inside an existing csv file.
So far I've found documentation only here, which doesn't provide much in terms of a possible solution, at least as far as I can understand it...
Firebase Storage Introduction
There is seemingly more related information on the same page in the 'Firebase Store' section, but that isn't the same as Firebase Storage :/
Any ideas?
Looking at the docs, it seems you can read files by downloading them.
var client = StorageClient.Create();
// Create a bucket with a globally unique name
var bucketName = Guid.NewGuid().ToString();
var bucket = client.CreateBucket(projectId, bucketName);
// Upload some files
var content = Encoding.UTF8.GetBytes("hello, world");
var obj1 = client.UploadObject(bucketName, "file1.txt", "text/plain", new MemoryStream(content));
var obj2 = client.UploadObject(bucketName, "folder1/file2.txt", "text/plain", new MemoryStream(content));
// List objects
foreach (var obj in client.ListObjects(bucketName, ""))
{
    Console.WriteLine(obj.Name);
}
// Download file
using (var stream = File.OpenWrite("file1.txt"))
{
    client.DownloadObject(bucketName, "file1.txt", stream);
}
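Since the goal is to query data inside an existing CSV file, a minimal sketch that downloads the object into memory and reads it line by line; the bucket and object names are placeholders:
using System;
using System.IO;
using Google.Cloud.Storage.V1;

var client = StorageClient.Create();

// Download the existing object into memory instead of a file on disk.
using var memoryStream = new MemoryStream();
client.DownloadObject("your-bucket-name", "data/existing.csv", memoryStream);

// Rewind, then read the CSV content line by line.
memoryStream.Position = 0;
using var reader = new StreamReader(memoryStream);
string line;
while ((line = reader.ReadLine()) != null)
{
    var fields = line.Split(',');
    Console.WriteLine(string.Join(" | ", fields));
}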

ContentHash is null in Azure.Storage.Blobs v12.x.x

I am trying to upgrade my project from Microsoft.WindowsAzure.Storage v9 (deprecated) to the latest SDK, Azure.Storage.Blobs v12.
My issue (post-upgrade) is accessing the ContentHash property.
Pre-upgrade steps:
upload file to blob
get MD5 hash of uploaded file provided by CloudBlob.Properties.ContentMD5 from Microsoft.WindowsAzure.Storage.Blob
compare the calculated MD5 hash with the one retrieved from azure
Post-upgrade attempts to access the MD5 hash that Azure calculates on its side:
1. BlobClient.GetProperties(), calling this method directly
2. BlobClient.UploadAsync(), looking at the BlobContentInfo response
Both return ContentHash as null. (See below for why.)
One huge difference I've noticed is that with the older SDK I could tell the storage client to compute MD5 like this:
CloudBlobClient cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions.StoreBlobContentMD5 = true;
So I was expecting to find something similar to StoreBlobContentMD5 in the latest SDK, but I couldn't.
Can anyone help me find a solution for this problem?
Edit 1:
I did a test, and in Azure Storage I do not have an MD5 hash.
Upload code:
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlobClient(blobPath);
BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}
StorageTransferOptions storageTransferOption = new StorageTransferOptions()
{
    MaximumConcurrency = 2,
};
var blobResponse = await blob.UploadAsync(stream, blobHttpHeaders, null, null, null, null, storageTransferOption, default);
return blob.GetProperties();
There is not much difference between the old upload code and the new one, apart from using the new classes from the new SDK.
The main difference remains the one I already stated: I cannot find an equivalent setting for StoreBlobContentMD5 in the new SDK.
I think this is the problem. I need to set the storage client to compute the MD5 hash, as I did with the old SDK.
Edit 2:
For download I can do something like this:
var properties = blob.GetProperties();
var download = await blob.DownloadAsync(range: new HttpRange(0, properties.Value.ContentLength), rangeGetContentHash: true);
By using this overload of DownloadAsync I can force the MD5 hash to be calculated, and it can be found in download.Value.ContentHash. (Note that the service only computes a range MD5 when the requested range is 4 MiB or less.)
To summarize and close the question:
I did a quick test with the latest version (12.4.4) of the blob storage package, and I can see the Content-MD5 is auto-generated and can also be read.
As per the OP's comment, the original failure may have been due to some issue with the existing solution; after creating a new solution, it works as expected.
The short version of this problem is: make sure the Stream you upload to Azure using the v12 version of the SDK supports seeking (see the CanSeek property). Seeking is currently required so the SDK can traverse the Stream to generate the hash, then reset the position back to 0 so the content can be read again for the actual upload.
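A minimal sketch of that workaround, assuming blobClient and a possibly non-seekable sourceStream from your own code: buffer the content into a seekable MemoryStream first, then upload and read the hash back.
using System.IO;
using Azure.Storage.Blobs;

Stream uploadStream = sourceStream;
if (!sourceStream.CanSeek)
{
    // Buffer the content so the SDK can hash the stream and then
    // rewind it to position 0 for the actual upload.
    var buffered = new MemoryStream();
    await sourceStream.CopyToAsync(buffered);
    buffered.Position = 0;
    uploadStream = buffered;
}

await blobClient.UploadAsync(uploadStream, overwrite: true);

// Per the test described above, the service-calculated MD5
// should now be populated.
var properties = await blobClient.GetPropertiesAsync();
byte[] contentHash = properties.Value.ContentHash;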

Azure Functions - Blob Stream Dynamic Input bindings

I'm running a C# function on Azure which needs to take in files from a container. The only problem is that the paths to the input files will (potentially) be different each time, and the number of input files will vary from 1 to about 4 or 5. Accordingly, as far as I'm aware, I can't just use the default input blob bindings. My options are to give the container anonymous access and just grab the files through their links, or to figure out how to do dynamic input bindings.
Does anyone know how to declare the path for the input blob stream at runtime (in the C# code)?
If it helps I've managed to find this for dynamic output bindings
using (var writer = await binder.BindAsync<TextWriter>(
    new BlobAttribute(containerPath + fileName)))
{
    writer.Write(OutputVariable);
}
Thanks in advance, Cuan
try the below code:
string filename = string.Format("{0}/{1}_{2}.json", blobname, DateTime.UtcNow.ToString("ddMMyyyy_hh.mm.ss.fff"), Guid.NewGuid().ToString("n"));
using (var writer = await binder.BindAsync<TextWriter>(
    new BlobAttribute(filename, FileAccess.Write)))
{
    writer.Write(JsonConvert.SerializeObject(a_object));
}
For dynamic output bindings, you could leverage the following code snippet:
var attributes = new Attribute[]
{
    new BlobAttribute("{container-name}/{blob-name}"),
    new StorageAccountAttribute("brucchStorage") // connection string name for storage connection
};
using (var writer = await binder.BindAsync<TextWriter>(attributes))
{
    writer.Write(userBlobText);
}
Note: The above code creates the target blob if it does not exist and overwrites the existing blob if it does. Moreover, if you do not specify the StorageAccountAttribute, the target blob is created in the storage account configured by the AzureWebJobsStorage app setting.
Additionally, you could follow Azure Functions imperative bindings for more details.
UPDATE:
For dynamic input binding, you could just change the binding type as follows:
var blobString = await binder.BindAsync<string>(attributes);
Or you could set the binding type to CloudBlockBlob and add the following namespace for azure storage blob:
#r "Microsoft.WindowsAzure.Storage"
using Microsoft.WindowsAzure.Storage.Blob;
CloudBlockBlob blob = await binder.BindAsync<CloudBlockBlob>(attributes);
Moreover, for more details about the operations available on CloudBlockBlob, you could follow here.
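Tying this back to the original question (a variable number of input files whose paths are only known at runtime), a sketch of one way to wire it up in the C# script style used above. The idea that the paths arrive as a semicolon-separated message, and the message format itself, are assumptions for illustration; the trigger binding would be declared in function.json:
using System.IO;
using Microsoft.Extensions.Logging;

public static async Task Run(string message, IBinder binder, ILogger log)
{
    // Assumed message format: "mycontainer/in/a.csv;mycontainer/in/b.csv"
    foreach (var path in message.Split(';'))
    {
        // Bind each blob path at runtime instead of declaring it statically.
        using (var reader = await binder.BindAsync<TextReader>(
            new BlobAttribute(path, FileAccess.Read)))
        {
            var content = await reader.ReadToEndAsync();
            log.LogInformation($"Read {content.Length} characters from {path}");
        }
    }
}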
