Azure Storage - Download a blob with conditions - c#

I've been having trouble assigning read conditions on blobs when trying to download them when using the Azure Storage SDK.
Basically, what I am trying to do goes like this:
1. Upload a blob (WORKS)
2. Download the blob (WORKS)
3. Get the ETag of the blob with blobRef.GetProperties().Value.ETag (WORKS)
4. Use the ETag to try to download the blob again, expecting a RequestFailedException e where e.ErrorCode == ConditionNotMet (FAILS)
This is the code:
var condition = new Azure.Storage.Blobs.Models.BlobRequestConditions
{
    IfNoneMatch = new Azure.ETag(previousEtagString),
};
// blobRef is a valid instance of Azure.Storage.Blobs.BlobClient
// target is the file path
blobRef.DownloadTo(target, conditions: condition); // this should throw RequestFailedException
Notes:
When I compare the ETag fetched in step 3 (converted to a string) with the one I am sending in step 4, they are equal: blobRef.GetProperties().Value.ETag == new Azure.ETag(blobRef.GetProperties().Value.ETag.ToString()) -> true
I also opened the question in the GitHub repo.
Passing test cases with v11:
var blobRef = _blobContainer.GetBlockBlobReference(identifier);
blobRef.DownloadTo(target, AccessCondition.GenerateIfNoneMatchCondition(stringEtag)); //throws StorageException
Just noticed that the blob downloaded the second time does not contain any data; it's empty. The first one has data. Are the read conditions partially working but not throwing the exception?
This is definitely a bug. I opened the issue in the GitHub repo.
Thanks

Use the ETag to try to download the blob again expecting a RequestFailedException e where e.ErrorCode == ConditionNotMet

This is not the expected behavior, because an ETag only changes when the blob is updated. If the blob is not updated, its ETag value remains the same, so you can perform the conditional download multiple times without getting a precondition failure.
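For reference, a conditional download does fail once the blob has actually been modified. Below is a minimal sketch (assuming blobRef is a BlobClient for an existing blob and target is a local file path, as in the question) that updates the blob and then downloads with an IfMatch condition built from the old ETag, which the service rejects with ConditionNotMet:

using System;
using System.IO;
using System.Text;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch: a conditional download only fails after the blob's ETag has changed.
// Assumes 'blobRef' is a BlobClient pointing at an existing blob and 'target' is a file path.
static void DemoConditionalDownload(BlobClient blobRef, string target)
{
    // Capture the current ETag.
    ETag oldETag = blobRef.GetProperties().Value.ETag;

    // Update the blob; this changes its ETag on the service.
    using (var newContent = new MemoryStream(Encoding.UTF8.GetBytes("updated content")))
    {
        blobRef.Upload(newContent, overwrite: true);
    }

    // Download only if the blob still matches the old ETag. It no longer does,
    // so the service rejects the request with a precondition failure.
    var conditions = new BlobRequestConditions { IfMatch = oldETag };
    try
    {
        blobRef.DownloadTo(target, conditions: conditions);
    }
    catch (RequestFailedException e) when (e.ErrorCode == BlobErrorCode.ConditionNotMet)
    {
        Console.WriteLine("Blob changed since the ETag was read.");
    }
}

As for the empty second download in the question: an IfNoneMatch condition that matches the current ETag is answered by the service with 304 Not Modified and no body, which would explain why the file written by the second download is empty.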

Restore an Azure blob snapshot using the C# library

I am writing a CLI utility that does a lot of different things, but what I'm struggling with right now is this: I have a known blob, and I want to restore a snapshot that was taken of that blob.
await foreach (var snapshot in containerClient.GetBlobsAsync(
    BlobTraits.All,
    BlobStates.Snapshots,
    blobPath))
{
    _logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
    if (DecideIfRightSnapshot(snapshot)) {
        BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
        _logger.LogInformation($"found snapshot {snapshotBlob.Uri}");
        await sourceBlob.StartCopyFromUriAsync(snapshotBlob.Uri);
    }
    break;
}
First, the filter isn't working right because the last blob in the list is always the base blob. But I can work around that one.
The real issue I'm struggling with is the proper way to restore a blob from a snapshot using the libraries. I'm really concerned because the .Uri property always returns the base blob's URI, even if it's a snapshot. I was led to believe the URI would be something like this:
https://me.blob.core.windows.net/myapp/doc?snapshot=2020-12-16T17:07:44.1076450Z
but that's not the URI that gets logged. Am I supposed to construct the full URI myself?
Everything I've found refers to this as "promoting" a snapshot, but I can't find a "promote" method in the API.
Am I doing this right?
If you're using the new blob storage SDK, Azure.Storage.Blobs, then you need to construct the full snapshot URI yourself. Sample code below:
// other code
await foreach (var snapshot in containerClient.GetBlobsAsync(
    BlobTraits.All,
    BlobStates.Snapshots,
    blobPath))
{
    _logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
    if (DecideIfRightSnapshot(snapshot)) {
        BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
        // construct the snapshot url
        var snapshot_uri = snapshotBlob.Uri.ToString() + "?snapshot=" + snapshot.Snapshot;
        _logger.LogInformation($"found snapshot {snapshot_uri}");
        await sourceBlob.StartCopyFromUriAsync(new Uri(snapshot_uri));
    }
    break;
}
"Promoting" here means restoring the snapshot via the Azure portal. It's a UI operation, and under the hood it calls the Put Blob From URL API; currently there is no corresponding method in the SDK.
If you're using one of the older packages such as WindowsAzure.Storage, it has many methods for working with snapshots; see this article. Note that using the old packages is not recommended.
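If you'd rather not concatenate the query string by hand, the BlobUriBuilder type in Azure.Storage.Blobs can build the snapshot URI for you. A minimal sketch (assuming the same containerClient and the snapshot item from the GetBlobsAsync loop above):

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch: restore ("promote") a snapshot by copying it over the base blob.
// Assumes 'containerClient' and 'snapshot' come from the GetBlobsAsync loop shown above.
static async Task RestoreSnapshotAsync(BlobContainerClient containerClient, BlobItem snapshot)
{
    BlobClient baseBlob = containerClient.GetBlobClient(snapshot.Name);

    // BlobUriBuilder appends the ?snapshot=... query parameter.
    var builder = new BlobUriBuilder(baseBlob.Uri)
    {
        Snapshot = snapshot.Snapshot
    };
    Uri snapshotUri = builder.ToUri();

    // Copy the snapshot's content back over the base blob and wait for the copy to finish.
    CopyFromUriOperation copy = await baseBlob.StartCopyFromUriAsync(snapshotUri);
    await copy.WaitForCompletionAsync();
}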

ContentHash not calculated in Azure Blob Storage v12

Continuing the saga, here is part I: ContentHash is null in Azure.Storage.Blobs v12.x.x
After a lot of debugging, the root cause appears to be that the content hash is not calculated when uploading a blob, so BlobContentInfo or BlobProperties return a null content hash, and my whole flow is based on receiving the hash from Azure.
What I've discovered is that it depends on which HttpRequest stream method I use for the upload to Azure:
With HttpRequest.GetBufferlessInputStream(), the content hash is not calculated; even in Azure Storage Explorer, the ContentMD5 of the blob is empty.
With HttpRequest.InputStream, everything works as expected.
Do you know why the behavior differs, and how to get a content hash for streams obtained via the GetBufferlessInputStream method?
So the code flow looks like this:
var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}

// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
return await blob.GetPropertiesAsync();
In the snippet above, ContentHash is NOT calculated, but if I change the way I get the stream from the HTTP request to the following, ContentHash is calculated:
var stream = HttpContext.Current.Request.InputStream;
P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams obtained via the GetBufferlessInputStream method.
P.S. 2: You can also find an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037
P.S. 3: Added the code snippet.
Ran into this today. From my digging, it appears this is a symptom of the type of Stream you use for the upload, and not really a bug. In order to generate a hash for your blob (which, by the looks of it, is done on the client side before uploading), the SDK needs to read the stream, which means it then has to seek the stream's position back to 0 after generating the hash so the content can be read again for the actual upload. Doing that requires the stream to support the Seek operation. If your stream doesn't support Seek, it looks like the hash simply isn't generated.
To get around the issue, make sure the stream you provide supports Seek (CanSeek). If it doesn't, copy your data to a stream that does (for example, a MemoryStream). The alternative would be for the internals of the Blob SDK to do this for you.
A workaround: when you get the stream via the GetBufferlessInputStream() method, copy it into a MemoryStream and upload the MemoryStream instead. The content hash can then be generated. Sample code below:
var stream111 = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

// copy the non-seekable request stream into a seekable MemoryStream
MemoryStream stream = new MemoryStream();
stream111.CopyTo(stream);
stream.Position = 0;

// other code
// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
Not sure why, but from my debugging I can see that when using the GetBufferlessInputStream() method with the latest SDK, the upload actually calls the Put Block API in the backend, and with that API the MD5 hash is not stored with the blob (refer to here for details).
However, when using InputStream, it calls the Put Blob API instead.

ContentHash is null in Azure.Storage.Blobs v12.x.x

I am trying to upgrade my project from Microsoft.WindowsAzure.Storage v9 (deprecated) to the latest SDK, Azure.Storage.Blobs v12.
My issue (post-upgrade) is accessing the ContentHash property.
Pre-upgrade steps:
upload the file to a blob
get the MD5 hash of the uploaded file, provided by CloudBlob.Properties.ContentMD5 from Microsoft.WindowsAzure.Storage.Blob
compare the calculated MD5 hash with the one retrieved from Azure
Post-upgrade attempts to access the MD5 hash that Azure calculates on its side:
1. BlobClient.GetProperties(), calling this method
2. BlobClient.UploadAsync(), looking at the BlobContentInfo response
Both return a null ContentHash (see my later question for why).
One huge difference I've noticed is that with the older SDK I could tell the storage client to compute MD5 like this:
CloudBlobClient cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions.StoreBlobContentMD5 = true;
So I was expecting to find something similar to StoreBlobContentMD5 in the latest SDK, but I couldn't.
Can anyone help me find a solution for this problem?
Edit 1:
I did a test, and in Azure Storage I do not have an MD5 hash.
Upload code:
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}

StorageTransferOptions storageTransferOption = new StorageTransferOptions()
{
    MaximumConcurrency = 2,
};

var blobResponse = await blob.UploadAsync(stream, blobHttpHeaders, null, null, null, null, storageTransferOption, default);
return blob.GetProperties();
There is not much difference between the old upload code and the new one, apart from using the new classes from the new SDK.
The main difference remains the one I already stated: I cannot find an equivalent of StoreBlobContentMD5 in the new SDK.
I think this is the problem: I need to tell the storage client to compute the MD5 hash, as I did with the old SDK.
Edit 2:
For download I can do something like this:
var properties = blob.GetProperties();
var download = await blob.DownloadAsync(range: new HttpRange(0, properties.Value.ContentLength), rangeGetContentHash: true);
By using this overload of DownloadAsync I can force the MD5 hash to be calculated, and it can be found in download.Value.ContentHash (note that the service only returns a transactional MD5 for ranges of up to 4 MiB).
To summarize and close the question:
I did a quick test with the latest version of the blob storage package, 12.4.4, and I can see the content-md5 is auto-generated and can also be read.
As per the OP's comment, it may be due to some issue with the existing solution; after creating a new solution, it works as expected.
The short version of this problem: make sure the Stream you upload to Azure with the v12 SDK supports seeking (see the CanSeek property). Seeking is currently required so the SDK can read the stream to generate the hash and then reset the position back to 0 for the actual upload.
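If you need the service to have a ContentMD5 regardless of what the SDK computes, one option is to calculate the MD5 yourself and pass it in the blob's HTTP headers. This is only a sketch, not an official replacement for StoreBlobContentMD5; it assumes a seekable stream and the Azure.Storage.Blobs v12 types:

using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch: compute the MD5 locally and store it with the blob via BlobHttpHeaders.ContentHash.
// Assumes 'blob' is a BlobClient and 'stream' is a seekable Stream.
static async Task UploadWithMd5Async(BlobClient blob, Stream stream, string contentType)
{
    byte[] md5;
    using (var algorithm = MD5.Create())
    {
        md5 = algorithm.ComputeHash(stream);
    }
    stream.Position = 0; // rewind so the upload reads the full content again

    var options = new BlobUploadOptions
    {
        HttpHeaders = new BlobHttpHeaders
        {
            ContentType = contentType,
            ContentHash = md5, // stored by the service as the blob's Content-MD5
        }
    };

    await blob.UploadAsync(stream, options);
    // blob.GetProperties().Value.ContentHash should now return the same bytes.
}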

Handling sheet changes in the Google Sheets API

I'm trying to handle spreadsheet changes in order to update a local copy of the data, and I've run into some problems:
The Google Sheets API does not have any request for checking the last modified time or file version. (Correct me if I'm mistaken.)
Google needs some time to process changes and update the version metadata of the file.
For example:
The file is at version 10.
I send a BatchUpdateRequest with some data.
Right after that request completes, I check the file version with a Drive API Files.Get request with field "version" and still get the old version, 10.
If I wait about 15 seconds, this request returns the correct data, but that's not a solution because the data is updated every minute for each spreadsheet, so waiting would add far too much time.
To overcome these problems I implemented logic that tracks the spreadsheet version locally and compares it after uploading: if the online version > local version, the spreadsheet is reloaded. But that creates a new problem:
If changes are made to the spreadsheet from several computers at nearly the same moment, the local version on every computer is incremented, but Google merges those changes into one version. For this to work correctly the new version would have to be oldVersionNumber + countOfComputersThatMadeChanges, but in fact it is oldVersionNumber + 1. As a result nobody gets the actual spreadsheet data, because the online version is never higher than the local one.
So my question is: how can I update spreadsheets when the data is changed from another source?
GoogleSpreadsheetsVersions is filled like this:
var versions = Instance.GoogleSpreadsheetsVersions;
if (!versions.ContainsKey(newTable.SpreadsheetId)) {
    var request = GoogleSpreadsheetsServiceDecorator.Instance.DriveService.Files.Get(newTable.SpreadsheetId);
    request.Fields = "version";
    var response = request.Execute();
    versions.Add(newTable.SpreadsheetId, response.Version);
}
The version comparison itself:
var newInfo = new Dictionary<string, long?>();
foreach (var info in GoogleSpreadsheetsVersions)
{
    try
    {
        // Get the file version
        var request = GoogleSpreadsheetsServiceDecorator.Instance.DriveService.Files.Get(info.Key);
        request.Fields = "version";
        var response = request.Execute();

        // local version < actual google version
        if (info.Value < response.Version)
        {
            // set the reload flag for each sheet from this file
            foreach (var t in GoogleSpreadsheets.Where(sheet => sheet.SpreadsheetId == info.Key))
                t.IsLoadRequestRequired = true;
        }

        // refresh the local version
        newInfo.Add(info.Key, response.Version);
    }
    catch (Exception e) when (e.Message.Contains("File not found"))
    {
        newInfo.Add(info.Key, null);
    }
}
GoogleSpreadsheetsVersions = newInfo;
P.S.:
The version field description from the Google documentation:
A monotonically increasing version number for the file. This reflects every change made to the file on the server, even those not visible to the user.
In my code, the local Spreadsheet class represents the data of one sheet in Google. So if one Google spreadsheet contains 10 sheets, there will be 10 Spreadsheet objects in the program.
Possibly helpful: the Drive API Files.Get request fields.
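One way to notice edits made from another source, without trying to predict the version arithmetic, is to poll the Drive changes feed instead of the version field. This is only a sketch under assumptions: it uses the Google.Apis.Drive.v3 client, an authorized DriveService called driveService, and a page token initially obtained from driveService.Changes.GetStartPageToken().Execute().StartPageTokenValue:

using System.Collections.Generic;
using Google.Apis.Drive.v3;

// Sketch: return the IDs of files changed since the last poll using the Drive changes feed.
// 'savedPageToken' must start from Changes.GetStartPageToken() and is advanced on each call.
static ISet<string> GetChangedFileIds(DriveService driveService, ref string savedPageToken)
{
    var changedIds = new HashSet<string>();
    string pageToken = savedPageToken;

    while (pageToken != null)
    {
        var request = driveService.Changes.List(pageToken);
        request.Fields = "nextPageToken, newStartPageToken, changes(fileId)";
        var changeList = request.Execute();

        foreach (var change in changeList.Changes)
            changedIds.Add(change.FileId);

        if (changeList.NewStartPageToken != null)
            savedPageToken = changeList.NewStartPageToken; // remember for the next poll
        pageToken = changeList.NextPageToken; // null once this batch is exhausted
    }

    return changedIds;
}

Any spreadsheet ID that appears in the returned set can have its IsLoadRequestRequired flag set, no matter how many computers contributed to the change.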

Data import via Management API successful, but data for custom dimensions does not show

I am trying to import data for custom dimension in Google Analytics through the .NET client library. In Google Analytics, when I view the uploads for a data set from Admin > Data Import > Manage Uploads, it says my uploads are successful, but the data for the custom dimension doesn't seem to show up in my report. Right now, I am just using my custom dimension to set the category for an article.
Here is how I am uploading through the .NET client library:
string accountId = "***";
string webPropertyId = "***";
string customDataSourceId = "***";
string contentType = "application/octet-stream";

IUploadProgress progress;
using (var dataStream = CreateArticleCsvStream(articles))
{
    var fs = File.Create("test.csv");
    dataStream.CopyTo(fs);
    fs.Close();
    progress = service.Management.Uploads.UploadData(accountId, webPropertyId, customDataSourceId, dataStream, contentType).Upload();
}

if (progress.Status == UploadStatus.Failed)
{
    throw progress.Exception;
}
Here is the output for test.csv
ga:pagePath,ga:dimension1
/path/to/page/,"MyCategory"
When I download the file from the data set, I get the same file as test.csv; it just has a random filename assigned to it.
I found this other question similar to mine, but there was no solution posted. Any help would be appreciated.
I have also waited over 24 hours, but still nothing.
It took a few days of trial and error but I finally found the solution.
The first thing to check is that your website's URL is correct under Admin > View Settings. We had ours set up as my.domain.com/path/to/site when it should have just been my.domain.com. (We are using SharePoint, which is why /path/to/site was appended to the site URL.)
The second thing to check is that your key/pagePath entries are all correct. In our case, we had an extra forward slash at the end of the URL. For some reason, Google Analytics displays the trailing forward slash in reports but does not actually store it in the pagePath.
Another source of error may be capitalization. It seems that GA applies filters after the data has been processed: if you add the lowercase/uppercase filter, it only affects how the URLs display in your reports, while behind the scenes GA still stores the URL with whatever capitalization the hit originally came in with. For example, if the URL on your site is my.domain.com/path/to/PAGE.aspx and you apply the lowercase filter, the pagePath will display in your reports as /path/to/page.aspx. But if you use the lowercase value in your CSV import, the data will not join; you must use the pagePath as it appears on your site (/path/to/PAGE.aspx in this case).
It would be nice if Google gave some log files when it tries to process and join the uploaded data with the existing data, rather than just saying the upload was successful even though the processing/joining stage may fail.
