I am writing a CLI utility that does a lot of different things, but what I'm struggling with right now is this: I have a known blob, and I want to restore a snapshot that was previously taken of that blob.
await foreach (var snapshot in containerClient.GetBlobsAsync(
    BlobTraits.All,
    BlobStates.Snapshots,
    blobPath))
{
    _logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
    if (DecideIfRightSnapshot(snapshot))
    {
        BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
        _logger.LogInformation($"found snapshot {snapshotBlob.Uri}");
        await sourceBlob.StartCopyFromUriAsync(snapshotBlob.Uri);
        break;
    }
}
First, the filter isn't working quite right, because the last blob in the list is always the base blob. But I can work around that.
The real issue I'm struggling with is the proper way to restore a blob from a snapshot using the library. What concerns me is that the .Uri property always returns the base blob's URI, even when the item is a snapshot. I was led to believe the URI would look something like this:
https://me.blob.core.windows.net/myapp/doc?snapshot=2020-12-16T17:07:44.1076450Z
but that's not the URI that's getting logged. Am I supposed to construct the full URI myself?
In all my searches this is referred to as "promoting" a snapshot, but I can't find a "promote" method in the API.
Am I doing this right?
If you're using the new blob storage SDK, Azure.Storage.Blobs, then yes, you need to construct the full snapshot URI yourself. Sample code is below:
//other code
await foreach (var snapshot in containerClient.GetBlobsAsync(
    BlobTraits.All,
    BlobStates.Snapshots,
    blobPath))
{
    _logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
    if (DecideIfRightSnapshot(snapshot))
    {
        BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
        // construct the snapshot URI by appending the snapshot timestamp as a query parameter
        var snapshotUri = new Uri(snapshotBlob.Uri.ToString() + "?snapshot=" + snapshot.Snapshot);
        _logger.LogInformation($"found snapshot {snapshotUri}");
        // copying the snapshot over the base blob restores it
        await sourceBlob.StartCopyFromUriAsync(snapshotUri);
        break;
    }
}
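Depending on the SDK version you have, BlobClient also exposes a WithSnapshot method that builds the snapshot-scoped client (and URI) for you, so you don't have to concatenate the query string by hand. A minimal sketch of the same copy, assuming a recent Azure.Storage.Blobs v12 package and the same variables as above:
// WithSnapshot returns a client whose Uri includes the ?snapshot=... query parameter
BlobClient snapshotClient = containerClient
    .GetBlobClient(snapshot.Name)
    .WithSnapshot(snapshot.Snapshot);

// copying the snapshot over the base blob is the "promote" operation
await sourceBlob.StartCopyFromUriAsync(snapshotClient.Uri);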
"Promoting" a snapshot just means copying the snapshot over its base blob, which restores the blob to the state it was in when the snapshot was taken. In the Azure portal this is a UI operation, but under the hood it is a copy; there is no method literally called "promote" in the SDK, so the StartCopyFromUriAsync call shown above is the equivalent.
If you're using one of the older packages such as WindowsAzure.Storage, it has many methods for working with snapshots; see this article. Note: using the old packages is not recommended.
I've been having trouble applying read conditions to blobs when downloading them with the Azure Storage SDK.
Basically, what I am trying to do goes like this:
1. Upload a blob (WORKS)
2. Download the blob (WORKS)
3. Get the ETag of the blob with blobRef.GetProperties().Value.ETag (WORKS)
4. Use the ETag to try to download the blob again, expecting a RequestFailedException e where e.ErrorCode == ConditionNotMet (FAILS)
This is the code:
var condition = new Azure.Storage.Blobs.Models.BlobRequestConditions
{
    IfNoneMatch = new Azure.ETag(previousEtagString),
};

// blobRef is a valid instance of Azure.Storage.Blobs.BlobClient
// target is the file path
blobRef.DownloadTo(target, conditions: condition); // this should throw RequestFailedException
Notes:
When I compare the ETag fetched in step 3 (which is converted to a string) with the one I am sending in step 4, they are reported as equal: blobRef.GetProperties().Value.ETag == new Azure.ETag(blobRef.GetProperties().Value.ETag.ToString()) evaluates to true.
I also opened the question at the GitHub repo.
Passing test cases with v11:
var blobRef = _blobContainer.GetBlockBlobReference(identifier);
blobRef.DownloadTo(target, AccessCondition.GenerateIfNoneMatchCondition(stringEtag)); //throws StorageException
I just noticed that the blob downloaded the second time does not contain any data; it's empty, while the first one has data. Are the read conditions partially working, but not throwing the exception?
This is definitely a bug. I opened an issue in the GitHub repo.
Thanks
Use the ETag to try to download the blob again expecting a RequestFailedException e where e.ErrorCode == ConditionNotMet
This is not the expected behavior, because the ETag only changes when the blob is updated. If the blob is not updated, the ETag value stays the same, so you can perform the conditional download multiple times without getting a precondition-failure error.
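To actually see ConditionNotMet, the blob has to change between the moment the ETag is captured and the conditional request. A minimal sketch of that flow, assuming the same hypothetical blobRef (a BlobClient) and target file path from the question:
// requires: using System; using System.IO; using System.Text;
// requires: using Azure; using Azure.Storage.Blobs.Models;

// capture the current ETag
Azure.ETag originalEtag = blobRef.GetProperties().Value.ETag;

// modify the blob so its ETag changes
using (var newContent = new MemoryStream(Encoding.UTF8.GetBytes("updated content")))
{
    blobRef.Upload(newContent, overwrite: true);
}

// IfMatch with the now-stale ETag fails with 412 / ConditionNotMet
try
{
    var conditions = new BlobRequestConditions { IfMatch = originalEtag };
    blobRef.DownloadTo(target, conditions: conditions);
}
catch (RequestFailedException ex) when (ex.ErrorCode == BlobErrorCode.ConditionNotMet)
{
    Console.WriteLine("Blob changed since the ETag was captured.");
}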
I've been trying to figure out the fastest way to connect to an Azure Storage account, cycle through a number of containers, and convert the blobs inside those containers into objects. All elements in the containers are JSON and map to different object types.
The structure as seen on Azure Storage Explorer would be:
Azure_Subscription
|--Storage_Accounts
   |--My_Storage_Account
      |--blob1
      |--blob2
      |--blob3
      etc
Now, based on what I've read in the official documentation, to access and download each blob and convert it so that it can be handled as JSON and deserialized, I would need to do all of the below (assuming I don't have a list of blob URIs).
string testConnectionString = "DefaultEndpointsProtocol=https;AccountName=;AccountKey=;EndpointSuffix=core.windows.net";

// the service clients allow working at the Azure Storage level with Tables and Blobs
TableServiceClient tableServiceClient = new TableServiceClient(testConnectionString);
BlobServiceClient blobServiceClient = new BlobServiceClient(testConnectionString);

List<blob1> blob1List = new List<blob1>();

// this gives me a list of blob containers and I can programmatically retrieve
// the name of each individual container.
Pageable<BlobContainerItem> blobContainers = blobServiceClient.GetBlobContainers();

// each BlobContainerItem represents an individual blob container (bill, building...)
foreach (BlobContainerItem blobContainerItem in blobContainers)
{
    // create a ContainerClient to make calls to each individual container
    BlobContainerClient clientForIndividualContainer =
        blobServiceClient.GetBlobContainerClient(blobContainerItem.Name);

    if (blobContainerItem.Name.Equals("blob1"))
    {
        Pageable<BlobItem> blobItemList = clientForIndividualContainer.GetBlobs();
        foreach (BlobItem bi in blobItemList)
        {
            BlobClient blobClient = clientForIndividualContainer.GetBlobClient(bi.Name);
            var blobContent = blobClient.Download();
            using StreamReader reader = new StreamReader(blobContent.Value.Content);
            string text = reader.ReadToEnd();
            blob1List.Add(JsonSerializer.Deserialize<blob1>(text));
        }
    }
}
The project targets .NET 5.0, and I will need to do something similar with Azure Tables as well. The goal is to go through all blobs inside a number of containers (all of them JSON, really) and compare them to the blobs inside another storage account. I'm also open to ideas on doing this differently altogether, but the purpose is to compare input into Azure Storage blobs and make sure that a new process uploads the same object structures. So for all blob1 items in the Azure Storage account, I compare them to the list of oldBlob1 items in the other storage account and check whether they are all equal.
I hope the question makes sense... At this point the above code works, and I could move the functionality inside the if statement into a method and use a switch instead, but my main question is about reaching this point at all. Without a massive list of blob URIs, do I really need a BlobServiceClient to get the list of BlobContainerItems, then cycle through each container, create a BlobContainerClient for each of them, and then create a BlobClient for every single blob in the storage account, just to finally get to the Content of the blob?
This seems like a lot of work to just get access to an individual file.
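For what it's worth, the client hierarchy (service client, then container client, then blob client) is how the v12 SDK is designed to be used, but the per-blob read can be shortened a little. A sketch of the inner loop using DownloadContent, assuming an SDK version recent enough to expose it and the same hypothetical blob1 type from the question:
foreach (BlobItem bi in clientForIndividualContainer.GetBlobs())
{
    BlobClient blobClient = clientForIndividualContainer.GetBlobClient(bi.Name);

    // DownloadContent buffers the blob into a BinaryData,
    // which can be deserialized without a StreamReader
    BinaryData content = blobClient.DownloadContent().Value.Content;
    blob1List.Add(content.ToObjectFromJson<blob1>());
}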
Continuing the saga, here is part I: ContentHash is null in Azure.Storage.Blobs v12.x.x
After a lot of debugging, the root cause appears to be that the content hash is not calculated when uploading the blob, so BlobContentInfo and BlobProperties return a null content hash, and my whole flow is based on receiving the hash from Azure.
What I've discovered is that it depends on which HttpRequest stream I upload to Azure:
With HttpRequest.GetBufferlessInputStream(), the content hash is not calculated; even in Azure Storage Explorer, the ContentMD5 of the blob is empty.
With HttpRequest.InputStream, everything works as expected.
Do you know why the behavior differs? And do you know how to get a content hash for streams obtained via the GetBufferlessInputStream method?
So the code flow looks like this:
var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}

// retry is already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);

return await blob.GetPropertiesAsync();
In the code snippet above, ContentHash is NOT calculated, but if I change the way I get the stream from the HTTP request to the following, ContentHash is calculated:
var stream = HttpContext.Current.Request.InputStream;
P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams obtained via the GetBufferlessInputStream method.
P.S.2: there is also an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037
P.S.3: added the code snippet.
Ran into this today. From my digging, it appears this is a symptom of the type of Stream you use to upload, and it's not really a bug. In order to generate a hash for your blob (which is done on the client side before uploading, by the looks of it), the SDK needs to read the stream, which means it has to reset the stream's position back to 0 (for the actual upload) after generating the hash. Doing that requires the ability to Seek on the stream. If your stream doesn't support Seek, then it looks like the hash is simply not generated.
To get around the issue, make sure the stream you provide supports Seek (CanSeek). If it doesn't, use a different Stream or copy your data into one that does (for example, a MemoryStream). The alternative would be for the internals of the Blob SDK to do this for you.
A workaround: when you get the stream via the GetBufferlessInputStream() method, copy it into a MemoryStream and upload the MemoryStream instead. Then the content hash is generated. Sample code is below:
var requestStream = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

// copy into a seekable MemoryStream so the SDK can compute the hash
MemoryStream stream = new MemoryStream();
requestStream.CopyTo(stream);
stream.Position = 0;

//other code

// retry is already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
Not sure why, but as per my debugging I can see that when using the GetBufferlessInputStream() method with the latest SDK, the upload actually calls the Put Block API in the backend, and with that API the MD5 hash is not stored with the blob (refer to here for details). When using InputStream, it calls the Put Blob API instead.
I am trying to upgrade my project from Microsoft.WindowsAzure.Storage v9 (deprecated) to the latest SDK, Azure.Storage.Blobs v12.
My issue (post-upgrade) is accessing the ContentHash property.
Pre-upgrade steps:
1. Upload the file to a blob.
2. Get the MD5 hash of the uploaded file from CloudBlob.Properties.ContentMD5 (Microsoft.WindowsAzure.Storage.Blob).
3. Compare the calculated MD5 hash with the one retrieved from Azure.
Post-upgrade attempts to access the MD5 hash that Azure calculates on its side:
1. BlobClient.GetProperties(), calling this method
2. BlobClient.UploadAsync(), looking at the BlobContentInfo response
Both return a null ContentHash. (See my later question to see why.)
One huge difference I've noticed is that with the older SDK I could tell the storage client to compute MD5 like this:
CloudBlobClient cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions.StoreBlobContentMD5 = true;
So I was expecting to find something similar to StoreBlobContentMD5 in the latest SDK, but I couldn't.
Can anyone help me find a solution for this problem?
Edit 1:
I did a test, and in Azure Storage there is no MD5 hash on the blob.
Upload code:
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}

StorageTransferOptions storageTransferOption = new StorageTransferOptions()
{
    MaximumConcurrency = 2,
};

var blobResponse = await blob.UploadAsync(stream, blobHttpHeaders, null, null, null, null, storageTransferOption, default);
return blob.GetProperties();
There is not much difference between the old upload code and the new one, apart from using the new classes from the new SDK.
The main difference remains the one I already stated: I cannot find an equivalent in the new SDK for StoreBlobContentMD5.
I think this is the problem. I need to tell the client to compute the MD5 hash, as I did with the old SDK.
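There doesn't seem to be a direct equivalent of StoreBlobContentMD5 in v12, but a possible workaround is to compute the MD5 on the client and pass it via BlobHttpHeaders.ContentHash, which is stored as the blob's Content-MD5. A minimal sketch, reusing the hypothetical blob, stream and fileContentType variables from the snippet above and assuming the stream is seekable:
// requires: using System.Security.Cryptography; using Azure.Storage.Blobs.Models;
byte[] md5;
using (var md5Algorithm = MD5.Create())
{
    md5 = md5Algorithm.ComputeHash(stream);
    stream.Position = 0; // rewind so the SDK can read the stream again for the upload
}

var headers = new BlobHttpHeaders
{
    ContentType = fileContentType,
    ContentHash = md5, // stored as the blob's Content-MD5 property
};

await blob.UploadAsync(stream, httpHeaders: headers);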
Edit 2:
For download I can do something like this:
var properties = blob.GetProperties();
var download = await blob.DownloadAsync(range: new HttpRange(0, properties.Value.ContentLength), rangeGetContentHash: true);
By using this overload of DownloadAsync I can force the MD5 hash to be calculated, and it can be found in download.Value.ContentHash.
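To close the loop on the original pre-upgrade flow (comparing a locally computed hash with the one Azure reports), here is a sketch of that comparison, assuming a Content-MD5 was actually stored for the blob at upload time:
// requires: using System.IO; using System.Linq; using System.Security.Cryptography;
var blobProperties = await blob.GetPropertiesAsync();
byte[] storedHash = blobProperties.Value.ContentHash; // null if no Content-MD5 was stored

using var downloaded = new MemoryStream();
await blob.DownloadToAsync(downloaded);
downloaded.Position = 0;

using var md5 = MD5.Create();
byte[] localHash = md5.ComputeHash(downloaded);

bool matches = storedHash != null && storedHash.SequenceEqual(localHash);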
To summarize and close the question:
I did a quick test with the latest version (12.4.4) of the blob storage package, and I can see the Content-MD5 is auto-generated and can also be read.
As per the OP's comment, the problem may have been due to some issue in the existing solution; after creating a new solution, it works as expected.
The short version of this problem: make sure the Stream you upload to Azure with the v12 SDK supports seeking (check the CanSeek property). That is currently required so the SDK can traverse the Stream to generate the hash and then seek the position back to 0 so the stream can be read again for the actual upload.
I have the task of providing an API endpoint to find out how much space a particular module is using in our Amazon S3 bucket. I'm using the C# SDK.
I have accomplished this by adapting code from the documentation here: https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingNetSDK.html
private long GetUsedBytes(string module, string customer)
{
    ListObjectsRequest listRequest = new ListObjectsRequest()
    {
        BucketName = BucketName,
        Prefix = module + "/" + customer
    };

    ListObjectsResponse listResponse;
    long totalSize = 0;
    do
    {
        listResponse = s3Client.ListObjects(listRequest);
        foreach (S3Object obj in listResponse.S3Objects)
        {
            totalSize += obj.Size;
        }
        listRequest.Marker = listResponse.NextMarker;
    } while (listResponse.IsTruncated);

    return totalSize;
}
My question is: is there a way to do this with the SDK without pulling all of the actual S3 objects down from the bucket? There are several good answers about doing it with the CLI:
AWS S3: how do I see how much disk space is using
https://serverfault.com/questions/84815/how-can-i-get-the-size-of-an-amazon-s3-bucket
But I have yet to find one that uses the SDK directly. Do I have to mimic what the CLI does to accomplish this? Another method I considered is to get all the keys and query their metadata, but the only way I've found to get all the keys is to list all the objects, as in the link above. If there is a way to get all the metadata for objects with a particular prefix, that would be ideal.
Thanks for your time!
~Josh
Your code is not downloading any objects from Amazon S3. It is merely calling ListObjects() and totalling the size of each object. It will make one API call per 1000 objects.
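If you do want to stay with the listing approach, a slightly more modern variant uses ListObjectsV2 with continuation tokens. A sketch, assuming the same s3Client, BucketName, module and customer values as in the question, and an AWSSDK.S3 version where Size and IsTruncated are non-nullable:
// requires: using System.Threading.Tasks; using Amazon.S3.Model;
private async Task<long> GetUsedBytesV2Async(string module, string customer)
{
    var request = new ListObjectsV2Request
    {
        BucketName = BucketName,
        Prefix = module + "/" + customer
    };

    long totalSize = 0;
    ListObjectsV2Response response;
    do
    {
        response = await s3Client.ListObjectsV2Async(request);
        foreach (S3Object obj in response.S3Objects)
        {
            totalSize += obj.Size;
        }
        // keep paging until all keys under the prefix have been listed
        request.ContinuationToken = response.NextContinuationToken;
    } while (response.IsTruncated);

    return totalSize;
}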
Alternatively, you can retrieve the size of each bucket from Amazon CloudWatch.
From Monitoring Metrics with Amazon CloudWatch - Amazon S3:
Metric: BucketSizeBytes
The amount of data in bytes stored in a bucket. This value is calculated by summing the size of all objects in the bucket (both current and noncurrent objects), including the size of all parts for all incomplete multipart uploads to the bucket.
So, simply retrieve the metric from Amazon CloudWatch rather than calculating it yourself.
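Here is a sketch of reading that metric with the AWS SDK for .NET (AWSSDK.CloudWatch package), assuming a bucket in the Standard storage class; note that BucketSizeBytes is reported per bucket (and per storage type), not per prefix, and is only published about once a day:
// requires: using System; using System.Collections.Generic; using System.Linq;
// requires: using Amazon.CloudWatch; using Amazon.CloudWatch.Model;
var cloudWatch = new AmazonCloudWatchClient();

var metrics = await cloudWatch.GetMetricStatisticsAsync(new GetMetricStatisticsRequest
{
    Namespace = "AWS/S3",
    MetricName = "BucketSizeBytes",
    Dimensions = new List<Dimension>
    {
        new Dimension { Name = "BucketName", Value = BucketName },
        new Dimension { Name = "StorageType", Value = "StandardStorage" }
    },
    StartTimeUtc = DateTime.UtcNow.AddDays(-2),
    EndTimeUtc = DateTime.UtcNow,
    Period = 86400, // one day, matching how often S3 publishes the metric
    Statistics = new List<string> { "Average" }
});

// the most recent datapoint (if any) holds the bucket size in bytes
var latest = metrics.Datapoints.OrderByDescending(d => d.Timestamp).FirstOrDefault();
double? bucketSizeBytes = latest?.Average;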