Continuing the saga, here is part I: ContentHash is null in Azure.Storage.Blobs v12.x.x
After a lot of debugging, the root cause appears to be that the content hash is not calculated when the blob is uploaded, so BlobContentInfo and BlobProperties return a null ContentHash, and my whole flow is based on receiving the hash from Azure.
What I've discovered is that it depends on which HttpRequest member I use to get the stream I upload to Azure:
HttpRequest.GetBufferlessInputStream(): the content hash is not calculated; even in Azure Storage Explorer, the blob's ContentMD5 is empty.
HttpRequest.InputStream: everything works as expected.
Do you know why the behavior differs? And do you know how to get a content hash for streams obtained through GetBufferlessInputStream()?
So the code flow looks like this:
var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);
BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
blobHttpHeaders = new BlobHttpHeaders()
{
ContentType = fileContentType,
};
}
// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
return await blob.GetPropertiesAsync();
In the snippet above, ContentHash is NOT calculated, but if I get the stream from the HTTP request with the following line instead, ContentHash is calculated:
var stream = HttpContext.Current.Request.InputStream;
P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams obtained through GetBufferlessInputStream().
P.S.2 There is also an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037
P.S.3 Added a code snippet.
Ran into this today. From my digging, this appears to be a symptom of the type of Stream you upload, and it's not really a bug. To generate a hash for your blob (which, by the looks of it, is done on the client side before uploading), the SDK needs to read the stream, which means it then has to reset the stream's position back to 0 for the actual upload. Doing that requires the stream to support Seek. If your stream doesn't support Seek, it looks like the hash is simply not generated.
To get around the issue, make sure the stream you provide supports Seek (CanSeek). If it doesn't, then use a different Stream/copy your data to a stream that does (for example MemoryStream). The alternative would be for the internals of the Blob SDK to do this for you.
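A minimal sketch of that workaround, using only the .NET base class library. The helper `EnsureSeekable` is hypothetical: it returns seekable streams unchanged and otherwise copies the data into a `MemoryStream`, or, when `bufferToDisk` is set, into a temporary file opened with `FileOptions.DeleteOnClose` so a very large body does not have to sit in memory. The temp-file option is my own addition, not something the SDK does. In the demo, a `GZipStream` opened for decompression stands in for the non-seekable request stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: returns a stream that supports Seek.
// Seekable input is returned unchanged; otherwise the data is copied into a
// MemoryStream or, when bufferToDisk is true, into a temporary file that is
// deleted automatically when the returned stream is disposed.
static Stream EnsureSeekable(Stream source, bool bufferToDisk = false)
{
    if (source.CanSeek)
    {
        return source;
    }

    Stream buffer = bufferToDisk
        ? (Stream)new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
                                 FileShare.None, 81920, FileOptions.DeleteOnClose)
        : new MemoryStream();
    source.CopyTo(buffer);
    buffer.Position = 0; // rewind so the uploader reads from the first byte
    return buffer;
}

// Demo: a GZipStream opened for decompression cannot seek, much like the
// stream returned by GetBufferlessInputStream().
byte[] payload = Encoding.UTF8.GetBytes("hello blob");
var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
{
    gzip.Write(payload, 0, payload.Length);
}
compressed.Position = 0;

var nonSeekable = new GZipStream(compressed, CompressionMode.Decompress);
using Stream seekable = EnsureSeekable(nonSeekable);
Console.WriteLine($"{nonSeekable.CanSeek} -> {seekable.CanSeek}, {seekable.Length} bytes");
// prints "False -> True, 10 bytes"
```

In the question's upload code this would become something like `await blob.UploadAsync(EnsureSeekable(stream), httpHeaders: blobHttpHeaders);`, disposing the returned stream afterwards. Buffering to disk trades extra I/O for memory, which matters for multi-GB request bodies.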
A workaround is to copy the stream you get from GetBufferlessInputStream() into a MemoryStream and upload the MemoryStream instead; the SDK can then generate the content hash. Note that this buffers the whole request in memory. Sample code:
var requestStream = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);
//copy the non-seekable request stream into a seekable MemoryStream
var stream = new MemoryStream();
requestStream.CopyTo(stream);
stream.Position = 0; //rewind so the upload starts at the first byte
//other code
// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
I'm not sure why, but while debugging I can see that, with GetBufferlessInputStream(), the latest SDK actually calls the Put Block API during the upload, and that API does not store an MD5 hash with the blob.
When using InputStream, however, it calls the Put Blob API.
I am writing a CLI utility that does a lot of different things, but what I'm struggling with right now is this: I have a known blob, and for that blob I want to restore a snapshot that was taken of it.
await foreach (var snapshot in containerClient.GetBlobsAsync(
BlobTraits.All,
BlobStates.Snapshots,
blobPath))
{
_logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
if (DecideIfRightSnapshot(snapshot)) {
BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
_logger.LogInformation($"found snapshot {snapshotBlob.Uri}");
await sourceBlob.StartCopyFromUriAsync(snapshotBlob.Uri);
}
break;
}
First, the filter isn't working right because the last blob in the list is always the base blob. But I can work around that one.
The real issue I'm struggling with is: what is the proper way to restore a blob from a snapshot using the libraries? I'm really concerned because the .Uri property always returns the base blob's URI, even if it's a snapshot. I was led to believe the URI would be something like this:
https://me.blob.core.windows.net/myapp/doc?snapshot=2020-12-16T17:07:44.1076450Z
but that's not the URI that's getting logged. Am I supposed to construct the full URI myself?
In all the searches they refer to this as "promoting" a snapshot. But I can't find a "promote" method in the API.
Am I doing this right?
If you're using the new version of the blob storage SDK, Azure.Storage.Blobs, then you need to construct the full URI yourself. Sample code:
//other code
await foreach (var snapshot in containerClient.GetBlobsAsync(
BlobTraits.All,
BlobStates.Snapshots,
blobPath))
{
_logger.LogInformation($"found blob {snapshot.Name} - {snapshot.Snapshot}");
if(DecideIfRightSnapshot(snapshot)) {
BlobClient snapshotBlob = containerClient.GetBlobClient(snapshot.Name);
//construct the snapshot URL (StartCopyFromUriAsync expects a Uri, not a string)
var snapshotUri = new Uri(snapshotBlob.Uri.AbsoluteUri + "?snapshot=" + snapshot.Snapshot);
_logger.LogInformation($"found snapshot {snapshotUri}");
await sourceBlob.StartCopyFromUriAsync(snapshotUri);
//stop once the chosen snapshot has been restored
break;
}
}
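The string concatenation above works for a plain blob URI, but if the client's URI already carries a query string (for example a SAS token), appending "?snapshot=" produces a second '?'. The sketch below uses only `System.UriBuilder`; `WithSnapshotQuery` is a hypothetical helper name. The v12 SDK also has `BlobBaseClient.WithSnapshot(string)`, which returns a client whose `Uri` includes the snapshot and may be the simpler option.

```csharp
using System;

// Hypothetical helper: appends snapshot=<timestamp> to a blob URI while keeping
// any existing query string (for example a SAS token) intact.
static Uri WithSnapshotQuery(Uri blobUri, string snapshot)
{
    var builder = new UriBuilder(blobUri);
    string parameter = "snapshot=" + Uri.EscapeDataString(snapshot);
    string existing = builder.Query.TrimStart('?');
    // Assign without a leading '?': UriBuilder adds it (older .NET Framework
    // versions would otherwise produce "??").
    builder.Query = existing.Length == 0 ? parameter : existing + "&" + parameter;
    return builder.Uri;
}

var plain = WithSnapshotQuery(
    new Uri("https://me.blob.core.windows.net/myapp/doc"),
    "2020-12-16T17:07:44.1076450Z");
Console.WriteLine(plain.AbsoluteUri);
```

The timestamp is percent-encoded here for safety; the service decodes query parameters, so either form should identify the same snapshot.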
"Promoting" a snapshot means restoring it, for example via the Azure portal. That is a UI operation which actually calls the Put Blob From URL API, and at the time of writing there is no such method in the SDK.
If you're using older packages such as WindowsAzure.Storage, they have many methods for working with snapshots. Note: using the old packages is not recommended.
I am using PgpCore to encrypt/decrypt files in an Azure Function. The files are stored as block blobs in a Blob Storage container.
Encryption works fine, nothing to mention.
Decryption instead is giving me a few headaches.
Ideally, I would like to use the OpenWrite/OpenRead methods on BlockBlobClient in order to avoid downloading and decrypting in memory. I was hoping to use something like this:
await using var sourceStream = await sourceBlobClient.OpenReadAsync();
var destBlobClient = destContainer.GetBlockBlobClient(command.SourcePath);
await using var destStream = await destBlobClient.OpenWriteAsync(true);
await using var privateKey = _pgpKeysProvider.GetPrivate();
using var pgp = new PgpCore.PGP();
await pgp.DecryptStreamAsync(sourceStream, destStream, privateKey.Key, privateKey.Passphrase);
I am however experiencing a few issues:
OpenReadAsync() returns a non-seekable stream, which apparently cannot be used by DecryptStreamAsync(). To mitigate this, I'm downloading the blob in memory, which I honestly wanted to avoid:
await using var sourceStream = new MemoryStream();
await sourceBlobClient.DownloadToAsync(sourceStream);
sourceStream.Position = 0;
Is there any way to fix this?
Even if I bite the bullet and download the encrypted file into a MemoryStream, DecryptStreamAsync() returns an empty stream. The only solution I have found so far is to decrypt into another MemoryStream and then upload that one to the blob container, which I also wanted to avoid. I even tried getting rid of the using declarations and calling Dispose() on the streams manually, in case there was an issue there. No luck.
Small files aren't a problem; I'm concerned about big ones (e.g. 4 GB or more).
Any advice?
I am trying to upgrade my project from Microsoft.WindowsAzure.Storage v9 (deprecated) to latest sdk Azure.Storage.Blobs v12.
My issue (post-upgrade) is accessing the ContentHash property.
Pre-upgrade steps:
upload file to blob
get MD5 hash of uploaded file provided by CloudBlob.Properties.ContentMD5 from Microsoft.WindowsAzure.Storage.Blob
compare the calculated MD5 hash with the one retrieved from azure
Post-upgrade attempts to access the MD5 hash that Azure is calculating on its side:
1. Calling BlobClient.GetProperties()
2. Looking at the BlobContentInfo response from BlobClient.UploadAsync()
In both cases ContentHash is null (see my later question for why).
One huge difference I've noticed is that with the older SDK I could tell the storage client to compute MD5 like this:
CloudBlobClient cloudBlobClient = _cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions.StoreBlobContentMD5 = true;
So I was expecting to find something similar to StoreBlobContentMD5 in the latest SDK, but I couldn't.
Can anyone help me find a solution for this problem?
Edit 1:
I ran a test, and the blob in Azure Storage has no MD5 hash.
Upload code:
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlobClient(blobPath);
BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
blobHttpHeaders = new BlobHttpHeaders()
{
ContentType = fileContentType,
};
}
StorageTransferOptions storageTransferOption = new StorageTransferOptions()
{
MaximumConcurrency = 2,
};
var blobResponse = await blob.UploadAsync(stream, blobHttpHeaders, null, null, null, null, storageTransferOption, default);
return blob.GetProperties();
There is not much difference between the old upload code and the new one, apart from using the classes from the new SDK.
The main difference remains the one I already stated: I cannot find an equivalent of StoreBlobContentMD5 in the new SDK.
I think this is the problem. I need to set the storage client to compute the MD5 hash, as I did with the old SDK.
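As a sketch of the pre-upgrade flow (compute the MD5 locally, then compare it with what Azure reports), assuming only the .NET base class library: `ComputeMd5` and `HashesMatch` are hypothetical helper names. The SDK exposes the hash as raw bytes (`ContentHash` is a `byte[]`), while the Content-MD5 header carries the same bytes base64-encoded.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: MD5 of a stream from its current position to the end.
static byte[] ComputeMd5(Stream content)
{
    using (var md5 = MD5.Create())
    {
        return md5.ComputeHash(content);
    }
}

// Hypothetical helper: compares the local hash with the one from Azure.
// A null remote hash (what the question observes) means "cannot verify".
static bool HashesMatch(byte[] local, byte[] remote) =>
    remote != null && local.SequenceEqual(remote);

byte[] data = Encoding.UTF8.GetBytes("file contents");
var upload = new MemoryStream(data);
byte[] localHash = ComputeMd5(upload);
upload.Position = 0; // hashing consumed the stream; rewind before uploading it

// Same value as the base64 Content-MD5 header for this content.
Console.WriteLine(Convert.ToBase64String(localHash));
```

Since `BlobHttpHeaders` has a `ContentHash` property, a hash computed this way could also be passed at upload time (for example `new BlobHttpHeaders { ContentType = fileContentType, ContentHash = localHash }`) so Azure stores it even when the SDK uploads in blocks. That is a suggestion worth testing, not a documented replacement for `StoreBlobContentMD5`.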
Edit 2:
For download I can do something like this:
var properties = blob.GetProperties();
var download = await blob.DownloadAsync(range: new HttpRange(0, properties.Value.ContentLength), rangeGetContentHash: true);
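By using this overload of DownloadAsync I can force the MD5 hash to be calculated, and it can be found in download.Value.ContentHash (note that the service only returns a range MD5 for ranges of up to 4 MiB; larger ranges are rejected).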
Summary, to close the question:
I did a quick test with the latest blob storage package, version 12.4.4, and I can see that Content-MD5 is auto-generated and can also be read.
According to the OP's comment, the problem may have been due to issues with the existing solution; after creating a new solution, it works as expected.
The short version of this problem is: make sure the Stream you upload to Azure with the v12 SDK supports Seek (check the CanSeek property). Seek is currently required so the SDK can traverse the Stream to generate the hash and then reset the position back to 0 so it can be read again for the actual upload.
I am looking into PdfReport.Core and have been asked to make our .NET Core 2.0 Web API return a PDF to the calling client. The client could be any HTTPS caller, such as an AJAX or MVC client.
Below is some of the code I am using. I am using Swashbuckle to test the API, and it looks like it is returning the report, but when I try to open it in a PDF viewer it says the file is corrupted. I think I am not actually writing the PDF to the stream. Suggestions?
[HttpGet]
[Route("api/v1/pdf")]
public FileResult GetPDF()
{
var outputStream = new MemoryStream();
InMemoryPdfReport.CreateStreamingPdfReport(_hostingEnvironment.WebRootPath, outputStream);
outputStream.Position = 0;
return new FileStreamResult(outputStream, "application/pdf")
{
FileDownloadName = "report.pdf"
};
}
I'm not familiar with that particular library, but generally speaking with streams, file corruption is a result of either 1) the write not being flushed or 2) incorrect positioning within the stream.
Since you've set the position back to zero, I'm guessing the problem is that your write isn't being flushed correctly. Essentially, when you write to a stream, the data is not necessarily "complete" in the stream. Sometimes writes are queued so they can be written more efficiently in batches. Sometimes there are cleanup tasks a particular stream writer needs to complete to "finalize" everything. For example, with a format like PDF, format-specific end matter may need to be appended to the bytes. A stream writer that is writing PDF would take care of this in a flush operation, since it cannot be completed until all writing is done.
Long story short, review the library's documentation. In particular, look for any method or process that deals with "flushing". That's most likely what you're missing.
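I don't know PdfReport.Core's API, so this sketch uses a plain `StreamWriter` to illustrate the general point: bytes written through a buffering writer do not reach the `MemoryStream` until the writer is flushed (or disposed), so rewinding and returning the stream too early hands the client an empty or truncated file. `leaveOpen: true` keeps the `MemoryStream` usable after the writer is disposed.

```csharp
using System;
using System.IO;
using System.Text;

var output = new MemoryStream();
long beforeFlush;
long afterFlush;

// leaveOpen: true so disposing the writer does not also close the
// MemoryStream that would be handed to FileStreamResult.
using (var writer = new StreamWriter(output, new UTF8Encoding(false), 1024, leaveOpen: true))
{
    writer.Write("%PDF-1.4");
    beforeFlush = output.Length; // 0: the text is still in the writer's buffer
    writer.Flush();              // pushes the buffered bytes into the MemoryStream
    afterFlush = output.Length;
}

output.Position = 0; // rewind only after everything has been flushed
Console.WriteLine($"before flush: {beforeFlush}, after flush: {afterFlush}");
// prints "before flush: 0, after flush: 8"
```

A PDF library may need more than a buffer flush (for example closing a document object that writes the trailer), which is why its documentation is the place to look.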
I have to interface with a slightly archaic system that doesn't use webservices. In order to send data to this system, I need to post an XML document into a form on the other system's website. This XML document can get very large so I would like to compress it.
The other system runs on IIS, and I use C# on my end. I could of course implement something that compresses the data before posting it, but that would require changing the other system so it can decompress the data, which I would like to avoid since I don't own it.
I have heard vague things about enabling compression / HTTP 1.1 in IIS and the browser, but I have no idea how to translate that to my program. Basically, is there some property I can set in my program that will make it automatically compress the data it sends to IIS, and make IIS seamlessly decompress it so the receiving app doesn't even know the difference?
Here is some sample code to show roughly what I am doing;
private static void demo()
{
Stream myRequestStream = null;
Stream myResponseStream = null;
HttpWebRequest myWebRequest = (HttpWebRequest)System.Net
.WebRequest.Create("http://example.com");
byte[] bytMessage = null;
bytMessage = Encoding.ASCII.GetBytes("data=xyz");
myWebRequest.ContentLength = bytMessage.Length;
myWebRequest.Method = "POST";
// Set the content type as form so that the data
// will be posted as form
myWebRequest.ContentType = "application/x-www-form-urlencoded";
//Get Stream object
myRequestStream = myWebRequest.GetRequestStream();
//Writes a sequence of bytes to the current stream
myRequestStream.Write(bytMessage, 0, bytMessage.Length);
//Close stream
myRequestStream.Close();
WebResponse myWebResponse = myWebRequest.GetResponse();
myResponseStream = myWebResponse.GetResponseStream();
}
"data=xyz" will actually be "data=[a several MB XML document]".
I am aware that this question may ultimately fall under the non-programming banner if this is achievable through non-programmatic means so apologies in advance.
I see no way to compress the data on one side and receive it uncompressed on the other side without actively decompressing it.
No idea if this will work, since all the examples I could find were for downloads, but you could try compressing the data with gzip and then setting the Content-Encoding header on the outgoing request to gzip. The Content-Length should be the length of the compressed body, since that is the number of bytes actually sent.
Good luck.
EDIT I think the issue is whether the ISAPI filter that supports compression is ever/always/configurably invoked on upload. I couldn't find an answer to that so I suspect that the answer is never, but you won't know until you try (or find the answer that eluded me).
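A minimal sketch of the client half of that suggestion, using only the .NET base class library; `GzipBody` is a hypothetical helper name, and the repeated `<item>` XML is a stand-in for the real document. The `GZipStream` must be disposed before the result is read, because the gzip footer is only written when the stream is closed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical helper: gzip-compresses a request body. Disposing the
// GZipStream (inner using) writes the footer before ToArray() is called.
static byte[] GzipBody(byte[] body)
{
    using (var compressed = new MemoryStream())
    {
        using (var gzip = new GZipStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(body, 0, body.Length);
        }
        return compressed.ToArray();
    }
}

// Stand-in for "data=[a several MB XML document]": repetitive XML compresses well.
string form = "data=" + string.Concat(Enumerable.Repeat("<item>value</item>", 10000));
byte[] raw = Encoding.ASCII.GetBytes(form);
byte[] gz = GzipBody(raw);
Console.WriteLine($"{raw.Length} bytes -> {gz.Length} bytes");
```

With the `HttpWebRequest` code from the question, the compressed bytes would be written to the request stream instead of `bytMessage`, with `myWebRequest.ContentLength = gz.Length;` and `myWebRequest.Headers[HttpRequestHeader.ContentEncoding] = "gzip";`. Whether IIS and the receiving application then decompress the body transparently is exactly the untested part raised in the EDIT; as far as I know, IIS's built-in compression targets responses, so this may still need a change on the receiving side.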