PGP encryption of large files on Azure Blob - C#

I am using PgpCore to encrypt/decrypt files in an Azure Function. The files are stored as block blobs in an Azure Blob Storage container.
Encryption works fine, nothing to report.
Decryption, however, is giving me a few headaches.
Ideally, I would like to use the OpenWrite/OpenRead methods on BlockBlobClient so I can avoid buffering the whole file in memory. I was hoping to use something like this:
await using var sourceStream = await sourceBlobClient.OpenReadAsync();
var destBlobClient = destContainer.GetBlockBlobClient(command.SourcePath);
await using var destStream = await destBlobClient.OpenWriteAsync(true);
await using var privateKey = _pgpKeysProvider.GetPrivate();
using var pgp = new PgpCore.PGP();
await pgp.DecryptStreamAsync(sourceStream, destStream, privateKey.Key, privateKey.Passphrase);
I am, however, running into a few issues:
OpenReadAsync() returns a non-seekable stream, which apparently cannot be used by DecryptStreamAsync(). To mitigate this, I'm downloading the blob into memory, which I honestly wanted to avoid:
await using var sourceStream = new MemoryStream();
await sourceBlobClient.DownloadToAsync(sourceStream);
sourceStream.Position = 0;
Is there any way to fix this?
Even if I bite the bullet and download the encrypted file into a MemoryStream, DecryptStreamAsync() returns an empty stream. The only solution I have found so far is to decrypt into another MemoryStream and then upload that one to the blob container (sketched below), which I also wanted to avoid. I even tried getting rid of the using declarations and calling Dispose() on the streams manually, just in case that was the problem. No luck.
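For reference, the decrypt-to-MemoryStream workaround described above looks roughly like this (a sketch reusing the names from the snippets above; it works, but it holds the whole file in memory twice):
// Buffer the encrypted blob in memory so DecryptStreamAsync gets a seekable stream.
await using var sourceStream = new MemoryStream();
await sourceBlobClient.DownloadToAsync(sourceStream);
sourceStream.Position = 0;

await using var privateKey = _pgpKeysProvider.GetPrivate();
using var pgp = new PgpCore.PGP();

// Decrypt into a second MemoryStream, then upload that to the destination container.
await using var decryptedStream = new MemoryStream();
await pgp.DecryptStreamAsync(sourceStream, decryptedStream, privateKey.Key, privateKey.Passphrase);
decryptedStream.Position = 0;

var destBlobClient = destContainer.GetBlockBlobClient(command.SourcePath);
await destBlobClient.UploadAsync(decryptedStream);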
Small files aren't a problem; I'm concerned about big ones (e.g. 4 GB or more).
Any advice?

Related

Using MemoryStream is throwing an out of memory exception

I have a requirement where I need to encrypt files of 1-2 GB in an Azure Function. I am using the PgpCore library to encrypt the file in memory. The code below throws an OutOfMemoryException if the file size is above 700 MB. Note: I am using an Azure Function, and scaling up the App Service plan didn't help.
Is there any alternative to MemoryStream that I can use? After encryption, I am uploading the file to Blob Storage.
var privateKeyEncoded = Encoding.UTF8.GetString(Convert.FromBase64String(_options.PGPKeys.PublicKey));
using Stream privateKeyStream = StringToStreamUtility.GenerateStreamFromString(privateKeyEncoded);
privateKeyStream.Position = 0;
var encryptionKeys = new EncryptionKeys(privateKeyStream);
var pgp = new PGP(encryptionKeys);
// encrypt into an in-memory buffer
var encryptStream = new MemoryStream();
await pgp.EncryptStreamAsync(streamToEncrypt, encryptStream);
MemoryStream is a Stream wrapper over a byte[] buffer. Every time that buffer fills up, a new one with double the size is allocated and the data is copied over. This eventually uses double the final buffer size (4 GB for a 2 GB file), but worse, it causes such memory fragmentation that eventually the allocator can't find a contiguous block big enough for the next resize. That's when you get an OOM.
While you could avoid OOM errors by specifying a capacity in the constructor, storing 2 GB in memory before even starting to write it out is very wasteful. With a real FileStream, the encrypted bytes would be written out as soon as they were available.
Azure Functions provide temporary local storage. This means you can create a temporary file, open a stream over it and use that as the encryption output.
var tempPath = Path.GetTempFileName();
try
{
    using (var outputStream = File.Open(tempPath, FileMode.Create))
    {
        await pgp.EncryptStreamAsync(streamToEncrypt, outputStream);
        ...
    }
}
finally
{
    File.Delete(tempPath);
}
MemoryStream uses a byte[] internally, and any byte[] is going to get a bit brittle as it gets around or above 1 GiB (although in theory a byte[] can be nearly 2 GiB, in reality this isn't a good idea, and is rarely seen).
Frankly, MemoryStream simply isn't a good choice here; I'd probably suggest using a temporary file instead, via a FileStream. This doesn't attempt to keep everything in memory at once, and is more reliable at large sizes. Alternatively: avoid ever needing all the data at once, by performing the encryption in a pass-through streaming way (sketched below).
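A minimal sketch of that pass-through idea, assuming the Azure.Storage.Blobs v12 SDK and that PgpCore can read streamToEncrypt straight through (the helper name and parameters are made up for illustration):
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using PgpCore;

// Hypothetical helper: the encrypted output goes straight into the destination blob's
// write stream, so no MemoryStream (and no temp file) is needed for the output side.
static async Task EncryptToBlobAsync(
    PGP pgp, Stream streamToEncrypt, BlobContainerClient container, string blobName)
{
    var destBlob = container.GetBlockBlobClient(blobName);

    // OpenWriteAsync stages blocks as the data arrives instead of buffering the whole file.
    await using Stream outputStream = await destBlob.OpenWriteAsync(overwrite: true);
    await pgp.EncryptStreamAsync(streamToEncrypt, outputStream);
}
Whether this helps on the input side depends on whether streamToEncrypt itself has to be seekable for PgpCore; the output side, at least, no longer needs a 2 GB buffer.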

ContentHash not calculated in Azure Blob Storage v12

Continuing the saga, here is part I: ContentHash is null in Azure.Storage.Blobs v12.x.x
After a lot of debugging, the root cause appears to be that the content hash is not calculated after uploading a blob, so BlobContentInfo and BlobProperties return a null content hash, and my whole flow is based on receiving that hash from Azure.
What I've discovered is that it depends on which HttpRequest stream I use for the upload to Azure:
With HttpRequest.GetBufferlessInputStream(), the content hash is not calculated; even in Azure Storage Explorer, the ContentMD5 of the blob is empty.
With HttpRequest.InputStream, everything works as expected.
Do you know why the behavior differs? And do you know how to get a content hash for streams obtained via the GetBufferlessInputStream method?
So the code flow looks like this:
var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);
var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
    blobHttpHeaders = new BlobHttpHeaders()
    {
        ContentType = fileContentType,
    };
}

// retry is already configured on the Azure Storage API
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
return await blob.GetPropertiesAsync();
In the snippet above, ContentHash is NOT calculated, but if I change the way I get the stream from the HTTP request to the following, ContentHash is calculated:
var stream = HttpContext.Current.Request.InputStream;
P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams received via the GetBufferlessInputStream method.
P.S.2: there is also an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037
P.S.3: added the code snippet.
Ran into this today. From my digging, it appears this is a symptom of the type of Stream you use to upload, and it's not really a bug. In order to generate a hash for your blob (which appears to be done on the client side before uploading), the SDK needs to read the stream, which means it then has to reset the stream's position back to 0 for the actual upload. Doing that requires the stream to support the Seek operation. If your stream doesn't support Seek, then it looks like the hash simply isn't generated.
To get around the issue, make sure the stream you provide supports Seek (CanSeek). If it doesn't, use a different stream or copy your data into one that does (for example a MemoryStream). The alternative would be for the internals of the Blob SDK to do this for you.
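In code, the check boils down to something like this (a sketch; requestStream stands in for whichever HttpRequest stream you ended up with, and blob / blobHttpHeaders are from the question's snippet):
// Only buffer when the incoming stream can't seek; seekable streams can be hashed
// and uploaded directly.
Stream uploadSource = requestStream;
if (!requestStream.CanSeek)
{
    var buffered = new MemoryStream();
    await requestStream.CopyToAsync(buffered);
    buffered.Position = 0;
    uploadSource = buffered;
}

await blob.UploadAsync(uploadSource, httpHeaders: blobHttpHeaders);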
A workaround: when you get the stream via the GetBufferlessInputStream() method, copy it into a MemoryStream, then upload the MemoryStream. The content hash can then be generated. Sample code below:
var stream111 = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

// convert to a seekable MemoryStream
MemoryStream stream = new MemoryStream();
stream111.CopyTo(stream);
stream.Position = 0;

// other code ...
// retry is already configured on the Azure Storage API
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);
Not sure why, but from my debugging I can see that when using GetBufferlessInputStream() with the latest SDK, the upload actually calls the Put Block API in the backend, and that API does not store an MD5 hash with the blob (refer to the Put Block documentation for details).
However, when using InputStream, it calls the Put Blob API instead.

C# - Renci.SshNet - which gives better performance: WriteAllText vs. UploadFile?

I need to generate multiple XML files at an SFTP location from C# code. For SFTP connectivity I am using Renci.SshNet. I found there are different methods for writing files, including WriteAllText() and UploadFile(). I am producing the XML string at runtime; currently I use the WriteAllText() method (to avoid creating the XML file locally and thus avoid the IO operation).
using (SftpClient client = new SftpClient(host, port, sftpUser, sftpPassword))
{
    client.Connect();
    if (client.IsConnected)
    {
        client.BufferSize = 1024;
        var filePath = sftpDir + fileName;
        client.WriteAllText(filePath, contents);
        client.Disconnect();
    }
    client.Dispose();
}
Will using UploadFile(), either from a FileStream or a MemoryStream, give me better performance in the long run?
The resulting document size will be in KB, around 60 KB.
Thanks!
SftpClient.UploadFile is optimized for uploads of large amounts of data.
But for 60 KB, I'm pretty sure it makes no difference whatsoever, so you can continue using the more convenient SftpClient.WriteAllText.
Though, I believe most XML generators (like .NET's XmlWriter) are able to write XML to a Stream (it's usually the preferred output API, rather than a string), so using SftpClient.UploadFile can end up being more convenient anyway.
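For example, something along these lines (a sketch; BuildXml stands in for whatever currently produces the XML, and the connection details are the ones from the question):
using System.IO;
using System.Xml;
using Renci.SshNet;

// Write the XML straight into a stream and hand that stream to UploadFile,
// so the document never has to exist as one large string.
using (var client = new SftpClient(host, port, sftpUser, sftpPassword))
using (var buffer = new MemoryStream())
{
    using (var writer = XmlWriter.Create(buffer, new XmlWriterSettings { CloseOutput = false }))
    {
        BuildXml(writer); // e.g. writer.WriteStartElement(...) / writer.WriteElementString(...)
    }

    buffer.Position = 0;

    client.Connect();
    client.UploadFile(buffer, sftpDir + fileName);
    client.Disconnect();
}
At 60 KB the in-memory buffer is negligible; the point is simply that the writer can target any Stream, including one you hand to UploadFile.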
See also What is the difference between SftpClient.UploadFile and SftpClient.WriteAllBytes?

C# Azure Function compare changes between two Blobs

I am working on an Azure Function that should read two .csv files stored as blobs in Azure Blob Storage and return a third, new blob containing the lines that differ between the two input blobs.
For example:
csv1:
12,aaa,bbb,ccc,ddd,eee,fff
13,aaa,bbb,ccc,ddd,eee,fff
csv2:
12,bbb,aaa,ccc,ddd,eee,fff
13,aaa,bbb,ccc,ddd,eee,fff
14,aaa,bbb,ccc,ddd,eee,fff
Output csv:
12,bbb,aaa,ccc,ddd,eee,fff
14,aaa,bbb,ccc,ddd,eee,fff
So far I have been able to read the blob files, but I have been unsuccessful in comparing them directly. I did manage to get it working by reading the blobs into two DataTables and comparing those, but that method is far too slow and I am pretty sure there is a far more efficient way of handling it.
(Being more at home with PowerShell, Compare-Object is pretty much exactly the kind of thing I would love to recreate; the sketch below shows the sort of comparison I'm after.)
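Roughly the kind of comparison I'm after, sketched with LINQ (assuming the two blobs' contents have already been read into textA for csv1 and textB for csv2, as in the snippets further down):
using System;
using System.Linq;

// Lines that exist in csv2 but not in csv1, i.e. changed or newly added rows.
var linesA = textA.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var linesB = textB.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

// For the example above this yields the "12,bbb,aaa,..." and "14,aaa,bbb,..." rows.
var differences = linesB.Except(linesA);

// Joined back into CSV text, ready to be uploaded as the third blob.
string outputCsv = string.Join(Environment.NewLine, differences);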
I can load the blobs using either the .DownloadText() or the .DownloadToStream() method, so getting the blob contents is no problem:
var blobA = container.GetBlockBlobReference("FileA");
var blobB = container.GetBlockBlobReference("FileB");

string blobContentsA = blobA.DownloadText();
string blobContentsB = blobB.DownloadText();
or
string textA;
using (var memoryStream = new MemoryStream())
{
    blobA.DownloadToStream(memoryStream);
    textA = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
}

string textB;
using (var memoryStream = new MemoryStream())
{
    blobB.DownloadToStream(memoryStream);
    textB = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
}
I tried the code below, but then I get a "cannot convert from 'System.Collections.Generic.IEnumerable' to 'string'" message, so I guess I have to do something there, but what, I honestly have no clue.

Creating a zip file in situ within Azure Blob Storage

I have files stored in one container within a blob storage account. I need to create a zip file in a second container containing the files from the first container.
I have a solution that works using a worker role and DotNetZip, but because the zip file could end up being 1 GB in size I am concerned that doing all the work in-process, using MemoryStream objects etc., is not the best way of doing this. My biggest concern is memory usage and freeing up resources, given that this process could happen several times a day.
Below is some very stripped-down code showing the basic process in the worker role:
using (ZipFile zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var blob = new CloudBlob(uri);

        byte[] fileBytes = blob.DownloadByteArray();
        using (var fileStream = new MemoryStream(fileBytes))
        {
            fileStream.Seek(0, SeekOrigin.Begin);
            byte[] bytes = CryptoHelp.EncryptAsBytes(fileStream, "password", null);
            zipFile.AddEntry("entry name", bytes);
        }
    }

    using (var zipStream = new MemoryStream())
    {
        zipFile.Save(zipStream);
        zipStream.Seek(0, SeekOrigin.Begin);

        var blobRef = ContainerDirectory.GetBlobReference("output uri");
        blobRef.UploadFromStream(zipStream);
    }
}
Can someone suggest a better approach please?
At the time of writing this question, I was unaware of the LocalStorage options available in Azure. I was able to write the files individually to local storage, work with them there, and then write the results back to blob storage (sketched below).
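Roughly what that ended up looking like (a sketch: the "ZipScratch" local resource name is made up and would need to be declared in the service definition, and the CryptoHelp encryption step is left out for brevity):
using System.IO;
using Ionic.Zip;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Stage the blobs and the zip on the role's local disk, so nothing large is held
// in MemoryStream objects.
string workingDir = RoleEnvironment.GetLocalResource("ZipScratch").RootPath;
string zipPath = Path.Combine(workingDir, "output.zip");

using (var zipFile = new ZipFile())
{
    foreach (var uri in uriCollection)
    {
        var blob = new CloudBlob(uri);
        string localPath = Path.Combine(workingDir, Path.GetFileName(uri.LocalPath));

        using (var fileStream = File.Create(localPath))
        {
            blob.DownloadToStream(fileStream); // stage on disk instead of DownloadByteArray
        }

        zipFile.AddFile(localPath, string.Empty); // DotNetZip reads the file when saving
    }

    zipFile.Save(zipPath);
}

using (var zipStream = File.OpenRead(zipPath))
{
    var blobRef = ContainerDirectory.GetBlobReference("output uri");
    blobRef.UploadFromStream(zipStream);
}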
If all you are worried about is your MemoryStream taking up too much memory, then what you can do is implement your own stream: as your stream is being read, you add your zip files to the stream and remove the files that have already been read. This keeps the in-memory footprint at roughly the size of one file.
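For what it's worth, here is a sketch of the same streaming idea without a custom Stream implementation, using System.IO.Compression.ZipArchive and the newer Azure.Storage.Blobs v12 SDK instead of DotNetZip (containerClient, sourceBlobNames and the archive name are illustrative); entries are written straight into the blob's upload stream, so data moves through in chunks rather than being buffered whole:
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

// Hypothetical helper: ZipArchiveMode.Create can write to the non-seekable stream
// returned by OpenWriteAsync, so the zip is assembled directly in Blob Storage
// without ever existing in memory or on disk as a whole.
static async Task ZipBlobsAsync(
    BlobContainerClient containerClient, string[] sourceBlobNames, string archiveName)
{
    var destBlob = containerClient.GetBlockBlobClient(archiveName);

    await using (Stream uploadStream = await destBlob.OpenWriteAsync(overwrite: true))
    using (var archive = new ZipArchive(uploadStream, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (string name in sourceBlobNames)
        {
            ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.Optimal);

            // Copy one blob at a time: blob read stream -> zip entry -> upload stream.
            await using Stream entryStream = entry.Open();
            await using Stream blobStream = await containerClient.GetBlobClient(name).OpenReadAsync();
            await blobStream.CopyToAsync(entryStream);
        }
    }
}
The per-entry encryption from the question would need to be applied while copying into each entry stream.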
