C# Azure Function: compare changes between two Blobs

I am working on an Azure Function that should read two .csv files stored as Blobs in Azure Blob Storage and return a third, new blob containing the lines that differ between the two input files.
For example:
csv1:
12,aaa,bbb,ccc,ddd,eee,fff
13,aaa,bbb,ccc,ddd,eee,fff
csv2:
12,bbb,aaa,ccc,ddd,eee,fff
13,aaa,bbb,ccc,ddd,eee,fff
14,aaa,bbb,ccc,ddd,eee,fff
Output csv:
12,bbb,aaa,ccc,ddd,eee,fff
14,aaa,bbb,ccc,ddd,eee,fff
So far I have been able to read the Blob files, but I have been unsuccessful in comparing them directly. I did manage to get it to work by reading in the Blobs, loading them into two DataTables and performing the comparison between those. However, that method is far too slow, and I am pretty sure there is a far more efficient way of handling it.
(Being more at home with PowerShell, the Compare-Object cmdlet is pretty much exactly the kind of thing I would love to recreate.)
I can load in the Blobs using either the .DownloadText() or the .DownloadToStream() methods, so getting the Blob contents is no problem.
var blobA = container.GetBlockBlobReference("FileA");
var blobB = container.GetBlockBlobReference("FileB");
string blobContentsA = blobA.DownloadText();
string blobContentsB = blobB.DownloadText();
or
string textA;
using (var memoryStream = new MemoryStream())
{
    blobA.DownloadToStream(memoryStream);
    textA = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
}

string textB;
using (var memoryStream = new MemoryStream())
{
    blobB.DownloadToStream(memoryStream);
    textB = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
}
I tried the code below, but then I get a "cannot convert from 'System.Collections.Generic.IEnumerable' to 'string'" message, so I guess I have to do something there, but what exactly I honestly have no clue.
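For what it's worth, a Compare-Object-style, line-based diff can be sketched without DataTables by putting one file's lines into a HashSet<string> and filtering the other file against it. Starting from the blobContentsA and blobContentsB strings above, and assuming lines are compared verbatim and "FileC" is just a placeholder name for the output blob:

// Sketch only: line-based diff, roughly what Compare-Object does on whole lines.
// Requires: using System; using System.Linq; using System.Collections.Generic;
var linesA = new HashSet<string>(
    blobContentsA.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries));
var linesB = blobContentsB.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

// Lines present in csv2 but not in csv1 (matches the example output above)
IEnumerable<string> difference = linesB.Where(line => !linesA.Contains(line));

// UploadText expects a string, so the IEnumerable<string> has to be joined first
// (which is also what the "cannot convert from IEnumerable to string" error hints at).
var blobC = container.GetBlockBlobReference("FileC"); // placeholder output blob
blobC.UploadText(string.Join(Environment.NewLine, difference));

If lines that only appear in csv1 should also end up in the output, run the same filter in the other direction and concatenate the two results.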

Related

How to set ContentMD5 in DataLakeFileClient?

When uploading to an Azure Data Lake using Microsoft Azure Storage Explorer, a value for the ContentMD5 property is automatically generated and stored. The same happens in a function app that uses a Blob binding.
However, the value is not generated automatically when uploading from a C# DLL.
I want to use this value to compare files in the future.
My code for the upload is very simple.
DataLakeFileClient fileClient = await directoryClient.CreateFileAsync("testfile.txt");
await fileClient.UploadAsync(fileStream);
I also know I can generate an MD5 using the below code, but I'm not certain if this is the same way that Azure Storage Explorer does it.
byte[] md5hash;
using (var md5gen = MD5.Create())
{
    md5hash = md5gen.ComputeHash(fileStream);
}
but I have no idea how to set this value to the ContentMD5 property of the file.
I have found the solution.
The UploadAsync method has an overload that accepts a parameter of type DataLakeFileUploadOptions. This class contains an HttpHeaders object, which in turn has a ContentHash property that gets stored as a property of the document.
var uploadOptions = new DataLakeFileUploadOptions();
uploadOptions.HttpHeaders = new PathHttpHeaders();
uploadOptions.HttpHeaders.ContentHash = md5hash;
await fileClient.UploadAsync(fileStream, uploadOptions);
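One thing worth noting (a general Stream-hashing caveat, not from the original post): ComputeHash reads fileStream to the end, so the stream has to be rewound before UploadAsync, otherwise an empty file gets uploaded. Assuming fileStream is seekable, the combined flow looks roughly like this:

// ComputeHash has consumed the stream, so rewind it before uploading
fileStream.Position = 0;

var uploadOptions = new DataLakeFileUploadOptions
{
    HttpHeaders = new PathHttpHeaders { ContentHash = md5hash }
};
await fileClient.UploadAsync(fileStream, uploadOptions);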

PGP encryption of large files on Azure Blob

I am using PGPCore to encrypt/decrypt files in an Azure Function. Files are stored in a Blob Block Container.
Encryption works fine, nothing to mention.
Decryption instead is giving me a few headaches.
Ideally, I would like to use the OpenWrite/OpenRead methods on BlockBlobClient in order to avoid downloading and decrypting in memory. I was hoping to use something like this:
await using var sourceStream = await sourceBlobClient.OpenReadAsync();
var destBlobClient = destContainer.GetBlockBlobClient(command.SourcePath);
await using var destStream = await destBlobClient.OpenWriteAsync(true);
await using var privateKey = _pgpKeysProvider.GetPrivate();
using var pgp = new PgpCore.PGP();
await pgp.DecryptStreamAsync(sourceStream, destStream, privateKey.Key, privateKey.Passphrase);
I am however experiencing a few issues:
OpenReadAsync() returns a non-seekable stream, which apparently cannot be used by DecryptStreamAsync(). To mitigate this, I'm downloading the blob in memory, which I honestly wanted to avoid:
await using var sourceStream = new MemoryStream();
await sourceBlobClient.DownloadToAsync(sourceStream);
sourceStream.Position = 0;
Is there any way to fix this?
Even if I bite the bullet and download the encrypted file into a MemoryStream, DecryptStreamAsync() returns an empty stream. The only solution I have found so far is to decrypt to another MemoryStream and then upload that one to the Blob Container, which I also wanted to avoid. I even tried to get rid of the using statements and manually call Dispose() on the streams, just in case there was a potential issue there. No luck.
Small files aren't a problem; I'm concerned about big ones (e.g. 4 GB or more).
Any advice?
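One possible workaround for the memory pressure, sketched only and not verified against the original setup: spill the encrypted blob into a temporary file instead of a MemoryStream. A FileStream is seekable, so the source stream issue goes away without holding several gigabytes in memory (the PgpCore call below simply mirrors the one in the question):

// Sketch: buffer the encrypted blob to a temp file (seekable) instead of memory.
// FileOptions.DeleteOnClose removes the temp file once the stream is disposed.
var tempPath = Path.GetTempFileName();
await using var sourceStream = new FileStream(
    tempPath, FileMode.Create, FileAccess.ReadWrite, FileShare.None,
    bufferSize: 81920, FileOptions.Asynchronous | FileOptions.DeleteOnClose);

await sourceBlobClient.DownloadToAsync(sourceStream);
sourceStream.Position = 0;

var destBlobClient = destContainer.GetBlockBlobClient(command.SourcePath);
await using var destStream = await destBlobClient.OpenWriteAsync(true);

await using var privateKey = _pgpKeysProvider.GetPrivate();
using var pgp = new PgpCore.PGP();
await pgp.DecryptStreamAsync(sourceStream, destStream, privateKey.Key, privateKey.Passphrase);
await destStream.FlushAsync();

This trades memory for local disk, which on an Azure Function plan with limited temp storage is its own constraint to check.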

Reliable image handling on Azure platform / Dotnet core

I'm struggling a little with images on the Azure platform under dotnet core and I'm hoping someone can make a sensible suggestion.
Simple enough premise: user uploads an image, which is saved in a database as base64 (about to move to Azure Storage blobs, but that's irrelevant to this). Later on, the site owner comes along and clicks a button to get all these images down in a ZIP file. In the old days of .NET Framework this was easy. Now I seem to be hitting a set of 'yes, but' comments.
Yes, there's System.Drawing.Image, but you can't use that because it wasn't in .NET Core (until recently).
Yes, you can use CoreCompat, but it doesn't work on Azure because there is no GDI+ support in Web Applications.
Yes, even if I could, I'm developing on a Mac, so it won't work locally as far as I can see.
I have tried beta4 of ImageSharp without a lot of success. It's random - sometimes it works, sometimes it just throws OutOfMemoryException.
I have tried SkiaSharp but similar results; sometimes it works, sometimes it spits out random error messages.
I'm not doing anything fancy in terms of processing, no resizing or anything. It should be a case of loading the file into a byte array via Convert.FromBase64String, creating a ZIP file entry, and ultimately spitting out the ZIP file. The ZIP portion is fine, but I need something decent that can do the image work.
Here's a bit of code:
if (!string.IsNullOrEmpty(del.Headshot))
{
    var output = SKImage.FromBitmap(SKBitmap.Decode(Convert.FromBase64String(del.Headshot)));
    MemoryStream savedFile = new MemoryStream();
    output.Encode(SKEncodedImageFormat.Jpeg, 100).SaveTo(savedFile);
    string name = $"{del.FirstName} {del.LastName} - {del.Company}".Trim(Path.GetInvalidFileNameChars()) + "_Headshot.jpg";
    ZipArchiveEntry entry = zip.CreateEntry(name);
    using (Stream entryStream = entry.Open())
    {
        entryStream.Write(savedFile.ToArray(), 0, Convert.ToInt32(savedFile.Length));
    }
    output.Dispose();
    savedFile.Dispose();
}
Can anyone give me a sensible suggestion for a library that can handle images, cross-platform and on Azure, before I pull out what little hair remains!
Thanks
EDIT: The first answer is technically correct, I don't need anything else. However, I might have been a bit wrong when I said I wasn't doing any image manipulation. Because it's all base64 without a filename being stored anywhere, I've no idea what sort of file it is. I'm therefore saving each one as JPEG to ensure that I can always output that file type and extension. Users I guess could be uploading JPG / PNG or even GIF.
Technically you do not need any of those other imaging libraries (unless you are doing more than just zipping the content). Convert the base64 to a byte array and pass that to the zip file. No need to save to disk just to read it back again for zipping.
//...
if (!string.IsNullOrEmpty(del.Headshot)) {
    var imageBytes = Convert.FromBase64String(del.Headshot);
    string name = $"{del.FirstName} {del.LastName} - {del.Company}".Trim(Path.GetInvalidFileNameChars()) + "_Headshot.jpg";
    ZipArchiveEntry entry = zip.CreateEntry(name);
    using (Stream entryStream = entry.Open()) {
        entryStream.Write(imageBytes, 0, imageBytes.Length);
    }
}
//...
There is also a minor hack for recognizing known image types from the start of their base64 string:
public static class ImagesUtility {
    static IDictionary<string, string> mimeMap =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "IVBOR", "png" },
            { "/9J/4", "jpg" },
            //...add others
        };

    /// <summary>
    /// Extract image file extension from base64 string.
    /// </summary>
    /// <param name="base64String">base64 string.</param>
    /// <returns>file extension from string, or null if not recognized.</returns>
    public static string GetFileExtension(string base64String) {
        var data = base64String.Substring(0, 5);
        // TryGetValue so an unrecognized prefix returns null instead of throwing
        return mimeMap.TryGetValue(data, out var extension) ? extension : null;
    }
}
You could then try to determine the file extension from its prefix:
if (!string.IsNullOrEmpty(del.Headshot)) {
    var imageBytes = Convert.FromBase64String(del.Headshot);
    var ext = ImagesUtility.GetFileExtension(del.Headshot) ?? "jpg";
    string name = $"{del.FirstName} {del.LastName} - {del.Company}".Trim(Path.GetInvalidFileNameChars()) + $"_Headshot.{ext}";
    ZipArchiveEntry entry = zip.CreateEntry(name);
    using (Stream entryStream = entry.Open()) {
        entryStream.Write(imageBytes, 0, imageBytes.Length);
    }
}
Now ideally, if you are able to control the type of image uploaded, you should also validate it and do whatever image processing is needed when the data is being saved, along with storing any needed metadata (i.e. content type). That way, when extracting it from storage, you can be confident it is the correct type and size, which saves you from having to worry about it later on.
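For example, a rough, illustrative sketch of sniffing the type from the raw bytes at save time (the method name and magic-number list are made up for the example and not exhaustive):

// Illustrative only: detect the image type from magic bytes when the upload is saved,
// and store the result with the record so the zip/download code never has to guess.
public static string DetectContentType(byte[] bytes)
{
    if (bytes.Length >= 4 && bytes[0] == 0x89 && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47)
        return "image/png";  // PNG signature
    if (bytes.Length >= 3 && bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF)
        return "image/jpeg"; // JPEG signature
    if (bytes.Length >= 4 && bytes[0] == 0x47 && bytes[1] == 0x49 && bytes[2] == 0x46 && bytes[3] == 0x38)
        return "image/gif";  // GIF signature ("GIF8")
    return null;             // unknown; reject the upload or fall back as needed
}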
Aspose.Drawing and Aspose.Imaging can handle images cross-platform, running on .NET Core (I'm one of the developers).

Strange results from OpenReadAsync() when reading data from Azure Blob storage

I'm having a go at modifying an existing C# (.NET Core) app that reads a type of binary file to use Azure Blob Storage.
I'm using WindowsAzure.Storage (8.6.0).
The issue is that this app reads the binary data from a Stream in very small blocks (e.g. 5000-6000 bytes). This reflects how the data is structured.
Example pseudo code:
var blocks = new List<byte[]>();
var numberOfBytesToRead = 6240;
var numberOfBlocksToRead = 1700;

using (var stream = await blob.OpenReadAsync())
{
    stream.Seek(3000, SeekOrigin.Begin); // start reading at a particular position

    for (int i = 1; i <= numberOfBlocksToRead; i++)
    {
        byte[] traceValues = new byte[numberOfBytesToRead];
        stream.Read(traceValues, 0, numberOfBytesToRead);
        blocks.Add(traceValues);
    }
}
If I try to read a 10 MB file using OpenReadAsync(), I get invalid/junk values in the byte arrays after around 4,190,000 bytes.
If I set StreamMinimumReadSize to 100 MB it works.
If I read more data per block (e.g. 1 MB) it works.
Some of the files can be more than 100 MB, so setting the StreamMinimumReadSize may not be the best solution.
What is going on here, and how can I fix this?
Are the invalid/junk values zeros? If so (and maybe even if not), check the return value from stream.Read. That method is not guaranteed to read the number of bytes you ask for; it can read less, in which case you are supposed to call it again in a loop until it has read the total amount you want. A quick web search should show you lots of examples of the necessary looping.
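For example, a small helper along these lines (the usual pattern, sketched) keeps calling Read until the requested number of bytes has arrived or the stream ends:

// Reads up to 'count' bytes, looping until that many arrive or the stream ends;
// returns the number of bytes actually read.
static int ReadExactly(Stream stream, byte[] buffer, int count)
{
    int totalRead = 0;
    while (totalRead < count)
    {
        int read = stream.Read(buffer, totalRead, count - totalRead);
        if (read == 0)
            break; // end of stream
        totalRead += read;
    }
    return totalRead;
}

In the loop from the question, that would replace the bare stream.Read call with something like int read = ReadExactly(stream, traceValues, numberOfBytesToRead);.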

how to get a picture from a postgresql database

I want to select a picture that is saved as a large object in a PostgreSQL database.
I know I can use lo_export to do this.
But there is a problem: I want to save this picture directly to my computer, because I can't access the files saved on the server using lo_export.
(I think it would be best for me if the picture were transferred to my computer by a select query.)
I don't exactly know my way around C#, but the Npgsql manual has an example, something like this, of writing a bytea column to a file:
command = new NpgsqlCommand("select blob from t where id = 1;", conn);
Byte[] result = (Byte[])command.ExecuteScalar();
FileStream fs = new FileStream(args[0] + "database", FileMode.Create, FileAccess.Write);
BinaryWriter bw = new BinaryWriter(new BufferedStream(fs));
bw.Write(result);
bw.Flush();
bw.Close();
fs.Close();
So you just read it out of the database pretty much like any other column and write it to a local file. The example is about half way down the page I linked to, just search for "bytea" and you'll find it.
UPDATE: For large objects, the process appears to be similar but less SQL-ish. The manual (as linked to above) includes a few large object examples:
NpgsqlTransaction t = Polacz.BeginTransaction();
LargeObjectManager lbm = new LargeObjectManager(Polacz);
LargeObject lo = lbm.Open(takeOID(idtowaru), LargeObjectManager.READWRITE); // take picture oid from method takeOID
byte[] buf = new byte[lo.Size()];
buf = lo.Read(lo.Size());
MemoryStream ms = new MemoryStream();
ms.Write(buf, 0, lo.Size());
// ...
Image zdjecie = Image.FromStream(ms);
Search the manual for "large object" and you'll find it.
I'm not familiar with C#, but if you have contrib/dblink around and better access to a separate PostgreSQL server, this might work:
1. Select the large object from the bad DB server.
2. Copy the large object into the good DB server using dblink.
3. Run lo_export on the good DB server.
If your pictures don't exceed 1 GB (and you don't need to access only parts of the bytes), then bytea is the better choice for storing them.
A lot of SQL GUI tools allow you to download (or even view) the content of bytea columns directly.
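For completeness, a rough Npgsql sketch of pulling a bytea column straight into a local file (the table, column, and path names here are placeholders, not from the original post):

// Sketch: read a bytea column with Npgsql and write it to a local file.
// "pictures", "image_data", "id" and the output path are placeholder names.
using (var conn = new NpgsqlConnection(connectionString))
{
    conn.Open();
    using (var cmd = new NpgsqlCommand("SELECT image_data FROM pictures WHERE id = @id", conn))
    {
        cmd.Parameters.AddWithValue("id", 1);
        var bytes = (byte[])cmd.ExecuteScalar();
        File.WriteAllBytes(@"C:\temp\picture.jpg", bytes);
    }
}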
