The websocket client is returning a ReadOnlyMemory<byte>.
The issue is that JsonDocument.Parse fails because the buffer has been compressed; I need to decompress it somehow before I parse it. How do I do that? I can't really change the websocket library code.
What I want is something like public Func<ReadOnlyMemory<byte>> DataInterpreterBytes = () => which optionally decompresses these bytes from outside of this class. How do I do that? Is it possible to decompress a ReadOnlyMemory<byte>, and, if the handler is unset, to simply do nothing?
private static string DecompressData(byte[] byteData)
{
    using var decompressedStream = new MemoryStream();
    using var compressedStream = new MemoryStream(byteData);
    using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
    deflateStream.CopyTo(decompressedStream);
    decompressedStream.Position = 0;
    using var streamReader = new StreamReader(decompressedStream);
    return streamReader.ReadToEnd();
}
Snippet
private void OnMessageReceived(object? sender, MessageReceivedEventArgs e)
{
    var timestamp = DateTime.UtcNow;
    _logger.LogTrace("Message was received. {Message}", Encoding.UTF8.GetString(e.Message.Buffer.Span));

    // We dispose that object later on
    using var document = JsonDocument.Parse(e.Message.Buffer);
    var tokenData = document.RootElement;
So, if you had a byte array, you'd do this:
private static JsonDocument DecompressData(byte[] byteData)
{
    using var compressedStream = new MemoryStream(byteData);
    using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
    return JsonDocument.Parse(deflateStream);
}
This is similar to your snippet above, but with no need for the intermediate copy: JsonDocument.Parse has an overload that takes a Stream, so you can read straight from the GZipStream and avoid another useless copy.
Unfortunately, you don't have a byte array, you have a ReadOnlyMemory<byte>. There is no way out of the box to create a memory stream out of a ReadOnlyMemory<byte>. Honestly, it feels like an oversight, like they forgot to put that feature into .NET.
So here are your options instead.
The first option is to just convert the ReadOnlyMemory<byte> object to an array with ToArray():
// assuming e.Message.Buffer is a ReadOnlyMemory<byte>
using var document = DecompressData(e.Message.Buffer.ToArray());
This is really straightforward, but remember it actually copies the data, so for large documents it might not be a good idea if you want to avoid using too much memory.
The second option is to extract the underlying array from the memory. This can be done with MemoryMarshal.TryGetArray, which gives you an ArraySegment (but fails if the memory isn't actually backed by a managed array).
private static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    if (MemoryMarshal.TryGetArray(byteData, out var segment))
    {
        using var compressedStream = new MemoryStream(segment.Array, segment.Offset, segment.Count, writable: false);
        using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        return JsonDocument.Parse(deflateStream);
    }
    else
    {
        // This memory isn't backed by a managed array, so fall back to copying
        return DecompressData(byteData.ToArray());
    }
}
The third way might feel dirty, but if you're okay with using unsafe code, you can just pin the memory's span and then use UnmanagedMemoryStream:
private static unsafe JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    fixed (byte* ptr = byteData.Span)
    {
        using var compressedStream = new UnmanagedMemoryStream(ptr, byteData.Length);
        using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        return JsonDocument.Parse(deflateStream);
    }
}
The other solution is to write your own Stream class that supports this. The Windows Community Toolkit has an extension method that returns a Stream wrapper around the memory object. If you're not okay with using an entire third party library just for that, you can probably just roll your own, it's not that much code.
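If you do roll your own, a minimal read-only wrapper is enough for this use case. A sketch (the class name and details are illustrative, not a library API):

```csharp
using System;
using System.IO;

// A minimal read-only Stream over a ReadOnlyMemory<byte>: no copying, no unsafe code.
public sealed class ReadOnlyMemoryStream : Stream
{
    private readonly ReadOnlyMemory<byte> _memory;
    private int _position;

    public ReadOnlyMemoryStream(ReadOnlyMemory<byte> memory) => _memory = memory;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _memory.Length;
    public override long Position
    {
        get => _position;
        set => _position = checked((int)value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Copy out as many bytes as remain, up to count
        int n = Math.Min(count, _memory.Length - _position);
        if (n <= 0) return 0;
        _memory.Span.Slice(_position, n).CopyTo(buffer.AsSpan(offset, n));
        _position += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            _ => _memory.Length + offset,
        };
        _position = checked((int)target);
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With that in place, you can wrap e.Message.Buffer in a ReadOnlyMemoryStream, wrap that in a GZipStream, and hand the result to JsonDocument.Parse.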
Related
There doesn't seem to exist a native way of converting a Memory<T> instance into a Stream in the framework. Is there a simple way to achieve this using some allocation-free approach that only uses the actual memory from the Memory<T> instance, or perhaps a way that leverages only a minimal buffer?
I wasn't able to find any examples going from Memory<T> to Stream: only the opposite is common since there are several newer overloads on Stream that allow handling Memory<T> instances.
It seems so intuitive that one would be able to convert from Memory<> to MemoryStream (due mostly to their names, admittedly) so I was a bit disappointed to find this wasn't the case.
I also wasn't able to find an easy way of creating a System.IO.Pipelines PipeReader or PipeWriter from a Memory<T>, as those have an AsStream extension.
The thing is, the Memory<T> might not actually be looking at an array: it might be using a MemoryManager, or doing something else.
That said, it's possible to use MemoryMarshal.TryGetArray to see whether it does reference an array, and take an optimized path if so:
ReadOnlyMemory<byte> memory = new byte[] { 1, 2, 3, 4, 5 }.AsMemory();

var stream = MemoryMarshal.TryGetArray(memory, out var arraySegment)
    ? new MemoryStream(arraySegment.Array, arraySegment.Offset, arraySegment.Count)
    : new MemoryStream(memory.ToArray());
You will need to use unsafe code:
private static unsafe void DoUnsafeMemoryStuff<T>(ReadOnlyMemory<T> mem) where T : unmanaged
{
    Stream stream;
    MemoryHandle? memHandle = null;
    GCHandle? byteHandle = null;
    int itemSize = sizeof(T);
    if (MemoryMarshal.TryGetArray(mem, out var arr))
    {
        // Array-backed memory: pin the array and point the stream at the segment
        byteHandle = GCHandle.Alloc(arr.Array, GCHandleType.Pinned);
        int offsetBytes = arr.Offset * itemSize;
        int totalBytes = arr.Count * itemSize;
        stream = new UnmanagedMemoryStream((byte*)byteHandle.Value.AddrOfPinnedObject() + offsetBytes, totalBytes);
    }
    else
    {
        // Not array-backed: Memory<T>.Pin handles the pinning for us
        memHandle = mem.Pin();
        stream = new UnmanagedMemoryStream((byte*)memHandle.Value.Pointer, mem.Length * itemSize);
    }
    // Use stream like any other stream - you will need to keep your memory object
    // and the handle alive until your stream usage is complete
    try
    {
        // do stuff with stream
    }
    finally
    {
        // cleanup
        stream.Dispose();
        if (memHandle.HasValue) { memHandle.Value.Dispose(); }
        if (byteHandle.HasValue) { byteHandle.Value.Free(); }
    }
}
I'd like to compress a string using SevenZipSharp and have cobbled together a C# console application (I'm new to C#) using the following code (bits and pieces of which came from similar questions here on SO).
The compress part seems to work (albeit I'm passing in a file instead of a string), output of the compressed string to the console looks like gibberish but I'm stuck on the decompress...
I'm trying to do the same thing as here (I think):
https://stackoverflow.com/a/4305399/3451115
https://stackoverflow.com/a/45861659/3451115
https://stackoverflow.com/a/36331690/3451115
Appreciate any help, ideally the console will display the compressed string followed by the decompressed string.
Thanks :)
using System;
using System.IO;
using SevenZip;

namespace _7ZipWrapper
{
    public class Program
    {
        public static void Main()
        {
            SevenZipCompressor.SetLibraryPath(@"C:\Temp\7za64.dll");
            SevenZipCompressor compressor = new SevenZipCompressor();
            compressor.CompressionMethod = CompressionMethod.Ppmd;
            compressor.CompressionLevel = SevenZip.CompressionLevel.Ultra;
            compressor.ScanOnlyWritable = true;

            var compStream = new MemoryStream();
            var decompStream = new MemoryStream();

            compressor.CompressFiles(compStream, @"C:\Temp\a.txt");

            StreamReader readerC = new StreamReader(compStream);
            Console.WriteLine(readerC.ReadToEnd());
            Console.ReadKey();

            // works up to here... below here output to console is: ""
            SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
            extractor.ExtractFile(0, decompStream);
            StreamReader readerD = new StreamReader(decompStream);
            Console.WriteLine(readerD.ReadToEnd());
            Console.ReadKey();
        }
    }
}
The result of compression is binary data - it isn't a string. If you try to read it as a string, you'll just see garbage. That's to be expected - you shouldn't be treating it as a string.
The next problem is that you're trying to read from compStream twice, without "rewinding" it first. You're starting from the end of the stream, which means there's no data for it to decompress. If you just add:
compStream.Position = 0;
before you create the extractor, you may well find it works immediately. You may also need to rewind the decompStream before reading from it. So you'd have code like this:
// Rewind to the start of the stream before decompressing
compStream.Position = 0;
SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
extractor.ExtractFile(0, decompStream);
// Rewind to the start of the decompressed stream before reading
decompStream.Position = 0;
How can I get my List<byte> into a MemoryStream without using ToArray() or creating a new array?
This is my current method:
public Packet(List<byte> data)
{
    // Create new stream from data buffer
    using (Stream stream = new MemoryStream(data.ToArray()))
    {
        using (BinaryReader reader = new BinaryReader(stream))
        {
            Length = reader.ReadInt16();
            pID = reader.ReadByte();
            Result = reader.ReadByte();
            Message = reader.ReadString();
            ID = reader.ReadInt32();
        }
    }
}
The ToArray solution is the most efficient solution possible using documented APIs. MemoryStream will not copy the array. It will just store it. So the only copy is in List<T>.ToArray().
If you want to avoid that copy you need to pry List<T> open using reflection and access the backing array. I advise against that.
Instead, use a collection that allows you to obtain the backing array using legal means. Write your own, or use a MemoryStream in the first place.
A List<T> is not the most efficient way to move around bytes anyway. Storing them is fine, moving them usually has more overhead. For example, adding items bytewise will be far slower than a memcpy.
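Worth noting for later readers: since .NET 5 there is a documented way to reach a List<T>'s backing array without a copy, CollectionsMarshal.AsSpan. You still can't wrap a Span<byte> in a MemoryStream, but you can parse fields directly from the span. A sketch (field layout assumed from the question's Packet constructor; the string field is omitted because BinaryReader.ReadString's 7-bit length prefix needs extra handling):

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var data = new List<byte> { 0x10, 0x00, 0x01, 0x02 };

// Zero-copy view over the list's backing array;
// the span is invalidated if the list is resized afterwards.
Span<byte> span = CollectionsMarshal.AsSpan(data);

// BinaryReader reads little-endian, so ReadInt16LittleEndian matches ReadInt16
short length = BinaryPrimitives.ReadInt16LittleEndian(span);
byte pID = span[2];
byte result = span[3];
```

This avoids both the ToArray copy and the bytewise WriteByte loop.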
What about something like:
public Packet(List<byte> data)
{
    using (Stream stream = new MemoryStream())
    {
        // Loop list and write out bytes
        foreach (byte b in data)
            stream.WriteByte(b);

        // Reset stream position ready for read
        stream.Seek(0, SeekOrigin.Begin);

        using (BinaryReader reader = new BinaryReader(stream))
        {
            Length = reader.ReadInt16();
            pID = reader.ReadByte();
            Result = reader.ReadByte();
            Message = reader.ReadString();
            ID = reader.ReadInt32();
        }
    }
}
But why do you have a list in the first place? Can't you pass it into the method as a byte[] to start with? It'd be interesting to see how you populate that list.
I'm using Bouncy Castle cryptographic API for C# to create armored output of encrypted data.
The code is ugly; in particular, it looks like this:
string result = string.Empty;
using (var outputStream = new MemoryStream())
{
    using (var armoredStream = AddArmorWrappingTo(outputStream))
    {
        using (var encryptedStream = AddEncryptionWrappingTo(armoredStream))
        {
            using (var literalStream = AddLiteralWrappingTo(encryptedStream))
            {
                using (var inputStream = new MemoryStream(input))
                {
                    this.Write(inputStream, literalStream);
                }
            }
        }
    }
    result = Encoding.ASCII.GetString(outputStream.ToArray());
}
return result;
The issue here is that if I need to add compression of the data, I cannot change this piece of code; I have to write a new one instead, since compression in Bouncy Castle's world is done as one more stream wrapper around the future output stream.
To work properly, the streams need to be wrapped in the correct order and closed properly, otherwise there will be no usable result from the operation.
In addition, all these intermediate streams must also be present (I cannot overwrite the same stream variable over and over).
I've created extension methods to stream wrapper creators, and it looks like this now:
string result = string.Empty;
Stream[] pack = new Stream[3];
var outputStream = new MemoryStream();
var inputStream = new MemoryStream(input);
pack[0] = outputStream.Armor();
pack[1] = pack[0].EncryptWith(PublicKey);
pack[2] = pack[1].SplitByLiterals();
this.Write(inputStream, pack[2]);
pack[2].Close();
pack[1].Close();
pack[0].Close();
result = Encoding.ASCII.GetString(outputStream.ToArray());
return result;
I would say the code has become even worse.
My question is, is it possible to optimize stream wrapping? Maybe create array of delegates to wrap streams one by one and close them afterwards?
What's your experience with such tasks? Is it possible to make this code more maintainable? Currently, adding compression or signing, or excluding armoring, is a pain...
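One way to act on that delegate idea: build the chain from a list of Func<Stream, Stream> wrappers and close them in reverse order, so adding or removing a layer is a one-line change. A sketch, assuming the Armor/EncryptWith/SplitByLiterals extension methods from the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

private string Process(byte[] input, params Func<Stream, Stream>[] wrappers)
{
    using var outputStream = new MemoryStream();
    var chain = new Stack<Stream>();

    // Wrap outward-in: each delegate adds one layer around the previous stream
    Stream current = outputStream;
    foreach (var wrap in wrappers)
    {
        current = wrap(current);
        chain.Push(current);
    }

    using (var inputStream = new MemoryStream(input))
    {
        this.Write(inputStream, current);
    }

    // Close innermost-first so every layer can flush its trailer
    while (chain.Count > 0)
    {
        chain.Pop().Close();
    }

    return Encoding.ASCII.GetString(outputStream.ToArray());
}

// Usage, e.g.:
// var result = Process(input, s => s.Armor(), s => s.EncryptWith(PublicKey), s => s.SplitByLiterals());
```

Adding compression or signing then becomes one more delegate in the argument list, and excluding armoring means dropping one.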
Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk in order to avoid possible memory exhaustion using MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
    using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
    {
        using (var fileStream = File.OpenRead(filename))
        using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
        {
            fileStream.CopyTo(compressedStream);
        }

        temporaryFileStream.Position = 0;
        Uploader.Upload(temporaryFileStream);
    }
}
What I'd like to do is eliminate the temporary storage by creating a GZipStream and have it read from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    // start upload in a thread (pass the stream via the parameterized Start overload)
    var uploadThread = new Thread(UploadThreadProc);
    uploadThread.Start(pcStream);

    // Open the input file and attach the gzip stream to the pcStream
    using (var inputFile = File.OpenRead("inputFilename"))
    {
        // create gzip stream
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            var bytesRead = 0;
            var buff = new byte[65536]; // 64K buffer
            while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
            {
                gz.Write(buff, 0, bytesRead);
            }
        }
    }

    // The entire file has been compressed and copied to the buffer.
    // Mark the stream as "input complete".
    pcStream.CompleteAdding();

    // wait for the upload thread to complete.
    uploadThread.Join();

    // It's very important that you don't close the pcStream before
    // the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
    var pcStream = (ProducerConsumerStream)state;
    Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
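On modern .NET there's also a built-in way to get the same producer/consumer behaviour without a hand-rolled stream: System.IO.Pipelines. A sketch under that assumption, keeping Uploader.Upload as in the question:

```csharp
using System.IO;
using System.IO.Compression;
using System.IO.Pipelines;
using System.Threading.Tasks;

public void UploadFile(string filename)
{
    var pipe = new Pipe();

    // Consumer: the uploader reads the compressed bytes from the pipe on a worker task
    var uploadTask = Task.Run(() => Uploader.Upload(pipe.Reader.AsStream()));

    // Producer: compress the file straight into the pipe
    using (var input = File.OpenRead(filename))
    using (var gz = new GZipStream(pipe.Writer.AsStream(leaveOpen: true), CompressionMode.Compress))
    {
        input.CopyTo(gz);
    }

    pipe.Writer.Complete(); // signal end-of-stream to the reader
    uploadTask.Wait();
}
```

The Pipe applies back-pressure by default (the writer pauses once the buffered data passes a threshold), so the producer slows down when the uploader falls behind, just like the circular buffer above.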