C# FlatBufferBuilder create String from Stream

Suppose you need to read a large string from a stream and you want to put that string into a flatbuffer.
Currently what I do is read the stream into a string and then use the FlatbufferBuilder.CreateString(string s) function.
This works fine but it does have as a drawback that the string is copied and loaded into memory twice: once by reading it from the stream into the string; and then a second time the string is copied into the flatbuffer.
I was wondering if there is a way to fill the flatbuffer string directly from a stream?
For a more concrete example:
Suppose your flatbuffer schema looks like:
table Message
{
_Data: string;
}
root_type Message;
We can then create a flatbuffer like this (with myData a string)
var fbb = new FlatBufferBuilder(myData.Length);
var dataOffset = fbb.CreateString(myData);
var message = Message.CreateMessage(fbb, dataOffset);
Message.FinishMessageBuffer(fbb, message);
So the question is can we somehow do the same thing, where myData is a System.IO.Stream?
Obviously the following works, but I'd like to avoid first reading the Stream into memory.
using (var reader = new StreamReader(myStream))
{
var myData = reader.ReadToEnd();
var fbb = new FlatBufferBuilder(myData.Length);
var dataOffset = fbb.CreateString(myData);
var message = Message.CreateMessage(fbb, dataOffset);
Message.FinishMessageBuffer(fbb, message);
}

As far as I know, there is currently no way to avoid the second copy. It should be relatively simple to implement a version of CreateString that takes a stream and reduces this to a single copy. You could have a go at that and open a PR on GitHub with the result.

Related

Decompressing gzipped ReadOnlyMemory<byte> before I do JsonDocument.Parse

The websocket client is returning a ReadOnlyMemory<byte>.
The issue is that JsonDocument.Parse fails because the buffer is compressed. I have to decompress it somehow before I parse it. How do I do that? I cannot really change the websocket library code.
What I want is something like public Func<ReadOnlyMemory<byte>> DataInterpreterBytes = () => which optionally decompresses these bytes outside of this class. How do I do that? Is it possible to decompress a ReadOnlyMemory<byte> and, if the handler is unused, to basically do nothing?
private static string DecompressData(byte[] byteData)
{
using var decompressedStream = new MemoryStream();
using var compressedStream = new MemoryStream(byteData);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
deflateStream.CopyTo(decompressedStream);
decompressedStream.Position = 0;
using var streamReader = new StreamReader(decompressedStream);
return streamReader.ReadToEnd();
}
Snippet
private void OnMessageReceived(object? sender, MessageReceivedEventArgs e)
{
var timestamp = DateTime.UtcNow;
_logger.LogTrace("Message was received. {Message}", Encoding.UTF8.GetString(e.Message.Buffer.Span));
// We dispose that object later on
using var document = JsonDocument.Parse(e.Message.Buffer);
var tokenData = document.RootElement;
So, if you had a byte array, you'd do this:
private static JsonDocument DecompressData(byte[] byteData)
{
using var compressedStream = new MemoryStream(byteData);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
return JsonDocument.Parse(deflateStream);
}
This is similar to your snippet above, but no need for the intermediate copy: just read straight from the GzipStream. JsonDocument.Parse also has an overload that takes a stream, so you can use that and avoid yet another useless copy.
Unfortunately, you don't have a byte array, you have a ReadOnlyMemory<byte>. There is no way out of the box to create a memory stream out of a ReadOnlyMemory<byte>. Honestly, it feels like an oversight, like they forgot to put that feature into .NET.
So here are your options instead.
The first option is to just convert the ReadOnlyMemory<byte> object to an array with ToArray():
// assuming e.Message.Buffer is a ReadOnlyMemory<byte>
using var document = DecompressData(e.Message.Buffer.ToArray());
This is really straightforward, but remember it actually copies the data, so for large documents it might not be a good idea if you want to avoid using too much memory.
The second is to try and extract the underlying array from the memory. This can be achieved with MemoryMarshal.TryGetArray, which gives you an ArraySegment (but might fail if the memory isn't actually a managed array).
private static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
if(MemoryMarshal.TryGetArray(byteData, out var segment))
{
using var compressedStream = new MemoryStream(segment.Array, segment.Offset, segment.Count);
// rest of the code goes here
}
else
{
// Welp, this memory isn't actually an array, so... tough luck?
}
}
The third way might feel dirty, but if you're okay with using unsafe code, you can just pin the memory's span and then use UnmanagedMemoryStream:
private static unsafe JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
fixed (byte* ptr = byteData.Span)
{
using var compressedStream = new UnmanagedMemoryStream(ptr, byteData.Length);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
return JsonDocument.Parse(deflateStream);
}
}
The other solution is to write your own Stream class that supports this. The Windows Community Toolkit has an extension method that returns a Stream wrapper around the memory object. If you're not okay with using an entire third party library just for that, you can probably just roll your own, it's not that much code.
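As a rough idea of what rolling your own could look like, here is a minimal read-only Stream over a ReadOnlyMemory<byte>. The class name ReadOnlyMemoryStream is made up for this sketch (it is not a .NET or toolkit type), and it assumes you only need synchronous reads and seeking:

```csharp
using System;
using System.IO;

// Hypothetical helper: a read-only, seekable Stream view over a ReadOnlyMemory<byte>.
// No data is copied until a caller reads from the stream.
public sealed class ReadOnlyMemoryStream : Stream
{
    private readonly ReadOnlyMemory<byte> _memory;
    private int _position;

    public ReadOnlyMemoryStream(ReadOnlyMemory<byte> memory) => _memory = memory;

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _memory.Length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0 || value > _memory.Length)
                throw new ArgumentOutOfRangeException(nameof(value));
            _position = (int)value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Copy at most the number of bytes that remain in the memory block.
        int n = Math.Min(count, _memory.Length - _position);
        if (n <= 0) return 0;
        _memory.Span.Slice(_position, n).CopyTo(buffer.AsSpan(offset, n));
        _position += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            SeekOrigin.End => _memory.Length + offset,
            _ => throw new ArgumentOutOfRangeException(nameof(origin)),
        };
        Position = target; // the setter validates the range
        return target;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With something like this in place, the compressed stream in DecompressData could be created as new ReadOnlyMemoryStream(byteData) and passed to GZipStream exactly as in the byte array version, without ToArray() or unsafe code.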

Convert.ToBase64String throws 'System.OutOfMemoryException' for byte [] (file: large size)

I am trying to convert a byte[] to a base64 string so that I can send that information to a third party. My code is below:
byte[] ByteArray = System.IO.File.ReadAllBytes(path);
string base64Encoded = System.Convert.ToBase64String(ByteArray);
I am getting the error below:
Exception of type 'System.OutOfMemoryException' was thrown.
Can you help me please?
Update
I just spotted @PanagiotisKanavos' comment pointing to Is there a Base64Stream for .NET?. This does essentially the same thing as my code below attempts to achieve (i.e. it allows you to process the file without having to hold the whole thing in memory at once), but without the overhead and risk of self-rolled code, since it uses a standard .NET library method for the job.
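For illustration, one standard-library way to stream base64 encoding is to wrap the output in a CryptoStream with a ToBase64Transform. I am assuming this is close to what the linked question describes; the class and method names below (Base64FileEncoder, EncodeFile) are made up for the sketch:

```csharp
using System.IO;
using System.Security.Cryptography;

public static class Base64FileEncoder
{
    // Streams inputPath through a base64 transform into outputPath,
    // so only a small internal buffer is held in memory at any time.
    public static void EncodeFile(string inputPath, string outputPath)
    {
        using var input = File.OpenRead(inputPath);
        using var output = File.Create(outputPath);
        using var transform = new ToBase64Transform();
        // Disposing the CryptoStream (first, since it was declared last)
        // flushes the final padded base64 block to the output file.
        using var base64Stream = new CryptoStream(output, transform, CryptoStreamMode.Write);
        input.CopyTo(base64Stream);
    }
}
```

Unlike the WriteLine approach below, the output here is a single unbroken base64 string with no line breaks.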
Original
The below code will create a new temporary file containing the Base64 encoded version of your input file.
This should have a lower memory footprint, since rather than doing all data at once, we handle it several bytes at a time.
To avoid holding the output in memory, I've pushed that back to a temp file, which is returned. When you later need to use that data for some other process, you'd need to stream it (i.e. so that again you're not consuming all of this data at once).
You'll also notice that I've used WriteLine instead of Write; which will introduce non base64 encoded characters (i.e. the line breaks). That's deliberate, so that if you consume the temp file with a text reader you can easily process it line by line.
However, you can amend per your needs.
void Main()
{
var inputFilePath = @"c:\temp\bigfile.zip";
var convertedDataPath = ConvertToBase64TempFile(inputFilePath);
Console.WriteLine($"Take a look in {convertedDataPath} for your converted data");
}
//inputFilePath = where your source file can be found. This is not impacted by the below code
//bufferSizeInBytesDiv3 = how many bytes to read at a time (divided by 3); the larger this value the more memory is required, but the better the performance. The Div3 part is because we later multiply this by 3, which ensures we never have to deal with remainders (since 3 bytes = 4 base64 chars)
public string ConvertToBase64TempFile(string inputFilePath, int bufferSizeInBytesDiv3 = 1024)
{
var tempFilePath = System.IO.Path.GetTempFileName();
using (var fileStream = File.Open(inputFilePath,FileMode.Open))
{
using (var reader = new BinaryReader(fileStream))
{
using (var writer = new StreamWriter(tempFilePath))
{
byte[] data;
while ((data = reader.ReadBytes(bufferSizeInBytesDiv3 * 3)).Length > 0)
{
writer.WriteLine(System.Convert.ToBase64String(data)); //NB: using WriteLine rather than Write; so when consuming this content consider removing line breaks (I've used this instead of write so you can easily stream the data in chunks later)
}
}
}
}
return tempFilePath;
}

Memory Issue in string C#

I have a little test program:
public class Test
{
public string Response { get; set; }
}
My console app simply calls the Test class:
class Program
{
static void Main(string[] args)
{
Test t = new Test();
using (StreamReader reader = new StreamReader("C:\\Test.txt"))
{
t.Response = reader.ReadToEnd();
}
t.Response = t.Response.Substring(0, 5);
Console.WriteLine(t.Response);
Console.Read();
}
}
I have approx. 60 MB of data in my Test.txt file. When the program executes, it takes a lot of memory because string is immutable. What is a better way to handle this kind of scenario using string?
I know that I can use StringBuilder, but I created this program to replicate a scenario in one of my production applications, which uses string.
When I tried GC.Collect(), the memory was released immediately. I am not sure whether I should call the GC in code.
Please help. Thanks.
UPDATE:
I think I did not explain it clearly. Sorry for the confusion.
I am just reading data from a file to get a huge amount of data, as I don't want to create 60 MB of data in code.
My pain point is the line of code below, where I have huge data in the Response field.
t.Response = t.Response.Substring(0, 5);
You could limit your reads to a block of bytes (buffer). Loop through and read the next block into your buffer and write that buffer out. This will prevent a large chunk of data being stored in memory.
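using (StreamReader reader = new StreamReader(@"C:\Test.txt", true))
{
char[] buffer = new char[1024];
int charsRead;
// The second argument of ReadBlock is an offset into buffer (not into the file), so it stays 0.
while ((charsRead = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
{
// Write only the characters actually read; the last block may be partial.
Console.Write(buffer, 0, charsRead);
}
}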
Can you read your file line by line? If so, I would recommend calling:
IEnumerable<string> lines = File.ReadLines(path);
When you iterate this collection using
foreach(string line in lines)
{
// do something with line
}
the collection will be iterated using lazy evaluation. That means the entire contents of the file won't need to be kept in memory while you do something with each line.
StreamReader provides just the version of Read that you are looking for, Read(Char[], Int32, Int32), which lets you pick out the first characters of the stream. Alternatively, you can read char by char with the regular StreamReader.Read until you decide that you have enough.
var textBuffer = new char[5];
int charsRead = reader.Read(textBuffer, 0, 5); // may return fewer than 5 characters
t.Response = new string(textBuffer, 0, charsRead);
Note that if you know the encoding of the stream, you can do lower-level reading into a byte array and use the System.Text.Encoding classes to construct strings yourself instead of relying on StreamReader.
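A minimal sketch of that lower-level approach, assuming the data is plain ASCII (one byte per character). With a variable-width encoding such as UTF-8, a fixed byte count could cut a character in half, so this would need more care. PrefixReader and ReadAsciiPrefix are made-up names:

```csharp
using System.IO;
using System.Text;

public static class PrefixReader
{
    // Reads up to byteCount bytes from the stream and decodes them as ASCII.
    // Only byteCount bytes are ever held in memory, regardless of stream size.
    public static string ReadAsciiPrefix(Stream stream, int byteCount)
    {
        var bytes = new byte[byteCount];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (total < byteCount)
        {
            int n = stream.Read(bytes, total, byteCount - total);
            if (n == 0) break;
            total += n;
        }
        return Encoding.ASCII.GetString(bytes, 0, total);
    }
}
```

For the program in the question, something like PrefixReader.ReadAsciiPrefix(File.OpenRead("C:\\Test.txt"), 5) would give the first five characters without loading the 60 MB file (the file stream should still be disposed by the caller).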

Stream chain wrapping done in the right way

I'm using Bouncy Castle cryptographic API for C# to create armored output of encrypted data.
The code looks ugly, in particular, like this:
string result = string.Empty;
using (var outputStream = new MemoryStream())
{
using (var armoredStream = AddArmorWrappingTo(outputStream))
{
using (var encryptedStream = AddEncryptionWrappingTo(armoredStream))
{
using (var literalStream = AddLiteralWrappingTo(encryptedStream))
{
using (var inputStream = new MemoryStream(input))
{
this.Write(inputStream, literalStream);
}
}
}
}
result = Encoding.ASCII.GetString(outputStream.ToArray());
}
return result;
The issue here is that if I need to add compression of the data, I cannot change this piece of code; I need to write a new one instead, since compression in Bouncy Castle's world is done as one more stream wrapper around the future output stream.
To work properly, the streams need to be wrapped in the correct order and closed properly, otherwise there will be no usable result of this operation.
In addition, all these intermediate streams must also be kept around (I cannot overwrite the same stream variable over and over).
I've created extension methods to stream wrapper creators, and it looks like this now:
string result = string.Empty;
Stream[] pack = new Stream[3];
var outputStream = new MemoryStream();
var inputStream = new MemoryStream(input);
pack[0] = outputStream.Armor();
pack[1] = pack[0].EncryptWith(PublicKey);
pack[2] = pack[1].SplitByLiterals();
this.Write(inputStream, pack[2]);
pack[2].Close();
pack[1].Close();
pack[0].Close();
result = Encoding.ASCII.GetString(outputStream.ToArray());
return result;
I would say the code became even worse.
My question is: is it possible to optimize stream wrapping? Maybe create an array of delegates to wrap the streams one by one and close them afterwards?
What's your experience with such tasks? Is it possible to make this code more maintainable? Currently, adding compression or signing, or excluding armoring, is a pain...

Serialized data on tcpclient needs to state amount?

I have been sending data as bytes using TcpClient, and I want to send my own class instead of raw bytes.
By bytes of data, I mean that I am sending the data converted into bytes like this:
using (MemoryStream bufferStream = new MemoryStream())
{
using (BinaryWriter bufferData = new BinaryWriter(bufferStream))
{
// Simple PONG Action
bufferData.Write((byte)10);
}
_logger.Info("Received PING request, Sending PONG");
return bufferStream.ToArray();
}
Instead, I would like to send it like this, without having to declare its size or anything else:
public class MyCommunicationData
{
public ActionType Action { get; set; }
public Profile User { get; set; }
...
}
Normally, when I send my data as bytes, I use the first 5 bytes to indicate the action and the message size.
But if I migrate to serializing all the data as a single class, do I still need to send the action and size, or would the client and server know what to read when using serialized messages? Is there a way to send it without having to specify anything outside the serialized object?
Not sure if this matters here, I am using AsyncCallback to read and write to the network stream:
_networkStream = _client.tcpClient.GetStream();
_callbackRead = new AsyncCallback(_OnReadComplete);
_callbackWrite = new AsyncCallback(_OnWriteComplete);
Let me know if you need me to post any other functions.
If you use a text-based serializer (for example, JSON), you can use StreamReader's ReadLine and StreamWriter's WriteLine (both created from tcpClient.GetStream).
Your code would be something like
writer.WriteLine(JsonConvert.SerializeObject(commData));
and to get the data on the other end
var myobj = JsonConvert.DeserializeObject<MyCommunicationData>(reader.ReadLine());
--EDIT--
//**Server**
Task.Factory.StartNew(() =>
{
var reader = new StreamReader(tcpClient.GetStream());
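// AutoFlush so each WriteLine is sent immediately instead of sitting in the writer's buffer
var writer = new StreamWriter(tcpClient.GetStream()) { AutoFlush = true };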
while (true)
{
var myobj = JsonConvert.DeserializeObject<MyCommunicationData>(reader.ReadLine());
//do work with obj
//write response to client
writer.WriteLine(JsonConvert.SerializeObject(commData));
}
},
TaskCreationOptions.LongRunning);
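To make the framing idea explicit: with one JSON document per line, the newline itself acts as the message boundary, so no separate size prefix is needed. Below is a small self-contained sketch of that round trip. It uses System.Text.Json rather than Json.NET only to avoid an external package, and PingMessage and LineFraming are made-up stand-ins for your own types:

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical message type standing in for MyCommunicationData.
public class PingMessage
{
    public int Action { get; set; }
    public string User { get; set; }
}

public static class LineFraming
{
    // Writes one message as a single line of JSON. The serializer emits
    // compact JSON by default and escapes newlines inside string values,
    // so a raw line break only ever appears between messages.
    public static void Send(TextWriter writer, PingMessage message)
    {
        writer.WriteLine(JsonSerializer.Serialize(message));
        writer.Flush();
    }

    // Reads one line and deserializes it; returns null when the stream has ended.
    public static PingMessage Receive(TextReader reader)
    {
        string line = reader.ReadLine();
        return line == null ? null : JsonSerializer.Deserialize<PingMessage>(line);
    }
}
```

Over a real connection the writer and reader would be a StreamWriter and StreamReader created from tcpClient.GetStream(), as in the server loop above. The action can simply travel as a property of the serialized object.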
