How to serialize very large files to a byte array? - c#

I have a custom object. One of the properties on the object is a byte array with the contents of a file. This file can be VERY large (800+ MB in some instances). Since JsonSerializer and XmlSerializer are out of the question (the resulting string is too large), I think going with a byte array is the next best option.
I've looked through some other solutions (like this) but so far have had no luck with what I need. We're working in .NET 5, and things like BinaryFormatter are a no-go.
Is it somehow possible to take my object and write it to a stream so I can deal with it in a byte array?
Thanks

Related

Read binary file using BinaryFormatter

I am making a dotnet app that needs to read a binary file from a 3rd party.
The file contains a 516-byte header record/struct (a couple of long identifiers and a couple of fixed-length char array strings) followed by a number of payload structs (240 bytes of integers, booleans and chars each).
I know I can read this file in dotnet using BinaryReader and deserialize the fields in the structs one by one.
I have POCOs/structs that correctly define the properties needed for the 2 record types, but I can't see any way of letting BinaryFormatter know which type (and how much) to read from the stream next, as the binder seems to rely on the type name being serialized along with the record payload, which it is not in this file.
I would like to know: is there a way of doing this via BinaryFormatter, deserializing POCOs directly?

Fastest way to serialize C# object array into string

I am looking for the fastest way to serialize and deserialize a C# array of objects into a string...
Why a string and not a byte array? Well, I am working with a networking system (The Unity3d networking system to be specific) and they have placed a rather annoying restriction which does not allow the sending of byte arrays or custom types, two things I need (hard to explain my situation).
The simplest solution I have come up with for this is to serialize my custom types into a string, and then transmit that string as opposed to directly sending the object array.
So, that is the question! What is the fastest way to serialize an object array into a string? I would preferably like to avoid using voodoo characters (invisible/special characters), as I am not sure if Unity3d will cull them, but base64 encoding doesn't take full advantage of the allowed character spectrum. I am also worried about the efficiency of using base 64.
Obviously, since this is networking related, having the serialized data be as small as possible is a plus.
EDIT:
One possible way to do this would be to serialize to a byte array, and then pretend that that byte array is a string. The problem is, I am afraid that .Net 2.0, or Unity's networking system will end up culling some of the special or invisible characters created using this method... Something which very much needs to be avoided. I am hoping for a solution that has near or equal speed to this, but does not use any of the characters that are likely to be culled. (I have no idea what characters these are, but I have had bad experiences with Unity when it came to direct conversions to strings from byte arrays)
Json.Net is what I always use; it's simple and gets the job done in a human-readable way. JSON is about as lightweight as it gets and is widely used for sending data over the wire.
I'll give you this answer as accepted, but I suggest adding base64 encoding to your answer!
–Georges Oates Larsen
Thank you, and yes that is also a great option if readability is not an issue.
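We use SoapFormatter so that the object can be embedded in JavaScript variables and is otherwise "safe" to pass around:
// Requires: using System.IO; using System.Text;
// using System.Runtime.Serialization.Formatters.Soap;
using (MemoryStream oStream = new MemoryStream())
{
    // Serialize the object graph as SOAP XML into memory.
    (new SoapFormatter()).Serialize(oStream, theObject);
    // Decode the XML bytes into a string for transport.
    return Encoding.Default.GetString(oStream.ToArray());
}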
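// Requires: using System; using System.IO;
// using System.Runtime.Serialization.Formatters.Binary;
using (MemoryStream s = new MemoryStream()) {
    new BinaryFormatter().Serialize(s, obj);
    // Base64 keeps the binary payload within printable ASCII.
    return Convert.ToBase64String(s.ToArray());
}
and
using (MemoryStream s = new MemoryStream(Convert.FromBase64String(str))) {
    // The caller casts the result back to the expected type.
    return new BinaryFormatter().Deserialize(s);
}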
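As the first question on this page notes, BinaryFormatter is not an option on .NET 5, so the same Base64 round trip can also be sketched with BinaryWriter and BinaryReader. This is only an illustration under stated assumptions, not the answerer's code: the Player type, its fields, and the layout (a count followed by each field) are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical payload type, used only for illustration.
public struct Player
{
    public int Id;
    public string Name; // must not be null when serialized
}

public static class Base64Codec
{
    // Writes a count and each field, then Base64-encodes the bytes.
    public static string Serialize(Player[] players)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(players.Length);
                foreach (var p in players)
                {
                    writer.Write(p.Id);
                    writer.Write(p.Name); // length-prefixed string
                }
            }
            // ToArray still works after the stream has been closed.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses Serialize: decodes Base64, then reads the fields in order.
    public static Player[] Deserialize(string text)
    {
        using (var ms = new MemoryStream(Convert.FromBase64String(text)))
        using (var reader = new BinaryReader(ms))
        {
            var players = new Player[reader.ReadInt32()];
            for (int i = 0; i < players.Length; i++)
            {
                players[i].Id = reader.ReadInt32();
                players[i].Name = reader.ReadString();
            }
            return players;
        }
    }
}
```

Base64 output uses only letters, digits, '+', '/' and '=', so nothing in it should be culled as a special character; the cost is that the text is about 4/3 the size of the raw bytes.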

How to mix Pinvoke WriteFile and BinaryFormatter usage of the same FileStream?

I want to use WriteFile to write a big (~500 MB) multidimensional array to a file (because BinaryFormatter is very slow at writing big data, and there is no other way in the .NET Framework to write multidimensional byte arrays, only single bytes or single-dimensional arrays, and writing byte by byte in a for loop is slow).
However, turns out, this is forbidden:
IOException
The OS handle's position is not what FileStream expected. Do not use a handle simultaneously in one FileStream and in Win32 code or another FileStream. This may cause data loss.
Is there any way around this, aside from re-opening the file stream each time I want to write using BinaryFormatter after I wrote using WriteFile?
(I understand this question has been abandoned and will be deleted soon.)
Firstly, WriteFile and BinaryFormatter just don't mix. WriteFile assumes you know the file format, i.e. the interpretation of the bytes that are written to the file. BinaryFormatter is a serializer, based on a file format that is internal to Microsoft's .NET implementation (some would say proprietary, even though the information can be found online). As a consequence, you cannot even pass a file serialized by BinaryFormatter between Microsoft .NET and Mono.
Based on OP's description, it is clear that OP should not have used BinaryFormatter in the first place. Otherwise OP would be solely responsible for the loss (unrecoverability) of such data.
As Hans Passant commented, the performance of FileStream.Write should be able to match the Win32 call to WriteFile, asymptotically speaking. What this means is that the time overhead for each call can be modeled as alpha * numberOfBytesWritten + beta, where beta is the pure constant overhead per call. One can make this overhead relatively negligible by increasing the number of bytes written per call.
Given that we cannot directly pass a multidimensional C# array into WriteFile, here is the suggestion. Based on OP's comment, it is assumed the multidimensional array will have size byte[1024, 1024, 1024].
First, allocate a temporary 1D array of sufficient size. Typical recommendations range from 4KB to several MB, but that is only an optimization detail. For this example, we use 1MB = byte[1048576] because it nicely divides the total array size.
Then, we write a top-level for-loop over the outermost dimension.
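In the next step, we copy the 1024 x 1024 bytes of the innermost two dimensions into the temporary 1D array. Note that System.Array.Copy throws a RankException when the source and destination arrays have different ranks, so for a byte[,,] source and a byte[] destination System.Buffer.BlockCopy is the call to use. Both rely on the memory layout of multidimensional arrays, as described in the documentation for System.Array.Copy: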
When copying between multidimensional arrays, the array behaves like a long one-dimensional array, where the rows (or columns) are conceptually laid end-to-end.
Once copied into the temporary 1D array, the data can be written out with FileStream.Write.
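The steps above can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: the class and method names are made up, and it uses Buffer.BlockCopy for the copy because that call accepts arrays of different ranks.

```csharp
using System;
using System.IO;

public static class ChunkedArrayWriter
{
    // Writes a byte[,,] to the stream one outermost slice at a time.
    public static void Write(Stream output, byte[,,] data)
    {
        int slices = data.GetLength(0);
        // Bytes per slice: the two innermost dimensions multiplied together.
        int sliceSize = data.GetLength(1) * data.GetLength(2);
        var buffer = new byte[sliceSize]; // temporary 1D array

        for (int i = 0; i < slices; i++)
        {
            // Offsets are byte offsets; elements are stored in row-major order.
            Buffer.BlockCopy(data, i * sliceSize, buffer, 0, sliceSize);
            output.Write(buffer, 0, sliceSize);
        }
    }
}
```

With a FileStream this would be called as ChunkedArrayWriter.Write(fileStream, data). For byte[1024, 1024, 1024] each slice is exactly 1 MB, and the largest source offset still fits in the int that Buffer.BlockCopy expects.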

C# export / write multidimension array to file (csv or whatever)

Hi, I'm designing a program and just wanted advice on writing a multidimensional array to a file.
I am using XNA and have a multidimensional array of Vector3 (x, y, z) values.
There are hundreds of thousands, if not millions, of values, and I want to be able to save them in a file (saving the game level). I have no bias toward any one approach; I just need to store the data, that's it! For all the other game data, like player stats, I am using XmlSerializer and it's working wonders.
I was playing with XmlSerializer a lot and have learned that you cannot export multidimensional arrays... so frustrating (but I am sure there is a good reason why, hopefully). I played with jagged arrays with no luck.
I used System.IO.File.WriteAllText, then quickly realised that it is only for strings... d'oh.
Basically I think I need to go down the BinaryWriter route and write my own serializer, or even try running a SQL server to host the masses of data... stupid idea? Please tell me, and can you point me in the right direction? As I primarily have a web (PHP) background, the thought of running a server that syncs level data is attractive to me... but might not be applicable here.
Thanks for anything,
Mal
You can just serialise the lot with the built-in .NET serialisers provided the objects in the array are serialisable (and IIRC Vector3s are).
// Requires: using System.IO;
// using System.Runtime.Serialization.Formatters.Binary;
void SerializeVector3Array(string filename, Vector3[,] array)
{
    BinaryFormatter bf = new BinaryFormatter();
    // 'using' closes the file even if Serialize throws.
    using (Stream s = File.Open(filename, FileMode.Create))
    {
        bf.Serialize(s, array);
    }
}
Vector3[,] DeserializeVector3Array(string filename)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Open(filename, FileMode.Open))
    {
        return (Vector3[,])bf.Deserialize(s);
    }
}
That should be a rough template of what you're after.
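For comparison, the BinaryWriter route mentioned in the question could look roughly like the sketch below. It is an assumption-laden illustration, not part of the answer above: Vec3 is a stand-in for XNA's Vector3 so the code compiles on its own, and the file layout (two dimension counts followed by X, Y, Z floats) is invented for this example.

```csharp
using System.IO;
using System.Text;

// Stand-in for XNA's Vector3, so the sketch compiles without XNA.
public struct Vec3
{
    public float X, Y, Z;
}

public static class Vec3GridFile
{
    // Layout (assumed): rows, cols, then X, Y, Z for each cell in row-major order.
    public static void Save(Stream output, Vec3[,] grid)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            int rows = grid.GetLength(0), cols = grid.GetLength(1);
            writer.Write(rows);
            writer.Write(cols);
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                {
                    writer.Write(grid[r, c].X);
                    writer.Write(grid[r, c].Y);
                    writer.Write(grid[r, c].Z);
                }
        }
    }

    // Reads back exactly what Save wrote.
    public static Vec3[,] Load(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            int rows = reader.ReadInt32(), cols = reader.ReadInt32();
            var grid = new Vec3[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                {
                    grid[r, c].X = reader.ReadSingle();
                    grid[r, c].Y = reader.ReadSingle();
                    grid[r, c].Z = reader.ReadSingle();
                }
            return grid;
        }
    }
}
```

Each cell takes 12 bytes on disk, so a million vectors come to roughly 12 MB plus the 8-byte header.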
Why don't you try JSON serialization? JSON has less noise than XML and occupies less space when written to file, especially if you do so without indenting. In my experience it has no trouble with arrays, dictionaries, dates and other objects.
I recommend using JSON.NET and if not, then look at this thread
If Json.net finds it difficult to serialize a library class with many private & static variables, then it is trivial to write a POCO class and map the library class essential properties to your POCO and serialize and map the POCO back and forth.

parsing binary file in C#

I have a binary file that I have stored in a byte array. The file size can be 20 MB or more. I then want to parse it or find a particular value in the file. I am doing it in two ways:
1. By converting the full file to a char array.
2. By converting the full file to a hex string (I also have the hex values).
What is the best way to parse the full file, or should I do it in binary form? I am using VS 2005.
From the aspect of memory consumption, it would be best if you could parse it directly, on the fly.
Converting it to a char array in C# effectively doubles its size in memory (presuming you are converting each byte to a char), while a hex string will take at least 4 times the size (C# chars are 16-bit Unicode characters, and each byte becomes two hex digits).
On the other hand, if you need to search and parse the same set of data repeatedly, you may benefit from having it stored in whatever form suits your needs better.
What's stopping you from searching in the byte[]?
IMHO, if you're simply searching for a byte of a specified value, or several contiguous bytes, this is the easiest and most efficient way to do it.
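A minimal sketch of searching the byte[] directly, with no conversion to chars or hex. The class and method names are hypothetical, and the straightforward nested loop is chosen for clarity rather than speed.

```csharp
public static class ByteSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        // Try every start position where the whole pattern still fits.
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }
}
```

Searching this way needs no extra memory beyond the array that is already loaded.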
If I understood your question correctly you need to find strings which can contain any characters in a large binary file. Does the binary file contain text? If so do you know the encoding? If so you can use StreamReader class like so:
// The @ prefix stops "\t" in the path from being read as a tab escape.
using (StreamReader sr = new StreamReader(@"C:\test.dat", System.Text.Encoding.UTF8))
{
    string s = sr.ReadLine();
}
In any case I think it's much more efficient using some kind of stream access to the file, instead of loading it all to memory.
You could load it into memory in chunks and then use a pattern matching algorithm (like Knuth-Morris-Pratt or Rabin-Karp).
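A rough sketch of the chunked approach follows. To keep it short it uses a plain byte-by-byte comparison instead of Knuth-Morris-Pratt or Rabin-Karp; the names and the default chunk size are assumptions made for this example. The important detail is carrying the last pattern.Length - 1 bytes into the next chunk so that a match spanning two chunks is not missed.

```csharp
using System;
using System.IO;

public static class StreamSearch
{
    // Returns the stream offset of the first occurrence of pattern, or -1.
    public static long IndexOf(Stream input, byte[] pattern, int chunkSize = 64 * 1024)
    {
        if (pattern.Length == 0) return 0;

        int overlap = pattern.Length - 1;
        var buffer = new byte[chunkSize + overlap];
        int carried = 0;       // bytes kept from the previous chunk
        long bufferStart = 0;  // stream offset of buffer[0]

        while (true)
        {
            int read = input.Read(buffer, carried, chunkSize);
            if (read == 0) return -1; // end of stream, no match
            int valid = carried + read;

            // Naive comparison at each position where the pattern fits.
            for (int i = 0; i + pattern.Length <= valid; i++)
            {
                int j = 0;
                while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                if (j == pattern.Length) return bufferStart + i;
            }

            // Keep the tail so a match spanning two chunks is still found.
            int keep = Math.Min(overlap, valid);
            Buffer.BlockCopy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
    }
}
```

Memory use stays at roughly one chunk regardless of file size, which is the point of streaming instead of loading the whole file.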
