C# export / write multidimensional array to file (CSV or whatever)

Hi, I'm designing a program and I just wanted advice on writing a multidimensional array to a file.
I am using XNA and have a multidimensional array of Vector3(x, y, z) values.
There are hundreds of thousands, if not millions, of values, and I want to be able to save them to a file (saving the game level). I have no bias toward any one idea, I just need to store data... that's it! For all the other game data, like player stats etc., I am using XmlSerializer and it's working wonders.
Now, I was playing with XmlSerializer a lot and have learned that it cannot serialize multidimensional arrays... so frustrating (but I am sure there is a good reason why - hopefully). I played with jagged arrays with no luck.
I used System.IO.File.WriteAllText, then quickly realised that is only for strings... daahhh.
Basically, I think I need to go down the BinaryWriter route, write my own serializer, or even try running a SQL server to host the masses of data... stupid idea? Please tell me, and can you point me in the right direction? As I primarily have a web (PHP) background, the thought of running a server that syncs level data is attractive to me... but it might not be applicable here.
Thanks for anything,
Mal

You can just serialise the lot with the built-in .NET serialisers, provided the objects in the array are serialisable (and IIRC XNA's Vector3 is).
void SerializeVector3Array(string filename, Vector3[,] array)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Open(filename, FileMode.Create))
    {
        bf.Serialize(s, array);   // writes rank, dimensions and every element
    }
}

Vector3[,] DeserializeVector3Array(string filename)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Open(filename, FileMode.Open))
    {
        return (Vector3[,])bf.Deserialize(s);   // cast back to the concrete array type
    }
}
That should be a rough template of what you're after.
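For completeness, round-tripping a level through those two methods would look something like this (BuildLevel is a hypothetical stand-in for however the level data is generated):
// Hypothetical usage: persist the level grid, then load it back.
Vector3[,] map = BuildLevel();
SerializeVector3Array("level.dat", map);
Vector3[,] restored = DeserializeVector3Array("level.dat");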

Why don't you try JSON serialization? JSON has less noise than XML and occupies less space when written to a file, especially if you write it without indenting. It has no trouble with arrays, dictionaries, dates and other objects, as far as my experience with it goes.
I recommend using Json.NET, and if not, then look at this thread.
If Json.NET finds it difficult to serialize a library class with many private and static members, it is trivial to write a POCO class, map the library class's essential properties to your POCO, and serialize and map back and forth, as sketched below.
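A minimal sketch of that POCO mapping, assuming Json.NET (JsonConvert) as the serializer; Vector3Poco and points are invented names for illustration:
// Hand-rolled POCO mirroring the public fields of XNA's Vector3.
public class Vector3Poco
{
    public float X { get; set; }
    public float Y { get; set; }
    public float Z { get; set; }
}

// Map, then serialize without indentation to keep the file small.
Vector3Poco[] pocos = points.Select(p => new Vector3Poco { X = p.X, Y = p.Y, Z = p.Z }).ToArray();
File.WriteAllText("level.json", JsonConvert.SerializeObject(pocos, Formatting.None));

// And back:
Vector3Poco[] loaded = JsonConvert.DeserializeObject<Vector3Poco[]>(File.ReadAllText("level.json"));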

Related

How to serialize very large files to a byte array?

I have a custom object. One of the properties on the object is a byte array with the contents of a file. This file can be VERY large (800+ MB in some instances). Since using the JsonSerializer or XmlSerializer is out of the question (the resulting string is too large), I think going with a byte array is the next best option.
I've looked through some other solutions (like this) but so far have had no luck with what I need. We're working in .NET 5, and things like the BinaryFormatter are a no-go.
Is it somehow possible to take my object and write it to a stream so I can deal with it as a byte array?
Thanks
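One direction, as a sketch only (MyObject, Name and Data are invented stand-ins for the real type and properties): frame the stream yourself with BinaryWriter, so the large byte[] is copied verbatim with no base64 or XML blow-up:
static byte[] ToBytes(MyObject obj)
{
    using var ms = new MemoryStream();
    using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
    {
        bw.Write(obj.Name);          // small metadata fields first
        bw.Write(obj.Data.Length);   // length prefix for the payload
        bw.Write(obj.Data);          // the large byte[] copied as-is
    }
    return ms.ToArray();             // note: this still holds a second copy of the payload
}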

Fastest way to serialize C# object array into string

I am looking for the fastest way to serialize and deserialize a C# array of objects into a string...
Why a string and not a byte array? Well, I am working with a networking system (The Unity3d networking system to be specific) and they have placed a rather annoying restriction which does not allow the sending of byte arrays or custom types, two things I need (hard to explain my situation).
The simplest solution I have come up with for this is to serialize my custom types into a string, and then transmit that string as opposed to directly sending the object array.
So, that is the question! What is the fastest way to serialize an object array into a string? I would preferably like to avoid voodoo characters (invisible/special characters), as I am not sure whether Unity3D will cull them, but base64 encoding doesn't take full advantage of the allowed character spectrum. I am also worried about the efficiency of base64.
Obviously, since this is networking related, having the serialized data be as small as possible is a plus.
EDIT:
One possible way to do this would be to serialize to a byte array, and then pretend that that byte array is a string. The problem is, I am afraid that .NET 2.0 or Unity's networking system will end up culling some of the special or invisible characters created by this method... something which very much needs to be avoided. I am hoping for a solution with near or equal speed to this that does not use any of the characters likely to be culled. (I have no idea which characters these are, but I have had bad experiences with Unity when it came to direct conversions from byte arrays to strings.)
Json.NET is what I always use; it's simple and gets the job done in a human-readable way. JSON is about as lightweight as it gets and is widely used for sending data over the wire.
I'll give you this answer as accepted, but I suggest adding base64 encoding to your answer!
–Georges Oates Larsen
Thank you, and yes that is also a great option if readability is not an issue.
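Putting the answer and the comment together, a sketch (Json.NET assumed; MyType stands in for the custom type being sent):
string json = JsonConvert.SerializeObject(myObjects);                 // human-readable form
string safe = Convert.ToBase64String(Encoding.UTF8.GetBytes(json));   // only A-Z, a-z, 0-9, +, /, = — nothing Unity should cull

// Receiving side: decode, then deserialize.
MyType[] back = JsonConvert.DeserializeObject<MyType[]>(
    Encoding.UTF8.GetString(Convert.FromBase64String(safe)));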
We use SoapFormatter so that the object can be embedded in JavaScript variables and otherwise be "safe" to pass around:
using (MemoryStream oStream = new MemoryStream())
{
    // SOAP output is XML, so the resulting string is plain text
    new SoapFormatter().Serialize(oStream, theObject);
    return Encoding.Default.GetString(oStream.ToArray());
}
using (MemoryStream s = new MemoryStream())
{
    new BinaryFormatter().Serialize(s, obj);
    return Convert.ToBase64String(s.ToArray());   // base64 keeps the string transport-safe
}
and
using (MemoryStream s = new MemoryStream(Convert.FromBase64String(str)))
{
    return new BinaryFormatter().Deserialize(s);
}

C# serialize large array to disk

I have a very large graph stored in a single-dimensional array (about 1.1 GB) which I am able to keep in memory on my machine, which is running Windows XP with 2 GB of RAM and 2 GB of virtual memory. I am able to generate the entire data set in memory, but when I try to serialize it to disk using the BinaryFormatter, the file size gets to about 50 MB and then I get an out-of-memory exception. The code I am using to write it is the same I use for all of my smaller problems:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}
The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.
I really have 3 questions:

1. Is there a more reliable way to write a large data set to disk? I guess you can define "large" as when the size of the data set approaches the amount of available memory, though I am not sure how accurate that is.
2. Should I move to a more database-centric approach?
3. Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?
Write the entries to the file yourself. One simple solution would be:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    foreach (StateInformation si in diskReady)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, si);              // serialise one record, not the whole array
            byte[] ser = ms.ToArray();
            int len = ser.Length;
            // little-endian length prefix, low byte first
            file.WriteByte((byte)(len & 0xFF));
            file.WriteByte((byte)((len >> 8) & 0xFF));
            file.WriteByte((byte)((len >> 16) & 0xFF));
            file.WriteByte((byte)((len >> 24) & 0xFF));
            file.Write(ser, 0, len);
        }
    }
}
No more than the memory for a single StateInformation object is needed at a time; to deserialise, you read four bytes, reconstruct the length, create a buffer of that size, fill it, and deserialise.
All of the above could be seriously optimised for speed, memory use and disk-size if you create a more specialised format, but the above goes to show the principle.
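The matching reader, following the description above (a sketch; it assumes the little-endian length prefix written by the loop):
List<StateInformation> items = new List<StateInformation>();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenRead(@"C:\temp\states.dat"))
{
    int b0;
    while ((b0 = file.ReadByte()) != -1)       // -1 means a clean end of file
    {
        int b1 = file.ReadByte(), b2 = file.ReadByte(), b3 = file.ReadByte();
        int len = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        byte[] buf = new byte[len];
        int read = 0;
        while (read < len)                     // Stream.Read may return fewer bytes than asked
            read += file.Read(buf, read, len - read);
        using (MemoryStream ms = new MemoryStream(buf))
            items.Add((StateInformation)bf.Deserialize(ms));
    }
}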
My experience with larger sets of information like this is to write them to disk manually, rather than using the built-in serialization.
This may not be practical depending on how complex your StateInformation class is, but if it is fairly simple you could write/read the binary data manually using a BinaryReader and BinaryWriter instead. These allow you to read/write most value types directly to the stream, in a predetermined order dictated by your code.
This option should allow you to read/write your data quickly, although it is awkward if you later wish to add information to StateInformation, or take some out, as you'll have to manage upgrading your files.
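For illustration, if StateInformation were, say, two ints and a double (Id, Flags and Score are made-up fields), the manual version might look like:
// Writing: one record after another, fields in a fixed order.
using (var bw = new BinaryWriter(File.OpenWrite(@"C:\temp\states.dat")))
    foreach (StateInformation si in diskReady)
    {
        bw.Write(si.Id);     // Int32
        bw.Write(si.Flags);  // Int32
        bw.Write(si.Score);  // Double
    }

// Reading: same fields, same order, until the stream runs out.
var states = new List<StateInformation>();
using (var br = new BinaryReader(File.OpenRead(@"C:\temp\states.dat")))
    while (br.BaseStream.Position < br.BaseStream.Length)
        states.Add(new StateInformation
        {
            Id = br.ReadInt32(),
            Flags = br.ReadInt32(),
            Score = br.ReadDouble()
        });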
What is contained in StateInformation? Is it a class? A struct?
If you are simply after an easy-to-use container format that is easily serializable to disk, create a typed DataSet, store the information in the DataSet, then use its WriteXml() method to persist it to disk. You can then create an empty DataSet and use ReadXml() to load the contents back into memory, as sketched below.
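A rough sketch of that route (an untyped DataSet from System.Data, for brevity; the table and column names are invented):
var ds = new DataSet("States");
DataTable table = ds.Tables.Add("State");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Score", typeof(double));
foreach (StateInformation si in diskReady)
    table.Rows.Add(si.Id, si.Score);     // hypothetical properties

ds.WriteXml(@"C:\temp\states.xml");      // persist

var back = new DataSet();
back.ReadXml(@"C:\temp\states.xml");     // reload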
If StateInformation is a struct with value types, you can look at MemoryMappedFile to store/use the contents of the array by referencing the file directly, treating it as memory. This approach is quite a bit more complicated than the DataSet, but has its own set of advantages (see the sketch below).
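A sketch of the memory-mapped route (System.IO.MemoryMappedFiles, .NET 4+; it assumes StateInformation is a struct of value types, as the answer says):
int recordSize = Marshal.SizeOf(typeof(StateInformation));
long capacity = (long)recordSize * diskReady.Length;
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\temp\states.bin", FileMode.Create, null, capacity))
using (var accessor = mmf.CreateViewAccessor())
{
    // Write each struct at its offset; no intermediate byte buffers.
    for (int i = 0; i < diskReady.Length; i++)
        accessor.Write((long)i * recordSize, ref diskReady[i]);

    // Any record can later be pulled back by offset without loading the rest.
    StateInformation first;
    accessor.Read(0, out first);
}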

.NET BinaryWriter.Write() Method -- Writing Multiple Datatypes Simultaneously

I'm using BinaryWriter to write records to a file. Each record comes from a class with the following property datatypes:
Int32,
Int16,
Byte[],
Null Character
To write each record, I call BinaryWriter.Write four times, once for each datatype. This works fine, but I'd like to know whether there's any way to call the BinaryWriter.Write() method a single time with all of these datatypes. The reason is that another program reads my binary file and will occasionally read only part of a record, because it starts reading between my write calls. Unfortunately, I don't have control over the other program's code, or I would modify the way it reads.
Add a .ToBinary() method to your class that returns byte[].
public byte[] ToBinary()
{
    byte[] result = new byte[number_of_bytes_you_need];
    // fill result with the Int32, Int16, byte[] and null character, in order
    return result;
}
In your calling code (approximate, as I haven't compiled this), write the whole record with your BinaryWriter in one call:
writer.Write(myObj.ToBinary());
You're still assembling each value independently, but it cleans up the code a little.
Also, as sindre suggested, consider using serialization, as it makes it incredibly easy to recreate your objects from the file in question, and requires less effort than writing the file by hand as you're attempting to.
Sync: you can't depend on any of these solutions to fix your file sync issue. Even if you manage to reduce your Write() calls to a single statement, with or without serialization or the .ToBinary() method I've outlined, the bytes are still written to disk sequentially by the framework; that is a limitation of the physical structure of the disk. If you have control over the file format, add a record-length field written before any of the record data, and in the app that's reading the file, make sure you have record_length bytes available before attempting to process the next record. While you're at it, put this in a database. If you don't have control over the file format, you're kind of out of luck.
In keeping with your writing to a BinaryWriter, I would have the object build its binary record with a second BinaryWriter, which is then written to the main one. So on your class you could have a method like this:
public void WriteTo(BinaryWriter writer)
{
    using (MemoryStream ms = new MemoryStream())
    using (BinaryWriter bw = new BinaryWriter(ms))
    {
        bw.Write(value1);
        bw.Write(value2);
        bw.Write(value3);
        bw.Write(value4);
        writer.Write(ms.ToArray());   // the whole record goes out in one Write call
    }
}
This creates a single record with the same format you're already writing to the main BinaryWriter, except that it builds the record all at once and then writes it as one byte array.
Create a class for the record and use the BinaryFormatter:
using (FileStream fs = new FileStream("file.dat", FileMode.Create))
{
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(fs, <insert instance of a class here>);
}
I haven't done this myself, so I'm not absolutely sure it would work; the class cannot contain any other data, that's for sure. If you have no luck with a class, you could try a struct.
Edit:
Just came up with another possible solution: create a struct of your data and use the Buffer.BlockCopy function:
byte[] writeBuffer = new byte[Marshal.SizeOf(typeof(structure))];
structure[] strucPtr = new structure[1]; // must be an array; 1 element is enough though
strucPtr[0].item1 = 0213;                // initialize all the members
// Copy the structure array into the byte array.
// (Caveat: Buffer.BlockCopy only accepts arrays of primitive types, so for an
// arbitrary struct you would need Marshal.StructureToPtr/Marshal.Copy instead.)
Buffer.BlockCopy(strucPtr, 0, writeBuffer, 0, writeBuffer.Length);
Now you can write writeBuffer to the file in one go.
Second edit:
I don't agree that the sync problems are impossible to solve. First of all, the data is written to the file in entire sectors, not one byte at a time, and the file is not really updated until you flush it, writing the data and updating the file length. The best and safest thing to do is to open the file exclusively, write a record (or several), and close the file (see the sketch below). That requires the reading applications to access the file in a similar manner (open exclusively, read, close), as well as to handle "access denied" errors gracefully.
Anyhow, I'm quite sure this will perform better no matter what when you're writing an entire record at a time.
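That exclusive open is just a FileShare setting; a minimal sketch (the file name and record buffer are placeholders):
// Writer: nobody else can have the file open while we append a record.
using (var fs = new FileStream("file.dat", FileMode.Append, FileAccess.Write, FileShare.None))
{
    fs.Write(record, 0, record.Length);
}   // Dispose flushes and releases the lock

// Reader: back off politely if the writer currently holds the file.
try
{
    using (var fs = new FileStream("file.dat", FileMode.Open, FileAccess.Read, FileShare.None))
    {
        // read and process records...
    }
}
catch (IOException)
{
    // the writer has it open exclusively; try again later
}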

Generating very large XML file with Linq-to-XML and Linq-to-SQL

I'm trying to do a dump to XML of a very large database (many gigabytes). I'm using Linq-to-SQL to get the data out of the database and Linq-to-XML to generate XML. I'm using XStreamingElement to keep memory use low. The job still allocates all available memory, however, before keeling over without having written any XML. The structure looks like this:
var foo =
    new XStreamingElement("contracts",
        <LinqtoSQL which fetches data>.Select(d =>
            new XElement("contract",
                ... generate attributes etc. ...)));

using (StreamWriter sw = new StreamWriter("contracts.xml"))
using (XmlWriter xw = XmlWriter.Create(sw))
{
    foo.WriteTo(xw);
}
I've also tried saving with:
foo.Save("contracts.xml", SaveOptions.DisableFormatting);
...to no avail.
Any clues?
How complex is the data? I'm not overly familiar with XStreamingElement, but I wonder if you might have more joy using XmlWriter directly? Especially for regular data written in a loop, it can be used pretty easily, as in the sketch below.
I would, however, have concerns over XML as the choice for this data. Is this a requirement? Or simply a conveniently available format? In particular, it can be hard to parse XML of that size conveniently, as you'd have to use XmlReader (which is harder to get right than XmlWriter).
If you can use other formats, I'd advise it... a few leap to mind, but I won't babble on unless you mention that you'd be interested.
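For what it's worth, a sketch of the direct XmlWriter route (query, d.Id and the element names are placeholders for the real LINQ-to-SQL source):
using (XmlWriter xw = XmlWriter.Create("contracts.xml"))
{
    xw.WriteStartElement("contracts");
    foreach (var d in query)                             // stream rows one at a time
    {
        xw.WriteStartElement("contract");
        xw.WriteAttributeString("id", d.Id.ToString());  // hypothetical attribute
        xw.WriteEndElement();                            // </contract>
    }
    xw.WriteEndElement();                                // </contracts>
}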
Sure, you only need one clue for that: don't do it. :-)
XML is not an adequate format for database dumps because it does not handle large amounts of data well.
All databases have some sort of "dump" utility to export their data in a format that can then be read into another database - that would be the way to go.
Right, "solved" the problem by chunking my data into sets of 10,000 items and writing them to separate XML files. Will ponder other data exchange format and buy a larger server.
I would still be mighty interesting if someone had figured out how to properly take advantage of XStreamingElement.
