C# serialize large array to disk - c#

I have a very large graph stored in a single dimensional array (about 1.1 GB) which I am able to store in memory on my machine which is running Windows XP with 2GB of ram and 2GB of virtual memory. I am able to generate the entire data set in memory, however when I try to serialize it to disk using the BinaryFormatter, the file size gets to about 50MB and then gives me an out of memory exception. The code I am using to write this is the same I use amongst all of my smaller problems:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}
The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.
I really have 3 questions:
1. Is there a more reliable way to write a large data set to disk? I guess you can define large as when the size of the data set approaches the amount of available memory, though I am not sure how accurate that is.
2. Should I move to a more database-centric approach?
3. Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?

Write entries to file yourself. One simple solution would be like:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    foreach (StateInformation si in diskReady)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Serialize one entry at a time, not the whole array.
            bf.Serialize(ms, si);
            byte[] ser = ms.ToArray();
            int len = ser.Length;
            // Four-byte little-endian length prefix, then the payload.
            file.WriteByte((byte)(len & 0xFF));
            file.WriteByte((byte)((len >> 8) & 0xFF));
            file.WriteByte((byte)((len >> 16) & 0xFF));
            file.WriteByte((byte)((len >> 24) & 0x7F));
            file.Write(ser, 0, len);
        }
    }
}
No more memory than a single serialised StateInformation object requires is needed at a time, and to deserialise you read four bytes, reconstruct the length, create a buffer of that size, fill it, and deserialise.
All of the above could be seriously optimised for speed, memory use and disk-size if you create a more specialised format, but the above goes to show the principle.
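To make the read side concrete, here is a minimal sketch of the same length-prefixed framing with both a writer and a reader. It works on raw byte[] payloads rather than on StateInformation, so the serialisation step itself is left out; the class name LengthPrefixedRecords and its methods are made up for illustration and are not from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LengthPrefixedRecords
{
    // Writes each payload as a 4-byte little-endian length followed by its bytes.
    public static void WriteAll(Stream output, IEnumerable<byte[]> payloads)
    {
        foreach (byte[] payload in payloads)
        {
            int len = payload.Length;
            output.WriteByte((byte)(len & 0xFF));
            output.WriteByte((byte)((len >> 8) & 0xFF));
            output.WriteByte((byte)((len >> 16) & 0xFF));
            output.WriteByte((byte)((len >> 24) & 0xFF));
            output.Write(payload, 0, len);
        }
    }

    // Lazily reads the records back, holding only one payload in memory at a time.
    public static IEnumerable<byte[]> ReadAll(Stream input)
    {
        byte[] header = new byte[4];
        while (true)
        {
            int got = ReadFully(input, header, 4);
            if (got == 0) yield break; // clean end of stream
            if (got < 4) throw new EndOfStreamException("Truncated length header.");

            int len = header[0] | (header[1] << 8) | (header[2] << 16) | (header[3] << 24);
            if (len < 0) throw new InvalidDataException("Negative record length.");

            byte[] payload = new byte[len];
            if (ReadFully(input, payload, len) < len)
                throw new EndOfStreamException("Truncated record.");
            yield return payload;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done or EOF.
    private static int ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Each byte[] that ReadAll yields would then be handed to whatever deserialiser produced it in the first place.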

My experience of larger sets of information like this is to manually write it to disk, rather than using built in serialization.
This may not be practical depending on how complex your StateInformation class is, but if it is fairly simple you could write/read the binary data manually using a BinaryReader and BinaryWriter instead. These will allow you to read/write most value types directly to the stream, in an expected predetermined order dictated by your code.
This option should allow you to read/write your data quickly, although it is awkward if you then wish to add information into the StateInformation at a later date, or to take it out as you'll have to manage upgrading your files.
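As a rough sketch of that approach: the StateRecord struct below is a hypothetical stand-in for StateInformation (its three fields are invented), and the leading version number is one possible way to cope with the upgrade problem mentioned above, not something the original poster specified.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for StateInformation; the real fields are unknown.
public struct StateRecord
{
    public int Id;
    public int ParentIndex;
    public double Cost;
}

public static class StateFile
{
    // Bump this when the field layout changes so old files can be detected.
    private const int FormatVersion = 1;

    public static void Save(Stream output, StateRecord[] states)
    {
        // The writer is deliberately not disposed so the caller keeps ownership of the stream.
        BinaryWriter writer = new BinaryWriter(output);
        writer.Write(FormatVersion);
        writer.Write(states.Length);
        foreach (StateRecord s in states)
        {
            // Fields are written in a fixed order; Load must read them in the same order.
            writer.Write(s.Id);
            writer.Write(s.ParentIndex);
            writer.Write(s.Cost);
        }
        writer.Flush();
    }

    public static StateRecord[] Load(Stream input)
    {
        BinaryReader reader = new BinaryReader(input);
        int version = reader.ReadInt32();
        if (version != FormatVersion)
            throw new InvalidDataException("Unsupported format version " + version);

        int count = reader.ReadInt32();
        StateRecord[] states = new StateRecord[count];
        for (int i = 0; i < count; i++)
        {
            states[i].Id = reader.ReadInt32();
            states[i].ParentIndex = reader.ReadInt32();
            states[i].Cost = reader.ReadDouble();
        }
        return states;
    }
}
```

With this layout every record is a fixed 16 bytes, so a single record can also be located by seeking to 8 + index * 16 without reading the rest of the file.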

What is contained in StateInformation? Is it a class? struct?
If you are simply worried about an easy to use container format that is easily serializable to disk, create a typed DataSet, store the information in the DataSet, then use the WriteXml() method on the DataSet to persist it to disk. You can then create an empty DataSet and use ReadXml() to load the contents back into memory.
If StateInformation is a struct with only value-type fields, you can look at MemoryMappedFile to store/use the contents of the array by referencing the file directly, treating it as memory. This approach is quite a bit more complicated than the DataSet, but has its own set of advantages.
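A small sketch of the memory-mapped idea, assuming a struct made purely of value types. StateCell and MappedStates are invented names, and the 16-byte layout is chosen so that the stride the accessor uses matches Marshal.SizeOf; a different layout would need that assumption re-checked.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical stand-in for StateInformation: value-type fields only, 16 bytes in total.
public struct StateCell
{
    public int Id;
    public int ParentIndex;
    public double Cost;
}

public static class MappedStates
{
    private static readonly int CellSize = Marshal.SizeOf(typeof(StateCell));

    // Creates the file at exactly the required size and copies the array into it.
    // Assumes cells is non-empty, since a zero-capacity mapping cannot be created.
    public static void Write(string path, StateCell[] cells)
    {
        long bytes = (long)cells.Length * CellSize;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, bytes))
        using (var view = mmf.CreateViewAccessor())
        {
            view.WriteArray(0, cells, 0, cells.Length);
        }
    }

    // Reads a single element by index without loading the rest of the file.
    // A real application would keep the mapping open instead of reopening it per call.
    public static StateCell ReadAt(string path, long index)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor())
        {
            StateCell cell;
            view.Read(index * CellSize, out cell);
            return cell;
        }
    }
}
```

The operating system pages in only the parts of the file that are touched, which is what makes this attractive when the data set is close to the size of available memory.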

Related

Difficulty reading large file into byte array

I have a very large BMP file that I have to read in all at once because I need to reverse the bytes when writing it to a temp file. This BMP is 1.28GB, and I'm getting the "Out of memory" error. I can't read it completely (using ReadAllBytes) or using a buffer into a binary array because I can't initialize an array of that size. I also can't read it into a List (which I could then Reverse()) using a buffer because halfway through it runs out of memory.
So basically the question is, how do I read a very large file backwards (ie, starting at LastByte and ending at FirstByte) and then write that to disk?
Bonus: when writing the reversed file to disk, do not write the last 54 bytes.
With a FileStream, you can Seek (place the "cursor") to any particular byte, so you can use that to go over the entire file's contents in reverse, one buffer at a time.
Example:
const int bufferSize = 1024;
string fileName = "yourfile.txt";
using (FileStream myStream = File.OpenRead(fileName))
{
    byte[] bytes = new byte[bufferSize];
    long position = myStream.Length;
    while (position > 0)
    {
        // Step back one buffer (the chunk nearest the start of the file may be shorter).
        int count = (int)Math.Min(bufferSize, position);
        position -= count;
        myStream.Seek(position, SeekOrigin.Begin);
        int bytesRead = myStream.Read(bytes, 0, count);
        // bytes[0..bytesRead) now holds this chunk; reverse it and write it out here.
    }
}
You normally cannot hold files this big in memory in .NET, because of the memory limits on CLR applications and the collections inside them, on either 32-bit or 64-bit platforms.
For this you can use a Memory Mapped File to read the file directly from disk without loading it all into memory. Once the memory mapping is created, move the read pointer to the end of the file and read backwards.
Hope this helps.
You can use Memory Mapped Files.
http://msdn.microsoft.com/en-us/library/vstudio/dd997372%28v=vs.100%29.aspx
Also, you can use a FileStream and move to the position you need with stream.Seek(xxx, SeekOrigin.Begin) (an offset relative to the given origin) or by setting the Position property (an absolute position).
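Putting the seek-based suggestions together, a chunked reversal could look roughly like this. FileReverser and its parameter names are made up for the sketch; skipLeadingBytes covers the bonus requirement, because the last 54 bytes of the reversed output are the first 54 bytes of the input.

```csharp
using System;
using System.IO;

public static class FileReverser
{
    // Writes input to output in reverse byte order, one buffer at a time.
    // The first skipLeadingBytes of the input are never read, so they are
    // missing from the end of the output.
    public static void Reverse(Stream input, Stream output, long skipLeadingBytes, int bufferSize = 64 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        long position = input.Length;

        while (position > skipLeadingBytes)
        {
            // The chunk closest to the skipped header may be shorter than the buffer.
            int count = (int)Math.Min(bufferSize, position - skipLeadingBytes);
            position -= count;
            input.Seek(position, SeekOrigin.Begin);

            // Stream.Read may return fewer bytes than requested, so loop until the chunk is full.
            int read = 0;
            while (read < count)
            {
                int n = input.Read(buffer, read, count - read);
                if (n == 0) throw new EndOfStreamException("Input ended unexpectedly.");
                read += n;
            }

            Array.Reverse(buffer, 0, count);
            output.Write(buffer, 0, count);
        }
    }
}
```

For the BMP case this would be called with two FileStream objects and skipLeadingBytes set to 54; memory use stays at one buffer regardless of file size.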

C# export / write multidimension array to file (csv or whatever)

Hi. I'm designing a program and I just wanted advice on writing a multidimensional array to a file.
I am using XNA and have a multidimensional array of Vector3(x, y, z) values.
There are hundreds of thousands, if not millions, of values, and I want to be able to save them in a file (saving the game level). I have no bias toward any one idea, I just need to store the data... that's it! For all the other game data, like player stats, I am using XmlSerializer and it's working wonders.
I was playing with the XML serializer a lot and have learned that you cannot export multidimensional arrays... so frustrating (but I am sure there is a good reason why, hopefully). I played with jagged arrays with no luck.
I used System.IO.File.WriteAllText, then quickly realised that it is only for strings... duh.
Basically I think I need to go down the BinaryWriter route, writing my own serializer, or even try running a SQL server to host the masses of data... stupid idea? Please tell me, and can you point me in the right direction? As I primarily have a web (PHP) background, the thought of running a server that syncs level data is attractive to me... but might not be applicable here.
Thanks for anything,
Mal
You can just serialise the lot with the built-in .NET serialisers provided the objects in the array are serialisable (and IIRC Vector3s are).
void SerializeVector3Array(string filename, Vector3[,] array)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Open(filename, FileMode.Create))
    {
        bf.Serialize(s, array);
    }
}
Vector3[,] DeserializeVector3Array(string filename)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream s = File.Open(filename, FileMode.Open))
    {
        return (Vector3[,])bf.Deserialize(s);
    }
}
That should be a rough template of what you're after.
Why don't you try JSON serialization? JSON has less noise than XML and occupies less space when written to file, especially if you do so without indenting. In my experience it does not have trouble with arrays, dictionaries, dates and other objects.
I recommend using JSON.NET and if not, then look at this thread
If Json.net finds it difficult to serialize a library class with many private & static variables, then it is trivial to write a POCO class and map the library class essential properties to your POCO and serialize and map the POCO back and forth.
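Here is one way the POCO mapping could look. To keep the sketch self-contained it uses System.Text.Json from the framework instead of JSON.NET, so treat that as a substitution; Vec3 is a stand-in for XNA's Vector3, and LevelDto is an invented flat layout (dimensions plus a single float array), since multidimensional arrays are the part serializers tend to reject.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for XNA's Vector3.
public struct Vec3
{
    public float X, Y, Z;
}

// Plain object that is easy to serialize: dimensions plus a flat array.
public class LevelDto
{
    public int Rows { get; set; }
    public int Cols { get; set; }
    public float[] Values { get; set; } // x, y, z triples in row-major order
}

public static class LevelJson
{
    public static string ToJson(Vec3[,] grid)
    {
        int rows = grid.GetLength(0), cols = grid.GetLength(1);
        var dto = new LevelDto { Rows = rows, Cols = cols, Values = new float[rows * cols * 3] };
        int k = 0;
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                dto.Values[k++] = grid[r, c].X;
                dto.Values[k++] = grid[r, c].Y;
                dto.Values[k++] = grid[r, c].Z;
            }
        }
        return JsonSerializer.Serialize(dto);
    }

    public static Vec3[,] FromJson(string json)
    {
        LevelDto dto = JsonSerializer.Deserialize<LevelDto>(json);
        var grid = new Vec3[dto.Rows, dto.Cols];
        int k = 0;
        for (int r = 0; r < dto.Rows; r++)
        {
            for (int c = 0; c < dto.Cols; c++)
            {
                grid[r, c].X = dto.Values[k++];
                grid[r, c].Y = dto.Values[k++];
                grid[r, c].Z = dto.Values[k++];
            }
        }
        return grid;
    }
}
```

The resulting string can go straight to File.WriteAllText. For millions of values a text format will be noticeably larger than a binary one, so this suits small and medium levels best.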

.NET BinaryWriter.Write() Method -- Writing Multiple Datatypes Simultaneously

I'm using BinaryWriter to write records to a file. The records are comprised of a class with the following property datatypes.
Int32,
Int16,
Byte[],
Null Character
To write each record, I call BinaryWriter.Write four times--one for each datatype. This works fine but I'd like to know if there's any way to just call the BinaryWriter.Write() method a single time using all of these datatypes. The reasoning for this is that another program is reading my binary file and will occasionally only read part of a record because it starts reading between my write calls. Unfortunately, I don't have control over the code to the other program else I would modify the way it reads.
Add a .ToBinary() method to your class that returns byte[].
public byte[] ToBinary()
{
    byte[] result = new byte[number_of_bytes_you_need];
    // fill result
    return result;
}
In your calling code (approximate as I haven't compiled this)
writer.Write(myObj.ToBinary()); // writer is your BinaryWriter
You're still writing each value independently, but it cleans up the code a little.
Also, as sindre suggested, consider using the serialization, as it makes it incredibly easy to recreate your objects from the file in question, and requires less effort than writing the file as you're attempting to.
Sync: you can't depend on any of these solutions to fix your file sync issue. Even if you manage to reduce your binaryWrite() call to a single statement, not using serialization or the .ToBinary() method I've outlined, bytes are still written sequentially by the framework. This is a limitation of the physical structure of the disk. If you have control over the file format, add a record length field written before any of the record data. In the app that's reading the file, make sure that you have record_length bytes before attempting to process the next record from the file. While you're at it, put this in a database. If you don't have control over the file format, you're kind of out of luck.
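For the case where you do control both sides, the reader-side check described above might be sketched like this. RecordTail and TryReadRecord are made-up names, and the format (a 4-byte length written by BinaryWriter, then the record bytes) is an assumption about how the writer would lay out the file.

```csharp
using System;
using System.IO;

public static class RecordTail
{
    // Returns the next complete record, or null if the writer has not finished it yet.
    // When null is returned the stream position is left where it started,
    // so the caller can simply try again later.
    public static byte[] TryReadRecord(Stream s)
    {
        long start = s.Position;
        long available = s.Length - start;
        if (available < 4) return null; // the length field itself is incomplete

        BinaryReader reader = new BinaryReader(s);
        int length = reader.ReadInt32();
        if (length < 0) throw new InvalidDataException("Negative record length.");

        if (available - 4 < length)
        {
            s.Position = start; // record body not fully written yet
            return null;
        }

        byte[] data = reader.ReadBytes(length);
        if (data.Length < length)
        {
            s.Position = start;
            return null;
        }
        return data;
    }
}
```

A polling reader would call this in a loop and sleep briefly whenever it gets null.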
In keeping with you writing to a BinaryWriter, I would have the object create a binary record using a second BinaryWriter that is then written to the BinaryWriter. So on your class you could have a method like this:
public void WriteTo(BinaryWriter writer)
{
    MemoryStream ms = new MemoryStream();
    BinaryWriter bw = new BinaryWriter(ms);
    bw.Write(value1);
    bw.Write(value2);
    bw.Write(value3);
    bw.Write(value4);
    writer.Write(ms.ToArray());
}
This would create a single record with the same format as you're already writing to the main BinaryWriter, just it would build it all at once then write it as a byte array.
Create a class of the record and use the binary formatter:
FileStream fs = new FileStream("file.dat", FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(fs, <insert instance of a class here>);
fs.Close();
I haven't done this myself, so I'm not absolutely sure it would work. The class cannot contain any other data, that's for sure. If you have no luck with a class you could try a struct.
Edit:
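Just came up with another possible solution: create a struct of your data and copy its raw bytes into a byte array. Buffer.BlockCopy only accepts arrays of primitive types, so for an array of structs MemoryMarshal.AsBytes (in System.Runtime.InteropServices, .NET Core 2.1 and later) does the equivalent job:
structure[] strucPtr = new structure[1]; // must be an array, 1 element is enough though
strucPtr[0].item1 = 213; // initialize all the members
// View the structure array as bytes and copy them into a byte array.
// The struct must contain only value-type fields for this to work.
byte[] writeBuffer = MemoryMarshal.AsBytes(strucPtr.AsSpan()).ToArray();
Now you can write the writeBuffer to file in one go.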
Second edit:
I don't agree that the sync problems can't be solved. First of all, the data is written to the file in entire sectors, not one byte at a time. And the file is really not updated until you flush it, thus writing data and updating the file length. The best and safest thing to do is to open the file exclusively, write a record (or several), and close the file. That requires the reading applications to read the file in a similar manner (open exclusively, read, close), as well as handling "access denied" errors gracefully.
Anyhow, I'm quite sure this will perform better no matter what when you're writing an entire record at a time.
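A sketch of the open-exclusively, write, close pattern with a simple retry. ExclusiveAppender and its retry parameters are invented for illustration; it assumes a sharing conflict surfaces as an IOException, which is how FileStream reports it on Windows.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ExclusiveAppender
{
    // Appends one complete record while holding the file with FileShare.None,
    // so other processes that open the file normally cannot see a half-written record.
    public static void AppendRecord(string path, byte[] record, int maxAttempts = 10, int delayMs = 50)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
                {
                    fs.Write(record, 0, record.Length);
                }
                return;
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // Someone else has the file open; wait a little and try again.
                // After the last attempt the exception propagates to the caller.
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

This only helps if the reading program also fails or waits when it cannot open the file, which is exactly the part the original poster said they cannot change.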

Preallocating file space in C#?

I am creating a downloading application and I wish to preallocate room on the hard drive for the files before they are actually downloaded, as they could potentially be rather large, and no one likes to see "This drive is full, please delete some files and try again." So, in that light, I wrote this.
// Quick, and very dirty
System.IO.File.WriteAllBytes(filename, new byte[f.Length]);
It works, at least until you download a file that is several hundred MB, or potentially even several GB, and you throw Windows into a thrashing frenzy, if not totally wipe out the pagefile and kill your system's memory altogether. Oops.
So, with a little more enlightenment, I set out with the following algorithm.
using (FileStream outFile = System.IO.File.Create(filename))
{
    // 4194304 = 4MB; write whole blocks, then whatever is left over
    byte[] buff = new byte[4194304];
    long remaining = f.Length;
    while (remaining >= buff.Length)
    {
        outFile.Write(buff, 0, buff.Length);
        remaining -= buff.Length;
    }
    outFile.Write(buff, 0, (int)remaining);
}
This works, well even, and doesn't suffer the crippling memory problem of the last solution. It's still slow though, especially on older hardware since it writes out (potentially GB's worth of) data out to the disk.
The question is this: Is there a better way of accomplishing the same thing? Is there a way of telling Windows to create a file of x size and simply allocate the space on the filesystem rather than actually write out a tonne of data. I don't care about initialising the data in the file at all (the protocol I'm using - bittorrent - provides hashes for the files it sends, hence worst case for random uninitialised data is I get a lucky coincidence and part of the file is correct).
FileStream.SetLength is the one you want. The syntax:
public override void SetLength(
long value
)
If you have to create the file, I think that you can probably do something like this:
using (FileStream outFile = System.IO.File.Create(filename))
{
    outFile.Seek(<length_to_write> - 1, SeekOrigin.Begin);
    outFile.WriteByte(0);
}
Where length_to_write would be the size in bytes of the file to write. I'm not sure that I have the C# syntax correct (not on a computer to test), but I've done similar things in C++ in the past and it's worked.
Unfortunately, you can't really do this just by seeking to the end. That will set the file length to something huge, but may not actually allocate disk blocks for storage. So when you go to write the file, it will still fail.
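For reference, the SetLength suggestion in use might look like the sketch below. Preallocator is a made-up name. Whether the file system actually reserves the blocks, or just records the new length as a sparse file, depends on the platform, so it is worth checking free space before and after on the target system.

```csharp
using System.IO;

public static class Preallocator
{
    // Creates (or truncates) the file and sets its length without writing any data.
    public static void Preallocate(string path, long length)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            fs.SetLength(length);
        }
    }
}
```

If the drive does not have enough room, SetLength is expected to throw an IOException, which gives the application the chance to report the problem before the download starts.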

What is the most efficient way to save a byte array as a file on disk in C#?

Pretty simple scenario. I have a web service that receives a byte array that is to be saved as a particular file type on disk. What is the most efficient way to do this in C#?
That would be File.WriteAllBytes().
System.IO.File.WriteAllBytes(path, data) should do fine.
And WriteAllBytes just performs
using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
{
stream.Write(bytes, 0, bytes.Length);
}
BinaryWriter has a misleading name: it's intended for writing primitives as their byte representations rather than for writing binary data. All its Write(byte[]) method does is call Write() on the stream it's using, in this case a FileStream.
Not sure what you mean by "efficient" in this context, but I'd use System.IO.File.WriteAllBytes(string path, byte[] bytes) - Certainly efficient in terms of LOC.
I had a similar problem dumping a 300 MB byte array to a disk file...
I used StreamWriter, and it took me a good 30 minutes to dump the file.
Using FilePut took me around 3-4 minutes, and when I used BinaryWriter, the file was dumped in 50-60 seconds.
If you use BinaryWriter you will get better performance.
Perhaps the System.IO.BinaryWriter and BinaryReader classes would help.
http://msdn.microsoft.com/en-us/library/system.io.binarywriter.aspx
"Writes primitive types in binary to a stream and supports writing strings in a specific encoding."
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
"Reads primitive data types as binary values in a specific encoding."
Actually, the most efficient way would be to stream the data and to write it as you receive it. WCF supports streaming so this may be something you'd want to look into. This is particularly important if you're doing this with large files, since you almost certainly don't want the file contents in memory on both the server and client.
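Leaving the WCF configuration aside, the write-as-you-receive part could be sketched as follows. StreamingSaver is an invented name, and incoming stands for whatever Stream the service hands you; only one buffer's worth of data is in memory at any time.

```csharp
using System.IO;

public static class StreamingSaver
{
    // Copies the incoming stream to a file in fixed-size chunks and
    // returns the number of bytes written.
    public static long SaveTo(Stream incoming, string path, int bufferSize = 81920)
    {
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
        {
            byte[] buffer = new byte[bufferSize];
            long total = 0;
            int n;
            while ((n = incoming.Read(buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

Stream.CopyTo does the same copy loop internally; the explicit loop is shown here only because it makes the byte count easy to return.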
