Process very large XML file - c#

I need to process an XML file with the following structure:
<FolderSizes>
<Version></Version>
<DateTime Un=""></DateTime>
<Summary>
<TotalSize Bytes=""></TotalSize>
<TotalAllocated Bytes=""></TotalAllocated>
<TotalAvgFileSize Bytes=""></TotalAvgFileSize>
<TotalFolders Un=""></TotalFolders>
<TotalFiles Un=""></TotalFiles>
</Summary>
<DiskSpaceInfo>
<Drive Type="" Total="" TotalBytes="" Free="" FreeBytes="" Used=""
UsedBytes=""><![CDATA[ ]]></Drive>
</DiskSpaceInfo>
<Folder ScanState="">
<FullPath Name=""><![CDATA[ ]]></FullPath>
<Attribs Int=""></Attribs>
<Size Bytes=""></Size>
<Allocated Bytes=""></Allocated>
<AvgFileSz Bytes=""></AvgFileSz>
<Folders Un=""></Folders>
<Files Un=""></Files>
<Depth Un=""></Depth>
<Created Un=""></Created>
<Accessed Un=""></Accessed>
<LastMod Un=""></LastMod>
<CreatedCalc Un=""></CreatedCalc>
<AccessedCalc Un=""></AccessedCalc>
<LastModCalc Un=""></LastModCalc>
<Perc><![CDATA[ ]]></Perc>
<Owner><![CDATA[ ]]></Owner>
<!-- Special element; see paragraph below -->
<Folder></Folder>
</Folder>
</FolderSizes>
The <Folder> element is special in that it repeats within the <FolderSizes> element but can also appear within itself; I reckon up to about 5 levels.
The problem is that the file is really big at a whopping 11GB so I'm having difficulty processing it - I have experience with XML documents, but nothing on this scale.
What I would like to do is to import the information into a SQL database because then I will be able to process the information in any way necessary without having to concern myself with this immense, impractical file.
Here are the things I have tried:
Simply load the file and attempt to process it with a simple C# program using an XmlDocument or XDocument object
Before I even started I knew this would not work, as I'm sure everyone would agree, but I tried it anyway, and ran the application on a VM (since my notebook only has 4GB RAM) with 30GB memory. The application ended up using 24GB memory, and taking very, very long, so I just cancelled it.
Attempt to process the file using an XmlReader object
This approach worked better in that it didn't use as much memory, but I still had a few problems:
It was taking really long because I was reading the file one line at a time.
Processing the file one line at a time makes it difficult to really work with the data contained in the XML, because now you have to detect the start of a tag, then the end of that tag (hopefully), then create a document from that information, read the info, and attempt to determine which parent tag it belongs to because we have multiple levels... This sounds prone to problems and errors.
Did I mention that it takes really long to read the file one line at a time? And that is still without actually processing the line - literally just reading it.
Import the information using SQL Server
I created a stored procedure using XQuery and ran it recursively within itself to process the <Folder> elements. This went quite well - I think better than the other two approaches - until one of the <Folder> elements ended up being rather big, producing an "An XML operation resulted an XML data type exceeding 2GB in size. Operation aborted." error. I read up about it and I don't think it's an adjustable limit.
Here are more things I think I should try:
Re-write my C# application to use unmanaged code
I don't have much experience with unmanaged code, so I'm not sure how well it will work and how to make it as unmanaged as possible.
I once wrote a little application that works with my webcam, receiving the image, inverting the colours, and painting it to a panel. Using normal managed code didn't work - the result was about 2 frames per second. Re-writing the colour inversion method to use unmanaged code solved the problem. That's why I thought that unmanaged might be a solution.
Rather go for C++ instead of C#
Not sure if this is really a solution. Would it necessarily be better than C#? Better than unmanaged C#?
The problem here is that I haven't actually worked with C++ before, so I'll need to get to know a few things about C++ before I can really start working with it, and then probably not very efficiently yet.
I thought I'd ask for some advice before I go any further, possibly wasting my time.
Thanks in advance for your time and assistance.
EDIT
So before I start processing the file I run through it and check the size in an attempt to provide the user with feedback on how long the processing might take; I made a screenshot of the calculation:
That's about 1500 lines per second; if the average line length is about 50 characters, that's 50 bytes per line, which is 75 kilobytes per second, so an 11GB file should take about 40 hours, if my maths is correct. But this is only stepping through each line; it's not actually processing the line or doing anything with it, so when that starts, the processing rate drops significantly.
This is the method that runs during the size calculation:
private int _totalLines = 0;
private bool _cancel = false; // set to true when the cancel button is clicked

private void CalculateFileSize()
{
    // Wrapping the reader in using blocks disposes it even when the user cancels,
    // instead of leaking it through the early return.
    using (var xmlStream = new StreamReader(_filePath))
    using (var xmlReader = new XmlTextReader(xmlStream))
    {
        while (xmlReader.Read())
        {
            if (_cancel)
                return;

            if (xmlReader.LineNumber > _totalLines)
                _totalLines = xmlReader.LineNumber;

            InterThreadHelper.ChangeText(
                lblLinesRemaining,
                string.Format("{0} lines", _totalLines));

            string elapsed = string.Format(
                "{0}:{1}:{2}:{3}",
                timer.Elapsed.Days.ToString().PadLeft(2, '0'),
                timer.Elapsed.Hours.ToString().PadLeft(2, '0'),
                timer.Elapsed.Minutes.ToString().PadLeft(2, '0'),
                timer.Elapsed.Seconds.ToString().PadLeft(2, '0'));
            InterThreadHelper.ChangeText(lblElapsed, elapsed);
        }
    }
}
Still running, 27 minutes in :(

You can read XML as a logical stream of elements instead of trying to read it line by line and piecing it back together yourself; see the code sample at the end of this article.
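Something along those lines (a sketch only, not the article's sample; the file path and the ProcessFolder handler are placeholders) would be to hand each top-level <Folder> element to LINQ to XML as its own small subtree:
using System.Xml;
using System.Xml.Linq;

using (var reader = XmlReader.Create(@"C:\data\foldersizes.xml")) // placeholder path
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Folder")
        {
            // Materialise just this <Folder> (including its nested <Folder> children) as an XElement.
            var folder = (XElement)XNode.ReadFrom(reader);
            ProcessFolder(folder); // hypothetical handler, e.g. writing a row to the database
        }
        else
        {
            reader.Read();
        }
    }
}
Only one <Folder> subtree is held in memory at a time, so the 11GB file never has to be loaded whole.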
Also, your question has already been asked here.

Why is my encoding showing twice?

byte[] lengthBytes = new byte[4];
serverStream.Read(lengthBytes, 0, 4);
MessageBox.Show("'>>" + System.Text.Encoding.UTF8.GetString(lengthBytes) + "<<'");
MessageBox.Show("Hello");
This is the code I used for debugging. I get 2 message boxes now. If I use Debug.WriteLine, it is also printed twice.
Msgbox 1: '>>/ (Note that this is still 4 characters long; the last 3 bytes are null.)
Msgbox 2: '>>{"ac<<'
Msgbox 3: Hello
I'm trying to send 4 bytes containing an integer: the length of the message. This is going fine ('/' is UTF-8 for 47). The problem is that the first 4 bytes of the message are also being read ('{"ac'). I totally don't know how this happens; I've been debugging this for several hours and I just can't get my head around it. One of my friends suggested making an account on StackOverflow, so here I am :p
Thanks for all the help :)
EDIT: The real code for the people who asked
My code http://kutj.es/2ah-j9
You are making traditional programmer mistakes; everybody has to make them once to learn how to avoid them and do it right. This primarily went off the rails because you wrote debugging code that is itself buggy, which made it a lot harder to find your mistake:
Never write debugging code that uses MessageBox.Show(). It is a very, very evil function: it causes re-entrancy. An expensive word that means it only freezes the user interface; it doesn't freeze your program. Your program continues to run, and one of the things that can go wrong is that the code you posted is executed again. Re-entered. You'll see two message boxes. And you'll have a completely corrupted program state, because your code was never written to assume it could be re-entered. Which is why you complained that 4 bytes of data were swallowed.
The proper tool to use here is the feature that really freezes your program. A debugger breakpoint.
Never assume that binary data can be converted to text. Those 4 bytes you received contain binary zeros. There is no character for them. Worse, a zero acts as a string terminator for many operating system calls, the kind used by the debugger, Debug.WriteLine(), etc. This is why you can't see the "<<".
The proper tool to use here is a debugger watch or tooltip; it lets you look into the array directly. If you absolutely have to generate a diagnostic string, then use BitConverter.ToString().
Never assume that a stream's Read() method will always return the number of bytes you asked for. Using the return value in your code is a hard requirement. This is the real bug in your program, the one you are actually trying to fix.
The proper solution is to continue to call Read() until you have counted down the number of bytes you expect to receive, based on the length you read earlier. You'll need a MemoryStream to store the chunks of byte[] you get.
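For illustration only (the helper name is made up): a loop that keeps calling Read() until the full message has arrived. Since the length is known up front, this sketch fills a fixed-size buffer instead of a MemoryStream.
using System.IO;

// Reads exactly 'count' bytes from the stream, or throws if the connection closes early.
static byte[] ReadExactly(Stream stream, int count)
{
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0)
            throw new EndOfStreamException("Connection closed before the full message arrived.");
        offset += read;
    }
    return buffer;
}

// Usage: read the 4-byte length prefix, then the message body.
// byte[] lengthBytes = ReadExactly(serverStream, 4);
// int length = BitConverter.ToInt32(lengthBytes, 0);
// byte[] message = ReadExactly(serverStream, length);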
Perhaps this link regarding Encoding.GetString() will help you out a bit. The part to pay attention to being:
If the data to be converted is available only in sequential blocks
(such as data read from a stream) or if the amount of data is so large
that it needs to be divided into smaller blocks, you should use the
Decoder object returned by the GetDecoder method of a derived class.
The problem was that I started the getMessage method twice. This started the while loop twice (in different threads).
Elgonzo helped me find the problem; he is a great guy :)

"Where are my bytes?" or Investigation of file length traits

This is a continuation of my question about downloading files in chunks. The explanation will be quite big, so I'll try to divide it into several parts.
1) What I tried to do?
I was creating a download manager for a Windows Phone application. First, I tried to solve the problem of downloading
large files (the explanation is in the previous question). Now I want to add a "resumable download" feature.
2) What I've already done.
At the moment I have a well-working download manager that allows me to work around the Windows Phone RAM limit.
The idea of this manager is that it downloads small chunks of the file sequentially, using the HTTP Range header.
A fast explanation of how it works:
The file is downloaded in chunks of constant size. Let's call this size "delta". After a file chunk has been downloaded,
it is saved to local storage (the hard disk; on WP it's called Isolated Storage) in append mode (so the downloaded byte array is
always added to the end of the file). After downloading a single chunk, the statement
if (mediaFileLength >= delta) // mediaFileLength is the length of the downloaded chunk
is checked. If it's true, that
means there's something left to download, and the method is invoked recursively. Otherwise it means that this chunk
was the last one, and there's nothing left to download.
3) What's the problem?
As long as I used this logic for one-time downloads (by one-time I mean you start downloading a file and wait until the download is finished),
it worked well. However, I decided that I need a "resume download" feature. So, the facts:
3.1) I know that the file chunk size is a constant.
3.2) I know when the file is completely downloaded or not (that's an indirect result of my app logic;
I won't weary you with the explanation, just take it as a fact).
From these two statements it follows that the number of downloaded chunks is equal to
(CurrentFileLength)/delta, where CurrentFileLength is the size in bytes of the file downloaded so far.
To resume downloading the file I should simply set the required headers and invoke the download method. That seems logical, doesn't it? And I tried to implement it:
// Check file size
using (IsolatedStorageFileStream fileStream = isolatedStorageFile.OpenFile("SomewhereInTheIsolatedStorage", FileMode.Open, FileAccess.Read))
{
    int currentFileSize = Convert.ToInt32(fileStream.Length);
    int currentFileChunkIterator = currentFileSize / delta;
}
And what do I see as a result? The downloaded file length is equal to 2432000 bytes (delta is 304160; the total file size is about 4.5 MB; we've downloaded only half of it). So the result is
approximately 7.995 (it actually has an integer type, so it becomes 7 when it should be 8 instead!). Why is this happening?
Simple math tells us that the file length should be 2433280, so the given value is very close, but not equal.
Further investigation showed that all the values given by fileStream.Length are not exact, but all are close.
Why is this happening? I don't know precisely, but perhaps the .Length value is taken from the file metadata somewhere.
Perhaps such rounding is normal for this method. Perhaps, when the download was interrupted, the file wasn't saved completely... (no, that's pure fantasy, it can't be).
So the problem is set: "How do I determine the number of chunks downloaded?" The question is how to solve it.
4) My thoughts about solving the problem.
My first thought was about using maths here: set some epsilon-neighborhood and use it in the currentFileChunkIterator = currentFileSize / delta; statement.
But that would require us to keep type I and type II errors in mind (or false alarm and miss, if you don't like the statistics terms) - perhaps there's actually nothing left to download.
Also, I haven't checked whether the difference between the provided value and the true value grows steadily
or fluctuates cyclically. With small sizes (about 4-5 MB) I've seen only growth, but that doesn't prove anything.
So, I'm asking for help here, as I don't like my solution.
5) What I would like to hear as answer:
What causes the difference between real value and received value?
Is there a way to receive a true value?
If not, is my solution good for this problem?
Are there other better solutions?
P.S. I won't set a Windows Phone tag, because I'm not sure this problem is OS-related. I used the Isolated Storage Tool
to check the size of the downloaded file, and it showed me the same value as the one received (I'm sorry about the Russian language in the screenshot):
I'm answering your update:
This is my understanding so far: The length actually written to the file is more (rounded up to the next 1KiB) than you actually wrote to it. This causes your assumption of "file.Length == amount downloaded" to be wrong.
One solution would be to track this information separately. Create some meta-data structure (which can be persisted using the same storage mechanism) to accurately track which blocks have been downloaded, as well as the entire size of the file:
[DataContract] //< I forgot how serialization on the phone works, please forgive me if the tags differ
struct Metadata
{
    [DataMember]
    public int Length;

    [DataMember]
    public int NumBlocksDownloaded;
}
This would be enough to reconstruct which blocks have been downloaded and which have not, assuming that you keep downloading them in a consecutive fashion.
edit
Of course you would have to change your code from a simple append to moving the position of the stream to the correct block, before writing the data to the stream:
file.Position = currentBlock * delta;
file.Write(block, 0, block.Length);
Just a possible bug: don't forget to verify whether the file was modified between requests, especially when a long time passes between them, which can happen on pause/resume.
The error could be big: if the file is modified to a smaller size, your count becomes wrong, and if the file stays the same size but with modified contents, you are left with a corrupted file.
Have you heard the anecdote about a noob programmer and ten guru programmers? The guru programmers were trying to find an error in his solution, while the noob had already found it but didn't mention it, because it was something so stupid he was afraid of being laughed at.
Why did I remember this? Because the situation is similar.
The explanation in my question was already very heavy, and I decided not to mention some small aspects that I was sure worked correctly. (And they really did work correctly.)
One of those small aspects was the fact that the downloaded file was encrypted with AES using PKCS7 padding. Well, the decryption worked correctly, I knew it, so why should I mention it? And I didn't.
So, then I tried to find out what exactly causes the error with the last chunk. The most plausible theory was a problem with buffering, and I tried to find where I was losing the missing bytes. I tested again and again, but I couldn't find them, as every chunk was being saved without any losses. And one day I understood:
There is no spoon
There is no error.
What's the point of AES with PKCS7 padding? Well, the relevant one here is that it makes the decrypted file smaller - not by much, only by 16 bytes. And it was accounted for in my decryption method and my download method, so there should be no problem, right?
But what happens when the download process is interrupted? The last chunk will save correctly; there will be no errors with buffering or anything else. And then we want to continue the download. The number of downloaded chunks will be equal to currentFileChunkIterator = currentFileSize / delta;
And here I should ask myself: "Why are you trying to do something THAT stupid?"
"Your downloaded one chunk size is not delta. Actually, it's less than delta". (the decryption makes chunk smaller to 16 bytes, remember?)
The delta itself consists of 10 equal parts, that are being decrypted. So we should divide not by delta, but by (delta - 16 * 10) which is (304160 - 160) = 304000.
I sense a rat here. Let's try to find out the number of the downloaded chunks:
2432000 / 304000 = 8. Wait... OH SHI~
So, that's the end of story.
The whole solution logic was right.
The only reason it failed was my assumption that, for some reason, the downloaded decrypted file size should be the same as the sum of the downloaded encrypted chunks.
And, of course, since I didn't mention the decryption (it's mentioned only in the previous question, which is only linked), none of you could give me a correct answer. I'm terribly sorry about that.
Continuing from my comment...
The original file size, as I understand from your description, is 2432000 bytes.
The chunk size is set to 304160 bytes (one chunk per "delta").
So the machine that sent the file was able to fill 7 chunks and send them.
The receiving machine now has 7 x 304160 bytes = 2129120 bytes.
The last chunk will not be filled to the end, as there are not enough bytes left to fill it, so it will contain: 2432000 - 2129120 = 302880, which is less than 304160.
If you add the numbers you will get 7x304160 + 1x302880 = 2432000 bytes
So according to that the original file transferred in full to the destination.
The problem is that you are calculating 8 x 304160 = 2433280, insisting that even the last chunk must be filled completely - but with what? And why?
In all humility: are you locked in some kind of math confusion, or did I misunderstand your problem?
Please answer: what is the original file size, and what size is being received at the other end? (Totals!)

c# XML or alternative

I am developing a program to log data from an incoming serial communication. I have to invoke the serial box by sending a command in order to receive something. All this works fine, but I have a problem.
The program has to run on a netbook (approx. 1.5 GHz, 2 GB RAM), and it can't keep up when I ask it to save this information to an XML file.
I am only receiving data every 5 seconds, and I am not reading the file anywhere else.
I use xml.Save(string filename) to save the file.
Is there another, better way to save the information to my XML file, or should I use an alternative?
If I should use an alternative, which should it be?
Edit:
Added some code:
XmlDocument xml = new XmlDocument();
xml.Load(logFile);
XmlNode p = xml.GetElementsByTagName("records")[0];
for (int i = 0; i < newDat.Length; i++)
{
    XmlNode q = xml.CreateElement("record");
    XmlNode a = xml.CreateElement("time");
    XmlNode b = xml.CreateElement("temp");
    XmlNode c = xml.CreateElement("addr");
    a.AppendChild(xml.CreateTextNode(outDat[i, 0]));
    b.AppendChild(xml.CreateTextNode(outDat[i, 1]));
    c.AppendChild(xml.CreateTextNode(outDat[i, 2]));
    sendTime = outDat[i, 0];
    points.Add(outDat[i, 2], outDat[i, 1]);
    q.AppendChild(a);
    q.AppendChild(b);
    q.AppendChild(c);
    p.AppendChild(q);
}
xml.AppendChild(p);
xml.Save(this.logFile);
This is the XML-related code, running once every 5 seconds. I am loading the file (I get no error), adding some child nodes, and then saving it again. It is when I save that I get the error.
You may want to look at using an XmlWriter and building the XML file by hand. That would allow you to open a file and keep it open for the duration of the logging, appending one XML fragment at a time as you read in data. The XmlWriter class is optimized for forward-only writing to a stream.
The above approach should be much faster compared to using the Save method to serialize (save) a full XML document each time you read data, when you really only want to append a new fragment at the end.
EDIT
Based on the code sample you posted, it's the Load and Save that are causing the unnecessary performance bottleneck. Every time you add a log entry you are essentially loading the full XML document and, behind the scenes, parsing it into a full-blown XML tree. Then you modify the tree (by adding nodes) and serialize it all to disk again. This is very, very counterproductive.
My proposed solution is really the way to go: create and open the log file only once; then use an XmlWriter to write out the XML elements one by one, each time you read new data. This way you're not holding the full contents of the XML log in memory, and you're only appending small chunks of data at the end of a file - which should be unnoticeable in terms of overhead. At the end, simply close the root XML tag, close the XmlWriter, and close the file. That's it! This is guaranteed not to slow down your UI even if you implement it synchronously, on the UI thread.
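A rough sketch of that approach (a sketch only; the class and field names are invented, not the poster's code):
using System.Xml;

class XmlLogWriter
{
    private readonly XmlWriter _writer;

    public XmlLogWriter(string logFile)
    {
        _writer = XmlWriter.Create(logFile, new XmlWriterSettings { Indent = true });
        _writer.WriteStartElement("records"); // the root element stays open for the whole session
    }

    // Called every time a reading arrives (every 5 seconds).
    public void WriteRecord(string time, string temp, string addr)
    {
        _writer.WriteStartElement("record");
        _writer.WriteElementString("time", time);
        _writer.WriteElementString("temp", temp);
        _writer.WriteElementString("addr", addr);
        _writer.WriteEndElement();
        _writer.Flush(); // push the fragment to disk without closing the document
    }

    // Called once, when logging stops.
    public void Close()
    {
        _writer.WriteEndElement(); // </records>
        _writer.Close();
    }
}
The cost per reading is then just a few small writes at the end of the file, instead of a full parse and re-serialization.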
While not a direct answer to your question, it sounds like you're doing everything in a very linear way:
Receive command
Modify in memory XML
Save in memory XML to disk
GoTo 1
I would suggest you look into using some threading, or possibly Tasks, to make this more asynchronous. This would certainly be more difficult, and you would have to wrestle with task synchronization, but in the long run it's going to perform a lot better.
I would look at having a thread (possibly the main thread, not sure if you're using WinForms, a console app or what) that receives the command, and posts the "changes" to a holding class. Then have a second thread, which periodically polls this holding class and checks it for a "Dirty" state. When it detects this state, it grabs a copy of the XML and saves it to disk.
This allows your serial communication to continue uninterrupted, regardless of how poorly the hardware you're running on performs.
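A very rough sketch of that shape (every name here is invented, the "dirty" check is modeled as a non-empty batch, and Task.Run assumes .NET 4.5 or later):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class LogBuffer
{
    private readonly object _gate = new object();
    private readonly List<string[]> _pending = new List<string[]>();

    // Called from the serial/receive thread.
    public void Add(string[] reading)
    {
        lock (_gate) { _pending.Add(reading); }
    }

    // Called from the saver thread: returns and clears whatever has accumulated.
    public List<string[]> TakePending()
    {
        lock (_gate)
        {
            var batch = new List<string[]>(_pending);
            _pending.Clear();
            return batch;
        }
    }
}

// At startup (for example in the form constructor):
var buffer = new LogBuffer();
Task.Run(async () =>
{
    while (true)
    {
        var batch = buffer.TakePending();
        if (batch.Count > 0)
            SaveToDisk(batch); // hypothetical method that appends the batch to the log file
        await Task.Delay(TimeSpan.FromSeconds(5));
    }
});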
Normally for log files one picks an append-friendly format; otherwise you have to re-parse the whole file every time you need to append a new record and save the result. Plain-text CSV is likely the simplest option.
One other option, if you need an XML-like file, is to store a list of XML fragments instead of a full XML document. This way you can still use the XML API (XmlReader can read fragments when you specify ConformanceLevel.Fragment in the XmlReaderSettings passed to XmlReader.Create), but you don't need to re-read the whole document to append a new entry - a simple file-level append is enough. WCF logs are written this way, for example.
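For illustration, a minimal sketch of reading such a fragment-only log back (the file name and element name are assumptions; appending a new entry is just a plain file-level append of one more <record> element):
using System.Xml;

var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create("serial.log.xml", settings)) // placeholder file name
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
        {
            // Handle one logged record here.
        }
    }
}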
The answer from #Miky Dinescu is one technique for doing this if your output must be an XML-formatted file. The reason is that you are currently asking it to completely load and re-parse the entire XML file every single time you add another entry. Loading and parsing the XML file becomes more and more IO-, memory-, and CPU-intensive the bigger the file gets, so it doesn't take long before that overhead overwhelms any hardware that must run within a very limited time frame. Otherwise you need to re-think your whole process: you could simply buffer all the data into an in-memory buffer which you write out (flush) at a much more leisurely pace.
I made this work, though I do not believe it is the "best practice" method.
I have another class where I keep my XmlDocument in memory at all times, and I try to save every time data is added. If a save fails, it simply waits and saves the next time.
I would suggest others look at Miky Dinescu's suggestion. I just felt that I was in too deep to change how I save data.

File.ReadAllText leading to memory leak [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
TextBox.Text Leaking Memory in WPF Application
I've got an application trailing a logfile. Every time the logfile updates (which is usually a series of updates in a row) the memory use balloons out of control.
I've tracked down the problem to this call:
if (File.Exists(Path + "\\logfile.txt"))
Data = File.ReadAllText(Path + "\\logfile.txt");
This is being called from within LoadAllData, here.
private void FileChangeNotificationHandler(object source, FileSystemEventArgs e)
{
this.Dispatcher.BeginInvoke
(new Action(delegate()
{
Logfile.GetPath();
Logfile.LoadAllData();
LogText.Clear();
LogText.Text = Logfile.Data;
if (CheckFollowTail.IsChecked == true) LogText.ScrollToEnd();
}));
}
Does anyone have insight on why this occurring? I assume it's related to the delegate or the handler.
It's probably just down to the amount and frequency with which you are loading log file data into memory.
GC takes time, so if you are repeating this in quick succession, then chances are you'll have several files' worth of data in memory until the next GC. This seems very inefficient. You should consider using a stream-based reader to avoid keeping all the data in memory. If you do use a stream reader, make sure you dispose of it afterwards to avoid introducing another leak.
The other thing to check is that you're not subscribing to a static event somewhere and thereby preventing your object tree from being collected. Is it a web app?
First of all, checking if the file exists is wrong. This is because the file system is volatile and because there is more than just existence at play (permissions, for example). The correct way to do this is to just open the file, and then handle the exception if it fails.
Now, on to your stated problem. What I suspect is happening is that the log is growing large enough to use the Large Object Heap (85000 bytes is all that's needed, iirc, and remember that .Net uses utf16 (2-byte) characters). A 43K ascii log file is all you'll need to start causing problems, because at that size your .Net string is no longer garbage collected in the normal way. Every time you read the file you end up adding another instance of the entire log file to memory.
To best recommend how to get around this, it will be helpful to know what kind of component you use for your LogText variable. But pending that information, I can at least suggest a few pointers:
Ideally, you would just keep the file open (using FileShare.ReadWrite) and read from the stream every time you get a change notification; see the sketch after these pointers. But that's not always possible.
If you have to re-open the file each time, at least read the text line by line (using a StreamReader) rather than pulling it all at once using File.ReadAllLines(). This will help you keep your log file broken up into smaller pieces that won't end up on the large object heap.
Unfortunately, I suspect that in the end you're stuck building one big string to assign to a plain textbox. If this is the case, I strongly recommend that you either only ever build and show the last part of the log (less than 85000 bytes worth) or that you search for a Large Object Heap-safe Textbox component to use.
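As a sketch of the first pointer above (keeping the file open and reading only what was appended), with all names assumed rather than taken from your code:
using System.IO;

class LogTailReader
{
    private readonly FileStream _stream;
    private readonly StreamReader _reader;
    private long _lastPosition;

    public LogTailReader(string logPath)
    {
        // FileShare.ReadWrite lets the other process keep writing while we read.
        _stream = new FileStream(logPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        _reader = new StreamReader(_stream);
    }

    // Called from the change-notification handler: returns only the newly appended text.
    public string ReadNewText()
    {
        _stream.Seek(_lastPosition, SeekOrigin.Begin);
        _reader.DiscardBufferedData();
        string appended = _reader.ReadToEnd();
        _lastPosition = _stream.Position;
        return appended;
    }
}
Only the newly appended text is turned into a string on each notification, so repeated updates no longer re-load the whole log.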

Methodology for saving values over time

I have a task, which I know how to code (in C#),
but I know a simple implementation will not meet ALL my needs.
So, I am looking for tricks which might meet ALL my needs.
1. I am writing a simulation involving N entities interacting over time. N will start at around 30 and move into the many thousands.
a. The number of entities will change during the course of the simulation.
b. I expect this will require each entity to have its own trace file.
2. Each entity has a minimum of 20 parameters, and up to millions, that I want to track over time.
a. This will most likely mean that we can't keep all values in memory at all times. Some subset should be fine.
b. The number of parameters per entity will initially be fixed, but I can think of some tests which would have the number of parameters slowly changing over time.
3. The simulation will last for millions of time steps, and I need to keep every value for every parameter.
What I will be using these traces for:
a. Plotting a (configurable) subset of the parameters for a fixed amount of time, from the current time step back into the past.
i. Normally on the order of 300 time steps.
ii. These plots are updated in real time while the simulation is running.
b. I will be using these traces to re-play the simulation, so I need to quickly access all the parameters at a given time step so I can quickly move to different times in the simulation.
i. This requires that the values be stored in a file (or files) which can be inspected/loaded after restarting the software.
ii. Using a database is NOT an option.
c. I will be using the parameters for follow-up analysis which I can't define up front, so a more flexible system is desirable.
My initial thought:
One class per entity which holds all the parameters.
Backed by a memory mapped file.
Only a fixed, but moving, amount of the file is mapped to main memory
A second memory mapped file which holds time indexes to main file for quicker access during re-playing of simulation. This may be very important because each entity file will represent a different time slice of the full simulation.
I would start with SQLite. SQLite is like a binary file format library that you can query conveniently and quickly. It is not really like a database, in the sense that you can run it on any machine, with no installation whatsoever.
I strongly recommend against XML, given the requirement of millions of steps, potentially with millions of parameters.
EDIT: Given the sheer amount of data involved, SQLite may well end up being too slow for you. Don't get me wrong, SQLite is very fast, but it won't beat seeks & reads, and it looks like your use case is such that basic binary IO is rather appropriate.
If you go with the binary IO method you should expect some moderately involved coding, and the absence of such niceties as your file staying in a consistent state if the application dies halfway through (unless you code this specifically that is).
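If the SQLite route is explored, a minimal sketch (assuming the Microsoft.Data.Sqlite package; the table and column names are invented) could look like this:
using Microsoft.Data.Sqlite;

using (var conn = new SqliteConnection("Data Source=simulation.db"))
{
    conn.Open();

    var create = conn.CreateCommand();
    create.CommandText =
        "CREATE TABLE IF NOT EXISTS trace (entity INTEGER, step INTEGER, param TEXT, value REAL)";
    create.ExecuteNonQuery();

    // One row per parameter value; wrap each batch of inserts in a transaction for speed.
    using (var tx = conn.BeginTransaction())
    {
        var insert = conn.CreateCommand();
        insert.Transaction = tx;
        insert.CommandText =
            "INSERT INTO trace (entity, step, param, value) VALUES ($entity, $step, $param, $value)";
        insert.Parameters.AddWithValue("$entity", 1);
        insert.Parameters.AddWithValue("$step", 42);
        insert.Parameters.AddWithValue("$param", "velocity");
        insert.Parameters.AddWithValue("$value", 3.14);
        insert.ExecuteNonQuery();
        tx.Commit();
    }
}
Batching many inserts inside one transaction is what keeps SQLite fast enough for per-step logging.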
KISS -- just write a logfile for each entity, and at each time slice write out every parameter in a specified order (so you don't double the size of the logfile by adding parameter names). You can have a header in each logfile if you want to specify the parameter names of each column and the identity of the entity.
If there are many parameter values that will remain fixed or slowly changing during the course of the simulation, you can write these out to another file that encodes only changes to parameter values rather than every value at each time slice.
You should probably synchronize the logging so that each log entry is written out with the same time value. Rather than coordinate through a central file, just make the first value in each line of the file the time value.
Forget about a database - too slow and too much overhead for simulation-replay purposes. For replaying a simulation, you simply need sequential access to each time slice, which is most efficiently and quickly implemented by reading the lines of the files one by one.
For the same reason - speed and space efficiency - forget XML.
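A sketch of that format (the method and parameter names are invented for illustration): one file per entity, one line per time slice, with the time value first and the parameter values in a fixed, documented order.
using System.IO;

// One call per entity per time slice; appends a single comma-separated line to that entity's trace file.
static void AppendTimeSlice(int entityId, long timeStep, double[] parameterValues)
{
    using (var log = new StreamWriter("entity_" + entityId + ".log", append: true))
    {
        // Time value first, then the parameter values in a fixed order.
        log.WriteLine(timeStep + "," + string.Join(",", parameterValues));
    }
}
Replay is then a straight sequential read of each file, and plotting the last ~300 steps only needs the tail of each file.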
Just for the memory part:
1. You can save the data as an XElement (sorry for not knowing much about LINQ), which holds the XML logic.
2. Hold a record counter;
after n records, save the XElement to an XML file (data1.xml, ... dataN.xml).
It can be a perfect log for any parameter you have, with any logic you like:
<run>
  <step id="1">
    <param1 />
    <param2 />
    <param3 />
  </step>
  .
  .
  .
  <step id="N">
    <param1 />
    <param2 />
    <param3 />
  </step>
</run>
This way your memory stays free and the data remains easy to get at.
You don't have to think too much about DB issues, and it's pretty amazing what LINQ can do for you... just open the correct XML log file...
Here is what I am doing now:
int bw = 0;

private void timer1_Tick(object sender, EventArgs e)
{
    bw = Convert.ToInt32(lblBytesReceived.Text) - bw;

    // SQL Server parameter names must start with '@'; ideally the bandwidth value would be a parameter as well.
    SqlCommand comnd = new SqlCommand(
        "insert into tablee (bandwidthh, timee) values (" + bw.ToString() + ", @timee)", conn);
    conn.Open();
    comnd.Parameters.Add("@timee", System.Data.SqlDbType.Time).Value = DateTime.Now.TimeOfDay;
    comnd.ExecuteNonQuery();
    conn.Close();
}
