I have a method which uses a BinaryWriter to write a record consisting of a few uints and a byte array to a file. This method executes about a dozen times a second as part of my program. The code is below:
iLogFileMutex.WaitOne();
using (BinaryWriter iBinaryWriter = new BinaryWriter(File.Open(iMainLogFilename, FileMode.OpenOrCreate, FileAccess.Write)))
{
    iBinaryWriter.Seek(0, SeekOrigin.End);
    foreach (ViewerRecord vR in aViewerRecords)
    {
        iBinaryWriter.Write(vR.Id);
        iBinaryWriter.Write(vR.Timestamp);
        iBinaryWriter.Write(vR.PayloadLength);
        iBinaryWriter.Write(vR.Payload);
    }
}
iLogFileMutex.ReleaseMutex();
The above code works fine, but if I remove the line with the Seek call, the resulting binary file is corrupted: certain records are completely missing, or parts of them are simply not present, although the vast majority of records are written just fine. So I imagine the cause of the bug is that when I repeatedly open and close the file, the current position isn't always at the end and things get overwritten.
So my question is: Why isn't C# ensuring that the current position is at the end when I open the file?
PS: I have ruled out threading issues from causing this bug
If you want to append to the file, you must use FileMode.Append in your Open call; otherwise the file will open with its position set to the start of the file, not the end.
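For example, the write loop from the question could open the file in append mode instead (a minimal sketch reusing the question's variable names; no Seek call is needed, because an append-mode stream always writes at the end):

iLogFileMutex.WaitOne();
using (BinaryWriter iBinaryWriter = new BinaryWriter(File.Open(iMainLogFilename, FileMode.Append, FileAccess.Write)))
{
    foreach (ViewerRecord vR in aViewerRecords)
    {
        iBinaryWriter.Write(vR.Id);
        iBinaryWriter.Write(vR.Timestamp);
        iBinaryWriter.Write(vR.PayloadLength);
        iBinaryWriter.Write(vR.Payload);
    }
}
iLogFileMutex.ReleaseMutex();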
The problem is a combination of FileMode.OpenOrCreate and the type of the ViewerRecord members. One or more of them isn't of a fixed size type, probably a string.
Things go wrong when the file already exists. You'll start writing data at the start of the file, overwriting existing data. What you write will only line up with an existing record by chance; the string would have to be exactly the same size. If you don't write enough records, you won't overwrite all of the old ones, and you'll get into trouble when you read the file: after the last record you wrote, you'll read part of an old record and get junk for a while.
Making the record a fixed size doesn't really solve the problem: you'll read a well-formed record, but it will be an old one. Which particular set of old records you get depends on how much new data you wrote. That is just as bad as reading garbled data.
If you really do need to preserve the old records then you should append to the file, FileMode.Append. If you don't then you should rewrite the file, FileMode.Create.
Related
I did check to see if any existing questions matched mine, but I didn't see any; if I missed one, my mistake.
I have two text files to compare against each other. One is a temporary log file that is overwritten sometimes, and the other is a permanent log, which collects and appends the contents of the temp log into one file (it gathers the lines added to the temp log since it last checked and appends them to the end of the complete log). However, after a point this may lead to the complete log becoming quite large and therefore inefficient to compare against, so I have been thinking about different ways to approach this.
My first idea is to "buffer" the temp log lines (it will normally be the smaller of the two files) into a list and simply loop through the archive log, doing something like:
List<string> bufferedlines = new List<string>();
using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    if (bufferedlines.Contains(ArchiveStream.ReadLine()))
    {
    }
}
There are a couple of ways I could proceed from here. I could create yet another list to store the inconsistencies, close the read stream (I'm not sure you can both read and write at the same time; if you can, that might make things easier), then open a write stream in append mode and write the list to the file. Alternatively, cutting out the buffering of the inconsistencies, I could open a write stream while the files are being compared and write the unmatched lines on the spot.
The other method I could think of, though I don't know whether it can actually be done, was to compare the streams side by side as they are read, rather than buffering either file, and append the lines on the fly. Something like:
using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    using (StreamReader templogStream = new StreamReader(tempPath))
    {
        if (!(ArchiveStream.ReadAllLines.Contains(templogStream.ReadLine())))
        {
            //write the line to the file
        }
    }
}
As I said, I'm not sure whether that would work or whether it would be more efficient than the first method, so I figured I'd ask and see if anyone had insight into how this might properly be implemented, and whether it is the most efficient way or there is a better method out there.
Effectively what you want here is all of the items from one set that aren't in another set. This is set subtraction, or in LINQ terms, Except. If your data sets were sufficiently small you could simply do this:
var lines = File.ReadLines(TempPath)
    .Except(File.ReadLines(ArchivePath))
    .ToList(); // can't write to the file while reading from it
File.AppendAllLines(ArchivePath, lines);
Of course, this code requires bringing all of the lines of the archive file into memory, because that's just how Except is implemented: it builds a HashSet of the second sequence's items so that it can efficiently find matches while streaming the first sequence.
Presumably the number of lines that need to be added here is pretty small, so the fact that they all need to be stored in memory isn't a problem. If there could potentially be a lot of them, you'd want to write them out to another file besides the first one (possibly concatenating the two files together when done, if needed), as sketched below.
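A rough sketch of that fallback, assuming a placeholder path NewLinesPath for the intermediate file:

// Stream the differences straight to a scratch file instead of holding them in a list.
using (var writer = new StreamWriter(NewLinesPath))
{
    foreach (var line in File.ReadLines(TempPath).Except(File.ReadLines(ArchivePath)))
    {
        writer.WriteLine(line);
    }
}
// Concatenate afterwards, once nothing is reading ArchivePath any more.
File.AppendAllLines(ArchivePath, File.ReadLines(NewLinesPath));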
Environment: any .NET Framework version is welcome.
I have a log file that gets written to 24/7.
I am trying to create an application that will read the log file and process the data.
What's the best way to read the log file efficiently? I imagine monitoring the file with something like FileSystemWatcher. But how do I make sure I don't read the same data once it's been processed by my application? Or say the application aborts for some unknown reason, how would it pick up where it left off last?
There's usually a header and footer around the payload that's in the log file. Maybe an id field in the content as well. Not sure yet though about the id field being there.
I also imagined maybe saving the count of lines read somewhere, to use as a bookmark.
For obvious reasons, reading the whole content of the file, as well as removing lines from the log file (after loading them into your application), is out of the question.
What I can think of as a partial solution is having a small database (probably something much smaller than a full-blown MySQL/MS SQL/PostgreSQL instance) and populating a table with what has been read from the log file. I am pretty sure that even if there is a power cut and the machine is booted again, most relational databases will be able to restore their state with ease. This solution requires some data that can be used to identify a row from the log file (for example: the exact time of the action logged, the machine on which the action took place, etc.).
Well, you will have to figure out your magic for your particular case yourself. If you are going to use a well-known text encoding it may be pretty simple, though. Look toward System.IO.StreamReader and its ReadLine() and DiscardBufferedData() methods and its BaseStream property. You should be able to remember your last position in the file and rewind to that position later and start reading again, given that you are sure the file is only ever appended to. There are other things to consider, though, and there is no single universal answer to this.
Just as a naive example (you may still need to adjust a lot to make it work):
static void Main(string[] args)
{
    string filePath = @"c:\log.txt";
    using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var streamReader = new StreamReader(stream, Encoding.Unicode))
        {
            long pos = 0;
            if (File.Exists(@"c:\log.txt.lastposition"))
            {
                string strPos = File.ReadAllText(@"c:\log.txt.lastposition");
                pos = Convert.ToInt64(strPos);
            }
            streamReader.BaseStream.Seek(pos, SeekOrigin.Begin); // rewind to the last saved position
            streamReader.DiscardBufferedData(); // clear the reader's internal buffer
            for (;;)
            {
                string line = streamReader.ReadLine();
                if (line == null) break;
                ProcessLine(line);
            }
            // when everything has been read, the position is at the end of the file
            File.WriteAllText(@"c:\log.txt.lastposition", streamReader.BaseStream.Position.ToString());
        }
    }
}
I think you will find the File.ReadLines(filename) method in conjunction with LINQ very handy for something like this. ReadAllLines() loads the entire text file into memory as a string[] array, but ReadLines allows you to begin enumerating the lines immediately as it traverses the file. This not only saves you time but keeps memory usage very low, as each line is processed one at a time. Using statements are important because if this program is interrupted they will close the file streams, flushing the writer and saving unwritten content to the file. Then when it starts up again it will skip all the lines that have already been read.
int readCount = File.ReadLines("readLogs.txt").Count();
using (FileStream readLogs = new FileStream("readLogs.txt", FileMode.Append))
using (StreamWriter writer = new StreamWriter(readLogs))
{
    IEnumerable<string> lines = File.ReadLines("bigLogFile.txt").Skip(readCount);
    foreach (string line in lines)
    {
        // do something with the line, or batch them if you need more than one
        writer.WriteLine(line);
    }
}
As MaciekTalaska mentioned, I would strongly recommend using a database if this is something written to 24/7 and will get quite large. File systems are simply not equipped to handle such volume and you will spend a lot of time trying to invent solutions where a database could do it in a breeze.
Is there a reason why it logs to a file? Files are great because they are simple to use and, being the lowest common denominator, there is relatively little that can go wrong. However, files are limited. As you say, there's no guarantee a write to the file will be complete when you read the file. Multiple applications writing to the log can interfere with each other. There is no easy sorting or filtering mechanism. Log files can grow very big very quickly and there's no easy way to move old events (say those more than 24 hours old) into separate files for backup and retention.
Instead, I would consider writing the logs to a database. The table structure can be very simple, but you get the advantage of transactions (so you can extract or back up with ease) and the ability to search, sort and filter using an almost universally understood syntax. If you are worried about load spikes, use a message queue, like http://msdn.microsoft.com/en-us/library/ms190495.aspx for SQL Server.
To make the transition easier, consider using a logging framework like log4net. It abstracts much of this away from your code.
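As a rough illustration of how little the calling code has to know (assuming log4net is configured separately, e.g. with an ADO.NET appender pointing at the database; the class name here is made up):

using log4net;
using log4net.Config;

public class Worker
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(Worker));

    public void DoWork()
    {
        // Which appender handles this (file, database table, event log, ...) is
        // decided entirely by the log4net configuration, not by this code.
        Log.Info("Processing started");
    }
}

// At application startup, point log4net at its configuration once:
// XmlConfigurator.Configure(new FileInfo("log4net.config"));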
Another alternative is to use a system like syslog or, if you have multiple servers and a large volume of logs, flume. By moving the log files away from the source computer, you can store them or inspect them on a different machine far more effectively. However, these are probably overkill for your current problem.
I am developing a program to log data from an incoming serial communication. I have to invoke the serial box by sending a command in order to receive something. All of this works fine, but I have a problem.
The program has to run on a netbook (approx. 1.5 GHz, 2 GB RAM), and it can't keep up when I ask it to save this information to an XML file.
I am only getting communication every 5 seconds, and I am not reading the file anywhere else.
I use xml.Save(string filename) to save the file.
Is there another, better way to save the information to my XML, or should I use an alternative?
If I should use an alternative, which should it be?
Edit:
Added some code:
XmlDocument xml = new XmlDocument();
xml.Load(logFile);
XmlNode p = xml.GetElementsByTagName("records")[0];
for (int i = 0; i < newDat.Length; i++)
{
    XmlNode q = xml.CreateElement("record");
    XmlNode a = xml.CreateElement("time");
    XmlNode b = xml.CreateElement("temp");
    XmlNode c = xml.CreateElement("addr");
    a.AppendChild(xml.CreateTextNode(outDat[i, 0]));
    b.AppendChild(xml.CreateTextNode(outDat[i, 1]));
    c.AppendChild(xml.CreateTextNode(outDat[i, 2]));
    sendTime = outDat[i, 0];
    points.Add(outDat[i, 2], outDat[i, 1]);
    q.AppendChild(a);
    q.AppendChild(b);
    q.AppendChild(c);
    p.AppendChild(q);
}
xml.AppendChild(p);
xml.Save(this.logFile);
This is the XML-related code, running once every 5 seconds. I am reading the file (I get no error), adding some child nodes, and then saving it again. It is when I save that I get the error.
You may want to look at using an XmlWriter and building the XML file by hand. That would allow you to open a file and keep it open for the duration of the logging, appending one XML fragment at a time as you read in data. The XmlWriter class is optimized for forward-only writing to an XML stream.
The above approach should be much faster when compared to using the Save method to serialize (save) a full XML document each time you read data and when you really only want to append a new fragment at the end.
EDIT
Based on the code sample you posted, it's the Load and Save that are causing the unnecessary performance bottleneck. Every time you add a log entry you're essentially loading the full XML document and, behind the scenes, parsing it into a full-blown XML tree. Then you modify the tree (by adding nodes) and serialize it all back to disk again. This is very counterproductive.
My proposed solution is really the way to go: create and open the log file only once; then use an XmlWriter to write out the XML elements one by one, each time you read new data. This way you're not holding the full contents of the XML log in memory, and you're only appending small chunks of data at the end of a file, which should be unnoticeable in terms of overhead. At the end, simply close the root XML tag, close the XmlWriter and close the file. That's it! This is guaranteed not to slow down your UI even if you implement it synchronously, on the UI thread.
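A rough sketch of that shape, reusing the element names from the question (the wrapper class and its members are made up for illustration):

using System.Xml;

class XmlLogWriter
{
    private readonly XmlWriter _writer;

    public XmlLogWriter(string logFile)
    {
        // Open the writer once and keep it for the lifetime of the logger.
        _writer = XmlWriter.Create(logFile, new XmlWriterSettings { Indent = true });
        _writer.WriteStartElement("records"); // root element stays open
    }

    // Called every 5 seconds when new data arrives.
    public void Append(string time, string temp, string addr)
    {
        _writer.WriteStartElement("record");
        _writer.WriteElementString("time", time);
        _writer.WriteElementString("temp", temp);
        _writer.WriteElementString("addr", addr);
        _writer.WriteEndElement();
        _writer.Flush(); // push the new fragment to disk
    }

    public void Close()
    {
        _writer.WriteEndElement(); // close <records>
        _writer.Close();
    }
}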
While not a direct answer to your question, it sounds like you're doing everything in a very linear way:
Receive command
Modify in memory XML
Save in memory XML to disk
GoTo 1
I would suggest you look into using some threading, or possibly Tasks, to make this more asynchronous. This would certainly be more difficult, and you would have to wrestle with task synchronization, but in the long run it's going to perform a lot better.
I would look at having a thread (possibly the main thread, not sure if you're using WinForms, a console app or what) that receives the command, and posts the "changes" to a holding class. Then have a second thread, which periodically polls this holding class and checks it for a "Dirty" state. When it detects this state, it grabs a copy of the XML and saves it to disk.
This allows your serial communication to continue uninterrupted, regardless of how poorly the hardware you're running on performs.
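A minimal sketch of one way to shape that holding class (this variant collects pending records rather than a "Dirty" flag; all names are made up for illustration):

using System.Collections.Generic;

class LogBuffer
{
    private readonly object _lock = new object();
    private readonly List<string[]> _pending = new List<string[]>();

    // Called by the serial/receive thread; returns immediately.
    public void Add(string[] record)
    {
        lock (_lock) { _pending.Add(record); }
    }

    // Called periodically by the writer thread; empties the buffer.
    public List<string[]> TakeAll()
    {
        lock (_lock)
        {
            var copy = new List<string[]>(_pending);
            _pending.Clear();
            return copy;
        }
    }
}

// Writer thread, polling for work:
// var batch = buffer.TakeAll();
// if (batch.Count > 0) WriteBatchToDisk(batch);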
Normally for log files one picks an append-friendly format; otherwise you have to re-parse the whole file every time you need to append a new record and save the result. Plain-text CSV is likely the simplest option.
Another option, if you need an XML-like file, is to store a list of XML fragments instead of a full XML document. This way you can still use the XML APIs (XmlReader can read fragments when you specify ConformanceLevel.Fragment in the XmlReaderSettings passed to XmlReader.Create), but you don't need to re-read the whole document to append a new entry: a simple file-level append is enough. WCF logs, for example, are written this way.
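A sketch of the fragment approach, with placeholder element names borrowed from the question:

using System.IO;
using System.Xml;

// Appending: each entry is written as a standalone fragment at the end of the file.
static void AppendFragment(string path, string time, string temp, string addr)
{
    var settings = new XmlWriterSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment,
        OmitXmlDeclaration = true
    };
    using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
    using (var writer = XmlWriter.Create(stream, settings))
    {
        writer.WriteStartElement("record");
        writer.WriteElementString("time", time);
        writer.WriteElementString("temp", temp);
        writer.WriteElementString("addr", addr);
        writer.WriteEndElement();
    }
}

// Reading back later: a fragment-tolerant reader can walk the whole file.
static void ReadFragments(string path)
{
    var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using (var reader = XmlReader.Create(path, settings))
    {
        while (reader.Read())
        {
            // inspect reader.NodeType / reader.Name here
        }
    }
}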
The answer from #Miky Dinescu is one technique for doing this if your output must be an XML formatted file. The reason why is that you are asking it to completed load and reparse the entire XML file every single time you add another entry. Loading and parsing the XML file becomes more and more IO, memory, and CPU intensive the bigger the file gets. So it doesn't take long before the amount of overhead that has will overwhelm any hardware when it must run within a very limited time frame. Otherwise you need to re-think your whole process and could simply buffer all the data into an in memory buffer which you could write out (flush) at a much more leisurely pace.
I made this work, though I do not believe it is the "best practice" method.
I have another class where I keep my XmlDocument in memory at all times, and then try to save it every time data is added. If a save fails, it simply waits and saves the next time.
I would suggest that others look at Miky Dinescu's suggestion. I just felt that I was in too deep to change how I save data.
I'm trying to write a simple .txt via StreamWriter. I want it to look like this:
12
26
100
Just simple numbers. But how do I tell the reader/writer which line to write to or read from?
http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx
Here it says that ReadLine() reads a line of the current stream. But how do I know which line it is? Or is it always the first one?
I want to read the numbers, modify them and then write them back.
Any suggestions?
Thanks in advance!
A reader is conceptually a unidirectional thing, from the start (or more accurately, the current position in the stream) to the end.
Each time you read a line, it is simply buffering data until it finds a new line; it doesn't really have the concept of a current line (or moving between lines).
As long as the file isn't massive, you should be OK reading the entire file, working on a string (or string array), then saving the entire file; inserting/removing text content is otherwise non-trivial (especially when you consider the mysteries of encodings).
File.ReadAllLines and File.WriteAllLines may be easier in your scenario.
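For example, a minimal sketch (the file name and the edit made to the second line are just placeholders):

// Read every line into memory, change the one you care about, then write them all back.
string[] lines = File.ReadAllLines("numbers.txt");
lines[1] = (int.Parse(lines[1]) + 1).ToString(); // e.g. turn 26 into 27
File.WriteAllLines("numbers.txt", lines);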
I think the only way is to read all the lines into an array (or any other data structure, e.g. a list), modify them, and write them back to the file.
Maybe XML would be better for your purposes?
I'm using BinaryWriter to write records to a file. The records consist of a class with the following property data types:
Int32,
Int16,
Byte[],
Null Character
To write each record, I call BinaryWriter.Write four times: once for each data type. This works fine, but I'd like to know if there's any way to call the BinaryWriter.Write() method a single time using all of these data types. The reason is that another program reads my binary file and will occasionally read only part of a record, because it starts reading between my write calls. Unfortunately, I don't have control over the code of the other program, or I would modify the way it reads.
Add a .ToBinary() method to your class that returns byte[].
public byte[] ToBinary()
{
    byte[] result = new byte[number_of_bytes_you_need];
    // fill result with the record's bytes
    return result;
}
In your calling code (approximate as I haven't compiled this)
binaryWriter.Write(myObj.ToBinary());
You're still writing each value independently, but it cleans up the code a little.
Also, as sindre suggested, consider using serialization, as it makes it incredibly easy to recreate your objects from the file in question, and requires less effort than writing the file the way you're attempting to.
Sync: you can't depend on any of these solutions to fix your file-sync issue. Even if you manage to reduce your write calls to a single statement, whether via serialization or the .ToBinary() method I've outlined, the bytes are still written sequentially by the framework. This is a limitation of the physical structure of the disk. If you have control over the file format, add a record-length field written before any of the record data, as sketched below. In the app that's reading the file, make sure you have record_length bytes available before attempting to process the next record. While you're at it, put this in a database. If you don't have control over the file format, you're kind of out of luck.
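A hedged sketch of that record-length idea, using the field types from the question (the method and parameter names are made up):

// Build the whole record in memory first, then write its length followed by its bytes.
static void WriteRecord(BinaryWriter writer, int id, short flags, byte[] payload)
{
    byte[] record;
    using (var ms = new MemoryStream())
    using (var bw = new BinaryWriter(ms))
    {
        bw.Write(id);      // Int32
        bw.Write(flags);   // Int16
        bw.Write(payload); // Byte[]
        bw.Write((byte)0); // trailing null character
        record = ms.ToArray();
    }
    writer.Write(record.Length); // record-length field comes first
    writer.Write(record);        // then the record itself
}
// The reader should read the 4-byte length and wait until that many bytes are
// available before trying to parse the next record.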
In keeping with you writing to a BinaryWriter, I would have the object create a binary record using a second BinaryWriter that is then written to the BinaryWriter. So on your class you could have a method like this:
public void WriteTo(BinaryWriter writer)
{
    MemoryStream ms = new MemoryStream();
    BinaryWriter bw = new BinaryWriter(ms);
    bw.Write(value1);
    bw.Write(value2);
    bw.Write(value3);
    bw.Write(value4);
    writer.Write(ms.ToArray());
}
This would create a single record with the same format as you're already writing to the main BinaryWriter, just it would build it all at once then write it as a byte array.
Create a class for the record and use the BinaryFormatter:
FileStream fs = new FileStream("file.dat", FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(fs, <insert instance of a class here>);
fs.Close();
I haven't done this myself, so I'm not absolutely sure it would work. The class must be marked [Serializable], and it cannot contain any other data, that's for sure. If you have no luck with a class you could try a struct.
Edit:
Just came up with another possible solution: create a struct for your data and copy its raw bytes into a byte array. Note that Buffer.BlockCopy only accepts arrays of primitive types, so for a user-defined struct you have to go through the marshaller instead (and the struct must have a fixed, blittable layout, so a variable-length byte[] member would need special handling):
int size = Marshal.SizeOf(typeof(structure));
byte[] writeBuffer = new byte[size];
structure record = new structure();
record.item1 = 0213; // initialize all the members
// Marshal the struct into unmanaged memory, then copy its raw bytes into the array.
IntPtr buffer = Marshal.AllocHGlobal(size);
Marshal.StructureToPtr(record, buffer, false);
Marshal.Copy(buffer, writeBuffer, 0, size);
Marshal.FreeHGlobal(buffer);
Now you can write writeBuffer to the file in one go.
Second edit:
I don't agree that the sync problems are impossible to solve. First of all, data is written to the file in entire sectors, not one byte at a time, and the file is not really updated until you flush it, which writes the data and updates the file length. The best and safest thing to do is to open the file exclusively, write a record (or several), and close the file. That requires the reading application to read the file in a similar manner (open exclusively, read, close), as well as handling "access denied" errors gracefully.
Anyhow, I'm quite sure this will perform better no matter what when you're writing an entire record at a time.
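A sketch of that open-exclusively/write/close pattern, with a made-up retry policy for when the reader currently has the file open (uses System.IO and System.Threading):

static void AppendRecordExclusively(string path, byte[] record)
{
    for (int attempt = 0; attempt < 5; attempt++)
    {
        try
        {
            // FileShare.None: nobody else may have the file open while we write;
            // the open itself throws if the reader already holds it.
            using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
            using (var bw = new BinaryWriter(fs))
            {
                bw.Write(record);
            }
            return;
        }
        catch (IOException)
        {
            Thread.Sleep(50); // back off briefly and retry
        }
    }
    throw new IOException("Could not get exclusive access to " + path);
}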