C# Weird behavior after error handling in StreamWriter "using" block

I have C# code that collects data from csv files and writes it out in a certain way to another csv file. The user has an option to sort or not sort (sorting does some other stuff too), based on the state of a checkbox called "sortBox." Most of the work happens inside a using block for a StreamWriter. Within the block, there is a step where it writes to a dictionary.
There is a corner case where sorting makes the dictionary think there is a duplicate value, and an error occurs. The corner case is not worth fixing, and since not sorting is not a big deal, I am trying to offer an option when the dictionary error occurs: the program unchecks sortBox, goes back to the start of the method, and writes out the unsorted data. Note that the output file has one of two names, depending on whether the data are sorted or not. I am fine with having both files, the sorted one never getting written to due to the error, and the unsorted one getting written to.
This is a compact version of my lame attempt. The recursive call might be a dumb idea -- and maybe the reason for the odd behavior I will describe -- but I do not know what other approach to take.
void CollectAndStreamStats(string readFolder, string writeFolder)
{
    //<stuff>
    try
    {
        using (StreamWriter csvOut = new StreamWriter(fullPath))
        {
            //<stuff>
            if (sortBox.Checked)
            {
                //<stuff>
                try
                {
                    resultsDictionary.Add(keyString, str); //THIS IS WHAT I AM ERROR CHECKING
                }
                catch (ArgumentException e) //OH NO, CANNOT ADD TO DICTIONARY, UNCHECKING SORT CHECK BOX SHOULD SOLVE IT
                {
                    DialogResult d = MessageBox.Show("You're screwed, do you want to uncheck sort option and analyze again?", "Sucker", MessageBoxButtons.YesNo /* other options */);
                    if (d == DialogResult.Yes)
                    {
                        csvOut.Close(); //CLOSE STREAM CREATED BY STREAMWRITER
                        sortBox.Checked = false; //UNCHECK THE BOX, SHOULD WORK OK NOW. (CORNER CASE THING).
                        //RUN THIS SAME METHOD ALL OVER AGAIN, BUT WITH THAT CHECK BOX UNCHECKED.
                        //NOTE THAT csvOut WILL HAVE A DIFFERENT NAME AND STREAMING WILL BE TO A NEW FILE
                        CollectAndStreamStats(csvReadFolder, csvWriteFolder);
                    }
                }
                // **Loop for csvOut.WriteLine goes here if sortBox is checked. It happens elsewhere if it is unchecked.**
            }
        }
    }
    catch (IOException e)
    {
        MessageBox.Show("Well, I tried ::shrug::");
    }
}
Well, some of it works. It unchecks sortBox -- local variables confirm this, as does the UI -- compiles the unsorted data, and creates and writes to the second (unsorted) file. But then, despite sortBox.Checked being false, it continues inside the "if (sortBox.Checked)" block, decides that the file name is the original sorted one again, and tries to write to it, only to throw an error saying it cannot write to a closed stream.
No luck with online searches. There must be a right way, any thoughts?
Thanks much in advance,
Aram
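A note on the control flow, since it explains the symptom: when the recursive call returns, execution resumes in the original invocation, right after the catch block, so the WriteLine loop still runs against the csvOut that was just closed. A minimal sketch of one way around that, assuming the simplest change is to leave the original invocation once the rerun finishes (the return; is an addition for illustration, not part of the code above):
if (d == DialogResult.Yes)
{
    csvOut.Close();
    sortBox.Checked = false;
    CollectAndStreamStats(csvReadFolder, csvWriteFolder); // rerun with sorting off
    return; // illustration only: stop this invocation so the loop below never writes to the closed csvOut
}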

Related

Efficient Methods of Comparing Text Files Simultaneously

I did check to see if any existing questions matched mine, but I didn't see any; if I missed one, my mistake.
I have two text files to compare against each other: one is a temporary log file that is sometimes overwritten, and the other is a permanent log that collects and appends all of the contents of the temp log into one file (it collects any lines added to the temp log since it last checked and appends them to the end of the complete log). However, after a point the complete log may become quite large and therefore not so efficient to compare against, so I have been thinking about different ways to approach this.
My first idea is to "buffer" the temp log's lines (it will normally be the smaller of the two) into a list and simply loop through the archive log, doing something like:
List<string> bufferedlines = new List<string>(); // filled with the temp log's lines beforehand
using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    string archiveLine;
    while ((archiveLine = ArchiveStream.ReadLine()) != null)
    {
        if (bufferedlines.Contains(archiveLine))
        {
            // line exists in both files
        }
    }
}
Now there are a couple of ways I could proceed from here. I could create yet another list to store the inconsistencies, close the read stream (I'm not sure you can both read and write at the same time; if you can, that might simplify my options), then open a write stream in append mode and write the list to the file. Alternatively, cutting out the buffering of the inconsistencies, I could open a write stream while the files are being compared and write the unmatched lines on the spot.
The other method I could think of (I'm not sure whether it can even be done) is, rather than buffering either file, to compare the streams side by side as they are read and append the lines on the fly. Something like:
using (StreamReader ArchiveStream = new StreamReader(ArchivePath))
{
    using (StreamReader templogStream = new StreamReader(tempPath))
    {
        // note: StreamReader has no ReadAllLines; this is just the rough idea
        if (!ArchiveStream.ReadAllLines.Contains(templogStream.ReadLine()))
        {
            //write the line to the file
        }
    }
}
As I said, I'm not sure whether that would work, or whether it would be more efficient than the first method, so I figured I'd ask and see if anyone had insight into how this might properly be implemented, and whether it is the most efficient way or there is a better method out there.
Effectively what you want here is all of the items from one set that aren't in another set. This is set subtraction, or in LINQ terms, Except. If your data sets were sufficiently small you could simply do this:
var lines = File.ReadLines(TempPath)
    .Except(File.ReadLines(ArchivePath))
    .ToList(); // can't write to the file while reading from it
File.AppendAllLines(ArchivePath, lines);
Of course, this code requires bringing all of the lines in the archive file into memory, because that's just how Except is implemented: it creates a HashSet of the items in the second sequence so that it can efficiently find matches from the first.
Presumably the number of lines that need to be added is pretty small, so the fact that the lines we find all need to be stored in memory (because of the ToList) isn't a problem. If there will potentially be a lot of them, you'd want to write them out to another file besides the first one (possibly concatenating the two files together when done, if needed).
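For that larger case, here is a sketch of the spill-to-a-second-file variant (NewLinesPath is a hypothetical path for that second file, not something from the question; the archive's lines still end up in a HashSet, just as Except would do):
// Build the lookup set from the archive once, then stream the temp log,
// writing unmatched lines to a separate file instead of holding them in a list.
var archived = new HashSet<string>(File.ReadLines(ArchivePath));
using (var writer = new StreamWriter(NewLinesPath))
{
    foreach (var line in File.ReadLines(TempPath))
    {
        if (!archived.Contains(line))
            writer.WriteLine(line);
    }
}
// If needed, fold the spill file back into the archive afterwards.
File.AppendAllLines(ArchivePath, File.ReadLines(NewLinesPath));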

appending and reading text file

Environment: Any .Net Framework welcomed.
I have a log file that gets written to 24/7.
I am trying to create an application that will read the log file and process the data.
What's the best way to read the log file efficiently? I imagine monitoring the file with something like FileSystemWatcher. But how do I make sure I don't read the same data once it's been processed by my application? Or say the application aborts for some unknown reason, how would it pick up where it left off last?
There's usually a header and footer around the payload that's in the log file. Maybe an id field in the content as well. Not sure yet though about the id field being there.
I also imagined maybe saving the count of lines read somewhere, to use as a bookmark.
For obvious reasons, reading the whole content of the file, as well as removing lines from the log file (after loading them into the application), is out of the question.
What I can think of as a partial solution is having a small database (probably something much smaller than a full-blown MySQL/MS SQL/PostgreSQL instance) and populating a table with what has been read from the log file. I am pretty sure that even if the power is cut off and the machine is booted again, most relational databases will be able to restore their state with ease. This solution requires some data that can be used to identify a row from the log file (for example: the exact time of the action logged, the machine on which the action took place, etc.).
Well, you will have to figure out the magic for your particular case yourself. If you are using a well-known text encoding it may be pretty simple, though. Look toward System.IO.StreamReader and its ReadLine() and DiscardBufferedData() methods and its BaseStream property. You should be able to remember your last position in the file, rewind to that position later, and start reading again, given that you are sure the file is only ever appended to. There are other things to consider, though, and there is no single universal answer to this.
Just as a naive example (you may still need to adjust a lot to make it work):
static void Main(string[] args)
{
    string filePath = @"c:\log.txt";
    using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var streamReader = new StreamReader(stream, Encoding.Unicode))
        {
            long pos = 0;
            if (File.Exists(@"c:\log.txt.lastposition"))
            {
                string strPos = File.ReadAllText(@"c:\log.txt.lastposition");
                pos = Convert.ToInt64(strPos);
            }
            streamReader.BaseStream.Seek(pos, SeekOrigin.Begin); // rewind to the last saved position
            streamReader.DiscardBufferedData(); // clear the reader's buffer
            for (;;)
            {
                string line = streamReader.ReadLine();
                if (line == null) break;
                ProcessLine(line);
            }
            // pretty sure that when everything is read, the position is at the end of the file
            File.WriteAllText(@"c:\log.txt.lastposition", streamReader.BaseStream.Position.ToString());
        }
    }
}
I think you will find the File.ReadLines(filename) method, in conjunction with LINQ, very handy for something like this. ReadAllLines() loads the entire text file into memory as a string[] array, but ReadLines lets you begin enumerating the lines immediately as it traverses the file. This not only saves you time but keeps memory usage very low, as it processes one line at a time. The using statements are important because if the program is interrupted they close the file streams, flushing the writer and saving unwritten content to the file. Then, when it starts up again, it will skip all the lines that have already been read.
int readCount = File.ReadLines("readLogs.txt").Count();
using (FileStream readLogs = new FileStream("readLogs.txt", FileMode.Append))
using (StreamWriter writer = new StreamWriter(readLogs))
{
    IEnumerable<string> lines = File.ReadLines("bigLogFile.txt").Skip(readCount);
    foreach (string line in lines)
    {
        // do something with line, or batch them if you need more than one
        writer.WriteLine(line);
    }
}
As MaciekTalaska mentioned, I would strongly recommend using a database if this is something written to 24/7 and will get quite large. File systems are simply not equipped to handle such volume and you will spend a lot of time trying to invent solutions where a database could do it in a breeze.
Is there a reason why it logs to a file? Files are great because they are simple to use and, being the lowest common denominator, there is relatively little that can go wrong. However, files are limited. As you say, there's no guarantee a write to the file will be complete when you read the file. Multiple applications writing to the log can interfere with each other. There is no easy sorting or filtering mechanism. Log files can grow very big very quickly and there's no easy way to move old events (say those more than 24 hours old) into separate files for backup and retention.
Instead, I would consider writing the logs to a database. The table structure can be very simple, but you get the advantage of transactions (so you can extract or back up with ease) and of searching, sorting, and filtering using an almost universally understood syntax. If you are worried about load spikes, use a message queue, like http://msdn.microsoft.com/en-us/library/ms190495.aspx for SQL Server.
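As a rough illustration of how simple that table and the insert can be (a sketch only; the LogEntries table, its columns, connectionString, and message are hypothetical, not anything from the question):
// requires System.Data.SqlClient
using (var conn = new SqlConnection(connectionString)) // hypothetical connection string
using (var cmd = new SqlCommand(
    "INSERT INTO LogEntries (LoggedAt, Machine, Message) VALUES (@at, @machine, @msg)", conn))
{
    cmd.Parameters.AddWithValue("@at", DateTime.UtcNow);
    cmd.Parameters.AddWithValue("@machine", Environment.MachineName);
    cmd.Parameters.AddWithValue("@msg", message); // the log line to record
    conn.Open();
    cmd.ExecuteNonQuery();
}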
To make the transition easier, consider using a logging framework like log4net. It abstracts much of this away from your code.
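For reference, basic log4net usage is roughly this (a minimal sketch; the appender configuration itself would live in app.config, which isn't shown here):
using log4net;
using log4net.Config;

public static class LoggingExample
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(LoggingExample));

    public static void Main()
    {
        XmlConfigurator.Configure(); // reads the log4net section from app.config
        Log.Info("Application started");
        Log.Error("Something went wrong"); // the configured appenders decide whether this goes to a file, a database, etc.
    }
}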
Another alternative is to use a system like syslog or, if you have multiple servers and a large volume of logs, flume. By moving the log files away from the source computer, you can store or inspect them on a different machine far more effectively. However, these are probably overkill for your current problem.

Validate CSV file

I have a webpage that is used to submit a CSV file to the server. I have to validate the file, for stuff like correct number of columns, correct data type, cross field validations, data-range validations, etc. And finally either show a successful message or return a CSV with error messages and line numbers.
Currently every row and every column is looped through to find all the errors in the CSV file. But this becomes very slow for bigger files, sometimes resulting in a server time-out. Can someone please suggest a better way to do this?
Thanks
To validate a CSV file you will surely need to check each column. The best option, if possible in your scenario, is to validate each entry as it is appended to the CSV file.
Edit
As @accolaum pointed out an error, I have edited my code.
It will only work provided each row is delimited with a `\n`.
If you only want to validate the number of columns, then it's easier: just take the number of entries in each row modulo the expected number of columns.
bool file_isvalid;
string data = streamreader.ReadLine();
while (data != null)
{
    if (data.Split(',').Length % Num_Of_Columns == 0)
    {
        file_isvalid = true;
        //Perform operation
    }
    else
    {
        file_isvalid = false;
        //Perform operation
    }
    data = streamreader.ReadLine();
}
Hope it helps
I would suggest a rule-based approach, similar to unit tests. Think of every error that can possibly occur and order the rules by increasing abstraction level:
Correct file encoding
Correct number of lines/columns
Correct column headers
Correct number/text/date formats
Correct number ranges
Business rules?
...
These rules could also have automatic fixes. So if you could automatically detect the encoding, you could correct it before testing all the rules.
Implementation could be done using the command pattern
public abstract class RuleBase
{
    public abstract bool Test();

    public virtual bool CanCorrect()
    {
        return false;
    }
}
Then create a subclass for each test you want to make and put them in a list.
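For instance, a concrete rule and a tiny runner might look like this (a sketch following the answer's pattern; ColumnCountRule, csvLines, and the expected column count of 12 are illustrative placeholders, not part of the answer):
// requires System.Linq and System.Collections.Generic
public class ColumnCountRule : RuleBase
{
    private readonly string[] lines;
    private readonly int expectedColumns;

    public ColumnCountRule(string[] lines, int expectedColumns)
    {
        this.lines = lines;
        this.expectedColumns = expectedColumns;
    }

    public override bool Test()
    {
        // every row must have exactly the expected number of fields
        return lines.All(l => l.Split(',').Length == expectedColumns);
    }
}

// run the rules in order, from most basic to most abstract
var rules = new List<RuleBase> { new ColumnCountRule(csvLines, 12) };
bool allValid = rules.All(r => r.Test());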
The timeout can be overcome by using a background thread just for testing incoming files. The user has to wait until their file is validated and becomes "active"; when it finishes, you can forward them to the next page.
You may be able to optimize your code to perform faster, but what you really want to do is to spawn a worker thread to do the processing.
Two benefits of this:
You can redirect the user to another page so that they know their request has been submitted.
The worker thread can be given a callback so that it can report its status; if you want to, you could put a progress bar or a percentage on the 'submitted' page so that the user can see progress as their file is processed.
It is not good design to have the user waiting for long running processes to complete - they should be given updates or notifications, rather than just a 'loading' icon on their browser.
edit: This is my answer because (1) I can't recommend code improvements without seeing your code, and (2) efficiency improvements are probably only going to yield incremental improvements (unless you are doing something really wrong), which won't solve your problem long term.
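A minimal sketch of that idea using Task.Run with a progress callback (ValidateCsvRow, UpdateSubmittedPage, and csvRows are illustrative assumptions, not code from the question):
// Kick off validation in the background and report percentage progress.
var progress = new Progress<int>(percent => UpdateSubmittedPage(percent)); // hypothetical status update
Task.Run(() =>
{
    var reporter = (IProgress<int>)progress;
    for (int i = 0; i < csvRows.Count; i++)
    {
        ValidateCsvRow(csvRows[i]); // hypothetical per-row validation
        reporter.Report((i + 1) * 100 / csvRows.Count);
    }
});
In a web app the browser would typically poll for that status rather than receive a callback directly.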
Validation of CSV data usually needs to look at every single cell. Can you post some of your code? There may be ways to optimise it.
EDIT
In most cases this is the best solution:
foreach (row)
{
    foreach (column)
    {
        validate cell
    }
}
If you were really keen, you could try something with regexes:
foreach (row)
{
    validate row by regex
}
But then you are really just offloading the validation logic onto the regex, and I really hate using regexes.
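For what that would look like, here is a sketch for one hypothetical row layout (an integer id, a name, and an ISO date; row is the current CSV line, and none of this format comes from the question):
// Matches rows like "123,Jane Doe,2013-05-01"; anything else fails validation.
var rowPattern = new Regex(@"^\d+,[^,]+,\d{4}-\d{2}-\d{2}$");
bool rowIsValid = rowPattern.IsMatch(row);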
You could use XmlReader and validate against an XSD.

C# Saving "X" times into one .txt file without overwriting last string

Well, now I have a new problem.
I'm writing code in C#.
I want to save the contents of textBoxName into a group.txt file each time I enter a string into the textbox and click the save button. It should save in this order (if it's possible to sort it A-Z, that would be great):
1. Petar Milutinovic
2. Ljiljana Milutinovic
3. Stefan Milutinovic
4. ... etc
I can't get it to work; I tried to use techniques from my first question, and no solution yet :(
This is an easy one I guess, but I'm still a beginner and I need this badly...
Try to tackle this from a top-down approach. Write out what should happen, because it's not obvious from your question.
Example:
User enters a value in a (single-line?) textbox
User clicks Save
One new line is appended to the end of a file, with the contents of the textbox in step 1
Note: each line is prefixed with a line number, in the form "X. Sample" where X is the line number and Sample is the text from the textbox.
Is the above accurate?
(If you just want to add a line to a text file, see http://msdn.microsoft.com/en-us/library/ms143356.aspx - File.AppendAllText(filename, myTextBox.Text + Environment.NewLine); may be what you want)
Here's a simple little routine you can use to read, sort, and write the file. There are loads of ways this can be done; mine probably isn't even the best. Even now I'm thinking "I could have written that using a FileStream and done the iteration for counting then", but those are micro-optimizations that can be done later if you have performance issues with multi-megabyte files.
public static void AddUserToGroup(string userName)
{
    // Read the users from the file
    List<string> users = File.ReadAllLines("group.txt").ToList();

    // Strip out the index number
    users = users.Select(u => u.Substring(u.IndexOf(". ") + 2)).ToList();

    users.Add(userName); // Add the new user
    users.Sort((x, y) => x.CompareTo(y)); // Sort

    // Reapply the numbering
    for (int i = 0; i < users.Count; i++)
    {
        users[i] = (i + 1).ToString() + ". " + users[i];
    }

    // Write to the file again
    File.WriteAllLines("group.txt", users);
}
If you need the file to be sorted every time a new line is added, you'll either have to load the file into a list, add the line, and sort it, or use some sort of search (I'd recommend a binary search) to determine where the new line belongs and insert it accordingly. The second approach doesn't have many advantages, though, as you basically have to rewrite the entire file in order to insert a line - it only saves you time in the best case scenario, which occurs when the line to be inserted falls at the end of the file. Additionally, the second method is a bit lighter on the processor, as you aren't attempting to sort every line - for small files however, the difference will be unnoticeable.
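If you went the binary search route, a sketch of locating the insertion point with List&lt;string&gt;.BinarySearch (users is the already-sorted, prefix-stripped list from the routine above; newUserName stands in for the userName parameter; this variant is my assumption, not the answer's code):
int index = users.BinarySearch(newUserName, StringComparer.CurrentCulture);
if (index < 0)
{
    // BinarySearch returns the bitwise complement of the insertion point when the item isn't found
    index = ~index;
}
users.Insert(index, newUserName);
// the whole file still has to be rewritten afterwards to renumber the lines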

binarywriter not opening file at end of stream

I have a method which uses a binarywriter to write a record consisting of few uints and a byte array to a file. This method executes about a dozen times a second as part of my program. The code is below:
iLogFileMutex.WaitOne();
using (BinaryWriter iBinaryWriter = new BinaryWriter(File.Open(iMainLogFilename, FileMode.OpenOrCreate, FileAccess.Write)))
{
    iBinaryWriter.Seek(0, SeekOrigin.End);
    foreach (ViewerRecord vR in aViewerRecords)
    {
        iBinaryWriter.Write(vR.Id);
        iBinaryWriter.Write(vR.Timestamp);
        iBinaryWriter.Write(vR.PayloadLength);
        iBinaryWriter.Write(vR.Payload);
    }
}
iLogFileMutex.ReleaseMutex();
The above code works fine, but if I remove the line with the Seek call, the resulting binary file is corrupted. For example, certain records are completely missing, or parts of them are just not present, although the vast majority of records are written just fine. So I imagine the cause of the bug is that, as I repeatedly open and close the file, the current position in the file isn't always at the end and things get overwritten.
So my question is: Why isn't C# ensuring that the current position is at the end when I open the file?
PS: I have ruled out threading issues from causing this bug
If you want to append to the file, you must use FileMode.Append in your Open call, otherwise the file will open with its position set to the start of the file, not the end.
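In other words, the open call becomes something like this and the explicit Seek is no longer needed (a sketch against the code in the question):
// FileMode.Append opens (or creates) the file with the position already at the end.
using (BinaryWriter iBinaryWriter = new BinaryWriter(File.Open(iMainLogFilename, FileMode.Append, FileAccess.Write)))
{
    foreach (ViewerRecord vR in aViewerRecords)
    {
        iBinaryWriter.Write(vR.Id);
        iBinaryWriter.Write(vR.Timestamp);
        iBinaryWriter.Write(vR.PayloadLength);
        iBinaryWriter.Write(vR.Payload);
    }
}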
The problem is a combination of FileMode.OpenOrCreate and the type of the ViewerRecord members. One or more of them isn't of a fixed size type, probably a string.
Things go wrong when the file already exists. You'll start writing data at the start of the file, overwriting existing data. But what you write only lines up with an existing record by chance; the string would have to be exactly the same size. If you don't write enough records, you won't overwrite all of the old ones, and you'll get into trouble when you read the file: you'll read part of an old record after you've read the last written record, and you'll get junk for a while.
Making the record a fixed size doesn't really solve the problem: you'll read a good record, but it will be an old one. Which particular set of old records you get depends on how much new data you wrote. This is just as bad as reading garbled data.
If you really do need to preserve the old records then you should append to the file, FileMode.Append. If you don't then you should rewrite the file, FileMode.Create.
