I'm having a strange problem that I've never encountered nor heard of before. It seems that occasionally, the ReadLine() method of the StreamReader class will return null, as if it's at the end of the file, but it's not.
My log file indicates that everything is happening just as if it had actually reached the end of the file, yet it's only processing part of it. There doesn't appear to be any consistency, because if I restart the process from scratch, the whole file is processed without incident.
Clearly there is nothing funky in the file itself, or it would fail on the same line each time; plus it has happened with a few different files, and each time they are re-run, everything works fine.
Anyone run across anything similar, or have any suggestions on what might be causing such a thing?
Thanks,
Andrew
Sample:
line = _readerStream.ReadLine();
if (null != line)
{
    eventRetVal = loadFileLineEvent(line);
}
else
{
    // do some housecleaning and log file completed
}
_readerStream is the stream which has been opened elsewhere.
loadFileLineEvent is a delegate that gets passed in which processes the line. This has its own error handling (with logging), so there's no issue in there.
The routine above (not shown in its entirety) has error handling around it also (with logging), which is not being triggered either.
It's getting to the "else" and logging that it reached the end of the file, but it's obvious from the number of records I got that it didn't.
Have you tried a more traditional approach to reading the stream? This way you're checking for the end of the stream before reading the next, potentially null, line. It seems like your code should work, but a null reference exception could be thrown when processing a line that doesn't exist (I'm not sure whether StreamReader throws in that case, though).
using (StreamReader SR = new StreamReader(OFD.FileName))
{
    while (!SR.EndOfStream)
    {
        string CurrentLine = SR.ReadLine();
        var eventRetVal = loadFileLineEvent(CurrentLine);
    }
}
Let me preface this question by saying I'm absolutely not a pro C# programmer and have pretty much brute forced my way through most of my small programs so far.
I'm working on a small WinForms application to SSH into a few devices, tail -f a log file on each, and display the real-time output in TextBoxes while also saving to log files. Right now, it works, but hogs nearly 30% of my CPU during logging and I'm sure I'm doing something wrong.
After creating the SshClient and connecting, I run the tail command like so (these variables are part of a logger class which exists for each connection):
command = client.CreateCommand("tail -f /tmp/messages");
result = command.BeginExecute();
stream = command.OutputStream;
I then have a log reading/writing function:
public async Task logOutput(IAsyncResult result, Stream stream, TextBox textBox, string logPath)
{
    // Clear textbox ( thread-safe :) )
    textBox.Invoke((MethodInvoker)(() => textBox.Clear()));

    // Create reader for stream and writer for text file
    StreamReader reader = new StreamReader(stream, Encoding.UTF8, true, 1024, true);
    StreamWriter sw = File.AppendText(logPath);

    // Start reading from SSH stream
    while (!result.IsCompleted || !reader.EndOfStream)
    {
        string line = await reader.ReadLineAsync();
        if (line != null)
        {
            // append to textbox
            textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));
            // append to file
            sw.WriteLine(line);
        }
    }
}
Which I call the following way, per device connection:
Task.Run(() => logOutput(logger.result, logger.stream, textBox, fileName), logger.token);
Everything works fine, it's just the CPU usage that's the issue. I'm guessing I'm creating way more than one thread per logging process, but I don't know why or how to fix that.
Does anything stand out as a simple fix to the above code? Or even better - is there a way to set up a callback that only prints the new data when the result object gets new text?
All help is greatly appreciated!
EDIT 3/4/2021
I tried a simple test using CopyToAsync by changing the code inside logOutput() to the following:
public async Task logOutput(IAsyncResult result, Stream stream, string logPath)
{
    using (Stream fileStream = File.Open(logPath, FileMode.OpenOrCreate))
    {
        // While the result is running, copy everything from the command stream to a file
        while (!result.IsCompleted)
        {
            await stream.CopyToAsync(fileStream);
        }
    }
}
However, this results in the text files never getting any data written to them, and CPU usage is actually slightly worse.
2ND EDIT 3/4/2021
Doing some more debugging, it appears the high CPU usage occurs only when there's no new data coming in. As far as I can tell, this is because the ReadLineAsync() method is constantly firing regardless of whether or not there's actually new data from the SSH command that's running, and it's running as fast as possible hogging all the CPU cycles it can. I'm not entirely sure why that is though, and could really use some help here. I would've assumed that ReadLineAsync() would simply wait until a new line was available from the SSH command to continue.
The solution ended up being much simpler than I would've thought.
There's a known bug in SSH.NET where the command's OutputStream will continually spit out null data when no actual new data has been received. This causes the while loop in my code to run as fast as possible, consuming a bunch of CPU in the process.
The solution is simply to add a short asynchronous delay in the loop. I included the delay only when the received data is null, so that reading isn't interrupted when actual valid data is coming through.
while (!result.IsCompleted && !token.IsCancellationRequested)
{
    string line = await reader.ReadLineAsync();

    // Skip null/empty lines, waiting briefly so the loop doesn't spin
    if (string.IsNullOrEmpty(line))
    {
        await Task.Delay(10); // prevents high CPU usage
        continue;
    }

    // Append line to textbox
    textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));

    // Append line to file
    writer.WriteLine(line);
}
On a Ryzen 5 3600, this brought my CPU usage from ~30-40% while the program was running to less than 1% even when data is flowing. Much better.
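For anyone who wants the full picture, here's roughly what the whole method looks like with the delay folded in. This is a sketch, not the poster's exact code: the using blocks and the explicit CancellationToken parameter are my additions, and it assumes the usual System.IO, System.Text, System.Threading, System.Threading.Tasks, and System.Windows.Forms usings.

public async Task logOutput(IAsyncResult result, Stream stream, TextBox textBox,
                            string logPath, CancellationToken token)
{
    // Clear textbox (thread-safe)
    textBox.Invoke((MethodInvoker)(() => textBox.Clear()));

    // leaveOpen: true so disposing the reader doesn't close the SSH stream
    using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, true))
    using (var writer = File.AppendText(logPath))
    {
        while (!result.IsCompleted && !token.IsCancellationRequested)
        {
            string line = await reader.ReadLineAsync();

            // Back off briefly when SSH.NET hands us nothing, instead of spinning
            if (string.IsNullOrEmpty(line))
            {
                await Task.Delay(10);
                continue;
            }

            textBox.Invoke((Action)(() => textBox.AppendText(line + Environment.NewLine)));
            writer.WriteLine(line);
        }
    }
}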
I have a program that continuously writes its log to a text file.
I don't have the source code of it, so I can not modify it in any way and it is also protected with Themida.
I need to read the log file and execute some scripts depending on the content of the file.
I can not delete the file because the program that is continuously writing to it has locked the file.
So what would be the best way to read only the new lines of the file? By saving the last read position? Or is there something in C# that would be useful for solving this?
Perhaps use a FileSystemWatcher along with opening the file with FileShare access (since it is being used by another process). Hans Passant has provided a nice answer for this part here:
var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (var sr = new StreamReader(fs))
{
    // etc...
}
Have a look at this question and the accepted answer which may also help.
using (var fs = new FileStream("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete))
using (var reader = new StreamReader(fs))
{
    while (true)
    {
        var line = reader.ReadLine();
        if (!String.IsNullOrWhiteSpace(line))
            Console.WriteLine("Line read: " + line);
    }
}
I tested the above code and it works if you are trying to read one line at a time. The only issue is that if the line is flushed to the file before it is finished being written then you will read the line in multiple parts. As long as the logging system is writing each line all at once it should be okay.
If not then you may want to read into a buffer instead of using ReadLine, so you can parse the buffer yourself by detecting each Environment.NewLine substring.
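A sketch of that buffered approach might look like the following (the file name, buffer size, and sleep interval are arbitrary, and this assumes lines are terminated with Environment.NewLine):

var sb = new StringBuilder();
var buffer = new char[4096];

using (var fs = new FileStream("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(fs))
{
    while (true)
    {
        int read = reader.Read(buffer, 0, buffer.Length);
        if (read > 0)
        {
            sb.Append(buffer, 0, read);

            // Hand lines off only once a full Environment.NewLine has arrived
            int nl;
            while ((nl = sb.ToString().IndexOf(Environment.NewLine)) >= 0)
            {
                string line = sb.ToString(0, nl);
                sb.Remove(0, nl + Environment.NewLine.Length);
                Console.WriteLine("Line read: " + line);
            }
        }
        else
        {
            Thread.Sleep(100); // end of current data; wait for more to be written
        }
    }
}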
You can just keep calling ReadToEnd() in a tight loop. Even after it reaches the end of the file it'll just return an empty string "". If some more data is written to the file it will pick it up on a subsequent call.
while (true)
{
    string moreData = streamReader.ReadToEnd();
    Thread.Sleep(100);
}
Bear in mind you might read partial lines this way. Also if you are dealing with very large files you will probably need another approach.
Use a FileSystemWatcher to detect changes, then read only the new lines by saving the last read position and seeking to it in the file.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek.aspx
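A minimal sketch of that idea follows. The path is hypothetical, and there's no handling of file truncation or of Changed events overlapping; it only shows the watch-then-seek pattern:

string path = @"C:\logs\app.log";   // hypothetical log path
long lastPosition = 0;

var watcher = new FileSystemWatcher(Path.GetDirectoryName(path), Path.GetFileName(path));
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
watcher.Changed += (s, e) =>
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(fs))
    {
        fs.Seek(lastPosition, SeekOrigin.Begin); // skip everything already handled
        string line;
        while ((line = reader.ReadLine()) != null)
            Console.WriteLine("New line: " + line);
        lastPosition = fs.Position;              // remember where we stopped
    }
};
watcher.EnableRaisingEvents = true;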
The log file is being "continuously" updated so you really shouldn't use FileSystemWatcher to raise an event each time the file changes. This would be triggering continuously, and you already know it will be very frequently changing.
I'd suggest using a timer event to periodically process the file. Read this SO answer for a good pattern using System.Threading.Timer [1]. Keep a file stream open for reading, or reopen it each time and Seek to the end position of your last successful read. By "last successful read" I mean that you should encapsulate the reading and validating of a complete log line; once you've successfully read and validated a log line, you have a new position for the next Seek.
[1] Note that System.Threading.Timer will execute on a system-supplied thread that is kept in business by the ThreadPool. For short tasks this is more desirable than a dedicated thread.
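Combining that timer pattern with the seek approach from the previous sketch might look like this (again, the path and the ProcessLogLine handler are hypothetical placeholders):

long lastPosition = 0;

// Keep a reference to the timer so it isn't garbage collected
var timer = new System.Threading.Timer(_ =>
{
    using (var fs = new FileStream(@"C:\logs\app.log", FileMode.Open,
                                   FileAccess.Read, FileShare.ReadWrite))
    using (var reader = new StreamReader(fs))
    {
        if (fs.Length <= lastPosition)
            return;                              // nothing new since last tick

        fs.Seek(lastPosition, SeekOrigin.Begin); // resume from the last good read
        string line;
        while ((line = reader.ReadLine()) != null)
            ProcessLogLine(line);                // hypothetical handler
        lastPosition = fs.Length;
    }
}, null, 0, 1000); // fire immediately, then once per second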
See this answer on another post: "c# continuously read file".
This one is quite efficient, and it checks once per second if the file size has changed. So the file is usually not read-locked as a result.
The other answers are quite valid and simple. A couple of them will read-lock the file continuously, but that's probably not a problem for most.
I'm writing an application that manipulates a text file. The first half of my function reads the text file, while the second half writes to (optionally) the same file. Although I call .Close() on the StreamReader object before opening the StreamWriter object, I still get an IOException: The process cannot access the file "file.txt" because it is being used by another process.
How do I force my program to release the file before continuing?
public static void manipulateFile(String fileIn, String fileOut, String obj)
{
    StreamReader sr = new StreamReader(fileIn);
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        //code to split up file into part1, part2, and part3[]
    }
    sr.Close();

    //Write the file
    if (fileOut != null)
    {
        StreamWriter sw = new StreamWriter(fileOut);
        sw.Write(part1 + part2);
        foreach (String s in part3)
        {
            sw.WriteLine(s);
        }
        sw.Close();
    }
}
Your code as posted runs fine - I don't see the exception.
However, calling Close() manually like that is a bad idea: if an exception is thrown, your call to Close() might never be made. You should use a finally block, or better yet, a using statement.
using (StreamReader sr = new StreamReader(fileIn))
{
    // ...
}
But the actual problem you are experiencing might not be specifically with this method; it may be a more general problem with forgetting to close files properly elsewhere. I suggest you go through your whole code base, look for all the places where you use IDisposable objects, and check that you dispose of them correctly even when there could be exceptions.
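Applied to the method from the question, that might look like the following. This is a sketch: the parsing logic stays elided as in the original, so part1/part2/part3 are just placeholder declarations here.

public static void manipulateFile(String fileIn, String fileOut, String obj)
{
    // Placeholders for the parsing results the question elides
    String part1 = "", part2 = "";
    var part3 = new List<String>();

    using (StreamReader sr = new StreamReader(fileIn))
    {
        String line;
        while ((line = sr.ReadLine()) != null)
        {
            // code to split up file into part1, part2, and part3
        }
    } // reader closed here, even if an exception was thrown above

    // Write the file
    if (fileOut != null)
    {
        using (StreamWriter sw = new StreamWriter(fileOut))
        {
            sw.Write(part1 + part2);
            foreach (String s in part3)
            {
                sw.WriteLine(s);
            }
        } // writer flushed and closed here
    }
}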
Getting read access to a file that's already opened elsewhere isn't usually difficult. Most code would open a file for reading with FileShare.Read, allowing somebody else to read the file as well. StreamReader does so for example.
Getting write access is an entirely different ball of wax. That same FileShare.Read does not include FileShare.Write, so you cannot write the file while somebody else is reading it. That's very troublesome anyway: you'd be jerking the mat out from under that somebody else, suddenly providing entirely different data.
All you have to do is find out who that 'somebody else' might be. SysInternals' Handles utility can tell you. Hopefully it is your own program, you could do something about that.
This may sound like a stupid question, but are you sure you didn't edit the file with another application which didn't release it? I've had this situation before, mostly with Excel files where Excel didn't completely unload from memory (or me just being dumb enough not to close the other application sometimes). It might happen with whatever application you use for .txt files, if any. Just a suggestion.
My application traverses a directory tree and in each directory it tries to open a file with a particular name (using File.OpenRead()). If this call throws FileNotFoundException then it knows that the file does not exist. Would I rather have a File.Exists() call before that to check if file exists? Would this be more efficient?
Update
I ran these two methods in a loop and timed each:
void throwException()
{
    try
    {
        throw new NotImplementedException();
    }
    catch
    {
    }
}

void fileOpen()
{
    string filename = string.Format("does_not_exist_{0}.txt", random.Next());
    try
    {
        File.Open(filename, FileMode.Open);
    }
    catch
    {
    }
}

void fileExists()
{
    string filename = string.Format("does_not_exist_{0}.txt", random.Next());
    File.Exists(filename);
}

Random random = new Random();
These are the results without the debugger attached and running a release build :
Method          Iterations per second
throwException  10100
fileOpen         2200
fileExists      11300
The cost of a throwing an exception is a lot higher than I was expecting, and calling FileOpen on a file that doesn't exist seems much slower than checking the existence of a file that doesn't exist.
In the case where the file will often not be present, it appears to be faster to check if the file exists. I would imagine that in the opposite case, when the file is usually present, you will find it is faster to catch the exception. If performance is critical to your application, I suggest that you benchmark both approaches on realistic data.
As mentioned in other answers, remember that even if you check for existence of the file before opening it, you should be careful of the race condition where someone deletes the file after your existence check but just before you open it. You still need to handle the exception.
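To make that concrete, a belt-and-braces version might look like this (the path and the processing are placeholders):

string path = @"C:\data\expected.txt"; // hypothetical path

if (File.Exists(path)) // fast path: skip directories where the file is absent
{
    try
    {
        using (FileStream stream = File.OpenRead(path))
        {
            // process the file here
        }
    }
    catch (FileNotFoundException)
    {
        // deleted between the check and the open: rare, but possible
    }
    catch (UnauthorizedAccessException)
    {
        // the file exists but we aren't allowed to read it
    }
    catch (IOException)
    {
        // other I/O failures (sharing violations, etc.) still need handling
    }
}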
No, don't. If you use File.Exists, you introduce a concurrency problem. If you wrote this code:
if file exists then
open file
then if another program deletes your file between your File.Exists check and the moment you actually open the file, the program will still throw an exception.
Second, even if a file exists, that does not mean you can actually open it: you might not have permission to open the file, or the file might be on a read-only filesystem so you can't open it in write mode, etc.
File I/O is much, much more expensive than exceptions; there is no need to worry about the performance of exceptions.
EDIT:
Benchmarking Exception vs Exists in Python under Linux
import timeit

setup = 'import random, os'

s = '''
try:
    open('does not exist_%s.txt' % random.randint(0, 10000)).read()
except Exception:
    pass
'''
byException = timeit.Timer(stmt=s, setup=setup).timeit(1000000)

s = '''
fn = 'does not exists_%s.txt' % random.randint(0, 10000)
if os.path.exists(fn):
    open(fn).read()
'''
byExists = timeit.Timer(stmt=s, setup=setup).timeit(1000000)

print 'byException: ', byException  # byException: 23.2779269218
print 'byExists: ', byExists        # byExists: 22.4937438965
Is this behavior truly exceptional? If it is expected, you should be testing with an if statement, and not using exceptions at all. Performance isn't the only issue with this solution and from the sound of what you are trying to do, performance should not be an issue. Therefore, style and a good approach should be the items of concern with this solution.
So, to summarize, since you expect some tests to fail, do use the File.Exists to check instead of catching exceptions after the fact. You should still catch other exceptions that can occur, of course.
It depends!
If there's a high chance for the file to be there (you know this for your scenario, but as an example something like desktop.ini) I would rather prefer to directly try to open it.
Either way, when using File.Exists you still need to wrap File.OpenRead in a try/catch for concurrency reasons, to avoid any run-time exception; but the check can considerably boost your application's performance if the chance of the file being there is low. (See: Ostrich algorithm.)
Wouldn't it be most efficient to run a directory search, find it, and then try to open it?
Dim Files() As String = System.IO.Directory.GetFiles("C:\", "SpecificName.txt", IO.SearchOption.AllDirectories)
Then you would get an array of strings that you know exist.
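In C#, the equivalent call would be:

// Note: recursing from the drive root can throw UnauthorizedAccessException
// on protected directories
string[] files = Directory.GetFiles(@"C:\", "SpecificName.txt", SearchOption.AllDirectories);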
Oh, and as an answer to the original question: yes, try/catch would introduce more processor cycles, but I would also assume that the disk I/O actually takes longer than that processor-cycle overhead.
Running Exists first and then the open is two I/O operations, versus one for just trying to open it. So really, the overall performance is a judgment call between processor time and hard drive speed on the PC it runs on. If you've got a slower processor, I'd go with the check; if you've got a fast processor, I might go with the try/catch on this one.
File.Exists is a good first line of defense. If the file doesn't exist, then you're guaranteed to get an exception if you try to open it. The existence check is cheaper than the cost of throwing and catching an exception. (Maybe not much cheaper, but a bit.)
There's another consideration, too: debugging. When you're running in the debugger, the cost of throwing and catching an exception is higher, because the IDE has hooks into the exception mechanism that increase your overhead. And if you've checked any of the "Break on thrown" checkboxes in Debug > Exceptions, then any avoidable exceptions become a huge pain point. For that reason alone, I would argue for preventing exceptions when possible.
However, you still need the try-catch, for the reasons pointed out by other answers here. The File.Exists call is merely an optimization; it doesn't save you from needing to catch exceptions due to timing, permissions, solar flares, etc.
I don't know about efficiency but I would prefer the File.Exists check. The problem is all the other things that could happen: bad file handle, etc. If your program logic knows that sometimes the file doesn't exist and you want to have a different behavior for existing vs. non-existing files, use File.Exists. If its lack of existence is the same as other file-related exceptions, just use exception handling.
Vexing Exceptions -- more about using exceptions well
Yes, you should use File.Exists. Exceptions should be used for exceptional situations not to control the normal flow of your program. In your case, a file not being there is not an exceptional occurrence. Therefore, you should not rely on exceptions.
UPDATE:
So everyone can try it for themselves, I'll post my test code. For non existing files, relying on File.Open to throw an exception for you is about 50 times worse than checking with File.Exists.
class Program
{
    static void Main(string[] args)
    {
        TimeSpan ts1 = TimeIt(OpenExistingFileWithCheck);
        TimeSpan ts2 = TimeIt(OpenExistingFileWithoutCheck);
        TimeSpan ts3 = TimeIt(OpenNonExistingFileWithCheck);
        TimeSpan ts4 = TimeIt(OpenNonExistingFileWithoutCheck);
    }

    private static TimeSpan TimeIt(Action action)
    {
        int loopSize = 10000;
        DateTime startTime = DateTime.Now;
        for (int i = 0; i < loopSize; i++)
        {
            action();
        }
        return DateTime.Now.Subtract(startTime);
    }

    private static void OpenExistingFileWithCheck()
    {
        string file = @"C:\temp\existingfile.txt";
        if (File.Exists(file))
        {
            using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
            {
            }
        }
    }

    private static void OpenExistingFileWithoutCheck()
    {
        string file = @"C:\temp\existingfile.txt";
        using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
        {
        }
    }

    private static void OpenNonExistingFileWithCheck()
    {
        string file = @"C:\temp\nonexistantfile.txt";
        if (File.Exists(file))
        {
            using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
            {
            }
        }
    }

    private static void OpenNonExistingFileWithoutCheck()
    {
        try
        {
            string file = @"C:\temp\nonexistantfile.txt";
            using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
            {
            }
        }
        catch (Exception ex)
        {
        }
    }
}
On my computer:
ts1 = .75 seconds (same with or without debugger attached)
ts2 = .56 seconds (same with or without debugger attached)
ts3 = .14 seconds (same with or without debugger attached)
ts4 = 14.28 seconds (with debugger attached)
ts4 = 1.07 (without debugger attached)
UPDATE:
I added details on whether a debugger was attached or not. I tested debug and release builds, but the only thing that made a difference was the one function that ended up throwing exceptions while the debugger was attached (which makes sense). Still, checking with File.Exists is the best choice.
I would say that, generally speaking, exceptions "increase" the overall "performance" of your system!
In your sample, anyway, it is better to use File.Exists...
The problem with using File.Exists first is that it opens the file too. So you end up opening the file twice. I haven't measured it, but I guess this additional opening of the file is more expensive than the occasional exceptions.
Whether the File.Exists check improves performance depends on the probability of the file existing. If it likely exists, then don't use File.Exists; if it usually doesn't exist, then the additional check will improve performance.
The overhead of an exception is noticeable, but it's not significant compared to file operations.
My program is unable to File.Move or File.Delete a file because it is being used "by another process", but it's actually my own program that is using it.
I use Directory.GetFiles to initially get the file paths, and from there, I process the files by simply looking at their names and processing information that way. Consequently all I'm doing is working with the strings themselves, right? Afterwards, I try to move the files to a "Handled" directory. Nearly all of them will usually move, but from time to time, they simply won't because they're being used by my program.
Why is it that most of them move but one or two stick around? Is there anything I can do to try freeing up the file? There's no streams to close.
Edit: Here's some code:
public object[] UnzipFiles(string[] zipFiles)
{
    ArrayList al = new ArrayList(); //not sure of proper array size, so using arraylist
    string[] files = null;
    for (int a = 0; a < zipFiles.Length; a++)
    {
        string destination = settings.GetTorrentSaveFolder() + @"\[CSL]--Temp\" + Path.GetFileNameWithoutExtension(zipFiles[a]) + @"\";
        try
        {
            fz.ExtractZip(zipFiles[a], destination, ".torrent");
            files = Directory.GetFiles(destination, "*.torrent", SearchOption.AllDirectories);
            for (int b = 0; b < files.Length; b++)
                al.Add(files[b]);
        }
        catch (Exception e)
        { }
    }
    try
    {
        return al.ToArray(); //return all files of all zips
    }
    catch (Exception e)
    {
        return null;
    }
}
This is called from:
try
{
    object[] rawFiles = directory.UnzipFiles(zipFiles);
    string[] files = Array.ConvertAll<object, string>(rawFiles, Convert.ToString);
    if (files != null)
    {
        torrents = builder.Build(files);
        xml.AddTorrents(torrents);
        directory.MoveProcessedFiles(xml);
        directory.MoveProcessedZipFiles();
    }
}
catch (Exception e)
{ }
Therefore, the builder builds objects of class Torrent. Then I add the objects of class Torrent into a xml file, which stores information about it, and then I try to move the processed files which uses the xml file as reference about where each file is.
Despite it all working fine for most of the files, I'll get an IOException thrown about it being used by another process eventually here:
public void MoveProcessedZipFiles()
{
    string[] zipFiles = Directory.GetFiles(settings.GetTorrentSaveFolder(), "*.zip", SearchOption.TopDirectoryOnly);
    if (!Directory.Exists(settings.GetTorrentSaveFolder() + @"\[CSL] -- Processed Zips"))
        Directory.CreateDirectory(settings.GetTorrentSaveFolder() + @"\[CSL] -- Processed Zips");
    for (int a = 0; a < zipFiles.Length; a++)
    {
        try
        {
            File.Move(zipFiles[a], settings.GetTorrentSaveFolder() + @"\[CSL] -- Processed Zips\" + zipFiles[a].Substring(zipFiles[a].LastIndexOf('\\') + 1));
        }
        catch (Exception e)
        {
        }
    }
}
Based on your comments, this really smells like a handle leak. Then, looking at your code, the fz.ExtractZip(...) looks like the best candidate to be using file handles, and hence be leaking them.
Is the type of fz part of your code, or a third-party library? If it's within your code, make sure it closes all its handles (the safest way is via using or try/finally blocks). If it's part of a third-party library, check the documentation and see if it requires any kind of cleanup. It's quite possible that it implements IDisposable; in that case, put its usage within a using block or ensure it's properly disposed.
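As an illustration of that cleanup, where SomeZipLibrary is a hypothetical stand-in for whatever type fz actually is:

var fz = new SomeZipLibrary(); // hypothetical type standing in for the real one
try
{
    fz.ExtractZip(zipFiles[a], destination, ".torrent");
}
finally
{
    // Release whatever the library holds: Dispose() if it implements
    // IDisposable, otherwise whatever Close()/cleanup method it documents
    IDisposable disposable = fz as IDisposable;
    if (disposable != null)
        disposable.Dispose();
}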
The line catch(Exception e) {} is horribly bad practice. You should only swallow exceptions this way when you know exactly what exception may be thrown and why you want to ignore it. If an exception your program can't handle happens, it's better for it to crash with a descriptive error message and valuable debug information (e.g. exception type, stack trace, etc.) than to ignore the issue and continue as if nothing had gone wrong, because an exception means that something has definitely gone wrong.
Long story short, the quickest approach to debug your program would be to:
1. Replace your generic catchers with finally blocks.
2. Add/move any relevant cleanup code to the finally blocks.
3. Pay attention to any exception you get: where was it thrown from? What kind of exception is it? What do the documentation or code comments say about the method throwing it? And so on.
4. Then either:
4.1. If the type of fz is part of your code, look for leaks there.
4.2. If it's part of a third-party library, review the documentation (and consider getting support from the author).
Hope this helps
What does "there are no streams to close" mean? Do you mean that you don't use streams, or that you close them?
I believe that you nevertheless have some open stream somewhere.
Do you have any static classes that use these files?
1. Try to write a simple application that will only parse, move, and delete the files, and see if this works.
2. Post some pieces of the code that work with your files.
3. Use Unlocker to double-check that nothing else is using those files: http://www.emptyloop.com/unlocker/ (don't forget to check files for viruses :))
The Path class was handling multiple files to get their filenames. Although I was never able to reproduce the issue on demand, forcing a garbage collection with GC.Collect at the end of the "processing" phase of my program fixed it.
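For reference, the workaround amounts to something like this (the WaitForPendingFinalizers call is my addition; it gives finalizers that still hold file handles a chance to run before the moves):

// At the end of the "processing" phase, before moving files
GC.Collect();
GC.WaitForPendingFinalizers(); // let finalizers release any lingering file handles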
Thanks again, all who helped. I learned a lot.