What I'm doing
I'm working on a web service which copies files from one location to another. The files are being updated (their size should increase every 3 seconds, since text is continually appended).
1st option:
Every 10 seconds I check whether any of the files has been modified (they are modified roughly every 5 seconds) so I can copy (and overwrite) them to the final destination. At the moment I'm using code which compares the last write time of the file with the current time minus some amount of time (currently 1 minute):
DateTime lastEditTime = File.GetLastWriteTime(myFile);
if (lastEditTime > DateTime.Now.AddMinutes(-1))
{
    File.Copy(myFile, newFileName, true);
}
But I think this is a rather fragile approach, since there could be a gap between the checks and I might miss some changes.
2nd option
I could check the file size (probably using the FileInfo.Length property) of each file in the source directory and compare it to the one in the final destination.
This should be OK too, since the file sizes can only grow (text is only ever appended).
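A rough sketch of that comparison (sourceDir and targetDir are placeholder names, not paths from my actual code) might look like this:
// Hypothetical sketch: copy only the files whose size differs from the copy
// in the destination.
foreach (string sourceFile in Directory.GetFiles(sourceDir))
{
    string targetFile = Path.Combine(targetDir, Path.GetFileName(sourceFile));

    long sourceLength = new FileInfo(sourceFile).Length;
    long targetLength = File.Exists(targetFile) ? new FileInfo(targetFile).Length : -1;

    if (sourceLength != targetLength)
    {
        File.Copy(sourceFile, targetFile, true);
    }
}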
3rd option
I have read that a lot of people recommend the FileSystemWatcher, but I don't want to miss any changes that might happen - at least that is what I read in other SO questions (see https://stackoverflow.com/a/240008/2296407).
What is my question?
What is the best way to know whether a file was changed (i.e. the file in the source is different from the file in the final destination) in the last x minutes or seconds? I don't want to copy everything, since there might be a lot of files.
By best option I mean: is it faster to compare each file's size, or to compare File.GetLastWriteTime(myFile) with the current time minus some interval? In the second case there is also the question of how big that time span should be: if I make it large I will probably copy more files than I actually need to, but if I make it small I might miss some changes.
If you have some better options feel free to share them with me!
Although you already mentioned it in your option 3, I still think you should give the FileSystemWatcher class a try. As far as I understood you, you have not actually tried it yet, right?
Although it is true that the watcher may lose some events in the default configuration, you can still make it work reliably if you do some tweaking.
Have a look at the "Remarks" section in the documentation (highlights by me):
The Windows operating system notifies your component of file changes
in a buffer created by the FileSystemWatcher. If there are many
changes in a short time, the buffer can overflow. This causes the
component to lose track of changes in the directory, and it will only
provide blanket notification. Increasing the size of the buffer with
the InternalBufferSize property is expensive, as it comes from
non-paged memory that cannot be swapped out to disk, so keep the
buffer as small yet large enough to not miss any file change events.
To avoid a buffer overflow, use the NotifyFilter and
IncludeSubdirectories properties so you can filter out unwanted change
notifications.
Things you can do to make it reliably work:
Note that a FileSystemWatcher may miss an event when the buffer size
is exceeded. To avoid missing events, follow these guidelines:
Increase the buffer size by setting the InternalBufferSize property.
Avoid watching files with long file names, because a long file name contributes to filling up the buffer. Consider renaming these files
using shorter names.
Keep your event handling code as short as possible.
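To illustrate the last two points, here is a sketch (not taken from the documentation) that enlarges the buffer and keeps the handler itself trivial by only queueing the changed path for later processing:
// Requires System.Collections.Concurrent (ConcurrentQueue) and System.IO.
// Paths reported by the watcher are queued here and processed elsewhere.
private static readonly ConcurrentQueue<string> _changedFiles = new ConcurrentQueue<string>();

private static void ConfigureWatcher(FileSystemWatcher watcher)
{
    // A larger buffer reduces the chance of overflowing (the default is 8 KB).
    watcher.InternalBufferSize = 64 * 1024;

    // Keep the handler as short as possible: just record the path and return.
    watcher.Changed += (s, e) => _changedFiles.Enqueue(e.FullPath);
}

// A separate loop (e.g. on a timer or worker thread) drains the queue and
// does the actual copying, outside of the watcher callback.
private static void ProcessPendingChanges(string targetDirectory)
{
    string path;
    while (_changedFiles.TryDequeue(out path))
    {
        File.Copy(path, Path.Combine(targetDirectory, Path.GetFileName(path)), true);
    }
}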
For example, user Nomix says he raised the buffer size (the InternalBufferSize property) to 16 MB and has never had a problem with the FileSystemWatcher class since (SO post is here). And I can confirm this with a project in my company that has been working fine for years, ever since we found out about the buffer.
Initialization of the object might look like this for example:
private void InitWatcher()
{
    // Create a new FileSystemWatcher and set its properties.
    FileSystemWatcher watcher = new FileSystemWatcher();
    watcher.Path = "Your path to watch";

    // You only want to watch a single folder.
    watcher.IncludeSubdirectories = false;

    // You mentioned both LastWrite and Size.
    // You can combine them or watch only a specific property -
    // simply configure it to your needs.
    watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;

    // Only watch text files.
    watcher.Filter = "*.txt";

    // Add event handlers; omit those you are not interested in.
    watcher.Changed += new FileSystemEventHandler(OnChanged);

    // Begin watching.
    watcher.EnableRaisingEvents = true;
}
You can then subscribe to the events that are of interest to you, like the Changed event, and react to them as easily as:
private static void OnChanged(object source, FileSystemEventArgs e)
{
    // newFileName is the destination path from your own snippet.
    File.Copy(e.FullPath, newFileName, true);
}
Related
Environment: any .NET Framework version is welcome.
I have a log file that gets written to 24/7.
I am trying to create an application that will read the log file and process the data.
What's the best way to read the log file efficiently? I imagine monitoring the file with something like FileSystemWatcher. But how do I make sure I don't read the same data once it's been processed by my application? Or say the application aborts for some unknown reason, how would it pick up where it left off last?
There's usually a header and footer around the payload that's in the log file. Maybe an id field in the content as well. Not sure yet though about the id field being there.
I also imagined saving the number of lines read somewhere and using that as a bookmark.
For obvious reasons, reading the whole content of the file, as well as removing lines from the log file (after loading them into your application), is out of the question.
What I can think of as a partial solution is having a small database (probably something much smaller than a full-blown MySQL/MS SQL/PostgreSQL instance) and populating a table with what has been read from the log file. I am pretty sure that even if there is a power cut and the machine is booted again, most relational databases should be able to restore their state with ease. This solution requires some data that can be used to identify a row from the log file (for example: the exact time of the logged action, the machine on which the action took place, etc.).
Well, you will have to figure out the magic for your particular case yourself. If you are using a well-known text encoding it may be pretty simple, though. Look toward System.IO.StreamReader, its ReadLine() and DiscardBufferedData() methods and its BaseStream property. You should be able to remember your last position in the file, rewind to that position later and start reading again, given that you are sure the file is only ever appended to. There are other things to consider, though, and there is no single universal answer to this.
Just as a naive example (you may still need to adjust a lot to make it work):
static void Main(string[] args)
{
    string filePath = @"c:\log.txt";
    using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var streamReader = new StreamReader(stream, Encoding.Unicode))
    {
        long pos = 0;
        if (File.Exists(@"c:\log.txt.lastposition"))
        {
            string strPos = File.ReadAllText(@"c:\log.txt.lastposition");
            pos = Convert.ToInt64(strPos);
        }
        streamReader.BaseStream.Seek(pos, SeekOrigin.Begin); // rewind to the last saved position
        streamReader.DiscardBufferedData();                  // clear the reader's buffer
        for (;;)
        {
            string line = streamReader.ReadLine();
            if (line == null) break;
            ProcessLine(line); // your own processing goes here
        }
        // When everything has been read, the position is at the end of the file.
        File.WriteAllText(@"c:\log.txt.lastposition", streamReader.BaseStream.Position.ToString());
    }
}
I think you will find the File.ReadLines(filename) method, in conjunction with LINQ, very handy for something like this. ReadAllLines() loads the entire text file into memory as a string[] array, but ReadLines lets you start enumerating the lines immediately as it traverses the file. This not only saves you time but keeps memory usage very low, as each line is processed one at a time. Using statements are important because if this program is interrupted they will close the file streams, flushing the writer and saving unwritten content to the file. Then, when it starts up again, it will skip all the lines that have already been read.
int readCount = File.ReadLines("readLogs.txt").Count();

using (FileStream readLogs = new FileStream("readLogs.txt", FileMode.Append))
using (StreamWriter writer = new StreamWriter(readLogs))
{
    IEnumerable<string> lines = File.ReadLines("bigLogFile.txt").Skip(readCount);
    foreach (string line in lines)
    {
        // Do something with the line (or batch lines if you need more than one),
        // then record it so it is skipped on the next run.
        writer.WriteLine(line);
    }
}
As MaciekTalaska mentioned, I would strongly recommend using a database if this is something written to 24/7 and will get quite large. File systems are simply not equipped to handle such volumes, and you will spend a lot of time inventing solutions that a database could handle in a breeze.
Is there a reason why it logs to a file? Files are great because they are simple to use and, being the lowest common denominator, there is relatively little that can go wrong. However, files are limited. As you say, there's no guarantee a write to the file will be complete when you read the file. Multiple applications writing to the log can interfere with each other. There is no easy sorting or filtering mechanism. Log files can grow very big very quickly and there's no easy way to move old events (say those more than 24 hours old) into separate files for backup and retention.
Instead, I would consider writing the logs to a database. The table structure can be very simple, but you get the advantage of transactions (so you can extract or back up with ease) and of searching, sorting and filtering using an almost universally understood syntax. If you are worried about load spikes, use a message queue, like the one described at http://msdn.microsoft.com/en-us/library/ms190495.aspx for SQL Server.
To make the transition easier, consider using a logging framework like log4net. It abstracts much of this away from your code.
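A minimal sketch of what that looks like with log4net (assuming the standard log4net package and an appender configured in App.config; the class name is just an example):
using log4net;
using log4net.Config;

public class LogDemo
{
    // The usual convention: one logger per class.
    private static readonly ILog Log = LogManager.GetLogger(typeof(LogDemo));

    public static void Main()
    {
        // Reads the log4net section of App.config; the appenders defined there
        // decide whether entries go to a file, a database, the event log, etc.
        XmlConfigurator.Configure();

        Log.Info("Application started");
        Log.Error("Something went wrong");
    }
}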
Another alternative is to use a system like syslog or, if you have multiple servers and a large volume of logs, flume. By moving the log files away from the source computer, you can store them or inspect them on a different machine far more effectively. However, these are probably overkill for your current problem.
Possible Duplicate:
TextBox.Text Leaking Memory in WPF Application
I've got an application tailing a logfile. Every time the logfile is updated (which is usually a series of updates in a row), the memory use balloons out of control.
I've tracked down the problem to this call:
if (File.Exists(Path + "\\logfile.txt"))
    Data = File.ReadAllText(Path + "\\logfile.txt");
This is being called from within LoadAllData, which is invoked from this handler:
private void FileChangeNotificationHandler(object source, FileSystemEventArgs e)
{
    this.Dispatcher.BeginInvoke(new Action(delegate()
    {
        Logfile.GetPath();
        Logfile.LoadAllData();
        LogText.Clear();
        LogText.Text = Logfile.Data;
        if (CheckFollowTail.IsChecked == true) LogText.ScrollToEnd();
    }));
}
Does anyone have insight into why this is occurring? I assume it's related to the delegate or the handler.
It's probably just down to the amount and frequency with which you are loading log file data into memory.
GC takes time, so if you are repeating this in quick succession, chances are you'll have several files' worth of data in memory until the next GC. This is very inefficient. You should consider using a stream-based reader to avoid keeping all the data in memory. If you do use a stream reader, make sure you dispose of it afterwards to avoid introducing another leak.
Another thing to check is that you're not subscribing to a static event somewhere and thereby preventing your object tree from being collected. Is it a web app?
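As a hypothetical illustration of that last point (the names here are made up): a static event keeps a reference to every subscriber, so an instance that never unsubscribes can never be collected.
using System;

public static class LogNotifier
{
    // Static event: its invocation list roots every subscriber.
    public static event EventHandler LogUpdated;
}

public class LogViewer : IDisposable
{
    public LogViewer()
    {
        LogNotifier.LogUpdated += OnLogUpdated;  // this instance is now kept alive by the static event
    }

    private void OnLogUpdated(object sender, EventArgs e)
    {
        // refresh the view...
    }

    public void Dispose()
    {
        LogNotifier.LogUpdated -= OnLogUpdated;  // unsubscribe so the instance can be collected
    }
}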
First of all, checking if the file exists is wrong. This is because the file system is volatile and because there is more than just existence at play (permissions, for example). The correct way to do this is to just open the file, and then handle the exception if it fails.
Now, on to your stated problem. What I suspect is happening is that the log is growing large enough to end up on the Large Object Heap (85,000 bytes is all that's needed, IIRC, and remember that .NET strings are UTF-16, i.e. 2 bytes per character). A 43 KB ASCII log file is all it takes to start causing problems, because at that size your .NET string is no longer garbage collected in the normal way. Every time you read the file you end up adding another copy of the entire log file to memory.
To recommend the best way around this, it would help to know what kind of control your LogText variable refers to. But pending that information, I can at least suggest a few pointers:
Ideally, you would just keep the file open (using FileShare.ReadWrite) and read from the stream every time you get a change notification (see the sketch after this list). But that's not always possible.
If you have to re-open the file each time, at least read the text line by line (using a StreamReader) rather than pulling it all in at once with File.ReadAllText(). This will help you keep your log file broken up into smaller pieces that won't end up on the large object heap.
Unfortunately, I suspect that in the end you're stuck building one big string to assign to a plain textbox. If this is the case, I strongly recommend that you either only ever build and show the last part of the log (less than 85000 bytes worth) or that you search for a Large Object Heap-safe Textbox component to use.
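To illustrate the first pointer, here is a sketch (the path is a placeholder, and LogText comes from your snippet) that keeps one reader open and, on every change notification, appends only the newly written text instead of re-reading the whole file:
// Opened once; FileShare.ReadWrite lets the logging process keep writing to the file.
private readonly StreamReader _reader = new StreamReader(
    new FileStream(@"C:\logs\logfile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite));

private void FileChangeNotificationHandler(object source, FileSystemEventArgs e)
{
    // Reads only what was appended since the last call, so memory use is
    // proportional to the new text, not to the whole (possibly huge) log.
    string newText = _reader.ReadToEnd();
    if (newText.Length > 0)
    {
        this.Dispatcher.BeginInvoke(new Action(() => LogText.AppendText(newText)));
    }
}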
I am writing a log of lots and lots of formatted text to a textbox in a .NET Windows Forms app.
It is slow once the data gets over a few megabytes. Since I am appending, the string has to be reallocated every time, right? I only need to set the value of the text box once, but in my code I am doing line += data tens of thousands of times.
Is there a faster way to do this? Maybe a different control? Is there a linked list string type I can use?
StringBuilder will not help if the text box is added to incrementally, like log output for example.
But if the above is true and your updates are frequent enough, it may behoove you to cache some number of updates and then append them in one step (rather than appending constantly). That would save you many string reallocations, and that is where StringBuilder becomes helpful.
Notes:
1. Create a class-scoped StringBuilder member (_sb)
2. Start a timer (or use a counter)
3. Append text updates to _sb
4. When the timer ticks (or the counter reaches its threshold), reset it and append the accumulated text to the text box
5. Restart the process from #1
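A sketch of that batching idea in a WinForms app (the timer interval, field names and control name are just examples):
private readonly StringBuilder _sb = new StringBuilder();
private readonly Timer _flushTimer = new Timer();  // System.Windows.Forms.Timer

private void InitLogBuffer()
{
    _flushTimer.Interval = 500;                    // flush roughly twice a second
    _flushTimer.Tick += (s, e) => FlushToTextBox();
    _flushTimer.Start();
}

// Called for every log line; cheap, no UI work happens here.
private void AppendLog(string line)
{
    _sb.AppendLine(line);
}

// Called on the timer tick: one string allocation and one control update per flush.
private void FlushToTextBox()
{
    if (_sb.Length == 0) return;
    logTextBox.AppendText(_sb.ToString());
    _sb.Clear();
}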
No one has mentioned virtualization yet, which is really the only way to provide predictable performance for massive volumes of data. Even using a StringBuilder and converting it to a string every half a second will be very slow once the log gets large enough.
With data virtualization, you would only hold the necessary data in memory (i.e. what the user can see, and perhaps a little more on either side) whilst the rest would be stored on disk. Old data would "roll out" of memory as new data comes in to replace it.
In order to make the TextBox appear as though it has a lot of data in it, you would tell it that it does. As the user scrolls around, you would replace the data in the buffer with the relevant data from the underlying source (using random file access). So your UI would be monitoring a file, not listening for logging events.
Of course, this is all a lot more work than simply using a StringBuilder, but I thought it worth mentioning just in case.
Build your string with a StringBuilder, then convert it to a string using ToString(), and assign that to the textbox once.
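In other words, something along these lines (logLines and textBox1 are placeholder names):
// Instead of line += data tens of thousands of times:
var sb = new StringBuilder();
foreach (string data in logLines)
{
    sb.AppendLine(data);
}
textBox1.Text = sb.ToString();  // a single assignment to the control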
I have found that setting the textbox's WordWrap property to false greatly improves performance, as long as you're ok with having to scroll to the right to see all of your text. In my case, I wanted to paste a 20-50 MB file into a MultiLine textbox to do some processing on it. That took several minutes with WordWrap on, and just several seconds with WordWrap off.
I have built a small app that allows me to choose a directory and count the total size of files in that directory and its sub directories.
It lets me select a drive, which populates a tree control with the drive's immediate folders, whose sizes I can then count.
It is written in .NET and simply loops over the directories, adding up the file sizes for each one.
It brings my PC to a halt when it runs on, say, the Windows or Program Files folders.
I had thought of multi-threading, but I haven't done that before.
Any ideas to increase performance?
thanks
Your code is really going to slog since you're just using strings to refer to directories and files. Use a DirectoryInfo on your root directory; get a list of FileSystemInfos from that one using DirectoryInfo.GetFileSystemInfos(); iterate on that list, recursing in for DirectoryInfo objects and just adding the size for FileInfo objects. That should be a LOT faster.
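A sketch of that recursion (error handling for inaccessible folders is left out):
private static long GetDirectorySize(DirectoryInfo dir)
{
    long total = 0;
    foreach (FileSystemInfo info in dir.GetFileSystemInfos())
    {
        FileInfo file = info as FileInfo;
        if (file != null)
        {
            total += file.Length;                           // plain file: add its size
        }
        else
        {
            total += GetDirectorySize((DirectoryInfo)info); // subdirectory: recurse
        }
    }
    return total;
}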
I'd simply suggest using a BackgroundWorker to perform the work. You'll probably want to disable the controls that shouldn't be used while it runs, but everything else can stay usable.
Google: http://www.google.com/search?q=background+worker
This allows your application to be multi-threaded without some of the complexity of managing threads yourself. Everything is packaged up and convenient to use.
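A sketch of what that could look like (resultLabel, scanButton and GetDirectorySize are placeholders for your own controls and counting code):
private void StartScan(string rootPath)
{
    var worker = new BackgroundWorker();   // System.ComponentModel

    // The heavy directory walk runs off the UI thread.
    worker.DoWork += (s, e) =>
        e.Result = GetDirectorySize(new DirectoryInfo((string)e.Argument));

    // Runs back on the UI thread, so it is safe to touch controls here.
    worker.RunWorkerCompleted += (s, e) =>
    {
        resultLabel.Text = e.Result + " bytes";
        scanButton.Enabled = true;          // re-enable once the scan is done
    };

    scanButton.Enabled = false;             // disable what shouldn't be used while scanning
    worker.RunWorkerAsync(rootPath);
}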
Do you want to increase performance or increase system responsiveness?
You can increase RESPONSIVENESS by instructing the scanning (spidering) code to run its message-queue loop periodically, which handles screen repaints, etc. This lets you give a progress update while the scan executes, at the cost of slightly lower raw performance (because you're yielding CPU time to the UI).
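In a WinForms scan loop that could be as simple as the following sketch (root, total and progressLabel are placeholders; note that Application.DoEvents has well-known re-entrancy pitfalls):
long total = 0;
int processed = 0;
foreach (string file in Directory.GetFiles(root, "*.*", SearchOption.AllDirectories))
{
    total += new FileInfo(file).Length;

    // Every few hundred files, let the UI pump its message queue so the window
    // repaints and stays responsive; this trades a little raw speed for responsiveness.
    if (++processed % 500 == 0)
    {
        progressLabel.Text = processed + " files scanned";
        Application.DoEvents();
    }
}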
This gets sub-directories:
string[] directories = Directory.GetDirectories(node.FullPath);
foreach (string dir in directories)
{
    TreeNode nd = node.Nodes.Add(dir, dir.Substring(dir.LastIndexOf("\\")).Replace("\\", ""), 3);
    if (showItsChildren)
        ShowChildDirectories(nd, true);
    size += GetDirectorySize(nd.FullPath);
}
This counts the file sizes:
long b = 0;
// Get an array of all file names.
string[] a = Directory.GetFiles(p, "*.*");
// Calculate the total bytes of all files in a loop.
foreach (string name in a)
{
    // Use FileInfo to get the length of each file.
    FileInfo info = new FileInfo(name);
    b += info.Length;
    IncrementCount();
}
Try commenting out all the parts that update the UI. If it's still slow, it's the disk I/O and there's nothing you can do about it; if it gets faster, you can update the UI only every X files to save UI work.
You can make your UI responsive by doing all the work in a worker thread, but it will make it slightly slower.
Disk I/O is relatively slow and is also often needed by other applications (the swap file, temp files, ...). Multi-threading won't help much either: all the files are on the same physical disk, and it's likely the disk I/O is the bottleneck.
Just a guess, but I bet your performance hit involves the UI and not the file scan. Comment out the code that creates the TreeNode.
Try not to make your tree paint until after you complete your scan:
Make sure that the root tree node for all of your files is NOT added to the tree up front. Add all the children first, and then add the "top" node/nodes at the very end of your processing. See how that works.
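A sketch of that idea: build the node hierarchy on a detached root first, then attach it to the TreeView in a single step at the end (PopulateChildren stands in for your own recursive fill, e.g. ShowChildDirectories):
// Build the whole hierarchy on a detached node; nothing is painted yet.
TreeNode root = new TreeNode(rootPath);
PopulateChildren(root, rootPath);   // hypothetical recursive fill

// One Add call at the very end: the TreeView lays out and paints only once.
treeView1.BeginUpdate();
treeView1.Nodes.Add(root);
treeView1.EndUpdate();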
I am creating a downloading application and I wish to preallocate room on the hard drive for the files before they are actually downloaded, as they could potentially be rather large, and nobody likes to see "This drive is full, please delete some files and try again." So, in that light, I wrote this:
// Quick, and very dirty
System.IO.File.WriteAllBytes(filename, new byte[f.Length]);
It works, at least until you download a file that is several hundred MB, or potentially even several GB, and then you throw Windows into a thrashing frenzy, if not totally wipe out the page file and kill your system's memory altogether. Oops.
So, with a little more enlightenment, I set out with the following algorithm.
using (FileStream outFile = System.IO.File.Create(filename))
{
    // 4194304 = 4 MB; the loop starts one block in so that we leave the loop
    // one block short, and the final write covers the remainder.
    byte[] buff = new byte[4194304];
    for (int i = buff.Length; i < f.Length; i += buff.Length)
    {
        outFile.Write(buff, 0, buff.Length);
    }
    outFile.Write(buff, 0, f.Length % buff.Length);
}
This works, well even, and doesn't suffer from the crippling memory problem of the previous solution. It's still slow, though, especially on older hardware, since it writes out (potentially gigabytes' worth of) data to the disk.
The question is this: is there a better way of accomplishing the same thing? Is there a way of telling Windows to create a file of x size and simply allocate the space on the filesystem rather than actually writing out a tonne of data? I don't care about initialising the data in the file at all (the protocol I'm using - BitTorrent - provides hashes for the files it sends, so the worst case for random uninitialised data is that I get a lucky coincidence and part of the file is correct).
FileStream.SetLength is the one you want. The syntax:
public override void SetLength(
    long value
)
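Applied to your case it could be as simple as this sketch (filename and f.Length are taken from your own snippet):
// Reserve the full size up front without writing the data itself.
using (FileStream outFile = System.IO.File.Create(filename))
{
    outFile.SetLength(f.Length);
}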
If you have to create the file, I think that you can probably do something like this:
using (FileStream outFile = System.IO.File.Create(filename))
{
    outFile.Seek(<length_to_write> - 1, SeekOrigin.Begin);
    outFile.WriteByte(0);
}
Where length_to_write would be the size in bytes of the file to write. I'm not sure that I have the C# syntax correct (not on a computer to test), but I've done similar things in C++ in the past and it's worked.
Unfortunately, you can't really do this just by seeking to the end. That will set the file length to something huge, but may not actually allocate disk blocks for storage. So when you go to write the file, it will still fail.