CPU & Memory spikes during Parallel.ForEach - C#

I am building an application for work to copy files and folders, with a few more options but these are not being utilised during this issue.
The function in question iterates through each file in a directory, and then copies the file to an identical directory, in a new location (so it preserves nested file structures).
The application is a Windows Form, and due to issues writing to a text box at the same time, I have surrounded the parallel function in a Task.Factory.StartNew(), which fixed that issue.
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(Directory.GetFiles(root, "*.*", SearchOption.AllDirectories), newPath =>
    {
        try
        {
            File.Copy(newPath, newPath.Replace(root, destination), false);
            WriteToOutput("recreated the file '" + newPath.Replace(root, destination) + "'");
        }
        catch (Exception e)
        {
            WriteToOutput(e.Message);
        }
    });
});
When run, the diagnostic tools show spikes every few seconds. How can I 'even out' these spikes and make the performance consistent? I am also writing to the screen for each file that is moved, and there is a noticeable pause of a second or so after every 20-25 files.
The below screenshot is a sample from the Diagnostic Tools.

Your work is primarily IO bound, not CPU bound. The CPU has almost nothing to do most of the time; you're just waiting for the hard drive to do its work. The spikes in CPU usage are merely the short periods after the disk has finished an operation, when the CPU is working out what to ask it to do next. That takes very little time, which is why you see spikes rather than plateaus.

I am concerned by this sentence:
due to issues writing to a text box at the same time, I have surrounded the parallel function in a Task.Factory.StartNew(), which fixed that issue
I honestly doubt that fixed the issue; it probably concealed it. You do not appear to be awaiting or checking on the Task, so you are not observing any exceptions. The short CPU spike and the delay in output could easily be caused by a stack unwind of some kind.
If you are having trouble updating the UI from your worker threads, make sure you understand the purpose of Control.Invoke and that you are actually using it. Then get rid of the StartNew, or make sure you are handling any exceptions.
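A minimal sketch of a thread-safe WriteToOutput, assuming the output control is a TextBox named outputTextBox (that name is an assumption, not something from the question):

private void WriteToOutput(string message)
{
    if (outputTextBox.InvokeRequired)
    {
        // Called from a worker thread: marshal the call back onto the UI thread.
        outputTextBox.Invoke(new Action<string>(WriteToOutput), message);
        return;
    }

    // Now on the UI thread: safe to touch the control directly.
    outputTextBox.AppendText(message + Environment.NewLine);
}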

What you're doing is hitting the disk with many file read requests in parallel. Disks, like any other I/O device, don't work well in that mode.
For one thing, if you're reading from an HDD, it simply cannot service parallel requests, because it would have to move the read head to multiple locations at the same time.
Even with an SSD, the device cannot answer requests at the same rate at which the CPU can issue them.
In any case, the disk will definitely not be able to return the data at a uniform speed. Many read requests will be pending for an eternity (measured in CPU time), leaving those tasks blocked. That is why performance is uneven when you storm the disk with many parallel operations.
When you need to process many files, consider allocating one task to read them and then processing the loaded data in parallel. There would be only one I/O-bound task, and it wouldn't be blocked more than necessary, which lets the drive return data at the maximum speed it can achieve at the time. The CPU-bound tasks would be non-blocking, because their data would already be in memory by the time any of them starts. I would expect that design to give smooth performance.
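A rough sketch of that design, with one task doing all the reads and a BlockingCollection handing the loaded data to parallel consumers. The root path and ProcessData are stand-ins for your own paths and CPU-bound work; they are not from the question:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class SingleReaderParallelProcessing
{
    static void Main()
    {
        string root = @"C:\source"; // assumed path for the sketch
        var loaded = new BlockingCollection<Tuple<string, byte[]>>(boundedCapacity: 16);

        // One task owns all the disk I/O, so the drive sees one stream of requests.
        var reader = Task.Factory.StartNew(() =>
        {
            foreach (string path in Directory.EnumerateFiles(root, "*.*", SearchOption.AllDirectories))
            {
                loaded.Add(Tuple.Create(path, File.ReadAllBytes(path)));
            }
            loaded.CompleteAdding();
        });

        // CPU-bound work runs in parallel over data that is already in memory.
        Parallel.ForEach(loaded.GetConsumingEnumerable(), item =>
        {
            ProcessData(item.Item1, item.Item2);
        });

        reader.Wait();
    }

    static void ProcessData(string path, byte[] contents)
    {
        // Placeholder for the CPU-bound part of the job.
    }
}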

Related

Should there be a difference copying files in parallel vs. not in parallel?

I tried to copy 200 files with both solutions below, but I didn't see a difference (I used System.Diagnostics.Stopwatch to measure the time). In both cases it took 8 seconds. Shouldn't the second (Parallel) solution be faster? I thought that because these are IO operations, using Parallel would speed up the copy.
What am I missing?
// Case 1 - regular iteration
foreach (FileInfo file in files)
{
    string temppath = Path.Combine(destDirName, file.Name);
    file.CopyTo(temppath, false);
}

// Case 2 - parallel
Parallel.ForEach(files, file =>
{
    string temppath = Path.Combine(destDirName, file.Name);
    file.CopyTo(temppath, false);
});
Parallel execution of tasks doesn't guarantee performance improvements.
In your case, file copying will likely be IO bound, not CPU bound. The CPU typically has much more bandwidth available than the IO device (unless you happen to be using a Fusion IO card or something), so IO actions tend to cause the CPU to wait a lot.
Tasks that use a lot of CPU can be run in parallel for performance gains. Tasks that wait on external factors can also benefit from being shifted onto another thread so as not to become a blocking task.
However, tasks that are waiting for the same external resource will likely see little benefit unless that external resource can itself handle the traffic associated with multiple threads (IO tends to never be able to handle the traffic and could in fact cause contention of the resource, slowing it down).
Exactly the same as #Adam above, but with added points:
If you get a chance, you might like to try this with one or two networked drives. File transfers over high-latency connections have a large setup cost, which may well give an advantage to the multithreaded solution, especially with large numbers of small files.
It may also be useful to try a pool with a small, fixed number of threads (say 2) to see if that reduces disk contention and improves performance.
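For that second suggestion, a quick sketch reusing the files and destDirName from the question, with ParallelOptions capping the copy at two concurrent operations:

// Limit the copy to two files at a time to see whether it reduces disk contention.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(files, options, file =>
{
    string temppath = Path.Combine(destDirName, file.Name);
    file.CopyTo(temppath, false);
});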
If you like my answer, please upvote #Adam, not me.

How to make my i7 processor reach 100% usage with this code (fastest way to parse xml)

How can I make my i7 processor reach 100% usage with this code? Is there something special happening in the XmlDocument? Is it just because of context switching? And if so, why wouldn't adding more threads make the processor use its full power? What would be the fastest way to parse several strings at a time?
EDIT:
Maybe this code will make it clearer; no matter how many threads are used, it stays at 30% of the processor:
private void Form1_Load(object sender, EventArgs e)
{
    Action action = () =>
    {
        while (true)
        {
            XmlDocument xmlDocument = new XmlDocument();
            xmlDocument.LoadXml("<html><body><div>1111</div><div>222</div></body></html>");
            var nodes = xmlDocument.SelectNodes("//div");
        }
    };

    Parallel.For(0, 16, i => action());
}
In your code sample (and you would see this with a profiler) you are wasting a LOT of time waiting for resources to become available to run those threads. Because you keep requesting more and more parallel work, your process spends significant time waiting for threads to finish and for the next thread to be selected (an ever-growing number of threads all requesting time to run).
Consider this output from the profiler:
The RED color is synchronization! Look how much work the kernel is doing to let my app run so many threads! Note that if you had a single-core processor, you'd definitely see 100%.
You're going to have the best time reading this XML by splitting the string and parsing the pieces separately (after loading from I/O, of course). You may not see 100% CPU usage, but that's going to be the best option. Play with different partition sizes of the string (i.e. substring sizes).
For an amazing read on parallel patterns, I recommend this paper by Stephen Toub: http://download.microsoft.com/download/3/4/D/34D13993-2132-4E04-AE48-53D3150057BD/Patterns_of_Parallel_Programming_CSharp.pdf
EDIT: I did some searching for a smart way to read XML in multiple threads. My best advice is this:
Split your xml files into smaller files if you can.
Use one thread per xml file.
If 1 & 2 aren't sufficient for your perf needs, consider not loading it as XML at all, but partitioning the string (splitting it) and parsing it a bit by hand (not into an XmlDocument). I would only do this if 1 and 2 aren't good enough for your needs. Each partition (substring) would run on its own thread. Remember too that "more threads" != "more CPU usage", at least not for your app. As we saw in the profiler example, too many threads cost a lot of overhead. Keep it simple.
Is this the actual code you are running, or are you loading the xml from a file or other URL? If this is the actual code, then it's probably finishing too fast and the CLR doesn't have time to optimize the threadcount, but when you put the infinite loop it guarantees you'll max out the CPUs.
If you are loading XML from real sources, then threads can be waiting for IO responses and that won't consume any CPU while that's happening. To speed that case up you can preload all the XML using lots of threads (like 20+) into memory, and then use 8 threads to do the XML parsing afterwards.
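A rough sketch of that two-phase approach; xmlUrls, the method name, and the thread counts are illustrative assumptions, not from the question:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using System.Xml;

static ConcurrentBag<XmlNodeList> LoadThenParse(IEnumerable<string> xmlUrls)
{
    // Phase 1: I/O bound - fetch everything with lots of workers, which mostly wait.
    var rawXml = new ConcurrentBag<string>();
    Parallel.ForEach(xmlUrls, new ParallelOptions { MaxDegreeOfParallelism = 20 }, url =>
    {
        using (var client = new WebClient())
        {
            rawXml.Add(client.DownloadString(url));
        }
    });

    // Phase 2: CPU bound - parse the in-memory strings with roughly one thread per core.
    var results = new ConcurrentBag<XmlNodeList>();
    Parallel.ForEach(rawXml, new ParallelOptions { MaxDegreeOfParallelism = 8 }, xml =>
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        results.Add(doc.SelectNodes("//div"));
    });

    return results;
}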
The processor is the fastest component in a modern PC. Bottlenecks usually come in the form of RAM or hard drives. In the first case, you are continuously creating a variable with the potential to eat up a lot of memory, so it's intuitive that RAM becomes the bottleneck as the cache quickly runs dry.
In the second case you are not creating any variables (I'm sure .NET is doing plenty in the background, albeit in a highly optimized way), so it's intuitive that all the work stays on the CPU.
How the OS handles memory, interrupts etc. is impossible to fully define. You can use tools that help characterise these situations, but last time I checked there isn't even a memory analyzer for .NET code. So that's why I say take this answer with a grain of salt.
The Task Parallel Library distributes the Actions, so you lose a bit of control when it comes to process utilization. For the most part that's a good thing because we don't have to worry about creating too many threads, making our threads too big, etc. If you want to explicitly create threads then the following code should push your processor to the max:
Parallel.For(0, 16, index => new Thread(() =>
{
    while (true)
    {
        new Thread(() =>
        {
            XmlDocument xmlDocument = new XmlDocument();
            xmlDocument.LoadXml("<html><body><div>1111</div><div>222</div></body></html>");
            var nodes = xmlDocument.SelectNodes("//div");
        }).Start();
    }
}).Start());
I'm not saying I recommend this approach, just showing a working example of the code pushing my processor to the max (AMD FX-6200). I was seeing about 30% using the Task Parallel Library, too.

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason it gets slower when using more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a piece that has already been read while the next piece is being read in the background, but I think you'd see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive. I found that the CPU is way too fast compared to the storage system, and I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads the most CPU they could ever use is 16.7%, so at 10% you are really getting 60% utilization of those threads, which is fairly good.
It is hard to tell whether your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code it spends most of its time in. If it is sitting in a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking in use. That would cause the program to slow down as you add more threads; and yes, running a second process would work around that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or slow them down. We need to know whether the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that could be parallelized better.
I think you're finding out that the file cache is not ideal when one process writes data to many files concurrently. The file cache has to sync to disk when the number of dirty pages exceeds a threshold, and it seems concurrent writers in one process hit that threshold faster than a single-threaded writer does. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, maybe because your code snippet is not really informative.

How to download 5 files at a time using threads in .NET Framework 3.5

I need to download certain files using FTP. It is already implemented without using threads, and it takes too much time to download all the files.
So I need to use threads to speed up the process.
My code is like this:
foreach (string str1 in files)
{
    download_FTP(str1);
}
I referred to this, but I don't want every file to be queued at once; say, for example, 5 files at a time.
If the process is too slow, it most likely means that the network/Internet connection is the bottleneck. In that case, downloading the files in parallel won't significantly increase the performance.
It might be another story though if you are downloading from different servers. We may then imagine that some of the servers are slower than others. In that case, parallel downloads would increase the overall performance since the program would download files from other servers while being busy with slow downloads.
EDIT: OK, we have more info from you: Single server, many small files.
Downloading multiple files involves some overhead. You can decrease this overhead by grouping the files somehow (tar, zip, whatever) on the server side. Of course, this may not be possible. If your app were talking to a web server, I'd advise creating a zip file on the fly server-side according to the list of files transmitted in the request. But you are on an FTP server, so I'll assume you have nearly no flexibility server-side.
Downloading several files in parallel may well increase the throughput in your case. Be very careful though about restrictions set by the server, such as the maximum number of simultaneous connections. Also, keep in mind that if you have many simultaneous users, you'll end up with a large number of connections on the server: users x threads. That may prove counter-productive depending on the scalability of the server.
A commonly accepted rule of good behaviour is to limit each user to at most 2 simultaneous connections. YMMV.
Okay, as you're not using .NET 4 that makes it slightly harder - the Task Parallel Library would make it really easy to create five threads reading from a producer/consumer queue. However, it still won't be too hard.
Create a Queue<string> with all the files you want to download
Create 5 threads, each of which has a reference to the queue
Make each thread loop, taking an item off the queue and downloading it, or finishing if the queue is empty
Note that as Queue<T> isn't thread-safe, you'll need to lock to make sure that only one thread tries to fetch an item from the queue at a time:
string fileToDownload = null;
lock (padlock)
{
    if (queue.Count == 0)
    {
        return; // Done
    }
    fileToDownload = queue.Dequeue();
}
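Putting those pieces together, a rough sketch of the whole thing; download_FTP is the method from the question, while the member names and the five-thread count are just illustrative:

private readonly object padlock = new object();
private Queue<string> queue;

public void DownloadAll(IEnumerable<string> files)
{
    queue = new Queue<string>(files);

    var threads = new List<Thread>();
    for (int i = 0; i < 5; i++)
    {
        Thread t = new Thread(DownloadWorker);
        t.Start();
        threads.Add(t);
    }

    // Wait until the workers have drained the queue.
    foreach (Thread t in threads)
    {
        t.Join();
    }
}

private void DownloadWorker()
{
    while (true)
    {
        string fileToDownload;
        lock (padlock)
        {
            if (queue.Count == 0)
            {
                return; // Done
            }
            fileToDownload = queue.Dequeue();
        }
        download_FTP(fileToDownload);
    }
}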
As noted elsewhere, threading may not speed things up at all - it depends where the bottleneck is. If the bottleneck is the user's network connection, you won't be able to get more data down the same size of pipe just by using multi-threading. On the other hand, if you have a lot of small files to download from different hosts, then it may be latency rather than bandwidth which is the problem, in which case threading will help.
Look up ParameterizedThreadStart:
List<System.Threading.Thread> ThreadsToUse = new List<System.Threading.Thread>();
foreach (string str1 in files)
{
    // The lambda adapts download_FTP(string) to the object parameter that
    // ParameterizedThreadStart expects.
    var start = new System.Threading.ParameterizedThreadStart(obj => download_FTP((string)obj));
    var aThread = new System.Threading.Thread(start);
    aThread.Start(str1);
    ThreadsToUse.Add(aThread);
}
I remember something about Thread.Join - you can use it to wait for each of the threads to finish.
There is also something else you might want to look up, which I'm still trying to fully grasp: asynchronous calls. With these you will know when the file has been downloaded; with a normal thread you're going to have to find another way to flag that it's finished.
This may or may not help your speed. On the one hand, if your line speed is low then it won't help you much; on the other hand, some servers cap each connection at a certain speed, in which case multiple connections to the server should, in theory, give a slight increase in speed. How much of an increase, though, I cannot say.
Hope this helps in some way
I can add some experience to the comments already posted. In an app some years ago I had to generate a treeview of files on an FTP server. Listing files does not normally require actually downloading them, but some of the files were zipped folders and I had to download and unzip these (sometimes recursively) to display the files/folders inside. For a multithreaded solution, this required a 'FolderClass' for each folder that could keep state and so handle both unzipped and zipped folders. To start the operation off, one of these was set up with the root folder and submitted to a P-C queue and a pool of threads. As each folder was LISTed and iterated, more FolderClass instances were submitted to the queue for each subfolder. When a FolderClass instance reached the end of its LIST, it PostMessaged itself (it was not C#, for which you would need BeginInvoke or the like) to the UI thread, where its info was added to the listview.
This activity was characterised by a lot of latency-sensitive TCP connect/disconnect with occasional download/unzip.
A pool of, IIRC, 4-6 threads (as already suggested by other posters) provided the best performance on the single-core system I had at the time and, in this particular case, was much faster than a single-threaded solution. I can't remember the figures exactly, but no stopwatch was needed to detect the performance boost - something like 3-4 times faster. On a modern box with multiple cores, where LISTs and unzips could occur concurrently, I would expect even more improvement.
There were some problems - the visual ListView component could not keep up with the incoming messages (because of the multiple threads, data arrived for apparently 'random' positions in the treeview and so required continual tree navigation for display), and so the UI tended to freeze during the operation. Another problem was detecting when the operation had actually finished. These snags are probably not relevant to your download-many-small-files app.
Conclusion - I expect that downloading a lot of small files is going to be faster if multithreaded with multiple connections, if only from mitigating the connect/disconnect latency which can be larger than the actual data download time. In the extreme case of a satellite connection with high speed but very high latency, a large thread pool would provide a massive speedup.
Note the valid caveats from the other posters - if the server, (or its admin), disallows or gets annoyed at the multiple connections, you may get no boost, limited bandwidth or a nasty email from the admin!
Rgds,
Martin

C#: poor performance with multithreading with heavy I/O

I've written an application in C# that moves jpgs from one set of directories to another set of directories concurrently (one thread per fixed subdirectory). The code looks something like this:
string destination = "";
DirectoryInfo dir = new DirectoryInfo("");
DirectoryInfo[] subDirs = dir.GetDirectories();
foreach (DirectoryInfo d in subDirs)
{
    FileInfo[] files = d.GetFiles();
    foreach (FileInfo f in files)
    {
        f.MoveTo(destination);
    }
}
However, the performance of the application is horrendous - tons of page faults per second. The number of files in each subdirectory can get quite large, so I think a big performance penalty comes from context switching, to the point where it can't keep all the different file arrays in RAM at the same time and is going to disk nearly every time.
There are two different solutions I can think of. The first is rewriting this in C or C++; the second is to use multiple processes instead of multithreading.
Edit: The files are named based on a time stamp, and the directory they are moved to is based on that name. So the directory a file is moved to corresponds to the hour it was created; 3-27-2009/10, for instance.
We are creating a background worker per directory for threading.
Any suggestions?
Rule of thumb: don't parallelize operations with serial dependencies. In this case your hard drive is the bottleneck, and too many threads are just going to make performance worse.
If you are going to use threads, try to limit their number to the number of resources you have available (cores and hard disks), not the number of jobs you have pending (directories to copy).
Reconsidered answer
I've been rethinking my original answer below. I still suspect that using fewer threads would probably be a good idea, but as you're just moving files, it shouldn't actually be that IO intensive. It's possible that just listing the files is taking a lot of disk work.
However, I doubt that you're really running out of memory for the files. How much memory have you got? How much memory is the process taking up? How many threads are you using, and how many cores do you have? (Using significantly more threads than you have cores is a bad idea, IMO.)
I suggest the following plan of attack:
Work out where the bottlenecks actually are. Try fetching the list of files but not actually moving them. See how hard the disk is hit, and how long it takes.
Experiment with different numbers of threads, with a queue of directories still to process (see the sketch after this list).
Keep an eye on the memory use and garbage collections. The Windows performance counters for the CLR are good for this.
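A quick sketch of the second and third points: time the same workload at a few different thread counts and watch the gen-2 collection count as you go. MoveAllFiles is a hypothetical stand-in for your existing move logic, parameterised by the number of worker threads:

foreach (int threadCount in new[] { 1, 2, 4, 8 })
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    MoveAllFiles(threadCount); // hypothetical: your move logic run with N workers
    sw.Stop();

    Console.WriteLine("{0} thread(s): {1} ms, gen-2 collections so far: {2}",
        threadCount, sw.ElapsedMilliseconds, GC.CollectionCount(2));
}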
Original answer
Rewriting in C or C++ wouldn't help. Using multiple processes wouldn't help. What you're doing is akin to giving a single processor a hundred threads - except you're doing it with the disk instead.
It makes sense to parallelise tasks which use IO if there's also a fair amount of computation involved, but if it's already disk bound, asking the disk to work with lots of files at the same time is only going to make things worse.
You may be interested in a benchmark (description and initial results) I've recently been running, testing "encryption" of individual lines of a file. When the level of "encryption" is low (i.e. it's hardly doing any CPU work) the best results are always with a single thread.
If you've got a block of work that is dependent on a system bottleneck, in this case disk IO, you would be better off not using multiple threads or processes. All that you will end up doing is generating a lot of extra CPU and memory activity while waiting for the disk. You would probably find the performance of your app improved if you used a single thread to do your moves.
It seems you are moving a directory; surely just renaming/moving the directory would be sufficient? If the source and destination are on the same hard disk, it would be instant.
Also, capturing all the file info for every file is unnecessary; the name of the file would suffice.
The performance problem comes from the hard drive; there is no point in redoing everything in C/C++, nor in using multiple processes.
Are you looking at the page-fault count and inferring memory pressure from that? You might well find that the underlying Win32/OS file copy is using mapped files/page faults to do its work, and the faults are not a sign of a problem anyway. Much of Windows' own file handling is done via page faults (e.g. 'loading' executable code) - they're not a bad thing per se.
If you are suffering from memory pressure, then I would surmise that it's more likely to be caused by creating a huge number of threads (which are very expensive), rather than by the file copying.
Don't change anything without profiling, and if you profile and find the time is spent in framework methods which are merely wrappers on Win32 functions (download the framework source and have a look at how those methods work), then don't waste time on C++.
If GetFiles() is indeed returning a large set of data, you could write an enumerator, as in:
IEnumerable<string> GetFiles();
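A minimal sketch of such an enumerator using an iterator block, so file paths are streamed one at a time rather than materialised into big arrays up front (on .NET 4 and later, Directory.EnumerateFiles gives you this for free):

static IEnumerable<string> GetFiles(string root)
{
    // Yield paths lazily instead of building one large FileInfo[] per directory.
    foreach (string dir in Directory.GetDirectories(root))
    {
        foreach (string file in Directory.GetFiles(dir))
        {
            yield return file;
        }
    }
}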
So, you're moving files, one at a time, from one subfolder to another subfolder? Wouldn't you be causing lots of disk seeks as the drive head moves back and forth? You might get better performance from reading the files into memory (at least in batches if not all at once), writing them to disk, then deleting the originals from disk.
And if you're doing multiple sets of folders in separate threads, then you're moving the disk head around even more. This is one case where multiple threads isn't doing you a favor (although you might get some benefit if you have a RAID or SAN, etc).
If you were processing the files in some way, then multithreading could help if different CPUs could work on multiple files at once. But you can't get four CPUs to move one disk head to four different locations at once.
