How many threads to use? - c#

I know there are some existing questions and they provide a very good general perspective on things. I'm hoping to get some details on the C#/VB.Net side for the actual implementation (not philosophy) of some of these perspectives.
My Particular Case
I have a WCF Service which, amongst other things, receives files. For most of the service's life this particular area is actually just sat doing nothing - when work does come it arrives in high bursts of greatly varying quantities.
For each file received (which at a max can be thousands per second) the service needs to work on the files for between 1-10 seconds (each) depending on a number of other services, local resources, and network IO wait times.
To aid the service with these burst workloads I implemented a Queue system. Those thousands of files received per second are placed onto the Queue. A controller calculates the number of threads to use based on the size of the queue, up until it reaches a "Peak Max Threads" setting which prevents it from creating additional threads. These threads are placed in a thread pool, and reused to cycle through the queue. The controller will, at intervals, recalculate the number of threads required. If the queue size reduces, a relevant number of threads are released.
The age old problem
How many threads should I peak at? Clearly, adding a new thread every time a file was received would be silly for lack of a better word - the performance, at best, would deteriorate. Capping the threads when CPU utilization is only 10% across each core also doesn't seem to be the best use of resources.
So, is there an appropriate way to determine how many threads to cap at? I would rather the service could determine this for itself by sampling available resources, but is there a performance hit from doing so? I know the common answer is to monitor workloads, adjust the counts through trial and error until I find a number I like, but due to the nature of this service (long periods of idle followed by high/burst workloads) it could take a long time to get that kind of information.
What then if we move the server's image to a different host which is faster/slower/different to the first? I have to re-sample the process all over again?
Ideally what I'm after, is for the co-ordinator to intelligently increase the size of the threadpool until CPU utilisation is at x% (would 80% be reasonable? 90%? 99%?). Clearly, I want to do this without adding more threads than is necessary to hit x% otherwise all I'll end up with is threads not just waiting on IO resources, but awaiting each other too.
Thanks in advance!
Related questions (if you want some generic ideas):
How many threads to create?
How many threads is too many?
How many threads to create and when?
A Complication for you
Where would be the fun if I didn't make the problem more difficult?
As it currently stands, the service does hit 100% CPU during these bursts, regularly. The issue is the CPU utilisation spikes. It goes from idle (0-10%) to 100%, and back down again. I'm not sure I can help that - ideally I wouldn't take it all the way to 100%. The problem exists because the files mentioned are in fact images, and part of the service's process is to pass the image through to the System.Windows.Media black box which does some complex image processing for me.
There are then lulls in between the spikes because of the IO waits and other processing that goes on. If the spikes hitting 100% can't be helped (and I'm all for knowing how to prevent that, or if I should) how should I aim for the CPU utilisation graph to look? Sat constantly at 100%? Bouncing between 50-100? If I do go through the effort of sampling to decide what does seem to work best, is it guaranteed that the same graph will still work best after switching the virtual server's host?
This added complexity I won't take into consideration for those of you willing to answer. Feel free to ignore this section. However, any answer that also accounts for this complication, or even answers that just provide tips on how to handle it, I'll at the very least upvote!
Heck of a long question - sorry about that - and thanks for reading so much!!

PerformanceCounter allows you to query for processor usage.
However, have you tried something the framework provides?
foreach (var file in files)
{
    var workitem = file;
    Task.Factory.StartNew(() =>
    {
        // do work on workitem
    }, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
You can tune the concurrency level for Tasks in the Task.Factory.
By default, the .NET 4 thread pool schedules the number of threads it finds performs best on the hardware where it runs, but you can change how that works with the previous link.
Probably you need a custom solution but it would be ok to benchmark yours with the standard.
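As one way to cap concurrency with what the framework provides, here is a minimal sketch using Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism. This is not necessarily the knob the link above pointed to; the class name BoundedProcessor and the processFile callback are hypothetical, and the peak counter is there only to show that the cap holds.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedProcessor
{
    // Runs processFile over every item with at most maxParallel running at once.
    // Returns the highest concurrency actually observed (for demonstration only).
    public static int ProcessAll(IEnumerable<string> files, int maxParallel, Action<string> processFile)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        int running = 0, peak = 0;
        object gate = new object();

        Parallel.ForEach(files, options, file =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            try { processFile(file); }
            finally { lock (gate) { running--; } }
        });
        return peak;
    }
}
```

A reasonable starting value for maxParallel on CPU-heavy work is Environment.ProcessorCount; for work that mostly waits on IO a higher value may benchmark better.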
Edit: (comment note):
No links needed; I may have used an invented term, since English is not my first language. What I mean is: have a variable where you store the variance measured at the previous check (prevDelta), and call the current one delta. Each time you 'check', add delta to the variable averageDelta and divide by 2. The variable averageDelta will mostly be low, since most of the time you have no activity. Then have another delta variable, built from the values you already have (delta - prevDelta), that is not the average of all deltas but the average of the deltas over a short timespan (you will have to come up with an algorithm to calculate this temporal variance accurately). Once that is done, you can compare the average delta with the 'temporal delta'. The average delta will be mostly low and will go up slowly when bursts come; over the same period the temporal delta will go up really fast. Then, when the burst stops, the average delta goes down slowly and the 'temporal' one goes down really fast.
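The idea in this note can be sketched as two running averages of the change in queue length, one slow and one fast. This is only one possible reading of the description: the class BurstDetector, the use of exponential smoothing, and the smoothing factors 0.05 and 0.5 are all assumptions, not something the note specifies.

```csharp
using System;

public sealed class BurstDetector
{
    private readonly double slowAlpha;   // small factor: long-term average moves slowly
    private readonly double fastAlpha;   // large factor: short-term average reacts quickly
    private int previousQueueLength;

    public double LongTermDelta { get; private set; }
    public double ShortTermDelta { get; private set; }

    public BurstDetector(double slowAlpha = 0.05, double fastAlpha = 0.5)
    {
        this.slowAlpha = slowAlpha;
        this.fastAlpha = fastAlpha;
    }

    // Call once per check interval with the current queue length.
    public void Sample(int queueLength)
    {
        double delta = queueLength - previousQueueLength;
        previousQueueLength = queueLength;
        LongTermDelta += slowAlpha * (delta - LongTermDelta);
        ShortTermDelta += fastAlpha * (delta - ShortTermDelta);
    }

    // The short-term average has pulled well above the long-term one: a burst is arriving.
    public bool BurstStarting(double margin) { return ShortTermDelta > LongTermDelta + margin; }

    // The short-term average has dropped well below the long-term one: the burst is draining.
    public bool BurstEnding(double margin) { return ShortTermDelta < LongTermDelta - margin; }
}
```

The controller would call Sample on its existing timer and add threads when BurstStarting returns true, releasing them when BurstEnding does. The margin has to be chosen by experiment.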

You could use I/O Completion Ports to asynchronously fetch your images without tying up any threads until it comes time to process what you have fetched.
You could then limit your thread pool based on the number of cores on your client PC, making sure to leave a core free for other processes to use.
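A rough sketch of that split, assuming the fetch can be expressed as an awaitable call (on Windows, .NET's asynchronous socket and file operations are completed via I/O completion ports). The names CoreBoundPipeline, fetchAsync and process are hypothetical; the CPU-bound stage is limited to one fewer than the number of logical cores by a SemaphoreSlim.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CoreBoundPipeline
{
    // Leave one core free for other processes, but never go below one worker.
    public static int WorkerLimit(int logicalCores)
    {
        return Math.Max(1, logicalCores - 1);
    }

    public static async Task<TResult[]> RunAsync<TItem, TData, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TData>> fetchAsync,   // IO stage: no thread is blocked while it waits
        Func<TData, TResult> process,          // CPU stage: runs on the thread pool
        int workerLimit)
    {
        using (var cpuSlots = new SemaphoreSlim(workerLimit))
        {
            List<Task<TResult>> tasks = items.Select(async item =>
            {
                TData data = await fetchAsync(item);
                await cpuSlots.WaitAsync();     // at most workerLimit items are processed at once
                try { return await Task.Run(() => process(data)); }
                finally { cpuSlots.Release(); }
            }).ToList();

            return await Task.WhenAll(tasks);   // results come back in input order
        }
    }
}
```

In practice workerLimit would be CoreBoundPipeline.WorkerLimit(Environment.ProcessorCount).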

What about a dynamic thread manager that monitors their overall performance and according to this spawns new threads or kills old ones? The main problem here is only how to define the performance measurement function. The rest can be done with a periodically scheduled job that increases or decreases the number of threads according to the previous number of threads and performance in that case or something like that. Maybe also in connection to resources utilization (CPU, disks, network...).
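One simple form such a manager could take is a hill-climbing loop: change the thread count by one, measure throughput over the next interval, and reverse direction whenever throughput drops. The sketch below assumes throughput (for example files completed per second) is the performance measure, which is exactly the open design question mentioned above; ThreadCountTuner is a hypothetical name.

```csharp
using System;

public sealed class ThreadCountTuner
{
    private readonly int min;
    private readonly int max;
    private int direction = 1;            // +1 = try more threads, -1 = try fewer
    private double lastThroughput = -1;   // negative means "no measurement yet"

    public int Current { get; private set; }

    public ThreadCountTuner(int min, int max)
    {
        this.min = min;
        this.max = max;
        Current = min;
    }

    // Pass the throughput measured while running with Current threads.
    // Returns the thread count to use for the next interval.
    public int Next(double throughput)
    {
        if (lastThroughput >= 0 && throughput < lastThroughput)
            direction = -direction;       // the last change made things worse: turn around

        lastThroughput = throughput;
        Current = Math.Min(max, Math.Max(min, Current + direction));
        return Current;
    }
}
```

A tuner like this never settles; it keeps oscillating by a thread or two around the best value it can find, and noisy measurements can send it the wrong way, so the sampling interval needs to be long enough to smooth that out.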

Related

How many threads my current machine can handle optimally?

Original Question
Is there a heuristic or algorithm to programmatically find out how many threads I can open in order to obtain maximum throughput of an async operation such as writing to a socket?
Further explained question
I'm assisting an algorithms professor in my college and he posted an assignment where the students are supposed to learn the basics of distributed computing, in his words: Sockets... The assignment is to create a "server" that listens on a given port, receives a string, performs a simple operation on it (I think it's supposed to count its length) and returns Ok or Rejected... The "server" must be able to handle a minimum of 60k submissions per second... My job is to create a little app to simulate 60K clients...
I've managed to automate the distribution of servers and clients across a university lab in order to test 10 servers at a time (network infrastructure became the bottleneck). The problem here is: a lab is homogeneous, 2 labs are not! If not tuned correctly the "client" usually can't simulate 60k users and report back to me, especially when the lab is an older one, AND I would like to provide the client to the students so they could test their own "server" more reliably... The ability to determine the optimal number of threads to spawn has now become vital! PS: Fire-and-Forget is not an option because the client also tests if the returned value is correct, e.g. if I send "Short sentence" I know the result will be "Rejected" and I have to check it...
A class has 60 students... and there's the morning class and the night class, so each week there will be 120 "servers" to test, because as the semester moves along the "server" part will have to do more stuff; the client won't (it will always only send a string and receive "Ok"/"Rejected")... So there's enough work to be done to justify all this work I'm doing...
Edit1
- Changed from Console to an async operation
- I don't want the maximum number of threads, I want the number that will provide maximum throughput! I imagine that on a 6 core PC the number will be higher than on a 2 core PC
Edit2
- I'm building a simple console app to perform some tests on another app... one of those is a specific kind of load test (RUDY attack) where I have to simulate a lot of clients performing a specific attack... The thing is that there's a curve between throughput and number of threads, where after a given point, opening more threads actually decreases my throughput...
Edit3
Added more context to the initial question...
The Windows console isn't really meant to be used by more than one thread; otherwise you get interleaved writes. So the thread count for maximum console output would be one.
It's when you're doing computation that multiple threads make sense. Then, it's rarely useful to use more than one thread per logical processor - or one background thread plus one UI thread for UI apps on a single-core processor.
It depends entirely on the situation - so the actual answer to your question of "is there a magical algorithm that will give me the perfect setup for max throughput?" is ... no.
Sure, more cores means more threads that can run and less context-switching. That said, you've edited your question to include an IO-bound example. IO-bound operations generally make use of completion ports for async operations. So, in that particular case, removing your use of your own dedicated threads for such an operation would be your main concern towards achieving maximum throughput.
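To illustrate the point about dropping dedicated threads for IO-bound work, here is a small sketch that starts many overlapping asynchronous operations and checks each reply. The sendAsync delegate is a hypothetical stand-in for the real socket round trip (send a string, read "Ok" or "Rejected"), and AsyncClients is an invented name.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncClients
{
    // Starts clientCount overlapping operations without creating a thread per client,
    // then returns how many of them produced the expected reply.
    public static async Task<int> RunAsync(int clientCount, Func<int, Task<string>> sendAsync, string expected)
    {
        Task<string>[] pending = Enumerable.Range(0, clientCount)
                                           .Select(i => sendAsync(i))
                                           .ToArray();   // ToArray starts every operation now

        string[] replies = await Task.WhenAll(pending);
        return replies.Count(reply => reply == expected);
    }
}
```

With real sockets the number of simultaneous operations may still need a cap (for example to stay within ephemeral port or server connection limits), but that cap is no longer tied to a thread count.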
Since you changed the question, I'll provide another answer.
It depends on the workload. If you're doing compute-heavy tasks, then use every logical processor. If you're doing IO, then use async calls rather than spawning new threads.
Of course, .NET has a way of managing this for you - the Thread Pool. Use it. Don't worry about how many threads you need, just kick off tasks.
If you are actually trying to do something productive (instead of just printing to the console), you should use System.Threading.Tasks.Task.Factory.StartNew. You can start as many tasks as you want. The runtime will try to distribute them amongst the available hardware threads as well as it can.

Control Memory-Hungry Multi-Threaded App

This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats up all available memory and hangs.
What is the most efficient way to exploit all CPU+RAM capabilities (High Performance computing i.e 12Cores/16GB RAM) without putting the system down to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)
I have to specify that I have a workers buffer queue with N workers, every time one finishes and dies a new one is launched via a Queue. This works pretty fine as of now. But I would like to avoid "manually" and "empirically" setting the number of simultaneous threads and have an intelligent scalable system that drops as many threads at a time that the system can properly handle, and stop at a "reasonable" memory usage (the target server is dedicated to the app so there is no problem regarding other applications except the system)
PS : I know that .Net 3.5 comes with Thread Pools and .Net 4 has interesting TPL capabilities, which I am still considering right now (I never went very deep into this so far).
PS 2 : After reading this post I was a bit puzzled by the "don't do this" answers. Though I think such request is fair for a memory-demanding computing program.
EDIT
After reading this post I will try to use WMI features
All built-in threading capabilities in .NET do not support adjusting according to memory usage. You need to build this yourself.
You can either predict memory usage or react to low memory conditions. Alternatives:
(1) Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
(2) Launch tasks as they come and throttle as soon as some of them start to fail because of OOM. Restart them later. This alternative sucks big time because your process will do garbage collections like crazy to avoid the OOMs.
I recommend (1).
You can either look at free system memory or your own process's memory usage. In order to get the memory usage I recommend looking at private bytes using the Process class.
If you set aside 1GB of buffer on your 16GB system you run at 94% efficiency and are pretty safe.
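A minimal sketch of the first alternative, using the process's own private bytes (Process.PrivateMemorySize64) rather than free system memory. MemoryGate and its parameters are hypothetical names, and the probe is injectable so that a different measurement, such as free system memory, can be swapped in.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class MemoryGate
{
    private readonly long budgetBytes;
    private readonly Func<long> readUsedBytes;

    // budgetBytes: the usage above which no new task should be launched.
    // readUsedBytes: optional probe; defaults to this process's private bytes.
    public MemoryGate(long budgetBytes, Func<long> readUsedBytes = null)
    {
        this.budgetBytes = budgetBytes;
        this.readUsedBytes = readUsedBytes ?? ReadPrivateBytes;
    }

    private static long ReadPrivateBytes()
    {
        using (Process self = Process.GetCurrentProcess())
            return self.PrivateMemorySize64;
    }

    public bool HasHeadroom
    {
        get { return readUsedBytes() < budgetBytes; }
    }

    // Polls until usage is below the budget or the timeout expires.
    // Returns true if it is safe to launch another task.
    public bool WaitForHeadroom(TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!HasHeadroom)
        {
            if (clock.Elapsed >= timeout) return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

The worker queue would call WaitForHeadroom before starting each new test. On the 16 GB machine from the question, a budget of roughly 15 GB would correspond to the 1 GB buffer suggested above.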

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help to state that your disk I/O is "only at 10%". I'm not sure such a performance counter even exists.
The reason it gets slower with more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read, while on the background the next bit is being read in, but I think you'll see very little performance improvement there.
I've had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my harddrive and I found that the CPU is way too fast compared to the storage system and I got similar results trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
First of all, on a 24-core box, if you are using only 4 threads the most CPU it could ever use is 16.7% (4/24), so really you are getting 60% utilization (10% out of 16.7%), which is fairly good.
It is hard to tell if your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see in which sections of code it is spending most of its time. If it is sitting on a read/write operation it is I/O bound.
It is possible you have some form of inter-thread locking being used. That would cause the program to slow down as you add more threads, and yes, running a second process would fix that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or make things slower; we need to know if the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that can be parallelized better.
I think you have found that the file cache is not ideal when one process writes data to many files concurrently. The file cache has to sync to disk when the number of dirty cache pages exceeds a threshold. It seems that concurrent writers in one process hit that threshold faster than a single-threaded writer does. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, maybe because your code snippet is not really informative.

Does multi-threading equal less CPU?

I have a small list of rather large files that I want to process, which got me thinking...
In C#, I was thinking of using Parallel.ForEach of TPL to take advantage of modern multi-core CPUs, but my question is more of a hypothetical character;
Does the use of multi-threading in practice mean that it would take longer to load the files in parallel (using as many CPU cores as possible), as opposed to loading each file sequentially (but probably with less CPU utilization)?
Or to put it in another way (:
What is the point of multi-threading? More tasks in parallel but at a slower rate, as opposed to focusing all computing resources on one task at a time?
In order to not increase latency, parallel computational programs typically only create one thread per core. Applications which aren't purely computational tend to add more threads so that the number of runnable threads is the number of cores (the others are in I/O wait, and not competing for CPU time).
Now, parallelism on disk-I/O bound programs may well cause performance to decrease, if the disk has a non-negligible seek time then much more time will be wasted performing seeks and less time actually reading. This is called "churning" or "thrashing". Elevator sorting helps somewhat, true random access (such as solid state memories) helps more.
Parallelism does almost always increase the total raw work done, but this is only important if battery life is of foremost importance (and by the time you account for power used by other components, such as the screen backlight, completing quicker is often still more efficient overall).
You asked multiple questions, so I've broken up my response into multiple answers:
Multithreading may have no effect on loading speed, depending on what your bottleneck during loading is. If you're loading a lot of data off disk or a database, I/O may be your limiting factor. On the other hand if 'loading' involves doing a lot of CPU work with some data, you may get a speed up from using multithreading.
Generally speaking you can't focus "all computing resources on one task." Some multicore processors have the ability to overclock a single core in exchange for disabling other cores, but this speed boost is not equal to the potential performance benefit you would get from fully utilizing all of the cores using multithreading/multiprocessing. In other words it's asymmetrical -- if you have a 4 core 1 GHz CPU, it won't be able to overclock a single core all the way to 4 GHz in exchange for disabling the others. In fact, that's the reason the industry is going multicore in the first place -- at least for now we've hit limits on how fast we can make a single CPU run, so instead we've gone the route of adding more CPUs.
There are 2 reasons for multithreading. The first is that you want two tasks to run at the same time simply because it's desirable for both to be able to happen simultaneously -- e.g. you want your GUI to continue to respond to clicks or keyboard presses while it's doing other work (event loops are another way to accomplish this though). The second is to utilize multiple cores to get a performance boost.
For loading files from disk, this is likely to make things much slower. What happens is the operating system tries to lay out files on disk such that you should only need to do an expensive disk seek once for each file. If you have a lot of threads reading a lot of files, you're gonna have contention over which thread has access to the disk, and you'll have to seek back to the right place in the file every time the next thread gets a turn.
What you can do is use exactly two threads. Set one to load all of the files in the background, and let the other remain available for other tasks, like handling user input. In C# winforms, you can do this easily with a BackgroundWorker control.
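Outside WinForms, the same two-thread arrangement can be sketched with a single background task that reads the files strictly one after another while the calling thread stays free. This swaps Task.Run in for the BackgroundWorker control mentioned above; BackgroundLoader is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class BackgroundLoader
{
    // Reads the files sequentially on one background task, so the disk only
    // ever serves one reader; the caller can carry on with other work meanwhile.
    public static Task<Dictionary<string, byte[]>> LoadAllAsync(IEnumerable<string> paths)
    {
        return Task.Run(() =>
        {
            var contents = new Dictionary<string, byte[]>();
            foreach (string path in paths)
                contents[path] = File.ReadAllBytes(path);
            return contents;
        });
    }
}
```

The caller starts LoadAllAsync, keeps handling input, and awaits (or polls) the returned task when the data is needed.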
Multi-threading is useful for highly parallelizable tasks. CPU intensive tasks are perfect. Your CPU has many cores, many threads can use many cores. They'll use more CPU time, but in the end they'll use less "user" time. If your app is I/O bound, then multithreading isn't always the solution (but it COULD help).
It might be helpful to first understand the difference between Multithreading and Parallelism, as more often than not I see them being used rather interchangeably. Joseph Albahari has written a quite interesting guide about the subject: Threading in C# - Part 5 - Parallelism
As with all great programming endeavors, it depends. By and large, you'll be requesting files from one physical store, or one physical controller which will serialize the requests anyhow (or worse, cause a LOT of head back-and-forth on a classical hard drive) and slow down the already slow I/O.
OTOH, if the controllers and the medium are separate, multiple cores loading data from them should be improved over a sequential method.

How to Download 5 files at a time using Thread in .net framework 3.5

I need to download certain files using FTP. It is already implemented without using threads, but it takes too much time to download all the files.
So I need to use some threads to speed up the process.
My code is like:
foreach (string str1 in files)
{
    download_FTP(str1);
}
I referred to this, but I don't want every file to be queued at once. Say, for example, 5 files at a time.
If the process is too slow, it means most likely that the network/Internet connection is the bottleneck. In that case, downloading the files in parallel won't significantly increase the performance.
It might be another story though if you are downloading from different servers. We may then imagine that some of the servers are slower than others. In that case, parallel downloads would increase the overall performance since the program would download files from other servers while being busy with slow downloads.
EDIT: OK, we have more info from you: Single server, many small files.
Downloading multiple files involves some overhead. You can decrease this overhead by somehow grouping the files (tar, zip, whatever) on server-side. Of course, this may not be possible. If your app would talk to a web server, I'd advise to create a zip file on the fly server-side according to the list of files transmitted in the request. But you are on an FTP server so I'll assume you have nearly no flexibility server-side.
Downloading several files in parallel may probably increase the throughput in your case. Be very careful though about restrictions set by the server such as the max amount of simultaneous connections. Also, keep in mind that if you have many simultaneous users, you'll end up with a big amount of connections on the server: users x threads. Which may prove counter-productive according to the scalability of the server.
A commonly accepted rule of good behaviour is to limit yourself to at most 2 simultaneous connections per user. YMMV.
Okay, as you're not using .NET 4 that makes it slightly harder - the Task Parallel Library would make it really easy to create five threads reading from a producer/consumer queue. However, it still won't be too hard.
Create a Queue<string> with all the files you want to download
Create 5 threads, each of which has a reference to the queue
Make each thread loop, taking an item off the queue and downloading it, or finishing if the queue is empty
Note that as Queue<T> isn't thread-safe, you'll need to lock to make sure that only one thread tries to fetch an item from the queue at a time:
string fileToDownload = null;
lock(padlock)
{
    if (queue.Count == 0)
    {
        return; // Done
    }
    fileToDownload = queue.Dequeue();
}
As noted elsewhere, threading may not speed things up at all - it depends where the bottleneck is. If the bottleneck is the user's network connection, you won't be able to get more data down the same size of pipe just by using multi-threading. On the other hand, if you have a lot of small files to download from different hosts, then it may be latency rather than bandwidth which is the problem, in which case threading will help.
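Put together, the queue-and-five-threads steps from this answer might look like the sketch below, which sticks to types available in .NET 3.5. QueueDownloader and the download callback are hypothetical; in the question's code the callback would be download_FTP.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class QueueDownloader
{
    // Starts workerCount threads that drain a shared queue, then waits for all of them.
    public static void DownloadAll(IEnumerable<string> files, int workerCount, Action<string> download)
    {
        Queue<string> queue = new Queue<string>(files);
        object padlock = new object();
        List<Thread> workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            Thread worker = new Thread(() =>
            {
                while (true)
                {
                    string fileToDownload;
                    lock (padlock)                  // Queue<T> is not thread-safe
                    {
                        if (queue.Count == 0)
                        {
                            return;                 // Done
                        }
                        fileToDownload = queue.Dequeue();
                    }
                    download(fileToDownload);       // the slow part runs outside the lock
                }
            });
            workers.Add(worker);
            worker.Start();
        }

        foreach (Thread worker in workers)
        {
            worker.Join();                          // block until every worker has finished
        }
    }
}
```

A call such as QueueDownloader.DownloadAll(files, 5, download_FTP) would then keep at most five downloads in flight. An exception thrown by the callback would end that worker's thread and, being unhandled, normally the whole process, so real code would want a try/catch around the download call.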
look up on ParameterizedThreadStart
// Requires: using System.Collections.Generic; using System.Threading;
// download_FTP must accept a single object parameter to match ParameterizedThreadStart.
List<Thread> ThreadsToUse = new List<Thread>();
foreach (string str1 in files)
{
    Thread aThread = new Thread(new ParameterizedThreadStart(download_FTP));
    ThreadsToUse.Add(aThread);
    aThread.Start(str1); // runs download_FTP(str1) on the new thread
}
I remember something about Thread.Join: it lets the calling thread wait until a given thread has finished.
There is also something else you might want to look up, which I'm still trying to fully grasp: AsyncThreads. With these you will know when the file has been downloaded. With a normal thread you're going to have to find another way to flag that it has finished.
This may or may not help your speed: if your line speed is low, it won't help you much.
On the other hand, some servers cap each connection at a certain speed, in which case this will in theory set up multiple connections to the server and so give a slight increase in speed. How much of an increase, though, I cannot say.
Hope this helps in some way
I can add some experience to the comments already posted. In an app some years ago I had to generate a treeview of files on an FTP server. Listing files does not normally require actual downloading, but some of the files were zipped folders and I had to download these and unzip them, (sometimes recursively), to display the files/folders inside. For a multithreaded solution, this required a 'FolderClass' for each folder that could keep state and so handle both unzipped and zipped folders. To start the operation off, one of these was set up with the root folder and submitted to a P-C queue and a pool of threads. As the folder was LISTed and iterated, more FolderClass instances were submitted to the queue for each subfolder. When a FolderClass instance reached the end of its LIST, it PostMessaged itself, (it was not C#, for which you would need BeginInvoke or the like), to the UI thread where its info was added to the listview.
This activity was characterised by a lot of latency-sensitive TCP connect/disconnect with occasional download/unzip.
A pool of, IIRC, 4-6 threads, (as already suggested by other posters), provided the best performance on the single-core system I had at the time and, in this particular case, was much faster than a single-threaded solution. I can't remember the figures exactly, but no stopwatch was needed to detect the performance boost - something like 3-4 times faster. On a modern box with multiple cores where LISTs and unzips could occur concurrently, I would expect even more improvement.
There were some problems - the visual ListView component could not keep up with the incoming messages, (because of the multiple threads, data arrived for apparently 'random' positions on the treeview and so required continual tree navigation for display), and so the UI tended to freeze during the operation. Another problem was detecting when the operation had actually finished. These snags are probably not relevant to your download-many-small-files app.
Conclusion - I expect that downloading a lot of small files is going to be faster if multithreaded with multiple connections, if only from mitigating the connect/disconnect latency which can be larger than the actual data download time. In the extreme case of a satellite connection with high speed but very high latency, a large thread pool would provide a massive speedup.
Note the valid caveats from the other posters - if the server, (or its admin), disallows or gets annoyed at the multiple connections, you may get no boost, limited bandwidth or a nasty email from the admin!
Rgds,
Martin
