Program executing slow when running many threads

Program executing slow when running many threads - c#

I have written a program in C# that does a lot of parallel work using different threads. When i reach approx 300 threads the GUI of the program starts to become slow and the execution of threads is also slowing down drastically. The threads are reading and writing data from a mySQL Database runnning on a different machine.
The funny thing is that if i split the work between two processes on the same machine everything runs perfect. Is there a thread limit per process in the .net framework or in windows? Or why am I getting this behaviour? Could it be a network related problem? I am running Windows 7 Ultimate and i have tried both VS2010 and VS 2012 with the same behaviour.

The way processor time is allocated is that the Operating System gives processor time to every process, then every process gives time to every thread.
So two processes will get twice the processor time, and that's why it works faster if you divide the program into two processes.
If you want to make the GUI run smoother, just set the priority higher for that thread.
This way the GUI thread will get more processor time then the other threads, but not so much that it will noticeably slow down the other threads.

300 threads is silly.
The number of threads should be in the range of your number of cores (2..8) and/or the max simultaneous connections (sometimes only 4 over TCP) your system supports.
Get beyond that and you're only wasting memory, at 1 MB per thread. In a 32bit system, 300 MB is already consuming a lot of the available mem space. And I assume each thread has some buffers attached.
If 2 separate processes perform better than1 then it probably isn't the context switching but either memory usage or a connection limit that holds you back.

Use ThreadPool. That should automatically allocate the optimal number of threads based on your system by throttling the number of threads in existence. You can also set the maximum number of threads allowable at any one time.
Also, if you're allocating thread to parallelize tasks from within a for-loop, foreach-loop, or linq statment you should look at the Parallel Class or PLINQ.

The accepted answer to this question will probably explain what is happening, but 300 threads seems like to many to be a good idea for any normal application.

At first if you have 300 threads for an application then probably you should rethink about your program design.
Setting up GUI threads priority may give you a better performance of GUI. But if you run so much thread the OS have to allocate space in program stack. And the stack is a continuous segment of the memory. So each time you create a new thread the allocated memory space for the stack may be incapable to hold the new thread. And then the OS must have to allocate a larger continuous space in the memory and copy all the data from the old stack to new stack. So obviously this may cause performance slow of your program.

Related

Threading issues with > 30 threads. CPU scales non-linearly

I am having some trouble with my C# application.
I made sure threads do not access any resources outside themselves.
Now I have threadpool thread that makes a tcp connection, creates the thread objects and runs, with 1 thread performance is great. With 50 threads it seems the same, maybe 5-10% slower, CPU 10-20%. With 100 threads, the CPU usage goes from 10-20% to 70-99%.
One of our developers said that windows threads suck compared to linux thread and the context switching is incurring huge penalties. He proposes to create multiplexing with 4-8 core threads running all the instances.
But I thought problems like this start happening once you have 1000+ threads. Can anyone comment with some good sources to read more about this topic, and about thread / cpu performance and correct practices?
EDIT: OK Many answers seem a little off point because some assumptions are being made so I will add some extra points:
Running 3 applications with 50 threads at 10-20% cpu usage makes them all use that much. 30-60% CPU usage total.
Running 1 application with 150 threads makes it cap cpu at 70-99%.
This is what i mean by threads not scaling.

To expand on my comment..
It's not that Windows threads "suck" in comparison to POSIX threads it's just that you're trying to do more things than your CPU can physically handle at a time. CPU Usage is not particularly a relevant performance indicator that you should be looking at here.
If your CPU has 4 cores, your optimum amount of constantly-running threads is 4. Any more and performance degradation is going to happen as yes, context switching will have a performance impact as it tries to process through the threads simultaneously with only 1 resource.
Think of your threads as giant stacks of books on your table, you've got to knock each individual book off the top of each stack and you want them all doing as fast as you can. You've got 4 of these book stacks (threads) but only 2 arms (cores), how do you do it? The most likely option is to alternate which stack you knock books off each time, so there's no real performance benefit as the time taken for a single stack is going to take as long as any other.
The only time when this would differ is if you're running a blocking (ie. waiting for I/O) operation and your threads are idle. In this idle time your cores are free to work on another thread which can give a perceived performance benefit. Of course, when the resource that your other thread is waiting for becomes available you're back in the same situation you are in currently.

Slow worker thread performance with priority queue

I was attempting to use worker threads to speed up a larger algorithm when I noticed that using independent priority queue's on more threads actually slowed performance down. So I wrote a small test case.
In which I query how many threads to start, set each thread to its own processor, and the push and pop a lot of stuff from my priority queues. Each thread owns it's own priority queue, and they're allocated separately so I don't suspect false sharing.
I put the test case here, because it's longer than a snippet.
(The processor affinity bit comes from NCrunch)
The priority queue is of my own creation because .NET didn't have a built in queue. It uses a Pairing Heap if that makes any difference.
At any rate if I run the program with one thread and one core, it gets about 100% usage.
Usage drops with two threads / two cores
And eventually pittles down to 30% usage with all 8 cores.
Which is a problem because the drop in performance nullifies any would be gain from multithreading. What's causing the drop in performance? Each queue is completely independent of the other thread's

Some problems like solving pi are more suited to parallelization and hyperthreading can actually give you a speed up. When you are dealing with a heavy memory problem like you are, Hyperthreading can't help and can actually hurt. Check out "pipelining" in CPU architecture.
There are not many practical problems for which you can get a 2x speedup by using 2-cpus. The more cpus, the more overhead. In your test case algorithm, I suspect cores are having to wait for the memory subsystem. If you tweak the memory requirements, you will see an increase in performance (and utilization) as you move the memory requirements closer to the CPU cache-size.

The OS is assigning the processing to whichever CPU it wishes to at a moment. Therefore, you are seeing every processor do some amount of work.
Additionally, when you say "drop in performance", have you checked how many contentions the system is creating? You probably are relieving yourself of contentions amongst the threads as well.

Multithreading takes more time than Sequential Execution in C#

I am using Multithreading in my while loop ,
as
while(reader.read())
{
string Number= (reader["Num"] != DBNull.Value) ? reader["Num"].ToString() : string.Empty;
threadarray[RowCount] = new Thread(() =>
{
object ID= (from r in Datasetds.Tables[0].AsEnumerable()
where r.Field<string>("Num") == Number
select r.Field<int>("ID")).First<int>();
});
threadarray[RowCount].Start();
RowCount++;
}
But with sequential execution ,for 200 readers it just takes 0.4 s
but with threading it takes 1.1 sec... This is an example but i have same problem when i execute it with number of lines of code in threading with multiple database operations.
for sequential it takes 10 sec for threading it takes more...
can anyone please suggest me?
Thanks...

Threading is not always quicker and in many cases can be slower (such as the case seen here). There are plenty of reasons why but the two most significant are
Creating a thread is a relatively expensive OS operation
Context switching (where the CPU stops working on one thread and starts working on another) is again a relatively expensive operation
Creating 200 threads will take a fair amount of time (with the default stack size this will allocate 200MB of memory for stacks alone), and unless you have a machine with 200 cores the OS will also need to spend a fair amount of time context switching between those threads too.
The end result is that the time that the machine spends creating threads and switching between them simply outstrips the amount of time that the machine spends doing any work. You may see a performance improvement if you reduce the number of threads being used. Try starting with 1 thread for each core that your machine has.
Multithreading where you have more threads than cores is generally only useful in scenarios where the CPU is hanging around waiting for things to happen (like disk IO or network communication). This isn't the case here.

Threading isn't always the solution and the way you're using it is definitely not thread-safe. Things like disk I/O or other bottlenecks won't benefit from threading in certain circumstances.
Also, there is a cost for starting up threads. Not that I would recommend it for your situation, but check out the TPL. http://msdn.microsoft.com/en-us/library/dd460717.aspx

Multithreading usually is a choice for non blocking execution. Like everything on earth it has its associated costs.
For the commodity of parallel execution, we pay with performance.
Usually there is nothing faster then sequential execution of a single task.
It's hard to suggest something real, in you concrete scenario.
May be you can think about multiple process execution, instead of multiple threads execution.
But I repeast it's hard to tell if you will get benefits from this, without knowing application complete architecture and requirements.

it seems you are creating a thread for each read(). so if it has 200 read(), you have 200 threads running (maybe fewer since some may finished quickly). depends on what you are doing in the thread, 200 threads running at the same time may actually slow down the system because of overheads like others mentioned.
multuthread helps you when 1) the job in the thread takes some time to finish; 2) you have a control on how many threads running at the same time.
in your case, you need try, say 10 threads. if 10 threads are running, wait until 1 of them finished, then allocate the thread to a new read().
if the job in the thread does not take much time, then better use single thread.

Sci Fi author and technologist Jerry Pournelle once said that in an ideal world every process should have its own processor. This is not an ideal world, and your machine has probably 1 - 4 processors. Your Windows system alone is running scads of processes even when you yourself are daydreaming. I just counted the processes running on my Core 2 Quad XP machine, and SYSTEM is running 65 processes. That's 65 processes that have to be shared between 4 processors. Add more threads and each one gets only a slice of processor power.
If you had a Beowulf Cluster you could share threads off into individual machines and you'd probably get very good timings. But your machine can't do this with only 4 processors. The more you ask it to do the worse performance is going to be.

Threading cost - minimum execution time when threads would add speed

I am working on a C# application that works with an array. It walks through it (meaning that at one time only a narrow part of the array is used). I am considering adding threads in it to make it perform faster (it runs on a dualcore computer). The problem is that I do not know if it would actually help, because threads cost something and this cost could easily be more than the parallel gain... So how do I determine if threading would help?

Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.

If your application is CPU bound (i.e. it isn't spending time trying to read files or waiting for data from a device) and there is little to no sharing of live data (data being altered, if its read only its fine) between the threads then you can pretty much increase the speed by 50->75% by adding another thread (as long as it still remains CPU bound of course).
The main overhead in multithreading comes from 2 places.
Creation & initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves swaps between kernel and user mode, this is expensive though a once off per thread so you can pretty much ignore it if the thread is running for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool as it will keep the thread on hand and not need to be recreated.
Handling synchronization of data. If one thread is reading from data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half written value. These locks are generally quite slow as well. To mitigate this problem, you need to design your data layout so that the threads don't need to read or write to the same data as much as possible. If you do need a lot of these locks it can then become slower than the single thread option.
In short, if you are doing something that requires the CPU's to share a lot of data, then multi-threading it will be slower and if the program isn't CPU bound there will be little or no difference (could be a lot slower depending on what it is bound to, e.g. a cd/hard drive). If your program matches these conditions, then it will PROBABLY be worthwhile to add another thread (though the only way to be certain would be profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this something that should really only be attempted by people with a lot of experience with multi-threading and a clear understanding of their target architecture (including how the cache is treated and the memory bus).

I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).

Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Monitor and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sittling flat 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters refered to holds true)
(However, the best you can do on a duel core machine is to halve the time it takes to run. If your process only take 4 seconds, it might not be worth getting it to run in 2 seconds)
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.

You miss imho one item, which is that it is not always about execution time. There is:
The problem to koop a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a worse impression.
The possibility to use a thread pool to actually not ahve to start / stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 - where you may go multi threaded without speeding things up in order to keep your UI responsive - I would say it is always then faster when you can actually either split up work (so you can keep more than one core busy) or offload it for othe reasons.

Preventing multithreaded apps from running out of memory

I've written a small, in house C# program which batch converts between various file formats, mostly relying on other API's. Currently, the UI spawns a BackgroundWorker (not a Thread) to handle the conversions, and then fills a queue with the requests which empties as the worker completes jobs. The queue object themselves are very small (3 Strings to tell the worker what to do) and don't contribute greatly to the program's memory footprint. I've been fairly draconian with my memory management (disposing of Images when finished with them, and manually garbage collecting at certain points.) Still, the program tends to use about 100 MB of memory at a time, and uses about 50% of total CPU time. It seems like if I naively implemented threading, it would quickly run out of system memory (unless the CLR does some sort of magic I don't know about.)
Is there a simple/efficient way to spawn threads to prevent the system from running out of memory besides catching OutOfMemory exceptions and rolling back the thread that died (seems very inefficient, but there's no way to preserve state without using prohibitive amounts of memory)?

If you use the ThreadPool.QueueUserWorkItem to spawn the conversions you will automatically get a limit on how many threads that are running. The ThreadPool manages this internally and will process queued calls as soon as a pool thread becomes available.

Put a limit on the queue size and make sender wait if it's full. You would have to find the right queue size limit empirically though.

Out of memory exceptions can be tricky and maybe caused by fragmented memory, not actually running out of memory. Therefore they can be tricky to track down.
Tess from Microsoft product support in Sweeden (http://blogs.msdn.com/tess/) has got a good number of posts on tracking down memory where memory is going to help with this process.
A good profile of your app could also help. JetBrains have a good one as is AQTime.

Using 100MB of memory is not a lot for an image processing application.
Also, in .net when you get an OutOfMemory exception the entire process dies, you can't recover from this.
If you need more memory (or, more correctly, more address space) than the system can give you can either page memory in and out as you use it or switch to 64bit.
You can limit the queue size to make sure the only "heavy" memory user is the worker thread, you can have more than one worker thread, that will increase your memory usage but will also empty the queue faster, just remember that in a CPU intensive operations like this having more threads than CPU cores is inefficient.
You talk about "naively implemented threading", multi-threading is full of pitfalls - a naive implementation will be inefficient and full of bugs.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.