Multithreading takes more time than Sequential Execution in C# - c#

I am using multithreading inside my while loop, like this:
while (reader.Read())
{
    // Declared inside the loop, so each thread's lambda captures its own copy.
    string Number = (reader["Num"] != DBNull.Value) ? reader["Num"].ToString() : string.Empty;
    threadarray[RowCount] = new Thread(() =>
    {
        object ID = (from r in Datasetds.Tables[0].AsEnumerable()
                     where r.Field<string>("Num") == Number
                     select r.Field<int>("ID")).First<int>();
    });
    threadarray[RowCount].Start();
    RowCount++;
}
With sequential execution, 200 reads take just 0.4 s, but with threading they take 1.1 s. This is only an example, but I have the same problem when the threads run more lines of code with multiple database operations: sequentially it takes 10 s, and with threading it takes longer.
Can anyone suggest what I should do?
Thanks.

Threading is not always quicker and in many cases can be slower (such as the case seen here). There are plenty of reasons why but the two most significant are
Creating a thread is a relatively expensive OS operation
Context switching (where the CPU stops working on one thread and starts working on another) is again a relatively expensive operation
Creating 200 threads will take a fair amount of time (with the default stack size of 1 MB this reserves about 200 MB of address space for stacks alone), and unless you have a machine with 200 cores the OS will also need to spend a fair amount of time context switching between those threads.
The end result is that the time that the machine spends creating threads and switching between them simply outstrips the amount of time that the machine spends doing any work. You may see a performance improvement if you reduce the number of threads being used. Try starting with 1 thread for each core that your machine has.
Multithreading where you have more threads than cores is generally only useful in scenarios where the CPU is hanging around waiting for things to happen (like disk IO or network communication). This isn't the case here.
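As a rough sketch of the "one thread per core" suggestion: assuming the Num values have first been read into memory, the lookups can be handed to Parallel.ForEach with a capped degree of parallelism instead of starting one Thread per row. The in-memory rows list and the LookupIds helper are hypothetical stand-ins for the question's DataSet and reader, not the asker's actual code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedLookup
{
    // rows is a hypothetical stand-in for Datasetds.Tables[0]: (Num, ID) pairs.
    public static ConcurrentDictionary<string, int> LookupIds(
        IReadOnlyList<(string Num, int Id)> rows, IEnumerable<string> numbers)
    {
        var options = new ParallelOptions
        {
            // Cap the number of concurrent workers at the logical processor count.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        var result = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(numbers, options, number =>
        {
            // Same linear search as in the question, but run on pooled workers.
            int id = rows.First(r => r.Num == number).Id;
            result[number] = id;
        });
        return result;
    }

    public static void Main()
    {
        var rows = Enumerable.Range(0, 200)
            .Select(i => (Num: i.ToString(), Id: i * 10))
            .ToList();
        var ids = LookupIds(rows, rows.Select(r => r.Num));
        Console.WriteLine(ids["42"]); // prints 420
    }
}
```

For work as small as a single in-memory lookup, the plain sequential loop may still win; timing both is the only way to know.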

Threading isn't always the solution and the way you're using it is definitely not thread-safe. Things like disk I/O or other bottlenecks won't benefit from threading in certain circumstances.
Also, there is a cost for starting up threads. Not that I would recommend it for your situation, but check out the TPL. http://msdn.microsoft.com/en-us/library/dd460717.aspx

Multithreading is usually chosen for non-blocking execution. Like everything on earth, it has its associated costs.
For the convenience of parallel execution, we pay with performance.
Usually nothing is faster than sequential execution of a single task.
It's hard to suggest something concrete for your scenario.
Maybe you could think about using multiple processes instead of multiple threads.
But I repeat, it's hard to tell whether you would benefit from this without knowing the application's complete architecture and requirements.

It seems you are creating a thread for each read(). So if there are 200 read() calls, you have 200 threads running (maybe fewer, since some may have finished quickly). Depending on what you are doing in the thread, 200 threads running at the same time may actually slow down the system because of the overheads others have mentioned.
Multithreading helps when 1) the job in the thread takes some time to finish, and 2) you control how many threads run at the same time.
In your case, try, say, 10 threads. While 10 threads are running, wait until one of them finishes, then give that slot to a new read().
If the job in the thread does not take much time, it is better to use a single thread.
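One way to sketch the "at most 10 at a time" idea is a SemaphoreSlim used as a gate: each job takes a slot before it starts and gives it back when it finishes. The Throttled class, its RunAsync helper and the limit of 10 are illustrative assumptions, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work for every item, with at most maxConcurrent jobs in flight.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxConcurrent, Action<T> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Waits here until one of the running jobs releases its slot.
                await gate.WaitAsync();
                tasks.Add(Task.Run(() =>
                {
                    try { work(item); }
                    finally { gate.Release(); }
                }));
            }
            await Task.WhenAll(tasks);
        }
    }

    public static void Main()
    {
        int processed = 0;
        RunAsync(Enumerable.Range(0, 200), 10,
                 _ => Interlocked.Increment(ref processed))
            .GetAwaiter().GetResult();
        Console.WriteLine(processed); // prints 200
    }
}
```

The jobs run on thread-pool threads, so no thread is created or destroyed per row; only the number of jobs running at once is limited.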

Sci Fi author and technologist Jerry Pournelle once said that in an ideal world every process should have its own processor. This is not an ideal world, and your machine has probably 1 - 4 processors. Your Windows system alone is running scads of processes even when you yourself are daydreaming. I just counted the processes running on my Core 2 Quad XP machine, and SYSTEM is running 65 processes. That's 65 processes that have to be shared between 4 processors. Add more threads and each one gets only a slice of processor power.
If you had a Beowulf Cluster you could share threads off into individual machines and you'd probably get very good timings. But your machine can't do this with only 4 processors. The more you ask it to do the worse performance is going to be.

Related

Threading issues with > 30 threads. CPU scales non-linearly

I am having some trouble with my C# application.
I made sure threads do not access any resources outside themselves.
Now I have threadpool thread that makes a tcp connection, creates the thread objects and runs, with 1 thread performance is great. With 50 threads it seems the same, maybe 5-10% slower, CPU 10-20%. With 100 threads, the CPU usage goes from 10-20% to 70-99%.
One of our developers said that windows threads suck compared to linux thread and the context switching is incurring huge penalties. He proposes to create multiplexing with 4-8 core threads running all the instances.
But I thought problems like this start happening once you have 1000+ threads. Can anyone comment with some good sources to read more about this topic, and about thread / cpu performance and correct practices?
EDIT: OK Many answers seem a little off point because some assumptions are being made so I will add some extra points:
Running 3 applications with 50 threads at 10-20% cpu usage makes them all use that much. 30-60% CPU usage total.
Running 1 application with 150 threads makes it cap cpu at 70-99%.
This is what i mean by threads not scaling.
To expand on my comment..
It's not that Windows threads "suck" in comparison to POSIX threads it's just that you're trying to do more things than your CPU can physically handle at a time. CPU Usage is not particularly a relevant performance indicator that you should be looking at here.
If your CPU has 4 cores, your optimum amount of constantly-running threads is 4. Any more and performance degradation is going to happen as yes, context switching will have a performance impact as it tries to process through the threads simultaneously with only 1 resource.
Think of your threads as giant stacks of books on your table: you've got to knock each individual book off the top of each stack, and you want them all done as fast as you can. You've got 4 of these book stacks (threads) but only 2 arms (cores), so how do you do it? The most likely option is to alternate which stack you knock books off each time, so there's no real performance benefit, as each stack takes about as long to clear as any other.
The only time when this would differ is if you're running a blocking (ie. waiting for I/O) operation and your threads are idle. In this idle time your cores are free to work on another thread which can give a perceived performance benefit. Of course, when the resource that your other thread is waiting for becomes available you're back in the same situation you are in currently.

Program executing slow when running many threads

I have written a program in C# that does a lot of parallel work using different threads. When I reach approximately 300 threads, the GUI of the program starts to become slow and the execution of the threads also slows down drastically. The threads are reading and writing data from a MySQL database running on a different machine.
The funny thing is that if I split the work between two processes on the same machine, everything runs perfectly. Is there a thread limit per process in the .NET Framework or in Windows? Or why am I getting this behaviour? Could it be a network-related problem? I am running Windows 7 Ultimate and I have tried both VS2010 and VS 2012 with the same behaviour.
The way processor time is allocated is that the Operating System gives processor time to every process, then every process gives time to every thread.
So two processes will get twice the processor time, and that's why it works faster if you divide the program into two processes.
If you want to make the GUI run smoother, just set the priority higher for that thread.
This way the GUI thread will get more processor time than the other threads, but not so much that it will noticeably slow down the other threads.
300 threads is silly.
The number of threads should be in the range of your number of cores (2..8) and/or the max simultaneous connections (sometimes only 4 over TCP) your system supports.
Get beyond that and you're only wasting memory, at 1 MB per thread. In a 32bit system, 300 MB is already consuming a lot of the available mem space. And I assume each thread has some buffers attached.
If 2 separate processes perform better than 1, then it probably isn't the context switching but either memory usage or a connection limit that holds you back.
Use ThreadPool. That should automatically allocate the optimal number of threads based on your system by throttling the number of threads in existence. You can also set the maximum number of threads allowable at any one time.
Also, if you're allocating threads to parallelize tasks from within a for loop, a foreach loop, or a LINQ statement, you should look at the Parallel class or PLINQ.
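As a minimal sketch of the PLINQ route: AsParallel() turns an ordinary LINQ query into one that is spread across pooled worker threads. The prime-counting workload and the PlinqSketch, CountPrimes and IsPrime names are placeholders chosen only to have some CPU-bound work to run.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Counts the primes below limit. PLINQ decides how many workers to use;
    // no threads are created by hand.
    public static int CountPrimes(int limit) =>
        Enumerable.Range(2, Math.Max(0, limit - 2))
            .AsParallel()
            .Count(n => IsPrime(n));

    // Deliberately naive trial division, used as placeholder CPU-bound work.
    static bool IsPrime(int n)
    {
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return n >= 2;
    }

    public static void Main() =>
        Console.WriteLine(CountPrimes(100)); // prints 25
}
```

This suits CPU-bound queries; for threads that mostly wait on a remote database, as in the question, a connection limit may matter more than the degree of parallelism.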
The accepted answer to this question will probably explain what is happening, but 300 threads seems like too many to be a good idea for any normal application.
First, if you have 300 threads in one application, you should probably rethink your program design.
Raising the GUI thread's priority may make the GUI more responsive. But if you run that many threads, the OS has to allocate stack space for each of them, and a stack is a contiguous segment of memory. So each time you create a new thread, the memory already set aside for stacks may be unable to hold it. The OS then has to allocate a larger contiguous region and copy all the data from the old stack to the new one. This can obviously slow your program down.

Increasing thread and process priority to reduce execution time for a processor-intensive, parallelized application

I know that setting thread priority is a bit of a taboo subject on stack overflow but I am convinced that my application is a good candidate for increased priority. To justify that, I have explained the context below. The question now is HOW to do that effectively?
The application is .NET 4 (C#) console application that executes a complex algorithm with an execution time of about five hours. The algorithm is not memory-intensive at all, just processor intensive. It does number crunching and does NOT perform any disk I/O, database connectivity, network connectivity, etc. The output of the application is a just ONE number that it writes to the console at the end. In other words, the algorithm is completely self-contained and has no dependencies.
The application runs on its own dedicated 16 core 64 bit machine running Windows Server with far more free RAM than it requires (8GB). By dedicated I mean the server has been procured to run this application EXCLUSIVELY.
I have already optimized the code as much as I could with extensive profiling, fancy math shortcuts and bit twiddling hacks.
Here is the overall structure in pseudocode:
public static void Main ()
{
    Process.GetCurrentProcess().PriorityBoostEnabled = true;
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
    // Of course this only affects the main thread rather than child threads.
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    BigInteger seed = SomeExtremelyLargeNumber; // Millions of digits.
    // The following loop takes [seed] and processes some numbers.
    result1 = Parallel.For(/* With thread-static variables. */);
    while (true) // Main loop that cannot be parallelized.
    {
        // Processes result1.
        result2 = Parallel.For(/* With thread-static variables. */);
        // Processes result2.
        result1 = Parallel.For(/* With thread-static variables. */);
        if (result1 == criteria)
            break;
        // Note: This loop does not need to sleep or care about system responsiveness.
    }
}
Now based on the thread priority related questions on SO, I gather that anything using ThreadPool should not be messed around with in terms of priority. So If I need to switch to manual threads, so be it.
Question:
How should I change the above code to manual threading to benefit from increased thread priority (not using thread pool etc.)?
Will setting priority to highest on all child threads even help? I mean will the child threads just be fighting with each other or will that give them an edge over external OS tasks?
Considering there are 16 cores, should I be running 16 or 15 threads? Is there a general guideline to this?
Will setting process priority to real-time help as well?
With such an app, I would expect changing the priorities to make 0% difference to the overall runtime. If you're already maxed out on CPU use with all 16 cores at 100% doing real work, there's not much more you can do.
You don't need to bother with setting the priority for individual threads, just do it for the entire process, since most of its threads are doing important work apparently.
However, I don't expect it to make any difference for CPU intensive apps like yours. The only processes that could forcibly preempt your own process are I/O intensive apps that traditionally are favored by most OSes, but since you have a dedicated machine, that won't be a problem (also, Windows Server is pretty lightweight in my experience, so it won't interfere if your app is the only one running).
As a side-note:
The algorithm is not memory-intensive at all, just processor
intensive. It does number crunching and does NOT perform any disk I/O,
database connectivity, network connectivity, etc.
The fact that it does not perform "obvious" I/O operations does not mean that it cannot be memory intensive. If you are processing large arrays or other data-structures, the CPU will constantly emit read/write operations to the main memory and a lot of work is being done to move data between the various memory levels. Even working with just numbers can negatively impact the performance of a program if done incorrectly.
My code can change the priority of the process and the threads.
public void SetPriorityProcessAndTheards(string nameProcess, ProcessPriorityClass processPriority, ThreadPriorityLevel threadPriorityLevel)
{
    foreach (Process a in Process.GetProcessesByName(nameProcess))
    {
        a.PriorityBoostEnabled = true;
        a.PriorityClass = processPriority;
        foreach (ProcessThread processThread in a.Threads)
        {
            processThread.PriorityLevel = threadPriorityLevel;
            processThread.PriorityBoostEnabled = true;
        }
    }
}
I would expect you to limit the number of threads to the number of cores or hardware threads. Sometimes the Task Parallel Library uses too many threads. For a process that already maxes out the CPU, either the physical core count or the logical processor count (physical cores plus hyper-threads) would be best, so set the thread count explicitly:
// Create a ParallelOptions object and pass it to Parallel.For().
// Environment.ProcessorCount is the number of logical processors, so hyper-threads
// are included; it is neither the number of CPU sockets nor of physical cores.
var po = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.For(fromInclusive, toExclusive, po, i => { /* loop body */ });
// I never found out how to detect hyper-threads from code; check Task Manager ;-)
You can reuse the po object for all the Parallel.For() statements. I have never really benefited from priority fiddling, even on CPU-bound threaded apps.

ThreadPool behaviour: not growing from minimum size

I had set up my thread pool like this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(20, 20);
However, my app started hanging under heavy load. This seemed to be because worker tasks were not executing: I had used ThreadPool.QueueUserWorkItem to run some tasks which in turn used the same method to queue further work. This is obviously dangerous with a limited thread pool (a deadlock situation), but I am using a thread pool not to limit maximum threads but to reduce thread creation overhead.
I can see the potential trap there, but I believed that setting a maximum of 10000 threads on the pool would mean that if an item was queued, all threads were busy, and there weren't 10000 threads in the pool, a new one would be created and the task processed there.
However, I changed to this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(200, 200);
..and the app started working. If that made it start working, am I missing something about how/when the thread pool expands from minimum toward maximum size?
The job of the threadpool scheduler is to ensure there are no more executing TP threads than cpu cores. The default minimum is equal to the number of cores. A happy number since that minimizes the overhead due to thread context switching. Twice a second, the scheduler steps in and allows another thread to execute if the existing ones haven't completed.
It will therefore take an hour and twenty minutes of having threads that don't complete to get to your new maximum. It is fairly unlikely to ever get there: a 32-bit machine will keel over when 2000 threads have consumed all available virtual memory. You'd have a shot at it on a 64-bit operating system with a very large paging file. Lots of RAM is required to avoid paging death; you'd need at least 12 gigabytes.
The generic diagnostic is that you are using TP threads inappropriately. They take too long, usually caused by blocking on I/O. A regular Thread is the proper choice for those kind of jobs. That's probably hard to fix right now, especially since you're happy with what you got. Raising the minimum is indeed a quick workaround. You'll have to hand-tune it since the TP scheduler can't do a reasonable job anymore.
Whenever you use the thread pool, you are at the mercy of its "thread injection and retirement algorithm".
The algorithm is not properly documented (that I know of) and not configurable.
If you're using Tasks, you can write your own task scheduler.
The performance issue you described, is similar to what is documented in this ASP.NET KB article,
http://support.microsoft.com/kb/821268
To summarize, you need to carefully choose the parameters (this article mentions the typical settings for default ASP.NET thread pool, but you can apply the trick to your app), and further tune them based on performance testing and the characteristics of your app.
Notice that the more you learn about load, you will see that "heavy load" is no longer a good term to describe the situation. Sometimes you need to further categorize the cases, to include detailed term, such as burst load, and so on.
If your logic depends on having a minimum amount of threads you need to change that, urgently.
Setting a MinThreads of 200 (or even 20) is wasting quite a bit of memory. Note that the MaxThreads won't be relevant here, you probably don't have the 10 GB mem for that.
The fact that a min of 200 helps you out is suspicious and as a solution it is probably very brittle.
Take a look at normal Producer/Consumer patterns, and/or use a bounded queue to couple your tasks.
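A minimal sketch of that producer/consumer shape, assuming BlockingCollection as the bounded queue: a fixed number of consumers drain the queue, and the producer blocks when the queue is full instead of piling more work items onto the pool. The BoundedQueueSketch class, the summing workload and the sizes used are illustrative only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedQueueSketch
{
    public static long SumWithConsumers(int itemCount, int consumerCount, int capacity)
    {
        long total = 0;
        using (var queue = new BlockingCollection<int>(boundedCapacity: capacity))
        {
            // A fixed set of consumers. LongRunning hints that each one should get
            // a dedicated thread instead of occupying a pool worker.
            var consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (int item in queue.GetConsumingEnumerable())
                        Interlocked.Add(ref total, item);
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            for (int i = 1; i <= itemCount; i++)
                queue.Add(i);          // blocks while the queue is at capacity
            queue.CompleteAdding();    // lets the consumers' loops finish
            Task.WaitAll(consumers);
        }
        return total;
    }

    public static void Main() =>
        Console.WriteLine(SumWithConsumers(1000, 4, 16)); // prints 500500
}
```

Because consumers never queue further pool work and wait for it, the nested-work deadlock described in the question does not arise in this arrangement.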

Threading cost - minimum execution time when threads would add speed

I am working on a C# application that works with an array. It walks through it (meaning that at one time only a narrow part of the array is used). I am considering adding threads in it to make it perform faster (it runs on a dualcore computer). The problem is that I do not know if it would actually help, because threads cost something and this cost could easily be more than the parallel gain... So how do I determine if threading would help?
Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.
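A bare-bones way to run such a comparison is a Stopwatch around each version. In this sketch the array size, the summing workload and the Time helper are all assumptions; substitute the real array walk for the two lambdas.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ArrayBenchmark
{
    // Returns the best of several runs, after one warm-up run so that
    // JIT compilation is not part of the measurement.
    public static TimeSpan Time(Action action, int runs = 5)
    {
        action();
        var best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(0, 10_000_000).ToArray();

        TimeSpan sequential = Time(() => data.Sum(x => (long)x));
        TimeSpan parallel = Time(() => data.AsParallel().Sum(x => (long)x));

        Console.WriteLine("sequential: " + sequential.TotalMilliseconds + " ms");
        Console.WriteLine("parallel:   " + parallel.TotalMilliseconds + " ms");
    }
}
```

Run it as a release build outside the debugger; the numbers depend on the machine and on how much work each element needs.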
If your application is CPU bound (i.e. it isn't spending time trying to read files or waiting for data from a device) and there is little to no sharing of live data between the threads (data being altered; if it's read-only it's fine), then you can pretty much increase the speed by 50->75% by adding another thread (as long as it still remains CPU bound, of course).
The main overhead in multithreading comes from 2 places.
Creation & initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves swaps between kernel and user mode, this is expensive though a once off per thread so you can pretty much ignore it if the thread is running for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool as it will keep the thread on hand and not need to be recreated.
Handling synchronization of data. If one thread is reading from data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half written value. These locks are generally quite slow as well. To mitigate this problem, you need to design your data layout so that the threads don't need to read or write to the same data as much as possible. If you do need a lot of these locks it can then become slower than the single thread option.
In short, if you are doing something that requires the CPUs to share a lot of data, then multithreading it will be slower, and if the program isn't CPU bound there will be little or no difference (it could even be a lot slower depending on what it is bound to, e.g. a CD or hard drive). If your program is CPU bound and shares little live data, then it will PROBABLY be worthwhile to add another thread (though the only way to be certain would be profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this something that should really only be attempted by people with a lot of experience with multi-threading and a clear understanding of their target architecture (including how the cache is treated and the memory bus).
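To make the data-layout point concrete, here is a sketch of the same sum written two ways: once with every iteration taking a shared lock, and once with each worker accumulating privately and merging only at the end. The Accumulate class and its method names are made up for illustration; the Parallel.For overload with thread-local state is part of the Task Parallel Library.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Accumulate
{
    // Every iteration takes the same lock: correct, but the workers queue up on it.
    public static long SumWithSharedLock(int[] data)
    {
        long total = 0;
        object gate = new object();
        Parallel.For(0, data.Length, i =>
        {
            lock (gate) { total += data[i]; }
        });
        return total;
    }

    // Each worker sums into its own local value; the shared total is touched
    // only once per worker, when that worker finishes.
    public static long SumWithLocalState(int[] data)
    {
        long total = 0;
        Parallel.For(0, data.Length,
            () => 0L,                                        // per-worker start value
            (i, loopState, local) => local + data[i],        // no shared data touched
            local => Interlocked.Add(ref total, local));     // single merge per worker
        return total;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(SumWithSharedLock(data)); // prints 500500
        Console.WriteLine(SumWithLocalState(data)); // prints 500500
    }
}
```

Both versions give the same result; how far apart they are in speed depends on the machine and the workload, which is again a matter for benchmarking.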
I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).
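The rule above can be sketched as a small helper. The ThreadCount class and the Choose name are hypothetical, and the minimum batch size is something you would get from your own benchmarks.

```csharp
using System;

public static class ThreadCount
{
    // As many threads as possible, but no more than there are cores, and not so
    // many that a thread would get fewer than minItemsPerThread items.
    public static int Choose(int itemCount, int minItemsPerThread, int coreCount)
    {
        int byWork = itemCount / minItemsPerThread; // threads that can each get a full batch
        return Math.Max(1, Math.Min(coreCount, byWork));
    }

    public static void Main()
    {
        // The example from the text: 2500 items, minimum 1000 per thread, 4 cores.
        Console.WriteLine(Choose(2500, 1000, 4)); // prints 2
    }
}
```

In real code the core count would typically come from Environment.ProcessorCount, which reports logical processors.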
Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Explorer and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sitting flat at 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters referred to holds true)
(However, the best you can do on a dual-core machine is to halve the time it takes to run. If your process only takes 4 seconds, it might not be worth getting it to run in 2 seconds)
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.
IMHO you are missing one item, which is that it is not always about execution time. There is:
The problem of keeping a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a worse impression.
The possibility of using a thread pool so that you do not actually have to start and stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 (where you may go multithreaded without speeding things up, in order to keep your UI responsive), I would say it is always faster when you can actually either split up the work (so you can keep more than one core busy) or offload it for other reasons.
