Does calling the same function on multiple threads increase performance? - c#

In my application I have a structure something like the one below:
if (iterationCount == -3)
{
    CreatFullNetwork(obj1);
}
if (iterationCount == -2)
{
    CreatFullNetwork(obj2); // Same method as the previous call
}
if (iterationCount == -1)
{
    // obj1, obj2 and obj3 are the same object, but the sorting order variables inside the object differ
    CreatFullNetwork(obj3); // Same method as the previous call
}
To increase performance I am planning to create 3 threads and run these calls in parallel. Is this a good approach, and will it work?
Note: CreatFullNetwork() is a very large method; it has sub-methods in it, and it creates lots of collections and updates them.

In a comment you state that the function call is not CPU bound. It does not reach close to 100% CPU utilization. In which case it seems very unlikely that your program's performance is liable to be improved by multi-threading.
On top of that it seems that your code uses a lot of shared variables that are not synchronized. Before you could even contemplate running the code in parallel you'd need to deal with that issue. Typically there are two ways to do that:
1. Serialize access to shared variables to avoid data races.
2. Arrange for each thread to have a private copy of the information and variables it needs.
Generally speaking, option 2 is better since serialization has performance overhead due to the use of locks. However, option 2 may be hard to achieve and can have its own performance issues in case you need to copy a lot of data.
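As a rough illustration of option 2, here is a minimal sketch in which every task works on its own copy of the data. BuildNetwork and the sample list are hypothetical stand-ins (the real CreatFullNetwork is not shown in the question), and the code is written as C# top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for CreatFullNetwork: it touches only the list it is
// given and its own locals, so concurrent calls cannot race with each other.
int BuildNetwork(List<int> privateNodes)
{
    privateNodes.Sort();
    return privateNodes.Count;
}

var source = new List<int> { 5, 3, 8, 1 };

// Option 2: give each task a private copy instead of sharing one list.
var copies = new[]
{
    new List<int>(source),
    new List<int>(source),
    new List<int>(source)
};

Task<int>[] tasks = copies
    .Select(copy => Task.Run(() => BuildNetwork(copy)))
    .ToArray();

// Block until all three tasks have finished, then collect their results.
Task.WaitAll(tasks);
int[] results = tasks.Select(t => t.Result).ToArray();

Console.WriteLine(string.Join(", ", results));
```

The copies cost memory and copying time, which is the trade-off mentioned above: it only pays off if the work done on each copy outweighs the cost of making it.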
Most of this is moot if your code is not CPU bound. That said, perhaps the bottleneck is on a different machine. Perhaps the bottleneck is in database access. If the database can handle parallel access efficiently then perhaps threading will help.
The bottom line is that you need to have a much clearer understanding of what your code is doing and what is limiting performance before you can contemplate options to speed it up. Threading is not a universal panacea. It won't help speed up all programs, and you always need to know how best to deploy it.

Related

.NET 4 Parallel.ForEach and PLINQ: can they overwhelm the thread pool and kill the app performance?

The more I use Parallel.ForEach and PLINQ in my code, the more faces and code review pushback I am getting. So I wonder: is there any reason for me NOT to use PLINQ, at the extreme, on each LINQ statement? Could the runtime fail to be smart enough and start spawning so many threads (or consuming so many threads from the thread pool) that the app's performance would actually degrade instead of improve? The same question applies to the Parallel library.
I do understand the implications related to thread safety and the overhead of using multi-threading. I also realize not everything is good for parallelizing. All I am wondering is whether I should stop defending my approach and just give up on these two fine things because my peers think I'd better do thread control myself instead of relying on .NET facilities.
UPDATE: please assume the hardware is sufficiently good to satisfy prerequisites for use of multithreading.
It all comes down to two things:
Is the extra work required to partition the collection and synchronize the threads greater than the performance gain compared to a regular foreach?
Are all the threads going to use a shared resource that will become a bottleneck?
An example of the second case is doing a Parallel.ForEach over the results of a Linq to Sql statement. In that case, if your results are coming from the DB very slowly, each thread may spend more time waiting for data to process than actually doing something.
See: http://msdn.microsoft.com/en-us/library/dd997392.aspx
To cap the number of tasks that run concurrently you can use .WithDegreeOfParallelism(N), e.g.:
var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
where Compute(item) > 42
select item;
See http://msdn.microsoft.com/en-us/library/dd997425.aspx
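Since the question also asks about the Parallel class, here is a comparable sketch for Parallel.ForEach, where the cap is set through ParallelOptions.MaxDegreeOfParallelism. The Compute function and the input range are made up for illustration, and the code is written as C# top-level statements:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical workload standing in for Compute in the PLINQ query above.
int Compute(int item) => item * 2;

int[] source = Enumerable.Range(1, 100).ToArray();

// Allow at most two iterations to run at the same time.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

// ConcurrentBag is safe to add to from several threads at once.
var matches = new ConcurrentBag<int>();

Parallel.ForEach(source, options, item =>
{
    if (Compute(item) > 42)
    {
        matches.Add(item);
    }
});

Console.WriteLine(matches.Count);
```

Unlike the PLINQ query, the order of the items in the bag is not defined, so sort the results afterwards if order matters.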
When you dig into performance questions this deep, I think the best thing to do is... measure, measure and measure. Even if somebody answered that PLINQ is great and will boost the performance of your application, would you trust that without verifying it with profiling? Although general answers may exist, you cannot spare yourself the effort of measuring the performance in your exact case. The overall performance depends on so many things, and it may be that PLINQ helps in one case but not in another.
My personal experience with PLINQ is that after switching every LINQ query to PLINQ, the response times are way better when the load is small, and there is no difference when the load is around its maximum. But I can imagine a case where PLINQ hurts the overall performance under a huge load. You have to check it for your own particular case.
Well... and if you want to convince other people that you are walking the right path, what could be better than measurement results?
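A first, crude measurement can be as simple as timing the same query with and without AsParallel(). This is only a sketch: Compute and the input size are invented here, a single Stopwatch run is noisy, and a proper profiler or benchmarking harness under realistic load will tell you far more. It is written as C# top-level statements:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical per-item work; replace with the real query body.
int Compute(int x) => (x * 7) % 100;

int[] source = Enumerable.Range(0, 200_000).ToArray();

// Time the plain LINQ version.
var stopwatch = Stopwatch.StartNew();
int sequentialCount = source.Count(x => Compute(x) > 42);
stopwatch.Stop();
TimeSpan sequentialTime = stopwatch.Elapsed;

// Time the PLINQ version of the same query.
stopwatch.Restart();
int parallelCount = source.AsParallel().Count(x => Compute(x) > 42);
stopwatch.Stop();
TimeSpan parallelTime = stopwatch.Elapsed;

Console.WriteLine($"LINQ:  {sequentialCount} matches in {sequentialTime.TotalMilliseconds} ms");
Console.WriteLine($"PLINQ: {parallelCount} matches in {parallelTime.TotalMilliseconds} ms");
```

With work this cheap per item, the PLINQ version may well come out slower, which is exactly the kind of thing you only learn by measuring.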

Divide work among processes or threads?

I am interning for a company this summer, and I got passed down this program which is a total piece. It does very computationally intensive operations throughout most of its duration. It takes about 5 minutes to complete a run on a small job, and the guy I work with said that the larger jobs have taken up to 4 days to run. My job is to find a way to make it go faster. My idea was that I could split the input in half and pass the halves to two new threads or processes, I was wondering if I could get some feedback on how effective that might be and whether threads or processes are the way to go.
Any inputs would be welcomed.
Hunter
I'd take a strong look at TPL that was introduced in .net4 :) PLINQ might be especially useful for easy speedups.
Generally speaking, splitting the work into different processes (exe files) is inadvisable for performance, since starting processes is expensive. It does have other merits, such as isolation (if part of a program crashes), but I don't think they are applicable to your problem.
If the jobs are splittable, then going multithreaded/multiprocessed will bring better speed. That is assuming, of course, that the computer they run on actually has multiple cores/cpus.
Whether you use threads or processes doesn't really matter for speed (if the threads don't share data). The only reason to use processes that I know of is when a job is likely to crash an entire process, which is not likely in .NET.
Use threads if there's lots of memory sharing in your code, but if you think you'd like to scale the program to run across multiple computers (when required cores > 16), then develop it using processes with a client/server model.
The best way to optimise code, always, is to profile it to find out where the logjams are, IMO.
Sometimes you can find non obvious huge speed increases with little effort.
Eqatec, and SlimTune are two free C# profilers which may be worth trying out.
(Of course the other comments about which parallelization architecture to use are spot on; it's just that I prefer analysis first.)
Have a look at the Task Parallel Library -- this sounds like a prime candidate problem for using it.
As for the threads vs processes dilemma: threads are fine unless there is a specific reason to use processes (e.g. if you were using buggy code that you couldn't fix, and you did not want a bad crash in that code to bring down your whole process).
Well if the problem has a parallel solution then this is the right way to (ideally) significantly (but not always) increase performance.
However, you don't control making additional processes except for running an app that launches multiple mini apps ... which is not going to help you with this problem.
You are going to need to utilize multiple threads. There is a pretty cool library added to .NET for parallel programming you should take a look at. I believe its namespace is System.Threading.Tasks or System.Threading with the Parallel class.
Edit: I would definitely suggest, though, that you think about whether or not a linear solution may fit better. Sometimes parallel solutions take even longer. It all depends on the problem in question.
If you need to communicate/pass data, go with threads (and if you can go .Net 4, use the Task Parallel Library as others have suggested). If you don't need to pass info that much, I suggest processes (scales a bit better on multiple cores, you get the ability to do multiple computers in a client/server setup [server passes info to clients and gets a response, but other than that not much info passing], etc.).
Personally, I would invest my effort into profiling the application first. You can gain a much better awareness of where the problem spots are before attempting a fix. You can parallelize this problem all day long, but it will only give you a linear improvement in speed (assuming that it can be parallelized at all). But, if you can figure out how to transform the solution into something that only takes O(n) operations instead of O(n^2), for example, then you have hit the jackpot. I guess what I am saying is that you should not necessarily focus on parallelization.
You might find spots that are looping through collections to find specific items. Instead you can transform these loops into hash table lookups. You might find spots that do frequent sorting. Instead you could convert those frequent sorting operations into a single binary search tree (SortedDictionary) which maintains a sorted collection efficiently through the many add/remove operations. And maybe you will find spots that repeatedly make the same calculations. You can cache the results of already made calculations and look them up later if necessary.
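Two of these transformations can be sketched briefly. The item list and the ExpensiveSquare function below are invented purely for illustration, and the code is written as C# top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data set: 1000 items with an Id and a Name.
var items = Enumerable.Range(0, 1000)
    .Select(i => (Id: i, Name: "item" + i))
    .ToList();

// Before: every lookup scans the list, O(n) per call.
var slow = items.First(it => it.Id == 750);

// After: build a hash table once, then each lookup is O(1) on average.
Dictionary<int, string> byId = items.ToDictionary(it => it.Id, it => it.Name);
string fast = byId[750];

// Caching: remember results of a repeated calculation instead of redoing it.
var cache = new Dictionary<int, long>();
int computations = 0;

long ExpensiveSquare(int n)
{
    if (!cache.TryGetValue(n, out long value))
    {
        computations++;          // counts how often the real work runs
        value = (long)n * n;     // stand-in for an expensive calculation
        cache[n] = value;
    }
    return value;
}

long first = ExpensiveSquare(12);
long second = ExpensiveSquare(12);   // served from the cache

Console.WriteLine($"{slow.Name} {fast} {first} {second} {computations}");
```

Note that the plain Dictionary cache is not thread-safe; if you later parallelize the code, something like ConcurrentDictionary would be needed instead.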

Should Threads be avoided if at all possible inside software components?

I have recently been looking at code, specifically component oriented code that uses threads internally. Is this a bad practice? The code I looked at was from an F# example that showed the use of event based programming techniques. I cannot post the code in case of copyright infringement, but it does spin up a thread of its own. Is this regarded as bad practice, or is it acceptable for code not written by yourself to have full control of thread creation? I do point out that this code is not a visual component and is very much "built from scratch".
What are the best practises of component creation where threading would be helpful?
I am completely language agnostic on this, the f# example could have been in c# or python.
I am concerned about the lack of control over the component's run time and its hogging of resources. The example just implemented one other thread, but as far as I can see there is nothing stopping this type of design from spawning as many threads as it wishes, up to the limit of what your program allows.
I did think of methods such as object injection and so forth, but threads are weird, as from a component perspective they are pure "action" as opposed to "model, state, declarations".
Any help would be great.
This is too general a question to bear any answer more specific than "it depends" :-)
There are cases when using internal threads within a component is completely valid, and there are cases when not. This has to be decided on a case by case basis. Overall, though, since threads do make the code much more difficult to test and maintain, and increase the chances of subtle, hard to find bugs, they should be used with caution, only when there is a really decisive reason to use them.
An example of a legitimate use of threads is a worker thread, where a component handling an event starts an action which takes a long time to execute (such as a lengthy computation, a web request, or extensive file I/O), and spawns a separate thread to do the job, so that control can be immediately returned to the interface to handle further user input. Without the worker thread, the UI would be totally unresponsive for a long time, which usually makes users angry.
Another example is a lengthy calculation/process which lends itself well to parallel execution, i.e. it consists of many smaller independent tasks of more or less similar size. If there are strong performance requirements, it does indeed make sense to execute the individual tasks in a concurrent fashion using a pool of worker threads. Many languages provide high level support for such designs.
Note that components are generally free to allocate and use any other kinds of resources too and thus wreak havoc in countless other ways - are you ever worried about a component eating up all memory, exhausting the available file handles, reserving ports etc.? Many of these can cause much more trouble globally within a system than spawning extra threads.
There's nothing wrong about creating new threads in a component/library. The only thing wrong would be if it didn't give the consumer of the API/component a way to synchronize whenever necessary.
First of all, what is the nature of component you are talking about? Is it a dll to be consumed by some different code? What does it do? What are the business requirements? All these are essential to determine if you do need to worry about parallelism or not.
Second of all, threading is just a tool to achieve better performance and responsiveness, so avoiding it at all costs everywhere does not sound like a smart approach; threading is certainly vital for some business needs.
Third of all, when comparing threading semantics in c# vs f#, you have to remember that those are very different beasts in themselves. f# implicitly makes threading safer to code, as there is no notion of global variables, hence critical sections in your code are easier to avoid in f# than in c#. That puts you as a developer in a better place because you don't have to deal with memory blocks, locks, semaphores etc.
I would say if your 'component' relies heavily on threading you might want to consider using either the parallel FX in c# or even go with f#, since it approaches working with processor time slicing and parallelism in a more elegant way (IMHO).
And last but not least, when you talk about hogging computer resources by using threading in your component: please remember that coding with threads does not necessarily impose a higher resource impact per se. You can just as easily do the same damage on one thread if you don't dispose of your (unmanaged) objects properly; granted, you might get an OutOfMemoryException faster when you make the same mistake on several threads.

Debugging and diagnosing lock convoying problems in .NET

I am looking into performance issues of a large C#/.NET 3.5 system that exhibits performance degradation as the number of users making requests scales up to 40-50 distinct user requests per second.
The request durations increase significantly, while CPU and I/O loads appear to stay about the same. This leads me to believe that the way shared objects in our system are protected using c# lock() {...} statements may be affecting concurrent access performance. Specifically, I suspect that some degree of lock convoying is occurring on frequently used shared data that is protected by critical sections (because it is read/write).
Does anyone have suggestions on how to actually diagnose if lock convoying is the problem .. or if lock contention of any kind is contributing to long request times?
Lock convoys are hard to debug in general. Does your code path have sequential lock statements either directly or in branches?
The Total # of Contentions performance counter gives a base estimate of contention in the app.
Also break open a profiler and look. You can also write some perf counters to track down the slow parts of a code path. Also make sure that locks are only being held for as long as absolutely necessary.
Also check out the Windows Performance Tools. I have found these to be extremely useful as you can track down lots of low level problems like abnormal amounts of context switching.
A good place to start is by having a look at the Lock and Thread performance counters. Out of interest, what exactly are you locking for in your web app? Locking in most ASP.NET applications isn't common.
I can't provide much insight into the diagnostics, but if you find proof to back up your assumption then you might be interested in System.Threading.ReaderWriterLockSlim which allows for concurrent reads, but prevents concurrent writes.
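For reference, a minimal sketch of how ReaderWriterLockSlim is typically used around read-mostly shared data. The prices dictionary and the two helper functions are hypothetical, not taken from the system in the question, and the code is written as C# top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var cacheLock = new ReaderWriterLockSlim();
var prices = new Dictionary<string, decimal>();   // hypothetical shared data

// Writers take the exclusive lock: no readers or other writers may enter.
void SetPrice(string key, decimal value)
{
    cacheLock.EnterWriteLock();
    try
    {
        prices[key] = value;
    }
    finally
    {
        cacheLock.ExitWriteLock();
    }
}

// Readers take the shared lock: many readers may hold it at the same time.
bool TryGetPrice(string key, out decimal value)
{
    cacheLock.EnterReadLock();
    try
    {
        return prices.TryGetValue(key, out value);
    }
    finally
    {
        cacheLock.ExitReadLock();
    }
}

SetPrice("widget", 1.50m);
bool found = TryGetPrice("widget", out decimal widgetPrice);
Console.WriteLine($"{found} {widgetPrice}");
```

Whether this beats a plain lock depends on the read/write ratio and how long the lock is held, so it is worth measuring both under your real load.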

Spinlocks, How Useful Are They?

How often do you find yourself actually using spinlocks in your code? How common is it to come across a situation where using a busy loop actually outperforms the usage of locks?
Personally, when I write some sort of code that requires thread safety, I tend to benchmark it with different synchronization primitives, and as far as it goes, it seems like using locks gives better performance than using spinlocks. No matter for how little time I actually hold the lock, the amount of contention I receive when using spinlocks is far greater than the amount I get from using locks (of course, I run my tests on a multiprocessor machine).
I realize that it's more likely to come across a spinlock in "low-level" code, but I'm interested to know whether you find it useful in even a more high-level kind of programming?
It depends on what you're doing. In general application code, you'll want to avoid spinlocks.
In low-level stuff where you'll only hold the lock for a couple of instructions, and latency is important, a spinlock may be a better solution than a lock. But those cases are rare, especially in the kind of applications where C# is typically used.
In C#, "Spin locks" have been, in my experience, almost always worse than taking a lock - it's a rare occurrence where spin locks will outperform a lock.
However, that's not always the case. .NET 4 is adding a System.Threading.SpinLock structure. This provides benefits in situations where a lock is held for a very short time, and being grabbed repeatedly. From the MSDN docs on Data Structures for Parallel Programming:
In scenarios where the wait for the lock is expected to be short, SpinLock offers better performance than other forms of locking.
Spin locks can outperform other locking mechanisms in cases where you're doing something like locking through a tree - if you're only having locks on each node for a very, very short period of time, they can out perform a traditional lock. I ran into this in a rendering engine with a multithreaded scene update, at one point - spin locks profiled out to outperform locking with Monitor.Enter.
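For readers who have not used it, the usual calling pattern for System.Threading.SpinLock looks roughly like the sketch below. The summing workload is invented purely to give the lock something to protect, and the code is written as C# top-level statements:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// SpinLock is a struct, so it must never be copied; keep it in one variable
// or field and always call Enter/Exit on that same instance.
var spinLock = new SpinLock(enableThreadOwnerTracking: false);
long total = 0;

Parallel.For(0, 10_000, i =>
{
    bool lockTaken = false;
    try
    {
        spinLock.Enter(ref lockTaken);
        total += i;              // very short critical section
    }
    finally
    {
        // Only release the lock if Enter actually acquired it.
        if (lockTaken)
        {
            spinLock.Exit();
        }
    }
});

Console.WriteLine(total);
```

For a plain running total like this, Interlocked.Add would be the better tool; the point here is only the Enter/Exit pattern with the lockTaken flag.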
For my realtime work, particularly with device drivers, I've used them a fair bit. It turns out that (when last I timed this) waiting for a sync object like a semaphore tied to a hardware interrupt chews up at least 20 microseconds, no matter how long it actually takes for the interrupt to occur. A single check of a memory-mapped hardware register, followed by a check of RDTSC (to allow for a time-out so you don't lock up the machine), is in the high nanosecond range (basically down in the noise). For hardware-level handshaking that shouldn't take much time at all, it is really tough to beat a spinlock.
My 2c: If your updates satisfy some access criteria then they are good spinlock candidates:
fast: i.e. you will have time to acquire the spinlock, perform the updates and release the spinlock in a single thread quantum, so that you don't get pre-empted while holding the spinlock
localized: all the data you update is preferably in one single page that is already loaded; you do not want a TLB miss while you are holding the spinlock, and you definitely don't want a page fault that reads from swap!
atomic: you do not need any other lock to perform the operation, i.e. never wait for locks under a spinlock.
For anything that has any potential to yield, you should use a notified lock structure (events, mutex, semaphores etc).
One use case for spin locks is if you expect very low contention but are going to have a lot of them. If you don't need support for recursive locking, a spinlock can be implemented in a single byte, and if contention is very low then the CPU cycle waste is negligible.
For a practical use case, I often have arrays of thousands of elements, where updates to different elements of the array can safely happen in parallel. The odds of two threads trying to update the same element at the same time are very small (low contention) but I need one lock for every element (I'm going to have a lot of them). In these cases, I usually allocate an array of ubytes of the same size as the array I'm updating in parallel and implement spinlocks inline as (in the D programming language):
while(!atomicCasUbyte(spinLocks[i], 0, 1)) {}
myArray[i] = newVal;
atomicSetUbyte(spinLocks[i], 0);
On the other hand, if I had to use regular locks, I would have to allocate an array of pointers to Objects, and then allocate a Mutex object for each element of this array. In scenarios such as the one described above, this is just plain wasteful.
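A rough C# analogue of the D snippet above, as a sketch only: it uses one int flag per element rather than a byte, because the int overload of Interlocked.CompareExchange is available on every .NET version, and it increments each element instead of assigning newVal so that the result can be checked. The array size and iteration count are arbitrary, and the code is written as C# top-level statements:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int Size = 1000;
int[] spinLocks = new int[Size];   // 0 = free, 1 = held, one flag per element
int[] myArray = new int[Size];

void IncrementElement(int i)
{
    // Spin until we flip this element's flag from 0 to 1.
    while (Interlocked.CompareExchange(ref spinLocks[i], 1, 0) != 0) { }

    myArray[i] += 1;               // the protected update

    // Release the flag; Volatile.Write keeps the update above from being
    // reordered after the release.
    Volatile.Write(ref spinLocks[i], 0);
}

// 100 updates per element, spread across threads; collisions are rare.
Parallel.For(0, Size * 100, n => IncrementElement(n % Size));

Console.WriteLine(myArray.All(v => v == 100));
```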
If you have performance critical code and you have determined that it needs to be faster than it currently is and you have determined that the critical factor is the lock speed, then it'd be a good idea to try a spinlock. In other cases, why bother? Normal locks are easier to use correctly.
Please note the following points :
Most mutex implementations spin for a little while before the thread is actually unscheduled. Because of this it is hard to compare these mutexes with pure spinlocks.
Several threads spinning "as fast as possible" on the same spinlock will consume all the bandwidth and drastically decrease your program's efficiency. You need to add a tiny "sleeping" time by adding a no-op to your spinning loop.
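In .NET, one way to get that back-off without hand-writing no-ops is the System.Threading.SpinWait struct, whose SpinOnce() starts with short busy-waits and eventually yields the thread. The following is a minimal sketch of a hand-rolled spinlock using it; the flag, counter, Acquire and Release names are made up for the example, and the code is written as C# top-level statements:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int flag = 0;      // 0 = free, 1 = held
int counter = 0;   // data protected by the flag

void Acquire()
{
    var spinner = new SpinWait();
    while (Interlocked.CompareExchange(ref flag, 1, 0) != 0)
    {
        // Back off instead of hammering the flag as fast as possible.
        spinner.SpinOnce();
    }
}

void Release() => Volatile.Write(ref flag, 0);

Parallel.For(0, 50_000, _ =>
{
    Acquire();
    counter++;     // short critical section
    Release();
});

Console.WriteLine(counter);
```

The built-in SpinLock type already backs off in a similar way, so a hand-rolled version like this is mainly useful for understanding what happens inside.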
You hardly ever need to use spinlocks in application code, if anything you should avoid them.
I can't think of any reason to use a spinlock in c# code running on a normal OS. Busy locks are mostly a waste at the application level: the spinning can cause you to use the entire CPU timeslice, whereas a lock will immediately cause a context switch if needed.
High performance code where the number of threads equals the number of processors/cores might benefit in some cases, but if you need performance optimization at that level you're likely making a next-gen 3D game, working on an embedded OS with poor synchronization primitives, creating an OS/driver, or in any case not using c#.
I used spin locks for the stop-the-world phase of the garbage collector in my HLVM project because they are easy and that is a toy VM. However, spin locks can be counter-productive in that context:
One of the perf bugs in the Glasgow Haskell Compiler's garbage collector is so annoying that it has a name, the "last core slowdown". This is a direct consequence of their inappropriate use of spinlocks in their GC and is exacerbated on Linux due to its scheduler but, in fact, the effect can be observed whenever other programs are competing for CPU time.
The effect is clear on the second graph here and can be seen affecting more than just the last core here, where the Haskell program sees performance degradation beyond only 5 cores.
Always keep these points in your mind while using spinlocks:
Fast user mode execution.
Synchronizes threads within a single process, or multiple processes if in shared memory.
Does not return until the object is owned.
Does not support recursion.
Consumes 100% of CPU while "waiting".
I have personally seen so many deadlocks just because someone thought it would be a good idea to use a spinlock.
Be very, very careful when using spinlocks
(I can't emphasize this enough).
