Appropriate Multi-Threading Option - C#

Scenario
I have a very heavy number-crunching process that pools large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result.
This process is fine when it is run for a single asset. However, I now have 3,500 assets to process, which takes about 1 hour 30 minutes with the process in its current state.
Question
What is my best option for speeding this process up with a multi-threaded C# application? Realistically I don't have to share anything between the processing of each asset, so I'm confident that processing multiple assets at a time shouldn't cause too many issues.
Thoughts
I've heard good things about thread pools, but realistically I want something that isn't too big to implement, is easily understandable and can run a decent number of threads at a time.
Help would be greatly appreciated.

In .NET you can use the existing ThreadPool; there's no need to implement one yourself. Here is the relevant MSDN documentation.
You should take care not to run too many work items at once (3,500 at the same time is a bit much), but using the supplied queuing mechanism should get you started in the right direction.
Another thing to try is using PLINQ.
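As a rough sketch of that queuing approach (assuming a hypothetical ProcessAsset method that stands in for the per-asset number crunching), you could queue one work item per asset and wait for them all to complete:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class AssetRunner
    {
        // Hypothetical stand-in for the real per-asset work (pull data, crunch, store result).
        static void ProcessAsset(int assetId) { }

        static void ProcessAllAssets(IList<int> assetIds)
        {
            if (assetIds.Count == 0) return;

            int pending = assetIds.Count;
            using (var allDone = new ManualResetEvent(false))
            {
                foreach (int id in assetIds)
                {
                    ThreadPool.QueueUserWorkItem(state =>
                    {
                        try { ProcessAsset((int)state); }
                        finally
                        {
                            // Signal the waiting thread once the last item finishes.
                            if (Interlocked.Decrement(ref pending) == 0)
                                allDone.Set();
                        }
                    }, id);
                }
                allDone.WaitOne();
            }
        }
    }

The pool decides how many items actually run concurrently, so queuing all 3,500 assets up front is fine; they won't all execute at once.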

If you don't have a multi-core processor or multiple machines, and the per-thread work is not I/O bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
Thread pools are fine, and you can use a task queue to do simple load balancing, but if there are no spare CPU cycles in the current application this would be a waste of time.

The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using VS 2010 RC. This has built-in load balancing and work stealing queues, so it will make this task easy to thread, and very scalable.
However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, and just using ThreadPool.QueueUserWorkItem to start each task.
If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue, which you pull work from and execute, can be beneficial in this case. Just place all of your work items into a queue, and have threads pull work from the queue (with appropriate locking), and process.
If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.
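A rough sketch of that queue-and-workers pattern for .NET 3.5 might look like this (ProcessAsset is again a hypothetical stand-in for the real work):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    // A fixed number of worker threads pull asset ids from a shared queue under a lock.
    class WorkQueue
    {
        private readonly Queue<int> _queue;
        private readonly object _sync = new object();

        public WorkQueue(IEnumerable<int> assetIds)
        {
            _queue = new Queue<int>(assetIds);
        }

        public void Run(int workerCount, Action<int> processAsset)
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    while (true)
                    {
                        int assetId;
                        lock (_sync)
                        {
                            if (_queue.Count == 0) return;   // queue drained, worker exits
                            assetId = _queue.Dequeue();
                        }
                        processAsset(assetId);
                    }
                });
                worker.Start();
                workers.Add(worker);
            }
            workers.ForEach(w => w.Join());   // wait until every worker has finished
        }
    }

Something like new WorkQueue(assetIds).Run(Environment.ProcessorCount, ProcessAsset) keeps the number of concurrent CPU-bound tasks tied to the core count.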

The built-in .NET ThreadPool meets both of your requirements: it runs a decent number of threads and it is simple to work with. I have previously written an article on the subject, which you can find here.

If you're using SQL Server 2005 or later, you can write user-defined functions in C# and call them from T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job of it, so consider keeping as much of the processing in the database engine as you can.
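For illustration only, a minimal SQLCLR scalar function might look like the following; the function name and the calculation are made up, and the real crunching would go where the placeholder arithmetic is:

    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public class Crunch
    {
        // Deployed to SQL Server as a CLR scalar function; callable from T-SQL, e.g.
        //   SELECT dbo.WeightedScore(Value, Weight) FROM dbo.Measurements;
        [SqlFunction(IsDeterministic = true, IsPrecise = false)]
        public static SqlDouble WeightedScore(SqlDouble value, SqlDouble weight)
        {
            if (value.IsNull || weight.IsNull)
                return SqlDouble.Null;

            // Placeholder for the real number crunching.
            return value.Value * weight.Value;
        }
    }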

Related

Multi-core processing in C#

I have a C# desktop application. Its purpose is two-fold:
1). To display a live feed from an IP camera in my WinForms application.
2). Send any captured motion to my server.
It is (2) that is labour intensive. I believe I have optimised it as much as I can, and RAM usage is manageable.
However, in my quest to learn and to try to make my code even more efficient I am always open to new approaches.
Today, I came across parallel processing. But reading some links, it seems to suggest there would not be much performance gain from using parallel processing. Indeed, in all my travels (contracts) I have never seen anyone use parallel processing in C# development.
Should I take early heed and not bother to look into this or should I see whether there is anything to gain by 'off-loading' my motion detection code to a separate parallel process?
People's advice/experience would be greatly appreciated.
Thanks
I would recommend taking a look at the Task Parallel Library provided in the .NET Framework; it's based on the idea that a piece of work is a Task. The aim is to provide an abstraction over creating and managing threads manually.
Tasks can run in parallel on their own threads or run on the same thread, depending on the workload and configuration. The Task Parallel Library is also great for asynchronous operations and works very well with I/O, where the hardware can block a thread and cause performance issues in your application; reading from a hard drive is a typical example.
I suggest running a profiler on your application; Visual Studio Professional and above come with a built-in profiler that will let you trace and pinpoint intensive operations that could be improved with concurrency. If your application is running smoothly then there is no need, but there's nothing wrong with forward thinking and learning the Task Parallel Library, as I'm sure there will come a point where knowing how to implement concurrency in your application will benefit you.
I've used the TPL to solve various performance issues with large database calls in iterative loops, and it's great for these I/O operations. The TPL also takes into account the hardware it's being executed on and, used correctly, makes good use of whatever that hardware provides. You could take the same piece of code and run it on a 2-core machine and it will still do the best the hardware allows, without you having to worry about creating too many threads, etc.
Personally, I'd say some asynchronous operations could be a good addition to your application since this is regarding external network camera devices which could cause blocking threads in your application.
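As a small sketch of how that offloading could look (DetectAndUploadMotion and LogError are hypothetical stand-ins for the poster's existing motion-detection and logging code), each captured frame could be handed to a task so the UI thread stays free to render the live feed:

    using System;
    using System.Drawing;
    using System.Threading.Tasks;

    // Inside the WinForms form class:
    private void OnFrameCaptured(Bitmap frame)
    {
        Task.Factory.StartNew(() => DetectAndUploadMotion(frame))
            .ContinueWith(t =>
            {
                if (t.IsFaulted)
                    LogError(t.Exception);   // hypothetical logging helper
            });
    }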

Do I need a Multi-threaded WPF application for this scenario?

I need an application/service which runs in the background and generates bills on a particular date of every month.
I went through many articles explaining the difference between a Windows Service and a scheduled task and came to the conclusion that an application would suit my scenario.
Having said this, I wonder if I need to use multi-threading in my application. As I understand it, multi-threading is basically for keeping the UI responsive while doing long-running tasks, but since my application will have no UI, do I actually need multi-threading?
Is there any difference in performance between a single-threaded application that gets the data from various sources (say a database and a web service) and a multi-threaded application where we distribute each task to a thread and finally integrate all the output?
Typically, an application like this will have no user interface at all, in which case the UI-responsiveness rationale for multi-threading doesn't apply.
That being said, whether or not to use multiple threads for processing your data is another issue entirely. You could, if it makes sense to do so. If this is an application that's going to run once per month, it may be just as easy to leave it single threaded, as there's likely not a time constraint for completion.
If you need to process the items quickly, though, it may make sense to thread portions of the application.
Is there any difference in performance between a single-threaded application that gets the data from various sources (say a database and a web service) and a multi-threaded application where we distribute each task to a thread and finally integrate all the output?
Typically, yes. That's the most common reason to introduce threading - it allows you to do more work in less time. It does add a fair amount of complexity (depends on the scenario), however.
You would, presumably, get a faster response time from a multi-threaded program if these two things are true: you have a multi-core processor (which almost everyone does these days), and pulling data from your sources can be done in any order, with access from one thread not locking a source out from another.
The best reason to use multiple threads in this case would be if you spend a lot of time blocking, waiting for something else to respond. If you're reading tons of data from your hard drive as fast as the disk can deliver it, then having two threads that read data shouldn't give you anything faster; in fact, I think it would be a bit slower. But if you're getting a lot of data from, say, sockets (the internet), and your threads spend a fair amount of time waiting for external servers to respond (and you're not using all of your bandwidth), then a multi-threaded program would give you a boost in speed.
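To make that concrete, here is a sketch under assumed names (LoadCustomersFromDb, LoadRatesFromWebService, Combine and BillingData are all hypothetical) showing the two slow sources being queried concurrently and the output integrated afterwards:

    using System.Threading.Tasks;

    static BillingData GatherBillingData()
    {
        // Each slow source runs on its own task; most of the elapsed time is spent
        // waiting on the database and the web service, so the calls overlap nicely.
        var dbTask  = Task.Factory.StartNew(() => LoadCustomersFromDb());
        var svcTask = Task.Factory.StartNew(() => LoadRatesFromWebService());

        Task.WaitAll(dbTask, svcTask);                   // block until both have returned

        return Combine(dbTask.Result, svcTask.Result);   // integrate the output
    }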

C# multi threading query

I am trying to write a program in C# that will connect to around 400 computers and retrieve some information; let's say it retrieves the list of web services running on each computer.
I am assuming I need a well-threaded application to be able to retrieve info from such a huge number of servers really quickly. I am pretty blank on how to start working on this; can you guys give me a head start on how to begin?
Thanks!
I see no reason why you should use threading in your main logic. Use asynchronous APIs and schedule their callback to the main thread. That way you get the benefits of asynchrony, but without most of the difficulty related to threading.
You'll only need multithreading in your logic code if the work you need to do on the data is that expensive. And even then you can usually get away with parallelizing using side-effect-free functions.
Take a look at the Task Parallel Library.
Specifically, Data Parallelism.
You could also use PLINQ if you wanted.
You should also execute the threads in parallel on a multi-core CPU to enhance performance.
My favourite references on the topic are given below -
http://www.albahari.com/threading/
http://www.codeproject.com/KB/Parallel_Programming/NET4ParallelIntro.aspx
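A minimal sketch of the data-parallel approach (GetWebServices is a hypothetical call that queries a single machine) could look like this:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class ServerScanner
    {
        // Hypothetical stand-in for the real remote query (WMI, a management API, etc.).
        static IList<string> GetWebServices(string server)
        {
            return new List<string>();
        }

        static IDictionary<string, IList<string>> QueryAllServers(IEnumerable<string> serverNames)
        {
            var results = new ConcurrentDictionary<string, IList<string>>();

            Parallel.ForEach(
                serverNames,
                new ParallelOptions { MaxDegreeOfParallelism = 32 },   // cap concurrent queries
                server => results[server] = GetWebServices(server));

            return results;
        }
    }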
Where and how do you get the list of those 400 servers to query?
How often do you need to do this?
You could use a Windows service or a scheduled task which invokes your software; in it you could loop over the server list and start a call to each server on a different thread using the thread pool's queue. There is a maximum, though, so you won't start 400 threads all at once anyway.
Describe your solution in a bit more detail and we'll see what you can do :)
Take a look at this library: Task Parallel Library. You can make efficient use of your system resources and manage your work easier than managing your threads directly.
There might be considerable impact on the server side when you start querying all 400 computers, but you can take a look at Parallel LINQ (PLINQ), where you can limit the degree of parallelism.
You can also use the thread pool for this, e.g. via the Task class.
Creating threads manually may not be a good idea, as they are not reusable and take quite a lot of memory/CPU to create.
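A short PLINQ sketch of that approach, with the degree of parallelism capped so neither the client nor the servers are overwhelmed (GetWebServices is again a hypothetical per-machine query):

    using System.Linq;

    var results = serverNames
        .AsParallel()
        .WithDegreeOfParallelism(16)   // at most 16 servers queried concurrently
        .Select(server => new { Server = server, Services = GetWebServices(server) })
        .ToList();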

When to use multithreading?

When do you use threads in an application? For example, in simple CRUD operations, sending mail over SMTP, calling web services that may take some time if the server is facing bandwidth issues, etc.
To be honest, I don't know how to determine whether I need to use a thread (I know it should be when we're expecting that an operation will take some time to complete).
This may be a "noob" question, but it would be great if you shared your experience with threads.
Thanks
I added C# and .NET tags to your question because you mention C# in your title. If that is not accurate, feel free to remove the tags.
There are different styles of multithreading. For example, there are asynchronous operations with callback functions. .NET 4 introduces the Parallel LINQ library. The style of multithreading you would use, or whether to use any at all, depends on what you are trying to accomplish.
Parallel execution, such as what Parallel LINQ would generally be trying to do, takes advantage of multiple processor cores executing instructions that do not need to wait for data from each other. There are many sources for such algorithms outside LINQ, such as this. However, it is possible that parallel execution is unavailable to you or that it does not suit your application.
More traditional multithreading takes advantage of the threading support in the .NET library (in this case) as provided by System.Threading.Thread. Remember that there is some overhead in starting work on new threads, so only use threads when the advantages of doing so outweigh this overhead. Generally speaking, you would only want to use this type of single-processor multithreading when the task running under the thread will have long gaps in which the processor could be doing something else. For example, I/O from a hard disk (and, consequently, from a database system that uses one) is many orders of magnitude slower than memory access. Network access can also be slow, as another example. Multithreading allows other work to run while waiting for these slow (compared to the processor) operations to complete.
Another example when I have used traditional multithreading is to cache some values the first time a particular ASP.NET page is accessed within a session. I kick off a thread so that the user does not have to wait for the caching to complete before interacting with the page. I also regulate the behavior when the caching does not complete before the user requests another page so that, if the caching does not complete, it is not a problem. It simply makes some further requests faster that were previously too slow.
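A rough sketch of that pattern (BuildExpensiveCache is a hypothetical stand-in for the real work) would be to queue the caching onto a pool thread and let the page carry on:

    using System.Threading;
    using System.Web;

    static void EnsureCachePopulated()
    {
        if (HttpRuntime.Cache["ExpensiveData"] != null)
            return;   // already cached

        // Do the slow work on a pool thread so the first request isn't held up.
        // Pages that need the data must tolerate it not being there yet, as described above.
        ThreadPool.QueueUserWorkItem(_ =>
            HttpRuntime.Cache.Insert("ExpensiveData", BuildExpensiveCache()));
    }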
Consider also the cost that multithreading has to the maintainability of your application. Threaded applications can be harder to debug, for example.
I hope this answers your question at least somewhat.
Joseph Albahari summarized it very well here:
Maintaining a responsive user interface
Making efficient use of an otherwise blocked CPU
Parallel programming
Speculative execution
Allowing requests to be processed simultaneously
One reason to use threads is to split large, CPU-bound tasks across a number of CPUs/cores, to finish faster. Another is to let an extended task execute asynchronously, so the foreground can remain responsive while it runs.
Your examples seem to be concentrating on the second of these. While it can be a good reason, if you can use asynchronous I/O instead, that's usually preferable (e.g., almost anything using sockets can/will be better off using the socket(s) asynchronously). Asynchronous I/O is easier to cancel, and it'll usually have lower CPU overhead as well.
You can use threads when you need different execution paths. This leads (when done correctly) to more responsive and/or faster applications, but also leads to more complex code and debugging.
In a simple CRUD scenario it may not be that useful, but maybe your UI is consuming a slow web service. If your code is tied to your UI thread, you will have an unresponsive UI between the service calls.
In that case, using System.Threading.Thread may be overkill because you don't need that much control. Using a BackgroundWorker may be a better choice (see the sketch below).
Threading is difficult to master, but the benefits when used correctly are huge; better performance is the most common.
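A minimal BackgroundWorker sketch for that slow-web-service case (CallSlowService and resultLabel are hypothetical): DoWork runs on a pool thread while RunWorkerCompleted fires back on the UI thread, so the UI stays responsive.

    using System.ComponentModel;

    var worker = new BackgroundWorker();
    worker.DoWork += (s, e) => e.Result = CallSlowService();                    // background thread
    worker.RunWorkerCompleted += (s, e) => resultLabel.Text = (string)e.Result; // UI thread
    worker.RunWorkerAsync();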
In a way you have answered your own question. Using threads whenever you execute time-consuming operations is the right choice. You should also use them when you want to make things faster; for example, if you want to process a number of files, each file can be processed by a different thread.
By using threads you can better utilize the power of multi-core/multi-processor machines.
Monitoring some data in the background of your application is another use.
There are dozens of such scenarios.
Realising my comment might suffice as an answer ...
I like to view multi-threading scenarios from a resource perspective. In other words, UI (graphics), networking, disk IO, CPU (cores), RAM etc. I find that helps when deciding where to use multi-threading in the general sense at least.
The reasoning behind this is simply that I can take advantage of one resource on a specific thread (eg. Disk IO) while at the same time using another thread to accomplish something else using a different resource.

How do I pick the best number of threads for hyperthreading/multicore?

I have some embarrassingly-parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the best number of worker threads to utilize either of these the best on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core I will want 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.
I'm not asking how to split up the work nor how to do threading, I'm asking how do I determine the "optimal" number of the threads on an arbitrary machine this console app will run on.
I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.
You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
The .NET thread pool also has provisions for sometimes allocating more threads than you have cores to maximise throughput in certain scenarios where many threads are waiting for I/O to finish.
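For example, Environment.ProcessorCount reports the number of logical processors (hyperthreaded cores count individually), and the pool's minimums can be nudged if its defaults prove too conservative:

    using System;
    using System.Threading;

    int logicalCpus = Environment.ProcessorCount;
    Console.WriteLine("Logical processors: " + logicalCpus);

    // Optional: raise the worker-thread floor so the pool ramps up faster.
    int workerMin, ioMin;
    ThreadPool.GetMinThreads(out workerMin, out ioMin);
    ThreadPool.SetMinThreads(logicalCpus, ioMin);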
The correct number is obviously 42.
Now on the serious note. Just use the thread pool, always.
1) If you have a lengthy processing task (i.e. CPU intensive) that can be partitioned into multiple pieces of work, then you should partition your task and submit all the individual work items to the ThreadPool. The thread pool will pick up work items and start churning on them in a dynamic fashion, as it has self-monitoring capabilities that include starting new threads as needed, and it can be configured at deployment by administrators according to the deployment site's requirements, as opposed to pre-computing the numbers at development time. While it is true that the proper partitioning size of your processing task can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that it is not even worth talking about at this stage (and besides, the primary concerns should be your NUMA nodes, memory locality and interlocked cache contention, and only after that the number of cores).
2) If you're doing I/O (including DB calls) then you should use asynchronous I/O and complete the calls in ThreadPool-invoked completion routines.
These two are the only valid reasons why you should have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).
For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:
designing-applications-for-high-performance-part-1.aspx
designing-applications-for-high-performance-part-ii.aspx
designing-applications-for-high-performance-part-iii.aspx
The optimal number would just be the processor count. Optimally you would always have one thread running on each CPU (logical or physical) to minimise context switches and the overhead that comes with them.
Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.
The blocking is never optimal but if you are doing any form of blocking then the answer would change dramatically.
The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion it's really hard to do any better than the threadpool, so that's simply the best place to start; only think about something else if you can demonstrate why that is not good enough.
A good rule of thumb, given that you're completely CPU-bound, is processorCount + 1.
That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.
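As a sketch of that rule of thumb (DoWorkChunk is a hypothetical partition of the overall job), you would spin up ProcessorCount + 1 workers and join them:

    using System;
    using System.Threading;

    int workerCount = Environment.ProcessorCount + 1;
    var threads = new Thread[workerCount];

    for (int i = 0; i < workerCount; i++)
    {
        int chunk = i;   // capture the loop variable for the closure
        threads[i] = new Thread(() => DoWorkChunk(chunk));
        threads[i].Start();
    }

    foreach (var t in threads)
        t.Join();   // wait for every worker to finish its chunk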
The only way is a combination of data and code analysis based on performance data.
Different CPU families and speeds vs. memory speed vs other activities on the system are all going to make the tuning different.
Potentially some self-tuning is possible, but this will mean having some form of live performance tuning and self adjustment.
Or even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that will actually determine the optimal number of threads to perform the tasks as efficiently as possible for you.
I read something on this recently (see the accepted answer to this question for example).
The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.
There are a number of questions on a similar theme - searching for "optimal number threads" (without the quotes) gives you a couple of pages of results.
I would say it also depends on what you are doing. If you're making a server application, then using all you can out of the CPUs via either Environment.ProcessorCount or a thread pool is a good idea.
But if this is running on a desktop or a machine that is not dedicated to this task, you might want to leave some CPU idle so the machine still "functions" for the user.
It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.
I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.
I think the only real way to determine is through trialling or profiling.
In addition to processor count, you may want to take into account the process's processor affinity by counting bits in the affinity mask returned by the GetProcessAffinityMask function.
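A sketch of that suggestion using the managed Process.ProcessorAffinity property (rather than P/Invoking GetProcessAffinityMask directly) simply counts the set bits in the mask:

    using System.Diagnostics;

    static int CountUsableProcessors()
    {
        // Each set bit in the affinity mask is a processor this process may run on.
        ulong mask = (ulong)(long)Process.GetCurrentProcess().ProcessorAffinity;
        int count = 0;
        while (mask != 0)
        {
            count += (int)(mask & 1);
            mask >>= 1;
        }
        return count;
    }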
If there is no excessive I/O processing and there are no blocking system calls while the threads are running, then the number of threads (excluding the main thread) should in general equal the number of processors/cores in your system; otherwise you can try increasing the number of threads through testing.
