I have a C# desktop application. Its purpose is twofold:
1) To display a live feed from an IP camera in my WinForms application.
2) To send any captured motion to my server.
It is (2) that is labour intensive. I believe I have optimised it as much as I can and the RAM is manageable.
However, in my quest to learn and to try to make my code even more efficient I am always open to new approaches.
Today I came across parallel processing. However, some of the links I have read suggest there would not be much performance gain from it. Indeed, in all my travels (contracts) I have never seen anyone use parallel processing in C# development.
Should I take early heed and not bother looking into this, or should I see whether there is anything to gain by 'off-loading' my motion detection code to a separate parallel process?
People's advice and experience would be greatly appreciated.
Thanks
I would recommend taking a look at the Task Parallel Library (TPL) provided in the .NET Framework. It is based on the idea that a piece of work is a Task, which gives you an abstraction over creating and managing threads manually.
Tasks can run in parallel on their own threads or run on the same thread, depending on the workload and configuration. The Task Parallel Library is also great for asynchronous operations and works very well with I/O, where waiting on the hardware can block a thread and cause performance issues in your application; reading from a hard drive is one example.
I suggest running a profiler on your application. Visual Studio Professional and above come with a built-in profiler that lets you trace and pinpoint intensive operations that could possibly be improved with concurrency. If your application is running smoothly, then there is no need, but there is nothing wrong with thinking ahead and learning the Task Parallel Library, as I'm sure there will be a point where knowing how to implement concurrency in your application will benefit you.
I've used TPL to solve various performance issues with large database calls in iterative loops, and it's great for these I/O operations. TPL also takes into account the hardware it is executing on and, if used correctly, scales its degree of parallelism to that hardware. You could take the same piece of code and run it on a 2-core machine, and it will still make the best use of what the hardware can provide without you having to worry about creating too many threads.
Personally, I'd say some asynchronous operations could be a good addition to your application since this is regarding external network camera devices which could cause blocking threads in your application.
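As a rough illustration of off-loading work to a Task, here is a minimal sketch. The frame comparison is a hypothetical stand-in for real motion detection (the question does not show its algorithm), and the class and method names are made up for the example. It assumes .NET 4.5 or later for Task.Run.

```csharp
using System;
using System.Threading.Tasks;

public static class MotionSketch
{
    // Hypothetical stand-in for motion detection: counts the bytes that
    // differ between two frames by more than the given threshold.
    public static int CountChangedPixels(byte[] previous, byte[] current, int threshold)
    {
        int changed = 0;
        int length = Math.Min(previous.Length, current.Length);
        for (int i = 0; i < length; i++)
        {
            if (Math.Abs(previous[i] - current[i]) > threshold)
            {
                changed++;
            }
        }
        return changed;
    }

    // Runs the comparison on a thread-pool thread so that the caller
    // (for example the UI thread) is not blocked while a frame is analysed.
    public static Task<int> CountChangedPixelsAsync(byte[] previous, byte[] current, int threshold)
    {
        return Task.Run(() => CountChangedPixels(previous, current, threshold));
    }
}
```

From a WinForms event handler marked async, the call would look like `int changed = await MotionSketch.CountChangedPixelsAsync(lastFrame, newFrame, 15);`, and the code after the await continues on the UI thread.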
Related
I've reviewed the documentation for Xamarin, and it recommends using ThreadPool for multithreaded functionality, as can be seen here:
http://docs.xamarin.com/guides/ios/application_fundamentals/threading
However, a benchmark has been done showing that Grand Central Dispatch is much more performant than ThreadPool
http://joeengalan.wordpress.com/2012/02/26/execution-differences-between-grand-central-dispatch-and-threadpool-queueuserworkitem-in-monotouch/
Therefore my question is, why does Xamarin recommend ThreadPool over Grand Central Dispatch? Is Xamarin eventually going to tie ThreadPool into Grand Central Dispatch? When would one choose one over the other? Because if ThreadPool is going to be optimized by Xamarin, and eventually outperform Grand Central Dispatch, then I do not want to use Grand Central Dispatch.
There is very little "extra performance" that you can squeeze out of a machine, in particular a mobile device, by introducing more threads.
My comment on the post you linked (from February 2012), as well as the first paragraph of that article, explains the reason.
The difference between GCD and ThreadPool is that the ThreadPool in Mono has a "slow start" setup, so that it does not create more threads than necessary in the presence of work peaks. You can easily starve the CPU by launching too many threads, so the thread pool throttles itself after the initial threads have been created and then tries to create a new thread only about once per second (give or take, I don't remember the actual details).
If you want to force the ThreadPool to actually spin up a lot of threads, you can control that with ThreadPool.SetMinThreads.
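A minimal sketch of raising that minimum is shown below. The helper name and the idea of only ever raising the value are assumptions for the example; ThreadPool.GetMinThreads and ThreadPool.SetMinThreads are the standard .NET calls.

```csharp
using System;
using System.Threading;

public static class ThreadPoolConfig
{
    // Raises the worker-thread minimum to at least desiredWorkers and leaves
    // the I/O completion-port minimum unchanged. Returns false if the
    // runtime rejects the requested values.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        int workers;
        int completionPorts;
        ThreadPool.GetMinThreads(out workers, out completionPorts);

        if (workers >= desiredWorkers)
        {
            return true; // already high enough, nothing to change
        }
        return ThreadPool.SetMinThreads(desiredWorkers, completionPorts);
    }
}
```

Calling `ThreadPoolConfig.EnsureMinWorkerThreads(16)` once at start-up would let the pool create up to 16 worker threads on demand before its throttling kicks in. Whether that actually helps depends on the workload, so it is worth measuring.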
The reason to use the ThreadPool is that the same code will work across all platforms.
Notice that the document talks about using the ThreadPool over the other standard .NET threading APIs and does not say anything about using GCD or not. It is merely that the threadpool is a better choice than rolling your own management using Threads.
That said, API-wise, these days I recommend that people use the Task Parallel Library (TPL), which is a much higher-level way of thinking about your background operations than a thread. In addition, you get the same API across platforms, with the flexibility of using either the built-in thread pool or dispatching to GCD by switching one line of code.
The current problem with the Mono (Xamarin) thread pool is that it does not perform.
The Xamarin debugger gets choked with as few as 10 simultaneous tasks. In release it is not much better.
In my case the same code on Windows outperforms the Mac version by 10x or more (note I am not using ANYTHING system-specific). I tried different combinations of thread pool, async methods and asynchronous callbacks (BeginRead etc.) - Xamarin sucks at all of it.
If I had to guess, it relates to their obsession with iOS being inherently single-threaded. As for why they recommend it, I have a guess too: it is the only part of the framework that works as far as multithreading is concerned.
I spent weeks trying to optimize my code, but there is nothing you can do; if you use multithreading you are stuck.
I am working on a WPF application.
In a screen/view I have to make 6 calls to a WCF service. None of those calls are related, in the sense that they don't share data and they don't depend on each other. I am planning to use TPL and make these 6 WCF service calls as 6 tasks. The application might be deployed on either a single-core or a multi-core machine.
I am being told that using TPL on a single-core machine would actually increase the time taken for the tasks to complete, because of the overhead placed on the CPU scheduler to time-slice the different tasks. Is this true? If so, should I still continue with my design, or should I look at alternatives?
If I have to look at alternatives, what are those alternatives :) ?
When doing something CPU intensive, you would be adding overhead by running parallel threads on a single core machine.
In your case the tasks are not CPU intensive, they are waiting for a service call to respond, so you can very well run parallel threads on a single core machine.
Depending on how the server handles the calls, there might not be any time increase anyway. If the calls are queued on the server, it will take about the same time to run all calls anyway. In that case it would be better to run the calls in sequence, just because it's simpler.
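To make the "waiting, not computing" point concrete, here is a minimal sketch that starts several independent calls and awaits them together. FakeServiceCallAsync is a hypothetical stand-in that simply delays; a real WCF proxy would need task-based async operations generated for this pattern to apply as written.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ServiceCalls
{
    // Hypothetical stand-in for one service call: it waits without using
    // the CPU, then returns its own id as the "response".
    public static async Task<int> FakeServiceCallAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs);
        return id;
    }

    // Starts every call first and only then awaits them, so the waits
    // overlap. Results come back in the same order as the calls.
    public static async Task<int[]> CallAllAsync(int count, int delayMs)
    {
        Task<int>[] calls = Enumerable.Range(1, count)
            .Select(id => FakeServiceCallAsync(id, delayMs))
            .ToArray();
        return await Task.WhenAll(calls);
    }
}
```

With `await ServiceCalls.CallAllAsync(6, 200)`, the six simulated calls overlap their waits instead of running back to back, and that holds on a single core too because no thread is busy while a call is pending. If the server queues the requests, as noted above, the total time may not improve.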
Your best bet is to profile on both multi-core and single-core configurations. Most BIOSes let you set the number of active cores, so it shouldn't be a big problem. You can do some mock testing to find out whether it will work for you.
Obviously using task switching has overhead issues but as long as each task's time is much longer than the setup time you won't notice it.
There are many ways to implement multi-tasking behavior and if you do not know which is best then chances are you need to actually write some test cases and do some profiling. This is not difficult to do. If you are simply trying to use multi-core systems then it generally is quite easy with the latest version of .NET and you can even set it up for multi-core but revert back to single core by using appropriate constructs.
The async/await pattern, for example, can easily be run synchronously, either by using #if or by removing all await keywords (with a search-and-replace tool). Parallel.For loops are easily converted to normal for loops, either directly or by setting MaxDegreeOfParallelism to 1. Tasks can also easily be run synchronously.
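As a sketch of the MaxDegreeOfParallelism approach, the loop below takes the degree as a parameter, so the same code can be run one iteration at a time or in parallel. The class name and the sum-of-squares workload are illustrative only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelToggle
{
    // Sums the squares of 0..count-1. With maxDegree = 1 the iterations run
    // one at a time; with -1 the scheduler picks the degree of parallelism.
    public static long SumOfSquares(int count, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        long total = 0;

        Parallel.For(0, count, options, i =>
        {
            // Interlocked keeps the shared total correct when iterations overlap.
            Interlocked.Add(ref total, (long)i * i);
        });

        return total;
    }
}
```

`ParallelToggle.SumOfSquares(10, 1)` and `ParallelToggle.SumOfSquares(10, -1)` both return 285; only the scheduling differs, which makes it easy to compare timings on single-core and multi-core machines.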
If you would like to make it more transparent you could use some pre-processing scripting like T4.
In general, running multiple threads on a single core will be slower, because of the context switches between the threads.
I think the following diagram explains the difference:
As you can see, the diagram shows 4 threads running on a single core, first with multitasking and then sequentially.
You can see that with multitasking, all the threads finish later than they would when run sequentially.
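The same comparison can be worked through numerically. The sketch below is a toy model, not a real scheduler: it assumes equal-length CPU-bound tasks, a time slice of one unit, and a fixed cost per switch, and it returns the time at which each task finishes.

```csharp
using System;

public static class SingleCoreSchedule
{
    // Finish time of each task when the tasks run one after another.
    public static int[] Sequential(int taskCount, int unitsPerTask)
    {
        var finish = new int[taskCount];
        for (int t = 0; t < taskCount; t++)
        {
            finish[t] = (t + 1) * unitsPerTask;
        }
        return finish;
    }

    // Finish time of each task under round-robin scheduling with a slice of
    // one unit, paying switchCost units whenever the core changes task.
    public static int[] RoundRobin(int taskCount, int unitsPerTask, int switchCost)
    {
        if (unitsPerTask < 1) throw new ArgumentOutOfRangeException("unitsPerTask");

        var remaining = new int[taskCount];
        var finish = new int[taskCount];
        for (int t = 0; t < taskCount; t++) remaining[t] = unitsPerTask;

        int clock = 0;
        int unfinished = taskCount;
        int last = -1; // task that ran in the previous slice
        while (unfinished > 0)
        {
            for (int t = 0; t < taskCount; t++)
            {
                if (remaining[t] == 0) continue;
                if (last != -1 && last != t) clock += switchCost;
                last = t;
                clock += 1;
                remaining[t]--;
                if (remaining[t] == 0) { finish[t] = clock; unfinished--; }
            }
        }
        return finish;
    }
}
```

For 4 tasks of 4 units each, Sequential gives finish times 4, 8, 12, 16. RoundRobin with a switch cost of 0 gives 13, 14, 15, 16, so every task except the last finishes later even when switching is free. With a switch cost of 1 it gives 25, 27, 29, 31, so every task finishes later.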
In your specific case it probably won't be the same, and I think @Guffa is right in his answer, since it involves WCF calls.
I am trying to write a program in C# that will connect to around 400 computers and retrieve some information; let's say it retrieves the list of web services running on each computer.
I am assuming I need a well-threaded application to be able to retrieve info from such a huge number of servers really quickly. I am pretty blank on how to start working on this. Can you guys give me a head start on how to begin?
Thanks!
I see no reason why you should use threading in your main logic. Use asynchronous APIs and schedule their callback to the main thread. That way you get the benefits of asynchrony, but without most of the difficulty related to threading.
You'll only need multithreading in your logic code if the work you need to do on the data is expensive. And even then you can usually get away with parallelizing using side-effect-free functions.
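A minimal sketch of the asynchronous approach follows, assuming .NET 4.5 or later. The query delegate is a hypothetical placeholder for whatever asynchronous API actually talks to each machine, and the SemaphoreSlim throttle is an addition for the example so that not all 400 requests are in flight at once. Host names are assumed to be unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class HostQuery
{
    // Runs query against every host with at most maxConcurrent requests in
    // flight at any time, and returns the answers keyed by host name.
    public static async Task<Dictionary<string, string>> QueryAllAsync(
        IEnumerable<string> hosts,
        Func<string, Task<string>> query,
        int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = hosts.Select(async host =>
            {
                await gate.WaitAsync(); // wait for a free slot
                try
                {
                    string answer = await query(host);
                    return new KeyValuePair<string, string>(host, answer);
                }
                finally
                {
                    gate.Release(); // free the slot even if the query failed
                }
            }).ToList();

            KeyValuePair<string, string>[] pairs = await Task.WhenAll(tasks);
            return pairs.ToDictionary(p => p.Key, p => p.Value);
        }
    }
}
```

No thread is dedicated to any one machine here: while a request is pending, nothing is blocked, which is why this scales to hundreds of hosts without a large number of threads.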
Take a look at the Task Parallel Library.
Specifically, Data Parallelism.
You could also use PLINQ if you wanted.
You should also execute the threads in parallel on a multi-core CPU to enhance performance.
My favourite references on the topic are given below -
http://www.albahari.com/threading/
http://www.codeproject.com/KB/Parallel_Programming/NET4ParallelIntro.aspx
Where and how do you get the list of those 400 servers to query?
How often do you need to do this?
You could use a Windows service or a scheduled task that invokes your software. In it, you could loop over the server list with foreach and start the call to each server on a different thread using the thread pool. The pool has a maximum size, though, so you won't start 400 threads all at once anyway.
Describe your solution a bit better and we'll see what you can do :)
Take a look at this library: Task Parallel Library. You can make efficient use of your system resources and manage your work easier than managing your threads directly.
There might be considerable impact on the server side when you start querying all 400 computers. But you can take a look at Parallel LINQ (PLINQ), where you can limit the degree of parallelism.
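A minimal sketch of capping the degree of parallelism in PLINQ is shown below. The check delegate is a hypothetical placeholder for the per-server query, and AsOrdered is added only so that results line up with the input.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Applies check to each server name using at most maxDegree concurrent
    // invocations, and returns the results in input order.
    public static string[] CheckAll(string[] servers, Func<string, string> check, int maxDegree)
    {
        return servers
            .AsParallel()
            .AsOrdered()
            .WithDegreeOfParallelism(maxDegree)
            .Select(check)
            .ToArray();
    }
}
```

Note that PLINQ uses blocking threads, so for purely network-bound queries each concurrent call still occupies a thread while it waits.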
You can also use thread pooling for this, e.g. via the Task class.
Creating threads manually may not be a good idea, as they are not readily reusable and take quite a lot of memory/CPU to create.
When do you use threads in an application? For example, in simple CRUD operations, sending mail over SMTP, calling web services that may take a while if the server is facing bandwidth issues, etc.
To be honest, I don't know how to determine whether I need to use a thread (I know it must be when we're expecting an operation to take a while to complete).
This may be a "noob" question but it'll be great if you share with me your experience in threads.
Thanks
I added C# and .NET tags to your question because you mention C# in your title. If that is not accurate, feel free to remove the tags.
There are different styles of multithreading. For example, there are asynchronous operations with callback functions. .NET 4 introduces the parallel Linq library. The style of multithreading you would use, or whether to use any at all, depends on what you are trying to accomplish.
Parallel execution, such as what parallel Linq would generally be trying to do, takes advantage of multiple processor cores executing instructions that do not need to wait for data from each other. There are many sources for such algorithms outside Linq, such as this. However, it is possible that parallel execution may be unavailable to you or that it does not suit your application.
More traditional multithreading takes advantage of threading within the .NET library (in this case) as provided by System.Threading. Remember that there is some overhead in starting work on threads, so only use threads when the advantages of doing so outweigh this overhead. Generally speaking, you would only want to use this type of single-processor multithreading when the task running on the thread will have long gaps in which the processor could be doing something else. For example, I/O from hard disk (and, consequently, from a database system that uses one) is many orders of magnitude slower than memory access. Network access can also be slow, as another example. Multithreading could allow other work to run while waiting for these slow (compared to the processor) operations to complete.
Another example when I have used traditional multithreading is to cache some values the first time a particular ASP.NET page is accessed within a session. I kick off a thread so that the user does not have to wait for the caching to complete before interacting with the page. I also regulate the behavior when the caching does not complete before the user requests another page so that, if the caching does not complete, it is not a problem. It simply makes some further requests faster that were previously too slow.
Consider also the cost that multithreading has to the maintainability of your application. Threaded applications can be harder to debug, for example.
I hope this answers your question at least somewhat.
Joseph Albahari summarized it very well here:
Maintaining a responsive user interface
Making efficient use of an otherwise blocked CPU
Parallel programming
Speculative execution
Allowing requests to be processed simultaneously
One reason to use threads is to split large, CPU-bound tasks across a number of CPUs/cores, to finish faster. Another is to let an extended task execute asynchronously, so the foreground can remain responsive while it runs.
Your examples seem to be concentrating on the second of these. While it can be a good reason, if you can use asynchronous I/O instead, that's usually preferable (e.g., almost anything using sockets can/will be better off using the socket(s) asynchronously). Asynchronous I/O is easier to cancel, and it'll usually have lower CPU overhead as well.
You can use threads when you need different execution paths. This leads (when done correctly) to more responsive and/or faster applications, but also to more complex code and debugging.
In a simple CRUD scenario it may not be that useful, but maybe your UI is consuming a slow web service. If your code is tied to your UI thread, you will have an unresponsive UI during the service calls.
In that case, using System.Threading.Thread may be overkill because you don't need that much control. Using a BackgroundWorker may be a better choice.
Threading is difficult to master, but the benefits when used correctly are huge; performance is the most common one.
In a way you have answered your question yourself. Using threads whenever you execute time-consuming operations is the right choice. You should also use them in situations where you want to make things faster. For example, if you want to process a number of files, each file can be processed by a different thread.
By using threads you can better utilize power of multi-core/processor machines.
Monitoring some data in background of your application.
There are dozens of such scenarios.
Realising my comment might suffice as an answer ...
I like to view multi-threading scenarios from a resource perspective. In other words, UI (graphics), networking, disk IO, CPU (cores), RAM etc. I find that helps when deciding where to use multi-threading in the general sense at least.
The reasoning behind this is simply that I can take advantage of one resource on a specific thread (eg. Disk IO) while at the same time using another thread to accomplish something else using a different resource.
Scenario
I have a very heavy number-crunching process that pools large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result.
This process is fine if it is only used by a single asset. However I now have 3500 assets that I need to process, which takes about 1hr30mins in the state of the current process.
Question
What is my best option for speeding this process up in terms of a multi-threaded C# application? Realistically I don't have to share anything between the processing of each asset, so I'm confident that processing multiple assets at a time shouldn't cause too many issues.
Thoughts
I've heard good things about thread pools, but I guess realistically I want something that isn't too huge to implement, is easily understandable and can run off a decent number of threads at a time.
Help would be greatly appreciated.
In .NET you can use the existing thread pool; there is no need to implement one yourself. Here is the relevant MSDN.
You should take care not to run too many processes at once (3500 are a bit much), but using the supplied queuing mechanism should get you started in the right direction.
Another thing to try is using PLINQ.
If you don't have a multi-core processor, multiple machines, and/or the thread processes are not I/O bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
Thread pools are fine, and you can use a task queue to do simple load-balancing, but if there are no spare CPU cycles in the current application this would be a waste of time.
The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using VS 2010 RC. This has built-in load balancing and work stealing queues, so it will make this task easy to thread, and very scalable.
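A minimal sketch of that approach with Parallel.ForEach follows. The processAsset delegate is a hypothetical placeholder for the per-asset number crunching, and collecting results in a ConcurrentDictionary is an assumption made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AssetRunner
{
    // Processes every asset id, letting the TPL decide how many run at once,
    // and collects one result per asset in a thread-safe dictionary.
    public static IDictionary<int, double> ProcessAll(
        IEnumerable<int> assetIds,
        Func<int, double> processAsset)
    {
        var results = new ConcurrentDictionary<int, double>();

        Parallel.ForEach(assetIds, id =>
        {
            results[id] = processAsset(id);
        });

        return results;
    }
}
```

Because the assets share nothing, no locking is needed beyond the thread-safe result collection. If the database becomes the bottleneck, a ParallelOptions value with MaxDegreeOfParallelism set can cap how many assets are processed at once.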
However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, and just using ThreadPool.QueueUserWorkItem to start each task.
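For the .NET 3.5 route, a minimal sketch might look like this. It avoids types introduced in .NET 4 and waits for completion with a counter plus a ManualResetEvent; the processAsset delegate is again a hypothetical placeholder.

```csharp
using System;
using System.Threading;

public static class PoolRunner
{
    // Queues one work item per asset on the thread pool and blocks until
    // all of them have finished.
    public static void RunAll(int[] assetIds, Action<int> processAsset)
    {
        if (assetIds.Length == 0) return;

        int pending = assetIds.Length;
        using (var done = new ManualResetEvent(false))
        {
            foreach (int id in assetIds)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        processAsset((int)state);
                    }
                    finally
                    {
                        // The last item to finish signals the waiting caller.
                        if (Interlocked.Decrement(ref pending) == 0) done.Set();
                    }
                }, id);
            }
            done.WaitOne();
        }
    }
}
```

An unhandled exception on a pool thread terminates the process, so in real code processAsset should catch and record its own failures.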
If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue, which you pull work from and execute, can be beneficial in this case. Just place all of your work items into a queue, and have threads pull work from the queue (with appropriate locking), and process.
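The queue-and-workers idea can be sketched as below, using a plain Queue guarded by a lock and a fixed number of dedicated threads. The worker count and the processAsset delegate are assumptions for the example; a natural starting point for workerCount is Environment.ProcessorCount.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class WorkQueue
{
    // Fills a queue with all asset ids, then lets workerCount threads pull
    // ids from it until it is empty. Returns when every worker has finished.
    public static void Drain(IEnumerable<int> assetIds, int workerCount, Action<int> processAsset)
    {
        var queue = new Queue<int>(assetIds);
        var gate = new object();
        var workers = new Thread[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = new Thread(() =>
            {
                while (true)
                {
                    int id;
                    lock (gate) // only the dequeue is serialized
                    {
                        if (queue.Count == 0) return;
                        id = queue.Dequeue();
                    }
                    processAsset(id); // the heavy work runs outside the lock
                }
            });
            workers[w].Start();
        }

        foreach (Thread worker in workers) worker.Join();
    }
}
```

Because each worker takes the next item as soon as it is free, slow assets do not hold up the others, and the number of concurrently running computations never exceeds workerCount.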
If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.
The built-in .NET ThreadPool will meet both of your requirements: running a decent number of threads and being simple to work with. I have previously written an article on the subject which you can find here.
If you are using SQL Server 2005 or later, you can create user-defined functions in C# and use them from within T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job with it, so consider keeping as much of the processing in the database engine as you can.