C# multi threading query - c#

I am trying to write a program in C# that will connect to around 400 computers and retrieve some information, lets say it retrieves the list of web services running on each computer.
I am assuming I need a well threaded application to be able to retrieve info from such a huge number of servers really quick. I am pretty blank on how to start working on this, can you guys give me a head start as to how to begin!
Thanks!

I see no reason why you should use threading in your main logic. Use asynchronous APIs and schedule their callback to the main thread. That way you get the benefits of asynchrony, but without most of the difficulty related to threading.
You'll only need multithreading in your logic code if the work you need to do on the data is that expensive. And even then you usually can get aways with parallelizing using side effect free functions.

Take a look at the Task Parallel Library.
Speficically Data Parallelism.
You could also use PLINQ if you wanted.

You should also execute the threads parallely on a multi-core CPU to enhance performance.
My favourite references on the topic are given below -
http://www.albahari.com/threading/
http://www.codeproject.com/KB/Parallel_Programming/NET4ParallelIntro.aspx

Where and how do you get the list of those 400 servers to query?
how often do you need to do this?
you could use a windows service or schedule a task which invoke your software and in it you could do a foreach element in the server list and start a call to such server in a different thread using thread queue/pool, but there is a maximum so you won't start 400 threads all together anyway.
describe a bit better your solution and we see what you can do :)

Take a look at this library: Task Parallel Library. You can make efficient use of your system resources and manage your work easier than managing your threads directly.

There might be considerable impact on the server side when you start query all 400 computers. But you can take a look at Parallel LINQ (PLINQ), where you can limit the degree of parallelism.
You can also use thread pooling for this matter, e.g. a Task class.
Createing manual threads may not be a good idea, as they are not highly reusable and take quite a lot of memory/CPU to be created

Related

Using ThreadPool threads in library

I'm working on a .net core library that will get used mostly in web apps. This library is being built with performance in mind as this is the main design decision. There is some code that is fairly heavy and due to this, will get cached so that subsequent calls are quick. As you can imagine, the first call is slower and I don't want that. I want to execute this code at the earliest possible time to warm up the cache without affecting the other operations. I was thinking of using Task.Start() without awaiting to to achieve this.
My question is, is it frowned upon to use threadpool threads in a library, i.e what is the etiquette on this? As this will be mostly used on web apps, I feel I don't want to interfere with the client's threadpool. That being said, the library will only use one background thread and this will be less than a second. Or should I just let the client take the performance hit for first calls?
If I understand you correctly; it's perfectly legitimate to use multi-threading in a library; as a matter of fact: it happens all the time.
Basically, a lot of async Task methods do this in one way or another. (Sometimes there is no thread)
If it's so heavy you need multiple parallel threads for a long period in time, than it's best to create an explicit initialize routine, and warn the caller in the docs.
Task.Run is typically used for such processing.

multi core processing in c#

I have a c# desktop application. Its purpose is 2 fold.
1). To display a live feed from an IP camera to my winform application.
2). Send any captured motion to my server.
It is (2) that is labour intensive. I believe I have optimised it as much as I can and the RAM is manageable.
However, in my quest to learn and to try to make my code even more efficient I am always open to new approaches.
Today, I have come across parallel processing. But, reading some links it seems to suggest there would be not much performance gain using parallel processing. Indeed in all my travels (contracts) I have never seen anyone use parallel processing in C# development.
Should I take early heed and not bother to look into this or should I see whether there is anything to gain by 'off-loading' my motion detection code to a separate parallel process?
Peoples advice/experience would be greatly informative.
Thanks
I would recommend taking a look at the Task Parallel Library provided in the .NET Framework, it's based on an idea that a piece of work is a Task. The idea is to give an abstraction to having to manage and create threads manually.
Tasks can run in parallel, on their own threads or run on the same thread, depending on the workload and configuration. Task Parallel Library is also great for asynchronous operations and work very well with I/O where the hardware can cause a blocking thread which can cause performance issues in your application, for example reading from a hard drive will cause some issues.
I suggest running a profiler on your application, visual studio professional onwards comes with a built in profiler that will enable you to trace and pin-point intensive operations that could possibly be improved with concurrency. If your application is running smooth, then there is no need, but there's nothing wrong with forward thinking and learning the Task Parallel Library as im sure there will be a point where this will benefit you from knowing how to implement concurrency in your application.
I've used TPL to solve various performance issues with large database calls in iterative loops and it's great for these IO operations, TPL will also take into account the hardware which it's being executed on and if used correctly, always be the most optimal for the hardware its running on. You could take your same piece of code and run it on a 2 core machine and it will still work the best to its abilities the hardware can provide without you having to worry about creating too many threads etc.
Personally, I'd say some asynchronous operations could be a good addition to your application since this is regarding external network camera devices which could cause blocking threads in your application.

When to use Multithread?

When do you use threads in a application? For example, in simple CRUD operations, use of smtp, calling webservices that may take a few time if the server is facing bandwith issues, etc.
To be honest, i don't know how to determine if i need to use a thread (i know that it must be when we're excepting that a operation will take a few time to be done).
This may be a "noob" question but it'll be great if you share with me your experience in threads.
Thanks
I added C# and .NET tags to your question because you mention C# in your title. If that is not accurate, feel free to remove the tags.
There are different styles of multithreading. For example, there are asynchronous operations with callback functions. .NET 4 introduces the parallel Linq library. The style of multithreading you would use, or whether to use any at all, depends on what you are trying to accomplish.
Parallel execution, such as what parallel Linq would generally be trying to do, takes advantage of multiple processor cores executing instructions that do not need to wait for data from each other. There are many sources for such algorithms outside Linq, such as this. However, it is possible that parallel execution may be unable to you or that it does not suit your application.
More traditional multithreading takes advantage of threading within the .NET library (in this case) as provided by System.Thread. Remember that there is some overhead in starting processes on threads, so only use threads when the advantages of doing so outweigh this overhead. Generally speaking, you would only want to use this type of single-processor multithreading when the task running under the thread will have long gaps in which the processor could be doing something else. For example, I/O from hard disk (and, consequently, from a database system that uses one) is many orders of magnitude slower than memory access. Network access can also be slow, as another example. Multithreading could allow another process to be running while waiting for these slow (compared to the processor) operations to complete.
Another example when I have used traditional multithreading is to cache some values the first time a particular ASP.NET page is accessed within a session. I kick off a thread so that the user does not have to wait for the caching to complete before interacting with the page. I also regulate the behavior when the caching does not complete before the user requests another page so that, if the caching does not complete, it is not a problem. It simply makes some further requests faster that were previously too slow.
Consider also the cost that multithreading has to the maintainability of your application. Threaded applications can be harder to debug, for example.
I hope this answers your question at least somewhat.
Joseph Albahari summarized it very well here:
Maintaining a responsive user interface
Making efficient use of an otherwise blocked CPU
Parallel programming
Speculative execution
Allowing requests to be processed simultaneously
One reason to use threads is to split large, CPU-bound tasks across a number of CPUs/cores, to finish faster. Another is to let an extended task execute asynchronously, so the foreground can remain responsive while it runs.
Your examples seem to be concentrating on the second of these. While it can be a good reason, if you can use asynchronous I/O instead, that's usually preferable (e.g., almost anything using sockets can/will be better off using the socket(s) asynchronously). Asynchronous I/O is easier to cancel, and it'll usually have lower CPU overhead as well.
You can use threads when you need different execution paths. This leads(when done correctly) to more responsive and/or faster applications but also leads to more complex code and debugging.
In a simple CRUD scenario maybe is not that useful, but maybe your UI is consuming a slow web service. If you your code is tied to your UI thread you will have unresponsive UI between the service calls.
In that case, using System.Threading.Threads maybe be overkill because you don't need so much control. Using a BackgrounWorker maybe a better choice.
Threading is something difficult to master, but the benefits when used correctly are huge, performance is the most common.
Somehow you have answered your question by yourself. Using threads whenever you execute time consuming operations is right choice. Also you should it in situations when you want to make things faster. For example you want to process some amount of files - each file can be processed by different thread.
By using threads you can better utilize power of multi-core/processor machines.
Monitoring some data in background of your application.
There are dozens of such scenarios.
Realising my comment might suffice as an answer ...
I like to view multi-threading scenarios from a resource perspective. In other words, UI (graphics), networking, disk IO, CPU (cores), RAM etc. I find that helps when deciding where to use multi-threading in the general sense at least.
The reasoning behind this is simply that I can take advantage of one resource on a specific thread (eg. Disk IO) while at the same time using another thread to accomplish something else using a different resource.

Appropriate Multi-Threading Option

Scenario
I have a very heavy number-crunching process that pools large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result.
This process is fine if it is only used by a single asset. However I now have 3500 assets that I need to process, which takes about 1hr30mins in the state of the current process.
Question
What is my best option for speeding this process up in terms of a multi-threaded c# application? Realistically I don't have to share anything between the processing of each asset, so I'm confident that being able to run process multiple assets at a time shouldn't cause too many issues.
Thoughts
I've heard good things about thread pools, but I guess realistically I want something that isn't too huge to implement, is easily understandable and can run off a decent number of threads at a time.
Help would be greatly appreciated.
In .net you can use the existing Thread Pool, no need to implement one yourself. Here is the relevant MSDN.
You should take care not to run too many processes at once (3500 are a bit much), but using the supplied queuing mechanism should get you started in the right direction.
Another thing to try is using PLINQ.
If you don't have a multi-core processor, multiple machines, and/or the thread processes are not I/O bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
Thread pools are fine, and you can use a task queue to do simple load-balancing, but if there's no spare CPU cycles in the current application this would be a waste of time.
The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using VS 2010 RC. This has built-in load balancing and work stealing queues, so it will make this task easy to thread, and very scalable.
However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, and just using ThreadPool.QueueUserWorkItem to start each task.
If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue, which you pull work from and execute, can be beneficial in this case. Just place all of your work items into a queue, and have threads pull work from the queue (with appropriate locking), and process.
If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.
The .Net built in ThreadPool will solve both of your requirements of running a decent number of threads as well as being simple to work with. I have previously written an article on the subject which you can find here.
With using SQL Server 2005 or later, you can create user-defined functions in C# and use them from within T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job with it, so consider keeping as much of the processing in the database engine as you can.

Looking for a good exercise to help me get better at Multithreading

I think of myself as a pretty decent developer, however when it comes to multithreading I'm a total n00b. I mean the only multithreading I've done at work was the very basic stuff like spawning off multiple threads using the ThreadPool to do some background work. No synchronization was necessary, and there was never any need to create threads manually.
So, my question is this; I want to write some application that will need to be heavily multithreaded and that will need to do all of the advanced things like synchronization etc.. I just can't think of anything to write. I've thought of maybe trying to write my own ThreadPool, but I think I need to learn to walk before I can run. So what ideas can anyone suggest? It doesn't have to have any real world useage, it can be totally pointless and worthless, but I just want to get better. I've read tons of articles and tutorials on all the theory, but the only way to REALLY get better is by doing. So, any ideas?
Recursive Quick Sort. Compare sort time as a function of number of threads.
Craps simulator. How many rolls of the dice can you do per minute?
Web page crawler. Give it a URL and download all of the child pages and images. Watch for pages that reference each other so you don't go into an infinite loop. Note that the threads will block waiting for the network response, giving you a different CPU utilization than pure calculation-based threads. Use a queue to keep track of unread pages and a dictionary to keep track of active threads. Threads that timeout go back to the queue.
WCF web server. Spawn a new thread for each request. Write a multi-threaded WPF client that updates the user-interface in real-time.
Is that enough?
How about some kind of pointless batch-processing app? Generate a horrendous amount of data in a single thread and dump it out to a file, then start splitting the work off into threads of differing sizes and dumping them out to another file, time them and compare the files at the end to ensure the order is the same. This would get you into multi-threading, locking, mutexes and what not and also show the benefits of multithreading certain tasks vs. processing in a single thread.
First thing that popped into my head. Could be dull and/or pointless, but don't shoot the messenger! :)
I think you should draw attention for this books:
Windows via C/C++ by Jeffrey Richter. This is one of the best books about multithreading
Concurrent Programming on Windows by Joe Duffy. Another excelent book about multithreading
Excelent articles by Herb Sutter (Herb, we all waiting for your new book!)
Effective Concurrency series
Some blogs:
Herb Sutter's blog.
Parallel programming with .Net
Jeffrey Richter's Blog
Joe Duffy's blog
P.S. How about Power Threading as an example of multithreading (and ThreadPool implementation)?

Categories

Resources