I am trying to find out whether there is any built-in method or property that reports the time taken by each task when executing multiple tasks (say, adding tasks to an array or list, starting all of them at the same time, and waiting for all of them to finish). If, during this process, one or more tasks take much longer than expected, I should be able to identify those tasks and remove them from the array or list. If there is no built-in method or property, is there any other way to do this?
Hard to beat a good old Stopwatch, from the System.Diagnostics namespace. Set up a ConcurrentDictionary<int, Stopwatch>, with the integer key identifying each task (the managed thread ID works; you can also key on the Task objects or the Threads themselves, depending on how you're spinning them up, or set up a communication token that includes a "Cancel" method). Each thread or task, when it starts, should create a Stopwatch, add it to the dictionary, and then Start() it before continuing with its work. When it's done, it should Stop() its stopwatch and remove it from the dictionary (it can also put the resulting Elapsed time into a ConcurrentQueue that the supervisor thread can use to log running times; the dictionary tracks the running times of active tasks so the supervisor can manage them). The supervisor thread can then periodically check for tasks taking much longer than average, and when it finds one, trip that task's cancellation token and remove its entry from the dictionary.
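By way of illustration, here is a minimal sketch of that pattern; the Supervisor class, the ID scheme, and the straggler threshold are all hypothetical, and a real version would decide its threshold from the logged averages:

using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class Supervisor
{
    // One stopwatch and one cancellation source per active task, keyed by an ID we assign.
    private readonly ConcurrentDictionary<int, (Stopwatch Watch, CancellationTokenSource Cts)> running =
        new ConcurrentDictionary<int, (Stopwatch, CancellationTokenSource)>();

    // Completed run times, for the supervisor to log and compute averages from.
    private readonly ConcurrentQueue<TimeSpan> finished = new ConcurrentQueue<TimeSpan>();

    private int nextId;

    public Task Start(Action<CancellationToken> work)
    {
        int id = Interlocked.Increment(ref nextId);
        var cts = new CancellationTokenSource();
        return Task.Run(() =>
        {
            var watch = Stopwatch.StartNew();
            running[id] = (watch, cts);
            try
            {
                work(cts.Token);            // the work must observe the token to stop early
            }
            finally
            {
                watch.Stop();
                running.TryRemove(id, out _);
                finished.Enqueue(watch.Elapsed);
            }
        }, cts.Token);
    }

    // Called periodically from the supervisor thread: cancel anything
    // that has been running much longer than the given threshold.
    public void CancelStragglers(TimeSpan threshold)
    {
        foreach (var entry in running)
            if (entry.Value.Watch.Elapsed > threshold)
                entry.Value.Cts.Cancel();
    }
}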
You can use a Stopwatch.
It's specifically made for timing the execution of your code accurately.
It sounds like you are running these tasks in parallel, probably with the TPL. If that is the case, you can assign a single CancellationToken to all the Tasks you create and then use that token to stop all the tasks simultaneously if a Stopwatch shows the time limit has been exceeded.
Alternatively, assign a separate Stopwatch to each of your tasks and stop them individually whenever necessary.
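As a rough sketch of the shared-token idea (the four tasks, the 30-second limit, and DoWork are all placeholders; CancellationTokenSource.CancelAfter stands in for checking a Stopwatch by hand):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        cts.CancelAfter(TimeSpan.FromSeconds(30));  // trip the shared token if the limit is exceeded

        // Every task shares the same token, so one cancellation stops them all.
        Task[] tasks = Enumerable.Range(0, 4)
            .Select(_ => Task.Run(() => DoWork(cts.Token), cts.Token))
            .ToArray();

        try
        {
            Task.WaitAll(tasks);
        }
        catch (AggregateException)
        {
            // Contains TaskCanceledExceptions if the time limit was hit.
        }
    }

    static void DoWork(CancellationToken token)
    {
        for (int i = 0; i < 100; i++)
        {
            token.ThrowIfCancellationRequested();   // cooperative cancellation point
            Thread.Sleep(100);                      // stand-in for real work
        }
    }
}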
I have a Windows service that does long operations every X minutes using System.Timers.Timer.
I need to know if there is a way to figure out, in the next loop, whether the previous execution is still running.
Is there also a way to know how much CPU and memory the previous thread is using?
A simple solution would be to use a shared, volatile bool that is set at the start of the method and reset at the end. Or use Interlocked.Increment/Interlocked.Decrement on a shared counter to keep a running total of the number of threads currently in your method.
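A minimal sketch of the counter approach (class and member names are just for illustration):

using System.Threading;

class Worker
{
    private int activeCount;   // number of threads currently inside DoWork

    public void DoWork()
    {
        if (Interlocked.Increment(ref activeCount) > 1)
        {
            // The previous execution is still running: back out and skip this tick.
            Interlocked.Decrement(ref activeCount);
            return;
        }
        try
        {
            // ... the long operation ...
        }
        finally
        {
            Interlocked.Decrement(ref activeCount);
        }
    }
}

Incrementing before checking makes the test atomic: if two timer callbacks overlap, the second one sees a count above one and backs out.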
If you do not want invocations to overlap one another, one solution is to use a System.Threading.Timer: set the due time you want and an infinite period, and at the end of each callback call Change to reset the due time. Another way to do essentially the same thing is a loop over Task.Delay:
while (!cancellationToken.IsCancellationRequested)
{
    // do work
    await Task.Delay(periodTime);
}
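And here is a sketch of the first approach: a one-shot System.Threading.Timer with an infinite period that re-arms itself at the end of each callback (the five-minute period and the class name are placeholders):

using System;
using System.Threading;

class NonOverlappingJob
{
    private readonly TimeSpan period = TimeSpan.FromMinutes(5);  // example period
    private readonly Timer timer;

    public NonOverlappingJob()
    {
        // Infinite period: the timer fires once and then stays silent
        // until we explicitly re-arm it.
        timer = new Timer(Run, null, period, Timeout.InfiniteTimeSpan);
    }

    private void Run(object state)
    {
        try
        {
            // do work
        }
        finally
        {
            // Re-arm for one more shot; invocations can never overlap.
            timer.Change(period, Timeout.InfiniteTimeSpan);
        }
    }
}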
Your question about CPU/memory does not make much sense. A thread is either running, waiting to run, or blocked. The only memory that can be directly attributed to a thread is its stack, and that is usually quite small (1 MB by default, if I'm remembering correctly). If you want to measure how much time a thread spends in the running state, you need to do periodic sampling or instrument the scheduler.
I have a single-threaded C# application and am currently working on making it multi-threaded using thread pools. I am stuck deciding which model would work for my problem.
Here's my current scenario
while (true)
{
    do_sometask();
    wait(time);
}
This is repeated almost forever. The new scenario has multiple threads, each of which does the above. I could easily implement it by spawning a number of threads based on the tasks I have to perform, where each thread performs some task and waits forever.
The issue here is that I may not know the number of tasks, so I can't just blindly spawn 500 threads. I thought about using a thread pool, but because almost every thread loops forever and won't ever be freed up for new tasks in the queue, I am not sure which other model to use.
I am looking for an idea or solution where I could break the loop in the thread and free it up instead of waiting, but come back and resume the same task after the wait (when the time has elapsed, using something like a timer, or by checking the timestamp of when the task last ran).
With this I could use a limited number of threads (as in a thread pool) and serve the tasks that come in while the old threads are (virtually) waiting.
Any help is really appreciated.
If all you have is a bunch of things that happen periodically, it sounds like what you want is a bunch of timers. Create a timer for each task, to fire when appropriate. So if you have two different tasks:
using System.Threading;

// Task1 happens once per minute
Timer task1Timer = new Timer(
    s => DoTask1(),
    null,
    TimeSpan.FromMinutes(1),
    TimeSpan.FromMinutes(1));

// Task2 happens once every 47 seconds
Timer task2Timer = new Timer(
    s => DoTask2(),
    null,
    TimeSpan.FromSeconds(47),
    TimeSpan.FromSeconds(47));
The timer is a pretty lightweight object, so having a whole bunch of them isn't really a problem. The timer only takes CPU resources when it fires. The callback method will be executed on a pool thread.
There is one potential problem. If you have a whole lot of timers all with the same period, then the callbacks will all be called at the same time. The threadpool should handle that gracefully by limiting the number of concurrent tasks, but I can't say for sure. But if your wait times are staggered, this is going to work well.
If you have small wait times (less than a second), then you probably need a different technique. I'll detail that if required.
With this design, you only have one thread blocked at any time.
Have one thread (the master thread) waiting on a concurrent blocking collection, such as BlockingCollection<T>. This thread is blocked by a call to TryTake until something is placed in the collection or a certain amount of time has passed, via a timeout passed into the call (more on this later).
Once it is unblocked, it may have a unit of work to process. It checks whether it got one (i.e., the TryTake call didn't time out), then whether there is capacity to perform the work, and if so, queues up a thread (pool, Task, or whatever) to service it. The master thread then goes back to the blocking collection and tries to take another unit of work. The cycle continues.
As a unit of work begins, it is noted, so the master thread can see how many threads are working. Once the unit completes, the note is removed and the thread is freed.
You want the timeout so that if too many operations are judged to be running concurrently, you can re-evaluate a set period of time down the road. Otherwise, a unit of work would sit in the blocking collection until a new unit is added, which is not optimal.
Outside users of this instance can queue up new units of work by simply dropping them in the collection.
You can use a cancellation token to immediately unblock the thread when it's time to shut down operations. Have the worker operations take cancellation tokens as well so they can halt on shutdown.
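As a rough sketch of that loop (the dispatcher class, its capacity of four, and the one-second timeout are all made-up names and numbers):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class WorkDispatcher
{
    private readonly BlockingCollection<Action> queue = new BlockingCollection<Action>();
    private readonly int maxConcurrent = 4;   // hypothetical capacity
    private int active;                       // units of work currently running

    // Outside users drop new units of work into the collection.
    public void Add(Action work) { queue.Add(work); }

    // The master thread runs this loop.
    public void Run(CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                // At capacity? Don't take anything; re-evaluate shortly.
                if (Volatile.Read(ref active) >= maxConcurrent)
                {
                    Thread.Sleep(100);
                    continue;
                }

                // Block until work arrives, the timeout elapses,
                // or shutdown is requested via the token.
                if (queue.TryTake(out Action work, 1000, token))
                {
                    Interlocked.Increment(ref active);   // note that a unit has begun
                    Task.Run(() =>
                    {
                        try { work(); }
                        finally { Interlocked.Decrement(ref active); }  // note removed, thread freed
                    });
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Token tripped during TryTake: time to shut down.
        }
    }
}

The 100 ms pause and the one-second timeout are the "re-evaluate a set period of time down the road" knobs described above.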
I was able to implement it with the help of a thread pool and a few conditions that check the task's last activity before adding it back to the thread-pool queue.
I am writing a Windows Service that receives messages/requests and executes them asynchronously - it does not need to wait for the item to complete, nor care about the result. I am successfully able to execute the requests as Tasks using System.Threading.Tasks.Task. Most of these items execute quickly (less than a second), but some take longer (2-3 minutes).
As a Windows service, I need to respond to the "Stop" command while some of the Tasks are still running. It is preferable not to Cancel the Tasks, as the longer-running ones might leave data in a bad state (and a rollback is very tricky).
What is the best way to handle this? I thought of keeping a List of the Tasks I have started so that I can do a WaitAll. Over its lifetime the service will process tens of thousands of requests, so how would I know when to remove completed Tasks from the List so that it doesn't grow wildly? I don't think I should be holding references to that many Task objects.
Thanks in advance.
You can use a CancellationToken for this purpose.
Once the OnStop event occurs, you just call Cancel() on the CancellationTokenSource, and the cancellation is propagated to every task you passed the token into.
There are several techniques for correctly cancelling a task.
You may explicitly check inside the task, from time to time, whether cancellation has been requested.
Or you can call ThrowIfCancellationRequested() on the token itself, which throws an OperationCanceledException if cancellation has been requested.
If you don't care about the task results but want a tracking list, just keep a List<Task<T>> (or maybe a ConcurrentBag<Task<T>>) as a local variable. From time to time, start another task that goes through the list and checks each Task.Status property to see whether it is still running or has finished.
I don't think keeping that many references should be an issue as long as you maintain those references correctly.
Also, it depends on how you want to stop your application. If you are fine with the tasks being killed along with the application, you may not need to track them at all (unless they hold resources that need to be freed). But in most cases I would say they should be correctly finalized.
EDIT: just reread your post. A RequestAdditionalTime call may help you wait until long-running tasks are finished.
Check it on MSDN: ServiceBase.RequestAdditionalTime Method
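A minimal sketch of how those two ideas might combine in OnStop (the service class, the tracking collection, and the two-minute figure are all hypothetical):

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.ServiceProcess;
using System.Threading.Tasks;

public class RequestService : ServiceBase   // hypothetical service class
{
    // The tracking list discussed above; completed tasks could be pruned periodically.
    private readonly ConcurrentBag<Task> inFlight = new ConcurrentBag<Task>();

    protected override void OnStop()
    {
        // Ask the service control manager for extra time
        // so long-running tasks can finish cleanly.
        RequestAdditionalTime(120000);   // two minutes, in milliseconds

        // Wait for whatever is still running; already-completed tasks return immediately.
        Task.WaitAll(inFlight.ToArray());
    }
}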
If you only care about tasks finishing before your service terminates, I would suggest using Thread instead of Task.
new Thread(WorkerMethod).Start();
A thread created this way is a so-called foreground thread, and your application (service) will not end until all foreground threads have ended. You need to make sure your foreground threads cannot hang under any condition, otherwise your application will never terminate by itself. You can achieve the same thing with Task, but then you would need to keep a list of all the tasks you have started and use Task.WaitAll in your Stop event to wait for all of them to finish.
If you need to control your threads (i.e. keep reference to them) you need to use some sort of collection.
List<Thread> threads = new List<Thread>();

Thread thread = new Thread(WorkerMethod);
threads.Add(thread);
thread.Start();
But if you actually need to control and cancel your tasks/threads, you should rather go with Task, which makes cancellation easier.
From what I understand, the difference between Task and Thread is that a task runs on the thread pool while a thread is something that I need to manage myself (and that a task can be cancelled, returning its thread to the pool at the end of its mission).
But in some blog I read that if the operating system needs to create a task or create a thread, it will be easier to create (and destroy) the task.
Can someone please explain why creating a task is simpler than creating a thread?
(Or maybe I'm missing something here...)
I think that what you are talking about when you say Task is System.Threading.Tasks.Task. If that's the case then you can think about it this way:
A program can have many threads, but a processor core can only run one Thread at a time.
Threads are very expensive, and switching between the threads that are running is also very expensive.
So... Having thousands of threads doing stuff is inefficient. Imagine if your teacher gave you 10,000 tasks to do. You'd spend so much time cycling between them that you'd never get anything done. The same thing can happen to the CPU if you start too many threads.
To get around this, the .NET framework allows you to create Tasks. Tasks are a bit of work bundled up into an object, and they allow you to do interesting things like capture the output of that work and chain pieces of work together (first go to the store, then buy a magazine).
Tasks are scheduled on a pool of threads. The specific number of threads depends on the scheduler used, but the default scheduler tries to pick a number of threads that is optimal for the number of CPU cores that you have and how much time your tasks are spending actually using CPU time. If you want to, you can even write your own scheduler that does something specific like making sure that all Tasks for that scheduler always operate on a single thread.
So think of Tasks as items in your to-do list. You might be able to do 5 things at once, but if your boss gives you 10000, they will pile up in your inbox until the first 5 that you are doing get done. The difference between Tasks and the ThreadPool is that Tasks (as I mentioned earlier) give you better control over the relationship between different items of work (imagine to-do items with multiple instructions stapled together), whereas the ThreadPool just allows you to queue up a bunch of individual, single-stage items (Functions).
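For example, a toy sketch of chaining two pieces of work, with the errand names made up for illustration:

using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // A piece of work that produces a result ("go to the store")...
        Task<string> goToStore = Task.Run(() => "at the store");

        // ...chained to a second piece of work ("buy a magazine").
        Task<string> buyMagazine = goToStore.ContinueWith(
            previous => previous.Result + ", bought a magazine");

        Console.WriteLine(buyMagazine.Result);   // blocks until the chain completes
    }
}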
You are hearing two different notions of task. The first is the notion of a job, and the second is the notion of a process.
A long time ago (in computer terms), there were no threads. Each running instance of a program was called a process, since it simply performed one step after another after another until it exited. This matches the intuitive idea of a process as a series of steps, like that of a factory assembly line. The operating system manages the process abstraction.
Then, developers began to add multiple assembly lines to the factories. Now a program could do more than one thing at once, and either a library or (more commonly today) the operating system would manage the scheduling of the steps within each thread. A thread is kind of a lightweight process, but a thread belongs to a process, and all the threads in a process share memory. On the other hand, multiple processes can't mess with each others' memory. So, the multiple threads in your web server can each access the same information about the connection, but Word can't access Excel's in-memory data structures because Word and Excel are running as separate processes. The idea of a process as a series of steps doesn't really match the model of a process with threads, so some people took to calling the "abstraction formerly known as a process" a task. This is the second definition of task that you saw in the blog post. Note that plenty of people still use the word process to mean this thing.
Well, as threads became more common, developers added even more abstractions on top of them to make them easier to use. This led to the rise of the thread pool, which is a library-managed "pool" of threads. You pass the library a job, and the library picks a thread and runs the job on that thread. The .NET framework has a thread pool implementation, and the first time you heard about a "task" the documentation really meant a job that you pass to the thread pool.
So in a sense, both the documentation and the blog post are right. The overloading of the term task is the unfortunate source of confusion.
Threads have been part of .NET since v1.0; Tasks were introduced in the Task Parallel Library (TPL), which was released with .NET 4.0.
You can consider a Task a more sophisticated version of a Thread. Tasks are very easy to use and have a number of advantages over Threads, a few of which are sketched in code after this list:
Tasks can return values, as if they were functions.
You can use the ContinueWith method, which waits for the previous task to finish and then starts execution (abstracting away the wait).
Tasks abstract away locks, which should be avoided per my company's guidelines.
You can use Task.WaitAll and pass an array of tasks to wait until all of them are complete.
You can attach a task to a parent task, and thereby decide whether the parent or the child exits first.
You can achieve data parallelism with LINQ queries.
You can create parallel for and foreach loops.
Exceptions are very easy to handle with tasks.
Most importantly, if the same code is run on a single-core machine, it will just act as a single process, without the overhead of threads.
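Here is a small illustrative sketch touching a few of these points (return values, WaitAll, ContinueWith, and parallel loops); the numbers are arbitrary:

using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Tasks can return values, as if they were functions.
        Task<int>[] squares = Enumerable.Range(1, 4)
            .Select(n => Task.Factory.StartNew(() => n * n))
            .ToArray();

        // Wait until all tasks are complete.
        Task.WaitAll(squares);

        // ContinueWith runs follow-up work once the previous task finishes.
        Task<int> sum = Task.Factory.StartNew(() => squares.Sum(t => t.Result));
        sum.ContinueWith(t => Console.WriteLine("sum = " + t.Result)).Wait();

        // Parallel loops from the TPL.
        Parallel.For(0, 4, i => Console.WriteLine("iteration " + i));
    }
}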
Disadvantages of tasks over threads:
You need .NET 4.0.
Newcomers who have studied operating systems may understand threads better.
The framework is new, so not much assistance is available.
Some tips:
Always use the Task.Factory.StartNew method, which is semantically perfect and standard.
Take a look at the Task Parallel Library for more information: http://msdn.microsoft.com/en-us/library/dd460717.aspx
Expanding on the comment by Eric Lippert:
Threads are a way to allow your application to do several things in parallel. For example, your application might have one thread that processes events from the user, like button clicks, and another thread that performs some long computation. This way, you can do two different things “at the same time”. If you didn't do that, the user wouldn't be able to click buttons until the computation finished. So, a Thread is something that can execute some code you wrote.
Task, on the other hand, represents an abstract notion of some job. That job can have a result, and you can wait until the job finishes (by calling Wait()) or say that you want to do something after the job finishes (by calling ContinueWith()).
The most common job that you want to represent is performing some computation in parallel with the current code, and Task offers you a simple way to do that. How and when the code actually runs is defined by the TaskScheduler. The default one uses a ThreadPool: a set of threads that can run any code. This is done because creating and switching threads is inefficient.
But Task doesn't have to be directly associated with some code. You can use TaskCompletionSource to create a Task and then set its result whenever you want. For example, you could create a Task and mark it as completed when the user clicks a button. Some other code could wait on that Task and while it's waiting, there is no code executing for that Task.
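For instance, a sketch along those lines, where a console read stands in for the button click:

using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // A Task with no code behind it: we complete it ourselves.
        var tcs = new TaskCompletionSource<string>();

        // In a real UI this would be something like:
        //   button.Click += (s, e) => tcs.TrySetResult("clicked");
        Task.Run(() =>
        {
            Console.ReadLine();              // stands in for the user action
            tcs.TrySetResult("clicked");
        });

        // While we wait here, no code is executing for tcs.Task itself.
        Console.WriteLine(tcs.Task.Result);
    }
}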
If you want to know when to use Task and when to use Thread: Task is simpler to use and more efficient than creating your own Threads. But sometimes, you need more control than what is offered by Task. In those cases, it makes sense to use Thread directly.
Tasks really are just a wrapper for the boilerplate code of spinning up threads manually. At the root, there is no difference. Tasks just make the management of threads easier, and they are generally more expressive because they cut down on the boilerplate noise.
I am investigating the design of a work queue processor where the QueueProcessor retrieves a Command Pattern object from the Queue and executes it in a new thread.
I am trying to get my head around a potential Queue lockup scenario where nested Commands may result in a deadlock.
E.g.:
A FooCommand object is placed onto the queue, which the QueueProcessor then executes in its own thread.
The executing FooCommand places a BarCommand onto the queue.
Assuming the maximum number of allowed threads is only 1, the QueueProcessor is now deadlocked, since FooCommand waits forever for BarCommand to complete.
How can this situation be managed? Is a queue object the right object for the job? Are there any checks and balances that can be put into place to resolve this issue?
Many thanks. (The application uses C# on .NET 3.0.)
You could redesign things so that FooCommand doesn't use the queue to run BarCommand but runs it directly. Or you could split FooCommand in two, and have the first half stop immediately after queueing BarCommand, and have BarCommand queue the second half of FooCommand after it's done its work.
Queuing implicitly assumes an asynchronous execution model. By waiting for the command to exit, you are working synchronously.
Maybe you can split up the commands in three parts: FooCommand1 that executes until the BarCommand has to be sent, BarCommand and finally FooCommand2 that continues after BarCommand has finished. These three commands can be queued separately. Of course, BarCommand should make sure that FooCommand2 is queued.
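Sketched with hypothetical command types, the split might look like this:

using System.Collections.Concurrent;

// Hypothetical command types illustrating the three-way split described above.
interface ICommand
{
    void Execute(BlockingCollection<ICommand> queue);
}

class FooCommand1 : ICommand
{
    public void Execute(BlockingCollection<ICommand> queue)
    {
        // ...Foo's work up to the point where Bar is needed...
        queue.Add(new BarCommand());    // queue Bar and return; never wait
    }
}

class BarCommand : ICommand
{
    public void Execute(BlockingCollection<ICommand> queue)
    {
        // ...Bar's work...
        queue.Add(new FooCommand2());   // Bar queues Foo's continuation when done
    }
}

class FooCommand2 : ICommand
{
    public void Execute(BlockingCollection<ICommand> queue)
    {
        // ...the rest of Foo's work, running only after Bar has finished...
    }
}

Because no command ever blocks waiting on another, even a single worker thread keeps making progress, and the deadlock disappears.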
For simple cases like this, an additional monitoring thread that can spin off more threads on demand is helpful.
Basically, every N seconds, check whether any jobs have finished; if not, add another thread.
This won't necessarily handle even more complex deadlock problems, but it will solve this one.
My recommendation for the harder problem is to restrict waits to newly spawned work; in other words, you can only wait on something you started yourself. That way you never get deadlocks, since cycles are impossible in that situation.
If you are building the Queue object yourself there are a few things you can try:
Dynamically add new service threads. Use a timer and add a thread if the available thread count has been zero for too long.
If a command is trying to queue another command and wait for the result then you should synchronously execute the second command in the same thread. If the first thread simply waits for the second you won't get a concurrency benefit anyway.
I assume you want to queue BarCommand so it is able to run in parallel with FooCommand, but FooCommand will need the result at some later point. If this is the case then I would recommend using Future from the Parallel Extensions library.
Bart DeSmet has a good blog entry on this. Basically you want to do:
public void FooCommand()
{
    Future<int> BarFuture = new Future<int>(() => BarCommand());

    // Do Foo's Processing - Bar will (may) be running in parallel

    int barResult = BarFuture.Value;

    // More processing that needs barResult
}
With libraries such as the Parallel Extensions, I'd avoid "rolling your own" scheduling.