I am writing a Windows Service that receives messages/requests and executes them asynchronously - it does not need to wait for the item to complete, nor care about the result. I am successfully able to execute the requests as Tasks using System.Threading.Tasks.Task. Most of these items execute quickly (less than a second), but some take longer (2-3 minutes).
As a Windows service, I need to respond to the "Stop" command, and some of the Tasks will still be running. It is preferable not to cancel the Tasks, as the longer-running ones might leave data in a bad state (and a rollback is very tricky).
What is the best way to handle this? I thought of keeping a List of the tasks that I have started so that I can do a WaitAll. During the execution of the service it will process tens of thousands of requests. How would I know when to remove completed Tasks from the List so the List doesn't grow wildly? I don't think I should be holding references to that many Task objects.
Thanks in advance.
You can use a CancellationToken for this purpose.
Once the OnStop event occurs, you just call Cancel() on the CancellationTokenSource, and the cancellation will be propagated to all tasks that you passed the token to.
There are several techniques for correctly cancelling a task.
You may explicitly check from time to time inside the task whether cancellation has been requested, via the token's IsCancellationRequested property.
Or you can call ThrowIfCancellationRequested() on the token itself, which throws an OperationCanceledException if cancellation has been requested.
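For illustration, a minimal sketch of both techniques; workItems and Process are placeholders for your own work:

// using System.Threading; using System.Threading.Tasks;
CancellationTokenSource cts = new CancellationTokenSource();

Task t = Task.Run(() =>
{
    foreach (var item in workItems)                    // hypothetical work source
    {
        // Technique 1: throw if cancellation was requested.
        cts.Token.ThrowIfCancellationRequested();

        // Technique 2: check explicitly and exit cleanly.
        if (cts.Token.IsCancellationRequested)
            return;

        Process(item);                                 // hypothetical work
    }
}, cts.Token);

// In OnStop:
cts.Cancel();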
If you don't care about the task results but want a tracking list, just keep a List&lt;Task&gt; (or maybe a ConcurrentBag&lt;Task&gt;) as a local variable. From time to time, start another task that goes through the list and checks the Task.Status of each entry to see whether it is still running.
I don't think keeping that many references should be an issue as long as you maintain those references correctly.
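As a sketch of the pruning, you could sweep the list periodically and drop anything that has finished (the field names here are illustrative):

private readonly List<Task> _tasks = new List<Task>();
private readonly object _gate = new object();

private void Track(Task t)
{
    lock (_gate) { _tasks.Add(t); }
}

// Call this from a timer, e.g. every 30 seconds, to keep the list small.
private void Prune()
{
    lock (_gate) { _tasks.RemoveAll(t => t.IsCompleted); }
}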
Also, it depends on how you want to stop your application. If you are fine with the tasks being killed along with the application, you may not need to track them at all (unless they hold resources that need to be freed). But in most cases I would say the work should be allowed to finish correctly.
EDIT: just reread your post. A RequestAdditionalTime call may help you wait until the long-running tasks are finished.
Check it on MSDN: ServiceBase.RequestAdditionalTime Method
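A minimal sketch of an OnStop that combines this with the WaitAll idea, reusing the _tasks and _gate fields from the sketch above:

protected override void OnStop()
{
    // Ask the Service Control Manager for extra time before it gives up on us.
    RequestAdditionalTime(120000);       // milliseconds; pick a value for your workload

    Task[] pending;
    lock (_gate) { pending = _tasks.ToArray(); }

    Task.WaitAll(pending);               // let in-flight work drain rather than cancelling
    base.OnStop();
}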
If you only care about tasks finishing before your service terminates I would suggest using Thread instead of Task.
new Thread(WorkerMethod).Start();
A Thread created this way is a so-called foreground thread, and your application (service) will not end until all foreground threads have ended. You need to make sure all your foreground threads do not hang under any condition; otherwise your application will never terminate by itself. You can achieve the same thing with Task, but then you would need to keep a list of all the tasks you have run and use Task.WaitAll to wait for all of them to finish in your Stop event.
If you need to control your threads (i.e. keep reference to them) you need to use some sort of collection.
List<Thread> threads = new List<Thread>();
Thread thrd;
threads.Add(thrd = new Thread(WorkerMethod));
thrd.Start();
But if you actually need to control and cancel your tasks/threads, you should rather go with Task, which makes cancellation easier.
I am trying to find out whether there is any built-in method or property that reports the time taken by each task when executing multiple tasks (say, adding tasks to an array or list, executing them all at the same time, and waiting for all of them to finish). And if, in the middle of this process, one or more tasks take much longer than expected, I should be able to identify those tasks and remove them from the array or list. If there is no built-in method or property, is there any other way to find this out?
Hard to beat a good old Stopwatch, from the System.Diagnostics namespace. Set up a ConcurrentDictionary<int, Stopwatch>, with the integer key being the managed thread ID of the task (you can also key it to the Task objects or the Threads themselves, depending on how you're spinning them up, or you can set up a communication token that includes a "Cancel" method).

Each thread or task, when it starts, should create a Stopwatch, add it to the dictionary, then Start() it, before continuing to do its work. When it's done, it should Stop() its Stopwatch and remove it from the dictionary. (You can have it put the resulting Elapsed time into a ConcurrentQueue that the supervisor thread uses to log running times; the dictionary is used to track the running times of active threads so your supervisor can manage them.)

Your supervisor thread can then periodically check for tasks taking much longer than average, and when it finds one, it can trip the cancellation token and remove the entry from the dictionary.
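A sketch of that bookkeeping, placed inside whatever class hosts your workers; doWork stands in for the actual task body, and the console line for whatever your supervisor does:

// using System; using System.Collections.Concurrent;
// using System.Diagnostics; using System.Threading;
static readonly ConcurrentDictionary<int, Stopwatch> Running =
    new ConcurrentDictionary<int, Stopwatch>();
static readonly ConcurrentQueue<TimeSpan> Completed = new ConcurrentQueue<TimeSpan>();

static void TimedWork(Action doWork)
{
    int id = Thread.CurrentThread.ManagedThreadId;
    var sw = Stopwatch.StartNew();
    Running[id] = sw;
    try
    {
        doWork();
    }
    finally
    {
        sw.Stop();
        Running.TryRemove(id, out _);
        Completed.Enqueue(sw.Elapsed);   // feeds the supervisor's running-time log
    }
}

// Supervisor: flag anything running much longer than the limit derived from Completed.
static void CheckForStragglers(TimeSpan limit)
{
    foreach (var entry in Running)
        if (entry.Value.Elapsed > limit)
            Console.WriteLine($"Thread {entry.Key} has been running for {entry.Value.Elapsed}");
}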
You can use a Stopwatch.
It's specifically made for timing the execution of your code accurately.
It sounds like you are running these tasks in parallel, probably with the TPL. If that is the case, you can assign a single CancellationToken to all the Tasks you create, then use that token to stop all of them simultaneously if the Stopwatch exceeds the time limit.
Alternatively, assign a separate Stopwatch to each of your tasks and have them stop whenever necessary.
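A minimal sketch of the shared-token approach; urls and Fetch are placeholders, and the 30-second limit is arbitrary:

// using System; using System.Linq; using System.Threading; using System.Threading.Tasks;
// Auto-cancels every task holding this token after 30 seconds.
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

Task[] tasks = urls
    .Select(url => Task.Run(() => Fetch(url, cts.Token), cts.Token))
    .ToArray();

try
{
    Task.WaitAll(tasks);
}
catch (AggregateException)
{
    // Tasks that honoured the token surface here as cancelled;
    // inspect each task's Status to identify which ones ran over the limit.
}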
I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await expression in an async method doesn't block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.

The async and await keywords don't cause additional threads to be created. Async methods don't require multithreading because an async method doesn't run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.Run to move CPU-bound work to a background thread, but a background thread doesn't help with a process that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength(object sender, EventArgs e)
{
    label.Text = "Fetching ...";
    using (HttpClient client = new HttpClient())
    {
        Task<string> task = client.GetStringAsync("http://csharpindepth.com");
        string text = await task;
        label.Text = text.Length.ToString();
    }
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1 | processor 2
-------------------------------------------------------------------
int x = 5; | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0); |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebsiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
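To make the mechanics concrete, here is roughly what the compiler does with the "string text = await task;" line. This is a simplified sketch, not the exact generated code (the real state machine is more involved):

// using System.Runtime.CompilerServices;
TaskAwaiter<string> awaiter = task.GetAwaiter();
if (awaiter.IsCompleted)
{
    // The task finished between creation and the await: keep running synchronously.
    string text = awaiter.GetResult();
    label.Text = text.Length.ToString();
}
else
{
    // Sign up the rest of the method as the continuation, then return to the caller.
    awaiter.OnCompleted(() =>
    {
        string text = awaiter.GetResult();   // runs later, on the captured context
        label.Text = text.Length.ToString();
    });
    return;
}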
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser Javascript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other JavaScript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other JavaScript is executed: whenever they reach a yield or await keyword, they yield execution to other JavaScript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other JavaScript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX request gets back, its callback won't be called until they're done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
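For instance, a minimal sketch of a click handler written this way; it assumes a modern .NET with File.ReadAllTextAsync, and the control names and file path are illustrative:

private async void loadButton_Click(object sender, EventArgs e)
{
    statusLabel.Text = "Loading...";

    // The UI thread is free to handle clicks and repaints while the read is in flight.
    string contents = await File.ReadAllTextAsync("data.txt");

    // Back on the UI thread automatically; safe to touch controls again.
    statusLabel.Text = $"Read {contents.Length} characters.";
}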
I am using the ThreadPool in .NET to make some web requests in the background, and I want to have a "Stop" button to cancel all the threads even if they are in the middle of making a request, so a simple bool won't do the job.
How can I do that?
Your situation is pretty much the canonical use-case for the Cancellation model in the .NET framework.
The idea is that you create a CancellationToken object and make it available to the operation that you might want to cancel. Your operation occasionally checks the token's IsCancellationRequested property, or calls ThrowIfCancellationRequested.
You can create a CancellationToken, and request cancellation through it, by using the CancellationTokenSource class.
This cancellation model integrates nicely with the .NET Task Parallel Library, and is pretty lightweight, more so than using system objects such as ManualResetEvent (though that is a perfectly valid solution too).
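As a sketch of how that looks for web requests queued to the pool; StartRequest, Process, and the Stop wiring are illustrative:

// using System.Net.Http; using System.Threading;
CancellationTokenSource cts = new CancellationTokenSource();
HttpClient client = new HttpClient();

void StartRequest(string url)
{
    ThreadPool.QueueUserWorkItem(async _ =>
    {
        try
        {
            // The token aborts the request mid-flight once Cancel() is called.
            HttpResponseMessage response = await client.GetAsync(url, cts.Token);
            Process(await response.Content.ReadAsStringAsync());   // hypothetical handler
        }
        catch (OperationCanceledException)
        {
            // The Stop button was pressed; exit quietly.
        }
    });
}

// Wired to the Stop button:
void StopAll() => cts.Cancel();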
The correct way to handle this is to have a flag object that you signal.
The code running in those threads needs to check that flag periodically to see if it should exit.
For instance, a ManualResetEvent object is suitable for this.
You could then ask the threads to exit like this:
evt.Set();
and inside the threads you would check for it like this:
if (evt.WaitOne(0))
return; // or otherwise exit the thread
Also, since you're using the thread pool, all the items you've queued up will still be processed, but if you add the if-statement above at the very start of the thread method, they will exit immediately. If that is not good enough, you should build your own system using normal threads; that way you have complete control.
Oh, and just to make sure, do not use Thread.Abort. Ask the threads to exit nicely, do not outright kill them.
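Putting it together, a sketch of a worker method written this way; NextRequest and Process are hypothetical stand-ins for your own work source and handler:

ManualResetEvent evt = new ManualResetEvent(false);

void Worker(object state)
{
    while (true)
    {
        // Non-blocking poll: returns true once evt.Set() has been called.
        if (evt.WaitOne(0))
            return;                      // asked to exit; leave nicely

        var request = NextRequest();     // hypothetical work source
        if (request == null)
            return;                      // no more work

        Process(request);                // hypothetical request handler
    }
}

// On Stop: evt.Set();  // every worker exits at its next check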
If you are going to stop or cancel something processing in another thread, the ThreadPool is not the best choice; you should use Thread instead and manage all of them in a container (e.g. a global List&lt;Thread&gt;). That guarantees you have full control of all the threads.
I have a single-threaded C# application and am currently working on making it multi-threaded with the use of thread pools. I am stuck deciding which model would work for my problem.
Here's my current scenario:
while (true)
{
    do_sometask();
    wait(time);
}
And this is repeated almost forever. The new scenario has multiple threads, each of which does the above. I could easily implement it by spawning a number of threads based on the tasks I have to perform, where each thread performs some task and waits forever.
The issue here is that I may not know the number of tasks, so I can't just blindly spawn 500 threads. I thought about using a thread pool, but because almost every thread loops forever and won't ever be freed up for new tasks in the queue, I'm not sure which other model to use.
I am looking for an idea or solution where I could break the loop in the thread and free it up instead of waiting, but come back and resume the same task after the wait (when the time has elapsed, using something like a timer, or by checking the timestamp of when the task was last performed).
With this I could use a limited number of threads (as in a thread pool) and serve the tasks that come in while the old tasks are (virtually) waiting.
Any help is really appreciated.
If all you have is a bunch of things that happen periodically, it sounds like what you want is a bunch of timers. Create a timer for each task, to fire when appropriate. So if you have two different tasks:
using System.Threading;

// Task1 happens once per minute
Timer task1Timer = new Timer(
    s => DoTask1(),
    null,
    TimeSpan.FromMinutes(1),
    TimeSpan.FromMinutes(1));

// Task2 happens once every 47 seconds
Timer task2Timer = new Timer(
    s => DoTask2(),
    null,
    TimeSpan.FromSeconds(47),
    TimeSpan.FromSeconds(47));
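One caveat: keep the Timer objects referenced (in fields, for example) for as long as you need them to fire; a System.Threading.Timer that has been garbage collected stops firing.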
The timer is a pretty lightweight object, so having a whole bunch of them isn't really a problem. The timer only takes CPU resources when it fires. The callback method will be executed on a pool thread.
There is one potential problem. If you have a whole lot of timers all with the same period, then the callbacks will all be called at the same time. The threadpool should handle that gracefully by limiting the number of concurrent tasks, but I can't say for sure. But if your wait times are staggered, this is going to work well.
If you have small wait times (less than a second), then you probably need a different technique. I'll detail that if required.
With this design, you only have one thread blocked at any time.
Have one thread (the master thread) waiting on a concurrent blocking collection, such as BlockingCollection&lt;T&gt;. This thread will be blocked by a call to TryTake until something is placed in the collection or a certain amount of time has passed via a timeout passed into the call (more on this later).
Once it is unblocked, it may have a unit of work to be processed. It checks whether it does (i.e., the TryTake call didn't time out) and whether there is capacity to perform this work; if so, it queues up a thread (pool, Task, or whatever) to service the work. This master thread then goes back to the blocking collection and tries to take another unit of work. The cycle continues.
As a unit of work is begun, it will be noted so that the main thread can see how many threads are working. Once this unit is completed, the notation will be removed. The thread is then freed.
You want to use a timeout so that if it is judged that too many operations are running concurrently, you will be able to re-evaluate this a set period of time down the road. Otherwise, that unit of work sits in the blocking collection until a new unit is added, which is not optimal.
Outside users of this instance can queue up new units of work by simply dropping them in the collection.
You can use a cancellation token to immediately unblock the thread when it's time to shut down operations. Have the worker operations take cancellation tokens as well so they can halt on shutdown.
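A sketch of the master loop under those assumptions; the capacity of 8 and the 500 ms timeout are arbitrary, and the names are illustrative:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class WorkProcessor
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private int _active;                         // units currently being serviced
    private const int MaxConcurrent = 8;

    public void Enqueue(Action unit) => _work.Add(unit);   // outside users drop work here

    public void MasterLoop()
    {
        try
        {
            while (true)
            {
                // Blocks until work arrives, 500 ms elapses, or Shutdown() is called.
                if (!_work.TryTake(out Action unit, 500, _cts.Token))
                    continue;                    // timeout: loop and re-evaluate

                // Wait until there is capacity for this unit.
                while (Volatile.Read(ref _active) >= MaxConcurrent)
                    Thread.Sleep(50);

                Interlocked.Increment(ref _active);      // note the unit as begun
                Task.Run(() =>
                {
                    try { unit(); }
                    finally { Interlocked.Decrement(ref _active); }   // remove the notation
                });
            }
        }
        catch (OperationCanceledException)
        {
            // Shutdown requested; the master thread unblocks immediately.
        }
    }

    public void Shutdown() => _cts.Cancel();
}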
I was able to implement this with the help of a thread pool and a few conditions that check the task's last activity before adding it back to the thread pool queue.
I am investigating the design of a work queue processor where the QueueProcessor retrieves a Command Pattern object from the Queue and executes it in a new thread.
I am trying to get my head around a potential Queue lockup scenario where nested Commands may result in a deadlock.
E.g.:
A FooCommand object is placed onto the queue which the QueueProcessor then executes in its own thread.
The executing FooCommand places a BarCommand onto the queue.
Assuming that the maximum allowed threads was only 1 thread, the QueueProcessor would be in a deadlocked state since the FooCommand is infinitely waiting for the BarCommand to complete.
How can this situation be managed? Is a queue object the right object for the job? Are there any checks and balances that can be put into place to resolve this issue?
Many thanks. (The application uses C# on .NET 3.0.)
You could redesign things so that FooCommand doesn't use the queue to run BarCommand but runs it directly. Or you could split FooCommand in two: have the first half stop immediately after queueing BarCommand, and have BarCommand queue the second half of FooCommand after it has done its work.
Queuing implicitly assumes an asynchronous execution model. By waiting for the command to exit, you are working synchronously.
Maybe you can split up the commands in three parts: FooCommand1 that executes until the BarCommand has to be sent, BarCommand and finally FooCommand2 that continues after BarCommand has finished. These three commands can be queued separately. Of course, BarCommand should make sure that FooCommand2 is queued.
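A sketch of that three-part split; the ICommand shape and the enqueue delegate are illustrative stand-ins for whatever the QueueProcessor actually uses:

interface ICommand
{
    void Execute(Action<ICommand> enqueue);
}

class FooCommand1 : ICommand
{
    public void Execute(Action<ICommand> enqueue)
    {
        // ... Foo's work up to the point where Bar is needed ...
        enqueue(new BarCommand(new FooCommand2()));
        // Return without waiting, so the single worker thread is freed.
    }
}

class BarCommand : ICommand
{
    private readonly ICommand _continuation;
    public BarCommand(ICommand continuation) { _continuation = continuation; }

    public void Execute(Action<ICommand> enqueue)
    {
        // ... Bar's work ...
        enqueue(_continuation);   // BarCommand makes sure FooCommand2 is queued
    }
}

class FooCommand2 : ICommand
{
    public void Execute(Action<ICommand> enqueue)
    {
        // ... the rest of Foo's work, running after Bar has finished ...
    }
}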
For simple cases like this an additional monitoring thread that can spin off more threads on demand is helpful.
Basically every N seconds check to see if any jobs have been finished, if not, add another thread.
This won't necessarily handle even more complex deadlock problems, but it will solve this one.
My recommendation for the harder problem is to restrict waits to newly spawned work; in other words, you can only wait on something you started yourself. That way you never get deadlocks, since cycles are impossible in that situation.
If you are building the Queue object yourself there are a few things you can try:
Dynamically add new service threads. Use a timer and add a thread if the available thread count has been zero for too long.
If a command is trying to queue another command and wait for the result then you should synchronously execute the second command in the same thread. If the first thread simply waits for the second you won't get a concurrency benefit anyway.
I assume you want to queue BarCommand so it is able to run in parallel with FooCommand, but BarCommand will need the result at some later point. If this is the case then I would recommend using Future from the Parallel Extensions library.
Bart De Smet has a good blog entry on this. Basically you want to do:
public void FooCommand()
{
    Future<int> barFuture = new Future<int>(() => BarCommand());

    // Do Foo's processing - Bar will (may) be running in parallel

    int barResult = barFuture.Value;

    // More processing that needs barResult
}
With libraries such as the Parallel Extensions, I'd avoid "rolling your own" scheduling.