Implement timeout for function/block - c#

I would like to do something like this pseudo-code:
try for new TimeSpan(0,0,4) {
    // maximum execution time of this block: 4 seconds
    result = longRunningFunction(parameter);
    doSthWithResult(result);
    ...
} catch (TimeOutException) {
    Console.WriteLine("TimeOut occurred");
}
Is such a construct available in C#, and if not, how would I implement a behaviour that allows trying to execute a function/block for a certain amount of time?
If it helps: this is for ASP.NET Web API (although I could use the timeout in a WinForms app as well).

If you want the longRunningFunction() itself to stop, then you need to implement logic in that method to do so. How to do that depends on the exact implementation of the method, which you haven't provided, so that would be unanswerable given your current question.
However, in many cases it's sufficient to simply abandon an operation, letting it run to completion on its own but simply ignoring the result. You might call that "getting on with your life". :)
If that's the case here, then something like this should work for you:
Task<T> resultTask = Task.Run(() => longRunningFunction(parameter));

// maximum execution time of this block: 4 seconds
await Task.WhenAny(resultTask, Task.Delay(4000));

if (resultTask.IsCompleted)
{
    doSthWithResult(resultTask.Result);
}
else
{
    Console.WriteLine("TimeOut occurred");
}
Replace T in the resultTask declaration with whatever the actual return type for longRunningFunction() is.
Note that the above is opportunistic: even if the long-running operation takes longer than 4 seconds and the Task.Delay(4000) wins the race, it will still be considered a success as long as it has completed by the time your code reaches the if (resultTask.IsCompleted) check. If you want a strict cut-off instead, ignoring the result whenever the Task.Delay(4000) completes first even if the operation has finished by the time you check, look at which task won the race:
Task winner = await Task.WhenAny(resultTask, Task.Delay(4000));

if (resultTask == winner)
{
    doSthWithResult(resultTask.Result);
}
else
    ...
Finally: even if you do need the longRunningFunction() to stop, you can use the above technique and then interrupt the operation in the else clause where you report the time-out (via whatever mechanism is appropriate in your case… again, without the actual code it's not possible to say exactly what that would be).
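If you do go down that road, one common mechanism is cooperative cancellation. Here is a minimal sketch, assuming longRunningFunction can be given an overload that accepts a CancellationToken (hypothetical; the right interruption mechanism depends on your actual code):
using (var cts = new CancellationTokenSource())
{
    // Hypothetical overload that observes the token; adapt to your code.
    Task<T> resultTask = Task.Run(() => longRunningFunction(parameter, cts.Token));
    Task winner = await Task.WhenAny(resultTask, Task.Delay(4000));
    if (winner == resultTask)
    {
        doSthWithResult(resultTask.Result);
    }
    else
    {
        cts.Cancel(); // request the operation to stop; it must check the token itself
        Console.WriteLine("TimeOut occurred");
    }
}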

Related

Thread safe caching of calculation results

I want to cache calculation results in a ConcurrentDictionary<TKey,TValue>. Several threads may query the cache for an entry and generate it if it does not exist.
Since GetOrAdd(TKey, Func<TKey,TValue>) is not atomic, I think I should use GetOrAdd(TKey, TValue) with Task<CacheItem> as TValue.
So, when a thread wants to query a cache item, it creates a cold task coldTask (a task that has not been started and that would generate the item), calls var cacheTask = cache.GetOrAdd(key, coldTask) for some key object, and then checks whether cacheTask has been started or even already has a result. If cacheTask has not been started, the calling thread starts it.
Is this a valid approach in principle?
One problem that remains is that
if (cacheTask.Status == TaskStatus.Created)
    cacheTask.Start();
is not atomic, so cacheTask may be started from another thread before cacheTask.Start() is called here.
Is
try {
    if (cacheTask.Status == TaskStatus.Created)
        cacheTask.Start();
} catch {}
a valid workaround?
The principle should be fine; to start the task you should be able to do something like:
var newTask = new Task(...);
var dictionaryTask = myDictionary.GetOrAdd(myKey, newTask);
if (dictionaryTask == newTask)
{
    newTask.Start();
}
return await dictionaryTask;
That should ensure that only the thread that created the task starts it.
I would suggest checking out Lazy<T> since it is somewhat related. I would also suggest doing some benchmarking, since the most appropriate approach will depend on your specific use case. Keep in mind that awaiting (or blocking on) a task has some overhead, so the best choice will depend on the cost of generating values and how frequently that happens.
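To illustrate the Lazy<T> suggestion, here is a minimal sketch of the common ConcurrentDictionary<TKey, Lazy<Task<T>>> pattern (CacheItem and GenerateItemAsync are hypothetical names):
private readonly ConcurrentDictionary<string, Lazy<Task<CacheItem>>> _cache
    = new ConcurrentDictionary<string, Lazy<Task<CacheItem>>>();

public Task<CacheItem> GetOrCreateAsync(string key)
{
    // GetOrAdd may build a losing Lazy under contention, but only the one
    // actually stored in the dictionary ever has its Value accessed, so
    // GenerateItemAsync (hypothetical) runs at most once per key.
    var lazy = _cache.GetOrAdd(key,
        k => new Lazy<Task<CacheItem>>(() => GenerateItemAsync(k)));
    return lazy.Value;
}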
As I suggested in the comments, I'd use TaskCompletionSource<TResult> and reference equality to avoid races and unnecessary additional tasks to be scheduled:
var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if (ReferenceEquals(actualTask, tcs.Task))
{
    // Do the actual work here
    tcs.SetResult(new CacheItem());
}
return actualTask;
If generation can fail then the //Do the actual work here section should be wrapped in a try/catch and SetException should be used on the completion source (to indicate to any existing waiters that the failure has occurred). But then you have to consider what it means for that failed entry in the cache, whether to remove or retry, etc, and all of the complexity that arises from trying to build a cache in the first place.
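For instance, a sketch of the remove-and-allow-retry policy (GenerateItem is a hypothetical stand-in for the actual work):
var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if (ReferenceEquals(actualTask, tcs.Task))
{
    try
    {
        tcs.SetResult(GenerateItem()); // hypothetical generator doing the actual work
    }
    catch (Exception ex)
    {
        tcs.SetException(ex); // existing waiters observe the failure
        theDictionary.TryRemove(key, out _); // one policy: evict so a later call can retry
    }
}
return actualTask;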

Why threads continue to run after a cancel has been called?

Consider this simple example code:
var cts = new CancellationTokenSource();
var items = Enumerable.Range(1, 20);
var results = items.AsParallel().WithCancellation(cts.Token).Select(i =>
{
    double result = Math.Log10(i);
    return result;
});

try
{
    foreach (var result in results)
    {
        if (result > 1)
            cts.Cancel();
        Console.WriteLine($"result = {result}");
    }
}
catch (OperationCanceledException e)
{
    if (cts.IsCancellationRequested)
        Console.WriteLine($"Canceled");
}
The foreach prints each result from the parallel query, and calls Cancel() once result > 1.
This code output is something like:
result = 0.9030899869919435
result = 0.8450980400142568
result = 0.7781512503836436
result = 0
result = 0.6020599913279624
result = 0.47712125471966244
result = 0.3010299956639812
result = 0.6989700043360189
result = 0.9542425094393249
result = 1
result = 1.0413926851582251 <-- This is normal
result = 1.2041199826559248 <-- Why it prints this value (and below)
result = 1.0791812460476249
result = 1.2304489213782739
result = 1.1139433523068367
result = 1.255272505103306
result = 1.146128035678238
result = 1.2787536009528289
result = 1.1760912590556813
result = 1.3010299956639813
Canceled
My question is: why does it continue printing values over 1? I had expected that calling Cancel() on the token would terminate the processing.
Update 1
#mike-s's answer suggested:
It's also useful to check a cancellation token inside a loop (as a
means to abort the loop) or before a long operation.
I've tried adding a check
foreach (var result in results)
{
    if (result > 1)
        cts.Cancel();
    if (!cts.IsCancellationRequested) //<---- Check the cancellation token before printing
        Console.WriteLine($"result = {result}");
}
It still gives the same output.
My question is why it continue printing values over 1?
Imagine you hired a hundred pilots to fly a hundred planes from a hundred airports. A bunch of them take off, and then you send a message saying "cancel all the flights". Well, there are a bunch of planes on the runway at takeoff speed when you send that message, and the message arrives after they are in the air. Those flights will not be cancelled!
You are discovering the most important thing to know about multithreaded programming. You have to reason as though every possible ordering of things happening might occur. That includes messages arriving later than you think they should.
In particular, your problem is a result of your abuse of the parallelization mechanisms, which are designed to parallelize long work. You've created a bunch of tasks that take less time to run than it takes to send the message stopping them. It should not be a surprise in that case that some of the tasks complete after they've been told to stop.
I expected that calling Cancel() on the token would terminate the process.
Your expectation is completely, totally wrong. Stop expecting that, since that expectation in no way conforms to reality. A cancellation token is a request to cancel an operation as soon as it is convenient to do so. It's not terminating a thread or a process.
However, even if you did terminate the threads, you would still observe this behaviour. Thread termination is an event like any other, and that event is not instantaneous. It takes time to execute, and other threads can continue their work while that thread termination is executing.
what do you mean by "convenient" in "a request to cancel an operation as soon as it is convenient to do so"?
Let's take a step back.
If the work to be done is extremely short, then there is no need to represent it as a task. Just do the work! In general if work takes less than about 30ms, just do the work.
Therefore, let's assume that every task takes a long time.
Now, why might a task take a long time? There are generally two reasons:
We're waiting for another system to complete some task. We're waiting for a network packet or a disk read or some such thing.
We have a huge amount of computation, and the CPU is saturated.
Suppose we are in the first situation. Does parallelizing help? No. If you are waiting for a package in the mail, hiring one, two, ten or a hundred people to wait does not make the package come faster.
But that does help for the second case; if we have an extra CPU in the machine we can dedicate two CPUs to solve the problem in about half the time.
Therefore we can assume that if we are parallelizing a task, it is because the CPU is doing a lot of work.
Great. Now, what is the nature of "CPU does a lot of work?" It almost always involves a loop somewhere.
So then, how do we cancel a task? We do not cancel a task by terminating the thread. We ask the task to cancel itself. A well-designed task will take a cancellation token, and in its loop will check to see if the cancellation token is indicating that the task is cancelled. Cancellation is cooperative. The task has to cooperate and decide when it checks to see if it is cancelled.
Notice that checking to see if you are cancelled is work, and that is work that takes time away from the real task. If you spend half your time checking to see if you are cancelled, your task takes twice as long as it could. And remember, the point of parallelizing the task is to make it take half as long, so doubling the amount of time it takes to do the task is a non-starter.
Therefore most tasks do not check every time through the loop if they are cancelled. A well-designed task will check every few milliseconds, not every few nanoseconds.
That's what I mean by "a cancellation is a request to stop when it is convenient". The task, if it was written correctly, should know what a good time to check for cancellation is so that it balances responsiveness against performance.
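As a sketch of what such a balance might look like (the interval is an assumption; the right value depends on how expensive each iteration is):
void CrunchNumbers(long[] data, CancellationToken token)
{
    for (int i = 0; i < data.Length; i++)
    {
        // Amortize the cost: poll the token once every 10,000 iterations
        // rather than on every pass through the loop.
        if (i % 10000 == 0)
            token.ThrowIfCancellationRequested();
        DoExpensiveWork(data[i]); // hypothetical per-item computation
    }
}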
Cancel() on a cancellation token is just signaling the cancellation token, which just impacts other places in the code that check the token (such as calls to cts.IsCancellationRequested). Framework calls often will check the cancellation token and abort. It's also useful to check a cancellation token inside a loop (as a means to abort the loop) or before a long operation.
The cancellation token does not forcibly terminate a thread or process. There are other APIs for that, such as Environment.Exit.
Following up on Eric's excellent answer ... "a thread or process" and "a unit of work" usually should not be the same thing. Creating a thread to carry out one unit of work and then die is like shooting flaming arrows into the air: you can't control it, can't predict it, and those arrows start interfering with each other. The system becomes choked with so much work that it can't work on anything. (A condition called "thrashing.")
A much better strategy is modeled after a fast-food restaurant: a small number of workers, each with an assigned task, taking work-requests from a queue and delivering the finished sandwiches to another. At any instant, any queue might contain more or fewer entries. You don't see any of the workers falling down, dead. During lunch rush-hour, more workers are busy but at the same tasks. During a slow period they remain at their posts, patiently waiting for the next order to arrive. Any particular work-request might be flagged as "cancelled," and the workers notice this and respond accordingly. No part of the restaurant is "over-committed," and the entire operation is able to consistently produce a predictable number of sandwiches per hour, according to management control.
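That restaurant maps fairly directly onto a small, fixed pool of workers draining a shared queue; a minimal sketch (WorkOrder, IsCancelled, and Execute are hypothetical):
var orders = new BlockingCollection<WorkOrder>();
const int workerCount = 4; // small, fixed crew

for (int i = 0; i < workerCount; i++)
{
    Task.Run(() =>
    {
        // Each worker stays at its post, taking the next order as it
        // arrives and blocking while the queue is empty.
        foreach (var order in orders.GetConsumingEnumerable())
        {
            if (order.IsCancelled) continue; // notice the flag and skip the work
            order.Execute();
        }
    });
}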

Chaining arbitrary number of tasks together in C#.NET

What I have
I have a set of asynchronous processing methods, similar to:
public class AsyncProcessor<T>
{
    //...rest of members, etc.

    public Task Process(T input)
    {
        //Some special processing, most likely inside a Task, so
        //maybe spawn a new Task, etc.
        Task task = Task.Run(/* maybe private method that does the processing */);
        return task;
    }
}
What I want
I would like to chain them all together, to execute in sequential order.
What I tried
I have tried to do the following:
public class CompositeAsyncProcessor<T>
{
    private readonly IEnumerable<AsyncProcessor<T>> m_processors;

    //Constructor receives the IEnumerable<AsyncProcessor<T>> and
    //stores it in the field above.

    public Task ProcessInput(T input)
    {
        Task chainedTask = Task.CompletedTask;
        foreach (AsyncProcessor<T> processor in m_processors)
        {
            chainedTask = chainedTask.ContinueWith(t => processor.Process(input));
        }
        return chainedTask;
    }
}
What went wrong
However, tasks do not run in order because, from what I have understood, inside the call to ContinueWith, the processor.Process(input) call is performed immediately and the method returns independently of the status of the returned task. Therefore, all processing Tasks still begin almost simultaneously.
My question
My question is whether there is something elegant that I can do to chain the tasks in order (i.e. without execution overlap). Could I achieve this using the following statement, (I am struggling a bit with the details), for example?
chainedTask = chainedTask.ContinueWith(async t => await processor.Process(input));
Also, how would I do this without using async/await, only ContinueWith?
Why would I want to do this?
Because my Processor objects have access to, and request things from, "thread-unsafe" resources. Also, I cannot just await all the methods because I have no idea how many there are, so I cannot just write down the necessary lines of code.
What do I mean by thread-unsafe? A specific problem
Because I may be using the term incorrectly, an illustration is a bit better to explain this bit. Among the "resources" used by my Processor objects, all of them have access to an object such as the following:
public interface IRepository
{
    void Add(object obj);
    bool Remove(object obj);
    IEnumerable<object> Items { get; }
}
The implementation currently used is relatively naive. So some Processor objects add things, while others retrieve the Items for inspection. Naturally, one of the exceptions I get all too often is:
InvalidOperationException: Collection was modified; enumeration operation may not execute.
I could spend some time locking access and pre-running the enumerations. However, that was my fallback option; my first thought was simply to make the processes run sequentially.
Why must I use Tasks?
While I have full control in this case, I could say that for the purposes of the question I might not be able to change the base implementation, so what would happen if I were stuck with Tasks? Furthermore, the operations actually do represent relatively time-consuming CPU-bound work, and I am trying to keep the user interface responsive, so I needed to offload some of the burden to asynchronous operations. In most of my use cases a single Task at a time (or a couple, but always of a specific, known count, so I was able to hook them together without iteration or async/await) was enough, but one use case finally necessitated chaining an unknown number of Tasks together.
How I deal with this currently
The way I am dealing with this currently is to append a call to Wait() inside the ContinueWith call, i.e.:
foreach (AsyncProcessor<T> processor in m_processors)
{
    chainedTask = chainedTask.ContinueWith(t => processor.Process(input).Wait());
}
I would appreciate any idea on how I should do this, or how I could do it more elegantly (or, "async-properly", so to speak). Also, I would like to know how I can do this without async/await.
Why my question is different from this question, which did not answer my question entirely.
Because the linked question has two tasks, so the solution is to simply write the two lines required, while I have an arbitrary (and unknown) number of tasks, so I need a suitable iteration. Also, my method is not async. I now understand (from the single briefly available answer, which was deleted) that I could do it fairly easily if I changed my method to async and awaited each processor's Task, but I still wish to know how this could be achieved without async/await syntax.
Why my question is not a duplicate of the other linked questions
Because none of them explains how to chain correctly using ContinueWith and I am interested in a solution that utilizes ContinueWith and does not make use of the async/await pattern. I know this pattern may be the preferable solution, I want to understand how to (if possible) make arbitrary chaining using ContinueWith calls properly. I now know I don't need ContinueWith. The question is, how do I do it with ContinueWith?
foreach + await will run Processes sequentially.
public async Task ProcessInputAsync(T input)
{
    foreach (var processor in m_processors)
    {
        await processor.Process(input);
    }
}
Btw, Process should be called ProcessAsync.
The method Task.ContinueWith does not understand async delegates the way Task.Run does, so when you return a Task it treats it as a normal return value and wraps it in another Task. You therefore end up receiving a Task<Task> instead of what you expected. The problem would be obvious if AsyncProcessor.Process returned a generic Task<T>: you would get a compile error because of the illegal conversion from Task<Task<T>> to Task<T>. In your case the conversion is from Task<Task> to Task, which is legal, since Task<TResult> derives from Task.
Solving the problem is easy. You just need to unwrap the Task<Task> to a simple Task, and there is a built-in method Unwrap that does exactly that.
There is another problem you need to solve, though. Currently your code suppresses any exception that occurs in an individual AsyncProcessor.Process, which I don't think is intended. So you must decide which strategy to follow here: are you going to propagate the first exception immediately, or do you prefer to collect them all and propagate them at the end, bundled in an AggregateException, like Task.WhenAll does? The example below implements the first strategy.
public class CompositeAsyncProcessor<T>
{
    //...

    public Task Process(T input)
    {
        Task current = Task.CompletedTask;
        foreach (AsyncProcessor<T> processor in m_processors)
        {
            current = current.ContinueWith(antecedent =>
            {
                if (antecedent.IsFaulted)
                    return Task.FromException<T>(antecedent.Exception.InnerException);
                return processor.Process(input);
            },
            CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default
            ).Unwrap();
        }
        return current;
    }
}
I have used an overload of ContinueWith that allows configuring all the options, because the defaults are not ideal. The default TaskContinuationOptions is None. Configuring it to ExecuteSynchronously you minimize the thread switches, since each continuation will run in the same thread that completed the previous one.
The default task scheduler is TaskScheduler.Current. By specifying TaskScheduler.Default you make it explicit that you want the continuations to run in thread-pool threads (for some exceptional cases that won't be able to run synchronously). The TaskScheduler.Current is context specific, and if it ever surprises you it won't be in a good way.
As you can see, there are a lot of gotchas with the old-school ContinueWith approach. The modern await in a loop is a lot easier to implement, and a lot harder to get wrong.

Using TPL to batch/de-parallelise separate invocations

Maybe the TPL isn't the right tool, but at least from one not particularly familiar with it, it seems like it ought to have what I'm looking for. I'm open to answers that don't use it though.
Given a method like this:
public Task Submit(IEnumerable<WorkItem> work)
This can execute an expensive async operation on a collection of items. Normally the caller batches up these items and submits as many as it can at once, and there's a fairly long delay between such batches, so it executes fairly efficiently.
However there are some occasions where no external batching happens and Submit gets called for a small number of items (typically only one) many times in quick succession, possibly even concurrently from separate threads.
What I'd like to do is to defer processing (while accumulating the arguments) until there has been a certain amount of time with no calls, and then execute the operation with the whole batch, in the originally specified order.
Or in other words, each time the method is called it should add its arguments to the list of pending items and then restart the delay from zero, such that a certain idle time is required before anything is processed.
I don't want a size limit on the batch (so I don't think BatchBlock is the right answer), I just want a delay/timeout. I'm certain that the calling pattern is such that there will be an idle period at some point.
I'm not sure whether it's better to defer even the first call, or if it should start the operation immediately and only defer subsequent calls if the operation is still in progress.
If it makes the problem easier, I'm ok with making Submit return void instead of a Task (ie. not being able to observe when it completes).
I'm sure I can muddle together something that works like this, but it seems like the sort of thing that ought to already exist somewhere. Can anyone point me in the right direction? (I'd prefer not to use non-core libraries, though.)
Ok, so for lack of finding anything suitable I ended up implementing something myself. Seems to do the trick. (I implemented it a bit more generically than shown here in my actual code, so I could reuse it more easily, but this illustrates the concept.)
private readonly ConcurrentQueue<WorkItem> _Items
    = new ConcurrentQueue<WorkItem>();
private CancellationTokenSource _CancelSource;

public async Task Submit(IEnumerable<WorkItem> items)
{
    var cancel = ReplacePreviousTasks();
    foreach (var item in items)
    {
        _Items.Enqueue(item);
    }
    await Task.Delay(TimeSpan.FromMilliseconds(250), cancel.Token);
    if (!cancel.IsCancellationRequested)
    {
        await RunOperation();
    }
}

private CancellationTokenSource ReplacePreviousTasks()
{
    var cancel = new CancellationTokenSource();
    var old = Interlocked.Exchange(ref _CancelSource, cancel);
    if (old != null)
    {
        old.Cancel();
    }
    return cancel;
}

private async Task RunOperation()
{
    var items = new List<WorkItem>();
    WorkItem item;
    while (_Items.TryDequeue(out item))
    {
        items.Add(item);
    }
    // do the operation on items
}
If multiple submissions occur within 250ms, the earlier ones are cancelled, and the operation executes once on all of the items after the 250ms is up (counting from the latest submit).
If another submit occurs while the operation is running, it will continue to run without cancelling (there's a tiny chance it will steal some of the items from the later call, but that's ok).
(Technically checking cancel.IsCancellationRequested isn't really necessary, since the await above will throw an exception if it was cancelled during the delay. But it doesn't hurt, and there is a tiny window it might catch.)
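If callers of Submit shouldn't observe a cancellation exception when their batch is superseded, the delay can be wrapped explicitly; a sketch of the same Submit body:
try
{
    await Task.Delay(TimeSpan.FromMilliseconds(250), cancel.Token);
}
catch (TaskCanceledException)
{
    return; // a newer Submit superseded this one; its delay will handle the batch
}
await RunOperation();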

Can many instances of an async task share a reference to a concurrent collection and add items concurrently to it in C#?

I'm just beginning to learn C# threading and concurrent collections, and am not sure of the proper terminology to pose my question, so I'll describe briefly what I'm trying to do. My grasp of the subject is rudimentary at best at this point. Is my approach below even feasible as I've envisioned it?
I have 100,000 urls in a Concurrent collection that must be tested--is the link still good? I have another concurrent collection, initially empty, that will contain the subset of urls that an async request determines to have been moved (400, 404, etc errors).
I want to spawn as many of these async requests concurrently as my PC and our bandwidth will allow, and was going to start at 20 async-web-request-tasks per second and work my way up from there.
Would it work if a single async task handled both things: it would make the async request and then add the url to the BadUrls collection if it encountered a 4xx error? A new instance of that task would be spawned every 50ms:
class TestArgs
{
    public ConcurrentBag<UrlInfo> myCollection { get; set; }
    public System.Uri currentUrl { get; set; }
}

ConcurrentQueue<UrlInfo> Urls = new ConcurrentQueue<UrlInfo>();
// populate the Urls queue
<snip>

// initialize the bad urls collection
ConcurrentBag<UrlInfo> BadUrls = new ConcurrentBag<UrlInfo>();

// timer fires every 50ms, whereupon a new args object is created
// and the timer callback spawns a new task; an autoEvent would
// reset the timer and dispose of it when the queue was empty
void SpawnNewUrlTask()
{
    // if queue is empty then reset the timer
    // otherwise:
    TestArgs args = new TestArgs
    {
        myCollection = BadUrls,
        currentUrl = getNextUrl() // take an item from the queue
    };
    Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
}

public async Task asyncWebRequestAndConcurrentCollectionUpdater(TestArgs args)
{
    // make the async web request
    // add the url to the bad collection if appropriate.
}
Feasible? Way off?
The approach seems fine, but there are some issues with the specific code you've shown.
But before I get to that, there have been suggestions in the comments that Task Parallelism is the way to go. I think that's misguided. There's a common misconception that if you want to have lots of work going on in parallel, you necessarily need lots of threads. That's only true if the work is compute-bound. But the work you're doing will be IO bound - this code is going to spend the vast majority of its time waiting for responses. It will do very little computation. So in practice, even if it only used a single thread, your initial target of 20 requests per second doesn't seem like a workload that would cause a single CPU core to break into a sweat.
In short, a single thread can handle very high levels of concurrent IO. You only need multiple threads if you need parallel execution of code, and that doesn't look likely to be the case here, because there's so little work for the CPU in this particular job.
(This misconception predates await and async by years. In fact, it predates the TPL - see http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless for a .NET 1.1 era illustration of how you can handle thousands of concurrent requests with a tiny number of threads. The underlying principles still apply today because Windows networking IO still basically works the same way.)
Not that there's anything particularly wrong with using multiple threads here, I'm just pointing out that it's a bit of a distraction.
Anyway, back to your code. This line is problematic:
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
You've not given us all your code, but I can't see how that line will be able to compile. The overloads of StartNew that accept two arguments require the first to be either an Action, an Action<object>, a Func<TResult>, or a Func<object,TResult>. In other words, it has to be a method that either takes no arguments, or accepts a single argument of type object (and which may or may not return a value). Your 'asyncWebRequestAndConcurrentCollectionUpdater' takes an argument of type TestArgs.
But the fact that it doesn't compile isn't the main problem. That's easily fixed. (E.g., change it to Task.Factory.StartNew(() => asyncWebRequestAndConcurrentCollectionUpdater(args));) The real issue is that what you're doing is a bit weird: you're using Task.Factory.StartNew to invoke a method that already returns a Task.
Task.Factory.StartNew is a handy way to take a synchronous method (i.e., one that doesn't return a Task) and run it in a non-blocking way. (It'll run on the thread pool.) But if you've got a method that already returns a Task, then you didn't really need to use Task.Factory.StartNew. The weirdness becomes more apparent if we look at what Task.Factory.StartNew returns (once you've fixed the compilation error):
Task<Task> t = Task.Factory.StartNew(
() => asyncWebRequestAndConcurrentCollectionUpdater(args));
That Task<Task> reveals what's happening. You've decided to wrap a method that was already asynchronous with a mechanism that is normally used to make non-asynchronous methods asynchronous. And so you've now got a Task that produces a Task.
One of the slightly surprising upshots of this is that if you were to wait for the task returned by StartNew to complete, the underlying work would not necessarily be done:
t.Wait(); // doesn't wait for asyncWebRequestAndConcurrentCollectionUpdater to finish!
All that will actually do is wait for asyncWebRequestAndConcurrentCollectionUpdater to return a Task. And since asyncWebRequestAndConcurrentCollectionUpdater is already an async method, it will return a task more or less immediately. (Specifically, it'll return a task the moment it performs an await that does not complete immediately.)
If you want to wait for the work you've kicked off to finish, you'll need to do this:
t.Result.Wait();
or, potentially more efficiently, this:
t.Unwrap().Wait();
That says: get me the Task that my async method returned, and then wait for that. This may not be usefully different from this much simpler code:
Task t = asyncWebRequestAndConcurrentCollectionUpdater("foo");
... maybe queue up some other tasks ...
t.Wait();
You may not have gained anything useful by introducing Task.Factory.StartNew.
I say "may" because there's an important qualification: it depends on the context in which you start the work. C# generates code which, by default, attempts to ensure that when an async method continues after an await, it does so in the same context in which the await was initially performed. E.g., if you're in a WPF app and you await while on the UI thread, when the code continues it will arrange to do so on the UI thread. (You can disable this with ConfigureAwait.)
So if you're in a situation in which the context is essentially serialized (either because it's single-threaded, as will be the case in a GUI app, or because it uses something resembling a rental model, e.g., the context of a particular ASP.NET request), it may actually be useful to kick an async task off via Task.Factory.StartNew because it enables you to escape the original context. However, you've just made your life harder - tracking your tasks to completion is somewhat more complex. And you might have been able to achieve the same effect simply by using ConfigureAwait inside your async method.
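For example, a sketch of the ConfigureAwait alternative (MakeRequestAsync is a hypothetical helper):
public async Task asyncWebRequestAndConcurrentCollectionUpdater(TestArgs args)
{
    // ConfigureAwait(false) lets the continuation run on a thread-pool thread
    // instead of bouncing back to the original context.
    var response = await MakeRequestAsync(args.currentUrl).ConfigureAwait(false);
    // ...examine the response and add to args.myCollection if the url is bad...
}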
And it may not matter anyway - if you're only attempting to manage 20 requests a second, the minimal amount of CPU effort required to do that means that you can probably manage it entirely adequately on one thread. (Also, if this is a console app, the default context will come into play, which uses the thread pool, so your tasks will be able to run multithreaded in any case.)
But to get back to your question, it seems entirely reasonable to me to have a single async method that picks a url off the queue, makes the request, examines the response, and if necessary, adds an entry to the bad url collection. And kicking the things off from a timer also seems reasonable - that will throttle the rate at which connections are attempted without getting bogged down with slow responses (e.g., if a load of requests end up attempting to talk to servers that are offline). It might be necessary to introduce a cap for the maximum number of requests in flight if you hit some pathological case where you end up with tens of thousands of URLs in a row all pointing to a server that isn't responding. (On a related note, you'll need to make sure that you're not going to hit any per-client connection limits with whichever HTTP API you're using - that might end up throttling the effective throughput.)
You will need to add some sort of completion handling - just kicking off asynchronous operations and not doing anything to handle the results is bad practice, because you can end up with exceptions that have nowhere to go. (In .NET 4.0, these used to terminate your process, but as of .NET 4.5, by default an unhandled exception from an asynchronous operation will simply be ignored!) And if you end up deciding that it is worth launching via Task.Factory.StartNew remember that you've ended up with an extra layer of wrapping, so you'll need to do something like myTask.Unwrap().ContinueWith(...) to handle it correctly.
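A sketch of that completion handling for the doubly-wrapped case:
Task<Task> wrapped = Task.Factory.StartNew(
    () => asyncWebRequestAndConcurrentCollectionUpdater(args));

wrapped.Unwrap().ContinueWith(t =>
{
    if (t.IsFaulted)
    {
        // Observe the exception so it doesn't go unhandled and vanish.
        Console.Error.WriteLine(t.Exception.InnerException);
    }
});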
Of course you can. Concurrent collections are called 'concurrent' because they can be used... concurrently by multiple threads, with some guarantees about their behaviour.
A ConcurrentQueue will ensure that each element inserted in it is extracted exactly once (concurrent threads will never extract the same item by mistake, and once the queue is empty, all the items have been extracted by a thread).
EDIT: the only thing that could go wrong is that 50ms is not enough to complete the request, and so more and more tasks cumulate in the task queue. If that happens, your memory could get filled, but the thing would work anyway. So yes, it is feasible.
Anyway, I would like to underline the fact that a task is not a thread. Even if you create 100 tasks, the framework will decide how many of them will be actually executed concurrently.
If you want to have more control on the level of parallelism, you should use asynchronous requests.
In your comments, you wrote "async web request", but I can't understand if you wrote async just because it's on a different thread or because you intend to use the async API.
If you were using the async API, I'd expect to see some handler attached to the completion event, but I couldn't see it, so I assumed you're using synchronous requests issued from an asynchronous task.
If you're using asynchronous requests, then it's pointless to use tasks, just use the timer to issue the async requests, since they are already asynchronous.
When I say "asynchronous request" I'm referring to methods like WebRequest.GetResponseAsync and WebRequest.BeginGetResponse.
EDIT2: if you want to use asynchronous requests, then you can just make requests from the timer handler. The BeginGetResponse method takes two arguments. The first one is a callback procedure that will be called to report the status of the request; you can pass the same procedure for all the requests. The second one is a user-provided object which will store state about the request; you can use this argument to differentiate among different requests. You can even do it without the timer. Something like:
private readonly int desiredConcurrency = 20;

struct RequestData
{
    public UrlInfo url;
    public HttpWebRequest request;
}

/// Handles the completion of an asynchronous request.
/// When a request has been completed,
/// tries to issue a new request to another url.
private void AsyncRequestHandler(IAsyncResult ar)
{
    if (ar.IsCompleted)
    {
        RequestData data = (RequestData)ar.AsyncState;
        // EndGetResponse returns a WebResponse; cast it to inspect the HTTP status.
        HttpWebResponse resp = (HttpWebResponse)data.request.EndGetResponse(ar);
        if (resp.StatusCode != HttpStatusCode.OK)
        {
            BadUrls.Add(data.url);
        }
        // A request has been completed, try to start a new one
        TryIssueRequest();
    }
}

/// If urls is not empty, dequeues a url from it
/// and issues a new request to the extracted url.
private bool TryIssueRequest()
{
    RequestData rd;
    if (urls.TryDequeue(out rd.url))
    {
        rd.request = CreateRequestTo(rd.url); // TODO implement
        rd.request.BeginGetResponse(AsyncRequestHandler, rd);
        return true;
    }
    else
    {
        return false;
    }
}

// Called by a button handler, or something like that
void StartTheRequests()
{
    for (int requestCount = 0; requestCount < desiredConcurrency; ++requestCount)
    {
        if (!TryIssueRequest()) break;
    }
}
