Is a good idea to async-ize a sync method? - c#

In my ASP.NET Core app, at some points I'm querying a couple ADs for data. This being AD, the queries take some time to complete and the DirectoryServices API contains only synchronous calls.
Is it a good practice to try and wrap the AD sync calls as async? I think it's done like this (just an example, not the real query):
private async Task<string[]> GetUserGroupsAsync(string samAccountName)
{
var func = new Func<string, string[]>(sam =>
{
var result = new List<string>();
using (var ctx = new PrincipalContext(ContextType.Domain, "", "", ""))
{
var p = new UserPrincipal(ctx)
{
SamAccountName = sam
};
using (var search_obj = new PrincipalSearcher(p))
{
var query_result = search_obj.FindOne();
if (query_result != null)
{
var usuario = query_result as UserPrincipal;
var directory_entry = usuario.GetUnderlyingObject() as DirectoryEntry;
var grupos = usuario.GetGroups(ctx).OfType<GroupPrincipal>().ToArray();
if (grupos != null)
{
foreach (GroupPrincipal g in grupos)
{
result.Add(g.Name);
}
}
}
}
}
return result.ToArray();
});
var result = await Task.Run(() => func(samAccountName));
return result;
}

Is it a good practice
Usually not.
In a desktop app where you don't want to hold up the UI thread, then this idea can actually be a good idea. That Task.Run moves the work to a different thread and the UI thread can continue responding to user input while you're waiting for a response.
You tagged ASP.NET. The answer there is also "it depends". ASP.NET has a limited amount of worker threads that it's allowed to use. The benefit of asynchronous code is to allow a thread to go and work on some other request while you're waiting for a response. Thus, you can serve more requests with the same amount of available threads. It helps the overall performance of your application.
If you're calling await GetUserGroupsAsync(), then there is absolutely no benefit to doing what you're doing. You're freeing up the calling thread, but you've created a new thread that is going to sit locked until a response is returned. So your net thread savings is zero, and you have the additional CPU overhead of setting up the task.
If you intend on calling GetUserGroupsAsync() and then going out and getting other data while you wait for a response, then this can save time. It won't save threads, but just time. But you should be conscious that you are now taking up two threads for each request instead of just one, which means you can hit the ASP.NET max thread count faster, potentially hurting the overall performance of your application.
But whether you want to save time in ASP.NET, or if you want to free up the UI thread in a desktop app, I would still argue that you should not use Task.Run inside GetUserGroupsAsync(). If the caller wants to offload that waiting to another thread so it can then go get other data, then the caller can use Task.Run, like this:
var groupsTask = Task.Run(() => GetUserGroupsAsync());
// make HTTP request or get some other external data while we wait
var groups = await groupsTask;

The decision on whether you should create a method for a class should depend on the answer to the question: if someone thinks of what this class represents, would he think that this class will have this functionality?
Compare this with class string and methods about string equality. Most people would think that two strings are equal if they have exactly the same characters in the same order. However, for a lot of applications, it might be handy to be able to compare two strings with case insensitivity. Instead of changing the equality method of string, a new class is created. This StringComparer class contains a lot of methods to compare strings using different definitions of equality.
If someone would say: "Okay, I've just created a class that represents several methods to compare two strings for equality". Would you expect that comparing with case insensitivity is one of the methods of this class? Of course you would!
The same should be with your class. I don't know what your class represents. However, apparently you thought, that someone who has an object of this class would be happy to "Get User Groups". He is happy that he doesn't have to know how that someone made this method for him, and that he doesn't need to know the insides of the class to be able to get the user groups.
This information hiding is an important thing of classes. It gives the creator of the class the freedom to internally change how the class works, without having to change usage of the class.
So if everyone who knows what your class represents would think: "of course getting user groups will take a considerable amount of time", and "of course, my thread will be waiting idly when getting user groups", then users of your class would expect the presence of asyn-await, to prevent idly waiting.
On the other hand, it might be that users of your class would say: "Well, I know that getting user groups will take some heavy calculations. It will take some time, but my thread will be very busy". In that case, they won't expect an async method.
Assuming that you have a non-async method to get the user groups:
string[] GetUserGroups(string samAccountName) {...}
The async method would be very simple:
Task<string[] GetUserGroupsAsync(string samAccountName)
{
return Task.Run(() => GetUserGroups(samAccountName));
}
The only thing you would have to decide is: do the users of my class expect this method?
Advantages and Disadvantages
Disadvantage of having a Sync and an Async method:
People who learn about your class have to learn about more methods
Users of your class can't decide how the async method calls the sync one, without creating an extra async method, which will only add to the confusion
You'll have to add an extra unit test
You'll have to maintain the async method forever.
Advantages of having an async method:
If in future a user group would be fetched from another process, for instance a database, or an XML file, or maybe the internet, then you can internally change the class, without having to change the many, many users (after all, all your classes are very popular, aren't they :)
Conclusion
If people look at your class, and they wouldn't even think that fetching user groups would be an async method, then don't create it.
If you think that maybe in future it could be that another process provides the user groups, then it would be wise to prepare your users about this.

Related

Is it possible to throttle Parallel.ForEachAsync in .NET 6.0 to avoid rate limiting?

I'm fairly new to programming (< 3 years exp), so I don't have a great understanding of the subjects in this post. Please bear with me.
My team is developing an integration with a third party system, and one of the third party's endpoints lacks a meaningful way to get a list of entities matching a condition.
We have been fetching these entities by looping over the collection of requests, and adding the results of each awaited call to a list. This works just fine, but getting the entities takes a lot longer than getting entities from other endpoints that lets us get a list of entities by providing a list of ids.
.NET 6.0 introduced Parallel.ForEachAsync(), which lets us execute multiple awaitable tasks asynchronously in parallel.
For example:
public async Task<List<TEntity>> GetEntitiesInParallelAsync<TEntity>(List<IRestRequest> requests)
where TEntity : IEntity
{
var entities = new ConcurrentBag<TEntity>();
// Create a function that takes a RestRequest and returns the
// result of the request's execution, for each request
var requestExecutionTasks = requests.Select(i =>
new Func<Task<TEntity>>(() => GetAsync<TEntity>(i)));
// Execute each of the functions asynchronously in parallel,
// and add the results to the aggregate as they come in
await Parallel.ForEachAsync(requestExecutionTasks, new ParallelOptions
{
// This lets us limit the number of threads to use. -1 is unlimited
MaxDegreeOfParallelism = -1
}, async (func, _) => entities.Add(await func()));
return entities.ToList();
}
Using this code rather than the simple foreach-loop sped up the time it takes to get the ~30 entities on my test instance, by 91% on average. That's awesome. However, we are worried about the rate limiting that is likely to occur when we use it on a client's system with possibly thousands of entities. We have a system in place that detects the "you are rate limited"-message from their API, and cues the requests for a second or so before trying again, but this is not as much a good solution as it is a safety measure.
If we where just looping over the requests, we could have throttled the calls by doing something like await Task.Delay(minimumDelay) in each iteration of the loop. Correct me if I'm wrong, but from what I understand this wouldn't actually work when executing the requests in parallel foreach, as it would make all requests wait the same amount of time before the execution. Is there a way to make each individual request wait a certain amount of time before execution, only if we are close to being rate limited? If at all possible, I would like to do this without limiting the number of threads to use.
Edit
I wanted to let this question sit a little so more people could answer. Since no new answers or comments have been added, I'm marking the one answer I got as correct. That being said, the answer suggests a different approach than using Parallel.ForEachAsync.
If I understand the current answer correctly, the answer to my original question of whether or not it's possible to throttle Parallel.ForEachAsync, would be: "no, it's not".
My suggestion is to ditch the Parallel.ForEachAsync approach, and use instead the new Chunk LINQ operator in combination with the Task.WhenAll method. You can launch 100 asynchronous operations every second like this:
public async Task<List<TEntity>> GetEntitiesInParallelAsync<TEntity>(
List<IRestRequest> requests) where TEntity : IEntity
{
var tasks = new List<Task<TEntity>>();
foreach (var chunk in requests.Chunk(100))
{
tasks.AddRange(chunk.Select(request => GetAsync<TEntity>(request)));
await Task.Delay(TimeSpan.FromSeconds(1.0));
}
return (await Task.WhenAll(tasks)).ToList();
}
It is assumed that the time required to launch an asynchronous operation (to invoke the GetAsync method) is negligible.
This approach has the inherent disadvantage that in case of an exception, the failure will not be propagated before all operations are completed. For comparison the Parallel.ForEachAsync method stops invoking the async delegate and completes ASAP, after the first failure is detected.

How do I prevent the creation of multiple background processes working on the same request?

I have a simple WebApi controller method that's purpose is to trigger some background processing and return a 202 Accepted response immediately (without necessarily having completed the background processing as is consistent with a 202 response.)
public async Task<IHttpActionResult> DoSomething(string id)
{
HostingEnvironment.QueueBackgroundWorkItem(async ct =>
{
//Do work
}
return ResponseMessage(new HttpResponseMessage(HttpStatusCode.Accepted));
}
However, I want to be able to prevent multiple requests to the same endpoint with the same id from triggering multiple instances of the same background processing simultaneously.
Effectively, if two requests with the same id's were to be made at the same time (or near enough), the first one would do the processing for that id and the second one would be able to identify this and take action accordingly and not duplicate the work that's already being done.
I'd like to avoid databases/persistent storage if at all possible and I'm aware of the risks of running async tasks within an IIS worker process - for the sake of the argument, the background processing is not critical.
How can I go about doing this? Any help would be much appreciated.
You'll need some kind of storage shared between all your possible workers. This could be, for example:
A static variable. Probably the easiest to implement, but has limitations when the application does not run in only one AppDomain (especially important if you want to scale). So probably not the best way
An SQL Database: Probably the most common one. If your application already uses one, I'd say go for this route.
A No-SQL database, for example a key-value store. Might be an alternative if your application does not use a SQL database yet.
Some external component such a workflow management tool
EDIT:
Example using ConcurrentDictionary (per request of the thread starter - I stil think using a (possibly NoSQL) database would be the most elegant way) - actually you could just put Tasks into the dictionary:
private ConcurrentDictionary<string, Task<SomeType>> _cache = new //...
var task = _cache.GetOrAdd("<Key>", key => Task.Run(() => /*do some work*/));
if (task.IsCompleted)
/*result ready*/;
else
/*have to wait*/;

Calling static async methods from ASP.net project

I'm wondering will this scenario be thread safe and are there issues that I'm not currently seeing:
From ASP.net controller I call non-static method from non-static class (this class is in another project, and class is injected into controller).
This method (which is non-static) does some work and calls some other static method passing it userId
Finally static method does some work (for which userId is needed)
I believe this approach is thread safe, and that everything will be done properly if two users call this method at the same time (let's say in same nanosecond). Am I correct or completely wrong ? If I am wrong what would be correct way of using static methods within ASP.net project ?
EDIT
Here is code :)
This is call from the controller:
await _workoutService.DeleteWorkoutByIdAsync(AzureRedisFeedsConnectionMultiplexer.GetRedisDatabase(),AzureRedisLeaderBoardConnectionMultiplexer.GetRedisDatabase(), workout.Id, userId);
Here how DeleteWorkoutByIdAsync looks like:
public async Task<bool> DeleteWorkoutByIdAsync(IDatabase redisDb,IDatabase redisLeaderBoardDb, Guid id, string userId)
{
using (var databaseContext = new DatabaseContext())
{
var workout = await databaseContext.Trenings.FindAsync(id);
if (workout == null)
{
return false;
}
databaseContext.Trenings.Remove(workout);
await databaseContext.SaveChangesAsync();
await RedisFeedService.StaticDeleteFeedItemFromFeedsAsync(redisDb,redisLeaderBoardDb, userId, workout.TreningId.ToString());
}
return true;
}
As you can notice DeleteWorkoutByIdAsync calls static method StaticDeleteFeedItemFromFeedsAsync which looks like this:
public static async Task StaticDeleteFeedItemFromFeedsAsync(IDatabase redisDb,IDatabase redisLeaderBoardDd, string userId, string workoutId)
{
var deleteTasks = new List<Task>();
var feedAllRedisVals = await redisDb.ListRangeAsync("FeedAllWorkouts:" + userId);
DeleteItemFromRedisAsync(redisDb, feedAllRedisVals, "FeedAllWorkouts:" + userId, workoutId, ref deleteTasks);
await Task.WhenAll(deleteTasks);
}
And here is static method DeleteItemFromRedisAsync which is called in StaticDeleteFeedItemFromFeedsAsync:
private static void DeleteItemFromRedisAsync(IDatabase redisDb, RedisValue [] feed, string redisKey, string workoutId, ref List<Task> deleteTasks)
{
var itemToRemove = "";
foreach (var f in feed)
{
if (f.ToString().Contains(workoutId))
{
itemToRemove = f;
break;
}
}
if (!string.IsNullOrEmpty(itemToRemove))
{
deleteTasks.Add(redisDb.ListRemoveAsync(redisKey, itemToRemove));
}
}
"Thread safe" isn't a standalone term. Thread Safe in the the face of what? What kind of concurrent modifications are you expecting here?
Let's look at a few aspects here:
Your own mutable shared state: You have no shared state whatsoever in this code; so it's automatically thread safe.
Indirect shared state: DatabaseContext. This looks like an sql database, and those tend to be thread "safe", but what exactly that means depends on the database in question. For example, you're removing a Trenings row, and if some other thread also removes the same row, you're likely to get a (safe) concurrency violation exception. And depending on isolation level, you may get concurrency violation exceptions even for other certain mutations of "Trenings". At worst that means one failed request, but the database itself won't corrupt.
Redis is essentially single-threaded, so all operations are serialized and in that sense "thread safe" (which might not buy you much). Your delete code gets a set of keys, then deletes at most one of those. If two or more threads simultaneously attempt to delete the same key, it is possible that one thread will attempt to delete a non-existing key, and that may be unexpected to you (but it won't cause DB corruption).
Implicit consistency between redis+sql: It looks like you're using guids, so the chances of unrelated things clashing are small. Your example only contains a delete operation (which is likely no to cause consistency issues), so it's hard to speculate whether under all other circumstances redis and the sql database will stay consistent. In general, if your IDs are never reused, you're probably safe - but keeping two databases in sync is a hard problem, and you're quite likely to make a mistake somewhere.
However, your code seems excessively complicated for what it's doing. I'd recommend you simplify it dramatically if you want to be able to maintain this in the long run.
Don't use ref parameters unless you really know what you're doing (and it's not necessary here).
Don't mix up strings with other data types, so avoid ToString() where possible. Definitely avoid nasty tricks like Contains to check for key equality. You want your code to break when something unexpected happens, because code that "limps along" can be virtually impossible to debug (and you will write bugs).
Don't effectively return an array of tasks if the only thing you can really do is wait for all of them - might as well do that in the callee to simplify the API.
Don't use redis. It's probably just a distraction here - you already have another database, so it's very unlikely you need it here, except for performance reasons, and it's extremely premature to go adding whole extra database engines for a hypothetical performance problem. There's a reasonable chance that the extra overhead of requiring extra connections may make your code slower than if you had just one db, especially if you can't save many sql queries.
Note: this answer was posted before the OP amended their question to add their code, revealing that this is actually a question of whether async/await is thread-safe.
Static methods are not a problem in and of themselves. If a static method is self-contained and manages to do its job using local variables only, then it is perfectly thread safe.
Problems arise if the static method is not self-contained, (delegates to thread-unsafe code,) or if it manipulates static state in a non-thread safe fashion, i.e. accesses static variables for both read and write outside of a lock() clause.
For example, int.parse() and int.tryParse() are static, but perfectly thread safe. Imagine the horror if they were not thread-safe.
what you are doing here is synchronizing on a list (deleteTasks). If you do this i would recommend 1 of 2 things.
1) Either use thread safe collections
https://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
2) Let your DeleteItemFromRedisAsync return a task and await it.
Although i think in this particular case i don't see any issues as soon as you refactor it and DeleteItemFromRedisAsync can get called multiple times in parallel then you will have issues. The reason being is that if multiple threads can modify your list of deleteTasks then you are not longer guaranteed you collect them all (https://msdn.microsoft.com/en-us/library/dd997373(v=vs.110).aspx if 2 threads do an "Add"/Add-to-the-end in a non-thread safe way at the same time then 1 of them is lost) so you might have missed a task when waiting for all of them to finish.
Also i would avoid mixing paradigms. Either use async/await or keep track of a collection of tasks and let methods add to that list. don't do both. This will help the maintainability of your code in the long run. (note, threads can still return a task, you collect those and then wait for all of them. but then the collecting method is responsible for any threading issues instead of it being hidden in the method that is being called)

Difference between Find and FindAsync

I am writing a very, very simple query which just gets a document from a collection according to its unique Id. After some frusteration (I am new to mongo and the async / await programming model), I figured this out:
IMongoCollection<TModel> collection = // ...
FindOptions<TModel> options = new FindOptions<TModel> { Limit = 1 };
IAsyncCursor<TModel> task = await collection.FindAsync(x => x.Id.Equals(id), options);
List<TModel> list = await task.ToListAsync();
TModel result = list.FirstOrDefault();
return result;
It works, great! But I keep seeing references to a "Find" method, and I worked this out:
IMongoCollection<TModel> collection = // ...
IFindFluent<TModel, TModel> findFluent = collection.Find(x => x.Id == id);
findFluent = findFluent.Limit(1);
TModel result = await findFluent.FirstOrDefaultAsync();
return result;
As it turns out, this too works, great!
I'm sure that there's some important reason that we have two different ways to achieve these results. What is the difference between these methodologies, and why should I choose one over the other?
The difference is in a syntax.
Find and FindAsync both allows to build asynchronous query with the same performance, only
FindAsync returns cursor which doesn't load all documents at once and provides you interface to retrieve documents one by one from DB cursor. It's helpful in case when query result is huge.
Find provides you more simple syntax through method ToListAsync where it inside retrieves documents from cursor and returns all documents at once.
Imagine that you execute this code in a web request, with invoking find method the thread of the request will be frozen until the database return results it's a synchron call, if it's a long database operation that takes seconds to complete, you will have one of the threads available to serve web request doing nothing simply waiting that database return the results, and wasting valuable resources (the number of threads in thread pool is limited).
With FindAsync, the thread of your web request will be free while is waiting the database for returning the results, this means that during the database call this thread is free to attend an another web request. When the database returns the result then the code continue execution.
For long operations like read/writes from file system, database operations, comunicate with another services, it's a good idea to use async calls. Because while you are waiting for the results, the threads are available for serve another web request. This is more scalable.
Take a look to this microsoft article https://msdn.microsoft.com/en-us/magazine/dn802603.aspx.

Can many instances of an async task share a reference to a concurrent collection and add items concurrently to it in C#?

I'm just beginning to learn C# threading and concurrent collections, and am not sure of the proper terminology to pose my question, so I'll describe briefly what I'm trying to do. My grasp of the subject is rudimentary at best at this point. Is my approach below even feasible as I've envisioned it?
I have 100,000 urls in a Concurrent collection that must be tested--is the link still good? I have another concurrent collection, initially empty, that will contain the subset of urls that an async request determines to have been moved (400, 404, etc errors).
I want to spawn as many of these async requests concurrently as my PC and our bandwidth will allow, and was going to start at 20 async-web-request-tasks per second and work my way up from there.
Would it work if a single async task handled both things: it would make the async request and then add the url to the BadUrls collection if it encountered a 4xx error? A new instance of that task would be spawned every 50ms:
class TestArgs args {
ConcurrentBag<UrlInfo> myCollection { get; set; }
System.Uri currentUrl { get; set; }
}
ConcurrentQueue<UrlInfo> Urls = new ConncurrentQueue<UrlInfo>();
// populate the Urls queue
<snip>
// initialize the bad urls collection
ConcurrentBag<UrlInfo> BadUrls = new ConcurrentBag<UrlInfo>();
// timer fires every 50ms, whereupon a new args object is created
// and the timer callback spawns a new task; an autoEvent would
// reset the timer and dispose of it when the queue was empty
void SpawnNewUrlTask(){
// if queue is empty then reset the timer
// otherwise:
TestArgs args = {
myCollection = BadUrls,
currentUrl = getNextUrl() // take an item from the queue
};
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
}
public async Task asyncWebRequestAndConcurrentCollectionUpdater(TestArgs args)
{
//make the async web request
// add the url to the bad collection if appropriate.
}
Feasible? Way off?
The approach seems fine, but there are some issues with the specific code you've shown.
But before I get to that, there have been suggestions in the comments that Task Parallelism is the way to go. I think that's misguided. There's a common misconception that if you want to have lots of work going on in parallel, you necessarily need lots of threads. That's only true if the work is compute-bound. But the work you're doing will be IO bound - this code is going to spend the vast majority of its time waiting for responses. It will do very little computation. So in practice, even if it only used a single thread, your initial target of 20 requests per second doesn't seem like a workload that would cause a single CPU core to break into a sweat.
In short, a single thread can handle very high levels of concurrent IO. You only need multiple threads if you need parallel execution of code, and that doesn't look likely to be the case here, because there's so little work for the CPU in this particular job.
(This misconception predates await and async by years. In fact, it predates the TPL - see http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless for a .NET 1.1 era illustration of how you can handle thousands of concurrent requests with a tiny number of threads. The underlying principles still apply today because Windows networking IO still basically works the same way.)
Not that there's anything particularly wrong with using multiple threads here, I'm just pointing out that it's a bit of a distraction.
Anyway, back to your code. This line is problematic:
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
While you've not given us all your code, I can't see how that will be able to compile. The overloads of StartNew that accept two arguments require the first to be either an Action, an Action<object>, a Func<TResult>, or a Func<object,TResult>. In other words, it has to be a method that either takes no arguments, or accepts a single argument of type object (and which may or may not return a value). Your 'asyncWebRequestAndConcurrentCollectionUpdater' takes an argument of type TestArgs.
But the fact that it doesn't compile isn't the main problem. That's easily fixed. (E.g., change it to Task.Factory.StartNew(() => asyncWebRequestAndConcurrentCollectionUpdater(args));) The real issue is what you're doing is a bit weird: you're using Task.StartNew to invoke a method that already returns a Task.
Task.StartNew is a handy way to take a synchronous method (i.e., one that doesn't return a Task) and run it in a non-blocking way. (It'll run on the thread pool.) But if you've got a method that already returns a Task, then you didn't really need to use Task.StartNew. The weirdness becomes more apparent if we look at what Task.StartNew returns (once you've fixed the compilation error):
Task<Task> t = Task.Factory.StartNew(
() => asyncWebRequestAndConcurrentCollectionUpdater(args));
That Task<Task> reveals what's happening. You've decided to wrap a method that was already asynchronous with a mechanism that is normally used to make non-asynchronous methods asynchronous. And so you've now got a Task that produces a Task.
One of the slightly surprising upshots of this is that if you were to wait for the task returned by StartNew to complete, the underlying work would not necessarily be done:
t.Wait(); // doesn't wait for asyncWebRequestAndConcurrentCollectionUpdater to finish!
All that will actually do is wait for asyncWebRequestAndConcurrentCollectionUpdater to return a Task. And since asyncWebRequestAndConcurrentCollectionUpdater is already an async method, it will return a task more or less immediately. (Specifically, it'll return a task the moment it performs an await that does not complete immediately.)
If you want to wait for the work you've kicked off to finish, you'll need to do this:
t.Result.Wait();
or, potentially more efficiently, this:
t.Unwrap().Wait();
That says: get me the Task that my async method returned, and then wait for that. This may not be usefully different from this much simpler code:
Task t = asyncWebRequestAndConcurrentCollectionUpdater("foo");
... maybe queue up some other tasks ...
t.Wait();
You may not have gained anything useful by introducing `Task.Factory.StartNew'.
I say "may" because there's an important qualification: it depends on the context in which you start the work. C# generates code which, by default, attempts to ensure that when an async method continues after an await, it does so in the same context in which the await was initially performed. E.g., if you're in a WPF app and you await while on the UI thread, when the code continues it will arrange to do so on the UI thread. (You can disable this with ConfigureAwait.)
So if you're in a situation in which the context is essentially serialized (either because it's single-threaded, as will be the case in a GUI app, or because it uses something resembling a rental model, e.g. the context of an particular ASP.NET request), it may actually be useful to kick an async task off via Task.Factory.StartNew because it enables you to escape the original context. However, you just made your life harder - tracking your tasks to completion is somewhat more complex. And you might have been able to achieve the same effect simply by using ConfigureAwait inside your async method.
And it may not matter anyway - if you're only attempting to manage 20 requests a second, the minimal amount of CPU effort required to do that means that you can probably manage it entirely adequately on one thread. (Also, if this is a console app, the default context will come into play, which uses the thread pool, so your tasks will be able to run multithreaded in any case.)
But to get back to your question, it seems entirely reasonable to me to have a single async method that picks a url off the queue, makes the request, examines the response, and if necessary, adds an entry to the bad url collection. And kicking the things off from a timer also seems reasonable - that will throttle the rate at which connections are attempted without getting bogged down with slow responses (e.g., if a load of requests end up attempting to talk to servers that are offline). It might be necessary to introduce a cap for the maximum number of requests in flight if you hit some pathological case where you end up with tens of thousands of URLs in a row all pointing to a server that isn't responding. (On a related note, you'll need to make sure that you're not going to hit any per-client connection limits with whichever HTTP API you're using - that might end up throttling the effective throughput.)
You will need to add some sort of completion handling - just kicking off asynchronous operations and not doing anything to handle the results is bad practice, because you can end up with exceptions that have nowhere to go. (In .NET 4.0, these used to terminate your process, but as of .NET 4.5, by default an unhandled exception from an asynchronous operation will simply be ignored!) And if you end up deciding that it is worth launching via Task.Factory.StartNew remember that you've ended up with an extra layer of wrapping, so you'll need to do something like myTask.Unwrap().ContinueWith(...) to handle it correctly.
Of course you can. Concurrent collections are called 'concurrent' because they can be used... concurrently by multiple threads, with some warranties about their behaviour.
A ConcurrentQueue will ensure that each element inserted in it is extracted exactly once (concurrent threads will never extract the same item by mistake, and once the queue is empty, all the items have been extracted by a thread).
EDIT: the only thing that could go wrong is that 50ms is not enough to complete the request, and so more and more tasks cumulate in the task queue. If that happens, your memory could get filled, but the thing would work anyway. So yes, it is feasible.
Anyway, I would like to underline the fact that a task is not a thread. Even if you create 100 tasks, the framework will decide how many of them will be actually executed concurrently.
If you want to have more control on the level of parallelism, you should use asynchronous requests.
In your comments, you wrote "async web request", but I can't understand if you wrote async just because it's on a different thread or because you intend to use the async API.
If you were using the async API, I'd expect to see some handler attached to the completion event, but I couldn't see it, so I assumed you're using synchronous requests issued from an asynchronous task.
If you're using asynchronous requests, then it's pointless to use tasks, just use the timer to issue the async requests, since they are already asynchronous.
When I say "asynchronous request" I'm referring to methods like WebRequest.GetResponseAsync and WebRequest.BeginGetResponse.
EDIT2: if you want to use asynchronous requests, then you can just make requests from the timer handler. The BeginGetResponse method takes two arguments. The first one is a callback procedure, that will be called to report the status of the request. You can pass the same procedure for all the requests. The second one is an user-provided object, which will store status about the request, you can use this argument to differentiate among different requests. You can even do it without the timer. Something like:
private readonly int desiredConcurrency = 20;
struct RequestData
{
public UrlInfo url;
public HttpWebRequest request;
}
/// Handles the completion of an asynchronous request
/// When a request has been completed,
/// tries to issue a new request to another url.
private void AsyncRequestHandler(IAsyncResult ar)
{
if (ar.IsCompleted)
{
RequestData data = (RequestData)ar.AsyncState;
HttpWebResponse resp = data.request.EndGetResponse(ar);
if (resp.StatusCode != 200)
{
BadUrls.Add(data.url);
}
//A request has been completed, try to start a new one
TryIssueRequest();
}
}
/// If urls is not empty, dequeues a url from it
/// and issues a new request to the extracted url.
private bool TryIssueRequest()
{
RequestData rd;
if (urls.TryDequeue(out rd.url))
{
rd.request = CreateRequestTo(rd.url); //TODO implement
rd.request.BeginGetResponse(AsyncRequestHandler, rd);
return true;
}
else
{
return false;
}
}
//Called by a button handler, or something like that
void StartTheRequests()
{
for (int requestCount = 0; requestCount < desiredConcurrency; ++requestCount)
{
if (!TryIssueRequest()) break;
}
}

Categories

Resources