C# async and tasks - c#

I have a function that sends requests to search for information from a url. The search criteria is a list and the search iterates through each item and requests info from the url. To speed it up I divide the list into x subsets, and create a task for each subset. Then each subset sends 3 simultaneous requests, as follows:
This is the main entry point:
Search search = new Search();
await Task.Run(() => search.Start());
The Start function:
public void Search()
{
//Each subset is a List<T> ie where T is certain search criteria
//If originalList.Count = 30 and max items per subset is 10, then subsets will be 3 lists of 10 items each
var subsets = CreateSubsets(originalList);
List<Task> tasks = new List<Task>(subsets.Count);
for (int i = 0; i < subsets.Count; i++)
tasks.Add(Task.Factory.StartNew(() => SearchSubset(subsets[i]));
Task.WaitAll(tasks.ToArray());
foreach (Task task in tasks)
if (task != null)
task.Dispose();
}
private void SearchSubset(List<SearchCriteria> subset)
{
//Checking that i+1 and i+2 is within subset.Count-1 has been omitted
for (int i = 0; i < subset.Count; i+=3)
{
Task[] tasks = {Task.Factory.StartNew(() => SearchCriteria(subset[i])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+1])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+2]))};
//Wait & dispose like above
}
}
private void SearchCriteria(SearchCriteria criteria)
{
//SearchForCriteria uses WebRequest and WebResponse (callback)
//to query the url and return the response.content
var results = SearchForCriteria(criteria);
//process results...
}
The above code works fine and the search is quite fast. However, does the above code create too much overhead, and is there is more cleaner (or simpler) way to achieve the same results?

This is not the most efficient method, but if this is for a desktop application, efficiency isn't your primary concern anyway. So, unless you are actually seeing performance degradation from this code, you shouldn't change it.
That said, I would have approached this differently.
You're using the TPL to parallelize I/O-bound operations. You're using dynamic parallelism, the most complex kind; as Jeff Mercado commented, your code would be simpler and slightly more efficient if you used a higher-level parallelism abstraction such as Parallel or PLINQ).
However, any parallel approach is going to waste thread pool threads by blocking them. Since this is I/O-bound, I would recommend using async/await to make them concurrent.
If you want to do simple throttling, you can use SemaphoreSlim. I don't think you need to do throttling like this in addition to your subsets, but if you want an async equivalent to your existing code, it would look something like this:
public Task SearchAsync()
{
var subsets = CreateSubsets(originalList);
return Task.WhenAll(subsets.Select(subset => SearchSubsetAsync(subset)));
}
private Task SearchSubsetAsync(List<SearchCriteria> subset)
{
var semaphore = new SemaphoreSlim(3);
return Task.WhenAll(subset.Select(criteria => SearchCriteriaAsync(criteria, semaphore)));
}
private async Task SearchCriteriaAsync(SearchCriteria criteria, SemaphoreSlim semaphore)
{
await semaphore.WaitAsync();
try
{
// SearchForCriteriaAsync uses HttpClient (async).
var results = await SearchForCriteriaAsync(criteria);
// Consider returning results rather than processing them here.
}
finally
{
semaphore.Release();
}
}

Related

Parallel loop containing both async and synchronous

I've got a loop that needs to be run in parallel as each iteration is slow and processor intensive but I also need to call an async method as part of each iteration in the loop.
I've seen questions on how to handle an async method in the loop but not a combination of async and synchronous, which is what I've got.
My (simplified) code is as follows - I know this won't work properly due to the async action being passed to foreach.
protected IDictionary<int, ReportData> GetReportData()
{
var results = new ConcurrentDictionary<int, ReportData>();
Parallel.ForEach(requestData, async data =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
results.TryAdd(data.ReportId, report);
});
// This needs to be populated before returning
return results;
}
Is there any way to get execute the action in parallel when the action has to be async in order to await the single async call.
It's not a practical option to convert the synchronous functions to async.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach but have never used PLINQ and not sure if it would wait for all of the iterations to be completed before returning, i.e. would the Tasks still be running at the end of the process.
Is there any way to get execute the action in parallel when the action has to be async in order to await the single async call.
Yes, but you'll need to understand what Parallel gives you that you lose when you take alternative approaches. Specifically, Parallel will automatically determine the appropriate number of threads and adjust based on usage.
It's not a practical option to convert the synchronous functions to async.
For CPU-bound methods, you shouldn't convert them.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
The first recommendation I would make is to look into TPL Dataflow. It allows you to define a "pipeline" of sorts that keeps the data flowing through while limiting the concurrency at each stage.
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach
No. PLINQ is very similar to Parallel in how they work. There's a few differences over how aggressive they are at CPU utilization, and some API differences - e.g., if you have a collection of results coming out the end, PLINQ is usually cleaner than Parallel - but at a high-level view they're very similar. Both only work on synchronous code.
However, you could use a simple Task.Run with Task.WhenAll as such:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var tasks = requestData.Select(async data => Task.Run(() =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}
You may need to apply a concurrency limit (which Parallel would have done for you). In the asynchronous world, this would look like:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var throttle = new SemaphoreSlim(10);
var tasks = requestData.Select(data => Task.Run(async () =>
{
await throttle.WaitAsync();
try
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
}
finally
{
throttle.Release();
}
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}

Multi-threading in a foreach loop

I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs Asynchronous tasks, but running synchronously in the loop using a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped to be running Asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run Asynchronously in Multi-thread:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stop watch results interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong or is that more or less what I should be expecting? I suppose writing to files would be a better to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GerTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}
At first it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies to values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and the Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
In general, when you do not see an increase by multi threading, it is because your task is not CPU limited or large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can hae 2 reasons:
1: Looking up the ticker is brutally fast and it should not be an async to start with. I.e. when you look it up in a dictionary.
2: This is running via a http connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in a codel ike this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis) {
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
} }
This is linear non threaded code using an async interface to not block the current thread while the (likely expensive IO) operation is in place. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possible (just as example) more scalable way:
In the foreach, do not await but add the task to a list of tasks.
Then start await once all the tasks have started. Likein a 2nd loop.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.

Dispatcher, Async/Await, Concurrent work

I have bunch of async methods, which I invoke from Dispatcher. The methods does not perform any work in the background, they just waits for some I/O operations, or wait for response from the webserver.
async Task FetchAsync()
{
// Prepare request in UI thread
var response = await new WebClient().DownloadDataTaskAsync();
// Process response in UI thread
}
now, I want to perform load tests, by calling multiple FetchAsync() in parallel with some max degree of parallelism.
My first attempt was using Paralell.Foreach(), but id does not work well with async/await.
var option = new ParallelOptions {MaxDegreeOfParallelism = 10};
Parallel.ForEach(UnitsOfWork, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create separate thread for each FetchAsync(). Can you give me some hints how to do it?
Just call Fetchasync without awaiting each call and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for(int i = 0; i < max; i++)
{
tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic reusable solution to your question that you can reuse not only with your FetchAsync method but for any async method that has the same signature. The api includes real time concurrent throttling support as well:
Parameters are self explanatory:
totalRequestCount: is how many async requests (FatchAsync calls) you want to do in total, async processor is the FetchAsync method itself, maxDegreeOfParallelism is the optional nullable parameter. If you want real time concurrent throttling with max number of concurrent async requests, set it, otherwise not.
public static Task ForEachAsync(
int totalRequestCount,
Func<Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
IEnumerable<Task> tasks;
if (maxDegreeOfParallelism != null)
{
SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
{
await throttler.WaitAsync();
try
{
await asyncProcessor().ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
}
else
{
tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
}
return Task.WhenAll(tasks);
}

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

C# Parallel - Adding items to the collection being iterated over, or equivalent?

Right now, I've got a C# program that performs the following steps on a recurring basis:
Grab current list of tasks from the database
Using Parallel.ForEach(), do work for each task
However, some of these tasks are very long-running. This delays the processing of other pending tasks because we only look for new ones at the start of the program.
Now, I know that modifying the collection being iterated over isn't possible (right?), but is there some equivalent functionality in the C# Parallel framework that would allow me to add work to the list while also processing items in the list?
Generally speaking, you're right that modifying a collection while iterating it is not allowed. But there are other approaches you could be using:
Use ActionBlock<T> from TPL Dataflow. The code could look something like:
var actionBlock = new ActionBlock<MyTask>(
task => DoWorkForTask(task),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
actionBlock.Post(task);
await Task.Delay(someShortDelay);
// or use Thread.Sleep() if you don't want to use async
}
}
Use BlockingCollection<T>, which can be modified while consuming items from it, along with GetConsumingParititioner() from ParallelExtensionsExtras to make it work with Parallel.ForEach():
var collection = new BlockingCollection<MyTask>();
Task.Run(async () =>
{
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
collection.Add(task);
await Task.Delay(someShortDelay);
}
}
});
Parallel.ForEach(collection.GetConsumingPartitioner(), task => DoWorkForTask(task));
Here is an example of an approach you could try. I think you want to get away from Parallel.ForEaching and do something with asynchronous programming instead because you need to retrieve results as they finish, rather than in discrete chunks that could conceivably contain both long running tasks and tasks that finish very quickly.
This approach uses a simple sequential loop to retrieve results from a list of asynchronous tasks. In this case, you should be safe to use a simple non-thread safe mutable list because all of the mutation of the list happens sequentially in the same thread.
Note that this approach uses Task.WhenAny in a loop which isn't very efficient for large task lists and you should consider an alternative approach in that case. (See this blog: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx)
This example is based on: https://msdn.microsoft.com/en-GB/library/jj155756.aspx
private async Task<ProcessResult> processTask(ProcessTask task)
{
// do something intensive with data
}
private IEnumerable<ProcessTask> GetOutstandingTasks()
{
// retreive some tasks from db
}
private void ProcessAllData()
{
List<Task<ProcessResult>> taskQueue =
GetOutstandingTasks()
.Select(tsk => processTask(tsk))
.ToList(); // grab initial task queue
while(taskQueue.Any()) // iterate while tasks need completing
{
Task<ProcessResult> firstFinishedTask = await Task.WhenAny(taskQueue); // get first to finish
taskQueue.Remove(firstFinishedTask); // remove the one that finished
ProcessResult result = await firstFinishedTask; // get the result
// do something with task result
taskQueue.AddRange(GetOutstandingTasks().Select(tsk => processData(tsk))) // add more tasks that need performing
}
}

Categories

Resources