How to improve the performance of async? - C#

I'm working on a problem where I have to delete records using a service call. The issue is that I have a foreach loop with multiple await operations. This makes the operation take a lot of time, and performance is lacking.
foreach (var a in b) // b is a List<long>
{
    await _serviceresolver().DeleteOperationAsync(id, a);
}

The issue is that I have a foreach loop with multiple await operations.
This makes the operation take a lot of time, and performance is lacking.
The number one solution is to reduce the number of calls. This is often called "chunky" over "chatty". So if your service supports some kind of bulk-delete operation, then expose it in your service type and then you can just do:
await _serviceresolver().BulkDeleteOperationAsync(id, b);
But if that isn't possible, then you can at least use asynchronous concurrency. This is quite different from parallelism; you don't want to use Parallel or PLINQ.
var service = _serviceresolver();
var tasks = b.Select(a => service.DeleteOperationAsync(id, a)).ToList();
await Task.WhenAll(tasks);
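If the list is large, you may also want to cap how many deletes are in flight at the same time. A minimal sketch of one way to do that with SemaphoreSlim; the limit of 10 is an arbitrary assumption you would tune for your service:
var service = _serviceresolver();
using var throttle = new SemaphoreSlim(10); // assumed concurrency limit, not from the original code
var tasks = b.Select(async a =>
{
    await throttle.WaitAsync();
    try
    {
        await service.DeleteOperationAsync(id, a);
    }
    finally
    {
        throttle.Release();
    }
}).ToList();
await Task.WhenAll(tasks);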

I do not know what code is behind this DeleteOperationAsync, but for sure async/await isn't designed to speed things up. It was designed to "spare" threads (colloquially speaking).
The best would be to change the method to take as a parameter the whole list of ids - instead of taking and sending just one id.
And then to perform this async/await heavy operation only once for all of the ids.
If that is not possible, you could just run it in parallel using the TPL (but that is really the worst-case scenario - really :) )
Parallel.ForEach(listOfIdsToDelete,
    async idToDelete => await _serviceresolver().DeleteOperationAsync(id, idToDelete));

You're waiting for each async operation to finish right now. If you can fire them all off concurrently, you can just call them without the await, or if you need to know when they finish, you can just fire them all off and then wait for them all to finish by tracking the tasks in a list:
List<Task> tasks = new List<Task>();
foreach (var a in b) // b is the List<long>
    tasks.Add(_serviceresolver().DeleteOperationAsync(id, a));
await Task.WhenAll(tasks);

You can use PLINQ (to leverage all the processors of your machine) and the Task.WhenAll method (to avoid freezing the calling thread). In code, it results in something like this:
class Program {
    static async Task Main(string[] args) {
        var list = new List<long> {
            4, 3, 2
        };
        var service = new Service();
        var response =
            from item in list.AsParallel()
            select service.DeleteOperationAsync(item);
        await Task.WhenAll(response);
    }
}
public class Service {
    public async Task DeleteOperationAsync(long value) {
        await Task.Delay(2000);
        Console.WriteLine($"Finished... {value}");
    }
}

Related

Multi-threading in a foreach loop

I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs asynchronous tasks, but synchronously in the loop, using a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped would run asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run asynchronously and multi-threaded:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
The stopwatch results are interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong, or is that more or less what I should be expecting? I suppose writing to files would be a better way to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GetTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
    await new SynchronizationContextRemover();
    methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
    if (arguments.Length % 2 == 0)
    {
        throw new ArgumentException("Must pass function name and then name and value of each argument");
    }
    string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
    string cacheKey = methodName;
    for (int i = 1; i < arguments.Length;)
    {
        cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
    }
    if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
    {
        return (await cache.Get<T>(cacheKey, async () =>
        {
            T innerResult = await method();
            return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
        })).Value;
    }
    else
    {
        return await method();
    }
}
First, it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies on values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there is something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
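One way to investigate is to record when each request starts and finishes and check whether the intervals actually overlap. A rough diagnostic sketch (the Timed wrapper below is purely illustrative, not part of the original code):
var stopwatch = Stopwatch.StartNew();

async Task<ExchangeTicker> Timed(IExchangeAPI api, string sym)
{
    var started = stopwatch.Elapsed;
    var result = await api.GetTickerAsync(sym); // the call under investigation
    Console.WriteLine($"{api.Name}: started at {started}, finished at {stopwatch.Elapsed}");
    return result;
}

var tickers = await Task.WhenAll(selectedApis
    .Where(api => exchangeSymbols.ContainsKey(api.Name))
    .Select(api => Timed(api, exchangeSymbols[api.Name])));
If the start times cluster together but the completions are evenly spaced out, something downstream is serializing the requests.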
In general, when you do not see an increase by multi threading, it is because your task is not CPU limited or large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can have two reasons:
1: Looking up the ticker is brutally fast and it should not be async to start with, i.e. when you look it up in a dictionary.
2: This is running via an HTTP connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless of how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in code like this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis)
{
    if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
    {
        ticker = await selectedApi.GetTickerAsync(symbol);
    }
}
This is linear, non-threaded code using an async interface so as not to block the current thread while the (likely expensive IO) operation is in flight. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possible (just as an example) more scalable way:
In the foreach, do not await, but add the task to a list of tasks.
Then await once all the tasks have started. Like in a second loop, as sketched below.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single-threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.
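A minimal sketch of that two-loop shape, reusing the names from the question:
var pending = new List<Task<ExchangeTicker>>();
foreach (IExchangeAPI selectedApi in selectedApis)
{
    if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
    {
        pending.Add(selectedApi.GetTickerAsync(symbol)); // start, but do not await yet
    }
}
foreach (var task in pending)
{
    ticker = await task; // second loop: now wait for each result in turn
}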

Running async methods in parallel

I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in parallel. I would have thought moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better way to run async methods in parallel, or are tasks a good approach?
Is there a better way to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to utilize the Task.WhenAll method. However, your second approach should have run in parallel. I have created a .NET Fiddle, which should help shed some light - it proves that your second approach actually runs in parallel.
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
Note
It is preferred to use the "Async" suffix, instead of GetThings and GetExpensiveThing - we should have GetThingsAsync and GetExpensiveThingAsync respectively - source.
Task.WhenAll() has a tendency to perform poorly with a large number of tasks firing simultaneously - without moderation/throttling.
If you are doing a lot of tasks in a list and wanting to await the final outcome, then I propose using a partition with a limit on the degree of parallelism.
I have modified Stephen Toub's elegant approach from his blog to modern LINQ:
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
    async Task AwaitPartition(IEnumerator<T> partition)
    {
        using (partition)
        {
            while (partition.MoveNext())
            {
                await Task.Yield(); // prevents a sync/hot thread hangup
                await funcBody(partition.Current);
            }
        }
    }

    return Task.WhenAll(
        Partitioner
            .Create(source)
            .GetPartitions(maxDoP)
            .AsParallel()
            .Select(p => AwaitPartition(p)));
}
How it works is simple: take an IEnumerable, dissect it into evenish partitions, and then fire a function/method against each element, in each partition, at the same time. No more than one element in each partition at any one time, but n Tasks in n partitions.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on Github if you need more options. It's in a NuGet too for NetStandard.
Edit 2: Thanks to comments from Theodor below, I was able to keep poorly written async Tasks from blocking parallelism by using await Task.Yield();.
You can use Task.WhenAll, which returns when all dependent tasks are done.
Check this question here for reference
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<IList<Thing>> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to IList to avoid awaits altogether.)
You should avoid using the Result property. It causes the calling thread to block and wait for the task to complete, unlike await or Task.WhenAll, which use continuations.
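To make the difference concrete, here is a contrived sketch using the question's own GetExpensiveThing method (the method name GetThingsDemo and the variable names are mine):
public async Task<List<Thing>> GetThingsDemo()
{
    // Blocking: .Result ties up the calling thread until the task completes
    // (and can deadlock in UI or classic ASP.NET contexts).
    var blocked = GetExpensiveThing().Result;

    // Non-blocking: await releases the calling thread while the work is in flight.
    var awaited = await GetExpensiveThing();

    return new List<Thing>() { blocked, awaited };
}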

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend... what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4, 4);
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    var serviceCopy = service;
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(serviceCopy);
        // do your other stuff here with the result of QueryAvailability
    });
    t.ContinueWith(task => sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
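Applied to the question's scenario, usage might look roughly like this (the degree of parallelism of 4 is just an assumption):
await RunData.Demand.ForEachAsync(4, async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff, not really important
});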
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
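For the question's loop, usage of this extension might look something like the following (again, the limit of 4 is an assumption):
await RunData.Demand.RunParallelAsync(async service =>
{
    var availabilityResponse = await client.QueryAvailability(service);
    // Do some other stuff, not really important
}, maxParallel: 4);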

C# Parallel - Adding items to the collection being iterated over, or equivalent?

Right now, I've got a C# program that performs the following steps on a recurring basis:
Grab current list of tasks from the database
Using Parallel.ForEach(), do work for each task
However, some of these tasks are very long-running. This delays the processing of other pending tasks because we only look for new ones at the start of the program.
Now, I know that modifying the collection being iterated over isn't possible (right?), but is there some equivalent functionality in the C# Parallel framework that would allow me to add work to the list while also processing items in the list?
Generally speaking, you're right that modifying a collection while iterating it is not allowed. But there are other approaches you could be using:
Use ActionBlock<T> from TPL Dataflow. The code could look something like:
var actionBlock = new ActionBlock<MyTask>(
task => DoWorkForTask(task),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
actionBlock.Post(task);
await Task.Delay(someShortDelay);
// or use Thread.Sleep() if you don't want to use async
}
}
Use BlockingCollection<T>, which can be modified while consuming items from it, along with GetConsumingPartitioner() from ParallelExtensionsExtras to make it work with Parallel.ForEach():
var collection = new BlockingCollection<MyTask>();
Task.Run(async () =>
{
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
collection.Add(task);
await Task.Delay(someShortDelay);
}
}
});
Parallel.ForEach(collection.GetConsumingPartitioner(), task => DoWorkForTask(task));
Here is an example of an approach you could try. I think you want to get away from Parallel.ForEaching and do something with asynchronous programming instead because you need to retrieve results as they finish, rather than in discrete chunks that could conceivably contain both long running tasks and tasks that finish very quickly.
This approach uses a simple sequential loop to retrieve results from a list of asynchronous tasks. In this case, you should be safe to use a simple non-thread safe mutable list because all of the mutation of the list happens sequentially in the same thread.
Note that this approach uses Task.WhenAny in a loop which isn't very efficient for large task lists and you should consider an alternative approach in that case. (See this blog: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx)
This example is based on: https://msdn.microsoft.com/en-GB/library/jj155756.aspx
private async Task<ProcessResult> processTask(ProcessTask task)
{
    // do something intensive with data
}
private IEnumerable<ProcessTask> GetOutstandingTasks()
{
    // retrieve some tasks from db
}
private async Task ProcessAllData()
{
    List<Task<ProcessResult>> taskQueue =
        GetOutstandingTasks()
            .Select(tsk => processTask(tsk))
            .ToList(); // grab initial task queue
    while (taskQueue.Any()) // iterate while tasks need completing
    {
        Task<ProcessResult> firstFinishedTask = await Task.WhenAny(taskQueue); // get first to finish
        taskQueue.Remove(firstFinishedTask); // remove the one that finished
        ProcessResult result = await firstFinishedTask; // get the result
        // do something with task result
        taskQueue.AddRange(GetOutstandingTasks().Select(tsk => processTask(tsk))); // add more tasks that need performing
    }
}

Am I Creating Multithreading Using Task.Run

I am looking to make a skeleton that will process concurrent downloads. I am not sure if I get that for free using Task.Run or not. My code below has a foreach loop, but what I really want is not to go one at a time, but to take 5 concurrently, each on its own thread, and then, whenever one gets done, move it through more processing. Am I doing this now?
public async Task ProcessBatch()
{
CreateConnection();
List<string> items=new List<string>();
foreach (var item in items)
{
//do stuff that possibly takes a while
var result = await Task.Run(() => DownloadItems());
//do more processing on whatever item is done now
}
}
You're causing a different thread to be used by Task.Run(), but then you're causing your foreach loop to await that thread's return before it moves on to the next item, so you're not getting any concurrency out of this.
One alternative would be to use Task.Run() to produce a list of tasks first, and then await them all after they've been created.
var tasks = new List<Task<DownloadResult>>();
foreach (var item in items)
{
//do stuff that possibly takes a while
tasks.Add(Task.Run(() => DownloadItems()));
}
foreach (var task in tasks)
{
var result = await task;
//do more processing on whatever item is done now
}
However, for something this simple, I'd recommend using the .AsParallel() extension method that's part of PLINQ (Parallel LINQ).
items.AsParallel()
.Select(item => DownloadItems())
.ForAll(result => AdditionalProcessing(result));
Am I Creating Multithreading Using Task.Run
I am looking to make a skeleton that will process concurrent downloads.
It's important to distinguish parallelism from concurrency. Parallelism is about multiple threads. Concurrency can be accomplished by parallelism, but it can also be accomplished by asynchronous code.
In particular, threads are a great fit if you have CPU-bound work to do, whereas asynchrony excels at I/O. Since "downloads" strongly implies I/O, I'd recommend an asynchronous approach:
public async Task ProcessBatchAsync()
{
CreateConnection();
List<string> items = new List<string>();
var tasks = items.Select(item => ProcessItemAsync(item));
await Task.WhenAll(tasks);
}
private async Task ProcessItemAsync(string item)
{
var result = await DownloadItemAsync(item);
//do more processing
}
However, if DownloadItem isn't already asynchronous and you don't want to invest the time to make it so, then a multithreading approach (Parallel / Parallel LINQ, or Task.Run if you must) is an acceptable compromise for a UI application.
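For instance, if the download method stays synchronous, a rough sketch of that compromise keeps the same Task.WhenAll shape but pushes the work to the thread pool (the per-item DownloadItem(item) signature is an assumption, not the original method):
public async Task ProcessBatchAsync()
{
    CreateConnection();
    List<string> items = new List<string>();

    // Offload each synchronous download to a thread-pool thread, then await them all together.
    var tasks = items.Select(item => Task.Run(() => DownloadItem(item)));
    await Task.WhenAll(tasks);
}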
