Parallel Foreach vs Async Foreach for DB calls - c#

I have a scenario where I have to call the same database stored procedure for each item in a list. I don't want to use a plain foreach, as it will degrade performance. Which is the best option: Parallel.ForEach or an async/await foreach?
Below is sample code:
public async Task<List<object>> CallMethod()
{
var list = new List<object>();
foreach(var obj in input)
{
list.Add(await Task.Run(() => CallDatabase(obj)));
}
return list;
}
public object CallDatabase(object obj)
{
//Code to call DB
}
All the objects received from the DB are independent.
After some research I am planning to use async calls. Will this improve performance?

I'm not sure it's going to make any difference either way. I assume you still need to wait for all results to be loaded, in which case async does not help, and your bottleneck will most likely be network I/O and server processing rather than local CPU, so parallelism will not help either.
That said, if you don't need the results of the query and don't care if there is an error, then async may help in a fire-and-forget scenario.
Your biggest bang for your buck may be to try and get multiple results in one query rather than firing off a bunch of individual queries.

Definitely async, as Parallel.ForEach is meant for compute-intensive operations. It spreads work over the available cores and orchestrates them accordingly. Async, instead, is meant for just this kind of operation: make a request to the service, carry on, and get notified once the requested resources are available.

This is mostly a comment on D Stanley's answer: switching to parallel/async code is unlikely to improve performance.
If your main concern is responsiveness/scalability, async would be better, as DB access is generally an IO-bound operation. It also lets you pick between sequential and parallel processing (e.g. in case your DB layer does not support concurrent requests on the same connection for some reason). Additionally, with async it is easier to get synchronization right when updating the UI/request if you use the default synchronization context.
Sequential: it will run about as long as the non-async solution, but the thread will be free to perform other activities in the meantime (for UI applications like WinForms/WPF) or to process other requests (ASP.NET).
async public Task<ResultType> CallMethodAsync()
{
foreach(var obj in input)
{
var singleResult = await CallDatabaseAsync(obj);
// combine results if needed
}
// return combined results
}
Parallel: will run all requests at the same time, will likely be faster than sequential solution.
async public Task<ResultType> CallMethodAsync()
{
List<Task<SingleResultType>> tasks = new List<Task<SingleResultType>>();
foreach(var obj in input)
{
tasks.Add(CallDatabaseAsync(obj)); // start the call without awaiting it here
}
await Task.WhenAll(tasks);
foreach(SingleResultType result in tasks.Select(t=>t.Result))
{
// combine results if needed
}
// return combined results
}
Note that async generally requires all of your code to be asynchronous, so if you are converting just a small piece of code to run in parallel, Parallel.ForEach may be the easier solution, as it does not involve dealing with the deadlock issues of mixing await with Task.Wait (see "await vs Task.Wait - Deadlock?").
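To make the difference between the sequential and parallel variants above concrete, here is a minimal, self-contained sketch. CallDatabaseAsync is a hypothetical stand-in (it only waits 200 ms to simulate I/O), and the class and method names are made up for illustration; real timings depend on the database and the connection pool.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class DbCallDemo
{
    // Hypothetical stand-in for an async stored-procedure call; the delay simulates I/O latency.
    public static async Task<string> CallDatabaseAsync(int id)
    {
        await Task.Delay(200);
        return $"row-{id}";
    }

    // Sequential: each call starts only after the previous one has finished.
    public static async Task<List<string>> RunSequentiallyAsync(IEnumerable<int> input)
    {
        var results = new List<string>();
        foreach (var id in input)
        {
            results.Add(await CallDatabaseAsync(id));
        }
        return results;
    }

    // Concurrent: start every call first, then await them all together.
    public static async Task<List<string>> RunConcurrentlyAsync(IEnumerable<int> input)
    {
        List<Task<string>> tasks = input.Select(id => CallDatabaseAsync(id)).ToList();
        string[] results = await Task.WhenAll(tasks); // results are in the same order as the tasks
        return results.ToList();
    }

    public static async Task Main()
    {
        var input = Enumerable.Range(1, 5).ToList();

        var sw = Stopwatch.StartNew();
        await RunSequentiallyAsync(input);
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        await RunConcurrentlyAsync(input);
        Console.WriteLine($"Concurrent: {sw.ElapsedMilliseconds} ms");
    }
}
```

With the simulated delay, the sequential run should take roughly the sum of the delays and the concurrent run roughly one delay. Against a real database the gain is usually smaller, since the server and the connection pool limit how much work can overlap.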

I have implemented a solution, but I am not sure whether it is really async or whether it will improve performance at all. I am very new to async, so I don't have much of an idea.
public static async Task<List<Response>> Execute(Request objInput)
{
List<Response> objList = new List<Response>();
foreach (object obj in objInput.objs)
{
objList.Add(await Task.Run(() =>GetDataFromDB(obj)));
}
return objList;
}
private static Response GetDataFromDB(object obj)
{
//Call DB and build the object
}
If this is not the correct way to implement async, then please offer other ideas.

Related

async or not async method

I am writing a Blazor Server web application.
This application works with a database and Entity Framework.
Here is a method I've written:
private void MyMethod()
{
var query = mydbcontext.mytable1.Where(t => ...);
foreach (var item in query)
{
...
}
}
As you can see, this method is not declared with "async Task", so it will be called without the "await" keyword.
I can declare it with "async Task" and call it with "await". It works, but it gives me a warning because I have no async call inside.
Let's suppose I decide not to declare it with "async Task" in order to avoid the warning.
What will happen if I later need to change something in my function that requires an async call (for example this):
private async Task MyMethod()
{
var query = mydbcontext.mytable1.Where(t => ...);
foreach (var item in query)
{
...
}
var countresult = await query.CountAsync();
}
I will need to find all calls of MyMethod and add "await" to each of these calls.
To prevent that, I am wondering whether I should declare all my methods with "async Task". But this is ugly because I will get warnings.
What is the best practice?
And is there a way to make this loop async?
private async Task MyMethod()
{
var query = mydbcontext.mytable1.Where(t => ...);
await foreachAsync (var item in query)
{
...
}
var countresult = await query.CountAsync();
}
Thanks
... I am wondering if i should not declare all my methods with "async Task".
If there is no I/O being performed by the method then there is no need to make the method return a Task or Task<T> and by extension no need to use the async keyword.
If the method does do operations that use I/O (like a database call) then you have a couple of options and what you choose depends on the context your code is being used in.
When to use Asynchronous Operations
If the context can benefit from using asynchronous I/O operations, like the context of a web application, then your methods should return Task or Task<T>. The benefit of using asynchronous I/O is that threads are not blocked waiting on I/O to complete, which is important in applications that need to service many simultaneous requests and would otherwise tie up a thread per request (like a web application). A Windows Forms or WPF application also benefits, because it allows for easy coding so that the main thread can resume operations while I/O completes (the UI won't appear frozen).
When not to use Asynchronous Operations
If the context cannot benefit from using asynchronous I/O operations then make all your methods synchronous and do not return Task or Task<T>. An example of this would be a console application or a windows service application. You mentioned a Blazor web application in your question so this scenario does not appear to apply to you.
Provide both Asynchronous Operations and Synchronous Operations
If you want to allow the caller to choose then you can include 2 variations for each call that executes an I/O operation. A good example of this are the extensions written for Entity Framework. Almost all of the operations can be performed asynchronously or synchronously. Examples include Single/SingleAsync, All/AllAsync, ToArray/ToArrayAsync, etc.
This approach will also allow you to extend a public method later by adding an async overload if you extend the operation to include an I/O call at a future point in time. There are ways to do this without creating too much code duplication. See Async Programming - Brownfield Async Development by Stephen Cleary for various techniques. My favorite approach is the Flag Argument Hack.
Side note: If you do not need to consume the results of an I/O call in your method and there is only one I/O method you do not need to use the async/await keywords. You can return the result of the I/O operation directly.
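As a rough illustration of that side note, the sketch below shows the same pass-through method written with and without async/await. CountRowsAsync is a hypothetical stand-in for a single I/O call (it only delays and returns a fixed number); the other names are made up for the example as well.

```csharp
using System;
using System.Threading.Tasks;

public static class PassThroughDemo
{
    // Hypothetical I/O-style operation; Task.Delay stands in for a database call.
    public static async Task<int> CountRowsAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    // With async/await: the compiler generates a state machine just to forward the result.
    public static async Task<int> GetCountWithAwaitAsync()
    {
        return await CountRowsAsync();
    }

    // Without async/await: the task from the single I/O call is returned directly.
    public static Task<int> GetCountAsync()
    {
        return CountRowsAsync();
    }

    public static async Task Main()
    {
        Console.WriteLine(await GetCountWithAwaitAsync()); // prints 42
        Console.WriteLine(await GetCountAsync());          // prints 42
    }
}
```

Callers await both versions in the same way. One caveat when dropping the keywords: if the call sits inside a using or try block, the method would leave that block before the task completes, so the direct return only suits simple pass-through methods.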
private void MyMethod()
{
var query = mydbcontext.mytable1.Where(t => ...);
foreach (var item in query) // Here you will retrieve your data from the DB.
{
...
}
// If you're going to count it here, don't use the query. You already retrieved
// the data for your foreach.
}
It would be better to retrieve the data before the foreach. That way you will not call the DB twice (once for the foreach and once for the count), i.e.:
private async Task MyMethod()
{
var data = await mydbcontext.mytable1.Where(t => ...).ToListAsync();
foreach (var item in data)
{
...
}
// Don't call the DB here. You already have the data, so just count the items in
// the list.
var countresult = data.Count; // Count is a plain property on the list, so there is nothing to await
}
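On the question of whether the loop itself can be asynchronous: C# 8 added await foreach over IAsyncEnumerable<T>. The following is a minimal sketch in which ReadItemsAsync is a hypothetical async source invented for the example. With EF Core, a query can typically be adapted with AsAsyncEnumerable(), but that is worth checking against the version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncLoopDemo
{
    // Hypothetical async source: yields items one at a time, with a delay simulating I/O per item.
    public static async IAsyncEnumerable<int> ReadItemsAsync()
    {
        for (int i = 1; i <= 3; i++)
        {
            await Task.Delay(50);
            yield return i;
        }
    }

    // Consumes the stream with await foreach; the thread is released while each item is pending.
    public static async Task<int> SumItemsAsync()
    {
        int sum = 0;
        await foreach (var item in ReadItemsAsync())
        {
            sum += item;
        }
        return sum;
    }

    public static async Task Main()
    {
        Console.WriteLine(await SumItemsAsync()); // prints 6
    }
}
```

Streaming like this avoids materializing the whole result set in a list, at the cost of keeping the underlying reader open for the duration of the loop.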

async await for a single task at a time

During my job interview, I was given a task to create an asynchronous wrapper over some long running method, processing some data, but to create it so that only a single task could be running at a time. I was not very familiar with async/await pattern, so I did my best and wrote some mixup between task-style and event-style, so that my wrapper was holding a task currently being executed, and exposing a public method and a public event. Method took data to process as an argument, and if there was no task running, started one, if there was a task, it enqueued the data. Task was raising the public event upon completion, which was sending process results to subscribers and starting a new task if there is any enqueued.
So, as you could probably guess by this point, I failed the interview, but now that I have done some research, I am trying to figure out how to do it properly (it should also have been thread-safe, but I was too busy to worry about that). So my question is, if I have
public class SynchronousProcessor
{
public string Process(string arg)
{
Thread.Sleep(1500); //Work imitation
return someRandomString;
}
}
public class AsynchronousWrapper
{
SynchronousProcessor proc = new SynchronousProcessor();
public async Task<string> ProcessAsync(string arg)
{
return await Task.Run(() => proc.Process(arg));
}
}
, or something like this, how do I properly handle calls to ProcessAsync(string) if there is already a task executing?
Many job interview questions are asked for a purpose other than to see you write the code. Usually, questions are a bit vague specifically to see what clarifying questions you ask - your questions determine how well you do. Writing code on a whiteboard is secondary at best.
I was given a task to create an asynchronous wrapper over some long running method, processing some data
First question: is this long-running method asynchronous? If so, then there would not be a need for Task.Run. But if not...
Followup question: if it's not asynchronous, should it be? I.e., is it I/O-based? If so, then we could invest the time to make it properly asynchronous. But if not...
Followup question: if we need a task wrapper (around CPU-based code or around blocking I/O code), is the environment agreeable to a wrapper? I.e., is this a desktop/mobile app and not code that would be used in ASP.NET?
create it so that only a single task could be running at a time.
Clarifying questions: if a second request comes in when one is already running, does the second request "queue up"? Or would it "merge" with an existing request? If merging, do they need to "key" off of the input data - or some subset of the input data?
Every one of these questions changes how the answer is structured.
exposing a public method and a public event.
This could be what threw it. Between Task<T> / IProgress<T> and Rx, events are seldom needed. They really only should be used if you're on a team that won't learn Rx.
Oh, and don't worry about "failing" an interview. I've "failed" over 2/3 of my interviews over the course of my career. I just don't interview well.
It depends on how fancy you want to get. One simple way is to store a task, and chain the subsequent tasks (with a bit of synchronization):
public class AsynchronousWrapper
{
private Task previousTask = Task.CompletedTask;
private SynchronousProcessor proc = new SynchronousProcessor();
public Task<string> ProcessAsync(string arg)
{
lock (proc)
{
var task = previousTask.ContinueWith(_ => proc.Process(arg));
previousTask = task;
return task;
}
}
}
As @MickyD already said, you need to know the best practices in asynchronous programming to solve such problems the right way. Your solution has a code smell, as it provides an async wrapper around synchronous code using Task.Run. Since you were asked about library development, this will have quite an impact on your library's consumers.
You have to understand that asynchronous isn't multithreading, as it can be done with one thread. It's like waiting for mail: you don't hire a worker to wait by the mailbox.
The other solutions here aren't really async, because they break another rule for async code: do not block inside an async operation. So you should avoid the lock construct.
So, back to your problem: if you face a task which states
only a single task could be running at a time
It is not about a lock (Monitor), it is about a Semaphore(Slim). If for some reason you later need to change your code so that more than one task can execute simultaneously, a lock-based solution would have to be rewritten. With a semaphore you only need to change one constant. It also has async versions of its waiting methods (WaitAsync).
So your code can look like this (note that Task.Run is removed, as it is the client's responsibility to provide an awaitable):
public class AsynchronousWrapper
{
private static SemaphoreSlim _mutex = new SemaphoreSlim(1);
public async Task<T> ProcessAsync<T>(Task<T> arg)
{
await _mutex.WaitAsync().ConfigureAwait(false);
try
{
return await arg;
}
finally
{
_mutex.Release();
}
}
}
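A variation on the semaphore idea, offered as a sketch rather than as the answer's own method: if the wrapper receives a delegate (Func<Task<T>>) instead of an already created task, the work is not started until the semaphore has been acquired, so the gate also controls when each operation begins. SingleFlightGate, GateDemo and ProcessAsync are hypothetical names, and the delay only simulates work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class SingleFlightGate
{
    // One permit: at most one operation may be inside the gate at any time.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public async Task<T> RunAsync<T>(Func<Task<T>> work)
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            // The delegate is invoked only after the permit has been acquired.
            return await work().ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }
}

public static class GateDemo
{
    private static readonly object Sync = new object();
    private static int _running;
    private static int _maxObserved; // highest number of overlapping operations seen

    // Hypothetical operation: records how many copies overlap, then simulates work.
    private static async Task<string> ProcessAsync(string arg)
    {
        lock (Sync) { _running++; _maxObserved = Math.Max(_maxObserved, _running); }
        await Task.Delay(100);
        lock (Sync) { _running--; }
        return arg.ToUpperInvariant();
    }

    // Submits three requests at once and returns the maximum overlap observed.
    public static async Task<int> RunDemoAsync()
    {
        var gate = new SingleFlightGate();
        var tasks = new[] { "a", "b", "c" }
            .Select(s => gate.RunAsync(() => ProcessAsync(s)))
            .ToList();
        string[] results = await Task.WhenAll(tasks);
        Console.WriteLine(string.Join(",", results)); // prints A,B,C
        return _maxObserved;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunDemoAsync()); // prints 1
    }
}
```

Raising the initial count passed to SemaphoreSlim is the "one constant" to change if more than one operation should be allowed to run at a time.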

Running synchronous method async

I have a method that iterates a list of objects and for each item in the list fetches data from an external api.
Sometimes this can be very slow (naturally), and I'd like to add all my items to a Task list instead, to be able to run multiple threads at the same time. Is this possible without rewriting it all to be async? Today I'm using WebClient and fetching everything synchronously.
I tried something like this at first:
public void Main()
{
List<Task<Model>> taskList = new List<Task<Model>>();
foreach (var aThing in myThings)
{
taskList.Add(GetStuffAsync(aThing));
}
List<Model> list = Task.WhenAll(taskList.ToArray()).Result.ToList();
}
public async Task<Model> GetStuffAsync(object aThing)
{
// Stuff that is all synchronous
// ...
return anObject;
}
Rather than using async here, you can just make GetStuff a normal synchronous method and use Task.Run to create new tasks (which will normally be run on multiple threads) so that your fetches occur in parallel. You could also consider using Parallel.ForEach and the like, which is effectively similar to your current Main code.
Your async approach will not do what you want at the moment, because async methods run synchronously at least as far as the first await expression... in your case you don't have any await expressions, so by the time GetStuffAsync returns, it's already done all the actual work - you have no parallelism.
Of course, an alternative is to use your current code but actually do make GetStuffAsync asynchronous instead.
It's also worth bearing in mind that the HTTP stack in .NET has per-host connection pool limits - so if you're fetching multiple URLs from the same host, you may want to increase those as well, or you won't get much parallelism.
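The Task.Run suggestion can be sketched roughly as follows. GetStuff is a hypothetical synchronous method in which Thread.Sleep stands in for a blocking WebClient call; the model type is simplified to a string for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskRunDemo
{
    // Hypothetical synchronous fetch; Thread.Sleep stands in for a blocking HTTP call.
    public static string GetStuff(int thing)
    {
        Thread.Sleep(300);
        return $"model-{thing}";
    }

    public static async Task<List<string>> FetchAllAsync(IEnumerable<int> things)
    {
        // Task.Run queues each blocking call to the thread pool so the calls can overlap.
        List<Task<string>> tasks = things
            .Select(t => Task.Run(() => GetStuff(t)))
            .ToList();

        string[] models = await Task.WhenAll(tasks); // same order as the input
        return models.ToList();
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        List<string> models = await FetchAllAsync(new[] { 1, 2, 3, 4 });
        Console.WriteLine($"{models.Count} models in {sw.ElapsedMilliseconds} ms");
    }
}
```

Each in-flight call still occupies a thread-pool thread while it blocks, which is why a truly asynchronous GetStuffAsync remains the better option when the underlying API offers one.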
Try this concept:
foreach(var aThing in myThings)
{
Thread myTask = new Thread(new ParameterizedThreadStart(TaskMethod));
myTask.Start(aThing);
}
There must also be a method named TaskMethod with the signature void TaskMethod(object obj).

Optimizing for fire & forget using async/await and tasks

I have about 5 million items to update. I don't really care about the response (A response would be nice to have so I can log it, but I don't want a response if that will cost me time.) Having said that, is this code optimized to run as fast as possible? If there are 5 million items, would I run the risk of getting any task cancelled or timeout errors? I get about 1 or 2 responses back every second.
var tasks = items.Select(async item =>
{
await Update(CreateUrl(item));
}).ToList();
if (tasks.Any())
{
await Task.WhenAll(tasks);
}
private async Task<HttpResponseMessage> Update(string url)
{
var client = new HttpClient();
var response = await client.SendAsync(url).ConfigureAwait(false);
//log response.
return response;
}
UPDATE:
I am actually getting TaskCanceledExceptions. Did my system run out of threads? What could I do to avoid this?
Your method will kick off all tasks at the same time, which may not be what you want. There wouldn't be any threads involved, because with async I/O operations there is no thread, but there may be limits on the number of concurrent connections.
There may be better tools to do this but if you want to use async/await one option is to use Stephen Toub's ForEachAsync as documented in this article. It allows you to control how many simultaneous operations you want to execute, so you don't overrun your connection limit.
Here it is from the article:
public static class Extensions
{
public static async Task ExecuteInPartition<T>(IEnumerator<T> partition, Func<T, Task> body)
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select ExecuteInPartition(partition, body));
}
}
Usage:
public async Task UpdateAll()
{
// Allow for 100 concurrent Updates
await items.ForEachAsync(100, async t => await Update(t));
}
A much better approach would be to use TPL Dataflow's ActionBlock with MaxDegreeOfParallelism and a single HttpClient:
Task UpdateAll(IEnumerable<Item> items)
{
var block = new ActionBlock<Item>(
item => UpdateAsync(CreateUrl(item)),
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 1000});
foreach (var item in items)
{
block.Post(item);
}
block.Complete();
return block.Completion;
}
async Task UpdateAsync(string url)
{
var response = await _client.SendAsync(url).ConfigureAwait(false);
Console.WriteLine(response.StatusCode);
}
A single HttpClient can be used concurrently for multiple requests, so it's much better to create and dispose of a single instance instead of 5 million.
There are numerous problems with firing so many requests at the same time: the machine's network stack, the target web site, timeouts and so forth. The ActionBlock caps that number with MaxDegreeOfParallelism (which you should test and optimize for your specific case). It's important to note that TPL may choose a lower number when it deems it appropriate.
When you have a single async call at the end of an async method or lambda expression, it's better for performance to remove the redundant async-await and just return the task (i.e. return block.Completion;).
Complete will notify the ActionBlock to not accept any more items, but finish processing items it already has. When it's done the Completion task will be done so you can await it.
I suspect you are suffering from outgoing connection management preventing large numbers of simultaneous connections to the same domain. The answers given in this extensive Q+A might give you some avenues to investigate.
What is limiting the # of simultaneous connections my ASP.NET application can make to a web service?
In terms of your code structure, I'd personally try to use a dynamic pool of connections. You know that you can't actually get 5 million connections simultaneously, so attempting it will just fail. You may as well work with a reasonable, configured limit of (for instance) 20 connections and use them as a pool. That way you can tune the limit up or down.
Alternatively, you could investigate HTTP pipelining (which I've not used), which is intended specifically for the job you are doing (batching up HTTP requests). http://en.wikipedia.org/wiki/HTTP_pipelining
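One simple way to impose a configured limit like the 20 mentioned above is to throttle with a SemaphoreSlim. This is a sketch under assumptions: UpdateAsync is a hypothetical stand-in that only delays instead of sending an HTTP request, and the item type is simplified to int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Hypothetical update call; Task.Delay stands in for the HTTP request.
    private static async Task<int> UpdateAsync(int item)
    {
        await Task.Delay(50);
        return item;
    }

    // Runs all updates, but lets at most maxConcurrency of them be in flight at once.
    public static async Task<int> UpdateAllAsync(IEnumerable<int> items, int maxConcurrency)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<int>> tasks = items.Select(async item =>
            {
                await throttle.WaitAsync(); // wait for a free slot
                try
                {
                    return await UpdateAsync(item);
                }
                finally
                {
                    throttle.Release(); // free the slot for the next item
                }
            }).ToList();

            int[] done = await Task.WhenAll(tasks);
            return done.Length;
        }
    }

    public static async Task Main()
    {
        int count = await UpdateAllAsync(Enumerable.Range(1, 100), maxConcurrency: 20);
        Console.WriteLine($"Completed {count} updates"); // prints Completed 100 updates
    }
}
```

This version still creates one task object per item up front, so for millions of items it would be reasonable to feed the work in batches or to use one of the partitioning approaches shown earlier in this thread.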

Wrapping synchronous code into asynchronous call

I have a method in ASP.NET application, that consumes quite a lot of time to complete. A call to this method might occur up to 3 times during one user request, depending on the cache state and parameters that user provides. Each call takes about 1-2 seconds to complete. The method itself is synchronous call to the service and there is no possibility to override the implementation.
So the synchronous call to the service looks something like the following:
public OutputModel Calculate(InputModel input)
{
// do some stuff
return Service.LongRunningCall(input);
}
And the usage of the method is (note, that call of method may happen more than once):
private void MakeRequest()
{
// a lot of other stuff: preparing requests, sending/processing other requests, etc.
var myOutput = Calculate(myInput);
// stuff again
}
I tried to change the implementation from my side to provide simultaneous work of this method, and here is what I came to so far.
public async Task<OutputModel> CalculateAsync(InputModel input)
{
return await Task.Run(() =>
{
return Calculate(input);
});
}
Usage (part of "do other stuff" code runs simultaneously with the call to service):
private async Task MakeRequest()
{
// do some stuff
var task = CalculateAsync(myInput);
// do other stuff
var myOutput = await task;
// some more stuff
}
My question: Do I use the right approach to speed up the execution in ASP.NET application or am I doing unnecessary job trying to run synchronous code asynchronously?
Can anyone explain why the second approach is not an option in ASP.NET (if it is really not)?
Also, if such approach is applicable, do I need to call such method asynchronously if it is the only call we might perform at the moment (I have such case, when no other stuff there is to do while waiting for completion)?
Most of the articles on the net on this topic cover using the async-await approach with code that already provides awaitable methods, but that's not my case. Here is a nice article describing my case; it doesn't cover the situation of parallel calls and advises against wrapping a sync call, but in my opinion my situation is exactly the occasion to do it.
It's important to make a distinction between two different types of concurrency. Asynchronous concurrency is when you have multiple asynchronous operations in flight (and since each operation is asynchronous, none of them are actually using a thread). Parallel concurrency is when you have multiple threads each doing a separate operation.
The first thing to do is re-evaluate this assumption:
The method itself is synchronous call to the service and there is no possibility to override the implementation.
If your "service" is a web service or anything else that is I/O-bound, then the best solution is to write an asynchronous API for it.
I'll proceed with the assumption that your "service" is a CPU-bound operation that must execute on the same machine as the web server.
If that's the case, then the next thing to evaluate is another assumption:
I need the request to execute faster.
Are you absolutely sure that's what you need to do? Are there any front-end changes you can make instead - e.g., start the request and allow the user to do other work while it's processing?
I'll proceed with the assumption that yes, you really do need to make the individual request execute faster.
In this case, you'll need to execute parallel code on your web server. This is most definitely not recommended in general because the parallel code will be using threads that ASP.NET may need to handle other requests, and by removing/adding threads it will throw the ASP.NET threadpool heuristics off. So, this decision does have an impact on your entire server.
When you use parallel code on ASP.NET, you are making the decision to really limit the scalability of your web app. You also may see a fair amount of thread churn, especially if your requests are bursty at all. I recommend only using parallel code on ASP.NET if you know that the number of simultaneous users will be quite low (i.e., not a public server).
So, if you get this far, and you're sure you want to do parallel processing on ASP.NET, then you have a couple of options.
One of the easier methods is to use Task.Run, very similar to your existing code. However, I do not recommend implementing a CalculateAsync method since that implies the processing is asynchronous (which it is not). Instead, use Task.Run at the point of the call:
private async Task MakeRequest()
{
// do some stuff
var task = Task.Run(() => Calculate(myInput));
// do other stuff
var myOutput = await task;
// some more stuff
}
Alternatively, if it works well with your code, you can use the Parallel type, i.e., Parallel.For, Parallel.ForEach, or Parallel.Invoke. The advantage to the Parallel code is that the request thread is used as one of the parallel threads, and then resumes executing in the thread context (there's less context switching than the async example):
private void MakeRequest()
{
Parallel.Invoke(() => Calculate(myInput1),
() => Calculate(myInput2),
() => Calculate(myInput3));
}
I do not recommend using Parallel LINQ (PLINQ) on ASP.NET at all.
I found that the following code can convert a Task to always run asynchronously
private static async Task<T> ForceAsync<T>(Func<Task<T>> func)
{
await Task.Yield();
return await func();
}
and I have used it in the following manner
await ForceAsync(() => AsyncTaskWithNoAwaits())
This will execute any Task asynchronously so you can combine them in WhenAll, WhenAny scenarios and other uses.
You could also simply add the Task.Yield() as the first line of your called code.
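A small sketch of what ForceAsync changes, under the assumption of a console application with no synchronization context. NoAwaits is a hypothetical method that does all of its work synchronously before returning a completed task, which is how an async method without any await behaves.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ForceAsyncDemo
{
    // Behaves like an async method with no awaits: the work runs on the caller's thread.
    public static Task<int> NoAwaits()
    {
        Thread.Sleep(200); // simulated synchronous work
        return Task.FromResult(1);
    }

    // Yields first, so the caller gets an incomplete task back before func is invoked.
    public static async Task<T> ForceAsync<T>(Func<Task<T>> func)
    {
        await Task.Yield();
        return await func();
    }

    public static async Task Main()
    {
        // The caller is blocked for the whole 200 ms and receives a finished task.
        Task<int> direct = NoAwaits();
        Console.WriteLine($"direct already completed: {direct.IsCompleted}"); // prints True

        // The call returns almost immediately; the work continues on the thread pool.
        Task<int> forced = ForceAsync(() => NoAwaits());
        Console.WriteLine($"forced already completed: {forced.IsCompleted}"); // typically False
        Console.WriteLine(await forced); // prints 1
    }
}
```

Without a synchronization context, the continuation after Task.Yield runs on a thread-pool thread, so the effect is close to Task.Run. In a UI application the continuation would be posted back to the UI thread instead, and the synchronous work would still block it.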
This is probably the easiest generic way in your case (note that Task.Run is used, because a task created with new Task(...) is not started, so awaiting it would never complete):
await Task.Run(() =>
{
// put your synchronous code here
});
