I have a few functions that build a user response class, and I am still getting to grips with Task and async/await.
Given the code below, is there a way to run all the calls in parallel rather than one at a time?
I guess my first question should be: how is the call taking place the way it is set up now?
My second question is: how can I run all these calls in parallel?
The results do not need to come back in any specific order.
public static async Task<ProjectForDrawings> GetProjectInfo(string cnn, int projectID)
{
return await Task.Run(() =>
{
ProjectForDrawings projectForDrawings = DataBase.proc_GetProject_ForDrawings.ToRecord<ProjectForDrawings>(cnn, projectID);
projectForDrawings.Submittals = DataBase.proc_GetSubmittal.ToList(cnn, projectID);
projectForDrawings.ProjectLeafs = DataBase.proc_GetProjectLeafs.ToList<ProjectLeaf>(cnn, projectID);
projectForDrawings.Revisions = DataBase.proc_GetRevisionsForProject.ToList<Revisions>(cnn, projectID);
return projectForDrawings;
});
}
how is the call taking place the way it is set up now?
It schedules the work to a background thread (Task.Run) and then asynchronously waits for it to complete (await). The work will execute each database proc one at a time, synchronously blocking the background thread until it completes.
how can i run all these calls in parallel?
You can start all the tasks, and then await them all with Task.WhenAll:
public static async Task<ProjectForDrawings> GetProjectInfo(string cnn, int projectID)
{
ProjectForDrawings projectForDrawings = DataBase.proc_GetProject_ForDrawings.ToRecord<ProjectForDrawings>(cnn, projectID);
var submittalsTask = Task.Run(() => DataBase.proc_GetSubmittal.ToList(cnn, projectID));
var leafsTask = Task.Run(() => DataBase.proc_GetProjectLeafs.ToList<ProjectLeaf>(cnn, projectID));
var revisionsTask = Task.Run(() => DataBase.proc_GetRevisionsForProject.ToList<Revisions>(cnn, projectID));
await Task.WhenAll(submittalsTask, leafsTask, revisionsTask);
projectForDrawings.Submittals = await submittalsTask;
projectForDrawings.ProjectLeafs = await leafsTask;
projectForDrawings.Revisions = await revisionsTask;
return projectForDrawings;
}
However, many (most?) databases do not allow multiple concurrent queries on a single database connection, so this may not work for your database. Also, it may not be a good idea to parallelize calls to the database in the first place - it is possible to cause a self-imposed denial of service. Finally, using Task.Run in the implementation is not a good pattern (for reasons I describe on my blog) - using natural async methods would be better.
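As a rough illustration of what "natural async" could look like for this method, the sketch below assumes the data layer exposes asynchronous counterparts of the stored-procedure wrappers. FakeDb, its Get...Async methods and ProjectInfo are hypothetical stand-ins (simulated with Task.Delay), not part of the original code. The queries are awaited one at a time because a single connection often cannot run queries concurrently, but no thread is blocked while each one is in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical async data layer; Task.Delay stands in for real database I/O.
public static class FakeDb
{
    public static async Task<string> GetProjectAsync(string cnn, int projectId)
    {
        await Task.Delay(50);
        return $"Project {projectId}";
    }

    public static async Task<List<string>> GetSubmittalsAsync(string cnn, int projectId)
    {
        await Task.Delay(50);
        return new List<string> { "S1", "S2" };
    }

    public static async Task<List<string>> GetRevisionsAsync(string cnn, int projectId)
    {
        await Task.Delay(50);
        return new List<string> { "R1" };
    }
}

// Hypothetical, simplified version of ProjectForDrawings.
public sealed class ProjectInfo
{
    public string Name { get; set; } = "";
    public List<string> Submittals { get; set; } = new List<string>();
    public List<string> Revisions { get; set; } = new List<string>();
}

public static class ProjectLoader
{
    // No Task.Run: each await releases the calling thread until the query completes.
    public static async Task<ProjectInfo> GetProjectInfoAsync(string cnn, int projectId)
    {
        var info = new ProjectInfo { Name = await FakeDb.GetProjectAsync(cnn, projectId) };
        info.Submittals = await FakeDb.GetSubmittalsAsync(cnn, projectId);
        info.Revisions = await FakeDb.GetRevisionsAsync(cnn, projectId);
        return info;
    }
}
```

A caller would simply write `var info = await ProjectLoader.GetProjectInfoAsync(cnn, projectID);`.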
Related
I've got a loop that needs to be run in parallel as each iteration is slow and processor intensive but I also need to call an async method as part of each iteration in the loop.
I've seen questions on how to handle an async method in the loop but not a combination of async and synchronous, which is what I've got.
My (simplified) code is as follows - I know this won't work properly due to the async action being passed to foreach.
protected IDictionary<int, ReportData> GetReportData()
{
var results = new ConcurrentDictionary<int, ReportData>();
Parallel.ForEach(requestData, async data =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
results.TryAdd(data.ReportId, report);
});
// This needs to be populated before returning
return results;
}
Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?
It's not a practical option to convert the synchronous functions to async.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach but have never used PLINQ and not sure if it would wait for all of the iterations to be completed before returning, i.e. would the Tasks still be running at the end of the process.
Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?
Yes, but you'll need to understand what Parallel gives you that you lose when you take alternative approaches. Specifically, Parallel will automatically determine the appropriate number of threads and adjust based on usage.
It's not a practical option to convert the synchronous functions to async.
For CPU-bound methods, you shouldn't convert them.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
The first recommendation I would make is to look into TPL Dataflow. It allows you to define a "pipeline" of sorts that keeps the data flowing through while limiting the concurrency at each stage.
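A minimal sketch of such a pipeline follows. It assumes the System.Threading.Tasks.Dataflow library is available (on .NET Framework it ships as a NuGet package). ProcessData, BuildRequestAsync and BuildReport here are trivial hypothetical stand-ins for the methods in the question, and the concurrency limits are arbitrary placeholder values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ReportPipeline
{
    // Hypothetical stand-ins for the question's methods.
    static int ProcessData(int id) => id * 2;                        // CPU-bound step
    static async Task<int> BuildRequestAsync(int processed)          // I/O-bound step
    {
        await Task.Delay(10);
        return processed + 1;
    }
    static string BuildReport(int request) => $"report-{request}";   // CPU-bound step

    public static async Task<IDictionary<int, string>> RunAsync(IEnumerable<int> reportIds)
    {
        var results = new ConcurrentDictionary<int, string>();
        var cpuBound = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        var ioBound = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Each stage has its own concurrency limit; items move on as soon as they are ready.
        var process = new TransformBlock<int, (int Id, int Value)>(
            id => (id, ProcessData(id)), cpuBound);
        var request = new TransformBlock<(int Id, int Value), (int Id, int Value)>(
            async x => (x.Id, await BuildRequestAsync(x.Value)), ioBound);
        var build = new ActionBlock<(int Id, int Value)>(
            x => { results[x.Id] = BuildReport(x.Value); }, cpuBound);

        process.LinkTo(request, link);
        request.LinkTo(build, link);

        foreach (var id in reportIds)
            await process.SendAsync(id);

        process.Complete();      // signal that no more input is coming
        await build.Completion;  // wait until the last stage has drained
        return results;
    }
}
```

Because each block buffers its own input, a fast item is not held back by a slow one in another stage, which is the concern raised about splitting the work into separate Parallel.ForEach and WhenAll phases.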
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach
No. PLINQ is very similar to Parallel in how it works. There are a few differences in how aggressive they are about CPU utilization, and some API differences - e.g., if you have a collection of results coming out the end, PLINQ is usually cleaner than Parallel - but at a high level they're very similar. Both only work on synchronous code.
However, you could use a simple Task.Run with Task.WhenAll as such:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var tasks = requestData.Select(data => Task.Run(async () =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}
You may need to apply a concurrency limit (which Parallel would have done for you). In the asynchronous world, this would look like:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var throttle = new SemaphoreSlim(10);
var tasks = requestData.Select(data => Task.Run(async () =>
{
await throttle.WaitAsync();
try
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
}
finally
{
throttle.Release();
}
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}
public static async void DoSomething(IEnumerable<IDbContext> dbContexts)
{
IEnumerator<IDbContext> dbContextEnumerator = dbContexts.GetEnumerator();
Task<ProjectSchema> projectSchemaTask = Task.Run(() => Core.Data.ProjectRead
.GetAll(dbContextEnumerator.Current)
.Where(a => a.PJrecid == pjRecId)
.Select(b => new ProjectSchema
{
PJtextid = b.PJtextid,
PJcustomerid = b.PJcustomerid,
PJininvoiceable = b.PJininvoiceable,
PJselfmanning = b.PJselfmanning,
PJcategory = b.PJcategory
})
.FirstOrDefault());
Task<int?> defaultActivitySchemeTask = projectSchemaTask.ContinueWith(antecedent =>
{
//This is where an exception may get thrown
return ProjectTypeRead.GetAll(dbContextEnumerator.Current)
.Where(a => a.PTid == antecedent.Result.PJcategory)
.Select(a => a.PTactivitySchemeID)
.FirstOrDefaultAsync().Result;
}, TaskContinuationOptions.OnlyOnRanToCompletion);
Task<SomeModel> customerTask = projectSchemaTask.ContinueWith((antecedent) =>
{
//This is where an exception may get thrown
return GetCustomerDataAsync(antecedent.Result.PJcustomerid,
dbContextEnumerator.Current).Result;
}, TaskContinuationOptions.OnlyOnRanToCompletion);
await Task.WhenAll(defaultActivitySchemeTask, customerTask);
}
The exception I am getting:
NotSupportedException: A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe.
The exception is only thrown roughly once in every 20 calls to this function, and it only seems to happen when I am chaining tasks with ContinueWith().
How can there be a second operation on context, when I am using a new one for each request?
This is just an example of my code. In the real code I have 3 parent tasks, and each parent has 1-5 chained tasks attached to them.
What am I doing wrong?
Yes, you generally shouldn't use ContinueWith these days. In this case you end up with two continuations on the same task (one for defaultActivitySchemeTask and one for customerTask). How they interact is essentially undefined and depends on exactly how the two async flows work, but you can certainly end up with overlapping async operations here: for example, in the simplest "continuations run sequentially" case, as soon as the first continuation awaits because its operation is incomplete, the second one starts. Frankly, this should be logically sequential await-based code, probably without Task.Run as well, but let's keep it for now:
ProjectSchema projectSchema = await Task.Run(() => ...);
int? defaultActivityScheme = await ... first bit
SomeModel customer = await ... second bit
We can't do the two subordinate queries concurrently without risking concurrent async operations on the same context.
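To make the sequential shape concrete, here is a small self-contained sketch. FakeContext is a hypothetical stand-in for a DbContext that throws when a second operation starts before the previous one has finished, and the integer queries are placeholders for the real ones, which are not reproduced here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a DbContext: rejects overlapping operations.
public sealed class FakeContext
{
    private int _inFlight;

    public async Task<int> QueryAsync(int value)
    {
        // Mark the context as busy; fail if another operation is still running.
        if (Interlocked.Exchange(ref _inFlight, 1) == 1)
            throw new NotSupportedException("A second operation started on this context.");
        try
        {
            await Task.Delay(20); // simulated database I/O
            return value;
        }
        finally
        {
            Interlocked.Exchange(ref _inFlight, 0);
        }
    }
}

public static class SequentialDemo
{
    // Each query is awaited before the next one starts, so the context
    // never sees two operations at the same time.
    public static async Task<(int Category, int Scheme, int Customer)> LoadAsync(FakeContext context)
    {
        int category = await context.QueryAsync(1);
        int scheme = await context.QueryAsync(category + 1);
        int customer = await context.QueryAsync(category + 2);
        return (category, scheme, customer);
    }
}
```

Starting two QueryAsync calls on the same FakeContext without awaiting the first makes the second task fault, which mirrors the NotSupportedException in the question.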
In your example you appear to run two continuations in parallel, so they can overlap and cause a concurrency problem. DbContext is not thread-safe, so you need to make sure that your asynchronous calls on it are sequential. Keep in mind that async/await simply turns your code into a state machine, so you can control which operations have completed before moving on to the next one. Using async methods alone does not make operations run in parallel, but wrapping an operation in Task.Run does. So you need to ask yourself whether Task.Run (i.e. scheduling work on the thread pool) is really required to make it parallel.
You mentioned that in your real code you have 3 parent tasks and each parent has 1-5 chained tasks attached to it. If the 3 parent tasks have separate DbContexts, they can run in parallel (each one wrapped in Task.Run), but their chained continuations need to be sequential (using the async/await keywords). Like this:
public async Task DoWork()
{
var parentTask1 = Task.Run(ParentTask1);
var parentTask2 = Task.Run(ParentTask2);
var parentTask3 = Task.Run(ParentTask3);
await Task.WhenAll(parentTask1, parentTask2, parentTask3);
}
private async Task ParentTask1()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
private async Task ParentTask2()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
private async Task ParentTask3()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
If your parent tasks operate on the same DbContext, in order to avoid concurrency you would need to await them one by one (no need to wrap them into Task.Run):
public async Task DoWork()
{
await ParentTask1();
await ParentTask2();
await ParentTask3();
}
I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in parallel. I would have thought moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better way to run async methods in parallel, or are tasks a good approach?
Is there a better way to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to use the Task.WhenAll method. However, your second approach should already have run in parallel. I have created a .NET Fiddle that demonstrates this and should help shed some light.
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
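One way to check that starting both calls before awaiting really does overlap them is to time it. In the sketch below, GetExpensiveThingAsync is a hypothetical stand-in that is purely asynchronous (a 500 ms Task.Delay with no synchronous work); if the real method does blocking work before its first await, the result will differ.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    // Hypothetical stand-in: truly asynchronous, no synchronous work up front.
    static async Task<string> GetExpensiveThingAsync(string name)
    {
        await Task.Delay(500);
        return name;
    }

    public static async Task<(string[] Things, TimeSpan Elapsed)> RunAsync()
    {
        var stopwatch = Stopwatch.StartNew();

        var first = GetExpensiveThingAsync("first");   // starts running immediately
        var second = GetExpensiveThingAsync("second"); // starts while the first is still pending

        // Results come back in the order the tasks were passed in.
        string[] things = await Task.WhenAll(first, second);

        return (things, stopwatch.Elapsed);
    }
}
```

With both delays overlapping, the elapsed time should be close to one delay rather than the sum of the two.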
Note
It is preferable to use the "Async" suffix: instead of GetThings and GetExpensiveThing, we should have GetThingsAsync and GetExpensiveThingAsync respectively (source).
Task.WhenAll() tends to perform poorly when a large number of tasks fire simultaneously without moderation or throttling.
If you are running a lot of tasks from a list and want to await the final outcome, I propose partitioning the work with a limit on the degree of parallelism.
I have adapted the elegant approach from Stephen Toub's blog to modern LINQ:
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
async Task AwaitPartition(IEnumerator<T> partition)
{
using (partition)
{
while (partition.MoveNext())
{
await Task.Yield(); // prevents a sync/hot thread hangup
await funcBody(partition.Current);
}
}
}
return Task.WhenAll(
Partitioner
.Create(source)
.GetPartitions(maxDoP)
.AsParallel()
.Select(p => AwaitPartition(p)));
}
How it works is simple: take an IEnumerable, split it into roughly even partitions, and then run a function/method against each element of each partition, with the partitions processed concurrently. No more than one element per partition is processed at any one time, but there are n tasks across n partitions.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on GitHub if you need more options. It is also available as a NuGet package targeting .NET Standard.
Edit 2: Thanks to comments from Theodor below, I was able to mitigate poorly written Async Tasks from blocking parallelism by using await Task.Yield();.
You can use Task.WhenAll, which completes when all the tasks it depends on are done.
Check this question here for reference
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<Thing[]> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to Thing[], which is what Task.WhenAll produces, to avoid awaits altogether.)
You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.
I have an application that pulls a fair amount of data from different sources. A local database, a networked database, and a web query. Any of these can take a few seconds to complete. So, first I decided to run these in parallel:
Parallel.Invoke(
() => dataX = loadX(),
() => dataY = loadY(),
() => dataZ = loadZ()
);
As expected, all three execute in parallel, but execution on the whole block doesn't come back until the last one is done.
Next, I decided to add a spinner or "busy indicator" to the application. I don't want to block the UI thread, or the spinner won't spin, so these need to be run asynchronously. But if I run all three in async mode, then they in effect happen "synchronously" (one after another), just not on the UI thread. I still want them to run in parallel.
spinner.IsBusy = true;
Parallel.Invoke(
async () => dataX = await Task.Run(() => { return loadX(); }),
async () => dataY = await Task.Run(() => { return loadY(); }),
async () => dataZ = await Task.Run(() => { return loadZ(); })
);
spinner.IsBusy = false;
Now, the Parallel.Invoke does not wait for the methods to finish and the spinner is instantly off. Worse, dataX/Y/Z are null and exceptions occur later.
What's the proper way here? Should I use a BackgroundWorker instead? I was hoping to make use of the .NET 4.5 features.
It sounds like you really want something like:
spinner.IsBusy = true;
try
{
Task t1 = Task.Run(() => dataX = loadX());
Task t2 = Task.Run(() => dataY = loadY());
Task t3 = Task.Run(() => dataZ = loadZ());
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
That way you're asynchronously waiting for all the tasks to complete (Task.WhenAll returns a task which completes when all the other tasks complete), without blocking the UI thread... whereas Parallel.Invoke (and Parallel.ForEach etc) are blocking calls, and shouldn't be used in the UI thread.
(The reason that Parallel.Invoke wasn't blocking with your async lambdas is that it was just waiting until each Action returned... which was basically when it hit the start of the await. Normally you'd want to assign an async lambda to Func<Task> or similar, in the same way that you don't want to write async void methods usually.)
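The difference between the two delegate types can be seen in a small sketch. The names below are made up for the illustration; the point is only that an async lambda assigned to Action is compiled as async void, so the caller gets control back at the first await and has nothing to wait on, whereas Func<Task> hands back a task that can be awaited.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncLambdaDemo
{
    public static async Task<(bool AfterAction, bool AfterFunc)> RunAsync()
    {
        // Assigned to Action: behaves like an async void method.
        bool actionDone = false;
        Action asAction = async () => { await Task.Delay(200); actionDone = true; };
        asAction();                     // returns as soon as the lambda hits its first await
        bool afterAction = actionDone;  // the delay has not elapsed yet, so this is false

        // Assigned to Func<Task>: the caller receives a task it can await.
        bool funcDone = false;
        Func<Task> asFunc = async () => { await Task.Delay(200); funcDone = true; };
        await asFunc();                 // resumes only after the lambda has finished
        bool afterFunc = funcDone;      // true

        return (afterAction, afterFunc);
    }
}
```

Parallel.Invoke takes Action delegates, which is why it came back immediately when given the async lambdas in the question.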
As you stated in your question, two of your methods query a database (one via SQL, the other via Azure) and the third triggers a POST request to a web service. All three of those methods are doing I/O-bound work.
What happens when you call Parallel.Invoke is that you basically make three ThreadPool threads block and wait for I/O operations to complete, which is pretty much a waste of resources and will scale badly if you ever need it to.
Instead, you could use the async APIs which all three of them expose:
SQL Server via Entity Framework 6 or ADO.NET
Azure has async APIs
Web request via HttpClient.PostAsync
Let's assume the following methods:
LoadXAsync();
LoadYAsync();
LoadZAsync();
You can call them like this:
spinner.IsBusy = true;
try
{
Task t1 = LoadXAsync();
Task t2 = LoadYAsync();
Task t3 = LoadZAsync();
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
This will have the same desired outcome. It won't freeze your UI, and it lets you save valuable resources.
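If the loaders return data, as in the question, the results can be read from the individual tasks once Task.WhenAll has completed. The sketch below assumes hypothetical LoadXAsync/LoadYAsync/LoadZAsync methods with different result types, simulated with Task.Delay; the real signatures are not given in the original.

```csharp
using System;
using System.Threading.Tasks;

public static class LoaderDemo
{
    // Hypothetical async loaders; Task.Delay stands in for database or web I/O.
    static async Task<string> LoadXAsync() { await Task.Delay(30); return "x"; }
    static async Task<int> LoadYAsync() { await Task.Delay(20); return 42; }
    static async Task<double> LoadZAsync() { await Task.Delay(10); return 1.5; }

    public static async Task<(string X, int Y, double Z)> LoadAllAsync()
    {
        // Start all three operations before awaiting any of them.
        Task<string> xTask = LoadXAsync();
        Task<int> yTask = LoadYAsync();
        Task<double> zTask = LoadZAsync();

        // The result types differ, so this uses the non-generic Task.WhenAll.
        await Task.WhenAll(xTask, yTask, zTask);

        // Every task has completed, so these awaits return immediately.
        return (await xTask, await yTask, await zTask);
    }
}
```

In the UI code, the call would sit inside the same try/finally that toggles spinner.IsBusy.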
I have a method which calls database as shown below:
BL Method to call DAO method:
public async Task<List<Classes>> GetClassesAndAddRules(string classId)
{
var classData = await Task.Run( () => _daoClass.GetClasses(classId));
//logic for adding rule
//..................................
}
DatabaseCall in DAO:
//*below method takes 1 second approx to return*
public List<Classes> GetClasses(string id)
{
var retVal = new List<Classes>();
using (var context = new test_db_context())
{
var rows = context.GetClassesById(id);
foreach (ClassesDBComplexType row in rows)
{
retVal.Add(Mapper.Map<GetClassesByClassIdOut>(row));
}
}
return retVal;
}
Is there any performance boost just by calling the DAO method using await?
My understanding is GetClasses() will be called on a separate thread so that it doesn't block and continue processing other stuff.
Any help is appreciated.
The code you posted won't compile. From the title of your question, I'm assuming that your call actually looks like await Task.Run(() => _daoClass.GetClasses(classId));
In that case, the use of Task.Run will make a difference in performance: it will be worse.
The point of async on the server side is to free up the request thread instead of blocking it. What you're doing with await Task.Run(...) is to free up the request thread by starting work on another thread. In other words, the Task.Run code has the same amount of work to do plus thread marshaling.