Parallel.Invoke does not wait for async methods to complete - c#

I have an application that pulls a fair amount of data from different sources: a local database, a networked database, and a web query. Any of these can take a few seconds to complete, so I first decided to run them in parallel:
Parallel.Invoke(
() => dataX = loadX(),
() => dataY = loadY(),
() => dataZ = loadZ()
);
As expected, all three execute in parallel, but execution on the whole block doesn't come back until the last one is done.
Next, I decided to add a spinner or "busy indicator" to the application. I don't want to block the UI thread, or the spinner won't spin, so these calls need to run asynchronously. But if I simply run all three asynchronously, they in effect happen one after another, just not on the UI thread. I still want them to run in parallel.
spinner.IsBusy = true;
Parallel.Invoke(
async () => dataX = await Task.Run(() => { return loadX(); }),
async () => dataY = await Task.Run(() => { return loadY(); }),
async () => dataZ = await Task.Run(() => { return loadZ(); })
);
spinner.IsBusy = false;
Now, the Parallel.Invoke does not wait for the methods to finish and the spinner is instantly off. Worse, dataX/Y/Z are null and exceptions occur later.
What's the proper way here? Should I use a BackgroundWorker instead? I was hoping to make use of the .NET 4.5 features.

It sounds like you really want something like:
spinner.IsBusy = true;
try
{
Task t1 = Task.Run(() => dataX = loadX());
Task t2 = Task.Run(() => dataY = loadY());
Task t3 = Task.Run(() => dataZ = loadZ());
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
That way you're asynchronously waiting for all the tasks to complete (Task.WhenAll returns a task which completes when all the other tasks complete), without blocking the UI thread... whereas Parallel.Invoke (and Parallel.ForEach etc) are blocking calls, and shouldn't be used in the UI thread.
(The reason that Parallel.Invoke wasn't blocking with your async lambdas is that it was just waiting until each Action returned... which was basically when it hit the start of the await. Normally you'd want to assign an async lambda to Func<Task> or similar, in the same way that you don't want to write async void methods usually.)
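The difference between an async lambda converted to Action and one assigned to Func<Task> can be seen in a small console sketch. LoadAsync is a hypothetical stand-in for the loaders in the question, and the delay is arbitrary:

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncLambdaDemo
{
    // Hypothetical stand-in for a slow load; not from the original question.
    public static async Task<int> LoadAsync()
    {
        await Task.Delay(200);
        return 42;
    }

    public static async Task Main()
    {
        int viaAction = 0;
        int viaFunc = 0;

        // The async lambda is converted to Action (effectively async void),
        // so Parallel.Invoke returns once the lambda reaches its first await.
        Parallel.Invoke(async () => viaAction = await LoadAsync());
        Console.WriteLine($"After Parallel.Invoke: {viaAction}"); // most likely still 0

        // Assigned to Func<Task>, the same lambda returns a Task that can be awaited.
        Func<Task> load = async () => viaFunc = await LoadAsync();
        await load();
        Console.WriteLine($"After awaiting Func<Task>: {viaFunc}");
    }
}
```

The first value is printed before the load has had a chance to finish, which is the same effect as the spinner switching off instantly in the question.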

As you stated in your question, two of your methods query a database (one via sql, the other via azure) and the third triggers a POST request to a web service. All three of those methods are doing I/O bound work.
What happens when you invoke Parallel.Invoke is that three ThreadPool threads block and wait for I/O-based operations to complete, which is pretty much a waste of resources and will scale badly if you ever need it to.
Instead, you could use the async APIs which all three of them expose:
SQL Server via Entity Framework 6 or ADO.NET
Azure, which has async APIs
Web request via HttpClient.PostAsync
Let's assume the following methods:
LoadXAsync();
LoadYAsync();
LoadZAsync();
You can call them like this:
spinner.IsBusy = true;
try
{
Task t1 = LoadXAsync();
Task t2 = LoadYAsync();
Task t3 = LoadZAsync();
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
This has the same desired outcome. It won't freeze your UI, and it saves valuable resources because no threads are blocked while waiting for I/O.
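If the loaders return values, the results can be read from the tasks once Task.WhenAll has completed. A minimal sketch, assuming hypothetical return types and using Task.Delay in place of real database or HTTP I/O:

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllResultsDemo
{
    // Hypothetical loaders; Task.Delay stands in for real database or HTTP I/O.
    public static async Task<string> LoadXAsync() { await Task.Delay(100); return "x-data"; }
    public static async Task<int> LoadYAsync() { await Task.Delay(150); return 7; }
    public static async Task<double> LoadZAsync() { await Task.Delay(50); return 1.5; }

    public static async Task<(string X, int Y, double Z)> LoadAllAsync()
    {
        // Start all three operations before awaiting any of them.
        Task<string> tx = LoadXAsync();
        Task<int> ty = LoadYAsync();
        Task<double> tz = LoadZAsync();

        await Task.WhenAll(tx, ty, tz);

        // Each task has already completed, so these awaits return immediately.
        return (await tx, await ty, await tz);
    }

    public static async Task Main()
    {
        var (x, y, z) = await LoadAllAsync();
        Console.WriteLine($"{x}, {y}, {z}");
    }
}
```

Because the three operations overlap, the total wait is roughly that of the slowest loader rather than the sum of all three.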

Related

Cancel background running jobs in Parallel.foreach

I have a Parallel.ForEach that makes a few API calls. Each API returns some data that I can then process. Say the front end calls the ProcessInsu method and the user suddenly changes their mind and goes to another page. (There should be a Clear button, and the user can leave the page after clicking it.) Once ProcessInsu has been called, the API calls keep running in the background. That is a waste of resources, because the user has already moved on to other work. I need some way to cancel the jobs running in the background.
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts,string resourceId)
{
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Convert.ToInt32(Math.Ceiling((Environment.ProcessorCount * 0.75) * 2.0));
Parallel.ForEach(insuranceCompAccounts, parallelOptions, async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp); // Each iteration calls a different insurance company.
});
}
I tried this sample code, but I could not get it to do what I need. Can anyone guide me on how to do this?
As others have noted, you can't use async with Parallel - it won't work correctly. Instead, you want to do asynchronous concurrency as such:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp);
}).ToList();
await Task.WhenAll(tasks);
}
Now that the code is corrected, you can add cancellation support. E.g.:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var cts = new CancellationTokenSource();
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp, cts.Token);
}).ToList();
await Task.WhenAll(tasks);
}
When you are ready to cancel the operation, call cts.Cancel();.
Note that Process now takes a CancellationToken. It should pass this token on to whatever I/O APIs it's using.
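For the caller to be able to cancel, the CancellationTokenSource usually has to live outside the method, with only the token passed in. A minimal sketch under that assumption; ProcessAsync is a hypothetical stand-in for Processor.Process, and CancelAfter simulates the user clicking Clear:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical stand-in for Processor.Process; Task.Delay observes the token.
    public static async Task ProcessAsync(int account, CancellationToken token)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), token);
        Console.WriteLine($"Processed {account}");
    }

    // The caller owns the CancellationTokenSource and passes only the token in.
    public static Task ProcessAllAsync(IEnumerable<int> accounts, CancellationToken token)
    {
        var tasks = accounts.Select(a => ProcessAsync(a, token)).ToList();
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            Task work = ProcessAllAsync(new[] { 1, 2, 3 }, cts.Token);

            // Simulates the user pressing "Clear" shortly after starting.
            cts.CancelAfter(TimeSpan.FromMilliseconds(200));

            try
            {
                await work;
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Processing was cancelled.");
            }
        }
    }
}
```

In a real application the page or view model would hold the CancellationTokenSource and call Cancel from the Clear button handler.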

Async tasks on different dbContexts. Says a second operation started on this context

public static async void DoSomething(IEnumerable<IDbContext> dbContexts)
{
IEnumerator<IDbContext> dbContextEnumerator = dbContexts.GetEnumerator();
Task<ProjectSchema> projectSchemaTask = Task.Run(() => Core.Data.ProjectRead
.GetAll(dbContextEnumerator.Current)
.Where(a => a.PJrecid == pjRecId)
.Select(b => new ProjectSchema
{
PJtextid = b.PJtextid,
PJcustomerid = b.PJcustomerid,
PJininvoiceable = b.PJininvoiceable,
PJselfmanning = b.PJselfmanning,
PJcategory = b.PJcategory
})
.FirstOrDefault());
Task<int?> defaultActivitySchemeTask = projectSchemaTask.ContinueWith(antecedent =>
{
//This is where an exception may get thrown
return ProjectTypeRead.GetAll(dbContextEnumerator.Current)
.Where(a => a.PTid == antecedent.Result.PJcategory)
.Select(a => a.PTactivitySchemeID)
.FirstOrDefaultAsync().Result;
}, TaskContinuationOptions.OnlyOnRanToCompletion);
Task<SomeModel> customerTask = projectSchemaTask.ContinueWith((antecedent) =>
{
//This is where an exception may get thrown
return GetCustomerDataAsync(antecedent.Result.PJcustomerid,
dbContextEnumerator.Current).Result;
}, TaskContinuationOptions.OnlyOnRanToCompletion);
await Task.WhenAll(defaultActivitySchemeTask, customerTask);
}
The exception I am getting:
NotSupportedException: A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe.
The exception is only thrown about once in every 20 calls to this function, and it only seems to happen when I am chaining tasks with ContinueWith().
How can there be a second operation on context, when I am using a new one for each request?
This is just an example of my code. In the real code I have 3 parent tasks, and each parent has 1-5 chained tasks attached to them.
What am I doing wrong?
Yeah, you basically shouldn't use ContinueWith these days. In this case you end up with two continuations on the same task (for defaultActivitySchemeTask and customerTask); how they interact is basically undefined and depends on exactly how the two async flows work, but you could absolutely end up with overlapping async operations here (for example, in the simplest "continuations are sequential" case, as soon as the first one awaits because it is incomplete, the second will start). Frankly, this should be logically sequential await-based code, probably without Task.Run either, but let's keep it for now:
ProjectSchema projectSchema = await Task.Run(() => ...);
int? defaultActivityScheme = await ... first bit
SomeModel customer = await ... second bit
We can't do the two subordinate queries concurrently without risking concurrent async operations on the same context.
In your example you seem to be running two continuations in parallel, so there is a possibility that they will overlap and cause a concurrency problem. DbContext is not thread safe, so you need to make sure that your asynchronous calls on it are sequential. Keep in mind that async/await simply turns your code into a state machine, so you can control which operations have completed before moving on to the next one. Using async methods alone will not make operations run in parallel, but wrapping each operation in Task.Run will. So you need to ask yourself whether Task.Run (i.e. scheduling work on the ThreadPool) is really required to make it parallel.
You mentioned that in your real code you have 3 parent tasks and each parent has 1-5 chained tasks attached to it. If the 3 parent tasks have separate DbContexts, they can run in parallel (each one wrapped in Task.Run), but their chained continuations need to be sequential (using the async/await keywords). Like this:
public async Task DoWork()
{
var parentTask1 = Task.Run(ParentTask1);
var parentTask2 = Task.Run(ParentTask2);
var parentTask3 = Task.Run(ParentTask3);
await Task.WhenAll(parentTask1, parentTask2, parentTask3);
}
private async Task ParentTask1()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
private async Task ParentTask2()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
private async Task ParentTask3()
{
// chained child asynchronous continuations
await Task.Delay(100);
await Task.Delay(100);
}
If your parent tasks operate on the same DbContext, in order to avoid concurrency you would need to await them one by one (no need to wrap them into Task.Run):
public async Task DoWork()
{
await ParentTask1();
await ParentTask2();
await ParentTask3();
}

How to run multiple call with-in a function parallel

I have a few functions that build a user response class, and I am still getting to grips with Task, async and await.
In the code below, is there a way to run all the calls in parallel rather than one at a time?
I guess my first question should be: how do the calls take place the way it is set up now?
My second question is: how can I run all these calls in parallel?
The results do not need to come back in any specific order.
public static async Task<ProjectForDrawings> GetProjectInfo(string cnn, int projectID)
{
return await Task.Run(() =>
{
ProjectForDrawings projectForDrawings = DataBase.proc_GetProject_ForDrawings.ToRecord<ProjectForDrawings>(cnn, projectID);
projectForDrawings.Submittals = DataBase.proc_GetSubmittal.ToList(cnn, projectID);
projectForDrawings.ProjectLeafs = DataBase.proc_GetProjectLeafs.ToList<ProjectLeaf>(cnn, projectID);
projectForDrawings.Revisions = DataBase.proc_GetRevisionsForProject.ToList<Revisions>(cnn, projectID);
return projectForDrawings;
});
}
how is the call taking place the way it is set up now?
It schedules the work to a background thread (Task.Run) and then asynchronously waits for it to complete (await). The work will execute each database proc one at a time, synchronously blocking the background thread until it completes.
how can i run all these calls in parallel?
You can start all the tasks, and then await them all with Task.WhenAll:
public static async Task<ProjectForDrawings> GetProjectInfo(string cnn, int projectID)
{
ProjectForDrawings projectForDrawings = DataBase.proc_GetProject_ForDrawings.ToRecord<ProjectForDrawings>(cnn, projectID);
var submittalsTask = Task.Run(() => DataBase.proc_GetSubmittal.ToList(cnn, projectID));
var leafsTask = Task.Run(() => DataBase.proc_GetProjectLeafs.ToList<ProjectLeaf>(cnn, projectID));
var revisionsTask = Task.Run(() => DataBase.proc_GetRevisionsForProject.ToList<Revisions>(cnn, projectID));
await Task.WhenAll(submittalsTask, leafsTask, revisionsTask);
projectForDrawings.Submittals = await submittalsTask;
projectForDrawings.ProjectLeafs = await leafsTask;
projectForDrawings.Revisions = await revisionsTask;
return projectForDrawings;
}
However, many (most?) databases do not allow multiple queries per database connection, so this may not work for your database. Also, it may not be a good idea to parallelize calls on the database in the first place - it is possible to cause a self-imposed denial-of-service. Finally, using Task.Run in the implementation is not a good pattern (for reasons I describe on my blog) - using natural async methods would be better.

c# Executing Multiple calls in Parallel

I'm looping through an Array of values, for each value I want to execute a long running process. Since I have multiple tasks to be performed that have no inter dependency I want to be able to execute them in parallel.
My code is:
List<Task<bool>> dependantTasksQuery = new List<Task<bool>>();
foreach (int dependantID in dependantIDList)
{
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
}
Task<bool>[] dependantTasks = dependantTasksQuery.ToArray();
//Wait for all dependant tasks to complete
bool[] lengths = await Task.WhenAll(dependantTasks);
The WaitForDependantObject method just looks like:
async Task<bool> WaitForDependantObject(int idVal)
{
System.Threading.Thread.Sleep(20000);
bool waitDone = true;
return waitDone;
}
As you can see I've just added a sleep to highlight my issue. What is happening when debugging is that on the line:
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
My code is stopping and waiting the 20 seconds for the method to complete. I did not want to start the execution until I had completed the loop and built up the Array. Can somebody point me to what I'm doing wrong? I'm pretty sure I need an await somewhere
In your case WaitForDependantObject isn't asynchronous at all even though it returns a task. If that's your goal do as Luke Willis suggests. To make these calls both asynchronous and truly parallel you need to offload them to a Thread Pool thread with Task.Run:
bool[] lengths = await Task.WhenAll(dependantIDList.Select(dependantID => Task.Run(() => WaitForDependantObject(dependantID))));
Async methods run synchronously until an await is reached and then return a task representing the asynchronous operation. In your case there is no await, so the methods simply execute one after the other. Task.Run uses multiple threads to enable parallelism even for these synchronous parts, on top of the concurrency you get from awaiting all the tasks together with Task.WhenAll.
For WaitForDependantObject to represent an async method more accurately it should look like this:
async Task<bool> WaitForDependantObject(int idVal)
{
await Task.Delay(20000);
return true;
}
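The difference between the two versions can be observed by timing only the call itself, before anything is awaited. This is a small sketch with hypothetical method names and shorter delays than in the question:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class SyncVsAsyncDemo
{
    // Returns a Task, but blocks the caller for the whole duration.
    public static Task<bool> BlockingWait()
    {
        Thread.Sleep(300);
        return Task.FromResult(true);
    }

    // Returns to the caller at the first await.
    public static async Task<bool> AsyncWait()
    {
        await Task.Delay(300);
        return true;
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        Task<bool> blocking = BlockingWait();
        Console.WriteLine($"BlockingWait call took {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        Task<bool> asynchronous = AsyncWait();
        Console.WriteLine($"AsyncWait call took {sw.ElapsedMilliseconds} ms");

        bool[] results = await Task.WhenAll(blocking, asynchronous);
        Console.WriteLine(string.Join(", ", results));
    }
}
```

The first call should take roughly the full sleep time before it returns, while the second returns almost immediately and finishes later.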
Use Task.Delay to make the method asynchronous. The following is a more realistic replacement for the mocked code:
async Task<bool> WaitForDependantObject(int idVal)
{
// how long synchronous part of method takes (before first await)
System.Threading.Thread.Sleep(1000);
// method returns to the caller as soon as awaiting starts
await Task.Delay(2000); // how long IO or other async operation takes place
// simulate data processing, would run on new thread unless
// used in WPF/WinForms/ASP.Net and no call to ConfigureAwait(false) made by caller.
System.Threading.Thread.Sleep(1000);
bool waitDone = true;
return waitDone;
}
You can do this using Task.Factory.StartNew.
Replace this:
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
with this:
dependantTasksQuery.Add(
Task.Factory.StartNew(
() => WaitForDependantObject(dependantID)
)
);
This will run your method within a new Task and add the task to your List.
You will also want to change the method signature of WaitForDependantObject to be:
bool WaitForDependantObject(int idVal)
You can then wait for your tasks to complete with:
Task.WaitAll(dependantTasksQuery.ToArray());
And get your results with:
bool[] lengths = dependantTasksQuery.Select(task => task.Result).ToArray();

Parallel.ForEach using Thread.Sleep equivalent

So here's the situation: I need to make a call to a web site that starts a search. This search continues for an unknown amount of time, and the only way I know if the search has finished is by periodically querying the website to see if there's a "Download Data" link somewhere on it (it uses some strange ajax call on a javascript timer to check the backend and update the page, I think).
So here's the trick: I have hundreds of items I need to search for, one at a time. So I have some code that looks a little bit like this:
var items = getItems();
Parallel.ForEach(items, item =>
{
startSearch(item);
var finished = isSearchFinished(item);
while(finished == false)
{
finished = isSearchFinished(item); //<--- How do I delay this action 30 Secs?
}
downloadData(item);
});
Now, obviously this isn't the real code, because there could be things that cause isSearchFinished to always be false.
Obvious infinite loop danger aside, how would I correctly keep isSearchFinished() from calling over and over and over, but instead call every, say, 30 seconds or 1 minute?
I know Thread.Sleep() isn't the right solution, and I think the solution might be accomplished by using Threading.Timer() but I'm not very familiar with it, and there are so many threading options that I'm just not sure which to use.
It's quite easy to implement with tasks and async/await, as noted by @KevinS in the comments:
async Task<ItemData> ProcessItemAsync(Item item)
{
while (true)
{
if (await isSearchFinishedAsync(item))
break;
await Task.Delay(30 * 1000);
}
return await downloadDataAsync(item);
}
// ...
var items = getItems();
var tasks = items.Select(i => ProcessItemAsync(i)).ToArray();
await Task.WhenAll(tasks);
var data = tasks.Select(t => t.Result);
This way, you don't block ThreadPool threads in vain for what is mostly a bunch of I/O-bound network operations. If you're not familiar with async/await, the async-await tag wiki might be a good place to start.
I assume you can convert your synchronous methods isSearchFinished and downloadData to asynchronous versions that return a Task<>, using something like HttpClient for non-blocking HTTP requests. If you are unable to do so, you can still simply wrap them with Task.Run, as await Task.Run(() => isSearchFinished(item)) and await Task.Run(() => downloadData(item)). Normally this is not recommended, but as you have hundreds of items, it would still give you a much better level of concurrency than Parallel.ForEach in this case, because you won't be blocking pool threads for 30 seconds, thanks to the asynchronous Task.Delay.
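To guard against the infinite-loop risk mentioned in the question, the polling loop can also take a CancellationToken that enforces an overall time limit. A minimal sketch; PollUntilAsync and the fake check are hypothetical names, and the intervals are shortened for demonstration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingDemo
{
    // Polls a condition until it is true or the token is cancelled.
    public static async Task PollUntilAsync(
        Func<Task<bool>> isFinished, TimeSpan interval, CancellationToken token)
    {
        while (!await isFinished())
        {
            // Throws OperationCanceledException if the token fires during the wait.
            await Task.Delay(interval, token);
        }
    }

    public static async Task Main()
    {
        int calls = 0;
        // Hypothetical check that reports "finished" on the third call.
        Func<Task<bool>> check = () => Task.FromResult(++calls >= 3);

        // The token source cancels itself after 5 seconds, bounding the total wait.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            await PollUntilAsync(check, TimeSpan.FromMilliseconds(100), cts.Token);
        }
        Console.WriteLine($"Finished after {calls} checks");
    }
}
```

In the real scenario the interval would be 30 seconds or more, and the check would wrap isSearchFinished (or an async version of it).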
You can also write a generic function using TaskCompletionSource and Threading.Timer to return a Task that becomes complete once a specified retry function succeeds.
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval)
{
return RetryAsync(retryFunc, retryInterval, CancellationToken.None);
}
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<object>();
cancellationToken.Register(() => tcs.TrySetCanceled());
var timer = new Timer((state) =>
{
var taskCompletionSource = (TaskCompletionSource<object>) state;
try
{
if (retryFunc())
{
taskCompletionSource.TrySetResult(null);
}
}
catch (Exception ex)
{
taskCompletionSource.TrySetException(ex);
}
}, tcs, TimeSpan.FromMilliseconds(0), retryInterval);
// Once the task is complete, dispose of the timer so it doesn't keep firing. The continuation also captures
// the timer in a closure, which keeps it from being garbage collected while the task is pending.
tcs.Task.ContinueWith(t => timer.Dispose(),
CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
return tcs.Task;
}
You can then use RetryAsync like this:
var searchTasks = new List<Task>();
searchTasks.AddRange(items.Select(
downloadItem => RetryAsync(() => isSearchFinished(downloadItem), TimeSpan.FromSeconds(2)) // retry interval
.ContinueWith(t => downloadData(downloadItem),
CancellationToken.None,
TaskContinuationOptions.OnlyOnRanToCompletion,
TaskScheduler.Default)));
await Task.WhenAll(searchTasks.ToArray());
The ContinueWith part specifies what you do once the task has completed successfully. In this case it will run your downloadData method on a thread pool thread because we specified TaskScheduler.Default and the continuation will only execute if the task ran to completion, i.e. it was not canceled and no exception was thrown.
