This question already has answers here:
Using async/await for multiple tasks
(8 answers)
Closed 4 years ago.
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.
public static async Task<List<Deployment>> GetDeployments()
{
try
{
var depjson = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
var deployments = (JsonConvert.DeserializeObject<DeploymentWrapper>(depjson))?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
return deployments;
}
catch (Exception)
{
throw;
}
}
This all works perfectly fine. However, I do not like the await in the foreach-loop at all. I also believe this is not considered good practice. I just don't see how to refactor this so the calls are made parallel, because the result of each call is used to set a property of the deployment.
I would appreciate any suggestions on how to make this method faster and, whenever possible, avoid the await-ing in the foreach-loop.
There is nothing wrong with what you are doing now. But there is a way to call all tasks at once instead of waiting for a single task, then processing it and then waiting for another one.
This is how you can turn this:
wait for one -> process -> wait for one -> process ...
into
wait for all -> process -> done
Convert this:
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
To:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
First you take a list of unfinished tasks. Then you await it and you get a collection of results (reljson's). Then you have to deserialize them and assign to Release.
By using await Task.WhenAll() you wait for all the tasks at the same time, so you should see a performance boost from that.
Let me know if there are typos, I didn't compile this code.
Fcin suggested to start all Tasks, await for them all to finish and then start deserializing the fetched data.
However, if the first Task is already finished, but the second task not, and internally the second task is awaiting, the first task could already start deserializing. This would shorten the time that your process is idly waiting.
So instead of:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
I'd suggest the following slight change:
// async fetch the Release data of Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
return JsonConvert.DeserializeObject<Release>(reljson);
}
// async fill the Release data of Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
deployment.Release = await FetchReleaseDataAsync(deployment);
}
Then your procedure is similar to the solution that Fcin suggested:
IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments.
.Select(deployment => FillReleaseDataAsync(deployment)
.ToList();
await Task.WhenAll(tasksFillDeploymentWithReleaseData);
Now if the first task has to wait while fetching the release data, the 2nd task begins and the third etc. If the first task already finished fetching the release data, but the other tasks are awaiting for their release data, the first task starts already deserializing it and assigns the result to deployment.Release, after which the first task is complete.
If for instance the 7th task got its data, but the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release. Task 7 is completed.
This continues until all tasks are completed. Using this method there is less waiting time because as soon as one task has its data it is scheduled to start deserializing
If i understand you right and you want to make the var reljson = await GetJson parralel:
Try this:
Parallel.ForEach(deployments, (deployment) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
you might limit the number of parallel executions such as:
Parallel.ForEach(
deployments,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
(deployment) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
you might also want to be able to break the loop:
Parallel.ForEach(deployments, (deployment, state) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
if (noFurtherProcessingRequired) state.Break();
});
Related
Hi Recently i was working in .net core web api project which is downloading files from external api.
In this .net core api recently found some issues while the no of files is more say more than 100. API is downloading max of 50 files and skipping others. WebAPI is deployed on AWS Lambda and timeout is 15mnts.
Actually the operation is timing out due to the long download process
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
bool DownloadFlag = false;
foreach (DownloadAttachment downloadAttachment in downloadAttachments)
{
DownloadFlag = await DownloadAttachment(downloadAttachment.id);
//update the download status in database
if(DownloadFlag)
{
bool UpdateFlag = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (UpdateFlag)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
return true;
}
catch (Exception ext)
{
log.Error(ext, "Error in Saving attachment {attachemntId}",downloadAttachment.id);
return false;
}
}
Document service code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
return await _documentRepository.UpdateAttachmentDownloadStatus(AttachmentID);
}
And DB update code
public async Task<bool> UpdateAttachmentDownloadStatus(string AttachmentID)
{
using (var db = new SqlConnection(_connectionString.Value))
{
var Result = 0; bool SuccessFlag = false;
var parameters = new DynamicParameters();
parameters.Add("#pm_AttachmentID", AttachmentID);
parameters.Add("#pm_Result", Result, System.Data.DbType.Int32, System.Data.ParameterDirection.Output);
var result = await db.ExecuteAsync("[Loan].[UpdateDownloadStatus]", parameters, commandType: CommandType.StoredProcedure);
Result = parameters.Get<int>("#pm_Result");
if (Result > 0) { SuccessFlag = true; }
return SuccessFlag;
}
}
How can i move this async task to run parallel ? and get the result? i tried following code
var task = Task.Run(() => DownloadAttachment( downloadAttachment.id));
bool result = task.Result;
Is this approach is fine? how can improve the performance? how to get the result from each parallel task and update to DB and delete based on success flag? Or this error is due to AWS timeout?
Please help
If you extracted the code that handles individual files to a separate method :
private async Task DownloadSingleAttachment(DownloadAttachment attachment)
{
try
{
var download = await DownloadAttachment(downloadAttachment.id);
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(downloadAttachment.id);
if (update)
{
await DeleteAttachment(downloadAttachment.id);
}
}
}
catch(....)
{
....
}
}
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
foreach (var attachment in downloadAttachments)
{
await DownloadSingleAttachment(attachment);
}
}
....
}
It would be easy to start all downloads at once, although not very efficient :
public async Task<bool> DownloadAttachmentsAsync(List<DownloadAttachment> downloadAttachment)
{
try
{
//Start all of them
var tasks=downloadAttachments.Select(att=>DownloadSingleAttachment(att));
await Task.WhenAll(tasks);
}
....
}
This isn't very efficient because external services hate lots of concurrent calls from a single source as you do, and almost certainly impose throttling. The database doesn't like lots of concurrent calls either, because in all database products concurrent calls lead to blocking one way or another. Even in databases that use multiversioning, this comes with an overhead.
Using Dataflow classes - Single block
One easy way to fix this is to use .NET's Dataflow classes to break the operation into a pipeline of steps, and execute each one with a different number of concurrent tasks.
We could put the entire operation into a single block, but that could cause problems if the update and delete operations aren't thread-safe :
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
};
var downloader=new ActionBlock<DownloadAttachment>(async att=>{
await DownloadSingleAttachment(att);
},dlOptions);
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await downloader.Completion;
Dataflow - Multiple steps
To avoid possible thread issues, the rest of the methods can go to their own blocks. They could both go into one ActionBlock that calls both Update and Delete, or they could go into separate blocks if the methods talk to different services with different concurrency requirements.
The downloader block will execute at most 10 concurrent downloads. By default, each block uses only a single task at a time.
The updater and deleter blocks have their default DOP=1, which means there's no risk of race conditions as long as they don't try to use eg the same connection at the same time.
var downloader=new TransformBlock<string,(string id,bool download)>(
async id=> {
var download=await DownloadAttachment(id);
return (id,download);
},dlOptions);
var updater=new TransformBlock<(string id,bool download),(string id,bool update)>(
async (id,download)=> {
if(download)
{
var update = await _DocumentService.UpdateDownloadStatus(id);
return (id,update);
}
return (id,false);
});
var deleter=new ActionBlock<(string id,bool update)>(
async (id,update)=> {
if(update)
{
await DeleteAttachment(id);
}
});
The blocks can be linked into a pipeline now and used. The setting PropagateCompletion = true means that as soon as a block is finished processing, it will tell all its connected blocks to finish as well :
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
downloader.LinkTo(updater, linkOptions);
updater.LinkTo(deleter,linkOptions);
We can pump data into the head block as long as we need. When we're done, we call the head block's Complete() method. As each block finishes processing its data, it will propagate its completion to the next block in the pipeline. We need to await for the last (tail) block to complete to ensure all the attachments have been processed:
foreach (var attachment in downloadAttachments)
{
await downloader.SendAsync(attachement.id);
}
downloader.Complete();
await deleter.Completion;
Each block has an input and (when necessary) an output buffer, which means the "producer" and "consumers" of the messages don't have to be in sync, or even know of each other. All the "producer" needs to know is where to find the head block in a pipeline.
Throttling and backpressure
One way to throttle is to use a fixed number of tasks through MaxDegreeOfParallelism.
It's also possible to put a limit to the input buffer, thus blocking previous steps or producers if a block can't process messages fast enough. This can be done simply by setting the BoundedCapacity option for a block:
var dlOptions= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
BoundedCapacity=20,
};
var updaterOptions= new ExecutionDataflowBlockOptions
{
BoundedCapacity=20,
};
...
var downloader=new TransformBlock<...>(...,dlOptions);
var updater=new TransformBlock<...>(...,updaterOptions);
No other changes are necessary
To run multiple asynchronous operations you could do something like this:
public async Task RunMultipleAsync<T>(IEnumerable<T> myList)
{
const int myNumberOfConcurrentOperations = 10;
var mySemaphore = new SemaphoreSlim(myNumberOfConcurrentOperations);
var tasks = new List<Task>();
foreach(var myItem in myList)
{
await mySemaphore.WaitAsync();
var task = RunOperation(myItem);
tasks.Add(task);
task.ContinueWith(t => mySemaphore.Release());
}
await Task.WhenAll(tasks);
}
private async Task RunOperation<T>(T myItem)
{
// Do stuff
}
Put your code from DownloadAttachmentsAsync at the 'Do stuff' comment
This will use a semaphore to limit the number of concurrent operations, since running to many concurrent operations is often a bad idea due to contention. You would need to experiment to find the optimal number of concurrent operations for your use case. Also note that error handling have been omitted to keep the example short.
If I have a list of tasks which I want to execute together but at the same time I want to execute a certain number of them together, so I await for one of them until one of them finishes, which then should mean awaiting should stop and a new task should be allowed to start, but when one of them finishes, I don't know how to stop awaiting for the task which is currently being awaited, I don't want to cancel the task, just stop awaiting and let it continue running in the background.
I have the following code
foreach (var link in SharedVars.DownloadQueue)
{
if (currentRunningCount != batch)
{
var task = DownloadFile(extraPathPerLink, link, totalLen);
_ = task.ContinueWith(_ =>
{
downloadQueueLinksTasks.Remove(task);
currentRunningCount--;
// TODO SHOULD CHANGE WHAT IS AWAITED
});
currentRunningCount++;
downloadQueueLinksTasks.Add(task);
}
if (currentRunningCount == batch)
{
// TODO SHOULD NOT AWAIT 0
await downloadQueueLinksTasks[0];
}
}
I found about Task.WhenAny but from this comment here I understood that the other tasks will be ignored so it's not the solution I want to achieve.
I'm sorry if the question is stupid or wrong but I can't seem to find any information related on how to solve it, or what is the name of the operation I want to achieve so I can even search correctly.
Solution Edit
All the answers provided are correct, I accepted the one I decided to use but still all of them are correct.
Thank you everyone, I learned a lot from all of you from these different answers and different ways to approach the problem and how to think about it.
What I learned about this specific problem was that I still needed to await for the other tasks left, so the solution was to have the Task.WhenAny inside the loop (which returns the finished task (this is also important)) AND Task.WhenAll outside the loop to await the other left tasks.
Task.WhenAny returns the Task which completed.
foreach (var link in SharedVars.DownloadQueue)
{
var task = DownloadFile(extraPathPerLink, link, totalLen);
downloadQueueLinksTasks.Add(task);
if (downloadQueueLinksTasks.Count == batch)
{
// Wait for any Task to complete, then remove it from
// the list of pending tasks.
var completedTask = await Task.WhenAny(downloadQueueLinksTasks);
downloadQueueLinksTasks.Remove(completedTask);
}
}
// Wait for all of the remaining Tasks to complete
await Task.WhenAll(downloadQueueLinksTasks);
You can use Task.WaitAny()
Here is the demonstration of the behavior:
public static async Task Main(string[] args)
{
IList<Task> tasks = new List<Task>();
tasks.Add(TestAsync(0));
tasks.Add(TestAsync(1));
tasks.Add(TestAsync(2));
tasks.Add(TestAsync(3));
tasks.Add(TestAsync(4));
tasks.Add(TestAsync(5));
var result = Task.WaitAny(tasks.ToArray());
Console.WriteLine("returned task id is {0}", result);
///do other operations where
//before exiting wait for other tasks so that your tasks won't get cancellation signal
await Task.WhenAll(tasks.ToArray());
}
public static async Task TestAsync(int i)
{
Console.WriteLine("Staring to wait" + i);
await Task.Delay(new Random().Next(1000, 10000));
Console.WriteLine("Task finished" + i);
}
Output:
Staring to wait0
Staring to wait1
Staring to wait2
Staring to wait3
Staring to wait4
Staring to wait5
Task finished0
returned task id is 0
Task finished4
Task finished2
Task finished1
Task finished5
Task finished3
What you're asking about is throttling, which for asynchronous code is best expressed via SemaphoreSlim:
var semaphore = new SemaphoreSlim(batch);
var tasks = SharedVars.DownloadQueue.Select(link =>
{
await semaphore.WaitAsync();
try { return DownloadFile(extraPathPerLink, link, totalLen); }
finally { semaphore.Release(); }
});
var results = await Task.WhenAll(tasks);
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
IDisposable subscription =
SharedVars.DownloadQueue
.ToObservable()
.Select(link =>
Observable.FromAsync(() => DownloadFile(extraPathPerLink, link, totalLen)))
.Merge(batch) //max concurrent downloads
.Subscribe(file =>
{
/* process downloaded file here
(no longer a task) */
});
If you need to stop the downloads before they would naturally finish just call subscription.Dispose().
I have a series of Tasks in an array. If a Task is "Good" it returns a string. If it's "Bad": it return a null.
I want to be able to run all the Tasks in parallel, and once the first one comes back that is "Good", then cancel the others and get the "Good" result.
I am doing this now, but the problem is that all the tasks need to run, then I loop through them looking for the first good result.
List<Task<string>> tasks = new List<Task<string>>();
Task.WaitAll(tasks.ToArray());
I want to be able to run all the Tasks in parallel, and once the first one comes back that is "Good", then cancel the others and get the "Good" result.
This is misunderstanding, since Cancellation in TPL is co-operative, so once the Task is started, there's no way to Cancel it. CancellationToken can work before Task is started or later to throw an exception, if Cancellation is requested, which is meant to initiate and take necessary action, like throw custom exception from the logic
Check the following query, it has many interesting answers listed, but none of them Cancel. Following is also a possible option:
public static class TaskExtension<T>
{
public static async Task<T> FirstSuccess(IEnumerable<Task<T>> tasks, T goodResult)
{
// Create a List<Task<T>>
var taskList = new List<Task<T>>(tasks);
// Placeholder for the First Completed Task
Task<T> firstCompleted = default(Task<T>);
// Looping till the Tasks are available in the List
while (taskList.Count > 0)
{
// Fetch first completed Task
var currentCompleted = await Task.WhenAny(taskList);
// Compare Condition
if (currentCompleted.Status == TaskStatus.RanToCompletion
&& currentCompleted.Result.Equals(goodResult))
{
// Assign Task and Clear List
firstCompleted = currentCompleted;
break;
}
else
// Remove the Current Task
taskList.Remove(currentCompleted);
}
return (firstCompleted != default(Task<T>)) ? firstCompleted.Result : default(T);
}
}
Usage:
var t1 = new Task<string>(()=>"bad");
var t2 = new Task<string>(()=>"bad");
var t3 = new Task<string>(()=>"good");
var t4 = new Task<string>(()=>"good");
var taskArray = new []{t1,t2,t3,t4};
foreach(var tt in taskArray)
tt.Start();
var finalTask = TaskExtension<string>.FirstSuccess(taskArray,"good");
Console.WriteLine(finalTask.Result);
You may even return Task<Task<T>>, instead of Task<T> for necessary logical processing
You can achieve your desired results using following example.
List<Task<string>> tasks = new List<Task<string>>();
// ***Use ToList to execute the query and start the tasks.
List<Task<string>> goodBadTasks = tasks.ToList();
// ***Add a loop to process the tasks one at a time until none remain.
while (goodBadTasks.Count > 0)
{
// Identify the first task that completes.
Task<string> firstFinishedTask = await Task.WhenAny(goodBadTasks);
// ***Remove the selected task from the list so that you don't
// process it more than once.
goodBadTasks.Remove(firstFinishedTask);
// Await the completed task.
string firstFinishedTaskResult = await firstFinishedTask;
if(firstFinishedTaskResult.Equals("good")
// do something
}
EDIT : If you want to terminate all the tasks you can use CancellationToken.
For more detail read the docs.
I was looking into Task.WhenAny() which will trigger on the first "completed" task. Unfortunately, a completed task in this sense is basically anything... even an exception is considered "completed". As far as I can tell there is no other way to check for what you call a "good" value.
While I don't believe there is a satisfactory answer for your question I think there may be an alternative solution to your problem. Consider using Parallel.ForEach.
Parallel.ForEach(tasks, (task, state) =>
{
if (task.Result != null)
state.Stop();
});
The state.Stop() will cease the execution of the Parallel loop when it finds a non-null result.
Besides having the ability to cease execution when it finds a "good" value, it will perform better under many (but not all) scenarios.
Use Task.WhenAny It returns the finished Task. Check if it's null. If it is, remove it from the List and call Task.WhenAny Again.
If it's good, Cancel all Tasks in the List (they should all have a CancellationTokenSource.Token.
Edit:
All Tasks should use the same CancellationTokenSource.Token. Then you only need to cancel once.
Here is some code to clarify:
private async void button1_Click(object sender, EventArgs e)
{
CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
List<Task<string>> tasks = new List<Task<string>>();
tasks.Add(Task.Run<string>(() => // run your tasks
{
while (true)
{
if (cancellationTokenSource.Token.IsCancellationRequested)
{
return null;
}
return "Result"; //string or null
}
}));
while (tasks.Count > 0)
{
Task<string> resultTask = await Task.WhenAny(tasks);
string result = await resultTask;
if (result == null)
{
tasks.Remove(resultTask);
}
else
{
// success
cancellationTokenSource.Cancel(); // will cancel all tasks
}
}
}
I am using Async await with Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
try
{
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
await T;
}
This method I am calling inside a loop like this
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks opens up inside Process explorer and they also start working (based on log file). however out of 10 sometimes 8 or sometimes 7 finishes. Rest of them just never come back.
why would that be happening ?
Are they timing out ? Where can I set timeout for my tasks ?
UPDATE
Basically above, I would like each Task to start running as soon as they are called and wait for the response on AWAIT T keyword. I am assuming here that once they finish each of them will come back at Await T and do the next action. I am alraedy seeing this result for 7 out of 10 tasks but 3 of them are not coming back.
Thanks
It is hard to say what the issues is without the rest if the code, but you code can be simplified by making ProcessJob synchronous and then calling Task.Run with it.
public JobDto ProcessJob(JobDto jobTask)
{
JobWorker jobWorker = new JobWorker();
return jobWorker.Execute(jobTask);
}
Start tasks and wait for all tasks to finish. Prefer using Task.Run rather than Task.Factory.StartNew as it provides more favourable defaults for pushing work to the background. See here.
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = Task.Run(() => ProcessJob(jobList[i]));
}
try
{
await Task.WhenAll(tasks);
}
catch(Exception ex)
{
// handle exception
}
First, let's make a reproducible version of your code. This is NOT the best way to achieve what you are doing, but to show you what is happening in your code!
I'll keep the code almost same as your code, except I'll use simple int rather than your JobDto and on completion of the job Execute() I'll write in a file that we can verify later. Here's the code
public class SomeMainClass
{
public void StartProcessing()
{
var jobList = Enumerable.Range(1, 10).ToArray();
var tasks = new Task[10];
//[1] start 10 jobs, one-by-one
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
//[4] here we have 10 awaitable Task in tasks
//[5] do all other unrelated operations
Thread.Sleep(1500); //assume it works for 1.5 sec
// Task.WaitAll(tasks); //[6] wait for tasks to complete
// The PROCESS IS COMPLETE here
}
public async Task ProcessJob(int jobTask)
{
try
{
//[2] start job in a ThreadPool, Background thread
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
//[3] await here will keep context of calling thread
await T; //... and release the calling thread
}
catch (Exception) { /*handle*/ }
}
}
public class JobWorker
{
static object locker = new object();
const string _file = #"C:\YourDirectory\out.txt";
public void Execute(int jobTask) //on complete, writes in file
{
Thread.Sleep(500); //let's assume does something for 0.5 sec
lock(locker)
{
File.AppendAllText(_file,
Environment.NewLine + "Writing the value-" + jobTask);
}
}
}
After running just the StartProcessing(), this is what I get in the file
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8/10 jobs has completed. Obviously, every time you run this, the number and order might change. But, the point is, all the jobs did not complete!
Now, if I un-comment the step [6] Task.WaitAll(tasks);, this is what I get in my file
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code is behaving like this, is already explained in the code-comments. The main thing to note is, your tasks run in ThreadPool based Background threads. So, if you do not wait for them, they will be killed when the MAIN process ends and the main thread exits!!
If you still don't want to await the tasks there, you can return the list of tasks from this first method and await the tasks at the very end of the process, something like this
public Task[] StartProcessing()
{
...
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
...
return tasks;
}
//in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all other stuffs here, and just at the end of process
Task.WaitAll(tasks);
Hope this clears all confusion.
It's possible your code is swallowing exceptions. I would add a ContineWith call to the end of the part of the code that starts the new task. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
}).ContinueWith(tsk =>
{
var flattenedException = tsk.Exception.Flatten();
Console.Log("Exception! " + flattenedException);
return true;
});
},TaskContinuationOptions.OnlyOnFaulted); //Only call if task is faulted
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; //set to something much greater than the time it should take your task to complete (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
// Task completed within timeout.
// Consider that the task may have faulted or been canceled.
// We re-await the task so that any exceptions/cancellation is rethrown.
await task;
}
else
{
// timeout/cancellation logic
}
Check out the documentation on exception handling in the TPL on MSDN.
I has a simple console app where I want to call many Urls in a loop and put the result in a database table. I am using .Net 4.5 and using async i/o to fetch the URL data. Here is a simplified version of what I am doing. All methods are async except for the database operation. Do you guys see any issues with this? Are there better ways of optimizing?
private async Task Run(){
var items = repo.GetItems(); // sync method to get list from database
var tasks = new List<Task>();
// add each call to task list and process result as it becomes available
// rather than waiting for all downloads
foreach(Item item in items){
tasks.Add(GetFromWeb(item.url).ContinueWith(response => { AddToDatabase(response.Result);}));
}
await Task.WhenAll(tasks); // wait for all tasks to complete.
}
private async Task<string> GetFromWeb(url) {
HttpResponseMessage response = await GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
private void AddToDatabase(string item){
// add data to database.
}
Your solution is acceptable. But you should check out TPL Dataflow, which allows you to set up a dataflow "mesh" (or "pipeline") and then shove the data through it.
For a problem this simple, Dataflow won't really add much other than getting rid of the ContinueWith (I always find manual continuations awkward). But if you plan to add more steps or change your data flow in the future, Dataflow should be something you consider.
Your solution is pretty much correct, with just two minor mistakes (both of which cause compiler errors). First, you don't call ContinueWith on the result of List.Add, you need call continue with on the task and then add the continuation to your list, this is solved by just moving a parenthesis. You also need to call Result on the reponse Task.
Here is the section with the two minor changes:
tasks.Add(GetFromWeb(item.url)
.ContinueWith(response => { AddToDatabase(response.Result);}));
Another option is to leverage a method that takes a sequence of tasks and orders them by the order that they are completed. Here is my implementation of such a method:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var taskSources = new BlockingCollection<TaskCompletionSource<T>>();
var taskSourceList = new List<TaskCompletionSource<T>>(taskList.Count);
foreach (var task in taskList)
{
var newSource = new TaskCompletionSource<T>();
taskSources.Add(newSource);
taskSourceList.Add(newSource);
task.ContinueWith(t =>
{
var source = taskSources.Take();
if (t.IsCanceled)
source.TrySetCanceled();
else if (t.IsFaulted)
source.TrySetException(t.Exception.InnerExceptions);
else if (t.IsCompleted)
source.TrySetResult(t.Result);
}, CancellationToken.None, TaskContinuationOptions.PreferFairness, TaskScheduler.Default);
}
return taskSourceList.Select(tcs => tcs.Task);
}
Using this your code can become:
private async Task Run()
{
IEnumerable<Item> items = repo.GetItems(); // sync method to get list from database
foreach (var task in items.Select(item => GetFromWeb(item.url))
.Order())
{
await task.ConfigureAwait(false);
AddToDatabase(task.Result);
}
}
Just though I'd throw in my hat as well with the Rx solution
using System.Reactive;
using System.Reactive.Linq;
private Task Run()
{
var fromWebObservable = from item in repo.GetItems.ToObservable(Scheduler.Default)
select GetFromWeb(item.url);
fromWebObservable
.Select(async x => await x)
.Do(AddToDatabase)
.ToTask();
}