There is something that is bugging me for a while, something that happen in my code I can't understand.
I've defined a workflow to extract information from the Facebook API, basically I have 3 different async tasks.
Execute Reports
private static async Task<List<DataTypes.ReportRun>> ExecuteMarketingReports(FacebookClient fb, List<DataTypes.Request.Batch> batch_requests)
{
//Do stuff - Make async POST requests to
//FB to execute reports on FB server side
}
Monitorize Reports
private static async Task<List<DataTypes.AdReportRun>> MonitorizeMarketingReports(FacebookClient fb, List<DataTypes.ReportRun> tListReportRun)
{
//Do stuff -- Check if the reports are ready or not, and return a list with status
}
GetReportData
private static async Task GetReportData(FacebookClient fb, List<DataTypes.AdReportRun> tListReportRun, DateTime since, DateTime until, string breakdown)
{
//Do stuff - Gets report thata once the reports are finish and create the files
}
This is the main Task where all the other methods are called
private static async Task PullInsightsData(FacebookClient fb, List<DataTypes.Request.Batch> batchRequests, DateTime since, DateTime until, string breakdown)
{
var tResult = new List<DataTypes.AdReportRun>();
int retry = 0;
List<DataTypes.AdReportRun> tReportCompleteList = new List<DataTypes.AdReportRun>();
List<DataTypes.AdReportRun> tReportIncompleteList = new List<DataTypes.AdReportRun>();
var report_ids = await ExecuteMarketingReports(fb, batchRequests);
Thread.Sleep(20000); // Waits 20 seconds before get the info.
do
{
/*Start monitorizing the reports*/
var tReport_info = await MonitorizeMarketingReports(fb, report_ids);
/*Get the reports that are complete*/
tReportCompleteList = tReport_info.Where(x => x.async_percent_completion == 100).ToList();
if (tReportCompleteList.Count > 0)
await GetReportData(fb, tReportCompleteList, since, until, breakdown);
tReportIncompleteList = tReport_info.Where(x => x.async_percent_completion < 100).ToList();
report_ids = (from x in tReportIncompleteList
select new DataTypes.ReportRun { report_run_id = x.id }).ToList();
var sleepTime = TimeSpan.FromSeconds(Math.Pow(2, retry + 1));
Thread.Sleep(sleepTime);
retry++;
} while (report_ids.Count > 0 && retry < 8);
}
I call my Main task in this foreach loop, and this is where the problem occurs.
for (int i = 0; i < ActiveAdAccounts.Count; i = i + 50)
{
var AdAccountsSubList = ActiveAdAccounts.Skip(i).Take(50).ToList();
var batchRequests = ....
await PullInsightsData(fb, batchRequests, (DateTime)since, (DateTime)until, breakdown.Replace(",", "_"));
//tTaskList.Add(PullInsightsData(fb, batchRequests, (DateTime)since, (DateTime)until, breakdown.Replace(",", "_")));
}
//Task.WaitAll(tTaskList);
I don't understand why the foreach loop does't continue using the await the console application closes suddenly, shouldn't await "wait" until the task is finish and so then proceed to the next line of code?
I've solved the problem putting all the tasks into a list and waiting for all, but I would like an explanation for this.
[EDITED] Question was edited to create a minimal reproducible example.
Related
I have a Task that generates a PDF file for an order (it takes about 10 seconds to create one PDF):
public async Task GeneratePDF(Guid Id) {
var order = await
_context
.Orders
.Include(order => order.Customer)
... //a lot more Include and ThenInclude statements
.FirstOrDefaultAsync(order ==> order.Id == Id);
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
_context.SaveChangesAsync();
}
I tried the following:
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
foreach (var id in orderIds)
{
_ = GeneratePDF(id).ContinueWith(t => Console.WriteLine(t.Exception), TaskContinuationOptions.OnlyOnFaulted);
}
}
this gives me the error:
System.ObjectDisposedException: Cannot access a disposed object. A common cause of this error is disposing a context that was resolved from dependency injection and then later trying to use the same context instance elsewhere in your application. This may occur if you are calling Dispose() on the context, or wrapping the context in a using statement. If you are using dependency injection, you should let the dependency injection container take care of disposing context instances.
If I change the task as follows...
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.Select(order=> order.Id).ToListAsync();
foreach (var id in orderIds)
{
_ = await GeneratePDF(id);
}
}
...it runs the task for each order in series, taking ages to complete (I have a few thousands orders, taking about 10 seconds per order)...
How can I run this task in parallel for all orders in the context, so that the time it takes to complete is much less than sequential processing?
You can map your order IDs to tasks and await them all like:
public async Task GeneratePDF(Order order) {
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
}
public async Task GenerateAllPDFs() {
var orderIds = await _context.Orders.ToListAsync();
var tasks = orderIds.Select((order) => GeneratePDF(order).ContinueWith(t => Console.WriteLine(t.Exception), TaskContinuationOptions.OnlyOnFaulted));
await Task.WhenAll(tasks);
await _context.SaveChangesAsync();
}
Here is my suggestion from the comment as an answer. I would split it in 3 parts:
1) get all orders,
2) then do a Parallel.Foreach to generate all documents in parallel. and assign each document to the proper order and in the end
3) do a single _context.SaveChangesAsync(); to make a bulk update on the data on the server
public async Task GenerateAllPDFs()
{
var allOrders = await _context.Orders.ToListAsync();
System.Threading.Tasks.Parallel.ForEach(allOrders, order =>
{
var document = ... //PDF generated here, takes about 10 seconds
order.PDF = document ;
});
await _context.SaveChangesAsync();
}
You need to implement parallel programing.
https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/task-based-asynchronous-programming
public class Example
{
public static void Main()
{
Task[] taskArray = new Task[10];
for (int i = 0; i < taskArray.Length; i++) {
taskArray[i] = Task.Factory.StartNew( (Object obj ) => {
CustomData data = obj as CustomData;
if (data == null)
return;
data.ThreadNum = Thread.CurrentThread.ManagedThreadId;
},
new CustomData() {Name = i, CreationTime = DateTime.Now.Ticks} );
}
Task.WaitAll(taskArray);
foreach (var task in taskArray) {
var data = task.AsyncState as CustomData;
if (data != null)
Console.WriteLine("Task #{0} created at {1}, ran on thread #{2}.",
data.Name, data.CreationTime, data.ThreadNum);
}
}
}
I think I will have to "duplicate" the GeneratePDF method to facilitate the batch processing by implementing the other answers , since I need this method also in non-batch mode...
I found some strange behavior running tasks within a for loop and awaiting them all. As mentioned below, what the code does is starting a "loop" for definite number of tasks, each task creates a "ToDo" item and each task has a continuation that assigns each created task to a person and finally but them in a listBox using Invoke so the UI thread is called. this works fine, but i do not get the expected data in the listBox. I do not expect them to be ordered, but i do expect them to be paired, e.g. :
Person_8, Todo 8
Person 5, Todo 5
etc...
And they should only appear ones in the listBox of course ! But instead, i get strange output (and output is never the same for each run), here are some examples running the code:
enter image description here
enter image description here
And here is the code:
private async void buttonCreateToDo_Click(object sender, EventArgs e){
await CreateToDoAsync();
}
private async Task CreateToDoAsync(){
List<Task> taskList = new List<Task>();
for (int i = 1; i < 10; i++){
var task = Task.Run(() => CreateToDo(i));
Task continuation = task.ContinueWith((antecedent) => Invoke(new AssignTaskDelegate(AssignTask), (new Person() {
Name = $"Person_{i}",
ToDoForPerson = antecedent.Result
})));
taskList.Add(task);
}
await Task.WhenAll(taskList.ToArray());
}
private ToDo CreateToDo(int toDoId) {
return new ToDo(){
Id = toDoId,
Description = $"Todo {toDoId}"
};
}
private void AssignTask(Person person){
listBoxToDo.Items.Add($"{person.Name}, {person.ToDoForPerson.Description}");
}
Your issue is that for loop runs much faster than the creation of the tasks and so by the time the tasks run the variable i has reached the end of the loop.
To fix this you need to take a copy of i inside the loop and use that.
Try this code:
private async Task CreateToDoAsync()
{
List<Task> taskList = new List<Task>();
for (int i = 1; i < 10; i++)
{
var local_i = i;
var task = Task.Run(() => CreateToDo(local_i));
Task continuation = task.ContinueWith((antecedent) => Invoke(new AssignTaskDelegate(AssignTask), (new Person()
{
Name = $"Person_{local_i}",
ToDoForPerson = antecedent.Result
})));
taskList.Add(task);
}
await Task.WhenAll(taskList.ToArray());
}
Now, in preference I'd use Microsoft's Reactive Framework (NuGet "System.Reactive") to do this work. Your code would look like this:
private async Task CreateToDoAsync()
{
var query =
from i in Observable.Range(1, 9)
from t in Observable.Start(() => CreateToDo(i))
select new Person() { Name = $"Person_{i}", ToDoForPerson = t };
await query.ObserveOn(listBoxToDo).Do(x => AssignTask(x));
}
Done. That's it.
When I run my code (with the AssignTask outputting to the Console) I get this:
Person_1, Todo 1
Person_2, Todo 2
Person_3, Todo 3
Person_6, Todo 6
Person_7, Todo 7
Person_4, Todo 4
Person_5, Todo 5
Person_8, Todo 8
Person_9, Todo 9
The .ObserveOn(listBoxToDo) works for Windows Forms to marshall back to the UI thread.
You are using variable "i" from the loop inside the task. "i" might have changed while your task started.
So I am trying to learn how to write asynchronous methods and have been banging my head to get asynchronous calls to work. What always seems to happen is the code hangs on "await" instruction until it eventually seems to time out and crash the loading form in the same method with it.
There are two main reason this is strange:
The code works flawlessly when not asynchronous and just a simple loop
I copied the MSDN code almost verbatim to convert the code to asynchronous calls here: https://msdn.microsoft.com/en-us/library/mt674889.aspx
I know there are a lot of questions already about this on the forms but I have gone through most of them and tried a lot of other ways (with the same result) and now seem to think something is fundamentally wrong after MSDN code wasn't working.
Here is the main method that is called by a background worker:
// this method loads data from each individual webPage
async Task LoadSymbolData(DoWorkEventArgs _e)
{
int MAX_THREADS = 10;
int tskCntrTtl = dataGridView1.Rows.Count;
Dictionary<string, string> newData_d = new Dictionary<string, string>(tskCntrTtl);
// we need to make copies of things that can change in a different thread
List<string> links = new List<string>(dataGridView1.Rows.Cast<DataGridViewRow>()
.Select(r => r.Cells[dbIndexs_s.url].Value.ToString()).ToList());
List<string> symbols = new List<string>(dataGridView1.Rows.Cast<DataGridViewRow>()
.Select(r => r.Cells[dbIndexs_s.symbol].Value.ToString()).ToList());
// we need to create a cancelation token once this is working
// TODO
using (LoadScreen loadScreen = new LoadScreen("Querying stock servers..."))
{
// we cant use the delegate becaus of async keywords
this.loaderScreens.Add(loadScreen);
// wait until the form is loaded so we dont get exceptions when writing to controls on that form
while ( !loadScreen.IsLoaded() );
// load the total number of operations so we can simplify incrementing the progress bar
// on seperate form instances
loadScreen.LoadProgressCntr(0, tskCntrTtl);
// try to run all async tasks since they are non-blocking threaded operations
for (int i = 0; i < tskCntrTtl; i += MAX_THREADS)
{
List<Task<string[]>> ProcessURL = new List<Task<string[]>>();
List<int> taskList = new List<int>();
// Make a list of task indexs
for (int task = i; task < i + MAX_THREADS && task < tskCntrTtl; task++)
taskList.Add(task);
// ***Create a query that, when executed, returns a collection of tasks.
IEnumerable<Task<string[]>> downloadTasksQuery =
from task in taskList select QueryHtml(loadScreen, links[task], symbols[task]);
// ***Use ToList to execute the query and start the tasks.
List<Task<string[]>> downloadTasks = downloadTasksQuery.ToList();
// ***Add a loop to process the tasks one at a time until none remain.
while (downloadTasks.Count > 0)
{
// Identify the first task that completes.
Task<string[]> firstFinishedTask = await Task.WhenAny(downloadTasks); // <---- CODE HANGS HERE
// ***Remove the selected task from the list so that you don't
// process it more than once.
downloadTasks.Remove(firstFinishedTask);
// Await the completed task.
string[] data = await firstFinishedTask;
if (!newData_d.ContainsKey(data.First()))
newData_d.Add(data.First(), data.Last());
}
}
// now we have the dictionary with all the information gathered from teh websites
// now we can add the columns if they dont already exist and load the information
// TODO
loadScreen.UpdateProgress(100);
this.loaderScreens.Remove(loadScreen);
}
}
And here is the async method for querying web pages:
async Task<string[]> QueryHtml(LoadScreen _loadScreen, string _link, string _symbol)
{
string data = String.Empty;
try
{
HttpClient client = new HttpClient();
var doc = new HtmlAgilityPack.HtmlDocument();
var html = await client.GetStringAsync(_link); // <---- CODE HANGS HERE
doc.LoadHtml(html);
string percGrn = doc.FindInnerHtml(
"//span[contains(#class,'time_rtq_content') and contains(#class,'up_g')]//span[2]");
string percRed = doc.FindInnerHtml(
"//span[contains(#class,'time_rtq_content') and contains(#class,'down_r')]//span[2]");
// create somthing we'll nuderstand later
if ((String.IsNullOrEmpty(percGrn) && String.IsNullOrEmpty(percRed)) ||
(!String.IsNullOrEmpty(percGrn) && !String.IsNullOrEmpty(percRed)))
throw new Exception();
// adding string to empty gives string
string perc = percGrn + percRed;
bool isNegative = String.IsNullOrEmpty(percGrn);
double percDouble;
if (double.TryParse(Regex.Match(perc, #"\d+([.])?(\d+)?").Value, out percDouble))
data = (isNegative ? 0 - percDouble : percDouble).ToString();
}
catch (Exception ex) { }
finally
{
// update the progress bar...
_loadScreen.IncProgressCntr();
}
return new string[] { _symbol, data };
}
I could really use some help. Thanks!
In short when you combine async with any 'regular' task functions you get a deadlock
http://olitee.com/2015/01/c-async-await-common-deadlock-scenario/
the solution is by using configureawait
var html = await client.GetStringAsync(_link).ConfigureAwait(false);
The reason you need this is because you didn't await your orginal thread.
// ***Create a query that, when executed, returns a collection of tasks.
IEnumerable<Task<string[]>> downloadTasksQuery = from task in taskList select QueryHtml(loadScreen,links[task], symbols[task]);
What's happeneing here is that you mix the await paradigm with thre regular task handling paradigm. and those don't mix (or rather you have to use the ConfigureAwait(false) for this to work.
I am using Async await with Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
try
{
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
await T;
}
This method I am calling inside a loop like this
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks opens up inside Process explorer and they also start working (based on log file). however out of 10 sometimes 8 or sometimes 7 finishes. Rest of them just never come back.
why would that be happening ?
Are they timing out ? Where can I set timeout for my tasks ?
UPDATE
Basically above, I would like each Task to start running as soon as they are called and wait for the response on AWAIT T keyword. I am assuming here that once they finish each of them will come back at Await T and do the next action. I am alraedy seeing this result for 7 out of 10 tasks but 3 of them are not coming back.
Thanks
It is hard to say what the issues is without the rest if the code, but you code can be simplified by making ProcessJob synchronous and then calling Task.Run with it.
public JobDto ProcessJob(JobDto jobTask)
{
JobWorker jobWorker = new JobWorker();
return jobWorker.Execute(jobTask);
}
Start tasks and wait for all tasks to finish. Prefer using Task.Run rather than Task.Factory.StartNew as it provides more favourable defaults for pushing work to the background. See here.
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = Task.Run(() => ProcessJob(jobList[i]));
}
try
{
await Task.WhenAll(tasks);
}
catch(Exception ex)
{
// handle exception
}
First, let's make a reproducible version of your code. This is NOT the best way to achieve what you are doing, but to show you what is happening in your code!
I'll keep the code almost same as your code, except I'll use simple int rather than your JobDto and on completion of the job Execute() I'll write in a file that we can verify later. Here's the code
public class SomeMainClass
{
public void StartProcessing()
{
var jobList = Enumerable.Range(1, 10).ToArray();
var tasks = new Task[10];
//[1] start 10 jobs, one-by-one
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
//[4] here we have 10 awaitable Task in tasks
//[5] do all other unrelated operations
Thread.Sleep(1500); //assume it works for 1.5 sec
// Task.WaitAll(tasks); //[6] wait for tasks to complete
// The PROCESS IS COMPLETE here
}
public async Task ProcessJob(int jobTask)
{
try
{
//[2] start job in a ThreadPool, Background thread
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
//[3] await here will keep context of calling thread
await T; //... and release the calling thread
}
catch (Exception) { /*handle*/ }
}
}
public class JobWorker
{
static object locker = new object();
const string _file = #"C:\YourDirectory\out.txt";
public void Execute(int jobTask) //on complete, writes in file
{
Thread.Sleep(500); //let's assume does something for 0.5 sec
lock(locker)
{
File.AppendAllText(_file,
Environment.NewLine + "Writing the value-" + jobTask);
}
}
}
After running just the StartProcessing(), this is what I get in the file
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8/10 jobs has completed. Obviously, every time you run this, the number and order might change. But, the point is, all the jobs did not complete!
Now, if I un-comment the step [6] Task.WaitAll(tasks);, this is what I get in my file
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code is behaving like this, is already explained in the code-comments. The main thing to note is, your tasks run in ThreadPool based Background threads. So, if you do not wait for them, they will be killed when the MAIN process ends and the main thread exits!!
If you still don't want to await the tasks there, you can return the list of tasks from this first method and await the tasks at the very end of the process, something like this
public Task[] StartProcessing()
{
...
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
...
return tasks;
}
//in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all other stuffs here, and just at the end of process
Task.WaitAll(tasks);
Hope this clears all confusion.
It's possible your code is swallowing exceptions. I would add a ContineWith call to the end of the part of the code that starts the new task. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
}).ContinueWith(tsk =>
{
var flattenedException = tsk.Exception.Flatten();
Console.Log("Exception! " + flattenedException);
return true;
});
},TaskContinuationOptions.OnlyOnFaulted); //Only call if task is faulted
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; //set to something much greater than the time it should take your task to complete (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
// Task completed within timeout.
// Consider that the task may have faulted or been canceled.
// We re-await the task so that any exceptions/cancellation is rethrown.
await task;
}
else
{
// timeout/cancellation logic
}
Check out the documentation on exception handling in the TPL on MSDN.
I'm using Asp .Net 4.5.1.
I have tasks to run, which call a web-service, and some might fail. I need to run N successful tasks which perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, let's assume we have 300 URLs in some collection. We need to run a function called Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, having only 5 run "at the same time" (in other words, have maximum 5 connections used at any given time). Also, we only want to perform N (assume 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP .Net
Assume IO call (http call to web serice), no heavy CPU operations.
Use Semaphore slim.
var semaphore = new SemaphoreSlim(5);
var tasks = urlCollection.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
};
while(tasks.Where(t => t.Completed).Count() < 100)
{
await.Task.WhenAny(tasks);
}
Although I would prefer to use Rx.Net to produce some better code.
using(var semaphore = new SemaphoreSlim(5))
{
var results = urlCollection.ToObservable()
.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
}).Take(100).ToList();
}
Okay...this is going to be fun.
public static class SemaphoreHelper
{
public static Task<T> ContinueWith<T>(
this SemaphoreSlim semaphore,
Func<Task<T>> action)
var ret = semaphore.WaitAsync()
.ContinueWith(action);
ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
return ret;
}
var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
url => semaphore.ContinueWith(() => CheckUrlAsync(url)).ToList();
I do need to add that the code as it stands will still run all 300 URLs, it just will return quicker...thats all. You would need to add the cancelation token to the semaphore.WaitAsync(token) to cancel the queued work. Again I suggest using Rx.Net for that. Its just easier to use Rx.Net to get the cancelation token to work with .Take(100).
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;
List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
for (int i = 0; i < c_limit; i++)
{
tasks.Add(Task.Run (() => {
// Your code here.
}));
}
Task.WaitAll(tasks);
totalRun += c_limit;
}