Random tasks from Task.Factory.StartNew never finish - c#

I am using async/await with the Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
try
{
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
await T;
}
I am calling this method inside a loop, like this:
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks open up inside Process Explorer and they also start working (based on the log file). However, out of 10, sometimes 8 or sometimes 7 finish. The rest of them just never come back.
Why would that be happening?
Are they timing out? Where can I set a timeout for my tasks?
UPDATE
Basically, above, I would like each Task to start running as soon as it is called and wait for the response at the await T keyword. I am assuming here that once they finish, each of them will come back at await T and do the next action. I am already seeing this result for 7 out of 10 tasks, but 3 of them are not coming back.
Thanks

It is hard to say what the issue is without the rest of the code, but your code can be simplified by making ProcessJob synchronous and then calling Task.Run with it.
public JobDto ProcessJob(JobDto jobTask)
{
JobWorker jobWorker = new JobWorker();
return jobWorker.Execute(jobTask);
}
Start tasks and wait for all tasks to finish. Prefer using Task.Run rather than Task.Factory.StartNew as it provides more favourable defaults for pushing work to the background. See here.
for(int i=0; i < jobList.Count(); i++)
{
    var job = jobList[i]; // copy to a local first: a for-loop variable is shared across iterations, so capturing i in the lambda is a bug
    tasks[i] = Task.Run(() => ProcessJob(job));
}
try
{
await Task.WhenAll(tasks);
}
catch(Exception ex)
{
// handle exception
}
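Putting it all together, a minimal sketch (assuming jobList is an in-memory collection of JobDto and this runs inside an async method): Select sidesteps the captured loop variable entirely, and Task.WhenAll returns the results in the same order as the input tasks.
// start one background task per job, then await them all
var tasks = jobList.Select(job => Task.Run(() => ProcessJob(job))).ToArray();
JobDto[] results = await Task.WhenAll(tasks);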

First, let's make a reproducible version of your code. This is NOT the best way to achieve what you are doing, but it shows what is happening in your code!
I'll keep the code almost the same as yours, except that I'll use a simple int rather than your JobDto, and on completion of the job Execute() I'll write to a file that we can verify later. Here's the code:
public class SomeMainClass
{
public void StartProcessing()
{
var jobList = Enumerable.Range(1, 10).ToArray();
var tasks = new Task[10];
//[1] start 10 jobs, one-by-one
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
//[4] here we have 10 awaitable Task in tasks
//[5] do all other unrelated operations
Thread.Sleep(1500); //assume it works for 1.5 sec
// Task.WaitAll(tasks); //[6] wait for tasks to complete
// The PROCESS IS COMPLETE here
}
public async Task ProcessJob(int jobTask)
{
try
{
//[2] start job in a ThreadPool, Background thread
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
//[3] await here will keep context of calling thread
await T; //... and release the calling thread
}
catch (Exception) { /*handle*/ }
}
}
public class JobWorker
{
static object locker = new object();
const string _file = @"C:\YourDirectory\out.txt";
public void Execute(int jobTask) //on complete, writes in file
{
Thread.Sleep(500); //let's assume does something for 0.5 sec
lock(locker)
{
File.AppendAllText(_file,
Environment.NewLine + "Writing the value-" + jobTask);
}
}
}
After running just StartProcessing(), this is what I get in the file:
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8/10 jobs have completed. Obviously, every time you run this, the number and order might change. But the point is, not all of the jobs completed!
Now, if I un-comment the step [6] Task.WaitAll(tasks);, this is what I get in my file
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code behaves like this is already explained in the code comments. The main thing to note is that your tasks run on ThreadPool-based background threads. So, if you do not wait for them, they will be killed when the MAIN process ends and the main thread exits!!
If you still don't want to await the tasks there, you can return the list of tasks from this first method and await the tasks at the very end of the process, something like this
public Task[] StartProcessing()
{
...
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
...
return tasks;
}
//in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all other stuffs here, and just at the end of process
Task.WaitAll(tasks);
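As a side note, if your entry point can itself be async (C# 7.1 and later), awaiting is preferable to blocking with Task.WaitAll:
await Task.WhenAll(tasks);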
Hope this clears all confusion.

It's possible your code is swallowing exceptions. I would add a ContinueWith call to the end of the part of the code that starts the new task. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
    JobWorker jobWorker = new JobWorker();
    jobWorker.Execute(jobTask);
}).ContinueWith(tsk =>
{
    var flattenedException = tsk.Exception.Flatten();
    Console.WriteLine("Exception! " + flattenedException);
}, TaskContinuationOptions.OnlyOnFaulted); // only runs if the antecedent task faulted
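Alternatively, if you keep the original code (no continuation) and simply await the StartNew task, a try/catch around the await observes the same exception, because await rethrows a faulted task's exception:
try
{
    await T;
}
catch (Exception ex)
{
    Console.WriteLine("Exception! " + ex);
}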
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; //set to something much greater than the time it should take your task to complete (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
// Task completed within timeout.
// Consider that the task may have faulted or been canceled.
// We re-await the task so that any exceptions/cancellation is rethrown.
await task;
}
else
{
// timeout/cancellation logic
}
Check out the documentation on exception handling in the TPL on MSDN.

Related

Return data from long running Task on demand

I want to create a Task, which may run for many minutes, collecting data via an API call to another system. At some point in the future I need to stop the task and return the collected data. This future point is unknown at the time of starting the task.
I have read many questions about returning data from tasks, but I can't find any that answer this scenario. I may be missing a trick, but all of the examples actually seem to wait in the main thread for the task to finish before continuing. This seems counter-intuitive; surely the purpose of a task is to hand off an activity whilst continuing with other activities in your main thread?
Here is one of those many examples, taken from DotNetPearls.
namespace TaskBasedAsynchronousProgramming
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine($"Main Thread Started");
Task<double> task1 = Task.Run(() =>
{
return CalculateSum(10);
});
Console.WriteLine($"Sum is: {task1.Result}");
Console.WriteLine($"Main Thread Completed");
Console.ReadKey();
}
static double CalculateSum(int num)
{
double sum = 0;
for (int count = 1; count <= num; count++)
{
sum += count;
}
return sum;
}
}
}
Is it possible to do what I need, and have a long-running task running in parallel, stop it and return the data at an arbitrary future point?
Here is a sample application showing how you can do that:
static double partialResult = -1;
static void Main()
{
CancellationTokenSource calculationEndSignal = new(TimeSpan.FromSeconds(3));
Task meaningOfLife = Task.Run(() =>
GetTheMeaningOfLife(calculationEndSignal.Token),
calculationEndSignal.Token);
calculationEndSignal.Token.Register(() => Console.WriteLine(partialResult));
Console.ReadLine();
}
static async Task GetTheMeaningOfLife(CancellationToken cancellationToken)
{
foreach (var semiResult in Enumerable.Range(1, 42))
{
partialResult = semiResult;
cancellationToken.ThrowIfCancellationRequested();
await Task.Delay(1000);
}
}
partialResult is a shared variable between the two threads
The worker thread (GetTheMeaningOfLife) only writes it
The main thread (Main) only reads it
The read operation is performed only after the Task has been cancelled
calculationEndSignal is used to cancel the long-running operation
I have specified a timeout, but you can call the Cancel method whenever you want
meaningOfLife is the Task which represents the long-running operation call
I have passed the CancellationToken to the GetTheMeaningOfLife and to the Task.Run as well
For this very simple example the Task.Run does not need to receive the token, but it is generally good practice to pass it there as well
Register receives a callback which will be called after the token is cancelled
ReadLine can be any other computation
I've used ReadLine to keep the application running
GetTheMeaningOfLife simply increments the partialResult shared variable
either until it reaches the meaning of life
or until it is cancelled
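For example, to cancel manually instead of relying on the timeout (a trivial variation of the code above):
CancellationTokenSource calculationEndSignal = new();
// ... start the task as above, then, at the arbitrary future point:
calculationEndSignal.Cancel();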
Here is one approach. It features a CancellationTokenSource that is used as a stopping mechanism, instead of its normal usage as a cancellation mechanism. That's because you want to get the partial results, and a canceled Task does not propagate results:
CancellationTokenSource stoppingTokenSource = new();
Task<List<int>> longRunningTask = Task.Run(() =>
{
List<int> list = new();
for (int i = 1; i <= 60; i++)
{
if (stoppingTokenSource.IsCancellationRequested) break;
// Simulate a synchronous operation that has 1 second duration.
Thread.Sleep(1000);
list.Add(i);
}
return list;
});
Then, somewhere else in your program, you can send a stopping signal to the task, and then await asynchronously until the task acknowledges the signal and completes successfully. The await will also propagate the partial results:
stoppingTokenSource.Cancel();
List<int> partialResults = await longRunningTask;
Or, if you are not in an asynchronous workflow, you can wait synchronously until the partial results are available:
stoppingTokenSource.Cancel();
List<int> partialResults = longRunningTask.Result;

Await async call in for-each-loop [duplicate]

This question already has answers here:
Using async/await for multiple tasks
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.
public static async Task<List<Deployment>> GetDeployments()
{
try
{
var depjson = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
var deployments = (JsonConvert.DeserializeObject<DeploymentWrapper>(depjson))?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
return deployments;
}
catch (Exception)
{
throw;
}
}
This all works perfectly fine. However, I do not like the await in the foreach-loop at all. I also believe this is not considered good practice. I just don't see how to refactor this so the calls are made parallel, because the result of each call is used to set a property of the deployment.
I would appreciate any suggestions on how to make this method faster and, whenever possible, avoid the await-ing in the foreach-loop.
There is nothing wrong with what you are doing now. But there is a way to start all tasks at once instead of waiting for a single task, then processing it, and then waiting for another one.
This is how you can turn this:
wait for one -> process -> wait for one -> process ...
into
wait for all -> process -> done
Convert this:
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
To:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
First you take a list of unfinished tasks. Then you await it and you get a collection of results (reljsons). Task.WhenAll returns the results in the same order as the input tasks, which is why indexing deployments and reljsons in parallel is safe. Then you have to deserialize them and assign them to Release.
By using await Task.WhenAll() you wait for all the tasks at the same time, so you should see a performance boost from that.
Let me know if there are typos, I didn't compile this code.
Fcin suggested starting all tasks, awaiting them all to finish, and then deserializing the fetched data.
However, if the first task has already finished while the second task has not, and the second task is internally awaiting, the first task could already start deserializing. This would shorten the time that your process spends idly waiting.
So instead of:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
I'd suggest the following slight change:
// async fetch the Release data of Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
return JsonConvert.DeserializeObject<Release>(reljson);
}
// async fill the Release data of Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
deployment.Release = await FetchReleaseDataAsync(deployment);
}
Then your procedure is similar to the solution that Fcin suggested:
IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments
    .Select(deployment => FillReleaseDataAsync(deployment))
    .ToList();
await Task.WhenAll(tasksFillDeploymentWithReleaseData);
Now if the first task has to wait while fetching the release data, the 2nd task begins, and the third, etc. If the first task has already finished fetching the release data, but the other tasks are still awaiting their release data, the first task already starts deserializing it and assigns the result to deployment.Release, after which the first task is complete.
If, for instance, the 7th task got its data but the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release. Task 7 is completed.
This continues until all tasks are completed. Using this method there is less waiting time, because as soon as one task has its data it is scheduled to start deserializing.
If I understand you right, and you want to make the var reljson = await GetJson calls run in parallel, try this:
Parallel.ForEach(deployments, async (deployment) =>
{
    // note: the lambda must be marked async to compile, but Parallel.ForEach
    // does not await async lambdas, so these iterations become fire-and-forget;
    // see the SemaphoreSlim sketch after this answer for an async-friendly alternative
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
You might limit the number of parallel executions, such as:
Parallel.ForEach(
    deployments,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (deployment) =>
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    });
You might also want to be able to break the loop:
Parallel.ForEach(deployments, async (deployment, state) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    if (noFurtherProcessingRequired) state.Break();
});
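Caveat: Parallel.ForEach is designed for CPU-bound work and does not await async lambdas, so the loop can return while requests are still in flight. A sketch of an async-friendly alternative that throttles concurrency with SemaphoreSlim (assuming the same GetJson helper and BASEURL as in the question):
var throttle = new SemaphoreSlim(4); // allow at most 4 concurrent requests
var tasks = deployments.Select(async deployment =>
{
    await throttle.WaitAsync();
    try
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    }
    finally
    {
        throttle.Release();
    }
});
await Task.WhenAll(tasks);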

Tasks not finishing before process end

In my code in the main thread I call a 3rd party API. For each result from the API I call 2 async tasks. Sometimes all works perfectly, sometimes not all async tasks run. I suppose that when the main thread finishes, the garbage collector kills all my other tasks that run in the background. Is there any way to tell garbage collector not to kill the background services when the main thread finishes?
The code is like this:
for (int i = 0; i < 1000; i++)
{
var demo = new AsyncAwaitTest();
demo.DoStuff1(guid);
demo.DoStuff2(guid);
}
public class AsyncAwaitTest
{
public async Task DoStuff1(string guid)
{
await Task.Run(() =>
{
DoSomething1(guid);
});
}
public async Task DoStuff2(string guid)
{
await Task.Run(() =>
{
DoSomething2(guid);
});
}
private static async Task<int> DoSomething1(string guid)
{
// insert in db or something else
return 1;
}
private static async Task<int> DoSomething2(string guid)
{
// insert in db or something else
return 1;
}
Thanks
If you want to wait for all your tasks to finish, you have to collect them and wait for them, because when your process ends, that is normally the end of everything.
var tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
var demo = new AsyncAwaitTest();
tasks.Add(demo.DoStuff1(guid));
tasks.Add(demo.DoStuff2(guid));
}
// before your process ends, you need to make sure all tasks have finished.
Task.WaitAll(tasks.ToArray());
You also have 2 levels of carelessness (meaning you start a task and don't care whether it's done). You need to remove the second one, too:
public async Task DoStuff1(string guid)
{
await DoSomething1(guid);
}
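Once DoStuff1 does nothing but await a single task, you can also elide the async/await pair and return the inner task directly; the caller awaits it either way:
public Task DoStuff1(string guid)
{
    return DoSomething1(guid); // Task<int> is assignable to Task
}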

Run X number of Task<T> at any given time while keeping UI responsive

I have a C# WinForms (.NET 4.5.2) app utilizing the TPL. The tool has a synchronous function which is passed over to a task factory X amount of times (with different input parameters), where X is a number declared by the user before commencing the process. The tasks are started and stored in a List<Task>.
Assuming the user entered 5, we have this in an async button click handler:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
Each progress instance updates the UI.
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time, unless the user cancels via the UI (I'll use cancellation tokens for this). I try to achieve this using the following:
while (TaskList.Count > 0)
{
var completed = await Task.WhenAny(TaskList.ToArray());
if (completed.Exception == null)
{
// report success
}
else
{
// flatten AggregateException, print out, etc
}
// update some labels/textboxes in the UI, and then:
TaskList.Remove(completed);
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
This is bogging down the UI. Is there a better way of achieving this functionality, while keeping the UI responsive?
A suggestion was made in the comments to use TPL Dataflow, but due to time constraints and specs, alternative solutions are welcome.
Update
I'm not sure whether the progress reporting might be the problem? Here's what it looks like:
private IProgress<string> Progress()
{
return new Progress<string>(msg =>
{
txtMsg.AppendText(msg);
});
}
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time
It sounds to me like you want an infinite loop inside your task:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = RunIndefinitelyAsync(progress);
TaskList.Add(task);
}
private async Task RunIndefinitelyAsync(IProgress<string> progress)
{
while (true)
{
try
{
await Task.Run(() => MyFunction(progress));
// handle success
}
catch (Exception ex)
{
// handle exceptions
}
// update some labels/textboxes in the UI
}
}
However, I suspect that the "bogging down the UI" is probably in the // handle success and/or // handle exceptions code. If my suspicion is correct, then push as much of the logic into the Task.Run as possible.
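For example, if the success handling itself is expensive, move it inside the delegate (a sketch; ReportSuccess is a hypothetical stand-in for your own post-processing):
await Task.Run(() =>
{
    var result = MyFunction(progress);
    ReportSuccess(result); // hypothetical: heavy post-processing stays off the UI thread
});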
As I understand it, you simply need parallel execution with a defined degree of parallelism. There are a lot of ways to implement what you want. I suggest using a blocking collection and the Parallel class instead of tasks.
So when the user clicks the button, you need to create a new blocking collection which will be your data source:
BlockingCollection<IProgress<string>> queue = new BlockingCollection<IProgress<string>>();
CancellationTokenSource source = new CancellationTokenSource();
Now you need a runner that will execute your function in parallel:
Task.Factory.StartNew(() =>
Parallel.For(0, X, i =>
{
foreach (IProgress<string> p in queue.GetConsumingEnumerable(source.Token))
{
MyFunction(p);
}
}), source.Token);
Or you can choose the more correct way, with a partitioner. You'll need a partitioner class:
private class BlockingPartitioner<T> : Partitioner<T>
{
private readonly BlockingCollection<T> _Collection;
private readonly CancellationToken _Token;
public BlockingPartitioner(BlockingCollection<T> collection, CancellationToken token)
{
_Collection = collection;
_Token = token;
}
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
throw new NotImplementedException();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return _Collection.GetConsumingEnumerable(_Token);
}
public override bool SupportsDynamicPartitions
{
get { return true; }
}
}
And the runner will look like this:
ParallelOptions Options = new ParallelOptions();
Options.MaxDegreeOfParallelism = X;
Task.Factory.StartNew(
() => Parallel.ForEach(
new BlockingPartitioner<IProgress<string>>(queue, source.Token),
Options,
p => MyFunction(p)));
So all you need now is to fill the queue with the necessary data. You can do it whenever you want.
And as a final touch, when the user cancels the operation, you have two options:
first, you can break execution with a source.Cancel call,
or you can gracefully stop execution by marking the collection complete (queue.CompleteAdding); in that case the runner will execute all of the already-queued data and finish.
Of course you need additional code to handle exceptions, progress, state and so on. But the main idea is here.
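For example (a minimal sketch, assuming progressItem is an IProgress<string> created elsewhere):
queue.Add(progressItem);  // enqueue work as it arrives
// ... later, to stop gracefully:
queue.CompleteAdding();   // the runner drains the queue, then finishes
// ... or to stop immediately:
source.Cancel();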

c# do the equivalent of restarting a Task with some parameter

The main idea here is to fetch some data from somewhere, when it's fetched start writing it, and then prepare the next batch of data to be written, while waiting for the previous write to be complete.
I know that a Task cannot be restarted or reused (nor should it be), although I am trying to find a way to do something like this :
//The "WriteTargetData" method should take the "data" variable
//created in the loop below as a parameter
//WriteData basically do a shedload of mongodb upserts in a separate thread,
//it takes approx. 20-30 secs to run
var task = new Task(() => WriteData(somedata));
//GetData also takes some time.
foreach (var data in queries.Select(GetData))
{
if (task.Status != TaskStatus.Running)
{
//start task with "data" as a parameter
//continue the loop to prepare the next batch of data to be written
}
else
{
//wait for task to be completed
//"restart" task
//continue the loop to prepare the next batch of data to be written
}
}
Any suggestions appreciated! Thanks. I don't necessarily want to use Task, I just think it might be the way to go.
This may be oversimplifying your requirements, but would simply "waiting" for the previous task to complete work for you? You can use Task.WaitAny and Task.WaitAll to wait for previous operations to complete.
pseudo code:
// Method that makes calls to fetch and write data.
public async Task DoStuff()
{
Task currTask = null;
object somedata = await FetchData();
while (somedata != null)
{
// Wait for previous task.
if (currTask != null)
Task.WaitAny(currTask);
currTask = WriteData(somedata);
somedata = await FetchData();
}
}
// Whatever method fetches data.
public Task<object> FetchData()
{
var data = new object();
return Task.FromResult(data);
}
// Whatever method writes data.
public Task WriteData(object somedata)
{
return Task.Factory.StartNew(() => { /* write data */});
}
The Task class is not designed to be restarted, so you need to create a new task and run the body with the same parameters. Also, I do not see where you start the task with the WriteData function in its body; that would effectively eliminate the call of if (task.Status != TaskStatus.Running). There are, AFAIK, only the classes Task and Thread, where a Task is just the abstraction of an action that will be scheduled with the TaskScheduler and executed on different threads (when we are talking about the common task scheduler, the one you get when you call TaskFactory.Scheduler), and the number of threads is equal to the number of processor cores.
As to your business app: why do you wait for the execution of WriteData? Would it not be a lot easier to gather all the data and then submit it in one big write, as sketched below?
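A minimal sketch of that idea (assuming WriteData can accept the whole batch, which may or may not hold for your mongodb upserts):
// gather everything first, then write once
var allData = queries.Select(GetData).ToList();
WriteData(allData);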
Something like this?
public void Do()
{
var task = StartTask(500);
var array = new[] {1000, 2000, 3000};
foreach (var data in array)
{
if (task.IsCompleted)
{
task = StartTask(data);
}
else
{
task.Wait();
task = StartTask(data);
}
}
}
private Task StartTask(int data)
{
var task = new Task(DoSmth, data);
task.Start();
return task;
}
private void DoSmth(object time)
{
Thread.Sleep((int) time);
}
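As a side note, new Task(...) followed by Start() can be expressed more idiomatically with the Task.Factory.StartNew overload that takes a state object:
private Task StartTask(int data)
{
    // StartNew(Action<object>, object state) schedules and starts the task in one call
    return Task.Factory.StartNew(DoSmth, data);
}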
You can use a thread and an AutoResetEvent. I have code like this for several different threads in my program:
These are variable declarations that belong to the main program.
public AutoResetEvent StartTask = new AutoResetEvent(false);
public bool IsStopping = false;
public Thread RepeatingTaskThread;
Somewhere in your initialization code:
RepeatingTaskThread = new Thread(new ThreadStart(RepeatingTaskProcessor)) { IsBackground = true };
RepeatingTaskThread.Start();
Then the method that runs the repeating task would look something like this:
private void RepeatingTaskProcessor() {
// Keep looping until the program is going down.
while (!IsStopping) {
// Wait to receive notification that there's something to process.
StartTask.WaitOne();
// Exit if the program is stopping now.
if (IsStopping) return;
// Execute your task
PerformTask();
}
}
If there are several different tasks you want to run, you can add a variable that would indicate which one to process and modify the logic in PerformTask to pick which one to run.
I know that it doesn't use the Task class, but there's more than one way to skin a cat & this will work.
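For completeness, the trigger and shutdown would look something like this (a sketch; place these wherever your main program raises work or exits):
// when there is something to process:
StartTask.Set();
// on shutdown:
IsStopping = true;
StartTask.Set(); // wake the thread so it can observe IsStopping and exit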
