This question already has answers here:
Task.WaitAll freezes app C#
(4 answers)
I have a site scraper that scrapes a site with paged results.
For every page I try to run a Task to make it faster, but the app freezes.
var pageCount = getPageCount(txtSearchQuery.Text);
var tasks = new Task[pageCount];
var link = txtSearchQuery.Text;
for (var i = 1; i <= pageCount; i++)
{
tasks[i-1] = new Task(new Action(() => { Scrape(link, i); }));
tasks[i-1].Start();
}
Task.WaitAll(tasks);
MessageBox.Show("Complete");
What am I doing wrong?
I am assuming you are using a version of .NET that supports async/await (.NET 4.5 or later).
Change your method signature to make it async and await the result of your tasks using Task.WhenAll. This frees up the UI thread, i.e. it won't hang the UI.
// normally you do not return void but a Task, BUT for event handlers
// (like a WPF or WinForms button click) void is required instead
protected async void MyMethod() {
// Task.WaitAll(tasks); // replace this with
await Task.WhenAll(tasks); // this will not hang your UI
// rest of your code that you want to execute
}
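For reference, here is a minimal console sketch of the same pattern. ScrapePage is a hypothetical stand-in for Scrape that simulates work with Thread.Sleep, and ScraperSketch is an illustrative name. In a WinForms or WPF click handler, the await would hand the UI thread back to the message loop instead of blocking it. The loop also copies i into a local, because a lambda such as () => Scrape(link, i) captures the loop variable itself and may read it after the loop has moved on.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ScraperSketch
{
    // Hypothetical stand-in for Scrape(link, page): pretends to download one page.
    static string ScrapePage(string link, int page)
    {
        Thread.Sleep(50); // simulated network latency
        return $"{link} page {page}";
    }

    public static async Task<string[]> ScrapeAllAsync(string link, int pageCount)
    {
        var tasks = new List<Task<string>>();
        for (var i = 1; i <= pageCount; i++)
        {
            var page = i; // copy the loop variable so each task sees its own page number
            tasks.Add(Task.Run(() => ScrapePage(link, page)));
        }
        // Unlike Task.WaitAll, awaiting Task.WhenAll does not block the calling thread.
        // The results come back in the same order the tasks were added.
        return await Task.WhenAll(tasks);
    }
}
```

In an async void click handler you could then write var pages = await ScraperSketch.ScrapeAllAsync(txtSearchQuery.Text, pageCount); followed by MessageBox.Show("Complete");.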
Task.WaitAll blocks the calling thread, which here is the UI thread. Instead, use Task.Factory.ContinueWhenAll and do your work in the continuation, which runs after all the tasks have completed.
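var pageCount = getPageCount(txtSearchQuery.Text);
var tasks = new Task[pageCount];
var link = txtSearchQuery.Text;
for (var i = 1; i <= pageCount; i++)
{
var page = i; // copy i so each task scrapes its own page
tasks[i-1] = new Task(new Action(() => { Scrape(link, page); }));
tasks[i-1].Start();
}
Task.Factory.ContinueWhenAll(tasks, completedTasks =>
{
// Do continuation work, e.g. MessageBox.Show("Complete");
}, CancellationToken.None, TaskContinuationOptions.None,
TaskScheduler.FromCurrentSynchronizationContext()); // run the continuation on the UI thread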
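A self-contained sketch of ContinueWhenAll with results, assuming a hypothetical per-page job CountItems (and an illustrative class name). The continuation receives the completed tasks and can read their results. By default it typically runs on a thread-pool thread, so in a UI app you would use the overload that takes a TaskScheduler and pass TaskScheduler.FromCurrentSynchronizationContext() before touching controls.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class ContinueWhenAllSketch
{
    // Hypothetical per-page job: returns the number of items found on a page.
    static int CountItems(int page) => page * 10;

    public static Task<int> CountAllPages(int pageCount)
    {
        var tasks = new Task<int>[pageCount];
        for (var i = 1; i <= pageCount; i++)
        {
            var page = i; // local copy avoids the captured-loop-variable bug
            tasks[i - 1] = Task.Run(() => CountItems(page));
        }
        // The continuation runs once every task has finished; nothing blocks while waiting.
        return Task.Factory.ContinueWhenAll(tasks, completed => completed.Sum(t => t.Result));
    }
}
```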
This question already has answers here:
Captured variable in a loop in C#
(10 answers)
I am trying to get the content length of multiple pages using Task in .NET Core asynchronously:
public static async Task GetContentLengthAsync(string url)
{
HttpClient httpClient = new HttpClient();
var httpResponse = await httpClient.GetAsync(url);
var page = await httpResponse.Content.ReadAsStringAsync();
Console.WriteLine(page.Length);
}
If I use it like this to take the first 100 pages' content length:
for (int i = 1; i <= 100; i++)
{
Task.Run(() => GetContentLengthAsync($"https://www.website.com/view/{i}"));
}
=> the outputs are all the same, or different but incorrect (though I get the results very fast).
If instead I await each Task, like this, and just call GetLengthsAsync():
public static async void GetLengthsAsync()
{
for (int i = 1; i <= 100; i++)
{
await Task.Run(() => GetContentLengthAsync($"https://www.website.com/view/{i}"));
}
}
=> the output is correct, and I can still type in the console and do other things in the meantime, but each GetContentLengthAsync task waits for the previous one to complete, so only one runs at a time. Is there a way to make them run not only asynchronously but also concurrently on multiple threads, without losing information?
P.S. I want to use Tasks because it's a university project, and I know there are probably better ways of doing it (but those are the requirements). It's more of a problem-solving task to better understand how Task works.
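A lambda in a for loop captures the loop variable itself, not its value at that iteration, and a for loop has a single i shared by all iterations. By the time most tasks ran, the loop had already finished, so most requests would have gone to /view/101 (the value that ends the loop). Copy i into a local variable inside the loop: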
for (int i = 1; i <= 100; i++)
{
int localCopy = i;
Task.Run(() => GetContentLengthAsync($"https://www.website.com/view/{localCopy}"));
}
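The capture is easy to see without any HTTP at all. In this sketch the lambdas are built in a loop but only run afterwards: the ones that read i all see its final value, while the ones that read a per-iteration copy see 1, 2, 3. The second method shows the concurrent version the question asks about: the copy keeps each request tied to its page, and awaiting Task.WhenAll keeps the results in page order. FakeContentLengthAsync is a hypothetical stand-in for GetContentLengthAsync that returns a made-up length instead of downloading anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CaptureSketch
{
    // Builds lambdas in a for loop and runs them only after the loop has finished.
    public static (int[] captured, int[] copied) Demo()
    {
        var capturedLambdas = new List<Func<int>>();
        var copiedLambdas = new List<Func<int>>();
        for (int i = 1; i <= 3; i++)
        {
            capturedLambdas.Add(() => i);       // captures the single loop variable
            int localCopy = i;
            copiedLambdas.Add(() => localCopy); // captures a fresh variable per iteration
        }
        // By now i is 4, the value that ended the loop.
        return (capturedLambdas.Select(f => f()).ToArray(),
                copiedLambdas.Select(f => f()).ToArray());
    }

    // Hypothetical stand-in for GetContentLengthAsync: pretends page n is n * 100 characters long.
    static async Task<int> FakeContentLengthAsync(int page)
    {
        await Task.Delay(20);
        return page * 100;
    }

    // Starts every request at once and waits for all of them without blocking.
    public static Task<int[]> GetLengthsAsync(int pageCount)
    {
        var tasks = new List<Task<int>>();
        for (int i = 1; i <= pageCount; i++)
        {
            int page = i;
            tasks.Add(Task.Run(() => FakeContentLengthAsync(page)));
        }
        return Task.WhenAll(tasks); // results are in the same order as the tasks
    }
}
```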
This question already has answers here:
Running multiple async tasks and waiting for them all to complete
(10 answers)
I have code that calls a function in a loop; the function accesses endpoints asynchronously:
public async Task HitEndPointsAsync()
{
for (int i = 0; i < _urls.Length; i++)
{
await HitOneEndPointAsync(i);
}
}
The simplified function HitOneEndPointAsync looks like this:
private async Task HitOneEndPointAsync(int i)
{
var newRec = InsertRec(i);
ExtURL extUrl = new ExtURL(_urls[i]);
result = await extUrl.GetAsync(_parms[i]);
UpdateRec(i, result);
}
If I remove the await in HitEndPointsAsync, the await inside HitOneEndPointAsync is no longer effective. I need all endpoints to be called at once, but each call should still await its response so that it can process it further. Would this be an option? As soon as I remove the await at the call site, the await further down is ignored. Thoughts?
You could try something like this if you don't want to await within the loop:
public async Task HitEndPointsAsync()
{
var tasks = new List<Task>();
for (int i = 0; i < _urls.Length; i++)
{
tasks.Add(HitOneEndPointAsync(i));
}
await Task.WhenAll(tasks);
}
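Calling HitOneEndPointAsync without awaiting it does not add extra threads, and it does not disable the await inside it: each call runs synchronously up to its first await and then returns a Task, so all the requests are in flight at once, and each call still awaits its own response before calling UpdateRec. Task.WhenAll then waits for all of them. Do check that InsertRec and UpdateRec are safe to call concurrently, because when there is no synchronization context (for example in a console app or ASP.NET Core), the code after each await may run on different thread-pool threads at the same time. If this doesn't match what you are trying to do, add more context to your question.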
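A console sketch of that version, showing that the per-call await is still honoured: all calls are in flight at the same time, yet each one records its result only after its own simulated response arrives. SimulatedGetAsync is a hypothetical stand-in for extUrl.GetAsync, the Results array stands in for UpdateRec, and EndpointSketch is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class EndpointSketch
{
    readonly object _gate = new object();
    int _inFlight;
    public int MaxInFlight { get; private set; }
    public string[] Results { get; }

    public EndpointSketch(int count) => Results = new string[count];

    // Hypothetical stand-in for extUrl.GetAsync(_parms[i]).
    static async Task<string> SimulatedGetAsync(int i)
    {
        await Task.Delay(100);
        return $"response {i}";
    }

    async Task HitOneEndPointAsync(int i)
    {
        lock (_gate) { _inFlight++; MaxInFlight = Math.Max(MaxInFlight, _inFlight); }
        var result = await SimulatedGetAsync(i); // this await still applies to each call
        lock (_gate) { _inFlight--; }
        Results[i] = result; // stand-in for UpdateRec(i, result)
    }

    public async Task HitEndPointsAsync()
    {
        var tasks = new List<Task>();
        for (int i = 0; i < Results.Length; i++)
        {
            tasks.Add(HitOneEndPointAsync(i)); // starts the call without waiting for it
        }
        await Task.WhenAll(tasks); // waits for every call, including its post-processing
    }
}
```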
I found some strange behavior when running tasks in a for loop and awaiting them all. As shown below, the code starts a loop for a fixed number of tasks. Each task creates a "ToDo" item, and each task has a continuation that assigns the created ToDo to a person and finally puts it in a listBox using Invoke, so that the update happens on the UI thread. This works, but I do not get the expected data in the listBox. I do not expect the items to be ordered, but I do expect them to be paired, e.g.:
Person_8, Todo 8
Person_5, Todo 5
etc...
And each should appear only once in the listBox, of course! Instead I get strange output, and the output is different on every run.
And here is the code:
private async void buttonCreateToDo_Click(object sender, EventArgs e){
await CreateToDoAsync();
}
private async Task CreateToDoAsync(){
List<Task> taskList = new List<Task>();
for (int i = 1; i < 10; i++){
var task = Task.Run(() => CreateToDo(i));
Task continuation = task.ContinueWith((antecedent) => Invoke(new AssignTaskDelegate(AssignTask), (new Person() {
Name = $"Person_{i}",
ToDoForPerson = antecedent.Result
})));
taskList.Add(task);
}
await Task.WhenAll(taskList.ToArray());
}
private ToDo CreateToDo(int toDoId) {
return new ToDo(){
Id = toDoId,
Description = $"Todo {toDoId}"
};
}
private void AssignTask(Person person){
listBoxToDo.Items.Add($"{person.Name}, {person.ToDoForPerson.Description}");
}
Your issue is that the lambdas capture the variable i itself, not its value at each iteration. The for loop runs much faster than the tasks and their continuations, so by the time they run, i has usually reached its final value (10).
To fix this you need to take a copy of i inside the loop and use that.
Try this code:
private async Task CreateToDoAsync()
{
List<Task> taskList = new List<Task>();
for (int i = 1; i < 10; i++)
{
var local_i = i;
var task = Task.Run(() => CreateToDo(local_i));
Task continuation = task.ContinueWith((antecedent) => Invoke(new AssignTaskDelegate(AssignTask), (new Person()
{
Name = $"Person_{local_i}",
ToDoForPerson = antecedent.Result
})));
taskList.Add(continuation); // await the continuation so the method completes only after the list box is updated
}
await Task.WhenAll(taskList.ToArray());
}
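Another option, sketched here under assumptions rather than as a tested WinForms drop-in, is to let Task.WhenAll collect the ToDo items and pair them up after the await. When the method is awaited from a WinForms click handler, the code after await resumes on the UI thread, so the list box could be filled directly without Invoke. The ToDo and Person classes below are minimal hypothetical versions of the question's types, and the method returns strings instead of adding them to listBoxToDo.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class ToDo { public int Id; public string Description; }
public class Person { public string Name; public ToDo ToDoForPerson; }

public static class ToDoSketch
{
    static ToDo CreateToDo(int toDoId) =>
        new ToDo { Id = toDoId, Description = $"Todo {toDoId}" };

    public static async Task<List<string>> CreateToDoLinesAsync()
    {
        // id is a lambda parameter, so each task gets its own value; no loop variable is captured.
        var todos = await Task.WhenAll(
            Enumerable.Range(1, 9).Select(id => Task.Run(() => CreateToDo(id))));

        // The person's number comes from the ToDo itself, so names and todos always match.
        return todos
            .Select(t => new Person { Name = $"Person_{t.Id}", ToDoForPerson = t })
            .Select(p => $"{p.Name}, {p.ToDoForPerson.Description}")
            .ToList();
    }
}
```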
Now, in preference I'd use Microsoft's Reactive Framework (NuGet "System.Reactive") to do this work. Your code would look like this:
private async Task CreateToDoAsync()
{
var query =
from i in Observable.Range(1, 9)
from t in Observable.Start(() => CreateToDo(i))
select new Person() { Name = $"Person_{i}", ToDoForPerson = t };
await query.ObserveOn(listBoxToDo).Do(x => AssignTask(x));
}
Done. That's it.
When I run my code (with the AssignTask outputting to the Console) I get this:
Person_1, Todo 1
Person_2, Todo 2
Person_3, Todo 3
Person_6, Todo 6
Person_7, Todo 7
Person_4, Todo 4
Person_5, Todo 5
Person_8, Todo 8
Person_9, Todo 9
The .ObserveOn(listBoxToDo) works for Windows Forms to marshal back to the UI thread.
You are using the loop variable "i" inside the task. "i" might have changed by the time your task runs.
I am using Async await with Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
try
{
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
await T;
}
I call this method inside a loop like this:
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks open up in Process Explorer and they also start working (based on the log file). However, out of 10, sometimes 8 and sometimes 7 finish. The rest never come back.
Why would that be happening?
Are they timing out? Where can I set a timeout for my tasks?
UPDATE
Basically, I would like each Task to start running as soon as it is called and to wait for the response at await T. I am assuming that once each one finishes, it will come back at await T and do the next action. I am already seeing this for 7 out of 10 tasks, but 3 of them are not coming back.
Thanks
It is hard to say what the issue is without the rest of the code, but your code can be simplified by making ProcessJob synchronous and then calling it with Task.Run.
public JobDto ProcessJob(JobDto jobTask)
{
JobWorker jobWorker = new JobWorker();
return jobWorker.Execute(jobTask);
}
Start tasks and wait for all tasks to finish. Prefer using Task.Run rather than Task.Factory.StartNew as it provides more favourable defaults for pushing work to the background. See here.
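var tasks = new Task<JobDto>[jobList.Count()];
for(int i=0; i < jobList.Count(); i++)
{
var job = jobList[i]; // read the item now; a lambda capturing i could see a later value or run past the end
tasks[i] = Task.Run(() => ProcessJob(job));
}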
try
{
await Task.WhenAll(tasks);
}
catch(Exception ex)
{
// handle exception
}
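One concrete difference between the two: Task.Factory.StartNew does not understand async delegates. Given an async lambda it returns a Task&lt;Task&gt; whose outer task completes as soon as the lambda reaches its first await, whereas Task.Run unwraps it and completes only when the async work is done. This small sketch uses a TaskCompletionSource as a stand-in for real asynchronous work; StartNewSketch is an illustrative name.

```csharp
using System.Threading.Tasks;

public static class StartNewSketch
{
    // Reports whether each kind of task claimed to be finished before the async work was done.
    public static async Task<(bool startNewDoneEarly, bool runDoneEarly)> CompareAsync()
    {
        var work = new TaskCompletionSource<bool>(); // stands in for real async work

        // StartNew returns Task<Task>; the outer task completes when the lambda reaches its first await.
        Task<Task> viaStartNew = Task.Factory.StartNew(async () => { await work.Task; });
        // Task.Run unwraps the inner task, so it completes only when the async lambda completes.
        Task viaRun = Task.Run(async () => { await work.Task; });

        await viaStartNew; // waits for the outer task only
        bool startNewDoneEarly = viaStartNew.IsCompleted && !work.Task.IsCompleted;
        bool runDoneEarly = viaRun.IsCompleted;

        work.SetResult(true);       // let the async work finish
        await viaStartNew.Unwrap(); // Unwrap() yields a task that tracks the async lambda
        await viaRun;
        return (startNewDoneEarly, runDoneEarly);
    }
}
```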
First, let's make a reproducible version of your code. This is NOT the best way to do what you are doing; it is only meant to show what is happening in your code!
I'll keep the code almost the same as yours, except that I'll use a simple int instead of your JobDto, and when a job's Execute() completes it writes to a file that we can check later. Here's the code:
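using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
public class SomeMainClass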
{
public void StartProcessing()
{
var jobList = Enumerable.Range(1, 10).ToArray();
var tasks = new Task[10];
//[1] start 10 jobs, one-by-one
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
//[4] here we have 10 awaitable Task in tasks
//[5] do all other unrelated operations
Thread.Sleep(1500); //assume it works for 1.5 sec
// Task.WaitAll(tasks); //[6] wait for tasks to complete
// The PROCESS IS COMPLETE here
}
public async Task ProcessJob(int jobTask)
{
try
{
//[2] start job in a ThreadPool, Background thread
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
//[3] await here will keep context of calling thread
await T; //... and release the calling thread
}
catch (Exception) { /*handle*/ }
}
}
public class JobWorker
{
static object locker = new object();
const string _file = @"C:\YourDirectory\out.txt";
public void Execute(int jobTask) //on complete, writes in file
{
Thread.Sleep(500); //let's assume does something for 0.5 sec
lock(locker)
{
File.AppendAllText(_file,
Environment.NewLine + "Writing the value-" + jobTask);
}
}
}
After running just StartProcessing(), this is what I get in the file:
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8 of the 10 jobs have completed. Each run, the number and order may change, but the point is that not all the jobs completed!
Now, if I un-comment the step [6] Task.WaitAll(tasks);, this is what I get in my file
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code behaves like this is explained in the code comments. The main thing to note is that your tasks run on thread-pool threads, which are background threads. So if you do not wait for them, they are killed when the main thread exits and the process ends!
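As a quick check of the background-thread point, this sketch asks a Task.Run delegate whether it is running on a background thread. Thread-pool threads are background threads, and the runtime does not wait for background threads before the process exits, which is why unfinished jobs simply disappear. BackgroundThreadSketch is an illustrative name.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundThreadSketch
{
    // Reports whether the thread that runs a Task.Run delegate is a background thread.
    public static bool TaskRunsOnBackgroundThread() =>
        Task.Run(() => Thread.CurrentThread.IsBackground).Result;
}
```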
If you still don't want to await the tasks there, you can return the list of tasks from this first method and wait for them at the very end of the process, something like this:
public Task[] StartProcessing()
{
...
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
...
return tasks;
}
//in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all other stuffs here, and just at the end of process
Task.WaitAll(tasks);
Hope this clears all confusion.
It's possible your code is swallowing exceptions. I would attach a ContinueWith call to the task that the code starts, so that failures get logged. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
// Log failures; this continuation only runs if T is faulted, and T itself is still what you await
T.ContinueWith(tsk =>
{
var flattenedException = tsk.Exception.Flatten();
Console.WriteLine("Exception! " + flattenedException);
}, TaskContinuationOptions.OnlyOnFaulted);
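A small sketch of how an OnlyOnFaulted continuation behaves: when the antecedent faults, it runs and can read the flattened exception; when the antecedent succeeds, the continuation is cancelled, so awaiting the continuation instead of the original task would throw a TaskCanceledException. This is worth keeping in mind when deciding which task to await. FaultedContinuationSketch is an illustrative name, and the exception message is made up.

```csharp
using System;
using System.Threading.Tasks;

public static class FaultedContinuationSketch
{
    // Returns the message seen by the continuation of a faulted task.
    public static string LogFault()
    {
        Task failing = Task.FromException(new InvalidOperationException("job failed"));
        Task<string> logger = failing.ContinueWith(
            t => t.Exception.Flatten().InnerExceptions[0].Message, // runs only on fault
            TaskContinuationOptions.OnlyOnFaulted);
        return logger.Result;
    }

    // Shows that the same kind of continuation is cancelled when the task succeeds.
    public static bool ContinuationCancelledOnSuccess()
    {
        Task succeeded = Task.CompletedTask;
        Task logger = succeeded.ContinueWith(_ => { }, TaskContinuationOptions.OnlyOnFaulted);
        try { logger.Wait(); }
        catch (AggregateException) { /* Wait throws because the continuation was cancelled */ }
        return logger.IsCanceled;
    }
}
```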
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; //set to something much greater than the time it should take your task to complete (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
// Task completed within timeout.
// Consider that the task may have faulted or been canceled.
// We re-await the task so that any exceptions/cancellation is rethrown.
await task;
}
else
{
// timeout/cancellation logic
}
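A runnable version of that pattern, as a sketch: CompletesWithinAsync and TimeoutSketch are hypothetical names. The helper reports whether the task finished before the timeout and cancels the timer when it did. Note that a timeout only stops the waiting; the task itself keeps running unless it observes a cancellation token. On .NET 6 and later, Task.WaitAsync(TimeSpan) is a built-in alternative that throws a TimeoutException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Returns true if the task finishes within the timeout, false otherwise.
    // Exceptions from a task that did finish in time are rethrown.
    public static async Task<bool> CompletesWithinAsync(Task task, TimeSpan timeout)
    {
        using (var timerCts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, timerCts.Token);
            Task winner = await Task.WhenAny(task, delay);
            if (winner == task)
            {
                timerCts.Cancel(); // release the timer early
                await task;        // surface any exception or cancellation
                return true;
            }
            return false; // timed out; the task is still running in the background
        }
    }
}
```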
Check out the documentation on exception handling in the TPL on MSDN.
I'm using ASP.NET 4.5.1.
I have tasks to run, which call a web-service, and some might fail. I need to run N successful tasks which perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, let's assume we have 300 URLs in some collection. We need to run a function called Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, having only 5 run "at the same time" (in other words, have maximum 5 connections used at any given time). Also, we only want to perform N (assume 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP.NET.
Assume IO calls (HTTP calls to a web service), no heavy CPU operations.
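Use SemaphoreSlim.
var semaphore = new SemaphoreSlim(5);
// ToList() starts every task exactly once; a lazy Select would start them again on each enumeration
var pending = urlCollection.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
}).ToList();
int succeeded = 0;
// stop after 100 successful checks, or when every task has finished
while (succeeded < 100 && pending.Count > 0)
{
var finished = await Task.WhenAny(pending);
pending.Remove(finished);
if (finished.Status == TaskStatus.RanToCompletion && finished.Result)
{
succeeded++;
}
}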
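To check that the semaphore really caps concurrency, here is a self-contained sketch that records how many checks run at once. FakeCheckUrlAsync is a hypothetical stand-in for CheckUrlAsync that waits briefly and treats every third URL as a failure, and ThrottleSketch is an illustrative name.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class ThrottleSketch
{
    int _running, _peak;
    public int Peak => Volatile.Read(ref _peak);

    // Hypothetical stand-in for CheckUrlAsync: every third URL "fails".
    async Task<bool> FakeCheckUrlAsync(int index)
    {
        int now = Interlocked.Increment(ref _running);
        int seen;
        // Record the highest number of checks seen running at the same time.
        while (now > (seen = Volatile.Read(ref _peak)) &&
               Interlocked.CompareExchange(ref _peak, now, seen) != seen) { }
        await Task.Delay(10); // simulated web-service call
        Interlocked.Decrement(ref _running);
        return index % 3 != 0;
    }

    public async Task<int> CountSuccessesAsync(int urlCount, int maxConcurrent)
    {
        using (var semaphore = new SemaphoreSlim(maxConcurrent))
        {
            // ToList() starts each task exactly once.
            var tasks = Enumerable.Range(0, urlCount).Select(async i =>
            {
                await semaphore.WaitAsync();
                try { return await FakeCheckUrlAsync(i); }
                finally { semaphore.Release(); }
            }).ToList();
            bool[] results = await Task.WhenAll(tasks);
            return results.Count(ok => ok);
        }
    }
}
```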
Although I would prefer to use Rx.Net to produce some better code.
// Merge(5) subscribes to at most 5 checks at a time, so no semaphore is needed
var results = await urlCollection.ToObservable()
.Select(url => Observable.FromAsync(() => CheckUrlAsync(url)))
.Merge(5)
.Where(ok => ok) // keep only successful checks
.Take(100) // stop after 100 successes; checks still queued are never started
.ToList();
Okay...this is going to be fun.
public static class SemaphoreHelper
{
public static Task<T> ContinueWith<T>(
this SemaphoreSlim semaphore,
Func<Task<T>> action)
{
// Wait for a free slot, then start the async action; Unwrap() gives the task that tracks it
var ret = semaphore.WaitAsync()
.ContinueWith(_ => action())
.Unwrap();
// Free the slot once the action has finished, successfully or not
ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
return ret;
}
}
var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
url => semaphore.ContinueWith(() => CheckUrlAsync(url))).ToList();
I should add that the code as it stands will still run all 300 URLs; it will just return sooner, that's all. To cancel the queued work you would need to pass a cancellation token to semaphore.WaitAsync(token). Again, I suggest using Rx.NET for that; it's just easier to get cancellation working with .Take(100) in Rx.NET.
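A sketch of that cancellation idea without Rx. FakeCheckUrlAsync is a hypothetical stand-in for CheckUrlAsync that succeeds for even indexes, and CancelAfterNSketch is an illustrative name. Each queued check waits with semaphore.WaitAsync(token), and the token is cancelled once enough checks have succeeded, so checks that have not started yet are skipped instead of run. Checks already in flight when the token is cancelled still finish, so the final count can slightly exceed the target.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class CancelAfterNSketch
{
    int _started;
    public int Started => Volatile.Read(ref _started);

    // Hypothetical stand-in for CheckUrlAsync: even indexes "succeed".
    async Task<bool> FakeCheckUrlAsync(int index)
    {
        Interlocked.Increment(ref _started);
        await Task.Delay(10); // simulated web-service call
        return index % 2 == 0;
    }

    // Checks up to urlCount URLs, at most maxConcurrent at a time, and stops
    // handing out slots once 'wanted' checks have succeeded.
    public async Task<int> RunUntilAsync(int urlCount, int maxConcurrent, int wanted)
    {
        int successes = 0;
        using (var semaphore = new SemaphoreSlim(maxConcurrent))
        using (var cts = new CancellationTokenSource())
        {
            var tasks = Enumerable.Range(0, urlCount).Select(async i =>
            {
                try
                {
                    await semaphore.WaitAsync(cts.Token); // throws once the token is cancelled
                }
                catch (OperationCanceledException)
                {
                    return; // this URL was still queued, so it is skipped
                }
                try
                {
                    bool ok = await FakeCheckUrlAsync(i);
                    if (ok && Interlocked.Increment(ref successes) >= wanted)
                    {
                        cts.Cancel(); // queued checks now give up instead of starting
                    }
                }
                finally
                {
                    semaphore.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks); // every task has either run or been skipped
        }
        return successes; // may slightly exceed 'wanted' because in-flight checks still finish
    }
}
```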
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;
List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
for (int i = 0; i < c_limit; i++)
{
tasks.Add(Task.Run (() => {
// Your code here.
}));
}
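Task.WaitAll(tasks.ToArray()); // WaitAll takes an array, not a List<Task>
tasks.Clear(); // start the next batch with an empty list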
totalRun += c_limit;
}