I have a method that returns XML elements, but that method takes some time to finish and return a value.
What I have now is
foreach (var t in s)
{
r.Add(method(t));
}
but this only starts the next call after the previous one finishes. How can I make the calls run simultaneously?
You should be able to use tasks for this:
//first start a task for each element in s, and add the tasks to the tasks collection
var tasks = new List<Task>();
foreach( var t in s)
{
tasks.Add(Task.Factory.StartNew(method(t)));
}
//then wait for all tasks to complete (this blocks the calling thread)
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
foreach( var task in tasks)
{
r.Add(task.Result);
}
EDIT
There are some problems with the code above. See the code below for a working version. Here I have also rewritten the loops to use LINQ for readability (and, in the case of the first loop, to avoid the closure on t inside the lambda expression causing problems).
//the result type of each task is inferred from the return type of method
var tasks = s.Select(t => Task.Factory.StartNew(() => method(t))).ToArray();
//then wait for all tasks to complete (this blocks the calling thread)
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
r = tasks.Select(task => task.Result).ToList();
You can use Parallel.ForEach which will utilize multiple threads to do the execution in parallel. You have to make sure that all code called is thread safe and can be executed in parallel.
Parallel.ForEach(s, t => r.Add(method(t)));
From what I'm seeing you are updating a shared collection inside the loop. This means that if you execute the loop in parallel a data race will occur because multiple threads will try to update a non-synchronized collection (assuming r is a List or something like this) at the same time, causing an inconsistent state.
To execute correctly in parallel, you will need to wrap that section of code inside a lock statement:
object locker = new object();
Parallel.ForEach(s,
t =>
{
lock (locker) r.Add(method(t));
});
However, this makes the execution effectively serial: method(t) runs inside the lock, so only one thread can be executing it at any time.
The better solution would be to have a local list for each thread, add the partial results to that list, and then merge the results when all threads have finished. @Øyvind Knobloch-Bråthen's second solution is probably the best one, assuming method(t) is the real CPU hog in this case.
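To make the "local list per thread" idea concrete, here is a minimal sketch using the Parallel.ForEach overload that takes localInit and localFinally delegates. The class name LocalListDemo and the Method stand-in (which just squares an int) are assumptions made for illustration; the real method(t) from the question would go in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LocalListDemo
{
    // Hypothetical stand-in for the slow method(t) from the question.
    public static int Method(int t) => t * t;

    public static List<int> Run(IEnumerable<int> s)
    {
        var r = new List<int>();
        var locker = new object();

        Parallel.ForEach<int, List<int>>(
            s,
            // localInit: each worker gets its own private list.
            () => new List<int>(),
            // body: no lock needed, the list is only visible to this worker.
            (t, loopState, local) =>
            {
                local.Add(Method(t));
                return local;
            },
            // localFinally: merge once per worker, so the lock is taken rarely.
            local =>
            {
                lock (locker) r.AddRange(local);
            });

        return r;
    }
}
```

Note that the order of the merged results is not guaranteed to match the order of s, so sort afterwards if the order matters.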
Modification to the correct answer for this question
change
tasks.Add(Task.Factory.StartNew(method(t)));
to
//solution will be the following code
tasks.Add(Task.Factory.StartNew(() => method(t)));
Related
I have this code, it is the skeleton of larger functionality stripped down to prove the problem:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
for(int r=0; ;++r)
{
Task.Delay(TimeSpan.FromSeconds(3)).Wait();
Console.WriteLine($"Iteration {r} at {DateTime.Now}");
}
I never see "Starting generator" printed to Console but I do see the iteration fire every 3 seconds - something is causing those tasks not to progress (in the real code they run for a significant period but removing that doesn't affect the problem).
Why are the first bunch of Tasks not running? My theory is it's related to Task.Delay?
Your LINQ statement is never materialized. LINQ operators like Select, Where, OrderBy, etc. work as building blocks that you chain together, but they are not executed until you run the query through a foreach or use operators that do not return enumerables, like ToArray, ToList, First, Last, etc.
If you call ToList at the end you should see all of the tasks executing but if you only call First you should see only a single one because the iteration of your original Range will then terminate after first element.
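A small sketch of the difference, in case it helps. The class name DeferredDemo and the counter are made up for illustration; the counter simply records how many of the Task.Run lambdas actually ran.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DeferredDemo
{
    // Returns how many tasks ran, with or without materializing the query.
    public static int CountStarted(bool materialize)
    {
        int started = 0;

        // Only a query definition at this point: Task.Run has not been called yet.
        IEnumerable<Task> tasks = Enumerable.Range(0, 10)
            .Select(laneNo => Task.Run(() => Interlocked.Increment(ref started)));

        if (materialize)
        {
            // ToArray enumerates the query, which is what invokes Task.Run.
            Task[] running = tasks.ToArray();
            Task.WaitAll(running);
        }

        return started;
    }
}
```

Calling CountStarted(false) returns 0 because nothing ever enumerates the query, while CountStarted(true) returns 10.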
LINQ Select has deferred execution; it simply defines an iterator, so your Tasks are not being generated.
You could make use of Task.WhenAll(IEnumerable<Task>), which enumerates the sequence (and so starts each Task) and returns a new Task that completes once all the provided tasks have completed:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
await Task.WhenAll(tasks);
I am trying to wrap my head around how to handle multiple async/await calls in a foreach loop. I have around 20,000 rows of data that are processed by the foreach loop. Roughly my code is:
foreach (var item in data)
{
if (ConditionA(item))
{
if (ConditionAB(item))
{
await CreateThingViaAPICall(item);
}
else
{
var result = await GetExistingRecord(item);
var result2 = await GetOtherExistingRecord(result);
var result3 = await GetOtherExistingRecord(result2);
//Do processing
...
await CreateThingViaAPICall();
}
}
... and so on
}
I've seen many posts saying the best way to use async in a loop is to build a list of tasks and then use Task.WhenAll. In my case I have Tasks that depend on each other as part of each iteration. How do I build up a list of tasks to execute in this case?
It's easiest if you break the processing of an individual item into a separate (async) method:
private async Task ProcessItemAsync(Item item)
{
if (ConditionA(item))
{
if (ConditionAB(item))
{
await CreateThingViaAPICall(item);
}
else
{
var result = await GetExistingRecord(item);
var result2 = await GetOtherExistingRecord(result);
var result3 = await GetOtherExistingRecord(result2);
//Do processing
...
await CreateThingViaAPICall();
}
}
... and so on
}
Then process your collection like so:
var tasks = data.Select(ProcessItemAsync);
await Task.WhenAll(tasks);
This effectively wraps the multiple dependent Tasks required to process a single item into one Task, allowing those steps to happen sequentially while items of the collection itself are processed concurrently.
With tens of thousands of items, you may, for a variety of reasons, find that you need to throttle the number of Tasks running concurrently. Have a look at TPL Dataflow for this type of scenario. See here for an example.
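If pulling in TPL Dataflow feels heavy, one alternative (not the Dataflow approach suggested above) is to throttle with a SemaphoreSlim. The sketch below is only an illustration: ThrottleDemo, the int item type, and the Task.Delay body standing in for ProcessItemAsync are all assumptions. It returns the peak number of items that were in flight so the limit can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Hypothetical stand-in for ProcessItemAsync; the delay simulates an API call.
    private static Task ProcessItemAsync(int item) => Task.Delay(10);

    // Processes every item with at most maxConcurrency in flight at once.
    // Returns the highest concurrency level that was observed.
    public static async Task<int> ProcessAllAsync(IEnumerable<int> data, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        var sync = new object();
        int inFlight = 0, peak = 0;

        async Task ThrottledAsync(int item)
        {
            await gate.WaitAsync();          // wait for a free slot
            try
            {
                lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
                await ProcessItemAsync(item);
            }
            finally
            {
                lock (sync) { inFlight--; }
                gate.Release();              // free the slot for the next item
            }
        }

        await Task.WhenAll(data.Select(item => ThrottledAsync(item)));
        return peak;
    }
}
```

All the tasks are still created up front, but each one waits at the semaphore before doing any real work, so at most maxConcurrency calls are running at a time.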
If I'm not mistaken the recommended way to use async/await in a foreach is to build a list of Tasks first then call Task.WhenAll.
You're partly mistaken.
If you have multiple tasks that don't depend on each other, then it is indeed generally a very good idea to pass them to WhenAll so that they can be scheduled together, giving better throughput.
If however each task depends on the results of the previous, then this approach isn't viable. Instead you should just await them within a foreach.
Indeed, this will work fine for any case, it's just suboptimal to have tasks wait on each other if they don't have to.
The ability to await tasks in a foreach is in fact one of the biggest gains that async/await has given us. Most code that uses await can be re-written to use ContinueWith quite easily, if less elegantly, but loops were trickier and if the actual end of the loop was only found by examining the results of the tasks themselves, trickier again.
A weird thing is happening here. I thought Parallel.ForEach would wait until all of its tasks are complete before moving on. But then, I have something like this:
List<string> foo(List<A> list){
Dictionary<string, bool> dictionary = new Dictionary<string, bool>();
Parallel.ForEach(list, element =>
{
dictionary[element.Id] = true;
if (element.SomeMethod()){
dictionary[element.Id] = false;
}
});
List<string> selectedIds = (from element in list where !dictionary[element.Id] select element.Id).ToList();
return selectedIds;
}
and then I'm getting System.Collections.Generic.KeyNotFoundException (sometimes, not always) in the select line. As you can see, I'm initializing the dictionary for every possible key (the Ids of the list's elements), and I still get this exception, which made me think that this line might be reached before the execution of the Parallel.ForEach completes... Is that right? If so, how can I wait until all branches of this Parallel.ForEach complete?
Parallel.ForEach doesn't need to be waited on, as it doesn't return a Task and isn't asynchronous. When the call to that method completes, the iteration is already done.
However, Parallel.ForEach uses multiple threads concurrently and Dictionary isn't thread-safe.
You probably have a race condition on your hands and you should be using the thread-safe ConcurrentDictionary instead.
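This specific case can be solved in a simpler way by using PLINQ's AsParallel. Note that the original code selects the elements for which SomeMethod() returned true (their dictionary entry is set to false):
list.AsParallel().Where(element => element.SomeMethod()).Select(element => element.Id).ToList();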
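For the ConcurrentDictionary route, a minimal sketch might look like the following. The class A here is a made-up stand-in with a hypothetical SomeMethod (true for even Value); only the swap from Dictionary to ConcurrentDictionary is the point.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class A
{
    public string Id { get; set; } = "";
    public int Value { get; set; }

    // Hypothetical predicate standing in for the question's SomeMethod.
    public bool SomeMethod() => Value % 2 == 0;
}

public static class SelectionDemo
{
    public static List<string> Foo(List<A> list)
    {
        var dictionary = new ConcurrentDictionary<string, bool>();

        Parallel.ForEach(list, element =>
        {
            // The indexer of ConcurrentDictionary is safe to call from several threads.
            // Same outcome as the original: true unless SomeMethod() returns true.
            dictionary[element.Id] = !element.SomeMethod();
        });

        // Parallel.ForEach has returned, so every key has been written by now.
        return (from element in list
                where !dictionary[element.Id]
                select element.Id).ToList();
    }
}
```

With the thread-safe dictionary no writes are lost, so the lookup in the final query finds every key.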
I have a typical foreach loop that calls a method where the parameter is an element of the collection we're looping over; something like this:
foreach (byte x in SomeCollection)
{
SomeMethod(x);
}
The problem is that SomeMethod takes a long time to run. I want to move the call into a new task so that the loop just creates the tasks and then the thread that called the loop just continues. How do I do this in a thread-safe way?
Edit
I had a performance issue because SomeMethod makes several DB calls. So I converted the loop to a Parallel.ForEach, but that didn't make much of a difference because each thread then calls the DB. What I'm looking to do is just create Tasks that will run in the background and let the main thread continue.
One way would be to use Parallel.ForEach to do this:
Parallel.ForEach(SomeCollection, x => SomeMethod(x));
The code would wait for all calls of SomeMethod to complete before proceeding, but the individual calls may run in parallel.
If you don't want to wait for the calls to finish, wrap this call in StartNew:
Task.Factory.StartNew(() => Parallel.ForEach(SomeCollection, x => SomeMethod(x)));
What thread safety do you expect? This will be thread-safe:
foreach (byte x in SomeCollection) { Task.Factory.StartNew(() => SomeMethod(x)); }
as long as your method does not modify any shared state that isn't itself thread-safe.
You could do something like:
IEnumerable<Task> doWork()
{
foreach(var x in SomeCollection)
yield return Task.Run(() => SomeMethod(x));
}
Task.WhenAll(doWork());
This will run them all at the same time.
I am running some code that uses ConcurrentBags. I am exploring the IEnumerable functionality.
The code I run is
ConcurrentBag<int> bag = new ConcurrentBag<int>();
Task.Run(() =>
{
bag.Add(42);
Thread.Sleep(1000);
bag.Add(21);
});
Task.Run(() =>
{
foreach (int i in bag)
Console.WriteLine(i);
}).Wait();
I expected the code to print 42, but it prints nothing.
Was my assumption wrong?
You have a race condition, basically. On my machine, this does print 42 most of the time - but fundamentally you have two independent tasks: one adding, and one printing. There is no guarantee which task will execute its first statement first, as you have no synchronization or coordination between the two tasks.
If you want to ensure that the first Add call has completed before you start to iterate over the bag, you'll need to have some coordination.
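One possible way to add that coordination is a ManualResetEventSlim that the adding task sets after its first Add. This is only a sketch: BagDemo and the firstAdded event are names made up for the example, and it returns a snapshot of the bag instead of printing so the result can be inspected.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BagDemo
{
    public static List<int> Run()
    {
        var bag = new ConcurrentBag<int>();
        var firstAdded = new ManualResetEventSlim(false);

        Task.Run(() =>
        {
            bag.Add(42);
            firstAdded.Set();      // signal that the first item is in the bag
            Thread.Sleep(1000);
            bag.Add(21);
        });

        Task<List<int>> reader = Task.Run(() =>
        {
            firstAdded.Wait();     // block until the first Add has completed
            return bag.ToList();   // snapshot of the bag at this moment
        });

        return reader.Result;
    }
}
```

The snapshot is now guaranteed to contain 42. Whether it also contains 21 still depends on timing, since nothing coordinates the second Add.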