Using ConcurrentBags - c#

I am running some code that uses ConcurrentBags. I am exploring the IEnumerable functionality.
The code I run is
ConcurrentBag<int> bag = new ConcurrentBag<int>();
Task.Run(() =>
{
bag.Add(42);
Thread.Sleep(1000);
bag.Add(21);
});
Task.Run(() =>
{
foreach (int i in bag)
Console.WriteLine(i);
}).Wait();
I expected the code to print 42, but it is printing nothing.
Was my assumption wrong?

You have a race condition, basically. On my machine, this does print 42 most of the time - but fundamentally you have two independent tasks: one adding, and one printing. There is no guarantee which task will execute its first statement first, as you have no synchronization or coordination between the two tasks.
If you want to ensure that the first Add call has completed before you start to iterate over the bag, you'll need to have some coordination.
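For example, here is one minimal way to add that coordination (a sketch of my own, using a SemaphoreSlim as the signal; the names are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var bag = new ConcurrentBag<int>();
        // Signals the consumer that the first Add has completed.
        var firstItemAdded = new SemaphoreSlim(0, 1);

        var producer = Task.Run(() =>
        {
            bag.Add(42);
            firstItemAdded.Release();   // signal: first item is in the bag
            Thread.Sleep(1000);
            bag.Add(21);
        });

        Task.Run(() =>
        {
            firstItemAdded.Wait();      // block until the producer signals
            // 42 is now guaranteed to be in the bag
            // (21 may or may not be, depending on timing).
            foreach (int i in bag)
                Console.WriteLine(i);
        }).Wait();

        producer.Wait();
    }
}
```

With the semaphore in place, the consumer can no longer observe an empty bag, though it may still see one or both items.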

Related

Why are parallel tasks not running?

I have this code, it is the skeleton of larger functionality stripped down to prove the problem:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
for(int r=0; ;++r)
{
Task.Delay(TimeSpan.FromSeconds(3)).Wait();
Console.WriteLine($"Iteration {r} at {DateTime.Now}");
}
I never see "Starting generator" printed to Console but I do see the iteration fire every 3 seconds - something is causing those tasks not to progress (in the real code they run for a significant period but removing that doesn't affect the problem).
Why are the first bunch of Tasks not running? My theory is it's related to Task.Delay?
Your LINQ statement is never materialized. LINQ operators like Select, Where, OrderBy, etc. work as building blocks that you chain together, but they are not executed until you run the result through a foreach or use operators that do not return enumerables, such as ToArray, ToList, First, Last, etc.
If you call ToList at the end you should see all of the tasks executing, but if you only call First you should see only a single one, because the iteration of your original Range will then terminate after the first element.
LINQ Select has deferred execution; it simply defines an iterator, so your Tasks are not being generated.
You could make use of Task.WhenAll(IEnumerable&lt;Task&gt;), which will iterate and await each Task, generating a new Task that completes once all the provided tasks have completed:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
await Task.WhenAll(tasks);

What's the difference between Task.WhenAll() and foreach(var task in tasks)?

After a few hours of struggle I found a bug in my app. I considered the 2 functions below to have identical behavior, but it turned out they don't.
Can anyone tell me what's really going on under the hood, and why they behave in a different way?
public async Task MyFunction1(IEnumerable<Task> tasks){
await Task.WhenAll(tasks);
Console.WriteLine("all done"); // happens AFTER all tasks are finished
}
public async Task MyFunction2(IEnumerable<Task> tasks){
foreach(var task in tasks){
await task;
}
Console.WriteLine("all done"); // happens BEFORE all tasks are finished
}
They'll function identically if all tasks complete successfully.
If you use WhenAll and any items fail, it still won't complete until all of the items have finished, and it'll represent an AggregateException that wraps all errors from all tasks.
If you await each one then it'll complete as soon as it hits any item that fails, and it'll represent an exception for that one error, not any others.
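To make the exception behavior concrete, here is a small sketch of my own (not from the original answers) contrasting the two approaches with two already-faulted tasks:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        Task[] tasks =
        {
            Task.FromException(new InvalidOperationException("first")),
            Task.FromException(new InvalidOperationException("second")),
        };

        // WhenAll: the combined task's Exception wraps *all* failures...
        Task whenAll = Task.WhenAll(tasks);
        try { await whenAll; }
        catch (InvalidOperationException)
        {
            // ...even though 'await' itself rethrows only the first one.
            Console.WriteLine(whenAll.Exception.InnerExceptions.Count); // 2
        }

        // Awaiting each task in turn: we stop at the first failure
        // and never observe the second exception at all.
        try
        {
            foreach (var t in tasks)
                await t;
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message); // first
        }
    }
}
```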
The two also differ in that WhenAll will materialize the entire IEnumerable right at the start, before adding any continuations to other items. If the IEnumerable represents a collection of already existing and started tasks, then this isn't relevant, but if the act of iterating the enumerable creates and/or starts tasks, then materializing the sequence at the start would run them all in parallel, whereas awaiting each before fetching the next task would execute them sequentially. Below is an IEnumerable you could pass in that would behave as I've described here:
public static IEnumerable<Task> TaskGeneratorSequence()
{
for(int i = 0; i < 10; i++)
yield return Task.Delay(TimeSpan.FromSeconds(2));
}
Likely the most important functional difference is that Task.WhenAll can introduce concurrency when your tasks perform truly asynchronous operations, for example, IO. This may or may not be what you want depending on your situation.
For example, if your tasks are querying the database using the same EF DbContext, the next query would fire as soon as the first one is "in flight" which causes EF to blow up as it doesn't support multiple simultaneous queries using the same context.
That's because you're not awaiting each asynchronous operation individually. You're awaiting a task that represents the completion of all of those asynchronous operations. They can also be completed in any order.
However when you await each one individually in a foreach, you only fire the next task when the current one completes, preventing concurrency and ensuring serial execution.
A simple example demonstrating this behavior:
async Task Main()
{
var tasks = new []{1, 2, 3, 4, 5}.Select(i => OperationAsync(i));
foreach(var t in tasks)
{
await t;
}
await Task.WhenAll(tasks);
}
static Random _rand = new Random();
public async Task OperationAsync(int number)
{
// simulate an asynchronous operation
// taking anywhere between 100 to 3000 milliseconds
await Task.Delay(_rand.Next(100, 3000));
Console.WriteLine(number);
}
You'll see that no matter how long OperationAsync takes, with foreach you always get 1, 2, 3, 4, 5 printed. But with Task.WhenAll they are executed concurrently and printed in their completion order.

WhenAll vs WaitAll in parallel

I'm trying to understand how WaitAll and WhenAll work and have the following problem. There are two possible ways to get a result from a method:
return Task.WhenAll(tasks).Result.SelectMany(r=> r);
return tasks.Select(t => t.Result).SelectMany(r => r).ToArray();
If I understand correctly, the second case is like calling WaitAll on the tasks and fetching the results after that.
It looks like the second case has much better performance. I know that the proper usage of WhenAll is with the await keyword, but still, I'm wondering why there is such a big difference in performance between these lines.
After analyzing the flow of the system I think I've figured out how to model the problem in a simple test application (the test code is based on I3arnon's answer):
public static void Test()
{
var tasks = Enumerable.Range(1, 1000).Select(n => Task.Run(() => Compute(n)));
var baseTasks = new Task[100];
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
baseTasks[i] = Task.Run(() =>
{
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
});
}
Task.WaitAll(baseTasks);
Console.WriteLine("Select - {0}", stopwatch.Elapsed);
baseTasks = new Task[100];
stopwatch.Restart();
for (int i = 0; i < 100; i++)
{
baseTasks[i] = Task.Run(() =>
{
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
});
}
Task.WaitAll(baseTasks);
Console.WriteLine("Task.WhenAll - {0}", stopwatch.Elapsed);
}
It looks like the problem is in starting tasks from other tasks (or in a Parallel loop). In that case WhenAll results in much worse performance of the program. Why is that?
You are starting tasks inside a Parallel.ForEach loop, which you should avoid. The whole point of Parallel.ForEach is to parallelize many small but intensive computations across the available CPU cores, and starting a task is not an intensive computation. Rather, it creates a task object and stores it on a queue if the task pool is saturated, which it quickly will be with 1000 tasks being started. So now Parallel.ForEach competes with the task pool for compute resources.
In the first loop, which is quite slow, it seems that the scheduling is suboptimal and very little CPU is used, probably because of the Task.WhenAll inside the Parallel.ForEach. If you change the Parallel.ForEach to a normal for loop you will see a speedup.
But if your code really is as simple as a Compute function without any state carried forward between iterations, you can get rid of the tasks and simply use Parallel.ForEach to maximize performance:
Parallel.For(0, 100, (i, s) =>
{
Enumerable.Range(1, 1000).Select(n => Compute(n)).SelectMany(r => r).ToList();
});
As to why Task.WhenAll performs much worse you should realize that this code
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
will not run the tasks in parallel. The ToList basically wraps the iteration in a foreach loop and the body of the loop creates a task and then waits for the task to complete because you retrieve the Task.Result property. So each iteration of the loop will create a task and then wait for it to complete. The 1000 tasks are executed one after the other and there is very little overhead in handling the tasks. This means that you do not need the tasks which is also what I have suggested above.
On the other hand, the code
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
will start all the tasks and try to execute them concurrently and because the task pool is unable to execute 1000 tasks in parallel most of these tasks are queued before they are executed. This creates a big management and task switch overhead which explains the bad performance.
With regard to the final question you added: if the only purpose of the outer task is to start the inner tasks, then the outer task has no useful purpose, but if the outer tasks are there to perform some kind of coordination of the inner tasks then it might make sense (perhaps you want to combine Task.WhenAny with Task.WhenAll). Without more context it is hard to answer. However, your question seems to be about performance, and starting 100,000 tasks may add considerable overhead.
Parallel.ForEach is a good choice if you want to perform 100,000 independent computations like you do in your example. Tasks are very good for executing concurrent activities involving "slow" calls to other systems where you want to wait for and combine results and also handle errors. For massive parallelism they are probably not the best choice.
Your test is way too complicated, so I've made my own. Here's a simple test that incorporates your Compute method:
public static void Test()
{
var tasks = Enumerable.Repeat(int.MaxValue, 10000).Select(n => Task.Run(() => Compute(n)));
var stopwatch = Stopwatch.StartNew();
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
Console.WriteLine("Task.WhenAll - {0}", stopwatch.Elapsed);
stopwatch.Restart();
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
Console.WriteLine("Select - {0}", stopwatch.Elapsed);
}
private static List<int> Compute(int seed)
{
var results = new List<int>();
for (int i = 0; i < 5000; i++)
{
results.Add(seed * i);
}
return results;
}
Output:
Task.WhenAll - 00:00:01.2894227
Select - 00:00:01.7114142
However if I use Enumerable.Repeat(int.MaxValue, 100) the output is:
Task.WhenAll - 00:00:00.0205375
Select - 00:00:00.0178089
Basically the difference between the options is whether you block once or block once per element. Blocking once is better when there are many elements, but for a few elements blocking on each one could be better.
Since there isn't really a big difference, since you care about performance only when you're dealing with many items, and since logically you want to proceed when all the tasks have completed, I recommend using Task.WhenAll.

In a Parallel.For, is it possible to synchronize each thread?

In a Parallel.For, is it possible to synchronize each thread with a 'WaitAll'?
Parallel.For(0, maxIter, i =>
{
// Do stuffs
// Synchronisation : wait for all threads => ???
// Do another stuffs
});
Parallel.For, in the background, batches the iterations of the loop into one or more Tasks, which can be executed in parallel. Unless you take ownership of the partitioning, the number of tasks (and threads) is (and should be!) abstracted away. Control will only exit the Parallel.For loop once all the tasks have completed (i.e. there is no need for WaitAll).
The idea of course is that each loop iteration is independent and doesn't require synchronization.
If synchronization is required in the tight loop, then you haven't isolated the Tasks correctly, or it means that Amdahl's Law is in effect and the problem can't be sped up through parallelization.
However, for an aggregation type pattern, you may need to synchronize after completion of each Task - use the overload with the localInit / localFinally to do this, e.g.:
// allTheStrings is a shared resource which isn't thread safe
var allTheStrings = new List<string>();
Parallel.For( // for (
0, // var i = 0;
numberOfIterations, // i < numberOfIterations;
() => new List<string> (), // localInit - Setup each task. List<string> --> localStrings
(i, parallelLoopState, localStrings) =>
{
// The "tight" loop. If you need to synchronize here, there is no point
// using parallel at all
localStrings.Add(i.ToString());
return localStrings;
},
(localStrings) => // localFinally for each task.
{
// Synchronization is needed here - runs once per task
lock(allTheStrings)
{
allTheStrings.AddRange(localStrings);
}
});
In the above example, you could also have just declared allTheStrings as
var allTheStrings = new ConcurrentBag<string>();
in which case we wouldn't have needed the lock in the localFinally.
You shouldn't (for the reasons stated by other users), but if you want to, you can use Barrier. This can be used to make all threads wait (block) at a certain point until X participants have hit the barrier, at which point the barrier proceeds and the threads unblock. The downside of this approach, as others have said, is the risk of deadlocks.
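A minimal sketch of Barrier usage (my own illustration, using explicit tasks rather than Parallel.For, since Parallel.For does not guarantee one thread per iteration and a mismatched participant count is exactly how the deadlock arises):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        const int participants = 4;
        // Every participant must reach SignalAndWait before any can continue.
        using (var barrier = new Barrier(participants,
            b => Console.WriteLine("Phase complete")))
        {
            var tasks = Enumerable.Range(0, participants)
                .Select(i => Task.Run(() =>
                {
                    Console.WriteLine($"Worker {i}: first stage");
                    barrier.SignalAndWait();   // rendezvous point
                    Console.WriteLine($"Worker {i}: second stage");
                }))
                .ToArray();

            Task.WaitAll(tasks);
        }
    }
}
```

All "first stage" lines are guaranteed to print before any "second stage" line; if fewer than `participants` tasks ever reach SignalAndWait, everyone blocks forever, which is the deadlock the answer warns about.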

run a method multiple times simultaneously in c#

I have a method that returns XML elements, but that method takes some time to finish and return a value.
What I have now is
foreach (var t in s)
{
r.add(method(t));
}
but this only runs the next call after the previous one finishes. How can I make them run simultaneously?
You should be able to use tasks for this:
//first start a task for each element in s, and add the tasks to the tasks collection
var tasks = new List<Task>();
foreach( var t in s)
{
tasks.Add(Task.Factory.StartNew(method(t)));
}
//then block until all tasks have completed
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
foreach( var task in tasks)
{
r.Add(task.Result);
}
EDIT
There are some problems with the code above. See the code below for a working version. Here I have also rewritten the loops to use LINQ for readability (and, in the case of the first loop, to avoid the closure over t inside the lambda expression causing problems).
var tasks = s.Select(t => Task<int>.Factory.StartNew(() => method(t))).ToArray();
//then block until all tasks have completed
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
r = tasks.Select(task => task.Result).ToList();
You can use Parallel.ForEach which will utilize multiple threads to do the execution in parallel. You have to make sure that all code called is thread safe and can be executed in parallel.
Parallel.ForEach(s, t => r.add(method(t)));
From what I'm seeing you are updating a shared collection inside the loop. This means that if you execute the loop in parallel a data race will occur because multiple threads will try to update a non-synchronized collection (assuming r is a List or something like this) at the same time, causing an inconsistent state.
To execute correctly in parallel, you will need to wrap that section of code inside a lock statement:
object locker = new object();
Parallel.ForEach(s,
t =>
{
lock(locker) r.add(method(t));
});
However, this will make the execution effectively serial, because each thread needs to acquire the lock and two threads cannot hold it at the same time.
The better solution would be to have a local list for each thread, add the partial results to that list, and then merge the lists when all threads have finished. @Øyvind Knobloch-Bråthen's second solution is probably the best one, assuming method(t) is the real CPU hog in this case.
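The local-list-per-thread approach can be sketched with the localInit/localFinally overload of Parallel.ForEach (Method here is a stand-in for the asker's method, which actually returns XML elements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Placeholder for the asker's 'method'; the real one returns XML elements.
    static string Method(int t) => t.ToString();

    static void Main()
    {
        var s = Enumerable.Range(0, 1000);
        var r = new List<string>();
        object locker = new object();

        Parallel.ForEach(
            s,
            () => new List<string>(),        // localInit: one private list per task
            (t, state, local) =>
            {
                local.Add(Method(t));        // no locking in the hot path
                return local;
            },
            local =>                         // localFinally: merge once per task
            {
                lock (locker) r.AddRange(local);
            });

        Console.WriteLine(r.Count); // 1000
    }
}
```

The lock is taken only once per worker task rather than once per element, so the computation itself still runs in parallel.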
Modification to the correct answer for this question
change
tasks.Add(Task.Factory.StartNew(method(t)));
to
//solution will be the following code
tasks.Add(Task.Factory.StartNew(() => { method(t);}));
