Creating a task inside a foreach loop - c#

I have a typical foreach loop that calls a method where the parameter is an element of the collection we're looping over; something like this:
foreach (byte x in SomeCollection)
{
SomeMethod(x);
}
The problem is that SomeMethod takes a long time to run. I want to move the call into a new task so that the loop just creates the tasks and the thread that called the loop simply continues. How do I do this in a thread-safe way?
Edit
I had a performance issue because SomeMethod makes several DB calls. So I converted the loop to a Parallel.ForEach, but that didn't make much of a difference because each thread then calls the DB. What I'm looking to do is just create Tasks that will run in the background and let the main thread continue.

One way would be to use Parallel.ForEach to do this:
Parallel.ForEach(SomeCollection, x => SomeMethod(x));
The code would wait for all calls of SomeMethod to complete before proceeding, but the individual calls may run in parallel.
If you don't want to wait for the calls to finish, wrap this call in StartNew:
Task.Factory.StartNew(() => Parallel.ForEach(SomeCollection, x => SomeMethod(x)));

What thread safety do you expect? This will be thread-safe:
foreach (byte x in SomeCollection) { Task.Factory.StartNew(() => SomeMethod(x)); }
as long as your method does not modify any shared state that isn't itself thread-safe.
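A minimal sketch of that pattern, keeping the returned Task objects so that completion or exceptions can be checked later. SomeMethod here is a hypothetical stand-in that simulates slow work, and the shared counter is only there to show one thread-safe way (Interlocked) to touch shared state:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundLoopDemo
{
    private static int _processed; // shared state, only updated atomically

    // Hypothetical stand-in for the slow SomeMethod from the question.
    public static void SomeMethod(byte x)
    {
        Thread.Sleep(50);                      // simulate slow work
        Interlocked.Increment(ref _processed); // thread-safe update
    }

    public static int Processed => Volatile.Read(ref _processed);

    // Starts one task per element and returns right away with the task handles.
    public static List<Task> StartAll(IEnumerable<byte> items)
    {
        var tasks = new List<Task>();
        foreach (byte x in items)
        {
            byte item = x; // local copy; needed before C# 5, harmless afterwards
            tasks.Add(Task.Run(() => SomeMethod(item)));
        }
        return tasks;
    }

    public static void Main()
    {
        List<Task> tasks = StartAll(new byte[] { 1, 2, 3, 4 });
        Console.WriteLine("Loop finished; tasks are running in the background.");
        Task.WaitAll(tasks.ToArray()); // only when the caller needs them done
        Console.WriteLine($"Processed so far: {Processed}");
    }
}
```

Keeping the task list is a design choice rather than a requirement: without it, an exception thrown inside SomeMethod would go unnoticed by the calling thread.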

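You could do something like:
IEnumerable<Task> doWork()
{
foreach(var x in SomeCollection)
yield return Task.Run(() => SomeMethod(x));
}
Task.WhenAll(doWork());
This starts all of them at once (subject to thread-pool scheduling). Task.WhenAll returns a Task that completes when every call has finished; await it only if you need to know when they are done.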
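As a sketch of how the Task.WhenAll result might be consumed, assuming a hypothetical SomeMethod that returns a value (the one in the question returns nothing):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for SomeMethod; returns a value so the result is visible.
    public static int SomeMethod(byte x) => x * 2;

    public static async Task<int[]> RunAllAsync(byte[] items)
    {
        // ToArray forces the lazy Select to run, so every task is started here.
        Task<int>[] tasks = items.Select(x => Task.Run(() => SomeMethod(x))).ToArray();

        // WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        int[] results = await RunAllAsync(new byte[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", results)); // prints "2, 4, 6"
    }
}
```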

Related

Effects of async within a parallel for loop

I am going to start by saying that I am learning about multithreading at the moment, so it may be the case that not all I say is correct - please feel free to correct me as required. I do have a reasonable understanding of async and await.
My basic aim is as follows:
I have a body of code that currently takes about 3 seconds. I am trying to load some data at the start of the method that will be used right at the end. My plan is to load the data on a different thread right at the start, allowing the rest of the code to execute independently. Then, at the point that I need the data, the code will wait if the data is not loaded. So far this all seems to be working fine and as I describe.
My question relates to what happens when I call a method that is async, within a parallel for loop, without awaiting it.
My code follows this structure:
public void MainCaller()
{
List<int> listFromThread = null;
var secondThread = Task.Factory.StartNew(() =>
{
listFromThread = GetAllLists().Result;
});
//Do some other stuff
secondThread.Wait();
//Do not pass this point until this thread has completed
}
public Task<List<int>> GetAllLists()
{
var intList = new List<int>(){ /*Whatever... */};
var returnList = new List<int>();
Parallel.ForEach(intList, intEntry =>
{
var res = MyMethod().Result;
returnList.AddRange(res);
});
return Task.FromResult(returnList);
}
private async Task<List<int>> MyMethod()
{
var myList = await obtainList.ToListAsync();
return myList;
}
Note that the Parallel.ForEach loop calls the async method but does not await it, as the loop body is not async itself.
This is a method that is used elsewhere, so it is valid that it is async. I know one option is to make a copy of this method that is not async, but I am trying to understand what will happen here.
My question is: can I be sure that when I reach secondThread.Wait(); the async part of the execution will be complete? E.g. will Wait know to wait for the async part to complete, or will async mess up the wait, or will they work seamlessly together?
It seems to me it could be possible that as the call to MyMethod is not awaited, but there is an await within MyMethod, the parallel for loop could continue execution before the awaited call has completed?
Then I think, as it is assigning it by reference, then once the assigning takes place, the value will be the correct result.
This leads me to think that as long as the wait will know to wait for the async to complete, then there is no problem - hence my question.
I guess this relates to my lack of understanding of Tasks?
I hope this is clear?
In your code, no part is executed asynchronously.
In MainCaller, you start a Task and then Wait for it to finish.
This is a blocking operation, and it only introduces the extra overhead of calling
GetAllLists in another Task.
In this Task you start a new Task (by calling GetAllLists) but immediately
wait for this Task to finish by waiting for its Result (which is also blocking).
In the Task started by GetAllLists you have the Parallel.ForEach loop, which starts
several new Tasks. Each of these loop Tasks will start another Task by calling
MyMethod and immediately wait for its result.
The net result is that your code executes completely synchronously. The only parallelism is introduced in the Parallel.ForEach loop.
Hint: a useful thread on this topic: Using async/await for multiple tasks
Additionally, your code contains a serious bug:
Each Task created by the Parallel.ForEach loop will eventually add its partial list to returnList by calling AddRange. AddRange is not thread-safe, so you need some synchronisation mechanism (e.g. lock), or returnList may get corrupted or end up missing some results. See also: Is the List<T>.AddRange() thread safe?
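One possible restructuring, sketched under the assumption that the callers can be made async: start the load, do the other work, then await. Each call returns its own list and the lists are merged afterwards, so no shared list is mutated from several threads. MyMethodAsync is a hypothetical stand-in in which Task.Delay simulates the awaited query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncLoadDemo
{
    // Hypothetical stand-in for MyMethod; Task.Delay simulates the awaited query.
    private static async Task<List<int>> MyMethodAsync(int entry)
    {
        await Task.Delay(20);
        return new List<int> { entry, entry * 10 };
    }

    // Starts every call, then awaits them together; results keep input order.
    public static async Task<List<int>> GetAllListsAsync(IEnumerable<int> entries)
    {
        List<int>[] parts = await Task.WhenAll(entries.Select(MyMethodAsync));
        return parts.SelectMany(p => p).ToList();
    }

    public static async Task<int> MainCallerAsync()
    {
        Task<List<int>> loading = GetAllListsAsync(new[] { 1, 2, 3 }); // starts now
        // ... do some other stuff while the data loads ...
        List<int> list = await loading; // do not pass this point until loaded
        return list.Count;
    }

    public static async Task Main()
    {
        Console.WriteLine(await MainCallerAsync()); // prints 6
    }
}
```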

Should Parallel.Foreach be waited?

A weird thing is happening here. I thought Parallel.ForEach would wait until all of its tasks are complete before moving on. But then, I have something like this:
List<string> foo(List<A> list){
Dictionary<string, bool> dictionary = new Dictionary<string, bool>();
Parallel.ForEach(list, element =>
{
dictionary[element.Id] = true;
if (element.SomeMethod()){
dictionary[element.Id] = false;
}
});
List<string> selectedIds = (from element in list where !dictionary[element.Id] select element.Id).ToList();
return selectedIds;
}
and then I'm getting System.Collections.Generic.KeyNotFoundException (sometimes, not always) in the select line. As you can see, I'm initializing the dictionary for every possible key (the Ids of the list's elements), and I still get this exception, which made me think that this line might be reached before the execution of the Parallel.ForEach completes... Is that right? If so, how can I wait until all branches of this Parallel.ForEach complete?
Parallel.ForEach doesn't need to be waited on, as it doesn't return a Task and isn't asynchronous. When the call to that method completes, the iteration is already done.
However, Parallel.ForEach uses multiple threads concurrently and Dictionary isn't thread-safe.
You probably have a race condition on your hands, and you should be using the thread-safe ConcurrentDictionary instead.
This specific case can be solved in a simpler way by using PLINQ's AsParallel:
list.AsParallel().Where(element => element.SomeMethod()).Select(element => element.Id).ToList();
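For comparison, a sketch of the ConcurrentDictionary variant that keeps the shape of the original method. The class A below is a hypothetical stand-in with a placeholder SomeMethod:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical element type standing in for A from the question.
public class A
{
    public string Id { get; set; } = "";
    public int Value { get; set; }
    public bool SomeMethod() => Value % 2 == 0; // placeholder predicate
}

public static class ConcurrentDictionaryDemo
{
    public static List<string> Foo(List<A> list)
    {
        var flags = new ConcurrentDictionary<string, bool>();
        Parallel.ForEach(list, element =>
        {
            // Same outcome as the original: true unless SomeMethod() returns true.
            // The ConcurrentDictionary indexer is safe to call from several threads.
            flags[element.Id] = !element.SomeMethod();
        });

        // Parallel.ForEach has returned, so every key is present here.
        return list.Where(e => !flags[e.Id]).Select(e => e.Id).ToList();
    }

    public static void Main()
    {
        var list = new List<A>
        {
            new A { Id = "a", Value = 1 },
            new A { Id = "b", Value = 2 },
            new A { Id = "c", Value = 4 },
        };
        Console.WriteLine(string.Join(", ", Foo(list))); // prints "b, c"
    }
}
```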

Will Parallel.Foreach block until it is done?

If I have code similar to this:
foreach (Item child in item.Children)
{
// Do stuff
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 3;
Parallel.ForEach(items, options, i => DoStuff());
}
Is the Parallel.ForEach going to finish all of its items before moving on to the next foreach item?
Yes - Parallel.ForEach will block. It's a synchronous method, which internally does its work in parallel.
I've gone with a slightly bizarre way to demonstrate the desired property below, because I can't find any nice excerpt from the documentation for e.g. Parallel.ForEach that just comes out and states that the loops are completed before the methods return:
Yes. Note that the return type of Parallel.ForEach is a ParallelLoopResult, which contains information that can only be available once all of the operations have completed, such as IsCompleted:
Gets whether the loop ran to completion, such that all iterations of the loop were executed and the loop didn't receive a request to end prematurely.
ParallelLoopResult is a struct - and so whatever value is returned from Parallel.ForEach cannot be altered after the return from that method.
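A small sketch that makes the blocking behaviour observable: the counter is read only after Parallel.ForEach returns, and it already reflects every iteration. The item count and the simulated work are assumptions for illustration:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelBlockingDemo
{
    public static int CountProcessed(int itemCount)
    {
        int done = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };

        ParallelLoopResult result = Parallel.ForEach(
            Enumerable.Range(0, itemCount), options, i =>
            {
                Thread.Sleep(10);                // simulate work
                Interlocked.Increment(ref done); // thread-safe counter update
            });

        // By the time ForEach returns, every iteration has run.
        if (!result.IsCompleted)
            throw new InvalidOperationException("loop ended prematurely");
        return done;
    }

    public static void Main()
    {
        Console.WriteLine(CountProcessed(20)); // prints 20
    }
}
```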

c# Threads and Thread.Join

I have a list of 10 items that I need to process, with each item using a separate thread. Should the code be like this:
foreach (Item item in items)
{
Thread t = new Thread(() =>
{
ProcessItem(item);
});
t.Start();
}
I would also need to pause the thread for (1 second minus the time taken to execute the thread). Should I use Thread.Sleep in this case?
If you don't mind skipping the manual handling of Threads, the following line should do exactly what you want:
Parallel.ForEach(items, ProcessItem);
Or sleeping before processing each (although that does not make much sense):
Parallel.ForEach(items, item => { Thread.Sleep(1000); ProcessItem(item); });
You will use Thread.Join to wait for other threads to finish their work.
Thread.Sleep will essentially wait for the specified number of milliseconds.
Thread.Sleep indeed has side effects and is not recommended.
Some points to note in your context:
What if there are no more threads available (if the number of items increases)?
Do the threads access any shared resources?
Check out thread pooling and thread-safe operations too.
The code for starting the threads looks fine.
You will have to use Thread.Sleep(duration in milliseconds) to make the thread pause for that amount of time.
Join will halt the current thread until the thread you join on completes its processing.
Use the following if, for some reason, you don't want to use the Parallel.ForEach:
Thread[] threads = new Thread[10];
int count = 0;
foreach (Item item in items)
{
Thread t = new Thread(() =>
{
ProcessItem(item);
});
t.Start();
threads[count++]=t;
}
for (int i=0;i<10;++i)
threads[i].Join();
Use Thread.Sleep.
Thread.Sleep and Thread.Join are different things.
Thread.Sleep blocks (stops) the current thread for a certain time.
Thread.Join blocks (stops) the current thread until the thread on which Join was called finishes.
Also, consider using Parallel.ForEach as @nvoigt suggested.
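For the "1 second minus the time taken" part of the question, one possible sketch measures the work with a Stopwatch and sleeps for whatever is left. ProcessItem and the helper Remaining are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class ThreadPacingDemo
{
    // Remaining part of the interval; never negative.
    public static TimeSpan Remaining(TimeSpan interval, TimeSpan elapsed) =>
        elapsed >= interval ? TimeSpan.Zero : interval - elapsed;

    // Hypothetical stand-in for the real per-item work.
    public static void ProcessItem(int item) => Thread.Sleep(100);

    public static void Main()
    {
        var threads = new List<Thread>();
        foreach (int item in new[] { 1, 2, 3 })
        {
            int current = item; // local copy for the lambda
            var worker = new Thread(() =>
            {
                var watch = Stopwatch.StartNew();
                ProcessItem(current);
                // Sleep for one second minus the time the work took.
                Thread.Sleep(Remaining(TimeSpan.FromSeconds(1), watch.Elapsed));
            });
            worker.Start();
            threads.Add(worker);
        }

        // Join blocks until each worker has finished.
        foreach (Thread worker in threads)
            worker.Join();
        Console.WriteLine("All threads finished.");
    }
}
```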

run a method multiple times simultaneously in c#

I have a method that returns XML elements, but that method takes some time to finish and return a value.
What I have now is
foreach (var t in s)
{
r.Add(method(t));
}
but this only runs the next statement after previous one finishes. How can I make it run simultaneously?
You should be able to use tasks for this:
//first start a task for each element in s, and add the tasks to the tasks collection
var tasks = new List<Task>();
foreach( var t in s)
{
tasks.Add(Task.Factory.StartNew(method(t)));
}
//then wait (blocking) for all tasks to complete
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
foreach( var task in tasks)
{
r.Add(task.Result);
}
EDIT
There are some problems with the code above. See the code below for a working version. Here I have also rewritten the loops to use LINQ for readability (and, in the case of the first loop, to avoid the closure on t inside the lambda expression causing problems).
var tasks = s.Select(t => Task<int>.Factory.StartNew(() => method(t))).ToArray();
//then wait (blocking) for all tasks to complete
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
r = tasks.Select(task => task.Result).ToList();
You can use Parallel.ForEach which will utilize multiple threads to do the execution in parallel. You have to make sure that all code called is thread safe and can be executed in parallel.
Parallel.ForEach(s, t => r.Add(method(t)));
From what I'm seeing you are updating a shared collection inside the loop. This means that if you execute the loop in parallel a data race will occur because multiple threads will try to update a non-synchronized collection (assuming r is a List or something like this) at the same time, causing an inconsistent state.
To execute correctly in parallel, you will need to wrap that section of code inside a lock statement:
object locker = new object();
Parallel.ForEach(s,
t =>
{
lock(locker) r.Add(method(t));
});
However, this will make the execution actually serial, because each thread needs to acquire the lock and two threads cannot do so at the same time.
The better solution would be to have a local list for each thread, add the partial results to that list, and then merge the results when all threads have finished. Probably @Øyvind Knobloch-Bråthen's second solution is the best one, assuming method(t) is the real CPU hog in this case.
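The per-thread local list idea can be sketched with the Parallel.ForEach overload that takes localInit and localFinally delegates. Method is a hypothetical stand-in for the slow call, and int replaces the XML element type for brevity:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LocalListDemo
{
    // Hypothetical stand-in for the slow method from the question.
    public static int Method(int t) => t * t;

    public static List<int> RunAll(IEnumerable<int> s)
    {
        var r = new List<int>();
        object locker = new object();

        Parallel.ForEach(
            s,
            () => new List<int>(),        // localInit: one private list per worker
            (t, loopState, local) =>
            {
                local.Add(Method(t));     // no lock needed; the list is not shared
                return local;
            },
            local =>
            {
                lock (locker)             // localFinally: merge once per worker
                    r.AddRange(local);
            });

        return r; // order is not guaranteed
    }

    public static void Main()
    {
        List<int> results = RunAll(Enumerable.Range(1, 5));
        results.Sort();
        Console.WriteLine(string.Join(", ", results)); // prints "1, 4, 9, 16, 25"
    }
}
```

The lock is taken once per worker rather than once per item, so the slow calls themselves still run in parallel.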
A modification to the accepted answer for this question:
change
tasks.Add(Task.Factory.StartNew(method(t)));
to
//pass a lambda so that method(t) runs inside the task
tasks.Add(Task.Factory.StartNew(() => { method(t);}));
