I have a list of tasks that I want to execute in parallel using Parallel.ForEach. It starts fine with 4 tasks running in parallel but in the end it decreases to only one task at a time.
Here is the count of parallel tasks in time:
1 2 3 4 4 3 4 4 ... 4 4 4 3 3 1 1 1 1 1 1 1
The max degree of parallelism is set to 4. Towards the end of the run only one task executes at a time, and all executions run on the same thread. Why am I getting this one-task-at-a-time execution at the end, and how can I avoid it?
Here is the code:
var threadCount = 4;
ThreadPool.SetMinThreads(threadCount, threadCount);
Parallel.ForEach(taskDataList,
new ParallelOptions() {MaxDegreeOfParallelism = threadCount},
(x) => { RunOne(x); });
The RunOne function starts an external process and waits for it to end. Some suspected that RunOne itself was the cause of the lack of parallelism. To rule this out, I recreated the situation by replacing the function with a sleep call of identical duration.
The code is below. Here t is the list of seconds each task takes. activeCount is the number of currently running tasks and remaining is the number of tasks that still remain in the list.
var t = new List<int>()
{2,2,2,1,1,1,1,1,1,1,
1,1,1,1,1,3,1,1,1,1,
1,1,1,1,1,1,1,1,5,4,
26,12,11,16,44,4,37,26,13,36};
int activeCount = 0;
int remaining = t.Count;
Parallel.ForEach(t, new ParallelOptions() {MaxDegreeOfParallelism = 4},
(x) =>
{
Console.WriteLine($"Active={Interlocked.Increment(ref activeCount)} " +
$"Remaining={Interlocked.Decrement(ref remaining)} " +
$"Run thread={Thread.CurrentThread.ManagedThreadId}");
Thread.Sleep(x * 1000); //Sleep x seconds
Interlocked.Decrement(ref activeCount);
});
At the very end it produces output like this:
Active=2 Remaining=7 Run thread=3
Active=1 Remaining=6 Run thread=3
Active=1 Remaining=5 Run thread=3
Active=1 Remaining=4 Run thread=3
Active=1 Remaining=3 Run thread=3
Active=1 Remaining=2 Run thread=3
Active=1 Remaining=1 Run thread=3
Active=1 Remaining=0 Run thread=3
This output shows that at the end only 1 task is running while 6 tasks still remain. With a limit of 4 parallel tasks this makes no sense: when 6 tasks are still available, I would expect to see 4 running in parallel.
Should I use Parallel.ForEach differently or is it a bug/feature?
After looking at the reference source of Parallel.ForEach I found out that, instead of distributing elements to threads one by one, it splits the list into chunks and hands a whole chunk of items to each thread. This is a very inefficient approach for long-running tasks.
var t = new List<int>()
{2,2,2,1,1,1,1,1,1,1,
1,1,1,1,1,3,1,1,1,1,
1,1,1,1,1,1,1,1,5,4,
26,12,11,16,44,4,37,26,13,36};
int activeCount = 0;
int remaining = t.Count;
var cq = new ConcurrentQueue<int>(t);
var tasks = new List<Task>();
for (int i = 0; i < 4; i++) tasks.Add(Task.Factory.StartNew(() =>
{
int x;
while (cq.TryDequeue(out x))
{
Console.WriteLine($"Active={Interlocked.Increment(ref activeCount)} " +
$"Remaining={Interlocked.Decrement(ref remaining)} " +
$"Run thread={Thread.CurrentThread.ManagedThreadId}");
Thread.Sleep(x * 1000); //Sleep x seconds
Interlocked.Decrement(ref activeCount);
}
}));
Task.WaitAll(tasks.ToArray());
I used 4 parallel tasks as in the first code example. Execution time in this case was 83 seconds, whereas using Parallel.ForEach took 211 seconds. This shows that Parallel.ForEach can be very inefficient in certain cases and should be used with caution.
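Another option, if you want to keep Parallel.ForEach, is a load-balancing partitioner so that elements are handed to workers one at a time instead of in pre-computed chunks. A minimal sketch, assuming .NET 4.5+ where EnumerablePartitionerOptions is available:
using System.Collections.Concurrent;
// NoBuffering hands one element at a time to each worker, which keeps
// all 4 workers busy when item durations vary widely.
var partitioner = Partitioner.Create(t, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner,
    new ParallelOptions() { MaxDegreeOfParallelism = 4 },
    (x) => { Thread.Sleep(x * 1000); });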
Related
Using a simple test computation, I am trying to find out why parallel foreach does not give the expected speedup on a machine with 32 physical cores and 64 logical cores.
...
var parameters = new List<string>();
for (int i = 1; i <= 9; i++) {
parameters.Add(i.ToString());
if (Scenario.UsesParallelForEach)
{
Parallel.ForEach(parameters, parameter => {
FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "started");
var lc = new LongComputation();
lc.Compute();
FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "stopped");
});
}
else
{
foreach (var parameter in parameters)
{
FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "started");
var lc = new LongComputation();
lc.Compute();
FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "stopped");
}
}
}
...
class LongComputation
{
public void Compute()
{
var s = "";
for (int i = 0; i <= 40000; i++)
{
s = s + i.ToString() + "\n";
}
}
}
The Compute function takes about 5 seconds to complete. My assumption was that, with the parallel foreach loop, each additional iteration creates a parallel thread running on one of the cores, taking only as long as a single Compute call. So if I ran the loop twice, the sequential foreach would take 10 seconds but the parallel foreach only 5 seconds (assuming 2 cores are available), for a speedup of 2. If I ran the loop three times, the sequential foreach would take 15 seconds, but again the parallel foreach only 5. The speedup would be 3, then 4, 5, 6, 7, 8, and 9. However, what I observe is a constant speedup of 1.3.
[Figure: Sequential vs parallel foreach. X-axis: number of sequential/parallel executions of the computation; Y-axis: time in seconds.]
[Figure: Speedup, i.e. the time of the sequential foreach divided by the time of the parallel foreach.]
The event fired in FireOnParameterComputed is intended to be used by a GUI progress bar to show progress. In the progress bar it can clearly be seen that a new thread is created for each iteration.
My question is, why don't I see the expected speedup or at least close to the expected speedup?
Tasks aren't threads.
Sometimes starting a task will cause a thread to be created, but not always. Creating and managing threads consumes time and system resources. When a task only takes a short amount of time, even though it's counter-intuitive, the single-threaded model is often faster.
The CLR knows this and tries to make its best judgment on how to execute the task based on a number of factors including any hints that you've passed to it.
For Parallel.ForEach, if you're certain that you want multiple threads to be spawned, try passing in a ParallelOptions instance:
Parallel.ForEach(parameters, new ParallelOptions { MaxDegreeOfParallelism = 100 }, parameter => {});
Hello, I was wondering how people make loops threaded. For example:
for(int i = 0; i<10; i++)
{
Console.WriteLine(i);
}
Is it possible to have one thread for every iteration?
So:
Thread 1: 0
Thread 2: 1
Thread 3: 2
etc.
And if so, how would I cap the number of threads?
What you are looking for is Parallel.For.
This:
for(int i = 0; i < 10; i++)
{
Console.WriteLine(i);
}
Becomes this:
Parallel.For(0, 10, new ParallelOptions { MaxDegreeOfParallelism = 5 }, i =>
{
Console.WriteLine(i);
});
This will spawn one task per iteration of the loop. One of the overloads of Parallel.For takes a ParallelOptions instance, which lets you set the maximum number of tasks running concurrently. Official docs: https://msdn.microsoft.com/en-us/library/dd992418(v=vs.110).aspx
Note there is some difference between a thread and a task in C#: What is the difference between task and thread?
If you have a list of items to process I recommend the parallel foreach: https://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
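For example, a minimal Parallel.ForEach sketch with a capped degree of parallelism (the list contents here are just placeholders):
var items = new List<string> { "a", "b", "c" };
Parallel.ForEach(items,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    item => Console.WriteLine(item));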
I have some code that loops through a list of records, starts an export task for each one, and increases a progress counter by 1 each time a task finishes so the user knows how far along the process is.
But depending on the timing of my loops, I often see the output showing a higher number before a lower number.
For example, I would expect to see output like this:
Exporting A
Exporting B
Exporting C
Exporting D
Exporting E
Finished 1 / 5
Finished 2 / 5
Finished 3 / 5
Finished 4 / 5
Finished 5 / 5
But instead I get output like this
Exporting A
Exporting B
Exporting C
Exporting D
Exporting E
Finished 1 / 5
Finished 2 / 5
Finished 5 / 5
Finished 4 / 5
Finished 3 / 5
I don't expect the output to be exact since I'm not locking the value when I update/use it (sometimes it outputs the same number twice, or skips a number); however, I wouldn't expect it to go backwards.
My test data set is 72 values, and the relevant code looks like this:
var tasks = new List<Task>();
int counter = 0;
StatusMessage = string.Format("Exporting 0 / {0}", count);
foreach (var value in myValues)
{
var valueParam = value;
// Create async task, start it, and store the task in a list
// so we can wait for all tasks to finish at the end
tasks.Add(
Task.Factory.StartNew(() =>
{
Debug.WriteLine("Exporting " + valueParam );
System.Threading.Thread.Sleep(500);
counter++;
StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
Debug.WriteLine("Finished " + counter.ToString());
})
);
}
// Begin async task to wait for all tasks to finish and update output
Task.Factory.StartNew(() =>
{
Task.WaitAll(tasks.ToArray());
StatusMessage = "Finished";
});
The output can appear backwards in both the debug statements and the StatusMessage output.
What's the correct way to keep count of how many async tasks in a loop are completed so that this problem doesn't occur?
You get mixed output because counter is not incremented in the same order as the Debug.WriteLine(...) calls execute.
To get a consistent progress report, you can introduce a reporting lock into the task (progressReportLock below is an object declared once, outside the loop):
tasks.Add(
Task.Factory.StartNew(() =>
{
Debug.WriteLine("Exporting " + valueParam );
System.Threading.Thread.Sleep(500);
lock(progressReportLock)
{
counter++;
StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
Debug.WriteLine("Finished " + counter.ToString());
}
})
);
In this sample the counter variable represents state shared among several threads. Using the ++ operator on shared state is simply unsafe and will give you incorrect results. It essentially boils down to the following instructions:
push counter to stack
push 1 to stack
add values on the stack
store into counter
Because multiple threads are executing this statement it's possible for one to interrupt the other partway through completing the above sequence. This would cause the incorrect value to end up in counter.
Instead of ++ use the following statement:
Interlocked.Increment(ref counter);
This operation is specifically designed to update state that may be shared among several threads. The increment happens atomically and won't suffer from the race conditions I outlined.
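A quick way to see the difference (a sketch; the unsafe total varies from run to run):
int unsafeCounter = 0;
int safeCounter = 0;
Parallel.For(0, 1000000, i =>
{
    unsafeCounter++;                          // racy read-modify-write
    Interlocked.Increment(ref safeCounter);   // atomic
});
// safeCounter is always 1000000; unsafeCounter typically comes up short
Console.WriteLine($"unsafe={unsafeCounter} safe={safeCounter}");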
The actual out of order display of values suffers from a similar problem even after my suggested fix. The increment and display operation aren't atomic and hence one thread can interrupt the other in between the increment and display. If you want the operations to be un-interruptable by other threads then you will need to use a lock.
object lockTarget = new object();
int counter = 0;
...
lock (lockTarget) {
counter++;
StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
Debug.WriteLine("Finished " + counter.ToString());
}
Note that because the increment of counter now occurs inside the lock, there is no longer a need to use Interlocked.Increment.
Running the code from Parallel.ForEach keeps spawning new threads, with a few of my modifications.
With the following line commented out:
//threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
the output is the "obvious", i.e. expected, one:
[00:00] Job 0 complete. 2 threads remaining. unsafeCount=2
[00:00] Job 1 complete. 1 threads remaining. unsafeCount=1
[00:00] Job 2 complete. 3 threads remaining. unsafeCount=3
[00:00] Job 3 complete. 4 threads remaining. unsafeCount=4
[00:00] Job 4 complete. 5 threads remaining. unsafeCount=5
[00:00] Job 5 complete. 6 threads remaining. unsafeCount=6
[00:01] Job 6 complete. 7 threads remaining. unsafeCount=7
[00:01] Job 8 complete. 8 threads remaining. unsafeCount=8
[00:01] Job 7 complete. 9 threads remaining. unsafeCount=9
[00:01] Job 9 complete. 10 threads remaining. unsafeCount=10
While the output of the same code after uncommenting the above line is:
[00:00] Job 0 complete. 1 threads remaining. unsafeCount=1
[00:00] Job 1 complete. 0 threads remaining. unsafeCount=0
[00:00] Job 3 complete. 0 threads remaining. unsafeCount=0
[00:00] Job 2 complete. 1 threads remaining. unsafeCount=1
[00:00] Job 4 complete. 1 threads remaining. unsafeCount=1
[00:00] Job 5 complete. 1 threads remaining. unsafeCount=1
[00:01] Job 6 complete. 1 threads remaining. unsafeCount=1
[00:01] Job 8 complete. 1 threads remaining. unsafeCount=1
[00:01] Job 9 complete. 1 threads remaining. unsafeCount=1
[00:01] Job 7 complete. 0 threads remaining. unsafeCount=0
Can you explain to me why decrementing one variable, threadsRemaining, stops (or prevents) incrementing another one, unsafeCount?
The code of console app:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace seParallelForEachKeepsSpawningNewThreads
{
public class Node
{
public Node Previous { get; private set; }
public Node(Node previous)
{
Previous = previous;
}
}
public class Program
{
public static void Main(string[] args)
{
DateTime startMoment = DateTime.Now;
int concurrentThreads = 0;
int unsafeCount = 0;
var jobs = Enumerable.Range(0, 10);
ParallelOptions po = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(jobs, po, delegate(int jobNr)
{
int threadsRemaining = Interlocked.Increment(ref concurrentThreads);
unsafeCount++;
int heavyness = jobNr % 9;
//Give the processor and the garbage collector something to do...
List<Node> nodes = new List<Node>();
Node current = null;
//for (int y = 0; y < 1024 * 1024 * heavyness; y++)
for (int y = 0; y < 1024 * 4 * heavyness; y++)
{
current = new Node(current);
nodes.Add(current);
}
TimeSpan elapsed = DateTime.Now - startMoment;
//*****************
//threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
Console.WriteLine("[{0:mm\\:ss}] Job {1} complete. {2} threads remaining. unsafeCount={2}",
elapsed, jobNr, threadsRemaining, unsafeCount);
});
Console.WriteLine("FINISHED");
Console.ReadLine();
}
}
}
This is the problem:
Console.WriteLine(
"[{0:mm\\:ss}] Job {1} complete. {2} threads remaining. unsafeCount={2}",
elapsed, jobNr, threadsRemaining, unsafeCount);
The final part should be {3}, not {2}. You're just printing out threadsRemaining twice at the moment...
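With the format item fixed, the line becomes:
Console.WriteLine(
    "[{0:mm\\:ss}] Job {1} complete. {2} threads remaining. unsafeCount={3}",
    elapsed, jobNr, threadsRemaining, unsafeCount);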
Given an entity List of updated objects, is it safe to instantiate a new context per iteration in a Parallel.For or foreach loop and call SubmitChanges() in each of (let's say) 10,000 iterations?
Is it safe performing bulk updates this way? What are the possible drawbacks?
This may be a scenario where parallelism should be avoided.
Instantiating a new DataContext per iteration means that, within each iteration, a connection is acquired from the connection pool, opened, and a single entity is written to the database before the connection is returned to the pool. Doing this on every iteration is a comparatively expensive operation, generating overhead that outweighs the advantages of parallelism. Adding entities to one data context and writing them to the database as a single action is more efficient.
Using the following as a benchmark for the Parallel insertions:
private static TimeSpan RunInParallel(int inserts)
{
Stopwatch watch = new Stopwatch();
watch.Start();
Parallel.For(0, inserts, new ParallelOptions() { MaxDegreeOfParallelism = 100 },
(i) =>
{
using (var context = new DataClasses1DataContext())
{
context.Tables.InsertOnSubmit(new Table() { Number = i });
context.SubmitChanges();
}
}
);
watch.Stop();
return watch.Elapsed;
}
For serial insertions:
private static TimeSpan RunInSerial(int inserts)
{
Stopwatch watch = new Stopwatch();
watch.Start();
using (var ctx = new DataClasses1DataContext())
{
for (int i = 0; i < inserts; i++)
{
ctx.Tables.InsertOnSubmit(new Table() { Number = i });
}
ctx.SubmitChanges();
}
watch.Stop();
return watch.Elapsed;
}
Where DataClasses1DataContext is an automatically generated DataContext for the Table entity used above.
When run on a first generation Intel i7 (8 logical cores) the following results were obtained:
10 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.0202820
Average time elapsed for a 100 runs in serial: 00:00:00.0108694
100 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.2269799
Average time elapsed for a 100 runs in serial: 00:00:00.1434693
1000 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:02.1647577
Average time elapsed for a 100 runs in serial: 00:00:00.8163786
10000 inserts:
Average time elapsed for a 10 runs in parallel: 00:00:22.7436584
Average time elapsed for a 10 runs in serial: 00:00:07.7273398
In general, when run in parallel the insertions took roughly two to three times as long to execute as when run serially.
UPDATE:
If you can implement some batching scheme for the data, it might be beneficial to use parallel insertions.
When using batches, the size of the batch will affect the insertion performance so some optimal ratio between the number of entries per batch and number of batches inserted will have to be determined. To demonstrate this the following method was used to batch 10000 inserts into groups of 1 (10000 batches, same as the initial parallel approach), 10 (1000 batches), 100 (100 batches), 1000 (10 batches), 10000 (1 batch, same as the serial insertion approach) then insert each batch in parallel:
private static TimeSpan RunAsParallelBatches(int inserts, int batchSize)
{
Stopwatch watch = new Stopwatch();
watch.Start();
// batch the data to be inserted
List<List<int>> batches = new List<List<int>>();
for (int g = 0; g < inserts / batchSize; g++)
{
List<int> numbers = new List<int>();
int start = g * batchSize;
int end = start + batchSize;
for (int i = start; i < end; i++)
{
numbers.Add(i);
}
batches.Add(numbers);
}
// insert each batch in parallel
Parallel.ForEach(batches,
(batch) =>
{
using (DataClasses1DataContext ctx = new DataClasses1DataContext())
{
foreach (int number in batch)
{
ctx.Tables.InsertOnSubmit(new Table() { Number = number });
}
ctx.SubmitChanges();
}
}
);
watch.Stop();
return watch.Elapsed;
}
Taking the average time over 10 runs of 10000 insertions generates the following results:
10000 inserts repeated 10 times
Average time for initial parallel insertion approach: 00:00:22.7436584
Average time in parallel using batches of 1 entity (10000 batches): 00:00:23.1088289
Average time in parallel using batches of 10 entities (1000 batches): 00:00:07.1443220
Average time in parallel using batches of 100 entities (100 batches): 00:00:04.3111268
Average time in parallel using batches of 1000 entities (10 batches): 00:00:04.0668334
Average time in parallel using batches of 10000 entities (1 batch): 00:00:08.2820498
Average time for serial insertion approach: 00:00:07.7273398
So by batching the insertions into groups, a performance increase can be gained as long as enough work is performed within each iteration to outweigh the overhead of setting up the DataContext and performing the batch insertion. In this case, by batching the insertions into groups of 1000, the parallel insertion managed to outperform the serial approach by roughly 2x on this system.
This can be done safely and will yield better performance. You need to make sure that:
you are not ever accessing the same data context concurrently
you are inserting batches of rows (maybe 100 to 10000 at a time). This will keep the overhead of instantiating the data context and opening connections low.
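For illustration, a minimal sketch of both points, reusing the DataClasses1DataContext and Table classes from the answer above (the batch size of 1000 is an assumption to tune per workload):
const int batchSize = 1000;
// split the inserts into batches so each context submits many rows at once
var batches = new List<List<int>>();
for (int i = 0; i < 10000; i += batchSize)
{
    var batch = new List<int>();
    for (int n = i; n < i + batchSize && n < 10000; n++) batch.Add(n);
    batches.Add(batch);
}
Parallel.ForEach(batches, batch =>
{
    // one context per batch: never shared across threads, and only one
    // connection open / SubmitChanges call per batch
    using (var ctx = new DataClasses1DataContext())
    {
        foreach (int number in batch)
            ctx.Tables.InsertOnSubmit(new Table() { Number = number });
        ctx.SubmitChanges();
    }
});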