When is Parallel.Invoke useful?

When is Parallel.Invoke useful? - c#

I'm just diving into learning about the Parallel class in the 4.0 Framework and am trying to understand when it would be useful. At first after reviewing some of the documentation I tried to execute two loops, one using Parallel.Invoke and one sequentially like so:
static void Main()
{
DateTime start = DateTime.Now;
Parallel.Invoke(BasicAction, BasicAction2);
DateTime end = DateTime.Now;
var parallel = end.Subtract(start).TotalSeconds;
start = DateTime.Now;
BasicAction();
BasicAction2();
end = DateTime.Now;
var sequential = end.Subtract(start).TotalSeconds;
Console.WriteLine("Parallel:{0}", parallel.ToString());
Console.WriteLine("Sequential:{0}", sequential.ToString());
Console.Read();
}
static void BasicAction()
{
for (int i = 0; i < 10000; i++)
{
Console.WriteLine("Method=BasicAction, Thread={0}, i={1}", Thread.CurrentThread.ManagedThreadId, i.ToString());
}
}
static void BasicAction2()
{
for (int i = 0; i < 10000; i++)
{
Console.WriteLine("Method=BasicAction2, Thread={0}, i={1}", Thread.CurrentThread.ManagedThreadId, i.ToString());
}
}
There is no noticeable difference in time of execution here, or am I missing the point? Is it more useful for asynchronous invocations of web services or...?
EDIT: I removed the DateTime with Stopwatch, removed the write to the console with a simple addition operation.
UPDATE - Big Time Difference Now: Thanks for clearing up the problems I had when I involved Console
static void Main()
{
Stopwatch s = new Stopwatch();
s.Start();
Parallel.Invoke(BasicAction, BasicAction2);
s.Stop();
var parallel = s.ElapsedMilliseconds;
s.Reset();
s.Start();
BasicAction();
BasicAction2();
s.Stop();
var sequential = s.ElapsedMilliseconds;
Console.WriteLine("Parallel:{0}", parallel.ToString());
Console.WriteLine("Sequential:{0}", sequential.ToString());
Console.Read();
}
static void BasicAction()
{
Thread.Sleep(100);
}
static void BasicAction2()
{
Thread.Sleep(100);
}

The test you are doing is nonsensical; you are testing to see if something that you can not perform in parallel is faster if you perform it in parallel.
Console.Writeline handles synchronization for you so it will always act as though it is running on a single thread.
From here:
...call the SetIn, SetOut, or SetError method, respectively. I/O
operations using these streams are synchronized, which means multiple
threads can read from, or write to, the streams.
Any advantage that the parallel version gains from running on multiple threads is lost through the marshaling done by the console. In fact I wouldn't be surprised to see that all the thread switching actually means that the parallel run would be slower.
Try doing something else in the actions (a simple Thread.Sleep would do) that can be processed by multiple threads concurrently and you should see a large difference in the run times. Large enough that the inaccuracy of using DateTime as your timing mechanism will not matter too much.

It's not a matter of time of execution. The output to the console is determined by how the actions are scheduled to run. To get an accurate time of execution, you should be using StopWatch. At any rate, you are using Console.Writeline so it will appear as though it is in one thread of execution. Any thing you have tried to attain by using parallel.invoke is lost by the nature of Console.Writeline.

On something simple like that the run times will be the same. What Parallel.Invoke is doing is running the two methods at the same time.
In the first case you'll have lines spat out to the console in a mixed up order.
Method=BasicAction2, Thread=6, i=9776
Method=BasicAction, Thread=10, i=9985
// <snip>
Method=BasicAction, Thread=10, i=9999
Method=BasicAction2, Thread=6, i=9777
In the second case you'll have all the BasicAction's before the BasicAction2's.
What this shows you is that the two methods are running at the same time.

In ideal case (if number of delegates is equal to number of parallel threads & there are enough cpu cores) duration of operations will become MAX(AllDurations) instead of SUM(AllDurations) (if AllDurations is a list of each delegate execution times like {1sec,10sec, 20sec, 5sec} ). In less idealcase its moving in this direction.
Its useful when you don't care about the order in which delegates are invoked, but you care that you block thread execution until every delegate is completed, so yes it can be a situation where you need to gather data from various sources before you can proceed (they can be webservices or other types of sources).
Parallel.For can be used much more often I think, in this case its pretty much required that you got different tasks and each is taking substantial duration to execute, and I guess if you don't have an idea of possible range of execution times ( which is true for webservices) Invoke will shine the most.
Maybe your static constructor requires to build up two independant dictionaries for your type to use, you can invoke methods that fill them using Invoke() in parallel and shorten time 2x if they both take roughly same time for example.

Related

C# shared variable access from different threads

I am using a static variables to get access between threads, but is taking so long to get their values.
Context: I have a static class Results.cs, where I store the result variables of two running Process.cs instances.
public static int ResultsStation0 { get; set; }
public static int ResultsStation1 { get; set; }
Then, a function of the two process instances is called at the same time, with initial value of ResultsStation0/1 = -1.
Because the result will be provided not at the same time, the function is checking that both results are available. The fast instance will set the result and await for the result of the slower instance.
void StationResult(){
Stopwatch sw = new Stopwatch();
sw.Restart();
switch (stationIndex) //Set the result of the station thread
{
case 0: Results.ResultsStation0 = 1; break;
case 1: Results.ResultsStation1 = 1; break;
}
//Waits to get the results of both threads
while (true)
{
if (Results.ResultsStation0 != -1 && Results.ResultsStation1 != -1)
{
break;
}
}
Trace_Info("GOT RESULTS " + stationIndex + "Time: " + sw.ElapsedMilliseconds.ToString() + "ms");
if (Results.ResultsStation0 == 1 && Results.ResultsStation1 == 1)
{
//set OK if both results are OK
Device.profinet.WritePorts(new Enum[] { NOK, OK },
new int[] { 0, 1 });
}
}
It works, but the problem is that the value of sw of the thread that awaits, should be 1ms more or less. I am getting 1ms sometimes, but most of the times I have values up to 80ms.
My question is: why it takes that much if they are sharing the same memory (I guess)?
Is this the right way to access to a variable between threads?

Don't use this method. Global mutable state is bad enough. Mixing in multiple threads sounds like a recipe for unmaintainable code. Since there is no synchronization at all in sight there is no real guarantee that your program may ever finish. On a single CPU system your loop will prevent any real work from actually being done until the scheduler picks one of the worker threads to run, an even on multi core system you will waste a ton of CPU cycles.
If you really want global variables, these should be something that can signal the completion of the operation, i.e. a Task, or ManualResetEvent. That way you can get rid of your horrible spin-wait, and actually wait for each task to complete.
But I would highly recommend to get rid of the global variables and just use standard task based programming:
var result1 = Task.Run(MyMethod1);
var result2 = Task.Run(MyMethod2);
await Task.WhenAll(new []{result1, result2});
Such code is much easier to reason about and understand.
Multi threaded programming is difficult. There are a bunch of new ways your program can break, and the compiler will not help you. You are lucky if you even get an exception, in many cases you will just get an incorrect result. If you are unlucky you will only get incorrect results in production, not in development or testing. So you should read a fair amount about the topic so that you are at least familiar with the common dangers and the ways to mitigate them.

You are using flags as signaling for this you have a class called AutoResetEvent.
There's a difference between safe access and synchronization.
For safe access (atomic) purpose you can use the class Interlocked
For synchronization you use mutex based solutions - either spinlocks, barriers, etc...
What it looks like is you need a synchronization mechanism because you relay on an atomic behavior to signal a process that it is done.
Further more,
For C# there's the async way to do things and that is to call await.
It is Task based so in case you can redesign your flow to use Tasks instead of Threads it will suit you more.
Just to be clear - atomicity means you perform the call in one go.
So for example this is not atomic
int a = 0;
int b = a; //not atomic - read 'a' and then assign to 'b'.
I won't teach you everything to know about threading in C# in one post answer - so my advice is to read the MSDN articles about threading and tasks.

Let threads wait for all tasks to complete before starting on the next set of tasks

I have a pipeline that consists of several stages. Jobs in the same stage can be worked on in parallel. But all jobs in stage 1 have to completed before anyone can start working on jobs in stage 2, etc..
I was thinking of synchronizing this work using a CountDownEvent.
My basis structure would be
this.WorkerCountdownEvent = new CountdownEvent(MaxJobsInStage);
this.WorkerCountdownEvent.Signal(MaxJobsInStage); // Starts all threads
// Each thread runs the following code
for (this.currentStage = 0; this.currentStage < this.PipelineStages.Count; this.currentStage++)
{
this.WorkerCountdownEvent.Wait();
var stage = this.PipelineStages[this.currentStage];
if (stage.Systems.Count < threadIndex)
{
var system = stage.Systems[threadIndex];
system.Process();
}
this.WorkerCountdownEvent.Signal(); // <--
}
This would work well for processing one stage. But the first thread that reaches this.WorkerCountdownEvent.Signal() will crash the application as its trying to decrement the signal to below zero.
Of course if I want prevent this, and have the jobs to wait again, I have to call this.WorkerCountdownEvent.Reset(). But I have to call it after all threads have started working, but before one thread is done with its work. Which seems like an impossible task?
Am I using the wrong synchronization primitive? Or should I use two countdown events? Or am I missing something completely?
(Btw usually jobs will take less than a milliseconds so bonus points if someone has a better way to do this using 'slim' primitives like ManualResetEventSlim. ThreadPools, or Task<> are not the direction I'm looking at since these threads will live for very long (hours) and need to go through the pipeline 60x per second. So the overhead of stopping/starting a Tasks is considerable here).
Edit: this questions was flagged as a duplciate of two questions. One of the questons was answered with "use thread.Join()" and the other one was answered with "Use TPL" both answers are (in my opnion) clearly not answers to a question about pipelining and threading primitives such as CountDownEvent.

I think that the most suitable synchronization primitive for this case is the Barrier.
Enables multiple tasks to cooperatively work on an algorithm in parallel through multiple phases.
Usage example:
private Barrier _barrier = new Barrier(this.WorkersCount);
// Each worker thread runs the following code
for (i = 0; i < this.StagesCount; i++)
{
// Here goes the work of a single worker for a single stage...
_barrier.SignalAndWait();
}
Update: In case you want the workers to wait for the signal asynchronously, there is an AsyncBarrier implementation here.

Controlling number of tasks being executed at an instance at runtime [duplicate]

I have the following code:
var factory = new TaskFactory();
for (int i = 0; i < 100; i++)
{
var i1 = i;
factory.StartNew(() => foo(i1));
}
static void foo(int i)
{
Thread.Sleep(1000);
Console.WriteLine($"foo{i} - on thread {Thread.CurrentThread.ManagedThreadId}");
}
I can see it only does 4 threads at a time (based on observation). My questions:
What determines the number of threads used at a time?
How can I retrieve this number?
How can I change this number?
P.S. My box has 4 cores.
P.P.S. I needed to have a specific number of tasks (and no more) that are concurrently processed by the TPL and ended up with the following code:
private static int count = 0; // keep track of how many concurrent tasks are running
private static void SemaphoreImplementation()
{
var s = new Semaphore(20, 20); // allow 20 tasks at a time
for (int i = 0; i < 1000; i++)
{
var i1 = i;
Task.Factory.StartNew(() =>
{
try
{
s.WaitOne();
Interlocked.Increment(ref count);
foo(i1);
}
finally
{
s.Release();
Interlocked.Decrement(ref count);
}
}, TaskCreationOptions.LongRunning);
}
}
static void foo(int i)
{
Thread.Sleep(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}

When you are using a Task in .NET, you are telling the TPL to schedule a piece of work (via TaskScheduler) to be executed on the ThreadPool. Note that the work will be scheduled at its earliest opportunity and however the scheduler sees fit. This means that the TaskScheduler will decide how many threads will be used to run n number of tasks and which task is executed on which thread.
The TPL is very well tuned and continues to adjust its algorithm as it executes your tasks. So, in most cases, it tries to minimize contention. What this means is if you are running 100 tasks and only have 4 cores (which you can get using Environment.ProcessorCount), it would not make sense to execute more than 4 threads at any given time, as otherwise it would need to do more context switching. Now there are times where you want to explicitly override this behaviour. Let's say in the case where you need to wait for some sort of IO to finish, which is a whole different story.
In summary, trust the TPL. But if you are adamant to spawn a thread per task (not always a good idea!), you can use:
Task.Factory.StartNew(
() => /* your piece of work */,
TaskCreationOptions.LongRunning);
This tells the DefaultTaskscheduler to explicitly spawn a new thread for that piece of work.
You can also use your own Scheduler and pass it in to the TaskFactory. You can find a whole bunch of Schedulers HERE.
Note another alternative would be to use PLINQ which again by default analyses your query and decides whether parallelizing it would yield any benefit or not, again in the case of a blocking IO where you are certain starting multiple threads will result in a better execution you can force the parallelism by using WithExecutionMode(ParallelExecutionMode.ForceParallelism) you then can use WithDegreeOfParallelism, to give hints on how many threads to use but remember there is no guarantee you would get that many threads, as MSDN says:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Finally, I highly recommend having a read of THIS great series of articles on Threading and TPL.

If you increase the number of tasks to for example 1000000 you will see a lot more threads spawned over time. The TPL tends to inject one every 500ms.
The TPL threadpool does not understand IO-bound workloads (sleep is IO). It's not a good idea to rely on the TPL for picking the right degree of parallelism in these cases. The TPL is completely clueless and injects more threads based on vague guesses about throughput. Also to avoid deadlocks.
Here, the TPL policy clearly is not useful because the more threads you add the more throughput you get. Each thread can process one item per second in this contrived case. The TPL has no idea about that. It makes no sense to limit the thread count to the number of cores.
What determines the number of threads used at a time?
Barely documented TPL heuristics. They frequently go wrong. In particular they will spawn an unlimited number of threads over time in this case. Use task manager to see for yourself. Let this run for an hour and you'll have 1000s of threads.
How can I retrieve this number? How can I change this number?
You can retrieve some of these numbers but that's not the right way to go. If you need a guaranteed DOP you can use AsParallel().WithDegreeOfParallelism(...) or a custom task scheduler. You also can manually start LongRunning tasks. Do not mess with process global settings.

I would suggest using SemaphoreSlim because it doesn't use Windows kernel (so it can be used in Linux C# microservices) and also has a property SemaphoreSlim.CurrentCount that tells how many remaining threads are left so you don't need the Interlocked.Increment or Interlocked.Decrement. I also removed i1 because i is value type and it won't be changed by the call of foo method passing the i argument so it's no need to copy it into i1 to ensure it never changes (if that was the reasoning for adding i1):
private static void SemaphoreImplementation()
{
var maxThreadsCount = 20; // allow 20 tasks at a time
var semaphoreSlim = new SemaphoreSlim(maxTasksCount, maxTasksCount);
var taskFactory = new TaskFactory();
for (int i = 0; i < 1000; i++)
{
taskFactory.StartNew(async () =>
{
try
{
await semaphoreSlim.WaitAsync();
var count = maxTasksCount-semaphoreSlim.CurrentCount; //SemaphoreSlim.CurrentCount tells how many threads are remaining
await foo(i, count);
}
finally
{
semaphoreSlim.Release();
}
}, TaskCreationOptions.LongRunning);
}
}
static async void foo(int i, int count)
{
await Task.Wait(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}

Async slower than Sync

I have been working on Async calls and I found that the Async version of a method is running much slower than the Sync version. Can anyone comment on what I may be missing. Thanks.
Statistics
Sync method time is 00:00:23.5673480
Async method time is 00:01:07.1628415
Total Records/Entries returned per call = 19972
Below is the code that i am running.
-------------------- Test class ----------------------
[TestMethod]
public void TestPeoplePerformanceSyncVsAsync()
{
DateTime start;
DateTime end;
start = DateTime.Now;
for (int i = 0; i < 10; i++)
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
IList<IPerson> people1 = repository.GetPeople();
IList<IPerson> people2 = repository.GetPeople();
}
}
end = DateTime.Now;
var diff = start - end;
Console.WriteLine(diff);
start = DateTime.Now;
for (int i = 0; i < 10; i++)
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
Task<IList<IPerson>> people1 = GetPeopleAsync();
Task<IList<IPerson>> people2 = GetPeopleAsync();
Task.WaitAll(new Task[] {people1, people2});
}
}
end = DateTime.Now;
diff = start - end;
Console.WriteLine(diff);
}
private async Task<IList<IPerson>> GetPeopleAsync()
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
return await repository.GetPeopleAsync();
}
}
-------------------------- Repository ----------------------------
public IList<IPerson> GetPeople()
{
List<IPerson> people = new List<IPerson>();
using (PersonContext context = new PersonContext())
{
people.AddRange(context.People);
}
return people;
}
public async Task<IList<IPerson>> GetPeopleAsync()
{
List<IPerson> people = new List<IPerson>();
using (PersonContext context = new PersonContext())
{
people.AddRange(await context.People.ToListAsync());
}
return people;
}

So we've got a whole bunch of issues here, so I'll just say right off the bat that this isn't going to be an exhaustive list.
First off, the point of asynchrony is not strictly to improve performance. It can be, in certain contexts, used to improve performance, but that's not necessarily its goal. It can also be used to keep a UI responsive, for example. Paralleization is usually used to increase performance, but parallelization and asynchrony aren't equivalent. On top of that, parallelization has an overhead. You're spending time creating threads, scheduling them, synchronizing data between them, etc. The benefit of performing some operations in parallel may or may not surpass this overhead. If it doesn't, a synchronous solution may well be more performant.
Next, your "asynchronous" example isn't asynchronous "all the way up". You're calling WaitAll on the tasks inside the loop. For the example to be properly asynchronous one would like to see it be asynchronous all the way up to a single operation, namely some form of message loop.
Next, the two aren't don't the exact same thing in an asynchronous and synchronous manor. They are doing different things, which will obviously affect performance:
Your "asynchronous" solution creates 3 repositories. Your synchronous solution creates one. There is going to be some overhead here.
GetPeopleAsync takes a list, then pulls all of the items out of the list and puts them into another list. That's unnecessary overhead.
Then there are problems with your benchmarking:
You're using DateTime.Now, which is not designed for timing how long an operation takes. it's precision isn't particularly high, for example. You should use a StopWatch to time how long code takes.
You aren't performing all that many iterations. There's plenty of opportunity for the variation to affect the results here.
You aren't accounting for the fact that the first few runs through a section of code will take longer. The JITter needs to "warm up".
Garbage collections can be affecting your timings, namely that the objects created in the first test can end up being cleaned up during the second test.

It may depend on your data, or rather the amount of it. You didn't post what test metrics you're using to run your tests but this is my experience:
Usually when you see a slowdown in the performance of parallel algorithms when you're expecting improvement it's that the overhead of loading the extra libraries and spawning threads etc. slows down the parallel algorithm and makes it look like the linear/single-threaded version is performing better.
A greater amount of data should show better performance. Also try running the same test twice when all the libraries are loaded to avoid the load overhead.
If you don't see improvement, something is seriously wrong.
Note: You're getting voted down, I'm guessing, because you posted much more code than context, metrics etc. in the OP. IMO, very few SOers will actually bother to read and grok even that much code without being able to execute it while also being presented with metrics that are not at all useful!
Why I didn't read the code: When I see a code block with scroll bars along with the kind of text that was present in the original OP, my brain says: Don't bother. I think many if not most, probably do this.
Things to try:
Two different synch times does not mean statistically significant data. You should run each algorithm a number of times (5 at least) to see if you're experiencing anomalies. If your results for the same algorithms vary wildly then you may have other issues such as bandwidth restriction, server load etc. and the issue is external.
Try a .NET memory performance and/or memory profiler to help you track down the issue.
See #servy's great answer for more clues. It seems that he actually took the time to look at your code more closely.

Thread Safe Method with a Stop Watch

I'm trying to determine if my code I'm using is Thread Safe or not. I'm basically trying to call a method several times from different threads, and capture the time it takes for certain calls within the method to complete.
Here is an example of what I am doing.
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
namespace ThreadTest
{
class Program
{
static BlockingCollection<TimeSpan> Timer1 = new BlockingCollection<TimeSpan>(new ConcurrentBag<TimeSpan>());
static TimeSpan CaptureTime(Action action)
{
Stopwatch stopwatch = Stopwatch.StartNew();
action();
stopwatch.Stop();
return stopwatch.Elapsed;
}
static void ThreadFunction()
{
TimeSpan timer1 = new TimeSpan();
timer1 = CaptureTime(() =>
{
//Do Some Work
});
Timer1.Add(timer1);
}
static void Main(string[] args)
{
for (int i = 0; i < 50; i++)
{
var task = new Task(ThreadFunction);
task.Start();
}
}
}
}
And what I'm trying to determine is whether or not the TimeSpan values returned by the CaptureTime method can be trusted.
Thank you to anyone who can enlighten me.

Use of Stopwatch here is not the problem. See this recent answer. Since you are in a single thread when you use the Stopwatch, it will work fine.
But I'm not sure this approach is really going to be very useful. Are you trying to create your own profiler? Why not just use existing profiling tools?
When you spin up 50 instances of the same operation, they're bound to fight for the same CPU resources. Also, a new Task might or might not spin up a new thread. Even then, the amount of switching involved would make the results less-than-meaningful. Unless you are specifically trying to observe parallel behavior, I would avoid this approach.
The better way would be to run the action 50 times sequentially, time the whole thing, then divide by 50. (Assuming this is a short-running task.)
The using of BlockingCollection<TimeSpan>(new ConcurrentBag<TimeSpan>()) is also very weird. Since you are just adding to the list, and it is static and pre-created, then you can just use List<TimeSpan>. See the notes on Thread Saftey in the List<T> documentation here.
Ignore that. I misunderstood the context of the docs. Your code is just fine, and is indeed thread-safe. Thanks to Jim and Alexi for clearing that up.

They can be 'trusted' alright but that does not mean they will be very accurate.
It depends on lots of factors but basically you would want to measure a large number of calls to action() (on the same thread) and average them. Especially when a single call takes a relatively short time ( <= 1 ms)
You will still have to deal with external factors, Windows is not a Real Time OS.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.