ParallelEnumerable.Range vs Enumerable.Range(...).AsParallel()? - C#

What is the difference between ParallelEnumerable.Range and Enumerable.Range(...).AsParallel()?
ParallelEnumerable.Range creates a range partition (best for operations where CPU time is equal for each item),
whereas Enumerable.Range(...).AsParallel() might execute with either range or chunk partitioning.
Is there any performance difference? When should I use which?

If you're performing an operation where the CPU pressure grows as it iterates, then you're going to want to use AsParallel, because it can make adjustments as it iterates. However, if it's something that you just need to run in parallel and the operation doesn't create more pressure as it iterates, then use ParallelEnumerable.Range, because it can partition the work immediately.
Reference this article for a more detailed explanation.
So, let's assume you're performing some complex math in each iteration, and as the input values grow the math takes longer: you're going to want to use AsParallel so that it can make adjustments. But if you use that same option with something like setting a registry value, you'll see a performance decrease, because there is overhead associated with AsParallel that is not necessary.
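A minimal sketch of the two spellings (the partitioning behavior follows the explanation above; both queries produce the same values):

using System;
using System.Linq;

class PartitioningDemo
{
    static void Main()
    {
        // Range-partitioned up front: each worker is handed a
        // contiguous block of indices immediately.
        int[] a = ParallelEnumerable.Range(0, 1000)
                                    .Select(n => n * n)
                                    .ToArray();

        // PLINQ receives a plain IEnumerable<int> here, so it may use
        // chunk partitioning (workers grab chunks on demand) instead.
        int[] b = Enumerable.Range(0, 1000)
                            .AsParallel()
                            .Select(n => n * n)
                            .ToArray();

        Console.WriteLine(a.Length == b.Length); // True
    }
}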

Related

Why does Parallel.For lose so badly? [duplicate]

Here is the code:
using (var context = new AventureWorksDataContext())
{
    IEnumerable<Customer> _customerQuery = from c in context.Customers
                                           where c.FirstName.StartsWith("A")
                                           select c;
    var watch = new Stopwatch();
    watch.Start();
    var result = Parallel.ForEach(_customerQuery, c => Console.WriteLine(c.FirstName));
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);

    watch = new Stopwatch();
    watch.Start();
    foreach (var customer in _customerQuery)
    {
        Console.WriteLine(customer.FirstName);
    }
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);
}
The problem is, Parallel.ForEach takes about 400ms vs a regular foreach, which takes about 40ms. What exactly am I doing wrong and why doesn't this work as I expect it to?
Suppose you have a task to perform. Let's say you're a math teacher and you have twenty papers to grade. It takes you two minutes to grade a paper, so it's going to take you about forty minutes.
Now let's suppose that you decide to hire some assistants to help you grade papers. It takes you an hour to locate four assistants. You each take four papers and you are all done in eight minutes. You've traded 40 minutes of work for 68 total minutes of work including the extra hour to find the assistants, so this isn't a savings. The overhead of finding the assistants is larger than the cost of doing the work yourself.
Now suppose you have twenty thousand papers to grade, so it is going to take you about 40000 minutes. Now if you spend an hour finding assistants, that's a win. You each take 4000 papers and are done in a total of 8060 minutes instead of 40000 minutes, a savings of almost a factor of 5. The overhead of finding the assistants is basically irrelevant.
Parallelization is not free. The cost of splitting up work amongst different threads needs to be tiny compared to the amount of work done per thread.
Further reading:
Amdahl's law
Gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved.
Gustafson's law
Gives the theoretical speedup in latency of the execution of a task at fixed execution time that can be expected of a system whose resources are improved.
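For reference, with p the fraction of the work that benefits from parallelization and s the speedup of that fraction:

    S_Amdahl(s) = 1 / ((1 - p) + p / s)
    S_Gustafson(s) = (1 - p) + s * p

In the grading example nearly all of the work is parallelizable (p is close to 1), so the achievable speedup approaches the number of graders once the fixed hiring overhead is amortized over enough papers.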
The first thing you should realize is that not all parallelism is beneficial. There is an amount of overhead to parallelism, and this overhead may or may not be significant depending on the complexity of what is being parallelized. Since the work in your parallel function is very small, the overhead of managing the parallelism becomes significant, thus slowing down the overall work.
The additional overhead of creating all the threads for your enumerable, versus just enumerating it directly, is more than likely the cause of the slowdown. Parallel.ForEach is not a blanket performance-increasing move; you need to weigh whether or not the operation to be completed for each element is likely to block.
For example, if you were to make a web request or something instead of simply writing to the console, the parallel version might be faster. As it is, simply writing to the console is a very fast operation, so the overhead of creating the threads and starting them is going to be slower.
As a previous writer has said, there is some overhead associated with Parallel.ForEach, but that is not why you can't see your performance improvement. Console.WriteLine is a synchronized operation, so only one thread is writing at a time. Try changing the body to something non-blocking and you will see a performance increase (as long as the amount of work in the body is big enough to outweigh the overhead).
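A sketch of that comparison with a deliberately CPU-heavy body (the prime test and the iteration count are made up for illustration):

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelOverheadDemo
{
    // CPU-bound work, unlike the lock-serialized Console.WriteLine.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(2, 2000000).ToArray();

        var sw = Stopwatch.StartNew();
        int seqCount = 0;
        foreach (int n in numbers)
            if (IsPrime(n)) seqCount++;
        Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        int parCount = 0;
        Parallel.ForEach(numbers, n =>
        {
            if (IsPrime(n)) Interlocked.Increment(ref parCount);
        });
        Console.WriteLine("parallel:   " + sw.ElapsedMilliseconds + " ms");
    }
}

With a body this heavy the parallel version should win handily; swap IsPrime for Console.WriteLine and the ordering reverses, as in the question.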
I like salomon's answer and would like to add that you also have the additional overhead of:
Allocating delegates.
Calling through them.

Parallel.Invoke vs Parallel.Foreach for running parallel process on large list

I have a C# list which contains around 8000 items (file paths). I want to run a method on all of these items in parallel. For this I have the below 2 options:
1) Manually divide the list into small chunks (say of size 500 each), create an array of actions for these small lists, and then call Parallel.Invoke, like below:
var partitionedLists = MainList.DivideIntoChunks(500);
List<Action> actions = new List<Action>();
foreach (var lst in partitionedLists)
{
    actions.Add(() => CallMethod(lst));
}
Parallel.Invoke(actions.ToArray());
2) The second option is to run Parallel.ForEach, like below:
Parallel.ForEach(MainList, item => { CallMethod(item) });
What will be the best option here?
How does Parallel.ForEach divide the list into small chunks?
Please suggest, thanks in advance.
The first option is a form of task-parallelism, in which you divide your task into groups of sub-tasks and execute them in parallel. As is obvious from the code you provided, you are responsible for choosing the level of granularity [chunks] while creating the sub-tasks. The selected granularity might be too coarse or too fine if one does not rely on appropriate heuristics, and the resulting performance gain might not be significant. Task-parallelism is used in scenarios where the operation to be performed takes a similar time for all input values.
The second option is a form of data-parallelism, in which the input data is divided into smaller chunks based on the number of hardware threads/cores/processors available, and then each individual chunk is processed in isolation. In this case, the .NET library chooses the right level of granularity for you and ensures better CPU utilization. Conventionally, data-parallelism is used in scenarios when the operation to be performed can vary in terms of time taken, depending on the input value.
In conclusion, if your operation is more or less uniform over the range of input values and you know the right granularity [chunk size], go ahead with the first option. If however that's not the case or if you are unsure about the above questions, go with the second option which usually pans out better in most scenarios.
NOTE: If this is a very performance-critical component of your application, I would advise benchmarking the performance of both approaches in a production-like environment to get more data, in addition to the above recommendations.
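If you choose Parallel.ForEach but still want contiguous chunks, there is a middle ground worth knowing about: handing the index space to the built-in range partitioner (a sketch; MainList and CallMethod are the poster's own members):

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Partitioner.Create(0, N) splits the index space into ranges, so each
// worker runs a tight loop instead of one delegate call per element.
Parallel.ForEach(Partitioner.Create(0, MainList.Count), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
        CallMethod(MainList[i]);
});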

Deterministic random numbers in parallel code

I have a question regarding thread ordering for the TPL.
Indeed, it is very important for me that my Parallel.For loop be executed in the order of the loop. What I mean is that, given 4 threads, I would like the first thread to execute every (4k)-th iteration, the 2nd thread every (4k+1)-th, etc. (with k between 0 and NbSim/4):
1st thread -> 1st loop, 2nd thread -> 2nd loop, 3rd thread -> 3rd loop,
4th thread -> 4th loop, 1st thread -> 5th loop, etc.
I have seen the OrderedPartition directive, but I am not quite sure how I should apply it to a FOR loop rather than to a Parallel.FOREACH loop.
Many thanks for your help.
Following the previous remarks, I am completing the description:
Actually, after some consideration, I believe that my problem is not about ordering.
Indeed, I am working on a Monte-Carlo engine, in which for each iteration I generate a set of random numbers (always the same, seed = 0) and then apply some business logic to them. Thus everything should be deterministic, and when running the algorithm twice I should get the exact same results. But unfortunately this is not the case, and I am struggling to understand why. Any idea how to solve that kind of problem (without printing out every variable I have)?
Edit Number 2:
Thank you all for your suggestions
First, here is the way my code is ordered :
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4; // or 1

ParallelLoopResult res = Parallel.For<LocalDataStruct>(1, NbSim, options,
    // localInit: one LocalDataStruct instance per task
    () => new LocalDataStruct(/* params of the constructor of LocalData */),
    // body
    (iSim, loopState, localDataStruct) =>
    {
        // logic
        return localDataStruct;
    },
    // localFinally: merge the per-task results
    localDataStruct =>
    {
        lock (syncObject)
        {
            // critical section for outputting the parameters
        }
    });
When setting the degree of parallelism to 1, everything works fine; however, when setting the degree of parallelism to 4, I am getting results that are wrong and non-deterministic (when running the code twice I get different results). It is probably due to mutable objects; that is what I am checking now, but the source code is quite extensive, so it takes time. Do you think that there is a good strategy to check the code other than reviewing it (printing out all variables is impossible in this case (> 1000))? Also, when setting the number of simulations to 4 for 4 threads, everything works fine as well, mostly due to luck I believe (that's why I mentioned my first idea regarding ordering).
You can enforce ordering in PLINQ but it comes at a cost. It gives ordered results but does not enforce ordering of execution.
You really cannot do this with TPL without essentially serializing your algorithm. The TPL works on a Task model. It allows you to schedule tasks which are executed by the scheduler with no guarantee as to the order in which the Tasks are executed. Typically parallel implementations take the PLINQ approach and guarantee ordering of results not ordering of execution.
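For completeness, the PLINQ opt-in looks like this (a sketch; Simulate is a hypothetical per-iteration function):

// AsOrdered preserves the order of *results*, not the order in which
// the worker threads actually execute the iterations.
var results = Enumerable.Range(0, NbSim)
    .AsParallel()
    .AsOrdered()
    .Select(i => Simulate(i))
    .ToArray();   // results[i] always corresponds to input i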
Why is ordered execution important?
So, for a Monte-Carlo engine you need to make sure that each index in your array receives the same random numbers on every run. This does not mean that you need to order your threads, just that the random numbers are ordered across the work done by each thread. So if each iteration of your Parallel.ForEach were passed not only the array of elements to do work on but also its own instance of a random number generator (with a different fixed seed per thread), then you would still get deterministic results.
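A minimal sketch of per-index seeding (RunOneSimulation is a hypothetical stand-in for the business logic; System.Random is used only for brevity, and per the caveats below a proper parallel RNG is preferable):

double[] results = new double[NbSim];

Parallel.For(0, NbSim, iSim =>
{
    // Seed derived from the simulation index: simulation i sees the
    // same sequence on every run, regardless of which thread runs it.
    // (Nearby seeds give correlated streams with System.Random; use a
    // proper parallel RNG for real Monte-Carlo work.)
    var rng = new Random(12345 + iSim);
    results[iSim] = RunOneSimulation(rng);
});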
I'm assuming that you are familiar with the challenges related to parallelizing Monte-Carlo and generating good random number sequences. If not, here's something to get you started: Pseudo-random Number Generation for Parallel Monte Carlo—A Splitting Approach; Fast, High-Quality, Parallel Random-Number Generators: Comparing Implementations.
Some suggestions
I would start off by ensuring that you can get deterministic results in the sequential case by replacing the Parallel.ForEach with a foreach and seeing if this runs correctly. You could also try comparing the output of a sequential and a parallel run: add some diagnostic output, pipe it to a text file, and use a diff tool to compare the results.
If this is OK, then it is something to do with your parallel implementation, which, as is pointed out below, is usually related to mutable state. Some things to consider:
Is your random number generator threadsafe? Random is a poor random number generator at best and as far as I know is not designed for parallel execution. It is certainly not suitable for M-C calculations, parallel or otherwise.
Does your code have other state shared between threads? If so, what is it? This state will be mutated in a non-deterministic manner and affect your results.
Are you merging results from different threads in parallel? The non-associativity of parallel floating-point operations will also cause you issues here; see How can floating point calculations be made deterministic?. Even if the per-thread results are deterministic, if you are combining them in a non-deterministic order you will still have issues (a one-line illustration follows this list).
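A one-line illustration of that floating-point non-associativity:

double a = 1e16, b = -1e16, c = 1.0;
Console.WriteLine((a + b) + c);  // prints 1
Console.WriteLine(a + (b + c));  // prints 0 (c is absorbed by b)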
Assuming all threads share the same random number generator, then although you are generating the same sequence every time, which thread gets which elements of this sequence is non-deterministic. Hence you could arrive at different results.
That's if the random number generator is thread-safe; if it isn't, then it's not even guaranteed to generate the same sequence when called from multiple threads.
Apart from that it is difficult to theorize what could be causing non-determinism to arise; basically any global mutable state is suspicious. Each Task should be working with its own data.
If, rather than using a random number generator, you set up an array [0...N-1] of pre-determined values, say [0, 1/N, 2/N, ...], and do a Parallel.ForEach on that, does it still give nondeterministic results? If so, the RNG isn't the issue.

Drawing signal with a lot of samples

I need to display a set of signals. Each signal is defined by millions of samples. Just processing the collection of samples (to convert them to points according to the bitmap size) takes a significant amount of time (especially during scrolling).
So I implemented some kind of downsampling: I just skip some points (take every 2nd, every 3rd, or every 50th point, depending on the signal characteristics). It increases speed very much but significantly distorts the signal form.
Are there any smarter approaches?
We've had a similar issue in a recent application. Our visualization (a simple line graph) became too cluttered when zoomed out to see the full extent of the data (about 7 days of samples with a sample taken every 6 seconds more or less), so down-sampling was actually the way to go. If we didn't do that, zooming out wouldn't have much meaning, as all you would see was just a big blob of lines smeared out over the screen.
It all depends on how you are going to implement the down-sampling. There are two (simple) approaches: down-sample at the moment you receive a sample, or down-sample at display time.
What really gives a huge performance boost in both of these cases is the proper selection of your data-sources.
Let's say you have 7 million samples, and your viewing window is just interested in the last million points. If your implementation depends on an IEnumerable, this means that the IEnumerable will have to MoveNext 6 million times before actually starting. However, if you're using something which is optimized for random reads (a List comes to mind), you can implement your own enumerator for that, more or less like this:
public IEnumerator<T> GetEnumerator(int start, int count, int skip)
{
    // Assume the class has a field containing the data as a List<T>, named _data.
    // Yield every 'skip'-th sample from a window of 'count' positions starting at 'start'.
    for (int i = start; i < start + count && i < _data.Count; i += skip)
    {
        yield return _data[i];
    }
}
Obviously this is a very naive implementation, but you can do whatever you want within the for loop (use an algorithm based on the surrounding samples to average?). However, this approach will usually smooth out any extreme spikes in your signal, so be wary of that.
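One common refinement that keeps spikes visible is to emit the minimum and maximum of each bucket instead of one sample per skip (a sketch; it assumes the same hypothetical _data field, here a List<double>):

// Yield the min and max of every bucket so extreme spikes survive
// the downsampling instead of being skipped over.
public IEnumerable<double> MinMaxDownsample(int start, int count, int bucketSize)
{
    int end = Math.Min(start + count, _data.Count);
    for (int i = start; i < end; i += bucketSize)
    {
        double min = double.MaxValue, max = double.MinValue;
        int bucketEnd = Math.Min(i + bucketSize, end);
        for (int j = i; j < bucketEnd; j++)
        {
            if (_data[j] < min) min = _data[j];
            if (_data[j] > max) max = _data[j];
        }
        yield return min;
        yield return max;
    }
}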
Another approach would be to create some generalized versions of your dataset for different ranges, which update themselves whenever you receive a new signal. You usually don't need to update the complete dataset; just updating the end of your set is probably good enough. This allows you to do a bit more advanced processing of your data, but it will cost more memory. You will have to cache the distinct 'layers' of detail in your application.
However, reading your (short) explanation, I think a display-time optimization might be good enough. You will always get a distortion in your signal if you generalize. You always lose data. It's up to the algorithm you choose on how this distortion will occur, and how noticeable it will be.
You need a better sampling algorithm; you can also employ the parallel processing features of C#. Refer to the Task Parallel Library.

Can looking up an object property in a large object array take too long? (C#)

I'm making a 2D tile-based game. Some of my testing levels have few tiles; some have 500,000 for testing purposes. Running the performance profiler that comes with Visual Studio shows the bottleneck.
What is the exact reason why it takes so much time? How do I avoid such situations?
UPD: never mind, I was just going through the whole array instead of going only through the ~200 visible tiles.
It's not that evaluating each individual Visibility property/field takes much time; it's that you are doing it 500k times, so it adds up. Since this is what the profiler estimates is the bottleneck, it means that the vast majority of items have Visibility set to false; otherwise you would expect the Draw() method call to be shown as the bottleneck. One approach to optimizing this could be separating out the visible items and only iterating over those.
.Visible is a property. To understand why this is a bottleneck, you need to look at the implementation of that getter. While it looks like just a simple Boolean, it could be complicated behind the scenes.
Is 500k a typical scenario? Keep in mind that depending on your real world needs, the complexity varies a lot.
If you really have to deal with 500k tiles, I'd have a second List of visible tiles, or a list of ints that contain the array indexes of those visible tiles.
A sequential check for finding objects with a certain property turned on means a worst-case scenario of running n iterations, where n is the total number of records. This is certainly a bottleneck.
You may want to keep a separate collection of visible items.
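A minimal sketch of that idea (Tile, TileMap, and their members are hypothetical stand-ins for the poster's types):

using System.Collections.Generic;

class Tile
{
    public bool Visible;
    public void Draw() { /* render the tile */ }
}

class TileMap
{
    // Kept in sync whenever visibility changes, so drawing never has
    // to scan all 500,000 tiles.
    private readonly List<Tile> _visibleTiles = new List<Tile>();

    public void SetVisible(Tile tile, bool visible)
    {
        if (tile.Visible == visible) return;
        tile.Visible = visible;
        if (visible) _visibleTiles.Add(tile);
        else _visibleTiles.Remove(tile);   // O(visible count), fine for ~200 tiles
    }

    public void Draw()
    {
        foreach (var tile in _visibleTiles)
            tile.Draw();   // iterate the ~200 visible tiles, not 500,000
    }
}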
