I would like to process some items in parallel. This processing is independent (order does not matter) and returns an output. These outputs should then be relayed back in order as quickly as possible.
That is to say, the method should behave equivalently to this (except calling Process in parallel):
IEnumerable<T> OrderedParallelImmediateSelect<T> (IEnumerable<T> source)
{
    foreach (var input in source) {
        var result = Process (input);
        yield return result;
    }
}
Accordingly, it is required to try to process the items in order. As this is (of course) not guaranteed to finish in order, the result collector must be sure to wait for delayed results.
As soon as the next result in order comes in, it must be returned immediately. We cannot wait for the whole input to be processed before sorting the results.
Here is an example of how this could look:
begin 0
begin 1 <-- we start processing in increasing order
begin 2
complete 1 <-- 1 is complete but we are still waiting for 0
begin 3
complete 0 <-- 0 is complete, so we can return it and 1, too
return 0
return 1
begin 4
begin 5
complete 4 <-- 2 and 3 are missing before we may return this
complete 2 <-- 2 is done, 4 must keep waiting
return 2
begin 6
complete 3 <-- 3 and 4 can now be returned
return 3
return 4
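For illustration, a naive sketch with the desired output behaviour (this is not my actual solution; it eagerly starts a task per item and keeps every pending result in memory) could look like this:
IEnumerable<TOut> NaiveOrderedImmediateSelect<TIn, TOut> (IEnumerable<TIn> source, Func<TIn, TOut> process)
{
    // Start processing every item on the thread pool, in input order.
    var tasks = new List<Task<TOut>>();
    foreach (var item in source)
        tasks.Add(Task.Run(() => process(item)));
    // Yield strictly in order; a completed-but-out-of-order result simply
    // sits in its Task until it is its turn.
    foreach (var task in tasks)
        yield return task.Result; // blocks until the next in-order result is ready
}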
If at all possible, I would like to perform processing on a regular thread pool.
Is this scenario something .NET provides a solution for? I've built a custom solution, but would prefer to use something simpler.
I'm aware of a lot of similar questions, but it seems they all either allow waiting for all items to finish processing or do not guarantee ordered results.
Here's an attempt that sadly does not seem to work. Replacing IEnumerable with ParallelQuery had no effect.
int Process (int item)
{
    Console.WriteLine ($"+ {item}");
    // Deterministic pseudo-random delay, seeded by the item value
    Thread.Sleep (new Random (item).Next (100, 1000));
    Console.WriteLine ($"- {item}");
    return item;
}

void Output (IEnumerable<int> items)
{
    foreach (var it in items) {
        Console.WriteLine ($"=> {it}");
    }
}
IEnumerable<int> OrderedParallelImmediateSelect (IEnumerable<int> source)
{
    // This processes in parallel but does not return the results immediately
    return source.AsParallel ().AsOrdered ().Select (Process);
}
var input = Enumerable.Range (0, 20);
Output (OrderedParallelImmediateSelect (input));
Output:
+0 +1 +3 +2 +4 +5 +6 +7 +9 +10 +11 +8 -1 +12 -3 +13 -5 +14 -7 +15 -9 +16 -11 +17 -14 +18 -16 +19 -0 -18 -2 -4 -6 -8 -13 -10 -15 -17 -12 -19 =>0 =>1 =>2 =>3 =>4 =>5 =>6 =>7 =>8 =>9 =>10 =>11 =>12 =>13 =>14 =>15 =>16 =>17 =>18 =>19
I created this program as a console application:
using System;
using System.Linq;
using System.Threading;

namespace PlayAreaCSCon
{
    class Program
    {
        static void Main(string[] args)
        {
            var items = Enumerable.Range(0, 1000);
            int prodCount = 0;
            foreach (var item in items.AsParallel()
                .AsOrdered()
                .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                .Select((i) =>
                {
                    Thread.Sleep(i % 100);
                    Interlocked.Increment(ref prodCount);
                    return i;
                }))
            {
                Console.WriteLine(item);
            }
            Console.ReadLine();
        }
    }
}
I then initially set a breakpoint on Console.WriteLine(item);. Running the program, when I first hit that breakpoint, prodCount is 5 - we're definitely consuming results before all processing has completed. And after removing the breakpoint, all results appear to be produced in the original order.
ParallelMergeOptions.NotBuffered disables buffering on the output side, but there is also buffering happening on the input side. PLINQ employs chunk partitioning by default, which means that the source is enumerated in chunks. This is easy to miss, because the chunks initially have a size of one and become progressively chunkier as the enumeration unfolds. To remove the buffering on the input side, you must use the EnumerablePartitionerOptions.NoBuffering option:
IEnumerable<int> OrderedParallelImmediateSelect(IEnumerable<int> source)
{
    // Partitioner and EnumerablePartitionerOptions live in System.Collections.Concurrent
    return Partitioner
        .Create(source, EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .AsOrdered()
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .Select(Process);
}
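With both options in place, enumerating the query should yield each result as soon as it becomes available in order, rather than after the whole input has been processed. Reusing the Process and Output methods from the question:
var input = Enumerable.Range(0, 20);
Output(OrderedParallelImmediateSelect(input));
// "=> 0" should now appear as soon as item 0 completes.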
Something else you might want to know is that the current thread participates in the processing of the source, along with ThreadPool threads. So if you have additional work to do while enumerating the resulting parallel query, this work will get less than the full power of a thread; it will be like running on a low-priority thread. If you don't want this to happen, you can offload the enumeration of the query to a separate ThreadPool thread, so that Process runs only on ThreadPool threads and the current thread is freed to dedicate itself to the work on the results. There is a custom OffloadEnumeration method in this answer that could be appended at the end of the query:
//...
.Select(Process)
.OffloadEnumeration();
...or used in the foreach loop:
foreach (var item in OffloadEnumeration(query)) // ...
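The linked implementation isn't reproduced here, but a minimal sketch of what such a helper could look like (an illustrative approximation, not the linked answer's exact code) is to enumerate the source on a ThreadPool thread and hand the items over through a bounded BlockingCollection:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class EnumerableOffloadExtensions
{
    public static IEnumerable<T> OffloadEnumeration<T>(this IEnumerable<T> source)
    {
        var buffer = new BlockingCollection<T>(boundedCapacity: 1);
        var producer = Task.Run(() =>
        {
            try
            {
                foreach (var item in source)
                    buffer.Add(item); // blocks while the consumer lags behind
            }
            finally
            {
                buffer.CompleteAdding();
            }
        });
        foreach (var item in buffer.GetConsumingEnumerable())
            yield return item;
        producer.GetAwaiter().GetResult(); // propagate any exception from the source
    }
}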
The code builds successfully with no compilation errors, but the iterations do not run as expected. I stopped the loop at iteration 200 so that it would not proceed further, but the loop also fails to execute some iterations with indexes lower than 200.
I am not sure why. Is there an alternative to Stop that I can use to fix this code?
Why are iterations with lower indexes not performed?
How can I fix this issue? I have googled around, but to no avail.
Please consider the following code.
static void Main(string[] args)
{
    var values = Enumerable.Range(0, 500).ToArray();
    ParallelLoopResult result = Parallel.For(0, values.Count(),
        (int i, ParallelLoopState loopState) =>
        {
            if (i == 200)
                loopState.Stop();
            WorkOnItem(values[i]);
        });
    Console.WriteLine(result);
}
static void WorkOnItem(object value)
{
    System.Console.WriteLine("Started working on: " + value);
    Thread.Sleep(100);
    System.Console.WriteLine("Finished working on: " + value);
}
Any help to solve this issue would be appreciated. Thanks
You should call loopState.Break() instead of loopState.Stop().
From the documentation of the ParallelLoopState.Break method:
Break indicates that no iterations after the current iteration should be run. It effectively cancels any additional iterations of the loop. However, it does not stop any iterations that have already begun execution. For example, if Break is called from the 100th iteration of a parallel loop iterating from 0 to 1,000, all iterations less than 100 should still be run, but the iterations from 101 through to 1000 that have not yet started are not executed.
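Applied to the code in the question, the fix is a one-line change:
ParallelLoopResult result = Parallel.For(0, values.Count(),
    (int i, ParallelLoopState loopState) =>
    {
        if (i == 200)
            loopState.Break(); // unlike Stop, all iterations below 200 still run
        WorkOnItem(values[i]);
    });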
I am trying to find out why parallel foreach does not give the expected speedup on a machine with 32 physical cores and 64 logical cores with a simple test computation.
...
var parameters = new List<string>();
for (int i = 1; i <= 9; i++)
{
    parameters.Add(i.ToString());
    if (Scenario.UsesParallelForEach)
    {
        Parallel.ForEach(parameters, parameter =>
        {
            FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "started");
            var lc = new LongComputation();
            lc.Compute();
            FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "stopped");
        });
    }
    else
    {
        foreach (var parameter in parameters)
        {
            FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "started");
            var lc = new LongComputation();
            lc.Compute();
            FireOnParameterComputed(this, parameter, Thread.CurrentThread.ManagedThreadId, "stopped");
        }
    }
}
...
class LongComputation
{
    public void Compute()
    {
        var s = "";
        for (int i = 0; i <= 40000; i++)
        {
            s = s + i.ToString() + "\n";
        }
    }
}
The Compute function takes about 5 seconds to complete. My assumption was that with the parallel foreach loop, each additional iteration creates a parallel thread running on one of the cores, taking only as long as a single call to Compute. So, if I run the loop twice, then with the sequential foreach it would take 10 seconds, but with the parallel foreach only 5 seconds (assuming 2 cores are available); the speedup would be 2. If I run the loop three times, the sequential foreach would take 15 seconds, but again the parallel foreach only 5 seconds; the speedup would be 3, then 4, 5, 6, 7, 8, and 9. However, what I observe is a constant speedup of 1.3.
[Figure: sequential vs. parallel foreach; x-axis: number of sequential/parallel executions of the computation; y-axis: time in seconds]
[Figure: speedup, i.e. the time of the sequential foreach divided by the time of the parallel foreach]
The event fired in FireOnParameterComputed is intended to be used in a GUI progress bar to show the progress. In the progress bar it can clearly be seen that a new thread is created for each iteration.
My question is, why don't I see the expected speedup or at least close to the expected speedup?
Tasks aren't threads.
Sometimes starting a task will cause a thread to be created, but not always. Creating and managing threads consumes time and system resources. When a task only takes a short amount of time, even though it's counter-intuitive, the single-threaded model is often faster.
The CLR knows this and tries to make its best judgment on how to execute the task based on a number of factors including any hints that you've passed to it.
For Parallel.ForEach, if you're certain that you want multiple threads to be spawned, try passing in ParallelOptions.
Parallel.ForEach(parameters, new ParallelOptions { MaxDegreeOfParallelism = 100 }, parameter => {});
My question is: the third parameter in Parallel.For, what does it do?
When I change it to () => 1d, it doubles my result; set to two, it triples it; but it ignores the decimals.
Why does it ignore the decimals, if it is some sort of doubling? What is really happening there?
I've now tried adding locks, and it does not just initialize interimResult to the value specified.
Here is the code I'm using:
static void RunParallelForCorrectedAdam()
{
    object _lock = new object();
    double result = 0d;
    // Here we call the same method several times.
    // for (int i = 0; i < 32; i++)
    Parallel.For(0, 32,
        // Func<TLocal> localInit
        () => 3d,
        // Func<int, ParallelLoopState, TLocal, TLocal> body
        (i, state, interimResult) =>
        {
            lock (_lock)
            {
                return interimResult + 1;
            }
        },
        // Final step after the calculations:
        // we add the partial result to the final result.
        // Action<TLocal> localFinally
        (lastInterimResult) =>
        {
            lock (_lock)
            {
                result += lastInterimResult;
            }
        });
    // Print the result
    Console.WriteLine("The result is {0}", result);
}
With () => 3d, result will be 32 + 3 * t, where t is the number of threads that were used. 3d is passed as interimResult to the first call to body within each thread.
The whole purpose of Parallel.For is to distribute the work across several threads. So interimResult + 1 is executed exactly 32 times (possibly on different threads), but each thread has to have some initial value for interimResult. That's the value returned by localInit.
So if the work is distributed over e.g. two threads, each one does + 1 16 times and thus calculates 3 + 16. At the end, the partial results are summed, yielding 6 + 32.
In short, in this example, it doesn't make much sense for localInit to return something different than 0d.
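For comparison, a minimal sketch of the conventional pattern, with localInit returning 0d (the identity for addition), so the final result is always the iteration count no matter how many threads take part:
object gate = new object();
double result = 0d;
Parallel.For(0, 32,
    () => 0d,                                       // localInit: each thread's partial sum starts at 0
    (i, state, interimResult) => interimResult + 1, // body: no lock needed, interimResult is thread-local
    (lastInterimResult) =>
    {
        lock (gate) { result += lastInterimResult; } // localFinally: merge the partial sums, once per thread
    });
Console.WriteLine("The result is {0}", result);     // always 32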
My question is: the third parameter in Parallel.For, what does it do?
It's a Func that gets executed once per thread. If your loop requires a thread-local variable, this is where you initialize it.
EDIT:
Step by step:
(i, state, interimResult) => interimResult + 1,
Do you understand that interimResult is your thread-local variable, the one you initialized in localInit?
I have the following:
One producer that produces random integers (around one every minute), e.g. 154609722148751.
One consumer that consumes these integers one by one. Consumption takes around 3 seconds.
From time to time the producer goes crazy, produces only one 'kind' of figure very quickly, and then goes back to normal.
E.g.: 6666666666666666666666666666666666666666666666675444696 in 1 second.
My goal is to keep the number of different kinds of figures left unconsumed as low as possible.
Say, in the previous sample I have:
a lot of '6' not consumed
one '7' not consumed
one '5' not consumed
three '4' not consumed
one '9' not consumed
If I use a simple FIFO algorithm, the other figures will have to wait a long time for all the '6's to be consumed first. I would prefer to 'prioritize' the other figures and THEN consume the '6's.
Does such an algorithm already exist? (A C# implementation is a plus.)
Currently, I am thinking about this algorithm:
have a queue for each figure (Q0, Q1, Q2, ..., Q9)
sequentially dequeue one item from each queue:
private int _currentQueueId;
private Queue<T>[] _allQueues;

public T GetNextItemToConsume()
{
    // We assume that there is at least one item to consume, and that no lock is needed.
    while (true)
    {
        var nextQueue = _allQueues[_currentQueueId++ % 10];
        if (nextQueue.Count > 0)
            return nextQueue.Dequeue();
    }
}
Do you have a better algorithm than this one? (Or any idea?)
NB1: I have no control over the consumption process (that is to say, I can't increase the consumption speed nor the number of consumer threads); indeed, an infinite consumption speed would solve my issue.
NB2: exact timing figures are not relevant, but we can assume that consumption is ten times quicker than production.
NB3: I have no control over the 'crazy' behaviour of the producer; in fact it is a normal (but not so frequent) production behaviour.
Here is a more complete example than my comment above:
sealed class Item {
    public Item(int group) {
        ItemGroup = group;
    }
    public int ItemGroup { get; private set; }
}

Item TakeNextItem(IList<Item> items) {
    const int LowerBurstLimit = 1;
    if (items == null)
        throw new ArgumentNullException("items");
    if (items.Count == 0)
        return null;
    var item = items.GroupBy(x => x.ItemGroup)
                    .OrderBy(x => Math.Max(LowerBurstLimit, x.Count()))
                    .First()
                    .First();
    items.Remove(item);
    return item;
}
My idea here is to group the items by their frequency and then just take from the group with the lowest frequency. It is the same idea as yours with the multiple queues, but it is calculated on the fly.
If there are multiple groups with the same frequency, it will take the oldest item (assuming GroupBy and OrderBy are stable; they are in practice, but I am not sure it is stated in the documentation).
Increase LowerBurstLimit if you want to process the items in chronological order, except for the ones with more than LowerBurstLimit items in the queue.
To measure the time I just created this quick code in LINQPad.
(Eric Lippert: please ignore this part :-))
void Main()
{
    var rand = new Random();
    var items = Enumerable.Range(1, 1000)
        .Select(x => { // simulate bursts
            int group = rand.Next(100);
            return new Item(group < 90 ? 1 : (group % 10));
        })
        .ToList();
    var first = TakeNextItem(items); // warm-up call (JIT)
    var sw = new Stopwatch();
    sw.Start();
    while (TakeNextItem(items) != null) {
    }
    sw.Stop();
    Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
}
When I run this code with 1000 items it takes around 80 ms on my 3-year-old laptop, i.e. on average 80 µs per "Take".
GroupBy should be O(N*M), OrderBy O(M*log M) and Remove O(N) (N = average queue length, M = number of groups), so the performance should scale linearly with N, i.e. a 10000-item queue takes ~800 µs per "Take" (I got ~700 µs with 10000 items in the test).
I would use the 10 queues as you mentioned and select the one from which to dequeue statistically, based on the number of elements present in each particular queue. The queue with the highest number of elements is the most likely to be selected for dequeuing.
For better performance, you need to keep track of the total count of elements across all queues. For each dequeue operation, draw a random int X between 0 and total count - 1; this tells you from which queue to dequeue (loop through the queues and subtract the number of elements in each queue from X until you would go below zero, then pick that queue).
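A minimal sketch of that selection (the method name and parameters are illustrative; maintaining totalCount is assumed to happen elsewhere):
// Picks a queue index with probability proportional to its element count.
// Assumes totalCount equals the sum of all queue counts and is > 0.
static int PickQueueIndex(Queue<int>[] queues, int totalCount, Random rng)
{
    int x = rng.Next(totalCount); // 0 <= x < totalCount
    for (int q = 0; q < queues.Length; q++)
    {
        x -= queues[q].Count;
        if (x < 0)
            return q; // x fell within this queue's share of the total
    }
    throw new InvalidOperationException("totalCount is out of sync with the queues");
}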
I have some code that loops through a list of records, starts an export task for each one, and increases a progress counter by 1 each time a task finishes so the user knows how far along the process is.
But depending on the timing of my loops, I often see the output showing a higher number before a lower number.
For example, I would expect to see output like this:
Exporting A
Exporting B
Exporting C
Exporting D
Exporting E
Finished 1 / 5
Finished 2 / 5
Finished 3 / 5
Finished 4 / 5
Finished 5 / 5
But instead I get output like this
Exporting A
Exporting B
Exporting C
Exporting D
Exporting E
Finished 1 / 5
Finished 2 / 5
Finished 5 / 5
Finished 4 / 5
Finished 3 / 5
I don't expect the output to be exact, since I'm not locking the value when I update/use it (sometimes it outputs the same number twice, or skips a number); however, I wouldn't expect it to go backwards.
My test data set is 72 values, and the relevant code looks like this:
var tasks = new List<Task>();
int counter = 0;
StatusMessage = string.Format("Exporting 0 / {0}", count);
foreach (var value in myValues)
{
    var valueParam = value;
    // Create async task, start it, and store the task in a list
    // so we can wait for all tasks to finish at the end
    tasks.Add(
        Task.Factory.StartNew(() =>
        {
            Debug.WriteLine("Exporting " + valueParam);
            System.Threading.Thread.Sleep(500);
            counter++;
            StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
            Debug.WriteLine("Finished " + counter.ToString());
        })
    );
}

// Begin async task to wait for all tasks to finish and update output
Task.Factory.StartNew(() =>
{
    Task.WaitAll(tasks.ToArray());
    StatusMessage = "Finished";
});
The output can appear backwards in both the debug statements and the StatusMessage output.
What's the correct way to keep count of how many async tasks in a loop are completed so that this problem doesn't occur?
You get mixed output because counter is not incremented in the same order as the Debug.WriteLine(...) calls are executed.
To get a consistent progress report, you can introduce a reporting lock into the task:
// progressReportLock must be declared once, outside the loop, so all tasks share it:
object progressReportLock = new object();

tasks.Add(
    Task.Factory.StartNew(() =>
    {
        Debug.WriteLine("Exporting " + valueParam);
        System.Threading.Thread.Sleep(500);
        lock (progressReportLock)
        {
            counter++;
            StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
            Debug.WriteLine("Finished " + counter.ToString());
        }
    })
);
In this sample the counter variable represents shared state among several threads. Using the ++ operator on shared state is simply unsafe and will give you incorrect results. It essentially boils down to the following instructions:
push counter to stack
push 1 to stack
add values on the stack
store into counter
Because multiple threads are executing this statement it's possible for one to interrupt the other partway through completing the above sequence. This would cause the incorrect value to end up in counter.
Instead of ++, use the following statement:
Interlocked.Increment(ref counter);
This operation is specifically designed to update state which may be shared among several threads. The increment will happen atomically and won't suffer from the race conditions I outlined.
The actual out-of-order display of values suffers from a similar problem even after my suggested fix. The increment and display operations aren't atomic, and hence one thread can interrupt another between the increment and the display. If you want the operations to be uninterruptible by other threads, then you will need to use a lock:
object lockTarget = new object();
int counter = 0;
...
lock (lockTarget) {
    counter++;
    StatusMessage = string.Format("Exporting {0} / {1}", counter, count);
    Debug.WriteLine("Finished " + counter.ToString());
}
Note that because the increment of counter now occurs inside the lock, there is no longer a need to use Interlocked.Increment.