Parallel.For<int> not working as expected - c#

I wrote a simple Parallel.For loop. But when i run the code, i get random results. I expect var total to be 15 (1+2+3+4+5). I used Interlocked.Add to prevent from race conditions and strange behavior. Can someone explain why the output is random and not 15?
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine("before Dowork");
DoWork();
Console.WriteLine("After Dowork");
Console.ReadLine();
}
public static void DoWork()
{
try
{
int total = 0;
var result = Parallel.For<int>(0, 6,
() => 0,
(i, status, y) =>
{
return i;
},
(x) =>
{
Interlocked.Add(ref total, x);
});
if (result.IsCompleted)
Console.WriteLine($"total is: {total}");
else Console.WriteLine("loop not ready yet");
}
catch(Exception e)
{
Console.WriteLine(e.Message);
}
}
}

Instead of using
(i, status, y) =>
{
return i;
}
you should use
(i, status, y) =>
{
return y + i;
}
Parallel.For splits the source sequence into several partitions. The items in each partition are processed sequentially, but multiple partitions may be executed in parallel.
Each partition has a local state. The local state is the return value of the the above lambda function and it is also passed as the y parameter. So the reason for returning y + i should be clear now: you should update the local state to the sum of the previous state and the input value i.
After every item of a partition has been processed, the final value of the local state is passed to the last function, where you sum up all the states:
(x) =>
{
Interlocked.Add(ref total, x);
}

Related

Observable with Time interval not displaying results on subscribe

I am trying to add a time interval to this Observable sequence( That is produce an integer sequence at a specific timespan) but it seems not to be working. When i remove the time, then it works time. Am i applying the timer wrongly?
var timer = Observable.Interval(TimeSpan.FromSeconds(2)).Take(4);
var nums = Observable.Range(1,1200).Where(a => a % 2 == 0);
var sourcenumbs = timer.SelectMany(nums);
var results = sourcenumbs.Subscribe(
x => Console.WriteLine("OnNext: {0}",x),
ex => Console.WriteLine("OnError: {0}",ex.Message),
() => Console.WriteLine("OnComplete")
);
This code displays no output, Does it get Dispose before it reaches the Subscribe?
But if i had a forloop with a timer in it then it works. Why?
for (int i = 0; i < 10; i++)
{
Thread.Sleep(TimeSpan.FromSeconds(0.9));
}
Is this what you want?
static void Main(string[] args)
{
Execute();
Console.ReadKey();
}
private static async void Execute()
{
var intervals = Observable.Interval(TimeSpan.FromSeconds(2)).StartWith(0);
var evenNumbers = Enumerable.Range(1, 1200).Where(a => a % 2 == 0);
var evenNumbersAtIntervals = intervals.Zip(evenNumbers, (_, num) => num);
try
{
await evenNumbersAtIntervals.ForEachAsync(
x => Console.WriteLine("OnNext: {0}", x)
);
Console.WriteLine("Complete");
}
catch(Exception e)
{
Console.WriteLine("Exception " + e);
}
}
Take note that numbers are Enumerable and not Observable.

Data Propagation in TPL Dataflow Pipeline with Batchblock.Triggerbatch()

In my Producer-Consumer scenario, I have multiple consumers, and each of the consumers send an action to external hardware, which may take some time. My Pipeline looks somewhat like this:
BatchBlock --> TransformBlock --> BufferBlock --> (Several) ActionBlocks
I have assigned BoundedCapacity of my ActionBlocks to 1.
What I want in theory is, I want to trigger the Batchblock to send a group of items to the Transformblock only when one of my Actionblocks are available for operation. Till then the Batchblock should just keep buffering elements and not pass them on to the Transformblock. My batch-sizes are variable. As Batchsize is mandatory, I do have a really high upper-limit for BatchBlock batch size, however I really don't wish to reach upto that limit, I would like to trigger my batches depending upon the availability of the Actionblocks permforming the said task.
I have achieved this with the help of the Triggerbatch() method. I am calling the Batchblock.Triggerbatch() as the last action in my ActionBlock.However interestingly after several days of working properly the pipeline has come to a hault. Upon checking I found out that sometimes the inputs to the batchblock come in after the ActionBlocks are done with their work. In this case the ActionBlocks do actually call Triggerbatch at the end of their work, however since at this point there is no input to the Batchblock at all, the call to TriggerBatch is fruitless. And after a while when inputs do flow in to the Batchblock, there is no one left to call TriggerBatch and restart the Pipeline. I was looking for something where I could just check if something is infact present in the inputbuffer of the Batchblock, however there is no such feature available, I could also not find a way to check if the TriggerBatch was fruitful.
Could anyone suggest a possible solution to my problem. Unfortunately using a Timer to triggerbatches is not an option for me. Except for the start of the Pipeline, the throttling should be governed only by the availability of one of the ActionBlocks.
The example code is here:
static BatchBlock<int> _groupReadTags;
static void Main(string[] args)
{
_groupReadTags = new BatchBlock<int>(1000);
var bufferOptions = new DataflowBlockOptions{BoundedCapacity = 2};
BufferBlock<int> _frameBuffer = new BufferBlock<int>(bufferOptions);
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1};
int batchNo = 1;
TransformBlock<int[], int> _workingBlock = new TransformBlock<int[], int>(list =>
{
Console.WriteLine("\n\nWorking on Batch Number {0}", batchNo);
//_groupReadTags.TriggerBatch();
int sum = 0;
foreach (int item in list)
{
Console.WriteLine("Elements in batch {0} :: {1}", batchNo, item);
sum += item;
}
batchNo++;
return sum;
});
ActionBlock<int> _worker1 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from ONE :{0}",x);
await Task.Delay(500);
Console.WriteLine("BatchBlock Output Count : {0}", _groupReadTags.OutputCount);
_groupReadTags.TriggerBatch();
},consumerOptions);
ActionBlock<int> _worker2 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from TWO :{0}", x);
await Task.Delay(2000);
_groupReadTags.TriggerBatch();
}, consumerOptions);
_groupReadTags.LinkTo(_workingBlock);
_workingBlock.LinkTo(_frameBuffer);
_frameBuffer.LinkTo(_worker1);
_frameBuffer.LinkTo(_worker2);
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Task postingTask = new Task(() => PostStuff());
postingTask.Start();
Console.ReadLine();
}
static void PostStuff()
{
for (int i = 0; i < 10; i++)
{
_groupReadTags.Post(i);
Thread.Sleep(100);
}
Parallel.Invoke(
() => _groupReadTags.Post(100),
() => _groupReadTags.Post(200),
() => _groupReadTags.Post(300),
() => _groupReadTags.Post(400),
() => _groupReadTags.Post(500),
() => _groupReadTags.Post(600),
() => _groupReadTags.Post(700),
() => _groupReadTags.Post(800)
);
}
Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:
public int TriggerBatch(int nextMinBatchSizeIfEmpty);
Invoking this method will either trigger a batch immediately if the input queue is not empty, otherwise it will set a temporary MinBatchSize that will affect only the next batch. You could invoke this method with a small value for nextMinBatchSizeIfEmpty to ensure that in case a batch cannot be currently produced, the next batch will occur sooner than the configured BatchSize at the block's constructor.
This method returns the size of the batch produced. It returns 0 in case that the input queue is empty, or the output queue is full, or the block has completed.
public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>
{
private readonly ITargetBlock<T> _input;
private readonly IPropagatorBlock<T[], T[]> _output;
private readonly Queue<T> _queue;
private readonly object _locker = new object();
private int _nextMinBatchSize = Int32.MaxValue;
public Task Completion { get; }
public int InputCount { get { lock (_locker) return _queue.Count; } }
public int OutputCount => ((BufferBlock<T[]>)_output).Count;
public int BatchSize { get; }
public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
{
if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
dataflowBlockOptions.BoundedCapacity < batchSize)
throw new ArgumentOutOfRangeException(nameof(batchSize),
"Number must be no greater than the value specified in BoundedCapacity.");
this.BatchSize = batchSize;
_output = new BufferBlock<T[]>(dataflowBlockOptions);
_queue = new Queue<T>(batchSize);
_input = new ActionBlock<T>(async item =>
{
T[] batch = null;
lock (_locker)
{
_queue.Enqueue(item);
if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
{
batch = _queue.ToArray(); _queue.Clear();
_nextMinBatchSize = Int32.MaxValue;
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}, new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1,
CancellationToken = dataflowBlockOptions.CancellationToken
});
var inputContinuation = _input.Completion.ContinueWith(async t =>
{
try
{
T[] batch = null;
lock (_locker)
{
if (_queue.Count > 0)
{
batch = _queue.ToArray(); _queue.Clear();
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}
finally
{
if (t.IsFaulted)
{
_output.Fault(t.Exception.InnerException);
}
else
{
_output.Complete();
}
}
}, TaskScheduler.Default).Unwrap();
this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
}
public void Complete() => _input.Complete();
void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);
public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)
{
if (nextMinBatchSizeIfEmpty < 1)
throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
int count = 0;
lock (_locker)
{
if (_queue.Count > 0)
{
T[] batch = _queue.ToArray();
if (condition == null || condition(batch))
{
bool accepted = _output.Post(batch);
if (accepted) { _queue.Clear(); count = batch.Length; }
}
_nextMinBatchSize = Int32.MaxValue;
}
else
{
_nextMinBatchSize = nextMinBatchSizeIfEmpty;
}
}
return count;
}
public int TriggerBatch(Func<T[], bool> condition)
=> TriggerBatch(condition, Int32.MaxValue);
public int TriggerBatch(int nextMinBatchSizeIfEmpty)
=> TriggerBatch(null, nextMinBatchSizeIfEmpty);
public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
{
return _input.OfferMessage(messageHeader, messageValue, source,
consumeToAccept);
}
T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target, out bool messageConsumed)
{
return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
}
bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
return _output.ReserveMessage(messageHeader, target);
}
void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
_output.ReleaseReservation(messageHeader, target);
}
IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
DataflowLinkOptions linkOptions)
{
return _output.LinkTo(target, linkOptions);
}
}
Another overload of the TriggerBatch method allows to examine the batch that can be currently produced, and decide if it should be triggered or not:
public int TriggerBatch(Func<T[], bool> condition);
The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.
I have found that using TriggerBatch in this way is unreliable:
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Apparently TriggerBatch is intended to be used inside the block, not outside it like this. I have seen this result in odd timing issues, like items from next batch batch being included in the current batch, even though TriggerBatch was called first.
Please see my answer to this question for an alternative using DataflowBlock.Encapsulate: BatchBlock produces batch with elements sent after TriggerBatch()

Take from IObservable until collection reached count or time elapsed

I want to fill a collection until any of these two conditions is satisfied:
either allowed time of 5 seconds has completed, or
collection reached the count of 5 items.
If any of these conditions is fulfilled, the method that i subscribed to should be executed (in this case Console.WriteLine)
static void Main(string[] args)
{
var sourceCollection = Source().ToObservable();
var bufferedCollection = sourceCollection.Buffer(
() => Observable.Amb(
Observable.Timer(TimeSpan.FromSeconds(5)//,
//Observable.TakeWhile(bufferedCollection, a=> a.Count < 5)
))
);
bufferedCollection.Subscribe(col =>
{
Console.WriteLine("count of items is now {0}", col.Count);
});
Console.ReadLine();
}
static IEnumerable<int> Source()
{
var random = new Random();
var lst = new List<int> { 1,2,3,4,5 };
while(true)
{
yield return lst[random.Next(lst.Count)];
Thread.Sleep(random.Next(0, 1500));
}
}
i managed to make it work with the Observable.Timer, but the TakeWhile doesnt work, how do I check for the collection count, does TakeWhile work for this or is there some other method? Im sure its something simple.
I got it, the answer was in the documentation of Buffer - there's an overload that takes a parameter that specifies the maximum count. So I don't need Observable.Amb, I can just say
var sourceCollection = Source().ToObservable();
var maxBufferCount = 5;
var bufferedCollection = sourceCollection.Buffer(TimeSpan.FromSeconds(5), maxBufferCount, Scheduler.Default);
bufferedCollection.Subscribe(col =>
{
Console.WriteLine("count of items is now {0}", col.Count);
});
Console.ReadLine();

C# yield Follow-up to the end

If I call methodA and then methodB output is: "121234". But I need output: "1234", "12" from methodA and "34" from methodB. I need to remember where getNumber ended with the return, and the next call continue from here. It is possible?
MethodA snipset
int x = 0;
foreach (var num in GetNumber())
{
if (x == 2)
{
break;
}
x++;
Console.Write(num);
}
MethodB snipset
int x = 0;
foreach (var num in GetNumber())
{
if (x == 4)
{
break;
}
x++;
Console.Write(num);
}
GetNumber
static IEnumerable<int> GetNumber()
{
int x = 0;
while (true)
{
yield return x++;
}
}
You can initialize x outside the method:
static class InfiniteContinuingSequence
{
static int x = 0;
public static IEnumerable<int> GetNumbers()
{
while (true)
{
yield return x++;
}
}
}
But you will have to explicitly be careful when and where this sequence gets enumerated. If it is inadvertently enumerated multiple times, you will "lose" numbers in the sequence; if you try to exhaustively enumerate GetNumbers() you'll get either an infinite loop or an overflow; and if you try to call this from multiple threads without locking, you'll get potentially odd behavior.
Instead, I might suggest to either use an explicit enumerator, or be able to specify a "restart point" to your method (in this case at least you will not get odd behavior by multiple enumerations):
static IEnumerable<int> GetNumbers(int startingPoint = 0)
{
int x = startingPoint;
while (true)
{
yield return x++;
}
}
Change GetNumber to
static int x = 0;
static IEnumberable<int> GetNumber()
{
while (true) { yield return x++; }
}
The issue is that you have two enumerations of GetNumber. By making x static you will get the results you want. But beware this is not thread safe code.

Observing an asynchronous sequence with 'yield return'

The following sample works fine:
static IEnumerable<int> GenerateNum(int sequenceLength)
{
for(int i = 0; i < sequenceLength; i++)
{
yield return i;
}
}
static void Main(string[] args)
{
//var observ = Observable.Start(() => GenerateNum(1000));
var observ = GenerateNum(1000).ToObservable();
observ.Subscribe(
(x) => Console.WriteLine("test:" + x),
(Exception ex) => Console.WriteLine("Error received from source: {0}.", ex.Message),
() => Console.WriteLine("End of sequence.")
);
Console.ReadKey();
}
However, what I really want is to use the commented out line - i.e. I want to run the 'number generator' asynchronously, and every time it yields a new value, I want it to be output to the console. It doesn't seem to work - how can I modify this code to work?
When doing this for asynchronous execution in a console app, you may want to use the ToObservable(IEnumerable<TSource>, IScheduler) overload (see Observable.ToObservable Method (IEnumerable, IScheduler)). To use the built-in thread pool schedule, for example, try
var observ = GenerateNum(1000).ToObservable(Scheduler.ThreadPool);
It works for me...To expand, the following complete example works exactly as I think you intend:
static Random r = new Random();
static void Main(string[] args) {
var observ = GenerateNum(1000).ToObservable(Scheduler.ThreadPool );
observ.Subscribe(
(x) => Console.WriteLine("test:" + x),
(Exception ex) => Console.WriteLine("Error received from source: {0}.", ex.Message),
() => Console.WriteLine("End of sequence.")
);
while (Console.ReadKey(true).Key != ConsoleKey.Escape) {
Console.WriteLine("You pressed a key.");
}
}
static IEnumerable<int> GenerateNum(int sequenceLength) {
for (int i = 0; i < sequenceLength; i++) {
Thread.Sleep(r.Next(1, 200));
yield return i;
}
}

Categories

Resources