I have been using TPL Dataflow quite a bit but am stumbling over an issue that I cannot resolve:
I have the following architecture:
BroadcastBlock<List<Object1>> -> 2 different TransformBlock<List<Object1>, Tuple<int, List<Object1>>> -> both link to TransformManyBlock<Tuple<int, List<Object1>>, Object2>
I vary the lambda expression within the TransformManyBlock at the end of the chain: (a) code that performs operations on the streamed tuple, (b) no code at all.
Within the TransformBlocks I measure the time starting from the arrival of the first item and stopping when TransformBlock.Completion indicates the block completed (broadCastBlock links to the transform blocks with PropagateCompletion set to true).
What I cannot reconcile is why the transform blocks in case (b) complete about 5-6 times faster than in case (a). This runs completely counter to the intent of the TDF design. The items from the transform blocks were already passed on to the transformManyBlock, so whatever the transformManyBlock does with them should have no bearing on when the transform blocks complete. I do not see a single reason why anything that goes on in the transformManyBlock should influence the preceding TransformBlocks.
Can anyone explain this weird observation?
Here is some code to show the difference. When running the code make sure to change the following two lines from:
tfb1.transformBlock.LinkTo(transformManyBlock);
tfb2.transformBlock.LinkTo(transformManyBlock);
to:
tfb1.transformBlock.LinkTo(transformManyBlockEmpty);
tfb2.transformBlock.LinkTo(transformManyBlockEmpty);
in order to observe the difference in runtime of the preceding transformBlocks.
class Program
{
static void Main(string[] args)
{
Test test = new Test();
test.Start();
}
}
class Test
{
private const int numberTransformBlocks = 2;
private int currentGridPointer;
private Dictionary<int, List<Tuple<int, List<Object1>>>> grid;
private BroadcastBlock<List<Object1>> broadCastBlock;
private TransformBlockClass tfb1;
private TransformBlockClass tfb2;
private TransformManyBlock<Tuple<int, List<Object1>>, Object2>
transformManyBlock;
private TransformManyBlock<Tuple<int, List<Object1>>, Object2>
transformManyBlockEmpty;
private ActionBlock<Object2> actionBlock;
public Test()
{
grid = new Dictionary<int, List<Tuple<int, List<Object1>>>>();
broadCastBlock = new BroadcastBlock<List<Object1>>(list => list);
tfb1 = new TransformBlockClass();
tfb2 = new TransformBlockClass();
transformManyBlock = new TransformManyBlock<Tuple<int, List<Object1>>, Object2>
(newTuple =>
{
for (int counter = 1; counter <= 10000000; counter++)
{
double result = Math.Sqrt(counter + 1.0);
}
return new Object2[0];
});
transformManyBlockEmpty
= new TransformManyBlock<Tuple<int, List<Object1>>, Object2>(
tuple =>
{
return new Object2[0];
});
actionBlock = new ActionBlock<Object2>(list =>
{
int tester = 1;
//flush transformManyBlock
});
//linking
broadCastBlock.LinkTo(tfb1.transformBlock
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
broadCastBlock.LinkTo(tfb2.transformBlock
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
//link either to ->transformManyBlock or -> transformManyBlockEmpty
tfb1.transformBlock.LinkTo(transformManyBlock);
tfb2.transformBlock.LinkTo(transformManyBlock);
transformManyBlock.LinkTo(actionBlock
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
transformManyBlockEmpty.LinkTo(actionBlock
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
//completion
Task.WhenAll(tfb1.transformBlock.Completion
, tfb2.transformBlock.Completion)
.ContinueWith(_ =>
{
transformManyBlockEmpty.Complete();
transformManyBlock.Complete();
});
transformManyBlock.Completion.ContinueWith(_ =>
{
Console.WriteLine("TransformManyBlock (with code) completed");
});
transformManyBlockEmpty.Completion.ContinueWith(_ =>
{
Console.WriteLine("TransformManyBlock (empty) completed");
});
}
public void Start()
{
const int numberBlocks = 100;
const int collectionSize = 300000;
//send collection numberBlock-times
for (int i = 0; i < numberBlocks; i++)
{
List<Object1> list = new List<Object1>();
for (int j = 0; j < collectionSize; j++)
{
list.Add(new Object1(j));
}
broadCastBlock.Post(list);
}
//mark broadCastBlock complete
broadCastBlock.Complete();
Console.WriteLine("Core routine finished");
Console.ReadLine();
}
}
class TransformBlockClass
{
private Stopwatch watch;
private bool isStarted;
private int currentIndex;
public TransformBlock<List<Object1>, Tuple<int, List<Object1>>> transformBlock;
public TransformBlockClass()
{
isStarted = false;
watch = new Stopwatch();
transformBlock = new TransformBlock<List<Object1>, Tuple<int, List<Object1>>>
(list =>
{
if (!isStarted)
{
StartUp();
isStarted = true;
}
return new Tuple<int, List<Object1>>(currentIndex++, list);
});
transformBlock.Completion.ContinueWith(_ =>
{
ShutDown();
});
}
private void StartUp()
{
watch.Start();
}
private void ShutDown()
{
watch.Stop();
Console.WriteLine("TransformBlock : Time elapsed in ms: "
+ watch.ElapsedMilliseconds);
}
}
class Object1
{
public int val { get; private set; }
public Object1(int val)
{
this.val = val;
}
}
class Object2
{
public int value { get; private set; }
public List<Object1> collection { get; private set; }
public Object2(int value, List<Object1> collection)
{
this.value = value;
this.collection = collection;
}
}
*EDIT: I posted another code piece, this time using collections of value types, and I cannot reproduce the problem I am observing in the code above. Could it be that passing around reference types and operating on them concurrently (even within different dataflow blocks) causes blocking and contention?*
class Program
{
static void Main(string[] args)
{
Test test = new Test();
test.Start();
}
}
class Test
{
private BroadcastBlock<List<int>> broadCastBlock;
private TransformBlock<List<int>, List<int>> tfb11;
private TransformBlock<List<int>, List<int>> tfb12;
private TransformBlock<List<int>, List<int>> tfb21;
private TransformBlock<List<int>, List<int>> tfb22;
private TransformManyBlock<List<int>, List<int>> transformManyBlock1;
private TransformManyBlock<List<int>, List<int>> transformManyBlock2;
private ActionBlock<List<int>> actionBlock1;
private ActionBlock<List<int>> actionBlock2;
public Test()
{
broadCastBlock = new BroadcastBlock<List<int>>(item => item);
tfb11 = new TransformBlock<List<int>, List<int>>(item =>
{
return item;
});
tfb12 = new TransformBlock<List<int>, List<int>>(item =>
{
return item;
});
tfb21 = new TransformBlock<List<int>, List<int>>(item =>
{
return item;
});
tfb22 = new TransformBlock<List<int>, List<int>>(item =>
{
return item;
});
transformManyBlock1 = new TransformManyBlock<List<int>, List<int>>(item =>
{
Thread.Sleep(100);
//or you can replace the Thread.Sleep(100) with actual work,
//no difference in results. This shows that the issue at hand is
//unrelated to starvation of threads.
return new List<int>[1] { item };
});
transformManyBlock2 = new TransformManyBlock<List<int>, List<int>>(item =>
{
return new List<int>[1] { item };
});
actionBlock1 = new ActionBlock<List<int>>(item =>
{
//flush transformManyBlock
});
actionBlock2 = new ActionBlock<List<int>>(item =>
{
//flush transformManyBlock
});
//linking
broadCastBlock.LinkTo(tfb11, new DataflowLinkOptions
{ PropagateCompletion = true });
broadCastBlock.LinkTo(tfb12, new DataflowLinkOptions
{ PropagateCompletion = true });
broadCastBlock.LinkTo(tfb21, new DataflowLinkOptions
{ PropagateCompletion = true });
broadCastBlock.LinkTo(tfb22, new DataflowLinkOptions
{ PropagateCompletion = true });
tfb11.LinkTo(transformManyBlock1);
tfb12.LinkTo(transformManyBlock1);
tfb21.LinkTo(transformManyBlock2);
tfb22.LinkTo(transformManyBlock2);
transformManyBlock1.LinkTo(actionBlock1
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
transformManyBlock2.LinkTo(actionBlock2
, new DataflowLinkOptions
{ PropagateCompletion = true }
);
//completion
Task.WhenAll(tfb11.Completion, tfb12.Completion).ContinueWith(_ =>
{
Console.WriteLine("TransformBlocks 11 and 12 completed");
transformManyBlock1.Complete();
});
Task.WhenAll(tfb21.Completion, tfb22.Completion).ContinueWith(_ =>
{
Console.WriteLine("TransformBlocks 21 and 22 completed");
transformManyBlock2.Complete();
});
transformManyBlock1.Completion.ContinueWith(_ =>
{
Console.WriteLine
("TransformManyBlock (from tfb11 and tfb12) finished");
});
transformManyBlock2.Completion.ContinueWith(_ =>
{
Console.WriteLine
("TransformManyBlock (from tfb21 and tfb22) finished");
});
}
public void Start()
{
const int numberBlocks = 100;
const int collectionSize = 300000;
//send collection numberBlock-times
for (int i = 0; i < numberBlocks; i++)
{
List<int> list = new List<int>();
for (int j = 0; j < collectionSize; j++)
{
list.Add(j);
}
broadCastBlock.Post(list);
}
//mark broadCastBlock complete
broadCastBlock.Complete();
Console.WriteLine("Core routine finished");
Console.ReadLine();
}
}
Okay, final attempt ;-)
Synopsis:
The observed time delta in scenario 1 can be fully explained by differing behavior of the garbage collector.
When scenario 1 is run with the transformManyBlock linked, garbage collections are triggered during the creation of new items (lists) on the main thread; this is not the case when scenario 1 is run with transformManyBlockEmpty linked.
Note that creating a new reference type instance (Object1) results in a call to allocate memory in the GC heap which in turn may trigger a GC collection run. As quite a few Object1 instances (and lists) are created, the garbage collector has quite a bit more work to do scanning the heap for (potentially) unreachable objects.
Therefore the observed difference can be minimized by any of the following:
Turning Object1 from a class into a struct (thereby ensuring that memory for the instances is not allocated on the heap; see the sketch below).
Keeping a reference to the generated lists (thereby reducing the time the garbage collector needs to identify unreachable objects).
Generating all the items before posting them to the network.
(Note: I cannot explain why the garbage collector behaves differently in scenario 1 "transformManyBlock" vs. scenario 1 "transformManyBlockEmpty", but data collected via the ConcurrencyVisualizer clearly shows the difference.)
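For illustration, here is a minimal sketch of the struct variant from the first bullet (note that the : this() call is required so the auto-property can be assigned from the struct's constructor):
struct Object1
{
    public int val { get; private set; }
    public Object1(int val)
        : this()
    {
        this.val = val;
    }
}
With this change the Object1 instances are stored inline in the list's backing array instead of as separate heap objects, so the allocating loop no longer hands the garbage collector millions of individual objects to track.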
Results:
(Tests were run on a Core i7 980X, 6 cores, HT enabled):
I modified scenario 2 as follows:
// Start a stopwatch per tfb
int tfb11Cnt = 0;
Stopwatch sw11 = new Stopwatch();
tfb11 = new TransformBlock<List<int>, List<int>>(item =>
{
if (Interlocked.CompareExchange(ref tfb11Cnt, 1, 0) == 0)
sw11.Start();
return item;
});
// [...]
// completion
Task.WhenAll(tfb11.Completion, tfb12.Completion).ContinueWith(_ =>
{
Console.WriteLine("TransformBlocks 11 and 12 completed. SW11: {0}, SW12: {1}",
sw11.ElapsedMilliseconds, sw12.ElapsedMilliseconds);
transformManyBlock1.Complete();
});
Results:
Scenario 1 (as posted, i.e. linked to transformManyBlock):
TransformBlock : Time elapsed in ms: 6826
TransformBlock : Time elapsed in ms: 6826
Scenario 1 (linked to transformManyBlockEmpty):
TransformBlock : Time elapsed in ms: 3140
TransformBlock : Time elapsed in ms: 3140
Scenario 1 (transformManyBlock, Thread.Sleep(200) in loop body):
TransformBlock : Time elapsed in ms: 4949
TransformBlock : Time elapsed in ms: 4950
Scenario 2 (as posted but modified to report times):
TransformBlocks 21 and 22 completed. SW21: 619 ms, SW22: 669 ms
TransformBlocks 11 and 12 completed. SW11: 669 ms, SW12: 667 ms
Next, I changed scenarios 1 and 2 to prepare the input data prior to posting it to the network:
// Scenario 1
//send collection numberBlock-times
var input = new List<List<Object1>>(numberBlocks);
for (int i = 0; i < numberBlocks; i++)
{
var list = new List<Object1>(collectionSize);
for (int j = 0; j < collectionSize; j++)
{
list.Add(new Object1(j));
}
input.Add(list);
}
foreach (var inp in input)
{
broadCastBlock.Post(inp);
Thread.Sleep(10);
}
// Scenario 2
//send collection numberBlock-times
var input = new List<List<int>>(numberBlocks);
for (int i = 0; i < numberBlocks; i++)
{
List<int> list = new List<int>(collectionSize);
for (int j = 0; j < collectionSize; j++)
{
list.Add(j);
}
//broadCastBlock.Post(list);
input.Add(list);
}
foreach (var inp in input)
{
broadCastBlock.Post(inp);
Thread.Sleep(10);
}
Results:
Scenario 1 (transformManyBlock):
TransformBlock : Time elapsed in ms: 1029
TransformBlock : Time elapsed in ms: 1029
Scenario 1 (transformManyBlockEmpty):
TransformBlock : Time elapsed in ms: 975
TransformBlock : Time elapsed in ms: 975
Scenario 1 (transformManyBlock, Thread.Sleep(200) in loop body):
TransformBlock : Time elapsed in ms: 972
TransformBlock : Time elapsed in ms: 972
Finally, I changed the code back to the original version, but kept a reference to each created list around:
var lists = new List<List<Object1>>();
for (int i = 0; i < numberBlocks; i++)
{
List<Object1> list = new List<Object1>();
for (int j = 0; j < collectionSize; j++)
{
list.Add(new Object1(j));
}
lists.Add(list);
broadCastBlock.Post(list);
}
Results:
Scenario 1 (transformManyBlock):
TransformBlock : Time elapsed in ms: 6052
TransformBlock : Time elapsed in ms: 6052
Scenario 1 (transformManyBlockEmpty):
TransformBlock : Time elapsed in ms: 5524
TransformBlock : Time elapsed in ms: 5524
Scenario 1 (transformManyBlock, Thread.Sleep(200) in loop body):
TransformBlock : Time elapsed in ms: 5098
TransformBlock : Time elapsed in ms: 5098
Likewise, changing Object1 from a class to a struct results in both blocks completing at about the same time (and about 10x faster).
Update: The answer below does not suffice to explain the observed behavior.
In scenario one a tight loop is executed inside the TransformMany lambda, which will hog the CPU and will starve other threads for processor resources. That's the reason why a delay in the execution of the Completion continuation task can be observed. In scenario two a Thread.Sleep is executed inside the TransformMany lambda giving other threads the chance to execute the Completion continuation task. The observed difference in runtime behavior is not related to TPL Dataflow. To improve the observed deltas it should suffice to introduce a Thread.Sleep inside the loop's body in scenario 1:
for (int counter = 1; counter <= 10000000; counter++)
{
double result = Math.Sqrt(counter + 1.0);
// Back off for a little while
Thread.Sleep(200);
}
(Below is my original answer. I didn't read the OP's question carefully enough, and only understood what he was asking about after having read his comments. I still leave it here as a reference.)
Are you sure that you are measuring the right thing? Note that when you do something like this: transformBlock.Completion.ContinueWith(_ => ShutDown()); then your time measurement will be influenced by the behavior of the TaskScheduler (e.g. how long it takes until the continuation task starts executing). Although I was not able to observe the difference you saw on my machine, I got more precise results (in terms of the delta between tfb1 and tfb2 completion times) when using dedicated threads for measuring time:
// Within your Test.Start() method...
Thread timewatch = new Thread(() =>
{
var sw = Stopwatch.StartNew();
tfb1.transformBlock.Completion.Wait();
Console.WriteLine("tfb1.transformBlock completed within {0} ms",
sw.ElapsedMilliseconds);
});
Thread timewatchempty = new Thread(() =>
{
var sw = Stopwatch.StartNew();
tfb2.transformBlock.Completion.Wait();
Console.WriteLine("tfb2.transformBlock completed within {0} ms",
sw.ElapsedMilliseconds);
});
timewatch.Start();
timewatchempty.Start();
//send collection numberBlock-times
for (int i = 0; i < numberBlocks; i++)
{
// ... rest of the code
Related
I've been trying to implement multithreading, which looks something like this:
static void Main(string[] args)
{
List<Task> tskList = new List<Task>();
for (int i = 0; i < 100; i++)
{
Task taskTemp = new Task(() => { Display(i); });
taskTemp.Start();
tskList.Add(taskTemp);
//Thread.Sleep(10);
}
Task.WaitAll(tskList.ToArray());
}
public static void Display(int value)
{
Thread.Sleep(1000);
Console.WriteLine(value);
}
Without the Thread.Sleep(10), the output is "100" printed 100 times instead of the 0 to 99 I get with the 10 ms sleep.
My guess is that this happens because of the time the system needs to schedule the task: by the time the task actually starts, the value of i has already reached 100.
If I put in enough wait time (say 1000 ms instead of 10), is it guaranteed that this problem won't occur? Or should I suspect that the system may take even more time to schedule the task when CPU utilization is high? What is the best way to solve this problem?
Thanks in advance for any inputs!
You should add a local variable to hold i, such as:
for (int i = 0; i < 100; i++)
{
var t = i;
Task taskTemp = new Task(() => { Display(t); });
taskTemp.Start();
tskList.Add(taskTemp);
//Thread.Sleep(10);
}
Just make a copy of "i" to "i1" and use it as local variable. "i" is always changed, thats why you get 100 100 100....:
private static void Main(string[] args)
{
var tskList = new List<Task>();
for (var i = 0; i < 100; i++)
{
var i1 = i;
var taskTemp = new Task(() => { Display(i1); });
taskTemp.Start();
tskList.Add(taskTemp);
}
Task.WaitAll(tskList.ToArray());
}
public static void Display(int value)
{
Thread.Sleep(1000);
Console.WriteLine(value);
}
In my producer-consumer scenario, I have multiple consumers, and each consumer sends an action to external hardware, which may take some time. My pipeline looks somewhat like this:
BatchBlock --> TransformBlock --> BufferBlock --> (Several) ActionBlocks
I have set the BoundedCapacity of my ActionBlocks to 1.
In theory, I want the BatchBlock to send a group of items to the TransformBlock only when one of my ActionBlocks is available for work; until then, the BatchBlock should just keep buffering elements and not pass them on. My batch sizes are variable. Since a batch size is mandatory, I set a really high upper limit for the BatchBlock's batch size, but I never actually want to reach that limit; I would like to trigger my batches depending on the availability of the ActionBlocks performing the work.
I have achieved this with the help of the TriggerBatch() method: I call BatchBlock.TriggerBatch() as the last action in my ActionBlocks. Interestingly, however, after several days of working properly the pipeline came to a halt. Upon checking, I found out that sometimes the inputs to the BatchBlock come in after the ActionBlocks are done with their work. In this case the ActionBlocks do call TriggerBatch at the end of their work, but since at this point there is no input in the BatchBlock at all, the call to TriggerBatch is fruitless. And when inputs do flow into the BatchBlock a while later, there is no one left to call TriggerBatch and restart the pipeline. I was looking for a way to check whether something is in fact present in the input buffer of the BatchBlock, but there is no such feature; nor could I find a way to check whether a TriggerBatch call was fruitful.
Could anyone suggest a possible solution to my problem? Unfortunately, using a timer to trigger batches is not an option for me. Except at the start of the pipeline, the throttling should be governed only by the availability of one of the ActionBlocks.
The example code is here:
static BatchBlock<int> _groupReadTags;
static void Main(string[] args)
{
_groupReadTags = new BatchBlock<int>(1000);
var bufferOptions = new DataflowBlockOptions{BoundedCapacity = 2};
BufferBlock<int> _frameBuffer = new BufferBlock<int>(bufferOptions);
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1};
int batchNo = 1;
TransformBlock<int[], int> _workingBlock = new TransformBlock<int[], int>(list =>
{
Console.WriteLine("\n\nWorking on Batch Number {0}", batchNo);
//_groupReadTags.TriggerBatch();
int sum = 0;
foreach (int item in list)
{
Console.WriteLine("Elements in batch {0} :: {1}", batchNo, item);
sum += item;
}
batchNo++;
return sum;
});
ActionBlock<int> _worker1 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from ONE :{0}",x);
await Task.Delay(500);
Console.WriteLine("BatchBlock Output Count : {0}", _groupReadTags.OutputCount);
_groupReadTags.TriggerBatch();
},consumerOptions);
ActionBlock<int> _worker2 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from TWO :{0}", x);
await Task.Delay(2000);
_groupReadTags.TriggerBatch();
}, consumerOptions);
_groupReadTags.LinkTo(_workingBlock);
_workingBlock.LinkTo(_frameBuffer);
_frameBuffer.LinkTo(_worker1);
_frameBuffer.LinkTo(_worker2);
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Task postingTask = new Task(() => PostStuff());
postingTask.Start();
Console.ReadLine();
}
static void PostStuff()
{
for (int i = 0; i < 10; i++)
{
_groupReadTags.Post(i);
Thread.Sleep(100);
}
Parallel.Invoke(
() => _groupReadTags.Post(100),
() => _groupReadTags.Post(200),
() => _groupReadTags.Post(300),
() => _groupReadTags.Post(400),
() => _groupReadTags.Post(500),
() => _groupReadTags.Post(600),
() => _groupReadTags.Post(700),
() => _groupReadTags.Post(800)
);
}
Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:
public int TriggerBatch(int nextMinBatchSizeIfEmpty);
Invoking this method triggers a batch immediately if the input queue is not empty; otherwise it sets a temporary minimum batch size that affects only the next batch. You can invoke this method with a small value for nextMinBatchSizeIfEmpty to ensure that, in case a batch cannot be produced right now, the next batch will be produced sooner than the BatchSize configured in the block's constructor would normally allow.
This method returns the size of the batch produced. It returns 0 if the input queue is empty, the output queue is full, or the block has completed.
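For example, in the question's scenario the ActionBlocks could call it like this (a sketch only; the batchBlock variable and the value 1 are assumptions, not part of the class below):
int produced = batchBlock.TriggerBatch(nextMinBatchSizeIfEmpty: 1);
if (produced == 0)
{
    // Nothing was batched (empty input queue, full output queue, or
    // completed block). With nextMinBatchSizeIfEmpty: 1 the next incoming
    // item will form a batch on its own and restart the pipeline.
}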
public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>
{
private readonly ITargetBlock<T> _input;
private readonly IPropagatorBlock<T[], T[]> _output;
private readonly Queue<T> _queue;
private readonly object _locker = new object();
private int _nextMinBatchSize = Int32.MaxValue;
public Task Completion { get; }
public int InputCount { get { lock (_locker) return _queue.Count; } }
public int OutputCount => ((BufferBlock<T[]>)_output).Count;
public int BatchSize { get; }
public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
{
if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
dataflowBlockOptions.BoundedCapacity < batchSize)
throw new ArgumentOutOfRangeException(nameof(batchSize),
"Number must be no greater than the value specified in BoundedCapacity.");
this.BatchSize = batchSize;
_output = new BufferBlock<T[]>(dataflowBlockOptions);
_queue = new Queue<T>(batchSize);
_input = new ActionBlock<T>(async item =>
{
T[] batch = null;
lock (_locker)
{
_queue.Enqueue(item);
if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
{
batch = _queue.ToArray(); _queue.Clear();
_nextMinBatchSize = Int32.MaxValue;
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}, new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1,
CancellationToken = dataflowBlockOptions.CancellationToken
});
var inputContinuation = _input.Completion.ContinueWith(async t =>
{
try
{
T[] batch = null;
lock (_locker)
{
if (_queue.Count > 0)
{
batch = _queue.ToArray(); _queue.Clear();
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}
finally
{
if (t.IsFaulted)
{
_output.Fault(t.Exception.InnerException);
}
else
{
_output.Complete();
}
}
}, TaskScheduler.Default).Unwrap();
this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
}
public void Complete() => _input.Complete();
void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);
public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)
{
if (nextMinBatchSizeIfEmpty < 1)
throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
int count = 0;
lock (_locker)
{
if (_queue.Count > 0)
{
T[] batch = _queue.ToArray();
if (condition == null || condition(batch))
{
bool accepted = _output.Post(batch);
if (accepted) { _queue.Clear(); count = batch.Length; }
}
_nextMinBatchSize = Int32.MaxValue;
}
else
{
_nextMinBatchSize = nextMinBatchSizeIfEmpty;
}
}
return count;
}
public int TriggerBatch(Func<T[], bool> condition)
=> TriggerBatch(condition, Int32.MaxValue);
public int TriggerBatch(int nextMinBatchSizeIfEmpty)
=> TriggerBatch(null, nextMinBatchSizeIfEmpty);
public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
{
return _input.OfferMessage(messageHeader, messageValue, source,
consumeToAccept);
}
T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target, out bool messageConsumed)
{
return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
}
bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
return _output.ReserveMessage(messageHeader, target);
}
void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
_output.ReleaseReservation(messageHeader, target);
}
IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
DataflowLinkOptions linkOptions)
{
return _output.LinkTo(target, linkOptions);
}
}
Another overload of the TriggerBatch method allows you to examine the batch that can currently be produced, and to decide whether it should be triggered or not:
public int TriggerBatch(Func<T[], bool> condition);
The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.
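As a usage sketch of the condition overload (the batchBlock variable and the threshold of 10 are assumptions):
// Emit the pending items only if at least 10 have accumulated;
// otherwise leave them queued for a later trigger.
int produced = batchBlock.TriggerBatch(batch => batch.Length >= 10);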
I have found that using TriggerBatch in this way is unreliable:
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Apparently TriggerBatch is intended to be used inside the block, not outside it like this. I have seen this result in odd timing issues, like items from the next batch being included in the current batch, even though TriggerBatch was called first.
Please see my answer to this question for an alternative using DataflowBlock.Encapsulate: BatchBlock produces batch with elements sent after TriggerBatch()
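For reference, the general shape of that approach (a sketch of the wiring only, not the linked answer's actual code) is to hide the BatchBlock behind DataflowBlock.Encapsulate, so that external callers get a plain IPropagatorBlock and cannot race the TriggerBatch calls:
IPropagatorBlock<int, int[]> CreateBatchingBlock(int batchSize)
{
    var batch = new BatchBlock<int>(batchSize);
    // ...custom, internally synchronized triggering logic goes here...
    return DataflowBlock.Encapsulate(batch, batch);
}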
Question: Why does using a WriteOnceBlock (or BufferBlock) to get back the answer (as a sort of callback) from another BufferBlock<Action> (the answer is produced inside that posted Action) cause a deadlock in this code?
I thought that methods in a class can be considered messages that we are sending to the object (like the original point of view on OOP that was proposed by, I think, Alan Kay). So I wrote this generic Actor class that helps convert an ordinary object into an actor (of course there are lots of unseen loopholes here because of mutability and such, but that's not the main concern here).
So we have these definitions:
public class Actor<T>
{
private readonly T _processor;
private readonly BufferBlock<Action<T>> _messageBox = new BufferBlock<Action<T>>();
public Actor(T processor)
{
_processor = processor;
Run();
}
public event Action<T> Send
{
add { _messageBox.Post(value); }
remove { }
}
private async void Run()
{
while (true)
{
var action = await _messageBox.ReceiveAsync();
action(_processor);
}
}
}
public interface IIdGenerator
{
long Next();
}
Now, why does this code work:
static void Main(string[] args)
{
var idGenerator1 = new IdInt64();
var idServer1 = new Actor<IIdGenerator>(idGenerator1);
const int n = 1000;
for (var i = 0; i < n; i++)
{
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(answer.Receive());
}, TaskCreationOptions.LongRunning); // Runs on a separate new thread
t.Start();
}
Console.WriteLine("press any key you like! :)");
Console.ReadKey();
Trace.Flush();
}
And this code does not work:
static void Main(string[] args)
{
var idGenerator1 = new IdInt64();
var idServer1 = new Actor<IIdGenerator>(idGenerator1);
const int n = 1000;
for (var i = 0; i < n; i++)
{
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(answer.Receive());
}, TaskCreationOptions.PreferFairness); // Runs and is managed by Task Scheduler
t.Start();
}
Console.WriteLine("press any key you like! :)");
Console.ReadKey();
Trace.Flush();
}
The difference is the TaskCreationOptions used to create the tasks. Maybe I am wrong about TPL Dataflow concepts here; I have just started using it (a [ThreadStatic] hidden somewhere?).
The problematic issue with your code is this part: answer.Receive().
When you move it inside the action the deadlock doesn't happen:
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
Trace.WriteLine(answer.Receive());
};
idServer1.Send += action;
});
t.Start();
So why is that? answer.Receive(), as opposed to await answer.ReceiveAsync(), blocks the thread until an answer is returned. When you use TaskCreationOptions.LongRunning each task gets its own thread, so there's no problem; but without it (the TaskCreationOptions.PreferFairness is irrelevant) all the thread pool threads are busy waiting, and so everything is much slower. It doesn't actually deadlock, as you can see when you use 15 instead of 1000.
There are other solutions that help understand the problem:
Increasing the thread pool with ThreadPool.SetMinThreads(1000, 0); before the original code.
Using ReceiveAsync:
Task.Run(async () =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(await answer.ReceiveAsync());
});
I have the following two loops in C#, and I am running them over a collection of 10,000 records that is downloaded with paging using "yield return".
First
foreach(var k in collection) {
repo.Save(k);
}
Second
var collectionEnum = collection.GetEnumerator();
while (collectionEnum.MoveNext()) {
var k = collectionEnum.Current;
repo.Save(k);
k = null;
}
It seems that the second loop consumes less memory and is faster than the first loop. The memory difference may be because k is set to null (even though I am not sure about that). But how come it is faster than foreach?
Following is the actual code
[Test]
public void BechmarkForEach_Test() {
bool isFirstTimeSync = true;
Func<Contact, bool> afterProcessing = contactItem => {
return true;
};
var contactService = CreateSerivce("/administrator/components/com_civicrm");
var contactRepo = new ContactRepository(new Mock<ILogger>().Object);
contactRepo.Drop();
contactRepo = new ContactRepository(new Mock<ILogger>().Object);
Profile("For Each Profiling",1,()=>{
var localenumertaor=contactService.Download();
foreach (var item in localenumertaor) {
if (isFirstTimeSync)
item.StateFlag = 1;
item.ClientTimeStamp = DateTime.UtcNow;
if (item.StateFlag == 1)
contactRepo.Insert(item);
else
contactRepo.Update(item);
afterProcessing(item);
}
contactRepo.DeleteAll();
});
}
[Test]
public void BechmarkWhile_Test() {
bool isFirstTimeSync = true;
Func<Contact, bool> afterProcessing = contactItem => {
return true;
};
var contactService = CreateSerivce("/administrator/components/com_civicrm");
var contactRepo = new ContactRepository(new Mock<ILogger>().Object);
contactRepo.Drop();
contactRepo = new ContactRepository(new Mock<ILogger>().Object);
var itemsCollection = contactService.Download().GetEnumerator();
Profile("While Profiling", 1, () =>
{
while (itemsCollection.MoveNext()) {
var item = itemsCollection.Current;
//if First time sync then ignore and overwrite the stateflag
if (isFirstTimeSync)
item.StateFlag = 1;
item.ClientTimeStamp = DateTime.UtcNow;
if (item.StateFlag == 1)
contactRepo.Insert(item);
else
contactRepo.Update(item);
afterProcessing(item);
item = null;
}
contactRepo.DeleteAll();
});
}
static void Profile(string description, int iterations, Action func) {
// clean up
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
// warm up
func();
var watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++) {
func();
}
watch.Stop();
Console.Write(description);
Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}
I am using the micro-benchmarking approach from a Stack Overflow question itself: benchmarking-small-code.
The time taken is
For Each Profiling Time Elapsed 5249 ms
While Profiling Time Elapsed 116 ms
Your foreach version calls var localenumertaor = contactService.Download(); inside the profile action, while the enumerator version calls it outside of the Profile call.
On top of that, the first execution of the iterator version will exhaust the items in the enumerator, and on subsequent iterations itemsCollection.MoveNext() will return false and skip the inner loop entirely.
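A tiny standalone repro of the exhaustion effect:
IEnumerator<int> e = Enumerable.Range(0, 3).GetEnumerator();
while (e.MoveNext()) Console.WriteLine(e.Current); // prints 0, 1, 2
while (e.MoveNext()) Console.WriteLine(e.Current); // prints nothing: exhausted
So after the warm-up call has drained the enumerator, the measured "While Profiling" iterations run an empty loop, which explains the 116 ms.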
What is the most exact way of seeing how long something, for example a method call, took in code?
The easiest and quickest I would guess is this:
DateTime start = DateTime.Now;
{
// Do some work
}
TimeSpan timeItTook = DateTime.Now - start;
But how exact is this? Are there better ways?
A better way is to use the Stopwatch class:
using System.Diagnostics;
// ...
Stopwatch sw = new Stopwatch();
sw.Start();
// ...
sw.Stop();
Console.WriteLine("Elapsed={0}",sw.Elapsed);
As others have said, Stopwatch is a good class to use here. You can wrap it in a helpful method:
public static TimeSpan Time(Action action)
{
Stopwatch stopwatch = Stopwatch.StartNew();
action();
stopwatch.Stop();
return stopwatch.Elapsed;
}
(Note the use of Stopwatch.StartNew(). I prefer this to creating a Stopwatch and then calling Start() in terms of simplicity.) Obviously this incurs the hit of invoking a delegate, but in the vast majority of cases that won't be relevant. You'd then write:
TimeSpan time = StopwatchUtil.Time(() =>
{
// Do some work
});
You could even make an ITimer interface for this, with implementations of StopwatchTimer, CpuTimer etc where available.
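A minimal sketch of that ITimer idea (the names are assumed, not an established API):
public interface ITimer
{
    TimeSpan Time(Action action);
}
public sealed class StopwatchTimer : ITimer
{
    // Wall-clock implementation; a CpuTimer could measure
    // Process.GetCurrentProcess().TotalProcessorTime instead.
    public TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}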
As others said, Stopwatch should be the right tool for this. There are a few improvements that can be made to it though; see this thread specifically: Benchmarking small code samples in C#, can this implementation be improved?.
I have seen some useful tips by Thomas Maierhofer here
Basically his code looks like:
//prevent the JIT Compiler from optimizing Fkt calls away
long seed = Environment.TickCount;
//use the second Core/Processor for the test
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
//prevent "Normal" Processes from interrupting Threads
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
//prevent "Normal" Threads from interrupting this thread
Thread.CurrentThread.Priority = ThreadPriority.Highest;
//warm up
method();
var stopwatch = new Stopwatch();
for (int i = 0; i < repetitions; i++)
{
stopwatch.Reset();
stopwatch.Start();
for (int j = 0; j < iterations; j++)
method();
stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed.TotalMilliseconds);
}
Another approach is to rely on Process.TotalProcessorTime to measure how long the CPU has been kept busy running the code/process, as shown here. This can reflect a more realistic scenario, since no other process affects the measurement. It does something like:
var start = Process.GetCurrentProcess().TotalProcessorTime;
method();
var stop = Process.GetCurrentProcess().TotalProcessorTime;
Console.WriteLine((stop - start).TotalMilliseconds);
A naked, detailed implementation of the same thing can be found here.
I wrote a helper class to perform both in an easy to use manner:
public class Clock
{
interface IStopwatch
{
bool IsRunning { get; }
TimeSpan Elapsed { get; }
void Start();
void Stop();
void Reset();
}
class TimeWatch : IStopwatch
{
Stopwatch stopwatch = new Stopwatch();
public TimeSpan Elapsed
{
get { return stopwatch.Elapsed; }
}
public bool IsRunning
{
get { return stopwatch.IsRunning; }
}
public TimeWatch()
{
if (!Stopwatch.IsHighResolution)
throw new NotSupportedException("Your hardware doesn't support high resolution counter");
//prevent the JIT Compiler from optimizing Fkt calls away
long seed = Environment.TickCount;
//use the second Core/Processor for the test
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
//prevent "Normal" Processes from interrupting Threads
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
//prevent "Normal" Threads from interrupting this thread
Thread.CurrentThread.Priority = ThreadPriority.Highest;
}
public void Start()
{
stopwatch.Start();
}
public void Stop()
{
stopwatch.Stop();
}
public void Reset()
{
stopwatch.Reset();
}
}
class CpuWatch : IStopwatch
{
TimeSpan startTime;
TimeSpan endTime;
bool isRunning;
public TimeSpan Elapsed
{
get
{
if (IsRunning)
throw new NotImplementedException("Getting elapsed span while watch is running is not implemented");
return endTime - startTime;
}
}
public bool IsRunning
{
get { return isRunning; }
}
public void Start()
{
startTime = Process.GetCurrentProcess().TotalProcessorTime;
isRunning = true;
}
public void Stop()
{
endTime = Process.GetCurrentProcess().TotalProcessorTime;
isRunning = false;
}
public void Reset()
{
startTime = TimeSpan.Zero;
endTime = TimeSpan.Zero;
}
}
public static void BenchmarkTime(Action action, int iterations = 10000)
{
Benchmark<TimeWatch>(action, iterations);
}
static void Benchmark<T>(Action action, int iterations) where T : IStopwatch, new()
{
//clean Garbage
GC.Collect();
//wait for the finalizer queue to empty
GC.WaitForPendingFinalizers();
//clean Garbage
GC.Collect();
//warm up
action();
var stopwatch = new T();
var timings = new double[5];
for (int i = 0; i < timings.Length; i++)
{
stopwatch.Reset();
stopwatch.Start();
for (int j = 0; j < iterations; j++)
action();
stopwatch.Stop();
timings[i] = stopwatch.Elapsed.TotalMilliseconds;
Console.WriteLine(timings[i]);
}
print "normalized mean: " + timings.NormalizedMean().ToString();
}
public static void BenchmarkCpu(Action action, int iterations = 10000)
{
Benchmark<CpuWatch>(action, iterations);
}
}
Just call
Clock.BenchmarkTime(() =>
{
//code
}, 10000000);
or
Clock.BenchmarkCpu(() =>
{
//code
}, 10000000);
The last part of the Clock class is the tricky part. If you want to display the final timing, it's up to you to choose what sort of timing you want. I wrote an extension method NormalizedMean which gives you the mean of the recorded timings, discarding the noise. That is, I calculate the deviation of each timing from the actual mean, and then I discard the values that lie farther (only the slower ones) from the mean of deviations (called the absolute deviation; note that this is not the often-heard standard deviation), and finally return the mean of the remaining values. This means, for instance, that if the timed values are { 1, 2, 3, 2, 100 } (in ms or whatever), it discards 100 and returns the mean of { 1, 2, 3, 2 }, which is 2. Or if the timings are { 240, 220, 200, 220, 220, 270 }, it discards 270 and returns the mean of { 240, 220, 200, 220, 220 }, which is 220.
public static double NormalizedMean(this ICollection<double> values)
{
if (values.Count == 0)
return double.NaN;
var deviations = values.Deviations().ToArray();
var meanDeviation = deviations.Sum(t => Math.Abs(t.Item2)) / values.Count;
return deviations.Where(t => t.Item2 > 0 || Math.Abs(t.Item2) <= meanDeviation).Average(t => t.Item1);
}
public static IEnumerable<Tuple<double, double>> Deviations(this ICollection<double> values)
{
if (values.Count == 0)
yield break;
var avg = values.Average();
foreach (var d in values)
yield return Tuple.Create(d, avg - d);
}
Use the Stopwatch class
System.Diagnostics.Stopwatch is designed for this task.
Stopwatch is fine, but loop the work 10^6 times, then divide by 10^6.
You'll get a lot more precision.
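A sketch of that approach (DoWork stands in for the code under test and is an assumption):
const int N = 1000000;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++)
    DoWork();
sw.Stop();
// milliseconds * 1,000,000 / N = nanoseconds per call
Console.WriteLine("Per call: {0} ns", sw.Elapsed.TotalMilliseconds * 1000000 / N);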
I'm using this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myUrl);
System.Diagnostics.Stopwatch timer = new Stopwatch();
timer.Start();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string statusCode = response.StatusCode.ToString();
response.Close();
timer.Stop();