BoundedCapacity of linked ActionBlock is not respected - c#

I have a sequential pipeline that consists of two steps.
(simplified example)
The first step simply adds 1000 to the input number.
The second step simply displays the number.
var transformBlock = new TransformBlock<int, long>(StepOne, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 1,
BoundedCapacity = DataflowBlockOptions.Unbounded,
});
var actionBlock = new ActionBlock<long>(StepTwo, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 1,
BoundedCapacity = 2,
});
transformBlock.LinkTo(actionBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
for (int i = 0; i < 100; i++)
{
transformBlock.Post(i);
}
static async Task<long> StepOne(int item)
{
await Task.Delay(500);
Console.WriteLine("transforming: " + item);
return (long)item + 1000;
}
static async Task StepTwo(long item)
{
await Task.Delay(1000);
Console.WriteLine("final product: " + item);
}
Since step 2 is taking longer than step 1, I would expect step 1 to throttle after some time since it cannot send the result to the bounded buffer of step 2.
Expected output:
Transforming: 0
Transforming: 1
Final product: 1000
Transforming: 2
Final product: 1001
Transforming: 3
Final product: 1002
Transforming: 4
Final product: 1003
...
Actual output:
Transforming: 0
Transforming: 1
Final product: 1000
Transforming: 2
Transforming: 3
Final product: 1001
Transforming: 4
Transforming: 5
Final product: 1002
Transforming: 6
Transforming: 7
Final product: 1003
...

A TransformBlock maintains two queues internally, an input queue and an output queue. The size of these two queues can be monitored at any moment through the InputCount and OutputCount properties. The BoundedCapacity option limits their combined size, so the sum InputCount+OutputCount is always less than or equal to the BoundedCapacity value. In your case the BoundedCapacity of the TransformBlock is Unbounded, so nothing limits how large these two queues can become (other than hard limits, probably something like Int32.MaxValue).
The fact that the linked ActionBlock has a limited bounded capacity is mostly irrelevant. Its only consequence is to delay the transfer of the transformed values from the output queue of the TransformBlock to the input queue of the ActionBlock, and this is observable only if you monitor the OutputCount property of the source block and the InputCount property of the target block. It wouldn't even matter if the TransformBlock was not linked to any target block. It would happily continue crunching numbers by itself until some hard limit was hit or the memory of the machine was exhausted.
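If the goal is for step 1 to throttle, the explanation above suggests bounding the TransformBlock itself. The following is a minimal sketch, not the asker's code: the delays are shortened, the capacity of 2 on the first block is an assumption chosen for illustration, and the producer loop uses SendAsync because Post returns false instead of waiting when a bounded block is full.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static async Task Main()
    {
        // Bounded source block: input queue + output queue can hold at most 2 items (assumed value).
        var transformBlock = new TransformBlock<int, long>(async item =>
        {
            await Task.Delay(50);
            Console.WriteLine("transforming: " + item);
            return (long)item + 1000;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, BoundedCapacity = 2 });

        var actionBlock = new ActionBlock<long>(async item =>
        {
            await Task.Delay(100);
            Console.WriteLine("final product: " + item);
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, BoundedCapacity = 2 });

        transformBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 10; i++)
        {
            // SendAsync completes only once the block has accepted the item,
            // so the producer slows down to the pace of the pipeline.
            await transformBlock.SendAsync(i);
        }

        transformBlock.Complete();
        await actionBlock.Completion;
    }
}
```

With both blocks bounded, a full ActionBlock leaves results sitting in the TransformBlock's output queue, which counts against its own capacity and eventually stops it from accepting new input.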

Related

rx.net buffer subscribers called with zero elements on timespan expiration

With the sequence below, I am buffering into blocks every 3 seconds. In the real-world use case, the observable source could have many items within the buffer timespan (3 seconds below), or sometimes no items within that time frame.
In those cases of zero items I would not want the subscriber to be called.
var numbers = Observable
.Interval(TimeSpan.FromSeconds(4))
.Select(i => (int) i + 1)
.Do(i => Console.WriteLine($"Producing {i}"));
numbers.Buffer(TimeSpan.FromSeconds(3))
.Subscribe(buffer => Console.WriteLine("Buffer of {1} # {0}", DateTime.Now, buffer.Count));
With the output below, note the Buffer of 0 where subscribe has been called with zero items.
Buffer of 0 # 19/05/2022 21:43:27
Producing 1
Buffer of 1 # 19/05/2022 21:43:30
Producing 2
Buffer of 1 # 19/05/2022 21:43:33
Buffer of 0 # 19/05/2022 21:43:36
Producing 3
Buffer of 1 # 19/05/2022 21:43:39
Producing 4
Buffer of 1 # 19/05/2022 21:43:42
Producing 5
Buffer of 1 # 19/05/2022 21:43:45
Buffer of 0 # 19/05/2022 21:43:48
Producing 6
Buffer of 1 # 19/05/2022 21:43:51
Producing 7
Buffer of 1 # 19/05/2022 21:43:54
Producing 8
Buffer of 1 # 19/05/2022 21:43:57
Buffer of 0 # 19/05/2022 21:44:00
Producing 9
As a hack I could modify to ignore zero element sequences:
numbers.Buffer(TimeSpan.FromSeconds(3))
.Subscribe( buffer =>
{
if(buffer.Count == 0) return;
Console.WriteLine("Buffer of {1} # {0}", DateTime.Now, buffer.Count);
});
Questions please:
Avoiding hacks, is there another operator that I could use (I am looking at Window but am unsure of its usage) so that downstream subscriber methods are only called when a block contains more than zero elements?
What is the purpose of allowing a zero-length buffer to be passed?
How would one expand this example to group buffers by an identifier GroupId, for an example sequence Observable.Interval(timespan).Select(i => (GroupId: random.Next(0, 3), Value: (int) i + 1))?
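For the first question, one common approach is to keep Buffer and filter its output with the standard Where operator, so that empty lists never reach the subscriber. The sketch below assumes the System.Reactive package; it only addresses the empty-buffer case and does not attempt the grouping by GroupId.

```csharp
using System;
using System.Reactive.Linq;

class Program
{
    static void Main()
    {
        var numbers = Observable
            .Interval(TimeSpan.FromSeconds(4))
            .Select(i => (int)i + 1)
            .Do(i => Console.WriteLine($"Producing {i}"));

        // Buffer still emits a list every 3 seconds; Where drops the empty ones
        // before they reach the subscriber.
        IDisposable subscription = numbers
            .Buffer(TimeSpan.FromSeconds(3))
            .Where(buffer => buffer.Count > 0)
            .Subscribe(buffer => Console.WriteLine($"Buffer of {buffer.Count} # {DateTime.Now}"));

        Console.ReadLine();      // keep the process alive while the timers fire
        subscription.Dispose();
    }
}
```

This is functionally the same check as the early return inside Subscribe, but it lives in the pipeline, so every downstream operator and subscriber sees only non-empty buffers.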

Long Running Task Not Having Access to Console?

I'm writing a producer-> Queue -> Consumer -> Queue2 -> Consumer2 application
I have consumer2 wait for a list to reach a threshold and then start another task simulating a long-running task (e.g. a SQL multi-insert).
However, when I run the application the 'long running task' (LongRunningTaskInsert()) seems to wait until all of the queues have signaled completion before writing to the console.
When I debug, the List Lists variable shows that some of the tasks are completing in the middle of the application.
Am I doing something wrong/naughty with tasks?
Code:
class Program
{
static void Main(string[] args)
{
BlockingCollection<string> bag1 = new BlockingCollection<string>();
BlockingCollection<string> bag2 = new BlockingCollection<string>();
var Tasks = new List<Task>();
List<string> Container = new List<string>();
Task consumer2 = Task.Factory.StartNew(() =>
{
foreach (var item in bag2.GetConsumingEnumerable())
{
Container.Add(item);
if (bag2.IsCompleted || Container.Count > 5)
{
Console.WriteLine("Container:");
Container.ForEach(y =>
{
Console.Write($"Value: {y}, ");
});
Console.Write("\n");
var newTask = Task.Factory.StartNew(() => {
Thread.Sleep(2000);
LongRunningTaskInsert();
}
);
Tasks.Add(newTask);
Container.Clear();
}
}
Task.WhenAll(Tasks);
});
//this is a task that evaluates all available elements on separate threads.
Task consumer1 = Task.Factory.StartNew(() =>
{
//do something with the consumer
Parallel.ForEach(
bag1.GetConsumingEnumerable(),
(x) =>
{
Console.WriteLine($"Consumer {x} => bag2, thread {Thread.CurrentThread.ManagedThreadId}");
bag2.Add(x);
});
bag2.CompleteAdding();
});
Task producer = Task.Factory.StartNew(() =>
{
//do something to put records into the bag
for (int i = 0; i < 10; i++)
{
System.Threading.Thread.Sleep(500);
bag1.Add(i.ToString());
bag1.Add((i * 10).ToString());
bag1.Add((i + 10).ToString());
Console.WriteLine($"Producer: {i} & { i * 10} & {i + 10}");
}
bag1.CompleteAdding();
});
producer.Wait();
consumer1.Wait();
consumer2.Wait();
Console.Read();
}
private static bool LongRunningTaskInsert()
{
//Thread.Sleep(1000);
Console.WriteLine("Long Running Task Complete");
return true;
}
}
edit:
The output I'm getting is:
Producer: 0 & 0 & 10
Consumer 0 => bag2, thread 4
Consumer 0 => bag2, thread 6
Consumer 10 => bag2, thread 5
Producer: 1 & 10 & 11
Consumer 10 => bag2, thread 8
Consumer 11 => bag2, thread 10
Consumer 1 => bag2, thread 9
Container:
Value: 0, Value: 0, Value: 10, Value: 10, Value: 11, Value: 1,
Producer: 2 & 20 & 12
Consumer 20 => bag2, thread 4
Consumer 2 => bag2, thread 6
Consumer 12 => bag2, thread 5
Producer: 3 & 30 & 13
Consumer 3 => bag2, thread 10
Consumer 30 => bag2, thread 9
Consumer 13 => bag2, thread 8
Container:
Value: 20, Value: 2, Value: 12, Value: 3, Value: 30, Value: 13,
Producer: 4 & 40 & 14
Consumer 4 => bag2, thread 4
Consumer 40 => bag2, thread 6
Consumer 14 => bag2, thread 5
Producer: 5 & 50 & 15
Consumer 5 => bag2, thread 10
Consumer 15 => bag2, thread 8
Consumer 50 => bag2, thread 9
Container:
Value: 4, Value: 40, Value: 14, Value: 5, Value: 15, Value: 50,
Producer: 6 & 60 & 16
Consumer 6 => bag2, thread 6
Consumer 60 => bag2, thread 6
Producer: 7 & 70 & 17
Consumer 16 => bag2, thread 4
Consumer 70 => bag2, thread 5
Consumer 17 => bag2, thread 5
Consumer 7 => bag2, thread 4
Container:
Value: 6, Value: 60, Value: 16, Value: 70, Value: 17, Value: 7,
Producer: 8 & 80 & 18
Consumer 8 => bag2, thread 6
Consumer 80 => bag2, thread 6
Producer: 9 & 90 & 19
Consumer 90 => bag2, thread 4
Consumer 19 => bag2, thread 4
Consumer 18 => bag2, thread 8
Consumer 9 => bag2, thread 8
Container:
Value: 8, Value: 80, Value: 90, Value: 19, Value: 18, Value: 9,
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
I expect the 'Long Running Task Complete' to be mixed in and not all clustered at the end.
The Parallel.ForEach statement is spawning a bunch of threads, and my method LongRunningTaskInsert() isn't getting any clock time. If I change it to a synchronous foreach loop, the number of threads drops from 8 to 4 and I get the results I expect (the long running task console calls are mixed in).

Parallel processing for a List

My scenario:
I need to process a list of elements. Each element processing is highly time consuming (1-10 seconds)
Instead of a
List retval = new List();
foreach (item in myList)
retval.Add(ProcessItem(item));
return retval;
I want to parallel process each item.
I know .NET has a number of approaches for parallel processing: which is the best one? (Note: I'm stuck on framework version 3.5 and cannot use Task, async and all the fancy features that come with .NET 4...)
Here my try using delegates:
private void DoTest(int processingTaskDuration)
{
List<int> itemsToProcess = new List<int>();
for (int i = 1; i <= 20; i++)
itemsToProcess.Add(i);
TestClass tc = new TestClass(processingTaskDuration);
DateTime start = DateTime.Now;
List<int> result = tc.ProcessList(itemsToProcess);
TimeSpan elapsed = DateTime.Now - start;
System.Diagnostics.Debug.WriteLine(string.Format("elapsed (msec)= {0}", (int)elapsed.TotalMilliseconds));
}
public class TestClass
{
static int s_Counter = 0;
static object s_lockObject = new Object();
int m_TaskMsecDuration = 0;
public TestClass() :
this(5000)
{
}
public TestClass(int taskMsecDuration)
{
m_TaskMsecDuration = taskMsecDuration;
}
public int LongOperation(int itemToProcess)
{
int currentCounter = 0;
lock (s_lockObject)
{
s_Counter++;
currentCounter = s_Counter;
}
System.Diagnostics.Debug.WriteLine(string.Format("LongOperation\tStart\t{0}\t{1}\t{2}", currentCounter, System.Threading.Thread.CurrentThread.ManagedThreadId, DateTime.Now.ToString("HH:mm:ss.ffffff")));
// time consuming task, e.g 5 seconds
Thread.Sleep(m_TaskMsecDuration);
int retval = itemToProcess * 2;
System.Diagnostics.Debug.WriteLine(string.Format("LongOperation\tEnd \t{0}\t{1}\t{2}", currentCounter, System.Threading.Thread.CurrentThread.ManagedThreadId, DateTime.Now.ToString("HH:mm:ss.ffffff")));
return retval;
}
delegate int LongOperationDelegate(int itemToProcess);
public List<int> ProcessList(List<int> itemsToProcess)
{
List<IAsyncResult> asyncResults = new List<IAsyncResult>();
LongOperationDelegate del = LongOperation;
foreach (int item in itemsToProcess)
{
IAsyncResult res = del.BeginInvoke(item, null, null);
asyncResults.Add(res);
}
// list of waitHandles to wait for
List<WaitHandle> waitHandles = new List<WaitHandle>();
asyncResults.ForEach(el => waitHandles.Add(el.AsyncWaitHandle));
// wait for processing every item
WaitHandle.WaitAll(waitHandles.ToArray());
// retrieve result of processing
List<int> retval = new List<int>();
asyncResults.ForEach(res =>
{
int singleProcessingResult = del.EndInvoke(res);
retval.Add(singleProcessingResult);
}
);
return retval;
}
}
And here is some output (column #3 is a progressive counter, use it to match the start of a call with its end; #4 is the thread ID and the last column is a timestamp)
LongOperation Start 1 6 15:11:18.331619
LongOperation Start 2 12 15:11:18.331619
LongOperation Start 3 13 15:11:19.363722
LongOperation Start 4 14 15:11:19.895775
LongOperation Start 5 15 15:11:20.406826
LongOperation Start 6 16 15:11:21.407926
LongOperation Start 7 17 15:11:22.410026
LongOperation End 1 6 15:11:23.360121
LongOperation End 2 12 15:11:23.361122
LongOperation Start 8 12 15:11:23.363122
LongOperation Start 9 6 15:11:23.365122
LongOperation Start 10 18 15:11:23.907176
LongOperation End 3 13 15:11:24.365222
LongOperation Start 11 13 15:11:24.366222
LongOperation End 4 14 15:11:24.897275
LongOperation Start 12 14 15:11:24.898275
LongOperation Start 13 19 15:11:25.407326
LongOperation End 5 15 15:11:25.408326
LongOperation Start 14 15 15:11:25.412327
LongOperation Start 15 20 15:11:26.407426
LongOperation End 6 16 15:11:26.410426
LongOperation Start 16 16 15:11:26.410426
LongOperation Start 17 21 15:11:27.408526
LongOperation End 7 17 15:11:27.411527
LongOperation Start 18 17 15:11:27.413527
LongOperation End 8 12 15:11:28.365622
LongOperation Start 19 12 15:11:28.366622
LongOperation End 9 6 15:11:28.366622
LongOperation Start 20 6 15:11:28.389624
LongOperation End 10 18 15:11:28.908676
LongOperation End 11 13 15:11:29.367722
LongOperation End 12 14 15:11:29.899775
LongOperation End 13 19 15:11:30.411827
LongOperation End 14 15 15:11:30.413827
LongOperation End 15 20 15:11:31.407926
LongOperation End 16 16 15:11:31.411927
LongOperation End 17 21 15:11:32.413027
LongOperation End 18 17 15:11:32.416027
LongOperation End 19 12 15:11:33.389124
LongOperation End 20 6 15:11:33.391124
elapsed (msec)= 15075
So:
Is Delegate approach the right one?
Did I implement it right?
If so, why the 3rd operations starts one second after the first two (and so on)?
I mean, I'd like the whole processing to complete in more or less the time of a single item, but the system seems to use the thread pool in a strange way. After all, I'm asking for 20 threads, and it waits before spawning the 3rd one after the first two calls.
I think the 3.5 backport of Reactive Extensions comes with an implementation of Parallel.ForEach() that you should be able to use. The port should just contain only what was needed to get Rx to work on 3.5, but that should be enough.
Others have tried implementing it as well, basically just queuing work items on ThreadPool.
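void Main()
{
var list = new List<int>{ 1,2,3 };
var processes = list.Count();
foreach (var item in list)
{
// Copy the loop variable: before C# 5 all closures share the same foreach variable.
var current = item;
ThreadPool.QueueUserWorkItem(s => {
ProcessItem(current);
// processes-- is not atomic when several pool threads finish at once.
Interlocked.Decrement(ref processes);
});
}
// Poll until every queued work item has finished.
while (processes > 0) { Thread.Sleep(10); }
}
static void ProcessItem(int item)
{
Thread.Sleep(100); // do work
}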
I got rid of my third question:
If so, why the 3rd operations starts one second after the first two
(and so on)?
The problem seems to be in the default way ThreadPool manages thread spawning: see http://msdn.microsoft.com/en-us/library/0ka9477y%28v=VS.90%29.aspx. Quote:
The thread pool has a built-in delay (half a second in the .NET
Framework version 2.0) before starting new idle threads. If your
application periodically starts many tasks in a short time, a small
increase in the number of idle threads can produce a significant
increase in throughput. Setting the number of idle threads too high
consumes system resources needlessly.
It seems a call to ThreadPool.SetMinThreads with a proper value helps a lot.
At the start of my ProcessList, I inserted a call to this method:
private void SetUpThreadPool(int numThreadDesired)
{
int currentWorkerThreads;
int currentCompletionPortThreads;
ThreadPool.GetMinThreads(out currentWorkerThreads, out currentCompletionPortThreads);
//System.Diagnostics.Debug.WriteLine(string.Format("ThreadPool.GetMinThreads: workerThreads = {0}, completionPortThreads = {1}", currentWorkerThreads, currentCompletionPortThreads));
const int MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM = 20;
int numMinThreadToSet = Math.Min(numThreadDesired, MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM);
if (currentWorkerThreads < numMinThreadToSet)
// Pass the capped value; passing numThreadDesired would bypass the limit above.
ThreadPool.SetMinThreads(numMinThreadToSet, currentCompletionPortThreads);
}
public List<int> ProcessList(List<int> itemsToProcess)
{
SetUpThreadPool(itemsToProcess.Count);
...
}
Now all threads (up to 20) start at the same moment, without delay. I think 20 is a good compromise for MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM: not too high, and it fits my particular requirements.
Still wondering about main questions
Is Delegate approach the right one?
Did I implement it right?
Thanks to everyone helping.

Incorrect linear interpolation with large x values using Math.Net Numerics

I'm trying to use Math.NET Numerics to do interpolation of a DateTime - Value series. I started off with linear interpolation, but am getting some very off looking results.
Running this test:
public class script{
public void check_numerics()
{
var ticks = DateTime.Now.Ticks;
Console.WriteLine("Ticks: " + ticks);
var xValues = new double[] { ticks, ticks + 1000, ticks + 2000, ticks + 3000, ticks + 4000, ticks + 5000 };
var yValues = new double[] {0, 1, 2, 3, 4, 5};
var spline = Interpolate.LinearBetweenPoints(xValues, yValues);
var ticks2 = ticks;
for (int i = 0; i < 10; i++)
{
ticks2 += 500;
Console.WriteLine(spline.Interpolate(ticks2));
}
}
}
This gives:
Ticks: 635385235576843379
0.5
1
1.5
2
2.42857142857143 // this should be 2.5
3
3.5
4
4.5
5
Notice that 2.4285 is fairly wrong. At a different time (different ticks value) a different value will be "wrong". Is there a "bug" with large x values in Math.NET or am I expecting too much?
Just confirming the comments above as the maintainer of Math.NET Numerics:
The distance (epsilon) between the closest numbers of this magnitude that can be represented at double precision is 128:
Precision.EpsilonOf(ticks); // 128
This means that if you add or subtract 128/2-1 = 63 from this number, you get back exactly the same number:
long ticks = DateTime.Now.Ticks // 635385606515570758
((long)(double)ticks) // 635385606515570816
((long)(63+(double)ticks)) // 635385606515570816
((long)(-63+(double)ticks)) // 635385606515570816
((long)(65+(double)ticks)) // 635385606515570944
((long)(-65+(double)ticks)) // 635385606515570688
The incremental steps of 500 are very close to these 128 and effectively get rounded to multiples of 128 (e.g. 512), so it's not surprising that there will be some artifacts like this.
If you reduce the time precision to milliseconds by dividing the ticks by 10000, as suggested by James, you get an epsilon of 0.0078125, and accurate results even for steps of 1 instead of 500.
Precision.EpsilonOf(ticks/10000); // 0.0078125
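The spacing quoted above can also be checked without Math.NET. The sketch below uses only BitConverter from the base class library; the Ulp helper is a hypothetical name introduced for illustration, and the tick value is the one printed in the question.

```csharp
using System;

class Program
{
    // Distance from a positive double to the next larger representable double.
    // (Hypothetical helper; Math.NET's Precision.EpsilonOf reports the same quantity in the answer above.)
    static double Ulp(double x)
    {
        long bits = BitConverter.DoubleToInt64Bits(x);
        double next = BitConverter.Int64BitsToDouble(bits + 1);
        return next - x;
    }

    static void Main()
    {
        long ticks = 635385235576843379L;

        Console.WriteLine(Ulp(ticks));            // 128
        Console.WriteLine(Ulp(ticks / 10000.0));  // 0.0078125

        // A step of 500 ticks cannot be represented exactly at this magnitude:
        // the difference comes out as a multiple of 128 rather than 500.
        Console.WriteLine((double)(ticks + 500) - (double)ticks);
    }
}
```

The first two lines reproduce the epsilon values from the answer; the third shows why interpolation points spaced 500 ticks apart land on slightly wrong x values.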

Using async/await, when do continuations happen?

I am attempting to use async/await in a very large already existing synchronous code base. There is some global state in this code base that works fine, if kludgy, in a synchronous context, but it doesn't work in the asynchronous context of async/await.
So, my two options seem to be to either factor out the global context, which would be a very large and very time-consuming task, or do something clever with when continuations run.
In order to better understand async/await and continuations, I made a test program, shown here.
// A method to simulate an Async read of the database.
private static Task ReadAsync()
{
return Task.Factory.StartNew(() =>
{
int max = int.MaxValue / 2;
for (int i = 0; i < max; ++i)
{
}
});
}
// An async method that creates several continuations.
private static async Task ProcessMany(int i)
{
Console.WriteLine(string.Format("{0} {1}", i.ToString(), 0));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1}", i.ToString(), 1));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1}", i.ToString(), 2));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1}", i.ToString(), 3));
}
public static void Main(string[] args)
{
Queue<Task> queue = new Queue<Task>();
for (int i = 0; i < 10; ++i)
{
queue.Enqueue(ProcessMany(i));
}
// Do some synchronous processing...
Console.WriteLine("Processing... ");
for (int i = 0; i < int.MaxValue; ++i)
{
}
Console.WriteLine("Done processing... ");
queue.Dequeue().Wait();
}
After reading all about async/await, my understanding would be that none of the continuations would happen between the "Processing.. " and "Done processing... " WriteLines.
Here is some sample output.
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
Processing...
3 1
2 1
7 1
6 1
0 1
4 1
5 1
1 1
6 2
3 2
Done processing...
7 2
2 2
0 2
4 2
5 2
1 2
6 3
3 3
7 3
2 3
0 3
I would expect the single Wait() at the end of the program to potentially yield to multiple continuations while the first one finishes, but I don't understand how any continuations could run between the "Processing... " and the "Done Processing... ". I thought there might be a yield or something in the Console.WriteLine method, so I completely replaced it, but that didn't change the output.
There is clearly a gap in my understanding of async/await. How could a continuation happen when we are simply incrementing a variable? Is the compiler or CLR injecting some sort of magic here?
Thank you in advance for any help in better understanding async/await and continuations.
EDIT:
If you edit the sample code this way, as per the comment by Stephen, what's going on is much more obvious.
// An async method that creates several continuations.
private static async Task ProcessMany(int i)
{
Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 0, Thread.CurrentThread.ManagedThreadId));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 1, Thread.CurrentThread.ManagedThreadId));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 2, Thread.CurrentThread.ManagedThreadId));
await ReadAsync();
Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 3, Thread.CurrentThread.ManagedThreadId));
}
public static void Main(string[] args)
{
Queue<Task> queue = new Queue<Task>();
for (int i = 0; i < 10; ++i)
{
queue.Enqueue(ProcessMany(i));
}
// Do some synchronous processing...
Console.WriteLine("Processing... {0}", Thread.CurrentThread.ManagedThreadId);
for (int i = 0; i < int.MaxValue; ++i)
{
}
Console.WriteLine("Done processing... {0}", Thread.CurrentThread.ManagedThreadId);
queue.Dequeue().Wait();
}
Output:
0 0 9
1 0 9
2 0 9
3 0 9
4 0 9
5 0 9
6 0 9
7 0 9
8 0 9
9 0 9
Processing... 9
4 1 14
3 1 13
2 1 12
5 1 15
0 1 10
6 1 16
1 1 11
7 1 17
4 2 14
3 2 13
0 2 10
6 2 16
2 2 12
5 2 15
Done processing... 9
1 2 11
7 2 17
0 3 10
4 3 14
If you don't have a current SynchronizationContext or TaskScheduler, then the continuations will execute on a thread pool thread (separately from the main thread). This is the case in Console apps but you'll see very different behavior in WinForms/WPF/ASP.NET.
While you could control the continuation scheduling by using a custom TaskScheduler, that would be quite a bit of work with probably very little benefit. I'm not clear on what the problems are with your global state, but consider alternatives such as SemaphoreSlim.
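As an illustration of the SemaphoreSlim suggestion, here is a minimal sketch in which a static counter stands in for the global state (the asker's real state is not shown, so the names below are assumptions). The semaphore lets only one continuation touch the shared value at a time, regardless of which thread pool thread it resumes on.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // One permit: at most one caller is inside the protected section at a time.
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    // Hypothetical stand-in for the global state in the code base.
    static int sharedCounter;

    static async Task ProcessAsync()
    {
        await Task.Delay(10);        // simulated asynchronous read

        await Gate.WaitAsync();      // asynchronously wait for exclusive access
        try
        {
            sharedCounter++;         // touch the global state while holding the permit
        }
        finally
        {
            Gate.Release();
        }
    }

    static void Main()
    {
        var tasks = new Task[10];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = ProcessAsync();
        }

        Task.WaitAll(tasks);
        Console.WriteLine(sharedCounter); // 10
    }
}
```

Unlike lock, SemaphoreSlim.WaitAsync does not block a thread while waiting, and the permit may be released from a different thread than the one that acquired it, which is why it fits code that resumes after an await.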
Each call to ProcessMany runs synchronously on the main thread until it reaches its first
await ...;
which is why all ten "i 0" lines appear before your "Processing" printout. At that point the method returns an incomplete task to Main, the work started by ReadAsync runs on a thread pool thread, and the rest of ProcessMany is scheduled as a continuation that also runs on a thread pool thread (not in a separate thread pool). So while your large loop is running on the main thread, the 10 ProcessMany calls continue to execute on pool threads, producing the additional printouts. Your ProcessMany calls do not finish before the main thread loop does, so they continue to print results after your "Done Processing" printout.
I hope that clarifies the order of things for you.
