I'm writing a Producer -> Queue -> Consumer -> Queue2 -> Consumer2 application.
I have Consumer2 wait for a list to reach a threshold, then start another task simulating a long-running operation (e.g. a SQL multi-insert).
However, when I run the application, the 'long running task' (LongRunningTaskInsert()) seems to wait until all of the queues have signaled completion before writing to the console.
When I debug, the Tasks list variable shows that some of the tasks are completing in the middle of the run.
Am I doing something wrong/naughty with tasks?
Code:
class Program
{
    static void Main(string[] args)
    {
        BlockingCollection<string> bag1 = new BlockingCollection<string>();
        BlockingCollection<string> bag2 = new BlockingCollection<string>();
        var Tasks = new List<Task>();
        List<string> Container = new List<string>();
        Task consumer2 = Task.Factory.StartNew(() =>
        {
            foreach (var item in bag2.GetConsumingEnumerable())
            {
                Container.Add(item);
                if (bag2.IsCompleted || Container.Count > 5)
                {
                    Console.WriteLine("Container:");
                    Container.ForEach(y =>
                    {
                        Console.Write($"Value: {y}, ");
                    });
                    Console.Write("\n");
                    var newTask = Task.Factory.StartNew(() =>
                    {
                        Thread.Sleep(2000);
                        LongRunningTaskInsert();
                    });
                    Tasks.Add(newTask);
                    Container.Clear();
                }
            }
            Task.WhenAll(Tasks);
        });
        //this is a task that evaluates all available elements on separate threads.
        Task consumer1 = Task.Factory.StartNew(() =>
        {
            //do something with the consumer
            Parallel.ForEach(
                bag1.GetConsumingEnumerable(),
                (x) =>
                {
                    Console.WriteLine($"Consumer {x} => bag2, thread {Thread.CurrentThread.ManagedThreadId}");
                    bag2.Add(x);
                });
            bag2.CompleteAdding();
        });
        Task producer = Task.Factory.StartNew(() =>
        {
            //do something to put records into the bag
            for (int i = 0; i < 10; i++)
            {
                System.Threading.Thread.Sleep(500);
                bag1.Add(i.ToString());
                bag1.Add((i * 10).ToString());
                bag1.Add((i + 10).ToString());
                Console.WriteLine($"Producer: {i} & {i * 10} & {i + 10}");
            }
            bag1.CompleteAdding();
        });
        producer.Wait();
        consumer1.Wait();
        consumer2.Wait();
        Console.Read();
    }

    private static bool LongRunningTaskInsert()
    {
        //Thread.Sleep(1000);
        Console.WriteLine("Long Running Task Complete");
        return true;
    }
}
edit:
The output I'm getting is:
Producer: 0 & 0 & 10
Consumer 0 => bag2, thread 4
Consumer 0 => bag2, thread 6
Consumer 10 => bag2, thread 5
Producer: 1 & 10 & 11
Consumer 10 => bag2, thread 8
Consumer 11 => bag2, thread 10
Consumer 1 => bag2, thread 9
Container:
Value: 0, Value: 0, Value: 10, Value: 10, Value: 11, Value: 1,
Producer: 2 & 20 & 12
Consumer 20 => bag2, thread 4
Consumer 2 => bag2, thread 6
Consumer 12 => bag2, thread 5
Producer: 3 & 30 & 13
Consumer 3 => bag2, thread 10
Consumer 30 => bag2, thread 9
Consumer 13 => bag2, thread 8
Container:
Value: 20, Value: 2, Value: 12, Value: 3, Value: 30, Value: 13,
Producer: 4 & 40 & 14
Consumer 4 => bag2, thread 4
Consumer 40 => bag2, thread 6
Consumer 14 => bag2, thread 5
Producer: 5 & 50 & 15
Consumer 5 => bag2, thread 10
Consumer 15 => bag2, thread 8
Consumer 50 => bag2, thread 9
Container:
Value: 4, Value: 40, Value: 14, Value: 5, Value: 15, Value: 50,
Producer: 6 & 60 & 16
Consumer 6 => bag2, thread 6
Consumer 60 => bag2, thread 6
Producer: 7 & 70 & 17
Consumer 16 => bag2, thread 4
Consumer 70 => bag2, thread 5
Consumer 17 => bag2, thread 5
Consumer 7 => bag2, thread 4
Container:
Value: 6, Value: 60, Value: 16, Value: 70, Value: 17, Value: 7,
Producer: 8 & 80 & 18
Consumer 8 => bag2, thread 6
Consumer 80 => bag2, thread 6
Producer: 9 & 90 & 19
Consumer 90 => bag2, thread 4
Consumer 19 => bag2, thread 4
Consumer 18 => bag2, thread 8
Consumer 9 => bag2, thread 8
Container:
Value: 8, Value: 80, Value: 90, Value: 19, Value: 18, Value: 9,
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
Long Running Task Complete
I expect the 'Long Running Task Complete' messages to be mixed in, not all clustered at the end.
The Parallel.ForEach statement is spawning a bunch of threads, and my method LongRunningTaskInsert() isn't getting any clock time. If I change it to a synchronous foreach loop, my number of threads drops from 8 to 4 and I get the results I expect (the long-running task's console output mixed in).
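Based on that diagnosis, two mitigations seem plausible: cap how many pool threads Parallel.ForEach may occupy, and hint the scheduler to give the long-running work its own thread via TaskCreationOptions.LongRunning. The sketch below is illustrative, not the original program; ThrottledConsumer and Drain are made-up names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledConsumer
{
    // Drains a BlockingCollection with a capped degree of parallelism and
    // returns how many items were consumed. Capping MaxDegreeOfParallelism
    // leaves pool threads free for other tasks (e.g. the long-running insert).
    public static int Drain(IEnumerable<string> items)
    {
        var bag = new BlockingCollection<string>();
        foreach (var s in items) bag.Add(s);
        bag.CompleteAdding();

        int consumed = 0;
        Parallel.ForEach(
            bag.GetConsumingEnumerable(),
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            _ => Interlocked.Increment(ref consumed));
        return consumed;
    }

    public static void Main()
    {
        Console.WriteLine(Drain(new[] { "a", "b", "c" }));  // prints 3

        // For the insert itself, LongRunning hints the scheduler to use a
        // dedicated thread instead of competing for a pool thread:
        Task.Factory.StartNew(
            () => Console.WriteLine("insert ran"),
            TaskCreationOptions.LongRunning).Wait();
    }
}
```

With the consumer capped, the task started for LongRunningTaskInsert should get scheduled promptly instead of queuing behind the consumer's work items.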
Related
I have a sequential pipeline that consists of two steps.
(simplified example)
The first step simply adds 1000 to the input number.
The second step simply displays the number.
var transformBlock = new TransformBlock<int, long>(StepOne, new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 1,
    BoundedCapacity = DataflowBlockOptions.Unbounded,
});
var actionBlock = new ActionBlock<long>(StepTwo, new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 1,
    BoundedCapacity = 2,
});
transformBlock.LinkTo(actionBlock, new DataflowLinkOptions
{
    PropagateCompletion = true
});
for (int i = 0; i < 100; i++)
{
    transformBlock.Post(i);
}

static async Task<long> StepOne(int item)
{
    await Task.Delay(500);
    Console.WriteLine("transforming: " + item);
    return (long)item + 1000;
}

static async Task StepTwo(long item)
{
    await Task.Delay(1000);
    Console.WriteLine("final product: " + item);
}
Since step 2 is taking longer than step 1, I would expect step 1 to throttle after some time since it cannot send the result to the bounded buffer of step 2.
Expected output:
Transforming: 0
Transforming: 1
Final product: 1000
Transforming: 2
Final product: 1001
Transforming: 3
Final product: 1002
Transforming: 4
Final product: 1003
...
Actual output:
Transforming: 0
Transforming: 1
Final product: 1000
Transforming: 2
Transforming: 3
Final product: 1001
Transforming: 4
Transforming: 5
Final product: 1002
Transforming: 6
Transforming: 7
Final product: 1003
...
A TransformBlock maintains two queues internally: an input queue and an output queue. The size of these two queues can be monitored at any moment through the InputCount and OutputCount properties. The accumulated size of the two queues is configured by the BoundedCapacity option, so the sum InputCount + OutputCount is always less than or equal to the BoundedCapacity value.

In your case the BoundedCapacity of the block is Unbounded, so there is no limit on how large these two queues can become (other than hard limits like Int32.MaxValue). The fact that the linked ActionBlock has a limited bounded capacity is mostly irrelevant; it has no consequence other than delaying the transfer of the transformed values from the output queue of the TransformBlock to the input queue of the ActionBlock. This consequence is only observable if you monitor the OutputCount property of the source block and the InputCount property of the target block.

It wouldn't even matter if the TransformBlock was not linked to any target block. It would happily continue crunching numbers by itself, until some hard limit was hit or the memory of the machine was exhausted.
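Following that explanation, the throttling the question asks for comes from bounding the TransformBlock itself. A minimal sketch under those assumptions (the names, delays, and capacities are illustrative); note that with a bounded block, Post() returns false and drops the item when full, so SendAsync() is used instead:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedPipeline
{
    // Bounding the TransformBlock itself (not just the ActionBlock) is what
    // creates back-pressure on the producer: BoundedCapacity covers the
    // block's input and output queues combined.
    public static async Task<int> Run()
    {
        int finished = 0;
        var transform = new TransformBlock<int, long>(
            async n => { await Task.Delay(10); return n + 1000; },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 1,
                BoundedCapacity = 2
            });
        var print = new ActionBlock<long>(
            async n =>
            {
                await Task.Delay(20);
                Console.WriteLine("final product: " + n);
                finished++;
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, BoundedCapacity = 2 });
        transform.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 10; i++)
        {
            // Post() would drop items once the bounded block is full;
            // SendAsync() asynchronously waits for room instead.
            await transform.SendAsync(i);
        }
        transform.Complete();
        await print.Completion;
        return finished;  // every item made it through
    }

    public static void Main() => Console.WriteLine(Run().GetAwaiter().GetResult());
}
```

Requires the System.Threading.Tasks.Dataflow NuGet package.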
I am using the following example.
using System;
using System.Threading;

public class MyThread
{
    public void Thread1()
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine("Hello world " + i);
            Thread.Sleep(1000);
        }
    }
}

public class MyClass
{
    public static void Main()
    {
        Console.WriteLine("Before start thread");
        MyThread thr1 = new MyThread();
        MyThread thr2 = new MyThread();
        Thread tid1 = new Thread(new ThreadStart(thr1.Thread1));
        Thread tid2 = new Thread(new ThreadStart(thr2.Thread1));
        tid1.Start();
        tid2.Start();
    }
}
It works like this.
Output:
Before start thread
Hello world 0
Hello world 0
Hello world 1
Hello world 1
Hello world 2
Hello world 2
Hello world 3
Hello world 3
Hello world 4
Hello world 4
Hello world 5
Hello world 5
Hello world 6
Hello world 6
Hello world 7
Hello world 7
Hello world 8
Hello world 8
Hello world 9
Hello world 9
It prints i from the first for loop, sleeps for 1 second, moves to the second for loop, prints i from the second loop, sleeps for 1 second, and moves back to the first. I don't want it to context-switch after every single print of i. Instead, I want each loop to run for a certain period: print several values of i from the first loop, sleep for some time, print several values of i from the second loop within that period, and when the period expires, move back to the first loop.
For example: it might print i from 1 up to somewhere between 2 and 10 from the first for loop, then sleep and move to the second, which may likewise print its own 1 up to 2-10, then move back to the first loop. The values of i from the first loop need not match those from the second.
Do you mean something like this?
class Program
{
    static void Main(string[] args)
    {
        Task.WaitAll
        (
            Task.Run(() => new Work().DoWork(1)),
            Task.Run(() => new Work().DoWork(2))
        );
        Console.ReadLine();
    }

    public class Work
    {
        public void DoWork(int taskNumber)
        {
            for (int i = 0; i < 100; i++)
            {
                Console.WriteLine("Task {0} Value {1}", taskNumber, i);
            }
        }
    }
}
So you get the following output:
Task 1 Value 0
Task 1 Value 1
Task 1 Value 2
Task 1 Value 3
Task 1 Value 4
Task 1 Value 5
Task 1 Value 6
Task 1 Value 7
Task 1 Value 8
Task 1 Value 9
Task 1 Value 10
Task 1 Value 11
Task 1 Value 12
Task 2 Value 0
Task 2 Value 1
Task 2 Value 2
Task 2 Value 3
Task 2 Value 4
Task 2 Value 5
Task 2 Value 6
Task 2 Value 7
Task 2 Value 8
Task 2 Value 9
Task 2 Value 10
Task 2 Value 11
Task 2 Value 12
Task 2 Value 13
Task 2 Value 14
Task 2 Value 15
Task 2 Value 16
Task 2 Value 17
Task 2 Value 18
Task 2 Value 19
Task 2 Value 20
Task 2 Value 21
Task 2 Value 22
Task 2 Value 23
Task 2 Value 24
Task 2 Value 25
Task 2 Value 26
Task 2 Value 27
Task 2 Value 28
Task 2 Value 29
Task 2 Value 30
Task 2 Value 31
Task 2 Value 32
Task 2 Value 33
Task 2 Value 34
Task 2 Value 35
Task 2 Value 36
Task 1 Value 13
Task 1 Value 14
Task 1 Value 15
Task 1 Value 16
Task 1 Value 17
Task 1 Value 18
Task 1 Value 19
Task 1 Value 20
Task 1 Value 21
Task 1 Value 22
Task 1 Value 23
Task 1 Value 24
Task 1 Value 25
Task 1 Value 26
Task 1 Value 27
Task 1 Value 28
Task 1 Value 29
Task 1 Value 30
Task 1 Value 31
Task 1 Value 32
Task 1 Value 33
Task 2 Value 37
Task 2 Value 38
Task 2 Value 39
Task 2 Value 40
Task 2 Value 41
Task 2 Value 42
Task 2 Value 43
Task 2 Value 44
Task 2 Value 45
Task 2 Value 46
Task 2 Value 47
Task 2 Value 48
Task 2 Value 49
Task 2 Value 50
Task 2 Value 51
Task 2 Value 52
Task 2 Value 53
Task 2 Value 54
Task 2 Value 55
Task 1 Value 34
Task 1 Value 35
Task 1 Value 36
Task 1 Value 37
Task 1 Value 38
Task 1 Value 39
Task 1 Value 40
Task 1 Value 41
Task 1 Value 42
Task 1 Value 43
Task 1 Value 44
Task 1 Value 45
Task 1 Value 46
Task 1 Value 47
Task 1 Value 48
Task 2 Value 56
Task 2 Value 57
Task 2 Value 58
Task 2 Value 59
Task 2 Value 60
Task 2 Value 61
Task 2 Value 62
Task 2 Value 63
Task 2 Value 64
Task 2 Value 65
Task 2 Value 66
Task 2 Value 67
Task 2 Value 68
Task 2 Value 69
Task 2 Value 70
Task 2 Value 71
Task 2 Value 72
Task 2 Value 73
Task 2 Value 74
Task 2 Value 75
Task 2 Value 76
Task 1 Value 49
Task 1 Value 50
Task 1 Value 51
Task 1 Value 52
Task 1 Value 53
Task 1 Value 54
Task 1 Value 55
Task 1 Value 56
Task 1 Value 57
Task 1 Value 58
Task 1 Value 59
Task 1 Value 60
Task 1 Value 61
Task 1 Value 62
Task 1 Value 63
Task 1 Value 64
Task 1 Value 65
Task 1 Value 66
Task 1 Value 67
Task 1 Value 68
Task 1 Value 69
Task 2 Value 77
Task 2 Value 78
Task 2 Value 79
Task 2 Value 80
Task 2 Value 81
Task 2 Value 82
Task 2 Value 83
Task 2 Value 84
Task 2 Value 85
Task 2 Value 86
Task 2 Value 87
Task 2 Value 88
Task 2 Value 89
Task 2 Value 90
Task 2 Value 91
Task 2 Value 92
Task 2 Value 93
Task 2 Value 94
Task 2 Value 95
Task 2 Value 96
Task 2 Value 97
Task 2 Value 98
Task 2 Value 99
Task 1 Value 70
Task 1 Value 71
Task 1 Value 72
Task 1 Value 73
Task 1 Value 74
Task 1 Value 75
Task 1 Value 76
Task 1 Value 77
Task 1 Value 78
Task 1 Value 79
Task 1 Value 80
Task 1 Value 81
Task 1 Value 82
Task 1 Value 83
Task 1 Value 84
Task 1 Value 85
Task 1 Value 86
Task 1 Value 87
Task 1 Value 88
Task 1 Value 89
Task 1 Value 90
Task 1 Value 91
Task 1 Value 92
Task 1 Value 93
Task 1 Value 94
Task 1 Value 95
Task 1 Value 96
Task 1 Value 97
Task 1 Value 98
Task 1 Value 99
Your OS gives every task a slice of time to execute; when that slice is over, the task waits until it gets time again. During that period the task is effectively "sleeping".
Please let me know if this works for you.
Perhaps try looking at using Microsoft's Reactive Framework. It is designed to do this kind of thing.
Try this code:
var source = Observable.Range(0, 10);
var interval = Observable.Interval(TimeSpan.FromSeconds(1.0));

var query =
    Observable
        .Merge(
            source.Select(n => $"A{n}").Buffer(3),
            source.Select(n => $"B{n}").Buffer(3))
        .SelectMany(x => x)
        .Zip(interval, (n, _) => n);

query.Subscribe(i => Console.WriteLine($"Hello world {i}"));
Here's the output:
Hello world A0
Hello world A1
Hello world A2
Hello world B0
Hello world B1
Hello world B2
Hello world A3
Hello world A4
Hello world A5
Hello world B3
Hello world B4
Hello world B5
Hello world A6
Hello world A7
Hello world A8
Hello world B6
Hello world B7
Hello world B8
Hello world A9
Hello world B9
Notice the groups of 3 lots of the "A" and "B" values?
Each line is output one second after the previous one. That can be changed by changing the TimeSpan in the Interval operator.
This all happens in background threads (from the thread pool) and it lets you do all sorts of interesting filtering, grouping, querying, composition, etc.
Just NuGet "System.Reactive" and add using System.Reactive.Linq; to your code.
My scenario:
I need to process a list of elements, and processing each element is highly time-consuming (1-10 seconds).
Instead of a
List<int> retval = new List<int>();
foreach (var item in myList)
    retval.Add(ProcessItem(item));
return retval;
I want to process each item in parallel.
I know .NET has a number of approaches for parallel processing: which is the best one? (Note: I'm stuck on framework version 3.5, so I cannot use Task, async, and all the fancy features coming with .NET 4...)
Here my try using delegates:
private void DoTest(int processingTaskDuration)
{
    List<int> itemsToProcess = new List<int>();
    for (int i = 1; i <= 20; i++)
        itemsToProcess.Add(i);
    TestClass tc = new TestClass(processingTaskDuration);
    DateTime start = DateTime.Now;
    List<int> result = tc.ProcessList(itemsToProcess);
    TimeSpan elapsed = DateTime.Now - start;
    System.Diagnostics.Debug.WriteLine(string.Format("elapsed (msec)= {0}", (int)elapsed.TotalMilliseconds));
}

public class TestClass
{
    static int s_Counter = 0;
    static object s_lockObject = new Object();
    int m_TaskMsecDuration = 0;

    public TestClass() :
        this(5000)
    {
    }

    public TestClass(int taskMsecDuration)
    {
        m_TaskMsecDuration = taskMsecDuration;
    }

    public int LongOperation(int itemToProcess)
    {
        int currentCounter = 0;
        lock (s_lockObject)
        {
            s_Counter++;
            currentCounter = s_Counter;
        }
        System.Diagnostics.Debug.WriteLine(string.Format("LongOperation\tStart\t{0}\t{1}\t{2}", currentCounter, System.Threading.Thread.CurrentThread.ManagedThreadId, DateTime.Now.ToString("HH:mm:ss.ffffff")));
        // time consuming task, e.g. 5 seconds
        Thread.Sleep(m_TaskMsecDuration);
        int retval = itemToProcess * 2;
        System.Diagnostics.Debug.WriteLine(string.Format("LongOperation\tEnd \t{0}\t{1}\t{2}", currentCounter, System.Threading.Thread.CurrentThread.ManagedThreadId, DateTime.Now.ToString("HH:mm:ss.ffffff")));
        return retval;
    }

    delegate int LongOperationDelegate(int itemToProcess);

    public List<int> ProcessList(List<int> itemsToProcess)
    {
        List<IAsyncResult> asyncResults = new List<IAsyncResult>();
        LongOperationDelegate del = LongOperation;
        foreach (int item in itemsToProcess)
        {
            IAsyncResult res = del.BeginInvoke(item, null, null);
            asyncResults.Add(res);
        }
        // list of waitHandles to wait for
        List<WaitHandle> waitHandles = new List<WaitHandle>();
        asyncResults.ForEach(el => waitHandles.Add(el.AsyncWaitHandle));
        // wait for processing every item
        WaitHandle.WaitAll(waitHandles.ToArray());
        // retrieve result of processing
        List<int> retval = new List<int>();
        asyncResults.ForEach(res =>
        {
            int singleProcessingResult = del.EndInvoke(res);
            retval.Add(singleProcessingResult);
        });
        return retval;
    }
}
And here's some output (column 3 is a progressive counter, used to match the start of a call with its end; column 4 is the thread ID; the last column is a timestamp):
LongOperation Start 1 6 15:11:18.331619
LongOperation Start 2 12 15:11:18.331619
LongOperation Start 3 13 15:11:19.363722
LongOperation Start 4 14 15:11:19.895775
LongOperation Start 5 15 15:11:20.406826
LongOperation Start 6 16 15:11:21.407926
LongOperation Start 7 17 15:11:22.410026
LongOperation End 1 6 15:11:23.360121
LongOperation End 2 12 15:11:23.361122
LongOperation Start 8 12 15:11:23.363122
LongOperation Start 9 6 15:11:23.365122
LongOperation Start 10 18 15:11:23.907176
LongOperation End 3 13 15:11:24.365222
LongOperation Start 11 13 15:11:24.366222
LongOperation End 4 14 15:11:24.897275
LongOperation Start 12 14 15:11:24.898275
LongOperation Start 13 19 15:11:25.407326
LongOperation End 5 15 15:11:25.408326
LongOperation Start 14 15 15:11:25.412327
LongOperation Start 15 20 15:11:26.407426
LongOperation End 6 16 15:11:26.410426
LongOperation Start 16 16 15:11:26.410426
LongOperation Start 17 21 15:11:27.408526
LongOperation End 7 17 15:11:27.411527
LongOperation Start 18 17 15:11:27.413527
LongOperation End 8 12 15:11:28.365622
LongOperation Start 19 12 15:11:28.366622
LongOperation End 9 6 15:11:28.366622
LongOperation Start 20 6 15:11:28.389624
LongOperation End 10 18 15:11:28.908676
LongOperation End 11 13 15:11:29.367722
LongOperation End 12 14 15:11:29.899775
LongOperation End 13 19 15:11:30.411827
LongOperation End 14 15 15:11:30.413827
LongOperation End 15 20 15:11:31.407926
LongOperation End 16 16 15:11:31.411927
LongOperation End 17 21 15:11:32.413027
LongOperation End 18 17 15:11:32.416027
LongOperation End 19 12 15:11:33.389124
LongOperation End 20 6 15:11:33.391124
elapsed (msec)= 15075
So:
Is the delegate approach the right one?
Did I implement it right?
If so, why does the 3rd operation start one second after the first two (and so on)?
I mean, I'd like the whole processing to complete in more or less the time of a single item, but the system seems to use the thread pool in a strange way. After all, I'm asking for 20 threads, and it waits to spawn the 3rd one until a second after the first two calls.
I think the .NET 3.5 backport of Reactive Extensions comes with an implementation of Parallel.ForEach() that you should be able to use. The port should contain only what was needed to get Rx working on 3.5, but that should be enough.
Others have tried implementing it as well, basically just by queuing work items on the ThreadPool.
void Main()
{
    var list = new List<int> { 1, 2, 3 };
    int pending = list.Count;
    foreach (var item in list)
    {
        var current = item; // copy: don't capture the loop variable (a real bug in C# before 5.0)
        ThreadPool.QueueUserWorkItem(s =>
        {
            ProcessItem(current);
            Interlocked.Decrement(ref pending); // a bare pending-- would not be thread-safe
        });
    }
    while (Thread.VolatileRead(ref pending) > 0) { Thread.Sleep(10); }
}

static void ProcessItem(int item)
{
    Thread.Sleep(100); // do work
}
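The busy-wait above can also be replaced with a wait handle. A sketch of the same fan-out using only primitives available on .NET 3.5 (Interlocked and ManualResetEvent); PoolFanOut and ProcessAll are illustrative names, not from the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolFanOut
{
    // Queues one thread-pool work item per input and blocks until all have
    // finished, signaling completion with a ManualResetEvent instead of
    // polling a counter in a sleep loop.
    public static List<int> ProcessAll(List<int> items, Func<int, int> process)
    {
        if (items.Count == 0) return new List<int>();

        var results = new int[items.Count];
        int pending = items.Count;
        using (var done = new ManualResetEvent(false))
        {
            for (int i = 0; i < items.Count; i++)
            {
                int index = i;  // copy: avoid capturing the loop variable
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    results[index] = process(items[index]);
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set();  // last worker releases the waiter
                });
            }
            done.WaitOne();
        }
        return new List<int>(results);
    }

    public static void Main()
    {
        var doubled = ProcessAll(new List<int> { 1, 2, 3 }, x => x * 2);
        Console.WriteLine(string.Join(",", doubled));  // prints 2,4,6
    }
}
```

Storing each result by index also preserves input order, which the WaitHandle-per-call approach in the question gets from EndInvoke.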
I got rid of my third question:
If so, why the 3rd operations starts one second after the first two
(and so on)?
The problem seems to be in the default way ThreadPool manages thread spawning: see http://msdn.microsoft.com/en-us/library/0ka9477y%28v=VS.90%29.aspx. Quote:
The thread pool has a built-in delay (half a second in the .NET
Framework version 2.0) before starting new idle threads. If your
application periodically starts many tasks in a short time, a small
increase in the number of idle threads can produce a significant
increase in throughput. Setting the number of idle threads too high
consumes system resources needlessly.
It seems a call to ThreadPool.SetMinThreads with a proper value helps a lot.
At the start of my ProcessList, I inserted a call to this method:
private void SetUpThreadPool(int numThreadDesired)
{
    int currentWorkerThreads;
    int currentCompletionPortThreads;
    ThreadPool.GetMinThreads(out currentWorkerThreads, out currentCompletionPortThreads);
    //System.Diagnostics.Debug.WriteLine(string.Format("ThreadPool.GetMinThreads: workerThreads = {0}, completionPortThreads = {1}", currentWorkerThreads, currentCompletionPortThreads));
    const int MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM = 20;
    int numMinThreadToSet = Math.Min(numThreadDesired, MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM);
    if (currentWorkerThreads < numMinThreadToSet)
        ThreadPool.SetMinThreads(numMinThreadToSet, currentCompletionPortThreads);
}

public List<int> ProcessList(List<int> itemsToProcess)
{
    SetUpThreadPool(itemsToProcess.Count);
    ...
}
Now all threads (up to 20) start at the same moment, without delay. I think 20 is a good compromise for MAXIMUM_VALUE_FOR_SET_MIN_THREAD_PARAM: not too high, and it fits my particular requirements.
I'm still wondering about the main questions:
Is the delegate approach the right one?
Did I implement it right?
Thanks to everyone helping.
My application searches many files in parallel using regex: await Task.WhenAll(filePaths.Select(FindThings));
Inside FindThings, it spends most of its time performing the regex search, as these files can be hundreds of MB in size.
static async Task FindThings(string path)
{
    string fileContent = null;
    try
    {
        using (var reader = File.OpenText(path))
            fileContent = await reader.ReadToEndAsync();
    }
    catch (Exception e)
    {
        WriteLine(lineIndex, "{0}: Error {1}", filename, e);
        return;
    }
    var exitMatches = _exitExp.Matches(fileContent);
    foreach (Match exit in exitMatches)
    {
        if (_taskDelay > 0)
            await Task.Delay(_taskDelay);
        // [...]
Is there an async version of Regex or any way to make this properly cooperative with Tasks?
Why this is important
I'm getting a lot of responses that indicate I didn't clarify why this is important. Take this example program (which uses the Nito.AsyncEx library):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nito.AsyncEx;

namespace Scrap
{
    class Program
    {
        static void Main(string[] args)
        {
            AsyncContext.Run(() => MainAsync(args));
        }

        static async void MainAsync(string[] args)
        {
            var tasks = new List<Task>();
            var asyncStart = DateTime.Now;
            tasks.Add(Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
                ShowIndexAsync(i, asyncStart))));
            var start = DateTime.Now;
            tasks.Add(Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
                ShowIndex(i, start))));
            await Task.WhenAll(tasks);
            Console.ReadLine();
        }

        static async Task ShowIndexAsync(int index, DateTime start)
        {
            Console.WriteLine("ShowIndexAsync: {0} ({1})",
                index, DateTime.Now - start);
            await Task.Delay(index * 100);
            Console.WriteLine("!ShowIndexAsync: {0} ({1})",
                index, DateTime.Now - start);
        }

        static Task ShowIndex(int index, DateTime start)
        {
            return Task.Factory.StartNew(() =>
            {
                Console.WriteLine("ShowIndex: {0} ({1})",
                    index, DateTime.Now - start);
                Task.Delay(index * 100).Wait();
                Console.WriteLine("!ShowIndex: {0} ({1})",
                    index, DateTime.Now - start);
            });
        }
    }
}
So this calls ShowIndexAsync 10 times, then ShowIndex 10 times, and waits for them all to finish.
ShowIndexAsync is "async to the core" while ShowIndex is not, but both operate on tasks. The blocking operation here is Task.Delay; the difference is that one awaits the task, while the other calls .Wait() on it inside a task.
You'd expect the ones queued first (ShowIndexAsync) to finish first, but you'd be incorrect.
ShowIndexAsync: 0 (00:00:00.0060000)
!ShowIndexAsync: 0 (00:00:00.0070000)
ShowIndexAsync: 1 (00:00:00.0080000)
ShowIndexAsync: 2 (00:00:00.0110000)
ShowIndexAsync: 3 (00:00:00.0110000)
ShowIndexAsync: 4 (00:00:00.0120000)
ShowIndexAsync: 5 (00:00:00.0130000)
ShowIndexAsync: 6 (00:00:00.0130000)
ShowIndexAsync: 7 (00:00:00.0140000)
ShowIndexAsync: 8 (00:00:00.0150000)
ShowIndexAsync: 9 (00:00:00.0150000)
ShowIndex: 0 (00:00:00.0020000)
!ShowIndex: 0 (00:00:00.0020000)
ShowIndex: 1 (00:00:00.0030000)
!ShowIndex: 1 (00:00:00.1100000)
ShowIndex: 2 (00:00:00.1100000)
!ShowIndex: 2 (00:00:00.3200000)
ShowIndex: 3 (00:00:00.3200000)
!ShowIndex: 3 (00:00:00.6220000)
ShowIndex: 4 (00:00:00.6220000)
!ShowIndex: 4 (00:00:01.0280000)
ShowIndex: 5 (00:00:01.0280000)
!ShowIndex: 5 (00:00:01.5420000)
ShowIndex: 6 (00:00:01.5420000)
!ShowIndex: 6 (00:00:02.1500000)
ShowIndex: 7 (00:00:02.1510000)
!ShowIndex: 7 (00:00:02.8650000)
ShowIndex: 8 (00:00:02.8650000)
!ShowIndex: 8 (00:00:03.6660000)
ShowIndex: 9 (00:00:03.6660000)
!ShowIndex: 9 (00:00:04.5780000)
!ShowIndexAsync: 1 (00:00:04.5950000)
!ShowIndexAsync: 2 (00:00:04.5960000)
!ShowIndexAsync: 3 (00:00:04.5970000)
!ShowIndexAsync: 4 (00:00:04.5970000)
!ShowIndexAsync: 5 (00:00:04.5980000)
!ShowIndexAsync: 6 (00:00:04.5990000)
!ShowIndexAsync: 7 (00:00:04.5990000)
!ShowIndexAsync: 8 (00:00:04.6000000)
!ShowIndexAsync: 9 (00:00:04.6010000)
Why did that happen?
The task scheduler will only use so many real threads. "await" compiles to a cooperative multitasking state machine. If you have a blocking operation that is not awaited (in this example Task.Delay(...).Wait(), but in my question, the regex matching), it's not going to cooperate and let the task scheduler manage tasks properly.
If we change our sample program to:
static async void MainAsync(string[] args)
{
    var asyncStart = DateTime.Now;
    await Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
        ShowIndexAsync(i, asyncStart)));
    var start = DateTime.Now;
    await Task.WhenAll(Enumerable.Range(0, 10).Select(i =>
        ShowIndex(i, start)));
    Console.ReadLine();
}
Then our output changes to:
ShowIndexAsync: 0 (00:00:00.0050000)
!ShowIndexAsync: 0 (00:00:00.0050000)
ShowIndexAsync: 1 (00:00:00.0060000)
ShowIndexAsync: 2 (00:00:00.0080000)
ShowIndexAsync: 3 (00:00:00.0090000)
ShowIndexAsync: 4 (00:00:00.0090000)
ShowIndexAsync: 5 (00:00:00.0100000)
ShowIndexAsync: 6 (00:00:00.0110000)
ShowIndexAsync: 7 (00:00:00.0110000)
ShowIndexAsync: 8 (00:00:00.0120000)
ShowIndexAsync: 9 (00:00:00.0120000)
!ShowIndexAsync: 1 (00:00:00.1150000)
!ShowIndexAsync: 2 (00:00:00.2180000)
!ShowIndexAsync: 3 (00:00:00.3160000)
!ShowIndexAsync: 4 (00:00:00.4140000)
!ShowIndexAsync: 5 (00:00:00.5190000)
!ShowIndexAsync: 6 (00:00:00.6130000)
!ShowIndexAsync: 7 (00:00:00.7190000)
!ShowIndexAsync: 8 (00:00:00.8170000)
!ShowIndexAsync: 9 (00:00:00.9170000)
ShowIndex: 0 (00:00:00.0030000)
!ShowIndex: 0 (00:00:00.0040000)
ShowIndex: 3 (00:00:00.0060000)
ShowIndex: 4 (00:00:00.0090000)
ShowIndex: 2 (00:00:00.0100000)
ShowIndex: 1 (00:00:00.0100000)
ShowIndex: 5 (00:00:00.0130000)
ShowIndex: 6 (00:00:00.0130000)
ShowIndex: 7 (00:00:00.0150000)
ShowIndex: 8 (00:00:00.0180000)
!ShowIndex: 7 (00:00:00.7660000)
!ShowIndex: 6 (00:00:00.7660000)
ShowIndex: 9 (00:00:00.7660000)
!ShowIndex: 2 (00:00:00.7660000)
!ShowIndex: 5 (00:00:00.7660000)
!ShowIndex: 4 (00:00:00.7660000)
!ShowIndex: 3 (00:00:00.7660000)
!ShowIndex: 1 (00:00:00.7660000)
!ShowIndex: 8 (00:00:00.8210000)
!ShowIndex: 9 (00:00:01.6700000)
Notice how the async calls have a nice, even distribution of end times, while the non-async code does not. The task scheduler is getting blocked: it won't create additional real threads because it's expecting cooperation.
I don't expect it to take less CPU time or the like, but my goal is to make FindThings multitask cooperatively, i.e., make it "async to the core."
Regex searches are a CPU-bound operation, so they're going to take time. You can use Task.Run to push the work off to a background thread and thus keep your UI responsive, but it won't help them go any faster.
Since your searches are already in parallel, there's probably nothing more you can do. You could try using asynchronous file reads to reduce the number of blocked threads in your thread pool, but it probably won't have a huge effect.
Your current code is calling ReadToEndAsync but it needs to open the file for asynchronous access (i.e., use the FileStream constructor and explicitly ask for an asynchronous file handle by passing true for the isAsync parameter or FileOptions.Asynchronous for the options parameter).
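A minimal sketch of that suggestion: open the FileStream with FileOptions.Asynchronous so ReadToEndAsync gets a true asynchronous file handle. AsyncRead and ReadAllTextAsync are illustrative names, not from the questioner's code:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncRead
{
    // File.OpenText opens the file with a synchronous handle, so the
    // "async" read still ties up a thread on the actual I/O. Asking for
    // FileOptions.Asynchronous makes the underlying reads truly async.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, options: FileOptions.Asynchronous))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync();
        }
    }

    public static void Main()
    {
        var path = Path.GetTempFileName();
        File.WriteAllText(path, "hello");
        Console.WriteLine(ReadAllTextAsync(path).GetAwaiter().GetResult());  // prints hello
        File.Delete(path);
    }
}
```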
You have a big file with a lot of text and multiple possible matches. A possible approach would be dividing it into 5-10 mini-files.
So take your text file (1000 lines) and use a StreamReader to create 5 mini-files (200 lines each).
Run your regex on all the mini-files on different threads.
I wonder if that would decrease the runtime.
I am attempting to use async/await in a very large, already existing synchronous code base. There is some global state in this code base that works fine, if kludgy, in a synchronous context, but it doesn't work in the asynchronous context of async/await.
So my two options seem to be to either factor out the global state, which would be a very large and very time-consuming task, or do something clever with when continuations run.
In order to better understand async/await and continuations, I made a test program, shown here.
// A method to simulate an async read of the database.
private static Task ReadAsync()
{
    return Task.Factory.StartNew(() =>
    {
        int max = int.MaxValue / 2;
        for (int i = 0; i < max; ++i)
        {
        }
    });
}

// An async method that creates several continuations.
private static async Task ProcessMany(int i)
{
    Console.WriteLine(string.Format("{0} {1}", i.ToString(), 0));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1}", i.ToString(), 1));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1}", i.ToString(), 2));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1}", i.ToString(), 3));
}

public static void Main(string[] args)
{
    Queue<Task> queue = new Queue<Task>();
    for (int i = 0; i < 10; ++i)
    {
        queue.Enqueue(ProcessMany(i));
    }
    // Do some synchronous processing...
    Console.WriteLine("Processing... ");
    for (int i = 0; i < int.MaxValue; ++i)
    {
    }
    Console.WriteLine("Done processing... ");
    queue.Dequeue().Wait();
}
After reading all about async/await, my understanding was that none of the continuations would happen between the "Processing... " and "Done processing... " WriteLines.
Here is some sample output.
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
Processing...
3 1
2 1
7 1
6 1
0 1
4 1
5 1
1 1
6 2
3 2
Done processing...
7 2
2 2
0 2
4 2
5 2
1 2
6 3
3 3
7 3
2 3
0 3
I would expect the single Wait() at the end of the program to potentially yield to multiple continuations while the first one finishes, but I don't understand how any continuations could run between the "Processing... " and the "Done Processing... ". I thought there might be a yield or something in the Console.WriteLine method, so I completely replaced it, but that didn't change the output.
There is clearly a gap in my understanding of async/await. How could a continuation happen when we are simply incrementing a variable? Is the compiler or CLR injecting some sort of magic here?
Thank you in advance for any help in better understanding async/await and continuations.
EDIT:
If you edit the sample code this way, as per the comment by Stephen, what's going on is much more obvious.
// An async method that creates several continuations.
private static async Task ProcessMany(int i)
{
    Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 0, Thread.CurrentThread.ManagedThreadId));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 1, Thread.CurrentThread.ManagedThreadId));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 2, Thread.CurrentThread.ManagedThreadId));
    await ReadAsync();
    Console.WriteLine(string.Format("{0} {1} {2}", i.ToString(), 3, Thread.CurrentThread.ManagedThreadId));
}

public static void Main(string[] args)
{
    Queue<Task> queue = new Queue<Task>();
    for (int i = 0; i < 10; ++i)
    {
        queue.Enqueue(ProcessMany(i));
    }
    // Do some synchronous processing...
    Console.WriteLine("Processing... {0}", Thread.CurrentThread.ManagedThreadId);
    for (int i = 0; i < int.MaxValue; ++i)
    {
    }
    Console.WriteLine("Done processing... {0}", Thread.CurrentThread.ManagedThreadId);
    queue.Dequeue().Wait();
}
Output:
0 0 9
1 0 9
2 0 9
3 0 9
4 0 9
5 0 9
6 0 9
7 0 9
8 0 9
9 0 9
Processing... 9
4 1 14
3 1 13
2 1 12
5 1 15
0 1 10
6 1 16
1 1 11
7 1 17
4 2 14
3 2 13
0 2 10
6 2 16
2 2 12
5 2 15
Done processing... 9
1 2 11
7 2 17
0 3 10
4 3 14
If you don't have a current SynchronizationContext or TaskScheduler, then the continuations will execute on a thread pool thread (separately from the main thread). This is the case in Console apps but you'll see very different behavior in WinForms/WPF/ASP.NET.
While you could control the continuation scheduling by using a custom TaskScheduler, that would be quite a bit of work with probably very little benefit. I'm not clear on what the problems are with your global state, but consider alternatives such as SemaphoreSlim.
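As a sketch of that SemaphoreSlim suggestion: unlike lock, a SemaphoreSlim acquired with WaitAsync can be held across an await, so it can serialize access to global state in async code. GuardedState and IncrementAsync are illustrative names, not from the questioner's code base:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class GuardedState
{
    // A lock statement cannot contain an await, but SemaphoreSlim.WaitAsync
    // can guard shared state across await points without blocking a thread.
    private static readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private static int _globalCounter;

    public static async Task<int> IncrementAsync()
    {
        await _gate.WaitAsync();
        try
        {
            int value = _globalCounter + 1;
            await Task.Delay(1);   // an await while still holding the gate
            _globalCounter = value;
            return value;
        }
        finally
        {
            _gate.Release();
        }
    }

    public static void Main()
    {
        var tasks = new Task<int>[10];
        for (int i = 0; i < 10; i++) tasks[i] = IncrementAsync();
        Task.WaitAll(tasks);
        Console.WriteLine(_globalCounter);  // 10 increments, none lost
    }
}
```

Without the gate, the read-await-write sequence would let concurrent callers overwrite each other's increments.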
Each call to ProcessMany runs synchronously on the calling thread until it reaches its first await (the line below); the rest of the method is then scheduled as a continuation that executes on a thread-pool thread, separate from the main thread.
await ...;
That's why you see a bunch of printouts before your "Processing" line. While all 10 ProcessMany calls are executing on pool threads, you start running your large loop on the main thread, and the 10 calls continue to execute concurrently, producing the additional printouts. Your ProcessMany calls apparently don't finish before your main-thread loop does, so they continue to emit results after your "Done processing" printout.
I hope that clarifies the order of things for you.