Are these 2 methods started almost at the same time? - c#

When I run below code, Output is this:
When I run till 300, output is this:
When I run till 100, output is this:
Does this mean that both methods started almost at the same time?
If this is true, why do we need Parallel library if we can achieve parallelism by async-await?
using System;
using System.Threading.Tasks;
class Program
{
public static void PrintX()
{
for (int i = 0; i < 500; i++) { Console.Write("x"); }
}
public static void PrintY()
{
for (int i = 0; i < 500; i++) { Console.Write("y"); }
}
public async Task RunAsync()
{
var t1 = Task.Run(() => PrintY());
var t2 = Task.Run(() => PrintX());
await t1;
await t2;
}
static void Main(string[] args)
{
Task t = new Program().RunAsync();
t.Wait();
}
}

Ultimately you're at the mercy of the thread pool here. You have enqueued two items (Task.Run), and they will be picked up and serviced at some future time. When they start is non-deterministic, and will depend on how many available threads there are, and other factors.
They will start approximately at the same time, with no guarantees of anything (perhaps not even the order in which they start). The await will be triggered against their completion - so when you call await (or even whether you call await) won't impact them in any way. They might run in parallel, but most likely they individually run fast enough that whichever one gets started first will have completed before it tries starting the second. They might even end up running consecutively on the same thread (outputting the managed thread id would be a way to see this).
As for why we need Parallel: firstly, it pre-dates async/await by a long time; secondly it does a lot of things to allow larger scale parallelization - things like running a large sequence with concurrent processing including fixed maximum parallelization.
Just to show that it can be concurrent, here's the output from a real run where I added the Environment.CurrentManagedThreadId into the output:
main: 1
y: 3
x: 4
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
definitely concurrent, but: other runs can show very different outputs

Related

Parallel.ForEach blocked on long iteration

I've been using Parallel.ForEach to do some time-consuming processing on collections of items. The processing is actually handled by an external command line tool and I cannot change that. However, it seems that the Parallel.ForEach will get "stuck" on a long running item from the collection. I've distilled the problem down and can show that Parallel.ForEach is, in fact, waiting for this long one to finish and not allowing any others through. I've written a console app to demonstrate the problem:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace testParallel
{
class Program
{
static int inloop = 0;
static int completed = 0;
static void Main(string[] args)
{
// initialize an array integers to hold the wait duration (in milliseconds)
var items = Enumerable.Repeat(10, 1000).ToArray();
// set one of the items to 10 seconds
items[50] = 10000;
// Initialize our line for reporting status
Console.Write(0.ToString("000") + " Threads, " + 0.ToString("000") + " completed");
// Start the loop in a task (to avoid SO answers having to do with the Parallel.ForEach call, itself, not being parallel)
var t = Task.Factory.StartNew(() => Process(items));
// Wait for the operations to compelte
t.Wait();
// Report finished
Console.WriteLine("\nDone!");
}
static void Process(int[] items)
{
// SpinWait (not sleep or yield or anything) for the specified duration
Parallel.ForEach(items, (msToWait) =>
{
// increment the counter for how many threads are in the loop right now
System.Threading.Interlocked.Increment(ref inloop);
// determine at what time we shoule stop spinning
var e = DateTime.Now + new TimeSpan(0, 0, 0, 0, msToWait);
// spin until the target time
while (DateTime.Now < e) /* no body -- just a hard loop */;
// count another completed
System.Threading.Interlocked.Increment(ref completed);
// we're done with this iteration
System.Threading.Interlocked.Decrement(ref inloop);
// report status
Console.Write("\r" + inloop.ToString("000") + " Threads, " + completed.ToString("000") + " completed");
});
}
}
}
Basically, I make an array of int to store the number of milliseconds a given operation takes. I set them all to 10 except for one, which I set to 10000 (so, 10 seconds). I kick off the Parallel.ForEach in a task and process each integer in a hard spin wait (so it shouldn't be yielding or sleeping or anything).
On each iteration, I report how many iterations are in the body of the loop right now, and how many iterations we have completed. Mostly, it goes along fine. However, toward the end (time-wise), it reports "001 Threads, 987 Completed".
My question is why doesn't it use 7 of the other cores to work on the remaining 13 "jobs"? This one long-running iteration should not keep it from processing other elements in the collection, right?
This example happens to be a fixed collection, but it could easily be set to be an enumerable. We wouldn't want to stop fetching the next item in the enumerable just because one was taking a long time.
I found the answer (or at least, an answer). It has to do with the chunk partitioning. The SO answer here got it for me. So basically, at the top of my "Process" function, if I change from this:
static void Process(int[] items)
{
Parallel.ForEach(items, (msToWait) => { ... });
}
to this
static void Process(int[] items)
{
var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, (msToWait) => { ... });
}
it grabs the work one at a time. For the more typical case of a parallel for each, where the body doesn't take more than a second, I can certainly see chunking the sets of work. In my use case, however, each body part can take anywhere from half a second to 5 hours. I certainly would not want a bunch of the 10-second variety elements to be blocked by one 5 hour element. So, in this case, the overhead of "one-at-a-time" is well worth it.

Slow down using Async calls with lock

I was noticing an initial slow down in my process and upon taking multiple hangdumps, I was able to isolate the issue and reproduce the scenario using the following code. I am using a library that has locks and what not, which eventually calls the user side implementation of certain methods. These methods make async calls using httpclient. These async calls are made from within these locks inside the library.
Now, my theory as to what is happening (do correct me if I am wrong):
The tasks that get spun try to acquire the lock and hold on to the threads fast enough such that the first PingAsync method needs to wait for the default task scheduler to spin up a new thread for it to run on, which is 0.5 s based on the default .net scheduling algorithm. This is why I think I notice delays for total tasks greater than 32, which also increases linearly with increasing total tasks count.
The workaround:
Increase the minthreads count, which I think is treating the symptom and not the actual problem.
Another way is to have a limited concurrency to control the number of tasks fired. But these are tasks spun by a webserver for incoming httprequests and typically we will not have control over it (or will we?)
I understand that combining asyc and non-async is bad design and using sempahores' async calls would be a better way to go. Assuming I do not have control over this library, how does one go about mitigating this problem?
const int ParallelCount = 16;
const int TotalTasks = 33;
static object _lockObj = new object();
static HttpClient _httpClient = new HttpClient();
static int count = 0;
static void Main(string[] args)
{
ThreadPool.GetMinThreads(out int workerThreads, out int ioThreads);
Console.WriteLine($"Min threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
ThreadPool.GetMaxThreads(out workerThreads, out ioThreads);
Console.WriteLine($"Max threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
//var done = ThreadPool.SetMaxThreads(1024, 1000);
//ThreadPool.GetMaxThreads(out workerThreads, out ioThreads);
//Console.WriteLine($"Set Max Threads success? {done}.");
//Console.WriteLine($"Max threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
//var done = ThreadPool.SetMinThreads(1024, 1000);
//ThreadPool.GetMinThreads(out workerThreads, out ioThreads);
//Console.WriteLine($"Set Min Threads success? {done}.");
//Console.WriteLine($"Min threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
var startTime = DateTime.UtcNow;
var tasks = new List<Task>();
for (int i = 0; i < TotalTasks; i++)
{
tasks.Add(Task.Run(() => LibraryMethod()));
//while (tasks.Count > ParallelCount)
//{
// var task = Task.WhenAny(tasks.ToArray()).GetAwaiter().GetResult();
// if (task.IsFaulted)
// {
// throw task.Exception;
// }
// tasks.Remove(task);
//}
}
Task.WaitAll(tasks.ToArray());
//while (tasks.Count > 0)
//{
// var task = Task.WhenAny(tasks.ToArray()).GetAwaiter().GetResult();
// if (task.IsFaulted)
// {
// throw task.Exception;
// }
// tasks.Remove(task);
// Console.Write(".");
//}
Console.Write($"\nDone in {(DateTime.UtcNow-startTime).TotalMilliseconds}");
Console.ReadLine();
}
Assuming this is the part where library methods are called,
public static void LibraryMethod()
{
lock (_lockObj)
{
SimpleNonAsync();
}
}
Eventually, the user implementation of this method gets called which is async.
public static void SimpleNonAsync()
{
//PingAsync().Result;
//PingAsync().ConfigureAwaiter(false).Wait();
PingAsync().Wait();
}
private static async Task PingAsync()
{
Console.Write($"{Interlocked.Increment(ref count)}.");
await _httpClient.SendAsync(new HttpRequestMessage
{
RequestUri = new Uri($#"http://127.0.0.1"),
Method = HttpMethod.Get
});
}
These async calls are made from within these locks inside the library.
This is a design flaw. No one should call arbitrary code while under a lock.
That said, the locks have nothing to do with the problem you're seeing.
I understand that combining asyc and non-async is bad design and using sempahores' async calls would be a better way to go. Assuming I do not have control over this library, how does one go about mitigating this problem?
The problem is that the library is forcing your code to be synchronous. This means one thread is being blocked for every download; there's no way around that as long as the library's callbacks are synchronous.
Increase the minthreads count, which I think is treating the symptom and not the actual problem.
If you can't modify the library, then you must use one thread per request, and this becomes a viable workaround. You have to treat the symptom because you can't fix the problem (i.e., the library).
Another way is to have a limited concurrency to control the number of tasks fired. But these are tasks spun by a webserver for incoming httprequests and typically we will not have control over it (or will we?)
No; the tasks causing problems are the ones you're spinning up yourself using Task.Run. The tasks on the server are completely independent; your code can't influence or even detect them.
If you want higher concurrency without waiting for thread injection, then you'll need to increase min threads, and you'll also probably need to increase ServicePointManager.DefaultConnectionLimit. You can then continue to use Task.Run, or (as I would prefer) Parallel or Parallel LINQ to do parallel processing. One nice aspect of Parallel / Parallel LINQ is that it has built-in support for throttling, if that is also desired.

Running a long-running parallel task in the background, while allowing small async tasks to update the foreground

I have around 10 000 000 tasks that each takes from 1-10 seconds to complete. I am running those tasks on a powerful server, using 50 different threads, where each thread picks the first not-done task, runs it, and repeats.
Pseudo-code:
for i = 0 to 50:
run a new thread:
while True:
task = first available task
if no available tasks: exit thread
run task
Using this code, I can run all the tasks in parallell on any given number of threads.
In reality, the code uses C#'s Task.WhenAll, and looks like this:
ServicePointManager.DefaultConnectionLimit = threadCount; //Allow more HTTP request simultaneously
var currentIndex = -1;
var threads = new List<Task>(); //List of threads
for (int i = 0; i < threadCount; i++) //Generate the threads
{
var wc = CreateWebClient();
threads.Add(Task.Run(() =>
{
while (true) //Each thread should loop, picking the first available task, and executing it.
{
var index = Interlocked.Increment(ref currentIndex);
if (index >= tasks.Count) break;
var task = tasks[index];
RunTask(conn, wc, task, port);
}
}));
}
await Task.WhenAll(threads);
This works just as I wanted it to, but I have a problem: since this code takes a lot of time to run, I want the user to see some progress. The progress is displayed in a colored bitmap (representing a matrix), and also takes some time to generate (a few seconds).
Therefore, I want to generate this visualization on a background thread. But this other background thread is never executed. My suspicion is that it is using the same thread pool as the parallel code, and is therefore enqueued, and will not be executed before the parallel code is actually finished. (And that's a bit too late.)
Here's an example of how I generate the progress visualization:
private async void Refresh_Button_Clicked(object sender, RoutedEventArgs e)
{
var bitmap = await Task.Run(() => // <<< This task is never executed!
{
//bla, bla, various database calls, and generating a relatively large bitmap
});
//Convert the bitmap into a WPF image, and update the GUI
VisualizationImage = BitmapToImageSource(bitmap);
}
So, how could I best solve this problem? I could create a list of Tasks, where each Task represents one of my tasks, and run them with Parallel.Invoke, and pick another Thread pool (I think). But then I have to generate 10 million Task objects, instead of just 50 Task objects, running through my array of stuff to do. That sounds like it uses much more RAM than necessary. Any clever solutions to this?
EDIT:
As Panagiotis Kanavos suggested in one of his comments, I tried replacing some of my loop logic with ActionBlock, like this:
// Create an ActionBlock<int> that performs some work.
var workerBlock = new ActionBlock<ZoneTask>(
t =>
{
var wc = CreateWebClient(); //This probably generates some unnecessary overhead, but that's a problem I can solve later.
RunTask(conn, wc, t, port);
},
// Specify a maximum degree of parallelism.
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = threadCount
});
foreach (var t in tasks) //Note: the objects in the tasks array are not Task objects
workerBlock.Post(t);
workerBlock.Complete();
await workerBlock.Completion;
Note: RunTask just executes a web request using the WebClient, and parses the results. It's nothing in there that can create a dead lock.
This seems to work as the old parallelism code, except that it needs a minute or two to do the initial foreach loop to post the tasks. Is this delay really worth it?
Nevertheless, my progress task still seems to be blocked. Ignoring the Progress< T > suggestion for now, since this reduced code still suffers the same problem:
private async void Refresh_Button_Clicked(object sender, RoutedEventArgs e)
{
Debug.WriteLine("This happens");
var bitmap = await Task.Run(() =>
{
Debug.WriteLine("This does not!");
//Still doing some work here, so it's not optimized away.
};
VisualizationImage = BitmapToImageSource(bitmap);
}
So it still looks like new tasks are not executed as long as the parallell task is running. I even reduced the "MaxDegreeOfParallelism" from 50 to 5 (on a 24 core server) to see if Peter Ritchie's suggestion was right, but no change. Any other suggestions?
ANOTHER EDIT:
The issue seems to have been that I overloaded the thread pool with all my simultaneous blocking I/O calls. I replaced WebClient with HttpClient and its async-functions, and now everything seems to be working nicely.
Thanks to everyone for the great suggestions! Even though not all of them directly solved the problem, I'm sure they all improved my code. :)
.NET already provides a mechanism to report progress with the IProgress< T> and the Progress< T> implementation.
The IProgress interface allows clients to publish messages with the Report(T) class without having to worry about threading. The implementation ensures that the messages are processed in the appropriate thread, eg the UI thread. By using the simple IProgress< T> interface the background methods are decoupled from whoever processes the messages.
You can find more information in the Async in 4.5: Enabling Progress and Cancellation in Async APIs article. The cancellation and progress APIs aren't specific to the TPL. They can be used to simplify cancellation and reporting even for raw threads.
Progress< T> processes messages on the thread on which it was created. This can be done either by passing a processing delegate when the class is instantiated, or by subscribing to an event. Copying from the article:
private async void Start_Button_Click(object sender, RoutedEventArgs e)
{
//construct Progress<T>, passing ReportProgress as the Action<T>
var progressIndicator = new Progress<int>(ReportProgress);
//call async method
int uploads=await UploadPicturesAsync(GenerateTestImages(), progressIndicator);
}
where ReportProgress is a method that accepts a parameter of int. It could also accept a complex class that reported work done, messages etc.
The asynchronous method only has to use IProgress.Report, eg:
async Task<int> UploadPicturesAsync(List<Image> imageList, IProgress<int> progress)
{
int totalCount = imageList.Count;
int processCount = await Task.Run<int>(() =>
{
int tempCount = 0;
foreach (var image in imageList)
{
//await the processing and uploading logic here
int processed = await UploadAndProcessAsync(image);
if (progress != null)
{
progress.Report((tempCount * 100 / totalCount));
}
tempCount++;
}
return tempCount;
});
return processCount;
}
This decouples the background method from whoever receives and processes the progress messages.

Is async/await syntax really better than manual threading for a programmer or sometimes it is not so useful?

I often see in the Internet that async\await is a "genius innovation" in programming. Sometimes it is, but in some case as I feel it can't shorten a code needed to be written.
If I need 3 parallel tasks (downloads) and I want to do something with results of each downloads (where processing (output SUM to a console) is dependent from each downloads and I need all results simultaneously, as shown at #1), I can do it using Thread and without using async/await as here (some pseudocode because GetByteArray doesn't exist):
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static int length = 0;
static HttpClient client =
new HttpClient() { MaxResponseContentBufferSize = 1000000 };
static void Main(string[] args)
{
CreateMultipleTasksAsync();
Console.ReadKey();
}
static void CreateMultipleTasksAsync()
{
Thread th1 = new Thread(ProcessURLAsync);
th1.Start("http://msdn.microsoft.com");
Thread th2 = new Thread(ProcessURLAsync);
th2.Start("http://msdn.microsoft.com/en-us/library/hh156528(VS.110).aspx");
Thread th3 = new Thread(ProcessURLAsync);
th3.Start("http://msdn.microsoft.com/en-us/library/67w7t67f.aspx");
//#2 there I need results of all 3 downloads (so it will wait for all 3 downloads being completed)
Console.WriteLine("\r\n\r\nTotal bytes returned: {0}\r\n", Program.length);
}
static void ProcessURLAsync(object urlObj)
{
string url = (string)urlObj;
var length = client.GetByteArray(url).Length;
/* //#1 there I need only a result of one current download (so it will wait only for current download being completed)
Console.WriteLine("\n{0,-58} {1}", url, length);*/
Program.length =+ length;
}
}
}
or can do it this way using async/await:
using System;
using System.Net.Http;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
CreateMultipleTasksAsync();
Console.ReadKey();
}
static async Task CreateMultipleTasksAsync()
{
HttpClient client =
new HttpClient() { MaxResponseContentBufferSize = 1000000 };
Task<byte[]> download1 =
client.GetByteArrayAsync("http://msdn.microsoft.com");
Task<byte[]> download2 =
client.GetByteArrayAsync("http://msdn.microsoft.com/en-us/library/hh156528(VS.110).aspx");
Task<byte[]> download3 =
client.GetByteArrayAsync("http://msdn.microsoft.com/en-us/library/67w7t67f.aspx");
int length1 = (await download1).Length;
int length2 = (await download2).Length;
int length3 = (await download3).Length;
//#1 there I need results of all 3 downloads (so it will wait for all 3 downloads being completed)
Console.WriteLine("\r\n\r\nTotal bytes returned: {0}\r\n", length1 + length2 + length3);
}
}
}
and the async/await-way is really more short and better because all my code is compact and all is in one method CreateMultipleTasksAsync and I shouldn't create additional methods and delegate it.
But if I want to do something with a result of a single download (independently from other downloads, as shown at #2), I need take out a piece of a code into a separate method with only 1 await modifier in this method.
I need additional method because I cant't write code this way:
...
Task<int> download1 =
ProcessURLAsync("http://msdn.microsoft.com", client);
Task<int> download2 =
ProcessURLAsync("http://msdn.microsoft.com/en-us/library/hh156528(VS.110).aspx", client);
Task<int> download3 =
ProcessURLAsync("http://msdn.microsoft.com/en-us/library/67w7t67f.aspx", client);
int length1 = (await download1).Length;
//#3
Console.WriteLine("\n{0,-58} {1}", "http://msdn.microsoft.com", length1);
int length2 = (await download2).Length;
//#4
Console.WriteLine("\n{0,-58} {1}", "http://msdn.microsoft.com/en-us/library/hh156528(VS.110).aspx", length2);
int length3 = (await download3).Length;
//#5
...
because in this code a result of download2 is always processed (output to console at #4) after processing a result of download1 (#3) and a result of download3 is always processed (output to console at #5) after processing both results of download1 (#3) and download2 (#4) (processing of different results becomes dependent from processing of other results). And an output order in this case is always the same (even if download3 has been completed much earlier then download1, anyway #5 will be displayed after #3).
But if I want #5 displayed before #3 when download3 has been completed earlier then download1, I'm forced to create an additional method ProcessURLAsync:
using System;
using System.Net.Http;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
CreateMultipleTasksAsync();
Console.ReadKey();
}
static async Task CreateMultipleTasksAsync()
{
HttpClient client =
new HttpClient() { MaxResponseContentBufferSize = 1000000 };
Task<int> download1 =
ProcessURLAsync("http://msdn.microsoft.com", client);
Task<int> download2 =
ProcessURLAsync("http://msdn.microsoft.com/en-us/library/hh156528(VS.110).aspx", client);
Task<int> download3 =
ProcessURLAsync("http://msdn.microsoft.com/en-us/library/67w7t67f.aspx", client);
int length1 = await download1;
int length2 = await download2;
int length3 = await download3;
//#1 there I need results of all 3 downloads (so it will wait for all 3 downloads being completed)
Console.WriteLine("\r\n\r\nTotal bytes returned: {0}\r\n", length1 + length2 + length3);
}
static async Task<int> ProcessURLAsync(string url, HttpClient client)
{
var length = (await client.GetByteArrayAsync(url)).Length;
//#2 there I need only a result of one current download, independently from other downloads (so it will wait only for current download being completed)
Console.WriteLine("\n{0,-58} {1}", url, length);
return length;
}
}
}
and in this case it isn't so obvious why the async/await-way is better because I need create additional method ProcessURLAsync to process a result of a single download, so my code is so much like a first example's code without async/await.
If we ignore a fact that async/await-way is obviously better than Thread-way by 3 reasons - 1) you should pass url to a delegate as object and then cast it back to string, 2) you can use only local variables to store lengths (length1, length2, length3) without a need to create static property Program.length (because you can't return a value from a delegate), 3) you can't pass more than 1 argument to a delegate so you need to make 'HttpClient client' static instead of local - does it mean that the async/await-way is really better only in situations when: 1) you need only 1 parallel task ; 2) or you need multiple parallel tasks but there is no need to process them separately from each other (only there is a need to process them altogether)?
If none of parallel tasks is dependent from each other and I should process them separately, is there any advantage of syntax async/await?
Your final code really is not comparable to threading. Using a single-threaded SynchronizationContext (default for UI apps) the ProcessURLAsync calls execute overlapped, but not in parallel. The only interruption points are uses of the await keyword.
This means that they can safely access the UI and update shared data structures without needing additional synchronization. This can result in a great reduction of code length and complexity compared to explicit threading.
(Note: As mentioned in the comments, this line that appears in your threaded false equivalent accesses a shared variable without synchronization and therefore suffers a race condition: Program.length =+ length; This is in addition to the total failure to wait for the threads to finish before printing the result. You may have additional problems if the HttpClient client object is not threadsafe.)
There are a few things here.
First, the way async works in a console application can be decidedly odd due to differences between the synchronization contexts in UIs vs. console applications, see this article.
Secondly, async isn't necessarily the same thing as multithreading. If you do something like Task.Run(...) that'll definitely run on the thread pool. However, a "standard" async operation isn't the same.
My standard illustration of this is as follows: suppose you go a restaurant with 10 people. When the waiter comes by, the first person he asks for his order isn't ready; however, the other 9 people are. Thus, the waiter asks the other 9 people for their orders and then comes back to the original guy. (It's definitely not the case that they'll get a second waiter to wait for the original guy to be ready to order and doing so probably wouldn't save much time anyway). That's how async/await typically works (the exception being that some of the Task Parallel library calls, like Thread.Run(...), actually are executing on other threads - in our illustration, bringing in a second waiter - so make sure you check the documentation for which is which).
Note that, if you're not using a synchronization context in your console application, there are much fewer guarantees as to which thread async methods will actually run on, so it might not behave exactly how you'd expect in this case.
Really, which one you end up using depends on whether your task is CPU-bound. If it's not a CPU-bound operation (i.e. it's mostly just waiting around for a result from a server, external hardware, etc.) the performance difference of using a thread vs. async probably won't be too significant - you can do the equivalent of telling the waiter to "come back to me." For CPU-bound operations, however, you probably want to put it on a separate thread.
Also, in the code sample you posted there doesn't seem to be a reason to await the result of the operations in every case; if the caller doesn't have an immediate need for the result you don't have to await it. In fact, "kicking off" the process without waiting for it can be a big performance boost. The caveat, of course, is to make sure that all of your tasks finish before you close the console application.
Hopefully that clears things up a little; if not please let me know and I can edit my answer.

await Task.Delay(foo); takes seconds instead of ms

Using a variable delay in Task.Delay randomly takes seconds instead of milliseconds when combined with a IO-like operation.
Code to reproduce:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApplication {
class Program {
static void Main(string[] args) {
Task[] wait = {
new delayTest().looper(5250, 20),
new delayTest().looper(3500, 30),
new delayTest().looper(2625, 40),
new delayTest().looper(2100, 50)
};
Task.WaitAll(wait);
Console.WriteLine("All Done");
Console.ReadLine();
}
}
class delayTest {
private Stopwatch sw = new Stopwatch();
public delayTest() {
sw.Start();
}
public async Task looper(int count, int delay) {
var start = sw.Elapsed;
Console.WriteLine("Start ({0}, {1})", count, delay);
for (int i = 0; i < count; i++) {
var before = sw.Elapsed;
var totalDelay = TimeSpan.FromMilliseconds(i * delay) + start;
double wait = (totalDelay - sw.Elapsed).TotalMilliseconds;
if (wait > 0) {
await Task.Delay((int)wait);
SpinWait.SpinUntil(() => false, 1);
}
var finalDelay = (sw.Elapsed - before).TotalMilliseconds;
if (finalDelay > 30 + delay) {
Console.WriteLine("Slow ({0}, {1}): {4} Expected {2:0.0}ms got {3:0.0}ms", count, delay, wait, finalDelay, i);
}
}
Console.WriteLine("Done ({0}, {1})", count, delay);
}
}
}
Also reported this on connect.
Leaving old question bellow, for completeness.
I am running a task that reads from a network stream, then delays for 20ms, and reads again (doing 500 reads, this should take around 10 seconds). This works well when I only read with 1 task, but strange things happen when I have multiple tasks running, some with long (60 seconds) delay. My ms-delay tasks suddenly hang half way.
I am running the following code (simplified):
var sw = Stopwatch();
sw.Start()
await Task.Delay(20); // actually delay is 10, 20, 30 or 40;
if (sw.Elapsed.TotalSeconds > 1) {
Console.WriteLine("Sleep: {0:0.00}s", sw.Elapsed.TotalSeconds);
}
This prints:
Sleep: 11.87s
(Actually it gives the 20ms delay 99% of the time, those are ignored).
This delay is almost 600 times longer than expected. The same delay happens on 3 separate threads at the same time, and they all continue again at the same time also.
The 60 second sleeping task wakes up as normal ~40 seconds after the short tasks finish.
Half the time this problem does not even happen. The other half, it has a consistent delay of 11.5-12 seconds. I would suspect a scheduling or thread-pool problem, but all threads should be free.
When I pause my program during the stuck phase, the main thread stacktrace stands on Task.WaitAll, 3 tasks are Scheduled on await Task.Delay(20) and one task is Scheduled on await Task.Delay(60000). Also there are 4 more tasks Awaiting those first 4 tasks, reporting things like '"Task 24" is waiting on this object: "Task 5313" (Owned by thread 0)'. All 4 tasks say the waiting task is owned by thread 0. There are also 4 ContinueWith tasks that I think I can ignore.
There are some other things going on, like a second console application that writes to the network stream, but one console application should not affect the other.
I am completely clueless on this one. What is going on?
Update:
Based on comments and questions:
When I run my program 4 times, 2-3 times it will hang for 10-15 seconds, 1-2 times it will operate as normal (and wont print "Sleep: {0:0.00}s".)
Thread.Count indeed goes up, but this happens regardless of the hang. I just had a run where it did not hang, and Thread.Count started at 24, wend up to 40 after 1 second, around 22 seconds the short tasks finished normal, and then Thread.Count wend down to 22 slowly over the next 40 seconds.
Some more code, full code is found in the link below. Starting clients:
List<Task> tasks = new List<Task>();
private void makeClient(int delay, int startDelay) {
Task task = new ClientConnection(this, delay, startDelay).connectAsync();
task.ContinueWith(_ => {
lock (tasks) { tasks.Remove(task); }
});
lock (tasks) { tasks.Add(task); }
}
private void start() {
DateTime start = DateTime.Now;
Console.WriteLine("Starting clients...");
int[] iList = new[] {
0,1,1,2,
10, 20, 30, 40};
foreach (int delay in iList) {
makeClient(delay, 0); ;
}
makeClient(15, 40);
Console.WriteLine("Done making");
tasks.Add(displayThreads());
waitForTasks(tasks);
Console.WriteLine("All done.");
}
private static void waitForTasks(List<Task> tasks) {
Task[] waitFor;
lock (tasks) {
waitFor = tasks.ToArray();
}
Task.WaitAll(waitFor);
}
Also, I tried to replace the Delay(20) with await Task.Run(() => Thread.Sleep(20))
Thread.Count now goes from 29 to 43 and back down to 24, however among multiple runes it never hangs.
With or without ThreadPool.SetMinThreads(500, 500), using TaskExt.Delay by noserati it does not hang. (That said, even switching over 1 line of code sometimes stops it from hanging, only to randomly continue after I restart the project 4 times, but I've tried this 6 times in a row without any problems now).
I've tried everything above with and without ThreadPool.SetMinThreads so far, never made any difference.
Update2: CODE!
Without seeing more code, it's hard to make futher guesses, but I'd like to summarize the comments, it may help someone else in the future:
We've figured out that the ThreadPool stuttering is not an issues here, as ThreadPool.SetMinThreads(500, 500) didn't help.
Is there any SynchronizationContext in place anywhere in your task workflow? Place Debug.Assert(SyncrhonizationContext.Current == null) everywhere to check for that. Use ConfigureAwait(false) with every await.
Is there any .Wait, .WaitOne, .WaitAll, WaitAny, .Result used anywhere in your code? Any lock () { ... } constructs? Monitor.Enter/Exit or any other blocking synchronization primitives?
Regarding this: I've already replaced Task.Delay(20) with Task.Yield(); Thread.Sleep(20) as a workaround, that works. But yeah, I continue to try to figure out what's going on here because the idea that Task.Delay(20) can shoot this far out of line makes it totally unusable.
This sounds worrying, indeed. It's very unlikely there's a bug in Task.Delay, but everything is possible. For the sake of experimenting, try replacing await Task.Delay(20) with await Task.Run(() => Thread.Sleep(20)), having ThreadPool.SetMinThreads(500, 500) still in-place.
I also have an experimental implementation of Delay which uses unamanaged CreateTimerQueueTimer API (unlike Task.Delay, which uses System.Threading.Timer, which in turn uses managed TimerQueue). It's available here as a gist. Feel free to try it as TaskExt.Delay instead of the standard Task.Delay. The timer callbacks are posted to ThreadPool, so ThreadPool.SetMinThreads(500, 500) still should be used for this experiment. I doubt it could make any difference, but I'd be interested to know.

Categories

Resources