I am experimenting with async and parallel programming. I have a list of IP addresses and want to resolve them via DNS. I have written this function for that:
private static Task<string> ResolveAsync(string ipAddress)
{
return Task.Run(() => Dns.GetHostEntry(ipAddress).HostName);
}
Now, in the program I am resolving addresses like this, the idea is to use parallel programming:
//getting orderedClientIps before
var taskArray = new List<Task>();
foreach (var orderedClientIp in orderedClientIps)
{
var task = new Task(async () =>
{
orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
});
taskArray.Add(task);
task.Start();
}
Task.WaitAll(taskArray.ToArray());
foreach (var orderedClientIp in orderedClientIps)
{
Console.WriteLine($"{orderedClientIp.Address} ({orderedClientIp.Ip}) - {orderedClientIp.Count}");
}
So, here we wait for all the addresses to resolve, and then in a separate iteration print them.
What interests me, what would be the difference if instead of printing in separate iteration, I would do something like this:
foreach (var orderedClientIp in orderedClientIps)
{
var task = new Task(async () =>
{
orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
Console.WriteLine($"{orderedClientIp.Address} ({orderedClientIp.Ip}) - {orderedClientIp.Count}");
});
taskArray.Add(task);
task.Start();
}
Task.WaitAll(taskArray.ToArray());
I have tried executing, and it writes to console one by one, whereas in the first instance writes them all out after waiting them.
I think that the first approach is parallel and better, but I am not quite sure of the differences. What is different about the second approach in the context of async and parallel programming? And does the second approach somehow violate the Task.WaitAll() line?
The difference in the output behaviour you see is simply related to the point in time where you write the output.
Second approach: "and it writes to console one by one"
That's because the code that writes the output is called as soon as each task is "done". That happens at different points in time, and thus you see the results being output "one by one".
First approach: "in the first instance writes them all out after waiting them."
That's because you do just that in your code. Wait until all is done and then output sequentially what you have found.
The output behaviour alone cannot tell you which version is better at running things in parallel.
In fact, for all practical purposes they are identical. The overhead of calling Console.WriteLine inside the task should be negligible compared to the DNS lookup itself.
It would be different for compute-intensive work, but then you should probably be using Parallel.ForEach anyway.
So where should you output then? It depends. If you need to show the information (here the DNS-lookup result) as soon as possible, then do it from inside the Task. If it can wait until all is done (which might take some time), then do it at the end.
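One caveat applies to both versions in the question: the Task constructor only accepts an Action, so new Task(async () => ...) wraps an async void lambda, and the outer task is marked complete as soon as the lambda reaches its first await. Task.WaitAll can therefore return before the lookups have finished. Below is a minimal sketch of one alternative. It assumes a hypothetical ClientIp class with the Ip, Address and Count members used in the question, and it takes the resolver as a delegate (an assumption of this sketch) so that ResolveAsync or a stub can be passed in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the element type of orderedClientIps.
public class ClientIp
{
    public string Ip { get; set; }
    public string Address { get; set; }
    public int Count { get; set; }
}

public static class Resolver
{
    // 'resolve' is an assumption of this sketch: pass ResolveAsync from the
    // question in real code, or a stub when testing.
    public static Task ResolveAllAsync(IEnumerable<ClientIp> clients, Func<string, Task<string>> resolve)
    {
        // Invoking the async lambda starts each lookup and yields a Task that
        // completes only after Address has been assigned.
        var tasks = clients.Select(async c =>
        {
            c.Address = await resolve(c.Ip);
        }).ToList();

        // Completes when every lookup has completed.
        return Task.WhenAll(tasks);
    }
}
```

From an async method this would be used as await Resolver.ResolveAllAsync(orderedClientIps, ResolveAsync); from a synchronous Main, append .Wait() instead of awaiting.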
The write to the console is not asynchronous, because the console is not async by default. With the Console call you "synchronize" your tasks. Maybe:
var task = Task.Run(async () =>
{
    orderedClientIp.Address = await ResolveAsync(orderedClientIp.Ip);
    return $"{orderedClientIp.Address} ({orderedClientIp.Ip}) - {orderedClientIp.Count}";
}).ContinueWith(previousTask => Console.WriteLine(previousTask.Result));
I am trying to understand the best way of dealing with the following problem, and also how to overcome a TaskCanceledException in a case where cancelling a task seems a genuine use case.
The scenario is: I have a huge list of names that I want to match against 3 different regexes. If we find a match with any one of the regexes, we add the name to a ConcurrentBag and stop matching it against the others. So my basic algorithm for this is:
Iterate through all names in parallel
Start three different tasks with Regex.IsMatch
Each of these three tasks have CancellationToken
As soon as there is a match, add the item into Bag and Cancel other tasks.
Wait for all tasks to finish.
My problem is in the last step: when we wait for the tasks to finish, WaitAll throws a TaskCanceledException. Can you please tell me what is wrong with my approach, and what would be a better way of dealing with such a scenario?
Also, I have to check for task != null, and I don't understand why and when some of the tasks are being set to null.
Parallel.ForEach(accounts, p =>
{
    var can1 = new CancellationTokenSource();
    var can2 = new CancellationTokenSource();
    var can3 = new CancellationTokenSource();
    tasks.Add(Task.Run(() =>
    {
        if (reg1.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can2.Cancel();
            can3.Cancel();
        }
    }, can1.Token));
    tasks.Add(Task.Run(() =>
    {
        if (reg2.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can1.Cancel();
            can3.Cancel();
        }
    }, can2.Token));
    tasks.Add(Task.Run(() =>
    {
        if (reg3.IsMatch(p.DisplayName))
        {
            bag.Add(p);
            can1.Cancel();
            can2.Cancel();
        }
    }, can3.Token));
});
await Task.WhenAll(tasks.Where(t => t != null).ToArray());
Unless the regex is really long running there is no need to run the 3 regexes in parallel since your name list is already processed in parallel. So the best solution is to just delete the regex task code.
Simply call the regexes one after the other in sequential code and stop when a match was found. This is fastest and easiest at the same time.
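A minimal sketch of that sequential variant follows. It simplifies the question's setup: names are plain strings rather than account objects with a DisplayName, and the NameMatcher class and MatchAny method are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class NameMatcher
{
    // Returns every name matched by at least one pattern. Names are processed in
    // parallel; the patterns for a single name are tried one after the other.
    public static ConcurrentBag<string> MatchAny(IEnumerable<string> names, params Regex[] patterns)
    {
        var bag = new ConcurrentBag<string>();
        Parallel.ForEach(names, name =>
        {
            // Any() stops at the first pattern that matches, so the remaining
            // patterns are never evaluated for this name.
            if (patterns.Any(p => p.IsMatch(name)))
            {
                bag.Add(name);
            }
        });
        return bag;
    }
}
```

No tasks, tokens or null checks are needed, and each name is added at most once.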
If you want to keep this then note that the TPL cannot cancel your task code for you. The token that you pass is only checked once before running your task code. .NET code is not interruptible. So cancellation cannot possibly work here.
WhenAll throwing is indeed a problem that must be mitigated. You can say:
Task.WhenAll(myRegexTasks).ContinueWith(_ => { }).Wait();
This creates a proxy task whose sole purpose it is to not throw even if the base task throws. Unfortunately, the .NET Framework still has no clean way to wait for a Task to complete but not throw.
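To illustrate both points, here is a hedged sketch in which the task body polls the token itself (cooperative cancellation) and the caller waits through a ContinueWith proxy. The class and method names are hypothetical, and Thread.Sleep stands in for a slice of real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CooperativeCancellation
{
    // Hypothetical long-running job that checks the token between slices of work.
    public static Task RunUntilCancelled(CancellationToken token)
    {
        return Task.Run(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                // The running code has to poll the token; nothing interrupts it from outside.
                token.ThrowIfCancellationRequested();
                Thread.Sleep(10); // stands in for one slice of real work
            }
        }, token);
    }

    // Waits for 'task' to finish without rethrowing its exception or cancellation.
    public static TaskStatus WaitQuietly(Task task)
    {
        // The empty continuation completes normally whatever happened to 'task'.
        task.ContinueWith(_ => { }).Wait();
        return task.Status;
    }
}
```

After cancellation is requested, WaitQuietly returns and the task's Status can be inspected instead of catching an exception.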
I'll try to illustrate my question by the pointless example you see below.
using System;
using System.IO;
namespace PointlessProgram
{
    class PointlessClass
    {
        public static void WriteFooToTextFile ( )
        {
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\Users\Hillary\Documents\PointlessFolder\Foo.txt"))
            {
                file.Write("Foo");
            }
        }
        public static void WriteBarToTextFile ( )
        {
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\Users\Hillary\Documents\PointlessFolder\Bar.txt"))
            {
                file.Write("Bar");
            }
        }
        static void Main()
        {
            WriteFooToTextFile(); // (A)
            WriteBarToTextFile(); // (B)
        }
    }
}
Here, (B) does not need to be run after (A) because it does not depend on any output produced by (A), and neither does (A) depend on (B). Let's suppose my computer has 2 processors and will devote all its processing power to running this program. Will the compiler figure out that (A) and (B) can be run in parallel, or will it run them one after the other unless I explicitly write my code to tell the machine not to wait for (A) to finish before beginning to execute (B)? And if so, is async-await how I change this execution from
===
A
---
B
===
to
========
A | B
========
???
What I find confusing about async-await is that you don't think in terms of partitioning your program into independent tasks A, B, C, ...; you instead think in terms of "this task" and "everything else", and all you're doing is saying "everything else can keep running while this task runs".
Please help me understand.
Will the compiler figure out that (A) and (B) can be run in parallel?
No. The compiler doesn't automatically parallelize anything for you.
And if so, is async-await how I change this execution?
No as well. async-await doesn't really have anything to do with it.
Your code can be sequential whether it's synchronous or not. And it can be parallel whether it's asynchronous or not.
For example, synchronous and parallel:
var task1 = Task.Run(() => WriteFooToTextFile());
var task2 = Task.Run(() => WriteBarToTextFile());
Task.WaitAll(task1, task2);
And asynchronous and sequential:
await WriteFooToTextFileAsync();
await WriteBarToTextFileAsync();
By making your operations truly asynchronous you free up threads and allow them to work on other parts of your application. But if there aren't other parts they sit and wait in the ThreadPool and you don't gain much.
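The question does not define WriteFooToTextFileAsync. Here is a minimal sketch of what it might look like, together with the fourth combination (asynchronous and concurrent). The WriteTextAsync helper and the path parameters are hypothetical names introduced here.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncWrites
{
    // Hypothetical helper: writes 'text' to 'path' using StreamWriter.WriteAsync.
    public static async Task WriteTextAsync(string path, string text)
    {
        using (var writer = new StreamWriter(path))
        {
            await writer.WriteAsync(text);
        }
    }

    // Asynchronous and sequential: the second write starts after the first completes.
    public static async Task WriteSequentiallyAsync(string fooPath, string barPath)
    {
        await WriteTextAsync(fooPath, "Foo");
        await WriteTextAsync(barPath, "Bar");
    }

    // Asynchronous and concurrent: both writes are started before either is awaited.
    public static async Task WriteConcurrentlyAsync(string fooPath, string barPath)
    {
        Task foo = WriteTextAsync(fooPath, "Foo");
        Task bar = WriteTextAsync(barPath, "Bar");
        await Task.WhenAll(foo, bar);
    }
}
```

The only difference between the last two methods is when the tasks are awaited, not whether the operations themselves are asynchronous.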
The above will run sequentially. Imperative languages expect the developer to state the rules of execution explicitly. Most languages were originally designed to run on single-core systems and follow the von Neumann architecture:
https://en.wikipedia.org/wiki/Von_Neumann_architecture
As a result, you're required to state explicitly what can safely be run in parallel. The compiler should be treated as if it's dumb.
As for ways to run the two calls in parallel, you could use something like Parallel.Invoke. For the code above I'd probably just use Parallel.Invoke or Task.Run.
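A minimal Parallel.Invoke sketch; the InvokeExample wrapper is a hypothetical name used only to keep the example self-contained.

```csharp
using System;
using System.Threading.Tasks;

public static class InvokeExample
{
    // Runs both actions, possibly in parallel, and returns only when both have finished.
    // Exceptions thrown by the actions are surfaced as an AggregateException.
    public static void RunBoth(Action first, Action second)
    {
        Parallel.Invoke(first, second);
    }
}
```

With the methods from the question this would be InvokeExample.RunBoth(WriteFooToTextFile, WriteBarToTextFile), or simply Parallel.Invoke(WriteFooToTextFile, WriteBarToTextFile) inside Main.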
Your code is synchronous, so A is run then B is run. To run them at the same time you should use the Task.Run() command to execute them in a separate task, then await them completing.
So:
public static async Task Write()
{
Task t1 = Task.Run(() => WriteFooToTextFile());
Task t2 = Task.Run(() => WriteBarToTextFile());
await Task.WhenAll(t1, t2);
}
with the usage of:
await PointlessClass.Write();
Note the signature of the write method is now async Task. You will have to make your event handler (or whatever triggers this code), also async.
As i3arnon mentions below, you don't have to make your code asynchronous to run operations in parallel. It's just generally done to stop the UI thread from blocking in WPF and WinForms applications, which is why you see the two going hand in hand in a lot of examples.
If you are using console, try:
static void Main()
{
MainAsync().Wait();
}
static async Task MainAsync()
{
Task t1 = Task.Run(() => WriteFooToTextFile());
Task t2 = Task.Run(() => WriteBarToTextFile());
await Task.WhenAll(t1, t2);
}
Refer to: Async Console App for more information.
As your program is presented, "A" will run first, then "B" will run after "A" concludes. You could implement the async/await model; however, since we don't really care about the return values, a safer and more concise way would simply be to start each method call on a thread-pool thread via Task.Run. We can start off with:
static void Main()
{
Task.Run(() => WriteFooToTextFile()); // (A)
Task.Run(() => WriteBarToTextFile()); // (B)
}
(To utilize the Task object, we will have to add a using System.Threading.Tasks; directive.)
A further argument for the task model is that async/await on its own does not create threads: it frees the current thread while an operation is pending rather than spreading CPU work across cores. So, to take best advantage of both CPUs, the two calls should execute on separate threads.
Of course now with multi-threading, we must be cognizant of a potential race condition. Since no dependence exists between each task in your example, there should be no foreseeable implications for one task to complete first versus the other. However, we do not want the program to terminate until both threads have completed; additionally, we don't want exceptions from one process interfering with the other, but we'd ideally want to handle all exceptions at once. Therefore we might implement the calls in the following manner:
static void Main()
{
try
{
Task.WaitAll(new Task[] {
Task.Run(() => WriteFooToTextFile()), // (A)
Task.Run(() => WriteBarToTextFile()) // (B)
});
}
catch (AggregateException e)
{
// Handle exception, display failure message, etc.
}
}
I have the following asynchronous method in a WPF project:
private async void RecalculateRun(Guid run_number)
{
// kick off the Full recalculation
//
await FullRecalcAsync(run_number);
// When that's done, asynchronously kick off a refresh
//
Task RefreshTask = new Task(() => RefreshResults());
await RefreshTask;
}
The first await does a load of calculations and the second takes the results and updates some bound variables. I wasn't expecting the UI to update during the second await, but I was expecting it to do so when it finished. Of course, this doesn't happen. Also, I'd just like to point out that if I call RefreshResults synchronously after the first await, it works fine.
You're creating a task without starting it. You need to call Start:
Task RefreshTask = new Task(() => RefreshResults());
RefreshTask.Start();
await RefreshTask;
Or, better, use Task.Run:
await Task.Run(() => RefreshResults());
Using the Task constructor directly is usually discouraged:
In general, I always recommend using Task.Factory.StartNew unless the particular situation provides a compelling reason to use the constructor followed by Start. There are a few reasons I recommend this. For one, it's generally more efficient
From "Task.Factory.StartNew" vs "new Task(...).Start"
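The difference can be observed through Task.Status. The small sketch below (TaskStartDemo is a hypothetical name) shows that a task built with the constructor stays in the Created state until Start is called, which is why awaiting it never completes.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskStartDemo
{
    // Status of a task built with the constructor and never started.
    public static TaskStatus StatusOfUnstartedTask()
    {
        var task = new Task(() => { });
        return task.Status; // TaskStatus.Created: nothing has been scheduled
    }

    // Status of a task created by Task.Run after it has been waited on.
    public static TaskStatus StatusAfterTaskRun()
    {
        Task task = Task.Run(() => { });
        task.Wait();
        return task.Status; // TaskStatus.RanToCompletion
    }
}
```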
I have method like that:
public async Task<IEnumerable<Model>> Get([FromUri]IList<string> links)
{
IList<Model> list = new List<Model>();
foreach (var link in links)
{
MyRequestAsync request = new MyRequestAsync(link);
list.Add(await request.GetResult());
}
return list;
}
But I am wondering whether it is really async, because I think the parts list.Add(await request.GetResult()); and return list; break the async nature of the method.
Please correct me if I am wrong, and if I am right, how can I fix that?
UPDATED: As I understand it, I have to do something like what is shown in "C# 5.0 async await return a list": return await Task.Run(() => new List<string>() {"a", "b"}); but I am not sure how to apply that to my case.
Your method is async, but may not be making the best use of resources.
What your method will do is enter the foreach loop, create the MyRequestAsync object, and then (at the await point) it will give up its thread until the result becomes available. Once the result is available, an appropriate thread will be found and the method will resume running. It will add the result to list and go back to the top of the loop, and repeat this whole process over and over.
But, consider this - if these requests are independent, you could, instead, make each of the requests in parallel, and then only continue running your method once all of the requests are complete. That would be something like:
public async Task<IEnumerable<Model>> Get([FromUri]IList<string> links)
{
IList<Task<Model>> list = new List<Task<Model>>();
foreach (var link in links)
{
MyRequestAsync request = new MyRequestAsync(link);
list.Add(request.GetResult());
}
return new List<Model>(await Task.WhenAll(list));
//Or just
//return await Task.WhenAll(list);
//Since we don't need to return a list
}
And, for silly points, you could re-write the whole method as:
return await Task.WhenAll(from l in links select new MyRequestAsync(l).GetResult());
But that may make it less readable.
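The two shapes can be compared side by side with a stand-in for the request. In this sketch FakeGetResult is a hypothetical replacement for MyRequestAsync.GetResult() that just waits with Task.Delay, and the results are strings rather than Model objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class RequestDemo
{
    // Hypothetical stand-in for MyRequestAsync.GetResult(): waits, then echoes the link.
    public static async Task<string> FakeGetResult(string link, int delayMs)
    {
        await Task.Delay(delayMs);
        return "result:" + link;
    }

    // One request at a time: total time is roughly the sum of the delays.
    public static async Task<List<string>> GetSequentially(IEnumerable<string> links, int delayMs)
    {
        var results = new List<string>();
        foreach (var link in links)
        {
            results.Add(await FakeGetResult(link, delayMs));
        }
        return results;
    }

    // All requests in flight at once: total time is roughly the longest delay.
    public static async Task<List<string>> GetConcurrently(IEnumerable<string> links, int delayMs)
    {
        // WhenAll returns the results in the same order as the input tasks.
        string[] results = await Task.WhenAll(links.Select(l => FakeGetResult(l, delayMs)));
        return results.ToList();
    }
}
```

Both methods return the same list in the same order; only the elapsed time differs.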
In my opinion the I/O is async so the method can be called "really async".
async is meant for I/O to not block the thread when it is waiting for something (here the result), but not when it is "doing something" (here the list.Add).
That's kind of impossible to tell because anything you call could be a blocking operation. If there's a blocking operation hidden somewhere this method will block as well. If you want to make a method non-blocking and/or only use scalable async IO you must review everything that you do and that you call.
That said, your code looks non-blocking because it uses await (instead of, say, Task.Wait). Simplifying things a bit, this method will return at the first await, which is probably what you need.
I'm building this program in Visual Studio 2010 using C# and .NET 4.0.
The goal is to use threads and a queue to improve performance.
I have a list of urls I need to process.
string[] urls = { url1, url2, url3, etc.} //up to 50 urls
I have a function that will take in each url and process them.
public void processUrl(string url) {
//some operation
}
Originally, I created a for-loop to go through each urls.
for (int i = 0; i < urls.Length; i++)
processUrl(urls[i]);
The method works, but the program is slow as it was going through urls one after another.
So the idea is to use threading to reduce the time, but I'm not too sure how to approach that.
Say I want to create 5 threads to process at the same time.
When I start the program, it will start processing the first 5 urls. When one is done, the program start process the 6th url; when another one is done, the program starts processing the 7th url, and so on.
The problem is, I don't know how to actually create a 'queue' of urls and be able to go through the queue and process.
Can anyone help me with this?
-- EDIT at 1:42PM --
I ran into another issue when running 5 threads at the same time.
The processUrl function writes to a log file. If multiple calls time out at the same time, they write to the same log file simultaneously, and I think that's throwing an error.
I'm assuming that's the issue because the error message I got was "The process cannot access the file 'data.log' because it is being used by another process."
The simplest option would be to just use Parallel.ForEach. Provided processUrl is thread safe, you could write:
Parallel.ForEach(urls, processUrl);
I wouldn't suggest restricting to 5 threads (the scheduler will automatically scale normally), but this can be done via:
Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = 5}, processUrl);
That being said, URL processing is, by its nature, typically IO bound, and not CPU bound. If you could use Visual Studio 2012, a better option would be to rework this to use the new async support in the language. This would require changing your method to something more like:
public async Task ProcessUrlAsync(string url)
{
    // Use await with async methods in the implementation...
}
You could then use the new async support in the loop:
// Create an enumerable of Tasks - the async operations start when WhenAll enumerates it
var tasks = urls.Select(url => ProcessUrlAsync(url));
await Task.WhenAll(tasks); // "Await" until they all complete
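Task.WhenAll on its own starts every operation at once. If the limit of 5 from the question matters in the async version, one common approach is to gate the operations with a SemaphoreSlim. The sketch below is an illustration only: Throttled.ForEachAsync is a hypothetical helper, and the process delegate stands in for ProcessUrlAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs 'process' for every url with at most 'maxConcurrency' operations in flight.
    public static async Task ForEachAsync(IEnumerable<string> urls, int maxConcurrency, Func<string, Task> process)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try { await process(url); }
                finally { gate.Release(); }  // free the slot for the next url
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

Usage would be await Throttled.ForEachAsync(urls, 5, ProcessUrlAsync);.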
Use Parallel.ForEach with MaxDegreeOfParallelism set to the number of threads you want (or leave it unset and let .NET do the work for you):
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 5;
Parallel.ForEach(urls, parallelOptions, url =>
{
processUrl(url);
});
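Whichever variant is used, processUrl has to be thread safe. The "being used by another process" error from the question's edit is consistent with several threads opening data.log at the same moment. One common remedy, sketched here with a hypothetical Log class, is to serialize the writes with a lock.

```csharp
using System;
using System.IO;

public static class Log
{
    // Shared lock object: only one thread at a time may hold it.
    private static readonly object Gate = new object();

    // Appends one line to the log file; the lock keeps two threads from
    // opening the file at the same time.
    public static void WriteLine(string path, string message)
    {
        lock (Gate)
        {
            File.AppendAllText(path, message + Environment.NewLine);
        }
    }
}
```

Inside processUrl, every write to data.log would then go through Log.WriteLine.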
If you really want to create the threads yourself instead of using Parallel.ForEach:
Suppose that I want one thread for each URL:
string[] urls = {"url1", "url2", "url3"};
I just start a new Thread instance for each URL (or for each group of 5 URLs):
foreach (var thread in urls.Select(url => new Thread(() => DownloadUrl(url))))
thread.Start();
And the method to download your URL:
private static void DownloadUrl(string url)
{
Console.WriteLine(url);
}
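One thread per URL does not give the "5 at a time, pick up the next one when a slot frees" behaviour the question asks for. A hedged sketch of that pattern using only types available in .NET 4.0 follows; WorkerPool.ProcessAll is a hypothetical helper, and process stands in for processUrl.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class WorkerPool
{
    // Starts 'workerCount' threads that pull urls from a shared queue until it is empty.
    public static void ProcessAll(IEnumerable<string> urls, int workerCount, Action<string> process)
    {
        var queue = new ConcurrentQueue<string>(urls);

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => new Thread(() =>
            {
                string url;
                // TryDequeue is thread safe; each url is handed to exactly one worker.
                while (queue.TryDequeue(out url))
                {
                    process(url);
                }
            }))
            .ToList();

        workers.ForEach(t => t.Start());
        workers.ForEach(t => t.Join()); // block until every worker has drained the queue
    }
}
```

Usage would be WorkerPool.ProcessAll(urls, 5, processUrl);. As soon as one worker finishes a URL it takes the next one from the queue.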