Why can't I debug code in an async method? - c#

I actually started the night trying to learn more about MongoDB, but am getting hung up and the .NET await/async stuff. I am trying to implement the code shown on MongoDB's site. I've had to modify it a tad bit, so I could get my program to compile. I now have the following in my console application:
protected static IMongoClient _client;
protected static IMongoDatabase _database;
static void Main(string[] args)
{
_client = new MongoClient();
_database = _client.GetDatabase("test");
GetDataAsync();
}
private static async void GetDataAsync() //method added by me.
{
int x = await GetData();
}
private static async Task<int> GetData()
{
var collection = _database.GetCollection<BsonDocument>("restaurants");
var filter = new BsonDocument();
var count = 0;
Func<int> task = () => count; //added by me.
var result = new Task<int>(task); //added by me.
using (var cursor = await collection.FindAsync(filter)) //Debugger immediately exits here, goes back to main() and then terminates.
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (var document in batch)
{
// process document
count++;
}
}
}
return count; //added by me
}
When I run the application, the debugger will call into my GetDataAsync() method which in turn calls into the GetData() method. It gets to the line using (var cursor = await collection.FindAsync(filter)) and then immediately returns to finish the main() method.
Any break points I put below that line are ignored, as are any breakpoints I put in the GetDataAsync() method. Is this code just not getting run because the program exits? Can someone explain to me what is going on?

Because you are not awaiting your GetDataAsync method. When the first await is reached the thread is returned to the caller. Since you are not waiting for the completion of the task, your console application exits and your breakpoint is not reached. You will also need to update the GetDataAsync method to return a Task rather than void. You cannot await void. You should avoid using async void for anything other than the event handler.
protected static IMongoClient _client;
protected static IMongoDatabase _database;
static void Main(string[] args)
{
_client = new MongoClient();
_database = _client.GetDatabase("test");
GetDataAsync().Wait();
// Will block the calling thread but you don't have any other solution in a console application
}
private static async Task GetDataAsync() //method added by me.
{
int x = await GetData();
}
private static async Task<int> GetData()
{
var collection = _database.GetCollection<BsonDocument>("restaurants");
var filter = new BsonDocument();
var count = 0;
Func<int> task = () => count; //added by me.
var result = new Task<int>(task); //added by me.
using (var cursor = await collection.FindAsync(filter)) //Debugger immediately exits here, goes back to main() and then terminates.
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (var document in batch)
{
// process document
count++;
}
}
}
return count; //added by me
}

I' not so good with async dev and had a similar problem, however I was kicking off the async method in Main like:
Task.Run(async () => await GetDataAsync());
I think the Garbage Collector was disposing of the anonymous method as nothing had a reference to it anymore. Using Fabien's .Wait() allowed me to step through the program and debug. I am using netcore 2.1 with vs2017

Related

When will awaiting a method exit the program?

I've been reading the following post, specifically this point:
On your application’s entry point method, e.g. Main. When you await an instance that’s not yet completed, execution returns to the caller of the method. In the case of Main, this would return out of Main, effectively ending the program.
I've been trying to do that, but I can't seem to make it exit. Here's the code I have:
class Program
{
// ending main!!!
static async Task Main(string[] args)
{
Task<Task<string>> t = new Task<Task<string>>(async () =>
{
Console.WriteLine("Synchronous");
for (int i = 0; i < 10; i++)
{
Console.WriteLine("Synchronous");
Console.WriteLine("Synchronous I is: " + i);
await Task.Run(() => { Console.WriteLine("Asynchronous"); });
await Task.Run(() => { for (int j = 0; j < 1000; j++) { Console.WriteLine( "Asynchronous, program should have exited by now?"); }});
}
return "ASD";
});
t.Start();
await t;
}
}
What exactly am I missing here? Shouldn't the program exit upon hitting the line await t; or by following through the thread at await Task.Run(() => { Console.WriteLine("Asynchronous"); });?
This is a Console application's main method.
The article you linked is from 2012, which predates the addition of async support to Main(), which is why what it's talking about seems wrong.
For the current implementation, consider the following code:
public static async Task Main()
{
Console.WriteLine("Awaiting");
await Task.Delay(2000);
Console.WriteLine("Awaited");
}
This gets converted by the compiler as follows (I used "JustDecompile" to decompile this):
private static void <Main>()
{
Program.Main().GetAwaiter().GetResult();
}
public static async Task Main()
{
Console.WriteLine("Awaiting");
await Task.Delay(2000);
Console.WriteLine("Awaited");
}
Now you can see why the program doesn't exit.
The async Task Main() gets called from static void <Main>() which waits for the Task returned from async Task Main() to complete (by accessing .GetResult()) before the program exits.

Parallel.For and httpclient crash the application C#

I want to avoid application crashing problem due to parallel for loop and httpclient but I am unable to apply solutions that are provided elsewhere on the web due to my limited knowledge of programming. My code is pasted below.
class Program
{
public static List<string> words = new List<string>();
public static int count = 0;
public static string output = "";
private static HttpClient Client = new HttpClient();
public static void Main(string[] args)
{
//input path strings...
List<string> links = new List<string>();
links.AddRange(File.ReadAllLines(input));
List<string> longList = new List<string>(File.ReadAllLines(#"a.txt"));
words.AddRange(File.ReadAllLines(output1));
System.Net.ServicePointManager.DefaultConnectionLimit = 8;
count = longList.Count;
//for (int i = 0; i < longList.Count; i++)
Task.Run(() => Parallel.For(0, longList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, (i, loopState) =>
{
Console.WriteLine(i);
string link = #"some link" + longList[i] + "/";
try
{
if (!links.Contains(link))
{
Task.Run(async () => { await Download(link); }).Wait();
}
}
catch (System.Exception e)
{
}
}));
//}
}
public static async Task Download(string link)
{
HtmlAgilityPack.HtmlDocument document = new HtmlDocument();
document.LoadHtml(await getURL(link));
//...stuff with html agility pack
}
public static async Task<string> getURL(string link)
{
string result = "";
HttpResponseMessage response = await Client.GetAsync(link);
Console.WriteLine(response.StatusCode);
if(response.IsSuccessStatusCode)
{
HttpContent content = response.Content;
var bytes = await response.Content.ReadAsByteArrayAsync();
result = Encoding.UTF8.GetString(bytes);
}
return result;
}
}
There are solutions for example this one, but I don't know how to put await keyword in my main method, and currently the program simply exits due to its absence before Task.Run(). As you can see I have already applied a workaround regarding async Download() method to call it in main method.
I have also doubts regarding the use of same instance of httpclient in different parallel threads. Please advise me whether I should create new instance of httpclient each time.
You're right that you have to block tasks somewhere in a console application, otherwise the program will just exit before it's complete. But you're doing this more than you need to. Aim for just blocking the main thread and delegating the rest to an async method. A good practice is to create a method with a signature like private async Task MainAsyc(args), put the "guts" of your program logic there, call it from Main like this:
MainAsync(args).Wait();
In your example, move everything from Main to MainAsync. Then you're free to use await as much as you want. Task.Run and Parallel.For are explicitly consuming new threads for I/O bound work, which is unnecessary in the async world. Use Task.WhenAll instead. The last part of your MainAsync method should end up looking something like this:
await Task.WhenAll(longList.Select(async s => {
Console.WriteLine(i);
string link = #"some link" + s + "/";
try
{
if (!links.Contains(link))
{
await Download(link);
}
}
catch (System.Exception e)
{
}
}));
There is one little wrinkle here though. Your example is throttling the parallelism at 5. If you find you still need this, TPL Dataflow is a great library for throttled parallelism in the async world. Here's a simple example.
Regarding HttpClient, using a single instance across threads is completely safe and highly encouraged.

Can this c# async process be more performant?

I am working on a program which makes multiple json calls to retrieve it's data.
The data however is pretty big and when running it without async it takes 17 hours to fully process.
The fetching of the data goes as follows:
Call to a service with a page number (2000 pages in total to be processed), which returns 200 records per page.
For each record it returns, an other service needs to be called to receive the data for the current record.
I'm new to the whole async functionality and I've made an attempt using async and await and already made a performance boost but was wondering if this is the correct way of using it and if there are any other ways to increase performance?
This is the code I currently have:
static void Main(string[] args)
{
MainAsyncCall().Wait();
Console.ReadKey();
}
public static async Task MainAsyncCall()
{
ServicePointManager.DefaultConnectionLimit = 999999;
List<Task> allPages = new List<Task>();
for (int i = 0; i <= 10; i++)
{
var page = i;
allPages.Add(Task.Factory.StartNew(() => processPage(page)));
}
Task.WaitAll(allPages.ToArray());
Console.WriteLine("Finished all pages");
}
public static async Task processPage(Int32 page)
{
List<Task> players = new List<Task>();
using (var client = new HttpClient())
{
string url = "<Request URL>";
var response = client.GetAsync(url).Result;
var content = response.Content.ReadAsStringAsync().Result;
dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
dynamic data = item.data;
var localPage = page;
Console.WriteLine($"Processing Page: {localPage}");
foreach (dynamic d in data)
{
players.Add(Task.Factory.StartNew(() => processPlayer(d, localPage)));
}
}
Task.WaitAll(players.ToArray());
Console.WriteLine($"Finished Page: {page}");
}
public static async Task processPlayer(dynamic player, int page)
{
using (var client = new HttpClient())
{
string url = "<Request URL>";
HttpResponseMessage response = null;
response = client.GetAsync(url).Result;
var content = await response.Content.ReadAsStringAsync();
dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
Console.WriteLine($"{page}: Processed {item.name}");
}
}
Any suggestion is welcome!
This is what it should look like to me:
static void Main(string[] args)
{
// it's okay here to use wait because we're at the root of the application
new AsyncServerCalls().MainAsyncCall().Wait();
Console.ReadKey();
}
public class AsyncServerCalls
{
// dont use static async methods
public async Task MainAsyncCall()
{
ServicePointManager.DefaultConnectionLimit = 999999;
List<Task> allPages = new List<Task>();
for (int i = 0; i <= 10; i++)
{
var page = i;
allPages.Add(processPage(page));
}
await Task.WhenAll(allPages.ToArray());
Console.WriteLine("Finished all pages");
}
public async Task processPage(Int32 page)
{
List<Task> players = new List<Task>();
using (var client = new HttpClient())
{
string url = "<Request URL>";
var response = await client.GetAsync(url)// nope .Result;
var content = await response.Content.ReadAsStringAsync(); // again never use .Result;
dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
dynamic data = item.data;
var localPage = page;
Console.WriteLine($"Processing Page: {localPage}");
foreach (dynamic d in data)
{
players.Add(processPlayer(d, localPage)); // no need to put the task unnecessarily on a different thread, let the current SynchronisationContext deal with that
}
}
await Task.WhenAll(players.ToArray()); // always await a task in an async method
Console.WriteLine($"Finished Page: {page}");
}
public async Task processPlayer(dynamic player, int page)
{
using (var client = new HttpClient())
{
string url = "<Request URL>";
HttpResponseMessage response = null;
response = await client.GetAsync(url); // don't use .Result;
var content = await response.Content.ReadAsStringAsync();
dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
Console.WriteLine($"{page}: Processed {item.name}");
}
}
}
So basially the points here are to make sure you let the SynchronisationContext do it's job. Inside a console program it should use the TaskSchedular.Default which is a ThreadPool SynchronisationContext. You can always force this by doing:
static void Main(string[] args)
{
Task.Run(() => new AsyncServerCalls().MainAsyncCall()).Wait();
Console.ReadKey();
}
Reference to Task.Run forcing Default
One thing you need to remember, which I got into trouble with last week is that you can fire hose the thread pool, i.e. spawn so many tasks that the your process just dies with insane CPU and Memory usage. So you may need to use a Semaphore to just limit the number of threads that going to be created.
I created a solution that processes a single file in multiple parts all at the same time Parallel Read it is still being worked on, but shows the uses of async stuff
Just to clarify the parallelism.
When you take a reference to all those tasks:
allPages.Add(processPage(page));
They all will be started.
When you do:
await Task.WhenAll(allPages);
This will block the current method execution until all those page processes have been executed (it won't block the current thread though, don't get these confused)
Danger Zone
If you don't want to block method execution on
Task.WhenAll
So, you can parallel all page processes for each page, then you can add that Task to an overall List<Task>.
However, the danger with this is the fire hosing... You are going to limit the number of threads you execute at some point, so where.... well that is up to you but just remember, it will happen at some point.

How to make a task more independent?

Originally I wrote my C# program using Threads and ThreadPooling, but most of the users on here were telling me Async was a better approach for more efficiency. My program sends JSON objects to a server until status code of 200 is returned and then moves on to the next task.
The problem is once one of my Tasks retrieves a status code of 200, it waits for the other Tasks to get 200 code and then move onto the next Task. I want each task to continue it's next task without waiting for other Tasks to finish (or catch up) by receiving a 200 response.
Below is my main class to run my tasks in parallel.
public static void Main (string[] args)
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add (getItemAsync (i));
}
Task.WhenAny (tasks);
}
Here is the getItemAsync() method that does the actual sending of information to another server. Before this method works, it needs a key. The problem also lies here, let's say I run 100 tasks, all 100 tasks will wait until every single one has a key.
public static async Task getItemAsync (int i)
{
if (key == "") {
await getProductKey (i).ConfigureAwait(false);
}
Uri url = new Uri ("http://www.website.com/");
var content = new FormUrlEncodedContent (new[] {
...
});
while (!success) {
using (HttpResponseMessage result = await client.PostAsync (url, content)) {
if (result.IsSuccessStatusCode) {
string resultContent = await result.Content.ReadAsStringAsync ().ConfigureAwait(false);
Console.WriteLine(resultContent);
success=true;
}
}
}
await doMoreAsync (i);
}
Here's the function that retrieves the keys, it uses HttpClient and gets parsed.
public static async Task getProductKey (int i)
{
string url = "http://www.website.com/key";
var stream = await client.GetStringAsync (url).ConfigureAwait(false);
var doc = new HtmlDocument ();
doc.LoadHtml (stream);
HtmlNode node = doc.DocumentNode.SelectNodes ("//input[#name='key']") [0];
try {
key = node.Attributes ["value"].Value;
} catch (Exception e) {
Console.WriteLine ("Exception: " + e);
}
}
After every task receives a key and has a 200 status code, it runs doMoreAsync(). I want any individual Task that retrieved the 200 code to run doMoreAsync() without waiting for other tasks to catch up. How do I go about this?
Your Main method is not waiting for a Task to complete, it justs fires off a bunch of async tasks and returns.
If you want to await an async task you can only do that from an async method. Workaround is to kick off an async task from the Main method and wait for its completion using the blocking Task.Wait():
public static void Main(string[] args) {
Task.Run(async () =>
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add (getItemAsync (i));
}
var finishedTask = await Task.WhenAny(tasks); // This awaits one task
}).Wait();
}
When you are invoking getItemAsync() from an async method, you can remove the ConfigureAwait(false) as well. ConfigureAwait(false) just makes sure some code does not execute on the UI thread.
If you want to append another task to a task, you can also append it to the previous Task directly by using ContinueWith():
getItemAsync(i).ContinueWith(anotherTask);
Your main problem seems to be that you're sharing the key field across all your concurrently-running asynchronous operations. This will inevitably lead to race hazards. Instead, you should alter your getProductKey method to return each retrieved key as the result of its asynchronous operation:
// ↓ result type of asynchronous operation
public static async Task<string> getProductKey(int i)
{
// retrieval logic here
// return key as result of asynchronous operation
return node.Attributes["value"].Value;
}
Then, you consume it like so:
public static async Task getItemAsync(int i)
{
string key;
try
{
key = await getProductKey(i).ConfigureAwait(false);
}
catch
{
// handle exceptions
}
// use local variable 'key' in further asynchronous operations
}
Finally, in your main logic, use a WaitAll to prevent the console application from terminating before all your tasks complete. Although WaitAll is a blocking call, it is acceptable in this case since you want the main thread to block.
public static void Main (string[] args)
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add(getItemAsync(i));
}
Task.WaitAll(tasks.ToArray());
}

Refresh DataServiceCollection

I wonder if there is a code there that refresh my DataServiceCollection that is loaded in a WPF client using BeginExecute async await Task as show below:
public static async Task<IEnumerable<TResult>> ExecuteAsync<TResult>(this DataServiceQuery<TResult> query)
{
//Thread.Sleep(10000);
var queryTask = Task.Factory.FromAsync<IEnumerable<TResult>>(query.BeginExecute(null, null), (asResult) =>
{
var result = query.EndExecute(asResult).ToList();
return result;
});
return await queryTask;
}
I call the extension method as the following:
public async void LoadData()
{
_ctx = new TheContext(the URI);
_dataSource = new DataServiceCollection<TheEntity>(_ctx);
var query = _ctx.TheEntity;
var data = await Task.Run(async () => await query.ExecuteAsync());
_dataSource.Clear(true);
_dataSource.Load(data);
}
LoadData is Called in a ViewModel
Change a field value by hand using SQL Management Studio
Call LoadData again <<<< Does not refresh!!!
Meanwhile, If I use Load method the data is refreshed without any problems as:
var query = _ctx.TheEntity;
_dataSource.Load(query);
Another issue is that I do not know how to Cancel client changes. Last question is whether the MergeOption has any effect to BeginExecute..EndExecute or it only works with Load method?
I suspect that the data context does not like being accessed from multiple threads.
I recommend that you first remove all processing from your extension method:
public static Task<IEnumerable<TResult>> ExecuteAsync<TResult>(this DataServiceQuery<TResult> query)
{
return Task.Factory.FromAsync(query.BeginExecute, query.EndExecute, null);
}
Which you should use without jumping onto a background thread:
public async Task LoadDataAsync()
{
_ctx = new TheContext(the URI);
_dataSource = new DataServiceCollection<TheEntity>(_ctx);
var query = _ctx.TheEntity;
var data = await query.ExecuteAsync();
_dataSource.Clear(true);
_dataSource.Load(data);
}

Categories

Resources