Using Async and Await to break up database call (with Dapper) - c#

We're requesting thousands of objects back from Dapper and hit the parameter limit (2100) so have decided to load them in chunks.
I thought it would be a good opportunity to try Async Await - this the first time I've had a go so maybe making a school boy error!
Breakpoints are being hit, but the whole thing is just not returning. It's not throwing an error - it just seems like everything is going in a black hole!
Help please!
This was my original method - it now calls the Async method
public List<MyObject> Get(IEnumerable<int> ids)
{
return this.GetMyObjectsAsync(ids).Result.ToList();
} //Breakpoint on this final bracket never gets hit
I added this method to split the ids into chunks of 1000 and then await for the tasks to complete
private async Task<List<MyObject>> GetMyObjectsAsync(IEnumerable<int> ids)
{
var subSets = this.Partition(ids, 1000);
var tasks = subSets.Select(set => GetMyObjectsTask(set.ToArray()));
//breakpoint on the line below gets hit ...
var multiLists = await Task.WhenAll(tasks);
//breakpoint on line below never gets hit ...
var list = new List<MyObject>();
foreach (var myobj in multiLists)
{
list.AddRange(myobj);
}
return list;
}
and below is the task ...
private async Task<IEnumerable<MyObject>> GetMyObjectsTask(params int[] ids)
{
using (var db = new SqlConnection(this.connectionString))
{
//breakpoint on the line below gets hit
await db.OpenAsync();
return await db.QueryAsync<MyObject>(#"SELECT Something FROM Somewhere WHERE ID IN #Ids",
new { ids});
}
}
The following method just splits the list of ids in chunks - this appears to work fine ...
private IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size)
{
var partition = new List<T>(size);
var counter = 0;
using (var enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
partition.Add(enumerator.Current);
counter++;
if (counter % size == 0)
{
yield return partition.ToList();
partition.Clear();
counter = 0;
}
}
if (counter != 0)
yield return partition;
}
}

You are creating a deadlock situation the way you use async/await in combination with Task<T>.Result.
The offending line is:
return this.GetMyObjectsAsync(ids).Result.ToList();
GetMyObjectsAsync(ids).Result blocks the current sync context. But GetMyObjectsAsync uses await which schedules a continuation point for the current sync context. I'm sure you can see the problem with that approach: the continuation can never be executed because the current sync contextis blocked by Task.Result.
One solution that can work in some cases would be to use ConfigureAwait(false) which means that the continuation can be run on any sync context.
But in general I think it's best to avoid Task.Result with async/await.
Please note that whether this deadlock situation actually occurs depends on the synchronization context that is used when calling the asynchronous method. For ASP.net, Windows Forms and WPF this will result in a deadlock, but as far as I know it won't for a console application. (Thanks to Marc Gravell for his comment)
Microsoft has a good article about Best Practices in Asynchronous Programming. (Thanks to ken2k)

I think parameters are case sensitive it should be:
return await db.QueryAsync<MyObject>(#"SELECT Something FROM Somewhere WHERE ID IN #ids",
new { ids});
instead of "#Ids" below in query:
return await db.QueryAsync<MyObject>(#"SELECT Something FROM Somewhere WHERE ID IN **#Ids**",
new { ids});
not sure though but just try.

Related

Await async call in for-each-loop [duplicate]

This question already has answers here:
Using async/await for multiple tasks
(8 answers)
Closed 4 years ago.
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.
public static async Task<List<Deployment>> GetDeployments()
{
try
{
var depjson = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
var deployments = (JsonConvert.DeserializeObject<DeploymentWrapper>(depjson))?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
return deployments;
}
catch (Exception)
{
throw;
}
}
This all works perfectly fine. However, I do not like the await in the foreach-loop at all. I also believe this is not considered good practice. I just don't see how to refactor this so the calls are made parallel, because the result of each call is used to set a property of the deployment.
I would appreciate any suggestions on how to make this method faster and, whenever possible, avoid the await-ing in the foreach-loop.
There is nothing wrong with what you are doing now. But there is a way to call all tasks at once instead of waiting for a single task, then processing it and then waiting for another one.
This is how you can turn this:
wait for one -> process -> wait for one -> process ...
into
wait for all -> process -> done
Convert this:
foreach (var deployment in deployments)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
To:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
First you take a list of unfinished tasks. Then you await it and you get a collection of results (reljson's). Then you have to deserialize them and assign to Release.
By using await Task.WhenAll() you wait for all the tasks at the same time, so you should see a performance boost from that.
Let me know if there are typos, I didn't compile this code.
Fcin suggested to start all Tasks, await for them all to finish and then start deserializing the fetched data.
However, if the first Task is already finished, but the second task not, and internally the second task is awaiting, the first task could already start deserializing. This would shorten the time that your process is idly waiting.
So instead of:
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
I'd suggest the following slight change:
// async fetch the Release data of Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
return JsonConvert.DeserializeObject<Release>(reljson);
}
// async fill the Release data of Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
deployment.Release = await FetchReleaseDataAsync(deployment);
}
Then your procedure is similar to the solution that Fcin suggested:
IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments.
.Select(deployment => FillReleaseDataAsync(deployment)
.ToList();
await Task.WhenAll(tasksFillDeploymentWithReleaseData);
Now if the first task has to wait while fetching the release data, the 2nd task begins and the third etc. If the first task already finished fetching the release data, but the other tasks are awaiting for their release data, the first task starts already deserializing it and assigns the result to deployment.Release, after which the first task is complete.
If for instance the 7th task got its data, but the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release. Task 7 is completed.
This continues until all tasks are completed. Using this method there is less waiting time because as soon as one task has its data it is scheduled to start deserializing
If i understand you right and you want to make the var reljson = await GetJson parralel:
Try this:
Parallel.ForEach(deployments, (deployment) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
you might limit the number of parallel executions such as:
Parallel.ForEach(
deployments,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
(deployment) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
you might also want to be able to break the loop:
Parallel.ForEach(deployments, (deployment, state) =>
{
var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
if (noFurtherProcessingRequired) state.Break();
});

Starting Multiple Async Tasks and Process Them As They Complete (C#)

So I am trying to learn how to write asynchronous methods and have been banging my head to get asynchronous calls to work. What always seems to happen is the code hangs on "await" instruction until it eventually seems to time out and crash the loading form in the same method with it.
There are two main reason this is strange:
The code works flawlessly when not asynchronous and just a simple loop
I copied the MSDN code almost verbatim to convert the code to asynchronous calls here: https://msdn.microsoft.com/en-us/library/mt674889.aspx
I know there are a lot of questions already about this on the forms but I have gone through most of them and tried a lot of other ways (with the same result) and now seem to think something is fundamentally wrong after MSDN code wasn't working.
Here is the main method that is called by a background worker:
// this method loads data from each individual webPage
async Task LoadSymbolData(DoWorkEventArgs _e)
{
int MAX_THREADS = 10;
int tskCntrTtl = dataGridView1.Rows.Count;
Dictionary<string, string> newData_d = new Dictionary<string, string>(tskCntrTtl);
// we need to make copies of things that can change in a different thread
List<string> links = new List<string>(dataGridView1.Rows.Cast<DataGridViewRow>()
.Select(r => r.Cells[dbIndexs_s.url].Value.ToString()).ToList());
List<string> symbols = new List<string>(dataGridView1.Rows.Cast<DataGridViewRow>()
.Select(r => r.Cells[dbIndexs_s.symbol].Value.ToString()).ToList());
// we need to create a cancelation token once this is working
// TODO
using (LoadScreen loadScreen = new LoadScreen("Querying stock servers..."))
{
// we cant use the delegate becaus of async keywords
this.loaderScreens.Add(loadScreen);
// wait until the form is loaded so we dont get exceptions when writing to controls on that form
while ( !loadScreen.IsLoaded() );
// load the total number of operations so we can simplify incrementing the progress bar
// on seperate form instances
loadScreen.LoadProgressCntr(0, tskCntrTtl);
// try to run all async tasks since they are non-blocking threaded operations
for (int i = 0; i < tskCntrTtl; i += MAX_THREADS)
{
List<Task<string[]>> ProcessURL = new List<Task<string[]>>();
List<int> taskList = new List<int>();
// Make a list of task indexs
for (int task = i; task < i + MAX_THREADS && task < tskCntrTtl; task++)
taskList.Add(task);
// ***Create a query that, when executed, returns a collection of tasks.
IEnumerable<Task<string[]>> downloadTasksQuery =
from task in taskList select QueryHtml(loadScreen, links[task], symbols[task]);
// ***Use ToList to execute the query and start the tasks.
List<Task<string[]>> downloadTasks = downloadTasksQuery.ToList();
// ***Add a loop to process the tasks one at a time until none remain.
while (downloadTasks.Count > 0)
{
// Identify the first task that completes.
Task<string[]> firstFinishedTask = await Task.WhenAny(downloadTasks); // <---- CODE HANGS HERE
// ***Remove the selected task from the list so that you don't
// process it more than once.
downloadTasks.Remove(firstFinishedTask);
// Await the completed task.
string[] data = await firstFinishedTask;
if (!newData_d.ContainsKey(data.First()))
newData_d.Add(data.First(), data.Last());
}
}
// now we have the dictionary with all the information gathered from teh websites
// now we can add the columns if they dont already exist and load the information
// TODO
loadScreen.UpdateProgress(100);
this.loaderScreens.Remove(loadScreen);
}
}
And here is the async method for querying web pages:
async Task<string[]> QueryHtml(LoadScreen _loadScreen, string _link, string _symbol)
{
string data = String.Empty;
try
{
HttpClient client = new HttpClient();
var doc = new HtmlAgilityPack.HtmlDocument();
var html = await client.GetStringAsync(_link); // <---- CODE HANGS HERE
doc.LoadHtml(html);
string percGrn = doc.FindInnerHtml(
"//span[contains(#class,'time_rtq_content') and contains(#class,'up_g')]//span[2]");
string percRed = doc.FindInnerHtml(
"//span[contains(#class,'time_rtq_content') and contains(#class,'down_r')]//span[2]");
// create somthing we'll nuderstand later
if ((String.IsNullOrEmpty(percGrn) && String.IsNullOrEmpty(percRed)) ||
(!String.IsNullOrEmpty(percGrn) && !String.IsNullOrEmpty(percRed)))
throw new Exception();
// adding string to empty gives string
string perc = percGrn + percRed;
bool isNegative = String.IsNullOrEmpty(percGrn);
double percDouble;
if (double.TryParse(Regex.Match(perc, #"\d+([.])?(\d+)?").Value, out percDouble))
data = (isNegative ? 0 - percDouble : percDouble).ToString();
}
catch (Exception ex) { }
finally
{
// update the progress bar...
_loadScreen.IncProgressCntr();
}
return new string[] { _symbol, data };
}
I could really use some help. Thanks!
In short when you combine async with any 'regular' task functions you get a deadlock
http://olitee.com/2015/01/c-async-await-common-deadlock-scenario/
the solution is by using configureawait
var html = await client.GetStringAsync(_link).ConfigureAwait(false);
The reason you need this is because you didn't await your orginal thread.
// ***Create a query that, when executed, returns a collection of tasks.
IEnumerable<Task<string[]>> downloadTasksQuery = from task in taskList select QueryHtml(loadScreen,links[task], symbols[task]);
What's happeneing here is that you mix the await paradigm with thre regular task handling paradigm. and those don't mix (or rather you have to use the ConfigureAwait(false) for this to work.

Async and Await - How is order of execution maintained?

I am actually reading some topics about the Task Parallel Library and the asynchronous programming with async and await. The book "C# 5.0 in a Nutshell" states that when awaiting an expression using the await keyword, the compiler transforms the code into something like this:
var awaiter = expression.GetAwaiter();
awaiter.OnCompleted (() =>
{
var result = awaiter.GetResult();
Let's assume, we have this asynchronous function (also from the referred book):
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
Console.WriteLine (await GetPrimesCountAsync (i*1000000 + 2, 1000000) +
" primes between " + (i*1000000) + " and " + ((i+1)*1000000-1));
Console.WriteLine ("Done!");
}
The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread. In general invoking multiple threads from within a for loop has the potential for introducing race conditions.
So how does the CLR ensure that the requests will be processed in the order they were made? I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop.
Just for the sake of simplicity, I'm going to replace your example with one that's slightly simpler, but has all of the same meaningful properties:
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
{
var value = await SomeExpensiveComputation(i);
Console.WriteLine(value);
}
Console.WriteLine("Done!");
}
The ordering is all maintained because of the definition of your code. Let's imagine stepping through it.
This method is first called
The first line of code is the for loop, so i is initialized.
The loop check passes, so we go to the body of the loop.
SomeExpensiveComputation is called. It should return a Task<T> very quickly, but the work that it'd doing will keep going on in the background.
The rest of the method is added as a continuation to the returned task; it will continue executing when that task finishes.
After the task returned from SomeExpensiveComputation finishes, we store the result in value.
value is printed to the console.
GOTO 3; note that the existing expensive operation has already finished before we get to step 4 for the second time and start the next one.
As far as how the C# compiler actually accomplishes step 5, it does so by creating a state machine. Basically every time there is an await there's a label indicating where it left off, and at the start of the method (or after it's resumed after any continuation fires) it checks the current state, and does a goto to the spot where it left off. It also needs to hoist all local variables into fields of a new class so that the state of those local variables is maintained.
Now this transformation isn't actually done in C# code, it's done in IL, but this is sort of the morale equivalent of the code I showed above in a state machine. Note that this isn't valid C# (you cannot goto into a a for loop like this, but that restriction doesn't apply to the IL code that is actually used. There are also going to be differences between this and what C# actually does, but is should give you a basic idea of what's going on here:
internal class Foo
{
public int i;
public long value;
private int state = 0;
private Task<int> task;
int result0;
public Task Bar()
{
var tcs = new TaskCompletionSource<object>();
Action continuation = null;
continuation = () =>
{
try
{
if (state == 1)
{
goto state1;
}
for (i = 0; i < 10; i++)
{
Task<int> task = SomeExpensiveComputation(i);
var awaiter = task.GetAwaiter();
if (!awaiter.IsCompleted)
{
awaiter.OnCompleted(() =>
{
result0 = awaiter.GetResult();
continuation();
});
state = 1;
return;
}
else
{
result0 = awaiter.GetResult();
}
state1:
Console.WriteLine(value);
}
Console.WriteLine("Done!");
tcs.SetResult(true);
}
catch (Exception e)
{
tcs.SetException(e);
}
};
continuation();
}
}
Note that I've ignored task cancellation for the sake of this example, I've ignored the whole concept of capturing the current synchronization context, there's a bit more going on with error handling, etc. Don't consider this a complete implementation.
The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread.
No. await does not initiate any kind of background processing. It waits for existing processing to complete. It is up to GetPrimesCountAsync to do that (e.g. using Task.Run). It's more clear this way:
var myRunningTask = GetPrimesCountAsync();
await myRunningTask;
The loop only continues when the awaited task has completed. There is never more than one task outstanding.
So how does the CLR ensure that the requests will be processed in the order they were made?
The CLR is not involved.
I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop.
The transform that you shows is basically right but notice that the next loop iteration is not started right away but in the callback. That's what serializes execution.

will Task.Run() make any difference in performance?

I have a method which calls database as shown below:
BL Method to call DAO method:
public async Task<List<Classes>> GetClassesAndAddRules(string classId)
{
var classData = await Task.Run( () => _daoClass.GetClasses(classId));
//logic for adding rule
//..................................
}
DatabaseCall in DAO:
//*below method takes 1 second approx to return*
public List<Classes> GetClasses(string id)
{
var retVal = new List<Classes>();
using (var context = new test_db_context())
{
var rows = context.GetClassesById(id);
foreach (ClassesDBComplexType row in rows)
{
retVal.Add(Mapper.Map<GetClassesByClassIdOut>(row));
}
}
return retVal;
}
Is there any performance boost just my calling the DAO method using await ?
My understanding is GetClasses() will be called on a separate thread so that it doesn't block and continue processing other stuff.
Any help is appreciated.
The code you posted won't compile. From the title of your question, I'm assuming that your call actually looks like await Task.Run(() => _daoClass.GetClasses(classId));
In that case, the use of Task.Run will make a difference in performance: it will be worse.
The point of async on the server side is to free up the request thread instead of blocking it. What you're doing with await Task.Run(...) is to free up the request thread by starting work on another thread. In other words, the Task.Run code has the same amount of work to do plus thread marshaling.

Async/Await Iterate over returned Task<IEnumerable<SomeClass>>

I want to use async/await when querying a database with a HTTP call in my WPF application. It's my first time using async/await, so if you see any obvious mistake, feel free to point them out.
The problem is, that im not able to iterate over the returned collecting anymore, because it's now a Task of <IEnumerable<SomeClass>>, and from what I found out, the Task doesn't implement IEnumerable/IEnumerator.
My code look like this: The method that calls the async/await method.
private void AddProjectDrawingAndComponentsFromServerToLocalDbAsync(CreateDbContext1 db, Project item)
{
var drawings = client.GetDrawingsAsync(item.ProjectId);
..... (waiting for GetDrawingsAsync to return, so i can iterate over the drawings)
db.SaveChanges();
}
The method GetDrawingsAsync:
public async Task<IEnumerable<Drawing>> GetDrawingsAsync(int projectId)
{
var drawings = Task.Factory.StartNew(() => _client.Get(new GetDrawingsReq() { ProjectId = projectId }));
return await drawings;
}
Back to the method that waits for the async method to finish:
private void AddProjectDrawingAndComponentsFromServerToLocalDbAsync(CreateDbContext1 db, Project item)
{
var drawings = client.GetDrawingsAsync(item.ProjectId); <-- The returned Task<Task<IEnumerable<Drawing>>
Drawing prj_drawing = null;
foreach (var draw in drawings)
{
prj_drawing = new Drawing() { DrawingKey = draw.DrawingKey, Name = draw.Name, ProjectId = item.ProjectId };
item.AddDrawing(prj_drawing);
db.Projects.Add(item);
}
db.SaveChanges();
}
How would I be able to convert the returned type Task<IEnumerable<Drawing>> to something that i can iterate over in the foreach loop.
To iterate over the result of the Task, you need to get the result of the Task. And to do that, you should use await. This means you should change AddProjectDrawingAndComponentsFromServerToLocalDbAsync() into an async method:
private async Task AddProjectDrawingAndComponentsFromServerToLocalDbAsync(
CreateDbContext1 db, Project item)
{
var drawings = await client.GetDrawingsAsync(item.ProjectId);
foreach (var draw in drawings)
{
// whatever
}
db.SaveChanges();
}
This means that AddProjectDrawingAndComponentsFromServerToLocalDbAsync() (which is BTW quite a bad name for a method, I think) now returns a Task, so you should probably change the method that calls it into an async method too. This leads to “async all the way”, but that's unavoidable.

Categories

Resources