How to use Table.ExecuteQuerySegmentedAsync() with Azure Table Storage


Working with the Azure Storage Client library 2.1, I'm working on making a query of Table storage async. I created this code:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var theQuery = _table.CreateQuery<TAzureTableEntity>()
        .Where(tEnt => tEnt.PartitionKey == partitionKey);
    TableQuerySegment<TAzureTableEntity> querySegment = null;
    var returnList = new List<TAzureTableEntity>();
    while (querySegment == null || querySegment.ContinuationToken != null)
    {
        querySegment = await theQuery.AsTableQuery()
            .ExecuteSegmentedAsync(querySegment != null ?
                querySegment.ContinuationToken : null);
        returnList.AddRange(querySegment);
    }
    return returnList;
}
Let's assume there is a large set of data coming back so there will be a lot of round trips to Table Storage. The problem I have is that we're awaiting a set of data, adding it to an in-memory list, awaiting more data, adding it to the same list, awaiting yet more data, adding it to the list... and so on and so forth. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var returnList = await Task.Factory.StartNew(() =>
        table.CreateQuery<TAzureTableEntity>()
            .Where(ent => ent.PartitionKey == partitionKey)
            .ToList());
    return returnList;
}
Doing it this way seems like we're not bouncing the SynchronizationContext back and forth so much. Or does it really matter?
Edit to Rephrase Question
What's the difference between the two scenarios mentioned above?

The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.
Also, if you don't want your first version to return to the UI context for each roundtrip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:
querySegment = await theQuery.AsTableQuery()
    .ExecuteSegmentedAsync(…)
    .ConfigureAwait(false);
This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
BTW, in your second version, you don't actually need await at all, you could just directly return the Task:
public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    return Task.Run(() => table.CreateQuery<TAzureTableEntity>()
        .Where(ent => ent.PartitionKey == partitionKey)
        .ToList());
}

Not sure if this is the answer you're looking for but I still want to mention it :).
As you may already know, the 2nd method (using Task) handles continuation tokens internally and returns only when all entities have been fetched, whereas the 1st method fetches one set of entities at a time (up to a maximum of 1000) and returns that result set together with a continuation token.
If you're interested in fetching all entities from a table, either method can be used. However, the 1st one gives you the flexibility to break out of the loop gracefully at any time, which you don't get with the 2nd. So with the 1st method you could essentially introduce pagination.
Let's assume you're building a web application which shows data from a table. Further, let's assume that the table contains a large number of entities (let's say 100000). Using the 1st method, you can fetch just 1000 entities and return the result to the user; if the user wants more, you can fetch the next set of 1000 entities and show those. You could continue doing that for as long as the user wants and there's data in the table. With the 2nd method the user would have to wait until all 100000 entities have been fetched from the table.
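The paging idea can be sketched without the storage library at all. The block below is only an illustration under stated assumptions: Page<T> and FakeSegmentedSource are hypothetical in-memory stand-ins (they are not part of the Azure Storage client), and the token is simply the next row offset encoded as text. Each call returns one page plus the token needed to ask for the next one, and a null token means there is no more data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one segment of results plus its continuation token.
public sealed class Page<T>
{
    public Page(IReadOnlyList<T> items, string continuationToken)
    {
        Items = items;
        ContinuationToken = continuationToken;
    }

    public IReadOnlyList<T> Items { get; }

    // null means there is nothing left to fetch.
    public string ContinuationToken { get; }
}

// Hypothetical in-memory source that behaves like a segmented query:
// each call returns at most pageSize items and a token for the next call.
public sealed class FakeSegmentedSource
{
    private readonly List<int> _rows;
    private readonly int _pageSize;

    public FakeSegmentedSource(int rowCount, int pageSize)
    {
        _rows = Enumerable.Range(0, rowCount).ToList();
        _pageSize = pageSize;
    }

    public async Task<Page<int>> GetPageAsync(string continuationToken)
    {
        await Task.Yield(); // stand-in for the network round trip

        // In this sketch the token is just the offset of the next row, as text.
        int offset = continuationToken == null ? 0 : int.Parse(continuationToken);
        List<int> items = _rows.Skip(offset).Take(_pageSize).ToList();
        int next = offset + items.Count;

        return new Page<int>(items, next < _rows.Count ? next.ToString() : null);
    }
}
```

A web page would call GetPageAsync(null) for the first screen, hand the returned token to the client, and pass it back in only when the user asks for the next screen.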

Related

Asynchronously adding entries to database from ASP.NET

Let's assume that I have a list of database entries and I want to duplicate them in the database. Would a regular foreach and db.Add() work, or is there an asynchronous way I should be using with db.Add()?
PS: this is not the actual code, just an example of what I'm trying to accomplish
var pencils = await db.Pencils.Where(x => x.IsBroken == false).ToListAsync();
foreach (var pencil in pencils)
{
    pencil.ID = 0;
    db.Add(pencil);
}
await db.SaveChangesAsync();

If your intention is to duplicate the pencils your query selects, your code is slightly flawed. You are loading tracked entities, and I'd be assuming that by setting the ID to 0 you'd want to insert new rows where an identity would take over assigning new IDs. EF entities will chuck an exception if you try setting a tracked entity's PK.
Instead:
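var pencils = await db.Pencils.AsNoTracking()
    .Where(x => x.IsBroken == false)
    .ToListAsync();
foreach (var pencil in pencils)
{
    pencil.ID = 0; // let the database assign a new identity value
}
db.AddRange(pencils);
await db.SaveChangesAsync();
AsNoTracking ensures the DbContext does not track the entities it loads, so they can be handed to Add, or in this case AddRange to add them all at once, as new entities. The ID is reset to 0 first because EF Core sends a key property that still holds a non-default value to the database as an explicit value, which an identity column will normally reject. Provided those entities are declared with an Identity-based PK (Pencil.ID), EF then treats them as new entities and the underlying database assigns each a new PK when they are saved.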
As mentioned in the comments, "awaiting" an operation does not make the thread sit and wait for the operation. That would be more along the lines of doing:
db.SaveChangesAsync().Wait();
... which would block the current thread until the SaveChanges completed.
await marks a continuation point: the async method returns to its caller, which can carry on, and the rest of the method is picked up and executed at some point after the awaited operation completes. So for instance in a web action handler, the thread that ASP.NET Core or IIS allocated to the request is free to pick up and start processing a new request while the save is in progress. The request will get its response populated and sent back after completion.
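The ordering described above can be observed with a small sketch. It assumes nothing about EF: pendingSave is a hypothetical stand-in for the task returned by db.SaveChangesAsync(), completed by hand so the sequence is deterministic. The log records what ran when.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOrderSketch
{
    // pendingSave stands in for db.SaveChangesAsync(); it is hypothetical.
    public static async Task SaveAsync(Task pendingSave, List<string> log)
    {
        log.Add("save started");
        // Control returns to the caller here, because pendingSave is not done yet.
        await pendingSave.ConfigureAwait(false);
        // Runs later, once pendingSave has completed.
        log.Add("save finished");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var pendingSave = new TaskCompletionSource<bool>();

        Task save = SaveAsync(pendingSave.Task, log);
        log.Add("caller kept going");   // reached while the save is still pending

        pendingSave.SetResult(true);    // the "database" finishes
        save.GetAwaiter().GetResult();  // demo only: block until the continuation has run
        return log;
    }
}
```

Run() returns the entries in the order "save started", "caller kept going", "save finished". The blocking call at the end exists only so the demo can return its log; it is the kind of call the answer above advises against in request-handling code.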

MongoDB c# driver - ForEachAsync vs ToListAsync

I'm using the MongoDB C# driver to achieve the following: for all records satisfying certain criteria, set one of the record's date fields to the value of another date field.
I was hoping to use UpdateAllAsync for that, but it seems there is no convenient way to do it.
So now I wonder about using ForEachAsync vs using ToListAsync:
await this.repository.Find(filter).ForEachAsync(async record =>
{
    await this.repository.UpdateOneAsync(
        Builders<Records>.Filter.Eq(x => x.Id, record.Id),
        Builders<Records>.Update.Set(x => x.Date1, record.Date2));
});
vs
var records = await this.repository.Find(filter).ToListAsync();
foreach (var record in records)
{
    await this.repository.UpdateOneAsync(
        Builders<Records>.Filter.Eq(x => x.Id, record.Id),
        Builders<Records>.Update.Set(x => x.Date1, record.Date2));
}
Is the first approach safe? Which approach is better in this case?

Differences
ToListAsync() will bring all the data into memory and then iterate over it. If we are talking about a huge number of rows, this could lead to too much memory being used for a long period of time.
ForEachAsync(), on the other hand, will read one row at a time in an asynchronous manner and will not bring everything into memory. Under the hood, the async enumerator just uses ReadAsync() over the result set to return the next item.
Both will run asynchronously, which means that they will not block the calling thread.
Why ToListAsync
There is a limitation to the ForEachAsync extension.
Check the signature: ForEachAsync(IQueryable, Action<Object>). It only takes an Action delegate, which means that if you need any return value from your operation, you can't use it. Note that this is the Entity Framework signature.
A nice workaround for the limitations of ForEachAsync is here: Unexpected behaviour with Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ForEachAsync<T>()
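The memory difference between the two styles can be illustrated with plain IAsyncEnumerable (C# 8 or later), independent of any driver. This is a sketch under assumptions: ReadRecordsAsync is a hypothetical source that counts how many records it has read, and neither method below is the MongoDB or Entity Framework implementation. Each method reports how many records had already been read by the time the first one was handled.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class ReadCounter
{
    public int Value;
}

public static class StreamingVsBuffering
{
    // Hypothetical source: yields `count` records one at a time and
    // counts how many it has read so far.
    public static async IAsyncEnumerable<int> ReadRecordsAsync(int count, ReadCounter read)
    {
        for (int i = 0; i < count; i++)
        {
            await Task.Yield(); // stand-in for an asynchronous read
            read.Value++;
            yield return i;
        }
    }

    // Streaming (ForEachAsync style): handle each record as it arrives.
    public static async Task<int> StreamAsync(int count)
    {
        var read = new ReadCounter();
        int readAtFirstRecord = 0;
        await foreach (int record in ReadRecordsAsync(count, read))
        {
            if (record == 0) readAtFirstRecord = read.Value;
        }
        return readAtFirstRecord;
    }

    // Buffering (ToListAsync style): load everything first, then handle it.
    public static async Task<int> BufferAsync(int count)
    {
        var read = new ReadCounter();
        var all = new List<int>();
        await foreach (int record in ReadRecordsAsync(count, read))
        {
            all.Add(record);
        }

        int readAtFirstRecord = 0;
        foreach (int record in all)
        {
            if (record == 0) readAtFirstRecord = read.Value;
        }
        return readAtFirstRecord;
    }
}
```

With five records, StreamAsync(5) handles the first record when only one has been read, while BufferAsync(5) does not handle anything until all five are in the list.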

IQueryable async extension methods exact time execution and returned task

I have a method that returns data from the database
public Task<List<T>> GetAsync(int someId)
{
    return dbSet
        .Where(x => x.Id == someId)
        .ToListAsync();
}
I have a dictionary that stores Task by some key
Dictionary<int, Task<List<T>>> SomeDictionary { get; set; }
In some other place I get the Task and put it in the dictionary
var someTask = repo.GetAsync(someId);
someInstance.AddInDictionary(key, someTask);
Then I get the task and await it to get the result
var task = someInstance.GetFromDictionary(key);
List<T> result = await task;
So my questions are:
1. When is the IQueryable translated to a SQL query and executed in the database:
when I call the method in the repo
var someTask = repo.GetAsync(someId);
or when I await the task
List<T> result = await task;
2. In the dictionary, when I store Tasks
Dictionary<int, Task<List<T>>> SomeDictionary { get; set; }
...do I store only an operation that should return a result, or an operation together with the actual result? In other words, does saving the Task instead of saving the List save any memory?
Thanks in advance.

The await keyword is effectively syntactic sugar. It works in a similar (not the same) fashion to a callback, where the callback is the code following the await. It does not control when or how the Task is executed.
This means that Entity Framework is responsible for scheduling and executing the Task without knowledge of await, and therefore has the ability to schedule the Task's execution however it desires.
Depending on how ToListAsync is implemented, it may contain a portion of code which executes immediately and synchronously with the current thread.
I have not seen Microsoft state when the IQueryable is translated into SQL, or when the connection to the SQL server is initiated. My guess would be that the IQueryable is translated to SQL and the connection handshake starts immediately upon calling ToListAsync, and that the remainder of the method is executed as soon as possible on one of the ThreadPool threads.
That being said, calling code should be written in such a way that it does not rely on the internal operations of the asynchronous method.
As for the second part of your question, Task is a reference type. It has the same memory overhead as any other reference type. If you declare N fields to house N Tasks, the references take up N * B bits, where B is 32 or 64 depending on whether the process is 32-bit or 64-bit. If you store them in a List, which is also a reference type, you'll probably end up consuming more memory, because List internally keeps a larger array than is necessary so it can append items more efficiently. If you store N Tasks in an Array of N elements, you'll end up with (N + 1) * B (could be more, not sure), since the array is also a reference. Keep in mind that a completed Task<List<T>> holds a reference to its result, so storing the Task keeps the List alive as well and does not save memory compared with storing the List itself.
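Both points can be checked for async methods in general with a small sketch. It makes no claim about how Entity Framework implements ToListAsync: GetAsync and its database parameter are hypothetical, with the pending database call replaced by a task that is completed by hand.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HotTaskSketch
{
    public static bool QueryStarted;

    // `database` stands in for the pending database call; it is hypothetical.
    public static async Task<List<int>> GetAsync(Task database)
    {
        // Everything before the first await runs synchronously,
        // as part of the call itself, whether or not anyone awaits the Task.
        QueryStarted = true;

        await database.ConfigureAwait(false);
        return new List<int> { 1, 2, 3 };
    }
}
```

After var task = HotTaskSketch.GetAsync(pending), QueryStarted is already true even though the task has not been awaited and is not yet complete. Once it completes, every await of that same task returns the same List instance, which shows that the Task itself keeps the result alive.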

Difference between Find and FindAsync

I am writing a very, very simple query which just gets a document from a collection according to its unique Id. After some frustration (I am new to mongo and the async / await programming model), I figured this out:
IMongoCollection<TModel> collection = // ...
FindOptions<TModel> options = new FindOptions<TModel> { Limit = 1 };
IAsyncCursor<TModel> task = await collection.FindAsync(x => x.Id.Equals(id), options);
List<TModel> list = await task.ToListAsync();
TModel result = list.FirstOrDefault();
return result;
It works, great! But I keep seeing references to a "Find" method, and I worked this out:
IMongoCollection<TModel> collection = // ...
IFindFluent<TModel, TModel> findFluent = collection.Find(x => x.Id == id);
findFluent = findFluent.Limit(1);
TModel result = await findFluent.FirstOrDefaultAsync();
return result;
As it turns out, this too works, great!
I'm sure that there's some important reason that we have two different ways to achieve these results. What is the difference between these methodologies, and why should I choose one over the other?

The difference is in the syntax.
Find and FindAsync both allow you to build an asynchronous query with the same performance. The only differences are:
FindAsync returns a cursor, which doesn't load all documents at once and gives you an interface to retrieve documents one by one from the DB cursor. It's helpful when the query result is huge.
Find gives you a simpler syntax through methods such as ToListAsync, which internally retrieves the documents from the cursor and returns them all at once.

Imagine that you execute this code in a web request. If the query runs as a synchronous call (for example, if a Find chain is ended with ToList() rather than ToListAsync()), the request's thread will be blocked until the database returns the results. If it's a long database operation that takes seconds to complete, one of the threads available to serve web requests will be doing nothing but waiting for the database, wasting valuable resources (the number of threads in the thread pool is limited).
With FindAsync, the thread of your web request is free while it waits for the database to return the results, which means that during the database call this thread can attend to another web request. When the database returns the result, the code continues execution.
For long operations like reads and writes on the file system, database operations, or communicating with other services, it's a good idea to use async calls, because while you are waiting for the results the threads are available to serve other web requests. This is more scalable.
Take a look at this Microsoft article: https://msdn.microsoft.com/en-us/magazine/dn802603.aspx.

Which is quickest, Loop or FirstOrDefault()

I'm trying to improve performance on a WPF app, as my users are saddened by the fact that one part of the system seems to have a performance issue. The part in question is a screen which shows logged-in users. The slow part logs them out by scanning in their employee ref, finding their child control and removing it from the parent, i.e. logging them out. This currently uses a loop.
foreach (var userControl in UsersStackPanel.Children)
{
    if (userControl.Employee.EmpRef == employee.EmpRef)
    {
        // plus some DB stuff here
        UsersStackPanel.Children.Remove(userControl);
        break;
    }
}
but I've an alternative which does this,
var LoggedInUser = (UI.Controls.Generic.User)UsersStackPanel.Children
    .OfType<FrameworkElement>()
    .FirstOrDefault(e => e.Name == "EmpRef" + employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
I've timed both using the Stopwatch class but the results don't point to which is better; they both return 0 milliseconds. I am wondering if the DB part is the bottleneck. I just thought I'd start with the screen to improve things there as well as the DB updates.
Any thoughts appreciated.
Dan

It seems to me that your second example should look more like this:
UI.Controls.Generic.User LoggedInUser = UsersStackPanel.Children
    .OfType<UI.Controls.Generic.User>()
    .FirstOrDefault(e => e.Employee.EmpRef == employee.EmpRef);
if (LoggedInUser != null)
{
    // some DB stuff here
    UsersStackPanel.Children.Remove(LoggedInUser);
}
But regardless, unless you have hundreds of thousands of controls in your StackPanel (and if you do, you have bigger fish to fry than this loop), the database access will completely swamp and make irrelevant any performance difference in the two looping techniques.
There's not enough context in your question to know for sure what the correct thing to do here is, but in terms of keeping the UI responsive, most likely what you'll want to wind up doing is wrapping the DB access in a helper method and then executing that method as an awaited task, e.g. await Task.Run(() => DoSomeDBStuff()); That will let the UI thread (which is presumably the thread that executes the code you posted) continue working while the DB operations go on. When the DB stuff is done, your method will continue execution at the next statement, i.e. the call to the Remove() method.
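A minimal sketch of that shape, with the WPF parts swapped out so it runs anywhere: a List<string> of employee refs stands in for UsersStackPanel.Children, and DoSomeDBStuff is a hypothetical blocking helper that sleeps instead of touching a database. All names here are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LogoutSketch
{
    public static bool DbWorkRanOnPoolThread;

    // Hypothetical stand-in for the blocking database work.
    private static void DoSomeDBStuff(string empRef)
    {
        DbWorkRanOnPoolThread = Thread.CurrentThread.IsThreadPoolThread;
        Thread.Sleep(50);
    }

    public static async Task<bool> LogOutAsync(List<string> loggedInEmpRefs, string empRef)
    {
        int index = loggedInEmpRefs.IndexOf(empRef);
        if (index < 0)
        {
            return false; // nobody with that reference is logged in
        }

        // The calling thread is released here until the database work is done.
        await Task.Run(() => DoSomeDBStuff(empRef));

        // Execution resumes here afterwards; in a WPF event handler this
        // continuation would run back on the UI thread.
        loggedInEmpRefs.RemoveAt(index);
        return true;
    }
}
```

In the real handler the list lookup would be the FirstOrDefault call shown earlier and the last step would be UsersStackPanel.Children.Remove(LoggedInUser).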
