async calls in foreach loops - c#

More of a concept question...
Is there any reason it would be a bad idea to make async calls to my DataContext within a foreach loop such as this?
private async Task ProcessItems(List<Item> items)
{
    var modifiedItems = new List<ModifiedItem>();
    foreach (var item in items)
    {
        // **edited to reflect the link between items and properties**
        var properties = await _context.Properties
            .Where(p => p.Condition == item.Condition)
            .ToListAsync();

        foreach (var property in properties)
        {
            // do something to modify 'item' based on the value of
            // 'property', saving the result to a 'modifiedItem'
            modifiedItems.Add(modifiedItem);
        }
    }
    await _context.ModifiedItems.AddRangeAsync(modifiedItems);
    await _context.SaveChangesAsync();
}
Since the inner foreach loop is dependent upon the properties variable, does it not start until the properties variable has been fully instantiated?
Being that the modifiedItems variable is declared outside of the parent foreach loop, is it a bad idea to asynchronously add each modifiedItem to the modifiedItems list?
Are there any properties or methods in Entity Framework, LINQ, etc. that would be better suited for this kind of task? Any better idea than nested foreach loops?
(In case anyone wants some context... IRL, items is a list of readings from sensors. And properties are mathematical equations to convert the raw readings to meaningful data such as volume and weight in different units... then those calculated data points are being stored in a database.)

No, there is none; however, you're missing a few conventions here.
Since the method is async, it should be named ProcessItemsAsync and return a Task (which it already does).
This would be of use to you: Async/Await - Best Practices in Asynchronous Programming.
Depending on your needs, it is also recommended to add a CancellationToken parameter and to consider exception handling; just be careful not to swallow exceptions.
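For illustration, a minimal sketch of those conventions applied to the question's method (the CancellationToken parameter is an addition, not part of the original code):

// Sketch only: renamed with the Async suffix and accepting an optional
// CancellationToken, which is passed through to the EF Core async calls.
private async Task ProcessItemsAsync(List<Item> items, CancellationToken cancellationToken = default)
{
    var modifiedItems = new List<ModifiedItem>();
    foreach (var item in items)
    {
        var properties = await _context.Properties
            .Where(p => p.Condition == item.Condition)
            .ToListAsync(cancellationToken);
        // ... build modifiedItems as before ...
    }
    await _context.ModifiedItems.AddRangeAsync(modifiedItems, cancellationToken);
    await _context.SaveChangesAsync(cancellationToken);
}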

There is no issue with using async here; once you await an async call, the returned object is the same.
You may want to rethink executing a DB call in a foreach if you can run it once outside the loop and filter the data in memory to act on it. Each use case is different, and you have to make sure the larger result set can be handled in memory.
Usually, getting 1000 rows from a DB once is faster than getting 100 rows 10 times.

In this specific case, I'd instead write it to load a single list of all properties for all sensors of interest (considering all items, and based on the macAddress/sensorKey properties you've mentioned), and store that in a list. We'll call that allProperties. We await that once and avoid making repeated database calls.
Then use LINQ to Objects to join your items to the objects in allProperties that they match. Iterate the results of that join, and there's no await to do inside the loop.
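A hedged sketch of that shape, keyed on the item.Condition property from the question's code (in the real model the join key might be the macAddress/sensorKey pair instead):

// One awaited query up front, then a LINQ-to-Objects join; no awaits in the loop.
var conditions = items.Select(i => i.Condition).Distinct().ToList();
var allProperties = await _context.Properties
    .Where(p => conditions.Contains(p.Condition))
    .ToListAsync();

var matches = from item in items
              join property in allProperties
                  on item.Condition equals property.Condition
              select new { item, property };

foreach (var match in matches)
{
    // modify match.item based on match.property; purely in memory
}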

Yes, this would be a bad idea.
The method you propose performs one database query per item in your sequence of items. Apart from being less efficient, this might also produce an incoherent result set: someone might have changed the database between your first and your last query, so the first returned object would belong to a different database state than the result of the last query.
A simplified example: Let's query persons, with the name of the company they work for:
Table of Companies:

Id  Name
1   "Better Bakery"
2   "Google Inc"

Table of Persons:

Id  Name    CompanyId
1   "John"  1
2   "Pete"  1
Suppose I want the Id / Name / CompanyName of the Persons with Id {1, 2}.
Between your first and your second query, someone changes the name of the bakery to "Best Bakery". Your result would be:
PersonId  Name    CompanyName
1         "John"  "Better Bakery"
2         "Pete"  "Best Bakery"
Suddenly John and Pete work for different companies?
And what would be the result if you asked for Persons with Id {1, 1}? The same person would work for two different companies?
Surely this is not what you want!
Besides the result being incorrect, your database can optimize its queries if it knows that you want the results for several items at once. The database creates temporary tables to calculate the result; these tables would have to be generated only once.
So if you want one consistent set of data that represents the state of the data at the moment of your query, create your data using one query:
The same result in one database query:
var itemConditions = items.Select(item => item.Condition);
IQueryable<Property> queryProperties = dbContext.Properties
    .Where(property => itemConditions.Contains(property.Condition));
In words: from every item in your input sequence of Items (in your case a list), take the Condition.
Then query the database table Properties, keeping only those rows whose Condition is one of the Conditions in itemConditions.
Note that no query or enumeration has been performed yet. The database has not been accessed. Only the Expression in the IQueryable has been modified.
To perform the query:
List<Property> fetchedProperties = await queryProperties.ToListAsync();
If you want, you could concatenate these statements into one big LINQ statement. I doubt whether this would improve performance; it would surely hurt readability, and thus testability and maintainability.
Once you've fetched all desired properties in one database query, you can change them:
foreach (Property fetchedProperty in fetchedProperties)
{
    // change some values
}
await dbContext.SaveChangesAsync();
As long as you keep your dbContext alive, it will remember the state of all fetched rows from all tables in dbContext.ChangeTracker.Entries. This is a sequence of DbEntityEntry objects, one per fetched row. Every DbEntityEntry holds the originally fetched values and the current values; they are used to identify which items have changed and thus need to be saved in one database transaction during SaveChanges().
Hence it is seldom necessary to actively notify your DbContext that you changed a fetched object.
The only use case I see is when you want to change an object without first adding or fetching it, which can be dangerous, as some of the properties of the object might have been changed by someone else.
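That use case would look roughly like this (a sketch; the Property entity and its Value member are stand-ins for whatever your model actually has):

// Sketch: modifying a row without fetching it first. Risky, because other
// columns may have been changed by someone else in the meantime.
var property = new Property { Id = knownId };  // stub entity, only the key is set
dbContext.Properties.Attach(property);
property.Value = newValue;                     // hypothetical column
dbContext.Entry(property).Property(p => p.Value).IsModified = true;
await dbContext.SaveChangesAsync();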


There is already an open DataReader associated with this Command without ToList()

I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
    if (data.Any())
    {
        foreach (var pstuu in data)
        {
            if (pstuu?.Id_user != null)
            {
                db.Entry(pstuu).Reference(q => q.Users).Load();
            }
        }
    }
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are based on this method it's probably a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or, in your LoadDependent(), doing something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is open. This is because your foreach call as-is would be iterating over the IQueryable, meaning EF's DataReader would be left open, so further calls to db (which I'd assume is a module-level variable for the injected DbContext) cannot be made.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. This just delegates the load call asynchronously, and without awaiting it, it too would fail, just not raise the exception on the continuation thread.
As mentioned in the comments to your question this is a very poor design choice to handle loading references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a Ques with its associated User eager loaded:
var ques = db.Ques
    .Include(x => x.AspNetUsers)
    .Where(x => x.SomeCondition == someCondition)
    .ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will execute with one query to the DB.
Select n+1 scenarios are bad because in the case where 1000 Ques are returned with a call to fetch dependencies you get:
var ques = db.Ques
    .Where(x => x.SomeCondition == someCondition)
    .ToList(); // 1 query

foreach (var q in ques)
    db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
This then looks problematic when later code wants to offer pagination, such as taking only 25 items where the total record count could run into the tens of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils: with lazy loading, AspNetUsers would only be selected if a returned Ques actually referenced it, and only for those Ques that do. So if the pagination only "touched" 25 rows, lazy loading would result in 26 queries. Lazy loading is a trap, however, as later code changes could inadvertently lead to performance issues appearing in seemingly unrelated areas, as new references or code changes result in far more references being "touched", each kicking off a query.
If you are going to pursue a LoadDependent() type method, then you need to ensure that it is called as late as possible, once you have a known set size to load, because you will need to materialize the collection to load related entities with the same DbContext instance (i.e. after pagination). Trying to work around it using detached instances (AsNoTracking()) or a completely new DbContext instance may give you some headway, but will invariably lead to more problems later, as you will have a mix of tracked and untracked entities, or worse, entities tracked by different DbContexts, depending on how these loaded entities are consumed.
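In other words, materialize a known set size first, then load references with the same DbContext instance; a rough sketch (page size and ordering are illustrative, and LoadDependent is the question's hypothetical method):

// Sketch: paginate in the database, materialize the page, then load
// related entities as late as possible with the same DbContext.
var page = db.Ques
    .Where(x => x.SomeCondition == someCondition)
    .OrderBy(x => x.Id)        // a stable order is required for paging
    .Skip(pageIndex * 25)
    .Take(25)
    .ToList();                 // known, small set is materialized here

LoadDependent(page);           // now loads references for 25 rows, not thousands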
An alternative some teams pursue, rather than a LoadReference() type method, is an IncludeReference() type method. The goal here is to build .Include statements into the IQueryable. This can be done two ways: either by magic strings (property names) or by passing in expressions for the references to include. Again, this can turn into a bit of a rabbit hole when handling more deeply nested references (i.e. building .Include().ThenInclude() chains). It avoids the Select n+1 issue by eager loading the required related data.
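A minimal sketch of the expression-based flavour (the helper name is illustrative; it assumes EF6's System.Data.Entity Include extension, and EF Core's equivalent works the same way for this simple, non-nested case):

// Hypothetical IncludeReferences helper: applies a set of Include expressions
// so eager loading stays composable on the IQueryable.
public static class QueryExtensions
{
    public static IQueryable<T> IncludeReferences<T>(
        this IQueryable<T> query,
        params Expression<Func<T, object>>[] includes) where T : class
    {
        foreach (var include in includes)
            query = query.Include(include);
        return query;
    }
}

// Usage:
var ques = db.Ques
    .IncludeReferences(x => x.AspNetUsers)
    .Where(x => x.SomeCondition == someCondition)
    .ToList();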
I have solved the problem by deleting the Load method and using Include() in my first query of the data to load the reference data in the navigation property.

Queryable Linq Query Differences In Entity Framework

I have a very simple many to many table in entity framework connecting my approvals to my transactions (shown below).
I am trying to do a query inside the approval object to count the number of transactions on the approval, which should be relatively easy.
If I do something like this then it works super fast.
int count;
EntitiesContainer dbContext = new EntitiesContainer();
var aCnt = from a in dbContext.Approvals
           where a.id == id
           select a.Transactions.Count;
count = aCnt.First();
However, when I do this
count = Transactions.Count;
or this
count = Transactions.AsQueryable<Transaction>().Count();
it's exceedingly slow. I have traced the SQL running on the server, and it does indeed seem to be loading all the transactions instead of just running a COUNT query on the collection of Transactions.
Can anyone explain to me why?
Additional:
Here is how the EF model looks with regard to these two classes.
UPDATE:
Thanks for all the responses. I believe where I went wrong was in assuming that the collections attached to the Approval object would execute as IQueryable. I'm going to have to execute the count against the dbContext object.
Thanks everyone.
var aCnt = from a in dbContext.Approvals
           where a.id == id
           select a.Transactions.Count;
EF compiles the query itself; the query above will be compiled into a single SELECT COUNT over the transactions.
Unlike,
count = Transactions.AsQueryable<Transaction>().Count();
count = Transactions.Count;
these will select all the records from the Transactions collection and then compute the count in memory.
When you access the a.Transactions property, you load the whole list of transactions (lazy loading). If you want to get only the Count, use something like this:
dbContext.Transactions.Where(t => t.Approvals.Any(ap => ap.Id == a.Id)).Count();
where a is the given Approval.
Your first method allows the counting to take place at the database server level. It asks the database not to return the records themselves, but the number of records found. This is the most efficient method.
This is not to say that other methods can't work as efficiently, but in the other two lines you are not making it clear up front that you are retrieving transactions via a join on Approvals. Instead, you take the Transactions collection just by itself and do a count on that, basically forcing the collection to be filled so it can be counted.
Your first snippet causes a query to be executed on the database server. It works that way because the IQueryable instance is of type ObjectQuery, provided by the Entity Framework, which performs the necessary translation to SQL and then executes it.
The second snippet illustrates working with IEnumerable instances. Count() works on them by, in the worst case, enumerating the entire collection.
In the third snippet you attempt to make the IEnumerable an IQueryable again. But the Enumerable.AsQueryable method has no way of knowing that the IEnumerable it is given "came" from Entity Framework. The best it can do is wrap the IEnumerable in an EnumerableQuery instance, which simply dynamically compiles the expression trees given to all the LINQ query operators and executes them in memory.
If you need the count to be calculated by the database server, you can either formulate the requisite query manually (that is, write what you already did in snippet one), or use the CreateSourceQuery method, available to you if you're not using Code First. Note that it will really be executed on the database server, so if you have modified the collection and have not yet saved changes, the result will differ from what would be returned by calling Count directly.
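With CreateSourceQuery, that could look like this (a sketch; it assumes Transactions is an EntityCollection on a non-Code-First model):

// Sketch: composes a COUNT query against the database server instead of
// loading the Transactions collection into memory.
int count = approval.Transactions.CreateSourceQuery().Count();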

Getting weird behavior when retrieving data from Microsoft CRM using LINQ

I'm having this problem accessing the Contact entity using LINQ.
I have the 2 functions below.
If I run the 1st function and then call the 2nd one, I seem to be missing a lot of fields in the 2nd query: fields like firstname and lastname just show up as null values. The only fields that show up correctly in both runs are Id, ContactId and new_username. If I run the 2nd function on its own, I get the right data.
Any ideas what am I doing wrong?
Thanks a lot
Here are the 2 functions
public List<String> GetContactsUsernameOnly()
{
    IQueryable<String> _records = from _contactSet in _flinsafeContext.ContactSet
                                  where _contactSet.new_FAN == "username"
                                  orderby _contactSet.new_username
                                  select _contactSet.new_username;
    return _records.ToList();
}

public List<Contact> GetContacts()
{
    IQueryable<Contact> _records = from _contactSet in _flinsafeContext.ContactSet
                                   where _contactSet.new_FAN == "my-username-here"
                                   orderby _contactSet.new_username
                                   select _contactSet;
    return _records.ToList();
}
It is because you are reusing the same CRM context when you call both methods (in your case _flinsafeContext).
What the context does is cache records, so the first method is returning your contact but only bringing back the new_username field.
The second method wants to return the whole record, but when it is called after the first one, the record already exists in the context, so it just returns that, despite only having the one field populated. It is not clever enough to lazy-load the fields that have not been populated. If this method were called first, the record wouldn't exist in the context, so the whole record would be returned.
There are 2 ways to get around this:
1) Don't reuse CRM contexts. Instead, create a new one in each method, based on a singleton IOrganizationService.
2) There is a ClearChanges() method on your context which means that the next time you do a query, it will go back to CRM and fetch the fields you have selected. This also clears any unsaved creates/updates/deletes, so you have to be careful about what state the context is in.
As an aside, creating a new CRM context isn't an intensive operation, so it's not often worthwhile passing contexts around and reusing them. It is creating the underlying OrganizationService that is the slowest part.
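A sketch of option 1 (XrmServiceContext stands in for whatever generated OrganizationServiceContext subclass you actually use, and _service for the singleton IOrganizationService):

// Sketch: a fresh context per method means no stale cached records between queries.
public List<Contact> GetContacts()
{
    using (var context = new XrmServiceContext(_service))  // hypothetical generated context
    {
        return (from c in context.ContactSet
                where c.new_FAN == "my-username-here"
                orderby c.new_username
                select c).ToList();
    }
}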
This behaviour can be so painful, because it is horribly inefficient and slow to return the entire record so you WANT to be selecting only the fields you want for each query.
And here's how you return just the fields you want:
IEnumerable<ptl_billpayerapportionment> bpas = context.ptl_billpayerapportionmentSet
    .Where(bm => bm.ptl_bill.Id == billId)
    .Select(bm => new ptl_billpayerapportionment()
    {
        Id = bm.Id,
        ptl_contact = bm.ptl_contact
    });
This ensures a much smaller SQL statement is executed against the context, as Id and ptl_contact are the only two fields being returned. But as Ben says above, further retrievals against the same entity in the same context will return nulls for fields not included in the initial select (as per the OP's question).
For bonus points, using IEnumerable and creating a new, lightweight, entity gives you access to the usual LINQ methods, e.g. .Any(), .Sum() etc. The CRM SDK doesn't like using them against var datasets, apparently.
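For example (a small illustration continuing the bpas query above; the predicate is an assumption):

// In-memory LINQ over the lightweight entities fetched above.
bool anyWithContact = bpas.Any(bm => bm.ptl_contact != null);
int apportionmentCount = bpas.Count();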

Rules for LINQ to SQL across method boundaries

To keep my code cleaner I often try to break down parts of my data access code in LINQ to SQL into private sub-methods, just like with plain-old business logic code. Let me give a very simplistic example:
public IEnumerable<Item> GetItemsFromRepository()
{
    var setA = from a in this.dataContext.TableA
               where /* criteria */
               select a.Prop;
    return DoSubQuery(setA);
}

private IEnumerable<Item> DoSubQuery(IEnumerable<DateTimeOffset> set)
{
    return from item in set
           where /* criteria */
           select item;
}
I'm sure no one's imagination would be stretched by imagining more complex examples with deeper nesting or using results of sets to filter other queries.
My basic question is this: I've seen some significant performance differences and even exceptions being thrown by just simply reorganizing LINQ to SQL code in private methods. Can anyone explain the rules for these behaviors so that I can make informed decisions about how to write efficient, clean data access code?
Some questions I've had:
1) When does passing a System.Data.Linq.Table instance to a method cause query execution?
2) When does using a System.Data.Linq.Table in another query cause execution?
3) Are there limits to what types of operations (Take, First, Last, OrderBy, etc.) can be applied to a System.Data.Linq.Table passed as a parameter into a method?
The most important rule in terms of LINQ-to-SQL would be: don't return IEnumerable<T> unless you must - as the semantic is unclear. There are two schools of thought beyond that:
if you return IQueryable<T>, it is composable, meaning the where from later queries is combined to make a single TSQL, but as a down-side, it is hard to fully test
otherwise, return List<T> or similar, so it is clear that everything beyond that point is LINQ-to-Objects
Currently, you are doing something in the middle: collapsing it to LINQ-to-Objects (via IEnumerable<T>), but without it being obvious - and keeping the connection open in the middle (again, only a problem because it isn't obvious)
Remove the implicit cast:
public IQueryable<Item> GetItemsFromRepository()
{
    var setA = from a in this.dataContext.TableA
               where /* criteria */
               select a.Prop;
    return DoSubQuery(setA);
}

private IQueryable<Item> DoSubQuery(IQueryable<DateTimeOffset> set)
{
    return from item in set
           where /* criteria */
           select item;
}
The implicit cast from IQueryable<Item> to IEnumerable<Item> is essentially the same as calling AsEnumerable() on your IQueryable<Item>. There are of course times when you want that, but you should leave things as IQueryable by default, so that the entire query can be performed on the database, rather than merely the GetItemsFromRepository() bit with the rest being done in memory.
The secondary questions:
1) When does passing a System.Data.Linq.Table instance to a method cause query execution?
When something needs a final result, such as Max(), ToList(), etc.: anything that is neither a queryable object nor a loaded-as-it-goes enumerable.
Note, though, that while AsEnumerable() does not cause query execution, it does mean that when execution does happen, only the part before the AsEnumerable() will be performed against the source data source; this then produces an on-demand, in-memory data source against which the rest is performed.
2) When does using a System.Data.Linq.Table in another query cause execution?
The same as above. Table<T> implements IQueryable<T>. If you e.g. join two of them together, that won't yet cause anything to be executed.
3) Are there limits to what types of operations (Take, First, Last, OrderBy, etc.) can be applied to a System.Data.Linq.Table passed as a parameter into a method?
Those that are defined by IQueryable<T>.
Edit: Some clarification on the differences and similarities between IEnumerable and IQueryable.
Just about anything you can do on an IQueryable you can do on an IEnumerable and vice-versa, but how it's performed will be different.
Any given IQueryable implementation can be used in linq queries and will have all the linqy extension methods like Take(), Select(), GroupBy and so on.
Just how this is done depends on the implementation. For example, System.Data.Linq.Table implements those methods by turning the query into an SQL query, the results of which are turned into objects on an as-loaded basis. So if mySource is a table then:
var filtered = from item in mySource
               where item.ID < 23
               select new { item.ID, item.Name };

foreach (var i in filtered)
    Console.WriteLine(i.Name);
Gets turned into SQL like:
select id, name from mySourceTable where id < 23
And then an enumerator is created from that such that on each call to MoveNext() another row is read from the results, and a new anonymous object created from it.
On the other hand, if mySource were a List or a HashSet, or anything else that implements IEnumerable<T> but doesn't have its own query engine, then the LINQ-to-Objects code will turn it into something like:
foreach (var item in mySource)
    if (item.ID < 23)
        yield return new { item.ID, item.Name };
Which is about as efficient as that code could be in memory. The results will be the same, but the way of getting them would be different:
Now, since all IQueryable<T> can be converted into the equivalent IEnumerable<T> we could, if we wanted to, take the first mySource (where execution happens in a database) and do the following instead:
var filtered = from item in mySource.AsEnumerable()
               where item.ID < 23
               select new { item.ID, item.Name };
Here, while there is still nothing executed against the database until we iterate through the results or call something that examines all of those results, once we do so, it's as if we split the execution into two separate steps:
var asEnum = mySource.AsEnumerable();
var filtered = from item in asEnum
               where item.ID < 23
               select new { item.ID, item.Name };
The implementation of the first line would be to execute the SQL SELECT * FROM mySourceTable, and the execution of the rest would be like the LINQ-to-Objects example above.
It's not hard to see how, if the database contained 10 items with an id < 23, and 50,000 items with an id higher, this is now much, much less performant.
As well as offering the explicit AsEnumerable() method, all IQueryable<T> can be implicitly cast to IEnumerable<T>. This lets us do foreach on them and use them with any other existing code that handles IEnumerable<T>, but if we accidentally do it at an inappropriate time, we can make queries much slower. This is what was happening when your DoSubQuery was defined to take an IEnumerable<DateTimeOffset> and return an IEnumerable<Item>: it implicitly called AsEnumerable() on your IQueryable<DateTimeOffset> and your IQueryable<Item>, and caused what could have been performed on the database to be performed in memory.
For this reason, 99% of the time, we want to keep dealing in IQueryable until the very last moment.
As an example of the opposite though, just to point out that AsEnumerable() and the casts to IEnumerable<T> aren't there out of madness, we should consider two things. The first is that IEnumerable<T> lets us do things that can't be done otherwise, such as joining two completely different sources that don't know about each other (e.g. two different databases, a database and an XML file, etc.)
Another is that sometimes IEnumerable<T> is actually more efficient too. Consider:
IQueryable<IGrouping<string, int>> groupingQuery = from item in mySource
                                                   group item.ID by item.Name;
var list1 = groupingQuery.Select(grp => new { Name = grp.Key, Count = grp.Count() }).ToList(); // fine

foreach (var grp in groupingQuery) // disaster!
    Console.WriteLine(grp.Count());
Here groupingQuery is set up as a queryable that does some grouping, but which hasn't been executed in any way. When we create list1, we first create a new IQueryable based on it, and the query engine does its best to work out the best SQL for it, coming up with something like:
select name, count(id) from mySourceTable group by name
Which is pretty efficiently performed. Then the rows are turned into objects, which are then put into a list.
On the other hand, with the second query there isn't as natural an SQL conversion for a group by that doesn't perform aggregate methods on all of the non-grouped items, so the best the query engine can come up with is to first do:
select distinct name from mySourceTable
And then for every name it receives, to do:
select id from mySourceTable where name = '{name found in last query goes here}'
And so on, whether this means 2 SQL queries or 200,000.
In this case, we're much better off working on mySource.AsEnumerable(), because here it is more efficient to grab the whole table into memory first. (Even better still would be to work on mySource.Select(item => new {item.ID, item.Name}).AsEnumerable(), because then we still only retrieve the columns we care about from the database, and switch to in-memory at that point.)
The last bit is worth remembering because it breaks our rule that we should stay with IQueryable<T> as long as possible. It isn't something to worry about much, but it is worth keeping an eye on if you do grouping and find yourself with a very slow query.

inserting multiple records through objectset

Is there any way to insert multiple records without a loop if you already have a collection? Granted there is little performance benefit if you commit outside the loop.
I have a context and ObjectSet<Type>. I also have a collection of Type which I want to swap in or concatenate or intersect or what have you for what's in the table. In other words, I don't want to proceed from the below:
foreach (Type r in Collection)
{
    Context.ObjectSet.Add(r);
}
Context.SaveChanges();
You must always use a loop. There cannot be any performance benefit from methods like AddCollection or AddRange, because such methods usually rely on an optimization in which the internal collection is extended for the whole range at once and the items are copied in, instead of the collection growing on each Add call. AddObject does much more than pass data to some internal collection, so it still has to process entities one by one.
If you want to optimize the performance of the database inserts themselves, you must move to another solution, because EF doesn't have any batch or bulk data modification: each record is sent as a single insert in a separate roundtrip to the database.
You could try creating an extension method to do the looping and adding; see below.
public static class AddHelper
{
    public static void AddAll<T>(this ObjectSet<T> objectSet, IEnumerable<T> items)
        where T : class
    {
        foreach (var item in items)
        {
            objectSet.AddObject(item);
        }
    }
}
Then you could use the code below.
IEnumerable<EntityName> list = GetEntities(); //this would typically return your list
Context.ObjectSet.AddAll(list);
I'm looking for the same type of solution.
I have yet to find a way to do it directly with EF, but there is a workaround: if you create a stored procedure that accepts a DataTable as a parameter, you can execute the proc with the Entity context and pass the table as a parameter.
When you're dealing with thousands or millions of records, this is a lot better than looping through each record to pass to the DB.
Passing Table Valued Parameters to SQL Server Stored Procedures.
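A rough sketch of that approach (the table type, column, and procedure names are assumptions for illustration):

// Sketch: build a DataTable matching a user-defined table type, then pass it
// to a stored procedure as a table-valued parameter through the ObjectContext.
// Requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Name", typeof(string));       // hypothetical column

foreach (var item in collection)
    table.Rows.Add(item.Name);

var parameter = new SqlParameter("@items", SqlDbType.Structured)
{
    TypeName = "dbo.ItemTableType",              // hypothetical table type
    Value = table
};

// One roundtrip for the whole collection instead of one insert per record.
context.ExecuteStoreCommand("EXEC dbo.InsertItems @items", parameter);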
