People asking me to fix an N+1 error? [duplicate] - c#

This question already has answers here:
What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?
(19 answers)
Closed 8 years ago.
I have never heard of it, but people are referring to an issue in an application as an "N+1 problem". They are working on a LINQ to SQL based project, and someone has identified a performance problem. I don't quite understand it - but hopefully someone can steer me.
It seems that they are trying to get a list of objects, and then the foreach after that is causing too many database hits.
From what I understand, the second part of the data is only being loaded inside the foreach.
So, list of items loaded:
var program = programRepository.SingleOrDefault(r => r.ProgramDetailId == programDetailId);
And then later, we make use of this list:
foreach (var phase in program.Program_Phases)
{
    phase.Program_Stages.AddRange(stages.Where(s => s.PhaseId == phase.PhaseId));
    phase.Program_Stages.ForEach(s =>
    {
        s.Program_Modules.AddRange(modules.Where(m => m.StageId == s.StageId));
    });
    phase.Program_Modules.AddRange(modules.Where(m => m.PhaseId == phase.PhaseId));
}
It seems the problem identified is that they expected 'program' to contain its children. But when we refer to the child in the query, it reloads the program:
program.Program_Phases
They're expecting program to be fully loaded and in memory, and the profiler seems to indicate that the program table, with all the joins, is being queried on each 'foreach'.
Does this make sense?
(EDIT: I found this link:
Does linq to sql automatically lazy load associated entities?
This might answer my question, but... they're using that nicer (from person in ...) notation, as opposed to this strange (x => x...) one. So if this link is the answer - i.e., we need to 'join' in the query - can that be done?)

In ORM terminology, the 'N+1 select problem' typically occurs when you have an entity that has nested collection properties. It refers to the number of queries that are needed to completely load the entity data into memory when using lazy-loading. In general, the more queries, the more round-trips from client to server and the more work the server has to do to process the queries, and this can have a huge impact on performance.
There are various techniques for avoiding this problem. I am not familiar with Linq to SQL but NHibernate supports eager fetching which helps in some cases. If you do not need to load the entire entity instance then you could also consider doing a projection query. Another possibility is to change your entity model to avoid having nested collections.
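For instance, in the question's LINQ to SQL setting, a projection query that pulls only the needed columns runs as a single SELECT; a minimal sketch (the context variable and the ProgramId/Name properties are assumptions about the model):

var phaseNames = context.Program_Phases
    .Where(ph => ph.ProgramId == programId)     // hypothetical foreign key
    .Select(ph => new { ph.PhaseId, ph.Name })  // only the columns we need
    .ToList();                                  // one SELECT of just these columns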

For performant LINQ, first work out exactly which properties you actually care about. The one advantage LINQ has performance-wise is that it makes it easy to leave out retrieval of data you won't use (you can always hand-code something that does better than LINQ, but LINQ lets you do this without creating a library full of hundreds of classes for slight variants of what you leave out each time).
You say "list" a few times. Don't go obtaining lists unless you'll re-use the same list more than once. Otherwise, working one item at a time will perform better 99% of the time.
The "nice" and "strange" syntaxes, as you put it, are different ways of saying the same thing. There are some things that can only be expressed with methods and lambdas, but the query form can always be expressed as method calls - indeed that is what it is compiled to. The likes of from b in someSource where b.id == 21 select b.name becomes someSource.Where(b => b.id == 21).Select(b => b.name) after compilation anyway.
You can set DataContext.LoadOptions to define exactly which entities you want loaded with which object (and best of all, set it differently in different uses quite easily). This can solve your N+1 issue.
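In LINQ to SQL that looks roughly like this; a minimal sketch using the question's entity names (the Program_Phase type name and the context variable are assumptions):

var options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Program>(p => p.Program_Phases);          // fetch phases with each program
options.LoadWith<Program_Phase>(ph => ph.Program_Stages);  // and stages with each phase
context.LoadOptions = options; // must be set before the first query runs on this context

var program = context.Programs.SingleOrDefault(r => r.ProgramDetailId == programDetailId);
// program.Program_Phases is now populated without further round-trips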
Alternatively, you might find it easier to update existing entities or insert new ones by setting the appropriate id yourself.
Even if you don't find it easier, it can be faster in some cases. Normally I'd say go with whichever is easier and forget about performance as far as this choice goes, but if performance is a concern here (and you say it is), then it can be worth profiling to see which of the two works better.

Related

Single Select() statement or multiple for transformation with multiple steps?

In C#, I have a collection of objects that I want to transform to a different type. This conversion, which I would like to do with LINQ Select(), requires multiple operations in sequence. To perform these operations, is it better to chain together multiple Select() queries like
resultSet.Select(stepOneDelegate).Select(stepTwoDelegate).Select(stepThreeDelegate);
or instead to perform these three steps in a single call?
resultSet.Select(item => stepThree(stepTwo(stepOne(item))));
Note: The three steps themselves are not necessarily functions. They are meant to be a concise demonstration of the problem. If that has an effect on the answer please include that information.
Any performance difference would be negligible, but the definitive answer to that is simply "test it". The question is more about readability: which one is easier to understand and reason about?
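If you do want to test it, a minimal sketch with Stopwatch (stepOneDelegate and friends are the hypothetical steps from the question; the ToList() calls force the otherwise lazy Select chains to actually run):

var sw = System.Diagnostics.Stopwatch.StartNew();
var chained = resultSet.Select(stepOneDelegate).Select(stepTwoDelegate).Select(stepThreeDelegate).ToList();
sw.Stop();
Console.WriteLine("chained: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var single = resultSet.Select(item => stepThree(stepTwo(stepOne(item)))).ToList();
sw.Stop();
Console.WriteLine("single: " + sw.ElapsedMilliseconds + " ms");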
Cases where I have needed to project on a projection include working with EF LINQ expressions where I ultimately need to do something that isn't supported by EF LINQ, so I materialize a projection (usually to an anonymous type) and then finish the expression before selecting the final output. In those cases you would need to use the first example.
Personally, I'd probably just stick to the first scenario, as to me it is easy to understand what is going on, and it easily supports additions of other operations such as filtering with Where or ordering with OrderBy. The second scenario only really comes up when I'm building a composite model:
.Select(x => new OuterModel
{
    Id = x.Id,
    InnerModel = x.Inner.Select(i => new InnerModel
    {
        // ...
    })
}) // ...
In most cases, though, this can be handled through AutoMapper.
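For reference, a sketch of the AutoMapper route (ProjectTo lives in AutoMapper.QueryableExtensions; the Outer/Inner entity types here are hypothetical):

var config = new MapperConfiguration(cfg =>
{
    cfg.CreateMap<Outer, OuterModel>();  // AutoMapper derives the nested projection
    cfg.CreateMap<Inner, InnerModel>();
});

var models = dbContext.Outers
    .ProjectTo<OuterModel>(config)       // expanded into a single Select expression
    .ToList();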
I'd be wary of any code that I felt needed to chain a lot of Select expressions, as it would smell like trying to do too much in one expression chain. Making something easy to understand, even if it involves a few extra steps and might add a few milliseconds to the execution, is far better than risking bugs introduced because someone misunderstood complex-looking code.

Parallel Foreach with a Lazy Loaded List

I am using Entity Framework, and I have a list that I need to iterate through to do work on. I'm unable to do this work directly via the query on the database, and the list can be quite big, so I'm hoping I can use Parallel.ForEach or AsParallel.
The problem is that even when I load my list into memory using ToList() and then run it through a parallel function, it destroys my lazy loaded navigation properties.
I am running something simple like this (this has been simplified a lot for the purpose of this question):
var quoteList = DbContext.Quotes.ToList();
List<QuotesItem> quoteViewItemList = new List<QuotesItem>();
quoteList.AsParallel().ForAll(quote =>
{
    var quoteViewItem = new QuotesItem();
    quoteViewItem.YearEnding = quote.QuoteService.FirstOrDefault().Yearending; // is SOMETIMES null when it shouldn't be
    quoteViewItem.Manager = quote.Client.Manager.Name; // IS sometimes null
    quoteViewItem.... = ..... // More processing here
    quoteViewItemList.Add(quoteViewItem);
});
The problem is that QuoteService sometimes seems to be null even when it's not null in the list.
First let's talk about the issue.
The only cause I can imagine is the DbContext being disposed before .ForAll has finished its work, probably from a different thread - DbContext is not thread-safe, so lazy loads fired from multiple threads can also misbehave. But I am just guessing at the moment.
I can give you a couple of suggestions about possible optimizations of the code.
First, evaluating all Quotes eagerly with .ToList() probably has some performance impact if there are many records in the database, so my advice here is to exchange the eager evaluation for a SQL cursor.
This can be achieved by using any construct/code which internally uses IEnumerable<> and, more specifically, .GetEnumerator().
The internal implementation of IQueryable<> in EF creates a SQL cursor, which is pretty fast. The best part is that you can use it with .AsParallel() or Parallel.ForEach, both of which internally use the enumerators.
This is not a well-documented feature, but if you start SQL Server Profiler and execute the code below, you will see that only a single request is executed against the server; you can also notice that the RAM of the machine executing the code does not spike, which means it does not fetch everything at once.
I've found some random information about it if you're interested: https://sqljudo.wordpress.com/2015/02/24/entity-framework-hidden-cursors/
Although I've used this approach in cases where the code operating on the database records was pretty heavy, for a simple projection from one type to another, running the cursor with .AsParallel(...) or a simple foreach construct may perform similarly.
So using
DbContext.Quotes
    .AsParallel()
    .WithDegreeOfParallelism(...)
    .ForAll(...)
should run with good performance.
My second piece of advice is about accessing navigation properties in EF with lazy loading. This, as you know, leads to the N+1 selects issue.
So instead of depending on EF's lazy evaluation, in this case it is better to eagerly fetch the navigation properties, so we can rewrite the code above as:
DbContext.Quotes
    .Include(x => x.QuoteService)
    .AsParallel()
    .WithDegreeOfParallelism(...)
    .ForAll(...)
This way, when accessing the QuoteService navigation property, EF won't make any additional requests to the server, so this should dramatically improve your performance - and it may well fix the null reference issue too (I hope).
The generic version of the .Include(..) method which I am using is part of the System.Data.Entity namespace.
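The lambda overload of Include (and the AsNoTracking used below) are extension methods, so that namespace needs to be in scope:

using System.Data.Entity; // brings Include(x => x.Nav) and AsNoTracking() for IQueryable<T> into scope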
Further, if it is acceptable for your scenario, you can disable change tracking, which will gain you a bit more performance. So my final code would look like this:
var queryViewItems = DbContext.Quotes
    .AsNoTracking()
    .Include(x => x.QuoteService)
    .AsParallel()
    .WithDegreeOfParallelism(...)
    .Select(x => { ... })
    .ToList();

What's the most efficient way to get only the final row of a SQL table using EF4?

I'm looking to retrieve the last row of a table by the table's ID column. What I am currently using works:
var x = db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault();
Is there any way to get the same result with more efficient speed?
I cannot see why this would query through the entire table.
Do you not have an index on the ID column?
Can you add the results of analysing the query to your question, because this is not how it should be.
As well as the analysis results, add the SQL produced. I can't see how it would be anything other than select top 1 * from MyTable order by id desc, only with explicit column names and some aliasing. Nor, if there's an index on id, how it could be anything other than a scan on that index.
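In EF4 you can also see the generated SQL without a profiler; a minimal sketch (the cast works when the query comes from an ObjectContext, and Take(1) is used instead of FirstOrDefault() so the expression stays a query instead of executing):

var query = db.MyTable.OrderByDescending(d => d.ID).Take(1);
var objectQuery = query as System.Data.Objects.ObjectQuery;
if (objectQuery != null)
    Console.WriteLine(objectQuery.ToTraceString()); // prints the SELECT TOP (1) ... ORDER BY ... SQL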
Edit: That promised explanation.
Linq gives us a set of common interfaces, and in the case of C# and VB.NET some keyword support, for a variety of operations upon sources which return 0 or more items (e.g. in-memory collections, database calls, parsing of XML documents, etc.).
This allows us to express similar tasks regardless of the underlying source. Your query for example includes the source, but we could do a more general form of:
public static YourType FinalItem(IQueryable<YourType> source)
{
    return source.OrderByDescending(d => d.ID).FirstOrDefault();
}
Now, we could do:
IEnumerable<YourType> l = SomeCallThatGivesUsAList();
var x = FinalItem(db.MyTable);//same as your code.
var y = FinalItem(l);//item in list with highest id.
var z = FinalItem(db.MyTable.Where(d => d.ID % 10 == 0)); //item with highest id that ends in zero.
But the really important part, is that while we've a means of defining the sort of operation we want done, we can have the actual implementation hidden from us.
The call to OrderByDescending produces an object that has information on its source, and the lambda function it will use in ordering.
The call to FirstOrDefault in turn has information on that, and uses it to obtain a result.
In the case with the list, the implementation is to produce the equivalent Enumerable-based code (Queryable and Enumerable mirror each other's public members, as do the interfaces they use such as IOrderedQueryable and IOrderedEnumerable and so on).
This is because, with a list that we don't know is already sorted in the order we care about (or in the opposite order), there isn't any faster way than to examine each element. The best we can hope for is an O(n) operation, and we might get an O(n log n) operation - depending on whether the implementation of the ordering is optimised for the possibility of only one item being taken from it*.
Or to put it another way, the best we could hand-code in code that only worked on enumerables is only slightly more efficient than:
public static YourType FinalItem(IEnumerable<YourType> source)
{
    YourType highest = default(YourType);
    int highestID = int.MinValue;
    foreach(YourType item in source)
    {
        int curID = item.ID;
        if(highest == null || curID > highestID)
        {
            highest = item;
            highestID = curID;
        }
    }
    return highest;
}
We can do slightly better with some micro-opts on handling the enumerator directly, but only slightly and the extra complication would just make for less-good example code.
Since we can't do any better than that by hand, and since the linq code doesn't know anything more about the source than we do, that's the best we could possibly hope for it matching. It might do less well (again, depending on whether the special case of our only wanting one item was thought of or not), but it won't beat it.
However, this is not the only approach linq will ever take. It'll take a comparable approach with an in-memory enumerable source, but your source isn't such a thing.
db.MyTable represents a table. To enumerate through it gives us the results of an SQL query more or less equivalent to:
SELECT * FROM MyTable
However, db.MyTable.OrderByDescending(d => d.ID) is not the equivalent of calling that, and then ordering the results in memory. Because queries get processed as a whole when they are executed, we actually get the result of an SQL query more or less like:
SELECT * FROM MyTable ORDER BY id DESC
Finally, the entire query db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault() results in a query like:
SELECT TOP 1 * FROM MyTable ORDER BY id DESC
Or
SELECT * FROM MyTable ORDER BY id DESC LIMIT 1
Depending upon what sort of database server you are using. Then the results get passed to code equivalent to the following ADO.NET-based code:
return dataReader.Read() ?
    new MyType { ID = dataReader.GetInt32(0), SomeInt = dataReader.GetInt32(1), SomeString = dataReader.GetString(2) } // or similar; the extra property names are placeholders
    : null;
You can't get much better.
And as for that SQL query. If there's an index on the id column (and since it looks like a primary key, there certainly should be), then that index will be used to very quickly find the row in question, rather than examining each row.
In all, because different LINQ providers use different means to fulfil the query, they can each try to do so in the best way possible. Of course, being in an imperfect world, we'll no doubt find that some are better than others. What's more, they can even pick the best approach for different conditions. One example of this is that database-related providers can choose different SQL to take advantage of features of different versions of databases. Another is that the implementation of the version of Count() that works with in-memory enumerations works a bit like this:
public static int Count<T>(this IEnumerable<T> source)
{
    var asCollT = source as ICollection<T>;
    if(asCollT != null)
        return asCollT.Count;
    var asColl = source as ICollection;
    if(asColl != null)
        return asColl.Count;
    int tally = 0;
    foreach(T item in source)
        ++tally;
    return tally;
}
This is one of the simpler cases (and a bit simplified again in my example here - I'm showing the idea, not the actual code), but it shows the basic principle of code taking advantage of more efficient approaches when they're available - the O(1) length of arrays and the Count property on collections that is sometimes O(1) - without making things worse in the cases where it's O(n), and then, when those aren't available, falling back to a less efficient but still functional approach.
The result of all of this is that Linq tends to give very good bang for buck, in terms of performance.
Now, a decent coder should be able to match or beat its approach to any given case most of the time†, and even when Linq comes up with the perfect approach there are some overheads to it itself.
However, over the scope of an entire project, using Linq means that we can concisely create reasonably efficient code that relates to a relatively constrained number of well defined entities (generally one per table as far as databases go). In particular, the use of anonymous functions and joins means that we get queries that are very good. Consider:
var result = from a in db.Table1
             join b in db.Table2
             on a.relatedBs equals b.id
             select new { a.id, b.name };
Here we're ignoring columns we don't care about, and the SQL produced will do the same. Consider what we would do if we were creating the objects that a and b relate to with hand-coded DAO classes. We could:
1. Create a new class to represent this combination of a's id and b's name, and the relevant code to run the query we need to produce instances.
2. Run a query to obtain all information about each a and the related b, and live with the waste.
3. Run a query to obtain the information on each a and b that we care about, and just set default values for the other fields.
Of these, option 2 will be wasteful, perhaps very wasteful. Option 3 will be a bit wasteful and very error prone (what if we accidentally try to use a field elsewhere in the code that wasn't set correctly?). Only option 1 will be more efficient than what the linq approach will produce, but this is just one case. Over a large project this could mean producing dozens or even hundreds or thousands of slightly different classes (and unlike the compiler, we won't necessarily spot the cases where they're actually the same). In practice, therefore, linq can do us some great favours when it comes to efficiency.
Good policies for efficient linq are:
Stay with the type of query you start with as long as you can. Whenever you grab items into memory with ToList() or ToArray() etc., consider if you really need to. Unless you need to, or you can clearly state the advantage doing so gives you, don't.
If you do need to move to processing in memory, favour AsEnumerable() over ToList() and the other means, so you only grab one item at a time (see the sketch after this list).
Examine long-running queries with SQLProfiler or similar. There are a handful of cases where policy 1 here is wrong and moving to memory with AsEnumerable() is actually better (most relate to uses of GroupBy that don't use aggregates on the non-grouped fields, and hence don't actually have a single SQL query they correspond with).
If a complicated query is hit many times, then CompiledQuery can help (less so with 4.5 since it has automatic optimisations that cover some of the cases it helps in), but it's normally better to leave that out of the first approach and then use it only in hot-spots that are efficiency problems.
You can get EF to run arbitrary SQL, but avoid it unless it's a strong gain, because too much such code reduces the consistent readability that using a LINQ approach throughout gives. (I have to say, though, I think Linq2SQL beats EF on calling stored procedures, and even more so on calling UDFs, but even there this still applies - it's less clear from just looking at the code how things relate to each other.)
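A minimal sketch of policies 1 and 2 together (MyTable, IsActive and SomeLocalFunction are placeholders):

var names = db.MyTable
    .Where(t => t.IsActive)   // still IQueryable: this filter is translated to SQL
    .AsEnumerable()           // switch to in-memory processing, streaming one row at a time
    .Select(t => SomeLocalFunction(t.Name)); // runs per row as results are read; no full list is built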
*AFAIK, this particular optimisation isn't applied, but we're talking about the best possible implementation at this point, so it doesn't matter if it is, isn't, or is in some versions only.
†I'll admit though that Linq2SQL would often produce queries that use APPLY that I would not think of, as I was used to writing queries for versions of SQL Server from before 2005 introduced it, while code doesn't have those sorts of human tendencies to stick with old habits. It pretty much taught me how to use APPLY.

Ways for refreshing linked entity set collections within global static data context data objects

Anyone who's been using LINQ to SQL for any length of time will know that using a global static data context can present problems with synchronisation against the database (especially if it is being used by many concurrent users). For the sake of simplicity, I like to work with objects directly in memory, manipulate them there, then push a context.SubmitChanges() when updates and inserts to that object and its linked counterparts are complete. I am aware this is not recommended, but it also has advantages. The problem here is that any attached linked objects are not refreshed by this, and it is also not possible to refresh the collection with context.Refresh(linqobject.linkedcollection), as this does not take into account newly added and removed objects.
My question is: have I missed something obvious? It seems madness that there is no simple way to refresh these collections without writing specific logic.
I would also like to offer a workaround which I have discovered, but I am interested to know if there are drawbacks with this approach (I have not profiled the output and am concerned that it may be generating unintended insert and delete statements).
foreach (OObject O in Program.BaseObject.OObjects.OrderBy(o => o.ID))
{
    Program.DB.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, O);
    Program.DB.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, O.LinksTable);
    O.LinksTable.Assign(Program.DB.LinksTable.Where(q => q.OObject == O));
}
It also seems possible to do things like Program.DB.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, Program.DB.OObjects);
however, this appears to fetch the entire table, which is often highly inefficient. Any ideas?

Help Needed with LINQ Syntax

Can someone help me change the following to select unique Models from the Product table?
var query = from Product in ObjectContext.Products.Where(p => p.BrandId == BrandId & p.ProdDelOn == null)
            orderby Product.Model
            select Product;
I'm guessing that you still want to filter based on your existing Where() clause. I think this should take care of it for you (and will include the ordering as well):
var query = ObjectContext.Products
    .Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
    .Select(p => p.Model)
    .Distinct()
    .OrderBy(m => m);
But, depending on how you read the post...it also could be taken as you're trying to get a single unique Model out of the results (which is a different query):
var model = ObjectContext.Products
    .Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
    .Select(p => p.Model)
    .First();
Change the & to && and add the following line:
query = query.Distinct();
I'm afraid I can't answer the question - but I want to comment on it nonetheless.
IMHO, this is an excellent example of what's wrong with the direction the .NET Framework has been going in over the last few years. I cannot stand LINQ, nor do I feel too warmly about extension methods, anonymous methods, lambda expressions, and so on.
Here's why: I have yet to see a situation where either of these things actually contributes anything to solving real-world programming problems. LINQ is certainly no replacement for SQL, so you (or at least the project) still need to master that. Writing the LINQ statements is not any simpler than writing the SQL, but it does add run-time processing to build an expression tree and "compile" it into an SQL statement. Now, if you could solve complex problems more easily with LINQ than with SQL directly, or if it meant you didn't need to also know SQL, or if you could trust LINQ to produce good-enough SQL all the time, it might still have been worth using. But NONE of these preconditions are met, so I'm left wondering what the benefit is supposed to be.
Of course, in good old-fashioned SQL the statement would be
SELECT DISTINCT [Model]
FROM [Product]
WHERE [BrandID] = @brandID AND [ProdDelOn] IS NULL
ORDER BY [Model]
In many cases the statements can be easily generated with dev tools and encapsulated by stored procedures. This would perform better, but I'll grant that for many things the performance difference between LINQ and the more straightforward stored procs would be totally irrelevant. (On the other hand, performance problems do have a tendency to sneak in, as we devs often work with totally unrealistic amounts of data and on environments that have little in common with those hosting our software in real production systems.) But the advantages of just not using LINQ are HUGE:
1) Fewer skills required (since you must use SQL anyway)
2) All data access can be performed in one way (since you need SQL anyway)
3) Some control over HOW to get data and not just what
4) Less chance of being rightfully accused of writing bloatware (more efficient)
Similar things can be said about many of the new language features introduced since C# 2.0, though I do appreciate and use some of them. The "var" keyword with type inference is great for initializing locals - there's not much use in stating the same type information twice on the same line. But let's not pretend this somehow helps one bit if you have a problem to solve. The same goes for anonymous types - nested private types served the same purpose with hardly any more code, and I've found NO use for this feature since trying it out when it was new and shiny. Extension methods ARE in fact just plain old utility methods, and I have yet to hear any good explanation of why one should use the SAME syntax for instance methods and for static methods invoked on another class! This actually means that adding a method to a class, with no build warnings or errors, can break an application. (In case you doubt it: if you had an extension method Bar() for your Foo type, Foo.Bar() invokes a completely different implementation, which may or may not do something similar to what your extension method Bar() did, the day you introduce an instance method with the same signature. It'll build and crash at runtime.)
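A minimal sketch of that Foo/Bar hazard (both types hypothetical):

class Foo { }

static class FooExtensions
{
    public static void Bar(this Foo foo) { /* original behaviour */ }
}

// Later, someone adds an instance method with the same signature:
//     class Foo { public void Bar() { /* different behaviour */ } }
// On recompilation, every existing foo.Bar() call silently binds to the
// instance method, because instance members win over extension methods.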
Sorry to rant like this, and maybe there is a better place to post this than in response to a question. But I really think anyone starting out with LINQ is wasting their time - unless it's in preparation for an MS certification exam, which AFAIU is also something a bit removed from reality.
