I ran into an issue where the following code results in an exception.
public IList<Persoon> GetPersonenWithCurrentWorkScheme(int clientId)
{
//SELECT N+1
Session.QueryOver<Werkschema>()
.Fetch(ws => ws.Labels).Eager
.JoinQueryOver<WerkschemaHistory>(p => p.Histories)
.JoinQueryOver<ArbeidsContract>(a => a.ArbeidsContract)
.JoinQueryOver<Persoon>(p => p.Persoon)
.Where(p => p.Klant.ID == clientId)
.Future();
var result = Session.QueryOver<Persoon>()
.Fetch(p => p.ArbeidsContracten).Eager
.Where(p => p.Klant.ID == clientId)
.Future();
return result.ToList();
}
When I comment out the first part of the method (Session.QueryOver&lt;Werkschema&gt;...), the code runs fine. When it is not commented out, the first NHibernate Commit() that occurs throws an exception. (Something to do with a date-time conversion, but that's not really what I'm worried about.)
My question: Is the first bit of code useful? Does it do anything? The result is not stored in a variable which is used later, so to me it looks like dead code. Or is this some NHibernate dark magic which actually does something useful?
Future is an optimization over the existing API provided by NHibernate.
One of the nicest new features in NHibernate 2.1 is the Future() and FutureValue() functions. They essentially function as a way to defer query execution to a later date, at which point NHibernate will have more information about what the application is supposed to do, and optimize for it accordingly. This builds on an existing feature of NHibernate, Multi Queries, but does so in a way that is easy to use and almost seamless.
This will execute multiple queries in a single round trip to the database. If it is not possible to get all the needed data in a single round trip, multiple calls will be executed as expected; but it still helps in many cases.
The database call is triggered when the first result is actually requested.
In your case, that round trip is triggered by calling result.ToList(), which also executes the first part of your code.
As you suspect, the output of the first part, though retrieved, is never used. So in my view, that part can safely be commented out. But this is based only on the code you posted in the question.
It may be that the data loaded by this call saves round trips in another part of the code, by warming the session cache. But in that case, the code should be moved to a more appropriate location.
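If the pre-loading is intentional - i.e. the first query exists to warm the session cache in the same round trip - the intent is clearer if the future result is kept in a named variable. A sketch based only on the code in the question (illustrative, not verified against a real session):

```csharp
public IList<Persoon> GetPersonenWithCurrentWorkScheme(int clientId)
{
    // Batched with the query below into one round trip; keeping the
    // variable documents that these rows are wanted for the session cache.
    IEnumerable<Werkschema> schemas = Session.QueryOver<Werkschema>()
        .Fetch(ws => ws.Labels).Eager
        .JoinQueryOver(ws => ws.Histories)
        .JoinQueryOver(h => h.ArbeidsContract)
        .JoinQueryOver(a => a.Persoon)
        .Where(p => p.Klant.ID == clientId)
        .Future();

    var result = Session.QueryOver<Persoon>()
        .Fetch(p => p.ArbeidsContracten).Eager
        .Where(p => p.Klant.ID == clientId)
        .Future();

    return result.ToList(); // first enumeration fires both queries together
}
```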
I am using Entity Framework, and I have a list that I need to iterate through to do work on. I'm unable to do this work directly via the query on the database, and the list can be quite big, so I'm hoping I can use Parallel.ForEach or AsParallel.
The problem is that even when I load my list using ToList() into memory, and then run it in a parallel function, it destroys my lazy loaded navigation properties.
I am running something simple like this (this has been simplified a lot for the purpose of this question):
var quoteList = DbContext.Quotes.ToList();
List<QuotesItem> quoteViewItemList = new List<QuotesItem>();
quoteList.AsParallel().ForAll(quote => {
var quoteViewItem = new QuotesItem();
quoteViewItem.YearEnding = quote.QuoteService.FirstOrDefault().Yearending; //is SOMETIMES null when it shouldn't be.
quoteViewItem.Manager = quote.Client.Manager.Name; //IS sometimes null
quoteViewItem.... = ..... // More processing here
quoteViewItemList.Add(quoteViewItem);
});
The problem is that QuoteService sometimes seems to be null even when it's not null in the list.
First, let's talk about the issue.
The only reason I can imagine is the DbContext being disposed before .ForAll has finished its work, probably from a different thread? But I am just guessing at the moment. (Worth noting: DbContext is not thread-safe, so triggering lazy loads from multiple threads inside a parallel loop is itself risky.)
I can give you a couple of suggestions about possible optimizations of the code.
First, evaluating all Quotes using .ToList() probably has some performance impact if there are many records in the database, so my advice here is to exchange the eager evaluation for a SQL cursor.
This can be achieved by using any construct/code which internally uses IEnumerable&lt;&gt; and, more specifically, .GetEnumerator().
The internal implementation of IQueryable&lt;&gt; in EF creates a SQL cursor, which is pretty fast. The best part is that you can use it with .AsParallel() or Parallel.ForEach; both internally use the enumerators.
This is not a well-documented feature, but if you start SQL Server Profiler and execute the code below, you will see that only a single request is sent to the server; you can also notice that the RAM of the machine executing the code does not spike, which means it does not fetch everything at once.
I've found some information about it if you're interested: https://sqljudo.wordpress.com/2015/02/24/entity-framework-hidden-cursors/
Although I've used this approach in cases where the code operating on each database record was pretty heavy, for a simple projection from one type to another, running the cursor with .AsParallel( .. ) or a simple foreach construct may perform similarly.
So using
DbContext.Quotes
.AsParallel()
.WithDegreeOfParallelism(...)
.ForAll(...)
should run with good performance.
My second piece of advice is about accessing navigation properties in EF with lazy loading. This, as you know, leads to the N+1 select issue.
So instead of depending on EF's lazy evaluation, in this case it will be better to eagerly fetch the navigation properties, so we can rewrite the code above like this:
DbContext.Quotes
.Include(x => x.QuoteService)
.AsParallel()
.WithDegreeOfParallelism(...)
.ForAll(...)
This way, when accessing the QuoteService navigation property, EF won't make any additional requests to the server, so this should dramatically improve your performance, and it can probably also magically fix the null reference issue (I hope).
The generic version of the .Include(..) method I am using is part of the System.Data.Entity namespace.
Further, if it is acceptable for your scenario, you can disable change tracking, which will gain you a bit more performance. So my final code will look like this:
var queryViewItems = DbContext.Quotes
.AsNoTracking()
.Include(x => x.QuoteService)
.AsParallel()
.WithDegreeOfParallelism(...)
.Select(x => { ... })
.ToList();
Assuming the two following possible blocks of code inside of a view, with a model passed to it using something like return View(db.Products.Find(id));
List<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date).ToList();
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id).ToList();
IEnumerable<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date);
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id);
What are the performance implications between the two? By this point, is all of the product related data in memory, including its navigation properties? Does using ToList() in this instance matter at all? If not, is there a better approach to using Linq queries on a List without having to call ToList() every time, or is this the typical way one would do it?
Read http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx
Deferred execution is one of the many marvels intrinsic to linq. The short version is that your data is never touched (it remains idle in the source be that in-memory, or in-database, or wherever). When you construct a linq query all you are doing is creating an IEnumerable class that is 'capable' of enumerating your data. The work doesn't begin until you actually start enumerating and then each piece of data comes all the way from the source, through the pipeline, and is serviced by your code one item at a time. If you break your loop early, you have saved some work - later items are never processed. That's the simple version.
Some LINQ operations cannot work this way. OrderBy is the best example. OrderBy has to know every piece of data because it's possible that the last piece retrieved from the source could very well be the first piece you are supposed to get. So when an operation such as OrderBy is in the pipe, it will actually cache your dataset internally. So all data has been pulled from the source, has gone through the pipeline up to the OrderBy, and then the OrderBy becomes your new temporary data source for any operations that come afterwards in the expression. Even so, OrderBy tries as much as possible to follow the deferred-execution paradigm by waiting until the last possible moment to build its cache. Including OrderBy in your query still doesn't do any work immediately, but the work begins once you start enumerating.
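A small, runnable LINQ-to-Objects illustration of this (names are made up for the demo): constructing the query pulls nothing from the source, and the first enumeration forces OrderBy to consume everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static int reads;

    // Yields items while counting how many are actually pulled from the source.
    static IEnumerable<int> Source()
    {
        foreach (var n in new[] { 3, 1, 2 })
        {
            reads++;
            yield return n;
        }
    }

    public static string Run()
    {
        reads = 0;
        var query = Source().OrderBy(n => n); // builds the pipeline; no work yet
        int before = reads;                   // 0: nothing pulled from the source

        int first = query.First();            // enumerating forces OrderBy to buffer everything
        int after = reads;                    // 3: the whole source was consumed

        return $"{before} {after} {first}";
    }

    static void Main() => Console.WriteLine(Run()); // prints "0 3 1"
}
```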
To answer your question directly, your call to ToList is doing exactly that. OrderByDescending is caching the data from your data source => ToList additionally persists it into a variable that you can actually touch (reviews) => Where starts pulling records one at a time from reviews, and if a record matches, then your final ToList call stores the results into yet another list in memory.
Beyond the memory implications, ToList additionally thwarts deferred execution because it STOPS the processing of your view at the time of the call in order to entirely process the whole LINQ expression and build its in-memory representation of the results.
Now none of this is a real big deal if the number of records we're talking about are in the dozens. You'll never notice the difference at runtime because it happens so quick. But when dealing with large scale datasets, deferring as much as possible for as long as possible in hopes that something will happen allowing you to cancel a full enumeration... in addition to the memory savings... gold.
In your version without ToList: OrderByDescending will still cache a copy of your dataset as processed through the pipeline up to that point, internally, sorted of course. That's ok, you gotta do what you gotta do. But that doesn't happen until you actually try to retrieve your first record later in your view. Once that cache is complete, you get your first record, and for every next record you are then pulling from that cache, checking against the where clause, you get it or you don't based upon that where and have saved a couple of in memory copies and a lot of work.
Magically, I bet even your lead-in of db.Products.Find(id) doesn't even start spinning until your view starts enumerating (if not using ToList). If db.Products is a Linq2SQL datasource, then every other element you've specified will reduce into SQL verbiage, and your entire expression will be deferred.
Hope this helps! Read further on Deferred Execution. And if you want to know 'how' that works, look into c# iterators (yield return). There's a page somewhere on MSDN that I'm having trouble finding that contains the common linq operations, and whether they defer execution or not. I'll update if I can track that down.
/*edit*/ to clarify - all of the above is in the context of raw linq, or Linq2Objects. Until we find that page, common sense will tell you how it works. If you close your eyes and imagine implementing orderby, or any other linq operation, if you can't think of a way to implement it with 'yield return', or without caching, then execution is not likely deferred and a cache copy is likely and/or a full enumeration... orderby, distinct, count, sum, etc... Linq2SQL is a whole other can of worms. Even in that context, ToList will still stop and process the whole expression and store the results because a list, is a list, and is in memory. But Linq2SQL is uniquely capable of deferring many of those aforementioned clauses, and then some, by incorporating them into the generated SQL that is sent to the SQL server. So even orderby can be deferred in this way because the clause will be pushed down into your original datasource and then ignored in the pipe.
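To see 'how' the deferral works, here is a toy Where built with yield return (a hypothetical helper, not the real LINQ implementation): the predicate only runs as the caller enumerates, interleaved with consumption.

```csharp
using System;
using System.Collections.Generic;

static class ToyLinq
{
    // A minimal Where: the loop body only runs as the caller enumerates.
    public static IEnumerable<T> ToyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
            if (predicate(item))
                yield return item;
    }

    public static string Demo()
    {
        var log = new List<string>();
        var evens = new[] { 1, 2, 3, 4 }
            .ToyWhere(n => { log.Add("checked " + n); return n % 2 == 0; });
        log.Add("constructed"); // runs before any predicate call

        foreach (var n in evens) log.Add("got " + n);
        return string.Join(",", log);
        // "constructed,checked 1,checked 2,got 2,checked 3,checked 4,got 4"
    }

    static void Main() => Console.WriteLine(Demo());
}
```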
Good luck ;)
Not enough context to know for sure.
But ToList() guarantees that the data has been copied into memory, and your first example does that twice.
The second example could involve queryable data or some other on-demand scenario. Even if the original data was all already in memory and even if you only added a call to ToList() at the end, that would be one less copy in-memory than the first example.
And it's entirely possible that in the second example, by the time you get to the end of that little snippet, no actual processing of the original data has been done at all. In that case, the database might not even be queried until some code actually enumerates the final reviews value.
As for whether there's a "better" way to do it, not possible to say. You haven't defined "better". Personally, I tend to prefer the second example...why materialize data before you need it? But in some cases, you do want to force materialization. It just depends on the scenario.
This question already has answers here:
What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?
(19 answers)
Closed 8 years ago.
I have never heard of it, but people are referring to an issue in an application as an "N+1 problem". They are doing a LINQ to SQL based project, and a performance problem has been identified by someone. I don't quite understand it - but hopefully someone can steer me.
It seems that they are trying to get a list of objects, and then the foreach after that is causing too many database hits:
From what I understand, the second part of the data is only being loaded in the foreach.
So, list of items loaded:
var program = programRepository.SingleOrDefault(r => r.ProgramDetailId == programDetailId);
And then later, we make use of this list:
foreach (var phase in program.Program_Phases)
{
phase.Program_Stages.AddRange(stages.Where(s => s.PhaseId == phase.PhaseId));
phase.Program_Stages.ForEach(s =>
{
s.Program_Modules.AddRange(modules.Where(m => m.StageId == s.StageId));
});
phase.Program_Modules.AddRange(modules.Where(m => m.PhaseId == phase.PhaseId));
}
It seems the problem identified is that they expected 'program' to contain its children. But when we refer to the child in the query, it reloads the program:
program.Program_Phases
They're expecting program to be fully loaded and in memory, and the profiler seems to indicate that the program table, with all the joins, is being queried on each iteration of the foreach.
Does this make sense?
(EDIT: I found this link:
Does linq to sql automatically lazy load associated entities?
This might answer my question, but... they're using that nicer (where person in...) notation, as opposed to this strange (x => x...) one. So if this link is the answer - i.e., we need to 'join' in the query - can that be done?)
In ORM terminology, the 'N+1 select problem' typically occurs when you have an entity that has nested collection properties. It refers to the number of queries that are needed to completely load the entity data into memory when using lazy-loading. In general, the more queries, the more round-trips from client to server and the more work the server has to do to process the queries, and this can have a huge impact on performance.
There are various techniques for avoiding this problem. I am not familiar with Linq to SQL but NHibernate supports eager fetching which helps in some cases. If you do not need to load the entire entity instance then you could also consider doing a projection query. Another possibility is to change your entity model to avoid having nested collections.
For performant LINQ, first work out exactly which properties you actually care about. The one advantage that LINQ has performance-wise is that you can easily leave out retrieval of data you won't use (you can always hand-code something that does better than LINQ does, but LINQ makes it easy to do this without creating a library full of hundreds of classes for slight variants of what you leave out each time).
You say "list" a few times. Don't go obtaining lists if you don't need to, only if you'll re-use the same list more than once. Otherwise working one item at a time will perform better 99% of the time.
The "nice" and "strange" syntaxes, as you put it, are different ways to say the same thing. There are some things that can only be expressed with methods and lambdas, but the query form can always be expressed as method calls - indeed, that is what it compiles to. The likes of from b in someSource where b.id == 21 select b.name is compiled as someSource.Where(b => b.id == 21).Select(b => b.name) anyway.
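This equivalence is easy to check with LINQ to Objects (toy Item type made up for the demo): both forms produce exactly the same sequence.

```csharp
using System;
using System.Linq;

class SyntaxDemo
{
    record Item(int id, string name);

    public static bool Run()
    {
        var someSource = new[] { new Item(21, "foo"), new Item(42, "bar") };

        // Query syntax...
        var q1 = from b in someSource where b.id == 21 select b.name;
        // ...is compiled to exactly these method calls:
        var q2 = someSource.Where(b => b.id == 21).Select(b => b.name);

        return q1.SequenceEqual(q2); // true
    }

    static void Main() => Console.WriteLine(Run());
}
```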
You can set DataContext.LoadOptions to define exactly which entities you want loaded with which object (and best of all, set it differently in different uses quite easily). This can solve your N+1 issue.
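A sketch of what that can look like for the entities in the question (assuming LINQ to SQL's DataLoadOptions; the entity and property names are taken from the question and not verified, and db stands for your DataContext):

```csharp
using System.Data.Linq;

var loadOptions = new DataLoadOptions();
// Tell the DataContext to fetch the children together with the parent,
// instead of issuing one query per access of a navigation property:
loadOptions.LoadWith<Program>(p => p.Program_Phases);
loadOptions.LoadWith<Program_Phase>(ph => ph.Program_Stages);
db.LoadOptions = loadOptions; // must be assigned before the first query runs

var program = programRepository.SingleOrDefault(r => r.ProgramDetailId == programDetailId);
```

Note that LoadOptions has to be assigned before any results are materialized on that DataContext.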
Alternatively you might find it easier to update existing entities or insert new ones by setting the appropriate id yourself.
Even if you don't find it easier, it can be faster in some cases. Normally I'd say go with whatever is easier and forget about performance as far as this choice goes, but if performance is a concern here (and you say it is), then it can be worthwhile to profile and see which of the two works better.
I'm using a few functions like
ICollection<ICache> caches = new HashSet<ICache>();
ICollection<T> Matches<T>(string dependentVariableName)
{
return caches
.Where(x => x.GetVariableName() == dependentVariableName)
.Where(x => typeof(T).IsAssignableFrom(x.GetType()))
.Select(x => (T) x)
.ToList();
}
in my current class design. They work wonderfully from an architecture perspective--where I can arbitrarily add objects of various related types (in this case ICaches) and retrieve them as collections of concrete types.
An issue is that the framework here is a scientific package, and these sorts of functions lie on very hot code paths that are getting called thousands of times over a few-minute period. The result, per the profiler trace (screenshot not reproduced here): functions like the above are the main consumers of COMDelegate::DelegateConstruct.
As you can see from the relative distribution of the sample %, this isn't a deal breaker, but it would be fantastic to reduce the overhead a bit!
Thanks in advance.
1) I don't see how the code you posted is related to the performance data... the functions listed don't look like they are called from this code at all. So really I can't answer your question, except to say that maybe you are interpreting the performance report wrong.
2) Don't call .ToList at the end... just return the IEnumerable. That will help performance. Only call ToList when you really do need a list that you can later add to, remove from, or sort.
3) I don't have enough context, but it seems this method could be eliminated by making use of the dynamic keyword.
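Building on point 2, here is a deferred variant of the Matches method from the question, with toy ICache implementations so it runs standalone (the real cache types are not shown in the question): OfType&lt;T&gt;() replaces the IsAssignableFrom check plus cast, and nothing is materialized until the caller enumerates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

interface ICache { string GetVariableName(); }
class FooCache : ICache { public string GetVariableName() => "x"; }
class BarCache : ICache { public string GetVariableName() => "x"; }

class Registry
{
    readonly ICollection<ICache> caches = new HashSet<ICache>
        { new FooCache(), new BarCache(), new FooCache() };

    // Deferred: no list is built; OfType does the filter-and-cast in one step.
    public IEnumerable<T> Matches<T>(string dependentVariableName)
    {
        return caches
            .Where(x => x.GetVariableName() == dependentVariableName)
            .OfType<T>();
    }
}

class Demo
{
    public static int Run() => new Registry().Matches<FooCache>("x").Count();
    static void Main() => Console.WriteLine(Run()); // prints 2
}
```

Callers that only need the first match (e.g. .FirstOrDefault()) then scan no further than necessary.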
Can someone help change the following to select unique Model values from the Product table?
var query = from Product in ObjectContext.Products.Where(p => p.BrandId == BrandId & p.ProdDelOn == null)
orderby Product.Model
select Product;
I'm guessing that you still want to filter based on your existing Where() clause. I think this should take care of it for you (and will include the ordering as well):
var query = ObjectContext.Products
.Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
.Select(p => p.Model)
.Distinct()
.OrderBy(m => m);
But, depending on how you read the post... it could also be taken as you're trying to get a single unique Model out of the results (which is a different query):
var model = ObjectContext.Products
.Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
.Select(p => p.Model)
.First();
Change the & to && and add the following line:
query = query.Distinct();
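With in-memory stand-ins for the Product rows (data made up for the demo), the filtered-distinct-ordered query behaves like this:

```csharp
using System;
using System.Linq;

class DistinctDemo
{
    record Product(int BrandId, string Model, DateTime? ProdDelOn);

    public static string Run()
    {
        var products = new[]
        {
            new Product(1, "B", null),
            new Product(1, "A", null),
            new Product(1, "B", null),         // duplicate model, removed by Distinct
            new Product(1, "C", DateTime.Now), // deleted: filtered out by the Where
            new Product(2, "D", null),         // other brand: filtered out by the Where
        };

        var models = products
            .Where(p => p.BrandId == 1 && p.ProdDelOn == null)
            .Select(p => p.Model)
            .Distinct()
            .OrderBy(m => m);

        return string.Join(",", models); // "A,B"
    }

    static void Main() => Console.WriteLine(Run());
}
```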
I'm afraid I can't answer the question - but I want to comment on it nonetheless.
IMHO, this is an excellent example of what's wrong with the direction the .NET Framework has been going in the last few years. I cannot stand LINQ, and nor do I feel too warmly about extension methods, anonymous methods, lambda expressions, and so on.
Here's why: I have yet to see a situation where either of these things actually contributes anything to solving real-world programming problems. LINQ is certainly no replacement for SQL, so you (or at least the project) still need to master that. Writing the LINQ statements is not any simpler than writing the SQL, but it does add run-time processing to build an expression tree and "compile" it into an SQL statement. Now, if you could solve complex problems more easily with LINQ than with SQL directly, or if it meant you didn't need to also know SQL, or if you could trust LINQ to produce good-enough SQL all the time, it might still have been worth using. But NONE of these preconditions are met, so I'm left wondering what the benefit is supposed to be.
Of course, in good old-fashioned SQL the statement would be
SELECT DISTINCT [Model]
FROM [Product]
WHERE [BrandID] = @brandID AND [ProdDelOn] IS NULL
ORDER BY [Model]
In many cases the statements can be easily generated with dev tools and encapsulated by stored procedures. This would perform better, but I'll grant that for many things the performance difference between LINQ and the more straightforward stored procs would be totally irrelevant. (On the other hand, performance problems do have a tendency to sneak in, as we devs often work with totally unrealistic amounts of data and on environments that have little in common with those hosting our software in real production systems.) But the advantages of just not using LINQ are HUGE:
1) Fewer skills required (since you must use SQL anyway)
2) All data access can be performed in one way (since you need SQL anyway)
3) Some control over HOW to get data and not just what
4) Less chance of being rightfully accused of writing bloatware (more efficient)
Similar things can be said with respect to many of the new language features introduced since C# 2.0, though I do appreciate and use some of them. The "var" keyword with type inference is great for initializing locals - it's not much use writing the same type information twice on the same line. But let's not pretend this somehow helps one bit if you have a problem to solve. Same for anonymous types - nested private types served the same purpose with hardly any more code, and I've found NO use for this feature since trying it out when it was new and shiny. Extension methods ARE in fact just plain old utility methods, and I have yet to hear any good explanation of why one should use the SAME syntax for instance methods and static methods invoked on another class! This actually means that adding a method to a class and getting no build warnings or errors can break an application. (In case you doubt it: if you had an extension method Bar() for your Foo type, foo.Bar() invokes a completely different implementation - which may or may not do something similar to what your extension method Bar() did - the day you introduce an instance method with the same signature. It'll build, and break at runtime.)
Sorry to rant like this, and maybe there is a better place to post this than in response to a question. But I really think anyone starting out with LINQ is wasting their time - unless it's in preparation for an MS certification exam, which AFAIU is also something a bit removed from reality.