Can someone help me change the following to select a unique Model from the Product table?
var query = from Product in ObjectContext.Products.Where(p => p.BrandId == BrandId & p.ProdDelOn == null)
orderby Product.Model
select Product;
I'm guessing that you still want to filter based on your existing Where() clause. I think this should take care of it for you (and will include the ordering as well):
var query = ObjectContext.Products
.Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
.Select(p => p.Model)
.Distinct()
.OrderBy(m => m);
But, depending on how you read the post, it could also be taken as you're trying to get a single unique Model out of the results (which is a different query):
var model = ObjectContext.Products
.Where(p => p.BrandId == BrandId && p.ProdDelOn == null)
.Select(p => p.Model)
.First();
Change the & to && and add the following line:
query = query.Distinct();
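Put together, the two changes applied to the question's query would look something like this sketch - note the select is switched to the Model column so Distinct() compares models rather than whole Product rows, and the ordering is applied after Distinct(), since Distinct() is not guaranteed to preserve order:

```csharp
var query = (from product in ObjectContext.Products
             where product.BrandId == BrandId && product.ProdDelOn == null
             select product.Model)
            .Distinct()
            .OrderBy(m => m); // order after Distinct(); Distinct() may discard ordering
```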
I'm afraid I can't answer the question - but I want to comment on it nonetheless.
IMHO, this is an excellent example of what's wrong with the direction the .NET Framework has been going in the last few years. I cannot stand LINQ, nor do I feel too warmly about extension methods, anonymous methods, lambda expressions, and so on.
Here's why: I have yet to see a situation where either of these things actually contributes anything to solving real-world programming problems. LINQ is certainly no replacement for SQL, so you (or at least the project) still need to master that. Writing the LINQ statements is not any simpler than writing the SQL, but it does add run-time processing to build an expression tree and "compile" it into an SQL statement. Now, if you could solve complex problems more easily with LINQ than with SQL directly, or if it meant you didn't need to also know SQL, or if you could trust LINQ to produce good-enough SQL all the time, it might still have been worth using. But NONE of these preconditions are met, so I'm left wondering what the benefit is supposed to be.
Of course, in good old-fashioned SQL the statement would be
SELECT DISTINCT [Model]
FROM [Product]
WHERE [BrandID] = #brandID AND [ProdDelOn] IS NULL
ORDER BY [Model]
In many cases the statements can be easily generated with dev tools and encapsulated by stored procedures. This would perform better, but I'll grant that for many things the performance difference between LINQ and the more straightforward stored procs would be totally irrelevant. (On the other hand, performance problems do have a tendency to sneak in, as we devs often work with totally unrealistic amounts of data and on environments that have little in common with those hosting our software in real production systems.) But the advantages of just not using LINQ are HUGE:
1) Fewer skills required (since you must use SQL anyway)
2) All data access can be performed in one way (since you need SQL anyway)
3) Some control over HOW to get data and not just what
4) Less chance of being rightfully accused of writing bloatware (more efficient)
Similar things can be said with respect to many of the new language features introduced since C# 2.0, though I do appreciate and use some of them. The "var" keyword with type inference is great for initializing locals - it's not much use stating the same type twice on the same line. But let's not pretend this somehow helps one bit if you have a problem to solve. Same for anonymous types - nested private types served the same purpose with hardly any more code, and I've found NO use for this feature since trying it out when it was new and shiny. Extension methods ARE in fact just plain old utility methods, and I have yet to hear any good explanation of why one should use the SAME syntax for instance methods and static methods invoked on another class! This actually means that adding a method to a class and getting no build warnings or errors can break an application. (In case you doubt it: if you had an extension method Bar() for your Foo type, then the day you introduce an instance method with the same signature, Foo.Bar() invokes a completely different implementation, which may or may not do something similar to what your extension method Bar() did. It'll build, and then silently misbehave - or crash - at runtime.)
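The shadowing described in that parenthetical can be sketched using its hypothetical Foo/Bar names:

```csharp
class Foo
{
    // The day this instance method is added, every existing foo.Bar() call
    // silently binds here instead of to FooExtensions.Bar() - no warning, no error.
    public string Bar() { return "instance"; }
}

static class FooExtensions
{
    // Extension method: foo.Bar() resolves here only while Foo has no instance Bar().
    public static string Bar(this Foo foo) { return "extension"; }
}
```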
Sorry to rant like this, and maybe there is a better place to post this than in response to a question. But I really think anyone starting out with LINQ is wasting their time - unless it's in preparation for an MS certification exam, which AFAIU is also something a bit removed from reality.
In C#, I have a collection of objects that I want to transform to a different type. This conversion, which I would like to do with LINQ Select(), requires multiple operations in sequence. To perform these operations, is it better to chain together multiple Select() queries like
resultSet.Select(stepOneDelegate).Select(stepTwoDelegate).Select(stepThreeDelegate);
or instead to perform these three steps in a single call?
resultSet.Select(item => stepThree(stepTwo(stepOne(item))));
Note: The three steps themselves are not necessarily functions. They are meant to be a concise demonstration of the problem. If that has an effect on the answer please include that information.
Any performance difference would be negligible, but the definitive answer to that is simply "test it". The real question is readability: which one is easier to understand and grasp what is going on?
Cases where I have needed to project on a projection include working with EF LINQ expressions where I ultimately need to do something that isn't supported by EF's LINQ provider, so I need to materialize a projection (usually to an anonymous type) and then finish the expression before selecting the final output. In these cases you would need to use the first example.
Personally I'd probably just stick to the first scenario as to me it is easy to understand what is going on, and it easily supports additions for other operations such as filtering with Where or using OrderBy etc. The second scenario only really comes up when I'm building a composite model.
.Select(x => new OuterModel
{
Id = x.Id,
InnerModel = x.Inner.Select(i => new InnerModel
{
// ...
})
}) // ...
In most cases, though, this can be handled through AutoMapper.
I'd be wary of any code that I felt needed chaining a lot of Select expressions as it would smell like trying to do too much in one expression chain. Making something easy to understand, even if it involves a few extra steps and might add a few milliseconds to the execution is far better than having the risk of needing to track down bugs that someone introduced because they misunderstood potentially complex looking code.
I'm looking to retrieve the last row of a table by the table's ID column. What I am currently using works:
var x = db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault();
Is there any way to get the same result with more efficient speed?
I can't see why this would query through the entire table.
Do you not have an index on the ID column?
Can you add the results of analysing the query to your question, because this is not how it should be.
As well as the analysis results, the SQL produced. I can't see how it would be anything other than select top 1 * from MyTable order by id desc only with explicit column names and some aliasing. Nor if there's an index on id how it's anything other than a scan on that index.
Edit: That promised explanation.
Linq gives us a set of common interfaces, and in the case of C# and VB.NET some keyword support, for a variety of operations upon sources which return 0 or more items (e.g. in-memory collections, database calls, parsing of XML documents, etc.).
This allows us to express similar tasks regardless of the underlying source. Your query for example includes the source, but we could do a more general form of:
public static YourType FinalItem(IQueryable<YourType> source)
{
return source.OrderByDescending(d => d.ID).FirstOrDefault();
}
Now, we could do:
IEnumerable<YourType> l = SomeCallThatGivesUsAList();
var x = FinalItem(db.MyTable);//same as your code.
var y = FinalItem(l.AsQueryable());//item in list with highest id.
var z = FinalItem(db.MyTable.Where(d => d.ID % 10 == 0));//item with highest id that ends in zero.
But the really important part, is that while we've a means of defining the sort of operation we want done, we can have the actual implementation hidden from us.
The call to OrderByDescending produces an object that has information on its source, and the lambda function it will use in ordering.
The call to FirstOrDefault in turn has information on that, and uses it to obtain a result.
In the case with the list, the implementation is to produce the equivalent Enumerable-based code (Queryable and Enumerable mirror each other's public members, as do the interfaces they use such as IOrderedQueryable and IOrderedEnumerable and so on).
This is because, with a list that we don't know is already sorted in the order we care about (or in the opposite order), there isn't any faster way than to examine each element. The best we can hope for is an O(n) operation, and we might get an O(n log n) operation - depending on whether the implementation of the ordering is optimised for the possibility of only one item being taken from it*.
Or to put it another way, the best we could hand-code in code that only worked on enumerables is only slightly more efficient than:
public static YourType FinalItem(IEnumerable<YourType> source)
{
YourType highest = default(YourType);
int highestID = int.MinValue;
foreach(YourType item in source)
{
int curID = item.ID;
if(highest == null || curID > highestID)
{
highest = item;
highestID = curID;
}
}
return highest;
}
We can do slightly better with some micro-opts on handling the enumerator directly, but only slightly and the extra complication would just make for less-good example code.
Since we can't do any better than that by hand, and since the linq code doesn't know anything more about the source than we do, that's the best we could possibly hope for it matching. It might do less well (again, depending on whether the special case of our only wanting one item was thought of or not), but it won't beat it.
However, this is not the only approach linq will ever take. It'll take a comparable approach with an in-memory enumerable source, but your source isn't such a thing.
db.MyTable represents a table. To enumerate through it gives us the results of an SQL query more or less equivalent to:
SELECT * FROM MyTable
However, db.MyTable.OrderByDescending(d => d.ID) is not the equivalent of calling that, and then ordering the results in memory. Because queries get processed as a whole when they are executed, we actually get the result of an SQL query more or less like:
SELECT * FROM MyTable ORDER BY id DESC
Finally, the entire query db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault() results in a query like:
SELECT TOP 1 * FROM MyTable ORDER BY id DESC
Or
SELECT * FROM MyTable ORDER BY id DESC LIMIT 1
Depending upon what sort of database server you are using. Then the results get passed to code equivalent to the following ADO.NET-based code:
return dataReader.Read() ?
new MyType{ID = dataReader.GetInt32(0), SomeInt = dataReader.GetInt32(1), SomeString = dataReader.GetString(2)}//or similar
: null;
You can't get much better.
And as for that SQL query. If there's an index on the id column (and since it looks like a primary key, there certainly should be), then that index will be used to very quickly find the row in question, rather than examining each row.
In all, because different linq providers use different means to fulfil the query, they can all try their best to do so in the best way possible. Of course, being in an imperfect world, we'll no doubt find that some are better than others. What's more, they can even pick the best approach for different conditions. One example of this is that database-related providers can choose different SQL to take advantage of features of different versions of databases. Another is that the implementation of the version of Count() that works with in-memory enumerations works a bit like this:
public static int Count<T>(this IEnumerable<T> source)
{
var asCollT = source as ICollection<T>;
if(asCollT != null)
return asCollT.Count;
var asColl = source as ICollection;
if(asColl != null)
return asColl.Count;
int tally = 0;
foreach(T item in source)
++tally;
return tally;
}
This is one of the simpler cases (and simplified again in my example here - I'm showing the idea, not the actual code), but it shows the basic principle of code taking advantage of more efficient approaches when they're available: the O(1) Length of arrays and the Count property on collections, which is sometimes O(1) - and we've made nothing worse in the cases where it's O(n) - and then, when those aren't available, falling back to a less efficient but still functional approach.
The result of all of this is that Linq tends to give very good bang for buck, in terms of performance.
Now, a decent coder should be able to match or beat its approach to any given case most of the time†, and even when Linq comes up with the perfect approach there are some overheads to it itself.
However, over the scope of an entire project, using Linq means that we can concisely create reasonably efficient code that relates to a relatively constrained number of well-defined entities (generally one per table as far as databases go). In particular, the use of anonymous types and joins means that we get queries that are very good. Consider:
var result = from a in db.Table1
join b in db.Table2
on a.relatedBs equals b.id
select new {a.id, b.name};
Here we're ignoring columns we don't care about, and the SQL produced will do the same. Consider what we would do if we were creating the objects that a and b relate to with hand-coded DAO classes:
1) Create a new class to represent this combination of a's id and b's name, and relevant code to run the query we need to produce instances.
2) Run a query to obtain all information about each a and the related b, and live with the waste.
3) Run a query to obtain the information on each a and b that we care about, and just set default values for the other fields.
Of these, option 2 will be wasteful, perhaps very wasteful. Option 3 will be a bit wasteful and very error prone (what if we accidentally try to use a field elsewhere in the code that wasn't set correctly?). Only option 1 will be more efficient than what the linq approach will produce, but this is just one case. Over a large project this could mean producing dozens or even hundreds or thousands of slightly different classes (and unlike the compiler, we won't necessarily spot the cases where they're actually the same). In practice, therefore, linq can do us some great favours when it comes to efficiency.
Good policies for efficient linq are:
Stay with the type of query you start with as long as you can. Whenever you grab items into memory with ToList() or ToArray() etc., consider whether you really need to. Unless you do, or you can clearly state the advantage doing so gives you, don't.
If you do need to move to processing in memory, favour AsEnumerable() over ToList() and the other means, so you only grab one at a time.
Examine long-running queries with SQL Profiler or similar. There are a handful of cases where policy 1 here is wrong and moving to memory with AsEnumerable() is actually better (most relate to uses of GroupBy that don't use aggregates on the non-grouped fields, and hence don't actually have a single SQL query they correspond to).
If a complicated query is hit many times, then CompiledQuery can help (less so with 4.5 since it has automatic optimisations that cover some of the cases it helps in), but it's normally better to leave that out of the first approach and then use it only in hot-spots that are efficiency problems.
You can get EF to run arbitrary SQL, but avoid it unless it's a strong gain, because too much such code reduces the consistent readability that using a linq approach throughout gives. (I have to say, though, I think Linq2SQL beats EF on calling stored procedures, and even more so on calling UDFs; but even there this still applies - it's less clear from just looking at the code how things relate to each other.)
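A minimal sketch of policies 1 and 2 together, using a hypothetical Orders table and FormatForDisplay helper - the key is where the query leaves the database:

```csharp
var names = db.Orders
    .Where(o => o.Total > 1000)          // still IQueryable: translated to SQL
    .Select(o => new { o.Id, o.Total })  // only the columns we need cross the wire
    .AsEnumerable()                      // from here on, in-memory, one row at a time
    .Select(o => FormatForDisplay(o));   // arbitrary C# the provider couldn't translate
```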
*AFAIK, this particular optimisation isn't applied, but we're talking about the best possible implementation at this point, so it doesn't matter if it is, isn't, or is in some versions only.
†I'll admit though that Linq2SQL would often produce queries that use APPLY that I would not think of, as I was used to thinking of how to write queries in versions of SQLServer before 2005 introduced it, while code doesn't have those sort of human tendencies to go with old habits. It pretty much taught me how to use APPLY.
This question already has answers here:
What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?
I have never heard of it, but people are referring to an issue in an application as an "N+1 problem". They are doing a Linq to SQL based project, and a performance problem has been identified by someone. I don't quite understand it - but hopefully someone can steer me.
It seems that they are trying to get a list of objects, and then the foreach after that is causing too many database hits:
From what I understand, the second part of the source is only being loaded in the foreach.
So, list of items loaded:
var program = programRepository.SingleOrDefault(r => r.ProgramDetailId == programDetailId);
And then later, we make use of this list:
foreach (var phase in program.Program_Phases)
{
phase.Program_Stages.AddRange(stages.Where(s => s.PhaseId == phase.PhaseId));
phase.Program_Stages.ForEach(s =>
{
s.Program_Modules.AddRange(modules.Where(m => m.StageId == s.StageId));
});
phase.Program_Modules.AddRange(modules.Where(m => m.PhaseId == phase.PhaseId));
}
It seems the problem identified is that they expected 'program' to contain its children. But when we refer to the child in the query, it reloads the program:
program.Program_Phases
They're expecting program to be fully loaded and in memory, and the profiler seems to indicate that the program table, with all the joins, is being called on each 'foreach'.
Does this make sense?
(EDIT: I found this link:
Does linq to sql automatically lazy load associated entities?
This might answer my question, but... they're using that nicer (where person in...) notation, as opposed to this strange (x => x....) one. So if this link is the answer - i.e., we need to 'join' in the query - can that be done?)
In ORM terminology, the 'N+1 select problem' typically occurs when you have an entity that has nested collection properties. It refers to the number of queries that are needed to completely load the entity data into memory when using lazy-loading. In general, the more queries, the more round-trips from client to server and the more work the server has to do to process the queries, and this can have a huge impact on performance.
There are various techniques for avoiding this problem. I am not familiar with Linq to SQL but NHibernate supports eager fetching which helps in some cases. If you do not need to load the entire entity instance then you could also consider doing a projection query. Another possibility is to change your entity model to avoid having nested collections.
For performant linq first work out exactly what properties you actually care about. The one advantage that linq has performance-wise is that you can easily leave out retrieval of data you won't use (you can always hand-code something that does better than linq does, but linq makes it easy to do this without creating a library full of hundreds of classes for slight variants of what you leave out each time).
You say "list" a few times. Don't go obtaining lists if you don't need to, only if you'll re-use the same list more than once. Otherwise working one item at a time will perform better 99% of the time.
The "nice" and "strange" syntaxes as you put it are different ways to say the same thing. There are some things that can only be expressed with methods and lambdas, but the other form can always be expressed as such - indeed is after compilation. The likes of from b in someSource where b.id == 21 select b.name becomes compiled as someSource.Where(b => b.id == 21).Select(b => b.name) anyway.
You can set DataContext.LoadOptions to define exactly which entities you want loaded with which object (and best of all, set it differently in different uses quite easily). This can solve your N+1 issue.
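For Linq to SQL that looks something like the following sketch (the entity and context names are borrowed from the question's code and may not match the real model):

```csharp
// Ask the DataContext to fetch the child collections eagerly, in the same
// round-trip, instead of issuing one lazy query per child (the N+1 pattern).
var options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Program>(p => p.Program_Phases);
options.LoadWith<Program_Phase>(ph => ph.Program_Stages);
context.LoadOptions = options; // must be set before the first query executes

var program = context.Programs
                     .SingleOrDefault(r => r.ProgramDetailId == programDetailId);
```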
Alternatively you might find it easier to update existing entities or insert new ones by setting the appropriate id yourself.
Even if you don't find it easier, it can be faster in some cases. Normally I'd say go with whatever is easier and forget the performance as far as this choice goes, but if performance is a concern here (and you say it is), it can be worth profiling to see which of the two works better.
I'm using a few functions like
ICollection<ICache> caches = new HashSet<ICache>();
ICollection<T> Matches<T>(string dependentVariableName)
{
return caches
.Where(x => x.GetVariableName() == dependentVariableName)
.Where(x => typeof(T).IsAssignableFrom(x.GetType()))
.Select(x => (T) x)
.ToList();
}
in my current class design. They work wonderfully from an architecture perspective--where I can arbitrarily add objects of various related types (in this case ICaches) and retrieve them as collections of concrete types.
An issue is that the framework here is a scientific package, and these sorts of functions lie on very hot code paths that get called thousands of times over a few-minute period. The result, according to a sampling profiler, is that functions like the above are the main consumers of COMDelegate::DelegateConstruct.
Given the relative distribution of the sample %, this isn't a deal breaker, but it would be fantastic to reduce the overhead a bit!
Thanks in advance.
1) I don't see how the code you posted is related to the performance data... the functions listed don't look like they are called from this code at all. So really I can't answer your question, except to say that maybe you are interpreting the performance report wrong.
2) Don't call .ToList() at the end... just return the IEnumerable; that will help performance. Only call ToList() when you really do need a list that you can later add to, remove from, or sort.
3) I don't have enough context, but it seems this method could be eliminated by making use of the dynamic keyword.
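As a further small win, the Where-by-type plus cast in the posted method is exactly what OfType<T>() does in one step, which drops one delegate allocation per call; a sketch, assuming the same caches field as the question:

```csharp
ICollection<T> Matches<T>(string dependentVariableName)
{
    // OfType<T>() filters by runtime type and casts in a single pass,
    // replacing the typeof(T).IsAssignableFrom(...) check and the explicit cast.
    return caches
        .Where(x => x.GetVariableName() == dependentVariableName)
        .OfType<T>()
        .ToList();
}
```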
I have these two lines, that do exactly the same thing. But are written differently. Which is the better practice, and why?
firstRecordDate = (DateTime)(from g in context.Datas
select g.Time).Min();
firstRecordDate = (DateTime)context.Datas.Min(x => x.Time);
there is no semantic difference between method syntax and query syntax. In addition, some queries, such as those that retrieve the number of elements that match a specified condition, or that retrieve the element that has the maximum value in a source sequence, can only be expressed as method calls.
http://msdn.microsoft.com/en-us/library/bb397947.aspx
Also look here: .NET LINQ query syntax vs method chain
It comes down to what you are comfortable with and what you find is more readable.
The second one uses lambda expressions. I like it as it is compact and easier to read (although some will find the former easier to read).
Also, the first is better suited if you have a SQL background.
I'd say go with what is most readable or understandable with regard to your development team. Come back in a year or so and see if you can remember that LINQ... well, this particular LINQ is obviously simple, so that's moot :-)
Best practice is also quite opinionated, you aren't going to get one answer here. In this case, I'd go for the second item because it's concise and I can personally read and understand it faster than the first, though only slightly faster.
I personally much prefer using lambda expressions. As far as I know there is no real difference; as you say, you can do exactly the same thing both ways. We agreed to all use lambdas, as they are easy to read, follow, and pick up for people who don't like SQL.
There is absolutely no difference in terms of the results, assuming you do actually write equivalent statements in each format.
Go for the most readable one for any given query. Complex queries with joins and many where clauses etc are often easier to write/read in the linq query syntax, but really simple ones like context.Employees.SingleOrDefault(e => e.Id == empId) are easier using the method-chaining syntax. There's no general "one is better" rule, and two people may have a difference of opinion for any given example.
There is no semantic difference between the two statements. Which you choose is purely a matter of style preference.
Do you need the explicit cast in either of them? Isn't Time already a DateTime?
Personally I prefer the second approach as I find the extension method syntax more familiar than the LINQ syntax, but it is really just personal preference, they perform the same.
The second one, written to look more like the first, would be context.Datas.Select(x => x.Time).Min(). So you can see how the way you wrote it, with Min(x => x.Time), might be slightly more efficient, because you only have one operation instead of two.
The query comprehension syntax is actually compiled down to a series of calls to the extension methods, which means that the two syntaxes are semantically identical. Whichever style you prefer is the one you should use.