How to optimize this LINQ query for Visual Studio?


I have this one gigantic complex LINQ to SQL query that I need to optimize somehow, because the background C# compiler completely hogs the CPU and I can't type or edit my .cs file normally in Visual Studio 2010 (every letter, especially if IntelliSense wants to pop up, lags horribly).
The culprit is this:
var custFVC =
(from cfvc in customer.CustomerFrameVariationCategories
let lastValue = cfvc.CustomerFrameVariationCategoryValueChanges.Where(cfvcvc => cfvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(cfvcvc2 => cfvcvc2.ChangeDateTime).FirstOrDefault() ?? new CustomerFrameVariationCategoryValueChange()
let lastValue2 = cfvc.FrameVariationCategory.FrameVariation.Frame.FrameValueChanges.Where(fvc => fvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvc2 => fvc2.ChangeDateTime).FirstOrDefault() ?? new FrameValueChange()
let lastValue3 = cfvc.FrameVariationCategory.FrameVariationCategoryValueChanges.Where(fvcvc => fvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvcvc2 => fvcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameVariationCategoryValueChange()
let lastValue4 = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Any(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).IsActive == false)
where lastValue.IsActive == true
orderby cfvc.FrameVariationCategory.FrameVariation.Frame.Name, cfvc.FrameVariationCategory.Category.Name, cfvc.FrameVariationCategory.FrameVariation.Name
select new
{
cfvc.Id,
cfvc.FrameVariationCategory,
lastValue.CoverCoefficient,
lastValue.NeiserNet,
PlywoodName = lastValue2.Plywood.Name,
FrameIsActive = lastValue2.IsActive,
OwnCost = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(lastValue4 ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= this.SelectedDateTime) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange()))),
Cubes = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime && fmvc.IsActive == true).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).Cubes),
lastValue3.CoverNet,
lastValue3.CoverGarbage,
lastValue3.CoverGross,
lastValue3.CoverPrice,
lastValue3.BackgroundNet,
lastValue3.BackgroundGarbage,
lastValue3.BackgroundGross,
lastValue3.BackgroundPrice,
FVCIsActive = lastValue3.IsActive,
FrameModuleAnyNonActive = lastValue4
}).ToList();
The biggest problem here is OwnCost; everything before and after it Visual Studio can handle. I don't want to turn off background compiling (a feature that checks for compile-time errors before actually compiling), and I don't want to create a stored procedure. I can't offload this code into a separate class/method, because the LINQ DataContext can't be passed around (as far as I know - also take into consideration that the context variable is inside a using statement).
The only vague idea that I have, is some sort of an extension method, or a method that returns a LINQ query or something like that. Because I don't know what exactly it is that I can do here to rectify the problem, I don't know how to formulate the wording, thus I can't google it...
How can I move (or optimize) OwnCost or the entire query out of the current .cs file, or perhaps split it into a method within the same file (might help the background compiler), or "something"...?

My first instinct is that you're trying to make LINQ to SQL do the work of a stored procedure. But that may be incorrect; it's pretty difficult to tell if it would even be possible for a stored procedure to do this.
My second instinct is that it should be possible to split the OwnCost calculation into a function, so that this query just contains
OwnCost = cfvc.Select(CalculateOwnCost)
My third instinct, on seeing that the calculation includes a WindowExcel object, is to flee screaming, but I'm going to take a couple of deep breaths and ask, are you in fact interoperating with Excel in the context of this query, and might that possibly be a source of problems?
Edit
To break out the OwnCost calculation into its own function, do something like this:
public decimal CalculateOwnCost(CustomerFrameVariationCategory cfvc)
{
// Note: lastValue4 and this.SelectedDateTime must be in scope here (e.g. passed in as parameters or held as fields).
return cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(lastValue4 ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= this.SelectedDateTime) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange())));
}
That assumes that CustomerFrameVariationCategories is a collection of CustomerFrameVariationCategory objects, and that OwnCost is a decimal.
Once you do this, your original query can just include the Select that I showed above - you can also write it as
OwnCost = cfvc.Select(x => CalculateOwnCost(x))
if it makes you more comfortable (me, I've gotten scolded by Resharper enough on this point that I've come to accept it, but it's a matter of taste).
There's no reason you can't further decompose some of the intermediate expressions in that query into their own functions. A lambda function is just a function, after all.
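As one illustration of that kind of decomposition, the repeated "latest change on or before a date" pattern could be pulled into a helper. This is a minimal sketch, not code from the question: ValueChange, LatestAsOf and the sample data are made-up names, and the helper works on in-memory collections (LINQ to Objects). LINQ to SQL generally cannot translate a custom method like this into SQL, so it only fits the parts of the work that run on the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the value-change entities in the question.
public class ValueChange
{
    public DateTime ChangeDateTime { get; set; }
    public decimal Porolone { get; set; }
}

public static class ValueChangeExtensions
{
    // Returns the most recent item dated on or before asOf,
    // or a new empty item when none qualifies.
    public static T LatestAsOf<T>(this IEnumerable<T> source,
                                  Func<T, DateTime> changedAt,
                                  DateTime asOf) where T : class, new()
    {
        return source.Where(x => changedAt(x) <= asOf)
                     .OrderByDescending(changedAt)
                     .FirstOrDefault() ?? new T();
    }
}

public static class LatestAsOfDemo
{
    public static decimal Run()
    {
        var changes = new List<ValueChange>
        {
            new ValueChange { ChangeDateTime = new DateTime(2010, 1, 1), Porolone = 1m },
            new ValueChange { ChangeDateTime = new DateTime(2010, 6, 1), Porolone = 2m },
            new ValueChange { ChangeDateTime = new DateTime(2011, 1, 1), Porolone = 3m },
        };
        // The newest change on or before 1 Sep 2010 is the June one.
        return changes.LatestAsOf(c => c.ChangeDateTime, new DateTime(2010, 9, 1)).Porolone;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 2
    }
}
```

Each of the long Where/OrderByDescending/FirstOrDefault chains would then shrink to a single call, which is easier to read regardless of what it does for the background compiler.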

I don't have any inherent insight into this problem in terms of what is expensive in the C# compiler. However the two things that jump out when I look at your query are the following
The number and complexity of the let bindings
The complexity of the initializer of the OwnCost member inside the select clause
The best advice I can give is to try and break up the query in order to get these into separate statements and hopefully that will ease the pressure on the compiler.

Split it up. That is one massive expression tree VS is trying to deal with. You could break it up so that some of the SELECT clause transformation happens in LINQ-to-object. This would be a lot easier for the background compiler to deal with. Just get:
var custFVC = (from cfvc in customer.CustomerFrameVariationCategories
let lastValue = cfvc.CustomerFrameVariationCategoryValueChanges.Where(cfvcvc => cfvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(cfvcvc2 => cfvcvc2.ChangeDateTime).FirstOrDefault() ?? new CustomerFrameVariationCategoryValueChange()
let lastValue2 = cfvc.FrameVariationCategory.FrameVariation.Frame.FrameValueChanges.Where(fvc => fvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvc2 => fvc2.ChangeDateTime).FirstOrDefault() ?? new FrameValueChange()
let lastValue3 = cfvc.FrameVariationCategory.FrameVariationCategoryValueChanges.Where(fvcvc => fvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvcvc2 => fvcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameVariationCategoryValueChange()
let lastValue4 = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Any(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).IsActive == false)
where lastValue.IsActive == true
orderby cfvc.FrameVariationCategory.FrameVariation.Frame.Name, cfvc.FrameVariationCategory.Category.Name, cfvc.FrameVariationCategory.FrameVariation.Name
select new
{ cfvc, lastValue, lastValue2, lastValue3, lastValue4 }).ToList();
And then do the rest of your manipulation from there. If the result set is small, this will cost very little in performance, might even be more efficient, and is certainly easier on your db.
If you have a bored, underworked db and there's a large result set and the machine where this code is running is strained, then you might need to keep the workload on the db.
Keep in mind that just breaking up the building of one massive expression tree into several steps, all to be run against IQueryable, will not do you any good. That last variable will be as complicated (under the hood) as yours is now, and the compiler will still choke. The bottom line is you need to run .ToList() earlier in the life of this manipulation. A series of LINQ-to-object queries against IEnumerable won't be difficult for the background compiler to handle.

VS is probably choking on this because it's such a large, complex single statement.
Since the only linkages between the OwnCost and the LINQ context are the reference to cfvc and lastValue4, it seems to me that you could calculate OwnCost in a separate step after the initial LINQ query statement. Store the lastValue4 in the anonymous type constructed by the LINQ statement, remove OwnCost, and remove .ToList() from the end. You don't need to store the cfvc value since the only thing you're using it for is to access .FrameVariationCategory, which you've already captured in the 2nd field of the anonymous type.
In a separate statement, select from the custFVC result set to construct the OwnCost for each item, producing a new result set that contains all the data bits you're looking for. Call .ToList() on that second result set. This should produce equivalent results to the monster statement in similar time.
If this is a large result set, be careful about iterating over the data multiple times. If you use a foreach to calculate OwnCost for each item in the original result set, you will be running through the data twice - twice as much work as the single monster LINQ query.
If you use a LINQ query for the second operation, it shouldn't cause any additional passes over the data beyond what you already have - LINQ is lazily evaluated, so the next row is not actually retrieved until asked for. ToList() forces all rows to be retrieved. A foreach loop forces all rows to be retrieved. A LINQ query that uses another LINQ query as its input does not iterate any rows of the input result set; it just piles on more conditions to be evaluated when somebody eventually asks for the next row of the 2nd result set.
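The lazy behaviour described here can be observed with a small in-memory experiment. This is only a sketch using LINQ to Objects: Rows and RowsRead are hypothetical names standing in for a real row source, and a database-backed query defers in the same spirit but through its own provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int RowsRead;

    // Hypothetical row source that counts how many rows have been pulled from it.
    static IEnumerable<int> Rows()
    {
        for (int i = 1; i <= 5; i++)
        {
            RowsRead++;
            yield return i;
        }
    }

    public static int[] Run()
    {
        RowsRead = 0;
        var first = Rows().Where(x => x % 2 == 1);  // nothing read yet
        var second = first.Select(x => x * 10);     // still nothing read
        int beforeToList = RowsRead;
        var list = second.ToList();                 // the single pass happens here
        return new[] { beforeToList, RowsRead, list.Count };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 0, 5, 3
    }
}
```

Stacking the second query on the first adds no extra pass: the five source rows are read once, and only when ToList() asks for them.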

I may be wrong here and it's just a guess, but have you tried splitting your class (and essentially your file) with the partial keyword?

Hah, I found a solution myself :)
Robert's instinct about a LINQ function got me googling. The results were not relevant to the matter at hand, but the little code I did stumble upon got me thinking about a brute force attack method. Using redoced's idea of a partial class I finally wrote this piece of code in a separate .cs file:
public partial class WindowExcel
{
private static decimal GetOwnCost(CustomerFrameVariationCategory cfvc, bool frameModuleAnyNonActive, DateTime selectedDateTime)
{
return cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(frameModuleAnyNonActive ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= selectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= selectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= selectedDateTime) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= selectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange())));
}
}
And in my gigantic LINQ query I rewrote OwnCost as such:
OwnCost = WindowExcel.GetOwnCost(cfvc, lastValue4, this.SelectedDateTime)
Editing the GetOwnCost method is still painfully slow, as was expected, but at least the rest of my project is now usable. I'm not sure what this brute force separation does to performance. The fact that I can't ref the CustomerFrameVariationCategory, and that the OwnCost expression tree is inside a method rather than in the LINQ query itself, raises questions. Guess I'll have to profile it at some point, but that's a whole other issue.
Now to the delicate issue of what to mark as the answer. Though I do appreciate all the input, none of the answers so far were correct (no concrete solution), thus I'll have to mark my own post as the answer. But I will vote for redoced's and Robert's answers for pointing me in the right direction.
I would appreciate, if anyone can comment about possible code execution performance impacts for my solution vs the original code.
PS! Writing this in Internet Explorer 8 is again painfully slow because of the constant CPU hogging (has something to do with coloring the code). So it's not only a VS issue....
Edit:
It seems Robert has managed to post the exact same solution I came up with. Would have probably got my answer posted earlier if not for the constant CPU hogging...
In all fairness I marked Robert's post as the answer :)

Instead of writing LINQ to SQL you can write a stored procedure for this to do all these things.

Related

C# LINQ: is calling .Where() multiple times bad for performance?

If I want to query a database is it better to make 1 call to .Where() with a large set of conditions or can I make several successive calls to .Where with smaller conditions?
e.g.
_db.Person.Where(p => p.Name == X && p.Age > 1 && p.Face == Attractive)
or
var person = _db.Person.Where(p => p.Name == X);
person = person.Where(p => p.Age > 1);
person = person.Where(p => p.Face == Attractive);
To filter results I presume LINQ has to loop over items. Does LINQ .Where have any optimisation features to prevent the second approach impacting performance?

Linq-to-SQL and Entity Framework do not work like that. They translate the query into a single SQL query, therefore it's not going to make the slightest bit of difference which one you do, as either way you get the same SQL, which the database engine will compile using the best indexes available.
In fact, even if the conditions are flipped it will not make a difference on the vast majority of DBMSs, because of the way they compile the SQL, using indexes and statistics to reorder conditions and so on.
With Linq-To-Objects, on the other hand, the single Where will be very slightly faster. Chained calls still don't loop over the whole list again: consecutive Where calls are folded into one pass. What it actually does is something like this
var person = list.Where(p => previousConditions && yourCondition)
But because each condition is in a separate lambda there is a slight overhead from the extra function call.
Furthermore, the conditions are checked in the order the Where calls were written, so an element rejected by an earlier condition never reaches the later ones.
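One way to see what chained Where calls do over an in-memory list is to count how often each predicate runs. The sketch below is an illustration only: ChainedWhereDemo and its counters are made-up names, and it exercises LINQ to Objects, not a database provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainedWhereDemo
{
    public static int FirstCalls;
    public static int SecondCalls;

    public static List<int> Run()
    {
        FirstCalls = 0;
        SecondCalls = 0;
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };

        var query = list
            .Where(x => { FirstCalls++; return x > 3; })        // condition written first
            .Where(x => { SecondCalls++; return x % 2 == 0; }); // condition written second

        return query.ToList();
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(string.Join(", ", result));        // prints 4, 6
        // The first predicate sees all six elements; the second only sees 4, 5 and 6.
        Console.WriteLine(FirstCalls + " / " + SecondCalls); // prints 6 / 3
    }
}
```

The counts show a single pass over the list, with each element tested against the first condition and only the survivors tested against the second.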

Find Distinct Count in MongoDB using C# (mongocsharpdriver)

I have a MongoDB collection, and I need to find the distinct count of records after running a filter.
This is what I have right now,
var filter = Builders<Foo>.Filter.Eq("bar", "1");
db.GetCollection<Foo>("FooTable").Distinct<dynamic>("zoo", filter).ToList().Count;
I don't like this solution as it reads the collection in memory to get the count.
Is there a better way to get distinct count directly from db?

The following code will get the job done using the aggregation framework.
var x = db.GetCollection<Foo>("FooTable")
.Aggregate()
.Match(foo => foo.bar == 1)
.Group(foo => foo.zoo,
grouping => new { DoesNotMatter = grouping.Key })
.Count()
.First()
.Count;
The funky "DoesNotMatter" bit seems required (could have a different name) but the driver won't accept null or anything else... Mind you, it gets optimized away anyway and will not be sent to MongoDB.
Also, this code will execute entirely on the server. It won't, however, use indexes so will probably be slower than what you have at this point.
Your current code could be shortened into:
db.GetCollection<Foo>("FooTable").Distinct(d => d.zoo, d => d.bar == 1).ToList().Count;
This will use indexes if available but, yes, it will transfer the list of all distinct values back to the client...

Nested .Select() inside of .Where()

I have a many-to-many relationship between tables of Games and Genres. During an analysis, I need to get items from Games that match specific criteria.
The problem is that, to check these criteria, I need to analyse the genres of each specific game. And LINQ won't let me do it.
My request now looks like this:
var result = GDB.Games.Where((g)=>
g.GamesToGenres.Select((gtg)=>
(weights.ContainsKey(gtg.Genre.Name) ? weights[gtg.Genre.Name]:0.0)
).Sum() > Threshhold
).ToArray();
When I execute it, I receive SQL exception
Only one expression can be specified in the select list when the
subquery is not introduced with EXISTS.
Is there a workaround? How can I perform such a Select inside of a Where?
EDIT: weights is a Dictionary<string, double>.
EDIT: I was playing with lambdas, and found out a strange thing in their behaviour:
this code won't work, throwing nvarchar to float conversion exception:
Func<string, double> getW = (name) => 1;
var t = GDB.Games.Where((g)=>
g.GamesToGenres.Select((gtg)=>
getW(gtg.Genre.Name)
).Sum() > Threshhold
).ToArray();
but this one will work nicely:
var t = GDB.Games.Where((g)=>
g.GamesToGenres.Select((gtg)=>
1
).Sum() > Threshhold
).ToArray();
This leads me to the conclusion that LINQ lambdas are not usual lambdas. What's wrong with them, then? What are their limitations? What can and can't I do inside of them? Why is it OK for me to place a .Select call inside of a lambda, but not my own call of getW?
RESOLVED. See the answer below. Long story short, C# can't into closures unless explicitly told so. If anyone knows a better answer, I am still confused.

Your problem is that you're trying to select something from the dictionary weights, which exists in your application and not in your DB. If it was the result of a query to your DB, use the query.Single(...) in its place.

Well, I am confused beyond imagination. The following code works perfectly:
Func<Game, bool> predicate = (g) =>
g.GamesToGenres.Select((gtg) =>
(weights.ContainsKey(gtg.Genre.Name) ? weights[gtg.Genre.Name] : 0.0)
).Sum() > Threshhold;
var t = GDB.Games.Where(predicate).ToArray();
A careful reader might want to say "Hey! Isn't that the very same code you wrote in the question? You just explicitly assigned it to a variable!", and he will be right. Right now it seems like the C# lambda processor is a piece of something, and it creates a closure only when you explicitly declare a lambda. If someone can explain this phenomenon to me, I will be grateful, for right now I am more confused than a newborn baby.

LINQ allows you to combine SQL data with local data (a Dictionary, etc.), with one restriction: you need to fetch the data from SQL first. This means your code will work if you replace GDB.Games.Where with GDB.Games.ToList().Where. You may worry about performance, but you can select just a slice of the data, such as the game ID and genre names, then filter the games in memory, and finally load the full game records for the resulting list of game IDs.
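The difference between the inline lambda and the lambda stored in a Func variable can be seen without a database. The following is a sketch under stated assumptions: an in-memory list wrapped with AsQueryable() stands in for the Games table, and OverloadDemo and IsQueryable are illustrative names. A real LINQ to SQL table would pick the same overloads, but would additionally try to translate the expression tree into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverloadDemo
{
    static bool IsQueryable(object o)
    {
        return o is IQueryable<int>;
    }

    public static bool[] Run()
    {
        // In-memory stand-in for a database table (hypothetical data).
        IQueryable<int> table = new List<int> { 1, 2, 3 }.AsQueryable();

        // Inline lambda: binds to Queryable.Where, so the lambda becomes an
        // expression tree that a query provider would have to translate.
        var viaExpression = table.Where(x => x > 1);

        // Delegate variable: only Enumerable.Where accepts a Func, so the
        // filter runs in memory over whatever rows the source yields.
        Func<int, bool> predicate = x => x > 1;
        var viaDelegate = table.Where(predicate);

        return new[] { IsQueryable(viaExpression), IsQueryable(viaDelegate) };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints True, False
    }
}
```

Seen this way, assigning the lambda to a Func variable has much the same effect as calling ToList() first: the filter is no longer part of the translated query, so local objects such as a Dictionary are usable, at the cost of pulling the rows to the client.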

Linq query runs in a fraction of a second, but .ToList() takes 3.5 seconds

I'm running a Linq query that returns about 25 records, each with 10 numeric columns. According to my code profiler, the query itself is taking a fraction of a second - but the call to .ToList() is taking about 3.5 seconds. As noted, the volume of data returned from SQL is trivial, so the time taken to copy it into a List should not be burdensome.
Why is .ToList() taking so long? And how could it be improved?
EDIT: With appreciation to all the rapid answers, let me state more clearly: I am completely aware of the fact that the query is lazy loaded. The phenomenon I am seeing is that both SQL Server Profiler and ANTS Performance Profiler are reporting that the actual query execution time was a fraction of a second.
Here's a screen shot from ANTS:
Notice that the calling method is taking 4.3 seconds, while none of the actual SQL queries is taking longer than .05 seconds. Could it be other code in that method, and not SQL? Let's look at how ANTS breaks down the code profile here:
Smoking gun proof: .ToList() is taking 3.36 seconds, of which maybe 0.05 sec can be attributed to actual query execution time, leaving 3.31 sec unaccounted for.
Where's that time going to?
EDIT 2: Okay, you asked for it, so here's my code:
public static Expression<Func<Student, Chart>> GetStudentAssessmentQuestionResultByStudentIdNew(MyDataEntities db)
{
return s => new Chart
{
studentID = s.ID,
Lines =
db.StudentAssessmentAnswers
.Where(
saa =>
saa.StudentAssessment.BorrowedBook.StudentID == s.ID && saa.PointsAwarded != null &&
saa.Question.PointValue > 0 &&
(saa.Question.QuestionType == QuestionType.MultipleChoice ||
saa.Question.QuestionType == QuestionType.OpenEnded))
.GroupBy(
saa =>
new
{
saa.StudentAssessment.AssessmentYear,
saa.StudentAssessment.AssessmentMonth,
saa.Question.CommonCoreStandard
},
saa => saa)
.Select(x => new
{
x.Key.AssessmentYear,
x.Key.AssessmentMonth,
x.Key.CommonCoreStandard,
PercentagePointValue =
(float)(x.Sum(a => a.PointsAwarded) * 100) / (x.Sum(a => a.Question.PointValue))
})
.OrderByDescending(x => x.CommonCoreStandard)
.GroupBy(r1 => (byte)r1.CommonCoreStandard)
.Select(g => new ChartLine
{
ChartType = ((ChartType)g.Key),
//type = g.Key.ToString(),
type = g.Key,
Points = g.Select(grp => new ChartPoint
{
Year = grp.AssessmentYear.Value,
Month = grp.AssessmentMonth.Value,
yValue = grp.PercentagePointValue
})
})
};
}
This is called by:
var students =
db.ClassEnrollments
.Where(ce => ce.SchoolClass.HomeRoomTeacherID == teacherID)
.Select(s => s.Student);
var charts = CCProgressChart.GetStudentAssessmentQuestionResultByStudentIdNew(db);
var chartList = students.Select(charts).ToList();
Does that help any?

.ToList() is actually executing the query. So your query is taking 3.5 seconds to run.
Read more about Deferred Execution here.
Without your actual LINQ query, we have no means to help you with its performance (if you post it, I'll update my answer).

LINQ to SQL is lazy loaded. Nothing actually happens until you call the ToList().
Edit:
Since you've updated your question there are a few things to note. ToList() is taking 73.4% of the time in the constructor. This is the place the SQL statement is actually executed. Below is the actual ToList method:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source != null)
{
return new List<TSource>(source);
}
else
{
throw Error.ArgumentNull("source");
}
}
Your SQL is not actually executed until the following line is called:
return new List<TSource>(source);
So yes, ToList is taking forever, but that's because it has started to process the source parameter. ANTS is a very useful tool but it can sometimes lead you down the wrong path. The issue is most likely going to be one of these:
The T-SQL generated is hideous and slow.
Opening the connection to the SQL server is slow
Your lambda expression tree is complicated and entity framework is taking a long time to process it and generate the T-SQL
I would start by first opening up SQL profiler and looking at the SQL. Even what looks like the simplest statement can be doing something crazy, the only way to know for sure is to look at the actual T-SQL.
If the T-SQL is OK, you're going to have to play around with your lambda expression to make it simpler and easier to process.

Linq2SQL uses something called Lazy Loading. Basically, until you look in the container you asked for, it doesn't go near the data source. You can keep building up your query, but as soon as you call ToList on it, or look at the First, it builds your query, sends it to the database and waits for the result... hence it takes so long.

Without more detail, it's hard to know for sure. But, what type is your List of? What does that class do in the constructor/setters? Does it implement INotifyPropertyChanged, or fire any other events?
Bear in mind the query will only be executed when you call ToList. Up until that point, it's just a queryable object. Look at it with a SQL profiler, and you'll see that that's when it does the DB access.

Well, after all that, it turns out that ANTS was doing a Heisenberg on us.
The way things played out was that I originally got a complaint about poor performance in this area. I ran ANTS and identified that this code was responsible. So I refactored, optimized and the result was the code you see in the question. I then retested, and found a significant improvement, but performance was still unacceptable. Then came the SO question.
And then I decided to try running the unit test without ANTS... and it ran in a fraction of a second.
Lesson learned: sometimes the performance profiler is itself the reason for poor performance.

Very poor performance with a simple query in Entity Framework

So I have a very simple structure:
I have Orders that have a unique OrderNumber
Orders have many OrderRows
OrderRows have many RowExtras that have 2 fields, position (the sequence number of the RowExtra within the OrderRow) and Info, which is a string. More often than not, an OrderRow does not have more than one RowExtra.
(Don't mind the silly structure for now, it's just how it is).
So now I get a list of objects that have three properties:
OrderNumber
Position
Info
What I want to do is simply 1) check if the RowExtra with the given OrderNumber/Position -pair exists in the database and if so, 2) update the Info-property.
I have tried a few different ways to accomplish this with very poor results at best. The solutions loop through the list of objects and issue a query such as
myContext.RowExtras.Where(x => x.Position == currentPosition &&
x.OrderRow.Order.OrderNumber == currentOrderNumber)
or going from the other side
myContext.Orders.Where(x => x.OrderNumber == currentOrderNumber)
.SelectMany(x => x.OrderRows)
.SelectMany(x => x.RowExtras)
.Where(x => x.Position == currentPosition)
and then check if the count equals to 1 and if so, update the property, otherwise proceed to next item.
I currently have roughly 4000 RowExtras in the database and need to update about half of them. These methods make the procedure take several minutes to complete, which is really not acceptable. What I don't understand is why it takes such a long time, because the SQL-clause that returns the required RowExtra would be quite easy to write manually (with just 2 joins and 2 conditions in the where-part).
The best performance I managed to achieve was with a compiled query looking like this
Func<MyContext, int, string, IQueryable<RowExtra>> query =
CompiledQuery.Compile(
(MyContext ctx, int position, string orderNumber) =>
from extra in ctx.RowExtras
where
extra.Position == position &&
extra.OrderRow.Order.OrderNumber == orderNumber
select extra);
and then invoking said query for each object in my list. But even this approach took way over a minute. So how do I actually get this thing to run within a reasonable timeframe?
Also, I'm sorry for the overly long explanation, but hopefully someone can help me!

Try to minimise the number of database calls. As a rule of thumb, each one will take roughly 10ms at least - even one that just returns a scalar.
So, in general, fetch all the data you will need in one go, modify it in code and then save it.
List<Order> orders = myContext.Orders
.Include( "OrderRows.RowExtras" )
.Where( ... select all the orders you want, not just one at a time ... )
.ToList();
foreach ( Order order in orders )
{
... execute your logic on the in-memory model
}
myContext.SaveChanges();
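The "execute your logic on the in-memory model" step could look roughly like the sketch below. It is not Entity Framework code: RowExtra and Update are hypothetical flattened classes, and the rows are assumed to have been loaded already by a single query like the one above. It also assumes each (OrderNumber, Position) pair is unique, since ToDictionary throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened shapes; the real entities are EF classes.
public class RowExtra
{
    public string OrderNumber { get; set; }
    public int Position { get; set; }
    public string Info { get; set; }
}

public class Update
{
    public string OrderNumber { get; set; }
    public int Position { get; set; }
    public string Info { get; set; }
}

public static class BatchUpdateDemo
{
    // Applies updates to rows already loaded in memory; returns how many matched.
    public static int Apply(IEnumerable<RowExtra> loaded, IEnumerable<Update> updates)
    {
        // Index once by (OrderNumber, Position) so each update is a dictionary lookup.
        var index = loaded.ToDictionary(r => Tuple.Create(r.OrderNumber, r.Position));
        int matched = 0;
        foreach (var u in updates)
        {
            RowExtra row;
            if (index.TryGetValue(Tuple.Create(u.OrderNumber, u.Position), out row))
            {
                row.Info = u.Info; // unmatched updates are simply skipped
                matched++;
            }
        }
        return matched;
    }

    public static void Main()
    {
        var rows = new List<RowExtra>
        {
            new RowExtra { OrderNumber = "A1", Position = 1, Info = "old" },
            new RowExtra { OrderNumber = "A1", Position = 2, Info = "old" },
        };
        var updates = new List<Update>
        {
            new Update { OrderNumber = "A1", Position = 2, Info = "new" },
            new Update { OrderNumber = "B9", Position = 1, Info = "ignored" },
        };
        Console.WriteLine(Apply(rows, updates)); // prints 1
    }
}
```

With a few thousand rows this replaces thousands of round trips with one load, one in-memory pass, and one SaveChanges() call.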
