Linq2Entities Include with Skip / Take - load issue

Linq2Entities Include with Skip / Take - load issue - c#

Note: I know there are a number of questions around for issues with Linq's .Include(table) not loading data, I believe I have exhausted the options people have listed, and still had problems.
I have a large Linq2Entities query on an application I'm maintaining. The query is built up as such:
IQueryable<Results> query = context.MyTable
.Where(r =>
r.RelatedTable.ID == 2 &&
r.AnotherRelatedTable.ID == someId);
Then predicates are built up depending on various business logic, such as:
if (sortColumn.Contains("dob "))
{
if (orderByAscending)
query = query.OrderBy(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
else
query = query.OrderByDescending(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
}
Note - there is always a sort order provided.
Originally the included tables were set at the beginning, after reading articles such as the famous Tip 22, so now they are done at the end (which didn't fix the problem):
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage);
Seemingly at random (approximately for every 5000 users of the site, this issue happens once) the RelatedTable data won't load. It can be brute forced by calling load on the related table. But even the failure to load isn't consistent, I've run the query in testing and it's worked, but most of the time hasn't, without changing any code or data.
It is fine, when the skip and take aren't included, and the whole dataset is returned, but I would expect the skip and take to be done on the complete dataset - it certainly appears to be from profiling the SQL...
UPDATE 16/11/10: I have profiled the SQL against a problem data set, and I've been able to reproduce the query failing about 9/10 times, but succeeding the rest. The SQL being executed is identical when the query fails or succeeds except, as expected, for the parameters passed to the SQL.
The issue has been solved with the following change, but the question remains as to why this should be.
Failing - get LINQ to handle the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage)
.ToList();
Working - enumerate the data then get the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.ToList()
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage);
Unfortunately the SQL this query creates contains some sensitive schema data so I can't post it, it is also 1400 lines long, so I wouldn't subject the public to it anyway!

The sole effect of Take() is to change the generated SQL. Other than that, the Entity Framework does not care about it at all. Same for .Skip(). It's hard to believe that this would have an effect on query materialization (although stranger things have happened).
So what could be causing this behavior? Off the top of my head:
A bug in your application or mapping which is causing an incorrect query to be generated.
A bug in the Entity Framework which would cause returned data to be materialized into objects incorrectly in certain circumstances.
Bad data in your database.
A bug in your database's SQL parser.
I don't think you're going to get a lot further with this until you can capture the generated SQL and run it yourself. This is actually not terribly hard, as you can set up a SQL profiler with an appropriate filter. If you find that the generated SQL is different in the buggy case, you can work backwards from there. If you find that the generated SQL is identical in the buggy case, the next step would be to look at the rows returned, preferably in the same context as the application ran it.
In short, I think you just have to keep tweaking your SQL profiling until you have the information you need.

Related

LINQ Search nvarchar(MAX) column extremely slow using .Contains()

I have a .net core API and I am trying to search 4.4 million records using .Contains(). This is obviously extremely slow - 26 seconds. I am just querying one column which is the name of the record. How is this problem generally solved when dealing with millions of records?
I have never worked with millions of records before so apart from the obvious altering of the .Select and .Take, I haven't tried anything too drastic. I have spent many hours on this though.
The other filters included in the .Where are only used when a user chooses to use them on the front end - The real problem is just searching by CompanyName.
Note; I am using .ToArray() when returning the results.
I have indexes in the database but cannot add one for CompanyName as it is Nvarchar(MAX).
I have also looked at the execution plan and it doesn't really show anything out of the ordinary.
query = _context.Companies.Where(
c => c.CompanyName.Contains(paging.SearchCriteria.companyNameFilter.ToUpper())
&& c.CompanyNumber.StartsWith(
string.IsNullOrEmpty(paging.SearchCriteria.companyNumberFilter)
? paging.SearchCriteria.companyNumberFilter.ToUpper()
: ""
)
&& c.IncorporationDate > paging.SearchCriteria.companyIncorperatedGreaterFilter
&& c.IncorporationDate < paging.SearchCriteria.companyIncorperatedLessThanFilter
)
.Select(x => new Company() {
CompanyName = x.CompanyName,
IncorporationDate = x.IncorporationDate,
CompanyNumber = x.CompanyNumber
}
)
.Take(10);
I expect the query to take around 1 / 2 seconds as when I execute a like query in ssms it take about 1 / 2 seconds.
Here is the code being submitted to DB:
Microsoft.EntityFrameworkCore.Database.Command: Information: Executing DbCommand [Parameters=[#__p_4='?' (DbType = Int32), #__ToUpper_0='?' (Size = 4000), #__p_1='?' (Size = 4000), #__paging_SearchCriteria_companyIncorperatedGreaterFilter_2='?' (DbType = DateTime2), #__paging_SearchCriteria_companyIncorperatedLessThanFilter_3='?' (DbType = DateTime2), #__p_5='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
SELECT [t].[CompanyName], [t].[IncorporationDate], [t].[CompanyNumber]
FROM (
SELECT TOP(#__p_4) [c].[CompanyName], [c].[IncorporationDate], [c].[CompanyNumber], [c].[ID]
FROM [Companies] AS [c]
WHERE (((((#__ToUpper_0 = N'') AND #__ToUpper_0 IS NOT NULL) OR (CHARINDEX(#__ToUpper_0, [c].[CompanyName]) > 0)) AND (((#__p_1 = N'') AND #__p_1 IS NOT NULL) OR ([c].[CompanyNumber] IS NOT NULL AND (#__p_1 IS NOT NULL AND (([c].[CompanyNumber] LIKE [c].[CompanyNumber] + N'%') AND (((LEFT([c].[CompanyNumber], LEN(#__p_1)) = #__p_1) AND (LEFT([c].[CompanyNumber], LEN(#__p_1)) IS NOT NULL AND #__p_1 IS NOT NULL)) OR (LEFT([c].[CompanyNumber], LEN(#__p_1)) IS NULL AND #__p_1 IS NULL))))))) AND ([c].[IncorporationDate] > #__paging_SearchCriteria_companyIncorperatedGreaterFilter_2)) AND ([c].[IncorporationDate] < #__paging_SearchCriteria_companyIncorperatedLessThanFilter_3)
) AS [t]
ORDER BY [t].[IncorporationDate] DESC
OFFSET #__p_5 ROWS FETCH NEXT #__p_4 ROWS ONLY
SOLVED! With the help of both answers!
In the end as suggested, I tried full-text searching which was lightening fast but compromised accuracy of search results. In order to filter those results more accurately, I used .Contains on the query after applying the full-text search.
Here is the code that works. Hopefully this helps others.
//query = _context.Companies
//.Where(c => c.CompanyName.StartsWith(paging.SearchCriteria.companyNameFilter.ToUpper())
//&& c.CompanyNumber.StartsWith(string.IsNullOrEmpty(paging.SearchCriteria.companyNumberFilter) ? paging.SearchCriteria.companyNumberFilter.ToUpper() : "")
//&& c.IncorporationDate > paging.SearchCriteria.companyIncorperatedGreaterFilter && c.IncorporationDate < paging.SearchCriteria.companyIncorperatedLessThanFilter)
//.Select(x => new Company() { CompanyName = x.CompanyName, IncorporationDate = x.IncorporationDate, CompanyNumber = x.CompanyNumber }).Take(10);
query = _context.Companies.Where(c => EF.Functions.FreeText(c.CompanyName, paging.SearchCriteria.companyNameFilter.ToUpper()));
query = query.Where(x => x.CompanyName.Contains(paging.SearchCriteria.companyNameFilter.ToUpper()));
(I temporarily excluded the other filters for simplicity)

When you run the query in SSMS, it's probably cached for subsequent calls. The original query probably took similar time as the EF query. That said, there are disadvantages to parametrised queries - while you can better reuse execution plans in a parametrised query, this also means that the execution plan isn't necessarily the best for the actual query you're trying to run right now.
For example, if you specify a CompanyNumber (which is easy to find in an index due to the StartsWith), you can filter the data first by CompanyNumber, thus making the name search trivial (I assume CompanyNumber is unique, so either you get 0 records, or you get the one you get by CompanyNumber). This might not be possible for the parametrised query, if its execution plan was optimized for looking up by name.
But in the end, Contains is a performance killer. It needs to read every single byte of data in your table's CompanyName field; which usually means it has to read every single row, and process much of its data. Searching by a substring looks deceptively simple, but always carries heavy penalties - its complexity is linear with respect to data size.
One option is to find a way to avoid the Contains. Users often ask for features they don't actually need. StartsWith might work just as well for most of the cases. But that's a business decision, of course.
Another option would be finding a way to reduce the query as much as possible before you apply the Contains filter - if you only allow searching for company name with other filters that narrow the search down, you can save the DB server a lot of work. This may be tricky, and can sometimes collide with the execution plan collission issue - you might want to add some way to avoid having the same execution plan for two queries that are wildly different; an easy way in EF would be to build the query up dynamically, rather than trying for one expression:
var query = _context.Companies;
if (!string.IsNullOrEmpty(paging.SearchCriteria.companyNameFilter))
query = query.Where(c => c.CompanyName.Contains(paging.SearchCriteria.companyNameFilter));
if (!string.IsNullOrEmpty(paging.SearchCriteria.companyNumberFilter))
query = query.Where(c => c.CompanyNumber.StartsWith(paging.SearchCriteria.companyNumberFilter));
// etc. for the rest of the query
This means that you actually have multiple parametrised queries that can each have their own execution plan, more in line with what the query actually does. For some extreme cases, it might also be worthwhile to completely prevent execution plan caching (this is often useful in reports).
The final option is using full-text search. You can find plenty of tutorials on how to make this work. This works essentially by splitting the unformatted string data to individual words or phrases, and indexing those. This means that a search for "hello world" doesn't necessarily return all the records that have "hello world" in the name, and it might also return records that have something else than "hello world" in the name. Think Google Search rather than Contains. This can often be a great method for human-written text, but it can be very confusing for the user who doesn't understand why you'd return search results that are completely different from what he was searching for. It also often doesn't work well if you need to do partial searches (e.g. searching for "Computer" might return "Computer, Inc.", but searching for "Comp" might return nothing).
The first option is likely the fastest, and closest to what the users would expect. It has the weakness that it can't search in the middle, though. The second option is the most correct, and might make your query substantially faster, especially in the most common cases with good statistics. The third option is probably about as fast as the first one, but can be tricky to setup properly, and can be confusing for your users. It does also provide you with more powerful ways to query the text data (e.g. using wildcards).

Welcome to stack overflow. It looks like you are suffering from at least one of these three problems in your code and your architecture.
First: indexing
You've mentioned that this cannot be indexed but there is support in SQL Server for full text indexing at the very least.
.Contains
This method isn't exactly suitable for the size of operation you're performing. If possible, perhaps as a last resort, consider moving to a parameterized query. For now, however, it looks like you want to keep your business logic in the .net code rather than spreading it into SQL and that's a worthy plan.
c.IncorporationDate
Date comparison can be a little costly in SQL Server. Once you're dealing with so many millions of rows you might get a lot of performance benefit from correctly partitioned tables and indexes.
Consider whether or not these rows can change at all. Something named IncoporationDate sounds like it definitely should not be changed. I suspect you may want to leverage that after reading the rest of these.

How does Entity Framework entity loading work?

I've been playing around with making my own Entity Framework (for personal projects and out of curiosity on what making something like this would take).
While I was doing Entity Framework performance tests with a data table with 700k rows and 5 columns (named MassData), I ran into something peculiar issues that I'm hoping someone could explain to me.
Running the following test:
var Context = new EntityFameworkContext();
var first = Context.MassData.Where(x => x.Id == 1);
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
The context creation takes 35ms, getting first takes about 215ms and getting firstFifty takes 14ms.
Removing 'first', getting 'firstFifty' takes about 210ms.
The results were the same if I switch the 'first' query with a Where() that selects everything (still with no iteration).
My first thought, was that this was some case of loading the lazy data in the DbSet, with the first query enumerating data the next one accesses (even though the first one doesn't iterate through anything). This would kind of explain why the first always takes a minimum of 200ms regardless of the query, while the second runs as fast as if no database connection was even involved (the 'firstFifty' takes 25ms minimum to run as an SQL query, more than the 15ms I'm seeing here).
Except loading all of MassData takes 5 seconds. Just reading it takes about 2,5. So it can't be loading everything, but it's clearly loading more than the first query requires. So obviously I'm missing something.
Would anyone happen to have an explanation for why the
var first = Context.MassData.Where(x => x.Id == 1);
query speeds up the
var firstFifty = Context.MassData.Where(x => x.Id < 50).ToArray();
query?
EDIT:
Turns out, it really had nothing to do with lazy loading at all. The first query opens the connection and (I presume) does and stores the validation of the entity type against the database table. The second query then doesn't have to open the connection or do much if any validation, in which case the duration of the second query matches up, and everything makes sense.
EDIT 2:
Modified title to better match what the question ended up really being about (How does lazy-loading work => how does entity loading work).

Because you're still loading the Entity Type and the prerequisites of it. Regardless of what you're trying to query. Lambda expression for EF still is SQL, with the conversion from Lambda to String clauses and statements. So the first slowness is not from the Query but from the EF Initial set up.
Remember, You're still creating an Instance of the EF so it will eat up some runtime process. Then the rest is Query Fetching time. This is unavoidable process of-course because of the CLR process.
So, Generally. The Second Query is ready for the Query since in your Model where you set up the EF is still on use, But when the Garbage collection decides that it is not going to be used anywhere, then your query for the next session will be slower at the begin Init for EF. "MEANING YOUR CONNECTION FROM THE DB IS STILL OPEN" as easy as that.

There are tools to show sql server activity, so you do not have to guess (for example sql profiler for microsoft sql server). But the lag on first query has probably nothing to do with database, it is just EF internal initialization. EF is notoriously lazy.

Entity Framework Core count does not have optimal performance

I need to get the amount of records with a certain filter.
Theoretically this instruction:
_dbContext.People.Count (w => w.Type == 1);
It should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count ();
Entity Framework generates the following query:
Select count (*)
from People
.. which runs very fast.
How to generate this second query passing search criteria to the count?

There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way you can let it do that by rewriting the query (and you shouldn't be doing that at the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries which allows them to release EF Core versions with incomplete/very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
Shortly, they are planning to improve the query processing, but we don't know when will that happen and what level of degree (remember the mixed mode allows them to consider query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6, it's has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 you query generates the intended SQL (tested), so you can simply upgrade and the concrete issue will be gone.
But note that along with improvements the new releases introduce bugs/regressions (as you can see here EFCore returning too many columns for a simple LEFT OUTER join for instance), so do that on your own risk (and consider the first option again, i.e. Which One Is Right for You :)

Try to use this lambda expression for execute query faster.
_dbContext.People.select(x=> x.id).Count();

Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();

If you want to optimize performance and the current EF provider is not not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql("SELECT Id, Name, Type, " +
"FROM People " +
"WHERE Type = #p0",
1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}

It seems that there has been some problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified exact version so I am not able to dig into EF source code to tell what exactly has gone wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
Here is my test program:
And here is SQL what gets generated captured by SQL Server Profiler:
As you can see it matches all the expectations.
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.

Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();

I am using EFCore 1.1 here.
This can occur if EFCore cannot translate the entire Where clause to SQL. This can be something as simple as DateTime.Now that might not even think about.
The following statement results in a SQL query that will surprisingly run a SELECT * and then C# .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();

sorry for the bump, but...
probably the reason the query with the where clause is slow is because you didnt provide your database a fast way to execute it.
in case of the select count(*) from People query we dont need to know the actual data for each field and we can just use a small index that doesnt have all these fields in them so we havent got to spend our slow I/O on. The database software would be clever enough to see that the primary key index requires the least I/O to do the count on. The pk id's require less space than the full row so you get more back to count per I/O block so you can complete faster.
Now in the case of the query with the Type it needs to read the Type to determine it's value. You should create an index on Type if you want your query to be fast or else it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating. A column Gender (usually) only has two values and isnt very discriminating, a primary key column where every value is unique is highly dscriminating. Higher discriminating values will result in a shorter index range scan and a faster result to the count.

What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = new List<People>();
people = _dbContext.People.Where(w => w.Type == 1);
int count = people.Count();
This way you will get the people list too if you need it further.

Improving Linq query

I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?

There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.

1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.

I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)

I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.

Dynamically Generating a Linq/Lambda Where Clause

I've been searching here and Google, but I'm at a loss. I need to let users search a database for reports using a form. If a field on the form has a value, the app will get any reports with that field set to that value. If a field on a form is left blank, the app will ignore it. How can I do this? Ideally, I'd like to just write Where clauses as Strings and add together those that are not empty.
.Where("Id=1")
I've heard this is supposed to work, but I keep getting an error: "could not be resolved in the current scope of context Make sure all referenced variables are in scope...".
Another approach is to pull all the reports then filter it one where clause at a time. I'm hesitant to do this because 1. that's a huge chunk of data over the network and 2. that's a lot of processing on the user side. I'd like to take advantage of the server's processing capabilities. I've heard that it won't query until it's actually requested. So doing something like this
var qry = ctx.Reports
.Select(r => r);
does not actually run the query until I do:
qry.First()
But if I start doing:
qry = qry.Where(r => r.Id = 1).Select(r => r);
qry = qry.Where(r => r.reportDate = '2010/02/02').Select(r => r);
Would that run the query? Since I'm adding a where clause to it. I'd like a simple solution...in the worst case I'd use the Query Builder things...but I'd rather avoid that (seems complex).
Any advice? :)

Linq delays record fetching until a record must be fetched.
That means stacking Where clauses is only adding AND/OR clauses to the query, but still not executing.
Execution of the generated query will be done in the precise moment you try to get a record (First, Any etc), a list of records(ToList()), or enumerate them (foreach).
.Take(N) is not considered fetching records - but adding a (SELECT TOP N / LIMIT N) to the query

No, this will not run the query, you can structure your query this way, and it is actually preferable if it helps readability. You are taking advantage of lazy evaluation in this case.
The query will only run if you enumerate results from it by using i.e. foreach or you force eager evaluation of the query results, i.e. using .ToList() or otherwise force evaluation, i.e evaluate to a single result using i.e First() or Single().

Try checking out this dynamic Linq dll that was released a few years back - it still works just fine and looks to be exactly what you are looking for.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.