Get LINQ to preload a complete table - c#

I need LINQ to grab a whole table, but this does not seem to work: every time I select a value via the primary key, another select is fired.
So, actually this code:
DataContext dc = new DataContext();
dc.Stores.ToList();
Store st = dc.Stores.SingleOrDefault(p => p.Id == 124671);
is making a
select * from store
at the "ToList()" method and an ADDITIONAL
select * from store where id = 124671
at the selection part below it...
Of course, I want to prevent the second select.
How would I do that? (I DON'T want to store the ToList() result in an additional property like List<Store>.)
UPDATE:
Going by your answers, that would mean that:
Store st = stores.SingleOrDefault(p => p.Id == 124671);
Store st = stores.SingleOrDefault(p => p.Id == 124671);
would trigger two selects against the DB, which would make the whole idea of LINQ useless?! Or what am I getting wrong here?
I thought LINQ would basically keep all the data I grabbed in earlier selects and ONLY perform another request when the data was not found in the "cache". So I thought of it as some kind of "magical" storage layer between my application and the database.
UPDATE #2
Just so you get the idea: I want to lose the performance at the beginning (when grabbing all the data) and win it back later when I select from the "cached" data.

Try this instead:
DataContext dc = new DataContext();
var allStores = dc.Stores.ToList();
Store st = allStores.SingleOrDefault(p => p.Id == 124671);

(I'm assuming it knows that the id is the primary key etc)
Which exact version? That looks like LINQ-to-SQL; in 3.5 (no service packs), yes - the identity manager was a bit rubbish at this. It got better in 3.5 SP1, and is supposedly fixed in 4.0.
In .NET 3.5 SP1, the Single(predicate) approach works IIRC, so perhaps use that with a try/catch block? A try/catch will be faster than a network hop.
From the later connect post:
This bug has now been fixed and will be included in .NET Framework 4.0. The optimization to search the cache first for ID-based lookups will now be done for Single/SingleOrDefault/First/FirstOrDefault(predicate) as well as Where(predicate).Single/SingleOrDefault/First/FirstOrDefault(), where predicate has the same restrictions as before.

What you are doing is getting the result as a list, throwing it away, and then querying the original table again. That's why you get two trips to the database.
The ToList method returns the list as a result, which you can store in a variable and query from:
var stores = dc.Stores.ToList();
Store st = stores.SingleOrDefault(p => p.Id == 124671);
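If you are going to do many lookups by key against the preloaded list, one option is to index it once in a dictionary. This is only a sketch under assumptions: the Store class below is a stand-in for your mapped entity, and StoreCache, BuildIndex and Find are hypothetical helper names, not part of LINQ to SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the mapped entity (assumption: it has an int Id).
public sealed class Store
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class StoreCache
{
    // Builds an in-memory index from rows that were already loaded,
    // e.g. the result of dc.Stores.ToList().
    public static Dictionary<int, Store> BuildIndex(IEnumerable<Store> loadedStores)
    {
        return loadedStores.ToDictionary(s => s.Id);
    }

    // Returns the cached store, or null when the id is not in the index.
    // No database access happens here.
    public static Store Find(Dictionary<int, Store> index, int id)
    {
        return index.TryGetValue(id, out var store) ? store : null;
    }
}
```

Usage would be something like var index = StoreCache.BuildIndex(dc.Stores.ToList()); followed by StoreCache.Find(index, 124671). The trade-off is the same as with the list: the data is a snapshot and will not see later changes in the database.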

Short answer - you don't need the ToList() call.
Long answer - your ToList() call is totally redundant in the supplied example. When working with LINQ, nothing is actually executed on your database server until you enumerate the IQueryable.
If you just want to get a single record by PK, then you just need to work with the data context directly:
var theStore = dc.Stores.SingleOrDefault(p => p.Id == 124671);
If you actually want to get all of the records and iterate over them, then you can call ToList() on the table:
var allStores = dc.Stores.ToList();
If all you want is a single record, performing
var allStores = dc.Stores.ToList();
Store st = allStores.SingleOrDefault(p => p.Id == 124671);
is a complete failure. You are bringing all of the results into memory and then getting .NET to filter the list, thus losing all the benefits of the indexes on your database etc.

Related

Entity Framework is not able to pull specific record by its ID in C# back end

Lets say I have a table called Books, I have few records in that as shown below
ID BookName
1 Book1
2 Book2
3 Book3
when I query above table from my C# back end using records ID as below
var result = context.Books.where(b => b.ID == 1).FirstOrDefault();
I get null for ID = 1, but for IDs 2 and 3 I get the whole records. When I query record ID = 1 directly in SSMS, I do get the record.
It does not make sense to me why and how that can happen. Any help or clue will be highly appreciated.
You need to query the Books table. Please change your query as below:
var result = context.Books.Where(b => b.ID == 1).FirstOrDefault();
or
var result = context.Books.FirstOrDefault(b => b.ID == 1);
Also, if it works with ID values 2 and 3, the next thing I would check is the database that your C# code is actually pointed at. I would also check the schema in which the Books table exists. By default, EF queries tables in the dbo schema unless you have defined otherwise.
Next thing, I would check is the query that is being sent to database. For this, extract context.Books.Where(b => b.ID == 1) to a variable and get the sql query and run it manually in SSMS.
var queryableBooks = context.Books.Where(b => b.ID == 1);
var result = queryableBooks.FirstOrDefault(); // have a break point here
Then, during debugging, inspect the SQL generated for queryableBooks via Quick Watch.
Alternatively, run a trace against the database to capture the SQL statements actually running against it, use a breakpoint in the code, start the trace and execute the read line. The trace will reveal the exact SQL statement(s) being run. You can then copy those statements into SSMS to inspect what they return. A simple trace tool I use for SQL Server is ExpressProfiler from back in the days of enabling tracing for SQL Server Express. Though I recommend building it from source (https://github.com/ststeiger/ExpressProfiler) rather than any installer such as Sourceforge. SSMS has profiler under Tools/SQL Server Profiler which captures a lot more noise by default. Either tool is invaluable for investigating EF weirdness as well as performance.
If context.Books.Where(b => b.ID == 1).SingleOrDefault() returns null, I would look at what SQL gets captured for that statement on your database. Compare that to b => b.ID == 2.
If your trace does not capture anything for either 1 or 2, but you see resulting data for the ID = 2 scenario then the explanation would be that your DbContext is not pointed at the database/server you think it is, or there is something amiss with your DbContext setup.
If your trace does capture something for both 1 and 2, but you are only seeing data for #2, check that the query for #1 executes and returns data. If the query looks valid but your code is not seeing anything for ID #1, then somehow your DbContext is in a state where ID #1 has been removed. I would add code to check the DbContext's ChangeTracker to find out whether there is an entry for #1 sitting in an entity state of Deleted. It is possible that you have code being run which is unexpectedly cascading a delete, but SaveChanges was not called or was not successful. Long-running DbContext instances are prone to this kind of problem.
Test the following:
instead of
var result = context.Books.where(b => b.ID == 1).FirstOrDefault();
use this:
using (var testContext = new MyDbContext())
{
    var result = testContext.Books.Where(b => b.ID == 1).SingleOrDefault();
}
Substituting MyDbContext with your application DbContext. This eliminates any funny business with a long running DbContext's state. The context can still execute SQL but return what is in cached state.
Note: When querying data, opt for SingleOrDefault rather than FirstOrDefault when you expect 0..1 results. Operations like the First flavours should only be used when you expect 0..many but only care about the first result, and they should always be used with an Order By condition to ensure predictable ordering.
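To make the difference concrete, here is a small in-memory sketch using LINQ to Objects rather than EF, so treat it as an illustration of the operator semantics only. The class and method names are made up for the example, and the rows are passed in as tuples instead of entities.

```csharp
using System;
using System.Linq;

public static class SingleVersusFirst
{
    // FirstOrDefault silently picks one row when several match.
    // The OrderBy makes the pick predictable. When nothing matches,
    // the default tuple is returned, so Name is null.
    public static string FirstName((int Id, string Name)[] rows, int id)
    {
        return rows.OrderBy(r => r.Name).FirstOrDefault(r => r.Id == id).Name;
    }

    // SingleOrDefault throws InvalidOperationException when more than
    // one row matches, which surfaces unexpected duplicates early.
    public static bool SingleThrows((int Id, string Name)[] rows, int id)
    {
        try
        {
            rows.SingleOrDefault(r => r.Id == id);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

With two rows sharing ID 1, FirstName returns one of them without complaint, while SingleThrows reports true. For an ID with zero or one match, SingleOrDefault does not throw.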

Entity Framework Core count does not have optimal performance

I need to get the amount of records with a certain filter.
Theoretically this instruction:
_dbContext.People.Count (w => w.Type == 1);
It should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count ();
Entity Framework generates the following query:
Select count (*)
from People
.. which runs very fast.
How to generate this second query passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way you can make it do that by rewriting the query (and you shouldn't have to do that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries which allows them to release EF Core versions with incomplete/very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen or to what degree (remember, the mixed mode allows them to consider a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and the concrete issue will be gone.
But note that, along with improvements, the new releases introduce bugs/regressions (as you can see here: EFCore returning too many columns for a simple LEFT OUTER join, for instance), so do that at your own risk (and consider the first option again, i.e. Which One Is Right for You :)
Try this lambda expression to make the query run faster:
_dbContext.People.Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql("SELECT Id, Name, Type " +
    "FROM People " +
    "WHERE Type = @p0",
    1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
        var result = command.ExecuteScalar().ToString();
    }
}
It seems that there has been some problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified exact version so I am not able to dig into EF source code to tell what exactly has gone wrong.
To test this scenario, I installed the latest EF Core package and got the correct result: the SQL generated by my test program, as captured by SQL Server Profiler, matches all the expectations.
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.
Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EF Core cannot translate the entire Where clause to SQL. The cause can be something as simple as DateTime.Now, which you might not even think about.
The following statement results in a SQL query that will surprisingly run a SELECT * and then C# .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
Sorry for the bump, but...
The query with the where clause is probably slow because you didn't give your database a fast way to execute it.
In the case of the select count(*) from People query, we don't need to know the actual data for each field, so the database can use a small index that doesn't contain all these fields and avoid spending slow I/O on them. The database software would be clever enough to see that the primary key index requires the least I/O to do the count on. The PK ids require less space than the full row, so you get more rows to count per I/O block and can complete faster.
Now, in the case of the query with the Type, it needs to read the Type column to determine its value. You should create an index on Type if you want your query to be fast, or else it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating. A Gender column (usually) has only two values and isn't very discriminating; a primary key column where every value is unique is highly discriminating. More discriminating values will result in a shorter index range scan and a faster result for the count.
What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
// ToList() executes the query and materializes the rows
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.

Entity Framework COUNT is doing a SELECT of all records

I am profiling my code because it takes a long time to execute: it is generating a SELECT instead of a COUNT, and as there are 20,000 records it is very, very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.
So you have 2 completely separate queries going on here and I think I can explain why you get different results. Let's look at the first one
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account, and assuming you are leveraging the framework's ability to lazily load related tables, your first call to catAccount.Cats is going to result in a SELECT for all the associated Cat records for that particular account. This brings those rows into memory, so the call to Count() results in an internal check of the Count property of the in-memory collection (hence no COUNT SQL is generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
Is directly against the Cats table (which would be IQueryable<T>) therefore the only operations performed against the table are Where/Count, and both of these will be evaluated on the DB-side before execution so it's obviously a lot more efficient than the first.
However, if you need both Account and Cats then I would recommend you eager load the data on the fetch, that way you take the hit upfront once
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);
Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local in-memory collection. The problem is that you don't want that. Now you have three options:
check whether your provider offers some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

how to append IQueryable within a loop

I have a simple foreach loop that goes through the productID's I have stored in a user's basket and looks up the product's details from the database.
As you can see from my code, what I have at present will return the very last item on screen - as the variable is overwritten within the loop. I'd like to be able to concat this so that I can display the product details for the items only in the basket.
I know I could do something very easy like store only ProductIDs in the repeater I use and onitemdatabound call the database there but I'd like to make just one database call if possible.
Currently I have the following (removed complex joins from example, but if this matters let me know):
IQueryable productsInBasket = null;
foreach (var thisproduct in store.BasketItems)
{
productsInBasket = (from p in db.Products
where p.Active == true && p.ProductID == thisproduct.ProductID
select new
{
p.ProductID,
p.ProductName,
p.BriefDescription,
p.Details,
p.ProductCode,
p.Barcode,
p.Price
});
}
BasketItems.DataSource = productsInBasket;
BasketItems.DataBind();
Thanks for your help!
It sounds like you really want something like:
var productIds = store.BasketItems.Select(x => x.ProductID).ToList();
var query = from p in db.Products
where p.Active && productIds.Contains(p.ProductID)
select new
{
p.ProductID,
p.ProductName,
p.BriefDescription,
p.Details,
p.ProductCode,
p.Barcode,
p.Price
};
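The same shape can be tried out against in-memory data. This is only a sketch with LINQ to Objects: the Product class is a cut-down stand-in for the real entity, and BasketLookup is a hypothetical helper name. Against db.Products, a Contains on a local list of IDs is typically translated into a SQL IN clause, which is what keeps it to a single query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Products entity (assumption).
public sealed class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; } = "";
    public bool Active { get; set; }
}

public static class BasketLookup
{
    // One filtered pass over the products instead of one lookup per
    // basket item, so no result is overwritten inside a loop.
    public static List<Product> ProductsInBasket(
        IEnumerable<Product> products,
        IEnumerable<int> basketProductIds)
    {
        var ids = new HashSet<int>(basketProductIds);
        return products
            .Where(p => p.Active && ids.Contains(p.ProductID))
            .ToList();
    }
}
```

The result can then be bound once, e.g. BasketItems.DataSource = BasketLookup.ProductsInBasket(allProducts, basketIds), where allProducts and basketIds are whatever collections you already have.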
In Jon's answer, which works just fine, the IQueryable will however be converted to an IEnumerable, since you call ToList() on it. This will cause the query to be executed and the result retrieved. For your situation, this may be OK, since you want to retrieve products for a basket, where the number of products will probably be fairly small.
I am, however, facing a similar situation, where I want to retrieve friends for a member. Friendship depends on which groups two members belong to - if they share at least one group, they are friends. I thus have to retrieve all memberships for all groups for a certain member, then retrieve all members from those groups.
The ToList-approach will not be applicable in my case, since that would execute the query each time I want to handle my friends in various ways, e.g. find stuff that we can share. Retrieving all members from the database, instead of just working on the query and execute it at the last possible time, will kill performance.
Still, my first attempt at this situation was to do just this - retrieve all groups I belonged to (IQueryable), init a List result (IEnumerable), then loop over all groups and append all members to the result if they were not already in the list. Finally, since my interface enforced that an IQueryable was to be returned, I returned the list with AsQueryable.
This was a nasty piece of code, but at least it worked. It looked something like this:
var result = new List<Member>();
foreach (var group in GetGroupsForMember(member))
result.AddRange(group.GroupMembers.Where(x => x.MemberId != member.Id && !result.Contains(x.Member)).Select(groupMember => groupMember.Member));
return result.AsQueryable();
However, this is BAD, since I add ALL shared members to a list, then convert the list to an IQueryable just to satisfy my post condition. I will retrieve all members that are affected from the database, every time I want to do stuff with them.
Imagine a paginated list - I would then just want to pick out a certain range from this list. If this is done with an IQueryable, the query is just completed with a pagination statement. If this is done with an IEnumerable, the query has already been executed and all operations are applied to the in-memory result.
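A minimal sketch of that pagination idea, assuming a hypothetical Paging.Page helper (not something from my code base or from LINQ itself). It only composes OrderBy, Skip and Take onto the IQueryable; with a database-backed provider those operators normally become part of the SQL, whereas with a list they would run over the already loaded data.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Adds ordering and paging to the query without executing it.
    // pageNumber is 1-based. The caller decides when to enumerate.
    public static IQueryable<T> Page<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int pageNumber,
        int pageSize)
    {
        return source
            .OrderBy(orderBy)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize);
    }
}
```

For example, Paging.Page(members, m => m.Id, 3, 10) would describe the third page of ten members, and nothing is fetched until the result is enumerated (members and Id are placeholders for your own query and key).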
(As you may also notice, I also navigate down the entity's relations (GroupMember => Member), which increases coupling and can cause all kinds of nasty situations further on. I wanted to remove this behavior as well.)
So, tonight, I took another round and ended up with a much simpler approach, where I select data like this:
var groups = GetGroupsForMember(member);
var groupMembers = GetGroupMembersForGroups(groups);
var memberIds = groupMembers.Select(x => x.MemberId);
var members = memberService.GetMembers(memberIds);
The two Get methods honor the IQueryable and never convert it to a list or any other IEnumerable. The third line just performs a LINQ query on top of the IEnumerable. The last line just takes the member IDs and retrieves all members from another service, which also works exclusively with IQueryables.
This is probably still horrible in terms of performance, but I can optimize it further later on, if needed. At least, I avoid loading unnecessary data.
Let me know if I am terribly wrong here.

at what point does linq-to-sql or linq send a request to the database

I want to make my queries better but have been unable to find a resource that lays out when a query is shipped off to the DB.
DBContext db = new DBContext();
Order _order = (from o in db
where o.OrderID == "qwerty-asdf-xcvb"
select o).FirstOrDefault();
String _custName = _order.Customer.Name +" "+_order.Customer.Surname;
Does the assignment of _custName need to make any request to the database?
Does the assignment of _custName need to make any request to the database?
It depends on whether or not Order.Customer is lazily loaded. If it is lazily loaded, then yes. Otherwise, no.
By the way, you can investigate this easily if you set the DataContext.Log property:
db.Log = Console.Out;
Then you can watch the SQL statements on the console. By stepping through your program you can see exactly when the SQL statement hits the database.
Check out MSDN on Deferred versus Immediate Loading. In particular, you can turn off lazy loading. Watch out for the SELECT N + 1 problem.
Just FYI, besides lazy loading, there is another reason why database activity may not occur when you expect it to when using LINQ. For example, if I change your example code slightly:
DBContext db = new DBContext();
var orders = (from o in db
where o.OrderID == "qwerty-asdf-xcvb"
select o);
var order = orders.FirstOrDefault();
String _custName = order.Customer.Name + " " + order.Customer.Surname;
Someone unfamiliar with how LINQ works may expect that all orders are retrieved from the database when the second line of code is executed. In fact, LINQ delays querying the database until the last possible moment, which in this case is the call to FirstOrDefault. Of course, at this point LINQ knows to only retrieve at most one record.
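You can watch the same deferral happen without a database. The sketch below uses LINQ to Objects and a counter as a stand-in for "rows read"; the class and member names are invented for the illustration, and a real provider would translate the query to SQL instead of iterating.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    private static int readCount;

    // Stands in for a table: each yielded row counts as one read.
    private static IEnumerable<int> Rows()
    {
        for (int id = 1; id <= 5; id++)
        {
            readCount++;
            yield return id;
        }
    }

    public static (int ReadsBeforeEnumeration, int First, int ReadsAfter) Run()
    {
        readCount = 0;

        // Building the query reads nothing.
        var query = Rows().Where(id => id >= 2);
        int readsBefore = readCount;

        // FirstOrDefault enumerates only until the first match is found.
        int first = query.FirstOrDefault();

        return (readsBefore, first, readCount);
    }
}
```

Run() reports zero reads after the query is built, and only two reads after FirstOrDefault, because enumeration stops at the first matching row.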
