Why do I need to .Include() collections

I wrote a query which is pretty simple:
var locations = await _context.Locations
.Include(x => x.LocationsOfTheUsers)
.Include(x => x.Address)
.ThenInclude(x => x.County)
.Where(CalculateFilters(searchObj))
.ToListAsync(cancellationToken);
Every time, LocationsOfTheUsers was null, so I decided to add .Include(x => x.LocationsOfTheUsers) and I received the results I expected. But I'm not sure why I have to include this collection, since it's defined like this:
public class Location
{
public string Title { get; set; }
public long? RegionId { get; set; }
public Region Region { get; set; }
public long? AddressId { get; set; }
public Address Address { get; set; }
public long? CountyId { get; set; }
public County County { get; set; }
public ICollection<LocationsOfTheUsers> LocationsOfTheUsers { get; set; }
}
I thought this would be automatically included since it exists as an ICollection in the Location class.
So why is .Include() on LocationsOfTheUsers needed here?
Thanks guys
Cheers

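By the usual entity framework convention, the non-virtual properties represent the columns of the tables and the virtual properties represent the relations between the tables (one-to-many, many-to-many, ...). Declaring a navigation property virtual is also what allows lazy-loading proxies to override it.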
So your property should have been defined as:
public virtual ICollection<LocationsOfTheUsers> LocationsOfTheUsers { get; set; }
One of the slower parts of a database query is the transfer of the selected data from the database management system to your local process. Hence it is wise to limit the selected data to the values you actually plan to use.
If you have a one-to-many relation between Schools and Students, and you ask for School [10] you don't want automatically to fetch its 2000 Students.
Even if you would like to have "School [10] with all its Students", it would not be efficient to use Include to also fetch the Students. Every Student will have a foreign key SchoolId with a value of [10]. If you used Include, you would transfer this foreign key 2000 times. What a waste!
When using entity framework always use Select to fetch data and select only the properties that you actually plan to use. Only use Include if you plan to change the included items.
This way you can separate your database table structure from the actual query. If your database structure changes, only the query changes, users of your query don't notice the internal changes.
Apart from better performance and more robustness against changes, readers of your code can more easily see what values are in their query.
Certainly don't use Include to save yourself some typing. Having to debug one error after future changes will take far more time than you will ever save by typing Include instead of Select.
Finally: limit your data early in your process, so put the Where in front.
So your query should be:
var predicate = CalculateFilters(searchObj);
var queryLocations = dbContext.Locations
    .Where(predicate)
    .Select(location => new
    {
        // Select only the location properties that you plan to use
        Id = location.Id,
        Title = location.Title,
        // Locations of the users:
        UserLocations = location.LocationsOfTheUsers
            .Select(userLocation => new
            {
                // again: only the properties that you plan to use
                Id = userLocation.Id,
                ...
                // Not needed, you already know the value
                // LocationId = userLocation.LocationId
            })
            .ToList(),
        Address = new
        {
            Street = location.Address.Street,
            PostCode = location.Address.PostCode,
            ...
            // if you only want one property of the County:
            County = location.Address.County.Name,
            // or, if you want more properties, use this instead:
            // County = new
            // {
            //     Name = location.Address.County.Name,
            //     Abbr = location.Address.County.Abbr,
            //     ...
            // },
        },
    });

I thought this would be automatically included since it exists as an ICollection in the Location class.
Well, it's not automatically included, probably for performance reasons as the graph of related entities and their recursive child entities may be rather deep.
That's why you use eager loading to explicitly include the related entities that you want using the Include method.
The other option is to use lazy loading which means that the related entities are loaded as soon as you access the navigation property in your code, assuming some prerequisites are fulfilled and that the context is still around when this happens.
Please refer to the docs for more information.
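To make the difference concrete, here is a minimal sketch of eager loading next to explicit loading (fetching the collection on demand without lazy-loading proxies). It assumes EF Core; the trimmed-down entities, AppDbContext and the LoadingExamples class are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical, trimmed-down versions of the question's entities.
public class LocationsOfTheUsers
{
    public long Id { get; set; }
    public long LocationId { get; set; }
}

public class Location
{
    public long Id { get; set; }
    public string Title { get; set; }
    public ICollection<LocationsOfTheUsers> LocationsOfTheUsers { get; set; }
}

// Hypothetical context; the real one will have more sets and configuration.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Location> Locations { get; set; }
}

public static class LoadingExamples
{
    // Eager loading: the related rows are fetched as part of the query.
    public static Task<List<Location>> LoadEagerlyAsync(AppDbContext context) =>
        context.Locations
            .Include(l => l.LocationsOfTheUsers)
            .ToListAsync();

    // Explicit loading: fetch the entity first, then load the collection
    // only when it is actually needed.
    public static async Task<Location> LoadExplicitlyAsync(AppDbContext context, long id)
    {
        var location = await context.Locations.FirstOrDefaultAsync(l => l.Id == id);
        if (location != null)
        {
            await context.Entry(location)
                .Collection(l => l.LocationsOfTheUsers)
                .LoadAsync();
        }
        return location;
    }
}
```

Without either of these (or lazy loading), the navigation property simply stays null after the query, which is what the question observed.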

I believe you are using EntityFrameworkCore. In EntityFramework (EF6), lazy loading is enabled by default. However, in EntityFrameworkCore, lazy loading of related entities is handled by a separate package, Microsoft.EntityFrameworkCore.Proxies.
To enable the behaviour you are seeking, install the above package and add the following code to your DbContext:
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseLazyLoadingProxies();
}
After this, the related entities will be loaded without the Include call, provided the navigation properties are declared virtual so that the proxies can override them.

Related

NotMapped property causes all properties load in select statement in EF Core

I use EF Core 5 in a project. One of my entities contains a NotMapped property that combines two other properties of the entity. I expected the generated SQL to load only the properties referenced in the Select statement, but after profiling I saw that all properties were loaded.
As a sample, the Contact entity contains one NotMapped property as follows.
public class Contact
{
public int ContactId { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
[NotMapped]
public string FullName => $"{FirstName} {LastName}";
public string Email { get; set; }
public string Phone { get; set; }
public string Address { get; set; }
}
public class SampleContext : DbContext
{
public DbSet<Contact> Contacts { get; set; }
}
In the following query, I need only ContactId and FullName, I expect only ContactId, FirstName, LastName to load in the TSQL query, but all properties are loaded.
var list = dbContext.Contacts.Select( e => new
{
e.ContactId,
e.FullName
}).ToList();

If you just want to load some columns you could:
var list = dbContext.Contacts
    .Where(...)
    .Select(e => new
    {
        e.ContactId,
        FullName = e.FirstName + " " + e.LastName
    })
    .ToList(); // hit the database

I'm actually surprised this runs at all. The whole point of NotMapped is to indicate that the column is not bound and shouldn't even be used in a Select or any other Linq expression. With EF 6 trying that would have resulted in an error. Your choices were:
A) Select the entity and use the client-side computed property as you had declared. (Essentially what your code is doing behind the scenes)
or
B) as tymtam suggested, compute the property in the anonymous type.
var list = dbContext.Contacts
    .Where(...)
    .Select(e => new
    {
        e.ContactId,
        FullName = e.FirstName + " " + e.LastName
    }).ToList(); // hit the database
EF Core introduced client-side evaluation, which was enabled by default in earlier versions but, I believe, has been disabled by default since EF Core 3. This issue smells like either client-side evaluation being enabled or a bug in EF Core 5 that still falls back to client-side evaluation for an unmapped property. Either way, I am not aware of any way to mark a property as "client-side computed"; that could only work if the property were declared as an expression (rather than what is effectively a string.Format). So your options are either of the two above, or relying on what looks to be client-side evaluation, which is either something your project has configured (and which may bite you down the road) or a bug that may be "fixed" and stop working at some point in the future.

EF Core doesn't (can't) decompile a [NotMapped] property to calculate its dependencies (that's a pretty hard thing to do), so it falls back to the next best thing it can do, which is to load all properties so it can calculate the value client-side.
If you do want to be able to push this calculation to SQL, you need to use an expression tree. I would suggest using a package like NeinLinq or EFCore Projectables, which lets you do this fairly easily.
(There are quite a few libraries that allow this; these were just the first two I knew off the top of my head.)
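Without any extra library, the same idea can be sketched by keeping the projection in a reusable expression tree instead of a C# property. This is only an illustration, not what those packages do internally: ContactName, ContactProjections and Demo are hypothetical names, and the in-memory list stands in for dbContext.Contacts so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Contact
{
    public int ContactId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Hypothetical DTO that carries only the values we want to read.
public class ContactName
{
    public int ContactId { get; set; }
    public string FullName { get; set; }
}

public static class ContactProjections
{
    // An expression tree, not a compiled property body, so a LINQ provider
    // can inspect it and see that only three columns are referenced.
    public static readonly Expression<Func<Contact, ContactName>> ToContactName =
        c => new ContactName
        {
            ContactId = c.ContactId,
            FullName = c.FirstName + " " + c.LastName
        };
}

public static class Demo
{
    public static List<ContactName> Run()
    {
        // In-memory stand-in for dbContext.Contacts (assumption for the demo).
        IQueryable<Contact> contacts = new List<Contact>
        {
            new Contact { ContactId = 1, FirstName = "Ada", LastName = "Lovelace", Email = "ada@example.com" }
        }.AsQueryable();

        return contacts.Select(ContactProjections.ToContactName).ToList();
    }
}
```

Against a real DbSet<Contact>, the same Select(ContactProjections.ToContactName) call should let the provider build the SQL from the expression, so Email and the other unused columns need not be fetched.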

Does Entity size matter for performance in EF Core with DB first?

I'm writing an ASP.NET Core Web API project. As a data source it will use an existing (and pretty big) database, but not the entire database. The API will use only some of the tables, and even in these tables it will not use all the columns.
Using Reverse engineering and scaffolding I was able to generate DbContext and Entity classes... and it got me thinking. There is a table with 30 columns (or more). I'm using this table, but I only need 5 columns.
My question is:
Is there any advantage of removing 25 unused columns from C# entity object? Does it really matter?
The advantage of leaving them there unused is that if someone wants to add new functionality that needs one of them, they will not need to go to the db and reverse engineer the needed columns (they are there already).
The advantage of removing unused is... ?
EDIT: Here is the sample code:
public class FooContext : DbContext
{
public FooContext(DbContextOptions<FooContext> options)
: base(options)
{
}
public DbSet<Item> Items { get; set; }
}
[Table("item")]
public class Item
{
[Key]
[Column("itemID", TypeName = "int")]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }
[Column("name", TypeName = "varchar(255)")]
public string Name { get; set; }
}
Sample usage:
public ItemDto GetItem(int id)
{
var item = _fooContext.Items.Where(i => i.Id == id).FirstOrDefault();
// Here I have item with two fields: Id and Name.
var itemDto = _mapper.Map<ItemDto>(item);
return itemDto;
}
Obviously I'm curious about more complex operations. Like... when item entity is being included by other entity. For example:
_foo.Warehouse.Include(i => i.Items)
or other more complex functions on Item entity

Your entity needs to match what's in the database, i.e. you need a property to match each column (neglecting any shadow properties). There's no choice here, as EF will complain otherwise.
However, when you actually query, you can select only the columns you actually need via something like:
var foos = await _context.Foos
.Select(x => new
{
Bar = x.Bar,
Baz = x.Baz
})
.ToListAsync();
Alternatively, if you don't need to be able to insert/update the table, you can opt to use DbQuery<T> instead of DbSet<T>. With DbQuery<T>, you can use any class you want and project the values however you like via FromSql. (DbQuery<T> was introduced in EF Core 2.1; from EF Core 3.0 on, keyless entity types serve the same purpose.)

EntityFramework load / update Entities

I have been struggling for a while now to understand how EF loads / updates entities.
First of all, I want to explain what my app (WPF) is about. I am developing an application where users can store Todo items in categories; these categories are predefined by the application. Each user can read all items but can only delete / update their own items. It's a multi-user system, meaning the application runs multiple times in the network, accessing the same SQL Server database.
When a user adds/deletes/updates items, the UI of all the other running apps has to update.
My model looks like this:
public class Category
{
public int Id { get; set; }
public string Name { get; set; }
public List<Todo> Todos { get; set; }
}
public class Todo
{
public int Id { get; set; }
public string Content { get; set; }
public DateTime LastUpdate { get; set; }
public string Owner { get; set; }
public Category Category { get; set; }
public List<Info> Infos { get; set; }
}
public class Info
{
public int Id { get; set; }
public string Value { get; set; }
public Todo Todo { get; set; }
}
I am making the initial load like this, which works fine:
Context.dbsCategories.Where(c => c.Id == id).Include(c => c.Todos.Select(t => t.Infos)).FirstOrDefault();
Now I was trying to load only the Todos that belong to the current user, so I tried this:
Context.dbsCategories.Where(c => c.Id == id).Include(c => c.Todos.Where(t => t.Owner == Settings.User).Select(t => t.Infos)).FirstOrDefault();
This does not work because it's not possible to filter within include, so I tried this:
var cat = Context.dbsCategories.Where(c => c.Id == id).FirstOrDefault();
Context.dbsTodos.Where(t => t.Category.Id == cat.Id && t.Owner == Settings.User).Include(t=>t.Infos);
After executing the second line, where I look for the Todo items, these items were automatically added to cat's Todos collection. Why? I would have expected that I have to add them manually to cat's Todos collection.
Just for my understanding: what exactly is EF doing here?
Now to my main problem: the synchronization of the data between database and client. I am using a long-running Context, which lives as long as the application is running, to save changes made to owned items to the database. The user cannot manipulate / delete data from other users; this is guaranteed by the user interface.
To synchronize the data I built this Synch method, which will run every 10 seconds; right now it is triggered manually.
This is my synchronization code, which only synchronizes items to the client that do not belong to it.
private async Task Synchronize()
{
using (var ctx = new Context())
{
var database = ctx.dbsTodos().Where(x => x.Owner != Settings.User).Select(t => t.Infos).AsNoTracking();
var loaded = Context.dbsTodos.Local.Where(x => x.Owner != Settings.User);
//In local context but not in database anymore -> Detach
foreach (var detach in loaded.Except(database, new TodoIdComparer()).ToList())
{
Context.ObjectContext.Detach(detach);
Log.Debug(this, $"Item {detach} detached");
}
//In database and local context -> Check Timestamp -> Update
foreach (var update in loaded.Intersect(database, new TodoIdTimeStampComparer()))
{
await Context.Entry(update).ReloadAsync();
Log.Debug(this, $"Item {update} updated");
}
//In database but not in local context -> Attach
foreach (var attach in database.ToList().Except(loaded, new TodoIdComparer()))
{
Context.dbsTodos().Attach(attach);
Log.Debug(this, $"Item {attach} attached");
}
}
}
I am having the following problems of unknown origin with it:
Detaching deleted items seems to work, but right now I am not sure whether only the Todo items are detached or also the Infos.
Updating items works only for the Todo item itself; it does not reload the Infos within. How can I reload the whole entity with all its relations?
Attaching new items and Infos does not work so far. What am I doing wrong here?
Is this the right approach to synchronize data between client and database?
Is there any "How to Sync" tutorial? I have not found anything helpful so far.
I am thankful for any help on this, even if you tell me that everything I am doing here is wrong!
Thanks!

My, you do like to deviate from the entity framework code-first conventions, don't you?
(1) Incorrect class definitions
The relations between your tables are Lists instead of ICollections, they are not declared virtual, and you forgot to declare the foreign keys.
There is a one-to-many relation between Todo and Category: every Todo belongs to exactly one Category (using a foreign key), every Category has zero or more Todos.
You chose to give Category a property:
List<Todo> Todos {get; set;}
Are you sure that category.Todos[4] has a defined meaning?
What would category.Todos.Insert(4, new Todo()) mean?
Better stick to an interface where you can't use functions that have no proper meaning in your database: use ICollection<Todo> Todos {get; set;}. This way you'll have only access to functions that Entity Framework can translate to SQL.
Besides, a query will probably be faster: you give entity framework the possibility to query the data in its most efficient way, instead of forcing it to put the result into a List.
In entity framework the columns of a table are represented by non-virtual properties; the virtual properties represent the relations between the tables (one-to-many, many-to-many)
public class Category
{
public int Id { get; set; }
public string Name { get; set; }
... // other properties
// every Category has zero or more Todos (one-to-many)
public virtual ICollection<Todo> Todos { get; set; }
}
public class Todo
{
public int Id { get; set; }
public string Content { get; set; }
... // other properties
// every Todo belongs to exactly one Category, using foreign key
public int CategoryId { get; set; }
public virtual Category Category { get; set; }
// every Todo has zero or more Infos:
public virtual ICollection<Info> Infos { get; set; }
}
You'll probably guess Info by now:
public class Info
{
public int Id { get; set; }
public string Value { get; set; }
... // other properties
// every info belongs to exactly one Todo, using foreign key
public int TodoId {get; set;}
public virtual Todo Todo { get; set; }
}
Three major improvements:
ICollections instead of Lists,
the ICollections are virtual, because they are not real columns in your tables,
the foreign key definitions are non-virtual: they are real columns in your tables.
(2) Use Select instead of Include
One of the slower parts of a database query is the transport of the selected data from the Database Management System to your local process. Hence it is wise to limit the amount of transported data.
Suppose Category with Id [4] has a thousand Todos. Every Todo of this Category will have a foreign key with a value 4. So this same value 4 will be transported 1001 times. What a waste of processing power!
In entity framework use Select instead of Include to query data and select only the properties you actually plan to use. Only use Include if you plan to update the Selected data.
Give me all Categories that ... with their Todos that ...
var results = dbContext.Categories
.Where(category => ...)
.Select(category => new
{
// only select properties that you plan to use
Id = category.Id,
Name = category.Name,
...
Todos = category.Todos
.Where(todo => ...) // only if you don't want all Todos
.Select(todo => new
{
// again, select only the properties you'll plan to use
Id = todo.Id,
...
// not needed, you know the value:
// CategoryId = todo.CategoryId,
// only if you also want some infos:
Infos = todo.Infos
.Select(info => ....) // you know the drill by now
.ToList(),
})
.ToList(),
});
(3) Don't keep DbContext alive for such a long time!
Another problem is that you keep your DbContext open for quite some time. This is not how a DbContext is meant to be used. If your database changes between your query and your update, you'll have trouble. I can hardly imagine that you query so much data that you need to optimize it by keeping your DbContext alive. Even if you query a lot of data, the display of this huge amount of data would be the bottleneck, not the database query.
Better fetch the data once, dispose the DbContext, and when updating fetch the data again, update the changed properties and SaveChanges.
fetch data:
RepositoryCategory FetchCategory(int categoryId)
{
using (var dbContext = new MyDbContext())
{
return dbContext.Categories.Where(category => category.Id == categoryId)
.Select(category => new RepositoryCategory
{
... // see above
})
.FirstOrDefault();
}
}
Yes, you'll need an extra class RepositoryCategory for this. The advantage is, that you hide that you fetched your data from a database. Your code would hardly change if you'd fetch your data from a CSV-file, or from the internet. This is way better testable, and also way better maintainable: if the Category table in your database changes, users of your RepositoryCategory won't notice it.
Consider creating a special namespace for the data you fetch from your database. This way you can name the fetched Category still Category, instead of RepositoryCategory. You even hide better where you fetched your data from.
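The update half of "fetch again, update the changed properties and SaveChanges" could look roughly like the sketch below. It assumes EF6 (System.Data.Entity) and a trimmed-down Todo; TodoRepository and UpdateTodoContent are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Data.Entity; // EF6

// Hypothetical, trimmed-down entity and context.
public class Todo
{
    public int Id { get; set; }
    public string Content { get; set; }
    public DateTime LastUpdate { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Todo> Todos { get; set; }
}

public static class TodoRepository
{
    // Returns false if the Todo no longer exists (for example, because
    // another client deleted it in the meantime).
    public static bool UpdateTodoContent(int todoId, string newContent)
    {
        // A short-lived context: created for this one unit of work only.
        using (var dbContext = new MyDbContext())
        {
            // Fetch the current state of the row by primary key.
            var todo = dbContext.Todos.Find(todoId);
            if (todo == null)
            {
                return false;
            }

            // Change only the properties that were actually edited.
            todo.Content = newContent;
            todo.LastUpdate = DateTime.UtcNow;

            dbContext.SaveChanges();
            return true;
        }
    }
}
```

Because the entity is re-read just before the change, the update works on the current database state rather than on whatever a long-lived context happened to cache.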
Back to your question
You wrote:
Now i was trying to load only the Todos which are from the current user
After the previous improvements, this will be easy:
string owner = Settings.User; // or something similar
var result = dbContext.Todos.Where(todo => todo.Owner == owner)
.Select(todo => new
{
// properties you need
})

Not supported in LINQ to Entities

Every time I use the Include extension, it returns an error when a value from the included entity is used in the WHERE clause.
I included the System.Data.Entity which is the common answer but still have the same issue.
Model:
public partial class business_partner
{
public int id { get; set; }
public string accountid { get; set; }
}
public partial class order
{
public int id { get; set; }
public string doc_number { get; set; }
public int vendor_id { get; set; }
public int status { get; set; }
[ForeignKey("vendor_id")]
public virtual business_partner businessPartnerVendor { get; set; }
}
public IQueryable<order> GetOrder()
{
return (context.order);
}
Query:
_orderService.GetOrder()
.Include(a => a.businessPartnerVendor)
.Where(o => o.doc_number == "Order Number"
&& o.businessPartnerVendor.accountid == "TEST"
&& o.status > 2 && o.status != 9).Count() > 0
Exception:
The specified type member 'businessPartnerVendor' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported.

Alas, you forgot to state your requirement. Your code doesn't do what you want, so I might come to the wrong conclusion, but looking at your code, it seems that you want the following:
Tell me whether there are Orders that
- have a value of DocNumber that equals "Order Number",
- AND that are orders of a BusinessPartnerVendor with a value of AccountId equal to "TEST",
- AND have a value of Status which is more than 2 and not equal to 9.
I deduced the part "Tell me whether there are Orders that" from the fact that you only want to know whether Count() > 0.
Your Count would have joined all elements, included all columns of BusinessPartnerVendor, removed all rows that didn't match your Where, and counted how many joined items were left. That integer value would be transferred, after which your process would check whether the value is larger than zero.
One of the slower parts of a database query is the transport of the selected data from the Database Management System to your local process. Hence it is wise to limit the amount of transferred data.
Quite often I see people using Include to get the items that are stored in a different table (quite often a one-to-many). This will select the complete row. From the businessPartnerVendor, you only want to use property AccountId. So why select the complete object?
In entity framework use Select to select properties you want to query. Only use Include if you want to update the fetched data.
bool areTestOrdersAvailable = orderService.GetOrder()
.Where(order => order.doc_number == "Order Number"
&& order.businessPartnerVendor.accountid == "TEST"
&& order.status > 2 && order.status != 9)
.Any();
Because of the virtual keyword in your classes (and maybe some fluent API), entity framework knows about the one-to-many relation and will perform the correct join for you. Any() is translated into an SQL existence check (such as EXISTS) rather than a full count, so only one Boolean is transferred.
Some advice about entity framework
It is good practice to stick as closely as possible to the entity framework code-first conventions. The more you do this, the fewer attributes and the less fluent API you need. There will also be less discrepancy between Microsoft's way of naming classes, fields, properties, methods, etc. and yours.
In entity framework, all columns of a table are represented by non-virtual properties, the virtual properties represent the relations between tables (one-to-many, many-to-many, ...)
My advice would be: add the foreign keys to your classes, and stick to one identifier to describe one row in your tables.
So decide whether to use business_partner or BusinessPartnerVendor if they are in fact the same kind of thing
Add the foreign key:
// Every Order is the Order of exactly one BusinessPartner, using foreign key (one-to-many)
public int BusinessPartnerId {get; set;}
public virtual BusinessPartner BusinessPartner {get; set;}
This has the advantage, that if you want to select the Ids of all BusinessPartners that have one or more Orders that ..., you don't have to perform a join:
var businessPartnerIds = myDbContext.Orders
.Where(order => ...)
.Select(order => order.BusinessPartnerId)
.Distinct();
Only one database table will be accessed

Querying Many to Many relationships Entity Framework (doing wrong?? )

I've been doing some research on this topic and figured out a way to achieve these queries in my project, but I'm not sure if something here is wrong. Please help.
In summary, I've created the entities like this:
class Student
{
public int StudentId { get; set; }
public string Name { get; set; }
public ICollection<Course> Courses {get;set;} //or public List<Course> Courses {get;set;}
}
class Course
{
public int CourseId { get; set; }
public string Name { get; set; }
public ICollection<Student> Students {get;set;} //or public List<Student> Students {get;set;}
}
// We can see here that the database creates the Join Table Correctly
What I want to do:
Display in a grid view each student and, for each of the students, display the courses in which they are enrolled.
If I make a simple query like
dbContex.Students.ToList();
and we look at the list, the collection of courses is null. What is happening here? Shouldn't EF map this and query SQL to get the info?
After this I could not solve the problem, because the info that I found used another approach of the framework (Diagram First, I think) where things are set up in the entities diagram.
How I worked around the problem:
In a WordPress post I found a query that I hadn't tried, and I added some other lines of code to achieve what I wanted:
aux_S = contexto.Students.ToList();
foreach (var element in aux_S)
{
    element.Courses = contexto.Courses.Where(c => c.Students.Any(s => s.StudentId == element.StudentId)).ToList();
}
// I know I can make a projection to dismiss all the fields that I do not need; this is just to try it out
Am I wrong in doing this?
It worked, but how is it possible?

One of the slower parts of a database query is the transfer of the data to your machine. So it is good practice to transfer only the data you plan to use.
When you use LINQ in entity framework, using Queryable.Select is a good way to specify exactly what data you want to transfer. This is usually done just before your final ToList / ToDictionary / FirstOrDefault / Single / ...
You want all Students, each with all his Courses. If you look at your tables, you'll see that there is more data in the tables than you want. For instance, each Student has an Id, and each of his Courses has the same value for StudentId. So if a Student attends 20 Courses, you would transfer the same value for StudentId 21 times.
So to make your query efficient: Select only the Properties of Students you plan to use, with only the Properties of the Courses of these Students you are interested in.
This will automatically solve your problem:
var result = myDbcontext.Students
    // if you don't want all Students, use a Where:
    .Where(student => student.City == "Guadalajara")
    // Select only the properties you plan to use:
    .Select(student => new
    {
        Id = student.StudentId,
        Name = student.Name,
        Birthday = student.Birthday,
        Address = new
        {
            Street = student.Street,
            City = student.City,
            ...
        },
        Courses = student.Courses
            // if you don't want all courses: use a where
            .Where(course => course.Start.Year == 2018)
            // again: select only the properties you plan to use
            .Select(course => new
            {
                Name = course.Name,
                Location = course.Location,
                ...
                // One of the useless properties to transfer:
                // StudentId = course.StudentId
            })
            .ToList(),
    });

If you perform this query:
var studentslist = dbContex.Students.ToList();
Each item on studentslist will have the 'Courses' collection null, because, although the connection/relation exists (between each table), you didn't specify that you wanted that collection populated. For that to happen you can change your query accordingly:
var studentslist = dbContex.Students.Include(p => p.Courses).ToList();
Now, after running the last query, if you get an empty list on one/any of the items, then it means those items (students), aren't linked to any courses.

You are not lazy loading. If you add virtual, like: public virtual ICollection<Course> Courses {get;set;} you should get the courses loaded.
However, I'd advise against using lazy loading since it may cause performance issues down the road; what you want to do is eager loading.
So when you are querying your student you would simply do this:
dbContex.Students.Include(c => c.Courses).ToList();
