C# EF query explosion

Dealing with three tables - Company, Employee and User.
Company has 0 or Many Employees.
Employee has a nullable int FK to Company. In practice this always has a value.
Employee has a non nullable int FK to User.
User has a bit field AccountIsDisabled.
In my Data Model I have a partial class extending the EF model class for Company.
On it, the ActiveEmployees property returns all Employees that are active for the company.
My problem is that this code is generating a query explosion.
For a company of 1K employees I am getting 1K calls to the DB. It seems EF is creating a call for each employee when navigating to the User table.
I have tried many methods to force eager loading but to no avail.
Anyone out there see a reason that I am getting this query explosion?
namespace JCS.Data
{
public partial class Company : IIdentifiable
{
public IEnumerable<Employee> ActiveEmployees
{
get
{
return Employees.Where(e => !e.User.AccountIsDisabled);
}
}
}
}
Sorry for the missing info.
The explosion of queries occurs when a bool property on the related Employee class is accessed. Like so
namespace JCS.Data
{
public partial class Employee : IIdentifiable
{
public bool ApprovesTimesheets
{
get
{
return Company.ActiveEmployees.Any(
employee => employee.TimesheetApproverEmployeeID == ID
&& employee.TimesheetsEnabled);
}
}
}
}
So anywhere in the code I go
bool approvesTimesheets = employee.ApprovesTimesheets;
I get the 1K queries.
I have tried adding ToList() to Company.ActiveEmployees. No joy.
e.g.
in Employee class
var activeEmployees = Company.ActiveEmployees.ToList();
var approvesTimesheets = activeEmployees.Any(
employee => employee.TimesheetApproverEmployeeID == ID
&& employee.TimesheetsEnabled);
the latest in a long line of failed attempts:
public List<Employee> ActiveEmployees
{
get
{
var employees = Employees.AsQueryable().Include(x => x.User).ToList();
return employees.Where(e => !e.User.AccountIsDisabled).ToList();
//return Employees.Where(e => !e.User.AccountIsDisabled);
}
}

You need to call .ToList() or .ToListAsync() to fetch all the data at once; otherwise it is fetched on the fly, record by record.
This is the difference between deferred execution and immediate execution. When you don't materialize the list with .Where(foo).ToList(), each record is loaded only when you access it, hence the 1000 DB calls.
Edit: Please note that you are also using a navigation property that points to another object (my guess is that it is mapped directly to a table), so accessing that object triggers additional DB calls. To avoid that, do something like this:
public partial class Company : IIdentifiable
{
public IEnumerable<Employee> ActiveEmployees
{
get
{
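// Include needs an IQueryable and has to come before Where. Note that on a navigation
// collection that is already in memory it typically has no effect; it only eager-loads
// when it is part of the query that actually hits the context.
return Employees.AsQueryable().Include(x => x.User).Where(e => !e.User.AccountIsDisabled).ToList();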
}
}
}

OK the problem stemmed from using navigation properties in the POCOs.
As pointed out by @IvanStoev, I could only enforce eager loading at the initial call to the DB.
So when I load the initial Employee object I need to load all related objects. So..
_currentUser = Repository.Context.Employees.Include("User").Include("Company.Employees.User").FirstOrDefault(e => e.User.Person.Email == HttpContext.User.Identity.Name);
Solves the problem. I am worried now that I have a lot of data loaded: Company.Employees holds 1K+ objects for a big company.
Some more testing needed but the research has greatly increased my understanding of EF. Thanks for the help.
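If loading 1K+ employees just to answer a yes/no question is the worry, one option is to express the check as a single Any() query instead of walking navigation properties. The sketch below is only an illustration under assumptions: the CompanyID property name and the nullable types are guesses (the real model is not shown), and the demo runs against an in-memory list via AsQueryable(). Passed an EF set such as Repository.Context.Employees instead, the same expression would normally be translated into one SQL query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public bool AccountIsDisabled { get; set; }
}

public class Employee
{
    public int ID { get; set; }
    public int? CompanyID { get; set; }                   // assumed FK property name
    public int? TimesheetApproverEmployeeID { get; set; } // assumed to be nullable
    public bool TimesheetsEnabled { get; set; }
    public User User { get; set; }
}

public static class TimesheetQueries
{
    // One composed query: no Company.Employees collection is materialized.
    public static bool ApprovesTimesheets(IQueryable<Employee> employees, int companyId, int employeeId)
    {
        return employees.Any(e =>
            e.CompanyID == companyId
            && e.TimesheetApproverEmployeeID == employeeId
            && e.TimesheetsEnabled
            && !e.User.AccountIsDisabled);
    }
}

public static class ApproverDemo
{
    public static void Main()
    {
        // In-memory stand-in for the Employees table.
        var employees = new List<Employee>
        {
            new Employee { ID = 1, CompanyID = 1, User = new User() },
            new Employee { ID = 2, CompanyID = 1, TimesheetApproverEmployeeID = 1,
                           TimesheetsEnabled = true, User = new User() },
            new Employee { ID = 3, CompanyID = 1, TimesheetApproverEmployeeID = 2,
                           TimesheetsEnabled = true,
                           User = new User { AccountIsDisabled = true } }
        }.AsQueryable();

        Console.WriteLine(TimesheetQueries.ApprovesTimesheets(employees, 1, 1)); // True
        Console.WriteLine(TimesheetQueries.ApprovesTimesheets(employees, 1, 2)); // False (employee 3 is disabled)
    }
}
```

The trade-off is that the logic moves out of the entity property and needs access to the context, which may or may not fit the existing design.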

Related

Virtual keyword in Entity Framework properties

public class Student
{
public int StudentId;
public string StudentName;
public int CourseId;
public virtual Course Courses { get; set; }
}
public class Course
{
public int CourseId;
public string CourseName;
public string Description;
public ICollection<Student> Students {get;set;}
public ICollection<Lecture> Lectures { get; set; }
}
public class Lecture
{
public int LectureId;
public string LectureName;
public int CourseId;
public virtual Course Courses { get; set; }
}
What is the keyword virtual used for here?
I was told a virtual is for lazy loading but I don't understand why.
Because when we do
_context.Lecture.FirstOrDefault()
the result returns the first Lecture and it does not include the attribute Course.
To get the Lecture with the Course, we have to use:
_context.Lecture.Include("Courses").FirstOrDefault()
So even without the virtual keyword, this already looks like lazy loading.
Then why do we need the keyword?
By declaring it virtual you allow EF to substitute the value property with a proxy to enable lazy loading. Using Include() is telling the EF query to eager-load the related data.
In EF6 and prior, lazy loading was enabled by default. With EF Core it is disabled by default. (Or not supported in the earliest versions)
Take the following query:
var lecture = _context.Lecture.Single(x => x.LectureId == lectureId);
to load one lecture.
If you omit virtual then accessing lecture.Course would do one of two things. If the DbContext (_context) was not already tracking an instance of the Course that lecture.CourseId was pointing at, lecture.Course would return null. If the DbContext was already tracking that instance, then lecture.Course would return that instance. So without lazy loading you might or might not get a reference; don't count on it being there.
With virtual and lazy loading in the same scenario, the proxy checks if the Course has been provided by the DbContext and returns it if so. If it hasn't been loaded then it will automatically go to the DbContext if it is still in scope and attempt to query it. In this way if you access lecture.Course you can count on it being returned if there is a record in the DB.
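To make the proxy idea concrete, here is a hand-written stand-in for what the generated proxy does conceptually. It is not EF's actual implementation: LectureProxy and the loadCourse delegate are made up for illustration, and the delegate plays the role of the DbContext query.

```csharp
using System;

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
}

public class Lecture
{
    public int LectureId { get; set; }
    public int CourseId { get; set; }
    // virtual is what lets a derived proxy class intercept the getter.
    public virtual Course Course { get; set; }
}

// Hypothetical stand-in for the proxy class EF would generate at runtime.
public class LectureProxy : Lecture
{
    private readonly Func<int, Course> _loadCourse;
    private bool _courseLoaded;

    public LectureProxy(Func<int, Course> loadCourse)
    {
        _loadCourse = loadCourse;
    }

    public override Course Course
    {
        get
        {
            if (!_courseLoaded)
            {
                // First access: go and fetch the related row.
                base.Course = _loadCourse(CourseId);
                _courseLoaded = true;
            }
            return base.Course;
        }
        set
        {
            base.Course = value;
            _courseLoaded = true;
        }
    }
}

public static class ProxyDemo
{
    public static void Main()
    {
        int queries = 0;
        Lecture lecture = new LectureProxy(id =>
        {
            queries++; // counts simulated database round trips
            return new Course { CourseId = id, CourseName = "Databases" };
        })
        { LectureId = 1, CourseId = 10 };

        Console.WriteLine(queries);                   // 0, nothing loaded yet
        Console.WriteLine(lecture.Course.CourseName); // Databases
        Console.WriteLine(lecture.Course.CourseName); // Databases
        Console.WriteLine(queries);                   // 1, loaded once on first access
    }
}
```

Without virtual, a subclass could not override the getter, which is why the plain property can only ever return whatever was already assigned to it.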
Think of lazy loading as a safety net. It comes with a potentially significant performance cost if relied on, but one could argue that a performance hit is the lesser of two evils compared to runtime bugs with inconsistent data. This can be very evident with collections of related entities. In your above example the ICollection<Student> and such should be marked as virtual as well to ensure those can lazy load. Without that you would get back whatever students might have been tracked at the time, which can be very inconsistent data state at runtime.
For example, say you have 2 courses, Course #1 and #2. There are 4 students, A, B, C, and D. All 4 are registered to Course #1 and only A & B are registered to Course #2. If we ignore lazy loading by removing the virtual then the behavior will change depending on which course we load first if we happen to eager-load in one case and forget in the second...
using (var context = new MyAppDbContext())
{
var course1 = context.Courses
.Include(x => x.Students)
.Single(x => x.CourseId == 1);
var course2 = context.Courses
.Single(x => x.CourseId == 2);
var studentCount = course2.Students.Count();
}
Disclaimer: With collections in entities you should ensure these are always initialized so they are ready to go. This can be done in the constructor or on an auto-property:
public ICollection<Student> Students { get; set; } = new List<Student>();
In the above example, studentCount would come back as "2" because in loading Course #1, both Students A & B were loaded via the Include(x => x.Students). This is a pretty obvious example, loading the two courses right after one another, but this situation can easily occur when loading multiple records that share data, such as search results. It is also affected by how long the DbContext has been alive. This example uses a using block for a new DbContext instance scope; one scoped to the web request or similar could be tracking related instances from earlier in the call.
Now reverse the scenario:
using (var context = new MyAppDbContext())
{
var course2 = context.Courses
.Include(x => x.Students)
.Single(x => x.CourseId == 2);
var course1 = context.Courses
.Single(x => x.CourseId == 1);
var studentCount = course1.Students.Count();
}
In this case, only Students A & B were eager loaded. While Course 1 actually references 4 students, studentCount here would return "2" for the two students associated with Course 1 that the DbContext was tracking when Course 1 was loaded. You might expect 4, or 0 knowing that you didn't eager-load the students. The resulting related data is unreliable and what you might or might not get back will be situational.
Where lazy loading will get expensive is when loading sets of data. Say we load a list of 100 students and when working with those students we access student.Course. Eager loading will generate 1 SQL statement to load 100 students and their related courses. Lazy loading will end up executing 1 query for the students, then up to 100 queries to load the course for each student (i.e. one SELECT * FROM Courses WHERE CourseId = ... per student, using that student's CourseId). If Student had several lazy loaded properties then that's another 100 queries per lazy-loaded property.
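The difference in round trips can be simulated without a database. This is a toy model only: FakeDatabase and its two methods are invented for the demo and just count calls. It counts course lookups alone, so the single query that loads the students themselves is left out of both numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented stand-in for a database; each method call counts as one round trip.
public class FakeDatabase
{
    private readonly Dictionary<int, string> _courses = new Dictionary<int, string>
    {
        { 1, "Maths" },
        { 2, "Physics" }
    };

    public int QueryCount { get; private set; }

    // Lazy-loading style: one lookup per call.
    public string GetCourseName(int courseId)
    {
        QueryCount++;
        return _courses[courseId];
    }

    // Eager-loading style: everything needed is fetched in a single call.
    public Dictionary<int, string> GetCourseNames(IEnumerable<int> courseIds)
    {
        QueryCount++;
        return courseIds.Distinct().ToDictionary(id => id, id => _courses[id]);
    }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        // 100 students, each pointing at course 1 or course 2.
        List<int> studentCourseIds = Enumerable.Range(0, 100).Select(i => i % 2 + 1).ToList();

        var lazyDb = new FakeDatabase();
        foreach (int courseId in studentCourseIds)
        {
            lazyDb.GetCourseName(courseId); // one round trip per student
        }
        Console.WriteLine(lazyDb.QueryCount); // 100

        var eagerDb = new FakeDatabase();
        Dictionary<int, string> names = eagerDb.GetCourseNames(studentCourseIds);
        Console.WriteLine(eagerDb.QueryCount); // 1
        Console.WriteLine(names.Count);        // 2
    }
}
```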

When do you need to .Include related entities in Entity Framework?

This seems arbitrary to me when I have to actually .Include() related entities and when I don't. In some cases, EF gives me the info for the related entities without it and in other cases, it can't do anything with the related entities because I didn't include them:
Works without .Include();
This is an example where I'm loading data without .Include();
public class InvoiceService
{
private ApplicationDbContext db { get; set; }
public InvoiceService(ApplicationDbContext context)
{
db = context;
}
public Invoice Get(int id)
{
return db.Invoices.SingleOrDefault(x => x.Id == id);
}
}
public partial class ShowInvoice : System.Web.UI.Page
{
private InvoiceService invoiceService;
private readonly ApplicationDbContext context = new ApplicationDbContext();
protected void Page_Load(object sender, EventArgs e)
{
invoiceService = new InvoiceService(context);
if (!IsPostBack)
{
int.TryParse(Request.QueryString["invoiceId"].ToString(), out int invoiceId);
LoadInvoice(invoiceId);
}
}
private void LoadInvoice(int invoiceId)
{
var invoice = invoiceService.Get(invoiceId);
// Other code irrelevant to the question goes here.
}
}
Here follows the result, which includes the data for the Company associated with the invoice I requested:
As you can see, the information for the company definitely comes through but was not explicitly included.
Doesn't work without .Include();
Conversely, I've done some mapping to do with invoices in this same project and I got NullReferenceExceptions when fetching the related entities property values because I didn't .Include().
This method gets all the approved timesheet entries for the specified company. This viewmodel is exclusively to be used when manipulating the association of timesheet entries for an invoice (so you're invoicing based on the timesheet entries selected).
public List<InvoiceTimesheetViewModel> GetInvoiceTimesheetsByCompanyId(int companyId)
{
var factory = new TimesheetViewModelsFactory();
var timesheets = db.Timesheets.Where(x => x.Approved && x.Company.Id == companyId && !x.Deleted).ToList();
return factory.GetInvoiceTimesheetsViewModel(timesheets);
}
NullReferenceExceptions occurred in the factory that maps the timesheet entities to the viewmodel:
public List<InvoiceTimesheetViewModel> GetInvoiceTimesheetsViewModel(List<Timesheet> timesheets)
{
var model = new List<InvoiceTimesheetViewModel>();
foreach (var timesheet in timesheets)
{
var start = DateTime.Parse((timesheet.DateAdded + timesheet.StartTime).ToString());
var finished = DateTime.Parse((timesheet.DateCompleted + timesheet.EndTime).ToString());
DateTime.TryParse(timesheet.RelevantDate.ToString(), out DateTime relevant);
model.Add(new InvoiceTimesheetViewModel
{
RelevantDate = relevant,
BillableHours = timesheet.BillableHours,
Finished = finished,
Id = timesheet.Id,
StaffMember = timesheet.StaffMember.UserName, // NRE here.
Start = start,
Task = timesheet.Task.Name // NRE here.
});
}
return model;
}
To fix these, I had to change the query that fetches the data to the following:
var timesheets = db.Timesheets.Include(i => i.StaffMember).Include(i => i.Task)
.Where(x => x.Approved && x.Company.Id == companyId && !x.Deleted).ToList();
Why is Entity Framework sometimes happy to give me data without me explicitly requesting that data and sometimes it requires me to explicitly request the data or else throws an error?
And how am I to know when I need to explicitly include the data I'm looking for and when I don't?
Entity Framework uses lazy loading to load child relationships. For lazy loading to work, the property in the model must be marked with the virtual keyword. EF overrides it and adds lazy loading support.
When a property is not virtual, EF has no way to load the child relationship data later, so the only time it can be loaded is during the initial query, using Include.
public class Timesheet
{
...
public virtual StaffMember StaffMember { get; set; }
public virtual Task Task { get; set; }
...
}
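It depends on your models. If the navigation properties are marked virtual, EF lazy-loads them on first access, so you get the data without .Include; this saves memory and avoids fetching data you never use, at the cost of extra DB requests later. If they are not virtual, you need to use .Include so EF knows to load them with the initial query.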

Updating object with child collection using Entity Framework causing duplicates in database

I have a Customer class that has a relationship to an Address class:
public class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Address> Addresses { get; set; }
}
public class Address
{
public int Id { get; set; }
public string Street1 { get; set; }
//Snip a bunch of properties
public virtual Customer Customer { get; set; }
}
I have an edit form which displays all the fields for both the customer and address. When this form is submitted, it calls the Save method in the controller:
public ActionResult Save(Customer customer)
{
if (!ModelState.IsValid)
{
var viewModel = new CustomerFormViewModel
{
Customer = customer,
CustomerTypes = _context.CustomerTypes.ToList()
};
return View("CustomerForm", viewModel);
}
if (customer.Id == 0)
_context.Customers.Add(customer);
else
{
var existingCustomer = _context.Customers
.Include(c => c.Addresses)
.Single(c => c.Id == customer.Id);
existingCustomer.Name = customer.Name;
existingCustomer.TaxId = customer.TaxId;
existingCustomer.CustomerTypeId = customer.CustomerTypeId;
existingCustomer.CreditLimit = customer.CreditLimit;
existingCustomer.Exempt = customer.Exempt;
existingCustomer.Addresses = customer.Addresses;
}
_context.SaveChanges();
return RedirectToAction("Index", "Customers");
}
This doesn't work and creates duplicate entries in the Addresses table in the DB. I think I understand why (EF isn't smart enough to know the Addresses inside the collection need to be added/modified/deleted as the case may be). So, what is the best way to fix this?
My instinct is that I need to iterate over the Addresses collections and compare them manually, adding any new ones from the form that don't exist for the customer, updating ones that do exist, and deleting ones that were not sent by the form but exist in the DB for the customer. Something like (ignoring the delete functionality for now):
foreach(Address address in customer.Addresses)
{
if (address.Id == 0)
// Add record
else
// Fetch address record from DB
// Update data
}
// Save context
Is this the best way to go about this, or are there any EF tricks to iterating and syncing a child collection to the DB?
Oh, and one question which has me scratching my head - I can sort of understand how a new address record is getting created in the DB, but what I don't get is the existing address record is also updated to have its customer_id set to NULL...how the heck does that happen? That leads me to believe that EF does see the original address record is somehow linked (as it is modifying it) but it's not smart enough to realize the record I'm passing in should replace it?
Thanks -- also, this is EF6 and MVC5
The problem comes from the line
existingCustomer.Addresses = customer.Addresses;
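in your code. This line assigns the Addresses collection from the customer coming from the model. So far, OK. The point is that customer does not have any relation to the database model at this point (it's not coming from the database but from the view).
If you would like to update existingCustomer.Addresses with the data coming from the model, you need to merge the data instead of replacing it. The following sketch might give you a direction (it matches addresses by Id, which is an assumption about what the form posts back):
void MergeAddresses(ICollection<Address> existingAddresses, ICollection<Address> newAddresses)
{
    foreach (var incoming in newAddresses)
    {
        // Id == 0 means the address is new and has no row in the DB yet
        var existing = incoming.Id == 0
            ? null
            : existingAddresses.FirstOrDefault(a => a.Id == incoming.Id);
        if (existing != null)
        {
            // merge fields if applicable
            existing.Street1 = incoming.Street1;
        }
        else
        {
            existingAddresses.Add(incoming);
        }
    }
    // now remove items the form no longer contains; iterate over a copy
    // because the collection is modified inside the loop
    foreach (var address in existingAddresses.ToList())
    {
        if (!newAddresses.Any(a => a.Id == address.Id))
        {
            // this only breaks the relationship; remove the entity through
            // the context as well if the row should be deleted
            existingAddresses.Remove(address);
        }
    }
}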
Is this the best way to go about this, or are there any EF tricks to iterating and syncing a child collection to the DB?
No, there aren't such tricks. EF designers left saving detached entities totally up to us - the developers.
However, there is a package called GraphDiff which addresses that, so you could give it a try. Here is what your code would look like using it:
using RefactorThis.GraphDiff;
...
_context.UpdateGraph(customer, map => map.OwnedCollection(
e => e.Addresses, with => with.AssociatedEntity(e => e.Customer)));
_context.SaveChanges();

EF DbSet.Find throws InvalidOperationException

I have two entities connected by TPT inheritance pattern:
public class User {...}
public class Employee : User {...}
As you can see, base class isn't abstract so both entity types can be added into db-sets. There are two separate sets (I need them both in my model):
public DbSet<User> Users { get; set; }
public DbSet<Employee> Employees { get; set; }
So, basically, Users table contains all entities and Employees holds additional data only for objects that were instantiated as new Employee().
Now, when I try to get entity from Employees set using Find method, I'm expecting that it will only return 'actual' employees. But if I'm specifying Id of the User entity, EF still fetches it from the database and then throws an InvalidOperationException:
"The specified cast from a materialized
'System.Data.Entity.DynamicProxies.User_B2E5EC989E36BE8C53B9285A70C4E879F0B5672E1D141B93FD299D1BA60258EE'
type to the 'Data.Employee' type is not valid."
It can't cast User to Employee, which is understandable.
My question is - is there a way to configure TPT inheritance so Find just returns null in such cases as it does when you pass non-existing Id into it.
My current workaround is this:
public Employee GetEmployeeById(int id)
{
try
{
return Employees.Find(id);
}
catch(InvalidOperationException ex) when (ex.Message.StartsWith("The specified cast from a materialized"))
{
return null;
}
}
But I don't like how it looks - so maybe there is a better (more elegant) solution?
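I tend to prefer SingleOrDefault()/FirstOrDefault() over Find, as they return null directly if no match is found. Note that Find only accepts key values, not a predicate, so the check has to go into a query instead. A query against the Employees set should only match actual employees, for example:
return Employees.SingleOrDefault(em => em.id == id);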
You are missing your DbContext instance. You can't search on the table type itself because that is only a declaration.
var checkfind = dbInstance.Employees.Find(searchedID);
If you don't have access directly to your Db you use
using (DBLocal db = new DBLocal())
{
db.Employees.Find(searchedID);
}

Using Linq to get records that have specific foreign key

I'm trying get all records from a table that have a specific foreign key but I'm struggling to get linq to return anything useful.
Post Model
public class Post
{
//... irrelevant properties
[ForeignKey("Category")]
public int CategoryId;
public virtual Category Category { get; set; }
}
my dbo.Posts table
What I've tried
I've tried several variations of the following:
//id = 7
using (UnitOfWork uwork = new UnitOfWork())
{
var post = uwork.PostRepository.GetAll().Where(c => c.CategoryId == id);
}
This only returns "Non-Public members", which doesn't contain anything useful.
Question
How could I modify my linq query to return all posts that have a specific Foreign Key id?
updates
here's my repository
It seems that you are basically looking at DbQuery<T> object, which is an implementation of IQueryable<T>. Basically LINQ did not make a query yet, because no one asked it for data. So instead it collects all info about the query in an object, to execute it later when needed.
To force it to give you the actual data, simply do ToList, or iterate over posts, or anything:
var post = uwork.PostRepository.GetAll().Where(c => c.CategoryId == id).ToList();
Just make sure to do so before you dispose the db context object.
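The same deferred behaviour can be seen with plain LINQ to Objects, no database needed. In this small sketch the counter and the sample ids are made up for the demo; the point is only that the predicate does not run until ToList() enumerates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static void Main()
    {
        var categoryIds = new List<int> { 7, 3, 7, 5 };
        int evaluations = 0;

        // Building the query does not run the predicate.
        IEnumerable<int> query = categoryIds.Where(id =>
        {
            evaluations++;
            return id == 7;
        });
        Console.WriteLine(evaluations); // 0

        // ToList() enumerates the source and materializes the results.
        List<int> matches = query.ToList();
        Console.WriteLine(evaluations);   // 4
        Console.WriteLine(matches.Count); // 2
    }
}
```

With EF the idea is the same, except that enumerating the query is what sends the SQL, which is why the debugger shows only the query object until something like ToList() is called.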
