Is DbSet<>.Local something to use with special care? - c#

For a few days now, I have been struggling with retrieving my entities from a repository (DbContext).
I am trying to save all the entities in an atomic action. Thus, different entities together represent something of value to me. If all the entities are 'valid', then I can save them all to the database. Entity 'a' is already stored in my repository, and needs to be retrieved to 'validate' entity 'b'.
That's where the problem arises. My repository relies on the DbSet<TEntity> class which works great with Linq2Sql (Include() navigation properties e.g.). But, the DbSet<TEntity> does not contain entities that are in the 'added' state.
So I have (as far as I know) two options:
Use the ChangeTracker to see which entities are available and query them into a set based on their EntityState.
Use the DbSet<TEntity>.Local property.
The ChangeTracker seems to involve some extra hard work to get it working in a way such that I can use Linq2Sql to Include() navigation properties e.g.
The DbSet<TEntity>.Local seems a bit weird to me. It might just be the name. I just read something that it is not performing very well (slower than DbSet<> itself). Not sure if that is a false statement.
Could somebody with significant EntityFramework experience shine some light on this? What's the 'wise' path to follow? Or am I seeing ghosts and should I always use the .Local property?
Update with code examples:
An example of what goes wrong
public void AddAndRetrieveUncommittedTenant()
{
_tenantRepository = new TenantRepository(new TenantApplicationTestContext());
const string tenantName = "testtenant";
// Create the tenant, but not call `SaveChanges` yet until all entities are validated
_tenantRepository.Create(tenantName);
//
// Some other code
//
var tenant = _tenantRepository.GetTenants().FirstOrDefault(entity => entity.Name.Equals(tenantName));
// The tenant will be null, because I did not call save changes yet,
// and the implementation of the Repository uses a DbSet<TEntity>
// instead of the DbSet<TEntity>.Local.
Assert.IsNotNull(tenant);
// Can I safely use DbSet<TEntity>.Local ? Or should I play
// around with DbContext.ChangeTracker instead?
}
An example of how I want to use my Repository
In my Repository I have this method:
public IQueryable<TEntity> GetAll()
{
return Context.Set<TEntity>().AsQueryable();
}
Which I use in business code in this fashion:
public List<Case> GetCasesForUser(User user)
{
return _repository.GetAll().
Where(#case => #case.Owner.EmailAddress.Equals(user.EmailAddress)).
Include(#case => #case.Type).
Include(#case => #case.Owner).
ToList();
}
That is mainly the reason why I prefer to stick to DbSet like variables. I need the flexibility to Include navigation properties. If I use the ChangeTracker I retrieve the entities in a List, which does not allow me to lazy load related entities at a later point in time.
If this is close to incomprehensible bullsh*t, then please let me know so that I can improve the question. I desperately need an answer.
Thx a lot in advance!

If you want to be able to 'easily' issue a query against the DbSet and have it find newly created items, then you will need to call SaveChanges() after each entity is created. If you are using a 'unit of work' style approach to working with persistent entities, this is actually not problematic because you can have the unit of work wrap all actions within the UoW as a DB transaction (i.e. create a new TransactionScope when the UoW is created, and call Commit() on it when the UoW completed). With this structure, the changes are sent to the DB, and will be visible to DbSet, but not visible to other UoWs (modulo whatever isolation level you use).
If you don't want the overhead of this, then you need to modify your code to make use of Local at appropriate times (which may involve looking at Local, and then issuing a query against the DbSet if you didn't find what you were looking for). The Find() method on DbSet can also be quite helpful in these situations. It will find an entity by primary key in either Local or the DB. So if you only need to locate items by primary key, this is pretty convenient (and has performance advantages as well).

As mentioned by Terry Coatta, the best approach if you don't want to save the records first would be checking both sources.
For example:
public Person LookupPerson(string emailAddress, DateTime effectiveDate)
{
Expression<Func<Person, bool>> criteria =
p =>
p.EmailAddress == emailAddress &&
p.EffectiveDate == effectiveDate;
return LookupPerson(_context.ObjectSet<Person>.Local.AsQueryable(), criteria) ?? // Search local
LookupPerson(_context.ObjectSet<Person>.AsQueryable(), criteria); // Search database
}
private Person LookupPerson(IQueryable<Person> source, Expression<Func<Person, bool>> predicate)
{
return source.FirstOrDefault(predicate);
}

For those who come after, I ran into some similar issues and decided to give the .Concat method a try. I have not done extensive performance testing so someone with more knowledge than I should feel free to chime in.
Essentially, in order to properly break up functionality into smaller chunks, I ended up with a situation in which I had a method that didn't know about consecutive or previous calls to that same method in the current UoW. So I did this:
var context = new MyDbContextClass();
var emp = context.Employees.Concat(context.Employees.Local).FirstOrDefault(e => e.Name.Contains("some name"));

This may only apply to EF Core, but every time you reference .Local of a DbSet, you're silently triggering change detection on the context, which can be a performance hit, depending on how complex your model is, and how many entries are currently being tracked.
If this is a concern, you'll want to use (fore EFCore) dbContext.ChangeTracker.Entries<T>() to get the locally tracked entities, which will not trigger change detection, but does require manual filtering of the DB state, as it will include deleted and detached entities.
There's a similar version of this in EF6, but in EFCore the Entries is a list of EntityEntries which you'll have to select out the entry.Entity to get out the same data the DbSet would give you.

Related

Is it okay to use .ToList() to bypass DbContext tracking?

I would like to keep the lifespan of my DbContext as short as possible, so I take advantage of the using statement.
However, since Entity Framework tracks the entities, I cannot do something like this :
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons;
}
}
Because as soon as I use the list of Persons in my ViewModel, I will receive an InvalidOperationException (the context has been disposed before I use the object).
To bypass that, I convert the result of the query to a concrete List, by calling ToList()
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons.ToList();
}
}
Is this an okay thing to do ? AsNoTracking() doesn't have any effects because it still returns some entity that gets dropped if the context is disposed.
When it comes to Entities you have two options. One is simple, and the other looks even simpler but is a whole lot more complex.
The simple option: An entity should never be referenced beyond the scope of it's DBContext. This means in something like an MVC app, entities don't get sent to a view. They don't get serialized, they are consumed solely within the scope of their DbContext.
The key to this approach is using POCO view models and projection, (Select and Automapper's ProjectTo) and when dealing with various common layers and such, adopting something like a Unit of Work pattern to manage the lifetime scope of a DbContext.
The deceptively simple option: Detached entities. Simply ensuring the DbContext is alive when returning entities can be enough to solve your problem. A Unit of Work pattern can be leveraged for this approach as well. While it looks simple enough, it is loaded with pitfalls. Serializing an entity for instance to pass back to a view will trigger lazy load calls as the serializer traverses the entity(ies). Passing entities back to be persisted is also problematic as you have to be cautious of trying to re-attach entities where the Context may already be tracking leading to situational runtime errors, overwriting data with stale copies, and exposing your system to unintended tampering if overly trusting the passed in entity. From a performance standpoint, this is also the worst option as you will be serializing and transmitting entire entity graphs back and forth rather than just the data a view needs.
Fixing your problem with option 2 is generally as easy as Eager Loading anything the calling code might touch. For example, if an Person has a reference to an Address:
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons
.Include(x => x.Address)
.ToList();
}
}
However, here is where the complexity creeps in. This requires the method to know about how the Persons might be consumed. Person might have references to five or more other objects/sets, and those objects may have references to more. (I.e. Address having a reference to a Country entity, or AddressType, etc.) The catch-all fix ends up being to eager load everything, which results in a lot of unnecessary data and is prone to bugs as entities evolve and code grows to rely on different bits. Then when performance becomes a problem you start diving down a rabbit hole in trying to support the caller telling this method what to eager load. As the system grows and you want to support pagination, sorting, etc. the method becomes more and more complex or it becomes slower and slower.
The solution I advocate for is to ensure that entities never cross the boundary of their DbContext, and leveraging Linq/EF's IQueryable implementation in combination with a unit of work so that consumers of these repository or service methods can be responsible for the scope of the DbContext. For example:
private AppDbContext Context
{
get { return AmbientDbContextLocator.Get<AppDbContext>(); }
}
public IQueryable<Person> GetPersons()
{
Logging.Log(Information, "Requested getPersons from service");
return Context.Persons.AsQueryable();
}
Then in calling code:
var mapperConfig = new MapperConfiguration(cfg =>
{
cfg.CreateMap<Person, PersonViewModel>(); // Can include any relevant mapping to flatten needed fields...
});
using( var contextScope = ContextScopeFactory.Create())
{
var viewModels = PersonRepository.GetPersons()
.ProjectTo<PersonViewModel>(mapperConfig)
.ToList();
return View(viewModel);
}
Often you will have low-level rules such as defaulting to only returning Active rows. .AsQueryable() is only needed if you have no rules. Anything that results in a .Where or such inside the repository/service method returns IQueryable so it could return:
return Context.Persons.Where(x => x.IsActive);
Repositories can enforce low level filtering rules like IsActive, authorization checks, etc. leaving more variable filtering, sorting, etc. up to the consumers.
The advantage of this approach is that the consumer (Controller, etc.) has full control over how the data is consumed without introducing any complexity/business logic into the repository/service. If I want to support sorting and pagination:
var viewModels = PersonRepository.GetPersons()
.OrderBy(x => x.Age)
.ProjectTo<PersonViewModel>(mapperConfig)
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToList();
The key point is that the entity (Person) doesn't leave the scope of the unit of work / DbContext, it is projected into a ViewModel which holds no references to entities, only data. That model can be safely serialized and represents only the data that the view needs. For projection I've used Automappper's ProjectTo as an example. You can use Linq's Select as a more manual option. The unit of work pattern I use and have outlined above is Mehdime's DbContextScope.
If you just return db.Persons which is an IEnumerable<Person> to some caller and then dispose the db context before that caller can iterate the result, you will get the exception, due to the lazy nature by default of entity framework. ToList gets around this by forcing iteration immediately before the context is disposed, but still has overhead for object tracking, so combining ToList with AsNoTracking gets you the best performance.
AsNoTracking removes some internal tracking code inside the db context, it does not allow you to pass out the IEnumerable<Person> after disposing the context, you still need ToList for that.
Use the extension method AsNoTracking and then call ToList:
db.Persons.AsNoTracking().ToList();
Or you can disable change tracking on the change tracker property of the db context to make the whole thing read only, which avoids having to call AsNoTracking everywhere. You still need ToList with this approach.
/// <summary>
/// Disable all change tracking - place in db context sub class
/// </summary>
public void DisableChangeTracking()
{
ChangeTracker.AutoDetectChangesEnabled = false;
ChangeTracker.LazyLoadingEnabled = false;
ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
}
It is acceptable to do so. The problem with IEnumerable is that Entity Framework supports deferred execution, so they won't be populated until you fetch them to a concrete collection (i.e. .ToList()), which could happen after it's been disposed of.
Your 2 possibilities are:
Executing the work on the deferred object by fetching it to a concrete collection (.ToList()).
Using dependency injection with a scoped Lifetime to let it live until the request completes.
Both are correct, and which one to prefer depends on your use-case.

Maintaining Referential Integrity Without Actually Deleting A Record

Rather than deleting an entry from the database, I am planning on using a boolean column like isActive in every table and manage its true/false state.
Normally when you delete a record from the database,
referential integrity is maintained, which means you cannot delete it if before deleting its dependencies.
when you query a deleted record, it returns null
How can I achieve the same results in an automated way using Entity Framework? Because checking isActive field for every entity in every query manually seems too much work which will be error-prone. And the same holds true for marking the dependencies as isActive=false.
EDIT:
My purpose is not limited to point-in-time queries. Let me give an example. UserA posted a photo and UserB wrote a comment on it. Then UserB wanted to delete his account. But the comment has its poster FK pointing at UserB. So, rather than deleting UserB, I want to deactivate its account but keep the record in order not to break dependencies.
And I want to extend this logic to every table in the database. Is that wrong?
As kind of a side answer to this question, instead of querying all of the tables directly why not use Views and then query the views? You can place a filter in the view to only display the "IsActive = true" records, that way you don't have to worry about including it manually in every query (something you mention is error prone).
Because checking isActive field for every entity in every query manually seems too much work which will be error-prone
It is error prone. But you may not always want only the active records (admin page?). You may also not want to soft delete ALL records, as not everything makes sense to keep around (in my experience). You could use an Expression to help you out / wire it up for certain methods / repositories and build dynamic queries.
Expression<Func<MyModel, bool>> IsActive = x => x.IsActive;
And the same holds true for marking the dependencies as isActive=false
A base repository could handle the delete for all your repositories, which would set the status to false (where the BaseModel would have an IsActive property).
public int Delete<TEntity>(long id) where TEntity : BaseModel
{
using (var context = GetContext())
{
var dbEntity = context.Set<TEntity>().Find(id);
dbEntity.IsActive = false;
return context.SaveChanges();
}
}
There is an OSS tool called EF Filters that can achieve what you are looking for: https://github.com/jbogard/EntityFramework.Filters
It let's you set global filters like an IsActive field and would certainly work for queries.

How does entity framework access associations?

I have two tables, Kittens and Owners in my Entity Framework model. A Kitten has 1 Owner but an Owner can have many Kittens.
I have a repository method called GetKittens() that returns _context.Kittens.ToList();
Since I set up an association, I can do kitten.Owner.Name.
But since ToList() was already called, and the context disposed of, how does it access the property? When retrieving an Entity, does it do a Join to all tables that have an association?
I have to write a query that pulls data from 4 tables, so I am wondering how to do this efficiently, hence this question trying to understand a bit more about how EF works.
By default, a DbContext will use lazy loading. There is a few options available to you, depending on your use cases.
1- If you have control over the lifetime of your DbContext, do not dispose it. However, every time you will access a related entity (for the first time), a new query will be sent to the database to fetch it.
2- Eagerly include the related entity by use Include on the IQueryable<Kitten>:
// For imagine context is the DbContext for your EF Model
context.Kittens.Include(c => c.Owners); // Or Include("Owners")
However, if you have no control over your repository, you have no option but to call a related method of your repository (like IEnumerable<Owner> GetOwners(Kitten kitten)) since the repository already returns the list.
If you do, consider either eagerly include the Kitten's owners in the repository before materializing with ToList() or return an IQuerable and leave the responsibility to the calling class to include related entities or customizing the query. If you do not want a caller to be able to alter the query, you can add an overload with includes that could be something along the line of:
public List<Kitten> GetKittens(params string[] includes)
{
return includes.Aggregate(
context.Kittens.AsQueryable(),
(query, include) => return query.Include(include)).ToList();
}
All in all, this is an implementation decision that you will have to take.

Simple transactions in EF4.0 POCO with UnitOfWork pattern and ApplicationBus

i recently ran into the surprising behaviour of EF4, where after adding an entity to a context it is not available for querying (well, you need to make your queries aware, that you might be searching in the memory) unless SaveChanges() is called.
Let me explain a bit our scenario:
We are using the UnitOfWork pattern with EF 4.0 and POCO objects. We recently decided to implement a Message Bus, where we would implement in the message handlers most of the application logic.
The problem i ran into was when i was passing around my UnitOfWork(a context wrapper in our case) in the messages. For example I have the logic of printing a Barcode, when i do it, it should change the Printed Counter in the DB. The printing can happen ad hoc for an existing package, or can be done automatically when creating a special type of a package. I pass over the UnitOfWork, and then i look for a barcode with:
public void Handle(IBarCodePrintMessage message)
{
if (message.UnitOfWork == null)
using (var uow = factory.Create<IUnitOfWork>)
{
Handle(message, uow);
uow.Commit();
}
else
Handle(message, message.UnitOfWork);
}
void Handle(IBarCodePrintMessage message, IUnitOfWork uow)
{
// the barcode with the PackageID is in the context or in the db
var barCode = uow.BarCodes.Where(b => b.PackageID == message.PackageID).SingleOrDefault();
barCode.IncreasePrintCount(); // this might be actually quite complex, and sometimes fail
printingServices.PrintBarCode(barCode);
}
My problem is, that if the Barcode was added within this same uow, and there was no Commit yet, then it is not found.
This is the simpliest example, we have tons of code that was doing its own commit, and now all this needs to go under one transaction.
I have a couple of questions actually:
1) I was thinking of somehow hacking my IUnitOfWork, to return a set of objects, that might be either in memory (not commited changes), or in the DB (if not retrieved yet). This is actually a behaviour i would expect my UnitOfWork, tell me what is the last state i am at, even though i didnt commit yet, and not to give me the db state. For the db state i can create another context.
Anyways this seems to be quite tricky, cause i would need to implement my own type of entity collection, all the extendsion methods on it (where, select, first, groupby, etc), then to get it to work the IQueryable way (meaning it doesnt list the tables straight away), and then find a way to match up the locally cached entities and the retrieved ones.
To me this seems to be a generic problem, and i would believe there is already an implementation out there, just not sure where.
2) another option is to introduce manual transactions. I tried it against SQLCE4.0 and it might solve the issue (call savecontext very often, so the entities are queriable from the db, and then if any error occurs, rollback the transaction), but i have big doubts. There might be different threads runnign different transactions at the same time, not sure how would they interact if one rolls back.
Also we are using SQL CE 4.0 and SQL Express 2008 (both, and can be switched dynamically). Starting to handle transactions this way seems to bring in the DTC, which - i read everywhere - is quite a heavy thing, and i would prefer to avoid. Is there a way to use transactions in an easy way without the risk of leveraging them to the DTC?
3) Does anyone have any other options or ideas on how to go about this problem?
You can query context for not persisted entities.
var entity = context.ObjectStateManager
.GetObjectStateEntries(EntityState.Added)
.Where(e => !e.IsRelationship)
.Select(e => e.Entity)
.OfType<BarCode>()
.FirstOrDefualt(b => b.PackageId == message.PackageId);
You can make this more generic and incorporate it into your UoW logic where you will first check your unsaved entities and if you not find the entity you will query the database.
Anyway if you know that you create a new barcode which will be needed in the following processing it should be passed in the "message" as property exposed on your interface.

Repository Pattern without an ORM

I am using repository pattern in a .NET C# application that does not use an ORM. However the issue I am having is how to fill One-to-many List properties of an entity. e.g. if a customer has a list of orders i.e. if the Customer class has a List property called Orders and my repository has a method called GetCustomerById, then?
Should I load the Orders list within the GetCustomerById method?
What if the Order itself has another list property and so on?
What if I want to do lazy loading? Where would I put the code to load the Orders property in customer? Inside the Orders property get{} accessor? But then I would have to inject repository into the domain entity? which I don't think is the right solution.
This also raises questions for Features like Change Tracking, Deleting etc? So i think the end result is can I do DDD without ORM ?
But right now I am only interested in lazy loading List properties in my domain entities? Any idea?
Nabeel
I am assuming this is a very common issue for anyone not using an ORM in a Domain Driven Design? Any idea?
can I do DDD without ORM ?
Yes, but an ORM simplifies things.
To be honest I think your problem isn't to do with whether you need an ORM or not - it's that you are thinking too much about the data rather than behaviour which is the key for success with DDD. In terms of the data model, most entities will have associations to most another entities in some form, and from this perspective you could traverse all around the model. This is what it looks like with your customer and orders and perhaps why you think you need lazy loading. But you need to use aggregates to break these relationships up into behavioural groups.
For example why have you modelled the customer aggregate to have a list of order? If the answer is "because a customer can have orders" then I'm not sure you're in the mindset of DDD.
What behaviour is there that requires the customer to have a list of orders? When you give more thought to the behaviour of your domain (i.e. what data is required at what point) you can model your aggregates based around use cases and things become much clearer and much easier as you are only change tracking for a small set of objects in the aggregate boundary.
I suspect that Customer should be a separate aggregate without a list of orders, and Order should be an aggregate with a list of order lines. If you need to perform operations on each order for a customer then use orderRepository.GetOrdersForCustomer(customerID); make your changes then use orderRespository.Save(order);
Regarding change tracking without an ORM there are various ways you can do this, for example the order aggregate could raise events that the order repository is listening to for deleted order lines. These could then be deleted when the unit of work completed. Or a slightly less elegant way is to maintain deleted lists, i.e. order.DeletedOrderLines which your repository can obviously read.
To Summarise:
I think you need to think more about behaviour than data
ORM's make life easier for change tracking, but you can do it without one and you can definitely do DDD without one.
EDIT in response to comment:
I don't think I'd implement lazy loading for order lines. What operations are you likely to perform on the order without needing the order lines? Not many I suspect.
However, I'm not one to be confined to the 'rules' of DDD when it doesn't seem to make sense, so... If in the unlikely scenario that there are a number of operations performed on the order object that didn't require the order lines to be populated AND there are often a large number of order lines associated to an order (both would have to be true for me to consider it an issue) then I'd do this:
Have this private field in the order object:
private Func<Guid, IList<OrderLine>> _lazilyGetOrderLines;
Which would be passed by the order repository to the order on creation:
Order order = new Order(this.GetOrderLines);
Where this is a private method on the OrderRepository:
private IList<OrderLine> GetOrderLines(Guid orderId)
{
//DAL Code here
}
Then in the order lines property could look like:
public IEnumberable<OrderLine> OrderLines
{
get
{
if (_orderLines == null)
_orderLines = _lazilyGetOrderLines(this.OrderId);
return _orderLines;
}
}
Edit 2
I've found this blog post which has a similar solution to mine but slightly more elegant:
http://thinkbeforecoding.com/post/2009/02/07/Lazy-load-and-persistence-ignorance
1) Should I load the Orders list within the GetCustomerById method?
It's probably a good idea to separate the order mapping code from the customer mapping code. If you're writing your data access code by hand, calling that mapping module from the GetCustomerById method is your best option.
2) What if the Order itself has another list property and so on?
The logic to put all those together has to live somewhere; the related aggregate repository is as good a place as any.
3) What if I want to do lazy loading? Where would I put the code to load the Orders property in customer? Inside the Orders property get{} accessor? But then I would have to inject repository into the domain entity? which I don't think is the right solution.
The best solution I've seen is to make your repository return subclassed domain entities (using something like Castle DynamicProxy) - that lets you maintain persistence ignorance in your domain model.
Another possible answer is to create a new Proxy object that inherits from Customer, call it CustomerProxy, and handle the lazy load there. All this is pseudo-code, so it's to give you an idea, not just copy and paste it for use.
Example:
public class Customer
{
public id {get; set;}
public name {get; set;}
etc...
public virtual IList<Order> Orders {get; protected set;}
}
here is the Customer "proxy" class... this class does not live in the business layer, but in the Data Layer along with your Context and Data Mappers. Note that any collections you want to make lazy-load you should declare as virtual (I believe EF 4.0 also requires you to make props virtual, as if spins up proxy classes at runtime on pure POCO's so the Context can keep track of changes)
internal sealed class CustomerProxy : Customer
{
private bool _ordersLoaded = false;
public override IList<Order> Orders
{
get
{
IList<Order> orders = new List<Order>();
if (!_ordersLoaded)
{
//assuming you are using mappers to translate entities to db and back
//mappers also live in the data layer
CustomerDataMapper mapper = new CustomerDataMapper();
orders = mapper.GetOrdersByCustomerID(this.ID);
_ordersLoaded = true;
// Cache Cases for later use of the instance
base.Orders = orders;
}
else
{
orders = base.Orders;
}
return orders;
}
}
}
So, in this case, our entity object, Customer is still free from database/datamapper code calls, which is what we want... "pure" POCO's. You've delegated the lazy-load to the proxy object which lives in the Data layer, and does instantiate data mappers and make calls.
there is one drawback to this approach, which is calling client code can't override the lazy load... it's either on or off. So it's up to you in your particular usage circumstance. If you know maybe 75% of the time you'll always needs the Orders of a Customer, than lazy-load is probably not the best bet. It would be better for your CustomerDataMapper to populate that collection at the time you get a Customer entity.
Again, I think NHibernate and EF 4.0 both allow you to change lazy-loading characteristics at runtime, so, as per usual, it makes sense to use an ORM, b/c a lot of functionality is provided for you.
If you don't use Orders that often, then use a lazy-load to populate the Orders collection.
I hope that this is "right", and is a way of accomplishing lazy-load the correct way for Domain Model designs. I'm still a newbie at this stuff...
Mike

Categories

Resources