C# WinForms: I'm using the code-first approach with Entity Framework v6.2.0 and lazy loading. The problem is that it takes 4 to 5 seconds (only the first time) to load the data into our grid. I would like to reduce this load time. Any help?
public List<ShipmentOrder> GetShipmentOrder()
{
    var ObjShipmentOrderResult = context.shipmentOrders.ToList();
    List<ShipmentOrderEntity> ObjShipmentOrder = null;
    if (ObjShipmentOrderResult != null)
    {
        ObjShipmentOrder = new List<ShipmentOrderEntity>();
        foreach (var item in ObjShipmentOrderResult)
        {
            context.Entry(item).Reference(e => e.storageGate).Load();
            context.Entry(item).Reference(e => e.shippingOrderStatus).Load();
            context.Entry(item).Reference(e => e.packingOrder).Load();
            ObjShipmentOrder.Add(item);
        }
        context.Database.Connection.Close();
        return AutoMapper.Mapper.Map<List<ShipmentOrder>>(ObjShipmentOrder);
    }
    else
    {
        return null;
    }
}
Using Automapper you don't need to worry about lazy loads:
public List<ShipmentOrderViewModel> GetShipmentOrder()
{
    var query = context.shipmentOrders.AsQueryable();
    return query.ProjectTo<ShipmentOrderViewModel>().ToList();
}
ProjectTo takes an IQueryable (what EF queries normally deal with, though in this case because you are using the full DbSet, we need the AsQueryable()) and will project into that queryable to load just the data needed for your view model. This will result in an optimized query to the database to load all the fields necessary to populate the list of view models.
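For completeness, ProjectTo needs a mapping configuration from the entity to the view model. Here is a minimal sketch, assuming the entity class is ShipmentOrderEntity and that the related entities expose Name/Number properties (all of these names are placeholders for whatever your model actually has), using the static AutoMapper API that the question's Mapper.Map call implies:

// Hypothetical view model - holds plain data only, no entity references.
public class ShipmentOrderViewModel
{
    public int Id { get; set; }
    public string StorageGateName { get; set; }
    public string StatusName { get; set; }
    public string PackingOrderNumber { get; set; }
}

// Startup code (static AutoMapper API; newer versions pass a MapperConfiguration instead).
// ProjectTo also needs: using AutoMapper.QueryableExtensions;
Mapper.Initialize(cfg =>
{
    cfg.CreateMap<ShipmentOrderEntity, ShipmentOrderViewModel>()
       .ForMember(d => d.StorageGateName, o => o.MapFrom(s => s.storageGate.Name))
       .ForMember(d => d.StatusName, o => o.MapFrom(s => s.shippingOrderStatus.Name))
       .ForMember(d => d.PackingOrderNumber, o => o.MapFrom(s => s.packingOrder.Number));
});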
A few smells from the original code:
Obviously as the system grows you need to limit the # of records that this will pull back. Simply returning everything in a DbSet will quickly become a big problem as the # of records increases.
I've clarified the naming for the returned object as a View Model to ensure that it differentiates from the Entity class. In some cases the view model may pretty much map fields 1-to-1 with an entity, but passing entities outside of the scope of a DbContext is rife with issues, and even dual-purposing an entity by selecting a new, detached copy and treating it like a view model isn't advisable. (An entity class should always be treated like an entity, it gets confusing down the road if entities may be detached vs. attached, or incomplete representations of data.)
Lastly, why are you closing the underlying database connection on the context explicitly? How is the context scoped? If you are using an IoC container like Autofac, Unity, Ninject, etc., this should be managed automatically as the context is disposed, provided the context is scoped to the request. If not, where is the context constructed? If there is a need to close the connection, then the context should be scoped with a using() block until you are able to implement an IoC container to manage its lifetime.
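If there is no container yet, a rough sketch of the method with a locally scoped context (WarehouseContext is an assumed name for your DbContext subclass); disposing the context releases the connection, so no explicit Close() call is needed:

public List<ShipmentOrderViewModel> GetShipmentOrder()
{
    // using AutoMapper.QueryableExtensions; for ProjectTo
    using (var context = new WarehouseContext())
    {
        return context.shipmentOrders
            .ProjectTo<ShipmentOrderViewModel>()
            .ToList();
    } // Dispose() closes the underlying connection here.
}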
Related
I would like to keep the lifespan of my DbContext as short as possible, so I take advantage of the using statement.
However, since Entity Framework tracks the entities, I cannot do something like this:
public IEnumerable<Person> GetPersons()
{
    using (var db = new AppContext())
    {
        Logging.Log(Information, "Requested getPersons from service");
        return db.Persons;
    }
}
Because as soon as I use the list of Persons in my ViewModel, I will receive an InvalidOperationException (the context has been disposed before I use the object).
To bypass that, I convert the result of the query to a concrete List by calling ToList():
public IEnumerable<Person> GetPersons()
{
    using (var db = new AppContext())
    {
        Logging.Log(Information, "Requested getPersons from service");
        return db.Persons.ToList();
    }
}
Is this an okay thing to do? AsNoTracking() doesn't have any effect here, because it still returns entities that become unusable once the context is disposed.
When it comes to Entities you have two options. One is simple, and the other looks even simpler but is a whole lot more complex.
The simple option: An entity should never be referenced beyond the scope of its DbContext. This means that in something like an MVC app, entities don't get sent to a view. They don't get serialized; they are consumed solely within the scope of their DbContext.
The key to this approach is using POCO view models and projection (Select and AutoMapper's ProjectTo), and, when dealing with various common layers and such, adopting something like a Unit of Work pattern to manage the lifetime scope of a DbContext.
The deceptively simple option: Detached entities. Simply ensuring the DbContext is alive when returning entities can be enough to solve your problem. A Unit of Work pattern can be leveraged for this approach as well. While it looks simple enough, it is loaded with pitfalls. Serializing an entity for instance to pass back to a view will trigger lazy load calls as the serializer traverses the entity(ies). Passing entities back to be persisted is also problematic as you have to be cautious of trying to re-attach entities where the Context may already be tracking leading to situational runtime errors, overwriting data with stale copies, and exposing your system to unintended tampering if overly trusting the passed in entity. From a performance standpoint, this is also the worst option as you will be serializing and transmitting entire entity graphs back and forth rather than just the data a view needs.
Fixing your problem with option 2 is generally as easy as eager loading anything the calling code might touch. For example, if a Person has a reference to an Address:
public IEnumerable<Person> GetPersons()
{
    using (var db = new AppContext())
    {
        Logging.Log(Information, "Requested getPersons from service");
        return db.Persons
            .Include(x => x.Address)
            .ToList();
    }
}
However, here is where the complexity creeps in. This requires the method to know about how the Persons might be consumed. Person might have references to five or more other objects/sets, and those objects may have references to more. (I.e. Address having a reference to a Country entity, or AddressType, etc.) The catch-all fix ends up being to eager load everything, which results in a lot of unnecessary data and is prone to bugs as entities evolve and code grows to rely on different bits. Then when performance becomes a problem you start diving down a rabbit hole in trying to support the caller telling this method what to eager load. As the system grows and you want to support pagination, sorting, etc. the method becomes more and more complex or it becomes slower and slower.
The solution I advocate for is to ensure that entities never cross the boundary of their DbContext, and leveraging Linq/EF's IQueryable implementation in combination with a unit of work so that consumers of these repository or service methods can be responsible for the scope of the DbContext. For example:
private AppDbContext Context
{
    get { return AmbientDbContextLocator.Get<AppDbContext>(); }
}

public IQueryable<Person> GetPersons()
{
    Logging.Log(Information, "Requested getPersons from service");
    return Context.Persons.AsQueryable();
}
Then in calling code:
var mapperConfig = new MapperConfiguration(cfg =>
{
    cfg.CreateMap<Person, PersonViewModel>(); // Can include any relevant mapping to flatten needed fields...
});

using (var contextScope = ContextScopeFactory.Create())
{
    var viewModels = PersonRepository.GetPersons()
        .ProjectTo<PersonViewModel>(mapperConfig)
        .ToList();
    return View(viewModels);
}
Often you will have low-level rules such as defaulting to only return Active rows; .AsQueryable() is only needed if you have no such rules. Anything that applies a .Where or similar inside the repository/service method already returns IQueryable, so the method could simply return:
return Context.Persons.Where(x => x.IsActive);
Repositories can enforce low level filtering rules like IsActive, authorization checks, etc. leaving more variable filtering, sorting, etc. up to the consumers.
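As a hedged sketch of what such a repository method could look like (IsActive and CompanyId are assumed columns, purely for illustration):

public IQueryable<Person> GetPersons(int currentUserCompanyId)
{
    return Context.Persons
        .Where(x => x.IsActive)                           // low-level soft-delete rule
        .Where(x => x.CompanyId == currentUserCompanyId); // authorization-style rule
    // Sorting, paging and projection remain the caller's responsibility.
}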
The advantage of this approach is that the consumer (Controller, etc.) has full control over how the data is consumed without introducing any complexity/business logic into the repository/service. If I want to support sorting and pagination:
var viewModels = PersonRepository.GetPersons()
    .OrderBy(x => x.Age)
    .ProjectTo<PersonViewModel>(mapperConfig)
    .Skip(pageNumber * pageSize)
    .Take(pageSize)
    .ToList();
The key point is that the entity (Person) doesn't leave the scope of the unit of work / DbContext; it is projected into a ViewModel which holds no references to entities, only data. That model can be safely serialized and represents only the data that the view needs. For projection I've used AutoMapper's ProjectTo as an example. You can use Linq's Select as a more manual option. The unit of work pattern I use and have outlined above is Mehdime's DbContextScope.
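For reference, the same projection written with a plain Linq Select rather than ProjectTo might look like this (PersonViewModel's Id/Name/City properties and the Address.City column are assumptions for the example):

var viewModels = PersonRepository.GetPersons()
    .OrderBy(x => x.Age)
    .Select(x => new PersonViewModel
    {
        Id = x.Id,
        Name = x.Name,
        City = x.Address.City   // flattened in the SQL, no lazy-load round trip
    })
    .Skip(pageNumber * pageSize)
    .Take(pageSize)
    .ToList();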
If you just return db.Persons, which is an IEnumerable<Person>, to some caller and then dispose the db context before that caller iterates the result, you will get the exception because of Entity Framework's deferred (lazy) execution by default. ToList gets around this by forcing iteration immediately, before the context is disposed, but it still has overhead for object tracking, so combining ToList with AsNoTracking gets you the best performance.
AsNoTracking removes some internal tracking code inside the db context; it does not allow you to pass out the IEnumerable<Person> after disposing the context. You still need ToList for that.
Use the extension method AsNoTracking and then call ToList:
db.Persons.AsNoTracking().ToList();
Or you can disable change tracking on the change tracker property of the db context to make the whole thing read only, which avoids having to call AsNoTracking everywhere. You still need ToList with this approach.
/// <summary>
/// Disable all change tracking - place in db context sub class
/// </summary>
public void DisableChangeTracking()
{
    ChangeTracker.AutoDetectChangesEnabled = false;
    ChangeTracker.LazyLoadingEnabled = false;
    ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
}
It is acceptable to do so. The problem with IEnumerable is that Entity Framework uses deferred execution, so the results won't be populated until you fetch them into a concrete collection (e.g. with .ToList()), which could happen after the context has been disposed of.
Your two possibilities are:
- Executing the work on the deferred object by fetching it into a concrete collection (.ToList()).
- Using dependency injection with a scoped lifetime to let the context live until the request completes (see the sketch below).
Both are correct, and which one to prefer depends on your use-case.
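As a rough sketch of the second option, assuming a Microsoft.Extensions.DependencyInjection style container (the AppContext and service names come from the question):

// Composition root: one AppContext per scope/request, disposed when the scope ends.
services.AddScoped<AppContext>();
services.AddScoped<PersonService>();

// The service receives the scoped context and no longer needs its own using block.
public class PersonService
{
    private readonly AppContext _db;

    public PersonService(AppContext db)
    {
        _db = db;
    }

    public IEnumerable<Person> GetPersons()
    {
        Logging.Log(Information, "Requested getPersons from service");
        return _db.Persons.ToList();
    }
}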
We have an application using an SDK provided by our provider to integrate easily with them. This SDK connects to an AMQP endpoint and simply distributes, caches and transforms messages for our consumers. Previously this integration was over HTTP with XML as the data source, and the old integration had two ways of caching the DataContext: per web request and per managed thread id.
Now, however, we do not integrate over HTTP but rather over AMQP, which is transparent to us since the SDK does all the connection logic and we are only left with defining our consumers. So there is no option to cache the DataContext "per web request"; only per managed thread id is left.
I implemented the chain of responsibility pattern, so when an update comes to us it is put into a pipeline of handlers which use the DataContext to update the database according to the new updates. This is what the invocation method of the pipeline looks like:
public Task Invoke(TInput entity)
{
    object currentInputArgument = entity;
    for (var i = 0; i < _pipeline.Count; ++i)
    {
        var action = _pipeline[i];
        if (action.Method.ReturnType.IsSubclassOf(typeof(Task)))
        {
            if (action.Method.ReturnType.IsConstructedGenericType)
            {
                dynamic tmp = action.DynamicInvoke(currentInputArgument);
                currentInputArgument = tmp.GetAwaiter().GetResult();
            }
            else
            {
                (action.DynamicInvoke(currentInputArgument) as Task).GetAwaiter().GetResult();
            }
        }
        else
        {
            currentInputArgument = action.DynamicInvoke(currentInputArgument);
        }
    }
    return Task.CompletedTask;
}
The problem (at least as I understand it) is that this chain of responsibility is a chain of methods returning/starting new tasks, so when an update for entity A comes in it is handled by, say, managed thread id = 1, and when the same entity A arrives again some time later it is handled by managed thread id = 2, for example. This leads to:
System.InvalidOperationException: 'An entity object cannot be referenced by multiple instances of IEntityChangeTracker.'
because the DataContext from managed thread id = 1 already tracks entity A (at least that's what I think is happening).
My question is: how can I cache the DataContext in my case? Did you guys have the same problem? I read this and this answer, and from what I understood, using one static DataContext is not an option either.
Disclaimer: I should have said that we inherited the application and I cannot answer why it was implemented like that.
Disclaimer 2: I have little to no experience with EF.
Community-asked questions:
What version of EF are we using? 5.0
Why do entities live longer than the context? - They don't, but maybe you are asking why entities need to live longer than the context. I use repositories that use the cached DataContext to get entities from the database and store them in an in-memory collection which I use as a cache.
This is how entities are "extracted", where DatabaseDataContext is the cached DataContext I am talking about (a BLOB with all the database sets inside):
protected IQueryable<T> Get<TProperty>(params Expression<Func<T, TProperty>>[] includes)
{
    var query = DatabaseDataContext.Set<T>().AsQueryable();
    if (includes != null && includes.Length > 0)
    {
        foreach (var item in includes)
        {
            query = query.Include(item);
        }
    }
    return query;
}
Then, whenever my consumer application receives an AMQP message, my chain of responsibility begins by checking whether I have already processed this message and its data. So I have a method that looks like this:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
    where TEntity : ISportEvent
{
    ... some unimportant business logic

    //save the sport
    if (sport.SportID > 0) // <-- this here basically checks if so called
                           // sport is found in cache or not
                           // if its found then we update the entity in the db
                           // and update the cache after that
    {
        _sportRepository.Update(sport); /*
                                         * because message update for the same sport can come
                                         * and since DataContext is cached by threadId like I said
                                         * and Update can be executed from different threads
                                         * this is where aforementioned exception is thrown
                                         */
    }
    else // if not simply insert the entity in the db and the caches
    {
        _sportRepository.Insert(sport);
    }
    _sportRepository.SaveDbChanges();

    ... updating caches logic
}
I thought that getting entities from the database with the AsNoTracking() method, or detaching entities every time I "update" or "insert" an entity, would solve this, but it did not.
Whilst there is a certain overhead to newing up a DbContext, and using DI to share a single instance of a DbContext within a web request can save some of this overhead, simple CRUD operations can just new up a new DbContext for each action.
Looking at the code you have posted so far, I would probably have a private instance of the DbContext newed up in the Repository constructor, and then new up a Repository for each method.
Then your method would look something like this:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
    where TEntity : ISportEvent
{
    var sportsRepository = new SportsRepository();

    ... some unimportant business logic

    //save the sport
    if (sport.SportID > 0)
    {
        sportsRepository.Update(sport);
    }
    else
    {
        sportsRepository.Insert(sport);
    }
    sportsRepository.SaveDbChanges();
}

public class SportsRepository
{
    private DbContext _dbContext;

    public SportsRepository()
    {
        _dbContext = new DbContext();
    }
}
You might also want to consider the use of Stub Entities as a way around sharing a DbContext with other repository classes.
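A quick sketch of the stub-entity idea (Sport, SportID and the surrounding names are taken from the question or assumed): instead of loading a related row just to wire up a relationship, attach a throwaway instance that carries only the key.

// Stub entity: only the primary key is populated.
var sport = new Sport { SportID = sportId };
_dbContext.Set<Sport>().Attach(sport);   // tracked as Unchanged, no query issued

sportEvent.Sport = sport;                // relationship fixed up via the stub (assumed navigation property)
_dbContext.SaveChanges();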
Since this is about some existing business application I will focus on ideas that can help solve the issue rather than lecture about best practices or propose architectural changes.
I know this is kind of obvious but sometimes rewording error messages helps us better understand what's going on so bear with me.
The error message indicates that entities are being used by multiple data contexts, which means there are multiple DbContext instances and that at least one entity is referenced by more than one of them.
Then the question states there is a data context per thread that used to be per http request and that entities are cached.
So it seems safe to assume entities are read from a db context on a cache miss and returned from the cache on a hit. Attempting to update entities loaded from one db context instance using a second db context instance causes the failure. We can conclude that in this case the exact same entity instance was used in both operations and that no serialization/deserialization is in place when accessing the cache.
DbContext instances are in themselves entity caches through their internal change tracker mechanism, and this error is a safeguard protecting its integrity. Since the idea is to have a long-running process handling simultaneous requests through multiple db contexts (one per thread) plus a shared entity cache, it would be very beneficial performance-wise and memory-wise (the change tracking would likely increase memory consumption over time) to either change the db contexts' lifecycle to be per message, or to empty their change tracker after each message is processed, as sketched below.
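A hedged sketch of the "empty the change tracker after each message" idea for EF 5/6 (note that EntityState lives in System.Data in EF5 and in System.Data.Entity in EF6):

public static void ClearTrackedEntities(DbContext context)
{
    // Detach everything the context has accumulated so the long-lived,
    // per-thread context does not keep stale entities around.
    foreach (var entry in context.ChangeTracker.Entries().ToList())
    {
        entry.State = EntityState.Detached;
    }
}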
Of course, in order to process entity updates, they need to be attached to the current db context right after being retrieved from the cache and before any changes are applied to them.
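In code, attaching a cached entity to the current context before updating could look roughly like this (the repository internals are assumptions based on the question):

public void Update(Sport sport)
{
    var context = DatabaseDataContext; // the context bound to the current thread/message

    // Attach the cached instance if this context is not tracking it yet,
    // then mark it modified so SaveChanges issues an UPDATE.
    if (context.Entry(sport).State == EntityState.Detached)
    {
        context.Set<Sport>().Attach(sport);
    }
    context.Entry(sport).State = EntityState.Modified;
}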
The scenario here is that for each screen (view) there is one ViewModel behind it. And as a best (or recommended) practice, we should use one long-lived DbContext for each ViewModel.
So there is a requirement to reload the related entities if some change (newly added / deleted entities) was made in another ViewModel.
Here are some solutions to this issue:
- Publish some event or send some message to notify about the change; the subscriber ViewModels can then:
  - Add/remove the added/deleted entities accordingly without having to reload them. This amounts to syncing data between ViewModels and has its own complexity, because the added/removed entities should not have their state tracked (the state should be Unchanged, not Added or Deleted, because these changes have already been saved to the database). Also, proxied entities cannot be added to multiple DbContexts, ... too many issues here.
  - Reload all the related entities. This is not naturally supported by EF.
- Just reload the whole ViewModel at the time of switching screens (meaning the ViewModel won't be kept for the whole lifetime of the application). This may be applicable in some cases, but it's not flexible enough to be used in every case (for example, a change may be made from outside the application - by another application - and usually we just want a Refresh button on the current view to refresh the data, so reloading the whole ViewModel would affect the current View unnecessarily and may cause a bad visual effect, ...).
So I'm really looking for a good solution to this by reloading the related entities. From Googling around, it looks like this is not easily done with Entity Framework; the quickest and safest way is just to create and use a new DbContext, which means creating and using a new ViewModel (please note that I'm using dependency injection to inject the DbContext into the ViewModel, so the DbContext's lifetime is actually the same as the ViewModel's).
I can Google to find some hacky code to reload entities in Entity Framework, but I don't really like hacky stuff. So if possible please share your approach or your solutions to this issue, or even persuade me that hacky stuff is just fine.
we should use one long-lived DbContext for each ViewModel
I wouldn't say this is true.
You can and probably should create new DbContext instance for every load/update operation.
Using different DbContext instances gives you the ability to execute queries asynchronously.
For Windows applications (WinForms, WPF), asynchronous database access hugely improves loading times while the application remains responsive.
With one DbContext this wouldn't be easy.
Instead of injecting a DbContext, create a DbContext factory and inject it into the ViewModel, then:
using (var context = _contextFactory.Create<MyDbContext>())
{
    var orders = await context.Orders.ToListAsync();
    return orders.Select(order => order.ToOrderDto());
}
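The factory itself is not part of the question; a minimal sketch of what it could look like (names are assumptions):

public interface IDbContextFactory
{
    TContext Create<TContext>() where TContext : DbContext, new();
}

public class DbContextFactory : IDbContextFactory
{
    public TContext Create<TContext>() where TContext : DbContext, new()
    {
        return new TContext();
    }
}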
But what I am afraid of is that your business and view logic rely entirely on the database structure.
Your ViewModel shouldn't depend on the DbContext; instead, depend on an abstraction of the database layer (your question is actually the first wall you hit when relying on the DbContext directly).
public interface OrderDataAccess
{
    Task<Order> GetOrder(Guid id);
    Task<IEnumerable<OrderLine>> GetOrderLines(Guid orderId);
}
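A possible implementation sketch behind that abstraction, creating a short-lived context per call; MyDbContext, the DbSet names and the ToOrderModel/ToOrderLineModel mapping helpers are all assumptions:

public class EfOrderDataAccess : OrderDataAccess
{
    private readonly IDbContextFactory _contextFactory;

    public EfOrderDataAccess(IDbContextFactory contextFactory)
    {
        _contextFactory = contextFactory;
    }

    public async Task<Order> GetOrder(Guid id)
    {
        using (var context = _contextFactory.Create<MyDbContext>())
        {
            var entity = await context.Orders
                .AsNoTracking()
                .FirstOrDefaultAsync(o => o.Id == id);
            // Map to the plain model before the context is disposed.
            return entity == null ? null : entity.ToOrderModel();
        }
    }

    public async Task<IEnumerable<OrderLine>> GetOrderLines(Guid orderId)
    {
        using (var context = _contextFactory.Create<MyDbContext>())
        {
            var entities = await context.OrderLines
                .AsNoTracking()
                .Where(l => l.OrderId == orderId)
                .ToListAsync();
            return entities.Select(l => l.ToOrderLineModel()).ToList();
        }
    }
}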
When you load the whole view you can load the order and the order lines:
var orderTask = _dataAccess.GetOrder(id);
var orderLinesTask = _dataAccess.GetOrderLines(id);
await Task.WhenAll(orderTask, orderLinesTask);
this.OrderViewModel = orderTask.Result;
this.OrderLinesViewModels = orderLinesTask.Result;
Then when, for example, you need to reload the order lines:
this.OrderLinesViewModels = await _dataAccess.GetOrderLines(id);
Using transient DbContext instances just kicks the can down the road. Your ViewModel has some entity data that might be out-of-date. But it also might have unsaved changes. You simply have to decide how you want to handle that on a ViewModel-by-ViewModel basis.
In a Desktop App the ViewModel is the Unit-of-Work, and is still the proper scope for the DbContext.
If you decide you want to reload a tracked entity, or all the tracked entities for a DbContext instance, it shouldn't be a problem. E.g. something like:
void ReloadAllTrackedEntities()
{
    foreach (var entry in ChangeTracker.Entries())
    {
        entry.Reload();
    }
}
On a side-note, since you're building a desktop app, did you know EF Core supports using INotifyPropertyChanged for change tracking?
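If you want to go that route, here is a small sketch of opting into notification-based change tracking in EF Core (your entities must implement INotifyPropertyChanged for this strategy):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Entities raise PropertyChanged instead of relying on snapshot comparison.
    modelBuilder.HasChangeTrackingStrategy(ChangeTrackingStrategy.ChangedNotifications);
}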
I have a model-first, entity framework design like this (version 4.4)
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is SnapshotPart, Quote, QuoteType), except the DataInfo. Now looking into the SQL this appears to be because Quote has no FK to DataInfo because of the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current work around is this:
foreach (var quote in snapshot.ComponentQuotes)
{
    var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
    quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic linq building blocks interact with Entity Framework.
Take the following (pseudo)code:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This looks OK, but will throw a runtime error, because of an interface called IQueryable.
IQueryable implements IEnumerable and adds info for an expression and a provider. This basically allows it to build and execute sql statements against a database and not have to load whole tables when fetching data and iterating over them like you would over an IEnumerable.
Because linq defers execution of the expression until it's used, it compiles the IQueryable expression into SQL and executes the database query only right before it's needed. This speeds up things a lot, and allows for expression chaining without going to the database every time a Where() or Select() is executed. The side effect is if the object is used outside the scope of db, then the sql statement is executed after db has been disposed of.
To force linq to execute, you can use ToList, like this:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000).ToList();
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This will force linq to execute the expression against db and get all addresses with number greater than a thousand. this is all good if you need to access a field within the addresses table, but since we want to get the name of a city (a 1..1 relationship similar to yours), we'll hit another bump before it can run: lazy loading.
Entity framework lazy loads entities by default, so nothing is fetched from the database until needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but has the problem of depending on the context being available.
You could set EF to eager load (in your model, go to properties and set 'Lazy Loading Enabled' to False), but that would bring in a lot of info you probably don't use.
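The same switch can also be flipped per context instance in code; a small sketch (MyModelContainer stands in for the generated model-first context):

using (var db = new MyModelContainer())
{
    // ObjectContext API; the DbContext equivalent is Configuration.LazyLoadingEnabled.
    db.ContextOptions.LazyLoadingEnabled = false;
    // ... run queries that should not lazy load
}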
The best fix for this problem is to execute everything inside db's scope:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
    addresses.Select(addr => Console.WriteLine(addr.City.Name));
}
I know this is a really simple example, but in the real world you can use a DI container like Ninject to handle your dependencies and have your db available to you throughout the execution of the app.
This leaves us with Include. Include will make IQueryable include all specified relation paths when building the sql statement:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList();
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do, is map multiple tables to a single entity. In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it.
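With a model-first EDMX you would set this up in the designer by mapping the entity to both tables; purely for reference, the Code First fluent equivalent (entity splitting) looks roughly like the following, with illustrative property names:

modelBuilder.Entity<Quote>()
    .Map(m =>
    {
        m.Properties(q => new { q.Price, q.QuoteTypeId });
        m.ToTable("Quote");
    })
    .Map(m =>
    {
        m.Properties(q => new { q.Source, q.RetrievedOn });
        m.ToTable("DataInfo");
    });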
I want to cache a never-changing aggregate which would be accessible through a root object only (all other entities are accessible only via Reference/HasMany properties on the root object).
Should I use NHibernate (which we are already using) second-level-cache or is it better to build some sort of singleton that provides access to all entities in the aggregate?
I found a blog post about getting everything with MultiQuery but my database does not support it.
The 'old way' to do this would be to:
- do a select * from all aggregate tables
- loop the entities and set the References and the Collections manually
Something like:
foreach (var e in Entities)
{
    e.Parent = loadedParentEntities.SingleOrDefault(pe => e.ParentId == pe.Id);
}
But surely there is a way to tell NHibernate to do this for me?
Update
Currently I have tried simply fetching everything from the db, hoping NHibernate would do all the reference setting. It does not, however :(
var getRoot = Session.Query<RootObject>().ToList();
var getRoot_hasMany = Session.Query<RootObjectCollection>().ToList();
var getRoot_hasMany_ref = Session.Query<RootObjectCollectionReference>().ToList();
var getRoot_hasMany_hasMany = Session.Query<RootObjectCollectionCollection>().ToList();
Domain:
Root objects are getRoot. These have a collection property 'HasMany'. Each HasMany has a reference back to GetRoot, a reference to another entity (getRoot_hasMany_ref), and a collection of its own (getRoot_hasMany_hasMany). If this doesn't make sense, I'll create an ERD, but the actual structure is not really relevant to the question (I think).
This results in 4 queries being executed. (which is good)
However, when accessing properties like getRoot.First().HasMany.First().Ref or getRoot.First().HasMany.First().HasMany.First(), extra queries are still executed, even though everything should already be known to the ISession.
So how do I tell NHibernate to perform those 4 queries and then build the graphs without using any proxy properties, ... so that I have access to everything even after the ISession went out of scope?
I think there are several questions in one.
I stopped trying to trick NHibernate too much. I wouldn't access entities from multiple threads, because they are usually not thread safe. At least when using lazy loading. Caching lazy entities is therefore something evil.
I would avoid too many queries by using batch-size, which is by far the cleanest and easiest solution and in most cases "good enough". It's fully transparent to the business logic, which makes it so cool.
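For reference, a sketch of setting batch-size with NHibernate mapping-by-code (hbm.xml and Fluent NHibernate have equivalent settings; the property names mirror the question and are otherwise illustrative):

using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class RootObjectMap : ClassMapping<RootObject>
{
    public RootObjectMap()
    {
        Id(x => x.Id);
        Bag(x => x.HasMany,
            m =>
            {
                m.BatchSize(25);             // initialize up to 25 pending collections per query
                m.Lazy(CollectionLazy.Lazy);
            },
            r => r.OneToMany());
    }
}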
I would:
- Consider not caching the entity at all. Use NH's first level cache (that is, always load it using session.Get()). Make use of lazy loading when only a small part of the data is used in a single transaction.
- If there is a proven need to cache the data, consider turning off lazy loading altogether (by making the entities non-lazy and setting all the collections to non-lazy). Load the entity once and cache it. Still consider thread safety when accessing the data while it is still loaded.
- Should the entities need to stay lazy, because some instances of the same type are not in the cache, consider using a DTO-like structure as the cache. Copy all the data into a similar class structure that is not made of entities. This may sound like a lot of additional work, but in the end it will avoid many strange problems and save you much time.
Usually, query time is less important than flush time. Flush time is used by NH to find out which entities changed in a session. To avoid it, make entities read-only if you can.
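Making things read-only can be as simple as this sketch (NHibernate ISession API):

using (var session = sessionFactory.OpenSession())
{
    session.DefaultReadOnly = true;           // nothing loaded here is dirty-checked at flush
    var root = session.Get<RootObject>(rootId);

    // Alternatively, mark a single instance read-only:
    // session.SetReadOnly(root, true);
}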
If the whole object tree never changes (config settings?) then just load it efficiently with all references/collections initialised:
using (var session = sessionFactory.OpenSession())
{
    var root = session.Query<RootObject>().FetchMany(x => x.Collection).ToFutureValue();
    session.Query<RootObjectCollection>().Fetch(x => x.Ref).FetchMany(x => x.Collection).ToFuture();

    // Do something with root.Value
}