I recently ran into a surprising behaviour of EF4: after adding an entity to a context, it is not available for querying unless SaveChanges() is called (or unless you make your queries aware that they might need to search in memory).
Let me explain our scenario:
We are using the UnitOfWork pattern with EF 4.0 and POCO objects. We recently decided to implement a Message Bus, with most of the application logic implemented in the message handlers.
The problem I ran into appeared when I was passing my UnitOfWork (a context wrapper in our case) around in the messages. For example, I have the logic for printing a barcode; when it runs, it should change the Printed Counter in the DB. The printing can happen ad hoc for an existing package, or it can be done automatically when creating a special type of package. I pass over the UnitOfWork, and then I look for a barcode with:
public void Handle(IBarCodePrintMessage message)
{
    if (message.UnitOfWork == null)
    {
        // No unit of work was passed in: create one and commit it ourselves.
        using (var uow = factory.Create<IUnitOfWork>())
        {
            Handle(message, uow);
            uow.Commit();
        }
    }
    else
    {
        Handle(message, message.UnitOfWork);
    }
}

void Handle(IBarCodePrintMessage message, IUnitOfWork uow)
{
    // the barcode with the PackageID is in the context or in the db
    var barCode = uow.BarCodes.Where(b => b.PackageID == message.PackageID).SingleOrDefault();
    barCode.IncreasePrintCount(); // this might be actually quite complex, and sometimes fail
    printingServices.PrintBarCode(barCode);
}
My problem is that if the barcode was added within this same uow and there was no Commit yet, it is not found.
This is the simplest example; we have tons of code that was doing its own commit, and now all of it needs to go under one transaction.
I have a couple of questions actually:
1) I was thinking of somehow hacking my IUnitOfWork to return a set of objects that might be either in memory (uncommitted changes) or in the DB (if not retrieved yet). This is actually the behaviour I would expect from my UnitOfWork: tell me the latest state I am at, even though I didn't commit yet, rather than the DB state. For the DB state I can create another context.
Anyway, this seems quite tricky, because I would need to implement my own type of entity collection with all the extension methods on it (Where, Select, First, GroupBy, etc.), get it to work the IQueryable way (meaning it doesn't enumerate the tables straight away), and then find a way to match up the locally cached entities with the retrieved ones.
To me this seems to be a generic problem, and I would expect there is already an implementation out there; I am just not sure where.
2) Another option is to introduce manual transactions. I tried it against SQL CE 4.0 and it might solve the issue (call SaveChanges() very often so the entities are queryable from the DB, and roll back the transaction if any error occurs), but I have big doubts. There might be different threads running different transactions at the same time, and I am not sure how they would interact if one rolls back.
Also, we are using SQL CE 4.0 and SQL Express 2008 (both, and they can be switched dynamically). Starting to handle transactions this way seems to bring in the DTC, which (I read everywhere) is quite a heavy thing that I would prefer to avoid. Is there an easy way to use transactions without the risk of escalating them to the DTC?
3) Does anyone have any other options or ideas on how to go about this problem?
You can query the context for entities that have not been persisted yet.
var entity = context.ObjectStateManager
    .GetObjectStateEntries(EntityState.Added)
    .Where(e => !e.IsRelationship)
    .Select(e => e.Entity)
    .OfType<BarCode>()
    .FirstOrDefault(b => b.PackageID == message.PackageID);
You can make this more generic and incorporate it into your UoW logic: first check your unsaved entities, and if you do not find the entity there, query the database.
Anyway, if you know that you are creating a new barcode which will be needed in the subsequent processing, it should be passed in the "message" as a property exposed on your interface.
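The "check unsaved entities first, then the database" idea can be sketched without any EF dependency. Everything named here is a hypothetical stand-in rather than EF API: `PendingFirstLookup` is an illustrative helper, `pending` plays the role of the entities returned for EntityState.Added, and `stored` plays the role of the set that queries the database. A rough sketch under those assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, standing in for the BarCode POCO from the question.
public class BarCode
{
    public int PackageID { get; set; }
    public int PrintCount { get; set; }
}

// Hypothetical helper: looks in the not-yet-saved entities first and
// only falls back to the store query when nothing matches there.
public static class PendingFirstLookup
{
    public static T FirstOrDefault<T>(
        IEnumerable<object> pending,          // e.g. entities in the Added state
        IQueryable<T> stored,                 // e.g. the set that queries the database
        Expression<Func<T, bool>> predicate) where T : class
    {
        // Compile the expression so the same predicate can filter in-memory objects.
        Func<T, bool> inMemory = predicate.Compile();
        T local = pending.OfType<T>().FirstOrDefault(inMemory);
        return local ?? stored.FirstOrDefault(predicate);
    }
}

public static class Demo
{
    public static void Main()
    {
        var pending = new List<object> { new BarCode { PackageID = 7 } };
        var stored = new List<BarCode>
        {
            new BarCode { PackageID = 3, PrintCount = 2 }
        }.AsQueryable();

        // Matches an unsaved entity, so the store is never consulted.
        var added = PendingFirstLookup.FirstOrDefault(pending, stored, b => b.PackageID == 7);
        // No unsaved match, so the lookup falls back to the store.
        var saved = PendingFirstLookup.FirstOrDefault(pending, stored, b => b.PackageID == 3);

        Console.WriteLine(added != null);
        Console.WriteLine(saved.PrintCount);
    }
}
```

Taking the predicate as an expression keeps the fallback query translatable by a real query provider, while the compiled delegate handles the in-memory pass.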
The documentation for AsNoTracking() in Entity Framework Core says that any edits to it won't be persisted when the database context is saved.
I have noticed another difference when AsNoTracking() is used, namely that if the database context has unsaved edits and you query it with AsNoTracking(), those changes won't be returned.
The documentation makes it sound like only edits done TO an AsNoTracking() query won't be tracked and persisted on save, but it seems that the contents returned will also be different.
If this is indeed the intended behaviour, I am unsure of the best design pattern.
I have used AsNoTracking() on all my read-only queries, but this means I have a bug as my design is something like this:
Controller endpoint that modifies data:
1) Call something in a service that may or may not alter the db
2) Call something else in the service, that does a read-only query with AsNoTracking()
3) Controller saves the database context
The intention is that any controller endpoint can call any number of service methods that may or may not alter the database, the database contexts are scoped so they are shared between the calls, and ultimately the controller persists the changes.
The problem is that #2 in the above won't return changes done in #1. How should this be resolved? The services can call out to other services which may fetch some data from places that have already been modified, so I can't just pass the models around everywhere.
Should I just remove AsNoTracking() from everywhere and call it a day? Or should I add a save call after every write? Or is there something else I could do?
TLDR: I want AsNoTracking() to be used in read-only queries for speed, but it won't return any unsaved changes. Should I remove AsNoTracking(), save after every edit, or is there a better way?
Edit:
Here is a snippet of what I mean; any query with AsNoTracking() ignores any edits done to a context before it's been saved, making me wonder how AsNoTracking() could be useful at all then:
var userSessionEntry = await this.mainContext.Sessions
.Where(t => t.AccountId == session.AccountId).FirstAsync();
userSessionEntry.AccountId = Guid.Empty;
var userSessionEntry2 = await this.mainContext.Sessions
.Where(t => t.AccountId == session.AccountId).AsNoTracking().FirstAsync();
Console.WriteLine(userSessionEntry2.AccountId); // prints original AccountId and not an empty id
Edit 2:
I'm using the latest preview version of Entity Framework Core; 5.0.0-preview.5.20278.
Thanks.
The way AsNoTracking works is that it always bypasses the DbContext's own cache of change-tracked entities and executes the query directly against the database. This is what is meant by the definition. The cached data can differ from the underlying database data, assuming someone else makes changes to the same entities you're working with.
However, per your design, if all services in your controller use the same exact DbContext instance, then you will be fine. There are ways to do this by using scoped dependency injection of your database context to any services you have. This way all parts of your service request should use the same instance.
If you need the most up-to-date data all the time, then you'll need to use AsNoTracking for all queries you make so you always hit the database for the freshest data.
You can still make edits to entities that are no-longer change tracked, but some additional code will be required:
var managers = await DbContext.Set<Employee>()
    .AsNoTracking()
    .Where(x => x.IsManager)
    .ToListAsync();
foreach (var manager in managers)
{
    manager.Salary += 10000;
    // Entry() returns the entry for this untracked entity, so the property can be flagged explicitly.
    var dbEntry = DbContext.Entry(manager);
    dbEntry.Property(x => x.Salary).IsModified = true;
}
await DbContext.SaveChangesAsync();
You can use the above strategy to ensure you're always working with the freshest data. If you have thousands of users actively using your service, this can hit your database quite a lot, so some caching strategy would be in order.
I've got an Application which consists of 2 parts at the moment
A Viewer that receives data from a database using EF
A Service that manipulates data from the database at runtime.
The logic behind the scenes includes some projects such as repositories - data access is realized with a unit of work. The Viewer itself is a WPF-Form with an underlying ViewModel.
The ViewModel contains an ObservableCollection which is the datasource of my Viewer.
Now the question is - How am I able to retrieve the database-data every few minutes? I'm aware of the following two problems:
It's not the latest data my Repository is "loading" - does EF "smart" stuff and retrieves data from the local cache? If so, how can I force EF to load the data from the database?
Re-setting the whole ObservableCollection, or adding/removing entities from another thread or BackgroundWorker (with invocation), is not possible. How am I supposed to solve this?
I will add some of my code if needed but at the moment I don't think that this would help at all.
Edit:
public IEnumerable<Request> GetAllUnResolvedRequests() {
return AccessContext.Requests.Where(o => !o.IsResolved);
}
This piece of code won't get the latest data - I edit some rows manually (set IsResolved to true) but this method retrieves it nevertheless.
Edit2:
Edit3:
var requests = AccessContext.Requests.Where(o => o.Date >= fromDate && o.Date <= toDate).ToList();
foreach (var request in requests) {
AccessContext.Entry(request).Reload();
}
return requests;
Final Question:
The code above "solves" the problem - but in my opinion it's not clean. Is there another way?
When you load an entity from the database, it is cached and tracked (so EF can detect the changes your application makes) unless you specify AsNoTracking.
This causes some issues: for example, performance degrades as the cache grows, or you see an old version of an entity, which is your case.
For these reasons, when using EF you should work with the Unit of Work pattern (i.e. create a new context for every unit of work).
Have a look at this Microsoft article to understand how to implement the Unit of Work pattern:
http://www.asp.net/mvc/overview/older-versions/getting-started-with-ef-5-using-mvc-4/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
In your case Reload is not a good choice because it does not scale: every Reload issues a separate query to the database. If you just need to return the desired entities, the best way is to create a new context.
public IEnumerable<Request> GetAllUnResolvedRequests()
{
return GetNewContext().Requests.Where(o => !o.IsResolved).ToList();
}
Here is what you can do.
You can define a Task (which keeps running on the ThreadPool) that periodically checks the database (keep in mind that periodically making EF reload data has its own cost).
You can also define a SQL dependency on your query, so that when the data changes, the main thread is notified.
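The periodic Task suggestion could look roughly like the loop below. This is only a sketch: `Poller`, `load` and `publish` are hypothetical names. In a real application `load` would open a fresh context, run the query and return a materialized list, and `publish` would hand the result to the UI thread (in WPF, for instance, through the Dispatcher).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Hypothetical helper: calls 'load' and 'publish' once per interval
    // until cancellation is requested.
    public static async Task RunAsync(
        Func<IReadOnlyList<string>> load,
        Action<IReadOnlyList<string>> publish,
        TimeSpan interval,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            publish(load());
            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancellation was requested while waiting
            }
        }
    }
}

public static class PollerDemo
{
    public static async Task Main()
    {
        var cts = new CancellationTokenSource();
        int published = 0;

        // Task.Run moves the loop onto the thread pool.
        await Task.Run(() => Poller.RunAsync(
            load: () => new List<string> { "request-1" },
            publish: items => { if (++published == 3) cts.Cancel(); },
            interval: TimeSpan.FromMilliseconds(10),
            token: cts.Token));

        Console.WriteLine(published);
    }
}
```

The demo cancels itself after three rounds so that it terminates; a viewer would instead cancel the token when the window closes.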
What is the best way to refresh data in Entity Framework 5? I've got an WPF application showing statistics from a database where data is changing all the time. Every 10 seconds the application is updating the result but the default behaviour for EF seems to be to cache the previous results. I would thus like a way to invalidate the previous results so a new set of data can be loaded.
The context of interest is defined in the following way:
public partial class MyEntities: DbContext
{
...
public DbSet<Stat> Stats { get; set; }
...
}
After some reading I was able to find a few approaches, but I have no idea of how efficient these ways are and if they come with downsides.
Create a new instance of the entities object
using (var db = new MyEntities())
{
var stats = from s in db.Stats ...
}
This works but feels inefficient because there are many other places where data is retrieved, and I don't want to reopen a new connection every time I need some data. Wouldn't it be more efficient to keep the connection open and do it another way?
Call refresh on the ObjectContext
var stats = from s in db.Stats ...
ObjectContext.Refresh(RefreshMode.StoreWins, stats );
This also assumes I'm extracting ObjectContext from the dbContext in this way:
private MyEntities db = null;
private ObjectContext ObjectContext
{
get
{
return ((IObjectContextAdapter)db).ObjectContext;
}
}
This is the solution I'm using as it is now. It seems simple. But I read somewhere that ObjectContext nowadays isn't directly accessible in DbContext because the EF team doesn't think that anyone would need it, and that you can do all things you need directly in DbContext. This makes me think that maybe this is not the best way to do it, or?
I know there is a Reload method on dbContext.Entry, but since I'm not reloading a single entity but rather retrieving a list of entities, I don't really know if this will work. If I get 5 stat objects in the first query, save them in a list and reload them when it's time to update, I might miss others that have been added in the database. Or have I completely misunderstood the Reload method? Can I do a reload on a DbSet specified in MyEntities?
There are a number of questions above but what I mainly want to know is what is the best practice in EF5 for asking the same query to the database over and over again? It might very well be something that I haven't discovered yet...
Actually, even if it seems counterintuitive, the first option is the correct one.
DbContext instances are designed to have short lifespans, so instantiating one is cheap compared to the cost of reloading everything; this follows from how they cache entities and from their data-loading design in general.
That's also why EF works so "naturally" well with ASP .NET MVC, since the DbContext is instantiated at each request.
That doesn't mean you have to create DbContext all over the place of course, in your context, using a DbContext per update operation (the one happening every 10secs) seems good enough, if during that operation you would need to delete a particular row, for example, you would pass the DbContext around, not create a new one.
For a few days now, I have been struggling with retrieving my entities from a repository (DbContext).
I am trying to save all the entities in an atomic action. Thus, different entities together represent something of value to me. If all the entities are 'valid', then I can save them all to the database. Entity 'a' is already stored in my repository, and needs to be retrieved to 'validate' entity 'b'.
That's where the problem arises. My repository relies on the DbSet<TEntity> class which works great with Linq2Sql (Include() navigation properties e.g.). But, the DbSet<TEntity> does not contain entities that are in the 'added' state.
So I have (as far as I know) two options:
Use the ChangeTracker to see which entities are available and query them into a set based on their EntityState.
Use the DbSet<TEntity>.Local property.
The ChangeTracker seems to involve some extra hard work to get it working in a way such that I can use Linq2Sql to Include() navigation properties e.g.
The DbSet<TEntity>.Local seems a bit weird to me. It might just be the name. I just read something that it is not performing very well (slower than DbSet<> itself). Not sure if that is a false statement.
Could somebody with significant EntityFramework experience shine some light on this? What's the 'wise' path to follow? Or am I seeing ghosts and should I always use the .Local property?
Update with code examples:
An example of what goes wrong
public void AddAndRetrieveUncommittedTenant()
{
_tenantRepository = new TenantRepository(new TenantApplicationTestContext());
const string tenantName = "testtenant";
// Create the tenant, but not call `SaveChanges` yet until all entities are validated
_tenantRepository.Create(tenantName);
//
// Some other code
//
var tenant = _tenantRepository.GetTenants().FirstOrDefault(entity => entity.Name.Equals(tenantName));
// The tenant will be null, because I did not call save changes yet,
// and the implementation of the Repository uses a DbSet<TEntity>
// instead of the DbSet<TEntity>.Local.
Assert.IsNotNull(tenant);
// Can I safely use DbSet<TEntity>.Local ? Or should I play
// around with DbContext.ChangeTracker instead?
}
An example of how I want to use my Repository
In my Repository I have this method:
public IQueryable<TEntity> GetAll()
{
return Context.Set<TEntity>().AsQueryable();
}
Which I use in business code in this fashion:
public List<Case> GetCasesForUser(User user)
{
    return _repository.GetAll().
        Where(@case => @case.Owner.EmailAddress.Equals(user.EmailAddress)).
        Include(@case => @case.Type).
        Include(@case => @case.Owner).
        ToList();
}
That is mainly the reason why I prefer to stick to DbSet like variables. I need the flexibility to Include navigation properties. If I use the ChangeTracker I retrieve the entities in a List, which does not allow me to lazy load related entities at a later point in time.
If this is close to incomprehensible bullsh*t, then please let me know so that I can improve the question. I desperately need an answer.
Thx a lot in advance!
If you want to be able to 'easily' issue a query against the DbSet and have it find newly created items, then you will need to call SaveChanges() after each entity is created. If you are using a 'unit of work' style approach to working with persistent entities, this is actually not problematic because you can have the unit of work wrap all actions within the UoW as a DB transaction (i.e. create a new TransactionScope when the UoW is created, and call Commit() on it when the UoW completed). With this structure, the changes are sent to the DB, and will be visible to DbSet, but not visible to other UoWs (modulo whatever isolation level you use).
If you don't want the overhead of this, then you need to modify your code to make use of Local at appropriate times (which may involve looking at Local, and then issuing a query against the DbSet if you didn't find what you were looking for). The Find() method on DbSet can also be quite helpful in these situations. It will find an entity by primary key in either Local or the DB. So if you only need to locate items by primary key, this is pretty convenient (and has performance advantages as well).
As mentioned by Terry Coatta, the best approach if you don't want to save the records first would be checking both sources.
For example:
public Person LookupPerson(string emailAddress, DateTime effectiveDate)
{
Expression<Func<Person, bool>> criteria =
p =>
p.EmailAddress == emailAddress &&
p.EffectiveDate == effectiveDate;
// Assumes _context is a DbContext; Local exposes the entities already tracked in memory.
return LookupPerson(_context.Set<Person>().Local.AsQueryable(), criteria) ?? // Search local
LookupPerson(_context.Set<Person>().AsQueryable(), criteria); // Search database
}
private Person LookupPerson(IQueryable<Person> source, Expression<Func<Person, bool>> predicate)
{
return source.FirstOrDefault(predicate);
}
For those who come after, I ran into some similar issues and decided to give the .Concat method a try. I have not done extensive performance testing so someone with more knowledge than I should feel free to chime in.
Essentially, in order to properly break up functionality into smaller chunks, I ended up with a situation in which I had a method that didn't know about consecutive or previous calls to that same method in the current UoW. So I did this:
var context = new MyDbContextClass();
var emp = context.Employees.Concat(context.Employees.Local).FirstOrDefault(e => e.Name.Contains("some name"));
This may only apply to EF Core, but every time you reference .Local on a DbSet, you silently trigger change detection on the context, which can be a performance hit depending on how complex your model is and how many entries are currently being tracked.
If this is a concern, you'll want to use (for EF Core) dbContext.ChangeTracker.Entries<T>() to get the locally tracked entities, which will not trigger change detection, but does require filtering on the entry state manually, as it will include deleted and detached entities.
There's a similar version of this in EF6, but in EF Core Entries returns a collection of EntityEntry objects, so you'll have to select entry.Entity from each one to get the same data the DbSet would give you.
I'm new to the Entity Framework, and am just starting to play around with it in my free time. One of the major questions I have is regarding how to handle ObjectContexts.
Which is generally preferred/recommended of these:
This
public class DataAccess{
MyDbContext m_Context;
public DataAccess(){
m_Context = new MyDbContext();
}
public IEnumerable<SomeItem> GetSomeItems(){
return m_Context.SomeItems;
}
public void DeleteSomeItem(SomeItem item){
m_Context.DeleteObject(item);
m_Context.SaveChanges();
}
}
Or this?
public class DataAccess{
    public DataAccess(){ }
    public IEnumerable<SomeItem> GetSomeItems(){
        MyDbContext context = new MyDbContext();
        return context.SomeItems;
    }
    public void DeleteSomeItem(SomeItem item){
        MyDbContext context = new MyDbContext();
        context.DeleteObject(item);
        context.SaveChanges();
    }
}
The ObjectContext is meant to be the "Unit of Work".
Essentially what this means is that for each "Operation" (eg: each web-page request) there should be a new ObjectContext instance. Within that operation, the same ObjectContext should be re-used.
This makes sense when you think about it, as transactions and change submission are all tied to the ObjectContext instance.
If you're not writing a web-app, and are instead writing a WPF or windows forms application, it gets a bit more complex, as you don't have the tight "request" scope that a web-page-load gives you, but you get the idea.
PS: In either of your examples, the lifetime of the ObjectContext will either be global, or transient. In both situations, it should NOT live inside the DataAccess class - it should be passed in as a dependency
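Passing the context in as a dependency might look like the sketch below. `IItemContext` and `InMemoryItemContext` are hypothetical stand-ins so that the example runs without a database; with EF the injected object would be the ObjectContext created for the current operation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction over the context that is created once per operation.
public interface IItemContext : IDisposable
{
    IList<string> SomeItems { get; }
    void SaveChanges();
}

// In-memory stand-in so the sketch runs without a database.
public sealed class InMemoryItemContext : IItemContext
{
    public IList<string> SomeItems { get; } = new List<string> { "a", "b" };
    public int SaveCount { get; private set; }
    public bool Disposed { get; private set; }
    public void SaveChanges() => SaveCount++;
    public void Dispose() => Disposed = true;
}

// DataAccess neither creates nor disposes the context; the caller owns its lifetime.
public class DataAccess
{
    private readonly IItemContext _context;

    public DataAccess(IItemContext context)
    {
        _context = context ?? throw new ArgumentNullException(nameof(context));
    }

    public IEnumerable<string> GetSomeItems() => _context.SomeItems.ToList();

    public void DeleteSomeItem(string item)
    {
        _context.SomeItems.Remove(item);
        _context.SaveChanges();
    }
}

public static class Program
{
    public static void Main()
    {
        // One context per operation (unit of work), shared by everything inside it.
        using (var context = new InMemoryItemContext())
        {
            var dataAccess = new DataAccess(context);
            dataAccess.DeleteSomeItem("a");
            Console.WriteLine(string.Join(",", dataAccess.GetSomeItems()));
        }
    }
}
```

Because the caller decides when the context is created and disposed, the same DataAccess class works with a per-request scope in a web app and with a longer, explicitly managed scope in a desktop app.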
If you keep the same context for a long-running process running lots of queries against it, LINQ to SQL (I didn't test against LINQ to Entities, but I guess it has the same problem) gets VERY slow (1 query a second after some 1000 simple queries). Renewing the context on a regular basis fixes this issue, and doesn't cost much.
What happens is that the context keeps track of every query you do on it, so if it's not reset in a way, it gets really fat... Other issue is then the memory it takes.
So it mainly depends on the way your application is working, and if you new up a DataAccess instance regularly or if you keep it the same all along.
Hope this helps.
Stéphane
Just a quick note: the two code pieces are roughly the same in their underlying problem. This is something that I have been looking at, because you don't want to keep opening and closing the context (see the second example), and at the same time you are not sure you can trust Microsoft to properly dispose of the context for you.
One of the things I did was create a common base class that lazy-loads the context and implements a destructor to dispose of things. This works well for something like the MVC framework, but unfortunately leads to the problem of having to pass the context around to the various layers so the business objects can share the call.
In the end I went with using Ninject to inject this dependency into each layer and had it track usage.
While I'm not in favour of always creating, what must be, complicated objects each time I need them - I too have found that the DataContexts in Linq to Sql and the ObjectContexts in EF are best created when required.
Both of these perform a lot of static initialisation based on the model that you run them against, which is cached for subsequent calls, so you'll find that the initial startup for a context will be longer than all subsequent instantiations.
The biggest hurdle you face with this is the fact that once you obtain an entity from the context, you can't simply pass it back into another to perform update operations, or add related entities back in. In EF you can reattach an entity back to a new context. In L2S this process is nigh-on impossible.