DbContext caching

DbContext caching - c#

I know the caching of DbContext is not good idea. But I would like to do it fine. What do you think about this way?
public class Context : DbContext
{
private Context()
{
}
private static WeakReference<Context> _cachedContext;
public Context Instance
{
get
{
Context context;
if (!_cachedContext.TryGetTarget(out context))
{
context = new Context();
_cachedContext.SetTarget(context);
}
return context;
}
}
}
This code is planned to be used without IDisposable.Dispose calling in the client-side. What problems this can cause except singleton (anti)pattern? Thanks.

The DbContext is a cache. Keeping hold of it for a long time is a terrible idea... it will slowly consume your application's memory hanging on to data that may well be stale.
It was not designed to be used in the way you propose.
Don't do it.
A DbContext is a transient object that should be used and disposed of in the smallest scope possible.
using(var ctx = new MyDbContext())
{
//make some changes
ctx.SaveChanges();
}
That's how it was designed to be used. Use it properly.

This is an XY problem. Why do you want to "cache" the DbContext? What benefit do you think you'll gain from this?
You should not do this, ever. The class was not meant for this. It will instead cause performance problems (undoing the benefit you think to gain) and persistent errors after you attached invalid entities - you'll never be able to save entities using this context again, as the change tracker holds the entities.
See Correct usage of EF's DBContext in ASP.NET MVC application with Castle Windsor, Working with DbContext, Managing DbContext the right way with Entity Framework 6: an in-depth guide, Manage the lifetime of dbContext or any of the other thousands of hits on searching the web for "entity framework dbcontext lifetime".
If you want to cache data, then instead cache the records themselves. There are existing solutions (code and libraries) that can help you with this, which is called "second level caching". No need to write it yourself. See for example How to make Entity Framework cache some objects.

Related

How to cache DataContext instances in a consumer type application?

We have an application using SDK provided by our provider to integrate easily with them. This SDK connects to AMQP endpoint and simply distributes, caches and transforms messages to our consumers. Previously this integration was over HTTP with XML as a data source and old integration had two ways of caching DataContext - per web request and per managed thread id. (1)
Now, however, we do not integrate over HTTP but rather AMQP which is transparent to us since the SDK is doing all the connection logic and we are only left with defining our consumers so there is no option to cache DataContext "per web request" so only per managed thread id is left.
I implemented chain of responsibility pattern, so when an update comes to us it is put in one pipeline of handlers which uses DataContext to update the database according to the new updates. This is how the invocation method of pipeline looks like:
public Task Invoke(TInput entity)
{
object currentInputArgument = entity;
for (var i = 0; i < _pipeline.Count; ++i)
{
var action = _pipeline[i];
if (action.Method.ReturnType.IsSubclassOf(typeof(Task)))
{
if (action.Method.ReturnType.IsConstructedGenericType)
{
dynamic tmp = action.DynamicInvoke(currentInputArgument);
currentInputArgument = tmp.GetAwaiter().GetResult();
}
else
{
(action.DynamicInvoke(currentInputArgument) as Task).GetAwaiter().GetResult();
}
}
else
{
currentInputArgument = action.DynamicInvoke(currentInputArgument);
}
}
return Task.CompletedTask;
}
The problem is (at least what I think it is) that this chain of responsibility is chain of methods returning/starting new tasks so when an update for entity A comes it is handled by managed thread id = 1 let's say and then only sometime after again same entity A arrives only to be handled by managed thread id = 2 for example. This leads to:
System.InvalidOperationException: 'An entity object cannot be referenced by multiple instances of IEntityChangeTracker.'
because DataContext from managed thread id = 1 already tracks entity A. (at least that's what I think it is)
My question is how can I cache DataContext in my case? Did you guys have the same problem? I read this and this answers and from what I understood using one static DataContext is not an option also.(2)
Disclaimer: I should have said that we inherited the application and I cannot answer why it was implemented like that.
Disclaimer 2: I have little to no experience with EF.
Comunity asked questions:
What version of EF we are using? 5.0
Why do entities live longer than the context? - They don't but maybe you are asking why entities need to live longer than the context. I use repositories that use cached DataContext to get entities from the database to store them in an in-memory collection which I use as a cache.
This is how entities are "extracted", where DatabaseDataContext is the cached DataContext I am talking about (BLOB with whole database sets inside)
protected IQueryable<T> Get<TProperty>(params Expression<Func<T, TProperty>>[] includes)
{
var query = DatabaseDataContext.Set<T>().AsQueryable();
if (includes != null && includes.Length > 0)
{
foreach (var item in includes)
{
query = query.Include(item);
}
}
return query;
}
Then, whenever my consumer application receives AMQP message my chain of responsibility pattern begins checking if this message and its data I already processed. So I have method that looks like that:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
where TEntity : ISportEvent
{
... some unimportant business logic
//save the sport
if (sport.SportID > 0) // <-- this here basically checks if so called
// sport is found in cache or not
// if its found then we update the entity in the db
// and update the cache after that
{
_sportRepository.Update(sport); /*
* because message update for the same sport can come
* and since DataContext is cached by threadId like I said
* and Update can be executed from different threads
* this is where aforementioned exception is thrown
*/
}
else // if not simply insert the entity in the db and the caches
{
_sportRepository.Insert(sport);
}
_sportRepository.SaveDbChanges();
... updating caches logic
}
I thought that getting entities from the database with AsNoTracking() method or detaching entities every time I "update" or "insert" entity will solve this, but it did not.

Whilst there is a certain overhead to newing up a DbContext, and using DI to share a single instance of a DbContext within a web request can save some of this overhead, simple CRUD operations can just new up a new DbContext for each action.
Looking at the code you have posted so far, I would probably have a private instance of the DbContext newed up in the Repository constructor, and then new up a Repository for each method.
Then your method would look something like this:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
where TEntity : ISportEvent
{
var sportsRepository = new SportsRepository()
... some unimportant business logic
//save the sport
if (sport.SportID > 0)
{
_sportRepository.Update(sport);
}
else
{
_sportRepository.Insert(sport);
}
_sportRepository.SaveDbChanges();
}
public class SportsRepository
{
private DbContext _dbContext;
public SportsRepository()
{
_dbContext = new DbContext();
}
}
You might also want to consider the use of Stub Entities as a way around sharing a DbContext with other repository classes.

Since this is about some existing business application I will focus on ideas that can help solve the issue rather than lecture about best practices or propose architectural changes.
I know this is kind of obvious but sometimes rewording error messages helps us better understand what's going on so bear with me.
The error message indicates entities are being used by multiple data contexts which indicates that there are multiple dbcontext instances and that entities are referenced by more than one of such instances.
Then the question states there is a data context per thread that used to be per http request and that entities are cached.
So it seems safe to assume entities read from a db context upon a cache miss and returned from the cache on a hit. Attempting to update entities loaded from one db context instance using a second db context instance cause the failure. We can conclude that in this case the exact same entity instance was used in both operations and no serialization/deserialization is in place for accessing the cache.
DbContext instances are in themselves entity caches through their internal change tracker mechanism and this error is a safeguard protecting its integrity. Since the idea is to have a long running process handling simultaneous requests through multiple db contexts (one per thread) plus a shared entity cache it would be very beneficial performance-wise and memory-wise (the change tracking would likely increase memory consumption in time) to attempt to either change db contexts lifecycle to be per message or empty their change tracker after each message is processed.
Of course in order to process entity updates they need to be attached to the current db context right after retrieving it from the cache and before any changes are applied to them.

Prevent cached objects to end up in the database with Entity Framework

We have an ASP.NET project with Entity Framework and SQL Azure.
A big part of our data only needs to be updated a few times a day, other data is very volatile.
The data that barely changes we cache in memory at startup, detach from the context and than use it mainly for reading, drastically lowering the amount of database requests we have to do.
The volatile data is requested everytime by a DbContext per Http request.
When we do an update to the cached data, we send a message to all instances to catch a fresh version of all the data from the SQL server.
So far, so good.
Until we introduced a bug that linked one of these 'cached' objects to the 'volatile' data, and did a SaveChanges.
Well, that was quite a mess.
The whole data tree was added again and again by every update, corrupting the whole database with a whole lot of duplicated data.
As a complete hack I added a completely arbitrary column with a UniqueConstraint and some gibberish data on one of the root tables; hopefully failing the SaveChanges() next time we introduce such a bug because it will violate the Unique Constraint.
But it is of course hacky, and I'm still pretty scared ;P
Are there any better ways to prevent whole tree's of cached objects ending up in the database?
More information
Project is ASP.NET MVC
I cache this data, because it is mainly read only, and this saves a tons of extra database calls per http request
This is in a high traffic website, with a lot of personal customized views. Having the POCO data in memory works really good for what I want. Except the problem I mentioned.
It is a bit more complicated, but a simplified version is that I cache the objects by a singleton: so i.e:
EntityCache.Instance.LolCats = new DbContext().LolCats.AsNoTracking().ToList();
This cache I dependency-inject into my controllers.

You can solve it like this:
1) Create an interface like this:
public interface IIsReadOnly
{
bool IsReadOnly { get; set; }
}
2) Implement this interface in all of the entities that can be cached. When you read and cache them, set the IsReadOnly property to true. This flag will be used when SaveChanges is invoked. Remember to decorate this property with the [NotMapped] attribute, or use any other mean to make EF ignore it.
public class ACacheableEntitySample
: IIsReadOnly
{
[NotMapped]
public bool IsReadOnly { get; set; }
// define the "regular" entity properties
}
NOTE: you can include the property directly in the class definition (if using Code First), or use partial classes (for Db First, Model First, or Code First).
NOTE: alternatively you can make EF ignore the IsReadOnly property using the Fluent API, or even better a custom convention (EF 6+)
3) Override your inherited DbContext.SaveChanges method. In the overridden method, review all the entries with pending changes, and if they are read only, change there state to Unchanged:
if (entry is IIsReadOnly) // if it's a cacheable entity
{
if (entry.IsReadOnly) // and it was marked as readonly when caching
{
// change the entry state to unchanged here, so that it's not updated
}
}
NOTE: This is sample code to explain what you need to do. In your final implementation you can do it with a simple LINQ sentence that get all the IIsReadOnly entities, which have the IsReadOnly set to true, and set their state to Unchanged.
You can use the IIsReadOnly entites in another DbContext and manipulate them in the usual way. For example if you get one of these entites, update it, and call SaveChanges, the changes will be saved because IsReadOnly will have the default false value. But you'll easily avoid saving changes of cached data accidentally, simply by setting the IsReadOnly property to true when caching.

Original answer deleted because it was a waste of time.
Your post and proceeding comments are a perfect example of the XY Problem.
You say:
I really need a solution for the problem, not for the architecture
What if the architecture is the problem?
The problem you presented
A caching solution you implemented that violates at least a half dozen best practices has (surprise!) blown up in your face. You've managed to stop it from blowing again up via a spectacular (not in a good way) hack but you want to know how to do it in a way that won't require such a spectacular hack.
The problem you had
You needed to cache some data because it was getting too expensive to hit the database for every request.
The answers that were offered
Use foreign keys instead of navigation properties
This is a perfectly valid answer and, surprise, a best practice. Navigation properties can change any time you regenerate the code in your Entity Data Model and are often ambiguous. With a bit of effort you could have used this and never had to worry about EF's handling of object relationships again.
Cache models instead of Entity objects
Another valid answer, and one that requires the least amount of actual work. MVC applications usually require some redundancy between viewmodels and entity objects and if you ever write a proper multi-tier application you'll practically drown in redundant objects. And nobody will accidentally add these objects to a DbContext ever again - because they can't.
Criticism
You have offered up very little useful information. From what I can tell your approach from the get-go was wrong.
Firstly, dumping whole tables into memory at App_Start is at best a temporary solution. If the table was too big to hit on every request, it's too big to hit on App_Start. What happens if something important breaks while people are using your application and you need to deploy a bug fix ASAP? What happens when your tables get really big and you start getting timeouts from EF while trying to dump them into memory? What happens if 95% of your users only really ever need 10% of that big table you've dumped into memory? Is the memory on your web/cache server going to be enough to accommodate the increasing size of your tables? For how long?
Secondly, no Entity object should remain anywhere after its originating DbContext is disposed. Entity objects behave in a convenient way while their DbContext is in scope and become troublesome POCOs when it's out of scope. I say troublesome because the 'magic' DbContext does with change tracking tends to fool people unfamiliar with the inner workings of EF into thinking that an Entity object is directly connected to a table row in the database. The problem you had illustrates this point perfectly.
Thirdly, it looks like you need to delete and re-dump a whole table to memory, even if you only update a single column in a single row. That's immensely wasteful to both the memory and CPU on your web server, and to your Azure SQL instance(s). What happens when a small bit of data comes in wrong and needs to be updated in a hurry? What if one of your nightly update jobs fails but you need fresh data in the morning?
You may not worry about any of this stuff now but your solution blowing up in your face should have at the very least raised some red flags. I've had to deal with as lot of caching in projects I've worked on in the past few years and everything I say here comes from experience.
Proposed solution - On-demand caching
If you've put a little effort into organizing your code, all of your CRUD operations on the database should be in specialized helper classes which I call repositories. Your controller calls its specialized repository (StuffController - StuffRepository), receives a model and binds that model to a view, kinda like this:
public class StuffController : Controller
{
private MyDbContext _db;
private StuffRepository _repo;
public StuffController()
{
_db = new MyDbContext();
_repo = new StuffRepository(_db);
}
// ...
public ActionResult Details(int id)
{
var model = _repo.ReadDetails(id);
// ...
return View(model);
}
protected override void Dispose(bool disposing)
{
_db.Dispose();
base.Dispose(disposing);
}
}
What on-demand caching would do is wrap that call to the repository in such a way that if the result of that method was already in the cache and it was not stale, it would return it from the cache. Otherwise it would hit the database.
Here's a simplified (and probably nonfunctional) example of a CacheWrapper class so you can understand what it does, using HttpRuntime.Cache:
public static class CacheWrapper
{
private static List<string> _keys = new List<string>();
public static List<string> Keys
{
get { lock(_keys) { return _keys.ToList(); } }
}
public static T Fetch<T>(string key, Func<T> dlgt, bool refresh = false) where T : class
{
var result = HttpRuntime.Cache.Get(key) as T;
if(result != null && !refresh) return result;
lock(HttpRuntime.Cache)
{
lock(_keys)
{
_keys.Add(key);
}
result = dlgt();
HttpRuntime.Cache.Add(key, result, /* some other params */);
}
return result;
}
}
And the new way to call things from the controller:
public ActionResult Details(int id)
{
var model = CacheWrapper.Fetch("StuffDetails_" + id, () => _repo.ReadDetails(id));
// ...
return View(model);
}
A slightly more complex version of this is in production on a public web application as we speak and working quite well.

Best practice to refresh an EF5 result from database

What is the best way to refresh data in Entity Framework 5? I've got an WPF application showing statistics from a database where data is changing all the time. Every 10 seconds the application is updating the result but the default behaviour for EF seems to be to cache the previous results. I would thus like a way to invalidate the previous results so a new set of data can be loaded.
The context of interest is defined in the following way:
public partial class MyEntities: DbContext
{
...
public DbSet<Stat> Stats { get; set; }
...
}
After some reading I was able to find a few approaches, but I have no idea of how efficient these ways are and if they come with downsides.
Create a new instance of the entities object
using (var db = new MyEntities())
{
var stats = from s in db.Stats ...
}
This works but feels inefficient because there are many other places where data is retrieved, and I don't want to reopen a new connection every time I need some data. Wouldn't it be more efficient to keep the connection open and do it another way?
Call refresh on the ObjectContext
var stats = from s in db.Stats ...
ObjectContext.Refresh(RefreshMode.StoreWins, stats );
This also assumes I'm extracting ObjectContext from the dbContext in this way:
private MyEntities db = null;
private ObjectContext ObjectContext
{
get
{
return ((IObjectContextAdapter)db).ObjectContext;
}
}
This is the solution I'm using as it is now. It seems simple. But I read somewhere that ObjectContext nowadays isn't directly accessible in DbContext because the EF team doesn't think that anyone would need it, and that you can do all things you need directly in DbContext. This makes me think that maybe this is not the best way to do it, or?
I know there is a reload method of dbContext.Entry but since I'm not reloading a single entity but rather retrieve a list of entities, I don't really know if this way will work. If I get 5 stat objects in the first query, save them in a list and do a reload on them when it's time to update, I might miss out others that have been added to the list on the database. Or have I completely misunderstood the reload method? Can I do a reload on a DbSetspecified in MyEntities?
There are a number of questions above but what I mainly want to know is what is the best practice in EF5 for asking the same query to the database over and over again? It might very well be something that I haven't discovered yet...

Actually, and even if it seems counter intuitive, the first option is the correct one, see this
DbContext are design to have short lifespans, hence their instantiation cost is quite low compared to the cost of reloading everything, it's mostly due to things like caching, and their data loading designs in general.
That's also why EF works so "naturally" well with ASP .NET MVC, since the DbContext is instantiated at each request.
That doesn't mean you have to create DbContext all over the place of course, in your context, using a DbContext per update operation (the one happening every 10secs) seems good enough, if during that operation you would need to delete a particular row, for example, you would pass the DbContext around, not create a new one.

Linq to SQL - How should I manage database requests?

I have studied a bit into the lifespan of the DataContext trying to work out what is the best possible way of doing things.
Given I want to re-use my DAL in a web application I decided to go with the DataContext Per Business Object Request approach.
My idea was to extend my L2S entities from the dbml file to retrieve information the database creating a separate context per request e.g.
public partial class AnEntity
{
public IEnumerable<RelatedEntity> GetRelatedEntities()
{
using (var dc = new MyDataContext())
{
return dc.RelatedEntities.Where(r => r.EntityID == this.ID);
}
}
}
In terms of returning the Entities...do I need to return POCOs at this point or is it ok to simply return the business object returned from the query? I understand that if I was to try access properties of the returned entity (after the DataContext has been disposed) it would fail. However, this is the reason I have decided to implement these type of methods e.g.
Instead of:
AnEntity entity = null;
using (var repo = new EntityRepo())
{
entity = repo.GetEntity(12345);
}
var related = entity.RelatedEntities; // this would cause an exception
In theory I should be able to do:
AnEntity entity = null;
using (var repo = new EntityRepo())
{
entity = repo.GetEntity(12345);
}
var related = entity.GetRelatedEntities();
Given the circumstances of my particular app (needs to work in a windows service & web application) I would like to know if this seems a plausible approach, whether there are obvious flaws and if there are better approaches for what it is I am trying to do.
Thanks.

Generally speaking, as long as you are not calling a single DataContext object using more than one thread, you should be OK. In other words, use one DataContext object per thread, and do not share data or state between different DataContext objects.
The remaining multi-threaded issues have to do with concurrency in the database, not threading operations.
Other than these caveats, your approach seems sound. You can either use partial classes to implement your business methods, or you can add a business layer between the Linq to SQL classes and your repository.

You can get away with this:
var repo = new EntityRepo();
entity = repo.GetEntity(12345);
var related = entity.RelatedEntities;
See this StackOverflow post for an explanation why not disposing your context doesn't cause connection leaks or stuff like that.
The repository and context will get cleaned up by the garbage collector when the entities that were fetched by them fall out of scope (when building a website, at the end of the request).
Edit: MSDN documents that connections are opened as late as possible and closed as soon as possible. So skipping the using doesn't causes connection pool problems.

Reuseable ObjectContext or new ObjectContext for each set of operations?

I'm new to the Entities Framework, and am just starting to play around with it in my free time. One of the major questions I have is regarding how to handle ObjectContexts.
Which is generally preferred/recommended of these:
This
public class DataAccess{
MyDbContext m_Context;
public DataAccess(){
m_Context = new MyDbContext();
}
public IEnumerable<SomeItem> GetSomeItems(){
return m_Context.SomeItems;
}
public void DeleteSomeItem(SomeItem item){
m_Context.DeleteObject(item);
m_Context.SaveChanges();
}
}
Or this?
public class DataAccess{
public DataAccess(){ }
public IEnumerable<SomeItem> GetSomeItems(){
MyDbContext context = new DbContext();
return context.SomeItems;
}
public void DeleteSomeItem(SomeItem item){
MyDbContext context = new DbContext();
context.DeleteObject(item);
context.SaveChanges();
}
}

The ObjectContext is meant to be the "Unit of Work".
Essentially what this means is that for each "Operation" (eg: each web-page request) there should be a new ObjectContext instance. Within that operation, the same ObjectContext should be re-used.
This makes sense when you think about it, as transactions and change submission are all tied to the ObjectContext instance.
If you're not writing a web-app, and are instead writing a WPF or windows forms application, it gets a bit more complex, as you don't have the tight "request" scope that a web-page-load gives you, but you get the idea.
PS: In either of your examples, the lifetime of the ObjectContext will either be global, or transient. In both situations, it should NOT live inside the DataAccess class - it should be passed in as a dependency

If you keep the same context for a long-running process running lots queries against it, linq-to-sql (I didn't test against linq to entities, but I guess that's the same problem) gets VERY slow (1 query a second after some 1000 simple queries). Renewing the context on a regular basis fixes this issue, and doesn't cost so much.
What happens is that the context keeps track of every query you do on it, so if it's not reset in a way, it gets really fat... Other issue is then the memory it takes.
So it mainly depends on the way your application is working, and if you new up a DataAccess instance regularly or if you keep it the same all along.
Hope this helps.
Stéphane

Just a quick note - the two code pieces are roughly the same in their underlying problem. This is something that I have been looking at, because you dont want to keep opening and closing the context (see second example) at the same time you are not sure if you can trust Microsoft to properly dispose of the context for you.
One of the things I did was create a common base class that lazy loads the Context in and implement the base class destruct-er to dispose of things. This works well for something like the MVC framework, but unfortunately leads to the problem of having to pass the context around to the various layers so the business objects can share the call.
In the end I went with something using Ninject to inject this dependency into each layer and had it track usage

While I'm not in favour of always creating, what must be, complicated objects each time I need them - I too have found that the DataContexts in Linq to Sql and the ObjectContexts in EF are best created when required.
Both of these perform a lot of static initialisation based on the model that you run them against, which is cached for subsequent calls, so you'll find that the initial startup for a context will be longer than all subsequent instantiations.
The biggest hurdle you face with this is the fact that once you obtain an entity from the context, you can't simply pass it back into another to perform update operations, or add related entities back in. In EF you can reattach an entity back to a new context. In L2S this process is nigh-on impossible.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.