Difference between DetectChanges and ChangeTracking in Entity Framework - C#

I am reading this topic on MSDN.
Can anyone please explain the difference between these two, as mentioned there:
DetectChanges is used to detect changes in the DbContext and related entities,
and
ChangeTracking is also used to detect changes in entities, as mentioned in this link.
Please explain the actual difference between these two.

So EF needs to detect changes you make to the context, like adding/modifying/removing entities. Entities might be plain POCO classes and so have no embedded behaviour to track changes to their properties. So EF makes a snapshot of the entities it receives from the database and later compares this snapshot with the actual state of the context. Moreover, EF must track relationships between objects in the context and keep them synchronized. All of this is done by a method called DetectChanges(). It is called at various moments, most importantly when you call SaveChanges, but also when you add/remove/attach entities to the context and so on.
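As a rough illustration (a sketch using the EF6-style API; BlogContext and Blog are assumed names, not from the question):

using (var context = new BlogContext())
{
    var blog = context.Blogs.First();
    blog.Title = "Changed"; // a plain POCO property set - nothing is notified

    // snapshot comparison happens here (it would also happen implicitly
    // inside SaveChanges) and marks the entity as Modified
    context.ChangeTracker.DetectChanges();
    var state = context.Entry(blog).State; // EntityState.Modified

    // for bulk operations the automatic calls can be switched off
    context.Configuration.AutoDetectChangesEnabled = false;
}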
If you design your entity classes in a special way (all properties virtual, collections exposed as ICollection<T>, etc.), you can use automatic change tracking. EF will create special proxy classes inherited from your entity classes and use them to detect changes to your entity properties immediately. Note that DetectChanges is still used in this case, exactly as described above, but it performs less work, since most of the changes have already been detected right when they happened.
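For example, a proxy-friendly entity could look like this (a sketch; EF6 change-tracking proxies require every property to be virtual):

public class Blog
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }

    // collections must be typed as ICollection<T> so the runtime proxy
    // can substitute a change-notifying implementation
    public virtual ICollection<Post> Posts { get; set; }
}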
Summary: DetectChanges is a method that performs snapshot-based change detection (and more) and is one of the mechanisms Entity Framework uses to track (detect) changes to the context. Read more about DetectChanges here: http://blog.oneunicorn.com/2012/03/10/secrets-of-detectchanges-part-1-what-does-detectchanges-do/

Related

Do I need entity data model in OnModelCreating for runtime DB context?

So, I'm using EF DbContexts in the following way. One AppDbContext is an abstract class derived from DbContext; it contains my DbSets and is used as the type to inject into services. Its "implementation" is something like AppDbContextMySql, which is self-explanatory: it's derived from AppDbContext and handles the connection to the actual DB. There can be several such abstract/implementation pairs to separate the data tables, but usually all of them point to the same actual DB instance.
Then I need to migrate it all, so I add a MigrationDbContext implementing all the DbSets and all the entity configurations needed, namely composite primary keys, which can only be configured in an OnModelCreating override.
The question is: if I already have the data model configuration in MigrationDbContext, have applied the migration successfully to the DB, and it's the DB's job to handle keys and indexes anyway, do I need to have the model configuration in the actually consumed AppDbContext or AppDbContextMySql? In other words, is the model only used to generate migration scripts, or is it also needed at runtime to help EF handle the data in any way?
The short answer is yes, the model is definitely needed in all cases.
Generating migrations from it is just one of its possible usages, and certainly not the primary one - migrations are an optional feature that may not be used at all.
The primary purpose of the model is to provide the mappings between the data (objects, properties, navigations) and the storage (database tables, columns and relationships). It is basically the M part of the ORM (Object Relational Mapper) that EF Core is.
It controls all EF Core runtime behaviors - LINQ queries, change tracking, CUD operations, etc.: what the associated table/column name is, which property/column is the PK/FK, what the cascade delete behavior is, what the cardinality of the relationship is, and many others.
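For instance, the composite-key configuration mentioned in the question must be visible to whatever context is used at runtime (OrderLine and its key are illustrative names, not from the question):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // without this mapping, EF Core can neither translate LINQ queries
    // against OrderLine nor track its instances correctly - at runtime,
    // not just when generating migrations
    modelBuilder.Entity<OrderLine>()
        .HasKey(l => new { l.OrderId, l.LineNumber });
}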
The way you asked the question makes me think you are assuming that the model is used just for migrations, while at runtime EF Core takes that metadata from the actual database. That's not the case. For EF Core, the database is whatever you told it in the model configuration. Whether the physical database is in sync (correctly mapped) is not checked at all. That's why people start getting errors about non-existing columns in their queries when they screw up some fluent configuration or, more importantly, omit it.
That is the main drawback of using separate (a.k.a. "bounded") contexts for a single database. Having a DbSet<T> property in the derived context is not the only way an entity gets included in the model. In fact, typed DbSet properties are just for convenience and are nothing more than shortcuts to the DbContext.Set<T>() method. A common (and somewhat hidden) way an entity gets included in the model is when it is referenced (either directly or via a collection) by another, already included entity. And all that applies recursively.
And once an entity is included in the model, it needs the associated fluent configuration regardless of the concrete context class. The same holds for the referenced entities, their references, and so on.
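A minimal sketch of that inclusion mechanism (type names are illustrative):

public class BoundedContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; } // only Blog is declared here...
}

public class Blog
{
    public int Id { get; set; }

    // ...but Post is pulled into the model through this navigation, so
    // Post (and anything Post references) also needs its fluent
    // configuration in this context
    public ICollection<Post> Posts { get; set; }
}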
So I don't really see the benefits of "bounded" context classes - they probably work for simple self-contained object sets with no relations to others (and vice versa), but they can easily be broken by this automatic entity inclusion mechanism.
For reference, see Including types in the model and Creating and configuring a model.

EF Core best practice to update complex objects

We are running into some issues with EF Core on SQL databases in a Web API when trying to update complex objects in the database with data provided by a client.
A detailed example: when receiving an object "Blog" with 1-n "Posts" from a client and trying to update this existing object in the database, should we:
1. Make sure the primary keys are set and just use
dbContext.Update(blogFromClient)
2. Load and track the blog while including the posts from the database, then patch the changes from the client onto this object and use SaveChanges()
When using approach (1) we ran into these issues:
- Existing posts of the existing blog in the database are not deleted when the client does not post them any more; we need to figure them out and delete them manually
- We get tracking issues ("is already being tracked") if dependencies of the blog (for example a "User" as "Creator") are already in the ChangeTracker
- We cannot unit test our business logic without using a real DbContext when using a repository pattern (the tracking errors simply do not occur there)
- When using a real DbContext with an in-memory database for tests, we cannot rely on things like foreign-key exceptions or computed columns
When using approach (2):
- We can easily manage updated relations and keep easy track of the object
- It leads to a performance penalty because we load an object which we do not really need
- We need to map many things manually, as tools like AutoMapper cannot be used to automatically map objects with n-n relations while keeping correct tracking by EF Core (we get primary key errors, as some objects are deleted from lists and added again with the same primary key, which is not allowed since the primary key cannot be set on insert)
- n-n relations can easily be damaged by this: in the database there could be an n-n relation from blog to post, while the post holds the same relation back to its blogs. If only one side of the relation (blog to post, but not post to blog - which is the same row in SQL) is posted and the other side is deleted from the list, EF Core will track that entry as "deleted".
In vanilla SQL we would manage this by:
1. deleting all existing relations from the blog to its posts
2. updating the post itself
3. creating all the new relations
In EF Core we cannot write such statements: there is no way to bulk-delete relations without loading them first and then keeping detailed track of each one.
Is there any best practice for handling an update of complex objects with deep relations while getting the "new" data from a client?
The correct approach is #2: "Load and track the blog while including the posts from the database, then patch the changes from the client onto this object and use SaveChanges()".
As to your concerns:
It leads to a performance penalty because we load an object which we do not really need
You are incorrect in assuming you don't need this. You do in fact need it, because you absolutely shouldn't be posting every single property of every single entity and related entity, including things that should not be changed, like audit properties and such. If you don't post every property, then you will end up nulling things out when you save. As such, the only correct path is to always load the full dataset from the database and then modify it based on what was posted. Doing it any other way will cause problems and is totally and completely 100% wrong.
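A minimal sketch of approach #2, assuming a BlogDto posted by the client (all names here are illustrative, not from the question):

var blog = await dbContext.Blogs
    .Include(b => b.Posts)
    .SingleAsync(b => b.Id == dto.Id);

blog.Title = dto.Title; // patch only the properties the client may change

// remove posts the client no longer sends
var sentIds = dto.Posts.Select(p => p.Id).ToHashSet();
foreach (var post in blog.Posts.Where(p => !sentIds.Contains(p.Id)).ToList())
    dbContext.Remove(post);

// update existing posts, add new ones
foreach (var postDto in dto.Posts)
{
    var post = blog.Posts.FirstOrDefault(p => p.Id == postDto.Id);
    if (post == null)
        blog.Posts.Add(new Post { Title = postDto.Title });
    else
        post.Title = postDto.Title;
}

await dbContext.SaveChangesAsync();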
We need to map many things manually, as tools like AutoMapper cannot be used to automatically map objects with n-n relations while keeping correct tracking by EF Core
What you're describing here is a limitation of any automatic mapping. In order to map entity to entity in collections, the tool would have to somehow know what identifies each entity uniquely. That's usually going to be a PK, of course, but AutoMapper doesn't (and shouldn't) make assumptions about that. Instead, the default and naive behavior is to simply replace the collection on the destination with the collection on the source. To EF, though, that looks like you're deleting everything in the collection and then adding new items to the collection, which is the source of your issue.
There are two paths forward. First, you can simply ignore the collection props on the source and then map these manually. You can still use AutoMapper for the mapping, but you'd need to iterate over each item in the collection individually, matching it with the appropriate item it should map to, based on your knowledge of what identifies the entity (i.e. the part AutoMapper doesn't know).
Second, there's actually an additional library for AutoMapper to make this easier: AutoMapper.Collection. The entire point of this library is to provide the ability to tell AutoMapper how to identify your entities, so that it can then map collections correctly. If you utilize this library and add the additional necessary configuration, then you can map your entities as normal without worrying about collections getting messed up.
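A sketch of that configuration (assuming the AutoMapper.Collection package; the DTO/entity names are illustrative):

var config = new MapperConfiguration(cfg =>
{
    cfg.AddCollectionMappers();
    cfg.CreateMap<PostDto, Post>()
       .EqualityComparison((dto, post) => dto.Id == post.Id);
    cfg.CreateMap<BlogDto, Blog>();
});
var mapper = config.CreateMapper();

// posts matched by Id are updated in place, unmatched destination posts
// are removed, and unmatched source posts are added - which is what
// EF Core's change tracker expects
mapper.Map(blogDto, trackedBlog);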

Referential equality of entity attached to multiple DbContexts?

When receiving an entity from (so it is attached to) DbContext dbContext1, and then manually attaching this entity to DbContext dbContext2, will every entity (representing the same database object) received from either of these two contexts in the future (e.g. as a query result) be reference-equal to the first-mentioned object?
I know that entities for the same database object are always reference-equal within the scope of one single DbContext, but can this be extended by attaching an entity to multiple DbContexts?
Background: I have an ASP.NET Core Web API application which uses not only the DbContexts coming with each per-request scope, but also a long-living scope for background tasks with its own DbContext. I want to know whether entities originating in the request scope can be attached to the long-living scope, so that changes made to these entities within requests are also seen by the background tasks without having to refresh those entities separately.
No. EF does some tricks in the context to ensure that you always get the same instance back for a particular entity (basically, it keeps an object cache and doesn't create a new instance unless it can't find an existing instance of that entity in the cache). However, this is per context. If you query entity X from context 1 and then query the same entity X from context 2 and compare them like x1 == x2, it will return false.
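To illustrate (AppDbContext and Blog are assumed names):

using var ctx1 = new AppDbContext();
using var ctx2 = new AppDbContext();

var a1 = ctx1.Blogs.Single(b => b.Id == 1);
var a2 = ctx1.Blogs.Single(b => b.Id == 1);
var b1 = ctx2.Blogs.Single(b => b.Id == 1);

Console.WriteLine(ReferenceEquals(a1, a2)); // true  - same context
Console.WriteLine(ReferenceEquals(a1, b1)); // false - different contexts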
Your only real option if you need equality across contexts is to override Equals, GetHashCode and the == operator on your entity class. However, doing so is not trivial, and it's very easy to make a mistake that will make the equality check either overly greedy or overly strict, causing errant true/false results. It's doable, for sure; just do your research first and make sure you know what you're doing. Microsoft has documentation to at least get you started.
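As a starting point only (a sketch of PK-based equality; a real implementation also needs the == and != operators and the careful null handling the documentation describes):

public class Blog
{
    public int Id { get; set; }

    // treat unsaved entities (Id == 0) as never equal to anything else
    public override bool Equals(object obj) =>
        obj is Blog other && Id != 0 && other.Id == Id;

    // note: Id is assigned by the database on insert, so hashing on it
    // makes unsaved entities dangerous to keep in hash-based collections
    public override int GetHashCode() => Id.GetHashCode();
}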
However, the simpler and more robust solution here is to simply leave the two context scopes disconnected. Do what you need to do in the request scope and then simply ensure that your context in singleton scope always reloads from the database before doing anything. The documentation on handling concurrency conflicts should give you a good idea of what you need to do, as this is technically a form of concurrency.

How to add an entity without its related entities, but save the relations?

As far as I understand, if I change the state of an entry in the context like this:
context.Entry(doc).State = EntityState.Added;
the whole object graph behind doc will be set to EntityState.Added. That is how this mechanism is described here:
Note that for all of these examples, if the entity being added has references to other entities that are not yet tracked, then these new entities will also be added to the context and will be inserted into the database the next time that SaveChanges is called.
In my situation this behaviour is undesirable. When I receive the doc entity, its relations are already in the database (they were added in a different context), and adding them again will cause an error. I need to add doc to the database with all its references, but not try to add the other objects in the graph.
Of course, I can iterate through the whole graph and set the state explicitly, but does an easier way exist?
In Entity Framework Core the behavior changed: calling
context.Entry(asset).State = EntityState.Added;
will affect only the entity itself, not the related ones.
👉 I know the question is about Entity Framework classic (not Core), but surely more people using EF Core will reach here (like me) 😉
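For EF6 itself, one commonly used workaround (a sketch, not taken from the answers above) is to attach the graph first, so that everything starts out Unchanged, and only then flip the root's state:

// Attach() puts doc and all its reachable, untracked related entities
// into the context as Unchanged; setting the state afterwards affects
// only the root, so the related entities are not re-inserted
context.Docs.Attach(doc);
context.Entry(doc).State = EntityState.Added;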
You may have a look at GraphDiff
According to this dedicated blog entry, it seems to fit your needs:
Say you have a Company which has many Contacts. A contact is not defined on its own and is a One-To-Many (with required parent) record of a Company, i.e. the Company is the aggregate root. Assume you have a detached Company graph with its Contacts attached and want to reflect the state of this graph in the database.
At present, using the Entity Framework, you will need to perform the updates of the contacts manually: check if each contact is new and add it, check if it is updated and edit it, check if it is removed and then delete it from the database. Once you have to do this for a few different aggregates in a large system, you start to realize there must be a better, more generic way.
Well good news is that after a few refactorings I've found a nice solution to this problem.

How can I save/keep in sync an in-memory graph of objects with the database?

Question - what is a good best-practice approach to saving/keeping in sync an in-memory graph of objects with the database?
Background:
That is to say, I have the classes Node and Relationship, and the application builds up a graph of related objects using these classes. There might be 1000 nodes with various relationships between them. The application needs to query the structure, hence an in-memory approach is no doubt good for performance (e.g. traversing the graph from Node X to find the root parents).
The graph does, however, need to be persisted into a database with tables NODES and RELATIONSHIPS.
Therefore, what is a good best-practice approach to saving/keeping in sync an in-memory graph of objects with the database?
Ideal requirements would include:
build up changes in-memory and then 'save' afterwards (mandatory)
when saving, apply updates to database in correct order to avoid hitting any database constraints (mandatory)
keep persistence mechanism separate from model, for ease in changing persistence layer if needed, e.g. don't just wrap an ADO.net DataRow in the Node and Relationship classes (desirable)
mechanism for doing optimistic locking (desirable)
Or is the overhead of all this for a smallish application just not worth it, and should I just hit the database each time for everything (assuming the response times are acceptable)? [I would still like to avoid that if it doesn't add too much overhead, to remain somewhat scalable performance-wise.]
I'm using the self-tracking entities in Entity Framework 4. After the entities are loaded into memory, StartTracking() MUST be called on every entity. Then you can modify your entity graph in memory without any DB operations. When you're done with the modifications, you call the context extension method ApplyChanges(rootOfEntityGraph) and SaveChanges(), and your modifications are persisted. After that you have to start the tracking again on every entity in the graph. Two hints/ideas I'm using at the moment:
1.) call StartTracking() at the beginning on every entity
I'm using an interface IWorkspace to abstract the ObjectContext (this simplifies testing -> see the open-source implementation bbv.DomainDrivenDesign at SourceForge). They also use a QueryableContext. So I created a further concrete Workspace and QueryableContext implementation and intercept the loading process with my own IEnumerable implementation. When the workspace's consumer executes the query obtained via CreateQuery(), my intercepting IEnumerable object registers an event handler on the context's ChangeTracker. In this event handler I call StartTracking() for every entity loaded and added into the context (this doesn't work if you load the objects with NoTracking, because in that case the objects aren't added to the context and the event handler will not be fired). After the enumeration in the self-made iterator, the event handler on the ObjectStateManager is deregistered.
2.) call StartTracking() after ApplyChanges()/SaveChanges()
In the workspace implementation, I ask the context's ObjectStateManager for the modified entities, e.g.:
var addedEntities = this.context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
--> analogously for modified entities
Then I cast them to IObjectWithChangeTracker and call the AcceptChanges() method on the entity itself. This restarts the object's change tracker.
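Put together, hint 2 looks roughly like this (a sketch assuming the EF4 T4-generated self-tracking entity types):

var changed = this.context.ObjectStateManager
    .GetObjectStateEntries(EntityState.Added | EntityState.Modified);

foreach (var entry in changed)
{
    // AcceptChanges() resets the entity's ChangeTracker and starts
    // tracking again
    ((IObjectWithChangeTracker)entry.Entity).AcceptChanges();
}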
For my project I have the same mandatory points as you. I played around with EF 3.5 and didn't find a satisfactory solution. But the new self-tracking entities in EF 4 seem to fit my requirements (as far as I have explored the functionality).
If you're interested, I'll send you my "spike" project.
Does anyone have an alternative solution? My project is a server application which holds objects in memory for fast operations, while modifications should also be persisted (without a round trip to the DB). At some points in the code, object graphs are marked as deleted/terminated and are removed from the in-memory container. With the solution explained above I can reuse the generated model from EF and don't have to code and wrap all the objects myself again. The generated code for the self-tracking entities comes from T4 templates, which can be adapted very easily.
Thanks a lot for any other ideas/criticism.
The short answer is that you can still keep a graph (a collection of linked objects) of the objects in memory and write the changes to the database as they occur. If this takes too long, you could put the changes onto a message queue (but that is probably overkill) or execute the updates and inserts on a separate thread.
