We are running into some issues with EF Core on SQL databases in a Web API when trying to update complex objects that are provided by a client.
A detailed example: when receiving an object "Blog" with 1-n "Posts" from a client and trying to update this existing object in the database, should we:
1. Make sure the primary keys are set and just use dbContext.Update(blogFromClient), or
2. Load and track the blog including its posts from the database, then patch the changes from the client onto this object and use SaveChanges()?
When using approach (1) we ran into these issues:
- Existing posts for the blog in the database are not deleted when the client no longer sends them, so we have to figure them out manually and delete them.
- We get tracking issues ("... is already being tracked") if dependencies of the blog (for example a "User" as "Creator") are already in the ChangeTracker.
- We cannot unit test our business logic without a real DbContext when using a repository pattern (the tracking errors simply do not occur there).
- When using a real DbContext with the InMemory provider for tests, we cannot rely on things like foreign-key exceptions or computed columns.
When using approach (2):
- We can easily manage the updated relations and keep track of the object.
- It leads to a performance penalty because we load an object that we do not really need.
- We need to do a lot of manual mapping, as tools like AutoMapper cannot be used to automatically map objects with n-n relations while keeping EF Core's tracking correct (we get primary key errors, because some objects are removed from lists and re-added with the same primary key, which is not allowed since the primary key cannot be set on insert).
- n-n relations can easily be damaged by this: in the database there can be an n-n relation from blog to post, while the post holds the same relation back to the blog. If only one side of the relation (blog to post, but not post to blog, which is the same row in SQL) is posted and the other side is removed from its list, EF Core will track that entry as "deleted".
In vanilla SQL we would manage this by:
- deleting all existing relations from the blog to its posts
- updating the post itself
- creating all the new relations
In EF Core we cannot issue such statements (e.g. bulk-deleting the relations) without loading the relations first and then tracking each one of them in detail.
Is there any best practice for how to handle an update of complex objects with deep relations when the "new" data comes from a client?
The correct approach is #2: "Load and track the blog including its posts from the database, then patch the changes from the client onto this object and use SaveChanges()".
As to your concerns:
It leads to a performance penalty because we load an object that we do not really need.
You are incorrect in assuming you don't need this. You do in fact need it, because you absolutely shouldn't be posting every single property on every entity and related entity, including things that should not be changed, like audit properties. If you don't post every property, then you will end up nulling things out when you save. As such, the only correct path is to always load the full dataset from the database and then modify it based on what was posted. Doing it any other way will cause problems and is simply wrong.
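Sketched out, approach #2 looks something like this (BlogDto/PostDto and the property names here are just placeholders, not anything from your actual model):

// Load the tracked graph, then patch it with whatever the client posted.
var blog = dbContext.Blogs
    .Include(b => b.Posts)
    .Single(b => b.Id == blogDto.Id);

// Patch scalar properties on the root.
blog.Title = blogDto.Title;

// Remove posts the client no longer sends.
foreach (var post in blog.Posts.Where(p => blogDto.Posts.All(d => d.Id != p.Id)).ToList())
    blog.Posts.Remove(post);

// Update existing posts in place and add genuinely new ones.
foreach (var dto in blogDto.Posts)
{
    var post = blog.Posts.SingleOrDefault(p => p.Id == dto.Id);
    if (post == null)
        blog.Posts.Add(new Post { Title = dto.Title }); // key is generated on insert
    else
        post.Title = dto.Title;
}

dbContext.SaveChanges();

Because everything is tracked, EF works out the necessary inserts, updates, and deletes for the posts on its own.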
We need to do a lot of manual mapping, as tools like AutoMapper cannot be used to automatically map objects with n-n relations while keeping EF Core's tracking correct.
What you're describing here is a limitation of any automatic mapping. In order to map entity to entity in collections, the tool would have to somehow know what identifies each entity uniquely. That's usually going to be a PK, of course, but AutoMapper doesn't (and shouldn't) make assumptions about that. Instead, the default and naive behavior is to simply replace the collection on the destination with the collection on the source. To EF, though, that looks like you're deleting everything in the collection and then adding new items to the collection, which is the source of your issue.
There are two paths forward. First, you can simply ignore the collection properties on the source, and then map these manually. You can still use AutoMapper for the mapping, but you'd need to iterate over each item in the collection individually, matching it with the appropriate item it should map to, based on your knowledge of what identifies the entity (i.e. the part AutoMapper doesn't know).
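Roughly like this (a sketch only; it assumes the post's Id is what identifies it, and BlogDto/PostDto are placeholder types):

// In the mapping profile: ignore the collection so AutoMapper leaves it alone.
CreateMap<BlogDto, Blog>()
    .ForMember(b => b.Posts, opt => opt.Ignore());

// When applying the client data to the tracked entity:
mapper.Map(blogDto, blog); // scalars only, Posts ignored

foreach (var dto in blogDto.Posts)
{
    var existing = blog.Posts.SingleOrDefault(p => p.Id == dto.Id);
    if (existing != null)
        mapper.Map(dto, existing);             // update in place, tracking stays intact
    else
        blog.Posts.Add(mapper.Map<Post>(dto)); // genuinely new item
}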
Second, there's actually an additional library for AutoMapper to make this easier: AutoMapper.Collection. The entire point of this library is to provide the ability to tell AutoMapper how to identify your entities, so that it can then map collections correctly. If you utilize this library and add the additional necessary configuration, then you can map your entities as normal without worrying about collections getting messed up.
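With that package, the configuration looks roughly like this (the Id comparison is an assumption about what identifies a Post in your model):

// Requires the AutoMapper.Collection NuGet package.
var config = new MapperConfiguration(cfg =>
{
    cfg.AddCollectionMappers();
    cfg.CreateMap<PostDto, Post>()
       .EqualityComparison((dto, post) => dto.Id == post.Id);
    cfg.CreateMap<BlogDto, Blog>();
});

var mapper = config.CreateMapper();
mapper.Map(blogDto, blog); // collection items are now matched by Id instead of being replaced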
We are using Entity Framework Code First to save reports to a SQL database; many of the objects have many-to-many relations, so the data is split across different tables.
In order to prevent duplication of data, we first check whether a certain object is already saved, and only then add the relation to the database.
For example, we have an object Person that can have many countries, and an object Country that can hold multiple Person objects.
At the beginning of the save flow we query the database for existing countries and set them on the Person object if they exist, or create them if they don't.
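Simplified, that check-then-create step looks roughly like this (names shortened for the example, not our actual code):

// Look up the countries that already exist, reuse them, create the rest.
var existing = context.Countries
    .Where(c => countryNames.Contains(c.Name))
    .ToList();

person.Countries = countryNames
    .Select(name => existing.FirstOrDefault(c => c.Name == name)
                    ?? new Country { Name = name })
    .ToList();

context.SaveChanges();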
This flow worked fine while we had only one saving process at a time, but now we have a requirement to support many of them simultaneously, and my worry is that one thread will add a new country right after another thread has checked for existing countries.
I was wondering what good practices there are to solve this problem with minimal impact on performance.
Thanks!
It doesn't sound like you're fully leveraging the capabilities of your chosen ORM. Relationships are represented in the returned entities if you are using the library according to its documentation. So updating a single entity's many-to-many relationship would update it for all other related entities, as long as the EntityID remains the same.
If you are still having trouble trusting the integrity of this relationship, I would suggest using the bulk-update feature of Entity Framework.
As far as I understand, if I change the state of an entry in the context like this:
context.Entry(doc).State = EntityState.Added;
the whole object graph behind doc will be set to EntityState.Added. That is how this mechanism is described here:
Note that for all of these examples if the entity being added has
references to other entities that are not yet tracked then these new
entities will also be added to the context and will be inserted into
the database the next time that SaveChanges is called.
In my situation this behaviour is undesirable. When I receive the doc entity, its relations are already in the database (they were added in a different context), and adding them again will cause an error. I need to add doc to the database with all its references, but not try to add the other objects in the graph.
Of course, I can iterate through the whole graph and set the state explicitly, but is there an easier way?
In Entity Framework Core the behavior changed; calling:
context.Entry(asset).State = EntityState.Added;
will affect only the entity and not the related ones.
👉 I know the question is about classic Entity Framework (not Core), but surely more people using EF Core will end up here (like me) 😉
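So in EF Core, roughly (a sketch; it assumes the related entities already exist in the database and doc carries their keys):

// Marks only 'doc' as Added; its untracked related entities are left alone,
// so only the doc row (with its FK values) is inserted on SaveChanges().
context.Entry(doc).State = EntityState.Added;

// By contrast, context.Add(doc) walks the graph and would also mark the
// referenced, untracked entities as Added.
context.SaveChanges();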
You may have a look at GraphDiff.
According to this dedicated blog entry, it seems to fit your needs:
Say you have a Company which has many Contacts. A contact is not
defined on its own and is a One-To-Many (with required parent) record
of a Company. i.e. The company is the Aggregate Root. Assume you have
a detached Company graph with its Contacts attached and want to
reflect the state of this graph in the database.
At present using the Entity Framework you will need to perform the
updates of the contacts manually, check if each contact is new and
add, check if updated and edit, check if removed then delete it from
the database. Once you have to do this for a few different aggregates
in a large system you start to realize there must be a better, more
generic way.
Well good news is that after a few refactorings I've found a nice solution to this problem.
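Usage is roughly as follows (a sketch; Company/Contacts are the example aggregate from the blog post):

// GraphDiff (RefactorThis.GraphDiff): diff the detached graph against the
// database state of the aggregate and apply the differences.
using (var context = new MyContext())
{
    // Contacts are owned by the Company aggregate, so GraphDiff will
    // add, update, or delete them to match the detached graph.
    context.UpdateGraph(company, map => map.OwnedCollection(c => c.Contacts));
    context.SaveChanges();
}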
We are running into various StackOverflowExceptions and OutOfMemoryExceptions when sending certain entities to the server as part of an InvokeServerMethod call. The problem seems to come up because DevForce ends up trying to serialize a ton more data than we expect it to. I tracked it down to data that is stored in the OriginalValuesMap.
The original values are for DataEntityProperties that we've added to the entity but that aren't marked with [DataMember] so they normally don't get sent to the server. But if we have an existing (previously saved entity) and then change one of those properties, the initial value of the property does end up getting serialized as part of the OriginalValuesMap. This is causing us big problems because it turns out the original value is an entity that has a huge entity graph.
Adding to the problem, the entities we are dealing with are actually clones (via ((ICloneable)origEntity).Clone()) of existing (previously saved) entities so they have a state of detached and I haven't found a way to clear the OriginalValuesMap for detached entities. Usually I'd do myEntity.EntityAspect.AcceptChanges() but that doesn't do anything for detached entities. I couldn't find any other easy way to do this.
So far, the only way I've found to clear the original values is to attach the entity to an Entity Manager. This ends up clearing the original values but it is a major pain because I'm actually dealing with a large number of entities (so performance is a concern) and many of these entities don't have unique primary key values (in fact, they don't have any key values filled in because they are just 'in memory' objects that I don't plan to actually ever save) so I need to do extra work to avoid 'duplicate key exception' errors when adding them to an entity manager.
Is there some other way I can clear the original values for a detached entity? Or should detached entities even be tracking original values in the first place if things like AcceptChanges don't even work for detached entities? Or maybe a cloned entity shouldn't 'inherit' the original values of its source? I don't really have a strong opinion on either of these possibilities...I just want to be able to serialize my entities.
Our app is a Silverlight client running DevForce 2012 v7.2.4.0
Before diving into the correct behavior for detached entities, I'd like to back up and verify that it really is the OriginalValuesMap which is causing the exception. The contents of the OriginalValuesMap should follow the usual rules for the DataContractSerializer, so I'd think that non-DataMember items would not be serialized. Can you try serializing one of these problem entities to a text file to send to IdeaBlade support? You can use SerializationFns.Save(entity, filename, null, false) to quickly serialize an item. If it does look like the OriginalValuesMap contains things it shouldn't, I'll also need the type definition(s) involved.
I have a table that is used throughout an app by Entity Framework. I have a view that returns an identical column set, but is actually a union on itself to work around some bad normalization (the app is large and partially out of my hands; this part is unavoidable).
Is it possible to have Entity Framework 4 treat a view that is exactly like a table as the same type, so that I can use this view to populate a collection of that type? This question seems to indicate it is possible in NHibernate, but I can't find anything like it for Entity Framework. It would be an extra bonus if the navigation properties could still be used with Include(), but this is not necessary (I can always join manually).
Since EF works on mappings from objects to database entities, this is not directly possible. What you would need is something like changing the queried database entity dynamically, and AFAIK this is not possible without manually changing the object context.
For sure the EF runtime won't care, as long as it can treat the view as if it were a completely separate table. The two possible challenges that I foresee are:
Tooling: our wizard does allow you to select views when doing reverse engineering (i.e. database-first). Definitely, if you can use 'Code First against an existing database' you can just pretend that the view is a table (see the sketch after this list), but you won't get any help scripting the database creation or migrations.
Updates: in general you can perform updates on a view by setting up stored procedure mapping (which is available in the EF Designer from v1, and in Code First starting with EF6). You might also be able to make your view directly updatable, or use INSTEAD OF triggers (see "Updatable Views" here for more details). If I remember correctly, the SQL generated by EF to retrieve database-generated values (e.g. for identity columns) is in some cases not compatible with INSTEAD OF triggers. Yet another alternative is to have your application treat the view as read-only and perform all updates through the actual table, which you would map as a separate entity. Keep in mind that in-memory entities for the view and for the original table will not be kept in sync.
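For the Code First route mentioned above, the mapping would be roughly this (entity and view names are made up):

// Map the entity to the view as if it were a table.
public class ReportingContext : DbContext
{
    public DbSet<Report> Reports { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Queries work as usual; inserts/updates only work if the view is
        // updatable or has INSTEAD OF triggers, as discussed above.
        modelBuilder.Entity<Report>().ToTable("vw_Reports");
    }
}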
Hope this helps!
Question: what is a good best-practice approach to saving/keeping in sync an in-memory graph of objects with the database?
Background:
Say I have the classes Node and Relationship, and the application builds up a graph of related objects using these classes. There might be 1000 nodes with various relationships between them. The application needs to query the structure, hence an in-memory approach is no doubt good for performance (e.g. traversing the graph from Node X to find the root parents).
The graph does need to be persisted however into a database with tables NODES and RELATIONSHIPS.
Therefore, what is a good best-practice approach to saving/keeping in sync an in-memory graph of objects with the database?
Ideal requirements would include:
build up changes in-memory and then 'save' afterwards (mandatory)
when saving, apply updates to database in correct order to avoid hitting any database constraints (mandatory)
keep the persistence mechanism separate from the model, for ease of changing the persistence layer if needed, e.g. don't just wrap an ADO.NET DataRow in the Node and Relationship classes (desirable)
mechanism for doing optimistic locking (desirable)
Or is the overhead of all this just not worth it for a smallish application, and should I simply hit the database each time for everything (assuming the response times are acceptable)? I would still like to avoid that if it doesn't cost too much extra, in order to remain somewhat scalable performance-wise.
I'm using the self-tracking entities in Entity Framework 4. After the entities are loaded into memory, StartTracking() MUST be called on every entity. Then you can modify your entity graph in memory without any DB operations. When you're done with the modifications, you call the context extension method ApplyChanges(rootOfEntityGraph) and then SaveChanges(), and your modifications are persisted. Afterwards you have to start the tracking again on every entity in the graph. Two hints/ideas I'm using at the moment:
1.) call StartTracking() at the beginning on every entity
I'm using an interface IWorkspace to abstract the ObjectContext (this simplifies testing; see the open-source implementation bbv.DomainDrivenDesign on SourceForge). They also use a QueryableContext. So I created a further concrete Workspace and QueryableContext implementation and intercept the loading process with my own IEnumerable implementation. When the workspace's consumer executes the query obtained via CreateQuery(), my intercepting IEnumerable object registers an event handler on the context's ChangeTracker. In this event handler I call StartTracking() for every entity loaded and added into the context (this doesn't work if you load the objects with NoTracking, because in that case the objects aren't added to the context and the event handler will not be fired). After the enumeration in the self-made iterator, the event handler on the ObjectStateManager is deregistered.
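In essence, the interception does something like this (a rough sketch; StartTracking() is the extension method generated by the STE template):

// Start tracking as soon as the materialized entities are attached to the context.
CollectionChangeEventHandler handler = (sender, e) =>
{
    var tracked = e.Element as IObjectWithChangeTracker;
    if (e.Action == CollectionChangeAction.Add && tracked != null)
        tracked.StartTracking();
};

context.ObjectStateManager.ObjectStateManagerChanged += handler;
var results = query.ToList(); // enumeration materializes and attaches the entities
context.ObjectStateManager.ObjectStateManagerChanged -= handler;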
2.) call StartTracking() after ApplyChanges()/SaveChanges()
In the workspace implementation, I ask the context's ObjectStateManager for the modified entities, i.e.:
var addedEntities = this.context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
--> analogous for modified entities
I cast them to IObjectWithChangeTracker and call the AcceptChanges() method on the entity itself. This starts the object's change tracker again.
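Put together, that part looks roughly like this (AcceptChanges() being the extension method from the STE template):

var changedEntries = this.context.ObjectStateManager
    .GetObjectStateEntries(EntityState.Added | EntityState.Modified);

foreach (var entry in changedEntries)
{
    var tracked = entry.Entity as IObjectWithChangeTracker;
    if (tracked != null)
        tracked.AcceptChanges(); // resets and restarts the entity's own change tracker
}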
For my project I have the same mandatory points as you. I played around with EF 3.5 and didn't find a satisfactory solution, but the new self-tracking entities in EF 4 seem to fit my requirements (as far as I have explored the functionality).
If you're interested, I'll send you my "spike"-project.
Does anyone have an alternative solution? My project is a server application which holds objects in memory for fast operations, while modifications should also be persisted (without a round trip to the DB each time). At some points in the code, object graphs are marked as deleted/terminated and are removed from the in-memory container. With the solution explained above I can reuse the generated model from EF and don't have to code and wrap all the objects myself again. The generated code for the self-tracking entities comes from T4 templates, which can be adapted very easily.
Thanks a lot for any other ideas/criticism.
The short answer is that you can still keep a graph (a collection of linked objects) in memory and write the changes to the database as they occur. If this is taking too long, you could put the changes onto a message queue (but that is probably overkill) or execute the updates and inserts on a separate thread.