We are using Entity Framework Code First to save reports to a SQL database. Many of the objects have many-to-many relations, so the data is split across different tables.
To prevent duplicate data, we first check whether a given object is already saved, and only later add the relation to the database.
For example, we have a Person object that can have many countries, and a Country object that can hold multiple Person objects.
At the beginning of the save flow, we query the database for existing countries and, if they exist, update the Person object to reference them; if they don't, we create them.
This flow worked fine while we had only one saving process at a time, but now we have a requirement to support many simultaneous saves, and my worry is that one thread will add a new country right after another thread has checked for existing countries.
I was wondering what good practices exist to solve this problem with minimal impact on performance.
Thanks!
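The worry above is a classic check-then-act race: two saves can both see "no such country" and both insert it. A minimal in-process sketch of the alternative, an atomic get-or-create, uses `ConcurrentDictionary.GetOrAdd` as a stand-in for the Country table (an assumption for illustration only; the dictionary and names are hypothetical). Across processes, the same atomicity would typically have to come from the database itself, for example a unique index on the country name, with a re-query when an insert violates it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the Country table: name -> id.
var countries = new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);
var nextId = 0;

// Atomic get-or-create: every caller gets the single value stored for the key,
// even when threads race. Under contention the factory may run more than once
// (so ids can have gaps), but only one result is kept and returned to everyone.
int GetOrCreateCountry(string name) =>
    countries.GetOrAdd(name, _ => Interlocked.Increment(ref nextId));

// Simulate many concurrent "save report" flows referencing the same countries.
var names = new[] { "France", "Japan", "france", "Brazil", "JAPAN" };
var ids = await Task.WhenAll(
    Enumerable.Range(0, 100).Select(i => Task.Run(() => GetOrCreateCountry(names[i % names.Length]))));

Console.WriteLine($"Distinct countries stored: {countries.Count}");
```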
It doesn't sound like you're fully leveraging the capabilities of your chosen ORM. If you use the library according to its documentation, relationships are represented in the entities it returns, so updating a single entity's many-to-many relationship updates it for all other related entities, as long as the entity IDs stay the same.
If you still have trouble trusting the integrity of this relationship, I would suggest using Entity Framework's bulk-update feature.
We are running into some issues with EF Core on SQL databases in a Web API when trying to update complex objects, provided by a client, in the database.
A detailed example: when receiving a "Blog" object with 1-n "Posts" from a client and trying to update this existing object in the database, should we:
(1) Make sure the primary keys are set and just use
dbContext.Update(blogFromClient)
(2) Load and track the blog, including its posts, from the database, then patch the changes from the client onto this object and call SaveChanges()
When using approach (1), we ran into these issues:
- Existing posts of the blog in the database are not deleted when the client no longer sends them, so we have to figure them out and delete them manually
- We get tracking errors ("is already being tracked") if dependencies of the blog (for example a "User" as "Creator") are already in the ChangeTracker
- We cannot unit test our business logic without a real DbContext when using a repository pattern (with a mocked repository, the tracking errors simply never occur)
- When using a real DbContext with the InMemory database for tests, we cannot rely on things like foreign-key exceptions or computed columns
When using approach (2):
- We can easily manage updated relations and keep easy track of the object
- It leads to a performance penalty because of loading the object, which we do not really need
- We need to map many things manually, as tools like AutoMapper cannot automatically map objects with n-n relations while keeping correct tracking in EF Core (we get primary key errors, because some objects are removed from lists and added again with the same primary key, which is not allowed since the primary key cannot be set on insert)
- n-n relations can easily be damaged this way: in the database there could be an n-n relation from blog to post, while each post holds the same relation back to its blogs. If only one side of the relation is posted (blog to post, but not post to blog, which is the same row in SQL) and the other side is removed from its list, EF Core will track this entry as "deleted".
In vanilla SQL we would manage this by:
1. deleting all existing relations from the blog to its posts
2. updating the post itself
3. creating all the new relations
In EF Core we cannot write statements such as a bulk delete of relations without first loading them and then keeping detailed track of each relation.
Is there any best practice for handling an update of complex objects with deep relations when the "new" data comes from a client?
The correct approach is #2: "Load and track the blog while including the posts from database, then patch the changes from client onto this object and use SaveChanges()".
As to your concerns:
leads to a performance penalty because of loading the object which we do not really need
You are incorrect in assuming you don't need this. You do in fact need it, because you absolutely shouldn't be posting every single property on every single entity and related entity, including things that should not be changed, like audit properties. And if you don't post every property, you will end up nulling those values out when you save. As such, the only correct path is to always load the full dataset from the database and then modify it with what was posted. Doing it any other way will cause problems and is simply wrong.
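A minimal sketch of why loading first matters, with value tuples standing in for the loaded entity and the posted DTO (hypothetical names throughout; with EF the loaded object would come from the DbContext, and SaveChanges would persist the patched values). It contrasts a blind overwrite with load-then-patch.

```csharp
using System;

// Hypothetical entity as loaded from the database, including audit fields
// the client never sends.
var loaded = (Id: 7, Title: "Old title", Body: "Old body",
              CreatedBy: "alice", CreatedAt: new DateTime(2020, 1, 1));

// Hypothetical DTO posted by the client: only editable fields, and only the
// title was sent (null means "not posted").
var posted = (Title: "New title", Body: (string)null);

// Blind overwrite: what saving the posted data as-is amounts to.
// Tuples are value types, so this works on a copy of the loaded entity.
var overwritten = loaded;
overwritten.Title = posted.Title;
overwritten.Body = posted.Body;          // Body is nulled out

// Load-then-patch: copy only what was actually posted onto the loaded entity;
// unposted and non-editable fields keep their database values.
if (posted.Title != null) loaded.Title = posted.Title;
if (posted.Body != null) loaded.Body = posted.Body;

Console.WriteLine($"overwritten.Body is null: {overwritten.Body == null}; patched: {loaded.Title} / {loaded.Body} / {loaded.CreatedBy}");
```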
need to map many manual things as tools like AutoMapper cannot be used to automatically map objects with n-n relations while keeping a correct track by ef core
What you're describing here is a limitation of any automatic mapping. In order to map entity to entity in collections, the tool would have to somehow know what identifies each entity uniquely. That's usually going to be a PK, of course, but AutoMapper doesn't (and shouldn't) make assumptions about that. Instead, the default and naive behavior is to simply replace the collection on the destination with the collection on the source. To EF, though, that looks like you're deleting everything in the collection and then adding new items to the collection, which is the source of your issue.
There are two paths forward. First, you can simply ignore the collection props on the source and map them manually. You can still use AutoMapper for the mapping, but you'd need to iterate over each item in the collection, individually matching it with the item it should map to, based on your knowledge of what identifies the entity (i.e. the part AutoMapper doesn't know).
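A minimal sketch of that first path, matching items by primary key by hand. Plain dictionaries and tuples stand in for the tracked EF entities (an assumption; all names are hypothetical), and Id 0 marks a new post. With a real DbContext the three steps would be removing the entity from the tracked collection, setting properties on the tracked entity, and adding a new entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical posts currently tracked for the blog (loaded from the DB): Id -> Title.
var tracked = new Dictionary<int, string> { [1] = "First", [2] = "Second", [3] = "Third" };

// Hypothetical posts sent by the client; Id 0 marks a new post
// (assumption: keys are generated by the database).
var incoming = new List<(int Id, string Title)>
{
    (1, "First (edited)"), (3, "Third"), (0, "Brand new")
};

// 1. Tracked posts the client no longer sends are removed (EF: Deleted).
var incomingIds = incoming.Where(p => p.Id != 0).Select(p => p.Id).ToHashSet();
var removed = tracked.Keys.Where(id => !incomingIds.Contains(id)).ToList();
foreach (var id in removed) tracked.Remove(id);

var updated = new List<int>();
var added = new List<string>();
foreach (var post in incoming)
{
    if (post.Id == 0)
    {
        // 2. Posts without a key are new (EF: Added).
        added.Add(post.Title);
    }
    else if (tracked.TryGetValue(post.Id, out var currentTitle) && currentTitle != post.Title)
    {
        // 3. Matching posts are updated in place, keyed by primary key, so EF sees
        //    Modified instead of a delete plus a re-insert with the same key.
        tracked[post.Id] = post.Title;
        updated.Add(post.Id);
    }
    // Anything else (unchanged posts, or ids not belonging to this blog) is left
    // alone here; real code should reject unknown ids.
}

Console.WriteLine($"removed: {string.Join(",", removed)}; updated: {string.Join(",", updated)}; added: {string.Join(",", added)}");
```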
Second, there's actually an additional library for AutoMapper to make this easier: AutoMapper.Collection. The entire point of this library is to provide the ability to tell AutoMapper how to identify your entities, so that it can then map collections correctly. If you utilize this library and add the additional necessary configuration, then you can map your entities as normal without worrying about collections getting messed up.
I'm having a hard time just defining my situation so please be patient. Either I have a situation that no one blogs about, or I've created a problem in my mind by lack of understanding the concepts.
I have a database which is something of a mess, and the DB owner wants to keep it that way. By mess I mean it is not normalized, and no relationships are defined even though they do exist...
I want to use EF, and I want to optimize my code by reducing database calls.
As a simplified example, I have two tables with no relationships defined, like so:
Table: Human
HumanId, HumanName, FavoriteFoodId, LeastFavoriteFoodId, LastFoodEatenId
Table: Food
FoodId, FoodName, FoodProperty1, FoodProperty2
I want to write a single EF database call that will return a human and a full object for each related food item.
First, is it possible to do this?
Second, how?
Boring background information: a super SQL developer has written a query that returns 21 tables in 20 milliseconds, containing a total of 1401 columns. This is being turned into an XML document for our front-end developer to bind to. I want to change our technique to use objects, and thus reduce the amount of hand coding and mapping from fields to XML (not to mention the handling of nulls vs. empty strings, etc.), and create a type-safe, compile-time environment. Unfortunately we are not allowed to change the database or add relationships...
If I understand you correctly, it's better for you to use the Entity Framework Code First approach:
You can define your objects (entities) Human and Food
Make relations between them in code even if they don't have foreign keys in DB
Query them using LINQ to Entities
And yes, you can select all related information in one call.
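A minimal sketch of that "one call" query, written against in-memory lists so it runs on its own (an assumption: the real code would query the Human and Food DbSets of a code-first model; the rows are hypothetical). It joins Food three times, once per food column, without any foreign key in the database. Against EF, a LINQ join like this is translated into a single SQL statement; note that inner joins drop humans whose food ids have no match, so nullable columns would need a left join instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Food and Human tables.
var foods = new List<(int FoodId, string FoodName)>
{
    (1, "Pizza"), (2, "Broccoli"), (3, "Sushi")
};
var humans = new List<(int HumanId, string HumanName, int FavoriteFoodId, int LeastFavoriteFoodId, int LastFoodEatenId)>
{
    (10, "Ann", 1, 2, 3),
    (11, "Bob", 3, 1, 2)
};

// One query: join Food once per food column on Human and return the
// human together with a full object for each related food.
var result =
    (from h in humans
     join fav in foods on h.FavoriteFoodId equals fav.FoodId
     join least in foods on h.LeastFavoriteFoodId equals least.FoodId
     join last in foods on h.LastFoodEatenId equals last.FoodId
     where h.HumanId == 10
     select new { h.HumanName, Favorite = fav, LeastFavorite = least, LastEaten = last })
    .Single();

Console.WriteLine($"{result.HumanName}: likes {result.Favorite.FoodName}, dislikes {result.LeastFavorite.FoodName}, last ate {result.LastEaten.FoodName}");
```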
You can define the relationships in code with Entity Framework using the Fluent API. In your case you might be able to define your entities manually, or use a tool to reverse engineer your EF model from an existing database. There is some support for this built into Visual Studio, and there are VS extensions like EF Power Tools that offer this capability.
As for making a single call to the database with EF, you would probably need to create a stored procedure or a view that returns all of the information you need. Using the standard setup with lazy-loading enabled, EF will make calls to the database and populate the data as needed.
As far as I understand, if I change the state of an entry in the context like this:
context.Entry(doc).State = EntityState.Added;
the whole object graph behind doc will be set to EntityState.Added. That is how this mechanism is described here:
Note that for all of these examples if the entity being added has references to other entities that are not yet tracked then these new entities will also be added to the context and will be inserted into the database the next time that SaveChanges is called.
In my situation this behaviour is undesirable. When I receive the doc entity, its relations are already in the database (they were added in a different context), and adding them again will cause an error. I need to add doc to the database with all its references, but without trying to add the other objects in the graph.
Of course, I can iterate through all graph and set state explicitly, but does an easier way exist?
In Entity Framework Core the behavior changed; calling:
context.Entry(asset).State = EntityState.Added;
will affect only the entity and not the related ones.
👉 I know the question is about classic Entity Framework (not Core), but surely more people using EF Core will end up here (like me) 😉
You may want to have a look at GraphDiff.
According to this dedicated blog entry, it seems to fit your needs:
Say you have a Company which has many Contacts. A contact is not defined on its own and is a One-To-Many (with required parent) record of a Company. i.e. The company is the Aggregate Root. Assume you have a detached Company graph with its Contacts attached and want to reflect the state of this graph in the database.
At present using the Entity Framework you will need to perform the updates of the contacts manually, check if each contact is new and add, check if updated and edit, check if removed then delete it from the database. Once you have to do this for a few different aggregates in a large system you start to realize there must be a better, more generic way.
Well good news is that after a few refactorings I've found a nice solution to this problem.
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time then blows a stack error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post on a page. So write data services and methods that do things relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts, each holding only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data like creation time.
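To make the difference concrete, a minimal sketch with in-memory tuples standing in for EF entities (an assumption; all names are hypothetical). Against a real EF query you would project into a small DTO class or an anonymous type rather than a tuple, because LINQ expression trees cannot contain tuple literals; either way, only the projected columns are selected.

```csharp
using System;
using System.Linq;

// Hypothetical post rows carrying far more than a page needs: ids, audit data,
// and a poster who (in the real model) drags in accounts, achievements, etc.
var posts = new[]
{
    (Id: 1, ThreadId: 5, Message: "Hello", PosterName: "alice", CreatedAt: new DateTime(2015, 3, 1), CreatedBy: 42),
    (Id: 2, ThreadId: 5, Message: "Hi!",   PosterName: "bob",   CreatedAt: new DateTime(2015, 3, 2), CreatedBy: 43),
    (Id: 3, ThreadId: 6, Message: "Other", PosterName: "carol", CreatedAt: new DateTime(2015, 3, 3), CreatedBy: 44),
};

// Task-shaped service method: "the posts on this thread's page", returning only
// poster name and message. Nothing else is on the result, so there is nothing
// for a serializer (or lazy loading) to walk into.
(string PosterName, string Message)[] GetThreadPage(int threadId) =>
    posts.Where(p => p.ThreadId == threadId)
         .OrderBy(p => p.CreatedAt)
         .Select(p => (p.PosterName, p.Message))
         .ToArray();

foreach (var (poster, message) in GetThreadPage(5))
    Console.WriteLine($"{poster}: {message}");
```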
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable Lazy Load and Proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had a project like this, and it was a stressful one. I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1- Instead of using EF to load data, I called old-school stored procedures (and, for further optimization, grouped data to shrink the result as much as possible for the charts; e.g. a query returns one table from which multiple chart datasets can be extracted).
2- More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance was noticeably better (it is really fast, but not compatible with complex objects, for example view models that contain lists of models that in turn contain further lists, and so on).
It's better to read the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3- Relational database design is the prime suspect in this problem, so it might be good to create the tables that hold raw data for processing (most probably for analytics) in a denormalized schema, which improves query performance.
Also, be wary of using the model classes generated by the EF Designer from the database for reading or selecting data, especially when you want to serialize them. (Sometimes I think of splitting the same schema into two sets of identical classes/models, one for writing and one for reading, so that the write models benefit from the virtual collections generated from foreign keys while the read models omit them... I am not sure about this.)
NOTE: in the case of very, very large data, it's better to go deeper and set up In-Memory OLTP (memory-optimized tables) for the tables that contain facts or raw data; however, in that case your table acts as a non-relational table, like NoSQL.
NOTE: for example, in MSSQL you can use SQLCLR, which lets you write code in C#, VB, etc. and call it from T-SQL; in other words, you handle data processing at the database level.
4- For interactive views that need to load data, I think it's better to consider which information should be processed on the server side and which can be handled on the client side (sometimes it's better to query data from the client side; however, keep in mind that data on the client side can be accessed by the user). It depends on the situation.
5- For large raw-data tables in a view, using datatables.min.js is a good idea, and everyone also suggests server-side paging for tables.
6- For importing and exporting data from big files, I think OLEDB is the best choice.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
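Point 5 as a minimal sketch: server-side paging returns only the requested slice plus the total row count the table widget needs to draw its pager. The rows and the `start`/`length` parameter names are hypothetical; with EF, the same OrderBy/Skip/Take over an IQueryable is translated into a paged SQL query, so only one page leaves the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw-data rows (in practice an IQueryable from EF or a stored procedure result).
var rows = Enumerable.Range(1, 95).Select(i => (Id: i, Value: i * 1.5)).ToList();

// Server-side paging: a stable order first, then skip to the requested offset
// and take one page. The total count is returned alongside for the pager.
(int Total, List<(int Id, double Value)> Page) GetPage(int start, int length) =>
    (rows.Count,
     rows.OrderBy(r => r.Id).Skip(start).Take(length).ToList());

var (total, page) = GetPage(start: 90, length: 10);
Console.WriteLine($"total={total}, returned={page.Count}, first id={page[0].Id}");
```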
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use them for later look-up.
Set the get/set access modifiers of any "Navigation Properties" (sub-collections) in your EF entity to private.
This will give you an object that does not expose the sub-collections, so only the main properties get serialized. This workaround requires some restructuring of your LINQ queries: query your SubItems table directly, using the foreign key property as the filter, like this:
// Query the sub-entity table directly, filtering on its foreign key property
var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution only partly, so that it affects only the sub-collections you definitely do not want serialized.
Note 2:
For "Many to Many" relations, at least one of the entities needs a public representation of the collection, since the relation cannot be retrieved using a single ID property.
Given a table, is there any way to identify all tables which are taking foreign key reference on that table?
The actual scenario goes like this. Given a database, I have a set of C# schema classes which I have to populate from data in the database and store in a cache. All these schemas should always be in sync with the database.
I have two ways to solve this problem. One is to update all the stored schemas whenever a database change happens, which will be very costly. The other is to use a heuristic-based algorithm to correctly identify the schemas that will be impacted by a DB change and update only those.
To implement this, I was thinking of building a dependency tree/graph structure where a table T1 is called dependent on table T2 if T1 has a foreign key constraint on T2. Then, whenever a change happens in one or more tables, I can quickly iterate over the graph and determine which schemas need to be updated.
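A minimal sketch of that dependency graph: store the foreign keys as reverse edges (principal to the tables that reference it) and walk them breadth-first from the changed tables. The table names are hypothetical and the edges are hard-coded here; with EF Core, for instance, they could be read from the model via `context.Model.GetEntityTypes()` and each entity type's `GetForeignKeys()`. Mapping affected tables to the cached schemas that use them is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical foreign keys: Dependent has an FK constraint on Principal.
var foreignKeys = new (string Dependent, string Principal)[]
{
    ("Order", "Customer"),
    ("OrderLine", "Order"),
    ("OrderLine", "Product"),
    ("Employee", "Employee"), // self-reference; the visited set below stops the loop
};

// Reverse adjacency: principal -> tables that reference it.
var referencedBy = foreignKeys
    .GroupBy(fk => fk.Principal, fk => fk.Dependent)
    .ToDictionary(g => g.Key, g => g.Distinct().ToList());

// Breadth-first walk over "is referenced by" edges. Returns the changed tables
// plus every table that directly or transitively depends on them.
HashSet<string> Affected(IEnumerable<string> changedTables)
{
    var seen = new HashSet<string>(changedTables);
    var queue = new Queue<string>(seen);
    while (queue.Count > 0)
    {
        var table = queue.Dequeue();
        if (!referencedBy.TryGetValue(table, out var dependents)) continue;
        foreach (var d in dependents)
            if (seen.Add(d)) queue.Enqueue(d);
    }
    return seen;
}

Console.WriteLine(string.Join(", ", Affected(new[] { "Customer" }).OrderBy(t => t)));
```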
I know that you can find these kinds of dependencies using Data Dictionaries, but since I am using Entity Framework, I'm looking for a way to do it through Entity Framework.
Also, if someone has a better approach to doing the same thing, please share it as well.