EntityFramework: To Slice or Not To Slice?


So I'm just getting started with Entity Framework. I'm working with a very large, existing database. I find myself wanting to use EF to create models that are "slices" of the whole database. Each slice corresponds to one aspect of the application. Is that the right way to look at it, or should I try to model the whole database in one EDMX?
Let me give you a fictional example:
Suppose that one of the many things this database contains is customer billing information. I feel like I want to create an EF model that focuses only on the tables that the Customer Billing module needs to interact with. (That model would NOT be used for other modules in the app; rather, those same tables might appear in other small EF models.) This would allow me to leverage EF's conceptual model features (inheritance, etc.) to build a view that is correct for Customer Billing, without worrying about that model's effects on, say, Customer Support (even though the two modules share some tables).
Does that sound right?

It sounds right to me. The point of an Entity Model, after all, is to provide a set of persistence-capable business objects at a level of abstraction that's appropriate to the required business logic.
You should absolutely create entity models that support modules of the application, not models that copy the underlying database schema. As the link above describes, separating logic from persistence is one of the primary purposes of EF.

I would prefer the slice approach, for the following reasons:
If you have a massive database with loads of tables, a single massive entity model would be difficult to manage.
It is easier to maintain application- or domain-specific entities. Entity Framework is not a one-to-one table-to-entity mapping: you can create custom entities, and also combine and split tables across entities.

Related

Should I use inheritance in Entity Framework or is there a better approach?

I have various objects that I would like to track in an application. The objects are computers, cameras, switches, routers etc. I want the various objects to inherit from an object called Device since they will all have some properties in common (i.e. IP Address, MAC Address, etc.) I like to create the objects using the designer (Model First) but I do not like the difficulty in updating the database from the model. Basically, I do not like to have to drop the database and recreate it, especially once I start populating the database. The other approach that I experimented with was creating the database using SSMS in SQL Server, but then when I create the POCOs from the database the entities do not inherit from each other. What is a good approach for my situation?

I want the various objects to inherit from an object called Device since they will all have some properties in common (i.e. IP Address, MAC Address, etc.)
You are essentially talking about which inheritance pattern you are going to use in EF; or how the model maps to your database tables. There are 3 main types of inheritance patterns in EF (see Inheritance Mapping: A Walkthrough Guide for Beginners):
Table-per-Hierarchy
Table-per-Type
Table-per-Concrete Type
Each has pros and cons (such as performance). But, you should also consider that this model is a model that relates to the database, and in larger projects you might then create a second layer to work with for business logic. DDD talks about persistence models and domain models. Again, your choices here are weighing up initial speed of development and scalability and performance later on.
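As a rough illustration, the Device hierarchy from the question could be written as plain classes like the ones below. This is only a sketch: the subtype properties (OperatingSystem, ResolutionMegapixels) and the DeviceQueries helper are made up for the example, and no EF mapping is shown. With Code First, a hierarchy like this is mapped to a single table (Table-per-Hierarchy) unless you configure it otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Properties shared by every device live on the base class.
public abstract class Device
{
    public int Id { get; set; }
    public string IpAddress { get; set; }
    public string MacAddress { get; set; }
}

public class Computer : Device
{
    public string OperatingSystem { get; set; }   // hypothetical property
}

public class Camera : Device
{
    public int ResolutionMegapixels { get; set; } // hypothetical property
}

// Hypothetical helper, not part of EF.
public static class DeviceQueries
{
    // Narrows a mixed collection of devices down to one subtype,
    // the same way OfType<T>() is used on a polymorphic set.
    public static List<T> OnlyOfType<T>(IEnumerable<Device> devices) where T : Device
    {
        return devices.OfType<T>().ToList();
    }
}
```

Whichever inheritance pattern you pick only changes how these classes are stored; the classes themselves stay the same.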
I like to create the objects using the designer (Model First) but I do not like the difficulty in updating the database from the model.
There are 4, and only 4 development strategies for EF (see Entity Framework Development Workflows):
Model First
Code First (new database)
Database First
Code First (existing database)
I do not like to have to drop the database and recreate it, especially once I start populating the database
Code First is really very, very good at this:
Seeding in Code First allows you to populate databases with test or live data depending on where you are deploying to.
Migrations allow you to do non-destructive updates to the database, and migrate data in a testable, repeatable fashion for live deployment.
Doing this with Model First is, unfortunately, just harder. The only real solution I know is to generate a new database, and use a SQL compare (with data compare) tool to generate the data in the new database.
Choosing a pattern and a strategy
Each strategy has pros and cons, and each inheritance pattern is better used with particular development strategies. The trade-offs are really your own to judge. For example, you might have to use Database First if you have an existing database you inherited, or you may be happier using the EF designer and so would use Model First.
Model First (by that I mean using the EF designer to define your model) uses the TPT strategy by default. See EF Designer TPT Inheritance. If you want TPH, you can still use Model First (see EF Designer TPH Inheritance), but you have extra work to do; Code First is a better fit for TPH. TPC is even harder using Model First, and Code First is really the best (only viable) option for that in EF 5.
when I create the POCOs from the database the entities do not inherit from each other
It is good to remember that the model deals with classes; the database deals with storage in tables. When generating a model from your database, it is hard for EF to work out what the TPH or TPC inheritance should be. All it can do is create a "best guess" at your model based on the table associations. You have to help it out after the model is generated by renaming properties, changing associations or applying inheritance. There really is no other way to do this. Updates to the database may therefore also require more work on the model.
Your best approach
That is, unfortunately, down to opinion. However if your primary requirements are:
You want TPH or TPC (or mixed strategies)
You don't want to drop your database when you issue updates to the model
then the best match for these technical requirements is Code First development, with migrations and seeding.
The downside of Code First is having to write your own POCOs, and learning the data annotation attributes. However, keep in mind:
writing POCOs is not so different from writing a database table (and once you are used to it, it is just as quick)
Code First is a lot more usable with automated testing (e.g. with DI and/or IoC to test without the database involved), so can have benefits later on
If you are going to do a lot of EDMX manipulation with database first, or a lot of work whenever you drop and update your database using model first, then you are just putting in time and effort in other places instead of in writing POCOs

Entity Framework DataContexts

I'm wrestling with a design and trying to figure out the best way of approaching it.
We have many tables, and in a current LinqToSql implementation, our DBML is many megs in size, very unwieldy. I want to avoid recreating this situation if I can. We decide our connection string on a per user basis, so it got very difficult to make separate dbmls for different groups of tables.
I'm set on using Entity Framework, and although we don't need the Code First elements, I'm liking the lightweight code without all the generation and we don't need the visual mapping so I was thinking of generating the code files for all the tables and then adding them into a DataContext as DbSets.
This got me thinking about best practice here, and I wanted to ask the question;
Is it wise to create a DataContext for every group of tables you want to use? For example, I'm going to have a module that is responsible for gathering data from 5 tables. It doesn't need every single table in the database, just 5. Do I create a DbContext that includes only these 5 tables? If I need more in the future I can add them in, but it stays lightweight.

While you may have a separate context for each grouping of tables, if your model is that large, or your domains that disparate, you may want to look into adding a layer of abstraction. By this, I mean having a single context that encompasses your whole model, then adding something along the lines of the repository pattern. This is a decent write-up on accomplishing this with EF.
By doing this, you would essentially be accomplishing two goals: abstracting out your data tier, thus freeing up implementation concerns; and allowing your developers to work with just the entities they need, possibly grouped by aggregate root.
One thing I would like to make clear though. I am not necessarily suggesting that you go with a specific end-to-end architecture (i.e. DDD). What I am trying to do here is suggest a few patterns that will give you the flexibility to allow you to make mistakes (fail gracefully) while still making progress with your project.

You can certainly do this. You add tables to the EDMX model just as in Linq2SQL, so by adding only the 5 tables you need you avoid any entity-tracking overhead for the tables you leave out. Entity Framework also nicely adds two-way navigation properties, which Linq2SQL doesn't have. I'd recommend using EF instead of Linq2SQL.

There is nothing inherently bad about a large DBML model; the performance impact should be negligible in EF.
On the other hand, in my opinion, reducing complexity also applies to Entity Framework. If your code only needs 5 tables from the database, by all means create a separate context that only has the entities for those 5 tables. By factoring completely independent tables out into separate contexts you express this separation in a clear way: there are no dependencies from these tables to other tables in your database, and no dependencies from the code to unrelated entities. If that is the case, I think (and there might be other opinions) this is the way to go.
However, keep in mind that if you need some of those tables in another context, you would have to put the corresponding entities into that context as well. It can get hard to understand that the same tables are present in multiple contexts, or even have cross-dependencies between contexts. That should be avoided since it adds complexity.

Question about Entity Framework: Why do we need to remodel our generated models by visual studio?

I am reading the Entity Framework 4.0 Recipes book. Chapter 2 has a bunch of recipes for modeling entity relationships (Table per Type, one-to-many, ...) based on the relationships between the tables.
My question is this: EF will already create models that match our database tables and relationships automatically. So why do we need to remodel our entity models, even though that won't change our database schema or tables?
Note: I am using an existing database schema and don't want to change any relationships in the database.

Well, that is the whole idea of Entity Framework: the conceptual model is how we perceive the business. The data, however, is usually stored in second or third normal form to make efficient use of disk space and to improve the efficiency of data retrieval. To keep both worlds happy, we take the best of each and apply a mapping layer that maps the conceptual model onto the storage model.
If your business model exactly matches how you store your data in the database, then you do not need to remodel it.

Application Design - Database Tables and Interfaces

I have a database with tables for each entity in the system. e.g. PersonTable has columns PersonId, Name, HomeStateId. There is also a table for 'reference data' (i.e. states, countries, all currencies, etc.) data that will be used to fill drop down list boxes. This reference table will also be used so that PersonTable's HomeStateId will be a foreign key to the reference table.
In the C# application we have interfaces and classes defined for the entity.
e.g. PersonImplementationClass : IPersonInterface. The reason for having the interfaces for each entity is because the actual entity class will store data differently depending on a 3rd party product that may change.
The question is, should the interface have properties for Name, HomeStateId, and HomeStateName (which will be retrieved from the reference table). OR should the interface not expose the structure of the database, i.e. NOT have HomeStateId, and just have Name, HomeStateName?

I'd say you're on the right track when thinking about property names!
Model your classes as you would in the real world.
Forget the database patterns and naming conventions of StateID and foreign keys in general. A person has a city, not a cityID.
It'll be up to your data layer to map and populate the properties of those objects at run time. You should have the freedom to express your intent and the representation of 'real world' objects in your code, and not be stuck to your DB implementation.
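A minimal sketch of that idea, assuming the reference table can be read into an id-to-name lookup. The names IPerson, Person, PersonRow and PersonMapper are invented for the example and are not the asker's actual types. The interface exposes only Name and HomeStateName, and the data layer resolves HomeStateId while mapping.

```csharp
using System;
using System.Collections.Generic;

// What callers see: meaningful values, no foreign keys.
public interface IPerson
{
    string Name { get; }
    string HomeStateName { get; }
}

// Hypothetical shape of a row read from PersonTable.
public class PersonRow
{
    public int PersonId { get; set; }
    public string Name { get; set; }
    public int HomeStateId { get; set; }
}

public class Person : IPerson
{
    public Person(string name, string homeStateName)
    {
        Name = name;
        HomeStateName = homeStateName;
    }

    public string Name { get; private set; }
    public string HomeStateName { get; private set; }
}

// Lives in the data layer; the only place that knows about HomeStateId.
public static class PersonMapper
{
    // stateNames stands in for the reference table (id -> display name).
    public static IPerson ToPerson(PersonRow row, IDictionary<int, string> stateNames)
    {
        string stateName;
        if (!stateNames.TryGetValue(row.HomeStateId, out stateName))
        {
            throw new KeyNotFoundException("Unknown state id: " + row.HomeStateId);
        }
        return new Person(row.Name, stateName);
    }
}
```

If the third-party storage changes, only the row type and the mapper need to change; code written against IPerson is unaffected.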

Either way is acceptable, but they both have their pros and cons.
The first way (entities have IDs) is analogous to the ActiveRecord pattern, where your entities are thin wrappers over the database structure. This is often a flexible and fast way of structuring your data layer, because your entities have freedom to work directly with the database to accomplish domain operations. The drawback is that when the data model changes, your app is likely to need maintenance.
The second way (entities reflect more of a real-world structure) is more analogous to a heavier ORM like Entity Framework or Hibernate. In this type of data access layer, your entity management framework takes care of automatically mapping the entities back and forth into the database. This separates the application from the data more cleanly, but can be a lot more plumbing to deal with.
This is a big choice, and shouldn't be taken lightly. It really depends on your project's requirements and size, and on who will be consuming it.

It may help to separate the design a little bit.
For each entity, use two classes:
One that deals with database operations on the entity (where you would put IDs)
One that is a simple data object (where you would have standard fields that actually mean something)
As @womp mentioned, if your entity persistence is only going to be to databases, strongly consider the use of an ORM so you don't end up rolling your own.

When to separate certain entities into different repositories?

I generally try and keep all related entities in the same repository. The following are entities that have a relationship between the two (marked with indentation):
User
    UserPreference
So they make sense to go into a user repository. However users are often linked to many different entities, what would you do in the following example?
User
    UserPreference
    Order
Order
    Product
Order has a relationship with both product and user but you wouldn't put functionality for all 4 entities in the same repository. What do you do when you are dealing with the user entities and gathering order information? You may need extra information about the product and often ORMs will offer the ability of lazy loading. However if your product entity is in a separate repository to the user entity then surely this would cause a conflict between repositories?

In the sense of Eric Evans's Domain-Driven Design ( http://domaindrivendesign.org/index.htm ), you should first think about what your Aggregates are. You then build your repositories around those.
There are many techniques for handling Aggregates that relate to each other. The one that I use most often is to only allow Aggregates to relate to each other through a read-only interface. One of the key thoughts behind Aggregates is that you can't change the state of underlying objects without going through the root. So if Product and User are root Aggregates in your model, then I can't update a Product if I got to it by going through User->Order->Product. I have to get the Product from the Product repository to edit it. (From a UI point of view you can make it look like you go User->Order->Product, but when you hit the Product edit screen you grab the entity from the Product Repository.)
When you are looking at a Product (in code) by going from User->Order->Product, you should be looking at a Product interface that does not have any way to change the underlying state of the Product (getters only, no setters, etc.).
Organize your Aggregates, and therefore your Repositories, by how you use them. I can see User and Product being their own Aggregates and having their own Repositories. I'm not sure from your description whether Order should belong to User or stand alone.
Either way use a readonly interface when Aggregates relate. When you have to cross over from one Aggregate to the other go fetch it from its own Repository.
If your Repositories are caching, then when you load an Order (through a User), only load the Product IDs from the database. Then load the details from the Product Repository using the Product ID. You can optimize a bit by loading any other invariants on the Product as you load the Order.
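The read-only interface idea might look something like the sketch below. All type and member names here (IReadOnlyProduct, Product, Order, ProductRepository) are illustrative assumptions, and the repository is an in-memory stand-in rather than anything backed by an ORM. An Order only ever sees the Product through the read-only interface; to edit the Product you go back to its own repository.

```csharp
using System;
using System.Collections.Generic;

// The view of a Product that other Aggregates are allowed to see.
public interface IReadOnlyProduct
{
    int Id { get; }
    string Name { get; }
    decimal Price { get; }
}

// The editable Aggregate root, handed out only by its repository.
public class Product : IReadOnlyProduct
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class Order
{
    public Order(int id, IReadOnlyProduct product)
    {
        Id = id;
        Product = product;
    }

    public int Id { get; private set; }

    // No setters are reachable through this property.
    public IReadOnlyProduct Product { get; private set; }
}

// In-memory stand-in for a real repository.
public class ProductRepository
{
    private readonly Dictionary<int, Product> _products = new Dictionary<int, Product>();

    public void Add(Product product)
    {
        _products[product.Id] = product;
    }

    public Product GetById(int id)
    {
        return _products[id];
    }
}
```

Code that reaches a Product via an Order can read it but cannot assign to its properties without a cast; an edit screen would instead call GetById with order.Product.Id and work on the Product it gets back.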

By repository you mean class?
Depending on the use of the objects (repositories) you could make a view that combines the data on the database and create a class (repository) with your ORM to represent that view. This design would work when you want to display lighter weight objects with only a couple columns from each of the tables.

If SQL Server is your database, and by repository you mean a database, then I would just stick the information in whatever database makes sense and have a view in dependent databases that selects out of the other database via three-dot notation.

I'm still confused by what you mean by "repository." I would make all of the things you talked about separate classes (and therefore separate files) and they'd all reside in the same project.
