I have a legacy database that I'd like to interact with using Entity Framework.
The database is highly normalised for storing information about flights. In order to make it easier to work with some of the data, a number of SQL Views have been written to flatten data and to pivot certain multi-table joins into more logical information.
After quickly looking over this I see two problems with using Views in EF.
The Views contain lots and lots of keys. Some quick googling seems to indicate I will need to manually edit the EDMX file to remove this info.
The Views don't have any relationships to the other table entities. These associations need to be manually added in order to link a View -> Table.
Both of these seem like major pain points when it comes to refreshing the model from the DB when the DBA team makes changes.
Is this just something you need to "put up with" when working with EF, or are there any suggested patterns/practices to deal with these?
Mixing Table-Entities with View-Entities is ok and largely depends on your requirements.
My experience has been these are things you are going to have to deal with.
When I first started using Entity Framework, I used views a lot because I was told I needed to use them. As I became more familiar with it, I began to prefer table-entities over view-entities, mainly because I felt I had more control. Views are OK when you are presenting read-only info, or as you described (flattened data, pivots, joins, etc.); however, when your requirements change and you now have to add CRUD, you are going to have to use stored procedures or change your model to use table-entities anyway, so you might as well use table-entities from the start.
The Views contain lots and lots of keys. Some quick googling seems to indicate I will need to manually edit the EDMX file to remove this info.
This wasn't ever really a problem for me. You can remove keys from the view-entity in the designer. If you're talking about doing this for the view in the storage layer, then yes, you can do that to make it work, but as soon as you update your model from the database you are going to have to do it all over again, so I wouldn't recommend it. You are better off working with your DBA to adjust the key constraints in the database.
The Views don't have any relationships to the other table entities. These associations need to be manually added in order to link a View -> Table.
This was often a problem for me. Sometimes you are able to add keys and create relationships without any problems, but often you may have to change the keys and/or relationships in the db to make it work. This depends on your requirements; you may have to deal with this even when using table-entities.
Hope this helps.
I've been in a similar situation as we transitioned into using Entity Framework.
The first step was to start with a blank EF model and add the tables when we created the domain service calls. This at least meant that the model wasn't crazy to start with! Then the plan was to try and not use views as much as possible and move that kind of logic into the domain service, where at least it could be tested, and slowly deprecate the CRUD stored procedures. It's worked fine and there haven't really been any major problems.
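A minimal sketch of what moving a view's join into the domain service can look like, so the logic can be tested without a database. The airport/flight tables, their columns and the `FlattenFlights` method are hypothetical, and in-memory lists stand in for EF `DbSet`s. Against a real context you would write the equivalent LINQ query over the entity sets, projecting into a class or anonymous type rather than a tuple, because C# expression trees cannot contain tuple literals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical normalised tables; plain lists stand in for EF DbSets.
var airports = new List<(int Id, string Code)>
{
    (1, "LHR"),
    (2, "JFK"),
    (3, "DXB"),
};
var flights = new List<(int Id, string Number, int OriginId, int DestinationId)>
{
    (10, "BA117", 1, 2),
    (11, "EK2", 3, 1),
};

// Hypothetical domain-service method: the join a SQL view would otherwise
// perform, written as ordinary code so it can be unit tested.
List<(string Number, string Origin, string Destination)> FlattenFlights(
    IEnumerable<(int Id, string Number, int OriginId, int DestinationId)> flightRows,
    IEnumerable<(int Id, string Code)> airportRows) =>
    (from f in flightRows
     join o in airportRows on f.OriginId equals o.Id
     join d in airportRows on f.DestinationId equals d.Id
     orderby f.Number
     select (Number: f.Number, Origin: o.Code, Destination: d.Code)).ToList();

foreach (var row in FlattenFlights(flights, airports))
    Console.WriteLine($"{row.Number}: {row.Origin} -> {row.Destination}");
```

Because the method takes plain sequences, a unit test can feed it a handful of rows and check the flattened output directly.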
In practice there are still some views, mainly used for situations that need to be performant. Fortunately these views can be considered in isolation (for read-only grids) and have been left as such in the model, with no associations. Adding the keys in would, I'm sure, be annoying.
Editing the EDMX file is okay, but sometimes on a model refresh these changes can get lost. This has happened to me particularly when EF thinks a table is a view. And yes it's a pain and something that has just been put up with.
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-loading behavior, but when serializing just about any object to JSON to return it to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then blows up with a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw an exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and fourth items above. In other words: set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution, I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance because you can't leverage SQL properly.
Instead, consider building your data access service so that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post on a page. So write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts that each contain only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data, such as IDs and auditing data like creation time.
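A minimal sketch of that difference, assuming hypothetical `accounts` and `posts` collections in place of EF entity sets. The page sent to the caller carries only the poster name and message, so serializing it emits no IDs or audit fields. `System.Text.Json` is used here only to keep the example dependency-free; the same idea applies with Newtonsoft.Json. With EF, doing this projection inside the query also means only the needed columns are read from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical entity data, including keys and audit fields the page never shows.
var accounts = new List<(int Id, string Name, DateTime CreatedAt)>
{
    (1, "alice", new DateTime(2014, 1, 1)),
    (2, "bob", new DateTime(2014, 2, 1)),
};
var posts = new List<(int Id, int AccountId, string Text, DateTime CreatedAt)>
{
    (100, 1, "Hello", new DateTime(2014, 3, 1)),
    (101, 2, "Hi alice", new DateTime(2014, 3, 2)),
    (102, 1, "How are you?", new DateTime(2014, 3, 3)),
};

// Shape the result for the task: a page of posts, each carrying only
// the poster's name and the message text.
var page = new
{
    PageNumber = 1,
    Posts = posts
        .OrderBy(p => p.CreatedAt)
        .Join(accounts, p => p.AccountId, a => a.Id,
              (p, a) => new { PosterName = a.Name, Message = p.Text })
        .ToList(),
};

// Only PageNumber, PosterName and Message appear in the JSON payload.
Console.WriteLine(JsonSerializer.Serialize(page));
```

The returned object is also trivial to construct in a test, which is the mocking benefit mentioned below.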
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. The alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had a similar project, and it was a stressful one. I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1- Instead of using EF to load data, I called old-school stored procedures (and, for further optimization, grouped the data to shrink the result as much as possible for the charts; e.g. a query returns one table from which the datasets for several charts can be extracted).
2- More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance was noticeably better (it is really fast, but not compatible with complex objects, for example view models that contain lists of other models, and so on).
It's better to read about the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3- Relational database design is the prime suspect in this problem, so it might be good to create the tables that hold raw data for processing (most probably for analytics) in a denormalized schema, which improves query performance.
Also beware of using the model classes generated by the EF designer from the database for reading or selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema into two sets of otherwise identical classes, one for writing and one for reading, so that the write models get the virtual collections generated from foreign keys and the read models leave them out... I am not sure about this.)
NOTE: For very, very large data, it's better to go deeper and set up In-Memory OLTP for the specific tables that contain facts or raw data; however, in that case your table behaves more like a non-relational, NoSQL-style table.
NOTE: In MSSQL, for example, you can take advantage of SQLCLR, which lets you write code in C#, VB, etc. and call it from T-SQL; in other words, you handle data processing at the database level.
4- For interactive views that need to load data, I think it's better to consider which information should be processed on the server side and which can be handled on the client side (sometimes it's better to query data from the client side, but keep in mind that data on the client side can be accessed by the user). It depends on the situation.
5- For large raw-data tables in a view, using DataTables (datatables.min.js) is a good idea, and everyone recommends server-side paging for such tables.
6- For importing and exporting data from big files, I think OLE DB is the best choice.
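Server-side paging, mentioned in point 5, can be sketched as follows. The `readings` data and the `GetPage(start, length)` method are hypothetical, and this is not the exact request/response format of any grid library; the point is that only one page of rows plus a total count leaves the server. With LINQ to Entities the query must be ordered before `Skip` is applied, and EF then performs the paging in SQL rather than in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw-data table with 20,000 rows (an EF IQueryable in practice).
var readings = Enumerable.Range(1, 20_000)
    .Select(i => (Id: i, Value: i * 0.5))
    .ToList();

// Hypothetical service method for a paged grid: returns the total row count
// plus one page of rows, instead of the whole table.
(int Total, List<(int Id, double Value)> Rows) GetPage(int start, int length) =>
    (readings.Count,
     readings.OrderBy(r => r.Id).Skip(start).Take(length).ToList());

var (total, rows) = GetPage(start: 40, length: 20);
Console.WriteLine($"{rows.Count} of {total} rows, first id {rows[0].Id}");
```

The client asks for the next page by sending a new `start`, so the browser never holds more than one page of the table.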
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Set the get/set access modifiers of any "Navigation Properties" (sub-collections) in your EF entity to private.
This will give you an object that does not expose the sub-collections, and you will only get the main properties serialized. This workaround will require some restructuring of your LINQ queries, querying your SubItems table directly with the foreign key property as your filter, like this:
var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution partially, affecting only the sub-collections that you definitely do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection, since the relation cannot be retrieved using a single ID property.
I've been practicing the Code First concept on my database. Since I have worked with databases longer than .NET Code Development, I tend to always consider ways to leverage the power of the database and so my standard practice has always been Database first and then use an ORM like EF to generate the context for the DAL.
So please excuse me if this seems like a stupid question. I've been getting accustomed to using Code First Migrations to create my primary and foreign key relationships, define primary keys, etc. It's a great learning experience, but for some tasks I think it is better to simply go straight to the database and run a script, such as creating check constraints, or stored procedures to be mapped as functions in code.
So my question is: if I perform a task like creating a check constraint in the database, is there any way to propagate the change to the Code First model? I know the Update-Database command will push changes in the model to the database, so, as a beginner with this approach, I am wondering how I can get Code First to receive changes from the database as opposed to pushing changes out.
I ask because everything I've learned so far seems to reinforce the term "Code First", as if once you begin developing in this mode you cannot reverse the logic if needed.
Is this right, or am I simply missing something?
Also, is it considered a common or reasonable practice to use Code First and EF? It seems as though Code First provides the same benefits as EF, so it would seem to be a redundant use of code.
I am currently working on an MVC 4 application. I am planning to implement a command query separation pattern to enhance the performance and structure of the application. I am happy with my commands, which map my view models to my entities and then use NHibernate to save my data.
The commands and queries will be running off the same database.
I am a bit unsure of the best approach to manage my queries. In my last project I used stored procedures for all of my reads/queries, then used AutoMapper to map my IDataReaders to my ViewModels. This worked OK, but the main problem was the turnaround time for writing the stored procedures; also, when the domain model changed, the stored procedures got out of sync.
Therefore, ideally I would like something that auto-generates the views or sprocs from my view models. But realistically, I cannot see a way of doing this, as the sprocs/views need knowledge of potentially more than one table, so simply reflecting on the view model properties would not be enough.
I could auto generate a table for each view model, read this during development, then once the domain was stable and before we went to test convert these to views/sprocs?
So I guess what I am asking is:
Has anyone managed to solve the sproc/view auto generation problem I described above? (this would be my favourite outcome!) Or even better has designed a much more graceful solution!
Or is it more sensible to implement raw ADO reads only where they are absolutely necessary (i.e. searches), and thereby dispense with the need for lots of sprocs/views?
But then still separate out my queries into a separate channel (but inside some of them they use NHibernate, whilst others use my ADO reader).
(p.s I have looked at the other stackoverflow CQS related questions and I hope mine is different enough to warrant this question)
What do stored procedures solve for you? Why can't you use NHibernate for reads too? Are the queries NHibernate produces that bad?
If performance of reads is crucial for you, and the shape of your viewmodels is very different from how you store your model - making the denormalization process to a viewmodel too heavy to do on the fly, you might have to consider completely splitting reads and writes.
When you write something, you can raise an event - often done asynchronously - on which subscribers listening can store data on the readside in such a way that it's optimal for reads (close to the shape of your viewmodel). This would make querying really fast.
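A minimal in-process sketch of that flow, under stated assumptions: the order tables, the `AddOrderLine`/`GetOrderSummary` handlers and the list-of-delegates event bus are all hypothetical. A command writes to the normalized store and raises an event; a read-side subscriber updates a denormalized summary shaped like the view model, so a query becomes a single lookup. It publishes synchronously only to keep the example short, whereas in practice this is often done asynchronously.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical write side: normalized order lines, as the ORM would persist them.
var orderLines = new List<(int OrderId, string Product, decimal Price)>();

// Hypothetical read side: one pre-shaped summary row per order, close to the view model.
var orderSummaries = new Dictionary<int, (int LineCount, decimal Total)>();

// Minimal synchronous event bus (hypothetical); real systems often publish
// asynchronously, which makes the read side eventually consistent.
var subscribers = new List<Action<(int OrderId, decimal Price)>>();

// Read-side subscriber: updates the denormalized summary for each new line.
subscribers.Add(e =>
{
    orderSummaries.TryGetValue(e.OrderId, out var current); // (0, 0m) if absent
    orderSummaries[e.OrderId] = (current.LineCount + 1, current.Total + e.Price);
});

// Command handler: writes to the write store, then raises an event.
void AddOrderLine(int orderId, string product, decimal price)
{
    orderLines.Add((orderId, product, price));
    foreach (var handle in subscribers)
        handle((orderId, price));
}

// Query handler: a single lookup, with no joins or aggregation at read time.
(int LineCount, decimal Total) GetOrderSummary(int orderId) => orderSummaries[orderId];

AddOrderLine(1, "Widget", 2.50m);
AddOrderLine(1, "Gadget", 7.25m);
var summary = GetOrderSummary(1);
Console.WriteLine($"Order 1: {summary.LineCount} lines, total {summary.Total}");
```

The read store could just as well be a separate table or database populated by the subscriber; the query side never has to join against the write model.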
You can read a good introduction to CQRS here.
I'm wrestling with a design and trying to figure out the best way of approaching it.
We have many tables, and in a current LinqToSql implementation, our DBML is many megs in size, very unwieldy. I want to avoid recreating this situation if I can. We decide our connection string on a per user basis, so it got very difficult to make separate dbmls for different groups of tables.
I'm set on using Entity Framework, and although we don't need the Code First elements, I'm liking the lightweight code without all the generation and we don't need the visual mapping so I was thinking of generating the code files for all the tables and then adding them into a DataContext as DbSets.
This got me thinking about best practice here, and I wanted to ask the question;
Is it wise to create a DataContext for every group of tables you want to use? For example, I'm going to have a module that is responsible for gathering data from 5 tables; it doesn't need every single table in the database, just those 5. Do I create a DbContext that includes these 5 tables? If I need more in the future I can add them in, but it stays lightweight.
While you may have a separate context for each grouping of tables, if your model is that large, or your domains that disparate, you may want to look into adding a layer of abstraction. By this, I mean having a single context that encompasses your whole model, then adding something along the lines of the repository pattern. This is a decent write-up on accomplishing this with EF.
By doing this, you would essentially be accomplishing two goals: abstracting out your data tier, thus freeing up implementation concerns; and allowing your developers to work with just the entities they need, possibly grouped by aggregate root.
One thing I would like to make clear though. I am not necessarily suggesting that you go with a specific end-to-end architecture (i.e. DDD). What I am trying to do here is suggest a few patterns that will give you the flexibility to allow you to make mistakes (fail gracefully) while still making progress with your project.
You can certainly do this. You just add tables to the EDMX model just as in Linq2SQL, so by adding only the 5 tables you need you avoid any entity-tracking overhead for the other tables. Entity Framework nicely adds 2-way Navigation Properties, which Linq2SQL doesn't have. I'd recommend using EF instead of Linq2SQL.
There is nothing inherently bad about a large DBML model, the performance impact should be negligible in EF.
On the other hand in my opinion reducing complexity also applies to Entity Framework - if your code only needs 5 tables from the database by all means create a separate context that only has the entities for those 5 tables. By factoring out completely independent tables into separate contexts you are expressing this separation in clear way - there are no dependencies from these tables to other tables in your database, and no dependencies from the code to unrelated entities - if that is the case I think (and there might be other opinions) this is the way to go.
However, keep in mind that if you need some of those tables in another context, you would have to put the corresponding entities into that context as well; it can get hard to understand when the same tables are present in multiple contexts, or when there are even cross-dependencies between contexts. That should be avoided since it adds complexity.
I have a pluggable system management tool. The architecture of this kind of thing is well understood (interfaces, publish/subscribe, ...). How about the data store, though? What do people do?
I need plugins to be able to add new entities, extend existing entities, establish new relationships, etc.
My thoughts (SQL), not necessarily well thought out
each plugin simply extends the schema when they are installed. In the old days changing the schema was a big no-no; now databases are very relaxed about this
plugins have their own tables. If 2 of them have an entity (say) person, then there are 2 tables p1_person and p2_person
plugins have their own database
invent some sort of flexible scheme where the tables are softly typed. Maybe many attributes packed into a single attribute. The ultimate is to have one big table called data, with key of table name & column name and a single data value.
Not SQL
Object DB. I have no experience with these. Anybody care to pass on experience? db4o, for example. Can I change the 'schema' of objects as the app evolves?
NO-SQL
This is 'where it's at' at the moment. Most of these seem to be aimed slightly differently from my needs. Anybody want to pass on experience with these?
Apologies for the open ended question
My suggestion is to go and read about Entity Framework.
A lot of the situations you are describing can be solved (very elegantly) using table inheritance.
Your idea of one big table called data makes the hamsters in my computer cry ;)
The general trend is away from weakly typed schemas because they cannot be checked at compile time. What you get from something like Entity Framework is a strongly typed, extensible schema that you can code against using LINQ.
Object databases:
Like you, I haven't played with them much; however, the time when I was considering them was a time when there was no good ORM for .NET and writing ADO.NET code was slowly killing me.
As for NoSQL, these are databases that meet a performance need. SQL performs badly in situations where there are lots of small writes occurring. I say "badly" tongue in cheek: it performs very well, but when you scale to millions of concurrent users everything changes. My understanding of NoSQL is that it is a non-relational format designed for lots of small, fast writes and reads. The sites that use these are usually very large.
OK - in response
I am currently lucky enough to be on a greenfield project, so I am using EF to generate my schema.
On non-greenfield projects I use SQL scripts to update my table structures. As for implementing table inheritance in SQL, it's very easy once you know the concept: it's essentially a one-to-many relationship with a constraint that the "many" side will only ever be 0-1.
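The 0..1 constraint described above is commonly implemented by giving the extension table a primary key that is also a foreign key to the base table (a shared primary key). The sketch below only mimics those two constraints in memory; the `people` and `pagerDetails` tables and the `AddPagerNumber` helper are hypothetical, and in a real database the checks would be enforced by the key constraints rather than by application code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical core table owned by the host application: Person(Id, Name).
var people = new Dictionary<int, string>
{
    [1] = "Ann",
    [2] = "Bob",
};

// Hypothetical table added by a plugin to extend Person. Its key is the
// Person's key (in SQL, PersonId would be both primary key and foreign key),
// so each person has zero or one extension row.
var pagerDetails = new Dictionary<int, string>();

void AddPagerNumber(int personId, string pagerNumber)
{
    // Foreign key check: the base row must exist.
    if (!people.ContainsKey(personId))
        throw new InvalidOperationException($"No person with id {personId}");
    // Primary key check: at most one extension row per person.
    if (!pagerDetails.TryAdd(personId, pagerNumber))
        throw new InvalidOperationException($"Person {personId} already has pager details");
}

AddPagerNumber(1, "555-0100");
foreach (var entry in people)
{
    var pager = pagerDetails.TryGetValue(entry.Key, out var number) ? number : "(none)";
    Console.WriteLine($"{entry.Value}: {pager}");
}
```

Each plugin can add its own extension tables this way without altering the host's tables, which is the appeal of table inheritance for the plugin scenario in the question.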
I wouldn't write .NET code that updates the database structure; that sounds like a disaster waiting to happen to me.
I'm beginning to think I have misunderstood what you are looking for. I find databases to be second nature, as I have spent so long with them.
I haven't found a replacement for being meticulous about script management.