I am currently working on an MVC 4 application. I am planning to implement a command query separation pattern to improve both the performance and the structure of the application. I am happy with my commands, which map my view models to my entities and then use NHibernate to save my data.
The commands and queries will be running off the same database.
I am a bit unsure of the best approach for managing my queries. In my last project I used stored procedures for all of my reads/queries, then used AutoMapper to map the IDataReaders to my view models. This worked OK, but the main problems were the turnaround time of writing the stored procedures and the fact that, when the domain model changed, the stored procedures got out of sync.
Therefore, ideally I would like something that auto-generates the views or sprocs from my view models. Realistically, though, I cannot see a way of doing this, as the sprocs/views need knowledge of potentially more than one table, so simply reflecting on the view model properties would not be enough.
I could auto-generate a table for each view model and read from that during development; then, once the domain was stable and before we went to test, convert these tables to views/sprocs?
So I guess what I am asking is:
Has anyone managed to solve the sproc/view auto-generation problem I described above? (This would be my favourite outcome!) Or, even better, has anyone designed a more graceful solution?
Or is it more sensible to only implement raw ADO reads where they are absolutely necessary (i.e. searches) and thereby dispense with the need for lots of sprocs/views,
but still separate my queries into their own channel (where some of them use NHibernate internally, whilst others use my ADO reader)?
(P.S. I have looked at the other Stack Overflow CQS-related questions and I hope mine is different enough to warrant asking.)
What do stored procedures solve for you? Why can't you use NHibernate for reads too? Are the queries NHibernate produces that bad?
If read performance is crucial for you, and the shape of your view models is very different from how you store your model (making the denormalization into a view model too heavy to do on the fly), you might have to consider completely splitting reads and writes.
When you write something, you can raise an event (often handled asynchronously) so that subscribers listening on the read side can store the data in a shape that is optimal for reads, i.e. close to the shape of your view model. This makes querying really fast.
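A minimal sketch of that idea, with every name below being hypothetical and the event bus itself left out:

    using System;
    using System.Data.SqlClient;

    // Sketch of a read-side projection. All type, table and column names here
    // are made up, and the event publishing/subscription mechanism is omitted.
    public class OrderPlaced
    {
        public Guid OrderId { get; set; }
        public string CustomerName { get; set; }
        public decimal Total { get; set; }
    }

    public class OrderSummaryProjection
    {
        private readonly string _readConnectionString;

        public OrderSummaryProjection(string readConnectionString)
        {
            _readConnectionString = readConnectionString;
        }

        // Subscribed to OrderPlaced events raised by the write side (often handled
        // asynchronously). Keeps a denormalized table shaped like the view model.
        public void Handle(OrderPlaced e)
        {
            using (var conn = new SqlConnection(_readConnectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO OrderSummary (OrderId, CustomerName, Total) " +
                "VALUES (@id, @name, @total)", conn))
            {
                cmd.Parameters.AddWithValue("@id", e.OrderId);
                cmd.Parameters.AddWithValue("@name", e.CustomerName);
                cmd.Parameters.AddWithValue("@total", e.Total);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

The query side then becomes a plain SELECT against the denormalized table, with no joins to perform.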
You can read a good introduction to CQRS here.
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON to return it to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then throws a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null or clearing them causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: this is an assumption, but it seemed pretty sound; if anyone knows differently, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw an exception about non-tracked entities not being allowed to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, including only the related objects I specify. This approach is basically working, but in the process of performing the auto-mapping (I believe) all of the navigation properties are accessed, causing EF to query and resolve them. In one case this led to almost 300,000 database calls during a single request to the web service.
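For what it's worth, one way to keep the mapping step from touching the navigation properties is to fold the mapping into the query itself, for example with AutoMapper's queryable extensions, so only the columns the DTO needs are ever selected. A rough sketch, assuming a reasonably recent AutoMapper version (older versions spell this Project().To<T>()) and hypothetical AppEntities/User names:

    using System.Linq;
    using AutoMapper;
    using AutoMapper.QueryableExtensions;

    // Hypothetical DTO shaped for what the presentation tier actually needs.
    public class UserDto
    {
        public int Id { get; set; }        // assumes the User entity has Id and Name
        public string Name { get; set; }
    }

    public class UserQueries
    {
        private static readonly MapperConfiguration MapConfig =
            new MapperConfiguration(cfg => cfg.CreateMap<User, UserDto>());

        // ProjectTo folds the mapping into the generated SQL SELECT, so navigation
        // properties are never materialized and lazy loading never fires.
        public UserDto[] GetUsers()
        {
            using (var db = new AppEntities())   // placeholder for the generated context
            {
                return db.Users
                         .ProjectTo<UserDto>(MapConfig)
                         .ToArray();
            }
        }
    }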
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post on a page. Instead, write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data like creation time.
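As a rough illustration of that difference (every name here is made up; the point is the shape of what crosses the wire, not the specifics):

    using System.Linq;

    // Trimmed-down shapes for one specific task: showing a page of posts.
    public class PostSummary
    {
        public string PosterName { get; set; }
        public string Message { get; set; }
    }

    public class PostPage
    {
        public int PageNumber { get; set; }
        public PostSummary[] Posts { get; set; }
    }

    public class PostService
    {
        public PostPage GetPage(int pageNumber, int pageSize)
        {
            using (var db = new AppEntities())   // placeholder context name
            {
                // Select only the two columns the page actually needs; EF turns
                // this into a narrow SQL query instead of loading entity graphs.
                var posts = db.Posts
                              .OrderByDescending(p => p.Id)
                              .Skip(pageNumber * pageSize)
                              .Take(pageSize)
                              .Select(p => new PostSummary
                              {
                                  PosterName = p.Account.Name,
                                  Message = p.Text
                              })
                              .ToArray();

                return new PostPage { PageNumber = pageNumber, Posts = posts };
            }
        }
    }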
Consider the Twitter API. If it were implemented in that return-everything style, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections held open longer, and network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
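If you stay with entities, a minimal sketch of the first option (the context, entity set, and navigation property names below are placeholders for whatever your database-first model generated):

    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class ResourceService
    {
        public List<Resource> GetResources()
        {
            using (var db = new AppEntities())
            {
                // With proxies and lazy loading off, anything not explicitly
                // Include()d simply stays null instead of being fetched while
                // the result is serialized.
                db.Configuration.ProxyCreationEnabled = false;
                db.Configuration.LazyLoadingEnabled = false;

                return db.Resources
                         .Include(r => r.Owner)
                         .ToList();
            }
        }
    }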
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had a similar project, and it was a stressful one: I needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load data, I called old-school stored procedures (and, for further optimization, grouped the results to keep the result sets as small as possible; e.g. one query returns a table from which the datasets for several charts can be extracted).
2. More importantly, instead of Newtonsoft's JSON serializer I used fastJSON, whose performance is worth mentioning (it is really fast, but it does not cope well with complex objects, e.g. view models that contain lists of models, which in turn contain further nested objects, and so on).
It is better to read up on the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. The relational database design is the prime suspect in this kind of problem. It can be a good idea to keep the tables that hold the raw data to be processed (most probably for analytics) in a denormalized schema, which saves a lot when querying the data.
Also be wary of using the model classes generated by the EF designer (database first) for reading or selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema into two sets of otherwise identical classes/models, one for writing and one for reading, so that the write models benefit from the virtual collections that come from the foreign keys and the read models ignore them; I am not sure about this.)
NOTE: For very large data volumes it is better to go further and set up In-Memory OLTP for the specific tables that contain the facts or raw data; in that case the table behaves more like a non-relational (NoSQL-style) table.
NOTE: For example, in MS SQL Server you can take advantage of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, you can handle some of the data processing at the database level.
4. For interactive views that need to load data, it is worth considering which information should be processed server side and which can be handled client side (sometimes it is better to query data from the client side, although you should keep in mind that data on the client side can be accessed by the user); it is situational.
5. For views with large raw-data tables, datatables.min.js is a good idea, and everyone recommends server-side paging for those tables.
6. For importing and exporting data from big files, I think OLE DB is the best choice.
However, I still doubt that these are exact solutions. If anybody has practical solutions, please mention them. ;)
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Define the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entity as private.
This will give you an object that does not expose the sub-collections, so only the main properties are serialized. This workaround will require some restructuring of your LINQ queries: query your table of sub-items directly, with the foreign key property as your filter, like this:
var myFitnessClubs = context.FitnessClubs
?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
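For illustration, the entity shapes this approach leads to look roughly like the following (reusing the names from the query above; the code the designer actually generates will differ):

    using System.Collections.Generic;

    public class FitnessClubChain
    {
        public int ID { get; set; }
        public string Name { get; set; }

        // Private navigation property: EF can still populate it internally, but it
        // is not publicly exposed, so serializers never walk into the collection.
        private ICollection<FitnessClub> FitnessClubs { get; set; }
    }

    public class FitnessClub
    {
        public int ID { get; set; }
        public string Name { get; set; }

        // The foreign key property stays public and is used for look-ups instead
        // of the navigation property.
        public int FitnessClubChainID { get; set; }
    }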
Note 1:
You may of course choose to implement this solution only partly, affecting only the sub-collections that you strongly do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.
I have a legacy database that I'd like to interact with using Entity Framework.
The database is highly normalised for storing information about flights. In order to make it easier to work with some of the data, a number of SQL Views have been written to flatten data and to pivot certain multi-table joins into more logical information.
After quickly looking over this I see two problems with using Views in EF.
The views contain lots and lots of keys. Some quick googling seems to indicate I will need to manually edit the EDMX file to remove this info.
The Views don't have any relationships to the other table entities. These associations need to be manually added in order to link a View -> Table.
Both of these seem like major pain points when it comes to refreshing the model from the DB when the DBA team makes changes.
Is this just something you need to "put up with" when working with EF, or are there any suggested patterns/practices to deal with these issues?
Mixing Table-Entities with View-Entities is ok and largely depends on your requirements.
My experience has been these are things you are going to have to deal with.
When I first started using Entity, I used views a lot because I was told I needed to use them. As I became more familiar with Entity, I began to prefer table entities over view entities, mainly because I felt I had more control. Views are fine when you are presenting read-only info, or for the cases you described (flattened data, pivots, joins, etc.); however, when your requirements change and you now have to add CRUD, you are going to have to use stored procedures or change your model to use table entities anyway, so you might as well use table entities from the start.
The views contain lots and lots of keys. Some quick googling seems to indicate I will need to manually edit the EDMX file to remove this info.
This wasn't ever really a problem for me. You can undo the keys of the view entity in the designer. If you're talking about doing this for the view in the storage layer, then yes, you can, to make it work, but as soon as you update your model from the database you are going to have to do this over again -- I wouldn't recommend doing this. You are better off working with your DBA to adjust the key constraints in the database.
The views don't have any relationships to the other table entities. These associations need to be manually added in order to link a View -> Table.
This was often a problem for me. Sometimes you are able to add keys and create relationships without any problems, but often you may have to change the keys and/or relationships in the DB to make it work; this depends on your requirements, and you may have to deal with it even when using table entities.
Hope this helps.
I've been in a similar situation as we transitioned into using Entity Framework.
The first step was to start with a blank EF model and add the tables as we created the domain service calls. This at least meant that the model wasn't crazy to start with! Then the plan was to avoid views as much as possible and move that kind of logic into the domain service, where at least it could be tested, and to slowly deprecate the CRUD stored procedures. It has worked fine and there haven't really been any major problems.
In practice there are still some views, mainly used for situations that need to be performant. Fortunately these views can be considered in isolation (for read-only grids) and have been left as such in the model, with no associations. Adding the keys in would, I'm sure, be annoying.
Editing the EDMX file is okay, but sometimes these changes can get lost on a model refresh. This has happened to me particularly when EF thinks a table is a view. And yes, it's a pain, and something that has just been put up with.
Our DBAs have created a pattern where our database layer is exposed to EF via views and CRUD stored procedures. The CRUD works against the views. All the views have the NOLOCK hint. From what I understand, NOLOCK is a dirty read, and that makes me nervous. Our databases are not high volume, but a blanket NOLOCK does not seem very scalable while maintaining data integrity. I get that the decoupling is a good idea, but the problem is that we don't actually decouple anything: our externally exposed objects look just like our views, which map 1 to 1 with our tables.
"If we want to change the underlying data model, we can." ... but we don't. I won't touch on what a PITA this all is from a VS/EF tooling viewpoint.
Is NOLOCK bad as used in this situation? And since our database looks exactly like our class library, does it make sense to just get rid of the whole view/sproc layer and hit the DB directly from EF?
Issuing NOLOCK is absolutely a dirty read. There are times when there is no impact from this, but in some scenarios you may get result sets with missing or duplicated records; Itzik Ben-Gan has some Q&A material on this topic. The reason for using stored procs to abstract your CRUD operations becomes pretty obvious when you want to do storage optimizations after the project goes into maintenance mode. Think of the views as a way for you to not need to worry about that later. It can also be easier for your DBAs to optimize the data access code without consuming your time as a developer. I cannot say whether your DBAs are right or wrong based only on the information in this post; there are simply too many variables that go into the decision. A blanket implementation of NOLOCK being the correct option would be rare, though. HTH
Bob Beauchemin's blog has many good articles about gauging the strengths and weaknesses of ORM wrappers from the perspective of an expert DB designer. It's good to check out to learn what is actually going on when you use EF. Regarding the NOLOCK hint: it will be good until it isn't! As you already seem to be aware, once you scale to a certain extent you will run into all kinds of integrity issues, but this depends on your tolerance for phantom reads, dirty writes, etc. Basically, the more precise you want to be about atomicity, the worse an idea it is.
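If the intent of the blanket hint is simply that readers must never block, one more contained alternative is to opt into the dirty read per query instead of baking NOLOCK into every view, for example with EF6's explicit transactions (the context and entity set names below are placeholders):

    using System.Data;
    using System.Linq;

    public class DashboardQueries
    {
        // Opt into READ UNCOMMITTED (the same dirty read NOLOCK gives you) for
        // this one query only, where an approximate answer is acceptable.
        public int GetApproximateOrderCount()
        {
            using (var db = new AppEntities())
            using (var tx = db.Database.BeginTransaction(IsolationLevel.ReadUncommitted))
            {
                var count = db.Orders.Count();
                tx.Commit();
                return count;
            }
        }
    }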
I'm really late to the .NET game and struggling to learn ADO.NET. I prefer to learn how to do data access the "right way". Somewhere I've picked up on the idea that it's considered superior to manually code your own connections, data adapters, DataSets, DataTables, and even the command statements for updating, adding, and deleting, rather than using Visual Studio's data wizard. I understand from my reading that there are some things you can only do by writing your own command statements, but it isn't completely clear to me what those might be.
Should I always code my own connections, data adapters, datasets and datatables? What about my update, insert, and delete command statements? How do I know when I should code those manually?
There is no right or wrong way. However, I would suggest you first do things the "hard way", in that you write your own code for each of the data access routines you need. Of course, that means you'll also need to know and understand SQL. Eventually you can use or build tools that generate all of your code just the way you need it.
Preferably you'll use stored procedures instead of SQL statements in code, because stored procedures provide an additional level of abstraction, abstracting your database schema from even your data layer and of course your business layer.
I'd use ADO.NET core (that is, writing your own code for data access and such). I'd use DataSets/DataTables (if you have to) purely as in-memory data structures, without using them to do automatic updates/deletes and the like. Stick to DataReaders to the extent possible, converting them to DTOs (for data-retrieval methods). For data-modification methods, your data layer should take DTOs as parameters (or simple data types if there are just one or two).
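A bare-bones sketch of that shape (the procedure, column, and type names are all placeholders):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical DTO returned by the data layer.
    public class CustomerDto
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class CustomerData
    {
        private readonly string _connectionString;

        public CustomerData(string connectionString) { _connectionString = connectionString; }

        // Retrieval: stored procedure -> SqlDataReader -> DTOs.
        public List<CustomerDto> GetActiveCustomers()
        {
            var result = new List<CustomerDto>();
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand("dbo.GetActiveCustomers", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        result.Add(new CustomerDto
                        {
                            Id = reader.GetInt32(reader.GetOrdinal("Id")),
                            Name = reader.GetString(reader.GetOrdinal("Name"))
                        });
                    }
                }
            }
            return result;
        }

        // Modification: the data layer accepts simple parameters (or a DTO).
        public void Rename(int id, string newName)
        {
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand("dbo.RenameCustomer", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@Id", id);
                cmd.Parameters.AddWithValue("@NewName", newName);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }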
Personally, I use tools to generate the data access layer code that uses ADO.NET core (and not EF or LINQ to SQL and the like). That is my personal preference, and depending on the size of your application it goes a very long way towards performance, as well as requiring in-depth knowledge of only two things: your database and SQL, and your C# code, without also having to learn the nuances of abstraction layers and specialized languages (in some cases).
In large projects (and teams), leaving the database schema and stored procedures to people who specialize in that area becomes a necessity and a requirement, and in those cases using ADO.NET core also becomes a requirement.
On my blog I have posted an article in which I introduce a tool that generates all of this code. The tool and source code are available for download. The tool also generates code for strongly typed DataReaders; that is, under the covers you're using a DataReader, while in code it looks/feels like a DTO in terms of strongly typed properties.
Data Access Layer CodeGen
DataReader Wrappers - TypeSafe
In my own experience, it is preferable to always hand-code this rather than use the smart control wizards.
I think you should learn how it's done under the covers first, and then pick your own abstraction layer, of which there are many.
LINQ to SQL does a great job of automating common DB tasks. All your basic CRUD (Create, Read, Update, Delete) operations will be much easier to code by using a DataContext dbml file. The code is much easier to write, does not rely on strings, is compatible with other ADO.NET commands (you can execute a direct DbCommand against your DataContext), and is more highly optimized than anything most people will write (especially a beginner!). You will save yourself a whole lot of time by using something like LINQ to SQL or another ORM. Unless your objective is pure learning, you would be best off creating a working DataContext and analyzing the source to see how it works, instead of teaching yourself ADO.NET. The fact that you are at a point where you need to ask this question probably indicates that you will not add value to your application by writing your own boilerplate DB access code.
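For a feel of what that looks like in practice, here is a small sketch against a designer-generated DataContext (the context, table, and column names are just examples):

    using System.Linq;

    public class CustomerRepository
    {
        public void RenameCustomer(string customerId, string newName)
        {
            // "NorthwindDataContext" stands in for whatever DataContext class the
            // .dbml designer generated; "Customers" is one of its mapped tables.
            using (var db = new NorthwindDataContext())
            {
                var customer = db.Customers.Single(c => c.CustomerID == customerId);
                customer.ContactName = newName;

                // One call persists all tracked changes; LINQ to SQL writes the
                // parameterized UPDATE for you.
                db.SubmitChanges();
            }
        }
    }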
It looks like a lot of people are recommending that you hand-code your DAL first, before you use an ORM like LINQ to SQL. I would just like to point out that the logic in this line of thinking would also require us to learn IL before writing C# code, build a computer before we use one, and sail across the ocean before we take an international flight.
There's not really going to be a black-and-white answer for this, but in my experience, I've always been better off coding my own stuff. This has largely been because I'm just an anal-retentive obsessive-compulsive control freak, and I just don't trust wizards to write code the way I want it written. I'm sure that many people agree with me, just as I'm sure that many people disagree with me.
The fact that O/RMs exist is plenty of proof that you don't always need to roll your own code. The fact that they're not mandatory is also proof that you aren't compelled to use them.
Do whatever feels right and meets the needs of your solution and its time and budgetary constraints.
I have been developing many applications and have been confused about whether to use DataSets.
To date I have not used DataSets; I work in my applications directly against the database, using queries and procedures that run on the database engine.
But I would like to know what the good practice is:
Using a DataSet?
or
Working directly against the database?
Please also try to give me specific cases of when to use a DataSet together with insert/update operations.
Can we set a read/write lock on a DataSet with respect to our database?
You should either embrace stored procedures or make your database dumb. That means you have no logic whatsoever in your DB, only CRUD operations. If you go with the dumb-database model, DataSets are bad. You are better off working with real objects so you can add business logic to them. This approach is more complicated than just operating directly on your database with stored procs, but you can manage complexity better as your system grows. If you have a large system with lots of little rules, stored procedures become very difficult to manage.
In ye olde times before MVC was a mere twinkle in Haack's eye, it was jolly handy to have DataSet handle sorting, multiple relations and caching and whatnot.
Us real developers didn't care about such trivia as locks on the database. No, we had conflict resolution strategies that generally just stamped all over the most recent edits. User friendliness? < Pshaw >.
But in these days of decent generic collections, a plethora of ORMs and an awareness of separation of concerns they really don't have much place any more. It would be fair to say that whenever I've seen a DataSet recently I've replaced it. And not missed it.
As a rule of thumb, I would put logic that concerns data consistency, integrity, etc. as close to that data as possible, i.e. in the database. Also, if I have to fetch my data in a way that is interdependent (i.e. fetch from tables A, B and C, where the relationship between A, B and C's contributions is known at request time), then it makes sense to save on call-out overhead and do it in one go, via a database object such as a function or procedure (as already pointed out by OMGPonies). For logic that is a level or two removed, it makes sense to have it where dealing with it "procedurally" is a bit more intuitive, such as in a DataSet. Having said all that, rules of thumb are sometimes what their acronym implies... ROT!
In past .NET projects I've often done data imports/transformations (e.g. for bank transaction data files) in the database (one call-out, all logic encapsulated in a procedure and transaction protected), but have "parsed" items from that same data in a second stage in my .NET code, using DataTables and the like (although these days I would most likely skip the DataSet stage and work on them from a higher level of abstraction, using class objects).
I have seen DataSets used well in only one application, and that is across 7 years of development on quite a few different applications (at least double figures).
There are so many best practices around these days that point towards developing with objects rather than DataSets for enterprise development. Objects, along with an ORM like NHibernate or Entity Framework, can be very powerful and take a lot of the grunt work out of creating CRUD stored procedures. This is the way I favour developing applications, as I can separate business logic nicely this way in a domain layer.
That is not to say that DataSets don't have their place; I am sure that in certain circumstances they may be a better fit than objects, but I would need to be very sure of that before going with them.
I have also been wondering about this, having not needed DataSets in my source code for months.
Actually, if your objects are O/R-mapped and use serialization and generics, you will never need DataSets.
But DataSets do have one great use: generating reports.
This is because reports have no specific structure that can be, or should be, O/R-mapped.
I only use DataSets in tandem with reporting tools.
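For example, the data for a report might be pulled into a DataSet like this, purely as a container to hand to the reporting tool (the connection string, query, and table names are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    public class ReportData
    {
        public DataSet GetMonthlySales(string connectionString)
        {
            var ds = new DataSet("MonthlySales");
            using (var adapter = new SqlDataAdapter(
                "SELECT Region, SalesMonth, Total FROM SalesSummary", connectionString))
            {
                // Fill only; nothing is ever written back from this DataSet.
                adapter.Fill(ds, "SalesSummary");
            }
            return ds;
        }
    }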