I have come from an environment where I was taught to use objects and employ OOP techniques where possible and I guess it has guided me down a particular road and influenced my product designs for quite some time.
Typically my data access layer will have a number of classes which map onto database tables, so if I need to display a list of Companies, I will have a 'Company' object and a database table called 'company'. The company object knows how to instantiate itself from a DataRow from the database read using a 'SELECT * FROM company WHERE id = x' type query. So when-ever I display a companies list I will populate a list of company objects and display them. If I need to display attributes of the company I already have the data loaded.
It has been mentioned that 'select *' is frowned on and my object approach can be inefficient, but I am having issues identifying another way of working with a database table and objects which would work if you only read specific fields - the object just wouldn't be populated.
Yes I could change the list to directly query just the required fields from the database and display them but that means my UI code would need to be more closely linked to the data access code - I personally like the degree of separation of having the object separating the layers.
Always willing to learn though - I work on my own so don't always get up to speed with the latest technologies or methodologies so any comments welcome.
I don't think I can show you a definitive solution to that, but I'll try pointing you on the right direction (since it's more of a theoretical question).
Depending on the design pattern you follow on your application, you could decouple the data access layer from the UI and still follow this rule of not fetching all columns when they are not necessary. I mean, chosing the right design pattern for an application can bring you this sort of easyness.
For example, maybe you could interpret the less detailed version of an object as an object itself (which honestly I don't think it would be a good approach).
Also, I'll comment that the very popular rails ORM ActiveRecord only fetches from DB when the data is used. Maybe you could use a similar logic to track not only when but which columns will be used so you could limit the query.
Related
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time then blows a stack error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serialier ReferenceLoopHandling setting to Ignore. No luck. This is not cyclic graph issue, it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNotTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solveable, you'd basically end up just having a data layer which just wraps the database and destroys performance because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that an post is related to an account, which has many achievements, etc, etc. But usually all I want is the text and the name of the poster. And I don't want it for one post. I want it for each post in a page. Instead, write data services and methods which do things which are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message and returning entire EF objects containing large amounts of irrelevant data such as IDs, auditing data such as creation time.
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable Lazy Load and Proxy generation in EF. Your alternative would be to use DTO's instead of entities, so that the web services are returning a model object tailored to the service instead of the entity (as suggested by jameswilddev)
Either way will work, and has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
i had such a project which was the stressful one .... and also i needed to load large amount of data and process them from different angles and pass it to complex dashboard for charts and tables.
my optimization was :
1-instead of using ef to load data i called old-school stored procedure (and for more optimization grouping stuff to reduce table as much as possible for charts. eg query returns a table that multiple charts datasets can be extracted from it)
2-more important ,instead of Newtonsoft's JSON i used fastJSON which performance was mentionable( it is really fast but not compatible with complex object. simple example may be view models that have list of models inside and may so on and on or )
better to read pros and cons of fastJSON before
https://www.codeproject.com/Articles/159450/fastJSON
3-in relational database design who is The prime suspect of this problem it might be good to create those tables which have raw data to process in (most probably for analytics) denormalized schema which save performance on querying data.
also be ware of using model class from EF designer from database for reading or selecting data especially when u want serialize it(some times i think separating same schema model to two section of identical classes/models for writing and reading data in such a way that the write models has benefit of virtual collections came from foreign key and read models ignore it...i am not sure for this).
NOTE: in case of very very huge data its better go deeper and set up in-memory table OLTP for the certain table contains facts or raw data how ever in that case your table acts as none relational table like noSQL.
NOTE: for example in mssql you can use benefits of sqlCLR which let you write scripts in c#,vb..etc and call them by t-sql in other words handle data processing from database level.
4-for interactive view which needs load data i think its better to consider which information might be processed in server side and which ones can be handled by client side(some times its better to query data from client-side ... how ever you should consider that those data in client side can be accessed by user) how ever it is situation-wise.
5-in case of large raw data table in view using datatables.min.js is a good idea and also every one suggest using serverside-paging on tables.
6- in case of importing and exporting data from big files oledb is a best choice i think.
how ever still i doubt them to be exact solutions. if any body have practical solutions please mention it ;) .
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Define the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entity to private.
This will give you an object not exposing the sub-collections, and you will only get the main properties serialized. This workaround will require some restructuring of your LINQ queries, asking directly from your table of SubItems with the foreign key property as your filtering option like this:
var myFitnessClubs = context.FitnessClubs
?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may off-cause choose to implement this solution partly, hence only affecting the sub-collections that you strongly do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.
This is a bit of difficult question to ask but any feedback at all is welcome.
Ill start by the background, I am a university student studying software engineering last year we covered c# and I got myself a job working in a software house coding prototype software in c# (their main language is c++ using QT) after producing the prototype it was given to some clients which have all passed back positive feedback.
Now I am looking at the app and thinking well I could use this as a showcase with my CV esp as the clients who used the software have said that they will sign something to reference it.
So if I am going to do that then I had better get it right and do it to the best I possibly can. so I have started to look at it and think where I can improve it and one of the ways in which I think that I can is the way it handles the database connections and the data along with it.
the app itself runs along side a MySQL server and there is 6 different schemas which it gets its data from.
I have written a class (called it databaseHandler) which has the mysqlconnection in it ( one question was about if the connection should remain open the whole time the app is running, or open it fire a query then close it etc) inside this class I have written a method which takes some arguments and creates its query string which it then does the whole mysqlDataReader = cmd.executeReader(), this then returns the reader back to where ever it was called from.
After speaking to a friend he mentioned it might be nice if the method returned the raw data and not the reader, therefore keeping all the database "stuff" away from the main app.
After playing around I managed to find a couple of tutorials on putting the reader data into arrays and arraylists and passing then back, also had a go at passing back an array list of hashtables - these methods obv mean that the dev must know the column names in order to find the correct data.
then I stumbled across a page which went on about creating a Class which had the attributes of the column names and created a list which you could pull your data from:
http://zensoftware.org/archives/248 is the link
so this made me think, in order to use this method, would I need to create 6 classes with the attributes of the columns of my tables ( a couple of tables has up to 10-15 columns)? or is there a better way for me to handle my data?
I am not really clued up on these things but if pointed in the right direction I am a very fast learner :)
Again I thank you for any input what so ever.
Vade
You have a lot of ideas that are very close but are pretty common problems, but good that you are actively thinking about how to handle them!
On the question about leaving the connection open for the whole program or only having it open during the actual query time. The common (and proper) way to do this is only have the connection open as much as you need it, so
MySqlConnection cn = new MySqlConnection(yourConnectionString);
//Execute your queries
cn.close();
This is better since you don't risk leaving open connections, or having transaction issues typing up databases and resources.
With the having just the data returned and not the actual datareader this is a good idea but by just returning the data as an ArrayList or whatever you are kind of losing the structure of the data a little.
A good way to do this would be to either have your class just take the datareader to populate it's data OR have the Data Layer just return an instance of your class after reading the data.
I believe that it would be an excellent approach if your data access class returned a custom class populated with data from the database. That would be object-oriented. Instead of, say, returning a DataSet or an array containing customer information, you would create a Customer class with properties. Then, when you retrieve the data from the database, you populate an instance of the Customer class with the data, and return it to the calling code.
A lot of the newer Microsoft technologies are focusing on making this task easier. Quite often, there are many more than 6 classes needed, and writing all that code can seem like drudgery. I would suggest that, if you are interested in learning about those newer approaches, and possibly adapting them to your own code, you can check out Linq to SQL and Entity Framework.
one question was about if the connection should remain open the whole
time the app is running, or open it fire a query then close it etc
You want to keep the connection open as little as possible. Therefore you should open on each data request and close it as soon as you are done. You should also dispose it but if your database stuff is inside a C# using statement that happens automatically.
As far as the larger question on how to return the data to your application you are on the right track. You typically want to hide the raw database from the rest of your application and mapping the raw data to other intermediate classes is the correct thing to do.
Now, how you do this mapping is a very large topic. Ideally you don't want to create classes that map one to one your tables/columns but rather provide your app a more app-friendly representation of the data (e.g. business objects rather than database tables.) For example, if your employee data is split in to or three tables for normalization purposes you can hide this complexity and present the information as a single Employee class that binds the data from the other tables together.
Abstracting away your data access code using objects is known as Object/Relational mapping. It's actually a much more complex task than it appears at first sight. There are several libraries, even in the framework itself, that already do very well what you're trying to implement.
If your needs are very simple, look into typed DataSets. They let you create the table classes in a designer and also generate objects that will do the loading and saving for you (given certain limitations)
If your needs are less simple, but still pretty simple, I recommend you take a look at Linq To SQL to see if it covers your needs, as it does table-class mapping in a very straightforward way and uses a more modern usage pattern than DataSets.
There are also more complex ORMs that allow you to define more complex mappings, like Entity Framework or nHibernate, but very often their complexity is not necessary.
Details like connection lifetime will then depend on your specific needs. Sometimes it's best to keep the connection open, if you have a lot of queries caused by user interaction, like is usually the case with a desktop app. Other times it's best to keep them as short as possible to avoid congestion, like the case of web apps.
Whichever technology you choose will likely end up guiding you onto a good set of practices for it, and the best you can do is try things out and see which works best for you.
I have created domain model and define entities, value objects, Services and so on. Now, my query is that i have an entity called "Company" which has around 20+ attributes and i want to bind a dropdownlist control in one of my page that require only two attributes i.e. company.Name, company.Id. Why should i use such a heavy entity having 20+ attributes to bind the dropdownlist.
Is there any way to handle this considering the performance in mind as well.
Thanks in advance.
Question is not that much DDD related. It's about performance.
As I do - I do not care if there is 1 or 20 attributes as long as those attributes does not come from separate tables. There isn't high difference if select picks up 1 or 20 fields, but there is noticeable difference if select starts to join other tables and there is highly noticeable difference when there's select n+1.
So - when I retrieve list of Company through my ORM in order to create selectlist, it is smart enough to run sql select only over Company table and lazy load other things if they are necessary (which aren't in this particular case).
Luckily - I'm not developing systems that demands ultra high performance so I don't care if it takes 1 or 20 fields. If I did - I doubt that I would use ORM anyway.
For some other persistence mechanisms - it might not be a problem at all. E.g. - if You are using document database, You can store/retrieve whole aggregate into one document cause it's schema less. Performance penalty goes down drastically.
In this scenario I would either consider a Reference Data Service returning lightweight representations of the domain model or preferably (for simplicity) implement caching. Either way remember this is only something you should really worry about after you've identified performance of the architecture to be an issue.
Create a generic key/value model and expose a method from your service that returns your company name / company id list. This is similar to other value type data you query your DB for (ex state code / state id).
DDD is not about performance. With DDD you should operate "aggregates" - consistent set of connected entities. Aggregates should be loaded from datatbase into application memory.
Ah-hoc queries (actually almolst all queries) considered in DDD as "Reporting", and should be done by separate mechanisms like denirmalized databases.
http://devlicio.us/blogs/casey/archive/2009/02/13/ddd-command-query-separation-as-an-architectural-concept.aspx
it's bullshit. Don't use DDD.
Write queries when you need. Use Linq to compose-decompose queries.
In many posts concerning this topic I come across very simple examples that do not answer my question.
Let's say a have a document table and user table. In DAL written in ADO.NET i have a method to retries all documents for some criteria. Now I the UI I have a case where I need to show this list along with the names of the creator.
Up to know I have it done with one method in DAL containig JOIN statement.
However eveytime I have such a complex method i have to do custom mapping to some object that doesn't mark 1:1 to DB.
Should it be put into another layer ? If so then I will have to resing from join query for iteration through results and querying each document author. . . which doen't make sense... (performance)
what is the best approach for such scenarios ?
For your ui my suggestion is to have a dto (a viewmodel for those mvp/mvc people) hold the user's data and the corresponding list of documents.
Custom mapping will always be present so I suggest you take a look at Automapper here to ease those mapping pains.
I ran into the same thing in the past while creating my own custom data access layers. You want your objects to map one to one to your db, but many times you just need to write one off custom functions to retrieve inner joining data. I would not put these custom actions into their own layer.
At times, what I have done was created a general class that was responsible for retrieving data for grids, combo boxes, etc, that joined information from a number of tables. This class would return custom objects containing the retrieved results. I you are not satisfied with a tool that performs automatic custom mapping for you, I would suggest creating your own auto mapping class builder utility.
As long as you split your app into data access, business, and UI layers I think you are headed in the right direction.
I'm looking for pointers and information here, I'll make this CW since I suspect it has no single one correct answer. This is for C#, hence I'll make some references to Linq below. I also apologize for the long post. Let me summarize the question here, and then the full question follows.
Summary: In a UI/BLL/DAL/DB 4-layered application, how can changes to the user interface, to show more columns (say in a grid), avoid leaking through the business logic layer and into the data access layer, to get hold of the data to display (assuming it's already in the database).
Let's assume a layered application with 3(4) layers:
User Interface (UI)
Business Logic Layer (BLL)
Data Access Layer (DAL)
Database (DB; the 4th layer)
In this case, the DAL is responsible for constructing SQL statements and executing them against the database, returning data.
Is the only way to "correctly" construct such a layer to just always do "select *"? To me that's a big no-no, but let me explain why I'm wondering.
Let's say that I want, for my UI, to display all employees that have an active employment record. By "active" I mean that the employment records from-to dates contains today (or perhaps even a date I can set in the user interface).
In this case, let's say I want to send out an email to all of those people, so I have some code in the BLL that ensures I haven't already sent out email to the same people already, etc.
For the BLL, it needs minimal amounts of data. Perhaps it calls up the data access layer to get that list of active employees, and then a call to get a list of the emails it has sent out. Then it joins on those and constructs a new list. Perhaps this could be done with the help of the data access layer, this is not important.
What's important is that for the business layer, there's really not much data it needs. Perhaps it just needs the unique identifier for each employee, for both lists, to match upon, and then say "These are the unique identifiers of those that are active, that you haven't already sent out an email to". Do I then construct DAL code that constructs SQL statements that only retrieve what the business layer needs? Ie. just "SELECT id FROM employees WHERE ..."?
What do I do then for the user interface? For the user, it would perhaps be best to include a lot more information, depending on why I want to send out emails. For instance, I might want to include some rudimentary contact information, or the department they work for, or their managers name, etc., not to say that I at least name and email address information to show.
How does the UI get that data? Do I change the DAL to make sure I return enough data back to the UI? Do I change the BLL to make sure that it returns enough data for the UI? If the object or data structures returned from the DAL back to the BLL can be sent to the UI as well, perhaps the BLL doesn't need much of a change, but then requirements of the UI impacts a layer beyond what it should communicate with. And if the two worlds operate on different data structures, changes would probably have to be done to both.
And what then when the UI is changed, to help the user even further, by adding more columns, how deep would/should I have to go in order to change the UI? (assuming the data is present in the database already so no change is needed there.)
One suggestion that has come up is to use Linq-To-SQL and IQueryable, so that if the DAL, which deals with what (as in what types of data) and why (as in WHERE-clauses) returned IQueryables, the BLL could potentially return those up to the UI, which could then construct a Linq-query that would retrieve the data it needs. The user interface code could then pull in the columns it needs. This would work since with IQuerables, the UI would end up actually executing the query, and it could then use "select new { X, Y, Z }" to specify what it needs, and even join in other tables, if necessary.
This looks messy to me. That the UI executes the SQL code itself, even though it has been hidden behind a Linq frontend.
But, for this to happen, the BLL or the DAL should not be allowed to close the database connections, and in an IoC type of world, the DAL-service might get disposed of a bit sooner than the UI code would like, so that Linq query might just end up with the exception "Cannot access a disposed object".
So I'm looking for pointers. How far off are we? How are you dealing with this? I consider the fact that changes to the UI will leak through the BLL and into the DAL a very bad solution, but right now it doesn't look like we can do better.
Please tell me how stupid we are and prove me wrong?
And note that this is a legacy system. Changing the database schema isn't in the scope for years yet, so a solution to use ORM objects that would essentially do the equivalent of "select *" isn't really an option. We have some large tables that we'd like to avoid pulling up through the entire list of layers.
This is not at all an easy problem to solve. I have seen many attempts (including the IQueryable approach you describe), but none that are perfect. Unfortunately we are still waiting for the perfect solution. Until then, we will have to make do with imperfection.
I completely agree that DAL concerns should not be allowed to leak through to upper layers, so an insulating BLL is necessary.
Even if you don't have the luxury of redefining the data access technology in your current project, it still helps to think about the Domain Model in terms of Persistence Ignorance. A corrolary of Persistence Ignorance is that each Domain Object is a self-contained unit that has no notion of stuff like database columns. It is best to enforce data integretiy as invariants in such objects, but this also means that an instantiated Domain Object will have all its constituent data loaded. It's an either-or proposition, so the key becomes to find a good Domain Model that ensures that each Domain Object hold (and must be loaded with) an 'appropriate' amount of data.
Too granular objects may lead to chatty DAL interfaces, but too coarse-grained objects may lead to too much irrelevant data being loaded.
A very important exercise is to analyze and correctly model the Domain Model's Aggregates so that they strike the right balance. The book Domain-Driven Design contains some very illuminating analyses of modeling Aggregates.
Another strategy which can be helpful in this regard is to aim to apply the Hollywood Principle as much as possible. The main problem you describe concerns Queries, but if you can shift your focus to be more Command-oriented, you may be able to define some more coarse-grained interfaces that doesn't require you to always load too much data.
I'm not aware of any simple solution to this challenge. There are techniques like the ones I described above that can help you address some of the issues, but in the end it is still an art that takes experience, skill and discipline.
Use the concept of a view model (or data transfer objects) that are UI consumption cases. It will be the job of the BLL to take these objects and if the data is incomplete, request additional data (which we call model). Then the BLL can make correct decisions about what view models to return. Don't let your model (data) specifics permeate to the UI.
UI <-- (viewmodel) ---> BLL <-- (model) --> Peristence/Data layers
This decoupling lets to scale you application better. The persistence independence I think just naturally falls out of this approach, as construction and specification of the view models could done flexibly in the BLL by using linq2ql or another orm technology.