For a large project, would it make sense to have one datacontext which maps out your database, which you can interact with from your classes?
Or would it make more sense to split it into smaller datacontexts, each focused on the specific tasks that will be required within the database?
I'm curious as to the performance. It's my understanding that the datacontext itself is a very lightweight object, which only initializes its internal collections as they're required. Therefore, dealing with a datacontext with many definitions but only two tables of data should be as fast as dealing with a special datacontext containing only those two tables.
I also think you would benefit at JIT time, since the first class to do data access will cause your datacontext to be compiled, after which it is available to all classes.
I'm assuming you're asking about design-time versus runtime structure. In general I'd say no:
While it may be possible to partition your database into multiple data contexts, that would be desirable only if there were zero overlap between the contexts.
Overlap is Bad
e.g. you have a WebsiteContext and an AdminContext. The WebsiteContext is for displaying Product and fulfilling Orders. A WebsiteUser is attached to an Order. The AdminContext is for your Staff members to process refunds for cancelled Orders, which also reference WebsiteUser. The AdminContext also needs to reset passwords and update other details for the WebsiteUser.
You're thinking of doing this because you don't want the website to process, or even know about, Returns.
WebsiteContext
Product -- Order -- WebsiteUser
AdminContext
Staff -- Returns -- Order -- WebsiteUser
In the above, we can see we're duplicating many objects in the different data contexts. This smells bad, and it really indicates that artificially dividing the database into different data contexts is the wrong decision. Do you have two databases after all, or just the one? The duplication violates the DRY principle (Don't Repeat Yourself), because WebsiteContext.WebsiteUser is not the same as AdminContext.WebsiteUser, and in all likelihood the code is going to be messy whenever something needs to care which one it's referencing.
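To see why the duplication bites, consider this minimal sketch (the namespaces and members are assumed for illustration): the compiler treats the two generated WebsiteUser classes as unrelated types, even though they map the same table, so any code bridging the contexts ends up copying fields by hand.

// Hypothetical shapes of the two generated classes; same table, two types.
namespace Website
{
    public class WebsiteUser { public int Id; public string Email; }
}

namespace Admin
{
    public class WebsiteUser { public int Id; public string Email; }
}

class Bridge
{
    static Admin.WebsiteUser ToAdmin(Website.WebsiteUser user)
    {
        // Admin.WebsiteUser u = user;   // does not compile: unrelated types
        return new Admin.WebsiteUser { Id = user.Id, Email = user.Email };
    }
}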
The Linq Data Context is just an OR mapper, and needs to be treated as a fancy black box that makes writing some of the data access code easier. Some linq demos make it look like you no longer need the other layers, but a program of any complexity still benefits from a layered design.
You're probably better off treating the Linq objects as just objects for easily transferring data and create a Domain layer that hides them as an implementation detail. Have a read of DDD - Domain Driven Design.
On their own, just using the Linq objects from the UI most resembles the Transaction Script pattern. In such a case you'd still benefit from having a logic layer that takes care of the details.
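As a rough illustration of such a layer, here is a minimal sketch of a repository that hides the data context as an implementation detail; ShopDataContext and Product stand in for generated Linq to SQL types, so the names are assumptions.

using System.Linq;

// The UI and logic layers depend only on IProductRepository; the
// DataContext never leaks above the data access layer.
public interface IProductRepository
{
    Product GetById(int id);
    void Add(Product product);
}

public class ProductRepository : IProductRepository
{
    public Product GetById(int id)
    {
        using (var dc = new ShopDataContext())   // assumed generated context
        {
            return dc.Products.SingleOrDefault(p => p.Id == id);
        }
    }

    public void Add(Product product)
    {
        using (var dc = new ShopDataContext())
        {
            dc.Products.InsertOnSubmit(product);  // Linq to SQL insert API
            dc.SubmitChanges();
        }
    }
}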
While you may not want the responsibility of a context to be so broad, the data context is just a representation of the database. It's not a security mechanism, and it can't prevent you from corrupting the data.
From a "repository pattern" angle, there is something to be said for having separate aggregates and minimizing the navigation properties between aggregates. I'm not 100% sure what that something is though... at the moment I'm considering using a moderately-sized dbml, with multiple repositories using it (the UI doesn't use the datacontexts directly - only the repository classes), with the navigation properties marked internal so the DAL can use them... Maybe...
I've been learning DDD for the past few days and struggling to understand some core concepts of aggregate roots. Maybe somebody could give me a push in the right direction and elaborate on what's best practice in this scenario:
To make this example less complex, let's say we have a domain of two entities: restaurant and opening time. Restaurant is defined as aggregate root.
From what I understood in all the online examples (correct me if I'm mistaken), the aggregate root eagerly loads all sub-entities every time I need an instance of it. So whenever I want to call a method on the restaurant, all opening times are loaded (whether they are used or not).
In this example I want to verify that there is no intersection of opening times when a new time gets added to that restaurant. In this case the eager loading of every other opening time makes sense, because I need to compare the new one with the existing ones.
BUT: this is kind of restricting, because I know that every time I want to add another collection of something (let's say restaurant images), the SQL load gets heavier and heavier, even though most of the methods only require one of the collections.
I could think of two possible solutions:
Lazy loading
Lazy loading opening times / sub-entities through Entity Framework property proxies, so the aggregate root can exist without eager loading them, but they can be accessed whenever they are needed.
However, everywhere I searched for an answer I read that lazy loading in aggregate roots is considered bad practice. Maybe somebody could explain why.
Smaller aggregate roots
Of course I could define the opening time itself as an aggregate root but then I need to take the business logic (in this case the verification of intersections) outside of the model.
In all the examples above I'm only talking about the command side (not querying or serializing).
Maybe I'm missing some fundamental ideas. How should the aggregate roots be organized in this example and why is lazy loading considered bad practice?
EDIT
Not sure why this question was closed as "opinion based". I'm asking for best practice and why lazy loading is not best practice in this case.
However, everywhere I searched for an answer I read that lazy loading in aggregate roots is considered bad practice.
That is not exactly the case. The actual restriction is against lazy loading other aggregates through direct references held in an aggregate root; referring to other aggregates by their identifiers is recommended instead. This restriction has very good reasons behind it.
Holding references to other aggregates unnecessarily increases the application's memory footprint (retrieving an entity that is not going to be used in a transaction); in highly concurrent use cases, where locks are in effect, it degrades application performance (unnecessarily locking an entity); and it hampers data partitioning (both aggregates need to be processed in the same data node).
Each aggregate root defines its own consistency boundary; each transaction is meant to ensure one aggregate's consistency. Update operations (or transactions) across aggregates, communicated through domain events, are supposed to be eventually consistent.
If you are holding direct reference to another aggregate, thereby necessitating lazy loading, and performing updates on them, you should rethink your design.
The choice in your scenario, as usual, depends on the business context or the domain it deals with. If you think OpeningTime is a separate aggregate, with its own consistency boundary, you should hold only its id and publish domain events containing that id, which a handler will process by retrieving the appropriate OpeningTime aggregate. If, however, that is not the case (which seems more likely), you may very well hold a reference, lazy load, and perform updates on it.
This problem classifies as a set validation problem and there's a few potential solutions depending on the domain needs.
Strong Consistency
Create an AR to protect the entire set. In the above example, if the set has a reasonable length, you may create a dedicated RestaurantSchedule AR. You may even partition such an AR further into a RestaurantWeekSchedule, where you'd have one AR for each week. When you want to add/remove opening days, it would create/load the AR for the given week. Another option could be to tweak the ORM to load only a subset of the collection, e.g. schedule = scheduleRepo.loadForWeek(openingTime.week()); schedule.add(openingTime);. Optimistic locking would still allow you to detect conflicts.
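To make that concrete, here's a rough C# sketch of the week-partitioned aggregate idea; all type and member names (OpeningTime, RestaurantWeekSchedule, the repository calls) are illustrative, not from a real codebase.

using System;
using System.Collections.Generic;
using System.Linq;

public class OpeningTime
{
    public DayOfWeek Day { get; set; }
    public TimeSpan From { get; set; }
    public TimeSpan To { get; set; }

    public bool OverlapsWith(OpeningTime other) =>
        Day == other.Day && From < other.To && other.From < To;
}

public class RestaurantWeekSchedule
{
    private readonly List<OpeningTime> _openingTimes = new List<OpeningTime>();

    public Guid RestaurantId { get; }
    public int WeekNumber { get; }

    public RestaurantWeekSchedule(Guid restaurantId, int weekNumber)
    {
        RestaurantId = restaurantId;
        WeekNumber = weekNumber;
    }

    public void Add(OpeningTime openingTime)
    {
        // The whole week's set sits inside the aggregate boundary, so the
        // no-overlap invariant can be checked within one transaction.
        if (_openingTimes.Any(existing => existing.OverlapsWith(openingTime)))
            throw new InvalidOperationException("Opening times must not overlap.");

        _openingTimes.Add(openingTime);
    }
}

// Usage mirroring the pseudocode above:
//   var schedule = scheduleRepo.LoadForWeek(openingTime.Week);
//   schedule.Add(openingTime);
//   scheduleRepo.Save(schedule); // optimistic locking detects conflicts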
Enforce the rule in the DB. If you have a relational DB, for instance, you may have OpeningTime as an AR and then use unique constraints to prevent violations. The rule would end up living outside the domain, but it may be acceptable. Uniqueness rules aren't that interesting.
Eventual Consistency
A very common way to deal with set validation problems is to stop trying to prevent the violation and instead focus on detecting and correcting it after the fact, through either manual or automated compensating actions. This can be as simple as an exception report that displays overlapping entries, or more complex, such as a single-threaded listener to OpeningTimeAdded events that checks for conflicts and marks such entries for correction.
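A sketch of the automated variant; every name here (OpeningTimeAdded, IOpeningTimeReader, IConflictReport, StoredOpeningTime) is assumed for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

public class OpeningTimeAdded
{
    public Guid RestaurantId { get; set; }
    public Guid OpeningTimeId { get; set; }
}

public class StoredOpeningTime
{
    public Guid Id { get; set; }
    public DayOfWeek Day { get; set; }
    public TimeSpan From { get; set; }
    public TimeSpan To { get; set; }

    public bool OverlapsWith(StoredOpeningTime other) =>
        Day == other.Day && From < other.To && other.From < To;
}

public interface IOpeningTimeReader
{
    IReadOnlyList<StoredOpeningTime> ByRestaurant(Guid restaurantId);
}

public interface IConflictReport
{
    void MarkForCorrection(Guid restaurantId, Guid openingTimeId);
}

public class OverlapDetector
{
    private readonly IOpeningTimeReader _reader;
    private readonly IConflictReport _report;

    public OverlapDetector(IOpeningTimeReader reader, IConflictReport report)
    {
        _reader = reader;
        _report = report;
    }

    // Run single-threaded (or on a per-restaurant queue) so two concurrent
    // OpeningTimeAdded events for one restaurant cannot race past the check.
    public void Handle(OpeningTimeAdded e)
    {
        var times = _reader.ByRestaurant(e.RestaurantId);
        var added = times.Single(t => t.Id == e.OpeningTimeId);

        if (times.Any(t => t.Id != added.Id && t.OverlapsWith(added)))
            _report.MarkForCorrection(e.RestaurantId, e.OpeningTimeId);
    }
}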
As for your question why lazy loading is considered bad practice:
One issue I have with lazy loading is that you need to be constantly aware of potential performance issues. On the other hand, lazy loading is a decent default behaviour for ORMs. It is great for long-running applications, but it is pointless to implement it on top of your services. I think it depends a lot on the data you are using, so it's neither good nor bad.
The Problem
How are large collections implemented in DDD that "feel" like they should be a part of the aggregate root, yet would be impractical if they were? Here are a few examples based on my domain.
Employee Aggregate Root
Announcements Collection
Direct Messages Collection
Product Aggregate Root
Stock Items Collection
etc. etc..
What I'm Thinking
I would like to keep the ability to navigate to these large collections from the aggregate root, but since I'm wrapping my O/RM with repositories, lazy loading isn't really an option... unless I implement lazy loading by injecting the necessary repository. But I know from what I've read about DDD that domain entities should not know about any such repositories.
The other option would be to take the approach that any potentially large collection of entities in my domain is an Aggregate Root, and should have its own repository with the required interface to get the collection of items by another aggregate root. E.g.:
public interface IStockRepository
{
    IEnumerable<StockItem> FetchByProduct(Product product);
    // ...
}
This "I would like to keep the ability to navigate to these large collections from the aggregate root" ... is a smell. You seem pretty obsessed, if you don't mind me saying, with the structure of your aggregate and not with its behavior, what problem(s) it is solving, any invariants that come into play. Frankly, the feeling you have is misplaced. It's a residue of our structural, database oriented way of thinking.
In general, I'd say, one should not have these large collections in the first place. For one, loading them requires resources (memory, CPU, bandwidth) better spent elsewhere. From a more functional perspective, people tend not to deal with large amounts at once anyway, and even computers can do more work when you break things down into units of work. As such, try to stay away from large collections and always question why you'd need them in the first place.
An announcement could be its own aggregate, referring to the employee by its id, so we know who the announcement was about (or for?). If the announcements are targeted at groups of employees, you might want to look into what defines that group, and model it explicitly. A direct message could also be its own aggregate, because it is probably a message from one person to another; one could say the employee has the role of being a message recipient and/or sender. Again, referring to the employee aggregate by id might suffice. A stock item might be treated individually and refer to the product it represents within the stock by its product id. What is the behavior of an employee, an announcement, a direct message, a product, a stock item? How and when does changing the state of its collaborators affect them, and really, why is that? It's a means to a root cause. Find it.
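For instance, a minimal sketch of the id-reference style for announcements; all names are illustrative.

using System;
using System.Collections.Generic;

public class Announcement
{
    public Guid Id { get; private set; }
    public Guid EmployeeId { get; private set; } // identity reference, no navigation property
    public string Text { get; private set; }
    public DateTime PublishedOn { get; private set; }

    public Announcement(Guid employeeId, string text, DateTime publishedOn)
    {
        Id = Guid.NewGuid();
        EmployeeId = employeeId;
        Text = text;
        PublishedOn = publishedOn;
    }
}

// The Employee aggregate stays small; announcements are retrieved on demand
// (and paged), instead of living in a huge collection on Employee.
public interface IAnnouncementRepository
{
    IEnumerable<Announcement> FetchByEmployee(Guid employeeId, int page, int pageSize);
}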
All that said, there are times when you can bend the rules a bit, but they should be few.
Take a look at the Forum DDD example from Vaughn Vernon. He modeled the large collections out of the aggregate root. Creation is done by a factory method on the aggregate to keep control of some things, e.g. a discussion cannot be created when the Forum is closed. Actions are done through the AR Forum (like startDiscussion and moderatePost).
The method returns an entity (Post) that needs to be saved in a separate repository (PostRepository) by the application service. Now you can have large collections without having to load them every time.
https://github.com/VaughnVernon/IDDD_Samples/tree/master/iddd_collaboration/src/main/java/com/saasovation/collaboration/domain/model/forum
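A condensed C# paraphrase of that pattern (the actual sample is Java; the type and method names below follow the description above, not the repository verbatim):

using System;

public class Forum
{
    public Guid Id { get; private set; }
    public bool IsClosed { get; private set; }

    // Factory method on the AR: the Forum enforces its rules, but the
    // Discussion it creates is a separate aggregate with its own repository.
    public Discussion StartDiscussion(string subject)
    {
        if (IsClosed)
            throw new InvalidOperationException("A closed forum cannot start a discussion.");

        return new Discussion(Id, subject);
    }
}

public class Discussion
{
    public Guid Id { get; private set; }
    public Guid ForumId { get; private set; } // back-reference by id only
    public string Subject { get; private set; }

    public Discussion(Guid forumId, string subject)
    {
        Id = Guid.NewGuid();
        ForumId = forumId;
        Subject = subject;
    }
}

// Application service:
//   var discussion = forum.StartDiscussion("Large collections in DDD");
//   discussionRepository.Save(discussion); // no giant collection on Forum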
I seem to be missing something, and extensive use of Google didn't help to improve my understanding...
Here is my problem:
I like to create my domain model in a persistence ignorant manner, for example:
I don't want to add virtual if I don't need it otherwise.
I don't like to add a default constructor, because I like my objects to always be fully constructed. Furthermore, the need for a default constructor is problematic in the context of dependency injection.
I don't want to use overly complicated mappings, because my domain model uses interfaces or other constructs not readily supported by the ORM.
One solution to this would be to have separate domain objects and data entities. Retrieval of the constructed domain objects could easily be solved using the repository pattern and building the domain object from the data entity returned by the ORM. Using AutoMapper, this would be trivial and not too much code overhead.
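A minimal sketch of that split, assuming placeholder CustomerEntity/Customer types and the AutoMapper 5+ style API; the actual ORM query is elided behind a stand-in method.

using AutoMapper;

public class CustomerEntity              // persistence model, owned by the ORM
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Customer                    // domain model, persistence ignorant
{
    public Customer(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

public class CustomerRepository
{
    private static readonly IMapper Mapper = new MapperConfiguration(cfg =>
        cfg.CreateMap<CustomerEntity, Customer>()
           .ConstructUsing(e => new Customer(e.Id, e.Name))).CreateMapper();

    public Customer GetById(int id)
    {
        CustomerEntity entity = LoadEntity(id); // the real ORM query is elided
        return Mapper.Map<Customer>(entity);
    }

    private CustomerEntity LoadEntity(int id) =>
        new CustomerEntity { Id = id, Name = "placeholder" };
}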
But I have one big problem with this approach: It seems that I can't really support lazy loading without writing code for it myself. Additionally, I would have quite a lot of classes for the same "thing", especially in the extended context of WCF and UI:
Data entity (mapped to the ORM)
Domain model
WCF DTO
View model
So, my question is: What am I missing? How is this problem generally solved?
UPDATE:
The answers so far suggest what I already feared: It looks like I have two options:
Make compromises on the domain model to match the prerequisites of the ORM and thus have a domain model the ORM leaks into
Create a lot of additional code
UPDATE:
In addition to the accepted answer, please see my answer for concrete information on how I solved those problems for me.
I would question that matching the prereqs of an ORM is necessarily "making compromises". However, some of these are fair points from the standpoint of a highly SOLID, loosely-coupled architecture.
An ORM framework exists for one sole reason: to take a domain model implemented by you and persist it into a similar DB structure, without you having to implement a large number of bug-prone, near-impossible-to-unit-test SQL strings or stored procedures. ORMs also easily implement concepts like lazy loading: hydrating an object at the last minute before it is needed, instead of building a large object graph yourself.
If you want stored procs, or have them and need to use them (whether you want to or not), most ORMs are not the right tool for the job. If you have a very complex domain structure, such that the ORM cannot map the relationship between a field and its data source, I would seriously question why you are using that domain and that data source. And if you want 100% POCO objects with no knowledge of the persistence mechanism behind them, you will likely end up doing an end run around most of the power of an ORM: if the domain doesn't have virtual members or child collections that can be replaced with proxies, you are forced to eager-load the entire object graph (which may well be impossible if you have a massive interlinked object graph).
While ORMs do require your domain design to know something about the persistence mechanism, an ORM still results in much more SOLID designs, IMO. Without an ORM, these are your options:
Roll your own Repository that contains a method to produce and persist every type of "top-level" object in your domain (a "God Object" anti-pattern)
Create DAOs that each work on a different object type. These types require you to hard-code the get and set between ADO DataReaders and your objects; in the average case a mapping greatly simplifies the process. The DAOs also have to know about each other; to persist an Invoice you need the DAO for the Invoice, which needs a DAO for the InvoiceLine, Customer and GeneralLedger objects as well. And, there must be a common, abstracted transaction control mechanism built into all of this.
Set up an ActiveRecord pattern where objects persist themselves (and put even more knowledge about the persistence mechanism into your domain)
Overall, the second option is the most SOLID, but more often than not it turns into a beast-and-two-thirds to maintain, especially when dealing with a domain containing backreferences and circular references. For instance, for fast retrieval and/or traversal, an InvoiceLineDetail record (perhaps containing shipping notes or tax information) might refer directly to the Invoice as well as the InvoiceLine to which it belongs. That creates a 3-node circular reference that requires either an O(n^2) algorithm to detect that the object has been handled already, or hard-coded logic concerning a "cascade" behavior for the backreference. I've had to implement "graph walkers" before; trust me, you DO NOT WANT to do this if there is ANY other way of doing the job.
So, in conclusion, my opinion is that ORMs are the least of all evils given a sufficiently complex domain. They encapsulate much of what is not SOLID about persistence mechanisms, and reduce knowledge of the domain about its persistence to very high-level implementation details that break down to simple rules ("all domain objects must have all their public members marked virtual").
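To make that last rule concrete, a sketch with assumed names: marking members virtual is what lets the ORM generate a runtime proxy subclass that intercepts access and loads Customer and Lines only when first touched; without virtual, no proxy can be generated and the whole graph must be eager-loaded.

using System.Collections.Generic;

public class Invoice
{
    public virtual int Id { get; set; }
    public virtual Customer Customer { get; set; }              // lazy-loadable reference
    public virtual ICollection<InvoiceLine> Lines { get; set; } // lazy-loadable collection
}

public class InvoiceLine
{
    public virtual int Id { get; set; }
    public virtual decimal Amount { get; set; }
}

public class Customer
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}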
In short - it is not solved
All good points.
I don't have an answer (but this comment got too long when I decided to add something about stored procs), except to say that my philosophy seems to be identical to yours, and I code or code-generate.
Things like partial classes make this a lot easier than it used to be in the early .NET days. But ORMs (as a distinct "thing" as opposed to something that just gets done in getting to and from the database) still require a LOT of compromises and they are, frankly, too leaky of an abstraction for me. And I'm not big on having a lot of dupe classes because my designs tend to have a very long life and change a lot over the years (decades, even).
As far as the database side goes, stored procs are a necessity in my view. I know that ORMs support them, but most ORM users tend not to use them, and that is a huge negative for me - because they talk about a best practice and then couple to a table-based design, even if it is created from a code-first model. It seems to me they should look at an object datastore if they don't want to use a relational database in a way that utilizes its strengths. I believe in Code AND Database first - i.e. model the database and the object model simultaneously, back and forth, and then work inwards from both ends. I'm going to lay it out right here:
If you let your developers code an ORM against your tables, your app is going to have problems living for years. Tables need to change. More and more people are going to want to knock up against those entities, and now they are all using an ORM generated from tables. And you are going to want to refactor your tables over time. In addition, only stored procedures are going to give you any kind of usable role-based manageability without dealing with every table on a per-column GRANT basis - which is super-painful. If you program well in OO, you have to understand the benefits of controlled coupling. That's all stored procedures are - USE THEM so your database has a well-defined interface. Or don't use a relational database if you just want a "dumb" datastore.
Have you looked at the Entity Framework 4.1 Code First? IIRC, the domain objects are pure POCOs.
This is what we did on our latest project, and it worked out pretty well:
Use EF 4.1 with the virtual keyword on our business objects, plus our own custom implementation of the T4 template, wrapping the ObjectContext behind an interface for repository-style data access.
Use AutoMapper to convert between business objects and DTOs.
Use AutoMapper to convert between view models and DTOs.
You would think that view models, DTOs and business objects are the same thing, and they might look the same, but they have a very clear separation in terms of concerns.
View models are about the UI screen, DTOs are about the task you are accomplishing, and business objects are primarily concerned with the domain.
There are some compromises along the way, but if you want EF, the benefits outweigh the things you give up.
Over a year later, I have solved these problems for me now.
Using NHibernate, I am able to map fairly complex Domain Models to reasonable database designs that wouldn't make a DBA cringe.
Sometimes it is necessary to create a new implementation of the IUserType interface so that NHibernate can correctly persist a custom type. Thanks to NHibernate's extensible nature, that is no big deal.
I found no way to avoid adding virtual to my properties without losing lazy loading. I still don't particularly like it, especially because of all the warnings from Code Analysis about virtual properties without derived classes overriding them, but out of pragmatism, I can now live with it.
For the default constructor I also found a solution I can live with. I add the constructors I need as public constructors and I add an obsolete protected constructor for NHibernate to use:
[Obsolete("This constructor exists because of NHibernate. Do not use.")]
protected DataExportForeignKey()
{
}
I'm wrestling with a design and trying to figure out the best way of approaching it.
We have many tables, and in our current LinqToSql implementation, our DBML is many megs in size and very unwieldy. I want to avoid recreating this situation if I can. We decide our connection string on a per-user basis, so it got very difficult to make separate dbmls for different groups of tables.
I'm set on using Entity Framework, and although we don't need the Code First elements, I like the lightweight code without all the generation, and we don't need the visual mapping, so I was thinking of generating the code files for all the tables and then adding them into a DataContext as DbSets.
This got me thinking about best practice here, and I wanted to ask the question:
Is it wise to create a DataContext for every group of tables you want to use? I.e. I'm going to have a module; it will be responsible for gathering data from 5 tables. It doesn't need every single table in the database, just 5. Do I create a DbContext that includes these 5 tables? If I need more in the future I can add them in, but it's lightweight.
While you may have a separate context for each grouping of tables, if your model is that large, or your domains that disparate, you may want to look into adding a layer of abstraction. By this, I mean having a single context that encompasses your whole model, then adding something along the lines of the repository pattern. This is a decent write-up on accomplishing this with EF.
By doing this, you would essentially be accomplishing two goals: abstracting out your data tier, thus freeing up implementation concerns; and allowing your developers to work with just the entities they need, possibly grouped by aggregate root.
One thing I would like to make clear though. I am not necessarily suggesting that you go with a specific end-to-end architecture (i.e. DDD). What I am trying to do here is suggest a few patterns that will give you the flexibility to allow you to make mistakes (fail gracefully) while still making progress with your project.
You can certainly do this. You add tables to the edmx model just as in Linq2SQL, so by adding just the 5 tables you need, you avoid any entity-tracking overhead for the other, untracked tables. Entity Framework also nicely adds two-way navigation properties, which Linq2SQL doesn't have. I'd recommend using EF instead of Linq2SQL.
There is nothing inherently bad about a large DBML model, the performance impact should be negligible in EF.
On the other hand, in my opinion reducing complexity also applies to Entity Framework - if your code only needs 5 tables from the database, by all means create a separate context that only has the entities for those 5 tables. By factoring completely independent tables out into separate contexts you are expressing this separation in a clear way - there are no dependencies from these tables to other tables in your database, and no dependencies from the code to unrelated entities. If that is the case, I think (and there might be other opinions) this is the way to go.
However, keep in mind that if you need some of those tables in another context, you would have to put the corresponding entities into that context as well - it can get hard to understand why the same tables are present in multiple contexts, or you may even end up with cross-dependencies between contexts. That should be avoided, since it adds complexity.
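For illustration, a sketch of such a narrowly scoped context; the entity names below are placeholders for the module's five tables.

using System.Data.Entity; // EF 4.1+ code-first

public class Order { public int Id { get; set; } }
public class OrderLine { public int Id { get; set; } }
public class Customer { public int Id { get; set; } }

public class OrderingContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderLine> OrderLines { get; set; }
    public DbSet<Customer> Customers { get; set; }
    // Only what this module uses; other modules get their own contexts,
    // at the cost of duplicating any genuinely shared entities.
}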
I'm looking for pointers and information here, I'll make this CW since I suspect it has no single one correct answer. This is for C#, hence I'll make some references to Linq below. I also apologize for the long post. Let me summarize the question here, and then the full question follows.
Summary: In a UI/BLL/DAL/DB 4-layered application, how can changes to the user interface, to show more columns (say in a grid), avoid leaking through the business logic layer and into the data access layer, to get hold of the data to display (assuming it's already in the database).
Let's assume a layered application with 3(4) layers:
User Interface (UI)
Business Logic Layer (BLL)
Data Access Layer (DAL)
Database (DB; the 4th layer)
In this case, the DAL is responsible for constructing SQL statements and executing them against the database, returning data.
Is the only way to "correctly" construct such a layer to just always do "select *"? To me that's a big no-no, but let me explain why I'm wondering.
Let's say that I want, for my UI, to display all employees that have an active employment record. By "active" I mean that the employment record's from-to dates contain today (or perhaps even a date I can set in the user interface).
In this case, let's say I want to send out an email to all of those people, so I have some code in the BLL that ensures I haven't already sent out email to the same people already, etc.
For the BLL, it needs minimal amounts of data. Perhaps it calls up the data access layer to get that list of active employees, and then makes a call to get a list of the emails it has sent out. Then it joins on those and constructs a new list. Perhaps this could be done with the help of the data access layer; this is not important.
What's important is that for the business layer, there's really not much data it needs. Perhaps it just needs the unique identifier for each employee, for both lists, to match upon, and then say "These are the unique identifiers of those that are active, that you haven't already sent an email to". Do I then construct DAL code that builds SQL statements retrieving only what the business layer needs? I.e. just "SELECT id FROM employees WHERE ..."?
What do I do then for the user interface? For the user, it would perhaps be best to include a lot more information, depending on why I want to send out emails. For instance, I might want to include some rudimentary contact information, or the department they work for, or their manager's name, etc., not to mention that I at least need name and email address information to show.
How does the UI get that data? Do I change the DAL to make sure I return enough data back to the UI? Do I change the BLL to make sure that it returns enough data for the UI? If the objects or data structures returned from the DAL to the BLL can be sent to the UI as well, perhaps the BLL doesn't need much of a change, but then the requirements of the UI impact a layer beyond the one it should communicate with. And if the two worlds operate on different data structures, changes would probably have to be made to both.
And what about when the UI is changed to help the user even further, by adding more columns: how deep would/should I have to go to make that change? (Assuming the data is already present in the database, so no change is needed there.)
One suggestion that has come up is to use Linq-To-SQL and IQueryable: if the DAL, which deals with the what (as in what types of data) and the why (as in WHERE clauses), returned IQueryables, the BLL could potentially pass those up to the UI, which could then construct a Linq query that retrieves the data it needs. The user interface code could pull in the columns it needs. This would work since, with IQueryables, the UI would end up actually executing the query, and it could then use "select new { X, Y, Z }" to specify what it needs, and even join in other tables if necessary.
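For illustration, a sketch of that suggestion; MyDataContext, Employee and the column names are all assumed stand-ins for generated Linq to SQL types. Nothing executes until the UI enumerates the query.

using System;
using System.Linq;

public class EmployeeDal
{
    private readonly MyDataContext _dc = new MyDataContext();

    // The DAL decides the "what and why" (the WHERE clause)...
    public IQueryable<Employee> ActiveEmployees(DateTime asOf)
    {
        return _dc.Employees.Where(e => e.EmployedFrom <= asOf && asOf <= e.EmployedTo);
    }
}

// ...and the UI picks its columns, e.g.:
//   var rows = dal.ActiveEmployees(DateTime.Today)
//                 .Select(e => new { e.Name, e.Email, Department = e.Department.Name })
//                 .ToList();
// Note this only works while the underlying context is still alive.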
This looks messy to me: the UI ends up executing the SQL code itself, even though it has been hidden behind a Linq frontend.
But for this to happen, the BLL or the DAL must not be allowed to close the database connection, and in an IoC type of world, the DAL service might get disposed of a bit sooner than the UI code would like, so that Linq query might just end up with the exception "Cannot access a disposed object".
So I'm looking for pointers. How far off are we? How are you dealing with this? I consider the fact that changes to the UI will leak through the BLL and into the DAL a very bad solution, but right now it doesn't look like we can do better.
Please tell me how stupid we are and prove me wrong?
And note that this is a legacy system. Changing the database schema isn't in scope for years yet, so a solution of using ORM objects that would essentially do the equivalent of "select *" isn't really an option. We have some large tables that we'd like to avoid pulling up through the entire stack of layers.
This is not at all an easy problem to solve. I have seen many attempts (including the IQueryable approach you describe), but none that are perfect. Unfortunately we are still waiting for the perfect solution. Until then, we will have to make do with imperfection.
I completely agree that DAL concerns should not be allowed to leak through to upper layers, so an insulating BLL is necessary.
Even if you don't have the luxury of redefining the data access technology in your current project, it still helps to think about the Domain Model in terms of Persistence Ignorance. A corollary of Persistence Ignorance is that each Domain Object is a self-contained unit that has no notion of stuff like database columns. It is best to enforce data integrity as invariants in such objects, but this also means that an instantiated Domain Object will have all its constituent data loaded. It's an either-or proposition, so the key becomes finding a good Domain Model that ensures that each Domain Object holds (and must be loaded with) an "appropriate" amount of data.
Too granular objects may lead to chatty DAL interfaces, but too coarse-grained objects may lead to too much irrelevant data being loaded.
A very important exercise is to analyze and correctly model the Domain Model's Aggregates so that they strike the right balance. The book Domain-Driven Design contains some very illuminating analyses of modeling Aggregates.
Another strategy which can be helpful in this regard is to aim to apply the Hollywood Principle as much as possible. The main problem you describe concerns Queries, but if you can shift your focus to be more Command-oriented, you may be able to define some more coarse-grained interfaces that don't require you to always load too much data.
I'm not aware of any simple solution to this challenge. There are techniques like the ones I described above that can help you address some of the issues, but in the end it is still an art that takes experience, skill and discipline.
Use the concept of a view model (or data transfer objects) tailored to the UI's consumption cases. It is the job of the BLL to take these objects and, if the data is incomplete, request additional data (which we call the model). Then the BLL can make correct decisions about which view models to return. Don't let your model (data) specifics permeate to the UI.
UI <-- (view model) --> BLL <-- (model) --> Persistence/Data layers
This decoupling lets you scale your application better. The persistence independence, I think, just naturally falls out of this approach, as construction and specification of the view models can be done flexibly in the BLL by using Linq2Sql or another ORM technology.
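A minimal sketch of that flow, with all type names assumed for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

public class Employee                     // model: what the data layer returns
{
    public string Name { get; set; }
    public string Email { get; set; }
    public string DepartmentName { get; set; }
}

public interface IEmployeeRepository
{
    IEnumerable<Employee> GetActive(DateTime asOf);
}

public class EmployeeRowViewModel         // view model: shaped for one screen
{
    public string Name { get; set; }
    public string Email { get; set; }
    public string DepartmentName { get; set; }
}

public class EmployeeService              // BLL: maps model to view model
{
    private readonly IEmployeeRepository _repository;

    public EmployeeService(IEmployeeRepository repository) { _repository = repository; }

    public IList<EmployeeRowViewModel> GetActiveEmployeeRows(DateTime asOf)
    {
        // Adding a grid column later means touching this mapping (and maybe
        // the repository), but the UI never sees the data layer's types.
        return _repository.GetActive(asOf)
            .Select(e => new EmployeeRowViewModel
            {
                Name = e.Name,
                Email = e.Email,
                DepartmentName = e.DepartmentName
            })
            .ToList();
    }
}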