I've been learning DDD for the past few days and I'm struggling to understand some core concepts of aggregate roots. Maybe somebody could give me a push in the right direction and explain what best practice is in this scenario:
To make this example less complex, let's say we have a domain of two entities: restaurant and opening time. Restaurant is defined as aggregate root.
From what I understood in all the online examples (correct me if I'm mistaken), the aggregate root eagerly loads all sub-entities every time I need an instance of it. So whenever I want to call a method on the restaurant, all opening times are loaded (no matter whether they are used or not).
In this example I want to verify that there is no intersection of opening times when a new time gets added to the restaurant. In this case the eager loading of every other opening time makes sense, because I need to compare the new one with the existing ones.
BUT: this is kind of restricting, because I know that every time I add another collection of something (let's say restaurant images), the SQL load gets heavier and heavier, even though most of the methods only require one of the collections.
I could think of two possible solutions:
Lazy loading
Lazy loading opening times / sub entities through entity framework property proxies. So the aggregate root can exist without eager loading them but whenever they are needed they can be accessed.
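For reference, this is roughly what I mean, assuming EF Core's lazy-loading proxies (the types are just my simplified example and the connection string is a placeholder):

using System;
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

public class OpeningTime
{
    public Guid Id { get; private set; }
    public DayOfWeek Day { get; private set; }
    public TimeSpan From { get; private set; }
    public TimeSpan To { get; private set; }
}

public class Restaurant
{
    public Guid Id { get; private set; }
    // virtual so the proxy can intercept access and load the collection on demand
    public virtual ICollection<OpeningTime> OpeningTimes { get; private set; } = new List<OpeningTime>();
}

public class RestaurantContext : DbContext
{
    public DbSet<Restaurant> Restaurants => Set<Restaurant>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseLazyLoadingProxies()              // requires the Microsoft.EntityFrameworkCore.Proxies package
                  .UseSqlServer("<connection string>"); // placeholder
}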
However, everywhere I searched for an answer I read that lazy loading in aggregate roots is considered bad practice. Maybe somebody could explain why.
Smaller aggregate roots
Of course I could define the opening time itself as an aggregate root, but then I would need to move the business logic (in this case the verification of intersections) outside of the model.
In all the examples above I'm only talking about the command side (not querying or serializing).
Maybe I'm missing some fundamental ideas. How should the aggregate roots be organized in this example and why is lazy loading considered bad practice?
EDIT
Not sure why this question was closed as "opinion based". I'm asking for best practice and why lazy loading is not considered best practice in this case.
However, everywhere I searched for an answer I read that lazy loading in aggregate roots is considered bad practice.
That is not exactly the case. The actual restriction is against lazy loading other aggregates through direct references held by an aggregate root; referring to other aggregates by their identifiers is recommended instead. This restriction has very good reasons behind it.
Holding direct references to other aggregates unnecessarily increases the application's memory footprint (you retrieve an entity that is not going to be used in the transaction); in highly concurrent use cases where locks are in effect, it degrades performance (you lock an entity unnecessarily); and it hampers data partitioning (both aggregates need to be processed on the same data node).
Each aggregate root defines its own consistency boundary; each transaction is meant to ensure one aggregate's consistency. Update operations (or transactions) across aggregates, communicated through domain events, are supposed to be eventually consistent.
If you are holding direct reference to another aggregate, thereby necessitating lazy loading, and performing updates on them, you should rethink your design.
The choice in your scenario, as usual, depends on the business context or the domain it deals with. If you think OpeningTime is a separate aggregate with its own consistency boundary, you should hold only its id and publish domain events containing that id; a handler then retrieves the appropriate OpeningTime aggregate and applies the change. If, however, that is not the case (which seems more likely here), you may very well hold a reference, lazy load it, and perform updates on it.
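A rough sketch of the first option, with purely illustrative names: the restaurant keeps only identifiers and returns an event that a handler uses to load and update the OpeningTime aggregate.

using System;
using System.Collections.Generic;

public record OpeningTimeRequested(Guid RestaurantId, Guid OpeningTimeId);

public class Restaurant
{
    public Guid Id { get; private set; }
    private readonly List<Guid> _openingTimeIds = new();   // identifiers only, no direct references

    public OpeningTimeRequested RequestOpeningTime(Guid openingTimeId)
    {
        _openingTimeIds.Add(openingTimeId);
        // a handler of this event retrieves the OpeningTime aggregate and applies the change
        return new OpeningTimeRequested(Id, openingTimeId);
    }
}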
This problem classifies as a set validation problem, and there are a few potential solutions depending on the domain needs.
Strong Consistency
Create an AR to protect the entire set. In the above example, if such a set has a reasonable size, you may create a dedicated RestaurantSchedule AR. You may even partition that AR further into a RestaurantWeekSchedule, where you'd have one AR per week. When you want to add/remove opening times, it would create/load the AR of the given week. Another option could be to tweak the ORM to load only a subset of the collection, e.g. schedule = scheduleRepo.loadForWeek(openingTime.week()); schedule.add(openingTime);. Optimistic locking would still allow you to detect conflicts.
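A minimal sketch of such a dedicated AR, with hypothetical names (OverlapsWith and the repository calls are assumptions, not anything from your model):

using System;
using System.Collections.Generic;
using System.Linq;

public record OpeningTime(DayOfWeek Day, TimeSpan From, TimeSpan To, int Week)
{
    public bool OverlapsWith(OpeningTime other) =>
        Day == other.Day && From < other.To && other.From < To;
}

public class RestaurantWeekSchedule
{
    public Guid RestaurantId { get; }
    public int WeekNumber { get; }
    private readonly List<OpeningTime> _openingTimes = new();

    public RestaurantWeekSchedule(Guid restaurantId, int weekNumber)
    {
        RestaurantId = restaurantId;
        WeekNumber = weekNumber;
    }

    public void Add(OpeningTime candidate)
    {
        // the whole invariant lives inside one small aggregate that is cheap to load fully
        if (_openingTimes.Any(existing => existing.OverlapsWith(candidate)))
            throw new InvalidOperationException("Opening times must not overlap.");
        _openingTimes.Add(candidate);
    }
}

// In an application service, relying on optimistic locking to detect concurrent edits:
//   var schedule = scheduleRepo.LoadForWeek(restaurantId, openingTime.Week);
//   schedule.Add(openingTime);
//   scheduleRepo.Save(schedule);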
Enforce the rule in the DB. If you have a relational DB, for instance, you may make OpeningTime an AR and then use unique constraints to prevent violations. The rule would end up living outside the domain, but that may be acceptable; uniqueness rules aren't that interesting.
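For instance, if opening times were stored as discrete slots, a unique index would let the database reject duplicates even with OpeningTime as its own AR. A sketch with EF Core; all names are placeholders:

using System;
using Microsoft.EntityFrameworkCore;

public class OpeningTimeSlot
{
    public Guid Id { get; set; }
    public Guid RestaurantId { get; set; }
    public DayOfWeek Day { get; set; }
    public int Slot { get; set; }
}

public class SchedulingContext : DbContext
{
    public DbSet<OpeningTimeSlot> OpeningTimeSlots => Set<OpeningTimeSlot>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // the database enforces the rule; a violation surfaces as a DbUpdateException on SaveChanges
        modelBuilder.Entity<OpeningTimeSlot>()
            .HasIndex(s => new { s.RestaurantId, s.Day, s.Slot })
            .IsUnique();
    }
}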
Eventual Consistency
A very common way to deal with set validation problems is to stop trying to prevent the violation and instead focus on detecting and correcting it after the fact, through either manual or automated compensating actions. This can be as simple as an exception report that displays overlapping entries, or more complex, such as a single-threaded listener for OpeningTimeAdded events that checks for conflicts and marks such entries for correction.
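A sketch of what such a listener could look like; the event shape and the in-memory bookkeeping are entirely hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;

public record OpeningTimeAdded(Guid RestaurantId, Guid OpeningTimeId, DayOfWeek Day, TimeSpan From, TimeSpan To);

public class OverlapDetector
{
    private readonly Dictionary<(Guid, DayOfWeek), List<OpeningTimeAdded>> _seen = new();

    // Returns the ids that conflict with the new entry so they can be reported or corrected later.
    public IReadOnlyList<Guid> Handle(OpeningTimeAdded added)
    {
        var key = (added.RestaurantId, added.Day);
        if (!_seen.TryGetValue(key, out var existing))
            _seen[key] = existing = new List<OpeningTimeAdded>();

        var conflicts = existing
            .Where(e => e.From < added.To && added.From < e.To)
            .Select(e => e.OpeningTimeId)
            .ToList();

        existing.Add(added);
        return conflicts;
    }
}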
Regarding your question about why lazy loading is considered bad practice:
One issue I have with lazy loading is that you need to be constantly aware of potential performance issues. On the other hand, lazy loading is a decent default behaviour for ORMs. It is great for long-running applications, but it is pointless to implement it on top of your services. I think it depends a lot on the data you are working with, so it's neither good nor bad.
Related
I am creating a web application on top of the ASP.NET 6 framework. I am trying to figure out the best ORM to use for this project. I am leaning toward Entity Framework for the following reasons:
I'll be able to use LINQ to write my queries
I'll be able to access my relations easily and directly using native C# models.
Here is where the complication starts. This app will be connecting to a very large database with over 500 tables. Also, the app is going to be broken down into many small logical areas so it's easy for me to maintain it.
If Entity Framework is the way to go, how should I set up the DbContext so I can manage 500+ DbSets and their relations? In other words, should I create a single DbContext for the entire app even when my app is broken down into multiple areas? Or should I create a DbContext for each area? But if I do that, what if I need to establish a relation across multiple areas? For example, the X model in X-area needs a relation to the B model in B-area and the C model in C-area. I thought about introducing DbContext inheritance, where CAreaDbContext would inherit from BAreaDbContext, which inherits from AAreaDbContext, but that would break real quick.
Is Entity Framework the right framework for a large database app? If so, how can I manage the DbContext across multiple areas? If not, what would be the alternative that doesn't require writing plain SQL queries?
EF is perfectly fine for large databases. When mapping a large number of tables and relationships there is a one-time startup cost for the very first query, as EF initializes and validates its mapping, but this is a static cost paid once per application, not each time a DbContext is instantiated.
You can split the application across several DbContexts to help make organizing entities more logical and reduce those initial setup costs. This is generally referred to as using Bounded Contexts if you want to search up examples. These typically organize your application down to aggregate roots or top-level entities with everything else falling under those aggregates or serving as lookups, etc. Entities can be registered with multiple DbContexts, though you should aim to ensure that one aggregate root is nominated for being responsible for editing and creating a given entity.
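For illustration, a rough sketch of two bounded-context DbContexts over the same database; the entity and context names are placeholders, not anything from your actual schema:

using System;
using Microsoft.EntityFrameworkCore;

public class Customer { public Guid Id { get; set; } public string Name { get; set; } = ""; }
public class Order
{
    public Guid Id { get; set; }
    public Guid CustomerId { get; set; }
    public Customer Customer { get; set; } = null!;
    public DateTime CreatedOn { get; set; }
}

// Ordering area: owns creating and editing Orders and Customers.
public class OrderingContext : DbContext
{
    public OrderingContext(DbContextOptions<OrderingContext> options) : base(options) { }
    public DbSet<Customer> Customers => Set<Customer>();
    public DbSet<Order> Orders => Set<Order>();
}

// Reporting area: maps the same Customer table, but treats it as read-only reference data.
public class ReportingContext : DbContext
{
    public ReportingContext(DbContextOptions<ReportingContext> options) : base(options) { }
    public DbSet<Customer> Customers => Set<Customer>();
}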
The most important consideration with EF, both for performance and for avoiding unwanted or unexpected behaviour, is to ensure that you generally don't load more data through your entities than you need, more often than you need to.
Some general advice would include:
Absolutely AVOID the temptation to use the Generic Repository pattern with EF. Non-generic repositories are great for facilitating unit testing or centralizing important, common rules/validation, but generic flavours lead to code that is inefficient and expensive or overly complex, and usually both.
Keep DbContext lifetimes as short as possible. For Web applications this should be kept no longer than the Request length (when using an IoC container for instance) or shorter. Worst case, use using blocks to scope your DbContext. The longer a DbContext is kept alive, the more entities it tracks, and the more it tracks, the more it needs to sift through looking for references when loading other entities that might have navigation properties and the slower it gets. Long-lived DbContexts can also get "poisoned" when you have an issue attempting to save entity changes. Those invalid entities will remain tracked by the DbContext and interfere with future unrelated SaveChanges calls until they are removed (Detached) or corrected.
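As a sketch, reusing the placeholder OrderingContext from above and assuming it is registered with an IDbContextFactory: the context lives only for the one operation, so nothing stays tracked (or "poisoned") afterwards.

using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class CustomerService
{
    private readonly IDbContextFactory<OrderingContext> _contextFactory;

    public CustomerService(IDbContextFactory<OrderingContext> contextFactory)
        => _contextFactory = contextFactory;

    public void RenameCustomer(Guid customerId, string newName)
    {
        // disposed at the end of the method; a failed SaveChanges can't affect later, unrelated calls
        using var context = _contextFactory.CreateDbContext();
        var customer = context.Customers.Single(c => c.Id == customerId);
        customer.Name = newName;
        context.SaveChanges();
    }
}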
Gain an understanding of projection using Select or AutoMapper's ProjectTo method. Loading entire entity graphs will get expensive, especially if the DbContext is left to track all of those instances. Projecting down to ViewModels/DTOs helps ensure that only as much data as is needed is ever loaded and transmitted, and makes it crystal clear what is being passed around. (As opposed to passing detached entities, or worse, partially filled detached entities.)
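A sketch of such a projection against the placeholder OrderingContext/Order types from earlier; only the projected columns are selected and nothing ends up in the change tracker.

using System;
using System.Collections.Generic;
using System.Linq;

public record CustomerOrderSummary(Guid OrderId, string CustomerName, DateTime CreatedOn);

public static class OrderQueries
{
    // SELECT only the three projected columns; no entities are materialized or tracked.
    public static List<CustomerOrderSummary> GetSummaries(OrderingContext context) =>
        context.Orders
            .Select(o => new CustomerOrderSummary(o.Id, o.Customer.Name, o.CreatedOn))
            .ToList();
}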
Understand IQueryable and everything that Linq can bring to working with the data. EF query building is extremely valuable, so you can leverage sorting, filtering, pagination, projection, as well as scenarios to get Counts and check existence (.Any()) all without fetching a ton of data via Entities. See point #1 to avoid falling into this trap.
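For example, again with the placeholder types from above, an existence check and a paged search that both stay entirely in SQL until materialized:

using System;
using System.Collections.Generic;
using System.Linq;

public static class CustomerQueries
{
    // Translated to an EXISTS-style query; no entities are fetched just to count them.
    public static bool HasRecentOrders(OrderingContext context, Guid customerId, DateTime since) =>
        context.Orders.Any(o => o.CustomerId == customerId && o.CreatedOn >= since);

    // Filtering, sorting, and paging all run in SQL; ToList() materializes one page only.
    public static List<Customer> SearchPage(OrderingContext context, string prefix, int page, int pageSize) =>
        context.Customers
            .Where(c => c.Name.StartsWith(prefix))
            .OrderBy(c => c.Name)
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
}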
Use ToList/ToListAsync sparingly, and be aware that any logic you feed EF in Linq expressions needs to be translatable to SQL. Sometimes you will find yourself building a query where EF complains that it cannot evaluate your expression, for example when calling private methods or unmapped properties. Adding a ToList before the expression will seem like a magic fix, forcing client-side evaluation, but this is an expensive operation: you are effectively fetching (and typically tracking) all entities up to that point and then continuing in memory, which gets expensive in terms of memory use.
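A small sketch of the trap and the fix, still using the placeholder Order/OrderingContext types and assuming a hypothetical unmapped property:

using System;
using System.Collections.Generic;
using System.Linq;

public static class RecentOrderQueries
{
    // Suppose Order also had an unmapped convenience property:
    //     public bool IsRecent => CreatedOn >= DateTime.UtcNow.AddDays(-30);
    // EF cannot translate it, so context.Orders.Where(o => o.IsRecent) fails, and the
    // "magic fix" context.Orders.ToList().Where(o => o.IsRecent) loads the whole table first.

    // Better: express the rule in translatable terms so the filter runs in SQL.
    public static List<Order> RecentOrders(OrderingContext context)
    {
        var cutoff = DateTime.UtcNow.AddDays(-30);
        return context.Orders.Where(o => o.CreatedOn >= cutoff).ToList();
    }
}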
Asynchronous methods are not a silver bullet and do not make queries faster. Awaiting asynchronous EF methods is very useful when you have queries that are going to take a while to run or be called extremely often. My advice is to default to synchronous methods and test-run your code against production-like volumes as early as possible. I use 250ms as a threshold, but pick something acceptable to you and profile your queries. Anything over that threshold is likely to benefit from being made asynchronous. Typically things like searches, especially ones that involve text matching, are good candidates, as these can be a bit slow and are generally run fairly frequently by several users at a time. The same goes for any operation that might get called a lot through the course of an application by many users at the same time. async/await doesn't make queries faster; it makes them slightly slower, but it does make your server more responsive by not hanging the request until the query finishes. Using it by default makes your code a touch slower and a bit tougher to debug for no real benefit, as it can easily be introduced where needed.
Profile your queries. With traditional data access you would create your schema and write your access queries (Sprocs etc.) creating indexes as you go. With EF building your queries, indexing becomes more of a reactionary process where you might add your typical indexes, but should look at the queries being run in a production-like scenario to refine indexes based on high-volume queries that EF is building. This also provides key insight into other inefficiencies that might creep into your queries, as well as performance problems like lazy loading being tripped. Expensive queries should be investigated and optimized where possible.
Prepare to employ things like queuing for truly expensive queries. Systems will often call for things like reports, data exports, or just really expensive query options. Aim to set reasonable expectations by default, for instance avoiding string Contains() in text searches in favour of StartsWith(). Where you do need to support expensive queries, build a mechanism that allows users/processes to queue the query details as a request and employ a background worker/pool to pick up and process those requests. The temptation might be to just employ async/await here, but the important thing is to avoid situations where too many of these queries are kicked off at once. Queries like this will "touch" a lot of data, leading to locks and deadlocks in the system. Users have a bad tendency to repeatedly kick off actions when it looks like one isn't responding, which compounds the problem on the back end.
Are there any frameworks out there for MongoDB in C# that can automatically map Document Relations? I'm now talking about a model or "schema" that is purely defined by documents themselves and not by objects in .Net or any other external schema for that matter.
Think dynamic objects / bsondocuments that can automatically lazy-load relations between other documents.
I have several ideas for how to solve this myself; however, if there already exist any frameworks, or perhaps BsonDocument extensions (which is how I intended to solve this myself), that would lessen the need to add complexity to the project I'm working on.
The question is largely off-topic ('are there frameworks'), but I'd like to challenge the idea in itself:
that would lessen the need to add complexity to the project I'm working on.
I think it would merely hide complexity by moving it to a part of the code that knows nothing about your functional or non-functional requirements. Combined with a database that has no constraints except uniqueness, that doesn't sound like a good idea.
I'd recommend staying away from lazy loading as an almost general rule, because it makes it impossible to tell whether:
an operation is super costly (database call) or a mere memory lookup
the properties' state will be fetched on access, or is cached, thus hiding the key aspect of serialization from the user.
In other words: I'd stay away from the idea, or use something like EF with whatever database for it. If you don't care about your serialization, use a well-tested commonplace solution.
The Problem
How are large collections implemented in DDD that "feel" like they should be a part of the aggregate root, yet would be impractical if they were? Here are a few examples based on my domain.
Employee Aggregate Root
Announcements Collection
Direct Messages Collection
Product Aggregate Root
Stock Items Collection
etc. etc..
What I'm Thinking
I would like to keep the ability to navigate to these large collections from the aggregate root, but since I'm wrapping my O/RM with repositories, lazy loading isn't really an option... unless I implement lazy loading by injecting the necessary repository. But I know from what I've read about DDD that domain entities should not know about any such repositories.
The other option would be to take the approach that any potentially large collection of entities in my domain is an Aggregate Root and should have its own repository with the required interface to get the collection of items by another aggregate root. Eg.
public interface IStockRepository
{
IEnumerable<StockItem> FetchByProduct(Product product);
// ...
}
This "I would like to keep the ability to navigate to these large collections from the aggregate root" ... is a smell. You seem pretty obsessed, if you don't mind me saying, with the structure of your aggregate and not with its behavior, what problem(s) it is solving, any invariants that come into play. Frankly, the feeling you have is misplaced. It's a residue of our structural, database oriented way of thinking.
In general, I'd say, one should not have these large collections in the first place. For one, loading them requires resources (memory, CPU, bandwidth) better spent elsewhere. From a more functional perspective, people tend not to deal with large amounts of data at once anyway, and even computers can do more work when you break things down into units of work. As such, try to stay away from large collections and always question why you'd need them in the first place.
An announcement could be its own aggregate, referring to the employee by its id, so we know who the announcement was about (or for?). If the announcements are targeted at groups of employees, you might want to look into what defines that group, and model it explicitly. A direct message could also be its own aggregate because it is probably a message from one person to another. One could say the employee has the role of being a message recipient and/or sender. Again, referring to the employee aggregate by id might suffice. A stock item might be treated individually and refer to the product it represents within the stock by its productid. What is the behavior of an employee, an announcement, a direct message, a product, a stockitem? How and when does changing the state of its collaborators affect them and really, why is that? It's a means to a root cause. Find it.
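A rough sketch of that "own aggregate, reference by id" idea, with purely illustrative names and attributes:

using System;

public class Announcement
{
    public Guid Id { get; private set; }
    public Guid EmployeeId { get; private set; }   // identity of the Employee aggregate, not a loaded reference
    public string Text { get; private set; }

    public Announcement(Guid id, Guid employeeId, string text)
    {
        Id = id;
        EmployeeId = employeeId;
        Text = text;
    }
}

public class StockItem
{
    public Guid Id { get; private set; }
    public Guid ProductId { get; private set; }    // ties the item to its Product without loading it
    public string SerialNumber { get; private set; }

    public StockItem(Guid id, Guid productId, string serialNumber)
    {
        Id = id;
        ProductId = productId;
        SerialNumber = serialNumber;
    }
}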
All that said, there are times when you can bend the rules a bit, but they should be few.
Take a look at the Forum DDD example from Vaughn Vernon. He modeled the large collections out of the aggregate root. Creation is done by a factory method on the aggregate to keep control of certain things, e.g. a discussion cannot be created when the Forum is closed. Actions are done through the Forum AR (like startDiscussion and moderatePost).
The method returns an entity (Post) that needs to be saved in a separate repository (PostRepository) by the application service. Now you can have large collections without the need to load them every time.
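Roughly the shape of that approach, simplified and translated to C# (the real sample is in Java and richer than this):

using System;

public class Forum
{
    public Guid Id { get; private set; }
    public bool IsClosed { get; private set; }

    public Forum(Guid id) => Id = id;

    public Discussion StartDiscussion(string subject)
    {
        // the AR keeps control of the rule without holding a Discussions collection
        if (IsClosed)
            throw new InvalidOperationException("A closed forum cannot start discussions.");
        return new Discussion(Guid.NewGuid(), Id, subject);   // refers back to the Forum by id
    }
}

public class Discussion
{
    public Guid Id { get; private set; }
    public Guid ForumId { get; private set; }
    public string Subject { get; private set; }

    internal Discussion(Guid id, Guid forumId, string subject)
    {
        Id = id;
        ForumId = forumId;
        Subject = subject;
    }
}

// In the application service:
//   var discussion = forum.StartDiscussion(subject);
//   discussionRepository.Save(discussion);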
https://github.com/VaughnVernon/IDDD_Samples/tree/master/iddd_collaboration/src/main/java/com/saasovation/collaboration/domain/model/forum
I am looking to see what approaches people might have taken to detect changes in entities that are a part of their aggregates. I have something that works, but I am not crazy about it. Basically, my repository is responsible for determining if the state of an aggregate root has changed. Let's assume that I have an aggregate root called Book and an entity called Page within the aggregate. A Book contains one or more Page entities, stored in a Pages collection.
Primarily, insert vs. update scenarios are distinguished by inspecting the aggregate root and its entities for the presence of a key. If the key is present, it is presumed that the object has, at one time, been saved to the underlying data source. That makes it a candidate for an update, but for the entities it is not definitive on its own. With the aggregate root the answer is obvious: since there is only one and it is the single point of entry, key presence dictates the operation. It is an acceptable scenario, in my case, to save the aggregate root itself back again so that I can capture a modification date.
To help facilitate this behavior for the entities themselves, my EntityBase class contains two simple boolean properties: IsUpdated and IsDeleted. Both of these default to false. I don't need to know whether an entity is new, because I can determine that from the presence of the key, as mentioned previously. On the implementation, in this case the Page, each method that changes the backing data sets IsUpdated to true.
So, for example, Page has a method called UpdateSectionName() which changes the backing value of the read-only SectionName property. This approach is used consistently, as it provides a logical attachment point for validators in the method that performs the data setting (preventing the entity from entering an invalid state). The end result is that I have to put this.IsUpdated = true; at the end of the method.
When the aggregate root is sent into the repository for the Save() (a logic switch to either an Insert() or Update() operation), it can then iterate over the Pages collection in the Book, looking for any pages that match one of three scenarios (sketched in code after the list):
No key. A Page with no key will be inserted.
IsDeleted = true; A delete trumps an update, and the deletion will be committed - ignoring any update for the Page.
IsUpdated = true; An update will be committed for the Page.
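A condensed sketch of the approach described above; names are slightly tidied and the persistence calls are stand-ins, not my actual repository code:

using System;
using System.Collections.Generic;

public abstract class EntityBase
{
    public int? Id { get; protected set; }          // no key means the entity has never been persisted
    public bool IsUpdated { get; private set; }
    public bool IsDeleted { get; private set; }

    protected void MarkUpdated() => IsUpdated = true;
    public void MarkDeleted() => IsDeleted = true;
}

public class Page : EntityBase
{
    public string SectionName { get; private set; } = "";

    public void UpdateSectionName(string sectionName)
    {
        if (string.IsNullOrWhiteSpace(sectionName))
            throw new ArgumentException("Section name is required.", nameof(sectionName));
        SectionName = sectionName;
        MarkUpdated();                              // the step that is easy to forget
    }
}

public class Book : EntityBase
{
    public List<Page> Pages { get; } = new();
}

public class BookRepository
{
    public void Save(Book book)
    {
        // Insert vs. update of the root itself is decided by key presence (not shown here).
        foreach (var page in book.Pages)
        {
            if (page.Id is null)      InsertPage(book, page);
            else if (page.IsDeleted)  DeletePage(page);        // delete trumps update
            else if (page.IsUpdated)  UpdatePage(page);
            // untouched pages are skipped entirely
        }
    }

    private void InsertPage(Book book, Page page) { /* issue INSERT */ }
    private void UpdatePage(Page page) { /* issue UPDATE */ }
    private void DeletePage(Page page) { /* issue DELETE */ }
}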
Doing it this way prevents me from just blindly updating everything that is in the Pages collection, which could be daunting if there were several hundred Page entities in the Book, for example. I had been considering retrieving a copy of the Book, and doing a comparison and only committing changes detected, (inserts, updates, and deletes based upon presence and/or comparison), but it seemed to be an awfully chatty way to go about it.
The main drawback is that the developer has to remember to set IsUpdated in each method in the entity. Forget one, and it will not be able to detect changes for that value. I have toyed with the idea of some sort of a custom backing store that could transparently timestamp changes, which could in turn make IsUpdated a read-only property that the repository could use to aggregate updates.
The repository is using a unit of work pattern implementation that bases its actions on the timestamp generated when the aggregate root was added to it. Since there might be multiple entities queued for operations, entity operations are rolled up and executed immediately after the operations on the aggregate root they belong to. I could see taking it a step further and creating another unit of work just to handle the entity operations, basing them on some sort of event tracking used in the entity (which is how I assume some of the ORM products on the market accomplish a similar level of functionality).
Before I keep on moving in this direction, though, I would love to hear ideas/recommendations/experiences regarding this.
Edit: A few additional pieces of information that might be helpful to know:
The current language that I am working with is C#, although I tried to keep as much language-specific information out as possible, because this is more of a theoretical discussion.
The code for the repositories/services/entities/etc. is based upon Tim McCarthy's concept in his book, ".NET Domain-Driven Design with C#" and the supporting code on CodePlex. It provides a runnable understanding of the type of approach taken, although what I am working with has largely been rewritten from the ground up.
In short, my answer is that I went with what I proposed. It is working, although I am sure that there is room for improvement. The changes actually took very little time, so I feel I didn't stray too far from the KISS or YAGNI principles in this case. :-)
I still feel that there is room for timing related issues on operations, but I should be able to work around them in the repository implementations. Not the ideal solution, but I am not sure that it is worth reinventing the wheel to correct a problem that can be avoided in less time than it takes to fix.
For a large project, would it make sense to have one datacontext which maps out your database, which you can interact with from your classes?
Or would it make more sense to split this up into small datacontexts which are focused on the specific tasks within the database that will be required.
I'm curious as to the performance. It's my understanding that the datacontext itself is a very lightweight object, which only initializes its internal collections as they're required. Therefore, dealing with a datacontext with many definitions but only two tables of data should be as fast as dealing with a special datacontext with only those two tables in it.
I also think you would benefit at JIT time, as the first class to do data access will compile your datacontext, which is then available to all classes.
I'm assuming you're asking about design versus runtime pattern. In general I'd say no:
While it may be possible to partition your database into multiple data contexts, that would be desirable only if there were zero overlap between the contexts.
Overlap is Bad
e.g. you have a WebsiteContext and an AdminContext. The WebsiteContext is for displaying Product and fulfilling Orders. A WebsiteUser is attached to an Order. The AdminContext is for your Staff members to process refunds for cancelled Orders, which also reference WebsiteUser. The AdminContext also needs to reset passwords and update other details for the WebsiteUser.
You're thinking of doing this because you don't want the website to process, or even know about, Returns.
WebsiteContext
Product -- Order -- WebsiteUser
AdminContext
Staff -- Returns -- Order -- WebsiteUser
In the above, we can see that we're duplicating many objects in the different data contexts. This smells bad, and it really indicates that artificially dividing the database into different data contexts is the wrong decision. Do you have two databases after all, or just the one? The duplication violates the DRY principle (Don't Repeat Yourself), because WebsiteContext.WebsiteUser is not the same as AdminContext.WebsiteUser, and in all likelihood the code is going to get messy when something needs to care which one it's referencing.
The Linq Data Context is just an OR mapper, and needs to be treated as a fancy black box that makes writing some of the data access code easier. Some linq demos make it look like you no longer need the other layers, but a program of any complexity still benefits from a layered design.
You're probably better off treating the Linq objects as just objects for easily transferring data and create a Domain layer that hides them as an implementation detail. Have a read of DDD - Domain Driven Design.
On their own, just using the Linq objects from the UI most resembles the Transaction Script pattern. In such a case you'd still benefit from having a logic layer that takes care of the details.
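As a sketch of that layering: the calling code depends on a small interface and a plain domain type, while the generated LINQ-to-SQL classes (the ShopDataContext and ProductRecord names here are made up) stay hidden inside the implementation.

using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IProductRepository
{
    Product GetById(int id);
}

public class LinqToSqlProductRepository : IProductRepository
{
    private readonly ShopDataContext _dataContext;   // the generated designer class stays an implementation detail

    public LinqToSqlProductRepository(ShopDataContext dataContext)
    {
        _dataContext = dataContext;
    }

    public Product GetById(int id)
    {
        var record = _dataContext.ProductRecords.Single(r => r.ProductId == id);
        // map the Linq object to the domain type so it never leaks upwards
        return new Product { Id = record.ProductId, Name = record.Name };
    }
}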
While you may not want the responsibility of a context to be so broad, the data context is just a representation of the database. It's not a security mechanism, and it can't prevent you from corrupting the data.
From a "repository pattern" angle, there is something to be said for having separate aggregates and minimizing the navigation properties between aggregates. I'm not 100% sure what that something is though... at the moment I'm considering using a moderately-sized dbml, with multiple repositories using it (the UI doesn't use the datacontexts directly - only the repository classes), with the navigation properties marked internal so the DAL can use them... Maybe...