Domain Driven Design - Large child collections

Domain Driven Design - Large child collections - c#

The Problem
How are large collections implemented in DDD that "feel" like they should be a part of the aggregate root, yet would be impractical if they were? Here are a few examples based on my domain.
Employee Aggregate Root
Announcements Collection
Direct Messages Collection
Product Aggregate Root
Stock Items Collection
etc. etc..
What I'm Thinking
I would like to keep the ability to navigate to these large collections from the aggregate root but since I'm wrapping my O/RM with Repositories lazy loading isn't really an option... Unless I implement lazy loading by injecting either the necessary repository. But I know from what I've read about DDD that domain entities should not know about any such repositories..
The other option would be to take the approach that any potentially large collection of entities in my domain is an Aggregate Root and should have its own repository with the required interface to get the collection of items by another aggregate root. Eg.
public interface IStockRepository
{
IEnumerable<StockItem> FetchByProduct(Product product);
// ...
}

This "I would like to keep the ability to navigate to these large collections from the aggregate root" ... is a smell. You seem pretty obsessed, if you don't mind me saying, with the structure of your aggregate and not with its behavior, what problem(s) it is solving, any invariants that come into play. Frankly, the feeling you have is misplaced. It's a residue of our structural, database oriented way of thinking.
In general, I'd say, one should not have these large collections in the first place. For one loading them will require resources (memory, cpu, bandwidth) better spent elsewhere. From a more functional perspective people tend to not deal with large amounts at once anyway, and even computers can do more work when you break things down into units of work. As such, try to stay away from large collections and always question "why" you'd need them in the first place.
An announcement could be its own aggregate, referring to the employee by its id, so we know who the announcement was about (or for?). If the announcements are targeted at groups of employees, you might want to look into what defines that group, and model it explicitly. A direct message could also be its own aggregate because it is probably a message from one person to another. One could say the employee has the role of being a message recipient and/or sender. Again, referring to the employee aggregate by id might suffice. A stock item might be treated individually and refer to the product it represents within the stock by its productid. What is the behavior of an employee, an announcement, a direct message, a product, a stockitem? How and when does changing the state of its collaborators affect them and really, why is that? It's a means to a root cause. Find it.
All that said, there are times when you can bend the rules a bit, but they should be few.

Take a look at the Forum DDD example from Vaughn Vernon. He modeled the large collections out of the aggregate root. Creation is done by a factory method on the aggregate to keep control of some thing, like a dicussion can not be created when the Forum is closed. Actions are done through the AR Forum (like startDiscussion and moderatePost).
The method returns an entity (Post) that need to be saved in a separate repository (PostRepository) by the application service. Now you can have large collections without the need to load every time.
https://github.com/VaughnVernon/IDDD_Samples/tree/master/iddd_collaboration/src/main/java/com/saasovation/collaboration/domain/model/forum

Related

DDD: Lazy loading in aggregates

I've been learning DDD for the past few days and struggling to understand some core concepts of aggregate roots. Maybe somebody could give me push in the right direction and elaborate what's best practice in this scenario:
To make this example less complex, let's say we have a domain of two entities: restaurant and opening time. Restaurant is defined as aggregate root.
From what I understood in all the online examples (correct me if I'm mistaken), the aggregate root is eager loading all sub-entities every time I need an instance of it. So whenever I want to call call a method on the restaurant, all opening times are loaded (no matter if they are used or not).
In this example I want to verify that there is no intersection of opening times when a new time get's added to that restaurant. In this case the eager loading of every other opening time makes sense because I need to compare them with the existing ones.
BUT: this is kind of restricting because I know every time I want to add another collection of something (let's say restaurant images), the SQL load get's heavier and heavier even though most of the methods only require one of the collections.
I could think of two possible solutions:
Lazy loading
Lazy loading opening times / sub entities through entity framework property proxies. So the aggregate root can exist without eager loading them but whenever they are needed they can be accessed.
However everywhere I was searching for an answer I've red that lazy loading in aggregate roots is considered bad practice. Maybe somebody could explain why.
Smaller aggregate roots
Of course I could define the opening time itself as an aggregate root but then I need to take the business logic (in this case the verification of intersections) outside of the model.
In all the examples above I'm only talking about the command side (not querying or serializing).
Maybe I'm missing some fundamental ideas. How should the aggregate roots be organized in this example and why is lazy loading considered bad practice?
EDIT
Not sure why this question was closed because of "opinion based". I'm asking for best practice and why lazy loading is not in this case.

However everywhere I was searching for an answer I've red that lazy loading in aggregate roots is considered bad practice.
That is not exactly the case. The actual restriction is for lazy loading other aggregates, due to direct references, at aggregate roots; referring other aggregates with their identifiers is recommended instead. This restriction has very good reasons behind it.
References to aggregates unnecessarily increases application's memory footprint (retrieving an entity that is not going to be used in a transaction); in highly concurrent use cases, where locks are in effect, degrades application performance (unnecessarily locking an entity); hampers data partitioning (both aggregates need to be processed in the same data node).
Each aggregate root defines its own consistency boundary; each transaction is meant to ensure one aggregate's consistency. Update operations (or transactions) across aggregates, communicated through domain events, are supposed to be eventually consistent.
If you are holding direct reference to another aggregate, thereby necessitating lazy loading, and performing updates on them, you should rethink your design.
The choice in your scenario, as usual, depends on the business context or the domain it deals with. If you think OpeninigTime is a separate aggregate, with its own consistency boundary, you should hold only its id and publish domain events, containing the id, that the OpeningTime aggregate will handle by retrieving the appropriate aggregate. If, however, it is not the case (that seems to be more likely), you may very much hold reference, lazy load, and perform updates on it.

This problem classifies as a set validation problem and there's a few potential solutions depending on the domain needs.
Strong Consistency
Create an AR to protect the entire set. In the above example if such set has a reasonable length you may create a dedicated RestaurantSchedule AR. You may even partition such AR further into RestaurantWeekSchedule where you'd have 1 AR for each week. When you want to add/remove opening days it would create/load the AR of the given week. Another option could be to tweak the ORM to load only a subset of the collection e.g. schedule = scheduleRepo.loadForWeek(openingTime.week()); schedule.add(openingTime);. Optimistic locking would still allow to detect conflicts.
Enforce the rule in the DB. If you have a relational DB for instance you may have OpeningTime as an AR and then use unique constraints to prevent violations. The rule would end up living outside the domain, but it may be acceptable. Uniqueness rules aren't that interesting.
Eventual Consistency
A very common way to deal with set validation problems is to avoid trying to avoid the violation and focus on detecting and correcting it after the fact either through manual or automated compensating actions. This can be as simple as an exception report that displays overlapping entries or more complex such as a single-threaded listener to OpeningTimeAdded events that checks for conflicts and marks such entries for correction.

For your question why is lazy loading considered as bad practice:
Some issue I have with lazy loading is that you need to be constantly aware of potential performance issues. On the other hand However LL is a decent default behaviour for ORMs. It is great if you have long term running applications but it is pointless implementing it on top of your services. I think it depends a lot on the data you are using so its neither good or bad.

.NET Domain Model, when to eager load

I'm fairly new to the whole .NET scene and I'm still trying to figure this thing out.
One thing that seems to be very advocated for is the Domain Driven Design pattern. And eager as I am to get a flying start in the .NET world and doing it proper I dived in trying to apply it to my project.
As I understand it is bad practice to give a domain object access to persistence layer functions like repositories, but then I'm really struggling on how to get around the problem of eager loading when working with heavily connected graphs. It usually ends up with having pretty much no logic in the domain model and moving all of it to a service layer which does calculations and fills the model objects with data or returns the result directly, for example price = productService.CalculatePriceFor(product, user); instead of price = product.Price(user) since the latter cannot be accomplished without eager loading the whole productgroup tree and discount matrix when first requesting the object.
What is good practice here? Implement subclasses of product where the information for getting the user price is calculated at load time and have another subclass of it when I don't need the user price?

In proper DDD you don't have any heavily connected graphs. Vaughn Vernon's article about the aggregates design should help you catch the idea.
http://dddcommunity.org/library/vernon_2011/
Also consider if product and user belong to the same bounded context. I would say they don't. In different context there might be similar ideas implemented in different way, for example, in Pricing Context, the Client class might be a simple aggregate providing some rebates but no information about shipping, invoices etc. However, is Shipping Context there would be a similar user class being only an value object (with an address) within a Shipping aggregate. User class seems to be a part of Authentication Context.

Implementing the repository and service pattern with RavenDB

I have some difficulties implementing the repository and service pattern in my RavenDB project. The major concern is how my repository interface should look like because in RavenDB I use a couple of indexes for my queries.
Let's say I need to fetch all items where the parentid equals 1. One way is to use the IQueryable List() and get all documents and then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any index features in RavenDB. So the other approach is to have something like this, IEnumerable Find(string index, Func predicate) in the repository but that also seems like a bad idea because it's not generic enough and requires that I implement this method for if I would change from RavenDB to a common sql server.
So how can I implement a generic repository but still get the benefits of indexes in RavenDB?

This post sums it all up very nicely:
http://novuscraft.com/blog/ravendb-and-the-repository-pattern

First off, ask why you want to use the repository pattern?
If you're wanting to use the pattern because you're doing domain driven design, then as another of these answers points out, you need to re-think the intent of your query, and talk about it in terms of your domain - and you can start to model things around this.
In that case, specifications are probably your friend and you should look into them.
HOWEVER, let's look at a single part of your question momentarily before continuing with my answer:
seems like a bad idea because it's not generic enough and requires that I implement this method for if I would change from RavenDB to a common sql server.
You're going about it the wrong way - trying to make your system entirely persistence-agnostic at this level is asking for trouble - if you try hiding the unique features of your datastore from the queries themselves then why bother using RavenDB?
A method I tend to use in simple document-oriented (IE, I do talk in terms of data, which is what you appear to be doing), is to split up my queries from my commands.
Ask yourself, why do you want to query for your documents by parent ID? Is it to display a list on a page? Why are you trying to model this in terms of documents then? Why not model this in terms of a view model and use the most effective method of retrieving this data from RavenDB? (A query over an index (dynamic or otherwise)), stick this in a factory which takes 'some inputs' and generates 'the output' and if you do decide to change your persistence store, you can change these factories. (I go one step further in my ASP.NET MVC applications, and have single action controllers, and I don't call them controllers, making the query from those in most cases).
If you want to actually pull out your documents by parent id in order to update them or run some business logic across them, perhaps you've modelled them wrong - a write operation will typically only involve change to a single document, or in other words you should be modelling your documents around your transaction boundaries.
TL;DR
Think about what it is you actually want to achieve - why do you want to use the "Repository pattern" or the "Service pattern" - these words exist as ways of describing a scenario you might end up with if you model your application around your needs, as a common way of expressing the role of a certain object- not as something you need to shoehorn your every piece of functionality into.

Let's say I need to fetch all items
where the parentid equals 1.
First, stop thinking of your data access needs this way.
You DO NOT need to "fetch all items where the parentid equals 1". It will help to try and stop thinking in such a data oriented way.
What you need is to fetch all items with a particular parent. This is a concept that exists in your problem space (your application's domain).
The fact that you model this in the database with a foreign key and a field named parentid is an implementation detail. Encapsulate this, do not leak it throughout your application.
One way is to use the IQueryable
List() and get all documents and then
add a where clause to select the items
where the parentid equals 1. This
seems like a bad idea because I can't
use any index features in RavenDB. So
the other approach is to have
something like this, IEnumerable
Find(string index, Func predicate) in
the repository but that also seems
like a bad idea because
Both of these are bad ideas. What you are suggesting is requiring the code that calls your repository or query to have knowledge of your schema.
Why should the consumer of your repository care or know that there is a parentid field? If this changes, if the definition of some particular concept in your problem space changes, how many places in your code will have to change?
Every single place that fetches items with a particular parent.
This is bad, it is the antithesis of encapsulation.
My opinion is that you will want to model queries as explicit concepts, and not lambda's or strings passed around and used all over.
You can model queries explicitly with the Specification pattern, named query methods on a repository, Query Object pattern, etc.
it's not
generic enough and requires that I
implement this method for if I would
change from RavenDB to a common sql
server.
Well, that Func is too generic. Again, think about what your consuming code will need to know in order to use such a method of querying, you will be tying upper layers of your code directly to your DB schema doing this.
Also, if you change from one storage engine to another, you cannot avoid re-implementing queries where performance was enough of a factor to use storage-engine-specific aids (indexes in Raven, for example).

I would actually discourage you from using the repository pattern. In most cases, it is over-architecting and actually makes the code more complicated.
Ayende has made a number of posts to that end recently:
http://ayende.com/Blog/archive/2011/03/16/architecting-in-the-pit-of-doom-the-evils-of-the.aspx
http://ayende.com/Blog/archive/2011/03/18/the-wages-of-sin-over-architecture-in-the-real-world.aspx
http://ayende.com/Blog/archive/2011/03/22/the-wages-of-sin-proper-and-improper-usage-of-abstracting.aspx
I recommend just writing against Raven's native API.
If you feel that my response is too general, list some of the benefits you hope to gain from using another layer of abstraction and we can continue the discussion.

Pattern for retrieving complex object graphs with Repository Pattern with Entity Framework

We have an ASP.NET MVC site that uses Entity Framework abstractions with Repository and UnitOfWork patterns. What I'm wondering is how others have implemented navigation of complex object graphs with these patterns. Let me give an example from one of our controllers:
var model = new EligibilityViewModel
{
Country = person.Pathway.Country.Name,
Pathway = person.Pathway.Name,
Answers = person.Answers.ToList(),
ScoreResult = new ScoreResult(person.Score.Value),
DpaText = person.Pathway.Country.Legal.DPA.Description,
DpaQuestions = person.Pathway.Country.Legal.DPA.Questions,
Terms = person.Pathway.Country.Legal.Terms,
HowHearAboutUsOptions = person.Pathway.Referrers
};
It's a registration process and pretty much everything hangs off the POCO class Person. In this case we're caching the person through the registration process. I've now started implementing the latter part of the registration process which requires access to data deeper in the object graph. Specifically DPA data which hangs off Legal inside Country.
The code above is just mapping out the model information into a simpler format for the ViewModel. My question is do you consider this fairly deep navigation of the graph good practice or would you abstract out the retrieval of the objects further down the graph into repositories?

In my opinion, the important question here is - have you disabled LazyLoading?
If you haven't done anything, then it's on by default.
So when you do Person.Pathway.Country, you will be invoking another call to the database server (unless you're doing eager loading, which i'll speak about in a moment). Given you're using the Repository pattern - this is a big no-no. Controllers should not cause direct calls to the database server.
Once a C ontroller has received the information from the M odel, it should be ready to do projection (if necessary), and pass onto the V iew, not go back to the M odel.
This is why in our implementation (we also use repository, ef4, and unit of work), we disable Lazy Loading, and allow the pass through of the navigational properties via our service layer (a series of "Include" statements, made sweeter by enumerations and extension methods).
We then eager-load these properties as the Controllers require them. But the important thing is, the Controller must explicitly request them.
Which basically tells the UI - "Hey, you're only getting the core information about this entity. If you want anything else, ask for it".
We also have a Service Layer mediating between the controllers and the repository (our repositories return IQueryable<T>). This allows the repository to get out of the business of handling complex associations. The eager loading is done at the service layer (as well as things like paging).
The benefit of the service layer is simple - more loose coupling. The Repository handles only Add, Remove, Find (which returns IQueryable), Unit of Work handles "newing" of DC's, and Commiting of changes, Service layer handles materialization of entities into concrete collections.
It's a nice, 1-1 stack-like approach:
personService.FindSingle(1, "Addresses") // Controller calls service
|
--- Person FindSingle(int id, string[] includes) // Service Interface
|
--- return personRepository.Find().WithIncludes(includes).WithId(id); // Service calls Repository, adds on "filter" extension methods
|
--- IQueryable<T> Find() // Repository
|
-- return db.Persons; // return's IQueryable of Persons (deferred exec)
We haven't got up to the MVC layer yet (we're doing TDD), but a service layer could be another place you could hydrate the core entities into ViewModels. And again - it would be up to the controller to decide how much information it wishes.
Again, it's all about loose coupling. Your controllers should be as simplistic as possible, and not have to worry about complex associations.
In terms of how many Repositories, this is a highly debated topic. Some like to have one per entity (overkill if you ask me), some like to group based on functionality (makes sense in terms of functionality, easier to work with), however we have one per aggregate root.
I can only guess on your Model that "Person" should be the only aggregate root i can see.
Therefore, it doesn't make much sense having another repository to handle "Pathways", when a pathway is always associated with a particular "Person". The Person repository should handle this.
Again - maybe if you screencapped your EDMX, we could give you more tips.
This answer might be extending out a little too far based on the scope of the question, but thought i'd give an in-depth answer, as we are dealing with this exact scenario right now.
HTH.

It depends on how much of the information you're using at any one time.
For example, if you just want to get the country name for a person (person.Pathway.Country.Name) what is the point in hydrating all of the other objects from the database?
When I only need a small part of the data I tend to just pull out what I'm going to use. In other words I will project into an anonymous type (or a specially made concrete type if I must have one).
It's not a good idea to pull out an entire object and everything related to that object every time you want to access some properties. What if you're doing this once every postback, or even multiple times? By doing this you might be making life easier in the short term at the cost of you're making your application less scalable long term.
As I stated at the start though, there isn't a one size fits all rule for this, but I'd say it's rare that you need to hydrate that much information.

Should I use one LINQ DataContext or many?

For a large project, would it make sense to have one datacontext which maps out your database, which you can interact with from your classes?
Or would it make more sense to split this up into small datacontexts which are focused on the specific tasks within the database that will be required.
I'm curious as to the performance. It's my understanding that the datacontext itself is a very lightweight object, which only initializes its internal collections as they're required etc. Therfore dealing with a datacontext with many defintions but only two tables of data should be as fast as dealing with a special datacontext with only those two tables in it.
I also think you would benefit at the JIT time, as the first class to do data access will compile your dc which is now available to all classes.

I'm assuming you're asking about design versus runtime pattern. I general I'd say no:
While it may be possible to partition your database into multiple data contexts, then that would be desirable if and only if there was zero overlap between the two contexts.
Overlap is Bad
e.g. you have a WebsiteContext and an AdminContext. The WebsiteContext is for displaying Product and fulfilling Orders. A WebsiteUser is attached to an Order. The AdminContext is for your Staff members to process refunds for cancelled Orders, which also reference WebsiteUser. The AdminContext also needs to reset passwords and update other details for the WebsiteUser.
You're thinking of doing this because you don't want the website to process or even know about Returns
WebsiteContext
Product -- Order -- WebsiteUser
AdminContext
Staff -- Returns -- Order -- WebsiteUser
In the above, we can see we're duplicating many objects in the different Data Contexts. This smells bad, and it really indicates that artificially dividing the database into different data contexts is a the wrong decision. Do you have >2 databases after all, or just the one? The duplication violates the DRY principle (Dont Repeat Yourself) because WebsiteContext.WebsiteUser is not the same as AdminContext.WebsiteUser and in all likelihood the code is going to be messy when something needs to care which one they're referencing.
The Linq Data Context is just an OR mapper, and needs to be treated as a fancy black box that makes writing some of the data access code easier. Some linq demos make it look like you no longer need the other layers, but a program of any complexity still benefits from a layered design.
You're probably better off treating the Linq objects as just objects for easily transferring data and create a Domain layer that hides them as an implementation detail. Have a read of DDD - Domain Driven Design.
On their own, just using the Linq objects from the UI most resembles the Transaction Script pattern. In such a case you'd still benefit from having a logic layer that takes care of the details.
While you may not want the responsibility of a context to be so broad, the data context is just a representation of the database. It's not a security mechanism, and it can't prevent you from corrupting the data.

From a "repository pattern" angle, there is something to be said for having separate aggregates and minimizing the navigation properties between aggregates. I'm not 100% sure what that something is though... at the moment I'm considering using a moderately-sized dbml, with multiple repositories using it (the UI doesn't use the datacontexts directly - only the repository classes), with the navigation properties marked internal so the DAL can use them... Maybe...

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.