I am looking to see what approaches people might have taken to detect changes in entities that are a part of their aggregates. I have something that works, but I am not crazy about it. Basically, my repository is responsible for determining if the state of an aggregate root has changed. Let's assume that I have an aggregate root called Book and an entity called Page within the aggregate. A Book contains one or more Page entities, stored in a Pages collection.
Primarily, the insert vs. update decision is made by inspecting the aggregate root and its entities for the presence of a key. If the key is present, it is presumed that the object has, at some point, been saved to the underlying data source. That makes it a candidate for an update, but for the entities it is not definitive on its own. With the aggregate root the answer is obvious: since there is only one and it is the single point of entry, key presence can be assumed to dictate the operation. In my case it is acceptable to save the aggregate root itself back again, so that I can capture a modification date.
To help facilitate this behavior for the entities themselves, my EntityBase class contains two simple boolean properties: IsUpdated and IsDeleted. Both default to false. I don't need to know whether an entity is new, because I can determine that from the presence of the key, as mentioned previously. On the implementation, in this case the Page, each method that changes the backing data sets IsUpdated to true.
So, for example, Page has a method called UpdateSectionName() which changes the backing value of the read-only SectionName property. This approach is used consistently, as the method that sets the data provides a logical attachment point for validators (preventing the entity from entering an invalid state). The end result is that I have to put this.IsUpdated = true; at the end of each such method.
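For illustration, here is a minimal sketch of that pattern, assuming EntityBase exposes a settable boolean IsUpdated flag (the validation shown is just a hypothetical example):

    public class Page : EntityBase
    {
        public string SectionName { get; private set; }

        // All changes go through a method so validation can run before the
        // backing value is set, and the change flag is raised at the end.
        public void UpdateSectionName(string sectionName)
        {
            if (string.IsNullOrWhiteSpace(sectionName))
                throw new ArgumentException("Section name is required.", nameof(sectionName));

            SectionName = sectionName;
            this.IsUpdated = true;   // the step that is easy to forget
        }
    }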
When the aggregate root is sent into the repository's Save() (a logic switch to either an Insert() or Update() operation), the repository can then iterate over the Pages collection in the Book, looking for any pages that match one of three scenarios (a sketch follows the list below):
No key. A Page with no key will be inserted.
IsDeleted = true; A delete trumps an update, and the deletion will be committed - ignoring any update for the Page.
IsUpdated = true; An update will be committed for the Page.
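A rough sketch of that switch, assuming an Id property as the key and hypothetical Insert/Update/InsertPage/UpdatePage/DeletePage helpers inside the repository:

    public void Save(Book book)
    {
        // Key presence on the root decides insert vs. update.
        if (book.Id == default) Insert(book); else Update(book);

        foreach (var page in book.Pages)
        {
            if (page.IsDeleted)          DeletePage(page);   // delete trumps update
            else if (page.Id == default) InsertPage(page);   // no key: insert
            else if (page.IsUpdated)     UpdatePage(page);   // only changed pages
        }
    }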
Doing it this way prevents me from blindly updating everything in the Pages collection, which could be daunting if there were, say, several hundred Page entities in the Book. I had considered retrieving a copy of the Book and committing only the changes detected by comparison (inserts, updates, and deletes based upon presence and/or comparison), but that seemed an awfully chatty way to go about it.
The main drawback is that the developer has to remember to set IsUpdated in each method on the entity. Forget one, and changes to that value will not be detected. I have toyed with the idea of some sort of custom backing store that could transparently timestamp changes, which could in turn make IsUpdated a read-only property that the repository could use to aggregate updates.
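To give an idea of what such a backing store might look like, here is a small hypothetical wrapper (not the implementation described above) that timestamps changes so IsUpdated can be derived rather than set by hand:

    using System;
    using System.Collections.Generic;

    // A tracked backing field: assigning a different value records a timestamp,
    // so a derived IsUpdated flag no longer depends on the developer's discipline.
    public class Tracked<T>
    {
        private T _value;

        public Tracked(T initialValue) => _value = initialValue;

        public DateTime? ChangedAtUtc { get; private set; }
        public bool IsUpdated => ChangedAtUtc.HasValue;

        public T Value
        {
            get => _value;
            set
            {
                if (!EqualityComparer<T>.Default.Equals(_value, value))
                {
                    _value = value;
                    ChangedAtUtc = DateTime.UtcNow;
                }
            }
        }
    }

The entity's IsUpdated could then simply aggregate the flags of its tracked fields.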
The repository uses a unit of work pattern implementation that bases its actions on the timestamp generated when the aggregate root was added to it. Since there might be multiple entities queued for operations, entity operations are rolled up and executed immediately after the aggregate root operation(s) they belong to. I could see taking this a step further and creating another unit of work just to handle the entity operations, basing them on some sort of event tracking inside the entity (which is how I assume some of the ORM products on the market accomplish a similar level of functionality).
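As a rough illustration of that ordering only (not the actual unit of work implementation in use), the idea boils down to flushing root operations before the entity operations that belong to them:

    using System;
    using System.Collections.Generic;

    public class UnitOfWorkSketch
    {
        private readonly List<Action> _rootOperations = new();
        private readonly List<Action> _entityOperations = new();

        public void RegisterRootOperation(Action execute) => _rootOperations.Add(execute);
        public void RegisterEntityOperation(Action execute) => _entityOperations.Add(execute);

        public void Commit()
        {
            foreach (var operation in _rootOperations) operation();     // aggregate roots first
            foreach (var operation in _entityOperations) operation();   // then their entities
            _rootOperations.Clear();
            _entityOperations.Clear();
        }
    }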
Before I keep on moving in this direction, though, I would love to hear ideas/recommendations/experiences regarding this.
Edit: A few additional pieces of information that might be helpful to know:
The current language that I am working with is C#, although I tried to keep as much language-specific information out as possible, because this is more of a theoretical discussion.
The code for the repositories/services/entities/etc. is based upon Tim McCarthy's concept in his book, ".NET Domain-Driven Design with C#", and the supporting code on CodePlex. It provides a working picture of the type of approach taken, although what I am working with has largely been rewritten from the ground up.
In short, my answer is that I went with what I proposed. It is working, although I am sure there is room for improvement. The changes actually took very little time, so I feel I didn't stray too far from the KISS or YAGNI principles in this case. :-)
I still feel that there is room for timing-related issues on operations, but I should be able to work around them in the repository implementations. Not the ideal solution, but I am not sure it is worth reinventing the wheel to correct a problem that can be avoided in less time than it takes to fix.
Related
I've been learning DDD for the past few days and struggling to understand some core concepts of aggregate roots. Maybe somebody could give me a push in the right direction and elaborate on what's best practice in this scenario:
To make this example less complex, let's say we have a domain of two entities: restaurant and opening time. Restaurant is defined as aggregate root.
From what I understood in all the online examples (correct me if I'm mistaken), the aggregate root eagerly loads all sub-entities every time I need an instance of it. So whenever I want to call a method on the restaurant, all opening times are loaded (no matter whether they are used or not).
In this example I want to verify that there is no intersection of opening times when a new time gets added to the restaurant. In this case the eager loading of every other opening time makes sense, because I need to compare the new one with the existing ones.
BUT: this is kind of restricting, because I know that every time I want to add another collection of something (let's say restaurant images), the SQL load gets heavier and heavier, even though most of the methods only require one of the collections.
I could think of two possible solutions:
Lazy loading
Lazy loading opening times / sub-entities through Entity Framework property proxies. The aggregate root can then exist without eagerly loading them, but they can still be accessed whenever they are needed.
However, everywhere I searched for an answer I've read that lazy loading in aggregate roots is considered bad practice. Maybe somebody could explain why.
Smaller aggregate roots
Of course I could define the opening time itself as an aggregate root, but then I would need to move the business logic (in this case the verification of intersections) outside of the model.
In all the examples above I'm only talking about the command side (not querying or serializing).
Maybe I'm missing some fundamental ideas. How should the aggregate roots be organized in this example and why is lazy loading considered bad practice?
EDIT
Not sure why this question was closed as "opinion based". I'm asking for best practice and why lazy loading is not best practice in this case.
However, everywhere I searched for an answer I've read that lazy loading in aggregate roots is considered bad practice.
That is not exactly the case. The actual restriction is against lazy loading other aggregates through direct references held by an aggregate root; referring to other aggregates by their identifiers is recommended instead. This restriction has very good reasons behind it.
Holding references to other aggregates unnecessarily increases the application's memory footprint (retrieving an entity that is not going to be used in the transaction); in highly concurrent use cases where locks are in effect, it degrades application performance (unnecessarily locking an entity); and it hampers data partitioning (both aggregates need to be processed on the same data node).
Each aggregate root defines its own consistency boundary; each transaction is meant to ensure one aggregate's consistency. Update operations (or transactions) across aggregates, communicated through domain events, are supposed to be eventually consistent.
If you are holding direct reference to another aggregate, thereby necessitating lazy loading, and performing updates on them, you should rethink your design.
The choice in your scenario, as usual, depends on the business context or the domain it deals with. If you think OpeningTime is a separate aggregate, with its own consistency boundary, you should hold only its id and publish domain events containing that id, which the OpeningTime aggregate will handle by retrieving the appropriate aggregate. If, however, that is not the case (which seems more likely), you may very well hold a reference, lazy load, and perform updates on it.
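If OpeningTime did turn out to be its own aggregate, the reference-by-id approach might look roughly like this (all type, member, and event names here are illustrative, not from the original post):

    using System;
    using System.Collections.Generic;

    // The Restaurant holds only the identifiers of OpeningTime aggregates,
    // and cross-aggregate changes travel as domain events carrying those ids.
    public class Restaurant
    {
        private readonly List<Guid> _openingTimeIds = new();

        public Guid Id { get; }
        public IReadOnlyCollection<Guid> OpeningTimeIds => _openingTimeIds;

        public Restaurant(Guid id) => Id = id;
    }

    public record OpeningTimeRequested(Guid RestaurantId, Guid OpeningTimeId);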
This problem classifies as a set validation problem, and there are a few potential solutions depending on the domain needs.
Strong Consistency
Create an AR to protect the entire set. In the example above, if the set has a reasonable length you may create a dedicated RestaurantSchedule AR. You may even partition that AR further into a RestaurantWeekSchedule, with one AR for each week; when you want to add or remove opening days it would create or load the AR for the given week. Another option could be to tweak the ORM to load only a subset of the collection, e.g. schedule = scheduleRepo.loadForWeek(openingTime.week()); schedule.add(openingTime);. Optimistic locking would still allow conflicts to be detected. (A sketch of the dedicated-AR option follows the next paragraph.)
Enforce the rule in the DB. If you have a relational DB for instance you may have OpeningTime as an AR and then use unique constraints to prevent violations. The rule would end up living outside the domain, but it may be acceptable. Uniqueness rules aren't that interesting.
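Here is a minimal sketch of the dedicated-AR option mentioned above, with the whole set owned by one consistency boundary (all type and member names are illustrative only):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public record OpeningTime(DayOfWeek Day, TimeSpan Opens, TimeSpan Closes)
    {
        public bool Overlaps(OpeningTime other) =>
            Day == other.Day && Opens < other.Closes && other.Opens < Closes;
    }

    // The AR owns the collection, so the no-overlap rule is enforced in one place.
    public class RestaurantSchedule
    {
        private readonly List<OpeningTime> _openingTimes = new();

        public void Add(OpeningTime candidate)
        {
            if (_openingTimes.Any(existing => existing.Overlaps(candidate)))
                throw new InvalidOperationException("Opening times must not overlap.");

            _openingTimes.Add(candidate);
        }
    }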
Eventual Consistency
A very common way to deal with set validation problems is not to try to prevent the violation at all, but to detect and correct it after the fact, either through manual or automated compensating actions. This can be as simple as an exception report that displays overlapping entries, or more complex, such as a single-threaded listener to OpeningTimeAdded events that checks for conflicts and marks such entries for correction.
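A sketch of that detect-and-correct variant: a handler, run single-threaded, inspects each OpeningTimeAdded event and marks conflicting entries for later correction. The event and store types below are hypothetical placeholders.

    using System;

    public record OpeningTimeAdded(Guid RestaurantId, Guid OpeningTimeId);

    public interface IOpeningTimeStore
    {
        bool HasOverlap(Guid restaurantId, Guid openingTimeId);   // hypothetical query
        void MarkForCorrection(Guid openingTimeId);               // hypothetical compensation hook
    }

    // Running this single-threaded means two concurrent additions cannot both slip past the check.
    public class OpeningTimeConflictDetector
    {
        private readonly IOpeningTimeStore _store;

        public OpeningTimeConflictDetector(IOpeningTimeStore store) => _store = store;

        public void Handle(OpeningTimeAdded @event)
        {
            if (_store.HasOverlap(@event.RestaurantId, @event.OpeningTimeId))
                _store.MarkForCorrection(@event.OpeningTimeId);
        }
    }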
As for your question why lazy loading is considered bad practice:
One issue I have with lazy loading is that you need to be constantly aware of potential performance issues. On the other hand, lazy loading is a decent default behaviour for ORMs. It is great for long-running applications, but it is pointless to implement it on top of your services. I think it depends a lot on the data you are using, so it is neither good nor bad in itself.
Are there any frameworks out there for MongoDB in C# that can automatically map document relations? I'm talking about a model or "schema" that is purely defined by the documents themselves and not by objects in .NET or any other external schema, for that matter.
Think dynamic objects / BsonDocuments that can automatically lazy-load relations to other documents.
I have several ideas for how to solve this myself; however, if there already exist frameworks or perhaps BsonDocument extensions (which is how I intended to solve it myself), that would lessen the need to add complexity to the project I'm working on.
The question is largely off-topic ("are there frameworks"), but I'd like to challenge the idea itself:
that would lessen the need to add complexity to the project I'm working on.
I think it would merely hide complexity by moving it to a part of the code that knows nothing about your functional or non-functional requirements. Combined with a database that has no constraints except uniqueness, that doesn't sound like a good idea.
I'd recommend staying away from lazy loading as an almost general rule, because it makes it impossible to tell whether
an operation is super costly (database call) or a mere memory lookup
the properties' state will be fetched on access, or is cached, thus hiding the key aspect of serialization from the user.
In other words: I'd stay away from the idea, or use something like EF with whatever database for it. If you don't care about your serialization, use a well-tested commonplace solution.
Not sure if there's an official name, but by DataContext I mean an object which transparently maintains objects' state, providing change tracking, unit-of-work functionality, concurrency handling, and probably many other useful features. (In Entity Framework it's ObjectContext, in NHibernate, ISession.)
Eventually I've come to the idea that something like that should be implemented in my application (it uses MongoDB as a back-end, and MongoDB's partial updates are fine when we're able to track individual property changes).
So actually, I've got several questions on this subject
Could anyone formulate the requirements for a DataContext? What's your understanding of its tasks and responsibilities? (The most relevant material I've managed to find is Esposito's book, but unfortunately that's at about MSDN-sample level.)
What would you suggest for a change-tracking implementation? (In the simplest case it's possible to track changes "manually" in the entities, but that requires extra coding and mixes the DAL with business logic, so I'm mostly interested in an "automatic" way that keeps the entities closer to POCOs.)
Is there a way to utilize some existing solution? (I hoped NHibernate's infrastructure would allow plugging in a custom module to work with Mongo behind the scenes, but I'm not sure it allows working with non-SQL databases at all.)
The DataContext (ObjectContext or DbContext in EF) is nothing else than an implementation of the Unit of Work (UoW)/Repository pattern.
I suggest Fowler's book Patterns of Enterprise Application Architecture, in which he outlines the implementation of several persistence patterns. That might be a help in implementing your own solution.
A DataContext basically needs to fulfill the job of a UoW. It needs to handle the reading and management of objects that are involved in a given lifecycle (e.g. an HTTP request), such that there are no two objects in memory that represent the same record in the DB. Moreover it needs to provide some change tracking for performing partial updates to the DB (as you already mentioned).
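The "no two objects in memory for the same record" part is the identity-map responsibility. A minimal sketch of that piece alone (names are illustrative, not a full UoW):

    using System;
    using System.Collections.Generic;

    // Within one context's lifetime, the same key always yields the same in-memory instance.
    public class IdentityMap<TKey, TEntity> where TEntity : class
    {
        private readonly Dictionary<TKey, TEntity> _loaded = new();

        public TEntity GetOrLoad(TKey key, Func<TKey, TEntity> loadFromDb)
        {
            if (!_loaded.TryGetValue(key, out var entity))
            {
                entity = loadFromDb(key);
                _loaded[key] = entity;
            }
            return entity;
        }
    }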
As regards change tracking, I fully agree that polluting properties with change events etc. is bad. One of the templates recently introduced in EF 4.1 uses proxies to handle that and to allow plain POCOs.
Answer to question 2: to make POCO classes work, you will need to generate code at run-time, possibly using System.Reflection.
If you analyse Entity Framework, you will see that it requires virtual properties to do change tracking... that is because it needs to create a generated class at run-time that overrides each property and adds code to tell the DataContext when someone changes that property.
Entity Framework also generates code to initialize collections, so that when someone tries to do operations like Add and Remove, the collection object itself knows what to do.
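In practice that means the POCO's properties and collection navigations must be virtual so the runtime-generated proxy subclass can intercept them. A small illustrative example (the classes themselves are hypothetical):

    using System.Collections.Generic;

    public class Order
    {
        public virtual int Id { get; set; }
        public virtual string CustomerName { get; set; }
        public virtual ICollection<OrderLine> Lines { get; set; }   // virtual ICollection for proxy tracking
    }

    public class OrderLine
    {
        public virtual int Id { get; set; }
        public virtual string Product { get; set; }
    }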
Question - What is a good best-practice approach for how I can save/keep in sync an in-memory graph of objects with the database?
Background:
Say I have the classes Node and Relationship, and the application builds up a graph of related objects using these classes. There might be 1000 nodes with various relationships between them. The application needs to query the structure, hence an in-memory approach is no doubt good for performance (e.g. traversing the graph from Node X to find the root parents).
The graph does need to be persisted however into a database with tables NODES and RELATIONSHIPS.
Therefore, what is a good best-practice approach for how I can save/keep in sync an in-memory graph of objects with the database?
Ideal requirements would include:
build up changes in-memory and then 'save' afterwards (mandatory)
when saving, apply updates to the database in the correct order to avoid hitting any database constraints (mandatory; see the ordering sketch after this list)
keep the persistence mechanism separate from the model, for ease of changing the persistence layer if needed, e.g. don't just wrap an ADO.NET DataRow in the Node and Relationship classes (desirable)
mechanism for doing optimistic locking (desirable)
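A rough sketch of the ordering requirement from the list above, assuming hypothetical BeginTransaction/InsertNode/InsertRelationship/DeleteNode/DeleteRelationship helpers inside whatever persistence class ends up owning this:

    // Persist nodes before the relationships that reference them, and delete in the
    // reverse order, so foreign-key style constraints are never violated mid-save.
    public void Save(IEnumerable<Node> nodes, IEnumerable<Relationship> relationships)
    {
        using var transaction = BeginTransaction();   // hypothetical helper

        foreach (var node in nodes.Where(n => n.IsNew)) InsertNode(node);
        foreach (var rel in relationships.Where(r => r.IsNew)) InsertRelationship(rel);

        foreach (var rel in relationships.Where(r => r.IsDeleted)) DeleteRelationship(rel);
        foreach (var node in nodes.Where(n => n.IsDeleted)) DeleteNode(node);

        transaction.Commit();
    }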
Or, for a smallish application, is the overhead of all this just not worth it, and should I simply hit the database each time for everything (assuming the response times are acceptable)? I would still like to avoid that, if it isn't too much extra overhead, to remain somewhat scalable performance-wise.
I'm using the self-tracking entities in Entity Framework 4. After the entities are loaded into memory, StartTracking() MUST be called on every entity. Then you can modify your entity graph in memory without any DB operations. When you're done with the modifications, you call the context extension method ApplyChanges(rootOfEntityGraph) and SaveChanges(), and your modifications are persisted. Now you have to start the tracking again on every entity in the graph. Two hints/ideas I'm using at the moment:
1.) call StartTracking() at the beginning on every entity
I'm using an IWorkspace interface to abstract the ObjectContext (it simplifies testing; see the open-source implementation bbv.DomainDrivenDesign at SourceForge). They also use a QueryableContext. So I created a further concrete Workspace and QueryableContext implementation and intercept the loading process with my own IEnumerable implementation. When the workspace's consumer executes the query obtained from CreateQuery(), my intercepting IEnumerable object registers an event handler on the context's change tracker (the ObjectStateManager). In this event handler I call StartTracking() for every entity loaded and added into the context (this doesn't work if you load the objects with NoTracking, because in that case the objects aren't added to the context and the event handler will not be fired). After the enumeration in the self-made iterator, the event handler on the ObjectStateManager is deregistered.
2.) call StartTracking() after ApplyChanges()/SaveChanges()
In the workspace implementation, I ask the context's ObjectStateManager for the modified entities, e.g.:
var addedEntities = this.context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
--> analogous for modified entities
cast them to IObjectWithChangeTracker and call the AcceptChanges() method on the entity itself. This restarts the object's change tracker.
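Pulled together, that restart step might look roughly like this (a sketch that assumes the STE-generated IObjectWithChangeTracker interface and its AcceptChanges() helper, as described above):

    // After SaveChanges(): ask the ObjectStateManager for the added and modified
    // entries and restart tracking on each self-tracking entity.
    var entries = this.context.ObjectStateManager
        .GetObjectStateEntries(EntityState.Added | EntityState.Modified);

    foreach (var entry in entries)
    {
        if (entry.Entity is IObjectWithChangeTracker trackable)
        {
            trackable.AcceptChanges();   // restarts the entity's change tracker
        }
    }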
For my project I have the same mandatory points as you. I played around with EF 3.5 and didn't find a satisfactory solution. But the new self-tracking entities in EF 4 seem to fit my requirements (as far as I have explored the functionality).
If you're interested, I'll send you my "spike"-project.
Does anyone have an alternative solution? My project is a server application which holds objects in memory for fast operations, while modifications should also be persisted (no round trip to the DB). At some points in the code the object graphs are marked as deleted/terminated and are removed from the in-memory container. With the solution explained above I can reuse the generated model from EF and don't have to code and wrap all objects myself again. The generated code for the self-tracking entities comes from T4 templates, which can be adapted very easily.
Thanks a lot for any other ideas/criticism.
Short answer is that you can still keep a graph (collection of linked objects) of the objects in memory and write the changes to the database as they occur. If this is taking too long, you could put the changes onto a message queue (but that is probably overkill) or execute the updates and inserts on a separate thread.
For a large project, would it make sense to have one datacontext which maps out your database, which you can interact with from your classes?
Or would it make more sense to split this up into smaller DataContexts which are focused on the specific tasks within the database that will be required?
I'm curious as to the performance. It's my understanding that the DataContext itself is a very lightweight object which only initializes its internal collections as they're required, etc. Therefore, dealing with a DataContext with many definitions but only two tables of data should be as fast as dealing with a special DataContext containing only those two tables.
I also think you would benefit at JIT time, as the first class to do data access will cause your DataContext to be compiled, and it is then available to all classes.
I'm assuming you're asking about design versus runtime pattern. In general I'd say no:
While it may be possible to partition your database into multiple data contexts, that would be desirable only if there were zero overlap between the contexts.
Overlap is Bad
e.g. you have a WebsiteContext and an AdminContext. The WebsiteContext is for displaying Product and fulfilling Orders. A WebsiteUser is attached to an Order. The AdminContext is for your Staff members to process refunds for cancelled Orders, which also reference WebsiteUser. The AdminContext also needs to reset passwords and update other details for the WebsiteUser.
You're thinking of doing this because you don't want the website to process, or even know about, Returns.
WebsiteContext
Product -- Order -- WebsiteUser
AdminContext
Staff -- Returns -- Order -- WebsiteUser
In the above, we can see we're duplicating many objects in the different data contexts. This smells bad, and it really indicates that artificially dividing the database into different data contexts is the wrong decision. Do you have two databases after all, or just the one? The duplication violates the DRY principle (Don't Repeat Yourself), because WebsiteContext.WebsiteUser is not the same as AdminContext.WebsiteUser, and in all likelihood the code is going to be messy when something needs to care which one it's referencing.
The LINQ DataContext is just an O/R mapper and needs to be treated as a fancy black box that makes writing some of the data access code easier. Some LINQ demos make it look like you no longer need the other layers, but a program of any complexity still benefits from a layered design.
You're probably better off treating the LINQ objects as just objects for easily transferring data, and creating a Domain layer that hides them as an implementation detail. Have a read about DDD (Domain-Driven Design).
On their own, just using the LINQ objects from the UI most resembles the Transaction Script pattern. In such a case you'd still benefit from having a logic layer that takes care of the details.
While you may not want the responsibility of a context to be so broad, the data context is just a representation of the database. It's not a security mechanism, and it can't prevent you from corrupting the data.
From a "repository pattern" angle, there is something to be said for having separate aggregates and minimizing the navigation properties between aggregates. I'm not 100% sure what that something is though... at the moment I'm considering using a moderately-sized dbml, with multiple repositories using it (the UI doesn't use the datacontexts directly - only the repository classes), with the navigation properties marked internal so the DAL can use them... Maybe...
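For what it's worth, a minimal sketch of that arrangement, with one shared (hypothetical) MyDataContext generated from the single dbml sitting behind repository classes that are the only thing the UI sees:

    using System.Linq;

    public class OrderRepository
    {
        private readonly MyDataContext _context;   // generated from the single dbml

        public OrderRepository(MyDataContext context) => _context = context;

        public Order GetById(int id) =>
            _context.Orders.Single(o => o.Id == id);

        public void Add(Order order)
        {
            _context.Orders.InsertOnSubmit(order);   // LINQ to SQL table API
            _context.SubmitChanges();
        }
    }

Navigation properties marked internal would then still be usable inside the DAL without leaking into the UI layer.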