I have some difficulties implementing the repository and service pattern in my RavenDB project. My main concern is what my repository interface should look like, because in RavenDB I use a couple of indexes for my queries.
Let's say I need to fetch all items where the parentid equals 1. One way is to expose IQueryable List(), get all documents, and then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any of RavenDB's index features. The other approach is to have something like IEnumerable Find(string index, Func predicate) in the repository, but that also seems like a bad idea because it's not generic enough and would require me to reimplement this method if I switched from RavenDB to a conventional SQL server.
So how can I implement a generic repository but still get the benefits of indexes in RavenDB?
This post sums it all up very nicely:
http://novuscraft.com/blog/ravendb-and-the-repository-pattern
First off, ask why you want to use the repository pattern?
If you're wanting to use the pattern because you're doing domain driven design, then as another of these answers points out, you need to re-think the intent of your query, and talk about it in terms of your domain - and you can start to model things around this.
In that case, specifications are probably your friend and you should look into them.
HOWEVER, let's look at a single part of your question momentarily before continuing with my answer:
seems like a bad idea because it's not generic enough and requires that I implement this method for if I would change from RavenDB to a common sql server.
You're going about it the wrong way - trying to make your system entirely persistence-agnostic at this level is asking for trouble - if you try hiding the unique features of your datastore from the queries themselves then why bother using RavenDB?
A method I tend to use in simple document-oriented applications (i.e., where I do talk in terms of data, which is what you appear to be doing) is to split my queries from my commands.
Ask yourself: why do you want to query for your documents by parent ID? Is it to display a list on a page? Why are you trying to model this in terms of documents, then? Why not model it in terms of a view model and use the most effective method of retrieving that data from RavenDB (a query over an index, dynamic or otherwise)? Stick this in a factory which takes 'some inputs' and generates 'the output'; if you do decide to change your persistence store, you can change these factories. (I go one step further in my ASP.NET MVC applications: I have single-action controllers, which I don't call controllers, and in most cases I make the query from those.)
If you want to actually pull out your documents by parent id in order to update them or run some business logic across them, perhaps you've modelled them wrong - a write operation will typically only involve a change to a single document, or in other words you should be modelling your documents around your transaction boundaries.
TL;DR
Think about what it is you actually want to achieve. Why do you want to use the "Repository pattern" or the "Service pattern"? These words exist as ways of describing a scenario you might end up with if you model your application around your needs, as a common way of expressing the role of a certain object - not as something you need to shoehorn your every piece of functionality into.
Let's say I need to fetch all items where the parentid equals 1.
First, stop thinking of your data access needs this way.
You DO NOT need to "fetch all items where the parentid equals 1". It will help to try and stop thinking in such a data oriented way.
What you need is to fetch all items with a particular parent. This is a concept that exists in your problem space (your application's domain).
The fact that you model this in the database with a foreign key and a field named parentid is an implementation detail. Encapsulate this, do not leak it throughout your application.
One way is to use the IQueryable List() and get all documents and then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any index features in RavenDB. So the other approach is to have something like this, IEnumerable Find(string index, Func predicate) in the repository but that also seems like a bad idea because
Both of these are bad ideas. What you are suggesting is requiring the code that calls your repository or query to have knowledge of your schema.
Why should the consumer of your repository care or know that there is a parentid field? If this changes, if the definition of some particular concept in your problem space changes, how many places in your code will have to change?
Every single place that fetches items with a particular parent.
This is bad, it is the antithesis of encapsulation.
My opinion is that you will want to model queries as explicit concepts, and not as lambdas or strings passed around and used all over.
You can model queries explicitly with the Specification pattern, named query methods on a repository, Query Object pattern, etc.
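As a minimal sketch of the "named query method on a repository" option, the interface below exposes the domain concept (the children of a given parent) instead of a predicate or an index name. All type and member names here (Item, IItemRepository, GetChildrenOf, InMemoryItemRepository) are hypothetical, and the in-memory class only stands in for a real store; a RavenDB-backed implementation would presumably query an index inside the same method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain type. ParentId is a storage detail that callers never touch.
public class Item
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public string Name { get; set; } = "";
}

// The repository exposes the concept "children of a parent",
// not field names, index names, or arbitrary predicates.
public interface IItemRepository
{
    IReadOnlyList<Item> GetChildrenOf(int parentId);
}

// In-memory stand-in (an assumption for illustration only). A store-specific
// implementation can use whatever index it likes behind the same method.
public class InMemoryItemRepository : IItemRepository
{
    private readonly List<Item> _items;

    public InMemoryItemRepository(IEnumerable<Item> items)
    {
        _items = items.ToList();
    }

    public IReadOnlyList<Item> GetChildrenOf(int parentId)
    {
        // The only place that knows the relationship is stored as ParentId.
        return _items.Where(i => i.ParentId == parentId).ToList();
    }
}
```

If the way a parent is recorded ever changes, only the repository implementation changes; callers keep asking for GetChildrenOf.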
it's not generic enough and requires that I implement this method for if I would change from RavenDB to a common sql server.
Well, that Func is too generic. Again, think about what your consuming code will need to know in order to use such a method of querying: you will be tying upper layers of your code directly to your DB schema by doing this.
Also, if you change from one storage engine to another, you cannot avoid re-implementing queries where performance was enough of a factor to use storage-engine-specific aids (indexes in Raven, for example).
I would actually discourage you from using the repository pattern. In most cases, it is over-architecting and actually makes the code more complicated.
Ayende has made a number of posts to that end recently:
http://ayende.com/Blog/archive/2011/03/16/architecting-in-the-pit-of-doom-the-evils-of-the.aspx
http://ayende.com/Blog/archive/2011/03/18/the-wages-of-sin-over-architecture-in-the-real-world.aspx
http://ayende.com/Blog/archive/2011/03/22/the-wages-of-sin-proper-and-improper-usage-of-abstracting.aspx
I recommend just writing against Raven's native API.
If you feel that my response is too general, list some of the benefits you hope to gain from using another layer of abstraction and we can continue the discussion.
Related
The Problem
How are large collections implemented in DDD that "feel" like they should be a part of the aggregate root, yet would be impractical if they were? Here are a few examples based on my domain.
Employee Aggregate Root
Announcements Collection
Direct Messages Collection
Product Aggregate Root
Stock Items Collection
etc. etc..
What I'm Thinking
I would like to keep the ability to navigate to these large collections from the aggregate root, but since I'm wrapping my O/RM with Repositories, lazy loading isn't really an option... unless I implement lazy loading by injecting the necessary repository. But I know from what I've read about DDD that domain entities should not know about any such repositories.
The other option would be to take the approach that any potentially large collection of entities in my domain is an Aggregate Root and should have its own repository with the required interface to get the collection of items by another aggregate root. Eg.
public interface IStockRepository
{
    IEnumerable<StockItem> FetchByProduct(Product product);
    // ...
}
This "I would like to keep the ability to navigate to these large collections from the aggregate root" ... is a smell. You seem pretty obsessed, if you don't mind me saying, with the structure of your aggregate and not with its behavior, what problem(s) it is solving, any invariants that come into play. Frankly, the feeling you have is misplaced. It's a residue of our structural, database oriented way of thinking.
In general, I'd say, one should not have these large collections in the first place. For one, loading them will require resources (memory, CPU, bandwidth) better spent elsewhere. From a more functional perspective, people tend not to deal with large amounts at once anyway, and even computers can do more work when you break things down into units of work. As such, try to stay away from large collections and always question "why" you'd need them in the first place.
An announcement could be its own aggregate, referring to the employee by its id, so we know who the announcement was about (or for?). If the announcements are targeted at groups of employees, you might want to look into what defines that group, and model it explicitly. A direct message could also be its own aggregate because it is probably a message from one person to another. One could say the employee has the role of being a message recipient and/or sender. Again, referring to the employee aggregate by id might suffice. A stock item might be treated individually and refer to the product it represents within the stock by its productid. What is the behavior of an employee, an announcement, a direct message, a product, a stockitem? How and when does changing the state of its collaborators affect them and really, why is that? It's a means to a root cause. Find it.
All that said, there are times when you can bend the rules a bit, but they should be few.
Take a look at the Forum DDD example from Vaughn Vernon. He modeled the large collections outside the aggregate root. Creation is done by a factory method on the aggregate so that it keeps control of certain rules, for example that a discussion cannot be created when the Forum is closed. Actions are done through the Forum aggregate root (like startDiscussion and moderatePost).
The method returns an entity (Post) that needs to be saved in a separate repository (PostRepository) by the application service. Now you can have large collections without needing to load them every time.
https://github.com/VaughnVernon/IDDD_Samples/tree/master/iddd_collaboration/src/main/java/com/saasovation/collaboration/domain/model/forum
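The idea can be sketched in C# roughly as follows. This is not the code from the linked Java sample; it is a simplified illustration with hypothetical names (Discussion, IDiscussionRepository, InMemoryDiscussionRepository, ForumApplicationService), showing a factory method on the aggregate root that enforces the "closed forum" rule while the created entity is stored through its own repository.

```csharp
using System;
using System.Collections.Generic;

public class Discussion
{
    public Guid Id { get; }
    public Guid ForumId { get; }   // refers to the Forum by identity only
    public string Subject { get; }

    // Internal so that discussions are created through Forum.StartDiscussion.
    internal Discussion(Guid forumId, string subject)
    {
        Id = Guid.NewGuid();
        ForumId = forumId;
        Subject = subject;
    }
}

public class Forum
{
    public Guid Id { get; } = Guid.NewGuid();
    public bool IsClosed { get; private set; }

    public void Close() { IsClosed = true; }

    // Factory method: the root enforces its rule, but it does not hold
    // the (potentially very large) collection of discussions.
    public Discussion StartDiscussion(string subject)
    {
        if (IsClosed)
            throw new InvalidOperationException("Forum is closed.");
        return new Discussion(Id, subject);
    }
}

public interface IDiscussionRepository
{
    void Save(Discussion discussion);
}

// Hypothetical in-memory repository, for illustration only.
public class InMemoryDiscussionRepository : IDiscussionRepository
{
    public List<Discussion> Saved { get; } = new List<Discussion>();
    public void Save(Discussion discussion) { Saved.Add(discussion); }
}

// The application service coordinates: ask the root, then persist the result.
public class ForumApplicationService
{
    private readonly IDiscussionRepository _discussions;

    public ForumApplicationService(IDiscussionRepository discussions)
    {
        _discussions = discussions;
    }

    public Discussion StartDiscussion(Forum forum, string subject)
    {
        Discussion discussion = forum.StartDiscussion(subject);
        _discussions.Save(discussion);
        return discussion;
    }
}
```

Loading a Forum never loads its discussions; they are fetched separately, by forum id, only when a use case needs them.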
We have an ASP.NET MVC site that uses Entity Framework abstractions with Repository and UnitOfWork patterns. What I'm wondering is how others have implemented navigation of complex object graphs with these patterns. Let me give an example from one of our controllers:
var model = new EligibilityViewModel
{
    Country = person.Pathway.Country.Name,
    Pathway = person.Pathway.Name,
    Answers = person.Answers.ToList(),
    ScoreResult = new ScoreResult(person.Score.Value),
    DpaText = person.Pathway.Country.Legal.DPA.Description,
    DpaQuestions = person.Pathway.Country.Legal.DPA.Questions,
    Terms = person.Pathway.Country.Legal.Terms,
    HowHearAboutUsOptions = person.Pathway.Referrers
};
It's a registration process and pretty much everything hangs off the POCO class Person. In this case we're caching the person through the registration process. I've now started implementing the latter part of the registration process which requires access to data deeper in the object graph. Specifically DPA data which hangs off Legal inside Country.
The code above is just mapping out the model information into a simpler format for the ViewModel. My question is do you consider this fairly deep navigation of the graph good practice or would you abstract out the retrieval of the objects further down the graph into repositories?
In my opinion, the important question here is - have you disabled LazyLoading?
If you haven't done anything, then it's on by default.
So when you do Person.Pathway.Country, you will be invoking another call to the database server (unless you're doing eager loading, which I'll speak about in a moment). Given you're using the Repository pattern - this is a big no-no. Controllers should not cause direct calls to the database server.
Once a Controller has received the information from the Model, it should be ready to do projection (if necessary), and pass onto the View, not go back to the Model.
This is why in our implementation (we also use repository, ef4, and unit of work), we disable Lazy Loading, and allow the pass through of the navigational properties via our service layer (a series of "Include" statements, made sweeter by enumerations and extension methods).
We then eager-load these properties as the Controllers require them. But the important thing is, the Controller must explicitly request them.
Which basically tells the UI - "Hey, you're only getting the core information about this entity. If you want anything else, ask for it".
We also have a Service Layer mediating between the controllers and the repository (our repositories return IQueryable<T>). This allows the repository to get out of the business of handling complex associations. The eager loading is done at the service layer (as well as things like paging).
The benefit of the service layer is simple: looser coupling. The Repository handles only Add, Remove, and Find (which returns IQueryable); the Unit of Work handles "newing" of DCs and committing of changes; the Service layer handles materialization of entities into concrete collections.
It's a nice, 1-1 stack-like approach:
personService.FindSingle(1, "Addresses") // Controller calls service
|
--- Person FindSingle(int id, string[] includes) // Service Interface
|
--- return personRepository.Find().WithIncludes(includes).WithId(id); // Service calls Repository, adds on "filter" extension methods
|
--- IQueryable<T> Find() // Repository
|
-- return db.Persons; // return's IQueryable of Persons (deferred exec)
We haven't got up to the MVC layer yet (we're doing TDD), but a service layer could be another place you could hydrate the core entities into ViewModels. And again - it would be up to the controller to decide how much information it wishes.
Again, it's all about loose coupling. Your controllers should be as simplistic as possible, and not have to worry about complex associations.
In terms of how many Repositories, this is a highly debated topic. Some like to have one per entity (overkill if you ask me), some like to group based on functionality (makes sense in terms of functionality, easier to work with), however we have one per aggregate root.
From your model, I can only guess that "Person" is the only aggregate root.
Therefore, it doesn't make much sense having another repository to handle "Pathways", when a pathway is always associated with a particular "Person". The Person repository should handle this.
Again - maybe if you screencapped your EDMX, we could give you more tips.
This answer might extend a little beyond the scope of the question, but I thought I'd give an in-depth answer, as we are dealing with this exact scenario right now.
HTH.
It depends on how much of the information you're using at any one time.
For example, if you just want to get the country name for a person (person.Pathway.Country.Name) what is the point in hydrating all of the other objects from the database?
When I only need a small part of the data I tend to just pull out what I'm going to use. In other words I will project into an anonymous type (or a specially made concrete type if I must have one).
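A projection of this kind might look like the sketch below. The entity classes are cut-down stand-ins for the ones in the question, and PersonLocation and PersonQueries are hypothetical names. The query is written against IQueryable<Person>, so with a LINQ provider such as Entity Framework the Select would normally be translated into a query for just those two columns; here it can also run over an in-memory list via AsQueryable().

```csharp
using System;
using System.Linq;

// Cut-down stand-ins for the entities in the question.
public class Country
{
    public string Name { get; set; } = "";
}

public class Pathway
{
    public string Name { get; set; } = "";
    public Country Country { get; set; } = new Country();
}

public class Person
{
    public int Id { get; set; }
    public Pathway Pathway { get; set; } = new Pathway();
}

// A "specially made concrete type" holding only what the caller needs.
public class PersonLocation
{
    public string PathwayName { get; set; } = "";
    public string CountryName { get; set; } = "";
}

public static class PersonQueries
{
    // Projects straight into PersonLocation instead of hydrating
    // Person, Pathway and Country and then reading two properties.
    public static PersonLocation GetLocation(IQueryable<Person> people, int personId)
    {
        return people
            .Where(p => p.Id == personId)
            .Select(p => new PersonLocation
            {
                PathwayName = p.Pathway.Name,
                CountryName = p.Pathway.Country.Name
            })
            .Single();
    }
}
```

An anonymous type (`new { ... }`) works the same way when the result does not need to leave the method.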
It's not a good idea to pull out an entire object and everything related to that object every time you want to access some properties. What if you're doing this once every postback, or even multiple times? By doing this you might be making life easier in the short term at the cost of making your application less scalable in the long term.
As I stated at the start though, there isn't a one size fits all rule for this, but I'd say it's rare that you need to hydrate that much information.
I had asked this question in a much more long-winded way a few days ago, and the fact that I got no answers isn't surprising considering the length, so I figured I'd get more to the point.
I have to make a decision about what to display to a user based on their assignment to a particular customer. The domain object looks like this vastly simplified example:
public class Customer
{
    public string Name { get; set; }
    public IEnumerable<Users> AssignedUsers { get; set; }
}
In the real world, I'll also be evaluating whether they have permissions (using bitwise comparisons on security flags) to view this particular customer even if they aren't directly assigned to it.
I'm trying to stick to domain-driven design (DDD) principles here. Also, I'm using LINQ to SQL for my data access. In my service layer, I supply the user requesting the list of customers, which right now is about 1000 items and growing by about 2% a month.
If I am strict about keeping logic in my service layer, I will need to use Linq to do a .Where that evaluates whether the AssignedUsers list contains the user requesting the list. This will cause a cascade of queries for each Customer as the system enumerates through. I haven't done any testing, but this seems inefficient.
If I fudge on the no-logic-in-the-data, then I could simply have a GetCustomersByUser() method that will do an EXISTS type of SQL query and evaluate security at the same time. This will surely be way faster, but now I'm talking about logic creeping into the database, which might create problems later.
I'm sure this is a common question people come up against when rolling out Linq... any suggestions on which way is better? Is the performance hit of Linq's multiple queries better than logic in my database?
Which is worse?
Depends who you ask.
Possibly if you ask the DDD ultra-purist they'll say logic in the database is worse.
If you ask pretty much anyone else, IMHO, especially your end users, pragmatic developers and the people who pay for the hardware and the software development, they'll probably say a large performance hit is worse.
DDD has much to commend it, as do lots of other design approaches, but they all fall down if you dogmatically follow them to the point of coming up with a "pure" design at the expense of real world considerations, such as performance.
If you really are having to perform this sort of query on data, then the database is almost certainly far better at performing the task.
Alternatively, have you "missed a trick" here? Is your design, however DDD, actually not right?
Overall - use your tools appropriately. By all means strive to keep logic cleanly separated in your service layer, but not when that logic is doing large amounts of work that a database is designed for.
LINQ is an abstraction, it wraps a bunch of functionality into a nice little package with a big heart on top.
With any abstraction, you're going to get overhead mainly because things are just not as efficient as you or I might want them. MS did a great job in making LINQ quite efficient.
Logic should be where it is needed. Pure is nice, but if you are delivering a service or product you have to have the following in mind (these are in no particular order):
1. Maintenance. Will you easily be able to get some work done after it's released without pulling the entire thing apart?
2. Scalability.
3. Performance.
4. Usability.
Number 3 is one of the biggest aspects when working with the web. Would you do trigonometry on a SQL Server? No. Would you filter results based on input parameters? Yes.
SQL Server is built to handle massive queries, filtering, sorting, and data mining. It thrives on that, so let it do it.
It's not a logic creep, it's putting functionality where it belongs.
If AssignedUsers is properly mapped (i.e. the association is generated by the LINQ to SQL designer, or you have marked the property with AssociationAttribute or some other attribute from the http://msdn.microsoft.com/en-us/library/system.data.linq.mapping(v=VS.90).aspx namespace; I'm not sure right now), LINQ to SQL will translate the LINQ query into a SQL command and will not iterate through AssignedUsers for each Customer.
You may also use a 'reversed' query like
from c in Customer
join u in Users on c.CustomerId equals u.AssignedToCustomerId // or another condition that links user to customer
where <your condition here>
select c
If I am strict about keeping logic in my service layer, I will need to use Linq to do a .Where that evaluates whether the AssignedUsers list contains the user requesting the list. This will cause a cascade of queries for each Customer as the system enumerates through. I haven't done any testing, but this seems inefficient.
There should be no need to enumerate a local Customer collection.
The primary purpose of LinqToSql, is to allow you to declare the logic in the service layer, and execute that logic in the data layer.
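int userId = CurrentUser.UserId;
// Still an IQueryable: nothing has been sent to the database yet.
IQueryable<Customer> customerQuery =
    from c in dataContext.Customers
    where c.AssignedUsers.Any(au => au.UserId == userId) // == (comparison), not = (assignment)
    select c;
// ToList() executes the query on the database server.
List<Customer> result = customerQuery.ToList();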
I think your model is best described as a many-to-many relationship between the Customer class and the User class. Each User references a list of related Customers, and each Customer references a list of related Users. From a database perspective, this is expressed using a join table (according to Microsoft's LINQ to SQL terminology, they call this a "junction table").
Many-to-many relationships are the one feature LINQ to SQL doesn't support out of the box; you will probably notice this if you try generating a DBML.
Several blogs have published workarounds, including one from MSDN (without any concrete examples, unfortunately). There's one blog (2-part post) which closely adheres to the MSDN suggestion:
Link
Link
I'd personally go with the stored proc. That's the right tool for the job.
Not using it might be a nice design choice, but design paradigms are there to guide you, not to constrain you, in my opinion.
Plus, the boss only cares about performance :-)
I am using ASP.NET MVC 2 and C# with Entity Framework 4.0 to code against a normalised SQL Server database. A part of my database structure contains a table of entries with foreign keys relating to sub-tables containing drivers, cars, engines, chassis etc.
I am following the Nerd Dinner tutorial which sets up a repository for dinners which is fair enough. Do I do one for drivers, one for engines, one for cars and so on or do I do one big one for entries?
Which is the best practise for this type of work? I am still new to this method of coding.
I guess there's really no single "best practice" for this - it depends on your coding style and the requirements for your app. You definitely could create a repository for each entity type in your system - that'll work just fine.
In your case here, I would probably at least consider having a repository for drivers, and possibly a second one for cars, engines, chassis (since those are kinda in the same area of expertise - they're interconnected, they "belong" together).
But: of course, if that single repository for cars, engine and chassis gets too bloated, you might consider breaking it up into three separate repositories.
I would try to find a balance between the number of repositories - try to group together what logically belongs together - and the number of methods on those repositories. Five, ten methods is okay - if you're talking 20, 30 or 50 methods, your repository might just be too big and unwieldy.
This is definitely an architectural decision, and as such, there are not really a lot of hard facts to guide you - it's more of a "gut feeling" and experience kind of thing. If you don't have that necessary experience yet - go with one approach, use it, and when you're done - look at it again with a critical eye and see: what worked about it? What about it didn't work? And then in your next project, try some other approach and question its validity, too, at the end of the project. Live and learn!
Personally speaking, I find it best to create a separate repository for each table.
I then create a services layer where, in one class, I run all the commands for a specific action (e.g., changing an existing driver's car to a newly added car). The services layer is where I'll call multiple repositories if an action involves multiple objects that are interrelated.
I try my best to keep my repositories as "dumb" as possible and put all the "smart" stuff in the services layer. The extra services layer also helps me avoid bloating my controllers.
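To make the split concrete, here is a rough sketch of the "change an existing driver's car to a newly added car" action. Every name in it (Car, Driver, ICarRepository, IDriverRepository, DriverService, and the in-memory repositories) is hypothetical; the point is only that each repository does one simple thing and the service sequences them.

```csharp
using System;
using System.Collections.Generic;

public class Car
{
    public int Id { get; set; }
    public string Model { get; set; } = "";
}

public class Driver
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int? CarId { get; set; }
}

// "Dumb" repositories: plain storage operations, no business rules.
public interface ICarRepository
{
    void Add(Car car);
}

public interface IDriverRepository
{
    Driver Get(int id);
    void Update(Driver driver);
}

// The "smart" part: one action that spans two repositories.
public class DriverService
{
    private readonly ICarRepository _cars;
    private readonly IDriverRepository _drivers;

    public DriverService(ICarRepository cars, IDriverRepository drivers)
    {
        _cars = cars;
        _drivers = drivers;
    }

    public void AssignNewCar(int driverId, Car newCar)
    {
        _cars.Add(newCar);                      // store the new car first
        Driver driver = _drivers.Get(driverId);
        driver.CarId = newCar.Id;               // then point the driver at it
        _drivers.Update(driver);
    }
}

// In-memory stand-ins, for illustration only.
public class InMemoryCarRepository : ICarRepository
{
    public List<Car> Cars { get; } = new List<Car>();

    public void Add(Car car)
    {
        car.Id = Cars.Count + 1;                // simplistic id assignment
        Cars.Add(car);
    }
}

public class InMemoryDriverRepository : IDriverRepository
{
    private readonly Dictionary<int, Driver> _drivers = new Dictionary<int, Driver>();

    public InMemoryDriverRepository(IEnumerable<Driver> drivers)
    {
        foreach (Driver d in drivers) _drivers[d.Id] = d;
    }

    public Driver Get(int id) { return _drivers[id]; }
    public void Update(Driver driver) { _drivers[driver.Id] = driver; }
}
```

A controller would then call only DriverService.AssignNewCar and never touch the repositories directly.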
I am using a generic (parameterized) repository for every entity. This repository contains the basic CRUD functions. Above that, I have a service for every piece of functionality. Every service instantiates the repositories it needs. The ObjectContext is shared by all of them, and there is one per request. This is my repository interface:
public interface IRepository<T>
{
    // Retrieves the list of items in the table
    IQueryable<T> List();
    IQueryable<T> List(params string[] includes);
    // Creates from a detached item
    void Create(T item);
    void Delete(int id);
    T Get(int id);
    T Get(int id, params string[] includes);
    void SaveChanges();
}
I also have a generic implementation, which is long but easy to write. You won't have to create a repository class for every type; just instantiate Repository<Type>.
I usually take the diagram of the database and draw circles around what I consider a bigger unit or system.
That gets me something like "ProductSystem", "OrderSystem", "UserSystem", "PaymentSystem" in a shop.
If systems get too big, I divide them.
If systems are too small, I don't care: everything will grow anyway.
If something belongs in some way to 2 systems, I choose 1 and don't change it anymore, even if the next day the decision seems wrong: I know that the day will come when I want to change it back again.
I get a list of items from my Repository. Now I need to sort and filter them, which I believe would be done in the Repository for efficiency. I think there would be two ways of doing this in a DDD way:
1. Send a filter and a sort object full of conditions to the Repository (what is this called?).
2. The Repository result would produce an object with .filter and .sort methods? (This wouldn't be a POJO/POCO because it contains more than one object?)
So is the answer 1, 2, or other? Could you explain why? I am leaning toward #1 because the Repository would be able to only send the data I want (or will #2 be able to delay accessing data like a LazyList?) A code example (or website link) would be very helpful.
Example:
Product product = repo.GetProducts(mySortObject, myFilterObject); // List of Poco
product.AddFilter("price", "lessThan", "3.99"); product.AddSort("price", "descending");
I would personally go with your first option.
If you think about the second option from a DDD perspective, the product object, which I'm assuming is your domain object, has knowledge of something that isn't really part of the business problem you are trying to solve (i.e., your domain). Rather, sorting and filtering are used in the user interface or some other back-end processing component.
Additionally, when looking at the second option from a Single Responsibility Principle perspective (the S in SOLID), you'll see that your Product business object has the responsibility for sorting and filtering, something that isn't at all related to a product.
That's how I see things. I would be interested in others' opinions.
This isn't a complete answer, but you might want to take a look at some of the thinking behind CQRS (Command Query Responsibility Segregation) (some good links can be found here).
CQRS is a way of thinking about DDD that can help clarify some of this. It's at a higher level than your specific question, but it might help.
Essentially I think it will just help you decide to go with your first option (which is what I ended up with in a similar situation). We called it a Query Object.
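A Query Object along the lines of the first option could be sketched like this. The names (ProductQuery, SortDirection, IProductRepository, InMemoryProductRepository) are made up for the example, and the in-memory repository is only a stand-in; a real one would apply the same conditions to its data source's IQueryable so that filtering and sorting happen in the store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public enum SortDirection { Ascending, Descending }

// The query object: a plain description of what the caller wants,
// with no knowledge of how or where it gets executed.
public class ProductQuery
{
    public decimal? MaxPrice { get; set; }   // "price less than ..."
    public SortDirection PriceSort { get; set; } = SortDirection.Ascending;
}

public interface IProductRepository
{
    IReadOnlyList<Product> Find(ProductQuery query);
}

// In-memory stand-in, for illustration only.
public class InMemoryProductRepository : IProductRepository
{
    private readonly IQueryable<Product> _products;

    public InMemoryProductRepository(IEnumerable<Product> products)
    {
        _products = products.ToList().AsQueryable();
    }

    public IReadOnlyList<Product> Find(ProductQuery query)
    {
        IQueryable<Product> q = _products;

        // Apply the filter only when the caller asked for one.
        if (query.MaxPrice.HasValue)
        {
            decimal max = query.MaxPrice.Value;
            q = q.Where(p => p.Price < max);
        }

        q = query.PriceSort == SortDirection.Descending
            ? q.OrderByDescending(p => p.Price)
            : q.OrderBy(p => p.Price);

        // Materialize once, after all conditions have been applied.
        return q.ToList();
    }
}
```

The Product class stays free of sorting and filtering concerns, and the repository returns only the rows that were asked for.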