Consider the following object model (->> indicates a collection):
Customer->Orders
Orders->>OrderLineItems->Product{Price}
The app is focused on processing orders, so most of the time the UI shows tables of all the orders that match certain criteria. 99% of the time I am only interested in displaying the sum of the LineTotals, not the individual LineTotals.
Thinking about it further, there might also be multiple payments (wire transfer, cheque, credit card, etc.) associated with each order; again, I am only interested in the sum of the money I received.
When querying the database for orders, I don't want to select all orders and then, for each order, its payments and line items.
My idea was to associate each order with a "Status" object that caches the sums and the status of the order, improving query performance by orders of magnitude and also supporting query scenarios for unpaid orders, paid orders, orders due, etc.
This prevents domain logic (e.g. when an order is considered to be paid) from leaking into database queries. However, it puts the responsibility for keeping the sums up to date on the application. The system usually has well-defined points where that needs to happen, e.g. entering or integrating payments, or creating/modifying an order.
So far I have used observable collections that trigger recalculation of the status when items are added or removed, or when certain properties on the items are updated. I ask myself where, from a DDD perspective, the logic for all of that should live. It seems strange to me to force all the event wiring and calculation logic into the aggregate root.
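For concreteness, here is a minimal sketch of the wiring described above; the Order/OrderLineItem names and members are assumptions, and property-change notification on the items is omitted:

using System.Collections.ObjectModel;
using System.Linq;

public class OrderLineItem
{
    public decimal LineTotal { get; set; }
}

public class Order
{
    public ObservableCollection<OrderLineItem> LineItems { get; private set; }
    public decimal LineTotalSum { get; private set; }

    public Order()
    {
        LineItems = new ObservableCollection<OrderLineItem>();
        // Recalculate the cached sum whenever items are added or removed.
        LineItems.CollectionChanged += (sender, args) => Recalculate();
    }

    private void Recalculate()
    {
        LineTotalSum = LineItems.Sum(item => item.LineTotal);
    }
}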
You need to express the intent of a request in an intention-revealing interface, so that your repositories can understand what exactly you want to do and react accordingly. In this case the interface reveals intent, not to other developers, but to other code. So if you want a status or total, create an interface that reveals this intent and request an object of that type from your repository. The repository can then create and return a domain object which encapsulates doing exactly the work required to calculate the total and no more than that.
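As a hedged illustration, such a role interface might look like this (all names here are hypothetical, not from the answer):

using System.Collections.Generic;

// The interface name tells the repository (and other code) exactly what
// the caller needs: totals, not a full Order aggregate.
public interface IOrderTotals
{
    int OrderId { get; }
    decimal LineTotalSum { get; }
    decimal AmountPaid { get; }
}

public interface IOrderRepository
{
    // Can be satisfied by a narrow projection query; no need to load
    // line items and payments just to show sums in a table.
    IEnumerable<IOrderTotals> GetTotalsForOpenOrders();
}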
In addition, your DAL can intelligently choose which fetching strategy to apply from the interface you request, i.e. lazy loading for situations where you don't need to access child objects and eager loading where you do.
Udi Dahan has some great blog posts about this. He has written and spoken about applying intention-revealing interfaces to this problem, which he calls making roles explicit.
I highly recommend looking into OR (object-relational) mappers that support LINQ. The two primary ones are LINQ to SQL and Entity Framework, both from Microsoft. I believe LLBLGen also supports LINQ now, and NHibernate has a few half-baked LINQ solutions you could try. My prime recommendation is Entity Framework v4.0, which is available through the .NET 4.0 betas or the Visual Studio 2010 beta.
With a LINQ-enabled OR mapper, you can easily query for the aggregate information you need dynamically, in real time, using only your domain model. There is no need for business logic to leak into your data layer, because you generally will not use stored procedures; OR mappers generate parameterized SQL for you on the fly. LINQ combined with an OR mapper is an extremely powerful tool that lets you not only query for and retrieve entities and entity graphs, but also query for data projections on your domain model, allowing the retrieval of custom data sets, aggregations, etc. via a single conceptual model.
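For example, a hedged sketch of such a projection query over the question's model (context, Orders, and the property names are assumptions):

// Translated by the OR mapper into a single grouped SQL statement;
// no per-order round trips and no stored procedure required.
var orderSummaries = (
    from o in context.Orders
    select new
    {
        o.Id,
        LineTotalSum = o.OrderLineItems.Sum(li => li.Quantity * li.Product.Price),
        AmountPaid = o.Payments.Sum(p => p.Amount)
    }).ToList();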
"It seems strange to me to force all the event wiring and calculation logic in the aggregate root."
That is usually a call for a «Service».
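A rough sketch of what such a service could look like (all names are illustrative, reusing the shape of the model in the question):

using System.Linq;

public class OrderStatus
{
    public decimal Total { get; private set; }
    public decimal Paid { get; private set; }
    public bool IsPaid { get { return Paid >= Total; } }

    public OrderStatus(decimal total, decimal paid)
    {
        Total = total;
        Paid = paid;
    }
}

public class OrderStatusService
{
    // Invoked at the well-defined points the question mentions:
    // entering payments, creating or modifying an order.
    public void Recalculate(Order order)
    {
        order.Status = new OrderStatus(
            order.LineItems.Sum(li => li.LineTotal),
            order.Payments.Sum(p => p.Amount));
    }
}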
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service; it cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements, the data model/database design is centered entirely around the people (the User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for return to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then throws a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clearing them causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: this is an assumption, but it seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw an exception about non-tracked entities not being allowed to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, including only the related objects I specify. This approach basically works, but in the process of performing the auto-mapping (I believe), all of the navigation properties are accessed, causing EF to query for and resolve them. In one case this led to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers on where to look for how to handle this would be greatly appreciated.
Additional note: it occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear the nav properties, then auto-map the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it all in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult, and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. Instead, write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data like creation times.
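To make that concrete, a minimal sketch of such a projection (PostSummary and the entity/property names are invented; context is assumed to be an EF DbContext field):

public class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public List<PostSummary> GetPage(int pageNumber, int pageSize)
{
    return context.Posts
        .OrderByDescending(p => p.CreatedAt)
        .Skip((pageNumber - 1) * pageSize)
        .Take(pageSize)
        .Select(p => new PostSummary
        {
            PosterName = p.Account.Name, // resolved inside the SQL join,
            Message = p.Text             // not via a lazy load per row
        })
        .ToList();
}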
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. The alternative is to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and each has a variety of trade-offs.
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
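For reference, this is roughly what disabling both switches looks like in EF6 (the context name is made up; the two Configuration properties are the real EF6 API):

using System.Data.Entity;

public class ServiceContext : DbContext
{
    public ServiceContext()
    {
        Configuration.ProxyCreationEnabled = false; // plain POCOs, safe to serialize
        Configuration.LazyLoadingEnabled = false;   // no surprise loads mid-serialization
    }
}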
I had such a project, and it was a stressful one: I needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load the data, I called old-school stored procedures (and, for further optimization, grouped results to keep the result tables as small as possible; e.g., one query returns a table from which the datasets for multiple charts can be extracted). A sketch of this appears at the end of this answer.
2. More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance is worth mentioning (it is really fast, but not compatible with complex objects; a simple example would be view models that contain lists of other models, nested further and further).
It is better to read up on the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. The relational database design is the prime suspect in this problem. It can be good to put the tables that hold raw data for processing (most probably for analytics) into a denormalized schema, which improves query performance.
Also beware of using the model classes the EF designer generates from the database for reading or selecting data, especially when you want to serialize them. (Sometimes I think about splitting the same schema's models into two sets of otherwise identical classes for writing and reading, so that the write models keep the virtual collections that come from foreign keys and the read models ignore them. I am not sure about this.)
NOTE: in the case of very, very large data, it is better to go deeper and set up In-Memory OLTP for the tables that contain facts or raw data; in that case the table acts like a non-relational, NoSQL-style table.
NOTE: in MS SQL Server, for example, you can take advantage of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, you can handle data processing at the database level.
4. For interactive views that need to load data, consider which information should be processed server-side and which can be handled client-side (sometimes it is better to query data from the client side, though keep in mind that data on the client side can be accessed by the user). It is situation-dependent.
5. For views over large raw-data tables, datatables.min.js is a good idea, and everyone suggests server-side paging for such tables.
6. For importing and exporting data from big files, I think OLE DB is the best choice.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;).
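As mentioned in point 1, here is a sketch of a stored-procedure call through EF6 that materializes a flat row type instead of an entity graph (the procedure name, parameters, and ChartRow class are invented for illustration):

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Data.SqlClient;
using System.Linq;

public class ChartRow
{
    public string Category { get; set; }
    public decimal Total { get; set; }
}

public static class DashboardQueries
{
    public static List<ChartRow> Load(DbContext context, DateTime from, DateTime to)
    {
        // One round trip; multiple chart datasets can be extracted
        // from the returned rows on the web tier.
        return context.Database
            .SqlQuery<ChartRow>(
                "EXEC dbo.GetDashboardData @from, @to",
                new SqlParameter("@from", from),
                new SqlParameter("@to", to))
            .ToList();
    }
}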
I have fiddled with a similar problem using EF model-first, and found the following solution satisfying for one-to-many relations:
Include the foreign-key properties in the sub-entities and use them for later look-ups.
Set the get/set modifiers of any navigation properties (sub-collections) in your EF entities to private.
This gives you an object that does not expose its sub-collections, so only the main properties get serialized. This workaround requires some restructuring of your LINQ queries: ask directly from your table of SubItems, with the foreign-key property as your filtering option, like this:
var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution partially, affecting only the sub-collections that you strongly do not want serialized.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.
I have some difficulties implementing the repository and service patterns in my RavenDB project. The major concern is what my repository interface should look like, because in RavenDB I use a couple of indexes for my queries.
Let's say I need to fetch all items where the parentid equals 1. One way is to use an IQueryable<T> List() to get all documents and then add a Where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any of RavenDB's index features. So the other approach is to have something like IEnumerable<T> Find(string index, Func<T, bool> predicate) in the repository, but that also seems like a bad idea, because it's not generic enough and would have to be reimplemented if I ever changed from RavenDB to a common SQL server.
So how can I implement a generic repository but still get the benefits of indexes in RavenDB?
This post sums it all up very nicely:
http://novuscraft.com/blog/ravendb-and-the-repository-pattern
First off, ask yourself why you want to use the repository pattern.
If you want to use the pattern because you're doing domain-driven design then, as another of these answers points out, you need to rethink the intent of your query and talk about it in terms of your domain; you can then start to model things around this.
In that case, specifications are probably your friend and you should look into them.
HOWEVER, let's look at a single part of your question momentarily before continuing with my answer:
"it's not generic enough and would have to be reimplemented if I ever changed from RavenDB to a common SQL server"
You're going about it the wrong way: trying to make your system entirely persistence-agnostic at this level is asking for trouble. If you try to hide the unique features of your datastore from the queries themselves, why bother using RavenDB at all?
A method I tend to use in simple document-oriented scenarios (i.e. where I do talk in terms of data, which is what you appear to be doing) is to split up my queries from my commands.
Ask yourself why you want to query for your documents by parent ID. Is it to display a list on a page? Then why are you trying to model this in terms of documents? Why not model it in terms of a view model and use the most effective method of retrieving that data from RavenDB, namely a query over an index (dynamic or otherwise)? Stick this in a factory which takes 'some inputs' and generates 'the output', and if you do decide to change your persistence store, you can change these factories. (I go one step further in my ASP.NET MVC applications and have single-action controllers, which I don't call controllers, making the query directly from those in most cases.)
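A sketch of that factory idea, assuming a RavenDB IDocumentSession and invented Item/ItemListRow types:

using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public class Item
{
    public string Id { get; set; }
    public string ParentId { get; set; }
    public string Name { get; set; }
}

public class ItemListRow
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class ItemListFactory
{
    private readonly IDocumentSession session;

    public ItemListFactory(IDocumentSession session)
    {
        this.session = session;
    }

    // Takes 'some inputs' and generates 'the output' shaped for the
    // view; if the persistence store changes, only this factory does.
    public List<ItemListRow> ForParent(string parentId)
    {
        return session.Query<Item>() // dynamic index
            .Where(i => i.ParentId == parentId)
            .Select(i => new ItemListRow { Id = i.Id, Name = i.Name })
            .ToList();
    }
}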
If you actually want to pull out your documents by parent ID in order to update them or run some business logic across them, perhaps you've modelled them wrongly: a write operation will typically involve a change to only a single document. In other words, you should model your documents around your transaction boundaries.
TL;DR
Think about what it is you actually want to achieve, and why you want to use the "repository pattern" or the "service pattern". These words exist as ways of describing a scenario you might end up with if you model your application around your needs, as a common way of expressing the role of a certain object, not as something you need to shoehorn your every piece of functionality into.
"Let's say I need to fetch all items where the parentid equals 1."
First, stop thinking of your data access needs this way.
You DO NOT need to "fetch all items where the parentid equals 1". It will help to try and stop thinking in such a data-oriented way.
What you need is to fetch all items with a particular parent. This is a concept that exists in your problem space (your application's domain).
The fact that you model this in the database with a foreign key and a field named parentid is an implementation detail. Encapsulate this, do not leak it throughout your application.
"One way is to use the IQueryable<T> List() and get all documents and then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any index features in RavenDB. So the other approach is to have something like this, IEnumerable<T> Find(string index, Func<T, bool> predicate), in the repository but that also seems like a bad idea because..."
Both of these are bad ideas. What you are suggesting requires the code that calls your repository or query to have knowledge of your schema.
Why should the consumer of your repository care or know that there is a parentid field? If this changes, if the definition of some particular concept in your problem space changes, how many places in your code will have to change?
Every single place that fetches items with a particular parent.
This is bad, it is the antithesis of encapsulation.
My opinion is that you will want to model queries as explicit concepts, not as lambdas or strings passed around and used all over.
You can model queries explicitly with the Specification pattern, named query methods on a repository, the Query Object pattern, etc.
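For instance, a sketch of the named-query-method option (the repository and index names are illustrative, reusing the Item shape from the sketch above):

using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public interface IItemRepository
{
    // The concept ("items with a particular parent") is explicit; the
    // parentid field and the index stay hidden in the implementation.
    IEnumerable<Item> ItemsWithParent(Item parent);
}

public class RavenItemRepository : IItemRepository
{
    private readonly IDocumentSession session;

    public RavenItemRepository(IDocumentSession session)
    {
        this.session = session;
    }

    public IEnumerable<Item> ItemsWithParent(Item parent)
    {
        // Free to target a RavenDB static index without leaking it upward.
        return session.Query<Item>("Items/ByParent")
            .Where(i => i.ParentId == parent.Id)
            .ToList();
    }
}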
"it's not generic enough and would have to be reimplemented if I ever changed from RavenDB to a common SQL server"
Well, that Func is too generic. Again, think about what your consuming code would need to know in order to use such a method of querying; you would be tying the upper layers of your code directly to your DB schema.
Also, if you change from one storage engine to another, you cannot avoid re-implementing queries where performance was enough of a factor to use storage-engine-specific aids (indexes in Raven, for example).
I would actually discourage you from using the repository pattern. In most cases, it is over-architecting and actually makes the code more complicated.
Ayende has made a number of posts to that end recently:
http://ayende.com/Blog/archive/2011/03/16/architecting-in-the-pit-of-doom-the-evils-of-the.aspx
http://ayende.com/Blog/archive/2011/03/18/the-wages-of-sin-over-architecture-in-the-real-world.aspx
http://ayende.com/Blog/archive/2011/03/22/the-wages-of-sin-proper-and-improper-usage-of-abstracting.aspx
I recommend just writing against Raven's native API.
If you feel that my response is too general, list some of the benefits you hope to gain from using another layer of abstraction and we can continue the discussion.
I have created a domain model and defined entities, value objects, services, and so on. Now, my question is this: I have an entity called "Company" which has around 20+ attributes, and I want to bind a dropdown-list control on one of my pages that requires only two of those attributes, i.e. company.Name and company.Id. Why should I use such a heavy entity, with 20+ attributes, to bind the dropdown list?
Is there any way to handle this, keeping performance in mind as well?
Thanks in advance.
The question is not that much DDD-related; it's about performance.
As for me, I do not care whether there are 1 or 20 attributes, as long as those attributes do not come from separate tables. There isn't much difference whether a select picks up 1 or 20 fields, but there is a noticeable difference when the select starts joining other tables, and a highly noticeable difference when there is a SELECT N+1 problem.
So, when I retrieve a list of Company through my ORM in order to create a select list, it is smart enough to run the SQL select over the Company table only and lazy-load other things if they are necessary (which they aren't in this particular case).
Luckily, I'm not developing systems that demand ultra-high performance, so I don't care whether it takes 1 or 20 fields. If I did, I doubt I would be using an ORM anyway.
For some other persistence mechanisms it might not be a problem at all. E.g., if you are using a document database, you can store/retrieve a whole aggregate as one document because it's schemaless. The performance penalty goes down drastically.
In this scenario I would either consider a reference-data service returning lightweight representations of the domain model or, preferably (for simplicity), implement caching. Either way, remember this is only something you should really worry about after you've identified the performance of the architecture as an issue.
Create a generic key/value model and expose a method from your service that returns your company name/company id list. This is similar to other value-type data you query your DB for (e.g. state code/state id).
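A minimal sketch of such a service method (LookupItem and the EF names are assumptions; context is an assumed DbContext field):

public class LookupItem
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public List<LookupItem> GetCompanyList()
{
    // Only the two columns the dropdown needs are selected; the
    // 20+ attribute Company entity is never materialized.
    return context.Companies
        .Select(c => new LookupItem { Id = c.Id, Name = c.Name })
        .OrderBy(item => item.Name)
        .ToList();
}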
DDD is not about performance. With DDD you operate on "aggregates": consistent sets of connected entities. Aggregates are loaded from the database into application memory.
Ad-hoc queries (actually, almost all queries) are considered "reporting" in DDD, and should be done by separate mechanisms such as denormalized databases.
http://devlicio.us/blogs/casey/archive/2009/02/13/ddd-command-query-separation-as-an-architectural-concept.aspx
It's bullshit. Don't use DDD.
Write queries when you need them. Use LINQ to compose and decompose queries.
I had asked this question in a much more long-winded way a few days ago, and the fact that I got no answers isn't surprising considering the length, so I figured I'd get more to the point.
I have to make a decision about what to display to a user based on their assignment to a particular customer. The domain object looks like this vastly simplified example:
public class Customer
{
    public string Name { get; set; }
    public IEnumerable<Users> AssignedUsers { get; set; }
}
In the real world, I'll also be evaluating whether they have permissions (using bitwise comparisons on security flags) to view this particular customer even if they aren't directly assigned to it.
I'm trying to stick to domain-driven design (DDD) principles here. Also, I'm using LINQ to SQL for my data access. In my service layer, I supply the user requesting the list of customers, which right now is about 1000 items and growing by about 2% a month.
If I am strict about keeping logic in my service layer, I will need to use LINQ to do a .Where that evaluates whether the AssignedUsers list contains the user requesting the list. This will cause a cascade of queries for each Customer as the system enumerates through them. I haven't done any testing, but this seems inefficient.
If I fudge on the no-logic-in-the-data rule, then I could simply have a GetCustomersByUser() method that does an EXISTS type of SQL query and evaluates security at the same time. This will surely be much faster, but now logic is creeping into the database, which might create problems later.
I'm sure this is a common question people run into when rolling out LINQ. Any suggestions on which way is better? Is the performance hit of LINQ's multiple queries better than logic in my database?
Which is worse?
Depends who you ask.
Possibly, if you ask a DDD ultra-purist, they'll say logic in the database is worse.
If you ask pretty much anyone else, IMHO, especially your end users, pragmatic developers, and the people who pay for the hardware and the software development, they'll probably say a large performance hit is worse.
DDD has much to commend it, as do lots of other design approaches, but they all fall down if you dogmatically follow them to the point of producing a "pure" design at the expense of real-world considerations, such as performance.
If you really are having to perform this sort of query on data, then the database is almost certainly far better at performing the task.
Alternatively, have you "missed a trick" here? Is your design, however DDD, actually not right?
Overall: use your tools appropriately. By all means strive to keep logic cleanly separated in your service layer, but not when that logic is doing large amounts of work that a database is designed for.
LINQ is an abstraction, it wraps a bunch of functionality into a nice little package with a big heart on top.
With any abstraction, you're going to get overhead, mainly because things are just not as efficient as you or I might want them to be. MS did a great job in making LINQ quite efficient.
Logic should be where it is needed. Pure is nice, but if you are delivering a service or product you have to keep the following in mind (in no particular order):
1. Maintenance. Will you easily be able to get some work done after it's released without pulling the entire thing apart?
2. Scalability.
3. Performance.
4. Usability.
Number 3 is one of the biggest aspects of working with the web. Would you do trigonometry on a SQL Server? No. Would you filter results based on input parameters? Yes.
SQL Server is built to handle massive queries, filtering, sorting, and data mining. It thrives on that, so let it do it.
It's not logic creep; it's putting functionality where it belongs.
If AssignedUsers is properly mapped (i.e. the association is generated by the LINQ to SQL designer, or you have marked the property with AssociationAttribute or some other attribute from the http://msdn.microsoft.com/en-us/library/system.data.linq.mapping(v=VS.90).aspx namespace; I'm not sure right now), LINQ to SQL will translate the LINQ query into a SQL command and will not iterate through AssignedUsers for each Customer.
You may also use a 'reversed' query like:
from c in Customers
join u in Users on c.CustomerId equals u.AssignedToCustomerId // or another condition linking user to customer
where <your condition here>
select c
"If I am strict about keeping logic in my service layer, I will need to use LINQ to do a .Where that evaluates whether the AssignedUsers list contains the user requesting the list. This will cause a cascade of queries for each Customer as the system enumerates through them. I haven't done any testing, but this seems inefficient."
There should be no need to enumerate a local Customer collection.
The primary purpose of LinqToSql, is to allow you to declare the logic in the service layer, and execute that logic in the data layer.
int userId = CurrentUser.UserId;
IEnumerable<Customer> customerQuery =
    from c in dataContext.Customers
    where c.AssignedUsers.Any(au => au.UserId == userId)
    select c;
List<Customer> result = customerQuery.ToList();
I think your model is best described as a many-to-many relationship between the Customer class and the User class. Each User references a list of related Customers, and each Customer references a list of related Users. From a database perspective, this is expressed using a join table (in Microsoft's LINQ to SQL terminology, a "junction table").
Many-to-many relationships are the one feature LINQ to SQL doesn't support out of the box; you will probably notice this if you try generating a DBML.
Several blogs have published workarounds, including one from MSDN (without any concrete examples, unfortunately). There's one blog (a 2-part post) which closely adheres to the MSDN suggestion.
I'd personally go with the stored proc. That's the right tool for the job.
Not using it might be a nice design choice, but design paradigms are there to guide you, not to constrain you, in my opinion.
Plus, the boss only cares about performance :-)
I generally try to keep all related entities in the same repository. The following entities have a relationship between them (marked with indentation):
User
    UserPreference
So it makes sense for them to go into a user repository. However, users are often linked to many different entities; what would you do in the following example?
User
    UserPreference
    Order
Order
    Product
Order has a relationship with both Product and User, but you wouldn't put functionality for all four entities in the same repository. What do you do when you are dealing with the user entities and gathering order information? You may need extra information about the product, and ORMs often offer lazy loading. However, if your Product entity is in a separate repository from the User entity, surely this would cause a conflict between repositories?
In the Eric Evans Domain-Driven Design (http://domaindrivendesign.org/index.htm) sense of things, you should first think about what your Aggregates are. You then build your repositories around those.
There are many techniques for handling Aggregates that relate to each other. The one I use most often is to allow Aggregates to relate to each other only through a read-only interface. One of the key thoughts behind Aggregates is that you can't change the state of the underlying objects without going through the root. So if Product and User are Aggregate roots in your model, then I can't update a Product that I reached by going User->Order->Product. I have to get the Product from the Product repository to edit it. (From a UI point of view you can make it look like you go User->Order->Product, but when you hit the Product edit screen you grab the entity from the Product repository.)
When you are looking at a Product (in code) reached by going User->Order->Product, you should be looking at a Product interface that offers no way to change the underlying state of the Product (getters only, no setters, etc.).
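A sketch of that read-only view (IProductView is a made-up name):

using System.Collections.Generic;

public interface IProductView
{
    int Id { get; }
    string Name { get; }
    decimal Price { get; } // getters only: no way to mutate through this path
}

public class Product : IProductView
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class Order
{
    // Reached via User->Order->Product, callers see only the view;
    // edits must go through the Product repository.
    public IEnumerable<IProductView> Products { get; private set; }
}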
Organize your Aggregates, and therefore your Repositories, by how you use them. I can see User and Product being their own Aggregates and having their own Repositories. I'm not sure from your description whether Order should belong to User or also be standalone.
Either way, use a read-only interface when Aggregates relate. When you have to cross over from one Aggregate to the other, fetch it from its own Repository.
If your Repositories are caching, then when you load an Order (through a User), load only the Product IDs from the database. Then load the details from the Product Repository using the Product IDs. You can optimize a bit by loading any other invariants of the Product as you load the Order.
By repository do you mean a class?
Depending on how the objects (repositories) are used, you could create a view that combines the data in the database and have your ORM map a class (repository) to that view. This design works when you want to display lighter-weight objects with only a couple of columns from each of the tables.
If SQL Server is your database, and by repository you mean a database, then I would just put the information in whatever database makes sense and have a view in the dependent databases that selects out of the other database via three-dot notation.
I'm still confused by what you mean by "repository." I would make all of the things you talked about separate classes (and therefore separate files), and they'd all reside in the same project.