I have this "1 to N" model:
class Reception
{
public int ReceptionId { get; set; }
public string Code { get; set; }
public virtual List<Item> Items { get; set; }
}
class Item
{
public int ItemId { get; set; }
public string Code { get; set; }
public int Quantity { get; set; }
public int ReceptionId { get; set; }
public virtual Reception Reception { get; set; }
}
And this action, api/receptions/list
public JsonResult List()
{
return Json(dbContext.Receptions
.Select(e => new
{
code = e.Code,
itemsCount = e.Items.Count,
quantity = e.Items.Sum(i => i.Quantity)
}).ToList(), JsonRequestBehavior.AllowGet);
}
which returns a list of receptions, with their number of items:
[
{code:"1231",itemsCount:10,quantity:30},
{code:"1232",itemsCount:5,quantity:70},
{code:"1234",itemsCount:30,quantity:600},
...
]
This was working fine, but now I have too many Receptions and Items, so the query is taking too long...
So I want to speed it up by adding some persisted fields to Reception:
class Reception
{
public int ReceptionId { get; set; }
public string Code { get; set; }
public virtual List<Item> Items { get; set; }
public int ItemsCount { get; set; } // Persisted
public int Quantity { get; set; } // Persisted
}
With this change, the query ends up being this:
public JsonResult List()
{
return Json(dbContext.Receptions
.Select(e => new
{
code = e.Code,
itemsCount = e.ItemsCount,
quantity = e.Quantity
}).ToList(), JsonRequestBehavior.AllowGet);
}
My question is:
What's the best way to maintain these two fields?
I will gain performance, but now I need to be more careful with the creation of Items.
Today an Item can be created, edited and deleted:
api/items/create?receptionId=...
api/items/edit?itemId=...
api/items/delete?itemId=...
I also have a tool for importing receptions via Excel:
api/items/createBulk?...
Maybe tomorrow I will have more ways of creating Items, so the question is: how do I make sure that these two new fields, ItemsCount and Quantity, will always be up to date?
Should I create a method within Reception like this?
class Reception
{
...
public void UpdateMaintainedFields()
{
this.Quantity = this.Items.Sum(e => e.Quantity);
this.ItemsCount = this.Items.Count();
}
}
And then REMEMBER to call it from all the previous URLs? (items/create, items/edit, ...)
Or maybe should I have a stored procedure in the database?
What is the common practice? I know there are calculated columns but these refer to fields of the same class. Also there are indexed views, but I'm not sure if they apply well to scenarios like this.
From your code it seems that you do not have a layer for business logic and everything is implemented in the controllers. This causes a problem: whenever you add a different way of creating items (and it seems you mean a different controller), you have to implement this logic again. It is easy to forget, and even if you don't forget now, you could forget to maintain it later.
So I would recommend having a layer for business logic (like adding new items) and using it from every controller that creates items.
I would also recommend writing the UpdateMaintainedFields function as you proposed, but calling it in the business logic layer after adding the items, not in the controllers!
You could also put the logic in the database (as a trigger), if you can accept that you can't unit test it.
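A minimal sketch of what such a business-logic layer might look like. The ItemService class name, the MyDbContext type, and the constructor shape are assumptions for illustration; only UpdateMaintainedFields comes from the question:

```csharp
// Hypothetical business-logic layer; every entry point (create, edit,
// delete, createBulk, and any future ones) goes through this class
// instead of touching dbContext directly from a controller.
public class ItemService
{
    private readonly MyDbContext dbContext;

    public ItemService(MyDbContext dbContext)
    {
        this.dbContext = dbContext;
    }

    public void CreateItem(int receptionId, Item item)
    {
        var reception = dbContext.Receptions
            .Include(r => r.Items)
            .Single(r => r.ReceptionId == receptionId);

        reception.Items.Add(item);
        reception.UpdateMaintainedFields(); // keep ItemsCount/Quantity in sync
        dbContext.SaveChanges();            // both changes commit together
    }
}
```

Because the item and the recalculated fields are saved in the same SaveChanges call, they cannot drift apart, no matter which controller triggered the change.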
Assuming the original query cannot be improved with the correct execution plan in SQL Server, the way to update these fields is via a trigger in the DB. When an insert occurs on the Items table (or an update or delete, if your persisted fields change with the data), the trigger runs and is responsible for updating the affected rows with the new values.
Your insert performance would drop, but your query performance would be that of a simple index read of a single row. Note that you wouldn't be able to use this trick if you needed totals over a subset of the items, as the persisted values always cover the whole collection.
An alternative is to hold the count and quantity sums in a separate table, or in a dummy row that holds the summed quantities as its entry for quantity. YMMV.
PS I hate how what is really a SQL question has been turned into one about C# code! Learn SQL and run the queries you need directly in the DB; that will show you much more about the performance and structure of what you're looking for than getting EF involved. /rant :)
You want to store the same information redundantly, which can lead to inconsistencies. As an inspiration, indexes also duplicate data. How do you update them? You don't; it is all fully transparent. I would recommend the same approach here.
Make a summary table, maintained by triggers. The table would not be included in any data context schema; the only way to read it would be through non-updatable views or stored procedures. Its name should make clear that nobody should ever touch this table directly.
You can now access your data from various frameworks and not worry about updating anything. The database will ensure the precalculated sums are always correct, as long as you do not write to the sum table yourself. In fact, you can add or remove this table at any time and no application would even notice.
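As a sketch of this idea, here is what a Code First migration creating such a trigger-maintained summary table could look like. This assumes SQL Server and EF migrations; the table name, trigger name, and underscore naming convention are all made up for illustration:

```csharp
// Hypothetical migration: creates a summary table outside the EF model
// plus a trigger that rewrites the affected summary rows on any change.
public partial class AddReceptionSummary : DbMigration
{
    public override void Up()
    {
        // Leading underscore signals "maintained by the DB, hands off".
        Sql(@"CREATE TABLE dbo._ReceptionSummary (
                  ReceptionId INT PRIMARY KEY,
                  ItemsCount  INT NOT NULL,
                  Quantity    INT NOT NULL);");

        Sql(@"CREATE TRIGGER dbo.trg_Items_Summary
              ON dbo.Items AFTER INSERT, UPDATE, DELETE AS
              BEGIN
                  SET NOCOUNT ON;
                  -- Recompute only the receptions touched by this statement.
                  DELETE FROM dbo._ReceptionSummary
                  WHERE ReceptionId IN (SELECT ReceptionId FROM inserted
                                        UNION
                                        SELECT ReceptionId FROM deleted);
                  INSERT INTO dbo._ReceptionSummary (ReceptionId, ItemsCount, Quantity)
                  SELECT ReceptionId, COUNT(*), SUM(Quantity)
                  FROM dbo.Items
                  WHERE ReceptionId IN (SELECT ReceptionId FROM inserted
                                        UNION
                                        SELECT ReceptionId FROM deleted)
                  GROUP BY ReceptionId;
              END;");
    }

    public override void Down()
    {
        Sql("DROP TRIGGER dbo.trg_Items_Summary;");
        Sql("DROP TABLE dbo._ReceptionSummary;");
    }
}
```

The delete-then-insert pattern keeps the trigger set-based (it handles multi-row inserts and bulk imports in one pass) and naturally removes summary rows for receptions whose items were all deleted.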
I'm stuck on a problem I have been looking for a solution to, but without much luck.
Using Entity Framework Code First, I need the ability to create a calculated property that does not rely on loading all of the objects before calculating.
// Pseudo code for what I need
public class GoodInventoryChange
{
public int GoodID { get; set; }
public double Amount { get; set; } // Amount of change
public DateTime OccurredAt { get; set; } // Timestamp of the change
public double RunningTotal { get { /* CODE TBD*/ } } // Prior record plus amount
}
All of the suggestions on how to do this that I have found require calling .ToList() or similar, which may require many 1000s of records to be loaded in order to find a single entry.
In the end, I need the ability to query for:
// Pseudo code
int goodID = 123;
var lowestRunningTotal = (from item in Context.GoodInventoryChanges
where item.GoodID == goodID && DateTime.Now <= item.OccurredAt
orderby item.RunningTotal
select item).FirstOrDefault();
I am using RunningTotal as an example here, but I have about 15-20 fields that need to be calculated in a similar fashion.
Does anyone have any advice or direction to point me in? I know I can brute force it, but I am hoping to do it via the SQL layer of Entity Framework.
I am OK creating calculated fields in the DB if there is a nice way to map them to Entity Framework classes as well.
You can use computed columns in the database and decorate your entity with the DatabaseGenerated attribute to prevent EF from trying to write the value back to the table. EF will read the value back after you insert or update:
[DatabaseGenerated(DatabaseGeneratedOption.Computed)]
public string YourComputedProperty { get; set; }
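One caveat: a SQL Server computed column can only reference columns of the same row, so a true running total needs a window function, which you can expose through a view and map the entity to instead. A sketch, assuming SQL Server 2012+ (for window aggregates) and EF Code First migrations; the view name is illustrative:

```csharp
// Hypothetical migration: expose RunningTotal via a view, since a plain
// computed column cannot aggregate across rows.
public partial class MapRunningTotal : DbMigration
{
    public override void Up()
    {
        Sql(@"CREATE VIEW dbo.GoodInventoryChangesWithTotal AS
              SELECT GoodID, Amount, OccurredAt,
                     SUM(Amount) OVER (PARTITION BY GoodID
                                       ORDER BY OccurredAt) AS RunningTotal
              FROM dbo.GoodInventoryChanges;");
    }

    public override void Down()
    {
        Sql("DROP VIEW dbo.GoodInventoryChangesWithTotal;");
    }
}
```

The entity would then be mapped to the view (e.g. with `[Table("GoodInventoryChangesWithTotal")]`) and RunningTotal decorated with `DatabaseGeneratedOption.Computed` as above, so queries like the one in the question, including the `orderby item.RunningTotal`, run entirely in SQL.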
I am trying to design a new system for tracking sales. A simplistic version of my data models are:
public class Sale
{
public int SaleId { get; set; }
public DateTime CompletedDateTime { get; set; }
public virtual List<SaleItem> SaleItems { get; set; }
public decimal Total
{
get
{
return SaleItems.Sum(i => i.Price);
}
}
}
public class SaleItem
{
public int SaleItemId { get; set; }
public decimal Price { get; set; }
public int SaleId { get; set; }
public virtual Sale Sale { get; set; }
}
I am now writing some reports which total the sales value for between a specified period. I have the following code to do that:
List<Sale> dailySales = db.Sales
.Where(x => DbFunctions.TruncateTime(x.CompletedDateTime) >= fromParam)
.Where(x => DbFunctions.TruncateTime(x.CompletedDateTime) <= toParam)
.ToList();
decimal total = dailySales.Sum(x => x.Total);
This is working ok and giving me the expected result. I feel like this might give me problems further down the line, though, once large datasets get involved. I assume having to load all the Sales into a list will become resource intensive; plus, my actual implementation has tax, costs, etc. associated with each SaleItem, so it gets more complex again.
The following would allow me to do all the processing on the database; however, it is not possible because the DB has no representation of Total, so EF throws an error:
Decimal total = db.Sales.Sum(x=>x.Total);
Which leads me to my question. I could define my model as follows and, each time I add a SaleItem, make sure I update the Total:
public class Sale
{
...
public decimal Total { get; set; }
}
This would then allow me to query the database as required, and I assume it will be less resource intensive. The flip side, though, is that I have introduced redundancy into the database. Is the latter the better method of dealing with this, or is there an alternative I haven't even considered?
It depends on many factors. For instance, how often will you require the "Total" amount to be available? And how many SaleItems are there usually in a Sale?
If we're talking about, say, a supermarket kind of sale with at the very most 200 items, it's quite okay to just calculate it on the fly. Then again, if this ever gets mapped to an RDBMS and all the SaleItems live in one single table, an index on the foreign key (which links each individual SaleItem to its Sale) is a must; otherwise performance will take a huge hit once you have millions of transactions to sift through.
Answering the second half of your question: having redundancy is not always a bad thing. You just need to make sure that whenever a Sale's item list is modified, the Total is recalculated at the end. It's slightly dangerous (redundancy always carries this burden), but you just need to ensure that whatever has the potential to change the Sale does so in a way (maybe even with a trigger in the RDBMS) that recalculates the total automatically.
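A minimal sketch of the "recalculate whenever the list changes" idea, funneling all modifications through methods on Sale (the AddItem/RemoveItem method names are assumptions):

```csharp
public class Sale
{
    public int SaleId { get; set; }
    public DateTime CompletedDateTime { get; set; }
    public virtual List<SaleItem> SaleItems { get; set; }

    // Persisted total; private setter so only the methods below touch it.
    public decimal Total { get; private set; }

    public void AddItem(SaleItem item)
    {
        SaleItems.Add(item);
        Total = SaleItems.Sum(i => i.Price); // recalculate on every change
    }

    public void RemoveItem(SaleItem item)
    {
        SaleItems.Remove(item);
        Total = SaleItems.Sum(i => i.Price);
    }
}
```

The private setter is the key design choice: as long as nothing else can assign Total, the redundancy stays safe.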
Hope it helps!
You're right that it's much more effective to calculate totals on the DB side instead of loading the whole list and calculating them in the application.
I think you're missing that you can make a LINQ query that gets the SUM of related children entities.
using (var ctx = new MyDbContext())
{
var totalSales = ctx.Sales
.Select(s => s.SaleItems.Sum(si => si.Price)) // Total of each Sale
.Sum(tsi => tsi); // Sum of the total of each sale
}
You can of course shape the query to bring additional information, projecting the result in an anonymous class or in a class created ad-hoc for this purpose.
Of course, this EF query will be translated into a SQL query and executed on the server side.
When you start using LINQ to EF it's not very obvious how to get what you want, but on most occasions you can do it.
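For example, a sketch of projecting each sale plus its server-computed total into an anonymous type, reusing the date filter from the question (variable names assumed):

```csharp
using (var ctx = new MyDbContext())
{
    var dailyTotals = ctx.Sales
        .Where(s => DbFunctions.TruncateTime(s.CompletedDateTime) >= fromParam)
        .Where(s => DbFunctions.TruncateTime(s.CompletedDateTime) <= toParam)
        .Select(s => new
        {
            s.SaleId,
            s.CompletedDateTime,
            Total = s.SaleItems.Sum(si => si.Price) // computed by SQL, not in memory
        })
        .ToList();
}
```

Only the scalar columns cross the wire; the SaleItem rows never leave the database.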
I have a RESTful interface exposed that allows for adding Category and SubCategory types.
Category
public class Category : EntityBase<Category>
{
public string Name { get; set; }
public bool Enabled { get; set; }
public virtual ICollection<SubCategory> SubCategories { get; set; }
}
SubCategory
public class SubCategory : EntityBase<SubCategory>
{
public int CategoryId { get; set; }
public string Name { get; set; }
public bool Enabled { get; set; }
public virtual ICollection<Product> Products { get; set; }
}
My question is: should I pass the Category object with its associated child SubCategories and then figure out which children are new:
public void AddSubCategory(Category category)
{
// Figure out what object on the SubCategories collection are new (no PK value?)?
}
or would an approach like so be better?:
public void AddSubCategory(int categoryId, SubCategory subCategory);
Your second approach is cleaner, but may take you a bit longer to set up on the front end. You can go straight to pushing the subcategory into your db/store (although I recommend a check in your stored procedure or entity repository to prevent two subcategories with the same name).
With the first approach you would need to iterate through the entire list of subcategories and possibly do a database call for each one, or do something messy like submit the entire list to a stored procedure and churn through it there.
The second approach will scale much better as well. Consider how much more data is being sent to the server and then being reprocessed as the list of subcategories grows.
Besides transferring redundant data and potentially causing a lot of extra database calls, you are probably concerned with the style choice for those who will implement your API. I think developers would expect separate methods for adding, updating, and deleting, and would find submitting the entire list confusing.
Cheers
Well, I think there is no obvious answer to this question; it comes down to a matter of taste.
Personally, if you only wish to add one object at a time, I would go with the second approach, since it saves you iterating over a list of subcategories, which benefits the performance of your application. You never know how your application will scale, and you might end up with a lot of categories to deal with.
What I would do is take the second approach, but since SubCategory already carries a CategoryId, I would change the signature to:
public void AddSubCategory(SubCategory subCategory);
And extract the category id from the subCategory.
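A sketch of that signature in practice; the MyDbContext/SubCategories names and the duplicate-name guard are assumptions layered on top of the question's model:

```csharp
// Hypothetical repository/service method for the suggested signature.
public void AddSubCategory(SubCategory subCategory)
{
    using (var ctx = new MyDbContext())
    {
        // Guard against two subcategories with the same name in one category.
        bool exists = ctx.SubCategories.Any(s =>
            s.CategoryId == subCategory.CategoryId &&
            s.Name == subCategory.Name);
        if (exists)
            throw new InvalidOperationException("SubCategory name already in use.");

        ctx.SubCategories.Add(subCategory); // CategoryId comes from the object itself
        ctx.SaveChanges();
    }
}
```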
In an application I'm working on, I have what are essentially a bunch of lookup tables in a database which all contain two things: The ID (int) and a Value (string).
There's only a handful of them, but I want to map all of them to a single Context which depends on the table name. Something like:
class LookupContext : DbContext
{
public DbSet<Lookup> Lookups { get; set; }
public LookupContext(String table)
{
// Pseudo code:
// Bind Lookups based on what table is
Lookups = MyDatabase.BindTo(table);
}
}
So if I create a new LookupContext("foo"), it binds against the foo table. If I do new LookupContext("bar") it uses the bar table, and so forth.
Is there any way to do this? Or do I have to create a separate context + model for every table I have?
This is more or less my first time doing this, so I'm not really sure if what I'm doing is right.
The answer we should be able to give you is to use enums, but that's not available quite yet - it's in the next version of EF. See here for details: http://blogs.msdn.com/b/adonet/archive/2011/06/30/walkthrough-enums-june-ctp.aspx
With earlier versions of EF, you can simply create a class per lookup value (assuming state as an example) and have code that looks something like the following:
public class State
{
public int StateId {get;set;}
public string StateName {get;set;}
}
public class LookupContext : DbContext
{
public DbSet<State> States {get;set;}
// ... more lookups as DbSets
}
This will allow you to use one context but will still require one class per table. You can also use the fluent API if you want your table/column names to differ from your class/property names respectively. Hope that helps!
I actually realized I was completely overcomplicating things beyond reason. There was no reason for storing multiple tables with two columns.
I'm better off storing my data as:
public class LookupValue
{
public string LookupValueId { get; set; }
public string Value { get; set; }
public string LookupType { get; set; }
}
Where the third field was simply the name of the table that I was previously storing in the database.
I'm still interested in the idea of mapping a single Context class to multiple tables, but I believe what I described above is the least convoluted way of accomplishing what I need.
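With the single-table design, each former lookup table becomes a filter on LookupType. A sketch, assuming the context exposes a `DbSet<LookupValue> LookupValues` (that property name is an assumption):

```csharp
using (var ctx = new LookupContext())
{
    // Fetch what used to be the "State" lookup table.
    List<LookupValue> states = ctx.LookupValues
        .Where(v => v.LookupType == "State") // formerly the table name
        .ToList();
}
```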
I'm using Code First to write my data layer, then transmitting to a Silverlight front end using RIA services. Since I have to serialize everything, I would like to get some additional information on each entity before sending it across the wire (to reduce load time). In the past I have done this by translating everything to a POCO class that has the additional information. I'm wondering if there's a better way of doing this. To give you an idea, here's my class:
public class District
{
// ... Other properties, not important
public virtual ICollection<Installation> Installations { get; set; }
//The property I would like to calculate on the fly
[NotMapped]
public int InstallationCount { get; set; }
}
Is there a way to have this property calculate automatically before I send it across the wire? One option would be just to Include the Installation collection, but that adds a lot of bulk (there are about 50 properties on the Installation entity, and potentially hundreds of records per district).
Rather than making InstallationCount an automatic property, just use the getter to return the Count of the Installations collection.
public class District
{
public virtual ICollection<Installation> Installations { get; set; }
[NotMapped]
public int InstallationCount { get { return Installations.Count; } }
}
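One caveat to be aware of: reading `Installations.Count` on a lazy-loaded collection pulls every Installation row into memory just to count it, which works against the stated goal of reducing load. An alternative sketch (assuming a `Districts` DbSet on the context) is to project the count in the query so SQL does the counting before serialization:

```csharp
// Count computed server-side; Installation rows never materialize.
var districts = db.Districts
    .Select(d => new { District = d, Count = d.Installations.Count() })
    .ToList();

foreach (var x in districts)
    x.District.InstallationCount = x.Count; // fill the [NotMapped] property
```

This keeps the wire format identical while sending a single scalar per district instead of the full child collection.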