I am stuck on a problem I have been trying to find a solution to, without much luck.
Using Entity Framework Code First, I need the ability to create a calculated property that does not rely on loading all of the objects before calculating.
// Pseudocode for what I need
public class GoodInventoryChange
{
public int GoodID { get; set; }
public double Amount { get; set; } // Amount of change
public DateTime OccurredAt { get; set; } // Timestamp of the change
public double RunningTotal { get { /* CODE TBD*/ } } // Prior record plus amount
}
All of the suggestions I have found on how to do this require calling .ToList() or similar, which may require thousands of records to be loaded just to find a single entry.
In the end, I need the ability to query for:
// Pseudocode
int goodID = 123;
var lowestRunningTotal = (from item in Context.GoodInventoryChanges
where item.GoodID == goodID && DateTime.Now <= item.OccurredAt
orderby item.RunningTotal
select item).FirstOrDefault();
I am using RunningTotal as an example here, but I have about 15-20 fields that need to be calculated in a similar fashion.
Does anyone have any advice or direction to point me in? I know I can brute force it, but I am hoping to do it via the SQL layer of Entity Framework.
I am OK creating calculated fields in the DB if there is a nice way to map them to Entity Framework classes as well.
You can use computed columns in the database and decorate your entity property with the DatabaseGenerated attribute to prevent EF from trying to write the value back to the table. EF will read the value after a load, and after an insert or update:
[DatabaseGenerated(DatabaseGeneratedOption.Computed)]
public string YourComputedProperty { get; set; }
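For instance, here is a minimal sketch, assuming SQL Server and EF6 Code First. The TotalValue column and its expression are hypothetical, and note that a plain computed column cannot reference other rows, so a true running total would still need a view or a window function behind it:
// Migration adding a hypothetical persisted computed column (sketch only).
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity.Migrations;

public partial class AddComputedColumn : DbMigration
{
    public override void Up()
    {
        // Illustrative expression only; replace it with whatever the column should compute.
        Sql("ALTER TABLE dbo.GoodInventoryChanges ADD TotalValue AS (Amount * 2) PERSISTED");
    }

    public override void Down()
    {
        Sql("ALTER TABLE dbo.GoodInventoryChanges DROP COLUMN TotalValue");
    }
}

public class GoodInventoryChange
{
    public int GoodID { get; set; }
    public double Amount { get; set; }
    public DateTime OccurredAt { get; set; }

    // EF reads the database-generated value but never tries to write it.
    [DatabaseGenerated(DatabaseGeneratedOption.Computed)]
    public double TotalValue { get; private set; }
}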
I have a table like this:
CREATE TABLE names (
score INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL
);
And I want to get some statistics from it. In SQLite I can use LEAD, but not here. I know about linq2db, but I would rather not use it because of how it works: as I understand it, the package does not translate LEAD into SQL in an EF LINQ query, but executes the LEAD logic on its own side (not on the database side, which would be more efficient). If I'm wrong, correct me.
For example, I want to execute query:
var lst = db.table_names.FromSqlRaw("SELECT score, LEAD(cid, 1) OVER (ORDER BY score) as next, LEAD(score, 1) OVER (ORDER BY score) - score AS diff FROM names ORDER BY diff DESC LIMIT 1");
This SQL expression returns the two scores with the largest gap between them. The query executes and returns a single row (confirmed via lst.Count() and the debugger).
The result is there, but how do I get at it? Perhaps EF has some feature that lets me properly read data from a result shaped by custom SQL?
I would rather not resort to hacks such as stuffing the data I need into existing class fields whose intended purpose doesn't match.
Maybe there are less hacky ways, even if unofficial, than the one I gave above?
You have two ways to approach this issue.
Create a view at the database level with the query you have and map it in Entity Framework; then you will be able to simply do the following:
var lst = db.vw_name.OrderBy(d => d.diff).ToList();
Use LINQ query syntax instead, but you will need to write multiple queries and join them together, as well as create a new class for the query to project its results into. Here is a simplified example that does not use the SQL window functions:
public class Scores {
public int Score { get; set; }
public int Next { get; set; }
public int Max { get; set; }
}
and
var lst = (from x in db.table_names
orderby x.diff
select new Scores {
Score = x.score,
Next = x.next,
Max = x.Max
}).ToList();
The former approach is much better for many reasons in my opinion.
An addition to the answer from Bassel Ashi:
Create a view at the database level with the query you have and map it in Entity Framework
Create the view at the database level:
db.Database.ExecuteSqlRaw(@"CREATE VIEW View_scoreDiff AS SELECT score, LEAD(cid, 1) OVER (ORDER BY score) as next, LEAD(score, 1) OVER (ORDER BY score) - score AS diff FROM names ORDER BY diff DESC LIMIT 1");
Then you need to create a class:
public class View_scoreDiffClass {
public int Score { get; private set; }
public int Next { get; private set; }
public int Diff { get; private set; }
}
Add the following DbSet to your context:
public DbSet<View_scoreDiffClass> View_scoreDiff { get; private set; }
Add the following line to OnModelCreating:
modelBuilder.Entity<View_scoreDiffClass>().ToView("View_scoreDiff").HasNoKey();
After all this, you can execute db.View_scoreDiff.FirstOrDefault() and get the desired columns.
I have this "1 to N" model:
class Reception
{
public int ReceptionId { get; set; }
public string Code { get; set; }
public virtual List<Item> Items { get; set; }
}
class Item
{
public int ItemId { get; set; }
public string Code { get; set; }
public int Quantity { get; set; }
public int ReceptionId { get; set; }
public virtual Reception Reception { get; set; }
}
And this action, api/receptions/list
public JsonResult List()
{
    return Json(dbContext.Receptions
        .Select(e => new
        {
            code = e.Code,
            itemsCount = e.Items.Count,
            quantity = e.Items.Sum(i => i.Quantity)
        })
        .ToList());
}
which returns a list of receptions with their item counts and total quantities:
[
{code:"1231",itemsCount:10,quantity:30},
{code:"1232",itemsCount:5,quantity:70},
{code:"1234",itemsCount:30,quantity:600},
...
]
This was working fine, but now I have too many Receptions and Items, so the query is taking too long...
So I want to speed it up by adding some persisted fields to Reception:
class Reception
{
public int ReceptionId { get; set; }
public string Code { get; set; }
public virtual List<Item> Items { get; set; }
public int ItemsCount { get; set; } // Persisted
public int Quantity { get; set; } // Persisted
}
With this change, the query ends up being this:
public JsonResult List()
{
    return Json(dbContext.Receptions
        .Select(e => new
        {
            code = e.Code,
            itemsCount = e.ItemsCount,
            quantity = e.Quantity
        })
        .ToList());
}
My question is:
What's the best way to maintain these two fields?
I will gain performance, but now I will need to be more careful with the creation of Items.
Today an Item can be created, edited and deleted:
api/items/create?receptionId=...
api/items/edit?itemId=...
api/items/delete?itemId=...
I also have a tool for importing receptions via Excel:
api/items/createBulk?...
Maybe tomorrow I will have more ways of creating Items, so the question is: how do I make sure that these two new fields, ItemsCount and Quantity, are always up to date?
Should I create a method within Reception like this?
class Reception
{
...
public void UpdateMaintainedFields()
{
this.Quantity = this.Items.Sum(e => e.Quantity);
this.ItemsCount = this.Items.Count();
}
}
And then REMEMBER to call it from all the previous endpoints (items/create, items/edit, ...)?
Or maybe should I have a stored procedure in the database?
What is the common practice? I know there are computed columns, but those only reference fields of the same row/class. There are also indexed views, but I'm not sure whether they apply well to scenarios like this.
From your code it seems to me that you do not have a layer for business logic; everything is implemented in the controllers. This causes a problem: when you add a different way of creating items (and it seems you mean a different controller), you have to implement this logic again, which is easy to forget, and even if you do not forget now, it is easy to forget to maintain it later.
So I would recommend having a business logic layer (for operations like adding new items) and calling it from every controller that creates items.
I would also recommend writing the UpdateMaintainedFields method as you suggested, but calling it in the business logic layer after adding the items, not in the controllers!
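For example, a rough sketch of such a layer (the service class, method and MyDbContext names are hypothetical stand-ins for your actual types):
using System.Data.Entity; // for the lambda Include; in EF Core use Microsoft.EntityFrameworkCore
using System.Linq;

public class ReceptionService
{
    private readonly MyDbContext dbContext;

    public ReceptionService(MyDbContext dbContext)
    {
        this.dbContext = dbContext;
    }

    // Every entry point (create, edit, delete, bulk import) calls this method,
    // so the maintained fields are kept in sync in exactly one place.
    public void AddItem(int receptionId, string code, int quantity)
    {
        var reception = dbContext.Receptions
            .Include(r => r.Items)
            .Single(r => r.ReceptionId == receptionId);

        reception.Items.Add(new Item { Code = code, Quantity = quantity });
        reception.UpdateMaintainedFields();

        dbContext.SaveChanges();
    }
}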
You could also implement the logic in the database (a trigger) if you can accept that you cannot cover it with unit tests.
Assuming the original query cannot be improved with the right execution plan in SQL Server, the way to keep these fields up to date is a trigger in the DB. When an insert occurs on that table (or possibly an update, if your persisted fields change according to the data), the trigger runs and is responsible for updating the affected rows with the new values.
Your insert performance would drop, but your query performance would be that of a simple index lookup and a single-row read. Note that you wouldn't be able to use this trick if you were returning a subset of the table, as all the quantities would be fixed.
An alternative is to hold the count and quantity sums in a separate table, or in a dummy row that holds the summed quantities as its entry for quantity. YMMV.
PS: I hate how what is really a SQL question has been turned into one about C# code! Learn SQL and run the queries you need directly in the DB; that will show you much more about the performance and structure of what you're looking for than getting EF involved. /rant :)
You want to store the same information redundantly, which can lead to inconsistencies. As an inspiration: indexes also duplicate data. How do you update them? You don't; it is all fully transparent. And I would recommend the same approach here.
Make a sum table, maintained by triggers. The table would not be included in any data context schema; the only way to read it would be through non-updateable views or stored procedures. Its name should make clear that nobody should ever touch this table directly.
You can now access your data from various frameworks without worrying about updating anything. The database guarantees the precalculated sums are always correct, as long as you do not write to the sum table yourself. In fact, you can add or remove this table at any time and no application would even notice.
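For illustration only, a rough sketch of what such a setup could look like, assuming SQL Server and EF6 (table, trigger and column names are hypothetical; in EF Core you would call ExecuteSqlRaw instead):
// One-time setup, e.g. from a migration or deployment script (sketch only).
dbContext.Database.ExecuteSqlCommand(@"
    CREATE TABLE dbo.ReceptionSums (
        ReceptionId INT NOT NULL PRIMARY KEY,
        ItemsCount  INT NOT NULL,
        Quantity    INT NOT NULL);");

dbContext.Database.ExecuteSqlCommand(@"
    CREATE TRIGGER dbo.trg_Items_MaintainSums ON dbo.Items
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Recalculate the sums only for the receptions touched by this statement.
        DELETE s FROM dbo.ReceptionSums s
        WHERE s.ReceptionId IN (SELECT ReceptionId FROM inserted
                                UNION SELECT ReceptionId FROM deleted);

        INSERT INTO dbo.ReceptionSums (ReceptionId, ItemsCount, Quantity)
        SELECT i.ReceptionId, COUNT(*), SUM(i.Quantity)
        FROM dbo.Items i
        WHERE i.ReceptionId IN (SELECT ReceptionId FROM inserted
                                UNION SELECT ReceptionId FROM deleted)
        GROUP BY i.ReceptionId;
    END");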
My question is the same as this SO thread, but I would like to know whether having an index on the DbGeography LocationPoints column is really necessary. I think the simple answer is yes, of course, but maybe the spatial type has something under the hood that I'm unaware of that keeps searches fast...
If I'm going to be querying lots of spatial data, will I take a performance hit with no index? Is there something else I can do to increase the query performance, either inside or outside of Entity Framework? Is it possible to create an index on a DbGeography type outside of EF Code First? I will drop Code First if I have to and go back to ADO.NET. I just read that MongoDB has a spatial index that can be used for fast searching of location data. Should I move to MongoDB on Azure?
I see a few links here from MSDN, spatial indexes and create spatial index, but I don't know whether they apply to me.
In case code is needed for this thread, here you go. Here is my query and the objects used to search through the spatial data. I've only tested on localhost, but running a few hundred queries seems kind of slow.
var yogaSpaces = (from u in context.YogaSpaces
orderby u.Address.LocationPoints.Distance(myLocation)
where ((u.Address.LocationPoints.Distance(myLocation) <= 8047)
&& (u.Events.Any(e => e.DateTimeScheduled >= classDate)))
select u).ToPagedList(page, 10);
public class YogaSpaceAddress
{
//omitted code here
public DbGeography LocationPoints { get; set; }
// omitted code here
}
public class YogaSpaceEvent
{
public int YogaSpaceEventId { get; set; }
//public string Title { get; set; }
[Index]
//research more about clustered indexes to see if it's really needed here
//[Index(IsClustered = true, IsUnique = false)]
public DateTime DateTimeScheduled { get; set; }
//omitted code here
}
public class YogaSpace
{
//omitted code here
public virtual YogaSpaceAddress Address { get; set; }
//omitted code here
public virtual ICollection<YogaSpaceEvent> Events { get; set; }
//omitted code here
}
My suggestion would be to capture the SQL query that the LINQ statement executes and then run it in SQL Server Management Studio to analyze it and get the execution plan; that will give you suggestions on how to improve it.
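If it helps, a quick way to capture that SQL in EF6 is the Database.Log hook; a sketch (the context type and coordinates are made up, and in EF Core you would use LogTo or a logger instead):
using System.Data.Entity.Spatial;
using System.Linq;

using (var context = new YogaContext()) // hypothetical context type
{
    // Write every SQL statement EF generates to the debug output so it can be
    // copied into SQL Server Management Studio and analyzed with its execution plan.
    context.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

    var myLocation = DbGeography.FromText("POINT(-122.34 47.65)", 4326);
    var nearby = context.YogaSpaces
        .Where(u => u.Address.LocationPoints.Distance(myLocation) <= 8047)
        .ToList();
}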
If those suggestions don't work, then it might be worth moving to MongoDB. Something else to consider is that Entity Framework itself may be the problem, because it may be loading too many objects into memory.
I am trying to design a new system for tracking sales. A simplified version of my data models is:
public class Sale
{
public int SaleId { get; set; }
public DateTime CompletedDateTime { get; set; }
public virtual List<SaleItem> SaleItems { get; set; }
public decimal Total
{
get
{
return SaleItems.Sum(i => i.Price);
}
}
}
public class SaleItem
{
public int SaleItemId { get; set; }
public decimal Price { get; set; }
public int SaleId { get; set; }
public virtual Sale Sale { get; set; }
}
I am now writing some reports which total the sales value within a specified period. I have the following code to do that:
List<Sale> dailySales = db.Sales
.Where(x => DbFunctions.TruncateTime(x.CompletedDateTime) >= fromParam)
.Where(x => DbFunctions.TruncateTime(x.CompletedDateTime) <= toParam)
.ToList();
decimal total = dailySales.Sum(x => x.Total);
This is working OK and giving me the expected result. I feel like this might give me problems further down the line, though, once large datasets get involved. I assume having to load all the Sales into a list will become resource-intensive, plus my actual implementation has tax, costs, etc. associated with each SaleItem, so it becomes more complex still.
The following would allow all the processing to be done in the database; however, it is not possible because the DB has no representation of Total, so EF throws an error:
Decimal total = db.Sales.Sum(x=>x.Total);
Which leads me to my question. I could set my model up as follows and, each time I add a SaleItem, make sure I update the Total:
public class Sale
{
...
public decimal Total { get; set; }
}
This would then allow me to query the database as required, and I assume it will be less resource-intensive. The flip side, though, is that I have introduced redundancy into the database. Is the latter the better method of dealing with this, or is there an alternative I haven't even considered?
It depends on many factors. For instance, how often will you require the "Total" amount to be available? And how many SaleItems are there usually in a Sale?
If we're talking about, say, a supermarket kind of sale where you have at the very most 200 items, it's quite okay to just calculate it on the fly. Then again, if this ever gets mapped to an RDBMS and all the SaleItems live in one single table, an index on the foreign key (which links each individual SaleItem to its Sale) is a must, otherwise performance will take a huge hit once you have millions of transactions to sift through.
Answering the second half of your question: having redundancy is not always a bad thing; you just need to make sure that whenever a Sale's item list is modified, the Total is recalculated at the end. It's slightly dangerous (redundancy always carries this burden), but you just need to ensure that whatever has the potential to change the Sale does so in a way (maybe even with a trigger in the RDBMS) that recalculates the total automatically.
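As a minimal sketch of that idea, assuming the persisted Total property from your last snippet (the AddItem helper is hypothetical):
using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public int SaleId { get; set; }
    public DateTime CompletedDateTime { get; set; }
    public virtual List<SaleItem> SaleItems { get; set; } = new List<SaleItem>();

    public decimal Total { get; set; } // persisted, no longer computed on every read

    // Any code path that modifies the items goes through here,
    // so the stored total can never drift away from the item prices.
    public void AddItem(SaleItem item)
    {
        SaleItems.Add(item);
        Total = SaleItems.Sum(i => i.Price);
    }
}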
Hope it helps!
You're right that it's much more effective to calculate totals on the DB side instead of loading the whole list and calculating them in the application.
I think you're missing that you can write a LINQ query that gets the SUM of related child entities.
using (var ctx = new MyDbContext())
{
var totalSales = ctx.Sales
.Select(s => s.SaleItems.Sum(si => si.Price)) // Total of each Sale
.Sum(tsi => tsi); // Sum of the total of each sale
}
You can of course shape the query to bring back additional information, projecting the result into an anonymous type or into a class created ad hoc for this purpose.
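For example, a sketch reusing the fromParam/toParam variables from your report code:
using (var ctx = new MyDbContext())
{
    var totalsPerSale = ctx.Sales
        .Where(s => s.CompletedDateTime >= fromParam && s.CompletedDateTime <= toParam)
        .Select(s => new
        {
            s.SaleId,
            s.CompletedDateTime,
            Total = s.SaleItems.Sum(si => si.Price) // still computed by the database
        })
        .ToList();
}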
Of course, this EF query will be translated into a SQL query and executed on the server side.
When you start using LINQ to Entity Framework it's not always obvious how to get what you want, but on most occasions you can do it.
Let's say I have a persistent class called DailyVisitorSummary, which records, for each web page, how many visitors it had per day. For simplicity, assume that we represent the day as a plain integer.
Now I would like to create a query that retrieves a specific day's data, plus the data from the previous and next days. I know that there will surely be at most one previous-day and one next-day record for the same web page, so I could write an SQL query (MySQL syntax) like:
SELECT c.*,p.*,n.* from DailyVisitorSummary c
LEFT JOIN DailyVisitorSummary p ON p.WebPage = c.WebPage AND p.Day = c.Day - 1
LEFT JOIN DailyVisitorSummary n ON n.WebPage = c.WebPage AND n.Day = c.Day + 1
WHERE c.Day = 453;
I would like to populate the following viewmodel with the result:
public class VMDailyVisitors3Day {
public VMDailyVisitors CurrentDay { get; set; }
public VMDailyVisitors PreviousDay { get; set; }
public VMDailyVisitors NextDay { get; set; }
}
public class VMDailyVisitors {
public int Day { get; set; }
public int WebPageID { get; set; }
public int VisitorCount { get; set; }
}
How could I do this query with Linq to XPO?
I need a LINQ solution, because I need to use the result in a server-mode MVC GridView.
LINQ to XPO supports only group joins. A possible solution is to create a SQL view in the database based on your SQL query, and then map another persistent class to this view to read the data.
The view must have at least one column with unique values. Do not use the NEWID function or similar to generate them, because that approach assigns a different value to the same row each time the data is queried, and Server Mode uses the key column to identify rows. Use actual data to populate the key column, for example by concatenating the values from the WebPage and Day columns; just make sure that this produces distinct values.
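A rough sketch of such a mapping, assuming DevExpress XPO conventions (the view, class and column names are hypothetical; RowKey is the concatenated WebPage + Day column exposed by the view):
using DevExpress.Xpo;

[Persistent("View_DailyVisitors3Day")] // maps the class to the SQL view
public class DailyVisitors3DayRecord : XPLiteObject
{
    public DailyVisitors3DayRecord(Session session) : base(session) { }

    [Key] // Server Mode needs a stable, unique key column
    public string RowKey { get; set; } // e.g. CONCAT(WebPage, '-', Day) in the view

    public int WebPageID { get; set; }
    public int Day { get; set; }
    public int VisitorCount { get; set; }
    public int? PreviousVisitorCount { get; set; }
    public int? NextVisitorCount { get; set; }
}
The class can then be queried with new XPQuery<DailyVisitors3DayRecord>(session) like any other persistent class.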
I found a workaround with the help of DX support: using the PersistentAliasAttribute as an intermediate step, where I can do free joins, and then using that property in the XPQuery. If anyone is interested, check it out here.