I am using the repository pattern + entity framework for a small project I'm working on. I needed to cut out lazy loading for performance reasons, but now I also need to include child entities in my db fetches. My current (working) solution is this:
protected MediaDbEntities MediaDb { get { return db ?? (db = DatabaseFactory.Get()); } }
public virtual IEnumerable<T> All(string[] childEntities = null)
{
var query = MediaDb.Set<T>();
foreach (var childEntity in childEntities)
{
query.Include(childEntity);
}
return query.ToList();
}
I would really like to explore the use of an aggregate in this case, but don't really know how to apply. I have only used aggregates for sums and arithmetic operations. Anybody have an answer I can learn from?
In this case, you're not actually aggregating any data so trying to create any sort of aggregate is the wrong answer.
There's nothing wrong with your current solution. It makes perfect sense the way it is and I, personally, would leave it the way it is.
Related
I have a problem with EF6 when trying to optimize the queries. Consider this class with one collection:
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
}
As you might know, with Lazy Loading I have this n+1 problem, when EF tries to get all the Countries, for each client.
I tried to use Linq projections; for example:
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries
}).ToList().Select(data =>
{
data.client.Countries = data.Countries; // Here is the problem
return data.client;
}).ToList();
Here I'm using two selects: the first for the Linq projection, so EF can create the SQL, and the second to map the result to a Client class. The reason for that is because I'm using a repository interface, which returns List<Client>.
Despite the query is generated with the Countries in it, EF still is using Lazy Loading when I try to render the whole information (the same n+1 problem). The only way to avoid this, is to remove the virtual accessor:
public class Client
{
... a lot of properties
public List<Country> Countries { get; set; }
}
The issue I have with this solution is that we still want to have this property as virtual. This optimization is only necessary for a particular part of the application, whilst on the other sections we want to have this Lazy Loading feature.
I don't know how to "inform" EF about this property, that has been already lazy-loaded via this Linq projection. Is that possible? If not, do we have any other options? The n+1 problems makes the application to take several seconds to load like 1000 rows.
Edit
Thanks for the responses. I know I can use the Include() extension to get the collections, but my problem is with some additional optimizations I need to add (I'm sorry for not posting the complete example, I thought with the Collection issue would be enough):
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
public virtual List<Action> Actions { get; set; }
public virtual List<Investment> Investments { get; set; }
public User LastUpdatedBy {
get {
if(Actions != null) {
return Actions.Last();
}
}
}
}
If I need to render the clients, the information about the last update and the number of investments (Count()), with the Include() I practically need to bring all the information from the database. However, if I use the projection like
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries,
NumberOfInvestments = client.Investments.Count() // this is translated to an SQL query
LastUpdatedBy = client.Audits.OrderByDescending(m => m.Id).FirstOrDefault(),
}).ToList().Select(data =>
{
// here I map back the data
return data.client;
}).ToList();
I can reduce the query, getting only the required information (in the case of LastUpdatedBy I need to change the property to a getter/setter one, which is not a big issue, as its only used for this particular part of the application).
If I use the Select() with this approach (projection and then mapping), the Include() section is not considered by EF.
If i understand correctly you can try this
_dbContext.LazyLoading = false;
var clientWithCountres = _dbContext.Clients
.Include(c=>c.Countries)
.ToList();
This will fetch Client and only including it Countries. If you disable lazy-loading the no other collection will load from the query. Unless you are specifying a include or projection.
FYI : Projection and Include() doesn't work together see this answer
If you are projection it will bypass the include.
https://stackoverflow.com/a/7168225/1876572
don't know what you want to do, you are using lambda expression not linq, and your second select it's unnecessary.
data.client is client, data.Countries is client.Countries, so data.client.Countries = data.Countries alway true.
if you don't want lazy load Countries, use _dbContext.Clients.Include("Countries").Where() or select ().
In order to force eager loading of virtual properties you are supposed to use Include extension method.
Here is a link to MSDN https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113).aspx.
So something like this should work:
return _dbContext.Clients.Include(c=>c.Countries).ToList();
Im not 100% sure but I think your issue is that you are still maintaining a queryable for your inner collection through to the end of the query.
This queryable is lazy (because in the model it was lazy), and you havent done anything to explain that this should not be the case, you have simply projected that same lazy queryable into the result set.
I cant tell you off the top of my head what the right answer here is but I would try things around the following:
1 use a projection on the inner queriable also eg
return _dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.Select(c=>c)// or a new Country
})
2 Put the include at the end of the query (Im pretty sure include applies to the result not the input. It definitally doesnt work if you put it before a projection) eg:
_dbContext.Clients
.Select(client => new
{
client,
client.Countries
}.Include(c=>c.Countries)`
3 Try specifying the enumeration inside the projection eg:
_dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.AsEnumerable() //perhaps tolist if it works
}`
I do want to caviat this by saying that I havent tried any of the above but I think this will set you on the right path.
A note on lazy loading
IMO there are very few good use cases for lazy loading. It almost always causes too many queries to be generated, unless your user is following a lazy path directly on the model. Use it only with extreme caution and IMO not at all in request response (eg web) apps.
I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
public DbSet<Order> Orders { get; set; }
public DbSet<Contact> Contacts { get; set; }
...
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Customer>().ToTable("vw_Customers");
modelBuilder.Entity<Order>().ToTable("vw_Orders");
...
}
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to be bringing back the entire data set, which can take some time before the LINQ query reduces the set to those which I need. This seems very inefficient when the criteria is only applicable to a few records and I am getting the entire data set back from SQL server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) //or an object passed in!
{
return Database.SqlQuery<Customer>("Exec pp_GetCustomersForCriteria #crit1 = {0}, #crit2 = {1}...", criteria1, criteria2,...);
}
whilst this is much quicker, the problem here is that it doesn't return a DbSet and so I lose all of the connectivity between my objects, e.g. I can't reference any associated objects such as orders or contacts even if I include their IDs because the return type is a collection of 'Customers' rather than a DbSet of them.
Does anyone have a better way of getting SQL server to do the querying so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually be 'bringing back' anything at all - calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList, or FirstOrDefault) that anything will actually be executed and results returned.
If however this Customers method returns a collection of instantiated objects, then yes, since you are asking for all the objects you're getting them all.
I've never used either code-first or indeed even then repository pattern, so I don't know what to advise, other than staying in the realm of IQueryable for as long as possible, and only executing the query once you've applied all relevant filters.
What I would have done to return just a set of data would have been the following:
var customers = (from x in Repository.Customers where <boolean statement> &&/|| <boolean statement select new {variableName = x.Name , ...).Take(<integer amount for amount of records you need>);
so for instance:
var customers = (from x in _repository.Customers where x.ID == id select new {variableName = x.Name} ).take(1000);
then Iterate through the results to get the data: (remember, the linq statement returns an IQueryable)...
foreach (var data in customers)
{
string doSomething = data.variableName; //to get data from your query.
}
hope this helps, not exactly the same methods, but I find this handy in my code
Probably it's because your Cusomters() method in your repository is doing a GetAll() kind of thing and fetching the entire list first. This prohibits LINQ and your SQL Server from creating smart queries.
I don't know if there's a good workaround for your repository, but if you would do something like:
using(var db = new RemoteServerContext())
{
var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data like sql paging like top function in sql or do manual querying using stored procedures. In either cases, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF, because you don't have a lot of control over the code it seems.
I have a LINQ to SQL class, we'll call it Test, and I want to be able to access properties with LINQ queries but I get the famed "No Supported Translation to SQL" runtime error. I'm interested in the conceptual problem. Here is my simplified class:
public class Test
{
public int ID {get; set;} // Stored in Database
public int NonForeignKeyValue {get; set;} // Stored in Database
}
Here is sort of an example of what I'm trying to accomplish, but I don't want the overhead of always explicitly writing the join in LINQ:
var db = (new DataContext()).GetTable<Test>();
var q = (from t in db.GetTable<Test>()
join o in db.GetTable<OtherTable>() on o.ID equals t.ID
where t.OtherStuff
select t)
I'd like to be able to add a property to Test that tells me if there are any rows in OtherTable that could be joined with Test:
public bool IsInOtherTable
{
get
{
return (new DataContext())
.GetTable<OtherTabke>()
.Any(x => x.NonForeignKeyValue == this.NonForeignKeyValue));
}
}
Ultimately this is what I want my code to look like, but it errors. I basically want to return all entries that contain some database computed value:
using (DataContext db = new DataContext())
{
var q = db.GetTable<Test>()
.Where(x => x.IsInOtherTable && x.OtherStuff); //Error
}
I'm basically trying to save myself from writing this code every single time I want to check if Test has certain information in another table. I'm not that interested in the exact problem I described, I'm more interested in how to conceptually add the join part to the SQL and still use LINQ. I'm guessing I use Linq.Expression, but I really don't know and I'm not aware of how to do it.
As an aside, I could just write the actual SQL, as its not that complicated, but I'd like to know how to get around this and still use LINQ.
Edit: I tried this property, but I get the same error. Its more complicated that just changing the return type to Expression...
public System.Linq.Expressions.Expression<Func<Article3, bool>> Exists
{
get
{
using (DataContext db = new DataContext())
{
return i => db.GetTable<OtherTable>()
.Any(x => x.NonForeignKeyValue == i.NonForeignKeyValue));
}
}
}
Each time the linq generator is to translate a code into a query, it has to process an expression tree.
In your examples, you are not passing around expression but rather - properties, delegates, i.e. stuff which the expression visitor is unable to "step into".
In general, try to rethink your conditions so that instead of bool you have Expression<T, bool> etc.
http://netpl.blogspot.com/2008/02/linq-to-object-vs-linq-to-sql-and.html
Firstly, I think you may be overestimating "the overhead of always explicitly writing the join in LINQ". It's an extra line of code which has the advantage of being relatively self-documenting as to just what you are doing (always a nice thing), and any other approach is going to be turned first into SQL and then into a query plan that will be at least as expensive to execute, possibly more expensive (SQLServer is good a joins!)
Still, there are two alternatives I can thinkof.
One is to have an EntityRef property on the class that defines this relationship with the other table. You can then test if it is null in your query (or EntitySet if it's on the other side of a one-to-many relationship).
The other is to define a SQL function that returns a bit result indicating whether an id refers to a row that does or doesn't relate to the other table.
Then define a protected method on your DataContext-derived class that matches the signature in C# terms, and has a Function attribute mapping it to that SQL function. (Since this isn't something that you can give a sensible non-db-using version of in the C# version, you can implement the C# function by calling ExecuteMethodCall).
Then you can use that method instead of the join.
Still, this is likely less self-explanatory in the code and at risk of being less efficient than just keeping the join.
Given i have a class like so in my Data Layer
public class GenericRepository<TEntity> where TEntity : class
{
public MyDataContext DataContext {get;set;}
[System.ComponentModel.DataObjectMethod(System.ComponentModel.DataObjectMethodType.Select)]
public IQueryable<TEntity> SelectAll()
{
return DataContext.GetTable<TEntity>();
}
}
I would be able to query a table in my database like so from a higher layer
using (GenericRepositry<MyTable> mytable = new GenericRepositry<MyTable>())
{
var myresult = from m in mytable.SelectAll()
where m.IsActive
select m;
}
is this considerably slower than using the usual code in my Data Layer
using (MyDataContext ctx = new MyDataContext())
{
var myresult = from m in ctx.MyTable
where m.IsActive
select m;
}
Eliminating the need to write simple single table selects in the Data layer saves a lot of time, but will i regret it?
Edit: # Skeet
I have actually implemented this approach in a fairly large WCF/Silverlight LOB project, and it seems our servers CPU's are struggling to keep up. The extra work of creating/destroying extra objects couldn't possibly be attributed to the rise in cpu usage over projects using the usual way?
You haven't shown where your "generic repository" is getting its context from - I assume it's creating a new one, and proxying the dispose call?
If so, it should basically be the same - it's been a while since I looked into the difference between GetTable<T>() and using the property, but I wouldn't be surprised if the property just called GetTable<T> itself. Other than that, there's no real difference.
The important point is that you're still using IQueryable<T> in both cases, so the query will still be translated into SQL - if your SelectAll method returned IEnumerable<T> instead, it would be disastrous.
I have a table with two columns, GroupId and ParentId (both are GUIDS). The table forms a hierarchy so I can look for a value in the “GroupId” filed, when I have found it I can look at its ParentId. This ParentId will also appear in the GroupId of a different record. I can use this to walk up the hierarchy tree from any point to the root (root is an empty GUID). What I’d like to do is get a list of records when I know a GroupId. This would be the record with the GroupId and all the parents back to the root record. Is this possible with Linq and if so, can anyone provide a code snippet?
LINQ is not designed to handle recursive selection.
It is certainly possible to write your own extension method to compensate for that in LINQ to Objects, but I've found that LINQ to Entities does not like functionality not easily translated into SQL.
Edit:
Funnily enough, LINQ to Entities does not complain about Matt Warren's take on recursion using LINQ here. You could do:
var result = db.Table.Where(item => item.GroupId == 5)
.Traverse(item => db.Table.Where(parent
=> item.ParentId == parent.GroupId));
using the extension method defined here:
static class LinqExtensions
{
public static IEnumerable<T> Traverse<T>(this IEnumerable<T> source,
Func<T,IEnumerable<T>> selector){
foreach(T item in source){
yield return item;
IEnumerable<T> children = selector(item);
foreach (T child in children.Traverse(selector))
{
yield return child;
}
}
}
Performace might be poor, though.
It's definitely possible with Linq, but you'd have to make a DB call for each level in the heirarchy. Not exactly optimal.
The other respondents are right - performance is going to be pretty bad on this, since you'll have to make multiple round-trips. This will be somewhat dependent on your particular case, however - is your tree deep and will be people be performing this operation often, for instance.
You may be well served by creating a stored procedure that does this (using a CTE), and wiring it up in the Entities Designer to return your particularly defined Entity.