Improve Linq query performance that use ToList() - c#

this code written by #Rahul Singh in this post Convert TSQL to Linq to Entities :
var result = _dbContext.ExtensionsCategories.ToList().GroupBy(x => x.Category)
.Select(x =>
{
var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).Contains(f.Extension));
return new
{
Category = x.Key,
TotalFileCount = files
};
});
but this code have problem when used inside database context and we should use ToList() like this to fix "Only primitive types or enumeration types are supported in this context" error :
var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).ToList().Contains(f.Extension));
the problem of this is ToList() fetch all records and reduce performance, now i wrote my own code :
var categoriesByExtensionFileCount =
_dbContext.ExtensionsCategories.Select(
ec =>
new
{
Category = ec.Category,
TotalSize = _dbContext.FileLists.Count(w => w.Extension == ec.Extension)
});
var categoriesTOtalFileCount =
categoriesByExtensionFileCount.Select(
se =>
new
{
se.Category,
TotalCount =
categoriesByExtensionFileCount.Where(w => w.Category == se.Category).Sum(su => su.TotalSize)
}).GroupBy(x => x.Category).Select(y => y.FirstOrDefault());
the performance of this code is better but it have much line of code, any idea about improve performance of first code or reduce line of second code :D
Regards, Mojtaba

You should have a navigation property from ExtensionCategories to FileLists. If you are using DB First, and have your foreign key constraints set up in the database, it should do this automatically for you.
If you supply your table designs (or model classes), it would help a lot too.
Lastly, you can rewrite using .ToList().Contains(...) with .Any() which should solve your immediate issue. Something like:
_dbContext.FileLists.Count(f => x.Any(z => z.Extension==f.Extension)));

Related

LINQ Lambda efficiency of code groupby orderby

I have this code, but I think that it could run faster, or I just hope to. But I have plenty of data. I'd like to have it as effective as it can be.
Here is the code:
(Need to return newest translations of words (Language and value) from resources grouped by resource and language based on Expression<Func<ResourcesTranslation, bool>> ConditionExpression)
KeyValues = item.Resources
.Where(ConditionExpression)
.GroupBy(g => new { g.ResourceId, g.Language })
.Select(m => m.OrderByDescending(o => o.Changed ?? o.Created))
.Select( s => new KeyValues
{
Language = s.FirstOrDefault().Language,
KeyValue = s.FirstOrDefault().Value
}).ToList();
As you need only one element after grouping, you can return it right in GroupBy clause, it will simplify your code:
KeyValues = item.Resources
.Where(ConditionExpression)
.GroupBy(g => new { g.ResourceId, g.Language },
(x, y) => new { Max = y.OrderByDescending(o => o.Changed ?? o.Created).First() })
.Select(s => new KeyValues
{
Language = s.Max.Language,
KeyValue = s.Max.Value
})
.ToList();
Even though you can get some performance by removing the first, unneeded select (depending on the volume of data this could be minimal to medium improvement) like this:
KeyValues = item.Resources
.Where(ConditionExpression)
.GroupBy(g => new { g.ResourceId, g.Language })
.OrderByDescending(o => o.Changed.HasValue ? o.Changed : o.Created)
.Select( s => new KeyValues
{
Language = s.Language,
KeyValue = s.Value
}).ToList();
Depending on your case, you could:
If your data is in a database, you can create database improvements like adding indexes, updating statistics, using hints etc.
if this is local data, you can use some strategy to split new and old data between various enumerables.
There is no other way to significantly improve your linq query. You need to find another strategy to achieve that.
I found out that Visual Studio translates it in to selects, so I realized that, the best solution for stuff like this is to make some View.. Just giving answer to own Q for another guys.

Problem with LINQ query: Select first task from each goal

I'm looking for suggestions on how to write a query. For each Goal, I want to select the first Task (sorted by Task.Sequence), in addition to any tasks with ShowAlways == true. (My actual query is more complex, but this query demonstrates the limitations I'm running into.)
I tried something like this:
var tasks = (from a in DbContext.Areas
from g in a.Goals
from t in g.Tasks
let nextTaskId = g.Tasks.OrderBy(tt => tt.Sequence).Select(tt => tt.Id).DefaultIfEmpty(-1).FirstOrDefault()
where t.ShowAlways || t.Id == nextTaskId
select new CalendarTask
{
// Member assignment
}).ToList();
But this query appears to be too complex.
System.InvalidOperationException: 'Processing of the LINQ expression 'OrderBy<Task, int>(
source: MaterializeCollectionNavigation(Navigation: Goal.Tasks(< Tasks > k__BackingField, DbSet<Task>) Collection ToDependent Task Inverse: Goal, Where<Task>(
source: NavigationExpansionExpression
Source: Where<Task>(
source: DbSet<Task>,
predicate: (t0) => Property<Nullable<int>>((Unhandled parameter: ti0).Outer.Inner, "Id") == Property<Nullable<int>>(t0, "GoalId"))
PendingSelector: (t0) => NavigationTreeExpression
Value: EntityReferenceTask
Expression: t0
,
predicate: (i) => Property<Nullable<int>>(NavigationTreeExpression
Value: EntityReferenceGoal
Expression: (Unhandled parameter: ti0).Outer.Inner, "Id") == Property<Nullable<int>>(i, "GoalId"))),
keySelector: (tt) => tt.Sequence)' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.'
The problem is the line let nextTaskId =.... If I comment out that, there is no error. (But I don't get what I'm after.)
I'll readily admit that I don't understand the details of the error message. About the only other way I can think of to approach this is return all the Tasks and then sort and filter them on the client. But my preference is not to retrieve data I don't need.
Can anyone see any other ways to approach this query?
Note: I'm using the very latest version of Visual Studio and .NET.
UPDATE:
I tried a different, but less efficient approach to this query.
var tasks = (DbContext.Areas
.Where(a => a.UserId == UserManager.GetUserId(User) && !a.OnHold)
.SelectMany(a => a.Goals)
.Where(g => !g.OnHold)
.Select(g => g.Tasks.Where(tt => !tt.OnHold && !tt.Completed).OrderBy(tt => tt.Sequence).FirstOrDefault()))
.Union(DbContext.Areas
.Where(a => a.UserId == UserManager.GetUserId(User) && !a.OnHold)
.SelectMany(a => a.Goals)
.Where(g => !g.OnHold)
.Select(g => g.Tasks.Where(tt => !tt.OnHold && !tt.Completed && (tt.DueDate.HasValue || tt.AlwaysShow)).OrderBy(tt => tt.Sequence).FirstOrDefault()))
.Distinct()
.Select(t => new CalendarTask
{
Id = t.Id,
Title = t.Title,
Goal = t.Goal.Title,
CssClass = t.Goal.Area.CssClass,
DueDate = t.DueDate,
Completed = t.Completed
});
But this also produced an error:
System.InvalidOperationException: 'Processing of the LINQ expression 'Where<Task>(
source: MaterializeCollectionNavigation(Navigation: Goal.Tasks (<Tasks>k__BackingField, DbSet<Task>) Collection ToDependent Task Inverse: Goal, Where<Task>(
source: NavigationExpansionExpression
Source: Where<Task>(
source: DbSet<Task>,
predicate: (t) => Property<Nullable<int>>((Unhandled parameter: ti).Inner, "Id") == Property<Nullable<int>>(t, "GoalId"))
PendingSelector: (t) => NavigationTreeExpression
Value: EntityReferenceTask
Expression: t
,
predicate: (i) => Property<Nullable<int>>(NavigationTreeExpression
Value: EntityReferenceGoal
Expression: (Unhandled parameter: ti).Inner, "Id") == Property<Nullable<int>>(i, "GoalId"))),
predicate: (tt) => !(tt.OnHold) && !(tt.Completed))' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.'
This is a good example for the need of full reproducible example. When trying to reproduce the issue with similar entity models, I was either getting a different error about DefaulIfEmpty(-1) (apparently not supported, don't forget to remove it - the SQL query will work correctly w/o it) or no error when removing it.
Then I noticed a small deeply hidden difference in your error messages compared to mine, which led me to the cause of the problem:
MaterializeCollectionNavigation(Navigation: Goal.Tasks (<Tasks>k__BackingField, DbSet<Task>)
specifically the DbSet<Task> at the end (in my case it was ICollection<Task>). I realized that you used DbSet<T> type for collection navigation property rather than the usual ICollection<T>, IEnumerable<T>, List<T> etc., e.g.
public class Goal
{
// ...
public DbSet<Task> Tasks { get; set; }
}
Simply don't do that. DbSet<T> is a special EF Core class, supposed to be used only from DbContext to represent db table, view or raw SQL query result set. And more importantly, DbSets are the only real EF Core query roots, so it's not surprising that such usage confuses the EF Core query translator.
So change it to some of the supported interfaces/classes (for instance, ICollection<Task>) and the original problem will be solved.
Then removing the DefaultIfEmpty(-1) will allow successfully translating the first query in question.
I don't have EF Core up and running, but are you able to split it up like this?
var allTasks = DbContext.Areas
.SelectMany(a => a.Goals)
.SelectMany(a => a.Tasks);
var always = allTasks.Where(t => t.ShowAlways);
var next = allTasks
.OrderBy(tt => tt.Sequence)
.Take(1);
var result = always
.Concat(next)
.Select(t => new
{
// Member assignment
})
.ToList();
Edit: Sorry, I'm not great with query syntax, maybe this does what you need?
var allGoals = DbContext.Areas
.SelectMany(a => a.Goals);
var allTasks = DbContext.Areas
.SelectMany(a => a.Goals)
.SelectMany(a => a.Tasks);
var always = allGoals
.SelectMany(a => a.Tasks)
.Where(t => t.ShowAlways);
var nextTasks = allGoals
.SelectMany(g => g.Tasks.OrderBy(tt => tt.Sequence).Take(1));
var result = always
.Concat(nextTasks)
.Select(t => new
{
// Member assignment
})
.ToList();
I would recommend you start by breaking up this query into individual parts. Try iterating through the Goals in a foreach with your Task logic inside. Add each new CalendarTask to a List that you defined ahead of time.
Overall breaking this logic up and experimenting a bit will probably lead you to some insight with the limitations of Entity Framework Core.
I think we might separate the query into two steps. First, query each goals and get the min Sequence task and store them(maybe with a anonymous type like {NextTaskId,Goal}). Then, we query the temp data and get the result. For example
Areas.SelectMany(x=>x.Goals)
.Select(g=>new {
NextTaskId=g.Tasks.OrderBy(t=>t.Sequence).FirstOrDefault()?.Id,
Tasks=g.Tasks.Where(t=>t.ShowAlways)
})
.SelectMany(a=>a.Tasks,(a,task)=>new {
NextTaskId = a.NextTaskId,
Task = task
});
I tried to create the linq request but I'm not sure about the result
var tasks = ( from a in DbContext.Areas
from g in a.Goals
from t in g.Tasks
join oneTask in (from t in DbContext.Tasks
group t by t.Id into gt
select new {
Id = gt.Key,
Sequence = gt.Min(t => t.Sequence)
}) on new { t.Id, t.Sequence } equals new { oneTask.Id,oneTask.Sequence }
select new {Area = a, Goal = g, Task = t})
.Union(
from a in DbContext.Areas
from g in a.Goals
from t in g.Tasks
where t.ShowAlways
select new {Area = a, Goal = g, Task = t});
I currently don't have EF Core, but do you really need to compare this much?
Wouldn't querying the tasks be sufficient?
If there is a navigation property or foreign key defined I could imaging using something like this:
Tasks.Where(task => task.Sequence == Tasks.Where(t => t.GoalIdentity == task.GoalIdentity).Min(t => t.Sequence) || task.ShowAlways);

Refactor Linq query

I am trying to cut this linq down
var sys = db.tlkpSystems
.Where(a => db.tlkpSettings.Where(e => e.Hidden < 3)
.Select(o => o.System)
.ToList().Contains(a.System)) //cannot get this part in?
.OrderBy(a => a.SystemName).ToList();
foreach (var item in sys)
model.Add(new SettingSystem {
System = item.System,
SystemName = item.SystemName
});
I have tried the following:
List<SettingSystem> model = new List<SettingSystem>();
model = db.tlkpSettings.Where(e => e.Hidden < 3)
.OrderBy(e => e.Setting)
.Select(e => new SettingSystem
{
System = e.System,
SystemName = e.Setting
}).ToList();
How can I call the .Contains(a.System) part in my query?
Thanks
Some general rules when working with LINQ to Entities:
Avoid using ToList inside the query. It prevents EF to build a correct SQL query.
Don't use Contains when working with entities (tables). Use Any or joins.
Here is your query (in case System is not an entity navigation property):
var sys = db.tlkpSystems
.Where(a => db.tlkpSettings.Any(e => e.Hidden < 3 && e.System == a.System))
.OrderBy(a => a.SystemName).ToList();
As an addendum, there is also AsEnumerable for when you must pull a query into memory (such as calling methods within another clause). This is generally better than ToList or ToArray since it'll enumerate the query, rather than enumerating, putting together a List/Array, and then enumerating that collection.

Chain query in Linq

I have the following queries:
var ground = db
.Ground
.Where(g => g.RowKey == Ground_Uuid)
.ToList();
var building = db
.Building
.Where(b => ground.Any(gr => gr.RowKey == b.Ground.RowKey))
.ToList();
var floor = db
.Floor
.Where(b => building.Any(by => by.RowKey == b.Building.RowKey))
.ToList();
So the second relies on the id from the first set and so on.
I got following error when an execution goes to the second query:
Unable to create a constant value of type 'Domain.Model.Entities.Ground'. Only primitive types or enumeration types are supported in this context.
Any ideas how to resolve it?
The problem with your code is that ToList is converting the result into an in-memory object and a collection of objects in memory cannot be joined with a set of data in the database.
var ground = db.Ground.Where(g => g.RowKey == Ground_Uuid);
var building = db.Building.Where(b => ground.Any(gr => gr.RowKey == b.Ground.RowKey));
var floor = db.Floor.Where(b => building.Any(by => by.RowKey == b.Building.RowKey));
Also, frankly after reading #juharr's comment, I saw the relationship between floor, building & ground. Since you are already doing b.Building.RowKey, b.Ground.RowKey predicting the relationship was easy and I totally agree, it can be simplified as:-
var floor = db.Floor.Where(b => b.Building.Ground.RowKey == Ground_Uuid);
It seems like the first query is redundant. You already know that the RowKey column for each row will be equal to Ground_Uuid.
var building = db.Building.Where(b => b.Ground.RowKey == Ground_Uuid);
var floor = db.Floor.Where(b => b.Building.Ground.RowKey == Ground_Uuid);
Removing ToList() would do the job, but moreover if RowKey is the foreign key you can utilize Linq:
var floor = db.Floor
.Where(b => b.Building.Ground.RowKey == Ground_Uuid)
.ToList();

Filter nested models on property in last node with NHibernate

I am using NHibernate with mapping by code.
I have three models: Solution, Installation and System. There are one-to-many relations between them. So that each Solution has a list of Installations, and each Installation has a list of Systems.
Each system has a property "Type", which can be "1" or "0".
I am trying to write a method in the Solution repository that will return all the Solutions, with their Installations with only the Systems of type "1".
I have tried the Where-keyword in the SystemMap but i get the same result with and without it. Then i tried a few different experiments with QueryOver(???) without success.
How do i go about to filter on information in the last node?
Thank to your answer, i have done the following implementation, but it results in a huge amount of Systems and Solutions. Maybe i have done something wrong?
The Maps are as follows:
public SAPSolutionMap()
{
Id(t => t.YPID);
Property(e => e.ShortName);
Property(e => e.FullName);
Bag(x => x.SapInstallations, colmap =>
{
colmap.Table("SAPInstallation");
colmap.Key(x => x.Column("Solution"));
colmap.Inverse(true);
colmap.Lazy(CollectionLazy.NoLazy);
colmap.Fetch(CollectionFetchMode.Join);
colmap.Cascade(Cascade.None);
}, map => map.OneToMany(m => m.Class(typeof(SAPInstallation))));
}
public SAPInstallationMap()
{
Id(t => t.InstallationNumber);
Bag(x => x.SapSystems, colmap =>
{
colmap.Table("sapgui");
colmap.Key(x => x.Column("Installation"));
colmap.Inverse(true);
colmap.Lazy(CollectionLazy.NoLazy);
colmap.Cascade(Cascade.None);
colmap.Fetch(CollectionFetchMode.Join);
//colmap.Where("Type = 1");
}, map => map.OneToMany(m => m.Class(typeof(SAPSystem))));
ManyToOne(x => x.SapSolution, map =>
{
map.Column("Solution");
map.NotNullable(true);
map.Cascade(Cascade.None);
map.Class(typeof(SAPSolution));
});
}
public SAPSystemMap()
{
Id(t => t.ID, t => t.Generator(Generators.Identity));
Property(e => e.Type);
Property(e => e.ExplanationText);
ManyToOne(x => x.SapInstallation, map =>
{
map.Column("Installation");
map.NotNullable(true);
map.Cascade(Cascade.None);
map.Class(typeof(SAPInstallation));
});
}
And the Query:
public IList<SAPSolution> GetProductionSystems()
{
SAPSystem syst = null;
SAPInstallation installation = null;
var subquery = QueryOver.Of(() => syst)
.JoinQueryOver(x => x.SapInstallation, () => installation)
.Where(() => syst.Type == 1)
.Select(x => installation.SapSolution.YPID);
// main Query
var query = Session.QueryOver<SAPSolution>()
.WithSubquery
.WhereProperty(root => root.YPID)
.In(subquery);
return query.List<SAPSolution>();
}
Thank you!
General solution should be:
// this is a subquery (SELECT ....
System syst = null;
Installation installation = null;
var subquery = QueryOver.Of(() => syst)
.JoinQueryOver(x => x.Installation, () => installation)
.Where(() => syst.Type == 1)
.Select(x => installation.Solution.ID)
;
// main Query
var query = session.QueryOver<Solution>()
.WithSubquery
.WhereProperty(root => root.ID)
.In(subquery)
;
var list = query
.Take(10)
.Skip(10)
.List<Solution>();
What we can see, that Solution, Installation and System
System has property Installation (many-to-one)
Installation has property Solution (many-to-one)
This is expect-able, because it goes side by side with one-to-many (it is the reverse mapping)
So, then we create subquery, which returns just solution ID's which belong to system with searched Type.
Main query is flat (the great benefit) and we can use paging on top of it.
We would be able to do that even if there is only one way (one-to-many). But that will generate more complicated SQL query ... and does not make sense. In C# we can have both relations...
EXTEND:
You did a great job. Your mapping and query is really cool. But there is one big but: LAZY is what we should/MUST use. Check this:
NHibernate is lazy, just live with it, by Ayende
So, our, collections cannot be FETCHING with a JOIN, because that will multiply the result (10 solutions * 100 installation * 10 systems == 10000 results)
Bag(x => x.SapSystems, colmap =>
{
...
// THIS IS not good way
colmap.Lazy(CollectionLazy.NoLazy);
colmap.Fetch(CollectionFetchMode.Join);
We should use LAZY as possible. To avoid later 1 + N issue, we can use batch-fetching (for example check this)
How to Eager Load Associations without duplication in NHibernate?
So, our collections should be mapped like this:
Bag(x => x.SapSystems, colmap =>
{
...
// THIS IS not good way
colmap.Lazy(CollectionLazy.Lazy);
colmap.BatchSize(100);
With this setting, the query will really use only the root object and related collections will be loaded very effectively

Categories

Resources