C# Entity Framework Group By Tie Breaker - c#

I have a class called Person defined like so:
public class Person {
    public string Name { get; set; }  // "Joe", "Alex", etc.
    public string State { get; set; } // "Minnesota", "Texas", etc.
    public int Age { get; set; }      // 12, 23, etc.
}
I'm using PostgreSQL as my backing database with EF Core to bind the person table to the Person class.
I would like to perform a query that returns a List<Person> of the oldest person in each state. For example, if I had:
Person p1 = new Person {
    Name = "Joe",
    State = "Minnesota",
    Age = 34
};
Person p2 = new Person {
    Name = "Alex",
    State = "Minnesota",
    Age = 55
};
Person p3 = new Person {
    Name = "George",
    State = "Texas",
    Age = 62
};
I would like to perform a query that returns p2 and p3 (the order of p2 and p3 within the resultant list doesn't matter).
Right now, I'm using the following query:
List<Person> oldestByState = await _context.Person.GroupBy(x => x.State).Select(x => x.OrderByDescending(y => y.Age).First()).ToListAsync();
The issue is that EF Core cannot translate this into SQL. I get an error that x.OrderByDescending(y => y.Age) can't be translated.
I know that a quick fix would be to do the grouping on the client, but I'd prefer to run the query on the database by fixing the LINQ query.
What query should I use to return a list of the oldest people in each state that is compatible with LINQ to PostgreSQL translation?
EF Core: 5.0.7
Npgsql: 5.0.6

I tested this with SQL Server, and the error received from EF was "First can only be used as a final operator, try FirstOrDefault." Using FirstOrDefault worked, returning the expected oldest row.
await _context.Person
.GroupBy(x => x.State)
.Select(x => x.OrderByDescending(y => y.Age).FirstOrDefault())
.ToListAsync();
If it still doesn't work with PostgreSQL then it may be a limitation of that provider.
Edit:
One possible work-around, pending a fix from the npgsql team, would be to use a raw SQL query. For my test schema which is a bit different, EF for SQL Server produced:
SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age]
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]
Then executed by:
await _context.Person.FromSqlRaw(@" SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age]
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]").ToListAsync();
Ugly as &*^, and may not match your schema but if you can write a query that returns the data then provided it returns the entity you want, EF should be able to populate it. This would be a lot more complicated if you are trying to return a more complex object graph. (I.e. Person with many columns and related entities) In those cases I would recommend running two queries, one that does the grouping and selection that just returns PersonId, then use those PersonIds to fetch the relevant Persons with their related data by ID.
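Another shape sometimes suggested for a "top one per group" query is to select the distinct states and then pull a single row per state with a correlated subquery. The sketch below is only that: it runs against an in-memory list, and whether a given EF Core / Npgsql version translates the same shape when people is replaced by _context.Person is an assumption that has to be tested. OldestPerStateDemo and OldestPerState are hypothetical names, and the ThenBy on Name is an assumed tie breaker for people of the same age.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public string State { get; set; }
    public int Age { get; set; }
}

public static class OldestPerStateDemo
{
    // Hypothetical helper: for each distinct state, keep the single oldest person.
    public static List<Person> OldestPerState(IEnumerable<Person> people) =>
        people
            .Select(p => p.State)
            .Distinct()
            .SelectMany(state => people
                .Where(p => p.State == state)
                .OrderByDescending(p => p.Age)
                .ThenBy(p => p.Name) // assumed tie breaker: equal ages fall back to name order
                .Take(1))
            .ToList();

    public static void Main()
    {
        // In-memory stand-in for the person table.
        var people = new List<Person>
        {
            new Person { Name = "Joe", State = "Minnesota", Age = 34 },
            new Person { Name = "Alex", State = "Minnesota", Age = 55 },
            new Person { Name = "George", State = "Texas", Age = 62 },
        };

        foreach (var p in OldestPerState(people))
            Console.WriteLine($"{p.State}: {p.Name} ({p.Age})");
    }
}
```

With the sample data from the question this keeps Alex for Minnesota and George for Texas. Without a tie breaker, two people of the same age in one state would leave the choice of row undefined.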

Related

Entity Framework 6: Skip() & Take() do not generate SQL, instead the result set is filtered after loading into memory. Or am I doing something wrong?

I have the following code that should get a book and retrieve the first 2 tags (Tag entities) from that book (Book entity).
So Tags is a navigation property of the Book entity.
using (var context = new FakeEndavaBookLibraryEntities())
{
Book firstBook = context.Set<Book>().Take(1).First();
var firstTwoTags = firstBook.Tags.OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
}
I expected EF to generate the following SQL query:
SELECT TOP(2)
[Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
Instead, the EF Profiler shows me that EF is generating an unbounded result set (like SELECT * FROM ...)
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
I also tried appending .AsQueryable() to the firstBook.Tags property and/or removing the .Skip(0) call, as shown below, but this didn't help either.
var firstTwoTags = firstBook.Tags.AsQueryable().OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
The same undesired behavior:
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
Have you ever encountered the same problem when working with Entity Framework 6?
Are there any workarounds for this problem, or have I designed the query the wrong way?
Thanks for any tip!
firstBook.Tags is a lazily-loaded IEnumerable<Tag>. On the first access, all tags are loaded, and subsequent attempts to turn it into an IQueryable<Tag> do not work, since you did not start from something from which you could sensibly query.
Instead, start from a known good IQueryable<Tag>. Something along the lines of
Tag firstTag = context.Set<Tag>()
.Where(tag => tag.Books.Contains(firstBook))
.OrderBy(tag => tag.Id).Skip(0).Take(1).SingleOrDefault();
should work. You might need minor tweaking to turn the filter condition into something EF understands.
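To see why calling AsQueryable() on an already materialized collection does not push Skip/Take down to the database, here is a small in-memory sketch. It has nothing to do with EF itself: LoadedTagIds is a hypothetical stand-in for the lazily loaded Tags collection, and EnumeratedCount simply records how many elements the source hands out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsQueryableDemo
{
    // Counts how many elements the underlying source actually yields.
    public static int EnumeratedCount;

    // Hypothetical stand-in for a navigation collection that is already in memory.
    private static IEnumerable<int> LoadedTagIds()
    {
        foreach (var id in new[] { 5, 3, 1, 4, 2 })
        {
            EnumeratedCount++;
            yield return id;
        }
    }

    public static List<int> FirstTwo()
    {
        EnumeratedCount = 0;
        // AsQueryable() only wraps the in-memory sequence; the query still runs
        // as LINQ to Objects, so the whole source is read before Take(2) applies.
        return LoadedTagIds().AsQueryable().OrderBy(id => id).Take(2).ToList();
    }

    public static void Main()
    {
        var firstTwo = FirstTwo();
        Console.WriteLine(string.Join(", ", firstTwo)); // prints "1, 2"
        Console.WriteLine(EnumeratedCount);             // prints 5
    }
}
```

All five elements are enumerated even though only two are returned, which mirrors what the profiler shows: the full tag list is loaded first and the paging happens afterwards in memory.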
As @hvd pointed out, I had to work with IQueryable<Tag>, whereas the firstBook.Tags navigation property is just a lazy-loaded IEnumerable<Tag>.
So here is the solution to my problem, based on @hvd's answer.
Tag firstTag = context.Set<Tag>() // or even context.Tags
.Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
.OrderBy(tag => tag.Id)
.Skip(0).Take(1)
.SingleOrDefault();
So the only change to @hvd's solution is replacing
.Where(tag => tag.Books.Contains(firstBook))
with something that EF understands:
1) .Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
or
2) .Where(tag => tag.Books.Select(book => book.Id).Contains(firstBook.Id))
Either (1) or (2) generates the following SQL query, which is no longer an unbounded result set.
SELECT [Project2].[Id] AS [Id],
[Project2].[Version] AS [Version],
[Project2].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Version] AS [Version],
[Extent1].[Name] AS [Name]
FROM [Literature].[Tag] AS [Extent1]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Literature].[BookTagRelation] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[TagId])
AND ([Extent2].[BookId] = 1 /* @p__linq__0 */))) AS [Project2]
ORDER BY [Project2].[Id] ASC
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY

How get Row Index in a SQL Database using entity framework without query all data?

We have a database with thousands of entries. We would like to get the "ranked position" of a specific item by its name.
Since there is a lot of data, we would like to avoid querying ALL of it just to determine the row index (using ToList() and IndexOf(), for instance).
I tried using
Ranking ranking = _context.Testes
    .OrderByDescending(t => t.Val)
    .Select((d, i) => new Ranking() {
        Name = d.Name,
        Ranking = i
    }).First(d => d.Name == "Test");
but I got this error:
'value(Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1[WebApplication4.Models.Teste]).OrderByDescending(t => t.Val).Select((d, i) => new Ranking() {Name = d.Name, Ranking = i})': This overload of the method 'System.Linq.Queryable.Select' is currently not supported.
Is that possible somehow?
You can't translate this Select() overload to SQL. ORMs aren't meant for reporting and this is 100% a reporting query.
SQL Server offers ranking functions like ROW_NUMBER, RANK and DENSE_RANK. You could create a view that calculates the rank and map your rankings to it, eg :
CREATE VIEW Rankings
AS
SELECT
Name,
DENSE_RANK() OVER(ORDER BY Val) Ranking
From Tests
DENSE_RANK() will return the same rank number if two records tie and continue with the next rank number. ROW_NUMBER will just use incrementing numbers. If you use ROW_NUMBER you should probably add further sorting criteria to avoid generating arbitrary rankings for ties.
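The difference between the two numbering schemes can be illustrated in memory. This is only a sketch of the semantics, not of how SQL Server computes them; RankDemo, DenseRanks and RowNumbers are hypothetical names, values are ranked ascending as in the view above, and the original position in the array is the assumed tie breaker for the row number.

```csharp
using System;
using System.Linq;

public static class RankDemo
{
    // Dense rank: ties share a rank, and the next distinct value gets the next number.
    public static int[] DenseRanks(int[] vals) =>
        vals.Select(v => vals.Distinct().Count(d => d < v) + 1).ToArray();

    // Row number: every row gets its own number; ties are broken here by the
    // original position, which plays the role of the extra sorting criterion.
    public static int[] RowNumbers(int[] vals) =>
        vals.Select((v, i) =>
                vals.Where((o, j) => o < v || (o == v && j < i)).Count() + 1)
            .ToArray();

    public static void Main()
    {
        var vals = new[] { 10, 20, 20, 30 };
        Console.WriteLine(string.Join(", ", DenseRanks(vals))); // prints "1, 2, 2, 3"
        Console.WriteLine(string.Join(", ", RowNumbers(vals))); // prints "1, 2, 3, 4"
    }
}
```

The two rows with value 20 share dense rank 2, while the row number separates them only because a secondary ordering was supplied.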
EF will probably map the Ranking class to the Rankings view by convention. If not, you can map it using the Table attribute, or ToTable if you use Code-First configuration:
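[Table("Rankings")]
public class Ranking
{
    public string Name { get; set; }
    // A member cannot share its enclosing type's name, so the property is
    // called Rank and mapped to the view's Ranking column.
    [Column("Ranking")]
    public int Rank { get; set; }
}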
Retrieving a specific ranking requires only a Where() clause:
var someRanking = context.Rankings.Where(r => r.Name == someName);
In LINQ something like (note you must handle ties to get a well-defined ranking)
var q = from t in db.Testes
where t.Name == "whatever"
select new
{
Testes =t,
Rank =1+db.Testes.Where(ot => ot.Val < t.Val || (ot.Val == t.Val && ot.Id < t.Id) ).Count()
};
which translates to
SELECT
[Project1].[Id] AS [Id],
[Project1].[Val] AS [Val],
[Project1].[Name] AS [Name],
1 + [Project1].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Val] AS [Val],
[Extent1].[Name] AS [Name],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Testes] AS [Extent2]
WHERE ([Extent2].[Val] < [Extent1].[Val]) OR (([Extent2].[Val] = [Extent1].[Val]) AND ([Extent2].[Id] < [Extent1].[Id]))) AS [C1]
FROM [dbo].[Testes] AS [Extent1]
WHERE N'whatever' = [Extent1].[Name]
) AS [Project1]

LINQ Query generates unnecessarily long SQL

I have a bunch of Services, which reference Students (many to one), which reference StudentEnrollments (one to many).
When I query these services, the generated SQL contains two blocks that look the same, which slows down performance. I cannot for the life of me figure out why.
Here is my C# Code (narrowed down):
IQueryable<StudentServiceDm> query = GetListQuery();
List<int> schoolIds = // from front-end: in this case: 20, 21, 22, 23, 89, 90, 93, 95
query = query.Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)));
IQueryable<StudentServiceDto> dtoQuery = query.Select(m => new StudentServiceDto
{
Id = m.Id,
Name = m.Name,
ParentParticipationCount = m.ParentCount,
StudentFirstName = m.Student.FirstName,
StudentLastName = m.Student.LastName,
StudentId = m.StudentId.Value,
StudentServiceType = m.StudentServiceType.Name,
StudentServiceSubType = m.StudentServiceSubType.Name,
Date = m.Date,
DurationInMinutes = m.DurationInMinutes
});
return dtoQuery;
Here is the generated SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[ParentCount] AS [ParentCount],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
[Extent1].[StudentId] AS [StudentId],
[Extent3].[Name] AS [Name1],
[Extent4].[Name] AS [Name2],
[Extent1].[Date] AS [Date],
[Extent1].[DurationInMinutes] AS [DurationInMinutes]
FROM [dbo].[StudentService] AS [Extent1]
LEFT OUTER JOIN [dbo].[Student] AS [Extent2] ON ([Extent2].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent2].[Id])
LEFT OUTER JOIN [dbo].[StudentServiceType] AS [Extent3] ON ([Extent3].[Deleted] = 0) AND ([Extent1].[StudentServiceTypeId] = [Extent3].[Id])
LEFT OUTER JOIN [dbo].[StudentServiceSubType] AS [Extent4] ON ([Extent4].[Deleted] = 0) AND ([Extent1].[StudentServiceSubTypeId] = [Extent4].[Id])
WHERE ([Extent1].[Deleted] = 0) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[StudentEnrollment] AS [Extent5]
INNER JOIN [dbo].[Student] AS [Extent6] ON [Extent6].[Id] = [Extent5].[StudentId]
WHERE ([Extent6].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent6].[Id]) AND ([Extent5].[Deleted] = 0) AND ([Extent5].[SchoolId] IN (20, 21, 22, 23, 89, 90, 93, 95))
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[StudentEnrollment] AS [Extent7]
INNER JOIN [dbo].[Student] AS [Extent8] ON [Extent8].[Id] = [Extent7].[StudentId]
WHERE ([Extent8].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent8].[Id]) AND ([Extent7].[Deleted] = 0) AND ([Extent7].[SchoolId] IN (20, 21, 22, 23, 89, 90, 93, 95))
))
ORDER BY [Extent1].[Date] DESC, [Extent1].[Id] ASC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
As you can see, the SQL contains two boolean blocks (A AND B) where A and B look exactly the same (apart from the [Extent] suffixes, of course). I think my query is simple enough that LINQ should not generate such a query. Can any expert tell me why this is happening, or how I can write my query another way?
Entity Framework makes little attempt to optimize the SQL being generated - quite the opposite in practice. It's meant to be convenient rather than fast.
LINQ and Entity Framework are free, but Windows Azure charges by the second for database access. The slower the queries are, the more money Microsoft makes.
So I'm sure Microsoft are working really, really hard to speed it up for you.
If you need speed but cannot get it from EF, there are options:
Write a SQL stored procedure or SQL view - both can be called from Entity Framework.
Write your own query in SQL and execute it using ADO.NET
Fiddle around with the LINQ query until it speeds up by itself
I changed
query = query.Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)));
to
query = query.Where(a => schoolIds.Any(b => a.Student.StudentEnrollments.Select(c => c.SchoolId.Value).Contains(b)));
I flipped the logic, and it generates a query that performs better. It's longer and not ideal, but at least it is "correct". For some reason the first LINQ query produces those two duplicated blocks, which really kills performance in this case.
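That the two predicates select the same rows can be checked in memory. The sketch below says nothing about the SQL either form generates; it only compares results on a small hypothetical data set. Service, Student and Enrollment are simplified stand-ins for the real entities, and PredicateDemo, Original and Flipped are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Enrollment { public int? SchoolId { get; set; } }

public class Student
{
    public List<Enrollment> StudentEnrollments { get; set; } = new List<Enrollment>();
}

public class Service
{
    public int Id { get; set; }
    public Student Student { get; set; } = new Student();
}

public static class PredicateDemo
{
    // Original form: any enrollment whose school is in the list.
    public static List<int> Original(IEnumerable<Service> services, List<int> schoolIds) =>
        services
            .Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)))
            .Select(m => m.Id)
            .ToList();

    // Flipped form: any school in the list that appears among the enrollments.
    public static List<int> Flipped(IEnumerable<Service> services, List<int> schoolIds) =>
        services
            .Where(a => schoolIds.Any(b => a.Student.StudentEnrollments.Select(c => c.SchoolId.Value).Contains(b)))
            .Select(a => a.Id)
            .ToList();

    // Hypothetical sample: only service 1 has an enrollment in a listed school.
    public static List<Service> Sample() => new List<Service>
    {
        new Service { Id = 1, Student = new Student { StudentEnrollments = { new Enrollment { SchoolId = 20 }, new Enrollment { SchoolId = 50 } } } },
        new Service { Id = 2, Student = new Student { StudentEnrollments = { new Enrollment { SchoolId = 51 } } } },
        new Service { Id = 3 },
    };

    public static void Main()
    {
        var schoolIds = new List<int> { 20, 21 };
        Console.WriteLine(string.Join(", ", Original(Sample(), schoolIds))); // prints "1"
        Console.WriteLine(string.Join(", ", Flipped(Sample(), schoolIds)));  // prints "1"
    }
}
```

Both forms express "the student has at least one enrollment in one of the given schools", so any performance difference comes from how the provider translates each shape, not from a change in meaning.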

Entity Framework v6.1 query compilation performance

I am confused about how EF LINQ queries are compiled and executed. When I run the same piece of code in LINQPad a couple of times, I get varying performance results (the same query takes a different amount of time on each run). My test environment is described below.
tools used: EF v6.1 & LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For the queries, I am using the Persons, Courses & Departments tables from the above DB.
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "' deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this I get the result, and LINQPad shows a time taken of anywhere from 3.515 sec down to 0.004 sec (depending on how long I wait between runs).
If I take the generated SQL query and execute it directly, it always runs in 0.001 sec to 0.015 sec.
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took a varying amount of time, but the difference is not as big as with the first query set.
In my application there are going to be a lot of rows, and I prefer the first query set to the second one, but I am confused.
Please guide.
(Note: I have a little SQL Server knowledge so, I am using LINQPad to fine tune queries based on the performance)
Thanks

LINQ to Entities Query Optimization - Poor Performance

I've got some very poorly performing queries all over the place in my EF6 app. Here is one query that takes nearly 3000 ms to run (localhost to an external SQL Server).
dash.UserActivities = db.Activities.Include(a => a.Customer).Include(a => a.ActivityType).Where(a => a.AssignedUserId == userId)
.Where(a => a.IsComplete == false).OrderBy(a => a.DueDateTime).Take(10).Select(
a => new ActivityViewModel()
{
Id = a.Id,
CustomerFirstName = a.Customer.FirstName,
CustomerLastName = a.Customer.LastName,
ActivityType = a.ActivityType.Name,
DueDateTime = a.DueDateTime,
}
).ToList();
Clearly something doesn't feel right about this, it is probably something obvious. But I have no clue what it is!
UPDATE
The SQL being generated from this is:
SELECT TOP (10)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id],
[Project1].[FirstName] AS [FirstName],
[Project1].[LastName] AS [LastName],
[Project1].[Name] AS [Name],
[Project1].[DueDateTime] AS [DueDateTime]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[DueDateTime] AS [DueDateTime],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
[Extent3].[Name] AS [Name],
1 AS [C1]
FROM [dbo].[Activities] AS [Extent1]
INNER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[Id]
INNER JOIN [dbo].[ActivityTypes] AS [Extent3] ON [Extent1].[ActivityTypeId] = [Extent3].[Id]
WHERE (0 = [Extent1].[IsComplete]) AND ([Extent1].[AssignedUserId] = 037da3f4-99cc-4338-8b36-491ca0fcfcb1 /* @p__linq__0 */)
) AS [Project1]
ORDER BY [Project1].[DueDateTime] ASC
As @MarcinJuraszek suggested in the comments, we needed to run these queries locally and audit the performance there. Using the tuning advisor, we found many opportunities for improvement.
First thing: turn off lazy loading in the context if it is enabled. Performance is bad, really bad, and the Include will be disregarded if it is enabled.
In your example, because the query projects into a concrete class, the Select may not be executed on the SQL Server side but in C#. In that case EF joins all the tables and does a "select *", since it does not know which fields it will need, and then maps the result. Slow, really slow.
dash.UserActivities = db.Activities
    .Where(a => a.AssignedUserId == userId && !a.IsComplete)
    .OrderBy(a => a.DueDateTime)
    // Project only the needed columns into an anonymous type first.
    .Select(a => new {
        Id = a.Id,
        CustomerFirstName = a.Customer.FirstName,
        CustomerLastName = a.Customer.LastName,
        ActivityType = a.ActivityType.Name,
        DueDateTime = a.DueDateTime
    })
    .Take(10)
    // Then map the ten rows onto the view model.
    .Select(a => new ActivityViewModel()
    {
        Id = a.Id,
        CustomerFirstName = a.CustomerFirstName,
        CustomerLastName = a.CustomerLastName,
        ActivityType = a.ActivityType,
        DueDateTime = a.DueDateTime,
    })
    .ToList();
