LINQ/EF Optimising group by query with count - c#

I'm trying to do a very simple group by with a count, but I'm running into issues because LINQ is generating some very horrible SQL that times out. I'm running EF7 and ASP.NET 5.
Here's my code -
var counts = (from log in db.LogTable
where proj.Contains(log.projId)
group log by log.callTypeId into grp
select new
{
key = (from temp in callTypes where temp.Id == grp.Key select temp),
count = grp.Count()
});
callTypes is cached, LogTable contains over 1m records.
I've even tried removing the where clause, changing the key to just return the id instead of selecting from the callTypes cache, and adding an orderby callTypeId, but none of these made any difference.
Query being executed is as follows -
SELECT [log].[Id], [log].[callTypeId], [all other columns...]
FROM [LogTable] AS [log]
WHERE [log].[callTypeId] IN (1, 2, 3, 4, 5)
Does anyone know whether there is a way to force this to "SELECT callTypeId, count(*) FROM LogTable WHERE... GROUP BY callTypeId", so that it stops returning all the records and doing the grouping on the application side? Or do I need to revert to stored procedures?
UPDATE -
As explained above, replacing (from temp in callTypes where temp.Id == grp.Key select temp) with just grp.Key gives the same query so this is not the issue.
Many thanks!
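One way to keep the query shape as simple as possible is to project only the group key and the count, and to attach the cached callTypes afterwards in memory. Below is a minimal sketch of that split. The Log and CallType records, the CallTypeCounts class and the dictionary cache are hypothetical stand-ins, not the asker's real types, and the sample runs against an in-memory IQueryable. Whether a given EF release turns step 1 into a SQL GROUP BY depends on the version; as far as I know, early EF7 / EF Core builds evaluated GroupBy on the client, so the logged SQL still needs checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the LogTable entity and the cached call types.
public record Log(int Id, int ProjId, int CallTypeId);
public record CallType(int Id, string Name);

public static class CallTypeCounts
{
    public static List<(CallType Type, int Count)> Count(
        IQueryable<Log> logs,
        ICollection<int> projIds,
        IReadOnlyDictionary<int, CallType> cache)
    {
        // Step 1: the part intended for the database.
        // Only the grouping key and the count are projected, no entity columns.
        var raw = logs
            .Where(l => projIds.Contains(l.ProjId))
            .GroupBy(l => l.CallTypeId)
            .Select(g => new { CallTypeId = g.Key, Count = g.Count() })
            .ToList();

        // Step 2: purely in memory. Look up the cached call type for each key.
        return raw
            .Select(r => (cache[r.CallTypeId], r.Count))
            .ToList();
    }
}
```

The point of the split is that the cache lookup never appears inside the query expression, so the provider only has to deal with Where, GroupBy and Count.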

Related

Selecting a new object without grouping in linq?

I am trying to write a query to collect the Job number, start date, customer name, Model, and completion date for jobs at my work. To get this info, I look at 3 different tables, using joins to put them together. Here are the three tables:
STAGE - (stages each job goes through during production)
ORDER - (this is where I get the customer's name)
JOBS (start date, completion date, job number, model)
So most of the info is from the JOBS table. But I'm joining onto the ORDER table by Job Number (JobNum) to obtain the customer's name. Here's what the query looks like. I created it in SQL before I tried to translate it into one of my ViewModels:
var CompletedTrucksQuery =
(from FA in context.JOBS
join ORD in context.ORDER on FA.ORGANIZATION_ID equals ORD.ORGANIZATION_ID
where FA.ORDER_NUMBER == ORD.ORDER_NUMBER
join StageF in context.STAGE on FA.JobNum equals StageF.JobNum
where StageF.StageID == 325
join TruckComp in context.STAGE on FA.JobNum equals TruckComp.JobNum
where TruckComp.StageID == 327
join INSP in context.STAGE on FA.JobNum equals INSP.JobNum
where INSP.StageID == 487
orderby StageF.CompDate descending, FA.JobNum ascending
select new {FA.JobNum, FA.StartDate, ORD.CUSTOMER_NAME, FA.MODEL_NAME, StageF.CompDate});
At this point, I'm wanting to select the
Job number (from JOBS),
the start date, (From JOBS),
the Customer's name (from ORDER),
the Model of product (from JOBS),
and the date it was completed in StageF (from STAGE)
as you can see in my select statement. I DO have an object to hold each of these called CompJob, and have tried to do a 'group by' and select a new CompJob and set the properties, but I can't seem to group it right and get 'access' to all of the properties I want to set. Here's an excerpt of what I'm talking about:
group new {FA.JobNum, O = ORD} by StageF into grp
select new CompletedTruck
{
JobNum = grp.Key.JobNum,
StartDate = grp. //???
}
As you can tell I stopped, because for some reason I couldn't 'find' the start date. I know it's something to do with my grouping. I'm very new to linq and databases, in general.
MY QUESTION: What's the best way I can select these columns of interest into my
ObservableCollection<CompJob> CompJobList;
so that I may use it in my scrollviewer in a view?
Thanks to everyone for helping me out. I solved the issue and this is what I did. I followed Gert Arnold's suggestion of selecting a new CompJob object. This would set each row of the query to a new object:
select new CompJob {JobNum = FA.JobNum, StartDate = FA.StartDate, Customer = ORD.CUSTOMER_NAME,.....}
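and then added
).ToList();
at the end of the query. Then I built my ObservableCollection from the list returned by the query (a List<CompJob> cannot be assigned to an ObservableCollection<CompJob> directly):
CompJobList = new ObservableCollection<CompJob>(CompletedTrucksQuery);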
Thanks again for helping me out; I know I'm new to all of this!
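For anyone landing here later, the "project each row into an object, then wrap the list" step can be sketched roughly as follows. This is only an illustration: the CompJob properties shown, the CompJobLoader class and the tuple rows standing in for the joined query result are assumptions, not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical shape of the CompJob class; only three properties are shown.
public class CompJob
{
    public string JobNum { get; set; } = "";
    public DateTime StartDate { get; set; }
    public string Customer { get; set; } = "";
}

public static class CompJobLoader
{
    // "rows" stands in for the result of the joined JOBS/ORDER/STAGE query.
    public static ObservableCollection<CompJob> Load(
        IEnumerable<(string JobNum, DateTime StartDate, string CustomerName)> rows)
    {
        // Project every row into a CompJob; no grouping is involved.
        List<CompJob> jobs = rows
            .Select(r => new CompJob
            {
                JobNum = r.JobNum,
                StartDate = r.StartDate,
                Customer = r.CustomerName
            })
            .ToList();

        // ObservableCollection<T> has a constructor that copies from a list.
        return new ObservableCollection<CompJob>(jobs);
    }
}
```

Grouping is only needed when several rows must be collapsed into one; here each joined row already corresponds to one CompJob.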

Converting a SQL to Linq Query

Currently I have SQL query like
select tt.userId, count(tt.userId) from (SELECT userId,COUNT(userId) as cou
FROM [dbo].[users]
where createdTime> DATEADD(wk,-1,GETDATE())
group by userId,DATEPART(minute,createdTime)/5) tt group by tt.userId
Now I have the data in a DataTable, and I need to convert the above query to LINQ and execute it against the DataTable. I have been unable to do so; can anybody help me out?
This is what query does, It groups the users into 5 minutes time slots and then counts the number of timeslots per user.
Note : I am not able to use Linqer to create the Linq queries because this table does not exist in the database, it's a virtual one created dynamically.
It is a slightly complex query, but here is my best attempt at making it work.
var result = table.AsEnumerable().Where(u=> u.Field<DateTime>("createdTime") > DateTime.Now.AddDays(-7)) //subtract a week
.GroupBy(g=> new { userid = g.Field<string>("userId") , span = g.Field<DateTime>("createdTime").Minute / 5 }) // 5-minute slot, as in DATEPART(minute, createdTime)/5
.Select(g=> new { userid = g.Key.userid, count = g.Count()})
.GroupBy(g=> g.userid ).Select(s=> new {userid = s.Key, count = s.Count()});
Working Demo
This SQL can be rewritten like this
SELECT
COUNT(U.UserId),
U.[createdTime]
FROM USERS U WHERE createdTime> DATEADD(wk,-1,GETDATE())
GROUP BY U.UserId,
DATEPART(MONTH, U.[createdTime]),
DATEPART(DAY, U.[createdTime]),
DATEPART(HOUR, U.[createdTime]),
(DATEPART(MINUTE, U.[createdTime]) / 5)
And its corresponding Linq for DataTable would be
var users = myDataTable.AsEnumerable()
.Select(r=> new {
UserId = r.Field<int>("UserId"),
CreatedTime = r.Field<DateTime>("createdTime")
}).ToList();
var cutoff = DateTime.Now.AddDays(-7); // one week ago
var groupedUsersResult = from user in users where user.CreatedTime > cutoff group user by
new {user.CreatedTime.Year,user.CreatedTime.Month,user.CreatedTime.Day,user.CreatedTime.Hour,Minute=(user.CreatedTime.Minute/5),user.UserId}
into groupedUsers select groupedUsers;
Fiddle is here
I would suggest using LINQPad 4. It makes this kind of thing easy and will help you a lot when writing LINQ queries.
https://www.linqpad.net/
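Putting the pieces from the answers above together, here is a small self-contained sketch that counts the number of distinct 5-minute slots per user in a DataTable. Two things are assumptions on my part: the userId column is taken to be an int, and the slot is an absolute 5-minute bucket (ticks divided by five minutes), which is slightly stricter than the original SQL, where DATEPART(minute, createdTime)/5 ignores the hour and day. The SlotCounter class name and the explicit "now" parameter are only there to make the example testable.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SlotCounter
{
    // Returns userId -> number of distinct 5-minute slots in the last week.
    public static Dictionary<int, int> SlotsPerUser(DataTable table, DateTime now)
    {
        long slotTicks = TimeSpan.FromMinutes(5).Ticks;
        DateTime cutoff = now.AddDays(-7);

        return table.AsEnumerable()
            // Keep only rows from the last seven days.
            .Where(r => r.Field<DateTime>("createdTime") > cutoff)
            // Reduce each row to (user, absolute 5-minute bucket).
            .Select(r => new
            {
                UserId = r.Field<int>("userId"),
                Slot = r.Field<DateTime>("createdTime").Ticks / slotTicks
            })
            // Anonymous types compare by value, so this removes repeated slots.
            .Distinct()
            // Count the remaining slots per user.
            .GroupBy(x => x.UserId)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Using Distinct before the final GroupBy plays the role of the inner GROUP BY in the SQL: it collapses all rows that fall into the same slot for the same user.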

Entity Framework generates inefficient SQL for paged query

I have a simple paged linq query against one entity:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
data = data.OrderBy(t => t.Id);
if (page > 0)
{
data = data.Skip(rows * (page - 1)).Take(rows);
}
var l = data.ToList();
I expected it to generate SQL similar to:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).
However, the generated SQL is different. It contains a nested query as shown below:
SELECT TOP (50)
[Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId],
<redacted>,
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CampaignId] AS [CampaignId],
<redacted>
FROM [dbo].[Widgets] AS [Extent1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC
The Widgets table is enormous and the inner query returns 100000s of records, causing a timeout.
Is there anything I can do to change the generation? Anything I am doing wrong?
UPDATE
I finally managed to refactor my code to return the results relatively quickly:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t).AsEnumerable().Select((item, index) => new { Index = index, Item = item });
data = data.OrderBy(t => t.Index);
if (page > 0)
{
data = data.Where(t => t.Index >= (rows * (page - 1)));
}
data = data.Take(rows);
Note, the page > 0 logic is simply used to prevent an invalid parameter being used; it does no optimization. In fact page > 1 , while valid, does not provide any noticeable optimization for the 1st page; since the Where is not a slow operation.
Prior to SQL Server 2012, the generated SQL is the best way to perform paging. Yes, it is awful and very inefficient, but it is the best you can do even if you write your own SQL script by hand. A great deal has been written about this online; just search for it.
On the first page this can be optimized by not doing Skip and just doing Take, but on any other page you are stuck.
A workaround could be to generate your own row_number in persistence (an auto-identity column could work) and just do where(widget.number > (page*rows) ).Take(rows) in code. If there is a good index on widget.number, the query should be very fast. But this breaks the dynamic orderBy.
However, I can see in your code that you are always ordering by widget.id; so, if dynamic orderBy is not essential, this could be a valid workaround.
Would I take my own medicine, you might ask?
No, I would not. The best way to deal with this is to have a persistence read-model, in which you can even have one table per widget orderBy field, each with its own widget.number. The problem is that modelling a system with a persistence read-model just for this issue is excessive. A read-model is part of the overall design of your system and has to be taken into account from the very beginning of its design and development.
The generated query is so complex and nested because you used the Skip method. In T-SQL, Take is easily achieved with just TOP, but that is not the case with Skip: to apply it you need row_number, and that is why there is a nested query. The inner query returns rows with a row_number and the outer one filters them to get the proper set of rows. Your query:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
lacks the skipping of the initial rows. To keep the query efficient, it would be best to page by a condition on Id instead of using Take and Skip, because you are already ordering the rows by that field (note that this only yields full pages if the Id values of the filtered rows are contiguous):
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
var result = data
.Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page)
.OrderBy(t => t.Id)
.ToList();
AFAIK you cannot change the query generated by Entity Framework, although you can force it to run a raw SQL query:
https://msdn.microsoft.com/en-us/data/jj592907.aspx
You can also use stored procedures:
https://msdn.microsoft.com/en-us/data/gg699321.aspx
Even if there were a way to change the generated query, IMO it would be spitting into the wind. I bet it will be easier to write the SQL query on your own.
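A variation on the "page by Id" idea that does not depend on contiguous Id values is keyset (seek) paging: remember the last Id of the previous page and ask for the next rows after it. The sketch below is an illustration under assumptions, not the asker's code: the Widget record, the WidgetPaging class and the lastSeenId parameter are hypothetical, and it runs against an in-memory IQueryable. It also changes the contract, since the caller passes the last Id seen rather than a page number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Widgets entity.
public record Widget(int Id, int CampaignId);

public static class WidgetPaging
{
    // Returns up to "rows" widgets whose Id is greater than lastSeenId.
    // Pass 0 (or any value below the smallest Id) to get the first page.
    public static List<Widget> NextPage(
        IQueryable<Widget> widgets, int campaignId, int lastSeenId, int rows)
    {
        return widgets
            .Where(w => w.CampaignId == campaignId && w.Id > lastSeenId)
            .OrderBy(w => w.Id)
            .Take(rows) // Take alone needs no Skip, so no row_number is required
            .ToList();
    }
}
```

Usage would be something like calling NextPage(query, campaignId, 0, 50) for the first page and then passing the Id of the last returned widget for each following page. The trade-off is that jumping straight to an arbitrary page number is no longer possible.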

get same result from c# linq as SQL statement

I am trying to figure out how to get all the notifications for relations that receive multiple notifications, because I want to combine these into one notification so that the relation only gets one e-mail instead of several.
I created the following SQL statement, which as far as I can tell does what I want:
select distinct r.Notificatie
, r.RelatieNr
FROM [configuratie].[dbo].[NotificatieRecID] r
join [configuratie].[dbo].[Notificatie] n on r.Notificatie = n.ID
where n.Verzonden = 0
and r.RelatieNr in(select RelatieNr from [configuratie].[dbo].[NotificatieRecID]
group by RelatieNr having count(*) > 1)
order by r.RelatieNr
It returns the following
Notification Relation
3A2A53B9-D92A-4504-874D-5A901AD01041 114147
4C499F6C-53C8-49E0-B529-8B045819BE10 114147
AF4ED8CB-D033-47A4-96AE-F379BB484532 114147
977885C5-4C12-431B-AB72-59383B1824C6 303327
3A2A53B9-D92A-4504-874D-5A901AD01041 303327
4C499F6C-53C8-49E0-B529-8B045819BE10 303327
AF4ED8CB-D033-47A4-96AE-F379BB484532 303327
Later in my C# code I will get all the values from the different notifications and simply combine them, but first I need to write this SQL statement in a way I can use with LINQ in C#.
I have no idea how to do SELECT DISTINCT, r.RelatieNr in (...), and group by RelatieNr having count(*) > 1.
Could someone provide me with an example? (It does not have to be one LINQ statement; I've kind of figured that's impossible, though I would like as few temporary Lists/IQueryables as possible since the tables are huge.)
You can use the following
var details = (from r in NotificatieRecID
join n in Notificatie on r.Notificatie equals n.ID
where n.Verzonden == 0 && // use !n.Verzonden if the column maps to bool
(from t in NotificatieRecID
group t by t.RelatieNr into grp
where grp.Count() > 1
select grp.Key).Contains(r.RelatieNr)
select new {
Notificatie = r.Notificatie,
RelatieNr = r.RelatieNr
}).Distinct();
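The same three pieces (GROUP BY ... HAVING, IN, and DISTINCT) can also be written with method syntax. The following is a rough sketch against in-memory sequences so that it runs on its own. The RecId and Notification records and the PendingNotifications class are hypothetical, and Verzonden is assumed to map to a bool. Against a LINQ provider the same operators apply, but whether they end up as one SQL statement depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the NotificatieRecID and Notificatie tables.
public record RecId(Guid Notificatie, int RelatieNr);
public record Notification(Guid ID, bool Verzonden);

public static class PendingNotifications
{
    public static List<(Guid Notificatie, int RelatieNr)> ForRelationsWithMany(
        IEnumerable<RecId> recIds, IEnumerable<Notification> notifications)
    {
        // GROUP BY RelatieNr HAVING COUNT(*) > 1
        var busyRelations = recIds
            .GroupBy(r => r.RelatieNr)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToHashSet();

        return recIds
            // JOIN ... ON r.Notificatie = n.ID WHERE n.Verzonden = 0
            .Join(notifications.Where(n => !n.Verzonden),
                  r => r.Notificatie,
                  n => n.ID,
                  (r, n) => (r.Notificatie, r.RelatieNr))
            // r.RelatieNr IN (subquery)
            .Where(x => busyRelations.Contains(x.RelatieNr))
            // SELECT DISTINCT ... ORDER BY r.RelatieNr
            .Distinct()
            .OrderBy(x => x.RelatieNr)
            .ToList();
    }
}
```

Note that the HAVING-style filter counts all rows in the link table, sent or not, which mirrors the subquery in the original SQL.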

Linq to SQL one to many relationships

For the last couple of days I have been struggling with the performance of a LINQ query:
LinqConnectionDataContext context = new LinqConnectionDataContext();
System.Data.Linq.DataLoadOptions options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Question>(x => x.Answers);
options.LoadWith<Question>(x => x.QuestionVotes);
options.LoadWith<Answer>(x => x.AnswerVotes);
context.LoadOptions = options;
var query =( from c in context.Questions
where c.UidUser == userGuid
&& c.Answers.Any() == true
select new
{
c.Uid,
c.Content,
c.UidUser,
QuestionVote = from qv in c.QuestionVotes where qv.UidQuestion == c.Uid && qv.UidUser == userGuid select new {qv.UidQuestion, qv.UidUser },
Answer = from d in c.Answers
where d.UidQuestion == c.Uid
select new
{
d.Uid,
d.UidUser,
d.Conetent,
AnswerVote = from av in d.AnswerVotes where av.UidAnswer == d.Uid && av.UidUser == userGuid select new { av.UidAnswer, av.UidUser }
}
}).ToList();
The query has to run through 5000 rows, and it takes up to 1 minute. How can I improve the performance of this query?
Update:
something to get you started.
CREATE PROCEDURE GetQuestionsAndAnswers
(
@UserGuid VARCHAR(100)
)
AS
BEGIN
SELECT
c.Uid,
c.Content,
c.UidUser,
qv.UidQuestion,
qv.UidUser,
av.UidAnswer,
av.UidUser,
av.Content,
d.Uid,
d.UidUser,
d.Content
FROM Question c
INNER JOIN QuestionVotes qv ON qv.UidQuestion = c.Uid AND qv.UidUser = @UserGuid
INNER JOIN Answers d ON d.UidQuestion = c.Uid
INNER JOIN AnswerVotes av ON av.UidAnswer = d.Uid AND av.UidUser = @UserGuid
WHERE c.UidUser = @UserGuid
END
You will already have clustered indexes on the primary key columns by default (just confirm this on the database side), and you would want non-clustered indexes on the QuestionVote UidUser column, the AnswerVote UidUser column, and the Answer UidQuestion column.
You might also want to use .AsQueryable() instead of ToList() for deferred execution.
Do you ToList()?
Check the generated SQL using sql-debug-visualizer, then copy it, run it from a SQL client, and see how much time it takes. If it takes close to 1 minute, you need to improve performance at the database level by adding indexes and/or a stored procedure, creating views, etc.
If it does not take much time, you can always create a stored procedure and call it using LINQ to SQL.
One more recommendation is to use Entity Framework, if you are able to switch, because it is the future.
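If you go the stored procedure route, the result comes back as flat rows (one row per question/answer combination), so the nested question-with-answers shape has to be rebuilt in memory. Here is a minimal sketch of that regrouping. The FlatRow and QuestionDto records and the QuestionShaper class are hypothetical and carry only a few of the columns, so treat it as an outline rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, as a joined query or stored procedure might return it.
public record FlatRow(Guid QuestionUid, string QuestionContent,
                      Guid AnswerUid, string AnswerContent);

// Hypothetical nested shape wanted by the calling code.
public record QuestionDto(Guid Uid, string Content, List<string> Answers);

public static class QuestionShaper
{
    public static List<QuestionDto> Shape(IEnumerable<FlatRow> rows)
    {
        return rows
            // One group per question; the key carries the question columns.
            .GroupBy(r => new { r.QuestionUid, r.QuestionContent })
            // Each group's rows become that question's answer list.
            .Select(g => new QuestionDto(
                g.Key.QuestionUid,
                g.Key.QuestionContent,
                g.Select(r => r.AnswerContent).ToList()))
            .ToList();
    }
}
```

The idea is to do one round trip for the flat data and then a cheap in-memory GroupBy, instead of letting the nested projections trigger extra queries per question or answer.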
