How to COUNT rows within EntityFramework without loading contents? - c#

I'm trying to determine how to count the matching rows on a table using the EntityFramework.
The problem is that each row might have many megabytes of data (in a Binary field). Of course the SQL would be something like this:
SELECT COUNT(*) FROM [MyTable] WHERE [fkID] = '1';
I could load all of the rows and then find the Count with:
var owner = context.MyContainer.First(t => t.ID == '1');
owner.MyTable.Load();
var count = owner.MyTable.Count();
But that is grossly inefficient. Is there a simpler way?
EDIT: Thanks, all. I've moved the DB from a private attached so I can run profiling; this helps but causes confusions I didn't expect.
And my real data is a bit deeper, I'll use Trucks carrying Pallets of Cases of Items -- and I don't want the Truck to leave unless there is at least one Item in it.
My attempts are shown below. The part I don't get is that CASE_2 never accesses the DB server (MSSQL).
var truck = context.Truck.FirstOrDefault(t => (t.ID == truckID));
if (truck == null)
return "Invalid Truck ID: " + truckID;
var dlist = from t in context.Truck
where t.ID == truckID
select t.Driver;
if (dlist.Count() == 0)
return "No Driver for this Truck";
var plist = from t in context.Truck where t.ID == truckID
from r in t.Pallet select r;
if (plist.Count() == 0)
return "No Pallets are in this Truck";
#if CASE_1
/// This works fine (using 'plist'):
var list1 = from r in plist
from c in r.Case
from i in c.Item
select i;
if (list1.Count() == 0)
return "No Items are in the Truck";
#endif
#if CASE_2
/// This never executes any SQL on the server.
var list2 = from r in truck.Pallet
from c in r.Case
from i in c.Item
select i;
bool ok = (list2.Count() > 0);
if (!ok)
return "No Items are in the Truck";
#endif
#if CASE_3
/// Forced loading also works, as stated in the OP...
bool ok = false;
foreach (var pallet in truck.Pallet) {
pallet.Case.Load();
foreach (var kase in pallet.Case) {
kase.Item.Load();
var item = kase.Item.FirstOrDefault();
if (item != null) {
ok = true;
break;
}
}
if (ok) break;
}
if (!ok)
return "No Items are in the Truck";
#endif
And the SQL resulting from CASE_1 is piped through sp_executesql, but:
SELECT [Project1].[C1] AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(cast(1 as bit)) AS [A1]
FROM [dbo].[PalletTruckMap] AS [Extent1]
INNER JOIN [dbo].[PalletCaseMap] AS [Extent2] ON [Extent1].[PalletID] = [Extent2].[PalletID]
INNER JOIN [dbo].[Item] AS [Extent3] ON [Extent2].[CaseID] = [Extent3].[CaseID]
WHERE [Extent1].[TruckID] = '....'
) AS [GroupBy1] ) AS [Project1] ON 1 = 1
[I don't really have Trucks, Drivers, Pallets, Cases or Items; as you can see from the SQL the Truck-Pallet and Pallet-Case relationships are many-to-many -- although I don't think that matters. My real objects are intangibles and harder to describe, so I changed the names.]

Query syntax:
var count = (from o in context.MyContainer
where o.ID == '1'
from t in o.MyTable
select t).Count();
Method syntax:
var count = context.MyContainer
.Where(o => o.ID == '1')
.SelectMany(o => o.MyTable)
.Count();
Both generate the same SQL query.
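To see that the two forms are interchangeable without a database, here is a minimal in-memory sketch. The Container record and its sample data are made up for illustration, and AsQueryable() over a list stands in for an EF set, so this only shows that both syntaxes build the same Where/SelectMany/Count chain, not the SQL that EF emits.

```csharp
// Minimal sketch (C# 9+ top-level program). Container and its data are hypothetical.
using System;
using System.Collections.Generic;
using System.Linq;

IQueryable<Container> containers = new List<Container>
{
    new Container("1", new List<string> { "a", "b", "c" }),
    new Container("2", new List<string> { "d" }),
}.AsQueryable();

// Query syntax: the second 'from' is compiled into a SelectMany call.
int viaQuery = (from o in containers
                where o.ID == "1"
                from t in o.MyTable
                select t).Count();

// Method syntax: the same operators written out explicitly.
int viaMethod = containers
    .Where(o => o.ID == "1")
    .SelectMany(o => o.MyTable)
    .Count();

Console.WriteLine($"{viaQuery} {viaMethod}"); // prints "3 3"

// Hypothetical stand-in for the parent entity and its child collection.
public sealed record Container(string ID, List<string> MyTable);
```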

I think you want something like
var count = context.MyTable.Count(t => t.MyContainer.ID == '1');
(edited to reflect comments)

As I understand it, the selected answer still loads all of the related tests. According to this msdn blog, there is a better way.
http://blogs.msdn.com/b/adonet/archive/2011/01/31/using-dbcontext-in-ef-feature-ctp5-part-6-loading-related-entities.aspx
Specifically
using (var context = new UnicornsContext())
{
var princess = context.Princesses.Find(1);
// Count how many unicorns the princess owns
var unicornHaul = context.Entry(princess)
.Collection(p => p.Unicorns)
.Query()
.Count();
}

This is my code:
IQueryable<AuctionRecord> records = db.AuctionRecord;
var count = records.Count();
Make sure the variable is declared as IQueryable. Then, when you call Count(), EF will execute something like
select count(*) from ...
Otherwise, if records is declared as IEnumerable, the generated SQL will query the entire table and the returned rows will be counted in memory.
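The difference comes from which extension method the compiler binds to, which depends only on the declared type of the variable. The following is a small in-memory sketch of that binding, assuming a plain list wrapped with AsQueryable() in place of an EF set; with a real EF provider the Queryable operators are the ones that get translated to SQL, while the Enumerable ones run in memory.

```csharp
// Minimal sketch (C# 9+ top-level program); the list stands in for an EF set.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

IQueryable<int> asQueryable = new List<int> { 1, 2, 3 }.AsQueryable();
IEnumerable<int> asEnumerable = asQueryable; // same object, weaker declared type

// Binds to Queryable.Where: the filter is recorded as an expression tree for the provider.
IQueryable<int> composedByProvider = asQueryable.Where(n => n > 1);

// Binds to Enumerable.Where: the filter is a delegate that runs in memory over the source.
IEnumerable<int> composedInMemory = asEnumerable.Where(n => n > 1);

// The queryable version carries a call to System.Linq.Queryable in its expression tree.
var call = (MethodCallExpression)composedByProvider.Expression;
Type boundTo = call.Method.DeclaringType!;

// The enumerable version is no longer a queryable at all.
object boxed = composedInMemory;
bool stillQueryable = boxed is IQueryable<int>;

Console.WriteLine($"{boundTo.FullName} / still queryable: {stillQueryable}");
```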

Well, even SELECT COUNT(*) FROM Table can be fairly inefficient, especially on large tables, since SQL Server has to scan an entire index to count the rows (the clustered index, or a narrower nonclustered index if one exists).
Sometimes, it's good enough to know an approximate number of rows from the database, and in such a case, a statement like this might suffice:
SELECT
SUM(used_page_count) * 8 AS SizeKB,
SUM(row_count) AS [RowCount],
OBJECT_NAME(OBJECT_ID) AS TableName
FROM
sys.dm_db_partition_stats
WHERE
OBJECT_ID = OBJECT_ID('YourTableNameHere')
AND (index_id = 0 OR index_id = 1)
GROUP BY
OBJECT_ID
This will inspect the dynamic management view and extract the number of rows and the table size from it, given a specific table. It does so by summing up the entries for the heap (index_id = 0) or the clustered index (index_id = 1).
It's quick, it's easy to use, but it's not guaranteed to be 100% accurate or up to date. But in many cases, this is "good enough" (and puts much less burden on the server).
Maybe that would work for you, too? Of course, to use it in EF, you'd have to wrap this up in a stored proc or use a straight "Execute SQL query" call.
Marc

Use the ExecuteStoreQuery method of the entity context. This avoids downloading the entire result set and deserializing into objects to do a simple row count.
int count;
using (var db = new MyDatabase()){
string sql = "SELECT COUNT(*) FROM MyTable where FkId = {0}";
object[] myParams = {1};
var cntQuery = db.ExecuteStoreQuery<int>(sql, myParams);
count = cntQuery.First<int>();
}

I think this should work...
var query = from m in context.MyTable
where m.MyContainerId == '1' // or what ever the foreign key name is...
select m;
var count = query.Count();

Related

How to get total available rows from paginated ef core query

Thanks in advance for taking time to read this question.
I have a view in my database, lets call it Members_VW
In my .NET 5 API, I'm trying to get a paginated response for the list of members from the view with search parameters. I also need to return the total number of results so the front end knows how many pages the results will span.
Currently the Members_VW is made with a query like:
select
col1, col2, col3
from
table1 t1
inner join table2 t2 on t1.key = t2.key
inner join table3 t3 on t3.key = t2.key
where
defaultcondition1 = '1'
and
defaultcondition2 = '2'
I referred to this answer and tried using CTE which ended up changing my view to using a query like this:
with cte1 as (
select
col1, col2, col3
from
table1 t1
inner join table2 t2 on t1.key = t2.key
inner join table3 t3 on t3.key = t2.key
where
defaultcondition1 = '1'
and
defaultcondition2 = '2'),
cte2 as (
select count(*) over() as total_count from cte1 )
select
*
from
cte1, cte2
But this didn't work because it would always return the total number of rows in cte1 without any of the filters applied.
So, I continued to try to construct queries to return the total number of rows after the conditions are applied and found that this query works:
select
col1, col2, col3, count(*) over()
from
table1 t1
inner join table2 t2 on t1.key = t2.key
inner join table3 t3 on t3.key = t2.key
where
defaultcondition1 = '1'
and
defaultcondition2 = '2'
Currently, I'm trying to implement the same query with EF Core but am struggling to implement that.
I've tried implementing the solution provided here, but as one of the comments suggests, this implementation is no longer allowed.
I am trying to avoid an implementation where I use a raw query. Is there any way to get the result from count(*) over() without using a raw query?
The following is my current implementation:
IQueryable<MembersVW> membersQuery = _context.MembersVW;
membersQuery = membersQuery.Where(u => u.MemberId == memberid);
membersQuery = membersQuery.OrderBy(m => m.MemberId).Skip(page * size).Take(size);
When I do:
membersQuery = membersQuery.Count()
I'm returned with the following error:
Error CS0029 Cannot implicitly convert type 'int' to 'System.Linq.IQueryable<PersonalPolicyAPI.Models.VwPersonalPolicyMember>'
Again, thanks for reading my question, appreciate any help you can offer. 🙏🏾
I've read your question about whether it can be done with one query. While I'm not aware of any way to do it with one query, I can offer one more solution that addresses your concern about performance with two queries. I do this frequently. 😁 Try:
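//start both queries so they run concurrently instead of sequentially
//note: EF Core does not allow concurrent operations on a single DbContext instance,
//so the two queries need to come from separate context instances for this to work
var countqry = membersQuery.CountAsync();
var pageqry = membersQuery.OrderBy(m => m.MemberId).Skip(page * size).Take(size).ToListAsync();
//wait for them both to complete
Task.WaitAll(countqry, pageqry);
//use the results
var count = countqry.Result;
var pageItems = pageqry.Result;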
membersQuery.Count() returns an integer, not the queryable.
You can do
int count = membersQuery.Count();
List<MembersVW> members = membersQuery.OrderBy(m => m.MemberId).Skip(page * size).Take(size).ToList();
and you can return both with
public class MemberVwWithCount {
public int Count { get; set; }
public List<MembersVW> Members { get; set; }
}
You try to assign the Count Value, which is an Integer, to the variable of your query, which is an IQueryable. That's all there is to it.
If you want to do it in one single query, as you suggest in one of your comments, you can first execute the query to get all Entries, then count the result, and then filter the result with skip/take. This is most probably not the most efficient way to do this, but it should work.
I'd also suggest to use AsNoTracking() if you do not modify any data in this function/api.
EDIT:
I'd suggest this solution for now. The counting is fast, as it actually doesn't fetch any data and just counts the rows. It is still two queries tho, gonna try to combine it & edit my answer later.
var count = await yourContext.YourTable.CountAsync();
var data = await yourContext.YourTable
.OrderBy(x => x.YourProp)
.Skip(10).Take(10)
//.AsNoTracking()
.ToListAsync();
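The count-then-page pattern can be wrapped in a small helper so the total and the page always come from the same filtered query. This is only a sketch under assumptions: PagedResult and ToPagedResult are hypothetical names, pages are zero-based as in the question, and the demo uses an in-memory AsQueryable() source rather than an EF context (against EF it would still issue two queries, one COUNT and one page).

```csharp
// Minimal sketch (C# 9+ top-level program). PagedResult and ToPagedResult are hypothetical names.
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a filtered query such as the membersQuery in the question.
IQueryable<int> source = Enumerable.Range(1, 25).AsQueryable();

PagedResult<int> result = source.OrderBy(n => n).ToPagedResult(page: 2, size: 10);

// 25 rows at 10 per page gives 3 pages; page index 2 holds the last 5 rows.
Console.WriteLine($"{result.TotalCount} rows, {result.TotalPages} pages, {result.Items.Count} on this page");

public sealed record PagedResult<T>(int TotalCount, int TotalPages, List<T> Items);

public static class PagingExtensions
{
    // Counts the full (already filtered and ordered) query, then fetches one zero-based page.
    public static PagedResult<T> ToPagedResult<T>(this IOrderedQueryable<T> ordered, int page, int size)
    {
        if (page < 0) throw new ArgumentOutOfRangeException(nameof(page));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        int total = ordered.Count();                 // first query: total rows
        int totalPages = (total + size - 1) / size;  // ceiling division
        List<T> items = ordered.Skip(page * size).Take(size).ToList(); // second query: one page
        return new PagedResult<T>(total, totalPages, items);
    }
}
```

Requiring an IOrderedQueryable makes the caller choose the ordering explicitly, since Skip/Take without a stable order does not give repeatable pages.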
EDIT2:
Okay, so I couldn't get it down to just one DB call yet, but I could combine it syntactically. That said, the approach in my first edit is easier to read and does basically the same thing. Still gonna dig deeper into this; there's gotta be a funky way to do it.
var query = yourContext.YourTable.AsQueryable();
var result = await query.OrderBy(x => x.Prop)
.Select(x => new {Data = x, Count = query.Count()} )
.Skip(50).Take(50)
.AsNoTracking()
.ToListAsync();
var count = result.FirstOrDefault()?.Count ?? 0; //If empty/null return 0
var data = result.Select(x => x.Data).ToList();
In the line membersQuery = membersQuery.Count() you are assigning an integer value to a queryable, which is incorrect. Take the total count before applying Skip/Take (a count taken afterwards only covers the current page), i.e.
int totalCount = membersQuery.Count();
membersQuery = membersQuery.OrderBy(m => m.MemberId).Skip(page * size).Take(size);
To get the count column in the same list, you first need to add a count property to your MembersVW class and then use a LINQ projection to set its value.
Solution-1:
membersQuery = membersQuery.Select(p => new MembersVW
{
col1 = p.col1,
col2 = p.col2,
col3 = p.col3,
count = totalCount
});
Solution-2:
With a List<T>.ForEach loop over the materialized results, i.e.
var members = membersQuery.ToList();
members.ForEach(item =>
{
item.count = totalCount;
});

Linq equivalent of aggregate function on multiple tables in one database trip

I have a table function which returns table names and number of entries within that table :
CREATE FUNCTION [dbo].[ufnGetLookups] ()
RETURNS
@lookupsWithItemCounts TABLE
(
[Name] VARCHAR(100),
[EntryCount] INT
)
AS
BEGIN
INSERT INTO @lookupsWithItemCounts([Name],[EntryCount])
VALUES
('Table1', (SELECT COUNT(*) FROM Table1)),
('Table2', (SELECT COUNT(*) FROM Table2)),
('Table3', (SELECT COUNT(*) FROM Table3))
RETURN;
END
What would be the LINQ equivalent of the simple function above? Note that I want to get the result in one single shot, and the speed of the operation is quite important to me. If it turns out that the converted LINQ to SQL produces massive, bulky SQL with a performance hit, I would rather stick with my existing user-defined function and forget about the LINQ equivalent.
You can do that with a UNION query. EG
var q = db.Books.GroupBy(g => "Books").Select(g => new { Name = g.Key, EntryCount = g.Count() })
.Union(db.Authors.GroupBy(g => "Authors").Select(g => new { Name = g.Key, EntryCount = g.Count() }));
var r = q.ToList();
Not an EF guy, and not sure if this would be more performant.
Select TableName = o.name
,RowCnt = sum(p.Rows)
From sys.objects as o
Join sys.partitions as p on o.object_id = p.object_id
Where o.type = 'U'
and o.is_ms_shipped = 0x0
and index_id < 2 -- 0:Heap, 1:Clustered
--and o.name in ('Table1','Table2','Table3' ) -- Include (or not) your own filter
Group By o.schema_id,o.name
Note: Wish I could recall the source of this, but I've used it in my discovery process.

Entity Framework generates inefficient SQL for paged query

I have a simple paged linq query against one entity:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
data = data.OrderBy(t => t.Id);
if (page > 0)
{
data = data.Skip(rows * (page - 1)).Take(rows);
}
var l = data.ToList();
I expected it to generate SQL similar to:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).
However, the generated SQL is different. It contains a nested query as shown below:
SELECT TOP (50)
[Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId],
<redacted>,
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CampaignId] AS [CampaignId],
<redacted>
FROM [dbo].[Widgets] AS [Extent1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC
The Widgets table is enormous and the inner query returns 100000s of records, causing a timeout.
Is there anything I can do to change the generation? Anything I am doing wrong?
UPDATE
I finally managed to refactor my code to return the results relatively quickly:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t).AsEnumerable().Select((item, index) => new { Index = index, Item = item });
data = data.OrderBy(t => t.Index);
if (page > 0)
{
data = data.Where(t => t.Index >= (rows * (page - 1)));
}
data = data.Take(rows);
Note, the page > 0 logic is simply used to prevent an invalid parameter being used; it does no optimization. In fact page > 1 , while valid, does not provide any noticeable optimization for the 1st page; since the Where is not a slow operation.
Prior to SQL Server 2012, the generated SQL is the best way to perform paging. Yes, it is awful and very inefficient, but it is the best you can do, even when writing your own SQL script by hand. There is plenty of material about this on the net. Just google it.
On the first page, this can be optimized by not doing Skip and just doing Take, but on any other page you are f***** up.
A workaround could be to generate your own row_number in persistence (an auto-identity could work) and just do where(widget.number > (page*rows)).Take(rows) in code. If there is a good index on widget.number, the query should be very fast. But this breaks the dynamic orderBy.
However, I can see in your code that you are ordering by widget.id always; so, if dynamic orderBy is not essential, this could be a valid workaround.
Will you take your own medicine?
you could ask me.
No, I will not. The best way to deal with this is to have a persistence read-model in which you can even have one table per widget orderBy field, each with its own widget.number. The problem is that modeling a system with a persistence read-model just for this issue is too crazy. Having a read-model is part of the overall design of your system and has to be taken into account from the very beginning of its design and development.
The generated query is so complex and nested because you used the Skip method. In T-SQL, Take is easily achieved with just TOP, but that is not the case with Skip: to apply it you need row_number, and that is why there is a nested query. The inner query returns rows with row_number and the outer one filters them to get the proper set of rows. Your query:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
lacks the skipping of initial rows. To keep the query efficient, it would be best to page by a condition on Id instead of using Take and Skip, because Id is the field you order by for paging:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
var pageOfData = data
.OrderBy(t => t.Id)
.Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page)
.ToList();
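The Id-range filter above only yields full pages when the matching Ids are contiguous. A common variation, named here plainly as keyset (seek) paging and not what the answer itself proposes, is to remember the last Id of the previous page and ask for the next rows after it. The sketch below is in-memory only; Widget, NextPage and the sample Ids are hypothetical, and AsQueryable() stands in for the EF set.

```csharp
// Minimal sketch (C# 9+ top-level program) of keyset paging over Ids with gaps.
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: the Ids that survived the Campaign/CalendarEvent filter.
IQueryable<Widget> widgets = new[] { 3, 7, 8, 15, 21, 22, 40 }
    .Select(id => new Widget(id))
    .AsQueryable();

const int rows = 3;
int lastSeenId = 0; // nothing seen yet

while (true)
{
    List<Widget> pageItems = NextPage(widgets, lastSeenId, rows);
    if (pageItems.Count == 0) break;

    // Prints "3, 7, 8", then "15, 21, 22", then "40".
    Console.WriteLine(string.Join(", ", pageItems.Select(w => w.Id)));

    lastSeenId = pageItems[^1].Id; // the next page starts after this Id
}

// Seek past the last Id already returned, then take one page in Id order.
static List<Widget> NextPage(IQueryable<Widget> source, int lastSeenId, int rows) =>
    source.Where(w => w.Id > lastSeenId)
          .OrderBy(w => w.Id)
          .Take(rows)
          .ToList();

public sealed record Widget(int Id);
```

The trade-off is the same one mentioned for the row-number workaround: it ties paging to a single sort key, and the caller can only move to the next page, not jump to an arbitrary page number.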
AFAIK you cannot change query generated by Entity. Although you can force entity to run raw SQL query:
https://msdn.microsoft.com/en-us/data/jj592907.aspx
You can also use stored procedures:
https://msdn.microsoft.com/en-us/data/gg699321.aspx
Even if there's a chance to change the generated query, IMO it would be spitting into the wind. I bet it will be easier to write the SQL query on your own.

Get total row count in Entity Framework

I'm using Entity Framework to get the total row count for a table. I simply want the row count, no where clause or anything like that. The following query works, but is slow. It took about 7 seconds to return the count of 4475.
My guess here is that it's iterating through the entire table, just like how IEnumerable.Count() extension method works.
Is there a way I can get the total row count "quickly"? is there a better way?
public int GetLogCount()
{
using (var context = new my_db_entities(connection_string))
{
return context.Logs.Count();
}
}
You can even run a raw SQL query using Entity Framework, as below:
var sql = "SELECT COUNT(*) FROM dbo.Logs";
var total = context.Database.SqlQuery<int>(sql).Single();
That is the way to get your row count using Entity Framework. You will probably see faster performance on the second+ queries as there is an initialization cost the first time that you run it. (And it should be generating a Select Count() query here, not iterating through each row).
If you are interested in a faster way to get the raw row count in a table, then you might want to try using a mini ORM like Dapper or OrmLite.
You should also make sure that your table is properly defined (at the very least, that it has a Primary Key), as failure to do this can also affect the time to count rows in the table.
If you have access to do so, it would be much quicker to query the sys tables to pull this information.
E.g.
public Int64 GetLogCount()
{
var tableNameParam = new SqlParameter("TableName", "Logs");
var schemaNameParam = new SqlParameter("SchemaName", "dbo");
using (var context = new my_db_entities(connection_string))
{
var query = @"
SELECT ISNULL([RowCount],0)
FROM (
SELECT SchemaName,
TableName,
Sum(I.rowcnt) [RowCount]
FROM sysindexes I
JOIN sysobjects O (nolock) ON I.id = o.id AND o.type = 'U'
JOIN (
SELECT so.object_id,
ss.name as SchemaName,
so.name as TableName
FROM sys.objects SO (nolock)
JOIN sys.schemas SS (nolock) ON ss.schema_id = so.schema_id
) SN
ON SN.object_id = o.id
WHERE I.indid IN ( 0, 1 )
AND TableName = @TableName AND SchemaName = @SchemaName
GROUP BY
SchemaName, TableName
) A
";
return context.ExecuteStoreQuery<Int64>(query, tableNameParam, schemaNameParam).First();
}
}

Linq to SQL one to many relationships

For the last couple of days I have been struggling with the performance of a LINQ query:
LinqConnectionDataContext context = new LinqConnectionDataContext();
System.Data.Linq.DataLoadOptions options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Question>(x => x.Answers);
options.LoadWith<Question>(x => x.QuestionVotes);
options.LoadWith<Answer>(x => x.AnswerVotes);
context.LoadOptions = options;
var query =( from c in context.Questions
where c.UidUser == userGuid
&& c.Answers.Any() == true
select new
{
c.Uid,
c.Content,
c.UidUser,
QuestionVote = from qv in c.QuestionVotes where qv.UidQuestion == c.Uid && qv.UidUser == userGuid select new {qv.UidQuestion, qv.UidUser },
Answer = from d in c.Answers
where d.UidQuestion == c.Uid
select new
{
d.Uid,
d.UidUser,
d.Conetent,
AnswerVote = from av in d.AnswerVotes where av.UidAnswer == d.Uid && av.UidUser == userGuid select new { av.UidAnswer, av.UidUser }
}
}).ToList();
The query has to run through 5000 rows, and it takes up to 1 minute. How can I improve its performance?
Update:
something to get you started.
CREATE PROCEDURE GetQuestionsAndAnswers
(
@UserGuid VARCHAR(100)
)
AS
BEGIN
SELECT
c.Uid,
c.Content,
c.UidUser,
qv.UidQuestion,
qv.UidUser,
av.UidAnswer,
av.UidUser,
av.Content,
d.Uid,
d.UidUser,
d.Content
FROM Question c
INNER JOIN QuestionVotes qv ON qv.UidQuestion = c.Uid AND qv.UidUser = @UserGuid
INNER JOIN Answers d ON d.UidQuestion = c.Uid
INNER JOIN AnswerVotes av ON av.UidAnswer = d.Uid AND av.UidUser = @UserGuid
WHERE c.UidUser = @UserGuid
END
You will already have clustered indexes on primary columns by default (just confirm this on your database side), and you would want non-clustered indexes on QuestionVote - UidUser column, AnswerVote - UidUser column, and Answer - UidQuestion column.
Also have a look here. You might want to use .AsQueryable() instead of ToList() for deferred execution
Do you need ToList()?
Check the generated SQL using sql-debug-visualizer, then copy it, run it from a SQL client, and see how long it takes. If it takes close to 1 minute, you need to improve performance at the DB level by adding indexes and/or a stored procedure, creating views, etc.
If that does not take much time, you can always create a stored procedure and call it using LINQ to SQL.
One more recommendation is to switch to Entity Framework if you can, because it is the future.
