Linq equivalent of aggregate function on multiple tables in one database trip - c#

I have a table-valued function that returns table names and the number of entries in each table:
CREATE FUNCTION [dbo].[ufnGetLookups] ()
RETURNS
@lookupsWithItemCounts TABLE
(
[Name] VARCHAR(100),
[EntryCount] INT
)
AS
BEGIN
INSERT INTO @lookupsWithItemCounts([Name],[EntryCount])
VALUES
('Table1', (SELECT COUNT(*) FROM Table1)),
('Table2', (SELECT COUNT(*) FROM Table2)),
('Table3', (SELECT COUNT(*) FROM Table3))
RETURN;
END
What would be the LINQ equivalent of the simple function above? Note that I want to get the result in a single round trip, and the speed of the operation is quite important to me. If the converted LINQ to SQL query turns out to generate bulky SQL with a performance hit, I would rather stick with my existing user-defined function and forget about the LINQ equivalent.

You can do that with a UNION query, e.g.:
var q = db.Books.GroupBy(g => "Books").Select(g => new { Name = g.Key, EntryCount = g.Count() })
.Union(db.Authors.GroupBy(g => "Authors").Select(g => new { Name = g.Key, EntryCount = g.Count() }));
var r = q.ToList();

Not an EF guy, and not sure if this would be more performant.
Select TableName = o.name
,RowCnt = sum(p.Rows)
From sys.objects as o
Join sys.partitions as p on o.object_id = p.object_id
Where o.type = 'U'
and o.is_ms_shipped = 0x0
and index_id < 2 -- 0:Heap, 1:Clustered
--and o.name in ('Table1','Table2','Table3' ) -- Include (or not) your own filter
Group By o.schema_id,o.name
Note: Wish I could recall the source of this, but I've used it in my discovery process.

Related

SQL Unions with table counts using EntityFramework LINQ query

I am trying to replicate the SQL below using LINQ and Entity Framework and cannot figure out how it should be written.
My simplistic LINQ version runs one query per table:
public IActionResult Index()
{
dynamic view = new ExpandoObject();
view.AppUsers = Context.AppUsers.Count();
view.CustomerShops = Context.CustomerShops.Count();
view.FavouriteOrders = Context.FavouriteOrders.Count();
view.Items = Context.Items.Count();
view.ItemVariations = Context.ItemVariations.Count();
view.MenuCategories = Context.MenuCategories.Count();
view.MenuCategoryProducts = Context.MenuCategoryProducts.Count();
view.Orders = Context.Orders.Count();
view.Products = Context.Products.Count();
view.ProductVariations = Context.ProductVariations.Count();
view.Shops = Context.Shops.Count();
view.Staffs = Context.Staffs.Count();
return View(view);
}
I use this pattern from time to time for reporting on my row counts and thought it should be easy to do in LINQ, but no luck so far.
This pure SQL UNION generates only one SQL request, instead of one request per table.
select * from (
select 'asp_net_roles' as type, count(*) from asp_net_roles
union
select 'asp_net_user_roles' as type, count(*) from asp_net_user_roles
union
select 'asp_net_users' as type, count(*) from asp_net_users
union
select 'app_users' as type, count(*) from app_users
union
select 'shops' as type, count(*) from shops
union
select 'staffs' as type, count(*) from staffs
union
select 'items' as type, count(*) from items
union
select 'item_variations' as type, count(*) from item_variations
union
select 'products' as type, count(*) from products
union
select 'product_variations' as type, count(*) from product_variations
union
select 'menu_categories' as type, count(*) from menu_categories
) as counters
order by 1;
I saw a partial implementation, [linq-group-by-multiple-tables](https://stackoverflow.com/a/3435503/473923), but it is based on grouping data.
FYI: I'm new to C#/LINQ, so sorry if this seems obvious.
Use the code from my answer
and fill an ExpandoObject with the result:
var tablesinfo = Context.GetTablesInfo();
var expando = new ExpandoObject();
if (tablesinfo != null)
{
var dic = (IDictionary<string, object>)expando;
foreach(var info in tablesinfo)
{
dic.Add(info.TableName, info.RecordCount);
}
}
The idea is that you can UNION the counts if you group each entity set by a constant.
Schematically, the function builds the following IQueryable expression:
var tablesinfo =
Context.AppUsers.GroupBy(x => 1).Select(g => new TableInfo{ TableName = "app_users", RecordCount = g.Count() })
.Concat(Context.MenuCategories.GroupBy(x => 1).Select(g => new TableInfo{ TableName = "menu_categories", RecordCount = g.Count() }))
.Concat(Context.Items.GroupBy(x => 1).Select(g => new TableInfo{ TableName = "items", RecordCount = g.Count() }))
....
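
The group-by-constant idea can be tried without a database. The following is a minimal sketch that uses in-memory collections as stand-ins for the entity sets; the data and the names books, authors and reviews are made up for illustration, and it assumes C# 9 or later for top-level statements. It is not the GetTablesInfo function from the answer, only the shape of the expression it is described as building.

```csharp
using System;
using System.Linq;

// In-memory stand-ins for three tables (hypothetical data; with EF these would be DbSets).
var books = new[] { "Dune", "Emma", "Ulysses" }.AsQueryable();
var authors = new[] { "Herbert", "Austen" }.AsQueryable();
var reviews = Array.Empty<string>().AsQueryable(); // an empty "table"

// Group each source by a constant so Count() becomes a per-source aggregate,
// then concatenate the one-row results into a single sequence.
var counts = books.GroupBy(_ => 1).Select(g => new { Name = "Books", EntryCount = g.Count() })
    .Concat(authors.GroupBy(_ => 1).Select(g => new { Name = "Authors", EntryCount = g.Count() }))
    .Concat(reviews.GroupBy(_ => 1).Select(g => new { Name = "Reviews", EntryCount = g.Count() }))
    .ToList();

foreach (var c in counts)
    Console.WriteLine($"{c.Name}: {c.EntryCount}");
```

One thing this sketch makes visible: an empty source produces no group, so "Reviews" is missing from the result rather than reported with a count of 0. A SQL GROUP BY over an empty table also returns no rows, so the translated query would be expected to behave the same way; if a zero row matters, it has to be filled in afterwards.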
There is nothing wrong with your LINQ query. It's a perfectly acceptable approach, but it's not the most efficient one.
There is no need to fetch the count from each table one by one. You can get the counts for all tables at once from the system views sys.objects and sys.partitions. Try running this query in your database:
SELECT A.Name AS TableName, SUM(B.rows) AS RecordCount
FROM sys.objects A INNER JOIN sys.partitions B
ON A.object_id = B.object_id
WHERE A.type = 'U' AND B.index_id IN (0, 1)
GROUP BY A.Name
For a quick response and cleaner code, you can store this SQL query in a string variable and run it through the data context:
var result = dataContext.ExecuteQuery<YOUR_MODEL_CLASS>
(your_string_query);
I would write something like this:
Dictionary<string, int> view = new() {
{ "app_users", Context.AppUsers.Count() },
// ...
};
return View(view);
Maybe not the purest way, but it does the job (unless I misunderstood what you are trying to accomplish).

How do I delete records before a specified date in Entity Framework Core excluding the latest?

How can I delete all but the latest stock records that were created before a specific date in Entity Framework Core? I am unable to figure out the required LINQ query, but I have managed to put together SQL that should do the job:
--
-- Parameters.
--
DECLARE @PurgeDate DATETIME = DATEADD(day, -7, GETDATE());
DECLARE @RegionId INT = 28;
DECLARE @StockCodes TABLE(
StockCode NVARCHAR(10)
);
-- Could be a significant number
INSERT INTO @StockCodes VALUES ('ABC123'), ('DEF123') /* etc... */;
--
-- Get stock records that are newer than the purge date or the latest record if not.
-- This ensures there is always at least one stock record for a stock code.
--
WITH LatestStockRecords
AS
(
SELECT s.*, [RowNumber] = ROW_NUMBER() OVER (PARTITION BY s.[StockCode] ORDER BY s.[CreatedAt] DESC)
FROM StockRecords AS s
INNER JOIN Locations AS l
ON s.[LocationId] = l.[Id]
WHERE l.[RegionId] = @RegionId
AND s.[StockCode] IN (SELECT * FROM @StockCodes)
)
SELECT [Id]
INTO #_STOCK_RECORD_IDS
FROM LatestStockRecords
WHERE [CreatedAt] >= @PurgeDate
OR [RowNumber] = 1;
--
-- Delete the stock records that do not appear in the latest stock records temporary table.
--
DELETE s
FROM StockRecords AS s
INNER JOIN Locations AS l
ON s.[LocationId] = l.[Id]
WHERE l.[RegionId] = @RegionId
AND s.[StockCode] IN (SELECT * FROM @StockCodes)
AND s.[Id] NOT IN (SELECT * FROM #_STOCK_RECORD_IDS);
There could be a significant number of records to delete so performance needs to be a consideration.
EDIT: Removed DbContext and entities as I don't think they're relevant to the question.
This is how I eventually solved the problem. I had to force client-side evaluation of the grouping query, as Entity Framework Core doesn't seem to support the necessary query at this point.
var regionId = 28;
var stockCodes = new string[] { "ABC123", "DEF123" /* etc... */ };
var purgeDate = DateTime.UtcNow.AddDays(-NumberOfDaysToPurge);
bool IsPurgeable(StockRecord stockRecord)
{
return stockRecord.CreatedAt >= purgeDate;
}
var latestStockRecordIds = context.StockRecords
.Where(stockRecord =>
stockRecord.Location.RegionId == regionId
&& stockCodes.Contains(stockRecord.StockCode))
.AsEnumerable() // <-- force execution
.GroupBy(stockRecord => stockRecord.StockCode)
.SelectMany(group =>
{
var orderedStockRecords = group.OrderByDescending(stockRecord => stockRecord.CreatedAt);
var stockRecords = orderedStockRecords.Count(IsPurgeable) > 0
? orderedStockRecords.Where(IsPurgeable)
: orderedStockRecords.Take(1);
return stockRecords.Select(stockRecord => stockRecord.Id);
});
var stockRecordsToRemove = await context.StockRecords
.Where(stockRecord =>
stockRecord.Location.RegionId == regionId
&& stockCodes.Contains(stockRecord.StockCode)
&& stockRecord.CreatedAt <= purgeDate
&& !latestStockRecordIds.Contains(stockRecord.Id))
.ToListAsync();
context.ChangeTracker.AutoDetectChangesEnabled = false;
context.StockRecords.RemoveRange(stockRecordsToRemove);
await context.SaveChangesAsync();
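
The keep-or-delete rule itself (keep everything on or after the purge date, plus the newest record per stock code) can be checked in isolation. Below is a minimal in-memory sketch of that rule only; the sample records, dates and ids are invented for illustration, no Entity Framework is involved, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the StockRecords table.
var purgeDate = new DateTime(2024, 1, 8);
var records = new[]
{
    (Id: 1, StockCode: "ABC123", CreatedAt: new DateTime(2024, 1, 1)),
    (Id: 2, StockCode: "ABC123", CreatedAt: new DateTime(2024, 1, 5)),
    (Id: 3, StockCode: "ABC123", CreatedAt: new DateTime(2024, 1, 10)),
    (Id: 4, StockCode: "DEF123", CreatedAt: new DateTime(2024, 1, 2)),
    (Id: 5, StockCode: "DEF123", CreatedAt: new DateTime(2024, 1, 3)),
};

// Ids to keep: the newest record of each stock code (index 0 after sorting),
// plus every record created on or after the purge date.
var keepIds = records
    .GroupBy(r => r.StockCode)
    .SelectMany(g => g
        .OrderByDescending(r => r.CreatedAt)
        .Where((r, index) => index == 0 || r.CreatedAt >= purgeDate))
    .Select(r => r.Id)
    .ToHashSet();

// Everything else is a candidate for deletion.
var deleteIds = records
    .Where(r => !keepIds.Contains(r.Id))
    .Select(r => r.Id)
    .ToList();

Console.WriteLine(string.Join(", ", deleteIds)); // prints "1, 2, 4"
```

The indexed Where overload plays the role of ROW_NUMBER() = 1 in the SQL version. DEF123 has no record newer than the purge date, yet its latest record (id 5) survives, which is the guarantee the SQL comments describe.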

Linq Queries with multiple group by and Scalar valued function

I am trying to write a LINQ query, having multiple group by and scalar valued functions using Entity Framework.
This is a sample query with simpler names:
var test = context.<db_view>.Where(predicate)
.GroupBy(x => new {x.col1, x.col2, x.col3})
.Select(y => new
{
a = y.Key.col1,
b = y.Key.col2,
c = y.Key.col3,
d = ctx.ScalarFunction(y.Key.col2)
});
I however get an error:
"Column Distinct1.col1 is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I do have col1 in the GROUP BY Clause. Am I missing something here?
This is the SQL Query generated by Entity Framework:
SELECT
1 AS [C1],
[Distinct1].[col1] AS [col1],
[Distinct1].[col2] AS [col2],
[Distinct1].[col3] AS [col3],
[dbo].[scalarfunction]([Distinct1].[col2]) AS [C2],
FROM ( SELECT DISTINCT
[Extent1].[col1] AS [col1],
[Extent1].[col2] AS [col2],
[Extent1].[col3] AS [col3],
FROM (SELECT
[view].[col1] AS [col1],
[view].[col2] AS [col2],
[view].[col3] AS [col3],
[view].[col4] AS [col4],
[view].[col5] AS [col5],
[view].[col6] AS [col6]
FROM [dbo].[view] AS [view]) AS [Extent1]
WHERE (predicate
) AS [Distinct1]
Why use GroupBy? You don't need groups. You just need the distinct (col1, col2, col3) tuples.
So use the Distinct operator.
Try this:
var test = context.<db_view>.Where(predicate)
.Select(x => new {x.col1, x.col2, x.col3})
.Distinct()
.Select(y => new
{
a = y.col1,
b = y.col2,
c = y.col3,
d = ctx.ScalarFunction(y.col2)
});
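
To see that Select followed by Distinct yields the same tuples a GroupBy on those columns would, here is a small in-memory sketch. The rows and the Scalar function are hypothetical placeholders (Scalar stands in for the database scalar function), and this runs as LINQ to Objects, so it says nothing about the SQL that Entity Framework generates. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the view; col4 varies so duplicates exist on (col1, col2, col3).
var rows = new[]
{
    (col1: "a", col2: 1, col3: "x", col4: 10),
    (col1: "a", col2: 1, col3: "x", col4: 20),
    (col1: "b", col2: 2, col3: "y", col4: 30),
};

// Project to the three columns, de-duplicate, then apply the function once per distinct tuple.
// Anonymous types compare by value, so Distinct() removes the repeated ("a", 1, "x") tuple.
var distinct = rows
    .Select(r => new { r.col1, r.col2, r.col3 })
    .Distinct()
    .Select(t => new { a = t.col1, b = t.col2, c = t.col3, d = Scalar(t.col2) })
    .ToList();

foreach (var t in distinct)
    Console.WriteLine($"{t.a} {t.b} {t.c} {t.d}");

// Placeholder for the scalar-valued database function (an assumption for this sketch).
static int Scalar(int value) => value * 100;
```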

Get total row count in Entity Framework

I'm using Entity Framework to get the total row count for a table. I simply want the row count, no where clause or anything like that. The following query works, but is slow. It took about 7 seconds to return the count of 4475.
My guess is that it's iterating through the entire table, just like the IEnumerable.Count() extension method does.
Is there a way I can get the total row count "quickly"? Is there a better way?
public int GetLogCount()
{
using (var context = new my_db_entities(connection_string))
{
return context.Logs.Count();
}
}
You can also run a raw SQL query through Entity Framework, as below:
var sql = "SELECT COUNT(*) FROM dbo.Logs";
var total = context.Database.SqlQuery<int>(sql).Single();
That is the way to get your row count using Entity Framework. You will probably see faster performance on the second+ queries as there is an initialization cost the first time that you run it. (And it should be generating a Select Count() query here, not iterating through each row).
If you are interested in a faster way to get the raw row count in a table, then you might want to try using a mini ORM like Dapper or OrmLite.
You should also make sure that your table is properly defined (at the very least, that it has a Primary Key), as failure to do this can also affect the time to count rows in the table.
If you have access to do so, it would be much quicker to query the sys tables to pull this information.
E.g.
public Int64 GetLogCount()
{
var tableNameParam = new SqlParameter("TableName", "Logs");
var schemaNameParam = new SqlParameter("SchemaName", "dbo");
using (var context = new my_db_entities(connection_string))
{
var query = @"
SELECT ISNULL([RowCount],0)
FROM (
SELECT SchemaName,
TableName,
Sum(I.rowcnt) [RowCount]
FROM sysindexes I
JOIN sysobjects O (nolock) ON I.id = o.id AND o.type = 'U'
JOIN (
SELECT so.object_id,
ss.name as SchemaName,
so.name as TableName
FROM sys.objects SO (nolock)
JOIN sys.schemas SS (nolock) ON ss.schema_id = so.schema_id
) SN
ON SN.object_id = o.id
WHERE I.indid IN ( 0, 1 )
AND TableName = @TableName AND SchemaName = @SchemaName
GROUP BY
SchemaName, TableName
) A
";
return context.ExecuteStoreQuery<Int64>(query, tableNameParam, schemaNameParam).First();
}
}

How to COUNT rows within EntityFramework without loading contents?

I'm trying to determine how to count the matching rows on a table using the EntityFramework.
The problem is that each row might have many megabytes of data (in a Binary field). Of course the SQL would be something like this:
SELECT COUNT(*) FROM [MyTable] WHERE [fkID] = '1';
I could load all of the rows and then find the Count with:
var owner = context.MyContainer.Where(t => t.ID == '1');
owner.MyTable.Load();
var count = owner.MyTable.Count();
But that is grossly inefficient. Is there a simpler way?
EDIT: Thanks, all. I've moved the DB from a private attached so I can run profiling; this helps but causes confusions I didn't expect.
And my real data is a bit deeper, I'll use Trucks carrying Pallets of Cases of Items -- and I don't want the Truck to leave unless there is at least one Item in it.
My attempts are shown below. The part I don't get is that CASE_2 never accesses the DB server (MSSQL).
var truck = context.Truck.FirstOrDefault(t => (t.ID == truckID));
if (truck == null)
return "Invalid Truck ID: " + truckID;
var dlist = from t in ve.Truck
where t.ID == truckID
select t.Driver;
if (dlist.Count() == 0)
return "No Driver for this Truck";
var plist = from t in ve.Truck where t.ID == truckID
from r in t.Pallet select r;
if (plist.Count() == 0)
return "No Pallets are in this Truck";
#if CASE_1
/// This works fine (using 'plist'):
var list1 = from r in plist
from c in r.Case
from i in c.Item
select i;
if (list1.Count() == 0)
return "No Items are in the Truck";
#endif
#if CASE_2
/// This never executes any SQL on the server.
var list2 = from r in truck.Pallet
from c in r.Case
from i in c.Item
select i;
bool ok = (list2.Count() > 0);
if (!ok)
return "No Items are in the Truck";
#endif
#if CASE_3
/// Forced loading also works, as stated in the OP...
bool ok = false;
foreach (var pallet in truck.Pallet) {
pallet.Case.Load();
foreach (var kase in pallet.Case) {
kase.Item.Load();
var item = kase.Item.FirstOrDefault();
if (item != null) {
ok = true;
break;
}
}
if (ok) break;
}
if (!ok)
return "No Items are in the Truck";
#endif
And the SQL resulting from CASE_1 is piped through sp_executesql, but:
SELECT [Project1].[C1] AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(cast(1 as bit)) AS [A1]
FROM [dbo].[PalletTruckMap] AS [Extent1]
INNER JOIN [dbo].[PalletCaseMap] AS [Extent2] ON [Extent1].[PalletID] = [Extent2].[PalletID]
INNER JOIN [dbo].[Item] AS [Extent3] ON [Extent2].[CaseID] = [Extent3].[CaseID]
WHERE [Extent1].[TruckID] = '....'
) AS [GroupBy1] ) AS [Project1] ON 1 = 1
[I don't really have Trucks, Drivers, Pallets, Cases or Items; as you can see from the SQL the Truck-Pallet and Pallet-Case relationships are many-to-many -- although I don't think that matters. My real objects are intangibles and harder to describe, so I changed the names.]
Query syntax:
var count = (from o in context.MyContainer
where o.ID == '1'
from t in o.MyTable
select t).Count();
Method syntax:
var count = context.MyContainer
.Where(o => o.ID == '1')
.SelectMany(o => o.MyTable)
.Count();
Both generate the same SQL query.
I think you want something like
var count = context.MyTable.Count(t => t.MyContainer.ID == '1');
(edited to reflect comments)
As I understand it, the selected answer still loads all of the related tests. According to this msdn blog, there is a better way.
http://blogs.msdn.com/b/adonet/archive/2011/01/31/using-dbcontext-in-ef-feature-ctp5-part-6-loading-related-entities.aspx
Specifically
using (var context = new UnicornsContext())
{
var princess = context.Princesses.Find(1);
// Count how many unicorns the princess owns
var unicornHaul = context.Entry(princess)
.Collection(p => p.Unicorns)
.Query()
.Count();
}
This is my code:
IQueryable<AuctionRecord> records = db.AuctionRecord;
var count = records.Count();
Make sure the variable is declared as IQueryable; then, when you call the Count() method, EF will execute something like
select count(*) from ...
Otherwise, if records is declared as IEnumerable, the generated SQL will query the entire table and the returned rows will be counted in memory.
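
Why the declared type matters can be shown without a database: Count() is an extension method, so the compiler picks Queryable.Count or Enumerable.Count from the static type of the variable. The sketch below inspects which method gets bound; it uses an in-memory array as a stand-in for db.AuctionRecord and assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// The same object, viewed through two different static types.
IQueryable<int> asQueryable = new[] { 1, 2, 3 }.AsQueryable();
IEnumerable<int> asEnumerable = asQueryable;

// Capture each call as an expression tree so the bound method can be inspected.
Expression<Func<int>> viaQueryable = () => asQueryable.Count();
Expression<Func<int>> viaEnumerable = () => asEnumerable.Count();

var queryableOwner = ((MethodCallExpression)viaQueryable.Body).Method.DeclaringType;
var enumerableOwner = ((MethodCallExpression)viaEnumerable.Body).Method.DeclaringType;

Console.WriteLine(queryableOwner);  // System.Linq.Queryable
Console.WriteLine(enumerableOwner); // System.Linq.Enumerable
```

Queryable.Count hands the call to the query provider, which is what lets a provider such as EF turn it into a COUNT query; Enumerable.Count falls back to enumerating the sequence when the source is not an in-memory collection.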
Well, even SELECT COUNT(*) FROM Table will be fairly inefficient, especially on large tables, since SQL Server has to scan the whole table (or an entire index) to produce the count.
Sometimes, it's good enough to know an approximate number of rows from the database, and in such a case, a statement like this might suffice:
SELECT
SUM(used_page_count) * 8 AS SizeKB,
SUM(row_count) AS [RowCount],
OBJECT_NAME(OBJECT_ID) AS TableName
FROM
sys.dm_db_partition_stats
WHERE
OBJECT_ID = OBJECT_ID('YourTableNameHere')
AND (index_id = 0 OR index_id = 1)
GROUP BY
OBJECT_ID
This will inspect the dynamic management view and extract the number of rows and the table size from it, given a specific table. It does so by summing up the entries for the heap (index_id = 0) or the clustered index (index_id = 1).
It's quick and easy to use, but it's not guaranteed to be 100% accurate or up to date. In many cases, though, this is "good enough" (and it puts much less burden on the server).
Maybe that would work for you, too? Of course, to use it in EF, you'd have to wrap this up in a stored proc or use a straight "Execute SQL query" call.
Marc
Use the ExecuteStoreQuery method of the entity context. This avoids downloading the entire result set and deserializing into objects to do a simple row count.
int count;
using (var db = new MyDatabase()){
string sql = "SELECT COUNT(*) FROM MyTable where FkId = {0}";
object[] myParams = {1};
var cntQuery = db.ExecuteStoreQuery<int>(sql, myParams);
count = cntQuery.First<int>();
}
I think this should work...
var query = from m in context.MyTable
where m.MyContainerId == '1' // or what ever the foreign key name is...
select m;
var count = query.Count();
