Basic misunderstanding of LINQ to SQL and foreign keys

I am working on a much larger project, and can't seem to get LINQ to SQL working the way I expect it to. I created a simple subset of the project so I can use LinqPad to try to make sure I have a basic understanding of how this should work.
Clearly, I don't. I've created two very simple tables, customer and job. The customer table has an ID (auto-increment) and a Name; the job table has an ID (auto-increment), a Name, and a CustomerID (foreign key to ID in the customer table).
When I run the following code against an initially empty database:
void Main()
{
    string custName = "James";
    string[] jobNames = new string[] { "Home Depot", "Menards", "Sam's Club" };
    var cust = customer.FirstOrDefault(c => c.Name == custName);
    if (cust == null)
    {
        cust = new customer
        {
            Name = custName
        };
        customer.InsertOnSubmit(cust);
    }
    foreach (var jn in jobNames)
    {
        if (!job.Any(j => j.Customer.Name == cust.Name && j.Name == jn))
            job.InsertOnSubmit(new job {
                Name = jn,
                Customer = cust
            });
    }
    SubmitChanges();
    customer.Dump();
    job.Dump();
}
I would expect to end up with 1 customer and 3 jobs in the database - that's all good. But the generated SQL and the setting of the customer IDs are not at all what I expect:
SQL --
SELECT t0.ID, t0.Name
FROM customer AS t0
WHERE (t0.Name = @p0)
LIMIT 0, 1
-- p0 = [James]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
    ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Home Depot]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
    ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Menards]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
    ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Sam's Club]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Home Depot]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Menards]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Sam's Club]
INSERT INTO customer(ID, Name)
VALUES (0, @p0)
-- p0 = [James]
SELECT t0.ID, t0.Name
FROM customer AS t0
SELECT t0.CustomerID, t0.ID, t0.Name
FROM job AS t0
I thought that the beauty of LINQ to SQL was that I don't have to set the foreign keys myself, and that I should be able to do what I'm trying to do here with a single hit to the database. What am I missing?
EDIT: So I guess I understand the customer IDs being set to NULL because the generated SQL is calling INSERT on the jobs before the INSERT on the customer, hence there is no ID yet. Why would it do that? Also, if I run the same query again, I get three more rows in the jobs table, but the CustomerIDs are all still set to NULL.

The LINQ query has no issues.
Check the relationships between the DB tables (the foreign key from job.CustomerID to customer.ID).

Related

How do I delete records before a specified date in Entity Framework Core excluding the latest?

How can I delete all but the latest stock records that were created before a specific date in Entity Framework Core? I have been unable to figure out the required LINQ query, but I have managed to put together SQL that should do the job:
--
-- Parameters.
--
DECLARE @PurgeDate DATETIME = DATEADD(day, -7, GETDATE());
DECLARE @RegionId INT = 28;
DECLARE @StockCodes TABLE(
    StockCode NVARCHAR(10)
);
-- Could be a significant number
INSERT INTO @StockCodes VALUES ('ABC123'), ('DEF123') /* etc... */;
--
-- Get stock records that are newer than the purge date or the latest record if not.
-- This ensures there is always at least one stock record for a stock code.
--
WITH LatestStockRecords
AS
(
    SELECT s.*, [RowNumber] = ROW_NUMBER() OVER (PARTITION BY s.[StockCode] ORDER BY s.[CreatedAt] DESC)
    FROM StockRecords AS s
    INNER JOIN Locations AS l
        ON s.[LocationId] = l.[Id]
    WHERE l.[RegionId] = @RegionId
        AND s.[StockCode] IN (SELECT * FROM @StockCodes)
)
SELECT [Id]
INTO #_STOCK_RECORD_IDS
FROM LatestStockRecords
WHERE [CreatedAt] >= @PurgeDate
    OR [RowNumber] = 1;
--
-- Delete the stock records that do not appear in the latest stock records temporary table.
--
DELETE s
FROM StockRecords AS s
INNER JOIN Locations AS l
    ON s.[LocationId] = l.[Id]
WHERE l.[RegionId] = @RegionId
    AND s.[StockCode] IN (SELECT * FROM @StockCodes)
    AND s.[Id] NOT IN (SELECT * FROM #_STOCK_RECORD_IDS);
There could be a significant number of records to delete so performance needs to be a consideration.
EDIT: Removed DbContext and entities as I don't think they're relevant to the question.

This is how I eventually solved the problem. I had to force client-side evaluation of the grouping query, as Entity Framework Core doesn't seem to support the necessary query at this point.
var regionId = 28;
var stockCodes = new string[] { "ABC123", "DEF123" /* etc... */ };
var purgeDate = DateTime.UtcNow.AddDays(-NumberOfDaysToPurge);
// Despite its name, this returns true for records created on or after
// the purge date, i.e. the records that must be kept.
bool IsPurgeable(StockRecord stockRecord)
{
    return stockRecord.CreatedAt >= purgeDate;
}
var latestStockRecordIds = context.StockRecords
    .Where(stockRecord =>
        stockRecord.Location.RegionId == regionId
        && stockCodes.Contains(stockRecord.StockCode))
    .AsEnumerable() // <-- force execution
    .GroupBy(stockRecord => stockRecord.StockCode)
    .SelectMany(group =>
    {
        var orderedStockRecords = group.OrderByDescending(stockRecord => stockRecord.CreatedAt);
        var stockRecords = orderedStockRecords.Count(IsPurgeable) > 0
            ? orderedStockRecords.Where(IsPurgeable)
            : orderedStockRecords.Take(1);
        return stockRecords.Select(stockRecord => stockRecord.Id);
    })
    .ToList(); // materialize once so the query below uses a fixed list of IDs
var stockRecordsToRemove = await context.StockRecords
    .Where(stockRecord =>
        stockRecord.Location.RegionId == regionId
        && stockCodes.Contains(stockRecord.StockCode)
        && stockRecord.CreatedAt <= purgeDate
        && !latestStockRecordIds.Contains(stockRecord.Id))
    .ToListAsync();
context.ChangeTracker.AutoDetectChangesEnabled = false;
context.StockRecords.RemoveRange(stockRecordsToRemove);
await context.SaveChangesAsync();
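The keep rule from the SQL in the question (keep every record on or after the purge date, plus the newest record per stock code) can be tried in isolation with LINQ to Objects. This is only a sketch under assumptions: the tuples and the fixed dates below are hypothetical stand-ins for the StockRecords table, and no database or DbContext is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows standing in for StockRecords.
var now = new DateTime(2024, 1, 15);
var purgeDate = now.AddDays(-7);
var records = new List<(int Id, string StockCode, DateTime CreatedAt)>
{
    (1, "ABC123", now.AddDays(-30)),
    (2, "ABC123", now.AddDays(-10)),
    (3, "ABC123", now.AddDays(-1)),
    (4, "DEF123", now.AddDays(-40)),
    (5, "DEF123", now.AddDays(-20)),
};

// Per stock code: keep the newest record (index 0 after sorting)
// and every record created on or after the purge date.
var idsToKeep = records
    .GroupBy(r => r.StockCode)
    .SelectMany(g => g
        .OrderByDescending(r => r.CreatedAt)
        .Where((r, index) => index == 0 || r.CreatedAt >= purgeDate))
    .Select(r => r.Id)
    .ToHashSet();

// Everything else is a candidate for deletion.
var idsToDelete = records
    .Select(r => r.Id)
    .Where(id => !idsToKeep.Contains(id))
    .OrderBy(id => id)
    .ToList();

Console.WriteLine(string.Join(", ", idsToDelete)); // prints "1, 2, 4"
```

DEF123 has no record newer than the purge date, yet record 5 survives because it is the latest one for that stock code.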

Group by and left join in linq

There are two tables: Customers, with the fields CustomerID and GroupID, and CustomerGroup, with the fields GroupID and GroupName. I want to get the number of customers in each group. Here is the LINQ statement:
var groups = from customerGroups in db.CustomerGroup
join customers in db.Customers on customerGroups.GroupID equals customers.GroupID into gc
where customerGroups.MerchantID == merchantID
from subCustomerGroups in gc.DefaultIfEmpty()
group customerGroups by customerGroups.GroupName into grpCustomerGroups
select new { GroupName = grpCustomerGroups.Key, Quantity = customers.Count()};
The problem is that Quantity = customers.Count() is invalid. How do I correct the statement?
The expected SQL statement is:
exec sp_executesql N'SELECT
1 AS [C1],
[GroupBy1].[K1] AS [GroupName],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[GroupName] AS [K1],
COUNT(CustomerID) AS [A1]
FROM [dbo].[CustomerGroup] AS [Extent1]
LEFT OUTER JOIN [dbo].[Customer] AS [Extent2] ON [Extent1].[GroupID] = [Extent2].[GroupID]
WHERE [Extent1].[MerchantID] = @p__linq__0
GROUP BY [Extent1].[GroupName]
) AS [GroupBy1]',N'@p__linq__0 bigint',@p__linq__0=9

Usually, if you find yourself doing a left outer join followed by a GroupBy, it is because you want "items with their sub-items", Like "Schools with their Students", "Clients with their Orders", "CustomerGroups with their Customers", etc. If you want this, consider using GroupJoin instead of "Join + DefaultIfEmpty + GroupBy"
I'm more familiar with method syntax, so I'll use that one.
int merchantId = ...
var result = dbContext.CustomerGroups
    // keep only the CustomerGroups from merchantId
    .Where(customerGroup => customerGroup.MerchantId == merchantId)
    .GroupJoin(dbContext.Customers,                  // GroupJoin with Customers
        customerGroup => customerGroup.GroupId,      // from every CustomerGroup take the GroupId
        customer => customer.GroupId,                // from every Customer take the GroupId
        // ResultSelector:
        (customerGroup, customersInThisGroup) => new // from every CustomerGroup with all its
        {                                            // matching customers make one new object
            GroupName = customerGroup.GroupName,
            Quantity = customersInThisGroup.Count(), // number of Customers in this group
        });
In words:
Take the sequence of CustomerGroups. Keep only those CustomerGroups that have a value for property MerchantId equal to merchantId. From every remaining CustomerGroup, get all its Customers by comparing CustomerGroup.GroupId with each Customer.GroupId.
The result is a sequence of CustomerGroups, each with its Customers. From this result (parameter ResultSelector) get the GroupName from the CustomerGroup and the Quantity from the Customers in this group.
Your statement was:
Quantity = customers.CustomerID,
This will not work. I'm sure this is not what you want. Alas, you forgot to write what you want. I think it is this:
Quantity = customersInThisGroup.Count()
But if you want the CustomerId of all Customers in this CustomerGroup:
// ResultSelector:
(customerGroup, customersInThisGroup) => new
{
    GroupName = customerGroup.GroupName,
    CustomerIds = customersInThisGroup.Select(customer => customer.CustomerId)
        .ToList(),
});
If you want you can use the ResultSelector to get "CustomerGroups with their Customers". Most efficient is to select only the properties you actually plan to use:
// ResultSelector:
(customerGroup, customersInThisGroup) => new
{
    // select only the CustomerGroup properties that you plan to use:
    Id = customerGroup.GroupId,
    Name = customerGroup.GroupName,
    ... // other properties that you plan to use
    Customers = customersInThisGroup.Select(customer => new
    {
        // again, select only the Customer properties that you plan to use
        Id = customer.CustomerId,
        Name = customer.Name,
        ...
        // not needed, you know the value:
        // GroupId = customer.GroupId
    })
    .ToList(),
});
The reason not to select the foreign key of the Customers, is efficiency. If CustomerGroup [14] has 1000 Customers, then every Customer in this group will have a value for GroupId equal to [14]. It would be a waste to send this value [14] 1001 times.
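To see the GroupJoin shape in action without a database, here is a small LINQ to Objects sketch. The tuples below are hypothetical sample data, not the asker's tables; against a real DbContext the same method chain would be translated to SQL by the provider, which this sketch does not demonstrate.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for CustomerGroup and Customers.
var customerGroups = new[]
{
    (GroupId: 1, GroupName: "Retail", MerchantId: 9L),
    (GroupId: 2, GroupName: "Wholesale", MerchantId: 9L),
    (GroupId: 3, GroupName: "Other merchant", MerchantId: 5L),
};
var customers = new[]
{
    (CustomerId: 10, GroupId: 1),
    (CustomerId: 11, GroupId: 1),
    (CustomerId: 12, GroupId: 3),
};
long merchantId = 9;

var result = customerGroups
    .Where(g => g.MerchantId == merchantId)
    .GroupJoin(customers,
        g => g.GroupId,   // key of the outer sequence
        c => c.GroupId,   // key of the inner sequence
        (g, customersInThisGroup) => new
        {
            g.GroupName,
            // Groups without customers still appear, with a count of 0.
            Quantity = customersInThisGroup.Count(),
        })
    .ToList();

foreach (var row in result)
    Console.WriteLine($"{row.GroupName}: {row.Quantity}");
// prints "Retail: 2" and "Wholesale: 0"
```

The "Wholesale: 0" row is the left outer join behaviour the question asked for: GroupJoin keeps outer items that have no matching inner items.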

Linq equivalent of aggregate function on multiple tables in one database trip

I have a table function which returns table names and number of entries within that table :
CREATE FUNCTION [dbo].[ufnGetLookups] ()
RETURNS
@lookupsWithItemCounts TABLE
(
    [Name] VARCHAR(100),
    [EntryCount] INT
)
AS
BEGIN
    INSERT INTO @lookupsWithItemCounts([Name],[EntryCount])
    VALUES
        ('Table1', (SELECT COUNT(*) FROM Table1)),
        ('Table2', (SELECT COUNT(*) FROM Table2)),
        ('Table3', (SELECT COUNT(*) FROM Table3))
    RETURN;
END
What would be the LINQ equivalent of the simple function above? Note that I want to get the result in a single round trip, and the speed of the operation is quite important to me. If it turns out that the converted LINQ to SQL produces a massive, bulky SQL statement with a performance hit, I would rather stick to my existing user-defined function and forget about the LINQ equivalent.

You can do that with a UNION query, e.g.:
var q = db.Books.GroupBy(g => "Books").Select(g => new { Name = g.Key, EntryCount = g.Count() })
.Union(db.Authors.GroupBy(g => "Authors").Select(g => new { Name = g.Key, EntryCount = g.Count() }));
var r = q.ToList();
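One thing to watch with this pattern, shown here as a LINQ to Objects sketch: grouping an empty source by a constant yields no group at all, so an empty table is simply missing from the result rather than reported with a count of 0. The arrays below are hypothetical stand-ins for tables, and a database provider may translate the query differently, so treat this as an illustration rather than a statement about the generated SQL.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for three tables; the third is empty on purpose.
var books = new[] { "Dune", "Emma" };
var authors = new[] { "Herbert", "Austen", "Orwell" };
var publishers = Array.Empty<string>();

var counts = books.GroupBy(b => "Books").Select(g => new { Name = g.Key, EntryCount = g.Count() })
    .Union(authors.GroupBy(a => "Authors").Select(g => new { Name = g.Key, EntryCount = g.Count() }))
    .Union(publishers.GroupBy(p => "Publishers").Select(g => new { Name = g.Key, EntryCount = g.Count() }))
    .ToList();

foreach (var row in counts)
    Console.WriteLine($"{row.Name}: {row.EntryCount}");
// prints "Books: 2" and "Authors: 3"; there is no "Publishers: 0" row
```

If a zero row is required for empty tables, the missing names can be added afterwards on the client.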

Not an EF guy, and not sure if this would be more performant.
Select TableName = o.name
,RowCnt = sum(p.Rows)
From sys.objects as o
Join sys.partitions as p on o.object_id = p.object_id
Where o.type = 'U'
and o.is_ms_shipped = 0x0
and index_id < 2 -- 0:Heap, 1:Clustered
--and o.name in ('Table1','Table2','Table3' ) -- Include (or not) your own filter
Group By o.schema_id,o.name
Note: Wish I could recall the source of this, but I've used it in my discovery process.

Entity Framework v6.1 query compilation performance

I am confused about how EF LINQ queries are compiled and executed. When I run the same program in LINQPad a couple of times, I get varying performance results (the same query takes a different amount of time on each run). My test environment is described below.
tools used: EF v6.1 & LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For queries, I am using the Persons, Courses and Departments tables from the above DB.
Query goal: get the second person and associated departments.
Query:
var test = (
    from p in Persons
    join d in Departments on p.ID equals d.InstructorID
    select new {
        person = p,
        dept = d
    }
);
var result = (from pd in test
    group pd by pd.person.ID into grp
    orderby grp.Key
    select new {
        ID = grp.Key,
        FirstName = grp.First().person.FirstName,
        Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
    }).Skip(1).Take(1).ToList();
foreach (var r in result)
{
    Console.WriteLine("person is..." + r.FirstName);
    Console.WriteLine(r.FirstName + "' deps are...");
    foreach (var d in r.Deps) {
        Console.WriteLine(d.Name);
    }
}
When I run this I get the result, and LINQPad reports an execution time anywhere between 3.515 sec and 0.004 sec (depending on how long I wait between runs).
If I take the generated SQL query and execute it directly, it always runs in between 0.001 sec and 0.015 sec.
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
    FROM (
        SELECT [t2].[ID], (
            SELECT [t5].[FirstName]
            FROM (
                SELECT TOP (1) [t3].[FirstName]
                FROM [Person] AS [t3]
                INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
                WHERE [t2].[ID] = [t3].[ID]
                ) AS [t5]
            ) AS [value]
        FROM (
            SELECT [t0].[ID]
            FROM [Person] AS [t0]
            INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
            GROUP BY [t0].[ID]
            ) AS [t2]
        ) AS [t6]
    ) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took a varying amount of time, but the difference was not as big as with the first query set.
In my application there are going to be a lot of rows, so I prefer the first query set to the second one, but I am confused.
Please guide.
(Note: I have a little SQL Server knowledge so, I am using LINQPad to fine tune queries based on the performance)
Thanks

Complicated SQL Server query

I am trying to write a SQL Server query which returns all events on the current day. For events where the column recurring = 1, I want it to return the event on the day it is held and again in each of the 52 weeks that follow.
My tables are structured as follows:
Event
{
event_id (PK)
title,
description,
event_start DATETIME,
event_end DATETIME,
group_id,
recurring
}
Users
{
UserID (PK)
Username
}
Groups
{
GroupID (PK)
GroupName
}
Membership
{
UserID (FK)
GroupID (FK)
}
The code I have thus far is as follows :
var db = Database.Open("mPlan");
string username = HttpContext.Current.Request.Cookies.Get("mpUsername").Value;
var listOfGroups = db.Query("SELECT GroupID FROM Membership WHERE UserID = (SELECT UserID from Users WHERE Username = @0 )", username);
foreach (var groupID in listOfGroups)
{
    int newGroupID = groupID.GroupID;
    var result = db.Query(
        @"SELECT e.event_id, e.title, e.description, e.event_start, e.event_end, e.group_id, e.recurring
        FROM event e
        JOIN Membership m ON m.GroupID = e.group_id
        WHERE e.recurring = 0
        AND m.GroupID = @0
        AND e.event_start >= @1
        AND e.event_end <= @2
        UNION ALL
        SELECT e.event_id, e.title, e.description, DATEADD(week, w.weeks, e.event_start), DATEADD(week, w.weeks, e.event_end), e.group_id, e.recurring
        FROM event e
        JOIN Membership m ON m.GroupID = e.group_id
        CROSS JOIN
        ( SELECT row_number() OVER (ORDER BY Object_ID) AS weeks
          FROM SYS.OBJECTS
        ) AS w
        WHERE e.recurring = 1
        AND m.GroupID = @3
        AND DATEADD(WEEK, w.Weeks, e.event_start) >= @4
        AND DATEADD(WEEK, w.Weeks, e.event_end) <= @5", newGroupID, start, end, newGroupID, start, end
    );
}
The result is that when one queries for the date of the event stored in the database, this event and 52 weeks of events are returned. When one queries for the week after that date, nothing is returned.

The simplest solution would be to alter the following 2 lines
AND e.event_start >= @4
AND e.event_end <= @5"
to
AND DATEADD(WEEK, w.Weeks, e.event_start) >= @4
AND DATEADD(WEEK, w.Weeks, e.event_end) <= @5"
However, I'd advise putting all this SQL into a stored procedure; SQL Server will cache the execution plan, which results in (slightly) better performance.
CREATE PROCEDURE dbo.GetEvents @UserName VARCHAR(50), @StartDate DATETIME, @EndDate DATETIME
AS
BEGIN
    -- DEFINE A CTE TO GET ALL GROUPS ASSOCIATED WITH THE CURRENT USER
    ;WITH Groups AS
    (   SELECT GroupID
        FROM Membership m
        INNER JOIN Users u
            ON m.UserID = u.UserID
        WHERE Username = @UserName
        GROUP BY GroupID
    ),
    -- DEFINE A CTE TO GET ALL EVENTS FOR THE GROUPS DEFINED ABOVE
    AllEvents AS
    (   SELECT e.*
        FROM event e
        INNER JOIN Groups m
            ON m.GroupID = e.group_id
        UNION ALL
        SELECT e.event_id, e.title, e.description, DATEADD(WEEK, w.weeks, e.event_start), DATEADD(WEEK, w.weeks, e.event_end), e.group_id, e.recurring
        FROM event e
        INNER JOIN Groups m
            ON m.GroupID = e.group_id
        CROSS JOIN
        (   SELECT ROW_NUMBER() OVER (ORDER BY Object_ID) AS weeks
            FROM SYS.OBJECTS
        ) AS w
        WHERE e.recurring = 1
    )
    -- GET ALL EVENTS WHERE THE EVENTS FALL IN THE PERIOD DEFINED
    SELECT *
    FROM AllEvents
    WHERE Event_Start >= @StartDate
        AND Event_End <= @EndDate
END
Then you can call this with
var result = db.Query("EXEC dbo.GetEvents @0, @1, @2", username, start, end);
This eliminates the need to iterate over groups in your code-behind. If iterating is actually a requirement, then you could modify the stored procedure to take @GroupID as a parameter and change the select statements/where clauses as necessary.
I have assumed knowledge of Common Table Expressions. They are not required to make the query work, they just make things slightly more legible in my opinion. I can rewrite this without them if required.
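The same idea (shift each recurring event by whole weeks first, then filter on the shifted dates) can be sketched in C# with LINQ to Objects. The event tuples and the date range are hypothetical sample data, and the 52-week limit is taken from the question rather than from the SQL above, which produces one row per entry in sys.objects.

```csharp
using System;
using System.Linq;

// Hypothetical events standing in for rows of the event table.
var events = new[]
{
    (Id: 1, Start: new DateTime(2024, 1, 1, 9, 0, 0), End: new DateTime(2024, 1, 1, 10, 0, 0), Recurring: true),
    (Id: 2, Start: new DateTime(2024, 1, 8, 9, 0, 0), End: new DateTime(2024, 1, 8, 10, 0, 0), Recurring: false),
};

// Query window: the week after event 1 is stored.
var rangeStart = new DateTime(2024, 1, 8);
var rangeEnd = new DateTime(2024, 1, 9);

// Week 0 is the event itself; recurring events also produce weeks 1..52.
var occurrences = events
    .SelectMany(e => Enumerable.Range(0, e.Recurring ? 53 : 1)
        .Select(week => (e.Id, Start: e.Start.AddDays(7 * week), End: e.End.AddDays(7 * week))))
    // Filter on the shifted dates, not on the stored ones.
    .Where(o => o.Start >= rangeStart && o.End <= rangeEnd)
    .OrderBy(o => o.Id)
    .ToList();

foreach (var o in occurrences)
    Console.WriteLine($"{o.Id}: {o.Start:yyyy-MM-dd HH:mm}");
// prints "1: 2024-01-08 09:00" and "2: 2024-01-08 09:00"
```

Event 1 is stored on 1 January, yet it shows up for 8 January because the filter runs after the weekly shift, which is exactly what the changed WHERE clause achieves in SQL.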

I would check my parameters one at a time against some trivial SQL, just to rule them out as possible culprits. Something like this:
var result = db.Query("select r=cast(@0 as varchar(80))",username);
var result = db.Query("select r=cast(@0 as int)",newGroupID);
var result = db.Query("select r=cast(@0 as datetime)",start);
var result = db.Query("select r=cast(@0 as datetime)",end);
