LINQ to SQL complex query problem - c#

I have 3 tables: Principal (Principal_ID, Scale), Frequency (Frequency_ID, Value) and Visit (Visit_ID, Principal_ID, Frequency_ID).
I need a query that returns all principals (in the Principal table) and, for each one, the capacity required for that principal, i.e. the sum over its visits of:
Capacity = (Principal.Scale == 0 ? 0 : (Frequency.Value == 1 ? 1 : Frequency.Value * 1.8) / Principal.Scale)
I'm using LINQ to SQL, so here is the query:
from Principal p in ShopManagerDataContext.Instance.Principals
let cap =
(
from Visit v in p.Visits
let fqv = v.Frequency.Value
select (p.Scale != 0 ? ((fqv == 1.0f ? fqv : fqv * 1.8f) / p.Scale) : 0)
).Sum()
select new
{
p,
Capacity = cap
};
The generated TSQL:
SELECT [t0].[Principal_ID], [t0].[Name], [t0].[Scale], (
SELECT SUM(
(CASE
WHEN [t0].[Scale] <> @p0 THEN (
(CASE
WHEN [t2].[Value] = @p1 THEN [t2].[Value]
ELSE [t2].[Value] * @p2
END)) / (CONVERT(Real,[t0].[Scale]))
ELSE @p3
END))
FROM [Visit] AS [t1]
INNER JOIN [Frequency] AS [t2] ON [t2].[Frequency_ID] = [t1].[Frequency_ID]
WHERE [t1].[Principal_ID] = [t0].[Principal_ID]
) AS [Capacity]
FROM [Principal] AS [t0]
And the error I get:
SqlException: Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Any ideas how to solve this, in one query if possible?
Thank you very much in advance!

Here are 2 ways to do this by changing up your approach:
Create a user-defined aggregate function using SQL CLR. This may not be the right solution for you, but it's a perfect fit for the problem as stated. It would move all of the logic into the data layer, so LINQ would be of limited value. With this approach you get efficiency, but there's a big impact on your architecture.
Load the Visit and Frequency tables into a typed DataSet and use LINQ to DataSet. This will probably work with your existing code, but I haven't tried it. With this approach your architecture is more or less preserved, but you could take a big efficiency hit if Visit and Frequency are large.

Based on the comment, I have an alternative suggestion. Since your error comes from SQL, and you aren't using the new column as a filter, you can move the calculation to the client. For this to work, you'll need to pull all the relevant records (using DataLoadOptions.LoadWith<> on your context).
Since you want to bind the result to a DataGrid, it's probably easiest to hide the complexity in a property of Principal.
partial class Principal
{
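public double Capacity // double, not decimal: the expression is float/double and double does not convert implicitly to decimal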
{
get
{
return this.Scale == 0 ? 0 : this.Visits.Select(v =>
(v.Frequency.Value == 1 ? 1 : v.Frequency.Value * 1.8) / this.Scale).Sum();
}
}
}
Then your retrieval gets really simple:
using (ShopManagerDataContext context = new ShopManagerDataContext())
{
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Principal>(p => p.Visits);
options.LoadWith<Visit>(v => v.Frequency);
context.LoadOptions = options;
return (from p in context.Principals
select p).ToList();
}
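The property above evaluates the formula on the client, summed over each principal's visits. A minimal sketch of the same per-principal sum in plain LINQ to Objects, assuming hypothetical in-memory tuples in place of the LINQ to SQL entities (the sample data, the `VisitCapacity` helper and the `double` frequency values are all illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Principal, Frequency and Visit tables.
var principals = new List<(int Id, int Scale)> { (1, 2), (2, 0), (3, 4) };
var frequencies = new Dictionary<int, double> { [10] = 1.0, [20] = 2.0 };
var visits = new List<(int PrincipalId, int FrequencyId)>
{
    (1, 10), (1, 20), (2, 20), (3, 20)
};

// The capacity formula from the question, for a single visit.
static double VisitCapacity(double frequency, int scale) =>
    scale == 0 ? 0 : (frequency == 1 ? 1 : frequency * 1.8) / scale;

// One row per principal; principals without visits get a sum of 0.
var capacities = principals
    .Select(p => (p.Id, Capacity: visits
        .Where(v => v.PrincipalId == p.Id)
        .Sum(v => VisitCapacity(frequencies[v.FrequencyId], p.Scale))))
    .ToList();

foreach (var (id, capacity) in capacities)
    Console.WriteLine($"Principal {id}: {capacity:F2}");
```

Because the division happens in .NET rather than inside a SQL SUM, the "outer reference" restriction from the error message does not apply.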

Related

How to get records from tables in custom order of ids?

I have some Ids stored in the following variable:
List<int> Ids;
Now I want to get the records for these Ids, in the same order as they appear in the list.
For example, the records in the database look like this:
Employee:
Id
1
2
3
4
5
Now if the Ids list holds 4,2,5,3,1, then I want to get the records in exactly this order:
Query:
var data = context.Employee.Where(t => Ids.Contains(t.Id)).ToList();
But the query above returns the records in table order:
Id
1
2
3
4
5
Expected output :
Id
4
2
5
3
1
Update: I have already tried the solution below, but since this is Entity Framework it didn't work:
var data = context.Employee.Where(t => Ids.Contains(t.Id))
.OrderBy(d => Ids.IndexOf(d.Id)).ToList();
To make that solution work I have to call ToList() first:
var data = context.Employee.Where(t => Ids.Contains(t.Id)).ToList()
.OrderBy(d => Ids.IndexOf(d.Id)).ToList();
But I don't want to load the data into memory and then sort my records.
Since the order in which data is returned is not determined when you do not specify an ORDER BY, you have to add one to indicate how you want the data sorted. Unfortunately, the order you want is based on an in-memory list, which cannot be used to order the SQL query.
Therefore, the best you can do is to order in-memory once the data is retrieved from the database.
var data = context.Employee
// Add a criteria that we only want the known ids
.Where(t => Ids.Contains(t.Id))
// Anything after this is done in-memory instead of by the database
.AsEnumerable()
// Sort the results, in-memory
.OrderBy(d => Ids.IndexOf(d.Id))
// Materialize into a list
.ToList();
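One small refinement of the in-memory sort: List.IndexOf scans the list for every row, so with many Ids it can help to build an id-to-position map once. A minimal sketch with LINQ to Objects standing in for the database rows (the sample data is hypothetical, and ToDictionary assumes the Ids are unique):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ids in the desired order and rows as they might come back from the database.
var ids = new List<int> { 4, 2, 5, 3, 1 };
var rows = new[] { 1, 2, 3, 4, 5 }.Select(id => (Id: id, Name: $"Employee {id}")).ToList();

// Map each id to its position once, so each lookup is O(1) instead of IndexOf's O(n).
var position = ids
    .Select((id, index) => (id, index))
    .ToDictionary(x => x.id, x => x.index);

var ordered = rows
    .Where(r => position.ContainsKey(r.Id)) // stands in for the database-side Contains filter
    .OrderBy(r => position[r.Id])           // in-memory sort, as in the AsEnumerable() version
    .ToList();

Console.WriteLine(string.Join(",", ordered.Select(r => r.Id))); // 4,2,5,3,1
```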
Without stored procedures, you can use Union or ?:, which are both canonical functions.
I can't imagine any other way.
?:
You can use ?: to assign a weight to each id value and then order by the weight. You have to generate the ?: expression dynamically, for example using Dynamic LINQ.
What is the equivalent of "CASE WHEN THEN" (T-SQL) with Entity Framework?
Dynamically generate LINQ queries
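The weighting idea can be sketched without Dynamic LINQ by building the nested ?: by hand with System.Linq.Expressions (a swapped-in technique, not the answer's own code). Here it runs against LINQ to Objects via AsQueryable(); the `OrderByIdList` helper and the sample data are hypothetical, and while a provider such as EF would be expected to translate the nested conditional into CASE WHEN, that is not verified here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var ids = new List<int> { 4, 2, 5, 3, 1 };
var employees = new[] { 1, 2, 3, 4, 5 }
    .Select(id => new { Id = id, Name = "Employee " + id })
    .AsQueryable();

// Builds key == idOrder[0] ? 0 : key == idOrder[1] ? 1 : ... : idOrder.Count
// and applies it with Queryable.OrderBy, so the ordering stays part of the query.
static IQueryable<T> OrderByIdList<T>(
    IQueryable<T> source, Expression<Func<T, int>> key, IReadOnlyList<int> idOrder)
{
    Expression weight = Expression.Constant(idOrder.Count); // ids not in the list sort last
    for (int i = idOrder.Count - 1; i >= 0; i--)
    {
        weight = Expression.Condition(
            Expression.Equal(key.Body, Expression.Constant(idOrder[i])),
            Expression.Constant(i),
            weight);
    }
    var lambda = Expression.Lambda<Func<T, int>>(weight, key.Parameters);
    return source.OrderBy(lambda);
}

var ordered = OrderByIdList(employees.Where(e => ids.Contains(e.Id)), e => e.Id, ids).ToList();
Console.WriteLine(string.Join(",", ordered.Select(e => e.Id))); // 4,2,5,3,1
```

Each Id adds one level of nesting, so very long Id lists produce deeply nested expressions and SQL.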
Union
I think this is the simpler way. In this case you add a Where/Union for each Id.
EDIT 1
To use Union, you can write code similar to this:
IQueryable<Foo> query = context.Foos.AsQueryable();
List<int> Ids = new List<int>();
Ids.AddRange(new[] {3,2,1});
bool first = true;
foreach (int id in Ids)
{
if (first)
{
query = query.Where(_ => _.FooId == id);
first = false;
}
else
{
query = query.Union(context.Foos.Where(_ => _.FooId == id));
}
}
var results = query.ToList();
This generates the following query:
SELECT
[Distinct2].[C1] AS [C1]
FROM ( SELECT DISTINCT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[Distinct1].[C1] AS [C1]
FROM ( SELECT DISTINCT
[UnionAll1].[FooId] AS [C1]
FROM (SELECT
[Extent1].[FooId] AS [FooId]
FROM [Foos] AS [Extent1]
WHERE [Extent1].[FooId] = @p__linq__0
UNION ALL
SELECT
[Extent2].[FooId] AS [FooId]
FROM [Foos] AS [Extent2]
WHERE [Extent2].[FooId] = @p__linq__1) AS [UnionAll1]
) AS [Distinct1]
UNION ALL
SELECT
[Extent3].[FooId] AS [FooId]
FROM [Foos] AS [Extent3]
WHERE [Extent3].[FooId] = @p__linq__2) AS [UnionAll2]
) AS [Distinct2]
p__linq__0 = 3
p__linq__1 = 2
p__linq__2 = 1
EDIT 2
I think the in-memory approach is the best one: it has the same network load, EF does not generate an ugly query that might not work on databases other than SQL Server, and the code is more readable. In your particular application, union/where could turn out to be better. So, in general, I would suggest trying the in-memory approach first and then, if you have performance issues, checking whether union/where is better.

Entity Framework generates inefficient SQL for paged query

I have a simple paged linq query against one entity:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
data = data.OrderBy(t => t.Id);
if (page > 0)
{
data = data.Skip(rows * (page - 1)).Take(rows);
}
var l = data.ToList();
I expected it to generate SQL similar to:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).
However, the generated SQL is different. It contains a nested query as shown below:
SELECT TOP (50)
[Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id],
[Project1].[CampaignId] AS [CampaignId],
<redacted>,
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CampaignId] AS [CampaignId],
<redacted>
FROM [dbo].[Widgets] AS [Extent1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC
The Widgets table is enormous and the inner query returns hundreds of thousands of records, causing a timeout.
Is there anything I can do to change the generation? Anything I am doing wrong?
UPDATE
I finally managed to refactor my code to return the results relatively quickly:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t).AsEnumerable().Select((item, index) => new { Index = index, Item = item });
data = data.OrderBy(t => t.Index);
if (page > 0)
{
data = data.Where(t => t.Index >= (rows * (page - 1)));
}
data = data.Take(rows);
Note: the page > 0 check only prevents an invalid parameter from being used; it is not an optimization. Checking page > 1 instead would also be valid, but it would not make the first page noticeably faster, since the Where is not a slow operation.
Prior to SQL Server 2012 (which introduced OFFSET ... FETCH), the generated SQL is the best way to perform paging. Yes, it is awful and inefficient, but it is the best you can do even when writing your own SQL script by hand. A lot has been written about this online. Just google it.
On the first page this can be optimized by not doing Skip and using only Take, but on any other page you're stuck with it.
A workaround could be to generate your own row number in persistence (an auto-identity column could work) and just do .Where(widget => widget.number > page * rows).Take(rows) in code. If there is a good index on widget.number, the query should be very fast. But this breaks dynamic orderBy.
However, I can see in your code that you are ordering by widget.id always; so, if dynamic orderBy is not essential, this could be a valid workaround.
"Will you take your own medicine?", you could ask me.
No, I will not. The best way to deal with this is to have a persistence read-model in which you can even have one table per widget orderBy field, each with its own widget.number. The problem is that modeling a system with a persistence read-model just for this issue is overkill. A read-model is part of the overall design of a system, and it has to be taken into account from the very beginning of the design and development.
The generated query is complex and nested because you used the Skip method. In T-SQL, Take is easily achieved with TOP, but that is not the case with Skip: to apply it you need row_number, which is why there is a nested query. The inner query returns rows with row_number and the outer query filters them to get the right rows. Your query:
select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id
does not skip any initial rows. To keep the query efficient, it would be best to page with a condition on Id instead of using Take and Skip, because you are already ordering the rows for paging by that field:
var data = (from t in ctx.ObjectContext.Widgets
where t.CampaignId == campaignId &&
t.CalendarEventId == calendarEventId &&
(t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
select t);
var list = data
.Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page) // gives full pages only if the filtered Ids are contiguous
.OrderBy(t => t.Id)
.ToList();
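The Id-range condition above only yields full pages when the filtered Ids have no gaps. A common variant of the same idea, keyset (or "seek") paging, remembers the last Id of the previous page and asks for the next batch after it, which an index on Id can serve without row_number. A minimal sketch, with LINQ to Objects standing in for the Widgets query and a hypothetical `NextPage` helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical widget Ids after filtering; note the gaps, which break Id-range arithmetic.
IQueryable<int> widgetIds = new[] { 3, 7, 8, 15, 16, 23, 42 }.AsQueryable();

// Keyset paging: continue strictly after the last Id seen on the previous page.
static List<int> NextPage(IQueryable<int> source, int? lastSeenId, int pageSize) =>
    source
        .Where(id => lastSeenId == null || id > lastSeenId)
        .OrderBy(id => id)
        .Take(pageSize)
        .ToList();

var pages = new List<List<int>>();
int? last = null;
while (true)
{
    var page = NextPage(widgetIds, last, 3);
    if (page.Count == 0) break;
    pages.Add(page);
    last = page[page.Count - 1];
}

foreach (var p in pages)
    Console.WriteLine(string.Join(",", p));
```

The trade-off is that you walk forward page by page (or keep the boundary Ids) instead of jumping straight to page n.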
AFAIK you cannot change the query generated by Entity Framework, although you can force it to run a raw SQL query:
https://msdn.microsoft.com/en-us/data/jj592907.aspx
You can also use stored procedures:
https://msdn.microsoft.com/en-us/data/gg699321.aspx
Even if there were a way to change the generated query, IMO it would be spitting into the wind. I bet it will be easier to write the SQL query on your own.

Linq to SQL one to many relationships

For the last couple of days I have been struggling with the performance of a LINQ query:
LinqConnectionDataContext context = new LinqConnectionDataContext();
System.Data.Linq.DataLoadOptions options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Question>(x => x.Answers);
options.LoadWith<Question>(x => x.QuestionVotes);
options.LoadWith<Answer>(x => x.AnswerVotes);
context.LoadOptions = options;
var query =( from c in context.Questions
where c.UidUser == userGuid
&& c.Answers.Any() == true
select new
{
c.Uid,
c.Content,
c.UidUser,
QuestionVote = from qv in c.QuestionVotes where qv.UidQuestion == c.Uid && qv.UidUser == userGuid select new {qv.UidQuestion, qv.UidUser },
Answer = from d in c.Answers
where d.UidQuestion == c.Uid
select new
{
d.Uid,
d.UidUser,
d.Content,
AnswerVote = from av in d.AnswerVotes where av.UidAnswer == d.Uid && av.UidUser == userGuid select new { av.UidAnswer, av.UidUser }
}
}).ToList();
The query has to run through 5000 rows, and it takes up to 1 minute. How can I improve the performance of this query?
Update:
Something to get you started:
CREATE PROCEDURE GetQuestionsAndAnswers
(
@UserGuid VARCHAR(100)
)
AS
BEGIN
SELECT
c.Uid,
c.Content,
c.UidUser,
qv.UidQuestion,
qv.UidUser,
av.UidAnswer,
av.UidUser,
av.Content,
d.Uid,
d.UidUser,
d.Content
FROM Question c
INNER JOIN QuestionVotes qv ON qv.UidQuestion = c.Uid AND qv.UidUser = @UserGuid
INNER JOIN Answers d ON d.UidQuestion = c.Uid
INNER JOIN AnswerVotes av ON av.UidAnswer = d.Uid AND av.UidUser = @UserGuid
WHERE c.UidUser = @UserGuid
END
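A procedure like this returns one flat row per question/answer/vote combination, so the client has to rebuild the nested shape that the original LINQ projection produced. A minimal sketch with hypothetical tuples standing in for the result rows (ints replace the real Uid GUIDs, and votes are left out for brevity):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat rows as a join-based procedure might return them.
var rows = new List<(int QuestionUid, string QuestionContent, int AnswerUid, string AnswerContent)>
{
    (1, "Q1", 10, "A1"),
    (1, "Q1", 10, "A1"), // same answer repeated because the vote join produced two rows
    (1, "Q1", 11, "A2"),
    (2, "Q2", 20, "A3"),
};

// Rebuild the question -> answers shape on the client.
var questions = rows
    .GroupBy(r => new { r.QuestionUid, r.QuestionContent })
    .Select(g => new
    {
        Uid = g.Key.QuestionUid,
        Content = g.Key.QuestionContent,
        Answers = g.Select(r => new { Uid = r.AnswerUid, Content = r.AnswerContent })
                   .Distinct() // drop duplicates introduced by the joins
                   .ToList()
    })
    .ToList();

foreach (var q in questions)
    Console.WriteLine($"{q.Content}: {string.Join(", ", q.Answers.Select(a => a.Content))}");
```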
By default you will already have clustered indexes on the primary key columns (just confirm this on your database side), and you would want non-clustered indexes on the QuestionVotes.UidUser, AnswerVotes.UidUser and Answers.UidQuestion columns.
Also have a look here. You might want to use .AsQueryable() instead of ToList() for deferred execution.
Do you need ToList()?
Check the generated SQL using the SQL debug visualizer, then copy it, run it from a SQL client and see how long it takes. If it takes close to 1 minute, you need to improve performance at the DB level by adding indexes and/or stored procedures, creating views, etc.
If it does not take long, you can always create a stored procedure and call it using LINQ to SQL.
One more recommendation: use Entity Framework if you can switch to it, because it is the future.

EF Pre Compile query and return of a scalar value

I use ASP.NET 4, C# and EF4.
This code should compile a query and return a single scalar value (I use an anonymous type).
My code has no apparent errors, but because this is the first time I have written a compiled query, I would like to know whether it is well written or could be improved for better performance.
var query = CompiledQuery.Compile((CmsConnectionStringEntityDataModel ctx)
=> from o in ctx.CmsOptions
where o.OptionId == 7
select new
{
Value = o.Value
});
uxHtmlHead.Text = query(context).FirstOrDefault().Value;// I print the result in a Label
SQL Profile Output:
SELECT TOP (1)
[Extent1].[OptionId] AS [OptionId],
[Extent1].[Value] AS [Value]
FROM [dbo].[CmsOptions] AS [Extent1]
WHERE 7 = [Extent1].[OptionId]
Many Thanks
Result after Wouter's advice (please double-check it again):
static readonly Func<CmsConnectionStringEntityDataModel, int, string> compiledQueryHtmlHead =
CompiledQuery.Compile<CmsConnectionStringEntityDataModel, int, string>(
(ctx, id) => ctx.CmsOptions.FirstOrDefault(o => o.OptionId == id).Value);
using (var context = new CmsConnectionStringEntityDataModel())
{
int id = 7;
uxHtmlHead.Text = compiledQueryHtmlHead.Invoke(context, id);
}
Resulting SQL (I do not understand why there is a LEFT JOIN):
exec sp_executesql N'SELECT
[Project1].[Value] AS [Value]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent1].[Value] AS [Value]
FROM [dbo].[CmsOptions] AS [Extent1]
WHERE [Extent1].[OptionId] = @p__linq__0 ) AS [Project1] ON 1 = 1',N'@p__linq__0 int',@p__linq__0=7
There are 2 things you can improve on.
First, precompiling a query is definitely a good idea but if you have a look at your code you will see that it precompiles the query each and every time instead of only once.
You need to move the precompiled query to a static variable that is initialized only once.
Another thing you need to be careful of is that when precompiling a query you shouldn't modify the query anymore before executing it.
You are building a precompiled query that selects all rows, and then you call FirstOrDefault, which changes the precompiled query into a SELECT TOP (1), so you lose the benefit of precompiling. You need to move the FirstOrDefault part inside your precompiled query so that it returns only one result.
Have a look at this documentation. If you look at the examples you can see how they use a static field to hold the compiled query and how they specify the return value.

How to COUNT rows within EntityFramework without loading contents?

I'm trying to determine how to count the matching rows on a table using the EntityFramework.
The problem is that each row might have many megabytes of data (in a Binary field). Of course the SQL would be something like this:
SELECT COUNT(*) FROM [MyTable] WHERE [fkID] = '1';
I could load all of the rows and then find the Count with:
var owner = context.MyContainer.First(t => t.ID == 1);
owner.MyTable.Load();
var count = owner.MyTable.Count();
But that is grossly inefficient. Is there a simpler way?
EDIT: Thanks, all. I've moved the DB from a private attached file so I can run profiling; this helps, but it has caused some confusion I didn't expect.
And my real data is a bit deeper, I'll use Trucks carrying Pallets of Cases of Items -- and I don't want the Truck to leave unless there is at least one Item in it.
My attempts are shown below. The part I don't get is that CASE_2 never accesses the DB server (MSSQL).
var truck = context.Truck.FirstOrDefault(t => (t.ID == truckID));
if (truck == null)
return "Invalid Truck ID: " + truckID;
var dlist = from t in ve.Truck
where t.ID == truckID
select t.Driver;
if (dlist.Count() == 0)
return "No Driver for this Truck";
var plist = from t in ve.Truck where t.ID == truckID
from r in t.Pallet select r;
if (plist.Count() == 0)
return "No Pallets are in this Truck";
#if CASE_1
/// This works fine (using 'plist'):
var list1 = from r in plist
from c in r.Case
from i in c.Item
select i;
if (list1.Count() == 0)
return "No Items are in the Truck";
#endif
#if CASE_2
/// This never executes any SQL on the server.
var list2 = from r in truck.Pallet
from c in r.Case
from i in c.Item
select i;
bool ok = (list2.Count() > 0);
if (!ok)
return "No Items are in the Truck";
#endif
#if CASE_3
/// Forced loading also works, as stated in the OP...
bool ok = false;
foreach (var pallet in truck.Pallet) {
pallet.Case.Load();
foreach (var kase in pallet.Case) {
kase.Item.Load();
var item = kase.Item.FirstOrDefault();
if (item != null) {
ok = true;
break;
}
}
if (ok) break;
}
if (!ok)
return "No Items are in the Truck";
#endif
And the SQL resulting from CASE_1 is piped through sp_executesql, but:
SELECT [Project1].[C1] AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(cast(1 as bit)) AS [A1]
FROM [dbo].[PalletTruckMap] AS [Extent1]
INNER JOIN [dbo].[PalletCaseMap] AS [Extent2] ON [Extent1].[PalletID] = [Extent2].[PalletID]
INNER JOIN [dbo].[Item] AS [Extent3] ON [Extent2].[CaseID] = [Extent3].[CaseID]
WHERE [Extent1].[TruckID] = '....'
) AS [GroupBy1] ) AS [Project1] ON 1 = 1
[I don't really have Trucks, Drivers, Pallets, Cases or Items; as you can see from the SQL the Truck-Pallet and Pallet-Case relationships are many-to-many -- although I don't think that matters. My real objects are intangibles and harder to describe, so I changed the names.]
Query syntax:
var count = (from o in context.MyContainer
where o.ID == 1
from t in o.MyTable
select t).Count();
Method syntax:
var count = context.MyContainer
.Where(o => o.ID == 1)
.SelectMany(o => o.MyTable)
.Count();
Both generate the same SQL query.
I think you want something like
var count = context.MyTable.Count(t => t.MyContainer.ID == 1);
(edited to reflect comments)
As I understand it, the selected answer still loads all of the related entities. According to this MSDN blog post, there is a better way.
http://blogs.msdn.com/b/adonet/archive/2011/01/31/using-dbcontext-in-ef-feature-ctp5-part-6-loading-related-entities.aspx
Specifically
using (var context = new UnicornsContext())
{
var princess = context.Princesses.Find(1);
// Count how many unicorns the princess owns
var unicornHaul = context.Entry(princess)
.Collection(p => p.Unicorns)
.Query()
.Count();
}
This is my code:
IQueryable<AuctionRecord> records = db.AuctionRecord;
var count = records.Count();
Make sure the variable is declared as IQueryable; then, when you call Count(), EF will execute something like
select count(*) from ...
Otherwise, if records is declared as IEnumerable, the generated SQL will query the entire table and the returned rows will be counted in memory.
Well, even SELECT COUNT(*) FROM Table can be fairly inefficient on large tables, since SQL Server has to scan the whole table or one of its indexes to count the rows.
Sometimes, it's good enough to know an approximate number of rows from the database, and in such a case, a statement like this might suffice:
SELECT
SUM(used_page_count) * 8 AS SizeKB,
SUM(row_count) AS [RowCount],
OBJECT_NAME(OBJECT_ID) AS TableName
FROM
sys.dm_db_partition_stats
WHERE
OBJECT_ID = OBJECT_ID('YourTableNameHere')
AND (index_id = 0 OR index_id = 1)
GROUP BY
OBJECT_ID
This will inspect the dynamic management view and extract the number of rows and the table size from it, given a specific table. It does so by summing up the entries for the heap (index_id = 0) or the clustered index (index_id = 1).
It's quick and easy to use, but it's not guaranteed to be 100% accurate or up to date. In many cases, though, this is "good enough" (and it puts much less burden on the server).
Maybe that would work for you, too? Of course, to use it in EF, you'd have to wrap this up in a stored proc or use a straight "Execute SQL query" call.
Marc
Use the ExecuteStoreQuery method of the entity context. This avoids downloading the entire result set and deserializing into objects to do a simple row count.
int count;
using (var db = new MyDatabase()){
string sql = "SELECT COUNT(*) FROM MyTable where FkId = {0}";
object[] myParams = {1};
var cntQuery = db.ExecuteStoreQuery<int>(sql, myParams);
count = cntQuery.First<int>();
}
I think this should work...
var query = from m in context.MyTable
where m.MyContainerId == 1 // or whatever the foreign key name is...
select m;
var count = query.Count();
