LinqToSql - SQL generated by CONCAT (UNION)

Simply put, is there any way to coerce LinqToSql into generating chained concats without nesting the UNION ALL statements?
Example:
a.Concat(b).Concat(c)
results in something semantically similar to:
SELECT * FROM (
SELECT * FROM A
UNION ALL
SELECT * FROM B
)
UNION ALL
SELECT * FROM C
It would be much more readable/preferable if I could convince it to do:
SELECT * FROM A
UNION ALL
SELECT * FROM B
UNION ALL
SELECT * FROM C
I understand why it does it (and I'm not even sure these two things are exactly the same semantically) but is there any way to make this happen? It would make a bunch of our generated queries significantly easier to read and debug.

The SQL generated from LINQ is not designed to be readable, so I don't think there is a way to do that. It would be nice, I agree. This is one reason I am intrigued by the concept of micro-ORMs like Dapper, so I can just write my own SQL.

I had issues concatenating a large autogenerated set of IQueryables, because the nesting in the generated SQL query simply got too deep for the database to handle. I solved it by concatenating the items in a binary-tree-like fashion, like so:
public static IQueryable<T> BinaryConcatenation<T>(this IEnumerable<IQueryable<T>> queries)
{
    // Materialize once so the source sequence is not enumerated repeatedly.
    var items = queries.ToArray();
    // Base case: nothing left to split (Aggregate throws on an empty sequence).
    if (items.Length < 2) return items.Aggregate((src, next) => src.Concat(next));
    // Concatenate each half separately, then join the two halves.
    var first = BinaryConcatenation(items.Take(items.Length / 2));
    var second = BinaryConcatenation(items.Skip(items.Length / 2));
    return first.Concat(second);
}
This way the nesting depth of the generated SQL becomes drastically lower: it should drop from about n levels to about log2(n) levels for n queries.
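To illustrate why the balanced version helps, here is a minimal sketch that compares the two chaining strategies by measuring how deeply the Concat calls nest in the expression tree. It uses in-memory queries created with AsQueryable() rather than LINQ to SQL, so the expression depth is only a stand-in for the SQL nesting depth (an assumption; the generated SQL itself is not inspected). ConcatDepthDemo, LinearConcat, BalancedConcat and ConcatDepth are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ConcatDepthDemo
{
    // Left-to-right chaining, equivalent to a.Concat(b).Concat(c)...
    public static IQueryable<T> LinearConcat<T>(IEnumerable<IQueryable<T>> queries)
    {
        return queries.Aggregate((acc, next) => acc.Concat(next));
    }

    // Balanced chaining: concatenate each half first, then join the halves.
    public static IQueryable<T> BalancedConcat<T>(IReadOnlyList<IQueryable<T>> queries)
    {
        if (queries.Count == 0)
            throw new ArgumentException("At least one query is required.", nameof(queries));
        if (queries.Count == 1)
            return queries[0];

        int half = queries.Count / 2;
        var left = BalancedConcat(queries.Take(half).ToList());
        var right = BalancedConcat(queries.Skip(half).ToList());
        return left.Concat(right);
    }

    // Counts the number of Concat calls along the deepest path of the expression tree.
    public static int ConcatDepth(Expression expression)
    {
        if (expression is MethodCallExpression call && call.Method.Name == "Concat")
            return 1 + call.Arguments.Max(arg => ConcatDepth(arg));
        return 0;
    }
}
```

With 8 source queries, the left-to-right chain nests 7 Concat calls, while the balanced version nests only 3; both return the elements in the same order.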

Related

How to force LINQ to SQL to evaluate the whole query in the database?

I have a query which is fully translatable to SQL. For unknown reasons, LINQ to SQL decides to execute the last Select() in .NET (not in the database), which causes a lot of additional SQL queries (one per item) to run against the database.
Actually, I found a 'strange' way to force the full translation to SQL:
I have a query (this is a really simplified version, which still does not work as expected):
MainCategories.Select(e => new
{
PlacementId = e.CatalogPlacementId,
Translation = Translations.Select(t => new
{
Name = t.Name,
// ...
}).FirstOrDefault()
})
This generates a lot of SQL queries:
SELECT [t0].[CatalogPlacementId] AS [PlacementId]
FROM [dbo].[MainCategories] AS [t0]
SELECT TOP (1) [t0].[Name]
FROM [dbo].[Translations] AS [t0]
SELECT TOP (1) [t0].[Name]
FROM [dbo].[Translations] AS [t0]
...
However, if I append another Select() which just copies all members:
.Select(e => new
{
PlacementId = e.PlacementId,
Translation = new
{
Name = e.Translation.Name,
// ...
}
})
It will compile it into a single SQL statement:
SELECT [t0].[CatalogPlacementId] AS [PlacementId], (
SELECT [t2].[Name]
FROM (
SELECT TOP (1) [t1].[Name]
FROM [dbo].[Translations] AS [t1]
) AS [t2]
) AS [Name]
FROM [dbo].[MainCategories] AS [t0]
Any clues why? How can I force LINQ to SQL to generate a single query more generically (without the second copying Select())?
NOTE: I've updated the query to make it really simple.
PS: The only idea I have is to post-process/transform queries with similar patterns (to add the extra Select()).

When you call SingleOrDefault in MyQuery, you are executing the query at that point, which loads the results into the client.
SingleOrDefault returns a single T (or the default value), which is no longer an IQueryable<T>. You have coerced it at this point, so all further processing happens on the client; it can no longer perform SQL composition.

Not entirely sure what is going on, but I find the way you wrote this query pretty 'strange'. I would write it like this, and suspect this will work:
var q = from e in MainCategories
let t = Translations.Where(t => t.Name == "MainCategory"
&& t.RowKey == e.Id
&& t.Language.Code == "en-US").SingleOrDefault()
select new TranslatedEntity<Category>
{
Entity = e,
Translation = new TranslationDef
{
Language = t.Language.Code,
Name = t.Name,
Xml = t.Xml
}
};
I always try to separate the from part (selection of the data sources) from the select part (projection to your target type). I also find it easier to read/understand, and it generally works better with most LINQ providers.

You can write the query as follows to get the desired result:
MainCategories.Select(e => new
{
PlacementId = e.CatalogPlacementId,
TranslationName = Translations.FirstOrDefault().Name,
})
As far as I'm aware, it's due to how LINQ projects the query. I think that when it sees the nested Select, it will not project it into multiple sub-queries, which is essentially what would be needed: IIRC, a scalar sub-query in SQL cannot return multiple columns, so LINQ to SQL falls back to a query per row. FirstOrDefault with a column accessor seems to translate directly to what would happen in SQL, so LINQ to SQL knows it can write a sub-query.
The second Select must project the query similarly to how I have written it above. It would be hard to confirm without digging in with a decompiler such as Reflector. Generally, if I need to select many columns, I would use a let clause like below:
from e in MainCategories
let translation = Translations.FirstOrDefault()
select new
{
    PlacementId = e.CatalogPlacementId,
    Translation = new
    {
        translation.Name,
    }
}

LINQ understanding Non-Equijoins

I use ASP.NET 4, EF 4, C#, LINQ and non-equijoins.
Below I wrote two examples of non-equijoins.
Both work fine with my model.
Because I'm pretty new to LINQ, I would like to ask:
Which syntax would you advise me to adopt in my code?
Which performs faster?
Thanks for your help.
Here are some useful links:
http://msdn.microsoft.com/en-us/library/bb882533.aspx
http://msdn.microsoft.com/en-us/library/bb311040.aspx
http://msdn.microsoft.com/en-us/library/bb310804.aspx
// Query syntax
var queryContents =
from cnt in context.CmsContents
let cntA =
from a in context.CmsContentsAssignedToes
select a.CmsContent.ContentId
where cntA.Contains(cnt.ContentId) == false
select cnt;
// Query method
var queryContents02 =
from c in context.CmsContents
where !(
from a in context.CmsContentsAssignedToes
select a.ContentId).Contains(c.ContentId)
select c;

I'd propose a third option:
var validContentIds = from a in context.CmsContentsAssignedToes
select a.ContentId;
var queryContents = from cnt in context.CmsContents
where !validContentIds.Contains(cnt.ContentId)
select cnt;
Or alternatively (and equivalently):
var validIds = context.CmsContentsAssignedToes.Select(a => a.ContentId);
var queryContents = context.CmsContents
.Where(cnt => !validIds.Contains(cnt.ContentId));
I wouldn't expect the performance to be impacted - I'd expect all of these to end up with the same SQL.

I like the first query syntax (it is more readable to me, but this part of the question is subjective), and I think the performance will be the same because the queries are actually the same. The let keyword just stores a subexpression in a variable, but the generated SQL query should be the "same".
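As a small sanity check of the claim that these forms are interchangeable, the sketch below runs the three shapes (let clause, inlined subquery, hoisted variable) over plain in-memory sequences and returns the IDs that are not assigned. It only shows that the results match in LINQ to Objects; whether a given provider emits identical SQL for each form is an assumption that would need to be checked against the actual generated SQL. UnassignedContentDemo and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnassignedContentDemo
{
    // Variant 1: the subquery is stored in a let clause.
    public static List<int> WithLet(IEnumerable<int> contentIds, IEnumerable<int> assignedIds)
    {
        return (from c in contentIds
                let assigned = from a in assignedIds select a
                where !assigned.Contains(c)
                select c).ToList();
    }

    // Variant 2: the subquery is written inline in the where clause.
    public static List<int> Inline(IEnumerable<int> contentIds, IEnumerable<int> assignedIds)
    {
        return (from c in contentIds
                where !(from a in assignedIds select a).Contains(c)
                select c).ToList();
    }

    // Variant 3: the subquery is hoisted into its own variable first.
    public static List<int> Hoisted(IEnumerable<int> contentIds, IEnumerable<int> assignedIds)
    {
        var assigned = assignedIds.Select(a => a);
        return contentIds.Where(c => !assigned.Contains(c)).ToList();
    }
}
```

The hoisted form tends to be the easiest to read, because the "assigned IDs" query gets a name of its own.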

Help Converting T-SQL to LINQ

Have the following (non-straightforward) T-SQL query, which I'm trying to convert to LINQ (to be used in a L2SQL expression):
declare @IdAddress int = 481887
select * from
(
    select top 3 p.*
    from tblProCon p
    inner join vwAddressExpanded a
        on p.IdPrimaryCity = a.IdPrimaryCity
    where a.AddressType = 3
        and p.IsPro = 1
        and a.IdAddress = @IdAddress
    order by AgreeCount desc
) as Pros
union
select * from
(
    select top 3 p.*
    from tblProCon p
    inner join vwAddressExpanded a
        on p.IdPrimaryCity = a.IdPrimaryCity
    where a.AddressType = 3
        and p.IsPro = 0
        and a.IdAddress = @IdAddress
    order by AgreeCount desc
) as Cons
order by ispro desc, AgreeCount desc
In a nutshell, I have an @IdAddress, and I'm trying to find the top 3 pros and top 3 cons for that address.
The above query does work as expected. I'm not entirely sure how to convert it to a LINQ query (never done unions before with LINQ). I don't even know where to start. :)
Query-style/Lambda accepted (prefer query-style, for readability).
Also - i have LinqPad installed - but i'm not sure how to "convert T-SQL to Linq" - is there an option for that? Bonus upvote will be awarded for that. :)
The above T-SQL query performs well, and this L2SQL query will be executed frequently, so it needs to perform pretty well.
Appreciate the help.

var baseQuery = (from p in db.tblProCon
                 join a in db.vwAddressExpanded
                     on p.IdPrimaryCity equals a.IdPrimaryCity
                 where a.AddressType == (byte) AddressType.PrimaryCity &&
                       a.IdAddress == idAddress
                 orderby p.AgreeCount descending
                 select p);
var pros = baseQuery.Where(x => x.IsPro).Take(3);
var cons = baseQuery.Where(x => !x.IsPro).Take(3);
var results = pros
    .Union(cons)
    .OrderByDescending(x => x.IsPro)
    .ThenByDescending(x => x.AgreeCount)
    .ToList();

You can call (some query expression).Union(other query expression).
You can also (equivalently) write Queryable.Union(some query expression, other query expression) for IQueryable sources, or Enumerable.Union(...) for in-memory sequences.
Note that both expressions must return the same type.
AFAIK, there are no tools that automatically convert SQL to LINQ.
(For non-trivial SQL, that's a non-trivial task.)
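The "top 3 of each kind, then union, then sort" shape can be tried out without a database. The sketch below is a minimal in-memory version using LINQ to Objects; the ProCon class and TopProsAndCons.Top3Each are hypothetical stand-ins for the real tblProCon entity, and the address join from the question is left out for brevity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of tblProCon.
public sealed class ProCon
{
    public int Id { get; set; }
    public bool IsPro { get; set; }
    public int AgreeCount { get; set; }
}

public static class TopProsAndCons
{
    public static List<ProCon> Top3Each(IEnumerable<ProCon> items)
    {
        // Shared base ordering, like the inner "order by AgreeCount desc".
        var ordered = items.OrderByDescending(p => p.AgreeCount);

        var pros = ordered.Where(p => p.IsPro).Take(3);
        var cons = ordered.Where(p => !p.IsPro).Take(3);

        // Union both halves, then apply the outer ordering:
        // pros first (true sorts above false when descending), then by AgreeCount.
        return pros.Union(cons)
                   .OrderByDescending(p => p.IsPro)
                   .ThenByDescending(p => p.AgreeCount)
                   .ToList();
    }
}
```

Because pros and cons never overlap here, Concat would return the same rows as Union; Union is kept to mirror the original T-SQL.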

What's faster? Struct array or DataTable

I am using LinqToSQL to process data from SQL Server to dump it into an iSeries server for further processing. More details on that here.
My problem is that it is taking about 1.25 minutes to process those 350 rows of data. I am still trying to decipher the results from the SQL Server Profiler, but there are a TON of queries being run. Here is a bit more detail on what I am doing:
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
var vehicles = from a in db.EquipmentMainGenerals
join b in db.EquipmentMainConditions on a.wdEquipmentMainGeneralOID equals b.wdEquipmentMainGeneralOID
where b.Retired == null
orderby a.VehicleId
select a;
et = new EquipmentTable[vehicles.Count()];
foreach (var vehicle in vehicles)
{
// Move data to the array
// Rates
GetVehcileRates(vehicle.wdEquipmentMainGeneralOID);
// Build the costs accumulators
GetPartsAndOilCosts(vehicle.VehicleId);
GetAccidentAndOutRepairCosts(vehicle.wdEquipmentMainGeneralOID);
// Last Month's Accumulators
et[i].lastMonthActualGasOil = GetFuel(vehicle.wdEquipmentMainGeneralOID) + Convert.ToDecimal(oilCost);
et[i].lastMonthActualParts = Convert.ToDecimal(partsCost);
et[i].lastMonthActualLabor = GetLabor(vehicle.VehicleId);
et[i].lastMonthActualOutRepairs = Convert.ToDecimal(outRepairCosts);
et[i].lastMonthActualAccidentCosts = Convert.ToDecimal(accidentCosts);
// Move more data to the array
i++;
}
}
The Get methods all look similar to:
private void GetPartsAndOilCosts(string vehicleKey)
{
oilCost = 0;
partsCost = 0;
using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
try
{
var costs = from a in db.WorkOrders
join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
where (monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date) && a.EquipmentID == vehicleKey
group b by c.Fuel into d
select new
{
isFuel = d.Key,
totalCost = d.Sum(b => b.Cost)
};
foreach (var cost in costs)
{
if (cost.isFuel == 1)
{
oilCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
else
{
partsCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
}
}
}
catch (InvalidOperationException e)
{
oilCost = 0;
partsCost = 0;
}
}
return;
}
My thinking here is cutting down the number of queries to the DB should speed up the processing. If LINQ does a SELECT for every record, maybe I need to load every record into memory first.
I still consider myself a beginner with C# and OOP in general (I do mostly RPG programming on the iSeries). So I am guessing I am doing something stupid. Can you help me fix my stupidity (at least with this problem)?
Update: Thought I would come back and update you on what I have discovered. It appears that the database was poorly designed. Whatever LINQ was generating in the background was highly inefficient. I am not saying LINQ is bad; it was just bad for this database. I converted to a quickly thrown together .XSD setup and the processing time went from 1.25 minutes to 15 seconds. Once I do a proper redesign, I can only guess I'll shave a few more seconds off of that. Thank you all for your comments. I'll try LINQ again some other day on a better database.

There are a few things that I spot in your code:
1. You query the database multiple times for each item in the 'var vehicles' query. You might want to rewrite that query so that fewer database queries are needed.
2. When you don't need all the properties of the queried entity, or need sub-entities of that entity, it's better for performance to use an anonymous type in your select. LINQ to SQL will analyze this and retrieve less data from your database. Such a select might look like this: select new { a.VehicleId, a.Name }
3. The query in GetPartsAndOilCosts can be optimized by putting the calculation cost.totalCost * (1 + OVERHEAD_RATE) in the LINQ query. This way the query can be executed completely in the database, which should make it much faster.
4. You are doing a Count() on the var vehicles query, but you only use it to determine the size of the array. While LINQ to SQL will turn it into a very efficient SELECT COUNT(*) query, it takes an extra round trip to the database. Besides that (depending on your isolation level), an item could be added between the count and the moment you start iterating the query. In that case your array is too small and an IndexOutOfRangeException will be thrown. You can simply use .ToArray() on the query, or create a List<EquipmentTable> and call .ToArray() on that. This will normally be fast enough, especially when you only have a few hundred items in this collection, and it will certainly be faster than an extra round trip to the database (the count).
5. As you probably already expect, the number of database queries is the actual problem. Switching between a struct array and a DataTable will not make much difference.
6. After you have optimized away as many queries as you can, start analyzing the remaining queries (using SQL Profiler) and optimize them using the Index Tuning Wizard. It will propose some new indexes that could speed things up considerably.
A little extra explanation for point #1. What you're doing here is a bit like this:
var query = from x in A select something;
foreach (var row in query)
{
    // This inner query runs once for every row of the outer query.
    var query2 = from y in data where y.Value == row.Value select something;
    foreach (var row2 in query2)
    {
        // do some computation.
    }
}
What you should try to accomplish is to remove the query2 subquery, because it is executing on each row of the top query. So you could end up with something like this:
var query =
    from x in A
    from y in B
    where x.Value == y.Value
    select something;
foreach (var row in query)
{
}
Of course this example is simplistic, and in real life it gets pretty complicated (as you've already noticed), in your case also because you've got several of those 'sub queries'. It can take you some time to get this right, especially with your lack of knowledge of LINQ to SQL (as you said yourself).
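As a rough illustration of folding a per-vehicle lookup into one query, here is a minimal in-memory sketch that computes the oil and parts totals for all vehicles in a single pass using a group join, with the overhead calculation inside the query. Vehicle, CostLine, CostRollup and TotalsByVehicle are hypothetical names, and the data shapes are simplified guesses rather than the real CarteGraph schema. With LINQ to SQL the same query shape would run over the table properties instead of in-memory collections, which is untested here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the real entities.
public sealed class Vehicle
{
    public string VehicleId { get; set; }
}

public sealed class CostLine
{
    public string VehicleId { get; set; }
    public bool IsFuel { get; set; }
    public decimal Cost { get; set; }
}

public static class CostRollup
{
    // One query for all vehicles instead of one query per vehicle.
    public static Dictionary<string, (decimal Oil, decimal Parts)> TotalsByVehicle(
        IEnumerable<Vehicle> vehicles, IEnumerable<CostLine> lines, decimal overheadRate)
    {
        var query =
            from v in vehicles
            // Group join: every vehicle is kept, even when it has no cost lines.
            join l in lines on v.VehicleId equals l.VehicleId into vehicleLines
            select new
            {
                v.VehicleId,
                Oil = vehicleLines.Where(l => l.IsFuel).Sum(l => l.Cost) * (1 + overheadRate),
                Parts = vehicleLines.Where(l => !l.IsFuel).Sum(l => l.Cost) * (1 + overheadRate)
            };

        return query.ToDictionary(r => r.VehicleId, r => (r.Oil, r.Parts));
    }
}
```

The loop over vehicles can then read its totals from the dictionary instead of issuing a new query on every iteration.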
If you can't figure it out, you can always ask again here at Stackoverflow, but please remember to strip your problem to the smallest possible thing, because it's no fun to read over someone's mess (we're not getting paid for this) :-)
Good luck.

LINQ to SQL: Complicated query with aggregate data for a report from multiple tables for an ordering system

I want to convert the following query into LINQ syntax. I am having a great deal of trouble managing to get it to work. I actually tried starting from LINQ, but found that I might have better luck if I wrote it the other way around.
SELECT
pmt.guid,
pmt.sku,
pmt.name,
opt.color,
opt.size,
SUM(opt.qty) AS qtySold,
SUM(opt.qty * opt.itemprice) AS totalSales,
COUNT(omt.guid) AS betweenOrders
FROM
products_mainTable pmt
LEFT OUTER JOIN
orders_productsTable opt ON opt.products_mainTableGUID = pmt.guid
LEFT OUTER JOIN orders_mainTable omt ON omt.guid = opt.orders_mainTableGUID AND
(omt.flags & 1) = 1
GROUP BY
pmt.sku, opt.color, opt.size, pmt.guid, pmt.name
ORDER BY
pmt.sku
The end result is a table that shows me information about a product as you can see above.
How do I write this query, in LINQ form, using comprehension syntax ?
Additionally, I may want to add additional filters (to the orders_mainTable, for instance).
Here is one example that I tried to make work, and was fairly close but am not sure if it's the "correct" way, and was not able to group it by size and color from the orders_productsTable.
from pmt in products_mainTable
let Purchases =
from opt in pmt.orders_productsTable
where ((opt.orders_mainTable.flags & 1) == 1)
where ((opt.orders_mainTable.date_completedon > Convert.ToDateTime("01/01/2009 00:00:00")))
select opt
orderby pmt.sku
select new {
pmt.guid,
pmt.sku,
pmt.name,
pmt.price,
AvgPerOrder = Purchases.Average(p => p.qty).GetValueOrDefault(0),
QtySold = Purchases.Sum(p => p.qty).GetValueOrDefault(),
SoldFor = Purchases.Sum(p => p.itemprice * p.qty).GetValueOrDefault()
}
Edit:
To be a little more explicit so you can understand what I am trying to do, here is some more explanation.
Products are stored in products_mainTable
Orders are stored in orders_mainTable
Products That Have Been Ordered are stored in orders_productsTable
I want to create several reports based on products, orders, etc. drilling into the data and finding meaningful bits to display to the end user.
In this instance, I am trying to show which products have been purchased over a period of time, and are the most popular. How many sold, for what price, and what is the breakout per order. Maybe not the best order, but I'm just experimenting and picked this one.
All of the tables have relationships to other tables. So from the product table, I can get to what orders ordered that product, etc.
The largest problem I am having, is understanding how LINQ works, especially with grouping, aggregate data, extensions, subqueries, etc. It's been fun, but it's starting to get frustrating because I am having difficulty finding detailed explanations on how to do this.

I'm also a beginner in LINQ. I don't know if this is the right way of grouping by several fields, but I think you have to combine the grouping fields into a single representative key. So, assuming that all your grouping fields are strings or ints, you can make a key as follows:
var qry = from pmt in products_mainTable
join opt in orders_productsTable on pmt.guid equals opt.products_mainTableGUID
join omt in orders_mainTable on opt.orders_mainTableGUID equals omt.guid
where (opt.orders_mainTable.flags & 1) == 1
group omt by pmt.sku + opt.price + opt.size + pmt.guid + pmt.name into g
orderby g.sku
select new
{
g.FirstOrDefault().guid,
g.FirstOrDefault().sku,
g.FirstOrDefault().name,
g.FirstOrDefault().color,
g.FirstOrDefault().price,
AvgPerOrder = g.Average(p => p.qty).GetValueOrDefault(0),
QtySold = g.Sum(p => p.qty).GetValueOrDefault(),
SoldFor = g.Sum(p => p.itemprice * p.qty).GetValueOrDefault()
};
I didn't test this so please see if this helps you in any way.

Bruno, thank you so much for your assistance! The FirstOrDefault() was probably the largest help. Following some of what you did, and another resource I came up with the following that seems to work beautifully! This LINQ query below gave me nearly an exact replication of the SQL I posted above.
Here's the other resource I found on doing a LEFT OUTER JOIN in LINQ: Blog Post
Final Answer:
from pmt in products_mainTable
join opt in orders_productsTable on pmt.guid equals opt.products_mainTableGUID into tempProducts
from orderedProducts in tempProducts.DefaultIfEmpty()
join omt in orders_mainTable on orderedProducts.orders_mainTableGUID equals omt.guid into tempOrders
from ordersMain in tempOrders.DefaultIfEmpty()
group pmt by new { pmt.sku, orderedProducts.color, orderedProducts.size } into g
orderby g.FirstOrDefault().sku
select new {
g.FirstOrDefault().guid,
g.Key.sku,
g.Key.size,
QTY = g.FirstOrDefault().orders_productsTable.Sum(c => c.qty),
SUM = g.FirstOrDefault().orders_productsTable.Sum(c => c.itemprice * c.qty),
AVG = g.FirstOrDefault().orders_productsTable.Average(c => c.itemprice * c.qty),
Some = g.FirstOrDefault().orders_productsTable.Average(p => p.qty).GetValueOrDefault(0),
}
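For readers who want to experiment with the LEFT OUTER JOIN plus multi-column grouping pattern outside a database, here is a minimal in-memory sketch. It groups the joined order lines (rather than the products) by an anonymous-type key and aggregates inside each group, which is one possible way to get per-color, per-size totals. Product, OrderLine, ReportRow and SalesReport.Build are hypothetical names, not the poster's schema, and the null checks are needed only because LINQ to Objects yields null for unmatched rows.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Product
{
    public int Id { get; set; }
    public string Sku { get; set; }
}

public sealed class OrderLine
{
    public int ProductId { get; set; }
    public string Color { get; set; }
    public string Size { get; set; }
    public int Qty { get; set; }
    public decimal ItemPrice { get; set; }
}

public sealed class ReportRow
{
    public string Sku { get; set; }
    public string Color { get; set; }
    public string Size { get; set; }
    public int QtySold { get; set; }
    public decimal TotalSales { get; set; }
}

public static class SalesReport
{
    public static List<ReportRow> Build(IEnumerable<Product> products, IEnumerable<OrderLine> lines)
    {
        var query =
            from p in products
            join l in lines on p.Id equals l.ProductId into productLines
            // DefaultIfEmpty turns the group join into a left outer join:
            // "line" is null for products that were never ordered.
            from line in productLines.DefaultIfEmpty()
            group line by new { p.Sku, Color = line?.Color, Size = line?.Size } into g
            orderby g.Key.Sku
            select new ReportRow
            {
                Sku = g.Key.Sku,
                Color = g.Key.Color,
                Size = g.Key.Size,
                QtySold = g.Sum(x => x?.Qty ?? 0),
                TotalSales = g.Sum(x => x == null ? 0m : x.Qty * x.ItemPrice)
            };

        return query.ToList();
    }
}
```

Products without any order lines still show up, with a null color and size and zero totals, which mirrors what the LEFT OUTER JOIN in the SQL version produces.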

This was very helpful to me, thanks. I had a similar issue I was trying to sort through, only my case was much simpler as I didn't have any joins in it. I was simply trying to group by one field and get the min of another plus the count (min and count in the same query).
Here is the SQL I wanted to recreate in Linq syntax:
select t.Field1, min(t.Field2), COUNT(*)
from SomeTable t
group by t.Field1
order by t.Field1
Thanks to your post I eventually managed to come up with this:
from t in SomeTable
group t by new { t.Field1 } into g
orderby g.Key.Field1
select new
{
g.Key.Field1,
code = g.Min(c => c.Field2),
qty = g.Count()
}
Which creates the following SQL behind the scenes:
SELECT [t1].[Field1], [t1].[value] AS [code], [t1].[value2] AS [qty]
FROM (
SELECT MIN([t0].[Field2]) AS [value], COUNT(*) AS [value2], [t0].[Field1]
FROM [SomeTable] AS [t0]
GROUP BY [t0].[Field1]
) AS [t1]
ORDER BY [t1].[Field1]
Perfect, exactly what I was looking to do. The key for me was that you showed it possible to do this inside the new {} which is something I had never considered trying. This is huge, I now feel like I have a significantly better understanding going forward.
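The same group, min and count query can be tried against in-memory data. The sketch below is a minimal LINQ to Objects version; GroupStats.Summarize and the tuple shape of the rows are hypothetical, chosen only to keep the example self-contained. With a single grouping field, the anonymous-type wrapper around the key is optional, so this version groups by the field directly.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupStats
{
    // Returns one row per distinct Field1: the key, the minimum Field2, and the row count.
    public static List<(string Field1, int Min, int Count)> Summarize(
        IEnumerable<(string Field1, int Field2)> rows)
    {
        return (from t in rows
                group t by t.Field1 into g
                orderby g.Key
                select (Field1: g.Key, Min: g.Min(c => c.Field2), Count: g.Count()))
               .ToList();
    }
}
```

Several aggregates over the same group (Min, Count, Sum, and so on) can sit side by side in one projection.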
