I'm just wondering if anyone can offer any advice on how to improve my query.
Basically, it'll be merging 2 rows into 1. The only thing the rows will differ by is a 'Type' char column ('S' or 'C') and the Value. What I want to do is select one row, with the 'S' value and the 'C' value, and calculate the difference (S-C).
My query works, but it's pretty slow - it takes around 8 seconds to get the results, which is not ideal for my application. I wish I could change the database structure but I can't sadly!
Here is my query:
var sales = (from cm in dc.ConsignmentMarginBreakdowns
             join sl in dc.SageAccounts on new { LegacyID = cm.Customer, Customer = true } equals new { LegacyID = sl.LegacyID, Customer = sl.Customer }
             join ss in dc.SageAccounts on sl.ParentAccount equals ss.ID
             join vt in dc.VehicleTypes on cm.ConsignmentTripBreakdown.VehicleType.Trim() equals vt.ID.ToString() into vtg
             where cm.ConsignmentTripBreakdown.DeliveryDate >= dates.FromDate && cm.ConsignmentTripBreakdown.DeliveryDate <= dates.ToDate
             where (customer == null || ss.SageID == customer)
             where cm.BreakdownType == 'S'
             orderby cm.Depot, cm.TripNumber
             select new
             {
                 NTConsignment = cm.NTConsignment,
                 Trip = cm.ConsignmentTripBreakdown,
                 LegacyID = cm.LegacyID,
                 Costs = dc.ConsignmentMarginBreakdowns
                     .Where(a => a.BreakdownType == 'C' && a.NTConsignment == cm.NTConsignment && a.LegacyID == cm.LegacyID && a.TripDate == cm.TripDate && a.Depot == cm.Depot && a.TripNumber == cm.TripNumber)
                     .Single().Value,
                 Sales = cm.Value ?? 0.00m,
                 Customer = cm.Customer,
                 SageID = ss.SageID,
                 CustomerName = ss.ShortName,
                 FullCustomerName = ss.Name,
                 Vehicle = cm.ConsignmentTripBreakdown.Vehicle ?? "None",
                 VehicleType = vtg.FirstOrDefault().VehicleTypeDescription ?? "Subcontractor"
             });
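The Costs column runs a separate Single() lookup for every 'S' row. One common way to restructure an S/C pair like this is to group both rows by their shared key and read the two values out of each group in a single pass. The sketch below shows only that pivot step, using LINQ to Objects over hypothetical data: the tuple layout and values are made up, and only two of the real key columns (NTConsignment, TripNumber) are modelled. Whether LINQ to SQL turns the equivalent query into one efficient GROUP BY is not guaranteed, so check the generated SQL in the profiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for ConsignmentMarginBreakdowns. The real query
// also matches LegacyID, TripDate and Depot; they are left out to keep this short.
var breakdowns = new List<(string NTConsignment, int TripNumber, char BreakdownType, decimal? Value)>
{
    ("C001", 1, 'S', 120.00m),
    ("C001", 1, 'C', 80.00m),
    ("C002", 2, 'S', 50.00m),
    ("C002", 2, 'C', null),
};

// Group the 'S' and 'C' rows that share a key and take both values from the group,
// instead of issuing one Single() lookup per sales row.
var margins = breakdowns
    .GroupBy(b => new { b.NTConsignment, b.TripNumber })
    .Select(g => new
    {
        g.Key.NTConsignment,
        g.Key.TripNumber,
        // SQL SUM yields NULL for no rows; LINQ to Objects yields 0. The ?? covers both.
        Sales = g.Where(b => b.BreakdownType == 'S').Sum(b => b.Value) ?? 0m,
        Costs = g.Where(b => b.BreakdownType == 'C').Sum(b => b.Value) ?? 0m
    })
    .Select(m => new { m.NTConsignment, m.TripNumber, m.Sales, m.Costs, Margin = m.Sales - m.Costs })
    .OrderBy(m => m.TripNumber)
    .ToList();

foreach (var m in margins)
    Console.WriteLine($"{m.NTConsignment} trip {m.TripNumber}: {m.Sales} - {m.Costs} = {m.Margin}");
```

The joins to SageAccounts and VehicleTypes would then be applied to the grouped result rather than to every raw row.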
A good place to start when optimizing Linq to SQL queries is the SQL Server Profiler. There you can find what SQL code is being generated by Linq to SQL. From there, you can toy around with the linq query to see if you can get it to write a better query. If that doesn't work, you can always write a stored procedure by hand, and then call it from Linq to SQL.
There really isn't enough information supplied to form an informed opinion. For example, how many rows are in each of the tables? What does the generated T-SQL look like?
One thing I would suggest first is to take the output T-SQL, generate a query plan, and look for table or index scans.
I am using Entity Framework 5 to access my DB. The model is quite complex with a lot of navigation properties. I have written the following query using linq:
var myQuery =
    from cp in context.ClosedPositions
        .Include("Position")
        .Include("Position.Folder")
        .Include("Position.Strategy")
        .Include("Position.Symbol")
        .Include("Position.StopTargetPlacer")
    where cp.Position.EntryDate >= fromDT &&
          cp.ExitDate <= toDT &&
          (cp.Position.Folder.FolderCode == myFolder || showAllFolders) &&
          (cp.Position.Strategy.Name == myStrategy || showAllStrategies) &&
          (cp.Position.Symbol.Name == mySymbol || showAllSymbols) &&
          (cp.Position.Symbol.Exchange == myExchange || showAllExchanges)
    orderby cp.Position.EntryDate
    select cp;
The navigation multiplicities are as follows:
Position 1 - * ClosedPosition
Position * - 1 Folder
Position * - 1 Strategy
Position * - 1 Symbol
Then, in a foreach loop, I use the data from the included navigation properties. I think this way it should not hit the database more than once. The query runs for about 6 seconds.
Then I have rewritten the query to this:
var myQuery =
    from cp in context.ClosedPositions
    join p in context.Positions on cp.PositionID equals p.ID
    join f in context.Folders on p.FolderID equals f.ID
    join sy in context.Symbols on p.SymbolID equals sy.ID
    join st in context.Strategies on p.StrategyID equals st.ID
    join stp in context.StopTargetPlacers on p.StopTargetPlacerID equals stp.ID
    where p.EntryDate >= fromDT &&
          cp.ExitDate <= toDT &&
          (f.FolderCode == myFolder || showAllFolders) &&
          (st.Name == myStrategy || showAllStrategies) &&
          (sy.Name == mySymbol || showAllSymbols) &&
          (sy.Exchange == myExchange || showAllExchanges)
    orderby p.EntryDate
    select new
    {
        ClosedPositionID = cp.ID,
        PositionID = p.ID,
        p.EntryChartID,
        cp.ExitChartID,
        p.EntryDate,
        cp.ExitDate,
        Symbol = sy.Name,
        Strategy = st.Name,
        p.Size,
        cp.Profit,
        STPlacer = p.StopTargetPlacer.Name,
        InitialRisk = p.InitialRisk,
        StrategyDirection = st.Direction
    };
Again I have used the same foreach loop to work on the data. This time the total processing time was only around 1 second.
I examined the SQL generated by both LINQ queries and ran each in SSMS; both returned the same data in the same amount of time.
My question is: why is there such a large difference between projecting into an anonymous class and materializing an entity class from the context's model?
OK, after some research I found that in the first case EF builds up its change-tracking structures in the background, and in the second case, since I am using an anonymous class, this does not happen. The solution was the AsNoTracking function:
var myQuery =
    from cp in context.ClosedPositions
        .Include("Position")
        .AsNoTracking()
        .Include("Position.Folder")
        .Include("Position.Strategy")
        .Include("Position.Symbol")
        .Include("Position.StopTargetPlacer")
    where cp.Position.EntryDate >= fromDT &&
          cp.ExitDate <= toDT &&
          (cp.Position.Folder.FolderCode == myFolder || showAllFolders) &&
          (cp.Position.Strategy.Name == myStrategy || showAllStrategies) &&
          (cp.Position.Symbol.Name == mySymbol || showAllSymbols) &&
          (cp.Position.Symbol.Exchange == myExchange || showAllExchanges)
    orderby cp.Position.EntryDate
    select cp;
How many records are we talking about?
Have you disabled Entity Tracking at the context level?
Remember that when EF materializes an entity with tracking enabled, it has to check every record coming from the database to make sure it does not materialize the same object twice. Tracking is also more expensive because every entity has to be registered with the context (more code to execute).
When you materialize an anonymous type the context does not worry about all this.
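To make the difference concrete, here is a deliberately simplified model of identity resolution. It is not EF's actual implementation (the real state manager also snapshots values for change detection and fixes up relationships); the rows and the dictionary-based map are hypothetical. With tracking, each incoming row is checked against a key-to-instance map; without tracking, each row simply becomes a new object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows as they might come back from a join: Position 1 appears
// several times because several ClosedPositions point at it.
var rows = new[] { (Id: 1, Name: "A"), (Id: 2, Name: "B"), (Id: 1, Name: "A"), (Id: 1, Name: "A") };

// Tracked materialization: look every row up in an identity map so the same key
// always resolves to the same object instance.
var identityMap = new Dictionary<int, object>();
var tracked = new List<object>();
int lookups = 0;
foreach (var row in rows)
{
    lookups++;
    if (!identityMap.TryGetValue(row.Id, out var entity))
    {
        entity = new { row.Id, row.Name }; // stands in for creating and registering an entity
        identityMap.Add(row.Id, entity);
    }
    tracked.Add(entity);
}

// Untracked materialization: no lookup and no registration, just a new object per row.
var untracked = new List<object>();
foreach (var row in rows)
    untracked.Add(new { row.Id, row.Name });

Console.WriteLine($"tracked: {identityMap.Count} distinct instances after {lookups} lookups");
Console.WriteLine($"untracked: {untracked.Count} instances, no lookups");
```

The per-row lookup and registration is the extra work AsNoTracking (or a projection) skips; the trade-off is that duplicate rows no longer collapse into one shared instance.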
It looks to me like the first query pulls back every property of every entity reached through the included navigation properties, even the ones you don't need, whereas the second query uses projection, so it only retrieves the properties you specifically asked for: in this instance only the strategy's Name, for example, ignoring everything else that belongs to a strategy.
Although this is oversimplified, imagine the following scenario (MyTable has X columns).
The first query is much like:
SELECT * FROM MyTable
whereas projection enables EF to be a lot more specific:
SELECT column1, column2, column3 FROM MyTable
And that is why you are seeing a much faster query.
Sometimes, when I write LINQ queries and use them inside a loop, performance becomes very slow.
var query1 = from c in db.Classes
             where c.TeacherId.Equals(teacherId)
             select c;
// AnsweredAssignment Query
var query2 = (from c in db.AnsweredAssignments
              where c.AssignmentId == assignmentId && c.Student.Class.TeacherId.Equals(teacherId)
              select c).ToArray();
// Tokens Query
var query3 = (from c in db.Tokens
              where c.AssignmentId == assignmentId && c.Student.Class.TeacherId.Equals(teacherId)
              select c).ToArray();
// OverwrittenScores Query
var query4 = (from os in db.OverwrittenScores
              where os.AssignmentId == assignmentId && os.Student.Class.TeacherId.Equals(teacherId)
              select os).ToArray();
foreach (var c in query1)
{
    foreach (var s in c.Students)
    {
        var aaItems = (from aa in query2
                       where aa.StudentId == s.StudentId
                       select aa).ToArray();
        // Generate scores for objectives
        var id3 = (from aa in aaItems
                   where !aa.IsMakeup
                   orderby aa.Score descending
                   select aa).FirstOrDefault();
        if (id3 != null)
        {
            var aa3 = (from aa in query2
                       where aa.AnsweredAssignmentId == id3.AnsweredAssignmentId
                       select aa).SingleOrDefault();
            ...
        }
        var tokens = (from t in query3
                      where t.StudentId == s.StudentId
                      select new MonitorByGeneralScoreToAnsweredAssignment(AssignmentStatus.Pending)).ToList();
        ...
        // Does any overwritten score exist?
        var osItem = query4.Where(os => os.StudentId == s.StudentId).SingleOrDefault();
        ...
    }
}
What I'm doing now is fetching the records I'm going to use up front, instead of getting them one by one inside the loop. Is this a good practice? Sometimes I suspect I'm not doing a good job :(
Once I've got the records, I keep them in memory and use LINQ to Objects to pick out each record.
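Loading the data once, as described, is the right direction. One further refinement (a suggestion, not something from the question) is to index the in-memory arrays by StudentId once, with ToLookup or ToDictionary, so the loop does not rescan query2, query3 and query4 for every student. The tuples below are hypothetical stand-ins for those arrays. Assuming AnsweredAssignmentId is the primary key, the separate search for aa3 returns the same row as id3, so it could likely be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the arrays already loaded from the database.
var answered = new[]
{
    (AnsweredAssignmentId: 10, StudentId: 1, Score: 70, IsMakeup: false),
    (AnsweredAssignmentId: 11, StudentId: 1, Score: 90, IsMakeup: false),
    (AnsweredAssignmentId: 12, StudentId: 2, Score: 95, IsMakeup: true),
};
var tokens = new[] { (TokenId: 100, StudentId: 2) };
var studentIds = new[] { 1, 2, 3 };

// Build the per-student indexes once; each later access is a hash probe
// instead of a scan over the whole array.
var answeredByStudent = answered.ToLookup(a => a.StudentId);
var tokenCountByStudent = tokens.GroupBy(t => t.StudentId).ToDictionary(g => g.Key, g => g.Count());

var best = new Dictionary<int, int?>();
foreach (var studentId in studentIds)
{
    // A lookup returns an empty sequence for a missing key, so no null check is needed.
    var top = answeredByStudent[studentId]
        .Where(a => !a.IsMakeup)
        .OrderByDescending(a => a.Score)
        .Select(a => (int?)a.AnsweredAssignmentId)
        .FirstOrDefault();
    best[studentId] = top;

    int tokenCount = tokenCountByStudent.TryGetValue(studentId, out var n) ? n : 0;
    Console.WriteLine($"student {studentId}: best non-makeup = {top?.ToString() ?? "none"}, tokens = {tokenCount}");
}
```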
So remember that making calls to a database will always be slow. In fact, it's often the slowest part of most applications. Thus, you should strive to return a lot of stuff at once, rather than trying to get items one at a time.
Strive to rewrite your queries so that you return as much of the required information in one go as possible. Although you might use up more memory, it's more often than not worth it for the time savings. Connecting to databases is slow!
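The cost being described is paid per round trip, not per row. A toy model makes the arithmetic visible: the local function Query below is a hypothetical stand-in for a database call that just counts how often it is invoked. Real latency and payload size are not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical table: 300 score rows spread over 30 students (StudentId 0..29).
var table = Enumerable.Range(1, 300).Select(i => (StudentId: i % 30, Score: i)).ToList();
int roundTrips = 0;

// Each call models one query sent to the database server.
List<(int StudentId, int Score)> Query(Func<(int StudentId, int Score), bool> filter)
{
    roundTrips++;
    return table.Where(filter).ToList();
}

var studentIds = Enumerable.Range(0, 30).ToList();

// One query per student inside the loop: one round trip each.
roundTrips = 0;
var perStudent = studentIds.ToDictionary(id => id, id => Query(r => r.StudentId == id).Count);
int perStudentTrips = roundTrips;

// One query for all students, then grouped in memory: a single round trip.
roundTrips = 0;
var batched = Query(r => true).GroupBy(r => r.StudentId).ToDictionary(g => g.Key, g => g.Count());
int batchedTrips = roundTrips;

Console.WriteLine($"per student: {perStudentTrips} round trips; batched: {batchedTrips}");
```

Both approaches produce the same counts per student; only the number of calls differs, and that gap grows linearly with the number of students.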
Secondly, (last I checked) Entity Framework uses reflection to set properties on your objects. Reflection is also very slow, which is why, despite EF's cool factor, I still prefer to write my queries by hand. The performance is significantly better (but of course this introduces another layer of complication, since now you're dealing not with one language, C#, but two, C# and SQL, which are conceptually very different).
I am trying to join tables using LINQ by matching columns where a column in the joined table is equal to a variable or the variable is null (at which point the join still needs to happen just not on that field).
My LINQ is something like:
var data = (
from lt in cxt.CmsPageRow
join page in cxt.CmsPage on new { lt.CmsPageID, cmsSiteID.Value } equals new { page.CmsPageID, page.CmsSiteID }
...
cmsSiteID is a nullable INT.
I cannot compile my code as it is complaining about "Type inference failed in the call to 'Join'."
On top of that I need to only join on page.CmsSiteID when cmsSiteID is not null. If cmsSiteID is null then the join on lt.CmsPageID still needs to happen.
* EDIT *
The question has kind of changed now. I can get it to do what I want by using a WHERE clause on the join in my LINQ.
join page in cxt.CmsPage.Where(p=>(cmsSiteID==0||p.CmsSiteID==cmsSiteID)) on lt.CmsPageID equals page.CmsPageID
However, this still runs slow. If I change the parameter passed through to a literal it executes instantly.
Slow runner
(@p__linq__1 = 0 OR [Extent2].[CmsSiteID] = @p__linq__1)
Fast runner
(267 = 0 OR [Extent2].[CmsSiteID] = 267)
Is there a way to speed this up?
A join in LINQ is an inner join (no nulls). Try pulling the null handling out into a separate where clause. I think something along these lines should work for what you're describing:
from lt in cxt.CmsPageRow
join page in cxt.CmsPage on lt.CmsPageID equals page.CmsPageID
where cmsSiteID == null ||
      (cmsSiteID != null && (page.CmsSiteID == null || page.CmsSiteID == cmsSiteID.Value))
select ...
Update
I didn't realize that performance was an issue for you. In that case, I'd suggest creating a different query structure based on values that are known at run-time and don't depend on individual rows:
var rows =
    from lt in cxt.CmsPageRow
    join page in cxt.CmsPage on lt.CmsPageID equals page.CmsPageID
    select new { lt, page };
if (cmsSiteID != null)
{
    rows = rows.Where(r => r.page.CmsSiteID == null ||
                           r.page.CmsSiteID == cmsSiteID.Value);
}
var data = rows.Select(...);
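A likely explanation for the slow/fast difference (hedged, since no plan was posted): with the literal 267, SQL Server can discard "267 = 0" while compiling and seek on CmsSiteID, whereas a cached plan for the parameterized OR must stay valid for any parameter value, which can rule out a seek. Composing the filter conditionally avoids sending that OR at all. The sketch below checks only the composition logic: hypothetical in-memory rows are wrapped with AsQueryable() as a stand-in for cxt.CmsPage, and the helper name PageIdsFor is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical CmsPage rows; AsQueryable() stands in for the EF set.
var pages = new[]
{
    new { CmsPageID = 1, CmsSiteID = (int?)267 },
    new { CmsPageID = 2, CmsSiteID = (int?)300 },
    new { CmsPageID = 3, CmsSiteID = (int?)null },
}.AsQueryable();

List<int> PageIdsFor(int? cmsSiteID)
{
    var query = pages;
    // Attach the site filter only when a site was supplied. When it is null, no
    // predicate is added, so the query carries no "@p = 0 OR ..." branch.
    if (cmsSiteID != null)
        query = query.Where(p => p.CmsSiteID == null || p.CmsSiteID == cmsSiteID.Value);
    return query.Select(p => p.CmsPageID).ToList();
}

Console.WriteLine(string.Join(",", PageIdsFor(null))); // 1,2,3
Console.WriteLine(string.Join(",", PageIdsFor(267)));  // 1,3
```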
Also, if your data context is set up right, you should be able to use navigation properties to simplify your code somewhat.
IQueryable<CmsPageRow> rows = cxt.CmsPageRow;
if (cmsSiteID != null)
{
    rows = rows.Where(r => r.CmsPage.CmsSiteID == null ||
                           r.CmsPage.CmsSiteID == cmsSiteID.Value);
}
var data = rows.Select(...);
I retrieve data from two different repositories:
List<F> allFs = fRepository.GetFs().ToList();
List<E> allEs = eRepository.GetEs().ToList();
Now I need to join them so I do the following:
var EFs = from c in allFs.AsQueryable()
          join e in allEs on c.SerialNumber equals e.FSerialNumber
          where e.Year == Convert.ToInt32(billingYear) &&
                e.Month == Convert.ToInt32(billingMonth)
          select new EReport
          {
              FSerialNumber = c.SerialNumber,
              FName = c.Name,
              IntCustID = Convert.ToInt32(e.IntCustID),
              TotalECases = 0,
              TotalPrice = "$0"
          };
How can I make this LINQ query better so it will run faster? I would appreciate any suggestions.
Thanks
Unless you're able to create one repository that contains both pieces of data, which would be far preferable, I can see the following things that might speed up the process:
1. Since you're always filtering all E's by Month and Year, do that before calling ToList on the IQueryable. That way you reduce the number of E's in the join (probably considerably).
2. Since you're only using a subset of fields from E and F, you can use an anonymous type to limit the amount of data to transfer.
3. Depending on how many serial numbers you're retrieving from the F's, you could filter your E's by serial number in the database (or vice versa). But if most of the serial numbers are expected in both sets, that doesn't really help you much further.
A likely reason you can't combine the repositories into one is that the data comes from two separate databases.
The code, updated with the above mentioned points 1 and 2 would be similar to this:
var allFs = fRepository.GetFs().Select(f => new { f.Name, f.SerialNumber }).ToList();
int year = Convert.ToInt32(billingYear);
int month = Convert.ToInt32(billingMonth);
var allEs = eRepository.GetEs()
    .Where(e => e.Year == year && e.Month == month)
    .Select(e => new { e.FSerialNumber, e.IntCustID })
    .ToList();
var EFs = from c in allFs
          join e in allEs on c.SerialNumber equals e.FSerialNumber
          select new EReport
          {
              FSerialNumber = c.SerialNumber,
              FName = c.Name,
              IntCustID = Convert.ToInt32(e.IntCustID),
              TotalECases = 0,
              TotalPrice = "$0"
          };
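The third suggestion, filtering the E's by serial number, isn't in the code above. A minimal sketch, with hypothetical tuples standing in for the F and E records: collect the serial numbers from one side and restrict the other side with Contains. Against an IQueryable, LINQ to SQL and Entity Framework generally translate Contains on a local list into an SQL IN (...) clause; very long lists make that clause large, so this pays off mainly when the serial list is fairly small.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical F and E rows; in the real code these come from fRepository and eRepository.
var fs = new[] { (SerialNumber: "F1", Name: "Alpha"), (SerialNumber: "F2", Name: "Beta") };
var es = new[]
{
    (FSerialNumber: "F1", IntCustID: "7", Year: 2013, Month: 5),
    (FSerialNumber: "F9", IntCustID: "8", Year: 2013, Month: 5),
    (FSerialNumber: "F2", IntCustID: "9", Year: 2012, Month: 5),
};
int year = 2013, month = 5;

// Restrict E's to the billing period and to serials that exist on the F side.
var serials = fs.Select(f => f.SerialNumber).ToList();
var filteredEs = es
    .Where(e => e.Year == year && e.Month == month && serials.Contains(e.FSerialNumber))
    .ToList();

// The join now only sees rows that can actually pair up.
var report = (from f in fs
              join e in filteredEs on f.SerialNumber equals e.FSerialNumber
              select new { FSerialNumber = f.SerialNumber, FName = f.Name, IntCustID = Convert.ToInt32(e.IntCustID) })
             .ToList();

foreach (var r in report)
    Console.WriteLine($"{r.FSerialNumber} {r.FName} {r.IntCustID}");
```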
I am thoroughly frustrated right now. I am having an issue with LINQ-To-SQL. About 80% of the time, it works great and I love it. The other 20% of the time, the query that L2S creates returns the correct data, but when actually running it from code, it doesn't return anything. I am about to pull my hair out. I am hoping somebody can see a problem or has heard of this before. Google searching isn't returning much of anything.
Here is the linq query...
var query = from e in DataLayerGlobals.GetInstance().db.MILLERTIMECARDs
            where e.deleted_by == -1
               && e.LNAME == lastName
               && e.FNAME == firstName
               && e.TIMECARDDATE == startDate.ToString("MM/dd/yyyy")
            group e by e.LNAME into g
            select new EmployeeHours
            {
                ContractHours = g.Sum(e => e.HRSCONTRACT),
                MillerHours = g.Sum(e => e.HRSSHOWRAIN + e.HRSOTHER),
                TravelHours = g.Sum(e => e.HRSTRAVEL)
            };
This is the generated query....
SELECT SUM([t0].[HRSCONTRACT]) AS [ContractHours],
       SUM([t0].[HRSSHOWRAIN] + [t0].[HRSOTHER]) AS [MillerHours],
       SUM([t0].[HRSTRAVEL]) AS [TravelHours]
FROM [dbo].[MILLERTIMECARD] AS [t0]
WHERE ([t0].[deleted_by] = @p0)
  AND ([t0].[LNAME] = @p1)
  AND ([t0].[FNAME] = @p2)
  AND ([t0].[TIMECARDDATE] = @p3)
GROUP BY [t0].[LNAME]
Now when I plug in the EXACT same values that the linq query is using into the generated query, I get the correct data. When I let the code run, I get nothing.
Any ideas?
What type is TIMECARDDATE? Date, datetime, datetime2, smalldatetime, datetimeoffset or character?
Is there any chance local date/time settings are messing up the date comparison of startDate.ToString(...)? Since you're sending @p3 as a string, 01/02/2009 may mean Feb 1st or January 2nd, depending on the date/time settings on the server.
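The client side can add its own twist. startDate.ToString("MM/dd/yyyy") formats with the current thread's culture, and in a .NET custom format string "/" is the culture's date separator, not a literal slash. The sketch below uses a cloned invariant culture with "." as its separator (a hypothetical stand-in for a culture such as de-DE) to show that effect, plus the month-first versus day-first ambiguity. It does not model how SQL Server itself converts the string.

```csharp
using System;
using System.Globalization;

var startDate = new DateTime(2009, 1, 2);

// Hypothetical culture whose date separator is "." (clones of read-only cultures are writable).
var dotCulture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
dotCulture.DateTimeFormat.DateSeparator = ".";

string invariant = startDate.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture); // 01/02/2009
string dotted = startDate.ToString("MM/dd/yyyy", dotCulture);                       // 01.02.2009
string quoted = startDate.ToString("MM'/'dd'/'yyyy", dotCulture);                   // 01/02/2009 (quoted slash is literal)

// The same text reads as two different dates depending on the assumed field order.
var asMonthFirst = DateTime.ParseExact("01/02/2009", "MM/dd/yyyy", CultureInfo.InvariantCulture);
var asDayFirst = DateTime.ParseExact("01/02/2009", "dd/MM/yyyy", CultureInfo.InvariantCulture);

Console.WriteLine($"{invariant} | {dotted} | {quoted}");
Console.WriteLine($"{asMonthFirst:yyyy-MM-dd} vs {asDayFirst:yyyy-MM-dd}");
```

Passing CultureInfo.InvariantCulture, or comparing against a DateTime parameter instead of a string when TIMECARDDATE is a date type, removes the client-side part of the ambiguity.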
My instinct is telling me that you need to be pulling out DataLayerGlobals.GetInstance().db.MILLERTIMECARDs into an IQueryable variable and executing your Linq query against that, although there really should be no difference at all (other than maybe better readability).
You can check the results of the IQueryable variable first, before running the Linq query against it.
To extend this concept a bit further, you can create a series of IQueryable variables that each store the results of a Linq query using each individual condition in the original query. In this way, you should be able to isolate the condition that is failing.
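A minimal sketch of that isolation technique, run against hypothetical in-memory rows (AsQueryable() stands in for the MILLERTIMECARDs table, and the stored date text is deliberately in a different format so one step fails). Each variable adds one condition to the previous one, and the first count that drops to zero points at the failing condition.

```csharp
using System;
using System.Linq;

string lastName = "Smith", firstName = "Ann", dateText = "01/02/2009";

// Hypothetical rows; TIMECARDDATE is stored in another text format on purpose.
var cards = new[]
{
    new { deleted_by = -1, LNAME = "Smith", FNAME = "Ann", TIMECARDDATE = "2009-01-02" },
    new { deleted_by = -1, LNAME = "Jones", FNAME = "Bob", TIMECARDDATE = "2009-01-02" },
}.AsQueryable();

// One condition per step; each variable is still an unexecuted IQueryable.
var byDeleted = cards.Where(e => e.deleted_by == -1);
var byLastName = byDeleted.Where(e => e.LNAME == lastName);
var byFirstName = byLastName.Where(e => e.FNAME == firstName);
var byDate = byFirstName.Where(e => e.TIMECARDDATE == dateText);

// Counts here: 2, 1, 1, 0, so the date condition is the one filtering everything out.
Console.WriteLine($"deleted_by:     {byDeleted.Count()}");
Console.WriteLine($"+ LNAME:        {byLastName.Count()}");
Console.WriteLine($"+ FNAME:        {byFirstName.Count()}");
Console.WriteLine($"+ TIMECARDDATE: {byDate.Count()}");
```

Against the real DataContext each Count() is a separate query, so this is a debugging aid rather than something to leave in production code.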
I'd also have a look at the LNAME and FNAME data types. If they're NCHAR/NVARCHAR, you may need to Trim the values, e.g.
var query = from e in DataLayerGlobals.GetInstance().db.MILLERTIMECARDs
            where e.deleted_by == -1
               && e.LNAME.Trim() == lastName
               && e.FNAME.Trim() == firstName
               && e.TIMECARDDATE == startDate.ToString("MM/dd/yyyy")
            group e by e.LNAME into g
            select new EmployeeHours
            {
                ContractHours = g.Sum(e => e.HRSCONTRACT),
                MillerHours = g.Sum(e => e.HRSSHOWRAIN + e.HRSOTHER),
                TravelHours = g.Sum(e => e.HRSTRAVEL)
            };