I'm new to working with nHibernate and have inherited a project which has implemented it for accessing its database. So far I've been able to write additional single-table query methods using QueryOvery<>, but I'm finding the logic behind table joins perplexing.
I want to implement the following T-SQL query:
SELECT DISTINCT f.FILE_ID, f.COMPANY_ID, f.FILE_META_ID, (etc...)
FROM AUDIT.FILE_INSTANCE f
INNER JOIN AUDIT.FILE_INSTANCE_REPORTING_PERIOD p ON p.FILE_ID = f.FILE_ID
WHERE p.REPORTING_PERIOD_ID BETWEEN 20150101 AND 20151304
AND FILE_STATUS != 'CANCELLED';
In this example, the reporting period ids would be parameters. Please note that they are integers, not dates, and the DISTINCT clause is important. How should I proceed?
I do not know your classes but if i would have written it, it would look like this:
var results = session.QueryOver<FileInstance>()
.Where(fi => fi.Status == FileStatus.Canceled)
.JoinQueryOver(fi => fi.ReportingPeriods)
.WhereRestrictionOn(period => period.Id).Between(20150101, 20151304)
.Transform(Transformers.DistinctRootEntity)
.List();
Related
I need to retrieve data from 2 SQL tables, using LINQ. I was hoping to combine them using a Join. I've looked this problem up on Stack Overflow, but all the questions and answers I've seen involve retrieving the data using ToList(), but I need to use lazy loading. The reason for this is there's too much data to fetch it all. Therefore, I've got to apply a filter to both queries before performing a ToList().
One of these queries is easily specified:
var solutions = ctx.Solutions.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear);
It retrieves all the data from the Solution table, where the SolutionNumber starts with either the current or previous year. It returns an IQueryable.
The thing that's tough for me to figure out is how to retrieve a filtered list from another table named Proficiency. At this point all I've got is this:
var profs = ctx.Proficiencies;
The Proficiency table has a column named SolutionID, which is a foreign key to the ID column in the Solution table. If I were doing this in SQL, I'd do a subquery where SolutionID is in a collection of IDs from the Solution table, where those Solution records match the same Where clause I'm using to retrieve the IQueryable for Solutions above. Only when I've specified both IQueryables do I want to then perform a ToList().
But I don't know how to specify the second LINQ query for Proficiency. How do I go about doing what I'm trying to do?
As far as I understand, you are trying to fetch Proficiencies based on some Solutions. This might be achieved in two different ways. I'll try to provide solutions in Linq as it is more readable. However, you can change them in Lambda Expressions later.
Solution 1
var solutions = ctx.Solutions
.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear)
.Select(q => q.SolutionId);
var profs = (from prof in ctx.Proficiencies where (from sol in solutions select sol).Contains(prof.SolutionID) select prof).ToList();
or
Solution 2
var profs = (from prof in ctx.Proficiencies
join sol in ctx.Solutions on prof.SolutionId equals sol.Id
where sol.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || sol.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear
select prof).Distinct().ToList();
You can trace both queries in SQL Profiler to investigate the generated queries. But I'd go for the first solution as it will generate a subquery that is faster and does not use Distinct function that is not recommended unless you have to.
We have 18 table join which is typical for ERP systems. The join is done via LINQ over Entity Framework.
The join gets progressively slower as more joins are added. The return result set is small(15 records). The LINQ generated query is captured via SQL Profiler and when we run this via Microsoft Management Console it is very fast : 10ms. When we run it via our C# LINQ-over-EntityFramework it takes 4 seconds.
What i guess is happening:
The time it takes to compile expression tree into SQL is 2 seconds out of total 4 seconds, and another 2 seconds i guess is spent internally to convert SQL result set into actually C# classes. Also it is not connected to initialization of entity framework because we run some queries before and repetitive calls to this join produce same 4 seconds.
Is there a way to speed this up. Otherwise we are considering abandoning Entity Framework for being absolutely inefficient...
In case it helps, I had a nasty performance issue, whereby a simple query that took 1-2 seconds in raw SQL took about 11 seconds via EF.
I went from using...
List<GeographicalLocation> geographicalLocations = new SalesTrackerCRMEntities()
.CreateObjectSet<GeographicalLocation>()
.Where(g => g.Active)
.ToList();
which took about 11 seconds via EF, to using...
var geographicalLocations = getContext().CreateObjectSet<GeographicalLocation>()
.AsNoTracking()
.Where(g => g.Active).ToList();
which took less than 200 milliseconds.
The disadvantage to this is that it won't load related entities, so you have to load them manually afterwards, but it gives such an immense performance boost that it was well worth it (in this case at least).
You would have to assess each case individually to see if the extra speed is worth the extra code.
You correctly identified bottlenecks.
If you have quite complex queries, I would suggest you to use compiled queries to overcome expression tree to sql query conversion.
You can refer Compiled Queries in EF from here.
Fo second part if EF is using two much time materialize your object graph then I would suggest to use some other means to retrieve data apart from EF.
One option can be Dapper.NET, You can have your concise sql query and you can directly retrieve its result in concrete model objects using Dapper (or any other tiny ORM)
I suspect your query takes so long to generate becuase you are treating Entity Framework like it is a SQL Query, which is not correct. You have many joins and akward calls in your linq syntax. Generally, your syntax should be similar to the following fictitious modeling query:
var result = (from appointment in appointments
from operation in appointment.Operations
where appointment.Id == 12
select new Model {
Id = appointment.Id,
Name = appointment.Name,
// etc, etc
}).ToList();
There is no use of joins above, the navigation property between Appointment and Operations takes care of the neccessary plumbing. Remember, this is an ORM, there is no concept of a join, only a concept of relationships.
The call to Distinct at the end, also indicates the structure of the db schema may be problematic if it returns too many duplicate results.
If after refactoring the entity model and correctly constructing the query still leaves with underperformance, it is advisable to use a stored procedure and map the result with EF's built in methods for doing so.
It is hard to tell what is going wrong here without seeing how you are using linq, but I suspect this will fix your problem:
var myResult = dataContext.table.Where(x == "Your joins and otherstuff").ToList();
//after converting it to a list use it how you need, but not before.
If this does not help please post your code.
The problem is that you are probably passing it to a data source that is running all sorts of additional queries based on you open result set.
Try this instead:
IEnumerable<SigmaTEK.Web.Models.SchedulerGridModel> tasks = (from appointment in _appointmentRep.Value.Get().Where(a => (a.Start < DbContext.MaxTime && DbContext.MinTime < a.Expiration))
join timeApplink in _timelineAppointmentLink.Value.Get().Where(a => a.AppointmentId != Guid.Empty)
on appointment.Id equals timeApplink.AppointmentId
join timeline in timelineRep.Value.Get().Where(i => timelines.Contains(i.Id))
on timeApplink.TimelineId equals timeline.Id
join repeater in _appointmentRepeaterRep.Value.Get().Where(repeater => (repeater.Start < DbContext.MaxTime && DbContext.MinTime < repeater.Expiration))
on appointment.Id equals repeater.Appointment
into repeaters
from repeater in repeaters.DefaultIfEmpty()
join aInstance in _appointmentInstanceRep.Value.Get()
on appointment.Id equals aInstance.Appointment
into instances
from instance in instances.DefaultIfEmpty()
join opRes in opResRep.Get()
on instance.ResourceOwner equals opRes.Id
into opResources
from op in opResources.DefaultIfEmpty()
join itemResource in _opDocItemResourcelinkRep.Value.Get()
on op.Id equals itemResource.Resource
into itemsResources
from itemresource in itemsResources.DefaultIfEmpty()
join opDocItem in opDocItemRep.Get()
on itemresource.OpDocItem equals opDocItem.Id
into opDocItems
from opdocitem in opDocItems.DefaultIfEmpty()
join opDocSection in opDocOpSecRep.Get()
on opdocitem.SectionId equals opDocSection.Id
into sections
from section in sections.DefaultIfEmpty()
join opDoc in opDocRep.Get()
on section.InternalOperationalDocument equals opDoc.Id
into opdocs
from opdocitem2 in opDocItems.DefaultIfEmpty()
join opDocItemLink in opDocItemStrRep.Get()
on opdocitem2.Id equals opDocItemLink.Parent
into opDocItemLinks
from link in opDocItemLinks.DefaultIfEmpty()
join finItem in finItemsRep.Get()
on link.Child equals finItem.Id
into temp1
from rd1 in temp1.DefaultIfEmpty()
join sec in finSectionRep.Get()
on rd1.SectionId equals sec.Id
into opdocsections
from finopdocsec in opdocsections.DefaultIfEmpty()
join finopdoc in opDocRep.Get().Where(i => i.DocumentType == "Quote")
on finopdocsec.InternalOperationalDocument equals finopdoc.Id
into finOpdocs
from finOpDoc in finOpdocs.DefaultIfEmpty()
join entry in entryRep.Get()
on rd1.Transaction equals entry.Transaction
into entries
from entry2 in entries.DefaultIfEmpty()
join resproduct in resprosductRep.Get()
on entry2.Id equals resproduct.Entry
into resproductlinks
from resprlink in resproductlinks.DefaultIfEmpty()
join res in resRep.Get()
on resprlink.Resource equals res.Id
into rootResource
from finopdoc in finOpdocs.DefaultIfEmpty()
join rel in orgDocIndRep.Get().Where(i => (i.Relationship == "OrderedBy"))
on finopdoc.Id equals rel.OperationalDocument
into orgDocIndLinks
from orgopdoclink in orgDocIndLinks.DefaultIfEmpty()
join org in orgRep.Get()
on orgopdoclink.Organization equals org.Id
into toorgs
from opdoc in opdocs.DefaultIfEmpty()
from rootresource in rootResource.DefaultIfEmpty()
from toorg in toorgs.DefaultIfEmpty()
select new SigmaTEK.Web.Models.SchedulerGridModel()
{
Id = appointment.Id,
Description = appointment.Description,
End = appointment.Expiration,
Start = appointment.Start,
OperationDisplayId = op.DisplayId,
OperationName = op.Name,
AppContextId = _appContext.Id,
TimelineId = timeline.Id,
AssemblyDisplayId = rootresource.DisplayId,
//Duration = SigmaTEK.Models.App.Utils.StringHelpers.TimeSpanToString((appointment.Expiration - appointment.Start)),
WorkOrder = opdoc.DisplayId,
Organization = toorg.Name
}).Distinct().ToList();
//In your UI
MyGrid.DataSource = tasks;
MyGrid.DataBind();
//Do not use an ObjectDataSource! It makes too many extra calls
I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1) which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using a ToList() at the end of the LINQ statement---the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();
The Problem: query1 is takes a long time to execute. When I execute query2, query3...queryN, query1 is being executed (N-1) times...this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If the size of TableName is not too big to load the whole table you use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary with ID as key.
Another way is to just "force" the LINQ evaluation with .ToList(), in the case fetch the whole table and do the Where part locally with Linq2Objects.
var query1Lookup = new Hashset<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1IDs.Contains(x.ID));
Edit:
Storing a list of ID:s from one query in a list and use that list as filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
Since you are running a subsequent query off the results, take your first query and use it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very constly) query, you basically use the decorator pattern to produce a chain of IQueryable that is a result of query 1 and query N. This way you always execute the filtered form of the query.
Hope this might help
I have a system that uses tags to categorize content similar to the way Stack Overflow does. I am trying to generate a list of the most recently used tags using LINQ to SQL.
(from x in cm.ContentTags
join t in cm.Tags on x.TagID equals t.TagID
orderby x.ContentItem.Date descending select t)
.Distinct(p => p.TagID) // <-- Ideally I'd be able to do this
The Tags table has a many to many relationship with the ContentItems table. ContentTags joins them, each tuple having a reference to the Tag and ContentItem.
I can't just use distinct because it compares on Tablename.* rather than Tablename.PrimaryKey, and I can't implement an IEqualityComparer since that doesn't translate to SQL, and I don't want to pull potential millions of records from the DB with .ToList(). So, what should I do?
You could write your own query provider, that supports such an overloaded distinct operator. It's not cheap, but it would probably be worthwhile, particularly if you could make it's query generation composable. That would enable a lot of customizations.
Otherwise you could create a stored proc or a view.
Use a subselect:
var result = from t in cm.Tags
where(
from x in cm.ContentTags
where x.TagID == t.TagID
).Contains(t.TagID)
select t;
This will mean only distinct records are returned form cm.Tags, only problem is you will need to find some way to order result
Use:
.GroupBy(x => x.TagID)
And then extract the data in:
Select(x => x.First().Example)
I'm trying to create a LINQ provider. I'm using the guide LINQ: Building an IQueryable provider series, and I have added the code up to LINQ: Building an IQueryable Provider - Part IV.
I am getting a feel of how it is working and the idea behind it. Now I'm stuck on a problem, which isn't a code problem but more about the understanding.
I'm firing off this statement:
QueryProvider provider = new DbQueryProvider();
Query<Customer> customers = new Query<Customer>(provider);
int i = 3;
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name}).Where(p => p.Id == 2 | p.Id == i).ToList();
Somehow the code, or expression, knows that the Where comes before the Select. But how and where?
There is no way in the code that sorts the expression, in fact the ToString() in debug mode, shows that the Select comes before the Where.
I was trying to make the code fail. Normal I did the Where first and then the Select.
So how does the expression sort this? I have not done any change to the code in the guide.
The expressions are "interpreted", "translated" or "executed" in the order you write them - so the Where does not come before the Select
If you execute:
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name})
.Where(p => p.Id == 2 | p.Id == i).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the anonymous type.
If you execute:
var newLinqCustomer = customers.Where(p => p.Id == 2 | p.Id == i)
.Select(c => new { c.Id, c.Name}).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the customer type.
The only thing I can think of is that maybe you're seeing some generated SQL where the SELECT and WHERE have been reordered? In which case I'd guess that there's an optimisation step somewhere in the (e.g.) LINQ to SQL provider that takes SELECT Id, Name FROM (SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i) and converts it to SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i - but this must be a provider specific optimisation.
No, in the general case (such as LINQ to Objects) the select will be executed before the where statement. Think of it is a pipeline, your first step is a transformation, the second a filter. Not the other way round, as it would be the case if you wrote Where...Select.
Now, a LINQ Provider has the freedom to walk the expression tree and optimize it as it sees fit. Be aware that you may not change the semantics of the expression though. This means that a smart LINQ to SQL provider would try to pull as many where clauses it can into the SQL query to reduce the amount of data travelling over the network. However, keep the example from Stuart in mind: Not all query providers are clever, partly because ruling out side effects from query reordering is not as easy as it seems.