I am trying to use LINQ in C# to perform a SQL query which selects multiple records based on a complex query.
I have a List of Person classes that have the following members that i want to search by:
First Name and City.
Normally in SQL I would iterate over this List and generate the following where statement
Where (Name="Name1" and City="City1") or (Name="Name2" and City="City2") or ...
I am wondering how I can achieve a similar result using LINQ.
I could do this using multiple queries but I want to use less queries due to the latency of the network and to let the database self optimize. The List will contain no more than 100 items at any time.
EDIT
As stated in the below comments the Person class was just an example to try and keep things simple. I will try and go into full detail and explain what I am trying to accomplish and see if you can help me or provide a better solution.
I have 2 database table of interest
Components
Placements
Components contains 8 fields plus an ComponentID. Every component should be unique.
Placements contains a list of fields plus a FK back to Components. There will be several Placements referencing one Component record in the database.
In C# I want to add Placements to the database with one transaction using a DataContext. I will be adding upto 1000 Placement rows to the database. If the Component does not exist I will have to create it. If it does exist I need to get its ComponentID so that I can link the Placement to the Component.
I have recently tried the following code:
The placementGroup has several members of which only 8 are needed to verify uniqueness. It should be noted that the 8 members listed below are unique between all of the objects stored in the list.
List<DBProductionComponent> existingComponents = context.DBProductionComponents.Where(dbc => placementGroup.Any(p =>
p.ComponentName == dbc.ComponentName &&
p.LotNumber == dbc.LotNumber &&
p.Manufacturer == dbc.Manufacturer &&
p.Package == dbc.PackageName &&
p.PartNumber == dbc.PartNumber &&
p.Supplier == dbc.Supplier &&
p.SupplierPartNumber == dbc.SupplierPartNumber &&
p.Value == dbc.Value
)).ToList();
But get the following error: "Local sequence cannot be used in LINQ to SQL implementations of queries except the Contains operator.
Related
I need to retrieve data from 2 SQL tables, using LINQ. I was hoping to combine them using a Join. I've looked this problem up on Stack Overflow, but all the questions and answers I've seen involve retrieving the data using ToList(), but I need to use lazy loading. The reason for this is there's too much data to fetch it all. Therefore, I've got to apply a filter to both queries before performing a ToList().
One of these queries is easily specified:
var solutions = ctx.Solutions.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear);
It retrieves all the data from the Solution table, where the SolutionNumber starts with either the current or previous year. It returns an IQueryable.
The thing that's tough for me to figure out is how to retrieve a filtered list from another table named Proficiency. At this point all I've got is this:
var profs = ctx.Proficiencies;
The Proficiency table has a column named SolutionID, which is a foreign key to the ID column in the Solution table. If I were doing this in SQL, I'd do a subquery where SolutionID is in a collection of IDs from the Solution table, where those Solution records match the same Where clause I'm using to retrieve the IQueryable for Solutions above. Only when I've specified both IQueryables do I want to then perform a ToList().
But I don't know how to specify the second LINQ query for Proficiency. How do I go about doing what I'm trying to do?
As far as I understand, you are trying to fetch Proficiencies based on some Solutions. This might be achieved in two different ways. I'll try to provide solutions in Linq as it is more readable. However, you can change them in Lambda Expressions later.
Solution 1
var solutions = ctx.Solutions
.Where(s => s.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || s.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear)
.Select(q => q.SolutionId);
var profs = (from prof in ctx.Proficiencies where (from sol in solutions select sol).Contains(prof.SolutionID) select prof).ToList();
or
Solution 2
var profs = (from prof in ctx.Proficiencies
join sol in ctx.Solutions on prof.SolutionId equals sol.Id
where sol.SolutionNumber.Substring(0, 2) == yearsToConsider.PreviousYear || sol.SolutionNumber.Substring(0, 2) == yearsToConsider.CurrentYear
select prof).Distinct().ToList();
You can trace both queries in SQL Profiler to investigate the generated queries. But I'd go for the first solution as it will generate a subquery that is faster and does not use Distinct function that is not recommended unless you have to.
Using an ADO.NET entity data model I've constructed two queries below against a table containing 1800 records that has just over 30 fields that yield staggering results.
// Executes slowly, over 6000 ms
int count = context.viewCustomers.AsNoTracking()
.Where(c => c.Cust_ID == _custID).Count();
// Executes instantly, under 20 ms
int count = context.viewCustomers.AsNoTracking()
.Where(c => c.Cust_ID == 625).Count();
I see from the database log that Entity Framework provides that the queries are almost identical except that the filter portion uses a parameter. Copying this query into SSMS and declaring & setting this parameter there results in a near instant query so it doesn't appear to be on the database end of things.
Has anyone encountered this that can explain what's happening? I'm at the mercy of a third party control that adds this command to the query in an attempt to limit the number of rows returned, getting the count is a must. This is used for several queries so a generic solution is needed. It is unfortunate it doesn't work as advertised, it seems to only make the query take 5-10 times as long as it would if I just loaded the entire view into memory. When no filter is used however, it works like a dream.
Use of these components includes the source code so I can change this behavior but need to consider which approaches can be used to provide a reusable solution.
You did not mention about design details of your model but if you only want to have count of records based on condition, then this can be optimized by only counting the result set based on one column. For example,
int count = context.viewCustomers.AsNoTracking().Where(c => c.Cust_ID == _custID).Count();
If you design have 10 columns, and based on above statement let say 100 records have been returned, then against every record result set contains 10 columns' data which is of not use.
You can optimize this by only counting result set based on single column.
int count = context.viewCustomers.AsNoTracking().Where(c => c.Cust_ID == _custID).Select(x=>new {x.column}).Count();
Other optimization methods, like using async variants of count CountAsync can be used.
Our front end UI has a filtering system that, in the back end, operates over millions of rows. It uses a an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has has two types of data in it, and the checked items need to be ORd together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable.Where(
cd =>
filter.CarIds.Contains(cd.CarId) && //get driver IDs for each car ID listed in filter object
).Select(cd => cd.DriverId).Distinct().ToList();
driverIds = driverIds.Concat(
db.DriverShopManyToManyTable.Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
.Select(ds => ds.DriverId)
.Distinct()).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All they require to keep the parts of the query of type IQueryable<T>, i.e. do not use ToList, ToArray, AsEnumerable etc. methods that force them to be executed and evaluated in memory.
One way is to create Union query containing the filtered Ids (which will be unique by definition) and use join operator to apply it on the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be using two OR-ed Any based conditions, which would translate to EXISTS(...) OR EXISTS(...) SQL query filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination. .Skip(PageNum * PageSize).Take(PageSize) I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever other smaller number seems reasonable to you.
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to perform indexing on all of the "key" columns that you are joining on. The best way to figure out what indexes you need to improve performance, take the SQL query generated by your EF linq and bring it on over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your #p parameters just to make an example. Once you've done this, right click on the query and either use display estimated execution plan or include actual execution plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me that using the instance versions of the LINQ extensions is creating several collections before you're done. using the from statement versions should cut that down quite a bit:
driveIds = (from var record in db.CarDriversManyToManyTable
where filter.CarIds.Contains(record.CarId)
select record.DriverId).Concat
(from var record in db.DriverShopManyToManyTable
where filter.ShopIds.Contains(record.ShopId)
select record.DriverId).Distinct()
Also using the groupby extension would give better performance than querying each driver Id.
I have a Linq2SQL query running against a database provided by a third party. The main part of the query looks like this:
var valuationQuery =
from v in context.Valuations
where v.ModelId == QualifiedModelId.ModelId
&& v.QualifyModelId == QualifiedModelId.Qualifier
&& v.LanguageCode == QualifiedModelId.LanguageCode
&& v.Edition == Data.Meta.Edition.CurrentEdition.Date
&& v.RegDate == yearReg.RegistrationDate
&& v.ValTypeDescription == "Normal"
&& v.MileageBandID == MileageBand
When I loop around it with a foreach loop, it works or fails depending on the select at the end. When the select specifies all the fields like this...
select new
{
v.Value1,
v.Value2,
v.Value3,
... snip ...
v.Value14,
v.Value15,
v.ValueTypeID
};
... it works normally. When I do the following, the loop iterates the correct amount of times, but brings back the first record every time:
select v;
I want to be able to select all the fields without specifying the names in case the vendor adds more fields (they really are called "Value1" upto "Value15"), which means I can change one constant in my code and update the DBML and all the relevant code will lookup from the correct amount of fields. This query (and similar ones) are used in various places, so I'm looking for the least-effort in the future!
I have run SQL Profiler to ensure the query being run returns the correct results, and it does.
The foreach loop is this (the Convert's are due to the very badly designed database having very inconsistent datatypes):
foreach (var record in valuationQuery)
{
int CurrentValType = Convert.ToInt32(record.ValueTypeID);
string FieldName = "Value";
if (MPLower.MileagePointID >= 1 && MPLower.MileagePointID <= MaxMileagePoints)
{
FieldName = "Value" + MPLower.MileagePointID.ToString().Trim();
MPLower.MileagePointPounds[CurrentValType] = Convert.ToInt32(record.GetType().GetProperty(FieldName).GetValue(record, null));
}
if (MPHigher.MileagePointID >= 1 && MPHigher.MileagePointID <= MaxMileagePoints)
{
FieldName = "Value" + MPHigher.MileagePointID.ToString().Trim();
MPHigher.MileagePointPounds[CurrentValType] = Convert.ToInt32(record.GetType().GetProperty(FieldName).GetValue(record, null));
}
}
I'm new to C# (I've inherited some code), so I realise it's probably something stupid I've done or not done!! Please can anyone help?
Identity Maps
A common problem when using ORMs (ok, maybe not exactly common, but it can happen) is that a query which you expect to return distinct records ends up returning multiple copies of the same record. This is often caused by the ORM's Identity Map. Here's a quick overview of how it normally works.
The identity map is essentially an object cache, based on the primary key of each object.
When you ask the ORM for a record with a certain primary key, it'll check to see if the record already exists in the identity map. If it does already exist, it'll return the existing record.
This is usually pretty handy, but sometimes things go wrong. If two objects have the same primary key, or if you haven't specified what the primary keys are (forcing the ORM to guess), the identity map can't distinguish between them, and the first one will always be returned.
The moral of the story is, always double-check your primary keys if you are seeing multiple copies of the same record where you shouldn't be!
In this question, the code
select new
{
v.Value1,
v.Value2,
v.Value3,
... snip ...
v.Value14,
v.Value15,
v.ValueTypeID
};
works because you are using the select new{} projection to return an anonymous type, which bypasses the identity map.
select v
selects the objects directly, which does use the identity map, causing the problem you were seeing.
My database structure is this: an OptiUser belongs to multiple UserGroups through the IdentityMap table, which is a matching table (many to many) with some additional properties attached to it. Each UserGroup has multiple OptiDashboards.
I have a GUID string which identifies a particular user (wlid in this code). I want to get an IEnumerable of all of the OptiDashboards for the user identified by wlid.
Which of these two Linq-to-Entities queries is the most efficient? Do they run the same way on the back-end?
Also, can I shorten option 2's Include statements to just .Include("IdentityMaps.UserGroup.OptiDashboards")?
using (OptiEntities db = new OptiEntities())
{
// option 1
IEnumerable<OptiDashboard> dashboards = db.OptiDashboards
.Where(d => d.UserGroups
.Any(u => u.IdentityMaps
.Any(i => i.OptiUser.WinLiveIDToken == wlid)));
// option 2
OptiUser user = db.OptiUsers
.Include("IdentityMaps")
.Include("IdentityMaps.UserGroup")
.Include("IdentityMaps.UserGroup.OptiDashboards")
.Where(r => r.WinLiveIDToken == wlid).FirstOrDefault();
// then I would get the dashboards through user.IdentityMaps.UserGroup.OptiDashboards
// (through foreach loops...)
}
You may be misunderstanding what the Include function actually does. Option 1 is purely a query syntax which has no effect on what is returned by the entity framework. Option 2, with the Include function instructs the entity framework to Eagerly Fetch the related rows from the database when returns the results of the query.
So option 1 will result in some joins, but the "select" part of the query will be restricted to the OptiDashboards table.
Option 2 will result in joins as well, but in this case it will be returning the results from all the included tables, which obviously is going to introduce more of a performance hit. But at the same time, the results will include all the related entities you need, avoiding the [possible] need for more round-trips to the database.
I think the Include will render as joins an you will the able to access the data from those tables in you user object (Eager Loading the properties).
The Any query will render as exists and not load the user object with info from the other tables.
For best performance if you don't need the additional info use the Any query
As has already been pointed out, the first option would almost certainly perform better, simply because it would be retrieving less information. Besides that, I wanted to point out that you could also write the query this way:
var dashboards =
from u in db.OptiUsers where u.WinLiveIDToken == wlid
from im in u.IdentityMaps
from d in im.UserGroup.OptiDashboards
select d;
I would expect the above to perform similarly to the first option, but you may (or may not) prefer the above form.