Too much data in Contains (LINQ): How to increase performance - c#

I have a linq query like this:
var result = (from a in context.table_A
join b in
(from temp in context.table_B
where idlist.Contains(temp.id)
select temp)
on a.seq_id equals b.seq_id into c
where
idlist.Contains(a.id)
select new MyObject
{
...
}).ToList();
idlist is a List.
The problem is that idlist has too many values (hundreds of thousands to several million). The query works fine with a few records, but when there are too many, the Contains call fails with an error.
The error log is:
The query plan cannot be created due to lack of internal resources of
the query processor. This is a rare event and only occurs for very
complex queries or queries that reference a very large number of
tables or partitions. Make the query easy.
I want to improve the performance of this section. Any ideas?

I would suggest installing the linq2db.EntityFrameworkCore extension and using a temporary table filled with fast BulkCopy:
public class IdItem
{
public int Id { get; set; }
}
...
var items = idList.Select(id => new IdItem { Id = id });
using var db = context.CreateLinqToDBConnection();
using var temp = db.CreateTempTable("#IdList", items);
var query =
from a in context.table_A
join id1 in temp on a.Id equals id1.Id
join b in context.table_B on a.seq_id equals b.seq_id
join id2 in temp on b.Id equals id2.Id
select new MyObject
{
...
};
// switch to alternative translator
query = query.ToLinqToDB();
var result = query.ToList();
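One thing to watch when replacing Contains with a join: a join returns one row per matching id, so duplicate ids in the list duplicate the result rows, whereas Contains does not. The sketch below illustrates this with LINQ to Objects only (no database and no linq2db involved); the Row type, the IdFilterDemo helper, and the sample values are made up for illustration. Under that assumption, de-duplicating the ids (for example with Distinct()) before filling the temp table should keep the two forms equivalent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for table_A and idList.
var tableA = new List<Row> { new(1, "a"), new(2, "b"), new(3, "c") };
var idList = new List<int> { 1, 1, 3 }; // note the duplicate id

Console.WriteLine(IdFilterDemo.ViaContains(tableA, idList).Count);         // 2
Console.WriteLine(IdFilterDemo.ViaJoin(tableA, idList).Count);             // 3
Console.WriteLine(IdFilterDemo.ViaJoin(tableA, idList.Distinct()).Count);  // 2

public record Row(int Id, string Name);

public static class IdFilterDemo
{
    // Contains: each row is returned at most once.
    public static List<Row> ViaContains(IEnumerable<Row> rows, IEnumerable<int> ids)
    {
        var set = new HashSet<int>(ids);
        return rows.Where(r => set.Contains(r.Id)).ToList();
    }

    // Join: each row is returned once per matching id.
    public static List<Row> ViaJoin(IEnumerable<Row> rows, IEnumerable<int> ids) =>
        (from r in rows
         join id in ids on r.Id equals id
         select r).ToList();
}
```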

Related

Linq Join query returning empty dataset

I am using the code below to join two tables on the OfficeId field. It is returning 0 records.
IQueryable<Usage> usages = this.context.Usage;
usages = usages.Where(usage => usage.OfficeId == officeId);
var agencyList = this.context.Agencies.ToList();
var usage = usages.ToList();
var query = usage.Join(agencyList,
r => r.OfficeId,
a => a.OfficeId,
(r, a) => new UsageAgencyApiModel () {
Id = r.Id,
Product = r.Product,
Chain = a.Chain,
Name = a.Name
}).ToList();
I have 1000+ records in the agencies table and 26 records in the usage table.
I am expecting 26 records as a result, with the Chain and Name columns from the agency table attached.
It is not returning anything. I am new to .NET, so please tell me if I am missing something.
EDIT
@Tim Schmelter's solution works fine if I use both tables from the context directly when executing the join. But I need to add a filter on the usage table before applying the join:
IQueryable<Usage> usages = this.context.Usage;
usages = usages.Where(usage => usage.OfficeId == officeId);
var query = from a in usages
// works with this.context.usages instead of usages
join u in this.context.Agencies on a.OfficeId equals u.OfficeId
select new
{
Id = a.Id,
Product = a.Product,
Chain = u.Chain,
Name = u.Name
};
return query.ToList();
The same join query works fine with in-memory data.
Both approaches work if I use in-memory data sources or both data sources directly. It does not work if I filter usages by officeId before applying the join.
One problem is that you load everything into memory first (ToList()).
With joins I prefer query syntax, as it is less verbose:
var query = from a in this.context.Agencies
join u in this.context.Usage on a.OfficeId equals u.OfficeId
select new UsageAgencyApiModel()
{
Id = u.Id,
Product = u.Product,
Chain = a.Chain,
Name = a.Name
};
List<UsageAgencyApiModel> resultList = query.ToList();
Edit: You should be able to apply the Where after the join. If you still don't get any records, there are no matching rows:
var query = from a in this.context.Agencies
join u in this.context.Usage on a.OfficeId equals u.OfficeId
where u.OfficeId == officeId
select new UsageAgencyApiModel{ ... };
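Logically, filtering before the join and filtering after it should give the same rows for an inner join. The sketch below checks that with LINQ to Objects on made-up data; the Usage, Agency and UsageAgency records and the JoinDemo helper are hypothetical stand-ins, not the question's entity classes. If both forms return nothing against the real database, the likelier cause is that no agency row has the requested OfficeId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; property names follow the question.
var usages = new List<Usage> { new(1, 10, "P1"), new(2, 10, "P2"), new(3, 20, "P3") };
var agencies = new List<Agency> { new(10, "ChainA", "North"), new(30, "ChainB", "South") };

var before = JoinDemo.FilterThenJoin(usages, agencies, 10);
var after = JoinDemo.JoinThenFilter(usages, agencies, 10);
Console.WriteLine(before.SequenceEqual(after));                          // True
Console.WriteLine(JoinDemo.FilterThenJoin(usages, agencies, 20).Count);  // 0, no agency has OfficeId 20

public record Usage(int Id, int OfficeId, string Product);
public record Agency(int OfficeId, string Chain, string Name);
public record UsageAgency(int Id, string Product, string Chain, string Name);

public static class JoinDemo
{
    // Filter the usage rows first, then join them to the agencies.
    public static List<UsageAgency> FilterThenJoin(
        IEnumerable<Usage> usages, IEnumerable<Agency> agencies, int officeId) =>
        (from u in usages.Where(x => x.OfficeId == officeId)
         join a in agencies on u.OfficeId equals a.OfficeId
         select new UsageAgency(u.Id, u.Product, a.Chain, a.Name)).ToList();

    // Join everything, then filter the joined rows.
    public static List<UsageAgency> JoinThenFilter(
        IEnumerable<Usage> usages, IEnumerable<Agency> agencies, int officeId) =>
        (from u in usages
         join a in agencies on u.OfficeId equals a.OfficeId
         where u.OfficeId == officeId
         select new UsageAgency(u.Id, u.Product, a.Chain, a.Name)).ToList();
}
```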
The following code may help to get the output for a given OfficeId value. It uses lambda (method) syntax.
var officeId = 1;
var query = context.Agencies // your starting point - table in the "from" statement
.Join(context.Usage, // the source table of the inner join
agency => agency.OfficeId, // outer key selector (the first part of the "on" clause in an SQL "join" statement)
usage => usage.OfficeId, // inner key selector (the second part of the "on" clause)
(agency, usage) => new { Agency = agency, Usage = usage }) // selection
.Where(x => x.Agency.OfficeId == officeId); // where statement

Query multiple tables c# linq efficiently

I have a database with about 20 tables, all of which share some columns, e.g. name, cost, year manufactured, etc. I need to query those tables. What is the best, most efficient way to do it? The tables and data have to stay as they are.
Here is what I am doing right now.
var table1 = (from a in _entities.ta1
join b in _entities.subTa on a.tid equals b.Id
select new
{
id = a.id,
name = a.name,
type = a.type,
}).ToList();
var table2 = (from a in _entities.ta2
join b in _entities.subTa2 on a.tid equals b.Id
select new
{
id = a.id,
name = a.name,
type = a.type,
}).ToList();
var abc = table1;
abc.AddRange(table2);
var list = new List<MyClass>();
foreach(var item in abc)
{
var classItem = new MyClass();
classItem.id = item.id;
classItem.name = item.name;
classItem.type = item.type;
list.Add(classItem);
}
return list;
This needs to be done for many tables, which is not a very efficient way to code it.
How can I improve this code?
You could use the following, assuming that the data types of all columns match.
List<MyClass> list =
( //table1
from a in _entities.ta1
join b in _entities.subTa
on a.tid equals b.Id
select new MyClass()
{
id = a.id,
name = a.name,
type = a.type
})
.Concat(( //table2 (use .Union instead of .Concat if you wish to eliminate duplicate rows)
from a in _entities.ta2
join b in _entities.subTa2
on a.tid equals b.Id
select new MyClass()
{
id = a.id,
name = a.name,
type = a.type
}
)).ToList();
return list;
The reasons why this could be more efficient are that it
groups all queries into one database query, which causes less database traffic,
lets all the computation happen on the database server, eliminating the duplicate computation in the OP's code, and
reduces memory usage, because it does NOT create a list of anonymous objects and then copy them into a new List<MyClass>.
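The difference between Concat and Union mentioned in the code comment can be seen in a small LINQ to Objects sketch. The sample rows are made up, and MyClass is declared here as a record only so that in-memory equality compares property values, which roughly mimics how a SQL UNION compares column values; the question's real MyClass may be defined differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows; (2, "nut", "B") appears in both sources.
var table1 = new List<MyClass> { new(1, "bolt", "A"), new(2, "nut", "B") };
var table2 = new List<MyClass> { new(2, "nut", "B"), new(3, "gear", "C") };

Console.WriteLine(table1.Concat(table2).Count()); // 4, duplicates are kept
Console.WriteLine(table1.Union(table2).Count());  // 3, duplicates are removed

// A record gets value-based equality, so Union can detect equal rows.
public record MyClass(int id, string name, string type);
```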
This is the answer I was looking for. It is not ideal, but it got the job done:
var tables= new Dictionary<string, string>();
tables.Add("table1", "subTable1");
tables.Add("table2", "subTable2");
foreach (var table in tables)
{
var tableName = table.Key;
var subName= table.Value;
var data = _entities.Database.SqlQuery<MyClass>($@"select
a.Id,a.Name,b.subName from {tableName} a left join {subName} b on
a.subId=b.Id").ToList();
}
Thank you everyone for your contribution

How to make EF efficiently call an aggregate function?

I'm trying to write a LINQ-to-entities query that will take an ICollection navigation property of my main object and attach some metadata to each of them which is determined through joining each of them to another DB table and using an aggregate function. So the main object is like this:
public class Plan
{
...
public virtual ICollection<Room> Rooms { get; set; }
}
And my query is this:
var roomData = (
from rm in plan.Rooms
join conf in context.Conferences on rm.Id equals conf.RoomId into cjConf
select new {
RoomId = rm.Id,
LastUsedDate = cjConf.Count() == 0 ? (DateTime?)null : cjConf.Max(conf => conf.EndTime)
}
).ToList();
What I want is for it to generate some efficient SQL that uses the aggregate function MAX to calculate the LastUsedDate, like this:
SELECT
rm.Id, MAX(conf.EndTime) AS LastUsedDate
FROM
Room rm
LEFT OUTER JOIN
Conference conf ON rm.Id = conf.RoomId
WHERE
rm.Id IN ('a967c9ce-5608-40d0-a586-e3297135d847', '2dd6a82d-3e76-4441-9a40-133663343d2b', 'bb302bdb-6db6-4470-a24c-f1546d3e6191')
GROUP BY
rm.id
But when I profile SQL Server it shows this query from EF:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[RoomId] AS [RoomId],
[Extent1].[ProviderId] AS [ProviderId],
[Extent1].[StartTime] AS [StartTime],
[Extent1].[EndTime] AS [EndTime],
[Extent1].[Duration] AS [Duration],
[Extent1].[ParticipantCount] AS [ParticipantCount],
[Extent1].[Name] AS [Name],
[Extent1].[ServiceType] AS [ServiceType],
[Extent1].[Tag] AS [Tag],
[Extent1].[InstantMessageCount] AS [InstantMessageCount]
FROM [dbo].[Conference] AS [Extent1]
So it is selecting everything from Conference and doing the Max() calculation in memory, which is very inefficient. How can I get EF to generate the proper SQL query with the aggregate function in?
The equivalent LINQ to Entities query which closely translates to the SQL query you are after is like this:
var roomIds = plan.Rooms.Select(rm => rm.Id);
var query =
from rm in context.Rooms
join conf in context.Conferences on rm.Id equals conf.RoomId
into rmConf from conf in rmConf.DefaultIfEmpty() // left join
where roomIds.Contains(rm.Id)
group conf by rm.Id into g
select new
{
RoomId = g.Key,
LastUsedDate = g.Max(conf => (DateTime?)conf.EndTime)
};
The trick is to start the query from EF IQueryable, thus allowing it to be fully translated to SQL, rather than from plan.Rooms as in the query in question which is IEnumerable and makes the whole query execute in memory (context.Conferences is treated as IEnumerable and causes loading the whole table in memory).
The SQL IN clause is produced by the in-memory IEnumerable<Guid> and the Contains method.
Finally, there is no need to check the count. SQL handles nulls naturally; all you need is to make sure the nullable Max overload is called, which the (DateTime?)conf.EndTime cast achieves. There is no need to check conf for null as you would in LINQ to Objects, because LINQ to Entities/SQL handles that naturally as well (as long as the receiving variable is nullable).
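The behaviour of the nullable Max overload can be checked in isolation. The sketch below uses LINQ to Objects with a group join, so it only illustrates the overload and not the SQL that EF generates; the Room and Conference records, the MaxDemo helper, and the dates are made up. With the (DateTime?) cast, a room without conferences yields null, whereas the non-nullable overload throws InvalidOperationException on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample data: room 2 has no conferences.
var rooms = new List<Room> { new(1), new(2) };
var conferences = new List<Conference>
{
    new(1, new DateTime(2020, 1, 5)),
    new(1, new DateTime(2020, 3, 1)),
};

foreach (var (roomId, lastUsed) in MaxDemo.LastUsedPerRoom(rooms, conferences))
{
    var text = lastUsed.HasValue
        ? lastUsed.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
        : "never";
    Console.WriteLine($"{roomId}: {text}");
}
// 1: 2020-03-01
// 2: never

public record Room(int Id);
public record Conference(int RoomId, DateTime EndTime);

public static class MaxDemo
{
    public static List<(int RoomId, DateTime? LastUsedDate)> LastUsedPerRoom(
        IEnumerable<Room> rooms, IEnumerable<Conference> conferences) =>
        (from rm in rooms
         join conf in conferences on rm.Id equals conf.RoomId into rmConf
         // Nullable selector: Max returns null when rmConf is empty.
         select (RoomId: rm.Id,
                 LastUsedDate: rmConf.Max(c => (DateTime?)c.EndTime))).ToList();
}
```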
Since plan.Rooms isn't an IQueryable with a query provider attached, the join statement is compiled as Enumerable.Join. This means that context.Conferences is implicitly treated as IEnumerable and its content is pulled into memory before other operators are applied to it.
You can fix this by not using join:
var roomIds = plan.Rooms.Select(r => r.Id).ToList();
var maxPerRoom = context.Conferences
.Where(conf => roomIds.Contains(conf.RoomId))
.GroupBy(conf => conf.RoomId)
.Select(g => new
{
RoomId = g.Key,
LastUsedDate = g.Select(conf => conf.EndTime)
.DefaultIfEmpty()
.Max()
}
).ToList();
var roomData = (
from rm in plan.Rooms
join mx in maxPerRoom on rm.Id equals mx.RoomId
select new
{
RoomId = rm.Id,
LastUsedDate = mx.LastUsedDate
}
).ToList();
The first step collects the LastUsedDate data from the context; the second joins it with the plan.Rooms collection in memory. The second step isn't even necessary if you're not interested in returning or displaying anything other than the room's Id, but that's up to you.

Search method joining multiple tables

I need help with a search method for searching the tables for a matching text.
This works, except that the joins need to be LEFT OUTER JOINs; otherwise I don't get any results if the pageId is missing in any of the tables.
This solution takes too long to run. I would appreciate it if someone could help me with a better way to handle this task.
public async Task<IEnumerable<Result>> Search(string query)
{
var temp = await (from page in _context.Pages
join pageLocation in _context.PageLocations on page.Id equals pageLocation.PageId
join location in _context.Locations on pageLocation.LocationId equals location.Id
join pageSpecialty in _context.PageSpecialties on page.Id equals pageSpecialty.PageId
join specialty in _context.Specialties on pageSpecialty.SpecialtyId equals specialty.Id
where
page.Name.ToLower().Contains(query)
|| location.Name.ToLower().Contains(query)
|| specialty.Name.ToLower().Contains(query)
select new Result
{
PageId = page.Id,
Name = page.Name,
Presentation = page.Presentation,
Rating = page.Rating
}).ToListAsync();
var results = new List<Result>();
foreach (var t in temp)
{
if (!results.Exists(p => p.PageId == t.PageId))
{
t.Locations = GetLocations(t.PageId);
t.Specialties = GetSpecialties(t.PageId);
results.Add(t);
}
}
return results;
}
Using navigation properties, the query could look like:
var temp = await (from page in _context.Pages
where page.Name.Contains(query)
|| page.PageLocation.Any(pl => pl.Location.Name.Contains(query))
|| page.PageSpecialties.Any(pl => pl.Specialty.Name.Contains(query))
select new Result
{
PageId = page.Id,
Name = page.Name,
Presentation = page.Presentation,
Rating = page.Rating,
Locations = page.PageLocation.Select(pl => pl.Location),
Specialties = page.PageSpecialties.Select(pl => pl.Specialty)
}).ToListAsync();
This has several benefits:
Because there are no joins, the query returns unique Result objects right away, so you don't need to deduplicate them afterwards.
The locations and specialties are loaded in the same query instead of two queries per Result (the n+1 problem).
ToLower is removed because the search is (likely) not case-sensitive anyway. The query is executed as SQL, and most of the time SQL databases have case-insensitive collations. Removing ToLower makes the query sargable again.
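The first point, that joins multiply rows while Any does not, can be reproduced in memory. This is a minimal LINQ to Objects sketch on a simplified, hypothetical model (a Page record that holds its location names directly, and a SearchDemo helper); it is not the question's entity model and says nothing about the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: page 1 has two locations.
var pages = new List<Page>
{
    new(1, "Cardio Clinic", new[] { "Oslo", "Bergen" }),
    new(2, "Dental Care", new[] { "Oslo" }),
};

Console.WriteLine(string.Join(",", SearchDemo.ViaJoin(pages, "Clinic"))); // 1,1
Console.WriteLine(string.Join(",", SearchDemo.ViaAny(pages, "Clinic")));  // 1

public record Page(int Id, string Name, string[] Locations);

public static class SearchDemo
{
    // Join-style: one row per (page, location) pair, so a page whose
    // name matches is returned once per location.
    public static List<int> ViaJoin(IEnumerable<Page> pages, string query) =>
        (from p in pages
         from loc in p.Locations
         where p.Name.Contains(query) || loc.Contains(query)
         select p.Id).ToList();

    // Any-style: one row per page, however many locations match.
    public static List<int> ViaAny(IEnumerable<Page> pages, string query) =>
        (from p in pages
         where p.Name.Contains(query) || p.Locations.Any(loc => loc.Contains(query))
         select p.Id).ToList();
}
```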

Dynamic Join of multiple entities based on some filter in Entity Framework

I am pretty new to Entity Framework and LINQ, and I have an entity with 10+ associated entities (one-to-many relationships). I'm planning to make a search page in my application on which users can select which fields (i.e. which of those 10+ tables) should be considered when searching.
Now, I'm trying to write a query to achieve the above goal. Any help how I could sort this out using LINQ method syntax? I mean, to write a multiple join query based on user's choice. (i.e. which of Class1, Class2, ... to join with main Entity to finally have all the related fields in one place). Below is a sample code (Just a hunch, in fact)
if(somefilter#1)
result = db.Companies.Join(db.Channels, p => p.Id, k => k.CId,
(p, k) => new {Company = p, Channels=k});
if(somefilter#2)
result = result.Join(db.BusinnessType, ........);
if(somefilter#3)
result = result.Join(db.Values, .......);
For complex queries it may be easier to use the other LINQ notation. You could join multiple entities like this:
from myEntity in dbContext.MyEntities
join myOtherEntity in dbContext.MyOtherEntities on myEntity.Id equals myOtherEntity.MyEntityId
join oneMoreEntity in dbContext.OneMoreEntities on myEntity.Id equals oneMoreEntity.MyEntityId
select new {
myEntity.Id,
myEntity.Name,
myOtherEntity.OtherProperty,
oneMoreEntity.OneMoreProperty
}
You can join in other entities by adding more join statements.
You can select properties of any entity from your query. The example I provided uses an anonymous type, but you can also define a class (like MyJoinedEntity) and select into that instead. To do so you would use something like:
...
select new MyJoinedEntity {
Id = myEntity.Id,
Name = myEntity.Name,
OtherProperty = myOtherEntity.OtherProperty,
OneMoreProperty = oneMoreEntity.OneMoreProperty
}
EDIT:
If you want conditional joins, you can define MyJoinedEntity with all the properties you would need if you were to join everything, and then break the join up into multiple methods, like this:
public IEnumerable<MyJoinedEntity> GetEntities() {
// Typed as IEnumerable so that the result of JoinWithRelated can be assigned back to it
IEnumerable<MyJoinedEntity> joinedEntities = from myEntity in dbContext.MyEntities
join myOtherEntity in dbContext.MyOtherEntities on myEntity.Id equals myOtherEntity.MyEntityId
join oneMoreEntity in dbContext.OneMoreEntities on myEntity.Id equals oneMoreEntity.MyEntityId
select new MyJoinedEntity {
Id = myEntity.Id,
Name = myEntity.Name,
OtherProperty = myOtherEntity.OtherProperty,
OneMoreProperty = oneMoreEntity.OneMoreProperty
};
if (condition1) {
joinedEntities = JoinWithRelated(joinedEntities);
}
return joinedEntities;
}
public IEnumerable<MyJoinedEntity> JoinWithRelated(IEnumerable<MyJoinedEntity> joinedEntities) {
return from joinedEntity in joinedEntities
join relatedEntity in dbContext.RelatedEntities on joinedEntity.Id equals relatedEntity.MyEntityId
select new MyJoinedEntity(joinedEntity) {
Comments = relatedEntity.Comments
};
}
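A variation on the same idea, shown here only as a sketch under stated assumptions, is to keep the query typed as IQueryable and add the join only when the user asked for it, projecting every branch into the same result class. The code below runs against in-memory lists wrapped with AsQueryable(), so it demonstrates the composition pattern and not the generated SQL; Company and Channel loosely follow the question's db.Companies and db.Channels, while SearchRow, ConditionalJoin and the includeChannels flag are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for db.Companies and db.Channels.
var companies = new List<Company> { new(1, "Acme"), new(2, "Globex") }.AsQueryable();
var channels = new List<Channel> { new(1, "Web"), new(1, "Retail") }.AsQueryable();

var plain = ConditionalJoin.Build(companies, channels, includeChannels: false).ToList();
var joined = ConditionalJoin.Build(companies, channels, includeChannels: true).ToList();
Console.WriteLine(plain.Count);  // 2, one row per company
Console.WriteLine(joined.Count); // 2, inner join: Acme once per channel, Globex dropped

public record Company(int Id, string Name);
public record Channel(int CId, string Name);

// One result shape shared by every branch; unused fields stay null.
public class SearchRow
{
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public string ChannelName { get; set; }
}

public static class ConditionalJoin
{
    // Adds the join only when requested; nothing runs until the caller enumerates.
    public static IQueryable<SearchRow> Build(
        IQueryable<Company> companies, IQueryable<Channel> channels, bool includeChannels)
    {
        if (!includeChannels)
            return companies.Select(c => new SearchRow { CompanyId = c.Id, CompanyName = c.Name });

        return companies.Join(channels, c => c.Id, ch => ch.CId,
            (c, ch) => new SearchRow { CompanyId = c.Id, CompanyName = c.Name, ChannelName = ch.Name });
    }
}
```

Because each branch returns IQueryable<SearchRow>, further optional joins or filters can be chained in the same way before the query is executed.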
