Enhance performance with for loop and LINQ query - C#

I have an object array and I need to get some object info from the database based on certain parameters. I am implementing this:
public IList<object[]> GetRelevants(Registro[] records)
{
    List<object[]> returnList = new List<object[]>();
    using (var session = SessionFactory.OpenStatelessSession())
    {
        for (int kr = 0; kr < records.Length; kr++)
        {
            Registro record = records[kr];
            var query = session.QueryOver<Registro>()
                .Where(sb => sb.Asunto == record.Asunto)
                .And(sb => sb.FechaInicial == record.FechaInicial)
                .And(sb => sb.FechaFinal == record.FechaFinal)
                .And(sb => sb.FoliosDesde == record.FoliosDesde)
                .And(sb => sb.FoliosHasta == record.FoliosHasta)
                .And(sb => sb.TomoNumero == record.TomoNumero)
                .And(sb => sb.TomoTotal == record.TomoTotal)
                .And(sb => sb.SerieCodigo == record.SerieCodigo)
                .And(sb => sb.Caja == record.Caja)
                .And(sb => sb.Carpeta == record.Carpeta)
                .SelectList(list => list
                    .Select(p => p.Id)
                    .Select(p => p.NuevaCaja)
                    .Select(p => p.NuevaCarpeta)
                    .Select(p => p.Periodo));
            var result = query.SingleOrDefault<object[]>();
            returnList.Add(result);
        }
    }
    return returnList;
}
However, records contains more than 10,000 items, so NHibernate takes about 10 minutes to do this.
Is there any way to improve performance here?

A good solution would probably be to ditch NHibernate for this task and insert your data into a temporary table, then join on that temporary table.
However, if you want to use NHibernate, you could speed this up by not issuing 10,000 separate queries (which is what's happening now). You could try to break your query into reasonably sized chunks instead:
List<object[]> ProcessChunk(
    IStatelessSession session,
    int start,
    IEnumerable<Registro> currentChunk)
{
    var disjunction = Restrictions.Disjunction();
    foreach (var item in currentChunk)
    {
        var restriction = Restrictions.Conjunction()
            .Add(Restrictions.Where<Registro>(t => t.Asunto == item.Asunto));
            /* etc., calling .Add(..) for every field you want to include */
        disjunction.Add(restriction);
    }
    return session.QueryOver<Registro>()
        .Where(disjunction)
        .SelectList(list => list
            .Select(t => t.Id)
            /* etc., the rest of the select list */)
        .List<object[]>()
        .ToList();
}
Then call that method from your main loop:
const int chunkSize = 500;
for (int kr = 0; kr < records.Length; kr += chunkSize)
{
    var currentChunk = records.Skip(kr).Take(chunkSize);
    returnList.AddRange(ProcessChunk(session, kr, currentChunk));
}
What you're doing here is issuing 20 (instead of 10,000) queries that look like:
select
    /* select list */
from
    [Registro]
where
    ([Registro].[Asunto] = 'somevalue' and ... and ...) or
    ([Registro].[Asunto] = 'someothervalue' and ... and ...)
    /* x500, one for each item in the chunk */
and so on. Each query will return up to 500 records if 500 is the size of each chunk.
This is still not going to be blazing fast; in my local test it roughly halved the running time.
Depending on your database engine, you might quickly run up against a maximum number of parameters that you can pass. You'll probably have to play with the chunkSize to get it to work.
You could probably get this down to a few seconds if you used a temporary table and ditched NHibernate though.
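As a sketch of that temporary-table idea (assuming SQL Server and a plain ADO.NET connection; the key columns, `connectionString`, and the `ToDataTable` helper are illustrative, not from the original code):

```csharp
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Create a temp table holding the lookup keys (lives for this connection).
    using (var cmd = new SqlCommand(
        "CREATE TABLE #Keys (Asunto NVARCHAR(200), Caja INT, Carpeta INT /* etc. */)", conn))
    {
        cmd.ExecuteNonQuery();
    }

    // 2. Bulk-insert the key columns of all 10,000 records in one round trip.
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Keys" })
    {
        bulk.WriteToServer(ToDataTable(records)); // ToDataTable is a helper you'd write
    }

    // 3. Join once instead of issuing 10,000 queries.
    using (var cmd = new SqlCommand(
        @"SELECT r.Id, r.NuevaCaja, r.NuevaCarpeta, r.Periodo
          FROM Registro r
          JOIN #Keys k ON r.Asunto = k.Asunto AND r.Caja = k.Caja AND r.Carpeta = k.Carpeta /* etc. */",
        conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            returnList.Add(new object[] { reader[0], reader[1], reader[2], reader[3] });
    }
}
```

Because SqlBulkCopy streams the keys instead of parameterizing them, this avoids the parameter-count limits entirely and lets the server pick a join plan.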

Related

Joining table to a list using Entity Framework

I have the following Entity Framework function that is joining a table to a list. Each item in serviceSuburbList contains two ints, ServiceId and SuburbId.
public List<SearchResults> GetSearchResultsList(List<ServiceSuburbPair> serviceSuburbList)
{
    var srtList = DataContext.Set<SearchResults>()
        .AsEnumerable()
        .Where(x => serviceSuburbList.Any(m => m.ServiceId == x.ServiceId &&
                                               m.SuburbId == x.SuburbId))
        .ToList();
    return srtList;
}
Obviously that AsEnumerable is killing my performance. I'm unsure of another way to do this. Basically, I have my SearchResults table and I want to find records that match serviceSuburbList.
If serviceSuburbList is not too long, you can combine several Unions:
var table = DataContext.Set<SearchResults>();
IQueryable<SearchResults> query = null;
foreach (var y in serviceSuburbList)
{
    var temp = table.Where(x => x.ServiceId == y.ServiceId && x.SuburbId == y.SuburbId);
    query = query == null ? temp : query.Union(temp);
}
var srtList = query.ToList();
Another solution is to use the Z.EntityFramework.Plus.EF6 library:
var srtList = serviceSuburbList.Select(y =>
        DataContext.Set<SearchResults>().DeferredFirstOrDefault(
            x => x.ServiceId == y.ServiceId && x.SuburbId == y.SuburbId
        ).FutureValue())
    .ToList()
    .Select(x => x.Value)
    .Where(x => x != null)
    .ToList();
// All the queries are batched and sent to the database together
// the first time a .Value property is requested.
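A third option, if adding a package is acceptable, is LINQKit's PredicateBuilder, which composes the same OR-of-ANDs filter into a single translatable query (a sketch; assumes the LINQKit library with EF6 and the same DataContext as above):

```csharp
using LinqKit; // PredicateBuilder and AsExpandable live here

var predicate = PredicateBuilder.False<SearchResults>();
foreach (var pair in serviceSuburbList)
{
    var p = pair; // local copy to avoid modified-closure capture
    predicate = predicate.Or(x => x.ServiceId == p.ServiceId && x.SuburbId == p.SuburbId);
}

// AsExpandable lets EF translate the combined predicate to SQL.
var srtList = DataContext.Set<SearchResults>()
    .AsExpandable()
    .Where(predicate)
    .ToList();
```

Like the Union approach, this can hit provider limits (parameter counts, expression depth) on a large list, so it may still need chunking.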

How to aggregate and SUM EntityFramework fields with multiple joins

I am able to produce a set of desirable results, but I need to group and sum some of these fields and am struggling to understand how to approach this.
In my scenario, what would be the best way to get results that will:
- Have a distinct [KeyCode] (right now I get many records with the same KeyCode but different occupation details)
- SUM the wage and projection fields (in the same query)
Here is my LINQ code:
private IQueryable<MyAbstractCustomOccupationInfoClass> GetMyAbstractCustomOccupationInfoClass(string[] regionNumbers)
{
    // Get a list of wage data
    var wages = _db.ProjectionAndWages
        .Join(
            _db.HWOLInformation,
            wages => wages.KeyCode,
            hwol => hwol.KeyCode,
            (wages, hwol) => new { wages, hwol }
        )
        .Where(o => regionNumbers.Contains(o.hwol.LocationID))
        .Where(o => o.wages.areaID.Equals("48"))
        .Where(o => regionNumbers.Contains(o.wages.RegionNumber.Substring(4))); // regions filter: remove first 4 characters (0000)

    // Join the OccupationInfo table to the wage data for "full" output results
    var occupations = wages.Join(
        _db.OccupationInfo,
        o => o.wages.KeyCode,
        p => p.KeyCode,
        (p, o) => new MyAbstractCustomOccupationInfoClass
        {
            KeyCode = o.KeyCode,
            KeyTitle = o.KeyTitle,
            CareerField = o.CareerField,
            AverageAnnualOpeningsGrowth = p.wages.AverageAnnualOpeningsGrowth,
            AverageAnnualOpeningsReplacement = p.wages.AverageAnnualOpeningsReplacement,
            AverageAnnualOpeningsTotal = p.wages.AverageAnnualOpeningsTotal,
        });

    // TO-DO: How to aggregate and sum the "occupations" list here and make [KeyCode] distinct?
    return occupations;
}
I am unsure whether I should perform the grouping on the second join, use a .GroupJoin(), or add a third query.
You can use a GroupJoin from OccupationInfo onto the wages query and sum inside the result selector:
var occupations = _db.OccupationInfo.GroupJoin(
    wages,
    o => o.KeyCode,
    p => p.wages.KeyCode,
    (o, pg) => new MyAbstractCustomOccupationInfoClass
    {
        KeyCode = o.KeyCode,
        KeyTitle = o.KeyTitle,
        CareerField = o.CareerField,
        AverageAnnualOpeningsGrowth = pg.Sum(p => p.wages.AverageAnnualOpeningsGrowth),
        AverageAnnualOpeningsReplacement = pg.Sum(p => p.wages.AverageAnnualOpeningsReplacement),
        AverageAnnualOpeningsTotal = pg.Sum(p => p.wages.AverageAnnualOpeningsTotal),
    });

Entity Framework: use an already selected value, saved in a new variable, later in the select statement

I wrote an Entity Framework select:
var query = context.MyTable
    .Select(a => new
    {
        count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
        total = a.OtherTable2.Where(d => d.id == id) * count ...
    });
But I always have to select total like this:
var query = context.MyTable
    .Select(a => new
    {
        count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
        total = a.OtherTable2.Where(d => d.id == id) * a.OtherTable.Where(b => b.id == id).Sum(c => c.value)
    });
Is it possible to select it as in my first example, since I have already retrieved the value (and if so, how), or do I have to select it again?
One possible approach is to use two successive selects:
var query = context.MyTable
    .Select(a => new
    {
        count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
        total = a.OtherTable2.Where(d => d.id == id)
    })
    .Select(x => new
    {
        count = x.count,
        total = x.total * x.count
    });
You would simply do
var listFromDatabase = context.MyTable;
var query1 = listFromDatabase.Select(a => /* do something */);
var query2 = listFromDatabase.Select(a => /* do something */);
Although, to be fair, Select requires you to return something, and you aren't; you're fetching count and total from somewhere and setting their values. If you want to do that, I would advise:
var listFromDatabase = context.MyTable.ToList();
listFromDatabase.ForEach(x =>
{
    x.count = /* do some counting */;
    x.total = /* do some totalling */;
});
Note that the ToList() call stops it from being an IQueryable and materializes it into a concrete list; the List object also provides the ForEach method.
If you're going to do complex work inside the Select, I would always start from:
context.MyTable.AsEnumerable()
because that way you're no longer trying to run the whole thing as a database query.
To recap: for the first part, get the table contents into a variable and use ToList() to materialize actual results (do the workload in memory). For the second, if you are working from a straight query, use AsEnumerable so more complex functions can be used inside the Select.

Grouping records using LINQ method syntax

I'm trying to find a way to group records by date (ignoring the time component) using LINQ method syntax, but select only one instance of each record (identified by ItemId within the model).
My simple query is as follows:
range1.Count(x => (x.OpenedDate >= todayFirst) && (x.OpenedDate <= todayLast))
How could I count the unique records within this range by ItemId?
Sounds like you want:
var query = range1.Where(x => x.OpenedDate >= todayFirst && x.OpenedDate <= todayLast)
                  .GroupBy(x => x.ItemId)
                  .Select(g => new { ItemId = g.Key, Count = g.Count() });
foreach (var result in query)
{
Console.WriteLine("{0} - {1}", result.ItemId, result.Count);
}
It's possible that I haven't really understood you properly, though - it's not clear whether you really want to group by date or by item ID.
EDIT: If you just want the count of distinct item IDs in that range, you can use:
var count = range1.Where(x => x.OpenedDate >= todayFirst && x.OpenedDate <= todayLast)
                  .Select(x => x.ItemId)
                  .Distinct()
                  .Count();

Split query into multiple queries and then join the results

I have this function below that takes a list of ids and searches the DB for the matching persons.
public IQueryable<Person> GetPersons(List<int> list)
{
    return db.Persons.Where(a => list.Contains(a.person_id));
}
The reason I need to split this into four queries is that the query can't take more than 2100 comma-separated values:
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
How can I split the list into 4 pieces and make a query for each list. Then join the results into one list of persons?
Solved
I don't want to post it as an answer of my own and take credit away from @George Duckett's answer, just show the solution:
public IQueryable<Person> GetPersons(List<int> list)
{
    var persons = Enumerable.Empty<Person>().AsQueryable<Person>();
    var limit = 2000;
    var result = list.Select((value, index) => new { Index = index, Value = value })
                     .GroupBy(x => x.Index / limit)
                     .Select(g => g.Select(x => x.Value).ToList())
                     .ToList();
    foreach (var r in result)
    {
        var row = r;
        persons = persons.Union(db.Persons.Where(a => row.Contains(a.person_id)));
    }
    return persons;
}
See this answer for splitting up your list: Divide a large IEnumerable into smaller IEnumerable of a fix amount of item
var result = list.Select((value, index) => new { Index = index, Value = value })
                 .GroupBy(x => x.Index / 5)
                 .Select(g => g.Select(x => x.Value).ToList())
                 .ToList();
Then do a foreach over the result (a list of lists), using the below to combine them.
See this answer for combining the results: How to combine Linq query results
I am not sure why you have a method like this - what exactly are you trying to do? Anyway, you can do it with the Skip and Take methods that are used for paging.
List<Person> peopleToReturn = new List<Person>();
int pageSize = 100;
var idPage = list.Skip(0).Take(pageSize).ToList();
int index = 1;
while (idPage.Count > 0)
{
    peopleToReturn.AddRange(db.Persons.Where(a => idPage.Contains(a.person_id)).ToList());
    idPage = list.Skip(index++ * pageSize).Take(pageSize).ToList();
}
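The Skip/Take paging above can also be wrapped in a small reusable helper (a sketch; `Chunk` here is a hand-rolled name, and .NET 6+ ships a built-in `Enumerable.Chunk` with the same shape):

```csharp
// Splits a source sequence into consecutive lists of at most `size` items.
static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int size)
{
    var bucket = new List<T>(size);
    foreach (var item in source)
    {
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;
            bucket = new List<T>(size);
        }
    }
    if (bucket.Count > 0)
        yield return bucket; // the final, possibly short, chunk
}

// Usage: one Contains query per chunk, each safely under the 2100-parameter limit.
// foreach (var page in Chunk(list, 2000))
//     peopleToReturn.AddRange(db.Persons.Where(a => page.Contains(a.person_id)));
```

Unlike repeated Skip/Take, this enumerates the id list only once, which matters if the source is itself a query rather than an in-memory list.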
