LINQ query performance - C#

I have this query that gives the correct results, but it takes about 15 seconds to run:
int Count= P.Pets.Where(c => !P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName).Contains(c.OwnerName) && c.CreatedDate >=
EntityFunctions.AddDays(DateTime.Now, -8)).GroupBy(b=>b.OwnerName).Count();
If I remove this part of the LINQ query:
'&& c.CreatedDate >= EntityFunctions.AddDays(DateTime.Now, -8)'
it takes only about 3 seconds to run. How can I keep the same condition but make the query a lot faster?
I need the date criterion because I don't want any Pets created more than 8 days ago to be included in the count.
Edit
I have a table named People, referred to in this query as P. I want to return a count of the Pets that do not have an owner, and to remove from the query the ones that do have an owner, even if that person also appears on another Pet record as not being the owner of that Pet. In other words, if a person has at least one record in the Pets table that makes them the owner of a pet, I want to remove every row where that person appears from the result, and once that is done return only the Pets created within the last 8 days.

You should cache the date and put that evaluation first (since the DateTime evaluation should be faster than a Contains evaluation). Also avoid recalculating the same query multiple times.
DateTime eightDaysAgo = DateTime.Now.AddDays(-8);
//define this independently from the rest of the query
var ownedPetOwnerNames = P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName);
//Evaluate the dates first, it should be
//faster than Contains()
int Count = P.Pets.Where(c => c.CreatedDate >= eightDaysAgo &&
//Keep the negation from the original query: exclude owners who own a pet
!ownedPetOwnerNames.Contains(c.OwnerName))
.GroupBy(b=>b.OwnerName).Count();
That should return the same results. (I hope)

You are losing any ability to use indexes with that snippet, because it calculates that static date for every row. Declare a DateTime variable before your query, set it to DateTime.Now.AddDays(-8), and use the variable in the where clause instead of your snippet.

Separating the subquery, calling ToList() on it, and using the result in the main query made it run 4 times faster:
var ownedPetOwnerNames = P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName).ToList();
int Count = P.Pets.Where(c => c.CreatedDate >= Date &&
!ownedPetOwnerNames.Contains(c.OwnerName)).GroupBy(b=>b.OwnerName).Count();

You could use (and maybe first create) a navigation property Pet.Owner:
var refDate = DateTime.Today.AddDays(-8);
int Count= P.Pets
.Where(p => !p.Owner.Pets.Any(p1 => p1.IsOwned)
&& p.CreatedDate >= refDate)
.GroupBy(b => b.OwnerName).Count();
This may increase performance because the Contains is gone. At the very least, it scales better than your two-phase query with a Contains involving an unpredictable number of strings.
Of course you also need to make sure there is an index on CreatedDate.
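A minimal in-memory sketch of the combined advice above, assuming LINQ to Objects rather than Entity Framework: the cutoff date is computed once, the owner names are materialized once, and the negation from the original query is kept. The tuple fields and sample rows are hypothetical stand-ins for the Pets table, and the program uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Pets table.
var now = new DateTime(2024, 1, 20);
var pets = new List<(string OwnerName, bool IsOwned, DateTime CreatedDate)>
{
    ("Ann",  true,  now.AddDays(-1)),  // Ann owns a pet, so every Ann row is excluded
    ("Ann",  false, now.AddDays(-2)),
    ("Bob",  false, now.AddDays(-3)),  // counted
    ("Bob",  false, now.AddDays(-4)),  // same owner, still a single group
    ("Cara", false, now.AddDays(-30)), // older than the cutoff
};

// Compute the cutoff once, outside the query.
DateTime cutoff = now.AddDays(-8);

// Materialize the owner names once; a HashSet gives constant-time lookups in memory.
HashSet<string> ownedOwnerNames = pets
    .Where(p => p.IsOwned)
    .Select(p => p.OwnerName)
    .ToHashSet();

// Distinct().Count() gives the same number as GroupBy(OwnerName).Count().
int count = pets
    .Where(p => p.CreatedDate >= cutoff && !ownedOwnerNames.Contains(p.OwnerName))
    .Select(p => p.OwnerName)
    .Distinct()
    .Count();

Console.WriteLine(count); // 1 (only Bob qualifies)
```

Against a database the same shape applies, but whether the provider turns the in-memory collection into an efficient IN clause depends on the provider and the number of names.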

Related

How to find largest property in List?

I have a LINQ "Where" expression that may return several rows:
var checkedPrices = prices.Where(...).ToList();
Since several rows are retrieved from the db, I want to take the row with the longest string from this list.
There is also a case where that field has the same length in several rows, so I tried to find the longest value of another field instead.
int countPrices = checkedPrices.Count();
if (countPrices == 0)
{
checkedPrices = null;
}
else if (countPrices == 1)
{
checkedPrices = checkedPrices.Take(1).ToList();
}
else if (countPrices > 1)
{
var maxPrices1 = checkedPrices.Max(i => i.Field1.Length);
if (maxPrices1 > 1)
{
var maxPrices2 = checkedPrices.Max(i => i.Field2.Length);
checkedPrices = checkedPrices.IndexOf(maxPrices2 );
}
checkedPrices = checkedPrices .ElementAt(maxPrices2);
}
So, I have an issue in the last "else if".
My logic was to find the maximum length of Field1.
If only one row has that maximum length, assign that row to the result of the "Where" expression (checkedPrices).
If more than one row has the maximum Field1 length, take the one with the longest Field2.
My problem is that I don't know how to get the row that holds the longest Field1/Field2.
This part of the code is ridiculously bad (it doesn't even compile):
if (maxPrices1 > 1)
{
var maxPrices2 = checkedPrices.Max(i => i.Field2.Length);
checkedPrices = checkedPrices.IndexOf(maxPrices2 );
}
checkedPrices = checkedPrices .ElementAt(maxPrices2);
Since it seems you need only one price, I would recommend writing a query that fetches just that one. You can order the items (with LINQ's OrderByDescending and ThenByDescending) and then take the top one:
var checkedPrice = prices
.Where(...)
.OrderByDescending(c => c.Field1.Length)
.ThenByDescending(c => c.Field2.Length)
.FirstOrDefault();
P.S.:
For LINQ to Objects this solution can be inefficient for large datasets after filtering, because sorting is an O(n * log(n)) operation while finding the maximum element is an O(n) task.
There can be implementation-dependent LINQ optimizations for some cases, such as combining OrderBy(Descending) with some overloads of operators like First(OrDefault) and possibly Skip/Take.
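To illustrate the O(n) alternative mentioned in the P.S., here is a hedged LINQ to Objects sketch that finds the same row in a single pass with Aggregate. The rows are hypothetical sample data; comparing (Field1.Length, Field2.Length) as a tuple applies the Field2 tie-break only when the Field1 lengths are equal. Note that Aggregate without a seed throws on an empty sequence, so check for that case first.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the filtered prices.
var rows = new[]
{
    (Field1: "abc",  Field2: "x"),
    (Field1: "abcd", Field2: "xy"),
    (Field1: "wxyz", Field2: "xyz"), // same Field1 length as above, longer Field2
};

// Sort-based version from the answer: O(n log n) in LINQ to Objects.
var bySort = rows
    .OrderByDescending(r => r.Field1.Length)
    .ThenByDescending(r => r.Field2.Length)
    .First();

// Single pass: keep whichever row has the larger (Field1.Length, Field2.Length) pair.
var byScan = rows.Aggregate((best, next) =>
    (next.Field1.Length, next.Field2.Length)
        .CompareTo((best.Field1.Length, best.Field2.Length)) > 0
        ? next
        : best);

Console.WriteLine($"{byScan.Field1} / {byScan.Field2}"); // wxyz / xyz
```

This only matters once the data is in memory; when the query runs in the database, the ordering version lets the server pick the row.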

Get max version within a date period query

I am trying to write a LINQ query that gets all the records, groups them by Period (e.g. Sep-18), and then returns the record with the highest Version number within each period. For example, if I have three periods in my periodNames list, the output list should return:
Sep-18
Versions: 1, 2, 3 (Returns record with version 3)
Oct-18
Versions: 1, 2 (Returns record with version 2)
Nov-18
Versions: 1, 2, 3, 4 (Returns record with version 4)
This is the query I have written so far:
var previousStatements = _context.Statements.Where(x => periodNames.Contains(x.Period) &&
x.Version == _context.Statements.Max(y => y.Version)).ToList();
How can I adapt this to the above specification? Thanks
You can use GroupBy in order to group the statements and Max in order to find the maximum value, e.g.
var previousStatements = _context.Statements.Where(x => periodNames.Contains(x.Period))
.GroupBy(x => x.Period)
.Select(x => new { Period = x.Key, MaxVersion = x.Max(y => y.Version) })
.ToList();
The code above returns the Period and the maximum version number only. If you need the record with the highest version number for each period, you can use this:
var previousStatements = (_context.Statements.Where(x => periodNames.Contains(x.Period))
.GroupBy(x => x.Period)
.ToArray())
.Select(x => x.OrderByDescending(y => y.Version).First())
.ToList();
Please note that the code above first uses a call to ToArray to send the GroupBy-query to the database. From the returned groups, the row with the highest version number for each period is then retrieved in memory.
Try to use GroupBy and then OrderByDescending to get the max version:
_context.Statements.GroupBy(f => f.Period).Select(f=>f.OrderByDescending(r=>r.Version).First()).ToList();
I think you would have found the solution yourself if you had written a proper requirement.
You wrote:
...groups them by Period i.e. Sep-18 and then returns the highest Version number within the periods
Your examples don't return the highest version number but the row with the highest version number, so let's assume that is what you want:
From a sequence of Statements, group these statements into groups of statements with equal Period, and return from every group, the statement with the largest VersionNumber.
You haven't defined what you want if two statements within the same Period have the same VersionNumber. Let's assume you think that this will not occur, so you don't care which one is returned in that case.
So you have sequence of Statements, where every Statement has a Period and a VersionNumber.
Officially you haven't defined the class of Period and VersionNumber, the only thing we know about them is that you have some code that can decide whether two Periods are equal, and you have something where you can decide which VersionNumber is larger.
IEqualityComparer<Period> periodComparer = ...
IComparer<VersionNumber> versionComparer = ...
If Period is similar to a DateTime and VersionNumber is similar to an int, then these comparers are easy, otherwise you'll need to write comparers.
From your requirement the code is simple:
Take all input statements
Make groups of statements with equal Period
From every group of statements with this Period keep only the one with the highest VersionNumber
IEnumerable<Statement> statements = ...
var latestStatementsWithinAPeriod = statements
.GroupBy(statement => statement.Period, // group by same value for Period
(period, statementsWithThisPeriod) =>
// From every group of statements keep only the one with the highest VersionNumber
// = order by VersionNumber and take the first
statementsWithThisPeriod
.OrderByDescending(statement => statement.VersionNumber,
versionComparer)
.FirstOrDefault(),
periodComparer);
Once again: if default comparers can be used to decide when two Periods are equal and which VersionNumber is larger, you don't need to add the comparers.
The disadvantage of the OrderByDescending is that the 3rd, 4th, etc. elements are also sorted, while you only need the first element, which is the one with the largest VersionNumber.
This can be optimized by using the less commonly used Aggregate:
(period, statementsWithThisPeriod) => statementsWithThisPeriod.Aggregate(
(newestStatement, nextStatement) =>
(versionComparer.Compare(newestStatement.VersionNumber, nextStatement.VersionNumber) >=0 ) ?
newestStatement :
nextStatement)
This takes the first statement as newestStatement (so far, the one with the highest version number) and puts the 2nd element in nextStatement. The two statements are compared, and if nextStatement has a larger VersionNumber than newestStatement, nextStatement is considered newer and replaces newestStatement. This repeats for every remaining element, and at the end Aggregate returns newestStatement.
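The GroupBy plus Aggregate approach can be sketched in memory as follows. This is an illustration under assumptions, not the asker's model: Period is taken to be a string and Version an int, so the default comparers suffice, and the sample data mirrors the periods from the question.

```csharp
using System;
using System.Linq;

// Hypothetical statements: (Period, Version).
var statements = new[]
{
    (Period: "Sep-18", Version: 1), (Period: "Sep-18", Version: 3), (Period: "Sep-18", Version: 2),
    (Period: "Oct-18", Version: 1), (Period: "Oct-18", Version: 2),
    (Period: "Nov-18", Version: 4), (Period: "Nov-18", Version: 1),
};

var latestPerPeriod = statements
    .GroupBy(
        s => s.Period,
        // For each group, walk the statements once and keep the highest version.
        (period, statementsInPeriod) => statementsInPeriod.Aggregate(
            (newest, next) => next.Version > newest.Version ? next : newest))
    .ToList();

foreach (var s in latestPerPeriod)
    Console.WriteLine($"{s.Period}: version {s.Version}");
// Sep-18: version 3
// Oct-18: version 2
// Nov-18: version 4
```

Each group is scanned once instead of being sorted, which is the saving the Aggregate version is after. A group produced by GroupBy is never empty, so the seedless Aggregate cannot throw here.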
You can try with GroupBy and OrderByDescending and then take first one.
var statements = _context.Statements
.Where(x => periodNames.Contains(x.Period))
.GroupBy(g => g.Period)
.Select(s => s.OrderByDescending(o => o.Version)
.FirstOrDefault()).ToList();

LINQ/lambda: How can I query a DB table based on information from another table? (many to many relationship)

I have a database schema with 3 tables: one for requisitions, one for hospitals, and one joining the two (many-to-many relationship).
I'd like to list all requisitions in the database that are linked to a selected hospital.
This is what I have so far:
var valgtSykehus = Db.Sykehus.Where(n => n.Navn == sykehus).Single(); //the currently selected hospital; I want to list all requisitions linked to it
var Rekvisisjoner = Db.Rekvisisjoner
.Where(r => r.Arkivert == true) //get only archived requisitions
.Include(p1 => p1.Sykehus) //include hospitals
.ToList() //this generates a list of -all- requisitions with the hospitals they are attached to
.Where(x => x.Created > DateTime.Now.AddYears(-3)) //only go 3 years back
.Where(x => x.Sykehus.Contains(valgtSykehus)); //here is the problem: I want to discard all requisitions that do NOT contain the hospital in the valgtSykehus variable
Anyway, this gives me zero requisitions, but if I skip the last line, I get all archived requisitions.
x.Sykehus.Contains(valgtSykehus) executes in a LINQ to Objects context (due to the intermediate ToList call) and most likely uses reference equality, which should normally work as long as you use tracking queries.
Still, it's safer and also more efficient to do the whole thing with a single db query using Any condition with primitive key. Something like this:
var Rekvisisjoner = Db.Rekvisisjoner
.Include(r => r.Sykehus) //include hospitals
.Where(r => r.Arkivert == true) //get only archived requisitions
.Where(r => r.Created > DateTime.Now.AddYears(-3)) //only go 3 years back
.Where(r => r.Sykehus.Any(s => s.Navn == sykehus));
If there is an issue with using DateTime.Now.AddYears(-3) inside the query, just put it into a variable outside of the query and use that variable inside.
var minDate = DateTime.Now.AddYears(-3);
var Rekvisisjoner =
// ...
.Where(r => r.Created > minDate)
//...
The issue may lie in how Contains checks equality. If your valgtSykehus object is logically contained in x.Sykehus (i.e. has the same data) but is not exactly the same object (i.e. the same reference), Contains can fail to find it: for reference types the default equality check is true only when both operands are the same reference, and false otherwise, even if all the data is the same.
You could try the following:
var Rekvisisjoner = Db.Rekvisisjoner
.Where(r => r.Arkivert == true)
.Include(p1 => p1.Sykehus)
.ToList()
.Where(x => x.Created > DateTime.Now.AddYears(-3))
.Where(x => x.Sykehus.Any(sh => sh.Id == valgtSykehus.Id));
If Id (or whatever your ID property is named) is a value type (most likely), this will return true whenever the ID of a Sykehus matches the ID of valgtSykehus.
Oh my.
I just realised that none of the archived requisitions contains any connections to the hospitals, as they apparently are removed one-by-one when the requisition is processed in the program.
I figured this out while trying to reverse the query, so thanks for that tip.

trying to optimize if/else condition slows down program

I am currently trying to optimize a .net application with the help of the VS-Profiling tools.
One function, which gets called quite often, contains the following code:
if (someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position).Count() == 0)
{
lastPosition = 0;
}
else
{
lastPosition = someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position).Cast<int>().First();
}
Which I changed to something like this:
var relevantEntities = someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position);
if (relevantEntities.Count() == 0)
{
lastPosition = 0;
}
else
{
lastPosition = relevantEntities.Cast<int>().First();
}
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
To my surprise, the execution time (the number of inclusive samples) of the method has not decreased but has even increased by 9% (according to the profiler).
Can someone explain why this is happening?
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
It will not. In fact it cannot. The database might not return the same results for the two queries. It's entirely possible for a result to be added or removed after the first query and before the second. (Making this code not only inefficient, but potentially broken if that were to happen.) Since it's entirely possible that you want two queries to be executed, knowing that the results could differ, it's important that the results of the query not be re-used.
The important point here is the idea of deferred execution. relevantEntities is not the results of a query, it's the query itself. It's not until the IQueryable is iterated (by a method such as Count, First, a foreach loop, etc.) that the database will be queried, and each time you iterate the query it will perform another query against the database.
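Deferred execution can be observed directly with LINQ to Objects. The sketch below is an illustration with a hypothetical in-memory list instead of a database: a counter records how often the filter runs, showing that defining the query runs nothing, that Count and First each enumerate the source again, and that ToList materializes the results once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0; // counts how often the filter predicate runs
var positions = new List<int> { 3, 7, 5 };

// Defining the query executes nothing.
IEnumerable<int> query = positions.Where(p => { evaluations++; return p > 4; });
int afterDefinition = evaluations;   // still 0

int count = query.Count();           // enumerates the source
int afterCount = evaluations;
int first = query.First();           // enumerates the source again from the start
int afterFirst = evaluations;        // larger than afterCount

// Materialize once when the same results are needed several times.
List<int> cached = query.ToList();
int afterToList = evaluations;
int cachedCount = cached.Count;      // reads the list, the predicate does not run
int cachedFirst = cached[0];

Console.WriteLine($"{afterDefinition} {afterCount} {afterFirst} {afterToList} {evaluations}");
```

With an IQueryable the same rule applies, except that each enumeration is a round trip to the database instead of a loop over a list.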
In your case you can just do this:
var lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.OrderByDescending(i => i.Position)
.Select(i => i.Position)
.Cast<int>()
.FirstOrDefault();
This leverages the fact that the default value of an int is 0, which is what you were setting the value to when there was no match.
Note that this query is functionally the same as yours; it just avoids executing it twice. An even better query would be the one suggested by lazyberezovsky, which uses Max rather than ordering and taking the first item. If there is an index on that column there wouldn't be much of a difference, but if there is no index, ordering would be a lot more expensive.
You can use Max() to get maximum position instead of ordering and taking first item, and DefaultIfEmpty() to provide default value (zero for int) if there are no entities matching your condition. Btw you can provide custom default value to return if sequence is empty.
lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.Select(i => i.Position)
.Cast<int>()
.DefaultIfEmpty()
.Max();
Thus you will avoid executing two queries: one to determine whether there are any positions, and another to get the latest position.
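The behaviour of DefaultIfEmpty followed by Max can be checked with a small in-memory sketch. The rows and the LastPosition helper are hypothetical; the point is only that an empty filter result yields the default (0 for int, or a custom value) instead of making Max throw.

```csharp
using System;
using System.Linq;

// Hypothetical rows: (ParentId, Position).
var rows = new[]
{
    (ParentId: 1, Position: 2),
    (ParentId: 1, Position: 9),
    (ParentId: 2, Position: 4),
};

// Highest position for a parent, or the fallback when nothing matches.
int LastPosition(int parentId, int fallback = 0) => rows
    .Where(r => r.ParentId == parentId)
    .Select(r => r.Position)
    .DefaultIfEmpty(fallback) // yields a single fallback element for an empty sequence
    .Max();

Console.WriteLine(LastPosition(1));      // 9
Console.WriteLine(LastPosition(42));     // 0
Console.WriteLine(LastPosition(42, -1)); // -1
```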

How to use two conditions in Linq lambda which has different where clause

I want to query my item in table Items, where the last update of each item must be less than 91 days old (from last update till now) and the quantity > 0.
This is my code in the Model:
public IList<Item> GetAllProducts()
{
var ien_item = from i in this.DataContext.Items
orderby i.LastUpdated descending
select i;
return ien_item.ToList().Where(
s =>
HelperClasses.HelperClass.IsLastUpdate(s.LastUpdated.Value) == true
&&
(s => s.Quantity) > 0
)
.ToList();
}
Can anyone solve it? Thanks.
We don't really know what's not working here. EDIT: Merlyn spotted it; your lambda syntax is messed up. There's more to do here though.
However, I'd have thought you'd want this:
public IList<Item> GetAllProducts()
{
var lastUpdateLimit = DateTime.UtcNow.Date.AddDays(-91);
var query = from item in DataContext.Items
where item.Quantity > 0 && item.LastUpdated >= lastUpdateLimit
orderby item.LastUpdated descending
select item;
return query.ToList();
}
Note that this is able to do all the querying at the database side instead of fetching all the items and filtering at the client side. It does assume that HelperClasses.HelperClass.IsLastUpdate is simple though, and basically equivalent to the filter I've got above.
(One additional point to note is that by evaluating UtcNow.Date once, the result will be consistent for all items - whereas if your code evaluates "today" on every call to IsLastUpdate, some values in the query may end up being filtered against a different date to other values, due to time progressing while the query is evaluating.)
EDIT: If you really need to use HelperClasses.HelperClass.IsLastUpdate then I'd suggest:
public IList<Item> GetAllProducts()
{
var query = from item in DataContext.Items
where item.Quantity > 0
orderby item.LastUpdated descending
select item;
return query.AsEnumerable()
.Where(s => HelperClasses.HelperClass.IsLastUpdate(s.LastUpdated.Value))
.ToList();
}
... then at least the quantity filter is performed at the database side, and you're not creating a complete buffered list before you need to (note the single call to ToList).
The problem is your lambda syntax. You're trying to define a second lambda while in the middle of defining a first lambda. While this is possible to do, and useful in some contexts, it is sort of an advanced scenario, and probably won't be useful to you until you know you need it.
Right now, you don't need it. Unless you know you need it, you don't need it :)
So -
Instead of what you've written:
.Where(
s =>
HelperClasses.HelperClass.IsLastUpdate(s.LastUpdated.Value) == true
&& (s => s.Quantity) > 0
)
Write this instead:
.Where(
s =>
HelperClasses.HelperClass.IsLastUpdate(s.LastUpdated.Value) == true
&& s.Quantity > 0 // Notice I got rid of the extra lambda here
)
If you're morbidly curious:
The compile error you got is because you didn't define your second lambda correctly. It redefined a variable you'd already used (s), and you were trying to check if a lambda was greater than zero. That makes no sense. You can only compare the result of a lambda to some value. It's like calling a function. You don't compare functions to numbers - you compare the result you get when calling a function to a number.
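The difference between a lambda and its result can be shown in a few lines. This is a sketch with hypothetical item data: quantityOf is a function value, so it has to be called before the number it returns can be compared with zero.

```csharp
using System;
using System.Linq;

// Hypothetical items: (Name, Quantity).
var items = new[] { (Name: "bolt", Quantity: 0), (Name: "nut", Quantity: 12) };

// A lambda is a function value; invoking it produces the number.
Func<(string Name, int Quantity), int> quantityOf = s => s.Quantity;

// bool wrong = quantityOf > 0; // does not compile: a delegate cannot be compared with an int

// Compare the result of calling the function, not the function itself.
var inStock = items
    .Where(s => quantityOf(s) > 0)
    .Select(s => s.Name)
    .ToList();

Console.WriteLine(string.Join(", ", inStock)); // nut
```

In the original query there is no need for the extra function at all, which is why s.Quantity > 0 is the simpler fix.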
Easy ...
public IList<Item> GetAllProducts()
{
var ien_item =
from i in DataContext.Items
where
HelperClasses.HelperClass.IsLastUpdate(i.LastUpdated.Value)
&& i.Quantity > 0
orderby i.LastUpdated descending
select i;
return ien_item.ToList();
}
Linq to SQL: Methods are not allowed (linq is not magic and can not convert C# methods to TSQL)
http://msdn.microsoft.com/en-us/library/bb425822.aspx
Linq to Object: while looking the same, it is much more powerful than linq to SQL... but can not query SQL databases :)
http://msdn.microsoft.com/en-us/library/bb397919.aspx
Linq to XML: same as linq to Object, with xml object
http://msdn.microsoft.com/en-us/library/bb387098.aspx
Linq to Dataset: not the same as Linq to SQL !
http://msdn.microsoft.com/en-us/library/bb386977.aspx
Other linq providers:
http://en.wikipedia.org/wiki/Language_Integrated_Query
