Linq orderby not working as expected - c#

I'm trying to get data from a database, but I don't get the expected result.
var totalNews = GetNewsData(false, CmsPagesRepository.CurrentUserSettings.CmsLanguageID)
.OrderBy(n => n.Priority)
.ThenBy(n => n.Index)
.ThenBy(n => n.Title);
I have a table of News with a column Index and a column Priority. I want to order the news by Priority, showing the items that have a priority first and the ones whose Priority is null after them.
But now, if I have 3 news items with Index (1, 4, 2) and Priority (null, 0, 1), the first item in totalNews is the one with Priority null and Index 1. What do I have to correct?

Though the answer you have accepted will work, I don't much like it. First, in the unlikely event that some items have the largest integer (int.MaxValue) as their priority, they will not be ordered correctly with respect to null. A good solution works for any inputs, not just common inputs. Second, the code does not match the specification. Your specification is "order first by whether the priority is null, then by priority, then by...", so that's how the code should read. I would suggest you write:
GetNewsData(...)
.OrderBy(n => n.Priority == null) // nulls last
.ThenBy(n => n.Priority)
.ThenBy(n => n.Index)
.ThenBy(n => n.Title);
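To compare this with the `?? int.MaxValue` approach, here is a minimal in-memory sketch. The tuple array is a hypothetical stand-in for the News rows (it is not the asker's GetNewsData result), and one item deliberately uses int.MaxValue as its priority to show the edge case.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the News rows; Priority is nullable.
var news = new (int? Priority, int Index, string Title)[]
{
    (null, 1, "A"),
    (0, 4, "B"),
    (1, 2, "C"),
    (int.MaxValue, 3, "D"),
};

// Order by "is null" first: false sorts before true, so nulls end up last.
var nullsLast = news
    .OrderBy(n => n.Priority == null)
    .ThenBy(n => n.Priority)
    .ThenBy(n => n.Index)
    .ThenBy(n => n.Title)
    .Select(n => n.Title);

// Coalescing to int.MaxValue cannot tell null apart from a real int.MaxValue,
// so the tie falls through to Index and the null item lands before "D".
var coalesced = news
    .OrderBy(n => n.Priority ?? int.MaxValue)
    .ThenBy(n => n.Index)
    .ThenBy(n => n.Title)
    .Select(n => n.Title);

var nullsLastText = string.Join(", ", nullsLast);
var coalescedText = string.Join(", ", coalesced);
Console.WriteLine(nullsLastText); // B, C, D, A
Console.WriteLine(coalescedText); // B, C, A, D
```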

You probably want a simple null check in the OrderBy priority, like this:
.OrderBy(n => n.Priority ?? int.MaxValue)
This will default the priority to a high number if it is null.

Related

Check if tie in count of lists using linq

States have cities. I need the state with most cities only if there is no tie. Tie means top 2 states have the same number of cities.
var stateWithMostCities = _states
.OrderByDescending(_p => _p.cities.Count())
.Take(2)
.ToList();
Now I can check whether the city count of the first state equals that of the second and determine if there is a tie. However, I am asking whether this can be achieved in the same statement shown above using TakeWhile, Skip and other creative uses of LINQ. Thanks
Something like this?
var stateWithMostCitiesWithoutATie = _states.GroupBy(_p => _p.cities.Count())
.OrderByDescending(g => g.Key)
.Select(g => g.Count() == 1 ? g.First() : null) // null when the top group is a tie
.FirstOrDefault();
The key, as @Mong Zhu pointed out, is to group by the city count. After that you can order descending to get the group with the maximum count, and if that group contains more than one state you have a tie.
Technically, you can use Aggregate over ordered states:
// the state with maximum cities; null in case of tie
var stateWithMostCities = _states
.OrderByDescending(state => state.cities.Count())
.Take(2) // at most 2 items to analyze
.Aggregate((s, a) => s.cities.Count() == a.cities.Count() ? null : s);
But I doubt if you should do this: comparing top 2 states is more readable.
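For comparison, the "compare the top 2" version can be wrapped in a small helper. This is only a sketch: WinnerOrTie, the tuple shape, and the "tie"/"none" return values are hypothetical choices made for the illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the name of the state with the most cities, "tie" when the top two
// states have equally many, and "none" for an empty input.
static string WinnerOrTie(IEnumerable<(string Name, string[] cities)> states)
{
    var top2 = states.OrderByDescending(s => s.cities.Length).Take(2).ToList();
    if (top2.Count == 0) return "none";
    if (top2.Count == 2 && top2[0].cities.Length == top2[1].cities.Length) return "tie";
    return top2[0].Name;
}

var clearWinner = new (string Name, string[] cities)[]
{
    ("Texas", new[] { "Austin", "Dallas", "Houston" }),
    ("Ohio", new[] { "Columbus", "Dayton" }),
};
var tied = new (string Name, string[] cities)[]
{
    ("Texas", new[] { "Austin", "Dallas" }),
    ("Ohio", new[] { "Columbus", "Dayton" }),
    ("Utah", new[] { "Provo" }),
};

Console.WriteLine(WinnerOrTie(clearWinner)); // Texas
Console.WriteLine(WinnerOrTie(tied));        // tie
```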

Get max version within a date period query

I am trying to write a LINQ query that gets all the records and groups them by Period i.e. Sep-18 and then returns the record with the highest Version number within the periods. For example if I have three periods contained within my periodNames list the output list should return:
Sep-18
Versions: 1, 2, 3 (Returns record with version 3)
Oct-18
Versions: 1, 2 (Returns record with version 2)
Nov-18
Versions: 1, 2, 3, 4 (Returns record with version 4)
This is the query I have written so far:
var previousStatements = _context.Statements.Where(x => periodNames.Contains(x.Period) &&
x.Version == _context.Statements.Max(y => y.Version)).ToList();
How can I adapt this to the above specification? Thanks
You can use GroupBy in order to group the statements and Max in order to find the maximum value, e.g.
var previousStatements = _context.Statements.Where(x => periodNames.Contains(x.Period))
.GroupBy(x => x.Period)
.Select(x => new { Period = x.Key, MaxVersion = x.Max(y => y.Version) })
.ToList();
The code above returns the Period and the maximum version number only. If you need the record with the highest version number for each period, you can use this:
var previousStatements = (_context.Statements.Where(x => periodNames.Contains(x.Period))
.GroupBy(x => x.Period)
.ToArray())
.Select(x => x.OrderByDescending(y => y.Version).First())
.ToList();
Please note that the code above first uses a call to ToArray to send the GroupBy-query to the database. From the returned groups, the row with the highest version number for each period is then retrieved in memory.
Try using GroupBy and then OrderByDescending to get the max version:
_context.Statements.GroupBy(f => f.Period).Select(f => f.OrderByDescending(r => r.Version).First()).ToList();
I think you would have found the solution yourself if you had written down a proper requirement.
You wrote:
...groups them by Period i.e. Sep-18 and then returns the highest Version number within the periods
Your examples don't return the highest version number but the row with the highest version number, so let's assume that is what you want:
From a sequence of Statements, group these statements into groups of statements with equal Period, and return from every group, the statement with the largest VersionNumber.
You haven't defined what you want if two statements within the same Period have the same VersionNumber. Let's assume you think that this will not occur, so you don't care which one is returned in that case.
So you have sequence of Statements, where every Statement has a Period and a VersionNumber.
Officially you haven't defined the class of Period and VersionNumber, the only thing we know about them is that you have some code that can decide whether two Periods are equal, and you have something where you can decide which VersionNumber is larger.
IEqualityComparer<Period> periodComparer = ...
IComparer<VersionNumber> versionComparer = ...
If Period is similar to a DateTime and VersionNumber is similar to an int, then these comparers are easy, otherwise you'll need to write comparers.
From your requirement the code is simple:
Take all input statements
Make groups of statements with equal Period
From every group of statements with this Period keep only the one with the highest VersionNumber
IEnumerable<Statement> statements = ...
var latestStatementsWithinAPeriod = statements
.GroupBy(statement => statement.Period, // group by same value for Period
(period, statementsWithThisPeriod) =>
// From every group of statements keep only the one with the highest VersionNumber
// = order by VersionNumber and take the first
statementsWithThisPeriod
.OrderByDescending(statement => statement.VersionNumber,
versionComparer)
.FirstOrDefault(),
periodComparer);
Once again: if default comparers can be used to decide when two Periods are equal and which VersionNumber is larger, you don't need to add the comparers.
The disadvantage of OrderByDescending is that the 3rd, 4th, etc. elements are also sorted, while you only need the first element, which is the one with the largest VersionNumber.
This can be optimized by using the less commonly used Aggregate:
(period, statementsWithThisPeriod) => statementsWithThisPeriod.Aggregate(
(newestStatement, nextStatement) =>
(versionComparer.Compare(newestStatement.VersionNumber, nextStatement.VersionNumber) >=0 ) ?
newestStatement :
nextStatement)
This starts with the first statement as newestStatement (so far, the one with the highest version number). The second element is put in nextStatement. Both statements are compared, and if nextStatement has a larger VersionNumber than newestStatement, nextStatement is considered newer and replaces newestStatement. This repeats for every remaining element, and at the end Aggregate returns newestStatement.
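As a rough illustration, the sorting variant and the Aggregate variant can be compared on in-memory data. The tuple array below is a made-up stand-in for the Statement rows, and plain string and int keys are assumed so that the default comparers suffice.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Statement rows.
var statements = new (string Period, int Version)[]
{
    ("Sep-18", 1), ("Sep-18", 3), ("Sep-18", 2),
    ("Oct-18", 2), ("Oct-18", 1),
    ("Nov-18", 1), ("Nov-18", 4), ("Nov-18", 2), ("Nov-18", 3),
};

// Variant 1: sort each group and take the first element.
var bySorting = statements
    .GroupBy(s => s.Period)
    .Select(g => g.OrderByDescending(s => s.Version).First())
    .ToList();

// Variant 2: one pass per group that keeps the newest statement seen so far.
var byAggregate = statements
    .GroupBy(s => s.Period)
    .Select(g => g.Aggregate((newest, next) => newest.Version >= next.Version ? newest : next))
    .ToList();

foreach (var s in byAggregate)
    Console.WriteLine($"{s.Period}: version {s.Version}");
// Sep-18: version 3
// Oct-18: version 2
// Nov-18: version 4
```

Both variants return the same rows here; the Aggregate version just avoids sorting elements that are never used.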
You can try with GroupBy and OrderByDescending and then take first one.
var statements = _context.Statements
.Where(x => periodNames.Contains(x.Period))
.GroupBy(g => g.Period)
.Select(s => s.OrderByDescending(o => o.Version)
.FirstOrDefault()).ToList();

Does LINQ know how to optimize "queries"?

Suppose I do something like
var Ordered = MyList.OrderBy(x => x.prop1).ThenBy(x => x.prop2);
Does MyList.OrderBy(x => x.prop1) return the filtered list, and then does it further filter that list by ThenBy(x => x.prop2)? In other words, is it equivalent to
var OrderedByProp1 = MyList.OrderBy(x => x.prop1);
var Ordered = OrderedByProp1.OrderBy(x => x.prop2);
???
Because obviously it's possible to optimize this by running a sorting algorithm with a comparator:
MyList.Sort((x, y) => x.prop1 != y.prop1 ? x.prop1.CompareTo(y.prop1) : x.prop2.CompareTo(y.prop2)); // sorts MyList in place
If it does do some sort of optimization and intermediate lists are not returned in the process, then how does it know how to do that? How do you write a class that optimizes chains of methods on itself? Makes no sense.
Does MyList.OrderBy(x => x.prop1) return the filtered list
No. LINQ methods (at least typically) return queries, not the results of executing those queries.
OrderBy just returns an object which, when you ask it for an item, will return the first item in the collection given a particular ordering. But until you actually ask it for a result it's not doing anything.
Note you can also get a decent idea as to what's going on by just looking at what OrderBy returns. It returns IOrderedEnumerable<T>. That interface has a method CreateOrderedEnumerable which:
Performs a subsequent ordering on the elements of an IOrderedEnumerable according to a key.
That method is what ThenBy uses to indicate that there is a subsequent ordering.
This means that you're building up all of the comparers that you want to be used, from the OrderBy and all ThenBy calls before you ever need to generate a single item in the result set.
For more specifics on exactly how you can go about creating this behavior, see Jon Skeet's blog series on the subject.
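A small LINQ to Objects sketch can make both points visible: nothing is sorted until the query is enumerated, and OrderBy followed by ThenBy is not the same as two OrderBy calls. The list contents and the keyCalls counter are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<(int prop1, int prop2)> { (2, 1), (1, 2), (1, 1) };

int keyCalls = 0;
var query = myList
    .OrderBy(x => { keyCalls++; return x.prop1; })
    .ThenBy(x => x.prop2);
int callsBeforeEnumeration = keyCalls; // still 0: only the query object exists

// Enumerating the query triggers one sort that uses both keys.
var ordered = string.Join(" ", query);

// A second OrderBy replaces the primary key instead of refining the first one.
var twoOrderBys = string.Join(" ", myList.OrderBy(x => x.prop1).OrderBy(x => x.prop2));

Console.WriteLine(callsBeforeEnumeration); // 0
Console.WriteLine(ordered);                // (1, 1) (1, 2) (2, 1)
Console.WriteLine(twoOrderBys);            // (1, 1) (2, 1) (1, 2)
```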

Linq performance query

I have this query that gives the correct results but it takes about 15 seconds to run
int Count= P.Pets.Where(c => !P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName).Contains(c.OwnerName) && c.CreatedDate >=
EntityFunctions.AddDays(DateTime.Now, -8)).GroupBy(b=>b.OwnerName).Count();
If I remove this part of the linq
'&& c.CreatedDate >= EntityFunctions.AddDays(DateTime.Now, -8)'
It only takes about 3 seconds to run. How can I keep the same condition happening but a lot faster?
I need that date criterion because I don't want any records created more than 8 days ago to be included in the count.
Edit
I have a table named People, referred to in this query as P, and I want to return a count of the pets that do not have an owner. If a person has at least one record in the Pets table that marks them as the owner of a pet, I want to remove every record for that person from the result, even records where they are not the owner of that pet. Once that is done, only the pets created within the last 8 days should be counted.
You should cache the date and put that evaluation first (since the DateTime evaluation should be faster than a Contains evaluation). Also avoid recalculating the same query multiple times.
//calculate the cutoff once, outside the query
DateTime eightDaysOld = DateTime.Now.AddDays(-8);
//define this subquery independently from the rest of the query
var ownedPetOwnerNames = P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName);
//Evaluate the dates first, it should be
//faster than Contains()
int Count = P.Pets.Where(c => c.CreatedDate >= eightDaysOld &&
//keep the negation from the original query
!ownedPetOwnerNames.Contains(c.OwnerName))
.GroupBy(b=>b.OwnerName).Count();
That should return the same results. (I hope)
You are losing any ability to use indices with that snippet, as it calculates that static date for every row. Declare a DateTime variable before your query, set it to DateTime.Now.AddDays(-8), and use the variable instead of your snippet in the where clause.
Separating the subquery, calling ToList() on it, and using the result in the main query made it run about 4 times faster:
var ownedPetOwnerNames = P.Pets.Where(a => a.IsOwned == true)
.Select(a => a.OwnerName).ToList();
int Count = P.Pets.Where(c => c.CreatedDate >= Date &&
!ownedPetOwnerNames.Contains(c.OwnerName)).GroupBy(b=>b.OwnerName).Count();
You could use (and maybe first create) a navigation property Pet.Owner:
var refDate = DateTime.Today.AddDays(-8);
int Count= P.Pets
.Where(p => !p.Owner.Pets.Any(p1 => p1.IsOwned)
&& p.CreatedDate >= refDate)
.GroupBy(b => b.OwnerName).Count();
This may increase performance because the Contains is gone. At least it scales better than your two-phase query with a Contains involving an unpredictable number of strings.
Of course you also need to make sure there is an index on CreatedDate.
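The intended logic (cutoff date computed once, owners excluded, remaining pets grouped by owner name) can be checked on in-memory data. This sketch says nothing about database performance; the tuple array, the fixed "now" value, and the HashSet are assumptions made so the example is deterministic and self-contained.

```csharp
using System;
using System.Linq;

var now = new DateTime(2024, 1, 31); // fixed "now" so the result is reproducible
var refDate = now.AddDays(-8);       // computed once, outside the query

// Hypothetical stand-in for the Pets table.
var pets = new (string OwnerName, bool IsOwned, DateTime CreatedDate)[]
{
    ("Ann", true,  new DateTime(2024, 1, 30)),
    ("Ann", false, new DateTime(2024, 1, 29)),
    ("Bob", false, new DateTime(2024, 1, 28)),
    ("Bob", false, new DateTime(2024, 1, 27)),
    ("Cid", false, new DateTime(2024, 1, 1)),
};

// Names of everyone who owns at least one pet.
var ownerNames = pets.Where(p => p.IsOwned).Select(p => p.OwnerName).ToHashSet();

int count = pets
    .Where(p => p.CreatedDate >= refDate && !ownerNames.Contains(p.OwnerName))
    .GroupBy(p => p.OwnerName)
    .Count();

Console.WriteLine(count); // 1 (only Bob: Ann is an owner, Cid's record is too old)
```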

trying to optimize if/else condition slows down program

I am currently trying to optimize a .net application with the help of the VS-Profiling tools.
One function, which gets called quite often, contains the following code:
if (someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position).Count() == 0)
{
lastPosition = 0;
}
else
{
lastPosition = someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position).Cast<int>().First();
}
Which I changed to something like this:
var relevantEntities = someObjectContext.someObjectSet.Where(i => i.PNT_ATT_ID == tmp_ATT_ID).OrderByDescending(i => i.Position).Select(i => i.Position);
if (relevantEntities.Count() == 0)
{
lastPosition = 0;
}
else
{
lastPosition = relevantEntities.Cast<int>().First();
}
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
To my surprise, the execution time (the number of inclusive samples) of the method has not decreased but actually increased by 9% (according to the profiler).
Can someone explain why this is happening?
I was hoping that the change would speed up the method a bit, as I was unsure whether the compiler would notice that the query is done twice and cache the results.
It will not. In fact it cannot. The database might not return the same results for the two queries. It's entirely possible for a result to be added or removed after the first query and before the second. (Making this code not only inefficient, but potentially broken if that were to happen.) Since it's entirely possible that you want two queries to be executed, knowing that the results could differ, it's important that the results of the query not be re-used.
The important point here is the idea of deferred execution. relevantEntities is not the results of a query, it's the query itself. It's not until the IQueryable is iterated (by a method such as Count, First, a foreach loop, etc.) that the database will be queried, and each time you iterate the query it will perform another query against the database.
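The same effect can be observed without a database. In this sketch a hypothetical iterator, Positions, stands in for the table, and a counter records how often the source is actually enumerated; it is an in-memory analogy, not Entity Framework behaviour itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int executions = 0;

// Hypothetical stand-in for the database: each enumeration counts as one "query".
IEnumerable<int> Positions()
{
    executions++;
    yield return 3;
    yield return 7;
}

var relevant = Positions().OrderByDescending(p => p); // nothing has run yet
int before = executions;

// Same shape as the code in the question: Count() and First() on one query variable.
int lastPosition = relevant.Count() == 0 ? 0 : relevant.First();

Console.WriteLine(before);       // 0
Console.WriteLine(lastPosition); // 7
Console.WriteLine(executions);   // 2: Count() and First() each enumerated the source
```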
In your case you can just do this:
var lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.OrderByDescending(i => i.Position)
.Select(i => i.Position)
.Cast<int>()
.FirstOrDefault();
This leverages the fact that the default value of an int is 0, which is what you were setting the value to in the event that there was no match before.
Note that this query is functionally the same as yours; it just avoids executing it twice. An even better query would be the one suggested by lazyberezovsky, which uses Max rather than ordering and taking the first item. If there is an index on that column there won't be much of a difference, but if there is no index, ordering will be a lot more expensive.
You can use Max() to get the maximum position instead of ordering and taking the first item, and DefaultIfEmpty() to provide a default value (zero for int) if no entities match your condition. By the way, you can also pass a custom default value to DefaultIfEmpty() to return when the sequence is empty.
lastPosition = someObjectContext.someObjectSet
.Where(i => i.PNT_ATT_ID == tmp_ATT_ID)
.Select(i => i.Position)
.Cast<int>()
.DefaultIfEmpty()
.Max();
This way you avoid executing two queries: one to determine whether there are any positions, and another to get the latest position.
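A minimal in-memory check of the DefaultIfEmpty().Max() pattern, assuming a made-up array in place of someObjectSet and a hypothetical helper named LastPosition:

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for someObjectSet.
var rows = new (int PNT_ATT_ID, int Position)[] { (1, 3), (1, 7), (2, 5) };

int LastPosition(int attId) => rows
    .Where(i => i.PNT_ATT_ID == attId)
    .Select(i => i.Position)
    .DefaultIfEmpty() // yields a single 0 when nothing matches
    .Max();

int found = LastPosition(1);
int missing = LastPosition(9);
Console.WriteLine(found);   // 7
Console.WriteLine(missing); // 0
```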
