C# Entity Framework - Order by and Take

I am trying to select the 5 oldest entries from my database. I am using the following statement:
dbContext.Items.Take(5).OrderBy(i => i.LastCheck).ToListAsync();
The problem is that EF first takes the first 5 items from the table and then sorts them, so I always get the first 5 entries of the table. But I want it to sort the items first and then select the top 5, as when I execute this SQL command:
select top 5 * from Items order by LastCheck asc
Here I get the right result.
Is it possible to do that in EF, or do I have to execute the SQL query myself?

You have to swap Take() and OrderBy():
dbContext.Items.OrderBy(i => i.LastCheck).Take(5).ToListAsync();

I think this has already been answered above, but:
dbContext.Items.OrderBy(x => x.LastCheck).Take(5).ToListAsync();
Doing Take first would select the first 5 items from the list and then sort just those 5, whereas what you want to do is sort the list into date order first and then take the top 5.
Similarly, if you wanted to sort by newest first, the query above (to ensure it is sorted) would become:
dbContext.Items.OrderByDescending(x => x.LastCheck).Take(5).ToListAsync();
Hope this helps!
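The difference is visible even in plain LINQ-to-Objects; a minimal sketch, assuming an in-memory list standing in for the Items table:

```csharp
// In-memory stand-in for the Items table; LastCheck values are out of order.
var items = new[]
{
    new { Id = 1, LastCheck = new DateTime(2020, 5, 1) },
    new { Id = 2, LastCheck = new DateTime(2020, 1, 1) },
    new { Id = 3, LastCheck = new DateTime(2020, 3, 1) },
};

// Take-then-OrderBy: takes the first 2 rows in table order, then sorts only those.
var wrong = items.Take(2).OrderBy(i => i.LastCheck).ToList();  // Ids 2, 1

// OrderBy-then-Take: sorts everything first, then takes the 2 oldest.
var right = items.OrderBy(i => i.LastCheck).Take(2).ToList();  // Ids 2, 3
```

Against the database the same reordering changes the generated SQL from sorting a TOP subquery to applying TOP after ORDER BY.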

Related

Entity Framework does not return duplicate matching items

I found an interesting issue in Entity Framework. Check the code below. I am using Contains() to find all rows from table Test1 with a matching Id, but when I add the same id multiple times it returns only 1 item instead of duplicates. I want to get the duplicate items too. How can I do this?
var ids = new List<int>();
ids.Add(1);
ids.Add(1);
var foo = ctx.Test1.Include("Test2").Where(x => ids.Contains(x.Id)).ToList();
You cannot. You really need to learn the basics of how SQL and queries work, because your question rests on a fundamental misunderstanding.
when i add same id multiple times it returns only 1 item not duplicating items
Because the table STILL contains only 1 item. If you add the same ID multiple times, why would you expect it to return the row multiple times?
The way it is evaluated is:
Take a row.
Check whether its ID matches any value in the provided list.
Move to the next row.
So, regardless of how often you put the ID into the list of approved ids, it will obviously return only one row. You do not get duplicate items because you do not have duplicate items to start with.
As so often with anything EF-related, it also helps to intercept and look at the generated SQL and the query plan - this at least makes it clear that you cannot get the row twice. Contains translates to an IN clause containing the list of values. As I said above, Contains filters rows; it will not magically duplicate them.
I would suggest duplicating the rows manually after the query - though in 25 years I have never seen this requirement come up, so I would strongly suggest you first check whether what you are trying to do makes logical sense from a higher-level perspective.
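A minimal sketch of that manual client-side duplication, reusing the ctx and ids from the question (the byId dictionary is a hypothetical helper; it assumes Id is unique within Test1):

```csharp
// ids may contain duplicates; the query itself returns each matching row once.
var ids = new List<int> { 1, 1, 2 };

var rows = ctx.Test1
    .Include("Test2")
    .Where(x => ids.Contains(x.Id))
    .ToList();                          // at most one row per distinct id

// Re-expand on the client: one output element per entry in ids.
var byId = rows.ToDictionary(x => x.Id);
var duplicated = ids
    .Where(id => byId.ContainsKey(id))  // skip ids with no matching row
    .Select(id => byId[id])
    .ToList();                          // same row object repeated for duplicate ids
```

Note that the duplicates are references to the same entity instance, which is usually fine for display purposes.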
Why would it be otherwise? Your EF Contains call takes the SQL IN form:
SELECT
...
FROM ...
WHERE ... IN (1, 1)

How does OrderBy behave in LINQ when all the records have the same value?

When OrderBy is applied to a datetime column where the values are equal, I get different results on different hits with LINQ to SQL.
Let us say some 15 records share the same datetime in one of their fields, and those 15 records are paginated with a limit of 10 per page. On the first run, 10 records came back for page 1. Then for page 2 I do not get the remaining 5 records, but 5 of the records from page 1.
Question:
How do the OrderBy, Skip and Take functions work here, and
why this discrepancy in the result?
LINQ plays no role in how ordering is applied to the underlying data source; LINQ itself is simply an enumerating extension. As per the comment on your question, you are really asking how MSSQL applies ordering in a query.
In MSSQL (and most other RDBMSs), the ordering of identical values depends on the underlying implementation and configuration of the RDBMS. The order of such values can be perceived as random and can change between identical queries. That does not mean you will always see a difference, but you cannot rely on the data being returned in a specific order.
This has been asked and answered before on SO, here.
This is also described in the community addon comments in this MSDN article.
No ordering is applied beyond what the ORDER BY clause specifies. If all rows have the same value, they can be returned in whatever order is fastest. That is especially evident when a query is executed in parallel.
This means you cannot page over results ordered by non-unique values: each time you make the call, the order can change.
In such cases you need to add tie-breaker columns that guarantee a unique ordering, e.g. the ID of a product: ORDER BY Date, ProductID
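In LINQ that tie-breaker becomes a ThenBy; a sketch of stable paging, where db.Products, pageIndex and pageSize are hypothetical names:

```csharp
// Deterministic paging: break ties on the non-unique Date
// with the unique ProductID, so every row has a fixed position.
int pageIndex = 1;   // zero-based page number
int pageSize = 10;

var page = db.Products
    .OrderBy(p => p.Date)
    .ThenBy(p => p.ProductID)        // tie-breaker makes the order stable
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();
```

With the tie-breaker in place, Skip/Take always cut the sequence at the same points between calls.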

LINQ query for ordering in ascending order except for one row

I am pulling data from a table like the example below:
status_id  status_description
1          Unknown
2          Personal
3          Terminated
4          Relocated
6          Other
7          LOP
8          Continuing
I want to get the results into an IEnumerable, which I then return to the front end to display the descriptions in a dropdown.
I want to sort this alphabetically and have the "Other" option always show up at the bottom of the dropdown.
Is there any way to do this on the backend? Currently I have this:
IEnumerable<employee_map> map = await (from emp in db.emp_status_map
                                       orderby emp.status_description
                                       select emp).ToListAsync();
Simply order by two keys: first on whether the description is "Other", then on the actual description itself:
orderby emp.status_description == "Other", emp.status_description
Servy's answer is fine; it works and fulfils your requirements. A slightly different solution would be to add a field called "DisplayOrder", for example, set it to 1 for all rows except "Other", and set it to 2 (or whatever number you want) for "Other". Then you just order by DisplayOrder, Description.
It is highly probable that this solution will be much faster if you define an index on (DisplayOrder, Description).
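The same "Other last" ordering in method syntax (false sorts before true, so the "Other" row lands at the bottom); a sketch reusing the db.emp_status_map context from the question:

```csharp
// OrderBy on the boolean pushes "Other" last;
// ThenBy sorts the remaining descriptions alphabetically.
var statuses = await db.emp_status_map
    .OrderBy(e => e.status_description == "Other")
    .ThenBy(e => e.status_description)
    .ToListAsync();
```

Both the query-syntax and method-syntax forms translate to the same two-key ORDER BY in SQL.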

SQL Linq .Take() latest 20 rows from HUGE database, performance-wise

I'm using Entity Framework 6 and I issue LINQ queries from an ASP.NET server to an Azure SQL database.
I need to retrieve the latest 20 rows that satisfy a certain condition.
Here's a rough example of my query:
using (PostHubDbContext postHubDbContext = new PostHubDbContext())
{
    DbGeography location = DbGeography.FromText(string.Format("POINT({1} {0})", latitude, longitude));

    IQueryable<Post> postQueryable =
        from postDbEntry in postHubDbContext.PostDbEntries
        orderby postDbEntry.Id descending
        where postDbEntry.OriginDbGeography.Distance(location) < (DistanceConstant)
        select new Post(postDbEntry);

    postQueryable = postQueryable.Take(20);
    IOrderedQueryable<Post> postOrderedQueryable = postQueryable.OrderBy(post => post.DatePosted);
    return postOrderedQueryable.ToList();
}
The question is: what if I literally have a billion rows in my database? Will that query brutally select the millions of rows that meet the condition and then take 20 of them? Or will it be smart enough to realise that I only want 20 rows, and select only 20?
Basically, how do I make this query work efficiently against a database with a billion rows?
According to http://msdn.microsoft.com/en-us/library/bb882641.aspx, the Take() function has deferred streaming execution, as does the select statement. This means it should be equivalent to TOP 20 in SQL, and SQL Server will fetch only 20 rows from the database.
This link: http://msdn.microsoft.com/en-us/library/bb399342(v=vs.110).aspx shows that Take has a direct translation in LINQ to SQL.
So the only performance gains to be made are in the database. As usr suggested, you can use indexes to increase performance. Storing the table in sorted order also helps a lot (which is likely your case, as you sort by Id).
Why not try it? :) You can inspect the SQL and see what it generates, then look at the execution plan for that SQL and see whether it scans the entire table.
Check out this question for more details:
How do I view the SQL generated by the Entity Framework?
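With EF6's DbContext API, one simple way (a sketch, not the only option) is calling ToString() on the IQueryable, which returns the store command the query would execute:

```csharp
// EF6: ToString() on a DbContext query returns the generated SQL,
// so you can check that Take(20) became a TOP(20) before running it.
var query = postHubDbContext.PostDbEntries
    .OrderByDescending(p => p.Id)
    .Take(20);

Console.WriteLine(query.ToString());
```

You can then paste that SQL into SSMS and request the actual execution plan to see whether the whole table is scanned.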
This will be hard to make really fast. You want an index to give you the sort order on Id, but you want a different (spatial) index to provide efficient filtering. It is not possible to create one index that fulfills both goals efficiently.
Assume both indexes exist:
If the filter is very selective, expect SQL Server to "select" all rows where the filter is true, sort them, and then give you the top 20. Imagine there are only 21 rows that pass the filter - then this strategy is clearly very efficient.
If the filter is not at all selective, SQL Server will instead traverse the table ordered by Id, test each row it comes across, and output the first 20. Imagine the filter applies to all rows - then SQL Server can just output the first 20 rows it sees. Very fast.
So for 100% or 0% selectivity the query will be fast. In between there are nasty mixtures. If that is your situation, this question requires further thought; you probably need more than a clever indexing strategy - you need app changes.
Btw, you don't need an index on DatePosted. The sort by DatePosted is only done after limiting the set to 20 rows, and you don't need an index to sort 20 rows.

LINQ to SQL: group by a number of records

Is there a way to create a LINQ to SQL query that groups records in batches of a given size?
For instance, if I have a table with 20 records with unique ID values from 1 to 20, I would like to get the records grouped 5 at a time:
Group 1: 1, 2, 3, 4, 5
Group 2: 6, 7, 8, 9, 10
....
I can think of two ways to do this:
By making 5 queries:
the first query counts the total records, and the next 4 are select queries where I skip 5 and take 5.
Or by making one query, looping through the results with an inner index and building the groups of 5.
Is there a more elegant way to do this with LINQ to SQL?
Your second idea is exactly what I would do. Just get everything from the database and loop on the .NET side. There are probably ways to use Aggregate to do it in a more LINQ-esque way, but I am sure they would be harder to read. If you do it lazily (use yield to implement the enumerator), you will still loop through the sequence only once, so you will not lose performance.
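A sketch of such a lazy, yield-based chunker (InChunksOf is a hypothetical extension-method name; newer .NET versions ship a built-in Enumerable.Chunk that does the same):

```csharp
static class EnumerableExtensions
{
    // Lazily yields lists of `size` items; the source is enumerated only once.
    public static IEnumerable<List<T>> InChunksOf<T>(this IEnumerable<T> source, int size)
    {
        var chunk = new List<T>(size);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size);
            }
        }
        if (chunk.Count > 0)
            yield return chunk;   // final partial chunk, if any
    }
}

// Usage: 20 records in groups of 5 -> 4 groups.
// Against a database you would materialize first: db.Table.AsEnumerable().InChunksOf(5).
var groups = Enumerable.Range(1, 20).InChunksOf(5).ToList();
```

Because the method uses yield, nothing is buffered beyond the current chunk, so it works for arbitrarily long sequences.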
If you're going to end up retrieving all the records from the database anyway, why not just do that and then use something like this:
collection.GroupBy(x => collection.IndexOf(x) / 5);
You can group like this:
var items = from i in arr
            let m = i / 5
            group i by m into d
            select new { d };
If arr held the 10 elements 0 to 9, this would create two groups of 5 each; for 1-based IDs like in your example, group by (i - 1) / 5 instead to keep the groups even.
Pull your data as-is, feed it to the container, then split them up.
Your queries should never, ever be aware of how the data they pull is shown to the user.
