Linq to sql group by a number of records - c#

Is there a way to create a LINQ to SQL query to group by a Take parameter?
For instance, if I have a table with 20 records, each with a unique ID value from 1 to 20, I would like to get the records in groups of 5:
Group 1: 1,2,3,4,5
Group 2: 6,7,8,9,10
....
I can think of two ways to do this:
By making 5 queries:
The first query counts the total records, and the next 4 are select queries where I skip a multiple of 5 and take 5.
Or by making one query, looping through the results with an inner index, and creating objects with the groups of 5.
Is there a more elegant way to do this with LINQ to SQL?

Your second idea is exactly what I would do. Just get everything from the database and loop on the .NET side. Probably there are ways to use Aggregate to do it in a more LINQ-esque way but I am sure they will be harder to read. If you do it in a lazy fashion (use yield to implement enumerator) you will still loop through the sequence only once so you will not lose performance.
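As a minimal sketch of the lazy approach described above (assuming the rows are already coming back as an in-memory sequence; the class name ChunkSketch and the method name InChunksOf are made up for illustration), an iterator with yield can hand out groups of 5 while reading the sequence only once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Hypothetical helper: yields lists of up to `size` items, reading the source once.
    public static IEnumerable<List<T>> InChunksOf<T>(this IEnumerable<T> source, int size)
    {
        // Because this is an iterator, the check runs when enumeration begins.
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var chunk = new List<T>(size);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;          // hand out a full group
                chunk = new List<T>(size);   // start the next one
            }
        }

        // Trailing partial group, if the count is not a multiple of `size`.
        if (chunk.Count > 0) yield return chunk;
    }
}
```

Calling Enumerable.Range(1, 20).InChunksOf(5) yields four lists, the first holding 1 to 5 and the second 6 to 10. On .NET 6 or later, the built-in Enumerable.Chunk does much the same job.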

If you're going to end up retrieving all the records from the database anyways, why not just go ahead and do it then use something like this:
collection.GroupBy(x => collection.IndexOf(x) / 5);
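One caveat with IndexOf: on a List<T> it rescans the list for every element and returns the first match, so it gets slow on large lists and puts duplicate values in the wrong group. A sketch of the same idea using the Select overload that supplies the index (the class and method names here are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexGrouping
{
    // Groups items by position (not by value) into runs of `size`.
    public static List<List<T>> GroupByPosition<T>(IEnumerable<T> items, int size)
    {
        return items
            .Select((item, index) => new { item, index })  // pair each item with its 0-based position
            .GroupBy(p => p.index / size, p => p.item)      // integer division gives the group number
            .Select(g => g.ToList())
            .ToList();
    }
}
```

This works on in-memory sequences only, so it fits the case where all the records have already been retrieved.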

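You can group like this:
var items = from i in arr
            let m = (i - 1) / 5
            group i by m into d
            select new { d };
This groups by value, so it assumes arr holds contiguous integers starting at 1 (1 to 5 give m = 0, 6 to 10 give m = 1). With 10 such elements it creates two groups of 5 each. Plain i / 5 only gives that result when the values start at 0; for 1 to 10 it would split them into 1-4, 5-9 and 10.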

Pull your data as-is, feed it to the container then split em up.
Your queries should never, ever be aware of anything concerning how the data they pull is shown to the user.

Related

C# Entity Framework - Order by and Take

I am trying to select 5 oldest entries from my database. I am using the following statement:
dbContext.Items.Take(5).OrderBy(i => i.LastCheck).ToListAsync();
The problem here is that EF first takes the first 5 Items from the table, and then sorts them. So I always get the 5 first entries from the table. But I want it first to sort the items and then select the top 5 ones, like when I execute this sql command:
select top 5 * from Items order by LastCheck asc
Here I get the right result.
Is there a possibility to do that in EF or do I have to execute the query?
You have to switch Take() and OrderBy():
dbContext.Items.OrderBy(i => i.LastCheck).Take(5).ToListAsync();
I think this has already been answered above, but:
dbContext.Items.OrderBy(x => x.LastCheck).Take(5).ToListAsync();
Doing Take first selects the first 5 items from the table and then sorts just those 5, whereas what you want is to sort by date first and then take the top 5.
Similarly, if you wanted the newest first, the query above would become:
dbContext.Items.OrderByDescending(x => x.LastCheck).Take(5).ToListAsync();
Hope this helps!
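To see the difference without a database, here is a small in-memory sketch using LINQ to Objects. The array is made-up data standing in for the LastCheck values in table order, and the class name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeOrderSketch
{
    // Stand-in for the LastCheck values in table order (hypothetical data).
    public static readonly int[] LastChecks = { 50, 40, 30, 20, 10, 5, 1 };

    // Take first, then sort: only the first three rows are ever considered.
    public static List<int> TakeThenOrder() =>
        LastChecks.Take(3).OrderBy(x => x).ToList();

    // Sort first, then take: the three smallest values overall.
    public static List<int> OrderThenTake() =>
        LastChecks.OrderBy(x => x).Take(3).ToList();
}
```

TakeThenOrder() returns 30, 40, 50, while OrderThenTake() returns 1, 5, 10. The same ordering of operators applies when EF translates the query to SQL.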

Linq order by ascending sort by descending

Sorry for the title, but I didn't know how to write it better. I will try in the post.
When I try to get values from the database using LINQ with orderby, something strange happens. Let's look at 4 queries:
//1
var bagAsc = new ConcurrentBag<int>((from x in se.produkts orderby x.numerProduktu select x.numerProduktu));
//2
var bagDesc = new ConcurrentBag<int>((from x in se.produkts orderby x.numerProduktu descending select x.numerProduktu));
//3
var listAsc = (from x in se.produkts orderby x.numerProduktu select x.numerProduktu).ToList();
//4
var listdesc = (from x in se.produkts orderby x.numerProduktu descending select x.numerProduktu).ToList();
We get 2 ConcurrentBag<int> and 2 List<int> collections. I expected 1 and 3 to be the same, and 2 and 4 to be the same. Check what values I got:
The ascending sort for ConcurrentBag<int> is in fact descending. On the Microsoft site we can read that ConcurrentBag is good when ordering does not matter, but as we can see in bagDesc, ordering is kept. To show that I don't have anything strange in the database, I also made two List<int> collections, where ordering is kept as it should be.
Executing select * from produkt in my database gives me values sorted like listAsc and bagDesc.
The database is MS SQL Server 2014 and numerProduktu is the primary key of this table.
Does anybody know what happened there?
See here.
The ConcurrentBag appears to be implemented as a stack rather than a
queue. That is, the last item added is the first item removed. I
wouldn't count on that, though.
So the items come back in the reverse of the order in which they were added. However, order is not meant to be reliably consistent in ConcurrentBag, so it's not guaranteed to always behave that way, especially if accessed by multiple threads.
If you care about maintaining the original order of entry, then you probably want a ConcurrentQueue.
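A minimal sketch of the two options, assuming the values arrive already sorted from the query (the class and method names are made up for illustration). ConcurrentQueue<T> enumerates in insertion order; with ConcurrentBag<T> you would have to sort again after reading:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class OrderedCollections
{
    // ConcurrentQueue enumerates in insertion (FIFO) order.
    public static List<int> ViaQueue(IEnumerable<int> sorted) =>
        new ConcurrentQueue<int>(sorted).ToList();

    // ConcurrentBag makes no ordering promise, so re-sort after reading if order matters.
    public static List<int> ViaBagResorted(IEnumerable<int> sorted) =>
        new ConcurrentBag<int>(sorted).OrderBy(x => x).ToList();
}
```

If the collection is only filled once and then read, a plain List<int> as in queries 3 and 4 is usually enough; the concurrent types are for cases where several threads add or remove items.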

SQL Linq .Take() latest 20 rows from HUGE database, performance-wise

I'm using Entity Framework 6 and making LINQ queries from an ASP.NET server to an Azure SQL database.
I need to retrieve the latest 20 rows that satisfy a certain condition.
Here's a rough example of my query:
using (PostHubDbContext postHubDbContext = new PostHubDbContext())
{
    DbGeography location = DbGeography.FromText(string.Format("POINT({1} {0})", latitude, longitude));
    IQueryable<Post> postQueryable =
        from postDbEntry in postHubDbContext.PostDbEntries
        orderby postDbEntry.Id descending
        where postDbEntry.OriginDbGeography.Distance(location) < (DistanceConstant)
        select new Post(postDbEntry);
    postQueryable = postQueryable.Take(20);
    IOrderedQueryable<Post> postOrderedQueryable = postQueryable.OrderBy(post => post.DatePosted);
    return postOrderedQueryable.ToList();
}
The question is, what if I literally have a billion rows in my database? Will that query brutally select millions of rows that meet the condition and then get 20 of them? Or will it be smart and realise that I only want 20 rows, and hence select only 20 rows?
Basically, how do I make this query work efficiently with a database that has a billion rows?
According to http://msdn.microsoft.com/en-us/library/bb882641.aspx the Take() function uses deferred, streaming execution, as does Select. This means that it should be equivalent to TOP 20 in SQL, and only 20 rows will be returned from the database.
This link: http://msdn.microsoft.com/en-us/library/bb399342(v=vs.110).aspx shows that Take has a direct translation in LINQ to SQL.
So the only performance gains you can make are in the database. As @usr suggested, you can use indexes to increase performance. Storing the table in sorted order also helps a lot (which is likely your case, since you sort by Id).
Why not try it? :) You can inspect the sql and see what it generates, and then look at the execution plan for that sql and see if it scans the entire table
Check out this question for more details
How do I view the SQL generated by the Entity Framework?
This will be hard to get really fast. You want an index to give you the sort order on Id but you want a different (spatial) index to provide you with efficient filtering. It is not possible to create an index that fulfills both goals efficiently.
Assume both indexes exist:
If the filter is very selective expect SQL Server to "select" all rows where this filter is true, then sorting them, then giving you the top 20. Imagine there are only 21 rows that pass the filter - then this strategy is clearly very efficient.
If the filter is not at all selective SQL Server will rather traverse the table ordered by Id, test each row it comes by and outputs the first 20. Imagine that the filter applies to all rows - then SQL Server can just output the first 20 rows it sees. Very fast.
So for 100% or 0% selectivity the query will be fast. In between there are nasty mixtures. If that is your case, this question requires further thought. You probably need more than a clever indexing strategy; you need app changes.
Btw, we don't need an index on DatePosted. The sorting by DatePosted is only done after limiting the set to 20 rows. We don't need an index to sort 20 rows.

Is there a better-performing way to get only a few columns in ASP MVC without getting the whole table and filtering it with LINQ?

I'm querying a table with 200,000 rows and 6 columns, but I only want 2 of these columns in one controller. I want to know if there is a better way to fetch them from the server without compromising performance because, as far as I know, LINQ queries get the whole table and then do the filtering. I think views might be a good way, but I want to know if there are other, better ones. Thanks.
For example:
var items = from i in db.Items select new { i.id, i.name };
If I have 1,000,000 items, will that be a problem for the server?
Your initial assumption is incorrect.
In general, LINQ queries do not get the whole table. The query is converted into a "server side expression" (i.e. a SQL statement), the statement is resolved on the server, and only the requested data is returned.
Given the statement you provided you will return only two columns but you will get 1,000,000 objects in the result if you do not do any filtering. But that isn't a problem with LINQ, that's a problem with you not filtering. If you included a where clause you would only get the rows you requested.
var items = from i in db.Items
            where i.Whatever == SomeValue
            select new { i.id, i.name };
Your original query would be translated (roughly) into the following SQL:
SELECT id, name FROM Items
You didn't include a where clause so you're going to get everything.
With the version that included a where clause you'd get the following SQL generated:
SELECT id, name FROM Items WHERE Whatever = SomeValue
Only the rows that match the condition would be returned to your application and converted into objects.

Using Linq to SQL, how do I find min and max of a column in a table?

I want to find the fastest way to get the min and max of a column in a table with a single Linq to SQL roundtrip. So I know this would work in two roundtrips:
int min = MyTable.Min(row => row.FavoriteNumber);
int max = MyTable.Max(row => row.FavoriteNumber);
I know I can use group but I don't have a group by clause, I want to aggregate over the whole table! And I can't use the .Min without grouping first. I did try this:
from row in MyTable
group row by true into r
select new {
    min = r.Min(z => z.FavoriteNumber),
    max = r.Max(z => z.FavoriteNumber)
}
But that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be.
So, is there any way to just get the correct SQL out?
EDIT: These guys failed too: Linq to SQL: how to aggregate without a group by? ... lame oversight by LINQ designers if there's really no answer.
EDIT 2: I looked at my own solution (with the nonsensical constant group by clause) in the SQL Server Management Studio execution plan analysis, and it looks to me like it is identical to the plan generated by:
SELECT MIN(FavoriteNumber), MAX(FavoriteNumber)
FROM MyTable
so unless someone can come up with a simpler-or-equally-as-good answer, I think I have to mark it as answered-by-myself. Thoughts?
As stated in the question, this method seems to actually generate optimal SQL code, so while it looks a bit squirrely in LINQ, it should be optimal performance-wise.
from row in MyTable
group row by true into r
select new {
    min = r.Min(z => z.FavoriteNumber),
    max = r.Max(z => z.FavoriteNumber)
}
The only alternative I could find produces somewhat clean SQL, though it is still less efficient than select min(val), max(val) from table, because it joins the table with itself:
var r =
    (from min in items.OrderBy(i => i.Value)
     from max in items.OrderByDescending(i => i.Value)
     select new { min, max }).First();
The SQL is:
SELECT TOP (1)
[t0].[Value],
[t1].[Value] AS [Value2]
FROM
[TestTable] AS [t0],
[TestTable] AS [t1]
ORDER BY
[t0].[Value],
[t1].[Value] DESC
Another option is to use a single connection for both the min and max queries (see Multiple Active Result Sets (MARS)), or a stored procedure.
I'm not sure how to translate it into C# yet (I'm working on it)
This is the Haskell version
minAndMax :: Ord a => [a] -> (a,a)
minAndMax [x] = (x,x)
minAndMax (x:xs) = (min a x, max b x)
  where (a,b) = minAndMax xs
The C# version should involve Aggregate somehow (I think).
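A possible C# counterpart, as a sketch only: it runs over an in-memory sequence with LINQ to Objects, and there is no reason to expect LINQ to SQL to translate it. The class and method names are hypothetical, and tuples require C# 7 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinMaxSketch
{
    // Single pass: each value starts as its own (Min, Max) pair, and Aggregate folds the pairs together.
    // Like the Haskell version, there is no result for an empty input:
    // Aggregate without a seed throws InvalidOperationException on an empty sequence.
    public static (int Min, int Max) MinAndMax(IEnumerable<int> values) =>
        values
            .Select(x => (Min: x, Max: x))
            .Aggregate((acc, next) => (Math.Min(acc.Min, next.Min), Math.Max(acc.Max, next.Max)));
}
```

For example, MinMaxSketch.MinAndMax(new[] { 3, 1, 4, 1, 5 }) gives Min = 1 and Max = 5.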
You could select the whole table, and do your min and max operations in memory:
var cache = MyTable.ToList(); // SELECT * in a single round trip
var min = cache.Min(row => row.FavoriteNumber);
var max = cache.Max(row => row.FavoriteNumber);
Depending on how large your dataset is, this might be the way to go about not hitting your database more than once.
A LINQ to SQL query is a single expression. Thus, if you can't express your query in a single expression (or don't like it once you do) then you have to look at other options.
Stored procedures, since they can have statements, enable you to accomplish this in a single round-trip. You will either have two output parameters or select a result set with two rows. Either way, you will need custom code to read the stored procedure's result.
(I don't personally see the need to avoid two round-trips here. It seems like a premature optimization, especially since you will probably have to jump through hoops to get it working. Not to mention the time you will spend justifying this decision and explaining the solution to other developers.)
Put another way: you've already answered your own question. "I can't use the .Min without grouping first", followed by "that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be", are clues that the simple and easily-understood two-round-trip solution is the best expression of your intent (unless you write custom SQL).
