Calling SKIP() in code or using TOP in function - C#

I'm coding an application with Entity Framework in which I rely heavily on user-defined functions.
I have a question about the best (most optimized) way to limit and page my result sets. Basically, I am wondering whether these two options perform the same or whether one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (@size * @page) ROWS FETCH NEXT @size ROWS ONLY
To me these seem to produce about the same result, but maybe I am missing some key aspect.

You'll have to be aware of when your LINQ statement is an IEnumerable and when it is an IQueryable. As long as your statement is an IQueryable<...>, the software will try to translate it into SQL and let your database do the query. Once it has lost the IQueryable and has become an implementation of IEnumerable, the data has been brought into local memory, and all further LINQ statements will be performed by your process, not by the database.
If you use your debugger, you will see that fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory, and your OrderBy etc. is performed by your process.
Usually it is much more efficient to move only the records you will actually use into local memory. Besides, do not fetch complete records; fetch only the properties you plan to use. So in this case, I'd suggest creating an extended version of fn_GetData that returns only the values you plan to use.
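For illustration, a sketch of what that could look like (the paged function name and the column names are assumptions):
// Hypothetical: fn_GetDataPaged pages and projects inside SQL Server, so
// only the requested rows and columns are brought into local memory.
var result = _DB.fn_GetDataPaged(page, 100).ToList();

// If the function is instead mapped as composable (returning IQueryable),
// the projection can be written in LINQ and still run in the database:
var slim = _DB.fn_GetData()
    .OrderBy(x => x.Id)
    .Skip(page * 100)
    .Take(100)
    .Select(x => new { x.Id, x.Name })  // hypothetical columns
    .ToList();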

I suggest the second option, because SQL Server can do this faster than C# methods.
In your first option, you pull all of the records in the table and loop through them. With the second option, SQL Server does the work for you and you get back only what you want.

You should apply the limiting and WHERE clauses in the database as far as possible (how much they help depends on the table's indexes). For the first example:
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider adding limits that filter the records on the database side. So the second option is the better approach in this case.

Related

Which query is more optimized?

I am fetching a list of products, including their prices, and I want to get just the enabled prices.
I wrote two types of queries:
context.Products.Include("Prices").Where(p => p.Prices.Where(pr => pr.Enable == true).Count() > 0).ToList();
And the other one is:
context.Products.Include("Prices").ToList().RemoveAll(p => p.Prices.Where(pr => pr.Enable == true).ToList().Count == 0);
Which one is more optimized?
Assuming you are using an Entity Framework context, the first one is way better.
This is because LINQ to Entities will translate the statement into an SQL statement. The Where calls will result in a corresponding SQL WHERE, so only the necessary subset of the elements is retrieved.
The second statement retrieves all Products and Prices and then removes the unwanted elements in memory.
This assumes that you have a remote database. If your database is running locally, or you already have all Products and Prices in memory, it's not so easy to tell (you would have to use the profiler for that).
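As a footnote, the first query can be expressed a bit more idiomatically with Any, which typically translates to an SQL EXISTS (a sketch against the same context):
// Same filter as the first query: keep only products that have at least
// one enabled price; the check runs in the database, not in memory.
var products = context.Products
    .Include("Prices")
    .Where(p => p.Prices.Any(pr => pr.Enable))
    .ToList();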
This kind of question really depends on a lot of things, so it is not easy to say which is better.
But from the code, the first one does the where clause on the SQL side, whereas the second one pulls all the data out of SQL and does the where in the application.
So it will depend on the SQL server, the application hardware, and the amount of data.

LINQ and selecting rows from a big database

I have a database, and right now it contains a table with about 100 rows. But in the future it will have not 100 but 1,000,000+ rows, and I have to be careful with the web application I'm developing now.
The problem is this: on a web page I need to show a paged list of records to the user. Here is a sample of the code I plan to use:
public IQueryable<MyTable> GetRows(int from, int to)
{
    var queryRes = (from row in SomeDataContext.MyTable
                    orderby row.Id
                    select row).AsQueryable();
    return queryRes.Take(to).Skip(from);
}
This is only a sample of the code; I did not run it.
But the question is: what will happen in this case? I see two scenarios:
1. It will load all rows from the database, and on the server side the records in the range from 'from' to 'to' will be returned while the others are ignored. In this case my application will be in big trouble - imagine loading 1,000,000 rows from the database every time. It would be a disaster.
2. It will construct an SQL request that returns only the rows I need, without loading the others. That's exactly what I need.
I think it will be scenario 2, but I'm not sure and can't check it. Am I correct?
As a side note, you don't have to call AsQueryable. It is enough to do:
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id);
return queryRes.Take(to).Skip(from);
And to answer your question: scenario 2 will be executed. You can always check the generated SQL by using SQL Server Profiler, but if you are using Entity Framework, you can even do queryRes.ToString(), as sketched after the links below. And as @Aron correctly pointed out, the query will actually be executed against the database only when you enumerate the results (e.g. by calling queryRes.ToList()).
These questions address the issue of looking up the SQL code in more detail:
How to view generated SQL from Entity Framework?
exact sql query executed by Entity Framework
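For instance, with Entity Framework something like this prints the generated SQL without executing anything (a sketch; size stands in for the page size):
// Building the query does not touch the database...
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id).Skip(from).Take(size);
// ...and neither does printing it: ToString() renders the SELECT that
// Entity Framework would send, including the paging clauses.
Console.WriteLine(queryRes.ToString());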
Strictly speaking, neither 1 nor 2 is correct. Running this code DOES NOT hit the database; it constructs an expression tree. The calling code can still modify the expression tree further without hitting the database.
With the IQueryable interface, no SQL is run. It is at the point when you call IEnumerable.GetEnumerator() that the underlying LINQ provider converts the WHOLE expression into a query - in this case an SQL query - and then runs it.
So, for example, with this code you could have:
void Main()
{
    var foo = from x in GetRows(10, 10)
              where x.Id > 1000
              select x;
    foreach (var f in foo)
    {
        //Stuff
    }
}
The SQL that is actually run will be closer to:
SELECT a, b, c FROM
    (SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY ...) AS row_number
     FROM Table
     WHERE id > 1000) t0
WHERE t0.row_number BETWEEN 10 AND 20;
To be honest, you are going about this the wrong way. You don't need a GetRows method; I would call the LINQ query directly when constructing the table itself. You should take a look at the IRepository pattern that MVC scaffolding uses.
Finally, if this is meant to be called as a web query for AJAX, I would look at the two OData implementations in .NET (WCF Data Services and Web API OData).
You are right.
Scenario 2 is what will happen when the query is eventually executed.
I would suggest reversing the Take and Skip, so that you start with Skip:
queryRes.Skip(from).Take(to)
Debugging this method will not make any calls to the database; it just returns the query, not the result.
If you want to test exactly what will happen, try downloading LINQPad - it is a great tool for demystifying LINQ queries.

Improving a LINQ query

I have the following query:
if (idUO > 0)
{
    query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
    query = query.Where(b => b.DependencyId == dependencyId);
}
else
{
    var dependencyIds = dependencies.Select(d => d.Id).ToList();
    query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
    query = query.Where(b => b.SpecialDateId == specialDateId);
}
So I have other filters in this query, but at the end I execute it against the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList();
// The returned object is a Ticket, which has 23 properties; 5 of them are relationships (FKs), and I fill 3 of those relationships with lazy loading.
When I access the first page it's OK - the query takes less than 1 second - but when I try to access page 30,000, the query takes more than 20 seconds. Is there a way to improve the performance in the LINQ query itself, or only at the database level? And at the database level, for this kind of query, what is the best way to improve performance?
There is not much room here, IMO, to make things better (at least looking at the code provided).
When you're trying to achieve good performance with numbers like these, I would recommend not using LINQ at all, or at least using it only for the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code (see the sketch after this list):
1. Create a view in the DB which orders items by date, including all related relationships, like Products etc.
2. Create a stored procedure querying this view with the related parameters.
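A minimal sketch of invoking such a procedure from Entity Framework's DbContext API (the procedure name and its parameters are assumptions):
// SqlParameter comes from System.Data.SqlClient; dbo.GetTicketsPage is a
// hypothetical procedure that orders by date and pages with OFFSET/FETCH,
// so only 20 rows ever cross the wire.
var tickets = context.Database.SqlQuery<Ticket>(
        "EXEC dbo.GetTicketsPage @Page, @Size",
        new SqlParameter("@Page", page),
        new SqlParameter("@Size", 20))
    .ToList();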
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had a great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query and the criteria. It goes to the database as a pretty ugly, but not too terrible, SELECT statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because the field is (hopefully) indexed, but it does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
OK, here's where it gets rough for the poor database. Entity Framework is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the random mess of SQL it's sending over.
When you start skipping and taking, Entity Framework usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's among the first ordered records in the recordset - say, page 1 - it might not take terribly long. By the time you're picking out page 30,000, it could be scanning a lot of data because of the way Entity Framework has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider creating a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
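A rough sketch of those two suggestions, assuming the ObjectContext API and hypothetical MyEntities/Tickets names (on the newer DbContext API, .AsNoTracking() plays the NoTracking role):
// Compile once; EF then skips re-translating the expression tree on every call.
static readonly Func<MyEntities, int, IQueryable<Ticket>> PagedTickets =
    CompiledQuery.Compile((MyEntities ctx, int page) =>
        ctx.Tickets.OrderBy(t => t.Date).Skip(20 * page).Take(20));

using (var ctx = new MyEntities())
{
    // Read-only scenario: skip change tracking for cheaper materialization.
    ctx.Tickets.MergeOption = MergeOption.NoTracking;
    var pageOfTickets = PagedTickets(ctx, page).ToList();
}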

Dynamically Generating a Linq/Lambda Where Clause

I've been searching here and on Google, but I'm at a loss. I need to let users search a database for reports using a form. If a field on the form has a value, the app will get any reports with that field set to that value. If a field on the form is left blank, the app will ignore it. How can I do this? Ideally, I'd like to just write WHERE clauses as strings and combine those that are not empty.
.Where("Id=1")
I've heard this is supposed to work, but I keep getting an error: "could not be resolved in the current scope or context. Make sure all referenced variables are in scope...".
Another approach is to pull all the reports and then filter them one where clause at a time. I'm hesitant to do this because 1) that's a huge chunk of data over the network and 2) that's a lot of processing on the user's side. I'd like to take advantage of the server's processing capabilities. I've heard that the query won't run until it's actually requested. So doing something like this:
var qry = ctx.Reports
    .Select(r => r);
does not actually run the query until I do:
qry.First()
But if I start doing:
qry = qry.Where(r => r.Id == 1).Select(r => r);
qry = qry.Where(r => r.reportDate == new DateTime(2010, 2, 2)).Select(r => r);
Would that run the query, since I'm adding a where clause to it? I'd like a simple solution... in the worst case I'd use the query builder methods... but I'd rather avoid that (it seems complex).
Any advice? :)
LINQ delays record fetching until a record actually has to be fetched.
That means stacking Where clauses only adds AND/OR clauses to the query; it still does not execute.
Execution of the generated query happens at the precise moment you try to get a single record (First, Any, etc.), a list of records (ToList()), or enumerate them (foreach).
.Take(N) is not considered fetching records either - it just adds a SELECT TOP N / LIMIT N to the query.
No, this will not run the query. You can structure your query this way, and it is actually preferable if it helps readability; you are taking advantage of lazy evaluation in this case.
The query will only run when you enumerate results from it (e.g. with foreach), force eager evaluation of the results (e.g. with .ToList()), or evaluate to a single result (e.g. with First() or Single()).
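Putting that together for the form scenario, a sketch (the nullable id and reportDate locals stand in for the form fields; all names are assumptions):
// Each optional form field appends one AND clause; nothing executes yet.
IQueryable<Report> qry = ctx.Reports;
if (id.HasValue)
    qry = qry.Where(r => r.Id == id.Value);
if (reportDate.HasValue)
    qry = qry.Where(r => r.reportDate == reportDate.Value);

// The single combined SELECT ... WHERE ... runs here, on the server.
var results = qry.ToList();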
Try checking out this dynamic LINQ DLL that was released a few years back - it still works just fine and looks to be exactly what you are looking for.

Is LINQ faster on a list or a table?

I have many queries to run, and I was wondering if there is a significant performance difference between querying a List, a DataTable, or even an indexed SQL Server table. Or would it maybe be faster if I went with another type of collection?
In general, what do you think?
Thank you!
It should almost always be faster to query anything in memory, like a List<T> or a DataTable, than a database.
Having said that, you have to get the data into an in-memory object like a List before it can be queried, so I certainly hope you're not thinking of dumping your DB into a List<T> for fast querying. That would be a very bad idea.
Am I getting the point of your question?
You might be confusing LINQ with a database query language. I would suggest reading up on LINQ, particularly IQueryable vs. IEnumerable.
In short, LINQ is an in-code query language which can be pointed at nearly any collection of data to perform searches, projections, aggregates, etc. in a similar fashion to SQL, but it is not limited to RDBMSes. It is not, on its face, a DB query language like SQL; it can merely be translated into one by an IQueryable provider, like Linq2SQL, Linq2Azure, LINQ to Entities... the list goes on.
The IEnumerable side of LINQ, which works on in-memory objects that are already on the heap, will almost certainly perform better than the IQueryable side, which exists to be translated into a native query language like SQL. However, that's not because of any inherent weakness or strength in either side of the language. It is instead a factor of (usually) having to send the translated IQueryable command over a network channel and get the results back over the same channel, which performs much more slowly than your local computer's memory.
However, the "heavy lifting" of pulling records out of a data store and creating in-memory object representations has to be done at some point, and IQueryable LINQ will almost certainly be faster than instantiating ALL records as in-memory objects and THEN using IEnumerable LINQ (LINQ to Objects) to filter down to the data you actually want.
To illustrate: you have a table MyTable, and it contains a relatively modest 200 million rows. Using a LINQ provider like Linq2SQL, your code might look like this:
//GetContext<>() is a method that will return the IQueryable provider
//used to produce MyTable entity objects.
//Pull all records for the past 5 days.
var results = from t in Repository.GetContext<MyTable>()
              where t.SomeDate >= DateTime.Today.AddDays(-5)
                 && t.SomeDate <= DateTime.Now
              select t;
This will be digested by the Linq2SQL IQueryable provider into an SQL string like this:
SELECT [each of MyTable's fields] FROM MyTable WHERE SomeDate BETWEEN @p1 AND @p2; -- @p1 = '2/26/2011', @p2 = '3/3/2011 9:30:00'
This query can be easily digested by the SQL engine to return EXACTLY the information needed (say 500 rows).
Without a Linq provider, but wanting to use Linq, you may do something like this:
//GetAllMyTable() is a method that will execute and return the results of
//"Select * from MyTable"
//pull all records for the past 5 days
var results = from t in Repository.GetAllMyTable()
where t.SomeDate >= DateTime.Today.AddDays(-5)
&& t.SomeDate <= DateTime.Now
select t;
On the surface, the difference is subtle. Behind the scenes, the devil is in the details. This second query relies on a method that retrieves and instantiates an object for every record in the database. That means it has to pull all those records and create space in memory for them. That gives you a list of 200 MILLION records, which isn't so modest anymore now that each of those records has been transmitted over the network and is taking up residence in your page file. The first query MAY introduce some overhead in building and then digesting the expression tree into SQL, but it's MUCH preferred over dumping an entire table into an in-memory collection and iterating over it.
