LINQ and selecting rows from a big database - C#

I have a database and right now it contains a table with about 100 rows. But in the future it will have not 100 but 1,000,000+ rows, and I have to be careful with the web application I'm developing now.
The problem is this: on a web page I need to create a paged list that will show records to the user. Here is a sample of the code I plan to use:
public IQueryable<MyTable> GetRows(int from, int to)
{
    var queryRes = (from row in SomeDataContext.MyTable
                    orderby row.id
                    select row).AsQueryable();
    return queryRes.Take(to).Skip(from);
}
This is only a code sample; I haven't run it.
But the question is: what will happen in this case? I see two scenarios:
1. It will load all rows from the database, and on the server side only the records in the range from 'from' to 'to' will be returned; the others will be ignored. In this case my application will be in big trouble. Imagine loading 1,000,000 rows from the database every time. It would be a disaster.
2. It will construct a SQL request that returns only the rows I need, without loading the others. That's exactly what I need.
I think it will be scenario 2, but I'm not sure and can't check it. Am I correct?

As a side note, you don't have to call AsQueryable. It is enough to do:
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id);
return queryRes.Take(to).Skip(from);
And to answer your question - scenario 2 is what happens. You can always check the generated SQL by using SQL Server Profiler, but if you are using Entity Framework you can even do queryRes.ToString(). And as @Aron correctly pointed out, the query will actually be executed against the database only when you enumerate the results (e.g. by calling queryRes.ToList()).
These questions address the issue of looking up the SQL code in more detail:
How to view generated SQL from Entity Framework?
exact sql query executed by Entity Framework
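For example (a minimal sketch, assuming Entity Framework 6 and a hypothetical MyContext class; with LINQ to SQL you would inspect DataContext.Log instead):
using (var context = new MyContext())
{
    IQueryable<MyTable> queryRes = context.MyTable
        .OrderBy(r => r.Id)
        .Skip(0)
        .Take(20);

    // Prints the SQL EF would generate; nothing has hit the database yet.
    Console.WriteLine(queryRes.ToString());

    // The database is queried only here, when the results are enumerated.
    var rows = queryRes.ToList();
}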

Strictly speaking, neither 1 nor 2 is correct. Running the code DOES NOT hit the database. It constructs an expression tree. The calling code can still modify the expression tree further without hitting the database.
With the IQueryable interface no SQL is run. It is at the point when you call IEnumerable.GetEnumerator() that the underlying LINQ provider converts the WHOLE expression into a query (in this case a SQL query) and then runs it.
So, for example, with this code you could have:
void Main()
{
    var foo = from x in GetRows(10, 10)
              where x.Id > 1000
              select x;

    foreach (var f in foo)
    {
        //Stuff
    }
}
The SQL that is actually run will be closer to:
SELECT a, b, c FROM
    (SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY ...) AS row_number
     FROM Table
     WHERE id > 1000) t0
WHERE t0.row_number BETWEEN 10 AND 20;
To be honest, you are going about this wrong. You don't need a GetRows method; I would write the LINQ query directly when constructing the table itself. You should take a look at the IRepository pattern that MVC scaffolding uses.
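A rough sketch of what that could look like (the interface and extension method below are illustrative assumptions, not an existing API):
public interface IRepository<T> where T : class
{
    IQueryable<T> Query();
}

public static class PagingExtensions
{
    // Keeps everything as IQueryable so Skip/Take are translated to SQL,
    // not executed in memory. Page numbers are zero-based.
    public static IQueryable<T> Page<T, TKey>(
        this IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int pageIndex,
        int pageSize)
    {
        return source.OrderBy(orderBy)
                     .Skip(pageIndex * pageSize)
                     .Take(pageSize);
    }
}
A page of rows would then be something like repository.Query().Page(r => r.Id, pageIndex, pageSize).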
Finally, if this is meant to be called as a web query for AJAX, I would look at the two OData implementations in .NET (WCF Data Services and Web API OData).

You are right.
Scenario 2 is what will happen, when the query is eventually executed.
I would suggest reversing the Take and Skip, so you start with Skip:
queryRes.Skip(from).Take(to)
Debugging this method will not make any calls to the database. It just returns the query, not the result.
If you want to test exactly what will happen, try downloading LINQPad - it is a great tool for demystifying LINQ queries.
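Putting both points together, a corrected sketch of the method from the question might look like this (the page-style parameter names are my own, not the asker's):
public IQueryable<MyTable> GetRows(int skip, int take)
{
    // An OrderBy is required for stable paging; Skip comes before Take.
    return SomeDataContext.MyTable
        .OrderBy(r => r.Id)
        .Skip(skip)
        .Take(take);
}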

Related

Calling SKIP() in code or using TOP in function

I'm coding an application with Entity Framework in which I rely heavily on user-defined functions.
I have a question about the best (most optimized) way to limit and page my result sets. Basically I am wondering if these two options are the same, or if one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (size * page) ROWS FETCH NEXT size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware of when your LINQ statement is an IEnumerable and when it is an IQueryable. As long as your statement is an IQueryable<...>, the provider will try to translate it into SQL and let your database do the query. Once it has lost the IQueryable and has become an implementation of an IEnumerable, the data has been brought into local memory, and all further LINQ operators are executed by your process, not by the database.
If you use your debugger, you will see that fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory, and your OrderBy etc. is performed by your process.
Usually it is much more efficient to only move the records that you will actually use into local memory. Besides: do not fetch the complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use.
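As an illustration of that advice (a sketch only; the DbSet and property names below are placeholders, since the real columns behind fn_GetData aren't shown):
var pageOfData = _DB.DataTable                     // an IQueryable<> source, e.g. a DbSet
    .OrderBy(x => x.Id)
    .Select(x => new { x.Id, x.Name })             // fetch only the properties you plan to use
    .Skip(page * 100)
    .Take(100)
    .ToList();                                     // the query runs here, with the paging done by the database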
I suggest the second option, because SQL Server can do this faster than C# methods.
In your first option, you take all of the records in the table and loop through them. In the second option, SQL Server does it for you and you get exactly what you want.
You should apply the limiting and WHERE clauses in the database as far as possible (how much it helps depends on the table indexes). For the first example:
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider putting some limits in place so that records are filtered on the database side. So, the second option is the better approach in this case.

Linq: Method has no supported translation to SQL - but how to dump to memory?

I have a LINQ query that reads from a SQL table, and one of the fields it returns comes from a custom function (in C#).
Something like:
var q = from my in MyTable
        select new
        {
            ID = my.ID,
            Amount = GetAmount(my.ID)
        };
If I do a q.Dump() in LinqPad, it shows the results, which tells me that it runs the custom function without trying to send it to SQL.
Now I want to union this to another query, with:
var q1 = (from p in AnotherQuery.Union(q)...
and then I get the error that Method has no supported translation to SQL.
So, my logic tells me that I need to dump q in memory and then try to union to that. I've tried doing that with ToList() and creating a secondary query that populates itself from the List, but that leads to a long list of different errors. Am I on the right track, by trying to get q in memory and union on that, or are there better ways of doing this?
You can't use any custom functions in a LINQ query that gets translated - only the functions supported by the given LINQ provider. If you want your query to happen on the server, you need to stick with the supported functions (even if it sometimes means having to inline code that would otherwise be reused).
The difference between your two queries boils down to when (and where) the projection happens. In your first case, the data from MyTable is returned from the DB - in your sample, just the ID. Then the projection happens on top of this: the GetAmount method is called in your application for each ID.
On the other hand, there's no such way for this to happen in your second query, since you're not using GetAmount in the final projection.
You either need to replace the custom function with an inlined query the provider understands, or refactor your queries to use the supported functions in combination with whatever you need to do in memory. Exact sample code depends entirely on your actual query and what you're really trying to query for.
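That said, if the in-memory route from the question is acceptable for your data sizes, one possible shape of it is sketched below (the anonymous-type layout and the Amount property on the second query are assumptions):
// Materialize the SQL part first, then run the custom function locally.
var left = MyTable
    .Select(my => new { my.ID })
    .ToList()                                          // database work ends here
    .Select(my => new { my.ID, Amount = GetAmount(my.ID) });

// The other query is materialized with a matching shape.
var right = AnotherQuery
    .Select(p => new { p.ID, Amount = p.Amount })
    .ToList();

// LINQ to Objects union - GetAmount never needs a SQL translation.
var combined = left.Union(right).ToList();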

Performance issues to iterate results with C# SQLite DataReader and attached database

I am using System.Data.SQLite and SQLiteDataReader in my C# project. I am facing performance issues when getting the results of a query with attached databases.
Here is an example of a query to search text into two databases :
ATTACH "db2.db" as db2;
SELECT MainRecord.RecordID,
((LENGTH(MainRecord.Value) - LENGTH(REPLACE(UPPER(MainRecord.Value), UPPER("FirstValueToSearch"), ""))) / 18) AS "FirstResultNumber",
((LENGTH(DB2Record.Value) - LENGTH(REPLACE(UPPER(DB2Record.Value), UPPER("SecondValueToSearch"), ""))) / 19) AS "SecondResultNumber"
FROM main.Record MainRecord
JOIN db2.Record DB2Record ON DB2Record.RecordID BETWEEN (MainRecord.PositionMin) AND (MainRecord.PositionMax)
WHERE FirstResultNumber > 0 AND SecondResultNumber > 0;
DETACH db2;
When executing this query with SQLiteStudio or SQLiteAdmin, this works fine, I am getting the results in a few seconds (the Record table can contain hundreds of thousands of records, the query returns 36000 records).
When executing this query in my C# project, the execution takes a few seconds too, but it takes hours to run through all the results.
Here is my code :
// Attach databases
SQLiteDataReader data = null;
using (SQLiteCommand command = this.m_connection.CreateCommand())
{
command.CommandText = "SELECT...";
data = command.ExecuteReader();
}
if (data.HasRows)
{
while (data.Read())
{
// Do nothing, just iterate all results
}
}
data.Close();
// Detach databases
Calling the Read method of the SQLiteDataReader once can take more than 10 seconds! I guess this is because the SQLiteDataReader is lazily loaded (and so it doesn't return the whole rowset before reading the results), am I right?
EDIT 1 :
I don't know if this has something to do with lazy loading, like I said initially, but all I want is to be able to get ALL the results as soon as the query has finished. Isn't that possible? It seems really strange to me that it takes hours to get the results of a query that executes in a few seconds...
EDIT 2 :
I just added a COUNT(*) to my SELECT query in order to see if I could get the total number of results on the first data.Read(), just to be sure that it was only the iteration of the results that was taking so long. And I was wrong: this new request executes in a few seconds in SQLiteAdmin / SQLiteStudio, but takes hours to execute in my C# project. Any idea why the same query takes so much longer to execute in my C# project?
EDIT 3 :
Thanks to EXPLAIN QUERY PLAN, I noticed that there was a slight difference in the execution plan for the same query between SQLiteAdmin / SQLiteStudio and my C# project. In the second case, it is using an AUTOMATIC PARTIAL COVERING INDEX on DB2Record instead of using the primary key index. Is there a way to ignore / disable the use of automatic partial covering indexes? I know it is used to speed up the queries, but in my case, it's rather the opposite that happens...
Thank you.
Besides finding matching records, it seems that you're also counting the number of times the strings matched. The result of this count is also used in the WHERE clause.
You want the number of matches, but the number of matches does not matter in the WHERE clause - you could try changing the WHERE clause to:
WHERE MainRecord.Value LIKE '%FirstValueToSearch%' AND DB2Record.Value LIKE '%SecondValueToSearch%'
It might not result in any difference though - especially if there's no index on the Value columns - but it's worth a shot. Indexes on text columns require a lot of space, so I wouldn't blindly recommend that.
If you haven't done so yet, place an index on the DB2's RecordID column.
You can use EXPLAIN QUERY PLAN SELECT ... to make SQLite spit out what it does to execute your query; the output of that might help diagnose the problem.
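For example, a small sketch of dumping the plan from the same connection the application uses (so you see the plan your C# code actually gets, rather than the one SQLiteStudio gets):
using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "EXPLAIN QUERY PLAN SELECT ...";   // same SELECT as the real query
    using (SQLiteDataReader plan = command.ExecuteReader())
    {
        while (plan.Read())
        {
            // The last column ("detail") describes each step, e.g. which index is used.
            Console.WriteLine(plan.GetValue(plan.FieldCount - 1));
        }
    }
}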
Are you sure you are using the same version of SQLite in System.Data.SQLite, SQLiteStudio and SQLiteAdmin? You can see huge differences between versions.
One more typical reason why a SQL query can take a different amount of time when executed from ADO.NET than from a native utility (like SQLiteAdmin) is the command parameters used in CommandText (it is not clear from your code whether parameters are used or not). Depending on the ADO.NET provider implementation, the following identical CommandText values:
SELECT * FROM sometable WHERE somefield = ? -- assume the parameter is '2'
and
SELECT * FROM sometable WHERE somefield='2'
may lead to absolutely different execution plans and query performance.
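For illustration, a sketch of the parameterised variant with System.Data.SQLite (connection here is assumed to be an open SQLiteConnection; whether binding a parameter changes the plan depends on the provider, as described above):
using (SQLiteCommand command = connection.CreateCommand())
{
    command.CommandText = "SELECT * FROM sometable WHERE somefield = @value";
    command.Parameters.AddWithValue("@value", "2");          // bound parameter instead of an inline literal
    using (SQLiteDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // process the row
        }
    }
}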
Another suggestion: you may disable the journal (by specifying "Journal Mode=off;" in the connection string) and synchronous mode ("Synchronous=off;"), as these options may also affect query performance in some cases.

Improving Linq query

I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, which has 23 properties, 5 of them relationships (FKs); I fill 3 of these relationships with lazy loading
When I access the first page it's OK; the query takes less than 1 second. But when I try to access page 30000, the query takes more than 20 seconds. Is there a way, in the LINQ query, to improve its performance? Or only at the database level? And at the database level, what is the best way to improve performance for this kind of query?
There is not much room here, imo, to make things better (at least looking at the code provided).
When you're trying to achieve good performance on numbers like these, I would recommend not using LINQ at all, or at least using it only for the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code.
1- Create a view in the DB which orders items by date, including all related relationships, like Products etc.
2- Create a stored procedure querying this view with the related parameters (a sketch of invoking such a procedure from C# follows below).
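A hedged sketch of invoking such a procedure from Entity Framework (the procedure name, its parameters and the Ticket result mapping are assumptions for illustration; SqlParameter comes from System.Data.SqlClient):
var tickets = context.Database.SqlQuery<Ticket>(
    "EXEC dbo.GetTicketsPage @Page, @PageSize",
    new SqlParameter("@Page", page),
    new SqlParameter("@PageSize", 20))
    .ToList();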
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had a great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the random mess of SQL it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on compiled queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an ObjectQuery with explicit joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred loading can be a real performance killer.
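A sketch of those two suggestions (assuming an ObjectContext-based model called MyEntities; with DbContext/EF6 you would use AsNoTracking() instead, since those queries are auto-compiled):
// Compiled once and reused across calls; MyEntities and its Tickets set are assumed names.
static readonly Func<MyEntities, int, IQueryable<Ticket>> PageOfTickets =
    CompiledQuery.Compile((MyEntities ctx, int page) =>
        ctx.Tickets.OrderBy(t => t.Date).Skip(20 * page).Take(20));

using (var ctx = new MyEntities())
{
    // Read-only work: skip change tracking entirely.
    ctx.Tickets.MergeOption = MergeOption.NoTracking;
    var tickets = PageOfTickets(ctx, page).ToList();
}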

How to make a more efficient query using NHibernate in ASP.NET MVC 3

I have about 1 million records in my contact table in the DB, and I need to get only 30. How can I make the following query more efficient?
GetContacts Method
private IQueryable<Contact> GetContacts()
{
    return this.Query().Where(c => c.ActiveEmployer == true); // here 'this' is the current service context
}
Getting Contacts
var contactlist = GetContacts(); // this will get all records from db
var finalcontacts = contactlist.Skip(0).Take(30).Fetch(c => c.Roles).Fetch(c => c.SalaryInfo).ThenFetch(s => s.Employer);
But this query takes almost 10 seconds or more at times, and in the future I could have 30-40 million contacts; what would I do then?
There may be some people who could come up with a great fix for that query without needing more information, but for this query and all others you can probably gain much more insight by taking a look at the query that NHibernate creates. If you've not done that before, you can enable Log4Net (and possibly NLog) to output the SQL statements NH produces, or you can profile the SQL connection inside SQL Server. You can then run that SQL yourself and choose to show the query plan. This will tell you exactly what the query is doing, and potentially you can solve the problem yourself. It might be a missing index, the original WHERE, or one or all of the subsequent FETCH statements. Another question would be: have you hand-crafted a SQL statement to do something similar, and how long does that take?
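As a quick starting point for seeing the generated SQL, a sketch of turning NHibernate's SQL output on in code (the same can be set via the show_sql property in hibernate.cfg.xml; this prints to the console rather than going through Log4Net):
var cfg = new NHibernate.Cfg.Configuration();
cfg.Configure();                                                  // reads hibernate.cfg.xml as usual
cfg.SetProperty(NHibernate.Cfg.Environment.ShowSql, "true");      // write each SQL statement to the console
cfg.SetProperty(NHibernate.Cfg.Environment.FormatSql, "true");    // pretty-print it
var sessionFactory = cfg.BuildSessionFactory();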
