Paging Large Datasets - SQL Server (Best Practice)

I am interested in what the best practices are for paging large datasets (100 000+ records) using ASP.NET and SQL Server.
I have used SQL Server to perform the paging before, and although this seems to be an ideal solution, issues arise around dynamic sorting (CASE expressions in the ORDER BY clause to pick the column, and more CASE expressions for ASC/DESC order). I am not a fan of this: not only does it bind the application to the SQL details, it is also a maintainability nightmare.
Open to other solutions...
Thanks all.

In my experience, 100 000+ records are too many for a user to look at. Last time I did this, I provided filters, so users could narrow the result down to a smaller set of records and then order it. Paging and ordering became much faster than on the whole 100 000+ records. If the user didn't use any filters, I showed a warning that a large number of records would be returned and that there would be delays. Adding an index on the column being ordered, as suggested by Erick, would also definitely help.

I wanted to add a quick suggestion to Raj's answer. If you create a global temp table (named ##table instead of #table), it will survive: it is only dropped once the session that created it has ended and no other session is still using it. However, it will also be shared across all connections.
If you create an index on the column that will be sorted, the cost of this method is far lower.
Erick

If you use the ORDER BY technique, every time you page through you will cause the same load on the server, because you are running the full query and then filtering the data.
When I have had the luxury of non-connection-pooled environments, I have created the connection and left it open until paging is done. I created a #Temp table on that connection with just the IDs of the rows that need to come back, and added an IDENTITY field to this rowset. Paging against this table then gives the fastest returns.
If you are restricted to a connection-pooled environment, then the #Temp table is lost as soon as the connection is closed. In that case, you will have to cache the list of IDs on the server. Never send them to the client to be cached.
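
A minimal T-SQL sketch of the ID-table idea described above, not the answerer's actual code. The #Orders table, its columns and the sample rows are hypothetical and exist only so the batch runs on its own; SQL Server 2008 or later is assumed for the inline DECLARE initializers and multi-row VALUES. The sort is done once while filling the key table, and each page is then a cheap range lookup on the IDENTITY column.

```sql
-- Hypothetical base table with a few rows so the batch is self-contained.
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, CustomerName NVARCHAR(50), Total MONEY);
INSERT INTO #Orders (OrderId, CustomerName, Total) VALUES
    (101, N'Dana', 30.00), (102, N'Avery', 12.50), (103, N'Casey', 99.90),
    (104, N'Blake', 45.00), (105, N'Emery', 7.25);

-- Key table: only the IDs, numbered by the IDENTITY column.
CREATE TABLE #PageKeys (RowNum INT IDENTITY(1,1) PRIMARY KEY, OrderId INT NOT NULL);

-- Run the expensive sort once. With INSERT ... SELECT ... ORDER BY,
-- the IDENTITY values are assigned in the ORDER BY sequence.
-- OrderId is added as a tie-breaker so the numbering is deterministic.
INSERT INTO #PageKeys (OrderId)
SELECT OrderId
FROM #Orders
ORDER BY CustomerName, OrderId;

-- Fetch one page by RowNum range (hypothetical paging parameters).
DECLARE @PageNumber INT = 2, @PageSize INT = 2;

SELECT o.OrderId, o.CustomerName, o.Total
FROM #PageKeys AS k
JOIN #Orders AS o ON o.OrderId = k.OrderId
WHERE k.RowNum >  (@PageNumber - 1) * @PageSize
  AND k.RowNum <= @PageNumber * @PageSize
ORDER BY k.RowNum;   -- page 2 here is the 3rd and 4th customers by name

DROP TABLE #PageKeys;
DROP TABLE #Orders;
```

In a real application the key table would be filled once per search and only the last SELECT would be repeated for each page, which is why the connection (or a server-side cache of the IDs) has to outlive a single request.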

Related

How to implement paging of rows with WCF and SQL Server?

Firstly, I am a newbie to WCF and SQL Server. I am developing an application that connects with WCF and SQL Server 2012. I have a table with a million rows, and that count will keep increasing. When the client sends a request, I will fetch 30 rows, then show the next 30 rows if the user requests them, and so on. My requirement is to do paging in WCF or in SQL. I have the following questions:
I was wondering where I should implement paging: in WCF or in SQL Server? Which approach is the fastest?
If I implement it in WCF, I will use LINQ's Skip and Take operators to fetch the requested page. Is that the right way?
If I use SQL, which approach will fetch the result fastest: OFFSET or ROW_NUMBER() OVER?
Since I am new, these approaches are what I know. Is there any other approach that I don't know of?

You want to page on SQL Server. If you don't, every query you make will return a million rows that you must load into memory, only to immediately throw away all but 30 of them. Very inefficient.
It depends on what you mean. If you are using something like Entity Framework, Skip and Take are not performed in memory; they are translated into SQL and run on the server.
If it is available to you (SQL Server 2012 and later), OFFSET is simpler to write and will generally perform as well as or better than ROW_NUMBER() OVER.
One thing to note: if you are doing paging, you must make sure your ordering is deterministic. There can be no ties in the ordering; if you allow ties, one could fall right on a page break. Let's say Row A and Row B are "tied" by your ORDER BY:
You run a query for page 1, Row A is considered to be before Row B, and Row A is shown as the last item on the page.
You run a query for page 2, Row B is considered to be before Row A, and Row A is shown as the first item on the 2nd page.
In the above example Row A was displayed twice and Row B was never displayed. The easiest way to fix this is to always put your primary key (or any other set of columns you could put a unique index on) as the last thing in your ORDER BY. This makes sure you never have any ties.
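
The tie-breaker advice above can be sketched in T-SQL as follows. This assumes SQL Server 2012 or later for OFFSET ... FETCH; the @Products table variable, its columns and the sample rows are hypothetical and only there to make the batch self-contained.

```sql
-- Hypothetical data: several rows share the same Price, so Price alone
-- would not give a deterministic order.
DECLARE @Products TABLE (ProductId INT PRIMARY KEY, Price MONEY NOT NULL);
INSERT INTO @Products (ProductId, Price) VALUES
    (1, 9.99), (2, 9.99), (3, 9.99), (4, 14.50), (5, 14.50);

DECLARE @PageNumber INT = 2, @PageSize INT = 2;

SELECT ProductId, Price
FROM @Products
ORDER BY Price, ProductId                      -- primary key last: no ties possible
OFFSET (@PageNumber - 1) * @PageSize ROWS      -- skip the earlier pages
FETCH NEXT @PageSize ROWS ONLY;                -- return one page
```

Because ProductId is unique, every row has exactly one position in the ordering, so a row can never move across a page boundary between two requests unless the data itself changes.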

How to update huge sql data in asp .net application using c#

I am building a web application using C# .NET with SQL Server 2008 as the back end. The application reads data from Excel and inserts it into a SQL table. For this I have used the SqlBulkCopy class, which works very well. The SQL table has 50 fields, two of which are system_error and mannual_error. After inserting records into the other 48 columns, I need to re-check all these records and update the two columns mentioned above with specific errors, e.g. Name field contains a number, Qty not specified, etc. For this I have to check each column by fetching the rows into a DataTable and using a for loop.
It works very well when there are 1000 to 5000 records, but it takes a huge amount of time, say 50 minutes, when there are around 100,000 records or more.
Initially I used a simple SQL UPDATE query, then I used a stored procedure, but both take the same time.
How can I increase the performance of the application? What other ways are there to update huge amounts of data? Please suggest.

I suppose this is why people use MongoDB and other NoSQL systems. You can update huge data sets by optimizing your query. Read more here:
http://www.sqlservergeeks.com/blogs/AhmadOsama/personal/450/sql-server-optimizing-update-queries-for-large-data-volumes
Also check: Best practices for inserting/updating large amount of data in SQL Server 2008

One thing to consider is that iterating over a database table row by row, rather than performing set-based update operations, incurs a significant performance hit.
If you are in fact performing set-based updates on your data and still have significant performance problems, you should look at the execution plans of your queries so that you can work out where and why they are performing so badly.
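
To make "set-based" concrete, here is a minimal sketch that validates every imported row in one UPDATE statement instead of a per-row loop. The system_error column and the two example checks come from the question; the @ImportRows table variable, the other column names, the message texts and the sample rows are assumptions made so the batch runs on its own.

```sql
-- Hypothetical stand-in for the bulk-loaded table.
DECLARE @ImportRows TABLE (
    RowId        INT PRIMARY KEY,
    Name         NVARCHAR(100) NULL,
    Qty          INT NULL,
    system_error NVARCHAR(200) NULL
);
INSERT INTO @ImportRows (RowId, Name, Qty) VALUES
    (1, N'Widget',  10),
    (2, N'Gadg3t',  5),      -- name contains a digit
    (3, N'Sprocket', NULL),  -- quantity missing
    (4, N'B0lt',    NULL);   -- both problems

-- One statement checks all rows; each CASE contributes its message
-- only when its rule is violated.
UPDATE @ImportRows
SET system_error =
      CASE WHEN Name LIKE '%[0-9]%' THEN 'Name field has number; ' ELSE '' END
    + CASE WHEN Qty IS NULL         THEN 'Qty not specified; '     ELSE '' END
WHERE Name LIKE '%[0-9]%'
   OR Qty IS NULL;           -- rows without errors are not touched

SELECT RowId, Name, Qty, system_error
FROM @ImportRows
ORDER BY RowId;
```

The same pattern extends to more rules by adding further CASE expressions and matching WHERE conditions, so the table is scanned once no matter how many rows were imported.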

Scrollable ODBC cursor in C#

I'm a C++ programmer and I'm not familiar with the .NET database model. I usually use IDataReader (OdbcDataReader, OleDbDataReader or SqlDataReader) to read data from the database. Sometimes, when I need a bulk of data, I use a DataAdapter, but what should I do to get the functionality of the scrollable cursors that exist in native libraries like ODBC?
Thanks to all of you for your answers, but I am in a situation where I can't accept them. Of course, this is my fault for not explaining my problem completely. I explained it in a comment on one of the answers, which has now been removed.
I have to write a program that will act as a proxy between a client-side program and MSSQL. For this library I have the following requirements:
My program should be compatible with MSSQL2000
I don't know all the tables and queries that will be sent by the user. I should simply add some information to each query, make a log, ... and then execute it against MSSQL, so it is really hard to use techniques that are based on the ordered field(s) of the query or the primary key of the table (all my work is in one database, but that database is huge and may change over time).
Only a part of the data is needed by the client. Most DBMSs support LIMIT/OFFSET; unfortunately MSSQL does not, and ROW_NUMBER does not exist in MSSQL2000. Even if it were supported, I would again need to understand the program logic, and that requires parsing the SQL command (I actually wrote a parsing library with boost::spirit, but that's native code and, besides that, I'm not yet 100% sure about its functionality).
I may have multiple clients, but most of the queries they send are one of a few predefined queries (of course users still send custom queries, but those are about 30% of all queries), so I think I can open some scrollable cursors and respond to clients using those cursors and a custom cache.
Server machine and its MSSQL will be dedicated to my program, so I really want to use all of the power of the server and DBMS to achieve my functionality.
So now:
What is the problem in using scrollable cursors and why I should avoid them?
How can I use scrollable cursors in .NET?

In SQL Server you can write paged queries like the ones below. The page number is easy to handle from the application, so you do not need to create cursors for this task. Both examples return rows 41 to 50 (page 5 with 10 rows per page).
For SQL Server 2005 or higher
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum FROM TABLEA ) AS Paged
WHERE RowNum > 40
AND RowNum <= 50
ORDER BY RowNum
For SQL Server 2000
SELECT TOP 10 T.* FROM TABLEA AS T WHERE T.ID NOT IN
( SELECT TOP 40 ID FROM TABLEA ORDER BY ID )
ORDER BY T.ID
PS: edited to include support for SQL Server 2000

I usually use DataReader.Read() to skip all the rows that I do not want when paging on a DB which does not support paging.
If you don't want to build the SQL paged query yourself you are free to use my paging class: https://github.com/jgauffin/Griffin.Data/blob/master/src/Griffin.Data/BasicLayer/Paging/SqlServerPager.cs

When Microsoft designed the ADO.NET API, they made the decision to expose only firehose cursors (IDataReader etc). This may or may not actually pose a problem for you. You say that you want "functionality of scrollable cursors", but that can mean all sorts of things, not just paging, and each particular use case can be tackled in a variety of ways. For example:
Requirement: The user should be able to arbitrarily page up and down the resultset.
Retrieve only one page of data at a time, e.g. using the ROW_NUMBER() function. This is more efficient than scrolling through a cursor.
Requirement: I have an extremely large data set and I only want to process one row at a time to avoid running out of memory.
Use the firehose cursor provided by ADO.NET. Note that this is only practical if (a) you don't need to hit the database at all during the loop, or (b) you have MARS configured in your connection string.
Simulate a keyset cursor by retrieving the set of unique identifiers into an array, then loop through the array and read one row of data at a time.
Requirement: I am doing a complicated calculation that involves moving forwards and backwards through the resultset.
You should be able to re-write your algorithm to eliminate this requirement. For example, read one set of rows, process them, read another set of rows, process them, etc.
UPDATE (more information provided in the question)
Your business requirements are asking too much. You have to handle arbitrary queries that assume the presence of scrollable cursors, but you can't provide scrollable cursors, and you can't re-write the client code to not use scrollable cursors. That's an impossible position to be in. I recommend you stick with what you currently have (C++ and ODBC) and don't bother trying to re-write it in .NET.

I don't think cursors will work for your particular case. The main reason is that you have 3 tiers. But let's take two steps back.
Most 3-tier applications have a stateless middle tier (your C++ code). Caching is fine, since it is really just an optimization and does not create any real state in the middle tier. The middle tier normally keeps a small number of open sessions to the database, because opening a DB session is expensive for the processor, and once the session is open a set amount of RAM is reserved on the database server. When a request is received by the middle tier, the request is processed and handed on to the SQL database. An algorithm may be used to pick any of the open sessions, or the choice can even be random. In this model it is not possible to know which session will receive the next request. Cursors belong to the session that received the original query request, so you can't really expect that the receiving session will be the one that has your open cursor.
The 3-tier model I described is used mainly for web applications so they can scale to hundreds or thousands of clients, where SQL Server would never be able to open that many sessions. Microsoft ADO.NET already has many features to support the kind of architecture I described, so it is not very hard to implement. The same model is used even in non-web applications, depending on the circumstances. You could potentially keep track of your sessions so that you open a single session per client, but I would first make sure that the use case justifies that. Know that open cursors can take up a lot of resources as well.
Cursors still have a place within a single transaction; it's just hard to keep them open so that the client application can fetch/update values within the result set.
What I would suggest is that you do the following within the query transaction. Store the primary key values of the main table in your query in a separate table. In that separate table, include other values such as sessionid and rownumber. Return the first few rows by joining the original query to the new table. In subsequent calls, just query the corresponding rows again by joining to your new table. You will need the equivalent of a caching mechanism to purge old data and to refresh the result set according to your needs.

Is SQL or C# faster at pairing?

I have a lot of data which needs to be paired based on a few simple criteria. There is a time window (both records have a DateTime column), if one record is very close in time (within 5 seconds) to another then it is a potential match, the record which is the closest in time is considered a complete match. There are other fields which help narrow this down also.
I wrote a stored procedure which does this matching on the server before returning the full, matched dataset to a C# application. My question is, would it be better to pull in the 1 million (x2) rows and deal with them in C#, or is SQL Server better suited to perform this matching? If SQL Server is, then what is the fastest way of pairing data using datetime fields?
Right now I select all records from Table 1/Table 2 into temporary tables, iterate through each record in Table 1, look for a match in Table 2 and store the match (if one exists) in a temporary table, then I delete both records in their own temporary tables.
I had to rush this piece for a game I'm writing, so excuse the bad (very bad) procedure... It works, it's just horribly inefficient! The whole SP is available on pastebin: http://pastebin.com/qaieDsW7
I know the SP is written poorly, so saying "hey, dumbass... write it better" doesn't help! I'm looking for help in improving it, or help/advice on how I should do the whole thing differently! I have about 3/5 days to rewrite it, I can push that deadline back a bit, but I'd rather not if you guys can help me in time! :)
Thanks!

Ultimately, compiling your data on the database side is preferable 99% of the time, as the database is designed for data crunching (through the use of indexes, relations, etc). A lot of your code can be consolidated by using joins to compile the data in exactly the format you need. In fact, you can bypass almost all your temp tables entirely and just fill a master Event temp table.
The general pattern is this:
INSERT INTO #Events
SELECT <all interested columns>
FROM
FireEvent
LEFT OUTER JOIN HitEvent ON <all join conditions for HitEvent>
This way you match all fire events to zero or more HitEvents. After our discussion in chat, you can even limit it to zero or one hit event by wrapping it in a subquery and using a window function for ROW_NUMBER() OVER (PARTITION BY HitEvent.EventID ORDER BY ...) AS HitRank and add a WHERE HitRank = 1 to the outer query. This is ultimately what you ended up doing and got the results you were expecting (with a bit of work and learning in the process).
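
A self-contained sketch of the join-plus-ROW_NUMBER() pattern described above. It is not the asker's stored procedure: the column names (FiredAt, HitAt), the sample rows and the 5-second window written as a BETWEEN are assumptions based on the question. Note that this sketch partitions by the fire event, so each fire event keeps at most its closest hit; a single hit could still be paired with two different fire events.

```sql
-- Hypothetical event tables with a few rows.
DECLARE @FireEvent TABLE (EventID INT PRIMARY KEY, FiredAt DATETIME NOT NULL);
DECLARE @HitEvent  TABLE (EventID INT PRIMARY KEY, HitAt   DATETIME NOT NULL);

INSERT INTO @FireEvent (EventID, FiredAt) VALUES
    (1, '2013-01-01T10:00:00'),
    (2, '2013-01-01T10:00:30');
INSERT INTO @HitEvent (EventID, HitAt) VALUES
    (10, '2013-01-01T10:00:01'),   -- 1 s after fire event 1
    (11, '2013-01-01T10:00:03'),   -- 3 s after fire event 1
    (12, '2013-01-01T10:05:00');   -- outside every window

SELECT FireEventID, FiredAt, HitEventID, HitAt
FROM (
    SELECT f.EventID AS FireEventID, f.FiredAt,
           h.EventID AS HitEventID,  h.HitAt,
           -- Rank the candidate hits of each fire event by time distance;
           -- the hit's EventID breaks ties so the ranking is deterministic.
           ROW_NUMBER() OVER (
               PARTITION BY f.EventID
               ORDER BY ABS(DATEDIFF(MILLISECOND, f.FiredAt, h.HitAt)), h.EventID
           ) AS HitRank
    FROM @FireEvent AS f
    LEFT OUTER JOIN @HitEvent AS h
        ON h.HitAt BETWEEN DATEADD(SECOND, -5, f.FiredAt)
                       AND DATEADD(SECOND,  5, f.FiredAt)
) AS Ranked
WHERE HitRank = 1          -- closest hit only; unmatched fire events keep NULLs
ORDER BY FireEventID;
```

With the sample rows, fire event 1 has two candidates and keeps the nearer one, while fire event 2 has none and comes back with NULL hit columns because of the outer join. On real tables, an index on the datetime column of the hit table is what lets the range join avoid scanning every pair.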

If the data is already in the database, that is where you should do the work. You absolutely should learn to display and read query plans using SQL Server Management Studio, and become able to notice and optimize away expensive operations like nested loops.
Your task probably does not require any use of temporary tables. Temporary tables tend to be efficient when they are relatively small and/or heavily reused, which is not your case.

I would advise you to try to optimize the stored procedure if it is not running fast enough, rather than rewrite it in C#. Why would you want to transfer millions of rows out of SQL Server anyway?
Unfortunately I don't have a SQL Server installation, so I can't test your script, but I don't see any CREATE INDEX statements in there. Unless you just skipped them for brevity, you should definitely analyze your queries and see which indexes are needed.

So the answer depends on several factors like resources available per client/server (Ram/CPU/Concurrent Users/Concurrent processes, etc.)
Here are some basic rules that will improve your performance regardless of what you use:
Loading a million rows into a C# program is not good practice, unless it is a stand-alone process with plenty of RAM.
Uniqueidentifiers will never outperform integers. Comparisons
Common Table Expressions are a good alternative for fast matching. How to use CTE
Finally you have to consider output. If there is constant reading and writing that affects the user interface, then you should manage that in memory (c#), otherwise all CRUD operations should be kept inside the database.

How to improve ASP.Net Dynamic Data pagination performance

I'm using ASP.Net Dynamic Data (.Net Framework version 4.0) and SQL Server 2008 to display data from a set of tables. In the database there is a certain master table containing 83 columns and 854581 rows of data, which is still expected to grow.
I made few modifications to the default templates, and most operations are snappy enough for a business website, including filtering, foreign key display, etc. Yet the pager proved to be a major performance problem.
Before I did any optimization using the Database Tuning Advisor, the master table list page would not even display, because a simple count query timed out. After all optimizations, the first few hundred pages display without a problem, but anything beyond 1000 pages is unbearably slow and the last page times out.
Note that I've optimized the query for the last page using Database Tuning Advisor, which gave me no advice at all.
I tried to run a very simple query using row_number() to see the last page, similar to the query below:
SELECT TOP 10 * FROM
(SELECT *, row_number() OVER (ORDER BY [Primary Key Column] ASC) AS [rn]
FROM [Master Table]) AS TB1
WHERE TB1.rn > 854580
The above query took about 20 seconds to complete the first time it was executed, while the SQL Server service sqlservr.exe ate up 1,700 KiB of memory (I'm using 32-bit windows with no special kernel switches, therefore, every process has at most 2 GiB of address space).
So, my question is, is there a more efficient way than using row_number() to do pagination in ASP.Net Dynamic data and how to implement it.
There are two possibilities that I came up with:
There is some magical T-SQL language construct with better performance, for instance, something like LIMIT <offset>, <row count> in MySQL.
I add a sequence number column in the table, and page according to that number when there is no filtering or sorting
Perhaps the greater problem is how to implement this custom paging in ASP.Net Dynamic Data. Any help is appreciated, even pagination samples not intended to improve performance, just to give me an idea.
Note: I can't change the database engine I use, so don't recommend MySQL or PostgreSQL or anything like that.

20 seconds is long even for 1 million rows. Check your SQL Server indexes!
Also make sure you are retrieving only the data columns required in the List view (you mentioned 83 columns, and you are surely not showing all of those in the List view).
Another approach I can think of is to use LINQ query expressions with Skip() and Take() to get one page at a time. Of course, this means you will be querying less data, more often.
