Firstly, I am a newbie to WCF and SQL Server. I am developing an application that uses WCF and SQL Server 2012. I have a table with a million rows, and that count will keep increasing. When the client sends a request, I will fetch 30 rows, then show the next 30 rows if the user requests them, and so on. My requirement is to do paging either in WCF or in SQL. I have the following questions:
I was wondering where I should implement paging: in WCF or in SQL Server? Which approach is the fastest?
If I implement it in WCF, I will use LINQ's Skip and Take operators to fetch the requested page. Is that the right way?
If I use SQL, which approach will fetch results faster: the OFFSET clause or the ROW_NUMBER() OVER option?
Since I am new to this, these are the only approaches I know. Is there any other approach that I don't know of?
You want to page in SQL Server. If you don't, every query you make will return a million rows that you must load into memory, only to immediately throw away all but 30 of them. Very inefficient.
It depends on what you mean. If you are using something like Entity Framework, Skip and Take will not do those operations in memory; they are transformed into SQL queries and run in SQL Server.
If it is available to you (OFFSET ... FETCH was introduced in SQL Server 2012), OFFSET will give you better performance than ROW_NUMBER() OVER.
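As a rough illustration (the table and column names here are assumed, not from the question), this is what OFFSET/FETCH paging looks like from plain ADO.NET; it is also roughly the SQL that Skip and Take get translated into on SQL Server 2012 and later:

using System.Collections.Generic;
using System.Data.SqlClient;

// Assumed table dbo.TableA(Id, Name); page numbering starts at 0.
static List<(int Id, string Name)> GetPage(string connStr, int page, int pageSize = 30)
{
    const string sql = @"
        SELECT Id, Name
        FROM dbo.TableA
        ORDER BY Id                        -- a deterministic order is required
        OFFSET @Offset ROWS
        FETCH NEXT @PageSize ROWS ONLY;";

    var rows = new List<(int, string)>();
    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@Offset", page * pageSize);
        cmd.Parameters.AddWithValue("@PageSize", pageSize);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                rows.Add((reader.GetInt32(0), reader.GetString(1)));
    }
    return rows;
}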
One thing to note: if you are doing paging, you must make sure your ordering is deterministic. There can be no ties in the ordering; if you allow ties, you could have one right on a page break. Let's say Row A and Row B are "tied" by your ORDER BY:
You run the query for page 1; Row A happens to sort before Row B, so Row A is shown as the last item on the page.
You run the query for page 2; this time Row B happens to sort before Row A, so Row A is shown again as the first item on the 2nd page.
You never displayed Row B in the above example. The easiest way to fix this is to always put your primary key (or any other set of columns you could place a unique index on) last in your ORDER BY; this guarantees you never have any ties.
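To make that concrete, here is a minimal sketch (column names are assumed): the same page query with and without the primary key as the final tie-breaker.

// Assumed: dbo.TableA has a non-unique LastName and a unique primary key Id.
static class PagingQueries
{
    // Ties on LastName alone can shift across page boundaries between runs.
    public const string Unstable = @"
        SELECT * FROM dbo.TableA
        ORDER BY LastName
        OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY;";

    // Appending the primary key breaks ties, so every row has a fixed position.
    public const string Stable = @"
        SELECT * FROM dbo.TableA
        ORDER BY LastName, Id
        OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY;";
}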
Related
I am working on a project to build an IDE like SQL Developer or SQL Server Management Studio. I intend to write ad-hoc queries in the IDE's editor and show the retrieved data in a data grid. The project is built in C# for a .NET Windows Forms environment.
My challenge is to execute arbitrary select operations on large tables (millions of rows) without applying any pagination, because pagination changes the actual query (as I can see in SQL Server Profiler).
Suppose I write
SELECT *
FROM LargeTableA
and apply virtualization, fetching 50 rows from LargeTableA at a time, then internally the actual query is changed and the database is hit multiple times. This is not what I want.
I want to accomplish the job the way professional SQL IDEs (like SQL Developer) do. I need to know how they do it without applying pagination. I would kindly request experts' suggestions to guide me so that I can accomplish this task.
Do we agree about the basics? When you call command.ExecuteReader() you get a SqlDataReader. This has a Read() method that lets you iterate, once, forward, through your result set. Your connection stays open until you close it. You can close it before you've read every row if, for whatever reason, you decide you don't need to continue. You don't know, and shouldn't care, how many rows are buffered in your connection. All this is independent of pagination in the sql query, and it would let you, for example, implement an infinite scroll in your data grid, or display the first n records only, to avoid burying your application. Does that help?
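For example, a minimal sketch of that pattern (the connection string, query, and column names are assumed): read rows one at a time and stop as soon as you have enough, leaving the rest of the result set unread.

using System;
using System.Data.SqlClient;

// Reads at most maxRows rows, then stops; the remaining rows are never
// materialized by the application.
static void ShowFirstRows(string connStr, int maxRows)
{
    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.LargeTableA", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            int shown = 0;
            while (shown < maxRows && reader.Read())   // once, forward, one row at a time
            {
                Console.WriteLine($"{reader.GetInt32(0)}: {reader.GetString(1)}");
                shown++;
            }
        }   // disposing the reader abandons whatever rows were never read
    }
}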
I'm a C++ programmer and I'm not familiar with the .NET database model. I usually use IDataReader (OdbcDataReader, OleDbDataReader, or SqlDataReader) to read data from a database. Sometimes, when I need bulk data, I use a DataAdapter. But what should I do to achieve the functionality of the scrollable cursors that exist in native libraries like ODBC?
Thanks to all of you for your answers, but I am in a situation where I can't accept them; of course, that is my fault for not explaining my problem completely. I explained it in a comment on one of the answers, which has now been removed.
I have to write a program that will act as a proxy between the client-side program and MSSQL. For this library I have the following requirements:
My program should be compatible with MSSQL 2000.
I don't know all the tables and queries that will be sent by the user; I should simply add some information to them, make a log, etc., and then execute them against MSSQL. So it is really hard to use techniques based on ordered field(s) of the query or the primary key of the table (all my work is in one database, but that database is huge and may change over time).
Only a part of the data is needed by the client. Most DBMSs support LIMIT/OFFSET; unfortunately, MSSQL does not, and ROW_NUMBER does not exist in MSSQL 2000. Even if it were supported, I would again need to understand the program logic, and that requires parsing the SQL command (I actually wrote a parsing library with boost::spirit, but that is native code, and besides, I'm not yet 100% sure about its functionality).
I may have multiple clients, but most of the queries they send are one of a few predefined queries (users still send custom queries, but those are about 30% of all queries). So I think I can open some scrollable cursors and respond to clients using those cursors and a custom cache.
The server machine and its MSSQL instance will be dedicated to my program, so I really want to use all the power of the server and the DBMS to achieve this functionality.
So now:
What is the problem with using scrollable cursors, and why should I avoid them?
How can I use scrollable cursors in .NET?
In SQL Server you can page queries like this; the page number is easily handled from the application. You do not need to create cursors for this task.
For SQL Server 2005 or higher
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS ROW
    FROM TABLEA
) AS ALIAS
WHERE ROW > 40
  AND ROW <= 50
For SQL Server 2000
SELECT TOP 10 T.*
FROM TABLEA AS T
WHERE T.ID NOT IN (SELECT TOP 40 ID FROM TABLEA ORDER BY ID)
ORDER BY T.ID
PS: edited to include support for SQL Server 2000
I usually use DataReader.Read() to skip all the rows I do not want when paging on a DB that does not support paging.
If you don't want to build the SQL paged query yourself, you are free to use my paging class: https://github.com/jgauffin/Griffin.Data/blob/master/src/Griffin.Data/BasicLayer/Paging/SqlServerPager.cs
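A minimal sketch of that skip-by-reading approach (provider-agnostic; names are assumed): advance the reader past the rows before the requested page, then collect one page. It is simple, but the skipped rows are still sent over the wire, so it gets expensive for deep pages.

using System.Collections.Generic;
using System.Data;

static List<object[]> ReadPage(IDataReader reader, int offset, int pageSize)
{
    // Skip the rows that precede the requested page.
    for (int i = 0; i < offset && reader.Read(); i++) { }

    var page = new List<object[]>();
    while (page.Count < pageSize && reader.Read())
    {
        var values = new object[reader.FieldCount];
        reader.GetValues(values);   // copy the current row's column values
        page.Add(values);
    }
    return page;
}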
When Microsoft designed the ADO.NET API, they made the decision to expose only firehose cursors (IDataReader etc). This may or may not actually pose a problem for you. You say that you want "functionality of scrollable cursors", but that can mean all sorts of things, not just paging, and each particular use case can be tackled in a variety of ways. For example:
Requirement: The user should be able to arbitrarily page up and down the resultset.
Retrieve only one page of data at a time, e.g. using the ROW_NUMBER() function. This is more efficient than scrolling through a cursor.
Requirement: I have an extremely large data set and I only want to process one row at a time to avoid running out of memory.
Use the firehose cursor provided by ADO.NET. Note that this is only practical if (a) you don't need to hit the database at all during the loop, or (b) you have MARS configured in your connection string.
Simulate a keyset cursor by retrieving the set of unique identifiers into an array, then loop through the array and read one row of data at a time (see the sketch after this list).
Requirement: I am doing a complicated calculation that involves moving forwards and backwards through the resultset.
You should be able to re-write your algorithm to eliminate this requirement. For example, read one set of rows, process them, read another set of rows, process them, etc.
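As a sketch of the keyset simulation mentioned above (table and column names are assumed): fetch the keys once, then fetch individual rows by key, which lets you move in either direction.

using System.Collections.Generic;
using System.Data.SqlClient;

// Load the full ordered keyset once.
static List<int> LoadKeys(SqlConnection conn)
{
    var keys = new List<int>();
    using (var cmd = new SqlCommand("SELECT Id FROM dbo.TableA ORDER BY Id", conn))
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            keys.Add(reader.GetInt32(0));
    return keys;
}

// Fetch one row on demand; call with keys[i] for any i, forwards or backwards.
static string LoadRow(SqlConnection conn, int id)
{
    using (var cmd = new SqlCommand("SELECT Name FROM dbo.TableA WHERE Id = @Id", conn))
    {
        cmd.Parameters.AddWithValue("@Id", id);
        return (string)cmd.ExecuteScalar();
    }
}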
UPDATE (more information provided in the question)
Your business requirements are asking too much. You have to handle arbitrary queries that assume the presence of scrollable cursors, but you can't provide scrollable cursors, and you can't re-write the client code to not use scrollable cursors. That's an impossible position to be in. I recommend you stick with what you currently have (C++ and ODBC) and don't bother trying to re-write it in .NET.
I don't think cursors will work for your particular case. The main reason is that you have 3 tiers. But let's take two steps back.
Most 3-tier applications have a stateless middle tier (your C++ code). Caching is fine, since it really is just an optimization and does not create any real state in the middle tier. The middle tier normally has a small number of open sessions to the database, because opening a DB session is expensive for the processor, and once the session is open, a set amount of RAM is reserved at the database server. When a request is received by the middle tier, it is processed and handed on to the SQL database. An algorithm may be used to pick any of the open sessions, or it can even be done at random. In this model it is not possible to know which session will receive the next request. Cursors belong to the session that received the original query request, so you can't really expect that the receiving session will be the one that has your open cursor.
The 3-tier model I described is used mainly for web applications so they can scale to hundreds or thousands of clients, where SQL Server would never be able to open that many sessions. Microsoft ADO.NET already has many features to support the kind of architecture I described, so it is not very hard to implement, and the same model is used even in non-web applications, depending on the circumstances. You could potentially keep track of your sessions and open a single session per client, but I would first make sure the use case justifies that; know that open cursors can take up a lot of resources as well.
Cursors still have a place within a single transaction, it's just hard to keep them open so that the client application can fetch/update values within the result set.
What I would suggest is that you do the following within the query transaction: store the primary key values of the main table in your query in a separate table, along with other values like a session id and a row number. Return the first few rows by linking to the new table in the original query, and in subsequent calls just query the corresponding rows again by linking to your new table. You will need an equivalent of a caching mechanism to purge old data and to refresh the result set according to your needs.
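A rough sketch of that idea (all names are hypothetical, and it assumes each session's keys are inserted as a single batch so its row numbers are contiguous):

// RowNum is an IDENTITY, so this also works on SQL Server 2000, which
// lacks ROW_NUMBER(). After CaptureKeys, the application reads the
// session's MIN(RowNum) once and computes absolute row ranges from it.
static class KeysetPagingSql
{
    public const string CreateTable = @"
        CREATE TABLE dbo.PagingKeys (
            RowNum    INT IDENTITY PRIMARY KEY,
            SessionId UNIQUEIDENTIFIER NOT NULL,
            Id        INT NOT NULL
        );";

    // Capture the keys of one query result, in the query's order.
    public const string CaptureKeys = @"
        INSERT INTO dbo.PagingKeys (SessionId, Id)
        SELECT @SessionId, Id FROM dbo.MainTable ORDER BY Id;";

    // Subsequent calls page by joining the cached keys back to the main table.
    public const string FetchPage = @"
        SELECT m.*
        FROM dbo.PagingKeys k
        JOIN dbo.MainTable m ON m.Id = k.Id
        WHERE k.SessionId = @SessionId
          AND k.RowNum BETWEEN @FirstRow AND @LastRow
        ORDER BY k.RowNum;";

    // The purge/refresh mechanism mentioned above.
    public const string Purge =
        "DELETE FROM dbo.PagingKeys WHERE SessionId = @SessionId;";
}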
I'm using ASP.NET Dynamic Data (.NET Framework version 4.0) and SQL Server 2008 to display data from a set of tables. In the database there is a certain master table containing 83 columns and 854,581 rows of data, which is still expected to grow.
I made only small modifications to the default templates, and most operations are snappy enough for a business website, including filtering, foreign key display, etc. Yet the pager proved to be a major performance problem.
Before I did any optimization with the Database Tuning Advisor, the master table's list page wouldn't even display, as a simple count query timed out. After all optimizations, the first few hundred pages display without a problem, but anything beyond 1,000 pages is unbearably slow, and the last page times out.
Note that I've run the query for the last page through the Database Tuning Advisor, which gave me no advice at all.
I tried to run a very simple query using row_number() to see the last page, similar to the query below:
SELECT TOP 10 *
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY [Primary Key Column] ASC) AS [rn]
    FROM [Master Table]
) AS TB1
WHERE TB1.rn > 854580
The above query took about 20 seconds to complete the first time it was executed, while the SQL Server service sqlservr.exe ate up 1,700 KiB of memory (I'm using 32-bit Windows with no special kernel switches, so every process has at most 2 GiB of address space).
So, my question is: is there a more efficient way than ROW_NUMBER() to do pagination in ASP.NET Dynamic Data, and how would I implement it?
There are two possibilities that I came up with:
There is some magical T-SQL language construct with better performance, for instance, something like LIMIT <From row number>, <To row number> in MySQL.
I add a sequence number column in the table, and page according to that number when there is no filtering or sorting
Perhaps the greater problem is how to implement this custom paging in ASP.Net Dynamic Data. Any help is appreciated, even pagination samples not intended to improve performance, just to give me an idea.
Note: I can't change the database engine I use, so don't recommend MySQL or PostgreSQL or anything like that.
20 seconds is long even for 1 million rows - check your SQL Server indexes!
Also make sure you are retrieving only the data columns required in the List view (you mentioned 83 columns... you are surely not showing all of those in the List view).
Another approach I can think of is to use LINQ query expressions with Skip() and Take() to fetch one page at a time. Of course, this means you will be querying less data, more often.
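A minimal sketch of that (entity and context names are assumed, as are pageNumber and pageSize); with LINQ to Entities or LINQ to SQL the whole expression is translated to SQL, so both the paging and the projection happen in the database. Projecting only a few columns also addresses the 83-column concern above.

using System.Linq;

// Assumed: 'db' is a LINQ to Entities / LINQ to SQL context exposing a
// MasterRecords set keyed by Id; page numbering starts at 0.
var page = db.MasterRecords
             .OrderBy(r => r.Id)                  // required before Skip/Take
             .Select(r => new { r.Id, r.Name })   // only the columns the List view needs
             .Skip(pageNumber * pageSize)
             .Take(pageSize)
             .ToList();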
I am interested in what the best practices are for paging large datasets (100 000+ records) using ASP.NET and SQL Server.
I have used SQL Server to perform the paging before, and although it seems the ideal solution, issues arise around dynamic sorting (CASE statements in the ORDER BY clause to determine the column, and CASE statements for ASC/DESC order). I am not a fan of this: not only does it bind the application to the SQL details, it is a maintainability nightmare.
Open to other solutions...
Thanks all.
In my experience, 100,000+ records are too many for a user to look at. The last time I did this, I provided filters, so users could see a smaller, filtered set of records and order them; paging and ordering then became much faster than paging/ordering over the whole 100,000+ records. If the user didn't use filters, I showed a warning that a large number of records would be returned and there would be delays. Adding an index on the column being ordered, as suggested by Erick, would also definitely help.
I wanted to add a quick suggestion to Raj's answer. If you create a temp table with the ##table format, it is a global temp table: it survives outside the batch that created it, for as long as the creating connection stays open. However, it will also be visible to all other connections.
If you create an Index on the column that will be sorted, the cost of this method is far lower.
Erick
If you use the ORDER BY technique, every time you page through you will cause the same load on the server, because you are running the query and then filtering the data.
When I have had the luxury of non-connection-pooled environments, I created the connection and left it open until paging was done. I created a #Temp table on the connection with just the IDs of the rows that need to come back, added an IDENTITY field to that rowset, and then paged using this table to get the fastest returns.
If you are restricted to a connection-pooled environment, then the #Temp table is lost as soon as the connection is closed. In that case, you will have to cache the list of IDs on the server - never send them to the client to be cached.
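A rough sketch of the non-pooled variant (all names are hypothetical): the connection is opened once and kept open, because the #Temp keyset disappears when it closes.

using System;
using System.Data.SqlClient;

sealed class PagedQuery : IDisposable
{
    private readonly SqlConnection _conn;

    public PagedQuery(string connStr)
    {
        _conn = new SqlConnection(connStr);
        _conn.Open();
        const string setup = @"
            CREATE TABLE #PageKeys (RowNum INT IDENTITY PRIMARY KEY, Id INT NOT NULL);
            INSERT INTO #PageKeys (Id)
            SELECT Id FROM dbo.BigTable ORDER BY SomeColumn, Id;";
        using (var cmd = new SqlCommand(setup, _conn))
            cmd.ExecuteNonQuery();   // materialize the ordered IDs once
    }

    // Each page is a cheap range seek on RowNum plus a join on the key.
    public SqlDataReader GetPage(int firstRow, int lastRow)
    {
        const string page = @"
            SELECT t.*
            FROM #PageKeys k
            JOIN dbo.BigTable t ON t.Id = k.Id
            WHERE k.RowNum BETWEEN @First AND @Last
            ORDER BY k.RowNum;";
        var cmd = new SqlCommand(page, _conn);
        cmd.Parameters.AddWithValue("@First", firstRow);
        cmd.Parameters.AddWithValue("@Last", lastRow);
        return cmd.ExecuteReader();
    }

    public void Dispose() => _conn.Dispose();   // closing the connection drops #PageKeys
}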
I have a US city/state list table in my SQL Server 2005 database that has a million records. My web application pages have a location textbox that uses an AJAX autocomplete feature. I need to show the complete city/state when the user types in 3 characters.
For example:
Input: bos
Output: Boston, MA
Currently, performance-wise, this functionality is pretty slow. How can I improve it?
Thanks for reading.
Have you checked the indexes on your database? If your query is formatted correctly and you have the proper indexes on your table, you can query a 5-million-row database and get your results in less than a second. I would suggest an index on City with State as an included column; that way, when you query by city, both the city and the state can be returned from the index alone.
If you run your query in SQL Server Management Studio and press Ctrl+M, you can see the execution plan for your query. If you see something like a table scan or an index scan, then you have the wrong index on your table. You want your plan to show an index seek, which means the query goes straight to the proper pages in the database to find your data.
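For example (the table, column, and index names are assumed), a covering index that lets the autocomplete query be answered with an index seek alone:

// The index covers the query: SQL Server can seek on CityName and return
// State from the index without touching the base table.
static class AutocompleteSql
{
    public const string CreateCoveringIndex = @"
        CREATE NONCLUSTERED INDEX IX_CityState_CityName
        ON dbo.CityState (CityName)
        INCLUDE (State);";

    // A prefix LIKE (no leading wildcard) can still use an index seek.
    public const string Lookup = @"
        SELECT TOP 10 CityName, State
        FROM dbo.CityState
        WHERE CityName LIKE @Prefix + '%'
        ORDER BY CityName;";
}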
Hope this helps.
My guess would be that the problem is not the database itself (although you should check it for index problems), but the amount of time it takes to retrieve the information from the database, put it into the appropriate objects, and send it to the browser. If this is the case, there aren't a lot of options without some real work.
You can cache frequently accessed information on the web server. If you know there are a lot of cities which are frequently accessed, you can store them up-front and then check the database only if what the user is looking for isn't in the cache. We use prefix trees to store information when a user is typing something and we need to find it in a list.
You can start to pull information from the database as soon as the user starts to type, and then pare the full result set down as you get more information from the user. This is a little trickier, as you'll have to store the information in memory between requests. If the user types "B", you start the retrieval and store the results in the session; when the user finishes typing "BOS", the result set from the initial query is already in memory, and you can loop through it and pull the subset that matches the final request.
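A minimal sketch of that idea (the storage and the query delegate are stand-ins): hit the database once per first letter, keep the result around, and answer longer prefixes from memory.

using System;
using System.Collections.Generic;
using System.Linq;

static class CityLookup
{
    // Stand-in for per-session storage (e.g. ASP.NET session state).
    static readonly Dictionary<char, List<string>> cache =
        new Dictionary<char, List<string>>();

    public static IEnumerable<string> Find(string prefix, Func<char, List<string>> queryDb)
    {
        char first = char.ToLowerInvariant(prefix[0]);
        if (!cache.TryGetValue(first, out var candidates))
            cache[first] = candidates = queryDb(first);   // single DB round-trip

        // Longer prefixes are answered by filtering the cached list in memory.
        return candidates.Where(c =>
            c.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}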
Use parent child dropdowns