T-SQL query timeout / performance issue - c#

I have a table having around 1 million records. Table structure is shown below. The UID column is a primary key and uniqueidentifier type.
Table_A (contains a million records)
UID Name
-----------------------------------------------------------
E8CDD244-B8E4-4807-B04D-FE6FDB71F995 DummyRecord
I also have a function called fn_Split('Guid_1,Guid_2,Guid_3,....,Guid_n') which accepts a list of comma
seperated guids and gives back a table variable containing the guids.
From my application code I am passing a sql query to get new guids [Keys that are with application code but not in the database table]
var sb = new StringBuilder();
sb
.Append(" SELECT NewKey ")
.AppendFormat(" FROM fn_Split ('{0}') ", keyList)
.Append(" EXCEPT ")
.Append("SELECT UID from Table_A");
The first time this command is executed it times out on quite a few occassions. I am trying to figure out what would be a better approach here to avoid such timeouts and/or improve performance of this.
Thanks.

Firstly add an index if there isn't one, on table_a.uid, but i assume there is.
Some alternate queries to try,
select newkey
from fn_split
left outer join table_a
on newkey = uid
where uid IS NULL
select newkey
from fn_split(blah)
where newkey not in (select uid
from table_a)
select newkey
from fn_split(blah) f
where not exists(select uid
from table_a a
where f.newkey = a.uid)

There is plenty of info around here as to why you should not use a Guid for your primary key, especially if it in unordered. That would be the first thing to fix. As far as your query goes you might try what Paul or Tim suggested, but as far as I know EXCEPT and NOT IN will use the same execution plan, though the OUTER JOIN may be more efficint in some cases.

If you're using MS SQL 2008 then you can/should use TableValue Parameters. Essentially you'd send in your guids in the form of a DataTable to your stored procedure.
Then inside your stored procedure you can use the parameters as a "table" and do a join or EXCEPT or what have you to get your results.
This method is faster than using a function to split because functions in MS SQL server are really slow.
But I guess is the time is being taken due to massive Disk I/O this query requires. Since you're searching on your UId column and since they are "random" no index is going to help here. The engine will have to resort to a table scan. Which means you'll need some serious Disk I/O performance to get the results in "good time".
Using the Uid data type as in index is not recommended. However, it may not make a difference in your case. But let me ask you this:
The guids that you send in from your app, are in just a random list of guids or is here some business relationship or entity relationship here? It's possible, that your data model is not correct for what you are trying to do. So how do you determine what guids you have to search on?
However, for argument sake, let's assume your guids are just a random selection then there is no index that is really being used since the database engine will have to do a table scan to pick out each of the required guids/records from the million records you have. In a situation like this the only way to speed things up is at the physical database level, that is how your data is physically stored on the hard drives etc.
For example:
Having faster drives will improve performance
If this kind of query is being fired over and over then more memory on the box will help because the engine can cache the data in memory and it won't need to do physical reads
If you partition your table then the engine can parallelize the the seek operation and get you results faster.
If your table contains a lot of other fields that you don't always need, then spliting the table in two tables where table1 contains the guid and the bare minimum set of fields and table2 contains the rest will speed up the query quite a bit due to the disk I/O demands being less
Lot's of other things to look at here
Also note that when you send in adhoc SQL statements that don't have parameters the engine has to create a plan each time you execute it. In this case it's not a big deal but keep in mind that each plan will be cached in memory thus pushing out any data that might have been cached.
Lastly you can always increase the commandTimeOut property in this case to get past the timeout issues.
How much time does it take now and what kind of improvement are you looking to get ot hoping to get?

If I understand your question correctly, in your client code you have a comma-delimited string of (string) GUIDs. These GUIDS are usable by the client only if they don't already exist in TableA. Could you invoke a SP which creates a temporary table on the server containing the potentially usable GUIDS, and then do this:
select guid from #myTempTable as temp
where not exists
(
select uid from TABLEA where uid = temp.guid
)
You could pass your string of GUIDS to the SP; it would populate the temp table using your function; and then return an ADO.NET DataTable to the client. This should be very easy to test before you even bother to write the SP.

I am questioning what you do with this information.
If you insert the keys into this table afterwards you could simply try to insert them on first hand - that's much faster and more solid in a multi-user environment then query first insert later:
create procedure TryToInsert #GUID uniqueidentifier, #Name varchar(n) as
begin try
insert into Table_A (UID,Name)
values (#GUID, #Name);
return 0;
end try
begin catch
return 1;
end;
In all cases you can split the KeyList at the client to get faster results - and you could query the keys that are not valid:
select UID
from Table_A
where UID in ('new guid','new guid',...);
If the GUID are random you should use newsequentialid() with you clustered primary key:
create table Table_A (
UID uniqueidentifier default newsequentialid() primary key,
Name varchar(n) not null
);
With this you can insert and query your newly inserted data in one step:
insert into Table_A (Name)
output inserted.*
values (#Name);
... just my two cents

In any case, are not GUIDs intrinsically engineered to be, for all intents and purposes, unique? (i.e. universally unique -- doesn't matter where generated). I wouldn't even bother to do the test beforehand; just insert your row with the GUID PK and if the insert should fail, discard the GUID. But it should not fail, unless these are not truly GUIDs.
http://en.wikipedia.org/wiki/GUID
http://msdn.microsoft.com/en-us/library/ms190215.aspx
It seems you are doing a lot of unnecessary work, but perhaps I don't grasp your application requirement.

Related

Calling SQL select statement from C# thousands of times and is very time consuming. Is there a better way?

I get a list of ID's and amounts from a excel file (thousands of id's and corresponding amounts). I then need to check the database to see if each ID exists and if it does check to make sure the amount in the DB is greater or equal to that of the amount from the excel file.
Problem is running this select statement upwards of 6000 times and return the values I need takes a long time. Even at a 1/2 of a second a piece it will take about an hour to do all the selects. (I normally dont get more than 5 results max back)
Is there a faster way to do this?
Is it possible to somehow pass all the ID's at once and just make 1 call and get the massive collection?
I have tried using SqlDataReaders and SqlDataAdapters but they seem to be about the same (too long either way)
General idea of how this works below
for (int i = 0; i < ID.Count; i++)
{
SqlCommand cmd = new SqlCommand("select Amount, Client, Pallet from table where ID = #ID and Amount > 0;", sqlCon);
cmd.Parameters.Add("#ID", SqlDbType.VarChar).Value = ID[i];
SqlDataAdapter da = new SqlDataAdapter(cmd);
da.Fill(dataTable);
da.Dispose();
}
Instead of a long in list (difficult to parameterise and has a number of other inefficiencies regarding execution plans: compilation time, plan reuse, and the plans themselves) you can pass all the values in at once via a table valued parameter.
See arrays and lists in SQL Server for more details.
Generally I make sure to give the table type a primary key and use option (recompile) to get the most appropriate execution plans.
Combine all the IDs together into a single large IN clause, so it reads like:
select Amount, Client, Pallet from table where ID in (1,3,5,7,9,11) and Amount > 0;
"I have tried using SqlDataReaders and SqlDataAdapters"
It sounds like you might be open to other APIs. Using Linq2SQL or Linq2Entities:
var someListIds = new List<int> { 1,5,6,7 }; //imagine you load this from where ever
db.MyTable.Where( mt => someListIds.Contains(mt.ID) );
This is safe in terms of avoiding potential SQL injection vulnerabilities and will generate a "in" clause. Note however the size of the someListIds can be so large that the SQL query generated exceeds limits of query length, but the same is true of any other technique involving the IN clause. You can easily workaround that by partitioning lists into large chunks, and still be tremendously better than a query per ID.
Use Table-Valued Parameters
With them you can pass a c# datatable with your values into a stored procedure as a resultset/table which you can join to and do a simple:
SELECT *
FROM YourTable
WHERE NOT EXISTS (SELECT * FORM InputResultSet WHERE YourConditions)
Use the in operator. Your problem is very common and it has a name: N+1 performance problem
Where are you getting the IDs from? If it is from another query, then consider grouping them into one.
Rather than performing a separate query for every single ID that you have, execute one query to get the amount of every single ID that you want to check (or if you have too many IDs to put in one query, then batch them into batches of a few thousand).
Import the data directly to SQL Server. Use stored procedure to output the data you need.
If you must consume it in the app tier... use xml datatype to pass into a stored procedure.
You can import the data from the excel file into SQL server as a table (using the import data wizard). Then you can perform a single query in SQL server where you join this table to your lookup table, joining on the ID field. There's a few more steps to this process, but it's a lot neater than trying to concatenate all the IDs into a much longer query.
I'm assuming a certain amount of access privileges to the server here, but this is what I'd do given the access I normally have. I'm also assuming this is a one off task. If not, the import of the data to SQL server can be done programmatically as well
IN clause has limits, so if you go with that approach, make sure a batch size is used to process X amount of Ids at a time, otherwise you will hit another issue.
A #Robertharvey has noted, if there are not a lot of IDs and there are no transactions occurring, then just pull all the Ids at once into memory into a dictionary like object and process them there. Six thousand values is not alot and a single select could return all those back within a few seconds.
Just remember that if another process is updating the data, your local cached version may be stale.
There is another way to handle this, Making XML of IDs and pass it to procedure. Here is code for procedure.
IF OBJECT_ID('GetDataFromDatabase') IS NOT NULL
BEGIN
DROP PROCEDURE GetDataFromDatabase
END
GO
--Definition
CREATE PROCEDURE GetDataFromDatabase
#xmlData XML
AS
BEGIN
DECLARE #DocHandle INT
DECLARE #idList Table (id INT)
EXEC SP_XML_PREPAREDOCUMENT #DocHandle OUTPUT, #xmlData;
INSERT INTO #idList (id) SELECT x.id FROM OPENXML(#DocHandle, '//data', 2) WITH ([id] INT) x
EXEC SP_XML_removeDOCUMENT #DocHandle ;
--SELECT * FROM #idList
SELECT t.Amount, t.Client, t.Pallet FROM yourTable t INNER JOIN #idList x ON t.id = x.id and t.Amount > 0;
END
GO
--Uses
EXEC GetDataFromDatabase #xmlData = '<root><data><id>1</id></data><data><id>2</id></data></root>'
You can put any logic in procedure. You can pass id, amount also via XML. You can pass huge list of ids via XML.
SqlDataAdapter objects too heavy for that.
Firstly, using stored procedures, it will be faster.
Secondly, use the group operation, for this pass as a parameter to a list of identifiers on the side of the database, run a query on these parameters, and return the processed result.
It will quickly and efficiently, as all data processing logic is on the side of the database server
You can select the whole resultset (or join multiple 'limited' result sets) and save it all to DataTable Then you can do selects and updates (if needed) directly on datatable. Then plug new data back... Not super efficient memory wise, but often is very good (and only) solution when working in bulk and need it to be very fast.
So if you have thousands of records, it might take couple of minutes to populate all records into the DataTable
then you can search your table like this:
string findMatch = "id = value";
DataRow[] rowsFound = dataTable.Select(findMatch);
Then just loop foreach (DataRow dr in rowsFound)

How to fetch lots of database table records by primary key?

Using the ADO.NET MySQL Connector, what is a good way to fetch lots of records (1000+) by primary key?
I have a table with just a few small columns, and a VARCHAR(128) primary key. Currently it has about 100k entries, but this will become more in the future.
In the beginning, I thought I would use the SQL IN statement:
SELECT * FROM `table` WHERE `id` IN ('key1', 'key2', [...], 'key1000')
But with this the query could be come very long, and also I would have to manually escape quote characters in the keys etc.
Now I use a MySQL MEMORY table (tempid INT, id VARCHAR(128)) to first upload all the keys with prepared INSERT statements. Then I make a join to select all the existing keys, after which I clean up the mess in the memory table.
Is there a better way to do this?
Note: Ok maybe its not the best idea to have a string as primary key, but the question would be the same if the VARCHAR column would be a normal index.
Temporary table: So far it seems the solution is to put the data into a temporary table, and then JOIN, which is basically what I currently do (see above).
I've dealt with a similar situation in a Payroll system where the user needed to generate reports based on a selection of employees (eg. employees X,Y,Z... or employees that work in certain offices). I've built a filter window with all the employees and all the attributes that could be considered as a filter criteria, and had that window save selected employee id's in a filter table from the database. I did this because:
Generating SELECT queries with dynamically generated IN filter is just ugly and highly unpractical.
I could join that table in all my queries that needed to use the filter window.
Might not be the best solution out there but served, and still serves me very well.
If your primary keys follow some pattern, you can select where key like 'abc%'.
If you want to get out 1000 at a time, in some kind of sequence, you may want to have another int column in your data table with a clustered index. This would do the same job as your current memory table - allow you to select by int range.
What is the nature of the primary key? It is anything meaningful?
If you're concerned about performance I definitely wouldn't recommend an 'IN' clause. It's much better try do an INNER JOIN if you can.
You can either first insert all the values into a temporary table and join to that or do a sub-select. Best is to actually profile the changes and figure out what works best for you.
Why can't you consider using a Table valued parameter to push the keys in the form of a DataTable and fetch the matching records back?
Or
Simply you write a private method that can concatenate all the key codes from a provided collection and return a single string and pass that string to the query.
I think it may solve your problem.

how to improve SQL query performance in my case

I have a table, schema is very simple, an ID column as unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, I need to calculate what ones are already contained in the table and what are not. Tht inputs are string and I have a C# function which converts string into uniqueidentifier (GUID). My logic is, if there is an existing ID, then I treat the string as already contained in the table.
My question is, if I need to find out what ones from the 5000 input strings are already contained in DB, and what are not, what is the most efficient way?
BTW: My current implementation is, convert string to GUID using C# code, then invoke/implement a store procedure which query whether an ID exists in database and returns back to C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable USING (thecol)
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable USING (thecol)
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- new rows does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
if you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see if the record does not exist just duplicates work that SQL is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDS to your local station and doing all the analysis localy. Reading 5000 GUIDS should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
Since you are using Sql server 2008, you could use Table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
Column NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
#Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
"spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
insertCommand.Parameters.AddWithValue(
"#Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that can take and XML in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000">
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001">
....
</ROOT>
Then in the procedure with the argument of type NCHAR(MAX) you convert it to XML, after what you use it as a table with single column (lets call it #FilterTable). The store procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(#FilterXML NVARCHAR(MAX))
AS BEGIN
SET NOCOUNT ON
DECLARE #x XML
SELECT #x = CONVERT(XML, #FilterXML)
-- temporary table (must have it, because cannot join on XML statement)
DECLARE #FilterTable TABLE (
"ID" UNIQUEIDENTIFIER
)
-- insert into temporary table
-- #important: XML iS CaSe-SenSiTiv
INSERT #FilterTable
SELECT x.value('#ID', 'UNIQUEIDENTIFIER')
FROM #x.nodes('/ROOT/MyObject') AS R(x)
SELECT o.ID,
SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
FROM #FilterTable o
LEFT JOIN dbo.MyTable t
ON o.ID = t.ID
GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0

What is the best way, algorithm, method to difference large lists of data?

I am receiving a large list of current account numbers daily, and storing them in a database. My task is to find added and released accounts from each file. Right now, I have 4 SQL tables, (AccountsCurrent, AccountsNew, AccountsAdded, AccountsRemoved). When I receive a file, I am adding it entirely to AccountsNew. Then running the below queries to find which we added and removed.
INSERT AccountsAdded(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew WHERE AccountNumber not in (SELECT AccountNum FROM AccountsCurrent)
INSERT AccountsRemoved(AccountNum, Name) SELECT AccountNum, Name FROM AccountsCurrent WHERE AccountNumber not in (SELECT AccountNum FROM AccountsNew)
TRUNCATE TABLE AccountsCurrent
INSERT AccountsCurrent(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew
TRUNCATE TABLE AccountsNew
Right now, I am differencing about 250,000 accounts, but this is going to keep growing. Is this the best method, do you have any other ideas?
EDIT:
This is an MSSQL 2000 database. I'm using c# to process the file.
The only data I am focused on is the accounts that were added and removed between the last and current files. The AccountsCurrent, is only used to determine what accounts were added or removed.
To be honest, I think that I'd follow something like your approach. One thing is that you could remove the truncate, do a rename of the "new" to "current" and re-create "new".
Sounds like a history/audit process that might be better done using triggers. Have a separate history table that captures changes (e.g., timestamp, operation, who performed the change, etc.)
New and deleted accounts are easy to understand. "Current" accounts implies that there's an intermediate state between being new and deleted. I don't see any difference between "new" and "added".
I wouldn't have four tables. I'd have a STATUS table that would have the different possible states, and ACCOUNTS or the HISTORY table would have a foreign key to it.
Using IN clauses on long lists can be slow.
If the tables are indexed, using a LEFT JOIN can prove to be faster...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
LEFT JOIN
[table2]
ON [join condition]
WHERE
[table2].[id] IS NULL
This assumes 1:1 relationships and not 1:many. If you have 1:many you can do any of...
1. SELECT DISTINCT
2. Use a GROUP BY clause
3. Use a different query, see below...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
WHERE
EXISTS (SELECT * FROM [table2] WHERE [condition to match tables 1 and 2])
-- # This is quick provided that all fields to match the two tables are
-- # indexed in both tables. Should then be much faster than the IN clause.
You could also subtract the intersection to get the differences in one table.
If the initial file is ordered in a sensible and consistent way (big IF!), it would run considerably faster as a C# program which logically compared the files.

Can I get the rowcount before executing a stored procedure?

I have some complex stored procedures that may return many thousands of rows, and take a long time to complete.
Is there any way to find out how many rows are going to be returned before the query executes and fetches the data?
This is with Visual Studio 2005, a Winforms application and SQL Server 2005.
You mentioned your stored procedures take a long time to complete. Is the majority of the time taken up during the process of selecting the rows from the database or returning the rows to the caller?
If it is the latter, maybe you can create a mirror version of your SP that just gets the count instead of the actual rows. If it is the former, well, there isn't really that much you can do since it is the act of finding the eligible rows which is slow.
A solution to your problem might be to re-write the stored procedure so that it limits the result set to some number, like:
SELECT TOP 1000 * FROM tblWHATEVER
in SQL Server, or
SELECT * FROM tblWHATEVER WHERE ROWNUM <= 1000
in Oracle. Or implement a paging solution so that the result set of each call is acceptably small.
make a stored proc to count the rows first.
SELECT COUNT(*) FROM table
Unless there's some aspect of the business logic of you app that allows calculating this, no. The database it going to have to do all the where & join logic to figure out how line rows, and that's the vast majority of the time spend in the SP.
You can't get the rowcount of a procedure without executing the procedure.
You could make a different procedure that accepts the same parameters, the purpose of which is to tell you how many rows the other procedure should return. However, the steps required by this procedure would normally be so similar to those of the main procedure that it should take just about as long as just executing the main procedure.
You would have to write a different version of the stored procedure to get a row count. This one would probably be much faster because you could eliminate joining tables which you aren't filtered against, remove ordering, etc. For example if your stored proc executed the sql such as:
select firstname, lastname, email, orderdate from
customer inner join productorder on customer.customerid=productorder.productorderid
where orderdate>#orderdate order by lastname, firstname;
your counting version would be something like:
select count(*) from productorder where orderdate>#orderdate;
Not in general.
Through knowledge about the operation of the stored procedure, you may be able to get either an estimate or an accurate count (for instance, if the "core" or "base" table of the query is able to be quickly calculated, but it is complex joins and/or summaries which drive the time upwards).
But you would have to call the counting SP first and then the data SP or you could look at using a multiple result set SP.
It could take as long to get a row count as to get the actual data, so I wouldn't advodate performing a count in most cases.
Some possibilities:
1) Does SQL Server expose its query optimiser findings in some way? i.e. can you parse the query and then obtain an estimate of the rowcount? (I don't know SQL Server).
2) Perhaps based on the criteria the user gives you can perform some estimations of your own. For example, if the user enters 'S%' in the customer surname field to query orders you could determine that that matches 7% (say) of the customer records, and extrapolate that the query may return about 7% of the order records.
Going on what Tony Andrews said in his answer, you can get an estimated query plan of the call to your query with:
SET showplan_text OFF
GO
SET showplan_all on
GO
--Replace with call you your stored procedure
select * from MyTable
GO
SET showplan_all ofF
GO
This should return a table, or many tables which will let you get the estimated row count of your query.
You need to analyze the returned data set, to determine what is a logical, (meaningful) primary key for the result set that is being returned. In general this WILL be much faster than the complete procedure, because the server is not constructing a result set from data in all the columns of each row of each table, it is simply counting the rows... In general, it may not even need to read the actual table rows off disk to do this, it may simply need to count index nodes...
Then write another SQL statement that only includes the tables necessary to generate those key columns (Hopefully this is a subset of the tables in the main sql query), and the same where clause with the same filtering predicate values...
Then add another Optional parameter to the Stored Proc called, say, #CountsOnly, with a default of false (0) as so...
Alter Procedure <storedProcName>
#param1 Type,
-- Other current params
#CountsOnly TinyInt = 0
As
Set NoCount On
If #CountsOnly = 1
Select Count(*)
From TableA A
Join TableB B On etc. etc...
Where < here put all Filtering predicates >
Else
<Here put old SQL That returns complete resultset with all data>
Return 0
You can then just call the same stored proc with #CountsOnly set equal to 1 to just get the count of records. Old code that calls the proc would still function as it used to, since the parameter value is set to default to false (0), if it is not included
It's at least technically possible to run a procedure that puts the result set in a temporary table. Then you can find the number of rows before you move the data from server to application and would save having to create the result set twice.
But I doubt it's worth the trouble unless creating the result set takes a very long time, and in that case it may be big enough that the temp table would be a problem. Almost certainly the time to move the big table over the network will be many times what is needed to create it.

Categories

Resources