I have a database with a table that has around 2,500,000 records. I am fetching around 150,000 records, using two different queries to test. The first one returns results in between 30 seconds and 1 minute, but the second one takes 3 to 4 minutes, which is very strange. The only difference is that the first one doesn't use a parameter while the second one does. I am running both from C#. For security reasons I want to use the parameterized one, but I can't understand why it takes so much time. Any help will be appreciated.
First query:
DECLARE @page INT = 3
DECLARE @pagesize INT = 300

string sql = "SELECT Col1,Col2,Col3 FROM " +
             "(SELECT ROW_NUMBER() OVER(ORDER BY Col1) AS rownumber,Col1,Col2,Col3";
sql += " FROM my_table WHERE Col1 LIKE '" + letter + "%') as somex " +
       "WHERE rownumber >= (@page-1)*(@pagesize)";
sql += " AND rownumber <= (@page)*@pagesize; " +
       "SELECT COUNT(*) FROM my_table WHERE col1 LIKE '" + letter + "%'";
Second query:
DECLARE @page INT = 3
DECLARE @pagesize INT = 300
DECLARE @starting VARCHAR(10) = 'be'

string sql = "SELECT Col1,Col2,Col3 FROM " +
             "(SELECT ROW_NUMBER() OVER(ORDER BY Col1) AS rownumber,Col1,Col2,Col3";
sql += " FROM my_table WHERE Col1 LIKE @letter+'%') as somex " +
       "WHERE rownumber >= (@page-1)*(@pagesize)";
sql += " AND rownumber <= (@page)*@pagesize; SELECT COUNT(*) " +
       "FROM my_table WHERE col1 LIKE @letter+'%'";
My server has 16 GB of RAM, 4 physical and 4 virtual CPU cores, and SATA disks.
Edit: Col1 has both a clustered and a non-clustered index.
Progress: it turns out that these queries work well on another server, but that confuses me even more. Could it be some setting on the SQL Server?
As I said in a comment, it sounds like parameter sniffing, but in the interest of being helpful I thought I'd expand on that. There are a number of articles on the web that go into a lot more detail than I will, but the long and the short of parameter sniffing is that SQL Server has cached an execution plan based on a parameter value that does not yield the best execution plan for the current value.

Suppose Col1 has a nonclustered index on it that does not include Col2 or Col3 as non-key columns. SQL Server then has two options: it can do a clustered index scan on my_table to get all the rows where Col1 LIKE @letter+'%', or it can seek the index on Col1 and then do a bookmark lookup on the clustered index to get the values for each row returned by the index seek. I can't remember off the top of my head at what estimated row count SQL Server switches between the two, but it is quite a low percentage, so I am fairly sure that if you are returning 150,000 records out of 2,500,000 the optimiser will go for a clustered index scan. However, if you were only returning a few hundred rows, a bookmark lookup would be preferable.

When you don't use parameters, SQL Server creates a new execution plan each time the query is executed and produces the best plan for that particular value (assuming your statistics are up to date). When you do use a parameter, the first time the query is run SQL Server creates a plan based on that particular parameter value and stores the plan for later use. Each subsequent time the query is run, SQL Server recognises that the query is the same, so it doesn't recompile it. This means, though, that if the first run was for a parameter value that returned a low number of rows, the bookmark-lookup plan is stored; if the query is then run for a value that returns a high number of rows, where the optimal plan is a clustered index scan, it is still executed using the suboptimal bookmark lookup, resulting in a longer execution time. The reverse can of course also be true.

There are a number of ways to get around parameter sniffing, but since your query is not very complex the compile time will not be significant, especially in comparison to the 30 seconds you say this query takes to run even at its best, so I would use the OPTION (RECOMPILE) query hint:
SELECT Col1, Col2, Col3
FROM ( SELECT ROW_NUMBER() OVER(ORDER BY Col1) AS rownumber, Col1, Col2, Col3
       FROM my_table
       WHERE Col1 LIKE @letter+'%'
     ) as somex
WHERE rownumber >= (@page-1)*(@pagesize)
AND rownumber <= (@page) * @pagesize
OPTION (RECOMPILE);

SELECT COUNT(*)
FROM my_table
WHERE Col1 LIKE @letter+'%'
OPTION (RECOMPILE);
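Another of the ways around parameter sniffing, noted here only as an aside (it is not the approach recommended above), is the OPTIMIZE FOR UNKNOWN hint, available from SQL Server 2008 onwards; it makes the optimiser build the plan for an "average" parameter value rather than the sniffed one, so nothing is recompiled on each execution. A sketch on the same paging query:

SELECT Col1, Col2, Col3
FROM ( SELECT ROW_NUMBER() OVER(ORDER BY Col1) AS rownumber, Col1, Col2, Col3
       FROM my_table
       WHERE Col1 LIKE @letter+'%'
     ) as somex
WHERE rownumber >= (@page-1)*(@pagesize)
AND rownumber <= (@page) * @pagesize
OPTION (OPTIMIZE FOR UNKNOWN);

The trade-off is a one-size-fits-all plan rather than the best plan for each value, so for this particular query OPTION (RECOMPILE) remains the simpler choice.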
The reason this executed fine when you tried it on a new server is that the first time it was run on the new server the parameterised query had to be compiled, and the plan generated was suited to the value of the parameter provided.
One final point: if you are using SQL Server 2012, you could use OFFSET/FETCH to do your paging:
SELECT Col1, Col2, Col3
FROM My_table
WHERE Col1 LIKE @letter+'%'
ORDER BY Col1 OFFSET (@page-1) * (@pagesize) ROWS FETCH NEXT @pagesize ROWS ONLY;
I want to use the LAG function to compute a running total in another column, but it does not work in SQL Server 2012.
To answer the statement "I want to Use Ruuning_Total With Lag Function": you can't; the error is telling you exactly that. I assume you want something like this:
CREATE TABLE SomeTable (ID int IDENTITY(1,1),
SomeNumber int);
INSERT INTO SomeTable
VALUES (1),(17),(37),(24),(67),(265);
SELECT ID,
SomeNumber,
SUM(SomeNumber) OVER (ORDER BY ID) AS RunningTotal,
SUM(SomeNumber) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS RunningTotalMinus1
FROM SomeTable;
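-- As an aside (a sketch on the same sample data, placed before the DROP below):
-- LAG on its own only returns the previous row's value, not an accumulating
-- total, which is why the windowed SUM above is the right tool for a running total.
SELECT ID,
       SomeNumber,
       LAG(SomeNumber, 1, 0) OVER (ORDER BY ID) AS PreviousNumber
FROM SomeTable;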
DROP TABLE SomeTable;
Is there a way to parse a given SQL SELECT query and wrap each column with a function call e.g. dbo.Foo(column_name) prior to running the SQL query?
We have looked into using a regular expression type 'replace' on the column names, however, we cannot seem to account for all the ways in which a SQL query can be written.
An example of the SQL query would be:
SELECT
[ColumnA]
, [ColumnB]
, [ColumnC] AS [Column C]
, CAST([ColumnD] AS VARCHAR(11)) AS [Bar]
, DATEPART([yyyy], GETDATE()) - DATEPART([yyyy], [ColumnD]) AS [Diff]
, [ColumnE]
FROM [MyTable]
WHERE LEN([ColumnE]) > 0
ORDER BY
[ColumnA]
, DATEPART([yyyy], [ColumnD]) - DATEPART([yyyy], GETDATE());
The result we require would be:
SELECT
[dbo].[Foo]([ColumnA])
, [dbo].[Foo]([ColumnB])
, [dbo].[Foo]([ColumnC]) AS [Column C]
, CAST([dbo].[Foo]([ColumnD]) AS VARCHAR(11)) AS [Bar]
, DATEPART([yyyy], GETDATE()) - DATEPART([yyyy], [dbo].[Foo]([ColumnD])) AS [Diff]
, [dbo].[Foo]([ColumnE])
FROM [MyTable]
WHERE LEN([dbo].[Foo]([ColumnE])) > 0
ORDER BY
[dbo].[Foo]([ColumnA])
, DATEPART([yyyy], [dbo].[Foo]([ColumnD])) - DATEPART([yyyy], GETDATE());
Any or all of the above columns might need the function called on them (including columns used in the WHERE and ORDER BY), which is why we require a query-wide solution.
We have many pre-written queries like the above that need to be updated, which is why updating them manually would be difficult.
The above example shows that some result columns might be calculated and some have simply been renamed. Most also involve joins, and some contain CASE statements, which I have left out for the purpose of this example.
Another scenario which would need to be accounted for is table name aliasing e.g. SELECT t1.ColumnA, t2.ColumnF etc.
Either a SQL or C# solution for solving this problem would be ideal.
Instead of replacing each occurrence of every column, you can replace the statement...
FROM MyTable
...with a subselect that includes all existing columns with the function call:
FROM (
SELECT dbo.Foo(ColumnA) AS ColumnA, dbo.Foo(ColumnB) AS ColumnB,
dbo.Foo(ColumnC) AS ColumnC --etc.
FROM MyTable
) AS MyTable
The rest of the query can remain unchanged. In the case of table aliasing, you simply close the subselect with AS t1 instead of AS MyTable.
Another option you should consider is creating views in your database that are essentially this subselect. Combined with a naming convention, you can easily replace the occurrences in your FROM (and JOIN) clauses with the view name:
FROM MyTable_Foo AS t1
If you want to replace all queries that you'll ever use, consider renaming the tables and creating views that are named like the old tables.
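A minimal sketch of such a view, using the hypothetical MyTable_Foo name from above and the column list from the example query (adjust it to the columns each table actually has):

CREATE VIEW [dbo].[MyTable_Foo]
AS
SELECT [dbo].[Foo]([ColumnA]) AS [ColumnA],
       [dbo].[Foo]([ColumnB]) AS [ColumnB],
       [dbo].[Foo]([ColumnC]) AS [ColumnC],
       [dbo].[Foo]([ColumnD]) AS [ColumnD],
       [dbo].[Foo]([ColumnE]) AS [ColumnE]
FROM [MyTable];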
On a more general note: you should reconsider your approach to the underlying problem, since what you are doing here takes away much of the power of SQL. The worst part is that once you call the function on every column, you will no longer be able to use the indexes on those columns, which could mean a serious hit on DB performance.
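To illustrate that last point with a hypothetical predicate: the first query below can seek an index on ColumnE, while the second has to evaluate the function against every row, so the index cannot be used for a seek.

-- Sargable: an index on ColumnE can be used for a seek
SELECT [ColumnA] FROM [MyTable] WHERE [ColumnE] = 'some value';

-- Not sargable: dbo.Foo has to run against every row first
SELECT [ColumnA] FROM [MyTable] WHERE [dbo].[Foo]([ColumnE]) = 'some value';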
I have a large inventory system, and I'm having to re-write part of the I/O portion of it. At its heart, there's a product table and a set of related tables. I need to be able to read pieces of it as efficiently as possible. From C# I construct this query:
select * -- includes productid
into #tt
from products where productClass = 547 -- possibly more conditions
select * from #tt;
select * from productHistory where productid in (select productid from #tt);
select * from productSuppliers where productid in (select productid from #tt);
select * from productSafetyInfo where productid in (select productid from #tt);
select * from productMiscInfo where productid in (select productid from #tt);
drop table #tt;
This query gives me exactly the results I need: 5 result sets each having zero, one or more records (if the first returns zero rows, the others do as well, of course). The program then takes those result sets and crams them into an appropriate DataSet. (Which then gets handed off into a constructor expecting just these records.) This query (with differing conditions) gets run a lot.
My question is, is there a more efficient way to retrieve this data?
Re-working this as a single join won't work because each child might return a variable number of rows.
If you have an index on products.productClass this might yield better performance.
select * from products where productClass = 547 -- includes productid
select productHistory.*
from productHistory
join products
on products.productid = productHistory.productid
and products.productClass = 547;
...
If productid is a clustered index then you will probably get better performance with:
CREATE TABLE #Temp (productid INT PRIMARY KEY CLUSTERED);
insert into #temp
select productid from products where productClass = 547
order by productid;
go
select productHistory.*
from productHistory
join #Temp
on #Temp.productid = productHistory.productid;
A join on a clustered index seems to give the best performance.
Think about it: SQL can match the first row, know it can forget about the rest, and then move on to the second, knowing it can keep moving forward (it never has to go back to the top).
With a WHERE IN (SELECT ...) SQL cannot take advantage of that order.
The more tables you need to join, the more reason to use a #temp table, since you only take about a half-second hit creating and populating it.
If you are going to use a #temp table, you might as well make it a structured temp table with a clustered primary key, as above.
Make sure that when you JOIN tables you are joining on indexed columns. Otherwise you will end up with table scans instead of index scans, and your code will be very slow, especially when joining large tables.
Best practice is to optimize your SQL queries to avoid table scans.
If you don't have it already, I would strongly suggest making this a stored procedure.
Also, I suspect, but can't prove without testing it, that you will get better performance if you perform joins on the products table for each of your subtables rather than copying into a local table.
Finally, unless you can combine the data, I don't think there is a more efficient way to do this.
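As a rough sketch of the stored-procedure suggestion (the procedure name is made up; the tables come from the question, with the product-class value turned into a parameter):

CREATE PROCEDURE dbo.GetProductDetails   -- hypothetical name
    @productClass int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *                              -- includes productid
    INTO #tt
    FROM products
    WHERE productClass = @productClass;   -- add further conditions as needed

    SELECT * FROM #tt;
    SELECT * FROM productHistory    WHERE productid IN (SELECT productid FROM #tt);
    SELECT * FROM productSuppliers  WHERE productid IN (SELECT productid FROM #tt);
    SELECT * FROM productSafetyInfo WHERE productid IN (SELECT productid FROM #tt);
    SELECT * FROM productMiscInfo   WHERE productid IN (SELECT productid FROM #tt);

    DROP TABLE #tt;
END;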
Without seeing your schema and knowing a little more about your data and table sizes, it's hard to suggest definitive improvements on the query side.
However, instead of "cramming the results into an appropriate DataSet," since you are using a batched command to return multiple result sets, you could use SqlDataAdapter to do that part for you:
// cmd is the SqlCommand containing the batched SELECT statements described above
SqlDataAdapter adapter = new SqlDataAdapter(cmd);
DataSet results = new DataSet();
adapter.Fill(results);
After that, the first result set will be in results.Tables[0], the second in results.Tables[1], etc.
While using a SqlDataReader, it's necessary to know the types of the fields returned in order to call the appropriate GetXXX method. So is it possible to output this info in SQL Server Management Studio?
Use SELECT ... INTO ... and examine the definition of the new table.
The WHERE 1 = 0 bit will be short-circuited here, so it should be quick. Of course, you'll need to add your own conditions.
SELECT
...
INTO dbo.TempTable
FROM ...
WHERE 1 = 0
GO
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempTable'
GO
DROP TABLE dbo.TempTable
If you have a single table in the FROM:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'SourceTable'
Which method you use depends on the complexity of the query. For example, a calculation on a decimal column changes the precision and scale, and varchar processing can change the length or turn a char into a varchar.
You'd be running the SQL anyway to make sure it's OK before calling it from the client code...
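As a quick illustration of the precision/scale point (using a throwaway table name): multiplying two decimal(5,2) values produces a decimal(11,4) column in the new table, which INFORMATION_SCHEMA then reports.

SELECT CAST(1.50 AS decimal(5,2)) * CAST(2.00 AS decimal(5,2)) AS CalcColumn
INTO dbo.TypeCheck;

SELECT COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'TypeCheck';

DROP TABLE dbo.TypeCheck;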
I have 200,000 records in a database table with the PK as a varchar(50).
Every 5 minutes I do a SELECT COUNT(*) FROM TABLE.
If that result is greater than my list's Count, I then execute:
"SELECT * FROM TABLE WHERE PRIMARYKEY NOT IN ( " + myList.ToCSVString() + ")"
The reason I do this is because records are being added to the table via another process.
This query takes a long time to run, and I also believe it's throwing an OutOfMemoryException.
Is there a better way to implement this?
Thanks
SQL Server has a solution for this: add a timestamp (rowversion) column; every time you touch any row in the table, its timestamp value increases.
Add an index on the timestamp column.
Instead of just storing the ids in memory, store the ids and the last timestamp.
To update:
select the max timestamp
select all the rows between the old max timestamp and the current max timestamp
merge those rows into the list (a sketch of this follows below)
Handling deletions is a bit trickier, but it can be achieved if you tombstone rows instead of deleting them.
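A sketch of that approach, assuming SQL Server and hypothetical names (MyTable, a new RowVer column of type rowversion, which is the current name for timestamp, and @lastMaxRowVer as the value the application remembered from the previous refresh):

-- One-time setup: a rowversion column plus an index on it
ALTER TABLE MyTable ADD RowVer rowversion;
CREATE INDEX IX_MyTable_RowVer ON MyTable (RowVer);

-- Each refresh: fetch only the rows touched since the last value we saw
SELECT PrimaryKey, RowVer
FROM MyTable
WHERE RowVer > @lastMaxRowVer
ORDER BY RowVer;

The application then merges those rows into its in-memory list and remembers the highest RowVer it has seen.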
Can you change the table?
If so, you might want to add a new auto-incremented column, TableId, that will serve as the PK.
On each SELECT save the max id, and on the next SELECT add WHERE TableId > maxId.
Create an INT PK, and use something like this:
"SELECT * FROM TABLE WHERE MY_ID > " + myList.Last().Id;
If you can't change your PK, create another column of date type with NOW() as the default value, and use it to query for new items.
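A minimal sketch of that, assuming SQL Server (where GETDATE() plays the role of NOW()) and a hypothetical CreatedAt column:

ALTER TABLE MyTable ADD CreatedAt datetime NOT NULL DEFAULT GETDATE();

-- Later, fetch only the rows added since the last check
SELECT *
FROM MyTable
WHERE CreatedAt > @lastCheckedAt;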
Create another table in the database with a single column for the primary key. When your application starts, insert the PKs into this table. Then you can detect added keys directly with a select rather than checking the count:
select PrimaryKey from Table where PrimaryKey not in (select PrimaryKey from OtherTable)
If this CSV list is large, I would recommend loading your list into a temp table, putting an index on it, and doing a left join where null:
select tbl.*
from table tbl
left join #tmpTable tmp on tbl.primarykey = tmp.primarykey
where tmp.primarykey is null
Edit: a primary key should not be a varchar. It should almost always be an auto-incrementing int/bigint; that would have made this a lot easier: select * from table where primarykey > @lastknownkey
Smack the DB programmer who designed this.. :p
This design would also cause index fragmentation because rows won't be inserted in a linear fashion.