I have a query of the form
SELECT DISTINCT Str,Score
FROM Tab
WHERE Str in ('Str1', 'Str2', 'Str3') AND Type = 0
The table schema is:
Str - varchar(8000)
Score - int
Type - bit
I also have an index on Str which includes Type and Score.
The number of strings in the IN clause varies.
When I construct the query directly from C#, it's virtually instantaneous.
When I use a parametrized query (using the method here: https://stackoverflow.com/a/337792/508593), it becomes extremely slow: the direct query takes less than a second, while this one times out.
Looking into SQL Profiler and SSMS, the slowness seems to be due to the statement being wrapped in exec sp_executesql, which causes an index scan instead of a seek. The direct query uses the index mentioned above; with sp_executesql, it does not.
Is my suspicion correct and is there a way to resolve this?
In addition to the root cause specified by Martin, the solution was to explicitly set the parameter type using
command.Parameters[i].DbType = DbType.AnsiString;
which forces varchar instead of nvarchar.
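For completeness, a minimal sketch of how the whole call might look (the connection and the strings array are placeholders; SqlDbType.VarChar has the same effect as DbType.AnsiString here):
using System.Data;
using System.Data.SqlClient;
using System.Linq;

static void QueryScores(SqlConnection connection, string[] strings)
{
    string[] names = strings.Select((s, i) => "@p" + i).ToArray();
    string sql = "SELECT DISTINCT Str, Score FROM Tab WHERE Str IN ("
               + string.Join(",", names) + ") AND Type = 0";
    using (var command = new SqlCommand(sql, connection))
    {
        for (int i = 0; i < strings.Length; i++)
        {
            // VarChar/AnsiString keeps the parameter varchar; the default
            // (nvarchar) triggers the implicit cast described below.
            command.Parameters.Add(names[i], SqlDbType.VarChar, 8000).Value = strings[i];
        }
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { /* consume Str, Score */ }
        }
    }
}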
The parameters need to be varchar not nvarchar.
Otherwise the query will effectively be
WHERE IMPLICIT_CAST(Str AS NVARCHAR(4000)) IN (@P1, @P2, @P3) AND Type = 0
which hurts index usage.
It's unclear from your question what approach to parametrization you took; the question you're referring to shows different methods.
If you went for the Table-Valued Parameter solution, you may be suffering from a cached query plan that SQL Server created without knowing the number of items in the TVP parameter. By default, IIRC, it assumes 10,000 items, which would explain the index scan instead of a seek.
That being said, try adding an OPTION (RECOMPILE) hint at the end of the parametrized query, which will enable SQL Server to recompile the query with the (then known) item counts.
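In C#, that could look something like this (a sketch only; the parameter names are illustrative):
string sql =
    "SELECT DISTINCT Str, Score FROM Tab " +
    "WHERE Str IN (@p0, @p1, @p2) AND Type = 0 " +
    "OPTION (RECOMPILE)";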
The issue isn't with the parametrized query.
When specifying the values hardcoded in your IN clause, according to MSDN, you had better have a good estimate of the number of values:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table
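A hedged sketch of that workaround from C#, using a table-valued parameter; it assumes a user-defined table type (the name dbo.StringList is hypothetical) has been created on the server:
using System.Data;
using System.Data.SqlClient;

// Assumes this type exists on the server (hypothetical name):
//   CREATE TYPE dbo.StringList AS TABLE (Value varchar(8000));
var table = new DataTable();
table.Columns.Add("Value", typeof(string));
foreach (string s in strings)   // "strings" is a placeholder array
    table.Rows.Add(s);

using (var command = new SqlCommand(
    "SELECT DISTINCT t.Str, t.Score FROM Tab t " +
    "JOIN @Items i ON i.Value = t.Str WHERE t.Type = 0", connection))
{
    SqlParameter p = command.Parameters.AddWithValue("@Items", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.StringList";
    // execute the command as usual
}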
Related
I'm following up on my question from yesterday, Entity Framework 6 get complex return value from a stored procedure. My stored procedure now runs under Entity Framework. However, it times out after 3 minutes, the connection timeout.
I run the stored procedure in my SQL Server Management Studio with the line (customer information omitted):
EXEC spGetDupWOs @ProjectName=N'...', @City=N'...', @State=N'LA', @ProposalNum=N'201703080740-001', @County=N'...', @Owner=N'...', @QuoteRecipients=N'...', @ProjectID=-1
It executes in less than a second. When Entity Framework executes it, it takes forever.
Using the SQL Server Profiler, I determined that Entity Framework is sending this line to the SQL server:
exec sp_executesql N'EXEC spGetDupWOs',N'@ProjectName nvarchar(19),@City nvarchar(6),@State nvarchar(2),@ProjectNum nvarchar(12),@County nvarchar(10),@Owner nvarchar(23),@QuoteRecipients nvarchar(23),@ProjectID bigint',@ProjectName=N'...',@City=N'Holden',@State=N'LA',@ProposalNum=N'201703080740-001',@County=N'Livingston',@Owner=N'...',@BID_RECIP=N'...',@ProjectID=-1
When I run this in SSMS, it takes forever.
Reading similar questions, it looks like the issue is parameter sniffing and a change in execution plan.
Here is my call to execute the stored procedure in my application:
List<DuplicateProposals> duplicateCheckResults =
    db.Database.SqlQuery<DuplicateProposals>("spGetDupWOs", spl.ToArray())
      .ToList();
After reading a bunch of articles online, I'm even more confused. How can I change my call to resolve this?
The issue identified is parameter sniffing in SQL Server. There are multiple approaches to handle it, but the best one for your scenario depends on your real use case, utilization, etc.
Here are some options:
1. Recompile the stored procedure on every execution (see the sketch after the example below). This can cause very heavy CPU utilization and is typically overkill; I would NOT recommend this option unless you have a very good reason. To implement: use WITH RECOMPILE on the procedure, or the OPTION (RECOMPILE) hint at the end of your query.
2. Optimize-for hint. This can work around the parameter sniffing, but may result in a subpar execution plan for all of your queries. Typically not an optimal approach. To implement: use OPTION (OPTIMIZE FOR UNKNOWN).
3. Copy the parameter to a local variable. This was more common in older versions of SQL Server. To implement: declare a local variable, then copy the value from your input parameter into it: DECLARE @local_var1 char(1) = @InputParam1;
4. Turn off parameter sniffing at the query level. This approach uses the QUERYTRACEON hint and may be the best fit for this scenario; I would recommend exploring it as a primary strategy. To implement: add OPTION (QUERYTRACEON 4136) to the end of your query.
Example:
SELECT * FROM dbo.MyTable T
WHERE T.Col1 = @Param1 AND T.Col2 = @Param2
OPTION(QUERYTRACEON 4136)
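Applied to the Entity Framework call from your question, a sketch of option 1 might look like this (the local variables are placeholders; EXEC ... WITH RECOMPILE requests a fresh plan for that execution):
using System.Data.SqlClient;
using System.Linq;

var results = db.Database.SqlQuery<DuplicateProposals>(
    "EXEC spGetDupWOs @ProjectName, @City, @State, @ProposalNum, " +
    "@County, @Owner, @QuoteRecipients, @ProjectID WITH RECOMPILE",
    new SqlParameter("@ProjectName", projectName),
    new SqlParameter("@City", city),
    new SqlParameter("@State", state),
    new SqlParameter("@ProposalNum", proposalNum),
    new SqlParameter("@County", county),
    new SqlParameter("@Owner", owner),
    new SqlParameter("@QuoteRecipients", quoteRecipients),
    new SqlParameter("@ProjectID", projectId)).ToList();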
I ended up having to convert the entire call into a single string that I passed to the SqlQuery function.
string sql = string.Format("exec spGetDupWOs @ProjectName=N'{0}',@City=N'{1}',@State=N'{2}',@ProjectNumber=N'{3}',@County=N'{4}',@Owner=N'{5}',@QuoteRecipients=N'{6}',@ProjectID={7}",
project.ProjectName,
project.City,
project.State,
project.ProjectNumber,
project.County,
project.Owner,
quoteRecipientsList,
"null");
Yes, I had to include the N prefix on the strings to make it work; I'm not sure why, but it worked.
Thanks for all of the help everyone. I could not have solved this without your help.
I am working with a .NET webtool, and our clients require some extremely large queries that are just plain not viable for our web servers to run. Our solution was to kick the query back to our automation system and feed the data back to the user when it's finished. The problem is, passing the query from the webtool to automation requires that we store the query as a string for later use. This does not allow us to use parameterized inputs.
What is best practice here? How can we scrub this data before running it? Obviously, the inputs should be validated initially, but I am looking for a more catch-all solution.
There are some assumptions made in the question that need to be validated, such as the statement "we store the query as a string for later use. This does not allow us to use parameterized inputs." It is possible to use parameterized queries while storing the complete SQL content as a string. Below is an example of a parameterized query:
declare @sql nvarchar(max)
declare @monthNo int = 2
declare @minAmount decimal = 100
set @sql = N'select column1, column2 from dbo.MyTable where mon = @monthNo and amount > @minAmount'
exec sp_executesql @sql, N'@monthNo int, @minAmount decimal', @monthNo, @minAmount
If the above example is not a viable option, here is some syntax to catch injection attempts within a string:
-- check for sql injection
declare @sql nvarchar(max)
set @sql = N'My fancy sql query; truncate table dbo.VeryImportantTable'
IF CHARINDEX(';', ISNULL(@sql, '')) != 0 OR CHARINDEX('--', ISNULL(@sql, '')) != 0
BEGIN
    RAISERROR('Invalid input parameter', 16, 1)
    RETURN -1
END
Since there are no examples in the question of code or its implementation, the answer contains some degree of speculation.
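To make the first point concrete, one hedged way to keep "store the query for later" while still parameterizing is to persist the SQL text and the parameter values separately, then rebuild a parameterized command when the automation system picks the job up. The QueuedQuery shape below is hypothetical:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class QueuedQuery
{
    public string Sql { get; set; }                      // text with @placeholders
    public Dictionary<string, object> Parameters { get; set; }
}

static SqlCommand Rebuild(QueuedQuery q, SqlConnection cn)
{
    var cmd = new SqlCommand(q.Sql, cn);
    foreach (var kv in q.Parameters)
        cmd.Parameters.AddWithValue(kv.Key, kv.Value ?? DBNull.Value);
    return cmd;                                          // values are never concatenated in
}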
Summary:
Just to make it clearer, here is my understanding:
1) You've got users that provide large queries via a webtool.
2) Your webtool hands them to an automation system via SQL Server.
3) Your automation system uses dynamic SQL to run the query and feed the results back to your users.
I assume that at step 2) you double each single quote in order to reduce the query to something "storable".
Solution:
I think it will be nearly impossible to protect yourself from your users at this point without changing your architecture. But if you really want to keep it, the simplest thing you can do here is analyze the query at step 1) and store it without any edits at step 2).
Example:
I assume you already have your own set of rules to analyze the query, so I will just talk about storing it.
There are two ways to go. One is to store the query outside SQL Server, e.g. as a file (that is the safest solution, and probably the worst in terms of performance).
The second is to replace the single quote character with a special one and convert it back at execution time (it's safe only until someone learns your replacement character(s)).
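A tiny sketch of that second approach, assuming a sentinel character that can never appear in real input:
const char Sentinel = '\u0001';  // assumption: never occurs in user queries

static string Store(string query)   { return query.Replace('\'', Sentinel); }
static string Restore(string query) { return query.Replace(Sentinel, '\''); }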
I'm using LINQ To Sql (not Entity Framework), the System.Data.Linq.DataContext library, hitting a SQL Server 2005 database and using .Net Framework 4.
The table dbo.Dogs has a column "Active" of type CHAR(1) NULL. If I was writing straight SQL the query would be:
SELECT * FROM dbo.Dogs where Active = 'A';
The LINQ query is this:
from d in myDataContext.Dogs where d.Active == 'A' select d;
The SQL that gets generated from the above LINQ query converts the Active field to UNICODE. This means I cannot use the index on the dbo.Dogs.Active column, slowing the query significantly:
SELECT [t0].Name, [t0].Active
FROM [dbo].[Dog] AS [t0]
WHERE UNICODE([t0].[Active]) = @p1
Is there anything I can do to stop LINQ to SQL from inserting that UNICODE() call (and thus losing the benefit of my index on dbo.Dogs.Active)? I tried wrapping the parameters using the EntityFunctions.AsNonUnicode() method, but that did no good (it inserted a CONVERT() to NVARCHAR instead of UNICODE() in the generated SQL), e.g.:
...where d.Active.ToString() == EntityFunctions.AsNonUnicode('A'.ToString());
LINQ is meant to make it easier to write queries, and it does not always generate optimal SQL. Sometimes, when high performance is required, it is more efficient to write raw SQL directly against the database; the LINQ DataContext supports mapping SQL results to entities just like LINQ queries do.
In your case I would suggest writing:
IEnumerable<Dog> results = db.ExecuteQuery<Dog>(
"SELECT * FROM dbo.Dogs where Active = {0}",
'A');
This is an old question, but I bumped into this recently.
Instead of writing
from d in myDataContext.Dogs where d.Active == 'A' select d;
Write
from d in myDataContext.Dogs where d.Active.Equals('A') select d;
This will produce the desired SQL without having to resort to any of the "hacks" mentioned in other answers. I can't say why for certain.
I've posted that as a question, so we'll see if we get any good answers.
There's not much you can do about the way LINQ queries are translated into SQL statements, but you can write a stored procedure that contains your queries and call that SP as a LINQ2SQL function. This way you should get the full benefit of SQL Server optimizations.
You can do a little hack (as is often required with LINQ to SQL and EF): declare the property as NCHAR in the dbml. I hope that will remove the need for the UNICODE conversion; we are tricking L2S in a benign way with that.
Maybe you also need to insert the EntityFunctions.AsNonUnicode call to make the right-hand side a non-unicode type.
You can also try mapping the column as varchar.
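For the mapping route, a sketch of what the hand-edited mapping might look like (the entity is abbreviated, and the column sizes are assumptions; whether char? keeps the ANSI semantics may depend on the L2S version):
using System.Data.Linq.Mapping;

[Table(Name = "dbo.Dogs")]
public class Dog
{
    [Column(DbType = "Char(1)")]      // ANSI server type: no UNICODE()/CONVERT() wrapping
    public char? Active { get; set; }

    [Column(DbType = "VarChar(100)")] // likewise for string columns
    public string Name { get; set; }
}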
Is there a way to see the final query that is passed from my C# app to the SQL Server database?
For example, I've got this query:
SELECT * FROM mytable WHERE x = @yyyy;
This creates an SqlCommand object:
SqlCommand cmd = new SqlCommand("SELECT * FROM mytable WHERE x = @yyyy");
Plus I need to pass parameter:
cmd.Parameters.AddWithValue("@yyyy", "MyValue");
What I want to see (in the C# debugger, or somewhere in SQL Server Management Studio) is this:
SELECT * FROM mytable WHERE x = MyValue
Where can I find such a query?!
Where can I find such a query?!
You can't. Such a query never exists. The values are not substituted into the SQL.
I think sp_executesql is actually called; this function accepts the parameters separately from the SQL. You can check this using SQL Profiler to see the actual SQL.
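There is no single "final" string to find, but for debugging you can print what is actually sent, i.e. the command text plus each parameter, with a small helper like this (a sketch):
using System;
using System.Data.SqlClient;

static void DumpCommand(SqlCommand cmd)
{
    Console.WriteLine(cmd.CommandText);
    foreach (SqlParameter p in cmd.Parameters)
        Console.WriteLine("{0} = {1}", p.ParameterName, p.Value);
}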
Update:
ORDER BY @descOrAsc
Your problem is that parameters can only be used in certain places where expressions are allowed. DESC is not an expression - it is a reserved word. You cannot use a parameter containing the string "DESC" instead of writing the keyword DESC in the query.
Also, you haven't specified which column to order by.
You can run SQL Server Profiler and see all the queries that get executed, to see what's happening (and copy-paste them into SQL Server Management Studio to do tests, etc.).
I would expect the query to be passed to SQL Server with the parameters. There should be no need for anything to ever create a full SQL-only query. It makes no sense to do so, as it just means more conversions for either the client, the server, or both. On the server side, the query processor is going to want to parse the query into clauses with values; if the command can pass those values directly, where's the advantage in converting them into the SQL statement, only to have the server parse them into separate values again?
1. You can use SQL Profiler (here you can see the whole process).
2. You can log all your queries to a SQL Server table, and then you can always get the queries back from that table.
I have some complex stored procedures that may return many thousands of rows, and take a long time to complete.
Is there any way to find out how many rows are going to be returned before the query executes and fetches the data?
This is with Visual Studio 2005, a Winforms application and SQL Server 2005.
You mentioned your stored procedures take a long time to complete. Is the majority of the time taken up during the process of selecting the rows from the database or returning the rows to the caller?
If it is the latter, maybe you can create a mirror version of your SP that just gets the count instead of the actual rows. If it is the former, well, there isn't really that much you can do since it is the act of finding the eligible rows which is slow.
A solution to your problem might be to re-write the stored procedure so that it limits the result set to some number, like:
SELECT TOP 1000 * FROM tblWHATEVER
in SQL Server, or
SELECT * FROM tblWHATEVER WHERE ROWNUM <= 1000
in Oracle. Or implement a paging solution so that the result set of each call is acceptably small.
Make a stored proc to count the rows first.
SELECT COUNT(*) FROM table
Unless there's some aspect of the business logic of your app that allows calculating this, no. The database is going to have to do all the WHERE and JOIN logic to figure out how many rows there are, and that's the vast majority of the time spent in the SP.
You can't get the rowcount of a procedure without executing the procedure.
You could make a different procedure that accepts the same parameters, the purpose of which is to tell you how many rows the other procedure should return. However, the steps required by this procedure would normally be so similar to those of the main procedure that it should take just about as long as just executing the main procedure.
You would have to write a different version of the stored procedure to get a row count. This one would probably be much faster, because you could eliminate joins to tables you aren't filtering against, remove ordering, etc. For example, if your stored proc executed SQL such as:
select firstname, lastname, email, orderdate
from customer
inner join productorder on customer.customerid = productorder.customerid
where orderdate > @orderdate
order by lastname, firstname;
your counting version would be something like:
select count(*) from productorder where orderdate > @orderdate;
Not in general.
Through knowledge about the operation of the stored procedure, you may be able to get either an estimate or an accurate count (for instance, if the "core" or "base" table of the query is able to be quickly calculated, but it is complex joins and/or summaries which drive the time upwards).
But you would have to call the counting SP first and then the data SP, or you could look at using a multiple-result-set SP.
It could take as long to get a row count as to get the actual data, so I wouldn't advocate performing a count in most cases.
Some possibilities:
1) Does SQL Server expose its query optimiser findings in some way? i.e. can you parse the query and then obtain an estimate of the rowcount? (I don't know SQL Server).
2) Perhaps based on the criteria the user gives you can perform some estimations of your own. For example, if the user enters 'S%' in the customer surname field to query orders you could determine that that matches 7% (say) of the customer records, and extrapolate that the query may return about 7% of the order records.
Going on what Tony Andrews said in his answer, you can get an estimated query plan for the call to your stored procedure with:
SET SHOWPLAN_TEXT OFF
GO
SET SHOWPLAN_ALL ON
GO
-- Replace with the call to your stored procedure
select * from MyTable
GO
SET SHOWPLAN_ALL OFF
GO
This should return a table, or several tables, which will let you get the estimated row count of your query.
You need to analyze the returned data set to determine a logical (meaningful) primary key for the result set being returned. In general this WILL be much faster than the complete procedure, because the server is not constructing a result set from data in all the columns of each row of each table; it is simply counting the rows... In general, it may not even need to read the actual table rows off disk to do this, it may simply need to count index nodes...
Then write another SQL statement that only includes the tables necessary to generate those key columns (hopefully this is a subset of the tables in the main SQL query), with the same where clause and the same filtering predicate values...
Then add another optional parameter to the stored proc called, say, @CountsOnly, with a default of false (0), as so...
Alter Procedure <storedProcName>
    @param1 Type,
    -- other current params
    @CountsOnly TinyInt = 0
As
Set NoCount On
If @CountsOnly = 1
    Select Count(*)
    From TableA A
    Join TableB B On etc. etc...
    Where <here put all filtering predicates>
Else
    <Here put old SQL that returns the complete result set with all data>
Return 0
You can then just call the same stored proc with @CountsOnly set to 1 to get only the count of records. Old code that calls the proc would still function as it used to, since the parameter defaults to false (0) if it is not included.
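From the calling side, a usage sketch might look like this (the proc and parameter names are the placeholders from above; connection handling is omitted):
using System.Data;
using System.Data.SqlClient;

using (var cmd = new SqlCommand("<storedProcName>", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@param1", value1);  // existing parameters
    cmd.Parameters.AddWithValue("@CountsOnly", 1);   // count-only mode
    int rowCount = (int)cmd.ExecuteScalar();
}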
It's at least technically possible to run a procedure that puts the result set in a temporary table. Then you can find the number of rows before you move the data from server to application and would save having to create the result set twice.
But I doubt it's worth the trouble unless creating the result set takes a very long time, and in that case it may be big enough that the temp table would be a problem. Almost certainly the time to move the big table over the network will be many times what is needed to create it.
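For what it's worth, a sketch of the temp-table variant (dbo.MyProc and the column list are placeholders; everything runs on one connection so the #temp table stays visible):
using System.Data.SqlClient;

using (var cn = new SqlConnection(connectionString))
{
    cn.Open();

    // Materialize the result server-side; the column list must match
    // the proc's result set.
    new SqlCommand(
        "CREATE TABLE #results (Col1 int, Col2 varchar(50)); " +
        "INSERT INTO #results EXEC dbo.MyProc;", cn).ExecuteNonQuery();

    // Count first, then stream, without rerunning the procedure.
    int rows = (int)new SqlCommand(
        "SELECT COUNT(*) FROM #results;", cn).ExecuteScalar();

    using (var reader = new SqlCommand(
        "SELECT Col1, Col2 FROM #results;", cn).ExecuteReader())
    {
        while (reader.Read()) { /* hand rows to the caller */ }
    }
}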