I'm following up on my question from yesterday, Entity Framework 6 get complex return value from a stored procedure. My stored procedure now runs under Entity Framework; however, it times out after 3 minutes, the connection timeout.
I run the stored procedure in my SQL Server Management Studio with the line (customer information omitted):
EXEC spGetDupWOs @ProjectName=N'...', @City=N'...', @State=N'LA', @ProposalNum=N'201703080740-001', @County=N'...', @Owner=N'...', @QuoteRecipients=N'...', @ProjectID=-1
It executes in less than a second. When Entity Framework executes it, it takes forever.
Using the SQL Server Profiler, I determined that Entity Framework is sending this line to the SQL server:
exec sp_executesql N'EXEC spGetDupWOs',N'@ProjectName nvarchar(19),@City nvarchar(6),@State nvarchar(2),@ProjectNum nvarchar(12),@County nvarchar(10),@Owner nvarchar(23),@QuoteRecipients nvarchar(23),@ProjectID bigint',@ProjectName=N'...',@City=N'Holden',@State=N'LA',@ProposalNum=N'201703080740-001',@County=N'Livingston',@Owner=N'...',@BID_RECIP=N'...',@ProjectID=-1
When I run this in SSMS, it takes forever to run.
Reading similar questions, it looks like the issue is parameter sniffing and a change in the execution plan.
Here is my call to execute the stored procedure in my application:
List<DuplicateProposals> duplicateCheckResults =
db.Database.SqlQuery<DuplicateProposals>("spGetDupWOs",
spl.ToArray())
.ToList();
After reading a bunch of articles online, I'm even more confused. How can I change my call to resolve this?
The issue identified is parameter sniffing in SQL Server. There are multiple approaches to handle it, but which one is best for your scenario depends on your real use case, utilization, etc.
Here are some options.
1. Recompile the stored procedure on every execution. This can be very heavy on CPU and is typically overkill; I would NOT recommend this option unless you have a very good reason. To implement: use WITH RECOMPILE in the procedure definition (or on the EXEC call), or the OPTION (RECOMPILE) hint at the end of your query. A calling-side sketch follows after the example below.
2. Optimize-for hint. This can work around parameter sniffing, but may result in a subpar execution plan for all of your queries; typically not an optimal approach. To implement: use OPTION (OPTIMIZE FOR UNKNOWN).
3. Copy the parameter to a local variable. This was more common in older versions of SQL Server. To implement: declare a local variable inside the procedure, then copy the value from the input parameter into it: DECLARE @local_var1 char(1) = @InputParam1;
4. Turn off parameter sniffing at the query level. This approach uses the QUERYTRACEON hint and may be the best fit for this scenario; I would explore it as the primary strategy. To implement: add OPTION (QUERYTRACEON 4136) to the end of your query.
Example:
SELECT * FROM dbo.MyTable T
WHERE T.Col1 = @Param1 AND T.Col2 = @Param2
OPTION (QUERYTRACEON 4136)
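As promised under option 1, here is a calling-side sketch: WITH RECOMPILE is also valid on the EXEC statement itself, so a fresh plan can be requested without altering the procedure. This is only a sketch under assumptions; the parameter names come from the profiler capture above, and the local variables are placeholders.

// using System.Data.SqlClient;
// Bind real parameters (placeholders must appear in the command text for
// SqlQuery to send them) and ask for a one-off plan via WITH RECOMPILE.
var results = db.Database.SqlQuery<DuplicateProposals>(
    "EXEC spGetDupWOs @ProjectName, @City, @State, @ProposalNum, " +
    "@County, @Owner, @QuoteRecipients, @ProjectID WITH RECOMPILE",
    new SqlParameter("@ProjectName", projectName),
    new SqlParameter("@City", city),
    new SqlParameter("@State", state),
    new SqlParameter("@ProposalNum", proposalNum),
    new SqlParameter("@County", county),
    new SqlParameter("@Owner", owner),
    new SqlParameter("@QuoteRecipients", quoteRecipients),
    new SqlParameter("@ProjectID", projectId))
    .ToList();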
I ended up having to convert the entire call into a single string that I passed to the SqlQuery function.
string sql = string.Format("exec spGetDupWOs @ProjectName=N'{0}',@City=N'{1}',@State=N'{2}',@ProjectNumber=N'{3}',@County=N'{4}',@Owner=N'{5}',@QuoteRecipients=N'{6}',@ProjectID={7}",
project.ProjectName,
project.City,
project.State,
project.ProjectNumber,
project.County,
project.Owner,
quoteRecipientsList,
"null");
Yes, I had to include the N prefix on the strings to make it work; I'm not sure why, but it worked. (Presumably the N prefix makes each literal nvarchar, matching the procedure's nvarchar parameters, so no implicit conversion gets in the way.)
Thanks for all of the help everyone. I could not have solved this without your help.
Related
I am working with a .NET webtool, and our clients require some extremely large queries that are just plain not viable for our web servers to run. Our solution was to kick the query back to our automation system and feed the data back to the user when it's finished. The problem is, passing the query from the webtool to automation requires that we store the query as a string for later use. This does not allow us to use parameterized inputs.
What is best practice here? How can we scrub this data before running it? Obviously the inputs should be validated initially, but I am looking for a more catch-all solution.
Any help would be greatly appreciated!
There are some assumptions made in the question that need to be validated, such as the statement "we store the query as a string for later use. This does not allow us to use parameterized inputs." It is possible to use parameterized queries while storing the complete SQL content as a string. Below is an example of a parameterized query:
declare @sql nvarchar(max)
declare @monthNo int = 2
declare @minAmount decimal = 100
set @sql = N'select column1, column2 from dbo.Mytable where mon = @monthNo and amount > @minAmount'
exec sp_executesql @sql, N'@monthNo int, @minAmount decimal', @monthNo, @minAmount
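The same pattern carries over to the application side, since a parameterized ADO.NET command is sent to the server as sp_executesql anyway. A minimal C# sketch under assumptions: the connection string, table, and column names mirror the T-SQL example above, and the SQL text could just as well have been loaded from wherever the webtool stores it.

// using System.Data;
// using System.Data.SqlClient;
// The stored string keeps its placeholders; values are bound only at execution time.
string storedSql =
    "select column1, column2 from dbo.Mytable where mon = @monthNo and amount > @minAmount";
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(storedSql, conn))
{
    cmd.Parameters.Add("@monthNo", SqlDbType.Int).Value = 2;
    cmd.Parameters.Add("@minAmount", SqlDbType.Decimal).Value = 100m;
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume reader["column1"], reader["column2"] ...
        }
    }
}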
If the above example is not a viable option, here is some syntax to catch injection attempts within a string:
-- check for sql injection
declare @sql nvarchar(max)
set @sql = N'My fancy sql query; truncate table dbo.VeryImportantTable'
IF CHARINDEX(';', ISNULL(@sql,'')) != 0 OR CHARINDEX('--', ISNULL(@sql,'')) != 0
BEGIN
    RAISERROR('Invalid input parameter', 16, 1)
    RETURN -1
END
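If the screening needs to happen in the webtool before the string is stored, the same check is trivial in C#. A rough sketch; note that a blacklist like this is easy to bypass, so treat it as a tripwire rather than a real injection defense:

// Mirrors the T-SQL CHARINDEX check above.
static bool LooksSuspicious(string sql)
{
    if (string.IsNullOrEmpty(sql)) return false;
    return sql.Contains(";") || sql.Contains("--");
}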
Since the question includes no example code or implementation details, this answer contains some degree of speculation.
Summing up:
Just to make sure I understand:
1) You've got users who provide large queries via a webtool.
2) Your webtool hands them to an automation system via SQL Server.
3) Your automation system runs them with dynamic SQL and feeds the results back to your users.
I assume that at step 2) you double up every single quote in order to reduce the query to something "storable".
Solution:
I think it will be nearly impossible to protect yourself from your users at this point without changing your architecture. But if you really want to keep it, the simplest thing you can do here is analyze the query at step 1) and store it without any edits at step 2).
Example:
I assume you already have your own set of rules to analyze the query, so I will just talk about storing it.
There are two ways to go. One is to not use SQL Server to store the query, e.g. keep it in a file (that is the safest solution and probably the worst in terms of performance).
The second is to substitute special replacement character(s) for the single quote and convert them back at execution time (it's safe until someone learns of your replacement character(s)). A tiny sketch of this follows below.
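For illustration only, a minimal sketch of that replacement idea; the marker string is an arbitrary assumption, and the scheme is only as strong as the marker's secrecy:

// Store: hide single quotes behind a marker before persisting the query text.
const string Marker = "~Q~"; // assumption: never occurs in legitimate queries
string storable = userQuery.Replace("'", Marker);

// Execute: restore the original text just before handing it to the automation system.
string restored = storable.Replace(Marker, "'");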
I have an issue with stored procedures and Entity Framework.
Let me explain what is happening... and what I have tried thus far.
I have a stored procedure which does not do an awful lot:
SELECT
COUNT(DISTINCT(EmailAddress)) AcceptedQuotes,
CONVERT (DATE,QuoteDate) QuoteDate
FROM
Quote Q
JOIN
Person P on Q.PersonPk = P.Pk
JOIN
Product Pr on Q.ProductPk = Pr.Pk
JOIN
Accepted A on Q.Pk = A.QuotePk
WHERE
QuoteDate between @startDate and @endDate
AND CompanyPk = @companyPk
AND FirstName != 'Test'
AND FirstName != 'test'
AND FirstName != 'EOH'
I want to execute this, and it works fine in SSMS and does not even take 1 second.
Now, I import this into Entity Framework and it times out, even after I set the command timeout to 120...
OK, so here is what I have tried and tested thus far.
If I use the SqlCommand/SqlDataAdapter/DataTable way with my own connection string, it executes as expected. When I use the Entity Framework connection string in this scenario, it times out.
I altered my stored procedure to include the "Recompile" option and also tried the SET ARITHABORT way; no luck, it times out when run through EF.
Is this a bug in EF?
I have now just about decided to rewrite this using "old school" data access.
Also note that EF executes fine with other stored procs from the same database.
Any ideas or help would be greatly appreciated...
PS. I found this article, but no help either :(
http://www.sommarskog.se/query-plan-mysteries.html
This may be caused by parameter sniffing.
When a stored procedure is compiled or recompiled, the parameter values passed for that invocation are "sniffed" and used for cardinality estimation. The net effect is that the plan is optimized as if those specific parameter values were used as literals in the query.
Using a dummy variable that is not directly bound to the parameter also ensures execution plan stability, without the need to add a recompile hint. Example below:
create procedure dbo.SearchProducts
    @Keyword varchar(100)
as
    declare @Keyworddummy as varchar(100)
    set @Keyworddummy = @Keyword
    select * from Products where Keyword like @Keyworddummy
To prevent this and other similar situations, you can use the following query options:
OPTIMIZE FOR
RECOMPILE
KEEPFIXED PLAN (prevents recompiles caused by statistics updates during the batch)
My query is fairly complex, but I have simplified it to figure out this problem and now it is a simple JOIN that I'm running on a SQL Server 2014 database. The query is:
SELECT * FROM SportsCars as sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1
When I run this query from SSMS and watch it in SQL Profiler, it takes around 350ms to execute. When I run the same query inside my application using Entity Framework or ADO.NET (I've tried both), it takes 4500ms to execute.
ADO.NET Code:
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
var cmdA = new SqlCommand("SET ARITHABORT ON", connection);
cmdA.ExecuteNonQuery();
var query = "SELECT * FROM SportsCars as sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1";
var cmd = new SqlCommand(query, connection);
cmd.ExecuteNonQuery();
}
I've done an extensive Google search and found this awesome article and several StackOverflow questions (here and here). In order to make the session parameters identical for both queries, I call SET ARITHABORT ON in ADO.NET, and it makes no difference. This is a straight SQL query, so there is no parameter sniffing problem. I've simplified the query and the indexes down to their most basic form for this test. There is nothing else running on the server and nothing else accessing the database during the test. There are no computed columns in the Cars or SportsCars tables, just INTs and VARCHARs.
The SportsCars table has about 170k records and 4 columns, and the Cars table has about 1.2M records and 7 columns. The resulting data set (SportsCars of Type=1) has about 2600 records and 11 columns. I have a single non-clustered index on the Cars table, on the [Type] column that includes all the columns of the cars table. And both tables have a clustered index on the CarID column. No other indexes exist on either table. I'm running as the same database user in both cases.
When I view the data in SQL Profiler, I see that both queries are using the exact same, very simple query plan. In SQL Profiler, I'm using the Performance Event Class and the ShowPlan XML Statistics Profile, which I believe to be the proper event to monitor and capture the actual execution plan. The # of reads is the same for both queries (2596).
How can two exact same queries with the exact same query plan take 10x longer in ADO.NET vs. SSMS?
Figured it out:
Because I'm using Entity Framework, the connection string in my application has MultipleActiveResultSets=True. When I remove this from the connection string, the queries have the same performance in ADO.NET and SSMS.
Apparently there is an issue with this setting causing queries to respond slowly when connected to SQL Server via WAN. I found this link and this comment:
MARS uses "firehose mode" to retrieve data. Firehose mode means that
the server will produce data as fast as possible. This also means that
your client application must receive inbound data at the same speed as
it comes in. If it doesn't the data storage buffers on the server will
fill up and the processing will stop until those buffers empty.
So what? You may ask... But as long as the processing is stopped the
resources on the SQL server are in use and are tied up. This includes
the worker thread, schema and data locks, memory, etc. So it is
crucial that your client application consumes the inbound results as
quickly as they arrive.
I have to use this setting with Entity Framework otherwise lazy loading will generate exceptions. So I'm going to have to figure out some other workaround. But at least I understand the issue now.
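One possible workaround, sketched under assumptions: keep MultipleActiveResultSets=True on the EF connection string so lazy loading keeps working, but run this hot query on a dedicated connection with MARS switched off. This assumes a plain provider connection string is available:

// using System.Data.SqlClient;
// Dedicated MARS-free connection for the heavy query; EF keeps its own
// MARS-enabled connection string for lazy loading.
var builder = new SqlConnectionStringBuilder(providerConnectionString)
{
    MultipleActiveResultSets = false
};
using (var connection = new SqlConnection(builder.ConnectionString))
using (var cmd = new SqlCommand(
    "SELECT * FROM SportsCars AS sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1",
    connection))
{
    connection.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume rows as quickly as they arrive to keep the firehose drained
        }
    }
}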
How can two exact same queries with the exact same query plan take 10x longer in ADO.NET vs. SSMS?
First we need to be clear about what is considered "same" with regards to queries and query plans. Assuming that the query at the very top of the question is a copy-and-paste, then it is not the same query as the one being submitted via ADO.NET. For two queries to be the same, they need to be byte-by-byte the same, which includes all white-space, capitalization, punctuation, comments, etc.
The two queries shown are definitely very similar. And they might even share the same execution plan. But how was "same"ness determined for those? Was the XML the same in both cases? Or just what was shown graphically in SSMS when viewing the plans? If they were determined to be the same based on their graphical representation then that is sometimes misleading. The XML itself needs to be checked. Even if two query plans have the same query hash, there are still (sometimes) parts of a query plan that are variable and changes do not change the plan hash. One example is the evaluation of expressions. Sometimes they are calculated and their result is embedded into the plan as a constant. Sometimes they are calculated at the start of each execution and stored and reused within that particular execution, but not for any subsequent executions.
One difference between SSMS and ADO.NET is the default session properties for each. I thought I had seen a chart years ago showing the defaults for ADO / OLEDB / SQLNCLI but can't find it now. Either way, it doesn't need to be guesswork, as it can be discovered using the SESSIONPROPERTY function. Just run this query in the C# code instead of your current SELECT, and inspect the results in debug or print them out or whatever. Either way, run something like this:
SELECT SESSIONPROPERTY('ANSI_NULLS') AS [AnsiNulls],
SESSIONPROPERTY('ANSI_PADDING') AS [AnsiPadding],
SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS [ConcatNullYieldsNull],
...;
Make sure to get all of the settings noted in the linked MSDN page. Now, in SSMS, go to the "Query" menu, select "Query Options...", and go to "Execution" | "ANSI". The settings coming back from the C# code need to match the ones shown in SSMS. Anything set differently requires adding something like this to the beginning of your ADO.NET query string:
SET ANSI_NULLS ON;
{rest of query}
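In code, that can mirror the SET ARITHABORT ON call already in the snippet above; a sketch, assuming ANSI_NULLS turned out to be the mismatched setting (substitute whichever settings actually differ in your results):

// Apply the mismatched setting on the open connection first, then run the
// query on that same connection so it compiles under SSMS-equivalent options.
var setCmd = new SqlCommand("SET ANSI_NULLS ON;", connection);
setCmd.ExecuteNonQuery();
var cmd = new SqlCommand(
    "SELECT * FROM SportsCars AS sc INNER JOIN Cars AS c ON c.CarID = sc.CarID WHERE c.Type = 1",
    connection);
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read()) { /* consume rows */ }
}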
Now, if you want to eliminate the DataTable loading as a possible suspect, just replace:
var cars = new DataTable();
cars.Load(reader);
with:
while(reader.Read());
And lastly, why not just put the query into a stored procedure? The session settings (i.e. ANSI_NULLS, etc.) that typically matter the most are stored with the proc definition, so it should work the same whether you EXEC it from SSMS or from ADO.NET (again, we aren't dealing with any parameters here). A sketch of that follows.
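A minimal sketch of the stored procedure route; dbo.GetSportsCars is a hypothetical name, not something from the question:

// using System.Data;
// Assumes on the server:
// CREATE PROCEDURE dbo.GetSportsCars AS
//     SELECT * FROM SportsCars AS sc
//     INNER JOIN Cars AS c ON c.CarID = sc.CarID
//     WHERE c.Type = 1;
// ANSI_NULLS / QUOTED_IDENTIFIER are captured at CREATE time, so SSMS and
// ADO.NET compile the proc under identical settings.
using (var cmd = new SqlCommand("dbo.GetSportsCars", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read()) { /* consume rows */ }
    }
}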
I have a query of the form
SELECT DISTINCT Str,Score
FROM Tab
WHERE Str in ('Str1', 'Str2', 'Str3') AND Type = 0
Table schema is
Str - varchar(8000)
Score - int
Type - bit
I also have an index on Str which includes Type and Score
The number of strings in the IN clause varies.
When I construct a direct query from C#, it's virtually instantaneous
When I use a parametrized query (using the method here: https://stackoverflow.com/a/337792/508593), it becomes extremely slow: the original query takes less than a second, while this one times out.
Looking at SQL Profiler and SSMS, the slowness seems to be due to the statement being wrapped in exec sp_executesql, which causes an index scan instead of a seek. The direct query uses the index mentioned; with sp_executesql, it does not.
Is my suspicion correct and is there a way to resolve this?
In addition to the root cause specified by Martin, the solution was to explicitly set the parameter type using
command.Parameters[i].DbType = DbType.AnsiString;
which forces varchar instead of nvarchar. A fuller sketch follows.
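For completeness, a hedged sketch of building the IN-list parameters this way; the explicit Size matching Str varchar(8000) also helps plan reuse across different string lengths (variable names are assumptions):

// using System.Data;  using System.Collections.Generic;
// One varchar (AnsiString) parameter per IN-list item keeps the comparison
// on the varchar index instead of forcing an implicit nvarchar cast.
var values = new[] { "Str1", "Str2", "Str3" };
var names = new List<string>();
using (var command = connection.CreateCommand())
{
    for (int i = 0; i < values.Length; i++)
    {
        var p = command.CreateParameter();
        p.ParameterName = "@p" + i;
        p.DbType = DbType.AnsiString; // varchar, not nvarchar
        p.Size = 8000;                // matches Str varchar(8000)
        p.Value = values[i];
        command.Parameters.Add(p);
        names.Add(p.ParameterName);
    }
    command.CommandText =
        "SELECT DISTINCT Str, Score FROM Tab WHERE Str IN (" +
        string.Join(",", names) + ") AND Type = 0";
    // execute with command.ExecuteReader() ...
}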
The parameters need to be varchar not nvarchar.
Otherwise the query will be effectively
WHERE IMPLICIT_CAST(Str AS NVARCHAR(4000)) IN (@P1,@P2,@P3) AND Type = 0
which hurts index usage.
It's unclear from your question which approach to parametrization you took; the question you refer to shows different methods.
If you went for the table-valued-parameter solution, you may be suffering from a cached query plan that SQL Server created without knowing the number of items in the TVP. By default, IIRC, it assumes 10,000 items, which would explain the index scan instead of a seek.
That being said, try adding an OPTION (RECOMPILE) hint at the end of the parametrized query, which will let SQL Server re-compile the query with the (then known) item counts. A sketch of the TVP variant follows.
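For illustration, a hedged sketch of the TVP route with the recompile hint; the table type dbo.StringList is hypothetical and would need a matching CREATE TYPE on the server:

// using System.Data;  using System.Data.SqlClient;
// Assumes on the server: CREATE TYPE dbo.StringList AS TABLE (Value varchar(8000));
var table = new DataTable();
table.Columns.Add("Value", typeof(string));
foreach (var s in new[] { "Str1", "Str2", "Str3" })
    table.Rows.Add(s);

using (var cmd = new SqlCommand(
    "SELECT DISTINCT t.Str, t.Score " +
    "FROM Tab AS t JOIN @strs AS s ON s.Value = t.Str " +
    "WHERE t.Type = 0 " +
    "OPTION (RECOMPILE)", connection))
{
    var p = cmd.Parameters.Add("@strs", SqlDbType.Structured);
    p.TypeName = "dbo.StringList";
    p.Value = table;
    // cmd.ExecuteReader() ...
}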
The issue isn't with the parametrized query.
When specifying hardcoded values in your IN clause, according to MSDN, you'd better have a good estimate of the number of values:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table
I am running a query directly, it is trivial in nature:
SELECT * FROM [dbo].[vwUnloadedJobDetailsWithData] WHERE JobId = 36963
When I run this from Management Studio, the query doesn't even take a second. When I run it from within the table adapter, it times out. I have fixed this multiple times, but the fix is ludicrous: if I delete the table adapter from my xsd file and recreate it, the query time matches that of Management Studio for about two days, but then I have to redeploy, which is asinine.
Any insight into what could be causing this would be greatly appreciated. I've seen another question about this, but the solution involving SET ARITHABORT ON before the query had no effect for me.
Edit: I was asked to show my code for calling the query. This also happens when I go into my xsd file and just do Preview Data, but for the sake of clarity, here it is:
using (TEAMSConnection connection = new TEAMSConnection())
{
connection.OpenConnection();
_JobDetailsDAO jobDetailDao = new _JobDetailsDAO(connection);
return jobDetailDao.GetUnloadedJobDetailsByJobId(jobId);
}
On disposal of the connection, the database connection is closed using this line of code:
if (_DBConnection != null && _DBConnection.State == ConnectionState.Open)
_DBConnection.Close();
Edit 2: I ran a trace, and here are the SET options in effect:
set quoted_identifier on
set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language us_english
set dateformat mdy
set datefirst 7
set transaction isolation level read committed
I went and added those to the query that I generated in Management Studio, and it still ran in less than a second. I even copied the query exactly as it appeared in the trace:
exec sp_executesql N'SELECT * FROM [dbo].[vwUnloadedJobDetailsWithData] WHERE JobID = @JobId',N'@JobId int',@JobId=36963
and it still returns in less than a second. I am so very confused.
Thanks,
Josh
The most likely scenario why this would happen is a difference in SET options between SSMS and ADO.NET. That difference causes (re)building of execution plans that might not be optimal.
Alright, well, I could not find any solution that would let me keep using the dataset, so I went straight to using SqlDataAdapter in code rather than the auto-generated TableAdapters.
According to the trace it performs the exact same query, but so far it works. It may not in two days, but for now it seems to.
Just thinking out loud:
Maybe there is a lock caused by another process or person? Is anybody updating the same row at the same time? Is anybody opening the table from Management Studio or Query Analyzer with the Open Table feature and playing with the filters?
Try looking for locks using sp_who2
Some thoughts:
This is what I'd call parameter sniffing for a stored proc. Try the OPTION (RECOMPILE) hint, so the SQL you send looks like this:
exec sp_executesql
N'SELECT *
FROM [dbo].[vwUnloadedJobDetailsWithData]
WHERE JobID = @JobId
OPTION (RECOMPILE)',
N'@JobId int',
@JobId=36963
Explanation: when a query plan is produced and cached, it may be based on a bad, atypical value. Say JobID is usually very selective, but for the execution that got compiled it was not: the cached plan is then wrong for every subsequent, more selective JobId. A plan gets recompiled for various reasons, but the parameter value seen at recompilation matters.
Otherwise, what is the exact datatype of JobId? If it's smallint, then the column will be converted to int in the parameterised query, whereas with a constant it would stay smallint. Make sure the type is defined correctly: this matters in SQL code. A sketch of matching the type from the client follows.
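To illustrate that last point, a hedged sketch of declaring the client-side parameter to match the column's exact type (SqlDbType.Int here is an assumption; use whatever type the JobID column really is):

// using System.Data;  using System.Data.SqlClient;
// A mismatched parameter type (e.g. int parameter vs. smallint column)
// introduces a conversion that can change the plan; match the column exactly.
using (var cmd = new SqlCommand(
    "SELECT * FROM [dbo].[vwUnloadedJobDetailsWithData] WHERE JobID = @JobId",
    connection))
{
    cmd.Parameters.Add("@JobId", SqlDbType.Int).Value = 36963;
    using (var adapter = new SqlDataAdapter(cmd))
    {
        var table = new DataTable();
        adapter.Fill(table); // same shape the TableAdapter produced
    }
}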