I've got a parameterized search query that doesn't perform too well, and I think it's because of parameter sniffing. I'd like to do something like the OPTION(RECOMPILE) mentioned in this answer, but I'm not sure what the Sybase equivalent is.
Is there a Sybase equivalent of OPTION(RECOMPILE)? Or would I need to switch to a stored procedure to get that functionality?
NOTE: I have no idea what 'parameter sniffing' is so fwiw ...
A few scenarios I can imagine (off the top of my head) that could explain poor performance for a query, where performance could be improved by forcing a (re)compile of the query:
1 - DDL changes (eg, column datatype change, index creation/modification) for a table, or updated stats, could lead to a better compilation plan; in the good ol' days it was necessary to run sp_recompile table_name after such a change, but in recent years (last 6-8 years?) this should be automatically performed under the covers; soooo, if you're running a (relatively) old version of ASE, and assuming a DDL/stats modification to a referenced table, it may be necessary to have the table's owner run sp_recompile table-name after such a DDL change
2 - with ASE 15/16 it's not uncommon for the DBA to configure the dataserver with statement cache enabled; this allows the dataserver to create light weight procedures (LWPs; aka 'temp' procedure) for queries that are oft repeated (the objective being to eliminate the costly compilation overhead for look-alike queries); the downside to using statement cache is that a difference in parameter values that could lead to large variations in record counts can cause follow-on queries to obtain/re-use a 'poor' query plan associated with a previous copy of the query; in this scenario the SQL developer can run set statement_cache off prior to running a query ... this will override any statement cache settings at the dataserver level and allow the query to be compiled with the current set of SARGs/parameters ... with the trade-off being that you'll now incur the overhead of compiling every query submitted to the dataserver
3 - if the application is using prepared statements to submit 'parameterized' queries, the process of 'preparing' the statement typically signals the dataserver to create a LWP (aka 'temp' procedure); in this scenario the first invocation of the prepared statement will be compiled and stored in procedure cache, with follow-on invocations (by the app) of the prepared statement re-using the query plan from the first invocation; again, one benefit is the elimination of costly compilation overhead for queries #2 through #n, but the downside of re-using 'poor' query plan if the parameter values can lead to a large variation in the number of records affected; the only 'fix' I can think of for this scenario is for the application to be recoded to not use prepared statements ... and if the dataserver is configured with statement cache enabled, then make sure 'set statement_cache off' is issued before submitting (non-prepared) statements to the dataserver; and of course the obvious downside is that each query will now incur the overhead of compilation
NOTES:
set statement_cache {on|off} is a session-level setting; you only need to issue it once to enable (on) or disable (off) statement cache for the rest of the session
If you know you have 2 (or 3/4/5) different sets of parameters that could lead to different optimization plans for the same query, you can trick the dataserver's statement cache lookup process by submitting slightly different versions of the same query; to the dataserver these will look like 'different' queries and thus be matched with different query plans in statement cache; with this little coding technique you could still benefit from the use of statement cache (or application-specific prepared statements) while limiting (eliminating?) the re-use of 'poor' query plans; for example, while these three queries are logically identical, the dataserver will treat them as 'different' queries because of the different table aliases ...
select count(*) from sysobjects where id = 1
select count(*) from sysobjects o where id = 1
select count(*) from sysobjects o1 where id = 1
... and if using application-side prepared statements then you would create/manage 3 different prepared statements
Related
I'm coding an application with Entity Framework in which I rely heavily on user defined functions.
I have a question about the best way (most optimized way) of how I limit and page my result sets. Basically I am wondering if these two options are the same or one is prefered performance wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page = 0, size = 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (size * page) ROWS FETCH NEXT size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware when your LINQ statement is AsEnumerable and when it is AsQueryable. As long as your statement is an IQueryable<...> the software will try to translate it into SQL and let your database do the query. Once it really has lost the IQueryable, and has become an implementation of an IEnumerable, the data has been brought to local memory, and all further LINQ statements will be performed by your process, not by the database.
If you use your debugger, you will see that the return value of your fn_getData returns an IEnumerable. This means that the result of fn_GetData is brought to local memory and your OrderBy etc is performed by your process.
Usually it is much more efficient to only move the records that you will use to local memory. Besides: do not fetch the complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use
I suggest second option because SQL Server can more faster then C# methods.
In your first option, you take all of the records in table and loop through. But second option, SQL Server do it for you and you get what you want.
You should apply the limiting and where clauses (it depends on table indexes) in the database as far as possible. For first example;
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from database into in-memory and it kills the performance and reliability. I strongly don't suggest it. You should consider to put some limitations to filter records on the database. So, the second option is better approach in this case.
In a heavily multi-threaded scenario, I have problems with a particular EF query. It's generally cheap and fast:
Context.MyEntity
.Any(se => se.SameEntity.Field == someValue
&& se.AnotherEntity.Field == anotherValue
&& se.SimpleField == simpleValue
// few more simple predicates with fields on the main entity
);
This compiles into a very reasonable SQL query:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM (SELECT [Extent1].[Field1] AS [Field1]
FROM [dbo].[MyEntity] AS [Extent1]
INNER JOIN [dbo].[SameEntity] AS [Extent2] ON [Extent1].[SameEntity_Id] = [Extent2].[Id]
WHERE (N'123' = [Extent2].[SimpleField]) AND (123 = [Extent1].[AnotherEntity_Id]) AND -- further simple predicates here -- ) AS [Filter1]
INNER JOIN [dbo].[AnotherEntity] AS [Extent3] ON [Filter1].[AnotherEntity_Id1] = [Extent3].[Id]
WHERE N'123' = [Extent3].[SimpleField]
)) THEN cast(1 as bit) ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
The query, in general, has optimal query plan, uses the right indices and returns in tens of milliseconds which is completely acceptable.
However, when a critical number of threads (<=40) starts executing this query, the performance on it drops to tens of seconds.
There are no locks in the database, no queries are writing data to these tables and it reproduces very well with a database that's practically isolated from any other operations. The DB resides on the same physical machine and the machine is not overloaded at any point, i.e. has plenty of spare CPU, memory and other resources the CPU is overloaded by this operation.
Now what's really bizarre is that when I replace the EF Any() call with Context.Database.ExecuteSqlCommand() with the copy-pasted SQL (also using parameters), the problem magically disappears. Again, this reproduces very reliably - replacing the Any() call with copy-pasted SQL increases the performance by 2-3 orders of magnitude .
An attached profiler (dotTrace) sampling shows that the threads seem to all spend their time in the following method:
Is there anything I've missed or did we hit some ADO.NET / SQL Server cornercase?
MORE CONTEXT
The code running this query is a Hangfire job. For the purpose of test, a script queues a lot of jobs to be performed and up to 40 threads keep processing the job. Each job uses a separate DbContext instance and it's not really being used a lot. There are a few more queries before and after the problematic query and they take expected times to execute.
We're using many different Hangfire jobs for similar purposes and they behave as expected. Same with this one, except when it gets slow under high concurrency (of exact same jobs). Also, just switching to SQL on this particular query fixes the problem.
The profiling snapshot above is representative, all the threads slow down on this particular method call and spend the vast majority of their time on it.
UPDATE
I'm currently re-running a lot of those checks for sanity and errors. The easy reproduction means it's still on a remote machine to which I can't connect using VS for debugging.
One of the checks showed that my previous statement about free CPU was false, the CPU was not entirely overloaded but multiple cores were in fact running on full capacity for the whole duration of the long running jobs.
Re-checking everything again and will come back with updates here.
Can you try as shown below and see whether is there any performance improvement or not ...
Context.MyEntity.AsNoTracking()
.Any(se => se.SameEntity.Field == someValue
&& se.AnotherEntity.Field == anotherValue
&& se.SimpleField == simpleValue
);
Check if you are reusing the context in a loop. Doing so may create many objects during your performance test and giving the garbage collector a lot of work to do.
Faulty initial assumptions. The SQL in the question was obtained by pasting the code into LINQPad and having it generate the SQL.
After attaching an SQL profiler to the actual DB used, it showed a slightly different SQL involving outer joins, which are suboptimal and didn't have a proper index in place.
It remains a mystery why LINQPad generated different SQL, even though it's using the same EntityFramework.dll, but the original problem is resolved and all that remains is to optimize the query.
Many thanks for everyone involved.
I am currently fiddeling around with the AdventureWorks sample database and LinqPad, for scratching out some ideas.
This is the query in question:
SalesOrderHeaders.GroupBy (soh => new {soh.CustomerID, soh.BillToAddressID})
.Where(soh => soh.Skip(1).Any())
.Dump();
The idea was to find duplicates based on some criteria and then display them except the first set of data. The result should be deleted from the table.
After executing the query I get result A)
After executing the query again I get Result B)
I do not care about the correct results of the query, but about the ordering of the resultset. Only those two possibilities exist and they alternate on every run of the query.
Surely I could just order by Key, but I am more interested in why does this even occur? Why is the order chaning/alternating?
Sql server select query's result set order is not deterministic. It's just how sql server works and it's not a bug in neither linq or linqpad. The only way to get deterministic results on your query, as yourself noted, is to use an OrderBy clause.
Edit: About getting same results in SSMS if you run the query multiple times, see this. This post explains why you might get the same results if you execute a query multiple times and why you shouldn't rely on it.
Ordering is never deterministic as suggested earlier, but try and insert orderby clause in either your sql query or the Linq query, that's the only way to make it deterministic.
In fact let's look little deeper in the database for the reason. DB would be fetching all the data from the disk via i/o. data is stored in the internal sql server structures like pages, extents, segments (these are Oracle data blocks, I hope sql server have the similar stuff). Now when the query is fired, database would know all the different locations to fetch the data from, but this is not a serial operation, it is sort of parallel fetch, where different datasets are then combined to provide a user view. Now as we know in case of threads , it can never be deterministic that who goes first and who returns first, that's totally OS scheduling of the threads, hope that clarifies further little more.
OrderBy clause will work on fetched data to make in a particular order, so will always yield deterministic result.
I have a table:
-- Tag
ID | Name
-----------
1 | c#
2 | linq
3 | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName();
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
CompiledQuery.Compile<Entities, IEnumerable<Tag>>
(
e => e.Tags
);
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
using (var db = new Entities())
{
return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
}
}
Which generates a SELECT ID, Name FROM Tag and execute Where on the code. Or should I avoid CompiledQuery in this case?
Basically I want to know when I should use compiled queries. Also, on a website they are compiled only once for the entire application?
You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation") . So add "You're not using EF 5" to the above list.
Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
In your case, if you are sure that it would generate SELECT ID, Name FROM Tag without the WHERE case (which I doubt, as your AllQueries function should return IQueryable and the actual query should be made only after calling ToList) - you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.
compiled queries are more helpfull with linq queries with large expression trees say complex queries to gain performance over building expression tree again and again while reusing query. in your case i guess it will save a very little time.
Compiled queries are compiled when the application is compiled and every time you reuse a query often or it is complex you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate Stored Procedures where you do all the processing on the database server, even if Linq tries to push as much of the work to the db as possible you will have situations where a stored procedure will be faster.
Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do it's thing might be a better approach.
I'd like to increase performance of very simple select and update queries of .NET & MSSQL 2k8.
My queries always select or update a single row. The DB tables have indexes on the columns I query on.
My test .NET code looks like this:
public static MyData GetMyData(int accountID, string symbol)
{
using (var cnn = new SqlConnection(connectionString))
{
cnn.Open();
var cmd = new SqlCommand("MyData_Get", cnn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(CreateInputParam("#AccountID", SqlDbType.Int, accountID));
cmd.Parameters.Add(CreateInputParam("#Symbol", SqlDbType.VarChar, symbol));
SqlDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
var MyData = new MyData();
MyData.ID = (int)reader["ID"];
MyData.A = (int)reader["A"];
MyData.B = reader["B"].ToString();
MyData.C = (int)reader["C"];
MyData.D = Convert.ToDouble(reader["D"]);
MyData.E = Convert.ToDouble(reader["E"]);
MyData.F = Convert.ToDouble(reader["F"]);
return MyData;
}
}
}
and the according stored procedure looks like this:
PROCEDURE [dbo].[MyData_Get]
#AccountID int,
#Symbol varchar(25)
AS
BEGIN
SET NOCOUNT ON;
SELECT p.ID, p.A, p.B, p.C, p.D, p.E, p.F FROM [MyData] AS p WHERE p.AccountID = #AccountID AND p.Symbol = #Symbol
END
What I'm seeing if I run GetMyData in a loop, querying MyData objects, I'm not exceeding about ~310 transactions/sec. I was hoping to achieve well over a 1000 transactions/sec.
On the SQL Server side, not really sure what I can improve for such a simple query.
ANTS profiler shows me that on the .NET side, as expected, the bottleneck is cnn.Open and cnn.ExecuteReader, however I have no idea how I could significantly improve my .NET code?
I've seen benchmarks though where people seem to easily achieve 10s of thousands transactions/sec.
Any advice on how I can significantly improve the performance for this scenario would be greatly appreciated!
Thanks,
Tom
EDIT:
Per MrLink's recommendation, adding "TOP 1" to the SELECT query improved performance to about 585 transactions/sec from 310
EDIT 2:
Arash N suggested to have the select query "WITH(NOLOCK)" and that dramatically improved the performance! I'm now seeing around 2500 transactions/sec
EDIT 3:
Another slight optimization that I just did on the .NET side helped me to gain another 150 transactions/sec. Changing while(reader.Read()) to if(reader.Read()) surprisingly made quite a difference. On avg. I'm now seeing 2719 transactions/sec
Try using WITH(NOLOCK) in your SELECT statement to increase the performance. This would select the row without locking it.
SELECT p.ID, p.A, p.B, p.C, p.D, p.E, p.F FROM [MyData] WITH(NOLOCK) AS p WHERE p.AccountID = #AccountID AND p.Symbol = #Symbol
Some things to consider.
First, your not closing the server connection. (cnn.Close();) Eventually, it will get closed by the garbage collector. But until that happens, your creating a brand new connection to the database every time rather than collecting one from the connection pool.
Second, Do you have an index set in Sql Server covering the AccountID and Symbol columns?
Third, While accountId being and int is nice and fast. The Symbol column being varchar(25) is always going to be much slower. Can you change this to an int flag?
Make sure your database connections are actually pooling. If you are seeing a bottleneck in cnn.Open, there would seem to be a good chance they are not getting pooled.
I was hoping to achieve well over a 1000 transactions/sec [when running GetMyData in a loop]
What you are asking for is for GetMyData to run in less than 1ms - this is just pointless optimisation! At the bare minimum this method involves a round trip to the database server (possibly involving network access) - you wouldn't be able to make this method much faster if your query was SELECT 1.
If you have a genuine requirement to make more requests per second then the answer is either to use multiple threads or to buy a faster PC.
There is absolutely nothing wrong with your code - I'm not sure where you have seen people managing 10,000+ transactions per second, but I'm sure this must have involved multiple concurrent clients accessing the same database server rather than a single thread managing to execute queries in less than a 10th of a ms!
Is your method called frequently? Could you batch your requests so you can open your connection, create your parameters, get the result and reuse them for several queries before closing the whole thing up again?
If the data is not frequently invalidated (updated) I would implement a cache layer. This is one of the most effective ways (if used correctly) to gain performance.
You could use output parameters instead of a select since there's always a single row.
You could also create the SqlCommand ahead of time and re-use it. If you are executing a lot of queries in a single thread you can keep executing it on the same connection. If not, you could create a pool of them or do cmdTemplate.Clone() and set the Connection.
Try re-using the Command and Prepare'ing it first.
I can't say that it will definitely help, but it seems worth a try.
In no particular order...
Have you (or your DBAs) examined the execution plan your stored procedure is getting? Has SQL Server cached a bogus execution plan (either due to oddball parameters or old stats).
How often are statistics updated on the database?
Do you use temp tables in your stored procedure? If so, are they create upfront. If not, you'll be doing a lot of recompiles as creating a temp table invalidates the execution plan.
Are you using connection pooling? Opening/Closing a SQL server connection is an expensive operation.
Is your table clustered on accountID and Symbol?
Finally...
Is there a reason you're hitting this table by account and symbol rather than, say, just retrieving all the data for an entire account in one fell swoop?