How to determine if an SQL query is too complex? - c#

In my c# application, the users can build dynamic reports from the SQL database.
I need to warn the users if their DB query is too complex and would take too long to run.
I'm working with Microsoft SQL Server 2008.
How can I do that?
Are there any statistical algorithms to estimate the runtime of a query before executing it?

This is practically impossible. The database builds execution plans based on table and index statistics, and even the database itself cannot predict the runtime.
There might be some indications, such as ordering (and grouping, which implies ordering) or several joins, but any algorithmic prediction is nearly impossible in my opinion.

You could create a formula based on factors like the number and types of joins, the ORDER BY and WHERE criteria, and also the row counts of all the tables affected.
Depending on how that formula is built, it would give you a rough indication of when a query gets too "heavy".
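A minimal sketch of what such a heuristic might look like. The factor names, weights and threshold below are entirely made up, not derived from anything SQL Server does internally, and would need tuning against your own schema and data:

// Purely illustrative scoring heuristic - the weights are invented.
public static class QueryComplexityEstimator
{
    public static int Score(int joinCount, int orderByColumns,
                            int whereConditions, long totalRowsInvolved)
    {
        int score = 0;
        score += joinCount * 10;                       // joins tend to dominate cost
        score += orderByColumns * 5;                   // ordering/grouping implies sorting
        score += whereConditions * 2;                  // filters are comparatively cheap
        score += (int)(totalRowsInvolved / 100000);    // rough penalty for table sizes
        return score;
    }
}

// Usage: warn above some arbitrary threshold, e.g.
// if (QueryComplexityEstimator.Score(joins, orderBys, filters, rows) > 50)
//     ShowTooComplexWarning();   // hypothetical UI helper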

As others are pointing out, the absolute answer is that you can't tell up front, how long a query will take to run.
The closest thing I can suggest, to try to give a crude idea of what is involved in the query before running it, is to first retrieve the estimated execution plan for it and do some crude analysis of that.
SET SHOWPLAN_XML ON
GO
SELECT TOP 100 *
FROM MyTable
GO
In the execution plan XML, there is info like the estimated number of rows, whether the optimizer timed out before choosing an execution plan, what operations are performed (index seek/scan etc) and a whole ton of other stuff (you'd really need to dive deeper into execution plans).
So in theory, you could try and look at the info in the plan to make a crude/best guess judgement. Note it's only the estimated execution plan, and I don't know what level of accuracy/judgement you could make.
It doesn't give you a time, just (maybe) some way to weigh up the relative anticipated cost of a query.
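If it helps, here is a rough sketch of fetching that estimated plan XML from C#. The connection string and query text are placeholders, and this is a sketch rather than production code:

using System.Data.SqlClient;
using System.Xml.Linq;

// Sketch: ask SQL Server for the estimated plan without executing the query.
static XDocument GetEstimatedPlan(string connectionString, string query)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // While SHOWPLAN_XML is ON, statements are not executed; each batch
        // instead returns its estimated plan as a single XML column.
        using (var setOn = new SqlCommand("SET SHOWPLAN_XML ON;", conn))
            setOn.ExecuteNonQuery();

        XDocument plan;
        using (var cmd = new SqlCommand(query, conn))
        using (var reader = cmd.ExecuteReader())
        {
            reader.Read();
            plan = XDocument.Parse((string)reader[0]);
        }

        using (var setOff = new SqlCommand("SET SHOWPLAN_XML OFF;", conn))
            setOff.ExecuteNonQuery();

        return plan;   // inspect estimated row counts, operators, costs etc. from here
    }
}

From the returned document you could then pull out whatever crude indicators you decide to base your warning on.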

As the others have already shown, you can't predict the complexity beforehand. All you can do is let the query run and abort it if it takes too long.
To abort the query after some time, you could take a look at the SqlCommand.CommandTimeout property and set it to some suitable value. The drawback of this approach is that the user might wait the full timeout just to get an error message saying the query simply took too long.
Another approach would be to let the user decide when it takes too much time. This can be done by using no limit (Timeout = 0), calling the execute method asynchronously and simply providing a cancel button the user can hit (see the sketch below).
This last approach is the same as in SQL Server Management Studio: if you start a query, you'll see a running timer in the footer showing how long the query has been running, and in the toolbar you have a cancel button to stop the currently running query.
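A rough sketch of that second approach, assuming .NET 4.5+ for the async ADO.NET methods (on older frameworks you would use BeginExecuteReader instead); the connection string and query are placeholders:

using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

// The query runs without a client-side timeout; a Cancel button cancels the token.
static async Task RunReportAsync(string connectionString, string sql,
                                 CancellationToken cancel)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.CommandTimeout = 0;                         // 0 = wait indefinitely
        await conn.OpenAsync(cancel);
        using (var reader = await cmd.ExecuteReaderAsync(cancel))
        {
            while (await reader.ReadAsync(cancel))
            {
                // materialize the report rows here
            }
        }
    }
}

// Somewhere in the UI:
// var cts = new CancellationTokenSource();
// cancelButton.Click += (s, e) => cts.Cancel();
// await RunReportAsync(connectionString, sql, cts.Token);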

Related

Avoid Timeout on MySQL Query

I'm accessing a MySQL database using the standard MySql.Data package from Oracle. Every few releases of the application, we need to tweak the database schema (e.g. client wanted DECIMAL(10,2) changed to DECIMAL(10,3)) which the application handles by sending the necessary SQL statement. This works except that on a large database, the schema update can be a rather lengthy operation and times out.
The obvious solution is to crank up the timeout, but that results in a relatively poor user experience - I can put up a dialog that says "updating, please wait" and then just sit there with no kind of progress indicator.
Is there a way to get some kind of feedback from the MySQL server that it's 10% complete, 20% complete, etc., that I could pass on to the user?
There are two ways to approach this problem.
The first is the easiest, as you've suggested: just use a progress bar that bounces back and forth. It's not great, it's not the best user experience, but it's better than locking up the application and at least it gives feedback. Also, I assume this is not something that occurs regularly, just a one-off annoyance every now and again. Not something I'd really be worried about.
However, if you really are worried about user experience and want to give better feedback, then you're going to need to gather some metrics. Taking your DECIMAL example, time the change on different row counts: 100,000 rows, a million rows, etc. This will give you a napkin-guess of the time it might take. Note that with different hardware and other things running on the machine you're never going to get it exact, but you have an estimate.
Once you have an estimate and you know the row count, you can create a real progress bar based on those estimates. And if it gets to 100% and the real operation hasn't completed, or if it finishes before you get to 100% (and you can insta-jump the bar!), it's... something.
Personally I'd go with option one, and perhaps add a helpful message like Windows commonly shows. "This may take a few minutes". Maybe add "Now's a great time for coffee!". And a nice little animated gif :)
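If you do want to attempt option two, here is a very rough sketch of driving a progress bar from an estimate. The milliseconds-per-row figure is a made-up calibration value you would have to measure yourself, and IProgress<int> assumes .NET 4.5+:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Drives a progress callback from an estimated duration while the real
// (unmeasurable) schema update runs in the background.
static async Task RunWithEstimatedProgress(Func<Task> schemaUpdate, long rowCount,
                                           IProgress<int> progress)
{
    const double msPerRow = 0.05;   // assumed value - calibrate against your own timings
    double estimatedMs = rowCount * msPerRow;

    Task work = schemaUpdate();
    var clock = Stopwatch.StartNew();

    while (!work.IsCompleted)
    {
        int percent = (int)Math.Min(99, 100 * clock.ElapsedMilliseconds / estimatedMs);
        progress.Report(percent);    // never claim 100% until it's actually done
        await Task.Delay(500);
    }

    await work;                      // rethrow any exception from the update
    progress.Report(100);
}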

C# application profiling gives different results

I'm new to profiling. I'm trying to profile a C# application which connects to an SQLite database and retrieves data. The database contains 146,856,400 rows and the select query retrieves 428,800 rows after execution.
On the first execution the main thread takes 246686 ms
On second execution of the same code the main thread takes only 4296 ms
After restarting the system
On the first execution the main thread takes 244533 ms
On the second execution of the same code the main thread takes only 4053 ms
Questions:
1) Why is there such a big difference between the first execution timing and the second execution timing?
2) After restarting the system, why am I not getting the same (fast) results?
Please help.
You are experiencing the difference between cold and warm execution of your query. Cold means the first invocation and warm means all subsequent invocations of your DB query.
The first time, everything is "cold":
OS file system cache is empty.
SQLite cache is empty.
ORM dynamic query compilation is not done and cached yet.
ORM Mapper cache is empty.
Garbage Collector needs to tune your working set
....
When you execute your query a second time, all these first-time initializations (caching) are done and you are measuring the effects of the different cache levels, as long as there is enough memory available to cache a substantial amount of your requested data.
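A quick way to see this yourself is to time the same query twice in the same process. The sketch below assumes the System.Data.SQLite provider; the connection string and SQL are placeholders:

using System;
using System.Data.SQLite;
using System.Diagnostics;

// Runs the same query twice and prints the cold (first) and warm (second) timings.
static void MeasureColdVsWarm(string connectionString, string sql)
{
    for (int run = 1; run <= 2; run++)
    {
        var sw = Stopwatch.StartNew();
        using (var conn = new SQLiteConnection(connectionString))
        using (var cmd = new SQLiteCommand(sql, conn))
        {
            conn.Open();
            long rows = 0;
            using (var reader = cmd.ExecuteReader())
                while (reader.Read()) rows++;          // force full materialization
            sw.Stop();
            Console.WriteLine("Run {0}: {1} rows in {2} ms", run, rows, sw.ElapsedMilliseconds);
        }
    }
}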
A performance difference between 4 minutes and 4s is impressive. Both numbers are valid. Measuring something is easy. Telling someone else what exactly you have measured and how the performance can be improved by changing this or that is much harder.
The performance game often goes like this:
Customer: It is slow
Dev: I cannot repro your issue.
Customer: Here is my scenario ....
Dev: I still cannot repro it. Can you give me data set you use and the exact steps you did perform?
Customer: Sure. Here is the data and the test steps.
Dev: Ahh I see. I can make it 10 times faster.
Customer: That is great. Can I have the fix?
Dev: Sure here it is.
Customer: **Very Angry** It has become faster yes. But I cannot read my old data!
Dev: Oops. We need to migrate all your old data to the new, much more efficient format.
We need to develop a conversion tool which will take 3 weeks and your site will
have 3 days downtime while the conversion tool is running.
Or
We keep the old inefficient data format. But then we can make it only 9 times faster.
Customer: I want to access my data faster without data conversion!
Dev: Here is the fix which is 10% slower with no schema changes.
Customer: Finally. The fix does not break anything but it has not become faster?
Dev: I have measured your use case. It is only slow for the first time.
All later data retrievals are 9 times faster than before.
Customer: Did I mention that in my use case I always read different data?
Dev: No you did not.
Customer: Fix it!
Dev: That is not really possible without a major rewrite of large portions of our software.
Customer: The data I want to access is stored in a list. I want to process it sequentially.
Dev: In that case we can preload the data in the background while you are working with the current data set. You will only experience a delay for the first data set on each working day.
Customer: Can I have the fix?
Dev: Sure here it is.
Customer: Perfect. It works!
Performance is hard to grasp since most of the time you deal with perceived performance, which is subjective. Bringing it down to quantitative measurements is a good start, but you need to tune your metrics to reflect actual customer use cases or you will likely optimize in the wrong places, like above. A complete understanding of customer requirements and use cases is a must.
On the other hand, you need to understand your complete system (profile the hell out of it) to be able to tell the difference between cold and warm query execution and to see where you can tune the whole thing. These caches become useless if you query for different data all of the time (not likely). Perhaps you need a different index to speed up queries, or you buy an SSD, or you keep all of the data in memory and do all subsequent queries in memory....

Providing "Totals" for custom SQL queries on a regular basis

I would like some advice on how to best go about what I'm trying to achieve.
I'd like to provide the user with a screen that will display one or more "icons" (so to speak) and display a total next to each one (a bit like the iPhone does). Don't worry about the UI, the question is not about that; it is more about how to handle the back-end.
Let's say, for argument's sake, I want to provide the following:
Total number of unread records
Total number of waiting for approval
Total number of pre-approved
Total number of approved
etc...
I suppose the easiest way to describe the above would be "MS Outlook". Whenever emails arrive in your inbox, you can see the number of unread emails being updated immediately. I know it's local, so it's a bit different, but now imagine having the same principle but for the queries above.
This could vary from user to user, and while dynamic stored procedures are not ideal, I don't think I could write one SP for each scenario, but again, that's not the issue here.
Now the recommendation part:
Should I be creating a timer that polls the database every minute (for example) and runs all my relevant SQL queries, which will then provide me with the relevant information?
Is there a way to do this in real time without having a "polling" mechanism, i.e. whenever a query's result changes, it updates the total/count and then pushes the count out to the relevant client(s)?
Should I have some sort of table storing these "totals" for each query, with the updating handled immediately by triggers in SQL, so that when queried by a user it only reads the "total" rather than trying to calculate it?
The problem with triggers is that these would have to be defined individually, and I'm really trying to keep this as generic as possible... Again, I'm not 100% clear on how to handle this to be honest, so let me know what you think is best or how you would go about it.
Ideally, when a specific query is created, I'd like to provide two choices: a) General, where anyone can use it, and b) Specific, where the "username" would be used as part of the query and the count returned would only apply to that user, but that's another issue.
The important part is really the notification part. While the polling is easy, I'm not sure I like it.
Imagine if I had 50 queries to be executed and I've got 500 users (unlikely, but still!) looking at the screen with these icons. 500 users polling the database every minute with 50 queries each could potentially be 25,000 queries per minute... It just doesn't sound right.
As mentioned, ideally, a) I'd love to have the data changes in real-time rather than having to wait a minute to be notified of a new "count" and b) I want to reduce the amount of queries to a minimum. Maybe I won't have a choice.
The idea behind this, is that they will have a small icon for each of these queries, and a little number will be displayed indicating how many records apply to the relevant query. When they click on this, it will bring them the relevant result data rather than the actual count and then can deal with it accordingly.
I don't know if I've explained this correctly, but if unclear, please ask, but hopefully I have and I'll be able to get some feedback on this.
Looking forward to your feedback.
Thanks.
I am not sure if this is the ideal solution, but it may be a decent one.
These are the assumptions I have made:
Your front end is a web application, i.e. ASP.NET.
The data which needs to be fetched on a regular basis is not huge.
The data which needs to be fetched does not change very frequently.
If I were in this situation, I would go with the following approach:
Implement SQL caching using the SqlCacheDependency class. This class will fetch the data from the database and store it in the application cache. The cache gets invalidated whenever the data in the table on which the dependency is created changes, thus fetching the new data and recreating the cache. You just need to get the data from the cache; everything else (polling the database, etc.) is done by ASP.NET itself. Here is a link which describes the steps to implement SQL caching, and believe me, it is not that difficult to implement. (A minimal sketch is shown after this list.)
Use AJAX to update the counts on the UI so that the user does not feel the pinch of a PostBack.
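As a minimal sketch of the first point, a command-based SqlCacheDependency could look something like the code below. The table, column and cache key names are made up, and it assumes Service Broker is enabled on the database and SqlDependency.Start(connectionString) has been called at application startup. Note that notification queries have restrictions (two-part table names, an explicit column list, no COUNT(*)), which is why the rows are counted client-side here:

using System.Data.SqlClient;
using System.Web;
using System.Web.Caching;

// Returns a cached count; the cache entry is invalidated when the underlying data changes.
public static int GetUnreadCount(string connectionString)
{
    var cached = HttpRuntime.Cache["UnreadCount"];
    if (cached != null)
        return (int)cached;

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT RecordId FROM dbo.Records WHERE IsRead = 0", conn))   // placeholder table/columns
    {
        var dependency = new SqlCacheDependency(cmd);   // must be created before the command runs
        conn.Open();

        int count = 0;
        using (var reader = cmd.ExecuteReader())
            while (reader.Read()) count++;

        HttpRuntime.Cache.Insert("UnreadCount", count, dependency);
        return count;
    }
}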
What about "Improving Performance with SQL Server 2008 Indexed Views"?
"This is often particularly effective for aggregate views in decision
support or data warehouse environments"

Very slow T-SQL stored procedure sped up by dropping and recreating

I have a simple stored procedure in T-SQL that is instant when run from SQL Server Management Studio, and has a simple execution plan. It's used in a C# web front-end, where it is usually quick, but occasionally seems to get itself into a state where it sits there and times out. It then does this consistently from any web server. The only way to fix it that I've found is to drop and recreate it. It only happens with a single stored procedure, out of a couple of hundred similar procedures that are used in the application.
I’m looking for an answer that’s better than making a service to test it every n minutes and dropping and recreating on timeout.
As pointed out by other responses, the reasons could be many, varying from execution plan, to the actual SP code, to anything else. However, in my past experience, I faced a similar problem due to 'parameter sniffing'. Google for it and take a look, it might help. Basically, you should use local variables in your SP instead of the parameters passed in.
Not sure if my situation is too uncommon to be useful to others (it involved the use of table variables inside the stored proc). But here is the story anyway.
I was working on an issue where a stored proc would take 10 seconds in most cases, but 3-4 minutes every now and then. After a little digging around, I found a pattern in the issue:
This being a stored proc that takes in a start date and an end date, if I ran it for a year's worth of data (which is what people normally do), it ran in 10 seconds. However, when the query plan cache was cleared out and someone ran it for a single day (an uncommon use case), all further calls for a 1-year range would take 3-4 minutes, until I did a DBCC FREEPROCCACHE.
The following two things fixed the problem:
My first suspect was parameter sniffing. I fixed it immediately using the local variable approach. This, however, improved performance only by a small percentage (<10%).
In a clutching-at-straws approach, I changed the table variables that the original developer had used in this stored proc to temp tables. This was what finally fixed the issue. Now that I know that this was the problem, I am doing some reading online, and have come across a few links such as
http://www.sqlbadpractices.com/using-table-variable-for-large-table-vs-temporary-table/
which seem to correspond with the issue I am seeing.
Happy coding!!
It's hard to say for sure without seeing SP code.
Some suggestions.
SQL Server by default reuses the execution plan for a stored procedure. The plan is generated upon the first execution. That may cause a problem. For example, the first time you provide input with very high selectivity, and SQL Server generates the plan keeping that in mind. The next time you pass low-selectivity input, but the SP reuses the old plan, causing very slow execution.
Having different execution paths in SP causes the same problem.
Try creating this procedure with the WITH RECOMPILE option to prevent plan caching.
Hope that helps.
Run SQL Profiler and execute it from the web site until it happens again. When it pauses / times out check to see what is happening on the SQL server itself.
There are lots of possibilities here depending on what the s'proc actually does. For example, if it is inserting records then you may have issues where the database server needs to expand the database and/or log file size to accept new data. If it's happening on the log file and you have slow drives or are nearing the max of your drive space, then it could time out.
If it's a select, then those tables might be locked for a period of time due to other inserts happening... Or it might be reusing a bad execution plan.
The drop/recreate dance may only be delaying the execution to the point that the SQL server can catch up, or it might be causing a recompile.
My original thought was that it was an index but on further reflection, I don't think that dropping and recreating the stored prod would help.
It is most probably your cached execution plan that is causing this.
Try using DBCC FREEPROCCACHE to clear your plan cache the next time this happens. Read more here: http://msdn.microsoft.com/en-us/library/ms174283.aspx
Even this is a reactive step - it does not really solve the issue.
I suggest you execute the procedure in SSMS and check out the actual Execution Plan and figure out what is causing the delay. (in the Menu, go to [View] and then [Include Actual Execution Plan])
Let me just suggest that this might be unrelated to the procedure itself, but to the actual operation you are trying to do on the database.
I'm no MS SQL expert, but I wouldn't be surprised if it behaves similarly to Oracle when two concurrent transactions try to delete the same row: the transaction that first reaches the deletion locks the row, and the second transaction is then blocked until the first one either commits or rolls back. If that were attempted from your procedure, it might appear "stuck" (until the "locking" transaction is finished).
Do you have any long-running transactions that might lock rows that your procedure is accessing?

Improving search performance in large data sets

On a WPF application already in production, users have a window where they choose a client. It shows a list with all the clients and a TextBox where they can search for a client.
As the client base increased, this has turned out to be exceptionally slow: around 1 minute for an operation that happens around 100 times each day.
Currently MSSQL management studio says the query select id, name, birth_date from client takes 41 seconds to execute (around 130000 rows).
Are there any suggestions on how to improve this time? Indexes, ORMs or direct SQL queries in code?
Currently I'm using .NET Framework 3.5 and LINQ to SQL.
If your query is actually SELECT id, name, birth_date from client (i.e., no WHERE clause) there is very little that you'll be able to do to speed that up short of new hardware. SQL Server will have to do a table scan to get all of the data. Even a covering index means that it will have to scan an index just as big as the table.
What you need to ask yourself is: is a list of 130,000 clients really useful for your users? Is anybody really going to scroll through to the 75,613th entry in a list to find the user that they want? The answer is probably not. I would go with the search option only. At least then you can add indices that make sense for those queries.
If you absolutely do need the entire list, try loading it lazily in chunks. Start with the first 500 records and then add more records as the user moves the scroll bar. That way the initial load time is reduced and the user will only load the data that is necessary.
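A sketch of that chunked approach with LINQ to SQL; MyDataContext and Client are placeholders for your existing data context and entity:

using System.Collections.Generic;
using System.Linq;

// Loads one page of clients at a time; Skip/Take are translated into SQL
// (ROW_NUMBER paging) by LINQ to SQL rather than filtering in memory.
static List<Client> LoadClientPage(MyDataContext db, int pageIndex, int pageSize)
{
    return db.Clients
             .OrderBy(c => c.Name)          // paging needs a stable order
             .Skip(pageIndex * pageSize)
             .Take(pageSize)
             .ToList();
}

// e.g. initial load:        var firstPage = LoadClientPage(db, 0, 500);
// as the user scrolls down: var nextPage  = LoadClientPage(db, pageIndex + 1, 500);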
Why do you need the list of all the clients? Couldn't you just have the search TextBox that you describe and handle the search query on the server side? There you set a cap on the maximum number of returned rows for an individual client search (e.g. max 500 matches).
Alternatively, some efficiency gains may be achieved by caching the client data list on the web server.
Indexing is unlikely to help, based on your query. You could use a view which caches the sorted query (assuming you're not ordering by the id?), but given SQL Server's baked-in query cache for ad hoc queries you're probably not going to see much gain there either. The ORM does add some overhead, but there are several tutorials out there for cutting the cost of that (e.g. http://www.sidarok.com/web/blog/content/2008/05/02/10-tips-to-improve-your-linq-to-sql-application-performance.html). The main points there that apply to you are to use compiled queries wherever possible, and to turn off optimistic concurrency for read-only data.
An even bigger performance gain could be realized by having your clients not hit the db directly. If you add a service layer in there (not necessarily a web service, but it could be) then the service class or application could put some smart caching in place, which would help by an order of magnitude for read-only queries like this.
Go into SQL Server Management Studio and start a new query. In the Query menu, click "Include Client Statistics".
Run the query just as you would from code.
It will display the results and also a tab next to the results called "Client Statistics".
Click that and look at the time in "Wait time on server replies". This is in ms, and it's the time the server spent actually executing.
I just ran this query:
select firstname, lastname from leads
It took 3ms on the server to fetch 301,000 records.
The "Total Execution Time" was something like 483ms, which includes the time for SSMS to actually get the data and process it. My query took something like 2.5-3s to run in SSMS and the remaining time (2500ms or so) was actually for SSMS to paint the results etc.)
My guess is, the 41 seconds is probably not being spent on the SQL server, as 130,000 records really isn't that much. Your 41 seconds is probably largely being spent by everything after the SQL server returns the results.
If you find out SQL Server is taking a long time to execute, turn on "Include Actual Execution Plan" in the Query menu and rerun your query. A new tab appears called "Execution Plan"; this tab will show you what SQL Server is doing when you do a select on this table, as well as a percentage breakdown of where it spends all of its time. In my case it spent 100% of the time in a "Clustered Index Scan" of PK_Leads.
Edited to include more stats
In general:
Find out what takes so much time: executing the query or retrieving the results.
If it's the query execution, the query plan will tell you which indexes are missing; just press the display query plan button in SSMS and you will get hints on which indexes you should create to increase performance.
If it's the retrieval of the values, there is not much you can do about it besides upgrading hardware (RAM, disk, network, etc.).
But:
In your case it looks like the query is a full table scan, which is never good for performance; check whether you really need to retrieve all this data at once.
Since there are no clauses whatsoever, it's unlikely that the query execution is the problem, meaning additional indexes will not help.
You will need to change the way the application accesses the data. Instead of loading all clients into memory and then searching them in memory, you will need to pass the search term on to the database query.
LINQ to SQL lets you use different features for searching values; here is a blog post describing most of them:
http://davidhayden.com/blog/dave/archive/2007/11/23/LINQToSQLLIKEOperatorGeneratingLIKESQLServer.aspx
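As a sketch of pushing the search down to the database with LINQ to SQL (again, MyDataContext and Client are placeholders for your data context and entity; StartsWith is translated to an index-friendly LIKE):

using System.Collections.Generic;
using System.Linq;

// The filter runs on SQL Server, so only matching rows come over the wire.
static List<Client> SearchClients(MyDataContext db, string searchTerm)
{
    return db.Clients
             .Where(c => c.Name.StartsWith(searchTerm))   // becomes a LIKE 'term%' query on the server
             .OrderBy(c => c.Name)
             .Take(500)                                   // cap the result set
             .ToList();
}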
