SQL CLR Web Service Call: Limiting Overhead - c#

I'm attempting to improve query performance for an application and I'm logically stuck.
So the application is proprietary and thus we're unable to alter application-side code. We have, however, received permission to work with the underlying database (surprisingly enough). The application calls a SQL Server database, so the current idea we're running with is to create a view with the same name as the table and rename the underlying table. When the application hits the view, the view calls one of two SQL CLR functions, which both do nothing more than call a web service we've put together. The web service performs all the logic, and contains an API call to an external, proprietary API that performs some additional logic and then returns the result.
This all works, however, we're having serious performance issues when scaling up to large data sets (100,000+ rows). The pretty clear source of this is the fact we're having to work on one row at a time with the web service, which includes the API call, which makes for a lot of latency overhead.
The obvious solution to this is to figure out a way to limit the number of times that the web service has to be hit per query, but this is where I'm stuck. I've read about a few different ways out there for potentially handling scenarios like this, but as a total database novice I'm having difficulty getting a grasp on what would be appropriate in this situation.
If any there are any ideas/recommendations out there, I'd be very appreciative.

There are probably a few things to look at here:
Is your SQLCLR TVF streaming the results out (i.e. are you adding to a collection and then returning that collection at the end, or are you releasing each row as it is completed -- either with yield return or building out a full Enumerator)? If not streaming, then you should do this as it allows for the rows to be consumed immediately instead of waiting for the entire process to finish.
Since you are replacing a Table with a View that is sourced by a TVF, you are naturally going to have performance degradation since TVFs:
don't report their actual number of rows. T-SQL Multi-statement TVFs always appear to return 1 row, and SQLCLR TVFs always appear to return 1000 rows.
don't maintain column statistics. When selecting from a Table, SQL Server will automatically create statistics for columns referenced in WHERE and JOIN conditions.
Because of these two things, the Query Optimizer is not going to have an easy time generating an appropriate plan if the actual number of rows is 100k.
How many SELECTs, etc are hitting this View concurrently? Since the View is hitting the same URI each time, you are bound by the concurrent connection limit imposed by ServicePointManager ( ServicePointManager.DefaultConnectionLimit ). And the default limit is a whopping 2! Meaning, all additional requests to that URI, while there are already 2 active/open HttpWebRequests, will wait inline, patiently. You can increase this by setting the .ServicePoint.ConnectionLimit property of the HttpWebRequest object.
How often does the underlying data change? Since you switched to a View, that doesn't take any parameters, so you are always returning everything. This opens the door for doing some caching, and there are two options (at least):
cache the data in the Web Service, and if it hasn't reached a particular time limit, return the cached data, else get fresh data, cache it, and return it.
go back to using a real Table. Create a SQL Server Agent job that will, every few minutes (or maybe longer if the data doesn't change that often): start a transaction, delete the current data, repopulate via the SQLCLR TVF, and commit the transaction. This requires that extra piece of the SQL Agent job, but you are then back to having more accurate statistics!!
For more info on working with SQLCLR in general, please visit: SQLCLR Info

Related

SQL Server High Frequency Inserts

I've a system where Data is being inserted through SP that's called via WCF Service.
In system, we have currently 12000+ actively logged in Users who will be calling WCF service at every 30 seconds (effectively min 200 requests per second).
On SQL Server side, CPU Usage shoots to 100% and when I examined, > 90% of time was spent in DB Writes. This affects overall server performance.
I need suggestion to resolve this issue so that we have less DB write operations and more CPU remains free.
Am open to integrate any other DB Server, use Entity Framework or any other ORM combination if needed. I need to have solution to handle this issue.
Other information that might be helpful:
Table has no indexes defined
Database has growth factor set to 200MB.
SQL Server Version is 2012.
SImple solution: back the writes. Do not call into the sql server for every insert.
Make a service that collects them and calls them more coarsely. The main problem is that transaction handling is a little heavy cost wise - in cases like that it may make sense to batch them.
Do not call a SP for every row, load them into a temp table and then process them in bulk (or use a table variable to provide the sp with multiple lines of information at once).
This gets rid of a lot of issues, including a ton of commits (you basically ask for like 200 TPS which is quite heavy and not needed here).
How you do that is up to you - but for something that heavy I would stay away from an ORM (Entity Framework is hilarious in not batching anything - that should be tons of sp calls) and use handcrafted sql at least for this part. I love ORM's but it is always nice to have a high performance hand crafted approach when needed.

Store a database table in memory in a C# website application?

I have noticed that our web application queries a particular table an enormous amount of times. The table is relatively small, with only about a hundred rows that are used.
I'm wondering if there is a way to store this table once every 15 minutes or so in memory in the website application, so the system doesn't have to make so many queries to get the same information over and over again. This would be available across many different users.
The table is the Client table, so users login from many different clients. The data is pretty static, probably getting updated perhaps once a day.
Updates: SQL profiler is showing the query is run quite a bit, so that's what concerns me. The website is not notably slow. I just thought this could help make it even faster.
If the table is small and frequently queried, there is an outstanding chance that the data and any indices is entirely in SQL Server's memory, the query plan is cached, and that the query will be extremely fast.
Measure the actual performance impact before making any changes.
If you see there is a performance impact, there are many caching strategies that you can use to reduce trips to the database. More information about access patterns to the table and the need for information consistency would be needed to recommend a specific caching strategy.
You state
to get the same information over and over again
but also state
once every 15 minutes
If the information really is the same over and over, you can load it once into the ASP.Net cache at application start. If it might change every so often, but it is OK for the data to be a little out-of-date for a given user, you can use a time-based cache expiration policy. If the data changes only every so often but must be up-to-date immediately after it changes, you can consider a SQL Dependency for cache expiration.
For more information on ASP.Net caching see
http://msdn.microsoft.com/en-us/library/xsbfdd8c(v=vs.100).aspx
and specifically
http://msdn.microsoft.com/en-us/library/6hbbsfk6(v=vs.100).aspx
My suggestion would be to create a WCF windows service - using REST you could easily cache the SQLDataReader (or other DataReader) and implement a TTL metric to re-query at an interval.
Well,there is few solutions.
If you want to load data in memory every 15 minutes you should use some of the .net caching library's,for example system .NET Caching where you could set expiration polices,and other.
You could try optimize you query with nonclustered indexes
You could use App Fabric caching,or something similar
And last,try to add more memory on sql server server

C# + SQL Server - Fastest / Most Efficient way to read new rows into memory

I have an SQL Server 2008 Database and am using C# 4.0 with Linq to Entities classes setup for Database interaction.
There exists a table which is indexed on a DateTime column where the value is the insertion time for the row. Several new rows are added a second (~20) and I need to effectively pull them into memory so that I can display them in a GUI. For simplicity lets just say I need to show the newest 50 rows in a list displayed via WPF.
I am concerned with the load polling may place on the database and the time it will take to process new results forcing me to become a slow consumer (Getting stuck behind a backlog). I was hoping for some advice on an approach. The ones I'm considering are;
Poll the database in a tight loop (~1 result per query)
Poll the database every second (~20 results per query)
Create a database trigger for Inserts and tie it to an event in C# (SqlDependency)
I also have some options for access;
Linq-to-Entities Table Select
Raw SQL Query
Linq-to-Entities Stored Procedure
If you could shed some light on the pros and cons or suggest another way entirely I'd love to hear it.
The process which adds the rows to the table is not under my control, I wish only to read the rows never to modify or add. The most important things are to not overload the SQL Server, keep the GUI up to date and responsive and use as little memory as possible... you know, the basics ;)
Thanks!
I'm a little late to the party here, but if you have the feature on your edition of SQL Server 2008, there is a feature known as Change Data Capture that may help. Basically, you have to enable this feature both for the database and for the specific tables you need to capture. The built-in Change Data Capture process looks at the transaction log to determine what changes have been made to the table and records them in a pre-defined table structure. You can then query this table or pull results from the table into something friendlier (perhaps on another server altogether?). We are in the early stages of using this feature for a particular business requirement, and it seems to be working quite well thus far.
You would have to test whether this feature would meet your needs as far as speed, but it may help maintenance since no triggers are required and the data capture does not tie up your database tables themselves.
Rather than polling the database, maybe you can use the SQL Server Service broker and perform the read from there, even pushing which rows are new. Then you can select from the table.
The most important thing I would see here is having an index on the way you identify new rows (a timestamp?). That way your query would select the top entries from the index instead of querying the table every time.
Test, test, test! Benchmark your performance for any tactic you want to try. The biggest issues to resolve are how the data is stored and any locking and consistency issues you need to deal with.
If you table is updated constantly with 20 rows a second, then there is nothing better to do that pull every second or every few seconds. As long as you have an efficient way (meaning an index or clustered index) that can retrieve the last rows that were inserted, this method will consume the fewest resources.
IF the updates occur in burst of 20 updates per second but with significant periods of inactivity (minutes) in between, then you can use SqlDependency (which has absolutely nothing to do with triggers, by the way, read The Mysterious Notification for to udneratand how it actually works). You can mix LINQ with SqlDependency, see linq2cache.
Do you have to query to be notified of new data?
You may be better off using push notifications from a Service Bus (eg: NServiceBus).
Using notifications (i.e events) is almost always a better solution than using polling.

Load SQL query result data into cache in advance

I have the following situation:
.net 3.5 WinForm client app accessing SQL Server 2008
Some queries returning relatively big amount of data are used quite often by a form
Users are using local SQL Express and restarting their machines at least daily
Other users are working remotely over slow network connections
The problem is that after a restart, the first time users open this form the queries are extremely slow and take more or less 15s on a fast machine to execute. Afterwards the same queries take only 3s. Of course this comes from the fact that no data is cached and must be loaded from disk first.
My question:
Would it be possible to force the loading of the required data in advance into SQL Server cache?
Note
My first idea was to execute the queries in a background worker when the application starts, so that when the user starts the form the queries will already be cached and execute fast directly. I however don't want to load the result of the queries over to the client as some users are working remotely or have otherwise slow networks.
So I thought just executing the queries from a stored procedure and putting the results into temporary tables so that nothing would be returned.
Turned out that some of the result sets are using dynamic columns so I couldn't create the corresponding temp tables and thus this isn't a solution.
Do you happen to have any other idea?
Are you sure this is the execution plan being created, or is it server memory caching that's going on? Maybe the first query loads quite a bit of data, but subsequent queries can use the already-cached data, and so run much quicker. I've never seen an execution plan take more than a second to generate, so I'd suspect the plan itself isn't the cause.
Have you tried running the index tuning wizard on your query? If it is the plan that's causing problems, maybe some statistics or an additional index will help you out, and the optimizer is pretty good at recommending things.
I'm not sure how you are executing your queries, but you could do:
SqlCommand Command = /* your command */
Command.ExecuteReader(CommandBehavior.SchemaOnly).Dispose();
Executing your command with the schema-only command behavior will add SET FMTONLY ON to the query and cause SQL Server to get metadata about the result set (requiring generation of the plan), but will not actually execute the command.
To narrow down the source of the problem you can always use the SQL Server Objects in perfmon to get a general idea of how the local instance of SQL Server Express is performing.
In this case you would most likely see a lower Buffer Cache Hit Ratio on the first request and a higher number on subsequent requests.
Also you may want to check out http://msdn.microsoft.com/en-us/library/ms191129.aspx It describes how you can set a sproc to run automatically when the SQL Server service starts up.
If you retrieve the Data you need with that sproc then maybe the data will remain cached and improve the performance the first time the data is retrieved by the end user via your form.
In the end I still used the approach I tried first: Executing the queries from a stored procedure and putting the results into temporary tables so that nothing would be returned. This 'caching' stored procedure is executed in the background whenever the application starts.
It just took some time to write the temporary tables as the result sets are dynamic.
Thanks to all of you for your really fast help on the issue!

Optimizing Smart Client Performance

I have a smart client (WPF) that makes calls to the server va services (WCF). The screen I am working on holds a list of objects that it loads when the constructor is called. I am able to add, edit and delete records in the list.
Typically what I am doing is after every add or delete I am reloading the entire model from the service again, there are a number off reasons for this including the fact that the data may have changed on the server between calls.
This approach has proved to be a big hit on perfomance because I am loading everything sending the list up and down the wire on Add and Edit.
What other options are open to me, should I only be send the required information to the server and how would I go about not reloading all the data again ever time an add or delete is performed?
The optimal way of doing what you're describing (I'm going to assume that you know that client/server I/O is the bottleneck already) is to send only changes in both directions once the client is populated.
This can be straightforward if you've adopted a journaling model for updates to the data. In order for any process to make a change to the shared data, it has to create a time-stamped transaction that gets added to a journal. The update to the data is made by a method that applies the transaction to the data.
Once your data model supports transaction journals, you have a straightforward way of keeping the client and server in sync with a minimum of network traffic: to update the client, the server sends all of the journal entries that have been created since the last time the client was updated.
This can be a considerable amount of work to retrofit into an existing design. Before you go down this road, you want to be sure that the problem you're trying to fix is in fact the problem that you have.
Make sure this functionality is well-encapsulated so you can play with it without having to touch other components.
Have your source under version control and check in often.
I highly recommend having a suite of automated unit tests to verify that everything works as expected before refactoring and continues to work as you perform each change.
If the performance hit is on the server->client transfer of data, moreso than the querying, processing and disk IO on the server, you could consider devising a hash of a given collection or graph of objects, and passing the hash to a service method on the server, which would query and calculate the hash from the db, compare the hashes, and return true or false. Only if false would you then reload data. This works if changes are unlikely or infrequent, because it requires two calls to get the data, when it has changed. If changes in the db are a concern, you might not want to only get the changes when the user modifies or adds something -- this might be a completely separate action based on a timer, for example. Your concurrency strategy really depends on your data, number of users, likelihood of more than one user being interested in changing the same data at the same time, etc.

Categories

Resources