Effect of Many(!) Small Queries Being Performed - c#

So I am troubleshooting some performance problems on a legacy application, and I have uncovered a pretty specific problem (there may be others).
Essentially, the application is using an object relational mapper to fetch data, but it is doing so in a very inefficient/incorrect way. In effect, it is performing a series of entity graph fetches to fill a datagrid in the UI, and on databinding the grid (it is ASP.Net Webforms) it is doing additional fetches, which lead to other fetches, etc.
The net effect of this is that many, many tiny queries are being performed. SQL Profiler shows that a certain page performs over 10,000 queries (to fill a single grid). No query takes over 10ms to complete, and most of them register as 0ms in Profiler. Each query uses and releases one connection, and the series of queries is single-threaded (per HTTP request).
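To make the pattern concrete, here is a hypothetical sketch of what this kind of per-row lazy loading can look like in a WebForms RowDataBound handler; the Order/Customer entities and the OrdersGrid control are invented for illustration, not taken from the actual application:

```csharp
using System.Web.UI.WebControls;

// Hypothetical sketch of the anti-pattern described above.
protected void OrdersGrid_RowDataBound(object sender, GridViewRowEventArgs e)
{
    if (e.Row.RowType != DataControlRowType.DataRow) return;

    var order = (Order)e.Row.DataItem;

    // Each of these lazy navigations fires its own tiny SQL query per row,
    // so a 500-row grid easily turns into thousands of round trips.
    e.Row.Cells[1].Text = order.Customer.Name;           // query #1 for this row
    e.Row.Cells[2].Text = order.Customer.Region.Code;    // query #2 for this row
    e.Row.Cells[3].Text = order.Lines.Count.ToString();  // query #3 for this row
}
```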
I am very familiar with the ORM, and know exactly how to fix the problem.
My question is: what is the exact effect of having many, many small queries being executed in an application? In what ways does it/can it stress the different components of the system?
For example, what is the effect on the webserver's CPU and memory? Would it flood the connection pool and cause blocking? What would be the impact on the database server's memory, CPU and I/O?
I am looking for relatively general answers, mainly because I want to start monitoring the areas that are likely to be the most affected (I need to measure => fix => re-measure). Concurrent use of the system at peak would likely be around 100-200 users.

It will depend on the database, but generally there is a parse phase for each query. If the query uses bind variables, the plan will probably be cached; if not, you wear the cost of a parse, and that often means short locks on shared resources, i.e. BAD. In Oracle, CPU use and blocking are much more prevalent at the parse than at the execute; SQL Server less so, but it's worse at the execute.

Obviously doing 10K of anything over a network is going to be a terrible solution, especially x 200 users. The volume of data I'm sure is fine, but that frequency will really highlight all the overhead in comms latency and the like. Connection pools are generally sized in the hundreds, not tens of thousands, and now you have tens of thousands of objects all being created, queued, managed, destroyed, garbage collected, etc.
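As a side note on the bind-variable point: in C# with ADO.NET, passing values as parameters (rather than concatenating them into the SQL text) keeps the statement text constant so the server can reuse a cached plan. A minimal sketch, with the table and column names assumed:

```csharp
using System.Data.SqlClient;

// Parameterised query: the SQL text stays constant, so SQL Server can reuse
// one cached plan instead of compiling a new ad-hoc statement per value.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT Name FROM dbo.Customers WHERE CustomerId = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", customerId);
    conn.Open();
    var name = (string)cmd.ExecuteScalar();
}
```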
But I'm sure you already know all this deep down. Ditch the ORM for this part and write a stored procedure that executes a single query to return your result set. Then bind that to the grid.
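A minimal sketch of that approach, assuming a stored procedure (dbo.GetOrderGridData is an invented name here) that returns the whole result set in one round trip:

```csharp
using System.Data;
using System.Data.SqlClient;

// One round trip: a single set-based call returns everything the grid needs,
// already joined, and the grid is bound once.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetOrderGridData", conn))
using (var adapter = new SqlDataAdapter(cmd))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@customerId", customerId);

    var table = new DataTable();
    adapter.Fill(table);            // one query instead of thousands

    OrdersGrid.DataSource = table;  // OrdersGrid: the ASP.NET GridView
    OrdersGrid.DataBind();
}
```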

Related

Database benchmarking: should the data transfer/protocol latency be included?

I am currently benchmarking two databases, Postgres and MongoDB, on a relatively large data set with equivalent queries. Of course, I am doing my best to put them on an equal footing, but I have one dilemma. For Postgres I take the execution time reported by EXPLAIN ANALYZE, and there is a similar concept in MongoDB's profiling (although not exactly equivalent: the millis field).
However, different times are observed if the queries are executed from, let's say, PgAdmin, the mongo CLI client, or my stopwatch-timed C# app. That time also includes the transfer latency and probably protocol differences. PgAdmin, for example, seems to completely distort the execution time (it obviously includes the result rendering time).
The question is: is there any sense in actually measuring the time on the "receiving end", since an application actually does consume that data? Or does it just include too many variables and does not contribute anything to the actual database performance, and I should stick to the reported DBMS execution times?
The question you'd have to answer is: why are you benchmarking the databases? If you are benchmarking so you can select one over the other for use in a C# application, then you need to measure the time "on the 'receiving end'". Whatever variables that may include, that is what you need to compare.
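If you do measure on the receiving end, one way to keep the comparison honest is to time the full call from C#, including draining the result set. A rough sketch for the Postgres side using Npgsql (the same pattern applies with the MongoDB driver); benchmarkSql and connectionString are assumed to be in scope:

```csharp
using System;
using System.Diagnostics;
using Npgsql;

// Measures what the application actually experiences: network round trip,
// protocol overhead and result materialisation, not just server execution time.
var sw = Stopwatch.StartNew();

using (var conn = new NpgsqlConnection(connectionString))
using (var cmd = new NpgsqlCommand(benchmarkSql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Read every row so the timing includes full result transfer.
        }
    }
}

sw.Stop();
Console.WriteLine("End-to-end: {0} ms", sw.ElapsedMilliseconds);
```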

Is my SQL transaction taking too long?

There is something that worries me about my application. I have a SQL query that does a bunch of inserts into the database across various tables. I timed how long it takes to complete the process, and it takes about 1.5 seconds. At this point I'm not even done developing the query; I still have more inserts to program into it. So I fully expect this process to take even longer, perhaps up to 3 seconds.
Now, it is important that all of this data be consistent and finish either completely or not at all. So what I'm wondering is: is it OK for a transaction to take that long? Doesn't it lock up the tables, so selects, inserts, updates, etc. cannot be run until the transaction is finished? My concern is that if this query is run frequently it could lock up the entire application, so that certain parts of it become either incredibly slow or unusable. With a low user base I doubt this would be an issue, but if my application should gain some traction, this query could potentially be run a lot.
Should I be concerned about this, or am I missing something and the database won't behave the way I'm thinking? I'm using a SQL Server 2014 database.
To note, I timed this by starting a C# Stopwatch immediately before the transaction starts and stopping it right after the changes are committed, so it's about as accurate as can be.
You're right to be concerned about this, as a transaction will lock the rows it has written until it commits, which can certainly cause problems such as deadlocks and temporary blocking that slows the system's response. But there are various factors that determine the potential impact.
For example, you probably largely don't need to worry if your users are only updating and querying their own data, and your tables have indexing to support both the read and write query criteria. That way each user's row locking will largely not affect the other users, depending on how you write your code, of course.
If your users share data, and you want to be able to support efficient searching across multiple users' data even with multiple concurrent updates, for example, then you may need to do more.
Some general concepts:
-- Ensure your transactions write to tables in the same order
-- Keep your transactions as short as possible by preparing the data to be written as much as possible before starting the transaction (see the sketch after this list).
-- If this is a new system (and even if not new), definitely consider enabling Snapshot Isolation and/or Read Committed Snapshot Isolation on the database. SI will (when explicitly set on the session) allow your read queries not to be blocked by concurrent writes. RCSI will allow all your read queries by default not to be blocked by concurrent writes. But read this to understand both the benefits and gotchas of both isolation levels: https://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
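A minimal C# sketch of the "keep transactions short" advice above, assuming invented table and helper names: do all the preparation before the transaction and only hold it open for the writes themselves.

```csharp
using System.Data.SqlClient;

// Prepare everything (validation, lookups, building parameter values) *before*
// opening the transaction, so locks are held only for the actual writes.
var rowsToInsert = PrepareOrderRows(order);   // PrepareOrderRows: hypothetical helper

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        foreach (var row in rowsToInsert)
        {
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.OrderLines (OrderId, ProductId, Qty) VALUES (@o, @p, @q)",
                conn, tx))
            {
                cmd.Parameters.AddWithValue("@o", row.OrderId);
                cmd.Parameters.AddWithValue("@p", row.ProductId);
                cmd.Parameters.AddWithValue("@q", row.Qty);
                cmd.ExecuteNonQuery();
            }
        }
        tx.Commit();   // all-or-nothing: either every insert lands or none do
    }
}
```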
I think it depends on your code: how effectively you use loops, your select queries, and the other statements.

SQL Server High Frequency Inserts

I have a system where data is inserted through a stored procedure that is called via a WCF service.
In the system we currently have 12,000+ actively logged-in users who call the WCF service every 30 seconds (effectively a minimum of 200 requests per second).
On the SQL Server side, CPU usage shoots to 100%, and when I examined it, more than 90% of the time was spent on DB writes. This affects overall server performance.
I need suggestions to resolve this issue so that we have fewer DB write operations and more CPU remains free.
I am open to integrating any other DB server, using Entity Framework, or any other ORM combination if needed. I need a solution to handle this issue.
Other information that might be helpful:
Table has no indexes defined
Database has growth factor set to 200MB.
SQL Server Version is 2012.
Simple solution: batch the writes. Do not call into SQL Server for every insert.
Make a service that collects the inserts and sends them to the database more coarsely. The main problem is that transaction handling is a little heavy cost-wise; in cases like this it makes sense to batch them.
Do not call an SP for every row; load the rows into a temp table and then process them in bulk (or use a table-valued parameter to pass the SP multiple rows at once).
This gets rid of a lot of issues, including a ton of commits (you are basically asking for something like 200 transactions per second, which is quite heavy and not needed here).
How you do that is up to you, but for something this heavy I would stay away from an ORM (Entity Framework is hilarious in not batching anything; that would mean tons of SP calls) and use hand-crafted SQL, at least for this part. I love ORMs, but it is always nice to have a high-performance hand-crafted approach when needed.
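One hedged sketch of the batching idea: buffer incoming rows in memory and flush them in one bulk operation. The table and column names are assumptions; SqlBulkCopy is used here, but a table-valued parameter feeding a single stored-procedure call works too.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Buffer incoming rows and flush them in one bulk operation instead of
// issuing one stored-procedure call (and one commit) per row.
var buffer = new DataTable();
buffer.Columns.Add("UserId", typeof(int));
buffer.Columns.Add("EventTime", typeof(DateTime));
buffer.Columns.Add("Payload", typeof(string));

// ... rows are added to 'buffer' as WCF calls arrive ...
buffer.Rows.Add(42, DateTime.UtcNow, "heartbeat");

// Flush periodically (e.g. every few seconds or every N rows).
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.UserEvents";  // assumed table name
        bulk.WriteToServer(buffer);                    // one round trip instead of many
    }
}
buffer.Clear();
```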

Are many requests to the cache efficient?

When someone visits my webpage, I make about 100 requests to the C# cache. Do you think I would be better off just getting the information from the database? If I make a database call, it is just one round trip to the database, but if I try to get this information from the cache, I have to make many round trips to the cache.
In-memory lookups are almost always faster than hopping over the network to query the database.
You should also consider more than the time of that single request. Even if that single request is slower doing the 100 in-memory lookups (which it won't be unless your data structures are inefficient), consider the bottleneck. The database always becomes the single bottleneck in a system that can scale out. By caching in memory, you let the system breathe and allow it to scale by adding more front-end servers.
But caches are not without their own problems. Lifetime is always a challenge, especially if you require the data to be updated quickly when it changes.
Caches are also a source of bugs. If you need to update the data and your app scales out, you can bounce around the farm and get inconsistent answers. That can be minimized with cluster affinity, or it may not even be an issue if the data doesn't change frequently or it's not critical for it to be up to date.
In-memory caches exist because a request to memory is nearly always quicker than a network request. Once you have the data in a cache, the only reason for re-querying the database is if you believe the data in your cache is out of date.
That said, if you are querying your cache many times, I don't see how this could be reduced to a single database query (unless you are obtaining data from your cache by row or field).
Regardless, multiple trips to memory should still be far quicker than refreshing from the database.
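For concreteness, a minimal sketch of the in-process cache pattern being discussed, with an invented cache key and a hypothetical loader helper: load the data once, then answer the ~100 per-page lookups from memory.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

static Dictionary<int, string> GetProductNames()
{
    // Serve from the in-process cache when present...
    var cached = MemoryCache.Default.Get("ProductNames") as Dictionary<int, string>;
    if (cached != null)
        return cached;

    // ...otherwise load once from the database (hypothetical helper) and cache it.
    var fresh = LoadProductNamesFromDatabase();
    MemoryCache.Default.Add("ProductNames", fresh,
        new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(5) });
    return fresh;
}

// Per page request: ~100 dictionary lookups, zero extra network round trips.
// var names = GetProductNames();
// string name = names[productId];
```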

SQL Server Compact compared to C# data structures

We currently use List<T> to store events from a simulation project we are running. We need to optimise memory utilisation and the time it takes to process the events in order to derive certain key metrics.
We thought of moving the event log to a SQL Server Compact database table and then possibly use Linq to calculate the metrics. From your experience do you think it will be faster to use SQL Server Compact than C#'s built-in data structures or are we going to have issues?
Some ideas.
MSMQ (Microsoft Message Queue)
You can have a thread dequeueing off MSMQ and updating metrics on the fly. If you need to store these events for later perusal, you can put them into the database as you dequeue them. MSMQ demonstrates much better scalability in these scenarios, especially when the publisher and subscriber have asymmetric processing speeds and binary data is being used (SQL can get bogged down with allocating space for VARBINARY, or allocating/splitting pages for indexes).
The two other SQL scenarios are complementary to this one: you can still use dequeueing to insert into SQL, to avoid any hiccups in your simulation while SQL allocates space.
You can side-step what @Aliostad said using this one, to a certain degree.
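A rough sketch of that dequeue-and-update loop using System.Messaging; the queue path, SimEvent type and metrics aggregator are assumptions.

```csharp
using System.Messaging;

// Consumer thread: pull simulation events off MSMQ, update metrics on the fly,
// and (optionally) persist them for later analysis.
var queue = new MessageQueue(@".\Private$\SimulationEvents");              // assumed queue path
queue.Formatter = new XmlMessageFormatter(new[] { typeof(SimEvent) });     // SimEvent: your event type

while (keepRunning)                     // keepRunning: your shutdown flag
{
    Message msg = queue.Receive();      // blocks until an event arrives
    var ev = (SimEvent)msg.Body;

    metrics.Update(ev);                 // hypothetical running-metrics aggregator
    // eventWriter.Enqueue(ev);         // optionally buffer for a bulk DB insert later
}
```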
OLAP (Online Analytical Processing)
Sounds like you might benefit from OLAP (cubes etc.). This will increase the overall runtime of your simulation but will improve the value of the data. Unfortunately this means forking out cash for one of the bigger SQL Server editions.
Stored Procedures
While Linq-to-SQL is great for 'your average developer', please keep away from it in scientific projects. There are a host of great tricks you can use in raw T-SQL, in addition to being able to inspect the query plan. If you want the best possible performance, plan your DB carefully and create stored procedures/UDFs to aggregate your data.
If you can only calculate some of the metrics in C#, do as much work in SQL beforehand, and then feel free to use Linq-to-SQL to grab the data.
Also remember that if you are inserting off the end of an MSMQ queue, you can aggressively index, which will speed up your metric calculations without impacting your simulation.
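On the stored-procedure route, the C# side can stay thin; a sketch, with dbo.AggregateSimulationMetrics as an invented procedure name, that pulls back only the already-aggregated rows:

```csharp
using System.Data;
using System.Data.SqlClient;

// Let SQL Server do the heavy aggregation and return only the summary rows.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.AggregateSimulationMetrics", conn))  // assumed proc
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@runId", runId);

    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string metric = reader.GetString(0);
            double value = reader.GetDouble(1);
            // feed the pre-aggregated values into the C#-side calculations
        }
    }
}
```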
I would only involve SQL if there is a real need for better memory utilization (i.e. you are actually running out of it).
Memory Mapped Files
This allows you to offload memory pressure onto disk, at a performance penalty when the data needs to be paged back in.
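If you go this route, .NET exposes memory-mapped files directly; a minimal sketch, with the file name and capacity invented:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Spill the event log into a memory-mapped file: the OS pages data in and out
// on demand, trading memory pressure for the cost of occasional page faults.
const long capacityBytes = 1L << 30;   // assumed 1 GB backing file

using (var mmf = MemoryMappedFile.CreateFromFile(
           "events.bin", FileMode.Create, "simEvents", capacityBytes))
using (var accessor = mmf.CreateViewAccessor())
{
    // Write one event's value at a computed offset, read it back later.
    long offset = 0;
    accessor.Write(offset, 42.0);            // writes a double at that offset
    double value = accessor.ReadDouble(offset);
}
```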
Overall
I would steer clear of Linq to define basic metrics - do it in SQL. MSMQ is without a doubt a huge winner in this case. Don't overcomplicate the memory issue and keep it in .NET if you are not running out of memory.
If you need to process all of the events, a C# List<T> will be faster than SQL Server. A plain array (T[]) will perform even better, especially if the elements are structs rather than classes, since structs are stored inline in the array whereas class instances are only referenced from it. Having the structs within the array reduces garbage collection pressure and increases cache locality.
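To illustrate that point, a small sketch with invented event fields: structs in a flat array are contiguous in memory, so a scan over them produces no garbage and stays cache-friendly.

```csharp
// Events stored as structs in a flat array: the data lives contiguously in the
// array itself, so scanning it is cache-friendly and creates no GC pressure.
public struct SimEvent
{
    public long Timestamp;
    public int Kind;
    public double Value;
}

SimEvent[] events = new SimEvent[1000000];
// ... events filled in by the simulation ...

// A tight loop over the array to derive a metric (here: mean value of one kind).
double sum = 0; int count = 0;
for (int i = 0; i < events.Length; i++)
{
    if (events[i].Kind == 3)   // 3: hypothetical event kind of interest
    {
        sum += events[i].Value;
        count++;
    }
}
double mean = count > 0 ? sum / count : 0.0;
```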
If you only need to process part of the events, I think the solutions rank in this order when it comes to speed:
1. C# data structures crafted especially for your needs.
2. SQL Server.
3. Naive C# data structures, traversing a list searching for the right elements.
It sounds like you're thinking you need to have the events in a database in order to use LINQ. This isn't the case: you can use LINQ to Objects with C#'s built-in data structures.
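For example, a metric can be computed straight over the existing List<T> with LINQ to Objects (using an event type like the SimEvent sketched above):

```csharp
using System.Collections.Generic;
using System.Linq;

// LINQ to Objects works directly over the in-memory List<T>; no database needed.
List<SimEvent> events = GetEvents();   // the existing in-memory event log (assumed)

var metricsByKind = events
    .GroupBy(e => e.Kind)
    .Select(g => new
    {
        Kind = g.Key,
        Count = g.Count(),
        Average = g.Average(e => e.Value)
    })
    .ToList();
```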
It depends on what you mean by "faster". If this is about the performance of data access, it's all about how much data you have; with big data, the DB solution, used only for statistical purposes, is definitely a good choice.
As a DB for this kind of purpose I would suggest SQLite: it is a single-file, fully ACID-compliant DB (no service needed, like SQL Server Compact). But again, this depends on your data size, as SQLite has a lower data-size limit than SQL Server.
Regards.
"We need to optimise memory utilisation" - use SQL Server CE.
"the time it takes to process the events" - use LINQ to Objects.
These two objectives conflict, and you need to choose the one that matters more to you.
