I have a long-running Windows service that constantly receives data, processes it, and then puts it into the database. I use stored procedures for the complex operations, but some of them have many parameters.
I'm aware that this is the suggested 'best practice':
using (IDbConnection connection = GetConnection())
{
    connection.Open();
    // Do stuff
    connection.Close();
}
Resulting in short-lived connections, but taking full advantage of connection pooling. However, this practice really seems to negate the benefits of a stored procedure. I have something like this at the moment:
while (true)
{
    var items = GetData(); // network I/O
    using (IDbConnection connection = GetConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        var preparedStatement1 = SQL.Prepare(connection, "...", ...);
        var preparedStatement2 = SQL.Prepare(connection, "...", ...);
        var preparedStatement3 = SQL.Prepare(connection, "...", ...);
        foreach (var item in items)
        {
            // loop which calls SQL statements.
        }
        connection.Close();
    }
}
I really feel that I should open the connection outside of the while loop so it stays alive for a long time, and prepare the statements before entering the loop. That would give me the full benefit of using the stored procedures:
using (IDbConnection connection = GetConnection())
{
    connection.Open();
    var tran = connection.BeginTransaction();
    var preparedStatement1 = SQL.Prepare(connection, "...", ...);
    var preparedStatement2 = SQL.Prepare(connection, "...", ...);
    var preparedStatement3 = SQL.Prepare(connection, "...", ...);
    while (!service.IsStopped)
    {
        var items = GetData(); // network I/O
        foreach (var item in items)
        {
            // loop which calls SQL statements.
        }
    }
    connection.Close();
}
So the question is: does the performance benefit of stored procedures outweigh the "risks" of leaving connections open for a long time? The best-practice advice never seems to mention prepared statements, and the MSDN documentation (I'm using SQL Server) seems to suggest that calling Prepare() will sometimes be a no-op: SqlCommand.Prepare()
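For reference, this is the kind of Prepare() usage I mean (a sketch only; the table, parameter names, and item fields are made up, and it needs using System.Data; and using System.Data.SqlClient;). Prepare() requires each parameter's type, and size for variable-length types, to be set explicitly before the call:

// Sketch with made-up table/parameter names.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "INSERT INTO Items (Id, Payload) VALUES (@id, @payload)", connection))
{
    command.Parameters.Add("@id", SqlDbType.Int);
    command.Parameters.Add("@payload", SqlDbType.NVarChar, 256);

    connection.Open();
    command.Prepare(); // per the SqlCommand.Prepare docs, this can be a no-op

    foreach (var item in items)
    {
        command.Parameters["@id"].Value = item.Id;
        command.Parameters["@payload"].Value = item.Payload;
        command.ExecuteNonQuery();
    }
}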
Related
I have the following code:
public void Execute(string Query, params SqlParameter[] Parameters)
{
    using (var Connection = new SqlConnection(Configuration.ConnectionString))
    {
        Connection.Open();
        using (var Command = new SqlCommand(Query, Connection))
        {
            if (Parameters.Length > 0)
            {
                Command.Parameters.Clear();
                Command.Parameters.AddRange(Parameters);
            }
            Command.ExecuteNonQuery();
        }
    }
}
The method may be called 2 or 3 times for different queries, but in the same manner.
For example:
Insert an Employee
Insert Employee Certificates
Update Degree of Employee in another table [a failure can occur here, for example]
If point [3] fails, the commands that have already executed should not take effect and must be rolled back.
I know I can put a SqlTransaction above and use its Commit() method. But what about the 3rd point if it fails? I think only point 3 will be rolled back, and points 1 and 2 will not. How do I solve this, and what approach should I take?
Should I use a SqlCommand[] array? What should I do?
I only found a similar question, but on CodeProject:
See Here
Without changing your Execute method you can do this
var tranOpts = new TransactionOptions()
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TransactionManager.MaximumTimeout
};

using (var tran = new TransactionScope(TransactionScopeOption.Required, tranOpts))
{
    Execute("INSERT ...");
    Execute("INSERT ...");
    Execute("UPDATE ...");
    tran.Complete();
}
SqlClient will cache the internal SqlConnection that is enlisted in the Transaction and reuse it for each call to Execute. So you even end up with a local (not distributed) transaction.
This is all explained in the docs here: System.Transactions Integration with SQL Server
There are a few ways to do it.
The way that probably involves changing the least code and involves the least complexity is to chain multiple SQL statements into a single query. It's perfectly fine to build a string for the Query argument that runs more than one statement, including BEGIN TRANSACTION, COMMIT, and (if needed) ROLLBACK. Basically, keep a whole stored procedure in your C# code. This also has the nice benefit of making it easier to use version control with your procedures.
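For example, something along these lines (table and column names are placeholders, and THROW needs SQL Server 2012+), passed straight to your existing Execute method:

// Sketch only: table/column names are placeholders for your real queries.
string query = @"
BEGIN TRANSACTION;
BEGIN TRY
    INSERT INTO Employee (Id, Name) VALUES (@Id, @Name);
    INSERT INTO EmployeeCertificate (EmployeeId, Certificate) VALUES (@Id, @Certificate);
    UPDATE EmployeeDegree SET Degree = @Degree WHERE EmployeeId = @Id;
    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH";

Execute(query,
    new SqlParameter("@Id", 1),
    new SqlParameter("@Name", "Alice"),
    new SqlParameter("@Certificate", "MCSD"),
    new SqlParameter("@Degree", "BSc"));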
But it still feels kind of hackish.
One way to reduce that effect is to mark the Execute() method private. Then, have an additional method in the class for each query. In this way, the long SQL strings are isolated, and using the database feels more like using a local API. For more complicated applications, this might instead be a whole separate assembly with a few types managing logical functional areas, where core methods like Execute() are internal. This is a good idea anyway, regardless of how you end up supporting transactions.
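A rough sketch of that shape (the class, table, and column names are made up):

// Hypothetical wrapper: the long SQL strings live here, and callers see a small API.
public class EmployeeRepository
{
    public void InsertEmployee(int id, string name)
    {
        Execute("INSERT INTO Employee (Id, Name) VALUES (@Id, @Name)",
            new SqlParameter("@Id", id),
            new SqlParameter("@Name", name));
    }

    public void InsertEmployeeCertificate(int employeeId, string certificate)
    {
        Execute("INSERT INTO EmployeeCertificate (EmployeeId, Certificate) VALUES (@EmployeeId, @Certificate)",
            new SqlParameter("@EmployeeId", employeeId),
            new SqlParameter("@Certificate", certificate));
    }

    // The original Execute method, now private so callers can't pass raw SQL.
    private void Execute(string query, params SqlParameter[] parameters)
    {
        using (var connection = new SqlConnection(Configuration.ConnectionString))
        using (var command = new SqlCommand(query, connection))
        {
            if (parameters.Length > 0)
            {
                command.Parameters.AddRange(parameters);
            }
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}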
And speaking of procedures, stored procedures are also a perfectly fine way to handle this. Have one stored procedure to do all the work, and call it when ready.
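For example, with a single procedure that wraps all three steps (the procedure and parameter names here are made up), the C# side becomes one call:

// Hypothetical procedure that performs the two inserts and the update in one transaction.
// Needs using System.Data; for CommandType.
using (var connection = new SqlConnection(Configuration.ConnectionString))
using (var command = new SqlCommand("dbo.AddEmployeeWithCertificates", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@Name", "Alice");
    command.Parameters.AddWithValue("@Certificate", "MCSD");
    command.Parameters.AddWithValue("@Degree", "BSc");

    connection.Open();
    command.ExecuteNonQuery();
}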
Another option is overloading the method to accept multiple queries and parameter collections:
public void Execute(string TransactionName, string[] Queries, params SqlParameter[][] Parameters)
{
    using (var Connection = new SqlConnection(Configuration.ConnectionString))
    {
        Connection.Open();
        using (var Transaction = Connection.BeginTransaction(TransactionName))
        {
            try
            {
                for (int i = 0; i < Queries.Length; i++)
                {
                    using (var Command = new SqlCommand(Queries[i], Connection, Transaction))
                    {
                        if (Parameters[i].Length > 0)
                        {
                            Command.Parameters.Clear();
                            Command.Parameters.AddRange(Parameters[i]);
                        }
                        Command.ExecuteNonQuery();
                    }
                }
                Transaction.Commit();
            }
            catch
            {
                Transaction.Rollback();
                throw; //I'm assuming you're handling exceptions at a higher level in the code
            }
        }
    }
}
Though I'm not sure how the params keyword works with an array of arrays... I've just not tried that option, but something along these lines would work. The weakness here is also that it's not trivial to have a later query depend on a result from an earlier query, and even queries with no parameters would still need an empty Parameters array as a placeholder.
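Calling it would look roughly like this (made-up table names; with params, each parameter array can be passed as its own argument, or you can pass an explicit jagged array instead):

// Each new[] { ... } after the query array becomes one element of Parameters[][].
Execute("EmployeeTran",
    new[]
    {
        "INSERT INTO Employee (Id, Name) VALUES (@Id, @Name)",
        "INSERT INTO EmployeeCertificate (EmployeeId, Certificate) VALUES (@Id, @Certificate)",
        "UPDATE EmployeeDegree SET Degree = @Degree WHERE EmployeeId = @Id"
    },
    new[] { new SqlParameter("@Id", 1), new SqlParameter("@Name", "Alice") },
    new[] { new SqlParameter("@Id", 1), new SqlParameter("@Certificate", "MCSD") },
    new[] { new SqlParameter("@Id", 1), new SqlParameter("@Degree", "BSc") });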
A final option is extending the type holding your Execute() method to support transactions. The trick here is it's common (and desirable) to have this type be static, but supporting transactions requires re-using common connection and transaction objects. Given the implied long-running nature of a transaction, you have to support more than one at a time, which means both instances and implementing IDisposable.
using (var connection = new SqlConnection(Configuration.ConnectionString))
{
    SqlCommand command = connection.CreateCommand();
    SqlTransaction transaction;
    connection.Open();
    transaction = connection.BeginTransaction("Transaction");
    command.Connection = connection;
    command.Transaction = transaction;
    command.CommandText = Query;
    try
    {
        if (Parameters.Length > 0)
        {
            command.Parameters.Clear();
            command.Parameters.AddRange(Parameters);
        }
        command.ExecuteNonQuery();
        transaction.Commit();
    }
    catch (Exception e)
    {
        try
        {
            transaction.Rollback();
        }
        catch (Exception ex2)
        {
            //trace: the rollback itself can fail (e.g. if the connection is broken)
        }
    }
}
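A sketch of what that IDisposable wrapper could look like (the class name is made up, and this is a starting point rather than a finished implementation):

// Hypothetical sketch: an instance owns one connection and one transaction,
// and disposing it rolls back unless Commit() was called.
public sealed class TransactionalExecutor : IDisposable
{
    private readonly SqlConnection connection;
    private readonly SqlTransaction transaction;

    public TransactionalExecutor()
    {
        connection = new SqlConnection(Configuration.ConnectionString);
        connection.Open();
        transaction = connection.BeginTransaction();
    }

    public void Execute(string query, params SqlParameter[] parameters)
    {
        using (var command = new SqlCommand(query, connection, transaction))
        {
            if (parameters.Length > 0)
            {
                command.Parameters.AddRange(parameters);
            }
            command.ExecuteNonQuery();
        }
    }

    public void Commit() => transaction.Commit();

    public void Dispose()
    {
        transaction.Dispose(); // rolls back if Commit() was never called
        connection.Dispose();
    }
}

// Usage:
// using (var db = new TransactionalExecutor())
// {
//     db.Execute("INSERT ...", parameters1);
//     db.Execute("UPDATE ...", parameters2);
//     db.Commit();
// }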
I am using the following way of querying in dapper on a MySQL database.
using (var db = new MySqlConnection(ConfigurationHandler.GetSection<string>(StringConstants.ConnectionString)))
{
    resultSet = db.Execute(UpdateQuery, new { _val = terminalId }, commandType: CommandType.Text);
    db.Close();//should i call this or not
    db.Dispose();//should i call this or not
}
Is it good practice to explicitly call db.Close() and db.Dispose()? My application could be handling hundreds of requests per second.
A using block is a convenience around the IDisposable interface. It ensures that the Dispose method is called at the end of the block.
See: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/using-statement.
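Roughly speaking, the compiler expands your using block into a try/finally, so Dispose() always runs; and for a connection, Dispose() also closes it:

// What the using block in your code compiles down to, approximately.
var db = new MySqlConnection(ConfigurationHandler.GetSection<string>(StringConstants.ConnectionString));
try
{
    resultSet = db.Execute(UpdateQuery, new { _val = terminalId }, commandType: CommandType.Text);
}
finally
{
    if (db != null)
    {
        ((IDisposable)db).Dispose(); // for a connection, this also closes it
    }
}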
In your case you can remove the explicit calls to db.Close() and db.Dispose() because you are not re-using the connection object.
using (var db = new MySqlConnection(ConfigurationHandler.GetSection<string>(StringConstants.ConnectionString)))
{
    resultSet = db.Execute(UpdateQuery,
        new { _val = terminalId }, commandType: CommandType.Text);
}
The following link provides further details about .Close vs .Dispose: https://stackoverflow.com/a/61171/1028323
I am trying to understand what's happening in the background when a simple select query is executed by a client.
I am using C# ASP.NET Web Forms, and I captured the traffic with Wireshark.
public DBC(string procedureName, params object[] procParams)
{
    strError = null;
    using (MySqlConnection connection = new MySqlConnection(GetConnectionString()))
    {
        connection.Close();
        try
        {
            connection.Open();
            MySqlCommand cmd = new MySqlCommand(procedureName, connection);
            cmd.CommandType = CommandType.StoredProcedure;
            //if we use params for stored procedure
            if (procParams != null)
            {
                int i = 1;
                foreach (object paramValue in procParams)
                {
                    cmd.Parameters.Add(new MySqlParameter("#param_" + i, paramValue.ToString()));
                    i++;
                }
            }
            if (procedureName.Contains("get"))
            {
                dtLoaded = new DataTable();
                dtLoaded.Load(cmd.ExecuteReader());
            }
            else
            {
                cmd.ExecuteNonQuery();
            }
        }
        catch (Exception ex)
        {
            strError = ErrorHandler.ErrorToMessage(ex);
        }
        finally
        {
            connection.Close();
            connection.Dispose();
        }
    }
}
This is a simple SELECT * FROM TABLE query inside a try-catch statement. In the finally block, the connection is closed and disposed.
Why does it produce 43 entries in the capture? I don't understand why there are so many. Could somebody explain it to me?
Many thanks!
I assume you're using Oracle's Connector/NET. It performs a lot of not-strictly-necessary queries after opening a connection, e.g., SHOW VARIABLES to retrieve some server settings. (In 8.0.17 and later, this has been optimised slightly.)
Executing a stored procedure requires retrieving information about the stored procedure (to align parameters); it's more "expensive" than just executing a SQL statement directly. (You can disable this with CheckParameters=false, but I wouldn't recommend it.)
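If you did want to try that (again, not recommended), it's just a connection string option; something like the following, assuming the CheckParameters keyword mentioned above is accepted by your Connector/NET version (the procedure name is made up):

// Appending the option to the existing connection string; skipping the
// metadata lookup means parameter names/order are no longer validated.
var connectionString = GetConnectionString() + ";CheckParameters=false";
using (var connection = new MySqlConnection(connectionString))
using (var cmd = new MySqlCommand("my_procedure", connection) { CommandType = CommandType.StoredProcedure })
{
    // ...
}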
You can switch to MySqlConnector if you want a more efficient .NET client library. It's been tuned for performance (in both client CPU time and network I/O) and won't perform as much unnecessary work when opening a connection and executing a query. (MySqlConnector is the client library used for the .NET/MySQL benchmarks in the TechEmpower Framework Benchmarks.)
I'm using load tests to analyze the "ball park" performance of Dapper accessing SQL Server. My laptop is simultaneously the load generator and the test target. My laptop has 2 cores, 16 GB RAM, and is running Windows 10 Pro, v1709. The database is SQL Server 2017 running in a Docker container (the container's Hyper-V VM has 3 GB RAM). My load-test and test code use .NET 4.6.1.
My load-test results after 15 seconds of a simulated 10 simultaneous clients are as follows:
Synchronous Dapper code: 750+ transactions per second.
Asynchronous Dapper code: 4 to 8 transactions per second. YIKES!
I realize that async can sometimes be slower than synchronous code. I also realize that my test setup is weak. However, I shouldn't be seeing such horrible performance from asynchronous code.
I've narrowed the problem to something associated with Dapper and the System.Data.SqlClient.SqlConnection. I need help to finally solve this. Profiler results are below.
I figured out a cheesy way to force my async code to achieve 650+ transactions per second, which I'll discuss in a bit, but first it's time to show my code, which is just a console app. I have a test class:
public class FitTest
{
    private List<ItemRequest> items;

    public FitTest()
    {
        //Parameters used for the Dapper call to the stored procedure.
        items = new List<ItemRequest> {
            new ItemRequest { SKU = "0010015488000060", ReqQty = 2 },
            new ItemRequest { SKU = "0010015491000060", ReqQty = 1 }
        };
    }
    ... //the rest not listed.
Synchronous Test Target
Within the FitTest class, under load, the following test-target method achieves 750+ transactions per second:
public Task LoadDB()
{
    var skus = items.Select(x => x.SKU);
    string procedureName = "GetWebInvBySkuList";
    string userDefinedTable = "[dbo].[StringList]";
    string connectionString = "Data Source=localhost;Initial Catalog=Web_Inventory;Integrated Security=False;User ID=sa;Password=1Secure*Password1;Connect Timeout=30;Encrypt=False;TrustServerCertificate=True;ApplicationIntent=ReadWrite;MultiSubnetFailover=False";
    var dt = new DataTable();
    dt.Columns.Add("Id", typeof(string));
    foreach (var sku in skus)
    {
        dt.Rows.Add(sku);
    }
    using (var conn = new SqlConnection(connectionString))
    {
        var inv = conn.Query<Inventory>(
            procedureName,
            new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
            commandType: CommandType.StoredProcedure);
        return Task.CompletedTask;
    }
}
I am not explicitly opening or closing the SqlConnection. I understand that Dapper does that for me. Also, the only reason the above code returns a Task is because my load-generation code is designed to work with that signature.
Asynchronous Test Target
The other test-target method in my FitTest class is this:
public async Task<IEnumerable<Inventory>> LoadDBAsync()
{
    var skus = items.Select(x => x.SKU);
    string procedureName = "GetWebInvBySkuList";
    string userDefinedTable = "[dbo].[StringList]";
    string connectionString = "Data Source=localhost;Initial Catalog=Web_Inventory;Integrated Security=False;User ID=sa;Password=1Secure*Password1;Connect Timeout=30;Encrypt=False;TrustServerCertificate=True;ApplicationIntent=ReadWrite;MultiSubnetFailover=False";
    var dt = new DataTable();
    dt.Columns.Add("Id", typeof(string));
    foreach (var sku in skus)
    {
        dt.Rows.Add(sku);
    }
    using (var conn = new SqlConnection(connectionString))
    {
        return await conn.QueryAsync<Inventory>(
            procedureName,
            new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
            commandType: CommandType.StoredProcedure).ConfigureAwait(false);
    }
}
Again, I'm not explicitly opening or closing the connection, because Dapper does that for me. I have also tested this code with explicit opening and closing; it does not change the performance. The profiler results for the load generator acting against the above code (4 TPS) are as follows:
What DOES change the performance is if I change the above as follows:
//using (var conn = new SqlConnection(connectionString))
//{
    var inv = await conn.QueryAsync<Inventory>(
        procedureName,
        new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
        commandType: CommandType.StoredProcedure);
    var foo = inv.ToArray();
    return inv;
//}
In this case I've converted the SqlConnection into a private member of the FitTest class and initialized it in the constructor. That is, one SqlConnection per client per load-test session. It is never disposed during the load-test. I also changed the connection string to include "MultipleActiveResultSets=True", because now I started getting those errors.
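Roughly, that change looks like this (the field name is arbitrary):

// Sketch of the change described above: one connection per FitTest instance,
// i.e. per simulated client, kept open for the whole load-test session.
public class FitTest
{
    private readonly SqlConnection conn;
    private List<ItemRequest> items;

    public FitTest()
    {
        // Same connection string as before, plus "MultipleActiveResultSets=True".
        conn = new SqlConnection(connectionString);
        conn.Open();
        // ... items initialized as before
    }
}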
With these changes, my results become: 640+ transactions per second, and with 8 exceptions thrown. The exceptions were all "InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is connecting." The profiler results in this case:
That looks to me like a synchronization bug in Dapper with the SqlConnection.
Load-Generator
My load-generator, a class called Generator, is designed to be given a list of delegates when constructed. Each delegate has a unique instantiation of the FitTest class. If I supply an array of 10 delegates, it is interpreted as representing 10 clients to be used for generating load in parallel.
To kick off the load test, I have this:
//This `testFuncs` array (indirectly) points to either instances
//of the synchronous test-target, or the async test-target, depending
//on what I'm measuring.
private Func<Task>[] testFuncs;
private Dictionary<int, Task> map;
private TaskCompletionSource<bool> completionSource;

public void RunWithMultipleClients()
{
    completionSource = new TaskCompletionSource<bool>();
    //Create a dictionary that has indexes and Task completion status info.
    //The indexes correspond to the testFuncs[] array (viz. the test clients).
    map = testFuncs
        .Select((f, j) => new KeyValuePair<int, Task>(j, Task.CompletedTask))
        .ToDictionary(p => p.Key, v => v.Value);
    //scenario.Duration is usually '15'. In other words, this test
    //will terminate after generating load for 15 seconds.
    Task.Delay(scenario.Duration * 1000).ContinueWith(x => {
        running = false;
        completionSource.SetResult(true);
    });
    RunWithMultipleClientsLoop();
    completionSource.Task.Wait();
}
So much for the setup; the actual load is generated as follows:
public void RunWithMultipleClientsLoop()
{
    //while (running)
    //{
        var idleList = map.Where(x => x.Value.IsCompleted).Select(k => k.Key).ToArray();
        foreach (var client in idleList)
        {
            //I've tried both of the following. The `Task.Run` version
            //is about 20% faster for the synchronous test target.
            map[client] = Task.Run(testFuncs[client]);
            //map[client] = testFuncs[client]();
        }
        Task.WhenAny(map.Values.ToArray())
            .ContinueWith(x => { if (running) RunWithMultipleClientsLoop(); });
        // Task.WaitAny(map.Values.ToArray());
    //}
}
The while loop and Task.WaitAny, commented out, represent a different approach that has nearly the same performance; I keep it around for experiments.
One last detail. Each of the "client" delegates I pass in is first wrapped inside a metrics-capture function. The metrics capture function looks like this:
private async Task LoadLogic(Func<Task> testCode)
{
    try
    {
        if (!running)
        {
            slagCount++;
            return;
        }
        //This is where the actual test target method
        //is called.
        await testCode().ConfigureAwait(false);
        if (running)
        {
            successCount++;
        }
        else
        {
            slagCount++;
        }
    }
    catch (Exception ex)
    {
        if (ex.Message.Contains("Assert"))
        {
            errorCount++;
        }
        else
        {
            exceptionCount++;
        }
    }
}
When my code runs, I do not receive any errors or exceptions.
Ok, what am I doing wrong? In the worst case scenario, I would expect the async code to be only slightly slower than synchronous.
When inserting data into the database using a parallel foreach, I get the following error after some amount of data has been inserted:
'The connection pool has been exhausted'
try
{
    var connectionString = ConfigurationManager.ConnectionStrings["Connection"].ConnectionString;
    Parallel.ForEach(Enumerable.Range(0, 1000), (_) =>
    {
        using (var connection = new NpgsqlConnection(connectionString))
        {
            connection.Open();
            using (var command = new NpgsqlCommand("fn_tetsdata", connection) { CommandType = CommandType.StoredProcedure })
            {
                command.Parameters.AddWithValue("firstname", "test");
                command.Parameters.AddWithValue("lastname", "test");
                command.Parameters.AddWithValue("id", 10);
                command.Parameters.AddWithValue("designation", "test");
                command.ExecuteNonQuery();
            }
            connection.Close();
        }
    });
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
Constrain the amount of parallelism with MaxDegreeOfParallelism; by default it could exceed the number of DB connections you have. Find a balance between parallelising your work and not killing the DB :)
Parallel.ForEach(yourListOfStuff,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    stuff => { YourMethod(stuff); }
);
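Applied to the loop in your question, that would look something like this (10 is only an illustrative cap; tune it against your connection pool size):

// Same insert loop as in the question, but with the degree of parallelism capped.
Parallel.ForEach(
    Enumerable.Range(0, 1000),
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    _ =>
    {
        // ... same connection/command body as in the question ...
    });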
I assume you're using parallelism to improve performance. If that's the case, then first you need a baseline. Run the 1,000 queries in serial, creating a new connection each time (which in reality just pulls one from the pool).
Then try it with the same connection object and see if the performance improves.
Then try it with the same command object, just changing the parameter values.
Then try it in parallel with the same connection, so you're not creating 1,000 connection objects, which you've already tried.
I would be surprised if you got a significant performance improvement by using parallelism, since Parallel improves the performance of CPU-bound tasks, and data queries are generally much more bound by I/O than CPU.
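For the "same connection, same command" baseline, a sketch (assuming the same fn_tetsdata procedure and connection string as in your question; only the parameter values change per iteration):

// Serial baseline: one connection, one command, only the values change.
using (var connection = new NpgsqlConnection(connectionString))
{
    connection.Open();
    using (var command = new NpgsqlCommand("fn_tetsdata", connection) { CommandType = CommandType.StoredProcedure })
    {
        command.Parameters.AddWithValue("firstname", "test");
        command.Parameters.AddWithValue("lastname", "test");
        command.Parameters.AddWithValue("id", 0);
        command.Parameters.AddWithValue("designation", "test");

        for (int i = 0; i < 1000; i++)
        {
            command.Parameters["id"].Value = i; // change only the values per iteration
            command.ExecuteNonQuery();
        }
    }
}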