When inserting data into the database using Parallel.ForEach, I get the following error after some amount of data has been inserted:
'The connection pool has been exhausted.'
try
{
    var connectionString = ConfigurationManager.ConnectionStrings["Connection"].ConnectionString;
    Parallel.ForEach(Enumerable.Range(0, 1000), (_) =>
    {
        using (var conn = new NpgsqlConnection(connectionString))
        {
            conn.Open();
            using (var command = new NpgsqlCommand("fn_tetsdata", conn) { CommandType = CommandType.StoredProcedure })
            {
                command.Parameters.AddWithValue("firstname", "test");
                command.Parameters.AddWithValue("lastname", "test");
                command.Parameters.AddWithValue("id", 10);
                command.Parameters.AddWithValue("designation", "test");
                command.ExecuteNonQuery();
            }
            conn.Close();
        }
    });
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
Constrain the degree of parallelism with MaxDegreeOfParallelism; by default it can exceed the number of DB connections you have. Find a balance between parallelising your work and not killing the DB :)
Parallel.ForEach(yourListOfStuff,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
stuff => { YourMethod(stuff); }
);
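Applied to the Npgsql code from the question, a minimal sketch might look like this (10 is an arbitrary starting point; tune it against your pool size and the server's connection limit):

var connectionString = ConfigurationManager.ConnectionStrings["Connection"].ConnectionString;
Parallel.ForEach(
    Enumerable.Range(0, 1000),
    new ParallelOptions { MaxDegreeOfParallelism = 10 }, // cap the concurrent DB work
    _ =>
    {
        using (var conn = new NpgsqlConnection(connectionString))
        {
            conn.Open();
            using (var command = new NpgsqlCommand("fn_tetsdata", conn) { CommandType = CommandType.StoredProcedure })
            {
                command.Parameters.AddWithValue("firstname", "test");
                command.Parameters.AddWithValue("lastname", "test");
                command.Parameters.AddWithValue("id", 10);
                command.Parameters.AddWithValue("designation", "test");
                command.ExecuteNonQuery();
            }
        }
    });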
I assume you're using parallelism to improve performance. If that's the case then first you need a baseline. Run the 1,000 queries in serial, creating a new connection each time (which in reality just pulls one from the pool).
Then try it with the same connection object and see if the performance improves.
Then try it with the same command object, just changing the parameter values.
Then try it in parallel with the same connection so you're not creating 1,000 connection objects, which you've already tried.
I would be surprised if you got a significant performance improvement by using parallelism, since Parallel improves the performance of CPU-bound tasks, and data queries are generally much more bound by I/O than CPU.
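For the serial baseline with one connection and one command, a minimal sketch (reusing the stored procedure from the question, only changing parameter values per iteration; the varying "id" value is just for illustration):

var connectionString = ConfigurationManager.ConnectionStrings["Connection"].ConnectionString;
using (var conn = new NpgsqlConnection(connectionString))
{
    conn.Open();
    using (var command = new NpgsqlCommand("fn_tetsdata", conn) { CommandType = CommandType.StoredProcedure })
    {
        // Define the parameters once, then only change their values in the loop.
        command.Parameters.AddWithValue("firstname", "test");
        command.Parameters.AddWithValue("lastname", "test");
        command.Parameters.AddWithValue("id", 0);
        command.Parameters.AddWithValue("designation", "test");
        for (var i = 0; i < 1000; i++)
        {
            command.Parameters["id"].Value = i;
            command.ExecuteNonQuery();
        }
    }
}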
Related
I have a program that loads a large quantity of data (~800K-1M rows per iteration) in a Task running on the threadpool (see offending code sample below); no more than 4 tasks running concurrently. This is the only place in the program that a connection is made to this database. When running the program on my laptop (and other coworkers identical laptops), the program functions perfectly. However, we have access to another workstation via remote desktop that is substantially more powerful than our laptops. The program fails about 1/3 to 1/2 of the way through its list. All of the tasks return an exception.
The first exception was: "Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached." I've tried googling, binging, searching on StackOverflow, and banging my head against the table trying to figure out how this can be the case. With no more than 4 tasks running at once, there shouldn't be more than 4 connections at any one time.
In response to this, I tried two things: (1) I added a try/catch around the conn.Open() line that would clear the pool if an InvalidOperationException appears--that appeared to work [I didn't let it run all the way through, but it got substantially past where it failed before], but at the cost of performance. (2) I changed ConnectionTimeout to 30 seconds instead of 15, which did not work (but let it proceed a little further). I also tried at one point to set ConnectRetryInterval=4 (mistakenly choosing this instead of ConnectRetryCount)--this led to a different error, "The maximum number of requests is 4,800", which is strange because we still shouldn't be anywhere near 4,800 requests or connections.
In short, I'm at a loss because I can't figure out what is causing this connection leak only on a higher speed computer. I am also unable to get Visual Studio on that computer to debug directly--any thoughts anyone might have on where to look to try and resolve this would be much appreciated.
(Follow-up to c# TaskFactory ContinueWhenAll unexpectedly running before all tasks complete)
private void LoadData()
{
    SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
    builder.DataSource = "redacted";
    builder.UserID = "redacted";
    builder.Password = "redacted";
    builder.InitialCatalog = "redacted";
    builder.ConnectTimeout = 30;
    using (SqlConnection conn = new SqlConnection(builder.ConnectionString))
    {
        //try
        //{
        //    conn.Open();
        //} catch (InvalidOperationException)
        //{
        //    SqlConnection.ClearPool(conn);
        //    conn.Open();
        //}
        conn.Open();
        string monthnum = _monthsdict.First((x) => x.Month == _month).MonthNum;
        string yearnum = _monthsdict.First((x) => x.Month == _month).YearNum;
        string nextmonthnum = _monthsdict[Array.IndexOf(_monthsdict, _monthsdict.First((x) => x.Month == _month)) + 1].MonthNum;
        string nextyearnum = _monthsdict[Array.IndexOf(_monthsdict, _monthsdict.First((x) => x.Month == _month)) + 1].YearNum;
        SqlCommand cmd = new SqlCommand();
        cmd.Connection = conn;
        cmd.CommandText = @"redacted";
        cmd.Parameters.AddWithValue("@redacted", redacted);
        cmd.Parameters.AddWithValue("@redacted", redacted);
        cmd.Parameters.AddWithValue("@redacted", redacted);
        cmd.CommandTimeout = 180;
        SqlDataReader reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            Data data = new Data();
            int col1 = reader.GetOrdinal("col1");
            int col2 = reader.GetOrdinal("col2");
            int col3 = reader.GetOrdinal("col3");
            int col4 = reader.GetOrdinal("col4");
            data.redacted = redacted;
            data.redacted = redacted;
            data.redacted = redacted;
            data.redacted = redacted;
            data.redacted = redacted;
            data.Calculate();
            _data.Add(data); //not a mistake, referring to another class variable
        }
        reader.Close();
        cmd.Dispose();
        conn.Close();
        conn.Dispose();
    }
}
This turned out to be a classic case of not reading the documentation closely enough. I was trying to cap the maximum number of threads at 4 using ThreadPool.SetMaxThreads, but the maximum cannot be set lower than the number of processors. The workstation it failed on has 8 processors, so there was never a cap: it was running as many tasks as the task scheduler felt appropriate, and it eventually hit the connection pool limit.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setmaxthreads
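If the intent is a hard cap of four concurrent loads regardless of processor count, one option is a SemaphoreSlim around the task body rather than ThreadPool.SetMaxThreads (a sketch; LoadDataThrottled is a hypothetical wrapper around the LoadData method above):

private static readonly SemaphoreSlim _loadGate = new SemaphoreSlim(4); // at most 4 loads at a time

private async Task LoadDataThrottled()
{
    await _loadGate.WaitAsync();
    try
    {
        // Run the existing synchronous load on the thread pool.
        await Task.Run(() => LoadData());
    }
    finally
    {
        _loadGate.Release();
    }
}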
I'm using load-tests to analyze the "ball park" performance of Dapper accessing SQL Server. My laptop is simultaneously the load-generator and the test target. My laptop has 2 cores, 16GB RAM, and is running Windows 10 Pro, v1709. The database is SQL Server 2017 running in a Docker container (the container's Hyper-V VM has 3GB RAM). My load-test and test code is using .net 4.6.1.
My load-test results after 15 seconds of a simulated 10 simultaneous clients are as follows:
Synchronous Dapper code: 750+ transactions per second.
Asynchronous Dapper code: 4 to 8 transactions per second. YIKES!
I realize that async can sometimes be slower than synchronous code. I also realize that my test setup is weak. However, I shouldn't be seeing such horrible performance from asynchronous code.
I've narrowed the problem to something associated with Dapper and the System.Data.SqlClient.SqlConnection. I need help to finally solve this. Profiler results are below.
I figured out a cheesy way to force my async code to achieve 650+ transactions per second, which I'll discuss in a bit, but first here is my code (it is just a console app). I have a test class:
public class FitTest
{
    private List<ItemRequest> items;

    public FitTest()
    {
        //Parameters used for the Dapper call to the stored procedure.
        items = new List<ItemRequest> {
            new ItemRequest { SKU = "0010015488000060", ReqQty = 2 },
            new ItemRequest { SKU = "0010015491000060", ReqQty = 1 }
        };
    }
    ... //the rest not listed.
Synchronous Test Target
Within the FitTest class, under load, the following test-target method achieves 750+ transactions per second:
public Task LoadDB()
{
    var skus = items.Select(x => x.SKU);
    string procedureName = "GetWebInvBySkuList";
    string userDefinedTable = "[dbo].[StringList]";
    string connectionString = "Data Source=localhost;Initial Catalog=Web_Inventory;Integrated Security=False;User ID=sa;Password=1Secure*Password1;Connect Timeout=30;Encrypt=False;TrustServerCertificate=True;ApplicationIntent=ReadWrite;MultiSubnetFailover=False";
    var dt = new DataTable();
    dt.Columns.Add("Id", typeof(string));
    foreach (var sku in skus)
    {
        dt.Rows.Add(sku);
    }
    using (var conn = new SqlConnection(connectionString))
    {
        var inv = conn.Query<Inventory>(
            procedureName,
            new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
            commandType: CommandType.StoredProcedure);
        return Task.CompletedTask;
    }
}
I am not explicitly opening or closing the SqlConnection. I understand that Dapper does that for me. Also, the only reason the above code returns a Task is because my load-generation code is designed to work with that signature.
Asynchronous Test Target
The other test-target method in my FitTest class is this:
public async Task<IEnumerable<Inventory>> LoadDBAsync()
{
    var skus = items.Select(x => x.SKU);
    string procedureName = "GetWebInvBySkuList";
    string userDefinedTable = "[dbo].[StringList]";
    string connectionString = "Data Source=localhost;Initial Catalog=Web_Inventory;Integrated Security=False;User ID=sa;Password=1Secure*Password1;Connect Timeout=30;Encrypt=False;TrustServerCertificate=True;ApplicationIntent=ReadWrite;MultiSubnetFailover=False";
    var dt = new DataTable();
    dt.Columns.Add("Id", typeof(string));
    foreach (var sku in skus)
    {
        dt.Rows.Add(sku);
    }
    using (var conn = new SqlConnection(connectionString))
    {
        return await conn.QueryAsync<Inventory>(
            procedureName,
            new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
            commandType: CommandType.StoredProcedure).ConfigureAwait(false);
    }
}
Again, I'm not explicitly opening or closing the connection - because Dapper does that for me. I have also tested this code with explicitly opening and closing; it does not change the performance. The load-generator run against the above code measured about 4 TPS (the profiler screenshot is not reproduced here).
What DOES change the performance is if I change the above as follows:
//using (var conn = new SqlConnection(connectionString))
//{
    var inv = await conn.QueryAsync<Inventory>(
        procedureName,
        new { skuList = dt.AsTableValuedParameter(userDefinedTable) },
        commandType: CommandType.StoredProcedure);
    var foo = inv.ToArray();
    return inv;
//}
In this case I've converted the SqlConnection into a private member of the FitTest class and initialized it in the constructor. That is, one SqlConnection per client per load-test session. It is never disposed during the load-test. I also changed the connection string to include "MultipleActiveResultSets=True", because now I started getting those errors.
With these changes, my results become 640+ transactions per second, with 8 exceptions thrown. The exceptions were all "InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is connecting." (The profiler screenshot for this case is likewise not reproduced here.)
That looks to me like a synchronization bug in Dapper with the SqlConnection.
Load-Generator
My load-generator, a class called Generator, is designed to be given a list of delegates when constructed. Each delegate has a unique instantiation of the FitTest class. If I supply an array of 10 delegates, it is interpreted as representing 10 clients to be used for generating load in parallel.
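Roughly, the delegates are wired up like this (a sketch; the Generator constructor taking a Func<Task>[] is assumed from the description above, and LoadDBAsync is the async test target already shown):

// One FitTest instance per simulated client, each exposed as a Func<Task>.
Func<Task>[] clients = Enumerable.Range(0, 10)
    .Select(_ => new FitTest())
    .Select(fit => (Func<Task>)(() => fit.LoadDBAsync()))
    .ToArray();
var generator = new Generator(clients); // assumed constructor taking the delegate array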
To kick off the load test, I have this:
//This `testFuncs` array (indirectly) points to either instances
//of the synchronous test-target, or the async test-target, depending
//on what I'm measuring.
private Func<Task>[] testFuncs;
private Dictionary<int, Task> map;
private TaskCompletionSource<bool> completionSource;

public void RunWithMultipleClients()
{
    completionSource = new TaskCompletionSource<bool>();
    //Create a dictionary that has indexes and Task completion status info.
    //The indexes correspond to the testFuncs[] array (viz. the test clients).
    map = testFuncs
        .Select((f, j) => new KeyValuePair<int, Task>(j, Task.CompletedTask))
        .ToDictionary(p => p.Key, v => v.Value);
    //scenario.Duration is usually '15'. In other words, this test
    //will terminate after generating load for 15 seconds.
    Task.Delay(scenario.Duration * 1000).ContinueWith(x => {
        running = false;
        completionSource.SetResult(true);
    });
    RunWithMultipleClientsLoop();
    completionSource.Task.Wait();
}
So much for the setup, the actual load is generated as follows:
public void RunWithMultipleClientsLoop()
{
    //while (running)
    //{
        var idleList = map.Where(x => x.Value.IsCompleted).Select(k => k.Key).ToArray();
        foreach (var client in idleList)
        {
            //I've used both of the following. The `Task.Run` version
            //is about 20% faster for the synchronous test target.
            map[client] = Task.Run(testFuncs[client]);
            //map[client] = testFuncs[client]();
        }
        Task.WhenAny(map.Values.ToArray())
            .ContinueWith(x => { if (running) RunWithMultipleClientsLoop(); });
        // Task.WaitAny(map.Values.ToArray());
    //}
}
The while loop and Task.WaitAny, commented out, represent a different approach that has nearly the same performance; I keep it around for experiments.
One last detail. Each of the "client" delegates I pass in is first wrapped inside a metrics-capture function. The metrics capture function looks like this:
private async Task LoadLogic(Func<Task> testCode)
{
    try
    {
        if (!running)
        {
            slagCount++;
            return;
        }
        //This is where the actual test target method is called.
        await testCode().ConfigureAwait(false);
        if (running)
        {
            successCount++;
        }
        else
        {
            slagCount++;
        }
    }
    catch (Exception ex)
    {
        if (ex.Message.Contains("Assert"))
        {
            errorCount++;
        }
        else
        {
            exceptionCount++;
        }
    }
}
When my code runs, I do not receive any errors or exceptions.
Ok, what am I doing wrong? In the worst case scenario, I would expect the async code to be only slightly slower than synchronous.
I have a method which takes an argument, runs it against the database, retrieves records, processes them, and saves the processed records to a new table. Running the method from the service with one parameter works. What I am trying to achieve now is to make the parameter dynamic. I have implemented a method to retrieve the parameters and it works fine. Now I am trying to run the method in parallel for each of the parameters in the list. My current implementation is:
WorkerClass WorkerClass = new WorkerClass();
var ParametersList = WorkerClass.GetParams();
foreach (var item in ParametersList)
{
    WorkerClass WorkerClass2 = new WorkerClass();
    Parallel.Invoke(
        () => WorkerClass2.ProcessAndSaveMethod(item)
    );
}
In the above implementation I think defining a new WorkerClass2 defeats the whole point of Parallel.Invoke, but I am having an issue with data getting mixed up when using the already defined WorkerClass. The reason for the mix-up is that the Oracle connection is opened inside the class's Init() method and static DataTable DataCollectionList; is defined at class level, so the parallel calls share state.
Inside the method ProcessAndSaveMethod(item) I have:
OracleCommand Command = new OracleCommand(Query, OracleConnection);
OracleDataAdapter Adapter = new OracleDataAdapter(Command);
Adapter.Fill(DataCollectionList);
Inside init():
try
{
    OracleConnection = new OracleConnection(Passengers.OracleConString);
    DataCollectionList = new DataTable();
    OracleConnection.Open();
    return true;
}
catch (Exception ex)
{
    OracleConnection.Close();
    DataCollectionList.Clear();
    return false;
}
And the method isn't being run in parallel as I intended. Is there another way to implement this?
To run it in parallel you need to call Parallel.Invoke only once with all the tasks to be completed:
Parallel.Invoke(
    ParametersList.Select(item =>
        new Action(() => new WorkerClass().ProcessAndSaveMethod(item)) // one WorkerClass per item, so no shared state
    ).ToArray()
);
If you have a list of things and want them processed in parallel, there really is no easier way than PLINQ:
var parametersList = SomeObject.SomeFunction();
var resultList = parametersList.AsParallel()
.Select(item => new WorkerClass().ProcessAndSaveMethod(item))
.ToList();
The fact that you build up a new connection and use a lot of variables local to the one item you process is fine. It's actually the preferred way to do multi-threading: keep as much local to the thread as you can.
That said, you have to measure if multi-threading is actually the fastest way to solve your problem. Maybe you can do your processing sequentially and then do all your database stuff in one go with bulk inserts, temporary tables or whatever is suited to your specific problem. Splitting a task into smaller tasks for more processors to run is not always faster. It's a tool and you need to find out if that tool is helping in your specific situation.
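As an illustration of keeping everything local, a sketch of WorkerClass using instance fields instead of statics (the class, field, and method names are taken from the question; the parameter type and the BuildQuery helper are assumptions for illustration):

public class WorkerClass
{
    // Instance (not static) fields: each WorkerClass gets its own connection and
    // DataTable, so parallel items no longer share state.
    private OracleConnection OracleConnection;
    private DataTable DataCollectionList;

    public void ProcessAndSaveMethod(string item) // parameter type assumed
    {
        using (OracleConnection = new OracleConnection(Passengers.OracleConString))
        {
            OracleConnection.Open();
            DataCollectionList = new DataTable();
            string query = BuildQuery(item); // hypothetical helper that builds the per-item query
            using (OracleCommand command = new OracleCommand(query, OracleConnection))
            using (OracleDataAdapter adapter = new OracleDataAdapter(command))
            {
                adapter.Fill(DataCollectionList);
            }
            // ...process DataCollectionList and save the results to the new table...
        }
    }
}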
I achieved parallel processing using the code below, and also avoided the null reference exception from DbCon.Open() (caused by connection-pool exhaustion) by using the max degree of parallelism parameter.
Parallel.ForEach(ParametersList, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, item =>
{
    WorkerClass Worker = new WorkerClass();
    Worker.ProcessAndSaveMethod(item);
});
I am using NpgsqlConnection inside a Parallel.ForEach, looping through inline queries in a List.
When I reach around 1,400+ queries I get an exception saying
'FATAL: 53300: remaining connection slots are reserved for non-replication superuser connections'.
I am using
Pooling=true;MinPoolSize=1;MaxPoolSize=1024;ConnectionLifeTime=1
in my app.config and con.Close(), con.ClearPool(), con.Dispose() in my code.
Parallel.ForEach(queries, query =>
{
    using (NpgsqlConnection con = new NpgsqlConnection(ConfigurationManager.ConnectionStrings["PSQL"].ConnectionString))
    {
        con.ClearPool();
        con.Open();
        //int count = 0;
        int queryCount = queries.Count;
        using (NpgsqlCommand cmd = con.CreateCommand())
        {
            cmd.CommandType = CommandType.Text;
            //cmd.CommandTimeout = 0;
            cmd.CommandText = query;
            cmd.ExecuteNonQuery();
            count += 1;
            this.label1.Invoke(new MethodInvoker(delegate { this.label1.Text = String.Format("Processing...\n{0} of {1}.\n{2}% completed.", count, queryCount, Math.Round(Decimal.Divide(count, queryCount) * 100, 2)); }));
        }
        con.Close();
        //con.Dispose();
        //con.ClearPool();
    }
});
You are hitting the max connection limit of postgresql itself:
http://www.postgresql.org/docs/9.4/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS
Your parallel queries are opening a lot of connections and the server can't handle them. By default, PostgreSQL is configured to allow 100 concurrent connections. You could try increasing this value in your postgresql.conf file.
Another option is to limit the pool size of Npgsql to a lower number. Your concurrent queries would wait when the max pool size is reached.
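For example (illustrative numbers; keep MaxPoolSize comfortably below the server's max_connections and throttle the loop to match):

// app.config connection string, pool capped at 50 instead of 1024:
//   Pooling=true;MinPoolSize=1;MaxPoolSize=50;ConnectionLifeTime=1
Parallel.ForEach(queries, new ParallelOptions { MaxDegreeOfParallelism = 8 }, query =>
{
    using (var con = new NpgsqlConnection(ConfigurationManager.ConnectionStrings["PSQL"].ConnectionString))
    {
        con.Open();
        // ...execute the query as in the original loop...
    }
});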
Also, don't call ClearPool, as it adds overhead to the pooling logic and means you don't benefit from the pool at all. You could try setting Pooling=false in your connection string instead.
I hope it helps.
I have a long-running windows service that constantly receives data and processes it, then puts it into the database. I use stored procedures for the complex operations, but some of them have many parameters.
I'm aware that this is the suggested 'best practice':
using (IDbConnection connection = GetConnection())
{
    connection.Open();
    // Do stuff
    connection.Close();
}
Resulting in short-lived connections, but taking full advantage of connection pooling. However, this practice really seems to negate the benefits of a stored procedure. I have something like this at the moment:
while (true)
{
    var items = GetData(); // network I/O
    using (IDbConnection connection = GetConn())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        var preparedStatement1 = SQL.Prepare(connection, "...", ...);
        var preparedStatement2 = SQL.Prepare(connection, "...", ...);
        var preparedStatement3 = SQL.Prepare(connection, "...", ...);
        foreach (var item in items)
        {
            // loop which calls SQL statements.
        }
        connection.Close();
    }
}
I really feel that I should open the connection outside of the while loop so it stays alive a long time; and prepare the statements before entering the loop. That would give me the full benefit of using the stored procedures:
using (IDbConnection connection = GetConn())
{
    connection.Open();
    var tran = connection.BeginTransaction();
    var preparedStatement1 = SQL.Prepare(connection, "...", ...);
    var preparedStatement2 = SQL.Prepare(connection, "...", ...);
    var preparedStatement3 = SQL.Prepare(connection, "...", ...);
    while (!service.IsStopped)
    {
        var items = GetData(); // network I/O
        foreach (var item in items)
        {
            // loop which calls SQL statements.
        }
    }
    connection.Close();
}
So the question is: does the performance benefit of stored procedures and prepared statements outweigh the "risks" of leaving connections open for a long time? The best-practice guidance never seems to mention prepared statements, and the MSDN documentation (I'm using SQL Server) suggests that Prepare() will sometimes be a no-op: SqlCommand.Prepare()
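For reference, "preparing once and reusing across the loop" with SqlCommand would look roughly like this (a sketch: the procedure name, parameter names, and item properties are placeholders, GetConn() is assumed to return a SqlConnection, and per the documentation Prepare() may be a no-op when CommandType is StoredProcedure):

using (SqlConnection connection = GetConn())
{
    connection.Open();
    using (SqlCommand cmd = new SqlCommand("dbo.SaveItem", connection)) // placeholder procedure name
    {
        cmd.CommandType = CommandType.StoredProcedure;
        // Prepare() requires each parameter's type (and a size for variable-length types) up front.
        cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 100);
        cmd.Parameters.Add("@Qty", SqlDbType.Int);
        cmd.Prepare();

        while (!service.IsStopped)
        {
            var items = GetData(); // network I/O, as in the question
            foreach (var item in items)
            {
                cmd.Parameters["@Name"].Value = item.Name; // placeholder properties
                cmd.Parameters["@Qty"].Value = item.Qty;
                cmd.ExecuteNonQuery();
            }
        }
    }
}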