Parallel.ForEach using NpgsqlConnection - c#

I am using NpgsqlConnection inside a Parallel.ForEach, looping through inline queries held in a List.
When I reach around 1400+ queries I get an exception saying
'FATAL: 53300: remaining connection slots are reserved for non-replication superuser connections'.
I am using
Pooling=true;MinPoolSize=1;MaxPoolSize=1024;ConnectionLifeTime=1
in my app.config, and con.Close(), con.ClearPool() and con.Dispose() in my code.
Parallel.ForEach(queries, query =>
{
using (NpgsqlConnection con = new NpgsqlConnection(ConfigurationManager.ConnectionStrings["PSQL"].ConnectionString))
{
con.ClearPool();
con.Open();
//int count = 0;
int queryCount = queries.Count;
using (NpgsqlCommand cmd = con.CreateCommand())
{
cmd.CommandType = CommandType.Text;
//cmd.CommandTimeout = 0;
cmd.CommandText = query;
cmd.ExecuteNonQuery();
count += 1;
this.label1.Invoke(new MethodInvoker(delegate { this.label1.Text = String.Format("Processing...\n{0} of {1}.\n{2}% completed.", count, queryCount, Math.Round(Decimal.Divide(count, queryCount) * 100, 2)); }));
}
con.Close();
//con.Dispose();
//con.ClearPool();
}
});

You are hitting the max connection limit of PostgreSQL itself:
http://www.postgresql.org/docs/9.4/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS
Your parallel queries are opening a lot of connections and the server can't handle that many. By default, PostgreSQL is configured to allow 100 concurrent connections. You could try increasing this value in your postgresql.conf file.
Another option is to limit the pool size of Npgsql to a lower number; your concurrent queries would simply wait whenever the max pool size is reached.
Also, don't call ClearPool: it adds overhead to the pool logic and you wouldn't benefit from the pool at all. If you really want to bypass pooling, try setting Pooling=false in your connection string instead.
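For illustration only, a connection string that keeps Npgsql's pool comfortably below PostgreSQL's default max_connections of 100 could look like this (the MaxPoolSize value of 90 is an assumption; tune it to your server):

Pooling=true;MinPoolSize=1;MaxPoolSize=90;ConnectionLifeTime=1

With a cap like that, threads that can't get a pooled connection wait in con.Open() until one is returned to the pool, instead of piling additional sessions onto the server.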
I hope it helps.

Related

The command timeout does not throw

private void NpgSqlGetContracts(IList<Contract> con)
{
var conn = (NpgsqlConnection)Database.GetDbConnection();
List<Contract> contracts = new List<Contract>();
using (var cmd = new NpgsqlCommand("SELECT * FROM \"Contracts\";", conn))
{
cmd.CommandTimeout = 1;
cmd.Prepare();
int conCount= cmd.ExecuteNonQuery();
using (var reader = cmd.ExecuteReader(CommandBehavior.SingleResult))
{
while (reader.Read())
{
contracts.Add(MapDataReaderRowToContract(reader));
}
}
}
}
Here I have this code to try out the command timeout in Postgres. I have tried debugging it locally with a breakpoint in Visual Studio, and I tried both ExecuteNonQuery and ExecuteReader. The query takes more than 1 second to load all the data (I have over 3 million rows here), but the command timeout is set to 1 second. I wonder why it does not throw any exception here. What did I configure wrong?
Thank you :)
As @hans-kesting wrote above, the command timeout isn't cumulative for the entire command, but rather applies to each individual I/O-producing call (e.g. Read). In that sense, it's meant to help with queries that run for too long without producing any results, or with network issues.
You may also want to take a look at PostgreSQL's statement_timeout, which is a server-side timeout for the entire command. It too has its issues, and Npgsql never sets it implicitly for you - but you can set it yourself.
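For example, here is a minimal sketch of setting statement_timeout per session before running the slow query (the 1000 ms value is arbitrary, and the exact exception surfaced is an assumption you should verify against your Npgsql/PostgreSQL versions):

using (var conn = new NpgsqlConnection(connectionString))
{
    conn.Open();

    // Server-side timeout for every statement on this session, in milliseconds.
    using (var set = new NpgsqlCommand("SET statement_timeout = 1000;", conn))
        set.ExecuteNonQuery();

    using (var cmd = new NpgsqlCommand("SELECT * FROM \"Contracts\";", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // map rows here; once the statement exceeds the timeout, PostgreSQL
            // cancels it and Npgsql surfaces that as an exception (query_canceled).
        }
    }
}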

Connection leak (C#/ADO.NET) even though SqlConnection created with using

I have a program that loads a large quantity of data (~800K-1M rows per iteration) in a Task running on the threadpool (see offending code sample below); no more than 4 tasks running concurrently. This is the only place in the program that a connection is made to this database. When running the program on my laptop (and other coworkers identical laptops), the program functions perfectly. However, we have access to another workstation via remote desktop that is substantially more powerful than our laptops. The program fails about 1/3 to 1/2 of the way through its list. All of the tasks return an exception.
The first exception was: "Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached." I've tried googling, binging, searching on StackOverflow, and banging my head against the table trying to figure out how this can be the case. With no more than 4 tasks running at once, there shouldn't be more than 4 connections at any one time.
In response to this, I tried two things: (1) I added a try/catch around the conn.Open() line that would clear the pool if an InvalidOperationException appeared--that appeared to work [I didn't let it run all the way through, but it got substantially past where it failed before], but at the cost of performance. (2) I changed ConnectionTimeout to be 30 seconds instead of 15, which did not work (but let it proceed a little further). I also tried at one point to use ConnectRetryInterval=4 (mistakenly choosing this instead of ConnectRetryCount)--this led to a different error, "The maximum number of requests is 4,800", which is strange because we still shouldn't be anywhere near 4,800 requests or connections.
In short, I'm at a loss because I can't figure out what is causing this connection leak only on a higher speed computer. I am also unable to get Visual Studio on that computer to debug directly--any thoughts anyone might have on where to look to try and resolve this would be much appreciated.
(Follow-up to c# TaskFactory ContinueWhenAll unexpectedly running before all tasks complete)
private void LoadData()
{
SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
builder.DataSource = "redacted";
builder.UserID = "redacted";
builder.Password = "redacted";
builder.InitialCatalog = "redacted";
builder.ConnectTimeout = 30;
using (SqlConnection conn = new SqlConnection(builder.ConnectionString))
{
//try
//{
// conn.Open();
//} catch (InvalidOperationException)
//{
// SqlConnection.ClearPool(conn);
// conn.Open();
//}
conn.Open();
string monthnum = _monthsdict.First((x) => x.Month == _month).MonthNum;
string yearnum = _monthsdict.First((x) => x.Month == _month).YearNum;
string nextmonthnum = _monthsdict[Array.IndexOf(_monthsdict, _monthsdict.First((x) => x.Month == _month))+1].MonthNum;
string nextyearnum = _monthsdict[Array.IndexOf(_monthsdict, _monthsdict.First((x) => x.Month == _month)) + 1].YearNum;
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
cmd.CommandText = @"redacted";
cmd.Parameters.AddWithValue("@redacted", redacted);
cmd.Parameters.AddWithValue("@redacted", redacted);
cmd.Parameters.AddWithValue("@redacted", redacted);
cmd.CommandTimeout = 180;
SqlDataReader reader = cmd.ExecuteReader();
while(reader.Read())
{
Data data = new Data();
int col1 = reader.GetOrdinal("col1");
int col2 = reader.GetOrdinal("col2");
int col3 = reader.GetOrdinal("col3");
int col4 = reader.GetOrdinal("col4");
data.redacted = redacted;
data.redacted = redacted;
data.redacted = redacted;
data.redacted = redacted;
data.redacted = redacted;
data.Calculate();
_data.Add(data); //not a mistake, referring to another class variable
}
reader.Close();
cmd.Dispose();
conn.Close();
conn.Dispose();
}
}
This turned out to be a classic case of not reading the documentation closely enough. I was trying to cap the maximum number of threads at 4 using ThreadPool.SetMaxThreads, but the maximum cannot be set below the number of processors. The workstation it failed on has 8 processors, so there was never a cap: it ran as many tasks as the task scheduler felt appropriate, and it eventually hit the connection pool limit.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setmaxthreads
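If you need a hard cap that is independent of the processor count, one alternative (my own sketch, not part of the original program) is to throttle the loads with a SemaphoreSlim rather than ThreadPool.SetMaxThreads:

private static readonly SemaphoreSlim LoadGate = new SemaphoreSlim(4); // at most 4 concurrent loads

private async Task LoadDataThrottledAsync()
{
    await LoadGate.WaitAsync();
    try
    {
        // LoadData() is the method shown above
        await Task.Run(() => LoadData());
    }
    finally
    {
        LoadGate.Release();
    }
}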

Cancelling SQL Query in background in .NET

I'm designing a small desktop app that fetches data from SQL server. I used BackgroundWorker to make the query execute in background. The code that fetches data generally comes down to this:
public static DataTable GetData(string sqlQuery)
{
DataTable t = new DataTable();
using (SqlConnection c = new SqlConnection(GetConnectionString()))
{
c.Open();
using (SqlCommand cmd = new SqlCommand(sqlQuery))
{
cmd.Connection = c;
using (SqlDataReader r = cmd.ExecuteReader())
{
t.Load(r);
}
}
}
return t;
}
Since the query can take 10-15 minutes, I want to implement a cancellation request and pass it from the GUI layer to the DAL. The cancellation mechanism of BackgroundWorker won't let me cancel SqlCommand.ExecuteReader(), because that call only returns once the data has been fetched from the server or the data provider throws an exception.
I tried to use Task and async/await with SqlCommand.ExecuteReaderAsync(CancellationToken), but I am confused about where to use it in a multi-layer app (GUI -> BLL -> DAL).
Have you tried using the SqlCommand.Cancel() method?
Approach: encapsulate that GetData method in a Thread/Worker, and when you cancel/stop that thread, call the Cancel() method on the SqlCommand that is being executed.
Here is an example of how to use it on a thread:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading;
class Program
{
private static SqlCommand m_rCommand;
public static SqlCommand Command
{
get { return m_rCommand; }
set { m_rCommand = value; }
}
public static void Thread_Cancel()
{
Command.Cancel();
}
static void Main()
{
string connectionString = GetConnectionString();
try
{
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
Command = connection.CreateCommand();
Command.CommandText = "DROP TABLE TestCancel";
try
{
Command.ExecuteNonQuery();
}
catch { }
Command.CommandText = "CREATE TABLE TestCancel(co1 int, co2 char(10))";
Command.ExecuteNonQuery();
Command.CommandText = "INSERT INTO TestCancel VALUES (1, '1')";
Command.ExecuteNonQuery();
Command.CommandText = "SELECT * FROM TestCancel";
SqlDataReader reader = Command.ExecuteReader();
Thread rThread2 = new Thread(new ThreadStart(Thread_Cancel));
rThread2.Start();
rThread2.Join();
reader.Read();
System.Console.WriteLine(reader.FieldCount);
reader.Close();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
static private string GetConnectionString()
{
// To avoid storing the connection string in your code,
// you can retrieve it from a configuration file.
return "Data Source=(local);Initial Catalog=AdventureWorks;"
+ "Integrated Security=SSPI";
}
}
You can only do cancellation checking and progress reporting between distinct lines of code. Usually both require that you dissect the code down to the lowest loop level, so you can do both of these things between/in the loop iterations. When I took my first steps with BGW, I had the advantage that I needed to write the loop anyway, so it was no extra work. You have one of the worst cases - pre-existing code that you can only replicate or use as is.
Ideal case:
This operation should not take nearly as long as it does. 5-10 minutes indicates that there is something rather wrong with your design.
If the bulk of the time is transmission of data, then you are probably retrieving way too much data. Retrieving everything and then filtering in the GUI is a very common mistake; do as much filtering in the query as possible. Using a distributed database might also help with transmission performance.
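For instance (the table, column, and parameter names below are invented purely for illustration), push the filter into the query rather than filtering rows in the GUI:

const string sql = "SELECT Id, Amount FROM dbo.Orders " +
                   "WHERE OrderDate >= @from AND Region = @region;";
using (var cmd = new SqlCommand(sql, connection))
{
    cmd.Parameters.AddWithValue("@from", new DateTime(2023, 1, 1));
    cmd.Parameters.AddWithValue("@region", "EU");
    // only the rows the GUI actually needs cross the network
}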
If the bulk of the time is processing as part of the query operation (complex conditions), something in your general approach might have to change. There are various ways to trade off complex calculation against a bit of memory on the DBMS side. Views, as far as I know, can cache the results of operations while still maintaining transactional consistency.
But it really depends on what your backend DB/DBMS and use case are. Many of them use SQL as the query language, but that alone does not let us predict which options you have.
Second best case:
The second best thing, if you cannot cut the query down, would be to take the actual DB access code down to the lowest loop and do progress reporting/cancellation checking there. That way you could use the existing cancellation token system inherent in BGW.
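As a rough sketch of that idea (the worker/e parameters and the row-by-row load are my own illustration, not the question's original code), GetData could read rows itself instead of calling DataTable.Load, so cancellation can be checked between rows:

public static DataTable GetData(string sqlQuery, BackgroundWorker worker, DoWorkEventArgs e)
{
    DataTable t = new DataTable();
    using (SqlConnection c = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sqlQuery, c))
    {
        c.Open();
        using (SqlDataReader r = cmd.ExecuteReader())
        {
            // build the columns once from the reader's schema
            for (int i = 0; i < r.FieldCount; i++)
                t.Columns.Add(r.GetName(i), r.GetFieldType(i));

            object[] row = new object[r.FieldCount];
            while (r.Read())
            {
                if (worker.CancellationPending)
                {
                    e.Cancel = true;
                    cmd.Cancel(); // also stop the query on the server side
                    break;
                }
                r.GetValues(row);
                t.Rows.Add(row);
            }
        }
    }
    return t;
}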
Everything else
Using any other approach to cancellation is really a fallback. I wrote a lot on why it is bad, but felt that this answer works better if I focus on the core issue - likely something wrong in the design of the DB and/or query - because fixing that might well eliminate the issue altogether.

c# Win Form - The CLR has been unable to transition from COM context

SQL version : SQL Server 2008 R2 Standard Edition
Application : .Net 3.5 (Windows Form)
This is the error I am receiving after running my code:
The CLR has been unable to transition from COM context 0xe88270 to COM context 0xe88328 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
The following code produces the above error sometimes, most often after 24,000+ records have been inserted.
Objective of the code
The code is written to insert dummy entries to test my application.
Random rnd = new Random();
string Data = "";
for (int i = 0; i < 2000000; i++)
{
Data = "Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');" + Environment.NewLine +
"Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');" + Environment.NewLine +
"Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');" + Environment.NewLine +
"Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');" + Environment.NewLine +
"Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');" + Environment.NewLine +
"Insert Into Table1(Field1)values('" + rnd.Next(0, 200000000) + "');";
ExecuteQuery(Data);//Error is displayed here
}
Code of "ExecuteQuery"
SqlConnection Conn = new SqlConnection("Connection String");
if (Conn == null)
{
Conn = new SqlConnection(Program.ConnString);
}
if (Conn.State == ConnectionState.Closed)
{
Conn.Open();
}
else
{
Conn.Close();
Conn.Open();
}
SqlCommand cmd = new SqlCommand();
cmd.Connection = Conn;
cmd.CommandTimeout = 600000;
cmd.CommandType = CommandType.Text;
cmd.CommandText = strsql;
cmd.ExecuteNonQuery();
Conn.Close();
Note: I have put multiple INSERT statements of exactly the same form into one batch because it reduces the number of separate commands SQL Server has to handle.
Question
How can I optimize my code to prevent the error from occurring?
"The CLR has been unable to transition from COM context … to COM context … for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages."
It is not entirely clear from your question how your code works in detail, so I will assume that your application is not multithreaded, i.e. the loop performing 2,000,000 × 6 database INSERTs is running on the application's main (UI) thread. This will probably take a while (> 60 seconds), so your UI will freeze in the meantime (i.e. become non-responsive), because Windows Forms never gets a chance to run and react to user input while your (blocking) loop is still running. And I bet this is what causes the warning you've cited.
Use a parameterized SQL command instead.
The first easy thing you could do is turn your SqlCommand into a parameterized one. You would then issue commands with the identical SQL text; only the separately provided parameters would differ:
private void InsertRandomNumbers(int[] randomNumbers)
{
const string commandText = "INSERT INTO dbo.Table1 (Field1) VALUES (@field1);";
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(commandText, connection))
{
var field1Parameter = new SqlParameter("@field1", SqlDbType.Int);
command.Parameters.Add(field1Parameter);
connection.Open();
foreach (int randomNumber in randomNumbers)
{
field1Parameter.Value = randomNumber;
/* int rowsAffected = */ command.ExecuteNonQuery();
}
connection.Close();
}
}
Note how commandText is defined as a constant. This is a good indication that SQL Server will also recognise it as always the same command — with parameterized commands, the actual parameter values are provided separately — and SQL Server will only compile and optimize the statement once (and put the compiled statement in its cache so it can be reused subsequently) instead of doing the same thing over and over again. This alone should save a lot of time.
Long-running operations should be asynchronous so your UI doesn't freeze.
Another thing you can do is to shift the database code to a background thread, so that your application's UI won't freeze.
Let's assume that your database INSERT loop is currently triggered by a button doWorkButton. So the for loop is inside a button Click event handler:
private void doWorkButton_Click(object sender, EventArgs e)
{
…
for (i = 0; i < 2000000; i++)
{
Data = …
ExecuteQuery(Data);
// note: I leave it as an exercise to you to combine the
// above suggestion (parameterized queries) with this one.
}
}
The first thing we do is to change your ExecuteQuery method in three ways:
Make it asynchronous. Its return type will be Task instead of void, and its declaration is augmented with the async keyword.
Rename it to ExecuteNonQueryAsync to reflect two things: that it will now be asynchronous, and that it doesn't perform queries. We don't actually expect to get back a result from the database INSERTs.
We rewrite the method body to use the ADO.NET asynchronous methods instead of their synchronous counterparts.
This is what it might end up looking like:
private async Task ExecuteNonQueryAsync(string commandText)
{
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(commandText, connection))
{
await connection.OpenAsync();
/* int rowsAffected = */ await command.ExecuteNonQueryAsync();
connection.Close();
}
}
That's it. Now we need to modify the handler method to make it asynchronous as well:
private async void doWorkButton_Click(object sender, EventArgs e)
{
try
{
…
for (int i = 0; i < 2000000; i++)
{
Data = …
await ExecuteNonQueryAsync(Data);
}
}
catch
{
… // do not let any exceptions escape this handler method
}
}
This should take care of the UI freezing business.
Millions of INSERT statements can be done more efficiently with SqlBulkCopy.
SQL Server has this wonderful option for bulk inserting data. Making use of this feature often results in much better performance than doing your own batches of INSERT statements.
private async void doWorkButton_Click(object sender, EventArgs e)
{
// Prepare the data to be loaded into your database table.
// Note, this could be done more efficiently.
var dataTable = new DataTable();
{
dataTable.Columns.Add("Field1", typeof(int));
var rnd = new Random();
for (int i = 0; i < 12000000; ++i)
{
dataTable.Rows.Add(rnd.Next(0, 2000000));
}
}
using (var connection = new SqlConnection(connectionString))
using (var bulkCopy = new SqlBulkCopy(connection))
{
// the connection passed to SqlBulkCopy must be opened before WriteToServerAsync is called
await connection.OpenAsync();
bulkCopy.DestinationTableName = "dbo.Table1";
try
{
// This will perform a bulk insert into the table
// mentioned above, using the data passed in as a parameter.
await bulkCopy.WriteToServerAsync(dataTable);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
I cannot test right now, but I hope this will get you started. If you want to improve the SqlBulkCopy-based solution further, I suggest creating a custom implementation of IDataReader that produces the random "rows" on the fly, and passing an instance of that to SqlBulkCopy instead of a pre-populated data table. That way you won't have to keep a huge data table in memory, only one data row at a time.
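As a rough, untested sketch of that idea (the class name, the single int column, and the assumption about which IDataReader members SqlBulkCopy actually calls are all mine; it also uses C# 7 syntax), a streaming reader over random Field1 values could look like this:

sealed class RandomRowReader : IDataReader
{
    private readonly Random _rnd = new Random();
    private readonly int _rowCount;
    private int _position = -1;
    private int _current;

    public RandomRowReader(int rowCount) { _rowCount = rowCount; }

    // Members a plain ordinal-mapped bulk copy is expected to use:
    public int FieldCount => 1;
    public bool Read()
    {
        if (++_position >= _rowCount) return false;
        _current = _rnd.Next(0, 200000000); // one random Field1 value per "row"
        return true;
    }
    public object GetValue(int i) => _current;
    public int GetOrdinal(string name) => 0; // single column: Field1
    public string GetName(int i) => "Field1";
    public Type GetFieldType(int i) => typeof(int);

    // Remaining interface members; throwing is fine as long as SqlBulkCopy never calls them here.
    public bool IsClosed => _position >= _rowCount;
    public void Close() { }
    public void Dispose() { }
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => "int";
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => _current;
    public long GetInt64(int i) => throw new NotSupportedException();
    public string GetString(int i) => throw new NotSupportedException();
    public int GetValues(object[] values) { values[0] = _current; return 1; }
    public bool IsDBNull(int i) => false;
}

It would then replace the DataTable in the call above, e.g. await bulkCopy.WriteToServerAsync(new RandomRowReader(12000000));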

The connection pool has been exhausted

When inserting data into the database using a parallel foreach, I get the following error after inserting some amount of data:
'The connection pool has been exhausted'
try
{
var connection = ConfigurationManager.ConnectionStrings["Connection"].ConnectionString;
Parallel.ForEach(Enumerable.Range(0, 1000), (_) =>
{
using (var connectio = new NpgsqlConnection(connection))
{
connectio.Open();
using (var command = new NpgsqlCommand("fn_tetsdata", connectio) { CommandType = CommandType.StoredProcedure })
{
command.Parameters.AddWithValue("firstname", "test");
command.Parameters.AddWithValue("lastname", "test");
command.Parameters.AddWithValue("id", 10);
command.Parameters.AddWithValue("designation", "test");
command.ExecuteNonQuery();
}
connectio.Close();
}
});
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Constrain the amount of parallelism with MaxDegreeOfParallelism; by default it could exceed the number of DB connections you have. Find a balance between parallelising your work and not killing the DB :)
Parallel.ForEach(yourListOfStuff,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
stuff => { YourMethod(stuff); }
);
I assume you're using parallelism to improve performance. If that's the case, then first you need a baseline: run the 1,000 queries in serial, creating a new connection each time (which in reality just pulls one from the pool).
Then try it with the same connection object and see if the performance improves.
Then try it with the same command object, just changing the parameter values.
Then try it in parallel with the same connection, so you're not creating 1,000 connection objects, which is what you've already tried.
I would be surprised if you got a significant performance improvement from parallelism, since Parallel improves the performance of CPU-bound tasks, and database queries are generally bound far more by I/O than by CPU.
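As a minimal sketch of that serial baseline with one connection and one reused command (the stored procedure and parameter names mirror the question's code; everything else is an assumption):

using (var connection = new NpgsqlConnection(connectionString))
using (var command = new NpgsqlCommand("fn_tetsdata", connection) { CommandType = CommandType.StoredProcedure })
{
    connection.Open();
    command.Parameters.Add(new NpgsqlParameter("firstname", "test"));
    command.Parameters.Add(new NpgsqlParameter("lastname", "test"));
    command.Parameters.Add(new NpgsqlParameter("id", 10));
    command.Parameters.Add(new NpgsqlParameter("designation", "test"));

    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        command.Parameters["id"].Value = i; // only the values change per iteration
        command.ExecuteNonQuery();
    }
    stopwatch.Stop();
    Console.WriteLine("Serial baseline: " + stopwatch.ElapsedMilliseconds + " ms");
}

Compare that number against the Parallel.ForEach version before deciding the parallelism is worth the connection pressure.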
