I'm trying to figure out the best way to batch insert about 37k rows into my Sql Server using DAPPER.
My problem is that when I use Parallel.ForEach - the number of connections to the database increases over a short period of time - finally hitting nearly or about 100 ... which gives connection pool errors. If I force the max degree of parall then it's hit that max number and stays there.
Setting the maxdegree feels wrong.
It currently is doing about 10-20 inserts a second. This is also in a simple Console App - so there's no other database activity besides what's happening in my Parallel.ForEach loop.
Is using Parallel.ForEach the incorrect thing in this case because this is not-CPU bound?
Should I be using async/await ? If so, what stopping this from doing hundreds of db calls in one go?
Sample code which is basically what I'm doing.
var items = GetItemsFromSomewhere(); // Returns 37K items.
Parallel.ForEach(items => item)
{
using (var sqlConnection = new SqlConnection(_connectionString))
{
var result = sqlConnection.Execute(myQuery, new { ... } );
}
}
My (incorrect) understanding of this was that there should on be about 8 or so connections at any time to the db. The Connection Pool will release the connection (which remains instantiated in the Connection Pool, waiting to be used). And if the Execute takes .. i donno .. lets say even a 1 second (the longest running time for an insert was about 500ms .. and that's 1 in every 100 or so) ... that's ok .. that thread is blocked and chills until the Execute completes. Then the scope completes (and Dispose is auto called) and the connection closed. With the connection closed, the Parallel.ForEach then grabs the next item in the collection, goes to the connection pool and then grabs a spare connection (remember - we just closed one, a split second ago) ... rinse.repeat.
Is this wrong?
Notes:
.NET 4.5
Sql 2012
Console app.
Using Dapper.NET for sql code.
First of all: If it is about performance, use SqlBulkCopy. This works with SQL-Server. If you are using other database servers, they might have their own SqlBulkCopy-solution (Oracle has one).
SqlBulkCopy works like a bulk-select: One state opens one connection and streams all the data from the server to the client. With an insert, it works the other way arround: It streams all the new records from the client to the server.
See: https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
If you insist of using parallellism, you might want to consider the follow code:
void BulkInsert<T>(object p)
{
IEnumerator<T> e = (IEnumerator<T>)p;
using (var sqlConnection = new SqlConnection(_connectionString))
{
while(true)
{
T item;
lock(e)
{
if (!e.MoveNext())
return;
item = e.Current;
}
var result = sqlConnection.Execute(myQuery, new { ... } );
}
}
}
Now create your own threads and invoke this method on these threads with one and the same parameter: The iterator which runs through your collection. Each threat opens its own connection once, starts inserting, and after all items are inserted, the connection is closed. This solutions uses as many connections as your created threads.
PS: Multiple variants of above code are possible . You could call it from background threads, from Tasks, etc. I hope you get the point.
You should use SqlBulkCopy instead of inserting one by one. Faster and more efficient.
https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
credits to the answer owner
Sql Bulk Copy/Insert in C#
Related
I'm using the .NET Connector to access a MySQL database from my C# program. All my queries are done with MySqlCommand.BeginExecuteReader, with the IAsyncResults held in a list so I can check them periodically and invoke appropriate callbacks whenever they finish, fetching the data via MySqlCommand.EndExecuteReader. I am careful never to hold one of these readers open while attempting to read results from something else.
This mostly works fine. But I find that if I start two queries at the same time, then I get the dreaded MySqlException: There is already an open DataReader associated with this Connection which must be closed first exception in EndExecuteReader. And this is happening the first time I invoke EndExecuteReader. So the error message is full of baloney; there is no other open DataReader at that point, unless the connector has somehow opened one behind the scenes without me calling EndExecuteReader. So what's going on?
Here's my update loop, including copious logging:
for (int i=queries.Count-1; i>=0; i--) {
Debug.Log("Checking query: " + queries[i].command.CommandText);
if (!queries[i].operation.IsCompleted) continue;
var q = queries[i];
queries.RemoveAt(i);
Debug.Log("Finished, opening Reader for " + q.command.CommandText);
using (var reader = q.command.EndExecuteReader(q.operation)) {
try {
q.callback(reader, null);
} catch (System.Exception ex) {
Logging.LogError("Exception while processing: " + q.command.CommandText);
Logging.LogError(ex.ToString());
q.callback(null, ex.ToString());
}
}
Debug.Log("And done with callback for: " + q.command.CommandText);
}
And here's the log:
As you can see, I start both queries in rapid succession. (This is the first thing my program does after opening the DB connection, just to pin down what's happening.) Then the first one I check says it's done, so I call EndExecuteReader on it, and boom -- already it claims there's another open one. This happens immediately, before it even gets to my callback method. How can that be?
Is it not valid to have two open queries at once, even if I only call EndExecuteReader on one at a time?
When you run two queries concurrently, you must have two Connection objects. Why? Each Connection can only handle one query at a time. It looks like your code got into some kind of race condition where some of your concurrent queries worked and then a pair of them collided and failed.
At any rate your system will be more resilient in production if you can keep your startup sequences simple. If I were you I'd run one query after another rather than trying to run them all at once. (Obvs if that causes real performance problems you'll have to run them concurrently. But keep it simple until you need it to be complex.)
I know this question has been asked many times however none of the answers fit my issue.
I have a thread timer firing every 30 seconds that queries a MSSQL db that is under heavy load. If i need to update the data in the console app that i'm using i use Linq To Sql to update the data stored in memory.
My problem is sometimes I get the error ExecuteReader requires an open and available Connection.
The code from the thread timer fires a Thread.Run(reload());
The connection string is
//example code
void reload(...
string connstring = string.Format("Data Source={0},{1};Initial Catalog={2};User ID={3};Password={4};Application Name={5};Connect Timeout=120;MultipleActiveResultSets=True;Max Pool Size=1524;Pooling=true;"
settings = new ConnectionStringSettings("sqlServer", connstring, "System.Data.SqlClient");
using (var tx = new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions() { IsolationLevel = IsolationLevel.ReadUncommitted }))
{
using (SwitchDataDataContext data = new SwitchDataDataContext(settings.ConnectionString))
{
data.CommandTimeout = 560;
then i do many linqtosql searches. The exceptions happen from time to time but not always on the same query's. it's like the connections is opened and is forced closed.
Sometimes the exceptions says the current status is Open, Closed, Connecting. I add a larger ThreadPool to the SQL db but nothing seems to help.
i also have ADO in other parts of the program without any issues.
I believe that your problem is that the transaction scope also has a timeout. The default timeout is 1 minute according to this answer. So the transaction times out long before your command does (560 seconds = 9.3 minutes or so) . You will need to set the timeout property in the instance of the TransactionOptions object you are creating
new TransactionOptions()
{
IsolationLevel = IsolationLevel.ReadUncommitted,
Timeout = new TimeSpan(0,10,0) /* 10 Minutes */
}
You can verify that is indeed the issue by setting the TransactionScope timeout to a small value to force it to timeout.
I changed Linq To Sql to Entity Framework and received the same type of message. I believe the issues is lazy Loading. I was using the collections before it was ready on a different thread. I just added .Include("Lab") to my collection to load the entire collection and it seems to of fixed the issue.
Some first things that people learned in their early use of MySQL that closing connection right after its usage is important, but why is this so important? Well, if we do it on a website it can save some server resource (as described here) But why we should do that on a .NET desktop application? Does it share the same issues with web application? Or are there others?
If you use connection pooling you won't close the physical connection by calling con.Close, you just tell the pool that this connection can be used. If you call database stuff in a loop you'll get exceptions like "too many open connections" quickly if you don't close them.
Check this:
for (int i = 0; i < 1000; i++)
{
var con = new SqlConnection(Properties.Settings.Default.ConnectionString);
con.Open();
var cmd = new SqlCommand("Select 1", con);
var rd = cmd.ExecuteReader();
while (rd.Read())
Console.WriteLine("{0}) {1}", i, rd.GetInt32(0));
}
One of the possible exceptions:
Timeout expired. The timeout period elapsed prior to obtaining a
connection from the pool. This may have occurred because all pooled
connections were in use and max pool size was reached.
By the way, the same is true for a MySqlConnection.
This is the correct way, use the using statement on all types implementing IDsiposable:
using (var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
{
con.Open();
for (int i = 0; i < 1000; i++)
{
using(var cmd = new SqlCommand("Select 1", con))
using (var rd = cmd.ExecuteReader())
while (rd.Read())
Console.WriteLine("{0}) {1}", i, rd.GetInt32(0));
}
}// no need to close it with the using statement, will be done in connection.Dispose
Yes I think it is important to close out your connection rather than leaving it open or allowing the garbage collector to eventually handle it. There are a couple of reason why you should do this and below that I'll describe the best method for how
WHY:
So you've opened a connection to the database and sent some data back and forth along this pipeline and now have the results you were looking for. Ideally at this point you do something else with the data and the end results of your application is achieved.
Once you have the data from the database you don't need it anymore, its part in this is done so leaving the connection open does nothing but hold up memory and increase the number of connections the database and your application has to keep track of and possibly pushing you closer to your maximum number of connections limit.
"But wait! I have to make a lot of database calls in rapid
succession!"
Okay no problem, open the connection run your calls and then close it out again. Opening a connection to a database in a "modern" application isn't going to cost you a significant amount of computing power/time, while explicitly closing out a connection does nothing but help (frees up memory, lowers your number of current connections).
So that is the why, here is the how
HOW:
So depending on how you are connecting to your MySQL database you a probably using an IDisposible object to help manage the connection. Here is what MSDN has to say on using an IDisposable:
As a rule, when you use an IDisposable object, you should declare and
instantiate it in a using statement. The using statement calls the
Dispose method on the object in the correct way, and (when you use it
as shown earlier) it also causes the object itself to go out of scope
as soon as Dispose is called. Within the using block, the object is
read-only and cannot be modified or reassigned.
Here is my personal take on the subject:
Using a using block helps to keep your code cleaner (readability)
Using a usingblock helps to keep your code clear (memory wise), it will "automagically" clean up unused items
With a usingblock it helps to prevent using a previous connection from being used accidentally as it will automatically close out the connection when you are done with it.
In short, I think it is important to close connections properly, preferably with a con.close() type statement method in combination with a using block
As pointed out in the comments this is also a very good question/answer similar to yours: Why always close Database connection?
I have a simple while loop that checks the database for inserts. If its in the loop too long my try catch gets "max pool size was reached" error. So at the bottom of loop I have a connectionclearallpools(); But that still doesn't solve it.
while (!quit)
{
connection to database strings timeout=400
read from database
connection.clearallpools();
}
Probably you are not closing your connections...you may want to use
while(!quit){
//do something here
using(var connection = GetMyConnection()){
//do your db reads here
}
//do validations and something more here
}
This would ensure your connections are disposed / closed properly.
Also, you do not need to clear your pools.
SQLConnection / DBConnection objects implements IDisposable. You may want to go thru these
Digging into IDisposable
C# Using Statement with SQL Connection
You most probably keep opening new connections in the loop.
Above the loop open the connection is a using statement and then use it in the loop. Also note the removal of the clearallpools:
using(create new connection)
{
while (!quit)
{
connection to database strings timeout=400
read from database
// connection.clearallpools(); REMOVE THIS!!!!
}
}
You're exhausting the connection pool. Following each read you should close the connection which will release it back to the pool.
I need some advice regarding an application I wrote. The issues I am having are due to my DAL and connections to my SQL Server 2008 database not being closed, however I have looked at my code and each connection is always being closed.
The application is a multithreaded application that retrieves a set of records and while it processes a record it updates information about it.
Here is the flow:
The administrator has the ability to set the number of threads to run and how many records per thread to pull.
Here is the code that runs after they click start:
Adapters are abstractions to my DAL here is a sample of what they look like:
public class UserDetailsAdapter: IDataAdapter<UserDetails>
{
private IUserDetailFactory _factory;
public UserDetailsAdapter()
{
_factory = new CampaignFactory();
}
public UserDetails FindById(int id){
return _factory.FindById(id);
}
}
As soon as the _factory is called it processes the SQL and immediately closes the connection.
Code For Threaded App:
private int _recordsPerthread;
private int _threadCount;
public void RunDetails()
{
//create an adapter instance that is an abstration
//of the data factory layer
var adapter = new UserDetailsAdapter();
for (var i = 1; i <= _threadCount; i++)
{
//This adater makes a call tot he databse to pull X amount of records and
//set a lock filed so the next set of records that are pulled are differnt.
var details = adapter.FindTopDetailsInQueue(_recordsPerthread);
if (details != null)
{
var parameters = new ArrayList {i, details};
ThreadPool.QueueUserWorkItem(ThreadWorker, parameters);
}
else
{
break;
}
}
}
private void ThreadWorker(object parametersList)
{
var parms = (ArrayList) parametersList;
var threadCount = (int) parms[0];
var details = (List<UserDetails>) parms[1];
var adapter = new DetailsAdapter();
//we keep running until there are no records left inthe Database
while (!_noRecordsInPool)
{
foreach (var detail in details)
{
var userAdapter = new UserAdapter();
var domainAdapter = new DomainAdapter();
var user = userAdapter.FindById(detail.UserId);
var domain = domainAdapter.FindById(detail.DomainId);
//...do some work here......
adapter.Update(detail);
}
if (!_noRecordsInPool)
{
details = adapter.FindTopDetailsInQueue(_recordsPerthread);
if (details == null || details.Count <= 0)
{
_noRecordsInPool = true;
break;
}
}
}
}
The app crashes because there seem to be connection issues to the database. Looking in my log files for the DAL I am seeing this:
Timeout expired. The timeout period
elapsed prior to obtaining a
connection from the pool. This may
have occurred because all pooled
connections were in use and max pool
size was reached
When I run this in one thread it works fine. I am guessing when I runt his in multiple threads I am obviously making too many connections to the DB. Any thoughts on how I can keep this running in multiple threads and make sure the database doesn’t give me any errors.
Update:
I am thinking my issues may be deadlocks in my database. Here is the code in SQL that is running whe I get a deadlock error:
WITH cte AS (
SELECT TOP (#topCount) *
FROM
dbo.UserDetails WITH (READPAST)
WHERE
dbo.UserDetails where IsLocked = 0)
UPDATE cte
SET
IsLocked = 1
OUTPUT INSERTED.*;
I have never had issues with this code before (in other applications). I reorganzied my Indexes as they were 99% fragmented. That didn't help. I am at a loss here.
I'm confused as to where in your code connections get opened, but you probably want your data adapters to implement IDispose (making sure to close the pool connection as you leave using scope) and wrap your code in using blocks:
using (adapter = new UserDetailsAdapter())
{
for (var i = 1; i <= _threadCount; i++)
{
[..]
}
} // adapter leaves scope here; connection is implicitly marked as no longer necessary
ADO.NET uses connection pooling, so there's no need to (and it can be counter-productive to) explicitly open and close connections.
It is not clear to me how you actually connect to the database. The adapter must reference a connection.
How do you actually initialize that connection?
If you use a new adapter for each thread, you must use a new connection for each adapter.
I am not too familiar with your environment, but I am certain that you really need a lot of open connections before your DB starts complaining about it!
Well, after doing some research I found that there might be a bug in SQL server 2008 and running parallel queries. I’ll have to dig up the link where I found the discussion on this, but I ended up running this on my server:
sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
This can decrease your server performance, overall, so it may not be an option for some people, but it worked great for me.
For some queries I added the MAXDOP(n) (n being the number of processors to utilize) option so they can run more efficiently. It did help a bit.
Secondly, I found out that my DAL’s Dispose method was using the GC.Suppressfinalize method. So, my finally sections were not firing in my DAL properly and not closing out my connections.
Thanks to all who gave their input!