I have a multi-threaded Windows application that uses more than one BackgroundWorker. Each background worker runs some code that updates the same SQL Server database, and when it finishes it runs again. I have noticed that every background worker uses its own connection. Because the database gets very slow when many connections are used, I created a ConcurrentQueue of a custom class to hold all the stored procedure calls, so a single background worker can execute them over just one connection.
Here is my code.
This is the stored procedure class:
public class PSCProc
{
    string _procName;
    Dictionary<string, object> _parameters;

    public string ProcName
    {
        get { return _procName; }
        set { _procName = value; }
    }

    public Dictionary<string, object> Parameters
    {
        get { return _parameters; }
        set { _parameters = value; }
    }

    public PSCProc(string procName, Dictionary<string, object> parameters)
    {
        _procName = procName;
        _parameters = parameters;
    }
}
And here is the method used to run the stored procedure:
public static void execProc(string procName, Dictionary<string, object> parameters)
{
    using (var conn = new SqlConnection(Test.Properties.Settings.Default.testConnection))
    using (var command = new SqlCommand(procName, conn)
    {
        CommandType = CommandType.StoredProcedure
    })
    {
        foreach (var item in parameters)
        {
            command.Parameters.AddWithValue(item.Key, item.Value);
        }
        conn.Open();
        command.ExecuteNonQuery();
        conn.Close();
        Form1.updated++;
    }
}
And this is how I add an item to the queue:
Dictionary<string, object> parameters = new Dictionary<string, object>();
int x = 1;
string address = "cairo";
parameters.Add("#id", x);
parameters.Add("#address", address);
PSCProc proc1 = new PSCProc("updateAddress", parameters);
pscQueue.Enqueue(proc1);
And this is how the background worker runs the procedures:
PSCProc proc;
if (pscQueue.TryDequeue(out proc))
{
helper.execProc(proc.ProcName, proc.Parameters);
}
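For context, the consumer side amounts to a loop like the sketch below. This is only a rough outline, not my exact code; it assumes pscQueue is the shared ConcurrentQueue<PSCProc> shown above, helper.execProc is the method above, and the worker is restarted from its RunWorkerCompleted handler.
private void worker_DoWork(object sender, DoWorkEventArgs e)
{
    PSCProc proc;
    // drain everything currently queued, on the single worker thread
    while (pscQueue.TryDequeue(out proc))
    {
        helper.execProc(proc.ProcName, proc.Parameters);
    }
    // the RunWorkerCompleted handler calls RunWorkerAsync() again, so the worker keeps polling the queue
}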
Note that:
- The background worker that executes the procedures runs again when it finishes.
- The database has too many locks, as there are hundreds of users hitting it.
- It is very important that the database stays responsive all the time, without any locks.
- With connection pooling, the connections are kept sleeping or suspended all the time.
- The rate of adding procedures to the queue won't be faster than the rate of executing them.
My question is:
Is it better to do it this way, or will using many connections not affect the database?
I would make a SQL Agent job that runs your stored procedure. Then your connection can log in, start the job, and exit, and SQL Agent will run the job in the background. That way your connections aren't held open while the procedure runs.
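A minimal sketch of what starting such a job from the client could look like; the job name "UpdateAddressesJob" is only a placeholder you would create in SQL Agent first.
using (var conn = new SqlConnection(Test.Properties.Settings.Default.testConnection))
using (var cmd = new SqlCommand("msdb.dbo.sp_start_job", conn) { CommandType = CommandType.StoredProcedure })
{
    cmd.Parameters.AddWithValue("@job_name", "UpdateAddressesJob"); // hypothetical job name
    conn.Open();
    cmd.ExecuteNonQuery(); // returns as soon as the job is queued; SQL Agent runs it in the background
}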
That being said, I'll bet your database isn't slow because there are lots of connections; I'll bet it's slow because it's running lots of queries on behalf of those connections. But without knowing the code of your stored procedure or your schema, it's really impossible to know.
The number of connections could slow down SQL Server, depending on the actual amount.
One way of slimming down is to check whether or not the application is using connection pooling correctly. See the MSDN article on connection pooling for getting it right. A lot depends on the connection string and the state a connection is left in (if there are open transactions or different credentials, it can't be pooled).
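As a rough illustration of the pooling point: every worker should build its connections from one identical connection string so they all share the same pool. The server, database, and pool-size values below are placeholders, not recommendations.
var builder = new SqlConnectionStringBuilder
{
    DataSource = "myServer",        // placeholder server name
    InitialCatalog = "myDatabase",  // placeholder database name
    IntegratedSecurity = true,
    Pooling = true,                 // on by default, shown here for clarity
    MinPoolSize = 1,
    MaxPoolSize = 20                // illustrative cap
};

// Every SqlConnection created from the same string is served from the same pool,
// so open late, close early, and let the pool handle the reuse.
using (var conn = new SqlConnection(builder.ConnectionString))
{
    conn.Open();
    // ... run commands ...
}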
Another way is to move the execution of the procedure(s) to a central service and have that service cache the database requests/responses.
Finally, I'd have a look at the procedures/queries themselves; you mention that there is a lot of locking going on. Try to find out why. Did you create an insert hotspot at the end of a table? An index might help remove the hotspot. An (insert) trigger might be in the way. See this post for more details.
Related
We have the following processes that can modify the same dataset:
Web site (Asp.NET Web API that modifies some parent/child dataset)
Azure Web Job (C# code that modifies the same parent/child dataset)
What are the recommended ways to ensure that we keep data integrity when the parent/child datasets are modified by the two processes simultaneously? (A C# lock statement won't work because the code runs in different processes.)
Currently they are using Entity Framework, and each process loads the dataset into memory, works on the data, and then saves it. The problem is that the data may be changed by process A after it has been read by process B.
The data is in a SQL Azure database.
Can I create a blocking transaction on the parent table record (Id = XXXX) so that other processes have to wait until the lock is released? What is the best way to do that?
Otherwise, a couple of other ideas off the top of my head: set a "Locked" field on the parent record, or take the MAX RowVersion of the parent UNIONed with each child table (for the parent Id) and check that RowVersion before and after each change.
The sp_getapplock stored procedure seems to be exactly what you need.
It is very useful when you want to synchronize processes that need to get exclusive access to a part of the database.
This should be something like the code below, where ctx is a DbContext and each of the processes you want to synchronize uses the same shared logical lock name, such as the string "MYLOCKNAME", or anything you want.
using (var tran = ctx.Database.BeginTransaction())
{
    try
    {
        const string lockName = "MYLOCKNAME";
        var resourceParam = new SqlParameter("Resource", SqlDbType.NVarChar, 255) { Value = lockName };
        var lockModeParam = new SqlParameter("LockMode", SqlDbType.NVarChar, 32) { Value = "Transaction" };
        ctx.Database.ExecuteSqlCommand("EXEC sp_getapplock @Resource, @LockMode", resourceParam, lockModeParam);
        // Do stuff with ctx
        // ...
        tran.Commit();
    }
    catch
    {
        tran.Rollback();
    }
}
I'm trying to figure out the best way to batch insert about 37k rows into SQL Server using Dapper.
My problem is that when I use Parallel.ForEach, the number of connections to the database increases over a short period of time, finally hitting nearly 100, which gives connection pool errors. If I force the max degree of parallelism, it hits that max number and stays there.
Setting the max degree feels wrong.
It currently does about 10-20 inserts a second. This is also a simple console app, so there's no other database activity besides what's happening in my Parallel.ForEach loop.
Is using Parallel.ForEach the wrong choice here because this work is not CPU-bound?
Should I be using async/await? If so, what is stopping this from doing hundreds of DB calls in one go?
Sample code, which is basically what I'm doing:
var items = GetItemsFromSomewhere(); // Returns 37K items.
Parallel.ForEach(items, item =>
{
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        var result = sqlConnection.Execute(myQuery, new { ... } );
    }
});
My (incorrect) understanding of this was that there should only be about 8 or so connections to the DB at any time. The connection pool releases the connection (which remains instantiated in the pool, waiting to be used). If the Execute takes, say, even 1 second (the longest running time for an insert was about 500 ms, and that's maybe 1 in every 100), that's OK: the thread blocks and waits until the Execute completes. Then the scope ends (Dispose is called automatically) and the connection is closed. With the connection closed, Parallel.ForEach grabs the next item in the collection, goes to the connection pool, and grabs a spare connection (remember, we just closed one a split second ago)... rinse, repeat.
Is this wrong?
Notes:
.NET 4.5
Sql 2012
Console app.
Using Dapper.NET for sql code.
First of all: if it is about performance, use SqlBulkCopy. It works with SQL Server; if you are using other database servers, they might have their own bulk-copy solution (Oracle has one).
SqlBulkCopy works like a bulk select in reverse: a bulk select opens one connection and streams all the data from the server to the client, while a bulk insert streams all the new records from the client to the server.
See: https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
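A rough sketch of what that could look like for the 37k rows; the table name, column names, and Col1/Col2 properties are assumptions, not something from your schema.
var table = new DataTable();
table.Columns.Add("Col1", typeof(int));
table.Columns.Add("Col2", typeof(string));
foreach (var item in items)                  // items = the 37K objects from GetItemsFromSomewhere()
    table.Rows.Add(item.Col1, item.Col2);    // Col1/Col2 are hypothetical properties

using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.MyTable"; // hypothetical target table
        bulkCopy.BatchSize = 5000;
        bulkCopy.WriteToServer(table);                 // one streamed operation instead of 37k single inserts
    }
}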
If you insist on using parallelism, you might want to consider the following code:
void BulkInsert<T>(object p)
{
    IEnumerator<T> e = (IEnumerator<T>)p;
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        while (true)
        {
            T item;
            lock (e)
            {
                if (!e.MoveNext())
                    return;
                item = e.Current;
            }
            var result = sqlConnection.Execute(myQuery, new { ... } );
        }
    }
}
Now create your own threads and invoke this method on each of them with one and the same parameter: the enumerator that runs through your collection. Each thread opens its own connection once, starts inserting, and after all items are inserted the connection is closed. This solution uses as many connections as the threads you create.
PS: Multiple variants of the above code are possible. You could call it from background threads, from Tasks, etc. I hope you get the point.
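For example, a usage sketch along these lines (plain threads; MyItem and the thread count of 4 are assumptions, not recommendations):
IEnumerator<MyItem> enumerator = GetItemsFromSomewhere().GetEnumerator(); // the shared iterator
var threads = Enumerable.Range(0, 4)                                      // 4 threads = 4 connections
    .Select(_ => new Thread(BulkInsert<MyItem>))
    .ToList();
threads.ForEach(t => t.Start(enumerator)); // every thread gets the same enumerator
threads.ForEach(t => t.Join());            // wait until all items are inserted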
You should use SqlBulkCopy instead of inserting rows one by one. It's faster and more efficient.
https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
credits to the answer owner
Sql Bulk Copy/Insert in C#
First, let me start by saying that I am a C# beginner. My background is mostly in databases. I am working on a project where there will be frequent calls to a C# server, which then calls various stored procedures (about 20 or so) to retrieve data from a SQL Server DB. Right now, the C# server is set up to make synchronous calls. While the SP calls are quick and small, we would still like to implement a thread pool to handle a large number of users and simultaneous requests.
My questions:
How do I implement a thread pool? Most likely, the thread pool will start at around 500 threads, but it could grow depending on use of the application.
How do I add the SP calls to the thread pool? Right now my SP call looks like this:
int SPCall(string param1, string param2)
{
    string MyConnString = "...";
    SqlConnection MyConn = new SqlConnection(MyConnString);
    MyConn.Open();
    SqlCommand SPCommand = new SqlCommand("wh_SP");
    SPCommand.Connection = MyConn;
    SPCommand.Parameters.Add(...) = param1;
    SPCommand.Parameters.Add(...) = param2;
    SPCommand.CommandType = System.Data.CommandType.StoredProcedure;
    SPCommand.ExecuteNonQuery();
    int outPut = (int)SPCommand.Parameters["@OUTPUT"].Value;
    return outPut;
}
As mentioned in the comments, you should use the .NET ThreadPool instead of implementing your own. Even better, use the newer .NET Parallel library and chunk each of these out into a task. You will have far better control over how your concurrency is handled with relatively little code.
public void PerformWork()
{
    // setup your inputs
    IEnumerable<string> inputs = CreateYourInputList();

    // Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
    Parallel.ForEach(inputs, input =>
        {
            // call your code that issues the stored procedure here
            this.SPCall(input);
        } //close lambda expression
    ); //close method invocation

    // Keep the console window open in debug mode.
    Console.WriteLine("Processing complete. Press any key to exit.");
    Console.ReadKey();
}
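If you want to cap how many stored procedure calls run at once (and therefore how many pooled connections are in use at a time), you can pass ParallelOptions; the degree of 8 below is only an illustrative value.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 }; // example cap, tune to your server
Parallel.ForEach(inputs, options, input =>
{
    this.SPCall(input);
});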
I need some advice regarding an application I wrote. The issues I am having appear to be due to connections from my DAL to my SQL Server 2008 database not being closed; however, I have looked at my code, and each connection is always being closed.
The application is a multithreaded application that retrieves a set of records and while it processes a record it updates information about it.
Here is the flow:
The administrator has the ability to set the number of threads to run and how many records per thread to pull.
Here is the code that runs after they click start:
Adapters are abstractions over my DAL. Here is a sample of what they look like:
public class UserDetailsAdapter : IDataAdapter<UserDetails>
{
    private IUserDetailFactory _factory;

    public UserDetailsAdapter()
    {
        _factory = new CampaignFactory();
    }

    public UserDetails FindById(int id)
    {
        return _factory.FindById(id);
    }
}
As soon as the _factory is called it processes the SQL and immediately closes the connection.
Code For Threaded App:
private int _recordsPerthread;
private int _threadCount;

public void RunDetails()
{
    //create an adapter instance that is an abstraction
    //of the data factory layer
    var adapter = new UserDetailsAdapter();
    for (var i = 1; i <= _threadCount; i++)
    {
        //This adapter makes a call to the database to pull X amount of records and
        //sets a lock field so the next set of records that are pulled are different.
        var details = adapter.FindTopDetailsInQueue(_recordsPerthread);
        if (details != null)
        {
            var parameters = new ArrayList {i, details};
            ThreadPool.QueueUserWorkItem(ThreadWorker, parameters);
        }
        else
        {
            break;
        }
    }
}

private void ThreadWorker(object parametersList)
{
    var parms = (ArrayList) parametersList;
    var threadCount = (int) parms[0];
    var details = (List<UserDetails>) parms[1];
    var adapter = new DetailsAdapter();
    //we keep running until there are no records left in the database
    while (!_noRecordsInPool)
    {
        foreach (var detail in details)
        {
            var userAdapter = new UserAdapter();
            var domainAdapter = new DomainAdapter();
            var user = userAdapter.FindById(detail.UserId);
            var domain = domainAdapter.FindById(detail.DomainId);
            //...do some work here......
            adapter.Update(detail);
        }
        if (!_noRecordsInPool)
        {
            details = adapter.FindTopDetailsInQueue(_recordsPerthread);
            if (details == null || details.Count <= 0)
            {
                _noRecordsInPool = true;
                break;
            }
        }
    }
}
The app crashes because there seem to be connection issues with the database. Looking in my log files for the DAL, I am seeing this:
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
When I run this in one thread, it works fine. I am guessing that when I run it in multiple threads, I am making too many connections to the DB. Any thoughts on how I can keep this running in multiple threads and make sure the database doesn't give me any errors?
Update:
I am thinking my issue may be deadlocks in my database. Here is the SQL that is running when I get a deadlock error:
WITH cte AS (
    SELECT TOP (@topCount) *
    FROM dbo.UserDetails WITH (READPAST)
    WHERE IsLocked = 0
)
UPDATE cte
SET IsLocked = 1
OUTPUT INSERTED.*;
I have never had issues with this code before (in other applications). I reorganized my indexes, as they were 99% fragmented. That didn't help. I am at a loss here.
I'm confused as to where in your code connections get opened, but you probably want your data adapters to implement IDisposable (making sure to close the pooled connection as you leave the using scope) and wrap your code in using blocks:
using (var adapter = new UserDetailsAdapter())
{
    for (var i = 1; i <= _threadCount; i++)
    {
        [..]
    }
} // adapter leaves scope here; connection is implicitly marked as no longer necessary
ADO.NET uses connection pooling, so there's no need to (and it can be counter-productive to) explicitly open and close connections.
It is not clear to me how you actually connect to the database. The adapter must reference a connection.
How do you actually initialize that connection?
If you use a new adapter for each thread, you must use a new connection for each adapter.
I am not too familiar with your environment, but I am certain that you really need a lot of open connections before your DB starts complaining about it!
Well, after doing some research, I found that there might be a bug in SQL Server 2008 when running parallel queries. I'll have to dig up the link where I found the discussion on this, but I ended up running this on my server:
sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
This can decrease your server performance, overall, so it may not be an option for some people, but it worked great for me.
For some queries I added the OPTION (MAXDOP n) hint (n being the number of processors to utilize) so they run more efficiently. It did help a bit.
Secondly, I found out that my DAL's Dispose method was using GC.SuppressFinalize, so my finalizers were not firing properly in my DAL and were not closing out my connections.
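For what it's worth, the shape I ended up aiming for is the standard dispose pattern, where SuppressFinalize is fine as long as Dispose itself really releases the connection. This is just a sketch; _connection stands in for whatever member your DAL actually holds.
public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this); // safe, because Dispose(true) below actually releases the connection
}

protected virtual void Dispose(bool disposing)
{
    if (disposing && _connection != null)
    {
        _connection.Close();   // returns the connection to the pool deterministically
        _connection.Dispose();
        _connection = null;
    }
}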
Thanks to all who gave their input!
I'm performing a large number of INSERTs into a SQLite database. I'm using just one thread. I batch the writes to improve performance and to have a bit of security in case of a crash. Basically, I cache a bunch of data in memory and then, when I deem it appropriate, I loop over all of that data and perform the INSERTs. The code for this is shown below:
public void Commit()
{
    using (SQLiteConnection conn = new SQLiteConnection(this.connString))
    {
        conn.Open();
        using (SQLiteTransaction trans = conn.BeginTransaction())
        {
            using (SQLiteCommand command = conn.CreateCommand())
            {
                command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
                command.Parameters.Add(this.col1Param);
                command.Parameters.Add(this.col2Param);
                foreach (Data o in this.dataTemp)
                {
                    this.col1Param.Value = o.Col1Prop;
                    this.col2Param.Value = o.Col2Prop;
                    command.ExecuteNonQuery();
                }
            }
            this.TryHandleCommit(trans);
        }
        conn.Close();
    }
}
I now employ the following gimmick to get the thing to eventually work:
private void TryHandleCommit(SQLiteTransaction trans)
{
    try
    {
        trans.Commit();
    }
    catch (Exception e)
    {
        Console.WriteLine("Trying again...");
        this.TryHandleCommit(trans);
    }
}
I create my DB like so:
public DataBase(String path)
{
    //build connection string
    SQLiteConnectionStringBuilder connString = new SQLiteConnectionStringBuilder();
    connString.DataSource = path;
    connString.Version = 3;
    connString.DefaultTimeout = 5;
    connString.JournalMode = SQLiteJournalModeEnum.Persist;
    connString.UseUTF16Encoding = true;

    using (connection = new SQLiteConnection(connString.ToString()))
    {
        //check for existence of db
        FileInfo f = new FileInfo(path);
        if (!f.Exists) //build new blank db
        {
            SQLiteConnection.CreateFile(path);
            connection.Open();
            using (SQLiteTransaction trans = connection.BeginTransaction())
            {
                using (SQLiteCommand command = connection.CreateCommand())
                {
                    command.CommandText = DataBase.CREATE_MATCHES;
                    command.ExecuteNonQuery();
                    command.CommandText = DataBase.CREATE_STRING_DATA;
                    command.ExecuteNonQuery();
                    //TODO add logging
                }
                trans.Commit();
            }
            connection.Close();
        }
    }
}
I then export the connection string and use it to obtain new connections in different parts of the program.
At seemingly random intervals, though at far too great a rate to ignore or otherwise work around, I get an unhandled SQLiteException: Database file is locked. This occurs when I attempt to commit the transaction. No errors seem to occur before then. This does not always happen; sometimes the whole thing runs without a hitch.
No reads are being performed on these files before the commits finish.
I have the very latest SQLite binary.
I'm compiling for .NET 2.0.
I'm using VS 2008.
The db is a local file.
All of this activity is encapsulated within one thread / process.
Virus protection is off (though I think that was only relevant if you were connecting over a network?).
As per Scotsman's post I have implemented the following changes:
Journal Mode set to Persist
DB files stored in C:\Docs + Settings\ApplicationData via System.Windows.Forms.Application.AppData windows call
No inner exception
Witnessed on two distinct machines (albeit very similar hardware and software)
Have been running Process Monitor - no extraneous processes are attaching themselves to the DB files - the problem is definitely in my code...
Does anyone have any idea whats going on here?
I know I just dropped a whole mess of code, but I've been trying to figure this out for way too long. My thanks to anyone who makes it to the end of this question!
brian
UPDATES:
Thanks for the suggestions so far! I've implemented many of the suggested changes. I feel that we are getting closer to the answer...however...
The code above technically works, however it is non-deterministic! It is not guaranteed to do anything aside from spin in neutral forever. In practice it seems to work somewhere between the 1st and 10th iteration. If I batch my commits at a reasonable interval the damage is mitigated, but I really do not want to leave things in this state...
More suggestions welcome!
It looks like you failed to link the command with the transaction you've created.
Instead of:
using (SQLiteCommand command = conn.CreateCommand())
You should use:
using (SQLiteCommand command = new SQLiteCommand("<INSERT statement here>", conn, trans))
Or you can set its Transaction property after its construction.
While we are at it, your handling of failures is incorrect:
The command's ExecuteNonQuery method can also fail, and you are not really protected. You should change the code to something like this:
public void Commit()
{
    using (SQLiteConnection conn = new SQLiteConnection(this.connString))
    {
        conn.Open();
        SQLiteTransaction trans = conn.BeginTransaction();
        try
        {
            using (SQLiteCommand command = conn.CreateCommand())
            {
                command.Transaction = trans; // Now the command is linked to the transaction and doesn't try to create a new one (which is probably why your database gets locked)
                command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
                command.Parameters.Add(this.col1Param);
                command.Parameters.Add(this.col2Param);
                foreach (Data o in this.dataTemp)
                {
                    this.col1Param.Value = o.Col1Prop;
                    this.col2Param.Value = o.Col2Prop;
                    command.ExecuteNonQuery();
                }
            }
            trans.Commit();
        }
        catch (SQLiteException ex)
        {
            // You need to roll back in case something went wrong in command.ExecuteNonQuery() ...
            trans.Rollback();
            throw;
        }
    }
}
Another thing is that you don't need to cache anything in memory. You can rely on SQLite's journaling mechanism to store incomplete transaction state.
Run Sysinternals Process Monitor and filter on the filename while running your program to rule out whether any other process is doing anything to it, and to see exactly what your program is doing to the file. Long shot, but it might give a clue.
We had a very similar problem using nested transactions with the TransactionScope class. We thought all database actions occurred on the same thread... however we were caught out by the transaction mechanism, more specifically the ambient transaction.
Basically there was a transaction higher up the chain which, by the magic of ADO, the connection automatically enlisted in. The result was that, even though we thought we were writing to the database on a single thread, the write didn't really happen until the topmost transaction was committed. At this 'indeterminate' point the database was written to, causing it to be locked outside of our control.
The solution was to ensure that the SQLite database did not directly take part in the ambient transaction, by ensuring we used something like:
using (TransactionScope scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
    ...
    scope.Complete();
}
Things to watch for:
don't use connections across multiple threads/processes.
I've seen it happen when a virus scanner would detect changes to the file and try to scan it. It would lock the file for a short interval and cause havoc.
I started facing this same problem today: I'm studying asp.net mvc, building my first application completely from scratch. Sometimes, when I'd write to the database, I'd get the same exception, saying the database file was locked.
I found it really strange, since I was completely sure that there was just one connection open at that time (based on process explorer's listing of active file handles).
I've also built the whole data access layer from scratch, using System.Data.SQLite .Net provider, and, when I planned it, I took special care with connections and transactions, in order to ensure no connection or transaction was left hanging around.
The tricky part was that setting a breakpoint on ExecuteNonQuery() command and running the application in debug mode would make the error disappear!
Googling, I found something interesting on this site: http://www.softperfect.com/board/read.php?8,5775. There, someone replied to the thread suggesting the author put the database path on the anti-virus ignore list.
I added the database file to the ignore list of my anti-virus (Microsoft Security Essentials) and it solved my problem. No more database locked errors!
Is your database file on the same machine as the app or is it stored on a server?
You should create a new connection in every thread. I would simplify the creation of a connection and use this everywhere: connection = new SQLiteConnection(connString.ToString());
Also use a database file on the same machine as the app and test again.
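Something along these lines, so each thread owns its connection for the duration of its work and disposes it when done (just a sketch, not your exact code):
void DoWork(string connString)
{
    using (var connection = new SQLiteConnection(connString))
    {
        connection.Open();
        // ... run this thread's commands on its own connection only ...
    } // disposed here, never shared with another thread
}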
Why the two different ways of creating a connection?
These guys were having similar problems (mostly, it appears, with the journal file being locked, maybe TortoiseSVN interactions... check the referenced articles).
They came up with a set of recommendations (correct directories, changing journaling types from delete to persist, etc.): http://sqlite.phxsoftware.com/forums/p/689/5445.aspx#5445
The journal mode options are discussed here: http://www.sqlite.org/pragma.html . You could try TRUNCATE.
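If you want to experiment with TRUNCATE without touching the connection string builder, a PRAGMA right after opening the connection should do it. This is only a sketch under that assumption, not something verified against your setup.
using (var conn = new SQLiteConnection(connString.ToString()))
{
    conn.Open();
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = "PRAGMA journal_mode = TRUNCATE;";
        cmd.ExecuteNonQuery();
    }
    // ... inserts as before ...
}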
Is there a stack trace during the exception into SQL Lite?
You indicate you "batch my commits at a reasonable interval". What is the interval?
I would always use a Connection, Transaction and Command in a using clause. In your first code listing you did, but in your third (creating the tables) you didn't. I suggest you do that too, because (who knows?) maybe the commands that create the table somehow continue to lock the file. Long shot... but worth a shot?
Do you have Google Desktop Search (or another file indexer) running? As previously mentioned, Sysinternals Process Monitor can help you track it down.
Also, what is the filename of the database? From PerformanceTuningWindows:
Be VERY, VERY careful what you name your database, especially the extension
For example, if you give all your databases the extension .sdb (SQLite Database, nice name hey? I thought so when I chose it anyway...) you discover that the SDB extension is already associated with APPFIX PACKAGES.
Now, here is the cute part, APPFIX is an executable/package that Windows XP recognizes, and it will, (emphasis mine) ADD THE DATABASE TO THE SYSTEM RESTORE FUNCTIONALITY
This means, stay with me here, every time you write ANYTHING to the database, the Windows XP system thinks a bloody executable has changed and copies your ENTIRE 800 meg database to the system restore directory....
I recommend something like DB or DAT.
While the lock is reported on the COMMIT, the lock is on the INSERT/UPDATE command. Check for record locks not being released earlier in your code.