Reviewing some legacy code, there is a commonly used table that gets updated very infrequently.
To save having to constantly go to the database to get the same data each time, it seems like the developer was trying to cache the data. The code looks like this:
private static IDataReader _cachedCheckList;
public override IDataReader GetDataReader()
{
if (_cachedCheckList == null)
{
using (var oneTimeRead = base.GetDataReader())
{
_cachedCheckList = new CachedDataReader(oneTimeRead);
}
}
return _cachedCheckList ?? base.GetDataReader();
}
Then elsewhere in the system the function that uses this follows the pattern of:
IDataReader reader = new CheckList().GetDataReader();
while (reader.Read())
{
[snip]
}
By loading the IDataReader into memory, I don't think this provides much in the way of a performance increase.
I'm trying to understand the developer's reasoning for this code. What is the benefit of caching the IDataReader?
Update: The CachedDataReader() method is basically:
SqlConnection connection = new SqlConnection(ConnectionString);
connection.Open();
var command = new SqlCommand(commandText, connection);
command.CommandType = CommandType.StoredProcedure;
return command.ExecuteReader();
I'd not seen anyone cache a DataReader before and was wondering whether there was a good reason to do this before refactoring the code.
Maybe they cached the DataReader for the following reason:
A DataReader is read-only and forward-only. It fetches records from the database into the network buffer and hands them out as they are requested. It releases records as the query executes rather than waiting for the entire query to finish, which makes it very fast compared to a DataSet; records are only pulled when the Read method is called.
However, a DataReader should not be cached. You should fetch the data into a DataSet or DataTable and then use:
Cache["Data"] = DataTable;
Never cache DataReader objects. Because a DataReader object holds an open connection to the database, caching the object extends the lifetime of the connection, affecting other users of the database. Also, because the DataReader is a forward-only stream of data, after a client has read the information, the information cannot be accessed again. Caching it would be futile.
Caching DataReader objects disastrously affects the scalability of your applications. You may hold connections open and eventually cache all available connections, making the database unusable until the connections are closed. Never cache DataReader objects no matter what caching technology you are using.
Above quotes taken from https://forums.asp.net/post/3224692.aspx
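For what it's worth, here is a minimal sketch of how the CheckList cache could hold a DataTable instead of the reader; the field name, method name and locking are my own additions, not the original code:

private static DataTable _cachedCheckListTable;
private static readonly object _cacheLock = new object();

public DataTable GetCheckListTable()
{
    if (_cachedCheckListTable == null)
    {
        lock (_cacheLock)
        {
            if (_cachedCheckListTable == null)
            {
                var table = new DataTable();
                // Read everything once; the reader (and its connection) is closed here.
                using (IDataReader oneTimeRead = base.GetDataReader())
                {
                    table.Load(oneTimeRead);
                }
                _cachedCheckListTable = table;
            }
        }
    }
    return _cachedCheckListTable;
}

Callers that still expect an IDataReader can use _cachedCheckListTable.CreateDataReader(), which reads from memory rather than holding a database connection open.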
I am using IBM.Data.DB2.DB2DataAdapter object to make multiple connections to different databases on different servers.
My basic loop and connection structure looks like this:
foreach (MyDBObject db in allDBs)
{
//Database Call here for current DB...//Get SQL, then pass it to DB call
QueryCurrentDB(command);
}
Then...
DB2Connection _connection;
public DataTable QueryCurrentDB(DB2Command command)
{
_connection = new DB2Connection();
DB2DataAdapter adapter = new DB2DataAdapter();
DataTable dataTable = new DataTable();
_connection.ConnectionString = string.Format("Server={0};Database={1};UID={2};PWD={3};", _currentDB.DBServer, _currentDB.DBName, _currentDB.UserCode, _currentDB.Password);
command.CommandTimeout = 20;
command.Connection = _connection;
adapter.SelectCommand = command;
_connection.Open();
adapter.Fill(dataTable);
_connection.Close();
_connection.Dispose();
return dataTable;
}
If I have around 20 or so databases on different servers I end up eventually getting this exception. I cannot control the memory allocation for each db instance either.
ERROR [57019] [IBM] SQL1084C The database manager failed to allocate shared memory because an operating system kernel memory limit has been reached. SQLSTATE=57019
The only way I have been able to get around this is to put a thread sleep before each db call, such as:
System.Threading.Thread.Sleep(3000);
I hate this, any suggestions would be appreciated.
In the code posted, the Connection, Command and DataAdapter are all IDisposable, indicating they need to be disposed to free allocated resources, but only the DB2Connection object is actually disposed. Particularly in a loop such as yours, it is important to dispose of them to prevent leaks.
I don't have the DB2 providers, but they all work pretty much the same, especially in this regard. I'd start by refactoring the code, beginning with MyDBObject. Rather than just holding onto connection-string params, have it create the connection(s) for you:
class MyDBObject
{
private const string fmt = "Server={0};Database={1};UID={2};PWD={3};";
...
public DB2Connection GetConnection()
{
return new DB2Connection(string.Format(fmt,
DBServer,DBName,UserCode,Password));
}
}
Then the loop method:
// this also could be a method in MyDbObject
public DataTable QueryCurrentDB(string SQL)
{
DataTable dt = new DataTable();
using (DB2Connection dbcon = currentDB.GetConnection())
using (DB2Command cmd = new DB2Command(SQL, dbcon))
{
cmd.CommandTimeout = 20;
dbcon.Open();
dt.Load(cmd.ExecuteReader());
}
return dt;
}
Most importantly, note that the IDisposable objects are all enclosed in a using block. This will dispose (and close) the target and release any resources allocated.
You don't need a DataAdapter to fill a table. Omitting it means one less IDisposable thing created.
Rather than passing in the command, pass in the SQL. This allows you to also create, use and dispose of the DBCommand object.
If there is a chance of 2 tables in the same DB getting polled, I'd refactor further to make it possible to fill both tables on the same connection.
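A rough sketch of that further refactoring, assuming a hypothetical list of SQL statements to run per database (the method and parameter names are illustrative, not from the original code):

// Requires System.Data, System.Collections.Generic and IBM.Data.DB2.
public List<DataTable> QueryCurrentDB(MyDBObject currentDB, IEnumerable<string> sqlStatements)
{
    var tables = new List<DataTable>();
    using (DB2Connection dbcon = currentDB.GetConnection())
    {
        dbcon.Open();
        foreach (string sql in sqlStatements)
        {
            using (DB2Command cmd = new DB2Command(sql, dbcon))
            {
                cmd.CommandTimeout = 20;
                DataTable dt = new DataTable();
                using (var reader = cmd.ExecuteReader())
                {
                    dt.Load(reader);   // one open/close of the connection serves all tables
                }
                tables.Add(dt);
            }
        }
    }
    return tables;
}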
Before: 2 out of 3 objects were not being disposed (per iteration!)
After: 2 out of 2 objects are disposed.
I suspect the culprit was the DBCommand object (similar to this question), but it could be a combination of them.
Putting the thread to sleep (probably) works because it gives GC a chance to catch up on cleanup. You are probably not out of the woods yet. The link above was running into problems at 400 iterations; 20 or even 40 (20*2 objects) seems like a very small number to exhaust resources.
So, I suspect other parts of the code are also failing to dispose properly and that loop is just the straw which breaks the camel's back. Look for other loops and DB objects being used and be sure to dispose of them. Basically, anything which has a Dispose() method ought to be used in a using block.
One of the first things people learn when starting out with MySQL is that closing a connection right after its use is important, but why is this so important? Well, if we do it on a website it can save some server resources (as described here). But why should we do that in a .NET desktop application? Does it share the same issues as a web application, or are there others?
If you use connection pooling, you won't close the physical connection by calling con.Close; you just tell the pool that this connection can be used again. If you run database calls in a loop, you'll quickly get exceptions like the one below if you don't close them.
Check this example, where neither the connection, the command nor the reader is ever disposed:
for (int i = 0; i < 1000; i++)
{
var con = new SqlConnection(Properties.Settings.Default.ConnectionString);
con.Open();
var cmd = new SqlCommand("Select 1", con);
var rd = cmd.ExecuteReader();
while (rd.Read())
Console.WriteLine("{0}) {1}", i, rd.GetInt32(0));
}
One of the possible exceptions:
Timeout expired. The timeout period elapsed prior to obtaining a
connection from the pool. This may have occurred because all pooled
connections were in use and max pool size was reached.
By the way, the same is true for a MySqlConnection.
This is the correct way: use the using statement on all types implementing IDisposable:
using (var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
{
con.Open();
for (int i = 0; i < 1000; i++)
{
using(var cmd = new SqlCommand("Select 1", con))
using (var rd = cmd.ExecuteReader())
while (rd.Read())
Console.WriteLine("{0}) {1}", i, rd.GetInt32(0));
}
} // no need to close it explicitly; the using statement does that via connection.Dispose
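The same pattern applies to MySQL; here is a minimal sketch with the MySql.Data provider (the settings-based connection string is an assumption):

using (var con = new MySqlConnection(Properties.Settings.Default.ConnectionString))
{
    con.Open();
    using (var cmd = new MySqlCommand("SELECT 1", con))
    using (var rd = cmd.ExecuteReader())
    {
        while (rd.Read())
            Console.WriteLine(rd[0]);
    }
} // the underlying physical connection goes back to the pool here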
Yes, I think it is important to close your connection rather than leaving it open or allowing the garbage collector to eventually handle it. There are a couple of reasons why you should do this, and below I'll describe the best method for how.
WHY:
So you've opened a connection to the database, sent some data back and forth along this pipeline, and now have the results you were looking for. Ideally at this point you do something else with the data and the end result of your application is achieved.
Once you have the data from the database you don't need the connection any more; its part in this is done. Leaving it open does nothing but hold memory and increase the number of connections the database and your application have to keep track of, possibly pushing you closer to your maximum-connections limit.
"But wait! I have to make a lot of database calls in rapid
succession!"
Okay, no problem: open the connection, run your calls, and then close it again. Opening a connection to a database in a "modern" application isn't going to cost you a significant amount of computing power or time, while explicitly closing a connection does nothing but help (it frees up memory and lowers your number of current connections).
So that is the why; here is the how:
HOW:
So, depending on how you are connecting to your MySQL database, you are probably using an IDisposable object to help manage the connection. Here is what MSDN has to say on using an IDisposable:
As a rule, when you use an IDisposable object, you should declare and
instantiate it in a using statement. The using statement calls the
Dispose method on the object in the correct way, and (when you use it
as shown earlier) it also causes the object itself to go out of scope
as soon as Dispose is called. Within the using block, the object is
read-only and cannot be modified or reassigned.
Here is my personal take on the subject:
Using a using block helps to keep your code cleaner (readability)
Using a using block helps to keep your code clear (memory-wise); it will "automagically" clean up unused items
A using block also helps prevent a previous connection from being used accidentally, as it automatically closes the connection when you are done with it
In short, I think it is important to close connections properly, preferably with an explicit con.Close() in combination with a using block
As pointed out in the comments this is also a very good question/answer similar to yours: Why always close Database connection?
I'd like to know the correct approach for running two simultaneous queries using NHibernate. Right now, I have a single ISession object that I use for all my queries:
session = sessionFactory.OpenSession();
In one thread, I'm loading some data which takes 10-15 seconds, but I don't need it right away so I don't want to block the entire program while it's loading:
IDbCommand cmd = session.Connection.CreateCommand();
cmd.CommandType = CommandType.TableDirect;
cmd.CommandText = "RecipesForModelingGraph";
IDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
// Do stuff
}
reader.Close();
This works fine, however in another thread I might be running a query such as:
var newBlah = new Blah();
session.Save(newBlah);
When the above transaction commits, I occasionally get an exception:
Additional information: There is already an open DataReader associated
with this Command which must be closed first.
Now, I thought maybe this was because I was running everything in the same transaction. So, I surrounded all my loading code with:
using (ITransaction transaction = session.BeginTransaction(IsolationLevel.Serializable))
{
// Same DataReader code as above
}
However, the problem has not gone away. I'm thinking maybe I need each thread to have its own ISession object. Is this the correct approach, or am I doing something wrong? Note, I only want a single open connection to the database. Also, keep in mind the background thread is only loading data and nothing else, so I'm not worried about isolation levels and data changing as it's being read.
The session is tied to the thread, and the commands created are linked to the session's connection object. So yes, if a commit or close is executed while an open reader exists, you will get an exception.
You could Join() your threads and wait until all are complete before closing/committing.
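A minimal sketch of the Join() approach (LoadRecipesForModelingGraph stands in for however you start the background load; it is not from the original code):

// Start the slow load on a background thread.
Thread loaderThread = new Thread(LoadRecipesForModelingGraph);
loaderThread.Start();

// ... other work on the main thread ...

// Wait until the background reader has been closed before
// committing anything on the shared session.
loaderThread.Join();

using (ITransaction tx = session.BeginTransaction())
{
    session.Save(newBlah);
    tx.Commit();
}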
I have a distributed DB architecture where data is stored in multiple SQL Servers.
How can I do a select/update/delete by running a single query? For example, "select * from employees" should return data from all the databases I have.
How can I write a single query which runs across multiple SQL Servers and returns a single consolidated view to my web server?
NOTE: Since the number of SQL Servers may change over time, I am looking for something other than linked servers; managing the linked queries at scale (up or down) is a big pain.
To talk to different databases / connections, you'll need a distributed transaction via TransactionScope; fortunately, this is actually easier to use than db transactions (although you need a reference to System.Transactions.dll):
using(TransactionScope tran = new TransactionScope()) {
// lots of code talking to different databases / connections
tran.Complete();
}
Additionally, TransactionScopes nest naturally, and SqlConnection enlists automatically, making it really easy to use.
Use TransactionScope.
If you open connections to different servers within the scope, the transaction will be escalated to a distributed transaction.
Example:
using (TransactionScope scope = new TransactionScope())
{
conn1.Open(); //Open connection to db1
conn2.Open(); //Open connection to db2
// Don't forget to commit the transaction so it won't rollback
scope.Complete();
}
You can't do what you're after with a single query unless you're willing to insert an intermediary of some kind, such as a SQL Express instance, that would mediate with the other servers, perhaps using SQL CLR. But that's messy.
It would be much easier to simply issue a bunch of async requests, and then merge the responses into a single DataTable (or equivalent) when they arrive. By using native ADO.NET style async calls, all queries can happen in parallel. You will of course need to use a lock while reading the data into a single DataTable.
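A rough sketch of that approach, using tasks (rather than the older Begin/End async pattern) and a lock around the merge; the connectionStrings list and the query are placeholders:

object mergeLock = new object();
DataTable combined = new DataTable();

Task[] queries = connectionStrings.Select(cs => Task.Run(() =>
{
    var local = new DataTable();
    using (var conn = new SqlConnection(cs))
    using (var cmd = new SqlCommand("select * from employees", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            local.Load(reader);
        }
    }
    lock (mergeLock) // only the merge into the shared table needs to be serialized
    {
        combined.Merge(local);
    }
})).ToArray();

Task.WaitAll(queries);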
The best solution here is to use a Virtual DBMS to blend your multiple back-ends into a single apparent backend -- so your query goes to the Virtual DBMS which then relays it appropriately to the actual data stores.
OpenLink Virtuoso is one option. Virtuoso opens connections to any ODBC-accessible (including JDBC-accessible, via an ODBC-to-JDBC Bridge) data source.
Your data consuming applications can connect to Virtuoso via ODBC, JDBC, OLE-DB, or ADO.NET as needed. All remote linked objects (Tables, Views, Stored Procedures, etc.) are available through all data access mechanisms.
While you can achieve similar results using the other techniques outlined here, those require the end user to know all about the back-end data structures, and to optimize queries themselves. With Virtuoso, a built-in Cost-based Optimizer will re-write queries to deliver the fastest possible results, with the least possible network traffic, based on the Virtual Schema constructed when you link in the remote objects.
Disclaimer: I work for OpenLink Software, but do not directly benefit from anyone choosing to use our products.
If you do not have the access or privileges to create a linked server and generate a view with a consolidated JOINed query across all the SQL Servers, run the same query statement against every SQL Server instance and union the results: loop through all the database connections and add the collected data to a consolidated data structure. In this example I chose a DataTable:
DataTable consolidatedEmployees = new DataTable();
foreach(ConnectionStringSettings cs in ConfigurationManager.ConnectionStrings)
{
consolidatedEmployees.Merge(
SelectTransaction("select * from employees", cs.ConnectionString));
}
This ADO.NET-based method can be used to query any SQL Server database:
/// <summary>
/// Method to execute SQL Query statements with
/// Transaction scope using an isolation level to read committed data
/// </summary>
/// <param name="query">SQL Query statement</param>
/// <param name="connString">Connections String</param>
internal DataTable SelectTransaction(string query, string connString)
{
DataTable tableResult = null;
SqlCommand cmd = null;
SqlConnection conn = null;
SqlDataAdapter adapter = null;
TransactionOptions tranOpt = new TransactionOptions();
tranOpt.IsolationLevel = IsolationLevel.ReadCommitted;
using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Required, tranOpt))
{
tableResult = new DataTable();
try
{
conn = new SqlConnection(connString);
conn.Open();
cmd = new SqlCommand(query, conn);
adapter = new SqlDataAdapter(cmd);
adapter.Fill(tableResult);
}
catch (Exception ex)
{
scope.Dispose();
throw new Exception("Erro durante a transação ao banco de Dados.", ex);
}
finally
{
if (null != adapter)
{
adapter.Dispose();
}
if (null != cmd)
{
cmd.Dispose();
}
if (null != conn)
{
conn.Close();
conn.Dispose();
}
}
scope.Complete();
}
return tableResult;
}
With this solution, just pay attention to replicated data; you may need to apply a DISTINCT to the consolidated result afterwards.
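If duplicates do appear, one way to de-duplicate the consolidated DataTable is through a DataView (the column names here are assumptions about the employees table):

// Produces a new table containing only distinct rows for the listed columns.
DataTable distinctEmployees = new DataView(consolidatedEmployees)
    .ToTable(true, "EmployeeId", "FirstName", "LastName");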
I'm performing a large number of INSERTs into a SQLite database. I'm using just one thread. I batch the writes to improve performance and to have a bit of security in case of a crash. Basically I cache a bunch of data in memory and then, when I deem it appropriate, I loop over all of that data and perform the INSERTs. The code for this is shown below:
public void Commit()
{
using (SQLiteConnection conn = new SQLiteConnection(this.connString))
{
conn.Open();
using (SQLiteTransaction trans = conn.BeginTransaction())
{
using (SQLiteCommand command = conn.CreateCommand())
{
command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
command.Parameters.Add(this.col1Param);
command.Parameters.Add(this.col2Param);
foreach (Data o in this.dataTemp)
{
this.col1Param.Value = o.Col1Prop;
this.col2Param.Value = o.Col2Prop;
command.ExecuteNonQuery();
}
}
this.TryHandleCommit(trans);
}
conn.Close();
}
}
I now employ the following gimmick to get the thing to eventually work:
private void TryHandleCommit(SQLiteTransaction trans)
{
try
{
trans.Commit();
}
catch (Exception e)
{
Console.WriteLine("Trying again...");
this.TryHandleCommit(trans);
}
}
I create my DB like so:
public DataBase(String path)
{
//build connection string
SQLiteConnectionStringBuilder connString = new SQLiteConnectionStringBuilder();
connString.DataSource = path;
connString.Version = 3;
connString.DefaultTimeout = 5;
connString.JournalMode = SQLiteJournalModeEnum.Persist;
connString.UseUTF16Encoding = true;
using (connection = new SQLiteConnection(connString.ToString()))
{
//check for existence of db
FileInfo f = new FileInfo(path);
if (!f.Exists) //build new blank db
{
SQLiteConnection.CreateFile(path);
connection.Open();
using (SQLiteTransaction trans = connection.BeginTransaction())
{
using (SQLiteCommand command = connection.CreateCommand())
{
command.CommandText = DataBase.CREATE_MATCHES;
command.ExecuteNonQuery();
command.CommandText = DataBase.CREATE_STRING_DATA;
command.ExecuteNonQuery();
//TODO add logging
}
trans.Commit();
}
connection.Close();
}
}
}
I then export the connection string and use it to obtain new connections in different parts of the program.
At seemingly random intervals, though at far too great a rate to ignore or otherwise work around, I get an unhandled SQLiteException: Database file is locked. This occurs when I attempt to commit the transaction. No errors seem to occur prior to then. This does not always happen; sometimes the whole thing runs without a hitch.
No reads are being performed on these files before the commits finish.
I have the very latest SQLite binary.
I'm compiling for .NET 2.0.
I'm using VS 2008.
The db is a local file.
All of this activity is encapsulated within one thread / process.
Virus protection is off (though I think that was only relevant if you were connecting over a network?).
As per Scotsman's post I have implemented the following changes:
Journal Mode set to Persist
DB files stored in C:\Docs + Settings\ApplicationData via System.Windows.Forms.Application.AppData windows call
No inner exception
Witnessed on two distinct machines (albeit very similar hardware and software)
Have been running Process Monitor - no extraneous processes are attaching themselves to the DB files - the problem is definitely in my code...
Does anyone have any idea what's going on here?
I know I just dropped a whole mess of code, but I've been trying to figure this out for way too long. My thanks to anyone who makes it to the end of this question!
brian
UPDATES:
Thanks for the suggestions so far! I've implemented many of the suggested changes. I feel that we are getting closer to the answer...however...
The code above technically works, however it is non-deterministic! It is not guaranteed to do anything aside from spin in neutral forever. In practice it seems to succeed somewhere between the 1st and 10th iteration. If I batch my commits at a reasonable interval the damage will be mitigated, but I really do not want to leave things in this state...
More suggestions welcome!
It looks like you failed to link the command with the transaction you've created.
Instead of:
using (SQLiteCommand command = conn.CreateCommand())
You should use:
using (SQLiteCommand command = new SQLiteCommand("<INSERT statement here>", conn, trans))
Or you can set its Transaction property after its construction.
While we are at it, your handling of failures is incorrect: the command's ExecuteNonQuery method can also fail, and you are not really protected. You should change the code to something like this:
public void Commit()
{
using (SQLiteConnection conn = new SQLiteConnection(this.connString))
{
conn.Open();
SQLiteTransaction trans = conn.BeginTransaction();
try
{
using (SQLiteCommand command = conn.CreateCommand())
{
command.Transaction = trans; // Now the command is linked to the transaction and don't try to create a new one (which is probably why your database gets locked)
command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
command.Parameters.Add(this.col1Param);
command.Parameters.Add(this.col2Param);
foreach (Data o in this.dataTemp)
{
this.col1Param.Value = o.Col1Prop;
this.col2Param.Value = o.Col2Prop;
command.ExecuteNonQuery();
}
}
trans.Commit();
}
catch (SQLiteException ex)
{
// You need to rollback in case something wrong happened in command.ExecuteNonQuery() ...
trans.Rollback();
throw;
}
}
}
Another thing is that you don't need to cache anything in memory. You can depend on SQLite's journaling mechanism to store incomplete transaction state.
Run Sysinternals Process Monitor and filter on the filename while running your program, to rule out any other process doing anything to the file and to see exactly what your program is doing to it. Long shot, but it might give a clue.
We had a very similar problem using nested Transactions with the TransactionScope class. We thought all database actions occurred on the same thread...however we were caught out by the Transaction mechanism...more specifically the Ambient transaction.
Basically there was a transaction higher up the chain which, by the magic of ADO, the connection automatically enlisted in. The result was that, even though we thought we were writing to the database on a single thread, the write didn't really happen until the topmost transaction was committed. At this 'indeterminate' point the database was written to, causing it to be locked outside of our control.
The solution was to ensure that the sqlite database did not directly take part in the ambient transaction by ensuring we used something like:
using (TransactionScope scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
    ...
    scope.Complete();
}
Things to watch for:
don't use connections across multiple threads/processes.
I've seen it happen when a virus scanner would detect changes to the file and try to scan it. It would lock the file for a short interval and cause havoc.
I started facing this same problem today: I'm studying asp.net mvc, building my first application completely from scratch. Sometimes, when I'd write to the database, I'd get the same exception, saying the database file was locked.
I found it really strange, since I was completely sure that there was just one connection open at that time (based on Process Explorer's listing of active file handles).
I've also built the whole data access layer from scratch, using System.Data.SQLite .Net provider, and, when I planned it, I took special care with connections and transactions, in order to ensure no connection or transaction was left hanging around.
The tricky part was that setting a breakpoint on the ExecuteNonQuery() call and running the application in debug mode would make the error disappear!
Googling, I found something interesting on this site: http://www.softperfect.com/board/read.php?8,5775. There, someone replied to the thread suggesting the author put the database path on the anti-virus ignore list.
I added the database file to the ignore list of my anti-virus (Microsoft Security Essentials) and it solved my problem. No more database locked errors!
Is your database file on the same machine as the app or is it stored on a server?
You should create a new connection in every thread. I would simplify the creation of a connection and use this everywhere: connection = new SQLiteConnection(connString.ToString());
and use a database file on the same machine as the app and test again.
Why the two different ways of creating a connection?
These guys were having similar problems (mostly, it appears, with the journaling file being locked, maybe TortoiseSVN interactions ... check the referenced articles).
They came up with a set of recommendations (correct directories, changing journaling types from delete to persist, etc). http://sqlite.phxsoftware.com/forums/p/689/5445.aspx#5445
The journal mode options are discussed here: http://www.sqlite.org/pragma.html . You could try TRUNCATE.
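If your version of the provider doesn't expose TRUNCATE through the connection string, here is a small sketch of switching the journal mode with a PRAGMA right after opening the connection (untested against your setup):

using (SQLiteCommand pragma = connection.CreateCommand())
{
    pragma.CommandText = "PRAGMA journal_mode=TRUNCATE;";
    // ExecuteScalar returns the journal mode actually in effect.
    Console.WriteLine(pragma.ExecuteScalar());
}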
Is there a stack trace in the exception from SQLite?
You indicate you "batch my commits at a reasonable interval". What is the interval?
I would always use a Connection, Transaction and Command in a using clause. In your first code listing you did, but in your third (creating the tables) you didn't. I suggest you do that too, because (who knows?) maybe the commands that create the tables somehow continue to lock the file. Long shot... but worth a shot?
Do you have Google Desktop Search (or another file indexer) running? As previously mentioned, Sysinternals Process Monitor can help you track it down.
Also, what is the filename of the database? From PerformanceTuningWindows:
Be VERY, VERY careful what you name your database, especially the extension
For example, if you give all your databases the extension .sdb (SQLite Database, nice name hey? I thought so when I chose it anyway...) you'll discover that the SDB extension is already associated with APPFIX PACKAGES.
Now, here is the cute part, APPFIX is an executable/package that Windows XP recognizes, and it will, (emphasis mine) ADD THE DATABASE TO THE SYSTEM RESTORE FUNCTIONALITY
This means, stay with me here, every time you write ANYTHING to the database, the Windows XP system thinks a bloody executable has changed and copies your ENTIRE 800 meg database to the system restore directory....
I recommend something like DB or DAT.
While the lock is reported on the COMMIT, the lock is on the INSERT/UPDATE command. Check for record locks not being released earlier in your code.