SQL Server recovery mechanism

I have a service application in C# which queries data from one database and inserts it into another SQL database.
Sometimes the MSSQLSERVER service crashes for an unknown reason, and my application crashes as well. I want to build a SQL recovery mechanism where I check that the SqlConnection state is fine before I write to the database, but how do I do that?
I tried stopping the MSSQLSERVER service, and SqlConnection.State is always Open even when the MSSQLSERVER service is stopped.

First: Fix your real problem. SQL Server should be very, very stable.
Second: consider using MSMQ (or SQL Service Broker) on both the client application and server to queue updates.

The general strategy of checking the connection state before calling a SQL command fundamentally won't work. What happens if the service crashes after your connection check, but before you call the SQL command?
You will probably need to figure out which exception is thrown when the database is down and recover from that exception at the appropriate layer of code.

I think the approach you chose is not very good.
If your application is some kind of scheduled job, let it crash. No database, no work can be done, so it is OK to crash in this case. The next time it runs and the database is up, it will do its thing. You can also implement retries.
If your application is a Windows service driven by some kind of scheduled timer, just make sure your service doesn't crash by handling SqlException, and retry until the server is up.
Also, you might want to use distributed transactions to guarantee the integrity of the copy procedure, but whether you need them depends on your requirements.
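[Edit] In response to the retry question:
// requires: using System; using System.Data.SqlClient; using System.Threading;
// connectionString is assumed to be defined elsewhere
var attemptNumber = 0;
while (true)
{
    try
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            // do the job
        }
        break;
    }
    catch (SqlException exception)
    {
        Console.Error.WriteLine(exception.Message); // log exception
        attemptNumber++;
        if (attemptNumber > 3)
            throw; // let it crash
        Thread.Sleep(TimeSpan.FromSeconds(5)); // example pause before the next attempt
    }
}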
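For the Windows-service case, the same idea can run in a long-lived loop that never gives up, backs off between failures, and stops cleanly when the service is stopped. This is a sketch under assumptions: `runOnceAsync` is a hypothetical stand-in for the copy job, and the capped exponential backoff (double the wait after each consecutive failure, up to `maxDelay`) is an illustrative choice, not something the answer prescribes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CopyWorker
{
    // Runs the (hypothetical) copy job repeatedly until the token is cancelled.
    // A failed run never ends the loop: log, wait, try again, doubling the
    // wait after each consecutive failure up to maxDelay.
    public static async Task RunAsync(
        Func<CancellationToken, Task> runOnceAsync,
        TimeSpan pollInterval,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        Action<Exception> log,
        CancellationToken token)
    {
        TimeSpan delay = initialDelay;
        while (!token.IsCancellationRequested)
        {
            try
            {
                await runOnceAsync(token);
                delay = initialDelay;                  // success: reset the backoff
                await Task.Delay(pollInterval, token); // wait for the next scheduled run
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                break;                                 // the service is stopping
            }
            catch (Exception ex)                       // e.g. SqlException while the server is down
            {
                log(ex);
                try { await Task.Delay(delay, token); }
                catch (OperationCanceledException) { break; }
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}
```

In a service, `RunAsync` would typically be started from `OnStart` and the token cancelled from `OnStop`, so a database outage only produces log entries instead of a crashed service.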

Proper way to deal with database connectivity issue

I am getting the following error when trying to connect to the database:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
I only get this error sometimes. For example, when I run my program for the first time, it opens the connection successfully; when I run it a second time, I get this error; and the next time I run it, I don't get the error.
When I connect to the same database server through SSMS, I am able to connect successfully; I only get this network error from my program.
The database is not local; it is on Azure.
I don't get this error with my local database.
Code:
public class AddOperation
{
    public void Start()
    {
        using (var processor = new MyProcessor())
        {
            for (int i = 0; i < 2; i++)
            {
                if (i == 0)
                {
                    var connection = new SqlConnection("Connection string 1");
                    processor.Process(connection);
                }
                else
                {
                    var connection = new SqlConnection("Connection string 2");
                    processor.Process(connection);
                }
            }
        }
    }
}
public class MyProcessor : IDisposable
{
    public void Process(DbConnection cn)
    {
        using (var cmd = cn.CreateCommand())
        {
            cmd.CommandText = "query";
            cmd.CommandTimeout = 1800;
            cn.Open(); // sometimes works, sometimes doesn't
            using (var reader = cmd.ExecuteReader(CommandBehavior.CloseConnection))
            {
                //code
            }
        }
    }

    public void Dispose() { } // IDisposable member (body not shown in the question)
}
So I am confused about two things:
1) Connection timeout: should I increase the connection timeout, and will this solve my intermittent connection problem?
2) Retry policy: should I implement a connection retry mechanism like the one below?
public static void OpenConnection(DbConnection cn, int maxAttempts = 1) // note: the default of 1 means no retry
{
    int attempts = 0;
    while (true)
    {
        try
        {
            cn.Open();
            return;
        }
        catch
        {
            attempts++;
            if (attempts >= maxAttempts) throw;
        }
    }
}
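If you do keep a hand-written retry, a common refinement (not part of the question's code) is to wait between attempts and to retry only errors that look transient, so that, for example, a wrong password fails immediately instead of being retried. The sketch below uses only the base library; the `isTransient` predicate is an assumption you must supply (with SqlClient you would typically inspect the exception type or `SqlException.Number`), and the 1-second base delay is illustrative.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Calls action until it succeeds, throws a non-transient exception,
    // or maxAttempts attempts have been made. Waits baseDelay, 2*baseDelay,
    // 4*baseDelay, ... between attempts (exponential backoff).
    public static void Execute(
        Action action,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: back off, then loop.
                // On the last attempt, or for a non-transient error, the filter
                // is false and the exception propagates to the caller.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

For example, `Retry.Execute(() => cn.Open(), ex => ex is System.Data.Common.DbException)` would stand in for `OpenConnection(cn, 3)`, with a deliberately broad predicate that you would normally narrow.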
I am torn between these two options.
Can anybody suggest which would be the better way to deal with this problem?

Use a newer version of .NET (4.6.1 or later) and take advantage of the built-in resiliency features:
ConnectRetryCount, ConnectRetryInterval and Connection Timeout.
See the documentation for more info: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-connectivity-issues#net-sqlconnection-parameters-for-connection-retry

All applications that communicate with a remote service are sensitive to transient faults.
As mentioned in other answers, if your client program connects to SQL Database by using the .NET Framework class System.Data.SqlClient.SqlConnection, use .NET 4.6.1 or later (or .NET Core) so that you can use its connection retry feature.
When you build the connection string for your SqlConnection object, coordinate the values of the following parameters:
ConnectRetryCount: Default is 1. Range is 0 through 255.
ConnectRetryInterval: Default is 10 seconds. Range is 1 through 60.
Connection Timeout: Default is 15 seconds. Range is 0 through 2147483647.
Specifically, your chosen values should make the following equality true:
Connection Timeout = ConnectRetryCount * ConnectRetryInterval
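As a sketch of coordinating those values, the snippet below builds the connection string with the generic `DbConnectionStringBuilder` from `System.Data.Common` (so it needs no SqlClient reference) and rejects a timeout smaller than ConnectRetryCount × ConnectRetryInterval, which is the case the rule above guards against. The helper name, server, database and numbers are placeholders, not recommendations.

```csharp
using System;
using System.Data.Common;

public static class AzureSqlConnectionString
{
    // Builds a connection string whose retry settings satisfy
    // Connection Timeout >= ConnectRetryCount * ConnectRetryInterval.
    public static string Build(string server, string database,
                               int retryCount, int retryIntervalSeconds, int timeoutSeconds)
    {
        if (retryCount < 0 || retryCount > 255)
            throw new ArgumentOutOfRangeException(nameof(retryCount), "Range is 0 through 255.");
        if (retryIntervalSeconds < 1 || retryIntervalSeconds > 60)
            throw new ArgumentOutOfRangeException(nameof(retryIntervalSeconds), "Range is 1 through 60.");
        if (timeoutSeconds < retryCount * retryIntervalSeconds)
            throw new ArgumentException(
                $"Timeout {timeoutSeconds}s leaves no time for {retryCount} retries of {retryIntervalSeconds}s each.");

        var builder = new DbConnectionStringBuilder
        {
            ["Server"] = server,
            ["Database"] = database,
            ["ConnectRetryCount"] = retryCount,
            ["ConnectRetryInterval"] = retryIntervalSeconds,
            ["Connection Timeout"] = timeoutSeconds,
        };
        return builder.ConnectionString;
    }
}
```

With `Build("myserver.database.windows.net", "mydb", 3, 10, 30)` the timeout exactly equals 3 × 10 seconds; asking for 29 seconds with the same retry settings throws instead of silently cutting off the last retry.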
Now, coming to option 2: custom retry logic in your app multiplies the total number of attempts, because each custom retry makes up to ConnectRetryCount attempts. For example, if ConnectRetryCount = 3 and your custom retry count is 5, it will make 15 attempts. You might not need that many retries.
If you only consider custom retry vs. Connection Timeout:
A connection timeout usually occurs because of a lossy network (one with higher packet loss, e.g. cellular or weak Wi-Fi) or a high traffic load. It is up to you to choose the best combination of the two.
The following guidelines should help you troubleshoot transient errors:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-connectivity-issues
https://learn.microsoft.com/en-in/azure/architecture/best-practices/transient-faults

As you can read here, retry logic is recommended even for a SQL Server instance installed on an Azure VM (IaaS).
FAULT HANDLING: Your application code includes retry logic and transient fault handling? Including proper retry logic and transient fault handling remediation in the code should be a universal best practice, both on-premises and in the cloud, either IaaS or PaaS. If this characteristic is missing, application problems may raise on both Azure SQLDB and SQL Server in Azure VM, but in this scenario the latter is recommended over the former.
An incremental retry strategy is recommended.
There are two basic approaches to instantiating the objects from the application block that your application requires. In the first approach, you can explicitly instantiate all the objects in code, as shown in the following code snippet:
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1),
    TimeSpan.FromSeconds(2));
var retryPolicy =
    new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(retryStrategy);
In the second approach, you can instantiate and configure the objects from configuration data as shown in the following code snippet:
// Load policies from the configuration file.
// SystemConfigurationSource is defined in
// Microsoft.Practices.EnterpriseLibrary.Common.
using (var config = new SystemConfigurationSource())
{
    var settings = RetryPolicyConfigurationSettings.GetRetryPolicySettings(config);
    // Initialize the RetryPolicyFactory with a RetryManager built from the
    // settings in the configuration file.
    RetryPolicyFactory.SetRetryManager(settings.BuildRetryManager());
    var retryPolicy = RetryPolicyFactory
        .GetRetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>("Incremental Retry Strategy");
    ...
    // Use the policy to handle the retries of an operation.
}
For more information, please visit this documentation.
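To make the incremental strategy concrete, the base-library sketch below computes the waits for a strategy configured like `Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2))`. It does not use the application block itself, and it assumes that the delay before retry n (counting from 0) is `initialInterval + n × increment`; that formula is our reading of "incremental" rather than a quote of the block's source.

```csharp
using System;
using System.Collections.Generic;

public static class IncrementalSchedule
{
    // Assumed schedule: the delay before retry n (n = 0 .. retryCount-1) is
    // initialInterval + n * increment, so the waits grow linearly.
    public static IReadOnlyList<TimeSpan> Delays(int retryCount, TimeSpan initialInterval, TimeSpan increment)
    {
        var delays = new List<TimeSpan>(retryCount);
        for (int n = 0; n < retryCount; n++)
            delays.Add(initialInterval + TimeSpan.FromTicks(increment.Ticks * n));
        return delays;
    }
}
```

Under that assumption, the settings above wait 1, 3, 5, 7 and 9 seconds, so an operation that keeps failing is abandoned after five retries and 25 seconds of waiting in total, plus the time the attempts themselves take.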

Consider using Polly.
You could use a simple piece of code like this:
RetryPolicy retryPolicy = Policy.Handle<Exception>()
    .WaitAndRetry(3, retryAttempt =>
        TimeSpan.FromSeconds(retryAttempt));
var result = retryPolicy.Execute(() => someClass.DoSomething());
This will retry the call up to three times after the initial failure, waiting 1, 2 and 3 seconds before the respective retries.

It is entirely possible for a connection to drop. "Fallacies of Distributed Computing" :).
It could be a network connectivity issue, at either end.
Assuming the Azure firewall is configured to allow your machine, I would recommend:
Ping the server and see if there is any packet loss:
ping (server).database.windows.net
tracert (server).database.windows.net
telnet can also be your friend (e.g. telnet (server).database.windows.net 1433).
These three should help you pinpoint where the problem is.
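The telnet check (can a TCP connection to the port be opened at all?) can also be done from code, which helps when ICMP ping is blocked. A minimal sketch, assuming port 1433 (the SQL Server default, also used by Azure SQL Database) and an arbitrary timeout; the `CancellationToken` overload of `TcpClient.ConnectAsync` requires .NET 5 or later. A successful connect only proves the port is reachable, not that you can log in.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class PortProbe
{
    // Returns true if a TCP connection to host:port completes within the timeout.
    // This is the same check "telnet host 1433" performs by hand.
    public static async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
    {
        using var client = new TcpClient();
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;   // timed out
        }
        catch (SocketException)
        {
            return false;   // refused, unreachable, DNS failure, ...
        }
    }
}
```

Calling `await PortProbe.CanConnectAsync("(server).database.windows.net", 1433, TimeSpan.FromSeconds(5))` in a loop and logging the failures gives a rough picture of how often the path to the server drops.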
I think your retry logic is fine.
Regarding your questions:
Increase timeout
Only if you are sure that your query will take a long time. If you have to increase the timeout for a simple insert, the problem is more likely network connectivity.
Retry logic
As already posted, retry logic is now part of the framework, which you can use, or the one you wrote should be fine. Ideally, it is good to have retry logic even if you are sure about connectivity and speed. Just in case :)

You should increase the timeout because establishing a connection to SQL Server involves many steps, so the first connection takes some time. Once established, the connection is pooled in memory for reuse by subsequent queries.
Please refer to the link below for a more detailed explanation of connection pooling:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
Since you mentioned that this error occurs only sometimes, there might be network and connectivity factors involved. The default timeout for a SQL connection is 15 seconds. I think if you change it to 30 seconds, it should work.

Are you using SQL Express or Workgroup Edition? If so, it's possible that the server is too busy to respond.
To rule out network problems, from a command prompt, do a PING -t SqlServername. Does every ping come back, or are some lost? This can be an indicator of network interruptions that might also cause this error, like a faulty switch. If they are all lost then (given that your database connection sometimes works) it is likely that ping is being blocked by a firewall somewhere: it may help diagnosis if you find that block and temporarily unblock it.
The error message indicates that you are using Named Pipes. Are you using Named Pipes on purpose? For most scenarios (including Azure databases) I'd suggest enabling TCP/IP and disabling Named Pipes in SQL Server Configuration Manager.
Depending on how far away your Azure database is, the delays caused by routers and firewalls sometimes upset Kerberos and/or related timings. You can overcome this by specifying the port in the connection string, which avoids the round trip to port 1434 to enumerate the instance. I assume you're already using an FQDN. For example: server\instance,port

NLog within TransactionScope causing Transaction to be invalid

I am having a problem in a production environment that I am not getting locally.
I am running some LINQ to SQL code within a TransactionScope as below:
using (var scope = new TransactionScope())
{
    uploadRepository.SubmitChanges();
    result = SubmitFileResult.Succeed();
    ScanForNewData(upload);
    scope.Complete();
}
ScanForNewData() calls GetSubmittedData(). If an exception occurs in GetSubmittedData(), we use NLog to write the error to a file and the database, and also send an email:
catch (Exception ex)
{
    // MT - having to comment this out because it is causing a problem with transactions on the production server
    logger.ErrorException(String.Format("Error reading txt file {0} into correct format", upload.DocumentStore.FileName), ex);
    return new UploadGetSubmittedDataResult { Exception = ex, Success = false, Message = String.Format("Error reading txt file {0} into correct format", upload.DocumentStore.FileName) };
}
In ScanForNewData we then call repository.SubmitChanges(). This then causes:
System.Transactions.TransactionException: The operation is not valid for the state of the transaction.
The best idea I have come up with is that in production this code runs on a web server and calls a separate database server. Both the DataContext and NLog use the same connection string configuration and SQL user, but maybe because the server is remote (whereas locally I am using integrated security) something strange is happening.
Any idea what happens to the transaction in this scenario?
Update - I just tried it with SQL user locally and it still works fine. Must be something to do with the production set up...
Another update: I told a lie. On the dev machine the NLog database record is never written, the email is sent, and the TransactionException does not happen.

It is hard to guess the problem without the full stack trace of the exception; it may depend on multiple things.
For instance, I'm assuming NLog opens a new connection to the database by itself, which will probably cause the transaction to be promoted to a distributed one, and the Distributed Transaction Coordinator will kick in. This can explain the difference between your application's behavior in production and locally.
You may also be breaking the transaction with some operation inside it, such as an unhandled exception or illegal access to some data.
Please provide the full stack trace and more of the code involved for a deeper analysis.

Without knowing what the inner exceptions of your TransactionException are, it will be difficult to resolve, but here is a thought:
If you refactor your code so that the logging occurs after the using block around the transaction scope has ended, you will likely avoid the issue, since by then the transaction scope will have ended and DTC will have rolled back the transaction.
I have used and seen this pattern in the past (don't log until after the transaction has ended and been rolled back) when dealing with transactions, and it has worked well.
Logging to a separate database is also advisable to avoid issues like this; doing so would also avoid the problem.
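The log-after-the-scope pattern from this answer might look like the sketch below. `doWork` and `log` are hypothetical stand-ins for the LINQ to SQL calls and the NLog logger. The failure is only captured inside the `TransactionScope` and written out after the scope has been disposed, when `Transaction.Current` no longer refers to the failed transaction, so the logger's own database connection cannot enlist in it or promote it.

```csharp
using System;
using System.Transactions;

public static class UploadProcessor
{
    // Runs doWork inside a TransactionScope. If it throws, the exception is
    // remembered and logged only after the scope is disposed (and rolled back),
    // so the logger never runs inside the ambient transaction.
    public static bool TryProcess(Action doWork, Action<Exception> log)
    {
        Exception failure = null;
        using (var scope = new TransactionScope())
        {
            try
            {
                doWork();
                scope.Complete();
            }
            catch (Exception ex)
            {
                failure = ex;   // do NOT log here: we are still inside the transaction
            }
        }   // Dispose commits if Complete() was called, otherwise rolls back

        if (failure != null)
        {
            log(failure);       // the scope's transaction is no longer ambient here
            return false;
        }
        return true;
    }
}
```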

Have a look at this; it seems to be a bug in NLog.
https://groups.google.com/forum/#!msg/nlog-users/I5UR-bISlTA/6RPtOZhR4NoJ
The suggested solution is to use an async target for database logging.

Check if connection to database can be established

I need to write a C# program that is scheduled to run every day and runs a series of tests to ensure everything is running well. It checks the network connection, server connection, database connection, etc.
The part I'm confused about is checking the connection to the database. Should I establish a connection to the database and then disconnect? Or is there a way to just poll the database without having to pass credentials (I don't actually need to log in)?

You could try to connect to the database using invalid credentials and then examine the error code to see if you got an "access denied" error as opposed to "connection failed" or something else. Whether this is reliably doable depends on your database server of choice, which you failed to mention.
The easiest way would be to just use the correct credentials, though.

MySQL offers a Connection.Ping() method that returns true or false even if you haven't called Connection.Open() before.
However, I prefer not to perform such preflight checks, but rather to handle exceptions if something goes wrong (even if Connection.Ping() returns true, you can't be sure the server will still be available for the next command).

It really depends on what exactly you mean by "checking the connection to the database". Problems can arise at many levels. For example, in the case of SQL Server, this article shows the many ways something can fail: http://support.microsoft.com/kb/827422/en
The best approach is really to connect, run SELECT 1 or similar, and check the result.
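A provider-neutral sketch of that check, written against `System.Data.Common` so it works with any ADO.NET provider. The connection factory is an assumption: you would pass something like `() => new SqlConnection(connectionString)` for whatever provider you actually use. Because this is a daily test program, every exception is caught and reported as a failed check rather than crashing the run.

```csharp
using System;
using System.Data.Common;

public static class DatabaseCheck
{
    // Opens a fresh connection, runs SELECT 1 and reports the outcome.
    // error is null on success, otherwise a description of what failed.
    public static bool CanQuery(Func<DbConnection> createConnection, out string error)
    {
        try
        {
            using DbConnection connection = createConnection();
            connection.Open();
            using DbCommand command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            object result = command.ExecuteScalar();
            if (Convert.ToInt32(result) == 1)
            {
                error = null;
                return true;
            }
            error = $"Unexpected result: {result}";
            return false;
        }
        catch (Exception ex)   // network, login, timeout, ... all count as "not OK"
        {
            error = $"{ex.GetType().Name}: {ex.Message}";
            return false;
        }
    }
}
```

The `error` text distinguishes, for example, a login failure (server reachable, credentials wrong) from a network error, which is usually what a daily health report needs.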

I'm no expert, but I assume you mean MS SQL Server? A database server could be more or less anything...
If the program is running on the same server or has access to it, you could check whether the database service is up and running, but I'm not 100% sure that gives you the information you need.
Edit:
You could also try using SqlDataSourceEnumerator to find the instance.

You could use the ServiceController class in System.ServiceProcess to check that the database service is running.
But you could still hit a case where the service is running but the database is not accepting connections. So, for me, the only sure way is to connect and run a simple query.

I don't see why you couldn't do something like this. Obviously this isn't as specific, but you could fill in the blanks.
using (SqlConnection con = new SqlConnection(connectionString))
{
    try
    {
        con.Open();
    }
    catch (Exception)
    {
        // Can't connect
    }
}

We can check whether the connection is open or not:
// State only reflects the client-side connection object; it does not contact the server
return conn.State == ConnectionState.Open;
Don't forget using System.Data;

SQL Exception: "Impersonate Session Security Context" cannot be called in this batch because a simultaneous batch has called it

When opening a connection to SQL Server 2005 from our web app, we occasionally see this error:
"Impersonate Session Security Context" cannot be called in this batch because a simultaneous batch has called it.
We use MARS and connection pooling.
The exception originates from the following piece of code:
protected SqlConnection Open()
{
    SqlConnection connection = new SqlConnection();
    connection.ConnectionString = m_ConnectionString;
    if (connection != null)
    {
        try
        {
            connection.Open();
            if (m_ExecuteAsUserName != null)
            {
                string sql = Format("EXECUTE AS LOGIN = {0};", m_ExecuteAsUserName);
                ExecuteCommand(connection, sql);
            }
        }
        catch (Exception exception)
        {
            connection.Close();
            connection = null;
        }
    }
    return connection;
}
I found an MS Connect article which suggests that the error is caused when a previous command has not yet terminated before the EXECUTE AS LOGIN command is sent. Yet how can this be if the connection has only just been opened?
Could this be something to do with connection pooling interacting strangely with MARS?
UPDATE: For the short term we have implemented a workaround that clears out the connection pool whenever this happens, to get rid of the bad connection, as it otherwise keeps getting handed back to various users. (This now happens 5-10 times a day with only a small number of simultaneous users, so it is fairly annoying.) But if anyone has any further ideas, we are still looking for a real solution...

I would say it's MARS rather than pooling.
From "Using Multiple Active Result Sets (MARS)"
Applications can have multiple default result sets open and can interleave reading from them. Applications can execute other statements (for example, INSERT, UPDATE, DELETE, and stored procedure calls) while default result sets are open.
Connection pooling in its basic form means the connection open/close overhead is minimized, but any connection (until MARS) has only one thing going on at any one time. Pooling has been around for some time and just works out of the box.
MARS (I've not used it, BTW) introduces overlapping "stuff" going on within a single connection. So of the two, MARS rather than connection pooling is probably the bigger culprit.
From "Extending Database Impersonation by Using EXECUTE AS"
When impersonating a principal by using the EXECUTE AS LOGIN statement, or within a server-scoped module by using the EXECUTE AS clause, the scope of the impersonation is server-wide.
This may explain why MARS is causing it: the same principal in two sessions, both running EXECUTE AS.
There may be something in that article of use, or try this:
IF ORIGINAL_LOGIN() = SUSER_SNAME() EXECUTE AS LOGIN = {0};
On reflection, and after reading up for this answer, I'm not convinced that trying to change the execution context for each session (MARS) in one connection is a good idea...

Don't blame connection pooling; MARS is quite notorious for wreaking havoc. It's not entirely its fault, but it's kind of half and half. The key thing to remember is that MARS is designed for, and only works with, "normal" DB use (meaning regular CRUD stuff, no admin batches). Any commands that have a wide effect on the DB engine can trip MARS, even if it's just one connection and single-threaded (like running a setup batch to create tables, or a nested transaction).
Having said that, one can easily just blame MARS, but it works perfectly fine for normal CRUD scenarios, which are like 99% of cases (and inefficient things like ORMs and LINQ depend on it for life). That means it's important for people to learn that if they want to hack SQL through a connection, they can't use MARS. For example, I had setup code that created the whole DB from scratch, because it's very convenient for deployment, but it was sharing a connection string with the web service it was deploying. Oops :-) It took me a few days of digging to learn my lesson. So now I just maintain the separation of concerns (which is always good), and the problems went away.

Have you tried using REVERT at the end of your SQL statement?
http://msdn.microsoft.com/en-us/library/ms178632.aspx
I always do this to just make sure the current context is back to normal.
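Applied to the question's `Open()` method, this suggestion would pair each `EXECUTE AS LOGIN` with a `REVERT` in the same batch, so a pooled connection is not handed back still impersonating. Below is a sketch of building such a batch; `BuildImpersonatedBatch` is a hypothetical helper, the login name is assumed to come from trusted configuration, and single quotes in it are doubled as a precaution because the name is concatenated into SQL.

```csharp
using System;

public static class Impersonation
{
    // Wraps a statement so it runs under the given login and the original
    // security context is restored at the end of the same batch.
    public static string BuildImpersonatedBatch(string loginName, string statement)
    {
        if (string.IsNullOrWhiteSpace(loginName))
            throw new ArgumentException("Login name is required.", nameof(loginName));

        // Double any single quotes so the name stays a single N'...' literal.
        string literal = "N'" + loginName.Replace("'", "''") + "'";
        string body = statement.TrimEnd().TrimEnd(';') + ";";
        return $"EXECUTE AS LOGIN = {literal};\n{body}\nREVERT;";
    }
}
```

If the wrapped statement fails with a batch-aborting error, the `REVERT` line does not run, so this reduces rather than removes the chance of a pooled connection keeping the impersonated context.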

ASP.NET SqlConnection Timeout issue

I have run into a frustrating issue which I originally thought was a connection leak, but that does not seem to be the case. The scenario is this: the data access for this application uses the Enterprise Library (v4) from Microsoft. All data access calls are wrapped in using statements such as
using (DbCommand dbCommand = db.GetStoredProcCommand("sproc"))
{
    db.AddInParameter(dbCommand, "MaxReturn", DbType.Int32, MaxReturn);
    ...more code
}
Now, the index of this application makes 8 calls to the database to load everything, and I can bring the application to its knees by refreshing the index about 15 times. It seems that I receive this error when the database reaches 113 connections. Here is what makes this weird:
I have run similar code with the EntLib on high-traffic sites and have NEVER had this problem.
If I kill all the connections to the database and get the production application back up and running, every time I refresh the application I can run this SQL
SELECT DB_NAME(dbid) as 'Database Name',
COUNT(dbid) as 'Total Connections'
FROM sys.sysprocesses WITH (nolock)
WHERE dbid > 0
GROUP BY dbid
I can see the number of connections actively increasing with each page refresh. Running the same code on my local box with the same connection string does not cause this problem. Further, if the production website is down, I can fire up the site via Visual Studio and run it fine, and the only difference between the two is that the production site has Windows authentication turned on and my local copy doesn't. Turning Windows authentication off seems to have no effect on the server.
I have absolutely no clue what is causing this or why the connections are not being disposed of in SQL Server. The EntLib objects do not expose .Close() methods for anything, so I can't explicitly close the object.
Any thoughts?
Thanks!
Edit
Wow I just noticed that I never actually posted the error message. Oy. The actual connection error is: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

Check that the stored procedure you are executing is not running into a row or table lock. Also, if possible, try deploying to another server and check whether the application crawls again.
Also try increasing the maximum allowed connections for your SQL Server.

I think the “Timeout Expired” error is a general issue and may have several causes. Increasing the timeout can solve some of them, but not all.
You may also refer to the following links to troubleshoot and fix the error
http://techielion.blogspot.com/2007/01/error-timeout-expired-timeout-period.html

Could it be a configuration issue on the server?
How do you make a connection to the database on the production server?
That might be an area worth looking into.

While I don't know the answer, I can suggest that for some reason connections are not being closed by your application when it runs in production. (Stating the obvious.)
You might want examine your network configuration between the web server and sql server. High latency networks can cause connections not being closed in time.
Also it might help looking at the performance counters listed in the end of the following msdn article:
http://msdn.microsoft.com/en-us/library/8xx3tyca%28VS.71%29.aspx
Finally, if nothing else helps, I'd get debugger and Enterprise Library source code on production and debug your code inside the enterprise library to find out why connections are not being closed.
Silly question: are you properly closing your DataReader? If not, this could be the problem, and the difference in behavior between dev and prod could be caused by different garbage collection patterns.
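On the DataReader point: when a reader owns its connection (for example, it was opened with `CommandBehavior.CloseConnection`), the connection only goes back to the pool once the reader is closed or disposed, so an undisposed reader keeps a pooled connection checked out. A small helper that guarantees disposal is sketched below; `openReader` is a hypothetical factory for whatever call returns your `IDataReader`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderHelper
{
    // Opens a reader via the supplied factory, maps every row, and always
    // disposes the reader, even if mapping a row throws.
    public static List<T> ReadAll<T>(Func<IDataReader> openReader, Func<IDataRecord, T> map)
    {
        var rows = new List<T>();
        using (IDataReader reader = openReader())
        {
            while (reader.Read())
                rows.Add(map(reader));
        }   // Dispose closes the reader (and, with CloseConnection, its connection)
        return rows;
    }
}
```

Usage would look like `ReaderHelper.ReadAll(() => cmd.ExecuteReader(CommandBehavior.CloseConnection), r => r.GetInt32(0))`, so no code path can return without closing the reader.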

I would disable connection pooling and try to suppress it (heh). Just add ";Pooling=false" to your connection string.
Or, perhaps you could add something like the following 'cleanup' code to your page (which closes any connection left open when the page unloads) - right in the 'using' clause:
System.Web.UI.Page page = HttpContext.Current.Handler as System.Web.UI.Page;
if (page != null) {
    page.Unload += (EventHandler)delegate(object s, EventArgs e) {
        try {
            dbCommand.Connection.Close();
        } catch (Exception) {
        } finally {
            result = null;
        }
    };
}
Also, make sure you've enabled the 'shared memory' protocol if your SQL Server and IIS are on the same machine (a real performance booster)!
