Correct Implementation of Transient Fault Handling (Azure)

For the past day or so I've been trying to implement Transient Fault Handling on an Azure SQL database. Although I have a working connection to the DB, I'm not convinced that it's handling the transient faults as expected.
So far my approach has involved the following:
public static void SetRetryStratPol()
{
    const string defaultRetryStrategyName = "default";
    var strategy = new Incremental(defaultRetryStrategyName, 3, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));
    var strategies = new List<RetryStrategy> { strategy };
    var manager = new RetryManager(strategies, defaultRetryStrategyName);
    RetryManager.SetDefault(manager);
    retryPolicy = new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(strategy);
    retryPolicy.Retrying += (obj, eventArgs) =>
    {
        var msg = String.Format("Retrying, CurrentRetryCount = {0} , Delay = {1}, Exception = {2}", eventArgs.CurrentRetryCount, eventArgs.Delay, eventArgs.LastException.Message);
        System.Diagnostics.Debug.WriteLine(msg);
    };
}
I call that method from Application_Start() in Global.asax. (retryPolicy is a static field on a static class, which also contains the next method.)
I then have a method that opens the connection:
public static ReliableSqlConnection GetReliableConnection()
{
    var conn = new ReliableSqlConnection("Server=...,1433;Database=...;User ID=...;Password=...;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;", retryPolicy);
    conn.Open();
    return conn;
}
I then use this method as follows:
using (var conn = GetReliableConnection())
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText = "SELECT COUNT(*) FROM ReliabilityTest";
    result = (int) cmd.ExecuteScalarWithRetry();
    return View(result);
}
So far, this works. Then, in order to test the retry policy, I've tried using a wrong username (a suggestion from here).
But when I step through that code the cursor immediately jumps to my catch statement with
Login failed for user '[my username]'.
I would have expected that this exception only be caught after several seconds, but no delay is incurred at all.
Furthermore, I've also tried with the Entity Framework, following exactly this post, but get the same result.
What have I missed? Is there a configuration step or am I incorrectly inducing a transient fault?

The Transient Fault Handling block is for handling transient errors. A failed login caused by an incorrect username or password is certainly not one of them. From this web page: http://msdn.microsoft.com/en-us/library/dn440719%28v=pandp.60%29.aspx:
What Are Transient Faults?
When an application uses a service, errors can occur because of
temporary conditions such as intermittent service,
infrastructure-level faults, network issues, or explicit throttling by
the service; these types of error occur more frequently with
cloud-based services, but can also occur in on-premises solutions. If
you retry the operation a short time later (maybe only a few
milliseconds later) the operation may succeed. These types of error
conditions are referred to as transient faults. Transient faults
typically occur very infrequently, and in most cases, only a few
retries are necessary for the operation to succeed.
You may want to check the source code for this application block (http://topaz.codeplex.com/) to see which error codes returned from SQL databases are considered transient and are therefore retried.
You can always extend the functionality and include a failed login as one of the transient errors in order to test your code.
UPDATE
Do take a look at the source code here: http://topaz.codeplex.com/SourceControl/latest#source/Source/TransientFaultHandling.Data/SqlDatabaseTransientErrorDetectionStrategy.cs. This is where the retry magic happens. What you could do is create a class (let's call it CustomSqlDatabaseTransientErrorDetectionStrategy) and copy the entire code from the link into it. Then, for testing purposes, you can add the login-failed scenario as one of the transient errors and use this class in your application instead of SqlDatabaseTransientErrorDetectionStrategy.
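The testing idea described above could be sketched as follows. Instead of copying the whole source file, this version wraps the built-in strategy and additionally treats SQL Server error 18456 (login failed) as transient. The class name LoginFailureAsTransientStrategy is made up for illustration, and the namespaces assume the Enterprise Library 6 Transient Fault Handling packages. Treat it as a test-only sketch, not something to ship.

```csharp
using System;
using System.Data.SqlClient;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical test-only strategy: defers to the built-in detection logic and
// additionally reports "login failed" as transient so that retries can be observed.
public sealed class LoginFailureAsTransientStrategy : ITransientErrorDetectionStrategy
{
    // SQL Server error 18456: "Login failed for user '...'".
    private const int LoginFailedErrorNumber = 18456;

    private readonly ITransientErrorDetectionStrategy inner =
        new SqlDatabaseTransientErrorDetectionStrategy();

    public bool IsTransient(Exception ex)
    {
        // Anything the built-in strategy already retries stays transient.
        if (inner.IsTransient(ex))
        {
            return true;
        }

        // Assumption for testing only: a failed login is retried as well.
        var sqlEx = ex as SqlException;
        return sqlEx != null && sqlEx.Number == LoginFailedErrorNumber;
    }
}
```

With this in place, the policy from the question would be created as new RetryPolicy<LoginFailureAsTransientStrategy>(strategy), and the Retrying handler should then fire before the login exception finally surfaces.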

Related

Getting SignalR connection startup error stack

How can I get the full stack of an exception that's happening in an otherwise functioning Web application during SignalR connection setup?
Background
I'm part of a team maintaining a Web application with C# clients that uses an extremely basic SignalR setup (version 2.2) to effectively deliver push notifications about progress during long-running server processes. Like, out-of-the-box,
app.MapSignalR();
HubConnection connection = new HubConnection(_applicationConfiguration.ApiBaseUri + "/signalr");
await connection.Start();
basic. Some of our clients run on remoting services and periodically run into an issue where the other functions of the Web application work fine, but the client code that calls connection.Start() returns a 500 internal server error with no further information. They can address it by refreshing the remote connection but this is less than ideal, so I'm trying to get some information about where in the connection setup process this error is happening.
Problem
Following the information about setting up error handling for SignalR on MSDN, I've tried to simulate the problem by inserting the following pipeline module into the GlobalHost.HubPipeline:
public class HubErrorHandlingModule : HubPipelineModule
{
    public override Func<IHub, Task> BuildConnect(Func<IHub, Task> connect)
    {
        throw new InvalidOperationException("Testing Connection Exceptions");
    }
    protected override void OnIncomingError(ExceptionContext exceptionContext,
        IHubIncomingInvokerContext invokerContext)
    {
        // some logging happens here
        base.OnIncomingError(exceptionContext, invokerContext);
    }
}
and it kind of works, in that I can see the exception get thrown in the pipeline code, and my test C# client is also seeing a 500 internal server error with no further information.
But it also kind of doesn't work, in that I've dropped in breakpoints and the OnIncomingError code is never hit. That sort of makes sense, since it's not code in any Hub method that's causing the exception, but I don't know where this exception is happening; it could be anywhere during the client call to connection.Start.
I've also tried passing in an alternate HubConfiguration with EnableDetailedErrors = true but that doesn't seem to improve anything.
It doesn't really matter where I get the full stack trace, since I control both the server and the client code, but in order to understand their problem I need to see the full trace somewhere.
What I've Tried And Why It Doesn't Work
app.MapSignalR(new HubConfiguration { EnableDetailedErrors = true });
I think this is meant to show detailed errors from Hub processing, not connection handshaking? Supposedly it's meant to send a message tagged as an error that might be traced by the connection even if it's never bubbled up to any consumer. Unfortunately...
var writer = new StreamWriter("C:\\Logs\\ClientLog.txt");
writer.AutoFlush = true;
connection.TraceLevel = TraceLevels.All;
connection.TraceWriter = writer;
This does trace successful communication to the SignalR backend, once I remove the deliberate pipeline error. But when I set it back up, all I see is a failed attempt to establish the connection and a 500 internal server error. No trace.
<system.diagnostics>
<sharedListeners ... >
<switches ...>
<sources ...>
<trace autoflush="true" />
</system.diagnostics>
Set up both after the MSDN trace details and this commentary on GitHub. Neither set of details works. As I play around by moving the pipeline exception to different pipeline events, I can sometimes see a stack trace show up in the SignalR.HubDispatcher source mentioned only in the GitHub details, but it happens when I throw the exception after the connection's been established and what arrives at the client side is a different error than just a 500 internal server error, so that's probably happening too late to be whatever's going wrong at the client installation.

In my case I have to put the SignalR.cs in my root path.
Then in the view I include the script:
<script src="~/signalr/hubs"></script>
This is what my SignalR.cs looks like:
public class NotificationHub : Hub
{
    public void SendUpdateNotification(string message)
    {
        // message = "show" / "hide"
        if (message.Equals("show"))
            Config._MaintenanceMode = true;
        else
            Config._MaintenanceMode = false;
        // Call the broadcastUpdate method to update clients.
        Clients.All.broadcastUpdate(message);
    }
}

To handle errors that SignalR raises, you can add a handler for the Error event on the connection object:
connection.Error += ex => Console.WriteLine("SignalR error: {0}", ex.StackTrace);
To handle errors from method invocations, wrap the code in a try-catch block:
HubConnection connection = new HubConnection(_applicationConfiguration.ApiBaseUri + "/signalr");
try
{
    await connection.Start();
}
catch (Exception ex)
{
    Console.WriteLine("Error " + ex);
}
To enable detailed error messages for troubleshooting purposes:
var hubConfiguration = new HubConfiguration();
hubConfiguration.EnableDetailedErrors = true;
app.MapSignalR(hubConfiguration);
In your hub pipeline module, I do not see where you log or print the error:
Console.WriteLine("Exception " + exceptionContext.Error.Message);
base.OnIncomingError(exceptionContext, invokerContext);
Then hook up the custom HubPipelineModule in the Startup class:
public partial class Startup
{
    public void Configuration(IAppBuilder app)
    {
        GlobalHost.HubPipeline.AddModule(new HubErrorHandlingModule());
        app.MapSignalR();
    }
}
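Putting the logging fragment into the module from the question gives something like the sketch below. It assumes SignalR 2.x and writes to System.Diagnostics.Trace, which is my choice of sink rather than anything the question specifies. As the question already observes, OnIncomingError is raised for failures in hub method invocations, so this is unlikely to capture an error thrown while the connection is still being set up.

```csharp
using System.Diagnostics;
using Microsoft.AspNet.SignalR.Hubs;

// Sketch: the question's module with the logging step filled in.
public class HubErrorHandlingModule : HubPipelineModule
{
    protected override void OnIncomingError(ExceptionContext exceptionContext,
        IHubIncomingInvokerContext invokerContext)
    {
        // Formatting the exception itself (rather than .Message) includes
        // the stack trace and any inner exceptions.
        Trace.TraceError("Hub {0}.{1} failed: {2}",
            invokerContext.MethodDescriptor.Hub.Name,
            invokerContext.MethodDescriptor.Name,
            exceptionContext.Error);

        base.OnIncomingError(exceptionContext, invokerContext);
    }
}
```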
References:
SignalR Notify user about disconnections
SingalR How to Handle Errors
SignalR 500 Internal Server Error

MassTransit Activity Fault with parameters

I am currently using MassTransit with the Courier pattern.
I've set up an Activity which may fail, and I want to be able to subscribe to this failure and act accordingly.
My problem is that, even though I can subscribe to the failure and even see the exception that caused it, I am unable to pass any arguments to it.
For testing purposes, suppose I have the following activity:
public class MyActivity : ExecuteActivity<MyMessage>
{
    public Task<ExecutionResult> Execute(ExecuteContext<MyMessage> context)
    {
        try
        {
            // .... some code
            throw new FaultException<RegistrationRefusedData>(
                new RegistrationRefusedData(RegistrationRefusedReason.ItemUnavailable));
            // .... some code
        }
        catch (Exception ex)
        {
            return Task.FromResult(context.Faulted(ex));
        }
    }
}
The problem is the reason (RegistrationRefusedReason) that I am passing as an argument of the exception. If I subscribe a RoutingSlipActivityFaulted consumer, I can get almost all the information I need:
public class ActivityFaultedConsumer : IMessageConsumer<RoutingSlipActivityFaulted>
{
    public void Consume(RoutingSlipActivityFaulted message)
    {
        string exceptionMessage = message.ExceptionInfo.Message; // OK
        string messageType = message.ExceptionInfo.ExceptionType; // OK
        RegistrationRefusedReason reason = ??????;
    }
}
I feel like I am missing something important here (maybe I am misusing the pattern?).
Is there any other way to get parameters from a faulted activity?

So, the case you're describing isn't a Fault. It's a failure to meet a business condition. In this case, you wouldn't want to retry the transaction, you'd want to terminate it. To notify the initiator of the routing slip, you'd Publish a business event signifying that the transaction was not completed due to the business condition.
For instance, in your case, you may do something like:
await context.Publish<RegistrationRefused>(new {
    CustomerId = xxx,
    ItemId = xxxx,
    Reason = "Item was unavailable"
});
return context.Terminate();
This would terminate the routing slip (the subsequent activities would not be executed), and produce a RoutingSlipTerminated event.
That's the proper way to end a routing slip due to a business condition or rule. Exceptions are for exceptional behavior only, since you'll likely want to retry them to handle the failure.
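For context, a complete activity using this approach might look like the sketch below. It assumes the MassTransit 3 or later Courier API (IExecuteActivity rather than the ExecuteActivity base type in the question). The RegistrationRefused contract, the RegistrationArguments type, and the ItemAvailable flag are hypothetical names introduced only for illustration.

```csharp
using System.Threading.Tasks;
using MassTransit;
using MassTransit.Courier;

// Hypothetical event contract describing why the registration was refused.
public interface RegistrationRefused
{
    string ItemId { get; }
    string Reason { get; }
}

// Hypothetical activity arguments; ItemAvailable stands in for a real lookup.
public class RegistrationArguments
{
    public string ItemId { get; set; }
    public bool ItemAvailable { get; set; }
}

public class RegistrationActivity : IExecuteActivity<RegistrationArguments>
{
    public async Task<ExecutionResult> Execute(ExecuteContext<RegistrationArguments> context)
    {
        if (!context.Arguments.ItemAvailable)
        {
            // Business condition not met: publish an event carrying the reason,
            // then end the routing slip without faulting it.
            await context.Publish<RegistrationRefused>(new
            {
                context.Arguments.ItemId,
                Reason = "Item was unavailable"
            });
            return context.Terminate();
        }

        // Normal path: continue with the next activity in the itinerary.
        return context.Completed();
    }
}
```

A consumer of RegistrationRefused then receives the reason as ordinary message data, which is what the faulted event could not carry.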

Kinda raising this from the dead, but I really haven't found a neat solution to this.
Here is my scenario:
I want to implement a request/response, but I want to wait for the execution of a routing slip.
Like Fabio, I want to compensate for any previous activities, and I want to pass data back to the request client in case of a fault.
Conveniently, Chris provided a RoutingSlipRequestProxy/RoutingSlipResponseProxy which does just that. I've found 2 approaches, but both of them seem very hacky to me.
Approach 1:
1. The request client waits for ISimpleResponse or ISimpleFailResponse.
2. RoutingSlipRequestProxy sets the ResponseAddress in the variables.
3. The activity sends ISimpleFailResponse to the ResponseAddress.
4. The client waits for either response.
5. The RoutingSlipResponseProxy sends back Fault<ISimpleResponse> to the ResponseAddress.
From what I see, the hackiness comes from steps 4 and 5 and their order. I am pretty sure it works, but it could easily stop working if messages are consumed out of order.
Sample code: https://github.com/steliyan/Sample-RequestResponse/commit/3fcb196804d9db48617a49c7a8f8c276b47b03ef
Approach 2:
1. The request client waits for ISimpleResponse or ISimpleFailResponse.
2. The activity calls ReviseItinerary with the variables and adds a faulty activity.*
3. The faulty activity faults.
4. The RoutingSlipResponseProxy2 gets the ValidationErrors and sends back ISimpleFailResponse to the ResponseAddress.
* The activity needs to be an Activity and not an ExecuteActivity, because there is no overload of ReviseItinerary that takes variables without an activity log.
This approach seems hacky because an additional faulting activity is added to the itinerary just to be able to add a variable to the routing slip.
Sample code: https://github.com/steliyan/Sample-RequestResponse/commit/e9644fa683255f2bda8ae33d8add742f6ffe3817
Conclusion:
Looking at MassTransit code, it doesn't seem like a problem to add a FaultedWithVariables overload. However, I think Chris' point is that there should be a better way to design the workflow, but I am not sure about that.

RetryPolicy in Enterprise Library 5 does not work

I am working on a small app to translate and import a large amount of data from one database to another. To do this, I'm using Entity Framework and some custom extensions to commit a page of items at a time, in batches of 1000 or so. Since this can take a while, I was also interested in making sure the whole thing wouldn't grind to a halt if there is a hiccup in the connection while it's running.
I chose the Transient Fault Handling Application block, part of Enterprise Library 5.0, following this article (see Case 2: Retry Policy With Transaction Scope). Here is an example of my implementation in the form of an ObjectContext extension, which simply adds objects to the context and tries to save them, using a Retry Policy focused on Sql Azure stuff:
public static void AddObjectsAndSave<T>(this ObjectContext context, IEnumerable<T> objects)
    where T : EntityObject
{
    if(!objects.Any())
        return;
    var policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>
        (10, TimeSpan.FromSeconds(10));
    var tso = new TransactionOptions();
    tso.IsolationLevel = IsolationLevel.ReadCommitted;
    var name = context.GetTableName<T>();
    foreach(var item in objects)
        context.AddObject(name, item);
    policy.ExecuteAction(() =>
    {
        using(TransactionScope ts = new TransactionScope(TransactionScopeOption.Required, tso))
        {
            context.SaveChanges();
            ts.Complete();
        }
    });
}
The code works great until I actually test the Retry Policy by pausing my local instance of SQL Server while it's running. It fails almost immediately, which is odd. You can see that I've got the policy configured to try again at ten-second intervals; either it is ignoring the interval or failing to catch the error. I suspect the latter, but I'm new to this so I don't really know.

I suspect that SqlAzureTransientErrorDetectionStrategy does not include the error you are simulating. This strategy only recognizes specific errors thrown by SQL Azure. Look at this code to find out which errors it covers: http://code.msdn.microsoft.com/Reliable-Retry-Aware-BCP-a5ae8e40/sourcecode?fileId=22213&pathId=1098196556
To handle the error you are trying to catch, you could implement your own strategy by implementing the ITransientErrorDetectionStrategy interface.

Exception escapes from workflow despite TryCatch activity

I have a workflow inside a Windows Service that is a loop that performs work periodically. The work is done inside a TryCatch activity. The Try property is a TransactionScope activity that wraps some custom activities that read and update a database. When the transaction fails, I would expect any exception that caused this to be caught by the TryCatch. However, my workflow aborts. The workflow I have is the following:
var wf = new While(true)
{
    Body = new Sequence
    {
        Activities =
        {
            new TryCatch
            {
                Try = new TransactionScope
                {
                    IsolationLevel = IsolationLevel.ReadCommitted,
                    Body = new Sequence
                    {
                        Activities = { ..custom database activities.. }
                    },
                    AbortInstanceOnTransactionFailure = false
                },
                Catches =
                {
                    new Catch<Exception>
                    {
                        Action = new ActivityAction<Exception>
                        {
                            Argument = exception,
                            Handler = ..log error..
                        }
                    }
                }
            },
            new Delay { Duration = new InArgument<TimeSpan>(duration) }
        }
    },
};
In my case, it's possible that the database is sometimes unavailable so obviously the transaction won't commit. What happens in this case is that the workflow aborts with the following exception:
System.OperationCanceledException: An error processing the current work item has caused the workflow to abort.
The inner exception is:
System.Transactions.TransactionException: The operation is not valid for the state of the transaction.
This makes sense because I have just switched off the database. However, why isn't this exception handled by my TryCatch activity?
EDIT 1: Some additional information. I run the workflow using the WorkflowApplication class. To better see what's going on, I specified the properties Aborted and OnUnhandledException. When the exception occurs, it goes directly to Aborted and OnUnhandledException is skipped (although this is clearly an unhandled exception).
EDIT 2: I enabled the debug log and this provides some additional insight. The 'custom database activities' successfully run to completion. The first event log entry that indicates that something is wrong is a Verbose level message: The runtime transaction has completed with the state 'Aborted'. Next I see an Information message: WorkflowInstance Id: 'dbd1ba5c-2d8a-428c-970d-21215d7e06d9' E2E Activity (not sure what this means). And the Information message after that is: Activity 'System.Activities.Statements.TransactionScope', DisplayName: 'Transaction for run immediately checks', InstanceId: '389' has completed in the 'Faulted' state.
After this message, I see that each parent (including the TryCatch activity) completes in the 'Faulted' state, ending with the abortion of my workflow.
EDIT 3: To be clear, everything works as expected when an exception occurs in any of the 'custom database activities'. The exception is caught and the workflow continues. It only goes wrong when the transaction can't commit at the end of the TransactionScope. See the following stacktrace that is logged from the Aborted callback:
at System.Transactions.TransactionStateInDoubt.Rollback(InternalTransaction tx, Exception e)
at System.Transactions.Transaction.Rollback(Exception e)
at System.Activities.Runtime.ActivityExecutor.CompleteTransactionWorkItem.HandleException(Exception exception)
If you follow the calls from TransactionScope.OnCompletion(...), eventually you will arrive at the ActivityExecutor class from the stacktrace.

Transactions commit asynchronously and after the fact. You can't react to a failure of the transaction to commit because of a problem at the resource manager level.
As you pointed out, you can deal with exceptions that occur in your activities. If you look at the tracking records for your workflow my guess is that you would see the TryCatch activity is closed prior to the transaction abort.
Many years ago when I was a program manager in the COM+ team I studied this issue because often people want a transactional component (or workflow) as in this case to be able to react to a transaction abort.
The async nature of the resolution of the transaction means that you simply cannot react to it in the component itself. The solution is to react in the caller which can then take some action.
The design assumption is that once a transaction has aborted, no state acquired in the transaction can be safely used; it will all be discarded because the transaction has aborted.
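Reacting in the caller could look roughly like the sketch below when the host uses WorkflowApplication, as in the question. The WorkflowHost class, its RunWithRestart method, and the restart limit are hypothetical names chosen for illustration. Each restart begins a brand-new instance from scratch, so no state from the aborted run is reused.

```csharp
using System;
using System.Activities;
using System.Threading.Tasks;

// Hypothetical host helper: the caller, not the workflow, reacts to the abort.
public static class WorkflowHost
{
    // Runs a fresh workflow instance and, if it aborts, logs the reason and
    // starts another one, up to maxRestarts extra attempts. Returns true if
    // an instance completed without aborting.
    public static bool RunWithRestart(Func<Activity> createWorkflow, int maxRestarts)
    {
        for (int attempt = 0; attempt <= maxRestarts; attempt++)
        {
            // Carries null on completion, or the abort reason on abort.
            var outcome = new TaskCompletionSource<Exception>();

            var app = new WorkflowApplication(createWorkflow());
            app.Completed = e => outcome.TrySetResult(null);
            app.Aborted = e => outcome.TrySetResult(e.Reason);
            app.Run();

            // Block until the instance has either completed or aborted.
            Exception abortReason = outcome.Task.Result;
            if (abortReason == null)
            {
                return true;
            }

            Console.Error.WriteLine("Workflow aborted (attempt {0}): {1}",
                attempt + 1, abortReason);
        }
        return false;
    }
}
```

Because the workflow in the question is an endless While loop, Completed would normally never fire there; the helper would simply keep the loop alive across transaction aborts until the restart limit is reached.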

So just to add to Ron's answer. Your only option here is to add the SqlWorkflowInstanceStore and drop a Persist activity just before the TransactionScope. When the transaction aborts the whole workflow will abort but the past saved state will still be in the persistence database and the workflow can be restarted from this previously saved state and execute the transaction again.

Wcf transaction

Is there a way to know in a wcf operation that a transaction has committed?
OK, second attempt at being more specific.
I have a WCF service with an operation that allows transaction flow.
When a client calls my WCF service, it can have a transaction. But my service also needs to know that the transaction on the client has succeeded, because the service has other things to do if everything went well, but only once all transactions have been committed.
Is there an event I can subscribe to, or something similar?

It depends on the service itself and how you are handling transactions. If you are engaging in transactions in WCF through WS-Transaction then if the call to the client succeeds without exception, you can assume the transaction took place.
However, if this is in the context of another transaction, then you can't be sure if the transaction went through until the containing transaction is completed.
Even if you are using the TransactionScope class, if you have the service enabled to use transactions, you still have to take into account the encompassing transaction (if there is one).
You will have to provide more information about where the transaction is in relation to the call in order for a more complete answer.

Try applying the following operation behavior attribute to the operation that allows TransactionFlow:
[OperationBehavior(TransactionScopeRequired=true)]
If a transaction flows from the client, then the service will use it.
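As for an event to subscribe to, System.Transactions exposes TransactionCompleted on the Transaction class. A minimal sketch, assuming it is called from inside an operation that has an ambient transaction (for example one marked with TransactionScopeRequired = true); the helper name TransactionOutcome.OnCommitted is made up for illustration:

```csharp
using System;
using System.Transactions;

// Hypothetical helper: runs a callback only if the ambient transaction commits.
public static class TransactionOutcome
{
    public static void OnCommitted(Action action)
    {
        if (action == null) throw new ArgumentNullException("action");

        Transaction tx = Transaction.Current;
        if (tx == null)
        {
            throw new InvalidOperationException("No ambient transaction.");
        }

        // Raised once the outcome is known. The status can also be Aborted
        // or InDoubt, in which case the callback is skipped.
        tx.TransactionCompleted += (sender, e) =>
        {
            if (e.Transaction.TransactionInformation.Status == TransactionStatus.Committed)
            {
                action();
            }
        };
    }
}
```

Inside the service operation this would be called as TransactionOutcome.OnCommitted(() => { /* follow-up work */ }). The callback runs after the operation's own transactional work, so it should not rely on anything that required the now-finished transaction.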

bool isTransactionComplete = true;
try
{
    using (TransactionScope trScope = new TransactionScope(TransactionScopeOption.Required))
    {
        // some work
        trScope.Complete();
    }
}
catch (TransactionAbortedException e)
{
    // The transaction holder got an exception from some service
    // and canceled the transaction
    isTransactionComplete = false;
}
catch // any other exception
{
    isTransactionComplete = false;
    throw;
}
if (isTransactionComplete)
{
    // Success
}
As casperOne wrote, it depends on the settings. But you should be aware of complex transaction scenarios such as:
1) a sessionful service with simultaneous transactions for one service instance
2) a transaction inside a transaction
