I am using Google Cloud SQL with MySQL v5.7 from a C# application on .NET Core 2.2 with Entity Framework 6.
In my logs I see the following exception thrown from multiple places in the code that use the database:
MySql.Data.MySqlClient.MySqlException (0x80004005): Connect Timeout expired. ---> System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(TaskNode asyncWaiter, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at MySqlConnector.Core.ConnectionPool.GetSessionAsync(MySqlConnection connection, IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\Core\ConnectionPool.cs:line 42
at MySql.Data.MySqlClient.MySqlConnection.CreateSessionAsync(Nullable`1 ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlConnection.cs:line 507
at MySql.Data.MySqlClient.MySqlConnection.CreateSessionAsync(Nullable`1 ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlConnection.cs:line 523
at MySql.Data.MySqlClient.MySqlConnection.OpenAsync(Nullable`1 ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlConnection.cs:line 232
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenDbConnectionAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.ExecuteAsync(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues, CancellationToken cancellationToken)
This happens for a split second at a time when there is some load on the database (not very high: about 20% CPU on the database machine).
Configuring The Context:
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
if (!optionsBuilder.IsConfigured)
{
optionsBuilder.UseMySql(
new System.Net.NetworkCredential(string.Empty, ConfigurationManager.CacheCS).Password, builder =>
{
builder.EnableRetryOnFailure(15, TimeSpan.FromSeconds(30), null);
}
);
}
}
This allows up to 15 retries, with a maximum delay of 30 seconds between retries.
Judging from the log, this specific error is not retried.
My Tries
I tried passing additional transient error numbers to the retry policy:
builder.EnableRetryOnFailure(15, TimeSpan.FromSeconds(30), MySqlErrorCodes.TransientErrors);
where MySqlErrorCodes.TransientErrors is defined as:
public enum MySqlErrorCode
{
// Too many connections
ConnectionCountError = 1040,
// Unable to open connection
UnableToConnectToHost = 1042,
// Lock wait timeout exceeded; try restarting transaction
LockWaitTimeout = 1205,
// Deadlock found when trying to get lock; try restarting transaction
LockDeadlock = 1213,
// Transaction branch was rolled back: deadlock was detected
XARBDeadlock = 1614
}
public class MySqlErrorCodes
{
static MySqlErrorCodes()
{
TransientErrors = new HashSet<int>()
{
(int)MySqlErrorCode.ConnectionCountError,
(int)MySqlErrorCode.UnableToConnectToHost,
(int)MySqlErrorCode.LockWaitTimeout,
(int)MySqlErrorCode.LockDeadlock,
(int)MySqlErrorCode.XARBDeadlock
};
}
public static HashSet<int> TransientErrors { get; private set; }
}
This didn't work.
Questions
How can I solve this issue?
Is there a way to make Entity Framework more resilient to such connectivity issues?
Edit
The issue occurs when I use this code to execute a raw sql command to call a stored procedure:
public static async Task<RelationalDataReader> ExecuteSqlQueryAsync(this DatabaseFacade databaseFacade,
string sql,
CancellationToken cancellationToken = default(CancellationToken),
params object[] parameters)
{
var concurrencyDetector = databaseFacade.GetService<IConcurrencyDetector>();
using (concurrencyDetector.EnterCriticalSection())
{
var rawSqlCommand = databaseFacade
.GetService<IRawSqlCommandBuilder>()
.Build(sql, parameters);
return await rawSqlCommand
.RelationalCommand
.ExecuteReaderAsync(
databaseFacade.GetService<IRelationalConnection>(),
parameterValues: rawSqlCommand.ParameterValues,
cancellationToken: cancellationToken);
}
}
...
using (var context = new CacheDbContext())
{
using (var reader = await context
.Database
.ExecuteSqlQueryAsync("CALL Counter_increment2(@p0, @p1, @p2)",
default(CancellationToken),
new object[] { id, counterType, value })
.ConfigureAwait(false)
)
{
reader.DbDataReader.Read();
if (!(reader.DbDataReader[0] is DBNull))
return Convert.ToInt32(reader.DbDataReader[0]);
else
{
Logger.Error($"Counter was not found! ('{id}', '{counterType}')");
return 1;
}
}
}
I think this may be why there are no retries for the connect timeout.
How can I retry this safely while not executing the same stored procedure twice?
Edit
These are the global variables:
SHOW GLOBAL VARIABLES LIKE '%timeout%'
connect_timeout 10
delayed_insert_timeout 300
have_statement_timeout YES
innodb_flush_log_at_timeout 1
innodb_lock_wait_timeout 50
innodb_rollback_on_timeout OFF
interactive_timeout 28800
lock_wait_timeout 31536000
net_read_timeout 30
net_write_timeout 60
rpl_semi_sync_master_async_notify_timeout 5000000
rpl_semi_sync_master_timeout 3000
rpl_stop_slave_timeout 31536000
slave_net_timeout 30
wait_timeout 28800
SHOW GLOBAL STATUS LIKE '%timeout%'
Ssl_default_timeout 7200
Ssl_session_cache_timeouts 0
SHOW GLOBAL STATUS LIKE '%uptime%'
Uptime 103415
Uptime_since_flush_status 103415
In addition to the connect time out issue I am also seeing the following log:
MySql.Data.MySqlClient.MySqlException (0x80004005): MySQL Server rejected client certificate ---> System.IO.IOException: Unable to read data from the transport connection: Broken pipe. ---> System.Net.Sockets.SocketException: Broken pipe
This seems to be a related problem with the connection to the database.
Is it safe to retry on such an exception?
The issue was caused by a value that was too low for the maximum pool size in the connection string.
When multiple threads use the database and there are not enough pooled connections to serve all the requests, callers wait for a free connection and can fail with "Connect Timeout expired".
To fix this, raise the value in the connection string:
Max Pool Size={maxConnections};
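As a minimal sketch of that change, the pool size can also be set in code instead of editing the string by hand. This uses the generic System.Data.Common.DbConnectionStringBuilder from the base class library; the class name, the method name and the sample server values are hypothetical, and 50 is only an example value, not a recommendation.

```csharp
using System.Data.Common;

public static class PoolSizeExample
{
    // Returns a copy of baseConnectionString with "Max Pool Size" set to maxPoolSize.
    public static string WithMaxPoolSize(string baseConnectionString, int maxPoolSize)
    {
        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = baseConnectionString
        };
        // Adds the key, or overwrites it if the string already contained one.
        builder["Max Pool Size"] = maxPoolSize;
        return builder.ConnectionString;
    }
}
```

For example, PoolSizeExample.WithMaxPoolSize("Server=127.0.0.1;Database=cache", 50) returns a string that can be passed to UseMySql. Every pooled connection counts against the server's max_connections limit, so the total across all application instances has to stay below it.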
I'm getting an error when I try to send email using MailKit, on port 587 with a Barracuda server as the host. The email is still sent, but the code breaks at client.Send(mail) with the error below.
Does anyone know what I'm missing or doing wrong? I can provide more code if needed.
Send mail code:
client.Connect(_SMTPOptions.Host, _SMTPOptions.Port, SecureSocketOptions.StartTls);
client.Timeout = 1000;
client.Send(mail);
client.Disconnect(true);
Add attachment code (if this matters):
bytes = bytes + document.Bytes.Length;
Stream stream = new MemoryStream(document.Bytes);
streamlist.Add(stream);
var attachment = new MimePart("application","pdf")
{
Content = new MimeContent(stream, ContentEncoding.Default),
ContentDisposition = new MimeKit.ContentDisposition(MimeKit.ContentDisposition.Attachment),
ContentTransferEncoding = ContentEncoding.Base64,
FileName = Path.GetFileName(order.OrderNumber + "_" + document.DocumentType + "_" + document.IntDocID + ".pdf")
};
builder.Attachments.Add(attachment);
Error Message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
StackTrace:
at MailKit.Net.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Stream.Read(Span`1 buffer)
at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at MailKit.Net.Smtp.SmtpStream.ReadAheadAsync(Boolean doAsync, CancellationToken cancellationToken)
at MailKit.Net.Smtp.SmtpStream.ReadResponseAsync(Boolean doAsync, CancellationToken cancellationToken)
at MailKit.Net.Smtp.SmtpStream.ReadResponse(CancellationToken cancellationToken)
at MailKit.Net.Smtp.SmtpClient.DataAsync(FormatOptions options, MimeMessage message, Int64 size, Boolean doAsync, CancellationToken cancellationToken, ITransferProgress progress)
at MailKit.Net.Smtp.SmtpClient.SendAsync(FormatOptions options, MimeMessage message, MailboxAddress sender, IList`1 recipients, Boolean doAsync, CancellationToken cancellationToken, ITransferProgress progress)
at MailKit.Net.Smtp.SmtpClient.Send(FormatOptions options, MimeMessage message, CancellationToken cancellationToken, ITransferProgress progress)
at MailKit.MailTransport.Send(MimeMessage message, CancellationToken cancellationToken, ITransferProgress progress)
The problem is likely to be this line:
client.Timeout = 1000;
You are telling MailKit to throw an exception after 1000 milliseconds (aka 1 second) if it cannot read from (or write to) a socket within that time period.
The default timeout that MailKit uses is 2 minutes (2 * 60 * 1000).
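To avoid unit mistakes like this, one option is to express the timeout as a TimeSpan and convert it to the millisecond value that client.Timeout expects. The helper below is a hypothetical sketch; SmtpTimeouts and ToMilliseconds are illustrative names and are not part of MailKit.

```csharp
using System;

public static class SmtpTimeouts
{
    // Converts a TimeSpan to whole milliseconds, rejecting values that
    // are not positive or do not fit in an int.
    public static int ToMilliseconds(TimeSpan timeout)
    {
        double ms = timeout.TotalMilliseconds;
        if (ms <= 0 || ms > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(timeout));
        return (int)ms;
    }
}
```

With this in place, client.Timeout = SmtpTimeouts.ToMilliseconds(TimeSpan.FromMinutes(2)); sets the same 2-minute value as the default, and the unit is visible at the call site.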
I am receiving this error in my worker role:
Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> System.Net.WebException: The remote server returned an error: (404) Not Found. at System.Net.HttpWebRequest.GetResponse() at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext) --- End of inner exception stack trace --- at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext) at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.DeleteMessage(String messageId, String popReceipt, QueueRequestOptions options, OperationContext operationContext) at Microsoft.WindowsAzure.Storage.Queue.CloudQueue.DeleteMessage(CloudQueueMessage message, QueueRequestOptions options, OperationContext operationContext) at CloudCartConnector.TaskRole2.WorkerRole.ExecuteTask() in C:\a\src\CCC\Source\CloudCartConnector.TaskRole2\WorkerRole.cs:line 101 Request Information RequestID:7a7c08ec-0003-0059-6d7b-2d118f000000 RequestDate:Thu, 03 Dec 2015 03:33:11 GMT StatusMessage:The specified queue does not exist. ErrorCode:QueueNotFound
If there is an exception in the OnStart method, would this cause the worker role to fail to run? Should I wrap the body of OnStart in a try/catch and just return base.OnStart()? If my storage account becomes unavailable, due to a Microsoft upgrade or a server going down, is a try/catch the best approach?
public override bool OnStart()
{
ServicePointManager.DefaultConnectionLimit = 12;
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the queue client.
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
// Retrieve a reference to a queue.
queue = queueClient.GetQueueReference("taskqueue");
return base.OnStart();
}
Below this code, I execute a task. Should I say if the queue is null, just return?
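public string GetTasks()
{
try
{
// message is assumed to be defined elsewhere; queue is the field set in OnStart.
CloudQueueMessage cloudQueueMessage = new CloudQueueMessage(message);
queue.AddMessage(cloudQueueMessage, new TimeSpan(0, 30, 0));
return null; // no error to report
}
catch (Exception ex)
{
return ex.ToString();
}
}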
You should have a try...catch block in your OnStart method.
try{
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the queue client.
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
// Retrieve a reference to a queue.
queue = queueClient.GetQueueReference("taskqueue");
}
catch(Microsoft.WindowsAzure.Storage.StorageException e)
{
// Exception Handling & Logging
// Return false for OnStart so the role is reported as failed to start
return false;
}
You should also check whether queue is null in your GetTasks() method, to avoid an unnecessary NullReferenceException.
I think I've managed to make a test that shows this problem repeatably, at least on my system. This question relates to HttpClient being used against a bad endpoint (a nonexistent endpoint; the target is down).
The problem is that the number of completed tasks falls short of the total, usually by a few. I don't mind requests failing, but this leaves the app hanging when the results are awaited.
I get the following result from the test code below:
Elapsed: 237.2009884 seconds.
Tasks in batch array: 8000 Completed Tasks : 7993
If I set batchSize to 8 instead of 8000, it completes. For 8000 it jams on the WhenAll.
I wonder whether other people get the same result, whether I am doing something wrong, and whether this appears to be a bug.
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
namespace CustomArrayTesting
{
/// <summary>
/// Problem: a large batch of async http requests is done in a loop using HttpClient, and a few of them never complete
/// </summary>
class ProgramTestHttpClient
{
static readonly int batchSize = 8000; //large batch size brings about the problem
static readonly Uri Target = new Uri("http://localhost:8080/BadAddress");
static TimeSpan httpClientTimeout = TimeSpan.FromSeconds(3); // short Timeout seems to bring about the problem.
/// <summary>
/// Sends off a bunch of async httpRequests using a loop, and then waits for the batch of requests to finish.
/// I installed asp.net web api client libraries Nuget package.
/// </summary>
static void Main(String[] args)
{
httpClient.Timeout = httpClientTimeout;
stopWatch = new Stopwatch();
stopWatch.Start();
// This timer updates the screen with the number of completed tasks in the batch (see the timerAction method below Main)
TimerCallback _timerAction = timerAction;
TimerCallback _resetTimer = ResetTimer;
TimerCallback _timerCallback = _timerAction + _resetTimer;
timer = new Timer(_timerCallback, null, TimeSpan.FromSeconds(1), Timeout.InfiniteTimeSpan);
//
for (int i = 0; i < batchSize; i++)
{
Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object());//WatchRequestBody()
Batch[i] = _response;
}
try
{
Task.WhenAll(Batch).Wait();
}
catch (Exception ex)
{
}
timer.Dispose();
timerAction(null);
stopWatch.Stop();
Console.WriteLine("Done");
Console.ReadLine();
}
static readonly TimeSpan timerRepeat = TimeSpan.FromSeconds(1);
static readonly HttpClient httpClient = new HttpClient();
static Stopwatch stopWatch;
static System.Threading.Timer timer;
static readonly Task[] Batch = new Task[batchSize];
static void timerAction(Object state)
{
Console.Clear();
Console.WriteLine("Elapsed: {0} seconds.", stopWatch.Elapsed.TotalSeconds);
var _tasks = from _task in Batch where _task != null select _task;
int _tasksCount = _tasks.Count();
var _completedTasks = from __task in _tasks where __task.IsCompleted select __task;
int _completedTasksCount = _completedTasks.Count();
Console.WriteLine("Tasks in batch array: {0} Completed Tasks : {1} ", _tasksCount, _completedTasksCount);
}
static void ResetTimer(Object state)
{
timer.Change(timerRepeat, Timeout.InfiniteTimeSpan);
}
}
}
Sometimes it just crashes before finishing with an Access Violation unhandled exception. The call stack just says:
> mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode = 1225, uint numBytes = 0, System.Threading.NativeOverlapped* pOVERLAP = 0x08b38b98)
[Native to Managed Transition]
kernel32.dll!@BaseThreadInitThunk@12()
ntdll.dll!___RtlUserThreadStart@8()
ntdll.dll!__RtlUserThreadStart@8()
Most of the time it doesn't crash but just never finishes waiting on the WhenAll. In any case, the following first-chance exceptions are thrown for each request:
A first chance exception of type 'System.Net.Sockets.SocketException' occurred in System.dll
A first chance exception of type 'System.Net.WebException' occurred in System.dll
A first chance exception of type 'System.AggregateException' occurred in mscorlib.dll
A first chance exception of type 'System.ObjectDisposedException' occurred in System.dll
I made the debugger stop on the Object disposed exception, and got this call stack:
> System.dll!System.Net.Sockets.NetworkStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x136 bytes
System.dll!System.Net.PooledStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x19 bytes
System.dll!System.Net.ConnectStream.WriteHeaders(bool async = true) + 0x105 bytes
System.dll!System.Net.HttpWebRequest.EndSubmitRequest() + 0x8a bytes
System.dll!System.Net.HttpWebRequest.SetRequestSubmitDone(System.Net.ConnectStream submitStream) + 0x11d bytes
System.dll!System.Net.Connection.CompleteConnection(bool async, System.Net.HttpWebRequest request = {System.Net.HttpWebRequest}) + 0x16c bytes
System.dll!System.Net.Connection.CompleteConnectionWrapper(object request, object state) + 0x4e bytes
System.dll!System.Net.PooledStream.ConnectionCallback(object owningObject, System.Exception e, System.Net.Sockets.Socket socket, System.Net.IPAddress address) + 0xf0 bytes
System.dll!System.Net.ServicePoint.ConnectSocketCallback(System.IAsyncResult asyncResult) + 0xe6 bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x65 bytes
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x92 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xa6 bytes
System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped) + 0x98 bytes
mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x6e bytes
[Native to Managed Transition]
The exception message was:
{"Cannot access a disposed object.\r\nObject name: 'System.Net.Sockets.NetworkStream'."} System.Exception {System.ObjectDisposedException}
Notice the relationship to the unhandled access violation exception that I see occasionally.
So it seems that HttpClient is not robust when the target is down. I am doing this on 32-bit Windows 7, by the way.
I looked through the source of HttpClient using Reflector. For the synchronously executed part of the operation (when it is kicked off), no timeout seems to be applied to the returned task, as far as I can see. There is a timeout implementation that calls Abort() on an HttpWebRequest object, but again it seems to lack any timeout-driven cancellation or faulting of the returned task on this side of the async function. There may be something on the callback side, but sometimes the callback probably "goes missing", which leaves the returned Task never completing.
I posted a question asking how to add a timeout to any Task, and an answerer gave this very nice solution (here as an extension method):
public static Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
{
var delay = task.ContinueWith(t => t.Result
, new CancellationTokenSource(timeout).Token);
return Task.WhenAny(task, delay).Unwrap();
}
So, calling HttpClient like this should prevent any "Tasks gone bad" from never ending:
Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object()).WithTimeout<HttpResponseMessage>(httpClient.Timeout);
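A variant of the same idea, written with async/await and Task.Delay, is sketched below. It is not the answerer's code: the name WithTimeoutOrThrow is hypothetical, it needs a compiler with async/await support, and it throws a TimeoutException rather than producing a canceled task. Like the version above, it does not stop the underlying request; it only stops the caller from waiting on it forever.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskTimeoutExtensions
{
    // Completes with the task's result, or throws TimeoutException
    // if the task has not finished within the given timeout.
    public static async Task<T> WithTimeoutOrThrow<T>(this Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task winner = await Task.WhenAny(task, delay).ConfigureAwait(false);
            if (winner != task)
                throw new TimeoutException("The task did not complete within " + timeout + ".");

            cts.Cancel(); // the task won, so release the delay's timer early
            return await task.ConfigureAwait(false); // rethrows if the task faulted
        }
    }
}
```

Because the abandoned task may still fault later, it is worth attaching a continuation that logs its exception if those failures matter.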
A couple more things that I think made requests less likely to go missing:
1. Increasing the timeout from 3s to 30s made all the tasks finish in the program that I posted with this question.
2. Increasing the number of concurrent connections allowed using for example System.Net.ServicePointManager.DefaultConnectionLimit = 100;
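Raising limits is one lever. Another, which is not part of the original answer, is to cap how many requests are in flight at once instead of starting all 8000 together. The following is a sketch under that assumption; Throttle and RunBoundedAsync are hypothetical names, and the right value for maxConcurrency depends on the client and the target.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Starts every work item, but lets at most maxConcurrency of them run at a time.
    // Results are returned in the same order as the input.
    public static async Task<TResult[]> RunBoundedAsync<TResult>(
        IEnumerable<Func<Task<TResult>>> work, int maxConcurrency)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<TResult>> tasks = work.Select(async item =>
            {
                await gate.WaitAsync().ConfigureAwait(false); // wait for a free slot
                try { return await item().ConfigureAwait(false); }
                finally { gate.Release(); }                   // free the slot even on failure
            }).ToList();

            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

In the test program, each loop iteration would contribute a delegate such as () => httpClient.PostAsJsonAsync<Object>(Target, new Object()) rather than an already started task.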
I came across this question when googling for solutions to a similar problem from WCF. That series of exceptions is exactly the same pattern I see. Eventually through a ton of investigation I found a bug in HttpWebRequest that HttpClient uses. The HttpWebRequest gets in a bad state and only sends the HTTP headers. It then sits waiting for a response which will never be sent.
I've raised a ticket with Microsoft Connect which can be found here: https://connect.microsoft.com/VisualStudio/feedback/details/1805955/async-post-httpwebrequest-hangs-when-a-socketexception-occurs-during-setsocketoption
The specifics are in the ticket, but it requires an async POST call from the HttpWebRequest to a non-localhost machine. I've reproduced it on Windows 7 with .NET 4.5 and 4.6. In my testing, the SetSocketOption call that raises the SocketException only failed on Windows 7.
For us the UseNagleAlgorithm setting causes the SetSocketOption call, but we can't avoid it as WCF turns off UseNagleAlgorithm and you can't stop it. In WCF it appears as a timed out call. Obviously this isn't great as we're spending 60s waiting for nothing.
Your exception information is being lost in the WhenAll task. Instead of using that, try this:
Task aggregateTask = Task.Factory.ContinueWhenAll(
Batch,
TaskExtrasExtensions.PropagateExceptions,
TaskContinuationOptions.ExecuteSynchronously);
aggregateTask.Wait();
This uses the PropagateExceptions extension method from the Parallel Extensions Extras sample code to ensure that exception information from the tasks in the batch operation is not lost:
/// <summary>Propagates any exceptions that occurred on the specified tasks.</summary>
/// <param name="tasks">The Task instances whose exceptions are to be propagated.</param>
public static void PropagateExceptions(this Task [] tasks)
{
if (tasks == null) throw new ArgumentNullException("tasks");
if (tasks.Any(t => t == null)) throw new ArgumentException("tasks");
if (tasks.Any(t => !t.IsCompleted)) throw new InvalidOperationException("A task has not completed.");
Task.WaitAll(tasks);
}
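Before changing how the batch is awaited, it can help to see what state each task is actually in, since a task that never completes carries no exception to propagate. The sketch below counts tasks by status; BatchDiagnostics and Summarize are hypothetical names (not part of the Parallel Extensions Extras), and the method could be called from the timerAction method in the question.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchDiagnostics
{
    // Counts the tasks in a batch by final state; anything not yet finished is "pending".
    public static string Summarize(IEnumerable<Task> batch)
    {
        int ok = 0, faulted = 0, canceled = 0, pending = 0;
        foreach (Task task in batch)
        {
            switch (task.Status)
            {
                case TaskStatus.RanToCompletion: ok++; break;
                case TaskStatus.Faulted: faulted++; break;
                case TaskStatus.Canceled: canceled++; break;
                default: pending++; break;
            }
        }
        return "ok=" + ok + " faulted=" + faulted + " canceled=" + canceled + " pending=" + pending;
    }
}
```

A nonzero pending count that stops changing points at tasks that will never complete, which is a different problem from exceptions being swallowed.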
I have a TimeoutException problem. I am using C# 4.0 (I can't upgrade to 4.5 anytime soon) and WCF. Note that I do not control the server and cannot see the code or the technology it uses. The problem happens with different servers made by different people.
I send as many requests as I can to many servers (let's say 10), one request per server at any time. The rate ranges from 2 to 30 requests per second. After somewhere between 30 seconds and 5 minutes, I get some TimeoutExceptions:
exception {"The HTTP request to 'http://xx.xx.xx.xx/service/test_service' has exceeded the allotted timeout of 00:02:10. The time allotted to this operation may have been a portion of a longer timeout."} System.Exception {System.TimeoutException}.
Stack Trace :
Server stack trace:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Device.EndTest(IAsyncResult result)
at DeviceClient.EndTest(IAsyncResult result) in ...
at TestAsync(IAsyncResult ar) in ...
The InnerException is :
[System.Net.WebException] {"The request was aborted: The request was canceled."} System.Net.WebException
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelAsyncRequest.CompleteGetResponse(IAsyncResult result)
Wireshark tells me that I do not even open a connection (no SYN), so this should be a client problem. I see many TIME_WAIT connections in TCPView.
Using synchronous calls works, but is not an option for us.
Note that in the following code example, there is one method call per server. (In my case, 10 concurrent TestAsync)
(in the real project, we use CCR instead of Semaphore, same result)
private void AsyncTest()
{
//GetServiceObject Will add custom bindings and more..
Client client = ClientBuilder.GetServiceObject();
while (true)
{
Semaphore semaphore = new Semaphore(0,1);
client.BeginTest(BeginTestCallback, new AsyncState
{
Client = client,
Semaphore = semaphore
});
semaphore.WaitOne();
}
}
private void BeginTestCallback(IAsyncResult asyncResult)
{
try
{
AsyncState state = asyncResult.AsyncState as AsyncState;
Client client = state.Client;
Semaphore semaphore = state.Semaphore;
client.EndTest(asyncResult);
semaphore.Release();
}
catch (Exception e)
{
//Will catch the exception here because of Client.EndTest(asyncResult)
Debug.Assert(false, e.Message);
}
}
I tried with
ServicePointManager.DefaultConnectionLimit = 200;
ServicePointManager.MaxServicePointIdleTime = 2000;
This was suggested in some posts, but it did not help.
Even if I set really high open, send, receive and close timeouts, I get the same exception. WCF seems to be "stuck" sending the request. The server continues to respond correctly to other requests.
Any ideas?
Also, if I do this (BeginTest in the callback instead of while(true)), the exception never happens?!
private void AsyncTest()
{
//GetServiceObject Will add custom bindings and more..
Client client = ClientBuilder.GetServiceObject();
try
{
client.BeginTest(BeginTestCallback, new AsyncState
{
Client = client
});
}
catch (Exception e)
{
Debug.Assert(false, e.Message);
}
}
private void BeginTestCallback(IAsyncResult asyncResult)
{
try
{
AsyncState state = asyncResult.AsyncState as AsyncState;
state.Client.EndTest(asyncResult);
state.Client.BeginTest(BeginTestCallback, state);
}
catch (Exception e)
{
//No Exception here
Debug.Assert(false, e.Message);
}
}
After more testing, I found that this behavior occurs at random when the begin/end calls are not made on the same thread pool.
In the first case, "AsyncTest" was spawned on a new thread with ThreadStart and Thread. In the second case, only the first "begin" is called on the dedicated thread, and since the problem occurs at random, there is only a small chance that the exception would happen on the first request. The other "begin" calls are made on the .NET ThreadPool.
By using Task.Factory.StartNew(() => AsyncTest()) in the first case, the problem is gone.
In my real project, I still use CCR (and the CCR thread pool) for everything up to the point where I have to call begin/end. For those calls I use the .NET thread pool, and everything is working now.
Does anyone have a better explanation of why WCF doesn't like being called from another thread pool?
Sorry if this is a bit long winded but I thought better to post more than less.
This is also my First post here, so please forgive.
I have been trying to figure this one out for some time, to no avail, and I am hoping there is a genius out there who has encountered this before.
This is an intermittent problem and has been hard to reproduce.
The code that I am running simply calls a web service
The Web Service call is in a loop (so we could be doing this a lot, 1500 times or more)
Here is the code that is causing the error:
HttpWebRequest groupRequest = null;
WebResponse groupResponse = null;
try
{
XmlDocument doc = new XmlDocument();
groupRequest = (HttpWebRequest)HttpWebRequest.Create(String.Format(Server.HtmlDecode(Util.GetConfigValue("ImpersonatedSearch.GroupLookupUrl")),userIntranetID));
groupRequest.Proxy = null;
groupRequest.KeepAlive = false;
groupResponse = groupRequest.GetResponse();
doc.Load(groupResponse.GetResponseStream());
foreach (XmlElement nameElement in doc.GetElementsByTagName(XML_GROUP_NAME))
{
foreach (string domain in _groupDomains )
{
try
{
string group = new System.Security.Principal.NTAccount(domain, nameElement.InnerText).Translate(typeof(System.Security.Principal.SecurityIdentifier)).Value;
impersonationChain.Append(";").Append(group);
break;
}
catch{}
} // loop through
}
}
catch (Exception groupLookupException)
{
throw new ApplicationException(String.Format(@"Impersonated Search ERROR: Could not find groups for user<{0}\{1}>", userNTDomain, userIntranetID), groupLookupException);
}
finally
{
if ( groupResponse != null )
{
groupResponse.Close();
}
}
Here is the error that happens sometimes:
Could not find groups for user<DOMAIN\auser> ---> System.IO.IOException: Unable to read
data from the transport connection: An established connection was aborted by the
software in your host machine. ---> System.Net.Sockets.SocketException: An established
connection was aborted by the software in your host machine at
System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags
socketFlags) at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32
size) --- End of inner exception stack trace --- at System.Net.ConnectStream.Read(Byte[]
buffer, Int32 offset, Int32 size) at System.Xml.XmlTextReaderImpl.ReadData() at
System.Xml.XmlTextReaderImpl.ParseDocumentContent() at
System.Xml.XmlLoader.LoadDocSequence
(XmlDocument parentDoc) at System.Xml.XmlDocument.Load(XmlReader reader) at
System.Xml.XmlDocument.Load(Stream inStream) at
MyWebServices.ImpersonatedSearch.PerformQuery(QueryParameters parameters,
String userIntranetID, String userNTDomain)--- End of inner exception stack trace
---at MyWebServices.ImpersonatedSearch.PerformQuery(QueryParameters parameters, String userIntranetID, String userNTDomain)
--- End of inner exception stack trace ---
at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message,
WebResponse response, Stream responseStream, Boolean asyncCall)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName,
Object[] parameters) at MyProgram. MyWebServices.ImpersonatedSearch.PerformQuery
(QueryParameters parameters, String userIntranetID, String userNTDomain)
at MyProgram.MyMethod()
Sorry, that was a lot of code to read through.
This happens about 30 times out of around 1700.
You're probably hitting a timeout. First of all, turn the keepalive back on. Second, check the timestamps on the request and reply. If there is a firewall between the sender and receiver, make sure that it isn't closing the connection because of idle timeout. I've had both these problems in the past, although with generic socket programming, not DB stuff.
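The first two suggestions can be sketched as follows, assuming the same HttpWebRequest API as in the question. GroupLookupRequests, Create and Timed are hypothetical names, and the 30-second values are examples only, not recommendations.

```csharp
using System;
using System.Diagnostics;
using System.Net;

public static class GroupLookupRequests
{
    // Builds a request with keep-alive turned back on and explicit timeouts.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = null;
        request.KeepAlive = true;         // reuse the connection across calls in the loop
        request.Timeout = 30000;          // ms allowed for GetResponse to return
        request.ReadWriteTimeout = 30000; // ms allowed for each read or write on the stream
        return request;
    }

    // Runs a call and reports how long it took, whether it succeeded or threw.
    public static T Timed<T>(Func<T> call, Action<TimeSpan, Exception> log)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Exception failure = null;
        try { return call(); }
        catch (Exception ex) { failure = ex; throw; }
        finally { log(watch.Elapsed, failure); }
    }
}
```

Wrapping the GetResponse and doc.Load calls in Timed gives the elapsed time for each failing call. If the failures cluster around one duration, that points at a timeout somewhere on the path, such as a firewall's idle limit.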