I have a TimeoutException problem. I am using C# 4.0 (I can't upgrade to 4.5 anytime soon) and WCF. Note that I do not control the server and cannot see the code or technology it uses. The problem happens with different servers made by different people.
I send as many requests as I can to many servers (let's say 10), with one request in flight per server at any time. The rate ranges from 2 to 30 requests per second. After somewhere between 30 seconds and 5 minutes, I get some TimeoutExceptions:
exception {"The HTTP request to 'http://xx.xx.xx.xx/service/test_service' has exceeded the allotted timeout of 00:02:10. The time allotted to this operation may have been a portion of a longer timeout."} System.Exception {System.TimeoutException}.
Stack Trace :
Server stack trace:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Device.EndTest(IAsyncResult result)
at DeviceClient.EndTest(IAsyncResult result) in ...
at TestAsync(IAsyncResult ar) in ...
The InnerException is :
[System.Net.WebException] {"The request was aborted: The request was canceled."} System.Net.WebException
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelAsyncRequest.CompleteGetResponse(IAsyncResult result)
Wireshark tells me that I do not even open a connection (no SYN), so this should be a client-side problem. I also see many TIME_WAIT connections in TCPView.
Synchronous calls work, but they are not an option for us.
Note that in the following code example there is one method call per server (in my case, 10 concurrent TestAsync calls).
(In the real project we use CCR instead of Semaphore, with the same result.)
private void AsyncTest()
{
//GetServiceObject Will add custom bindings and more..
Client client = ClientBuilder.GetServiceObject();
while (true)
{
Semaphore semaphore = new Semaphore(0,1);
client.BeginTest(BeginTestCallback, new AsyncState
{
Client = client,
Semaphore = semaphore
});
semaphore.WaitOne();
}
}
private void BeginTestCallback(IAsyncResult asyncResult)
{
try
{
AsyncState state = asyncResult.AsyncState as AsyncState;
Client client = state.Client;
Semaphore semaphore = state.Semaphore;
client.EndTest(asyncResult);
semaphore.Release();
}
catch (Exception e)
{
//Will catch the exception here because of client.EndTest(asyncResult)
Debug.Assert(false, e.Message);
}
}
I tried setting
ServicePointManager.DefaultConnectionLimit = 200;
ServicePointManager.MaxServicePointIdleTime = 2000;
as some posts suggested, without success.
Even if I set really high open, send, receive and close timeouts, I get the same exception. WCF seems to be "stuck" at sending the request. The server continues to respond correctly to other requests.
Any ideas?
Also, if I do this (calling BeginTest in the callback instead of in a while(true) loop), the exception never occurs?!
private void AsyncTest()
{
//GetServiceObject Will add custom bindings and more..
Client client = ClientBuilder.GetServiceObject();
try
{
client.BeginTest(BeginTestCallback, new AsyncState
{
Client = client
});
}
catch (Exception e)
{
Debug.Assert(false, e.Message);
}
}
private void BeginTestCallback(IAsyncResult asyncResult)
{
try
{
AsyncState state = asyncResult.AsyncState as AsyncState;
state.Client.EndTest(asyncResult);
state.Client.BeginTest(BeginTestCallback, state);
}
catch (Exception e)
{
//No Exception here
Debug.Assert(false, e.Message);
}
}
After more testing, I found out that if the begin/end mechanism is not executed on the same thread pool, this behavior occurs at random.
In the first case, "AsyncTest" was started on a new dedicated thread (Thread with ThreadStart), so every "begin" was issued from that thread. In the second case, only the first "begin" is issued on the dedicated thread, and since the problem occurs at random, there is only a small chance that the exception happens on that first request. All the other "begin" calls are made on the .NET ThreadPool.
By using Task.Factory.StartNew(() => AsyncTest()) in the first case, the problem is gone.
In my real project, I still use CCR (and the CCR thread pool) for everything up to the point where I have to call begin/end. For those calls I now use the .NET thread pool, and everything works.
Does anyone have a better explanation of why WCF doesn't like being called from another thread pool?
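A minimal sketch of the pattern that ended up working, under stated assumptions: BeginTest and EndTest below are hypothetical stand-ins for the generated WCF proxy methods (backed by a TaskCompletionSource, not a real channel), and RunChained is an illustrative name. The first Begin call is queued to the .NET thread pool with Task.Factory.StartNew, and every later Begin call is issued from the previous callback, so no call originates on a dedicated thread.

```csharp
using System;
using System.Threading.Tasks;

public static class ThreadPoolChainSketch
{
    // Hypothetical stand-in for client.BeginTest: completes on a thread-pool thread.
    static IAsyncResult BeginTest(AsyncCallback callback, object state)
    {
        var tcs = new TaskCompletionSource<int>(state);
        Task.Factory.StartNew(() =>
        {
            tcs.SetResult(42);
            callback(tcs.Task); // Task implements IAsyncResult
        });
        return tcs.Task;
    }

    // Hypothetical stand-in for client.EndTest.
    static int EndTest(IAsyncResult asyncResult)
    {
        return ((Task<int>)asyncResult).Result;
    }

    // Runs `iterations` (>= 1) sequential Begin/End round trips and completes
    // the returned task with the number of round trips made.
    public static Task<int> RunChained(int iterations)
    {
        var finished = new TaskCompletionSource<int>();
        int completed = 0;
        AsyncCallback callback = null;
        callback = asyncResult =>
        {
            try
            {
                EndTest(asyncResult);
                completed++;
                if (completed < iterations)
                    BeginTest(callback, null);      // next call starts from the callback
                else
                    finished.SetResult(completed);
            }
            catch (Exception e)
            {
                finished.SetException(e);           // surface the failure instead of hanging
            }
        };
        // The first call is issued from the .NET thread pool, not a dedicated thread.
        Task.Factory.StartNew(() => BeginTest(callback, null));
        return finished.Task;
    }
}
```

Because only one request is in flight at a time, the counter needs no locking. A failure in EndTest faults the returned task, so the caller is not left waiting on a semaphore that is never released.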
I'm trying to self-host a singleton instance of a service and I'm obviously getting lost at a level of indirection...
I've got a base address of http://localhost:8050/. I'm not too bothered where the service endpoint is as long as it's predictable. For the moment, I'm trying to use /Manage/.
I'm able to browse to the base address and see a wsdl. If I scan through the wsdl, it points at /Manage/..
<wsdl:service name="EngineService">
<wsdl:port name="BasicHttpBinding_IEngineService" binding="tns:BasicHttpBinding_IEngineService">
<soap:address location="http://localhost:8050/Manage/"/>
</wsdl:port>
</wsdl:service>
When I consume the wsdl using the WcfTestClient, it lists all the correct methods, but calling any of them throws the following exception:
System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at http://localhost:8050/Manage that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
Server stack trace:
at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason)
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at IEngineService.SupportedAgents()
at EngineServiceClient.SupportedAgents()
Inner Exception:
The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
Log messages show my instance methods never get called. The service doesn't enter a faulted state, it just looks like it's not there.
I'm listening as follows:
public static ServiceHost Listen<TServiceContract>(
TServiceContract instance,
int port,
string name
) {
//Added this for debugging, was previously just "name"
string endpoint = String.Format("http://localhost:{0}/{1}/", port, name);
var svcHost = new ServiceHost(
instance,
new Uri[] { new Uri(String.Format("http://localhost:{0}/", port)) });
/* Snip: Add a Faulted handler but it's never called */
ServiceEndpoint serviceHttpEndpoint = svcHost.AddServiceEndpoint(
typeof(TServiceContract),
new BasicHttpBinding {
HostNameComparisonMode = HostNameComparisonMode.WeakWildcard
}, endpoint); /*Using name instead of endpoint makes no difference beyond removing the trailing slash */
/* Snip: Add a ServiceDebugBehavior with IncludeExceptionDetailInFaults = true */
/* Snip: Add a ServiceMetadataBehavior with HttpGetEnabled = true */
try {
log.Trace("Opening endpoint");
svcHost.Open();
} catch (Exception) {
/* Lots of catches for different problems including Exception
* None of them get hit */
}
log.Info("Service contract {0} ready at {1}", typeof(TServiceContract).Name, svcHost.BaseAddresses.First());
return svcHost;
}
And calling the Listen() method as follows:
IEngineService wcfInstance = Resolver.Resolve<IEngineService>();
service = WcfHoster.Listen(wcfInstance, 8050, "Manage");
How can I track down what the problem is/debug further?
Additional info: The Service contract and minimal implementation:
[ServiceContract]
interface IEngineService {
[OperationContract]
List<string> Agents();
[OperationContract]
string Test();
[OperationContract]
List<string> SupportedAgents();
[OperationContract]
string Connect(string AgentStrongName, string Hostname);
}
And the implementation:
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
class EngineService : IEngineService {
IAgentManager agentManager;
public EngineService(IAgentManager AgentManager) {
log.Debug("Engine webservice instantiating");
this.agentManager = AgentManager;
}
public string Connect(string AgentStrongName, string Hostname) {
log.Debug("Endpoint requested for [{0}], [{1}]", Hostname, AgentStrongName);
return agentManager.GetSession(AgentStrongName, Hostname);
}
public List<string> Agents() {
log.Debug("Current agents queried");
throw new NotImplementedException();
}
public List<string> SupportedAgents() {
log.Debug("Supported agents queried");
return agentManager.SupportedAgents().ToList();
}
public string Test() {
log.Warn("Test query");
return "Success!";
}
}
The test client can see the service and methods but throws the exception above when I click Invoke...
Edit: localhost resolves to IPv6 by default so I've tried using 127.0.0.1 explicitly at both ends. No difference.
I've tried taking the above code into a new project and get the same issue. Running the whole thing on someone else's machine didn't help either.
Service Trace viewer
Running a service trace on the server side, then examining the results in the viewer gives:
Failed to lookup a channel to receive an incoming message. Either the endpoint or the SOAP action was not found.
Config file: Since I need the executable to be able to make a decision about which Wcf service to present at runtime, I don't have any Wcf-related code in the config file.
This is probably a client/service binding mismatch. Please check the test client's binding. You should also write a unit test that uses a proxy generated from the wsdl.
OK. I tried to reproduce your issue and managed to call the host after removing "HostNameComparisonMode = HostNameComparisonMode.WeakWildcard", which gives a default BasicHttpBinding endpoint. Why do you need this setting?
I have a List of TCP sockets I write data to. If writing to a socket fails, I remove it from the list and just carry on.
At least that's the plan. What happens is that when a client disconnects, the SocketException escalates and the program crashes, even though that exception is handled. The code is below:
// sockets is type List<Socket>
foreach (Socket s in sockets)
{
String jsonString = skeleton.Marshall();
byte[] jsonBytes = System.Text.Encoding.UTF8.GetBytes(jsonString);
try
{
s.Send(jsonBytes); // boom! System.Net.Sockets.SocketException!
}
catch (System.Net.Sockets.SocketException except)
{
sockets.Remove(s);
Console.WriteLine(except.StackTrace);
}
catch (Exception except)
{
Console.WriteLine(except.StackTrace);
}
}
I don't get how any exception could get past this. I didn't look at the console output because Visual Studio clears it when an exception occurs (at least I didn't see anything meaningful there).
Thanks for your help!
Edit
As Sebastian Negraszus pointed out, I can't directly remove the Socket from the List, so the code now is
List<Socket> remove = new List<Socket>();
// sockets still is of type List<Socket>
foreach (Socket s in sockets)
{
String jsonString = skeleton.Marshall();
byte[] jsonBytes = System.Text.Encoding.UTF8.GetBytes(jsonString);
try
{
s.Send(jsonBytes);
}
catch (System.Net.Sockets.SocketException except)
{
remove.Add(s);
Console.WriteLine(except.StackTrace);
}
catch (Exception except)
{
Console.WriteLine(except.StackTrace);
}
}
foreach (Socket s in remove)
{
sockets.Remove(s);
}
However, even if the Socket is not removed from the list, the exception should not escalate here.
Edit 2
This code runs in an event handler, while sockets is being filled in the main Thread, so I assumed the lack of locking caused problems. However, after adding locks, the error still appeared.
main thread:
// ...
sockets = new List<Socket>();
delegateFoo += handlerFunction;
// ...
TcpListener tcpListener = new TcpListener(IPAddress.Any, 20001);
tcpListener.Start();
while (true) {
Socket s = tcpListener.AcceptSocket();
lock (sockets) {
sockets.Add(s);
}
}
handler function:
// ...generate skeleton...
lock (sockets)
{
foreach (Socket s in sockets)
{
String jsonString = skeleton.Marshall();
byte[] jsonBytes = System.Text.Encoding.UTF8.GetBytes(jsonString);
try
{
s.Send(jsonBytes);
}
catch (System.Net.Sockets.SocketException except)
{
remove.Add(s);
Console.WriteLine(except.StackTrace);
}
catch (Exception except)
{
Console.WriteLine(except.StackTrace);
}
}
foreach (Socket s in remove)
{
sockets.Remove(s);
}
}
Bad luck though, the exception still escalates. At least I think so: the program breaks in VS and a little window appears saying "SocketException occurred" (I use the German version, so the wording might be different).
The error can be triggered by connecting twice using PuTTY and closing one of the two PuTTY sessions. The next time Send() is called: boom.
Edit 3: Exception details
I'm sorry these are in German. Translations:
"... ist aufgetreten" = "... occurred"
"bei" = "at"
message = "An existing connection has been aborted/terminated by the host computer"
System.Net.Sockets.SocketException ist aufgetreten.
_HResult=-2147467259
_message=Eine bestehende Verbindung wurde softwaregesteuert durch den Hostcomputer abgebrochen
HResult=-2147467259
IsTransient=false
Message=Eine bestehende Verbindung wurde softwaregesteuert durch den Hostcomputer abgebrochen
Source=System
ErrorCode=10053
NativeErrorCode=10053
StackTrace:
bei System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
InnerException:
And yes, I only Send() once in my program.
Edit 4: Visual Studio weirdness
Okay, it's Visual Studio being weird. I can uncheck the "break on Exceptions of this type" checkbox and then it just continues. So the exception didn't escalate; the debugger nevertheless made the program stop.
I don't get why you would want to break on handled exceptions by default. I had assumed that unchecking the box would make the program just fault. If you have a better solution, I'd be glad to accept your answer.
I assume sockets is a List<T>? You cannot modify the list with sockets.Remove(s); while still inside the foreach loop, because this invalidates the enumerator. The next iteration causes an InvalidOperationException.
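One way to send and prune in a single pass without touching an enumerator is to walk the list by index from the end. This is a sketch, not the asker's code: SendToAll and the send delegate are illustrative names, and the delegate stands in for a call such as s.Send(jsonBytes).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

public static class SendAndPruneSketch
{
    // Calls `send` for every item and removes the items for which it throws
    // SocketException. Returns the number of removed items.
    public static int SendToAll<T>(List<T> targets, Action<T> send)
    {
        int removed = 0;
        // Walking backwards by index means RemoveAt never shifts an element
        // that has not been visited yet, and no enumerator is invalidated.
        for (int i = targets.Count - 1; i >= 0; i--)
        {
            try
            {
                send(targets[i]);
            }
            catch (SocketException e)
            {
                Console.WriteLine(e.Message);
                targets.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```

With a List of Socket this could be called as SendAndPruneSketch.SendToAll(sockets, s => s.Send(jsonBytes)). Note that items are visited in reverse order, which only matters if the send order is significant.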
Uncheck "break on exceptions of this type" (or whatever it's called in English). Works fine afterwards.
I think I've managed to make a test that shows this problem repeatably, at least on my system. This question relates to HttpClient being used against a bad endpoint (a nonexistent endpoint, or a target that is down).
The problem is that the number of completed tasks falls short of the total, usually by a few. I don't mind requests failing, but this leaves the app hanging when the results are awaited.
I get the following result from the test code below:
Elapsed: 237.2009884 seconds.
Tasks in batch array: 8000 Completed Tasks : 7993
If I set batchSize to 8 instead of 8000, it completes. For 8000 it jams on the WhenAll.
I wonder if other people get the same result, whether I am doing something wrong, and whether this appears to be a bug.
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
namespace CustomArrayTesting
{
/// <summary>
/// Problem: a large batch of async http requests is done in a loop using HttpClient, and a few of them never complete
/// </summary>
class ProgramTestHttpClient
{
static readonly int batchSize = 8000; //large batch size brings about the problem
static readonly Uri Target = new Uri("http://localhost:8080/BadAddress");
static TimeSpan httpClientTimeout = TimeSpan.FromSeconds(3); // short Timeout seems to bring about the problem.
/// <summary>
/// Sends off a bunch of async httpRequests using a loop, and then waits for the batch of requests to finish.
/// I installed asp.net web api client libraries Nuget package.
/// </summary>
static void Main(String[] args)
{
httpClient.Timeout = httpClientTimeout;
stopWatch = new Stopwatch();
stopWatch.Start();
// this timer updates the screen with the number of completed tasks in the batch (see the timerAction method below Main)
TimerCallback _timerAction = timerAction;
TimerCallback _resetTimer = ResetTimer;
TimerCallback _timerCallback = _timerAction + _resetTimer;
timer = new Timer(_timerCallback, null, TimeSpan.FromSeconds(1), Timeout.InfiniteTimeSpan);
//
for (int i = 0; i < batchSize; i++)
{
Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object());//WatchRequestBody()
Batch[i] = _response;
}
try
{
Task.WhenAll(Batch).Wait();
}
catch (Exception ex)
{
}
timer.Dispose();
timerAction(null);
stopWatch.Stop();
Console.WriteLine("Done");
Console.ReadLine();
}
static readonly TimeSpan timerRepeat = TimeSpan.FromSeconds(1);
static readonly HttpClient httpClient = new HttpClient();
static Stopwatch stopWatch;
static System.Threading.Timer timer;
static readonly Task[] Batch = new Task[batchSize];
static void timerAction(Object state)
{
Console.Clear();
Console.WriteLine("Elapsed: {0} seconds.", stopWatch.Elapsed.TotalSeconds);
var _tasks = from _task in Batch where _task != null select _task;
int _tasksCount = _tasks.Count();
var _completedTasks = from __task in _tasks where __task.IsCompleted select __task;
int _completedTasksCount = _completedTasks.Count();
Console.WriteLine("Tasks in batch array: {0} Completed Tasks : {1} ", _tasksCount, _completedTasksCount);
}
static void ResetTimer(Object state)
{
timer.Change(timerRepeat, Timeout.InfiniteTimeSpan);
}
}
}
Sometimes it just crashes before finishing with an Access Violation unhandled exception. The call stack just says:
> mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode = 1225, uint numBytes = 0, System.Threading.NativeOverlapped* pOVERLAP = 0x08b38b98)
[Native to Managed Transition]
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!___RtlUserThreadStart#8()
ntdll.dll!__RtlUserThreadStart#8()
Most of the time it doesn't crash but just never finishes waiting on the WhenAll. In any case, the following first chance exceptions are thrown for each request:
A first chance exception of type 'System.Net.Sockets.SocketException' occurred in System.dll
A first chance exception of type 'System.Net.WebException' occurred in System.dll
A first chance exception of type 'System.AggregateException' occurred in mscorlib.dll
A first chance exception of type 'System.ObjectDisposedException' occurred in System.dll
I made the debugger stop on the ObjectDisposedException and got this call stack:
> System.dll!System.Net.Sockets.NetworkStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x136 bytes
System.dll!System.Net.PooledStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x19 bytes
System.dll!System.Net.ConnectStream.WriteHeaders(bool async = true) + 0x105 bytes
System.dll!System.Net.HttpWebRequest.EndSubmitRequest() + 0x8a bytes
System.dll!System.Net.HttpWebRequest.SetRequestSubmitDone(System.Net.ConnectStream submitStream) + 0x11d bytes
System.dll!System.Net.Connection.CompleteConnection(bool async, System.Net.HttpWebRequest request = {System.Net.HttpWebRequest}) + 0x16c bytes
System.dll!System.Net.Connection.CompleteConnectionWrapper(object request, object state) + 0x4e bytes
System.dll!System.Net.PooledStream.ConnectionCallback(object owningObject, System.Exception e, System.Net.Sockets.Socket socket, System.Net.IPAddress address) + 0xf0 bytes
System.dll!System.Net.ServicePoint.ConnectSocketCallback(System.IAsyncResult asyncResult) + 0xe6 bytes
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x65 bytes
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x92 bytes
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xa6 bytes
System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped) + 0x98 bytes
mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x6e bytes
[Native to Managed Transition]
The exception message was:
{"Cannot access a disposed object.\r\nObject name: 'System.Net.Sockets.NetworkStream'."} System.Exception {System.ObjectDisposedException}
Notice the relationship to the unhandled access violation exception that I occasionally see: both go through PerformIOCompletionCallback.
So, it seems that HttpClient is not robust when the target is down. I am doing this on 32-bit Windows 7, by the way.
I looked through the source of HttpClient using Reflector. For the synchronously executed part of the operation (when it is kicked off), there seems to be no timeout applied to the returned task, as far as I can see. There is some timeout implementation that calls Abort() on an HttpWebRequest object, but again they seem to have missed out any timeout cancellation or faulting of the returned task on this side of the async function. There may be something on the callback side, but sometimes the callback is probably "going missing", leading to the returned Task never completing.
I posted a question asking how to add a timeout to any Task, and an answerer gave this very nice solution (here as an extension method):
public static Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
{
var delay = task.ContinueWith(t => t.Result
, new CancellationTokenSource(timeout).Token);
return Task.WhenAny(task, delay).Unwrap();
}
So, calling HttpClient like this should prevent any "Tasks gone bad" from never ending:
Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object()).WithTimeout<HttpResponseMessage>(httpClient.Timeout);
A couple more things that I think made requests less likely to go missing:
1. Increasing the timeout from 3s to 30s made all the tasks finish in the program that I posted with this question.
2. Increasing the number of concurrent connections allowed, for example with System.Net.ServicePointManager.DefaultConnectionLimit = 100;
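The timeout wrapper described above can also be written with Task.Delay so that a stuck request surfaces as a TimeoutException. This is a sketch assuming .NET 4.5 and async/await. TimeoutAfter is an illustrative name, not part of the quoted answer, and it only stops waiting: it does not abort the underlying request.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskTimeoutSketch
{
    // Completes with the task's result, or throws TimeoutException if the
    // task has not finished when the timeout elapses.
    public static async Task<T> TimeoutAfter<T>(this Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task first = await Task.WhenAny(task, delay).ConfigureAwait(false);
            if (first != task)
                throw new TimeoutException("The task did not complete within " + timeout + ".");
            cts.Cancel();                             // stop the timer early
            return await task.ConfigureAwait(false);  // propagate result or exception
        }
    }
}
```

With this in place, every task in the batch either completes, faults, or times out, so waiting on the whole batch cannot block forever on a request whose callback never arrives.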
I came across this question when googling for solutions to a similar problem with WCF. That series of exceptions is exactly the same pattern I see. Eventually, after a ton of investigation, I found a bug in HttpWebRequest, which HttpClient uses. The HttpWebRequest gets into a bad state and only sends the HTTP headers. It then sits waiting for a response that will never be sent.
I've raised a ticket with Microsoft Connect, which can be found here: https://connect.microsoft.com/VisualStudio/feedback/details/1805955/async-post-httpwebrequest-hangs-when-a-socketexception-occurs-during-setsocketoption
The specifics are in the ticket, but it requires an async POST call from the HttpWebRequest to a non-localhost machine. I've reproduced it on Windows 7 with .NET 4.5 and 4.6. In my testing, the SetSocketOption call that raises the SocketException failed only on Windows 7.
For us the UseNagleAlgorithm setting causes the SetSocketOption call, but we can't avoid it because WCF turns off UseNagleAlgorithm and you can't stop it. In WCF the problem appears as a timed-out call. Obviously this isn't great, as we spend 60s waiting for nothing.
Your exception information is being lost in the WhenAll task. Instead of using that, try this:
Task aggregateTask = Task.Factory.ContinueWhenAll(
Batch,
TaskExtrasExtensions.PropagateExceptions,
TaskContinuationOptions.ExecuteSynchronously);
aggregateTask.Wait();
This uses the PropagateExceptions extension method from the Parallel Extensions Extras sample code to ensure that exception information from the tasks in the batch operation is not lost:
/// <summary>Propagates any exceptions that occurred on the specified tasks.</summary>
/// <param name="tasks">The Task instances whose exceptions are to be propagated.</param>
public static void PropagateExceptions(this Task [] tasks)
{
if (tasks == null) throw new ArgumentNullException("tasks");
if (tasks.Any(t => t == null)) throw new ArgumentException("tasks");
if (tasks.Any(t => !t.IsCompleted)) throw new InvalidOperationException("A task has not completed.");
Task.WaitAll(tasks);
}
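When a batch stalls, it can also help to see which state each task is actually in before waiting on all of them. The helper below is a small sketch (CountByStatus is an illustrative name) that groups the tasks by their TaskStatus, so tasks that never complete show up separately from faulted or canceled ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchStatusSketch
{
    // Returns how many tasks are in each TaskStatus; null slots are ignored.
    public static Dictionary<TaskStatus, int> CountByStatus(IEnumerable<Task> tasks)
    {
        return tasks
            .Where(t => t != null)
            .GroupBy(t => t.Status)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling this from a timer callback and printing each key and count would distinguish "faulted with a WebException" from "still WaitingForActivation", which is the case that makes the wait hang.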
I am working on a client-server application with a Windows server and a Linux client. I was testing my server with multiple concurrent clients. I tried just 20 concurrent connections from the client and noticed that some requests were not processed, even though all 20 requests were identical. They went into the queue, and for some reason, by the time their turn came, the client had already shut down (the client connect timeout is 5 seconds).
Then I added a Thread.Sleep(1000) to check whether it is really asynchronous, and realized that it does not process other requests until the sleep has finished, despite the fact that:
1. It is asynchronous.
2. The ManualResetEvent was set before going to sleep.
Now I am wondering what I am missing here, as this happens mostly with concurrent connections.
public static void StartServer(IPAddress ipAddr, int port)
{
//IPEndPoint serverEndPoint = new IPEndPoint(ipAddr, port);
IPEndPoint serverEndPoint = new IPEndPoint(IPAddress.Any, port);
Socket clientListener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
try
{
clientListener.Bind(serverEndPoint);
clientListener.Listen(500);
Console.WriteLine("-- Server Listening: {0}:{1}",ipAddr,port);
while (true)
{
resetEvent.Reset();
Console.WriteLine("|| Waiting for connection");
clientListener.BeginAccept(new AsyncCallback(AcceptConnection), clientListener);
resetEvent.WaitOne();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
public static void AcceptConnection(IAsyncResult ar)
{
// Get the socket that handles the client request.
Socket listener = (Socket)ar.AsyncState;
Socket handler = listener.EndAccept(ar);
// Signal the main thread to continue.
resetEvent.Set();
// Create the state object.
JSStateObject state = new JSStateObject();
state.workSocket = handler;
if (handler.Connected)
{
Console.WriteLine("** Connected to: {0}", handler.RemoteEndPoint.ToString());
state.workingDirectory = JSUtilityClass.CreatetTemporaryDirectry();
try
{
Thread.Sleep(1000);
Receive(state);
}
catch (Exception e)
{
handler.Shutdown(SocketShutdown.Both);
handler.Close();
Console.WriteLine(e.Message);
}
}
}
I created a test that sends 100 connection attempts and found a few things slowing it down.
Why is it so slow?
I put a breakpoint in AcceptConnection to look at the call stack. This is it:
ConsoleApplication1.exe!ConsoleApplication1.Program.AcceptConnection(System.IAsyncResult ar) Line 62 C#
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x69 bytes
System.dll!System.Net.ContextAwareResult.CaptureOrComplete(ref System.Threading.ExecutionContext cachedContext, bool returnContext) + 0xab bytes
System.dll!System.Net.ContextAwareResult.FinishPostingAsyncOp(ref System.Net.CallbackClosure closure) + 0x3c bytes
System.dll!System.Net.Sockets.Socket.BeginAccept(System.AsyncCallback callback, object state) + 0xe3 bytes
ConsoleApplication1.exe!ConsoleApplication1.Program.StartServer(System.Net.IPAddress ipAddr, int port) Line 48 + 0x32 bytes C#
So the callback AcceptConnection is running on the same thread that BeginAccept was called from. I had a look at FinishPostingAsyncOp with Reflector, and it uses the async pattern in which a socket operation that is already waiting in the queue is processed on the current thread, while an operation with nothing pending is processed later on a different thread, e.g.
SocketAsyncEventArgs sae = new SocketAsyncEventArgs();
sae.Completed += new EventHandler<SocketAsyncEventArgs>(SocketOperation_Completed);
if (!clientListener.AcceptAsync(sae))
{
AcceptConnection(clientListener, sae); // operation completed synchronously, process the result
}
else
{
// operation will complete on an IO completion port (different thread), which we'll handle in the Completed event
}
So, as you observed, the program is effectively completely synchronous in this scenario, and with the 1 second Thread.Sleep it's going to take at least 100 seconds to accept all the connections, by which time most of them will have timed out.
The solution
Even though the BeginAccept method summary says
"Begins an asynchronous operation to accept an incoming connection attempt."
it turns out there is more to the story.
From MSDN http://msdn.microsoft.com/en-AU/library/system.net.sockets.socket.beginaccept.aspx
BeginAccept(Int32, AsyncCallback, Object)
"Begins an asynchronous operation to accept an incoming connection attempt and receives the first block of data sent by the client application."
So it's performing a read operation with a short timeout before firing the callback. You can disable this by specifying a receiveSize of 0. Change
clientListener.BeginAccept(new AsyncCallback(AcceptConnection), clientListener);
to
clientListener.BeginAccept(0, new AsyncCallback(AcceptConnection), clientListener);
That speeds it up, and if we remove the Thread.Sleep(1000) from AcceptConnection then all the connections are accepted really fast.
If you leave the Thread.Sleep(1000) in there to simulate workload or just for testing, then you may want to prepare the server to handle such a load by doing:
int minWorkerThreads = 0;
int minCompletionPortThreads = 0;
ThreadPool.GetMinThreads(out minWorkerThreads, out minCompletionPortThreads);
ThreadPool.SetMinThreads(minWorkerThreads, 100);
Here 100 is the number of threads you want readily available to handle socket operations.
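A slightly more defensive variant of the same call is sketched below. EnsureMinIoThreads is an illustrative name. It never lowers the current minimum, clamps the request to the pool's maximum, and returns the result of ThreadPool.SetMinThreads so the caller can tell whether the change was accepted.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSetupSketch
{
    // Raises the minimum number of I/O completion threads to `wanted`
    // (never lowering it, never exceeding the pool maximum).
    public static bool EnsureMinIoThreads(int wanted)
    {
        int minWorker, minIo, maxWorker, maxIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);

        int target = Math.Min(Math.Max(minIo, wanted), maxIo);
        // SetMinThreads returns false if the values were rejected.
        return ThreadPool.SetMinThreads(minWorker, target);
    }
}
```

The worker-thread minimum is passed back unchanged, matching the original snippet, which only raises the completion port threads.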
One other thing, and it's a matter of personal preference: you might like to call BeginAccept from within AcceptConnection, which removes the need for that while loop.
That is, change this
while (true)
{
resetEvent.Reset();
Console.WriteLine("|| Waiting for connection");
clientListener.BeginAccept(new AsyncCallback(AcceptConnection), clientListener);
resetEvent.WaitOne();
}
to this
Console.WriteLine("|| Waiting for connection");
clientListener.BeginAccept(new AsyncCallback(AcceptConnection), clientListener);
and put another BeginAccept in AcceptConnection
public static void AcceptConnection(IAsyncResult ar)
{
// Get the socket that handles the client request.
Socket listener = (Socket)ar.AsyncState;
// start another listening operation
listener.BeginAccept(new AsyncCallback(AcceptConnection), listener);
// ... the rest of the method
}
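Put together, the chained version might look like the sketch below. The assumptions are that it listens on the loopback address with an OS-assigned port, that it closes each accepted socket immediately where real code would hand it to something like Receive(state), and that ChainedAcceptSketch and its members are illustrative names.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public sealed class ChainedAcceptSketch : IDisposable
{
    private readonly Socket _listener;
    private int _accepted;

    public ChainedAcceptSketch()
    {
        _listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        _listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: the OS picks a free port
        _listener.Listen(100);
        _listener.BeginAccept(OnAccept, null);                  // arm the first accept; no loop needed
    }

    public int Port { get { return ((IPEndPoint)_listener.LocalEndPoint).Port; } }

    public int Accepted { get { return Interlocked.CompareExchange(ref _accepted, 0, 0); } }

    private void OnAccept(IAsyncResult ar)
    {
        Socket handler;
        try { handler = _listener.EndAccept(ar); }
        catch (ObjectDisposedException) { return; }  // listener was closed
        catch (SocketException) { return; }          // accept failed or was aborted; the sketch stops here

        // Re-arm before doing any per-connection work, so a slow handler
        // does not delay the next accept.
        try { _listener.BeginAccept(OnAccept, null); }
        catch (ObjectDisposedException) { }

        Interlocked.Increment(ref _accepted);
        handler.Close(); // real code would start receiving on `handler` here
    }

    public void Dispose()
    {
        _listener.Close();
    }
}
```

No ManualResetEvent is needed, because the callback itself starts the next accept instead of signalling a waiting loop.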
I have a .NET Remoting service which works fine most of the time. If an exception or error happens, it logs the error to a file but still continues to run.
However, about once every two weeks the service stops responding to clients, which causes the client application to crash with a SocketException with the following message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
No exception or stack trace is written to our log file, so I can't figure out where the service is crashing, which leads me to believe that the failure is somewhere outside of my code. What additional steps can I take to figure out the root cause of this crash? I would imagine that it writes something to an event log somewhere, but I am not very familiar with the Windows event logging system, so I'm not sure where to look.
Thanks in advance for any assistance with this.
EDIT: Forgot to mention, stopping or restarting the service does nothing; the service never responds. I need to manually kill the process before I can start the service again.
EDIT 2:
public class ClientInfoServerSinkProvider :
IServerChannelSinkProvider
{
private IServerChannelSinkProvider _nextProvider = null;
public ClientInfoServerSinkProvider()
{
}
public ClientInfoServerSinkProvider(
IDictionary properties,
ICollection providerData)
{
}
public IServerChannelSinkProvider Next
{
get { return _nextProvider; }
set { _nextProvider = value; }
}
public IServerChannelSink CreateSink(IChannelReceiver channel)
{
IServerChannelSink nextSink = null;
if (_nextProvider != null)
{
nextSink = _nextProvider.CreateSink(channel);
}
return new ClientIPServerSink(nextSink);
}
public void GetChannelData(IChannelDataStore channelData)
{
}
}
public class ClientIPServerSink :
BaseChannelObjectWithProperties,
IServerChannelSink,
IChannelSinkBase
{
private IServerChannelSink _nextSink;
public ClientIPServerSink(IServerChannelSink next)
{
_nextSink = next;
}
public IServerChannelSink NextChannelSink
{
get { return _nextSink; }
set { _nextSink = value; }
}
public void AsyncProcessResponse(
IServerResponseChannelSinkStack sinkStack,
Object state,
IMessage message,
ITransportHeaders headers,
Stream stream)
{
IPAddress ip = headers[CommonTransportKeys.IPAddress] as IPAddress;
CallContext.SetData("ClientIPAddress", ip);
sinkStack.AsyncProcessResponse(message, headers, stream);
}
public Stream GetResponseStream(
IServerResponseChannelSinkStack sinkStack,
Object state,
IMessage message,
ITransportHeaders headers)
{
return null;
}
public ServerProcessing ProcessMessage(
IServerChannelSinkStack sinkStack,
IMessage requestMsg,
ITransportHeaders requestHeaders,
Stream requestStream,
out IMessage responseMsg,
out ITransportHeaders responseHeaders,
out Stream responseStream)
{
if (_nextSink != null)
{
IPAddress ip =
requestHeaders[CommonTransportKeys.IPAddress] as IPAddress;
CallContext.SetData("ClientIPAddress", ip);
ServerProcessing spres = _nextSink.ProcessMessage(
sinkStack,
requestMsg,
requestHeaders,
requestStream,
out responseMsg,
out responseHeaders,
out responseStream);
return spres;
}
else
{
responseMsg = null;
responseHeaders = null;
responseStream = null;
return new ServerProcessing();
}
}
}
This is like trying to find out why nobody picks up the phone when you call a friend, when the problem is that his house burned down to the ground. An imperfect view of what is going on is the core issue, and it is especially bad with a service because there is so little to look at.
This can't get better until you use that telephone to talk to the service programmer and get him involved with the problem. Somebody is going to have to debug this. And yes, it will be difficult: a failure once every two weeks might not be considered critical enough, or might be too long to sit around waiting for. The only practical thing you can do to help is create a minidump of the process and pass it to the service programmer so he's got something to poke at. If the service runs on another machine, get the LAN admin involved as well.
The issue was due to a deadlock in my code. If memory serves, I had two lock objects and locked one from inside the other, so two threads ended up waiting for each other. I was able to determine this by attaching a debugger to the remote service.
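One way to make this kind of nested-lock hang visible in a log instead of silent is to take the locks with a timeout. This is a sketch of that idea, not the code from the service: TryWithBothLocks is an illustrative name, and it relies only on Monitor.TryEnter and Monitor.Exit.

```csharp
using System;
using System.Threading;

public static class LockTimeoutSketch
{
    // Takes `outer` then `inner`, each with a timeout, and runs `action`.
    // Returns false (without running the action) if either lock could not
    // be taken in time, so the caller can log it instead of hanging.
    public static bool TryWithBothLocks(object outer, object inner, TimeSpan timeout, Action action)
    {
        if (!Monitor.TryEnter(outer, timeout)) return false;
        try
        {
            if (!Monitor.TryEnter(inner, timeout)) return false;
            try
            {
                action();
                return true;
            }
            finally { Monitor.Exit(inner); }
        }
        finally { Monitor.Exit(outer); }
    }
}
```

The more fundamental fix is to have every code path take the two locks in the same order. The timeout only turns a silent hang into a failure that can be logged.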