MSDTC failing on first transaction - c#

I have an application that retrieves data and stores it in a database once per day. Until recently this application resided on the same machine as the SQL Server instance, but due to hardware issues with some of the required peripherals it has been moved to a separate machine running Windows XP.
The problem is that when the first transaction of the morning runs, we receive the following stack trace:
System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException (0x80004005): Error HRESULT E_FAIL has been returned from a call to a COM component.
However, immediately rerunning the transaction succeeds. It seems as though MSDTC takes too long to respond to the first transaction and therefore fails, but is then ready for the second. I have found several references to this on the internet but no real solution. Has anyone encountered this? If so, is there a way to prevent MSDTC from unloading from memory, or is there another solution such as extending timeouts?

Thanks guys,
Just to fill you in: we resolved the issue by changing the DCOM config to use the remote coordinator located on the SQL Server machine. So far we have not experienced any further issues.

I recommend you first look in the event logs of all machines involved and see what else is there. You're making an assumption about what's going on. It could be a good assumption, but I suggest you find out before making changes.
I'm also going to start the process of moving this question over to ServerFault, where you'll probably get a faster answer. If it takes too long (five people have to vote), then you may want to ask the question over there manually. If you do, indicate that the original is probably on its way (and paste the link).

One thing to look at (and it may not be the cause of your problem) is whether the reverse DNS lookup on the client's IP actually resolves to a name that refers to the client machine. We had problems with our DNS/DHCP setup, where an IP was matched to multiple names. When the remote end of MSDTC tried to connect back to the MSDTC on the client, it was attempting to connect to a different machine.
This will manifest as (seemingly random) transaction timeouts.

Oh dear, we have been facing the same problem. We were migrating data from one database to another (with a different structure) and were using SubSonic to speed up the process. We used transactions and the SharedDbConnectionScope object, and it failed similarly on the machine running XP SP3. I think some updates in SP3 break things, as it works fine on Vista and on Server 2003 and 2008.
EDIT: Here is an MSDN KB article that discusses the same problem.

You could probably try running a process which simply initiates and commits a transaction on the DTC every 30 minutes or so?

We had a similar problem in our test environment. The first transaction that occurred after 10 minutes of inactivity failed with error “Communication with the underlying transaction manager has failed”.
After some research we concluded that the MSDTC connection was canceled and could not be re-established in the required amount of time (it seems that the default timeout for this operation is 4 seconds).
To solve this problem we have increased the length of time that the client computer waits for the bind packet response from the server computer. This is done by adding a key in the registry of client computer: http://support2.microsoft.com/?id=922430
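
Since the question notes that an immediate rerun succeeds, a retry wrapper is one possible stopgap while the underlying cause is being fixed. The following is only a minimal sketch under assumptions: the `DtcRetry` class, its `Run` method, and the attempt count and delay are hypothetical names and values, and it is only safe when everything `work` does is part of the transaction (so a failed attempt leaves nothing behind).

```csharp
using System;
using System.Threading;
using System.Transactions;

// Hypothetical helper: not part of System.Transactions.
public static class DtcRetry
{
    // Runs `work` inside a TransactionScope. If communication with the
    // transaction manager fails, waits briefly and tries again.
    // Returns the number of the attempt that succeeded.
    public static int Run(Action work, int maxAttempts = 3, int delayMs = 2000)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var scope = new TransactionScope())
                {
                    work();
                    scope.Complete();
                }
                return attempt;
            }
            catch (TransactionManagerCommunicationException) when (attempt < maxAttempts)
            {
                // Give MSDTC time to come up before the next attempt.
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

A call would look like `DtcRetry.Run(() => SaveDailyData())`, where `SaveDailyData` stands in for the application's own database code. On the last attempt the exception is not caught, so a persistent failure still surfaces to the caller.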

Related

100 % CPU by IIS process by Service Model-4 module after the wcf request has passed

I have been fighting a problem for a week now. I have a WCF service running in IIS 8.5 on Windows Server 2012 R2 and a Windows service client that makes one or two requests every 30 seconds. At some point (usually within two hours of the service starting), one of the requests causes the service's app pool process (the app pool is separate from the other app pools) to start consuming CPU. The IIS worker process section shows that this request never ends and hangs in the ServiceModel-4 module in the AuthenticateRequest state (i.e. most likely it is in an infinite loop somewhere). Later another such request joins the first, until there are four of them, staying forever and causing 100% CPU usage (the machine has 4 logical processors). What I did to investigate and fix this problem:
Used WCF tracing and custom logging to determine where the problem is. WCF tracing actually shows that all requests made to the server completed successfully in milliseconds (!) (while WCF tracing on the client side of course shows timeouts for the same requests). Custom logging also shows that the service code reaches the return of the requested operation. The method returns two simple DTO objects, so no serialization issue should be possible, and there are no endpoint behaviors or any other custom code that executes before the reply is sent from the service (apart from the method code, which, as I mentioned, returns successfully).
Used IIS Failed Request Tracing, which shows the request reaching ServiceModel-4 and not continuing, with the following information:
ModuleName : ServiceModel-4.0
Notification: AUTHENTICATE_REQUEST
HttpStatus: 500
HttpReason: Internal Server Error
HttpSubStatus: 0
ErrorCode: The operation completed successfully (0x0)
Used DebugDiag to trace requests running longer than 10 minutes and saw the threads that had been running for a long time. The stack trace is as follows:
or as follows:
I saw that these are called from the IIS process. Since these are .NET functions, I first suspected a corrupted .NET installation; moreover, both .NET 4.5 and .NET 4 were installed on the server (I don't know exactly how that could happen). So:
I uninstalled .NET 4, turned off the .NET 4.5 features under Windows Features, restarted, then turned them back on and restarted again, without success.
After that I reinstalled IIS the same way (from Windows Features). Again no success.
I don't have any more ideas.
It seems I have found the answer (though I haven't used dotTrace or other tools). A generic Dictionary was being accessed from multiple threads. This seems to be a known problem:
https://blogs.msdn.microsoft.com/tess/2009/12/21/high-cpu-in-net-app-using-a-static-generic-dictionary/
https://blogs.msdn.microsoft.com/asiatech/2009/05/11/asp-net-application-100-cpu-caused-by-system-collections-generic-dictionary/
Actually, I noticed this problem at the beginning of my research but ruled it out, because I couldn't reproduce it (probably because I wasn't testing the dictionary inside an IIS app; I did get various exceptions, but not 100% CPU), and mainly because all the logs showed that the code accessing the dictionary had completed. Also, the stack trace above has nothing to do with the dictionary.
However, I think the problem happened during serialization of this dictionary (which is a data contract), which would explain the logged information.
I still cannot explain exactly how this happens. If anyone can explain it, I think it would be useful knowledge for everyone.
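
For readers hitting the same symptom, one common way to avoid unsynchronized access is to replace the shared `Dictionary` with a `ConcurrentDictionary`, and to hand the serializer a snapshot instead of the live collection. This is a minimal sketch under assumptions, not the code from the service above: the `SharedLookup` class, its key and value types, and the `"value-" + key` factory are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical shared cache; the names and types are illustrative only.
public static class SharedLookup
{
    // ConcurrentDictionary supports concurrent readers and writers.
    private static readonly ConcurrentDictionary<int, string> Cache =
        new ConcurrentDictionary<int, string>();

    // Adds the value if the key is missing, otherwise returns the existing one.
    public static string GetOrAdd(int key)
    {
        return Cache.GetOrAdd(key, k => "value-" + k);
    }

    public static int Count
    {
        get { return Cache.Count; }
    }

    // Copies the current contents into an array, so that a serializer can
    // enumerate the copy while other threads keep writing to the cache.
    public static KeyValuePair<int, string>[] Snapshot()
    {
        return Cache.ToArray();
    }
}
```

If the dictionary has to stay a plain `Dictionary` (for example because it is part of the data contract), the alternative is to guard every read and write with the same `lock`, and to build the reply from a copy taken inside that lock.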

ASP application causing Error 500 and causing server to stop frequently

The ASP application running on the SQL Server machine causes the IIS server to stop very frequently. The cause shown in the error log is:
"A significant part of sql server process memory has been paged out. This may result in a performance degradation."
Is there any tool which can identify the fault in the web application?
No. You might be able to play with some settings to get your apps to not crash but in the end, if you have reached your bandwidth cap, you are stuck.
There might not actually be any fault in the web application. Both IIS and SQL Server eat a lot of memory. Source, SQL Server eats ram for lunch
There might not be anything wrong, you might just be running too much on one machine. You will have to provide an actual error or problem. Because right now, our only answer can be to leverage the admin tools, and get more memory.
I have found the cause of my problem. For each URL redirection I used Response.Redirect("/NewPage.aspx");. The fix was to use Response.Redirect("/NewPage.aspx", false); instead. The single-argument overload ends the response by calling Response.End, which raises a ThreadAbortException on every redirect; passing false for endResponse skips that, so any code after the redirect keeps running unless you return explicitly. That saved a lot of memory used by each process!

slow performance- IIS or application?

Our team has an application in Android, with a .NET c# backend, hosted in IIS.
Recently, our customers have experienced sudden and unexplained latency, in the following scenario:
Without any warning, users are unable to change the channel (zapping), since the product deals with live media streaming, and they cannot even log out of the application.
The mobile application, connected to another backend (still a C# backend), works properly without any problem.
After some time (which varies from 6 hours for the first incident to 5 minutes for the last one), everything returns to normal.
I have enabled Failed Request Tracing logs, to see if I can get anything from there, and I have results as follows:
<failedRequest url="https://ourDNS.com:443/servertime.aspx"
siteId="1"
appPoolId="DefaultAppPool"
processId="22232"
verb="POST"
remoteUserName=""
userName=""
tokenUserName="NT AUTHORITY\IUSR"
authenticationType="anonymous"
activityId="{80013C53-0802-B500-B63F-84710C7967BB}"
failureReason="TIME_TAKEN"
statusCode="200"
triggerStatusCode="0"
timeTaken="45141"
xmlns:freb="http://schemas.microsoft.com/win/2006/06/iis/freb"
>
The page described above is a simple page that first gets the server's timezone and then, after getting the customer's timezone (which can be set manually from the client), returns the exact date and time of the device where the application is hosted, for further calculations such as the stream program and what is playing now. However, this page, which returns a simple JSON with a string in it, sometimes takes more than 45 seconds (to me this is insane).
Another log, from the client side at that moment, is the exception below:
java.net.SocketTimeoutException
at java.net.PlainSocketImpl.read(PlainSocketImpl.java:491)
at java.net.PlainSocketImpl.access$000(PlainSocketImpl.java:46)
at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:240)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)
at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:82)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:174)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:180)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:235)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:259)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:279)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:428)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:487)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:465)
at com.framework.utilityframe.webhelper.HttpRequest.getHttpResponse(HttpRequest.java:316)
at com.framework.utilityframe.webhelper.HttpRequest.httpRequest(HttpRequest.java:393)
at com.tibo.webtv.web.TiboLog.logBufferingError(TiboLog.java:319)
at com.tibo.webtv.CustomVideoView$Buffering_Problem.doInBackground(CustomVideoView.java:324)
at com.tibo.webtv.CustomVideoView$Buffering_Problem.doInBackground(CustomVideoView.java:307)
at android.os.AsyncTask$2.call(AsyncTask.java:287)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
at java.util.concurrent.FutureTask.run(FutureTask.java:137)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
at java.lang.Thread.run(Thread.java:856)
Reading through different forums, I have seen different causes of performance problems, ranging from the database to IIS and even a misconfiguration of the application. I have ruled out the database as a cause because:
At the moment of the problem, the database metrics were absolutely fine: no change in query execution times, no waiting tasks, no locking.
Secondly, the mobile and Decoder applications connect to the same database, and the mobile application runs just fine with the same queries.
As for IIS, every application hosted in that AppPool was running fine and without delays, but there may still be something I am missing there.
Finally, something that makes me suspicious is that the mobile application differs from the Decoder application in two ways:
First, the mobile application receives responses from the backend in XML format, while the Decoder uses JSON.
Second, the mobile application uses HTTP requests, and the Decoder uses HTTPS (SSL).
If anyone has experienced similar issues, their help would be greatly appreciated. If you need any other detail, just ask and I will provide it.
So,
Today, our team ran another test, which included:
Application hosted on one server and database on another
Application and database hosted on a completely different server (Azure environment)
In both cases the result was the same: latency and problems at the service.
The problem was neither in the backend nor in the server. First, the Java application mistakenly executed synchronous tasks when saving logs to another server (a dedicated one, with plenty of capacity for as much data as you can give it). Second, the log server's HDD was full, with more than 1 TB of DB logs alone, so when the application executed those synchronous tasks (which came as the first call, before any interaction with the channels), they received the socket exceptions. So, for anyone else who may see this post: PLEASE ALWAYS CHECK THE TASKS IN YOUR APPLICATION, AND ALWAYS CHECK EVERY SERVER RELATED TO YOUR APPLICATION!!! Thank you very much :D

How to debug a C# service that stops for no reason on production server

We have implemented a pair of services in C# that send and receive faxes. These services have been running flawlessly for several years on several servers - until last week.
One of our clients upgraded to Windows Server 2012. We installed the services and all hell broke loose.
Basically, one of the services appears to work for several minutes, and then, for some unknown reason - goes to the OnStop method. So someone, or something - is stopping it, but I don't know what it is.
How could I go about debugging this? I am new to C# and this is not my code.
Any help would be appreciated.
The fact that you are sending and receiving faxes is interesting: it could be related to the Session 0 Isolation introduced with Windows Server 2008/2012, which can cause problems in graphics-related services.
If you have a chance to run the service on a development machine, using a Windows 7/8 box and the SYSTEM user, you can probably reproduce the problem.
If it only stops on the production server, it is reasonable that there is something different about the production server than your development server/workstation.
It is probably unlikely that you're allowed to hook a debugger into something on the production server, but the best way to handle this is just to log the he** out of the code.
You should introduce enough logging to figure out:
Where it stops
Why it stops (my money is on an exception)
The state of the application at that time (related to the crash)
This will probably have to be done in iterations, unless you go all out to begin with.
Services and logging go hand in hand, so just implement it.
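
As a starting point for that logging, here is a minimal sketch, assuming a plain text file is acceptable as a log target. The `ServiceLog` class, its members, and the default file name are hypothetical; an existing logging library would serve equally well.

```csharp
using System;
using System.IO;

// Hypothetical minimal file logger for diagnosing an unexpected stop.
public static class ServiceLog
{
    private static readonly object Gate = new object();

    // Assumed default location; point this at a folder the service account can write to.
    public static string LogFile = Path.Combine(Path.GetTempPath(), "service-debug.log");

    // Appends one timestamped line; the lock keeps concurrent writers from interleaving.
    public static void Write(string message)
    {
        string line = DateTime.UtcNow.ToString("o") + " " + message + Environment.NewLine;
        lock (Gate)
        {
            File.AppendAllText(LogFile, line);
        }
    }

    // Records exceptions that nothing else caught, which would otherwise
    // take the process down without leaving a trace in the service's own log.
    public static void HookUnhandledExceptions()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Write("UNHANDLED: " + e.ExceptionObject);
    }
}
```

One way to use it: call `ServiceLog.HookUnhandledExceptions()` at the top of `OnStart`, and make the first line of `OnStop` something like `ServiceLog.Write("OnStop called" + Environment.NewLine + Environment.StackTrace)`. The recorded stack trace may help tell a stop requested by the service's own code apart from one that arrives from outside.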

How to debug the application after deployed in IIs?

Hi all,
I'm developing an application using .NET 2008 with Oracle 10g as the database. I have deployed the application in IIS. Now, when two users are logged into the same application and open the same page at the same time, they get this error:
"Connection must be open for this operation. Cannot access a disposed object. Object name: 'Oracle.DataAccess.Client.OracleConnection'. Connection must be open for this operation"
Please suggest a solution to this multi-user issue.
Thanks in advance!
The easiest way to look into what's happening on IIS is to deploy a debug build, connect to the machine the server is on, and run the CLR debugger. Of course, this is only really practical in a staging rather than live scenario (or you have dozens or even thousands of people hitting the breakpoint, and of course the whole thing freezes up while you are stepping through).
This case sounds a bit like you might have a connection object statically scoped, or otherwise shared between threads, rather than created as needed on each thread of execution. It's the sort of thing sometimes seen if someone tries to manually pool connection objects (which is pointless, indeed counter-productive, as the underlying connector objects are pooled for you).
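
The alternative to a shared connection can be sketched as follows: each operation opens its own connection and disposes it when done, leaving pooling to the provider. This is only an illustration under assumptions; the `UserRepository` class, the factory delegate, and the `users` table are hypothetical and not taken from the application in the question.

```csharp
using System;
using System.Data;

// Hypothetical data-access class; no connection object is stored or shared.
public sealed class UserRepository
{
    private readonly Func<IDbConnection> _connectionFactory;

    public UserRepository(Func<IDbConnection> connectionFactory)
    {
        if (connectionFactory == null) throw new ArgumentNullException("connectionFactory");
        _connectionFactory = connectionFactory;
    }

    // Every call gets a fresh connection, so two simultaneous requests
    // can never close or dispose each other's connection.
    public object CountUsers()
    {
        using (IDbConnection connection = _connectionFactory())
        {
            connection.Open();
            using (IDbCommand command = connection.CreateCommand())
            {
                command.CommandText = "SELECT COUNT(*) FROM users"; // hypothetical table
                return command.ExecuteScalar();
            }
        }
    }
}
```

With ODP.NET the factory could be as simple as `() => new OracleConnection(connectionString)`. Creating the object per call is cheap because the provider returns a pooled physical connection when `Open` is called.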
