How to track "freezing" app, finding its source? - c#

I have a server with 3 threads and a thread pool for processing received data. The only locks (reader and writer) I'm using are for the client connection lists.
Sometimes the main form freezes for a second and I cannot find the cause. The form doesn't do any heavy work; that is left to the other threads.
Is there any way to track this "freezing" down? Any help is much appreciated, thanks!

You could run a profiler on your app to try and help isolate the problem.
I've been playing around with EQATEC Profiler, it looks like a really good utility and is completely free. It shows you some really useful stats like the time spent in each method. If you are armed with this information it should go a long way to tracking down your problem.
I haven't tried it on a multi-threaded app yet, so I'm not sure how it handles different threads. But it's worth a shot as (like I said) it's completely free (BSD license) and easy to use.

Is the application freezing when you are running it in Debug mode?
I have experienced similar behavior myself, and when I tested outside the VS environment (both Debug and Release builds) the "hiccups" were gone.
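If a profiler doesn't catch the stall, one cheap way to at least timestamp it is a heartbeat watchdog. The following is only a sketch under assumptions: `UiWatchdog` is a hypothetical helper (not part of any framework), and the caller supplies the delegate that posts work to the UI thread, e.g. `a => form.BeginInvoke(a)` in WinForms. A background thread posts a tiny callback and measures how long the UI thread takes to run it.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: reports how long the UI thread took to answer a heartbeat.
public sealed class UiWatchdog : IDisposable
{
    private readonly Action<Action> _postToUi;   // assumption: posts asynchronously to the UI thread
    private readonly TimeSpan _threshold;
    private readonly Action<TimeSpan> _onStall;
    private readonly ManualResetEventSlim _answered = new ManualResetEventSlim(false);
    private volatile bool _stop;

    public UiWatchdog(Action<Action> postToUi, TimeSpan threshold, Action<TimeSpan> onStall)
    {
        _postToUi = postToUi;
        _threshold = threshold;
        _onStall = onStall;
        var thread = new Thread(Run) { IsBackground = true, Name = "UiWatchdog" };
        thread.Start();
    }

    private void Run()
    {
        while (!_stop)
        {
            _answered.Reset();
            var stopwatch = Stopwatch.StartNew();
            _postToUi(() => _answered.Set());

            bool stalled = false;
            while (!_answered.Wait(_threshold))
            {
                stalled = true;          // the UI thread missed the deadline
                if (_stop) return;
            }
            // Reported only after the UI thread recovered, so the duration is complete.
            if (stalled) _onStall(stopwatch.Elapsed);

            Thread.Sleep(_threshold);
        }
    }

    public void Dispose() { _stop = true; }
}
```

Logging the reported durations with a timestamp next to the server's own log lines may show which background activity coincides with the freezes. The watchdog cannot tell you what the UI thread was doing, only when and for how long it was unresponsive.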

Related

Origin of short-lived threads in an application

I am currently health checking an application which experiences UI stuttering during heavy usage.
The Microsoft Concurrency Visualizer extension for Visual Studio 2015 showed that quite a lot of short-lived threads are created and stopped after ~100ms of execution.
Unfortunately, their displayed callstack looks like clr.dll!0x98071 ntdll.dll!0x634fb and I am not quite sure how to extract useful information out of it.
I have no clue what the purpose of those threads is or which part of the application's code is creating them.
How can I better identify where each one of them gets started?
In the code, I was able to grep a handful of Tasks, another of QueueUserWorkItems, several dozens of plain Thread instantiations, some System.Threading.Timer & System.Timers.Timer, no Reactive Extensions. I put breakpoints for all of them but it seems I am missing some...
I don't think those are from the thread pool, because then they would be displayed in a synchronisation state in Concurrency Visualizer; instead they just end, and another one with a different Id gets created later. But maybe I am mistaken.
We also use a few third-party libs and a bunch of JuggerNET generated code, so maybe the origin is not even in the application itself.
I was finally able to find the culprit of those short-lived threads by looking closely at some of the cryptic callstacks, which included for instance:
mmdevapi.dll
wdmaud.drv
avrt.dll
audioses.dll
It led me to thinking that I should double check the sound alert system. It was indeed this one which spawned those threads.
Note:
I will not accept my own answer, however, because I would like someone to share a better process or any tips and tricks for diagnosing the origin of unwanted threads.
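One low-tech way to narrow down which operation spawns threads is to snapshot the process's OS thread IDs before and after a suspect call (playing a sound alert, for instance). This is a sketch only; `ThreadWatcher` is a hypothetical helper built on `System.Diagnostics.Process.Threads`, and it also sees native threads that never run managed code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: compares OS thread IDs of the current process over time.
public static class ThreadWatcher
{
    // Snapshot of the OS thread IDs that exist right now.
    public static HashSet<int> CurrentIds()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return new HashSet<int>(process.Threads.Cast<ProcessThread>().Select(t => t.Id));
        }
    }

    // Thread IDs that exist now but were not in the earlier snapshot.
    public static List<int> NewSince(HashSet<int> before)
    {
        return CurrentIds().Where(id => !before.Contains(id)).ToList();
    }
}
```

Calling `CurrentIds()` before the suspect operation and `NewSince(...)` shortly after it shows whether that operation created threads. Threads that start and finish between the two snapshots are missed, so for ~100ms threads the second call has to follow quickly.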

How to unload unused COM objects/libraries after a complete restart?

Here is the thing. I'm connecting via COM to some devices on KNX/EIB. Sometimes my application crashes (and I want to be ready for the worst case anyway), leaving all objects and libraries exposed somewhere, somehow. I noticed that when I restart the app I have trouble getting a connection again: I get an error from a connection procedure that normally works well. Sometimes the connect procedure works and sometimes it does not, seemingly at random. That is bad! After a series of complete failures, it seems to work again after some time (several minutes). But I think I see a pattern now: it doesn't work after a crash with no clean disconnect. My guess is that there are objects still holding a connection to the device, and that is why I can't get a new connection. This is why I ask this question.
Question:
How do I unload those unused objects to kill undead connections?
How do I make Windows check for unused libraries so they can be unloaded?
I just want to tell Windows, "I messed up badly and I need to continue my work. Please clean up my mess for me, so I can start fresh! Do I deserve a 2nd chance?"
Edit:
The scenario is the app has crashed and closed. I have no references to anything anymore. No finally clause or anything. The app can only be started again. What can I do to clean up the mess that has been made before, programmatically?
Edit 2:
Hans gave me the hint of killing the responsible server. So for now I solve it by calling taskkill on startup (at least as long as I'm in dev). And it works!
C:\Windows\System32\taskkill.exe /F /IM Falcon.exe
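The same cleanup can be done from inside the app at startup instead of shelling out to taskkill. A minimal sketch, assuming the server process is named `Falcon` as in the command above; `StaleServerCleanup` is a hypothetical helper name, and force-killing a server is only reasonable when no other client is using it.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper: force-kills leftover server processes by name.
public static class StaleServerCleanup
{
    // processName is given without the ".exe" extension. Returns how many processes were killed.
    public static int KillByName(string processName, int waitMilliseconds = 5000)
    {
        int killed = 0;
        foreach (Process process in Process.GetProcessesByName(processName))
        {
            using (process)
            {
                try
                {
                    process.Kill();
                    process.WaitForExit(waitMilliseconds);
                    killed++;
                }
                catch (InvalidOperationException)
                {
                    // The process had already exited.
                }
                catch (Win32Exception)
                {
                    // Access denied, or the process is already terminating.
                }
            }
        }
        return killed;
    }
}
```

A call such as `StaleServerCleanup.KillByName("Falcon");` before the first connection attempt would mirror the taskkill line.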
This is the failure mode of an out-of-process COM server. If the client program crashes to the desktop without releasing the interface pointers, then the server is completely unaware that the client isn't around anymore. It then tends to get balky when you try to reconnect; many servers permit just one client.
By far the most common way that programmers induce this failure mode is by using a debugger. They'll click the Red Button or use the Stop Debugging command. Bam, no cleanup of course.
COM garbage-collects unused servers automatically. But that isn't particularly fast; it easily takes 10 minutes before it decides it needs to step in. And it doesn't work for every server: Office programs notoriously don't get cleaned up, for example.
Not much you can do about this when your app keels over in regular usage. Otherwise the kind of problem that killed middle-ware. Still, having such a mishap in a C# program is pretty unusual, the CLR releases interface pointers at program termination even when the app crashed with an exception. You'd have to have the very nasty kind of mishaps to bypass this, critical exceptions like ExecutionEngineException or the one this site is named after.
Don't focus too much on the Stop Debugging induced failures, it is normal and using Task Manager to kill the server is expected and required. Otherwise just be sure to get the nasty bugs out of your code and you won't have a problem. If you need more help then be sure to contact the owner of the server, be sure to have a small repro project available that demonstrates the issue.

WeakReferences are not freed in embedded OS

I've got a strange behavior here:
I get a massive memory leak in production when running a WPF application on a DLOG terminal (Windows Embedded Standard SP1), yet the same application behaves perfectly fine if I run it locally on a normal desktop (Win7 Prof.).
After many unsuccessful attempts to find the problem, I put one of those terminals right beside my monitor, installed the ANTS Memory Profiler and did a one-hour test run simulating user operations on both the terminal and my development PC.
The result is that, for some strange reason, the embedded system piles up a huge number of WeakReference and EffectiveValueEntry[] objects.
Here are some pictures:
Development (PC):
And the terminal:
Just look at the class list...
Has anyone seen something like this before and are there known solutions to this?
Where can I get help?
(PS: the terminals were installed with images prepared for .NET 4)
PPS: for the close-voter: I think the question is clear: how can I fix this.
You could argue whether this is an IT/OS problem or a programming problem, but I think if I post this on Server Fault it will get an off-topic close in no time...
UPDATE:
I was able to find a big portion of the problem - but it feels a bit like C++:
I use a ViewModel-like Items class for a WPF list that provides (among other things) an ICommand (RelayCommand pattern). The items were created on the fly in the getter of a ViewModel property for the view, and it seems that the application/GC never freed those unused commands, or the subscriptions to their CanExecuteChanged; the memory profiler shows those as "held by a weak reference". I changed my code to reuse those item view models, and in their Dispose I now dispose or set to null every property they used, and I use this for cleanup as well. As I said, it feels like "delete" in the old C++ days.
On top of this I force a GC.Collect every 30 minutes (yeah I know, you never should, but I have no other solution so far).
With this setup the application has run for 6+ hours without problems so far, but it doesn't feel right.
I cannot understand why those WeakReferences are not reclaimed as they are on my desktop machine...
Any thoughts on this? Please!
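For readers trying to picture the change described in the first update, here is a rough sketch of the two getter styles. All class and property names are made up for illustration and this is not the original code; the `RelayCommand` shown is a bare stand-in that ignores `CanExecuteChanged` entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Input;

// Bare stand-in for the RelayCommand pattern (hypothetical, simplified).
public sealed class RelayCommand : ICommand
{
    private readonly Action _execute;
    public RelayCommand(Action execute) { _execute = execute; }
    public event EventHandler CanExecuteChanged { add { } remove { } }
    public bool CanExecute(object parameter) { return true; }
    public void Execute(object parameter) { _execute(); }
}

public sealed class ItemViewModel
{
    public ItemViewModel(string name)
    {
        Name = name;
        Select = new RelayCommand(() => Selected = true);
    }

    public string Name { get; }
    public bool Selected { get; private set; }
    public ICommand Select { get; }
}

public sealed class ListViewModel
{
    private readonly string[] _names = { "A", "B", "C" };   // hypothetical data
    private List<ItemViewModel> _items;

    // Style described as problematic: every read builds new items and new commands.
    public List<ItemViewModel> ItemsRebuiltEachTime
    {
        get { return BuildItems(); }
    }

    // Reusing style: items are built once and handed out again on later reads.
    public List<ItemViewModel> Items
    {
        get { return _items ?? (_items = BuildItems()); }
    }

    private List<ItemViewModel> BuildItems()
    {
        var result = new List<ItemViewModel>();
        foreach (string name in _names) result.Add(new ItemViewModel(name));
        return result;
    }
}
```

With the reusing getter, a binding that re-reads the property no longer produces a fresh set of commands each time, which is the effect the update attributes the improvement to.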
UPDATE:
I am still not able to pin down this problem but I see a strange behavior:
If I use PC-Anywhere to observe the operation of my software on one of the terminals the problem goes away!
Even after running 8hr. straight the software runs as it should - it will even free memory (I put a little memorycounter-display in the main-screen - let's say I connect to the terminal and see that memory is low - after waiting a few minutes the memory is reclaimed)
So I think Devin (one Answer below) has a lead in the right direction - something in the Remote-Control software unblocks the finalizer-thread or whatever is blocking the GC - be it the simulated keyboard/mouse or whatever.
Any thoughts on this?
We had a (somewhat) similar issue running my app on a tablet. The memory would be reclaimed when run on a desktop, but not when run on a tablet or some other device that used a PC Input panel. The problem is that the finalization queue was getting stuck. The COM object finalizer was waiting to run something on the main thread, which didn't have a message loop.
The solution was to find an adequate time to invoke Application.DoEvents(). We had a method that would be called intermittently, and we invoked DoEvents() on every 10th call. I don't know if this is the same issue you are having, but maybe it can shed some light.
EDIT: I do need to make it clear, in general, calling DoEvents() is a bad idea. It works in that case because there isn't any UI on that thread or anything else happening that those events can interfere with.
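The "every 10th call" idea could look roughly like this. It is a sketch with a hypothetical `PeriodicPump` class; the pump action is injected so the snippet compiles without a UI framework, and in the scenario above it would be `Application.DoEvents` from System.Windows.Forms. The counter is not thread-safe, on the assumption that `Tick` is always called from the one thread that needs pumping.

```csharp
using System;

// Hypothetical helper: runs a pump action on every Nth call to Tick().
public sealed class PeriodicPump
{
    private readonly int _every;
    private readonly Action _pump;
    private int _calls;

    public PeriodicPump(int every, Action pump)
    {
        if (every <= 0) throw new ArgumentOutOfRangeException(nameof(every));
        if (pump == null) throw new ArgumentNullException(nameof(pump));
        _every = every;
        _pump = pump;
    }

    // Call this from the method that already runs intermittently on the thread to be pumped.
    public void Tick()
    {
        _calls++;
        if (_calls % _every == 0)
        {
            _pump();   // e.g. Application.DoEvents in a WinForms-hosted scenario
        }
    }
}
```

Usage would be something like `var pump = new PeriodicPump(10, Application.DoEvents);` once, then `pump.Tick();` inside the intermittently called method.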
From the screenshots it is interesting to see that the LOH grows while the used space does not grow much. The free space in the LOH is growing a lot, which indicates memory fragmentation due to pinned objects. This looks like a stuck finalizer thread, which prevents the cleanup of managed objects. You should get a memory dump and check in which method the finalizer thread is stuck. You can do this quite easily with WinDbg.

What can make a .NET app freeze the computer?

I know this is probably the canonical "It depends..." question but I'd appreciate any pointers as to where to start looking.
I have a client/server app talking over Ethernet. On one computer I run the server and a client, and on another just the client. One runs Vista and one runs XP. After an uptime of about 3 weeks the entire computer freezes and nothing works, not the mouse, not the keyboard, nothing; the only way out is to power off. Every ten seconds the server sends a ping message to see if the clients are alive; other than that, just a few small messages go back and forth every day.
I'm trying to find out if it's me causing it or something else. I've started a session and after a few days I thought I'd check for strange increases in memory use but beyond that I have very few ideas.
Some thoughts to consider:
You know the computer doesn't respond, but that doesn't mean it's hung. Does it respond to a ping?
Maybe the disk activity light is on all the time?
You say "no keyboard" - do you mean no caps lock or num lock lights?
Although the .NET application may be the only one you're running at the time, that does not imply it is the cause of the problem. Some background job could be doing it.
For example, I notice that Retrospect backup, when it is creating a snapshot, freezes the entire system for 10-15 minutes. I mean, no caps lock, the clock in the task bar doesn't update, no CTRL-ALT-DEL, can't type into an "Answer" text box in SO, nothing. It had nothing to do with what I was doing at the time, which was answering a question on SO.
After it came back, SO asked if I was a human. My feelings were hurt. ;-)
You could attach a kernel debugger to the OS. That way you should be able to inspect the state of the OS and your process even if the OS is completely unresponsive. (Unfortunately, it's a lot harder than just hitting "break" in VS. I suggest reading John Robbins' "Debugging Applications for Microsoft .NET and Microsoft Windows" before trying that.)
You could also try to create memory dumps of your application at regular intervals. You might have to do a little scripting for that, though. (Usually you'd create a dump with a keystroke, using a tool like userdump or adplus, but if the OS is not responding to keystrokes, that won't work.) That way, you know what state your process is in during or shortly before a hang.
This page: http://blogs.msdn.com/debuggingtoolbox/default.aspx is a good starting point for scripting WinDbg. (If you don't know what to do with a memory dump, I'd again suggest John Robbins' excellent book on debugging!)
Other than that, I can only think of standard debugging tricks: does the problem occur on every PC? Does it happen if there are no client requests? Does it happen sooner if there are more client requests? Does it happen sooner if there is less available physical memory? Try removing parts of your application (maybe on a separate server for testing) and see if the problem still occurs, and so on. Try running it in a VM so you can see if it uses the CPU, harddisk, or network during those "hangs".
This isn't going to be the answer, but I'd advise starting by checking your OS event logs and running a perfmon to keep track of memory, cpu usage etc.
Which computer freezes, the server or client? And what OSes are they running respectively?
As Daniel L noted, tight polling loops can really kill the CPU. If you can, change your code to use event handlers, it's a much more robust solution.
Finally, are you certain there's not a hardware problem on the freezing computer?
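Since the freeze only shows up after weeks, a cheap complement to perfmon is to have the app append its own counters to a file, so the last lines before the freeze survive the power-off. A minimal sketch with a hypothetical `ProcessStats` class; the log path and interval are placeholders chosen by the caller.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Threading;

// Hypothetical helper: periodically logs memory, thread and handle counts of this process.
public static class ProcessStats
{
    // One log line describing the current process.
    public static string Sample()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return string.Format(
                CultureInfo.InvariantCulture,
                "{0:o} workingSet={1} privateBytes={2} threads={3} handles={4}",
                DateTime.UtcNow,
                process.WorkingSet64,
                process.PrivateMemorySize64,
                process.Threads.Count,
                process.HandleCount);
        }
    }

    // Appends a sample to the file at each interval. Keep the returned Timer referenced
    // for as long as logging should continue, and dispose it to stop.
    public static Timer StartLogging(string path, TimeSpan interval)
    {
        return new Timer(
            _ => File.AppendAllText(path, Sample() + Environment.NewLine),
            null,
            TimeSpan.Zero,
            interval);
    }
}
```

A steadily climbing handle or thread count over days would point at the app; flat numbers right up to the freeze would make a driver, a background job or hardware more likely.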

How to debug a deadlock?

Other than that I don't know if I can reproduce it now that it's happened (I've been using this particular application for a week or two now without issue), assuming that I'm running my application in the VS debugger, how should I go about debugging a deadlock after it's happened? I thought I might be able to get at call stacks if I paused the program and hence see where the different threads were when it happened, but clicking pause just threw Visual Studio into a deadlock too till I killed my application.
Is there some way other than browsing through my source tree to find potential problems? Is there a way to get at the call stacks once the problem has occurred to see where the problem is? Any other tools/tips/tricks that might help?
What you did was the correct way. If Visual Studio also deadlocks, that happens now and then. It's just bad luck, unless there's some other issue.
You don't have to run the application in the debugger in order to debug it. Run the application normally, and if the deadlock happens, you can attach VS later: press Ctrl+Alt+P, select the process, choose the debugger type and click Attach. Using a different set of debugger types might reduce the risk of VS hanging (especially if you don't debug native code).
A deadlock involves 2 or more threads. You probably know the first one (probably your UI thread) since you noticed the deadlock in your application. Now you only need to find the other one. With knowledge of the architecture, it should be easy to find (e.g. what other threads use the same locks, interact with the UI, etc.).
If VS doesn't work at all, you can always use WinDbg. Download here: http://www.microsoft.com/whdc/devtools/debugging/default.mspx
I'd try different approaches in the following order:
First, inspect the code to look for thread-safety violations, making sure that your critical regions don't call other functions that will in turn try to lock a critical region.
Use whatever tool you can get your hands on to visualize thread activity. I use an in-house Perl script that parses an OS log we made, graphs all the context switches and shows when a thread gets pre-empted.
If you can't find a good tool, do some logging to see the last threads that were running before the deadlock occurred. This will give you a clue as to where the issue might be caused. It helps if the locking mechanisms have unique names; for example, if an object has its own thread, create a dedicated semaphore or mutex just to manage that thread.
I hope this helps. Good luck!
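The "uniquely named locks plus logging" suggestion can be sketched as a small wrapper that gives up after a timeout and reports which lock it was waiting for. `NamedLock` is a hypothetical class built on `Monitor.TryEnter`; the timeout value is an assumption you would tune so that normal contention does not trip it.

```csharp
using System;
using System.Threading;

// Hypothetical helper: a named lock that fails loudly instead of blocking forever.
public sealed class NamedLock
{
    private readonly object _gate = new object();

    public NamedLock(string name) { Name = name; }

    public string Name { get; }

    // Usage: using (myLock.Acquire(TimeSpan.FromSeconds(30))) { ... }
    public IDisposable Acquire(TimeSpan timeout)
    {
        if (!Monitor.TryEnter(_gate, timeout))
        {
            throw new TimeoutException(
                "Thread " + Environment.CurrentManagedThreadId +
                " could not acquire lock '" + Name + "' within " + timeout + ".");
        }
        return new Releaser(_gate);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly object _gate;
        public Releaser(object gate) { _gate = gate; }
        public void Dispose() { Monitor.Exit(_gate); }
    }
}
```

When a deadlock does occur, each participating thread eventually throws with its own thread ID and the name of the lock it wanted, and the exception's stack trace shows where it was waiting.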
You can use different programs like Intel(R) Parallel Inspector:
http://software.intel.com/en-us/intel-parallel-inspector/
Such programs can show you places in your code with potential deadlocks. However, you have to pay for it or use it only during the evaluation period. I don't know if there are any free tools like this.
Just like anywhere else, there are no "silver bullet" tools that catch all deadlocks. It is all about the sequence in which different threads acquire resources, so your job is to find out where the order was violated. Usually Visual Studio or another debugger will provide stack traces, and you will be able to find out where the discrepancy is. DevPartner Studio does provide deadlock analysis, but the last time I checked there were too many false positives. Some static analysis tools will find some potential deadlocks too.
Other than that, it helps to get the architecture straight so that it enforces a resource acquisition order. For example, layering helps to make sure upper-level locks are taken before lower-level ones, but beware of callbacks.
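A fixed acquisition order is easiest to see with two locks. In this sketch (the `Account` class is a made-up example, and IDs are assumed to be unique), both locks are always taken in ascending ID order, no matter which direction the transfer goes, so two opposite transfers cannot each hold one lock while waiting for the other.

```csharp
using System;

// Made-up example type: every thread takes the two locks in the same global order.
public sealed class Account
{
    private readonly object _gate = new object();

    public Account(int id, decimal balance)
    {
        Id = id;            // assumption: IDs are unique per account
        Balance = balance;
    }

    public int Id { get; }
    public decimal Balance { get; private set; }

    public static void Transfer(Account from, Account to, decimal amount)
    {
        if (ReferenceEquals(from, to)) return;

        // Order the locks by ID rather than by role (from/to).
        Account first = from.Id < to.Id ? from : to;
        Account second = ReferenceEquals(first, from) ? to : from;

        lock (first._gate)
        lock (second._gate)
        {
            from.Balance -= amount;
            to.Balance += amount;
        }
    }
}
```

Locking `from` first and `to` second instead would be the classic violation: thread 1 transferring A to B and thread 2 transferring B to A could each grab their first lock and then wait forever for the other.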
