Intermittent application hang on startup in Windows Store App - c#

I am developing a Windows Store application. Currently, I am getting intermittent hangs as described in this blog post. The issue appears to be that not enough space is given to remainder-defined column widths and TextBlocks attempting to format themselves (possibly due to the ellipsis processing). My app tends to hang indefinitely when this happens.
My question is less about how to solve the issue (it seems to be described fairly well in the blog post) and more about how to find it. I hit one fairly regularly (approximately one in five or ten start-ups) on a Hub Page, so I've been looking through there (as it's the most notable instance of the issue), but it's a true Heisenbug in that it never seems to happen when debugging (or when you look for it).
So, how do I find the offending code? Is there just a pattern I need to look for (ColumnWidth="*"?). Is there a simpler way to solve this, such as changing the base style to remove one of the possibly offending properties listed in the blog post?
It seems possible that this is being caused by another issue, but this seems to be the most likely/plausible as of right now (as with the hubs I have a similar situation to what is being described there).
Also, is there a way to track when this happens in the wild? MSFT provides crash dumps on hangs, but they seem to give little to no information in them at all (and on top of that they only appear 5 days after they happen, which is less than ideal).
Thanks!

This is a complicated question to answer.
First, I think you have identified a real problem with WinRT. You theorize that the layout subsystem seems busy calculating your layout, and based on some condition that occurs around 20% of the time it does not finish in any reasonable time. Reasonable guess.
The problem, then, is that such an event does not occur during debug. In my personal development experience, errors that do not occur in debug are 99.99% timing related. Something is not finishing before a second process begins. Debugging lets that first, long process finish.
This is a real computer science question, and not so much a WinRT or Windows 8 question. To that end, the best answer I can give you without any code samples (why no code samples?) is the typical approach I employ when I reach the same dilemma. I hope it helps, at least a little.
Start with your brain.
I have always joked with developers just how much debugging can be done outside the debugger - and in your mind. Mentally walking the pipeline of your app and looking for race-condition dependencies that might cause deadlocks. Believe it or not, this solves a lot of problems a debugger could never catch - because debuggers unwind timing dependencies.
Next is simplicity.
The more complex the problem the less likely you will find the culprit. In the case of a XAML application, I tend to remove or disable value converters first. Then, I look to remove data templates. If you have element bindings, those go next. If simplifying the XAML does help - that's just the beginning to figuring it out. If it doesn't, things just got easier.
Your code behind can be disabled with just a few keystrokes and found guilty or innocent. It's the most likely place for your problem, I find, and the reason we work so hard to keep it simple, clean, and minimal. After that, there's the view model. Though it's not impossible for your view model to be the one, and indeed you still have to check, it's probably not the root of your evil.
Lastly, there's the app pipeline that loads your page, loads your data, or does anything else. Step by step your only real option is to slowly remove things from your app until you don't see the problem. Removing the problem, though is not solving it. That's a case by case thing based on your app and the logic in it. Reality is, you might see the problem leave when removing XAML, while the real problem is in the view model or elsewhere.
What am I really saying? The silver bullet you are asking for really isn't there. There are several Microsoft tools and even more third party tools to look for bottlenecks, latency problems, slow code, and stuff - but in all reality, the scenario you describe is plain ole programming. I am not saying you aren't the victim of a bug. I'm saying, with the information we have, this is all I can do for you.
You'll get it.
The third thing to do is to add logging and instrumentation to your app.
Best of luck.
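To make the logging suggestion a bit more concrete, here is a minimal sketch of an in-memory "breadcrumb" log. It is not from the answer above: the class name Breadcrumbs, the Mark and Snapshot methods, and the 256-entry capacity are all made up for illustration. The idea is only that keeping entries in memory (instead of writing to a file on every call) should disturb timing less, which matters for a hang that disappears under the debugger.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical helper (an assumption, not part of the original answer):
// keeps the most recent timestamped breadcrumbs in memory.
public static class Breadcrumbs
{
    private const int Capacity = 256; // arbitrary size chosen for the sketch
    private static readonly ConcurrentQueue<string> Entries = new ConcurrentQueue<string>();
    private static readonly Stopwatch Clock = Stopwatch.StartNew();

    // Records a step name together with the elapsed time and the calling thread id.
    public static void Mark(string step)
    {
        string line = string.Format("{0,8} ms [thread {1}] {2}",
            Clock.ElapsedMilliseconds, Environment.CurrentManagedThreadId, step);
        Entries.Enqueue(line);

        // Drop the oldest entries once the buffer is over capacity.
        string discarded;
        while (Entries.Count > Capacity && Entries.TryDequeue(out discarded)) { }
    }

    // Returns a copy of the breadcrumbs, oldest first.
    public static string[] Snapshot()
    {
        return Entries.ToArray();
    }
}
```

You would call Breadcrumbs.Mark("hub page: loading sections") at the start of each startup step and dump Snapshot() somewhere once the app is responsive again (or from a watchdog). After a hang, the last entry tells you roughly which step never finished.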

Given that Jerry has answered this at a higher level, I figured I would add the lower-level answers that, from the way your question is phrased, I think you are interested in. First I would like to address the last item, which is the dump files. Microsoft provides a mechanism for getting dump files of a process 'in the wild': Windows Error Reporting. If you want to collect dump files from failed client processes, you could sign up for Windows Error Reporting (I must admit I have never actually done it; I looked into it and tried to get my current employer to allow it, but without success). To sign up, go to the Establish a Hardware/Desktop Account Page.
As far as what to do with dump files once you get them, you would be wanting to download the debugging tools for windows (part of the Windows SDK download) and/or the Debug Diag Tool (I must confess I am more of a debugging tools for windows user than a Debug Diag user). These will provide you with the tools to look into what is going on at a lower level. Obviously you can only go so far as you won't have access to private Microsoft symbols, but you do have access to public symbols and usually those are enough to give you a pretty good idea of the problem area.
Your primary tools will depend on how reproducible the issue is. If it is only reproducible on some client machines, then you will have to rely on looking at a single dump file that you probably got hold of from Windows Error Reporting. In this case I would open it using the appropriate version of Windbg (either x86 or x64) and look at what was going on at the time the dump was taken. How far you can go depends on how savvy you are. A simple starter would be to run
.symfix
.reload
.loadby sos clr
!EEStack
This will load Microsoft public symbols and the sos extension dll for managed code inspection, and then dump the contents of the stack for each thread in the process. From the names of the methods that appear on the call stacks you might be able to get a pretty good idea of at least the area of the code where the lock is occurring.
You can go much farther than this as Windbg provides the ability to go pretty deep into deadlock analysis (for instance there is an extension available for Windbg called sosex that provides a command !dlk which can sometimes automate the detection of a deadlock for you from a single dump file. To load an extension dll into Windbg you just have to download it and then call .load fullpathtodll). If the problem is reproducible locally you might even be more successful with WPA/WPR or if you are really fortunate a simple procmon trace. These tools do have a pretty decent barrier to entry as they take some time to learn. But if you are really interested in the topic your best resources would be the Defrag Tools series on Channel9 and anything by Mario Hewardt (especially his book "Advanced .Net Debugging"). Again, getting familiar with these tools can take a bunch of time, but at the very least if you just know how to dump the contents of the stacks from a dump file you can sometimes get what you need just from that so a basic understanding of these tools can be beneficial as well.

Related

I've found a bug in the JIT/CLR - now how do I debug or reproduce it?

I have a computationally-expensive multi-threaded C# app that seems to crash consistently after 30-90 minutes of running. The error it gives is
The runtime has encountered a fatal error. The address of the error was at 0xec37ebae, on thread 0xbcc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
(0xc0000005 is the error-code for Access Violation)
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint. In fact, the line of code that the debugger says caused the crash is
overallLength += distanceTravelled;
Where both values are of type double
Given all this, I believe the crash must be due to a bug in the compiler or CLR or JIT. I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin. I've never had to view the CIL-binary, or the compiled JIT output, or the native stacktrace (there is no managed stacktrace at the time of the crash), so I'm not sure how. I can't even figure out how to view the state of all the variables at the time of the crash (VS unfortunately won't tell me like it does after managed-exceptions, and outputting them to console/a file would slow down the app 1000-fold, which is obviously not an option).
So, how do I go about debugging this?
[Edit] Compiled under VS 2010 SP1, running latest version of .Net 4.0 Client Profile. Apparently it's ".Net 4.0C/.Net 4.0E, .Net CLR 1.1.4322"
I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin.
"Smaller reproduction" definitely sounds like a great idea here... even if "smaller" won't mean "quicker to reproduce".
Before you even start, try to reproduce the error on another machine. If you can't reproduce it on another machine, that suggests a whole different set of tests to do - hardware, installation etc.
Also, check you're on the latest version of everything. It would be annoying to spend days debugging this (which is likely, I'm afraid) and then end up with a response of "Yes, we know about this - it was a bug in .NET 4 which was fixed in .NET 4.5" for example. If you can reproduce it on a variety of framework versions, that would be even better :)
Next, cut out everything you can in the program:
Does it have a user interface at all? If possible, remove that.
Does it use a database? See if you can remove all database access: definitely any output which isn't used later, and ideally input too. If you can hard code the input within the app, that would be ideal - but if not, files are simpler for reproductions than database access.
Is it data-sensitive? Again, without knowing much about the app it's hard to know whether this is useful, but assuming it's processing a lot of data, can you use a binary search to find a relatively small amount of data which causes the problem?
Does it have to be multi-threaded? If you can remove all the threading, obviously that may well then take much longer to reproduce the problem - but does it still happen at all?
Try removing bits of business logic: if your app is componentized appropriately, you can probably fake out whole significant components by first creating a stub implementation, and then simply removing the calls.
All of this will gradually reduce the size of the app until it's more manageable. At each step, you'll need to run the app again until it either crashes or you're convinced it won't crash. If you have a lot of machines available to you, that should help...
tl;dr Make sure you're compiling to .Net 4.5
This sounds suspiciously like the same error found here. From the MSDN page:
This bug can be encountered when the Garbage Collector is freeing and compacting memory. The error can happen when the Concurrent Garbage Collection is enabled and a certain combination of foreground Garbage Collection and background Garbage Collection occurs. When this situation happens you will see the same call stack over and over. On the heap you will see one free object and before it ends you will see another free object corrupting the heap.
The fix is to compile to .Net 4.5. If for some reason you can't do this, you can also disable concurrent garbage collection by disabling gcConcurrent in the app.config file:
<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
Or just compile to x86.
WinDbg is your friend:
http://blogs.msdn.com/b/tess/archive/2006/02/09/net-crash-managed-heap-corruption-calling-unmanaged-code.aspx
http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net
http://www.codeproject.com/Articles/22245/Quick-start-to-using-WinDbg
Download Debug Diagnostic Tool v1.2
Run program
Add Rule "Crash"
Select "Specific Process"
On the Advanced Configuration page, set your exception if you know which exception it fails on, or just leave this page as is
Set userdump location
Now wait for the process to crash; DebugDiag creates a log file. Then activate the Advanced Analysis tab, select Crash/Hang Analyzers in the top list and the dump file in the lower list, and hit Start Analysis. This will generate an HTML report for you. Hopefully you will find useful info in that report. If you have problems with the analysis, upload the HTML report somewhere and post the URL here so we can focus on it.
My app does not invoke any native code, or use any unsafe blocks, or
even any non-CLS compliant types like uint
You may think this, but threading and synchronization via semaphores, mutexes, or any other handles are all native. .NET is a layer over the operating system; it does not implement multithreading purely in CLR code, because the OS already does it.
Most likely this is a thread synchronization error. Probably multiple threads are trying to access a shared resource, such as a file, that is outside the CLR boundary.
You may think you aren't accessing COM, but when you call certain APIs, such as getting the desktop folder path, the call goes through the shell COM API.
You have the following two options:
Publish your code so that we can review the bottleneck
Redesign your app using the .NET parallel threading framework, which includes a variety of algorithms for CPU-intensive operations.
Most likely programs fail after a certain period of time as collections grow and operations fail to finish before another thread interferes. Take the producer-consumer problem, for example: you will not notice any problem until the producer becomes slower or fails to finish its operation before the consumer kicks in.
A bug in the CLR is rare, because the CLR is very stable. But poorly written code may cause an error that appears to be a bug in the CLR. The CLR cannot and will never detect whether the bug is in your code or in the CLR itself.
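As a rough illustration of the second option, here is a sketch of how a shared running total such as overallLength could be accumulated with Parallel.For from the Task Parallel Library, so that worker threads never write to the shared variable without synchronization. The class and method names (PathLength, Total) and the distances array are assumptions made up for this example; nothing here is known about the asker's real code, and this is not claimed to be the cause of, or a fix for, the access violation.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical example, not the asker's code.
public static class PathLength
{
    // Sums segment lengths in parallel. Each worker accumulates into its own
    // local subtotal; the shared total is only updated under a lock.
    public static double Total(double[] distances)
    {
        double overallLength = 0.0;
        object gate = new object();

        Parallel.For<double>(0, distances.Length,
            () => 0.0,                                            // initial per-worker subtotal
            (i, loopState, subtotal) => subtotal + distances[i],  // runs without shared writes
            subtotal =>
            {
                lock (gate) { overallLength += subtotal; }        // merge once per worker
            });

        return overallLength;
    }
}
```

Calling PathLength.Total(new double[] { 1.0, 2.0, 3.0, 4.0 }) gives the same result as a sequential loop for this input. For general floating-point data the order of additions differs between runs, so the last bits of the result may vary.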
Did you run a memory test for your machine as the one time I had comparable symptoms one of my dimms turned out to be faulty (a very good memorytester is included in Win7; http://www.tomstricks.com/how-to-test-your-ram-or-memory-with-windows-memory-diagnostic-tool-in-windows-7/)
It might also be a heating/throttling issue if your CPU gets too hot after this period of time. Although that would happen sooner imho.
There should be a dump file that you can analyze. If you have never done this, find someone who has, or send it to Microsoft.
I will suggest you open a support case via http://support.microsoft.com immediately, as the support guys can show you how to collect the necessary information.
Generally speaking, like @paulsm4 and @psulek said, you can use WinDbg or Debug Diag to capture crash dumps of the process, which contain all the necessary information. However, if this is the very first time you use those tools, you might be puzzled. The Microsoft support team can provide step-by-step guidance on them, or they can even set up a Live Meeting session with you to capture the data, as the program crashes so often.
Once you are familiar with the tools, you can perform similar troubleshooting more easily in the future:
http://blogs.msdn.com/b/lexli/archive/2009/08/23/when-the-application-program-crashes-on-windows.aspx
BTW, it is too early to say "I've found a bug". Though you cannot obviously find in your program a dependency on native code, it might still have a dependency on native code. We should not draw a conclusion before debugging further into the issue.

What is the best way to debug performance problems?

I'm writing a plug-in for another program in C#.NET, and am having performance issues where commands take a lot longer than I would like. The plug-in reacts to events in the host program, and also depends on utility methods of the host program SDK. My plug-in has a lot of recursive functions because I'm doing a lot of reading and writing to a tree structure. Plus I have a lot of event subscriptions between my plug-in and the host application, as well as event subscriptions between classes in my plug-in.
How can I figure out what is taking so long for a task to complete? I can't use regular breakpoint-style debugging: it's not that it doesn't work, it's just that it's too slow. I have set up a static "LogWriter" class that I can reference from all my classes and that lets me write timestamped lines to a log file from my code. Is there another way? Does Visual Studio keep some kind of timestamped log that I could use instead? Is there some way to view the call stack after the application has closed?
You need to use a profiler. Here is a link to a good one: ANTS Performance Profiler.
Update: You can also write messages in control points using Debug.Write. Then you need to load DebugView application that displays all your debug string with precise time stamp. It is freeware and very good for quick debugging and profiling.
My Profiler List includes ANTS, dotTrace, and AQtime.
However, looking more closely at your question, it seems to me that you should do some unit testing at the same time you're doing profiling. Maybe start by doing a quick overall performance scan, just to see which areas need most attention. Then start writing some unit tests for those areas. You can then run the profiler while running those unit tests, so that you'll get consistent results.
In my experience, the best method is also the simplest. Get it running, and while it is being slow, hit the "pause" button in the IDE. Then make a record of the call stack. Repeat this several times. (Here's a more detailed example and explanation.)
What you are looking for is any statement that appears on more than one stack sample that isn't strictly necessary. The more samples it appears on, the more time it takes. The way to tell if the statement is necessary is to look up the stack, because that tells you why it is being done.
Anything that causes a significant amount of time to be consumed will be revealed by this method, and recursion does not bother it.
People seem to tackle problems like this in one of two ways:
Try to get good measurements before doing anything.
Just find something big that you can get rid of, rip it out, and repeat.
I prefer the latter, because it's fast, and because you don't have to know precisely how big a tumor is to know it's big enough to remove. What you do need to know is exactly where it is, and that's what this method tells you.
Sounds like you want a code 'profiler'. http://en.wikipedia.org/wiki/Code_profiler#Use_of_profilers
I'm unfamiliar with which profilers are the best for C#, but I came across this link after a quick google which has a list of free open-source offerings. I'm sure someone else will know which ones are worth considering :)
http://csharp-source.net/open-source/profilers
Despite the title of this topic I must argue that the "best" way is subjective, we can only suggest possible solutions.
I have had experience using Redgate ANTS Performance Profiler which will show you where the bottlenecks are in your application. It's definitely worth checking out.
Visual Studio Team System has a profiler baked in. It's far from perfect, but for simple applications you can kind of get it to work.
Recently I have had the most success with EQATEC's free profiler, or rolling my own tiny profiling class where needed.
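For what it's worth, a "tiny profiling class" could look something like the sketch below. This is only my guess at the idea, built on System.Diagnostics.Stopwatch; the TimedScope name and its CallCount and TotalMilliseconds methods are invented for the example and are not taken from any of the profilers mentioned.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical "tiny profiling class": wrap a region in using (...) and the
// elapsed time is added to a running total kept per name.
public sealed class TimedScope : IDisposable
{
    private static readonly ConcurrentDictionary<string, long> TotalTicks =
        new ConcurrentDictionary<string, long>();
    private static readonly ConcurrentDictionary<string, int> Calls =
        new ConcurrentDictionary<string, int>();

    private readonly string _name;
    private readonly Stopwatch _watch;
    private bool _disposed;

    public TimedScope(string name)
    {
        _name = name;
        _watch = Stopwatch.StartNew();
    }

    // Stops the timer and adds the elapsed ticks to the totals for this name.
    public void Dispose()
    {
        if (_disposed) return; // ignore a second Dispose so time is not counted twice
        _disposed = true;
        _watch.Stop();
        long elapsed = _watch.ElapsedTicks;
        TotalTicks.AddOrUpdate(_name, elapsed, (key, sum) => sum + elapsed);
        Calls.AddOrUpdate(_name, 1, (key, count) => count + 1);
    }

    // Number of completed scopes recorded under the given name.
    public static int CallCount(string name)
    {
        int count;
        return Calls.TryGetValue(name, out count) ? count : 0;
    }

    // Total time recorded under the given name, in milliseconds.
    public static double TotalMilliseconds(string name)
    {
        long ticks;
        return TotalTicks.TryGetValue(name, out ticks)
            ? ticks * 1000.0 / Stopwatch.Frequency
            : 0.0;
    }
}
```

Usage would be along the lines of using (new TimedScope("RebuildTree")) { /* slow call here */ }, and then logging TimedScope.TotalMilliseconds("RebuildTree") and TimedScope.CallCount("RebuildTree") when the command finishes. With recursive functions the nested scopes overlap, so the totals for a recursive name count the inner time more than once.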
Also, there have been quite a few questions about profilers in the past; see: http://www.google.com.au/search?hl=en&q=site:stackoverflow.com+.net+profiler&btnG=Google+Search&meta=&aq=f&oq=
Don't ever forget Rico Mariani's advice on how to carry out a good perf investigation.
You can also use performance counters for ASP.NET applications.

Not enough storage is available to process this command in VisualStudio 2008

When I try to compile an assembly in VS 2008, I occasionally get (usually after 2-3 hours of work with the project) the following error
Metadata file '[name].dll' could not be opened --
'Not enough storage is available to process this command.
Usually to get rid of that I need to restart Visual Studio
The assembly I need to use in my project is quite big (> 70 Mb) and this is probably the reason for the bug; I've never seen anything like this in my previous projects. If this is the reason, my question is why it happens and what I need to do to stop it.
I have enough of free memory on my drives and 2Gb RAM (only ~1.2 Gb are utilized when exception happens)
I googled for the answers to the questions like this.
The suggestions usually relate to:
the number of user handles, which is limited in WinXP...
the physical limit of memory available per process
I don't think either could explain my case.
For user handles and other GUI resources - I don't think this could be a problem. The big 70 Mb assembly is actually GUI-less code that operates with sockets and implements parsers of proprietary protocols. In my current project I have only 3 GUI forms, with a total number of GUI controls < 100.
I suppose my case is closer to the fact that in Windows XP the process address space is limited to 2 GB of memory (and, taking memory segmentation into account, it is possible that I don't have a free segment large enough to allocate memory).
However, it is hard to believe that segmentation could be so severe after just 2-3 hours of working with the project in Visual Studio. Task Manager shows that VS consumes about 400-500 Mb (OM + VM). During compilation, VS needs to load only metadata.
Well, there are a lot of classes and interfaces in that library, but I would still expect 1-2 Mb to be more than enough for the metadata that the compiler uses to find all public classes and interfaces (though this is only my guess; I don't know what exactly happens inside the CLR when it loads assembly metadata).
In addition, I would say that the entire assembly is so big only because it is a C++/CLI library that has other unmanaged libraries statically linked into one DLL. I estimated (using Reflector) that .NET (managed) code is approx 5-10% of this assembly.
Any ideas how to determine the real cause of this bug? Are there any restrictions or recommendations regarding .NET assembly size? (Yes, I know that it is worth thinking about refactoring and splitting a big assembly into several smaller pieces, but it is a 3rd party component, and I can't rebuild it.)
The error is misleading. It really should say "A large enough contiguous space in virtual memory could not be found to perform the operation". Over time, allocations and deallocations of virtual memory space lead to it becoming fragmented. This can lead to situations where a large allocation cannot be satisfied despite there being plenty of total space available.
I think this is what your "segmentation" is referring to. Without knowing all the details of everything else that needs to load and the other activity that occupies the 2-3 hour period, it's difficult to say whether this really is the cause. However, I would not put it into the category of unlikely; in fact it is the most likely cause.
In my case the following fix helped:
http://confluence.jetbrains.net/display/ReSharper/OutOfMemoryException+Fix
As Anthony pointed out, the error message is a bit misleading. The issue is less about how big your assembly is and more about how much contiguous memory is available.
The problem is likely not really the size of your assembly. It's much more likely that something inside of Visual Studio is fragmenting memory to the point that a build cannot complete. The usual suspects for this type of problem are
Too many projects in the solution.
Third party add-ins
If you have more than, say, 10 projects in the solution, try breaking up the solution and see if that helps.
If you have any 3rd party addins, try disabling them one at a time and seeing if the problem goes away.
I am getting this error on one of my machines and, surprisingly, this problem is not seen on other dev machines. Maybe something is wrong with the VS installation.
But I found an easier solution.
If I delete the .suo file of the solution and re-open the solution, it starts working smoothly.
Hope this will be useful for somebody in distress..
If you are just interested in making it work, then restart your computer and it will work like a charm. I had the same kind of error in my application, and after reading all of the answers here at stackoverflow, I decided to first restart my computer before making any other modifications. It saved me a lot of time.
Another cause for this problem can be using too many typed datasets via the designer, or other types that can be instantiated via a designer, like lots of databound controls on lots of forms.
I imagine you're the sort of hardcore programmer, though, who wouldn't drag n' drop a DS! :D
In relation to your problem, Bogdan, have you tried to reproduce the problem w/o your C++ component loaded? If you can't, then maybe it's this. How are you loading the component? Have you tried other techniques like late binding, etc.? Any difference?
Additional:
Yes, you are right, the other culprits are lots of controls on the form. I once saw this same issue with a dev who had imported a very VB6 app over to .NET. He had literally hundreds of forms. He would get periodic crashing of the IDE after a couple of hours. I'm pretty sure it was thread exhaustion. It might be worth setting up a vanilla box w/ no add-ins loaded just to rule add-ins out, but my guess is you are just hitting the wall in terms of a combined limitation of VS and your box specs. Try running 64-bit Windows Vista and installing some extra RAM modules.
If memory usage and VM size are small for devenv, explicitly kill ALL running instances of devenv.exe.
I had 3 devenv.exe processes running, whereas I had only two instances of Visual Studio open in front of me.
That was solution in my case.
I know it has been a long time since this was commented on but I ran into this exact issue today with a telerik dll in VS2010. I had never seen this issue before until today when I was making some setting changes in IE.
There is a setting in Tools/Folder Option/View in the Files and Folders section called "Launch folder windows in a separate process".
I am not sure the amount of memory used for each window when using this setting but until today I have never had this checked. After checking this option for misc reasons I started getting the "not enough storage is available to process this command". The telerik dll is an 18mb dll that we are using located in our library folder as a reference in our project.
Unchecking this resolved the problem.
Just passing along as another possible solution
I also faced the same problem.
Make sure that the Windows OS is 64-bit.
I switched from 32-bit Windows to 64-bit Windows, and the problem was solved.
I had this same issue and in my case, the exception name was very misleading. The actual problem was that the DLL couldn't be loaded at all due to invalid path. The exception i was getting said "
I used DllImport attribute in C#, ASP.NET application with declaration like below and it was causing the exception:
[DllImport(@"Calculation/lib/supplier/SupplierModule.dll", CallingConvention = CallingConvention.StdCall, CharSet = CharSet.Ansi, EntryPoint = "FunctionName")]
Below is working code snippet:
[DllImport(@"Calculation\lib\supplier\SupplierModule.dll", CallingConvention = CallingConvention.StdCall, CharSet = CharSet.Ansi, EntryPoint = "FunctionName")]
The actual problem was using forward slashes in path, instead of back slashes. This cost me way too much to figure out, hope this will help others.

Hide a C# program from the task manager?

Is there any way to hide a C# program from the Windows Task Manager?
EDIT:
Thanks for the overwhelming response! Well I didn't intend to do something spooky. Just wanted to win a bet with my friend that I can do it without him noticing. And I'm not a geek myself to be able to write a rootkit, as someone suggested though I'd love to know how to do it.
Not that I'm aware of - and there shouldn't be. The point of the task manager is to allow users to examine processes etc.
If the user should be able to do that, they should be able to find your program. If they shouldn't be poking around in Task Manager, group policy should prevent that - not your program.
Don't mean to zombie this, but I thought I could contribute some useful information.
If you want to hide an application there are two methods (that I can think of atm).
They both have their ups and downs
[1] SSDT Table hooking - basically you have to set the MDL of the table to writeable, overwrite the address of NtQuerySystemInformation (iirc) with the address of your function and have it call the original function after filtering the results.
This method doesn't suit your needs very well because the hooking function would always need to be in memory and would involve writing a kernel mode driver. It's a fun thing to do, but debugging is a pain because an exception means a BSOD.
[2] Direct Kernel Object Manipulation (DKOM) - the list of processes is a doubly linked list, with a kernel mode driver you can alter the pointers of the records above and below your process to point around yours. This still requires the use of a kernel mode driver but there are rootkits such as FU that can be easily downloaded that contain an exe and the service. The exe could be called from inside your application as a child process (in the released version of FU, at least the one I found, there was a bug which I had to fix where if the hidden application exited the computer would BSOD, it was a trivial fix).
This will thankfully be caught by almost any decent antivirus so if you are trying to do something sneaky you'll have to learn to get around that (hint: they use a binary signature)
I have not used method 1 ever but method 2 has worked for me from a VB.Net application.
A third possible option is to just create the application as a windows service, this will show up in task manager by default but I'm willing to bet that there is a way to tell it to not show up there since there are plenty of other services which don't show up in task manager.
Hope I helped a little, my advice is that if you are interested in this kind of stuff to learn C++.
You could make your program a service and then it would appear as "svchost". There's a little more to it than that, but that should give you a hint to go in the right direction.
I'm not aware of any way to hide it from the task manager, but you could just disguise it by making it show up as "svchost.exe". It'll get lumped in with all the others (there's usually several), and will become indistinguishable.
You shouldn't hide it, but you could prevent the user from killing the process.
See Chris Smith's answer to this question.

How to debug a deadlock?

Other than that I don't know if I can reproduce it now that it's happened (I've been using this particular application for a week or two now without issue), assuming that I'm running my application in the VS debugger, how should I go about debugging a deadlock after it's happened? I thought I might be able to get at call stacks if I paused the program and hence see where the different threads were when it happened, but clicking pause just threw Visual Studio into a deadlock too till I killed my application.
Is there some way other than browsing through my source tree to find potential problems? Is there a way to get at the call stacks once the problem has occurred to see where the problem is? Any other tools/tips/tricks that might help?
What you did was the correct way. If Visual Studio also deadlocks, that happens now and then. It's just bad luck, unless there's some other issue.
You don't have to run the application in the debugger in order to debug it. Run the application normally, and if the deadlock happens, you can attach VS later. Ctrl+Alt+P, select the process, choose debugger type and click attach. Using a different set of debugger types might reduce the risk of VS crashing (especially if you don't debug native code)
A deadlock involves 2 or more threads. You probably know the first one (probably your UI thread) since you noticed the deadlock in your application. Now you only need to find the other one. With knowledge of the architecture, it should be easy to find (e.g. what other threads use the same locks, interact with the UI etc)
If VS doesn't work at all, you can always use windbg. Download here: http://www.microsoft.com/whdc/devtools/debugging/default.mspx
I'd try different approaches in the following order:
First, inspect the code to look for thread-safety violations, making sure that your critical regions don't call other functions that will in turn try to lock a critical region.
Use whatever tool you can get your hands on to visualize thread activity, I use an in-house perl script that parses an OS log we made and graphs all the context switches and shows when a thread gets pre-empted.
If you can't find a good tool, do some logging to see the last threads that were running before the deadlock occurred. This will give you a clue as to where the issue might be caused. It helps if the locking mechanisms have unique names; for example, if an object has its own thread, create a dedicated semaphore or mutex just to manage that thread.
I hope this helps. Good luck!
You can use different programs like Intel(R) Parallel Inspector:
http://software.intel.com/en-us/intel-parallel-inspector/
Such programs can show you places in your code with potential deadlocks. However, you have to pay for it, or use it only for the evaluation period. I don't know if there are any free tools like this.
Just like anywhere, there are no "silver bullet" tools to catch all deadlocks. It is all about the sequence in which different threads acquire resources, so your job is to find out where the order was violated. Usually Visual Studio or another debugger will provide stack traces and you will be able to find out where the discrepancy is. DevPartner Studio does provide deadlock analysis, but last time I checked there were too many false positives. Some static analysis tools will find some potential deadlocks too.
Other than that, it helps to get the architecture straight to enforce resource acquisition order. For example, layering helps to make sure upper-level locks are taken before lower ones, but beware of callbacks.
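To illustrate what "enforcing resource acquisition order" can mean in code, here is a small sketch. It assumes a made-up RankedResource type where every instance gets a fixed rank at construction, and any code that needs two instances always locks the lower rank first. The type, its Move method and the Value field are hypothetical and only exist for this example.

```csharp
using System;
using System.Threading;

// Hypothetical example type: each resource has a fixed rank, and code that
// needs two resources always takes the lower-ranked lock first.
public sealed class RankedResource
{
    private static int _nextRank;

    private readonly object _gate = new object();
    private readonly int _rank = Interlocked.Increment(ref _nextRank);

    public int Value;

    // Moves 'amount' from one resource to the other while holding both locks.
    // Because the lock order depends only on rank, two threads calling
    // Move(a, b, ...) and Move(b, a, ...) cannot each hold one lock and wait
    // forever for the other.
    public static void Move(RankedResource from, RankedResource to, int amount)
    {
        if (ReferenceEquals(from, to)) return;

        RankedResource first = from._rank < to._rank ? from : to;
        RankedResource second = from._rank < to._rank ? to : from;

        lock (first._gate)
        {
            lock (second._gate)
            {
                from.Value -= amount;
                to.Value += amount;
            }
        }
    }
}
```

If the two lock statements were instead written as lock (from._gate) followed by lock (to._gate), two threads moving in opposite directions could deadlock. The caveat about callbacks still applies: calling out to unknown code while holding either lock can reintroduce an ordering problem that this scheme does not see.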
