I've written a C#/WPF/Windows 10 application for a client that, when started, runs beautifully - it connects to another app over the network, begins data acquisition, as the data comes in it processes it and displays charts and tables to the user. All is well.
And then Something Bad Happens.
At some point, for no reason I've been able to determine, there's a spike in CPU usage and then the program becomes totally GC-bound. The client sent me a screenshot showing this.
The client is running the app in a completely different environment than what I have (he has several hundred thousand dollars of actual data acquisition hardware, I have a simulator) and is several thousand miles away.
When the problem happens, the client is willing to let me take remote control over WebEx and run the debugger, but since the problem is so infrequent I only get called in once it has already happened, and by then it's too late. Often Visual Studio is so overwhelmed by the memory situation that it gives up when asked to take a memory snapshot or even a stack trace (the client has the full source, so full debugging is possible).
Initially I thought this was a memory leak, so for key classes I have code that keeps a WeakReference to each object that is created, plus a regular process that scans those lists every 2 minutes, reports how many objects of each type are still alive, and writes that data to a log file. Unfortunately the logging itself usually becomes erratic once the problem starts, and the data logged before that point shows nothing abnormal.
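A minimal sketch of the kind of tracker described above, assuming a static registry keyed by type; the names `InstanceTracker`, `Track` and `Snapshot` are hypothetical. Pruning dead references on every scan matters: without it, the tracker's own `WeakReference` lists grow without bound and become a leak of their own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-type instance tracker built on WeakReference.
// Call InstanceTracker.Track(this) from the constructors of the classes to watch.
public static class InstanceTracker
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<Type, List<WeakReference>> Refs =
        new Dictionary<Type, List<WeakReference>>();

    public static void Track(object instance)
    {
        if (instance == null) throw new ArgumentNullException(nameof(instance));
        lock (Gate)
        {
            List<WeakReference> list;
            if (!Refs.TryGetValue(instance.GetType(), out list))
            {
                list = new List<WeakReference>();
                Refs.Add(instance.GetType(), list);
            }
            // A weak reference does not keep the instance alive.
            list.Add(new WeakReference(instance));
        }
    }

    // Drops references to collected objects and returns the live count per type.
    public static Dictionary<Type, int> Snapshot()
    {
        var counts = new Dictionary<Type, int>();
        lock (Gate)
        {
            foreach (var entry in Refs)
            {
                entry.Value.RemoveAll(w => !w.IsAlive);
                counts[entry.Key] = entry.Value.Count;
            }
        }
        return counts;
    }
}
```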
I have also run the ANTS Performance Profiler and Memory Profiler on the application, on my computer. Neither shows anything unexpected.
QUESTION
Since this only happens on the client machine, and only happens infrequently, is there something I can put in my code to (1) detect the GC becoming unhappy, and (2) log what it is that is making the GC unhappy?
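One low-overhead approach to (1) is to sample `GC.CollectionCount(2)` on a timer and log whenever gen-2 collections arrive faster than some threshold, together with the managed heap size and private bytes. This is a sketch: the class name, interval and threshold are hypothetical and would need tuning against what the healthy app normally does. It does not answer (2) on its own; the log line is mainly a trigger point, for example to capture a dump or to write out the WeakReference counts early.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical watchdog: samples GC counters on a timer and logs when
// gen-2 collections happen faster than a chosen threshold.
public sealed class GcWatchdog : IDisposable
{
    private readonly Timer _timer;
    private readonly int _maxGen2PerInterval;
    private readonly Action<string> _log;
    private int _lastGen2;

    public GcWatchdog(TimeSpan interval, int maxGen2PerInterval, Action<string> log)
    {
        _maxGen2PerInterval = maxGen2PerInterval;
        _log = log;
        _lastGen2 = GC.CollectionCount(2);
        _timer = new Timer(_ => Sample(), null, interval, interval);
    }

    // Pure decision rule, kept separate so it can be tested on its own.
    public static bool IsUnderPressure(int gen2Before, int gen2After, int maxGen2PerInterval)
    {
        return gen2After - gen2Before > maxGen2PerInterval;
    }

    private void Sample()
    {
        int gen2 = GC.CollectionCount(2);
        // Swap in the new count atomically in case two timer callbacks overlap.
        int previous = Interlocked.Exchange(ref _lastGen2, gen2);
        if (!IsUnderPressure(previous, gen2, _maxGen2PerInterval)) return;

        using (Process self = Process.GetCurrentProcess())
        {
            _log(string.Format(
                "{0:o} GC pressure: {1} gen2 collections since last sample; managed heap {2:N0} bytes; private bytes {3:N0}",
                DateTime.Now, gen2 - previous, GC.GetTotalMemory(false), self.PrivateMemorySize64));
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

For example, `new GcWatchdog(TimeSpan.FromSeconds(10), 5, Console.WriteLine)`. Keep the instance in a field for the lifetime of the app: a `System.Threading.Timer` with no remaining references can be collected, which silently stops the sampling.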
Related
I have a 32-bit C# application that interacts with some machines and PLCs. After it has been running for a while it starts having problems, freezes, and eventually crashes.
Once the program is restarted it works normally again, then crashes again after roughly 2 hours.
I collected three dumps (one at startup, one after 1 hour, and one right before the crash) and found that heap memory usage keeps increasing.
Using WinDBG I found no deadlocks, but right before the application starts to hang, heap usage exceeds 95% in every generation except gen0.
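Because the process is 32-bit, it can run out of address space well before the machine runs out of RAM: by default a 32-bit process gets 2 GB of user-mode address space, or 4 GB on 64-bit Windows if the executable is marked large-address-aware. A sketch that logs how close the process is to that ceiling, assuming the caller knows whether the binary is large-address-aware (the class and method names are hypothetical):

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for logging address-space headroom in a 32-bit process.
public static class AddressSpaceMonitor
{
    private const long OneGiB = 1L << 30;

    // User-mode address space available to a 32-bit process:
    // 2 GB by default, 4 GB under WOW64 when the exe is large-address-aware.
    public static long UserAddressLimit32(bool largeAddressAware)
    {
        return largeAddressAware ? 4 * OneGiB : 2 * OneGiB;
    }

    public static double UsedFraction(long virtualBytes, long limitBytes)
    {
        return (double)virtualBytes / limitBytes;
    }

    public static string Describe(bool largeAddressAware)
    {
        using (Process self = Process.GetCurrentProcess())
        {
            long virtualBytes = self.VirtualMemorySize64;
            if (Environment.Is64BitProcess)
                return string.Format("64-bit process, virtual size {0:N0} bytes", virtualBytes);

            long limit = UserAddressLimit32(largeAddressAware);
            return string.Format(
                "32-bit process, virtual size {0:N0} of {1:N0} bytes ({2:P0}), managed heap {3:N0} bytes",
                virtualBytes, limit, UsedFraction(virtualBytes, limit), GC.GetTotalMemory(false));
        }
    }
}
```

Logging `Describe(...)` every few minutes next to the dumps would show whether the hang coincides with the address space filling up rather than only with heap growth.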
I tried dumping the contents of all strings using !sosex.strings and, apart from some class attributes, found a large number of Windows schema URLs in memory (the output listed each string with the number of times it occurred).
Unfortunately the software can't be released as a 64-bit build, because of restrictions in some of its libraries. The machine it runs on is a high-end consumer PC.
Does anybody have any idea how I can identify a possible memory leak, and how to act on it?
I have already added disposal for some objects that were stuck in memory, which gave noticeable improvements, but it is still not enough.
I have a fairly simple application that takes a data file with about 500k lines of data in it, parses the data, organizes it and then inserts it into an Azure Table. There are about 2000 of these files and I need the process to work smoothly as I load all of the data.
I am using WindowsAzure.Storage v5.0.2 for inserting the data and Microsoft.tpl.dataflow v4.4.24 for parallelization. Each file is completely processed and all tasks are finalized before I move on to the next file. I also dispose of every object I can and set everything else to null at the end of each file load.
Despite being as careful as possible, RAM usage climbs steadily until the process crashes: it jumps to 1 GB at startup and keeps climbing until the crash, somewhere above 9 GB.
Note - this is targeting x64 on a reasonably large computer. Garbage collection is happening on a regular basis, but it doesn't seem to affect the memory problem.
At this point I am completely confused about where the memory leak is coming from and don't have any idea about how to diagnose the problem.
Update
After a lot of work, and following the suggestion below, I found that the parallelization I was using allowed more simultaneous inserts than I expected. It looked as if each insert had completed and my code was starting the next one; in reality the parallel process had only reported back a status and had not finished. This led to a huge backlog of simultaneous inserts that chewed up RAM and crashed the system. It also caused me to exceed my IOPS limit, which I believe triggered throttling and compounded the problem.
Figuring this out required a huge amount of work and many different ways of analysing everything, but the suggestion below got me going in the right direction.
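The usual cure for this kind of runaway is backpressure: cap both the number of inserts in flight and the number of rows buffered, use `SendAsync` (which waits while the block is full) rather than `Post` (which simply returns false), and await the block's `Completion` before moving to the next file. A sketch with TPL Dataflow, where the `insertAsync` delegate is a hypothetical stand-in for the real Azure Table call and the limits are placeholders to tune against the storage account's throughput:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedLoader
{
    // Loads one file's rows with at most maxParallel inserts in flight and at most
    // boundedCapacity rows buffered. Returns the peak concurrency observed.
    public static async Task<int> LoadAsync<T>(
        IEnumerable<T> rows, Func<T, Task> insertAsync, int maxParallel, int boundedCapacity)
    {
        int inFlight = 0, peak = 0;
        var block = new ActionBlock<T>(async row =>
        {
            int now = Interlocked.Increment(ref inFlight);
            // Record the highest concurrency seen (lock-free max update).
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
            try { await insertAsync(row); }
            finally { Interlocked.Decrement(ref inFlight); }
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            BoundedCapacity = boundedCapacity
        });

        foreach (T row in rows)
        {
            // SendAsync waits while the block is full, so the producer cannot run ahead.
            if (!await block.SendAsync(row))
                throw new InvalidOperationException("The block declined a row.");
        }

        block.Complete();
        await block.Completion; // only now has every insert for this file actually finished
        return peak;
    }
}
```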
Doing a search for 'troubleshoot .net memory leak' will yield lots of results, but probably the best one is http://blogs.msdn.com/b/tess/archive/2009/05/12/debug-diag-script-for-troubleshooting-net-2-0-memory-leaks.aspx.
Basically, use DebugDiag to generate a memory leak analysis report and then look at which objects are consuming the memory. You will likely find one type of object that you have been continuously adding to a collection without ever removing.
My server, a .NET C# Windows Forms application that uses the overbyteICS DLL, throws a "CreateWindowEx failed" exception.
The server handles a large number of clients throughout the day, but when the total connection count (connections and disconnections together) reaches 10,000, this error appears, the server stops accepting user connections, and the machine hangs.
I agree with Roger, but let's confirm it first. When this error occurs, run Spy++ (from Microsoft Visual Studio\Tools in the Start menu) and look through the window tree. Expand the branches and look for duplicated windows. There will certainly be some duplicates, but you are looking for hundreds or thousands of copies. If you find that, then it's what Roger said, and there's almost no solution other than periodically restarting the connection-server process (or the whole machine, just in case) so that it doesn't hang (of course, server restarts will irritate users almost as much), or fixing, patching, or reimplementing the connection-server process to be more resource-friendly.
Note that while opening a hidden window per connection is a very wasteful approach, it still should not hang the machine. The server should simply drop the connections it cannot handle. Here it seems no limits are implemented at all, which is a bug.
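A minimal sketch of the "drop what you can't handle" behavior, assuming a hypothetical accept loop that calls `TryAdmit` before creating any per-connection resources and `Release` when the connection closes. The cap is a placeholder that would have to sit comfortably below the point (around 10,000 connections in the question) where the server falls over. This does not remove the wasteful window-per-connection design; it only keeps the server from taking the machine down.

```csharp
using System;
using System.Threading;

// Hypothetical admission control: refuse new connections once a cap is reached
// instead of letting per-connection resources grow without limit.
public sealed class ConnectionLimiter
{
    private readonly int _max;
    private int _active;

    public ConnectionLimiter(int maxConnections)
    {
        if (maxConnections <= 0) throw new ArgumentOutOfRangeException(nameof(maxConnections));
        _max = maxConnections;
    }

    public int Active { get { return Volatile.Read(ref _active); } }

    // Returns false when the server is full; the caller should close the socket.
    public bool TryAdmit()
    {
        while (true)
        {
            int current = Volatile.Read(ref _active);
            if (current >= _max) return false;
            if (Interlocked.CompareExchange(ref _active, current + 1, current) == current) return true;
        }
    }

    // Call exactly once for every successful TryAdmit, e.g. from the disconnect handler.
    public void Release()
    {
        if (Interlocked.Decrement(ref _active) < 0)
        {
            Interlocked.Increment(ref _active); // undo the bad decrement
            throw new InvalidOperationException("Release called more times than TryAdmit succeeded.");
        }
    }
}
```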
Edit: on pre-NT systems (i.e. Win9x) the limit is hard-coded. On NT-class systems you can try to tweak the pool:
http://weblogs.asp.net/israelio/archive/2007/02/07/max-num-of-open-windows-under-xp-2003-vista-resolved.aspx
but I'd still consider that a last resort, as the problem will return when the number of connections rises again. First, try to get the server developers to fix it permanently.
You diagnosed it well. Yes, a CreateWindowEx() failure and 10,000 belong together. 10,000 is the default user32 object quota for a process; in other words, a single process isn't allowed to create more than 10,000 windows. This is a countermeasure against apps that leak window handles, a very common bug. The total number of windows that can be created in a session is a limited resource, and if one process consumed them all, things would fail outright: you couldn't even shut down Windows anymore.
Clearly it is not a leak in your case. You can find temporary relief by changing a registry setting, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\USERProcessHandleQuota. Reboot to make it effective.
Increasing from 10,000 to the maximum of 18,000 should be okayish if the machine doesn't otherwise run processes that require a lot of windows. Something you can see with Taskmgr.exe, Processes tab. Choose View + Select Columns and tick USER objects. Also tick GDI objects and Handles while you are at it, other resources that have a quota.
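If you would rather have the server log its own usage than watch Task Manager, `GetGuiResources` in user32.dll returns the USER and GDI object counts for a process. A Windows-only sketch (the wrapper class name is hypothetical); logging these numbers next to the connection count would show whether USER objects climb toward the quota with each connection:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical wrapper around user32!GetGuiResources (Windows only).
public static class GuiResourceCounter
{
    private const uint GR_GDIOBJECTS = 0;
    private const uint GR_USEROBJECTS = 1;

    [DllImport("user32.dll")]
    private static extern uint GetGuiResources(IntPtr hProcess, uint uiFlags);

    // Current USER object count (windows, menus, cursors, ...) for this process.
    public static uint UserObjects()
    {
        using (Process self = Process.GetCurrentProcess())
            return GetGuiResources(self.Handle, GR_USEROBJECTS);
    }

    // Current GDI object count for this process.
    public static uint GdiObjects()
    {
        using (Process self = Process.GetCurrentProcess())
            return GetGuiResources(self.Handle, GR_GDIOBJECTS);
    }
}
```

Note that `GetGuiResources` returns 0 on failure, so a reading of 0 from a process that clearly has windows means the call failed rather than that the count is low.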
Long term, this behavior does not scale well. You'll need to find the code that creates a window handle for every connection and fix it.
I have a website that is doing something I've never seen before. My server is Windows Server 2003 with IIS 6, and I'm using C# and .NET 4.0.
The site is a real-estate website that stores its data directly in my database. It runs great for a little while and then just dies: when you try to view a property's details, the page takes 2-3 minutes to load, if it loads at all. If I simply re-save the web.config file and re-upload it to restart the app, it runs fine for a little while and then dies again, over and over. I've also tried the local copy while the live site was "dead"; the local copy runs fine at first and then dies after a while as well. The time it takes varies from 5 to 30 minutes; I believe it depends on the number of requests.
Does anyone have any idea what might be happening? The only data query on the page pulls the main data, using the LINQ query below:
```csharp
public Listing GetListingByMLNumber(string MLNumber)
{
    try
    {
        DatabaseDataContext db = new DatabaseDataContext();
        var item = (from a in db.Listings
                    where a.ML_.ToLower() == MLNumber.ToLower()
                    select a).FirstOrDefault();
        return item;
    }
    catch (Exception ex)
    {
        Message = ex.Message;
        return null;
    }
}
```
Not closing the database context stands out as the obvious error in the code you provided. Wrap it in a using statement to be sure it gets disposed correctly.
As long as the context lives, you will hold on to a sql connection, which is a limited resource. You will also waste memory by change-tracking the entities you returned. Given your code the context should be garbage collected at some point, but it might still be the problem (And, whether or not this is the problem, you should dispose your database contexts).
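A sketch of the question's method with the context wrapped in `using`. `DatabaseDataContext` and `Listing` here are hypothetical in-memory stand-ins so the example compiles on its own; with the real LINQ to SQL classes only the `using` block matters. The stand-in counts undisposed contexts to show that every call now releases its context, including the not-found path.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical stand-ins for the LINQ to SQL classes generated in the question.
public sealed class Listing
{
    public string ML_ { get; set; }
}

public sealed class DatabaseDataContext : IDisposable
{
    // Counts contexts not yet disposed (a real context would be holding a connection).
    public static int Undisposed;

    public DatabaseDataContext() { Interlocked.Increment(ref Undisposed); }

    public IQueryable<Listing> Listings
    {
        get { return new[] { new Listing { ML_ = "ML123" } }.AsQueryable(); }
    }

    public void Dispose() { Interlocked.Decrement(ref Undisposed); }
}

public class ListingRepository
{
    public string Message { get; private set; }

    public Listing GetListingByMLNumber(string MLNumber)
    {
        try
        {
            // using guarantees Dispose runs on return and on exceptions.
            using (var db = new DatabaseDataContext())
            {
                return (from a in db.Listings
                        where a.ML_.ToLower() == MLNumber.ToLower()
                        select a).FirstOrDefault();
            }
        }
        catch (Exception ex)
        {
            Message = ex.Message;
            return null;
        }
    }
}
```

One caveat with a real `DataContext`: once it is disposed, touching a lazily loaded association on the returned entity throws `ObjectDisposedException`, so load anything the page needs before leaving the `using` block.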
Try load testing locally to see if you can reproduce the problem. If you can, then use the debugger to figure out the problem. If not, you probably need to add logging to narrow down the problem.
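A rough local load test does not need special tooling. This sketch fires a fixed number of requests with a concurrency cap and records each latency; the `request` delegate is where a call to the local site would go (for example an `HttpClient.GetAsync` against the listing page, whose URL is not given in the question, so none is assumed here). If latencies climb steadily over the run instead of staying flat, the slowdown is reproducible and can be debugged locally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MiniLoadTest
{
    // Runs `request` totalRequests times with at most `concurrency` calls in flight
    // and returns the latency of each call, roughly in completion order.
    public static async Task<List<TimeSpan>> RunAsync(Func<Task> request, int totalRequests, int concurrency)
    {
        var latencies = new ConcurrentQueue<TimeSpan>();
        using (var gate = new SemaphoreSlim(concurrency))
        {
            IEnumerable<Task> calls = Enumerable.Range(0, totalRequests).Select(async _ =>
            {
                await gate.WaitAsync();
                try
                {
                    Stopwatch timer = Stopwatch.StartNew();
                    await request();
                    latencies.Enqueue(timer.Elapsed);
                }
                finally
                {
                    gate.Release();
                }
            });
            // ToList starts every call; the semaphore limits how many run at once.
            await Task.WhenAll(calls.ToList());
        }
        return latencies.ToList();
    }
}
```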
You could also look at the IIS worker process to see whether it uses absurd amounts of memory, handles, etc. Also check the IIS settings for performance and application pool recycling, as suggested in another answer here.
I would take a look at the application pool settings to see how the worker processes are being recycled, and I would also look under the Performance tab in IIS to see if there's a bandwidth threshold specified.
If you ever hit this type of problem again, you should add DebugDiag/ADPlus and WinDBG to your diagnostic toolbelt.
When your application hangs again or is taking an exceedingly long time to respond to requests then grab a dump of the worker process using DebugDiag or ADPlus. Load this up into WinDBG, load up SOS (Son of Strike) which is a WinDBG extension for managed code debugging and start digging around.
Tess Ferrandez has a great set of tutorials and labs on how to use these tools effectively:
.NET Debugging Demos - Information and setup instructions
They've got me out of a few pickles, and it's well worth spending the time familiarising yourself with them.
I've been experiencing a high degree of flicker and UI lag in a small application I've developed to test a component that I've written for one of our applications. Because the flicker and lag was taking place during idle time (when there should--seriously--be nothing going on), I decided to do some investigating. I noticed a few threads in the Threads window that I wasn't aware of (not entirely unexpected), but what caught my eye was one of the threads was set to Highest priority. This thread exists at the time Main() is called, even before any of my code executes. I've discovered that this thread appears to be present in every .NET application I write, even console applications.
Being the daring soul that I am, I decided to freeze the thread and see what happened. The flickering did indeed stop, but I experienced some oddness when it came to doing database interaction (I'm using SQL CE 3.5 SP1). My thought was that this might be the thread that the database is actually running on, but considering it's started at the time the application loads (before any references to the DB) and is present in other, non-database applications, I'm inclined to believe this isn't the case.
Because this thread (like a few others) shows up with no data in the Location column and no Call Stack listed if I switch to it in the debugger while paused, I tried matching the StartAddress property through GetCurrentProcess().Threads for the corresponding thread, but it falls outside the address ranges of all currently loaded modules.
Does anyone have any idea what this thread is, or how I might find out?
Edit
After doing some digging, it looks like the StartAddress is in kernel32.dll (based upon nearby memory contents). This leads me to think that this is just the standard system function used to start the thread, according to this page, which basically puts me back at square one as far as determining where this thread actually comes from. This is further confirmed by the fact that ALL of the threads in this list have the same value for StartAddress, leading me to ask exactly what the purpose is...?
Edit 2
Process Explorer led me to an actually meaningful start address: mscorwks.dll!CreateApplicationContext+0xbbef. This DLL is in %WINDOWS%\Microsoft.NET\Framework\v2.0.50, so it is clearly part of the runtime. I'm still not sure why it's Highest priority, or why it appears to be causing hiccups in my application.
You could try the Sysinternals tools. Process Explorer lets you dig in pretty deep: right-click the process and choose Properties, then open the Threads tab. There you can see each thread's stack and module.
EDIT:
After asking around some, it seems that your "Highest" priority thread is the Finalizer thread that runs due to a garbage collection. I still don't have a good reason as to why it would constantly keep running. Maybe you have some funky object lifetime behavior going on in your process?
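One way to check that claim from inside the process: give a throwaway object a finalizer that records the managed thread ID and priority it runs on, then compare the ID with the threads in the debugger's Threads window. A hypothetical diagnostic sketch, not a fix:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical diagnostic: an object whose finalizer records which thread ran it.
public sealed class FinalizerProbe
{
    public static volatile int ThreadId = -1;
    public static ThreadPriority Priority;

    ~FinalizerProbe()
    {
        Priority = Thread.CurrentThread.Priority;
        ThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    // Not inlined, so no reference to the probe survives once this returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndDrop()
    {
        new FinalizerProbe();
    }

    public static void Run()
    {
        AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("Finalizer ran on managed thread {0} at priority {1}", ThreadId, Priority);
    }
}
```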
I'm not sure what this is, but if you turn on unmanaged debugging, and set up Visual Studio with the Windows symbol server, you might get some more clues.
Might be the Garbage Collector thread. I noticed it too when I was once investigating a finalizer-related bug. Perhaps your system memory is low and the GC is trying to collect all the time? This was the case in the previously mentioned bug too. I couldn't reproduce it on my machine, but a co-worker of mine had a machine with less RAM where it would reappear like clockwork.
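If finalizable objects are part of the problem, the usual remedy is to dispose them deterministically and call `GC.SuppressFinalize` in `Dispose`, so they never reach the finalizer thread and are not kept alive for an extra collection. A sketch of that pattern on a hypothetical sealed wrapper class (the resource release is only a placeholder):

```csharp
using System;
using System.Threading;

// Hypothetical wrapper illustrating the Dispose pattern for a sealed class.
public sealed class NativeBufferHandle : IDisposable
{
    // Counts how many instances reached the finalizer thread.
    public static int FinalizedCount;
    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        ReleaseResources();
        // Takes this object off the finalization queue: no finalizer-thread work,
        // and no extra GC needed before its memory can be reclaimed.
        GC.SuppressFinalize(this);
    }

    ~NativeBufferHandle()
    {
        // Only reached when someone forgot to call Dispose.
        Interlocked.Increment(ref FinalizedCount);
        ReleaseResources();
    }

    private void ReleaseResources()
    {
        // Placeholder: free native memory or handles here.
    }
}
```

A non-zero `FinalizedCount` in a log is then a direct signal that some code path is forgetting to dispose.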