I've got a strange behavior here:
I get a massive memory leak in production running a WPF application that runs on a DLOG-Terminal (Windows Embedded Standard SP1) that behaves perfectly fine if I run it localy on a normal desktop (Win7 prof.)
After many unsucessful attempts to find any problem I put one of those directly beside my monitor, installed the ANTs MemoryProfiler and did one hour test run simulating user operations on both the terminal and my development PC.
Result is, that due to some strange reasons the embedded system piles up a huge amount of WeakReference and EffectiveValueEntry[] Objects.
Here are are some pictures:
Development (PC):
And the terminal:
Just look at the class list...
Has anyone seen something like this before and are there known solutions to this?
Where can I get help?
(PS the terminals where installed with images prepared for .net4)
PPS: for the close-voter: I think the question is clear: how can I fix this.
You could argue if this is a IT/OS problem vs. a programming problem but I think if I post this in Server Fault it will get a off-topic close in no time...
UPDATE:
I was able to find a big portion of the problem - but it feels a bit like C++:
I use a ViewModel-like Items class for a WPF-List that provides (among others) a ICommand (RelayCommand-pattern). The Items where created on the fly in the getter of a ViewModel-Property for the view and it seems that the application/GC did never free those unused commands - or the subscribtions to their CanExecuteChanged - the memory profiler shows those as "held by a weak reference". I changed my code to reuse those item-viewmodels and Dispose/set to null every used properties in their Dispose and use this too as clean up - as I said: feels like "delete" in those old C++ days.
On top of this I use a forced GC.Collect every 30mins (yeah I know - you never should - but I got no other solution till now).
With this setup the applications runs for 6+ hours without problems so far but it don't feel right.
I cannot understand why those WeakReferences are not claimed as they are on my desktop machine...
Any thoughts on this? Please!
UPDATE:
I am still not able to pin down this problem but I see a strange behavior:
If I use PC-Anywhere to observe the operation of my software on one of the terminals the problem goes away!
Even after running 8hr. straight the software runs as it should - it will even free memory (I put a little memorycounter-display in the main-screen - let's say I connect to the terminal and see that memory is low - after waiting a few minutes the memory is reclaimed)
So I think Devin (one Answer below) has a lead in the right direction - something in the Remote-Control software unblocks the finalizer-thread or whatever is blocking the GC - be it the simulated keyboard/mouse or whatever.
Any thoughts on this?
We had a (somewhat) similar issue running my app on a tablet. The memory would be reclaimed when run on a desktop, but not when run on a tablet or some other device that used a PC Input panel. The problem is that the finalization queue was getting stuck. The COM object finalizer was waiting to run something on the main thread, which didn't have a message loop.
The solution was to find an adequate time to invoke Application.DoEvents(). We had a method that would be called intermittently and we invoke it with every 10th call. I don't know if this is the same issue you are having, but maybe it can shed some light.
EDIT: I do need to make it clear, in general, calling DoEvents() is a bad idea. It works in that case because there isn't any UI on that thread or anything else happening that those events can interfere with.
From the screenshots it is interesting to see that the LOH grows at the same time the used space does not grow much. The free space is growing a lot at the LOH which indicates memory fragmentation due to pinned objects. This looks like a stuck finalizer thread which does prevent the cleanup of managed objects. You should get a memory dump and check in which method the finalizer thread was stuck. You can do this quite easy with Windbg.
Related
I have a server application rewrite underway and am puzzled by the memory usage of the application. The earlier version was written with TcpListener while the new one is plain old Socket. This is mostly for performance and stability reasons which are secondary to this question and even this issue.
As mentioned, everything is heavily async'd with AcceptAsync, SendAsync, and ReceiveAsync. On top of that, I use ThreadPool.QueueUserWorkItem for utility tasks such as the initial kick-off for AcceptAsync and keeping the next AcceptAsync queued, the call after processing to write back to the Socket, and the call cleaning up disconnected clients. Further, there are a series of events that I fire with BeginInvoke and EndInvoke.
The detection for those disconnects as well as the main driver for data availability are handled by a custom class that I call AvailabilityNotifier that peaks on a ReceiveAsync as well as detecting for SocketAsyncEventArgs.BytesTransferred being zero which fires a Disconnect event.
The throughput of the application is good, and there's almost zero (relatively speaking) lock contention thanks to a healthy usage of System.Collections.Concurrent objects. However, it clings to memory like a predator clinging to a kill.
I've debugged to verify my internal collections are getting cleared, the client sockets are being shutdown and disposed of, and utilizing a buffer pool instead of creating new buffers for each read. Running a test application that ultimately performs 1,000 connections (100 concurrent) and sends/receives 100,000 messages bloats the server process memory to around 800 MB and it never goes down even after Windows clears any TIME_WAITs that might have happened. I know for sure the diposal code is firing thanks to a ton of ObjectDisposedException and null exception catch blocks that you can see in the linked github below.
I say all that without quoted code as it's quite long for a post here so here's a github: https://github.com/hoagsie/TcpServer. The Program.cs and ClientProgram.cs is provided as well if you'd want to run it yourself but the main action is in NetworkServer.cs and AvailabilityNotifier.cs. The sample server I have running also has a WCF service it talks to but is just the standard WCF project with literally no modifications. I just needed it to match a sample scenario.
I'm also not sure if it matters on some level, but I do build this in x64 mode rather than AnyCPU/x86. This is mostly for resource consumption opportunity on the target server it will be going on, but I haven't noticed a difference in behavior with regard to this issue in either x86 or x64.
EDIT:
A coworker pointed out the Snapshot tool in Visual Studio. I had never seen this before and it displayed things differently from what I had been using, which was dotTrace. It pointed to a ton of allocations around the SocketAsyncEventArgs object which makes sense, but they kept building and building. I looked at its member list again and discovered it had a Dispose method. My issue has gone away. I didn't realize that was an IDisposable object.
A coworker pointed out the Snapshot tool in Visual Studio. I had never seen this before and it displayed things differently from what I had been using, which was dotTrace. It pointed to a ton of allocations around the SocketAsyncEventArgs object which makes sense, but they kept building and building. I looked at its member list again and discovered it had a Dispose method. My issue has gone away. I didn't realize that was an IDisposable object.
Here is the thing. I'm connecting via COM to some devices at KNX/EIB. But sometimes - and I want to be ready for worst-case anyways - my application crashes leaving all objects and libraries exposed somewhere, somehow. I noticed when I restart the app I have trouble to get a connection again. I get an error for a connection procedure that is actually working well normally. Sometimes this connect procedure is working sometimes it is not, randomly. That is bad! After some time (several minutes) it seems to work again after a series of complete fails. But I think I see a pattern now. It doesn't work after a crash with no clean disconnect. My guess is there are objects that hold a connection to the device that us why I can't get a new connection. This is why I ask this question.
Question:
How do I unload those unused objects to kill undead connections?
How do I make Windows to check for unused libraries to be unloaded?
I just want to tell Windows, "I messed up badly and I need to continue my work. Please clean up my mess for me, so I can start fresh! Do I deserve a 2nd chance?"
Edit:
The scenario is the app has crashed and closed. I have no references to anything anymore. No finally clause or anything. The app can only be started again. What can I do to clean up the mess that has been made before, programmatically?
Edit 2:
Hans gave me the hint of killing the responsible server. So for now I solve that with calling taskkill on startup (at least as long I'm in dev). And it works!
C:\Windows\System32\taskkill.exe /F /IM Falcon.exe
This is the failure mode of an out-of-process COM server. If the client program crashes to the desktop without releasing the interface pointers then the server is completely unaware that the client isn't around anymore. And tends to get balky when you try to reconnect, many servers just permit one client.
By far the most common way that programmers induce this failure mode is by using a debugger. They'll click the Red Button or use the Stop Debugging command. Bam, no cleanup of course.
COM garbage-collects unused servers automatically. But that isn't particularly fast, takes an easy 10 minutes before it decides it needs to step in. And doesn't always work for every server, Office programs notoriously don't get cleaned-up for example.
Not much you can do about this when your app keels over in regular usage. Otherwise the kind of problem that killed middle-ware. Still, having such a mishap in a C# program is pretty unusual, the CLR releases interface pointers at program termination even when the app crashed with an exception. You'd have to have the very nasty kind of mishaps to bypass this, critical exceptions like ExecutionEngineException or the one this site is named after.
Don't focus too much on the Stop Debugging induced failures, it is normal and using Task Manager to kill the server is expected and required. Otherwise just be sure to get the nasty bugs out of your code and you won't have a problem. If you need more help then be sure to contact the owner of the server, be sure to have a small repro project available that demonstrates the issue.
I have a server with 3 threads and a threadpool for recieved data processing. The only locks (reader and writer) Im using are for client connection lists.
Sometimes the main form freezes for a second and I cannot find the problem. The form doesnt do any hard work, thats for different threads.
I wanted to ask whather there is not any way how to track this "freezing" down? Any help is very appreciated,thanks!
You could run a profiler on your app to try and help isolate the problem.
I've been playing around with EQATEC Profiler, it looks like a really good utility and is completely free. It shows you some really useful stats like the time spent in each method. If you are armed with this information it should go a long way to tracking down your problem.
I haven't tried it on a multi-threaded app yet, so I'm not sure how it handles different threads. But it's worth a shot as (like I said) it's completely free (BSD license) and easy to use.
Is application freezing when you are running it in the Debug mode?
I have experienced a similar behavior myself and when testing outside VS environment (Debug and Release builds) the "hiccups" were gone.
I've been experiencing a high degree of flicker and UI lag in a small application I've developed to test a component that I've written for one of our applications. Because the flicker and lag was taking place during idle time (when there should--seriously--be nothing going on), I decided to do some investigating. I noticed a few threads in the Threads window that I wasn't aware of (not entirely unexpected), but what caught my eye was one of the threads was set to Highest priority. This thread exists at the time Main() is called, even before any of my code executes. I've discovered that this thread appears to be present in every .NET application I write, even console applications.
Being the daring soul that I am, I decided to freeze the thread and see what happened. The flickering did indeed stop, but I experienced some oddness when it came to doing database interaction (I'm using SQL CE 3.5 SP1). My thought was that this might be the thread that the database is actually running on, but considering it's started at the time the application loads (before any references to the DB) and is present in other, non-database applications, I'm inclined to believe this isn't the case.
Because this thread (like a few others) shows up with no data in the Location column and no Call Stack listed if I switch to it in the debugger while paused, I tried matching the StartAddress property through GetCurrentProcess().Threads for the corresponding thread, but it falls outside all of the currently loaded modules address ranges.
Does anyone have any idea what this thread is, or how I might find out?
Edit
After doing some digging, it looks like the StartAddress is in kernel32.dll (based upon nearby memory contents). This leads me to think that this is just the standard system function used to start the thread, according to this page, which basically puts me back at square one as far as determining where this thread actually comes from. This is further confirmed by the fact that ALL of the threads in this list have the same value for StartAddress, leading me to ask exactly what the purpose is...?
Edit 2
Process Explorer let me to an actually meaningful start address. It looks like it's mscorwks.dll!CreateApplicationContext+0xbbef. This dll is in %WINDOWS%\Microsoft.NET\Framework\v2.0.50, so it looks like it's clearly a runtime assembly. I'm still not sure why
it's Highest priority
it appears to be causing hiccups in my application
You could try using Sysinternals. Process Explorer let's you dig in pretty deep. Right click on the Process to access Properties. Then "Threads" tab. In there, you can see the thread's stack and module.
EDIT:
After asking around some, it seems that your "Highest" priority thread is the Finalizer thread that runs due to a garbage collection. I still don't have a good reason as to why it would constantly keep running. Maybe you have some funky object lifetime behavior going on in your process?
I'm not sure what this is, but if you turn on unmanaged debugging, and set up Visual Studio with the Windows symbol server, you might get some more clues.
Might be the Garbage Collector thread. I noticed it too when I was once investigating a finalizer-related bug. Perhaps your system memory is low and the GC is trying to collect all the time? This was the case in the previously mentioned bug too. I couldn't reproduce it on my machine, but a co-worker of mine had a machine with less RAM where it would reappear like clockwork.
I know this is probably the canonical "It depends..." question but I'd appreciate any pointers as to where to start looking.
I have a client/server app talking over ethernet. In one computer I run the server and a client and on another just the client. One runs Vista and one runs XP. After an uptime of about 3 weeks the entire computer freezes and nothing works, not mouse, not keyboard, nothing -just power off. Every ten seconds the server sends a ping message to see if the clients are alive, other than that just a few small messages go back and forth every day.
I'm trying to find out if it's me causing it or something else. I've started a session and after a few days I thought I'd check for strange increases in memory use but beyond that I have very few ideas.
Some thoughts to consider:
You know the computer doesn't respond, but that doesn't mean it's hung. Does it respond to a ping?
Maybe the disk activity light is on all the time?
You say "no keyboard" - do you mean no caps lock or num lock lights?
Although the .NET application may be the only one you're running at the time, that does not imply it is the cause of the problem. Some background job could be doing it.
For example, I notice that Retrospect backup, when it is creating a snapshot, freezes the entire system for 10-15 minutes. I mean, no caps lock, the clock in the task bar doesn't update, no CTRL-ALT-DEL, can't type into an "Answer" text box in SO, nothing. It had nothing to do with what I was doing at the time, which was answering a question on SO.
After it came back, SO asked if I was a human. My feelings were hurt. ;-)
You could attach a kernel debugger to the OS. That way you should be able to inspect the state of the OS and your process even if the OS is completely unresponsive. (Unfortunately, it's a lot harder than just hitting "break" in VS. I suggest reading John Robbin's "Debugging Applications for .NET and Windows" before trying that.)
You could also try to create memory dumps of your application in regular intervals. You might have to do a little scripting for that, though. (usually, you'd create a dump with a keystroke, using a tool like userdump or adplus, but if the OS is not responding to keystrokes, that won't work.) That way, you know what state your process is in during or shortly before a hang.
This page: http://blogs.msdn.com/debuggingtoolbox/default.aspx is a good starting point for scripting WinDbg. (If you don't know what to do with a memory dump, I'd again suggest John Robbin's excellent book on debugging!)
Other than that, I can only think of standard debugging tricks: does the problem occur on every PC? Does it happen if there are no client requests? Does it happen sooner if there are more client requests? Does it happen sooner if there is less available physical memory? Try removing parts of your application (maybe on a separate server for testing) and see if the problem still occurs, and so on. Try running it in a VM so you can see if it uses the CPU, harddisk, or network during those "hangs".
This isn't going to be the answer, but I'd advise starting by checking your OS event logs and running a perfmon to keep track of memory, cpu usage etc.
Which computer freezes, the server or client? And what OSes are they running respectively?
As Daniel L noted, tight polling loops can really kill the CPU. If you can, change your code to use event handlers, it's a much more robust solution.
Finally, are you certain there's not a hardware problem on the freezing computer?