How to investigate an OutOfMemoryException in production - C#

We have a WCF service developed in C# running in a production environment where it crashes every few hours with no observable pattern. Memory usage will hover at ~250 MB for a while, then all of a sudden memory usage starts going up until it crashes with an OutOfMemoryException at 4 GB (it's a 32-bit process).
We are having a hard time identifying the problem: the exceptions we log come from different places in the code, presumably from whichever request happens to be allocating memory when the limit is reached.
We took a memory dump while the process was at 4 GB, and a list of ~750k database objects is in memory when the crash occurs. We have looked at the queries behind those objects but can't pinpoint the one that loads the entire table. The service makes calls to the database using EF6.
Another thing to note: this problem has never occurred in our preproduction environment, even though the preproduction database has enough data for it to happen if the entire table were being loaded there as well. It's probably a specific call with a specific parameter that triggers the issue, but we can't pinpoint it.
I am out of ideas about what to try next. Is there a tool that can help us in this situation?
Thanks

If you want to capture all your SQL and are using Entity Framework, you can print out queries like this:
Context.Database.Log = s => Debug.Print(s);
If you mess around with that a bit you can get it to output to a variable and save the result to a text file or the database. You would have to wrap it around all DB calls; not sure how big your project is?
Context.Database.Log = null;
turns it off
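
For example, here is a minimal sketch of sending that log output to a text file instead of the debugger; MyContext and the log path are placeholders, not from the original post:

using System;
using System.Data.Entity;
using System.IO;

// Minimal sketch: write every SQL statement EF6 generates to a text file.
public class SqlLogger : IDisposable
{
    private readonly DbContext _context;
    private readonly StreamWriter _writer;

    public SqlLogger(DbContext context, string path)
    {
        _context = context;
        _writer = new StreamWriter(path, append: true);
        _context.Database.Log = s => _writer.Write(s);  // capture generated SQL
    }

    public void Dispose()
    {
        _context.Database.Log = null;                   // turn logging off
        _writer.Dispose();
    }
}

// Usage (hypothetical context and query):
// using (var context = new MyContext())
// using (new SqlLogger(context, @"C:\logs\ef-sql.log"))
// {
//     var rows = context.Orders.Where(o => o.Total > 100).ToList();
// }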

Related

Strategies for Diagnosing Memory Leak in C# program updating an Azure Table

I have a fairly simple application that takes a data file with about 500k lines of data in it, parses the data, organizes it and then inserts it into an Azure Table. There are about 2000 of these files and I need the process to work smoothly as I load all of the data.
I am using WindowsAzure.Storage v5.0.2 for inserting the data and Microsoft.tpl.dataflow v4.4.24 for parallelization. Each file is completely processed and all tasks finalized before I move onto the next file. I am also disposing of all objects I can and setting everything else to Null at the end of each file load.
Despite trying to be as careful as possible, the RAM usage goes up steadily until it crashes the process. When it starts it jumps up to 1 GB of RAM used and steadily climbs until the process crashes somewhere above 9 GB of RAM consumed.
Note - this is targeting x64 on a reasonably large computer. Garbage collection is happening on a regular basis, but it doesn't seem to affect the memory problem.
At this point I am completely confused about where the memory leak is coming from and don't have any idea about how to diagnose the problem.
Update
After a lot of work and following the suggestion below, I found out that the parallelization I was using was allowing more simultaneous insert processes than I expected. It looked like the insert was complete and my code was starting the next insert. In reality the parallel process had just reported back a status but had not finished. This led to a huge backlog of simultaneous inserts happening, chewing up RAM and crashing the system. It also caused me to go past my IOPS limit, which I believe triggered throttling, compounding the problem.
Figuring this out required a huge amount of work and many different ways of analysing everything, but the suggestion below got me going in the right direction.
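For illustration, here is a minimal sketch (not the poster's actual code) of an ActionBlock with bounded parallelism and capacity, so inserts cannot pile up faster than they finish; InsertBatchAsync stands in for the real Azure Table insert:

using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedInsertExample
{
    // Placeholder for the real Azure Table insert call.
    private static Task InsertBatchAsync(IList<string> batch)
    {
        return Task.Delay(10);
    }

    public static async Task ProcessFileAsync(IEnumerable<IList<string>> batches)
    {
        var insertBlock = new ActionBlock<IList<string>>(
            batch => InsertBatchAsync(batch),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 8, // cap concurrent inserts
                BoundedCapacity = 16        // back-pressure when the queue is full
            });

        foreach (var batch in batches)
            await insertBlock.SendAsync(batch); // waits while the block is full

        insertBlock.Complete();
        await insertBlock.Completion;           // only now is every insert truly finished
    }
}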
Doing a search for 'troubleshoot .net memory leak' will yield lots of results, but probably the best one is http://blogs.msdn.com/b/tess/archive/2009/05/12/debug-diag-script-for-troubleshooting-net-2-0-memory-leaks.aspx.
Basically, use DebugDiag to generate a memory leak analysis report and then look for which objects are consuming all of the memory. You will likely see one type of object that you didn't realize you were continuously adding to a collection without removing it later on.

Cannot open the shared memory region error

I have a user reporting this error when they're using my application.
The application is a .NET Winforms application running on Windows XP Embedded, using SQL Server CE 3.5 SP1, and Linq-To-SQL as the ORM. The database itself is located in a subdirectory my application creates in the My Documents folder. The user account is an administrator account on the system. There are no other applications or processes connecting to the database.
For the most part, the application seems to run fine. It starts up, can load data from and save data to the database. The user is using the application to access the database maybe a couple hundred times a day. They get this error, but only intermittently. Maybe 3-4 times a day.
In the code itself, all of the calls to the database are using a Linq-To-SQL data context that's wrapped in a using clause. So in other words:
using (MyDataContext db = new MyDataContext(ConnectionString))
{
    List<blah> someList = db.SomeTable.Where(/* selection criteria */).ToList();
    return someList;
}
That's what pretty much all of the calls to the database look like (with the exception that the ones that save data obviously aren't selecting and returning anything). As I mentioned before, they have no issue 99% of the time but only get the shared memory error a few times a day.
My current "fix" is on application startup I simply read all of the data out of the database (there's not a lot) and cache it in memory and converted my database calls to read from the in-memory lists. So far, this seems to have fixed the problem. For a day and a half now they've reported no problems. But this is still bugging me, because I don't know what would cause the error in the first place.
While the application is accessing the database a few hundred times a day, it's typically not in rapid-fire succession. It's usually once every few minutes at the least. However, there is one use-case where there might be two calls one right after the other, as fast as possible. In other words, something like:
// user makes a selection on the screen
DatabaseCall1();
DatabaseCall2();
Both of those would follow the pattern in the code block above where they create a new context, do work, and then return. But these calls aren't asynchronous, so I would expect the connection would be closed and disposed of before DatabaseCall2 is invoked. However, could it be that something on the SQL Server CE end isn't closing the connection fast enough? It might explain why it's intermittent since maybe most of the time it doesn't have a problem? I should also mention that this exact program without the fix is installed on a few other systems with the exact same hardware and software (they're clones of each other), and users of the other systems have not reported any errors.
I'm stuck scratching my head because I can't reproduce this error on my development machine or a test machine, and answers to questions about this exception here and other places typically revolve around insufficient user permissions or the database on a shared network folder.
Check this previous post, I think you will find your answer:
SQL Server CE - Internal error: Cannot open the shared memory region

Pass large amounts of data between app domains quickly

I have an application used to import a large dataset (millions of records) from one database to another, doing a diff in the process (i.e. removing things that were deleted, updating things, etc.). Due to the many foreign key constraints, and to try to speed up processing, the application loads the entire destination database into memory, then loads parts of the source database and does an in-memory compare, updating the destination in memory as it goes. In the end it writes these changes back to the destination. The databases do not match one to one, so a single table in one may be multiple tables in the other, etc.
So to my question: it currently takes hours to run this process (sometimes close to a day depending on the amount of data added/changed) and this makes it very difficult to debug. Historically, when we encounter a bug, we have made a change, and then rerun the app which has to load all of the data into memory again (taking quite some time) and then run the import process until we get to the part we were at and then we cross our fingers and hope our change worked. This isn't fun :(
To speed up the debugging process I am making an architectural change by moving the import code into a separate dll that is loaded into a separate appdomain, so that we can unload it, make changes, reload it, and try to run a section of the import again, picking up where we left off, to see if we get better results. I thought that I was a genius when I came up with this plan :) But it has a problem. I either have to load up all the data from the destination database into the second appdomain and then, before unloading, copy it all to the first using the [Serializable] approach (this is really, really slow when unloading and reloading the dll), or load the data in the host appdomain and reference it in the second using MarshalByRefObject (which, it seems, has turned out to make the whole process slow).
So my question is: How can I do this quickly? Like, a minute max! I would love to just copy the data as if it was just passed by reference and not have to actually do a full copy.
I was wondering if there was a better way to implement this so that the data could better be shared between the two or at least quickly passed between them. I have searched and found things recommending the use of a database (we are loading the data in memory to AVOID the database) or things just saying to use MarshalByRefObject. I'd love to do something that easy but it hasn't really worked yet.
I read somewhere that loading a C++ dll or unmanaged dll will cause it to ignore app domains and could introduce some problems. Is there any way I could use this to my advantage, i.e. load an unmanaged dll that holds my list for me or something, and use it to trick my application into using the same memory area for both appdomains, so that the lists just stick around when I unload the other dll by unloading the app domain?
I hope this makes sense. It's my first question on here so if I've done a terrible job do help me out. This has frustrated me for a few days now.
The app domains approach is a good way of separating things for the sake of loading/unloading only part of your application. Unfortunately, as you discovered, exchanging data between two app domains is not easy or fast. It is just like two different system processes trying to communicate, which will always be slower than communication within the same process. So the way to go is to use the quickest possible inter-process communication mechanism. Skip WCF, as it adds overhead you do not need here. Use named pipes, through which you can stream data very fast. I have used them before with good results. To go even faster you can try MemoryMappedFile (link), but that's more difficult to implement. Start with named pipes and, if that is too slow, go for memory-mapped files.
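For example, a bare-bones sketch of streaming a byte payload over a named pipe; the pipe name and payload are illustrative only:

using System.IO;
using System.IO.Pipes;

// Sending side: waits for a client and streams the data out.
public static class PipeSender
{
    public static void Send(byte[] payload)
    {
        using (var server = new NamedPipeServerStream("import-data", PipeDirection.Out))
        {
            server.WaitForConnection();
            server.Write(payload, 0, payload.Length);
        }
    }
}

// Receiving side: connects and reads everything into memory.
public static class PipeReceiver
{
    public static byte[] Receive()
    {
        using (var client = new NamedPipeClientStream(".", "import-data", PipeDirection.In))
        using (var buffer = new MemoryStream())
        {
            client.Connect();
            client.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}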
Even when using fast way of sending, you may hit another bottleneck - data serialization. For large amounts of data, standard serialization (even binary) is very slow. You may want to look at Google's protocol buffers.
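If you try that, protobuf-net is a commonly used .NET implementation; here is a minimal sketch, where the Record type and its fields are made up purely for illustration:

using System.Collections.Generic;
using System.IO;
using ProtoBuf; // protobuf-net package

[ProtoContract]
public class Record
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class RecordSerialization
{
    // The stream could be the named pipe from the previous sketch.
    public static void Write(Stream stream, List<Record> records)
    {
        Serializer.Serialize(stream, records);
    }

    public static List<Record> Read(Stream stream)
    {
        return Serializer.Deserialize<List<Record>>(stream);
    }
}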
One word of caution on AppDomain - any uncaught exception in one of the app domains brings the whole process down. They are not that separated, unfortunately.
On a side note: I do not know what your application does, but millions of records does not seem that excessive. Maybe there is room for optimization?
You didn't say whether it was SQL Server, but did you look at using SSIS for this? There are evidently some techniques that can make it fast with big data.

Site dies after so long?

I have a website doing some things that I've never seen before. My server is Win 2003 w/ IIS6 I'm using C# and .Net 4.0.
The site is a real-estate website that stores the data directly in my db. The site will run great for a little while and then just die. What I mean is you'll try to view a property's details and it will take the site 2-3 minutes to load, if it loads at all. If I simply resave the web.config file and reupload it to restart the app, it runs just fine for a little while and then dies again. This continues over and over. I've gone to the local copy while the live site has "died" and the local copy will run just fine, and then it will die after a while as well. The time frame varies from 5 minutes to 30 minutes; I believe it has something to do with the number of requests.
Anyone have any clue as to what might be happening? The only data query on the page pulls the main data, using the LINQ query below:
public Listing GetListingByMLNumber(string MLNumber)
{
    try
    {
        DatabaseDataContext db = new DatabaseDataContext();
        var item = (from a in db.Listings
                    where a.ML_.ToLower() == MLNumber.ToLower()
                    select a).FirstOrDefault();
        return item;
    }
    catch (Exception ex)
    {
        Message = ex.Message;
        return null;
    }
}
Not closing the database context stands out as the obvious error in the code you provided. Wrap it in a using statement to be sure it gets disposed correctly.
As long as the context lives, you will hold on to a SQL connection, which is a limited resource. You will also waste memory by change-tracking the entities you returned. Given your code, the context should be garbage collected at some point, but it might still be the problem (and, whether or not this is the problem, you should dispose your database contexts).
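For example, the posted method rewritten with a using block (same logic, just making the disposal explicit):

public Listing GetListingByMLNumber(string MLNumber)
{
    try
    {
        // The context (and its SQL connection) is released when the using block exits.
        using (DatabaseDataContext db = new DatabaseDataContext())
        {
            return (from a in db.Listings
                    where a.ML_.ToLower() == MLNumber.ToLower()
                    select a).FirstOrDefault();
        }
    }
    catch (Exception ex)
    {
        Message = ex.Message;
        return null;
    }
}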
Try load testing locally to see if you can reproduce the problem. If you can, then use the debugger to figure out the problem. If not, you probably need to add logging to narrow down the problem.
You could also look at the IIS process to see if it uses absurd amounts of memory, handles, etc. Also check the IIS settings for performance and application pool recycling, as suggested in another answer here.
I would take a look at the application pool settings to see how the worker processes are being recycled, and I would also look under the Performance tab in IIS to see if there's a bandwidth threshold specified.
If you ever hit this type of problem again then you should add DebugDiag/ADPlus and WinDBG to your diagnostic toolbelt.
When your application hangs again or is taking an exceedingly long time to respond to requests then grab a dump of the worker process using DebugDiag or ADPlus. Load this up into WinDBG, load up SOS (Son of Strike) which is a WinDBG extension for managed code debugging and start digging around.
Tess Ferrandez has a great set of tutorials and labs on how to use these tools effectively:
.NET Debugging Demos - Information and setup instructions
They've gotten me out of a pickle several times, and it's well worth spending the time familiarising yourself with them.

NHibernate + ActiveRecord + PostgreSQL = memoryexception

I have a WinForms system in C# / .NET 2.0 with ActiveRecord + NHibernate communicating with a PostgreSQL 9 database.
When the user opens the system, it starts communication with the DB via a new SessionScope(). For some users it works perfectly, but for others the system generates a memory exception, identical to Marcio's problem in the MSDN forum: link.
How can I solve this problem? The problem is in NHibernate! The error occurs when I try to close the ISession object or when I try to Commit the transaction.
An underlying reason for an OutOfMemoryException can lie outside of the code that you posted. You simply have a memory leak, and it can be anywhere in your app. The exception will be thrown from the code that tries to allocate more memory, not necessarily from the code that causes the leak. Use a memory profiler to figure out what causes the memory leak.
It is very likely, however, that this issue is due to a bloated first-level cache in NHibernate. From the SessionScope documentation:
At the same time, NHibernate is keeping tracks of changes to objects within the scope. If there are too many objects and too many changes to keep track, then performance will slowly downgrade. So a flushing now and then will be required.
Get rid of GC calls, you don't need them.
Limit the scope of the session
Flush/Clear the session periodically (see the sketch after this list)
Make sure that you use lazy loading appropriately (don't load information you don't need from database)
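
A rough sketch of periodic flushing when saving many entities in one session; the batch size and the way the session is obtained are assumptions, not from the original post:

using System.Collections.Generic;
using NHibernate;

public static class BatchSaver
{
    // Flush and clear the first-level cache every N entities so the session
    // does not keep every tracked object in memory.
    public static void SaveAll(ISession session, IEnumerable<object> entities)
    {
        int count = 0;
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (object entity in entities)
            {
                session.Save(entity);
                if (++count % 100 == 0)
                {
                    session.Flush(); // push pending changes to the database
                    session.Clear(); // drop tracked objects from the session cache
                }
            }
            tx.Commit();
        }
    }
}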
