C# Possible Memory Leak?

So, I have an app, written in C# (VS2010), performing OCR using the Tesseract 3.02 DLL and Charles Weld's Tesseract .NET wrapper.
I think I have a memory leak, and it seems to be in the area of code where the Pix object is allocated. I am taking a PDF, converting it to a grayscale PNG, then loading that into a Pix object for OCR. When it works, it works really well. The image is large in dimensions (5100 or so pixels in each dimension) but not large on disk (only 500 KB or so).
My code:
Init engine at app startup:
private TesseractEngine engine = new TesseractEngine(@"./tessdata/", "eng+fra", EngineMode.Default);
A method converts the PDF to a PNG, then calls:
// Load the image file created earlier into a Pix object.
Pix pixImage = Pix.LoadFromFile(Path.Combine(textBoxSourceFolder.Text, sourceFile));
And then calls the following:
// Perform OCR on the image referenced in the Pix object.
private String PerformImageOCR(Pix pixImage)
{
    int safety = 0;
    do
    {
        try
        {
            // Deskew the image.
            pixImage = pixImage.Deskew();
            //pixImage.Save(@"c:\temp\img_deskewed.png", ImageFormat.Png); // Debugging - verify image deskewed properly to allow good OCR.
            string text = "";
            // Use the tesseract OCR engine to process the image
            using (var page = engine.Process(pixImage))
            {
                // and then extract the text.
                text = page.GetText();
            }
            return text;
        }
        catch (Exception e)
        {
            MessageBox.Show(string.Format("There was an error performing OCR on image, Retrying.\n\nError:\n{0}", e.Message), "Error", MessageBoxButtons.OK);
        }
    } while (++safety < 3);
    return string.Empty;
}
I have observed that memory usage jumps by about 31 MB when the Pix object is created, jumps again while OCR is being performed, then finally settles about 33 MB higher than before it started. That is, if the app was consuming 50 MB after loading, loading the Pix object takes it to about 81 MB. Performing OCR sees it spike to 114+ MB, then, once the process is complete and the results are saved, the memory usage settles at about 84 MB. Repeating this over many files in a folder will eventually cause the app to barf at 1.5 GB or so consumed.
I think my code is okay, but there's something somewhere that's holding onto resources.
The Tesseract and Leptonica DLLs are written in C, and I have recompiled them with VS2010 along with the latest or recommended image library versions, as appropriate. What I'm unsure of is how to diagnose a memory leak in a C DLL from a C# app using Visual Studio. If I were using Linux, I'd use a tool such as valgrind to help me spot the leak, but my leak-sniffing skills on the Windows side are sadly lacking. Looking for advice on how to proceed.

Reading your code here, I do not see you disposing your Pix pixImage anywhere. That's what is taking up all the resources when you are processing x images.
Before you return your string result you should call the Dispose method on your pixImage. That should reduce the amount of resources used by your program.
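A minimal sketch of that pattern (retry loop omitted for brevity), assuming Pix implements IDisposable in this wrapper and that Deskew() returns a new Pix rather than modifying the one passed in, so the deskewed copy needs disposing as well:
private String PerformImageOCR(Pix pixImage)
{
    try
    {
        // Deskew() allocates a second Pix; dispose it independently of the original.
        using (Pix deskewed = pixImage.Deskew())
        using (var page = engine.Process(deskewed))
        {
            return page.GetText();
        }
    }
    finally
    {
        // Release the unmanaged Leptonica image the caller handed in.
        pixImage.Dispose();
    }
}
Note that the original code's pixImage = pixImage.Deskew() overwrites the only reference to the pre-deskew image, so that copy could never be disposed at all.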

I'm not familiar with Tesseract or the wrapper, but for memory profiling issues, if you have Visual Studio 2012/2013, you can use the Performance Wizard. I know it's available in Ultimate, but I'm not sure about other editions.
http://blogs.msdn.com/b/dotnet/archive/2013/04/04/net-memory-allocation-profiling-with-visual-studio-2012.aspx
Either something in your code or something in the wrapper is not disposing of an unmanaged object properly. My guess is that it's in the wrapper. Running the Performance Wizard or another C# memory profiler (like JetBrains dotTrace) may help you track it down.

Related

Memory leak analysis and help requested

I've been using the methodology outlined by Shivprasad Koirala to check for memory leaks from code running inside a C# application (VoiceAttack). It basically involves using the Performance Monitor to track an application's private bytes as well as bytes in all heaps and compare these counters to assess if there is a leak and what type (managed/unmanaged). Ideally I need to test outside of Visual Studio, which is why I'm using this method.
The following portion of code generates the below memory profile (bear in mind the code has a little different format compared to Visual Studio because this is a function contained within the main C# application):
public void main()
{
    string FilePath = null;
    using (FileDialog myFileDialog = new OpenFileDialog())
    {
        myFileDialog.Title = "this is the title";
        myFileDialog.FileName = "testFile.txt";
        myFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
        myFileDialog.FilterIndex = 1;
        if (myFileDialog.ShowDialog() == DialogResult.OK)
        {
            FilePath = myFileDialog.FileName;
            var extension = Path.GetExtension(FilePath);
            var compareType = StringComparison.InvariantCultureIgnoreCase;
            if (extension.Equals(".txt", compareType) == false)
            {
                FilePath = null;
                VA.WriteToLog("Selected file is not a text file. Action canceled.");
            }
            else
                VA.WriteToLog(FilePath);
        }
        else
            VA.WriteToLog("No file selected. Action canceled.");
    }
    VA.WriteToLog("done");
}
You can see that after running this code the private bytes don't come back to the original count and the bytes in all heaps are roughly constant, which implies that there is a portion of unmanaged memory that was not released. Running this same inline function a few times consecutively doesn't cause further increases to the maximum observed private bytes or the unreleased memory. Once the main C# application (VoiceAttack) closes all the related memory (including the memory for the above code) is released. The bad news is that under normal circumstances the main application may be kept running indefinitely by the user, causing the allocated memory to remain unreleased.
For good measure I threw this same code into VS (with a pair of Thread.Sleep(5000) added before and after the using block for better graphical analysis) and built an executable to track with the Performance Monitor method, and the result is the same. There is an initial unmanaged memory jump for the OpenFileDialog and the allocated unmanaged memory never comes back down to the original value.
Does the memory and leak tracking methodology outlined above make sense? If YES, is there anything that can be done to properly release the unmanaged memory?
Does the memory and leak tracking methodology outlined above make sense?
No. You shouldn't expect unmanaged committed memory (private bytes) to always be released. For instance, processes have an unmanaged heap, which retains freed blocks to serve subsequent allocations. And since Windows can page out your committed memory, it isn't critical to minimize each process's committed memory.
If repeated calls don't increase memory use, you don't have a memory leak, you have delayed initialization. Some components aren't initialized until you use them, so their memory usage isn't being taken into account when you establish your baseline.
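One way to separate one-time initialization cost from a genuine leak is to exercise the suspect code once before taking the baseline, then watch whether repeated iterations keep growing. A sketch (this reads the process's private bytes via Process.PrivateMemorySize64; adapt it to whichever counter you are tracking):
static long PrivateBytes()
{
    using (var p = System.Diagnostics.Process.GetCurrentProcess())
        return p.PrivateMemorySize64; // committed private memory, managed + unmanaged
}

static void CheckForLeak(Action suspect)
{
    suspect(); // warm-up: pay any lazy initialization cost up front
    long baseline = PrivateBytes();
    for (int i = 0; i < 100; i++)
        suspect(); // a true leak grows roughly linearly with iterations
    long delta = PrivateBytes() - baseline;
    Console.WriteLine(string.Format("Delta after 100 iterations: {0} bytes", delta));
}
A flat delta after the warm-up run points at delayed initialization rather than a leak.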

Trying to load 150+ grayscale 4096 x 4096 bitmaps. Need help getting around the 2GB limit, I think

Solved: I assumed, incorrectly, that Visual Studio 2012 defaulted to building 64-bit apps when making new projects. Under the project properties' Build tab, there is a checkbox marked "Prefer 32-bit". When I unchecked this checkbox and rebuilt my solution, I was able to load and process over 200 bitmaps at a time. Thanks to Mike Trusov for so politely pointing out the solution to me.
Original question: I've run into a small problem. I have more than 150 grayscale bitmaps that I want to load into RAM on an 8 GB system, but I can't seem to get past 50 or so without throwing an exception. I've tried loading the bitmaps into an array, which is a lost cause, of course, because .NET has a 2 GB per-object limit. But it failed long before 2 GB worth of bitmaps were loaded. So I tried loading them into a List<List<Bitmap>>, but even that fails at about the same place. At the moment of the exception, Task Manager says I have 3939 MB of available RAM waiting to be filled. Very strange.
I don't need these bitmaps to be contiguous in RAM. They can be a scattered bunch of 16 MB allocations for all I care. I just want to be able to fill available RAM with a bunch of bitmaps. That's all.
The exception has at various times been an OutOfMemoryException or an ArgumentException, depending on how much available RAM I had when I started the program. In either case, the stack trace dies inside System.Drawing.Bitmap..ctor(String filename). There is nothing wrong with the specific file being loaded at the time of the exception. In fact, when I have it load a different (or even overlapping) set of bitmaps, the error occurs at the same iteration.
Does anyone have a clue they can lend me on how to do this? Am I running into the .NET 2GB limit in some strange way?
To respond to a few questions and comments: I'm using Visual Studio 2012, .NET 4.5, 64-bit Windows 7, on a computer with 8 GB of RAM. Yes, I need all those bitmaps in RAM at the same time for a variety of reasons (performance, image processing reasons, etc). I have pondered using gcAllowVeryLargeObjects, but I don't need or want all my bitmaps in a long chunk of contiguous memory. I would much rather each Bitmap used its own separate memory allocation. Besides, if I had a machine with 64 GB of RAM, it would be absurd to be limited to even 150 Bitmaps of that size. Why won't these bitmaps load without throwing an OutOfMemoryException?
To me, it seems that .NET is trying to keep all Bitmaps in a single 2 GB region. If there was a way to get each Bitmap to (saying more than I know here) have its own separate address space, that might solve the problem. To invoke the language of the long ago days of MS-DOS, I want to allocate and access far memory using a long pointer, not have all my data stuck in a single near segment.
Here is the array code:
List<String> imageFiles; // List of .bmp filenames.
Bitmap[] bmps = new Bitmap[100]; // Stores/accesses the Bitmaps.
private void goButton_Click(object sender, EventArgs e)
{
int i;
// Load the bitmaps
if (bmps[0] == null)
{
// Load the list of bitmap files.
imageFiles = Directory.EnumerateFiles(#"C:\Temp", "*.bmp", SearchOption.TopDirectoryOnly).ToList();
// Read bitmap files
for (i = 0; i < bmps.Length; ++i)
{
bmps[i] = new Bitmap(imageFiles[i]); // <-- Exception occurs here when i == 52 or so.
}
}
}
Here is the List<List<Bitmap>> code:
List<String> imageFiles; // List of .bmp filenames.
List<List<Bitmap>> bmps = new List<List<Bitmap>>(100); // Stores/accesses the Bitmaps.
private void goButton_Click(object sender, EventArgs e)
{
    int i;
    // Load the bitmaps
    if (bmps.Count == 0)
    {
        // Load the list of bitmap files.
        imageFiles = Directory.EnumerateFiles(@"C:\Temp", "*.bmp", SearchOption.TopDirectoryOnly).ToList();
        // Read bitmap files
        for (i = 0; i < 100; ++i)
        {
            // Load the bitmap into temporary Bitmap b.
            Bitmap b = new Bitmap(imageFiles[i]); // <-- Exception occurs here when i == 52 or so.
            // Create and add a List<Bitmap> that will receive the clone of Bitmap b.
            bmps.Add(new List<Bitmap>(1));
            // Clone Bitmap b and add the cloned Bitmap to the List<Bitmap>.
            bmps[i].Add((Bitmap)b.Clone());
            // Dispose Bitmap b.
            b.Dispose();
        }
    }
}
There should be no issue loading more than 2 GB of bitmaps into memory in a 64-bit app running on a 64-bit OS (I'm guessing you are). Check your project settings; you might have to create a new configuration for Any CPU based on the x86 one. Also, a simple list should work:
var imageFiles = Directory.EnumerateFiles(@"C:\Temp", "*.bmp", SearchOption.TopDirectoryOnly).ToList();
var lst = new List<Bitmap>();
foreach (var imageFile in imageFiles)
{
    lst.Add(new Bitmap(imageFile));
}
Do they ALL have to be loaded at the same time? Could you load, say, 20 of them, then while you are processing or displaying those, have a background thread prep the next 20? A rough sketch of that idea follows.
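Here is a sketch of the batching approach (ProcessBitmap is a placeholder for whatever per-image work you do; the background-thread prefetch is left out for brevity):
const int BatchSize = 20;
var files = Directory.EnumerateFiles(@"C:\Temp", "*.bmp", SearchOption.TopDirectoryOnly).ToList();
for (int start = 0; start < files.Count; start += BatchSize)
{
    // Load one window of bitmaps...
    var batch = files.Skip(start).Take(BatchSize).Select(f => new Bitmap(f)).ToList();
    foreach (var bmp in batch)
    {
        ProcessBitmap(bmp); // hypothetical per-image work
        bmp.Dispose();      // release GDI+ resources before the next window
    }
}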

Reading from PackagePart stream does not release memory

In our application, we are reading an XPS file using the System.IO.Packaging.Package class. When we read from a stream of a PackagePart, we can see from the Task Manager that the application's memory consumption rises. However, when the reading is done, the memory consumption doesn't fall back to what it was before reading from the stream.
To illustrate the problem, I wrote a simple code sample that you can use in a standalone WPF application.
public partial class Window1 : Window
{
    public Window1()
    {
        InitializeComponent();
        _package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None);
    }

    private void ReadPackage()
    {
        foreach (PackagePart part in _package.GetParts())
        {
            using (Stream partStream = part.GetStream())
            {
                byte[] arr = new byte[partStream.Length];
                partStream.Read(arr, 0, (int)partStream.Length);
                partStream.Close();
            }
        }
    }

    Package _package;

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        ReadPackage();
    }
}
The ReadPackage() method reads all the PackagePart objects' stream contents into a local array. In the sample, I used a 1000-page XPS document as the package source in order to easily see the application's memory consumption change. On my machine, the standalone app's memory consumption starts at 18 MB, then rises to 100 MB after calling the method. Calling the method again can raise the memory consumption further, but it falls back to 100 MB. However, it never falls back to 18 MB.
Has anyone experienced this while using PackagePart? Or am I using it wrong? I think the internal implementation of PackagePart is caching the data that was read.
Thank you!
You do not specify how you measure the "memory consumption" of your application, but perhaps you are using Task Manager? To get a better view of what is going on, I suggest you examine some performance counters for your application. Both .NET heap and general process memory performance counters are available.
If you really want to understand the details of how your application uses memory you can use the Microsoft CLR profiler.
What you see may be a result of the .NET heap expanding to accommodate a very large file. Big objects are placed on the Large Object Heap (LOH), and even when .NET memory is garbage collected, the freed memory is not necessarily returned to the operating system. Also, objects on the LOH are never moved around during garbage collection, and this may fragment the LOH, exhausting the available address space even though there is plenty of free memory.
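For reference, the LOH cutoff is roughly 85,000 bytes per object; a quick sketch of the difference:
byte[] small = new byte[80 * 1024];  // below ~85,000 bytes: normal heap, compacted by the GC
byte[] large = new byte[100 * 1024]; // above the threshold: Large Object Heap, not compacted
Console.WriteLine(GC.GetGeneration(large)); // prints 2 - the LOH is collected with gen 2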
Has anyone experienced this while using PackagePart? Or am I using it wrong?
If you want to control the resources used by the package, you are not using it in the best way. Packages are disposable, and in general you should use one like this:
using (var package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
    // ... process the package
}
At the end of the using statement resources consumed by the package are either already released or can be garbage collected.
If you really want to keep the _package member of your form, you should at some point call Close() (or IDisposable.Dispose()) to release the resources. Calling GC.Collect() is not recommended and will not necessarily be able to reclaim the resources used by the package. Any managed memory (e.g. package buffers) that is reachable from _package will not be garbage collected no matter how often you try to force a garbage collection.
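For example (a sketch, assuming the window keeps the field for its whole lifetime), closing the package when the window closes lets its buffers become collectible:
protected override void OnClosed(EventArgs e)
{
    base.OnClosed(e);
    if (_package != null)
    {
        _package.Close(); // releases the file handle and lets internal part buffers be collected
        _package = null;
    }
}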

Is passing System.Drawing.Bitmap across class libraries unreliable?

I have a 3rd-party DLL which generates a Bitmap and sends back its reference. If I generate a System.Windows.Media.Imaging.BitmapSource out of it immediately, then all goes well. But if I save the reference and later on (after a few seconds and many function calls) try to generate the BitmapSource, I get
System.AccessViolationException was unhandled by user code
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source="System.Drawing"
when doing:
System.Windows.Media.Imaging.BitmapSource bitmapSource =
    System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap(
        bmp.GetHbitmap(),
        IntPtr.Zero,
        Int32Rect.Empty,
        System.Windows.Media.Imaging.BitmapSizeOptions.FromEmptyOptions());
Any clues on what's going wrong here? Any pointers will be useful. Thanks.
I think this indicates that the handle (a reference to a resource managed by the operating system rather than by .NET) returned by bmp.GetHbitmap() is no longer valid - possibly a Dispose has been called somewhere or something like that (not necessarily by your code, though).
I'd recommend using another way of persisting the bitmap data that does not rely on handles - possibly stream out the binary data of the bitmap itself immediately, and then pass a reference to that around.
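A sketch of that approach: serialize the Bitmap to a managed byte array immediately, while the handle is still valid, and rebuild a BitmapSource from the bytes later (the method names here are illustrative):
byte[] SnapshotBitmap(System.Drawing.Bitmap bmp)
{
    using (var ms = new System.IO.MemoryStream())
    {
        bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Png);
        return ms.ToArray(); // plain managed bytes, no OS handle involved
    }
}

System.Windows.Media.Imaging.BitmapSource RebuildSource(byte[] data)
{
    var img = new System.Windows.Media.Imaging.BitmapImage();
    img.BeginInit();
    img.CacheOption = System.Windows.Media.Imaging.BitmapCacheOption.OnLoad; // decode now, so the stream can go away
    img.StreamSource = new System.IO.MemoryStream(data);
    img.EndInit();
    img.Freeze(); // safe to hand across threads
    return img;
}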
I had a big problem with Bitmaps and access violations as well. What I believe happens is that certain Bitmap constructors leave file handles open when they should not. Thus, the program you are running detects that the files are in use when they shouldn't be.
I eventually figured out a solution in that I make a copy of the original bitmap and then dispose the original. Here is my code, which preserves the resolution of the original Bitmap:
Bitmap Temp = new Bitmap(inFullPathName);
Bitmap Copy = new Bitmap(Temp.Width, Temp.Height);
Copy.SetResolution(Temp.HorizontalResolution, Temp.VerticalResolution);
using (Graphics g = Graphics.FromImage(Copy))
{
    g.DrawImageUnscaled(Temp, 0, 0);
}
Temp.Dispose();
return Copy;
Obviously, for the first line, yours would be Bitmap Temp = MyThirdPartyDLL.GetBitmap(); or something. If you don't care about the resolution, it can be simplified to:
Bitmap Temp = MyThirdPartyDLL.GetBitmap();
Bitmap Copy = new Bitmap(Temp, Temp.Width, Temp.Height);
Temp.Dispose();
return Copy;
After making this change, I was able to do all kinds of file I/O, etc., perfectly fine; I hope you can do the same.

GDI+ exception saving a Bitmap to a MemoryStream

I have a problem in a Windows Forms application where Bitmap.Save fails when saving to a MemoryStream. The problem only seems to occur intermittently on one machine (so far), and the bad news is that it's at a customer site. I can't debug on that machine, but I got a stack trace that narrowed the problem down to a single line of code.
Here's a condensed version of my code:
byte[] ConvertPixelsToBytes()
{
    // imagine a picture class that creates a 24 bpp image, and has
    // a method to get an unmanaged pixel buffer.
    // imagine it also has methods for getting the width,
    // height, pitch

    // I suppose this line could return a bad address, but
    // I would have expected that the Bitmap constructor would have
    // failed if it was
    System.IntPtr pPixels = picture.GetPixelData();

    System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(
        picture.width(),
        picture.height(),
        picture.pitch(),
        System.Drawing.Imaging.PixelFormat.Format24bppRgb,
        pPixels);

    // This line doesn't actually free the memory, but it could be freed in a
    // background thread
    // (2)
    picture.releasePixelData(pPixels);

    System.IO.MemoryStream memStream = new System.IO.MemoryStream();
    try
    {
        // I don't see how this line could fail, but it does
        // (3)
        bmp.Save(memStream, System.Drawing.Imaging.ImageFormat.Bmp);
        return memStream.ToArray();
    }
    catch (System.Runtime.InteropServices.ExternalException e)
    {
        // e.Message is the very helpful "A generic error occurred in GDI+."
    }
    finally
    {
        memStream.Dispose();
    }
    return new byte[0];
}
Any idea what might be going on? I'm pretty sure my pixel buffer is right; it always works on our dev/test machines and at other customer sites.
My thoughts on possible reasons for failure are:
a. The Bitmap constructor doesn't copy the pixel data but keeps a reference to it, and the Save fails because the memory has been released. I don't find the MSDN docs clear on this point, but I assume that the Bitmap copies the pixel data rather than assuming the buffer stays locked.
b. The pixel data is invalid and causes the Save method to fail. I doubt this, since my pixel data is 24 bits per pixel, so as far as I know it should not be invalid.
c. There's a problem with the .NET framework.
I would appreciate any thoughts on other possible failure reasons so I can add extra checks and logging information to my app so I can send something out into the field.
The MSDN docs for that Bitmap constructor leave no doubt whatsoever:
Remarks
The caller is responsible for allocating and freeing the block of memory specified by the scan0 parameter, however, the memory should not be released until the related Bitmap is released.
Have you tried moving
picture.releasePixelData(pPixels);
to
finally
{
    memStream.Dispose();
    picture.releasePixelData(pPixels);
}
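Spelled out as a full sketch (using the same hypothetical picture object from the question), this keeps the pixel buffer alive until both the Bitmap and the stream are finished, and also disposes the Bitmap itself, which the original code never did:
byte[] ConvertPixelsToBytes()
{
    System.IntPtr pPixels = picture.GetPixelData();
    try
    {
        using (var bmp = new System.Drawing.Bitmap(
            picture.width(), picture.height(), picture.pitch(),
            System.Drawing.Imaging.PixelFormat.Format24bppRgb, pPixels))
        using (var memStream = new System.IO.MemoryStream())
        {
            bmp.Save(memStream, System.Drawing.Imaging.ImageFormat.Bmp);
            return memStream.ToArray();
        }
    }
    finally
    {
        // Only now is it safe to free the unmanaged buffer backing the Bitmap,
        // per the MSDN remark quoted above.
        picture.releasePixelData(pPixels);
    }
}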
It definitely sounds like a threading issue (especially since you state that releasePixelData could happen on a background thread). Threading issues are always the ones that only happen on one machine, and it is always the client's machine (probably because they only have 256 MB of memory or something ridiculous and the garbage collector kicks in early, or the machine is quad-core and your developer machine is dual-core, or something).
