How to implement my own byte array creation and disposal

BACKGROUND:
In running my app through a profiler, it looks like the hotspots are all involved in allocating a large number of temporary new byte[] arrays.
In one run under CLR Profiler, a short stretch (3-5 seconds' worth of CPU time outside the profiler) produced over a gigabyte of garbage, the majority of it byte[] allocations, and this triggered over 500 collections.
In some cases it appears that the application is spending upwards of 10% of its CPU time performing collections.
Clearly a rewrite is in order.
So, I am thinking of replacing the new byte[] allocations with a pool class that could reuse the buffer at a later time.
Something like this ...
{
byte[] temp = Pool.AllocateBuffer(1024);
...
}
QUESTION:
How can I force the application to call Pool.deAllocate(temp) when temp is no longer needed?
In the above code fragment, temp is a Pool-allocated byte[] buffer, but when it goes out of scope it simply becomes garbage. That is not a real problem, but the buffer doesn't get reused by the pool.
I know I could replace the "return 0;" with "Pool.deAllocate(temp); return 0;", but I'm trying to force the recovery to occur.
Is this even remotely possible?

You could implement a Buffer class which implements IDisposable and returns the buffer to the pool when it's disposed. You can then give access to the underlying byte array, and so long as everyone plays nicely you can take advantage of reuse.
Be warned though:
Your buffers will quickly end up in gen 2, which may not be ideal for other reasons
If a malicious piece of code keeps a reference to the byte array, they could spy on data used by other code
You need to remember to dispose of buffers at the right time.
I actually have some code in MiscUtil to do this - see CachingBufferManager, CachedBuffer etc. I can't say I've used it much, mind you... and from what I remember, I made it a bit more complicated than I really needed to...
EDIT: To respond to the comments...
You can't force application code to release buffers, no. There's no automatic release mechanism in C# - a using statement is the closest we've got.
You could implement an implicit conversion to byte[] in your buffer class to allow you to call methods which have byte array parameters. Personally I'm not much of a fan of implicit conversions, but it's certainly available as an option.
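The IDisposable wrapper described above could be sketched roughly as follows. This is not the MiscUtil implementation; the names Pool, PooledBuffer, Bytes and Deallocate are hypothetical, the pool is keyed on exact buffer size, and buffers are not cleared when they come back (which is exactly the "spying" risk mentioned in the warnings).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool: hands out byte[] buffers and takes them back for reuse.
public static class Pool
{
    private static readonly object Gate = new object();
    private static readonly Dictionary<int, Stack<byte[]>> Free =
        new Dictionary<int, Stack<byte[]>>();

    public static PooledBuffer AllocateBuffer(int size)
    {
        lock (Gate)
        {
            Stack<byte[]> stack;
            if (Free.TryGetValue(size, out stack) && stack.Count > 0)
                return new PooledBuffer(stack.Pop()); // reuse a returned buffer
        }
        return new PooledBuffer(new byte[size]);      // nothing to reuse yet
    }

    internal static void Deallocate(byte[] buffer)
    {
        lock (Gate)
        {
            Stack<byte[]> stack;
            if (!Free.TryGetValue(buffer.Length, out stack))
                Free[buffer.Length] = stack = new Stack<byte[]>();
            stack.Push(buffer); // contents are NOT cleared in this sketch
        }
    }
}

// Hypothetical wrapper: returns its array to the pool when disposed.
public sealed class PooledBuffer : IDisposable
{
    private byte[] _bytes;

    internal PooledBuffer(byte[] bytes) { _bytes = bytes; }

    public byte[] Bytes
    {
        get
        {
            if (_bytes == null) throw new ObjectDisposedException(nameof(PooledBuffer));
            return _bytes;
        }
    }

    public void Dispose()
    {
        byte[] bytes = _bytes;
        if (bytes == null) return; // already returned, do not return twice
        _bytes = null;
        Pool.Deallocate(bytes);
    }
}
```

A caller would write using (PooledBuffer temp = Pool.AllocateBuffer(1024)) { ... } and work with temp.Bytes inside the block. Nothing stops a caller from skipping the using statement, in which case the array is just collected as usual.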

Related

Avoiding memory leaks using ArrayPool within a class

I have a class which I am currently using to build byte arrays (for network packets). It starts with some initial buffer and resizes as more data is written to it. I want to reduce the amount of GC work caused by this utility by using an ArrayPool, but I am having a hard time figuring out a good way to prevent memory leaks without too much overhead.
My main idea was to make my class IDisposable and return the array back to the pool in the Dispose() call.
using ByteBuilder builder = new ByteBuilder();
builder.AddData(); // Add a bunch of data
// write data to network stream
...
// builder is disposed and memory is returned to ArrayPool at the end of the method call
The problem here is that if it happens to be used without the using declaration, the memory would never be returned. Is there some way I can guarantee how someone uses it?
Another idea I had was to use a finalizer to return the memory to the array pool, but it seems that this has significant overhead from what I have read. I am allocating a lot of arrays, and many of them could be small, so having a finalizer seems like it may not be worth the trade off.
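For reference, a minimal sketch of the Dispose-based idea, without a finalizer, might look like this. The AddData(byte[]) signature, the ToArray method and the growth policy are assumptions made for illustration; only ArrayPool<byte>.Shared.Rent and Return are real library calls. As far as the shared pool is concerned, an array that is never returned is simply collected by the GC like any other array, so a forgotten Dispose costs a missed reuse rather than a permanent leak.

```csharp
using System;
using System.Buffers;

// Hypothetical builder backed by the shared array pool.
public sealed class ByteBuilder : IDisposable
{
    private byte[] _buffer;
    private int _length;

    public ByteBuilder(int initialCapacity = 256)
    {
        // Rent may hand back an array larger than requested.
        _buffer = ArrayPool<byte>.Shared.Rent(initialCapacity);
    }

    public int Length { get { return _length; } }

    public void AddData(byte[] data)
    {
        if (_buffer == null) throw new ObjectDisposedException(nameof(ByteBuilder));
        int required = _length + data.Length;
        if (required > _buffer.Length)
        {
            // Grow: rent a bigger array, copy, and give the old one back.
            byte[] bigger = ArrayPool<byte>.Shared.Rent(Math.Max(required, _buffer.Length * 2));
            Buffer.BlockCopy(_buffer, 0, bigger, 0, _length);
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = bigger;
        }
        Buffer.BlockCopy(data, 0, _buffer, _length, data.Length);
        _length += data.Length;
    }

    // Copies the written bytes into a fresh, exactly sized array.
    public byte[] ToArray()
    {
        if (_buffer == null) throw new ObjectDisposedException(nameof(ByteBuilder));
        byte[] copy = new byte[_length];
        Buffer.BlockCopy(_buffer, 0, copy, 0, _length);
        return copy;
    }

    public void Dispose()
    {
        byte[] buffer = _buffer;
        if (buffer == null) return; // guard against returning the array twice
        _buffer = null;
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```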

C# Too Much Memory Usage

I have a process with multiple steps defined (let's say a generic implementation of the strategy pattern), where all steps pass a common ProcessParameter object around and read from and write to it.
This ProcessParameter is an object having many arrays and collections. Example:
class ProcessParameter {
public List<int> NumbersAllStepsNeed {get; set;}
public List<int> OhterNumbersAllStepsNeed {get; set;}
public List<double> SomeOtherData {get; set;}
public List<string> NeedThisToo {get; set;}
...
}
Once the steps have finished, I'd like to make sure the memory is freed and does not hang around, because this can have a big memory footprint and other processes need to run too.
Do I do that by running:
pParams.NumbersAllStepsNeed = null;
pParams.OhterNumbersAllStepsNeed = null;
pParams.SomeOtherData = null;
...
Or should ProcessParameter implement IDisposable, with a Dispose method that does that, so that I just need to call pParams.Dispose() (or wrap it in a using block)?
What is the best and most elegant way to clean the memory footprint of the used data of one process running?
Does having arrays instead of lists change anything? Or Mixed?
The actual param type I need is collections/array of custom objects.
Am I looking in the right direction?
UPDATE
Great questions! Thanks for the comments!
I used to have this process running as a single run and I could see memory usage go very high and then gradually down to "normal".
The problem came when I started chaining these processes on top of each other with different starting parameters. That is when memory went unreasonably high, so I want to include a cleaning step between two processes and am looking for the best way to do that.
There is a DB, this params is a sort of "cache" to speed up things.
Good point on IDisposable, I do not keep unmanaged resources in the params object.

Whilst using the Disposal pattern is a good idea, I don't think it will give you any extra benefits in terms of freeing up memory.
Two things that might:
Call GC.Collect()
However, I really wouldn't bother (unless perhaps you are getting out of memory exceptions). Calling GC.Collect() explicitly may hurt performance, and the garbage collector really does do a good job on its own. (But see LOH below.)
Be aware of the Large Object Heap (LOH)
You mentioned that it uses a "big memory footprint". Be aware that any single memory allocation for 85,000 bytes or above comes from the large object heap (LOH). The LOH doesn't get compacted like the small object heap. This can lead to the LOH becoming fragmented and can result in out of memory errors even when you have plenty of available memory.
When might you stray into the LOH? With any single allocation of 85,000 bytes or more. On a 64-bit system that means any array of object references (or a list or dictionary backed by one) with roughly 10,625 elements or more, since each reference takes 8 bytes, as well as image manipulation, large strings, etc.
Three strategies to help minimise fragmentation of the LOH:
i. Redesign to avoid it. Not always practical. But a list of lists or dictionary of dictionaries might avoid the limit. This can make the implementation more complex so I wouldn't unless you really need to, but on the plus side this can be very effective.
ii. Use fixed sizes. If all or most of your memory allocations in the LOH are the same size then this will help minimise any fragmentation. For example, for dictionaries and lists set the capacity (which sets the size of the internal array) to the largest size you are likely to use. Not so practical if you are doing image manipulation.
iii. Force the garbage collector to compact the LOH:
System.Runtime.GCSettings.LargeObjectHeapCompactionMode = System.Runtime.GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
You need to be using .NET Framework 4.5.1 or later to use that.
This is probably the simplest approach. In my own applications I have a couple of instances where I know I will be straying into the LOH and that fragmentation can be an issue and I set
System.Runtime.GCSettings.LargeObjectHeapCompactionMode = System.Runtime.GCLargeObjectHeapCompactionMode.CompactOnce;
as standard in the destructor - but only call GC.Collect() explicitly if I get an out of memory exception when allocating.
Hope this helps.
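To check whether a given allocation strays into the LOH, one rough probe is to ask the runtime which generation a freshly allocated array is reported in; large-object-heap allocations are reported as the highest generation straight away. The class and method names below are made up for illustration, and the sketch assumes the default 85,000-byte threshold (85,000 / 8 = 10,625 eight-byte references, which is where the element count above comes from).

```csharp
using System;

// Hypothetical helper for experimenting with the LOH threshold.
public static class LohProbe
{
    // Allocates a byte array of the given size and reports whether the
    // runtime places it in the highest generation immediately, which is
    // how large-object-heap allocations show up in GC.GetGeneration.
    public static bool IsReportedAsLargeObject(int byteCount)
    {
        byte[] block = new byte[byteCount];
        bool result = GC.GetGeneration(block) == GC.MaxGeneration;
        GC.KeepAlive(block);
        return result;
    }
}
```

Calling LohProbe.IsReportedAsLargeObject with sizes on either side of the threshold is a quick way to see which of your own buffers are affected.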

Does Array.Copy maintain the guarantee about atomic reads and writes on a per element basis?

C# ensures that certain types always have atomic reads and writes. Do I have those same assurances when calling Array.Copy on two arrays of those types? Is each element atomically read and written? I browsed some of the source code but did not come away with a solid answer.
For example, if I rolled my own code for copying two arrays...
static void Copy<T>(T[] source, T[] destination, int length)
{
for (int i = 0; i < length; ++i)
destination[i] = source[i];
}
... and called the Copy<int> variant, this guarantees that each element is atomically read from source and atomically written to destination because C# promises that int reads and writes are atomic. I'm simply asking if Array.Copy maintains that guarantee (as opposed to, say, using its own specialized memory block copy routine that possibly breaks this guarantee).

Array.Copy() tries to make the copy efficient by using the memmove() CRT function, a raw memory-to-memory copy without regard for the actual type stored in the array. It can be substantially more efficient if the array element type is smaller than the natural processor word size.
So you need to know whether memmove() can provide the atomicity guarantee. That is a tricky question, but one that the CLR programmer answered unambiguously. Atomicity is an essential trait for object references: the garbage collector cannot operate correctly when it can't update those references atomically. So the programmer special-cases this in the CLR code, and the comment he provided tells you what you want to know (edited to fit):
// The CRT version of memmove does not always guarantee that updates of
// aligned fields stay atomic (e.g. it is using "rep movsb" in some cases).
// Type safety guarantees and background GC scanning requires object
// references in GC heap to be updated atomically.
That's a very pessimistic view on life. But clearly no, you can't assume Array.Copy() is atomic when the CLR author did not make this assumption.
Practical considerations do perhaps need to prevail. On the common architectures, x86 and x64, the memmove() implementation doesn't make the CLR memory model guarantees any worse: it copies 4 or 8 aligned bytes at a time. And in practice the for-loop in the generic substitute is not guaranteed to be atomic either, since T is not guaranteed to be an atomic type.
Most practical is that you ought not have to ask the question. Atomicity only matters when you have another thread that is accessing the arrays without any synchronization. Writes to the source array or reads from the destination array. That is however a guaranteed threading race. Writes to the source array are worst, the copy has an arbitrary mix of old and new data. Reads from the destination array randomly produce stale data, like a threading bug normally does. You have to be quite courageous to risk this kind of code.
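If another thread really does touch the array during the copy, the usual way out is to stop relying on per-element atomicity and synchronize instead. A minimal sketch, assuming a hypothetical SharedSamples class in which every writer and every copier goes through the same lock:

```csharp
using System;

// Hypothetical wrapper: all access to the array goes through one lock.
public sealed class SharedSamples
{
    private readonly object _gate = new object();
    private readonly int[] _samples;

    public SharedSamples(int length) { _samples = new int[length]; }

    public void Write(int index, int value)
    {
        lock (_gate) { _samples[index] = value; }
    }

    // Copies while holding the writers' lock, so no write can land
    // in the middle of the Array.Copy call.
    public int[] Snapshot()
    {
        lock (_gate)
        {
            int[] copy = new int[_samples.Length];
            Array.Copy(_samples, copy, _samples.Length);
            return copy;
        }
    }
}
```

With this arrangement it no longer matters how Array.Copy moves the bytes, because nothing else can observe or modify the source while it runs.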

Detecting when about to run out of memory (getting the amount of "free physical memory")

I'm transferring images from a high-FPS camera into a memory buffer (a List), and as those images are pretty large, the computer runs out of memory pretty quickly.
What I would like to do is to stop the transfer some time before the application runs out of memory. During my testing, I have found it to be consistent with the "Free Physical Memory" indicator getting close to zero.
Now the problem is that I can't find a way actually to get this value programmatically; in XP, it is not even displayed anywhere (just in the Vista/7 task manager).
I have tried all the ways I could find (WMI, performance counters, MemoryStatus, ...), but everything I got from those was just the "Available Physical Memory," which is of course not the same.
Any ideas?
Update
Unfortunately, I need the data to be in memory (yes, I know I can't guarantee it will be in physical memory, but still), because the data is streamed in real-time and I need to preview it in memory after it's been stored there.

Correlation is not causation. You can "run out of memory" even with loads of physical memory still free. Physical memory is almost certainly irrelevant; what you are probably running out of is address space.
People tend to think of "memory" as consuming space on a chip, but that hasn't been true for over a decade. Memory in modern operating systems is often better thought of as a large disk file that has a big hardware cache sitting on top of it to speed it up. Physical memory is just a performance optimization of disk-based memory.
If you're running out of physical memory then your performance is going to be terrible. But the scarce resource is actually address space that you are running out of. A big list has to have a large contiguous block of address space, and there might not be any block large enough left in the size you want.
Don't do that. Pull down a reasonably sized block, dump it to disk, and then deal with the file on disk as needed.
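The "dump it to disk" advice could be sketched along these lines. FrameSpool and its members are hypothetical names, the sketch assumes every frame has the same size, and it is not thread-safe; a real capture pipeline would likely need a background writer to keep up with the camera.

```csharp
using System;
using System.IO;

// Hypothetical spool: appends fixed-size frames to a file and reads
// them back by index, so only one frame needs to be in memory at a time.
public sealed class FrameSpool : IDisposable
{
    private readonly FileStream _file;
    private readonly int _frameSize;
    private long _count;

    public FrameSpool(string path, int frameSize)
    {
        _frameSize = frameSize;
        _file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
    }

    public long FrameCount { get { return _count; } }

    public void Append(byte[] frame)
    {
        if (frame.Length != _frameSize)
            throw new ArgumentException("Unexpected frame size.", nameof(frame));
        _file.Seek(0, SeekOrigin.End);
        _file.Write(frame, 0, frame.Length);
        _count++;
    }

    // Reads frame number 'index' into a caller-supplied, reusable buffer.
    public void Read(long index, byte[] destination)
    {
        if (index < 0 || index >= _count)
            throw new ArgumentOutOfRangeException(nameof(index));
        _file.Seek(index * _frameSize, SeekOrigin.Begin);
        int read = 0;
        while (read < _frameSize)
        {
            int n = _file.Read(destination, read, _frameSize - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }

    public void Dispose() { _file.Dispose(); }
}
```

Previewing then means calling Read with a single preallocated buffer instead of keeping every frame in a List.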

I'm late to the party, but have you considered using the System.Runtime.MemoryFailPoint class? It does a bunch of stuff to ensure that the requested allocation would succeed and throws InsufficientMemoryException if it fails; you can catch this and stop your transfer. You can probably predict an average size of incoming frames and try to allocate 3 or 4 of them, then stop acquisition when a failure is made. Maybe something like this?
const int AverageFrameSize = 10 * 1024 * 1024; // 10MB
void Image_OnAcquired(...)
{
try
{
var memoryCheck = new MemoryFailPoint(AverageFrameSize * 3);
}
catch (InsufficientMemoryException ex)
{
_camera.StopAcquisition();
StartWaitingForFreeMemory();
return;
}
// if you get here, there's enough memory for at least a few
// more frames
}
I doubt it'd be 100% foolproof, but it's a start. It's definitely more reliable than the performance counter for reasons that are explained in the other answers.

You can't use the free memory counter on its own in Vista/7 as a guide since it may be close to zero all the time. The reason for this is Vista/7's superfetch which uses free memory to cache stuff from disk that it thinks you're likely to use.
Linky: http://www.codinghorror.com/blog/2006/09/why-does-vista-use-all-my-memory.html
In addition if you're running a 32-bit C# process you are limited to 2GB of memory per process anyway (actually more like 1.5GB in practice before things become unstable) so even if your box shows you have loads of free memory you will still get an out of memory exception when your process hits the 2GB limit.
As Tergiver comments above, the real solution is to avoid holding all of the file in memory and instead swap bits of the image in and out of memory as required.

Thanks for all the answers.
I've been thinking about it some more and came to a conclusion that it would be quite difficult (if not impossible) to do what I initially wanted, that is to somehow detect when the application is about to run out of memory.
All answers seem to point in the same direction (to somehow keep data out of memory), however unfortunately I cannot go there, as I really "need" the data to stay inside the memory (physical if possible).
As I have to make a compromise, I decided to create a setting that lets the user decide the memory usage limit for captured data. It is at least easy to implement.

Wanted to add my own answer because the otherwise good answer by OwenP has two important errors in the way it is using System.Runtime.MemoryFailPoint.
The first mistake is very simple to fix: the constructor signature is public MemoryFailPoint(int sizeInMegabytes), so the AverageFrameSize argument should be in megabytes, not bytes. Also note the following about the size:
MemoryFailPoint operates at a granularity of 16 MB. Any values smaller than 16 MB are treated as 16 MB, and other values are treated as the next largest multiple of 16 MB.
The second mistake is that the MemoryFailPoint instance must be kept alive until after the memory you wish to use has been allocated, and then disposed!
This can be a bit harder to fix and might require design changes to be made depending on what OP's actual code looks like.
The reason that you have to dispose it in this fashion is that the MemoryFailPoint class keeps a process-wide record of memory reservations made from its constructor. This is done to ensure that if two threads perform a memory check at roughly the same time, they will not both succeed unless there is enough memory to meet the demands of both threads. (Otherwise the MemoryFailPoint class would be useless in multi-threaded applications!)
The memory "reserved" by the constructor is unreserved when calling Dispose(). Thus the thread should dispose the MemoryFailPoint-instance as soon as possible after it has allocated the required memory, but not before that.
(The "as soon as possible" part is preferred but not critical. Delaying the dispose can lead to other memory-checks failing needlessly, but at least you err on the conservative side.)
The above requirement is what requires alteration to the code's design. Either the method checking for memory also has to perform the allocation, or it has to pass the MemoryFailPoint instance out to the caller, which makes it the caller's responsibility to dispose it at the correct time. (The latter is what the example code on MSDN does.)
Using the first approach (and a fixed buffer-size) might look something like this:
const int FrameSizeInMegabytes = 10; // 10MB (perhaps more is needed?)
const int FrameSizeInBytes = FrameSizeInMegabytes << 20;
// shifting by 20 is the same as multiplying with 1024 * 1024.
bool TryCreateImageBuffer(int numberOfImages, out byte[,] imageBuffer)
{
// check that it is theoretically possible to allocate the array.
if (numberOfImages < 0 || numberOfImages > 0x7FFFFFC7)
throw new ArgumentOutOfRangeException("numberOfImages",
"Outside allowed range: 0 <= numberOfImages <= 0x7FFFFFC7");
// check that we have enough memory to allocate the array.
MemoryFailPoint memoryReservation = null;
try
{
memoryReservation =
new MemoryFailPoint(FrameSizeInMegabytes * numberOfImages);
}
catch (InsufficientMemoryException ex)
{
imageBuffer = null;
return false;
}
// if you get here, there's likely to be enough memory
// available to create the buffer. Normally we can't be
// 100% sure because another thread might allocate memory
// without first reserving it with MemoryFailPoint in
// which case you have a race condition for the allocate.
// Because of this the allocation should be done as soon
// as possible - the longer we wait the higher the risk.
imageBuffer = new byte[numberOfImages, FrameSizeInBytes];
//Now that we have allocated the memory we can go ahead and call dispose
memoryReservation.Dispose();
return true;
}
0x7FFFFFC7 is the maximum indexer allowed in any dimension on arrays of single-byte types and can be found on the MSDN page about arrays.
The second approach (where the caller is responsible for the MemoryFailPoint instance) might look something like this:
const int AverageFrameSizeInMegabytes = 10; // 10MB
/// <summary>
/// Tries to create a MemoryFailPoint instance for enough megabytes to
/// hold as many images as specified by <paramref name="numberOfImages"/>.
/// </summary>
/// <returns>
/// A MemoryFailPoint instance if the requested amount of memory was
/// available (at the time of this call), otherwise null.
/// </returns>
MemoryFailPoint GetMemoryFailPointFor(int numberOfImages)
{
MemoryFailPoint memoryReservation = null;
try
{
memoryReservation =
new MemoryFailPoint(AverageFrameSizeInMegabytes * numberOfImages);
}
catch (InsufficientMemoryException ex)
{
return null;
}
return memoryReservation;
}
This looks a lot simpler (and is more flexible), but it is now up to the caller to handle the MemoryFailPoint instance and dispose of it at the correct point in time. (Added some mandatory documentation since I didn't come up with a good and descriptive name for the method.)
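On the caller's side, the second approach might be used roughly like this. FrameBuffers and TryAllocateFrames are hypothetical names, and a jagged array is used instead of the two-dimensional one from the first example; the point is only that the reservation is held while the memory is allocated and released right after.

```csharp
using System;
using System.Runtime;

// Hypothetical caller that owns the MemoryFailPoint it asked for.
public static class FrameBuffers
{
    const int AverageFrameSizeInMegabytes = 10; // 10MB

    static MemoryFailPoint GetMemoryFailPointFor(int numberOfImages)
    {
        try
        {
            return new MemoryFailPoint(AverageFrameSizeInMegabytes * numberOfImages);
        }
        catch (InsufficientMemoryException)
        {
            return null;
        }
    }

    // Returns null when the memory check fails; otherwise allocates the
    // frames while the reservation is alive and disposes it afterwards.
    public static byte[][] TryAllocateFrames(int numberOfImages)
    {
        MemoryFailPoint reservation = GetMemoryFailPointFor(numberOfImages);
        if (reservation == null) return null;

        using (reservation)
        {
            byte[][] frames = new byte[numberOfImages][];
            for (int i = 0; i < frames.Length; i++)
                frames[i] = new byte[AverageFrameSizeInMegabytes << 20];
            return frames;
        } // reservation is released here, after the allocation
    }
}
```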
Important: What "reserved" means in this context
Memory is not "reserved" in the sense that it is guaranteed to be available (to the calling thread). It only means that when a thread uses MemoryFailPoint to check for memory, assuming it succeeds, it adds its memory size to a process-wide (static) "reserved" amount that the MemoryFailPoint class keeps track of. This reservation will cause any other call to MemoryFailPoint (e.g. from other threads) to perceive the total amount of free memory as the actual amount minus the current process-wide (static) "reserved" amount. (When MemoryFailPoint instances are disposed they subtract their amount from the reserved total.) However, the actual memory allocation system itself doesn't know or care about this so-called "reservation", which is one of the reasons that MemoryFailPoint doesn't have strong guarantees.
Note also that memory "reserved" is simply kept track of as an amount. Since it isn't an actual reservation of a specific segment of memory this further weakens the guarantees as is illustrated by the following frustrated comment found in the reference source:
// Note that multiple threads can still ---- on our free chunk of address space, which can't be easily solved.
It's not hard to guess what the censored word is.
Here is an interesting article about how to overcome the 2GB limit on arrays.
Also if you need to allocate very large data structures you will need to know about <gcAllowVeryLargeObjects> which you can set in your app-config.
It is worth noting that this doesn't really have anything to do with physical memory exclusively, as the OP really wanted. As a matter of fact, one of the things MemoryFailPoint will try to do before it gives up and reports failure is to increase the size of the page file. But it will do a very decent job of avoiding an OutOfMemoryException if used correctly, which is at least half of what the OP wanted.
If you really want to force data into physical memory then, as far as I know, you have to go native with AllocateUserPhysicalPages which isn't the easiest thing in the world with a plethora of things that can go wrong, requires the appropriate permissions and is almost certainly overkill. The OS doesn't really like to be told how to manage memory so it doesn't make it easy to do so...

Getting an OutOfMemoryException just means that the current memory allocation could not be honored. It doesn't necessarily mean that the system or even the process is running out of memory. Imagine a hello world type application that starts off by allocating a 2 GB chunk of memory. On a 32 bit system, that will most likely trigger an exception despite the fact that the process hasn't really allocated any significant memory at this point.
A common source of OutOfMemoryExceptions is not enough contiguous memory available. I.e. plenty of memory is available, but no chunk is big enough to honor the current request. In other words trying to avoid OOM by watching the free memory counters is not really feasible.

How do I get .NET to garbage collect aggressively?

I have an application that is used in image processing, and I find myself typically allocating arrays in the 4000x4000 ushort size, as well as the occasional float and the like. Currently, the .NET framework tends to crash in this app apparently randomly, almost always with an out of memory error. 32 MB is not a huge allocation, but if .NET is fragmenting memory, then it's very possible that such large contiguous allocations aren't behaving as expected.
Is there a way to tell the garbage collector to be more aggressive, or to defrag memory (if that's the problem)? I realize that there are the GC.Collect and GC.WaitForPendingFinalizers calls, and I've sprinkled them pretty liberally through my code, but I'm still getting the errors. It may be because I'm calling DLL routines that use native code a lot, but I'm not sure. I've gone over that C++ code and made sure that any memory I declare I delete, but still I get these C# crashes, so I'm pretty sure it's not there. I wonder if the C++ calls could be interfering with the GC, making it leave behind memory because it once interacted with a native call-- is that possible? If so, can I turn that functionality off?
EDIT: Here is some very specific code that will cause the crash. According to this SO question, I do not need to be disposing of the BitmapSource objects here. Here is the naive version, no GC.Collects in it. It generally crashes on iteration 4 to 10 of the undo procedure. This code replaces the constructor in a blank WPF project, since I'm using WPF. I do the wackiness with the BitmapSource because of the limitations I explained in my answer to @dthorpe below as well as the requirements listed in this SO question.
public partial class Window1 : Window {
public Window1() {
InitializeComponent();
//Attempts to create an OOM crash
//to do so, mimic minute croppings of an 'image' (ushort array), and then undoing the crops
int theRows = 4000, currRows;
int theColumns = 4000, currCols;
int theMaxChange = 30;
int i;
List<ushort[]> theList = new List<ushort[]>();//the list of images in the undo/redo stack
byte[] displayBuffer = null;//the buffer used as a bitmap source
BitmapSource theSource = null;
for (i = 0; i < theMaxChange; i++) {
currRows = theRows - i;
currCols = theColumns - i;
theList.Add(new ushort[(theRows - i) * (theColumns - i)]);
displayBuffer = new byte[theList[i].Length];
theSource = BitmapSource.Create(currCols, currRows,
96, 96, PixelFormats.Gray8, null, displayBuffer,
(currCols * PixelFormats.Gray8.BitsPerPixel + 7) / 8);
System.Console.WriteLine("Got to change " + i.ToString());
System.Threading.Thread.Sleep(100);
}
//should get here. If not, then theMaxChange is too large.
//Now, go back up the undo stack.
for (i = theMaxChange - 1; i >= 0; i--) {
displayBuffer = new byte[theList[i].Length];
theSource = BitmapSource.Create((theColumns - i), (theRows - i),
96, 96, PixelFormats.Gray8, null, displayBuffer,
((theColumns - i) * PixelFormats.Gray8.BitsPerPixel + 7) / 8);
System.Console.WriteLine("Got to undo change " + i.ToString());
System.Threading.Thread.Sleep(100);
}
}
}
Now, if I'm explicit in calling the garbage collector, I have to wrap the entire code in an outer loop to cause the OOM crash. For me, this tends to happen around x = 50 or so:
public partial class Window1 : Window {
public Window1() {
InitializeComponent();
//Attempts to create an OOM crash
//to do so, mimic minute croppings of an 'image' (ushort array), and then undoing the crops
for (int x = 0; x < 1000; x++){
int theRows = 4000, currRows;
int theColumns = 4000, currCols;
int theMaxChange = 30;
int i;
List<ushort[]> theList = new List<ushort[]>();//the list of images in the undo/redo stack
byte[] displayBuffer = null;//the buffer used as a bitmap source
BitmapSource theSource = null;
for (i = 0; i < theMaxChange; i++) {
currRows = theRows - i;
currCols = theColumns - i;
theList.Add(new ushort[(theRows - i) * (theColumns - i)]);
displayBuffer = new byte[theList[i].Length];
theSource = BitmapSource.Create(currCols, currRows,
96, 96, PixelFormats.Gray8, null, displayBuffer,
(currCols * PixelFormats.Gray8.BitsPerPixel + 7) / 8);
}
//should get here. If not, then theMaxChange is too large.
//Now, go back up the undo stack.
for (i = theMaxChange - 1; i >= 0; i--) {
displayBuffer = new byte[theList[i].Length];
theSource = BitmapSource.Create((theColumns - i), (theRows - i),
96, 96, PixelFormats.Gray8, null, displayBuffer,
((theColumns - i) * PixelFormats.Gray8.BitsPerPixel + 7) / 8);
GC.WaitForPendingFinalizers();//force gc to collect, because we're in scenario 2, lots of large random changes
GC.Collect();
}
System.Console.WriteLine("Got to changelist " + x.ToString());
System.Threading.Thread.Sleep(100);
}
}
}
If I'm mishandling memory in either scenario, if there's something I should spot with a profiler, let me know. That's a pretty simple routine there.
Unfortunately, it looks like @Kevin's answer is right-- this is a bug in .NET and how .NET handles objects larger than 85k. This situation strikes me as exceedingly strange; could PowerPoint be rewritten in .NET with this kind of limitation, or any of the other Office suite applications? 85k does not seem to me to be a whole lot of space, and I'd also think that any program that uses so-called 'large' allocations frequently would become unstable within a matter of days to weeks when using .NET.
EDIT: It looks like Kevin is right, this is a limitation of .NET's GC. For those who don't want to follow the entire thread, .NET has four GC heaps: gen0, gen1, gen2, and LOH (Large Object Heap). Everything smaller than 85k goes on one of the first three heaps, depending on creation time (moved from gen0 to gen1 to gen2, etc). Objects of 85k or larger get placed on the LOH. The LOH is never compacted, so allocations of the type I'm doing will eventually cause an OOM error as objects get scattered about that memory space. We've found that moving to .NET 4.0 does help the problem somewhat, delaying the exception, but not preventing it. To be honest, this feels a bit like the 640k barrier-- 85k ought to be enough for any user application (to paraphrase this video of a discussion of the GC in .NET). For the record, Java does not exhibit this behavior with its GC.

Here are some articles detailing problems with the Large Object Heap. It sounds like what you might be running into.
http://connect.microsoft.com/VisualStudio/feedback/details/521147/large-object-heap-fragmentation-causes-outofmemoryexception
Dangers of the large object heap:
http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Here is a link on how to collect data on the Large Object Heap (LOH):
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
According to this, it seems there is no way to compact the LOH. I can't find anything newer that explicitly says how to do it, and so it seems that it hasn't changed in the 2.0 runtime:
http://blogs.msdn.com/maoni/archive/2006/04/18/large-object-heap.aspx
The simple way of handling the issue is to make small objects if at all possible. Your other option is to create only a few large objects and reuse them over and over. Not an ideal situation, but it might be better than re-writing the object structure. Since you did say that the created objects (arrays) are of different sizes, it might be difficult, but it could keep the application from crashing.
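The "few large objects, reused" option could be sketched as a single scratch buffer sized for the largest image and handed out as a window of the needed length. ScratchBuffer is a hypothetical name, and the sketch assumes the consumer copies the pixels out rather than holding on to the array, since the same memory is handed out again on the next call.

```csharp
using System;

// Hypothetical scratch buffer: one large allocation, reused for every request.
public sealed class ScratchBuffer
{
    private readonly byte[] _buffer;

    // Capacity should cover the largest image the application handles.
    public ScratchBuffer(int capacity) { _buffer = new byte[capacity]; }

    // Returns a zeroed window over the shared array; no new array is allocated.
    public ArraySegment<byte> Get(int length)
    {
        if (length < 0 || length > _buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        Array.Clear(_buffer, 0, length);
        return new ArraySegment<byte>(_buffer, 0, length);
    }
}
```

Because the one array lives for the lifetime of the application, the LOH sees a single long-lived block instead of a stream of differently sized ones.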

Start by narrowing down where the problem lies. If you have a native memory leak, poking the GC is not going to do anything for you.
Run up perfmon and look at the .NET heap size and Private Bytes counters. If the heap size remains fairly constant but private bytes is growing then you've got a native code issue and you'll need to break out the C++ tools to debug it.
Assuming the problem is with the .NET heap, you should run a profiler against the code, such as Redgate's ANTS profiler or JetBrains' dotTrace. This will tell you which objects are taking up the space and not being collected quickly. You can also use WinDbg with SOS for this, but it's a fiddly interface (powerful though).
Once you've found the offending items it should be more obvious how to deal with them. Some of the sort of things that cause problems are static fields referencing objects, event handlers not being unregistered, objects living long enough to get into Gen2 but then dying shortly after, etc etc. Without a profile of the memory heap you won't be able to pinpoint the answer.
Whatever you do though, "liberally sprinkling" GC.Collect calls is almost always the wrong way to try and solve the problem.
There is an outside chance that switching to the server version of the GC would improve things (just a property in the config file) - the default workstation version is geared towards keeping a UI responsive, so it will effectively give up on large, long-running collections.

Use Process Explorer (from Sysinternals) to see how big the Large Object Heap for your application is. Your best bet is going to be making your arrays smaller but having more of them. If you can avoid allocating your objects on the LOH then you won't get the OutOfMemoryExceptions and you won't have to call GC.Collect manually either.
The LOH doesn't get compacted by default, so freed space is only reused when a new object happens to fit into a gap; otherwise new objects go at the end of it, meaning that you can run out of space quite quickly.

If you're allocating a large amount of memory in an unmanaged library (i.e. memory that the GC isn't aware of), then you can make the GC aware of it with the GC.AddMemoryPressure method.
Of course this depends somewhat on what the unmanaged code is doing. You haven't specifically stated that it's allocating memory, but I get the impression that it is. If so, then this is exactly what that method was designed for. Then again, if the unmanaged library is allocating a lot of memory then it's also possible that it's fragmenting the memory, which is completely beyond the GC's control even with AddMemoryPressure. Hopefully that's not the case; if it is, you'll probably have to refactor the library or change the way in which it's used.
P.S. Don't forget to call GC.RemoveMemoryPressure when you finally free the unmanaged memory.
(P.P.S. Some of the other answers are probably right: this is a lot more likely to simply be a memory leak in your code. Especially if it's image processing, I'd wager that you're not correctly disposing of your IDisposable instances. But just in case those answers don't lead you anywhere, this is another route you could take.)
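A minimal sketch of pairing the two calls, assuming the unmanaged memory is something you allocate yourself. Here Marshal.AllocHGlobal stands in for whatever the unmanaged library really does, and the NativeBlock class is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: owns a block of unmanaged memory and reports its size to the GC.
public sealed class NativeBlock : IDisposable
{
    private IntPtr _ptr;
    private readonly long _size;

    public NativeBlock(long sizeInBytes)
    {
        if (sizeInBytes <= 0) throw new ArgumentOutOfRangeException("sizeInBytes");
        _ptr = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        _size = sizeInBytes;
        // Tell the GC that this small managed object keeps a large native block alive.
        GC.AddMemoryPressure(_size);
    }

    public IntPtr Pointer { get { return _ptr; } }
    public bool IsDisposed { get { return _ptr == IntPtr.Zero; } }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeBlock() { Release(); }

    private void Release()
    {
        if (_ptr == IntPtr.Zero) return; // already released, or never allocated
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
        // Must be passed the same amount that was given to AddMemoryPressure.
        GC.RemoveMemoryPressure(_size);
    }
}
```

Wrapping each use in a using block keeps the add and remove calls balanced even when an exception is thrown.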

Just an aside: The .NET garbage collector performs a "quick" GC when a function returns to its caller. This will dispose the local vars declared in the function.
If you structure your code such that you have one large function that allocates large blocks over and over in a loop, assigning each new block to the same local var, the GC may not kick in to reclaim the unreferenced blocks for some time.
If on the other hand, you structure your code such that you have an outer function with a loop that calls an inner function, and the memory is allocated and assigned to a local var in that inner function, the GC should kick in immediately when the inner function returns to the caller and reclaim the large memory block that was just allocated, because it's a local var in a function that is returning.
Avoid the temptation to mess with GC.Collect explicitly.

Apart from handling the allocations in a more GC-friendly way (e.g. reusing arrays etc.), there's a new option now: you can manually cause compaction of the LOH.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
This will cause a LOH compaction the next time a gen-2 collection happens (either on its own, or by your explicit call of GC.Collect).
Do note that not compacting the LOH is usually a good idea - it's just that your scenario is a decent enough case for allowing for manual compaction. The LOH is usually used for huge, long-living objects - like pre-allocated buffers that are reused over time etc.
If your .NET version doesn't support this yet, you can also try to allocate in sizes of powers of two, rather than allocating precisely the amount of memory you need. This is what a lot of native allocators do to ensure memory fragmentation doesn't get impossibly stupid (it basically puts an upper limit on the maximum heap fragmentation). It's annoying, but if you can limit the code that handles this to a small portion of your code, it's a decent workaround.
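A rough sketch of the power-of-two rounding, in case it helps. The BufferSizing class and its method names are invented for this example, and the 1 GiB cap is just an arbitrary limit that keeps the shift from overflowing an int.

```csharp
using System;

// Hypothetical helper: rounds allocation sizes up to a power of two.
public static class BufferSizing
{
    private const int MaxSize = 1 << 30; // largest power of two that fits in an int

    public static int RoundUpToPowerOfTwo(int requested)
    {
        if (requested < 0 || requested > MaxSize)
            throw new ArgumentOutOfRangeException("requested");

        int size = 1;
        while (size < requested)
            size <<= 1; // double until the request fits
        return size;
    }

    // The caller has to track the logical length separately,
    // because the returned array may be longer than requested.
    public static byte[] Allocate(int requested)
    {
        return new byte[RoundUpToPowerOfTwo(requested)];
    }
}
```

The cost is that a request just over a power of two wastes almost half of the block, which is the trade-off for the bounded fragmentation.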
Do note that you still have to make sure it's actually possible to compact the heap - any pinned memory will prevent compaction in the heap it lives in.
Another useful option is to use paging - never allocating more than, say, 64 KiB of contiguous space on the heap; this means you'll avoid using the LOH entirely. It's not too hard to manage this in a simple "array-wrapper" in your case. The key is to maintain a good balance between performance requirements and reasonable abstraction.
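Something along these lines would do as a starting point for the array wrapper. PagedBuffer is a made-up name, and the sketch assumes plain byte data and the usual 85,000-byte large-object threshold.

```csharp
using System;

// Hypothetical array wrapper: stores bytes in 64 KiB pages so that no single
// page allocation reaches the large-object threshold.
public sealed class PagedBuffer
{
    private const int PageSize = 64 * 1024;
    private readonly byte[][] _pages;
    private readonly long _length;

    public PagedBuffer(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        _length = length;
        int pageCount = (int)((length + PageSize - 1) / PageSize); // round up
        _pages = new byte[pageCount][];
        for (int i = 0; i < pageCount; i++)
            _pages[i] = new byte[PageSize];
    }

    public long Length { get { return _length; } }
    public int PageCount { get { return _pages.Length; } }

    public byte this[long index]
    {
        get { Check(index); return _pages[(int)(index / PageSize)][(int)(index % PageSize)]; }
        set { Check(index); _pages[(int)(index / PageSize)][(int)(index % PageSize)] = value; }
    }

    private void Check(long index)
    {
        if (index < 0 || index >= _length) throw new IndexOutOfRangeException();
    }
}
```

Note that the outer array of page references is itself an ordinary array, so with many thousands of pages it could end up on the LOH; for very large buffers a second level of paging would be needed.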
And of course, as a last resort, you can always use unsafe code. This gives you a lot of flexibility in handling memory allocations (though it's a bit more painful than using e.g. C++) - including allowing you to explicitly allocate unmanaged memory, do your work with that and release the memory manually. Again, this only makes sense if you can isolate this code to a small portion of your total codebase - and make sure you've got a safe managed wrapper for the memory, including the appropriate finalizer (to maintain some decent level of memory safety). It's not too hard in C#, though if you find yourself doing this too often, it might be a good idea to use C++/CLI for those parts of the code, and call them from your C# code.

Have you tested for memory leaks? I've been using .NET Memory Profiler with quite a bit of success on a project that had a number of very subtle and annoyingly persistent (pun intended) memory leaks.
Just as a sanity check, ensure that you're calling Dispose on any objects that implement IDisposable.

You could implement your own array class which breaks the memory into non-contiguous blocks. Say, have a 64 by 64 array of [64,64] ushort arrays which are allocated and deallocated separately. Then just map to the right one. Location 66,66 would be at location [2,2] in the array at [1,1].
Then, you should be able to dodge the Large Object Heap.
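A sketch of that mapping, using the 64 by 64 layout from above. The TiledUShortGrid name and the choice to allocate tiles lazily on first write are my own assumptions, not part of the original suggestion.

```csharp
using System;

// Hypothetical 4096 x 4096 ushort grid stored as 64 x 64 tiles of 64 x 64 elements.
// Each tile is 8 KB and the tile table is small, so nothing lands on the LOH.
public sealed class TiledUShortGrid
{
    public const int TileSize = 64;
    public const int TilesPerSide = 64;
    public const int Side = TileSize * TilesPerSide;

    private readonly ushort[,][,] _tiles = new ushort[TilesPerSide, TilesPerSide][,];

    public static int TileIndex(int coordinate) { return coordinate / TileSize; }
    public static int Offset(int coordinate) { return coordinate % TileSize; }

    public ushort this[int x, int y]
    {
        get
        {
            Check(x, y);
            ushort[,] tile = _tiles[TileIndex(x), TileIndex(y)];
            // A tile that was never written reads as zero.
            return tile == null ? (ushort)0 : tile[Offset(x), Offset(y)];
        }
        set
        {
            Check(x, y);
            int tx = TileIndex(x), ty = TileIndex(y);
            if (_tiles[tx, ty] == null)
                _tiles[tx, ty] = new ushort[TileSize, TileSize]; // allocate on first write
            _tiles[tx, ty][Offset(x), Offset(y)] = value;
        }
    }

    // Drops one tile so the GC can reclaim it independently of the others.
    public void ReleaseTile(int tileX, int tileY) { _tiles[tileX, tileY] = null; }

    private static void Check(int x, int y)
    {
        if (x < 0 || x >= Side || y < 0 || y >= Side) throw new IndexOutOfRangeException();
    }
}
```

With this layout, TileIndex(66) is 1 and Offset(66) is 2, which matches the [2,2] in the array at [1,1] example.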

The problem is most likely due to the number of these large objects you have in memory. Fragmentation would be a more likely issue if they were of variable sizes (though it could still be an issue here). You stated in the comments that you are storing an undo stack in memory for the image files. If you move this to disk you would save yourself tons of application memory space.
Also, moving the undo data to disk should not have too much of a negative impact on performance, because it's not something you will be using all of the time. (If it does become a bottleneck you can always create a hybrid disk/memory cache system.)
Extended...
If you are truly concerned about the possible performance impact of storing undo data on the file system, consider that the virtual memory system has a good chance of paging this data to your virtual page file anyway. If you create your own page file/swap space for these undo files, you will have the advantage of being able to control when and where the disk I/O happens. Don't forget, even though we all wish our computers had infinite resources, they are very limited:
1.5GB (usable application memory space) / 32MB (large memory request size) ~= 46, so only about 46 such requests fit in memory at once.

You can use this method:
using System;
using System.Diagnostics;

public static void FlushMemory()
{
    // Sets the minimum working set size (in bytes) for the current process.
    Process prs = Process.GetCurrentProcess();
    prs.MinWorkingSet = (IntPtr)(300000);
}
There are three ways to use this method:
1 - after disposing managed objects such as class instances, ...
2 - from a timer with an interval such as 2000.
3 - from a thread created to call this method.
I suggest calling this method from a thread or a timer.

The best way to do it is as this article shows. It is in Spanish, but you will surely understand the code.
http://www.nerdcoder.com/c-net-forzar-liberacion-de-memoria-de-nuestras-aplicaciones/
Here is the code in case the link gets broken:
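using System;
using System.Runtime.InteropServices;
....
public class anyname
{
....
    // The two size parameters are SIZE_T in the Win32 API, so they are declared as IntPtr
    // (pointer-sized) rather than int, which would be wrong in a 64-bit process.
    [DllImport("kernel32.dll", EntryPoint = "SetProcessWorkingSetSize", ExactSpelling = true, CharSet = CharSet.Ansi, SetLastError = true)]
    private static extern bool SetProcessWorkingSetSize(IntPtr process, IntPtr minimumWorkingSetSize, IntPtr maximumWorkingSetSize);
    public static void alzheimer()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Passing -1 for both sizes asks Windows to remove as many pages as possible from the working set.
        SetProcessWorkingSetSize(System.Diagnostics.Process.GetCurrentProcess().Handle, (IntPtr)(-1), (IntPtr)(-1));
    }
....
You call alzheimer() to clean/release memory.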

The GC doesn't take the unmanaged heap into account. If you are creating lots of objects that are merely C# wrappers around larger blocks of unmanaged memory, then your memory is being devoured, but the GC can't make rational decisions based on this as it only sees the managed heap.
You end up in a situation where the GC doesn't think you are short of memory, because most of the things on your gen 1 heap are 8-byte references, when in actual fact they are like icebergs at sea. Most of the memory is below!
You can make use of these GC calls:
System::GC::AddMemoryPressure(sizeOfField);
System::GC::RemoveMemoryPressure(sizeOfField);
These methods allow the GC to see the unmanaged memory (if you provide it the right figures).
