Monitoring my program's Virtual Bytes usage while it is running showed that by doing some kind operations, the virtual bytes usage goes up by about 1GB in about 5 minutes.
The program deals with tcp sockets and high data transfer throughput between them (~800Mbps).
Loading a dump file of the program in windbg showed that the reason for the very high and fast memory usage is about 1GB of "free" objects.
Indeed, when I call the garbage collector (gen 0, 1 & 2) from the console screen of the program (after getting it to this state) it frees up about 1GB of memory usage.
I'm trying to understand what exactly are these free objects and why aren't they garbage collected by the garbage collector automatically.
Edit: One suggestion was that I may be creating the objects in the Large Object Heap and it becomes fragmanted but this is not the case as I've seen that all of the "free" objects sits in Gen 2 Heap.
Other suggestion was that maybe Gen 2 Heap gets fragmented because of pinned objects but if this was the case, GC.Collect wouldn't fix the problem but it actually does so I believe this is not the case as well.
What I suspect from the discussion with Paul is that the memory does gets freed but is from some reason returned to the OS rarely or only when I manually call GC.Collect.
They are not free 'objects', they are free space. .NET does not release memory it has used back to the operating system immediately. Any free blocks can be used for subsequent object allocations, providing they fit inside the free block (otherwise the heap has to be extended by asking the operating system to allocate more memory).
The garbage collector makes efforts to combine free space into large usable blocks by compacting generation 2. This is not always possible: for example an application may pin objects which will potentially prevent the garbage collector from combining free space by moving the live objects to the front of the heap. If this happens a lot, the application's memory will be broken up into useless small blocks and this affect is known as 'heap fragmentation'.
In addition, there is a Large Object Heap (LOH) in which larger objects are allocated. The rationale is that there is a cost associated with heap compaction as data must be copied around and so the LOH is not compacted, avoiding these costs. However, the flipside is that the LOH can become easily fragmented, with small, less useful blocks of free memory interspersed between live objects.
I would suggest running a dumpheap -stat. This command will report at the end of the list any fragmented blocks. You can then dump those blocks to get an idea of what is happening.
By the way, it looks like you have a well-known problem (at least among socket gurus) that most socket servers get in .Net. Paul has already touched on what it means. To elaborate more, what is going wrong is that during a Read/Write on the socket the buffer gets pinned - which means the GC isn't allowed to move it around (as such your heap fragments). Coversant (who pioneered the solution) were seeing an OutOfMemoryException when their actual memory usage was only about 500MB (due to such heavy fragmentation). Fixing it is another story entirely.
What you want to do is at application start up allocate a very large amount of buffers (I am currently doing 50MB). You will find the new ArraySegment<T> (v2.0) and ConcurrentQueue<T> (v4.0) classes particularly useful when writing this. There are a few lock-free queues floating around on the tubes if you are not using v4.0 yet.
// Pseudo-code.
ArraySegment<byte> CheckOut()
{
ArraySegment<byte> result;
while(!_queue.TryDequeue(out result))
GrowBufferQueue(); //Enqueue a bunch more buffers.
return result;
}
void CheckOut(ArraySegment<byte> buffer)
{
_queue.Enqueue(buffer);
}
void GrowBufferQueue()
{
// Verify this, I did throw it together in 30s.
// Allocates nearly 2MB. You might want to tweak that.
for(var j = 0; j < 5; j++)
{
var buffer = new byte[409600]; // 4096 = page size on Windows.
for(var i = 0; i < 409600; i += 4096)
_queue.Enqueue(new ArraySegment<byte>(buffer, i, 4096));
}
}
Following this you will need to subclass NetworkStream and swap out the incoming buffer with one from your buffer pool. Buffer::BlockCopy will help performance (don't use Array::Copy). This is complicated and hairy; especially if you make it async-capable.
If you are not layering streams (e.g. SSLStream <--> DeflateStream <--> XmlWriter etc.) you should use the new socket async pattern in .Net 4.0; which has more efficiency around the IAsyncResults that get passed around. Because you are not layering streams you have full control over the buffers that get used - so you don't need to go the NetworkStream subclass route.
Related
Using the Visual Studio Concurrency Visualizer I now see why I don't get any benefit switching to Parallel.For: only the 9% of the time the machine is busy executing the code, the rest is 71% synchronization and 17% memory management (1).
Checking all the orange stripes on the diagram below I discovered that GC is always involved (2).
After reading all these interesting topics...
Why do I have a lock here?
https://blog.marcgravell.com/2011/10/assault-by-gc.html
Prevent .NET Garbage collection for short period of time
https://devblogs.microsoft.com/premier-developer/understanding-different-gc-modes-with-concurrency-visualizer/
.. am I right assuming that all these threads need to play with a single memory management object and therefore removing the need to allocate objects on the heap my scenario will improve considerably? Like using structs instead of classes, array instead of dynamic lists, etc.?
I have a lot of work to do to bend my code in this direction. Just wanted to be sure before starting.
From your screenshot it seems like memory allocation is blocked while waiting for GC to complete. There are server and workstation GC modes, and it may be concurrent or not, but all options need to block threads at least a little while. I would check in more detail how often, and how much time you are spending in GC, and how often gen 0/1 and 2 is running.
I believe that each thread has a separate ephemeral segment it uses for allocations, so that it would not need to synchronize allocations, unless it needs a new segment, or the allocation is on the large object heap. But I'm unable to find a reference for this.
In any case, you will likely benefit from reducing the amount and size of allocations. If possible, use a object pool or memory pool to reuse memory. You might also benefit from increasing the amount of memory and checking the application for memory leaks. A general recommendation for memory is that there should be two types of allocations:
Small temporary allocations that only live for a short duration, like a temporary object that live for the duration of a method call.
Long lived allocations of any size that live for the duration of the "application".
If this pattern is followed almost all garbage should be collected in Gen 0/1, and gen 2 collections should be fairly rare.
It also depends a bit if you are allocating many small objects, or large chunks of memory. If the former you may consider using structs since these are stack allocated. If the later you also need to consider memory fragmentation, and this should also improve by using a memory pool that only allocates fixed sized chunks of memory.
Edit:
At the very simplest a object pool could be something like this:
public class ObjectPool<T>
{
private ConcurrentBag<T> pool = new ConcurrentBag<T>();
public T Get(Func<T> constructor) => pool.TryTake(out var result) ? result : constructor();
public void Return(T obj) => pool.Add(obj);
}
This assumes that the objects represent identical resources, like byte arrays of some fixed size. But there are also existing implementations:
.Net core MemoryPool
asp.Net core object pool
stack overflow question regarding object pools
Memory Management The Memory Management report shows the calls where memory management blocks occurred, along with the total blocking
times of each call stack. Use this information to identify areas that
have excessive paging or garbage collection issues.
Further more
Memory management time
These segments in the timeline are associated with blocking times that
are categorized as Memory Management. This scenario implies that a
thread is blocked by an event that is associated with a memory
management operation such as Paging. During this time, a thread has
been blocked in an API or kernel state that the Concurrency Visualizer
is counting as memory management. These include events such as paging
and memory allocation. Examine the associated call stacks and profile
reports to better understand the underlying reasons for blocks that
are categorized as Memory Management.
Yes, allocating less will likely have a large benefit on your resources and efficiency, but that is almost always the case on hot paths and thrashed applications
Heap allocations and particular Large Object Heap (LOB) allocations are costly, it also creates extra work for your The Garbage Collector and can fragment your memory causing even more inefficiency. The less you allocate, or reuse memory, or use the stack the better you are (in general).
This is also where you would learn to use a good memory profiler and get to know your garbage collector.
On saying that this would not be the only tool you would use to make your application less allocatey. A good memory profiler will go a long way, combined with learning how to read the results and affect changes based on the results.
Creating minimal allocation code is an artform, and one worth your learning
Also as #mjwills pointed out in the comments, you would run any change through your benchmark software as well, removing allocations at the cost of CPU time won't make sense. There are a lot of ways to speed up code, and low allocation is just one of a lot of approaches that may help.
Lastly, I would suggest following Marc Gravell and his blogs as a start (Mr DeAllocation), get to know your Garbage Collector and how the generations wortk, and tools like memory profilers and benchmarkers for performant silky smooth production code
There is a possible memory leak in this statement for a very large string (tempText can grow as big as ~10mb).
string strXML = new string(tempText.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray());
Memory allocated for strXML doesn't get released even after exiting the function. And I've to call this function multiple times. Any possible solution, without having this string as a class variable?
I'm not very familiar with C# memory management, can someone shed some light on this issue?
The garbage collector doesn't collect objects the instance that their lifetime ends. It executes periodically, based on it's perceived need, to free up memory. The string will be eventually collected at some indeterminate point in time after it is long longer referenced by any rooted object.
When you build up a large object, it will remain in memory for much longer than other small objects.
Read up on the large object heap and generation 2 garbage collection... it gets technical but those two terms should suffice to point out what's going on here.
That's why you are not seeing the garbage collector reclaim that memory as quickly as you'd like.
In order to overcome this, either allocate work buffers once and reuse them, or work on the data in smaller chunks.
I have a class that has a byte array holding from 1,048,576 bytes up to 134,217,728 bytes.
In the dispose void I have the array set to null and the method calling the dispose calls GC.Collect after that.
If I dispose right away I will get my memory back, but if I wait like 10 hours and dispose, memory usage doesn't change.
Memory usage is based on OS memory allocation. It may be freed immediately, it may not be. This depends on OS utilization, application, etc. You free it in the runtime, but this doesn't mean the OS alway gets it back. I have to think heres a determination based on memory patterns (ie time here) that affects this calculation of when to return it or not. This was addressed here:
Explicitly freeing memory in c#
Note: if the memory isn't released, it doesn't automatically means it used. It's up to the CLR whether it's releases chuncks of memory or not, but this memory isn't wasted.
That said, if you want a technical explanation, you'll want to read litterature about the Large Object Heap: http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
Basically, it's a zone of memory where very large objects (more than 85kB) are allocated. It differs from the other zones of memory in that it's never compacted, and thus can become fragmented. I think what happens in your case is:
Case 1: you allocate the object and immediately call GC.Collect. The object is allocated at the end of the heap, then freed. The CLR sees a free segment at the end of the heap and releases it to the OS.
Case 2: you allocate the object and wait for a while. In the mean time, an other object is allocated in the LOH. Now, your object isn't the last one anymore. Then, when you call GC.Collect, your object is erased, but there's still the other object(s) at the end of the memory segment. So the CLR cannot release the memory to the OS.
Just a guess based on my knowledge of memory management in .NET. I may be completely wrong.
Your findings are not unusual, but it doesn't mean that anything is wrong either. In order to be collected, something must prompt the GC to collect (often an attempted allocation). As a result, you can build an app that consumes a bunch of memory, releases it, and then goes idle. If there is no memory pressure on the machine, and if your app doesn't try to do anything after that, the GC won't fire (because it doesn't need to). Once you get busy, the GC will kick in and do its job. This behavior is very commonly mistaken for a leak.
BTW:
Are you using that very large array more than once? If so, you might be better off keeping it around and reusing it. Reason: any object larger than 85,000 bytes is allocated on the Large Object Heap. That heap only gets GC'd on Generation 2 collections. So if you are allocating and reallocating arrays very often, you will be causing a lot of Gen 2 (expensive) collections.
(note: that doesn't mean that there's a hard and fast rule to always reuse large arrays, but if you are doing a lot of allocation/deallocation/allocation of the array, you should measure how much it helps if you re-use).
I am working on a relatively large solution in Visual Studio 2010. It has various projects, one of them being an XNA Game-project, and another one being an ASP.NET MVC 2-project.
With both projects I am facing the same issue: After starting them in debug mode, memory usage keeps rising. They start at 40 and 100MB memory usage respectively, but both climb to 1.5GB relatively quickly (10 and 30 minutes respectively). After that it would sometimes drop back down to close to the initial usage, and other times it would just throw OutOfMemoryExceptions.
Of course this would indicate severe memory leaks, so that is where I initially tried to spot the problem. After searching for leaks unsuccesfully, I tried calling GC.Collect() regularly (about once per 10 seconds). After introducing this "hack", memory usage stayed at 45 and 120MB respectively for 24 hours (until I stopped testing).
.NET's garbage collection is supposed to be "very good", but I can't help suspecting that it just doesn't do its job. I have used CLR Profiler in an attempt to troubleshoot the issue, and it showed that the XNA project seemed to have saved a lot of byte arrays I was indeed using, but to which the references should already be deleted, and therefore collected by the garbage collector.
Again, when I call GC.Collect() regularly, the memory usage issues seem to have gone. Does anyone know what could be causing this high memory usage? Is it possibly related to running in Debug mode?
After searching for leaks unsuccesfully
Try harder =)
Memory leaks in a managed language can be tricky to track down. I have had good experiences with the Redgate ANTS Memory Profiler. It's not free, but they give you a 14 day, full-featured trial. It has a nice UI and shows you where you memory is allocated and why these objects are being kept in memory.
As Alex says, event handlers are a very common source of memory leaks in a .NET app. Consider this:
public static class SomeStaticClass
{
public event EventHandler SomeEvent;
}
private class Foo
{
public Foo()
{
SomeStaticClass.SomeEvent += MyHandler;
}
private void MyHandler( object sender, EventArgs ) { /* whatever */ }
}
I used a static class to make the problem as obvious as possible here. Let's say that, during the life of your application, many Foo objects are created. Each Foo subscribes to the SomeEvent event of the static class.
The Foo objects may fall out of scope at one time or another, but the static class maintains a reference to each one via the event handler delegate. Thus, they are kept alive indefinitely. In this case, the event handler simply needs to be "unhooked".
...the XNA project seemed to have saved a lot of byte arrays I was indeed using...
You may be running into fragmentation in the LOH. If you are allocating large objects very frequently they may be causing the problem. The total size of these objects may be much smaller than the total memory allocated to the runtime, but due to fragmentation there is a lot of unused memory allocated to your application.
The profiler I linked to above will tell you if this is a problem. If it is, you will likely be able to track it down to an object leak somewhere. I just fixed a problem in my app showing the same behavior and it was due to a MemoryStream not releasing its internal byte[] even after calling Dispose() on it. Wrapping the stream in a dummy stream and nulling it out fixed the problem.
Also, stating the obvious, make sure to Dispose() of your objects that implement IDisposable. There may be native resources lying around. Again, a good profiler will catch this.
My suggestion; it's not the GC, the problem is in your app. Use a profiler, get your app in a high memory consumption state, take a memory snapshot and start analyzing.
First and foremost, the GC works, and works well. There's no bug in it that you have just discovered.
Now that we've gotten that out of the way, some thoughts:
Are you using too many threads?
Remember that GC is non deterministic; it'll run whenever it thinks it needs to run (even if you call GC.Collect().
Are you sure all your references are going out of scope?
What are you loading into memory in the first place? Large images? Large text files?
Your profiler should tell you what's using so much memory. Start cutting at the biggest culprits as much as you can.
Also, calling GC.Collect() every X seconds is a bad idea and will unlikely solve your real problem.
Analyzing memory issues in .NET is not a trivial task and you definitely should read several good articles and try different tools to achieve the result. I end up with the following article after investigations: http://www.alexatnet.com/content/net-memory-management-and-garbage-collector You can also try to read some of Jeffrey Richter's articles, like this one: http://msdn.microsoft.com/en-us/magazine/bb985010.aspx
From my experience, there are two most common reasons for Out-Of-Memory issue:
Event handlers - they may hold the object even when no other objects referencing it. So ideally you need to unsubscribe event handlers to destroy the object automatically.
Finalizer thread is blocked by some other thread in STA mode. For example, when STA thread does a lot of work the other threads are stopped and the objects that are in the finalization queue cannot be destroyed.
Edit: Added link to Large Object Heap fragmentation.
Edit: Since it looks like it is a problem with allocating and throwing away the Textures, can you use Texture2D.SetData to reuse the large byte[]s?
First, you need to figure out whether it is managed or unmanaged memory that is leaking.
Use perfmon to see what happens to your process '.net memory# Bytes in all Heaps' and Process\Private Bytes. Compare the numbers and the memory rises. If the rise in Private bytes outpaces the rise in heap memory, then it's unmanaged memory growth.
Unmanaged memory growth would point to objects that are not being disposed (but eventually collected when their finalizer executes).
If it's managed memory growth, then we'll need to see which generation/LOH (there are also performance counters for each generation of heap bytes).
If it's Large Object Heap bytes, you'll want to reconsider the use and throwing away of large byte arrays. Perhaps the byte arrays can be re-used instead of discarded. Also, consider allocating large byte arrays that are powers of 2. This way, when disposed, you'll leave a large "hole" in the large object heap that can be filled by another object of the same size.
A final concern is pinned memory, but I don't have any advice for you on this because I haven't ever messed with it.
I would also add that if you are doing any file access, make sure that you are closing and/or disposing of any Readers or Writers. You should have a matching 1-1 between opening any file and closing it.
Also, I usually use the using clause for resources, like a Sql Connection:
using (var connection = new SqlConnection())
{
// Do sql connection work in here.
}
Are you implementing IDisposable on any objects and possibly doing something custom that is causing any issues? I would double check all of your IDisposable code.
The GC doesn't take into account the unmanaged heap. If you are creating lots of objects that are merely wrappers in C# to larger unmanaged memory then your memory is being devoured but the GC can't make rational decisions based on this as it only see the managed heap.
You end up in a situation where the GC collector doesn't think you are short of memory because most of the things on your gen 1 heap are 8 byte references where in actual fact they are like icebergs at sea. Most of the memory is below!
You can make use of these GC calls:
System::GC::AddMemoryPressure(sizeOfField);
System::GC::RemoveMemoryPressure(sizeOfField);
These methods allow the garbage collector to see the unmanaged memory (if you provide it the right figures)
My application does a good deal of binary serialization and compression of large objects. Uncompressed the serialized dataset is about 14 MB. Compressed it is arround 1.5 MB. I find that whenever I call the serialize method on my dataset my large object heap performance counter jumps up from under 1 MB to about 90 MB. I also know that under a relatively heavy loaded system, usually after a while of running (days) in which this serialization process happens a few time, the application has been known to throw out of memory excpetions when this serialization method is called even though there seems to be plenty of memory. I'm guessing that fragmentation is the issue (though i can't say i'm 100% sure, i'm pretty close)
The simplest short term fix (i guess i'm looking for both a short term and a long term answer) i can think of is to call GC.Collect right after i'm done the serialization process. This, in my opinion, will garbage collect the object from the LOH and will do so likely BEFORE other objects can be added to it. This will allow other objects to fit tightly tightly against the remaining objects in the heap without causing much fragmentation.
Other than this ridiculous 90MB allocation i don't think i have anything else that uses a lost of the LOH. This 90 MB allocation is also relatively rare (arround every 4 hours). We of course will still have the 1.5 MB array in there and maybe some other smaller serialized objects.
Any ideas?
Update as a result of good responses
Here is my code which does the work. I've actually tried changing this to compress WHILE serializing so that serialization serializes to a stream at the same time and i don't get much better result. I've also tried preallocating the memory stream to 100 MB and trying to use the same stream twice in a row, the LOH goes up to 180 MB anyways. I'm using Process Explorer to monitor it. It's insane. I think i'm going to try the UnmanagedMemoryStream idea next.
I would encourage you guys to try it out if you wont. It doesn't have to be this exact code. Just serialize a large dataset and you will get surprising results (mine has lots of tables, arround 15 and lots of strings and columns)
byte[] bytes;
System.Runtime.Serialization.Formatters.Binary.BinaryFormatter serializer =
new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
System.IO.MemoryStream memStream = new System.IO.MemoryStream();
serializer.Serialize(memStream, obj);
bytes = CompressionHelper.CompressBytes(memStream.ToArray());
memStream.Dispose();
return bytes;
Update after trying binary serialization with UnmanagedMemoryStream
Even if I serialize to an UnmanagedMemoryStream the LOH jumps up to the same size. It seems that no matter what i do, called the BinaryFormatter to serialize this large object will use the LOH. As for pre-allocating, it doesn't seem to help much. Say i pre-allocate say i preallocate 100MB, then i serialize, it will use 170 MB. Here is the code for that. Even simpler than the above code
BinaryFormatter serializer = new BinaryFormatter();
MemoryStream memoryStream = new MemoryStream(1024*1024*100);
GC.Collect();
serializer.Serialize(memoryStream, assetDS);
The GC.Collect() in the middle there is just to update the LOH performance counter. You will see that it will allocate the correct 100 MB. But then when you call the serialize, you will notice that it seems to add that on top of the 100 that you have already allocated.
Beware of the way collection classes and streams like MemoryStream work in .NET. They have an underlying buffer, a simple array. Whenever the collection or stream buffer grows beyond the allocated size of the array, the array gets re-allocated, now at double the previous size.
This can cause many copies of the array in the LOH. Your 14MB dataset will start using the LOH at 128KB, then take another 256KB, then another 512KB, etcetera. The last one, the one actually used, will be around 16MB. The LOH contains the sum of these, around 30MB, only one of which is in actual use.
Do this three times without a gen2 collection and your LOH has grown to 90MB.
Avoid this by pre-allocating the buffer to the expected size. MemoryStream has a constructor that takes an initial capacity. So do all collection classes. Calling GC.Collect() after you've nulled all references can help unclog the LOH and purge those intermediate buffers, at the cost of clogging the gen1 and gen2 heaps too soon.
Unfortunately, the only way I could fix this was to break up the data in chunks so as not to allocate large chunks on the LOH. All the proposed answers here were good and were expected to work but they did not. It seems that the binary serialization in .NET (using .NET 2.0 SP2) does its own little magic under the hood which prevents users from having control over memory allocation.
Answer then to the question would be "this is not likely to work". When it comes to using .NET serialization, your best bet is to serialize the large objects in smaller chunks. For all other scenarios, the answers mentioned above are great.
90MB of RAM is not much.
Avoid calling GC.Collect unless you have a problem. If you have a problem, and no better fix, try calling GC.Collect and seeing if your problem is solved.
Don't worry about LOH size jumping up. Worry about allocating/deallocating LOH. .Net very dumb about LOH -- rather than allocating LOH objects far away from regular heap, it allocates at next available VM page. I have a 3D app that does much allocate/deallocate of both LOH and regular objects -- the result (as seen in DebugDiag dump report) is that pages of small heap and large heap end up alternating throughout RAM, until there are no large chunks of the applications 2 GB VM space left. The solution when possible is to allocate once what you need, and then don't release it -- re-use it next time.
Use DebugDiag to analyze your process. See how the VM addresses gradually creep up towards 2 GB address mark. Then make a change that keeps that from happening.
I agree with some of the other posters here that you might want to try and use tricks to work with the .NET Framework instead of trying to force it to work with you via GC.Collect.
You may find this Channel 9 video helpful which discusses ways to ease pressure on the Garbage collector.
If you really need to use the LOH for something like a service or something that needs to be running for a long time, you need to use buffer pools that are never deallocated and that you can ideally allocate on start-up. This means you'll have to do your 'memory management' yourself for this, of course.
Depending on what you're doing with this memory, you might also have to p/Invoke over to native code for selected parts to avoid having to call some .NET API that forces you to put the data on newly allocated space in the LOH.
This is a good starting point article about the issues: https://devblogs.microsoft.com/dotnet/using-gc-efficiently-part-3/
I'd consider you very lucky if you GC trick would work, and it would really only work if there isn't much going on at the same time in the system. If you have work going on in parallel, this will just slightly delay the unevitable.
Also read up on the documentation about GC.Collect.IIRC, GC.Collect(n) only says that it collects no further than the generation n -- not that it actually ever GETS to generation n.