Deallocate memory from C# dictionary contained in a static object - c#

I had some problems with a WCF web service (some dumps, memory leaks, etc.), so I ran a profiling tool (ANTS Memory Profiler).
Just to find out that even with the processing over (I ran a specific test and then stopped), Generation 2 held 25% of the memory for the web service. I tracked that memory down to a dictionary object full of (null, null) items, each with a hash code of -1.
The workflow of the web service means that during a specific processing run, items are added to and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary stays full of (null, null) KeyValuePairs. Thousands of them in fact, enough that they occupy a big part of memory; eventually memory runs out, with the corresponding forced application pool recycle and DW20.exe grabbing all the CPU cycles it can get.
The dictionary is in fact Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (see System.OutOfMemoryException because of Large Dictionary), so I have already checked whether some kind of reference is holding things alive.
The dictionary is contained in a static object (to make it accessible to different threads throughout processing), so from this question and many more (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But is this also the cause of those (null, null) entries? Even if I remove items from the dictionary, will some memory always stay occupied?
It's not a speed issue like in the question Deallocate memory from large data structures in C#. It seems that the memory is never reclaimed.
Is there something I can do to actually remove items from the dictionary, instead of just filling it with (null, null) pairs?
Is there anything else I need to check out?

Dictionaries store items in a hash table. An array is used internally for this. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0 and especially The System.Collections.Hashtable Class and The System.Collections.Generic.Dictionary Class). Therefore, the null/null entries could simply represent this free space.
If the array is too small, it will grow automatically; however, when items are removed, the array does not shrink. The space that is freed up should be reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non-empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign it another dictionary or null (theDict = new ... or theDict = null). This is only true for the dictionary itself, which is static, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim any object (sooner or later) that can no longer be accessed through some reference. It makes no difference whether this object was declared static or not. The objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but which didn't exist back then.
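For illustration, a minimal sketch of using it on the dictionary discussed above (requires .NET Core 2.1 or later):

// theDict is the static dictionary; call this after the bulk removals are done.
theDict.TrimExcess(); // shrinks the internal buckets/entries arrays to fit the current Count

There is also an overload, TrimExcess(int capacity), if you want to keep some headroom.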

Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
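A minimal sketch of that idea, assuming a small wrapper class owns the dictionary (all names and the threshold are invented for illustration):

class RecyclingCache<TKey, TValue>
{
    private readonly object _sync = new object();
    private Dictionary<TKey, TValue> _dict = new Dictionary<TKey, TValue>();
    private int _writes;
    private const int RebuildThreshold = 100000;

    public void Add(TKey key, TValue value)
    {
        lock (_sync)
        {
            _dict[key] = value;
            if (++_writes >= RebuildThreshold)
            {
                // copying into a fresh dictionary sizes the new backing array to Count
                _dict = new Dictionary<TKey, TValue>(_dict);
                _writes = 0;
            }
        }
    }

    public bool Remove(TKey key)
    {
        lock (_sync) { return _dict.Remove(key); }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_sync) { return _dict.TryGetValue(key, out value); }
    }
}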

A solution could be to call the Clear() method on the static dictionary.
That way the reference to the dictionary remains valid, but the objects it contained become eligible for collection (provided nothing else still references them).

Related

C# Issue handling memory usage for SearchResultAttributeCollection LDAP

Using the System.DirectoryServices.Protocols library:
I have a class LdapItemOperator that takes a SearchResultEntry object from an LDAP query (not Active Directory related) and stores the attributes for the object in a field: readonly SearchResultAttributeCollection LdapAttributes.
The problem I am experiencing is that during a large operation the garbage collector never seems to collect these objects after they should no longer be needed, and I suspect the LdapAttributes field in my objects is the reason. What ways can I try to dispose of the objects when they are no longer required? I can't seem to find a way to incorporate a using statement there, although I have only a little experience with it.
As an example, let's say I have the following logic:
List<LdapItemOperator> itemList = GetList(ldapFilter);
List<bool> resultList = new List<bool>();
foreach (LdapItemOperator item in itemList) {
    bool result = doStuff(item);
    resultList.Add(result);
}
// Even though we are out of the loop now, the objects are still stored in memory, how come?
// Same goes for the previous objects in the loop, they seem to remain in memory.
Logic.WriteResultToLog(resultList);
After a good while of running the logic on large filesets, this process starts taking up enormous amounts of memory, of course...
I think you might be a little confused about how GC works. You can never know exactly when GC will run. And objects you are still holding a reference to will not be collected (unless it's a weak reference...).
Also "disposing" is yet another different concept, that hasn't much to do with GC.
Basically, all objects will be in memory already after the call to GetList. And memory consumption will not change much after that, the foreach loop shouldn't affect it at all.
Without knowing your implementation, maybe try returning an enumerable instead of a single list, or make multiple batched calls.
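For example, a rough sketch of streaming the entries instead of materializing them all in one list (RunPagedSearch is a hypothetical helper that pages through the LDAP results, and I'm assuming LdapItemOperator can be constructed from a SearchResultEntry):

IEnumerable<LdapItemOperator> EnumerateItems(string ldapFilter)
{
    // yield one item at a time; nothing here keeps a reference to earlier entries
    foreach (SearchResultEntry entry in RunPagedSearch(ldapFilter))
    {
        yield return new LdapItemOperator(entry);
    }
}

void Process(string ldapFilter)
{
    var resultList = new List<bool>();
    foreach (var item in EnumerateItems(ldapFilter))
    {
        resultList.Add(doStuff(item));
        // no other reference to 'item' is kept, so it becomes collectable here
    }
    Logic.WriteResultToLog(resultList);
}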

Details of write barriers in the .Net Garbage Collector

I have a large T[] in generation 2 on the Large Object Heap. T is a reference type. I make the following assignment:
T[0] = new T(..);
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T? Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList<K, T> and I added a new, maximal item?
I've read through many questions and articles, including the ones below, but I still don't think I've found a clear answer.
I'm aware that an entire range of memory is marked as dirty, not individual objects, but is the new array entry or the array itself the basis of this?
card table and write barriers in .net GC
Garbage Collector Basics and Performance Hints
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T?
The 128B block containing the start of the array will be marked as dirty. The newly created instance (new T()) is a new object, so it will first be checked through a Gen 0 collection without the card table.
For simplicity, presuming that the start of the array is aligned on a 128B boundary, this means the first 128B will be invalidated, so presuming T is a reference type and you're on a 64-bit system, that's the first 16 items to check during the next collection.
Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Just these 16 to 32 items, depending on the pointer size of the architecture.
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList and I added a new, maximal item?
Arrays are not special. A SortedList<K,T> maintains two arrays internally, so more blocks will end up dirty in the average case.
Pretty sure it's tracking array slots, not the root that holds the reference to the array object itself.
BTW, if a particular card is marked dirty, it has to scan 4 KB of memory. I've read somewhere that it now uses Windows' own mechanism, which lets you get notifications when a memory range of interest is written to.

Where is the List<MyClass> object buffer maintained? Is it on RAM or HDD?

My question might sound a little vague. But what I want to know is where the List<> buffer is maintained.
I have a list List<MyClass> to which I am adding items from an infinite loop. But the RAM consumption of the Windows Service (inside which I am creating the List) never goes beyond 17 MB. In fact, it hovers between 15 and 16 MB even as I continue adding items to the List.
I was trying to do some Load Testing of My Service and came across this thing.
Can anyone tell me whether it dumps the data to some temporary location on the machine and picks it up from there, since I don't see an increase in RAM consumption?
The method which I am calling infinitely is AddMessageToList().
class MainClass
{
    List<MessageDetails> messageList = new List<MessageDetails>();

    private void AddMessageToList()
    {
        SendMessage(ApplicationName, Address, Message);

        MessageDetails obj = new MessageDetails();
        obj.ApplicationName = ApplicationName;
        obj.Address = Address;
        obj.Message = Message;

        lock (messageList)
        {
            messageList.Add(obj);
        }
    }
}

class MessageDetails
{
    public string Message { get; set; }
    public string ApplicationName { get; set; }
    public string Address { get; set; }
}
The answer to your question is: "In Memory".
That can mean RAM, and it can also mean the hard drive (Virtual Memory). The OS memory manager decides when to page memory to Virtual Memory, which mostly has to do with how often the memory is accessed (though I don't pretend to know Microsoft's specific algorithm).
You also asked why your memory usage isn't going up. First off, a megabyte is a HUGE amount of memory. Unless your class is quite large, you will need a LOT of instances for another MB to appear. Eventually your memory usage should go up, though.
In general C# objects are created from the Heap, which resides in memory. If you want to store things on disk there are ways to go about it, but a standard List<T> will live in memory.
When you create an object it will occupy a certain number of bytes in memory plus the size of the pointers used to reference it. Adding it to a list only adds a pointer to the object you've already created, so if you're adding lots of copies of the same instance into the list, it won't grow as fast as you expect.
If you really want to test the impact of large data structures on your memory, you're going to have to ramp up the numbers. A few thousand average objects aren't going to occupy much memory, but a few million might.
You might also be interested in the GC.GetTotalMemory() method and its friends.
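A rough sketch of how you could use it to watch the list's impact, reusing the MessageDetails type from the question (the counts are invented for illustration):

long before = GC.GetTotalMemory(forceFullCollection: true);

var list = new List<MessageDetails>();
for (int i = 0; i < 1000000; i++)
{
    list.Add(new MessageDetails { Message = "m", ApplicationName = "a", Address = "x" });
}

long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"Managed heap grew by roughly {(after - before) / (1024 * 1024)} MB");
GC.KeepAlive(list); // keep the list reachable so it isn't collected before the measurement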
Note that pretty much all memory on Windows (and .NET) is Virtual Memory - its "real, physical" location is arbitrary, Windows memory management handles that. However, regardless of whether it's currently using physical RAM or a page file on the HDD, it will show up as committed private memory.
So it's up to how you're actually creating the items and adding them to the List<T>. How many objects are there? Are you adding the same object over and over again, or creating a new one every time? Are you using the same List instance, or are you creating others? Do you keep references to the created objects (and List instances), or are you throwing them away? Do you actually do anything with the object / list? If not, the optimizer might have removed the code altogether (it's very conservative, though, so I wouldn't count on that when adding items to a list - that's a very complex scenario with possible side effects).
In the ideal lowest-memory case, you could be using about four bytes per list item; that's not much - you'd need 262,144 items to consume a single MiB of memory!
Show us your code, the whole loop and its surroundings. Then we can tell you what you're actually doing.
EDIT: This is in a WCF service? You should have said so before. Where do you store the MainClass class? If it's inside the WCF service class, it might not last longer than a single request. And even if you fix that, and store it in something a bit more persistent, like a static class, you get into the complexities of when everything is collected, how the service is being restarted etc. If you need the data to be safely held for longer than a single request, storing it in process memory isn't good enough. If you don't care that the data can get thrown away once in a while, you can make the List instance static (it's not going to be shared nor persisted otherwise). Otherwise, use a database.
Speculating from your sparse description, there are two sorts of things you might not realize:
The 15-16 MB usage you see might have nothing to do with the size of your list: it could be the memory requirements for the rest of the program, and your list only consumes a negligible amount of memory in comparison. Even if you don't explicitly create objects, your program still has to load libraries and stuff, which takes memory.
I don't know C#, so I don't know if this applies to List, but one of the standard container class implementations is to dynamically allocate an array to hold the objects... and if the array is ever filled, you allocate a new array of twice the size, copy everything over to the new array, and continue along (the actual growth factor may be something other than 2). This can have the effect that your memory usage remains constant for a long time, until you finally fill up the array and it suddenly jumps in size, only to remain constant again for a long time.
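This is in fact how List<T> behaves in .NET; a quick sketch to watch the growth (the exact capacities are an implementation detail and may differ between runtimes):

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // Capacity is the size of the backing array, Count is the number of items
        Console.WriteLine($"Count = {list.Count,3}, Capacity = {list.Capacity}");
        lastCapacity = list.Capacity;
    }
}
// Typically prints capacities of 4, 8, 16, 32, 64, 128.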

Return object to pool when no references point to it

OK, I want to do the following. To me it seems like a good idea, so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make a new instance of MatrixNode every time a node is requested. This makes sure that only the needed nodes are kept in memory and the rest will be collected. What I don't like about it is that a new object has to be created every time, and I'm worried that will be too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of it being collected, I want to store it in a pool. Later, when a new node is needed, I want to first check the pool and only make a new node if there isn't one already available that can just have its data swapped out.
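Roughly the kind of cache I'm picturing for the weak-reference part (just a sketch, all names invented; the MatrixNode constructor shown is hypothetical):

class NodeCache
{
    private readonly Dictionary<(int Row, int Col), WeakReference<MatrixNode>> _cache =
        new Dictionary<(int Row, int Col), WeakReference<MatrixNode>>();

    public MatrixNode GetNode(int row, int col)
    {
        if (_cache.TryGetValue((row, col), out var weak) && weak.TryGetTarget(out var node))
        {
            return node; // still referenced somewhere, hand out the same instance
        }

        node = new MatrixNode(row, col);
        _cache[(row, col)] = new WeakReference<MatrixNode>(node);
        return node;
    }
}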
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are already sparse array implementations in C#, like Math.NET and MetaNumerics. These libraries are optimized for performance and will probably avoid the performance issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix will return many related questions, including answers pointing to commercial libraries like ILNumerics (which has a community edition), NMath, and Extreme Optimization's libraries.
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start using COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
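A sketch of that lookup, assuming Items is a List<SparseMatrixItem> kept sorted by (Row, Col) (the comparer and helper names here are illustrative, not part of the interface above):

class SparseMatrixItemComparer : IComparer<SparseMatrixItem>
{
    public int Compare(SparseMatrixItem a, SparseMatrixItem b)
    {
        int byRow = a.Row.CompareTo(b.Row);
        return byRow != 0 ? byRow : a.Col.CompareTo(b.Col);
    }
}

static bool TryGetValue(List<SparseMatrixItem> items, int i, int j, out double value)
{
    var probe = new SparseMatrixItem { Row = i, Col = j };
    int index = items.BinarySearch(probe, new SparseMatrixItemComparer());
    if (index >= 0)
    {
        value = items[index].Value;
        return true;
    }
    value = 0.0; // empty cells are implicitly zero
    return false;
}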
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.

Is it bad form to let C# garbage collect a list instead of reusing it? [duplicate]

This question already has answers here:
Using the "clear" method vs. New Object
(5 answers)
Closed 8 years ago.
I have a list of elements that steadily grows, until I dump all the data from that list into a file. I then want to reuse that list for the same purpose again. Is it bad practice to simply assign it to a new list, instead of removing all the elements from the list? It seems garbage collection should take care of the old list, and that way I don't have to worry about removing the elements.
For example:
var myList = new List<element>();
myList.Add(someElement);
myList.Add(anotherElement);
// dumps the elements into a file
myList = new List<element>();
Edit: Even if there are easy ways around this, I was wondering too about the philosophical side of it. Is it bad to let something be garbage collected if there is a way around it? What are the costs of allowing garbage collection vs deleting the elements and reusing the same memory?
It depends a bit on how many elements are in the list. If the array backing the list is large enough to be on the large object heap, then you might be better off clearing the list and reusing it. This will reduce the number of large memory allocations, and will help reduce the problem of large object heap fragmentation. (See http://msdn.microsoft.com/en-us/magazine/cc534993.aspx and http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/ for more information; see http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx for improvements due with .NET 4.5)
If the lists are small, you might be better off just creating a new list, or you might get better performance calling Clear(). When in doubt, measure the performance.
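If you want a quick, unscientific way to measure it, something like this Stopwatch sketch works (the round and item counts are invented):

var sw = System.Diagnostics.Stopwatch.StartNew();
var reused = new List<int>();
for (int r = 0; r < 1000; r++)
{
    for (int i = 0; i < 100000; i++) reused.Add(i);
    reused.Clear(); // keeps the backing array for the next round
}
Console.WriteLine($"Clear():  {sw.ElapsedMilliseconds} ms");

sw.Restart();
for (int r = 0; r < 1000; r++)
{
    var fresh = new List<int>();
    for (int i = 0; i < 100000; i++) fresh.Add(i);
    // 'fresh' becomes garbage here; its large backing array may land on the LOH
}
Console.WriteLine($"new List: {sw.ElapsedMilliseconds} ms");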
Edit: In response to the philosophical question you pose in your edit, here are two reasons to create a new list:
In general, code is cleaner and easier to reason about if you do not reuse objects. The cost of garbage collection is low, the cost of confusing code is high.
Consider what happens if the code dumping the list's contents is in another function, as it most likely is. Once you've passed that list out of its local context, it's possible that there are non-local references to the same list. Other code might be modifying the list, or might be assuming (incorrectly) that you're not modifying it.
myList.Clear() is even easier to code than myList = new List<element>();
msdn: List.Clear Method
Each element in the list is a different object itself, and will need to be garbage collected whether you clear the list, or recreate a new list, or remove the items one at a time. What will NOT need to be garbage collected if you just clear the list and reuse it is the list itself. Unless your list is huge, containing hundreds of thousands of items, it will be difficult to measure a performance difference one way or the other. Fortunately, the garbage collector is highly optimized and it's a rare occurrence where developers need to consider what it is doing.
(As others have pointed out, there are various factors involved, such as...how many elements will you be adding to the new list? vs how many elements were in the old list? ...but the point is: the garbage collection of the list itself isn't relevant when it comes to collecting the elements of the list.)
I'm no expert, but:
Making a new list and expecting the GC to "take care" of the old one is probably a bad idea; it's generally considered bad practice and is probably inefficient.
Although it's a micro-optimization, I'd say that "setting" the new values by index until you reach list.Count, and then continuing with list.Add, is the best way, because then you neither clear nor allocate unnecessary new memory (unless these are large lists that you want to clear to free space).
Anyway, I would recommend using List.Clear() - it saves you and the GC trouble.
It sounds like you're asking two different questions. One is whether it's okay to set it to a new object or just clear it, which I think Eric answered pretty well. The second is whether you should just ignore the GC and let it work without trying to "help" it - to that, I'd say absolutely YES. Let the framework do what the framework does and stay out of its way until you have to.
A lot of programmers want to dig in too deep, and most of the time it causes more problems than it helps. The GC is designed to collect these things and clean them up for you. Unless you are seeing a very specific problem, you should write the code that works and pay no attention to when something will be collected (with the exception of the using keyword where appropriate).
The important perspective is clean code.
When you create a new list, the old one will be removed by the GC (if there are no other references to it).
I would rather use List.Clear() to remove all the elements for re-use. The Capacity remains unchanged, so there shouldn't be any additional overhead cost, and letting the GC handle the memory keeps the code clean.
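A tiny check of the Capacity point (the exact capacity value is an implementation detail):

var list = new List<int>();
for (int i = 0; i < 100000; i++) list.Add(i);
Console.WriteLine(list.Capacity); // e.g. 131072 - the size of the backing array
list.Clear();
Console.WriteLine(list.Count);    // 0
Console.WriteLine(list.Capacity); // still 131072 - Clear() keeps the backing array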
