C#: Issue handling memory usage for SearchResultAttributeCollection (LDAP)

Using the System.DirectoryServices.Protocols library:
I have a class LdapItemOperator that takes a SearchResultEntry object from an LDAP query (not Active Directory related) and stores the attributes for the object in a field: readonly SearchResultAttributeCollection LdapAttributes.
The problem I am experiencing is that during a large operation the garbage collector never seems to collect these objects after they ought to have gone out of use, and I suspect the LdapAttributes field in my objects is the cause. How can I dispose of the objects once they are no longer required? I can't see a way to incorporate a using statement here, although I have only a little experience with it.
As an example, let's say I have the following logic:
List<LdapItemOperator> itemList = GetList(ldapFilter);
List<bool> resultList = new List<bool>();
foreach (LdapItemOperator item in itemList) {
    bool result = doStuff(item);
    resultList.Add(result);
}
// Even though we are out of the loop now, the objects are still held in memory. Why? The same goes for the earlier objects from the loop; they seem to remain in memory.
Logic.WriteResultToLog(resultList);
After a good while of running the logic on large filesets, this process starts taking up enormous amounts of memory, of course...

I think you might be a little confused about how GC works. You can never know exactly when GC will run. And objects you are still holding a reference to will not be collected (unless it's a weak reference...).
Also "disposing" is yet another different concept, that hasn't much to do with GC.
Basically, all objects will be in memory already after the call to GetList. And memory consumption will not change much after that, the foreach loop shouldn't affect it at all.
Without knowing your implementation, maybe try returning an enumerable instead of a single list, or make multiple batched calls.
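For illustration, here is a minimal sketch of the streaming approach; GetSearchResults stands in for whatever code currently produces the SearchResultEntry objects, since that part isn't shown in the question:

// Sketch only: stream items one at a time instead of materializing the whole
// result set up front. GetSearchResults is a placeholder for the query code.
IEnumerable<LdapItemOperator> GetItems(string ldapFilter)
{
    foreach (SearchResultEntry entry in GetSearchResults(ldapFilter))
    {
        yield return new LdapItemOperator(entry);
    }
}

var resultList = new List<bool>();
foreach (var item in GetItems(ldapFilter))
{
    // item (and its SearchResultAttributeCollection) becomes unreachable
    // as soon as the loop advances, so the GC is free to collect it.
    resultList.Add(doStuff(item));
}
Logic.WriteResultToLog(resultList);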

Related

Running an Enumerable multiple times if the data source remains unchanged

I understand that an IEnumerable risks returning different results on multiple runs.
But is that still a problem if we are sure the underlying record set will never change in between, and the sequence of the loop doesn't matter at all?
It's a shame to call ToList / ToArray everywhere without any consideration when it's only a "possible" risk. R# or VS could simply mark it as an error if it should never happen.
Is there really no exception at all?
Should we never iterate an IEnumerable multiple times?
This is what actually happened, in a single-threaded environment:
void Main()
{
    var result = GetFiles(new[] { path1, path2 }); // hardcoded paths
}

IList<SomeFile> GetFiles(IEnumerable<string> filePaths)
{
    var paths = filePaths.ToArray(); // <-- why do we have to do this?
    foreach (var path in paths)
    {
        // Throw an exception if the path does not exist.
    }
    var files = new List<SomeFile>();
    foreach (var path in paths)
    {
        // Process each path and add the resulting file to the list.
    }
    return files;
}
I understand it makes little difference here since the collection is so small, but we are at the beginning of a project that requires dealing with big collections of static data. This kind of practice might become a big problem if applied everywhere without considering whether it is necessary.
The concern about getting different results on a second iteration is a distant second to a much more realistic performance concern. Unless your IEnumerable<T> is actually a collection in memory, you run the risk of having to reproduce it each time you enumerate it. This could be very costly:
If IEnumerable<T> comes from another LINQ expression, you spend CPU cycles to recompute the same thing,
If IEnumerable<T> comes from the database, you may end up re-reading the data from the server,
If IEnumerable<T> comes from a file, you will re-read the file.
None of the above affects correctness, but it may dramatically decrease speed, especially for large data sets. Since memory is relatively cheap these days and the garbage collection system is pretty reliable, temporarily saving a collection in a list or an array is an inexpensive way to avoid the problem.
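A small self-contained illustration of the recomputation cost (ExpensiveTransform is an invented placeholder for some costly per-item work):

using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    // Placeholder for some costly per-item computation.
    static int ExpensiveTransform(int x)
    {
        Console.WriteLine($"computing {x}");
        return x * x;
    }

    static void Main()
    {
        IEnumerable<int> source = Enumerable.Range(1, 3);

        // Deferred: nothing runs yet.
        IEnumerable<int> expensive = source.Select(ExpensiveTransform);

        var max = expensive.Max(); // prints "computing 1..3"
        var min = expensive.Min(); // prints "computing 1..3" all over again

        // Materializing once avoids the recomputation, at the cost of memory.
        int[] snapshot = expensive.ToArray();
        max = snapshot.Max();      // no further computation
        min = snapshot.Min();
    }
}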
If it is an in-memory collection and not an abstract enumerable object, just use an appropriate interface (IReadOnlyCollection, IReadOnlyList, etc.) instead of IEnumerable. If you accept IEnumerable, you should assume it can be any IEnumerable implementation.
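Applied to the question's GetFiles, the signature might look like this (a sketch; ValidatePathExists and LoadFile are placeholder names for the validation and processing steps):

public IList<SomeFile> GetFiles(IReadOnlyCollection<string> filePaths)
{
    // IReadOnlyCollection<T> promises an in-memory, repeatedly enumerable
    // collection, so no defensive ToArray() is needed.
    foreach (var path in filePaths)
        ValidatePathExists(path);   // throw if the path does not exist

    var files = new List<SomeFile>();
    foreach (var path in filePaths)
        files.Add(LoadFile(path));  // process and collect each file

    return files;
}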

C# WebAPI Garbage Collection

I just delivered my first C# WebAPI application to the first customer. Under normal load, performance is even better than I expected. Initially.
Everything worked fine until, at some point, memory was up and garbage collection started running riot (as in "it collects objects that are not yet garbage"). At that point, there were multiple W3WP worker processes with some ten gigs of RAM altogether, and single-digit gigs per worker. After a restart of IIS everything was back to normal, but of course the memory usage started rising again.
Please correct me if I am wrong, but
Shouldn't C# have automatic garbage collection?
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
And please help me out:
How can I explicitly state what GC should collect, thus preventing memory leaks? Is someBigList = null; the way to go?
How can I detect where the memory leaks are?
EDIT: Let me clarify some things.
My .NET WebAPI application is mostly a bunch of
public class MyApiController : ApiController
{
    [HttpGet]
    public MyObjectClass[] MyApi(string someParam)
    {
        List<MyObjectClass> list = new List<MyObjectClass>();
        ...
        for/while/foreach {
            MyObjectClass obj = new MyObjectClass();
            obj.firstStringAttribute = xyz;
            ...
            list.Add(obj);
        }
        return list.ToArray();
    }
}
Under such conditions, GC should be easy: after "return", all local variables should be garbage. Yet with every single call the used memory increases.
I initially thought that C# WebAPI programs behave similarly to (pre-compiled) PHP: IIS calls the program, it is executed, returns the value, and is then completely disposed of.
But this is not the case. For instance, I found that static variables keep their data between runs, and I have now gotten rid of all static variables.
Because I found static variables to be a problem for GC:
internal class Helper
{
    private static List<string> someVar = new List<string>();

    internal Helper()
    {
        someVar = new List<string>();
    }

    internal void someFunc(string str)
    {
        someVar.Add(str);
    }

    internal string[] someOtherFunc(string str)
    {
        string[] s = someVar.ToArray();
        someVar = new List<string>();
        return s;
    }
}
Here, under low-memory conditions, someVar threw a null reference exception, which in my opinion can only be caused by GC, since I did not find any code of mine that actively nulls out someVar.
I think the memory increase slowed down once I actively set the biggest array variables in the most frequently used controllers to null, but this is only a gut feeling and nowhere near a complete solution.
I will now do some profiling using the link you provided, and get back with some results.
Shouldn't C# have automatic garbage collection?
C# is a programming language for the .NET runtime, and .NET brings the automatic garbage collection to the table. So, yes, although technically C# isn't the piece that brings it.
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
Sure, it should be just as easy as for any other type of .NET application.
The common theme here is garbage. How does .NET determine that something is garbage? By verifying that there are no more live references to the object. To be honest, I think it is far more likely that one of your assumptions is wrong than that there is a serious bug in the garbage collector such that "it collects objects that are not yet garbage".
To find leaks, you need to figure out what objects are currently held in memory, make a determination whether that is correct or not, and if not, figure out what is holding them there. A memory profiler application would help with that, there are numerous available, such as the Red-Gate ANTS Memory Profiler.
For your other questions, how to make something eligible for garbage collection? By turning it into garbage (see definition above). Note that setting a local variable to null may not necessarily help or be needed. Setting a static variable to null, however, might. But the correct way to determine that is to use a profiler.
Here are some shot-in-the-dark type of tips you might look into:
Look at static classes, static fields, and static properties. Are you storing data there that is accumulating?
How about static events? Do you have any? Do you remember to unsubscribe from them when you no longer need them? (See the sketch after this list.)
And by "static fields, properties, and events", I also mean normal instance fields, properties and events that are held in objects that directly or indirectly are stored in static fields or properties. Basically, anything that will keep the objects in memory.
Are you remembering to Dispose of all your IDisposable objects? If not, the memory being used could be unmanaged. Typically, when the garbage collector collects the managed object, that object's finalizer should clean up the unmanaged memory as well; however, you might allocate memory the GC algorithm isn't aware of, so it thinks waiting to collect isn't a big problem. See the GC.AddMemoryPressure method for more on this.
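To make the static-event tip concrete, here is a hedged sketch of a classic leak and its fix; GlobalEvents, CacheUpdated, and RequestHandler are invented names, not from the question:

using System;

static class GlobalEvents
{
    // A static event holds strong references to every subscriber.
    public static event EventHandler CacheUpdated;

    public static void RaiseCacheUpdated() =>
        CacheUpdated?.Invoke(null, EventArgs.Empty);
}

class RequestHandler
{
    public RequestHandler()
    {
        GlobalEvents.CacheUpdated += OnCacheUpdated;
    }

    private void OnCacheUpdated(object sender, EventArgs e) { /* ... */ }

    // Without this, every RequestHandler ever created stays reachable through
    // the static event's invocation list and can never be collected.
    public void Detach()
    {
        GlobalEvents.CacheUpdated -= OnCacheUpdated;
    }
}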

Return object to pool when no references point to it

OK, I want to do the following. To me it seems like a good idea, but if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make new instances of MatrixNode every time a node is requested. This will make sure that only the needed nodes are kept in the memory and the rest will be collected. What I don't like about it is that a new object has to be created every time. I'm scared about it being too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of letting it be collected, I want to store it in a pool. Later, when a new node is needed, I want to first check whether the pool is empty, and only make a new node if there isn't one already available whose data can simply be swapped out.
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations in C# already, like Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably avoid the performance issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix will return many related questions, including answers pointing to commercial libraries like ILNumerics (has a community edition), NMath, and Extreme Optimization's libraries.
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start with COO (coordinate list storage). What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
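For example, such a lookup might look like the following sketch, assuming Items is kept sorted by Row first, then Col:

// Binary search over the sorted item list; returns true and the value
// if an entry exists for (row, col). Empty cells are implicitly zero.
static bool TryGetValue(IList<SparseMatrixItem> items, int row, int col, out double value)
{
    int lo = 0, hi = items.Count - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        var item = items[mid];
        // Compare by row first, then by column.
        int cmp = item.Row != row ? item.Row.CompareTo(row) : item.Col.CompareTo(col);
        if (cmp == 0) { value = item.Value; return true; }
        if (cmp < 0) lo = mid + 1;
        else hi = mid - 1;
    }
    value = 0.0;
    return false;
}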
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.

Deallocate memory from C# dictionary contained in a static object

I had some problems with a WCF web service (some dumps, memory leaks, etc.), so I ran a profiling tool (ANTS Memory Profiler).
Just to find out that even with the processing over (I ran a specific test and then stopped), Generation 2 holds 25% of the memory for the web service. I tracked down this memory and found a dictionary object full of (null, null) items with a hash code of -1.
The workflow of the web service implies that during specific processing items are added and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary is full of (null, null) KeyValuePairs. Thousands of them in fact, such that they occupy a big part of memory and eventually an overflow occurs, with the corresponding forced application pool recycle and DW20.exe getting all the CPU cycles it can get.
The dictionary is in fact Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (System.OutOfMemoryException because of Large Dictionary) so I already checked if there is some kind of reference holding things.
The dictionary is contained in a static object (to make it accessible to the different processing threads), so from this question and many more (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But is this also the cause of those (null, null) entries? Even if I remove items from the dictionary, will something always remain occupied in memory?
It's not a speed issue like in this question: Deallocate memory from large data structures in C#. It seems that the memory is never reclaimed.
Is there something I can do to actually remove items from dictionary, not just keep filling it with (null, null) pairs?
Is there anything else I need to check out?
Dictionaries store items in a hash table, backed internally by an array. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0, especially The System.Collections.Hashtable Class and The System.Collections.Generic.Dictionary Class). Therefore the null/null entries could just represent this free space.
If the array is too small, it will grow automatically; however, when items are removed, the array does not shrink, but the space that will be freed up should be reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non-empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign it another dictionary or null (theDict = new ... or theDict = null). This is only true for the dictionary itself, which is static, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim any object (sooner or later) that can no longer be reached through some reference. It makes no difference whether the object was declared static or not; the objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but which didn't exist back then.
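If you can target .NET Core 2.1 or later, that is now a one-liner:

// Reallocates the internal bucket array to fit the current Count,
// releasing the space left behind by earlier Remove() calls.
theDict.TrimExcess();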
Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
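A minimal sketch of the write-counter variant; the threshold, the lock-based synchronization, and the RecyclingCache name are assumptions, not part of the original service:

static class RecyclingCache<TKey, TValue>
{
    private static readonly object Sync = new object();
    private static Dictionary<TKey, TValue> dict = new Dictionary<TKey, TValue>();
    private static int writes;

    public static void Add(TKey key, TValue value)
    {
        lock (Sync)
        {
            dict[key] = value;
            if (++writes >= 100000) // assumed threshold
            {
                // Copy the live entries into a fresh dictionary so the
                // oversized internal array of the old one can be collected.
                dict = new Dictionary<TKey, TValue>(dict);
                writes = 0;
            }
        }
    }

    public static bool TryGet(TKey key, out TValue value)
    {
        lock (Sync)
        {
            return dict.TryGetValue(key, out value);
        }
    }
}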
A solution could be to call the Clear() method on the static dictionary.
That way, the reference to the dictionary remains valid, but the objects it contained become eligible for collection.

Is it bad form to let C# garbage collect a list instead of reusing it? [duplicate]

This question already has answers here:
Using the "clear" method vs. New Object
(5 answers)
Closed 8 years ago.
I have a list of elements that steadily grows, until I dump all the data from that list into a file. I then want to reuse that list for the same purpose again. Is it bad practice to simply assign it to a new list, instead of removing all the elements from the list? It seems garbage collection should take care of the old list, and that way I don't have to worry about removing the elements.
For example:
var myList = new List<element>();
myList.Add(someElement);
myList.Add(anotherElement);
// dumps the elements into a file
myList = new List<element>();
Edit: Even if there are easy ways around this, I was wondering too about the philosophical side of it. Is it bad to let something be garbage collected if there is a way around it? What are the costs of allowing garbage collection vs deleting the elements and reusing the same memory?
It depends a bit on how many elements are in the list. If the array backing the list is large enough to be on the large object heap, then you might be better off clearing the list and reusing it. This will reduce the number of large memory allocations, and will help reduce the problem of large object heap fragmentation. (See http://msdn.microsoft.com/en-us/magazine/cc534993.aspx and http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/ for more information; see http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx for improvements due with .NET 4.5)
If the lists are small, you might be better off just creating a new list, or you might get better performance calling Clear(). When in doubt, measure the performance.
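A rough way to measure, as a sketch (the sizes are arbitrary assumptions; for serious numbers use a tool like BenchmarkDotNet):

using System;
using System.Collections.Generic;
using System.Diagnostics;

class ClearVsNewBenchmark
{
    static void Main()
    {
        const int iterations = 1000;
        const int size = 100000; // ~400 KB backing array: lands on the large object heap

        var reused = new List<int>(size);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            for (int j = 0; j < size; j++) reused.Add(j);
            reused.Clear(); // keeps the backing array for the next round
        }
        Console.WriteLine($"Clear():  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            var fresh = new List<int>(size); // old list becomes garbage each round
            for (int j = 0; j < size; j++) fresh.Add(j);
        }
        Console.WriteLine($"new List: {sw.ElapsedMilliseconds} ms");
    }
}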
Edit: In response to the philosophical question you pose in your edit, here are two reasons to create a new list:
In general, code is cleaner and easier to reason about if you do not reuse objects. The cost of garbage collection is low, the cost of confusing code is high.
Consider what happens if the code dumping the list's contents is in another function, as it most likely is. Once you've passed that list out of its local context, it's possible that there are non-local references to the same list. Other code might be modifying the list, or might be assuming (incorrectly) that you're not modifying it.
myList.Clear() is even easier to code than myList = new List<element>();
msdn: List.Clear Method
Each element in the list is a different object itself, and will need to be garbage collected whether you clear the list, or recreate a new list, or remove the items one at a time. What will NOT need to be garbage collected if you just clear the list and reuse it is the list itself. Unless your list is huge, containing hundreds of thousands of items, it will be difficult to measure a performance difference one way or the other. Fortunately, the garbage collector is highly optimized and it's a rare occurrence where developers need to consider what it is doing.
(As others have pointed out, there are various factors involved, such as...how many elements will you be adding to the new list? vs how many elements were in the old list? ...but the point is: the garbage collection of the list itself isn't relevant when it comes to collecting the elements of the list.)
I'm no expert, but:
Making a new list and expecting the GC to "take care" of the old one is probably a bad idea: it's bad practice and probably inefficient.
Although it's a micro-optimization, I'd say that overwriting the existing values until you reach list.Count, and then continuing with list.Add, is the best way, because then you neither clear nor allocate unnecessary new memory (unless it's a large list whose space you want to reclaim).
Anyway, I would recommend using List.Clear() - it saves you and the GC trouble.
It sounds like you're asking two different questions. One is whether it's okay to set it to a new object or just clear it, which I think Eric answered pretty well. The second is whether you should just ignore the GC and let it work without trying to "help" it - to that, I'd say absolutely YES. Let the framework do what the framework does and stay out of its way until you have to.
A lot of programmers want to dig in too deep, and most of the time it causes more problems than it helps. The GC is designed to collect these things and clean them up for you. Unless you are seeing a very specific problem, you should write code that works and not worry about when things will be collected (with the exception of the using keyword, where appropriate).
The important perspective is clean code.
When you create a new list, the old one will be collected by the GC (if there are no other references to it).
I would rather use List.Clear() to remove all the elements for reuse. The Capacity remains unchanged, so there shouldn't be any additional overhead cost, and letting the GC handle the actual memory reclamation keeps the code clean.
