C# Collect garbage of object with memory leak

I am using a 3rd-party object I didn't create that consumes a lot of resources over time. This object shouldn't hold any state; it simply performs a calculation. Despite this, every time I call a specific function of this object a little more memory is consumed. A few hours later, my program is sitting on gigabytes of allocated memory.
The object was originally initialized as a static member of my Program class in my command-line application. I have found that if I wrap my entire program in a class and reinitialize it every now and again, the older (and bloated) object is collected by the GC and a new, smaller object replaces it.
My issue is that this approach is quite clumsy and ruins the flow of my program.
Is there any other way to dispose of an object? I am led to believe GC.Collect() will only collect unreachable objects. Is there any way I can make an object 'unreachable'?
Edit: As requested, the code:
static ILexicon lexicon = new Lexicon();
...
lexicon.LoadDataFromFile(@"lexicon.dat", null);
...
byte similarityScore(string w1, string w2, PartOfSpeech pos, SimilarityMeasure measure)
{
    if (w1 == w2)
        return 255;
    if (pos != PartOfSpeech.Noun && pos != PartOfSpeech.Verb)
        return 0;

    IList<ILemma> w1_lemmas = lexicon.FindSenses(w1, pos);
    IList<ILemma> w2_lemmas = lexicon.FindSenses(w2, pos);

    byte result;
    byte score = 0;
    foreach (ILemma w1_lemma in w1_lemmas)
    {
        foreach (ILemma w2_lemma in w2_lemmas)
        {
            result = (byte)(w1_lemma.GetSimilarity(w2_lemma, measure) * 255);
            if (result > score)
                score = result;
        }
    }
    return score;
}
As similarityScore is called, more memory is allocated to a private member of lexicon. It does not implement IDisposable, and there are no obvious functions to clear the memory. The library is based on WordNet and uses an algorithm that finds path lengths in the hypernym tree to calculate the similarity of two words. Unless there is caching, I can't see why it would need to hold on to any memory. What is for sure is that I can't change it. I'm almost certain there is nothing wrong with my code. I just need to dispose of lexicon when it gets too large (N.B. it takes a second or two to load the lexicon from file into memory).

If the object doesn't implement IDisposable and you want to push it out of scope, you can set all references to it to null and then force garbage collection with GC.Collect().
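As a minimal sketch of that approach, assuming the static lexicon field and LoadDataFromFile call from the question (the ReloadLexicon helper itself is hypothetical):

static void ReloadLexicon()
{
    lexicon = null;                  // drop the only reference to the bloated object
    GC.Collect();                    // force a full collection
    GC.WaitForPendingFinalizers();   // let any pending finalizers run
    GC.Collect();                    // reclaim objects freed by those finalizers

    lexicon = new Lexicon();         // rebuild; the question notes this takes a second or two
    lexicon.LoadDataFromFile(@"lexicon.dat", null);
}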
GC.Collect() is very expensive. If you're going to have to do this frequently, you might want to consider contacting the vendor.
Find out:
Whether you are using their library correctly, or whether something you're doing wrong is causing the memory leak.
Whether their library leaks memory even when used as intended, and if so, whether they can fix the leak.
Additional note: if the 3rd-party library is a COM component you're accessing through interop, you can use Marshal.ReleaseComObject to release the underlying unmanaged object.
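As a hedged sketch, where comCalculator stands in for whatever interop object the library hands back:

// using System.Runtime.InteropServices;
if (comCalculator != null && Marshal.IsComObject(comCalculator))
    Marshal.ReleaseComObject(comCalculator); // decrements the RCW's reference count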

You could try calling the Dispose() method. This would make the object unusable, so you would have to instantiate another one. I assume your program runs in a loop, so the object can be a loop variable with the call to Dispose at the bottom.

I would suggest that if you can get your hands on a memory profiler, you use it. A memory profiler will let you pause your program, click on a class, and see a list of objects of that class. You can then click on an object and see how it was created, and the "path" to that object from a root (e.g. there's a static class foo, which holds a reference to a bar, which holds a reference to a boz, which holds a reference to a reallybigthing). Often, seeing that will make it clear what needs to be done to break the chain.

You might be able to download the source from the WordNet repository and modify the code, since it is open source.

Related

C# Issue handling memory usage for SearchResultAttributeCollection LDAP

Using the System.DirectoryServices.Protocols library:
I have a class LdapItemOperator that takes a SearchResultEntry object from an LDAP query (not Active Directory related) and stores the attributes for the object in a field: readonly SearchResultAttributeCollection LdapAttributes.
The problem I am experiencing is that when I run a large operation, the garbage collector never seems to collect these objects after they ought to have been disposed, because of the LdapAttributes field in my objects; at least I think that's the problem. What can I try in order to dispose of the objects when they are no longer required? I can't seem to find a way to incorporate a using statement, although I have only a little experience with it.
As an example, let's say I have the following logic:
List<LdapItemOperator> itemList = GetList(ldapFilter);
List<bool> resultList = new List<bool>();
foreach (LdapItemOperator item in itemList)
{
    bool result = doStuff(item);
    resultList.Add(result);
}
//Even though we are out of the loop now, the objects are still stored in memory, how come? Same goes for the previous objects in the loop, they seem to remain in memory
Logic.WriteResultToLog(resultList);
After a good while of running the logic on large filesets, this process starts taking up enormous amounts of memory, of course...
I think you might be a little confused about how GC works. You can never know exactly when GC will run. And objects you are still holding a reference to will not be collected (unless it's a weak reference...).
Also "disposing" is yet another different concept, that hasn't much to do with GC.
Basically, all objects will be in memory already after the call to GetList. And memory consumption will not change much after that, the foreach loop shouldn't affect it at all.
Without knowing your implementation, maybe try returning an enumerable instead of a single list, or make multiple batched calls.
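For instance, a sketch of the streaming idea; RunPagedQuery is a hypothetical replacement for whatever GetList does internally:

static IEnumerable<LdapItemOperator> GetItems(string ldapFilter)
{
    // yield one entry at a time instead of materializing the whole result set
    foreach (SearchResultEntry entry in RunPagedQuery(ldapFilter))
        yield return new LdapItemOperator(entry);
}

foreach (LdapItemOperator item in GetItems(ldapFilter))
{
    resultList.Add(doStuff(item));
    // item becomes unreachable after each iteration, so the GC is free to reclaim it
}

This way only one item's attributes need to be reachable at a time, rather than the entire result set.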

C# - downsides to recycling resources in finalizer

As the title states: are there any downsides to recycling resources (like large arrays) in the finalizer of the containing object? So far it works out fine, but since finalizers can be a bit funky and difficult to debug, I decided to ask here.
The use case is a polygon class which just represents a list of points. My application makes heavy use of large polygons with a few thousand points - allocating those arrays often isn't exactly cheap. A dispose pattern is unfortunately out of the question because they get passed around and pretty much only the GC knows when no other object is referencing them.
This is how I implemented it (arrays always have a length of 2^sizeLog2, and only arrays with sizeLog2 >= MIN_SIZE_LOG_2 get recycled):
Constructor:
public Polygon(int capacity = 1)
{
    //fast method for getting the ceiling of the logarithm of an integer
    int sizeLog2 = UIM.CeilingLog2((uint)capacity);
    //suppress finalization when the array should not be recycled
    if (sizeLog2 < MIN_SIZE_LOG_2) GC.SuppressFinalize(this);
    Points = Create(sizeLog2);
}
Create array of size 2^sizeLog2:
private static Pnt[] Create(int sizeLog2)
{
    if (sizeLog2 >= MIN_SIZE_LOG_2)
    {
        //recycled is the ConcurrentQueue<Pnt[]> referenced in the destructor below
        if (recycled.TryDequeue(out Pnt[] result)) return result;
        Pnt[] points = new Pnt[1 << sizeLog2];
        //keep array alive so that it won't get collected by GC when the polygon is
        GC.KeepAlive(points);
        return points;
    }
    return new Pnt[1 << sizeLog2];
}
In the increase capacity method this is the line for reregistering the polygon for finalize, if the finalize was suppressed up to this point (capacity can only get increased and only by a factor 2):
if (newSizeLog2 == MIN_SIZE_LOG_2) GC.ReRegisterForFinalize(this);
And this is the destructor:
~Polygon()
{
    if (Points != null) Recycle(Points); //enqueues the array in a ConcurrentQueue
    Points = null;
}
And yes, I do know that this isn't exactly a fancy programming style, but this is rather performance-critical, so I really can't just let the GC do all the work, because then I end up with hundreds of MB on the large object heap within a few seconds.
Also, the polygons get handled by different threads concurrently.
The short answer is yes, there are. Large arrays are not the type of resource finalizers are intended to recycle. Finalizers should be used for external resources and, in vanishingly rare instances, application state.
I would suggest this article, and/or this one, to better understand some of the pitfalls of finalization.
The crux of the problem with what you've described is this statement: "I really can't just let the GC do all the work because then I end up with hundreds of MB on the large object heap in a few seconds."
But the finalizer is never going to be called until the GC has "done all the work", so you must not have a full understanding of what is causing the memory pressure for your application.
It really sounds like you've got an issue with the overall design of your application, where you're finding it hard to reason about who owns Polygons and/or Pnts. The result is that there are references to Polygons/Pnts - somewhere - when you don't expect to have them. Use a good profiler to find where this is happening and fix this overall design problem instead of trying to use finalization.
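If ownership can be established at the call sites after such a redesign, explicit pooling avoids finalizers entirely. A sketch using ArrayPool<T> from System.Buffers (Pnt is the question's point type; the surrounding code is hypothetical):

// using System.Buffers;
Pnt[] points = ArrayPool<Pnt>.Shared.Rent(1 << sizeLog2); // may hand back a larger array than requested
try
{
    // ... build and operate on the polygon ...
}
finally
{
    ArrayPool<Pnt>.Shared.Return(points); // return the buffer to the pool for reuse
}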

Does a managed pointer to a value-type field keep its containing GC instance alive? [duplicate]

In C#, ref and out params are, as far as I know, passed by passing only the raw address of the relevant value. That address may be an interior pointer to an element in an array or a field within an object.
If a garbage collection occurs, it's possible that the only reference to some object is through one of these interior pointers, as in:
using System;

public class Foo
{
    public int field;

    public static void Increment(ref int x)
    {
        System.GC.Collect();
        x = x + 1;
        Console.WriteLine(x);
    }

    public static void Main()
    {
        Increment(ref new Foo().field);
    }
}
In that case, the GC needs to find the beginning of the object and treat the entire object as reachable. How does it do that? Does it have to scan the entire heap looking for the object that contains that pointer? That seems slow.
The garbage collector has a fast way to find the start of an object from a managed interior pointer. From there it can mark the object as "referenced" during the mark phase.
I don't have the code for the Microsoft collector, but it would use something similar to Go's span table: a fast lookup over "spans" of memory, keyed on the most significant X bits of the pointer, depending on how large you choose the spans to be. Each span contains some number of objects of the same size, so the header of the one you have can be found very quickly; it's pretty much an O(1) operation. The Microsoft heap will obviously differ, since it's allocated sequentially without regard for object size, but it will have some sort of O(1) lookup structure.
https://github.com/puppeh/gcc-6502/blob/master/libgo/runtime/mgc0.c
// Otherwise consult span table to find beginning.
// (Manually inlined copy of MHeap_LookupMaybe.)
k = (uintptr)obj>>PageShift;
x = k;
x -= (uintptr)runtime_mheap.arena_start>>PageShift;
s = runtime_mheap.spans[x];
if(s == nil || k < s->start || (const byte*)obj >= s->limit || s->state != MSpanInUse)
    return false;
p = (byte*)((uintptr)s->start<<PageShift);
if(s->sizeclass == 0) {
    obj = p;
} else {
    uintptr size = s->elemsize;
    int32 i = ((const byte*)obj - p)/size;
    obj = p+i*size;
}
Note that the .NET garbage collector is a compacting collector, so managed/interior pointers need to be updated whenever an object is moved during a garbage collection cycle. The GC knows where the interior pointers are in each stack frame based on the method parameters known at JIT time.
Your code compiles to
IL_0001: newobj instance void Foo::.ctor()
IL_0006: ldflda int32 Foo::'field'
IL_000b: call void Foo::Increment(int32&)
AFAIK, the ldflda instruction creates a reference to the object containing the field, for as long as the address is on the stack (until the call completes).
The garbage collector works in three basic steps:
Mark all objects that are still alive.
Collect the objects that are not marked as alive.
Compact the memory.
Your concern is step 1: How does the GC figure out that it shouldn't collect objects behind ref and out params?
When the GC performs a collection, it starts with a state where none of the objects is considered alive. It then goes from the root references and marks all those objects as alive. Root references are all references on the stack and in static fields. Then the GC goes recursively into the marked objects and marks all objects as alive that are referenced from them. This is repeated until no objects are found that are not already marked as alive. The result of this operation is an object graph.
A ref or out parameter has a reference on the stack, and so the GC will mark the respective object as alive, because the stack is a root for the object graph.
At the end of the process, the objects that were never marked are the ones with no path from any root reference. This takes care of circular references, too. These objects are considered dead and will be collected in the next step (which includes calling finalizers, though there is no guarantee of that).
At the end, the GC will move all live objects to a contiguous area of memory at the beginning of the heap. The rest of the memory will be filled with zeroes. That simplifies the creation of new objects, since their memory can always be allocated at the end of the heap and all fields already have their default values.
It is true that the GC needs some time to do all of this, but it still does it reasonably fast, due to some optimizations. One of the optimizations is to separate the heap into generations. All newly allocated objects are generation 0. All objects surviving the first collection are generation 1 and so forth. Higher generations are only collected if collecting lower generations does not free up enough memory. So, no, the GC does not always have to scan the entire heap.
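A quick sketch to observe those generations (typical output shown in the comments):

object o = new object();
Console.WriteLine(GC.GetGeneration(o)); // 0: freshly allocated
GC.Collect();
Console.WriteLine(GC.GetGeneration(o)); // 1: survived one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(o)); // 2: survived two; gen 2 is the highest generation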
You have to consider that, while the collection takes some time, allocating new objects (which happens much more often than a garbage collection) is much faster than in other implementations, where the heap looks more like a Swiss cheese and you need some time to find a hole big enough for the new object (which you still need to initialize).

C# WebAPI Garbage Collection

I just delivered my first C# WebAPI application to the first customer. Under normal load, performance initially is even better than I expected. Initially.
Everything worked fine until, at some point, memory was up and garbage collection started running riot (as in "it collects objects that are not yet garbage"). At that point, there were multiple w3wp worker processes with some ten gigs of RAM altogether, and single-digit gigs per worker. After a restart of IIS everything was back to normal, but of course the memory usage is rising again.
Please correct me if I am wrong, but
Shouldn't C# have automatic garbage collection?
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
And please help me out:
How can I explicitly state what GC should collect, thus preventing memory leaks? Is someBigList = null; the way to go?
How can I detect where the memory leaks are?
EDIT: Let me clarify some things.
My .NET WebAPI application is mostly a bunch of
public class MyApiController : ApiController
{
    [HttpGet]
    public MyObjectClass[] MyApi(string someParam)
    {
        List<MyObjectClass> list = new List<MyObjectClass>();
        ...
        for/while/foreach {
            MyObjectClass obj = new MyObjectClass();
            obj.firstStringAttribute = xyz;
            ...
            list.Add(obj);
        }
        return list.ToArray();
    }
}
Under such conditions, GC should be easy: after "return", all local variables should be garbage. Yet with every single call the used memory increases.
I initially thought that C# WebAPI programs behave similarly to (pre-compiled) PHP: IIS calls the program, it is executed, returns its value, and is then completely disposed of.
But this is not the case. For instance, I found that static variables keep their data between runs, so I have now gotten rid of all static variables.
Because I found static variables to be a problem for GC:
internal class Helper
{
    private static List<string> someVar = new List<string>();

    internal Helper()
    {
        someVar = new List<string>();
    }

    internal void someFunc(string str)
    {
        someVar.Add(str);
    }

    internal string[] someOtherFunc(string str)
    {
        string[] s = someVar.ToArray();
        someVar = new List<string>();
        return s;
    }
}
Here, under low-memory conditions, someVar threw a NullReferenceException, which in my opinion can only have been caused by the GC, since I did not find any code where someVar is actively set to null by me.
I think the memory increase slowed down since I actively set the biggest array variables in the most often used Controllers to null, but this is only a gut feeling and not even nearly a complete solution.
I will now do some profiling using the link you provided, and get back with some results.
Shouldn't C# have automatic garbage collection?
C# is a programming language for the .NET runtime, and .NET brings the automatic garbage collection to the table. So, yes, although technically C# isn't the piece that brings it.
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
Sure, it should be just as easy as for any other type of .NET application.
The common theme here is garbage. How does .NET determine that something is garbage? By verifying that there are no more live references to the object. To be honest I think it is far more likely that you have verified one of your assumptions wrongly, compared to there being a serious bug in the garbage collector in such a way that "It collects objects that are not yet garbage".
To find leaks, you need to figure out what objects are currently held in memory, make a determination whether that is correct or not, and if not, figure out what is holding them there. A memory profiler application would help with that, there are numerous available, such as the Red-Gate ANTS Memory Profiler.
For your other questions: how do you make something eligible for garbage collection? By turning it into garbage (see the definition above). Note that setting a local variable to null may not necessarily help or be needed; setting a static variable to null, however, might. But the correct way to determine that is to use a profiler.
Here are some shot-in-the-dark type of tips you might look into:
Look at static classes, static fields, and static properties. Are you storing data there that is accumulating?
How about static events? Do you have any? Do you remember to unsubscribe from the event when you no longer need it? (See the sketch after this list.)
And by "static fields, properties, and events", I also mean normal instance fields, properties and events that are held in objects that directly or indirectly are stored in static fields or properties. Basically, anything that will keep the objects in memory.
Are you remembering to Dispose of all your IDisposable objects? If not, then the memory being used could be unmanaged. Typically, however, when the garbage collector collects the managed object, the finalizer of that object should clean up the unmanaged memory as well, however you might allocate memory that the GC algorithm isn't aware of, and thus thinks it isn't a big problem to wait with collection. See the GC.AddMemoryPressure method for more on this.
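As a sketch of the static-event pitfall from the list above (all type names here are hypothetical):

internal static class Notifications
{
    public static event EventHandler SomethingHappened;
}

internal sealed class Subscriber : IDisposable
{
    public Subscriber()
    {
        Notifications.SomethingHappened += OnSomething; // the static event now roots this instance
    }

    private void OnSomething(object sender, EventArgs e) { /* ... */ }

    public void Dispose()
    {
        // unsubscribing removes the static root, so the GC can collect this instance
        Notifications.SomethingHappened -= OnSomething;
    }
}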

Need to delete objects: implement Dispose or create objects in a function?

I have some objects that read a file, save the data in arrays, and perform some operations. The sequence is: create object A, operate with object A; create object B, operate with object B...
The data read by each object may be around 10 MB, so the best option would be to delete each object after operating with it. Let's say I want my program to allocate around 10 MB of memory, not 10 MB * 1000 objects = 1 GB.
The objects are something like:
class MyClass
{
    List<string[]> data;

    public MyClass(string datafile)
    {
        using (CsvReader csv = new CsvReader(new StreamReader(datafile), true))
        {
            data = csv.ToList<string[]>();
        }
    }

    public List<string> Operate()
    {
        ...
    }
}
My question is: should I implement Dispose? And do something like:
List<string> results = new List<string>();
using (MyClass m = new MyClass("fileM.txt"))
{
    results.AddRange(m.Operate());
}
using (MyClass d = new MyClass("fileD.txt"))
{
    results.AddRange(d.Operate());
}
...
I've read that implementing IDisposable is recommended when you use unmanaged resources (sockets, streams, ...), but in my class I only have big data arrays.
Another way would be to create a function for each object (I suppose the GC will automatically collect an object created in a function):
List<string> results = new List<string>();
results.AddRange(myFunction("fileM.txt"));
results.AddRange(myFunction("fileD.txt"));

public List<string> myFunction(string file)
{
    MyClass c = new MyClass(file);
    return c.Operate();
}
IDisposable etc. will not help you here, since it doesn't cause anything to get collected. In this type of scenario, maybe the best approach is to use a pool to reduce allocations - essentially becoming your own memory manager. For example, if your List<string> is big, you can avoid a lot of the array allocations by re-using the lists - after clearing them, obviously. If you call .Clear(), the backing array is not released - the list just sets a logical marker to consider itself empty. In your specific case, a lot of your objects are going to be the individual strings; that is trickier, but at least they are small and should be collectable in generation zero.
In your case I'd allocate a single buffer array. For example, allocate a 10 MB array once and fill it with the data you want. Then, when you get to the next object, just reuse the array. If you ever need a bigger array, you can just allocate a new, bigger, array and use that instead. The garbage collector will eventually remove your smaller one.
You can also use a List<T>, it will internally do the same (allocate an array, keep it until it becomes too small, allocate a new one). Just Clear it before creating the next object.
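A sketch of that reuse pattern, assuming MyClass were reworked so a caller-supplied buffer replaces its private data field (FillFromCsv and Operate are hypothetical stand-ins):

List<string[]> buffer = new List<string[]>();
List<string> results = new List<string>();
foreach (string file in new[] { "fileM.txt", "fileD.txt" })
{
    buffer.Clear();                    // empties the list but keeps its backing array
    FillFromCsv(file, buffer);         // hypothetical: load one file into the shared buffer
    results.AddRange(Operate(buffer)); // hypothetical: operate on the buffer's contents
}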
Note that you cannot force¹ the garbage collector to collect an object. IDisposable is indeed only used to clean up unmanaged resources, as the garbage collector does not know about them, or to close (file) handles. Calling Dispose does not guarantee (or imply) that the object is removed from memory.
However, if you change nothing, your code will still be correct and work properly. The garbage collector is responsible for removing unused objects whenever it feels like doing that, and it will ensure that there is plenty of memory available at all times. The only thing you have to do to let the collector do its work is to let go of any references to old objects (by overwriting them or setting them to null, or letting them go out of scope).
¹ You can force the garbage collector to collect your data by calling GC.Collect(). However, this is not recommended. Let the garbage collector figure it out by itself.
If you're using .NET 4.0 or greater, have a look at the BlockingCollection class. The constructor that takes the Int32 parameter allows you to specify an upper bound on the size of the collection. The Add and Take methods work as throttles. Add will only succeed if the upper bound has not been reached. If it has, it'll block. Take will only succeed if an item exists. If no item exists, it'll block until one is available. Of course, the class has some variations of these methods, so fully examine the documentation to see which, if any, make sense.
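A sketch of that throttling idea applied to the question's classes; the bound of 2 caps how many parsed files can sit in memory at once:

// using System.Collections.Concurrent; using System.Threading.Tasks;
var queue = new BlockingCollection<List<string>>(2); // upper bound on queued items

Task producer = Task.Run(() =>
{
    foreach (string file in new[] { "fileM.txt", "fileD.txt" })
        queue.Add(new MyClass(file).Operate()); // blocks while the queue is full
    queue.CompleteAdding();
});

List<string> results = new List<string>();
foreach (List<string> partial in queue.GetConsumingEnumerable()) // blocks until items arrive
    results.AddRange(partial);
producer.Wait(); // surface any exceptions from the producer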
