Does GCHandle.Alloc allocate memory? - c#

I am using the .NET Memory Profiler from SciTech to reduce my program's memory allocation rate and cut the frequency of garbage collections.
Surprisingly, according to the profiler, the largest share of allocations seems to come from the GCHandle.Alloc calls I make to marshal existing .NET arrays to native OpenGL.
My understanding was that calling GCHandle.Alloc does not allocate memory; it only pins existing memory on the managed heap.
Am I wrong or is the profiler wrong?

The .NET reference source is available for anyone to see, so you can have a look and find out for yourself.
If you dig into GCHandle.Alloc, you'll see that it calls a native method called InternalAlloc:
[System.Security.SecurityCritical] // auto-generated
[MethodImplAttribute(MethodImplOptions.InternalCall)]
[ResourceExposure(ResourceScope.None)]
internal static extern IntPtr InternalAlloc(Object value, GCHandleType type);
Drilling down into the CLR code, you see the internal call to MarshalNative::InternalAlloc, which ends up calling:
hnd = GetAppDomain()->CreateTypedHandle(objRef, type);
Which in turn calls ObjectHandle::CreateTypedHandle -> HandleTable::HndCreateHandle -> HandleTableCache->TableAllocSingleHandleFromCache, which allocates the handle if it doesn't already exist in the cache.
As @Antosha corrected me, the invocation happens via MarshalNative, not via ComDelegate (which actually makes little sense). An allocation does occur, not on the managed heap, but in an external heap the runtime reserves for managing handle roots into GC objects. The only allocation on the managed side is the IntPtr that holds the pointer to the entry in that table. Despite this, you should still make sure to call GCHandle.Free once you're done.
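For context, a minimal sketch of the pattern the question describes (pinning a managed array before handing its address to native code, then releasing the handle) could look like this; the native call itself is only a placeholder:

using System;
using System.Runtime.InteropServices;

static class PinningSketch
{
    static void UploadVertices(float[] vertices)
    {
        // Alloc creates an entry in the CLR handle table and pins the array;
        // the array itself is not copied or reallocated.
        GCHandle handle = GCHandle.Alloc(vertices, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // ... pass 'address' to the native API (e.g. an OpenGL upload call) ...
        }
        finally
        {
            // Release the handle table entry, otherwise the array stays pinned.
            handle.Free();
        }
    }
}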

The profiler even assigns a specific amount of memory to each GCHandle I allocate: 8 bytes. And the managed heap seems to grow by 8 bytes with each GCHandle.Alloc. So it seems that it actually does allocate space on the managed heap, although I have no idea what for.
I don't know how a handle could be smaller :) I've done some tests:
Console.WriteLine("Is 64 bit: {0}, IntPtr.Size: {1}", Environment.Is64BitProcess, IntPtr.Size);
int[][] objects = new int[100000][];
for (int i = 0; i < objects.Length; i++)
{
objects[i] = new int[] { 0 };
}
long w1 = Environment.WorkingSet;
GCHandle[] handles = new GCHandle[objects.Length];
for (int i = 0; i < handles.Length; i++)
{
//handles[i] = new GCHandle(handles);
//handles[i] = GCHandle.Alloc(objects[i]);
handles[i] = GCHandle.Alloc(objects[i], GCHandleType.Pinned);
}
Console.WriteLine("Allocated");
long w2 = Environment.WorkingSet;
Console.WriteLine("Used: {0}, by handle: {1}", w2 - w1, ((double)(w2 - w1)) / handles.Length);
Console.ReadKey();
It is a small program. If you run it, you will see that an "empty" GCHandle (one created with new GCHandle()) occupies IntPtr.Size bytes. This is clear if you use ILSpy to look at it: it has a single IntPtr field. If you pin some object, then it occupies 2*IntPtr.Size bytes. This is probably because it has to write something into a CLR table (of size IntPtr) in addition to its internal IntPtr.
Taken from https://stackoverflow.com/a/18122621/613130
It uses a dedicated table of GC handles built inside the CLR. You allocate an entry into this table with GCHandle.Alloc() and release it again later with GCHandle.Free(). The garbage collector simply adds the entries in this table to the graph of objects it discovered itself when it performs a collection.
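To see that the only managed-side state is a single IntPtr pointing into that table, here is a small illustration of my own (not from the linked answer) using the documented GCHandle.ToIntPtr/FromIntPtr round trip:

using System;
using System.Runtime.InteropServices;

class HandleTableDemo
{
    static void Main()
    {
        var data = new byte[16];
        GCHandle handle = GCHandle.Alloc(data); // normal (non-pinned) handle

        // The GCHandle struct wraps this single IntPtr, which identifies the
        // entry in the CLR's internal handle table.
        IntPtr raw = GCHandle.ToIntPtr(handle);
        Console.WriteLine("Handle table entry: 0x{0:X}", raw.ToInt64());

        // The same entry can be rehydrated from the IntPtr later on.
        GCHandle same = GCHandle.FromIntPtr(raw);
        Console.WriteLine(ReferenceEquals(same.Target, data)); // True

        handle.Free(); // releases the table entry; 'same' is now invalid too
    }
}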

From MSDN:
A new GCHandle that protects the object from garbage collection. This GCHandle must be released with Free when it is no longer needed.
So even if no real allocation is made on the managed heap, I guess the preallocated handle-table entry cannot be released until a call to Free.
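One way to make that Free call hard to forget (a sketch of my own, not part of the answers above) is a small disposable wrapper used with using:

using System;
using System.Runtime.InteropServices;

// Hypothetical helper: ties GCHandle.Free to the using pattern.
readonly struct PinnedArray : IDisposable
{
    private readonly GCHandle _handle;

    public PinnedArray(Array array) => _handle = GCHandle.Alloc(array, GCHandleType.Pinned);

    public IntPtr Address => _handle.AddrOfPinnedObject();

    public void Dispose() => _handle.Free();
}

// Usage:
// using (var pinned = new PinnedArray(vertices))
// {
//     SomeNativeCall(pinned.Address); // hypothetical native call
// }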

Related

Memory Allocation Time (The Fast Way)

For a really simple code snippet, I'm trying to see how much of the time is spent actually allocating objects on the small object heap (SOH).
static void Main(string[] args)
{
    const int noNumbers = 10000000; // 10 mil
    ArrayList numbers = new ArrayList();
    Random random = new Random(1);  // use the same seed to make benchmarking consistent

    for (int i = 0; i < noNumbers; i++)
    {
        int currentNumber = random.Next(10); // generate a non-negative random number less than 10
        object o = currentNumber;            // BOXING occurs here
        numbers.Add(o);
    }
}
In particular, I want to know how much time is spent allocating space for all the boxed int instances on the heap (I know, this is an ArrayList and there's horrible boxing going on as well - but it's just for educational purposes).
The CLR has 2 ways of performing memory allocations on the SOH: either calling the JIT_TrialAllocSFastMP allocation helper (for multi-processor systems; ...SFastSP for single-processor ones) - which is really fast since it consists of a few assembly instructions - or falling back to the slower JIT_New allocation helper.
PerfView does show JIT_New being invoked:
However, I can't figure out which - if any - is the native function involved in the "quick way" of allocating. I certainly don't see any JIT_TrialAllocSFastMP. I've already tried raising the loop count (from 10 to 500 mil), in the hope of increasing my chances of getting a glimpse of a few stacks containing the elusive function, but to no avail.
Another approach was to use the JetBrains dotTrace (line-by-line) performance viewer, but it falls short of what I want: I do get to see the approximate time the boxing operation takes for each int, but 1) it's just a bar and 2) it includes both the allocation itself and the copying of the value (and the latter is not what I'm after).
Using the JetBrains dotTrace Timeline viewer won't work either, since they currently don't (quite) support native callstacks.
At this point it's unclear to me whether there's a method being dynamically generated and called when JIT_TrialAllocSFastMP is invoked - and by some miracle none of the PerfView-collected stack samples (one every 1 ms) ever captures it - or whether Main's method body somehow gets patched, with those few assembly instructions mentioned above injected directly into the code. It's also hard to believe that the fast way of allocating memory is never called.
You could ask, "But you already have the .NET Core CLR code, why can't you figure it out yourself?". Since the .NET Framework CLR code is not publicly available, I've looked into its sibling, the .NET Core version of the CLR (as Matt Warren recommends in step 6 here). The \src\vm\amd64\JitHelpers_InlineGetThread.asm file contains a JIT_TrialAllocSFastMP_InlineGetThread function. The issue is that parsing/understanding the assembly code there is above my pay grade, and I also can't really think of a way to "Step Into" and see how the JIT-ed code is generated, since this is way lower-level than your usual Press-F11-in-Visual-Studio.
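As an aside (my own addition, not from the original post): a crude Stopwatch comparison can at least bound the combined cost of boxing plus SOH allocation, even though it cannot separate the fast helper from JIT_New:

using System;
using System.Diagnostics;

class BoxTiming
{
    const int noNumbers = 10000000; // 10 mil, as in the original snippet

    static void Main()
    {
        int plain = 0;
        object boxed = null;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < noNumbers; i++)
            plain = i;                  // no allocation
        sw.Stop();
        Console.WriteLine("No boxing:   {0} ms ({1})", sw.ElapsedMilliseconds, plain);

        sw.Restart();
        for (int i = 0; i < noNumbers; i++)
            boxed = i;                  // boxing: one SOH allocation per iteration
        sw.Stop();
        Console.WriteLine("With boxing: {0} ms ({1})", sw.ElapsedMilliseconds, boxed);
    }
}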
Update 1: Let's simplify the code, and only consider individual boxed int values:
const int noNumbers = 10000000; // 10 mil
object o = null;
for (int i = 0; i < noNumbers; i++)
{
    o = i;
}
Since this is a Release build, and dead code elimination could kick in, WinDbg is used to check the final machine code.
The resulting JITed code, whose main loop (highlighted in blue below) simply does repeated boxing, shows that the method that handles the memory allocation is not inlined (note the call to the hex address 00af30f4):
This method in turn tries to allocate via the "fast" way and, if that fails, falls back to the "slow" way of calling JIT_New itself:
It's interesting how the call stack obtained in PerfView from the code above doesn't show any intermediate method between Main and the JIT_New entry itself (even though Main doesn't call JIT_New directly):

How is the memory unpinned in Memory<T>.Span?

I believe that following two pieces of code should be equivalent:
// first example
string s = "Hello memory";
ReadOnlyMemory<char> memory = s.AsMemory();
using (MemoryHandle pin = memory.Pin())
{
    Span<char> span = new Span<char>(pin.Pointer, 1);
    Console.WriteLine(span[0]);
}

// second example
ReadOnlySpan<char> span2 = memory.Span;
Console.WriteLine(span2[0]);
Both examples will print "H".
What I don't understand is where the unpinning of the memory happens in the second example.
As I understand it, the string is allocated on the heap; MemoryHandle pins it, and the Span is created from the pointer. MemoryHandle.Dispose then unpins the memory.
I believe that memory.Span must pin the memory as well, otherwise the span couldn't access it. But how is the memory unpinned in the second example?
The last assumption is incorrect: memory.Span does not need to pin the memory, as the garbage collector is aware of its underlying reference. Pinning is independently available in case you would like to pass the pointer to a native API.
A Span<T> can only live on the stack of the current method/thread, not on the heap, so it stays valid for as long as you use it there. So far so clear.
Now the funny part:
The plain truth is that the result of memory.Span is not pinned; it is only referenced through the ref T inside the Span<T>, which is .NET's notion of a managed pointer, and managed pointers are also tracked by the garbage collector.
As long as your Memory<T> is alive, the underlying object is too, and therefore so is your Span<T>.
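To make the "managed ref inside the Span" concrete, here is a small sketch of my own (not from the answer) using MemoryMarshal.GetReference, which exposes that reference; no pinning is involved unless Pin() is called explicitly:

using System;
using System.Runtime.InteropServices;

class SpanRefDemo
{
    static void Main()
    {
        ReadOnlyMemory<char> memory = "Hello memory".AsMemory();

        // No pinning here: the span carries a GC-tracked managed reference (ref char)
        // into the string, so the GC is free to move the underlying object around.
        ReadOnlySpan<char> span = memory.Span;
        ref readonly char first = ref MemoryMarshal.GetReference(span);
        Console.WriteLine(first); // H

        // memory.Pin() is only needed when you want a raw pointer that survives
        // GC compaction, e.g. to hand over to a native API.
    }
}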
References:
https://msdn.microsoft.com/en-us/magazine/mt814808.aspx?f=255&MSPPError=-2147217396
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref#ref-struct-types

Does a managed pointer to a value-type field keep its containing GC instance alive? [duplicate]

In C#, ref and out params are, as far as I know, implemented by passing only the raw address of the relevant value. That address may be an interior pointer to an element in an array or a field within an object.
If a garbage collection occurs, it's possible that the only reference to some object is through one of these interior pointers, as in:
using System;

public class Foo
{
    public int field;

    public static void Increment(ref int x)
    {
        System.GC.Collect();
        x = x + 1;
        Console.WriteLine(x);
    }

    public static void Main()
    {
        Increment(ref new Foo().field);
    }
}
In that case, the GC needs to find the beginning of the object and treat the entire object as reachable. How does it do that? Does it have to scan the entire heap looking for the object that contains that pointer? That seems slow.
The garbage collector will have a fast way to find the start of an object from a managed interior pointer. From there it can obviously mark the object as "referenced" when doing the sweeping phase.
I don't have the code for the Microsoft collector, but it likely uses something similar to Go's span table, which provides a fast lookup for different "spans" of memory keyed on the most significant X bits of the pointer, depending on how large the spans are chosen to be. From there, the fact that each span contains objects of the same size lets you very quickly find the header of the one you have. It's pretty much an O(1) operation. Obviously the Microsoft heap will be different, since it's allocated sequentially without regard for object size, but it will have some sort of O(1) lookup structure.
https://github.com/puppeh/gcc-6502/blob/master/libgo/runtime/mgc0.c
// Otherwise consult span table to find beginning.
// (Manually inlined copy of MHeap_LookupMaybe.)
k = (uintptr)obj>>PageShift;
x = k;
x -= (uintptr)runtime_mheap.arena_start>>PageShift;
s = runtime_mheap.spans[x];
if(s == nil || k < s->start || (const byte*)obj >= s->limit || s->state != MSpanInUse)
    return false;
p = (byte*)((uintptr)s->start<<PageShift);
if(s->sizeclass == 0) {
    obj = p;
} else {
    uintptr size = s->elemsize;
    int32 i = ((const byte*)obj - p)/size;
    obj = p+i*size;
}
Note that the .NET garbage collector is a copying collector so managed/interior pointers need to be updated whenever the object is moved during a garbage collection cycle. The GC will be aware of where in the stack interior pointers are for each stack frame based on the method parameters known at JIT time.
Your code compiles to
IL_0001: newobj instance void Foo::.ctor()
IL_0006: ldflda int32 Foo::'field'
IL_000b: call void Foo::Increment(int32&)
AFAIK, the ldflda instruction keeps the object containing the field referenced for as long as the address is on the stack (i.e. until the call completes).
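One way to observe this (a sketch I have added, not from the answer): give the class a finalizer and force a collection while only the interior pointer to its field exists; the field write still happens before the finalizer runs, showing that the object was kept alive through the byref:

using System;

public class FooWithFinalizer
{
    public int field;

    ~FooWithFinalizer() => Console.WriteLine("finalized");

    public static void Increment(ref int x)
    {
        // The only reference to the instance is the interior pointer 'x',
        // yet the GC must treat the whole object as reachable here.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        x = x + 1;
        Console.WriteLine("field is now {0}", x); // prints before "finalized"
    }

    public static void Main()
    {
        Increment(ref new FooWithFinalizer().field);
        GC.Collect();
        GC.WaitForPendingFinalizers(); // now the instance can be finalized
    }
}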
The garbage collector works in three basic steps:
1. Mark all objects that are still alive.
2. Collect the objects that are not marked as alive.
3. Compact the memory.
Your concern is step 1: How does the GC figure out that it shouldn't collect objects behind ref and out params?
When the GC performs a collection, it starts with a state where none of the objects is considered alive. It then goes from the root references and marks all those objects as alive. Root references are all references on the stack and in static fields. Then the GC goes recursively into the marked objects and marks all objects as alive that are referenced from them. This is repeated until no objects are found that are not already marked as alive. The result of this operation is an object graph.
A ref or out parameter has a reference on the stack, and so the GC will mark the respective object as alive, because the stack is a root for the object graph.
At the end of the process, the objects with only internal references are not marked, because there is no path from the root references that would reach them. This takes care of all circular references, too. These objects are considered dead and will be collected in the next step (that includes calling the finalizer, even though there is no guarantee for that).
At the end, the GC will move all live objects to a contiguous area of memory at the beginning of the heap. The rest of the memory will be filled with zeroes. That simplifies the process of creating new objects, since their memory can always be allocated at the end of the heap and all fields already have their default values.
It is true that the GC needs some time to do all of this, but it still does it reasonably fast, due to some optimizations. One of the optimizations is to separate the heap into generations. All newly allocated objects are generation 0. All objects surviving the first collection are generation 1 and so forth. Higher generations are only collected if collecting lower generations does not free up enough memory. So, no, the GC does not always have to scan the entire heap.
You have to consider that, while the collection takes some time, allocating new objects (which happens much more often than a garbage collection) is much faster than in other implementations, where the heap looks more like a swiss cheese and you need some time to find a hole big enough for the new object (which you still need to initialize).
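If it helps to see the generation mechanism directly, here is a small sketch of my own using the documented GC.GetGeneration and GC.Collect APIs (the exact promotion behaviour can vary by runtime configuration):

using System;

class GenerationDemo
{
    static void Main()
    {
        var obj = new byte[100];
        Console.WriteLine(GC.GetGeneration(obj)); // 0: freshly allocated

        GC.Collect(0);                            // survive a gen-0 collection
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1

        GC.Collect();                             // full collection
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2
    }
}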

.NET, get memory used to hold struct instance

It's possible to determine memory usage (according to Jon Skeet's blog) like this:
public class Program
{
    private static void Main()
    {
        var before = GC.GetTotalMemory(true);
        var point = new Point(1, 0);
        var after = GC.GetTotalMemory(true);
        Console.WriteLine("Memory used: {0} bytes", after - before);
    }

    #region Nested type: Point

    private class Point
    {
        public int X;
        public int Y;

        public Point(int x, int y)
        {
            X = x;
            Y = y;
        }
    }

    #endregion
}
It prints Memory used: 16 bytes (I'm running x64 machine).
Suppose we change the Point declaration from class to struct. How can we then determine the memory used? Is it possible at all? I was unable to find anything about getting the stack size in .NET.
P.S.
Yes, when changed to 'struct', Point instances will often (though not always) be stored on the stack instead of the heap. Sorry for not posting this together with the question the first time.
P.P.S.
This situation has no practical use at all (IMHO); I'm just curious whether it is possible to get the size of the stack (short-term storage). I was unable to find any info about it, so I'm asking you, SO experts.
You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack and not allocated separately. The GetTotalMemory call will still work and show you the total number of bytes currently allocated on the managed heap, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).
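A quick sketch of those two options (my own example; note that Marshal.SizeOf reports the marshalled, unmanaged layout, which can differ from the CLR's internal managed layout):

using System;
using System.Runtime.InteropServices;

struct Point
{
    public int X;
    public int Y;
}

class SizeDemo
{
    static unsafe void Main()
    {
        Console.WriteLine(sizeof(Point));           // 8; sizeof on a user-defined struct needs an unsafe context
        Console.WriteLine(Marshal.SizeOf<Point>()); // 8; size of the marshalled (native) layout
    }
}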
There is a special CPU register, ESP, that contains a pointer to the top of the stack. You can probably find a way to read this register from .NET (using some unsafe or external code). Then just compare the value of this pointer at a given moment with its value at thread start - the difference between them will be a more or less accurate amount of the memory used for the thread's stack. Not sure if it really works, just an idea :)
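A rough sketch of that idea (mine, not the answerer's, and only an approximation): take the address of a local variable in unsafe code at two points on the same thread and compare them.

using System;

class StackDepthDemo
{
    static unsafe void Main()
    {
        int top = 0;
        IntPtr start = (IntPtr)(&top);   // approximate stack pointer at the top of Main
        Recurse(10, start);
    }

    static unsafe void Recurse(int depth, IntPtr start)
    {
        if (depth == 0)
        {
            int bottom = 0;
            IntPtr here = (IntPtr)(&bottom);
            // Stacks grow downwards on x86/x64, so start - here approximates the bytes used so far.
            Console.WriteLine("Approx. stack used: {0} bytes", start.ToInt64() - here.ToInt64());
            return;
        }
        Recurse(depth - 1, start);
    }
}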
In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology, especially if you run it numerous times to ensure that no other piece of code or GC action is affecting the outcome. I am not confident the information is useful, though. Using this methodology in a real-world application is less likely to give accurate results, as there are too many variables.
But realize this is only "reasonable" and not a certainty.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.

C# Collect garbage of object with memory leak

I am using a 3rd-party object I didn't create that consumes a lot of resources over time. This object shouldn't contain any state; it simply performs a calculation. Despite this, every time I call a specific function of this object, a little more memory is consumed. A few hours later, my program is sitting on gigabytes of allocated memory.
The object was originally initialized as a static member of my Program class in my command-line application. I have found that if I wrap my entire program in a class and reinitialize it every now and again, the older (and bloated) object is collected by the GC and a new, smaller object replaces it.
My issue is that this approach is quite clumsy and ruins the flow of my program.
Is there any other way to dispose of an object? I am led to believe GC.Collect() will only collect unreachable objects. Is there any way I can make an object 'unreachable'?
Edit: As requested, the code:
static ILexicon lexicon = new Lexicon();
...
lexicon.LoadDataFromFile(@"lexicon.dat", null);
...
byte similarityScore(string w1, string w2, PartOfSpeech pos, SimilarityMeasure measure)
{
    if (w1 == w2)
        return 255;
    if (pos != PartOfSpeech.Noun && pos != PartOfSpeech.Verb)
        return 0;

    IList<ILemma> w1_lemmas = lexicon.FindSenses(w1, pos);
    IList<ILemma> w2_lemmas = lexicon.FindSenses(w2, pos);

    byte result;
    byte score = 0;
    foreach (ILemma w1_lemma in w1_lemmas)
    {
        foreach (ILemma w2_lemma in w2_lemmas)
        {
            result = (byte)(w1_lemma.GetSimilarity(w2_lemma, measure) * 255);
            if (result > score)
                score = result;
        }
    }
    return score;
}
As similarityScore is called, more memory is allocated to a private member of lexicon. It does not implement IDisposable and there are no obvious functions to clear the memory. The library is based on WordNet and uses an algorithm that finds path lengths in the hypernym tree to calculate the similarity of two words. Unless there is caching, I can't see why it would need to keep any memory. What is certain is that I can't change it. I'm almost certain there is nothing wrong with my code; I just need to dispose of lexicon when it gets too large (N.B. it takes a second or two to load the lexicon from file into memory).
If the object doesn't implement IDisposable and you want to push it out of scope, you can set all references to it to null and then force garbage collection with GC.Collect().
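Applied to the code in the question, that could look roughly like the sketch below. ILexicon, Lexicon, and LoadDataFromFile come from the question; the call-count threshold is an arbitrary placeholder I made up:

static ILexicon lexicon = CreateLexicon();
static int calls;

static ILexicon CreateLexicon()
{
    var l = new Lexicon();
    l.LoadDataFromFile(@"lexicon.dat", null); // takes a second or two, per the question
    return l;
}

static void MaybeRecycleLexicon()
{
    if (++calls % 100000 != 0)   // arbitrary threshold
        return;

    lexicon = null;              // drop the only reference to the bloated instance
    GC.Collect();                // expensive: do this rarely
    GC.WaitForPendingFinalizers();
    lexicon = CreateLexicon();   // rebuild a fresh, small instance
}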
GC.Collect() is very expensive. If you're going to have to do this frequently, you might want to consider contacting the vendor.
Find out:
If you are using their library correctly, or is there something you're doing wrong that's causing the memory leak.
If their library is leaking memory even when used as intended, can they fix the leak?
Additional note: If the 3rd party library is native and you're having to use interop, you can use Marshal.ReleaseComObject to free unmanaged memory.
You could try calling the Dispose() method. This would make the object unusable, so you would have to instantiate another one. I assume your program is in a loop, so it can be a loop variable with the call to Dispose at the bottom.
I would suggest that if you can get your hands on a memory profiler, you use it. A memory profiler will let you pause your program, click on a class, and see a list of objects of that class. One can then click on an object and see how it was created, and the "path" to that object from a root (e.g. there's a static class foo, which holds a reference to a bar, which holds a reference to a boz, which holds a reference to a reallybigthing). Often, seeing that will make it clear what needs to be done to break the chain.
You might be able to download the source from the WordNet repository and modify the code, since it is open source.
