How is the memory unpinned in Memory<T>.Span? - c#

I believe that the following two pieces of code should be equivalent:
// first example
string s = "Hello memory";
ReadOnlyMemory<char> memory = s.AsMemory();
using (MemoryHandle pin = memory.Pin())
{
Span<char> span = new Span<char>(pin.Pointer, 1); // pin.Pointer is a void*, so this line needs an unsafe context
Console.WriteLine(span[0]);
}
// second example
ReadOnlySpan<char> span2 = memory.Span;
Console.WriteLine(span2[0]);
Both snippets print "H".
What I don't understand is where the memory gets unpinned in the second example.
As I understand it, the string is allocated on the heap; MemoryHandle pins it and a Span is created from the pointer. MemoryHandle.Dispose then unpins the memory.
I believe that memory.Span must pin the memory as well, otherwise the span couldn't access the pointer. But how is the memory unpinned in the second example?

The last assumption is incorrect: memory.Span does not need to pin the memory, as the garbage collector is aware of its underlying reference. Pinning is independently available in case you would like to pass the pointer to a native API.
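A minimal sketch of that point, assuming a hypothetical helper class named SpanWithoutPinning (it is not part of the question): the span is read again after a forced collection, and no Pin call appears anywhere.

```csharp
using System;

public static class SpanWithoutPinning
{
    // Reads through a span after a forced garbage collection.
    public static char ReadAfterCollect()
    {
        char[] buffer = "Hello memory".ToCharArray(); // ordinary heap array, not pinned
        Memory<char> memory = buffer;
        Span<char> span = memory.Span;                // no pin is taken here

        GC.Collect();                                 // the GC is free to move the array

        // The span holds a managed pointer, which the GC tracks and updates on a move.
        return span[0];
    }

    public static void Main()
    {
        Console.WriteLine(ReadAfterCollect());
    }
}
```

Pinning only becomes necessary when the address has to stay fixed for code the GC cannot see, such as a native API.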

A Span<T> is a ref struct: it can only live on the stack of the current thread, never on the heap, so it stays valid for as long as you are using it there. So far, so clear.
Now the funny part:
The result of memory.Span is not pinned. It is only referenced through the ref T inside Span<T>, which is .NET's notion of a managed pointer, and managed pointers are also tracked by the garbage collector.
As long as your Span is alive, the object it points into is kept alive too.
References:
https://msdn.microsoft.com/en-us/magazine/mt814808.aspx?f=255&MSPPError=-2147217396
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref#ref-struct-types

Related

Does a managed pointer to a value-type field keep its containing GC instance alive? [duplicate]

In C#, ref and out params are, as far as I know, passed by passing only the raw address of the relevant value. That address may be an interior pointer to an element in an array or a field within an object.
If a garbage collection occurs, it's possible that the only reference to some object is through one of these interior pointers, as in:
using System;
public class Foo
{
    public int field;
    public static void Increment(ref int x) {
        System.GC.Collect();
        x = x + 1;
        Console.WriteLine(x);
    }
    public static void Main()
    {
        Increment(ref new Foo().field);
    }
}
In that case, the GC needs to find the beginning of the object and treat the entire object as reachable. How does it do that? Does it have to scan the entire heap looking for the object that contains that pointer? That seems slow.
The garbage collector will have a fast way to find the start of an object from a managed interior pointer. From there it can mark the object as "referenced" during the mark phase.
I don't have the code for the Microsoft collector, but it would use something similar to Go's span table, which provides a fast lookup for different "spans" of memory that you can key on the most significant X bits of the pointer, depending on how large you choose the spans to be. From there it uses the fact that each span contains X objects of the same size to very quickly find the header of the one you have. It's pretty much an O(1) operation. Obviously the Microsoft heap will be different, since it's allocated sequentially without regard for object size, but it will have some sort of O(1) lookup structure.
https://github.com/puppeh/gcc-6502/blob/master/libgo/runtime/mgc0.c
// Otherwise consult span table to find beginning.
// (Manually inlined copy of MHeap_LookupMaybe.)
k = (uintptr)obj>>PageShift;
x = k;
x -= (uintptr)runtime_mheap.arena_start>>PageShift;
s = runtime_mheap.spans[x];
if(s == nil || k < s->start || (const byte*)obj >= s->limit || s->state != MSpanInUse)
return false;
p = (byte*)((uintptr)s->start<<PageShift);
if(s->sizeclass == 0) {
obj = p;
} else {
uintptr size = s->elemsize;
int32 i = ((const byte*)obj - p)/size;
obj = p+i*size;
}
Note that the .NET garbage collector can move objects when it compacts the heap, so managed/interior pointers need to be updated whenever an object is moved during a garbage collection cycle. The GC knows where the interior pointers are on the stack for each stack frame, based on the method parameters known at JIT time.
Your code compiles to
IL_0001: newobj instance void Foo::.ctor()
IL_0006: ldflda int32 Foo::'field'
IL_000b: call void Foo::Increment(int32&)
AFAIK, the ldflda instruction creates a reference to the object containing the field, for as long as the address is on the stack (until the call completes).
The garbage collector works in three basic steps:
1. Mark all objects that are still alive.
2. Collect the objects that are not marked as alive.
3. Compact the memory.
Your concern is step 1: How does the GC figure out that it shouldn't collect objects behind ref and out params?
When the GC performs a collection, it starts with a state where none of the objects is considered alive. It then goes from the root references and marks all those objects as alive. Root references are all references on the stack and in static fields. Then the GC goes recursively into the marked objects and marks all objects as alive that are referenced from them. This is repeated until no objects are found that are not already marked as alive. The result of this operation is an object graph.
A ref or out parameter has a reference on the stack, and so the GC will mark the respective object as alive, because the stack is a root for the object graph.
At the end of the process, objects that are referenced only by other unreachable objects are not marked, because there is no path from the root references that reaches them. This takes care of circular references, too. These objects are considered dead and will be collected in the next step (which includes calling the finalizer, although there is no guarantee that it runs).
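The marking described above can be sketched as a plain graph traversal. This is only an illustration: Node and MarkPhaseSketch are hypothetical names, and the real collector works on raw memory rather than on a List of references.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object: it only knows which objects it references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkPhaseSketch
{
    // Returns the set of nodes reachable from the roots (the "alive" objects).
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current))
                continue; // already marked; this is also what stops cycles from looping forever
            foreach (Node child in current.References)
                pending.Push(child);
        }
        return marked;
    }

    public static void Main()
    {
        var root = new Node("root");
        var child = new Node("child");
        root.References.Add(child);

        // Two objects that reference only each other: a cycle with no path from a root.
        var a = new Node("a");
        var b = new Node("b");
        a.References.Add(b);
        b.References.Add(a);

        HashSet<Node> alive = Mark(new[] { root });
        Console.WriteLine(alive.Contains(child)); // True
        Console.WriteLine(alive.Contains(a));     // False
    }
}
```

A ref or out argument simply acts as one more root in this picture, pointing at the object that contains the referenced field.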
At the end, the GC will move all live objects to a contiguous area of memory at the beginning of the heap. The rest of the memory will be filled with zeroes. That simplifies the process of creating new objects, since their memory can always be allocated at the end of the heap and all fields already have the default values.
It is true that the GC needs some time to do all of this, but it still does it reasonably fast, due to some optimizations. One of the optimizations is to separate the heap into generations. All newly allocated objects are generation 0. All objects surviving the first collection are generation 1 and so forth. Higher generations are only collected if collecting lower generations does not free up enough memory. So, no, the GC does not always have to scan the entire heap.
You have to consider that, while the collection takes some time, allocating new objects (which happens much more often than a garbage collection) is much faster than in other implementations, where the heap looks more like a swiss cheese and you need some time to find a hole big enough for the new object (which you still need to initialize).

What are layout and size of a managed class's header in unmanaged memory?

Recently, in this question, I asked how to get the raw memory address of a class instance in C# (it is a crude, unreliable hack and bad practice; don't use it unless you really need it). I succeeded, but then a problem arose: according to this article, the first 2 words in an object's raw memory representation should be pointers to the SyncBlock and RTTI structures, and therefore the first field's address must be offset by 2 words (8 bytes on 32-bit systems, 16 bytes on 64-bit systems) from the beginning. However, when I dump the first bytes from memory at the object's location, the first field's raw offset from the object's address is only one 32-bit word (4 bytes), which doesn't make sense for either type of system. From the question I've linked:
using System;
using System.Runtime.InteropServices;

class Program
{
    // Here is the function.
    // I suggest looking at the original question's solution, as it is
    // more reliable.
    static IntPtr getPointerToObject(Object unmanagedObject)
    {
        GCHandle gcHandle = GCHandle.Alloc(unmanagedObject, GCHandleType.WeakTrackResurrection);
        // The handle value points at a table slot that holds the object's address.
        IntPtr thePointer = Marshal.ReadIntPtr(GCHandle.ToIntPtr(gcHandle));
        gcHandle.Free();
        return thePointer;
    }
    class TestClass
    {
        uint a = 0xDEADBEEF;
    }
    static void Main(string[] args)
    {
        byte[] cls = new byte[16];
        var test = new TestClass();
        var thePointer = getPointerToObject(test);
        Marshal.Copy(thePointer, cls, 0, 16); //Dump first 16 bytes...
        // ToInt32 matches the 32-bit output below; in a 64-bit process it throws
        // OverflowException when the address does not fit in an int.
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(thePointer.ToInt32())));
        Console.WriteLine(BitConverter.ToString(cls));
        Console.ReadLine();
        // No Free() here: the handle was already freed inside getPointerToObject.
    }
}
/* Example output (yours should be different):
40-23-CA-02
4C-38-04-01-EF-BE-AD-DE-00-00-00-80-B4-21-50-73
That field's value is "EF-BE-AD-DE", 0xDEADBEEF as it is stored in memory. Yay, we found it!
*/
Why is that so? Maybe I've just got the address wrong, but how and why? And if I didn't, what could be wrong anyway? Maybe, if that article is wrong, I simply misunderstood what a managed class header looks like? Or maybe it doesn't have that Lock pointer, but why and how is this possible?
(These are, obviously, only a few possible options, and, while I'm still going to carefully check each one I can predict, wild guessing cannot compare in both time and accuracy to a correct answer.)
@HansPassant brilliantly pointed out that the pointer for the object in question points to the second structure, the method table. That makes total sense for performance reasons, as the method table (RTTI structure) is used far more often than the SyncBlock structure, which is therefore still located right before it, at index -1.
He made it clear that he doesn't want to post this answer so I'm posting it myself, but the credit still goes to him.
But I would like to remind that this is a dirty unreliable hack, possibly making the system unstable:
Beyond the pinning problem, other nasty issues are not having any idea how long the object is and how the fields are arranged.
You should use the debugger instead, unless you understand all the consequences, understand exactly what you are trying to do and really need to do it - using this, dirty and unreliable, way.

Does GCHandle.Alloc allocate memory?

I am using .NET Memory Profiler from SciTech to reduce memory allocations rate of my program and cut frequency of garbage collections.
Surprisingly, according to the profiler, the largest amount of allocations seems to be coming from the GCHandle.Alloc calls I make to marshal existing .NET arrays to native OpenGL.
My understanding is that calling GCHandle.Alloc does not allocate memory, it only pins existing memory on the managed heap?
Am I wrong or is the profiler wrong?
.NET reference source is available for anyone to see, and you can have a look and find out for yourself.
If you dig into GCHandle.Alloc, you'll see that it calls a native method called InternalAlloc:
[System.Security.SecurityCritical] // auto-generated
[MethodImplAttribute(MethodImplOptions.InternalCall)]
[ResourceExposure(ResourceScope.None)]
internal static extern IntPtr InternalAlloc(Object value, GCHandleType type);
Drilling down into the CLR code, you see the internal call to MarshalNative::InternalAlloc, which ends up calling:
hnd = GetAppDomain()->CreateTypedHandle(objRef, type);
which in turn calls ObjectHandle::CreateTypedHandle -> HandleTable::HndCreateHandle -> HandleTableCache->TableAllocSingleHandleFromCache, which allocates the handle if it doesn't exist in the cache.
As @Antosha corrected me, the place of invocation isn't via ComDelegate (which actually makes little sense) but via MarshalNative. An allocation does occur, not on the managed heap, but on an external heap reserved by the runtime for managing handle roots into GC objects. The only allocation that does occur in the managed heap is the IntPtr which holds the pointer to the address in the table. Despite this, you should still make sure to call GCHandle.Free once you're done.
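As a minimal sketch of the Alloc/Free pairing (PinnedHandleSketch and its method are hypothetical names, not code from the question), a try/finally block keeps the handle table entry from leaking even if the code in between throws:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedHandleSketch
{
    // Pins an array, reads its first element through the raw address, then frees the handle.
    public static int ReadFirstThroughPointer(int[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // For a pinned array this is the address of the first element.
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadInt32(address);
        }
        finally
        {
            handle.Free(); // releases the handle table entry and unpins the array
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadFirstThroughPointer(new[] { 42, 7 }));
    }
}
```

In the OpenGL scenario the native call would sit where Marshal.ReadInt32 is used here.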
The profiler is even assigning a specific memory amount to each GCHandle I allocate - 8 bytes. And the managed heap seems to grow 8 bytes with each GCHandle.Alloc. So it seems that it actually does allocate space on managed heap, although I have no idea what for?
I don't know how a handle could be smaller :) I've done some tests:
Console.WriteLine("Is 64 bit: {0}, IntPtr.Size: {1}", Environment.Is64BitProcess, IntPtr.Size);
int[][] objects = new int[100000][];
for (int i = 0; i < objects.Length; i++)
{
objects[i] = new int[] { 0 };
}
long w1 = Environment.WorkingSet;
GCHandle[] handles = new GCHandle[objects.Length];
for (int i = 0; i < handles.Length; i++)
{
//handles[i] = new GCHandle(handles);
//handles[i] = GCHandle.Alloc(objects[i]);
handles[i] = GCHandle.Alloc(objects[i], GCHandleType.Pinned);
}
Console.WriteLine("Allocated");
long w2 = Environment.WorkingSet;
Console.WriteLine("Used: {0}, by handle: {1}", w2 - w1, ((double)(w2 - w1)) / handles.Length);
Console.ReadKey();
It is a small program. If you run it, you will see that an "empty" GCHandle (one created with new GCHandle()) occupies IntPtr.Size memory. This is clear if you use ILSpy to look at it: it has a single IntPtr field. If you pin some object, then it occupies 2*IntPtr.Size memory. This is probably because it has to write something in a CLR table (of size IntPtr) plus its internal IntPtr.
Taken from https://stackoverflow.com/a/18122621/613130
It uses a dedicated table of GC handles built inside the CLR. You allocate an entry into this table with GCHandle.Alloc() and release it again later with GCHandle.Free(). The garbage collector simply adds the entries in this table to the graph of objects it discovered itself when it performs a collection.
from msdn
A new GCHandle that protects the object from garbage collection. This GCHandle must be released with Free when it is no longer needed.
So even if no real allocation is made, I guess the preallocated table entry cannot be released until Free is called.

C# Collect garbage of object with memory leak

I am using a 3rd-party object I didn't create that over time consumes a lot of resources. This object shouldn't in any way contain state; it simply performs a calculation. Despite this, every time I call a specific function of this object a little more memory is consumed. A few hours later, my program is sitting at gigabytes of allocated memory.
The object was originally initialized as a static member of my Program class in my command-line application. I have found that if I wrap my entire program in a class and reinitialize it every now and again, the older (and bloated) object is reclaimed by the GC and a new, smaller object replaces it.
My issue is that this method is quite clumsy and ruins the flow of my Program.
Is there any other way to dispose of an object? I am led to believe GC.Collect() will only reclaim unreachable objects. Is there any way I can make an object 'unreachable'?
Edit: As requested, the code:
static ILexicon lexicon = new Lexicon();
...
lexicon.LoadDataFromFile(@"lexicon.dat", null);
...
byte similarityScore(string w1, string w2, PartOfSpeech pos, SimilarityMeasure measure)
{
if (w1 == w2)
return 255;
if (pos != PartOfSpeech.Noun && pos != PartOfSpeech.Verb)
return 0;
IList<ILemma> w1_lemmas = lexicon.FindSenses(w1, pos);
IList<ILemma> w2_lemmas = lexicon.FindSenses(w2, pos);
byte result;
byte score = 0;
foreach (ILemma w1_lemma in w1_lemmas)
{
foreach (ILemma w2_lemma in w2_lemmas)
{
result = (byte) (w1_lemma.GetSimilarity(w2_lemma, measure) * 255);
if (result > score)
score = result;
}
}
return score;
}
As similarityScore is called, more memory is allocated to a private member of lexicon. It does not implement IDisposable and there are no obvious functions to clear the memory. The library is based on WordNet, and uses an algorithm to find path lengths in the hypernym tree to calculate the similarity of two words. Unless there is caching, I can't see why it would need to store any memory. What is for sure, is I can't change it. I'm almost certain there is nothing wrong with my code. I just need to dispose of lexicon when it gets too large (N.B. it takes a second or two to load the lexicon from file to memory)
If the object doesn't implement IDisposable and you want to push it out of scope, you can set all references to it to null and then force a garbage collection with GC.Collect().
GC.Collect() is very expensive. If you're going to have to do this frequently, you might want to consider contacting the vendor.
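One way to drop the references without wrapping the whole program is a small holder that replaces the instance after a number of uses. This is a sketch under assumptions: RecyclingHolder, its factory delegate, and the maxUses threshold are all hypothetical, and for the question's case the factory would have to create and load a new Lexicon.

```csharp
using System;

// Hypothetical wrapper: hands out one instance and replaces it after a fixed number of uses.
public sealed class RecyclingHolder<T> where T : class
{
    private readonly Func<T> _factory;
    private readonly int _maxUses;
    private T _current;
    private int _uses;

    public RecyclingHolder(Func<T> factory, int maxUses)
    {
        _factory = factory;
        _maxUses = maxUses;
        _current = factory();
    }

    public T Get()
    {
        if (_uses >= _maxUses)
        {
            _current = _factory(); // the holder no longer references the old instance
            _uses = 0;
            GC.Collect();          // optional and expensive; the GC would get to it eventually
        }
        _uses++;
        return _current;
    }
}

public static class RecyclingDemo
{
    public static void Main()
    {
        int created = 0;
        var holder = new RecyclingHolder<object>(() => { created++; return new object(); }, maxUses: 2);
        for (int i = 0; i < 5; i++)
            holder.Get();
        Console.WriteLine(created); // 3: one initial instance plus two replacements
    }
}
```

The old instance only becomes collectable if callers do not keep their own reference to it, so code should call Get() each time rather than caching the result.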
Find out:
If you are using their library correctly, or is there something you're doing wrong that's causing the memory leak.
If their library is leaking memory even when used as intended, can they fix the leak?
Additional note: If the 3rd-party library is a COM component that you use through interop, you can call Marshal.ReleaseComObject to release it along with the unmanaged memory it holds.
You could try calling the Dispose() method, if the object has one. This would make the object unusable, so you would have to instantiate another one. I assume your program is in a loop, so it can be a loop variable with the call to Dispose at the bottom.
I would suggest that if you can get your hands on a memory profiler, you use it. A memory profiler will let you pause your program, click on a class, and see a list of objects of that class. One can then click on an object and see how it was created, and the "path" to that object from a root (e.g. there's a static class foo, which holds a reference to a bar, which holds a reference to a boz, which holds a reference to a reallybigthing). Often, seeing that will make it clear what needs to be done to break the chain.
You might be able to download the source from the WordNet repository and modify the code, since it is open source.

Is c# compiler deciding to use stackalloc by itself?

I found a blog entry which suggests that the C# compiler may sometimes decide to put an array on the stack instead of the heap:
Improving Performance Through Stack Allocation (.NET Memory Management: Part 2)
This guy claims that:
The compiler will also sometimes decide to put things on the stack on its own. I did an experiment with TestStruct2 in which I allocated it both an unsafe and normal context. In the unsafe context the array was put on the heap, but in the normal context when I looked into memory the array had actually been allocated on the stack.
Can someone confirm that?
I tried to repeat his example, but every time I tried, the array was allocated on the heap.
If the C# compiler can do such a trick without the 'unsafe' keyword, I'm especially interested in it. I have code that works on many small byte arrays (8-10 bytes long), so using the heap for each new byte[...] is a waste of time and memory (especially since each object on the heap has 8 bytes of overhead needed by the garbage collector).
EDIT: I just want to describe why it's important to me:
I'm writing a library that communicates with a Gemalto.NET smart card, which can run .NET code. When I call a method that returns something, the smart card returns 8 bytes that describe the exact Type of the return value. These 8 bytes are calculated using an MD5 hash and some byte array concatenations.
The problem is that when I get an array that is not known to me, I must scan all types in all assemblies loaded in the application and calculate those 8 bytes for each one until I find the same array.
I don't know other way to find the type, so I'm trying to speed it up as much as possible.
Author of the linked-to article here.
It seems impossible to force stack allocation outside of an unsafe context. This is likely the case to prevent some classes of stack overflow condition.
Instead, I recommend using a memory recycler class which would allocate byte arrays as needed but also allow you to "turn them in" afterward for reuse. It's as simple as keeping a stack of unused byte arrays and, when the stack is empty, allocating new ones.
Stack<Byte[]> _byteStack = new Stack<Byte[]>();
Byte[] AllocateArray()
{
    Byte[] outArray;
    if (_byteStack.Count > 0)
        outArray = _byteStack.Pop();
    else
        outArray = new Byte[8];
    return outArray;
}
void RecycleArray(Byte[] inArray)
{
    _byteStack.Push(inArray);
}
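Newer versions of .NET ship a ready-made recycler of this kind, ArrayPool<T> in System.Buffers. A minimal sketch, assuming a hypothetical PooledBufferSketch class that stands in for the hashing code:

```csharp
using System;
using System.Buffers;

public static class PooledBufferSketch
{
    // Rents a buffer, fills the first 8 bytes, and returns their sum.
    public static int SumOfEight()
    {
        // Rent may hand back an array longer than requested, so only use the first 8 bytes.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(8);
        try
        {
            for (int i = 0; i < 8; i++)
                buffer[i] = (byte)(i + 1);

            int sum = 0;
            for (int i = 0; i < 8; i++)
                sum += buffer[i];
            return sum;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer); // hand the array back for reuse
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumOfEight()); // 36
    }
}
```

Unlike the hand-written Stack above, the shared pool is thread-safe, but rented arrays are not cleared by default and may contain stale data.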
If you are trying to match a hash with a type it seems the best idea would be to use a Dictionary for fast lookups. In this case you could load all relevant types at startup, if this causes program startup to become too slow you might want to consider caching them the first time each type is used.
From your line:
I have a code that is working on many small byte arrays (8-10 bytes long)
Personally, I'd be more interested in allocating a spare buffer somewhere that different parts of your code can re-use (while processing the same block). Then you don't have any creation/GC to worry about. In most cases (where the buffer is used for very discrete operations) with a scratch-buffer, you can even always assume that it is "all yours", i.e. every method that needs it can assume that it can start writing at zero.
I use this single-buffer approach in some binary serialization code (while encoding data); it is a big boost to performance. In my case, I pass a "context" object between the layers of serialization (that encapsulates the scratch-buffer, the output-stream (with some additional local buffering), and a few other oddities).
System.Array (the class representing an array) is a reference type and lives on the heap. You can only have an array on the stack if you use unsafe code.
I can't see where it says otherwise in the article that you refer to. If you want to have a stack allocated array, you can do something like this:
decimal* stackAllocatedDecimals = stackalloc decimal[4];
Personally I wouldn't bother- how much performance do you think you will gain by this approach?
This CodeProject article might be useful to you though.
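For readers on newer compilers: since C# 7.2, a stackalloc expression can be assigned to a Span<T> without an unsafe context. A minimal sketch, where SafeStackallocSketch and Checksum are hypothetical names standing in for the 8-byte hashing work:

```csharp
using System;

public static class SafeStackallocSketch
{
    // Fills an 8-byte stack buffer starting at seed and returns the sum of its bytes.
    public static int Checksum(byte seed)
    {
        Span<byte> buffer = stackalloc byte[8]; // stack memory, no heap allocation, no unsafe keyword
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = (byte)(seed + i);

        int sum = 0;
        foreach (byte b in buffer)
            sum += b;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Checksum(1)); // 36
    }
}
```

The buffer disappears when the method returns, so the span cannot be stored in a field or returned to the caller; the compiler enforces this.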
