Garbage Collection Across C# and C++/CLI Objects

Garbage Collection Across C# and C++/CLI Objects - c#

I'm currently looking into using C++/CLI to bridge the gap between managed C# and native, unmanaged C++ code. One particular issue I'm looking to resolve, is the conversion of data types that are different in C# and C++.
While reading up on the use of such a bridging approach and the performance implications involved, I wondered how Garbage Collection would work. Specifically, how the Garbage Collector would handle cleanup of objects created on either side, if they are referenced / destroyed on the 'other side'.
So far, I've read various articles and forum questions on StackOverflow and MSDN, which has lead me to believe that the Garbage Collector should work across both types of code when running in the same process - i.e if an object was created in C# and passed to the C++/CLI bridge, it would not be collected until the references on both sides were no longer in use.
My question in this case, breaks down into three parts:
Am I right in concluding that the Garbage Collector works across both portions of code (C# and C++/CLI) when running in the same process?
In relation to 1: how does it work in such a circumstance (specifically in terms of cleaning up objects referenced by both code bases).
Are there any suggestions on how to monitor the activity of the Garbage Collector - i.e. writing tests to check when Garbage Collection occurs; or a program that monitors the Garbage Collector itself.
I already have somewhat of an understanding of how the Garbage Collector works in general, so my questions here are specific to the following scenario:
Components
Assembly A - (written in C#)
Assembly B - (written in C++/CLI)
Program Execution
Object O is created in Assembly A.
Object O is passed into a function inside Assembly B.
Reference to object O in Assembly A is released.
Assembly B holds onto reference to object O.
Execution ends (i.e. via program exit).
Assembly B releases reference to object O.
Thanks in advance for any thoughts on this question. Let me know if further information is needed or if something is not clear enough.
EDIT
As per request, I have written a rough example of the scenario I'm trying to describe. The C# and C++/CLI code can be found on PasteBin.

When the code is actually running, none of it will be C# or C++/CLI. All of it will be IL from the C# and C++/CLI and machine code from the native code you're interoperating with.
Hence you could re-write part of your question as:
Assembly A - (IL and we don't know what it was written in)
Assembly B - (IL and we don't know what it was written in)
Of the managed objects, all of them will be garbage collected as per the same rules, unless you use a mechanism to prevent it (GC.KeepAlive). All of them could be moved in memory unless you pin them (because you're passing addresses to unmanaged code.
.NET Profiler will give you some information on garbage collection, as will the collection counts in performance monitor.

Am I right in concluding that the Garbage Collector works across both
portions of code (C# and C++/CLI) when running in the same process?
Yes, a single garbage collector works inside one process for both (C# and Managed C++) . If inside one process there is code that is running under different CLR versions then there will be different instance of GC for each CLR version
In relation to 1: how does it work in such a circumstance (specifically
in terms of cleaning up objects referenced by both code bases).
I think for GC it doesn't matter if the code is managed C# or C++/CLI (Note that GC will only manage C# and Managed C++ code not native C++). It will work in it's own way without considering what type of code underlying is. Regarding freeing memory, GC will do that whenever an object can no longer be referenced. So as long as there is something referring to a variable it won't be collected regardless of assembly. For native C++ code you will have to manually free every dynamically allocated memory
Are there any suggestions on how to monitor the activity of the Garbage
Collector - i.e. writing tests to check when Garbage Collection occurs; or
a program that monitors the Garbage Collector itself.
You can use tools like .Net Profiler for monitoring. Also take a look at Garbage Collection Notifications

Related

Managed vs unmanaged resource from a class creation viewpoint

What I am trying to understand is when I am creating my own classes, how do I know what is a managed vs unmanaged resource so I know if my class needs to provide the ability to clean it up or if GC will eventually do it. Also, going a little deeper, when I create a .Dispose() method there will be a block for managed resources and a block for unmanaged resources and how do I know which resources should get cleaned up in which block.
I have read many answers about managed vs unmanaged resources in a C# program but most of them are providing the definition with regards to GC cleanup as in "managed resource are cleaned up by GC and unmanaged resources are not". That doesn't help me because I can't see how GC determines what it will clean up and what it will leave behind. I also understand that if a class provides a .Dispose() method that my program should execute it.
I have seen answers stating that if I use a WIN32 API, I've created an unmanaged resource. If I don't call a WIN32 API, does that mean I don't have any unmanaged resources? I've also stumbled over Marshall. Does Marshall also create unmanaged resources? Are there other "keywords / classes" to use to identify that I'm creating unmanaged resources?
Please exclude from your answers anything about "managed resources that are tying up huge amounts of memory". I understand that it would be nice to give the ability to free up this memory but it is not a requirement as the GC will eventually do it, just not always in a timely manner.

Usually if you are not crossing the boundaries of native and managed codes you don't have to bother about releasing unmanaged resources in your classes.
When you are running your .NET application, the framework allocates a managed slice in the memory for it, where almost everything that you can access from the .NET framework will be stored and tracked by the GC. Everything else falls outside of this slice, left without the sharp eye of the GC.
So for your question about how the GC determines which resources should be collected and which not, the short answer is that it doesn't know anything about unmanaged resources, so it also does not able to collect them.
These worlds - the native and the managed - are separated but they can communicate with eachother and thats what Marshalling is for. You can read more about it here. With that of course you can create unmanaged resources but that does not mean that you will do every time when you are using it.
It's also a bit extreme to say that every time you are using Win32 APIs you will create native resources that you must release.
When you use Platform Invoke or C++/CLI wrapper calls on any native code which creates pointers or anything which should be released by hand in the native world (those of course are not tracked by the GC), you have to release them manually if they are not released already by the native side. But if you use APIs that just works with primitive types then you don't have to release anything.
If you are not using anything from above then there is a good chance that you don't have to prepare your classes for directly release anything unmanaged.
There are types using native resources - you probably already came accross - which are managed wrappers under the hood. They release those resources in their Dispose implementation with marshalling.
For example the FileStream managed class is holding an unmanaged handle to the given file. The FileStream itself is a managed class tracked and collected by the GC, but the unmanaged handle is not, it must be released manually, so if you, the user of the FileStream are not calling its Dispose method in your code, that handle will remain in the memory leaking until the application exits.

C# - Fundamental Misunderstanding about Garbage Collection and Managed and Unmanaged Resources [duplicate]

I want to know about unmanaged resources.
Can anyone please give me a basic idea?

Managed resources basically means "managed memory" that is managed by the garbage collector. When you no longer have any references to a managed object (which uses managed memory), the garbage collector will (eventually) release that memory for you.
Unmanaged resources are then everything that the garbage collector does not know about. For example:
Open files
Open network connections
Unmanaged memory
In XNA: vertex buffers, index buffers, textures, etc.
Normally you want to release those unmanaged resources before you lose all the references you have to the object managing them. You do this by calling Dispose on that object, or (in C#) using the using statement which will handle calling Dispose for you.
If you neglect to Dispose of your unmanaged resources correctly, the garbage collector will eventually handle it for you when the object containing that resource is garbage collected (this is "finalization"). But because the garbage collector doesn't know about the unmanaged resources, it can't tell how badly it needs to release them - so it's possible for your program to perform poorly or run out of resources entirely.
If you implement a class yourself that handles unmanaged resources, it is up to you to implement Dispose and Finalize correctly.

Some users rank open files, db connections, allocated memory, bitmaps, file streams etc. among managed resources, others among unmanaged. So are they managed or unmanaged?
My opinion is, that the response is more complex: When you open file in .NET, you probably use some built-in .NET class System.IO.File, FileStream or something else. Because it is a normal .NET class, it is managed. But it is a wrapper, which inside does the "dirty work" (communicates with the operating system using Win32 dlls, calling low level functions or even assembler instructions) which really open the file. And this is, what .NET doesn't know about, unmanaged.
But you perhaps can open the file by yourself using assembler instructions and bypass .NET file functions. Then the handle and the open file are unmanaged resources.
The same with the DB: If you use some DB assembly, you have classes like DbConnection etc., they are known to .NET and managed. But they wrap the "dirty work", which is unmanaged (allocate memory on server, establish connection with it, ...).
If you don't use this wrapper class and open some network socket by yourself and communicate with your own strange database using some commands, it is unmanaged.
These wrapper classes (File, DbConnection etc.) are managed, but they inside use unmanaged resources the same way like you, if you don't use the wrappers and do the "dirty work" by yourself. And therefore these wrappers DO implement Dispose/Finalize patterns. It is their responsibility to allow programmer to release unmanaged resources when the wrapper is not needed anymore, and to release them when the wrapper is garbage collected. The wrapper will be correctly garbage collected by garbage collector, but the unmanaged resources inside will be collected by using the Dispose/Finalize pattern.
If you don't use built-in .NET or 3rd party wrapper classes and open files by some assembler instructions etc. in your class, these open files are unmanaged and you MUST implement dispose/finalise pattern. If you don't, there will be memory leak, forever locked resource etc. even when you don't use it anymore (file operation complete) or even after you application terminates.
But your responsibility is also when using these wrappers. For those, which implement dispose/finalise (you recognize them, that they implement IDisposable), implement also your dispose/finalise pattern and Dispose even these wrappers or give them signal to release their unmanaged resources. If you don't, the resources will be after some indefinite time released, but it is clean to release it immediately (close the file immediately and not leaving it open and blocked for random several minutes/hours). So in your class's Dispose method you call Dispose methods of all your used wrappers.

Unmanaged resources are those that run outside the .NET runtime (CLR)(aka non-.NET code.) For example, a call to a DLL in the Win32 API, or a call to a .dll written in C++.

An "unmanaged resource" is not a thing, but a responsibility. If an object owns an unmanaged resource, that means that (1) some entity outside it has been manipulated in a way that may cause problems if not cleaned up, and (2) the object has the information necessary to perform such cleanup and is responsible for doing it.
Although many types of unmanaged resources are very strongly associated with various type of operating-system entities (files, GDI handles, allocated memory blocks, etc.) there is no single type of entity which is shared by all of them other than the responsibility of cleanup. Typically, if an object either has a responsibility to perform cleanup, it will have a Dispose method which instructs it to carry out all cleanup for which it is responsible.
In some cases, objects will make allowances for the possibility that they might be abandoned without anyone having called Dispose first. The GC allows objects to request notification that they've been abandoned (by calling a routine called Finalize), and objects may use this notification to perform cleanup themselves.
Terms like "managed resource" and "unmanaged resource" are, unfortunately, used by different people to mean different things; frankly think it's more useful to think in terms of objects as either not having any cleanup responsibility, having cleanup responsibility that will only be taken care of if Dispose is called, or having cleanup responsibility which should be taken care of via Dispose, but which can also be taken care of by Finalize.

The basic difference between a managed and unmanaged resource is that the
garbage collector knows about all managed resources, at some point in time
the GC will come along and clean up all the memory and resources associated
with a managed object. The GC does not know about unmanaged resources, such
as files, stream and handles, so if you do not clean them up explicitly in
your code then you will end up with memory leaks and locked resources.
Stolen from here, feel free to read the entire post.

Any resource for which memory is allocated in the .NET managed heap is a Managed resource. CLR is completly aware of this sort of memory and will do everything to make sure that it doesn't go orphaned. Anything else is unmanaged. For example interoping with COM, might create objects in the proces memory space, but CLR will not take care of it. In this case the managed object that makes calls across the managed boundry should own the responsibility for anything beyond it.

Let us first understand how VB6 or C++ programs (Non Dotnet applications) used to execute.
We know that computers only understand machine level code. Machine level code is also called as native or binary code. So, when we execute a VB6 or C++ program, the respective language compiler, compiles the respective language source code into native code, which can then be understood by the underlying operating system and hardware.
Native code (Unmanaged Code) is specific (native) to the operating system on which it is generated. If you take this compiled native code and try to run on another operating system it will fail. So the problem with this style of program execution is that, it is not portable from one platform to another platform.
Let us now understand, how a .Net program executes. Using dotnet we can create different types of applications. A few of the common types of .NET applications include Web, Windows, Console and Mobile Applications. Irrespective of the type of the application, when you execute any .NET application the following happens
The .NET application gets compiled into Intermediate language (IL). IL is also referred as Common Intermediate language (CIL) and Microsoft Intermediate language (MSIL). Both .NET and non .NET applications generate an assembly. Assemblies have an extension of .DLL or .EXE. For example if you compile a windows or Console application, you get a .EXE, where as when we compile a web or Class library project we get a .DLL. The difference between a .NET and NON .NET assembly is that, DOTNET Assembly is in intermediate language format where as NON DOTNET assembly is in native code format.
NON DOTNET applications can run directly on top of the operating system, where as DOTNET applications run on top of a virtual environment called as Common Language Runtime (CLR). CLR contains a component called Just In-Time Compiler (JIT), which will convert the Intermediate language into native code which the underlying operating system can understand.
So, in .NET the application execution consists of 2 steps
1. Language compiler, compiles the Source Code into Intermediate Language (IL)
2. JIT compiler in CLR converts, the IL into native code which can then be run on the underlying operating system.
Since, a .NET assembly is in Intermedaite Language format and not native code, .NET assemblies are portable to any platform, as long as the target platform has the Common Language Runtime (CLR). The target platform's CLR converts the Intermedaite Language into native code that the underlying operating system can understand. Intermediate Languge is also called as managed code. This is because CLR manages the code that runs inside it. For example, in a VB6 program, the developer is responsible for de-allocating the memory consumed by an object. If a programmer forgets to de-allocate memory, we may run into hard to detecct out of memory exceptions. On the other hand a .NET programmer need not worry about de-allocating the memory consumed by an object. Automatic memory management, also known as grabage collection is provided by CLR. Apart, from garbage collection, there are several other benefits provided by the CLR, which we will discuss in a later session. Since, CLR is managing and executing the Intermediate Language, it (IL) is also called as managed code.
.NET supports different programming languages like C#, VB, J#, and C++. C#, VB, and J# can only generate managed code (IL), where as C++ can generate both managed code (IL) and un-managed code (Native code).
The native code is not stored permanently anywhere, after we close the program the native code is thrown awaya. When we execute the program again, the native code gets generated again.
.NET program is similar to java program execution. In java we have byte codes and JVM (Java Virtual Machine), where as in .NET we Intermediate Language and CLR (Common Language Runtime)
This is provided from this link - He is a great tutor.
http://csharp-video-tutorials.blogspot.in/2012/07/net-program-execution-part-1.html

Unmanaged and managed resources are based on application domain.
From my understanding, unmanaged resource is everything that is used to make connection to outside of your application domain.
It could be HttpClient class that you resort to fetch data outside of your domain or a FileStream that helps you to read/write from/to a file.
we use Using block to dispose these kind of classes objects immediately after our work is done because GC in the first place care about inside the process resources not outside ones, although it will be disposed by the GC at the end.

In what order should one release COM objects and garbage collect?

There are lots of questions on SO regarding the releasing COM objects and garbage collection but nothing I could find that address this question specifically.
When releasing COM objects (specifically Excel Interop in this case), in what order should I be releasing the reference and calling garbage collection?
In some places (such as here) I have seen this:
Marshall.FinalReleaseComObject(obj);
GC.Collect();
GC.WaitForPendingFinalizers();
And in others (such as here) this:
GC.Collect();
GC.WaitForPendingFinalizers();
Marshall.FinalReleaseComObject(obj);
Or doesn't it matter and I'm worrying about nothing?

Marshal.FinalReleaseComObject() releases the underlying COM interface pointer.
GC.Collect() and GC.WaitForPendingFinalizers() causes the finalizer for a COM wrapper to be called, which calls FinalReleaseComObject().
So what makes no sense is to do it both ways. Pick one or the other.
The trouble with explicitly calling FinalReleaseComObject() is that it will only work when you call it for all the interface pointers. The Office program will keep running if you miss just one of them. That's very easy to do, especially the syntax sugar allowed in C# version 4 makes it likely. An expression like range = sheet.Cells[1, 1], very common in Excel interop code. There's a hidden Range interface reference there that you never explicitly store anywhere. So you can't release it either.
That's not a problem with GC.Collect(), it can see them. It is however not entirely without trouble either, it will only collect and run the finalizer when your program has no reference to the interface anymore. Which is definitely what's wrong with your second snippet. And which tends to go wrong when you debug your program, the debugger extends the lifetime of local object references to the end of the method. Also the time you look at Taskmgr and yell "die dammit!"
The usual advice for GC.Collect() applies here as well. Keep your program running and perform work. The normal thing happens, you'll trigger a garbage collection and that releases the COM wrappers as well. And the Office program will exit. It just doesn't happen instantly, it will happen eventually.

Reference counting mechanism that is used by COM is another way of automatic memory management but with slightly different impact on memory and behavior.
Any reference counting implementation provide deterministic behavior for resource cleanup. This means that right after call to Marshal.FinalReleaseComObject() all resources (memory and other resources) related to the COM object would be reclaimed.
This means that if we have additional managed objects and you want to reclaim them as quickly as possible, you should release COM object first and only after that call GC.Collect method.

Comparing shared_ptr to managed language references

Could someone explain to a C++ programmer most important differences between Java (and C# as well) references and shared_ptr (from Boost or from C++0x).
I more or less aware how shared_ptr is implemented. I am curious about differences in the following ares:
1) Performance.
2) Cycling. shared_ptr can be cycled (A and B hold pointers to each other). Is cycling possible in Java?
3) Anything else?
Thank you.

Performance: shared_ptr performs pretty well, but in my experience is slightly less efficient than explicit memory management, mostly because it is reference counted and the reference count has to allocated as well. How well it performs depends on a lot of factors and how well it compares to Java/C# garbage collectors can only be determined on a per use case basis (depends on language implementation among other factors).
Cycling is only possible with weak_ptr, not with two shared_ptrs. Java allows cycling without further ado; its garbage collector will break the cycles. My guess is that C# does the same.
Anything else: the object pointed to by a shared_ptr is destroyed as soon as the last reference to it goes out of scope. The destructor is called immediately. In Java, the finalizer may not be called immediately. I don't know how C# behaves on this point.

The key difference is that when the shared pointer's use count goes to zero, the object it points to is destroyed (destructor is called and object is deallocated), immediately. In Java and C# the deallocation of the object is postponed until the Garbage Collector chooses to deallocate the object (i.e., it is non-deterministic).
With regard to cycles, I am not sure I understand what you mean. It is quite common in Java and C# to have two objects that contain member fields that refer to each other, thus creating a cycle. For example a car and an engine - the car refers to the engine via an engine field and the engine can refer to its car via a car field.

Nobody pointed the possibility of moving the object by the memory manager in managed memory. So in C# there are no simple references/pointers, they work like IDs describing object which is returned by the manager.
In C++ you can't achieve this with shared_ptr, because the object stays in the same location after it has been created.

First of all, Java/C# have only pointers, not references, though they call them that way. Reference is a unique C++ feature. Garbage collection in Java/C# basically means infinite life-time. shared_ptr on the other hand provides sharing and deterministic destruction, when the count goes to zero. Therefore, shared_ptr can be used to automatically manage any resources, not just memory allocation. In a sense (just like any RAII design) it turns pointer semantics into more powerful value semantics.

Cyclical references with C++ reference-counted pointers will not be disposed. You can use weak pointers to work around this. Cyclical references in Java or C# may be disposed, when the garbage collector feels like it.
When the count in a C++ reference-counted pointer drops to zero, the destructor is called. When a Java object is no longer reachable, its finalizer may not be called promptly or ever. Therefore, for objects which require explicit disposal of external resources, some form of explicit call is required.

Memory management - C# VS Objective C?

SO I already know about memory management in objective C, and I never had to know about it while programming in .net (C#). But i still have some questions about how everything is done.
-Why does the code leak in objective c if we allocate an object and not release it?
-Why doesn't this leak in C#?
-What are some advantages and disadvantages of automatic-garbage-collecting?
-Why not use autorelease on every allocated object (Objective C)?
-Is it possible to take care of the memory manually (C#)? so let's say i instantiate an object, and when I'm done i want to release it, and i don't want to wait for the garbage collector to do it?

It leaks in Objective-C because Objective-C doesn’t take any action on it. It relies on you doing all the work. It doesn’t leak in C# (more precisely, in .NET) because it employs a garbage collector which cleans up objects that are no longer used.
The main advantage of garbage collection is the above: you have far fewer memory leaks. (It’s still possible to have a memory leak, e.g. by filling a list indefinitely, but that’s harder to do accidentally.) It used to be thought that garbage collection has a disadvantage in that it could slow down the program because it keeps doing the garbage collection in the background and you have little control over it. In reality, however, the difference is negligible: there are other background tasks on your computer (e.g. device drivers) running all the time, the garbage collector doesn’t break the camel’s back either.
Auto-deallocation (as it is employed in C++ when a non-pointer variable goes out of scope) is dangerous because it opens the possibility to have a reference to it still in existence even after the object has been disposed. If your code then tries to access the object, the process goes kaboom big time.
Yes, it is possible to tell C# to release memory by invoking the garbage collector directly (GC.Collect()). However, I have yet to see a case where this is at all necessary. If you actually run out of memory, the garbage collector will already kick in automatically and free as much as it can.

Objective-C isn't a garbage-collected language, so it has no way of knowing that an object won't be used anymore unless you tell it. That's the purpose of the .NET garbage collector: it checks to see which objects can no longer be used by the program, and- at some point- gets rid of them to free up memory. There are no guarantees as to when, or if, it will ever free any given abandoned object; it's just trying to keep memory usage from going out of control.
C# can't release an object without the garbage collector. If you release an object that's still being referenced, your program is going to crash when you try to use that. Which is always the risk of manual memory management, but like all "memory-managed languages", it is trying to prevent you from making exactly that mistake. If you want to explicitly shut down an object's operation, implement the interface IDisposable for that object's type, and use the Dispose() method on that object- essentially a destructor. Be sure you're done with it, of course, and that the object will behave correctly (by throwing exceptions) if something tries to use it after it's been Dispose()d of.
Objective-C is reference-counted. When an object is out of references, it deletes itself. It's not a bad solution to the "is someone still using this object?" problem, except for data structures that refer to themselves; circular data structures will hang around forever unless carefully handled. .NET isn't a reference counter, so it will get rid of circular data structures that can't be reached from running code.
Autorelease is just a "release later", for returning a value that should self-destruct if the code that grabs it doesn't immediately want to hold onto it, as far as I understand. (I'm not an Objective-C programmer, though.) It gets around the "who releases this object?" problem for calls that return an object, without destroying it before the function is finished. It's a special use case, though, and it doesn't make sense in most other cases.

The advantage of automatic garbage collection is that you do not have to explicitly free/release your objects as you said. The disadvantage is you cannot be sure when (or even if) any given object instance will be released.
C# has other mechanisms to release other resources like files, or db connections that have to be released detirminitely. For example, using allows you to make sure that IDispose is called on an object for sure.
Garbage collected systems tend to have more memory in use at any given time than a well tuned manual implementation. Of course, you do not have memory leaks.
The quality and performance of garbage collectors can vary quite a bit which is something you may not have a lot of control over. For example, there may be a noticable lag when the GC runs. In .NET, you can call GC.Collect() to tell the GC that now would be a good time but it may or may not listen.
These days Objective-C is also garbage collected on the Mac. On the iPhone it is reference counted though so I assume that is where you are running into the issue.
On garbage collected systems, you can still run into the issue where a an object hangs onto a reference to an object that you would expect to be garbage collected. The garbage collector will not clean-up this up and so the memory is essentially leaked.
In reference counted systems, you do not have to keep track of every object that points at your object instances but you do have to specify that you want them released.
EDIT: I guess I did not say this explicitly - you cannot manually control memory allocation in C#.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.