C# - Fundamental Misunderstanding about Garbage Collection and Managed and Unmanaged Resources [duplicate]

C# - Fundamental Misunderstanding about Garbage Collection and Managed and Unmanaged Resources [duplicate] - c#

I want to know about unmanaged resources.
Can anyone please give me a basic idea?

Managed resources basically means "managed memory" that is managed by the garbage collector. When you no longer have any references to a managed object (which uses managed memory), the garbage collector will (eventually) release that memory for you.
Unmanaged resources are then everything that the garbage collector does not know about. For example:
Open files
Open network connections
Unmanaged memory
In XNA: vertex buffers, index buffers, textures, etc.
Normally you want to release those unmanaged resources before you lose all the references you have to the object managing them. You do this by calling Dispose on that object, or (in C#) using the using statement which will handle calling Dispose for you.
If you neglect to Dispose of your unmanaged resources correctly, the garbage collector will eventually handle it for you when the object containing that resource is garbage collected (this is "finalization"). But because the garbage collector doesn't know about the unmanaged resources, it can't tell how badly it needs to release them - so it's possible for your program to perform poorly or run out of resources entirely.
If you implement a class yourself that handles unmanaged resources, it is up to you to implement Dispose and Finalize correctly.

Some users rank open files, db connections, allocated memory, bitmaps, file streams etc. among managed resources, others among unmanaged. So are they managed or unmanaged?
My opinion is, that the response is more complex: When you open file in .NET, you probably use some built-in .NET class System.IO.File, FileStream or something else. Because it is a normal .NET class, it is managed. But it is a wrapper, which inside does the "dirty work" (communicates with the operating system using Win32 dlls, calling low level functions or even assembler instructions) which really open the file. And this is, what .NET doesn't know about, unmanaged.
But you perhaps can open the file by yourself using assembler instructions and bypass .NET file functions. Then the handle and the open file are unmanaged resources.
The same with the DB: If you use some DB assembly, you have classes like DbConnection etc., they are known to .NET and managed. But they wrap the "dirty work", which is unmanaged (allocate memory on server, establish connection with it, ...).
If you don't use this wrapper class and open some network socket by yourself and communicate with your own strange database using some commands, it is unmanaged.
These wrapper classes (File, DbConnection etc.) are managed, but they inside use unmanaged resources the same way like you, if you don't use the wrappers and do the "dirty work" by yourself. And therefore these wrappers DO implement Dispose/Finalize patterns. It is their responsibility to allow programmer to release unmanaged resources when the wrapper is not needed anymore, and to release them when the wrapper is garbage collected. The wrapper will be correctly garbage collected by garbage collector, but the unmanaged resources inside will be collected by using the Dispose/Finalize pattern.
If you don't use built-in .NET or 3rd party wrapper classes and open files by some assembler instructions etc. in your class, these open files are unmanaged and you MUST implement dispose/finalise pattern. If you don't, there will be memory leak, forever locked resource etc. even when you don't use it anymore (file operation complete) or even after you application terminates.
But your responsibility is also when using these wrappers. For those, which implement dispose/finalise (you recognize them, that they implement IDisposable), implement also your dispose/finalise pattern and Dispose even these wrappers or give them signal to release their unmanaged resources. If you don't, the resources will be after some indefinite time released, but it is clean to release it immediately (close the file immediately and not leaving it open and blocked for random several minutes/hours). So in your class's Dispose method you call Dispose methods of all your used wrappers.

Unmanaged resources are those that run outside the .NET runtime (CLR)(aka non-.NET code.) For example, a call to a DLL in the Win32 API, or a call to a .dll written in C++.

An "unmanaged resource" is not a thing, but a responsibility. If an object owns an unmanaged resource, that means that (1) some entity outside it has been manipulated in a way that may cause problems if not cleaned up, and (2) the object has the information necessary to perform such cleanup and is responsible for doing it.
Although many types of unmanaged resources are very strongly associated with various type of operating-system entities (files, GDI handles, allocated memory blocks, etc.) there is no single type of entity which is shared by all of them other than the responsibility of cleanup. Typically, if an object either has a responsibility to perform cleanup, it will have a Dispose method which instructs it to carry out all cleanup for which it is responsible.
In some cases, objects will make allowances for the possibility that they might be abandoned without anyone having called Dispose first. The GC allows objects to request notification that they've been abandoned (by calling a routine called Finalize), and objects may use this notification to perform cleanup themselves.
Terms like "managed resource" and "unmanaged resource" are, unfortunately, used by different people to mean different things; frankly think it's more useful to think in terms of objects as either not having any cleanup responsibility, having cleanup responsibility that will only be taken care of if Dispose is called, or having cleanup responsibility which should be taken care of via Dispose, but which can also be taken care of by Finalize.

The basic difference between a managed and unmanaged resource is that the
garbage collector knows about all managed resources, at some point in time
the GC will come along and clean up all the memory and resources associated
with a managed object. The GC does not know about unmanaged resources, such
as files, stream and handles, so if you do not clean them up explicitly in
your code then you will end up with memory leaks and locked resources.
Stolen from here, feel free to read the entire post.

Any resource for which memory is allocated in the .NET managed heap is a Managed resource. CLR is completly aware of this sort of memory and will do everything to make sure that it doesn't go orphaned. Anything else is unmanaged. For example interoping with COM, might create objects in the proces memory space, but CLR will not take care of it. In this case the managed object that makes calls across the managed boundry should own the responsibility for anything beyond it.

Let us first understand how VB6 or C++ programs (Non Dotnet applications) used to execute.
We know that computers only understand machine level code. Machine level code is also called as native or binary code. So, when we execute a VB6 or C++ program, the respective language compiler, compiles the respective language source code into native code, which can then be understood by the underlying operating system and hardware.
Native code (Unmanaged Code) is specific (native) to the operating system on which it is generated. If you take this compiled native code and try to run on another operating system it will fail. So the problem with this style of program execution is that, it is not portable from one platform to another platform.
Let us now understand, how a .Net program executes. Using dotnet we can create different types of applications. A few of the common types of .NET applications include Web, Windows, Console and Mobile Applications. Irrespective of the type of the application, when you execute any .NET application the following happens
The .NET application gets compiled into Intermediate language (IL). IL is also referred as Common Intermediate language (CIL) and Microsoft Intermediate language (MSIL). Both .NET and non .NET applications generate an assembly. Assemblies have an extension of .DLL or .EXE. For example if you compile a windows or Console application, you get a .EXE, where as when we compile a web or Class library project we get a .DLL. The difference between a .NET and NON .NET assembly is that, DOTNET Assembly is in intermediate language format where as NON DOTNET assembly is in native code format.
NON DOTNET applications can run directly on top of the operating system, where as DOTNET applications run on top of a virtual environment called as Common Language Runtime (CLR). CLR contains a component called Just In-Time Compiler (JIT), which will convert the Intermediate language into native code which the underlying operating system can understand.
So, in .NET the application execution consists of 2 steps
1. Language compiler, compiles the Source Code into Intermediate Language (IL)
2. JIT compiler in CLR converts, the IL into native code which can then be run on the underlying operating system.
Since, a .NET assembly is in Intermedaite Language format and not native code, .NET assemblies are portable to any platform, as long as the target platform has the Common Language Runtime (CLR). The target platform's CLR converts the Intermedaite Language into native code that the underlying operating system can understand. Intermediate Languge is also called as managed code. This is because CLR manages the code that runs inside it. For example, in a VB6 program, the developer is responsible for de-allocating the memory consumed by an object. If a programmer forgets to de-allocate memory, we may run into hard to detecct out of memory exceptions. On the other hand a .NET programmer need not worry about de-allocating the memory consumed by an object. Automatic memory management, also known as grabage collection is provided by CLR. Apart, from garbage collection, there are several other benefits provided by the CLR, which we will discuss in a later session. Since, CLR is managing and executing the Intermediate Language, it (IL) is also called as managed code.
.NET supports different programming languages like C#, VB, J#, and C++. C#, VB, and J# can only generate managed code (IL), where as C++ can generate both managed code (IL) and un-managed code (Native code).
The native code is not stored permanently anywhere, after we close the program the native code is thrown awaya. When we execute the program again, the native code gets generated again.
.NET program is similar to java program execution. In java we have byte codes and JVM (Java Virtual Machine), where as in .NET we Intermediate Language and CLR (Common Language Runtime)
This is provided from this link - He is a great tutor.
http://csharp-video-tutorials.blogspot.in/2012/07/net-program-execution-part-1.html

Unmanaged and managed resources are based on application domain.
From my understanding, unmanaged resource is everything that is used to make connection to outside of your application domain.
It could be HttpClient class that you resort to fetch data outside of your domain or a FileStream that helps you to read/write from/to a file.
we use Using block to dispose these kind of classes objects immediately after our work is done because GC in the first place care about inside the process resources not outside ones, although it will be disposed by the GC at the end.

Related

Managed vs unmanaged resource from a class creation viewpoint

What I am trying to understand is when I am creating my own classes, how do I know what is a managed vs unmanaged resource so I know if my class needs to provide the ability to clean it up or if GC will eventually do it. Also, going a little deeper, when I create a .Dispose() method there will be a block for managed resources and a block for unmanaged resources and how do I know which resources should get cleaned up in which block.
I have read many answers about managed vs unmanaged resources in a C# program but most of them are providing the definition with regards to GC cleanup as in "managed resource are cleaned up by GC and unmanaged resources are not". That doesn't help me because I can't see how GC determines what it will clean up and what it will leave behind. I also understand that if a class provides a .Dispose() method that my program should execute it.
I have seen answers stating that if I use a WIN32 API, I've created an unmanaged resource. If I don't call a WIN32 API, does that mean I don't have any unmanaged resources? I've also stumbled over Marshall. Does Marshall also create unmanaged resources? Are there other "keywords / classes" to use to identify that I'm creating unmanaged resources?
Please exclude from your answers anything about "managed resources that are tying up huge amounts of memory". I understand that it would be nice to give the ability to free up this memory but it is not a requirement as the GC will eventually do it, just not always in a timely manner.

Usually if you are not crossing the boundaries of native and managed codes you don't have to bother about releasing unmanaged resources in your classes.
When you are running your .NET application, the framework allocates a managed slice in the memory for it, where almost everything that you can access from the .NET framework will be stored and tracked by the GC. Everything else falls outside of this slice, left without the sharp eye of the GC.
So for your question about how the GC determines which resources should be collected and which not, the short answer is that it doesn't know anything about unmanaged resources, so it also does not able to collect them.
These worlds - the native and the managed - are separated but they can communicate with eachother and thats what Marshalling is for. You can read more about it here. With that of course you can create unmanaged resources but that does not mean that you will do every time when you are using it.
It's also a bit extreme to say that every time you are using Win32 APIs you will create native resources that you must release.
When you use Platform Invoke or C++/CLI wrapper calls on any native code which creates pointers or anything which should be released by hand in the native world (those of course are not tracked by the GC), you have to release them manually if they are not released already by the native side. But if you use APIs that just works with primitive types then you don't have to release anything.
If you are not using anything from above then there is a good chance that you don't have to prepare your classes for directly release anything unmanaged.
There are types using native resources - you probably already came accross - which are managed wrappers under the hood. They release those resources in their Dispose implementation with marshalling.
For example the FileStream managed class is holding an unmanaged handle to the given file. The FileStream itself is a managed class tracked and collected by the GC, but the unmanaged handle is not, it must be released manually, so if you, the user of the FileStream are not calling its Dispose method in your code, that handle will remain in the memory leaking until the application exits.

Several AppDomains and native code

My C# application is using native code which is not thread safe.
I can run multiple processes of that native code, using inter-process communication to achieve concurrency.
My question is, can i use App Domains instead, so that several managed threads, each on a different App Domain, will call the native code and they will not interfere with each other?
The main goal is to prevent process seperation.

No, AppDomains are a pure managed code concept. It achieves isolation by keeping the managed object roots separate. One AppDomain cannot see the objects of another, makes it very safe to abort code and unload assemblies. Never an accident, it throws away all the data that might contain state.
Unmanaged code is completely agnostic of the GC heap and thus AppDomains, it will allocate in its data section and its own native heap (HeapAlloc). Such allocations are process-global. That makes a process the isolation boundary, you'd need a helper process that loads the DLL and talk to it with one of the .NET process interop mechanisms (socket, named pipe, memory-mapped file, remoting, WCF).
Technically you could create copies of the DLL, each with a different name. But that scales very poorly and the pinvoke is very awkward since you can't use [DllImport] anymore. You need a delegate declaration for each exported function and LoadLibrary() and GetProcAddress() to initialize the delegate objects.

Yes it can be done but you should seriously measure if effort is repaid by benefits.
Windows won't load multiple copies of an unmanaged DLL and unmanaged DLLs are loaded per-process (not per AppDomain). What you can do is to create multiple temporary copies of same DLL then load them with LoadLibrary().
Each one will be loaded per-process but they'll be separated from each other (so they'll be thread-safe). All this stuff can be tied inside a class that wraps unmanaged calls (LoadLibrary, FreeLibrary, GetProcAddress and invocation itself). It'll use less resources and it'll be faster than multiple processes but you'll have to drop DllImport usage.
The only benefit I see is that this will scale much better than multiple processes (because it uses less resources) of course if you reuse instances keeping a cache (it's harder to keep a process cache than an object cache).

Does Unmanaged C# code compile into IL and run on the CLR?

In the midst of asking about manually managing CLR memory, I realized I know very little.
I'm aware that the CLR will place a 'cookie' on the stack when you exit a managed context, so that the Garbage Collector won't trample your memory space; however, in everything I've read the assumption is that you are calling some library written in C.
I want to an entire write layer of my application in C#, outside of the managed context, to manage data at a low level. Then, I want to access this layer from a managed layer.
In this case, will my Unmanaged C# code compile to IL and be run on the CLR? How does this work?

I assume this is related to the same C# database project you mentioned in the question.
It is technically possible to implement an entire write layers in C/C++ or any other language. And it is technically possible to have everything else in C#. I am currently working on an application that uses unmanaged code for some high-performance low level stuff and C# for business logic and upper level management.
However, the complexity of the task shall not be underestimated. The typical way to do this, is to design a contract that both parties can understand. The contract will be exposed to the managed language and managed language will trigger calls to the native application. If you have ever tried calling a C++ method from C# you will get the idea... Plus every call to unmanaged code has quite significant performance overhead, which may kill the whole idea of low level performance.
If you really interested in high-performance relational databases, then use single low level language.
If you want to have a naive, but fully working implementation of a database, just use C#. Don't mix these two languages unless you fully understand the complexity. See Raven DB - a document based NoSQL databases that is fully built in C# only.
Will my Unmanaged C# code compile to IL and be run on the CLR?
No, there is no such thing as unmanaged C#. C# code will be always compiled into the IL code and executed by CLR. It is the case of managed code calling unmanaged code. Unmanaged code can be implemented in several languages C/C++/Assembly etc, but CLR will have no idea of what is happening in that code.
Update from a comment. There is a tool (ngen.exe) that can compile C# directly into native architecture specific code. This tool is designed to improve performance of the managed application by removing JIT-compilation stage and putting native code directly into the executable image or library. This code, however, is still "managed" by the CLR host - memory allocation and collection, managed threading, application domains, exception handling, security and all other aspects are still controlled by the CLR. So even though C# can technically be compiled into native code, this code is not running as a standalone native image.
How does this work?
Managed code interoperate with unmanaged code. There are couple of ways to do this:
Through the code via .Net Interop. This is relatively fast but looks a bit ugly in code (plus it is hard to maintain/test) (good article with C#/C/Assembly samples)
A much much slower approach, but more open to other languages: web wervices (SOAP, WS, REST and company), queueing (such as MSMQ, NServiceBus and others), also (possibly) interprocess communication. So unmanaged process sits on one end and a managed application sits on the other one.

I know this is a C# question, but if you are comfortable with C++, C++/CLI might be an option worth considering.
It allows you to selectively compile portions of your C++ code to either a managed or an unmanaged context - However be aware that code that interacts with CLR types MUST run in a managed context.
I'm not aware of the runtime cost of transitioning from managed-context to unmanaged-context and viceversa from within the C++ code, but I assume it must be similar to the cost of calling a native method via .net Interop from C#, which as #oleksii already pointed out, is expensive. In my experience this has really paid off if you need to interact frequently with native C or C++ libraries - IMHO it is much easier to call them from within a C++/CLI project rather than writing the required .net Interop interfaces in C#.
See this question for a litte bit of information on how it is done.

Garbage Collection Across C# and C++/CLI Objects

I'm currently looking into using C++/CLI to bridge the gap between managed C# and native, unmanaged C++ code. One particular issue I'm looking to resolve, is the conversion of data types that are different in C# and C++.
While reading up on the use of such a bridging approach and the performance implications involved, I wondered how Garbage Collection would work. Specifically, how the Garbage Collector would handle cleanup of objects created on either side, if they are referenced / destroyed on the 'other side'.
So far, I've read various articles and forum questions on StackOverflow and MSDN, which has lead me to believe that the Garbage Collector should work across both types of code when running in the same process - i.e if an object was created in C# and passed to the C++/CLI bridge, it would not be collected until the references on both sides were no longer in use.
My question in this case, breaks down into three parts:
Am I right in concluding that the Garbage Collector works across both portions of code (C# and C++/CLI) when running in the same process?
In relation to 1: how does it work in such a circumstance (specifically in terms of cleaning up objects referenced by both code bases).
Are there any suggestions on how to monitor the activity of the Garbage Collector - i.e. writing tests to check when Garbage Collection occurs; or a program that monitors the Garbage Collector itself.
I already have somewhat of an understanding of how the Garbage Collector works in general, so my questions here are specific to the following scenario:
Components
Assembly A - (written in C#)
Assembly B - (written in C++/CLI)
Program Execution
Object O is created in Assembly A.
Object O is passed into a function inside Assembly B.
Reference to object O in Assembly A is released.
Assembly B holds onto reference to object O.
Execution ends (i.e. via program exit).
Assembly B releases reference to object O.
Thanks in advance for any thoughts on this question. Let me know if further information is needed or if something is not clear enough.
EDIT
As per request, I have written a rough example of the scenario I'm trying to describe. The C# and C++/CLI code can be found on PasteBin.

When the code is actually running, none of it will be C# or C++/CLI. All of it will be IL from the C# and C++/CLI and machine code from the native code you're interoperating with.
Hence you could re-write part of your question as:
Assembly A - (IL and we don't know what it was written in)
Assembly B - (IL and we don't know what it was written in)
Of the managed objects, all of them will be garbage collected as per the same rules, unless you use a mechanism to prevent it (GC.KeepAlive). All of them could be moved in memory unless you pin them (because you're passing addresses to unmanaged code.
.NET Profiler will give you some information on garbage collection, as will the collection counts in performance monitor.

Am I right in concluding that the Garbage Collector works across both
portions of code (C# and C++/CLI) when running in the same process?
Yes, a single garbage collector works inside one process for both (C# and Managed C++) . If inside one process there is code that is running under different CLR versions then there will be different instance of GC for each CLR version
In relation to 1: how does it work in such a circumstance (specifically
in terms of cleaning up objects referenced by both code bases).
I think for GC it doesn't matter if the code is managed C# or C++/CLI (Note that GC will only manage C# and Managed C++ code not native C++). It will work in it's own way without considering what type of code underlying is. Regarding freeing memory, GC will do that whenever an object can no longer be referenced. So as long as there is something referring to a variable it won't be collected regardless of assembly. For native C++ code you will have to manually free every dynamically allocated memory
Are there any suggestions on how to monitor the activity of the Garbage
Collector - i.e. writing tests to check when Garbage Collection occurs; or
a program that monitors the Garbage Collector itself.
You can use tools like .Net Profiler for monitoring. Also take a look at Garbage Collection Notifications

What is meant by "managed" vs "unmanaged" resources in .NET?

What is meant by the terms managed resource and unmanaged resource in .NET? How do they come into the picture?

The term "unmanaged resource" is usually used to describe something not directly under the control of the garbage collector. For example, if you open a connection to a database server this will use resources on the server (for maintaining the connection) and possibly other non-.net resources on the client machine, if the provider isn't written entirely in managed code.
This is why, for something like a database connection, it's recommended you write your code thusly:
using (var connection = new SqlConnection("connection_string_here"))
{
// Code to use connection here
}
As this ensures that .Dispose() is called on the connection object, ensuring that any unmanaged resources are cleaned up.

Managed resources are those that are pure .NET code and managed by the runtime and are under its direct control.
Unmanaged resources are those that are not. File handles, pinned memory, COM objects, database connections etc.

In the Q&A What are unmanaged resources?1, Bruce Wood posted the following:
I think of the terms "managed" and "unmanaged" this way:
"Managed" refers to anything within the .NET sandbox. This includes
all .NET Framework classes.
"Unmanaged" refers to the wilderness outside the .NET sandbox. This
includes anything that is returned to you through calls to Win32 API
functions.
If you never call a Win32 API function and never get back any Win32
"handle" objects, then you are not holding any unmanaged resources.
Files and streams that you open via .NET Framework class methods are
all managed wrappers.
Comment: You may not be holding an unmanaged resource directly. However, you may be holding an unmanaged resource indirectly via a managed "wrapper class" such as System.IO.FileStream. Such a wrapper class commonly implements IDisposable (either directly or via inheritance).
...many managed (.NET Framework) objects are
holding unmanaged resources inside them, and you probably want to
Dispose() of them as soon as you can, or at least offer your callers
the opportunity to do so. That's where writing your own Dispose()
method comes in. Essentially, implementing IDisposable() does two
things for you:
Allows you to get rid of any resources you grabbed directly from
the operating system behind .NET's back (unmanaged resources).
Allows you and your callers to release hefty .NET objects / .NET
objects that are holding precious resources in their grubby little
hands that you / your callers want released now.
Comment: By implementing IDisposable and thereby providing a Dispose() method, you are enabling a user of your class to release in a deterministic fashion any unmanaged resources that are held by an instance your class.
1 Link originally shared in Sachin Shanbhag's answer. Quoted material dated 2005-11-17. Note that I have lightly copy-edited the quoted content.

The basic difference between a managed and unmanaged resource is that the
garbage collector knows about all managed resources, at some point in time
the GC will come along and clean up all the memory and resources associated
with a managed object. The GC does not know about unmanaged resources, such
as files, stream and handles, so if you do not clean them up explicitly in
your code then you will end up with memory leaks and locked resources.
For more details - http://bytes.com/topic/c-sharp/answers/276059-what-unmanaged-resources

Managed resources are resources which can be freed up by the garbage collector and unmanaged resources cannot be freed up by the garbage collector for this purpose destructor is required.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.