Several AppDomains and native code

Several AppDomains and native code - c#

My C# application is using native code which is not thread safe.
I can run multiple processes of that native code, using inter-process communication to achieve concurrency.
My question is, can i use App Domains instead, so that several managed threads, each on a different App Domain, will call the native code and they will not interfere with each other?
The main goal is to prevent process seperation.

No, AppDomains are a pure managed code concept. It achieves isolation by keeping the managed object roots separate. One AppDomain cannot see the objects of another, makes it very safe to abort code and unload assemblies. Never an accident, it throws away all the data that might contain state.
Unmanaged code is completely agnostic of the GC heap and thus AppDomains, it will allocate in its data section and its own native heap (HeapAlloc). Such allocations are process-global. That makes a process the isolation boundary, you'd need a helper process that loads the DLL and talk to it with one of the .NET process interop mechanisms (socket, named pipe, memory-mapped file, remoting, WCF).
Technically you could create copies of the DLL, each with a different name. But that scales very poorly and the pinvoke is very awkward since you can't use [DllImport] anymore. You need a delegate declaration for each exported function and LoadLibrary() and GetProcAddress() to initialize the delegate objects.

Yes it can be done but you should seriously measure if effort is repaid by benefits.
Windows won't load multiple copies of an unmanaged DLL and unmanaged DLLs are loaded per-process (not per AppDomain). What you can do is to create multiple temporary copies of same DLL then load them with LoadLibrary().
Each one will be loaded per-process but they'll be separated from each other (so they'll be thread-safe). All this stuff can be tied inside a class that wraps unmanaged calls (LoadLibrary, FreeLibrary, GetProcAddress and invocation itself). It'll use less resources and it'll be faster than multiple processes but you'll have to drop DllImport usage.
The only benefit I see is that this will scale much better than multiple processes (because it uses less resources) of course if you reuse instances keeping a cache (it's harder to keep a process cache than an object cache).

Related

Manage to unmanaged transition from second AppDomain is very slow

The setup is as follows:
Main app domain loads a number of unmanaged C++ libraries from a C++/CLI assembly.
A second app domain loads those C++ libraries, but of course it is just getting hold of handles to the previously loaded libraries.
I run exactly the same managed C# code in both app domains that makes use of the unmanaged C++ libraries.
The C++/CLI assembly contains a mixture of managed and unmanged code.
When calling into the unmanaged C++ libraries from the main app domain the cost of the managed to unmanaged transition is fairly negligible compared to the overall cost of execution. However, from the second app domain that did not originally load the libraries the cost of the managed to unmanaged transition becomes substantial; orders of magnitude larger. I know this because I have run ANTS performance profiler and this is what it tell me.
There are some unmanaged static variables in the C++/CLI assembly, I've tried replacing them when a new app domain is created, but this doesn't change the performance. There would well be a bunch of hidden ones.
What could be going on here? Why should the transition in the other app domain be so slow? Is there anything obvious to try to improve things?
Some context: the code is running in a separate app domain for isolation and so that it can be torn down when things go wrong. It would be very good to keep it isolated from the main app domain. It would be possible to start a new process from which to run the unmanaged code, however this could be expensive as it may require a lot of data to be serialized and deserialized and the cost of bootstrapping everything in a new process is also fairly large. In the general case the cost of the serialization and the bootstrap will be much lower than the cost of the managed to unmanaged transition that I'm seeing, but there is a lot of messing around involved and it still slows things down a lot.

C# - Fundamental Misunderstanding about Garbage Collection and Managed and Unmanaged Resources [duplicate]

I want to know about unmanaged resources.
Can anyone please give me a basic idea?

Managed resources basically means "managed memory" that is managed by the garbage collector. When you no longer have any references to a managed object (which uses managed memory), the garbage collector will (eventually) release that memory for you.
Unmanaged resources are then everything that the garbage collector does not know about. For example:
Open files
Open network connections
Unmanaged memory
In XNA: vertex buffers, index buffers, textures, etc.
Normally you want to release those unmanaged resources before you lose all the references you have to the object managing them. You do this by calling Dispose on that object, or (in C#) using the using statement which will handle calling Dispose for you.
If you neglect to Dispose of your unmanaged resources correctly, the garbage collector will eventually handle it for you when the object containing that resource is garbage collected (this is "finalization"). But because the garbage collector doesn't know about the unmanaged resources, it can't tell how badly it needs to release them - so it's possible for your program to perform poorly or run out of resources entirely.
If you implement a class yourself that handles unmanaged resources, it is up to you to implement Dispose and Finalize correctly.

Some users rank open files, db connections, allocated memory, bitmaps, file streams etc. among managed resources, others among unmanaged. So are they managed or unmanaged?
My opinion is, that the response is more complex: When you open file in .NET, you probably use some built-in .NET class System.IO.File, FileStream or something else. Because it is a normal .NET class, it is managed. But it is a wrapper, which inside does the "dirty work" (communicates with the operating system using Win32 dlls, calling low level functions or even assembler instructions) which really open the file. And this is, what .NET doesn't know about, unmanaged.
But you perhaps can open the file by yourself using assembler instructions and bypass .NET file functions. Then the handle and the open file are unmanaged resources.
The same with the DB: If you use some DB assembly, you have classes like DbConnection etc., they are known to .NET and managed. But they wrap the "dirty work", which is unmanaged (allocate memory on server, establish connection with it, ...).
If you don't use this wrapper class and open some network socket by yourself and communicate with your own strange database using some commands, it is unmanaged.
These wrapper classes (File, DbConnection etc.) are managed, but they inside use unmanaged resources the same way like you, if you don't use the wrappers and do the "dirty work" by yourself. And therefore these wrappers DO implement Dispose/Finalize patterns. It is their responsibility to allow programmer to release unmanaged resources when the wrapper is not needed anymore, and to release them when the wrapper is garbage collected. The wrapper will be correctly garbage collected by garbage collector, but the unmanaged resources inside will be collected by using the Dispose/Finalize pattern.
If you don't use built-in .NET or 3rd party wrapper classes and open files by some assembler instructions etc. in your class, these open files are unmanaged and you MUST implement dispose/finalise pattern. If you don't, there will be memory leak, forever locked resource etc. even when you don't use it anymore (file operation complete) or even after you application terminates.
But your responsibility is also when using these wrappers. For those, which implement dispose/finalise (you recognize them, that they implement IDisposable), implement also your dispose/finalise pattern and Dispose even these wrappers or give them signal to release their unmanaged resources. If you don't, the resources will be after some indefinite time released, but it is clean to release it immediately (close the file immediately and not leaving it open and blocked for random several minutes/hours). So in your class's Dispose method you call Dispose methods of all your used wrappers.

Unmanaged resources are those that run outside the .NET runtime (CLR)(aka non-.NET code.) For example, a call to a DLL in the Win32 API, or a call to a .dll written in C++.

An "unmanaged resource" is not a thing, but a responsibility. If an object owns an unmanaged resource, that means that (1) some entity outside it has been manipulated in a way that may cause problems if not cleaned up, and (2) the object has the information necessary to perform such cleanup and is responsible for doing it.
Although many types of unmanaged resources are very strongly associated with various type of operating-system entities (files, GDI handles, allocated memory blocks, etc.) there is no single type of entity which is shared by all of them other than the responsibility of cleanup. Typically, if an object either has a responsibility to perform cleanup, it will have a Dispose method which instructs it to carry out all cleanup for which it is responsible.
In some cases, objects will make allowances for the possibility that they might be abandoned without anyone having called Dispose first. The GC allows objects to request notification that they've been abandoned (by calling a routine called Finalize), and objects may use this notification to perform cleanup themselves.
Terms like "managed resource" and "unmanaged resource" are, unfortunately, used by different people to mean different things; frankly think it's more useful to think in terms of objects as either not having any cleanup responsibility, having cleanup responsibility that will only be taken care of if Dispose is called, or having cleanup responsibility which should be taken care of via Dispose, but which can also be taken care of by Finalize.

The basic difference between a managed and unmanaged resource is that the
garbage collector knows about all managed resources, at some point in time
the GC will come along and clean up all the memory and resources associated
with a managed object. The GC does not know about unmanaged resources, such
as files, stream and handles, so if you do not clean them up explicitly in
your code then you will end up with memory leaks and locked resources.
Stolen from here, feel free to read the entire post.

Any resource for which memory is allocated in the .NET managed heap is a Managed resource. CLR is completly aware of this sort of memory and will do everything to make sure that it doesn't go orphaned. Anything else is unmanaged. For example interoping with COM, might create objects in the proces memory space, but CLR will not take care of it. In this case the managed object that makes calls across the managed boundry should own the responsibility for anything beyond it.

Let us first understand how VB6 or C++ programs (Non Dotnet applications) used to execute.
We know that computers only understand machine level code. Machine level code is also called as native or binary code. So, when we execute a VB6 or C++ program, the respective language compiler, compiles the respective language source code into native code, which can then be understood by the underlying operating system and hardware.
Native code (Unmanaged Code) is specific (native) to the operating system on which it is generated. If you take this compiled native code and try to run on another operating system it will fail. So the problem with this style of program execution is that, it is not portable from one platform to another platform.
Let us now understand, how a .Net program executes. Using dotnet we can create different types of applications. A few of the common types of .NET applications include Web, Windows, Console and Mobile Applications. Irrespective of the type of the application, when you execute any .NET application the following happens
The .NET application gets compiled into Intermediate language (IL). IL is also referred as Common Intermediate language (CIL) and Microsoft Intermediate language (MSIL). Both .NET and non .NET applications generate an assembly. Assemblies have an extension of .DLL or .EXE. For example if you compile a windows or Console application, you get a .EXE, where as when we compile a web or Class library project we get a .DLL. The difference between a .NET and NON .NET assembly is that, DOTNET Assembly is in intermediate language format where as NON DOTNET assembly is in native code format.
NON DOTNET applications can run directly on top of the operating system, where as DOTNET applications run on top of a virtual environment called as Common Language Runtime (CLR). CLR contains a component called Just In-Time Compiler (JIT), which will convert the Intermediate language into native code which the underlying operating system can understand.
So, in .NET the application execution consists of 2 steps
1. Language compiler, compiles the Source Code into Intermediate Language (IL)
2. JIT compiler in CLR converts, the IL into native code which can then be run on the underlying operating system.
Since, a .NET assembly is in Intermedaite Language format and not native code, .NET assemblies are portable to any platform, as long as the target platform has the Common Language Runtime (CLR). The target platform's CLR converts the Intermedaite Language into native code that the underlying operating system can understand. Intermediate Languge is also called as managed code. This is because CLR manages the code that runs inside it. For example, in a VB6 program, the developer is responsible for de-allocating the memory consumed by an object. If a programmer forgets to de-allocate memory, we may run into hard to detecct out of memory exceptions. On the other hand a .NET programmer need not worry about de-allocating the memory consumed by an object. Automatic memory management, also known as grabage collection is provided by CLR. Apart, from garbage collection, there are several other benefits provided by the CLR, which we will discuss in a later session. Since, CLR is managing and executing the Intermediate Language, it (IL) is also called as managed code.
.NET supports different programming languages like C#, VB, J#, and C++. C#, VB, and J# can only generate managed code (IL), where as C++ can generate both managed code (IL) and un-managed code (Native code).
The native code is not stored permanently anywhere, after we close the program the native code is thrown awaya. When we execute the program again, the native code gets generated again.
.NET program is similar to java program execution. In java we have byte codes and JVM (Java Virtual Machine), where as in .NET we Intermediate Language and CLR (Common Language Runtime)
This is provided from this link - He is a great tutor.
http://csharp-video-tutorials.blogspot.in/2012/07/net-program-execution-part-1.html

Unmanaged and managed resources are based on application domain.
From my understanding, unmanaged resource is everything that is used to make connection to outside of your application domain.
It could be HttpClient class that you resort to fetch data outside of your domain or a FileStream that helps you to read/write from/to a file.
we use Using block to dispose these kind of classes objects immediately after our work is done because GC in the first place care about inside the process resources not outside ones, although it will be disposed by the GC at the end.

.NET + pInvoked C++ Dynamic Link Library + Multithreading

Okay, I messed something up. I've written in C++ a DLL which I call from the managed code (C# .NET). The library works like diamonds and is blazingly fast.
My DLL uses its internal state i.e. allocates heaps of memory and uses myriad of variables which are not cleared off between the calls from .NET. Instead they stay there and C# code is aware of that (there is preprocessing and building data structures), actually this is required for performance.
So what is the problem?
I want to add multi-threading, effectively by allowing each .NET thread access his own DLL. Without storing any data between the calls it would be easy achievable with just one DLL.
But in my case, do I have to copy the *.DLL the number of times equal to the number of threads and write pInvoke wrapper for each file separately?? :O I mean [DllImport(...)] for each out of like 40 functions?
No way, there must be something more clever. Help?

Simply put you need to stop sharing variables between threads.
Your global variables are the problem. Instead you need each different thread to have its own copy of the state that persists between calls. Typically you would put this state into a structure of some sort, perhaps a struct. Then an initial call to the DLL would return a new instance of this structure. You then pass that structure back into the DLL every time you call a function that requires access to the persistent state. When you are done, call back into the DLL to deallocate the structure. You don't need to declare the structure in the managed code. You can just treat it as an opaque pointer. Use IntPtr.
Of course, perhaps you'd just be better off with a C++/CLI assembly.

Is the memory used by MATLAB via calls through .NET independent of the .NET application?

The reason I ask is that I have an application that (among other things) calls a MATLAB .NET component whenever data is written into a specific file. The component reads the file and creates an image out of the data contained within. This works fine.
However when I am using the underlying application to additionally process a "significant" amount of data and display the processed data in table, the call to MATLAB throws an out of memory exception, but only when I am processing this large amount of data.
Is this not an indication that the MATLAB process called will rely on the available memory of the application? I guess I just don't understand how the MATLAB memory works when being called from a .NET standpoint.
(I should also note that I call clear all before every call to the MATLAB function in an attempt to "start from scratch", but it fails regardless)

COM components built by Matlab Builder NE are in-process COM servers. This means that they are DLLs which are loaded into your application memory space. This means that the MCR, which is a kind of Matlab-Virtual-Machine is in your memory space.
I believe that .NET components should behave exactly the same.

It is entirely possible, and from what you've described, even likely that the MATLAB component is making use of unmanaged memory (memory that is not managed by the .NET garbage collector.) There is very little you can do about this, apart from ensuring that you only feed it expected data in expected quantities. You may also wish to create a support ticket with MATLAB, if you believe you are using it properly.

Never used MATLAB from C#, but as much I see it uses COM components to interact with the CLR world. You load the MATLAB unmanaged DLL's into your process memory heap. And considering that for CLR process on 32 bit machines you have approximately 1.2 GB memory space, so you go out of that available space.
Some interesting description of how the load of the unmanged COM component into the managed memory done, you can find here: Memory Management Of Unmanaged Component By CLR

What are app domains used for?

I understand roughly what an AppDomain is, however I don't fully understand the uses for an AppDomain.
I'm involved in a large server based C# / C++ application and I'm wondering how using AppDomains could improve stability / security / performance.
In particular:
I understand that a fault or fatal exception in one domain does not affect other app domains running in the same process - Does this also hold true for unmanaged / C++ exceptions, possibly even heap corruption or other memory issues.
How does inter-AppDomain communication work?
How is using AppDomains different from simply spawning many processes?

The basic use case for an AppDomain is in an environment that is hosting 3rd party code, so it will be necessary not just to load assemblies dynamically but also unload them.
There is no way to unload an assembly individually. So you have to create a separate AppDomain to house anything that might need to be unloaded. You can then trash and rebuild the whole AppDomain when necessary.
By the way, native code corrupting the heap cannot be protected against by any feature of the CLR. Ultimately the CLR is implemented natively and shares the same address space. So native code in the process can scribble all over the internals of the CLR! The only way to isolate badly behaved (i.e. most) native code is actual process isolation at the OS level. Launch mutiple .exe processes and have them communicate via some IPC mechanism.

I highly recommend CLR Via C# by Jeffrey Richter. In particular chapter 21 goes into good detail regarding the purpose and uses of AppDomains.
In answer to your points/question:
AppDomains will not protect your application from rogue unmanaged code. If this is an issue you will most likely need to use full process isolation provided by the OS.
Communication between AppDomains is performed using .NET remoting to enforce isolation. This can be via marshal by reference or marshal by value semantics, with a trade off between performance and flexibility.
AppDomains are a lightweight way of achieving process like isolation within managed code. AppDomains are considered lightweight because you can create multiple AppDomains within a single process and so they avoid the resource and performance overhead multiple OS processes. Also, a single thread can execute code in one AppDomain and then in another AppDomain as Windows knows nothing about AppDomains (see this by using using System.AppDomain.CurrentDomain)

Actually, it is not true that a critical fail in one AppDomain can't impact others. In the case of bad things, the best bet it to tear down the process. There are a few examples, but to be honest I haven't memorised them - I simply took a mental note "bad things = tear down process (check)"
Benefits of AppDomain:
you can unload an AppDomain; I use this for a system that compiles itself (meta-programming) based on data from the database - it can spin up an appdomain to host the new dll for a while, and then swap it safely when new data is available (and built)
comms between AppDomains are relatively cheap. IMO this is the only time I am happy to use remoting (although you still need to be really careful about the objects on the boundary to avoid bleeding references between them, causing "fusion" to load extra dlls into the primary AppDomain, causing a leak) - it is really easy too - just CreateInstanceAndUnwrap (or is it CreateInstanceFromAndUnwrap?).
vs spawing an extra process - you could go either way; but you don't need another exe for AppDomain work, and it is much easier to set up any comms that you need

I'm not claiming to be an expert on AppDomains, so my answer will not be all-encompassing. Perhaps I should start off by linking to a great introduction by a guy who does come off as somewhat an expert, and what does seem like covering all aspects of AppDomain usage.
My own main encounter with AppDomains has been in the security field. There, the greatest advantage I've found has been the ability to have a master domain run in high trust spawning several child domains with restricted permissions. By restricting permissions in high trust, without the use of app domains, the restricted processes would still have the permission to elevate their own privileges.

App Domain segregation strategy for running completely independent code modules, in order to address memory sharing and stability concerns, is more of an illusion than a reality.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.