The setup is as follows:
Main app domain loads a number of unmanaged C++ libraries from a C++/CLI assembly.
A second app domain loads those C++ libraries, but of course it is just getting hold of handles to the previously loaded libraries.
I run exactly the same managed C# code in both app domains that makes use of the unmanaged C++ libraries.
The C++/CLI assembly contains a mixture of managed and unmanaged code.
When calling into the unmanaged C++ libraries from the main app domain, the cost of the managed-to-unmanaged transition is fairly negligible compared to the overall cost of execution. However, from the second app domain, which did not originally load the libraries, the cost of the managed-to-unmanaged transition becomes substantial: orders of magnitude larger. I know this because I have run ANTS performance profiler, and this is what it tells me.
There are some unmanaged static variables in the C++/CLI assembly. I've tried replacing them when a new app domain is created, but this doesn't change the performance, and there may well be a bunch of hidden ones.
What could be going on here? Why should the transition in the other app domain be so slow? Is there anything obvious to try to improve things?
Some context: the code runs in a separate app domain for isolation, and so that it can be torn down when things go wrong. It would be very good to keep it isolated from the main app domain. It would be possible to start a new process from which to run the unmanaged code, but this could be expensive: it may require a lot of data to be serialized and deserialized, and the cost of bootstrapping everything in a new process is also fairly large. In the general case the cost of the serialization and the bootstrap will be much lower than the cost of the managed-to-unmanaged transition I'm seeing, but there is a lot of messing around involved and it still slows things down a lot.
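A minimal sketch of the measurement setup, with a hypothetical export DoWork in mylib.dll standing in for the real C++/CLI entry points:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public class Worker : MarshalByRefObject
{
    [DllImport("mylib.dll")]  // hypothetical; substitute the real library
    static extern void DoWork();

    public long TimeIt(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            DoWork();  // one managed-to-unmanaged transition per call
        return sw.ElapsedMilliseconds;
    }
}

class Program
{
    static void Main()
    {
        // Time the loop in the default (loading) domain...
        Console.WriteLine("main domain:   " + new Worker().TimeIt(1000000) + " ms");

        // ...and then in a second domain. TimeIt runs entirely inside the
        // second domain; only the elapsed time crosses the boundary, so the
        // difference isolates the per-call transition cost in that domain.
        AppDomain other = AppDomain.CreateDomain("Second");
        var remote = (Worker)other.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName, typeof(Worker).FullName);
        Console.WriteLine("second domain: " + remote.TimeIt(1000000) + " ms");
    }
}
```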
Related
I want to create a wrapper for the real-time face analysis SDK located here: http://face.ci2cv.net/. I want to know: when I create the wrapper using DllImport, will it affect the speed of the library?
Probably not significantly, but it depends on how much the library interacts with managed code.
The performance of the unmanaged code itself should not be affected by the CLR. However, calls between the CLR and unmanaged code (P/Invoke calls (CLR-to-unmanaged) and reverse P/Invoke calls (unmanaged-to-CLR)) do have some overhead, particularly around argument and return value marshaling. Passing huge structures, arrays, or strings between the two often requires blitting or more complex marshaling, both of which take time.
So, if the library spends a lot of time churning in unmanaged land without interacting with any CLR code, performance should not be impacted. If you're having to make a lot of calls in and out over a short period of time, you will likely notice a decrease in performance compared with making those same library calls in a native binary.
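To make the marshaling point concrete, here is a hedged sketch of two imports; the DLL name facesdk.dll and the exports Process and LoadModel are invented for illustration, not taken from the SDK:

```csharp
using System.Runtime.InteropServices;

// Blittable struct: same layout in managed and unmanaged memory, so an
// array of these can be pinned and passed without a per-element copy.
[StructLayout(LayoutKind.Sequential)]
struct FacePoint { public float X, Y; }

static class FaceSdk
{
    // Cheap boundary: the array is blittable, so the marshaler just pins it.
    [DllImport("facesdk.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void Process([In, Out] FacePoint[] points, int count);

    // Costlier boundary: the string must be converted (here to ANSI) on
    // every call, which is the marshaling overhead described above.
    [DllImport("facesdk.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
    public static extern void LoadModel(string path);
}
```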
My C# application is using native code which is not thread safe.
I can run multiple processes of that native code, using inter-process communication to achieve concurrency.
My question is: can I use app domains instead, so that several managed threads, each in a different app domain, call into the native code without interfering with each other?
The main goal is to avoid having to separate things into multiple processes.
No, AppDomains are a purely managed-code concept. They achieve isolation by keeping the managed object roots separate: one AppDomain cannot see the objects of another, which makes it very safe to abort code and unload assemblies. There is never an accident; it throws away all the data that might contain state.
Unmanaged code is completely agnostic of the GC heap, and thus of AppDomains; it will allocate in its data section and its own native heap (HeapAlloc). Such allocations are process-global. That makes the process the isolation boundary: you'd need a helper process that loads the DLL, and you'd talk to it with one of the .NET process interop mechanisms (socket, named pipe, memory-mapped file, remoting, WCF).
Technically you could create copies of the DLL, each with a different name. But that scales very poorly, and the P/Invoke becomes awkward since you can't use [DllImport] anymore: you need a delegate declaration for each exported function, plus LoadLibrary() and GetProcAddress() to initialize the delegate objects.
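A sketch of that pattern, assuming a hypothetical export int Compute(int) in native.dll:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

class NativeInstance : IDisposable
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string path);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr module, string name);

    [DllImport("kernel32.dll")]
    static extern bool FreeLibrary(IntPtr module);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int ComputeDelegate(int value);

    readonly IntPtr _module;
    readonly string _copyPath;
    public readonly ComputeDelegate Compute;

    public NativeInstance(string dllPath)
    {
        // Copy the DLL under a unique name so Windows loads a separate
        // module image, giving this instance its own static native state.
        _copyPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".dll");
        File.Copy(dllPath, _copyPath);

        _module = LoadLibrary(_copyPath);
        if (_module == IntPtr.Zero)
            throw new InvalidOperationException("LoadLibrary failed");

        // One delegate per exported function, instead of [DllImport].
        Compute = (ComputeDelegate)Marshal.GetDelegateForFunctionPointer(
            GetProcAddress(_module, "Compute"), typeof(ComputeDelegate));
    }

    public void Dispose()
    {
        FreeLibrary(_module);
        File.Delete(_copyPath);
    }
}
```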
Yes, it can be done, but you should seriously measure whether the effort is repaid by the benefits.
Windows won't load multiple copies of the same unmanaged DLL, and unmanaged DLLs are loaded per-process (not per AppDomain). What you can do is create multiple temporary copies of the same DLL, then load them with LoadLibrary().
Each copy will be loaded into the process, but the copies will be separate from each other, so each gets its own static state and the thread-safety problem goes away. All of this can be tied up inside a class that wraps the unmanaged calls (LoadLibrary, FreeLibrary, GetProcAddress and the invocation itself). It'll use fewer resources and be faster than multiple processes, but you'll have to drop DllImport usage.
The only benefit I see is that this scales much better than multiple processes (because it uses fewer resources), especially if you reuse instances by keeping a cache (it's easier to keep an object cache than a process cache).
In the midst of asking about manually managing CLR memory, I realized I know very little.
I'm aware that the CLR will place a 'cookie' on the stack when you exit a managed context, so that the Garbage Collector won't trample your memory space; however, in everything I've read the assumption is that you are calling some library written in C.
I want to write an entire write layer of my application in C#, outside of the managed context, to manage data at a low level. Then I want to access this layer from a managed layer.
In this case, will my Unmanaged C# code compile to IL and be run on the CLR? How does this work?
I assume this is related to the same C# database project you mentioned in the question.
It is technically possible to implement an entire write layer in C/C++ or any other language, and it is technically possible to have everything else in C#. I am currently working on an application that uses unmanaged code for some high-performance, low-level stuff and C# for business logic and upper-level management.
However, the complexity of the task should not be underestimated. The typical way to do this is to design a contract that both parties can understand. The contract is exposed to the managed language, and the managed language triggers calls into the native code. If you have ever tried calling a C++ method from C#, you will get the idea... Plus, every call into unmanaged code has quite significant performance overhead, which may kill the whole idea of low-level performance.
If you are really interested in high-performance relational databases, then use a single low-level language.
If you want a naive but fully working implementation of a database, just use C#. Don't mix the two languages unless you fully understand the complexity. See RavenDB, a document-based NoSQL database that is built entirely in C#.
Will my Unmanaged C# code compile to IL and be run on the CLR?
No, there is no such thing as unmanaged C#. C# code will always be compiled into IL and executed by the CLR. This is a case of managed code calling unmanaged code. The unmanaged code can be implemented in several languages (C, C++, assembly, etc.), but the CLR will have no idea what is happening inside it.
Update from a comment: there is a tool (NGen.exe) that can compile a managed assembly ahead of time into native, architecture-specific code. The tool is designed to improve the performance of managed applications by removing the JIT-compilation stage and putting native code directly into the executable image or library. This code, however, is still "managed" by the CLR host: memory allocation and collection, managed threading, application domains, exception handling, security and all other aspects are still controlled by the CLR. So even though C# can technically be compiled to native code, that code does not run as a standalone native image.
How does this work?
Managed code can interoperate with unmanaged code. There are a couple of ways to do this:
Directly in code via .NET Interop. This is relatively fast but looks a bit ugly in code, and it is hard to maintain/test (there is a good article with C#/C/assembly samples).
A much, much slower approach, but one more open to other languages: web services (SOAP, WS-*, REST and company), queueing (such as MSMQ, NServiceBus and others), and (possibly) other interprocess communication. The unmanaged process sits on one end and the managed application on the other, as sketched below.
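As a hedged sketch of this second approach, here is the managed side of a named-pipe conversation with a separate (possibly unmanaged) process; the pipe name and the line-based protocol are invented for illustration:

```csharp
using System;
using System.IO;
using System.IO.Pipes;

class PipeClient
{
    static void Main()
    {
        // The helper process is assumed to have created a pipe server
        // named "writelayer" on the local machine.
        using (var pipe = new NamedPipeClientStream(".", "writelayer", PipeDirection.InOut))
        {
            pipe.Connect(5000);  // wait up to 5 s for the server

            var writer = new StreamWriter(pipe) { AutoFlush = true };
            var reader = new StreamReader(pipe);

            writer.WriteLine("WRITE key=42");      // request to the other process
            Console.WriteLine(reader.ReadLine());  // its reply
        }
    }
}
```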
I know this is a C# question, but if you are comfortable with C++, C++/CLI might be an option worth considering.
It allows you to selectively compile portions of your C++ code as either managed or unmanaged. However, be aware that code that interacts with CLR types MUST be compiled as managed.
I'm not aware of the runtime cost of transitioning from a managed context to an unmanaged one and vice versa within the C++ code, but I assume it is similar to the cost of calling a native method via .NET Interop from C#, which, as @oleksii already pointed out, is expensive. In my experience this has really paid off when you need to interact frequently with native C or C++ libraries; IMHO it is much easier to call them from within a C++/CLI project than to write the required .NET Interop interfaces in C#.
See this question for a little bit of information on how it is done.
The reason I ask is that I have an application that (among other things) calls a MATLAB .NET component whenever data is written into a specific file. The component reads the file and creates an image out of the data contained within. This works fine.
However, when I use the underlying application to additionally process a "significant" amount of data and display the processed data in a table, the call to MATLAB throws an out-of-memory exception, but only when I am processing this large amount of data.
Is this not an indication that the MATLAB process being called relies on the available memory of the calling application? I guess I just don't understand how MATLAB memory works when it is called from .NET.
(I should also note that I call clear all before every call to the MATLAB function in an attempt to "start from scratch", but it fails regardless)
COM components built by MATLAB Builder NE are in-process COM servers. This means they are DLLs that are loaded into your application's memory space, and therefore the MCR, which is a kind of MATLAB virtual machine, lives in your memory space as well.
I believe that .NET components should behave exactly the same.
It is entirely possible, and from what you've described, even likely that the MATLAB component is making use of unmanaged memory (memory that is not managed by the .NET garbage collector.) There is very little you can do about this, apart from ensuring that you only feed it expected data in expected quantities. You may also wish to create a support ticket with MATLAB, if you believe you are using it properly.
I've never used MATLAB from C#, but as far as I can see it uses COM components to interact with the CLR world. You load the unmanaged MATLAB DLLs into your process's memory space, and considering that a CLR process on a 32-bit machine has approximately 1.2 GB of usable memory space, you run out of that available space.
You can find an interesting description of how the load of an unmanaged COM component into managed memory is done here: Memory Management Of Unmanaged Component By CLR.
I understand roughly what an AppDomain is; however, I don't fully understand its uses.
I'm involved in a large server based C# / C++ application and I'm wondering how using AppDomains could improve stability / security / performance.
In particular:
I understand that a fault or fatal exception in one app domain does not affect other app domains running in the same process. Does this also hold true for unmanaged / C++ exceptions, possibly even heap corruption or other memory issues?
How does inter-AppDomain communication work?
How is using AppDomains different from simply spawning many processes?
The basic use case for an AppDomain is an environment that hosts third-party code, where it is necessary not just to load assemblies dynamically but also to unload them.
There is no way to unload an assembly individually. So you have to create a separate AppDomain to house anything that might need to be unloaded. You can then trash and rebuild the whole AppDomain when necessary.
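A minimal sketch of that load/unload cycle, using a hypothetical Plugin type that lives in the hosting assembly for brevity:

```csharp
using System;

// MarshalByRefObject lets the host hold a proxy across the domain boundary.
public class Plugin : MarshalByRefObject
{
    public string SayHello()
    {
        return "hello from " + AppDomain.CurrentDomain.FriendlyName;
    }
}

class Host
{
    static void Main()
    {
        AppDomain domain = AppDomain.CreateDomain("PluginDomain");
        var plugin = (Plugin)domain.CreateInstanceAndUnwrap(
            typeof(Plugin).Assembly.FullName, typeof(Plugin).FullName);

        Console.WriteLine(plugin.SayHello());  // runs inside PluginDomain

        // Unloading the domain unloads every assembly it loaded; there is
        // no way to unload a single assembly on its own.
        AppDomain.Unload(domain);
    }
}
```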
By the way, native code corrupting the heap cannot be protected against by any feature of the CLR. Ultimately the CLR is implemented natively and shares the same address space, so native code in the process can scribble all over the internals of the CLR! The only way to isolate badly behaved (i.e. most) native code is actual process isolation at the OS level: launch multiple .exe processes and have them communicate via some IPC mechanism.
I highly recommend CLR Via C# by Jeffrey Richter. In particular chapter 21 goes into good detail regarding the purpose and uses of AppDomains.
In answer to your points/question:
AppDomains will not protect your application from rogue unmanaged code. If this is an issue you will most likely need to use full process isolation provided by the OS.
Communication between AppDomains is performed using .NET remoting to enforce isolation. This can be via marshal-by-reference or marshal-by-value semantics, with a trade-off between performance and flexibility (see the sketch after these points).
AppDomains are a lightweight way of achieving process-like isolation within managed code. AppDomains are considered lightweight because you can create multiple AppDomains within a single process, so they avoid the resource and performance overhead of multiple OS processes. Also, a single thread can execute code in one AppDomain and then in another, since Windows knows nothing about AppDomains (you can see this by inspecting System.AppDomain.CurrentDomain).
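To illustrate the two remoting semantics mentioned in the second point, a small sketch with invented type names (these would be used across the boundary via CreateInstanceAndUnwrap, as in the earlier example):

```csharp
using System;

// Marshal by value: the object is serialized and a copy is handed to the
// calling domain. Cheap to use afterwards, but changes don't propagate.
[Serializable]
public class Snapshot
{
    public int Rows;
}

// Marshal by reference: the object stays in its home domain and the caller
// gets a transparent proxy; every call is remoted, which costs more but
// preserves isolation.
public class Repository : MarshalByRefObject
{
    public Snapshot TakeSnapshot()
    {
        // Executes in the Repository's own domain; only the Snapshot copy
        // travels back across the boundary.
        return new Snapshot { Rows = 42 };
    }
}
```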
Actually, it is not true that a critical failure in one AppDomain can't impact others. In the case of bad things, the best bet is to tear down the process. There are a few examples, but to be honest I haven't memorised them; I simply took a mental note of "bad things = tear down process (check)".
Benefits of AppDomain:
you can unload an AppDomain; I use this for a system that compiles itself (meta-programming) based on data from the database. It can spin up an AppDomain to host the new DLL for a while, and then swap it safely when new data is available (and built).
comms between AppDomains are relatively cheap. IMO this is the only time I am happy to use remoting (although you still need to be really careful about the objects on the boundary, to avoid bleeding references between the domains and causing "fusion" to load extra DLLs into the primary AppDomain, causing a leak). It is really easy, too: just CreateInstanceAndUnwrap (or is it CreateInstanceFromAndUnwrap?).
vs spawning an extra process: you could go either way, but you don't need another exe for AppDomain work, and it is much easier to set up whatever comms you need.
I'm not claiming to be an expert on AppDomains, so my answer will not be all-encompassing. Perhaps I should start off by linking to a great introduction by someone who does come across as an expert, one that seems to cover all aspects of AppDomain usage.
My own main encounter with AppDomains has been in the security field. There, the greatest advantage I've found is the ability to have a master domain run in high trust while spawning several child domains with restricted permissions. Without app domains, restricting permissions inside a high-trust process wouldn't help much, because the restricted code would still have the permission to elevate its own privileges.
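A sketch of that pattern on the full .NET Framework, granting a child domain Execution-only permissions (the domain name and setup are illustrative):

```csharp
using System;
using System.Security;
using System.Security.Permissions;

class SandboxHost
{
    static void Main()
    {
        // Grant set for the child domain: execute only, no file or
        // network access, and no ability to elevate its own privileges.
        var permissions = new PermissionSet(PermissionState.None);
        permissions.AddPermission(
            new SecurityPermission(SecurityPermissionFlag.Execution));

        var setup = new AppDomainSetup
        {
            ApplicationBase = AppDomain.CurrentDomain.BaseDirectory
        };

        AppDomain sandbox = AppDomain.CreateDomain(
            "Sandbox", null, setup, permissions);

        // ...load the restricted code via CreateInstanceAndUnwrap here...

        AppDomain.Unload(sandbox);
    }
}
```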
App domain segregation as a strategy for running completely independent code modules, in order to address memory-sharing and stability concerns, is more of an illusion than a reality.