I understand roughly what an AppDomain is; however, I don't fully understand its uses.
I'm involved in a large server-based C# / C++ application, and I'm wondering how using AppDomains could improve stability / security / performance.
In particular:
I understand that a fault or fatal exception in one domain does not affect other app domains running in the same process - does this also hold true for unmanaged / C++ exceptions, possibly even heap corruption or other memory issues?
How does inter-AppDomain communication work?
How is using AppDomains different from simply spawning many processes?
The basic use case for an AppDomain is in an environment that hosts 3rd-party code, where it is necessary not just to load assemblies dynamically but also to unload them.
There is no way to unload an assembly individually. So you have to create a separate AppDomain to house anything that might need to be unloaded. You can then trash and rebuild the whole AppDomain when necessary.
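To make that concrete, here is a minimal sketch of the load-then-trash pattern; the plugin path is a hypothetical placeholder:

using System;
using System.Reflection;

static class PluginHost
{
    static void Main()
    {
        // House the 3rd-party code in its own domain so it can be unloaded.
        AppDomain pluginDomain = AppDomain.CreateDomain("PluginDomain");
        try
        {
            // Runs inside the plugin domain; assemblies loaded there stay there.
            pluginDomain.DoCallBack(LoadAndRun);
        }
        finally
        {
            // There is no Assembly.Unload: trashing the whole domain is the
            // only way to release the assemblies it loaded.
            AppDomain.Unload(pluginDomain);
        }
    }

    static void LoadAndRun()
    {
        Assembly asm = Assembly.LoadFrom(@"plugins\ThirdParty.dll"); // hypothetical path
        Console.WriteLine("{0} loaded in {1}",
            asm.GetName().Name, AppDomain.CurrentDomain.FriendlyName);
    }
}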
By the way, native code corrupting the heap cannot be protected against by any feature of the CLR. Ultimately the CLR is implemented natively and shares the same address space, so native code in the process can scribble all over the internals of the CLR! The only way to isolate badly behaved (i.e. most) native code is actual process isolation at the OS level: launch multiple .exe processes and have them communicate via some IPC mechanism.
I highly recommend CLR Via C# by Jeffrey Richter. In particular chapter 21 goes into good detail regarding the purpose and uses of AppDomains.
In answer to your points/question:
AppDomains will not protect your application from rogue unmanaged code. If this is an issue you will most likely need to use full process isolation provided by the OS.
Communication between AppDomains is performed using .NET remoting to enforce isolation. This can be via marshal-by-reference or marshal-by-value semantics, with a trade-off between performance and flexibility.
AppDomains are a lightweight way of achieving process-like isolation within managed code. AppDomains are considered lightweight because you can create multiple AppDomains within a single process, so they avoid the resource and performance overhead of multiple OS processes. Also, a single thread can execute code in one AppDomain and then in another, since Windows knows nothing about AppDomains (you can observe this via System.AppDomain.CurrentDomain; see the sketch below).
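A small sketch of that last point, assuming a simple console app; the same managed thread reports a different AppDomain.CurrentDomain in each domain:

using System;
using System.Threading;

static class ThreadCrossingDemo
{
    static void Main()
    {
        Report(); // executes in the default domain

        AppDomain other = AppDomain.CreateDomain("OtherDomain");
        other.DoCallBack(Report); // same thread, now executing in the other domain
        AppDomain.Unload(other);
    }

    static void Report()
    {
        // The managed thread id is unchanged; only the current domain differs.
        Console.WriteLine("Thread {0} running in '{1}'",
            Thread.CurrentThread.ManagedThreadId,
            AppDomain.CurrentDomain.FriendlyName);
    }
}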
Actually, it is not true that a critical failure in one AppDomain can't impact others. In the case of bad things, the best bet is to tear down the process. There are a few examples, but to be honest I haven't memorised them - I simply took a mental note: "bad things = tear down process (check)".
Benefits of AppDomain:
you can unload an AppDomain; I use this for a system that compiles itself (meta-programming) based on data from the database - it can spin up an appdomain to host the new dll for a while, and then swap it safely when new data is available (and built)
comms between AppDomains are relatively cheap. IMO this is the only time I am happy to use remoting (although you still need to be really careful about the objects on the boundary to avoid bleeding references between domains, causing "fusion" to load extra dlls into the primary AppDomain, causing a leak) - and it is really easy too: just CreateInstanceAndUnwrap (or CreateInstanceFromAndUnwrap, if you load by file path); see the sketch after this list.
vs spawning an extra process - you could go either way; but you don't need another exe for AppDomain work, and it is much easier to set up any comms that you need
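Here is a minimal sketch of that boundary discipline, with a hypothetical Worker type; deriving from MarshalByRefObject means only a proxy crosses the boundary, so the worker's assembly is not pulled into the primary AppDomain:

using System;

// Boundary type: only a remoting proxy crosses into the caller's domain.
public class Worker : MarshalByRefObject
{
    public string Ping()
    {
        return "pong from " + AppDomain.CurrentDomain.FriendlyName;
    }
}

static class Program
{
    static void Main()
    {
        AppDomain domain = AppDomain.CreateDomain("WorkDomain");

        // CreateInstanceAndUnwrap takes an assembly display name;
        // CreateInstanceFromAndUnwrap takes a file path instead.
        var worker = (Worker)domain.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName,
            typeof(Worker).FullName);

        Console.WriteLine(worker.Ping()); // the call crosses the domain boundary
        AppDomain.Unload(domain);         // the worker dies with its domain
    }
}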
I'm not claiming to be an expert on AppDomains, so my answer will not be all-encompassing. Perhaps I should start off by linking to a great introduction by someone who does come across as an expert, and which does seem to cover all aspects of AppDomain usage.
My own main encounter with AppDomains has been in the security field. There, the greatest advantage I've found has been the ability to have a master domain run in high trust while spawning several child domains with restricted permissions. Without app domains, code you intended to restrict would still run at the host process's trust level and could elevate its own privileges; a restricted child domain prevents that.
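A minimal sketch of that master/child arrangement using the .NET 4 sandboxing overload of CreateDomain; the ApplicationBase path is a hypothetical placeholder:

using System;
using System.Security;
using System.Security.Permissions;

static class SandboxDemo
{
    static void Main()
    {
        // Grant only the right to execute: no file, registry, or network access.
        var grant = new PermissionSet(PermissionState.None);
        grant.AddPermission(new SecurityPermission(SecurityPermissionFlag.Execution));

        var setup = new AppDomainSetup
        {
            ApplicationBase = @"C:\Sandbox" // hypothetical directory for untrusted code
        };

        // Code loaded into this child domain runs with the restricted grant
        // set and cannot elevate its own privileges.
        AppDomain sandbox = AppDomain.CreateDomain("Sandbox", null, setup, grant);

        // ... CreateInstanceAndUnwrap the untrusted types here ...
        AppDomain.Unload(sandbox);
    }
}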
Using AppDomain segregation to run completely independent code modules, in order to address memory-sharing and stability concerns, is more of an illusion than a reality.
Related
I understand that an application domain forms:
an isolation boundary for security,
versioning,
reliability,
and unloading of managed code,
but so does a process
Can someone please help me understand the practical benefits of an application domain?
I assumed an app domain provides you a container to load one version of an assembly, but recently I discovered that multiple versions of a strongly named assembly can be loaded into a single app domain.
My concept of the application domain is still not clear, and I am struggling to understand why this concept was introduced when the concept of a process already exists.
Thank you.
I can't tell if you are talking in general or specifically about .NET's AppDomain.
I am going to assume .NET's AppDomain and why it can be really useful when you need that isolation inside of a single process.
For instance:
Say you are dealing with a library that has certain worker classes, and you have no choice but to use those workers and can't modify the code. It's your job to build a Windows Service that manages said workers, makes sure they all stay up and running, and lets them work in parallel.
Easy enough right? Well, you hoped. It turns out your worker library is prone to throwing exceptions, uses a static configuration, and is generally just a real PITA.
You could try to launch each one in its own process, but to monitor them you'll need to implement named pipes or try to thoughtfully parse the STDIN and STDOUT of each process.
What else can you do? Well, AppDomains actually solve this. I can spawn an AppDomain for each worker and give each its own configuration; they can't screw each other up by changing static properties, because they are isolated, and on top of that, if the library bombs out and I failed to catch the exception, it doesn't bother the workers in the other domains. And during all of this, I can still communicate with those workers easily.
Sadly, I have had to do this before.
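A sketch of how that per-worker isolation might look; WorkerProxy and the config file names are hypothetical. Static fields are per-AppDomain, which is why each worker sees its own configuration:

using System;

// Hypothetical MarshalByRefObject wrapper around the flaky worker library.
public class WorkerProxy : MarshalByRefObject
{
    public void Run() { /* drive the 3rd-party worker here */ }
}

static class WorkerHost
{
    public static WorkerProxy StartWorker(int id)
    {
        var setup = new AppDomainSetup
        {
            ApplicationBase = AppDomain.CurrentDomain.BaseDirectory,
            // Each domain gets its own config file, so the library's static
            // configuration is per-domain instead of per-process.
            ConfigurationFile = "worker" + id + ".config"
        };

        AppDomain domain = AppDomain.CreateDomain("Worker" + id, null, setup);
        return (WorkerProxy)domain.CreateInstanceAndUnwrap(
            typeof(WorkerProxy).Assembly.FullName,
            typeof(WorkerProxy).FullName);
    }
}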
EDIT: Started to write this as a comment response, but got too large
Individual processes can work great in many scenarios, however, there are just times where they can become a pain. I am not saying one should use an AppDomain over another process. I think it's uncommon you would need a separate process or AppDomain, but once you need it, you'll definitely know.
The main problem I see with processes in the scenario I've given above is that processes have their own downfalls that are easier to mitigate with the AppDomain.
A process can go rogue, become unresponsive, and crash or be killed at any point.
If you're managing processes, you need to keep track of each process ID and monitor its status. IPC is great, but it takes time to get proper communication going back and forth as needed.
As an example, let's say your process just dies. What do you do? Depending on the mechanism you chose for monitoring, maybe the communication thread died, or perhaps the work finished but you still show it as "processing". What do you do?
Now what happens when you have 20 processes and your management app dies? You don't have any real information; all you have is 20 "myprocess.exe", and you may now have to parse the command-line arguments they were started with to see which workers you actually have. Obviously with an AppDomain all 20 would have died too, but did you really gain anything with the process? You still have to code the ability to recover; however, now you also have to code all of the recovery for your processes instead of just firing the workers back up.
As with anything in programming, there's 1,000 different ways to achieve the same goal. It's up to you to decide which solution you feel is most appropriate.
Some practical benefits using app domain:
Multiple app domains can run in a single process, and you can stop an individual app domain without stopping the entire process. This alone drastically increases server scalability.
Managing the app domain life cycle is done programmatically by runtime hosts (you can override it as well). For processes and threads, you have to explicitly manage their life cycle: initialization, execution, termination, and inter-process/multithreaded communication are complex, which is why it's easier to defer all of that to the CLR's management.
Source: https://learn.microsoft.com/en-us/dotnet/framework/app-domains/application-domains
I essentially want to make an API for an application, but I only want one instance of that DLL to be running at one time.
So multiple applications also need to be able to use the DLL at the same time, as you would expect from a normal API.
However, I want it to be the same instance of the DLL that the different applications use. This is because of communication with hardware that I don't want to be able to overlap.
DLLs are usually loaded once per process, so if your application is guaranteed to only be running in single-instance mode, there's nothing else you have to do. Your single application instance will have only one loaded DLL.
Now, if you want to "share" a "single instance" of a DLL across applications, you will inevitably have to resort to a client-server architecture. Your DLL will have to be wrapped in a Windows Service, which would expose an HTTP (or WCF) API.
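A minimal sketch of that client-server shape, assuming WCF over a named pipe; the contract, the HardwareApi class, and the pipe address are all hypothetical. InstanceContextMode.Single gives every client the same instance, and ConcurrencyMode.Single serializes calls so hardware access cannot overlap:

using System;
using System.ServiceModel;

[ServiceContract]
public interface IHardwareApi
{
    [OperationContract]
    string Send(string command);
}

// One shared instance; calls are dispatched to it one at a time.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Single)]
public class HardwareApi : IHardwareApi
{
    public string Send(string command)
    {
        // Talk to the hardware here; no two calls can overlap.
        return "ok";
    }
}

static class Server
{
    static void Main()
    {
        var host = new ServiceHost(new HardwareApi());
        host.AddServiceEndpoint(typeof(IHardwareApi),
            new NetNamedPipeBinding(), "net.pipe://localhost/hardware");
        host.Open();

        Console.WriteLine("Hardware service running; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}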
You can't do that as you intend to do. The best way to do this would be having a single process (a DLL is not a process) which receives and processes messages, and have your multiple clients use an API (this would be your DLL) that just sends messages to this process.
The intercommunication of those two processes (your single process and the clients sending or receiving messages via your API) could be done in many ways; choose the one that suits you best (basically, any kind of client/server architecture, even if the clients and the server are running on the same hardware).
This is an XY-Problem type of question. Your actual requirement is serializing interactions with the underlying hardware, so they do not overlap. Perhaps this is what you should explicitly and specifically be asking about.
Your proposed solution is to have a DLL that is kind of an OS-wide singleton or something like that. This is actually what you are asking about; although it is still not the right approach, in my opinion. The OS is in charge of managing the lifetime of the DLL modules in each process. There are many aspects to this, but for one: most DLL instances are already being shared between every process (mostly code sections, resources and such - data, of course, is not shared by default).
To solve your actual problem, you would have to resort to multi-process synchronization techniques. In Windows, this works mostly through named kernel objects like mutexes, semaphores, events and such. Another approach would be to use IPC, as other folks have already mentioned in their respective answers, which then again would require in itself some kind of synchronization.
Maybe all this is already handled by that hardware's device driver. What would be the real scenarios in which overlapped interactions with the underlying hardware would have a negative impact on the applications that use your DLL?
To ensure you have loaded one DLL per machine, you could run a controlling assembly in a separate AppDomain and have it create a named pipe for remoting (with IpcChannel) and claim the hardware resources. Creating the IpcChannel will fail the second time in the same environment. If you need high-performance communication with your hardware, use remoting only for claiming and releasing the resource by another assembly used by the applications.
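A hedged sketch of that claim-by-channel idea; the port name is a placeholder, and the exact exception thrown on a name collision may vary:

using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Ipc;

static class HardwareClaim
{
    // Returns true if this process is the first to claim the hardware.
    public static bool TryClaim()
    {
        try
        {
            // The named pipe behind the channel is machine-wide, so a
            // second process using the same port name fails to register.
            var channel = new IpcChannel("HardwareOwnerPort"); // hypothetical name
            ChannelServices.RegisterChannel(channel, false);
            return true;
        }
        catch (RemotingException)
        {
            return false; // someone else already holds the claim
        }
    }
}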
A Mutex is one solution for exclusive control across multiple processes.
But be careful if you use one: careless Mutex handling can sometimes lead to deadlock.
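For completeness, a minimal sketch of the named-Mutex approach; the mutex name is a placeholder. A named mutex is visible machine-wide, so it serializes hardware access across processes, and the try/finally guards against the deadlock the answer warns about (forgetting to release):

using System;
using System.Threading;

static class HardwareGate
{
    // Named mutexes are machine-wide; the Global\ prefix spans sessions.
    static readonly Mutex Gate = new Mutex(false, @"Global\MyHardwareMutex"); // hypothetical name

    public static void TalkToHardware(Action action)
    {
        Gate.WaitOne();
        try
        {
            action(); // only one process at a time gets here
        }
        finally
        {
            Gate.ReleaseMutex(); // forgetting this deadlocks the other waiters
        }
    }
}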
My C# application is using native code which is not thread safe.
I can run multiple processes of that native code, using inter-process communication to achieve concurrency.
My question is, can I use App Domains instead, so that several managed threads, each in a different App Domain, will call the native code without interfering with each other?
The main goal is to avoid process separation.
No, AppDomains are a pure managed-code concept. The CLR achieves isolation by keeping the managed object roots separate: one AppDomain cannot see the objects of another, which makes it very safe to abort code and unload assemblies. Never an accident, it throws away all the data that might contain state.
Unmanaged code is completely agnostic of the GC heap and thus of AppDomains; it will allocate in its data section and its own native heap (HeapAlloc), and such allocations are process-global. That makes the process the isolation boundary: you'd need a helper process that loads the DLL, and you'd talk to it with one of the .NET process interop mechanisms (socket, named pipe, memory-mapped file, remoting, WCF).
Technically you could create copies of the DLL, each with a different name. But that scales very poorly, and the P/Invoke is very awkward since you can't use [DllImport] anymore. You need a delegate declaration for each exported function, plus LoadLibrary() and GetProcAddress() to initialize the delegate objects.
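A sketch of what replaces [DllImport] in that setup; the export name, its signature, and the renamed copy's path are hypothetical:

using System;
using System.Runtime.InteropServices;

static class NativeInstance
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern IntPtr GetProcAddress(IntPtr module, string procName);

    // Delegate matching the native export's signature (hypothetical).
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int DoWorkDelegate(int input);

    public static DoWorkDelegate Bind(string dllCopyPath)
    {
        // dllCopyPath would be a renamed copy, e.g. "native_1.dll".
        IntPtr module = LoadLibrary(dllCopyPath);
        if (module == IntPtr.Zero)
            throw new InvalidOperationException("LoadLibrary failed");

        IntPtr proc = GetProcAddress(module, "DoWork"); // hypothetical export name
        if (proc == IntPtr.Zero)
            throw new InvalidOperationException("Export not found");

        return (DoWorkDelegate)Marshal.GetDelegateForFunctionPointer(
            proc, typeof(DoWorkDelegate));
    }
}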
Yes, it can be done, but you should seriously measure whether the effort is repaid by the benefits.
Windows won't load multiple copies of an unmanaged DLL, and unmanaged DLLs are loaded per-process (not per AppDomain). What you can do is create multiple temporary copies of the same DLL, then load them with LoadLibrary().
Each one will be loaded per-process, but they'll be separated from each other (so they'll be thread-safe). All of this can be tied up inside a class that wraps the unmanaged calls (LoadLibrary, FreeLibrary, GetProcAddress and the invocation itself). It'll use fewer resources and be faster than multiple processes, but you'll have to drop DllImport usage.
The only benefit I see is that this will scale much better than multiple processes (because it uses fewer resources), provided of course that you reuse instances by keeping a cache (it's harder to keep a process cache than an object cache).
DESCRIPTION
I am currently designing an architecture for a C# multiagent simulation, where agent actions are driven by many modules in their "brain", which may read sensors, vote for an action or send messages/queries to other modules (all of this is implemented through the exchange of messages).
Of course, modules can have a state.
Modules run in parallel: they have an update method which consumes messages and queries, and perform some sort of computation. The update methods return iterators, and have multiple yields in their bodies, so that I can schedule modules cooperatively. I do not use a single thread for each module because I expect to have hundreds to thousands of modules for every agent, which would lead to a huge amount of RAM occupied by thread overhead.
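A minimal sketch of that cooperative scheduling scheme; the module body and names are hypothetical:

using System;
using System.Collections.Generic;

static class CooperativeScheduler
{
    // A module's update method is an iterator: each 'yield return' hands
    // control back to the scheduler (hypothetical module shape).
    static IEnumerator<object> ModuleUpdate(string name)
    {
        for (int step = 0; ; step++)
        {
            Console.WriteLine("{0}: step {1}", name, step);
            yield return null; // cooperative yield point
        }
    }

    static void Main()
    {
        var modules = new List<IEnumerator<object>>
        {
            ModuleUpdate("vision"),
            ModuleUpdate("planner"),
        };

        // Round-robin: advance each module by one slice per tick,
        // all on a single thread.
        for (int tick = 0; tick < 3; tick++)
            foreach (var module in modules)
                module.MoveNext();
    }
}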
I would like these modules to behave like runtime plugins, so that while the simulation is running I can add new module classes and rewrite/debug existing ones, without ever stopping the simulation process, and then use those classes to add and remove modules from the agents' brains, or just let existing modules change their behaviours due to new implementations of their methods.
POSSIBLE SOLUTIONS
I have come up with a number of possible solutions in the last few days, which all have something disappointing:
Compile my modules into DLLs, load each in a different AppDomain and then use AppDomain.CreateInstanceFromAndUnwrap() to instantiate the module, which I would then cast to some IModule interface, shared between my simulation and the modules (and implemented by each module class). The interface would expose just the SendMessage, the Update and a few other members, common to all modules.
The problem with this solution is that calls between AppDomains are much slower than direct calls (within the same AppDomain).
Also, I don't know the overhead of AppDomains, but I suppose that they are not free, so having thousands could become a problem.
Use some scripting language for the modules, while keeping C# for the underlying engine, so that there is no assembly loading/unloading. Instead, I would host an execution context for the scripting language for each module.
My main concern is that I do not know a scripting language that is big (as in 'Python, Lua, Ruby, and JS are big; AutoIt and Euphoria are not'), fast, embeddable into .NET, and allows step-by-step execution (which I need in order to perform cooperative scheduling of module execution).
Another concern about this is that I suppose I'd have to use a runtime context for each module, which in turn would have massive overhead.
Lastly, I suppose a scripting language would probably be slower than C#, which would reduce performance.
Avoid unloading of assemblies, instead renaming/versioning them somehow, so that I can have a ton of different versions, then just use the latest one for each type.
I'm not even sure this is possible (due to identically named types and namespaces).
Even if possible, it would be very memory-inefficient.
Do a transparent restart of the simulation, which means pausing the simulation (and execution of the scheduler of brains/modules), serializing everything (including every module), exiting the simulation, recompiling the code, starting the simulation again, deserializing everything, catching any exception raised due to the changes I made to the class and resuming execution.
This is a lot of work, so I consider it my last resort.
Also, this whole process would become very slow at some point, depending on the number of modules and their sizes, making it impractical.
I could overcome this last problem (the whole process in solution 4 becoming slow), by mixing solutions 3 and 4, loading many many assemblies with some form of versioning and performing a restart to clean up the mess every now and then. Yet, I would prefer something that doesn't interrupt the whole simulation just because I made a small change in a module class.
ACTUAL QUESTION
So here is my question(s): is there any other solution? Did I miss any workaround to the problems of those I found?
For example, is there some scripting language for .NET which satisfies my needs (solution #2)? Is versioning possible, in the way I vaguely described it (solution #3)?
Or even, more simply: is .NET the wrong platform for this project? (I'd like to stick with it because C# is my main language, but I could see myself doing this in Python or something alike if necessary)
Did you consider the Managed Extensibility Framework?
I'm working on a simulation system that works in a very similar way, treating agent modules as plugins.
I created a Plugin Manager that handles everything related to domain loading, checking plugin validity in a dummy domain and then hot-loading it in the engine domain.
Using AppDomains is how you get full control, and you can reduce processing time by running your Plugin Manager's tasks in parallel.
AppDomains aren't cost-free, but you can manage with only two (or three, if you need more isolation between the validation and execution domains).
Once a plugin file is validated you can load it into the main process at any time; creating a shadow copy in a domain's probing path (or in its dynamic path, if set) and targeting that copy instead of the original file is useful for checking versioning and updates.
Using one domain for validation and another for execution may require a swap context, which takes care of previous-version instances while updating.
Keep a scheduled task that checks for new plugins and new versions; then block plugin module usage, swap the files, reload, and unblock, re-instancing new versions from previous ones if necessary.
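A sketch of the domain setup behind that shadow-copy hot-loading; the plugin directory parameter is a placeholder. Shadow copying makes the runtime load a copy of each DLL, leaving the original file unlocked so a new version can be dropped in:

using System;

static class PluginDomains
{
    public static AppDomain CreatePluginDomain(string pluginDir)
    {
        var setup = new AppDomainSetup
        {
            ApplicationBase = pluginDir, // e.g. a hypothetical "plugins" folder
            ShadowCopyFiles = "true"     // note: this property is a string
        };
        return AppDomain.CreateDomain("PluginDomain", null, setup);
    }
}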
I have code to implement GoF's proxy pattern in C#. The code has MathProxy for calculating arithmetic functions.
One example is a plain implementation; the other, shown below, is the better one for C# (.NET) with an AppDomain.
What benefits can I expect using AppDomain especially with Proxy Pattern?
public MathProxy()
{
    // Create the real subject in a different AppDomain.
    var ad = AppDomain.CreateDomain("MathDomain", null, null);

    // CreateInstance takes the assembly display name and the full type
    // name, and returns an ObjectHandle wrapping the remote instance.
    var o = ad.CreateInstance(
        "DoFactory.GangOfFour.Proxy.NETOptimized",
        "DoFactory.GangOfFour.Proxy.NETOptimized.Math");

    // Unwrap yields a transparent remoting proxy (Math must derive from
    // MarshalByRefObject); every call on _math crosses the domain boundary.
    _math = (Math)o.Unwrap();
}
Any given Windows process that hosts the CLR can have one or more application domains defined that contain the executable code, data, metadata structures, and resources. In addition to the protection guarantees built in by the process, an application domain further introduces the following guarantees:
Faulty code within an application domain cannot adversely affect code running in a different application domain within the same process.
Code running within an application domain cannot directly access resources in a different application domain.
Code-specific configurations can be configured on a per-application-domain basis. For example, you can configure security-specific settings per application domain.
An AppDomain provides an isolation boundary in the CLR, just as a process provides an isolation boundary at the operating-system level.
Difference between AppDomain and Process:
Process:
When a user starts an application, memory and a whole host of resources are allocated for the application. The physical separation of this memory and resources is called a process. An application may launch more than one process. It's important to note that applications and processes are not the same thing at all.
AppDomain:
Microsoft also introduced an extra layer of abstraction/isolation called an AppDomain. The AppDomain is not a physical isolation, but rather a logical isolation within the process. Since more than one AppDomain can exist in a process, we get some benefits. For example, before AppDomains, processes that needed to access each other's data had to use a proxy, which introduced extra code and overhead. With AppDomains it is possible to launch several applications within the same process, with the same sort of isolation that exists between processes, and threads can execute across application domains without the overhead of inter-process communication. This is all encapsulated within the AppDomain class. Any time a namespace is loaded in an application, it is loaded into an AppDomain - the same AppDomain as the calling code unless otherwise specified. An AppDomain may or may not contain threads, unlike a process.
Why you should use AppDomains: read the linked post.
A good use-case scenario for AppDomains:
"NUnit was written by .NET Framework experts. If you look at the NUnit source, you see that they knew how to dynamically create AppDomains and load assemblies into these domains. Why is a dynamic AppDomain important? What the dynamic AppDomain lets NUnit do is to leave NUnit open, while permitting you to compile, test, modify, recompile, and retest code without ever shutting down. You can do this because NUnit shadow copies your assemblies, loads them into a dynamic domain, and uses a file watcher to see if you change them. If you do change your assemblies, then NUnit dumps the dynamic AppDomain, recopies the files, creates a new AppDomain, and is ready to go again."
The entire info is borrowed from Sacha Barber's article.