Abort the execution of external managed code (plugin) - C#

In our application, we have a plugin system that takes compilers in the form of plugins, written specifically for our software. The plugins are assembly files, imported using MEF. The user is then able to select a compiler, and use it to compile a piece of source code.
A problem arises if a faulty plugin is added to the system. Upon execution, it could hang the whole service, requiring a complete restart. I am looking for a way to stop the execution of a compiler after a certain timeout, without relying on task cancellation handling in the plugins, since you can never be sure that it will be implemented properly, if at all. Of course, I also need to be sure that proper cleanup is done after aborting, since failing to do that could also crash or slow down the system.
I do understand that forcefully terminating managed code could potentially lead to a lot of problems and that is the reason I am asking for a good way to do it.

Related

C# What exactly is application domain?

I understand that an application domain forms:
an isolation boundary for security,
versioning,
reliability,
and unloading of managed code,
but so does a process
Can someone please help me understand the practical benefits of an application domain?
I assumed an app domain gives you a container that loads one version of an assembly, but I recently discovered that multiple versions of a strong-named assembly can be loaded in a single app domain.
My concept of application domain is still not clear. And I am struggling to understand why this concept was implemented when the concept of process is present.
Thank you.
I can't tell if you are talking in general or specifically .NET's AppDomain.
I am going to assume .NET's AppDomain and why it can be really useful when you need that isolation inside of a single process.
For instance:
Say you are dealing with a library that had certain worker classes and you have no choice, but to use those workers and can't modify the code. It's your job to build a Windows Service that manages said workers and makes sure they all stay up and running and need to work in parallel.
Easy enough right? Well, you hoped. It turns out your worker library is prone to throwing exceptions, uses a static configuration, and is generally just a real PITA.
You could try to launch them in their own processes, but to monitor them you'll need to implement named pipes or try to thoughtfully parse the STDIN and STDOUT of each process.
What else can you do? Well, AppDomain actually solves this. I can spawn an AppDomain for each worker and give each its own configuration. They can't screw each other up by changing static properties because they are isolated, and on top of that, if the library bombs out and I failed to catch the exception, it doesn't bother the other workers in their own domains. And during all of this, I can still communicate with those workers easily.
Sadly, I have had to do this before
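To make the isolation concrete, here is a minimal sketch of one AppDomain per worker. It assumes .NET Framework (AppDomain.CreateDomain is not supported on .NET Core and later), and the WorkerConfig, Worker and Host names are hypothetical stand-ins for the third-party library and the managing service, not anything from a real library.

```csharp
using System;

// Hypothetical stand-in for the awkward worker library: its configuration
// is a static field, so everything inside one domain shares it.
public static class WorkerConfig
{
    public static string Name = "unset";
}

// Deriving from MarshalByRefObject lets the host call the worker through a
// proxy while the object itself lives in its own AppDomain.
public class Worker : MarshalByRefObject
{
    public void Configure(string name)
    {
        WorkerConfig.Name = name; // only changes the static in this worker's domain
    }

    public string Describe()
    {
        return WorkerConfig.Name + " in " + AppDomain.CurrentDomain.FriendlyName;
    }
}

public static class Host
{
    // Creates a domain, instantiates a Worker inside it, and configures it.
    public static Worker StartWorker(string name, out AppDomain domain)
    {
        domain = AppDomain.CreateDomain("domain-" + name);
        Worker worker = (Worker)domain.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName, typeof(Worker).FullName);
        worker.Configure(name);
        return worker;
    }

    public static void Main()
    {
        AppDomain d1, d2;
        Worker a = StartWorker("A", out d1);
        Worker b = StartWorker("B", out d2);

        // Each worker sees its own copy of the static configuration.
        Console.WriteLine(a.Describe());
        Console.WriteLine(b.Describe());

        // Tearing down one worker's domain leaves the other usable.
        AppDomain.Unload(d1);
        Console.WriteLine(b.Describe());
        AppDomain.Unload(d2);
    }
}
```

Each worker should report its own name even though both set the same static field, because statics are per domain. Calls on `a` after its domain is unloaded would throw, so the host has to drop that proxy.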
EDIT: Started to write this as a comment response, but got too large
Individual processes can work great in many scenarios, however, there are just times where they can become a pain. I am not saying one should use an AppDomain over another process. I think it's uncommon you would need a separate process or AppDomain, but once you need it, you'll definitely know.
The main problem I see with processes in the scenario I've given above is that processes have their own downfalls that are easier to mitigate with the AppDomain.
A process can go rogue, become unresponsive, and crash or be killed at any point.
If you're managing processes, you need to keep track of each process ID and monitor its status. IPC is great, but it takes time to get proper communication going back and forth as needed.
As an example let's say your process just dies. What do you do? Depending on the mechanism you chose to monitor, maybe the communication thread died, perhaps the work finished and you still show it as "processing". What do you do?
Now what happens when you have 20 processes and your management app dies? You don't have any real information. All you have is 20 instances of "myprocess.exe", and you may now have to start parsing the command line arguments they were started with to see which workers you actually have. Obviously with an AppDomain all 20 would have died too, but did you really gain anything with the process? You still have to code the ability to recover, but now you also have to code all of the recovery for your processes instead of just firing the workers back up.
As with anything in programming, there's 1,000 different ways to achieve the same goal. It's up to you to decide which solution you feel is most appropriate.
Some practical benefits using app domain:
Multiple app domains can run in a single process. You can also stop an individual app domain without stopping the entire process. This alone drastically increases server scalability.
The app domain life cycle is managed programmatically by runtime hosts (you can override this as well). For processes and threads, you have to manage the life cycle explicitly. Initialization, execution, termination, and inter-process or multithreaded communication are complex, which is why it's easier to defer them to the CLR.
Source: https://learn.microsoft.com/en-us/dotnet/framework/app-domains/application-domains

Plugin architecture for .NET multi-agent simulation (runtime load/unload)

DESCRIPTION
I am currently designing an architecture for a C# multiagent simulation, where agent actions are driven by many modules in their "brain", which may read sensors, vote for an action or send messages/queries to other modules (all of this is implemented through the exchange of messages).
Of course, modules can have a state.
Modules run in parallel: they have an update method which consumes messages and queries, and perform some sort of computation. The update methods return iterators, and have multiple yields in their bodies, so that I can schedule modules cooperatively. I do not use a single thread for each module because I expect to have hundreds to thousands of modules for every agent, which would lead to a huge amount of RAM occupied by thread overhead.
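For readers unfamiliar with the iterator approach described above, a minimal sketch of cooperative scheduling could look like the following. The CounterModule and Scheduler names are hypothetical and chosen only for illustration; the real modules would consume messages instead of counting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical module: Update() is an iterator, and every `yield return`
// is a point where the scheduler may switch to another module.
public class CounterModule
{
    private readonly string _name;
    private readonly int _steps;
    private readonly List<string> _log;

    public CounterModule(string name, int steps, List<string> log)
    {
        _name = name;
        _steps = steps;
        _log = log;
    }

    public IEnumerable<object> Update()
    {
        for (int i = 0; i < _steps; i++)
        {
            _log.Add(_name + ":" + i); // one slice of work
            yield return null;         // hand control back to the scheduler
        }
    }
}

public static class Scheduler
{
    // Round-robin on a single thread: advance each module by one step per
    // pass and drop modules whose iterator has finished.
    public static void Run(IEnumerable<IEnumerable<object>> updates)
    {
        var queue = new Queue<IEnumerator<object>>();
        foreach (IEnumerable<object> update in updates)
            queue.Enqueue(update.GetEnumerator());

        while (queue.Count > 0)
        {
            IEnumerator<object> current = queue.Dequeue();
            if (current.MoveNext())
                queue.Enqueue(current); // still has work, goes to the back
            else
                current.Dispose();
        }
    }
}
```

With two modules taking 2 and 3 steps, the log interleaves as A:0, B:0, A:1, B:1, B:2. The per-module cost is one small iterator object rather than a thread stack.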
I would like these modules to behave like runtime plugins, so that while the simulation is running I can add new module classes and rewrite/debug existing ones, without ever stopping the simulation process, and then use those classes to add and remove modules from the agents' brains, or just let existing modules change their behaviours due to new implementations of their methods.
POSSIBLE SOLUTIONS
I have come up with a number of possible solutions in the last few days, which all have something disappointing:
Compile my modules into DLLs, load each in a different AppDomain and then use AppDomain.CreateInstanceFromAndUnwrap() to instantiate the module, which I would then cast to some IModule interface, shared between my simulation and the modules (and implemented by each module class). The interface would expose just the SendMessage, the Update and a few other members, common to all modules.
The problem with this solution is that calls between AppDomains are much slower than direct calls (within the same AppDomain).
Also, I don't know the overhead of AppDomains, but I suppose that they are not free, so having thousands could become a problem.
Use some scripting language for the modules, while keeping C# for the underlying engine, so that there is no assembly loading/unloading. Instead, I would host an execution context for the scripting language for each module.
My main concern is that I do not know of a scripting language that is big (as in 'python, lua, ruby, js are big, Autoit and Euphoria are not'), fast, embeddable into .NET, and allows step-by-step execution (which I need in order to perform cooperative scheduling of module execution).
Another concern about this is that I suppose I'd have to use a runtime context for each module, which in turn would have massive overhead.
Lastly, I suppose a scripting language would be probably slower than C#, which would reduce performance.
Avoid unloading of assemblies, instead renaming/versioning them somehow, so that I can have a ton of different versions, then just use the latest one for each type.
I'm not even sure this is possible (due to identically named types and namespaces).
Even if possible, it would be very memory-inefficient.
Do a transparent restart of the simulation, which means pausing the simulation (and execution of the scheduler of brains/modules), serializing everything (including every module), exiting the simulation, recompiling the code, starting the simulation again, deserializing everything, catching any exception raised due to the changes I made to the class and resuming execution.
This is a lot of work, so I consider it my last resort.
Also, this whole process would be very slow at some point, depending on number of modules and their sizes, making it impractical
I could overcome this last problem (the whole process in solution 4 becoming slow), by mixing solutions 3 and 4, loading many many assemblies with some form of versioning and performing a restart to clean up the mess every now and then. Yet, I would prefer something that doesn't interrupt the whole simulation just because I made a small change in a module class.
ACTUAL QUESTION
So here is my question(s): is there any other solution? Did I miss any workaround to the problems of those I found?
For example, is there some scripting language for .NET which satisfies my needs (solution #2)? Is versioning possible, in the way I vaguely described it (solution #3)?
Or even, more simply: is .NET the wrong platform for this project? (I'd like to stick with it because C# is my main language, but I could see myself doing this in Python or something alike if necessary)
Did you consider Managed Extensibility Framework?
I'm working in a simulation system that works in a very similar way, treating agent modules as plugins.
I created a Plugin Manager that handles everything related to domain loading, checking plugin validity in a dummy domain and then hot-loading it in the engine domain.
Using AppDomain is where you can get the full control, and you can reduce process time by running your Plugin Manager's tasks in parallel.
AppDomains aren't cost free, but you can handle it using only two (or three if you need more isolation between validation and execution domains).
Once a plugin file is validated, you can load it in the main process at any time. Creating a shadow copy in any domain's probing path (or in the dynamic path, if set) and targeting it instead of the original file is useful for checking versioning and updates.
Using one domain for validation and another for execution may require a swap context, which takes care of instances of the previous version while updating.
Keep a scheduled task that checks for new plugins and new versions, then blocks plugin module usage, swaps files, reloads, and unblocks, re-instantiating new versions from the previous ones if necessary.

Running a .Net application in a sandbox

Over the months, I've developed a personal tool that I'm using to compile C# 3.5 Xaml projects online. Basically, I'm compiling with the CodeDom compiler. I'm thinking about making it public, but the problem is that it is -very-very- easy to do anything on the server with this tool.
The reason I want to protect my server is because there's a 'Run' button to test and debug the app (in screenshot mode).
Is it possible to run an app in a sandbox - in other words, limiting memory access, hard drive access and BIOS access - without having to run it in a VM? Or should I just analyze all submitted code, or 'disable' the Run mode?
Spin up an AppDomain, load assemblies in it, look for an interface you control, activate the implementing type, and call your method. Just don't let any instances cross that AppDomain barrier (including exceptions!) that you don't 100% control.
Controlling the security policies for your external-code AppDomain is a bit much for a single answer, but you can check this link on MSDN or just search for "code access security msdn" to get details about how to secure this domain.
Edit: There are exceptions you cannot stop, so it is important to watch for them and record in some manner the assemblies that caused the exception so you will not load them again.
Also, it is always better to inject into this second AppDomain a type that you will then use to do all loading and execution. That way you ensure that no type that could bring down your entire application will cross any AppDomain boundary. I've found it useful to define a type that extends MarshalByRefObject and exposes methods that execute the insecure code in the second AppDomain. It should never pass an unsealed type that isn't marked Serializable across the boundary, either as a method parameter or as a return type. As long as you can accomplish this, you are 90% of the way there.
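A rough sketch of that loader type follows, assuming .NET Framework. The IPlugin, RunResult, PluginRunner and SandboxHost names are hypothetical, and the domain here is created with default permissions, so the security policy discussed above still has to be added separately.

```csharp
using System;
using System.Reflection;

// Hypothetical contract that every plugin implements.
public interface IPlugin
{
    void Execute();
}

// The only data that travels back to the host: a sealed, serializable type.
[Serializable]
public sealed class RunResult
{
    public bool Succeeded;
    public string Error;
}

// Created inside the second AppDomain; the host holds only a proxy to it.
public sealed class PluginRunner : MarshalByRefObject
{
    public RunResult Run(string assemblyPath, string typeName)
    {
        return Guard(delegate
        {
            Assembly assembly = Assembly.LoadFrom(assemblyPath);
            Type type = assembly.GetType(typeName, true);
            IPlugin plugin = (IPlugin)Activator.CreateInstance(type);
            plugin.Execute();
        });
    }

    // Flattens any catchable exception to text, so no plugin-defined
    // exception type has to be deserialized in the host domain.
    public static RunResult Guard(Action action)
    {
        try
        {
            action();
            return new RunResult { Succeeded = true };
        }
        catch (Exception ex)
        {
            return new RunResult
            {
                Succeeded = false,
                Error = ex.GetType().FullName + ": " + ex.Message
            };
        }
    }
}

public static class SandboxHost
{
    // .NET Framework only: AppDomain.CreateDomain is not supported on
    // .NET Core and later.
    public static RunResult RunIsolated(string assemblyPath, string typeName)
    {
        AppDomain domain = AppDomain.CreateDomain("plugin-sandbox");
        try
        {
            PluginRunner runner = (PluginRunner)domain.CreateInstanceAndUnwrap(
                typeof(PluginRunner).Assembly.FullName,
                typeof(PluginRunner).FullName);
            return runner.Run(assemblyPath, typeName);
        }
        finally
        {
            AppDomain.Unload(domain);
        }
    }
}
```

Only strings and the sealed RunResult cross the boundary. Exceptions that cannot be caught, such as a stack overflow, still escape this guard, which is why the advice above about recording offending assemblies matters.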

I've found a bug in the JIT/CLR - now how do I debug or reproduce it?

I have a computationally-expensive multi-threaded C# app that seems to crash consistently after 30-90 minutes of running. The error it gives is
The runtime has encountered a fatal error. The address of the error was at 0xec37ebae, on thread 0xbcc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
(0xc0000005 is the error-code for Access Violation)
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint. In fact, the line of code that the debugger says caused the crash is
overallLength += distanceTravelled;
Where both values are of type double
Given all this, I believe the crash must be due to a bug in the compiler or CLR or JIT. I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin. I've never had to view the CIL-binary, or the compiled JIT output, or the native stacktrace (there is no managed stacktrace at the time of the crash), so I'm not sure how. I can't even figure out how to view the state of all the variables at the time of the crash (VS unfortunately won't tell me like it does after managed-exceptions, and outputting them to console/a file would slow down the app 1000-fold, which is obviously not an option).
So, how do I go about debugging this?
[Edit] Compiled under VS 2010 SP1, running latest version of .Net 4.0 Client Profile. Apparently it's ".Net 4.0C/.Net 4.0E, .Net CLR 1.1.4322"
I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin.
"Smaller reproduction" definitely sounds like a great idea here... even if "smaller" won't mean "quicker to reproduce".
Before you even start, try to reproduce the error on another machine. If you can't reproduce it on another machine, that suggests a whole different set of tests to do - hardware, installation etc.
Also, check you're on the latest version of everything. It would be annoying to spend days debugging this (which is likely, I'm afraid) and then end up with a response of "Yes, we know about this - it was a bug in .NET 4 which was fixed in .NET 4.5" for example. If you can reproduce it on a variety of framework versions, that would be even better :)
Next, cut out everything you can in the program:
Does it have a user interface at all? If possible, remove that.
Does it use a database? See if you can remove all database access: definitely any output which isn't used later, and ideally input too. If you can hard code the input within the app, that would be ideal - but if not, files are simpler for reproductions than database access.
Is it data-sensitive? Again, without knowing much about the app it's hard to know whether this is useful, but assuming it's processing a lot of data, can you use a binary search to find a relatively small amount of data which causes the problem?
Does it have to be multi-threaded? If you can remove all the threading, obviously that may well then take much longer to reproduce the problem - but does it still happen at all?
Try removing bits of business logic: if your app is componentized appropriately, you can probably fake out whole significant components by first creating a stub implementation, and then simply removing the calls.
All of this will gradually reduce the size of the app until it's more manageable. At each step, you'll need to run the app again until it either crashes or you're convinced it won't crash. If you have a lot of machines available to you, that should help...
tl;dr Make sure you're compiling to .Net 4.5
This sounds suspiciously like the same error found here. From the MSDN page:
This bug can be encountered when the Garbage Collector is freeing and compacting memory. The error can happen when the Concurrent Garbage Collection is enabled and a certain combination of foreground Garbage Collection and background Garbage Collection occurs. When this situation happens you will see the same call stack over and over. On the heap you will see one free object and before it ends you will see another free object corrupting the heap.
The fix is to compile to .Net 4.5. If for some reason you can't do this, you can also disable concurrent garbage collection by disabling gcConcurrent in the app.config file:
<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
Or just compile to x86.
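To confirm which of these settings actually took effect at run time, a small check like the one below may help. The RuntimeCheck name is hypothetical. As far as I know, a latency mode of Batch is what the runtime reports when concurrent garbage collection is disabled, but treat that reading as an assumption to verify on your framework version.

```csharp
using System;
using System.Runtime;

public static class RuntimeCheck
{
    // Summarizes the runtime settings discussed above: CLR version,
    // process bitness, and the garbage collector configuration.
    public static string Describe()
    {
        return string.Format(
            "CLR {0}, {1}-bit process, server GC: {2}, GC latency mode: {3}",
            Environment.Version,
            Environment.Is64BitProcess ? 64 : 32,
            GCSettings.IsServerGC,
            GCSettings.LatencyMode);
    }
}
```

Logging `RuntimeCheck.Describe()` at startup gives a record to attach to any bug report.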
WinDbg is your friend:
http://blogs.msdn.com/b/tess/archive/2006/02/09/net-crash-managed-heap-corruption-calling-unmanaged-code.aspx
http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net
http://www.codeproject.com/Articles/22245/Quick-start-to-using-WinDbg
Download Debug Diagnostic Tool v1.2
Run program
Add Rule "Crash"
Select "Specific Process"
On the Advanced Configuration page, set your exception if you know which exception it fails on, or just leave this page as is
Set userdump location
Now wait for the process to crash; DebugDiag will create a log file. Then open the Advanced Analysis tab, select Crash/Hang Analyzers in the top list and the dump file in the lower list, and hit Start Analysis. This will generate an HTML report for you. Hopefully you will find useful info in that report. If you have trouble analyzing it, upload the HTML report somewhere and post the URL here so we can focus on it.
My app does not invoke any native code, or use any unsafe blocks, or
even any non-CLS compliant types like uint
You may think this, but threading and synchronization via semaphores, mutexes, and any handles are all native. .NET is a layer over the operating system; .NET itself does not implement multithreading purely in CLR code, because the OS already does it.
Most likely this is a thread synchronization error. Probably multiple threads are trying to access a shared resource, such as a file, that is outside the CLR boundary.
You may think you aren't accessing COM etc., but when you call certain APIs, such as getting the desktop folder path, the call goes through the shell COM API.
You have the following two options:
Publish your code so that we can review the bottleneck
Redesign your app using the .NET parallel threading framework, which includes a variety of algorithms for CPU-intensive operations.
Most likely the program fails after a certain period of time because collections grow and operations fail to finish before another thread interferes. For example, with the producer-consumer problem, you will not notice any problem until the producer becomes slower or fails to finish its operation before the consumer kicks in.
A bug in the CLR is rare, because the CLR is very stable. But poorly written code may cause an error to appear as a bug in the CLR. The CLR cannot and will never detect whether the bug is in your code or in the CLR itself.
Did you run a memory test on your machine? The one time I had comparable symptoms, one of my DIMMs turned out to be faulty (a very good memory tester is included in Win7; http://www.tomstricks.com/how-to-test-your-ram-or-memory-with-windows-memory-diagnostic-tool-in-windows-7/)
It might also be a heating/throttling issue if your CPU gets too hot after this period of time. Although that would happen sooner imho.
There should be a dump file that you can analyze. If you have never done this, find someone who has, or send it to Microsoft.
I suggest you open a support case via http://support.microsoft.com immediately, as the support guys can show you how to collect the necessary information.
Generally speaking, like @paulsm4 and @psulek said, you can use WinDbg or Debug Diag to capture crash dumps of the process, which contain all the necessary information. However, if this is the very first time you use those tools, you might be puzzled. The Microsoft support team can provide you with step-by-step guidance on them, or they can even set up a Live Meeting session with you to capture the data, as the program crashes so often.
Once you are familiar with the tools, you can perform similar troubleshooting more easily in the future:
http://blogs.msdn.com/b/lexli/archive/2009/08/23/when-the-application-program-crashes-on-windows.aspx
BTW, it is too early to say "I've found a bug". Though you cannot obviously find in your program a dependency on native code, it might still have a dependency on native code. We should not draw a conclusion before debugging further into the issue.

Prohibit starting a form in third-party dll plugin (c# service)

I have a service that loads some dlls and starts a function in each dll. Each dll contains some rules, which can also be developed by our clients (something like a plugin system). The problem is that clients can theoretically add forms to be called inside the dlls. So the goal is to disallow that, or at least to block such dlls.
The only method I can imagine now is call each dll in a separate thread and kill it after some timeout.
But I think it is not so nice.
Please advise me on a better method. Thanks.
The best way to deal with plug-ins is to "sandbox" each one of them in an individual app domain. This way you can safely react to their execution errors, unload them if you need to, and manage them in whatever ways you like. But most importantly for this question, you can monitor their loading of assemblies using this event hook. If you see them loading a DLL that you do not want to allow, you can simply throw an exception. Your code would catch the exception, clean up the app domain, and optionally send the clients a warning for trying to do something that is not allowed.
The only downside to this approach is that it is rather non-trivial to implement.
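As a starting point, the monitoring part might be sketched as below, assuming the event in question is AppDomain.AssemblyLoad. The LoadMonitor name and the forbidden list are hypothetical. Since that event fires after an assembly has already been loaded, this sketch records the violation for the host to act on (for example by unloading the domain) instead of throwing from the handler.

```csharp
using System;
using System.Collections.Generic;

// Instantiate this inside the plugin's AppDomain (for example with
// CreateInstanceAndUnwrap); the host then polls it through the proxy.
public sealed class LoadMonitor : MarshalByRefObject
{
    // Hypothetical block list of assembly simple names.
    private static readonly string[] Forbidden = { "System.Windows.Forms" };

    private readonly List<string> _violations = new List<string>();

    public static bool IsForbidden(string simpleName)
    {
        foreach (string name in Forbidden)
        {
            if (string.Equals(name, simpleName, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Subscribes to load notifications for the domain this object lives in.
    public void Attach()
    {
        AppDomain.CurrentDomain.AssemblyLoad += OnAssemblyLoad;
    }

    private void OnAssemblyLoad(object sender, AssemblyLoadEventArgs args)
    {
        string name = args.LoadedAssembly.GetName().Name;
        if (IsForbidden(name))
        {
            lock (_violations) { _violations.Add(name); }
        }
    }

    // Returns a copy, so only plain strings cross the domain boundary.
    public string[] GetViolations()
    {
        lock (_violations) { return _violations.ToArray(); }
    }
}
```

The host would call `Attach()` before running the plugin and check `GetViolations()` afterwards or on a timer, then clean up the domain and warn the client if the list is not empty.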
It is a VERY hard problem to protect a server from third-party code that you need to execute.
I would recommend reading about the SharePoint sandbox approach (i.e. http://msdn.microsoft.com/en-us/library/ff798382.aspx), which tries to solve this and related issues.
As SLaks said - you implicitly trust code by simply executing it. Unless you expect the code to be outright evil, you may be better off simply logging how long calls take (and maybe timing out if possible) and providing your client with this information. Since it seems the clients create the code for themselves, it is unlikely that the code will be deliberately made non-functional.
Other interesting issues outside showing a Form:
stack overflow exception (easy to create, hard to handle)
while(true); code that never returns control
access to native code if full trust enabled.
You could always use reflection to inspect their code and ensure that certain namespaces and classes (e.g. System.Windows.Forms.*) are not referenced or used.
SQLCLR restricts what is allowed to be used/referenced in assemblies installed as SQLCLR extensions, and that appears to be done that way: http://msdn.microsoft.com/en-us/library/ms403273.aspx
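A minimal sketch of the reflection check suggested above, using a hypothetical ReferenceInspector helper. It only looks at the assembly's static references, so a plugin that loads a forbidden assembly late via reflection would still slip through.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ReferenceInspector
{
    // Returns the simple names of referenced assemblies that appear in the
    // forbidden list (compared case-insensitively).
    public static string[] FindForbiddenReferences(
        Assembly assembly, params string[] forbiddenNames)
    {
        return assembly.GetReferencedAssemblies()
            .Select(reference => reference.Name)
            .Where(name => forbiddenNames.Contains(name, StringComparer.OrdinalIgnoreCase))
            .ToArray();
    }
}
```

On .NET Framework, the plugin could be opened for inspection with `Assembly.ReflectionOnlyLoadFrom(path)` and rejected if `FindForbiddenReferences(plugin, "System.Windows.Forms")` returns anything.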
