.NET + P/Invoked C++ Dynamic Link Library + Multithreading - C#

Okay, I messed something up. I've written a DLL in C++ which I call from managed code (C# .NET). The library works like a charm and is blazingly fast.
My DLL keeps internal state, i.e. it allocates heaps of memory and uses a myriad of variables which are not cleared between the calls from .NET. Instead they stay there, and the C# code is aware of that (there is preprocessing and building of data structures); this is actually required for performance.
So what is the problem?
I want to add multithreading, effectively by giving each .NET thread access to its own copy of the DLL's state. Without storing any data between calls this would be easily achievable with just one DLL.
But in my case, do I have to copy the *.DLL as many times as there are threads and write a P/Invoke wrapper for each file separately?? :O I mean [DllImport(...)] for each of the roughly 40 functions?
No way, there must be something more clever. Help?

Simply put, you need to stop sharing variables between threads.
Your global variables are the problem. Instead you need each different thread to have its own copy of the state that persists between calls. Typically you would put this state into a structure of some sort, perhaps a struct. Then an initial call to the DLL would return a new instance of this structure. You then pass that structure back into the DLL every time you call a function that requires access to the persistent state. When you are done, call back into the DLL to deallocate the structure. You don't need to declare the structure in the managed code. You can just treat it as an opaque pointer. Use IntPtr.
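For illustration, here is a minimal sketch of that pattern on the managed side. The export names (Engine_Create, Engine_Process, Engine_Destroy) and the DLL name are hypothetical stand-ins for whatever your library exposes:

using System;
using System.Runtime.InteropServices;

// Hypothetical exports: the native DLL exposes create/use/destroy
// functions that all operate on an opaque state handle.
static class NativeEngine
{
    [DllImport("engine.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr Engine_Create();

    [DllImport("engine.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double Engine_Process(IntPtr state, double input);

    [DllImport("engine.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void Engine_Destroy(IntPtr state);
}

// Each thread owns one handle; nothing is shared between threads.
sealed class EngineSession : IDisposable
{
    private IntPtr _state = NativeEngine.Engine_Create();

    public double Process(double input)
    {
        return NativeEngine.Engine_Process(_state, input);
    }

    public void Dispose()
    {
        if (_state != IntPtr.Zero)
        {
            NativeEngine.Engine_Destroy(_state);
            _state = IntPtr.Zero;
        }
    }
}

With this shape you keep a single physical DLL: the [DllImport] declarations for your 40 functions are written once, and each thread simply passes its own IntPtr.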
Of course, perhaps you'd just be better off with a C++/CLI assembly.

Related

Passing struct from managed to native, does it pin or copy

The question is general, but I am actually doing this with Mono and not .NET, so if there are differences I am very interested in what they are.
I have a simple data-carrying class (not a struct, for other reasons) which should be blittable in the sense that it consists of ints, doubles and smaller structs which consist of doubles. I am sending this to a native DLL through DllImport static methods, as a reference.
I was under the impression that for a simple object like that, it is pinned in managed memory, its address in managed memory is passed to the native code as a reference/pointer (depending on how the native code is declared, same thing), the native code can read and write it, the function returns, and the managed object is unpinned and may now hold changes written by the native code.
Others think the object is instead copied into a block of native memory, the native code runs on that copy, and afterwards the data is copied back into the object in managed memory. This would obviously be less performant and wasteful when the marshaler does not have to convert the data.
I made a test where I note the address of the data sent to native code, and I see that it does not change per object. ObjectA gets one address, objectB gets another, and each keeps its address for as long as I tested. While that seems to support my understanding, there could still be other explanations for the addresses, so I would be grateful for a concrete explanation, since the Mono documentation does not mention pinning blittable objects while the Microsoft documentation does.
Extra question:
Can there be a situation with a class (not a struct) containing only ushort, int and double where it is not blittable but requires a copy? It was observed with Mono on Android that changes made to the data in native code would not be visible on the managed side (when not using the [Out] attribute), seemingly indicating that copying, and not pinning, was used.
It is possible that the Mono int differs from the C++ int in the native code, but such data size and alignment issues should not be detectable from the managed side, so how would it "know" to marshal by copy and not by pinning? In tests on Windows such a mismatch just results in garbled data, as expected, so that is likely not the reason for marshaling by copy.
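For reference, a minimal sketch of the kind of declaration under discussion; the export name Native_Fill and the field layout are made up. A sequential-layout class containing only blittable fields is the "formatted class" case where the marshaler can pin instead of copy:

using System.Runtime.InteropServices;

// A formatted (sequential-layout) class with only blittable fields,
// matching the shape described in the question.
[StructLayout(LayoutKind.Sequential)]
class Sample
{
    public int Id;
    public double X;
    public double Y;
}

static class Native
{
    // Hypothetical export. A reference-type parameter is marshaled as a
    // pointer to the data; if the class is blittable, the object can be
    // pinned and its managed address passed straight through.
    [DllImport("native.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void Native_Fill(Sample data);
}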

Performance of C# using C++ wrapped libraries

I have read this and this, and was wondering: if I use functions from an unmanaged C++ library in C# via a C# wrapper of that library, is there going to be any difference in performance compared with the same program written fully in unmanaged C++ against the C++ library? I am asking about a crucial performance difference, bigger than 1.5 times. Note that I am asking only about the performance of the C++ library functions (in the two ways: with and without the C# wrapper), isolating the other code!
After edit:
I was just wondering: if I want to use a C++ dynamic unmanaged library (.dll) in C# and I am using a wrapper, which parts are compiled to intermediate CIL code and which are not? I guess only the wrapper is compiled to CIL, and when C# wants to use a C++ function from the library it just marshals and passes the arguments to the C++ function through the wrapper, so there will be some delay, but not as much as if I had written the whole library in C#. Correct me if I am mistaken, please.
Of course, there is overhead involved in switching from managed to unmanaged code execution. It is very modest; it takes about 12 CPU cycles. All that needs to be done is write a "cookie" on the stack so that the garbage collector can recognize that subsequent stack frames belong to unmanaged code and therefore should not be inspected for valid object references.
These cookies are strung together like a linked list, supporting the scenario where C# code calls native code which in turn calls back into managed code; the list is traversed by the GC when it collects. That is not as uncommon as it sounds; it happens in any GUI app, for example. The Click event is a good example, triggered when the UI thread pinvokes GetMessage().
That is not the only thing that needs to happen, however; in any practical scenario you also pass arguments to the native function. These can require a lot more work to get marshaled into a format that the native code can understand. Arrays in particular: they will need to get pinned if the array elements are blittable, which is still pretty cheap. It gets expensive when the entire array needs to be converted because the element type is not blittable. That is not always easy to recognize; a profiler is forever the proper tool to detect inefficient code.
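To make the distinction concrete, here is a small sketch with hypothetical exports; the double[] parameter is blittable and only needs pinning, while the string[] parameter forces a per-element conversion:

using System.Runtime.InteropServices;

static class Interop
{
    // double[] is blittable: the marshaler pins the array and passes its
    // address to native code, which is cheap.
    [DllImport("mathlib.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double Sum(double[] values, int count);

    // string[] is not blittable: every element must be converted to a
    // native string, so the whole array is copied on each call.
    [DllImport("mathlib.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern int CountNonEmpty(string[] values, int count);
}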

Several AppDomains and native code

My C# application is using native code which is not thread safe.
I can run multiple processes of that native code, using inter-process communication to achieve concurrency.
My question is, can I use AppDomains instead, so that several managed threads, each in a different AppDomain, will call the native code without interfering with each other?
The main goal is to avoid process separation.
No, AppDomains are a pure managed-code concept. An AppDomain achieves isolation by keeping the managed object roots separate. One AppDomain cannot see the objects of another, which makes it very safe to abort code and unload assemblies: all the data that might contain state is thrown away, so there is never an accident.
Unmanaged code is completely agnostic of the GC heap and thus of AppDomains; it will allocate in its data section and its own native heap (HeapAlloc). Such allocations are process-global. That makes the process the isolation boundary: you'd need a helper process that loads the DLL, and you'd talk to it with one of the .NET process interop mechanisms (socket, named pipe, memory-mapped file, remoting, WCF).
Technically you could create copies of the DLL, each with a different name. But that scales very poorly, and the pinvoke gets awkward since you can't use [DllImport] anymore. You need a delegate declaration for each exported function, plus LoadLibrary() and GetProcAddress() to initialize the delegate objects.
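A minimal sketch of that delegate-based approach, assuming the DLL exports a single int Compute(int) function; the class and file names here are made up:

using System;
using System.Runtime.InteropServices;

// The delegate type mirrors the signature of the exported function.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate int ComputeDelegate(int input);

sealed class NativeInstance : IDisposable
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern IntPtr LoadLibrary(string path);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr module, string name);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FreeLibrary(IntPtr module);

    private IntPtr _module;
    public ComputeDelegate Compute { get; private set; }

    public NativeInstance(string dllCopyPath)
    {
        // Each instance loads its own renamed copy of the DLL,
        // so each copy keeps separate global state.
        _module = LoadLibrary(dllCopyPath);
        if (_module == IntPtr.Zero)
            throw new System.ComponentModel.Win32Exception();

        IntPtr proc = GetProcAddress(_module, "Compute");
        if (proc == IntPtr.Zero)
            throw new System.ComponentModel.Win32Exception();

        Compute = (ComputeDelegate)Marshal.GetDelegateForFunctionPointer(
            proc, typeof(ComputeDelegate));
    }

    public void Dispose()
    {
        if (_module != IntPtr.Zero)
        {
            FreeLibrary(_module);
            _module = IntPtr.Zero;
        }
    }
}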
Yes, it can be done, but you should seriously measure whether the effort is repaid by the benefits.
Windows won't load multiple copies of the same unmanaged DLL into a process, and unmanaged DLLs are loaded per-process (not per-AppDomain). What you can do is create multiple temporary copies of the same DLL and then load them with LoadLibrary().
Each copy will be loaded into the process but kept separate from the others (so calls into different copies won't interfere, giving you thread safety). All of this can be tied up inside a class that wraps the unmanaged calls (LoadLibrary, FreeLibrary, GetProcAddress and the invocation itself). It'll use fewer resources and be faster than multiple processes, but you'll have to drop DllImport usage.
The only real benefit I see is that this will scale much better than multiple processes (because it uses fewer resources), provided of course that you reuse instances by keeping a cache (it's harder to keep a process cache than an object cache).

Limitations of dynamic objects in C#/Java

I'm basically a C++ guy trying to venture into C#. From a basic tutorial on C#, I happened to find that all objects are created and stored dynamically (this is also true for Java) and are accessed by references, and hence there's no need for copy constructors. There is also no need for a bitwise copy when passing objects to a function or returning objects from a function. This makes C# much simpler than C++.
However, I read somewhere that operating on objects exclusively through references imposes limitations on the type of operations one can perform, thus depriving the programmer of complete control. One limitation is that the programmer cannot precisely specify when an object will be destroyed.
Can someone please elaborate on other limitations? (with a sample code if required)
Most of the "limitations" are by design rather than considered a deficiency (you may not agree of course)
You cannot do, or don't have to worry about, any of the following:
when an object is destroyed
where the object is in memory
how big it is (unless you are tuning the application)
pointer arithmetic
accessing memory outside an object
accessing an object through the wrong type
sharing objects between threads unsafely (sharing is simpler)
whether the object is on the stack or the heap (the stack is being used more and more in Java)
fragmentation of memory (not true of all collectors)
Because of garbage collection in Java, we cannot predict when an object will be destroyed, but the collector performs the work of a destructor.
If you want to free up resources deterministically, you can use a finally block.
try {
    // ... work with the resource ...
} finally {
    // dispose resources.
}
Having made a similar transition, I find that the more you look into it, the more you do have to think about C#'s GC behaviour in all but the most straightforward cases. This is especially true when trying to handle unmanaged resources from managed code.
This article highlights a lot of the issues you may be interested in.
Personally I miss a reference-counted alternative to IDisposable (more like shared_ptr), but that's probably a hangover from a C++ background.
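For comparison, the closest built-in idiom is scope-bound (not reference-counted) cleanup through IDisposable and a using block; a minimal sketch:

using System;
using System.Runtime.InteropServices;

// Dispose runs deterministically when the 'using' scope exits,
// whether or not an exception is thrown.
sealed class NativeBuffer : IDisposable
{
    private IntPtr _ptr = Marshal.AllocHGlobal(1024);

    public void Dispose()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            _ptr = IntPtr.Zero;
        }
    }
}

class Demo
{
    static void Main()
    {
        using (var buffer = new NativeBuffer())
        {
            // work with the buffer; it is freed at the closing brace.
        }
    }
}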
The more I have to write my own plumbing to support C++-like programming, the more likely it is that there is another C# mechanism I've overlooked, or that I end up getting frustrated with C#. For example, swap and move are not common idioms in C# as far as I've seen, and I miss them; other programmers with a C# background may well disagree about how useful those idioms are.

Does Unmanaged C# code compile into IL and run on the CLR?

In the midst of asking about manually managing CLR memory, I realized I know very little.
I'm aware that the CLR will place a 'cookie' on the stack when you exit a managed context, so that the Garbage Collector won't trample your memory space; however, in everything I've read the assumption is that you are calling some library written in C.
I want to write an entire write layer of my application in C#, outside of the managed context, to manage data at a low level. Then I want to access this layer from a managed layer.
In this case, will my Unmanaged C# code compile to IL and be run on the CLR? How does this work?
I assume this is related to the same C# database project you mentioned in the question.
It is technically possible to implement an entire write layer in C/C++ or any other language, and it is technically possible to have everything else in C#. I am currently working on an application that uses unmanaged code for some high-performance, low-level stuff and C# for business logic and upper-level management.
However, the complexity of the task should not be underestimated. The typical way to do this is to design a contract that both parties can understand. The contract is exposed to the managed language, and the managed language triggers calls into the native code. If you have ever tried calling a C++ method from C#, you will get the idea... Plus, every call into unmanaged code has quite a significant performance overhead, which may kill the whole idea of low-level performance.
If you are really interested in high-performance relational databases, then use a single low-level language.
If you want a naive but fully working implementation of a database, just use C#. Don't mix the two languages unless you fully understand the complexity. See RavenDB, a document-based NoSQL database built fully in C#.
Will my Unmanaged C# code compile to IL and be run on the CLR?
No, there is no such thing as unmanaged C#. C# code will always be compiled into IL code and executed by the CLR. This is a case of managed code calling unmanaged code. The unmanaged code can be implemented in several languages (C/C++/assembly etc.), but the CLR will have no idea what is happening inside that code.
Update from a comment: there is a tool (ngen.exe) that can compile a managed assembly ahead of time into native, architecture-specific code. This tool is designed to improve the performance of a managed application by removing the JIT-compilation stage and putting native code directly into the executable image or library. That code, however, is still "managed" by the CLR host: memory allocation and collection, managed threading, application domains, exception handling, security and all other aspects are still controlled by the CLR. So even though C# can technically be compiled into native code, that code does not run as a standalone native image.
How does this work?
Managed code can interoperate with unmanaged code. There are a couple of ways to do this:
Through code, via .NET interop. This is relatively fast but looks a bit ugly in code, and it is hard to maintain/test (there is a good article with C#/C/assembly samples).
A much, much slower approach, but one more open to other languages: web services (SOAP, WS, REST and company), queueing (such as MSMQ, NServiceBus and others), and (possibly) interprocess communication. The unmanaged process sits on one end and the managed application on the other.
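As a minimal sketch of the out-of-process route just mentioned (and in the AppDomains answer above), a managed client could talk to a helper process over a named pipe; the pipe name and message format here are made up for illustration:

using System;
using System.IO.Pipes;
using System.Text;

class PipeClientDemo
{
    static void Main()
    {
        // Connect to a hypothetical helper process that hosts the
        // unmanaged code and answers simple text requests.
        using (var pipe = new NamedPipeClientStream(".", "native-worker", PipeDirection.InOut))
        {
            pipe.Connect();

            byte[] request = Encoding.UTF8.GetBytes("compute:42");
            pipe.Write(request, 0, request.Length);

            byte[] response = new byte[256];
            int read = pipe.Read(response, 0, response.Length);
            Console.WriteLine(Encoding.UTF8.GetString(response, 0, read));
        }
    }
}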
I know this is a C# question, but if you are comfortable with C++, C++/CLI might be an option worth considering.
It allows you to selectively compile portions of your C++ code to either a managed or an unmanaged context. However, be aware that code that interacts with CLR types MUST run in a managed context.
I'm not aware of the exact runtime cost of transitioning from managed context to unmanaged context and vice versa from within the C++ code, but I assume it must be similar to the cost of calling a native method via .NET interop from C#, which as @oleksii already pointed out is expensive. In my experience this has really paid off when you need to interact frequently with native C or C++ libraries; IMHO it is much easier to call them from within a C++/CLI project than to write the required .NET interop interfaces in C#.
See this question for a little bit of information on how it is done.
