What is in a DLL and how does it work?


I'm always referencing DLLs in my C# code, but they have remained somewhat of a mystery which I would like to clarify. This is a sort of brain dump of questions regarding DLLs.
I understand a DLL is a dynamically linked library which means that another program can access this library at run time to get "functionality". However, consider the following ASP.NET project with Web.dll and Business.dll (Web.dll is the front end functionality and it references Business.dll for types and methods).
1. At what point does Web.dll dynamically link to Business.dll? You notice a lot of HDD thrashing in Windows for seemingly small tasks when using Word (etc.), and I reckon that Word is going off and dynamically linking in functionality from other DLLs?
1a. Additionally, what loads and links the DLL - the OS or some run time framework such as the .NET framework?
1b. What is the process of "linking"? Are compatibility checks made? Loading into the same memory? What does linking actually mean?
2. What actually executes the code in the DLL? Does it get executed by the processor or is there another stage of translation or compilation before the processor will understand the code inside the DLL?
2a. In the case of a DLL built in C# .NET, what is running this: the .NET framework or the operating system directly?
3. Does a DLL from Linux work on a Windows system (if such a thing exists), or are they operating system specific?
4. Are DLLs specific to a particular framework? Can a DLL built using C# .NET be used by a DLL built with, for example, Borland C++?
4a. If the answer to 4 is "no" then what is the point of a DLL? Why don't the various frameworks use their own formats for linked files? For example: an .exe built in .NET knows that a file type of .abc is something that it can link into its code.
5. Going back to the Web.dll / Business.dll example - to get a class type of customer I need to reference Business.dll from Web.dll. This must mean that Business.dll contains some sort of a specification as to what a customer class actually is. If I had compiled my Business.dll file in, say, Delphi: would C# understand it and be able to create a customer class, or is there some sort of header info or something that says "hey sorry you can only use me from another Delphi DLL"?
5a. Same applies for methods; can I write a CreateInvoice() method in a DLL, compile it in C++, and then access and run it from C#? What stops me from doing this, or allows me to?
6. On the subject of DLL hijacking, surely the replacement (bad) DLL must contain the exact method signatures and types as the one that is being hijacked. I suppose this wouldn't be hard to do if you could find out what methods were available in the original DLL.
6a. What in my C# program is deciding if I can access another DLL? If my hijacked DLL contained exactly the same methods and types as the original but it was compiled in another language, would it work?
7. What is DLL importing and DLL registration?

First of all, you need to understand the difference between two very different kinds of DLLs. Microsoft decided to use the same file extensions (.exe and .dll) for both .NET (managed code) and native code, but managed DLLs and native DLLs are very different inside.
1) At what point does web.dll dynamically link to business.dll? You
notice a lot in Windows HDD thrashing for seemingly small tasks when
using Word etc and I reckon that this Word going off and dynamically
linking in functionality from other DLL's?
1) In the case of .NET, DLLs are usually loaded on demand when the first method trying to access anything from the DLL is executed. This is why you can get exceptions such as FileNotFoundException or TypeLoadException anywhere in your code if a DLL can't be loaded. When something like Word suddenly starts accessing the HDD a lot, it's likely swapping (getting data that has been swapped out to the disk to make room in the RAM).
1a) Additionally what loads and links the DLL - the O/S or some
runtime framework such as the .Net framework?
1a) In the case of managed DLLs, the .NET framework is what loads, JIT compiles (compiles the .NET bytecode into native code) and links the DLLs. In the case of native DLLs it's a component of the operating system that loads and links the DLL (no compilation is necessary because native DLLs already contain native code).
1b) What is the process of "linking"? Are checks made that there is
compatibility? Loading into the same memory? What does linking
actually mean?
1b) Linking is when references (e.g. method calls) in the calling code to symbols (e.g. methods) in the DLL are replaced with the actual addresses of the things in the DLL. This is necessary because the eventual addresses of the things in the DLL cannot be known before it's been loaded into memory.
2) What actually executes the code in the DLL? Does it get executed by
the processor or is there another stage of translation or compilation
before the processor will understand the code inside the DLL?
2) On Windows, .exe files and .dll files are nearly identical in format. Native .exe and .dll files contain native code (the same stuff the processor executes), so there's no need to translate. Managed .exe and .dll files contain .NET bytecode which is first JIT compiled (translated into native code).
2a) In the case of a DLL built from C# .net what is running this? The
.Net framework or the operating system directly?
2a) After the code has been JIT compiled, it's run in the exact same way as any other code.
3) Does a DLL from say Linux work on a Windows system (if such a thing
exists) or are they operating system specific?
3) Managed DLLs might work as-is, as long as the frameworks on both platforms are up to date and whoever wrote the DLL didn't deliberately break compatibility by using native calls. Native DLLs will not work as-is, as the file formats are different (even though the machine code inside is the same, if they're both for the same processor platform). By the way, on Linux, "DLLs" are known as .so (shared object) files.
4) Are they specific to a particular framework? Can a DLL built using
C# .Net be used by a DLL built with Borland C++ (example only)?
4) Managed DLLs are particular to the .NET framework, but naturally they work with any compatible language. Native DLLs are compatible as long as everyone uses the same conventions: calling conventions (how function arguments are passed at the machine code level), symbol naming, etc.
5) Going back to the web.dll / business.dll example. To get a class
type of customer I need to reference business.dll from web.dll. This
must mean that business.dll contains a specification of some sort of
what a customer class actually is. If I had compiled my business.dll
file in say Delphi would C# understand it and be able to create a
customer class - or is there some sort of header info or something
that says "hey sorry you can only use me from another delphi dll".
5) Managed DLLs contain a full description of every class, method, field, etc. they contain. AFAIK Delphi doesn't support .NET, so it would create native DLLs, which can't be used in .NET straightforwardly. You will probably be able to call functions with PInvoke, but class definitions will not be found. I don't use Delphi so I don't know how it stores type information with DLLs. C++, for example, relies on header (.h) files which contain the type declarations and must be distributed with the DLL.
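To illustrate the point about managed DLLs describing themselves, here is a minimal sketch that reads type and method names out of an assembly's metadata with reflection. The Customer class and the MetadataDemo helper are hypothetical stand-ins (in the question's scenario the type would live in Business.dll and you would load that file instead of the current assembly).

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a type that would live in Business.dll.
public class Customer
{
    public string Name { get; set; } = "";
    public decimal CreateInvoice(decimal amount) => amount;
}

public static class MetadataDemo
{
    // Returns "TypeName.MethodName" for every public method declared by the
    // exported types of the given assembly, read purely from its metadata.
    public static string[] DescribePublicMethods(Assembly assembly)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                                   BindingFlags.Static | BindingFlags.DeclaredOnly;
        return assembly.GetExportedTypes()
            .SelectMany(t => t.GetMethods(flags).Select(m => t.Name + "." + m.Name))
            .ToArray();
    }

    public static void Main()
    {
        // Assumption: we inspect the running assembly; for a separate file
        // this would be something like Assembly.LoadFrom("Business.dll").
        Assembly assembly = typeof(Customer).Assembly;
        foreach (string line in DescribePublicMethods(assembly))
            Console.WriteLine(line);
    }
}
```

A native DLL carries no such metadata, which is why loading one this way fails and the type information has to come from headers or documentation instead.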
6) On the subject of DLL hijacking, surely the replacement (bad) DLL
must contain the exact method signatures, types as the one that is
being hijacked. I suppose this wouldnt be hard to do if you could find
out what methods etc were available in the original DLL.
6) Indeed, it's not hard to do if you can easily switch the DLL. Code signing can be used to avoid this. In order for someone to replace a signed DLL, they would have to know the signing key, which is kept secret.
6a) A bit of a repeat question here but this goes back to what in my
C# program is deciding if I can access another DLL? If my hijacked DLL
contained exactly the same methods and types as the original but it
was compiled in another language would it work?
6a) It would work as long as it's a managed DLL, made with any .NET language.
What is DLL importing? and dll registration?
"DLL importing" can mean many things; usually it means referencing a DLL file and using things in it.
DLL registration is something that's done on Windows to globally register DLL files as COM components to make them available to any software on the system.

A .dll file contains compiled code you can use in your application.
Sometimes the tool used to compile the .dll matters, sometimes not. If you can reference the .dll in your project, it doesn't matter which tool was used to code the .dll's exposed functions.
The linking happens at runtime, unlike statically linked libraries, such as your classes, which link at compile-time.
You can think of a .dll as a black box that provides something your application needs that you don't want to write yourself. Yes, someone understanding the .dll's signature could create another .dll file with different code inside it and your calling application couldn't know the difference.
HTH

1) At what point does web.dll dynamically link to business.dll? You
notice a lot in Windows HDD thrashing for seemingly small tasks when
using Word etc and I reckon that this Word going off and dynamically
linking in functionality from other DLL's?
1) I think you are confusing linking with loading. The link is when all the checks and balances are tested to be sure that what is asked for is available. At load time, parts of the dll are loaded into memory or swapped out to the pagefile. This is the HD activity you are seeing.
Dynamic linking is different from static linking in that in static linking, all the object code is put into the main .exe at link time. With dynamic linking, the object code is put into a separate file (the dll) and it is loaded at a different time from the .exe.
Dynamic linking can be implicit (i.e. the app links with an import lib), or explicit (i.e. the app uses LoadLibrary or LoadLibraryEx to load the dll).
In the implicit case, /DELAYLOAD can be used to postpone the loading of the dll until the app actually needs it. Otherwise, at least some parts of it are loaded (mapped into the process address space) as part of the process initialization. The dll can also request that it never be unloaded while the process is active.
COM uses LoadLibrary to load COM dlls. Note that even in the implicit case, the system is using something similar to LoadLibrary to load the dll either at process startup or on first use.
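The explicit case can be sketched from C# as well. This is a rough illustration, assuming .NET Core 3.0 or later (where the NativeLibrary class is available) and using the real kernel32.dll export GetCurrentProcessId; the ExplicitLoadDemo class and the delegate name are made up for the example. It loads the DLL, resolves one export to an address, and calls it, which is roughly what LoadLibrary plus GetProcAddress do in native code.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ExplicitLoadDemo
{
    // Matches the Win32 signature DWORD GetCurrentProcessId(void).
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    private delegate uint GetCurrentProcessIdFn();

    // Returns the process id obtained through an explicitly loaded
    // kernel32.dll, or null when not on Windows or the load fails.
    public static uint? TryGetProcessIdViaKernel32()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return null;

        // Step 1: load the DLL into the process by name.
        if (!NativeLibrary.TryLoad("kernel32.dll", out IntPtr module))
            return null;
        try
        {
            // Step 2: resolve the exported symbol to an address.
            IntPtr address = NativeLibrary.GetExport(module, "GetCurrentProcessId");
            // Step 3: wrap the raw address so managed code can call it.
            var fn = Marshal.GetDelegateForFunctionPointer<GetCurrentProcessIdFn>(address);
            return fn();
        }
        finally
        {
            // Release our reference to the module once we are done with it.
            NativeLibrary.Free(module);
        }
    }

    public static void Main()
    {
        uint? pid = TryGetProcessIdViaKernel32();
        Console.WriteLine(pid.HasValue ? "pid = " + pid.Value : "skipped (not Windows)");
    }
}
```

Step 2 is the "linking" part in miniature: a symbol name is turned into an address that is only known once the DLL has been loaded.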
2) What actually executes the code in the DLL? Does it get executed by
the processor or is there another stage of translation or compilation
before the processor will understand the code inside the DLL?
2) Dlls contain object code just like .exes. The format of the dll file is almost identical to the format of an exe file. I have heard that there is only one bit that is different in the headers of the two files.
In the case of a DLL built from C# .net, the .Net framework is running it.
3) Does a DLL from say Linux work on a Windows system (if such a thing
exists) or are they operating system specific?
3) DLLs are platform specific.
4) Are they specific to a particular framework? Can a DLL built using
C# .Net be used by a DLL built with Borland C++ (example only)?
4) Dlls can interoperate with other frameworks if special care is taken or some additional glue code is written.
Dlls are very useful when a company sells multiple products that have overlapping capabilities. For instance, I maintain a raster i/o dll that is used by more than 30 different products at the company. If you have multiple products installed, one upgrade of the dll can upgrade all the products to new raster formats.
5) Going back to the web.dll / business.dll example. To get a class
type of customer I need to reference business.dll from web.dll. This
must mean that business.dll contains a specification of some sort of
what a customer class actually is. If I had compiled my business.dll
file in say Delphi would C# understand it and be able to create a
customer class - or is there some sort of header info or something
that says "hey sorry you can only use me from another delphi dll".
5) Depending on the platform, the capabilities of a dll are presented in various ways: through .h files, .tlb files, or other ways on .NET.
6) On the subject of DLL hijacking, surely the replacement (bad) DLL
must contain the exact method signatures, types as the one that is
being hijacked. I suppose this wouldnt be hard to do if you could find
out what methods etc were available in the original DLL.
6) dumpbin /exports and dumpbin /imports are interesting tools to use on .exe and .dll files.

Related

How does a virtual machine execute instructions?

I have a strong C++ background and never really had a deep understanding of Java or C#. However, I am curious about the internal workings of the virtual machines. I've experimented with some Windows exes and figured out that the actual virtual machines are the JVM and CLR dynamic libraries.
Now here is what bothers me: How do these libraries interact with the instructions in an exe file?
My only guess is that the bytecode is actually stored in the .data segment of the exe file. And it actually passes control to the .dll, which translates the bytecode instructions. Is that correct?
I was unable to find anything about the subject so any reference will be appreciated.

Your guess about where the IL is stored is addressed here:
http://en.wikipedia.org/wiki/Portable_Executable#.NET.2C_metadata.2C_and_the_PE_format
Your conjecture is basically correct for C#. The executable starts up the CLR and hands off the metadata and IL; the CLR then figures out where "Main" is, grabs the IL for that, jit-compiles it into x86 (or whatever) code, and runs that. Each method is compiled "just in time" before it runs for the first time, hence the term "jit compiler".
That is of course a greatly simplified overview. If you want more information about how .NET works, start with:
http://msdn.microsoft.com/en-us/library/a4t23ktk.aspx
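As a small illustration of the "IL is stored as data, then JIT-compiled on first call" picture above, the sketch below reads back the IL bytes of one of its own methods through reflection and then calls that method normally. The IlDemo class and its Add method are hypothetical names used only for this example; the exact bytes printed depend on compiler settings.

```csharp
using System;
using System.Reflection;

public static class IlDemo
{
    public static int Add(int a, int b) => a + b;

    // Returns the raw IL bytes that the compiler stored in the assembly for Add.
    public static byte[] GetAddIl()
    {
        MethodInfo method = typeof(IlDemo).GetMethod(nameof(Add));
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        // The IL is just data inside the assembly until the method is needed.
        byte[] il = GetAddIl();
        Console.WriteLine("IL bytes for Add: " + BitConverter.ToString(il));

        // Calling the method makes the runtime compile it (if it has not
        // already done so) and then execute the resulting native code.
        Console.WriteLine("Add(2, 3) = " + Add(2, 3));
    }
}
```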

Well, at the most basic level you got that right: There is a native application (the runtime, such as java.exe) that reads the bytecode and "runs" it by interpreting the instructions contained within.
The first adjustment you have to make to that picture is that for performance reasons, most VMs now use JIT compilation, which means that the bytecode is not interpreted, but compiled into native code on the fly.
My only guess is that the bytecode is actually stored in the .data segment of the exe file.
Depends. For Java, you usually just have a JAR file with the bytecode, separate from the native binary that gets launched. But, yes, you could combine them into a single executable, which would then contain the native launcher code (but probably not all the shared libraries that it depends on), and the bytecode "as data".
For instance Eclipse runs on JVM right? And still you launch it through an exe.
Yes. Eclipse has one of these "launch wrapper exe". But if you look at that, it is very small. All it does is put up a splash screen and launch the JVM (installed on your system, not part of the exe), and throws some JAR files at it (installed as part of Eclipse, but not inside the exe either).

Automatically generate C# wrapper class from dll in Visual Studio 2010 Express?

I was told by a colleague of mine that Visual Studio allows one to point to a .dll and auto-magically generate a C# wrapper class. Is this really possible? And if so, how does one go about achieving this? I've browsed the web, but have failed to come up with anything!
Thanks all!
Figured I'd share these resources as well,
How to: Create COM Wrappers
And courtesy of @Darin, Consuming Unmanaged DLL Functions

3 cases:
The DLL represents a managed assembly => you directly reference it in your project and use it
The DLL represents a COM object => you could use the tlbimp.exe utility to generate a managed wrapper
The DLL represents an unmanaged library with some exported functions. That's the toughest one. There are no tools. You will have to consult the documentation of the library to know the exported function names and parameters and build managed P/Invoke wrappers. You could use the dumpbin.exe utility to see a list of exported functions. Here's an article on MSDN about the different steps.
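For the third case, a hand-written P/Invoke wrapper might look like the sketch below. It uses a real export, GetTickCount from kernel32.dll, so that the declaration is verifiable; the NativeMethods and PInvokeDemo class names are just a common convention and are assumptions of this example. For your own library you would substitute the DLL name and signature from its documentation.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeMethods
{
    // Declares the unmanaged export DWORD GetTickCount(void) from kernel32.dll.
    // The runtime loads the DLL and resolves the export on the first call.
    [DllImport("kernel32.dll")]
    public static extern uint GetTickCount();
}

public static class PInvokeDemo
{
    // Returns milliseconds since system start, or null when not on Windows
    // (kernel32.dll does not exist there, so the call is never attempted).
    public static uint? TryGetTickCount()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return null;
        return NativeMethods.GetTickCount();
    }

    public static void Main()
    {
        uint? ticks = TryGetTickCount();
        Console.WriteLine(ticks.HasValue ? "ticks = " + ticks.Value : "skipped (not Windows)");
    }
}
```

Nothing checks the declaration against the DLL at compile time: if the name, parameter types, or calling convention are wrong, the failure only shows up at run time.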

This certainly isn't possible with any DLL. Just a very specific kind, one that implements a COM server. The converter needs a good description of the exported types, that's provided for such servers by a type library.
A type library is the exact equivalent to metadata in a managed assembly. While it starts life as a standalone file, a .tlb file, it often gets embedded as a resource in the DLL. Good place for it, keeps the type descriptions close to the code that implements it. Just like the metadata in a .NET assembly.
Some tooling to play with to see type libraries (not sure if it works in Express): in Visual Studio use File + Open + File and pick, say, c:\windows\system32\shell32.dll. You'll see the resources in that DLL, note the TYPELIB node. That's the type library. It is binary so actually reading it isn't practical. For that, run OleView.exe from the Visual Studio Command Prompt. File + View Typelib and select the same DLL. That decompiles the type library back into IDL, the Interface Description Language that was originally used to create the type library. Highly readable, you'll have little trouble understanding the language. And can easily see how the .NET Tlbimp.exe can translate that type library into equivalent C# declarations.
Type libraries are old, they have been around since 1996. Originally designed by the Visual Basic team at Microsoft, as a replacement for VBX, the 16-bit VB extensibility model. They have been very successful, practically any Windows compiler supports them. But they are limited in expressive power, there is no support for things like generics and implementation inheritance. Notable is that the Windows 8 team has replaced type libraries for WinRT. They picked the .NET metadata format.

I know this question is fairly old and seems to have been answered sufficiently, but I just want to add one thought I think might be important.
I could be totally wrong, so please take my answer with a grain of salt and correct me on this if I am.
To be able to call members/fields in a DLL, the information needed to call them must be accessible in some form. That information should be all you need to write a wrapper. With that, you can determine the "form" of all members/fields, i.e. method headers and so on.
In C# it is possible to load DLLs via reflection and get that information. I don't know about the different DLL types described above, but as I said, to call the members/fields this information has to be there in some form. So using reflection to get that information, you could generate a new class, e.g. "prefixOriginalname", and give it the same members/fields as your original class, calling the members/fields of your original class and adding your desired extra functionality.
So
1. Every DLL (or peripheral document) gives you the information needed to use its types, namely everything that is implemented as "public".
2. You can access this needed information via reflection.
3. Given 1. and 2., you can create a program to extract the needed information from the DLL and generate wrappers accordingly.
As I said, I am not 100% sure on this one, because the other answers make it sound to me like that might be too difficult or even impossible for some reason.

Building NPAPI plugin in C#

Attempting to build a C# NPAPI plugin I have found a tutorial which describes that your dll needs to implement a number of methods such as NP_GetEntryPoints , NP_Initialize and NPP_New along with a number of others.
However what I want to understand is if I can simply mirror these method names and construct equivalent datastructures as described in the article (like _NPPluginFuncs) in C# and everything will work?
Could someone please be kind enough to provide some guidance? Is it possible to build a NPAPI plugin in C# and if so what are the basic steps involved?

As stated in the documentation:
A NPAPI browser plugin is, at it’s core, simply a DLL with a few specific entry points
That means you need to export some functions from a regular dll, which is usually done in C/C++. Unfortunately it is not possible to expose any entry point from a plain C# dll, but look at this answer: with some effort it appears to be possible to add exports with some sort of post-build tool.
In any case, don't expect to pass complicated data structures to and from the plugin interfaces; it will be a pain. If you are interested in doing more research, the keyword to use is "reverse P/Invoke", by analogy with direct P/Invoke, which is calling a regular dll from the managed world.
The reason a C# dll can't directly expose "entry points" is that an entry point is actually just an address inside the dll pointing to some immediately executable machine code. C# dlls are a different kind of beast: they are just files containing IL that is compiled "Just In Time", and AFAIK such compilation is forced with some OS tricks. This is the reason reverse P/Invoke is not straightforward at all.

As Georg Fritzsche says in his comment:
NPAPI plugins are basically DLLs with a few required C-exports
and there is no built-in way to export functions (in the C-export sense) from an assembly written in C#.
Some of your options are:
A mixed-mode C++ assembly which can export the functions directly. This could have implications for hosting the CLR in your plugin's host process.
A small native DLL which hosts the exports, then uses COM interop to delegate to a C# assembly containing the plugin functionality. The "official" way to do so-called "reverse p/invoke".
An intriguing project which post-processes your fully-managed assembly, turning static methods marked with a custom attribute into named function exports. (I have no affiliation with this project; I stumbled across it after I got to wondering whether anyone had improved on the COM interop way of doing things.)
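Whichever option supplies the named exports, the managed side of a "reverse P/Invoke" usually comes down to turning a delegate into an address that native code can call. The sketch below shows only that step, with hypothetical names (ReversePInvokeDemo, InitializeFn, and the version check are invented for illustration and are not part of NPAPI). It does not create a DLL export; the pointer still has to be handed to native code by one of the approaches listed above.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ReversePInvokeDemo
{
    // Signature a hypothetical native host would expect for a callback.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int InitializeFn(int version);

    // Keep the delegate in a static field: if it were garbage collected,
    // the native pointer handed out below would dangle.
    private static readonly InitializeFn Callback = Initialize;

    // Hypothetical callback body: accept any version >= 1.
    private static int Initialize(int version) => version >= 1 ? 0 : -1;

    // Returns an address that unmanaged code can call like a C function pointer.
    public static IntPtr GetCallbackPointer()
    {
        return Marshal.GetFunctionPointerForDelegate(Callback);
    }

    public static void Main()
    {
        IntPtr pointer = GetCallbackPointer();
        Console.WriteLine("callback address: 0x" + pointer.ToString("X"));
    }
}
```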

How to handle C# - C++ project interaction the right way?

I'm really in need of a deeper understanding of how to set things up to get an elegant interaction between my C++ and C# code bases.
What I want to achieve is an in-game editor written in C# for my game engine (C++/DX). To do so I let VS build my engine as a C++ dll with some additional functions (unmanaged code) to access the required functionality of my engine from the C# editor code base. So far so good.
The first thing which is bugging me is that I have to build the dll with CLR support. Otherwise C# does not accept the dll for some reason. It doesn't even allow me to add it to the resources ("A reference to 'C:\Users...\frame_work\Test\frame_workd.dll' could not be added. Please make sure that the file is accessible, and that it is a valid assembly or COM component.").
And when I build the dll with CLR support and add it to the references in C#, rebuild without CLR support, start my editor and make a function call from the dll, then I get an exception with HRESULT 0x8007007E. I searched for it, but the only thing I found had to do with dependencies, and that doesn't fit the alert I get when adding the dll to the resources.
The other point is that I always have to switch the configuration type between application (.exe) and dll in VS C++, depending on whether I want to run my engine directly or from the editor, and every time the complete project is rebuilt from scratch.
So, could someone explain to me how to organize this the right way? And what could be a possible reason why C# wants the dll to be compiled with CLR support?
Thank you guys/girls.

There are two ways to deal with this.
Either you make your C++ code provide an API which has a fully compliant COM object. If the object is COM then C# can directly interop with it. (This is why you can't add it as a reference directly)
However I think what you are really wanting to do will involve a P/Invoke (calling C/C++ native code from C#). This is entirely possible but it's not always easy. You need to deal with conversions between your C++ API and your C#, pointers and you need to be very careful to pin any references that your C++ code writes to in the C# app.

C# code is managed code (runs in the CLR), and can only directly* reference managed assemblies. So of course you're getting an error when you build against a managed assembly, and then sneak in and replace that managed DLL with an (incompatible) unmanaged DLL. You're basically trying to lie to the compiler, and that generally doesn't end well.
If you want your C++ DLL to be accessible from C#, the simplest way to do it is to build it as a managed assembly (i.e., CLR support). Which you're already doing. Just take out the extra step where you replace the working managed DLL with a non-working unmanaged one.
Also:
C++ dll with some additional functions (unmanaged code) to access the required functionality of my engine from the C# editor
That won't help you, because C# can't directly* call unmanaged code. The simplest way to make this work will be to make additional managed classes and methods in your C++ DLL. Then your C# assembly will be able to directly use those managed classes.
* As Spence noted, you can use -indirect- means (P/Invoke and COM) to access unmanaged code from C#. But that will make your life much more complicated than it is now, not to mention how it will complicate your build and deployment. You're already really close to something that should work -- don't add all that extra complexity.

When calling functions with P/Invoke, you don't add the DLL to the C# project's resources (or to its references, which is probably what you meant).
You will add it to the file list in your MSI project, of course.

Unmanaged to Managed options: performance considerations

Preliminary: The caller is a native EXE that exposes a type of "plugin" architecture. It is designed to load a DLL (by name, specified as a command line arg). That DLL must be native, and export a specific function signature. The EXE is C++, which isn't too important since the EXE is a black box (cannot be modified/recompiled). The native DLL can meet the application needs by completely implementing the solution natively, in said DLL. However, a requirement is to allow the real work (thus turning the native DLL into a thin wrapper/gateway) to be coded in C#. This leads me to 3 options (if there are more, please share):
1. Native DLL loads a C++/CLI DLL that internally makes use of a C# class library
2. Native DLL interacts with a C# COM object via CCW
3. Native DLL hosts CLR and makes calls to C# assembly
One more requirement is that not only does the native DLL need a way to send messages (call functions) on the C#, but the C# needs to be able to fire events/callback to the native DLL when certain extraordinary things occur (as opposed to shutting down and returning). Now this last thing I'm not sure how to handle in the 3rd option, but that is another question altogether.
So to the point: performance. Any info regarding those approaches (assuming they all meet the requirements)? From my investigation, my understanding is 2 would have more overhead than 1, but I'm not 100% confident, which is why I'm here. As for 3, I just don't have any info yet.
So if anyone has dealt with these (or knows of another elegant option), please chime in.
Thanks!

I've done option 1 before, with reasonable success. I don't remember any significant performance implications, though my application wasn't terribly performance-intensive. It seems to me that if performance problems occur, a likely culprit might be the frequent, small native-to-managed transitions. Would it be possible to batch those at the C++/CLI layer?
