Generating a C++ executable from a C# application


I have a WCF service accepting requests from our clients.
After analyzing the request I need to generate (compile + link) C++ EXE.
What is the best way to create a C++ EXE from a C# application?
Thanks.

I can only guess what it is you want, but I assume your requirements are something like:
You run a WCF service on a server somewhere.
Upon receiving a certain call, the service must output a (binary) executable, based on the parameters it receives.
C++ should be used as the intermediate language.
Let's look at what you need. The obvious first requirement is a C++ compiler/linker that can be invoked programmatically. On Unix systems, that would be g++, and you can simply shell-out to invoke it; on Windows, g++ is also available (under the name MinGW), but the version is pretty outdated, so you might be better off using Microsoft's command-line C++ compiler.
Obviously, you'll also need to generate C++ source code somewhere; I assume most of the source code is more or less the same for each request, so you probably need some sort of templating system. If it's not too complicated, this can be as simple as running a regex search-and-replace over a bunch of template files; otherwise, you need a proper templating language (XSLT is built into .NET, although the syntax takes some getting used to).
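For the simple search-and-replace case, a minimal sketch of placeholder substitution could look like the following. The `{{KEY}}` placeholder syntax and the function name `render_template` are assumptions made for illustration; they do not come from the question. It is written in C++ here, but the same logic ports directly to C# on the service side.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replace every {{KEY}} placeholder in `tmpl` with the matching value.
// Placeholders with no matching key are copied through unchanged, so
// they are easy to spot in the generated source.
std::string render_template(const std::string& tmpl,
                            const std::map<std::string, std::string>& values) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("{{", pos);
        std::size_t close = (open == std::string::npos)
                                ? std::string::npos
                                : tmpl.find("}}", open + 2);
        if (close == std::string::npos) {
            // No complete placeholder left: copy the remainder verbatim.
            out.append(tmpl, pos, std::string::npos);
            break;
        }
        out.append(tmpl, pos, open - pos);  // text before the placeholder
        std::string key = tmpl.substr(open + 2, close - open - 2);
        auto it = values.find(key);
        if (it != values.end()) {
            out += it->second;                         // known key
        } else {
            out.append(tmpl, open, close + 2 - open);  // unknown key
        }
        pos = close + 2;
    }
    return out;
}
```

Because the substituted values end up inside source code that gets compiled, they should be validated before rendering rather than copied straight from the request.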
And then the glue to make it work together; I suggest something like this:
Read request and create a suitable data structure (in a format that your template engine can consume)
Pass the data to the template engine, writing the output files to a temporary folder
Invoke the compiler on the temporary location
Read the executable back in, send it to the client
Delete the temporary folder
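The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes C++17 (for `std::filesystem`), a `g++` binary on the PATH, and a caller-supplied unique `job_name`; the function names are hypothetical. In the asker's setup the same sequence would be driven from the C# service instead.

```cpp
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Build the compiler command line. "g++" and "-O2" are assumptions;
// substitute whatever compiler and flags the server really uses.
std::string build_compile_command(const fs::path& source, const fs::path& output) {
    return "g++ -O2 -o \"" + output.string() + "\" \"" + source.string() + "\"";
}

// Write the generated source to a temporary folder, compile it, read the
// executable back into memory, and delete the folder again.
std::vector<char> compile_to_executable(const std::string& source_code,
                                        const std::string& job_name) {
    fs::path work_dir = fs::temp_directory_path() / job_name;
    fs::create_directories(work_dir);
    fs::path source = work_dir / "main.cpp";
    fs::path output = work_dir / "main.exe";

    std::vector<char> binary;
    try {
        {
            std::ofstream out(source);
            out << source_code;
        }  // stream is closed here, before the compiler runs
        if (std::system(build_compile_command(source, output).c_str()) != 0) {
            throw std::runtime_error("compilation failed");
        }
        std::ifstream in(output, std::ios::binary);
        binary.assign(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>());
    } catch (...) {
        fs::remove_all(work_dir);  // clean up on failure too
        throw;
    }
    fs::remove_all(work_dir);
    return binary;
}
```

The returned bytes are what the service would send back to the client.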
Since compiling is often a costly operation, consider caching the generated executables (unless they are practically guaranteed to be different every time).
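A minimal sketch of such a cache, assuming the request can be reduced to a canonical string key (for example, the sorted parameters joined together). The class name `ExecutableCache` and the key format are hypothetical, and the cache is in-memory only with no eviction or thread safety.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Caches compiled executables by request key so that identical requests
// trigger only one (expensive) compile.
class ExecutableCache {
public:
    using Binary = std::vector<char>;
    using Compiler = std::function<Binary(const std::string&)>;

    explicit ExecutableCache(Compiler compile) : compile_(std::move(compile)) {}

    // Return the cached binary, compiling it first if the key is new.
    const Binary& get(const std::string& request_key) {
        auto it = cache_.find(request_key);
        if (it == cache_.end()) {
            it = cache_.emplace(request_key, compile_(request_key)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Compiler compile_;
    std::map<std::string, Binary> cache_;
};
```

A real service would also need a size limit and locking if requests are handled concurrently.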
By the way, there is one big caveat: If the client platform is not binary-compatible with the server platform (e.g., the server is running on x64 but the client is x86), the generated executable might not work.
And another one, which is security: By hacking the server, or tricking clients into sending "wrong" requests, an attacker can potentially inject malicious code through the generated executable; if this application is anything but super-trivial, I imagine it's going to be pretty hard to properly secure this thing.

An executable is an executable, and is defined by the ability to be executed.
Whatever programming language was, once upon a time, used to write the source code that was fed to a compiler which produced the executable, that no longer matters. An executable looks the same regardless of which language (or languages) you used. (A .NET executable is just an executable with some fairly complex DLL dependencies)
So there is no such thing as a "C++ executable". Perhaps you mean an executable that doesn't depend on the .NET framework?
Or do you simply mean that you have a C++ application that needs to use a WCF service?
Or that you want to rewrite your C# code as C++?

Do you mean you want to compile C# to native machine code? In that case, ngen may be of some use:
http://msdn.microsoft.com/en-us/library/6t9t5wcf%28v=vs.80%29.aspx

Related

How does a virtual machine execute instructions?

I have a strong C++ background and never really had a deep understanding of Java or C#. However, I am curious about the internal workings of the virtual machines. I've experimented with some windows exes and figured out that the actual virtual machines are the jvm and the clr dynamic libraries.
Now here is what bothers me: How do these libraries interact with the instructions in an exe file?
My only guess is that the bytecode is actually stored in the .data segment of the exe file. And it actually passes control to the .dll, which translates the bytecode instructions. Is that correct?
I was unable to find anything about the subject so any reference will be appreciated.

Your guess about where the IL is stored is addressed here:
http://en.wikipedia.org/wiki/Portable_Executable#.NET.2C_metadata.2C_and_the_PE_format
Your conjecture is basically correct for C#. The executable starts up the CLR and hands off the metadata and IL; the CLR then figures out where "Main" is, grabs the IL for that, jit-compiles it into x86 (or whatever) code, and runs that. Each method is compiled "just in time" before it runs for the first time, hence the term "jit compiler".
That is of course a greatly simplified overview. If you want more information about how .NET works, start with:
http://msdn.microsoft.com/en-us/library/a4t23ktk.aspx

Well, at the most basic level you got that right: There is a native application (the runtime, such as java.exe) that reads the bytecode and "runs" it by interpreting the instructions contained within.
The first adjustment you have to make to that picture is that, for performance reasons, most VMs now use JIT compilation, which means that the bytecode is not interpreted, but compiled into native code on the fly.
My only guess is that the bytecode is actually stored in the .data segment of the exe file.
Depends. For Java, you usually just have a JAR file with the bytecode, separate from the native binary that gets launched. But, yes, you could combine them into a single executable, which would then contain the native launcher code (but probably not all the shared libraries that it depends on), and the bytecode "as data".
For instance Eclipse runs on JVM right? And still you launch it through an exe.
Yes. Eclipse has one of these "launch wrapper exe". But if you look at that, it is very small. All it does is put up a splash screen and launch the JVM (installed on your system, not part of the exe), and throws some JAR files at it (installed as part of Eclipse, but not inside the exe either).

Is it possible to sandbox and run C++ or C# code that's entered in a textfield in a browser?

I'm diving into web development after ten years of desktop development and I'm experimenting with some testing concepts. I was wondering if it's possible to sandbox and run C++ code that's entered in a textfield in a browser? By that, I mean run the C++ or C# code on the backend webserver and return an analysis of the code. Just to be clear, I don't mean run C++ or C# code that's intended to generate any kind of markup, but simply to blackbox test the C++ or C# block of code that's entered.
How would you invoke the compiler, depending on the web server you're using?
How could you sandbox the code to prevent malicious behavior? If we're considering only one of the C variants, what about blacklisting/whitelisting specific functions and libraries to prevent malicious behavior? Or would that blacklist be too long and too limiting to allow any fair amount of code to run?
These are some fairly high-level questions that I'm asking just because I'm having a hard time finding some direction, but I'm going to continue researching them right now. Thanks so much in advance for your help!

You might find the codepad about page interesting.

# 1 is easy with C#. The Reflection capabilities of .NET allow you to compile and run code "on the fly." And here's a link to another good looking tutorial.
# 2 is a little more difficult, but I suppose a basic sandboxing technique might involve executing a dynamic process under a limited, and therefore sandboxed, account. Programmatically, you could analyze the dynamically built assembly's dependencies and not allow it to run if it used APIs in certain namespaces such as System.IO. This is non-trivial to say the least, though.
C++ doesn't have reflection capabilities and so 3rd party libraries would be your best bet.

The Dinkumware site has something like this.
A simple Perl (or Python, ...) CGI script could be used to invoke the compiler, parse its results, run the resulting executable if there is one, and display its results.
I would take a look at SELinux (maybe AppArmor?) for access controls: for example, not allowing it to read from or write to the disk, and limiting its running time. I don't know if the latter can be done with SELinux, too.

If the server runs Linux, you may consider using chroot

We actually did just that with our product called iKnode. We are using this idea to create a Backend in the cloud.
We did this by creating a SandBox that takes a specific piece of code and executes it, captures the result and returns it to the user. This is all done in the cloud.
How would you invoke the compiler, depending on the web server you're
using?
We did this by using the CodeDom utilities from the .Net framework. And we are exploring the coming 'compiler as a service' project coming from Microsoft code-named Roslyn.
This is a good starting point on using CodeDom to programmatically compile.
How could you sandbox the code to prevent malicious behavior? If we're
considering only one of the C variants, what about
blacklisting/whitelisting specific functions and libraries to prevent
malicious behavior? Or would that blacklist be too long and too
limiting to allow any fair amount of code to run?
We did this by wrapping the code execution in a separate and limited AppDomain. You can see some examples here.
Additionally, you might want to look into the MonoSandBox, which was created for Moonlight, but it is a more robust SandBox. We are experimenting with it right now, to move away from AppDomains. We believe the MonoSandBox is way better than simple AppDomains.

What is in a DLL and how does it work?

I'm always referencing DLLs in my C# code, but they have remained somewhat of a mystery which I would like to clarify. This is a sort of brain dump of questions regarding DLLs.
I understand a DLL is a dynamically linked library which means that another program can access this library at run time to get "functionality". However, consider the following ASP.NET project with Web.dll and Business.dll (Web.dll is the front end functionality and it references Business.dll for types and methods).
At what point does Web.dll dynamically link to Business.dll? You notice a lot in Windows HDD thrashing for seemingly small tasks when using Word (etc.) and I reckon that Word is going off and dynamically linking in functionality from other DLLs?
1a. Additionally, what loads and links the DLL - the OS or some run time framework such as the .NET framework?
1b. What is the process of "linking"? Are compatibility checks made? Loading into the same memory? What does linking actually mean?
What actually executes the code in the DLL? Does it get executed by the processor or is there another stage of translation or compilation before the processor will understand the code inside the DLL?
2a. In the case of a DLL built in C# .NET, what is running this: the .NET framework or the operating system directly?
Does a DLL from Linux work on a Windows system (if such a thing exists), or are they operating system specific?
Are DLLs specific to a particular framework? Can a DLL built using C# .NET be used by a DLL built with, for example, Borland C++?
4a. If the answer to 4 is "no" then what is the point of a DLL? Why dont the various frameworks use their own formats for linked files? For example: an .exe built in .NET knows that a file type of .abc is something that it can link into its code.
Going back to the Web.dll / Business.dll example - to get a class type of customer I need to reference Business.dll from Web.dll. This must mean that Business.dll contains some sort of a specification as to what a customer class actually is. If I had compiled my Business.dll file in, say, Delphi: would C# understand it and be able to create a customer class, or is there some sort of header info or something that says "hey sorry you can only use me from another Delphi DLL"?
5a. Same applies for methods; can I write a CreateInvoice() method in a DLL, compile it in C++, and then access and run it from C#? What stops or allows me from doing this?
On the subject of DLL hijacking, surely the replacement (bad) DLL must contain the exact method signatures and types as the one that is being hijacked. I suppose this wouldn't be hard to do if you could find out what methods were available in the original DLL.
6a. What in my C# program is deciding if I can access another DLL? If my hijacked DLL contained exactly the same methods and types as the original but it was compiled in another language, would it work?
What is DLL importing and DLL registration?

First of all, you need to understand the difference between two very different kinds of DLLs. Microsoft decided to go with the same file extensions (.exe and .dll) with both .NET (managed code) and native code, however managed code DLLs and native DLLs are very different inside.
1) At what point does web.dll dynamically link to business.dll? You
notice a lot in Windows HDD thrashing for seemingly small tasks when
using Word etc and I reckon that this Word going off and dynamically
linking in functionality from other DLL's?
1) In the case of .NET, DLLs are usually loaded on demand when the first method trying to access anything from the DLL is executed. This is why a FileNotFoundException or TypeLoadException can surface almost anywhere in your code if a DLL can't be loaded. When something like Word suddenly starts accessing the HDD a lot, it's likely swapping (reading back data that was swapped out to disk to make room in RAM).
1a) Additionally what loads and links the DLL - the O/S or some
runtime framework such as the .Net framework?
1a) In the case of managed DLLs, the .NET framework is what loads, JIT compiles (compiles the .NET bytecode into native code) and links the DLLs. In the case of native DLLs it's a component of the operating system that loads and links the DLL (no compilation is necessary because native DLLs already contain native code).
1b) What is the process of "linking"? Are checks made that there is
compatibility? Loading into the same memory? What does linking
actually mean?
1b) Linking is when references (e.g. method calls) in the calling code to symbols (e.g. methods) in the DLL are replaced with the actual addresses of the things in the DLL. This is necessary because the eventual addresses of the things in the DLL cannot be known before it's been loaded into memory.
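As a toy model of that idea (not how the Windows loader is actually implemented), the caller can be pictured as holding a table of symbol names whose addresses are filled in only once the library is in memory. All names below (`ImportSlot`, `link_imports`, `load_library_exports`) are made up for this sketch.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Function = int (*)(int);

// Stand-ins for code that lives inside the loaded library.
int twice(int x) { return 2 * x; }
int square(int x) { return x * x; }

// What the "DLL" exports: symbol name -> address. The addresses are
// only known after the library has been loaded.
std::map<std::string, Function> load_library_exports() {
    return {{"twice", &twice}, {"square", &square}};
}

// One entry in the caller's import table: a symbol it references and
// the slot that will hold the symbol's address.
struct ImportSlot {
    std::string symbol;
    Function address;
};

// "Linking": patch every slot with the address of the matching export,
// failing loudly if the library does not provide a required symbol.
void link_imports(std::vector<ImportSlot>& imports,
                  const std::map<std::string, Function>& exports) {
    for (auto& slot : imports) {
        auto it = exports.find(slot.symbol);
        if (it == exports.end()) {
            throw std::runtime_error("unresolved symbol: " + slot.symbol);
        }
        slot.address = it->second;
    }
}
```

After `link_imports` has run, a call through `imports[0].address` reaches the library's code directly; the unresolved-symbol error corresponds to the load failure you get when a DLL lacks an expected export.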
2) What actually executes the code in the DLL? Does it get executed by
the processor or is there another stage of translation or compilation
before the processor will understand the code inside the DLL?
2) On Windows, .exe files and .dll files are nearly identical. Native .exe and .dll files contain native code (the same stuff the processor executes), so there's no need to translate. Managed .exe and .dll files contain .NET bytecode which is first JIT compiled (translated into native code).
2a) In the case of a DLL built from C# .net what is running this? The
.Net framework or the operating system directly?
2a) After the code has been JIT compiled, it's run in exactly the same way as any other code.
3) Does a DLL from say Linux work on a Windows system (if such a thing
exists) or are they operating system specific?
3) Managed DLLs might work as-is, as long as the frameworks on both platforms are up to date and whoever wrote the DLL didn't deliberately break compatibility by using native calls. Native DLLs will not work as-is, as the file formats are different (even though the machine code inside is the same, if they're both for the same processor platform). By the way, on Linux, "DLLs" are known as .so (shared object) files.
4) Are they specific to a particular framework? Can a DLL built using
C# .Net be used by a DLL built with Borland C++ (example only)?
4) Managed DLLs are particular to the .NET framework, but naturally they work with any compatible language. Native DLLs are compatible as long as everyone uses the same conventions (calling conventions (how function arguments are passed on the machine code level), symbol naming, etc)
5) Going back to the web.dll / business.dll example. To get a class
type of customer I need to reference business.dll from web.dll. This
must mean that business.dll contains a specification of some sort of
what a customer class actually is. If I had compiled my business.dll
file in say Delphi would C# understand it and be able to create a
customer class - or is there some sort of header info or something
that says "hey sorry you can only use me from another delphi dll".
5) Managed DLLs contain a full description of every class, method, field, etc. they contain. AFAIK Delphi doesn't support .NET, so it would create native DLLs, which can't be used in .NET straightforwardly. You will probably be able to call functions with PInvoke, but class definitions will not be found. I don't use Delphi so I don't know how it stores type information with DLLs. C++, for example, relies on header (.h) files which contain the type declarations and must be distributed with the DLL.
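To make the "functions yes, classes no" point concrete, here is a sketch of what a native function meant to be called through PInvoke might look like on the C++ side. The function name `CreateInvoiceTotal` and the `INVOICE_API` macro are hypothetical. The key assumption is that the function is exported with C linkage (`extern "C"`), so its symbol name is not mangled by the C++ compiler and can be looked up by its plain name.

```cpp
// On Windows the symbol must also be marked for export from the DLL;
// on other platforms C linkage alone is enough for this sketch.
#if defined(_WIN32)
#define INVOICE_API extern "C" __declspec(dllexport)
#else
#define INVOICE_API extern "C"
#endif

// Hypothetical exported function: total price in cents for `quantity`
// items. Only plain C types cross the boundary, which is what makes
// the function easy to call from another language.
INVOICE_API int CreateInvoiceTotal(int unit_price_cents, int quantity) {
    return unit_price_cents * quantity;
}
```

On the C# side, a function like this would be declared with the `DllImport` attribute; a C++ class, by contrast, has no such language-neutral description in a native DLL.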
6) On the subject of DLL hijacking, surely the replacement (bad) DLL
must contain the exact method signatures, types as the one that is
being hijacked. I suppose this wouldnt be hard to do if you could find
out what methods etc were available in the original DLL.
6) Indeed, it's not hard to do if you can easily switch the DLL. Code signing can be used to avoid this. In order for someone to replace a signed DLL, they would have to know the signing key, which is kept secret.
6a) A bit of a repeat question here but this goes back to what in my
C# program is deciding if I can access another DLL? If my hijacked DLL
contained exactly the same methods and types as the original but it
was compiled in another language would it work?
6a) It would work as long as it's a managed DLL, made with any .NET language.
What is DLL importing? and dll registration?
"DLL importing" can mean many things, usually it means referencing a DLL file and using things in it.
DLL registration is something that's done on Windows to globally register DLL files as COM components to make them available to any software on the system.

A .dll file contains compiled code you can use in your application.
Sometimes the tool used to compile the .dll matters, sometimes not. If you can reference the .dll in your project, it doesn't matter which tool was used to code the .dll's exposed functions.
The linking happens at runtime, unlike statically linked libraries, such as your classes, which link at compile-time.
You can think of a .dll as a black box that provides something your application needs that you don't want to write yourself. Yes, someone understanding the .dll's signature could create another .dll file with different code inside it and your calling application couldn't know the difference.
HTH

1) At what point does web.dll dynamically link to business.dll? You
notice a lot in Windows HDD thrashing for seemingly small tasks when
using Word etc and I reckon that this Word going off and dynamically
linking in functionality from other DLL's?
1) I think you are confusing linking with loading. The link is when all the checks and balances are tested to be sure that what is asked for is available. At load time, parts of the dll are loaded into memory or swapped out to the pagefile. This is the HD activity you are seeing.
Dynamic linking is different from static linking in that in static linking, all the object code is put into the main .exe at link time. With dynamic linking, the object code is put into a separate file (the dll) and it is loaded at a different time from the .exe.
Dynamic linking can be implicit (i.e. the app links with a import lib), or explicit (i.e. the app uses LoadLibrary(ex) to load the dll).
In the implicit case, /DELAYLOAD can be used to postpone the loading of the dll until the app actually needs it. Otherwise, at least some parts of it are loaded (mapped into the process address space) as part of the process initialization. The dll can also request that it never be unloaded while the process is active.
COM uses LoadLibrary to load COM dlls. Note that even in the implicit case, the system is using something similar to LoadLibrary to load the dll either at process startup or on first use.
2) What actually executes the code in the DLL? Does it get executed by
the processor or is there another stage of translation or compilation
before the processor will understand the code inside the DLL?
2) Dlls contain object code just like .exes. The format of the dll file is almost identical to the format of an exe file. I have heard that there is only one bit that is different in the headers of the two files.
In the case of a DLL built from C# .net, the .Net framework is running it.
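The "one bit" mentioned above is most likely the IMAGE_FILE_DLL flag (0x2000) in the Characteristics field of the PE file header, although real .exe and .dll files usually differ in other details as well. A minimal sketch that checks this flag on an in-memory file image follows; the function names are illustrative, and only the header fields needed for the check are validated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian unsigned integer of `size` bytes at `offset`.
static std::uint32_t read_le(const std::vector<std::uint8_t>& data,
                             std::size_t offset, std::size_t size) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    }
    return value;
}

// Returns true if `image` looks like a PE file with the DLL flag set.
bool is_pe_dll(const std::vector<std::uint8_t>& image) {
    const std::uint32_t kImageFileDll = 0x2000;

    // DOS header: starts with "MZ"; e_lfanew at offset 0x3C points to
    // the PE signature.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    std::size_t pe = read_le(image, 0x3C, 4);

    // Need the 4-byte signature plus the 20-byte COFF file header.
    if (pe > image.size() || image.size() - pe < 24) return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) {
        return false;
    }

    // Characteristics is the last 2-byte field of the COFF header,
    // 18 bytes after the header starts.
    std::uint32_t characteristics = read_le(image, pe + 4 + 18, 2);
    return (characteristics & kImageFileDll) != 0;
}
```

To use it on a real file, read the file's bytes into the vector first.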
3) Does a DLL from say Linux work on a Windows system (if such a thing
exists) or are they operating system specific?
3) DLLs are platform specific.
4) Are they specific to a particular framework? Can a DLL built using
C# .Net be used by a DLL built with Borland C++ (example only)?
4) Dlls can interoperate with other frameworks if special care is taken or some additional glue code is written.
Dlls are very useful when a company sells multiple products that have overlapping capabilities. For instance, I maintain a raster i/o dll that is used by more than 30 different products at the company. If you have multiple products installed, one upgrade of the dll can upgrade all the products to new raster formats.
5) Going back to the web.dll / business.dll example. To get a class
type of customer I need to reference business.dll from web.dll. This
must mean that business.dll contains a specification of some sort of
what a customer class actually is. If I had compiled my business.dll
file in say Delphi would C# understand it and be able to create a
customer class - or is there some sort of header info or something
that says "hey sorry you can only use me from another delphi dll".
5) Depending on the platform, the capabilities of a dll are presented in various ways, thru .h files, .tlb files, or other ways on .net.
6) On the subject of DLL hijacking, surely the replacement (bad) DLL
must contain the exact method signatures, types as the one that is
being hijacked. I suppose this wouldnt be hard to do if you could find
out what methods etc were available in the original DLL.
6) dumpbin /exports and dumpbin /imports are interesting tools to use on .exe and .dll files.

Would like to make a PHP get_included_files() enhancement

I am interested in making an application that can automatically determine what files are included in php.
What I'm getting at is that I would like to make either a C/C++ or a C# application that runs in the background and as you're developing on your local machine, it can display included files by php as you launch pages running on your local apache.
What I thought about was to modify the function in php source code, but then I thought that would be a bad idea because then each new version of php, I'd have to go back and make the same modifications and I doubt everyone would do that.
So my question is, is it remotely possible to get all the included files that your php application used and then somehow display them to the user without using get_included_files() in your php program?

You could go outside of PHP completely and rely on the underlying operating system to report these details. It would be difficult to match the request to the includes though so it would only work in a development situation.
If the OS is Linux/UNIX, you can run strace on the executable (assuming using Apache with mod_php, other situations more difficult).
If the OS is Windows, I'm not sure what to use, but possibly one of the SysInternals utilities (most are GUI, but there is likely a console equivalent of strace or a version of strace for Windows).
Another option would be to use xdebug. It would show you much more information including profiling details, memory usage, etc. It is used as a PHP extension and it does make it easy to profile a whole request in one snapshot. Once you have a trace file, you can use WinCacheGrind (Windows), KCachegrind (UNIX, maybe OS X too) and something else for OS X. I'd suggest trying this as it is the simplest approach and is quite powerful if you are looking to get this done rather than do exploratory programming.
http://xdebug.org/
If you are interested in doing exploratory programming, my suggested route would be to look at how xdebug works and see if you can write a hook to the functions you want to trace.

Running scripts inside C#

I want to run javascript/Python/Ruby inside my application.
I have an application that creates process automatically based on user's definition. The process is generated in C#. I want to enable advanced users to inject script in predefined locations. Those scripts should be run from the C# process.
For example, the created process will have some activities and in the middle the script should be run and then the process continues as before.
Is there a mechanism to run those scripts from C#?

Basically, you have two problems: how to define the points of injection in your generated code, and how to run Python / Ruby / whatever scripts from there.
Depending on how you generate the process, one possible solution would be to add a function call at each possible point of injection. The function would check whether the user has associated any scripts with the given point and, if so, run them by invoking IronPython / IronRuby (optionally with parameters).
Disadvantages include: limited accessibility from the scripts to the created process (basically, only variables passed as parameters could be accessed); as well as implementation limits (IronPython's current version omits several basic system functions).
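The injection-point idea can be sketched independently of the scripting engine. In the following illustration the "scripts" are plain callables standing in for whatever IronPython / IronRuby invocation the real application would make, and every name (`InjectionPoints`, `attach`, `run`, `Variables`) is hypothetical. It also shows the limitation mentioned above: a script only sees the variables that are passed to it.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// The only state a script may touch: the variables handed to it.
using Variables = std::map<std::string, std::string>;
using Script = std::function<void(Variables&)>;

class InjectionPoints {
public:
    // Associate a user script with a named point in the process.
    void attach(const std::string& point, Script script) {
        scripts_[point].push_back(std::move(script));
    }

    // Called by the generated process at each injection point. Runs the
    // attached scripts in order and returns how many were run; points
    // without scripts are a no-op.
    std::size_t run(const std::string& point, Variables& vars) const {
        auto it = scripts_.find(point);
        if (it == scripts_.end()) return 0;
        for (const auto& script : it->second) {
            script(vars);
        }
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Script>> scripts_;
};
```

The generated process would call `run("some_point", vars)` between activities and then continue as before.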

Look into IronPython and IronRuby -- these will allow you to easily interoperate with C#.

You can compile C# code from within a C# application using the CSharpCodeProvider class.
If the compile succeeds you can run the resulting assembly as returned via the CompiledAssembly property of the CompilerResults class.

Awesome C# scripting language - Script.Net

.NET has a scripting language including runtime engine in PowerShell which can be embedded in any .NET application.

You can compile C# code "on the fly" into an in-memory assembly. I think this is possible with IronPython and IronRuby as well. Look at the CodeDomProvider.CreateProvider method.
If you need to run scripts a lot, or if your process runs for a long time, you might want to load these assemblies into another AppDomain and unload the AppDomain after you're done with the script. Otherwise you are unable to remove them from memory. This has some consequences for the other classes in your project, because you have to marshal all calls.

Have you thought about Visual Studio for Applications? I haven't heard much about it since .NET 1.1, but it might be worth a look.
http://msdn.microsoft.com/en-us/library/ms974548.aspx

I've done exactly this just recently - allowed run-time addition of C# scripting.
It's not hard at all, and this article:
http://www.divil.co.uk/net/articles/plugins/scripting.asp
is a very useful summary of the details.

One of Microsoft's solutions to JavaScript in C# is ClearScript,
which uses V8, the Chrome browser's JavaScript engine. Check its short FAQtorial for code samples.
It has excellent two-way integration - iterator/enumerator, output parameters, optional parameters, parameter arrays, delegate, task/promise/async/await, bigint, and more.
Apart from that, I think the most distinguishing feature is that it does not depend on Roslyn or the Dynamic Language Runtime. This can be good or bad: good because there may be far fewer dependencies (depending on your project's target), bad because you need to bundle the native, platform-dependent V8 dll.
If that is ok, you get to enjoy cutting edge JavaScript / ECMAScript. Everything you get on Chrome, or 98% ES6 as of 2022 Feb, plus several extensions. Speed is as fast as Chrome, obviously, so you get the best of both Google and Microsoft.
