How is it possible to see C# code after compilation/optimization? - c#

I was reading about the yield keyword when I came across a sample chapter from C# in Depth: http://csharpindepth.com/Articles/Chapter6/IteratorBlockImplementation.aspx.
The first block of code utilizes the yield keyword to make a simple iterator. But, the second block of code shows this code after the compiler has had its way with it. Among other things, it has exploded the yield statement into a state machine.
Several other examples of code being modified by the compiler are evident on the page.
My question is: Was the author actually able to access the code after compilation, or did he infer what it was going to look like?
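To make the "state machine" idea concrete, here is a minimal sketch. The first class is an ordinary iterator block. The second is a hand-written approximation of the kind of class a compiler could generate for it. The names IteratorDemo, CountTo and CountToStateMachine are made up for illustration, and this is not actual compiler output: the real generated class has different names, more states, and extra thread and disposal handling.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class IteratorDemo
{
    // What the programmer writes: an iterator block using yield.
    public static IEnumerable<int> CountTo(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            yield return i;
        }
    }
}

// Hand-written approximation (an assumption, not real compiler output)
// of the same iterator expressed as an explicit state machine.
public sealed class CountToStateMachine : IEnumerable<int>, IEnumerator<int>
{
    private readonly int _limit;
    private int _state;    // 0 = not started, 1 = running, -1 = finished
    private int _i;        // the loop variable, hoisted into a field
    private int _current;  // the value most recently "yielded"

    public CountToStateMachine(int limit) { _limit = limit; }

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:            // first call: run the loop initializer
                _i = 1;
                _state = 1;
                break;
            case 1:            // resume after a yield: run the loop increment
                _i++;
                break;
            default:           // already finished
                return false;
        }

        if (_i <= _limit)
        {
            _current = _i;     // corresponds to "yield return i"
            return true;
        }

        _state = -1;
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Each enumeration gets a fresh state machine.
    public IEnumerator<int> GetEnumerator() => new CountToStateMachine(_limit);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Both IteratorDemo.CountTo(3) and new CountToStateMachine(3) can be consumed with foreach and produce the same sequence.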

You can have a look using Reflector, that's probably your best bet:
http://reflector.red-gate.com

The author himself mentioned:
Obviously the compiler doesn't actually produce C#, but I've used Reflector to decompile the code as C#.
in the same paragraph, titled High level overview: what's the pattern?

Probably both. It's quite easy to reverse-engineer compiled assemblies using Reflector. And the C# language spec, which defines how various syntactic-sugary things are compiled, is a public document. The author could have used either approach, or a mixture of the two.

Check out ildasm to take a look at the compiled IL.
(Really, it's good fun once you get your eye in)

The .NET CLR has an assembly-like language called MSIL (also known as CIL or simply IL), along with an assembler (ilasm) and a disassembler (ildasm). So yes, you can compile the code and then see the exact IL instructions the compiler produced.
https://web.archive.org/web/20211020103545/https://www.4guysfromrolla.com/articles/080404-1.aspx
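As a rough illustration that the compiled IL really is there to inspect, the sketch below reads the raw IL bytes of a method through reflection. The class name IlDump and its helper GetIl are hypothetical. The bytes are undecoded opcodes, so a tool such as ildasm is still needed to read them comfortably.

```csharp
using System;
using System.Reflection;

public static class IlDump
{
    // A trivial method whose compiled body we want to look at.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw IL bytes of a public method declared on this class.
    public static byte[] GetIl(string methodName)
    {
        MethodInfo method = typeof(IlDump).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }
}
```

Calling Console.WriteLine(BitConverter.ToString(IlDump.GetIl("Add"))) prints the bytes in hex. The exact bytes differ between Debug and Release builds, but the method body ends with the ret opcode (0x2A).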

dotPeek by JetBrains
https://www.jetbrains.com/decompiler/
It's free and easy to use :)

Related

Decompiled C# code shows errors

I have decompiled a C# DLL and now cannot compile it. It shows strange errors: for example, in some switch blocks some case blocks are missing their break statements, or errors like "Can not cast int to bool" are shown. But the number of errors is not very large for a DLL of that size, so I think it is not a problem with the decompiler.
Is there some directive for the compiler (for example, something like unsafe) that will solve this problem? Or why are there such strange errors?
P.S. The DLL is not broken; the application is using it right now. I'm using dotPeek to decompile and Visual Studio 15 to compile the resulting code.
For an error such as:
"Can not cast int to bool"
there is no compiler directive that will let the compiler proceed. I think your decompiler got something wrong during decompilation. Try another decompiler and see whether you get different results; valid alternatives are ILSpy, JustDecompile, and Dotnet IL Editor.
Be aware that some commercial DLLs are obfuscated precisely to make life difficult for decompilers and for the people who use them.
Be careful not to infringe any copyright.
It's quite hard to decompile IL to C#, and the decompiler has probably gotten something wrong. In my experience, decompiled and recompiled code can also behave differently from the original assembly, so be careful. A compile error like this can be an indication that dotPeek was not sure what to emit, so work out what the code should do, or look at the IL.
If you only want to make really small edits to the assembly (inject method calls, make something public), it may be safer to do it directly in the IL obtained from ildasm.

Decompiled DLL - Clues to help tell whether it was C# or VB.NET?

When using something like DotPeek to decompile a DLL, how do I tell whether it was originally coded in VB.Net or C#?
I gather there's no easy way to tell, but that there may be tell-tale signs (i.e. clues) in some of the decompiled code?
You can look for a reference to the Microsoft.VisualBasic library. If that is present, it's very probable that the code was made using VB. The library is sometimes included in C# projects also, but that is not very common. If the reference is not there, it's certainly not VB.
(Well, it's possible to compile VB without the library using the command line compiler and special compiler switches, but that is extremely rare.)
You can also check how frequently the VisualBasic library is used. In a regular VB program it would be used often, but in a C# program it would typically only be used for some specific task that isn't available in other libraries, like a DateDiff call.
Any VB specific commands, like CInt or Mid will show up as calls to the VisualBasic library, and even the = operator when used on strings, will use the library. This code (where a and b are strings):
If a = b Then
will actually make a library call to do the comparison, and shows up like this when decompiled as C#:
if (Operators.CompareString(a, b, false) == 0) {
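The reference check described above can be sketched with reflection. This is only a heuristic, for the reasons given in the answer, and the class and method names are hypothetical. The prefix match is an assumption meant to also catch Microsoft.VisualBasic.Core, the name the VB runtime library has on newer .NET versions.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class VbHeuristic
{
    // True if the assembly references the Visual Basic runtime library.
    // A hit suggests VB.NET but does not prove it; C# code can reference it too.
    public static bool ReferencesVisualBasic(Assembly assembly)
    {
        return assembly.GetReferencedAssemblies()
            .Any(reference => reference.Name.StartsWith(
                "Microsoft.VisualBasic", StringComparison.OrdinalIgnoreCase));
    }
}
```

For a DLL on disk, something like VbHeuristic.ReferencesVisualBasic(Assembly.LoadFrom(path)) would apply the check, where path is whatever file you are inspecting.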
One possible route might be to look for named indexers. They aren't allowed in C#; i.e., you can only have the following in C#:
object this [int index] {get;set;}
but in managed C++ and VB.NET (I believe; I will delete this if I'm wrong) it appears you can have named indexers.
So at least you could narrow down whether or not it was C#.
For completeness, I'll post the clue that I'm aware of:
If you decompile to C# and find invalid member names starting with $STATIC$:
private short $STATIC$Report_Print$20211C1280B1$nHeight;
... that means it was probably VB.Net, because the compiler uses those to implement the 'Static' VB keyword.
Hans Passant and Jon Skeet explain it better over here: https://stackoverflow.com/a/7311567/22194 https://stackoverflow.com/a/7310497/22194
I'm surprised no one has mentioned the My namespace yet. It is very hard to get the VB.NET compiler to not include some of its helper classes in the output.
how do I tell whether it was originally coded in VB.Net or C#?
You can't tell that in a reliable manner. Of course, IL compiled with the VB.NET compiler will include references to some VB-specific assemblies (such as Microsoft.VisualBasic), but there's nothing preventing a C# project from also referencing and using those assemblies.
To build on the ideas introduced in the other answers: the assembly does not report what language was used to write it, but you may look for non-CLS-compliant code.
Being CLS compliant means that the code is written against features available to all CLS compliant languages. Which means that there are no public nested classes or named indexers and probably a number of other features that IL may support but any particular language may not.
If it is an option, you could probably just look at the PDBs.

Is it possible to create/execute code in run-time in C#?

I know you can dynamically create a .NET assembly using Emit, System.Reflection and manually created IL code as shown here.
But I was wondering is it possible to dynamically create and execute C# code block real-time, in a running application. Thanks for any input or ideas.
Edit:
As I understand it, CodeDOM allows you to compile C# code into an EXE file rather than "just" executing it. Here's some background information on why (as far as I can tell) this isn't the best option for me. I'm creating an application that will have to execute such dynamically created code quite a lot [for the record, it's for academic research, not a real-world application, so this cannot be avoided]. Therefore, creating and executing thousands of dynamically created EXEs isn't really efficient. Secondly, all the dynamic code fragments return some data that is hard to read from a separately running EXE. Please let me know if I'm missing something.
As for the DynamicMethod approach pointed out by Jon Skeet, everything would work like a charm if there were an easier way to write the code itself than low-level IL.
In other words (very roughly speaking) I need something like this:
string x = "_some c# code here_";
var result = Exec(x);
Absolutely - that's exactly what I do for Snippy for example, for C# in Depth. You can download the source code here - it uses CSharpCodeProvider.
There's also the possibility of building expression trees and then compiling them into delegates, using DynamicMethod, or the DLR in .NET 4... all kinds of things.
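As one illustration of the expression-tree route, here is a minimal sketch that builds a small function at run time and compiles it to a delegate inside the running process, with no EXE written to disk. The class name and the particular formula are made up for the example. It does not parse C# source text, so it is not a drop-in Exec(string).

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Builds the equivalent of (x, y) => x * y + 1 at run time
    // and compiles it into an ordinary delegate.
    public static Func<int, int, int> BuildMultiplyAddOne()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Parameter(typeof(int), "y");

        Expression body = Expression.Add(
            Expression.Multiply(x, y),
            Expression.Constant(1));

        return Expression.Lambda<Func<int, int, int>>(body, x, y).Compile();
    }
}
```

The returned delegate is called like any other, for example var f = ExpressionDemo.BuildMultiplyAddOne(); int result = f(3, 4);. Compiling once and reusing the delegate keeps repeated execution cheap, which seems to match the "executed quite a lot" requirement in the question.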
Yes, it is. There are several applications that do just that - see LinqPad and Snippy.
I believe they use the CSharpCodeProvider.
Yes. See this MSDN page regarding using the CodeDOM.
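Some example code extracted from the above-mentioned page:
using System.CodeDom;

// Build the program's entry point (Main) as a CodeDOM node.
CodeEntryPointMethod start = new CodeEntryPointMethod();
// Represent the call System.Console.WriteLine("Hello World!").
CodeMethodInvokeExpression cs1 = new CodeMethodInvokeExpression(
    new CodeTypeReferenceExpression("System.Console"),
    "WriteLine", new CodePrimitiveExpression("Hello World!"));
// Add the call as a statement of the entry point.
start.Statements.Add(cs1);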
You can use CodeDom to generate code and create in-memory assemblies based on that generated code. Those can then be used in the current application.
Here's a quick link to the msdn reference, it is quite extensive material.
MSDN: Using the CodeDom

How to become an MSIL pro?

I have spent hours on a debugging problem only to have a more experienced guy look at the IL (something like 00400089 mov dword ptr [ebp-8],edx ) and point out the problem. Honestly, this looks like Hebrew to me - I have no idea what the heck it's saying.
Where can I learn more about this stuff and impress everyone around me? My goal is to read stuff like the following and make a comment like: Yeh, you are having a race condition.
.maxstack 2
.entrypoint
.locals init (valuetype [MathLib]HangamaHouse.MathClass mclass)
ldloca mclass
ldc.i4 5
That is not MSIL; it is 80x86 assembly language.
To get great at IL, start with this fantastic article: Introduction to IL Assembly Language. Although it says "introduction", it's everything you need to start getting comfortable.
The other part of what you need is practice and lots of it. Use .NET Reflector and start looking at code disassembled into IL. (Tip: when you go to download it, you don't have to provide a real email.) Also, play with the Reflexil plugin in Reflector. Here's a good starting point for that: Assembly Manipulation and C# / VB.NET Code Injection.
Not necessary but a bonus: Reflexil is open source. You can get the source here.
I can give you an answer that runs both ways.
On the one hand, there's nothing like good assembly language skills to teach you how a computer really operates. MSIL is, to some extent, an assembly-like language. On the down side, there are very few opportunities to do this kind of development any more.
On the other hand, resorting to looking at the MSIL to fix a problem is not necessarily the most direct or educational way to understand a problem. In five years of .NET programming, I've never felt the need to go there. Only once has a co-worker (who had worked at Microsoft on compiler testing) gone there with a problem that I was trying to solve, and in the end, his answer was misleading, because the real issue was based in CLR design and constraints. Better knowledge of the CLR and C# would have led to a better understanding and a real solution.
(In case you're wondering, the problem was that I wanted to use "as" for safe casting with a generic. "as" didn't work, but "is" did. My co-worker noted that "is" and "as" use the same MSIL. The real problem is that "as" only works for casting classes, and without the proper constraint on the generic declaration, C# doesn't know whether your generic type will be a class. In fact, the types I was using with generics were value types; "as" couldn't work for those at all.)
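The "as" versus "is" point can be shown in a few lines. This is a small sketch with made-up names, not the code from the original problem.

```csharp
using System;

public static class CastDemo
{
    // "as" needs to be able to produce null, so T must be known to be a
    // reference type. Without the "class" constraint this does not compile.
    public static T AsOrNull<T>(object value) where T : class
    {
        return value as T;
    }

    // "is" only answers yes or no, so it works for any T, value types included.
    public static bool IsA<T>(object value)
    {
        return value is T;
    }

    // For a value type, "as" is only allowed with the nullable form.
    public static int? AsNullableInt(object value)
    {
        return value as int?;
    }
}
```

With these helpers, CastDemo.AsOrNull<string>(5) gives null, CastDemo.IsA<int>(5) gives true, and CastDemo.AsNullableInt("text") gives null rather than throwing.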
Rather than going for MSIL skills, I highly recommend Jeffrey Richter's book CLR via C#. Even after years of digging hard into C#, this book is still full of revelations - I learn something from every page.
Can’t say I’m an IL “pro”, but I managed to teach myself pretty much all of IL by doing the following:
Write a very short (two or three line) C# program that you are curious about how to write in IL.
Compile the program.
Open the compiled EXE in .NET Reflector.
Look at the IL code for the method in Reflector.
Hover your mouse over an IL opcode (e.g. “ldloc”). There is a tooltip describing each IL instruction.
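A complementary way to practise, sketched below under the assumption that System.Reflection.Emit is available on your runtime, is to write a few opcodes by hand and run them. The method here is the hand-written IL for adding two integers, which is also what you would typically see in a decompiler for an optimized static int Add(int a, int b) method. The class name EmitDemo is hypothetical.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Emits the IL for "return a + b" by hand and wraps it in a delegate.
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "Add", typeof(int), new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument onto the stack
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // pop both, push their sum
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(
            typeof(Func<int, int, int>));
    }
}
```

Changing one opcode at a time (for example Add to Mul) and re-running is a quick way to check your understanding of what each instruction does to the evaluation stack.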

translate C++/CLI into C#

How can I translate a small C++/CLI project to C#?
One roundabout, manual way would be to compile your C++/CLI project and open the output assembly in Reflector. Disassemble each class, have it convert the disassembled IL to C#, and save that code off.
As for an automatic way to do it, I can't think of any off the top of my head.
Those things being said, are you sure you really want to convert your project to C#? If your C++/CLI project uses any unmanaged code, you'll have a difficult time coming up with a purely managed equivalent. If the project is more or less composed of pure CLR code, and it was written in C++/CLI for the sake of being written in C++/CLI, I can understand wanting to convert it to C#. But if there was a reason for writing it in C++/CLI, you may want to keep it that way.
IMHO, line by line is the best way. I've ported several C++ style projects to a managed language and I've tried various approaches; translators, line by line, scripting, etc ... Over time I've found the most effective way is to do it line by line even though it seems like the slowest way at first.
Too much is lost in a translator. No translator is perfect and you end up spending a lot of time fixing up the translated code. Also, translated code as a rule is ugly and tends to be less readable than hand crafted code. So the result is a fixed up, not very pretty code base.
A couple of tips I have on line by line:
Start by defining all of the leaf types
For every type that has a non-trivial (freeing memory) destructor, implement IDisposable
Turn on the FxCop rule that checks for missing Dispose calls, to catch all of the places where you used stack-based RAII and missed it
Pay very close attention to the uses of byref in C++.
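The IDisposable tip above can be sketched as follows. NativeBuffer is a hypothetical type standing in for a C++/CLI class whose destructor freed unmanaged memory; a real port would wrap whatever resource the original destructor released.

```csharp
using System;
using System.Runtime.InteropServices;

// C# counterpart of a C++ class with a non-trivial (memory-freeing) destructor.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;

    public NativeBuffer(int sizeInBytes)
    {
        _memory = Marshal.AllocHGlobal(sizeInBytes);
    }

    public bool IsDisposed => _memory == IntPtr.Zero;

    // Deterministic cleanup: the replacement for the C++ destructor.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case a caller forgets to call Dispose.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
    }
}
```

Where the C++ code declared the object on the stack and relied on scope exit, the C# port would wrap it in a using block, for example using (var buffer = new NativeBuffer(256)) { ... }, which is the pattern the missing-Dispose rule is meant to enforce.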
I haven't tried it, but I just googled it and found this: http://code2code.net/
According to it, you shouldn't fully rely on the code it produces:
You accept that this page does only half the work.
Futher work on your part is required. In most cases, the translated code will not even compile.
Also, read this: Translate C++/CLI to C#
