How to ensure that two .NET libraries are functionally the same

Following the idea of this question:
Is there a way to compare two .NET .dlls from the point of view of the CLI instructions and native instructions, to ensure that they are exactly the same, meaning that they will behave exactly the same at runtime?
A typical use case: you want to ensure that you have a reproducible build environment, i.e. that all developer machines compile exactly the same code as the build server.
The hash of the .dll file is not sufficient, because the .NET compiler doesn't guarantee that two identical .dlls are produced when compiling the same source code twice (in practice, a few bytes near the top and the tail of the .dll change between builds).
This question is similar to this former one, but the question didn't focus on the functional/behavior aspects of the .dlls, resulting in unclear answers and confused conversation.
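
To illustrate why a plain file hash fails and what a "hash that ignores the volatile bytes" might look like, here is a minimal Python sketch. The function names `pe_timestamp_range` and `masked_digest` are made up for this example. It only knows about one field that commonly differs between builds, the COFF `TimeDateStamp` in the PE header; other varying fields (for example the module version ID) would have to be located and passed in by the caller. It also says nothing about the native code the JIT later produces, so treat it as a starting point rather than a proof of equivalence.

```python
import hashlib
import struct


def pe_timestamp_range(data):
    """Return the (start, end) byte range of the COFF TimeDateStamp field."""
    # Offset 0x3C of the DOS header stores the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # The signature (4 bytes), Machine (2) and NumberOfSections (2) precede the timestamp.
    start = pe_offset + 8
    return start, start + 4


def masked_digest(data, volatile_ranges):
    """SHA-256 of `data` with every (start, end) range overwritten by zero bytes."""
    buf = bytearray(data)
    for start, end in volatile_ranges:
        buf[start:end] = bytes(end - start)
    return hashlib.sha256(buf).hexdigest()
```

Two builds would then be compared with something like `masked_digest(a, [pe_timestamp_range(a)]) == masked_digest(b, [pe_timestamp_range(b)])`, where `a` and `b` are the raw bytes of the two .dll files.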

One possible way could be to write unit tests. But I'm not sure whether that would also cover the requirement:
from the point of view of the CLI instructions and native instructions

Related

Automatically check for binary incompatibility between versions of a C# DLL?

I'm looking for any automatable way to check for binary incompatibility between versions of a C# DLL.
Something like Unix C++'s abi-compliance-checker but for .NET, and in particular for C#, would be great. This SO answer gives a screenshot of exactly the kind of info that I'd love to be able to get regarding two different versions of a C# API.
If there isn't any direct equivalent, then it would also be helpful to get any information at all about any automatable way to get some assurance that binary compatibility isn't broken between versions of a library.
This question seems to be the authoritative list on Stack Overflow of ways to break your API. There's also an article by Jon Skeet which discusses some of these issues. (And of course there are multiple other discussions of this issue on the internet; though there is a lot more of it focused on C++ than on C# and .NET, which is what I'm asking about here.)
Amongst other things, those articles help make clear that:
Binary and compilation compatibility and incompatibility are NOT trivial problems!
Binary (in)compatibility does not always entail compilation (in)compatibility nor vice versa
Even though (it might be worth noting here) at least one Microsoft article on the topic incorrectly/misleadingly states that it does in one direction
There are different levels of incompatibility: in particular, there are some obscure things, particularly to do with reflection, that users of a library can do which will make certain API changes binary incompatible, where they wouldn't have been otherwise
As a side note, this pretty much makes it seem like real world incompatibility isn't a yes/no question: there'll be certain theoretically incompatible changes which you might well be prepared to make within a major version of your API, if you don't expect normal users of the API to be doing these more obscure things with it
That fits with the different severities of incompatibility shown in the abi-compliance-checker screenshot in the SO answer linked above
So, given that everybody these days is supposed to be updating their APIs using semantic versioning, and also given that everybody is using test suites and probably devops as well, how on earth is everybody - or even anybody?! - automatically testing for binary incompatibility changes in .NET?
I can see that careful use of a test suite is enough to let you know if you've made compilation-incompatible changes, and most projects already have this. That is why my question is about binary-incompatible changes, which don't seem to be automatically detected at all in any projects I've looked at or worked on.
Just reading the above articles and being careful probably isn't enough... (though of course it's some kind of start) but I can't find any documentation about any other approach.
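
As a rough idea of what the simplest automated check could look like, here is a Python sketch that diffs two sets of public member signatures. Everything in it is an assumption made for illustration: the function names, the signature string format, and the premise that you already have some way to dump the public API of each version (that extraction step is not shown). It only flags signatures that disappeared, so it would miss behavioural changes and the reflection-related cases mentioned above.

```python
def diff_api(old_signatures, new_signatures):
    """Compare two collections of signature strings from two library versions."""
    old, new = set(old_signatures), set(new_signatures)
    return {
        # A caller compiled against the old version may still reference these.
        "removed": sorted(old - new),
        # Purely additive members do not break already-compiled callers.
        "added": sorted(new - old),
    }


def is_binary_breaking(old_signatures, new_signatures):
    """True if at least one previously public signature no longer exists."""
    return bool(diff_api(old_signatures, new_signatures)["removed"])
```

For example, changing a parameter from `int` to `long` shows up as one removed and one added signature: existing source that passes an `int` still compiles, but an already-compiled caller still references the old signature, which no longer exists.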

As a partial answer to my own question, and on the basis that something is better than nothing at all, it looks to me as if one possible pragmatic answer (i.e. it definitely won't catch all possible problems, but it should at least catch some...) would be to store compiled versions of old test suites and then to check whether these still run against compiled newer versions of the API.
In essentially all OSS projects I've worked on, the tests and the library are compiled at the same time. Anything which is a binary incompatibility but not a compilation incompatibility would never cause a failure when just running normal tests in the normal way, in such a project. But at least some such problems would show up with what I've suggested.
So I'm not just answering my question by saying 'have a test suite'! I'm suggesting something different from what I've seen normally done with test suites, that can (and does, I now use this approach) help with the problem my question is about.

What is the definition of "program" according to C#?

According to the C# spec 5.0 (sec 1.2)
The key organizational concepts in C# are programs, namespaces, types, members, and assemblies. C# programs consist of one or more source files. Programs declare types, which contain members and can be organized into namespaces. Classes and interfaces are examples of types. Fields, methods, properties, and events are examples of members. When C# programs are compiled, they are physically packaged into assemblies. Assemblies typically have the file extension .exe or .dll, depending on whether they implement applications or libraries.
But they never explain what a program is! Is a program the set of all source files that are used to create a single assembly? Or might a program be made up of several assemblies?
It matters because later the "program" is used to define other concepts, such as internal accessibility.
To Clarify: I'm asking, within the context of the C# 5.0 Specification, what is a "program"?

Either. Based on the above definition, a program is a concept rather than a physically defined boundary. As such, it could be a single dll or a large collection of assemblies.

Basically, a program is a set of instructions that can be executed on a computer to perform a specific task. It could be a collection of assemblies or a single assembly. It doesn't necessarily have to be a complete solution, i.e. it's not an application. Some may argue that the two are the same, but there's a difference: they are not synonyms.
To understand what a program is, you should understand the difference between a program and an application.
A computer program, or just a program, is a sequence of instructions, written to perform a specified task with a computer. - Wikipedia
On the other hand, an application is a set of programs working together to solve a complex problem.
Application software is a set of one or more programs designed to carry out operations for a specific application. - Wikipedia
E.g. To solve a specific business problem you might need an application, which internally performs multiple tasks to solve that problem.

John, I think the whole confusion is due to historical and terminological inconsistencies. At the beginning of the computer era, only programs existed. Everybody understood a computer program as ANY SET OF INSTRUCTIONS for a computer with defined functionality. It did not matter whether it was written on paper, punched on punch cards, or recorded on tape. Then new generations of software developers, languages, frameworks, etc. produced new terms, some of which had the same meaning but carried different names. I can point to a similar story with "function" (C, C++) vs "method" (C#). In general, there is no point in answering your question: it will not produce any practical results.

Thanks for the input. Apologies if my question was unclear. I meant to ask: within the context of the C# 5.0 Specification, what is a "program"? Rereading section 3.5.2 carefully, I now think it is clear that a "program" is intended to mean the set of source files that are used to create a single assembly (exe or dll). I do not think the specification is ambiguous: throughout the entire specification, "program" is used with this meaning. Unfortunately, it is never actually defined. Perhaps that will be improved in C# 6.0.

Why are Visual Studio projects restricted to a single language?

This question is inspired by the question: In what areas does F# make "absolute no sense in using"?
In theory, it should be possible to use any .NET supported language in a single project, since everything should be compiled into IL code and then linked into a single assembly.
Some benefits would include the ability to use, say, F# for one class, where F# is better suited to implementing its function, and C# for another.
Is there some technical limitation I'm overlooking that prevents this sort of setup?

A project is restricted to a single language because, under the hood, a project is not much more than an MSBuild script which calls one of the command-line compilers to produce an assembly from the various source code files "contained" in the project folder. There is a different compiler for each language (CSC.exe is for example the one for C#), and what each project has to do to turn its "contained" source code into an assembly differs with each language.
To allow multiple languages to be compiled into a single assembly, the project would basically have to produce an assembly for each language, then IL-Merge them. This is costly, requires complex automation and project file code generation, and in most circumstances it's a pretty fringe need, so the VS team simply didn't build it in.

While projects are restricted to a single language, a solution is not... and solutions can contain multiple projects.

As others mentioned, a project is a stand-alone unit that is compiled by a single compiler.
I hear the question about including e.g. one F# type in a larger C# project quite often, so I'll add some details from the F# specific point of view. There are quite a few technical problems that would make it really difficult to mix F# and C# in one project:
The C# compiler sees all classes, while F# type declarations are order-dependent. I'm not sure how you would even specify which types the F# code should see at which point.
The F# compiler needs to know how declarations are used in order to infer types. Would it also get usage information from the C# compiler?
The two compilers use different representations of types. When compiling, there is no System.Type information, so how would they share the information? (Both of them would need to agree on some common interface that allows them to include language-specific information, and that information may also be incomplete.)
I think this is enough to explain that doing this is not just a feature that may or may not be done depending on the schedule. It is actually an interesting research problem.

For what it's worth, it's possible to have ASP.NET projects that use C# and VB.NET (or anything else, you define the compilers in web.config), just in different files.

All code files are processed by a single compiler. That's why a project can only contain a single language.
Mixing languages may not make much sense either, since each language generates its own IL.
This of course doesn't restrict you from having multiple projects in different languages in the same solution, since each project is compiled independently.

Consider using ILMerge if you want to maintain a single .exe or .dll built by a number of different compilers.

Technically, you can mix languages in a single project, if one (or more) of those languages are scripting languages. See How to use Microsoft.Scripting.Hosting? for more details.
I know this isn't what you were talking about, but it's a little fun fact if you weren't aware.

The project file is nothing but an elevated list of command line parameters to the relevant compiler. A file with the extension of .csproj contains the parameters for a C# compiler, .vbproj for the VB.NET compiler and so on.
You can however create two or more projects in the same solution file, one for each language and then link them together in one exe file using ILMerge.
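
The "one project, one compiler" idea can be modelled in a few lines. This Python sketch is a deliberately simplified picture, not how MSBuild is actually implemented: the function name `compiler_invocation` and the two lookup tables are invented for illustration, and a real build passes many more arguments than a list of source files. The compiler names (csc.exe, vbc.exe, fsc.exe) are the real command-line compilers for C#, VB.NET and F#.

```python
from pathlib import PurePath

# Illustrative tables: project file extension -> compiler -> source extension.
COMPILER_BY_PROJECT_EXT = {".csproj": "csc.exe", ".vbproj": "vbc.exe", ".fsproj": "fsc.exe"}
SOURCE_EXT_BY_COMPILER = {"csc.exe": ".cs", "vbc.exe": ".vb", "fsc.exe": ".fs"}


def compiler_invocation(project_file, sources):
    """Build a toy command line: the project type picks exactly one compiler."""
    compiler = COMPILER_BY_PROJECT_EXT[PurePath(project_file).suffix.lower()]
    expected = SOURCE_EXT_BY_COMPILER[compiler]
    # Any file in another language has no compiler that could process it here.
    foreign = [s for s in sources if PurePath(s).suffix.lower() != expected]
    if foreign:
        raise ValueError("%s cannot compile: %s" % (compiler, ", ".join(foreign)))
    return [compiler] + list(sources)
```

In this model, `compiler_invocation("App.csproj", ["A.cs", "Math.fs"])` raises an error, which is the situation that forces the second language into its own project (and, if a single output file is wanted, a merge step afterwards).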

2 basic but interesting questions about .NET

When I first saw C#, I thought it must be some kind of joke. I was starting out with programming in C, but in C# you could just drag and drop objects and write event code for them. It was so simple.
Now, I still like C the most, because I am very attracted to basic low-level operations, and C is just the next level up from assembler, with a few basic routines. I like it even more because I write little apps for micro-controllers.
But yesterday I wrote a very simple control program in asm for my micro-controller based LED cube, and I needed a simple way to create animation sequences for the cube. So I remembered C#. I have practically NO C# skills, but I still created a simple GUI program to make animation sequences in about an hour, just with the help of Google and the embedded function descriptions in C#.
So, to get to the point: is there some reason other than top speed to use any other language than C#? I mean, it is so effective. I know that Java is somewhat similar, but I expect C# to be more effective on Windows since it's directly from Microsoft.
The second question is: what is the advantage of compiling into CIL and then running it on the CLR, rather than compiling directly into machine code? I know that portability is one, but since C# is mainly for Windows, wouldn't it be more powerful to just compile it directly? Thanks.

1 - Different languages have their pros and cons. There are families of languages (functional, dynamic, static, etc.) which are better for specific problem domains. You'd need to learn one in each family to know when to choose which one. E.g. to write a simple script, I'd pick Ruby over C#.
2 - Compiling to CIL: portability may not be a big deal, but to be precise, Mono has an implementation of the CLR on Linux. So there. CIL also helps you mix and match across languages that run on the CLR, e.g. IronRuby can access standard framework libraries written in C#. It also enables the CLR to leverage the actual hardware on which the program is run (e.g. turn on optimizations, use specific instructions). The CLRs on two different machines would each produce the best native code for their respective machine from the same IL.

Language and platform choice are a function of project goal. It sounds like you enjoy system level programming, which is one of the strong points of using C/C++. So, keep writing systems level code if that's what you enjoy.
Writing in C# is strong in rapid business application development where the goals are inherently different. Writing good working code faster is worth money in both man-hours and time to market. Microsoft does us a huge favor with providing an expressive language and a solid framework of functionality that prevents us from having to write low level code or tooling for 95% of business needs.

One important advantage of IL is language independence. You can decide that some modules in a project should be written in C++, some in C# and some in VB.NET. Each of these projects, when compiled, produces its own assembly (.dll/.exe). Thus you can use the assembly from the C++ project in the C# one and vice versa. This is possible because, no matter which (.NET supported) language you choose, they all compile to the same IL code.

I'm not sure that C# is more effective just because it is a Microsoft product. If you use Visual Studio, or another RAD tool, some of the code is auto-generated and is sometimes less efficient. Some years ago I was dogmatic, thinking only C could answer all our prayers :-P , but now I think virtual machines can help a lot by optimizing code before executing it (like an RDBMS), caching pieces of code to execute later, etc. That includes the possibility of creating "clusters" of virtual machines, as Terracotta does. At least, the benefits of having an extra abstraction layer outweigh those of not having it.

I agree with spoulson. C# is really good at solving business problems. You can very effectively create a framework that models your business processes and solve many of those problems with object orientation and design patterns. In that respect it provides much of the nice object-oriented capability that C++ has.
If you are concerned with speed, C is the route to go for the reasons that you stated.

Further on the second question: you can run NGEN to generate a native image of the assembly, which can improve performance. It's not quite a standalone native executable (the runtime is still required), but since it bypasses the JIT (just-in-time compile) phase, the app will tend to start faster.
http://msdn.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx
The Native Image Generator (Ngen.exe) is a tool that improves the performance of managed applications. Ngen.exe creates native images, which are files containing compiled processor-specific machine code, and installs them into the native image cache on the local computer. The runtime can use native images from the cache instead of using the just-in-time (JIT) compiler to compile the original assembly.

"is there some other reason than top speed, to use any other language than C#?"
I can think of at least four, all somewhat related:
I have a large current investment in 'language X', and I don't have the time or money to switch to something else. (Port an existing code base, buy/acquire/port libraries, re-develop team skills in C#, learn different tools.)
An anticipated need to port the code to a platform where C# is not supported.
I need to use tools that are not available in C#, or are not as well supported. (IDEs, alternate compilers, code generators, libraries, the list goes on and on...)
I've found a language that's even more productive. ;-)
"what is the advantage of compiling into CIL, and then run by CLR, rather than directly compile it into machine code?"
It's all about giving the runtime environment more control over the way the code executes. If you compile to machine code, a lot becomes 'set in stone' at that time. Deferring compilation to machine code until you know more about the runtime environment lets you optimize in ways you might not be able to otherwise. Just a few off the top of my head:
Deferring compilation lets you select instructions that more closely match your host CPU. (To use 64-bit native instructions when you have them, or the latest SSE extensions.)
Deferring compilation lets you optimize based on what is actually loaded at runtime. (If you have only one class at runtime that's derived from a specific interface, you can start to inline even virtual methods, etc.)
Garbage collectors sometimes need to insert checkpoints into user code. Deferring compilation lets the GC have more control and flexibility over how that's done.

First answer: C# should be used by default for new projects. There are a few cases where it hasn't caught up yet to C++ (in terms of multi-paradigm support), but it is heading in that direction.
Second answer: "portability" also includes x86 / x64 portability, which can be achieved by setting the platform to AnyCPU. Another (more theoretical at this point) advantage is that the JIT compiler can take advantage of the CPU-specific instruction set and thus optimize more effectively.

Hide c# windows application source code

I wrote a Windows application using C# .NET 2.0, and I want to hide the source code so that anyone using a decompiler tool such as Reflector can't see it.
I used Dotfuscator, but it just changed the function names, not all of the source code.
UPDATE:
I want to hide the source code, not because of hiding the key, but to hide how the code is working.
Thanks,

IL is by definition very expressive in terms of what remains in the body; you'll just have to either:
find a better (read: more expensive) obfuscator
keep the key source under your control (for example, via a web-service, so key logic is never at the client).

Well, the source code is yours and unless you explicitly provide it, you'll probably only be providing compiled binaries.
Now, these compiled binaries are IL code. To prevent someone "decompiling" and reverse engineering your IL code back to source code, you'll need to obfuscate the IL code. This is done with a code obfuscator. There are many in the marketplace.
You've already done this with dotfuscator, however, you say that it only changed the function names, not all the source code. It sounds like you're using the dotfuscator edition that comes with Visual Studio. This is effectively the "community edition" and only contains a subset of the functionality of the "professional edition". Please see this link for a comparison matrix of the features of the community edition and the professional edition.
If you want more obfuscation of your code (specifically to protect against people using tools such as Reflector), you'll need the professional edition of Dotfuscator, or another code obfuscator product that contains similar functionality.

As soon as people get a hand on your binaries they can reverse-engineer it. It’s easier with languages that are compiled to bytecode (C# and Java) and it’s harder with languages that are compiled to CPU-specific binaries but it’s always possible. Face it.

Try SmartAssembly
http://www.smartassembly.com/index.aspx

There are limits to the lengths obfuscation software can go to in hiding the contents of methods; fundamentally changing the internals without affecting correctness (and certainly performance) is extremely hard.
It is notable that code with many small methods tends to become far harder to understand once obfuscated, especially when the obfuscator shares names between methods in a way that appears to collide to the eye but not to the runtime.
Some obfuscators allow the generation of constructs which are not representable in any of the target languages; the set of all operations allowable in CIL, for example, is far larger than what is expressible through C# or even C++/CLI. However, this often requires an explicit setting to enable (since it can cause problems). This can cause decompilers to fail, but some will just do their best and work around it (perhaps inlining the IL they cannot handle).
If you distribute the PDBs with the app, then even more can be inferred due to the additional symbols.

Symbol renaming alone is not enough of a hindrance to reverse-engineering your app. You also need control flow obfuscation, string encryption, resource protection, metadata reduction, anti-Reflector defenses, etc. Try Crypto Obfuscator, which supports all this and more.

Create a setup project for your application and install it on your friend's computer like any other software. There are 5 steps to creating the setup project using Microsoft Visual Studio.
Step 1: Create a sample .NET project. I have named this project "TestProject". After that, build your project in release mode.
Step 2: Add a new project by right-clicking on your solution, select Setup Project and name it "TestSetup".
Step 3: Right-click on the setup project, choose Add primary Output and select your project from those displayed.
Step 4: Right-click the setup project and select View -> File System -> Application Folder. Now copy in whatever you want to be in the installation folder.
Step 5: Now go to the setup project folder and open the release folder; you will find the setup.exe file there. Double-click on the "TestSetup" file and install your project on your own and other computers.
