Do C# optimizers perform copy elision?

I'm looking to optimize some C# code and I'm curious whether my optimizer is going to perform any copy elision, and if so, which kinds. I haven't been able to find any information on this by googling. If not, I may want to switch my methods to take struct arguments by reference. The environment I care about is Unity 3D, which is an unusual case because it uses either the Mono 2.0 runtime or Unity's IL2CPP transpiler (which I should probably ask Unity about directly, as it's closed-source). But it would be interesting to know for Microsoft's optimizer as well, and whether this type of optimization is generally allowed by the standard.
Side note: If this is not supported by the optimizer, it would be awfully nice if I could get the equivalent of a C++ const reference, but it appears this doesn't exist.

I can speak for IL2CPP, and say that it does not do anything to take struct arguments by reference. Even without access to the IL2CPP source code, you can see this by inspecting the generated C++ code.
Note that a C# struct is represented by a C++ struct, and that C++ struct is passed by value. We've discussed the possibility of using const references in this case, but we haven't implemented it yet (and we may never do so).
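To make the difference concrete, here is a small C++ sketch of what passing a struct by value costs compared with the const-reference alternative mentioned above. The Vector3 type, the global copy counter and both functions are hypothetical and are not IL2CPP's generated code; the counter simply records copy-constructor calls so you can see which calls copy.

```cpp
// Counts how many times a Vector3 is copy-constructed.
int g_vector3_copies = 0;

// Hypothetical stand-in for a C# struct that has been translated into a C++ struct.
struct Vector3 {
    float x, y, z;

    Vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    Vector3(const Vector3& other) : x(other.x), y(other.y), z(other.z) {
        ++g_vector3_copies;  // record every copy
    }
};

// Pass by value: an lvalue argument is copied into the parameter.
float LengthSquaredByValue(Vector3 v) { return v.x * v.x + v.y * v.y + v.z * v.z; }

// Pass by const reference: no copy is made, and the callee cannot modify the argument.
float LengthSquaredByConstRef(const Vector3& v) { return v.x * v.x + v.y * v.y + v.z * v.z; }
```

Calling LengthSquaredByValue(a) on a named variable performs one copy, LengthSquaredByConstRef(a) performs none, and LengthSquaredByValue(Vector3(0.0f, 0.0f, 1.0f)) also performs none: when the argument is a temporary, C++17 guarantees the parameter is constructed in place, which is the kind of copy elision the question asks about.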

Related

How do I import a struct from a C++ DLL?

How can I create a struct in C++, put it in a DLL, and use it from C# code? I'm developing an application using C# and C++ where the processes talk to each other using named pipes, and I'd like to share data between them via structs (pass raw bytes to the other process and then cast them to the struct type). But rather than define two structs, one in C++ and another in C# with the same memory alignment, member names and so on (which is very error-prone if I update the C++ one and forget the C# one), I'd like to create only one, so that there's only one place to change. My idea (if it's even possible; I suspect it isn't, due to P/Invoke limitations) is to define this struct in a DLL written in C++ and just use it from my C# application. This is just my idea; any different approach to solve this is very welcome too.
You can't. Structs and classes are not part of the exposed ABI; the DLL doesn't define them in any way. That's part of the reason for DLL hell in the C/C++ world: your function definitions state the name of the struct they take, but the details of that struct's implementation aren't actually exposed or defined anywhere in the DLL itself.
C and C++ rely on the use of header files to inform dependent projects of the layout of structs and classes, as that information isn't exposed via the DLL or lib file. Since it's not possible in native code, it's going to be doubly not possible (or, pedantically, just as impossible) in managed code.
If über performance is not a constraint, one alternative I'd recommend looking into is a serialization library. If you want to focus on rapid development and consistency, you can get some crazy things done with something like Google's protobuf, eliminating P/Invoke and compatibility concerns. There are also ways to generate both the C# and the C/C++ source code for the relevant structures from a script, ranging from a hack-it-yourself C/Python/Perl script that emits a struct for C and C#, to complete projects that focus on creating source code from language-agnostic struct definitions, but that's probably outside the scope of this answer.
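As a minimal sketch of the script-based approach, assuming a hand-written list of fields as the single source of truth, the generator below emits both a C/C++ definition and a matching C# definition. It is written in C++ only for consistency with the rest of this page; the Field type, the Emit helpers and the type pairs you feed in are all hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single source of truth for one shared struct.
struct Field {
    std::string name;
    std::string c_type;       // type as written in C/C++
    std::string csharp_type;  // matching blittable C# type
};

// Emit the C/C++ definition of the struct.
std::string EmitC(const std::string& name, const std::vector<Field>& fields) {
    std::ostringstream out;
    out << "typedef struct " << name << " {\n";
    for (const auto& f : fields) out << "    " << f.c_type << " " << f.name << ";\n";
    out << "} " << name << ";\n";
    return out.str();
}

// Emit the matching C# definition; LayoutKind.Sequential keeps the declared field order.
std::string EmitCSharp(const std::string& name, const std::vector<Field>& fields) {
    std::ostringstream out;
    out << "[StructLayout(LayoutKind.Sequential)]\n";
    out << "public struct " << name << " {\n";
    for (const auto& f : fields) out << "    public " << f.csharp_type << " " << f.name << ";\n";
    out << "}\n";
    return out.str();
}
```

You would run something like this as a build step and write the two strings to a header and a .cs file. LayoutKind.Sequential only fixes the field order; packing and alignment still have to agree with what the C compiler does, so sticking to fixed-size types helps.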

Disable generics in C# for Xamarin

I have a C# app that gets ahead-of-time compiled to native iOS code using MonoTouch (Xamarin).
Some of the libraries I link in use generics. However, it turns out that this method of compiling causes significant code bloat, because it uses C++-style template code generation, generating functions for List<int>, List<string>, etc.
What I want is the Java style of generics where generics are used for compile time checking but at runtime the code only contains functions for List and not for each of the templated types.
Note: This is not an issue when using C# on the .NET CLR, as explained here. The issue arises because the code is compiled AOT to a native binary instead of intermediate language. Moreover, runtime type checking for generic methods is fairly useless since the binary is native.
Question: How do I disable generics, i.e. replace all occurrences of List<T> with List, during compilation? Is this even possible?
This is not possible.
In Java it's possible because Java generics only work with classes, never with value types. You can emulate this behavior by not using value types yourself and only using List<object> (or an object subclass), in which case the AOT compiler will only generate one instantiation of List.
You're also not entirely correct in saying that it's not an issue with the .NET CLR. The difference between the .NET CLR and Xamarin's AOT compiler is that the AOT compiler can't wait until execution time to determine whether a particular instantiation is needed (because iOS doesn't allow executable code to be generated on the device), so it needs to make sure every possible instantiation is available. If your app on the .NET CLR happened to need every possible generic instantiation at runtime, you'd have a similar problem (only it would show up as runtime memory usage rather than executable size, and on a desktop that's usually not a problem anyway).
The supported way of solving your problem is to enable the managed linker for all assemblies (in the project's iOS Build options, set "Linker behavior" to "All assemblies"). This removes all the managed code your app isn't using, which in most cases significantly reduces the app size.
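The same idea can be sketched in C++ terms: keep one non-generic implementation that only stores untyped references, and put a thin typed wrapper in front of it so type checking happens at compile time. This is only an analogy for the List<object> suggestion, not how Xamarin's AOT compiler works; ObjectList and TypedList are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One non-generic implementation that stores untyped pointers, similar to how an
// erased Java list stores Object references. This code exists exactly once.
class ObjectList {
public:
    void Add(void* item) { items_.push_back(item); }
    void* Get(std::size_t index) const { return items_.at(index); }
    std::size_t Count() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Thin typed facade: the compiler checks element types at compile time, and each
// method only forwards and casts, so the per-type code is tiny and easily inlined.
template <typename T>
class TypedList {
public:
    void Add(T* item) { impl_.Add(item); }
    T* Get(std::size_t index) const { return static_cast<T*>(impl_.Get(index)); }
    std::size_t Count() const { return impl_.Count(); }

private:
    ObjectList impl_;
};
```

The catch is the same as with List<object>: elements are held by reference (here, by pointer), so value types lose their compact by-value storage.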
You can find more information about the managed linker here: http://developer.xamarin.com/guides/ios/advanced_topics/linker/
If your app is still too big, please file a bug (http://bugzilla.xamarin.com) attaching your project, and we'll have a look and see if we can improve the AOT compiler somehow (for instance we already optimize value types with the same size, so List<int> and List<uint> generate only one instantiation).

Decompiled DLL - Clues to help tell whether it was C# or VB.NET?

When using something like DotPeek to decompile a DLL, how do I tell whether it was originally coded in VB.Net or C#?
I gather there's no easy way to tell, but that there may be tell-tale signs (i.e., clues) in some of the decompiled code?
You can look for a reference to the Microsoft.VisualBasic library. If that is present, it's very probable that the code was made using VB. The library is sometimes included in C# projects also, but that is not very common. If the reference is not there, it's certainly not VB.
(Well, it's possible to compile VB without the library using the command line compiler and special compiler switches, but that is extremely rare.)
You can also check how frequently the VisualBasic library is used. In a regular VB program it would be used often, but in a C# program it would typically only be used for some specific task that isn't available in other libraries, like a DateDiff call.
Any VB-specific commands, like CInt or Mid, will show up as calls to the VisualBasic library, and even the = operator, when used on strings, will use the library. This code (where a and b are strings):
If a = b Then
will actually make a library call to do the comparison, and shows up like this when decompiled as C#:
if (Operators.CompareString(a, b, false) == 0) {
One possible route might be to look for named indexers, which aren't allowed in C#; in C# you can only declare the default indexer:
object this [int index] {get;set;}
whereas in managed C++ and VB.NET (I believe; I'll delete this if I'm wrong) it appears you can have named indexers.
So at least you could narrow down whether or not it was C#.
For completeness, I'll post the clue that I'm aware of:
If you decompile to C# and find invalid member names starting with $STATIC$:
private short $STATIC$Report_Print$20211C1280B1$nHeight;
... that means it was probably VB.NET, because the compiler uses such fields to implement VB's Static keyword (static local variables).
Hans Passant and Jon Skeet explain it better over here: https://stackoverflow.com/a/7311567/22194 https://stackoverflow.com/a/7310497/22194
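If you want to scan a decompiled assembly for this clue automatically, a hypothetical helper like the one below checks whether a field name has the $STATIC$method$tag$variable shape seen in the example above. The shape is inferred only from that one example, so the middle segment is treated as opaque rather than interpreted.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parts of a field name like "$STATIC$Report_Print$20211C1280B1$nHeight".
struct VbStaticLocal {
    std::string method;    // enclosing method, e.g. "Report_Print"
    std::string tag;       // opaque middle segment; its meaning is not assumed here
    std::string variable;  // the Static local's name, e.g. "nHeight"
};

// Returns the parsed parts if the name has the $STATIC$method$tag$variable shape.
std::optional<VbStaticLocal> ParseVbStaticLocal(const std::string& fieldName) {
    const std::string prefix = "$STATIC$";
    if (fieldName.rfind(prefix, 0) != 0) return std::nullopt;  // must start with the prefix

    // Split the remainder on '$'.
    std::vector<std::string> parts;
    std::istringstream rest(fieldName.substr(prefix.size()));
    std::string part;
    while (std::getline(rest, part, '$')) parts.push_back(part);

    if (parts.size() != 3) return std::nullopt;
    return VbStaticLocal{parts[0], parts[1], parts[2]};
}
```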
I'm surprised no one has mentioned the My namespace yet. It is very hard to get the VB.NET compiler not to include some of its helper classes in the output.
how do I tell whether it was originally coded in VB.Net or C#?
You can't tell that in a reliable manner. Of course, IL compiled with the VB.NET compiler will include references to some VB-specific assemblies (such as Microsoft.VisualBasic), but there's nothing preventing a C# project from also referencing and using those assemblies.
To build on the ideas introduced in the other answers: the assembly does not report what language was used to write it, but you may look for non-CLS-compliant code.
Being CLS-compliant means that the code is written against features available to all CLS-compliant languages. That means there are no public nested classes or named indexers, and probably a number of other features that IL may support but any particular language may not.
If it is an option, you could probably just look at the PDBs.

Why does a function need to be declared before it's defined or used?

In C it's optional. In C++ one "MUST" declare a function before it's used/defined. Why is that? What's the need? We don't do that in C# or Java.
The funny thing is that when we define a function, the definition itself contains a declaration, yet even then we need to declare it. God knows why.
Funny that you mention it: just this week Eric Lippert wrote a blog post related to your question:
http://blogs.msdn.com/ericlippert/archive/2010/02/04/how-many-passes.aspx
Basically, this is related to how the compiler works. The C# and Java compilers make several passes. If they encounter a call to a method that is not yet known, that's not an error, because the definition might be found later and the call will be resolved in the next pass. Note that my explanation is overly simplistic; I suggest you read Eric Lippert's post for a more complete answer.
Java and C# specify both the language and the binary object file format, and they are multi-pass compilers.
As a result, they are able to peek at later definitions or those that were compiled separately.
C doesn't work this way for several reasons:
Without using managed code it is a lot harder to define a machine-independent object format with type information
C deliberately allows bypassing the type mechanisms
When originally defined, there generally wasn't enough memory to run sophisticated compilers, nor were there prototypes to read anyway
C programs must be allowed to be arbitrarily large, with system-specific library and search path mechanisms. All of this gets in the way of defining an object-module-based type system
Part of the C portability and interoperation basis is the "input language only" nature of the specification
Until recently, even the limited one-pass nature of C was still barely practical for large programs. Something like Java or C# would have been out of the question: you could take a vacation and your make(1) would still not be done
Basically, it's down to how you write the compiler for the language.
In C++, the decision was made to make one-pass compilation possible. To do that, you (or rather the compiler) need to be able to first read the declarations of all classes, methods and the like, and then read the implementation (or, in C++ terms, the definition). In Java and C#, the compiler first reads through all the code, generating the equivalent of what the C++ compiler gets from reading the header files. The C#/Java compiler then reads the implementation (aka definition). So in C++ the developer is asked to write the declarations, whereas in C# the compiler runs through the code multiple times, doing the declaration work for the developer.
As an aside, other languages used to ask you to write the functions in the order you needed them (if function B uses function A, you have to define A first). Most of those languages had constructs to let you get around this. In (Turbo) Pascal, the solution was, in a way, the same as in C++ (forward declarations).
C++ vs. Java/C# - Single-pass compiler (C++) vs. multi-pass compiler (Java & C#). Multiple passes allow Java and C# compilers to peek at future types and function prototypes.
C++ vs. C - C's implicit function declaration is basically a bug, fixed in C++. It causes problems, and gcc warns about it (-Wimplicit-function-declaration). In C++ the argument types form part of the function's exported name (name mangling), so they must be known before the correct function can be called.
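A short C++ example of what the declaration buys you: IsEven calls IsOdd before IsOdd's definition has been seen, which a single pass can only handle because both prototypes were declared up front. The function names are illustrative.

```cpp
// Declarations (prototypes) come first, as a header would provide them...
bool IsEven(unsigned n);
bool IsOdd(unsigned n);

// ...so each definition can call the other, even though IsOdd is defined after IsEven.
bool IsEven(unsigned n) { return n == 0 ? true : IsOdd(n - 1); }
bool IsOdd(unsigned n) { return n == 0 ? false : IsEven(n - 1); }
```

Removing the prototype for IsOdd makes the call inside IsEven a compile error in C++, because at that point the compiler has not yet seen any declaration of IsOdd.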
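The last point can be seen directly in C++: overload resolution only considers the declarations visible at the point of the call, so the same call expression can pick a different function depending on what has been declared before it. Describe and the two caller functions are illustrative names.

```cpp
#include <string>

std::string Describe(double) { return "double"; }

// Only Describe(double) has been declared here, so the int argument is converted to double.
std::string CallBeforeIntOverload() { return Describe(1); }

std::string Describe(int) { return "int"; }

// Now both overloads are visible and Describe(int) is the exact match.
std::string CallAfterIntOverload() { return Describe(1); }
```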
In C++ one "MUST" declare a function before its used/defined. Why is it so? Whats the need? We don't do that in C# or Java.
I'd say that's not true. Yes, in C++ you have to declare a function signature (prototype) before referring to it, but you may leave the implementation for later.
In Java that does not work: you cannot call the method of some class without having that class compiled (note: together with its implementation) and available on the javac classpath. So Java is stricter in this sense.

What makes the Java compiler so fast?

I was wondering what makes the primary Java compiler (javac by Sun) so fast at compilation?
...as well as the C# .NET compiler from Microsoft.
I am comparing them with C++ compilers (such as G++), so maybe my question should have been, what makes C++ compilers so slow :)
That question was nicely answered in this one: Why does C++ compilation take so long? (as jalf pointed out in the comments section)
Basically it's the missing modules concept of C++, and the aggressive optimization done by the compiler.
I think the most difficult part is not the need to compile the header files (unless they are really big, but you can use precompiled headers in that case). The worst part is always the fact that C++'s grammar is wildly context-sensitive. Despite the fact that I like C++, I feel sorry for anybody who has to write a C++ parser.
There are a couple of things that make the C++ compiler slower than those of Java/C#. The grammar is much more complex, and generic programming support is much more powerful in C++ but at the same time more expensive to compile. Inclusion of files works differently from importing modules.
Inclusion of header files
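A classic illustration of that context sensitivity, sketched below: the tokens a * b are a pointer declaration if a names a type, and a multiplication if a names a variable, so the parser needs symbol information just to build the parse tree. The namespaces and functions are illustrative only.

```cpp
namespace when_a_is_a_type {
struct a {};

inline bool Parse() {
    a * b;  // 'a' names a type, so this statement declares b as a pointer to a
    b = nullptr;
    return b == nullptr;
}
}  // namespace when_a_is_a_type

namespace when_a_is_a_value {
inline int Parse() {
    int a = 6, b = 7;
    return a * b;  // 'a' names a variable, so the same tokens form a multiplication
}
}  // namespace when_a_is_a_value
```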
First, whenever you include a file in C++ the contents of the file (.h usually) are injected in the current compilation unit (include guards avoid reinjecting the same header twice), and this is transitive. That is, if you include header a.h, that in turns includes b.h, your compilation unit will include all code in a.h and all code in b.h.
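To make the include-guard mechanism concrete, the sketch below pastes the contents of a hypothetical b.h twice into one file, which is what the preprocessor effectively produces when both a.h and your .cpp include it. The B_H macro name is an assumption; any unique name works.

```cpp
// --- first copy of b.h: B_H is not defined yet, so struct B is defined ---
#ifndef B_H
#define B_H
struct B { int value; };
#endif

// --- second copy of b.h: B_H is already defined, so this copy is skipped ---
#ifndef B_H
#define B_H
struct B { int value; };
#endif

inline int ReadB(const B& b) { return b.value; }
```

The guard prevents a redefinition error, but it only works within one translation unit: every .cpp file that includes b.h still has to process its contents again.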
Java (or C#; I will talk about Java, but they are similar in this respect) doesn't have include files; it depends on the binaries from the compilation of the used classes. This means that whenever you compile a.java that uses an object B defined in b.java, the compiler just checks the binary b.class; it does not need to go deeper to check the dependencies of B, so it can cut the process short (with just one level of checking).
At the same time, an included file contains only source-level language definitions, and processing them takes time. When the Java/C# compiler reads a binary, it gets the same information, but already processed by the compilation step that generated it.
So at the end, in C/C++ more files are included and at the same time processing of those includes is more expensive than processing of binary modules.
Templates
Templates are special in their own way. They can be precompiled, but they are usually not (for a good set of reasons). This means that in all compilation units that use std::vector the whole set of vector methods used (unused template methods don't get compiled) is processed and the binary code generated by the compiler. At a later step, during linking, redundant definitions of the same method will get dropped, but during compilation they must be processed.
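A tiny sketch of per-type instantiation: each distinct type argument produces a separate instantiation of the function template below, each with its own static counter, which is the kind of per-type work an erased Java generic avoids. InstantiationCounter is an illustrative name.

```cpp
// Each distinct T yields its own instantiation of this function,
// and therefore its own static local variable.
template <typename T>
int& InstantiationCounter() {
    static int count = 0;
    return count;
}
```

Incrementing InstantiationCounter<int>() twice and InstantiationCounter<double>() once leaves the int counter at 2 and the double counter at 1, because the two are separate compiled functions.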
Support for generics in Java is more limited in many ways. In the end, for example, there is only one Vector class binary, and whenever the compiler sees Vector in Java, it generates type-checking code before delegating to the real Vector implementation, which stores plain Object references and is not generic. The compiler does provide the type guarantees, but it does not compile Vector for each type.
In C# it is, once again, different. C# support for generics is more complex than Java's, and in the end generic classes differ from plain classes, but they still get compiled only once, as the binary format carries all the required information.
Because they do something quite different: the C++ compiler produces optimized native code, whereas the C#, VB.NET and Java compilers produce an intermediate language that is turned into native code when you first execute the application. That is why you get slow loading of applications in Java etc. the first time you execute them.
The C++ compiler has to do the full optimization, whereas the JITed languages optimize when you execute the application.
Someone might argue that, to be correct, you have to compare C++ compile time with Java compile time plus the time for JITing when you first load the application, but I don't think that would be right and fair, because you are comparing native languages to JITed ones, i.e. apples to oranges.
The C++ compiler must repeatedly compile all the header files and there are lots of them, so this is one thing that slows it down.
One of the more time consuming tasks when compiling is code optimization.
Javac does very little optimization on the code when doing the compilation. Optimization is instead done by the JVM when running the application.
A C/C++ program needs to be optimized at compile time, since optimizing already-compiled machine code is hard.
You got it right in your last sentence: it's not Java or C# that's fast to compile, it's C++ that is exceptionally slow to compile, due to its complex grammar and features, most importantly templates.
If you think javac is fast, try Jikes (see http://jikes.sourceforge.net/).
It is a Java compiler written in C++. Unfortunately they haven't kept up with the latest Java specs, but if you want to see fast, this is it.
Tony
I think part of it is the complexity of the languages. C++ is incredibly mutable, with the ability to overload pretty much any operator or piece of syntax (like overloading the () operator). This means the compiler has to do a lot more work just to determine which operations to actually run, even for simple things. Java and C# don't have this issue, as their syntax is fixed, and they're generally much simpler to parse.
It's a bit difficult comparing bytecode languages like java with natively compiled languages like C++. A better comparison is Delphi vs C++, where Delphi is much faster to compile. Since this has nothing to do with optimization or byte code, it must be due to differences in language syntax and the relative performance of includes vs. modules/units.
Is the Java compiler fast?
The Java-to-class translation should be blindingly fast, since it is just a glorified zip with some syntax checking. To be fair, compared to a real compiler that does optimization and object code generation, the "translation" from Java to class files is trivial.
I did a comparison with a fairly small "hello world" program against GCC (C/C++/Ada) and found that javac was 30 times slower, and it got even worse at runtime.
