What makes the Java compiler so fast? - c#

I was wondering what makes the primary Java compiler (javac by Sun) so fast at compilation?
...as well as the C# .NET compiler from Microsoft.
I am comparing them with C++ compilers (such as G++), so maybe my question should have been, what makes C++ compilers so slow :)

That question was nicely answered in this one: Why does C++ compilation take so long? (as jalf pointed out in the comments section)
Basically it's C++'s lack of a module concept, plus the aggressive optimization done by the compiler.

I think the most difficult part is not the need to compile the header files (unless they are really big, but you can use precompiled headers in that case). The worst part is that C++'s grammar is wildly context-sensitive. Despite the fact that I like C++, I feel sorry for anybody who has to write a C++ parser.

There are a couple of things that make the C++ compiler slower than those of Java/C#. The grammar is much more complex, generic programming support is much more powerful in C++ but at the same time more expensive to compile, and inclusion of files works differently from importing modules.
Inclusion of header files
First, whenever you include a file in C++ the contents of the file (.h usually) are injected into the current compilation unit (include guards avoid reinjecting the same header twice), and this is transitive. That is, if you include header a.h, which in turn includes b.h, your compilation unit will contain all the code in a.h and all the code in b.h.
Java (or C#; I will talk about Java, but they are similar in this) doesn't have include files; it depends on the binaries produced by compiling the classes it uses. This means that whenever you compile a.java, which uses an object B defined in b.java, it just checks the binary b.class; it does not need to go deeper to check the dependencies of B, so it can cut the process short (with just one level of checking).
At the same time, including a header only brings in source-level declarations, and processing them takes time. When the Java/C# compiler reads a binary, it has the same information but already processed by the compilation step that generated it.
So in the end, in C/C++ more files are included, and at the same time processing those includes is more expensive than processing binary modules.
Templates
Templates are special in their own way. They can be precompiled, but they usually are not (for a good set of reasons). This means that in every compilation unit that uses std::vector, the whole set of vector methods used (unused template methods don't get compiled) is processed and binary code is generated for it by the compiler. At a later step, during linking, redundant definitions of the same method get dropped, but during compilation each one must be processed.
Support in Java for generics is more limited in many ways. In the end, for example, there is only one Vector class binary, and whenever the compiler sees Vector in Java, what it does is generate type-checking code before delegating to the real Vector implementation (which stores plain Object and is not generic). The compiler does provide the type guarantees, but it does not compile Vector for each type.
In C# it is, once again, different. C#'s support for generics is more complex than Java's, and in the end generic classes are different from plain classes, but either way they get compiled only once, as the binary format carries all the required information.
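As a small illustration of the C# side (a minimal sketch; List&lt;T&gt; is used here only as an example of a generic type), the generic class is compiled to IL once, and the runtime then specializes it per value type at JIT time while sharing a single native body across reference types:
using System;
using System.Collections.Generic;

class GenericsDemo
{
    static void Main()
    {
        // One List<T> exists in the compiled library; the JIT specializes
        // it for int here and shares one native body for all reference-type
        // instantiations such as List<string>.
        var numbers = new List<int> { 1, 2, 3 };
        var words = new List<string> { "one", "two", "three" };

        Console.WriteLine(numbers.Count + words.Count);   // prints 6
    }
}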

Because they do something quite different: the C++ compiler produces optimized native code, whereas the C#, VB.NET and Java compilers produce an intermediate language that is turned into native code the first time you execute the application. That is why you get slow loading of applications in Java etc. the first time you run them.
The C++ compiler has to do the full optimization up front, whereas the JITed languages optimize when you execute the application.
Someone might argue that, to be correct, you have to measure C++ compile time against Java compile time plus the time for JITing the first time you load the application, but I don't think that would be right or fair, because you would be comparing native languages to JITed ones, or in other words oranges to apples.

The C++ compiler must repeatedly compile all the header files and there are lots of them, so this is one thing that slows it down.

One of the more time consuming tasks when compiling is code optimization.
Javac does very little optimization on the code when doing the compilation. Optimization is instead done by the JVM when running the application.
A C/C++ program needs to be optimized at compile time, since optimizing already-compiled machine code is hard.

You got it right in your last sentence: it's not Java or C# that's fast to compile, it's C++ that is exceptionally slow to compile, due to its complex grammar and features, most importantly templates.

If you think javac is fast, try Jikes (see http://jikes.sourceforge.net/).
It is a Java compiler written in C++. Unfortunately they haven't kept up with the latest Java compiler specs, but if you want to see fast, this is it.
Tony

I think part of it is the complexity of the languages. C++ is incredibly mutable, with the ability to overload pretty much any operator or piece of syntax (like overloading the () operator). This means the compiler has to do a lot more work just to determine what operations to actually run, even for simple things. Java and C# don't have this issue, as their syntax is fixed and they're generally much simpler to parse.

It's a bit difficult comparing bytecode languages like java with natively compiled languages like C++. A better comparison is Delphi vs C++, where Delphi is much faster to compile. Since this has nothing to do with optimization or byte code, it must be due to differences in language syntax and the relative performance of includes vs. modules/units.

Is the Java compiler fast?
The Java-to-class translation should be blindingly fast, since it is just a glorified zip with some syntax checking; so, to be fair, compared to a real compiler that does optimization and object code generation, the "translation" from Java to class is trivial.
I did a comparison with a fairly small "hello world" program against GCC (C/C++/Ada) and found that javac was 30 times slower, and it got even worse at runtime.

Related

Are all languages used within .NET equally performant?

I know the "Sales pitch" answer is yes to this question, but is it technically true.
The Common Language Runtime (CLR) is designed as an intermediate language based on Imperative Programming (IP), but this has obvious implications when dealing with Declarative Programming (DP).
So how efficient is a language based on a different paradigm than the Imperative Style when implemented in the CLR?
I also get the feeling that the step to DP would incur an extra level of abstraction that might not model at all performant, would this be a fair comment?
I have done some simple tests using F# and it all looks great, but am I missing something if the programs get more complex?
There is no guarantee that languages produce the same IL for equivalent code, so I can safely say that there is no guarantee that all .NET languages are equally performant.
However, if they produce the same IL output, then there is no difference.
First of all, the wide range of languages on the .NET platform definitely contains languages that generate code with different performance, so not all languages are equally performant. They all compile to the same intermediate language (IL), but the generated code may be different, some languages may rely on Reflection or dynamic language runtime (DLR) etc.
However, it is true that the BCL (and other libraries used by the languages) will have the same performance regardless of which language you call them from. This means that if you use some library that does expensive calculations or rendering, and you don't do complex calculations yourself, it doesn't really matter which language you use to call it.
I think the best way to think about the problem is not to think about languages, but about different features and styles of programming available in those languages. The following lists some of them:
Unsafe code: You can use unsafe code in C++/CLI and, to some extent, also in C#. This is probably the most efficient way to write certain operations, but you lose some safety guarantees.
Statically typed, imperative: This is the usual style of programming in C# and VB.NET, but you can also use an imperative style from F#. Notably, many tail-recursive functions are compiled to statically typed, imperative IL code, so this also applies to some F# functions.
Statically typed, functional: This is used by most F# programs. The generated code is largely different from what the imperative category uses, but it is still statically typed, so there is no significant performance loss. Comparing imperative and functional is somewhat difficult, as the optimal implementation looks quite different in the two styles.
Dynamically typed: Languages like IronPython and IronRuby use the dynamic language runtime, which implements dynamic method calls etc. This is somewhat slower than statically typed code (but the DLR is optimized in many ways). Note that code written using C# 4.0 dynamic also falls into this category (see the small sketch after this list).
There are many other languages that may not fall into any of these categories, however I believe that the above list covers most of the common cases (and definitely covers all Microsoft languages).
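A minimal sketch of that last category (ordinary C# 4.0 dynamic; nothing beyond the standard binder is assumed). The '+' below is bound at run time through DLR call sites rather than at compile time:
using System;

class DynamicDemo
{
    static void Main()
    {
        dynamic x = 2;              // static type checking is deferred
        dynamic y = 40;

        // The meaning of '+' is resolved at run time by the DLR binder
        // and the resolution is cached at the call site for later calls.
        Console.WriteLine(x + y);   // prints 42
    }
}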
I'm sure that there are scenarios where idiomatic code is slightly more performant when written in one .NET language than another. However, stepping back a bit, why does it matter? Do you have a performance target in mind? Even within a single language there are often choices which you can make that affect performance, and you will sometimes need to trade performance off against maintainability or development time. If you don't have a target for what constitutes acceptable performance, then it's impossible to evaluate whether any performance differences between languages are meaningful or negligible.
Additionally, compilers evolve, so what's true of the relative performance today won't necessarily hold going forward. And the JIT compiler is evolving too. Even processor designs are variable and evolving, so the same JITTed native code can perform differently across processors with different cache hierarchies, pipeline sizes, branch prediction, etc.
Having said all of that, there are probably a few broad rules that largely hold true:
Algorithm differences are probably going to make a bigger difference than compiler differences (at least when comparing statically typed languages running on the CLR)
For problems which can be parallelized easily, languages which make it easy to take advantage of multiple processors/cores will provide a simple way to speed up your code.
At the end of the day, all programming languages are compiled into the native machine code of the CPU they're running on, so the same questions could be asked of any language at all (not just ones that compile to MSIL).
For languages that are essentially just syntactic variants of each other (e.g. C# vs. VB.NET), I wouldn't expect there to be much difference. But if the languages are too divergent (e.g. C# vs. F#), you can't really make a valid comparison, because you can't write two "equivalent" non-trivial code samples in both languages anyway.
The language can just be thought of as the "front-end" to the IL code, so the only difference between the languages is whether the compiler will produce the same IL code or less/more efficient code.
From most of what I've read online it seems as though the managed C++ compiler does the best job of optimizing the IL code, although I haven't seen anything that shows a remarkable difference between the main languages C#/C++/VB.NET.
You could even try compiling the following into IL and taking a look.
F#
#light
open System
printfn "Hello, World!\n"
Console.ReadKey(true)
C#
// Hello1.cs
public class Hello1
{
    public static void Main()
    {
        System.Console.WriteLine("Hello, World!");
        System.Console.ReadKey(true);
    }
}

Why does a function need to be declared before it's defined or used?

In C it's optional. In C++ one "MUST" declare a function before it's used/defined. Why is it so? What's the need? We don't do that in C# or Java.
The funny thing is that while we are defining a function, the definition itself contains a declaration; even then, we need to declare it separately. God knows why.
Funny that you mention that: just this week Eric Lippert wrote a blog post related to your question:
http://blogs.msdn.com/ericlippert/archive/2010/02/04/how-many-passes.aspx
Basically, this is related to how the compiler works. The C# and Java compilers make several passes. If they encounter a call to a method that is not yet known, that's not an error, because the definition might be found later and the call will be resolved in a subsequent pass. Note that my explanation is overly simplistic; I suggest you read Eric Lippert's post for a more complete answer...
Java and C# specify both the language and the binary object file format, and they are multi-pass compilers.
As a result, they are able to peek at later definitions or those that were compiled separately.
C doesn't work this way for several reasons:
Without using managed code it is a lot harder to define a machine-independent object format with type information
C deliberately allows bypassing the type mechanisms
When originally defined, there generally wasn't enough memory to run sophisticated compilers, nor were there prototypes to read anyway
C programs can be arbitrarily large, with system-specific library and search-path mechanisms. All of this gets in the way of defining an object-module-based type system
Part of the C portability and interoperation basis is the "input language only" nature of the specification
Until recently, even the limited one-pass nature of C was still barely practical for large programs. Something like Java or C# would have been out of the question: you could take a vacation and your make(1) would still not be done
Basically, it's down to how you write the compiler for the language.
In C++, the decision was made to make one-pass compilation possible. To do that, you (or rather the compiler) need to be able to first read the declarations of all classes, methods and the like, and only then read the implementations (or, in C++ terms, the definitions). In Java and C#, the compiler first reads through all the code, generating what corresponds to what the C++ compiler generates when reading the header files. The C#/Java compiler then reads the implementation (a.k.a. definition). So, in C++, the developer is asked to write the declarations, whereas in C#, the compiler runs through the code multiple times, doing the declaration work for the developer.
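A minimal C# sketch of what the multi-pass approach allows (plain C#, nothing special assumed): a method can be called before its definition appears in the file.
using System;

class Demo
{
    static void Main()
    {
        Greet();    // resolved even though Greet is defined further down
    }

    static void Greet()
    {
        Console.WriteLine("Declared after the call site, and that's fine in C#");
    }
}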
As an aside, other languages used to ask you to write functions in the order you needed them (if function B uses function A, you have to define A first). Most of those languages had constructs to let you get around this. In (Turbo) Pascal, the solution was, in a way, the same as in C++.
C++ vs. Java/C# - Single-pass compiler (C++) vs. multi-pass compiler (Java and C#). Multiple passes allow the Java and C# compilers to peek at future types and function prototypes.
C++ vs. C - The C feature of implicit (default) declarations is basically a bug, fixed in C++. It causes problems, and it is an enabled warning in gcc. In C++ the argument types form part of the function's exported name (name mangling), so they must be known before the correct function can be called.
In C++ one "MUST" declare a function before its used/defined. Why is it so? Whats the need? We don't do that in C# or Java.
I would like to say, that is not true. Yes, in C++ you have to define a function signature (prototype), before referring to it. But you may leave the implementation for a later time.
In Java that does not work: you cannot call the method of some class without having that class compiled (note: together with implementation) and available in javac classpath. So, Java is more strict in this sense.

translate C++/CLI into C#

How can I translate a small C++/CLI project to C#?
One roundabout, manual way would be to compile your C++/CLI project and open the output assembly in Reflector. Disassemble each class, have it convert the disassembled IL to C#, and save that code off.
As for an automatic way to do it, I can't think of any off the top of my head.
Those things being said, are you sure you really want to convert your project to C#? If your C++/CLI project uses any unmanaged code, you'll have a difficult time coming up with a purely managed equivalent. If the project is more or less composed of pure CLR code, and it was written in C++/CLI for the sake of being written in C++/CLI, I can understand wanting to convert it to C#. But if there was a reason for writing it in C++/CLI, you may want to keep it that way.
IMHO, line by line is the best way. I've ported several C++-style projects to a managed language and I've tried various approaches: translators, line by line, scripting, etc. Over time I've found the most effective way is to do it line by line, even though it seems like the slowest way at first.
Too much is lost in a translator. No translator is perfect and you end up spending a lot of time fixing up the translated code. Also, translated code as a rule is ugly and tends to be less readable than hand crafted code. So the result is a fixed up, not very pretty code base.
A couple of tips I have on going line by line:
Start by defining all of the leaf types
For every type that has a non-trivial destructor (one that frees memory), implement IDisposable (see the sketch after this list)
Turn on the FxCop rule that checks for missing Dispose calls, to catch all of the places where you used stack-based RAII and missed it
Pay very close attention to the uses of byref in C++.
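A minimal sketch of the IDisposable tip above, using a hypothetical NativeBuffer type that owns unmanaged memory (the AllocHGlobal buffer stands in for whatever the original C++ destructor used to free):
using System;
using System.Runtime.InteropServices;

sealed class NativeBuffer : IDisposable
{
    private IntPtr _buffer = Marshal.AllocHGlobal(1024);   // unmanaged resource
    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_buffer);    // the work the C++ destructor used to do
        _buffer = IntPtr.Zero;
        _disposed = true;
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer()                      // safety net if Dispose is never called
    {
        Dispose();
    }
}

Call sites that used stack-based RAII in C++ then become using blocks, e.g. using (var buf = new NativeBuffer()) { ... }.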
I haven't tried it, but I just googled it and found this: http://code2code.net/
According to it, you shouldn't fully rely on the code it produces:
You accept that this page does only half the work.
Further work on your part is required. In most cases, the translated code will not even compile.
Also, read this: Translate C++/CLI to C#

Is C# code faster than Visual Basic.NET code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Is C# code faster than Visual Basic.NET code, or that is a myth?
That is a myth. They both compile down to IL and run on the same CLR. However, the IL each compiler emits for the same routine may come out slightly differently. So for certain routines one may be slightly (0.0000001%) faster in C# and vice versa for VB.NET, but they are both running on the same common runtime, so they are the same in performance where it counts.
The only reason the same code in VB.NET might be slower than in C# is that VB defaults to having checked arithmetic on and C# doesn't.
By default, arithmetic operations and overflows in Visual Basic are checked; in C#, they are not.
If you disable that, the resulting IL is likely to be identical. To test this, take your code and run it through Reflector; you will see that it looks very similar if you switch between the C# and VB.NET views.
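A minimal C# sketch of that default difference (checked is what VB.NET applies to integer arithmetic unless you turn it off; unchecked is C#'s default):
using System;

class OverflowDemo
{
    static void Main()
    {
        int big = int.MaxValue;

        unchecked                              // C#'s default for integer arithmetic
        {
            Console.WriteLine(big + 1);        // silently wraps to -2147483648
        }

        try
        {
            checked                            // the behaviour VB.NET gets by default
            {
                Console.WriteLine(big + 1);    // never reached
            }
        }
        catch (OverflowException)
        {
            Console.WriteLine("OverflowException in the checked block");
        }
    }
}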
It is possible that an optimization (or just a difference in behaviour) in the C# compiler versus the VB.NET compiler might lead to one slightly favouring the other. This is:
Unlikely to be significant
If it was, it would be low-hanging fruit to fix
Unlikely to happen.
C# and VB.NET's abstract syntax trees are very close in structure. You could automatically transliterate a great deal of VB.NET into C# and vice versa. What is more, the result would stand a good chance of looking idiomatic.
There are a few constructs in c# not in vb.net such as unsafe pointers. Where used they might provide some benefits but only if they were actually used, and used properly. If you are down to that sort of optimization you should be benchmarking appropriately.
Frankly if it makes a really big difference then the question should not be "Which of c#/vb.net should I use" you should instead be asking yourself why you don't move some code over to C++/CLI.
The only way I could think of that the different compilers could introduce serious, pervasive differences is if one chose to:
Implement tail calls in different places
These can make things faster or slower and certainly would affect things on deeply recursive functions. The 4.0 JIT compiler on all platforms will now respect all tail call instructions even if it has to do a lot of work to achieve it.
Implement iterator blocks or anonymous lambdas significantly more efficiently
I believe both compilers are about as efficient at a high level as they are going to get in this regard, though. Both languages would require explicit support for the 'yield foreach' style available to F#'s sequence generators.
Box when it was not necessary, perhaps by not using the constrained opcode
I have never seen this happen but would love an example where it does.
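For context, a minimal sketch of what the constrained opcode buys (ordinary C#; Money is just a hypothetical struct): calling a virtual method on a value type through a generic type parameter does not box, while going through object does.
using System;

struct Money
{
    public decimal Amount;
    public override string ToString() => Amount.ToString("0.00");
}

class BoxingDemo
{
    // Compiled with a 'constrained.' callvirt: no boxing for value types
    // that override the method.
    static string Describe<T>(T value) => value.ToString();

    // Forcing the value through object boxes the struct first.
    static string DescribeBoxed(object value) => value.ToString();

    static void Main()
    {
        var m = new Money { Amount = 12.5m };
        Console.WriteLine(Describe(m));        // no box
        Console.WriteLine(DescribeBoxed(m));   // one box
    }
}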
Both the C# and VB.NET compilers currently leave such optimization complexities as enregistration of variables, calling conventions, inlining and unrolling entirely up to the common JIT compiler in the CLR. This is likely to have far more of an impact than anything else (especially now that the 32-bit and 64-bit JITs can behave quite differently).
The framework is written in C#, but that still does not tell us anything about performance differences between C# and VB, as everything is compiled to IL, which is then actually executed (JITted and so on).
It is the responsibility of each language's compiler to decide what kind of IL it produces from the source code. If one compiler produces better-suited IL than another, then there could be a performance difference. I don't know of any areas where they would produce drastically different IL, and I doubt the differences would be huge.
A completely different aspect is C#'s ability to use unsafe code, such as raw pointers, which can give a performance edge in special scenarios.
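A minimal sketch of the kind of special-scenario unsafe code meant here (requires compiling with unsafe code enabled; the summing loop is only an illustrative example):
using System;

class UnsafeDemo
{
    // Sums an array through a pinned raw pointer, skipping bounds checks.
    static unsafe long Sum(int[] values)
    {
        long total = 0;
        fixed (int* p = values)          // pin the array so the GC cannot move it
        {
            for (int i = 0; i < values.Length; i++)
                total += p[i];           // raw pointer access, no bounds check
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3, 4 }));   // prints 10
    }
}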
There might be a slight difference in the compiler optimization, but I'd say there is no noticeable difference. Both C# and VB.NET compile to Common Intermediate Language. In some cases, you may be able to get a significant performance gain in C# by using unsafe code, but under most circumstances I wouldn't recommend doing so. If you need something that performance critical, you shouldn't use C# either.
The myth probably started because of the huge difference in Visual Basic 6 performance compared to the average C++ application.
I was at a Microsoft conference and the MS employees stated that C# is up to 8% faster than VB.NET. So if this is a myth, it was started by the people who work at MS. If I can find the slides that state this I will post them, but this was when C# had just come out. I think that even if it was true at one point in time, the only reason for one to be faster than the other is how things are configured by default, as ShuggyCoUk said.
It depends on what you're doing. I think there's no real difference between VB and C#. Both of them are .NET languages and they're compiled to IL.
More info? Read this:
http://devlicio.us/blogs/robert_dunaway/archive/2006/10/19/To-use-or-not-use-Microsoft.VisualBasic.dll-_2800_all-.NET-Languages-could-benefit_3F002900_.aspx
As usual, the answer is that it depends... By itself, no, VB.NET is not slower than C#, at least not by anything you will notice. Yes, there will be slight differences in compiler optimization, but the IL generated will be essentially the same.
However, VB.NET comes with a compatibility library for programmers used to VB6. I am thinking of those string methods like Left, Right and Mid that old VB programmers would expect. Those string-manipulation functions are slower. I'm not sure you would notice an impact, but depending on the intensity of their use, I'd bet the answer would be yes. Why are those methods slower than the "native" .NET string methods? Because they are less type-safe. Basically, you can throw almost anything at them and they will try to do what you want them to, just like in old VB6.
I was thinking about string manipulation, but if I think harder, I'm sure I'll remember more methods thrown into that compatibility layer (I don't remember the assembly's name, but I remember it is referenced by default in VB.NET) that would have a performance impact if used instead of their "native" .NET equivalents.
So, if you keep on programming like you were in VB6, then you might notice an impact. If not, it's ok.
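For illustration, a hedged C# sketch of the difference being described (it assumes a reference to the Microsoft.VisualBasic assembly so its compatibility helpers can be called from C#; Strings.Left is one of the VB6-style helpers, Substring the plain framework equivalent):
using System;
using Microsoft.VisualBasic;   // the compatibility assembly referenced by default in VB.NET

class CompatDemo
{
    static void Main()
    {
        string s = "Hello, World";

        // VB6-style helper: extra validation, and it tolerates lengths longer
        // than the string, so it carries a little more overhead.
        Console.WriteLine(Strings.Left(s, 5));    // "Hello"

        // Plain framework method: throws if the arguments are out of range.
        Console.WriteLine(s.Substring(0, 5));     // "Hello"
    }
}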
It is not really a myth. While C# and VB.NET both compile to IL, the actual instructions produced are likely to be different because (1) the compilers may have different optimisations and (2) VB.NET performs extra checks by default, e.g. for arithmetic overflow. So in many cases the performance will be the same, but in some cases C# will be faster. It's also possible VB.NET might be quicker in rare circumstances.
There are some small differences in the generated code that may make C# slightly faster in some situations. For example VB.NET has some extra code to clear the local variables in a method while C# doesn't.
However, those differences are barely measurable, and most code is so far from optimal that you are just starting in the wrong end by switching language to make the code run faster. You can take just about any CPU intensive code and rather easily make it twice as fast. Some code can be made 10 times faster, other code perhaps 10000 times faster. In that context the few percent that you may gain by using C# instead of VB.NET is not worth the effort.
On the other hand, learning C# may very well be an effective way of speeding up your code. Not because C# can produce faster code, but because you will get a better understanding of both C# and VB.NET, enabling you to write code that performs better in either language.
Edit:
The C# and VB.NET compilers are obviously developed more or less in sync. The speed difference between C# 1 and C# 2 is something like 30%, the difference between the parallel versions of C# and VB.NET is a lot less.
C# and VB.NET are both compiled to IL. C++/CLI and F# are compiled to it as well. In fact, the four languages I mentioned run at roughly the same speed. Among these there isn't "a faster language": the real difference is between automatically garbage-collected languages (C#, VB.NET, F#, etc.) and those that aren't (such as C++). The second group is generally slower, because the developer rarely knows how and when to collect the garbage on the heap; however, if you do know your way around heap memory, the program could end up faster in C++. Graphics-heavy programs are usually written in C++ (such as most Adobe applications). You can also manually trigger a collection in C# (System.GC.Collect();) and in VB.NET (System.GC.Collect()).
I know this answer is not entirely specific to the question, but I want to offer you several options and choices. You choose the right way for your programs.

Developing Math libraries

I am looking to create a custom math library for the project I am working on. The project is written in C#, and I am slightly concerned whether C# will be fast enough. The library will have a number of custom math formulas and equations to be applied to very large data sets. Simulations and matrix operations will be done as well (e.g. Monte Carlo simulations), so it'd have to be fast.
One thought is to create the math library in C++ and reference that .dll within the C# project. I am wondering whether it is worth the effort.
The general rule of thumb is "don't optimize until you need to," so I would lean towards just writing it in C# and optimizing the code later on.
But, in this situation where optimizing might require reimplementing everything in another language, I would do some testing first. Write a small app using the most processor-intensive math you expect in both C# and C++, then compare the times to see if the C# one is acceptable.
If you will be using it in C#, then you might as well put it in C# to start with. You buy more with managed code than you save with pointer wrangling. If you are worried about memory and cache issues, then just use arrays of types instead of objects. It gives you more control over how the memory is laid out.
The optimizers and JIT compilers will buy you more than enough speed to make up for any inefficiencies.
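A minimal sketch of the "arrays of types instead of objects" advice above (Point3 is just a hypothetical value type): an array of structs is one contiguous block of memory, whereas an array of class instances is an array of references to scattered heap objects.
using System;

struct Point3               // value type: stored inline in the array
{
    public double X, Y, Z;
}

class LayoutDemo
{
    static void Main()
    {
        var points = new Point3[1000000];       // one contiguous allocation
        for (int i = 0; i < points.Length; i++)
        {
            points[i].X = i;                    // no per-element object, no GC pressure
        }
        Console.WriteLine(points[999999].X);    // 999999
    }
}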
It's a little hard to say anything definitive one way or the other. I'd suggest sticking with C# if that's what you've started or what the rest of your project is based around. Keep some canonical data sets aside and establish some benchmarks as you develop. If you find performance to fall below some unacceptable threshold, and your profiling leads you to believe the problem is intrinsic to C#, then write a C++ component to solve those specific needs.
One thing to keep in mind is that using bytecode languages like C# or Java lets the JIT compiler in the runtime optimise your code. In practice, this means that the runtime performance of your code only gets better over time. Unlike C++, where the machine code is produced once at compile time and never changes, the performance of your C# code can continuously improve along with improvements to the underlying JIT compiler.
A serious amount of research is going into JIT compiler technology these days. Taking advantage of this now is an excellent approach.
One of the reasons I like using C# for numerical programming is that it is fairly easy to interface with native code. C# and the latest .NET runtimes and JIT compilers are pretty darn good, but sometimes you just can't beat highly optimized native code. For example, here is what I have done for linear algebra. Write some nice object-oriented classes that hide the implementation of key operations. For me this meant creating Matrix and Vector classes with addition and multiplication functions/operators. When I encountered an algorithm that had to perform several matrix-matrix products and transpose products with rather large matrices (thousands of rows by hundreds of columns) over many iterations, things got too slow. I re-implemented the matrix multiplication function to call a highly optimized matrix-matrix multiplication function from Intel's Math Kernel Library (dgemm). This gave me a better than 20x speed-up. Plus, the unpleasant API for this native routine (dgemm takes no less than 13 parameters!) was hidden from users of the matrix class.
So, I would suggest using C# for your library and drop down to optimized native code when and where needed.
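A hedged sketch of that kind of wrapper (assumptions: it uses the CBLAS entry point cblas_dgemm, which adds a layout argument on top of dgemm's 13 parameters; MKL's single dynamic library mkl_rt is on the loader path; and it is a default LP64 build where MKL_INT is a 32-bit int. The exact DLL name and integer width depend on your MKL installation):
using System;
using System.Runtime.InteropServices;

static class Blas
{
    // CBLAS constants: row-major storage, no transposition.
    private const int CblasRowMajor = 101;
    private const int CblasNoTrans = 111;

    [DllImport("mkl_rt", CallingConvention = CallingConvention.Cdecl)]
    private static extern void cblas_dgemm(
        int layout, int transA, int transB,
        int m, int n, int k,
        double alpha, double[] a, int lda,
        double[] b, int ldb,
        double beta, double[] c, int ldc);

    // C = A * B for row-major matrices; hides the long native argument list.
    public static double[] Multiply(double[] a, double[] b, int m, int k, int n)
    {
        var c = new double[m * n];
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, 1.0, a, k, b, n, 0.0, c, n);
        return c;
    }
}

A Matrix class's multiplication operator can then call Blas.Multiply internally, so users of the class never see the native API.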
