If C# is not interpreted, then why is a VM needed?

I have read a lot of controversy about C#, where some say it's interpreted and some say it's not. I do know it's compiled into MSIL and then JIT-compiled when run, depending on the processor and so on. But isn't it still interpreted in the sense that it needs a VM (.NET) to run?

The VM is an abstraction of a microprocessor. It is just a definition and does not really exist. I.e. you cannot run code on the VM; however, you can generate IL code for it. The advantage is that language compilers do not need to know details about different kinds of real processors. Since different .NET languages like C# or VB (and many more) produce IL, they are compatible on this level. This, together with other conventions like a common type system, allows you to use a DLL generated from VB code in a C# program, for instance.
The IL is compiled just in time on Windows when you run a .NET application and can also be compiled ahead of time in Mono. In both cases, native machine code for the actual processor is generated. This fully compiled code is executed on the REAL microprocessor!
A different aspect is the number of compilers you have to write. If you have n languages and you want to run them on m processor architectures, you need n language-to-IL compilers + m IL-to-native-code compilers. Without this intermediate abstraction layer you would need n × m compilers, which can be a much higher number than n + m. For example, with 5 languages and 4 architectures that is 9 compilers instead of 20.
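To make the IL layer concrete, here is a small sketch (the method is just an illustration): a trivial C# addition and, roughly, the IL that a compliant compiler, whether C# or VB, would emit for it.

    // A trivial C# method (illustrative only):
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Roughly the IL emitted for it in a release build, regardless of whether
    // the source language was C# or VB:
    //     ldarg.0    // push the first argument onto the evaluation stack
    //     ldarg.1    // push the second argument
    //     add        // add the two values on the stack
    //     ret        // return the result
    // At run time the JIT (or an AOT compiler) turns these IL instructions
    // into native instructions for the actual CPU.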

The short answer is no, the requirement for the VM does not indicate that it's interpreted.
The VM contains the JIT compiler that translates IL to native machine code. It also contains the .NET class library, upon which C# programs depend. And it contains other mechanisms involved in dynamic linking and the like (built on top of the Windows DLL mechanism, but .NET adds features beyond what Windows provides on its own, and those are implemented in the VM).

You are probably referring to the CLR (an implementation of the CLI specification).
The CLI defines a specific type system, semantics of all the operations on these types, a memory model, and run-time metadata.
In order to provide all of the above, some instrumentation of the generated code must happen. One simple example is ensuring that larger-than-32-bit integers are supported and that floating-point operations behave as the specification requires on every architecture.
In addition, to ensure correct behaviour of memory allocation, metadata management, static initialisation, generic type instantiation and the like, some additional machinery must be present while your CLR code executes. All of this is taken care of by the VM and is not readily provided by the CPU.
A quote from Wikipedia, for example:
The CLR provides additional services including memory management, type safety and exception handling.
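As a small illustration of the guarantees described above (a sketch, assuming any compliant CLR): 64-bit integer and IEEE 754 floating-point arithmetic behave the same regardless of the underlying CPU, because the runtime supplies the semantics the CLI specification requires.

    using System;

    class NumericGuarantees
    {
        static void Main()
        {
            long big = 4000000000L;       // larger than 32 bits can represent
            Console.WriteLine(big * 2);   // 8000000000 on 32-bit and 64-bit hardware alike

            double d = 0.1 + 0.2;         // IEEE 754 double arithmetic, with the
            Console.WriteLine(d);         // semantics the CLI specification defines
        }
    }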

Related

Disable generics in c# for Xamarin

I have a C# app that gets ahead-of-time (AOT) compiled to native iOS code using MonoTouch (Xamarin).
Some of the libraries I link in use generics. However, it turns out that this method of compiling causes significant code bloat, because it uses the C++ style of template code generation, generating separate functions for List<int>, List<string>, etc.
What I want is the Java style of generics, where generics are used for compile-time checking but at runtime the code only contains functions for List, not one for each of the instantiated types.
Note: this is not an issue when using C# on the .NET CLR, as explained here. The issue arises because the code is compiled AOT to a native binary instead of intermediate language. Moreover, runtime type checking for generic methods is fairly useless since the binary is native.
Question: How do I disable generics, i.e. replace all occurrences of List<T> with List, during compilation? Is this even possible?
This is not possible.
On Java it's possible because Java generics don't work over value types, only classes. You can emulate this behavior by not using value types yourself and only using List<object> (or any other reference type), in which case the AOT compiler will generate just one instantiation of List.
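A rough sketch of that difference (illustrative only, not exact AOT output): value-type instantiations each need their own compiled body, while reference-type instantiations can share one.

    using System.Collections.Generic;

    class AotSketch
    {
        static void Main()
        {
            // Each value-type instantiation needs its own AOT-compiled body:
            var ints  = new List<int>  { 1, 2, 3 };
            var longs = new List<long> { 1L, 2L, 3L };

            // Reference-type instantiations can share a single compiled body;
            // using List<object> (boxing the values) is the manual, Java-like
            // workaround mentioned above:
            var boxed = new List<object> { 1, 2, 3 };
            var names = new List<string> { "a", "b" };
        }
    }

The price of the workaround is boxing and losing compile-time type safety for the element type.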
You're also not entirely correct in saying that it's not an issue with the .NET CLR. The difference between the .NET CLR and Xamarin's AOT compiler is that the AOT compiler can't wait until execution time to determine whether a particular instantiation is needed (because iOS doesn't allow executable code to be generated on the device); it has to make sure every possible instantiation is available. If your app on the .NET CLR happened to need every possible generic instantiation at runtime, you'd have a similar problem (only it would show up as runtime memory usage, not executable size, and on a desktop that's usually not a problem anyway).
The supported way of solving your problem is to enable the managed linker for all assemblies (in the project's iOS Build options, set "Linker behavior" to "All assemblies"). This will remove all the managed code from your app you're not using, which will in most cases significantly reduce the app size.
You can find more information about the managed linker here: http://developer.xamarin.com/guides/ios/advanced_topics/linker/
If your app is still too big, please file a bug (http://bugzilla.xamarin.com) attaching your project, and we'll have a look and see if we can improve the AOT compiler somehow (for instance we already optimize value types with the same size, so List<int> and List<uint> generate only one instantiation).

Nominal storage allocation of object in c#

In Visual Basic, the nominal storage allocation of an Object is system dependent:
4 bytes on 32-bit platform
8 bytes on 64-bit platform
http://msdn.microsoft.com/en-us/library/47zceaw7.aspx
My question is: what is the nominal storage allocation of an object in C#, and is it system dependent?
It is exactly the same. Remember that both languages are high-level, "platform-independent" languages that are compiled to MSIL; this is inherent to any CLI language. That is, neither C# nor VB runs directly on your machine: it is the MSIL that gets compiled at runtime, so in the end all of them get "translated" to the same language. Normally you shouldn't need to care about this; chances are that if you need to be in control of this kind of detail, you want a lower-level language where you do memory management yourself, such as C or C++.
There is no difference. Why? Because VB and C# both end up using .NET, and the .NET type (second column in your link) will always behave the way you described, regardless of the language that produced it.
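One quick way to see this from C# itself (a minimal sketch): the reference size follows the pointer size of the process, not the source language.

    using System;

    class ReferenceSize
    {
        static void Main()
        {
            Console.WriteLine(IntPtr.Size);                 // 4 in a 32-bit process, 8 in a 64-bit process
            Console.WriteLine(Environment.Is64BitProcess);  // consistent with the value above
        }
    }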

Are languages really dependent on libraries?

I've always wondered how the dependencies are managed from a programming language to its libraries. Take for example C#. When I was beginning to learn about computing, I would assume (wrongly as it turns out) that the language itself is designed independently of the class libraries that would eventually become available for it. That is, the set of language keywords (such as for, class or throw) plus the syntax and semantics are defined first, and libraries that can be used from the language are developed separately. The specific classes in those libraries, I used to think, should not have any impact on the design of the language.
But that doesn't work, or not all the time. Consider throw. The C# compiler makes sure that the expression following throw resolves to an exception type. Exception is a class in a library, and as such it should not be special at all. It would be a class as any other, except that the C# compiler assigns it that special semantics. That is very good, but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
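For instance (a minimal sketch of the point above):

    using System;

    class ThrowRules
    {
        static void Demo()
        {
            // Compiles: InvalidOperationException derives from System.Exception.
            throw new InvalidOperationException("something went wrong");

            // Does not compile: the thrown expression must be System.Exception
            // or a type derived from it.
            // throw "just a string";
        }
    }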
Additionally, I wonder how this dependency is managed. If I were to design a new programming language, what techniques would I use to map the semantics of throw to the very particular class that is Exception?
So my questions are two:
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
How are these dependencies managed from within the compiler and run-time? What techniques are used?
Thank you.
EDIT. Thanks to those who pointed out that my second question is very vague. I agree. What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id? What happens when a new version of the compiler or the class libraries is released? I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
What I am trying to learn is what kind of references the compiler stores about the types it needs. For example, does it find the types by some kind of unique id?
Obviously the C# compiler maintains an internal database of all the types available to it in both source code and metadata; this is why a compiler is called a "compiler" -- it compiles a collection of data about the sources and libraries.
When the C# compiler needs to, say, check whether an expression that is thrown is derived from or identical to System.Exception, it (in effect) does a global namespace lookup on System, then a lookup on Exception within it, finds the class, and compares the resulting class information to the type that was deduced for the expression.
The compiler team uses this technique because that way it works no matter whether we are compiling your source code and System.Exception is in metadata, or if we are compiling mscorlib itself and System.Exception is in source.
Of course as a performance optimization the compiler actually has a list of "known types" and populates that list early so that it does not have to undergo the expense of doing the lookup every time. As you can imagine, the number of times you'd have to look up the built-in types is extremely large. Once the list is populated then the type information for System.Exception can be just read out of the list without having to do the lookup.
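Not the compiler's actual code, but a rough reflection-based analogy of the "derived from or identical to System.Exception" check described above:

    using System;

    class ExceptionCheckAnalogy
    {
        static void Main()
        {
            // The relation the compiler verifies for a thrown expression is,
            // at run-time, the one IsAssignableFrom expresses:
            Console.WriteLine(typeof(Exception).IsAssignableFrom(typeof(ArgumentNullException))); // True
            Console.WriteLine(typeof(Exception).IsAssignableFrom(typeof(string)));                // False
        }
    }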
What happens when a new version of the compiler or the class libraries is released?
What happens is: a whole bunch of developers, testers, managers, designers, writers and educators get together and spend a few million man-hours making sure that the compiler and the class libraries all work before they're released.
This question is, again, impossibly vague. What has to happen to make a new compiler release? A lot of work, that's what has to happen.
I am aware that this is still pretty vague, and I don't expect a precise, single-paragraph answer; rather, pointers to literature or blog posts are most welcome.
I write a blog about, among other things, the design of the C# language and its compiler. It's at http://ericlippert.com.
I would assume (perhaps wrongly) that the language itself is designed independently of the class libraries that would eventually become available for it.
Your assumption is, in the case of C#, completely wrong. C# 1.0, the CLR 1.0 and the .NET Framework 1.0 were all designed together. As the language, runtime and framework evolved, the designers of each worked very closely together to ensure that the right resources were allocated so that each could ship new features on time.
I do not understand where your completely false assumption comes from; that sounds like a highly inefficient way to write a high-level language and a great way to miss your deadlines.
I can see writing a language like C, which is basically a more pleasant syntax for assembler, without a library. But how would you possibly write, say, async-await without having the guy designing Task<T> in the room with you? It seems like an exercise in frustration.
Am I correct in thinking that language design is tightly coupled to that of its base class libraries?
In the case of C#, yes, absolutely. There are dozens of types that the C# language assumes are available and as-documented in order to work correctly.
I once spent a very frustrating hour with a developer who was having some completely crazy problem with a foreach loop before I discovered that he had written his own IEnumerable<T> that had slightly different methods than the real IEnumerable<T>. The solution to his problem: don't do that.
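For context, here is roughly what the compiler expects a foreach loop to expand into (a sketch of the enumerator pattern; a type whose GetEnumerator/MoveNext/Current don't behave as expected breaks this expansion):

    using System;
    using System.Collections.Generic;

    class ForeachExpansion
    {
        static void Main()
        {
            var items = new List<int> { 1, 2, 3 };

            // foreach (int x in items) Console.WriteLine(x);
            // is expanded by the compiler into roughly this:
            List<int>.Enumerator e = items.GetEnumerator();
            try
            {
                while (e.MoveNext())
                {
                    int x = e.Current;
                    Console.WriteLine(x);
                }
            }
            finally
            {
                e.Dispose();
            }
        }
    }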
How are these dependencies managed from within the compiler and run-time?
I don't know how to even begin to answer this impossibly vague question.
All (practical) programming languages have a minimum number of required functions. For modern "OO" languages, this also includes a minimum number of required types.
If the type is required in the Language Specification, then it is required - regardless of how it is packaged.
Conversely, not all of the BCL is required to have a valid C# implementation. This is because not all of the BCL types are required by the Language Specification. For instance, System.Exception (see #16.2) and NullReferenceException are required, but FileNotFoundException is not required to implement the C# Language.
Note that even though the specification provides minimal definitions for base types (e.g. System.String), it does not define the commonly-accepted methods (e.g. String.Replace). That is, almost all of the BCL is outside the scope of the Language Specification1.
.. but my conclusion is that the design of the language does depend on the existence and behaviour of specific elements in the class libraries.
I agree entirely and have included examples (and limits of such definitions) above.
.. If I were to design a new programming language, what techniques would I use to map the semantics of "throw" to the very particular class that is "Exception"?
I would not look primarily at the C# specification, but rather at the Common Language Infrastructure (CLI) specification. This new language should, for practical reasons, be designed to interoperate with existing CLI/CLR languages, but it does not necessarily need to "be C#".
1 The CLI (and associated references) do define the requirements of a minimal BCL. So if it is taken that a valid C# implementation must conform to (or may assume) the CLI then there are many other types to consider that are not mentioned in the C# specification itself.
Unfortunately, I do not have sufficient knowledge of the 2nd (and more interesting) question.
My impression is that in languages like C# and Ada, application source code is portable, but standard library source code is not portable across compilers/implementations.

What is the difference between runtime and compile-time? [closed]

So what is a runtime? Is it a virtual machine that executes half-compiled code that cannot run on a specific processor? If so, then what's a virtual machine? Is it another piece of software that further translates the half-compiled code into machine-specific code? And what about languages that don't compile to intermediate code but rather translate/compile directly to machine code? What's a runtime in that situation? Is it the hardware (CPU and RAM)?
Also, what's the difference between compile-time and runtime? Are they stages of a software lifecycle? I mean, a program is originally a bunch of text files, right? So you compile or translate those into a form of data that can either be loaded into memory and executed by the processor or, if it's a "managed" language, needs further compilation before it can run on hardware.
What exactly is a managed language?
Lastly, is there such a thing as debug-time and what is it?
I'm in my first term studying computer science, and it really confuses me how illogically things are taught. "Information" is being shoved down my throat, but whenever I try to make sense out of everything by organizing everything related into a single system of well defined components and relations, I get stuck.
Thanks in advance,
Garrett
The kind of code suitable for reasoning by human beings (let's call it "source code") needs to pass through several stages of translation before it can be physically executed by the underlying hardware (such as CPU or GPU):
Source code.
[Optionally] intermediate code (such as .NET MSIL or Java bytecode).
Machine code conformant to the target instruction set architecture.
The microcode that actually flips the logical gates in silicon.
These translations can be done in various phases of the program's "lifecycle". For example, a particular programming language or tool might choose to translate from 1 to 2 when the developer "builds" the program and translate from 2 to 3 when the user "runs" it (which is typically done by a piece of software called "virtual machine"1 that needs to be pre-installed on user's computer). This scenario is typical for "managed" languages such as C# and Java.
Or it could translate from 1 to 3 directly at build time, as common for "native" languages such as C and C++.
The translation between 3 and 4 is almost always done by the underlying hardware. It's technically a part of the "run time" but is typically abstracted away and largely invisible to the developer.
The term "compile time" typically denotes the translation from 1 to 2 (or 3). There are certain checks that can be done at compile time before the program is actually run, such as making sure the types of arguments passed to a method match the declared types of method parameters (assuming the language is "statically typed"). The earlier the error is caught, the easier it is to fix, but this has to be balanced with the flexibility, which is why some "scripting" languages lack comprehensive compile-time checks.
The term "run-time" typically denotes the translation from 2 (or 3) all the way down to 4. It is even possible to translate directly from 1 at run-time, as done by so called "interpreted languages".
There are certain kinds of problems that can't be caught at compile time, and you'll have to use appropriate debugging techniques (such as debuggers, logging, profilers, etc.) to identify them at run-time. A typical example of a run-time error is trying to access an element of a collection that is not there, which then manifests at run-time as an exception; it is a consequence of a flow of execution too complex for the compiler to "predict" at compile time.
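A small C# sketch of the distinction (the index value is contrived, purely for illustration):

    using System;

    class ErrorTiming
    {
        static void Main()
        {
            // int n = "five";                      // compile-time error: caught before the program ever runs

            int[] numbers = { 1, 2, 3 };
            int index = DateTime.Now.Second % 5;    // only known at run-time
            Console.WriteLine(numbers[index]);      // may throw IndexOutOfRangeException at run-time
        }
    }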
The "debug time" is simply a run-time while the debugger is attached to the running program (or you are monitoring the debug log etc.).
1 Don't confuse this with virtual machines that are designed to run native code, such as VMware or Oracle VirtualBox.
Compile-time and run-time usually refer to when checks occur or when errors can happen. For example, in a statically typed language like C#, the static type checks are made at compile time. That means you cannot compile the application if you, for example, try to assign a string to an int variable. Run-time, on the other hand, refers to the time when the code is actually executed. For example, exceptions are always thrown at run-time.
As for virtual machines and such: C# is a language that compiles into the Common Intermediate Language (CIL, or IL). The result is code which is the same regardless of which .NET language you use (C# and VB.NET both produce IL). The .NET Framework then executes this intermediate language at run-time using just-in-time compilation. So yes, you can see the .NET Framework as a virtual machine that translates a special intermediate language into machine code for the target machine.
As for debug-time, I don’t think there is such a thing, as you are still running the program when debugging. So if anything, debug-time would be run-time with an attached debugger. But you wouldn’t use a term like that.
Compile-time - The period at which a compiler will attempt to compile some code. Example: "The compiler found 3 type errors at compile-time which prevented the program from being compiled."
Runtime - The period during which a program is executing. Example: "We did not spot the error until runtime because it was a logic error."
Run-time and virtual machines are two separate ideas - your first question doesn't make sense to me.
Virtual machines are indeed software programs, but they work the other way around: they take the byte-code produced from Java, C#, etc. source and translate (or interpret) it into code the machine can run. If a language uses a virtual machine, it also often uses just-in-time compiling, which means that compile-time and run-time are, in essence, happening at the same time.
Conversely, languages like C and C++ are usually compiled directly into native machine code before being executed on a machine, and therefore compile-time and run-time are completely separate.
Generally "managed" languages have garbage collection (you don't directly manipulate memory with allocations and de-allocations [Java and C# are both examples]) and run on some type of virtual machine.

What makes the Java compiler so fast?

I was wondering about what makes the primary Java compiler (javac by sun) so fast at compilation?
..as well as the C# .NET compiler from Microsoft.
I am comparing them with C++ compilers (such as G++), so maybe my question should have been, what makes C++ compilers so slow :)
That question was nicely answered in this one: Why does C++ compilation take so long? (as jalf pointed out in the comments section)
Basically it's C++'s lack of a module concept, plus the aggressive optimization done by the compiler.
I think the most difficult part is not the need to compile the header files (unless they are really big, but you can use precompiled headers in that case). The worst part is always the fact that C++'s grammar is too wildly context-sensitive. Despite the fact I like C++, I feel sorry for anybody who has to write a C++ parser.
There are a couple of things that make the C++ compiler slower than those of Java/C#. The grammar is much more complex, generic programming support is much more powerful in C++ but at the same time more expensive to compile, and inclusion of files works in a different way from importing modules.
Inclusion of header files
First, whenever you include a file in C++, the contents of the file (a .h, usually) are injected into the current compilation unit (include guards avoid re-injecting the same header twice), and this is transitive. That is, if you include header a.h, which in turn includes b.h, your compilation unit will contain all the code in a.h and all the code in b.h.
Java (or C#; I will talk about Java, but they are similar in this respect) doesn't have include files: it depends on the binaries from the compilation of the classes it uses. This means that whenever you compile a.java, which uses an object B defined in b.java, the compiler just checks the binary b.class; it does not need to go deeper to check the dependencies of B, so it can cut the process off earlier (with just one level of checking).
At the same time, including files only brings in declarations in source form, and processing them takes time. When the Java/C# compiler reads a binary, it has the same information, but already processed by the compilation step that generated it.
So in the end, in C/C++ more files are included, and at the same time processing those includes is more expensive than processing binary modules.
Templates
Templates are special in their own way. They can be precompiled, but they usually are not (for a good set of reasons). This means that in every compilation unit that uses std::vector, the whole set of vector methods that are used (unused template methods don't get compiled) is processed and binary code is generated by the compiler. At a later step, during linking, redundant definitions of the same method get dropped, but during compilation they must all be processed.
Support in Java for generics is more limited in many ways. In the end there is only one Vector class binary, and whenever the compiler sees Vector in Java, it generates type-checking code before delegating to the real Vector implementation (which stores plain Object references) and which is not generic. The compiler provides the type guarantees, but does not compile Vector separately for each type.
In C# it is, once again, different. C#'s support for generics is more elaborate than Java's, and generic classes are genuinely different from plain classes, but they still get compiled only once, because the binary format carries all the required information.
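A small run-time illustration of that difference, as seen from C# (a sketch):

    using System;
    using System.Collections.Generic;

    class ReifiedGenerics
    {
        static void Main()
        {
            // Constructed types are distinct at run-time (no erasure as in Java)...
            Console.WriteLine(typeof(List<int>) == typeof(List<string>));                      // False

            // ...but both originate from the single compiled generic definition List<T>:
            Console.WriteLine(typeof(List<int>).GetGenericTypeDefinition() == typeof(List<>));    // True
            Console.WriteLine(typeof(List<string>).GetGenericTypeDefinition() == typeof(List<>)); // True
        }
    }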
Because they do something quite different. The C++ compiler produces optimized native code, whereas the C#, VB.NET and Java compilers produce an intermediate language that is turned into native code when you first execute the application; that is why you get slow application loading in Java etc. the first time you run it.
The C++ compiler has to do the full optimization up front, whereas the JITed languages optimize when you execute the application.
Some would argue that, to be correct, you have to measure C++ compile time against Java compile time plus the time spent JITing the first time you load the application, but I don't think that would be right or fair, because you would be comparing native languages to JITed ones, or else apples to oranges.
The C++ compiler must repeatedly compile all the header files and there are lots of them, so this is one thing that slows it down.
One of the more time consuming tasks when compiling is code optimization.
Javac does very little optimization on the code when doing the compilation. Optimization is instead done by the JVM when running the application.
C/C++ code needs to be optimized when compiling, since optimizing already-compiled machine code is hard.
You got it right in your last sentence: it's not Java or C# that is fast to compile; it's C++ that is exceptionally slow to compile, due to its complex grammar and features, most importantly templates.
If you think javac is fast, try Jikes (see http://jikes.sourceforge.net/).
It is a Java compiler written in C++. Unfortunately they haven't kept up with the latest Java compiler specs, but if you want to see fast, this is it.
Tony
I think part of it is the complexity of the languages. C++ is incredibly mutable, with the ability to override pretty much any operator or piece of syntax (like overriding the () operator). This means the compiler has to do a lot more work just to determine what operations to actually run, even for simple things. Java and C# don't have this issue, as the syntax is fixed, and they're generally much simpler to parse.
It's a bit difficult comparing bytecode languages like java with natively compiled languages like C++. A better comparison is Delphi vs C++, where Delphi is much faster to compile. Since this has nothing to do with optimization or byte code, it must be due to differences in language syntax and the relative performance of includes vs. modules/units.
Is the Java compiler fast?
The Java-to-class translation should be blindingly fast, since it is just a glorified zip with some syntax checking; so, to be fair, compared to a real compiler that is doing optimization and object code generation, the "translation" from Java to class files is trivial.
I did a comparison with a fairly small "hello world" program against GCC (C/C++/Ada) and found that javac was 30 times slower, and it got even worse at runtime.
