Why can't we use IteratorStateMachineAttribute in C#? - c#

I did a Go To Definition (F12) on a class I was trying to derive from, and I noticed that one of the methods was marked with AsyncStateMachineAttribute, which in turn inherits from StateMachineAttribute. I was curious and decided to read up on this attribute and all its derivatives on MSDN. That led me to this page, where I came across this statement:
You can't use IteratorStateMachineAttribute to test whether a method is an iterator method in C#.
Because that statement is made to stand out, there must be serious implications to it but there is no further explanation as to why that is so. Does anyone have insights in this regard?

I'm 99% sure it's historical. Basically, C# introduced iterator blocks in C# 2 - a long time before this attribute was introduced.
The equivalent async attribute was introduced at the same time as async methods in C#, so that was fine... but even though the C# compiler now applies IteratorStateMachineAttribute to iterator blocks:
It doesn't apply to libraries created with older versions of the compiler, so you wouldn't be able to rely on it there.
It can't apply to libraries targeting versions of .NET prior to 4.5. (I'm not sure what the VB compiler does here, to be honest. It may omit the attribute, or it may require you to be targeting a recent version of .NET in order to use iterator methods.)
I would say that the presence of an IteratorStateMachineAttribute on a method is a good indicator that it is an iterator method (although there's nothing to stop a mischievous developer applying it to other methods), but it's not a sufficient test due to older versions of the C# compiler.
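To make the "good indicator, but not a sufficient test" point concrete, here is a minimal reflection sketch. The class and method names (IteratorDetection, HasIteratorAttribute, Numbers, NumbersFromList) are hypothetical; it assumes a modern C# compiler targeting .NET 4.5 or later, which is when the compiler emits the attribute at all.

```csharp
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class IteratorDetection
{
    // Attribute present: very probably an iterator (someone could still apply it by hand).
    // Attribute absent: proves nothing, since older compilers and pre-4.5 targets never emitted it.
    public static bool HasIteratorAttribute(MethodInfo method) =>
        method.GetCustomAttribute<IteratorStateMachineAttribute>() != null;

    // A real iterator block: a modern C# compiler marks this method with the attribute.
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Same signature, but no 'yield': no state machine is generated, so no attribute.
    public static IEnumerable<int> NumbersFromList() => new List<int> { 1, 2 };
}
```

Note that both methods look identical to a caller; only the compiler-generated metadata differs.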

The state machine here is one that is automatically generated by the C# compiler. The C# compiler internally converts lots of advanced features (like closures, the yield keyword, and async) into simpler C# before continuing. Things like AsyncStateMachineAttribute are one bit of evidence that this has occurred. You might also be familiar with classes called, e.g., DisplayClass923084923'1, which are the classes C# generates to implement closures.
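The closure rewrite can be observed at runtime: a lambda that captures a local becomes a method on a compiler-generated class, and the delegate's Target is an instance of that class. A small sketch, with hypothetical names ClosureDemo, MakeCounter and CaptureTypeName; the generated class name (currently something like <>c__DisplayClass0_0 with Roslyn) is an implementation detail and may change between compiler versions.

```csharp
using System;

public static class ClosureDemo
{
    public static Func<int> MakeCounter()
    {
        int count = 0;          // captured local: hoisted into a field of a generated class
        return () => ++count;   // the lambda body becomes a method on that generated class
    }

    // Name of the compiler-generated type that holds 'count'.
    public static string CaptureTypeName(Func<int> f) => f.Target!.GetType().Name;
}
```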
When you use 'yield', say, the C# compiler first produces a version of the code which doesn't use 'yield' but instead uses a state machine implementation. In principle, it goes from this:
yield return "A";
yield return "B";
to something like
int _state = 0;
if (_state == 0) { _state = 1; return "A"; }
if (_state == 1) { _state = 2; return "B"; }
This means the C# compiler, later on, doesn't have to deal with 'yield' as such -- it's been reduced to integers and return statements. I think this is where the IteratorStateMachineAttribute is added -- to the simplified, ints-and-returns version of the class.
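A hand-written version of that "ints and returns" idea, expressed as an IEnumerator<string>, might look like the sketch below. LettersStateMachine is a hypothetical name, and this is a simplification: Roslyn's real output uses different state values, also implements IEnumerable<T>, and handles disposal and thread checks. It does match the compiler in one respect: generated iterators throw NotSupportedException from Reset.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified equivalent of:
//     static IEnumerable<string> Letters() { yield return "A"; yield return "B"; }
public sealed class LettersStateMachine : IEnumerator<string>
{
    private int _state;            // 0: not started, 1: yielded "A", 2: yielded "B" (finished)
    private string _current = "";

    public string Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _state = 1; _current = "A"; return true;
            case 1: _state = 2; _current = "B"; return true;
            default: return false; // no more 'yield return' statements
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```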
(I think Async works the same way, producing a simplified state machine as its simplification step, which is how you came to it in the documentation.)
However, ever since the earliest version of C#, you've had the foreach keyword, which works on any object that has a GetEnumerator method, as long as the returned enumerator has a MoveNext method and a Current property.
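A sketch of that pattern: the type below implements neither IEnumerable nor IEnumerator, yet foreach accepts it because the compiler only looks for a GetEnumerator method whose result has MoveNext and Current. Countdown, Enumerator and CountdownDemo are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;

// No interfaces: foreach binds to these members by name and shape.
public sealed class Countdown
{
    private readonly int _from;
    public Countdown(int from) => _from = from;

    public Enumerator GetEnumerator() => new Enumerator(_from);

    public struct Enumerator
    {
        private int _value;
        public Enumerator(int from) => _value = from + 1;
        public int Current => _value;
        public bool MoveNext() => --_value > 0; // counts from..1, then stops
    }
}

public static class CountdownDemo
{
    public static List<int> Collect(int from)
    {
        var result = new List<int>();
        foreach (var n in new Countdown(from)) result.Add(n);
        return result;
    }
}
```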
So -- an iterator method might be produced in different ways. IteratorStateMachineAttribute is what's supplied by the compiler in some cases, but you shouldn't rely on it being there.

This is telling you that you cannot apply this attribute to a method yourself, because during compilation the compiler injects IL code that cannot reliably be added to arbitrary methods.

Related

Are IEnumerable<T>, Task<T> and IDisposable hard coded in the C# compiler?

I asked that question myself many times. I tried to find some blog post about that and even dug into the Roslyn source code, but have not found any complete answer on that.
Basically, with some modern C# language features the compiler will take some syntactic sugar and transform it into more low-level C# code. Some examples of those are:
using() generates a try-finally to definitely dispose an IDisposable
Functions returning an IEnumerable<T> with yield return will turn that function into an iterator implemented as a state machine
Functions marked with async have to return Task<T> (or similar) and will turn into a state machine too, which can be re-entered from the program's event loop under the hood
So, these are all nice features, but the compiler is always enforcing the specific types IEnumerable<T>, Task<T> and IDisposable. Are these types somehow baked into the compiler? And isn't it true that the compiler is somehow bound to the standard library then, even though mscorlib is just plain C# code providing common functionality?
I cannot imagine that, since programming languages are so abstract and general. As I have seen, it is possible to await anything as long as the type has a GetAwaiter extension method. That sounds more abstract to me.
Edit
Also, if anyone can point me to the source code which specifies the required predefined types in the compiler, let me know!
Sort-of.
The compiler has lists of "special" (used in the type-system / binder) and "well-known" (referenced by generated code) types and members, which are hard-coded by name in Roslyn source. However, all that it cares about are the names & methods / signatures of these types / members; you can still write your own mscorlib (and people have done this) as long as it has them.
See
http://sourceroslyn.io/#Microsoft.CodeAnalysis/SpecialType.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/SpecialMember.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/WellKnownTypes.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/Symbols/WellKnownMemberNames.cs
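The GetAwaiter case from the question illustrates both halves of this answer. The compiler finds GetAwaiter by name, so an extension method on an unrelated type such as TimeSpan is enough to make it awaitable. But the awaiter it returns must still have IsCompleted, GetResult and an OnCompleted method from INotifyCompletion, and an async method returning Task is built using well-known builder types. A minimal sketch; TimeSpanAwaitExtensions and AwaitDemo are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class TimeSpanAwaitExtensions
{
    // Makes 'await someTimeSpan;' compile: lookup is by the name GetAwaiter,
    // not by an interface on TimeSpan.
    public static TaskAwaiter GetAwaiter(this TimeSpan delay) =>
        Task.Delay(delay).GetAwaiter();
}

public static class AwaitDemo
{
    public static async Task<string> WaitBriefly()
    {
        await TimeSpan.FromMilliseconds(10); // uses the extension method above
        return "done";
    }
}
```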

Does the compiler discard empty methods?

Would C# compiler optimize empty void methods away?
Something like
private void DoNothing()
{
}
Since essentially no code is run aside from adding DoNothing to the call stack and removing it again, wouldn't it be better to optimize this call away?
Would C# compiler optimize empty void methods away?
No. They could still be accessed via reflection, so it's important that the method itself stays.
Any call sites are likely to include the call as well - but the JIT may optimize them away. It's in a much better position to do so. It's basically a special case of inlining, where the inlined code is empty.
Note that if you call it on another object:
foo.DoNothing();
that's not a no-op, because it will check that foo is non-null.
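The null check is easy to observe. The C# compiler normally emits callvirt for instance calls (even to non-virtual methods), and callvirt checks the receiver, so calling an empty method through a null reference still throws, whether or not the JIT inlines the empty body. Foo matches the call above; EmptyCallDemo and Throws are hypothetical names.

```csharp
using System;

public class Foo
{
    public void DoNothing() { }
}

public static class EmptyCallDemo
{
    // Returns true if calling the empty method on 'foo' threw.
    public static bool Throws(Foo foo)
    {
        try
        {
            foo.DoNothing();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```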
If you want, you could hook the post-build event of every project and run an IL-inspecting tool that reflects over your generated DLL, inspects every MethodInfo in your types, requests its IL, looks for empty patterns (only nop instructions followed by ret), and removes the unwanted methods.
For example:
var ilBytes = SomeMethodInfo.GetMethodBody().GetILAsByteArray();
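A minimal sketch of that inspection step, assuming "empty" means a void method whose IL is zero or more nop (0x00) opcodes followed by a single ret (0x2A); an empty method typically compiles to nop, ret in Debug and to just ret in Release. EmptyMethodFinder and Sample are hypothetical names, and actually removing methods would need a separate IL-rewriting tool, which is not shown.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class EmptyMethodFinder
{
    private const byte Nop = 0x00; // IL 'nop' (one byte, no operand)
    private const byte Ret = 0x2A; // IL 'ret'

    public static bool IsEffectivelyEmpty(MethodInfo method)
    {
        var il = method.GetMethodBody()?.GetILAsByteArray();
        if (il == null || il.Length == 0) return false; // abstract/extern: no body to inspect
        return il[il.Length - 1] == Ret && il.Take(il.Length - 1).All(b => b == Nop);
    }

    public static MethodInfo[] FindEmptyMethods(Type type) =>
        type.GetMethods(BindingFlags.Instance | BindingFlags.Static |
                        BindingFlags.Public | BindingFlags.NonPublic |
                        BindingFlags.DeclaredOnly)
            .Where(IsEffectivelyEmpty)
            .ToArray();
}

// Hypothetical sample type to inspect.
public class Sample
{
    public void DoNothing() { }
    public int Answer() => 42;
}
```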
A good obfuscation tool will "prune" methods in this way. preemptive.com/products/dotfuscator/features#pruning (comment by weston)
You could use the tool outside of Visual Studio to find empty methods and remove them from the files they are defined or used in.
Never. The compiler doesn't care whether a method is empty: what you write is what you get in your MSIL. You can check this with ILDASM.

Why does "dynamic" require language-specific runtime components?

Microsoft.CSharp is required to use the dynamic feature.
I understand there are binders, evaluators and helpers in the assembly.
But why it has to be language-specific?
Why Microsoft.CSharp and not Microsoft.Dynamic or System.Dynamic?
Please, explain.
Let's say we have d.x where d is dynamic.
C# compiler
1. applies C# language rules
2. gets "property or field access"
3. emits (figuratively) Binder.GetPropertyOrField(d, "x")
Now, being asked to reference Microsoft.CSharp may make one think that language-agnostic binder can't handle this case, and C#-only something got its way through compilation and requires special library.
Compiler had a bad day?
To your first question, it is language-specific because it needs to be.
In C#, if you call a method with too many arguments you get an error; in JavaScript, the extra arguments are simply ignored. In C#, if you access a member that doesn't exist you get an error, while in JavaScript you get undefined. Even if you discovered all these varying feature sets and put them all into System.Core, the next language fad of the month is sure to have some super neat feature that it wouldn't support. It's better to be flexible.
There is common code in the core .NET libraries, under the System.Dynamic and System.Runtime.CompilerServices namespaces. It just can't all be common.
And as for your second question, the need for the "special C# library" could of course be removed by transforming these language-specific behaviors inline, but why? That will needlessly bloat your IL code size. It is the same reasoning for you not writing your own Int32.Parse every time you need to read in a number.
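The "missing member is an error in C#" rule is exactly what the C# runtime binder in Microsoft.CSharp enforces, and it reports it with its own exception type, Microsoft.CSharp.RuntimeBinder.RuntimeBinderException. A small sketch (DynamicRulesDemo and its methods are hypothetical names) using an anonymous type, which is accessible through dynamic from within the same assembly:

```csharp
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicRulesDemo
{
    public static int ReadExistingMember()
    {
        dynamic d = new { X = 1 };
        int x = d.X; // bound at runtime using C# lookup rules
        return x;
    }

    public static string ReadMissingMember()
    {
        dynamic d = new { X = 1 };
        try
        {
            _ = d.Y; // C# rules: a missing member is an error, not 'undefined'
            return "found";
        }
        catch (RuntimeBinderException)
        {
            return "RuntimeBinderException";
        }
    }
}
```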
One reason I can think of - Visual Basic.NET has had late binding in it from day one, primarily oriented around how it interoperates with COM IDispatch interfaces - so if they wanted a language agnostic binder, they'd have had to adopt the Visual Basic rules - which includes that member lookup only works with Public members.
Apparently, the C# designers didn't want to be so strict. You can call this class' DoStuff method from C# via a dynamic reference:
public class Class1
{
internal void DoStuff()
{
Console.WriteLine("Hello");
}
}
Whereas attempting to call the same via Visual Basic's Object results in a MissingMemberException at runtime.
So because the C# designers weren't the first to arrive at the late-binding party, they could either follow Visual Basic's lead or they could say "each language will have its own rules" - they went with the latter.
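A runnable version of the C# side of that comparison: the C# runtime binder resolves an internal member when the dynamic call site is in the same assembly. The Greeting method and InternalDynamicDemo are hypothetical additions (returning a value instead of printing, so the result can be checked); the Visual Basic behaviour is not shown.

```csharp
using System;

public class Class1
{
    internal void DoStuff() => Console.WriteLine("Hello");
    internal string Greeting() => "Hello"; // hypothetical, for checking the result
}

public static class InternalDynamicDemo
{
    public static string CallThroughDynamic()
    {
        dynamic d = new Class1();
        d.DoStuff();          // binds even though DoStuff is internal
        return d.Greeting();  // same: call site is in the same assembly
    }
}
```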

ILogger _logger.Debug("Something") - Any way for the compiler to remove it?

I got a pretty common scenario, namely a self implemented ILogger interface. It contains several methods like _logger.Debug("Some stuff") and so on. The implementation is provided by a LoggingService, and used in classes the normal way.
Now I have a question regarding performance, I am writing for Windows Phone 7, and because of the limited power of these devices, little things may matter.
I do not want to:
Include a preprocessor directive on each line, like #if DEBUG
Use a condition like log4net e.g. _logger.DebugEnabled
The way I see it, in the release version, I just return NullLoggers, which contain an empty implementation of the interface, doing nothing.
The question is: does the compiler recognize such things? (That may be hard, since it can't know at compile time which logger I assign.) Is there any way to give .NET a hint for that?
The reason for my question, I know entering an empty function will not cause a big delay, no problem there. But there are a lot of strings in the source code of my application, and if they are never used, they do not really need to be part of my application...
Or am I overthinking a tiny problem (perhaps the "string to code" ratio just looks awful in my code editor, and it's no big deal anyway)?
Thanks for tips,
Chris
Use the Conditional attribute:
[Conditional("DEBUG")]
public void Debug(string message) { /* ... */ }
The compiler will remove all calls to this method for any build configurations that don't match the string in the conditional attribute. Note that this attribute is applied to the method not the call site. Also note that it is the call site instruction that is removed, not the method itself.
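Because the whole call is removed, the argument expressions are not evaluated either, which addresses the concern about building strings that are never used. A minimal sketch; Log, ConditionalDemo and Expensive are hypothetical names, and VERBOSE_LOGGING is a hypothetical symbol chosen so the example does not depend on the build configuration (with [Conditional("DEBUG")] the same happens in Release builds).

```csharp
using System.Diagnostics;

public static class Log
{
    public static int Calls;

    // Calls are compiled away unless VERBOSE_LOGGING is defined at the call site.
    [Conditional("VERBOSE_LOGGING")]
    public static void Debug(string message) => Calls++;
}

public static class ConditionalDemo
{
    public static int Evaluations;

    private static string Expensive()
    {
        Evaluations++;
        return "expensive";
    }

    public static void Run()
    {
        // Removed entirely, including the call to Expensive() and the concatenation.
        Log.Debug("Loaded " + Expensive());
    }
}
```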
It is probably a very small concern to have logging code in your application that does not "run". The overhead of the "null" logger or conditionals is likely to be very small in the scheme of things. The strings will incur memory overhead which could be worrying for a constrained device, but as it is WP7 the minimum specs are not that constrained in reality.
I understand that logging code looks fugly though. :)
If you really want to strip that logging code out...
In .NET you can use the ConditionalAttribute to mark methods for conditional compilation. You could leverage this feature to ensure that all logging calls are removed from compilation for specified build configurations. As long as the methods you have decorated with the conditional attribute follow a few rules, the compiler will literally strip the call chain out.
However, if you wanted to use this approach then you would have to forgo your interface design as the conditional attribute cannot be applied to interface members, and you cannot implement interfaces with conditional members.
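One possible compromise (a sketch, not the only option) keeps the ILogger interface unchanged and puts the [Conditional] attribute on a static extension method that forwards to it, since extension methods are ordinary static void methods and may be conditional. ListLogger, LoggerExtensions and DebugIf are hypothetical names, and VERBOSE_LOGGING stands in for whatever symbol you define (for example via DefineConstants in the project file).

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public interface ILogger
{
    void Debug(string message);
}

public sealed class ListLogger : ILogger
{
    public List<string> Lines { get; } = new List<string>();
    public void Debug(string message) => Lines.Add(message);
}

public static class LoggerExtensions
{
    // Callers write logger.DebugIf(...); the call disappears when the symbol is undefined.
    [Conditional("VERBOSE_LOGGING")]
    public static void DebugIf(this ILogger logger, string message) => logger.Debug(message);
}
```

The trade-off is that call sites must use the wrapper; direct calls to ILogger.Debug are still compiled in.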

What are the differences between Generics in C# and Java... and Templates in C++? [closed]

I mostly use Java and generics are relatively new. I keep reading that Java made the wrong decision or that .NET has better implementations etc. etc.
So, what are the main differences between C++, C#, Java in generics? Pros/cons of each?
I'll add my voice to the noise and take a stab at making things clear:
C# Generics allow you to declare something like this.
List<Person> foo = new List<Person>();
and then the compiler will prevent you from putting things that aren't Person into the list.
Behind the scenes the C# compiler is just putting List<Person> into the .NET dll file, but at runtime the JIT compiler goes and builds a new set of code, as if you had written a special list class just for containing people - something like ListOfPerson.
The benefit of this is that it makes it really fast. There's no casting or any other stuff, and because the dll contains the information that this is a List of Person, other code that looks at it later on using reflection can tell that it contains Person objects (so you get intellisense and so on).
The downside of this is that old C# 1.0 and 1.1 code (from before they added generics) doesn't understand these new List<something> types, so you have to manually convert things back to a plain old ArrayList to interoperate with it. This is not that big of a problem, because C# 2.0 binary code is not backwards compatible anyway. The only time this will ever happen is if you're upgrading some old C# 1.0/1.1 code to C# 2.0.
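The "reflection can tell it contains Person objects" point can be checked directly: a List<Person> object still knows its type argument at runtime, and List<Person> and List<string> are distinct runtime types. A small sketch; ReifiedGenericsDemo and ElementType are hypothetical names, and ElementType assumes it is given an instance of a generic type.

```csharp
using System;
using System.Collections.Generic;

public sealed class Person { }

public static class ReifiedGenericsDemo
{
    // Assumes 'list' is an instance of a generic type with at least one type argument.
    public static Type ElementType(object list) =>
        list.GetType().GetGenericArguments()[0];
}
```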
Java Generics allow you to declare something like this.
ArrayList<Person> foo = new ArrayList<Person>();
On the surface it looks the same, and it sort-of is. The compiler will also prevent you from putting things that aren't Person into the list.
The difference is what happens behind the scenes. Unlike C#, Java does not go and build a special ListOfPerson - it just uses the plain old ArrayList which has always been in Java. When you get things out of the array, the usual Person p = (Person)foo.get(1); casting-dance still has to be done. The compiler is saving you the key-presses, but the speed hit/casting is still incurred just like it always was.
When people mention "Type Erasure" this is what they're talking about. The compiler inserts the casts for you, and then 'erases' the fact that it's meant to be a list of Person, not just Object.
The benefit of this approach is that old code which doesn't understand generics doesn't have to care. It's still dealing with the same old ArrayList as it always has. This is more important in the Java world because they wanted to support compiling code using Java 5 with generics, and having it run on old 1.4 or previous JVMs, which Microsoft deliberately decided not to bother with.
The downside is the speed hit I mentioned previously, and also because there is no ListOfPerson pseudo-class or anything like that going into the .class files, code that looks at it later on (with reflection, or if you pull it out of another collection where it's been converted into Object or so on) can't tell in any way that it's meant to be a list containing only Person and not just any other array list.
C++ Templates allow you to declare something like this
std::list<Person>* foo = new std::list<Person>();
It looks like C# and Java generics, and it will do what you think it should do, but behind the scenes different things are happening.
It has the most in common with C# generics in that it builds special pseudo-classes rather than just throwing the type information away like Java does, but it's a whole different kettle of fish.
Both C# and Java produce output which is designed for virtual machines. If you write some code which has a Person class in it, in both cases some information about a Person class will go into the .dll or .class file, and the JVM/CLR will do stuff with this.
C++ produces native machine code (e.g. raw x86). Not everything is an object, and there's no underlying virtual machine which needs to know about a Person class. There's no boxing or unboxing, and functions don't have to belong to classes, or indeed anything.
Because of this, the C++ compiler places no restrictions on what you can do with templates - basically any code you could write manually, you can get templates to write for you.
The most obvious example is adding things:
In C# and Java, the generics system needs to know what methods are available for a class, and it needs to pass this down to the virtual machine. The only way to tell it this is by either hard-coding the actual class in, or using interfaces. For example:
string addNames<T>( T first, T second ) { return first.Name() + second.Name(); }
That code won't compile in C# or Java, because it doesn't know that the type T actually provides a method called Name(). You have to tell it - in C# like this:
interface IHasName{ string Name(); };
string addNames<T>( T first, T second ) where T : IHasName { .... }
And then you have to make sure the things you pass to addNames implement the IHasName interface and so on. The java syntax is different (<T extends IHasName>), but it suffers from the same problems.
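A complete, compilable version of the constrained C# method might look like this. Person, Names and the constructor argument are hypothetical fill-ins for the elided parts above; the constraint is what allows first.Name() to compile.

```csharp
public interface IHasName
{
    string Name();
}

public sealed class Person : IHasName
{
    private readonly string _name;
    public Person(string name) => _name = name;
    public string Name() => _name;
}

public static class Names
{
    // Without 'where T : IHasName' the call first.Name() would not compile.
    public static string AddNames<T>(T first, T second) where T : IHasName =>
        first.Name() + second.Name();
}
```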
The 'classic' case for this problem is trying to write a function which does this
string addNames<T>( T first, T second ) { return first + second; }
You can't actually write this code because there are no ways to declare an interface with the + method in it. You fail.
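This was true when the answer was written. Since C# 11 / .NET 7, interfaces can declare static abstract members, and the framework's System.Numerics.IAdditionOperators<TSelf, TOther, TResult> declares operator +, so the numeric version of the "classic" case can now be expressed in C#. A sketch assuming a .NET 7 or later target (GenericMath and Add are hypothetical names); Java generics still have no equivalent.

```csharp
using System.Numerics;

public static class GenericMath
{
    // Compiles only on C# 11 / .NET 7+: T must provide a static operator +.
    public static T Add<T>(T first, T second) where T : IAdditionOperators<T, T, T> =>
        first + second;
}
```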
C++ suffers from none of these problems. The compiler doesn't care about passing types down to any VMs: if both your objects have a .Name() function, it will compile. If they don't, it won't. Simple.
So, there you have it :-)
C++ rarely uses the “generics” terminology. Instead, the word “templates” is used and is more accurate. Templates describes one technique to achieve a generic design.
C++ templates are very different from what both C# and Java implement, for two main reasons. The first reason is that C++ templates allow not only compile-time type arguments but also compile-time constant-value arguments: template arguments can be integers or even function signatures. This means that you can do some quite funky stuff at compile time, e.g. calculations:
template <unsigned int N>
struct product {
static unsigned int const VALUE = N * product<N - 1>::VALUE;
};
template <>
struct product<1> {
static unsigned int const VALUE = 1;
};
// Usage:
unsigned int const p5 = product<5>::VALUE;
This code also uses the other distinguishing feature of C++ templates, namely template specialization. The code defines one class template, product, that has one value argument. It also defines a specialization for that template that is used whenever the argument evaluates to 1. This allows me to define a recursion over template definitions. (This kind of compile-time computation is usually credited to Erwin Unruh, and was later popularized by Todd Veldhuizen and Andrei Alexandrescu.)
Template specialization is important for C++ because it allows for structural differences in data structures. Templates as a whole are a means of unifying an interface across types. However, although this is desirable, not all types can be treated equally inside the implementation. C++ templates take this into account. This is very much the same distinction that OOP makes between interface and implementation with the overriding of virtual methods.
C++ templates are essential for its algorithmic programming paradigm. For example, almost all algorithms for containers are defined as functions that accept the container type as a template type and treat them uniformly. Actually, that's not quite right: C++ doesn't work on containers but rather on ranges that are defined by two iterators, pointing to the beginning and just past the end of the container. Thus, the whole content is circumscribed by the iterators: begin <= elements < end.
Using iterators instead of containers is useful because it allows algorithms to operate on parts of a container instead of on the whole.
Another distinguishing feature of C++ is the possibility of partial specialization for class templates. This is somewhat related to pattern matching on arguments in Haskell and other functional languages. For example, let's consider a class that stores elements:
template <typename T>
class Store { … }; // (1)
This works for any element type. But let's say that we can store pointers more efficiently than other types by applying some special trick. We can do this by partially specializing for all pointer types:
template <typename T>
class Store<T*> { … }; // (2)
Now, whenever we instantiate a container template for one type, the appropriate definition is used:
Store<int> x; // Uses (1)
Store<int*> y; // Uses (2)
Store<string**> z; // Uses (2), with T = string*.
Anders Hejlsberg himself described the differences here "Generics in C#, Java, and C++".
There are already a lot of good answers on what the differences are, so let me give a slightly different perspective and add the why.
As was already explained, the main difference is type erasure, i.e. the fact that the Java compiler erases the generic types and they don't end up in the generated bytecode. However, the question is: why would anyone do that? It doesn't make sense! Or does it?
Well, what's the alternative? If you don't implement generics in the language, where do you implement them? And the answer is: in the Virtual Machine. Which breaks backwards compatibility.
Type erasure, on the other hand, allows you to mix generic clients with non-generic libraries. In other words: code that was compiled on Java 5 can still be deployed to Java 1.4.
Microsoft, however, decided to break backwards compatibility for generics. That's why .NET Generics are "better" than Java Generics.
Of course, Sun aren't idiots or cowards. The reason why they "chickened out", was that Java was significantly older and more widespread than .NET when they introduced generics. (They were introduced roughly at the same time in both worlds.) Breaking backwards compatibility would have been a huge pain.
Put yet another way: in Java, Generics are a part of the Language (which means they apply only to Java, not to other languages), in .NET they are part of the Virtual Machine (which means they apply to all languages, not just C# and Visual Basic.NET).
Compare this with .NET features like LINQ, lambda expressions, local variable type inference, anonymous types and expression trees: these are all language features. That's why there are subtle differences between VB.NET and C#: if those features were part of the VM, they would be the same in all languages. But the CLR hasn't changed: it's still the same in .NET 3.5 SP1 as it was in .NET 2.0. You can compile a C# program that uses LINQ with the .NET 3.5 compiler and still run it on .NET 2.0, provided that you don't use any .NET 3.5 libraries. That would not work with generics and .NET 1.1, but it would work with Java and Java 1.4.
Follow-up to my previous posting.
Templates are one of the main reasons why C++ fails so abysmally at intellisense, regardless of the IDE used. Because of template specialization, the IDE can never be really sure if a given member exists or not. Consider:
template <typename T>
struct X {
void foo() { }
};
template <>
struct X<int> { };
typedef int my_int_type;
X<my_int_type> a;
a.|
Now, the cursor is at the indicated position and it's damn hard for the IDE to say at that point if, and what, members a has. For other languages the parsing would be straightforward but for C++, quite a bit of evaluation is needed beforehand.
It gets worse. What if my_int_type were defined inside a class template as well? Now its type would depend on another type argument. And here, even compilers fail.
template <typename T>
struct Y {
typedef T my_type;
};
template <typename T>
void f() {
X<Y<T>::my_type> b; // error
}
After a bit of thinking, a programmer would conclude that Y<T>::my_type is simply T, so for T = int, b should be the same type as a, right?
Wrong. At the point where the compiler parses f, it doesn't actually know what Y<T>::my_type is: Y might later be specialized for the T that is eventually supplied, and in that specialization my_type could be something else, e.g. a member function or a static field. Because the name depends on T, the compiler assumes it does not denote a type unless told otherwise, so the declaration fails. (With a concrete argument outside any template, such as X<Y<int>::my_type>, there is no such problem.) We have to tell it explicitly that we refer to a type name:
X<typename Y<T>::my_type> b;
Now, the code compiles. To see how ambiguities arise from this situation, consider the following statement inside f:
Y<T>::my_type(123);
Without typename, this statement is parsed as a call to a function named Y<T>::my_type. However, if my_type is a type rather than a function, the intended meaning would be a function-style cast (often a constructor invocation), and you have to say so with typename. The compiler can't tell which we mean, so we have to disambiguate.
Both Java and C# introduced generics after their first language release. However, there are differences in how the core libraries changed when generics were introduced. C#'s generics are not just compiler magic, and so it was not possible to generify existing library classes without breaking backwards compatibility.
For example, in Java the existing Collections Framework was completely genericised. Java does not have both a generic and legacy non-generic version of the collections classes. In some ways this is much cleaner - if you need to use a collection in C# there is really very little reason to go with the non-generic version, but those legacy classes remain in place, cluttering up the landscape.
Another notable difference is the Enum classes in Java and C#. Java's Enum has this somewhat tortuous looking definition:
// java.lang.Enum Definition in Java
public abstract class Enum<E extends Enum<E>> implements Comparable<E>, Serializable {
(See Angelika Langer's very clear explanation of exactly why this is so.) Essentially, this means Java can give type-safe access from a string to its Enum value:
// Parsing String to Enum in Java
Colour colour = Colour.valueOf("RED");
Compare this to C#'s version:
// Parsing String to Enum in C#
Colour colour = (Colour)Enum.Parse(typeof(Colour), "RED");
As Enum already existed in C# before generics were introduced to the language, the definition could not change without breaking existing code. So, like collections, it remains in the core libraries in this legacy state.
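The .NET answer to this was to add generic overloads next to the old API rather than change it: Enum.TryParse<TEnum> has existed since .NET Framework 4.0, and Enum.Parse<TEnum>(string) was added in .NET Core 2.0. A small sketch contrasting the legacy call quoted above with the generic TryParse; Colour uses the answer's member names, and ColourParsing with its methods are hypothetical names.

```csharp
using System;

public enum Colour { RED, GREEN, BLUE }

public static class ColourParsing
{
    // The original non-generic API: needs typeof(...) and a cast.
    public static Colour ParseLegacy(string text) =>
        (Colour)Enum.Parse(typeof(Colour), text);

    // Generic overload added later alongside it; TEnum is inferred from the out parameter.
    public static bool TryParseGeneric(string text, out Colour colour) =>
        Enum.TryParse(text, out colour);
}
```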
11 months late, but I think this question is ready for some Java Wildcard stuff.
This is a syntactical feature of Java. Suppose you have a method:
public <T> void Foo(Collection<T> thing)
And suppose you don't need to refer to the type T in the method body. You're declaring a name T and then only using it once, so why should you have to think of a name for it? Instead, you can write:
public void Foo(Collection<?> thing)
The question mark asks the compiler to pretend that you declared a normal named type parameter that only needs to appear once in that spot.
There's nothing you can do with wildcards that you can't also do with a named type parameter (which is how these things are always done in C++ and C#).
Wikipedia has great write-ups comparing both Java/C# generics and Java generics/C++ templates. The main article on Generics seems a bit cluttered but it does have some good info in it.
The biggest complaint is type erasure: generics are not enforced at runtime. Here's a link to some Sun docs on the subject.
Generics are implemented by type erasure: generic type information is present only at compile time, after which it is erased by the compiler.
C++ templates are actually much more powerful than their C# and Java counterparts, as they are evaluated at compile time and support specialization. This allows for template metaprogramming and makes the C++ template system Turing-complete (i.e. during compilation you can, in principle, compute anything that is computable by a Turing machine).
In Java, generics are compiler level only, so you get:
a = new ArrayList<String>()
a.getClass() => ArrayList
Note that the type of 'a' is an array list, not a list of strings. So the type of a list of bananas would equal() a list of monkeys.
So to speak.
Looks like, among other very interesting proposals, there is one about refining generics and breaking backwards compatibility:
Currently, generics are implemented using erasure, which means that the generic type information is not available at runtime, which makes some kind of code hard to write. Generics were implemented this way to support backwards compatibility with older non-generic code. Reified generics would make the generic type information available at runtime, which would break legacy non-generic code. However, Neal Gafter has proposed making types reifiable only if specified, so as to not break backward compatibility.
(from Alex Miller's article about Java 7 proposals)
NB: I don't have enough points to comment, so feel free to move this as a comment to the appropriate answer.
Contrary to popular belief (and I have never understood where it came from), .NET implemented true generics without breaking backward compatibility, and Microsoft spent explicit effort on that.
You don't have to change your non-generic .NET 1.0 code into generics just to use it in .NET 2.0. Both the generic and non-generic lists are still available from .NET Framework 2.0 through 4.0, purely for backward compatibility. Therefore old code that still uses the non-generic ArrayList will still work, and uses the same ArrayList class as before.
Backward code compatibility has been maintained from 1.0 until now, so even in .NET 4.0 you still have the option to use any non-generic class from the 1.0 BCL if you choose to do so.
So I don't think java has to break backward compatibility to support true generics.
