C#'s compiler design - forward referencing

In a language that allows forward references, such as C#, how does the compiler handle this? What are the steps in which the compiler operates?

The main difference between allowing forward references or not is whether you use a one-pass compiler or a multi-pass one. To handle forward referencing, you have to resolve symbol definitions and do type checking AFTER having generated the full abstract syntax tree of the source you are compiling.
So there is no real problem: when you first find a forward reference, you simply assume it will be defined later (you can mark it as pending in the symbol table); then, when you find the actual definition, you refine the symbol object in the symbol table.
Afterwards you can type-check and check whether some symbols are still pending (meaning there is no real definition, so you can raise a semantic error).
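A rough sketch of that pending-symbol bookkeeping (hypothetical types, not any particular compiler's implementation) might look like this:

using System.Collections.Generic;

// Symbols start out "pending" when referenced, and are refined once declared.
class Symbol
{
    public Symbol(string name) => Name = name;
    public string Name { get; }
    public string Type { get; set; }
    public bool IsPending { get; set; }
}

class SymbolTable
{
    private readonly Dictionary<string, Symbol> symbols = new Dictionary<string, Symbol>();

    // Called when a reference to a name is encountered; creates a pending entry if needed.
    public Symbol Reference(string name)
    {
        if (!symbols.TryGetValue(name, out var sym))
            symbols[name] = sym = new Symbol(name) { IsPending = true };
        return sym;
    }

    // Called when the actual declaration is found; refines the pending entry.
    public void Define(string name, string type)
    {
        var sym = Reference(name);
        sym.Type = type;
        sym.IsPending = false;
    }

    // After the whole AST has been processed: anything still pending is undefined
    // and should be reported as a semantic error.
    public IEnumerable<Symbol> UnresolvedSymbols()
    {
        foreach (var sym in symbols.Values)
            if (sym.IsPending)
                yield return sym;
    }
}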

It does this by doing two passes of compilation. The first pass parses the code and collects all identifiers used; the second pass resolves all identifiers.
In a language with a single-pass compiler, like Pascal, only backward references can be used, as the type of an identifier has to be known before it can be resolved.
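For instance, this is legal C# even though Helper is declared after its first use, while a classic one-pass Pascal compiler would require the declaration (or an explicit forward declaration) to appear before the call:

using System;

class Demo
{
    static void Main()
    {
        // Forward reference: Helper has not been declared yet at this point in the file.
        Console.WriteLine(Helper(21));
    }

    // The declaration appears after the use; the compiler resolves it on a later pass.
    static int Helper(int x) => x * 2;
}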

Exactly the same way C++ handles it, I think. The only difference: C#'s syntax is simple enough that the compiler can construct the parse tree without needing you to tell it what kind of syntactic entity your yet-undeclared symbols refer to.

Related

Looking for clarification on the behavior of Reflection.Emit when used with a lambda expression

Specifically, I am attempting to create a method that takes in a lambda expression and spins that expression off as the main method of a temporary console application at run time. The goal is to run small sections of code as a separate process so I can better isolate their memory behavior (I did look into application domains but ran into other problems there due to certain limitations for my use case). Sort of a really limited fork.
This is relatively straightforward if it can be assumed that the lambda expression contains only local variables, but I am struggling to figure out just how much I'll have to do (and the best way to go about it) if the expression also makes use of non-local variables (that is, a variable that exists within the enclosing scope, as per the comment below. I couldn't think of a better way at the time to phrase "a variable that is not within the local scope but is accessible").
To my knowledge, a non-local variable means a field load instruction of some sort will be generated within the MSIL. While I can potentially make a copy of the required objects/fields in the secondary application, if I take the MSIL of the lambda expression as-is via MethodBody.GetILAsByteArray(), then the generated code (I believe) will contain field load instructions targeting metadata table entries that may be (most likely will be) different from the metadata table entries for the copies of those objects/fields created in the console application via Reflection.Emit.
Further complicating this is the matter of closures, which (if I remember correctly) mean that any non-local reference within the lambda expression body will cause an object to be created to hold the values (copies? references? I don't quite recall). I probably don't need to worry about that though, because it won't actually be a lambda expression once it is emitted as that second application in my particular use case? If I'm just getting the method body, I think I'd end up sidestepping the usual handling for closures?
Ultimately I have two questions:
A. Is there anything I'm missing in my general understanding of how this whole process will work?
B. Will I have to go in and adjust the field table references in the MSIL, and if so, what would be the most pragmatic way to go about this? Is there any way to get Reflection.Emit to make these adjustments for me?
Of course, I'd be happy to hear if there is some much less frustrating way to accomplish what I am attempting to do and am open to any suggestions.
Accessing local variables of the enclosing scope from inside an anonymous method/lambda expression creates a closure object holding those variables. Assuming you pass the lambda as a delegate, its Target property will contain the captured outside state (an instance of a compiler-generated class, something like DisplayClass). Unless you modify the CIL of the method, you won't get real-time communication with the remote process via this class, but you can simply serialize it and pass it to the remote process. Of course, if the delegate depends on static fields, you are left with analyzing the method to find them and serializing them as well (using System.Linq.Expressions will be helpful).
Now, if the remote process references the main assembly, it will find the DisplayClass there; if not, you will have to serialize even its type and build it on the other side, hooking it up with the method using AppDomain.TypeResolve. Then you can deserialize the object with the created type.
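As a small illustration of the closure/Target point above (a sketch only; the exact name of the compiler-generated class varies):

using System;

class ClosureDemo
{
    static void Main()
    {
        int counter = 42;                          // local captured by the lambda
        Action print = () => Console.WriteLine(counter);

        // Target is the compiler-generated closure instance
        // (a class typically named something like "<>c__DisplayClass0_0").
        object closure = print.Target;
        Console.WriteLine(closure.GetType().Name);

        // Its public fields hold the captured variables - this is the state
        // you would serialize and ship to the remote process.
        foreach (var field in closure.GetType().GetFields())
            Console.WriteLine(field.Name + " = " + field.GetValue(closure));
    }
}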

Why does "dynamic" require language-specific runtime components?

Microsoft.CSharp is required to use the dynamic feature.
I understand there are binders, evaluators and helpers in the assembly.
But why does it have to be language-specific?
Why Microsoft.CSharp and not Microsoft.Dynamic or System.Dynamic?
Please, explain.
Let's say we have d.x where d is dynamic.
C# compiler
1. applies C# language rules
2. gets "property or field access"
3. emits (figuratively) Binder.GetPropertyOrField(d, "x")
Now, being asked to reference Microsoft.CSharp may make one think that a language-agnostic binder can't handle this case, and that something C#-only made its way through compilation and requires a special library.
Compiler had a bad day?
To your first question, it is language-specific because it needs to be.
In C#, calling a method with too many arguments gives you an error. In JavaScript, the extra arguments are simply ignored. In C#, accessing a member that doesn't exist gives you an error, while in JavaScript you get undefined. Even if you discovered all these varying feature sets and put them all into System.Core, the next language fad of the month is sure to have some super neat feature that it wouldn't support. It's better to be flexible.
There is common code, under the System.Dynamic and System.Runtime.CompilerServices namespaces in System.Core. It just can't all be common.
And as for your second question, the need for the "special C# library" could of course be removed by transforming these language-specific behaviors inline, but why? That would needlessly bloat your IL code size. It's the same reason you don't write your own Int32.Parse every time you need to read in a number.
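As a rough sketch of what the compiler lowers d.x into (simplified; the real emitted code caches the CallSite in a static field), note that the binder encoding C#'s rules comes from Microsoft.CSharp, while the call-site machinery is the shared DLR code:

using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

class DynamicLoweringSketch
{
    // Roughly equivalent to: object GetX(dynamic d) => d.x;
    static object GetX(object d)
    {
        // C#-specific: this binder applies C#'s member-lookup rules (Microsoft.CSharp.dll).
        var binder = Binder.GetMember(
            CSharpBinderFlags.None,
            "x",
            typeof(DynamicLoweringSketch),
            new[] { CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null) });

        // Language-agnostic: the DLR call-site plumbing (System.Core).
        var site = CallSite<Func<CallSite, object, object>>.Create(binder);
        return site.Target(site, d);
    }
}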
One reason I can think of: Visual Basic .NET has had late binding in it from day one, primarily oriented around how it interoperates with COM IDispatch interfaces. So if they wanted a language-agnostic binder, they'd have had to adopt the Visual Basic rules - which include that member lookup only works with Public members.
Apparently, the C# designers didn't want to be so strict. You can call this class' DoStuff method from C# via a dynamic reference:
public class Class1
{
    internal void DoStuff()
    {
        Console.WriteLine("Hello");
    }
}
Whereas attempting to call the same via Visual Basic's Object results in a MissingMemberException at runtime.
So because the C# designers weren't the first to arrive at the late-binding party, they could either follow Visual Basic's lead or they could say "each language will have its own rules" - they went with the latter.
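For completeness, a minimal call site (assuming it lives in the same assembly as Class1, so the C# runtime binder's accessibility check passes for the internal member) would be:

class Program
{
    static void Main()
    {
        dynamic d = new Class1();
        d.DoStuff();    // bound at runtime by the C# binder; prints "Hello"
    }
}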

If attributes are only constructed when they are reflected into, why are attribute constructors so limited?

As shown here, attribute constructors are not called until you reflect to get the attribute values. However, as you may also know, you can only pass compile-time constant values to attribute constructors. Why is this? I think many people would much prefer to do something like this:
[MyAttribute(new MyClass(foo, bar, baz, jQuery))]
than passing a string (which leads to stringly typed code, too!) with those values turned into strings, then relying on a Regex to recover the values instead of just using the actual values, and relying on exceptions thrown at runtime somewhere that has nothing to do with the class - except that a method it called uses some attributes that were typed wrong - instead of compile-time warnings/errors.
What limitation caused this?
Attributes are part of metadata. You need to be able to reflect on metadata in an assembly without running code in that assembly.
Imagine for example that you are writing a compiler that needs to read attributes from an assembly in order to compile some source code. Do you really want the code in the referenced assembly to be loaded and executed? Do you want to put a requirement on compiler writers that they write compilers that can run arbitrary code in referenced assemblies during the compilation? Code that might crash, or go into infinite loops, or contact databases that the developer doesn't have permission to talk to? The number of awful scenarios is huge and we eliminate all of them by requiring that attributes be dead simple.
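A small illustration of that point: CustomAttributeData reads the recorded constructor arguments straight from metadata without ever invoking the attribute's constructor (the attribute and class names here are made up for the demo):

using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Class)]
sealed class SampleAttribute : Attribute
{
    public SampleAttribute(string name) => Console.WriteLine("constructor ran: " + name);
}

[Sample("demo")]
class Annotated { }

class Program
{
    static void Main()
    {
        // Reads the serialized constructor arguments from metadata;
        // SampleAttribute's constructor is never invoked here.
        foreach (CustomAttributeData cad in typeof(Annotated).GetCustomAttributesData())
            Console.WriteLine(cad.AttributeType.Name + "(" +
                string.Join(", ", cad.ConstructorArguments) + ")");
    }
}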
The issue is with the constructor arguments. They need to come from somewhere; they are not supplied by the code that consumes the attribute. They must be supplied by the Reflection plumbing when it creates the attribute object by calling its constructor, for which it needs the constructor argument values.
This starts at compile time, with the compiler parsing the attribute and recording the constructor arguments. It stores those argument values in the assembly metadata in a binary format. The issue then is that the runtime needs a highly standardized way to deserialize those values, one that preferably doesn't depend on any of the .NET classes you'd normally use to de/serialize data, because there's no guarantee that such classes are actually available at runtime - they won't be in a very trimmed-down version of .NET like the Micro Framework.
Even something as common as binary serialization with the BinaryFormatter class is troublesome; note how it requires the [Serializable] attribute on the class to allow it to do its job. Versioning would also be an enormous problem: clearly such a serializer class could never change, given the risk of breaking attributes in old assemblies.
This is a rock-and-a-hard-place problem, solved by the CLS designers by heavily restricting the allowed types for an attribute constructor. They didn't leave much: just the simple value types, string, Type, and simple one-dimensional arrays of them. Never a problem deserializing those, since their binary representation is simple and unambiguous. Quite a restriction, but attributes can still be pretty expressive. The ultimate fallback is to use a string and decode that string in the constructor at runtime. Creating an object of MyClass isn't an issue; you can do so in the attribute constructor. You'll however have to encode the arguments that this constructor needs as properties of the attribute.
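A sketch of that fallback, with a made-up MyClass standing in for the type from the question: the attribute accepts only the allowed constant types and builds the richer object itself when it is eventually constructed via reflection:

using System;

// Hypothetical class the attribute wants to expose.
sealed class MyClass
{
    public MyClass(string foo, int bar) { Foo = foo; Bar = bar; }
    public string Foo { get; }
    public int Bar { get; }
}

[AttributeUsage(AttributeTargets.Class)]
sealed class MyAttribute : Attribute
{
    public MyClass Value { get; }

    public MyAttribute(string foo, int bar)   // only compile-time constants at the use site
    {
        Value = new MyClass(foo, bar);        // runs only when the attribute is reflected into
    }
}

[My("hello", 42)]
class Decorated { }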
Probably the most correct answer as to why you can only use constants for attributes is that the C#/BCL design team did not judge supporting anything else to be important enough to add (i.e. not worth the effort).
When you build, the C# compiler will instantiate the attributes you have placed in your code and serialize them, so that they can be stored in the generated assembly. It was probably more important to ensure that attributes can be retrieved quickly and reliably than it was to support more complex scenarios.
Also, code that fails because some attribute property value is wrong is much easier to debug than some framework-internal deserialization error. Consider what would happen if the class definition for MyClass lived in an external assembly: you compile and embed one version, then update the class definition for MyClass and run your application - boom!
On the other hand, it's seriously frustrating that DateTime instances are not constants.
What limitation caused this?
The reason it isn't possible to do what you describe is probably not any technical limitation; it's purely a language design decision. Basically, when designing the language they said "this should be possible, but that should not". If they had really wanted this to be possible, the "limitations" would have been dealt with and it would be possible. I don't know the specific reasoning behind this decision, though.
/.../ passing a string (causing stringly typed code too!) with those values, turned into strings, and then relying on Regex to try and get the value instead of just using the actual value /.../
I have been in similar situations. I sometimes wanted to use attributes with lambda expressions to implement something in a functional way. But after all, C# is not a functional language, and when I wrote the code in a non-functional way I no longer had the need for such attributes.
In short, I think like this: if I want to develop this in a functional way, I should use a functional language like F#. Since I use C# and do it in a non-functional way, I don't need such attributes.
Perhaps you should simply reconsider your design and not use attributes the way you currently do.
UPDATE 1:
I claimed C# is not a functional language, but that is a subjective view and there is no rigorous definition of "functional language". I agree with Adam Wright: "/.../ As such, I wouldn't class C# as functional in general discussion - it is, at best, multi-paradigm, with some functional flavour." at Why is C# a functional programming language?
UPDATE 2:
I found this post by Jon Skeet: https://stackoverflow.com/a/294259/1105687 It concerns not allowing generic attribute types, but the reasoning could be similar in this case:
Answer from Eric Lippert (paraphrased): no particular reason, except to avoid complexity in both the language and compiler for a use case which doesn't add much value.

Checking for the existence a reference/type at compile time in .NET

I've recently found the need to check at compile time whether either: a) a certain assembly reference exists and can be successfully resolved, or b) a certain class (whose fully qualified name is known) is defined. These two situations are equivalent for my purposes, so being able to check for either one would be good enough. Is there any way to do this in .NET/C#? Preprocessor directives initially struck me as something that might help, but it seems they don't have the necessary capability.
Of course, checking for the existence of a type at runtime can be done easily enough, but unfortunately that won't solve my particular problem in this situation. (I need to be able to ignore the fact that a certain reference is missing and thus fall back to another approach in code.)
Is there a reason you can't add a reference and then use a typeof expression on a type from the assembly to verify it's available?
var x = typeof(SomeTypeInSomeAssembly);
If the assembly containing SomeTypeInSomeAssembly is not referenced and available this will not compile.
It sounds like you want the compiler to ignore one branch of code, which is really only doable by hiding it behind an #if block. Would defining a compiler constant and using #if work for your purposes?
#if MyConstant
.... code here that uses the type ....
#else
.... workaround code ....
#endif
Another option would be to not depend on the other class at compile-time at all, and use reflection or the .NET 4.0 dynamic keyword to use it. If it'll be called repeatedly in a perf-critical scenario in .NET 3.5 or earlier, you could use DynamicMethod to build your code on first use instead of using reflection every time.
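A rough sketch of that reflection/dynamic fallback (the type, assembly, and member names here are hypothetical):

using System;

static class OptionalFeature
{
    // Hypothetical assembly-qualified name of the type from the optional reference.
    const string TypeName = "Some.Optional.Namespace.SomeType, Some.Optional.Assembly";

    public static void Run()
    {
        Type type = Type.GetType(TypeName, throwOnError: false);
        if (type != null)
        {
            // Bind members at run time so nothing from the assembly is needed at compile time.
            dynamic instance = Activator.CreateInstance(type);
            instance.DoWork();                // hypothetical member on SomeType
        }
        else
        {
            // ... fall back to the approach that doesn't need the reference ...
        }
    }
}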
I seem to have found a solution here, albeit not precisely what I was initially hoping for.
My Solution:
What I ended up doing is creating a new build configuration and then defining a precompiler constant, which I used in code to determine whether to use the reference, or to fall back to the alternative (guaranteed to work) approach. It's not fully automatic, but it's relatively simple and seems quite elegant - good enough for my purposes.
Alternative:
If you wanted to fully automate this, it could be done using a pre-build command that runs a batch script/small program to check the availability of a given reference on the machine and then updates a file containing precompiler constants. I considered this more effort than it was worth, though it may have been more useful if I had multiple independent references that I needed to resolve (check availability for).

Is it true that using "this." before the parameters in c# uses more memory?

than just to call the parameter as it is?
If you mean fields, then no. The compiler injects "this" (ldarg.0) whether you use it explicitly (this.foo) or implicitly (foo).
It does, however, take 5 more characters in your source code... so a handful of bytes on your development hard disk. It will make exactly zero difference in the compiled IL or at runtime.
There are two scenarios where use of "this" changes things:
when there is a variable/parameter with the same name (this.foo = foo;)
when resolving extension methods (this.SomeMethod();)
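A short example covering both cases (GreetLoudly is a made-up extension method):

using System;

class Person
{
    private string name;

    public Person(string name)
    {
        // 1) The parameter shadows the field, so "this." is needed to disambiguate.
        this.name = name;
    }

    public void Greet()
    {
        // With or without "this." this compiles to the same IL (ldarg.0 + ldfld).
        Console.WriteLine(name);

        // 2) Extension methods called on the current instance require "this.".
        this.GreetLoudly();
    }
}

static class PersonExtensions
{
    public static void GreetLoudly(this Person person) => Console.WriteLine("HELLO!");
}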
Your question is much too ambiguous to answer definitively, but I would still start with a resounding "no".
Then I'd want to know what exactly you mean by "parameter". I would normally interpret it as "argument to a method", but those are not tied to "this" within a scope, so you probably meant "members" such as fields, properties and/or methods.
If all of my assumptions about how to interpret your question are correct, I stand by my former "no".
But I would like to know where you got that idea from.
I do not know if it uses more memory, but I don't think so; it's only an explicit reference, something the compiler would add under the hood anyway.
I guess you mean before the variable name? I do not see why it would use more memory. Whichever syntax you use to refer to the variable, the compiler resolves it the same way, so it doesn't affect performance (the "this" is added for you anyway). So no, it doesn't.
