Are there built-in "simplifications" with Roslyn? - c#

Is there any built-in way to use Roslyn to perform the same compile-time transformations that the C# compiler does, e.g. for transforming iterators, initializers, lambdas, LINQ, etc. into basic C# code?

The Roslyn compiler API is designed (in addition to translating source code to IL) to let you build source-code analysis and transformation tools.
However, lambdas and iterators do not have lowerings that can always be expressed as source. They are modeled using the compiler's internal bound-node abstraction, which includes additional compiler-specific rules that can only be represented in IL.
It would be possible to translate LINQ to source in C#, since it is specified as a source-code translation (whether or not the compiler actually does it that way). Yet there is no compiler API that does this specifically. If there were, it would probably show up as a services-layer API and not a compiler API.

AFAIK, no, there is no such thing exposed in Roslyn. But the compiler has to do these transformations somehow, so it's possible you will be able to do this by accessing some internal method.
Of course, you could use Roslyn to make these transformations yourself, but that's not what you're asking.
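To illustrate that last point, here is a rough sketch of such a do-it-yourself transformation using the shipped Roslyn API (CSharpSyntaxRewriter from Microsoft.CodeAnalysis.CSharp). It expands a simple using statement into a try/finally and is only a source-level approximation of the compiler's real lowering, not the compiler's own code path:

using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Sketch: expand `using (var x = ...) { ... }` into a try/finally that calls Dispose().
class UsingLowerer : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitUsingStatement(UsingStatementSyntax node)
    {
        // Only handle the `using (declaration) { block }` form to keep the sketch short.
        if (node.Declaration == null || !(node.Statement is BlockSyntax body))
            return base.VisitUsingStatement(node);

        var name = node.Declaration.Variables[0].Identifier.Text;

        var dispose = SyntaxFactory.ParseStatement(
            $"((System.IDisposable){name})?.Dispose();");

        var tryFinally = SyntaxFactory.TryStatement(
            body,
            default,
            SyntaxFactory.FinallyClause(SyntaxFactory.Block(dispose)));

        var declaration = SyntaxFactory.LocalDeclarationStatement(node.Declaration);

        // Replace the using statement with { declaration; try { ... } finally { Dispose(); } }
        return SyntaxFactory.Block(declaration, tryFinally).WithTriviaFrom(node);
    }
}

class Demo
{
    static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "class C { void M(System.IO.Stream s) { using (var r = new System.IO.StreamReader(s)) { r.ReadLine(); } } }");
        var rewritten = new UsingLowerer().Visit(tree.GetRoot());
        Console.WriteLine(rewritten.NormalizeWhitespace().ToFullString());
    }
}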

Related

Is it possible to instruct C# compiler NOT to inline constants?

This question is related to How to detect static code dependencies in C# code in the presence of constants?
If type X depends on a constant defined in type Y, this dependency is not captured in the binary code, because the constant is inlined. Yet the dependency is there: try compiling X without Y and the compilation fails. So it is a compile-time dependency, but not a runtime one.
I need to be able to discover such dependencies, and scanning all the source code is prohibitively expensive. However, I have full control over the build, and if there is a way to instruct the C# compiler not to inline constants, that is good enough for me.
Is there a way to compile C# code without inlining the constants?
EDIT 1
I would like to respond to all the comments so far:
I cannot modify the source code. This is not a toy project. I am analysing a big code base - millions of lines of C# code.
I am already using the Roslyn API to examine the source code. However, I only do it when the binary code inspection (I use Mono.Cecil) of a method indicates the use of dynamic types. Analysing methods that use dynamic with Roslyn is useful, because not all dynamic usages are as bad as reflection. However, there is absolutely no way to figure out from the binary code that a method uses a constant. Using a Roslyn analyser for that takes a really long time because of the code base size - hence my "prohibitively expensive" statement.
I have an NDepend license and I used it at first. However, it only processes binary code, so it does NOT see any dependencies introduced through constants. My analysis is better, because I drill down to dynamic users and employ the Roslyn API to harvest as much as I can from such methods; NDepend does nothing of the kind. Moreover, it has bugs. For example, the latest version does not inspect generic method constraints and thus does not recognise any dependencies introduced by them.
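For readers unfamiliar with the inlining being described, a minimal illustration: a const value is copied into the caller's IL as a literal, while a static readonly field keeps a visible field reference, which is why the dependency on Y disappears from the binary.

// Minimal illustration of the inlining the question describes.
public static class Y
{
    public const int MaxItems = 100;                // callers embed the literal 100
    public static readonly int MaxItemsField = 100; // callers emit a field load (ldsfld)
}

public static class X
{
    public static int FromConst() => Y.MaxItems;      // compiled IL contains no reference to Y
    public static int FromField() => Y.MaxItemsField; // compiled IL references Y, so the dependency is visible
}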

Are IEnumerable<T>, Task<T> and IDisposable hard coded in the C# compiler?

I asked that question myself many times. I tried to find some blog post about that and even dug into the Roslyn source code, but have not found any complete answer on that.
Basically, with some modern C# language features the compiler takes syntactic sugar and transforms it into more low-level C# code. Some examples are:
using() generates a try-finally to make sure an IDisposable is disposed
Functions returning IEnumerable<T> that use yield return are turned into iterators implemented as state machines
Functions marked async have to return Task<T> (or similar) and are turned into state machines too, which can be resumed from the program's event loop under the hood
So these are all nice features, but the compiler always enforces the specific types IEnumerable<T>, Task<T> and IDisposable. Are these types somehow baked into the compiler? And isn't the compiler then somehow bound to the standard library, even though mscorlib is just plain C# code providing common functionality?
I cannot imagine that, since programming languages are so abstract and general. As I have seen, there is the possibility of await-ing anything as long as the type has a GetAwaiter extension method. That sounds more abstract to me.
Edit
Also, if anyone can point me to the source code which specifies the required predefined types in the compiler, let me know!
Sort-of.
The compiler has lists of "special" (used in the type system / binder) and "well-known" (referenced by generated code) types and members, which are hard-coded by name in the Roslyn source. However, all the compiler cares about are the names and signatures of those types and members; you can still write your own mscorlib (and people have done this) as long as it provides them.
See
http://sourceroslyn.io/#Microsoft.CodeAnalysis/SpecialType.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/SpecialMember.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/WellKnownTypes.cs
http://sourceroslyn.io/#Microsoft.CodeAnalysis/Symbols/WellKnownMemberNames.cs
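As an illustration of how pattern-based (rather than hard-wired to a single type) await actually is, here is a minimal custom awaitable; the compiler only requires GetAwaiter(), IsCompleted, OnCompleted (via INotifyCompletion) and GetResult(). The Delay/DelayAwaiter types below are made up for this sketch.

using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// A minimal custom awaitable: await only requires the GetAwaiter pattern, not Task<T> itself.
public readonly struct Delay
{
    private readonly TimeSpan _duration;
    public Delay(TimeSpan duration) => _duration = duration;
    public DelayAwaiter GetAwaiter() => new DelayAwaiter(_duration);
}

public readonly struct DelayAwaiter : INotifyCompletion
{
    private readonly TimeSpan _duration;
    public DelayAwaiter(TimeSpan duration) => _duration = duration;

    public bool IsCompleted => _duration <= TimeSpan.Zero;

    // Simplified scheduling: run the continuation after the delay on a pool thread.
    public void OnCompleted(Action continuation) =>
        Task.Delay(_duration).ContinueWith(_ => continuation());

    public void GetResult() { }
}

class Demo
{
    static async Task Main()
    {
        await new Delay(TimeSpan.FromMilliseconds(100)); // compiles because the pattern is satisfied
        Console.WriteLine("done");
    }
}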

How do Generics in C# provide source code protection

I am reading the book "CLR Via C#" and in the Generics chapter is said:
Source code protection
The developer using a generic algorithm doesn't need to have access to the algorithm's source code. With C++ templates or Java's generics, however, the algorithm's source code must be available to the developer who is using the algorithm.
Can anyone explain what exactly is meant by this?
Well, generic classes are distributed in compiled form, unlike C++ templates, which need to be distributed as full source code. So you do not need to distribute the C# source code of a library that contains generic classes.
This does not prevent the receiver of your class from disassembling it, though (it is compiled to IL, which can be decompiled rather easily). To really protect the code, additional measures such as obfuscation are required.
Behind the scenes: this distribution in compiled form is the reason why C# generics and C++ templates also differ in the way they need to be written. C# generic classes and their methods must be fully defined at compile time, and any error in the definition of the generic class or its methods, or any operation on a type parameter that cannot be resolved at compile time, directly produces a compile error. In C++, a template is only compiled at the time of use, and only the methods actually used are compiled. If you have an undefined operation or even a syntax error in a template definition, you will only see the error when that function is actually instantiated and used.
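A small sketch of that difference (not from the book): the C# generic must declare its requirements as constraints so it can be fully checked and compiled to IL once, whereas the equivalent C++ template body would only be checked when instantiated.

using System;

public static class Algorithms
{
    // The IComparable<T> constraint must be declared up front: the generic is
    // fully type-checked at definition time and then shipped in compiled (IL) form.
    public static T Max<T>(T a, T b) where T : IComparable<T>
        => a.CompareTo(b) >= 0 ? a : b;

    // Without the constraint, a.CompareTo(b) would be a compile error here,
    // even if every caller happened to pass a comparable type.
}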

How to avoid or detect implicit delegate inference in C#?

I am writing a game using C# and have found a number of cases where a function takes a delegate and I have inadvertently passed in a function name instead of creating and caching a delegate to use as the parameter instead. This causes a delegate object to be created for each call to these functions, which then immediately becomes garbage when the function returns.
I'd like to find all the places where I've made this mistake, and I'd prefer to avoid reading every line of every file looking for them (there are years worth of code). I saw how VB has an 'option strict' which will disable implicit construction of objects which I think would work for me if C# had that feature, but I don't think it does. I've also reviewed compiler warning options and none of them seem to help here either.
Is there any reasonably convenient way to identify these objects created by implicit delegate inference so I can find out where I need to create/cache the callbacks to avoid the garbage?
In short, your question is "how can I find all method group conversions?"
We are at present working on a project code-named Roslyn which will allow you to use the same semantic analysis engine that the C# compiler and IDE use. It will expose the syntactic model of the language and then provide a semantic analysis API through which you can ask questions of the semantic analyzer.
With Roslyn you could compile all your code into syntax trees and then search those syntax trees for every expression. There will be an API that allows you to determine whether the expression was converted to anything, and if so, how the conversion analyzer classified the conversion.
We are at present in the "community technology preview" stage; we have a preliminary implementation but it is nowhere near fully featured yet. I do not remember if the method group conversion analyzer was implemented in the CTP release or not.
Give it a try, and if you have feedback about it we would love to hear your thoughts on the Roslyn forum.
Details here:
http://msdn.microsoft.com/en-us/roslyn
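With the Roslyn packages that eventually shipped (Microsoft.CodeAnalysis.CSharp), a rough sketch of that search could look like the following; it classifies the conversion applied to each expression and prints the method group conversions. Project loading and reference resolution are simplified here, so treat it as a starting point rather than a finished tool.

using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class FindMethodGroupConversions
{
    static void Main()
    {
        const string source = @"
using System;
class C
{
    static void Handler() { }
    static void Register(Action a) { }
    static void M() { Register(Handler); }   // implicit delegate creation
}";
        var tree = CSharpSyntaxTree.ParseText(source);

        var references = new[] { typeof(object).Assembly.Location, typeof(Action).Assembly.Location }
            .Distinct()
            .Select(path => MetadataReference.CreateFromFile(path));

        var compilation = CSharpCompilation.Create("probe", new[] { tree }, references);
        var model = compilation.GetSemanticModel(tree);

        // Ask the semantic model how each expression is converted in context;
        // method group conversions are the implicit delegate creations.
        foreach (var expr in tree.GetRoot().DescendantNodes().OfType<ExpressionSyntax>())
        {
            var conversion = model.GetConversion(expr);
            if (conversion.IsMethodGroup)
                Console.WriteLine($"{expr.GetLocation().GetLineSpan()}: {expr}");
        }
    }
}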

Expression evaluation design questions

I am modeling a system to evaluate expressions. Now, the operands in these expressions can be of one of several types, including some primitive .NET types. When defining my Expression class, I want some degree of type safety and therefore don't want to use 'object' as the operand type, so I am considering defining an abstract Operand base class with nothing in it and creating a subclass for each type of operand. What do you think of this?
Also, only some types of operands make sense with others. And finally, only some operators make sense with particular operands. I can't really think of a way to implement these rules at compile-time so I'm thinking I'll have to do these checks at runtime.
Any ideas on how I might be able to do this better?
I'm not sure if C-based languages have this; however, Java has several packages that would really make sense for this.
JavaCC (the Java compiler compiler) lets you define a language (your expressions, for example) and then builds the corresponding Java classes. A somewhat more user-friendly, if more experimental and academic, package is DemeterJ - it allows you to very easily specify the expression language and comes with a library for defining visitors and strategies to operate over the generated class structure. If you could afford to switch to Java, I might try that. Otherwise I'd look for a C# clone of one of these technologies.
Another thing to consider, if you go down this route, is that once you've generated a class structure within some reasonable approximation of the end result, you can subclass all of the generated classes and put all of your application-specific logic into the subclasses. That way, if you really need to regenerate the model for the expression language, your logic will be relatively independent of your class hierarchy.
Update: Actually, it looks as though some of this is ported to .NET, though I haven't used it, so I'm not sure what shape it may be in:
http://www.ccs.neu.edu/home/lieber/inside-impl.html
good luck!
How about expression trees (System.Linq.Expressions) in .NET 3.5? I recently wrote an expression parser/compiler using them.
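For reference, this is the System.Linq.Expressions mechanism that answer is alluding to: build an expression tree at runtime and compile it into a delegate (a minimal sketch, not the answerer's actual parser/compiler).

using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        ParameterExpression x = Expression.Parameter(typeof(double), "x");
        ParameterExpression y = Expression.Parameter(typeof(double), "y");

        // Build the tree for (x, y) => x * y + 2
        Expression body = Expression.Add(
            Expression.Multiply(x, y),
            Expression.Constant(2.0));

        // Compile the tree into an ordinary delegate and invoke it.
        Func<double, double, double> f =
            Expression.Lambda<Func<double, double, double>>(body, x, y).Compile();

        Console.WriteLine(f(3, 4)); // 14
    }
}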
I've recently built a dynamic expression evaluator. What I found to be effective was to create, as you suggested, a BaseOperand with meaningful derived classes (NumericOperand, StringOperand, DateOperand, etc.). Depending on your implementation, generics may make sense as well (e.g. Operand<T>).
Through the implementation of the Visitor pattern, you can perform any kind of validation you like.
I had a very specific need to roll my own solution, but there are many options already available for processing expressions. You may want to take a look at some of these for inspiration or to avoid reinventing the wheel.
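A minimal sketch of the operand hierarchy plus Visitor-based validation described above; the names (Operand, NumericOperand, StringOperand, IOperandVisitor) are placeholders for illustration, not taken from any library.

using System;

// Hypothetical operand hierarchy: one subclass per supported operand type.
public abstract class Operand
{
    public abstract T Accept<T>(IOperandVisitor<T> visitor);
}

public sealed class NumericOperand : Operand
{
    public double Value { get; }
    public NumericOperand(double value) => Value = value;
    public override T Accept<T>(IOperandVisitor<T> visitor) => visitor.Visit(this);
}

public sealed class StringOperand : Operand
{
    public string Value { get; }
    public StringOperand(string value) => Value = value;
    public override T Accept<T>(IOperandVisitor<T> visitor) => visitor.Visit(this);
}

// The visitor gives one place to encode per-type rules (e.g. which operators apply).
public interface IOperandVisitor<T>
{
    T Visit(NumericOperand operand);
    T Visit(StringOperand operand);
}

// Example rule: only numeric operands may be used with arithmetic operators.
public sealed class SupportsArithmetic : IOperandVisitor<bool>
{
    public bool Visit(NumericOperand operand) => true;
    public bool Visit(StringOperand operand) => false;
}

class Demo
{
    static void Main()
    {
        Operand op = new StringOperand("abc");
        Console.WriteLine(op.Accept(new SupportsArithmetic())); // False: rejected at runtime
    }
}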
I found a good approach to handling the types of objects in the ExpressionOasis framework. It uses a custom data structure to carry the types of the objects: after parsing the operands of the given expressions with regular expressions, it determines each operand's type and stores it as a property of a generic class, which can then be used at any time to get the type.
http://code.google.com/p/expressionoasis/
