I am trying to implement a DSL-like feature in C#. It might look something like LINQ queries. I am wondering whether it is possible to implement new unary or binary operators using Roslyn.
I have been googling for the last few days without much success. It would be great if someone could point me to some samples or Roslyn documentation.
There are two ways you could use Roslyn to implement a new C#-based language.
Option 1: Use the Roslyn API to parse the source code into a syntax tree, then transform the syntax tree into actual C# and compile that.
This is ideal if your language is actually syntactically valid C# code, but the semantics are different. For example, you could implement await this way, if you forced await to look like a function call (e.g. await(x) would be valid, but not await x).
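A minimal sketch of option 1, assuming the hypothetical await(x) convention: parse ordinary C#, rewrite calls to a method named await into real await expressions, and compile the transformed tree (here it is just printed; a real implementation would also have to make the enclosing method async):

    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    // Rewrites invocations of the form await(expr) into genuine await expressions.
    class AwaitCallRewriter : CSharpSyntaxRewriter
    {
        public override SyntaxNode VisitInvocationExpression(InvocationExpressionSyntax node)
        {
            if (node.Expression is IdentifierNameSyntax id &&
                id.Identifier.ValueText == "await" &&
                node.ArgumentList.Arguments.Count == 1)
            {
                var operand = node.ArgumentList.Arguments[0].Expression;
                // Build 'await <operand>' with an explicit space after the keyword.
                var awaitKeyword = SyntaxFactory.Token(SyntaxKind.AwaitKeyword)
                                                .WithTrailingTrivia(SyntaxFactory.Space);
                return SyntaxFactory.AwaitExpression(awaitKeyword, operand).WithTriviaFrom(node);
            }
            return base.VisitInvocationExpression(node);
        }
    }

    class Demo
    {
        static void Main()
        {
            // In a non-async method, 'await' is an ordinary identifier, so this
            // parses as a normal method call that the rewriter can pick up.
            var tree = CSharpSyntaxTree.ParseText(
                "class C { void M() { var x = await(FetchAsync()); } }");
            var rewritten = new AwaitCallRewriter().Visit(tree.GetRoot());
            System.Console.WriteLine(rewritten.ToFullString());
            // Prints: class C { void M() { var x = await FetchAsync(); } }
        }
    }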
If you want to introduce new syntax (like a new operator), option 1 might still work, since Roslyn does support parsing “broken” code. But it most likely won't work well, because the syntax tree might not look the way you want. Worse, the results might not be consistent (sometimes your new syntax will be parsed one way, sometimes another).
Option 2: Since Roslyn is now open source, you can actually modify the source code of the compiler in any way you want, including adding a new operator.
But doing that is most likely not going to be simple. And I think the workflow is also going to be more complicated: you need to compile your own version of the compiler, not just use a library from NuGet like in option 1.
Background
Part of the project I'm working on requires me to analyze Q# source code and perform specific actions when certain syntax elements are encountered. For example, say I'd like to count how many different gate types are used throughout the program. Now, this could be implemented by walking the Abstract Syntax Tree of the program and performing actions based on the current syntax node.
What I've tried
I've started by analyzing the qsharp-compiler repository, however, the inner workings of the compiler lack online documentation and browsing all the C# and F# sources can be really tedious.
Of course, I could write my own parser for the language, but that would probably be overkill for the task at hand. There has to be a way to extract the AST from inside the compiler.
The question
Is there a way to compile Q# source code using the Q# compiler programmatically (from C# or F#), and extract the internal AST?
Yes, it is perfectly possible to compile Q# source code programmatically. This is particularly useful if you want to repeatedly update a compilation - you can add/remove/edit (parts of) the sources and references in memory, and query all kinds of useful information about the current state of the compilation that an IDE cares about (e.g. which symbols are defined at a particular location in a certain file).
However, if you just want to process the AST for a Q# compilation, then there is a much easier way! The Q# compiler has an extensibility mechanism that I believe fits your need perfectly.
This blog post gives a brief overview of the feature.
There is also an example of an extension in the compiler repo. This readme (and possibly this one) may also come in handy. I believe this answers half of your question, namely how to easily get access to the built AST.
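For a flavor of the mechanism, a rough skeleton of such a compilation step is sketched below. It is based on the IRewriteStep interface from the compiler repo, but treat the member list as approximate (written from memory) and check the linked readmes for the exact, current signatures:

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Quantum.QsCompiler;
    using Microsoft.Quantum.QsCompiler.SyntaxTree;

    // Sketch of a compiler extension that is handed the built AST (QsCompilation).
    // A gate-counting pass would walk compilation.Namespaces inside Transformation.
    public class GateCounterStep : IRewriteStep
    {
        public string Name => "GateCounter";
        public int Priority => 0;
        public IDictionary<string, string> AssemblyConstants { get; } =
            new Dictionary<string, string>();
        public IEnumerable<IRewriteStep.Diagnostic> GeneratedDiagnostics =>
            Enumerable.Empty<IRewriteStep.Diagnostic>();

        public bool ImplementsPreconditionVerification => false;
        public bool ImplementsTransformation => true;
        public bool ImplementsPostconditionVerification => false;

        public bool PreconditionVerification(QsCompilation compilation) => true;
        public bool PostconditionVerification(QsCompilation compilation) => true;

        public bool Transformation(QsCompilation compilation, out QsCompilation transformed)
        {
            // The full syntax tree lives in compilation.Namespaces; analyze it here,
            // e.g. via the syntax tree transformation framework discussed below.
            transformed = compilation;
            return true;
        }
    }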
The other half of the question, as I interpret it, is how to conveniently analyze or transform the AST. For that, there is also a mechanism provided: the syntax tree transformation framework. That framework consists of a couple of classes that define the walk/transformation for different kinds of nodes, as well as a wrapping class that plugs it all together.
Rather than starting by looking at the definition of the transformations, it is probably more intuitive to look at some examples that use it. An example that is pretty close to what you want to do can be found here. The implemented transformation adds a comment to each callable listing all identifiers used within it. It is invoked as part of a compilation step (see here) that is defined in the example I already linked above.
There are a couple of other good examples of simple transformations that are a bit farther from what you want to do, but should give you an idea of how the whole setup works: this one allows attaching attributes to callables, and this one is used to inline conjugations (patterns of the form U*VU).
Last but not least, the Gitter for the Q# community can also be a good resource to engage with as you work.
I'm trying to make a Xamarin.Android app that highlights the syntax of many different languages. I plan to use ANTLR to deal with most of them, but for C# I want to use Roslyn as that will undoubtedly be faster and less buggy than ANTLR.
What is the best way to implement syntax highlighting with Roslyn? For highlighting Java syntax, the approach I took was parsing the text into a parse tree, and using a visitor to color the text associated with each terminal. You can view my code here. Is this also a good idea for Roslyn, or does Roslyn provide APIs for syntax highlighting? (e.g. Does the code behind syntax highlighting in Visual Studio live in the dotnet/roslyn repo?) I'd really prefer not to reinvent the wheel, but I will if I have to.
edit: I have accepted Tamas' answer because his solution is the most practical for my use case; I do not have access to the full solution to build a semantic model with, so I will have to do some of my own analysis. However, if your app supports more broad C# integration and can build a semantic model, take a look at the Roslyn Classification APIs which are used in Jonathon Marolf's answer.
The ConsoleClassifier project in the Roslyn samples should be a good starting point.
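That sample is built around the Classifier API in Microsoft.CodeAnalysis.Classification; a minimal self-contained sketch (the document content and project name here are just placeholders):

    using System;
    using System.Threading.Tasks;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.Classification;
    using Microsoft.CodeAnalysis.Text;

    class Demo
    {
        static async Task Main()
        {
            // Build a throwaway workspace around a single document.
            var workspace = new AdhocWorkspace();
            var project = workspace.AddProject("Demo", LanguageNames.CSharp)
                .AddMetadataReference(MetadataReference.CreateFromFile(
                    typeof(object).Assembly.Location));
            var document = project.AddDocument("Demo.cs",
                "class C { void M() { var s = \"hi\"; } }");

            var text = await document.GetTextAsync();
            var spans = await Classifier.GetClassifiedSpansAsync(
                document, new TextSpan(0, text.Length));

            // Each span carries a classification name ("keyword", "string", ...)
            // that you can map to a color in your highlighter.
            foreach (var span in spans)
                Console.WriteLine($"{span.ClassificationType}: {text.ToString(span.TextSpan)}");
        }
    }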
Did you have a look at SourceBrowser? If you can do a full solution build, then I would use the same approach.
If your context doesn't allow a full build, then you can implement something relatively good based on syntax token types. However, you might have to handle some corner cases, like contextual keywords, var, implicitly declared variables (like value), etc. Have a look at what SonarQube is using.
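For the token-type approach, a minimal syntactic-only sketch (no semantic model needed) might look like this:

    using System;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp;

    class TokenHighlighter
    {
        static void Main()
        {
            var tree = CSharpSyntaxTree.ParseText(
                "class C { // note\n string s = \"hi\"; }");

            foreach (var token in tree.GetRoot().DescendantTokens())
            {
                if (SyntaxFacts.IsKeywordKind(token.Kind()))
                    Console.WriteLine($"keyword: {token.Text}");
                else if (token.IsKind(SyntaxKind.StringLiteralToken))
                    Console.WriteLine($"string:  {token.Text}");
            }

            // Comments live in trivia, not tokens.
            foreach (var trivia in tree.GetRoot().DescendantTrivia())
                if (trivia.IsKind(SyntaxKind.SingleLineCommentTrivia))
                    Console.WriteLine($"comment: {trivia}");
        }
    }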
Similarly, you can look at other tools that you know are Roslyn-based, like OmniSharp. I'm not sure whether it uses regexes or Roslyn to do the highlighting, but in any case you could get quite far with regexes too.
I have a domain specific language that I would like to interact with C# by adding new keywords (or some keyword-like syntax). Using attributes would be insufficient (I can't use them in method bodies), and shoehorning it into 'valid' C# notation that gets compiled into something else would be ugly and ruin the analogy with the DSL (and the translation from DSL-like notation to C# is nontrivial, so just writing the C# each time is out of the question).
I already have a way to parse the .cs file and transform it into legitimate, nontrivial, C# code which can be compiled.
The problem is, even though I can do all the work of defining the DSL, parsing it, and translating it into valid C#, Visual Studio won't let me use notation it doesn't understand; it just adds red squiggles, emits a "cannot resolve symbol" error, and then often fails to properly parse what follows.
Is there a way to force Visual Studio to ignore specific strings in its analysis? I've looked at Visual Studio plugins, but it looks like, although I can do syntax highlighting and other things, I can't force it to ignore something it doesn't know how to parse (unless I'm missing some way to do that in the extension API, which is certainly possible).
I've skimmed through the Roslyn stuff and don't see a way to do this there offhand, either. (Again, I may be missing something; it doesn't seem to have great documentation.)
Take a look at PowerLanguages.E: http://visualstudiogallery.msdn.microsoft.com/a512e0d0-f4f3-4435-bad4-8d5efbb1db4a
No English docs yet, sorry.
Is there any way to disable the use of the "dynamic" keyword in .net 4?
I thought the Code Analysis feature of VS2010 might have a rule to fail the build if the dynamic keyword is used, but I couldn't find one.
It's part of the C# 4.0 language, so no, not really.
You can use FxCop to look for it, though, and fail the build if it encounters it.
StyleCop might work instead:
http://code.msdn.microsoft.com/sourceanalysis
Here is a link discussing the same issue and how StyleCop might be the answer. There is also a post about how to get FxCop to look for the dynamic keyword, although it's not perfect.
http://social.msdn.microsoft.com/Forums/en/vstscode/thread/8ce407ba-bdf7-422b-bbcd-ca4701c3a76f
The dynamic keyword is not evil, but using it could be.
It leads to errors that you can only find at runtime.
This should be avoided at all costs.
Runtime errors are bad. Compile-time errors are good.
You could use something like the following to set your own standards.
http://joel.fjorden.se/static.php?page=CodeStyleEnforcer
Target .net 1.0? :-)
Or do code reviews.
(Or, to be less facetious, it should be pretty easy to write a custom FxCop or CA rule to disallow use of dynamic.)
Wouldn't you just kill for a C++ macro right now? :-)
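To make that less-facetious suggestion concrete: nowadays the natural equivalent of a custom CA rule is a Roslyn analyzer (which didn't exist when this was asked). A minimal sketch that reports an error wherever a dynamic type appears - the diagnostic ID and messages here are made up:

    using System.Collections.Immutable;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;
    using Microsoft.CodeAnalysis.Diagnostics;

    [DiagnosticAnalyzer(LanguageNames.CSharp)]
    public class NoDynamicAnalyzer : DiagnosticAnalyzer
    {
        // Hypothetical rule ID; any otherwise-unused ID works.
        private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
            "NODYN001", "Avoid dynamic", "Use of 'dynamic' is not allowed",
            "Usage", DiagnosticSeverity.Error, isEnabledByDefault: true);

        public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
            => ImmutableArray.Create(Rule);

        public override void Initialize(AnalysisContext context)
        {
            context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
            context.EnableConcurrentExecution();
            // 'dynamic' parses as an ordinary identifier, so inspect identifier names.
            context.RegisterSyntaxNodeAction(Analyze, SyntaxKind.IdentifierName);
        }

        private static void Analyze(SyntaxNodeAnalysisContext context)
        {
            var node = (IdentifierNameSyntax)context.Node;
            if (node.Identifier.ValueText != "dynamic") return;
            // Confirm via the semantic model that this really is the dynamic type.
            if (context.SemanticModel.GetTypeInfo(node).Type?.TypeKind == TypeKind.Dynamic)
                context.ReportDiagnostic(Diagnostic.Create(Rule, node.GetLocation()));
        }
    }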
Remove the reference to Microsoft.CSharp.dll, and I think all uses of dynamic will then fail to compile.
I'm not sure I understand what this irrational fear of the dynamic keyword is about. There was the same kind of hysteria over anonymous types and the var keyword in .NET 3.5, which was just idiotic, since those are legitimate statically typed constructs.
The dynamic keyword serves a highly specialized purpose; I don't see why anyone would want to use it without understanding why. Stopping that from happening could be solved with one team meeting explaining some of the new features of .NET 4, including the dynamic keyword. I assume you're a senior developer or the team lead; it should be quite easy to tell your team that if they ever feel they need to use the dynamic keyword, they should come see you first.
Those were exactly the instructions I gave to my team, as I find it unlikely we will ever use the dynamic keyword, because we don't do COM interop. Beyond that, I would defer any dynamic proxy use to an established library like LinFu or Castle, and leave it to them whether to use the dynamic keyword in their implementations.
I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code and, based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) in the same execution environment as the code doing the reflecting. The problem with reflection is that it only provides a very limited set of properties: typically lists of classes, methods, and perhaps names of arguments. If all the code generation you want to do can be done with just that, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
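A quick illustration of that gap, sketched with Roslyn (any parser would make the same point): reflection can enumerate the members of a built assembly, but only a parsed syntax tree can answer questions about the expressions inside method bodies.

    using System;
    using System.Linq;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    class Demo
    {
        static void Main()
        {
            // Reflection: declarations only -- names, signatures, attributes.
            foreach (var method in typeof(string).GetMethods().Take(3))
                Console.WriteLine(method.Name);

            // Syntax tree: the full structure of the source, down to each expression.
            var tree = CSharpSyntaxTree.ParseText(
                "class C { int Add(int a, int b) => a + b; }");
            int binaryOps = tree.GetRoot().DescendantNodes()
                                .OfType<BinaryExpressionSyntax>().Count();
            Console.WriteLine($"binary operators in source: {binaryOps}"); // 1
        }
    }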
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code itself, as a character string (how else could you answer a question like "is the number of characters between the add operator and the T in the middle of the variable name a prime number?"). As a practical matter, the properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties, and you'd be a complete idiot to ignore what they've learned in those decades.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generating machine code that they don't offer standard answers. The people who do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and to provide "rewrite" rules that say, in effect, "when you see this pattern, replace it with that pattern under this condition".
By connecting the condition to the various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by those decades of experience. Such program transformation systems can read in source code, carry out analyses and transformations, and regenerate code after transformation.
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine the properties of interest, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).
Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database-centric, but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ In addition, there was a recent Hanselminutes episode on T4: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
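For a flavor of what a template looks like, here is a minimal T4 sketch (the Generated namespace and the hard-coded name list are placeholders; in a real generator the names would come from reflecting over your library):

    <#@ template language="C#" #>
    <#@ output extension=".cs" #>
    // <auto-generated />
    namespace Generated
    {
    <# foreach (var name in new[] { "Customer", "Order" }) { #>
        public partial class <#= name #>Builder
        {
            // generated members for <#= name #> would go here
        }
    <# } #>
    }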
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
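If CodeDom does cover the language features you need, generation amounts to building a CodeCompileUnit and asking a provider to render it; a minimal sketch (the Person class is just a placeholder):

    using System;
    using System.CodeDom;
    using System.CodeDom.Compiler;
    using System.IO;
    using Microsoft.CSharp;

    class Demo
    {
        static void Main()
        {
            // Object model for: namespace Generated { class Person { string name; } }
            var unit = new CodeCompileUnit();
            var ns = new CodeNamespace("Generated");
            var cls = new CodeTypeDeclaration("Person") { IsClass = true };
            cls.Members.Add(new CodeMemberField(typeof(string), "name"));
            ns.Types.Add(cls);
            unit.Namespaces.Add(ns);

            // Render the model as C# source text.
            using (var writer = new StringWriter())
            {
                new CSharpCodeProvider().GenerateCodeFromCompileUnit(
                    unit, writer, new CodeGeneratorOptions());
                Console.WriteLine(writer.ToString());
            }
        }
    }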
From what I understand, you could use something like the Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me, though, and CCI apparently only has full support for the C# 2 language spec. A better strategy may be to streamline your existing approach instead.
I'm not sure of the best way to do this, but you could do the following (a rough MSBuild sketch follows the list):
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct
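In MSBuild terms that might look roughly like the following fragment in the base project's .csproj (the generator path, file names, and Csc parameters here are placeholders, not a tested recipe):

    <Target Name="GenerateCode" AfterTargets="Build">
      <!-- 1. Run the code generator against the freshly built base dll. -->
      <Exec Command="$(SolutionDir)tools\CodeGen.exe $(TargetPath) $(IntermediateOutputPath)Generated.cs" />
      <!-- 2. Compile the generated source into its own dll, referencing the base dll. -->
      <Csc Sources="$(IntermediateOutputPath)Generated.cs"
           References="$(TargetPath)"
           TargetType="library"
           OutputAssembly="$(OutDir)Generated.dll" />
    </Target>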