Related
Background
Part of the project I'm working on requires me to analyze Q# source code and perform specific actions when certain syntax elements are encountered. For example, say I'd like to count how many different gate types are used throughout the program. Now, this could be implemented by walking the Abstract Syntax Tree of the program and performing actions based on the current syntax node.
What I've tried
I've started by analyzing the qsharp-compiler repository, however, the inner workings of the compiler lack online documentation and browsing all the C# and F# sources can be really tedious.
Of course, I could write my own parser for the language, but that would probably be an overkill for the task at hand. There has to be a way to extract the AST from inside of the compiler.
The question
Is there a way to compile Q# source code using the Q# compiler programmatically (from C# or F#), and extract the internal AST?
Yes, it is perfectly possible to compile Q# source code programmatically. This is particularly useful if you want to repeatedly update a compilation - you can add/remove/edit (parts of) the sources and references in memory, and query all kinds of useful information about the current state of the compilation that e.g. an IDE cares about (like e.g. which symbols are defined at a particular location in a certain file).
However, if you just want to process the AST for a Q# compilation, then there is a much easier way! The Q# compiler has an extensibility mechanism that I believe fits your need perfectly.
This blog post gives a brief overview over the feature.
There is also an example for an extension on the compiler repo. This readme (and possibly this one) may also come in handy. I believe this answers half of your question, namely how to easily get access to the built AST.
The other half of the question according to my interpretation is how to conveniently analyze or transform the AST. For that there is also a mechanism provided; the syntax tree transformation framework. That framework consists of a couple of classes that define the walk/transformation for different kinds of nodes, as well as a wrapping class that plugs it all together.
Rather than starting by looking at the definition of the transformations, it is probably more intuitive to just look at some examples that use it. An example that is pretty close to what you want to do can be found here. The implemented transformation adds a comment to each callable listing all identifiers used within the callable. It is invoked as as part of a compilation step (see here) that is defined in the example I already linked above.
There are a couple of other good examples for simple transformations that are a bit farther from what you want to do, but should give you an idea how the whole setup works if you are interested: this one allows to attach attributes to callables, and this one is used to inline conjugations (pattern of the form U*VU).
Last but not least, the Gitter for the Q# community can possibly also be a good resource to engage as you work.
I'm now porting some library that uses expressions to .Net Core application and encountered a problem that all my logic is based on LambdaExpression.CompileToMethod which is simply missing in. Here is sample code:
public static MethodInfo CompileToInstanceMethod(this LambdaExpression expression, TypeBuilder tb, string methodName, MethodAttributes attributes)
{
...
var method = tb.DefineMethod($"<{proxy.Name}>__StaticProxy", MethodAttributes.Private | MethodAttributes.Static, proxy.ReturnType, paramTypes);
expression.CompileToMethod(method);
...
}
Is it possible to rewrite it somehow to make it possible to generate methods using Expressions? I already can do it with Emit but it's quite complex and i'd like to avoid it in favor of high-level Expressions.
I tried to use var method = expression.Compile().GetMethodInfo(); but in this case I get an error:
System.InvalidOperationException : Unable to import a global method or
field from a different module.
I know that I can emit IL manually, but I need exactly convert Expression -> to MethodInfo binded to specific TypeBuilder instead of building myself DynamicMethod on it.
It is not an ideal solution but it is worth considering if you don't want to write everything from the scratch:
If you look on CompileToMethod implementation, you will see that under the hood it uses internal LambdaCompiler class.
If you dig even deeper, you willl see that LambdaCompiler uses System.Reflection.Emit to convert lambdas into MethodInfo.
System.Reflection.Emit is supported by .NET Core.
Taking this into account, my proposition is to try to reuse the LambdaCompiler source code. You can find it here.
The biggest problem with this solution is that:
LambdaCompiler is spread among many files so it may be cumbersome to find what is needed to compile it.
LambdaCompiler may use some API which is not supported by .NET Core at all.
A few additional comments:
If you want to check which API is supported by which platform use .NET API Catalog.
If you want to see differences between .NET standard versions use this site.
Disclaimer: I am author of the library.
I have created Expression Compiler, which has similar API to that of Linq Expressions, with slight changes. https://github.com/yantrajs/yantra/wiki/Expression-Compiler
var a = YExpression.Parameter(typeof(int));
var b = YExpression.Parameter(typeof(int));
var exp = YExpression.Lambda<Func<int,int,int>>("add",
YExpression.Binary(a, YOperator.Add, b),
new YParameterExpression[] { a, b });
var fx = exp.CompileToStaticMethod(methodBuilder);
Assert.AreEqual(1, fx(1, 0));
Assert.AreEqual(3, fx(1, 2));
This library is part of JavaScript compiler we have created. We are actively developing it and we have added features of generators and async/await in JavaScript, so Instead of using Expression Compiler, you can create debuggable JavaScript code and run C# code easily into it.
I ran into the same issue when porting some code to netstandard. My solution was to compile the lambda to a Func using the Compile method, store the Func in a static field that I added to my dynamic type, then in my dynamic method I simply load and call the Func from that static field. This allows me to create the lambda using the LINQ Expression APIs instead of reflection emit (which would have been painful), but still have my dynamic type implement an interface (which was another requirement for my scenario).
Feels like a bit of a hack, but it works, and is probably easier than trying to recreate the CompileToMethod functionality via LambdaCompiler.
Attempting to get LambdaCompiler working on .NET Core
Building on Michal Komorowski's answer, I decided to give porting LambdaCompiler to .NET Core a try. You can find my effort here (GitHub link). The fact that the class is spread over multiple files is honestly one of the smallest problems here. A much bigger problem is that it relies on internal parts in the .NET Core codebase.
Quoting myself from the GitHub repo above:
Unfortunately, it is non-trivial because of (at least) the following issues:
AppDomain.CurrentDomain.DefineDynamicAssembly is unavailable in .NET Core - AssemblyBuilder.DefineDynamicAssembly replaces is. This SO answer describes how it can be used.
Assembly.DefineVersionInfoResource is unavailable.
Reliance on internal methods and properties, for example BlockExpression.ExpressionCount, BlockExpression.GetExpression, BinaryExpression.IsLiftedLogical etc.
For the reasons given above, an attempt to make this work as a standalone package is quite unfruitful in nature. The only realistic way to get this working would be to include it in .NET Core proper.
This, however, is problematic for other reasons. I believe the licensing is one of the stumbling stones here. Some of this code was probably written as part of the DLR effort, which it itself Apache-licensed. As soon as you have contributions from 3rd party contributors in the code base, relicensing it (to MIT, like the rest of the .NET Core codebase) becomes more or less impossible.
Other options
I think your best bet at the moment, depending on the use case, is one of these two:
Use the DLR (Dynamic Language Runtime), available from NuGet (Apache 2.0-licensed). This is the runtime that empowers IronPython, which is to the best of my knowledge the only actively maintained DLR-powered language out there. (Both IronRuby and IronJS seems to be effectively abandoned.) The DLR lets you define lambda expressions using Microsoft.Scripting.Ast.LambdaBuilder; this doesn't seem to be directly used by IronPython though. There is also the Microsoft.Scripting.Interpreter.LightCompiler class which seems to be quite interesting.
The DLR unfortunately seems to be quite poorly documented. I think there is a wiki being referred to by the CodePlex site, but it's offline (can probably be accessed by downloading the archive on CodePlex though).
Use Roslyn to compile the (dynamic) code for you. This probably has a bit of a learning curve as well; I am myself not very familiar with it yet unfortunately.
This seems to have quite a lot of curated links, tutorials etc: https://github.com/ironcev/awesome-roslyn. I would recommend this as a starting point. If you're specifically interested in building methods dynamically, these also seem worth reading:
https://gunnarpeipman.com/using-roslyn-to-build-object-to-object-mapper/
http://www.tugberkugurlu.com/archive/compiling-c-sharp-code-into-memory-and-executing-it-with-roslyn
Here are some other general Roslyn reading links. Most of these links are however focused on analyzing C# code here (which is one of the use cases for Roslyn), but Roslyn can be used to generate IL code (i.e. "compile") C# code as well.
The .NET Compiler Platform SDK: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/
Get started with syntax analysis: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/get-started/syntax-analysis
Tutorial: Write your first analyzer and code fix: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/tutorials/how-to-write-csharp-analyzer-code-fix
There is also the third option, which is probably uninteresting for most of us:
Use System.Reflection.Emit directly, to generate the IL instructions. This is the approach used by e.g. the F# compiler.
I'm making a Genetic Program, but I'm hitting a limitation with C# where I want to present new functions to the algorithm but I can't do it without recompiling the program. In essence I want the user of the program to provide the allowed functions and the GP will automatically use them. It would be great if the user is required to know as little about programming as possible.
I want to plug in the new functions without compiling them into the program. In Python this is easy, since it's all interpreted, but I have no clue how to do it with C#. Does anybody know how to achieve this in C#? Are there any libraries, techniques, etc?
It depends on how you want the user of the program to "provide the allowed functions."
If the user is choosing functions that you've already implemented, you can pass these around as delegates or expression trees.
If the user is going to write their own methods in C# or another .NET language, and compile them into an assembly, you can load them using Reflection.
If you want the user to be able to type C# source code into your program, you can compile that using CodeDom, then call the resulting assembly using Reflection.
If you want to provide a custom expression language for the user, e.g. a simple mathematical language, then (assuming you can parse the language) you can use Reflection.Emit to generate a dynamic assembly and call that using -- you guessed it -- Reflection. Or you can construct an expression tree from the user code and compile that using LINQ -- depends on how much flexibility you need. (And if you can afford to wait, expression trees in .NET 4.0 remove many of the limitations that were in 3.5, so you may be able to avoid Reflection.Emit altogether.)
If you are happy for the user to enter expressions using Python, Ruby or another DLR language, you can host the Dynamic Language Runtime, which will interpret the user's code for you.
Hosting the DLR (and IronPython or IronRuby) could be a good choice here because you get a well tested environment and all the optimisations the DLR provides. Here's a how-to using IronPython.
Added in response to your performance question: The DLR is reasonably smart about optimisation. It doesn't blindly re-interpret the source code every time: once it has transformed the source code (or, specifically, a given function or class) to MSIL, it will keep reusing that compiled representation until the source code changes (e.g. the function is redefined). So if the user keeps using the same function but over different data sets, then as long as you can keep the same ScriptScope around, you should get decent perf; ditto if your concern is just that you're going to run the same function zillions of times during the genetic algorithm. Hosting the DLR is pretty easy to do, so it shouldn't be hard to do a proof of concept and measure to see if it's up to your needs.
You can try to create and manipulate Expression Trees. Use Linq to evaluate expression trees.
You can also use CodeDom to compile and run a function.
For sure you can google to see some examples that might fit your needs.
It seems that this article "How to dynamically compile C# code" and this article "Dynamically executing code in .Net" could help you.
You have access to the Compiler from within code, you can then create instances of the compiled code and use them without restarting the application. There are examples of it around
Here
and
Here
The second one is a javascript evaluator but could be adapted easily enough.
You can take a look at System.Reflection.Emit to generate code at the IL level.
Or generate C#, compile into a library and load that dynamically. Not nearly as flexible.
It is in fact very easy to generate IL. See this tutorial: http://www.meta-alternative.net/calc.pdf
I am currently developing an application where you can create "programs" with it without writing source code, just click&play if you like.
Now the question is how do I generate an executable program from my data model. There are many possibilities but I am not sure which one is the best for me. I need to generate assemblies with classes and namespace and everything which can be part of the application.
CodeDOM class: I heard of lots of limitations and bugs of this class. I need to create attributes on method parameters and return values. Is this supported?
Create C# source code programmatically and then call CompileAssemblyFromFile on it: This would work since I can generate any code I want and C# supports most CLR features. But wouldn't this be slow?
Use the reflection ILGenerator class: I think with this I can generate every possible .NET code. But I think this is much more complicated and error prone than the other approaches?
Are there other possible solutions?
EDIT:
The tool is general for developing applications, it is not restricted to a specific domain. I don't know if it can be considered a visual programming language. The user can create classes, methods, method calls, all kinds of expressions. It won't be very limitating because you should be able to do most things which are allowed in real programming languages.
At the moment lots of things must still be written by the user as text, but the goal at the end is, that nearly everything can be clicked together.
You my find it is rewarding to look at the Dynamic Language Runtime which is more or less designed for creating high-level languages based on .NET.
It's perhaps also worth looking at some of the previous Stack Overflow threads on Domain Specific Languages which contain some useful links to tools for working with DSLs, which sounds a little like what you are planning although I'm still not absolutely clear from the question what exactly your aim is.
Most things "click and play" should be simple enough just to stick some pre-defined building-block objects together (probably using interfaces on the boundaries). Meaning: you might not need to do dynamic code generation - just "fake it". For example, using property-bag objects (like DataTable etc, although that isn't my first choice) for values, etc.
Another option for dynamic evaluation is the Expression class; especially in .NET 4.0, this is hugely versatile, and allows compilation to a delegate.
Do the C# source generation and don't care about speed until it matters. The C# compiler is quite quick.
When I wrote a dynamic code generator, I relied heavily on System.Reflection.Emit.
Basically, you programatically create dynamic assemblies and add new types to them. These types are constructed using the Emit constructs (properties, events, fields, etc..). When it comes to implementing methods, you'll have to use an ILGenerator to pump out MSIL op-codes into your method. That sounds super scary, but you can use a couple of tools to help:
A pre-built sample implementation
ILDasm to inspect the op-codes of the sample implementation.
It depends on your requirements, CodeDOM would certainly be the best fit for a "program" stored it in a "data model".
However its unlikely that using option 2 will be in any way measurably slower in comparision with any other approach.
I would echo others in that 1) the compiler is quick, and 2) "Click and Play" things should be simple enough so that no single widget added to a pile of widgets can make it an illegal pile.
Good luck. I'm skeptical that you can achieve point (2) for anything but really toy-level programs.
I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actual generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approch at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
Its pretty unclear what you are doing, but what does seem clear is that you have some base line code, and based on some its properties, you want to generate more code.
So the key issue here are, given the base line code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) into the same execution enviroment as the reflection user code. The problem with reflection is it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. IF all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
In fact, the only artifact from which truly arbitrary code properties can be extracted is the the source code as a character string (how else could you answer, is the number of characters between the add operator and T in middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, ponter analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get relatively easy way to say what you want backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code,
carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read in the base line code into ASTs, apply analyses to determine properties of interesting, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code langauges, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the
DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other langauges.
(I'm the architect of DMS, so I have a rather biased view. YMMV).
Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programatically analyze your existing c# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.
I'm not sure of the best way to do this, but you could do this
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct