How to parse simple statement into CodeDom object

How to parse simple statement into CodeDom object - c#

I need to parse a simple statement (essentially a chain of function calls on some object) represented as a string variable into a CodeDom object (probably a subclass of CodeStatement). I would also like to provide some default imports of namespaces to be able to use less verbose statements.
I have looked around SO and the Internet to find some suggestions but I'm quite confused about what is and isn't possible and what is the simplest way to do it. For example this question seems to be almost what I want, unfortunately I can't use the solution as the CodeSnippetStatement seems not to be supported by the execution engine that I use (the WF rules engine).
Any suggestions that could help me / point me into the right direction ?

There is no library or function to parse C# code into CodeDOM objects as part of the standard .NET libraries. The CodeDOM libraries have some methods that seem to be designed for this, but none of them are actually implemented. As far as I know, there is some implementation available in Visual Studio (used e.g. by designers), but that is only internal.
CodeSnippetStatement is a CodeDOM node that allows you to place any string into the generated code. If you want to create CodeDOM tree just to generate C# source code, than this is usually fine (the source code generator just prints the string to the output). If the WF engine needs to understand the code in your string (and not just generate source code and compile it), than CodeSnippetStatement won't work.
However, there are 3rd party tools that can be used for parsing C# source code. In one project I worked on, we used NRefactory library (which is used in SharpDevelop) and it worked quite well. It gives you some tree (AST) representing the parsed code and I'm afraid you'll need to convert this to the corresponding CodeDOM tree yourself.

I have found a library implementation here that seems to cover pretty much everything I need for my purposes. I don't know if it's robust enough to be used in business scenarios, but for my unit tests it's pretty much all I can ask for.

Related

how to break up the code in a method body into statement by statement symbols/"tokens"

I'm writing something that will examine a function and rewrite that function in another language so basically if inside my function F1, i have this line of code var x=a.b(1) how do i break up the function body into symbols or "tokens"?
I've searched around and thought that stuff in System.Reflection.MethodInfo.GetMethodBody would do the trick however that class doesn't seem to be able to have the capabilities to do what i want..
what other solutions do we have?
Edit:
Is there anyway we can get the "method body" of a method using reflection? (like as a string or something)
Edit 2:
basically what I'm trying to do is to write a program in c#/vb and when i hit F5 a serializer function will (use reflection and) take the entire program (all the classes in that program) and serialize it into a single javascript file. of course javascript doesn't have the .net library so basically the C#/VB program will limit its use of classes to the .js library (which is a library written in c#/vb emulating the framework of javascript objects).
The advantage is that i have type safety while coding my javascript classes and many other benefits like using overloading and having classes/etc. since javascript doesn't have classes/overloading features natively, it rely on hacks to get it done. so basically the serializer function will write the javascript based on the C#/VB program input for me (along with all the hacks and possible optimizations).
I'm trying to code this serializer function

It sounds like you want a parse tree, which Reflection won't give you. Have a look at NRefactory, which is a VB and C# parser.

If you want to do this, the best way would be to parse the C#/VB code with a parser/lexer, such as the Gardens Point Parser Generator, flex/bison or ANTLR. then at the token level, reassemble it with proper javascript grammar. There are a few out there for C# and Java.

See this answer on analyzing and transforming source code
and this one on translating between programming languages.
These assume that you use conventional compiler methods for breaking your text into tokens ("lexing") and grouping related tokens into program structures ("parsing"). If you analysis is anything other than trivial, you'll need all the machinery, or it won't be reliable.
Reflection can only give you what the language designers decided to give you. They invariably don't give you detail inside functions.

If you want to go from IL to other language it may be easier than parsing source language first. If you want to go this route consider reading on Microsoft's "Volta" project (IL->JavaScript), while project is no longer available there are still old blogs discussing issues around it.
Note that reflection alone is not enough - reflection gives you byte array for the body of any particular method (MethodInfo.GetMethodBody.GetILAsByteArray - http://msdn.microsoft.com/en-us/library/system.reflection.methodbody.aspx) and you have to read it. There are several publically available "IL reader" libraries.

C# dynamic code evaluation, Eval, REPL

Does anyone know if there is a way to evaluate c# code at runtime.
eg. I would like to allow a user to enter DateTime.Now.AddDays(1), or something similar, as a string and then evaluate the string to get the result.
I woder if it is possible to access the emmediate windows functionality, since it seems that is evaluates every line entered dynamically.
I have found that VB has an undocumented EbExecuteLine() API function from the VBA*.dll and wonder if there is something equivalent for c#.
I have also found a custom tool https://github.com/DavidWynne/CSharpEval (it used to be at kamimucode.com but the author has moved it to GitHub) that seems to do it, but I would prefer something that comes as part of .NET
Thanks

Mono has the interactive command line (csharp.exe)
You can look at it's source code to see exactly how it does it's magic:
https://github.com/mono/mono/raw/master/mcs/tools/csharp/repl.cs

As you've probably already seen, there is no built-in method for evaluating C# code at runtime. This is the primary reason that the custom tool you mentioned exists.
I also have a C# eval program that allows for evaluating C# code. It provides for evaluating C# code at runtime and supports many C# statements. In fact, this code is usable within any .NET project, however, it is limited to using C# syntax. Have a look at my website, http://csharp-eval.com, for additional details.

Microsoft's C# compiler don't have Compiler-as-a-Service yet (Should come with C# 5.0).
You can either use Mono's REPL, or write your own service using CodeDOM

Its not fast but you can compile the code on the fly, see my previous question,
Once you have the assembly and you know the type name you can construct an instance of your compiled class using reflection and execute your method..

The O2 Platform's C# REPL Script Environment use the Fluent# APIs which have a real powerful reflection API that allows you do execute code snippets.
For example:
"return DateTime.Now.ToLongTimeString();".executeCodeSnippet();
will return
5:01:22 AM
note that the "...".executeCodeSnippet(); can actually execute any valid C# code snippet (so it is quite powerful).
If you want to control what your users can execute, I could use AST trees to limite the C# features that they have access to.
Also take a look at the Microsoft's Roslyn, which is VERY powerful as you can see on Multiple Roslyn based tools (all running Stand-Alone outside VisualStudio)

C# how to create functions that are interpreted at runtime

I'm making a Genetic Program, but I'm hitting a limitation with C# where I want to present new functions to the algorithm but I can't do it without recompiling the program. In essence I want the user of the program to provide the allowed functions and the GP will automatically use them. It would be great if the user is required to know as little about programming as possible.
I want to plug in the new functions without compiling them into the program. In Python this is easy, since it's all interpreted, but I have no clue how to do it with C#. Does anybody know how to achieve this in C#? Are there any libraries, techniques, etc?

It depends on how you want the user of the program to "provide the allowed functions."
If the user is choosing functions that you've already implemented, you can pass these around as delegates or expression trees.
If the user is going to write their own methods in C# or another .NET language, and compile them into an assembly, you can load them using Reflection.
If you want the user to be able to type C# source code into your program, you can compile that using CodeDom, then call the resulting assembly using Reflection.
If you want to provide a custom expression language for the user, e.g. a simple mathematical language, then (assuming you can parse the language) you can use Reflection.Emit to generate a dynamic assembly and call that using -- you guessed it -- Reflection. Or you can construct an expression tree from the user code and compile that using LINQ -- depends on how much flexibility you need. (And if you can afford to wait, expression trees in .NET 4.0 remove many of the limitations that were in 3.5, so you may be able to avoid Reflection.Emit altogether.)
If you are happy for the user to enter expressions using Python, Ruby or another DLR language, you can host the Dynamic Language Runtime, which will interpret the user's code for you.
Hosting the DLR (and IronPython or IronRuby) could be a good choice here because you get a well tested environment and all the optimisations the DLR provides. Here's a how-to using IronPython.
Added in response to your performance question: The DLR is reasonably smart about optimisation. It doesn't blindly re-interpret the source code every time: once it has transformed the source code (or, specifically, a given function or class) to MSIL, it will keep reusing that compiled representation until the source code changes (e.g. the function is redefined). So if the user keeps using the same function but over different data sets, then as long as you can keep the same ScriptScope around, you should get decent perf; ditto if your concern is just that you're going to run the same function zillions of times during the genetic algorithm. Hosting the DLR is pretty easy to do, so it shouldn't be hard to do a proof of concept and measure to see if it's up to your needs.

You can try to create and manipulate Expression Trees. Use Linq to evaluate expression trees.

You can also use CodeDom to compile and run a function.
For sure you can google to see some examples that might fit your needs.
It seems that this article "How to dynamically compile C# code" and this article "Dynamically executing code in .Net" could help you.

You have access to the Compiler from within code, you can then create instances of the compiled code and use them without restarting the application. There are examples of it around
Here
and
Here
The second one is a javascript evaluator but could be adapted easily enough.

You can take a look at System.Reflection.Emit to generate code at the IL level.
Or generate C#, compile into a library and load that dynamically. Not nearly as flexible.

It is in fact very easy to generate IL. See this tutorial: http://www.meta-alternative.net/calc.pdf

Using reflection for code gen?

I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actual generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approch at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?

Its pretty unclear what you are doing, but what does seem clear is that you have some base line code, and based on some its properties, you want to generate more code.
So the key issue here are, given the base line code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) into the same execution enviroment as the reflection user code. The problem with reflection is it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. IF all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
In fact, the only artifact from which truly arbitrary code properties can be extracted is the the source code as a character string (how else could you answer, is the number of characters between the add operator and T in middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, ponter analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get relatively easy way to say what you want backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code,
carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read in the base line code into ASTs, apply analyses to determine properties of interesting, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code langauges, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the
DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other langauges.
(I'm the architect of DMS, so I have a rather biased view. YMMV).

Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.

You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.

From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programatically analyze your existing c# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.

I'm not sure of the best way to do this, but you could do this
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct

Convert a Method to C# source Code

I am writing a fairly simple code gen tool, and I need the ability to convert MSIL (or MethodInfo) objects to their C# source. I realize Reflector does a great job of this, but it has the obnoxious "feature" of being UI only.
I know I could just generate the C# strings directly, using string.Format to insert the variable portions, but I'd really prefer to be able to generate methods programatically (e.g. a delegate or MethodInfo object), then pass those methods to a writer which would convert them to C#.
It seems a little silly that the System libraries make it so easy to go from a C# source code string to a compiled (and executable) method at runtime, but impossible to go from an object to source code--even for simple things.
Any ideas?

This add-in for Reflector lets you output to a file, and it is possible to run Reflector from the command line. It's probably simpler to get that to do what you want than to roll your own decompiler.
Anakrino is another decompiler with a command line option, but it hasn't been updated since .NET 1.1. It's open source, though, so you might be able to base your solution on it.

The code generators which I wrote as part of our application use String.Format, and although I am not entirely happy with String.Format (Lisp macros beat it any time of the day) it does the job all right. The C# compiler is going to re-check all your generated methods anyway, so you gain nothing by round-tripping through Reflection.Emit and the (still unwritten) MSIL-to-C# decompiler.
PS: why not use CodeDOM if you want to use something heavyweight?

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.