Simple & Practical C# code generation (VS 2008 or 2010)

I've put off using generated code as part of the build process for fear of the complexity it introduces.
Is there a simple way to integrate build-time generated code into an app?
The kind of code I'm thinking of is similar to the resource and settings file code generation that Visual Studio performs:
Having intellisense here is valuable
There are a lot of properties and links between properties that are trivial to describe, but impossible to implement tersely in C#.
The underlying resource is modifiable and the code is automatically regenerated without needing any user interaction and without any need to understand the internals of the generator.
For a (non-real-world) example, consider a precompiler that generates accessors to the named capture groups of a Regex via similarly named C# properties (or methods). This is typical of the kinds of things I'd like to generate: long snippets of boilerplate wrappers whose primary function is to enable compile-time checking for errors (in the above: accessing nonexistent capturing groups or writing an invalid regex) and, no less importantly, intellisense for these properties. Finally, this setup should be trivially usable by others on the team with only the bare minimum of learning curve. I.e., it's absolutely not acceptable to require manual intervention to regenerate the code, nor acceptable to commit the generated code into source control. At worst, everyone should just need to install some extension; ideally the extension should be installable into the source tree so that anyone who checks out the tree can build the project without any introduction.
For that to work well, it's critical that the IDE integration be excellent: Updating the underlying "resource" definition file should trigger a regeneration of the code without any user interaction, and ideally the generator itself would be easy to maintain for other developers later on (i.e. some amount of generator debug-ability is a plus).
Finally, an XSLT-like approach where the same template can be applied to various input resources is ideal, both because this means that you don't even need to look at the actual generator code if all you want to do is update the resource, and because it makes template reuse trivial.
I've looked at T4, but from what I've seen this has a less handy ASP-like approach where template and resource aren't cleanly split (i.e., the generator is responsible for finding the resource, which makes template reuse less easy).
Is there a better (cleaner) solution, or some way of running T4 such that the same template can be trivially reused and (much like .NET settings files) any update of the resource automatically triggers a regeneration of the implemented code?
Summary:
I'm looking for a code-gen approach that can
1. Regenerate code automatically without dev intervention when the underlying resource (not the template!) changes.
2. Be somewhat simple to maintain.
3. Be able to share the same generator template between several resources (which, with point #1, probably implies the resource should refer to the generator and not vice versa).
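
To make the regex example from the question concrete, here is a hand-written sketch of the kind of wrapper such a generator might emit. The class name DateMatch, the pattern, and the TryMatch method are all hypothetical; no existing tool is assumed to produce exactly this output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical generator output for one regex "resource"; every name here is illustrative.
public sealed class DateMatch
{
    // The underlying resource: a regex with three named capture groups.
    private static readonly Regex Pattern =
        new Regex(@"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})", RegexOptions.Compiled);

    private readonly Match match;

    private DateMatch(Match match) { this.match = match; }

    // One property per named group, so a misspelled group name becomes a compile error
    // and the group names show up in intellisense.
    public string Year { get { return match.Groups["year"].Value; } }
    public string Month { get { return match.Groups["month"].Value; } }
    public string Day { get { return match.Groups["day"].Value; } }

    // Returns null when the input does not contain a match.
    public static DateMatch TryMatch(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? new DateMatch(m) : null;
    }
}
```

With this in place, DateMatch.TryMatch("released 2010-01-15").Year compiles, while a typo such as .Yaer is rejected by the compiler instead of failing at runtime.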

You can use T4ScriptFileGenerator from T4 Toolbox. Change the "Custom Tool" property of your "resource" file to T4ScriptFileGenerator and save the changes. The custom tool will generate a new, empty T4 script (.tt file). Place your code generation logic in this .tt file. Any time you modify (and save) the resource file, T4ScriptFileGenerator will use the .tt file to generate the output code. For an example of how this works, see the "LINQ to SQL Model" generator in T4 Toolbox, which uses a .dbml file as the "resource". In the .tt file created by this generator, you will see that all of the code generation logic resides in separate .tt files and is reused with the help of include directives.

You may want to keep an eye on ABSE (http://www.abse.info). ABSE is a code-generation and model-driven software development methodology that is completely agnostic in terms of platform and language, so you wouldn't have any trouble creating your own generators for C# and anything else you wish. The big plus is that you can generate code exactly the way you want. The downside is that you may have more work to do at first to build your templates.
ABSE allows you to capture your domain knowledge into "Atoms", which are basically fragments of larger models you can build. ABSE is both declarative and executable. The model is able to generate code by your specification and incorporate custom code at the model level.
Unfortunately, ABSE is still a work in progress, and an Integrated Development Environment (named AtomWeaver) is still in the making. Anyway, a CTP release of the generator is scheduled for January 2010, so we're already close to it.

Related

Hash of source codes at compile time in C#

I have a server that other devs use, and I currently log the version of the dll they use. I do that by having the client use Reflection to retrieve its version:
Assembly.GetEntryAssembly().GetName().Version.ToString();
It's nice, but since it comes from devs who use TFS and do the build themselves, I cannot see whether they have the latest version of the sources. Is there a trick, like a compilation tag, that would easily allow a hash of the generating source code?
Note: I have tried sending the MD5 of the dll (using assembly.Location), but it is useless since the hash value changes between 2 compilations (I suppose there is some compilation timestamp inside the generated dll).

This is more a collaboration issue than a coding one.
The moment you find out that the version is an old one, notify them about it.
If the reported version is not an old one, that means the developers did not increment the version ID before making the build, which is a mistake.
In other words, organize it among people, and do not rely on these kinds of tools (if there are any). You are trying to create a complicated tool that will help you avoid mistakes, but humans will find a way to make them again.
So it's better to create a solid working relationship among you, imo.

Create a tool that runs as a pre-build event and hashes your code files (or records their last-write times).
Write the result to a .cs file or an embedded resource file.
The result file itself must be excluded from the hashing step.
So that the build's up-to-date check (which skips unnecessary builds) keeps working, compare the file's contents before writing and only write when they differ.
If you have the file open in the IDE, you will get a "changed from outside" prompt when you build.
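
A minimal sketch of such a pre-build tool follows. The names SourceHasher, SourceInfo, ComputeHash, and WriteIfChanged are made up for illustration, and SHA-256 is used here simply as one available hash. The sketch assumes that hashing raw file contents is acceptable; differing line endings between checkouts, or generated files under obj, would change the result and may need filtering.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical pre-build helper; all names are illustrative.
public static class SourceHasher
{
    // Hashes the contents of every *.cs file under sourceDir, skipping the generated file.
    public static string ComputeHash(string sourceDir, string outputFile)
    {
        string outputFull = Path.GetFullPath(outputFile);
        string[] files = Directory.GetFiles(sourceDir, "*.cs", SearchOption.AllDirectories)
            .Where(f => !string.Equals(Path.GetFullPath(f), outputFull, StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.Ordinal) // fixed order so the hash is repeatable
            .ToArray();

        using (SHA256 sha = SHA256.Create())
        using (MemoryStream buffer = new MemoryStream())
        {
            foreach (string file in files)
            {
                byte[] bytes = File.ReadAllBytes(file);
                buffer.Write(bytes, 0, bytes.Length);
            }
            return BitConverter.ToString(sha.ComputeHash(buffer.ToArray())).Replace("-", "");
        }
    }

    // Writes the generated file only when its content would change, so an unchanged
    // source tree does not force a rebuild. Returns true if the file was written.
    public static bool WriteIfChanged(string outputFile, string hash)
    {
        string content = "internal static class SourceInfo { public const string Hash = \"" + hash + "\"; }";
        if (File.Exists(outputFile) && File.ReadAllText(outputFile) == content)
            return false;
        File.WriteAllText(outputFile, content);
        return true;
    }
}
```

The client could then report SourceInfo.Hash alongside the assembly version.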

It seems there is no easy way to do it.

How dangerous is it to let users specify RazorEngine templates?

I have mail-merge like functionality, which takes a template, some business object, and produces html which is then made into PDF.
I'm using RazorEngine to do the template+model to html bit.
If I let the users specify the templates, what risks am I taking? Is it possible to mitigate any risks?
For example, could the users execute arbitrary code? (delete files, alter database, etc.?) Is there some way I can detect this sort of thing? (I know that would be impossible generally, but the bits of code in the razor template should be model property gets, or possibly if statements based on model property values).
I do basically trust the users here (it's a small private project), but as templating engines go, this one seems excessively powerful for this application.

In version 3 I've introduced an IsolatedTemplateService which supports the parsing/compiling of templates in another AppDomain. You'll be able to control the creation of the application domain that templates will be compiled in, which means you can introduce whatever security requirements you want by applying security policies to the child application domain itself.
In future pushes, I am hoping to introduce a generic way for adding extensions to the pipeline, so you can do things like code generation inspection. I would imagine this will enable scenarios for type checking of the generated code before it is compiled.
I pushed an early version of RazorEngine (v3) onto GitHub a few days ago. Feel free to check it out. https://github.com/Antaris/RazorEngine

A cshtml Razor file is able to execute any .NET code in the context of the site, so yes, it is a security risk to permit them to be supplied by users.
You would be better served by accepting a more general HTML template, with custom tokens to input Model data.
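
As a rough sketch of that token approach (not a RazorEngine feature), the following replaces placeholders with model values. The {{Name}} token syntax and the TokenTemplate class are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical token-based renderer; the {{Token}} syntax is an assumption.
public static class TokenTemplate
{
    // Matches tokens such as {{CustomerName}}.
    private static readonly Regex Token = new Regex(@"\{\{(\w+)\}\}");

    public static string Render(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, m =>
        {
            string value;
            // Unknown tokens are left untouched so mistakes stay visible in the output.
            if (!values.TryGetValue(m.Groups[1].Value, out value))
                return m.Value;
            // HTML-encode the value so model data cannot inject markup.
            return WebUtility.HtmlEncode(value);
        });
    }
}
```

Because the template is only ever treated as text, users can shape the HTML but cannot run code; the trade-off is that conditionals and loops would need their own token syntax.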

I believe that removing using statements and using a regex to replace any #System.[...] calls, like System.IO.File.Delete(filepath), can close a fair number of possible security holes.
Keep in mind that the Template runs inside a context and can access only what is available in it but that includes also .NET Framework assemblies.

C# How to generate code from code

Is it possible to generate and build some C# code based on the code from the same project? I tried with T4 and Reflection, but there are some assembly locking issues. Is there any other way?

Reflection works fine for me. You can get around assembly locking issues by isolating your build task to a separate AppDomain within VS. When the task completes, any assemblies you need to use for code generation will be unloaded together with the task's AppDomain. See AppDomainIsolatedTask.

You can definitely write your own code generator, all in C# - after all, "code" that's being generated is just a text file you write out.
But what's wrong with T4 templates? They offer a lot of functionality that you don't have to reinvent yet again - why not use it? Can you tell us in more detail what problems you're having with T4?
T4 is really just a bunch of classes in .NET, too - so you could definitely write your own code generator handling some of the logic, and use T4 to do the templating & replacing those template values part. But again: in order to help you diagnose your T4 problems, we'd need to know more about those....

This example from Oleg Sych uses FxCop's introspection engine instead of reflection. That way, the assemblies do not get locked.
"Unfortunately, Reflection is optimized for code execution. One particular limitation makes it ill-suited for code generation - an assembly loaded using Reflection can only be unloaded with its AppDomain. Because T4 templates are compiled into .NET assemblies and cached to improve performance of code generation, using Reflection to access the component assembly causes T4 to lock it."
Alternatively, if you're only targeting LINQ to SQL classes, you could generate code from the dbml file instead of from the code that L2S generates from the dbml. I've got an example of something similar (an edmx file) on my own blog.

There is a third party C# .NET variant of JavaCC that we use at work.
Also an interesting article about how to make one:
http://msdn.microsoft.com/en-us/magazine/cc136756.aspx

It really depends on what exactly you are trying to achieve, but in the general case I'd recommend using T4 templates.
And yes, it is possible to use T4 templates inside your project to generate code in your project based on some local settings, but you should define what you are trying to do.
If you want to generate code based on some classes that you define in the same project - this doesn't sound like something easily achievable (after all you want to compile some of the classes in the current project, generate some code based on them and after that generate classes again... umm.. ?)
But if you want to store some settings and then run the T4 template and generate some code based on these settings - this is easily achievable. T4MVC is an example (they generate code based on a settings file that is copied and stored in the project alongside the T4 template). This template also looks at the current files available in the solution, and generates string constants based on each file. That kind of sounds like it would really help you with your problem, whatever it is :)
If you're still unsure - you can specify more details about what you want to do, and we'll try to help you :)

How to recognize code generated by Visual Studio's GUI designer?

Visual Studio is kind enough to generate a lot of code for us when we create and design Windows.Forms controls. It also surrounds most of it with a #region statement.
In newer versions it also uses a partial class to separate generated from manually created code.
Developers are supposed to edit code only in certain areas.
But nothing prevents us from violating this in whatever way we please.
I'm fine with manual edits that could just as well have been made from the designer, or manual edits in areas the designer doesn't touch. But I'd like to flag any other kind of edit.
Does anyone know a utility that can do this? StyleCop rules perhaps?
I mostly need it for the combination of C#, Windows.Forms, and Visual Studio 2003, 2005, and 2008.

These days, designer code should end up in a .Designer.cs file. It should be very rare that developers need to touch that. Unfortunately, I don't know any way of verifying that the code was genuinely generated by the designer. It would be handy if it included some sort of hash, but it doesn't as far as I'm aware...
Given how easy it is now to just say "don't edit designer files" do you really need another system though? It's not like you need to stay away from specific regions - it's the whole file which is out of bounds.

Why should developers not be allowed to change this code? If they are able to write code that works, they should be allowed to do it. If they are not able to write code that works, let's say they should be trained or fired.
You just have to extend the meaning of "it works" to "it works at runtime as well as in the designer". So what's wrong about that?
Today's GUI designers are not very restrictive and do a good job of "understanding" code that has been written by a human.
There is also real generated code around, for instance code generated from some XML specification, resources, etc. This code is generated when building, so if it has been changed, those changes are undone whenever the application is built.
Designers are not real code generators of this kind. They are a kind of "coding helper", helping the developer to write code faster than by typing it in. But it should actually be possible to write the same kind of code manually, although limiting oneself to the designer's capabilities is a reasonable maintainability decision.

Using reflection for code gen?

I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?

It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code, and based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running in (well, at least loaded into) the same execution environment as the reflection user code. The problem with reflection is that it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. If all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
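
As a small illustration of the "reflection is enough" case, here is a sketch that emits an interface from the public properties of a loaded type. ReflectionGenerator and Customer are hypothetical names, and the type-name handling is deliberately naive (Type.FullName does not produce valid C# for generic or nested types).

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical generator that only needs what reflection exposes: type and member names.
public static class ReflectionGenerator
{
    // Emits an interface declaring every public instance property of the given type.
    public static string GenerateInterface(Type type)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("public interface I" + type.Name);
        sb.AppendLine("{");
        foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Naive: FullName is only valid C# for simple, non-generic types.
            sb.AppendLine("    " + p.PropertyType.FullName + " " + p.Name + " { get; }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }
}

// Sample input type standing in for a class from the built library.
public class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

Anything that depends on method bodies, comments, or source layout is out of reach for this approach, which is the limitation the rest of this answer addresses.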
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer whether the number of characters between the add operator and the T in the middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code, carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine properties of interest, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).

Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database-centric, but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/. In addition, there was a recent Hanselminutes episode on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.

You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
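
For readers who have not used CodeDom, here is a minimal sketch of building a code model and rendering it as C# source. The CodeDomExample class and its parameters are illustrative only; on newer runtimes the System.CodeDom NuGet package is assumed to be referenced.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical helper showing the basic CodeDom workflow; names are illustrative.
public static class CodeDomExample
{
    // Builds a namespace containing one public class with one public string field,
    // then renders the model as C# source text.
    public static string GenerateClass(string namespaceName, string className, string fieldName)
    {
        CodeCompileUnit unit = new CodeCompileUnit();
        CodeNamespace ns = new CodeNamespace(namespaceName);
        unit.Namespaces.Add(ns);

        CodeTypeDeclaration type = new CodeTypeDeclaration(className);
        type.IsClass = true;
        type.TypeAttributes = System.Reflection.TypeAttributes.Public;
        ns.Types.Add(type);

        CodeMemberField field = new CodeMemberField(typeof(string), fieldName);
        field.Attributes = MemberAttributes.Public;
        type.Members.Add(field);

        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Because the model is language-neutral, the same CodeCompileUnit could also be rendered by a different provider, at the cost of being limited to constructs CodeDom can express.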

From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.

I'm not sure of the best way to do this, but you could do this:
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct
