AGAIN: If you're voting -1, please leave a comment explaining why. This post isn't about whether or not you approve of this approach, but how to go about it.
Like many architects, I've developed coding standards through years of experience to which I expect my developers to adhere.
This is especially a problem with the crowd that believes that three or four years of experience makes you a senior-level developer. Approaching this as a training and code review issue has generated limited success.
So, I was thinking that it would be great to be able to add custom compile-time errors to the build process to more strictly enforce our in-house best practices and coding standards.
For instance, we use stored procedures for ALL database access, which provides procedure-level security, db encapsulation (table structure is hidden from the app), and other benefits. (Note: I am not interested in starting a debate about this.) Some developers prefer inline SQL or parametrized queries, and that's fine - on their own time and own projects.
I'd like a way to add a compilation check that finds, say, anything that looks like
string sql = "insert into some_table (col1,col2) values (#col1, #col2);"
and generates an error or, in certain circumstances, a warning, with a message like
Inline SQL and parametrized queries are not permitted.
Or, if they use the var keyword
var x = new MyClass();
Variable definitions must be explicitly typed.
Do Visual Studio and MSBuild provide a way to add this functionality? I'm thinking that I could use a regular expression to find unacceptable code and generate the correct error, but I'm not sure what, from a performance standpoint, is the best way to integrate this into the build process.
We could add a pre- or post-build step to run a custom EXE, but how can I return line- and file-specific errors? Also, I'd like this to run after compilation of each file, rather than post-link.
Is a regex the best way to perform this type of pattern matching, or should I go crazy and run the code through a C# parser, which would allow node-level validation via the parse tree?
I'd appreciate suggestions and tales of prior experience.
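On the question of returning file- and line-specific errors from a custom step, here is a minimal sketch. It assumes the checker runs as a build step and that its output is picked up by MSBuild/Visual Studio, which, as far as I know, recognize messages shaped like file(line,col): error CODE: message. It is written in Python purely for illustration; the rule codes STD001/STD002 and the patterns themselves are hypothetical, and a regex this naive will produce false positives (SQL-looking text in comments, for instance).

```python
import re

# Hypothetical in-house rules: (error code, pattern, message).
RULES = [
    (
        "STD001",
        re.compile(
            r'"\s*(?:insert\s+into|delete\s+from|select\s+.+?\s+from|update\s+\w+\s+set)\b',
            re.IGNORECASE,
        ),
        "Inline SQL and parametrized queries are not permitted.",
    ),
    (
        "STD002",
        re.compile(r"^\s*var\s+\w+\s*="),
        "Variable definitions must be explicitly typed.",
    ),
]


def check_source(path, text):
    """Return one 'file(line,col): error CODE: message' string per violation."""
    errors = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for code, pattern, message in RULES:
            match = pattern.search(line)
            if match:
                column = match.start() + 1  # columns are reported 1-based
                errors.append(f"{path}({line_no},{column}): error {code}: {message}")
    return errors
```

A wrapper would print each returned string and exit with a non-zero code when the list is not empty, so the build step fails.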
Comments
Several respondents have pointed out that it's possible to restrict the ability of a user to run anything but a stored proc through db permissions. However, we're in the process of porting a 350k+ line application from ASP 3.0 to ASP.NET MVC, and the existing code base relies pretty heavily on concatenated SQL, whereas the new stuff all uses Enterprise Library. I guess I could add a separate web user account for the .NET code with more restrictive permissions.
For coding standards I would look at writing custom rules for FxCop or StyleCop. I don't think Regex would be a suitable tool for the job.
For the specific case of requiring Stored Procedures - if you ensure the application doesn't have permission to do anything else on the production database, everyone will soon fall in line.
What about writing a plugin for Resharper? Here is a tutorial to start with: Writing plug-ins for ReSharper: Part 1 of Undefined
Implicit typing (var x = ....) is a feature that can be turned off at the project level in Visual Studio.
The other one is trickier. Have you had a look at FxCop, which is the tool for enforcing code standards?
The requirement that only stored procedures can be used should be managed through database permissions. The rule against using var seems fairly arbitrary to me and I can't think of a way to enforce it. Do you have any more examples of your best practices?
Related
Is it possible to use C# as a DSL in which the C# source code is edited by the end user in a TextBox, compiled while the application is running, then called by the already-running application?
I ask because in the next few months I will be needing to implement a simple math-crunching DSL (similar to something Rachel Lim blogged about at http://rachel53461.wordpress.com/2011/08/20/the-math-converter/; I am focused on the math-processing aspect of her code, not the XAML/Converter aspect). I would lean against just reusing her code because I want to add if-statements and possibly other features. If I can use C# itself, then I get all of the features without having to re-implement them.
If it is possible to do this, what framework or namespace or class would I want to use to accomplish such?
Please note that one thing I would do with the C#-derived DSL is hard-code all necessary using header statements, then remove all using statements entered by the savvy user. The purpose of this is to reduce the prospect of an end user trying to leverage my C#-like DSL into a full-fledged compiler against the wishes of their enterprise policy or without the knowledge of the site administrator. Is my proposed managing of using statements an adequate defense against user mischief?
Finally, if all of the answers up to this point are "yes", then what are the drawbacks of this approach, especially drawbacks of introducing a security vulnerability?
Paul
Is my proposed managing of using statements an adequate defense against user mischief?
No. You'd have to remove references to fully-qualified classes as well. And then, the user can still use reflection to gain access to classes they have not referred to in either way.
You'll want to create a separate appdomain to contain the user's code, which you can then sandbox appropriately. Here is a relevant article on MSDN, which explains this process in depth.
Stackoverflow automatically converts link answers to comments now. How lovely.
Compile and run dynamic code, without generating EXE?
Anyway, the answer lies with Microsoft.CSharp.CSharpCodeProvider
Removing using directives will not help, unless you also find some way to prevent the user from writing e.g. System.Diagnostics.Process.Start("evilprogram.exe"). Doing this (without also preventing property accesses) will require you to use a C# parser.
You might, however, be able to use Code Access Security for this.
We are developing a product that will be deployed for a number of clients. In an area of this product, we need to execute some formulas. The problem is that these formulas can be different for each client, so we need a good architecture that can 'execute code dynamically'.
The application is a web application hosted on SharePoint 2010 (thereby using .NET 3.5).
A Simplified example:
I have class MyClassA with two numeric properties PropA and PropB
For one client the formula is PropA + PropB. For the other it is PropA-PropB.
This is just a simplified example as the formula is more complex than this.
I need a way to set PropA+PropB for client A, perhaps in an XML file or a database.
I would then load the code dynamically?
Does this make sense? Has anyone implemented a similar scenario in a production environment?
I have found that the following article describes a similar situation but I do not know whether it is 100% reliable for a production environment:
http://west-wind.com/presentations/dynamicCode/DynamicCode.htm
I have also found that IronPython can also solve a similar problem but I cannot understand how I would use my ClassA with IronPython.
Any assistance would be greatly appreciated.
Update ...
Thanks everyone for the detailed feedback. It was a very constructive exercise. I have reviewed the different approaches and it seems very likely that we will go ahead with the nCalc approach. nCalc seems to be a wonderful tool and I am already loving the technology :)
Thank you all!!
Look into nCalc.
You could store your calculations in a database field then simply execute them on demand. Pretty simple and powerful. We are using this in a multi-tenant environment to help with similar types of customization.
I'm just proposing this because I don't know the problem very well but the idea could be a Dll for each formula (so you can handle the code as you wish, with normal C# functionalities instead of an uncomfortable xml file).
With MEF you can inject dll into your code (you just have to upload those when you develop a new one, no need to recompile the exe file) and have a different formula for each client.
This is my idea because it looks like a perfect example for Strategy pattern
Do you have a fixed set of formulas, or does the client have the capacity to dynamically type those in, i.e. as for a calculator?
In the first case, I'd recommend the following: set of C# delegates, which get called/call each other in a particular order, and (a) Dictionary(ies) of closures which fit the delegates. The closures would then be assigned to the delegates based on your predefined conditions.
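For the fixed-set case, the dictionary-of-closures idea can be sketched roughly as follows. This is Python rather than C# delegates, and the client identifiers and the two formulas are just the simplified PropA/PropB example from the question, not anything real.

```python
from typing import Callable, Dict

Formula = Callable[[float, float], float]

# Hypothetical client identifiers mapped to the closure implementing their formula.
FORMULAS: Dict[str, Formula] = {
    "client_a": lambda prop_a, prop_b: prop_a + prop_b,
    "client_b": lambda prop_a, prop_b: prop_a - prop_b,
}


def calculate(client_id: str, prop_a: float, prop_b: float) -> float:
    """Look up the formula registered for a client and apply it."""
    try:
        formula = FORMULAS[client_id]
    except KeyError:
        raise ValueError(f"no formula registered for {client_id!r}") from None
    return formula(prop_a, prop_b)
```

Adding a client then means registering one more entry; nothing typed in by a user is ever compiled.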
In the alternative case, I wouldn't compile .NET code based on what the client types in, since that (unless preempted) represents a server-side security risk. I would implement/adapt a simple parser for expressions that you're expecting.
P.S. nCalc suggested by Chris Lively is definitely a viable option for this kind of task, and is better than directly using delegates if you have tons and tons of formulas that you don't want to keep in memory.
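To make the "simple parser for expressions" option concrete, here is a rough sketch of a whitelist evaluator. It is in Python and leans on the standard ast module to do the parsing, which is my own choice for illustration; it is not how nCalc works internally. Only numbers, named variables and + - * / are accepted; anything else (calls, attribute access, and so on) is rejected.

```python
import ast
import operator

# The only operators the evaluator accepts.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_formula(formula, variables):
    """Evaluate an arithmetic formula against a dict of variable values."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
            raise ValueError("only numeric literals are allowed")
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"unknown variable: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(formula, mode="eval"))
```

The formula text can then live in a database field per client, and a formula that tries to call something fails with ValueError instead of running.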
ClassA with Ironpython?
Keep it simple
Run through the classA instance for each member (maybe a custom attribute to mark up the ones you want to use in the calc) and end up with name=value pair which by some unfortunate coincidence looks like an assignment
e.g.
PropA = 100
PropB = 200
Prepend that to your python script
PropAPropB = PropA + PropB
Execute the script which is the assignments and the calculation
Then it's basically
ClassB.PropAPropB = ipCalc.Eval("PropAPropB");
You can start getting really clever with it, but all you need is one method to get the inputs from an instance and one that evaluates the result of the calc and sets the properties.
Bob's your mother's sister's brother...
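A rough sketch of the prepend-and-execute idea above. Plain Python exec stands in here for the IronPython engine (the ipCalc object in the answer), and MyClassA and run_calc are illustrative names, so treat this as the shape of the approach and not as IronPython hosting code. Note that exec gives no sandboxing at all.

```python
class MyClassA:
    def __init__(self, prop_a, prop_b):
        self.PropA = prop_a
        self.PropB = prop_b


def run_calc(instance, script):
    """Prepend one 'name = value' line per attribute, then run the script."""
    assignments = "\n".join(
        f"{name} = {value!r}" for name, value in vars(instance).items()
    )
    scope = {}
    # No isolation here: the script runs with full access to the interpreter.
    exec(assignments + "\n" + script, scope)
    return scope
```

Usage would be along the lines of run_calc(MyClassA(100, 200), "PropAPropB = PropA + PropB")["PropAPropB"].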
I have mail-merge like functionality, which takes a template, some business object, and produces html which is then made into PDF.
I'm using RazorEngine to do the template+model to html bit.
If I let the users specify the templates, what risks am I taking? Is it possible to mitigate any risks?
For example, could the users execute arbitrary code? (delete files, alter database, etc.?) Is there some way I can detect this sort of thing? (I know that would be impossible generally, but the bits of code in the razor template should be model property gets, or possibly if statements based on model property values).
I do basically trust the users here (it's a small private project), but as templating engines go, this one seems excessively powerful for this application.
In version 3 I've introduced an IsolatedTemplateService which supports the parsing/compiling of templates in another AppDomain. You'll be able to control the creation of the application domain that templates will be compiled in, which means you can introduce whatever security requirements you want by applying security policies to the child application domain itself.
In future pushes, I am hoping to introduce a generic way for adding extensions to the pipeline, so you can do things like code generation inspection. I would imagine this will enable scenarios for type checking of the generated code before it is compiled.
I pushed an early version of RazorEngine (v3) onto GitHub a few days ago. Feel free to check it out. https://github.com/Antaris/RazorEngine
A cshtml Razor file is able to execute any .NET code in the context of the site so yes, it is a security risk to permit them to be supplied by users.
You would be better served by accepting a more general HTML template, with custom tokens to input Model data.
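As an illustration of the token-based alternative, here is a minimal sketch in Python using the standard string.Template class. The $name token syntax and the render function are assumptions made for the example; the point is only that the template can substitute model values and cannot execute code.

```python
import html
from string import Template


def render(template_text, model):
    """Replace $token placeholders with HTML-escaped model values."""
    escaped = {key: html.escape(str(value)) for key, value in model.items()}
    # safe_substitute leaves unknown tokens in place instead of raising.
    return Template(template_text).safe_substitute(escaped)
```

A user-supplied template such as "<p>Dear $name, you owe $amount.</p>" can then only read the values you chose to put in the model.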
I believe that having removed using statements and replacing any #System.[...] like System.IO.File.Delete(filepath) using regex can reduce a fair amount of possible security holes.
Keep in mind that the Template runs inside a context and can access only what is available in it but that includes also .NET Framework assemblies.
I would like to build an application framework that is mainly interpreted.
Say that the source code would be stored in the database that could be edited by the users and always the latest version would be executed.
Can anyone give me some ideas on how one implements something like this?
cheers,
gabor
In .Net, you can use reflection and CodeDOM to compile code on the fly. But neither approach is really very simple or practical. Mono has some ability to interpret C# on the fly as well, but I haven't looked closely at it yet.
Another alternative is to go with an interpreted .Net language like Boo or IronPython as the language for your database code.
Either way, make sure you think long and hard about the security of your platform. Allowing users to execute arbitrary code is always an exercise fraught with peril. It's often too tempting to look for a simple eval() method, and even if one exists, that is not good enough for this kind of scenario.
Try Mono ( http://www.monoproject.org ). It supports many scripting languages including JavaScript.
If you don't want to use any scripting you can use CodeDOM or Reflection (see Reflection.Emit).
Here are really useful links on the topic :
Dynamically executing code in .Net (Here you can find a tool which can be very helpful)
Late Binding and On-the-Fly Code Generation Using Reflection in C#
Dynamic Source Code Generation and Compilation
Usually the program uses a scripting language for the scriptable parts, e.g. Lua or JavaScript.
To answer your technical question: You don't want to write your own language and interpreter. That's too much work for you to do. So pick some other language, say Python or Lua, and look for the documentation that lets your C program hand it blocks of code to execute. Of course, the script needs to be able to do something, so you'll need to find how to expose your program's objects to the script. Also, what will happen if a client is running the program when you update its source code in the database? Should the client restart? Are you going to store the entire program as a single row in this database, or did you want to store individual functions? That affects how you structure your updates.
To address other issues with your question: Why do you want to do this? Making "interpreted language" part of your design spec for a system is not often a good sign. Is the real requirement something like this: "I update the program often and I want users to always have the latest copy?" If so, there are other, better ways to go about this (just give us your actual scenario and requirements).
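If you do go the embedded-scripting route, the "always run the latest version from the database" part might look roughly like this. It is a sketch in Python with an in-memory SQLite database; the scripts table, its columns and the exposed user variable are all made up for the example, and exec provides no sandboxing.

```python
import sqlite3


def latest_source(conn, name):
    """Fetch the highest-versioned source text stored under a script name."""
    row = conn.execute(
        "SELECT source FROM scripts WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]


def run_script(conn, name, exposed):
    """Run the latest version with the host's exposed objects in scope."""
    scope = dict(exposed)
    exec(latest_source(conn, name), scope)  # trusted scripts only
    return scope


# Hypothetical storage layout: one row per script version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scripts (name TEXT, version INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO scripts VALUES (?, ?, ?)",
    [
        ("greet", 1, "result = 'hello ' + user"),
        ("greet", 2, "result = 'welcome ' + user"),
    ],
)
```

Storing one row per function, as here, means an update only affects callers that fetch after the new version is written; a client in the middle of a run keeps the source it already loaded.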
I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
It's pretty unclear what you are doing, but what does seem clear is that you have some base line code, and based on some of its properties, you want to generate more code.
So the key issues here are: given the base line code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running in (well, at least loaded into) the same execution environment as the reflection user code. The problem with reflection is that it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. If all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
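To show what "just lists of classes, methods and argument names" buys you, here is a small sketch of reflection-driven generation. It uses Python's inspect module in place of .NET reflection, and the Customer class and the generated I-prefixed stub are invented for the example.

```python
import inspect


class Customer:
    def rename(self, new_name): ...

    def deactivate(self): ...


def generate_interface(cls):
    """Emit stub source for every public method found via reflection."""
    lines = [f"class I{cls.__name__}:"]
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        params = ", ".join(inspect.signature(member).parameters)
        lines.append(f"    def {name}({params}): raise NotImplementedError")
    return "\n".join(lines)
```

Everything the generator knows comes from names and signatures; nothing about what the method bodies actually do is available to it.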
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer whether the number of characters between the add operator and the T in the middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of questions about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
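As a tiny illustration of the AST point, the sketch below answers "which operators and operands are used" for a piece of source. Python's standard ast module stands in for a C# front end here, which is an assumption of the example; the question itself is the kind reflection cannot answer.

```python
import ast
from collections import Counter


def operator_counts(source):
    """Count the binary and unary operators appearing in the source."""
    tree = ast.parse(source)
    return Counter(
        type(node.op).__name__
        for node in ast.walk(tree)
        if isinstance(node, (ast.BinOp, ast.UnaryOp))
    )


def operand_names(source):
    """Return the sorted names of the variables that are read."""
    tree = ast.parse(source)
    return sorted(
        {
            node.id
            for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
        }
    )
```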
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code, carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read in the base line code into ASTs, apply analyses to determine properties of interest, use transformations to generate new ASTs, and then spit out the answer.
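The read, transform, regenerate loop can be sketched in miniature. This is Python's ast module (ast.unparse needs Python 3.9 or later) and has nothing to do with how DMS is implemented; the single rewrite rule, "when you see expr * 1, replace it by expr", is a made-up example of a pattern with a condition.

```python
import ast


class RemoveMultiplyByOne(ast.NodeTransformer):
    """Rewrite rule: <expr> * 1  ->  <expr>."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        right = node.right
        if (
            isinstance(node.op, ast.Mult)
            and isinstance(right, ast.Constant)
            and type(right.value) is int
            and right.value == 1
        ):
            return node.left
        return node


def simplify(source):
    """Parse to an AST, apply the rewrite rule, and regenerate source text."""
    tree = RemoveMultiplyByOne().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```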
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).
Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.
I'm not sure of the best way to do this, but you could do this:
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct