We are about to use code protectors (obfuscation as well as native compilation). I assume ORMs depend to some degree on reflection, and I am worried that obfuscation and native compilation will cause problems.
Has anyone successfully combined an ORM with code protection in a real desktop application? Ours is a WPF desktop application.
Our primary development language is C# and we use a custom ORM, but I also want to evaluate commercial ORMs, ADO.NET Entity Framework, and so on.
The question is not what code protection is or which product I should use; I am asking about the effect of protection on an ORM.
If your code uses reflection, the obfuscated assembly will most likely not work as-is, because reflection looks members up by names that the obfuscator has changed. You will need to exclude from obfuscation any entities that are referenced by their original name. Take a look at Crypto Obfuscator, which analyzes your code during obfuscation and shows all methods and line numbers where potentially breaking calls (such as reflection) are made. This is a huge time-saver, since it pinpoints the exact locations and helps determine which properties and classes you need to exclude from renaming.
Try .NET Reactor, available at http://www.eziriz.com/
It's a lot cheaper than some of the others around, and it can do a lot more. You can also disable certain options (such as obfuscation, to preserve the use of reflection) and enable only others, such as ILDASM suppression, which will still protect the code.
Cheers
Redgate acquired Smart Assembly not too long ago, which is what I'd look at if I had a need to do this.
A while ago I trialed CodeVeil to look at obfuscating/encrypting code, with some degree of success. If you're serious about doing this, it's not as simple as dropping an assembly in one end and having a protected assembly pop out the other. You will have to consider which portions of your code (i.e. namespaces, classes, methods, fields, properties, structures, events, and resources) are only used internally, and which need to be exposed to other resources and libraries. In the case I was looking at, I was able to encrypt (or natively compile) some method implementations to hide them, but left the class definition (name, methods, properties) untouched. In some cases I left whole namespaces untouched, as they contained only simple POCO objects required by other libraries.
It really has to be decided case by case which strategy you use where. Some internals you could obfuscate to make decompilation and reverse engineering hard, and that would be enough. In other cases you could use encryption or native compilation simply to hide a method implementation. And you will also get cases where you exclude portions of an assembly from being touched at all. Most of these programs will give you some recommended defaults and options to start from, but you will need to tweak these until you get results that protect your core IP without restricting your end users.
Is it possible to use C# as a DSL in which the C# source code is edited by the end user in a TextBox, compiled while the application is running, then called by the already-running application?
I ask because in the next few months I will need to implement a simple math-crunching DSL, similar to something Rachel Lim blogged about at http://rachel53461.wordpress.com/2011/08/20/the-math-converter/ (I am focused on the math-processing aspect of her code, not the XAML/converter aspect). I would lean against just reusing her code because I want to add if-statements and possibly other features. If I can use C# itself, then I get all of those features without having to re-implement them.
If it is possible to do this, what framework, namespace, or class would I want to use to accomplish it?
Please note that one thing I would do with the C#-derived DSL is hard-code all necessary using directives, then remove any using directives entered by a savvy user. The purpose is to reduce the chance of an end user turning my C#-like DSL into a full-fledged compiler against the wishes of their enterprise policy or without the knowledge of the site administrator. Is my proposed handling of using directives an adequate defense against user mischief?
Finally, if all of the answers up to this point are "yes", then what are the drawbacks of this approach, especially drawbacks of introducing a security vulnerability?
Paul
Is my proposed managing of using statements an adequate defense against user mischief?
No. You'd have to remove references to fully-qualified classes as well. And then, the user can still use reflection to gain access to classes they have not referred to in either way.
You'll want to create a separate AppDomain to contain the user's code, which you can then sandbox appropriately. Here is a relevant article on MSDN, which explains this process in depth.
Stack Overflow automatically converts link answers to comments now. How lovely.
Compile and run dynamic code, without generating EXE?
Anyway, the answer lies with Microsoft.CSharp.CSharpCodeProvider.
Removing using directives will not help, unless you also find some way to prevent the user from writing e.g. System.Diagnostics.Process.Start("evilprogram.exe"). Doing this (without also preventing property accesses) will require you to use a C# parser.
You might, however, be able to use Code Access Security for this.
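To make the parser-based check above a bit more concrete, here is a minimal sketch. It is written in Python rather than C#, purely because Python ships a parser in its standard library (`ast`); the same allow-list idea would apply to a C# syntax tree. The names `ALLOWED_NODES` and `evaluate_formula` are made up for this illustration, and the sketch only handles single expressions (including an if-expression), not a full C#-like language.

```python
import ast

# Hypothetical allow-list: only arithmetic, comparisons, boolean logic,
# plain variable names, literals and the conditional expression get through.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.IfExp, ast.Compare, ast.BoolOp,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq, ast.And, ast.Or,
)


def evaluate_formula(source, variables):
    """Parse a formula, reject any construct not on the allow-list, then run it."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        # Calls, attribute access (a.b.c), subscripts, lambdas, etc. are all
        # separate node types, so they are rejected here before anything runs.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("construct not allowed: " + type(node).__name__)
    code = compile(tree, "<formula>", "eval")
    # Empty builtins: the formula can only see the variables handed to it.
    return eval(code, {"__builtins__": {}}, dict(variables))
```

Something like `evaluate_formula("x * 2 if x > 3 else -x", {"x": 5})` goes through, while a fully-qualified call is refused at the parse-tree stage, regardless of which using/import lines were stripped. The point is that the check works on syntax-tree nodes, not on text.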
I am embedding IronPython into my game engine, where you can attach scripts to objects. I don't want scripts to be able to just access the CLR whenever they want, because then they could pretty much do anything.
Having random scripts, especially ones downloaded from the internet, being able to open internet connections, access the user's HDD, or modify the internal game state is a very bad thing.
Normally people would just suggest, "Use a separate AppDomain". However, unless I am severely mistaken, cross-AppDomain calls are slow. Very slow. Too slow for a game engine. So I am looking at alternatives.
I thought about compiling a custom version of IronPython that stops you from being able to import clr or any namespace, thus limiting it to the standard library.
The option I would rather go with goes along the following lines:
__builtins__.__import__ = None  # Stops imports from working
reload = None  # Stops reload from working (specifically, stops scripts from
               # reloading builtins, which would give back an unbroken __import__!)
I read this in another stack overflow post.
Assume that instead of setting __builtins__.__import__ to None, I set it to a custom function that lets you load the standard API.
The question is: using the method outlined above, would there be any way for a script to get access to the clr module, the .NET BCL, or anything else that could potentially do bad things? Or should I go with modifying the source? Is there a third option?
The only way to guarantee it is to use an AppDomain. I don't know what the performance hit is; it depends on your use case, so you should measure it first to make sure that it actually is too slow.
If you only need a best-effort system, and if the scripts don't need to import anything, ever, and you supply all of the objects they need from the host, then your scheme should be acceptable. You can also avoid shipping the Python standard library, which will save some space.
You'll want to check the rest of the builtins for anything that might talk to the outside world; open, file, input, raw_input, and execfile come to mind, but there may be others. exec might be an issue as well, and as it's a keyword it might be trickier to turn off if there are openings there. Never underestimate the ability of a determined attacker!
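A rough sketch of what that best-effort scheme could look like follows. Note the assumptions: it is written against CPython 3 (the `builtins` module), whereas IronPython 2 names that module `__builtin__` and has extra entries such as `file`, `raw_input`, `execfile` and `reload` that would need to go on the block list too. `ALLOWED_MODULES`, `BLOCKED_BUILTINS`, `guarded_import` and `run_script` are names invented for this example.

```python
import builtins

ALLOWED_MODULES = {"math", "random"}  # hypothetical allow-list for scripts
BLOCKED_BUILTINS = {"open", "input", "exec", "eval", "compile",
                    "breakpoint", "__import__"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement for __import__ that only lets allow-listed modules through."""
    root = name.split(".")[0]
    if level != 0 or root not in ALLOWED_MODULES:
        raise ImportError("import of %r is not allowed in scripts" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)


def make_script_globals(host_objects):
    """Build the globals dict a script runs in: trimmed builtins + host objects."""
    safe = {k: v for k, v in vars(builtins).items() if k not in BLOCKED_BUILTINS}
    safe["__import__"] = guarded_import
    env = {"__builtins__": safe}
    env.update(host_objects)
    return env


def run_script(source, host_objects=None):
    """Run script text in the restricted environment and return its globals."""
    env = make_script_globals(host_objects or {})
    exec(compile(source, "<script>", "exec"), env)
    return env
```

With this, `run_script("import math")` works, `run_script("import os")` raises ImportError, and a bare `open(...)` raises NameError. It is still only best-effort: introspection tricks such as walking `().__class__.__base__.__subclasses__()` are not stopped by it, which is the kind of opening a determined attacker will look for.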
I have embedded IronPython in apps before and shared similar security concerns. What I did to help mitigate the risk was to create special objects just for the scripting runtime. These were essentially wrappers around my core objects that only exposed "safe" functionality.
Another benefit from creating objects just for scripting is that you can optimize them for scripting with helper functions that make your scripts more terse and tidy.
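For illustration, the wrapper idea might look roughly like this. `Player` and `ScriptPlayer` are hypothetical names, not anything from a real engine, and in an actual IronPython host the wrapper would more likely be a .NET class with a private field, since a Python attribute like `_player` is hidden by convention only.

```python
class Player:
    """Hypothetical core engine object with both safe and dangerous members."""

    def __init__(self, name):
        self.name = name
        self.x = 0
        self.y = 0
        self.save_file = "player.sav"  # scripts should never see this

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

    def delete_save(self):
        raise RuntimeError("dangerous: touches the disk")


class ScriptPlayer:
    """Facade handed to scripts: forwards only the allow-listed operations."""

    def __init__(self, player):
        self._player = player

    @property
    def name(self):
        return self._player.name

    @property
    def position(self):
        return (self._player.x, self._player.y)

    def move(self, dx, dy):
        self._player.move(dx, dy)

    def move_to(self, x, y):
        """Scripting-only helper that the core object does not have."""
        self.move(x - self._player.x, y - self._player.y)
```

The host would hand scripts a `ScriptPlayer(player)` instead of the `Player` itself, so `delete_save` and `save_file` are simply not part of the surface a script sees, and helpers like `move_to` keep the scripts short.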
AppDomain or not, there is nothing stopping somebody from loading an external .py module in their script... It's a price you pay for the flexibility.
I am currently developing an application where you can create "programs" with it without writing source code, just click&play if you like.
Now the question is how do I generate an executable program from my data model. There are many possibilities but I am not sure which one is the best for me. I need to generate assemblies with classes and namespace and everything which can be part of the application.
CodeDOM class: I heard of lots of limitations and bugs of this class. I need to create attributes on method parameters and return values. Is this supported?
Create C# source code programmatically and then call CompileAssemblyFromFile on it: This would work since I can generate any code I want and C# supports most CLR features. But wouldn't this be slow?
Use the reflection ILGenerator class: I think with this I can generate every possible .NET code. But I think this is much more complicated and error prone than the other approaches?
Are there other possible solutions?
EDIT:
The tool is for general application development; it is not restricted to a specific domain. I don't know if it can be considered a visual programming language. The user can create classes, methods, method calls, and all kinds of expressions. It won't be very limiting, because you should be able to do most things that are allowed in real programming languages.
At the moment lots of things must still be written by the user as text, but the goal at the end is, that nearly everything can be clicked together.
You may find it rewarding to look at the Dynamic Language Runtime, which is more or less designed for creating high-level languages based on .NET.
It's perhaps also worth looking at some of the previous Stack Overflow threads on Domain Specific Languages which contain some useful links to tools for working with DSLs, which sounds a little like what you are planning although I'm still not absolutely clear from the question what exactly your aim is.
Most things "click and play" should be simple enough just to stick some pre-defined building-block objects together (probably using interfaces on the boundaries). Meaning: you might not need to do dynamic code generation - just "fake it". For example, using property-bag objects (like DataTable etc, although that isn't my first choice) for values, etc.
Another option for dynamic evaluation is the Expression class; especially in .NET 4.0, this is hugely versatile, and allows compilation to a delegate.
Do the C# source generation and don't care about speed until it matters. The C# compiler is quite quick.
When I wrote a dynamic code generator, I relied heavily on System.Reflection.Emit.
Basically, you programmatically create dynamic assemblies and add new types to them. These types are constructed using the Emit constructs (properties, events, fields, etc.). When it comes to implementing methods, you'll have to use an ILGenerator to pump out MSIL opcodes into your method. That sounds super scary, but you can use a couple of tools to help:
A pre-built sample implementation
ILDasm to inspect the op-codes of the sample implementation.
It depends on your requirements, but CodeDOM would certainly be the best fit for a "program" stored in a "data model".
However, it's unlikely that using option 2 will be measurably slower than any other approach.
I would echo others in that 1) the compiler is quick, and 2) "Click and Play" things should be simple enough so that no single widget added to a pile of widgets can make it an illegal pile.
Good luck. I'm skeptical that you can achieve point (2) for anything but really toy-level programs.
I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code and, based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running in (well, at least loaded into) the same execution environment as the code doing the reflecting. The problem with reflection is that it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. If all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties of the code, reflection won't cut it.
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer whether the number of characters between the add operator and the T in the middle of a variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 50 years figuring out how to extract interesting program properties, and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of questions about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that" and "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this, IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support.)
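As a small illustration of the AST point only, here is a sketch using Python's standard `ast` module as a stand-in for a C# front end (Python is used simply because its parser ships with the language). The helper names `operator_counts` and `assigned_names` are invented for this example; it says nothing about symbol tables, control flow or data flow.

```python
import ast
from collections import Counter


def operator_counts(source):
    """Count which binary operators a piece of source code uses."""
    tree = ast.parse(source)
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp):
            counts[type(node.op).__name__] += 1
    return counts


def assigned_names(source):
    """List the simple variable names that appear as assignment targets."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return sorted(names)
```

Questions like "which operators are used" or "which names get assigned" fall straight out of the tree, whereas reflection over the compiled code would not retain that expression-level detail.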
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to the various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code, carry out analyses and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine the properties of interest, use transformations to generate new ASTs, and then spit out the answer.
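A toy version of such a rewrite rule, again sketched with Python's standard `ast` module rather than a real program transformation system, might look like this. The rule ("when you see `name * 2`, replace it by `name + name`"), the class `DoubleToAdd` and the helpers `rewrite` and `evaluate` are all made up for the illustration; a real system would let you state the pattern in source syntax instead of hand-written node checks.

```python
import ast


class DoubleToAdd(ast.NodeTransformer):
    """Rewrite rule: <name> * 2  ->  <name> + <name>."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite sub-expressions first
        # Condition part of the rule: left side is a plain name, right side is 2.
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.left, ast.Name)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 2):
            replacement = ast.BinOp(
                left=node.left,
                op=ast.Add(),
                right=ast.Name(id=node.left.id, ctx=ast.Load()),
            )
            return ast.copy_location(replacement, node)
        return node


def rewrite(expression):
    """Parse an expression, apply the rule, and return the new AST."""
    tree = ast.parse(expression, mode="eval")
    return ast.fix_missing_locations(DoubleToAdd().visit(tree))


def evaluate(tree, variables):
    """Compile and run a (rewritten) expression AST against some variables."""
    code = compile(tree, "<rewritten>", "eval")
    return eval(code, {"__builtins__": {}}, dict(variables))
```

The read, analyze, transform, regenerate pipeline is all there in miniature: `rewrite("n * 2 + 1")` yields a tree with no multiplication left in it, and `evaluate` shows the result still computes the same value.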
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).
Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database-centric, but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/. In addition, there was a recent Hanselminutes episode on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
From what I understand, you could use something like the Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.
I'm not sure of the best way to do this, but you could do the following:
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct
How do I protect the dlls of my project in such a way that they cannot be referenced and used by other people?
Thanks
The short answer is that beyond the obvious things, there is not much you can do.
The obvious things that you might want to consider (roughly in order of increasing difficulty and decreasing plausibility) include:
Static link so there is no DLL to attack.
Strip all symbols.
Use a .DEF file and an import library to have only anonymous exports known only by their export ids.
Keep the DLL in a resource and expose it in the file system (under a suitably obscure name, perhaps even generated at run time) only when running.
Hide all real functions behind a factory method that exchanges a secret (better, proof of knowledge of a secret) for a table of function pointers to the real methods.
Use anti-debugging techniques borrowed from the malware world to prevent reverse engineering. (Note that this will likely get you false positives from AV tools.)
Regardless, a sufficiently determined user can still figure out ways to use it. A decent disassembler will quickly provide all the information needed.
Note that if your DLL is really a COM object, or worse yet a CLR Assembly, then there is a huge amount of runtime type information that you can't strip off without breaking its intended use.
EDIT: Since you've retagged to imply that C# and .NET are the environment rather than a pure Win32 DLL written in C, then I really should revise the above to "You Can't, But..."
There has been a market for obfuscation tools for a long time to deal with environments where delivery of compilable source is mandatory, but you don't want to deliver useful source. There are C# products that play in that market, and it looks like at least one has chimed in.
Because loading an Assembly requires so much effort from the framework, it is likely that there are permission bits that exert some control for honest providers and consumers of Assemblies. I have not seen any discussion of the real security provided by these methods and simply don't know how effective they are against a determined attack.
A lot is going to depend on your use case. If you merely want to prevent casual use, you can probably find a solution that works for you. If you want to protect valuable trade secrets from reverse engineering and reuse, you may not be so happy.
You're facing the same issue as proponents of DRM.
If your program (which you wish to be able to run the DLL) is runnable by some user account, then there is nothing that can stop a sufficiently determined programmer who can log on as that user from isolating the code that performs the decryption and using that to decrypt your DLL and run it.
You can of course make it inconvenient to perform this reverse engineering, and that may well be enough.
Take a look at the StrongNameIdentityPermissionAttribute. It will allow you to declare who may access your assembly. Combined with a good code protection tool like CodeVeil (disclaimer: I sell CodeVeil), you'll be quite happy.
You could embed it into your executable, then extract it at runtime, load it with LoadLibrary, and call into it. Or you could use some kind of shared key to encrypt/decrypt the accompanying file and do the same as above.
I'm assuming you've already considered solutions like compiling it in if you really don't want it shared. If someone really wants to get to it though, there are many ways to do it.
Have you tried .NET Reactor? I recently came across it. Some people say it's great, but I am still testing it out.
Well, you could mark all of your "public" classes as "internal" or "protected internal", then mark your assemblies with the [assembly: InternalsVisibleTo("")] attribute, so that no one but the named assemblies can see the contents.
You may be interested in the following information about Friend assemblies:
http://msdn.microsoft.com/en-us/library/0tke9fxk(VS.80).aspx