Selectively remove preprocessor directives using Roslyn


I want to use Roslyn to clean the code of some of the older preprocessor directives.
For example, from this code
#define TEST_1_0
#define TEST_1_1
namespace ConsoleApplication1
{
class TypeName
{
public static void Main(string[] args)
{
#if TEST_1_0
int TEST_1_0 = 1;
#if TEST_1_1
int TEST_1_1 = 1;
#else//TEST_1_1
int TEST_1_1 = 0;
#endif//TEST_1_1
#else//TEST_1_0
int TEST_1_0 = 0;
#endif//TEST_1_0
}
}
}
I'd like to remove #else//TEST_1_0, but keep #else//TEST_1_1. I cannot count on the comments, so I have to relate each #if to its corresponding #else, if there is one.
Finding the #if is easy, but finding the corresponding #else is less easy.
I tried two strategies:
1. Look up #else//TEST_1_0 in the analyzer, and create a code fix for that location.
2. Create a code fix for #if TEST_1_0 in the analyzer, and try to get to the corresponding #else from the CodeFixProvider.
Both get complicated quickly. The problem seems to be that directives are trivia, spread out over the leading trivia of different SyntaxTokens. Changes in the code move the directives around quite a bit, so it looks like a lot of work to program all the cases.
Am I missing something? Is there an easier way to do this without programming all the different cases by hand?
Would you go for strategy 1 or 2?

I agree with Arjan: Roslyn is not usable for this task. To solve a similar problem I made my own simple C# preprocessor tool, undefine, based on regular expressions and the Python sympy library. I believe it would be helpful for you. For the task you described, try the following command:
>> python undefine apply -d TEST_1_0 YourFile.cs

I concluded that roslyn is not the way to go here.
Roslyn models preprocessor directives as trivia in the syntax tree, and the location of that trivia varies greatly depending on the structure of the actual code.
Therefore working on the syntax tree introduces lookup complexities that are not an issue when working text based, and more complexity means more RISK. Binaries should be the same before/after processing!
So I chose to abandon Roslyn and simply parse the code/directive mix as text, using regex to parse and the good-old stack to handle the directive logic.
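The regex-plus-stack idea described above could be sketched roughly as follows. This is not the author's actual tool: the names DirectiveCleaner and RemoveElseBranch are made up for illustration, and the sketch assumes one directive per line, gives #elif no special treatment, and does nothing about encodings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class DirectiveCleaner
{
    // Matches #if / #elif / #else / #endif and captures the rest of the line.
    private static readonly Regex Directive =
        new Regex(@"^\s*#\s*(if|elif|else|endif)\b(.*)$");

    // Drops the #else branch that belongs to "#if <symbol>" and keeps
    // everything else, including nested #if/#else/#endif blocks.
    public static string RemoveElseBranch(string source, string symbol)
    {
        var kept = new List<string>();
        var isTarget = new Stack<bool>(); // one entry per currently open #if
        int skipDepth = -1;               // depth of the #if whose #else is being dropped; -1 = not skipping

        foreach (string line in source.Split('\n'))
        {
            Match m = Directive.Match(line);
            string kind = m.Success ? m.Groups[1].Value : "";

            if (kind == "if")
            {
                // Ignore a trailing // comment when comparing the condition.
                string condition = m.Groups[2].Value;
                int comment = condition.IndexOf("//", StringComparison.Ordinal);
                if (comment >= 0) condition = condition.Substring(0, comment);
                isTarget.Push(skipDepth < 0 && condition.Trim() == symbol);
            }
            else if (kind == "else" && skipDepth < 0 && isTarget.Count > 0 && isTarget.Peek())
            {
                skipDepth = isTarget.Count; // start dropping lines, including this #else
                continue;
            }
            else if (kind == "endif" && isTarget.Count > 0)
            {
                bool closesSkippedBranch = skipDepth == isTarget.Count;
                isTarget.Pop();
                if (closesSkippedBranch) skipDepth = -1; // the matching #endif itself is kept
            }

            if (skipDepth < 0) kept.Add(line);
        }
        return string.Join("\n", kept);
    }
}
```

Called as DirectiveCleaner.RemoveElseBranch(text, "TEST_1_0") on the sample from the question, this would drop #else//TEST_1_0 and the line after it while leaving the nested #else//TEST_1_1 alone, because the stack remembers which #if each #else belongs to.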
Now it's much easier, even a piece of cake ..
Still need to handle some encoding issues, then I'm done! :)
Happy parsing!

Related

Is it possible to create a C# compile time Method? [duplicate]

I've been puzzling about this for a while and I've looked around a bit, unable to find any discussion about the subject.
Lets assume I wanted to implement a trivial example, like a new looping construct: do..until
Written very similarly to do..while
do {
//Things happen here
} until (i == 15)
This could be transformed into valid csharp by doing so:
do {
//Things happen here
} while (!(i == 15));
This is obviously a simple example, but is there any way to add something of this nature? Ideally as a Visual Studio extension to enable syntax highlighting etc.

Microsoft offers the Roslyn API, an implementation of the C# compiler with a public API. It contains individual APIs for each of the compiler pipeline stages: syntax analysis, symbol creation, binding, MSIL emission. You can provide your own implementation of the syntax parser or extend the existing one in order to get a C# compiler with any features you would like.
Roslyn CTP
Let's extend C# language using Roslyn! In my example I'm replacing do-until statement w/ corresponding do-while:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Roslyn.Compilers.CSharp;
namespace RoslynTest
{
class Program
{
static void Main(string[] args)
{
var code = @"
using System;
class Program {
public void My() {
var i = 5;
do {
Console.WriteLine(""hello world"");
i++;
}
until (i > 10);
}
}
";
//Parsing input code into a SyntaxTree object.
var syntaxTree = SyntaxTree.ParseCompilationUnit(code);
var syntaxRoot = syntaxTree.GetRoot();
//Here we will keep all nodes to replace
var replaceDictionary = new Dictionary<DoStatementSyntax, DoStatementSyntax>();
//Looking for do-until statements in all descendant nodes
foreach (var doStatement in syntaxRoot.DescendantNodes().OfType<DoStatementSyntax>())
{
//Until token is treated as an identifier by C# compiler. It doesn't know that in our case it is a keyword.
var untilNode = doStatement.Condition.ChildNodes().OfType<IdentifierNameSyntax>().FirstOrDefault((_node =>
{
return _node.Identifier.ValueText == "until";
}));
//Condition is treated as an argument list
var conditionNode = doStatement.Condition.ChildNodes().OfType<ArgumentListSyntax>().FirstOrDefault();
if (untilNode != null && conditionNode != null)
{
//Let's replace identifier w/ correct while keyword and condition
var whileNode = Syntax.ParseToken("while");
var condition = Syntax.ParseExpression("(!" + conditionNode.GetFullText() + ")");
var newDoStatement = doStatement.WithWhileKeyword(whileNode).WithCondition(condition);
//Accumulating all replacements
replaceDictionary.Add(doStatement, newDoStatement);
}
}
syntaxRoot = syntaxRoot.ReplaceNodes(replaceDictionary.Keys, (node1, node2) => replaceDictionary[node1]);
//Output preprocessed code
Console.WriteLine(syntaxRoot.GetFullText());
}
}
}
///////////
//OUTPUT://
///////////
// using System;
// class Program {
// public void My() {
// var i = 5;
// do {
// Console.WriteLine("hello world");
// i++;
// }
//while(!(i > 10));
// }
// }
Now we can compile updated syntax tree using Roslyn API or save syntaxRoot.GetFullText() to text file and pass it to csc.exe.

The big missing piece is hooking into the pipeline; otherwise you're not much further along than what .Emit provided. Don't misunderstand, Roslyn brings a lot of great things, but for those of us who want to implement preprocessors and metaprogramming, it seems that for now this was not on the plate. You can implement "code suggestions", or what they call "issues"/"actions", as an extension, but this is basically a one-off transformation of code that acts as a suggested inline replacement, and it is not the way you would implement a new language feature. This is something you could always do with extensions, but Roslyn makes the code analysis/transformation tremendously easier.
From what I've read of comments from Roslyn developers on the codeplex forums, providing hooks into the pipeline has not been an initial goal. All of the new C# language features they've provided in C# 6 preview involved modifying Roslyn itself. So you'd essentially need to fork Roslyn. They have documentation on how to build Roslyn and test it with Visual Studio. This would be a heavy handed way to fork Roslyn and have Visual Studio use it. I say heavy-handed because now anyone who wants to use your new language features must replace the default compiler with yours. You could see where this would begin to get messy.
Building Roslyn and replacing Visual Studio 2015 Preview's compiler with your own build
Another approach would be to build a compiler that acts as a proxy to Roslyn. There are standard APIs for building compilers that VS can leverage. It's not a trivial task though. You'd read in the code files, call upon the Roslyn APIs to transform the syntax trees and emit the results.
The other challenge with the proxy approach is going to be getting intellisense to play nicely with any new language features you implement. You'd probably have to have your "new" variant of C#, use a different file extension, and implement all the APIs that Visual Studio requires for intellisense to work.
Lastly, consider the C# ecosystem, and what an extensible compiler would mean. Let's say Roslyn did support these hooks, and it was as easy as providing a NuGet package or a VS extension to support a new language feature. All of your C# leveraging the new do-until feature is essentially invalid C#, and will not compile without your custom extension. If you go far enough down this road with enough people implementing new features, you will very quickly find incompatible language features. Maybe someone implements a preprocessor macro syntax, but it can't be used alongside someone else's new syntax because they happened to use similar syntax to delineate the beginning of the macro. If you leverage a lot of open source projects and find yourself digging into their code, you would encounter a lot of strange syntax that would require you to sidetrack and research the particular language extensions that project is leveraging. It could be madness. I don't mean to sound like a naysayer, as I have a lot of ideas for language features and am very interested in this, but one should consider the implications and how maintainable it would be. Imagine you got hired to work somewhere that had implemented all kinds of new syntax that you had to learn; without those features having been vetted the same way C#'s features have, you can bet some of them would not be well designed or implemented.

You can check www.metaprogramming.ninja (I am the developer). It provides an easy way to build language extensions (I provide examples for constructors, properties, even JS-style functions) as well as full-blown grammar-based DSLs.
The project is open source as well. You can find documentation, examples, etc. on GitHub.
Hope it helps.

You can't create your own syntactic abstractions in C#, so the best you can do is to create your own higher-order function. You could create an Action extension method:
public static void DoUntil(this Action act, Func<bool> condition)
{
do
{
act();
} while (!condition());
}
Which you can use as:
int i = 1;
new Action(() => { Console.WriteLine(i); i++; }).DoUntil(() => i == 15);
although it's questionable whether this is preferable to using a do..while directly.

I found the easiest way to extend the C# language is to use the T4 text processor to preprocess my source. The T4 Script would read my C# and then call a Roslyn based parser, which would generate a new source with custom generated code.
During build time, all my T4 scripts would be executed, thus effectively working as an extended preprocessor.
In your case, the non-compliant C# code could be entered as follows:
#if ExtendedCSharp
do
#endif
{
Console.WriteLine("hello world");
i++;
}
#if ExtendedCSharp
until (i > 10);
#endif
This would allow syntax checking the rest of your (C# compliant) code during development of your program.

No, there is no way to achieve what you're talking about.
What you're asking for is defining a new language construct, which means new lexical analysis, a language parser, a semantic analyzer, and compilation and optimization of the generated IL.
What you can do in such cases is use helper functions instead:
public bool Until(int val, int check)
{
return !(val == check);
}
and use it like
do {
//Things happen here
} while (Until(i, 15));

how to add debug code? (should go to Debug, shouldn't go to Release)

I need to log a lot of information in my software for debugging.
However I need this option only during development, I would prefer to exclude all this code in Release.
Of course I can surround the place I want to debug with "if":
if (isDebugMode()) {
Logger.log(blahblah);
}
But because my software is pretty time-critical, I want to avoid a lot of unnecessary "if" tests.
In other words, I think I need an analog of C's "#define #ifdef #ifndef". Are there any techniques in C# or .NET to solve this easily?

You can put [Conditional("DEBUG")] (ConditionalAttribute) in front of your Logger.log method so that calls to it are only compiled in debug mode and are removed entirely in release mode.
This is how the Debug class methods are all marked.
The upside of this compared to the #if approach is that you don't have to wrap every call site: when the symbol isn't defined, the compiler drops every call to the method, including evaluation of its arguments, instead of leaving calls to a method with an empty body. (The method itself is still compiled into the assembly; only the calls disappear.)

Why don't you instead use the DEBUG symbol?
#if DEBUG
// debug only code here
#endif
Here is an article describing all of the preprocessor directives available to you in C#, and you are also able to define your own preprocessor variables in your project's properties menu.

As others have mentioned, you can use a combination of the "preprocessor" directives:
#if DEBUG
ThisWholeSectionIsTreatedAsACommentIfDEBUGIsNotDefined();
#endif
and the conditional attribute:
Debug.Assert(x); // Call will be removed entirely in non-debug build because it has conditional attr
However, it is important to realize that one of them is a part of the lexical analysis of the program and the other is part of the semantic analysis. It is easy to get confused. My article on the difference might help:
http://ericlippert.com/2009/09/10/whats-the-difference-between-conditional-compilation-and-the-conditional-attribute/

You can use this as a native alternative for debug only logging:
Debug.Write("blah blah");
which will only be logged if this is a debug build.

Conditional attribute could help in this case.
Something like this:
class DebugModeLogger
{
[Conditional("DEBUG")]
public static void WriteLog(string message)
{
//Log it
}
}
Extract from MSDN:
The attribute Conditional enables the definition of conditional methods. The Conditional attribute indicates a condition by testing a conditional compilation symbol. Calls to a conditional method are either included or omitted depending on whether this symbol is defined at the point of the call. If the symbol is defined, the call is included; otherwise, the call (including evaluation of the parameters of the call) is omitted.
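To illustrate the last sentence of that extract, here is a minimal sketch showing that the argument expression is not evaluated when the symbol is undefined. The symbol VERBOSE_LOG and the names ConditionalDemo, Log, BuildMessage and Run are hypothetical, chosen so the result does not depend on whether the build is Debug or Release.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo class; VERBOSE_LOG is a made-up symbol that is
// assumed NOT to be defined anywhere in the project.
public static class ConditionalDemo
{
    public static int Evaluations = 0;

    // Calls to this method are compiled only where VERBOSE_LOG is defined.
    [Conditional("VERBOSE_LOG")]
    public static void Log(string message)
    {
        Console.WriteLine(message);
    }

    // Stands in for an expensive message-building step.
    public static string BuildMessage()
    {
        Evaluations++;
        return "expensive message";
    }

    public static int Run()
    {
        // With VERBOSE_LOG undefined, this whole statement is omitted,
        // so BuildMessage() never runs.
        Log(BuildMessage());
        return Evaluations;
    }
}
```

As long as VERBOSE_LOG stays undefined, ConditionalDemo.Run() returns 0, which is why this approach suits time-critical code: the cost of building the log message disappears along with the call.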

Batch refactor "using" statement declarations in C# across multiple files

I have a bunch of c# code I inherited that has "using" statement declarations like this
using Foo;
using NS1=Bar.x.y.z;
and I've been asked to make our codebase consistent with regard to namespacing - the policy is simply that
1 some namespaces should always be fully qualified (no aliases) - for example things inside "Foo" above should always be fully qualified.
2 some namespaces like Bar.x.y.z should always be accessed through a specific using alias ("NS1" in the above example)
To illustrate what is desired,
if this is the BEFORE code
using Foo;
int x = SomeClass1.SomeStaticMethodMethod(1); // where SomeClass1 is in "Foo"
var y = new Bar.x.y.z.SomeClass2();
this is what is desired AFTER
using NS1 = Bar.x.y.z;
int x = Foo.SomeClass1.SomeStaticMethodMethod(1); // where SomeClass1 is in "Foo"
var y = new NS1.SomeClass2();
Of course I can do all this manually, but I have a lot of files to fix. I'm looking for a tool that can do this over many files (100s of .cs files). I even have the latest version of ReSharper (5.1), which doesn't seem to let me do this. (Actually ReSharper is causing more problems, because it loves adding using statements I don't want.)
Are there tools or techniques I can use to simplify my task? I am allowed to purchase more dev tools, so commercial tools are an option for me.

I know it is probably not ideal, but you could use DXCore from DevExpress to write a plugin similar to these:
http://code.google.com/p/dxcorecommunityplugins/

Fine-grained visibility for 'internal' members

Recently I was reading about partitioning code with .NET assemblies and stumbled upon a nice suggestion from this post: "reduce the number of your .NET assemblies to the strict minimum".
I couldn't agree more! And one of the reasons I personally see most often, is that people just want to isolate some piece of code, so they make the types/methods internal and put them into a separate project.
There are many other reasons (valid and not) for splitting code into several assemblies, but if you want to isolate components/APIs while still having them located in one library, how can you do that?
namespace MyAssembly.SomeApiInternals
{
//Methods from this class should not
//be used outside MyAssembly.SomeApiInternals
internal class Foo
{
internal void Boo() { }
}
}
namespace MyAssembly.AnotherPart
{
public class Program
{
public void Test()
{
var foo = new MyAssembly.SomeApiInternals.Foo();
foo.Boo(); //Ok, not a compiler error but some red flag at least
}
}
}
How can one restrict a type/method from being used by other types/methods in the same assembly but outside this very namespace?
(I'm going to give a few answers myself and see how people would vote.)
Thanks!

You could put the code in different assemblies, then merge the assemblies with ILMerge in a post-build step...

Use NDepend, put in CQL rules that embody what you want, and run them as part of your build. The language isn't interested in this level of restriction. (I haven't followed your link yet. Are you really trying to do this without NDepend? Your question should rule it in or out.)

How do you use #define?

I'm wondering about instances when it makes sense to use #define and #if statements. I've known about them for a while, but never incorporated them into my way of coding. How exactly do they affect the compilation?
Is #define the only thing that determines if the code is included when compiled? If I have #define DEBUGme as a custom symbol, the only way to exclude it from compile is to remove this #define statement?

In C# #define macros, like some of Bernard's examples, are not allowed. The only common use of #define/#ifs in C# is for adding optional debug only code. For example:
static void Main(string[] args)
{
#if DEBUG
//this only compiles if in DEBUG
Console.WriteLine("DEBUG");
#endif
#if !DEBUG
//this only compiles if not in DEBUG
Console.WriteLine("RELEASE");
#endif
//This always compiles
Console.ReadLine();
}

#define is used to define compile-time constants that you can use with #if to include or exclude bits of code.
#define USEFOREACH
#if USEFOREACH
foreach(var item in items)
{
#else
for(int i=0; i < items.Length; ++i)
{ var item = items[i]; //take item
#endif
doSomethingWithItem(item);
}

"Is #define the only thing that determines if the code is included when compiled? If I have #define DEBUGme as a custom symbol, the only way to exclude it from compile is to remove this #define statement?"
You can undefine symbols as well
#if defined(DEBUG)
#undef DEBUG
#endif
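For C# specifically, a minimal sketch of the same idea might look like the following. FEATURE_X and FEATURE_Y are hypothetical symbols and SymbolDemo is a made-up class. Note that in C# both #define and #undef have to appear before any other code in the file, and that a symbol can also come from the project settings rather than from a #define line, so #undef is one way to switch it off for a single file.

```csharp
#define FEATURE_X     // hypothetical symbol, defined for this file only
#undef FEATURE_Y      // hypothetical symbol, switched off here even if the project defines it

using System;

// Hypothetical demo class, used only to show which branch gets compiled.
public static class SymbolDemo
{
    public static string ActiveBranch()
    {
#if FEATURE_X && !FEATURE_Y
        // Compiled: FEATURE_X is defined above and FEATURE_Y is undefined.
        return "FEATURE_X branch";
#else
        // Not compiled at all in this file.
        return "fallback branch";
#endif
    }
}
```

Deleting the #define FEATURE_X line (assuming the project does not define the symbol elsewhere) would flip the method to the fallback branch without touching the method body.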

Well, defines are used often for compile time constants and macros. This can make your code a bit faster as there are really no function calls, the output values of the macros are determined at compile time. The #if's are very useful. The most simple example that I can think of is checking for a debug build to add in some extra logging or messaging, maybe even some debugging functions. You can also check different environment variables this way.
Others with more C/C++ experience can add more I am sure.

I often find myself defining some things that are done repetitively in certain functions. That makes the code much shorter and thus allows a better overview.
But as always, try to find a good measure to not create a new language out of it. Might be a little hard to read for the occasional maintenance later on.

It's for conditional compilation, so you can include or remove bits of code based upon project attributes which tend to be:
Intended platform (Windows/Linux/XB360/PS3/iPhone, etc.)
Release or Debug (Generally logging, asserts etc are only included in a debug build)
They can also be used to disable large parts of a system quickly,
for example, during development of a game, I might define
#define PLAYSOUNDS
and then wrap the final call to play a sound in:
#ifdef PLAYSOUNDS
// Do lots of funk to play a sound
return true;
#else
return true;
#endif
So it's very easy for me to turn on and off the playing of sounds for a build. (Typically I don't play sounds when debugging because it gets in the way of my personal music :) )
The benefit is that you're not introducing a branch through adding an if statement....

@Ed: When using C++, there is rarely any benefit to using #define over inline functions when creating macros. The idea of "greater speed" is a misconception. With inline functions you get the same speed, but you also get type safety, and none of the side effects of preprocessor "pasting", because parameters are evaluated before the function is called (for an example, try writing the ubiquitous MAX macro and calling it like this: MAX(x++, y). You'll see what I'm getting at).
I have never had to use #define in my C#, and I very rarely use it for anything other than platform and compiler version checking for conditional compilation in C++.

Perhaps the most common use of #define in C# is to differentiate between debug/release builds and between platforms (for example Windows and Xbox 360 in the XNA framework).
