C#: Creating a Methodheader Parser

I would like to write a parser that tells me which part of a string is a method header. What is the best way to do this in C#?
The language grammar specification can be found here. I don't think it is proper BNF/EBNF, so perhaps there is a way to transform it (for example, an HTML parser that converts it into proper BNF).
Should I use regular expressions or a custom-built parser? I am restricted in that I need to build it myself without the help of outside tools.

I found the NRefactory library, part of the open-source SharpDevelop tool, to be very good at parsing C# modules into an abstract syntax tree. Once you have that you can scan through very easily to find the method headers, the locations, and so on.
Though its primary use is for within SharpDevelop (A GUI tool), it is a standalone DLL, and it can be used within any .NET app. The documentation isn't very thorough, as far as I could tell, but Reflector let me examine it and figure things out pretty easily.
Some code:
internal static string CreateAstSexpression(string filename)
{
    using (var fs = File.OpenRead(filename))
    {
        using (var parser = ParserFactory.CreateParser(SupportedLanguage.CSharp,
                                                       new StreamReader(fs)))
        {
            parser.Parse();
            // RetrieveSpecials() returns an IList<ISpecial>
            // parser.Lexer.SpecialTracker.RetrieveSpecials()...
            // "specials" == comments, preprocessor directives, etc.
            // parser.CompilationUnit retrieves the root node of the result AST
            return SexpressionGenerator.Generate(parser.CompilationUnit).ToString();
        }
    }
}
The ParserFactory class is part of NRefactory.
In my case I wanted a lisp s-expression describing the C# buffer, so I wrote an S-expression generator that walked through the "CompilationUnit". It's just a tree of nodes, starting with namespace, then class/struct/enum. Within the class/struct node, there are method nodes (as well as field, property, etc).
If that finished DLL is not of interest, then maybe this is.
Before finding and embracing NRefactory, I tried to produce a wisent grammar for C#. This was for use within Emacs, which has a wisent engine.
I never could get it to work properly.
Maybe it's of use to you.
You said that you didn't want to use "outside tools". I'm not sure of the motivation for that restriction; if it is homework, then I guess it makes sense, but for other purposes it really would be a shame not to use the well-tested and well-understood tools that are already out there.
If you take either of the suggestions I've made here, you're building on something that is an outside tool. But some of the options are a little better than others.

Related

How to get various types used in a C# project with Roslyn

I am evaluating the conversion of an existing C# library to Java. To start with, I need to know which types and built-in keywords the existing C# library uses. For example:
public class CSharpClass
{
    int i;
    float j;
    Console.WriteLine(String.Concat("A", "B"));
}
In this class the types/Keywords used are,
public
int
float
Console
String
My questions are:
Is there any way to do this? I hope I can do it with Roslyn. We can get a LocalDeclarationStatementSyntax for variables, but how do I pick up "Console" and "Concat"? Is a syntax walker that visits every token in a class the only option?
Also, how do I get all the classes from a project file with Roslyn?

You need semantics as well: syntax is just the text you see, and that's exactly what you get, nothing more, nothing less. Get a Compilation for your project, then call GetSemanticModel, passing it a tree, and from there call GetTypeInfo or GetSymbolInfo (as appropriate; search online for the difference between these two) to get type information.
As far as getting the Compilation, if you're writing a command line tool you probably want to use MSBuildWorkspace to load your project. If you're analyzing the projects open in Visual Studio, use VisualStudioWorkspace, etc.
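As a rough sketch of that flow (Compilation, then GetSemanticModel, then GetSymbolInfo), assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced and the code runs on .NET Core or .NET 5+. The names TypeUsageLister and ListReferencedTypes are made up for illustration, and the compilation is built by hand from a string here; for a real project, MSBuildWorkspace would supply it instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of Roslyn.
public static class TypeUsageLister
{
    // Returns the distinct named types that type/name syntax in the source binds to.
    public static SortedSet<string> ListReferencedTypes(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // Assumption: .NET Core / .NET 5+, where this value lists the framework assemblies.
        // On .NET Framework it is null and references must be supplied another way.
        var trusted = (string)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES");
        var references = trusted.Split(Path.PathSeparator)
                                .Select(path => MetadataReference.CreateFromFile(path));

        CSharpCompilation compilation =
            CSharpCompilation.Create("Analysis", new[] { tree }, references);
        SemanticModel model = compilation.GetSemanticModel(tree);

        var types = new SortedSet<string>();
        // TypeSyntax covers predefined types (int, float) and plain names (Console, String).
        foreach (TypeSyntax node in tree.GetRoot().DescendantNodes().OfType<TypeSyntax>())
        {
            // Names that bind to methods or namespaces are skipped by the type test.
            if (model.GetSymbolInfo(node).Symbol is INamedTypeSymbol type)
                types.Add(type.ToDisplayString());
        }
        return types;
    }
}
```

Modifier keywords such as public are tokens rather than symbols, so they would still have to be collected from the syntax tree itself (for example with DescendantTokens).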

Basic implementation of AOP like attribute using standard .NET Framework [duplicate]

Possible Duplicate:
C# wrap method via attributes
I'd like to achieve such functionality:
[Atomic]
public void Foo()
{
    /* foo logic */
}
where [Atomic] is an attribute that wraps the function logic in a transaction scope:
using (var scope = new TransactionScope())
{
    /* foo logic */
    scope.Complete();
}
How to write such an attribute?
I've asked basically the same question before, and I know this can be done using AOP, but I didn't mention that I'm looking for the simplest proof-of-concept implementation, or helpful articles, that would let me write this using the pure .NET Framework (I suppose using the RealProxy and MarshalByRefObject types, which I read about while browsing related questions).
I need to solve exactly this shown example. It seems like a basic thing so I want to learn how to do it starting from scratch. It doesn't need to be safe and flexible for now.

It seems like a basic thing...
It's one of the (many) things where the concept is simple to understand, but the implementation is not simple at all.
As per Oded's answer, Attributes in .NET don't do anything. They only exist so that other code (or developers) can look at them later on. Think of it as a fancy comment.
With that in mind, you can write your attribute like this
public class AtomicAttribute : Attribute { }
Now the hard part: you have to write some code to scan for that attribute and change the behaviour of the code.
Given that C# is a compiled language, and given the rules of the .NET CLR, there are theoretically 3 ways to do this:
1. Hook into the C# compiler, and make it output different code when it sees that attribute.
This seems like it would be nice, but it is simply not possible right now. Perhaps the Roslyn project might allow this in future, but for now, you can't do it.
2. Write something which will scan the .NET assembly after the C# compiler has converted it to MSIL, and change the MSIL.
This is basically what PostSharp does. Scanning and rewriting MSIL is hard. There are libraries such as Mono.Cecil which can help, but it's still a hugely difficult problem. It may also interfere with the debugger, etc.
3. Use the .NET Profiling APIs to monitor the program while it is running, and every time you see a function call with that attribute, redirect it to some other wrapper function.
This is perhaps the simplest option (although it's still very difficult), but the drawback is that your program now must be run under the profiler. This may be fine on your development PC, but it will cause a huge problem if you try to deploy it. Also, there is likely to be a large performance hit using this approach.
In my opinion, your best bet is to create a wrapper function which sets up the transaction, and then pass it a lambda which does the actual work. Like this:
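using System;
using System.Transactions;

public static class Ext
{
    public static void Atomic(Action action)
    {
        using (var scope = new TransactionScope())
        {
            action();
            // Complete() marks the transaction as successful; it commits when the scope is disposed.
            scope.Complete();
        }
    }
}
.....
using static Ext; // C# 6 (VS2015) or later; goes with the other usings at the top of the file
public void Foo()
{
    Atomic(() =>
    {
        // foo logic
    });
}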
The fancy computer science term for this is higher-order programming.

Attributes are metadata - that's all they are.
There are many tools that can take advantage of such metadata, but such tooling needs to be aware of the attribute.
AOP tools like PostSharp read such metadata in order to know what aspects to weave into the code, and where.
In short - just writing an AtomicAttribute will give you nothing - you will need to pass the compiled assembly through a tool that knows about this attribute and does "something" to it in order to achieve AOP.

It is not a basic thing at all. No extra code is run just because a method has an attribute, so there is nowhere to put your TransactionScope code.
What you would need to do is at application start-up use reflection to iterate over every method on every class in your assembly and find the methods that are marked with AtomicAttribute, then write a custom proxy around that object. Then somehow get everything else to call your proxy instead of the real implementation, perhaps using a dependency injection framework.
Most AOP frameworks do this at build time. PostSharp, for example, runs after Visual Studio builds your assembly. It scans your assembly and rewrites the IL code to include the proxies and AOP interceptors. This way the assembly is all set to go when it is run, but the IL has changed from what you originally wrote.
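The scanning half of the start-up approach is the easy part, and can be sketched with plain reflection. This is only an illustration: SampleService and AtomicScanner are made-up names, and finding the methods still leaves the harder problem of routing callers through a proxy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
public sealed class AtomicAttribute : Attribute { }

// Hypothetical class used only to give the scanner something to find.
public class SampleService
{
    [Atomic]
    public void Foo() { }

    public void Bar() { }
}

public static class AtomicScanner
{
    // Returns every method declared in the assembly that carries [Atomic].
    public static List<MethodInfo> FindAtomicMethods(Assembly assembly)
    {
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static |
            BindingFlags.DeclaredOnly;

        return assembly.GetTypes()
            .SelectMany(type => type.GetMethods(flags))
            .Where(method => method.IsDefined(typeof(AtomicAttribute), inherit: false))
            .ToList();
    }
}
```

Calling AtomicScanner.FindAtomicMethods(typeof(SampleService).Assembly) once at start-up and caching the result avoids repeating the reflection on every call.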

Maybe resolve all objects using IoC container?
You could configure interceptors for your types, and in them check whether the called method is decorated with that attribute. You could cache that information so that you don't have to use reflection on every method call.
So when you do this:
var something = IoC.Resolve<ISomething>();
something is not the object you implemented but a proxy. In that proxy you can do whatever you want before and after the method call.
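To show what such an interceptor might look like without any container, here is a minimal sketch built on System.Reflection.DispatchProxy. That class ships with .NET Core and later; on the classic .NET Framework it comes from a NuGet package of the same name, and RealProxy would be the closest built-in equivalent. ISomething, Something and AtomicProxy are illustrative names, and the sketch assumes [Atomic] is placed on the interface method, because that is the MethodInfo the proxy receives.

```csharp
using System;
using System.Reflection;
using System.Transactions;

[AttributeUsage(AttributeTargets.Method)]
public sealed class AtomicAttribute : Attribute { }

public interface ISomething
{
    [Atomic]
    void Foo();

    void Bar();
}

// Hypothetical implementation that records whether it ran inside a transaction.
public class Something : ISomething
{
    public bool FooSawTransaction;
    public bool BarSawTransaction;

    public void Foo() => FooSawTransaction = Transaction.Current != null;
    public void Bar() => BarSawTransaction = Transaction.Current != null;
}

public class AtomicProxy<T> : DispatchProxy where T : class
{
    private T _target;

    // Wraps an existing object; T must be an interface for DispatchProxy to work.
    public static T Wrap(T target)
    {
        T proxy = Create<T, AtomicProxy<T>>();
        ((AtomicProxy<T>)(object)proxy)._target = target;
        return proxy;
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        // Methods without [Atomic] are forwarded untouched.
        if (!targetMethod.IsDefined(typeof(AtomicAttribute), false))
            return targetMethod.Invoke(_target, args);

        using (var scope = new TransactionScope())
        {
            // If the target throws, Complete() is skipped and the scope rolls back.
            object result = targetMethod.Invoke(_target, args);
            scope.Complete();
            return result;
        }
    }
}
```

A container would hand out AtomicProxy<ISomething>.Wrap(new Something()) in place of the real object. Note that exceptions from the target arrive wrapped in TargetInvocationException because of the reflective call; a fuller version would unwrap them and cache the attribute lookup.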

Porting a very Pythonesque library over to .NET

I'm investigating the possibility of porting the Python library Beautiful Soup over to .NET, mainly because I really love the parser and there are simply no good HTML parsers for the .NET Framework (Html Agility Pack is outdated, buggy, undocumented and doesn't work well unless the exact schema is known).
One of my primary goals is to get the basic DOM selection functionality to really parallel the beauty and simplicity of BeautifulSoup, allowing developers to easily craft expressions to find elements they're looking for.
BeautifulSoup takes advantage of loose-binding and named parameters to make this happen. For example, to find all a tags with an id of test and a title that contains the word foo, I could do:
soup.find_all('a', id='test', title=re.compile('foo'))
However, C# has no concept of an arbitrary number of named arguments. C# 4 has named parameters, but they have to match an existing method prototype.
My Question: What is the C# design pattern that most parallels this Pythonic construct?
Some Ideas:
I'd like to approach this based on how I, as a developer, would like to code. Implementing this is out of the scope of this post. One idea I had would be to use anonymous types. Something like:
soup.FindAll("a", new { Id = "Test", Title = new Regex("foo") });
Though this syntax loosely matches the Python implementation, it still has some disadvantages.
The FindAll implementation would have to use reflection to parse the anonymous type, and handle any arbitrary metadata in a reasonable manner.
The FindAll prototype would need to take an Object, which makes it fairly unclear how to use the method unless you're well familiar with the documented behavior. I don't believe there's a way to declare a method that must take an anonymous type.
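For what it's worth, the reflection part of the anonymous-type idea is small. The following is a sketch under assumptions: AttributeFilter, ToCriteria and Matches are made-up names, an element's attributes are represented as a plain string dictionary with lower-case keys, and a criterion value is either a Regex or something compared by its string form.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical helper that a FindAll(string tag, object criteria) method could call.
public static class AttributeFilter
{
    // Turns new { Id = "Test", Title = new Regex("foo") } into name/value pairs.
    public static Dictionary<string, object> ToCriteria(object criteria)
    {
        var result = new Dictionary<string, object>();
        if (criteria == null)
            return result;

        foreach (PropertyInfo property in
                 criteria.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Assumption: HTML attribute names are stored in lower case.
            result[property.Name.ToLowerInvariant()] = property.GetValue(criteria);
        }
        return result;
    }

    // True when every criterion is satisfied by the element's attributes.
    public static bool Matches(IDictionary<string, string> attributes,
                               Dictionary<string, object> criteria)
    {
        foreach (KeyValuePair<string, object> criterion in criteria)
        {
            if (!attributes.TryGetValue(criterion.Key, out string actual))
                return false;

            if (criterion.Value is Regex pattern)
            {
                if (!pattern.IsMatch(actual))
                    return false;
            }
            else if (!string.Equals(actual, Convert.ToString(criterion.Value), StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }
}
```

The object-typed parameter remains the weak point: nothing stops a caller from passing a value whose properties make no sense as attribute filters, so that case has to be handled at run time.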
Another idea I had is perhaps a more .NET way of handling this but strays further away from the library's Python roots. That would be to use a fluent pattern. Something like:
soup.FindAll("a")
    .Attr("id", "Test")
    .Attr("title", new Regex("foo"));
This would require building an expression tree and locating the appropriate nodes in the DOM.
The third and last idea I have would be to use LINQ. Something like:
var nodes = (from n in soup
             where n.Tag == "a" &&
                   n["id"] == "Test" &&
                   Regex.Match(n["title"], "foo").Success
             select n);
I'd appreciate any insight from anyone with experience porting Python code to C#, or just overall recommendations on the best way to handle this situation.

Have you tried running your code inside the IronPython engine? As far as I know it performs really well, and you don't have to touch your Python code.

Automatically marking c# classes for serialization

I want to build a visual studio plugin that automatically annotates classes for serialization. For example for the built in binary serializer I could just add [Serializable] to the class declaration, for WCF it could add [DataContract] to the class and [DataMember] to the members and properties (I could get [KnownType] information through reflection and annotate where appropriate). If using protocol buffers it could add [ProtoContract], [ProtoMember] and [ProtoInclude] attributes and so on.
I am assuming that the classes we are going to use this on are safe to serialize (so no sockets or other nonserializable stuff in there). What I want to know is the easiest way to take an existing piece of code (or a binary, if that's easier) and add those attributes while leaving the rest of the code intact. I am fine with the output being source code or binary.
The idea that comes to mind is to use a C# parser: parse everything, find the interesting code elements, annotate them and write the code back. However, that seems very complex given the relatively small number of modifications I want to make to the code. Is there an easier way to do so?

Visual Studio already has an API for discovering and emitting code which you might take a look at. It's not exactly a joy to use but could work for this purpose.

While such a plugin would certainly be a useful thing, I would consider rather making an add-in for a tool like ReSharper instead of VS directly. The advantage is somebody already solved the huge pile of problems you haven't even dreamed of yet and so it will be a lot easier to build such a specific functionality.

It looks to me like you need an MSBuild task similar to this one: http://kindofmagic.codeplex.com/. Is that about right?

Templates in C#

I know generics are in C# to fulfill a role similar to C++ templates but I really need a way to generate some code at compile time - in this particular situation it would be very easy to solve the problem with C++ templates.
Does anyone know of any alternatives? Perhaps a VS plug-in that preprocesses the code or something like that? It doesn't need to be very sophisticated, I just need to generate some methods at compile time.
Here's a very simplified example in C++ (note that this method would be called inside a tight loop with various conditions instead of just "Advanced" and those conditions would change only once per frame - using if's would be too slow and writing all the alternative methods by hand would be impossible to maintain). Also note that performance is very important and that's why I need this to be generated at compile time.
template <bool Advanced>
int TraceRay( Ray r )
{
    do
    {
        if ( WalkAndTestCollision( r ) )
        {
            if ( Advanced )
                return AdvancedShade( collision );
            else
                return SimpleShade( collision );
        }
    }
    while ( InsideScene( r ) );
}

You can use T4.
EDIT: In your example, you can use a simple bool parameter.

Not really, as far as I know. You can do this type of thing at runtime, of course; a few meta-programming options, none of them trivial:
reflection (the simplest option if you don't need "fastest possible")
CSharpCodeProvider and some code-generation
the same with CodeDom
ILGenerator if you want hardcore

Generics do work as templates, if that is the case.
There is also a way to create code at runtime.
Check this CodeProject example:
CodeProject

In addition to Marc's excellent suggestions, you may want to have a look at PostSharp.

I've done some meta-programming-style tricks using static generics that use reflection (and now I'm using dynamic code generation with System.Linq.Expressions, as well as having used ILGenerator for some more insane stuff). See http://www.lordzoltan.org/blog/post/Pseudo-Template-Meta-Programming-in-C-Sharp-Part-2.aspx for an example I put together (sorry about the lack of code formatting - it's a very old post!) that might be of use.
I've also used T4 (link goes to a series of tutorials by my favourite authority on T4 - Oleg Sych), as suggested by SLaks - which is a really nice way to generate code, especially if you're also comfortable with Asp.Net-style syntax. If you generate partial classes using the T4 output, then the developer can then embellish and add to the class however they see fit.
If it absolutely has to be compile-time - then I'd go for T4 (or write your own custom tool, but that's a bit heavy). If not, then a static generic could help, probably in partnership with the kind of solutions mentioned by Marc.
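One generics-based variant that comes close to the C++ template in the question is to pass the policy as a struct type argument. This is not necessarily the trick the linked post uses; it is a sketch of a related, well-known technique. The CLR compiles a separate native body for each value-type argument, so the call to Shade is a direct call that the JIT has a chance to inline, with no branch on a flag. IShader, SimpleShader, AdvancedShader and Tracer are illustrative names, and the "scene" is reduced to an array of integer samples.

```csharp
using System;

public interface IShader
{
    int Shade(int collision);
}

// Placeholder shading policies; real ones would do actual shading work.
public struct SimpleShader : IShader
{
    public int Shade(int collision) => collision + 1;
}

public struct AdvancedShader : IShader
{
    public int Shade(int collision) => collision * 10;
}

public static class Tracer
{
    // TShader plays the role of the C++ template parameter.
    // Placeholder scene: the first positive sample counts as a collision.
    public static int TraceRay<TShader>(int[] samples) where TShader : struct, IShader
    {
        TShader shader = default(TShader);
        foreach (int sample in samples)
        {
            if (sample > 0)
                return shader.Shade(sample);
        }
        return 0; // the ray left the scene without hitting anything
    }
}
```

The once-per-frame decision then selects between Tracer.TraceRay<SimpleShader> and Tracer.TraceRay<AdvancedShader> outside the tight loop. With several independent conditions the number of instantiations multiplies, at which point T4 may be the more maintainable route.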

If you want true code generation, you could use CodeSmith http://www.codesmithtools.com which isn't free/included like T4, but has some more features, and can function as a VS.NET plug-in.

Here's an older article that uses genetic programming to generate and compile code on the fly:
http://msdn.microsoft.com/en-us/magazine/cc163934.aspx
"The Generator class is the kernel of the genetic programming application. It discovers available base class terminals and functions. It generates, compiles, and executes C# code to search for a good solution to the problem it is given. The constructor is passed a System.Type which is the root class for .NET reflection operations."
Might be overkill for your situation, but does show what C# can do. (Note this article is from the 1.0 days)
