Automatically extract classes from a generated file

I have a generated file containing multiple classes, and I want to split it into multiple files, each containing just one class.
The code is in C#.
Is there a program that can do this (preferably with source code available)? Is there a simple regex that can extract the classes/interfaces?

I don't believe regex is the right strategy for parsing C# code. It could probably work in some simple cases, but sooner or later you will hit a situation that trips it up. Think, for example, of an unbalanced '{' inside a comment.
I suggest you look at this other SO question, Parser for C#, on how to parse C# code.

If that's a one-off, do the least thing that (quickly) works. So if it is just one or a few files and you're not after a very generic solution, I would identify all the headers (public class Foo and the like), for example with the help of Notepad++ and a recorded macro, manually correct the results, and then write a little C# program to split the file at the places where these headers are.
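A minimal sketch of the header-splitting step, written in Python rather than C# to keep it short. The header pattern and the sample input are assumptions made for illustration: the pattern ignores nested types, attributes, and headers that appear inside comments or strings, and everything before the first header (using directives, a namespace line) is dropped and would need to be re-added by hand.

```python
import re

# Matches a type header such as "public class Foo" or "internal interface IBar".
# Hypothetical pattern: good enough for a one-off, not a C# parser.
HEADER = re.compile(
    r"^\s*(?:(?:public|internal|private|protected|static|sealed|abstract|partial)\s+)*"
    r"(?:class|interface|struct|enum)\s+(\w+)"
)


def split_at_headers(source):
    """Return {type name: text from its header up to the next header}."""
    chunks = {}
    current = None  # lines before the first header are not assigned to any type
    for line in source.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in chunks.items()}


sample = """using System;

public class Foo
{
    int x;
}

internal interface IBar
{
    void Run();
}
"""

for name, text in split_at_headers(sample).items():
    # A real tool would write each chunk to f"{name}.cs"; printing keeps the
    # sketch free of side effects.
    print(f"--- {name}.cs ---")
    print(text, end="")
```

Because the split happens on header lines only, a stray brace inside a comment cannot confuse it, but any header-like line inside a comment can, which is why the manual correction step above still matters.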

I think you should try Visual Studio macro programming.

Related

Is there a "Data Conversion Object" principle/pattern?

I recently asked a question regarding String Object with fixed length C#.
(Please read that question first.)
Some of the answers pointed out that my design might be flawed.
The last question was about strings with a fixed length; this one is about the underlying principle. This question might be a little long, so please bear with me.
Requirements:
I have a plain text file containing values with a specified fixed length. The standard for these text files dates from the 90's. I have to create such a file.
A File may contain 1-60 Rows.
There are 10 different types of Rows.
A Row has between 10-40 values.
A Row is specified like this:
Back in the 90's there was an application which created those files and placed them on a server. The server then read each file and did something with it, like writing it to the database or informing somebody that something went wrong.
This application isn't usable anymore due to recent legal changes.
Suggested design
The new application that takes its place doesn't provide any data in the form of an export, but it has a database with the values inside. I have the responsibility to write a converter, so I have to get the data and write an exported text file. The data is only sent and never received!
Question
Since a DTO's only purpose is to transfer state, and it should have no behavior (POCO vs DTO):
Is there something like a "Data Conversion Object" whose purpose is to convert data which is transferred? Is there a design pattern which is applicable?

I recently designed a solution for a similar problem, though my solution was in the SAS language, which is not object-oriented. But to me it seems that the problem is pretty much the same. Now, let's dissect the problem:
The problem:
There are some plain text files.
These files have a specification describing the layout, fields, types, etc.
These files need to be converted to some other format.
Solution (Object-Oriented):
I would define three classes, PlainTextFile, Specification and Output, plus a Reader class.
Specification: The constructor takes a specification (probably stored in a file or similar) and parses it into a Specification object.
PlainTextFile: This can be a handle to a text file, or a wrapper around the handle if some other feature is added to it. I prefer the second option.
Output: This is the output you would like to produce.
Reader: It takes two inputs, PlainTextFile and Specification. Uses Specification to read and parse the PlainTextFile and write the output in the Output object/format.
Now, the output can be the final step or not. I suggest that the Reader do only this much. If you want to write the output to a database, or send it somewhere, create another class to do this.
Remember, I don't know the name of this pattern. Actually, I don't think that matters much. For me, this method solved a problem that had existed in the company for a decade, and it integrated two of the most used systems there.
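Since the question is about producing the file rather than reading it, here is a minimal Python sketch that turns the same idea around: a specification object drives a writer. FieldSpec, format_row and the CUSTOMER_ROW layout are hypothetical names invented for illustration; the real field names and widths would come from the 90's standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str          # key looked up in the record
    length: int        # fixed width of the field in characters
    align: str = "<"   # "<" pads on the right, ">" pads on the left
    fill: str = " "    # padding character


def format_row(spec, record):
    """Render one record as a fixed-length line according to spec."""
    parts = []
    for field in spec:
        text = str(record.get(field.name, ""))
        if len(text) > field.length:
            # Refuse to truncate silently; the caller decides what to do.
            raise ValueError(f"{field.name!r} exceeds {field.length} characters")
        parts.append(f"{text:{field.fill}{field.align}{field.length}}")
    return "".join(parts)


# Hypothetical row type with three fields, 20 characters in total.
CUSTOMER_ROW = [
    FieldSpec("row_type", 2),
    FieldSpec("name", 10),
    FieldSpec("amount", 8, align=">", fill="0"),
]

line = format_row(CUSTOMER_ROW, {"row_type": "01", "name": "Smith", "amount": 1250})
print(repr(line))  # '01Smith     00001250'
```

Keeping the layout in data rather than in code means each of the ten row types becomes one more list of FieldSpec entries, and the formatting logic stays in a single place.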

Manipulating a Python file from C#

I'm working on some tools for a game I'm making. The tools serve as a front end that makes editing game files easier. Several of the files are Python scripting files. For instance, I have an Items.py file that contains the following (minimized for the example):
from ItemModule import *
import copy

class ScriptedItem(Item):
    def __init__(self, name, description, itemtypes, primarytype, flags, usability, value, throwpower):
        Item.__init__(self, name, description, itemtypes, primarytype, flags, usability, value, throwpower, Item.GetNextItemID())

    def Clone(self):
        return copy.deepcopy(self)

ItemLibrary.AddItem(ScriptedItem("Abounding Crystal", "A colourful crystal composed of many smaller crystals. It gives off a warm glow.", ItemType.SynthesisMaterial, ItemType.SynthesisMaterial, 0, ItemUsage.Unusable, 0, 50))
As I mentioned, I want to provide a front end for editing this file without requiring an editor to know Python or edit the file directly. My editor needs to be able to:
Find and list all the class types (in this example, it'd be only ScriptedItem).
Find and list all created items (in this case there'd only be one, Abounding Crystal). I'd need to find the type (in this case ScriptedItem) and all the parameter values.
Allow editing of parameters and the creation/removal of items.
To do this, I started writing my own parser, looking for the class keyword and for the places where these recorded classes are used to construct objects. This worked for simple data, but when I started using classes with complex constructors (lists, maps, etc.) it became increasingly difficult to parse correctly.
After searching around, I found that IronPython made it easy to parse Python files, so that's what I went about doing. Once I built the Abstract Syntax Tree I used PythonWalkers to identify and find all the information I need. This works perfectly for reading in data, but I don't see an easy way to push updated data into the Python file. As far as I can tell, there's no way to change the values in the AST, much less to convert the AST back into a script file. If I'm wrong, I'd love for someone to tell me how I could do this. What I'd need to do now is search through the file until I find the correct line, then try to push the data into the constructor, ensuring correct ordering.
Is there some obvious solution I'm not seeing? Should I just keep working on my parser and make it support more complex data types? I really thought I had it with the IronPython parser, but I didn't think about how tricky it'd be to push modified data back into the file.
Any suggestions would be appreciated.

You want a source-to-source program transformation tool.
Such a tool parses a language to an internal data structure (invariably an AST), allows you to modify the AST, and then can regenerate source text from the modified AST without changing essentially anything about the source except where the AST changes were made.
Such a program transformation tool has to parse text to ASTs, and "anti-parse" (called "Prettyprint") ASTs to text. If IronPython has a prettyprinter, that's what you need.
If it doesn't, you can build one with some (maybe a lot of) effort; as you've observed, this isn't as easy as one might think. See my answer Compiling an AST back to source code.
If that doesn't work, our DMS Software Reengineering Toolkit with its Python front end might do the trick. It has all the above properties.
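The parse, modify, regenerate cycle described above can be sketched with CPython's standard ast module. This is an illustration under stated assumptions, not the IronPython API from the question: it needs Python 3.9 or later for ast.unparse, the SOURCE string is a shortened stand-in for Items.py, and ast.unparse discards comments and the original formatting, so it is weaker than the fidelity-preserving prettyprinter the answer calls for.

```python
import ast

# Shortened stand-in for Items.py (the constructor takes fewer arguments here).
SOURCE = '''
class ScriptedItem(Item):
    def Clone(self):
        return copy.deepcopy(self)

ItemLibrary.AddItem(ScriptedItem("Abounding Crystal", ItemType.SynthesisMaterial, 0, 50))
'''

tree = ast.parse(SOURCE)

# 1. List the class definitions.
class_names = [node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]


def is_add_item(node):
    """True for calls of the form ItemLibrary.AddItem(<constructor call>)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "AddItem"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "ItemLibrary"
        and len(node.args) == 1
        and isinstance(node.args[0], ast.Call)
    )


# 2. Collect the constructor calls passed to ItemLibrary.AddItem, with the
#    source text of each argument for display in an editor.
items = [node.args[0] for node in ast.walk(tree) if is_add_item(node)]
listing = [(ctor.func.id, [ast.unparse(arg) for arg in ctor.args]) for ctor in items]

# 3. Edit a parameter in place: replace the last positional argument.
for ctor in items:
    ctor.args[-1] = ast.Constant(value=75)

# 4. Regenerate source text from the modified tree.
new_source = ast.unparse(ast.fix_missing_locations(tree))

print(class_names)
print(listing)
print(new_source)
```

Adding or removing an item amounts to appending to or deleting from tree.body before the final unparse step.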

Provided you can find a complete and up-to-date context-free grammar file for Python, you could use the Coco/R parser generator to generate a Python parser in C#.
You can add production code to the grammar file itself to populate a data structure in your C# app. That data structure can hold all the information you need (methods and their arguments, properties, constructors, destructors, etc.). Once you have this data structure, it's just a matter of designing a front end for the user and representing the data structure in a way that makes it editable for them (this is more of a design task than a complicated programming task).
Finally, iterate through your data structure and write out a .py file.

You can use the Python inspect module to print the source of an object. In your case, that means printing the source of your module, the file you just parsed with IronPython. I haven't checked whether inspect works with IronPython yet, though.
As to adding stuff, well, it's a module, right? You can just add stuff to a module... I'd load the module, alter it, use inspect to print it, and save that to disk.
From your post, it looks like you're already deep in the trenches and having fun, so I'd be really happy to see a post here on how you solved this problem!

To me it sounds more like you are at the point where you shove it all into a sqlite database and start editing it that way. Hooking up some forms to edit tables is simpler for the UI. At that point you generate new python files by dumping your tables out with some formatting to provide the surrounding python scripts.
SVN / Git / whatever can merge the updated changes via the python files.
This is what I ended up doing for my project at any rate. I started using python to hook up the various items using their computed keys and then just added some forms UI to avoid editing mistakes in the python files.
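A minimal sketch of the "dump the tables back out as Python" step, using the standard sqlite3 module. The items table, its columns, and the shortened constructor signature are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema: one row per item, with the class to instantiate.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (class_name TEXT, name TEXT, description TEXT, value INTEGER)"
)
conn.execute(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    ("ScriptedItem", "Abounding Crystal", "It gives off a warm glow.", 50),
)


def dump_items(connection):
    """Emit one ItemLibrary.AddItem(...) line per row, ordered by name."""
    lines = []
    rows = connection.execute(
        "SELECT class_name, name, description, value FROM items ORDER BY name"
    )
    for class_name, *args in rows:
        # repr() yields valid Python literals for the str and int columns.
        rendered = ", ".join(repr(arg) for arg in args)
        lines.append(f"ItemLibrary.AddItem({class_name}({rendered}))")
    return "\n".join(lines) + "\n"


script = dump_items(conn)
print(script, end="")
```

Sorting the rows keeps the generated file stable between runs, which is what lets version control merge the regenerated Python files cleanly.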

How to break up the code in a method body into statement-by-statement symbols/"tokens"

I'm writing something that will examine a function and rewrite that function in another language. So basically, if inside my function F1 I have this line of code, var x=a.b(1), how do I break up the function body into symbols or "tokens"?
I've searched around and thought that System.Reflection.MethodInfo.GetMethodBody would do the trick; however, that class doesn't seem to have the capabilities to do what I want.
What other solutions do we have?
Edit:
Is there any way we can get the "method body" of a method using reflection (as a string or something)?
Edit 2:
Basically, what I'm trying to do is write a program in C#/VB, and when I hit F5 a serializer function will (use reflection and) take the entire program (all the classes in that program) and serialize it into a single JavaScript file. Of course JavaScript doesn't have the .NET library, so the C#/VB program will limit its use of classes to the .js library (which is a library written in C#/VB emulating the framework of JavaScript objects).
The advantage is that I have type safety while coding my JavaScript classes, and many other benefits like overloading and having classes. Since JavaScript doesn't have classes or overloading natively, it relies on hacks to get them. So the serializer function will write the JavaScript based on the C#/VB program input for me (along with all the hacks and possible optimizations).
I'm trying to code this serializer function.

It sounds like you want a parse tree, which Reflection won't give you. Have a look at NRefactory, which is a VB and C# parser.

If you want to do this, the best way would be to parse the C#/VB code with a parser/lexer generated by a tool such as the Gardens Point Parser Generator, flex/bison or ANTLR. Then, at the token level, reassemble it with proper JavaScript grammar. There are a few grammars out there for C# and Java.

See this answer on analyzing and transforming source code
and this one on translating between programming languages.
These assume that you use conventional compiler methods for breaking your text into tokens ("lexing") and grouping related tokens into program structures ("parsing"). If your analysis is anything other than trivial, you'll need all the machinery, or it won't be reliable.
Reflection can only give you what the language designers decided to give you. They invariably don't give you detail inside functions.
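To make the "lexing" step concrete, here is a minimal regex-based tokenizer in Python applied to the line from the question. The token classes and the keyword set are a tiny illustrative subset chosen for this sketch, nowhere near the full C# lexical grammar, and grouping the tokens into statements and expressions (parsing) is a separate step that it does not attempt.

```python
import re

TOKEN = re.compile(
    r"""
    (?P<WHITESPACE>\s+)
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<STRING>"(?:\\.|[^"\\])*")
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<PUNCT>[=.(),;{}+\-*/<>!])
    """,
    re.VERBOSE,
)
KEYWORDS = {"var", "return", "if", "else", "new"}  # tiny illustrative subset


def tokenize(text):
    """Yield (kind, value) pairs; raise on characters the sketch does not know."""
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "WHITESPACE":
            continue  # whitespace separates tokens but is not one itself
        if kind == "IDENT" and match.group() in KEYWORDS:
            kind = "KEYWORD"
        yield kind, match.group()


tokens = list(tokenize("var x=a.b(1)"))
print([value for _, value in tokens])
# ['var', 'x', '=', 'a', '.', 'b', '(', '1', ')']
```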

If you want to go from IL to another language, it may be easier than parsing the source language first. If you want to go this route, consider reading about Microsoft's "Volta" project (IL->JavaScript). While the project is no longer available, there are still old blogs discussing the issues around it.
Note that reflection alone is not enough: reflection gives you a byte array for the body of any particular method (MethodInfo.GetMethodBody().GetILAsByteArray() - http://msdn.microsoft.com/en-us/library/system.reflection.methodbody.aspx) and you have to read it. There are several publicly available "IL reader" libraries.

Is it possible to generate a list of all marked strings in C# at compile time/runtime?

So, I've got a little translation system set up for my app, where we generate a list of all strings that are marked as translatable, dump that in a CSV as the translation template, and then a translator fills out the next column with the translations.
The problem I'm trying to solve is how to extract a bunch of marked strings from a codebase, to automate the translation template generation.
An example line of C# code looks something like:
textBoxName.Text = string.Format(Translate.tr("Create {0}"), NextAutoName());
And C++ would look like:
info_out << tr( L"Grip weights range from {0} to {1}" )(low_weight)(high_weight) << endl;
On the C++ side, building the list of strings for the template generation uses a C++ parser (see my previous question), which runs as part of the external builds over all C++ code in the project. Basically, any string that's placed in a tr() call is automatically extracted.
Is there a better solution with C# than writing another parser? I would love a list of strings that gets produced at compile time, or one that I can access at runtime. A List<string> would be great.
I'd like to keep the same translation file format, because that makes coordinating the two sides much simpler. As you'd expect, we reuse a lot of strings.
Right now it's much more convenient in C++ to keep the translation template up to date: I just need to make sure that the strings I want translated are wrapped in tr(), and the parser handles the rest. In C#, I currently inspect all strings manually and update a dummy function on the C++ side with new strings. I'm getting close to breaking down and just writing another parser. I was hoping that C#, with its high-level features, might be able to do a better job here.

I have a project that does exactly the same thing, only the translation parser is itself written in C# (its name is c3po). In addition to parsing the project, c3po is also responsible for generating the files to send to the translation vendor, as well as the files that the .NET project uses to store the translated strings. We found it has several advantages over the "traditional" .NET resource files:
1) Because c3po maintains its own internal database of localized strings, we can keep track of our own translation memory and make sure to send only new strings to the translators every month. It also removes strings that are no longer needed. This has saved us literally thousands of dollars in translation costs.
2) Developers are free to write whatever string they want, wherever they want, and they don't have to worry about resource files.
3) c3po services several different projects at once, which streamlines our interactions with our translation vendor.
4) We can automate c3po through our CI server, so every time a developer checks in (or once a night, or whatever) we can have it do all of its tasks, including sending files to the translators, picking up new phrases, etc.

I did something similar (I was just pulling out a list of string constants) and I used the same parser for both C# and C++. Though it was more lexical analyser than parser, which works because the two languages have a very similar lexical structure.
Once you've got the list of strings, you can write it to a C# source file and compile it into your program. Your code would generate a file something like this:
namespace MyProject
{
    class MyStrings
    {
        // Generated list of the string constants found in the source.
        public static readonly string[] Strings = {
            "pony",
            "cob",
            "stallion"
        };
    }
}
Then you can include this file in your project to give your code access to a list of its strings.
If you run your tool as a pre-build event it will happen as part of the build.
However, as Hans says, you might do better to look at the built-in localization support.
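The extraction step that would feed such a generated file can be sketched in Python with a lexical scan for tr("...") calls, which works on both the C# and the C++ example lines from the question. The pattern is an assumption made for illustration: it does not handle verbatim (@"...") strings, string concatenation, or tr() calls that appear inside comments.

```python
import re

# A string literal inside tr( ... ), with an optional L prefix for C++ wide
# strings. Escaped quotes inside the literal are allowed.
TR_CALL = re.compile(r'\btr\(\s*L?"((?:\\.|[^"\\])*)"\s*\)')


def extract_translatable(sources):
    """Return the unique tr() strings from the source texts, in first-seen order."""
    seen = {}
    for text in sources:
        for match in TR_CALL.finditer(text):
            seen.setdefault(match.group(1), None)
    return list(seen)


csharp = 'textBoxName.Text = string.Format(Translate.tr("Create {0}"), NextAutoName());'
cpp = 'info_out << tr( L"Grip weights range from {0} to {1}" )(low_weight)(high_weight) << endl;'

# The C# line is passed twice to show that reused strings are listed once.
strings = extract_translatable([csharp, cpp, csharp])
print(strings)
# ['Create {0}', 'Grip weights range from {0} to {1}']
```

Run as a pre-build step, the resulting list could be written both to the CSV translation template and to a generated C# source file like the one shown above.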

Reflect and Load code at design time in Visual Studio

I have an XML file that lists a series of items, and the items are often referenced by their name in code.
Is there any way to use reflection within Visual Studio to make this list 'accessible', sort of like IntelliSense? I've seen it done before, and I'm pretty sure it's ridiculously difficult, but I figure it can't hurt to at least ask.

I would recommend against using reflection for this.
Apart from the added complexity in the code you are also opening the code up to abuse from somebody modifying your XML to get your code to do what they want (think injection attack).
You would be better off parsing the XML as usual but using a big if / switch statement to define how the code runs. That way you have more chance of catching any problems and validating the input.
From string to function call sounds great but will bite you in the bum.

I think he wants to access the XML from C# code with IntelliSense.
My guess is that you would have to build some sort of code generator that generates a C# class with properties for your XML fields, kind of how Visual Studio generates code for resource files.
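A minimal sketch of such a generator, written in Python and emitting C# source text. The XML layout (item elements with a name attribute), the ItemNames class name, and the naming rule in to_identifier are all hypothetical, since the question does not show the real file. Names containing a double quote or backslash would need escaping, which the sketch omits.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML layout: <items><item name="..."/>...</items>.
XML = """<items>
  <item name="Abounding Crystal"/>
  <item name="iron-sword"/>
</items>"""


def to_identifier(name):
    """Turn an item name into a PascalCase C# identifier."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    ident = "".join(word[:1].upper() + word[1:] for word in words)
    # Identifiers may not start with a digit, so prefix those with "_".
    return "_" + ident if ident[:1].isdigit() else ident


def generate_class(xml_text, class_name="ItemNames"):
    """Emit a static C# class with one string constant per item name."""
    root = ET.fromstring(xml_text)
    lines = [f"public static class {class_name}", "{"]
    for item in root.iter("item"):
        name = item.get("name")
        lines.append(f'    public const string {to_identifier(name)} = "{name}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


print(generate_class(XML), end="")
```

Once the generated file is part of the project, code can refer to ItemNames.AboundingCrystal and the IDE will complete the names, without any reflection at run time.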
