Manipulating a Python file from C#

I'm working on some tools for a game I'm making. The tools serve as a front end that makes editing game files easier. Several of the files are Python scripting files. For instance, I have an Items.py file that contains the following (minimized for this example):
from ItemModule import *
import copy

class ScriptedItem(Item):
    def __init__(self, name, description, itemtypes, primarytype, flags, usability, value, throwpower):
        Item.__init__(self, name, description, itemtypes, primarytype, flags, usability, value, throwpower, Item.GetNextItemID())

    def Clone(self):
        return copy.deepcopy(self)

ItemLibrary.AddItem(ScriptedItem("Abounding Crystal", "A colourful crystal composed of many smaller crystals. It gives off a warm glow.", ItemType.SynthesisMaterial, ItemType.SynthesisMaterial, 0, ItemUsage.Unusable, 0, 50))
As I mentioned, I want to provide a front end for editing this file without requiring an editor to know Python or edit the file directly. My editor needs to be able to:
- Find and list all the class types (in this example, it'd be only ScriptedItem).
- Find and list all created items (in this case there'd be only one, Abounding Crystal). I'd need to find the type (in this case ScriptedItem) and all the parameter values.
- Allow editing of parameters and the creation/removal of items.
To do this, I started writing my own parser, looking for the class keyword and for where these recorded classes are used to construct objects. This worked for simple data, but when I started using classes with complex constructors (lists, maps, etc.) it became increasingly difficult to parse correctly.
After searching around, I found that IronPython made it easy to parse Python files, so that's what I went about doing. Once I built the abstract syntax tree, I used PythonWalkers to identify and find all the information I need. This works perfectly for reading in data, but I don't see an easy way to push updated data back into the Python file. As far as I can tell, there's no way to change the values in the AST, much less convert the AST back into a script file. If I'm wrong, I'd love for someone to tell me how I could do this. What I'd need to do now is search through the file until I find the correct line, then try to push the data into the constructor, ensuring correct ordering.
Is there some obvious solution I'm not seeing? Should I just keep working on my parser and make it support more complex data types? I really thought I had it with the IronPython parser, but I didn't think about how tricky it'd be to push modified data back into the file.
Any suggestions would be appreciated.

You want a source-to-source program transformation tool.
Such a tool parses a language to an internal data structure (invariably an AST), allows you to modify the AST, and then can regenerate source text from the modified AST without changing essentially anything about the source except where the AST changes were made.
Such a program transformation tool has to parse text to ASTs, and "anti-parse" (prettyprint) ASTs back to text. If IronPython has a prettyprinter, that's what you need.
If it doesn't, you can build one with some (maybe a lot of) effort; as you've observed, this isn't as easy as one might think. See my answer to "Compiling an AST back to source code".
If that doesn't work, our DMS Software Reengineering Toolkit with its Python front end might do the trick. It has all the above properties.

Provided you can find a complete and up-to-date context-free grammar file for Python, you could use the Coco/R parser generator to generate a Python parser in C#.
You can add production code to the grammar file itself to populate a data structure in your C# app. Said data structure can hold all the information you need (methods and their arguments, properties, constructors, destructors, etc.). Once you have this data structure, it's just a task of designing a front end for the user and representing this data structure in a way that makes it editable to them (this is more of a design task than a complicated programming task).
Finally, iterate through your data structure and write out a .py file.
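The write-back step can be a straightforward dump pass. As a rough sketch (the ItemRecord type and its fields are hypothetical stand-ins for whatever your editor's data structure holds, mirroring the ScriptedItem constructor above):

using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical editor-side record mirroring the ScriptedItem constructor.
class ItemRecord
{
    public string Name, Description;
    public string ItemTypes, PrimaryType;   // e.g. "ItemType.SynthesisMaterial"
    public string Usability;                // e.g. "ItemUsage.Unusable"
    public int Flags, Value, ThrowPower;
}

static class ItemsPyWriter
{
    public static void Write(string path, IEnumerable<ItemRecord> items)
    {
        var sb = new StringBuilder();
        sb.AppendLine("from ItemModule import *");
        sb.AppendLine("import copy");
        // ... emit the (fixed) class definitions here ...
        foreach (var i in items)
        {
            sb.AppendLine(string.Format(
                "ItemLibrary.AddItem(ScriptedItem({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}))",
                Quote(i.Name), Quote(i.Description), i.ItemTypes, i.PrimaryType,
                i.Flags, i.Usability, i.Value, i.ThrowPower));
        }
        File.WriteAllText(path, sb.ToString());
    }

    // Escape backslashes and quotes so the emitted Python string literal stays valid.
    static string Quote(string s)
    {
        return "\"" + s.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
    }
}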

You can use the Python inspect module to print the source of an object; in your case, to print the source of your module, the file you just parsed with IronPython. I haven't checked whether inspect works with IronPython yet, though.
As for adding stuff, well, it's a module, right? You can just add stuff to a module... I'd load the module, alter it, use inspect to print it, and save the result to disk.
From your post, it looks like you're already deep in the trenches and having fun, so I'd be really happy to see a post here on how you solved this problem!

To me it sounds more like you are at the point where you shove it all into a sqlite database and start editing it that way. Hooking up some forms to edit tables is simpler for the UI. At that point you generate new python files by dumping your tables out with some formatting to provide the surrounding python scripts.
SVN / Git / whatever can merge the updated changes via the python files.
This is what I ended up doing for my project at any rate. I started using python to hook up the various items using their computed keys and then just added some forms UI to avoid editing mistakes in the python files.
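For illustration, the read side of that table-backed approach might look like the sketch below, using Microsoft.Data.Sqlite (the database file, schema, and emitted constructor call are all made up; the dump step is the same idea as the writer sketch earlier):

using System;
using Microsoft.Data.Sqlite;

class ItemTableReader
{
    static void Main()
    {
        using (var conn = new SqliteConnection("Data Source=items.db")) // hypothetical database
        {
            conn.Open();
            var cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT name, value FROM items"; // hypothetical schema
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Each row becomes one generated AddItem(...) line;
                    // the remaining constructor arguments are elided here.
                    Console.WriteLine("ItemLibrary.AddItem(ScriptedItem(\"{0}\", ..., {1}))",
                        reader.GetString(0), reader.GetInt64(1));
                }
            }
        }
    }
}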

Related

Is it possible to generate a list of all marked strings in c# at compile time/runtime?

So, I've got a little translation system set up for my app, where we generate a list of all strings that are marked as translatable, dump that in a CSV as the translation template, and then a translator fills out the next column with the translations.
The problem I'm trying to solve is how to extract a bunch of marked strings from a codebase, to automate the translation template generation.
An example line of c# code looks something like:
textBoxName.Text = string.Format(Translate.tr("Create {0}"), NextAutoName());
And c++ would look like:
info_out << tr( L"Grip weights range from {0} to {1}" )(low_weight)(high_weight) << endl;
On the c++ side, building the list of strings for the template generation uses a c++ parser (see my previous question), which runs as part of the external builds over all c++ code in the project. Basically, any string that's placed in a tr() call is automatically extracted.
Is there a better solution with c# than writing another parser? I would love a list of strings that gets produced at compile time, or one that I can access at runtime. A List<string> would be great.
I'd like to keep the same translation file format, because that makes coordinating the two sides much simpler. As you'd expect, we reuse a lot of strings.
Right now it's much more convenient in c++ to keep the translation template up to date - I just need to make sure that the strings I want translated are wrapped in tr(), and the parser handles the rest. In c#, I currently manually inspect all strings, and update a dummy function on the c++ side with new strings. I'm getting close to breaking down and just writing another parser. I was hoping that c#, with its high-level features, might be able to do a better job here.
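For what it's worth, if the tr() arguments are always plain string literals, as in the example line above, the "parser" can shrink to a lexical pass. A rough sketch, with the folder scan and regex details as assumptions (verbatim-string escaping is not handled):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class TrExtractor
{
    // Matches Translate.tr("...") with a literal first argument,
    // allowing escaped quotes inside the string.
    static readonly Regex TrCall = new Regex(
        @"Translate\.tr\(\s*@?""(?<s>(?:[^""\\]|\\.)*)""",
        RegexOptions.Compiled);

    static void Main(string[] args)
    {
        var strings = new SortedSet<string>();
        foreach (var file in Directory.GetFiles(args[0], "*.cs", SearchOption.AllDirectories))
            foreach (Match m in TrCall.Matches(File.ReadAllText(file)))
                strings.Add(m.Groups["s"].Value);
        foreach (var s in strings)
            Console.WriteLine(s); // one template row per unique string
    }
}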
I have a project that actually does the exact same thing. Only the translation parser is itself written in C# (its name is c3po). In addition to parsing the project, c3po is also responsible for generating files to send to the translation vendor as well as generating the files that the .NET project uses to store the translated strings. We found it has several advantages over the "traditional" .NET resource files:
1) Because c3po maintains its own internal database of localized strings, we can keep track of our own translation memory and make sure to send only new strings to the translators every month. It also removes strings that are no longer needed. This has saved us literally thousands of dollars in translation costs.
2) Developers are free to write whatever string they want, wherever they want and they don't have to worry about resource files.
3) c3po services several different projects at once which streamlines our interactions with our translation vendor.
4) We can automate c3po through our CI server, so every time a developer checks in (or once a night, or whatever) we can have it do all of its tasks, including sending files to the translators, picking up new phrases, etc.
I did something similar (I was just pulling out a list of string constants) and I used the same parser for both C# and C++. Though it was more of a lexical analyser than a parser, which works because the two languages have a very similar lexical structure.
Once you've got the list of strings, you can write it to a C# source file and compile it into your program. Your code would generate a file something like this:
namespace MyProject
{
    class MyStrings
    {
        public string[] Strings = {
            "pony",
            "cob",
            "stallion"
        };
    }
}
Then you can include this file in your project to give your code access to a list of its strings.
If you run your tool as a pre-build event it will happen as part of the build.
However, as Hans says, you might do better to look at the built-in localization support.

Using reflection for code gen?

I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code and, based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) in the same execution environment as the reflection-using code. The problem with reflection is that it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. If all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
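For instance, a minimal sketch of what that limited property set looks like (the assembly name is a placeholder):

using System;
using System.Reflection;

class ReflectionSurvey
{
    static void Main()
    {
        // Reflection shows you the "shape" of the code: types, members,
        // and parameter lists, but not method bodies or statements.
        Assembly asm = Assembly.LoadFrom("BaseLibrary.dll"); // hypothetical assembly
        foreach (Type t in asm.GetTypes())
        {
            Console.WriteLine(t.FullName);
            foreach (MethodInfo m in t.GetMethods(
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly))
            {
                Console.WriteLine("  {0} {1} ({2} parameters)",
                    m.ReturnType.Name, m.Name, m.GetParameters().Length);
            }
        }
    }
}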
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer: is the number of characters between the add operator and the 'T' in the middle of the variable name a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties, and you'd be a complete idiot to ignore what they've learned in that time.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by those decades of experience. Such program transformation systems have the ability to read in source code, carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine properties of interest, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).
Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
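For reference, a T4 template is plain text with embedded C#; a trivial sketch (the file and namespace names are hypothetical):

<#@ template language="C#" #>
<#@ output extension=".cs" #>
// Auto-generated by BuildInfo.tt; do not edit by hand.
namespace MyProject.Generated
{
    static class BuildInfo
    {
        // The expression block below splices the result of a C# expression into the output.
        public const string GeneratedAt = "<#= System.DateTime.Now.ToString("u") #>";
    }
}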
You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are no language-specific features you'd be unable to support without using Reflection.
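A minimal CodeDom sketch of the build-once idea, for illustration (the generated class and file names are placeholders):

using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

class CodeDomDemo
{
    static void Main()
    {
        // Build an object model of the code, then let the provider render C#.
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("MyProject.Generated");
        var cls = new CodeTypeDeclaration("GeneratedClass");
        cls.Members.Add(new CodeMemberField(typeof(int), "Version"));
        ns.Types.Add(cls);
        unit.Namespaces.Add(ns);

        using (var provider = new CSharpCodeProvider())
        using (var writer = new StreamWriter("GeneratedClass.cs"))
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
        }
    }
}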
From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.
I'm not sure of the best way to do this, but you could:
- As a post-build step on your base DLL, run the code generator.
- As another post-build step, run csc or msbuild to build the generated DLL.
Other things which depend on the generated DLL will also need to depend on the base DLL, so the build order remains correct.

What is the best way to implement precomputed data?

I have a computation that calculates a resulting percentage based on certain input. But these calculations can take quite some time, which can be annoying. Since there are about 12500 possible inputs, I thought it would be a good idea to precompute all the data, and look this up during normal program execution.
My first idea was to just create a simple file which is read at program initialization and populates some arrays. Although this will work, I would like to know if there are other options, for example populating the array at compile time.
BTW, I'm writing my code in C#.
This tutorial here implements a serializer, which you can use to easily convert an object to a binary file and back. Once you have the serializer in hand, you can just create an object that holds all your data and serialize it; when you actually run your program, just deserialize the object and use it.
This has all the benefits of saving an object to the hard drive, with an implementation that is object-agnostic (meaning you don't have to write much code for any object you want to serialize) and outputs in binary (thus saving space, if that is a concern).
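The linked tutorial's serializer isn't reproduced here, but the stock BinaryFormatter illustrates the same save-once, load-at-startup idea (the type and file names are mine):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[System.Serializable]
class PrecomputedData
{
    public double[] Percentages; // the ~12,500 precomputed results
}

class Program
{
    static void Main()
    {
        var formatter = new BinaryFormatter();

        // Offline step: compute once and save to disk.
        var data = new PrecomputedData { Percentages = new double[12500] };
        using (var fs = File.Create("precomputed.bin"))
            formatter.Serialize(fs, data);

        // At program start-up: load instead of recomputing.
        using (var fs = File.OpenRead("precomputed.bin"))
            data = (PrecomputedData)formatter.Deserialize(fs);
    }
}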
A file with data is probably the easiest and most flexible way to implement it.
If you wanted it in memory without having to read it from somewhere, I would write a program to output your data in C#-like CSV format suitable for copying and pasting into an array/collection initializer, and thereby generate the source code for your precomputed data.
Create a program that outputs valid C# code which initializes your lookup tables. Make this part of your build process so that it will automatically create the source file and then build the rest of your project.
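A rough sketch of such a generator (ComputePercentage is a stand-in for the real calculation; the invariant culture keeps the emitted literals valid regardless of locale):

using System.Globalization;
using System.IO;
using System.Text;

class LookupTableGenerator
{
    // Stand-in for the real, slow computation from the question.
    static double ComputePercentage(int input)
    {
        return (input % 100) / 100.0;
    }

    static void Main()
    {
        var sb = new StringBuilder();
        sb.AppendLine("// Auto-generated at build time; do not edit by hand.");
        sb.AppendLine("static class Precomputed");
        sb.AppendLine("{");
        sb.AppendLine("    public static readonly double[] Percentages =");
        sb.AppendLine("    {");
        for (int input = 0; input < 12500; input++)
            sb.AppendLine("        " +
                ComputePercentage(input).ToString(CultureInfo.InvariantCulture) + ",");
        sb.AppendLine("    };");
        sb.AppendLine("}");
        File.WriteAllText("Precomputed.cs", sb.ToString());
    }
}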
As Daniel Lew said, serialize it into a binary file.
If you need speed, go for a Dictionary. A Dictionary is indexed on its key, and should allow rapid lookup even with large amounts of data.
I would always start by considering whether there was any way to avoid precomputing. If there are 12,500 possible inputs, how many are required per user request? Will all 12,500 be needed at the same time, or will they be spread out in time? If you can get by with calculating a few at a time, I'd do that with lazy initialization. I prefer this solution simply because I'll have fewer issues with it in the long run. What do you do when the persistent format changes, or the data changes? How will you handle it when the file is missing or corrupted? Persisting to a file does not create less code.
I would serialize such a file to a human-readable format if I had to persist a pre-loaded version. I'd probably use XML serialization since it's simple. But quite often there are issues of invalidation and recalculation. Do the values never change, or only very infrequently?
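A minimal sketch of that lazy approach, with the calculation itself as a stand-in:

using System.Collections.Concurrent;

// Each input is computed at most once, on first use, instead of all 12,500 up front.
class PercentageCache
{
    readonly ConcurrentDictionary<int, double> cache =
        new ConcurrentDictionary<int, double>();

    // Stand-in for the real, expensive calculation.
    static double ComputeSlowly(int input)
    {
        return (input % 100) / 100.0;
    }

    public double Get(int input)
    {
        return cache.GetOrAdd(input, ComputeSlowly);
    }
}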
I agree with mquander and Trent. Use your favorite language or script to generate the whole C# file you need to define your data (no copy-pasting, that's a manual step and error-prone). Add it as a Pre-Build event in Visual Studio. You could even detect that you have an up-to-date file and avoid regeneration for most builds.
There is definitely a way to statically generate almost any data using template metaprogramming in C++, although it can be painful. It's not worth it unless you need many sets of different data in several parts of your program. I am not familiar enough with metaprogramming in C# to evaluate the general effort in your case. You should look into that.

Binary serialization of Silverlight XAML object

I'm working on Silverlight application that needs to display complex 2d vector graphics.
It downloads a zipped XAML file from the server, parses it (XamlReader.Load) and injects it into the layout root on the page.
This works fine for fairly small XAML files. The problem is that I need to make it work with much bigger files (lots more content in them). For example, one of my uncompressed XAML files is 20 MB large and the XamlReader.Load method takes too long to parse it. My question is whether there is a way to do all the parsing on the server side. It would be best to just store the serialized binary output of XamlReader.Load as a BLOB in the database. However, when I try to serialize it, I get a message that the "Canvas object is not marked as serializable". I would really appreciate any advice.
Silverlight doesn't have much binary serialization built in; however, protobuf-net works on Silverlight and may help plug this gap. In the current build you can only really serialize types you control (due to adding attributes) - however, I'm in the middle of a big refactor to (among other things) add support for serializing types without attributes.
I expect it to be about 2 more weeks before this is available as a (hopefully) stable build, but you're welcome to take a look at it then.
Note that you will still need to give it some help (telling it what you want it to serialize), but it may be useful.
In particular, the data format ("protocol buffers") is designed to be both dense and efficient to process, which should increase the parse speed. See here for more (numbers are from main .NET, not Silverlight)
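A minimal protobuf-net sketch of the idea: you'd serialize your own vector-data DTOs (like the hypothetical Shape below) rather than the Canvas itself, and rebuild the visual tree on the client:

using System.IO;
using ProtoBuf;

// Hypothetical DTO for a piece of vector data; with current builds,
// serialized types need these attributes.
[ProtoContract]
public class Shape
{
    [ProtoMember(1)] public double X;
    [ProtoMember(2)] public double Y;
    [ProtoMember(3)] public string Fill;
}

class Demo
{
    static void Main()
    {
        var shape = new Shape { X = 10, Y = 20, Fill = "#FF0000" };
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, shape);              // dense binary out
            ms.Position = 0;
            var copy = Serializer.Deserialize<Shape>(ms); // and back
        }
    }
}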
I've found the SharpSerializer package very easy to use for fast binary serialization in Silverlight: http://www.sharpserializer.com/en/index.html. You do not need to use the Serializable attribute; however, it only serializes public members.
If parsing is really the problem, it might help to use pre-compiled XAML, called 'BAML'. This is a binary representation of the XAML file. Since the binary format is much cheaper to parse than the generic XML, this helps a lot. BAML is also used internally by the .NET compiler to generate more compact files.
For more information, see also http://stuff.seans.com/2008/07/13/hello-wpf-world-part-2-why-xaml/

How to get all file attributes including author, title, mp3 tags, etc, in one sweep

I would like to write all meta data (including advanced summary properties) for my files in a windows folder to a csv file. Is there a way to collect all the attributes? I see mp3 files have a different set of attributes compared to jpg files. (c#)
This can also be a script (vb, perl)
Update: by looking at libextractor (thank you) I can see this can be achieved by writing different plugins for different type of files. I gather this meta data is not a simple collection...
In Perl, you can use MP3::Tag or MP3::Info
If you can cope w/ VB.Net: http://www.codeproject.com/KB/vb/mp3id3v1.aspx
If you can cope w/ C++/.Net: http://www.codeproject.com/KB/audio-video/mp3fileinfo.aspx
For either (assuming the C++ one is compiled to .NET), you can use Reflector to disassemble the binary and convert it to C#. Check with the respective authors about their licenses first (usually Code Project articles are under an open license like CPOL).
In a library? Try libextractor if your software is GPL.
Ok, after the clarification edits, I would suggest looking at the introspection available in .Net. I will warn you however that I think you will get more satisfying results if you forgo introspection and define the specific properties that you want for the file types that you expect to see.
Since scripting is valid, then if this were my problem to solve I would use Powershell since the .net introspection is baked in.
It may not be worth it to add all of the data from a JPEG file (EXIF data). I would hand-pick what attributes I wanted from those files.
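As a starting point, a sketch of the hand-picked approach using just the basic System.IO attributes (the folder path is hypothetical; extended properties like EXIF or ID3 would need format-specific code on top):

using System;
using System.IO;

class MetadataDump
{
    static void Main()
    {
        using (var csv = new StreamWriter("files.csv"))
        {
            csv.WriteLine("Name,Size,Created,Modified,Attributes");
            foreach (var f in new DirectoryInfo(@"C:\MyFolder").GetFiles()) // hypothetical folder
            {
                // Note: naive CSV; quote fields if names may contain commas.
                csv.WriteLine("{0},{1},{2:u},{3:u},{4}",
                    f.Name, f.Length, f.CreationTime, f.LastWriteTime, f.Attributes);
            }
        }
    }
}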
