Scriptable Windows Disassembler [non cygwin] - c#

I'm currently trying to implement something that combines reverse engineering and graph theory, so I'd like to disassemble PE binaries. There are some very sophisticated tools for this, like IDA or w32dasm. The latter seems to be dead.
IDA is not scriptable, as far as I know.
I want a scriptable disassembler because I'm implementing my program in C#. It is given a binary and therefore has to get the opcodes somehow. I think I need to call some helper program with arguments. IDA cannot be called without its GUI, and it doesn't offer real command-line options.
Any ideas?
Thanks,
wishi

IDA has a built-in scripting language called IDC. Lots of examples here. Also, IDA can be called without a GUI - consult the documentation for idaw.exe.

IDA can be scripted with Python. Version 5.5 even comes bundled with idapython.

[dumpbin /disasm](http://msdn.microsoft.com/en-us/library/xtf7fdaz(VS.71).aspx) should do the trick. You could also script CDB to do it.
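A minimal sketch of driving dumpbin from a script, written in Python for brevity (the asker's host program is C#, where the same pattern would go through System.Diagnostics.Process). It assumes dumpbin is on the PATH, for example inside a Visual Studio command prompt; the function names and the dumpbin_exe parameter are hypothetical.

```python
import subprocess


def dumpbin_command(binary_path, dumpbin_exe="dumpbin"):
    """Build the argument list for `dumpbin /disasm <binary>`."""
    return [dumpbin_exe, "/disasm", binary_path]


def disassemble(binary_path, dumpbin_exe="dumpbin"):
    """Run dumpbin and return its text listing as a list of lines.

    Assumes dumpbin is installed and reachable; raises
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(
        dumpbin_command(binary_path, dumpbin_exe),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

The listing comes back as plain text, so the calling program still has to parse the lines it cares about.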

Related

Tool to analyze the executable

is there a tool that analyzes the executable and detects:
- the programming language used (compiler),
- frameworks used (Qt, Gtk, .Net, WxWidgets etc),
- other useful information (compression, etc.).
I know it is quite hard to tell the programming language sometimes (especially in C or Pascal exes), but is it possible to tell the language or compiler used? (Delphi generates exes differently, and so does VB6, for instance.)
It may be possible, e.g., with dependency analysis of the DLLs, headers, etc.
Thanks.
On GNU, you can use several tools to try to guess the information you want:
ldd to resolve the shared libraries linked to the binary
nm to list the symbols imported/exported by the binary
strings, which can dump the strings embedded in the binary
objdump can be useful too
A hex editor can be useful too.
I guess there are similar tools on the Windows platform. Dumpbin.exe is something similar to nm, and depends.exe to ldd, IIRC.
Btw, java is often bytecode compiled, not native.
I have used PEInfo in the past (at uni), but it did not give the information you want. After that I used Reflector, as I knew my DLLs/EXEs were .NET.
But I think there is no software to do that.
Workaround: Best thing you can do is look in the strings of exe (for example use Process explorer) and guess yourself.
Open your executable in a binary file viewer and look for strings that look like names of the functions. These strings are not always available, but in certain cases they are present. They can be used to resolve links with DLLs for example. After that google those strings. There is a chance that they will tell you something.
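The string-hunting approach described above can be sketched in a few lines of Python. This is only an illustration that looks for runs of printable ASCII, in the spirit of the strings tool; the function name, the four-character minimum and the sample bytes are assumptions made for the example.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up sample bytes standing in for the contents of an executable.
sample = b"\x00\x01MSVCR90.dll\x00\x02ab\x00QtCore4.dll\xff"
print(extract_strings(sample))  # ['MSVCR90.dll', 'QtCore4.dll']
```

For a real file, read it in binary mode (open(path, "rb").read()) and search the results for DLL or framework names.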
I don't think there is a way to do this correctly. Maybe some basic programming languages can be detected, but nobody can detect the frameworks used. There are thousands of frameworks.

Running CPython functions from C#

I'm working on a project where I need to be able to run a python function that depends on SciPy/NumPy. Due to this being an add-on to a project already in progress, using IronPython would not be an option.
Additional info:
Python.NET seemed to be a good fit, but I was unable to get the return value from RunString() (it would only return NULL).
Passing arguments and catching the return value (a tuple) is necessary.
The function is in a statistical package that was created by a support group for the team, so modification of that would also not be possible.
I'm at quite a loss for what to do. Any hints in the right direction are appreciated. Thanks for any help you can give!
I understand that this may be quite vague, but I cannot give explicit details to the project. If any clarification is needed please let me know and I'll do my best!
I guess you could write a DLL that uses the CPython API to expose the function, then call it in C#?
It's possible to embed the Python interpreter; although I've never done this personally, I guess it would be useful: http://docs.python.org/extending/embedding.html
Does it need to be portable beyond Windows? If not, perhaps you can embed the CPython interpreter with C++/CLI, wrap that in a nice .Net-ish interface and use the resulting code from C#. Never tried that, so I don't know if it's going to work.
Regardless of whether you go this route or the 'write a native DLL' route, it will probably be easier to embed Python using Boost.Python, though I'm not sure your wrapper code is going to be large enough to make all of this (compiling the Boost behemoth, learning Boost.Python, making sure it works with C++/CLI, increasing your target file size) worth it.
IronPython using DLR might be the way to go. Mind you it won't be the fastest way, but it seems like something worth pursuing if you're going to do this a lot. Another useful link
The ironclad project was started to allow using CPython extensions from IronPython, especially SciPy/NumPy it seems. I don't know how usable it is (and how actively it is still being developed)

c# compile source code from database

I would like to build an application framework that is mainly interpreted.
Say that the source code would be stored in the database that could be edited by the users and always the latest version would be executed.
Can anyone give me some ideas on how to implement something like this?
cheers,
gabor
In .Net, you can use reflection and CodeDOM to compile code on the fly. But neither approach is really very simple or practical. Mono has some ability to interpret c# on the fly as well, but I haven't looked closely at it yet.
Another alternative is to go with an interpreted .Net language like Boo or IronPython as the language for your database code.
Either way, make sure you think long and hard about the security of your platform. Allowing users to execute arbitrary code is always an exercise fraught with peril. It's often too tempting to look for a simple eval() method, and even if one exists, that is not good enough for this kind of scenario.
Try Mono ( http://www.monoproject.org ). It supports many scripting languages including JavaScript.
If you don't want to use any scripting you can use CodeDOM or Reflection (see Reflection.Emit).
Here are really useful links on the topic :
Dynamically executing code in .Net (Here you can find a tool which can be very helpful)
Late Binding and On-the-Fly Code Generation Using Reflection in C#
Dynamic Source Code Generation and Compilation
Usually the Program uses a scripting language for the scriptable parts, i.e. Lua or Javascript.
To answer your technical question: You don't want to write your own language and interpreter. That's too much work for you to do. So pick some other language, say Python or Lua, and look for the documentation that lets your C# program hand it blocks of code to execute. Of course, the script needs to be able to do something, so you'll need to find how to expose your program's objects to the script. Also, what will happen if a client is running the program when you update its source code in the database? Should the client restart? Are you going to store the entire program as a single row in this database, or did you want to store individual functions? That affects how you structure your updates.
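As a rough illustration of handing stored blocks of code to a scripting language, here is a Python-only sketch that keeps versioned scripts in an in-memory SQLite table, always runs the newest version, and exposes host objects through the script's namespace. The table layout, the function names and the "greet" script are all made up for the example, and exec() provides no sandboxing, so the security concerns raised earlier in this thread still apply.

```python
import sqlite3


def latest_source(conn, name):
    """Fetch the highest-version source text stored under `name`."""
    row = conn.execute(
        "SELECT source FROM scripts WHERE name = ? "
        "ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]


def run_script(conn, name, host_objects):
    """Compile and run the latest version; return the resulting namespace."""
    code = compile(latest_source(conn, name), f"<db:{name}>", "exec")
    namespace = dict(host_objects)  # objects the host exposes to the script
    exec(code, namespace)           # NOTE: no sandboxing whatsoever
    return namespace


# Hypothetical schema: one row per version of each script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scripts (name TEXT, version INTEGER, source TEXT)")
conn.executemany(
    "INSERT INTO scripts VALUES (?, ?, ?)",
    [
        ("greet", 1, "result = 'hello ' + user"),
        ("greet", 2, "result = 'welcome back, ' + user"),
    ],
)

namespace = run_script(conn, "greet", {"user": "gabor"})
print(namespace["result"])  # welcome back, gabor
```

Storing one row per function and version, as here, is only one possible answer to the single-row-versus-individual-functions question; it makes updates small but leaves dependency ordering to the host.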
To address other issues with your question: Why do you want to do this? Making "interpreted language" part of your design spec for a system is not often a good sign. Is the real requirement something like this: "I update the program often and I want users to always have the latest copy?" If so, there are other, better ways to go about this (just give us your actual scenario and requirements).

how to transition from C# to python?

I feel like I'm going back to the stone age.
How do I relearn to develop without IntelliSense (PyDev IntelliSense doesn't count)? In general, how does one successfully leave the comfort of Visual Studio?
I recently learned python with a strong C# background.
My advice: Just do it. Sorry, couldn't resist, but I'm also serious. Install Python and read the Python.org documentation (v2.6). A book might help too; I like the Python PhraseBook. From there, I started using Python to implement solutions for various things, most notably ProjectEuler.net questions.
It forced me to consider the languages and built in data structures.
Python is truly easy to use and intuitive. To learn the basics, took me about an hour. To get pretty good with it, took around 5 hours. Of course, there is always more to learn.
Also, I want to note that I would discourage using IronPython or Jython because I feel learning core, regular python is the first step.
Python has rich "introspection" features. In particular, you can find out a lot about built-in features using a command called help() from the Python command line.
Suppose you want to use regular expressions, and want to find out how to use them.
>>> import re
>>> help(re)
You get a nice display of information, automatically shown to you a page at a time (hit the space bar to see the next page).
If you already know you want to use the sub() function from the re module, you can get help on just that:
>>> help(re.sub)
And this help() feature will even work on your own code, as long as you define Python docstrings for your modules, classes, and functions.
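A small sketch of the docstring point; the function below is invented purely for illustration.

```python
def moving_average(values, window):
    """Return the average of each consecutive `window`-sized slice of `values`."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# help(moving_average) in an interactive session pages the signature and
# the docstring; the raw text is also available on the function object.
print(moving_average.__doc__)
```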
You can enable features in the vim editor (or gvim, or vim for Windows) that enable an "IntelliSense"-like autocompletion feature, and you can use Exuberant Ctags to generate hyperlink "tags" to let you navigate quickly through your code. These turn vim into something that is roughly as powerful as an IDE, with the full power of vim for editing. (There isn't an explicit refactoring tool built in to vim, but there are options available.)
And as others have noted, you can get IDEs for Python too. I've used the Wingware IDE, and I recommend it. I try to do most of my work with free, open-source software, but that is one piece of proprietary software I would be willing to buy. I have also used Eclipse with the Pydev plugin (I used its refactoring tool and it worked fine).
P.S. Python has a richer feature set than C#, albeit at the cost that your code won't run as fast. Once you get used to Python, you won't feel like you are in the stone age anymore.
One step at a time?
Start off with simple programs (things you can write with your eyes closed in C#), and keep going... You will end up knowing the API by heart.
<rant>
This is sort of the reason that I think being a good visual studio user makes you a bad developer. Instead of learning an API, you look in the object browser until you find something that sounds more or less like what you are looking for, instantiate it, then hit . and start looking for what sounds like the right thing to use. While this is very quick, it also means it takes forever to learn an API in any depth, which often means you end up either re-inventing the wheel (because the wheel is buried under a mountain worth of household appliances and you had no idea it was there), or just doing things in a sub-optimal way. Just because you can find A solution quickly, doesn't mean it is THE BEST solution.
</rant>
In the case of .NET, which ships with about a billion APIs for everything under the sun, this is actually preferred. You need to sift through a lot of noise to find what you are aiming for.
Pythonic code favors one obvious way to do any given thing. This tends to make the APIs very straightforward and readable. The trick is to learn the language and learn the APIs. It is not terribly hard to do, especially in the case of Python, and while not being able to rely on IntelliSense may be jarring at first, it shouldn't take more than a week or two before you get used to it not being there.
Learn the language and the basic standard libraries with a good book. When it comes to a new library API, I find the REPL and some exploratory coding can get me to a pretty good understanding of what is going on fairly quickly.
You could always start with IronPython and continue to develop in Visual Studio.
The Python IDE from Wingware is pretty nice. That's what I ended up using when going from Visual Studio C++/.NET to Python.
Don't worry about IntelliSense. Python is really simple and there really isn't that much to know, so after a few projects you'll be conceiving of Python code while driving, eating, etc., without really even trying. Plus, Python's built-in text editor (IDLE) has a wimpy version of IntelliSense that has always been more than sufficient for my purposes. If you ever go blank and want to know what methods/properties an object has, just type this in an interactive session:
dir(myobject)
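Since dir() also lists the double-underscore names, a tiny helper (hypothetical, not part of the standard library) can trim the output down to the public ones:

```python
def public_names(obj):
    """List the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names([]))  # the public list methods: append, clear, copy, ...
```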
Same way you do anything else that doesn't have IntelliStuff.
I'm betting you wrote that question without the aid of an IntelliEnglish computer program that showed you a list of verbs you could use and automatically added punctuation at the end of your sentences, for example. :-)
I use Eclipse and PyDev, most of the time, and the limited auto-completion help that it provides is pretty useful.
It's not ever going to come up to the level of VS's IntelliSense, and it can't, because of the dynamic nature of Python. But there are compensations, big ones.
The biggest is the breaking of the code-compile-test cycle. It's so easy to write and test prototype code in IDLE that very often it's where I go first: I'll sketch out and test a couple of methods that have to interoperate, figure out that there's something I don't know, learn it, fix my test, and then port the whole thing over to PyDev and watch it work the first time.
Another is that it's a lot simpler. It's really important to know what the standard modules are, and what they do, but for the most part that can be picked up with a little reading. I only use a small handful of modules in my everyday programming - itertools, os, csv (yeah, well), datetime, StringIO - and everything else is there if I need it, but usually I don't.
The stuff that it's really important to know is stuff that IntelliSense couldn't help you with anyway. Auto-completion isn't going to make
self.__dict__.update(kwargs)
make a damn bit of sense; you have to learn what an amazing line of code that is, and why you'd write it, by yourself.
Then you'll think, "how would I even begin to implement something like that in C#?" and realize that the tools these stone-age people are using are a little more sophisticated than you think.
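For readers who have not met that line before, a minimal sketch of what it does (the Settings class is made up for the example):

```python
class Settings:
    def __init__(self, **kwargs):
        # Every keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)


settings = Settings(host="localhost", port=8080)
print(settings.host, settings.port)  # localhost 8080
```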
A mistake people coming from C# or Java make a lot is recreate a large class hierarchy like is common in those languages. With Python this is almost never useful. Instead, learn about duck typing. And don't worry too much about the lack of static typing: in practice, it's almost never an issue.
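A short sketch of duck typing: the function below (invented for illustration) declares no interface or base class and accepts anything that can be iterated line by line.

```python
import io


def count_lines(source):
    """Count the items of any iterable: an open file, a StringIO, a list."""
    return sum(1 for _ in source)


print(count_lines(io.StringIO("a\nb\n")))  # 2, from a file-like object
print(count_lines(["a", "b", "c"]))        # 3, from a plain list
```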
I'm not too familiar with Python, so I'll give some general advice. Make sure to print out some reference pages of common functions. Keep them by you (or pinned to a wall or desk) at all times when you're programming. I'm traditionally a "stone age" developer, and after developing in VS for a few months I'm finding my hobby programming is difficult without IntelliSense. But keep at it to memorize the most common functions, and eventually it'll just be natural.
You'll also have to get used to somewhat more "difficult" debugging (not sure how that works with Python).
While you're learning, though, you should enjoy the goodness (at least compared to C#) of Python even if you don't have IntelliSense. Hopefully this will motivate you!
I've only ever used IDLE for python development and it is not fun in the least. I would recommend getting your favorite text editor, mine is notepad++, and praying for a decent plugin for it. I only ever had to go from Eclipse to Visual Studio so my opinions are generally invalid in these contexts.
Pyscripter does a reasonable job at auto-completion/intellisense. I've been using it recently as I started to learn Python/Django, where I've been mainly a C# developer for the last few years.
I suggest going cold turkey - languages like Python shine with great text editors. Choose one you want to become amazing at (vim, emacs, etc.) and never look back.
I use Komodo Edit and it does reasonably good guessing at the autocompletion.
Others have suggested several editors that have intellisense-like capabilities. Try them out.
Also install ipython and use that to learn the language interactively. It is like a souped-up version of the regular Python interactive shell with lots and lots of added capabilities, and one of the most useful is its extensive context-sensitive tab-completion and help.
For example if you type
import r<tab>
it will show all the modules you can import starting with r
import re
re.<tab>
will show all the objects in the re module
re.compile?
will show the docstring and other information about the re.compile function, automatically piping it through a pager if it is longer than a screenful.
re.compile??
will show the source code as well, if available.
Using this I find it is much faster to switch to ipython and query objects directly than it is to look up anything in the docs. You have access to the usual python help() system as well.
ipython has lots and lots of other features - far too many to cover in a short post.

Lex/Yacc for C#?

Actually, maybe not full-blown Lex/Yacc. I'm implementing a command-interpreter front-end to administer a webapp. I'm looking for something that'll take a grammar definition and turn it into a parser that directly invokes methods on my object. Similar to how ASP.NET MVC can figure out which controller method to invoke, and how to pony up the arguments.
So, if the user types "create foo" at my command-prompt, it should transparently call a method:
private void Create(string id) { /* ... */ }
Oh, and if it could generate help text from (e.g.) attributes on those controller methods, that'd be awesome, too.
I've done a couple of small projects with GPLEX/GPPG, which are pretty straightforward reimplementations of LEX/YACC in C#. I've not used any of the other tools above, so I can't really compare them, but these worked fine.
GPPG can be found here and GPLEX here.
That being said, I agree, a full LEX/YACC solution probably is overkill for your problem. I would suggest generating a set of bindings using IronPython: it interfaces easily with .NET code, non-programmers seem to find the basic syntax fairly usable, and it gives you a lot of flexibility/power if you choose to use it.
I'm not sure Lex/Yacc will be of any help. You'll just need a basic tokenizer and an interpreter which are faster to write by hand. If you're still into parsing route see Irony.
As a sidenote: have you considered PowerShell and its commandlets?
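The hand-written tokenizer-plus-interpreter idea can be sketched as follows. This is a Python illustration of the pattern and not C# code: the Commands class, dispatch() and help_text() are invented for the example, method lookup by name stands in for the grammar, and docstrings stand in for the help-text attributes the question asks about.

```python
import shlex


class Commands:
    def create(self, item_id):
        """create <id>: create a new item."""
        return f"created {item_id}"

    def delete(self, item_id):
        """delete <id>: remove an item."""
        return f"deleted {item_id}"


def dispatch(handler, line):
    """Tokenize `line` and call the handler method named by the first token."""
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty command")
    name, args = tokens[0], tokens[1:]
    method = getattr(handler, name, None)
    if name.startswith("_") or not callable(method):
        raise ValueError(f"unknown command: {name}")
    return method(*args)


def help_text(handler):
    """Collect the docstrings of all public methods, one per line."""
    lines = []
    for name in sorted(dir(handler)):
        attr = getattr(handler, name)
        if not name.startswith("_") and callable(attr) and attr.__doc__:
            lines.append(attr.__doc__.strip())
    return "\n".join(lines)


print(dispatch(Commands(), "create foo"))  # created foo
print(help_text(Commands()))
```

In C#, the equivalent lookup would presumably go through reflection over the controller's methods, with custom attributes supplying the help text.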
Also look at Antlr, which has C# support.
Still early CTP so can't be used in production apps but you may be interested in Oslo/MGrammar:
http://msdn.microsoft.com/en-us/oslo/
Jison has been getting a lot of traction recently. It is a Bison port to JavaScript. Because of its extremely simple nature, I've ported the Jison parsing/lexing template to PHP, and now to C#. It is still very new, but if you get a chance, take a look at it here: https://github.com/robertleeplummerjr/jison/tree/master/ports/csharp/Jison
If you don't fear alpha software and want an alternative to Lex/Yacc for creating your own languages, you might look into Oslo. I would recommend sitting through the recordings of sessions TL27 and TL31 from last year's PDC. TL31 directly addresses the creation of Domain Specific Languages using Oslo.
Coco/R is a compiler generator with a .NET implementation. You could try that out, but I'm not sure if getting such a library to work would be faster than writing your own tokenizer.
http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/
I would suggest csflex - C# port of flex - most famous unix scanner generator.
I believe that lex/yacc are in one of the SDKs already (i.e. RTM). Either Windows or .NET Framework SDK.
Gardens Point Parser Generator here provides Yacc/Bison functionality for C#. It can be downloaded here. A useful example using GPPG is provided here.
As Anton said, PowerShell is probably the way to go. If you do want a lex/ yacc implementation then Malcolm Crowe has a good set.
Edit: Direct Link to the Compiler Tools
Just for the record, implementation of lexer and LALR parser in C# for C#:
http://code.google.com/p/naive-language-tools/
It should be similar in use to Lex/Yacc, however those tools (NLT) are not generators! Thus, forget about speed.
