Hook on compiler / Hook on the output of a method (C# - c++)

Hook on compiler / Hook on the output of a method (C# - c++) - c#

Well to start ill tell u what i have to do.
I have to make a program so students can upload some C++ code from an exercise. And this uploaded code needs to get compared with the "best code" from that exercise. And from those comparisation the server gives back some feedback if the student uploaded good or bad code.
EG: the Exercise is to make an arraylist from 1 to 10 and so the student can upload his code. The server then compares it with some other code and gives feedback.
This is easyer said then done, because it can't be just a file comparer because of the different variables a user can code. Thats why i was tinking of using external compilers to get some output and compare this output with the output of the "best code". Or more detailed to get a hook within the compiler so i can check every method and every variable.
Or any other idea how i can check this or compare?
Or is there already a program that exist?
Very thanks,
Michael

There are systems that assess the output of the whole program (like ejudge, it is used as a contest system), and, in my opinion, it's easier to use them, because specifying conditions on the program code is itself not a trivial task (if your students are writing non-trivial programs).
You can use formal specification languages like ACSL to set input and output conditions and prove that the program works correctly.

It might be more practical to evaluate the code by testing the behaviour against some automated tests, like a boundary test suite (or even a mutant test if you want things to get exciting), and then to look for odd implementations or interesting architectures look at code metrics like number of functions, lines of code, compile size, etc. .
This approach would be a lot more scalable, and there is a lot of free tools, especially for c++ and java that would be easy to set up an automated test system.
Comparing code to determine correctness is not necessarily a correct approach, depending on how it is done, and would be very difficult to scale up.

Related

How force a class to get a UnitTest for every public method?

we are starting with UnitTests now to raise our code quality a lot and (of course) for different other reasons.
Sadly we are very late to the party, so we have classes with ~50 methods which are completly untested until now. And we have a lot of classes!
Is it possible / how is it possible to:
a) see which methods are untested?
b) force a developer who adds a method to a class to write a unit test for it (e.g. I achieve to test all 50 methods and tomorrow someone adds a 51. method)?
c) raise a warning or a error when a method is untested?

a) see which methods are untested?
This is fairly easy with a Code Coverage tool. There's one built into Visual Studio, but there are other options as well.
b) force a developer who adds a method to a class to write a unit test for it (e.g. I achieve to test all 50 methods and tomorrow someone adds a 51. method)?
Don't make hard rules based on coverage!
All experience (including mine) shows that if you set hard code coverage targets, the only result you'll get is that developers will begin to game the system. The result will be worse code.
Here's an example; consider this Reverse method:
public static string Reverse(string s)
{
char[] charArray = s.ToCharArray();
Array.Reverse(charArray);
return new string(charArray);
}
You can write a single test to get 100 % coverage of this method. However, as it's given here, the implementation is brittle: what happens if s is null? The method is going to throw a NullReferenceException.
A better implementation would be to check s for null, and add a branch that handles that case, but unless you write a test for it, code coverage will drop.
If you have developers who don't want to write tests, but you demand a hard code coverage target, such developers will leave the Reverse function as is, instead of improving it.
I've seen such gaming of the system in the wild. It will happen if you institute a rule without buy-in from developers.
You'll need to teach developers how their jobs could become easier if they add good unit tests.
Once developers understand this, they'll add tests themselves, and you don't need any rules.
As long as developers don't understand this, the rule will only hurt you even more, because the 'tests' you'll get out of it will be written without understanding.
c) raise a warning or a error when a method is untested?
As explained above, code coverage is a poor tool if used for an absolute metric. On the other hand, it can be useful if you start monitoring trends.
In general, code coverage should increase until it reaches some plateau where it's impractical to increase it more.
Sometimes, it may even drop, which can be okay, as long as there's a good reason for it. Monitoring coverage trends will tell you when the coverage drops, and then you can investigate. Sometimes, there's a good reason (such as when someone adds a Humble Object), but sometimes, there isn't, and you'll need to talk to the developer(s) responsible for the drop.

To see which methods are untested there are tools for code coverage(either built-in or external like NCrunch, dotCover). It depends on your needs according results format, reports etc.
Those tools can be run either with VS or e.g. while building with build server(like TeamCity, Bamboo). Build can be set to fail when code coverage lowers(it is somehow answer to your c) question).
In my opinion you should force dev to write unit test by some plugins. It should be the part of your process, it can be pointed during code review. What is more - how you would like to know when to force to write a test? I mean - you don't know when the process of the development finishes so it is hard to tell someone "write your test right know".
EDIT:
If you want some free code coverage tool for TC the one I know is OpenCover - you can easily find some information about plugin installation and usage.

That's funny, your section B "force a developer who adds.." was a question I posted just a few minutes ago, and I got so many down votes I removed the question, because apparently, it's not the right approach to force someone to write tests, it should be a "culture" you set in your organization.
So to your question (which is a bit wide, but still can get some direction):
a. You can use existing code coverage tools / libraries or implement your own to achieve this functionality, this is a good post which mentions a few.
What can I use for good quality Code Coverage for C#/.NET?
You can build customization based on such libaries to assist with achieving answer to section c (raise a warning or a error when a method is untested) as well.
b.You should try implementing TDD if you can, and write the tests before you write the method, this way you won't have to force anything.
If you can't do it since the method cannot be changed due to time and resource considerations, do TED (Test eventually developed).
True it won't be a unit test but still you'll have a test which has value.

rewrite a jar execuable file to c#

I've been given the task to rewrite or convert a jar program into another language like C#.
The program is like a virtual time stamper for when workers check in and out. The writer of this program made an unfinished program leaving it more or less useless if it needs an update or changing of users, cause none of the people using it is able to rewrite or edit this code.
My problem is that I have very little to no knowledge or experience with Java programming.
My question is: Would it be easier to start from scratch or could someone with good knowledge in both Java and C# look at it and maybe point in the right direction.
MY efforts: well i tried Java Eclipse but that made me more confused than before, then i tryed a program called Java Decompiler, this is when i found out the writer is not even half done with this program, there are class files for user registration and Admin panels for editing so everything would be accessible through graphic user interface.
Heres a view of the interface (note that faces and names have been whiped due to privacy, also note that some words are in danish just ignore those. "Tids registrering" means: "time registration" and "ikke til stede" means that they are not there)
The goal is to make a program able to do:
1. Save check in/check out times and date for each user.
2. Instead of saving in a *.log it should be saved in a database.
3. Graphic interface for admin access to edit add/edit/remove users
If a Java/C# programmer wants to help we can talk about it in private about how, where and when to transfer files (again due to privacy)
If you find it too difficult or time consuming to convert i could use some help to rewrite from scratch so help from just C# programmers would also be appreciated.
If there is any grammatical and/or spelling errors in this post I'm sorry and apologist

Most likely effort to port UI from Java (either source or byte code) to C# (or any other .Net language) will be comparable or significantly higher than rewrite as frameworks are different.
For non-UI code you may be able to do direct port of most of the code either by hand or with some automated tools as languages are close enough. If original code does not have unit tests/integration tests and not written in a style you are comfortable with than cost of rewrite may be lower.
Ultimately it is your call if the porting effort worth it or not.

c# compile source code from database

I would like to build an application framework that is mainly interpreted.
Say that the source code would be stored in the database that could be edited by the users and always the latest version would be executed.
Can anyone give me some ideas how does one implement sth like this !
cheers,
gabor

In .Net, you can use reflection and CodeDOM to compile code on the fly. But neither approach is really very simple or practical. Mono has some ability to interpret c# on the fly as well, but I haven't looked closely at it yet.
Another alternative is to go with an interpreted .Net language like Boo or IronPython as the language for your database code.
Either way, make sure you think long and hard about the security of your platform. Allowing users to execute arbitrary code is always an exercise fraught with peril. It's often too tempting to look for a simple eval() method, and even if one exists, that is not good enough for this kind of scenario.

Try Mono ( http://www.monoproject.org ). It supports many scripting languages including JavaScript.

If you don't want to use any scripting you can use CodeDOM or Reflection (see Reflection.Emit).
Here are really useful links on the topic :
Dynamically executing code in .Net (Here you can find a tool which can be very helpul)
Late Binding and On-the-Fly Code
Generation Using Reflection in C#
Dynamic Source Code Generation and
Compilation

Usually the Program uses a scripting language for the scriptable parts, i.e. Lua or Javascript.

To answer your technical question: You don't want to write your own language and interpreter. That's too much work for you to do. So pick some other language, say Python or Lua, and look for the documentation that lets your C program hand it blocks of code to execute. Of course, the script needs to be able to do something, so you'll need to find how to expose your program's objects to the script. Also, what will happen if a client is running the program when you update its source code in the database? Should the client restart? Are you going to store the entire program as a single row in this database, or did you want to store individual functions? That affects how you structure your updates.
To address other issues with your question: Why do you want to do this? Making "interpreted language" part of your design spec for a system is not often a good sign. Is the real requirement something like this: "I update the program often and I want users to always have the latest copy?" If so, there are other, better ways to go about this (just give us your actual scenario and requirements).

If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?

If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?
It feels like standard coding style debates that have been raging since programming began like "put the bracket on the following line" or "properly indent your (" are no longer essential.
I realize in languages where white space matters the diff will have to consider it but for languages where the style is a personal preference is there really a need to worry about it anymore?

Auto-format can really only address whitespace.
It wont address developers giving variables bizarre nonsensical names.
It won't address some developers having functions return null on an error vs throwing an exception.
I'm sure others can think of more examples.

This is what we do at my work:
We all use Eclipse. We don't have a policy for using Eclipse but somehow none of us is an IDEA/IntelliJ guy. We also think our code should be written with legacy in mind. This means our code has to be readable in a certain way even years after (#1) no matter who wrote it and if that person even is in the company anymore.
Eclipse has couple handy features, automatic format on save and a specific Formatter tool. As you can see from the linked screenshot, it can be configured with XML. Thus there's a bunch of premade XML:s available for every worker in our company so that when a new guy comes in, we walk him through of the whole process and configure their Eclipse for them (yes, it's slightly evil thing to do) so that it actually uses those formatting XML:s we have provided. We do not enforce automatic format on save, we don't want to be completely intrusive, we just want to push all our developers into the right directions. For even increased compatability, we mostly use rules defined in JCC.
Next comes the important part, the actual builds. We are those who embrace automatic builds and for that we use Hudson Continuous Integration Server. There's two important parts in our configurations beyond this:
We use CVS loginfo to trigger builds whenever something is committed.
We utilize several plugins available for Hudson, including Continuous Integration Game in conjuction with the most important one, Checkstyle.
The Checkstyle Plugin is the magician in our code style enforcement guide line:
After commiting code to CVS, Hudson build is triggered
After build has been completed succesfully (all unit tests pass etc.), Checkstyle inspects the actual source files
Checkstyle ranks the code based rules we have defined for it
Continuous Integration Game sees the result of Checkstyle and awards/takes away points for the person who has the ownership for the relevant part of the code
Leaderboard shows total points for every commiter in system
Basically this means that when anyone commits ugly code into our CVS, our build server automatically reduces that person's points.
This means that eventually any one of us can be ranked on the Leaderboard based on the general code quality in both look and OO principles such as Law of Demeter, Cyclomatic complexity etc. etc. Naturally this isn't a completely serious statistic, but it's a good indication you're doing something wrong when causing a build to be initiated in our CI won't reduce your points - most of our commits are worth between 1 and 5 points.
And is it working? Sort of, I don't think anyone of us at my work writes ugly or unmaintainable code and personally I love to hunt all kinds of scores so it's definitely motivating me to make code that looks nice and follows all the OO paradigms I know of.
And do we as a company really need it? I think we do as you should see from reading this entire answer, it can be considered a good practice for the advancements it brings.
#1: in a related note, I refactored legacy code from 2002 today which used those standards, didn't look "bad" at all even in its original form and certainly not worse in its new form

No, not really.
If you can actually get it to work consistently and not make it flag code has changed due to a different style of laying the code out.
However, this is just a small part of coding standards. It won't cover multiple return statements, the use or not of ternary operators, etc.

It is always nice if the coding style that the shop uses is the same one that is also followed by the development tools.
Otherwise, if there is a large body of code that already follows a shop standard which is NOT the same as that of the tools you have two choices:
Modify all of the code to follow the tool standard, or
Maintain the existing shop standard.
Many shops do the latter. Regardless, there does need to be some kind of standard, and it does need to be followed.
Some development tools allow you to tweak their standard. In some cases you may be able to bring the tools in alignment with the shop standard.

It probably doesn't matter that much anymore if you can ensure that everybody in the team sees the source code "correctly" formatted, whatever they think it is. However I've not seen a system that can do that - you can do parts of it (say, reformat before and after checkin/checkout) but these days you also have to consider web interfaces into the version control, external code review systems that interact directly with the version control system etc.
The main purpose of a standard code style is (IMHO) to ensure that you can read other team members' code easily without having to start reverse engineering it because all the code is written using the same sort of guiding principles. Indenting and parentheses placement seem to be a major hangup on this but they are only a very small and in my opinion, somewhat overblown and not very important part of the need to make code consistent.
Unfortunately I'm not aware of any tools that can automatically apply consistent coding principles to source code...

Yes, coding styles are needed if there is a desire to have a homogeneous code base. Such a code base can be useful in preventing individual ownership of parts of the code base, which can cause problems when people leave the team. If you can't imagine having wildly different styles and problems understanding all of it, just look at all the different ways English text can be organized in various communications, all written but quite different such as tweets, e-mail, text messages, IM, message board posts, etc. and changes in fonts, capitalization, decorations, etc.

Using reflection for code gen?

I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actual generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approch at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?

Its pretty unclear what you are doing, but what does seem clear is that you have some base line code, and based on some its properties, you want to generate more code.
So the key issue here are, given the base line code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) into the same execution enviroment as the reflection user code. The problem with reflection is it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. IF all the code generation you want to do can be done with just that, well, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
In fact, the only artifact from which truly arbitrary code properties can be extracted is the the source code as a character string (how else could you answer, is the number of characters between the add operator and T in middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, ponter analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of question about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that", "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get relatively easy way to say what you want backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code,
carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read in the base line code into ASTs, apply analyses to determine properties of interesting, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code langauges, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the
DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other langauges.
(I'm the architect of DMS, so I have a rather biased view. YMMV).

Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.

You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.

From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programatically analyze your existing c# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.

I'm not sure of the best way to do this, but you could do this
As a post-build step on your base dll, run the code generator
As another post-build step, run csc or msbuild to build the generated dll
Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.