A C# programmer rewrote a Delphi 6 program (no GUI, just files-in-files-out grinding; about 50 procedures and functions totaling fewer than 1,200 lines, roughly 57 KB of source) that lives in a single .DPR file.
He delivered a project containing 58 files (52 of them .cs files) in 13 folders nested to various degrees, totaling over 330 KB.
Is that typical of C# projects? What strategy do C# programmers typically use to decide how to chop up and organize their project?
Code-file size is a horrible metric to determine the worth of a project, especially in line-of-business projects. Three reasons for that:
1) Small code files are easier to understand than large ones, but this leads to some repetition of certain constructs (using directives, namespace declarations, etc.; see the sketch after this list) and certainly adds to the number of files in the project.
2) Small classes are easier to understand than large ones. This is a major benefit for newcomers to the code. If they can wrap their head around any one class, they can expand their understanding outward from there.
3) Good code is bigger than terse code. When you add decent error checking, documentation, and descriptive method/variable names, your code is more resilient and maintainable, but also much larger. That's perfectly okay.
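To make point 1 concrete, here is a sketch (every name here is invented for illustration) of the boilerplate that even a trivial one-class file repeats; multiply it by 50+ files and the byte count climbs quickly:

    // Illustrative one-class file: the scaffolding nearly outweighs the logic.
    using System;                              // using directives, repeated per file

    namespace CompanyName.ProjectName.Orders   // namespace block, repeated per file
    {
        /// <summary>Holds one parsed order line.</summary>
        public class OrderLine
        {
            public string Sku { get; set; }
            public int Quantity { get; set; }
        }
    }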
Now with all that said, of course there are plenty of cases where the code is big simply because the programmer doesn't know what they're doing. You'll be able to identify that by looking at the largest files: if you see a lot of repetition of precisely the same code, or lots and lots of string concatenation, or no comments at all (or comments that don't tell you anything useful), then you probably have some good old-fashioned code bloat on your hands.
It's more an artifact of the developer using the Visual Studio IDE (VS) than an issue with C#/.NET itself. The tendency, when using VS tools, is to put each class in its own .cs file, because the Solution Explorer window shows files/folders in a tree-like structure, allowing the programmer to visually locate their classes quickly.
Also the Visual Studio Add New Item dialog encourages a one-class-per-file approach by generating a new file each time you add a Class to your project.
The namespace hierarchy of a program is usually mirrored by the directory folders in Solution Explorer (although it's not required to match), but this is just another visual convenience.
Example: [Solution Explorer screenshot showing project folders mirroring the namespace hierarchy]
(source: spaanjaars.com)
If the programmer were to work outside of the Visual Studio environment you'd likely have much less diarrhea on your hands. Ewww...
Without seeing the actual code of the original or the rewrite, I can't tell you whether the new organization is properly designed just from the line count, method count, and file size. In C# I usually:
Separate each class and interface into its own file.
Group static helper methods by function.
Separate files into folders by layer, e.g. GUI layer, business logic layer, etc.
Separate extension methods by the class or interface they relate to, or sometimes by function (see the sketch below).
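As a sketch of the last point (all names here are invented for illustration), an extension-method file grouped by the type it extends might look like this:

    // StringExtensions.cs -- extension methods for string, kept in one file.
    namespace CompanyName.ProjectName.Extensions
    {
        public static class StringExtensions
        {
            // True when the string is null, empty, or only whitespace.
            public static bool IsBlank(this string value) =>
                string.IsNullOrWhiteSpace(value);
        }
    }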
Now, the new code could be broken up to follow a more object oriented design, but I can't tell without seeing the code.
Delphi is a great language, but it's not a magical language. So no, what you are seeing is not a typical scenario.
Without knowing anything about what your program does, it's hard to impossible to make any meaningful comment about why your programmer decided to a) rewrite it and b) produce such a disparity when he did.
I will say this, though: it's common that when developers don't understand someone else's source, and especially when they do understand the requirements, they will choose to rewrite rather than refactor. It's something we see in this industry time and time again.
Related
Coming from Java, I'm used to the package structure (com.domain.appname.tier).
Now I've started working on a C# project, where all the projects have a depth of 1:
i.e.
ProjectA
- Utilities.cs
- Validation.cs
- ....
- Extraction.cs
and all the .cs files are around 2,500 lines long...
How do you order your classes and namespaces in C# so they make sense and keep source files at a logical size?
The same way as I'd imagine you do in Java:
A few (< 10?) classes in each namespace, with namespaces arranged in a hierarchy
One class per source file
One or two screenfuls of text per source file
The project you've joined doesn't sound very structured and isn't a good example of good source code organisation.
In a similar way as in Java; you just need to make some effort :) Some C# developers, especially those with a VB background, tend to write looooong classes and put them at the top level.
I would suggest reading Microsoft guidelines on the subject:
Design Guidelines for Developing Class Libraries
In particular you should look at the following section:
Guidelines for Names
Even if you are not writing a class library, you may still benefit a lot from these guidelines. FxCop (or Code Analysis, as it is now named) will flag many constructs that are not in accordance with them.
I would first start grouping the classes together into areas of functionality; classes around authorisation, for example, would go under a folder within the project.
Then update the namespaces of the classes in the folder to reflect the change; ReSharper does this for you, and newer versions of VS probably do too.
Lastly (if you are able) I would start to break the classes into smaller, more manageable pieces.
Here's an example of how I organize my solutions, which mirrors the namespace structure.
The project has a default namespace which, in this case, is CompanyName.ProjectName
Source files are organized logically into a directory structure. In the example, my WF4 activity designers are organized under Activities in a folder called Designers.
The way VS works is that, as you create directories in a project, you are also creating namespaces. So, if I were to add a new activity designer called "Foo" in the shown directory, its namespace would be
"CompanyName.ProjectName.Activities.Designers"
Visual Studio takes the default namespace, then uses the folder structure to determine the namespace for a particular file. Of course, once a file is created, moving it doesn't automatically refactor its namespace. But the system works very well not only for controlling the namespaces of classes but also for keeping files organized.
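Under those rules, the "Foo" designer from the example would come out something like this (only the class name and folders come from the example; the rest is a sketch):

    // File: CompanyName.ProjectName\Activities\Designers\Foo.cs
    // Visual Studio seeds the namespace from <default namespace> + <folder path>.
    namespace CompanyName.ProjectName.Activities.Designers
    {
        public class Foo   // a WF4 activity designer in the original example
        {
        }
    }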
The same way as you would in Java.
In Java, packages organize classes into physical directories; I'm not sure about this, but IIRC the compiler even encourages that convention. In C# you're not obliged to organize your classes into separate directories that match your namespaces, but it's a very common convention.
Speaking of namespaces in C#, they do not follow the com.domain.appname.tier convention, but use the Company.Product.Tier format.
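For example (the company and product names here are placeholders):

    // Java:  package com.contoso.billing.dataaccess;
    // C# equivalent, following the Company.Product.Tier convention:
    namespace Contoso.Billing.DataAccess
    {
        public class InvoiceRepository { }
    }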
How to reorganize large classes depends on the application. This is an exercise in applying OOP guidelines and applies to both Java and C#.
If you are deeply engaged in the project, I recommend investing some time in redesigning the structure the way you're used to in Java, considering that packages are equivalent to namespaces in C#.
If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?
It feels like the standard coding-style debates that have raged since programming began, like "put the brace on the following line" or "properly indent your parentheses", are no longer essential.
I realize that in languages where whitespace matters the diff will have to consider it, but for languages where style is a personal preference, is there really a need to worry about it anymore?
Auto-format can really only address whitespace.
It won't address developers giving variables bizarre, nonsensical names.
It won't address some developers having functions return null on an error versus throwing an exception (see the sketch below).
I'm sure others can think of more examples.
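For instance, a formatter would leave both of these hypothetical error-handling styles untouched, even though they impose very different obligations on callers:

    using System;
    using System.Collections.Generic;

    public class UserDirectory
    {
        private readonly Dictionary<int, string> users = new Dictionary<int, string>();

        // Style A: returns null on a missing key; callers must remember to check.
        public string FindUserOrNull(int id) =>
            users.TryGetValue(id, out var name) ? name : null;

        // Style B: throws on a missing key; callers must be ready to catch.
        public string GetUser(int id) =>
            users.TryGetValue(id, out var name)
                ? name
                : throw new KeyNotFoundException($"No user with id {id}.");
    }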
This is what we do at my work:
We all use Eclipse. We don't have a policy requiring Eclipse, but somehow none of us is an IDEA/IntelliJ guy. We also think our code should be written with legacy in mind: it has to stay readable in a certain way even years later (#1), no matter who wrote it and whether that person is even still in the company.
Eclipse has a couple of handy features: automatic format on save, and a specific Formatter tool. As you can see from the linked screenshot, it can be configured with XML, so there's a bunch of premade XML configurations available to every worker in our company. When a new guy comes in, we walk him through the whole process and configure his Eclipse for him (yes, it's a slightly evil thing to do) so that it actually uses the formatting XML we provide. We do not enforce automatic format on save; we don't want to be completely intrusive, we just want to nudge all our developers in the right direction. For increased compatibility, we mostly use the rules defined in JCC.
Next comes the important part: the actual builds. We embrace automatic builds, and for that we use the Hudson Continuous Integration Server. There are two important parts in our configuration beyond this:
We use CVS loginfo to trigger builds whenever something is committed.
We utilize several plugins available for Hudson, including the Continuous Integration Game in conjunction with the most important one, Checkstyle.
The Checkstyle plugin is the magician in our code-style enforcement guideline:
After committing code to CVS, a Hudson build is triggered.
After the build has completed successfully (all unit tests pass, etc.), Checkstyle inspects the actual source files.
Checkstyle ranks the code based on the rules we have defined for it.
The Continuous Integration Game sees Checkstyle's result and awards or takes away points from the person who owns the relevant part of the code.
A leaderboard shows the total points for every committer in the system.
Basically this means that when anyone commits ugly code into our CVS, our build server automatically reduces that person's points.
This means that eventually any one of us can be ranked on the leaderboard based on general code quality, in both appearance and OO principles such as the Law of Demeter, cyclomatic complexity, etc. Naturally this isn't a completely serious statistic, but since most of our commits are worth between 1 and 5 points, it's a good indication you're doing something wrong when a build you trigger in our CI doesn't earn you points.
And is it working? Sort of. I don't think any of us at my work writes ugly or unmaintainable code, and personally I love hunting all kinds of scores, so it definitely motivates me to write code that looks nice and follows all the OO paradigms I know of.
And do we as a company really need it? I think we do; as you should see from reading this entire answer, it can be considered a good practice for the advantages it brings.
#1: on a related note, I refactored legacy code from 2002 today which used those standards; it didn't look "bad" at all even in its original form, and certainly no worse in its new form.
No, not really.
Well, provided you can actually get it to work consistently and not have it flag code as changed merely because of a different layout style.
However, this is just a small part of coding standards. It won't cover things like multiple return statements, the use (or not) of ternary operators, etc.
It is always nice if the coding style that the shop uses is the same one that is also followed by the development tools.
Otherwise, if there is a large body of code that already follows a shop standard which is NOT the same as that of the tools, you have two choices:
Modify all of the code to follow the tool standard, or
Maintain the existing shop standard.
Many shops do the latter. Regardless, there does need to be some kind of standard, and it does need to be followed.
Some development tools allow you to tweak their standard. In some cases you may be able to bring the tools in alignment with the shop standard.
It probably doesn't matter much anymore if you can ensure that everybody on the team sees the source code "correctly" formatted, whatever they think that is. However, I've not seen a system that can do that. You can do parts of it (say, reformat before and after checkin/checkout), but these days you also have to consider web interfaces into the version control system, external code-review systems that interact directly with it, and so on.
The main purpose of a standard code style is (IMHO) to ensure that you can read other team members' code easily, without having to reverse-engineer it, because all the code is written with the same guiding principles. Indenting and parenthesis placement seem to be a major hang-up here, but they are only a small and, in my opinion, somewhat overblown part of making code consistent.
Unfortunately I'm not aware of any tools that can automatically apply consistent coding principles to source code...
Yes, coding styles are needed if there is a desire for a homogeneous code base. Such a code base can help prevent individual ownership of parts of the code, which causes problems when people leave the team. If you can't imagine wildly different styles making things harder to understand, just look at all the different ways English text is organized in various written communications: tweets, e-mail, text messages, IM, message-board posts, etc., with their changes in fonts, capitalization, decoration, and so on.
I'm asking this because I find it quite a dangerous feature to distribute a class definition so that you can't really be sure you know all of it. Even if I find three partial definitions, how do I know there isn't a fourth somewhere?
I'm new to C# but have spent 10 years with C++; maybe that's why I'm shaken up?
Anyway, the "partial" concept must have some great benefit, which I'm obviously missing. I would love to learn more about the philosophy behind it.
EDIT: Sorry, missed this duplicate when searching for existing posts.
Partial classes are handy when using code generation. If you want to modify a generated class (rather than inheriting from it) then you run the risk of losing your changes when the code is regenerated. If you are able to define your extra methods etc in a separate file, the generated parts of the class can be re-created without nuking your hand-crafted code.
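A minimal sketch of that split (the file names and members are invented):

    // Customer.Designer.cs -- regenerated by the tool; any edits here are lost.
    public partial class Customer
    {
        public string Name { get; set; }
    }

    // Customer.cs -- hand-written; survives regeneration untouched.
    public partial class Customer
    {
        public string Greeting() => $"Hello, {Name}!";
    }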
The great benefit is to hide computer generated code (by the designer).
Eric Lippert has a recent blog post about the partial keyword in general.
Another usage could be to give nested classes their own file.
Another point: when a class implements multiple interfaces, you can split the interface implementations across different files, so that each code file contains only the code belonging to one interface implementation. It's in keeping with the separation-of-concerns concept.
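For example (a sketch; the type and its members are placeholders), each part can even declare just the interface it implements:

    // Widget.cs -- the core state.
    public partial class Widget
    {
        public int Id { get; set; }
    }

    // Widget.IComparable.cs -- only the comparison logic lives here.
    public partial class Widget : System.IComparable<Widget>
    {
        public int CompareTo(Widget other) => Id.CompareTo(other?.Id ?? 0);
    }

    // Widget.IDisposable.cs -- only the cleanup logic lives here.
    public partial class Widget : System.IDisposable
    {
        public void Dispose()
        {
            // release any held resources here
        }
    }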
Two people editing the same class, and auto-generated designer code, are the two immediate problems I can see that partial classes and methods solved.
Having designer-generated code in a separate file is a lot easier to work with compared to .NET 1.1, where your code could often be mangled by Visual Studio (in Windows Forms).
Visual Studio still makes a mess of syncing the designer file, code behind and design file with ASP.NET.
If you have some absurdly large class that for some reason you are unable, or not allowed, to logically break apart into smaller classes, then you can at least physically break it into multiple files in order to work with it more effectively. Essentially, you can view small chunks at a time and avoid scrolling up and down.
This might apply to legacy code that, due to some arcane policy, you may not touch because of numerous and entrenched dependencies on the existing API.
Not necessarily the best use of partial classes, but it certainly gives you an alternative way to organize code you might not otherwise be able to modify.
Maybe it's too late, but please let me add my 2 cents too:
When working on large projects, spreading a class over separate files allows multiple programmers to work on it simultaneously.
You can easily write your own code (for extended functionality) alongside a VS.NET-generated class. This allows you to write the code you need without messing with the system-generated code.
I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?
It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code and, based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running (well, at least loaded) in the same execution environment as the reflecting code. The problem with reflection is that it only provides a very limited set of properties: typically lists of classes, methods, and perhaps the names of arguments. If all the code generation you want to do can be done with just that, then reflection seems fine. But if you want more detailed properties of the code, reflection won't cut it.
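As a rough sketch of the kind of surface-level facts reflection gives you (the assembly path is hypothetical):

    using System;
    using System.Reflection;

    class ReflectionSurvey
    {
        static void Main()
        {
            // Load the already-built library and list its public shape.
            Assembly lib = Assembly.LoadFrom("MyLibrary.dll"); // hypothetical path
            foreach (Type type in lib.GetExportedTypes())
            {
                Console.WriteLine(type.FullName);
                foreach (MethodInfo m in type.GetMethods())
                    Console.WriteLine("    " + m.Name);   // names and signatures, but no bodies
            }
        }
    }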
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer: is the number of characters between the add operator and the T in the middle of the variable name a prime number?). As a practical matter, the properties you can get from raw character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 60 years figuring out how to extract interesting program properties, and you'd be a complete idiot to ignore what they've learned in that time.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of questions about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible, and what-type. If you have CFGs, you can answer questions about "this happens before that" or "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a given point in the code. Reflection will never provide this, IMHO, because it will always be limited to what the runtime-system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support.)
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generating machine code that they don't offer standard answers. The guys who do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and to provide "rewrite" rules that say, in effect, "when you see this pattern, replace it with that pattern under this condition".
By connecting the conditions to the various property-extracting mechanisms from the compiler world, you get a relatively easy way to say what you want, backed by those decades of experience. Such program transformation systems can read in source code, carry out analyses and transformations, and regenerate the code afterwards.
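To make the pattern/rewrite idea concrete without DMS's own rule syntax, here is a toy C# sketch of the rule "e + 0 -> e" applied over a tiny expression AST (everything here is invented for illustration):

    // Toy AST and a single rewrite rule; real transformation systems express
    // the same idea as source-level patterns rather than hand-written matches.
    abstract record Expr;
    record Num(int Value) : Expr;
    record Var(string Name) : Expr;
    record Add(Expr Left, Expr Right) : Expr;

    static class Rewriter
    {
        // Applies "e + 0 -> e" bottom-up over the tree.
        public static Expr Simplify(Expr e) => e switch
        {
            Add(var l, Num(0)) => Simplify(l),
            Add(var l, var r)  => new Add(Simplify(l), Simplify(r)),
            _ => e
        };
    }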
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine the properties of interest, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and pretty-print a wide variety of source languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV.)
Have you considered using T4 templates for the code generation? T4 seems to be getting much more publicity and attention now, with better support in VS2010.
This tutorial seems database-centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ ; in addition, there was a recent Hanselminutes episode on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.
You may wish to use CodeDom, so that you only have to build once.
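For instance, here is a minimal CodeDom sketch that builds a type in memory and emits C# source (the generated class and field are placeholders):

    using System;
    using System.CodeDom;
    using System.CodeDom.Compiler;
    using System.IO;
    using Microsoft.CSharp;

    class CodeDomDemo
    {
        static void Main()
        {
            // Describe a namespace containing one class with one field.
            var unit = new CodeCompileUnit();
            var ns = new CodeNamespace("Generated");
            var cls = new CodeTypeDeclaration("Greeter") { IsClass = true };
            cls.Members.Add(new CodeMemberField(typeof(string), "name"));
            ns.Types.Add(cls);
            unit.Namespaces.Add(ns);

            // Emit the C# source for the unit we just described.
            using (var provider = new CSharpCodeProvider())
            using (var writer = new StringWriter())
            {
                provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
                Console.WriteLine(writer.ToString());
            }
        }
    }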
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.
From what I understand, you could use something like the Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me, though, and CCI apparently only fully supports C# language spec 2. A better strategy may be to streamline your existing method instead.
I'm not sure of the best way to do this, but you could:
As a post-build step on your base DLL, run the code generator.
As another post-build step, run csc or msbuild to build the generated DLL.
Anything that depends on the generated DLL must also depend on the base DLL, so the build order remains correct.
Since yesterday, I have been analyzing one of our projects with NDepend (free for most of its features), and the more I use it, the more I doubt the real value of this type of software (code-analysis software).
Let me explain. The tool builds a report about the health of the system, ranking classes by every metric. I thought it would be a good starting point for modifications, but most of the top results are there simply because the classes have over 100 lines (we have big headers and we use VS-style comments), so it's not a big deal... Then the Afferent Coupling (CA) level is flagged as always too high, and this is almost always true for interfaces, which we use a lot... so at this moment I don't see anything wrong, but NDepend seems not to like it (if you have suggestions for improving that, tell me, because I don't see the need). It's the same thing for the NOC metric (Number of Children): most of my interfaces score too high.
For the moment, the only really useful metric has been cyclomatic complexity...
My question is: do you find it worth it to analyze code with an automatic code analyzer like NDepend? If yes, how do you filter out all the information I mentioned that doesn't really reflect the true health of the system?
Actually, metrics are just one feature of NDepend. Did you try VisualNDepend, which lets you analyze your project in much more depth than the report? From your comments, I am almost sure you didn't play with the NDepend UI (standalone or integrated into Visual Studio), which is the best way to filter data about your code base.
I am one of the developers of NDepend and we use it a lot to analyze our own code. Basically we write our own quality rules with Code Rules over LINQ Queries (CQLinq). These rules automatically make sure that we don't have regression on our design. Here you'll find the list of around 200 default code rules.
Here are some unique features of NDepend and not related to code metrics:
Write CQLinq rules to make sure we don't have architectural flaws, such as dependency cycles between our components, the UI using the DB directly, or the DB entangled with the business objects (see the sketch after this list).
Make sure we don't have problems with code coverage by tests (for example, a CQLinq rule ensures that if a class is supposed to be 100% covered, it remains 100% covered in the future).
Enforce side-effect-free code (immutable classes/pure methods).
Use the ability to compare two analyses to review code changes since the last release before shipping a new one. More specifically, I enjoy using NDepend to know which methods have been added or refactored since the last release and are not 100% covered by tests.
Keep an optimal level of encapsulation for all our members and types (e.g., knowing which internal methods can be declared private). This is also related to dead-code detection, which NDepend supports as well.
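As a flavour of what a CQLinq rule looks like (quoted from memory of NDepend's documentation, so treat the exact identifiers and threshold as approximate):

    // <Name>Avoid methods too complex</Name>
    warnif count > 0
    from m in JustMyCode.Methods
    where m.CyclomaticComplexity > 20
    select new { m, m.CyclomaticComplexity }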
For a complete list of NDepend's features, see here.
I don't necessarily see NDepend results as "good" or "bad" in software-engineering terms; there's always a reason why an application is designed the way it is. I see it as a report that can help me point out issues with my design, but I have the final word when deciding whether a method needs to be refactored or is good the way I designed it. In general, don't get too caught up trying to decide whether it's worth it; it definitely is. Instead, I would suggest you carefully review the results. This will help you view your design from another perspective, and there may be occasions where you decide the way you designed it is the best way to achieve your application's goals.