If you are planning to write a highly parallel application in C#, is it better to build things very small, like:
#1: 20 small classes, composed into 40 larger classes, which together make up 60 more, for a total of about 120 classes,
or gigantic, like:
#2: writing those 60 classes individually (still with reusability in mind)?
So in #2 those 60 classes contain methods that do the work themselves, rather than delegating to other classes.
Abstractly, neither one of those approaches would make a difference.
Concretely, minimizing mutable state will make your application more parallelizable. Every time you change the state of an instance of your object, you create the potential for thread safety issues (either complexity or bugs; choose at least one). If you look at Parallel LINQ or at functional languages emphasizing parallelism, you'll notice that class design matters less than the discipline of avoiding changes in state.
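For instance, a query that only reads its inputs and produces new values can be parallelized almost mechanically. A rough sketch with PLINQ (the numbers and the scoring function are made up for illustration):

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        var inputs = Enumerable.Range(1, 1_000_000);

        // No shared mutable state: each element is transformed independently,
        // so PLINQ can split the work across cores safely.
        var results = inputs
            .AsParallel()
            .Select(n => Score(n))   // pure function: reads input, returns a new value
            .Where(s => s > 0.5)
            .ToArray();

        Console.WriteLine(results.Length);
    }

    // Hypothetical pure scoring function, used only for illustration.
    static double Score(int n) => Math.Sin(n) * Math.Sin(n);
}
```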
Class design is for your sanity. Loosely coupled code makes you more sane. Immutable objects make you more parallel. Combine as needed.
Smaller pieces are easier to test, easier to refactor, and easier to maintain.
It's not the size of the classes, but the scope of the coupling that matters.
For parallel applications, you should favor immutable objects (sometimes called "value objects") over objects with a lot of mutable properties. If you need to apply operations that produce new values, just create new objects as the result.
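As a minimal sketch (the Money type and its fields are invented for illustration, using a C# record for brevity), an immutable value object never changes after construction; "modifying" it means returning a new instance:

```csharp
using System;

public sealed record Money(decimal Amount, string Currency)
{
    // Operations return new instances instead of mutating this one,
    // so a Money value can be shared freely across threads.
    public Money Add(Money other)
    {
        if (other.Currency != Currency)
            throw new InvalidOperationException("Currency mismatch.");
        return this with { Amount = Amount + other.Amount };
    }
}
```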
Observe good separation of concerns, and let that lead you to the natural number of classes to represent the concepts in your program. I recommend the SOLID principles, cataloged and popularized by Robert Martin from ObjectMentor. (That should be enough Google-fodder to locate the list!)
Finally, I also recommend that you get intimate with both System.Threading and System.Collections. Most of the collections are not inherently thread safe, and synchronization is notoriously difficult to get right. So, you're better off using widely-used, tested, reliable synchronization primitives.
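As an illustration of why that matters, here is a hedged sketch of the two usual options, assuming the scenario is several threads counting occurrences of keys: either guard a plain Dictionary with a lock, or use a collection that is already thread safe (ConcurrentDictionary, from System.Collections.Concurrent):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

class CountingSketch
{
    // Option 1: a plain dictionary protected by an explicit lock.
    private readonly Dictionary<string, int> _counts = new();
    private readonly object _gate = new();

    public void AddLocked(string key)
    {
        lock (_gate)
        {
            _counts.TryGetValue(key, out var n);
            _counts[key] = n + 1;
        }
    }

    // Option 2: a concurrent collection that handles the synchronization for you.
    private readonly ConcurrentDictionary<string, int> _concurrentCounts = new();

    public void AddConcurrent(string key) =>
        _concurrentCounts.AddOrUpdate(key, 1, (_, n) => n + 1);
}
```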
Related
Quite an academic question this - I've had it remarked that a class I've written in a WCF service is very long (~3000 lines) and it should be broken down into smaller classes.
The scope of the service has grown over time and the methods it contains perform many similar functions, hence my not creating multiple smaller classes up until now. I've no problem with doing so (other than the time it'll take!), but it got me thinking: is there a significant performance overhead in using a single large class instead of multiple smaller classes? If so, why?
It won't make any noticeable difference. Before even thinking about such extreme micro-optimization, you should think about maintainability, which is quite endangered with a class of about 3000 LOC.
Write your code first such that it is correct and maintainable. Only if you then really run into performance problems, you should first profile your application before making any decisions about optimizations. Usually performance bottlenecks will be found somewhere else (lack of parallelization, bad algorithms etc.).
No, having one large class should not affect performance. Splitting a large class into smaller classes could even reduce performance slightly, since you will have more indirections. However, the impact is negligible in almost all cases.
The purpose of splitting a class into smaller parts is not to improve performance but to make it easier to read, modify and maintain the code. But this alone is enough reason to do it.
Performance considerations are the last of your worries when it comes to the decision to add a handful of well-designed classes over a single source file. Think more of:
Maintainability: it's hard to make point fixes in so much code.
Readability: if you have to page up and down like a fiend to get anywhere, it's not readable.
Reusability: no decomposition makes things difficult to reuse.
Cohesion: if you're doing too many things in a single class, it's probably not cohesive in any way.
Testability: good luck unit testing a 3,000 LoC bunch of spaghetti code to any sensible level of coverage.
I could go on, but the mentality of large single source files seems to hark back to the VB/procedural programming era. Nowadays, I start to get the fear if a method has a cyclomatic complexity of more than 15 or a class has more than a couple of hundred lines in it.
Usually I find that if I refactor one of these 10k line of code behemoths, the sum total of the lines of code of the new classes ends up being 40% of the original if not less. More classes and decomposition (within reason) lead to less code. Counterintuitive at first, but it really works.
The real issue is not performance overhead. The overhead is in maintainability and reuse. You may have heard of the SOLID principles of object-oriented design, a number of which imply smaller classes are better. In particular, I'd look at the Single Responsibility Principle, the Open/Closed Principle and the Liskov Substitution Principle, and... actually, come to think of it, they all pretty much imply smaller classes are better, albeit indirectly.
This stuff is not easy to 'get'. If you've been programming with an OO language a while you look at SOLID and it suddenly makes so much sense. But until those lightbulbs come on it can seem a bit obscure.
On a far simpler note, having several classes, with one file per class, each one sensibly named to describe the behaviour, where each class has a single job, has to be easier to manage from a pure sanity perspective than a long page of 3,000 lines.
And then consider if one part of your 3,000 line class might be useful in another part of your program... putting that functionality in a dedicated class is an excellent way of encapsulating it for reuse.
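For example (the names are invented purely for illustration), pulling a formatting responsibility out of a large service class gives you something like this, which the rest of the program can now use on its own:

```csharp
// Before: formatting logic buried somewhere inside a 3,000-line service class.
// After: a small, single-purpose class that any part of the program can reuse.
public sealed class ReceiptFormatter
{
    public string Format(string customerName, decimal total) =>
        $"Dear {customerName}, your order total is {total:C}. Thank you!";
}

public sealed class OrderService
{
    private readonly ReceiptFormatter _formatter = new();

    public string CompleteOrder(string customerName, decimal total)
    {
        // ...the rest of the order-handling logic stays here...
        return _formatter.Format(customerName, total);
    }
}
```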
In essence, as I write, I'm finding I'm just teasing out aspects of SOLID anyway. You'd probably be best to read straight from the horse's mouth on this.
I wouldn't say there are performance issues, but rather maintenance and readability issues. It's far easier to modify several classes that each serve a single purpose than to work with a single monstrous class. That's just ridiculous; you're breaking all the OOP principles by doing so.
hence my not creating multiple smaller classes up until now. I've no problem with doing so
Precisely the case I've been warning about multiple times on SO already... People are afraid of premature optimization, but they are not afraid of writing bad code with the idea that "I'll fix it later when it becomes an issue". Let me tell you something: a 3000+ LOC class IS already an issue, no matter the performance impact, if any.
It depends on how the class is used and how often it is instantiated. When a class is instantiated once, e.g. a contract service class, the performance overhead is typically not significant. When a class is instantiated often, it could reduce performance.
But in this case don't think about performance, think about design. Better to think about support, further development and testability. Classes of 3K LOC are huge and are typically a catalogue of anti-patterns. Such classes lead to code duplication and bugs, further development will be painful, already-fixed bugs will reappear again and again, and the code is fragile.
So the class definitely should be refactored.
I've been playing around with the specification pattern to handle and contain the business logic in our C#/MVC application. So far so good. I do have a question though: since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say, creating helper methods to handle the business logic? Thanks!
I do have a question though: since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say, creating helper methods to handle the business logic?
Of course it will affect performance; every line of code you write and every design choice you make affects performance in one way or another. This one is unlikely to be meaningful, to be a bottleneck in your application, or to be worth caring about, as this is almost surely a case of premature optimization. These days you should just focus on modeling your domain properly and writing extremely clear and maintainable code. Focus more on developer productivity than on machine productivity. CPU cycles are cheap and in nearly limitless supply; developer cycles are not cheap, and are not limitless in supply.
But only you can know if it will impact the real-world use of your application on real-world data by profiling. We don't, and can't know, because we don't know your domain, don't know your users, don't know what performance you expect, etc. And even if we knew those things, we still couldn't give you as powerful of an answer as you can give yourself by dusting a profiler off the shelf and seeing what your application actually does.
since we'll be creating a number of specification objects on the heap, will that affect performance in any way
Most design patterns trade off some overhead for cleanliness of design - this is no exception. In general, the amount of memory that the specifications add is very minimal (typically a couple of references, and that's it). In addition, they tend to add a couple of extra method calls vs. custom logic.
That being said, I would not try to prematurely optimize this. The overhead here is incredibly small, so I would highly doubt it would be noticeable in any real world application.
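To make the "couple of references" point concrete, a bare-bones specification (this is a generic sketch, not any particular library's API) is usually just a small object wrapping a predicate:

```csharp
using System;

// A minimal specification: one object, one delegate reference.
public sealed class Spec<T>
{
    private readonly Func<T, bool> _predicate;

    public Spec(Func<T, bool> predicate) => _predicate = predicate;

    public bool IsSatisfiedBy(T candidate) => _predicate(candidate);

    // Composition adds one more small object per combinator, nothing else.
    public Spec<T> And(Spec<T> other) =>
        new(c => IsSatisfiedBy(c) && other.IsSatisfiedBy(c));
}

// Hypothetical usage: specifications are reusable and often held in static
// fields, so they are allocated once rather than per request.
public static class OrderSpecs
{
    public static readonly Spec<decimal> LargeOrder = new(total => total > 1000m);
}
```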
If you use the NSpecifications lib just as in the examples on its GitHub page, you'll get the benefits of both worlds:
Most of these specifications are simply stored in static members, therefore they don't take much from the heap.
These specifications also use compiled expressions, so that they can be reused many times with better performance.
If you are using an ORM to query the database with lambda expressions, that also uses the heap; the difference here is that NSpecifications stores those expressions inside a Spec object so that they can be reused for both business logic and querying.
Check here: https://github.com/jnicolau/NSpecifications
I have recently been looking at code, specifically component-oriented code that uses threads internally. Is this a bad practice? The code I looked at was from an F# example that showed the use of event-based programming techniques. I cannot post the code in case of copyright infringement, but it does spin up a thread of its own. Is this regarded as bad practice, or is it acceptable that code not written by yourself has full control of thread creation? I should point out that this code is not a visual component and is very much "built from scratch".
What are the best practices for component creation where threading would be helpful?
I am completely language agnostic on this; the F# example could have been in C# or Python.
I am concerned about the lack of control over the component's run time and its hogging of resources. The example just implemented another thread, but as far as I can see there is nothing stopping this type of design from spawning as many threads as it wishes, up to the limit of what your program allows.
I did think of methods such as object injection and so forth, but threads are weird, as from a component perspective they are pure "action" as opposed to "model, state, declarations".
Any help would be great.
This is too general a question to bear any answer more specific than "it depends" :-)
There are cases when using internal threads within a component is completely valid, and there are cases when not. This has to be decided on a case by case basis. Overall, though, since threads do make the code much more difficult to test and maintain, and increase the chances of subtle, hard to find bugs, they should be used with caution, only when there is a really decisive reason to use them.
An example to the legitimate use of threads is a worker thread, where a component handling an event starts an action which takes a long time to execute (such as a lengthy computation, a web request, or extensive file I/O), and spawns a separate thread to do the job, so that the control can be immediately returned to the interface to handle further user input. Without the worker thread, the UI would be totally unresponsive for a long time, which usually makes users angry.
Another example is a lengthy calculation/process which lends itself well to parallel execution, i.e. it consists of many smaller independent tasks of more or less similar size. If there are strong performance requirements, it does indeed make sense to execute the individual tasks in a concurrent fashion using a pool of worker threads. Many languages provide high level support for such designs.
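A hedged sketch of that second case in C# (the work items and their count are invented for illustration); the framework's thread pool does the scheduling, so the component never has to manage raw threads itself:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelWorkSketch
{
    static void Main()
    {
        int[] workItems = Enumerable.Range(0, 100).ToArray();
        double[] results = new double[workItems.Length];

        // Many small, independent tasks of similar size: a good fit for
        // Parallel.For, which runs them on pooled threads. Each iteration
        // writes only to its own slot, so there is no shared mutable state.
        Parallel.For(0, workItems.Length, i =>
        {
            results[i] = ExpensiveComputation(workItems[i]);
        });

        Console.WriteLine(results.Sum());
    }

    // Hypothetical stand-in for a lengthy, CPU-bound calculation.
    static double ExpensiveComputation(int n) => Math.Sqrt(n) * Math.PI;
}
```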
Note that components are generally free to allocate and use any other kinds of resources too and thus wreak havoc in countless other ways - are you ever worried about a component eating up all memory, exhausting the available file handles, reserving ports etc.? Many of these can cause much more trouble globally within a system than spawning extra threads.
There's nothing wrong about creating new threads in a component/library. The only thing wrong would be if it didn't give the consumer of the API/component a way to synchronize whenever necessary.
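One common way to give the consumer that control is to expose the internal work as a Task, so callers can await it, compose it, or pass in their own CancellationToken. A rough sketch (the component and method names are made up):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical component that does its work on a background thread-pool
// thread but lets the caller decide when (and whether) to wait for it.
public sealed class IndexBuilder
{
    public Task BuildAsync(CancellationToken cancellationToken = default) =>
        Task.Run(() =>
        {
            // ...long-running indexing work goes here...
            cancellationToken.ThrowIfCancellationRequested();
        }, cancellationToken);
}

// Consumer side: the caller, not the component, decides the synchronization policy.
// await new IndexBuilder().BuildAsync(cts.Token);
```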
First of all, what is the nature of the component you are talking about? Is it a DLL to be consumed by some other code? What does it do? What are the business requirements? All of these are essential to determine whether you need to worry about parallelism or not.
Second of all, threading is just a tool to achieve better performance and responsiveness, so avoiding it at all costs everywhere does not sound like a smart approach; threading is certainly vital for some business needs.
Third of all, when comparing threading semantics in C# vs F#, you have to remember that those are very different beasts in themselves. F# implicitly makes threading safer to code, as there is no notion of global mutable variables, hence critical sections in your code are easier to avoid in F# than in C#. That puts you, as a developer, in a better place because you don't have to deal with memory blocks, locks, semaphores, etc.
I would say that if your 'component' relies heavily on threading, you might want to consider either using Parallel FX in C# or even going with F#, since it approaches processor time slicing and parallelism in a more elegant way (IMHO).
And last but not least, regarding hogging computer resources by using threading in your component: please remember that threads do not necessarily impose a higher resource impact per se. You can just as easily do the same damage on one thread if you don't dispose of your (unmanaged) objects properly, although granted, you might get an OutOfMemoryException faster when you make the same mistake on several threads.
In which situations are CERs (constrained execution regions) useful? I mean real-life situations, not abstract examples.
Do you personally use them? I haven't seen them used except in examples in books and articles. That, for sure, may be because of my insufficient programming experience, so I am also interested in how widespread a technique it is.
What are the pros and cons for using them?
In which situations are CERs useful? I mean, real-life situations, not some abstract examples.
When building software that has stringent reliability requirements. Database servers, for example, must not leak resources, must not corrupt internal data structures, and must keep running, period, end of story, even in the face of godawful scenarios like thread aborts.
Building managed code that cannot possibly leak, that maintains consistent data structures when aborts can happen at arbitrary places, and keeps the service going is a difficult task. CERs are one of the tools in the toolbox for building such reliable services; in this case, by restricting where aborts can occur.
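A hedged sketch of the classic shape on the .NET Framework (the handle field and scenario are hypothetical; the point is only the mechanics): PrepareConstrainedRegions marks the following try/finally so that the finally block is eagerly prepared and thread aborts are delayed until it completes.

```csharp
using System.Runtime.CompilerServices;
using Microsoft.Win32.SafeHandles;

class CerSketch
{
    // Hypothetical handle we must never leak, even on a thread abort.
    private readonly SafeFileHandle someHandle;

    public CerSketch(SafeFileHandle handle) => someHandle = handle;

    public void UseHandleReliably()
    {
        bool acquired = false;
        // Marks the following try/finally so the finally block is JIT-compiled
        // up front and aborts are postponed until it finishes.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            // Code here can still be interrupted by an abort...
            someHandle.DangerousAddRef(ref acquired);
            // ...use the raw handle value here...
        }
        finally
        {
            // ...but this block runs as a CER, so the release is guaranteed.
            if (acquired)
                someHandle.DangerousRelease();
        }
    }
}
```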
One can imagine other services that must stay reliable in difficult circumstances. Software that, say, finds efficient routes for ambulances, or moves robot arms around in factories, has higher reliability constraints than your average end user code running on a desktop machine.
Do you personally use them?
No. I build compilers that run on end-user machines. If the compiler fails halfway through a compilation, that's unfortunate but it is not likely to have a human life safety impact or result in the destruction of important data.
I am also interested in how widespread a technique it is.
I have no idea.
What are the pros and cons for using them?
I don't understand the question. You might as well ask what the pros and cons of a roofing hatchet are; unless you state the task that you intend to use the hatchet for, it's hard to say what the pros and cons of the tool are. What task do you wish to perform with CERs? Once we know the task we can describe the pros and cons of using any particular tool to accomplish that task.
I am getting two contradicting views on this. One source says there should be fewer small methods, to reduce the number of method calls, but another source says writing shorter methods is good because it lets the JIT do its optimizations.
So, which side is correct?
The overhead of actually making the method call is inconsequentially small in most every case. You never need to worry about it unless you can clearly identify a problem down the road that requires revisiting the issue (you won't).
It's far more important that your code is simple, readable, modular, maintainable, and modifiable. Methods should do one thing, one thing only and delegate sub-things to other routines. This means your methods should be as short as they can possibly be, but not any shorter. You will see far more performance benefits by having code that is less prone to error and bugs because it is simple, than by trying to outsmart the compiler or the runtime.
The source that says methods should be long is wrong, on many levels.
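As a small illustration of why the call overhead rarely matters: the JIT will usually inline tiny methods like the one below on its own, so the generated code often ends up identical to the "hand-inlined" version anyway (AggressiveInlining is only a hint, shown here for completeness; the example itself is invented):

```csharp
using System.Runtime.CompilerServices;

static class PriceMath
{
    // Small and simple: the JIT is normally free to inline this at call sites,
    // which removes the call overhead entirely.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static decimal WithTax(decimal net, decimal rate) => net * (1 + rate);
}

// Call site: reads like a method call, but typically compiles down to the
// multiplication itself.
// decimal gross = PriceMath.WithTax(100m, 0.2m);
```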
Neither; you should have relatively short methods to achieve readability.
There is no one simple rule about function size. The guideline should be a function should do 'one thing'. That's a little vague but becomes easier with experience. Small functions generally lead to readability. Big ones are occasionally necessary.
Worrying about the overhead of method calls is premature optimization.
As always, it's about finding a good balance. The most important thing is that the method does one thing only. Longer methods tend to do more than one thing.
The best single criterion to guide you in sizing methods is to keep them well-testable. If you can (and actually DO!-) thoroughly unit-test every single method, your code is likely to be quite good; if you skimp on testing, your code is likely to be, at best, mediocre. If a method is difficult to test thoroughly, then that method is likely to be "too big" -- trying to do too many things, and therefore also harder to read and maintain (as well as badly-tested and therefore a likely haven for bugs).
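For instance, a small, single-purpose method is trivially testable; a sketch with an invented helper and xUnit-style tests:

```csharp
using Xunit;

public static class TextUtil
{
    // Small, single-purpose: easy to reason about and to test exhaustively.
    public static string Truncate(string value, int maxLength) =>
        value.Length <= maxLength ? value : value.Substring(0, maxLength);
}

public class TextUtilTests
{
    [Fact]
    public void Truncate_ShortensLongStrings() =>
        Assert.Equal("abc", TextUtil.Truncate("abcdef", 3));

    [Fact]
    public void Truncate_LeavesShortStringsAlone() =>
        Assert.Equal("ab", TextUtil.Truncate("ab", 3));
}
```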
First of all, you should definitely not be micro-optimizing performance at the number-of-methods level. You will most likely not get any measurable performance benefit from it. Only if you have some method that is being called in a tight loop millions of times might it be worth considering, and even then, don't begin optimizing for that before you need to.
You should stick to short, concise methods that do one thing and make the intent of the method clear. This will give you easier-to-read code that is easier to understand and promotes code reuse.
The most important cost to consider when writing code is maintainability. You will spend much, much more time maintaining an application and fixing bugs than you ever will fixing performance problems.
In this case, the almost certainly insignificant cost of calling a method is incredibly small when compared to the cost of maintaining a large, unwieldy method. Small, concise methods are easier to maintain and comprehend. Additionally, the cost of calling the method almost certainly will not have a significant performance impact on your application, and if it does, you can only ascertain that by using a profiler. Developers are notoriously bad at identifying performance problems beforehand.
Generally speaking, once performance problems are identified, they are easy to fix. Making a method, or more importantly a code base, maintainable is a much higher cost.
Personally, I am not afraid of long methods, as long as the person writing them writes them well (every sub-task separated by two newlines and preceded by a nice comment, etc.; indentation is also very important).
In fact, many times I even prefer them (e.g. when writing code that does things in a specific order with sequential logic).
Also, I really don't understand why breaking a long method into 100 pieces will improve readability (as others suggest). Quite the opposite: you will only end up jumping all over the place and holding pieces of code in your memory just to get a complete picture of what is going on in your code. Combine that with a possible lack of comments, bad function names, and many similar function names, and you have the perfect recipe for chaos.
Also, you could go to the other extreme while trying to reduce the size of the methods: create MANY classes and MANY functions, each of which may take MANY parameters. I don't think this improves readability either (especially for a beginner on a project who has no clue what each class/method does).
And the demand that "a function should do 'one thing'" is very subjective. 'One thing' may be anything from incrementing a variable by one up to doing a ton of work, all supposedly for the 'same thing'.
My only rule is reusability:
The same code should not appear many times in many places. If it does, you need a new function.
All the rest is just philosophical talk.
To the question "why do you make your methods so big?" I reply, "why not, if the code is simple?".