Specification Pattern and Performance - c#

I've been playing around w/ the specification pattern to handle and contain the business logic in our c#/mvc application. So far so good. I do have a question though - since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say creating helper methods to handle the business logic? Thanks!

I do have a question though - since we'll be creating a number of specification objects on the heap, will that affect performance in any way versus, say creating helper methods to handle the business logic?
Of course it will affect performance, every line of code you write and design choice you makes affects performance in one way or another. This one is unlikely to be meaningful, be a bottleneck in your application or be worth caring about as this is almost surely a case of premature optimization. These days you should just focus on modeling your domain properly, and writing extremely clear and maintainable code. Focus more on developer productivity than on machine productivity. CPU cycles are cheap, and in nearly limitless supply. Developer cycles are not cheap, and are not limitless in supply.
But only you can know if it will impact the real-world use of your application on real-world data by profiling. We don't, and can't know, because we don't know your domain, don't know your users, don't know what performance you expect, etc. And even if we knew those things, we still couldn't give you as powerful of an answer as you can give yourself by dusting a profiler off the shelf and seeing what your application actually does.

since we'll be creating a number of specification objects on the heap, will that affect performance in any way
Most design patterns trade off some overhead for cleanliness of design - this is no exception. In general, the amount of memory that the specifications add is very minimal (typically a couple of references, and that's it). In addition, they tend to add a couple of extra method calls vs. custom logic.
That being said, I would not try to prematurely optimize this. The overhead here is incredibly small, so I would highly doubt it would be noticeable in any real world application.

If you use NSpecifications lib just as the examples in its GitHub page, you'll get the benefits from both worlds:
Most of these specifications are simply stored in static members therefore it doesn't take much from the heap
These specifications also use compiled expressions so that they can be reused many times with better performance
If you are using ORM to query the database with lambda expressions, that also uses the heap, the difference here is that NSpecifications stores those expressions inside a Spec object so that it can be reused for both business loginc and querying.
Check here
https://github.com/jnicolau/NSpecifications

Related

Performance overhead of large class size in c#

Quite an academic question this - I've had it remarked that a class I've written in a WCF service is very long (~3000 lines) and it should be broken down into smaller classes.
The scope of the service has grown over time and the methods contained contain many similar functions hence me not creating multiple smaller classes up until now so I've no problem with doing so (other than the time it'll take to do so!), but it got me thinking - is there a significant performance overhead in using a single large class instead of multiple smaller classes? If so, why?
It won't make any noticeable difference. Before even thinking about such extreme micro-optimization, you should think about maintainability, which is quite endangered with a class of about 3000 LOC.
Write your code first such that it is correct and maintainable. Only if you then really run into performance problems, you should first profile your application before making any decisions about optimizations. Usually performance bottlenecks will be found somewhere else (lack of parallelization, bad algorithms etc.).
No, having one large class should not affect performance. Splitting a large class into smaller classes could even reduce performance as you will have more redirections. However, the impact is negligible in almost all cases.
The purpose of splitting a class into smaller parts is not to improve performance but to make it easier to read, modify and maintain the code. But this alone is enough reason to do it.
Performance considerations are the last of your worries when it comes to the decision to add a handful of well designed classes over a single source file. Think more of:
Maintainability... It's hard to make point fixes in so much code.
Readability... If a you have to page up and down like a fiend to
get anywhere, it's not readable.
Reusability... No decomposition
makes things difficult to reuse.
Cohesion... If you're doing too
many things in a single class, it's probably not cohesive in any way.
Testability... Good luck unit testing a 3,000 LoC bunch of
spaghetti code to any sensible level of coverage.
I could go on, but the mentality of large single source files seems to hark back to the VB/Procedureal programing era. Nowadays, I start to get the fear if a method has a cyclomatic complexity of more than 15 and a class has more than a couple of hundred lines in it.
Usually I find that if I refactor one of these 10k line of code behemoths, the sum total of the lines of code of the new classes ends up being 40% of the original if not less. More classes and decomposition (within reason) lead to less code. Counterintuitive at first, but it really works.
The real issue is not performance overhead. The overhead is in maintainability and reuse. You may have hard of the SOLID principles of Object Oriented design, a number of which imply smaller classes are better. In particular, I'd look at the Single Responsibility Principle, the Open/Closed Principle and the Liskov Substitution Principle, and... actually, come to think of it they all pretty much imply smaller classes are better, albeit indirectly.
This stuff is not easy to 'get'. If you've been programming with an OO language a while you look at SOLID and it suddenly makes so much sense. But until those lightbulbs come on it can seem a bit obscure.
On a far simpler note, having several classes, with one file per class, each one sensibly named to describe the behaviour, where each class has a single job, has to be easier to manage from a pure sanity perspective than a long page of 3,000 lines.
And then consider if one part of your 3,000 line class might be useful in another part of your program... putting that functionality in a dedicated class is an excellent way of encapsulating it for reuse.
In essence, as I write, I'm finding I'm just teasing out aspects of SOLID anyway. You'd probably be best to read straight from the horses mouth on this.
I wouldn't say there are performance issues, but rather maintenance a readability issues. It's far more easier to modify more classes that each perform its purpose than to work with a single monstrous class. That's just ridiculous. You're breaking all OOP principles by doing so.
hence me not creating multiple smaller classes up until now so I've no
problem with doing so
Precisely the case I've been warning of multiple times at SO already... People are afraid of premature optimization, but they are not afraid of writing a bad code with an idea like "I'll fix it later when it becomes an issue". Let me tell you something - 3000+ LOC class IS already an issue, no matter the performance impacts, if any.
It depends on how class is used and how often is instantiated. When class is instantiated once, e.g. contract service class, than performance overhead typical is not significant.
When class will be instantiated often, than it could reduce performance.
But in this case think not about performance, think about it design. Better to think about support and further development and testability. Classes of 3K LOC are huge and typically books of anti-patterns. Such classes are leading to code duplication and bugs, further development will be painful and causes already fixed bugs appear again and again, code is fragile.
So class definitely should be refactored.

In managed code, what should I look after to keep good performance?

I am originally a native C++ programmer, in C++ every process in your program is bound to your code, i.e, nothing happens unless you want it to happen. And every bit of memory is allocated (and deallocated) according to what you wrote. So, performance is all your responsibility, if you do good, you get great performance.
(Note: Please don't complain about the code one haven't written himself such as STL, it's a C++ unmanaged code after all, that is the significant part).
But in managed code, such as code in Java and C#, you don't control every process, and memory is "hidden", or not under your control, to some extent. And that makes performance something relatively unknown, mostly you fear bad performance.
So my question is: What issues and Bold Lines should I look after and keep in mind to achieve a good performance in managed code?
I could think only of some practices such as:
Being aware of boxing and unboxing.
Choosing the correct Collection that best suites your needs and has the lowest operation cost.
But these never seem to be enough and even convincing! In fact perhaps I shouldn't have mentioned them.
Please note I am not asking for a C++ VS C# (or Java) code comparing, I just mentioned C++ to explain the problem.
There is no single answer here. The only way to answer this is: profile. Measure early and often. The bottlenecks are usually not where you expect them. Optimize the things that actually hurt. We use mvc-mini-profiler for this, but any similar tool will work.
You seem to be focusing on GC; now, that can sometimes be an issue, but usually only in specific cases; for the majority of systems the generational GC works great.
Obviously external resources will be slow; caching may be critical: in odd scenarios with very-long-lived data there are tricks you can do with structs to avoid long GEN-2 collects; serialization (files, network, etc), materialization (ORM), or just bad collection/algorithn choice may be the biggest issue - you cannot know until you measure.
Two things though:
make sure you understand what IDisposable and "using" mean
don't concatenate strings in loops; mass concatenation is the job of StringBuilder
Reusing large objects is very important in my experience.
Objects on the large object heap are implicitly generation 2, and thus require a full GC to clean up. And that's expensive.
The main thing to keep in mind with performance with managed languages is that your code can change structure at runtime to be better optimized.
For example the default JVM most people use is Sun's Hotspot VM which will actually optimize your code as it runs by converting parts of the program to native code, in-lining on the fly and other optimizations (such as the CLR or other managed runtimes) which you will never get using C++.
Additionally Hotspot will also detect which parts of you're code are used the most and optimize accordingly.
So as you can see optimising performance on a managed system is slightly harder than on an un-managed system because you have an intermediate layer that can make code faster without your intervention.
I am going to invoke the law of premature optimization here and say that you should first create the correct solution then, if performance becomes an issue, go back and measure what is actually slow before attempting to optimize.
I would suggest understanding better garbage collection algorithms. You can find good books on that matter, e.g. The Garbage Collection Handbook (by Richard Jones, Antony Hosking, Eliot Moss).
Then, your question is practically related to particular implementation, and perhaps even to a specific version of it. For instance, Mono used (e.g. in version 2.4) to use Boehm's garbage collector, but now uses a copying generational one.
And don't forget that some GC techniques can be remarkably efficient. Remember A.Appel's old paper Garbage Collection can be faster than stack allocation (but today, the cache performance matters much much more, so details are different).
I think that being aware of boxing (& unboxing) and allocation is enough. Some compilers are able to optimize these (by avoiding some of them).
Don't forget that GC performance can vary widely. There are good GCs (for your application) and bad ones.
And some GC implementations are quite fast. For example the one inside Ocaml
I would not bother that much: premature optimization is evil.
(and C++ memory management, even with smart pointers, or with ref-counters, can often be viewed as a poor man's garbage collection technique; and you don't have full control on what C++ is doing -unless you re-implement your ::operator new using operating system specific system calls-, so you don't really know a priori its performance)
.NET Generics don't specialize on reference types, which severely limits how much inlining can be done. It may (in certain performance hotspots) make sense to forgo a generic container type in favor of a specific implementation that will be better optimized. (Note: this doesn't mean to use .NET 1.x containers with element type object).
you must :
using large objects is very important in my experience.
Objects on the large object heap are implicitly generation 2, and thus require a full GC to clean up. And that's expensive.

Using Constrained Execution Regions

In which situations are CERs useful? I mean, real-life situations, not some abstract examples.
Do you personally use them? Haven't seen their usage except for examples in books and articles. That, for sure, can be because of my insufficient programming experience. So I am also interested how wide-spread technique it is.
What are the pros and cons for using them?
In which situations are CERs useful? I mean, real-life situations, not some abstract examples.
When building software that has stringent reliability requirements. Database servers, for example, must not leak resources, must not corrupt internal data structures, and must keep running, period, end of story, even in the face of godawful scenarios like thread aborts.
Building managed code that cannot possibly leak, that maintains consistent data structures when aborts can happen at arbitrary places, and keeps the service going is a difficult task. CERs are one of the tools in the toolbox for building such reliable services; in this case, by restricting where aborts can occur.
One can imagine other services that must stay reliable in difficult circumstances. Software that, say, finds efficient routes for ambulances, or moves robot arms around in factories, has higher reliability constraints than your average end user code running on a desktop machine.
Do you personally use them?
No. I build compilers that run on end-user machines. If the compiler fails halfway through a compilation, that's unfortunate but it is not likely to have a human life safety impact or result in the destruction of important data.
I am also interested how wide-spread technique it is.
I have no idea.
What are the pros and cons for using them?
I don't understand the question. You might as well ask what the pros and cons of a roofing hatchet are; unless you state the task that you intend to use the hatchet for, it's hard to say what the pros and cons of the tool are. What task do you wish to perform with CERs? Once we know the task we can describe the pros and cons of using any particular tool to accomplish that task.

Should a programmer really care about how many and/or how often objects are created in .NET?

This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working in a big MMO project written in Java (server side). My main task is to optimize memory comsumption and CPU usage. Hundreds of thousands of messages per second are being sent and the same amount of objects are created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? if it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes and the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
var envelope = new Envelope( _a.Length, 0, 1, 1 );
Combine( _a, _b, _operation, envelope );
return envelope;
}
// Combines two envelopes and the result is 'written' to the specified envelope
public static void Combine( Envelope a, Envelope b, Envelope result )
{
result.Clear();
...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't be worrying about this kind of issues, but my experience with Java and common sense tells me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- then some optimizaition is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it it less testable and less maintainable. Additionally, you still have to deal with issues whenever you use core library calls.
Still, it might be worth a look-see in your case.
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of you application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
Finally, be sure to find a good memory profiler.
I have the same thought all the time.
The truth is, though we were taught to watch out for unnecessary CPU tacts and memory consumption, the cost of little imperfections in our code just negligible in practice.
If you are aware of that and always watch, I believe you are okay to write not perfect code. If you have started with .NET/Java and have no prior experience in low level programming, the chances are you will write very abusive and ineffective code.
And anyway, as they say, the premature optimization is the root of all evil. You can spend hours optimizing one little function and it turns out then that some other part of code gives a bottleneck. Just keep balance doing it simple and doing it stupidly.
Although the Garbage Collector is there, bad code remains bad code. Therefore I would say yes as a .Net developer you should still care about how many objects you create and more importantly writing optimized code.
I have seen a considerable amount of projects get rejected because of this reason in Code Reviews inside our environment, and I strongly believe it is important.
.NET Memory management is very good and being able to programmatically tweak the GC if you need to is good.
I like the fact that you can create your own Dispose methods on classes by inheriting from IDisposable and tweaking it to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaking that way. There is also the worry of cleaning up too early as well.
You might consider writing a set of object caches. Instead of creating new instances, you could keep a list of available objects somewhere. It would help you avoid the GC.
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code-reviews debating over finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and actively profile your application using a tool. Bottlenecks will appear and you address as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I suggest the following blog post by Andrew Hunter on .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.

C#, Writing longer method or shorter method?

I am getting two contradicting views on this. Some source says there should be less little methods to reduce method calls, but some other source says writing shorter method is good for letting the JIT to do the optimization.
So, which side is correct?
The overhead of actually making the method call is inconsequentially small in most every case. You never need to worry about it unless you can clearly identify a problem down the road that requires revisiting the issue (you won't).
It's far more important that your code is simple, readable, modular, maintainable, and modifiable. Methods should do one thing, one thing only and delegate sub-things to other routines. This means your methods should be as short as they can possibly be, but not any shorter. You will see far more performance benefits by having code that is less prone to error and bugs because it is simple, than by trying to outsmart the compiler or the runtime.
The source that says methods should be long is wrong, on many levels.
None, you should have relatively short method to achieve readability.
There is no one simple rule about function size. The guideline should be a function should do 'one thing'. That's a little vague but becomes easier with experience. Small functions generally lead to readability. Big ones are occasionally necessary.
Worrying about the overhead of method calls is premature optimization.
As always, it's about finding a good balance. The most important thing is that the method does one thing only. Longer methods tend to do more than one thing.
The best single criterion to guide you in sizing methods is to keep them well-testable. If you can (and actually DO!-) thoroughly unit-test every single method, your code is likely to be quite good; if you skimp on testing, your code is likely to be, at best, mediocre. If a method is difficult to test thoroughly, then that method is likely to be "too big" -- trying to do too many things, and therefore also harder to read and maintain (as well as badly-tested and therefore a likely haven for bugs).
First of all, you should definitely not be micro-optimizing the performance on the number-of-methods level. You will most likely not get any measurable performance benefit. Only if you have some method that are being called in a tight loop millions of times, it might be an idea - but don't begin optimizing on that before you need it.
You should stick to short concise methods, that does one thing, that makes the intent of the method clear. This will give you easier-to-read code, that is easier to understand and promotes code reuse.
The most important cost to consider when writing code is maintanability. You will spend much, much more time maintaining an application and fixing bugs than you ever will fixing performance problems.
In this case the almost certainly insignificant cost of calling a method is incredibly small when compared to the cost of maintaining a large unwieldy method. Small concise methods are easier to maintain and comprehend. Additionally the cost of calling the method almost certainly will not have a significant performance impact on your application. And if it does, you can only assertain that by using a profiler. Developers are notoriously bad at identifying performance problems before hand.
Generally speaking, once a performance problem is identified, they are easy to fix. Making a method or more importantly a code base, maintainable is a much higher cost.
Personally, I am not afraid of long methods as long as the person writing them writes them well (every piece of sub-task separated by 2 newlines and a nice comment preceeding it, etc. Also, identation is very important.).
In fact, many times I even prefer them (e.g. when writing code that does things in a specific order with sequential logic).
Also, I really don't understand why breaking a long method into 100 pieces will improve readablility (as others suggest). Only the opposite. You will only end-up jumping all over the place and holding pieces of code in your memory just to get a complete picture of what is going on in your code. Combine that with possible lack of comments, bad function names, many similar function names and you have the perfect recipe for chaos.
Also, you could go the other end while trying to reduce the size of the methods: to create MANY classes and MANY functions each of which may take MANY parameters. I don't think this improves readability either (especially for a begginer to a project that has no clue what each class/method do).
And the demand that "a function should do 'one thing'" is very subjective. 'One thing' may be increasing a variable by one up to doing a ton of work supposedly for the 'same thing'.
My rule is only reuseability:
The same code should not appear many times in many places. If this is the case you need a new function.
All the rest is just philosophical talk.
In a question of "why do you make your methods so big" I reply, "why not if the code is simple?".

Categories

Resources