Is it safe to generally assume that toString() has a low cost? - c#

Do you generally assume that toString() on any given object has a low cost (i.e. for logging)? I do. Is that assumption valid? If it has a high cost should that normally be changed? What are valid reasons to make a toString() method with a high cost? The only time that I get concerned about toString costs is when I know that it is on some sort of collection with many members.
From: http://jamesjava.blogspot.com/2007/08/tostring-cost.html
Update: Another way to put it is: Do you usually look into the cost of calling toString on any given class before calling it?

No it's not. Because ToString() can be overloaded by anyone, they can do whatever they like. It's a reasonable assumption that ToString() SHOULD have a low cost, but if ToString() accesses properties that do "lazy loading" of data, you might even hit a database inside your ToString().

The Java standard library seems to have been written with the intent of keeping the cost of toString calls very low. For example, Java arrays and collections have toString methods which do not iterate over their contents; to get a good string representation of these objects you must use either Arrays.toString or Collections.toString from the java.util package.
Similarly, even objects with expensive equals methods have inexpensive toString calls. For example, the java.net.URL class has an equals method which makes use of an internet connection to determine whether two URLs are truly equal, but it still has a simple and constant-time toString method.
So yes, inexpensive toString calls are the norm, and unless you use some weird third-party package which breaks with the convention, you shouldn't worry about these taking a long time.
Of course, you shouldn't really worry about performance until you find yourself in a situation where your program is taking too long, and even then you should use a profiler to figure out what's taking so longer rather than worrying about this sort of thing ahead of time.

The best way to find out is to profile your code. However, rather than worry that a particular function has a high overhead, it's (usually) better to worry about the correctness of your application and then do performance profiling on it (but be wary that real-world use and your test setup may differ radically). As it turns out, programmers generally guess wrong about what's really slow in their application and they often spend a lot of time optimizing things that don't need optimizing (eliminating that triple nested loop which only consumes .01% of your application's time is probably a waste).
Fortunately, there are plenty of open source profilers for Java.

Do you generally assume that toString() on any given object has a low cost? I do.
Why would you do that? Profile your code if you're running into performance issues; it'll save you a lot of time working past incorrect assumptions.

Your question's title uses the contradictory words "safe" and "generally." So even though in comments you seem to be emphasizing the general case, to which the answer is probably "yes, it's generally not a problem," a lot of people are seeing "safe" and therefore are answering either "No, because there's a risk of arbitrarily poor performance," or "No, because if you want to be 'safe' with a performance question, you must profile."

Since I generally only call toString() on methods and classes that I have written myself and overrode the base method, then I generally know what the cost is ahead of time. The only time I use toString() otherwise is error handling and or debugging when speed is not of the same importance.

My pragmatic answer would be: yes, you always assume a toString() call is cheap, unless you make an enormous amount of them. On the one hand, it is extremely unlikely that a toString() method would be expensive and on the other hand, it is extremely unlikely that you run into trouble if it isn't. I generally don't worry about issues like these, because there are too many of them and you won't get any code written if you do ;).
If you do run into performance issues, everything is open, including the performance of toString() and you should, as Shog9 suggest, simply profile the code. The Java Puzzlers show that even Sun wrote some pretty nasty constructors and toString() methods in their JDK's.

I think the question has a flaw. I wouldn't even assume toString() will print a useful piece of data. So, if you begin with that assumption, you know you have to check it prior to calling it and can assess it's 'cost' on a case by case basis.

Possibly the largest cost with naive toString() chaining is appending all those strings. If you want to generate large strings, you should use an underlying representation that supports an efficient append. If you know the append is efficient, then toString()s probably have a relatively low cost.
For example, in Java, StringBuilder will preallocate some space so that a certain amount of string appending takes linear time. It will reallocate when you run out of space.
In general, if you want to append sequences of things and you for whatever reason don't want to do something similar, you can use difference lists. These support linear time append by turning sequence appending into function composition.

toString() is used to represent an object as a String. So if you need slow running code to create a representation of an object, you need to be very careful and have a very good reason to do so. Debugging would be the only one I can think of where a slow running toString is acceptable.

My thought is:
Yes on standard-library objects
No on non-standard objects unless you have the source code in front of you and can check it.

I will always override toString to put in whatever I think I will need to debug problems. It it usually up to the developer to use it by either calling the toString method itself or having another class call it for you (println, logging, etc.).

There's an easy answer to this one, which I first heard in a discussion about reflection: "if you have to ask, you can't afford it."
Basically, if you need ToString() of large objects in the day-to-day operation of your program, then your program is crazy. Even if you need to ToString() an integer for anything time critical, your program is crazy, because it's obviously using a string where an integer would do.
ToString() for log messages is automatically okay, because logging is already expensive. If your program is too slow, turn down the log level! It doesn't really matter how slow it is to actually generate the debug messages, as long as you can choose to not generate them. (Note: your logging infrastructure should call ToString() itself, and only when the log message is supposed to be printed. Don't ToString() it by hand on the way into the log infrastructure, or you'll pay the price even if the log level is low and you won't be printing it after all! See http://www.colijn.ca/~caffeine/?m=200708#16 for more explanation of this.)

Since you put "generally" in your question, I would say yes. For -most- objects, there isn't going to be a costly ToString overload. There definitely can be, but generally there won't be.

In general I consider toString() low cost when I use it on simple objects, such as an integer or very simple struct. When applied to complex objects, however, toString() is a bit of crap shoot. There are two reason's for this. First, complex objects tend to contain other objects, so a single call to toString() can cascade into many calls to toString() on other objects plus the overehead of concatenating all those results. Second, there is no "standard" for converting complex objects to strings. One toString() call may yeild a single line of comma-separated values; another a much more verbose form. Only by checking it yourself can you know.
So my rule is toString() on simple objects is generally safe but on complex objects is suspect until checked.

I'd avoid using toString() on objects other than the basic types. toString() may not display anything useful. It may iterate over all the member variables and print them out. It may load something not yet loaded. Depending on what you plan to do with that string, you should consider not building it.
There are typically a few reasons why you use toString(): logging/debugging is probably the most common for random objects; display is common for certain objects (such as numbers). For logging I'd do something like
if(logger.isDebugEnabled()) {
logger.debug("The zig didn't take off. Response: {0}", response.getAsXML().toString());
}
This does two things: 1. Prevents constructing the string and 2. Prevents unnecessary string addition if the message won't be logged.

In general, I don't check every implementation. However, if I see a dependency on Apache commons, alarm bells go off and I look at the implementation more closely to make sure that they aren't using ToStringBuilder or other atrocities.

Related

c# handle return value and execute code

there is some code to execute after validation.
consider a variable SOQualityStandards = true;
this variable is validated before the execution of code.
i have come across two ways of checking SOQualityStandards
one is
if(SOQualityStandards)
{
//code to execute
}
and the other is
if(!SOQualityStandards) return;
//code to execute
is there any performance difference between both. which one should i consider.
They have the same semantics (assuming there is no other code in the function after the if-block in the first example).
I find the first to be clearer, but that is a matter of personal preference.
The compiler will consider those two options to be the same and could transform one into the other or visa versa, so performance considerations are irrelivent. Even if performence were affected, I think the readability/maintainability is a larger issue in these questions anyway.
I tend to do the return at the beginning in cases like this, because it reduces the indentations and mental burden in reading the rest of the method. The states tested in returns at the beginning become states that no longer need to be considered in understanding the method. Large if blocks, on the other hand, require mentally tracking the state differences throughout the method.
This becomes especially important if there are several tests that need to be done to guard an interior block of code.
There's no difference in the two approaches. It's a matter of personal choice.
Choosing between the two is just personal preference (see Sven's answer and bomslang's answer). Micro-optimization is in most cases completely unnecessary.
You shouldn't optimize execute-time until you see it's a problem. You could be spending that valuable time to add other functionality or to come up with improvements to the system architecture.
In the case you actually need to optimize, loops and recursive functions are generally the first place to look.
If you would need further optimization than that, single line variable checks and manipulations would still be some of the last things to optimize.
Remember (as Jeffrey said) readability and maintainability are in most cases the most important factors.

Using pre-defined instances

A few problem about instances are that Equals method doesn't provide true if even they contain same values. So trying to override Equals method provide much more slower than reference equality.
While i was thinking point of performance, i thought that it is stupidly to create 2 instance which is same but not same memory adress. If i can avoid to create same instances with different references will increase performance and help to compare references will be more easy than write custom Equal methods.
For example:
I have Coordinate class Which hold chess board coordinates.So i need only Coordinate[8,8] array for represent all cordinates of board. Instead of create instances, i can create all instances then my factory method can return them.
Cooardinate.Get(2,3) instead of new Coordinate(2,3)
First is static class's static method which returns pre-defined coordinate in given values.
Another advantage is that we will not spend time to create and collect garbage objects in memory. All of them is pre-defined already. Also we can provide unique GetHashCode for instances in easy and fast way like 0 for [0,0], 1 for [0,1]....
Isn't it worthy to try ? this idea would make coding harder write or understand ? Is there any such a pattern ?
Well, shortly what is disadvantages of this way ?
This is a good solution and in certain situations can save you a lot of time and memory. The main drawback is that it gets really complicated if your objects are mutable. If that is not the case, then this is really not that bad. You just have to make sure that all instances are obtained from the same factory. You do not even have to create all the instances in advance, but can make the class create a new instance when a particular set of parameters is requested for the first time (basically lazy-loading).
In this instance where you're dealing with a chess board (so 64 total coordinates) you don't need to be overly concerned with performance or garbage collection. Having said that, I think holding the baseline 64 coordinates in a dictionary is just fine (and makes sense). In terms of the Equals() comparison, you're basically comparing 2 integer values which is going to be lightning fast so overriding the Equals() method to specify that comparison is the right approach.
The short answer is that the disadvantage of this approach is that it increases the complexity of your code, thereby making it more difficult to code correctly, more difficult to debug, and more difficult to maintain.
Therefore, as with all optimisations, you should only contemplate it if you have a genuine need to optimise (e.g. it's running way too slow, or using way too much memory), and if the reward (faster performance, smaller memory usage) outweighs the risks (spending time optimising when you could be doing something more useful, introducing a bug, not being able to find a bug, not being being able to modify the code in the future quickly and easily, or without introducing a bug).
If you really were running into performance problems because of having to do deep comparisons in order to determine equality then you might want to look at the Flyweight pattern, but as you're only talking about 64 pairs of smallints I think that would be complete overkill in this case. What you actually need is to override the Equals operator to compare the two coordinates - that will be plenty fast enough. Anything more complex than that is probably a false optimisation, at least on most normal platforms.

Long lists of pass-by-ref parameters versus wrapper types

I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this but I hesitate to use this approach, mostly I think for reasons of aesthetics and psycholgical bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU intensive code. CPU cycles matter.
Is there a more elegant C#2 of C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bits implementations of .NET (and the upcomming 64-bits 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably to small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.

Hash codes for immutable types

Are there any considerations for immutable types regarding hash codes?
Should I generate it once, in the constructor?
How would you make it clear that the hash code is fixed? Should I? If so, is it better to use a property called HashCode, instead of GetHashCode method? Would there be any drawback to it? (Considering both would work, but the property would be recommend).
Are there any considerations for immutable types regarding hash codes?
Immutable types are the easiest types to hash correctly; most hash code bugs happen when hashing mutable data. The most important thing is that hashing and equality agree; if two instances compare as equal, they should have the same hash code. (The reverse is not necessarily true; two instances that have the same hash need not be equal.)
Should I generate it once, in the constructor?
That's a performance optimizing technique; by doing so, you trade increased consumption of space (for the storage of the computed value) for a possible decrease in time. I never make performance optimizations unless they are driven by realistic, customer-focused performance tests that carefully measure the performance of both options against documented goals. You should do this if your carefully-designed experiments indicate that (1) failure to do so causes you to miss your goal, and (2) doing so causes you to meet your goal.
How would you make it clear that the hash code is fixed?
I don't understand the question. A changing hash code is the exception, not the rule. Hash codes are always supposed to be unchanging. If the hash code of an object changes then the object can get "lost" in a hash table, so everyone should assume that hash codes remain stable.
is it better to use a property called HashCode, instead of GetHashCode method?
What consumer of your object is going to say "well, I could call GetHashCode(), a method guaranteed to be on all objects, but instead I'm going to call this HashCode getter that does exactly the same thing" ? Do you have such a consumer in mind?
If you don't have any consumers of functionality, then don't provide the functionality.
I wouldn't normally generate it in the constructor, but I'd also want to know more about the expected usage before deciding whether to cache it or not.
Are you expecting a small number of instances, which get hashed an awful lot and which take a long time to calculate the hash? If so, caching may be appropriate. If you're expecting a large number of potentially "throw-away" instances, I wouldn't bother caching.
Interestingly, .NET and Java made different choices for String in this respect - Java caches the hash, .NET doesn't. Given that many string instances are never hashed, and those which are hashed are often only hashed once (e.g. on insertion into the hash table) I think I favour .NET's decision here.
Basically you're trading memory + complexity against speed. As Michael says, test before making your code more complex. Of course in some cases (e.g. for a class library) you can't accurate predict the real-world usage, but in many situations you'll have a pretty good idea.
You certainly don't need a separate property though. Hash codes should always stay the same unless someone changes the state of the object - and if your type is immutable, you're already prohibiting that, therefore a user shouldn't expect any changes. Just override GetHashCode().
I would generate the hash code once when getHashCode is called the first time, then cache it for later calls. This avoids calling it in the constructor when it may not be needed.
If you don't expect to call getHashCode very many times for each value object, you may not need to cache the value at all.
Well, you've got to have a GetHashCode() overridden method, as that's how consumers are going to retrieve your hashcode. Most hashcodes are fairly simple arithmetic operations, that will execute quickly. Do you have a reason to believe that caching the results (which has a memory cost) will give you a noticeable performance improvement?
Start simple - generate the hashcode on the fly. If you think you'll see performance improvements caching it, test first.
Regulations require me to refer you to the "premature optimization is the root of all evil" quote at this point.
I know from my personal experience that developers are really good at misjudging performance issues.
So it it recommended to keep everything as simple as possible while calculating hash code on the fly in the GetHashCode().
Why do you need to make sure that the hashcode is fixed? The semantics of a hashcode are that it will always be the same value for any given state of an object. Since your objects are immutable, this is a given. How you choose to implement GetHashCode is us up to you.
Having it be a private field that is returned is one choice - it's small, easy, and fast.
In general, computing the HashCode should be fast. So caching should not be much of an optimization, and not worth the trouble.
If profiling really shows that GethashCode takes a significant amount of time then maybe you should cache it, as a fix.
But I wouldn't consider it part of the normal practice.

C#, Writing longer method or shorter method?

I am getting two contradicting views on this. Some source says there should be less little methods to reduce method calls, but some other source says writing shorter method is good for letting the JIT to do the optimization.
So, which side is correct?
The overhead of actually making the method call is inconsequentially small in most every case. You never need to worry about it unless you can clearly identify a problem down the road that requires revisiting the issue (you won't).
It's far more important that your code is simple, readable, modular, maintainable, and modifiable. Methods should do one thing, one thing only and delegate sub-things to other routines. This means your methods should be as short as they can possibly be, but not any shorter. You will see far more performance benefits by having code that is less prone to error and bugs because it is simple, than by trying to outsmart the compiler or the runtime.
The source that says methods should be long is wrong, on many levels.
None, you should have relatively short method to achieve readability.
There is no one simple rule about function size. The guideline should be a function should do 'one thing'. That's a little vague but becomes easier with experience. Small functions generally lead to readability. Big ones are occasionally necessary.
Worrying about the overhead of method calls is premature optimization.
As always, it's about finding a good balance. The most important thing is that the method does one thing only. Longer methods tend to do more than one thing.
The best single criterion to guide you in sizing methods is to keep them well-testable. If you can (and actually DO!-) thoroughly unit-test every single method, your code is likely to be quite good; if you skimp on testing, your code is likely to be, at best, mediocre. If a method is difficult to test thoroughly, then that method is likely to be "too big" -- trying to do too many things, and therefore also harder to read and maintain (as well as badly-tested and therefore a likely haven for bugs).
First of all, you should definitely not be micro-optimizing the performance on the number-of-methods level. You will most likely not get any measurable performance benefit. Only if you have some method that are being called in a tight loop millions of times, it might be an idea - but don't begin optimizing on that before you need it.
You should stick to short concise methods, that does one thing, that makes the intent of the method clear. This will give you easier-to-read code, that is easier to understand and promotes code reuse.
The most important cost to consider when writing code is maintanability. You will spend much, much more time maintaining an application and fixing bugs than you ever will fixing performance problems.
In this case the almost certainly insignificant cost of calling a method is incredibly small when compared to the cost of maintaining a large unwieldy method. Small concise methods are easier to maintain and comprehend. Additionally the cost of calling the method almost certainly will not have a significant performance impact on your application. And if it does, you can only assertain that by using a profiler. Developers are notoriously bad at identifying performance problems before hand.
Generally speaking, once a performance problem is identified, they are easy to fix. Making a method or more importantly a code base, maintainable is a much higher cost.
Personally, I am not afraid of long methods as long as the person writing them writes them well (every piece of sub-task separated by 2 newlines and a nice comment preceeding it, etc. Also, identation is very important.).
In fact, many times I even prefer them (e.g. when writing code that does things in a specific order with sequential logic).
Also, I really don't understand why breaking a long method into 100 pieces will improve readablility (as others suggest). Only the opposite. You will only end-up jumping all over the place and holding pieces of code in your memory just to get a complete picture of what is going on in your code. Combine that with possible lack of comments, bad function names, many similar function names and you have the perfect recipe for chaos.
Also, you could go the other end while trying to reduce the size of the methods: to create MANY classes and MANY functions each of which may take MANY parameters. I don't think this improves readability either (especially for a begginer to a project that has no clue what each class/method do).
And the demand that "a function should do 'one thing'" is very subjective. 'One thing' may be increasing a variable by one up to doing a ton of work supposedly for the 'same thing'.
My rule is only reuseability:
The same code should not appear many times in many places. If this is the case you need a new function.
All the rest is just philosophical talk.
In a question of "why do you make your methods so big" I reply, "why not if the code is simple?".

Categories

Resources