A few problem about instances are that Equals method doesn't provide true if even they contain same values. So trying to override Equals method provide much more slower than reference equality.
While i was thinking point of performance, i thought that it is stupidly to create 2 instance which is same but not same memory adress. If i can avoid to create same instances with different references will increase performance and help to compare references will be more easy than write custom Equal methods.
For example:
I have Coordinate class Which hold chess board coordinates.So i need only Coordinate[8,8] array for represent all cordinates of board. Instead of create instances, i can create all instances then my factory method can return them.
Cooardinate.Get(2,3) instead of new Coordinate(2,3)
First is static class's static method which returns pre-defined coordinate in given values.
Another advantage is that we will not spend time to create and collect garbage objects in memory. All of them is pre-defined already. Also we can provide unique GetHashCode for instances in easy and fast way like 0 for [0,0], 1 for [0,1]....
Isn't it worthy to try ? this idea would make coding harder write or understand ? Is there any such a pattern ?
Well, shortly what is disadvantages of this way ?
This is a good solution and in certain situations can save you a lot of time and memory. The main drawback is that it gets really complicated if your objects are mutable. If that is not the case, then this is really not that bad. You just have to make sure that all instances are obtained from the same factory. You do not even have to create all the instances in advance, but can make the class create a new instance when a particular set of parameters is requested for the first time (basically lazy-loading).
In this instance where you're dealing with a chess board (so 64 total coordinates) you don't need to be overly concerned with performance or garbage collection. Having said that, I think holding the baseline 64 coordinates in a dictionary is just fine (and makes sense). In terms of the Equals() comparison, you're basically comparing 2 integer values which is going to be lightning fast so overriding the Equals() method to specify that comparison is the right approach.
The short answer is that the disadvantage of this approach is that it increases the complexity of your code, thereby making it more difficult to code correctly, more difficult to debug, and more difficult to maintain.
Therefore, as with all optimisations, you should only contemplate it if you have a genuine need to optimise (e.g. it's running way too slow, or using way too much memory), and if the reward (faster performance, smaller memory usage) outweighs the risks (spending time optimising when you could be doing something more useful, introducing a bug, not being able to find a bug, not being being able to modify the code in the future quickly and easily, or without introducing a bug).
If you really were running into performance problems because of having to do deep comparisons in order to determine equality then you might want to look at the Flyweight pattern, but as you're only talking about 64 pairs of smallints I think that would be complete overkill in this case. What you actually need is to override the Equals operator to compare the two coordinates - that will be plenty fast enough. Anything more complex than that is probably a false optimisation, at least on most normal platforms.
Related
I find myself often with a situation where I need to perform an operation on a set of properties. The operation can be anything from checking if a particular property matches anything in the set to a single iteration of actions. Sometimes the set is dynamically generated when the function is called, some built with a simple LINQ statement, other times it is a hard-coded set that will always remain the same. But one constant always exists: the set only exists for one single operation and has no use before or after it.
My problem is, I have so many points in my application where this is necessary, but I appear to be very, very inconsistent in how I store these sets. Some of them are arrays, some are lists, and just now I've found a couple linked lists. Now, none of the operations I'm specifically concerned about have to care about indices, container size, order, or any other functionality that is bestowed by any of the individual container types. I picked resource efficiency because it's a better idea than flipping coins. I figured, since array size is configured and it's a very elementary container, that might be my best choice, but I figure it is a better idea to ask around. Alternatively, if there's a better choice not out of resource-efficiency but strictly as being a better choice for this kind of situation, that would be nice as well.
With your acknowledgement that this is more about coding consistency rather than performance or efficiency, I think the general practice is to use a List<T>. Its actual backing store is an array, so you aren't really losing much (if anything noticable) to container overhead. Without more qualifications, I'm not sure that I can offer anything more than that.
Of course, if you truly don't care about the things that you list in your question, just type your variables as IEnumerable<T> and you're only dealing with the actual container when you're populating it; where you consume it will be entirely consistent.
There are two basic principles to be aware of regarding resource efficiency.
Runtime complexity
Memory overhead
You said that indices and order do not matter and that a frequent operation is matching. A Dictionary<T> (which is a hashtable) is an ideal candidate for this type of work. Lookups on the keys are very fast which would be beneficial in your matching operation. The disadvantage is that it will consume a little more memory than what would be strictly required. The usual load factor is around .8 so we are not talking about a huge increase or anything.
For your other operations you may find that an array or List<T> is a better option especially if you do not need to have the fast lookups. As long as you are not needing high performance on specialty operations (lookups, sorting, etc.) then it is hard to beat the general resource characteristics of array based containers.
List is probably fine in general. It's easy to understand (in the literate programming sense) and reasonably efficient. The keyed collections (e.g. Dict, SortedList) will throw an exception if you add an entry with a duplicate key, though this may not be a problem for what you're working on now.
Only if you find that you're running into a CPU-time or memory-size problem should you look at improving the "efficiency", and then only after determining that this is the bottleneck.
No matter which approach you use, there will still be creation and deletion of the underlying objects (collection or iterator) that will eventually be garbage collected, if the application runs long enough.
I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this but I hesitate to use this approach, mostly I think for reasons of aesthetics and psycholgical bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU intensive code. CPU cycles matter.
Is there a more elegant C#2 of C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bits implementations of .NET (and the upcomming 64-bits 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably to small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.
This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working in a big MMO project written in Java (server side). My main task is to optimize memory comsumption and CPU usage. Hundreds of thousands of messages per second are being sent and the same amount of objects are created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? if it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes and the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
var envelope = new Envelope( _a.Length, 0, 1, 1 );
Combine( _a, _b, _operation, envelope );
return envelope;
}
// Combines two envelopes and the result is 'written' to the specified envelope
public static void Combine( Envelope a, Envelope b, Envelope result )
{
result.Clear();
...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't be worrying about this kind of issues, but my experience with Java and common sense tells me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- then some optimizaition is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it it less testable and less maintainable. Additionally, you still have to deal with issues whenever you use core library calls.
Still, it might be worth a look-see in your case.
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of you application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
Finally, be sure to find a good memory profiler.
I have the same thought all the time.
The truth is, though we were taught to watch out for unnecessary CPU tacts and memory consumption, the cost of little imperfections in our code just negligible in practice.
If you are aware of that and always watch, I believe you are okay to write not perfect code. If you have started with .NET/Java and have no prior experience in low level programming, the chances are you will write very abusive and ineffective code.
And anyway, as they say, the premature optimization is the root of all evil. You can spend hours optimizing one little function and it turns out then that some other part of code gives a bottleneck. Just keep balance doing it simple and doing it stupidly.
Although the Garbage Collector is there, bad code remains bad code. Therefore I would say yes as a .Net developer you should still care about how many objects you create and more importantly writing optimized code.
I have seen a considerable amount of projects get rejected because of this reason in Code Reviews inside our environment, and I strongly believe it is important.
.NET Memory management is very good and being able to programmatically tweak the GC if you need to is good.
I like the fact that you can create your own Dispose methods on classes by inheriting from IDisposable and tweaking it to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaking that way. There is also the worry of cleaning up too early as well.
You might consider writing a set of object caches. Instead of creating new instances, you could keep a list of available objects somewhere. It would help you avoid the GC.
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code-reviews debating over finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and actively profile your application using a tool. Bottlenecks will appear and you address as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I suggest the following blog post by Andrew Hunter on .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.
Are there any considerations for immutable types regarding hash codes?
Should I generate it once, in the constructor?
How would you make it clear that the hash code is fixed? Should I? If so, is it better to use a property called HashCode, instead of GetHashCode method? Would there be any drawback to it? (Considering both would work, but the property would be recommend).
Are there any considerations for immutable types regarding hash codes?
Immutable types are the easiest types to hash correctly; most hash code bugs happen when hashing mutable data. The most important thing is that hashing and equality agree; if two instances compare as equal, they should have the same hash code. (The reverse is not necessarily true; two instances that have the same hash need not be equal.)
Should I generate it once, in the constructor?
That's a performance optimizing technique; by doing so, you trade increased consumption of space (for the storage of the computed value) for a possible decrease in time. I never make performance optimizations unless they are driven by realistic, customer-focused performance tests that carefully measure the performance of both options against documented goals. You should do this if your carefully-designed experiments indicate that (1) failure to do so causes you to miss your goal, and (2) doing so causes you to meet your goal.
How would you make it clear that the hash code is fixed?
I don't understand the question. A changing hash code is the exception, not the rule. Hash codes are always supposed to be unchanging. If the hash code of an object changes then the object can get "lost" in a hash table, so everyone should assume that hash codes remain stable.
is it better to use a property called HashCode, instead of GetHashCode method?
What consumer of your object is going to say "well, I could call GetHashCode(), a method guaranteed to be on all objects, but instead I'm going to call this HashCode getter that does exactly the same thing" ? Do you have such a consumer in mind?
If you don't have any consumers of functionality, then don't provide the functionality.
I wouldn't normally generate it in the constructor, but I'd also want to know more about the expected usage before deciding whether to cache it or not.
Are you expecting a small number of instances, which get hashed an awful lot and which take a long time to calculate the hash? If so, caching may be appropriate. If you're expecting a large number of potentially "throw-away" instances, I wouldn't bother caching.
Interestingly, .NET and Java made different choices for String in this respect - Java caches the hash, .NET doesn't. Given that many string instances are never hashed, and those which are hashed are often only hashed once (e.g. on insertion into the hash table) I think I favour .NET's decision here.
Basically you're trading memory + complexity against speed. As Michael says, test before making your code more complex. Of course in some cases (e.g. for a class library) you can't accurate predict the real-world usage, but in many situations you'll have a pretty good idea.
You certainly don't need a separate property though. Hash codes should always stay the same unless someone changes the state of the object - and if your type is immutable, you're already prohibiting that, therefore a user shouldn't expect any changes. Just override GetHashCode().
I would generate the hash code once when getHashCode is called the first time, then cache it for later calls. This avoids calling it in the constructor when it may not be needed.
If you don't expect to call getHashCode very many times for each value object, you may not need to cache the value at all.
Well, you've got to have a GetHashCode() overridden method, as that's how consumers are going to retrieve your hashcode. Most hashcodes are fairly simple arithmetic operations, that will execute quickly. Do you have a reason to believe that caching the results (which has a memory cost) will give you a noticeable performance improvement?
Start simple - generate the hashcode on the fly. If you think you'll see performance improvements caching it, test first.
Regulations require me to refer you to the "premature optimization is the root of all evil" quote at this point.
I know from my personal experience that developers are really good at misjudging performance issues.
So it it recommended to keep everything as simple as possible while calculating hash code on the fly in the GetHashCode().
Why do you need to make sure that the hashcode is fixed? The semantics of a hashcode are that it will always be the same value for any given state of an object. Since your objects are immutable, this is a given. How you choose to implement GetHashCode is us up to you.
Having it be a private field that is returned is one choice - it's small, easy, and fast.
In general, computing the HashCode should be fast. So caching should not be much of an optimization, and not worth the trouble.
If profiling really shows that GethashCode takes a significant amount of time then maybe you should cache it, as a fix.
But I wouldn't consider it part of the normal practice.
Do you generally assume that toString() on any given object has a low cost (i.e. for logging)? I do. Is that assumption valid? If it has a high cost should that normally be changed? What are valid reasons to make a toString() method with a high cost? The only time that I get concerned about toString costs is when I know that it is on some sort of collection with many members.
From: http://jamesjava.blogspot.com/2007/08/tostring-cost.html
Update: Another way to put it is: Do you usually look into the cost of calling toString on any given class before calling it?
No it's not. Because ToString() can be overloaded by anyone, they can do whatever they like. It's a reasonable assumption that ToString() SHOULD have a low cost, but if ToString() accesses properties that do "lazy loading" of data, you might even hit a database inside your ToString().
The Java standard library seems to have been written with the intent of keeping the cost of toString calls very low. For example, Java arrays and collections have toString methods which do not iterate over their contents; to get a good string representation of these objects you must use either Arrays.toString or Collections.toString from the java.util package.
Similarly, even objects with expensive equals methods have inexpensive toString calls. For example, the java.net.URL class has an equals method which makes use of an internet connection to determine whether two URLs are truly equal, but it still has a simple and constant-time toString method.
So yes, inexpensive toString calls are the norm, and unless you use some weird third-party package which breaks with the convention, you shouldn't worry about these taking a long time.
Of course, you shouldn't really worry about performance until you find yourself in a situation where your program is taking too long, and even then you should use a profiler to figure out what's taking so longer rather than worrying about this sort of thing ahead of time.
The best way to find out is to profile your code. However, rather than worry that a particular function has a high overhead, it's (usually) better to worry about the correctness of your application and then do performance profiling on it (but be wary that real-world use and your test setup may differ radically). As it turns out, programmers generally guess wrong about what's really slow in their application and they often spend a lot of time optimizing things that don't need optimizing (eliminating that triple nested loop which only consumes .01% of your application's time is probably a waste).
Fortunately, there are plenty of open source profilers for Java.
Do you generally assume that toString() on any given object has a low cost? I do.
Why would you do that? Profile your code if you're running into performance issues; it'll save you a lot of time working past incorrect assumptions.
Your question's title uses the contradictory words "safe" and "generally." So even though in comments you seem to be emphasizing the general case, to which the answer is probably "yes, it's generally not a problem," a lot of people are seeing "safe" and therefore are answering either "No, because there's a risk of arbitrarily poor performance," or "No, because if you want to be 'safe' with a performance question, you must profile."
Since I generally only call toString() on methods and classes that I have written myself and overrode the base method, then I generally know what the cost is ahead of time. The only time I use toString() otherwise is error handling and or debugging when speed is not of the same importance.
My pragmatic answer would be: yes, you always assume a toString() call is cheap, unless you make an enormous amount of them. On the one hand, it is extremely unlikely that a toString() method would be expensive and on the other hand, it is extremely unlikely that you run into trouble if it isn't. I generally don't worry about issues like these, because there are too many of them and you won't get any code written if you do ;).
If you do run into performance issues, everything is open, including the performance of toString() and you should, as Shog9 suggest, simply profile the code. The Java Puzzlers show that even Sun wrote some pretty nasty constructors and toString() methods in their JDK's.
I think the question has a flaw. I wouldn't even assume toString() will print a useful piece of data. So, if you begin with that assumption, you know you have to check it prior to calling it and can assess it's 'cost' on a case by case basis.
Possibly the largest cost with naive toString() chaining is appending all those strings. If you want to generate large strings, you should use an underlying representation that supports an efficient append. If you know the append is efficient, then toString()s probably have a relatively low cost.
For example, in Java, StringBuilder will preallocate some space so that a certain amount of string appending takes linear time. It will reallocate when you run out of space.
In general, if you want to append sequences of things and you for whatever reason don't want to do something similar, you can use difference lists. These support linear time append by turning sequence appending into function composition.
toString() is used to represent an object as a String. So if you need slow running code to create a representation of an object, you need to be very careful and have a very good reason to do so. Debugging would be the only one I can think of where a slow running toString is acceptable.
My thought is:
Yes on standard-library objects
No on non-standard objects unless you have the source code in front of you and can check it.
I will always override toString to put in whatever I think I will need to debug problems. It it usually up to the developer to use it by either calling the toString method itself or having another class call it for you (println, logging, etc.).
There's an easy answer to this one, which I first heard in a discussion about reflection: "if you have to ask, you can't afford it."
Basically, if you need ToString() of large objects in the day-to-day operation of your program, then your program is crazy. Even if you need to ToString() an integer for anything time critical, your program is crazy, because it's obviously using a string where an integer would do.
ToString() for log messages is automatically okay, because logging is already expensive. If your program is too slow, turn down the log level! It doesn't really matter how slow it is to actually generate the debug messages, as long as you can choose to not generate them. (Note: your logging infrastructure should call ToString() itself, and only when the log message is supposed to be printed. Don't ToString() it by hand on the way into the log infrastructure, or you'll pay the price even if the log level is low and you won't be printing it after all! See http://www.colijn.ca/~caffeine/?m=200708#16 for more explanation of this.)
Since you put "generally" in your question, I would say yes. For -most- objects, there isn't going to be a costly ToString overload. There definitely can be, but generally there won't be.
In general I consider toString() low cost when I use it on simple objects, such as an integer or very simple struct. When applied to complex objects, however, toString() is a bit of crap shoot. There are two reason's for this. First, complex objects tend to contain other objects, so a single call to toString() can cascade into many calls to toString() on other objects plus the overehead of concatenating all those results. Second, there is no "standard" for converting complex objects to strings. One toString() call may yeild a single line of comma-separated values; another a much more verbose form. Only by checking it yourself can you know.
So my rule is toString() on simple objects is generally safe but on complex objects is suspect until checked.
I'd avoid using toString() on objects other than the basic types. toString() may not display anything useful. It may iterate over all the member variables and print them out. It may load something not yet loaded. Depending on what you plan to do with that string, you should consider not building it.
There are typically a few reasons why you use toString(): logging/debugging is probably the most common for random objects; display is common for certain objects (such as numbers). For logging I'd do something like
if(logger.isDebugEnabled()) {
logger.debug("The zig didn't take off. Response: {0}", response.getAsXML().toString());
}
This does two things: 1. Prevents constructing the string and 2. Prevents unnecessary string addition if the message won't be logged.
In general, I don't check every implementation. However, if I see a dependency on Apache commons, alarm bells go off and I look at the implementation more closely to make sure that they aren't using ToStringBuilder or other atrocities.