Boo/C#: Returning a collection performance

I have a function that gets a collection by reference, iterates over it, constructs a new collection of the same length containing updated structs, returns that collection (or rather a reference to it, since it's Boo/C# code).
I'm worried about performance of such a function. Is performance a lot worse than just updating the collection by reference? I need to call this function tens of times per second.
Thank you. Alisa.
P.S.: Why am I doing this? I'm trying to move onto functional programming and make it as pure as possible.

It will be slower, but not by much. It will also consume more memory as you'll have two collections in RAM whenever you're updating the structs.
The impact on performance will also be affected by your collections' sizes.
The best way to answer your question is to create both functions, then profile them.
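For illustration, here is a minimal sketch of the two variants to profile against each other; the Particle struct and the update rule are invented for the example, not taken from the question.

public struct Particle
{
    public double X;
    public double Velocity;
}

// Pure variant: builds and returns a fresh array each call.
public static Particle[] UpdatedCopy(Particle[] source, double dt)
{
    Particle[] result = new Particle[source.Length];
    for (int i = 0; i < source.Length; i++)
    {
        Particle p = source[i];        // structs are copied, so the source stays untouched
        p.X += p.Velocity * dt;
        result[i] = p;
    }
    return result;
}

// In-place variant: mutates the collection that was passed in.
public static void UpdateInPlace(Particle[] particles, double dt)
{
    for (int i = 0; i < particles.Length; i++)
    {
        particles[i].X += particles[i].Velocity * dt;
    }
}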

Related

Many Methods Kill Code Speed?

I'm building an application that is seriously slower than it should be (a process takes 4 seconds when it should take only 0.1 seconds; that's my goal, at least).
I have a bunch of methods that pass an array from one to the other. This has kept my code nice and organized, but I'm worried that it's killing the efficiency of my code.
Can anyone confirm if this is the case?
Also, I have all of my code contained in a class separate from my UI. Is this going to make things run significantly slower than if I had my code contained in the Form1.cs file?
Edit: There are about 95,000 points that need to be calculated, and each point goes through 7 methods that do additional calculations.
Have you tried any profiling or performance tools to narrow down why the slowdown occurs?
It might show you ways that you could use to refactor your code and improve performance.
This question asked by another user has several options that you can choose from:
Good .Net Profilers
No. This is not what is killing your code speed, unless many methods means like a million or something. You probably have more things iterating through your array than you need or realize, and the array itself may have a larger memory footprint than you realize.
Perhaps you should look into a design where instead of passing the array to 7 methods, you iterate the array once, passing the members to 7 methods, this will minimize the number of times you're iterating through 95000 members.
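As a rough sketch of that reshaping (the stage names and the calculations are invented for the example): instead of seven methods that each loop over the whole array, one loop visits each point and applies the seven steps to it.

// Before (illustrative): each stage loops over all 95,000 points.
// double[] data = LoadPoints();
// StageOne(data); StageTwo(data); /* ... */ StageSeven(data);

// After: a single pass, applying every stage to one point at a time.
public static void ProcessAll(double[] points)
{
    for (int i = 0; i < points.Length; i++)
    {
        double p = points[i];
        p = StageOne(p);
        p = StageTwo(p);
        // ... remaining stages ...
        points[i] = StageSeven(p);
    }
}

private static double StageOne(double p) { return p * 1.01; }   // placeholder calculation
private static double StageTwo(double p) { return p + 0.5; }    // placeholder calculation
private static double StageSeven(double p) { return p; }        // placeholder calculation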
In general, function calls are basic enough to be highly optimized by any interpreter (or compiler), so they do not produce too much blow-up in run time. In fact, if you rewrite your problem as, say, some fancy iterative solution, you save handling the stack but instead have to handle some iteration variables, which will not be too hard.
I know, there have been programmers who wondered why their recursive algorithms have been so slow, until someone told them not to pass array entries by value.
You should provide some sample code. In general, you should look for other bottlenecks, or find another algorithm.
You just need to run it against a good profiling tool. I've got some stuff I wish only took 4 seconds - it works with upwards of a hundred million records in a pass.
An Array is a reference type, not a value type, so you never pass the array itself; you are actually passing a reference to the array in memory. So passing the array isn't your issue. Most likely the issue is with what you do with your array. You need to do what Jamie Keeling said and run it through a profiler, or even just debug it and see if you get stuck in some big loops.
Why are you loading them all into an array and doing each method in turn rather than iterating through them as loaded?
If you can obtain them (from whatever input source), deal with them, and output them (whether to screen, file, or wherever), this will inevitably use less memory and reduce start-up time, at the very least.
If this answer is applicable to your situation, start by changing your methods to deal with enumerations rather than arrays (non-breaking change, since arrays are enumerations), then change your input method to yield return items as loaded rather than loading an entire array.
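A small sketch of that change, with invented names; the producer yields points as they are loaded and the consumer only depends on IEnumerable<T>, so existing array callers keep working.

using System.Collections.Generic;
using System.IO;

// Producer: yields points as they are read, rather than building an array first.
public static IEnumerable<double> ReadPoints(string path)
{
    using (StreamReader reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return double.Parse(line);
        }
    }
}

// Consumer: accepts any enumeration, so arrays still work unchanged.
public static void Process(IEnumerable<double> points)
{
    foreach (double p in points)
    {
        // ... calculations on each point ...
    }
}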
Sorry for posting an old link (.NET 1.1), but it was contained in a VS2010 article, so:
Here you can read about method costs. (Initial link)
Also, if you start your code from VS (no matter what, even in Release mode), the VS debugger attaches to your code and slows it down.
I know I will be downvoted for this advice, but... the maximum performance will be achieved with unsafe operations on arrays (yes, it's UNSAFE, but when performance is the deal, so...).
And lastly, refactor your code to use a minimum of methods that work with your arrays. It will improve the performance.

Using pre-defined instances

One problem with instances is that the Equals method does not return true even when two instances contain the same values, and overriding Equals to compare values is much slower than reference equality.
While thinking about performance, it occurred to me that it is wasteful to create two instances that are identical but live at different memory addresses. If I could avoid creating identical instances with different references, that would increase performance, and comparing references would be easier than writing custom Equals methods.
For example:
I have a Coordinate class which holds chess board coordinates, so I only need a Coordinate[8,8] array to represent all the coordinates of the board. Instead of creating instances on demand, I can create all the instances up front and have my factory method return them:
Coordinate.Get(2,3) instead of new Coordinate(2,3)
The first is a static factory method that returns the pre-defined coordinate for the given values.
Another advantage is that we will not spend time creating and garbage-collecting objects in memory; all of them are pre-defined already. We can also provide a unique GetHashCode for each instance in an easy and fast way, like 0 for [0,0], 1 for [0,1], and so on.
Isn't it worth trying? Would this idea make the code harder to write or understand? Is there an existing pattern like this?
In short, what are the disadvantages of this approach?
This is a good solution and in certain situations can save you a lot of time and memory. The main drawback is that it gets really complicated if your objects are mutable. If that is not the case, then this is really not that bad. You just have to make sure that all instances are obtained from the same factory. You do not even have to create all the instances in advance, but can make the class create a new instance when a particular set of parameters is requested for the first time (basically lazy-loading).
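A minimal sketch of what such a factory might look like for the chess-board case, including the simple hash code the question mentions; the field names and class shape are assumptions, not the asker's actual code.

public sealed class Coordinate
{
    private static readonly Coordinate[,] cache = BuildCache();

    private readonly int row;
    private readonly int column;

    private Coordinate(int row, int column)
    {
        this.row = row;
        this.column = column;
    }

    private static Coordinate[,] BuildCache()
    {
        Coordinate[,] all = new Coordinate[8, 8];
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                all[r, c] = new Coordinate(r, c);
        return all;
    }

    // Factory: always hands back the same instance for the same square,
    // so reference equality is enough to compare coordinates.
    public static Coordinate Get(int row, int column)
    {
        return cache[row, column];
    }

    public override int GetHashCode()
    {
        return row * 8 + column;   // 0 for [0,0], 1 for [0,1], ...
    }
}

Because the instances are immutable and always come from the cache, two "equal" coordinates are literally the same object, which is what makes the cheap comparison safe.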
In this instance where you're dealing with a chess board (so 64 total coordinates) you don't need to be overly concerned with performance or garbage collection. Having said that, I think holding the baseline 64 coordinates in a dictionary is just fine (and makes sense). In terms of the Equals() comparison, you're basically comparing 2 integer values which is going to be lightning fast so overriding the Equals() method to specify that comparison is the right approach.
The short answer is that the disadvantage of this approach is that it increases the complexity of your code, thereby making it more difficult to code correctly, more difficult to debug, and more difficult to maintain.
Therefore, as with all optimisations, you should only contemplate it if you have a genuine need to optimise (e.g. it's running way too slow, or using way too much memory), and if the reward (faster performance, smaller memory usage) outweighs the risks (spending time optimising when you could be doing something more useful, introducing a bug, not being able to find a bug, not being able to modify the code in the future quickly and easily, or without introducing a bug).
If you really were running into performance problems because of having to do deep comparisons in order to determine equality then you might want to look at the Flyweight pattern, but as you're only talking about 64 pairs of small integers I think that would be complete overkill in this case. What you actually need is to override Equals to compare the two coordinates - that will be plenty fast enough. Anything more complex than that is probably a false optimisation, at least on most normal platforms.

What C# container is most resource-efficient for existence for only one operation?

I find myself often with a situation where I need to perform an operation on a set of properties. The operation can be anything from checking if a particular property matches anything in the set to a single iteration of actions. Sometimes the set is dynamically generated when the function is called, some built with a simple LINQ statement, other times it is a hard-coded set that will always remain the same. But one constant always exists: the set only exists for one single operation and has no use before or after it.
My problem is, I have so many points in my application where this is necessary, but I appear to be very, very inconsistent in how I store these sets. Some of them are arrays, some are lists, and just now I've found a couple of linked lists. Now, none of the operations I'm specifically concerned about have to care about indices, container size, order, or any other functionality that is bestowed by any of the individual container types. I picked resource efficiency because it's a better criterion than flipping coins. I figured, since an array's size is fixed and it's a very elementary container, that might be my best choice, but I figure it is a better idea to ask around. Alternatively, if there's a better choice not out of resource efficiency but strictly as being a better fit for this kind of situation, that would be nice as well.
With your acknowledgement that this is more about coding consistency than performance or efficiency, I think the general practice is to use a List<T>. Its actual backing store is an array, so you aren't really losing much (if anything noticeable) to container overhead. Without more qualifications, I'm not sure that I can offer anything more than that.
Of course, if you truly don't care about the things that you list in your question, just type your variables as IEnumerable<T> and you're only dealing with the actual container when you're populating it; where you consume it will be entirely consistent.
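For instance (names invented for illustration), the throwaway set can be produced however is convenient and consumed through IEnumerable<T>:

using System.Collections.Generic;
using System.Linq;

// The consumer only needs to enumerate the values once.
private static bool IsAllowedRole(string role, IEnumerable<string> allowedRoles)
{
    return allowedRoles.Contains(role);
}

// Call sites can pass an array, a List<string>, or a deferred LINQ query unchanged:
// IsAllowedRole(user.Role, new[] { "admin", "editor" });
// IsAllowedRole(user.Role, roles.Where(r => r.IsActive).Select(r => r.Name));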
There are two basic principles to be aware of regarding resource efficiency.
Runtime complexity
Memory overhead
You said that indices and order do not matter and that a frequent operation is matching. A Dictionary<TKey, TValue> (which is a hash table) is an ideal candidate for this type of work. Lookups on the keys are very fast, which would be beneficial in your matching operation. The disadvantage is that it will consume a little more memory than what would be strictly required. The usual load factor is around .8, so we are not talking about a huge increase or anything.
For your other operations you may find that an array or List<T> is a better option especially if you do not need to have the fast lookups. As long as you are not needing high performance on specialty operations (lookups, sorting, etc.) then it is hard to beat the general resource characteristics of array based containers.
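A hedged sketch of the lookup-heavy case (names invented): once the set is keyed in a dictionary, each membership check is a near-constant-time hash lookup instead of a linear scan.

using System.Collections.Generic;

private static int CountKnown(IEnumerable<string> candidates, IEnumerable<string> allowedNames)
{
    // Key the set once for this operation...
    Dictionary<string, bool> allowed = new Dictionary<string, bool>();
    foreach (string name in allowedNames)
    {
        allowed[name] = true;
    }

    // ...then every check against it is a hash lookup.
    int count = 0;
    foreach (string candidate in candidates)
    {
        if (allowed.ContainsKey(candidate))
        {
            count++;
        }
    }
    return count;
}

On .NET 3.5 and later, a HashSet<string> would express the same idea with less ceremony.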
List<T> is probably fine in general. It's easy to understand (in the literate programming sense) and reasonably efficient. The keyed collections (e.g. Dictionary, SortedList) will throw an exception if you add an entry with a duplicate key, though this may not be a problem for what you're working on now.
Only if you find that you're running into a CPU-time or memory-size problem should you look at improving the "efficiency", and then only after determining that this is the bottleneck.
No matter which approach you use, there will still be creation and deletion of the underlying objects (collection or iterator) that will eventually be garbage collected, if the application runs long enough.

.NET: Scalability of generic Dictionary

I'm using a Dictionary<> to store a bazillion items. Is it safe to assume that as long as the server's memory has enough space to accommodate these bazillion items that I'll get near O(1) retrieval of items from it? What should I know about using a generic Dictionary as huge cache when performance is important?
EDIT: I shouldn't rely on the default implementations? What makes for a good hashing function?
It depends, just about entirely, on how good a hash function your "bazillion items" support -- if their hash function is not excellent (so that many collisions result), your performance will degrade as the dictionary grows.
You should measure it and find out. You're the one who has knowledge of the exact usage of your dictionary, so you're the one who can measure it to see if it meets your needs.
A word of advice: I have in the past done performance analysis on large dictionary structures, and discovered that performance did degrade as the dictionary became extremely large. But it seemed to degrade here and there, not consistently on each operation. I did a lot of work trying to analyze the hash algorithms, etc, before smacking myself in the forehead. The garbage collector was getting slower because I had so much live working set; the dictionary was just as fast as it always was, but if a collection happened to be triggered, then that was eating up my cycles.
That's why it is important to not do performance testing in unrealistic benchmark scenarios; to find out what the real-world performance cost of your bazillion-item dictionary is, well, that's going to be gated on lots of stuff that has nothing to do with your dictionary, like how much collection triggering is happening throughout the rest of your program, and when.
Yes, you will have O(1) access times. In fact, to be pedantic, it will be exactly O(1).
You need to ensure that all your objects that are used as keys have a good GetHashCode implementation and should likely override Equals.
Edit to clarify: In reality, access times will get slower the more items you have, unless you can provide a "perfect" hash function.
Yes, you will have near O(1) no matter how many objects you put into the Dictionary. But for the Dictionary to be fast, your key objects should provide a sufficient GetHashCode implementation, because Dictionary uses a hash table internally.
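As a hedged illustration of what a "sufficient" key implementation looks like (the type and its fields are invented), the important part is that Equals and GetHashCode agree with each other and spread values well:

public sealed class CustomerKey
{
    private readonly int region;
    private readonly long accountNumber;

    public CustomerKey(int region, long accountNumber)
    {
        this.region = region;
        this.accountNumber = accountNumber;
    }

    public override bool Equals(object obj)
    {
        CustomerKey other = obj as CustomerKey;
        return other != null
            && region == other.region
            && accountNumber == other.accountNumber;
    }

    // Combines both fields so distinct keys rarely share a bucket.
    public override int GetHashCode()
    {
        return (region * 397) ^ accountNumber.GetHashCode();
    }
}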

Long lists of pass-by-ref parameters versus wrapper types

I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this but I hesitate to use this approach, mostly I think for reasons of aesthetics and psychological bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU intensive code. CPU cycles matter.
Is there a more elegant C#2 or C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bit implementations of .NET (and the upcoming 64-bit 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably too small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.
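For comparison, a hedged sketch of the wrapper-type approach discussed above; NeuronGene and ConnectionGene come from the question, everything else is invented for illustration.

// A small result type groups the three values that belong together.
public struct AddNodeMutation
{
    public NeuronGene NewNeuronGene;
    public ConnectionGene NewConnectionGene1;
    public ConnectionGene NewConnectionGene2;
}

private AddNodeMutation Mutate_AddNode_GetGenes()
{
    AddNodeMutation result = new AddNodeMutation();
    // ... create the new neuron and the two connections here ...
    return result;
}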
