Sorry for my last question, my code was so stupid.
My base situation is: I want to construct a state tree whose last level has 8! items, so the total count of iterations is about 100,000 (8!*2 + 7! + 6! + ...).
It currently takes less than one second, and I need to construct it every time my artificial intelligence makes a move. Of course, alpha/beta search is a solution, but before thinking about that I want to optimize my code so I really have the best possible performance.
What I already did:
- replaced every LINQ call with precalculations or collections with faster access (Dictionary)
- added more precalculations to skip whole operations
- used some approximations to spare heavy calculations
- called the List constructor only when there is actually a change; otherwise I just reuse the reference
There will be more calculations coming, so I really need more ideas for reducing the work, maybe something about which collection is fastest for my purpose.
My code
It's about the BuildChildNodes function and the TryCollect function it calls. My constructor does some small precalculations. My state tree knows everything, even the cards which aren't actually shown.
As the comment came up: I'm not asking you to read and understand my code to give content-wise advice. I'm asking about the functions, operators, data types, and classes I'm using, and whether there is a replacement that runs a bit faster, e.g. a faster collection for my purpose, or a better idea to replace the collection constructor with a faster method regarding the adding and removing that happens afterwards.
Edit: okay, List is definitely the best type I can use. I tried arrays ([]) and even Dictionaries, and last of all I even tried LinkedLists, all with a significant loss.
I can see that RemoveAt() could be expensive, as it is proportional to the size of the list.
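If the order of the children doesn't matter, one common workaround (a sketch of mine, not from the post) is to overwrite the item being removed with the last item and then remove the last one, which is O(1):

using System.Collections.Generic;

static class ListExtensions
{
    // Removes the element at 'index' in O(1) by swapping the last element into its place.
    // Only valid when the order of the elements does not matter.
    public static void SwapRemoveAt<T>(this List<T> list, int index)
    {
        int last = list.Count - 1;
        list[index] = list[last]; // overwrite the slot being removed
        list.RemoveAt(last);      // removing the last element does not shift anything
    }
}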
You can always use the Visual Studio performance profiler to find out where you should optimize your code the most.
If you can find a way to use fixed-size arrays that you allocate when your program starts, instead of dynamically allocated data structures like List, you will save a lot on memory allocation management overhead.
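A minimal sketch of that idea; the buffer size and the way the children are produced are made up for illustration:

class ChildBuffer
{
    // 8! = 40320 worst-case slots, allocated once and reused on every call,
    // instead of constructing a new List each time.
    private static readonly int[] Slots = new int[40320];

    // Fills the shared buffer and returns how many slots were actually used.
    public static int FillChildren(int parentState)
    {
        int count = 0;
        for (int i = 0; i < 8; i++)
            Slots[count++] = parentState * 10 + i; // dummy child states, just for illustration
        return count;
    }
}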
Related
I have an ArrayList that stores 100,000+ numbers, each number 10 digits in length or smaller. The program takes user input and loops through it to see if any of the numbers are already in the array, using ArrayList.Contains(userInput).
It would appear that an ArrayList of this size uses a LOT of memory. Would there be a faster way to run this, e.g. a database, or something like TextFile.Contains(line)?
You should use a List<T> to avoid boxing and save memory.
Using a HashSet<T> will be much faster, but will use a little more memory than a List<T>.
Depending on your precise scenario, a database would probably be best.
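A minimal sketch of the HashSet<T> suggestion; the numbers are invented for illustration:

using System;
using System.Collections.Generic;

class ContainsDemo
{
    static void Main()
    {
        // HashSet<long>.Contains is O(1) on average; List<long>.Contains scans the whole list.
        var known = new HashSet<long>();
        for (long i = 0; i < 100000; i++)
            known.Add(1000000000L + i);

        long userInput = 1000050000L;
        Console.WriteLine(known.Contains(userInput)); // True, without scanning every element
    }
}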
Another solution could be to keep the 100,000+ elements in a sorted array and use BinarySearch to find the element of interest.
Much faster than Contains, and you do not need to allocate a dictionary, so no additional memory consumption.
All of this is a subject for measuring, to pick the right choice for your concrete scenario.
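A minimal sketch of the sorted-array approach; the sample values are invented:

using System;

class BinarySearchDemo
{
    static void Main()
    {
        // In practice this would hold the 100,000+ stored numbers.
        long[] numbers = { 4567890123, 1234567890, 3456789012, 2345678901 };
        Array.Sort(numbers); // BinarySearch requires a sorted array

        // A non-negative return value means the element was found.
        bool found = Array.BinarySearch(numbers, 2345678901L) >= 0;
        Console.WriteLine(found); // True
    }
}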
As soon as you have a concurrent read/write scenario, consider using System.Collections.Concurrent.ConcurrentDictionary<,>. It should provide better performance since it doesn't require locks around its operations. However, if the operations are more complex than simple add/get/remove, then you'll still need locks, and a HashSet<> should be faster (as SLaks has suggested).
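A minimal sketch of the ConcurrentDictionary idea; the value type is just a placeholder since only the keys matter here:

using System.Collections.Concurrent;

class ConcurrentLookup
{
    // Keys are the stored numbers; the byte value carries no information.
    private static readonly ConcurrentDictionary<long, byte> Seen =
        new ConcurrentDictionary<long, byte>();

    // Both calls are safe from multiple threads without explicit locks.
    public static bool AddIfNew(long number)
    {
        return Seen.TryAdd(number, 0);
    }

    public static bool Contains(long number)
    {
        return Seen.ContainsKey(number);
    }
}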
I'm still quite new to C#, but I've noticed through forum postings the advantages of using a HashSet instead of a List in specific cases.
My current case isn't exactly that I'm storing a tremendous amount of data in a single List, but rather that I have to check for membership in it often.
The catch is that I do indeed need to iterate over it as well, but the order they are stored or retrieved doesn't actually matter.
I've read that foreach loops are actually slower than plain for loops, so how else could I go about this in the fastest way possible?
The number of .Contains() checks I'm doing is definitely hurting my performance with lists, so at least comparing to the performance of a HashSet would be handy.
Edit: I'm currently using lists, iterating through them in numerous locations, with different code being executed in each location. Most often, the lists contain point coordinates that I use to index into a two-dimensional array, on which I then do some operation or another based on the criteria of the list.
If there's not a direct answer to my question, that's fine, but I assumed there might be other ways of iterating over a HashSet than just a foreach cycle. I'm currently in the dark as to what other methods there might even be, what advantages they provide, etc. Assuming there are other methods, I also assumed there would be a typical preferred method of choice that is only ignored when it doesn't suit the needs (my needs are pretty basic).
As far as premature optimization goes, I already know that using the lists the way I am is a bottleneck. How to go about fixing that is where I'm getting stuck. Not stuck exactly, but I didn't want to reinvent the wheel by testing repeatedly only to find out I'm already doing it the best way I could (this is a large project with over 3 months invested; lists are everywhere, but there are definitely some in which I don't want duplicates, that hold a lot of data, that need not be stored in any specific order, etc.).
A foreach loop has a small amount of additional overhead on indexed collections (like an array).
This is mostly because the foreach does a little more bounds checking than a for loop.
HashSet does not have an indexer so you have to use the enumerator.
In this case foreach is efficient as it only calls MoveNext() as it moves through the collection.
Also Parallel.ForEach can dramatically improve your performance, depending on the work you are doing in the loop and the size of your HashSet.
As mentioned before profiling is your best bet.
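A minimal sketch of the Parallel.ForEach suggestion above; the per-item work is invented, and parallelizing is only worthwhile when that work is heavy enough to outweigh the threading overhead:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        var points = new HashSet<int> { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Each element is handed to the thread pool; the order of processing is not guaranteed.
        Parallel.ForEach(points, p =>
        {
            double result = Math.Sqrt(p); // stand-in for the real per-item work
            Console.WriteLine(p + " -> " + result);
        });
    }
}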
You shouldn't be iterating over a HashSet in the first place to determine whether an item is in it. You should use the HashSet's Contains method (not the LINQ one). The HashSet is designed so that it won't need to look through every item to see if a given value is in the set. That is what makes it so much more powerful for lookups than a List.
Not strictly answering the question in the header, but more concerning your specific problem:
I would make your own Collection object that uses both a HashSet and a List internally. Iterating is fast as you can use the List, checking for Contains is fast as you can use the HashSet. Just make it an IEnumerable and you can use this Collection in foreach as well.
The downside is more memory, but you only have twice as many references to the objects, not twice as many objects. In the worst case it's only twice as much memory, and you seem much more concerned with performance.
Adding, checking, and iterating are fast this way, only removal is still O(N) because of the List.
EDIT: If removal needs to be O(1) as well, use a doubly linked list instead of a regular List, and make the HashSet a Dictionary<KeyType, Cell> instead (where Cell is the linked-list node). You can check the dictionary for Contains, but also use it to find the cell holding the data quickly, so removal from the data structure is fast too.
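A minimal sketch of the simpler List + HashSet combination described above; the class and member names are my own:

using System.Collections;
using System.Collections.Generic;

// Fast Contains via the HashSet, fast iteration via the List.
// Add and Contains are O(1) on average; Remove is still O(n) because of the List.
class FastLookupCollection<T> : IEnumerable<T>
{
    private readonly List<T> items = new List<T>();
    private readonly HashSet<T> lookup = new HashSet<T>();

    public bool Add(T item)
    {
        if (!lookup.Add(item))
            return false;   // already present; keep elements unique
        items.Add(item);
        return true;
    }

    public bool Contains(T item)
    {
        return lookup.Contains(item);
    }

    public bool Remove(T item)
    {
        if (!lookup.Remove(item))
            return false;
        items.Remove(item); // O(n), as the answer notes
        return true;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}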
I had the same issue: the HashSet suits the addition of unique elements very well, but is very slow when accessing elements in a for loop. I solved it by converting the HashSet to an array and then running the for loop over that.
I'm writing a plug-in for a 3D modeling program. There is a feature of the API where you can intercept the display pipeline and insert additional geometry that will be displayed without actually being in the model (you can see it but you can't select/move/delete it, etc.).
Part of this feature of the API is a method that gets called on every screen refresh that is used to tell the program what extra geometry to display. Right now I have a HashSet that is iterated through with a foreach statement. OnBrep is the generic geometry class of the API.
I have an additional command that will dump the "ghost" geometry into the actual model. I've found that if the geometry is actually in the model, the display speeds up a lot. So I'm wondering if there is a faster way to provide the list of objects to the program. Would a simple one-dimensional array be significantly faster than a HashSet<>?
The fastest way to return a collection of objects is to return either (a) the actual physical type that was used internally to build up the collection, or (b) a type that can be cast to in such a way that data is not copied in memory. As soon as you start copying data (e.g. CopyTo, ToArray, ToList, a copy constructor, etc) you have lost time.
Having said that, unless the number of items is large, this will be a micro-optimisation and therefore probably not worth doing. In that case, just return the collection type that would be of most use to the calling code. If you are unsure, do some timing tests rather than taking a guess.
This is an extensive study on the performance of HashSet/Dictionary/generic List.
But it's about key lookups
Personally, I think that a normal or generic list is faster for a foreach operation, since it involves no indexing overhead (especially inserting etc. should be faster)... but this is just a gut feeling.
Usually when working with 3D graphics, you get the best performance if you manage to reduce the draw calls/state changes as much as possible.
In your case I'd try to reduce the draw calls to a minimum by merging your adorned geometry or trying to use some sort of batching feature if it's available.
It's very likely that the frame drop is not because of using a hash list/dictionary instead of an array. (Unless there's a broken/expensive hashing function somewhere...).
Are there any general rules, when using recursion, on how to avoid stack overflows?
How many times you will be able to recurse will depend on:
The stack size (which is usually 1MB IIRC, but the binary can be hand-edited; I wouldn't recommend doing so)
How much stack each level of the recursion uses (a method with 10 uncaptured Guid local variables will take more stack than a method which doesn't use any local variables, for example)
The JIT you're using - sometimes the JIT will use tail recursion, other times it won't. The rules are complicated and I can't remember them. (There's a blog post by David Broman back from 2007, and an MSDN page from the same author/date, but they may be out of date by now.)
How to avoid stack overflows? Don't recurse too far :) If you can't be reasonably sure that your recursion will terminate without going very far (I'd be worried at "more than 10" although that's very safe) then rewrite it to avoid recursion.
It really depends on what recursive algorithm you're using. If it's simple recursion, you can do something like this:
public int CalculateSomethingRecursively(int someNumber)
{
    return doSomethingRecursively(someNumber, 0);
}

private int doSomethingRecursively(int someNumber, int level)
{
    // Stop when the depth cap is reached or there is nothing left to do
    // (MAX_LEVEL and shouldKeepCalculating are assumed to be defined elsewhere).
    if (level >= MAX_LEVEL || !shouldKeepCalculating(someNumber))
        return someNumber;

    return doSomethingRecursively(someNumber, level + 1);
}
It's worth noting that this approach is really only useful where the level of recursion can be defined as a logical limit. In the case that this cannot occur (such as a divide-and-conquer algorithm), you will have to decide how you want to balance simplicity versus performance versus resource limitations. In these cases, you may have to switch between methods once you hit an arbitrary pre-defined limit. An effective means of doing this, which I have used in the quicksort algorithm, is to set the limit as a ratio of the total size of the list. In this case, the logical limit is a result of when conditions are no longer optimal.
I am not aware of any hard and fast rules to avoid stack overflows. I personally try to ensure:
1. I have my base cases right.
2. The code reaches a base case at some point.
If you're finding yourself generating that many stack frames, you might want to consider unrolling your recursion into a loop.
Especially if you are doing multiple levels of recursion (A->B->C->A->B...) you might find that you can extract one of those levels into a loop and save yourself some memory.
The normal limit, if not much is left on the stack between successive calls, is around 15000-25000 levels deep. 25% of that if you are on IIS 6+.
Most recursive algorithms can be expressed iteratively.
There are various ways to increase the allocated stack space, but I'd rather let you find an iterative version first. :)
Other than having a reasonable stack size and making sure you divide and conquer your problem such that you continually work on a smaller problem, not really.
I just thought of tail recursion, but it turned out that C# does not support it. However, the .NET Framework seems to support it:
http://blogs.msdn.com/abhinaba/archive/2007/07/27/tail-recursion-on-net.aspx
The default stack size for a thread is 1 MB, if you're running under the default CLR. However, other hosts may change that. For example the ASP host changes the default to 256 KB. This means that you may have code that runs perfectly well under VS, but breaks when you deploy it to the real hosting environment.
Fortunately, you can specify a stack size when you create a new thread, by using the appropriate constructor. In my experience it is rarely necessary, but I have seen one case where this was the solution.
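A minimal sketch of that constructor; the 16 MB figure and the method names are just examples:

using System.Threading;

class BigStackDemo
{
    static void Main()
    {
        // The second constructor argument is the maximum stack size in bytes.
        var worker = new Thread(DeepRecursion, 16 * 1024 * 1024); // 16 MB instead of the default
        worker.Start();
        worker.Join();
    }

    static void DeepRecursion()
    {
        // The deeply recursive work would run on this thread.
    }
}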
You can edit the PE header of the binary itself to change the default size. This is useful if you want to change the size for the main thread. Otherwise I would recommend using the appropriate constructor when creating threads.
I wrote a short article about this here. Basically, I pass an optional parameter called depth, adding 1 to it each time I go deeper into the recursion. Within the recursive method I check the depth against a threshold. If it is greater than the value I set, I throw an exception. The threshold value depends on your application's needs.
Remember, if you have to ask about system limits, then you are probably doing something horribly wrong.
So, if you think you might get a stack overflow in normal operation then you need to think of a different approach to the problem.
It's not difficult to convert a recursive function into an iterative one, especially as C# has the generic Stack<T> collection. Using the Stack type moves the memory used into the program's heap instead of the thread's stack. This gives you the full address range to store the recursive data. If that isn't enough, it's not too difficult to page the data to disk. But I'd seriously consider other solutions if you get to this stage.
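A minimal sketch of such a conversion, using a made-up tree node type:

using System;
using System.Collections.Generic;

class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
}

class IterativeTraversal
{
    // Depth-first traversal with an explicit Stack<T> on the heap,
    // instead of recursion on the thread stack.
    static void Visit(Node root)
    {
        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            Console.WriteLine(current.Value);
            foreach (Node child in current.Children)
                pending.Push(child);
        }
    }
}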
I am wondering what kind of optimization techniques people often use nowadays. I have seen people do caching all the time, with dictionaries and so on. Is trading space for speed the only way to go?
Really it's about your choice in algorithms. Usually there is no "silver bullet" for optimization.
For example, using a StringBuilder instead of concatenation can make your code significantly faster, but there is a tradeoff. If you aren't concatenating huge sets of strings, the memory and time it takes to initialize StringBuilder is worse than just using regular concatenation. There are a lot of examples of this throughout the framework, such as dictionary caching as you mentioned in your question.
The only general optimization you can really learn and apply to your coding throughout your day is the performance hit from boxing/unboxing (heap vs. stack). To do this you need to learn what it's about and how to avoid it, or reduce the need for it.
Microsoft's MSDN documentation has 2 articles on performance that give a lot of good general purpose techniques to use (they're really just different versions of the same article).
http://msdn.microsoft.com/en-us/library/ms173196.aspx
http://msdn.microsoft.com/en-us/library/ms173196(VS.80).aspx
I would suggest the following:
1. Knowing when to use StringBuilder
You must have heard before that a StringBuilder object is much faster at appending strings together than normal string types.
The thing is, StringBuilder is mostly faster when you are building up a large string from many pieces. This means that if you have a loop that adds to a single string over many iterations, then a StringBuilder is definitely much faster than a plain string.
However if you just want to append something to a string a single time then a StringBuilder class is overkill. A simple string type variable in this case improves on resources use and readability of the C# source code.
Simply choosing correctly between StringBuilder objects and string types you can optimize your code.
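A minimal sketch of the difference; the iteration count is arbitrary:

using System;
using System.Text;

class StringBuilderDemo
{
    static void Main()
    {
        // Many appends: StringBuilder avoids creating a new string on every iteration.
        var sb = new StringBuilder();
        for (int i = 0; i < 10000; i++)
            sb.Append(i).Append(',');
        string big = sb.ToString();

        // A single append: plain concatenation is simpler and not slower in practice.
        string greeting = "Hello, " + "world";

        Console.WriteLine(big.Length);
        Console.WriteLine(greeting);
    }
}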
2. Comparing Non-Case-Sensitive Strings
In an application it is sometimes necessary to compare two string variables while ignoring case. The tempting and traditional approach is to convert both strings to all lower case or all upper case and then compare them, like this:
str1.ToLower() == str2.ToLower()
However, repeatedly calling ToLower() is a bottleneck in performance. By using the built-in string.Compare() function instead, you can increase the speed of your application.
Checking whether two strings are equal while ignoring case looks like this:
string.Compare(str1, str2, true) == 0 //Ignoring cases
The C# string.Compare function returns an integer that is equal to 0 when the two strings are equal.
3. Use string.Empty
This is not so much a performance improvement as it is a readability improvement, but it still counts as code optimization. Try to replace lines like:
if (str == "")
with:
if (str == string.Empty)
This is simply better programming practice and has no negative impact on performance.
Note, there is a popular practice that checking a string's length to be 0 is faster than comparing it to an empty string. While that might have been true once it is no longer a significant performance improvement. Instead stick with string.Empty.
4. Replace ArrayList with List<>
ArrayLists are useful when storing multiple types of objects within the same list. However, if you are keeping the same type of variable in one ArrayList, you can gain a performance boost by using List<> objects instead.
Take the following ArrayList:
ArrayList intList = new ArrayList();
intList.Add(10);
return (int)intList[0] + 20;
Notice it only contains integers. Using the List<> class is a lot better. To convert it to a typed List, only the variable types need to be changed:
List<int> intList = new List<int>();
intList.Add(10);
return intList[0] + 20;
There is no need to cast types with List<>. The performance increase can be especially significant with primitive data types like integers.
5. Use && and || operators
When building if statements, simply make sure to use the double-and notation (&&) and/or the double-or notation (||), (in Visual Basic they are AndAlso and OrElse).
If statements that use & and | must evaluate every part of the statement and then apply the "and" or "or". On the other hand, && and || go through the conditions one at a time and stop as soon as the outcome is decided.
Executing less code is always a performance benefit, but it can also avoid run-time errors; consider the following C# code:
if (object1 != null && object1.runMethod())
If object1 is null, with the && operator, object1.runMethod() will not execute. If the && operator is replaced with &, object1.runMethod() will run even when object1 is known to be null, causing an exception.
6. Smart Try-Catch
Try-catch statements are meant to catch exceptions that are beyond the programmer's control, such as connecting to the web or a device, for example. Using a try statement to keep code "simple" instead of using if statements to avoid error-prone calls makes code incredibly slower. Restructure your source code to require fewer try statements.
7. Replace Divisions
C# is relatively slow when it comes to division operations. One alternative is to replace divisions with a multiplication-shift operation to further optimize C#. The article explains in detail how to make the conversion.
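As a minimal illustration of the simplest case, a division by a power of two can be rewritten as a shift (the general multiply-shift conversion is what the referenced article covers):

using System;

class DivisionDemo
{
    static void Main()
    {
        int x = 1000;

        // Division by a power of two can be written as a right shift
        // (valid here because x is non-negative).
        int byDivision = x / 16;
        int byShift = x >> 4;

        Console.WriteLine(byDivision); // 62
        Console.WriteLine(byShift);    // 62
    }
}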
REFERENCE
There are often problems with algorithms as well, usually when something expensive is done inside of a loop. Generally, the first thing you do is profile your application, which will tell you the slowest part(s) of the application. Generally, what you do to speed up your application depends upon what you find. For example, if your application mimics a file system, it may be that you're calling the database recursively to travel up the tree (for instance). You may optimise that case by changing those recursive calls into one flattened database call that returns all of the data in one call.
Again, the answer is, as always, 'it depends'. However, more examples and advice can be found in Rico Mariani's blog (browse back a few years, as his focus has shifted):
Depends on a lot of things, really.
As an example, when memory becomes an issue and a lot of temporary objects are being created I tend to use object pools. (Having a garbage-collector is not a reason to not take care of memory allocation). If speed is what matters then I might use unsafe pointers to work with arrays.
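A minimal sketch of a simple object pool; the class and method names are my own:

using System.Collections.Generic;

// Hands out reused instances instead of allocating a new one per use,
// which reduces pressure on the garbage collector.
class ObjectPool<T> where T : new()
{
    private readonly Stack<T> free = new Stack<T>();

    public T Rent()
    {
        return free.Count > 0 ? free.Pop() : new T();
    }

    public void Return(T item)
    {
        free.Push(item);
    }
}

Callers Rent() an instance, use it, and Return() it when done instead of letting it become garbage.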
Either way, if you find yourself struggling too much with optimization techniques in a c#/.net application you probably chose the wrong language/platform.
In general, make sure you understand the time complexity of different algorithms, and use that knowledge to choose your implementations wisely.
For .NET in particular, this article goes into great detail about optimizing code deployed to the CLR (though it's also relevant for Java, or any other modern platform), and is one of the best guides I've ever read:
http://msdn.microsoft.com/en-us/library/ms973852.aspx
To distill the article into one sentence: Nothing affects the speed of a .NET application (with sensible algorithms) more than the memory-footprint of its objects. Be very careful to minimize your memory consumption.
I would recommend Effective C# by Bill Wagner (first edition and second edition). He goes through a number of language constructs and techniques and explains which ones are faster and why. He touches on a lot of best practices as well.
More often than not, however, optimizing your algorithm will give you far better results than using any kind of language / optimization technique.