Closed 9 years ago.
I'm trying to determine the impact of each "if" statement on the performance of my C# application when it is used in loops with a large number of iterations. I could not find an existing question about this, so I created this one.
For the test I wrote two loops: one without an "if" and one with a single "if" statement. The code is the following.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
namespace IfPerformance
{
class Program
{
static void Main(string[] args)
{
int N = 500000000;
Stopwatch sw = new Stopwatch();
double a = 0, b = 0;
bool f;
sw.Restart();
for (int i = 0; i < N; i++)
{
a += 1.1;
f = a < N;
}
sw.Stop();
Console.WriteLine("Without if: " + sw.ElapsedMilliseconds + " ms");
a = 0;
sw.Restart();
for (int i = 0; i < N; i++)
{
if (a < N)
a += 1.1;
else
b += 1.1;
}
sw.Stop();
Console.WriteLine("With if: " + sw.ElapsedMilliseconds + " ms");
Console.ReadKey();
}
}
}
I ran the test with the "Optimize code" build option enabled and with "Start without debugging". The result is the following:
Without if: 154 ms
With if: 742 ms
This suggests that a single "if" statement slows the loop down by almost 5 times. I think a discussion of this would be helpful.
Also, I have noticed that the presence of several extra "if"s in a large loop may slow down my final application by 25%, which in my opinion is significant.
To be specific, I'm running a Monte Carlo optimization on a set of data, which requires many loops through the whole data set. The loop contains branching that depends on the user settings. This is where the "if"s come from.
My questions to the professionals in performance aspects are:
What is the impact of extra "if"s in a loop on the time of running many iterations?
How to avoid the slowdown?
Please post your opinion if I'm going in the wrong direction.
It doesn't matter ...
You're testing 500 MILLION iterations ... and it takes less than a second ... IN THE WORST case ...
As the comments said, to begin with you shouldn't be running in debug mode when testing performance, and even then, you'll have heaps of other things to take into consideration (performance testing is a whole big world of its own, and it's usually not as simple as it seems).
Now, do notice that you're doing two different things in the two places. If you would like to see the performance of the if, you should have them do basically the same. I'm sure the branching changes the IL code to begin with ...
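A minimal sketch of a like-for-like comparison, assuming the goal is to isolate the cost of the if/else itself. Both methods perform the same additions and both results are printed, so the work cannot be discarded. The names `BranchComparison`, `SumWithIf` and `SumWithSelect` are made up for this illustration, and the JIT is free to compile the conditional expression to a branch anyway, so treat the timings as hardware- and runtime-specific.

```csharp
using System;
using System.Diagnostics;

static class BranchComparison
{
    // The questioner's second loop: an if/else chooses which accumulator grows.
    public static double SumWithIf(int n, double limit, out double b)
    {
        double a = 0;
        b = 0;
        for (int i = 0; i < n; i++)
        {
            if (a < limit) a += 1.1;
            else b += 1.1;
        }
        return a;
    }

    // The same arithmetic written with a conditional expression instead of if/else.
    // Whether the JIT emits a branch for it is not guaranteed either way.
    public static double SumWithSelect(int n, double limit, out double b)
    {
        double a = 0;
        b = 0;
        for (int i = 0; i < n; i++)
        {
            double toA = a < limit ? 1.1 : 0.0;
            a += toA;
            b += 1.1 - toA;
        }
        return a;
    }

    public static void Main()
    {
        const int n = 500000000;
        double b1, b2;

        // Warm-up so JIT compilation is not included in the timings.
        SumWithIf(1000, 500, out b1);
        SumWithSelect(1000, 500, out b2);

        Stopwatch sw = Stopwatch.StartNew();
        double a1 = SumWithIf(n, n, out b1);
        sw.Stop();
        Console.WriteLine("if/else: {0} ms (a={1}, b={2})", sw.ElapsedMilliseconds, a1, b1);

        sw.Restart();
        double a2 = SumWithSelect(n, n, out b2);
        sw.Stop();
        Console.WriteLine("select:  {0} ms (a={1}, b={2})", sw.ElapsedMilliseconds, a2, b2);
    }
}
```

Because both variants compute identical values, any remaining timing difference comes from how the loop body is compiled and predicted, not from doing different work.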
Last, but not least, as I said ... it DOESN'T MATTER ... unless you really need to run this 500 MILLION times, and have it in so many places that your program starts to slow down because of it.
Go for readability over obsessing about whether you can save a few microseconds on an if statement.
Feel free to read these articles by Eric Lippert (who has "only" 250K rep and was a principal developer on the C# compiler team :) who'll point you in the right direction:
c# performance benchmarks mistakes part 1
c# performance benchmarks mistakes part 2
c# performance benchmarks mistakes part 3
c# performance benchmarks mistakes part 4
(Talking about this, I would guess that garbage collection (article 4) might have been something to consider ...)
Then look at: this elaborate answer about the topic
And last, but not least, have a look at Writing Faster Managed Code: Know What Things Cost. This is by Jan Gray, from the Microsoft CLR Performance Team. I'll be honest and say I haven't read this one yet :). I will though, later on...
It goes on and on ... :)
These two code samples do different things: one is a boolean assignment and the other is a conditional statement, so this is not a suitable way to evaluate performance.
Those benchmarks tell you essentially nothing at all.
There are many more things at play than just an additional if.
You also have to take branch-prediction and caching into account.
Such micro optimizations will only hinder you writing good code.
You will spend more time optimizing useless stuff than you spend time implementing good features in your software...
Think of it this way: no kind of micro-optimization will help you if you have even a single design mistake in your code.
For example, using an unsuitable data structure (such as a list for 'fast' lookup instead of a dictionary).
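To make the list-versus-dictionary point concrete, here is a small sketch. The class and method names (`LookupDemo`, `FindInList`, `FindInDictionary`) are hypothetical; the only library calls used are `List<T>`, `Dictionary<TKey, TValue>` and `TryGetValue`. The list version scans every element in the worst case, while the dictionary version does a hash lookup.

```csharp
using System;
using System.Collections.Generic;

static class LookupDemo
{
    // Linear scan: cost grows with the number of stored items.
    public static string FindInList(List<KeyValuePair<int, string>> items, int key)
    {
        foreach (KeyValuePair<int, string> pair in items)
        {
            if (pair.Key == key) return pair.Value;
        }
        return null;
    }

    // Hash lookup: expected constant cost regardless of the number of items.
    public static string FindInDictionary(Dictionary<int, string> items, int key)
    {
        string value;
        return items.TryGetValue(key, out value) ? value : null;
    }

    public static void Main()
    {
        var list = new List<KeyValuePair<int, string>>();
        var dict = new Dictionary<int, string>();
        for (int i = 0; i < 100000; i++)
        {
            list.Add(new KeyValuePair<int, string>(i, "item" + i));
            dict[i] = "item" + i;
        }
        Console.WriteLine(FindInList(list, 99999));
        Console.WriteLine(FindInDictionary(dict, 99999));
    }
}
```

A change like this alters the algorithmic cost of every lookup, which is usually a far bigger win than shaving a branch off a loop body.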
I ran this on a laptop with 64-bit Windows 8.1 and a 2.2 GHz Intel Core i3. The code was compiled in release mode and run without a debugger attached.
static void Main(string[] args)
{
calcMax(new[] { 1, 2 });
calcMax2(new[] { 1, 2 });
var A = GetArray(200000000);
var stopwatch = new Stopwatch();
stopwatch.Start(); stopwatch.Stop();
GC.Collect();
stopwatch.Reset();
stopwatch.Start();
calcMax(A);
stopwatch.Stop();
Console.WriteLine("calcMax - \t{0}", stopwatch.Elapsed);
GC.Collect();
stopwatch.Reset();
stopwatch.Start();
calcMax2(A);
stopwatch.Stop();
Console.WriteLine("calcMax2 - \t{0}", stopwatch.Elapsed);
Console.ReadKey();
}
static int[] GetArray(int size)
{
var r = new Random(size);
var ret = new int[size];
for (int i = 0; i < size; i++)
{
ret[i] = r.Next();
}
return ret;
}
static int calcMax(int[] A)
{
int max = int.MinValue;
for (int i = 0; i < A.Length; i++)
{
max = Math.Max(max, A[i]);
}
return max;
}
static int calcMax2(int[] A)
{
int max1 = int.MinValue;
int max2 = int.MinValue;
for (int i = 0; i < A.Length; i += 2)
{
max1 = Math.Max(max1, A[i]);
max2 = Math.Max(max2, A[i + 1]);
}
return Math.Max(max1, max2);
}
Here are some statistics of program performance (time in milliseconds):
Framework 2.0
X86 platform:
2269 (calcMax)
2971 (calcMax2)
[winner calcMax]
X64 platform:
6163 (calcMax)
5916 (calcMax2)
[winner calcMax2]
Framework 4.5 (time in milliseconds)
X86 platform:
2109 (calcMax)
2579 (calcMax2)
[winner calcMax]
X64 platform:
2040 (calcMax)
2488 (calcMax2)
[winner calcMax]
As you can see, the performance differs depending on the framework and the chosen target platform. I looked at the generated IL code and it is the same in each case.
calcMax2 is under test because it should take advantage of processor "pipelining". But it is faster only with Framework 2.0 on the 64-bit platform. So, what is the real reason for the difference in performance shown here?
Just some notes worth mentioning. My processor (Haswell i7) doesn't compare well with yours, I certainly can't get close to reproducing the outlier x64 result.
Benchmarking is a hazardous exercise and it is very easy to make simple mistakes that can have big consequences on execution time. You can only truly see them when you look at the generated machine code. Use Tools + Options, Debugging, General and untick the "Suppress JIT optimization" option. That way you can look at the code with Debug > Windows > Disassembly and not affect the optimizer.
Some things you'll see when you do this:
You made a mistake: you are not actually using the method return value. The jitter optimizer takes advantage of opportunities like this where possible; it completely omits the max variable assignment in calcMax(). But not in calcMax2(). This is a classic benchmarking oops; in a real program you'd of course use the return value. This makes calcMax() look too good.
The .NET 4 jitter is smarter about optimizing Math.Max(); it can generate the code inline. The .NET 2 jitter couldn't do that yet; it has to make a call to a CLR helper function. The 4.5 test should thus run a lot faster; that it didn't is a strong hint at what really throttles the code execution. It is not the processor's execution engine, it is the cost of accessing memory. Your array is too large to fit in the processor caches, so your program is bogged down waiting for the slow RAM to supply the data. If the processor cannot overlap that with executing instructions then it just stalls.
Noteworthy about calcMax() is what happens to the array-bounds check that C# performs. The jitter knows how to completely eliminate it from the loop. It however isn't smart enough to do the same in calcMax2(), the A[i + 1] screws that up. That check doesn't come for free, it should make calcMax2() quite a bit slower. That it doesn't is again a strong hint that memory is the true bottleneck. That's pretty normal btw, array bound checking in C# can have low to no overhead because it is so much cheaper than the array element access.
As for your basic quest, trying to improve super-scalar execution opportunities, no, that's not how processors work. A loop is not a boundary for the processor, it just sees a different stream of compare and branch instructions, all of which can execute concurrently if they don't have inter-dependencies. What you did by hand is something the optimizer already does itself, an optimization called "loop unrolling". It selected not to do so in this particular case btw. An overview of jitter optimizer strategies is available in this post. Trying to outsmart the processor and the optimizer is a pretty tall order and getting a worse result by trying to help is certainly not unusual.
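On the first point above (the unused return value), a minimal sketch of the fix is to keep the result and print it after the timed region, so the optimizer cannot treat the computation as dead code. `UseTheResult` is a made-up class name and the array size and seed are arbitrary choices for the illustration.

```csharp
using System;
using System.Diagnostics;

static class UseTheResult
{
    public static int CalcMax(int[] a)
    {
        int max = int.MinValue;
        for (int i = 0; i < a.Length; i++)
            max = Math.Max(max, a[i]);
        return max;
    }

    public static void Main()
    {
        var rng = new Random(42);
        var data = new int[10000000];
        for (int i = 0; i < data.Length; i++) data[i] = rng.Next();

        CalcMax(new[] { 1, 2 });          // warm-up: JIT-compile before timing

        Stopwatch sw = Stopwatch.StartNew();
        int max = CalcMax(data);          // keep the result...
        sw.Stop();
        Console.WriteLine("max = {0}, elapsed = {1}", max, sw.Elapsed); // ...and use it
    }
}
```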
Many of the differences that you see are well within the range of tolerance, so they should be considered as no differences.
Essentially, what these numbers show is that Framework 2.0 was highly unoptimized for X64, (no surprise at all here,) and that overall, calcMax performs slightly better than calcMax2. (No surprise there either, because calcMax2 contains more instructions.)
So, what we learn is that someone came up with a theory that they could achieve better performance by writing high-level code that somehow takes advantage of some pipelining of the CPU, and that this theory was proved wrong.
The running time of your code is dominated by the failed branch predictions that are occurring within Math.Max() due to the randomness of your data. Try less randomness (more consecutive values where the second one will always be greater) and see if it gives you any better insights.
Every time you run the program, you'll get slightly different results.
Sometimes calcMax will win, and sometimes calcMax2 will win. This is because there is a problem with comparing performance that way. What Stopwatch measures is the time elapsed from the call to stopwatch.Start() until the call to stopwatch.Stop(). In between, things independent of your code can occur. For example, the operating system can take the processor away from your process and give it to another process running on your machine for a while, because your process's time slice has ended. After a while, your process gets the processor back for another time slice.
Such occurrences cannot be controlled or foreseen by your comparison code, and thus the entire experiment shouldn't be treated as reliable.
To minimize this kind of measurement error, you should measure every function many times (for example, 1000 times), and calculate the average time of all measurements. This method of measurement tends to significantly improve the reliability of the result, as it is more resilient to statistical errors.
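The repeated-measurement idea can be sketched as a small helper. `RepeatedTiming` and `MeasureMilliseconds` are hypothetical names; the helper runs the action once untimed (so JIT compilation is excluded) and then collects one sample per run. Reporting the minimum alongside the mean is an addition of this sketch, since the minimum is less affected by the scheduler interruptions described above.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class RepeatedTiming
{
    // Runs the action once untimed, then times it `runs` times and returns every sample.
    public static double[] MeasureMilliseconds(Action action, int runs)
    {
        action(); // warm-up
        var samples = new double[runs];
        var sw = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }
        return samples;
    }

    public static void Main()
    {
        long sink = 0; // accumulated and printed so the loop is not dead code
        double[] samples = MeasureMilliseconds(() =>
        {
            for (int i = 0; i < 1000000; i++) sink += i;
        }, 100);

        Console.WriteLine("mean {0:F3} ms, min {1:F3} ms (sink {2})",
            samples.Average(), samples.Min(), sink);
    }
}
```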
I wrote a run-once program to read data from one table and migrate what was read into several other tables (using LINQ). It was one Main() method that extracted the data, transformed it where needed, converted some fields, etc. and inserted the data into the appropriate tables. Basically, just migrating data from one format to another. The program would take about 5 minutes to run, but it did what I needed.
While looking at the program, I thought I'd break up the huge Main() method into smaller chunks. Basically, I just refactored areas of the code and extracted them to methods.
The program still does what it's supposed to, migrate data, but it takes twice as long now, if not longer.
So, my question is: Do method calls slow down processing? None of the code itself changed, other than being put inside its own method.
Yes, function calls generally have a cost but it's not usually very high unless your code has been refactored to a point where every function has only one line, or you're calling them billions of times :-)
The question you have to ask yourself is: do the benefits outweigh the cost?
Modularising your code will almost certainly make it easier to maintain, unless it's some Mickey-Mouse Hello-World type of program.
The other question you have to ask is, if it's run-once, why did you bother trying to improve it? If five minutes is acceptable, then the effort you spent improving it seems like a sunk cost to me. If it's going to be used a lot, or by many other people, that's one thing. But, if you're only running it (for example) once a month, why bother?
If you really want to know where the bottlenecks are, Microsoft have spent some time making it easy for you.
Though not a huge sample, consider the following C program (since that's my area of expertise):
#include <stdio.h>
void xyzzy(void) {}
int main (int argc, char *argv[]) {
int x = argc;
for (int i = 0; i < 1000; i++) {
for (int j = 0; j < 1000000; j++) {
x = x + 1;
//xyzzy();
}
}
printf ("%d\n", x);
return 0;
}
When compiled (without any optimisation since I don't want the compiler second-guessing me, and using trickery to reduce the chances of the compiler weaving any magic before running the code), the figures I get for with and without the function call (five separate runs each, using sys+user times from the time command) are:
  with    without
-------   -------
  2.452     2.264
  2.451     2.358
  2.468     2.342
  2.390     2.233
  2.374     2.249
-------   -------
 12.135    11.446   total
  2.468     2.358   max
  2.374     2.233   min
So what can we tell from that, apart from the fact I'm a lousy statistician? :-)
It appears, based on the total that the one without the function call is about 6% faster. It's also telling that the fastest run with the function call is still slower than the slowest run without it.
What's more efficient?
decimal value1, value2, formula;
This:
for (long i = 0; i < 1000000000000; i++)
{
value1 = getVal1fromSomeWhere();
value2 = getVal2fromSomeWhere();
SendResultToA( value1*value2 + value1/value2);
SendResultToB( value1*value2 + value1/value2);
}
Or this:
for (long i = 0; i < 1000000000000; i++)
{
value1 = getVal1fromSomeWhere();
value2 = getVal2fromSomeWhere();
formula = value1*value2 + value1/value2;
SendResultToA(formula);
SendResultToB(formula);
}
Intuitively I would go for the latter...
I guess there's a tradeoff between having an extra-assignment at each iteration (decimal, formula) and performing the computation on and on with no extra-variable...
EDIT :
Uhhh. God... Do I Have to go through this each time I ask a question ?
If I ask it, it is because YES it DOES MATTER to me, fellows.
Everybody does not live in a gentle non-memory-critical world, WAKE-UP !
this was just an overly simple example. I am doing MILLIONS of scientific computation and clouding multithreaded stuff, do not take me for a noob :-)
So YES, DEFINITELY every nanosecond counts.
PS : I almost regret C++ and pointers. Automatic Memory Management and GC's definitely made developers ignorant and lazy :-P
First of all, profile, and only do such micro-optimizations if they are necessary. Otherwise optimize for readability. And in your case I think the second one is easier to read.
And your statement that the second code has an additional assignment isn't true anyways. The result of your formula needs to be stored into a register in both codes.
The concept of the extra variable isn't valid once the code is compiled. For example in your case the compiler can store formula in the register where value1 or value2 was stored before, since their lifetimes don't overlap.
I wouldn't be surprised if the first one gets optimized to the second one. I think this optimization is called "common subexpression elimination". But of course it's only possible if the expression is free of side effects.
And inspecting the IL isn't always enough to see what gets optimized. The jitter optimizes too. I had some code that was quite ugly and slow looking in IL, but very short in the final generated x86 code. And when inspecting the machine code you need to make sure it's actually optimized. For example, if you run in VS, even the release code isn't fully optimized.
So my guess is that they are equally fast if the compiler can optimize them, and else the second one is faster since it doesn't need to evaluate your formula twice.
Unless you're doing this tens of thousands of times a second, it doesn't matter at all. Optimize towards readability and maintainability!
Edit: Haters gonna hate, okay fine, here you go. My code:
static void MethodA()
{
for (int i = 0; i < 1000; i++) {
var value1 = getVal1fromSomeWhere();
var value2 = getVal2fromSomeWhere();
SendResultToA(value1 * value2 + value1 / value2);
SendResultToB(value1 * value2 + value1 / value2);
}
}
static void MethodB()
{
for (int i = 0; i < 1000; i++) {
var value1 = getVal1fromSomeWhere();
var value2 = getVal2fromSomeWhere();
var formula = value1 * value2 + value1 / value2;
SendResultToA(formula);
SendResultToB(formula);
}
}
And the actual x86 assembly generated by both of them:
MethodA: http://pastie.org/1532794
MethodB: http://pastie.org/1532792
These are very long because it inlined getVal[1/2]fromSomeWhere and SendResultTo[A/B], which I wired up to Random and Console.WriteLine. We can see that, indeed, neither the CLR nor the jitter is smart enough to avoid duplicating the previous calculation, so we spend an additional 318 bytes of x86 machine code doing the extra math.
However, keep this in mind: any gains you make by these kinds of optimizations are immediately made irrelevant by even a single extra page fault or disk read/write. These days, CPUs are rarely the bottleneck in most applications; I/O and memory are. Optimize toward spatial locality (i.e. using contiguous arrays so you hit fewer page faults), and toward reducing disk I/O and hard page faults (i.e. loading code you don't need requires the OS to fault it in).
To the extent that it might matter, I think you're right. And both are equally readable (arguably).
Remember, the number of loop iterations has nothing to do with the local memory requirements. You're only talking about a few extra bytes (and the value is going to be put on the stack for passage to the function anyway), whereas the cycles you save* by caching the result of the calculation add up significantly with the number of iterations.
* That is, provided that the compiler doesn't do this for you. It would be instructive to look at the IL generated in each case.
You'd have to disassemble the bytecode and/or benchmark to be sure but I'd argue that this would probably be the same since it's trivial for the compiler to see that formula (in the loop scope) does not change and can quite easily be 'inlined' (substituted) directly.
EDIT: As user CodeInChaos correctly comments disassembling the bytecode might not be enough since it's possible the optimisation is only introduced after jitting.
Yes, I am using a profiler (ANTS). But at the micro-level it cannot tell you how to fix your problem. And I'm at a microoptimization stage right now. For example, I was profiling this:
for (int x = 0; x < Width; x++)
{
for (int y = 0; y < Height; y++)
{
packedCells.Add(Data[x, y].HasCar);
packedCells.Add(Data[x, y].RoadState);
packedCells.Add(Data[x, y].Population);
}
}
ANTS showed that the y-loop-line was taking a lot of time. I thought it was because it has to constantly call the Height getter. So I created a local int height = Height; before the loops, and made the inner loop check for y < height. That actually made the performance worse! ANTS now told me the x-loop-line was a problem. Huh? That's supposed to be insignificant, it's the outer loop!
Eventually I had a revelation - maybe using a property for the outer-loop-bound and a local for the inner-loop-bound made CLR jump often between a "locals" cache and a "this-pointer" cache (I'm used to thinking in terms of CPU cache). So I made a local for Width as well, and that fixed it.
From there, it was clear that I should make a local for Data as well - even though Data was not even a property (it was a field). And indeed that bought me some more performance.
Bafflingly, though, reordering the x and y loops (to improve cache usage) made zero difference, even though the array is huge (3000x3000).
Now, I want to learn why the stuff I did improved the performance. What book do you suggest I read?
CLR via C# by Jeffrey Richter.
It is such a great book that someone stole it from my library, together with C# in Depth.
The CLR is not involved at all here, this should all be translated to straight machine code without calls into the CLR. The JIT compiler is responsible for generating that machine code, it has an optimizer that tries to come up with the most efficient code. It has limitations, it cannot spend a large amount of time on it.
One of the important things it does is figuring out what local variables should be stored in the CPU registers. That's something that changed when you put the Height property in a local variable. It possibly decided to store that variable in a register. But now there's one less available to store another variable. Like the x or y variable, one that's critical for speed. Yes, that will slow it down.
You got a bad diagnostic about the outer loop. That could possibly be caused by the JIT optimizer re-arranging the loop code, giving the profiler a harder time mapping the machine code back to the corresponding C# statement.
Similarly, the optimizer might have decided that you were using the array inefficiently and switched the indexing order back. Not so sure it actually does that, but not impossible.
Anyhoo, the only way you can get some insight here is by looking at the generated machine code. There are many decent books about x86 assembly code, although they might be a bit hard to find these days. Your starting point is Debug + Windows + Disassembly.
Keep in mind however that even the machine code is not a very good predictor of how efficient code is going to run. Modern CPU cores are enormously complicated and the machine code is no longer representative for what actually happens inside the core. The only tried and true way is what you've already been doing: trial and error.
Albin - no. Honestly I didn't think that running outside a profiler would change the performance difference, so I didn't bother. You think I should have? Has that been a problem for you before? (I am compiling with optimizations on though)
Running under a debugger changes the performance: when it's being run under a debugger, the just-in-time compiler automatically disables optimizations (to make it easier to debug)!
If you must, use the debugger to attach to an already-running already-JITted process.
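As a small safeguard, a benchmark can check for this itself. The sketch below uses `System.Diagnostics.Debugger.IsAttached` and the standard `DEBUG` compilation symbol; the class name `BenchmarkGuard` and the helper `IsTrustworthy` are made up for the illustration.

```csharp
using System;
using System.Diagnostics;

static class BenchmarkGuard
{
    // A timing is only worth reporting from an optimized build with no debugger attached.
    public static bool IsTrustworthy(bool debuggerAttached, bool debugBuild)
    {
        return !debuggerAttached && !debugBuild;
    }

    public static void Main()
    {
        bool debugBuild = false;
#if DEBUG
        debugBuild = true;
#endif
        if (!IsTrustworthy(Debugger.IsAttached, debugBuild))
        {
            Console.WriteLine("Warning: debugger attached or DEBUG build; timings are not representative.");
        }
    }
}
```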
One thing you should know about working with Arrays is that the CLR will always make sure that array-indices are not out-of-bounds. It has an optimization for 1-dimensional arrays but not for 2+ dimensions.
Knowing this, you may want to benchmark MyCell Data[][] instead of MyCell Data[,]
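A sketch of the two layouts to benchmark against each other. It uses `int` cells instead of `MyCell` to stay self-contained, and the names `ArrayLayouts`, `SumRectangular` and `SumJagged` are hypothetical. In the jagged version each row is an ordinary one-dimensional array, which is the case the answer above says the bounds-check optimization applies to.

```csharp
using System;

static class ArrayLayouts
{
    // Rectangular (multi-dimensional) array: every access goes through data[x, y].
    public static long SumRectangular(int[,] data)
    {
        long sum = 0;
        int width = data.GetLength(0), height = data.GetLength(1);
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                sum += data[x, y];
        return sum;
    }

    // Jagged array: fetch the row once, then index a plain one-dimensional array.
    public static long SumJagged(int[][] data)
    {
        long sum = 0;
        for (int x = 0; x < data.Length; x++)
        {
            int[] row = data[x];
            for (int y = 0; y < row.Length; y++)
                sum += row[y];
        }
        return sum;
    }
}
```

Which one is faster on a given runtime is something to measure rather than assume.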
Hm, I don't think that the loop unrolling is the real problem.
1. I'd try to avoid accessing the array Data three times per inner loop iteration.
2. I'd also recommend rethinking the three Add statements: you are apparently accessing a collection three times to add some trivial data. Make it only one access per iteration and add a data type containing three entries:
for (int y = 0; ... {
var tTemp = Data[x, y];
packedCells.Add(new {
tTemp.HasCar, tTemp.RoadState, tTemp.Population
});
}
Another look reveals, that you are basically vectorizing a matrix by copying it into an array (or some other sequential collection)... Is that necessary at all? Why don't you just define a specialized indexer which simulates that linear access? Even better, if you only need to enumerate the entries (in that example you do, no random access required), why don't you use an adequate LINQ expression?
Point 1) Educated guesses are not the way to do performance tuning. In this case I can guess about as well as most, but guessing is the wrong way to do it.
Point 2) Profilers need to be well understood before you know what they're actually telling you. Here's a discussion of the issues. For example, what many profilers do is tell you "where the program spends its time", i.e. where the program counter spends its time, so they are almost absolutely blind to time requested by function calls, which is what your inner loop seems to consist of.
I do a lot of performance tuning, and here is what I do. I cycle between two activities:
Overall time measurement. This doesn't require special tools. I'm not trying to measure individual routines.
"Bottleneck" location. This does not require running the code at any kind of speed, because I'm not measuring. What I'm doing is locating lines of code that are responsible for a significant percent of time. I know which lines they are because they are on the stack for that percent, and stack samples easily find them.
Once I find a "bottleneck" and fix it, I go back to the first step, measure what percent of time I saved, and do it all again on the next "bottleneck", typically from 2 to 6 times. I am helped by the "magnification effect", in which a fixed problem magnifies the percentage used by remaining problems. It works for both macro and micro optimization.
(Sorry if I can't write "bottleneck" without quotes, because I don't think I've ever found a performance problem that resembled the neck of a bottle. Rather they were all simply doing things that didn't really need to be done.)
Since the comment might be overlooked, I'll repeat myself: it is quite cumbersome to optimize code that is superfluous in the first place. You do not really need to explicitly linearize your matrix at all; see the comment above: define a linearizing adapter which implements IEnumerable<MyCell> and feed it into the formatter.
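One possible shape for such an adapter, as a sketch only: it is written generically (`LinearView<T>`, a made-up name) because `MyCell` is not shown in the question, and it walks the matrix row by row with an iterator instead of copying it.

```csharp
using System.Collections;
using System.Collections.Generic;

// Presents a two-dimensional array as a flat sequence without copying it.
sealed class LinearView<T> : IEnumerable<T>
{
    private readonly T[,] _data;

    public LinearView(T[,] data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        int width = _data.GetLength(0), height = _data.GetLength(1);
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                yield return _data[x, y];
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Anything that accepts an `IEnumerable<T>` can then consume `new LinearView<MyCell>(Data)` directly; whether the formatter in question does is an assumption.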
I am getting a warning when I try to add another answer, so I am going to recycle this one.. :) After reading Steve's comments and thinking about it for a while, I suggest the following:
If serializing a multi-dimensional array is too slow (I haven't tried, I just believe you...), don't use it at all! It appears that your matrix is not sparse and has fixed dimensions. So define the structure holding your cells as a simple linear array with an indexer:
[Serializable()]
class CellMatrix {
Cell [] mCells;
public int Rows { get; }
public int Columns { get; }
public Cell this[int i, int j] {
get {
return mCells[i + Rows * j];
}
// setter...
}
// constructor taking rows/cols...
}
A thing like this should serialize as fast as native Array does... I don't recommend hard coding the layout of Cell in order to save few bytes there...
Cheers,
Paul
I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month. This application had been working well until recently, when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations. This is up from 8 modes in the past, which would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out of memory exception is thrown. In fact the entire application cannot support more than 15 modes because it uses integers to track the number of modes and it exceeds the upper limit for an integer. Barring that issue, I need to do what I can to reduce the memory utilization of the application and optimize this to run as efficiently as possible, even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead, I would like to get people’s opinion on the method to see if there is any room for optimization there.
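For reference, the two counts quoted above match the number of ways to split the work days among the modes, which is the binomial coefficient C(workDays + modes - 1, modes - 1). The sketch below (hypothetical names `CombinationCount` and `Count`) computes it with `long` arithmetic, since the 16-mode value is larger than `int.MaxValue`.

```csharp
using System;

static class CombinationCount
{
    // Number of ways to distribute workDays among modes:
    // C(workDays + modes - 1, modes - 1), computed incrementally.
    public static long Count(int modes, int workDays)
    {
        int n = workDays + modes - 1;
        int k = Math.Min(modes - 1, workDays); // C(n, k) == C(n, n - k)
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            // After this step result == C(n - k + i, i), so the division is exact.
            result = checked(result * (n - k + i)) / i;
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Count(8, 22));   // 1560780
        Console.WriteLine(Count(16, 22));  // 9364199760
    }
}
```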
EDIT
Based on a suggestion by a few people to consider something more academic than merely calculating every possible answer, listed below is a brief explanation of how the optimal run (combination of modes) is chosen.
Currently the computer determines every possible way that the plant can run for the number of work days that month. For example, 3 modes for a maximum of 2 work days would result in the combinations (where the number represents the mode chosen) of (1,1), (1,2), (1,3), (2,2), (2,3), (3,3). In each mode a product is produced at a different rate; for example, in mode 1, product x may be produced at 50 units per hour, product y at 30 units per hour and product z at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers that most closely match the forecasted value for each product for the month is chosen. However, because in some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of that product for the next month to ensure that at the end of the year the product has met the forecasted value. Since warehouse space is tight, it is important that products are not overproduced too much either.
Thank you
private List<List<int>> _modeIterations = new List<List<int>>();
private void CalculateCombinations(int modes, int workDays, string combinationValues)
{
List<int> _tempList = new List<int>();
if (modes == 1)
{
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
_modeIterations.Add(_tempList);
}
else
{
for (int i = workDays + 1; --i >= 0; )
{
CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
}
}
}
This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".
It is well-known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. (After all, your sales targets are already just educated guesses, therefore finding the optimal solution is already going to be impossible; you haven't got complete information.)
What I would do is start by reading the Wikipedia page on the Knapsack Problem:
http://en.wikipedia.org/wiki/Knapsack_problem
This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"
This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.
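As a concrete starting point, here is the textbook dynamic-programming solution to the 0/1 knapsack problem with integer weights. It is exact rather than approximate and it is not a solution to the scheduling problem in the question; it only illustrates the kind of problem the Wikipedia page discusses. The names `Knapsack` and `MaxValue` are made up for the sketch.

```csharp
using System;

static class Knapsack
{
    // best[c] holds the highest value achievable with total weight at most c.
    // Iterating capacities downwards ensures each item is used at most once.
    public static int MaxValue(int[] weights, int[] values, int capacity)
    {
        var best = new int[capacity + 1];
        for (int item = 0; item < weights.Length; item++)
        {
            for (int c = capacity; c >= weights[item]; c--)
            {
                best[c] = Math.Max(best[c], best[c - weights[item]] + values[item]);
            }
        }
        return best[capacity];
    }

    public static void Main()
    {
        int[] weights = { 10, 20, 30 };
        int[] values = { 60, 100, 120 };
        Console.WriteLine(MaxValue(weights, values, 50)); // 220 (the 20 and 30 pound items)
    }
}
```

The running time is proportional to the number of items times the capacity, which is why approximation schemes become interesting once the numbers get large.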
You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:
public delegate void Processor(List<int> args);
private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
if (modes == 1)
{
List<int> _tempList = new List<int>();
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
processor.Invoke(_tempList);
}
else
{
for (int i = workDays + 1; --i >= 0; )
{
CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
}
}
}
I am assuming here, that your current pattern of work is something along the lines
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);
foreach( List<int> list in _modeIterations ) {
... process the list ...
}
With the direct-process-approach, this would be
private void ProcessPermutation(List<int> args)
{
... process ...
}
... somewhere else ...
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);
I would also suggest that you try to prune the search tree as early as possible; if you can already tell that certain combinations of the arguments will never yield something that can be processed, you should catch those during generation and avoid the recursion altogether, if this is possible.
In new versions of C#, generation of the combinations using an iterator (?) function might be usable to retain the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it.
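A sketch of what such an iterator could look like, under the assumption that each result is the number of work days assigned to each mode (which is what the original recursion builds via its comma-separated string). `ModeCombinations`, `Enumerate` and `Fill` are hypothetical names. It works on a single reused `int[]` instead of strings and lists, so nothing is stored; callers must copy a result if they want to keep it.

```csharp
using System;
using System.Collections.Generic;

static class ModeCombinations
{
    // Lazily yields every way to split workDays across modes (modes >= 1).
    // The SAME array instance is yielded each time; copy it to keep a result.
    public static IEnumerable<int[]> Enumerate(int modes, int workDays)
    {
        var current = new int[modes];
        return Fill(current, 0, workDays);
    }

    private static IEnumerable<int[]> Fill(int[] current, int index, int remaining)
    {
        if (index == current.Length - 1)
        {
            current[index] = remaining; // the last mode takes whatever is left
            yield return current;
            yield break;
        }
        for (int i = remaining; i >= 0; i--)
        {
            current[index] = i;
            foreach (int[] result in Fill(current, index + 1, remaining - i))
                yield return result;
        }
    }

    public static void Main()
    {
        foreach (int[] combination in Enumerate(3, 2))
            Console.WriteLine(string.Join(",", combination));
    }
}
```

Nested iterators add some per-item overhead, so for very deep recursion an explicit loop may be faster; the point here is only that memory use stays constant.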
The problem lies more in the brute-force approach than in the code itself. It's possible that brute force might be the only way to approach the problem, but I doubt it. Chess, for example, cannot be solved by brute force, but computers play it quite well using heuristics to discard the less promising approaches and focus on the good ones. Maybe you should take a similar approach.
On the other hand we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations which, anyway, will not scale if the modes go up to 32... even if you store it on disk.
if (modes == 1)
{
List<int> _tempList = new List<int>();
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
processor.Invoke(_tempList);
}
Everything in this block of code is executed over and over again, so no line in that code should make use of memory without freeing it. The most obvious place to avoid memory craziness is to write out combinationValues to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing here is bad, since every concatenation results in memory sadness. At least use a StringBuilder (see Back to Basics, which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out of memory error may be to use a memory profiler (download link from download.microsoft.com).
By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.
I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.
Edit: On the other hand it seems I am mistaken: Which one is more efficient : List<int> or int[]