Finding a pattern within a string variable

Finding a pattern within a string variable - c#

I need to create a heads or tails project where the computer will guess randomly up to 5 times, but on the sixth time it will look into the playersGuessHistory variable setup as a string to see if it can find a match for a pattern of 4 entires. If there is a pattern found the computer will guess the next character after the pattern.
For example, given the sequence HHTTH the pattern is HHTT so the computer would guess H for the sixth turn. My only problem is that I'm having difficulty setting up the project so that it will look through the playersguesshistory and find the patterns and guess the next character in the history. Any suggestions?

Create a List<string> and throw the history into this, so that each item in the list is a string of 4 characters (like you show in your text). Then when the computer should guess select the items (there should be several) from the list that starts with (myList.StartsWith - method) your string, then you should sum up the amount of times that H is the next character, and the amount of times that T is the next character - calculate the probability of each of them and let the computer choose the one with the highest probability...
Does it make sense?

This is a little snippet based on what I understand of your requirement. The below method will return a string of guesses of 'H' heads or 'T' tails. The first 5 guesses are random, and then if any sequence of 4 guesses is HHTT the final guess will be 'H'.
static string HeadsOrTails()
{
string guessHistory = String.Empty;
// Guess heads or tails 5 times
Random random = new Random();
for (int currentGuess = 0; currentGuess < 5; currentGuess++)
{
if (random.Next(2) == 0)
guessHistory += 'H';
else
guessHistory += 'T';
}
// Analyse pattern of guesses
if (guessHistory.Substring(0, 4) == "HHTT" || guessHistory.Substring(1, 4) == "HHTT")
{
// If guess history contains HHTT then make the 6th guess = H
guessHistory += 'H';
}
return guessHistory;
}
This is a very simple implementation and will only work for 5 random initial guesses, but it should be quite easy to enhance as needed.

First of all, if the heads and tails are really random, like results from flipping an actual coin, this task is pointless. The computer will always get the next throw right with probability 1/2, regardless of any perceived patters in the history. (See "Independence".)
Now, if the heads and tails are not really random (e.g. they are created by a person calling heads or tails in a way he thinks is random), then we can maybe get the computer to have a higher success quote than 1/2.
I'd try the following: To start, check how often in the history.
heads are followed by heads
heads are followed by tails
and use these number for a guess on the transition probability H->H and H->T, do the same with the tails, and guess the next outcome based on the last one, choosing whatever seems more probable..
Says in the sequence "HHHTH", you find
- H->H: 2 of 3
- H->T: 1 of 3
- T->H: 1 of 1
Since the last throw came up heads, the computer should choose heads as the guess for the next throw.
Now, you can experiment with taking longer parts of the history into account, by counting the transitions "HH->T" and so on and try to improve your success rate.

Related

Magic square brute force algorithm

Basically for an assignment I need to create a C# program that will take a number as input (n) then create a 2D array size n*n with the numbers 1 to (n*n). It needs to use a brute force method. I have done this but at the moment the program will just randomly generate the order of the numbers each time, so it will sometimes check the same order more than once. Obviously this means it takes a really long time to check any number above 3, and even for 3 it can take a few minutes. Basically I'm wondering if there is any way for me to make it only check each order once. I am only allowed to use "basic" C# functions, so just things like *, /, +, - and nothing like .Shuffle etc.

Let me make sure I understand the question: you wish to enumerate all permutations of the numbers 1 through n squared, and check whether the permutation produces a magic square. You are now generating permutations randomly, but instead you wish to generate all permutations.
I wrote a series of articles on generating permutations; it is too long to easily summarize here.
http://ericlippert.com/2013/04/15/producing-permutations-part-one/

Choosing random order, as you found, is not a good idea.
I suggest that you put all the number 1 ... (n*n) in array , and than find all the permutation.
when you have all the permutation, it's easy to create square (1 .. n ==> the first row, n+1 ... 2n ==> the second row and so on).
Now, finding all the permutation can be done with the basic operation with recursion

Genetic algorithm initial seed diversity using value encoding C#

I would like to know the following:
How to effectively make initial generation of chromosomes with high diversity using value encoding ?
One way is grid initialization, but it is too slow.
Till now I have been using Random class from .NET for choosing random values in value encoding but, although values are uniformly distributed, fitness function values calculated from such chromosomes are not. Here is a code for Chromosome initialization:
public Chromosome(Random rand)
{
Alele = new List<double>();
for (int i = 0; i < ChromosomeLength; i++)
{
Alele.Add(rand.NextDouble() * 2000 - 1000);
}
}
So, I developed a function that calculates fitness from new, randomly made chromosome (upper code) and if fitness is similar to any other already in the list of chromosomes, a new chromosome is made randomly and his fitness is calculated and this process is repeated until his fitness is not different enough from those already in the list.
Here is the code for this part:
private bool CheckSimilarFitnes(List<Chromosome> chromosome, Chromosome newCandidate)
{
Boolean flag=false;
double fitFromList, fitFromCandidate;
double fitBigger,fitSmaller;
foreach (var listElement in chromosome)
{
fitFromList = listElement.CalculateChromosomeFitness(listElement.Alele);
fitFromCandidate = newCandidate.CalculateChromosomeFitness(newCandidate.Alele);
fitBigger = fitFromList >= fitFromCandidate ? fitFromList : fitFromCandidate;
fitSmaller = fitFromList < fitFromCandidate ? fitFromList : fitFromCandidate;
if ((fitFromList / fitFromCandidate) < 1.5)
return false
}
else return true;
}
But, the more chromosomes I have in the list it takes longer to add a new one, with fitness that is enough different from others already in there.
So, is there a way to make this grid initialization more faster, it takes days to make 80 chromosomes like this?

here's some code that might help (which I just wrote): GA for ordering 10 values spaced by 1.0. It starts with a population of 100 completely random alleles, which is exactly how your code starts.
The goal I gave the GA to solve was to order the values in increasing order with a separation of 1.0. It does this in the fitness function Eval_OrderedDistance by by computing the standard deviation of each pair of samples from 1.0. As the fitness tends toward 0.0, the alleles should start to appear in sequential order.
Generation 0's fittest Chromosome was completely random, as were the rest of the Chromosomes. You can see the fitness value is very high (i.e., bad):
GEN: fitness (allele, ...)
0: 375.47460 (583.640, -4.215, -78.418, 164.228, -243.982, -250.237, 354.559, 374.306, 709.859, 115.323)
As the generations continue, the fitness (standard deviation from 1.0) decreases until it's nearly perfect in generation 100,000:
100: 68.11683 (-154.818, -173.378, -170.846, -193.750, -198.722, -396.502, -464.710, -450.014, -422.194, -407.162)
...
10000: 6.01724 (-269.681, -267.947, -273.282, -281.582, -287.407, -293.622, -302.050, -307.582, -308.198, -308.648)
...
99999: 0.67262 (-294.746, -293.906, -293.114, -292.632, -292.596, -292.911, -292.808, -292.039, -291.112, -290.928)
The interesting parts of the code are the fitness function:
// try to pack the aleles together spaced apart by 1.0
// returns the standard deviation of the samples from 1.0
static float Eval_OrderedDistance(Chromosome c) {
float sum = 0;
int n = c.alele.Length;
for(int i=1; i<n; i++) {
float diff = (c.alele[i] - c.alele[i-1]) - 1.0f;
sum += diff*diff; // variance from 1.0
}
return (float)Math.Sqrt(sum/n);
}
And the mutations. I used a simple crossover and a "completely mutate one allele":
Chromosome ChangeOne(Chromosome c) {
Chromosome d = c.Clone();
int i = rand.Next() % d.alele.Length;
d.alele[i] = (float)(rand.NextDouble()*2000-1000);
return d;
}
I used elitism to always keep one exact copy of the best Chromosome. Then generated 100 new Chromosomes using mutation and crossover.
It really sounds like you're calculating the variance of the fitness, which does of course tell you that the fitnesses in your population are all about the same. I've found that it's very important how you define your fitness function. The more granular the fitness function, the more you can discriminate between your Chromosomes. Obviously, your fitness function is returning similar values for completely different chromosomes, since your gen 0 returns a fitness variance of 68e-19.
Can you share your fitness calculation? Or what problem you're asking the GA to solve? I think that might help us help you.
[Edit: Adding Explicit Fitness Sharing / Niching]
I rethought this a bit and updated my code. If you're trying to maintain unique chromosomes, you have to compare their content (as others have mentioned). One way to do this would be to compute the standard deviation between them. If it's less than some threshold, you can consider them the same. From class Chromosome:
// compute the population standard deviation
public float StdDev(Chromosome other) {
float sum = 0.0f;
for(int i=0; i<alele.Length; i++) {
float diff = other.alele[i] - alele[i];
sum += diff*diff;
}
return (float)Math.Sqrt(sum);
}
I think Niching will give you what you'd like. It compares all the Chromosomes in the population to determine their similarity and assigns a "niche" value to each. The chromosomes are then "penalized" for belonging to a niche using a technique called Explicit Fitness Sharing. The fitness values are divided by the number of Chromosomes in each niche. So if you have three in niche group A (A,A,A) instead of that niche being 3 times as likely to be chosen, it's treated as a single entity.
I compared my sample with Explicit Fitness Sharing on and off. With a max STDDEV of 500 and Niching turned OFF, there were about 18-20 niches (so basically 5 duplicates of each item in a 100 population). With Niching turned ON, there were about 85 niches. Thats 85% unique Chromosomes in the population. In the output of my test, you can see the diversity after 17000 generations.
Here's the niching code:
// returns: total number of niches in this population
// max_stddev -- any two chromosomes with population stddev less than this max
// will be grouped together
int ComputeNiches(float max_stddev) {
List<int> niches = new List<int>();
// clear niches
foreach(var c in population) {
c.niche = -1;
}
// calculate niches
for(int i=0; i<population.Count; i++) {
var c = population[i];
if( c.niche != -1) continue; // niche already set
// compute the niche by finding the stddev between the two chromosomes
c.niche = niches.Count;
int count_in_niche = 1; // includes the curent Chromosome
for(int j=i+1; j<population.Count; j++) {
var d = population[j];
float stddev = c.StdDev(d);
if(stddev < max_stddev) {
d.niche = c.niche; // same niche
++count_in_niche;
}
}
niches.Add(count_in_niche);
}
// penalize Chromosomes by their niche size
foreach(var c in population) {
c.niche_scaled_fitness = c.scaled_fitness / niches[c.niche];
}
return niches.Count;
}
[Edit: post-analysis and update of Anton's code]
I know this probably isn't the right forum to address homework problems, but since I did the effort before knowing this, and I had a lot of fun doing it, I figure it can only be helpful to Anton.
Genotip.cs, Kromosom.cs, KromoMain.cs
This code maintains good diversity, and I was able in one run to get the "raw fitness" down to 47, which is in your case the average squared error. That was pretty close!
As noted in my comment, I'd like to try to help you in your programming, not just help you with your homework. Please read these analysis of your work.
As we expected, there was no need to make a "more diverse" population from the start. Just generate some completely random Kromosomes.
Your mutations and crossovers were highly destructive, and you only had a few of them. I added several new operators that seem to work better for this problem.
You were throwing away the best solution. When I got your code running with only Tournament Selection, there would be one Kromo that was 99% better than all the rest. With tournament selection, that best value was very likely to be forgotten. I added a bit of "elitism" which keeps a copy of that value for the next generation.
Consider object oriented techniques. Compare the re-write I sent you with my original code.
Don't duplicate code. You had the sampling parameters in two different classes.
Keep your code clean. There were several unused parts of code. Especially when submitting questions to SO, try to narrow it down, remove unused code, and do some cleaning up.
Comment your code! I've commented the re-work significantly. I know it's Serbian, but even a few comments will help someone else understand what you are doing and what you intended to do.
Overall, nice job implementing some of the more sophisticated things like Tournament Selection
Prefer double[] arrays instead of List. There's less overhead. Also, several of your List temp variables weren't even needed. Your structure
List temp = new List();
for(...) {
temp.add(value);
}
for(each value in temp) {
sum += value
}
average = sum / temp.Count
can easily be written as:
sum = 0
for(...) {
sum += value;
}
average = sum / count;
In several places you forgot to initialize a loop variable, which could have easily added to your problem. Something like this will cause serious problems, and it was in your fitness code along with one or two other places
double fit = 0;
for(each chromosome) {
// YOU SHOULD INITIALIZE fit HERE inside the LOOP
for(each allele) {
fit += ...;
}
fit /= count;
}
Good luck programming!

The basic problem here is that most randomly generated chromosomes have similar fitness, right? That's fine; the idea isn't for your initial chromosomes to have wildly different fitnesses; it's for the chromosomes themselves to be different, and presumably they are. In fact, you should expect the initial fitness of most of your first generation to be close to zero, since you haven't run the algorithm yet.
Here's why your code is so slow. Let's say the first candidate is terrible, basically zero fitness. If the second one has to be 1.5x different, that really just means it has to be 1.5x better, since it can't really get worse. Then the next one has to 1.5x better than that, and so on up to 80. So what you're really doing is searching for increasingly better chromosomes by generating completely random ones and comparing them to what you have. I bet if you logged the progress, you'd find it takes more and more time to find the subsequent candidates, because really good chromosomes are hard to find. But finding better chromosomes is what the GA is for! Basically what you've done is optimize some of the chromosomes by hand before, um, actually optimizing them.
If you want to ensure that your chromosomes are diverse, compare their content, don't compare their fitness. Comparing the fitness is the algo's job.

I'm going to take a quick swing at this, but Isaac's pretty much right. You need to let the GA do its job. You have a generation of individuals (chromosomes, whatever), and they're all over the scale on fitness (or maybe they're all identical).
You pick some good ones to mutate (by themselves) and crossover (with each other). You maybe use the top 10% to generate another full population and throw out the bottom 90%. Maybe you always keep the top guy around (Elitism).
You iterate at this for a while until your GA stops improving because the individuals are all very much alike. You've ended up with very little diversity in your population.
What might help you is to 1) make your mutations more effective, 2) find a better way to select individuals to mutate. In my comment I recommended AI Techniques for Game Programmers. It's a great book. Very easy to read.
To list a few headings from the book, the things you're looking for are:
Selection techniques like Roulette Selection (on stackoveflow) (on wikipedia) and Stochastic Universal Sampling, which control how you select your individuals. I've always liked Roulette Selection. You set the probabilities that an individual will be selected. It's not just simple white-noise random sampling.
I used this outside of GA for selecting 4 letters from the Roman alphabet randomly. I assigned a value from 0.0 to 1.0 to each letter. Every time the user (child) would pick the letter correctly, I would lower that value by, say 0.1. This would increase the likelihood that the other letters would be selected. If after 10 times, the user picked the correct letter, the value would be 0.0, and there would be (almost) no chance that letter would be presented again.
Fitness Scaling techniques like Rank Scaling, Sigma Scaling, and Boltzmann Scaling (pdf on ftp!!!) that let you modify your raw fitness values to come up with adjusted fitness values. Some of these are dynamic, like Boltzmann Scaling, which allows you to set a "pressure" or "temperature" that changes over time. Increased "pressure" means that fitter individuals are selected. Decreased pressure means that any individual in the population can be selected.
I think of it this way: you're searching through multi-dimensional space for a solution. You hit a "peak" and work your way up into it. The pressure to be fit is very high. You snug right into that local maxima. Now your fitness can't change. Your mutations aren't getting you out of the peak. So you start to reduce the pressure and just, oh, select items randomly. Your fitness levels start to drop, which is okay for a while. Then you start to increase the pressure again, and surprise! You've skipped out of the local maxima and found a lovely new local maxima to climb into. Increase the pressure again!
Niching (which I've never used, but appears to be a way to group similar individuals together). Say you have two pretty good individuals, but they're wildly different. They keep getting selected. They keep mutating slightly, and not getting much better. Now you have half your population as minor variants of A, and half your population minor variants of B. This seems like a way to say, hey, what's the average fitness of that entire group A? and what for B? And what for every other niche you have. Then do your selection based on the average fitness for each niche. Pick your niche, then select a random individual from that niche. Maybe I'll start using this after all. I like it!
Hope you find some of that helpful!

If you need true random numbers for your application, I recommend you check out Random.org. They have a free HTTP API, and clients for just about every language.
The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs.
(I am unaffiliated with Random.org, although I did contribute the PHP client).

I think your problem is in how your fitness function and how you select candidates, not in how random values are. Your filtering feels too strict that may not even allow enough elements to be accepted.
Sample
values: random float 0-10000.
fitness function square root(n)
desired distribution of fitness - linear with distance at least 1.
With this fitness function you will quickly get most of the 1-wide "spots" taken (as you have at most 100 places), so every next one will take longer. At some point there will be several tiny ranges left and most of the results will simply rejected, even worse after you get about 50 numbers places there is a good chance that next one simply will not be able to fit.

algorithms for tournament brackets (NCAA, etc.)

I'm trying implement a bracket in my program (using C#/.NET MVC) and I am stuck trying to figure out some algorithm.
For example, I have a bracket like this with 8 entries (A,B,C,D,E,F,G,H)
I'm trying to figure out if there's an algorithmic way to
depending on # of entries, find # of
games per round
depending on # of entries, for a
specific game #, what is the
corresponding game # in the next
round?
For example, in this case, for 8 entries, the example are:
for round 1, there are 4 games. Round 2, 2 games. Round 3, 1 game
game 2 in round 1 corresponds to game 5 in round 2.
I also thought about storing this info in a table, but it seems overkill since it never changes, but here it is anyway:
Any help will be greatly appreciated!
Cheers,
Dean

C# code for the first part of your question:
// N = Initial Team Count
// R = Zero-Based Round #
// Games = (N / (2 ^ R)) / 2
public double GamesPerRound(int totalTeams, int currentRound) {
var result = (totalTeams / Math.Pow(2, currentRound)) / 2;
// Happens if you exceed the maximum possible rounds given number of teams
if (result < 1.0F) throw new InvalidOperationException();
return result;
}
The next step in solving part (2) is to know the minimum game number for a given round. An intuitive way to do that would be through a for loop, but there's probably a better method:
var totalTeams = 8;
var selectedRound = 2;
var firstGame = 1;
// If we start with round 1, this doesn't execute and firstGame remains at 1
for (var currentRound = 1; currentRound < selectedRound; currentRound++) {
var gamesPerRound = GamesPerRound(totalTeams, currentRound);
firstGame += gamesPerRound;
}

Quoting #Yuck who answered the first question perfectly.
C# code for the first part of your question:
// N = Initial Team Count
// R = Zero-Based Round #
// Games = (N / (2 ^ R)) / 2
public double GamesPerRound(int totalTeams, int currentRound) {
var result = (totalTeams / Math.Pow(2, currentRound)) / 2;
// Happens if you exceed the maximum possible rounds given number of teams
if (result < 1.0F) throw new InvalidOperationException();
return result;
}
Moving on to the second question:
//G = current game.
//T = total teams
//Next round game = (T / 2) + RoundedUp(G / 2)
//i. e.: G = 2, T = 8
//Next round game = (8 / 2) + RoundedUp(2 / 2) = 5
public int NextGame(int totalTeams, int currentGame) {
return (totalTeams / 2) + (int)Math.Ceiling((double)currentGame / 2);
}

I actually worked this out myself fairly recently and stumbled on (that is, I worked it out, but it has probably been discovered before) a neat recursive solution.
You start with your list of players, in a list that is sorted in seed order. This will be important later.
The overall algorithm consists of splitting the list of players into two and then creating two sub-tournaments. The winners of the two sub-tournaments will end up the grand final of the overall tournament.
Required Objects
Player
Name
Seed
Match
Home Player
Away Player
Next Match (pointer to the Match the winner goes to)
Splitting the Lists
Most tournaments put the top seeded player against the bottom seeded player in round one. In order to do this I used the following algorithm, but you could just put the first n / 2 players in one list and the rest in the other list to create a tournament where seeds 1 and 2 play off in round one (and seed 3 plays 4, 5 plays 6 etc).
I'll note here that the neat thing about having top seed play bottom seed is that with this algorithm if you don't have a power of two number of players, the top seed(s) will get a bye in the early rounds.
Take the first player and put them in the "left" list
Take the next two players (or the last player) and put them in the "right" list
Take the next two player and put them in the "left" list
Repeat from step 2 until there are no more players.
Of course, if there are only two players in the list you simply create a match between them and return.
Building the Tournament
So you start out with a list of say, 64 players. You split it into two lists of 32 players and recursively create two sub tournaments. The methods that you call recursively should return the Matches that represent the sub-tournament's grand final match (the semi-final of your overall tournament). You can then create a match to be the grand final of your overall tournament and set the nextMatch of the semi final matches to be this grand final.
Things to Consider
You'll need to break out of recursing if there are only two players in the list that's passed in.
If your split gives you a list of one, shouldn't recurse with it. Just create a sub-tournament with the other list (it should only have two players so will return with a match immediately), set the home team to be the single player and the nextMatch of the sub-tournament.
If you want to be able to keep track of rounds, you'll need to pass a recursion depth integer - increment it when you create a sub-tournament.
Hope this helps, let me know if you need any clarification :)

So basically its a elimination contest.
So just have List.
The algorithm will always put the first and second teams together if the number of teams is even. You then increase the counter by two and repeat.
If the number of teams is odd do pretty much the samething except you randomly select a winner of the "first around" and put it against the odd team.
After the first round you repeat the algorithm the same way.
A+1
C+1
...
For example, I have a bracket like
this with 8 entries (A,B,C,D,E,F,G,H)
You should be able to figure out how to parse this. This seems like a homework question.

Consider renumbering the games (you can always renumber them back afterwards)
if the final is 1
semis are 2,3
the problem is then has well-published solutions: ahnentafel (German for ancestor table) has been used for a long time by genealogists - http://en.wikipedia.org/wiki/Ahnentafel
one interesting part of this is the binary notation of the game # gives a lot of information as to the structure of the tournament and where in the tree the match is.
Also note that as every match knocks out 1 competitor, for n competitors there will be n-1 matches

performance problem

OK so I need to know if anyone can see a way to reduce the number of iterations of these loops because I can't. The first while loop is for going through a file, reading one line at a time. The first foreach loop is then comparing each of the compareSet with what was read in the first while loop. Then the next while loop is to do with bit counting.
As requested, an explaination of my algorithm:
There is a file that is too large to fit in memory. It contains a word followed by the pages in a very large document that this word is on. EG:
sky 1 7 9 32....... (it is not in this format, but you get the idea).
so parseLine reads in the line and converts it into a list of ints that are like a bit array where 1 means the word is on the page, and 0 means it isn't.
CompareSet is a bunch of other words. I can't fit my entire list of words into memory so I can only fit a subset of them. This is a bunch of words just like the "sky" example. I then compare each word in compareSet with Sky by seeing if they are on the same page.
So if sky and some other word both have 1 set at a certain index in the bit array (simulated as an int array for performance), they are on the same page. The algorithm therefore counts the occurances of any two words on a particular page. So in the end I will have a list like:
(for all words in list) is on the same page as (for all words in list) x number of times.
eg sky and land is on the same page x number of times.
while ((line = parseLine(s)) != null) {
getPageList(line.Item2, compareWord);
foreach (Tuple<int, uint[], List<Tuple<int, int>>> word in compareSet) {
unchecked {
for (int i = 0; i < 327395; i++) {
if (word.Item2[i] == 0 || compareWord[i] == 0)
continue;
uint combinedNumber = word.Item2[i] & compareWord[i];
while (combinedNumber != 0) {
actual++;
combinedNumber = combinedNumber & (combinedNumber - 1);
}
}
}

As my old professor Bud used to say: "When you see nested loops like this, your spidey senses should be goin' CRAZY!"
You have a while with a nested for with another while. This nesting of loops is an exponential increase on the order of operations. Your one for loop has 327395 iterations. Assuming they have the same or similar number of iterations, that means you have an order of operations of
327,395 * 327,395 * 327,395 = 35,092,646,987,154,875 (insane)
It's no wonder that things would be slowing down. You need to redefine your algorithm to remove these nested loops or combine work somewhere. Even if the numbers are smaller than my assumptions, the nesting of the loops is creating a LOT of operations that are probably unnecessary.

As Joal already mentioned nobody is able to optimize this looping algorithm. But what you can do is trying to better explain what you are trying to accomplish and what your hard requirements are. Maybe you can take a different approach by using some like HashSet<T>.IntersectWith() or BloomFilter or something like this.
So if you really want help from here you should not only post the code that doesn't work, but also what the overall task is you like to accomplish. Maybe someone has a completely other idea to solve your problem, making your whole algorithm obsolete.

Optimizing a Recursive Function for Very Large Lists .Net

I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month. This application has been working well, until recently when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations. This is up from 8 modes in the past that would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out of memory exception is thrown. In fact the entire application cannot support more than 15 modes because it uses integers to track the number of modes and it exceeds the upper limit for an integer. Baring that issue, I need to do what I can to reduce the memory utilization of the application and optimize this to run as efficiently as possible even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead, I would like to get people’s opinion on the method to see if there is any room for optimization there.
EDIT
Based on a suggestion by few to consider something more academic then merely calculating every possible answer, listed below is a brief explanation of how the optimal run (combination of modes) is chosen.
Currently the computer determines every possible way that the plant can run for the number of work days that month. For example 3 Modes for a max of 2 work days would result in the combinations (where the number represents the mode chosen) of (1,1), (1,2), (1,3), (2,2), (2,3), (3,3) For each mode a product produces at a different rate of production, for example in mode 1, product x may produce at 50 units per hour where product y produces at 30 units per hour and product z produces at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers that most closely match the forecasted value for each product for the month is chosen. However, because some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of a product for the next month to ensure that at the end of the year the product has met the forecasted value. Since warehouse space is tight, it is important that products not overproduce too much either.
Thank you
private List<List<int>> _modeIterations = new List<List<int>>();
private void CalculateCombinations(int modes, int workDays, string combinationValues)
{
List<int> _tempList = new List<int>();
if (modes == 1)
{
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
_modeIterations.Add(_tempList);
}
else
{
for (int i = workDays + 1; --i >= 0; )
{
CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
}
}
}

This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".
It is well-known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. After all, your sales targets are already just educated guesses, therefore finding the optimal solution is already going to be impossible; you haven't got complete information.)
What I would do is start by reading the wikipedia page on the Knapsack Problem:
http://en.wikipedia.org/wiki/Knapsack_problem
This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"
This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.

You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:
public delegate void Processor(List<int> args);
private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
if (modes == 1)
{
List<int> _tempList = new List<int>();
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
processor.Invoke(_tempList);
}
else
{
for (int i = workDays + 1; --i >= 0; )
{
CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
}
}
}
I am assuming here, that your current pattern of work is something along the lines
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);
foreach( List<int> list in _modeIterations ) {
... process the list ...
}
With the direct-process-approach, this would be
private void ProcessPermutation(List<int> args)
{
... process ...
}
... somewhere else ...
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);
I would also suggest, that you try to prune the search tree as early as possible; if you can already tell, that certain combinations of the arguments will never yield something, which can be processed, you should catch those already during generation, and avoid the recursion alltogether, if this is possible.
In new versions of C#, generation of the combinations using an iterator (?) function might be usable to retain the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it.

The problem lies more in the Brute Force approach that in the code itself. It's possible that brute force might be the only way to approach the problem but I doubt it. Chess, for example, is unresolvable by Brute Force but computers play at it quite well using heuristics to discard the less promising approaches and focusing on good ones. Maybe you should take a similar approach.
On the other hand we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations which, anyway, will not scale if the modes go up to 32... even if you store it on disk.

if (modes == 1)
{
List<int> _tempList = new List<int>();
combinationValues += Convert.ToString(workDays);
string[] _combinations = combinationValues.Split(',');
foreach (string _number in _combinations)
{
_tempList.Add(Convert.ToInt32(_number));
}
processor.Invoke(_tempList);
}
Everything in this block of code is executed over and over again, so no line in that code should make use of memory without freeing it. The most obvious place to avoid memory craziness is to write out combinationValues to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing here is bad, since every concatenation results in memory sadness. At least use a stringbuilder (See back to basics , which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out of memory error may be to use a memory profiler (Download Link from download.microsoft.com).
By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.

I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.
Edit: On the other hand it seems I am mistaken: Which one is more efficient : List<int> or int[]

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.