What are some strategies for testing large state machines?

I inherited a large and fairly complex state machine. It has 31 possible states, all are really needed (big business process). It has the following inputs:
Enum: Current State (so 0 -> 30)
Enum: source (currently only 2 entries)
Boolean: Request
Boolean: Type
Enum: Status (3 states)
Enum: Handling (3 states)
Boolean: Completed
Breaking it into separate state machines doesn't seem feasible, as each state is distinct. I wrote tests for the most common inputs, with one test per input, all inputs constant, except for the State.
[Subject("Application Process States")]
public class When_state_is_meeting2Requested : AppProcessBase
{
    Establish context = () =>
    {
        //Setup....
    };

    Because of = () => process.Load(jas, vac);

    It Current_node_should_be_meeting2Requested = () => process.CurrentNode.ShouldBeOfType<meetingRequestedNode>();
    It Can_move_to_clientDeclined = () => Check(process, process.clientDeclined);
    It Can_move_to_meeting1Arranged = () => Check(process, process.meeting1Arranged);
    It Can_move_to_meeting2Arranged = () => Check(process, process.meeting2Arranged);
    It Can_move_to_Reject = () => Check(process, process.Reject);
    It Cannot_move_to_any_other_state = () => AllOthersFalse(process);
}
No one is entirely sure what the output should be for each state and set of inputs. I have started to write tests for it. However, I'll need to write something like 4320 tests (30 * 2 * 2 * 2 * 3 * 3 * 2).
What suggestions do you have for testing state machines?
Edit: I am playing with all of the suggestions, and will mark an answer when I find one that works best.

I see the problem, but I'd definitely try splitting the logic out.
The big problem area in my eyes is:
It has 31 possible states to be in.
It has the following inputs:
Enum: Current State (so 0 -> 30)
Enum: source (currently only 2 entries)
Boolean: Request
Boolean: type
Enum: Status (3 states)
Enum: Handling (3 states)
Boolean: Completed
There is just far too much going on. The input is making the code hard to test. You've said it would be painful to split this up into more manageable areas, but it's equally if not more painful to test this much logic in one go. In your case, each unit test covers far too much ground.
A question I asked about testing large methods is similar in nature: I found my units were simply too big. You'll still end up with many tests, but they'll be smaller and more manageable, covering less ground. This can only be a good thing though.
Testing Legacy Code
Check out Pex. You say you inherited this code, so this is not actually Test-Driven Development. You simply want unit tests to cover each aspect. This is a good thing, as any further work will be validated. I've personally not used Pex properly yet, but I was wowed by the video I saw. Essentially it generates unit tests based on the input, which in this case would be the finite state machine itself. It will generate test cases you would not have thought of. Granted this is not TDD, but in this scenario, testing legacy code, it should be ideal.
Once you have your test coverage, you can begin refactoring, or adding new features with the safety of good test coverage to ensure you don't break any existing functionality.

All-Pair Testing
To constrain the number of combinations to test and to be reasonably assured that you have the most important combinations covered, you should take a look at all-pairs testing.
"The reasoning behind all-pairs testing is this: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing, which has as its limit the exhaustive testing of all possible inputs."
Also take a look at a previous answer here (shameless plug) for additional information and links to both all-pairs testing and PICT as a tool.
Example PICT model file
The given model generates 93 test cases, covering all pairs of input parameters.
#
# This is a PICT model for testing a complex state machine at work
#
CurrentState :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Source :1,2
Request :True, False
Type :True, False
Status :State1, State2, State3
Handling :State1, State2, State3
Completed :True,False
#
# One can add constraints to the model to exclude impossible
# combinations if needed.
#
# For example:
# IF [Completed] = "True" THEN [CurrentState] > 15;
#
#
# This is the PICT output of "pict ComplexStateMachine.pict /s /r1"
#
# Combinations: 515
# Generated tests: 93
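
The two numbers in the PICT output can be checked by hand, which also shows why 93 is as small as a pairwise suite can get for this model. The sketch below is only an illustration (the class and method names are made up for this example): it counts the value pairs an all-pairs suite has to cover, and the lower bound given by the two largest parameters.

```csharp
using System;
using System.Linq;

public static class PairwiseStats
{
    // Number of distinct value pairs an all-pairs suite must cover:
    // the sum of sizes[i] * sizes[j] over every pair of parameters i < j.
    public static int PairCount(int[] sizes)
    {
        int total = 0;
        for (int i = 0; i < sizes.Length; i++)
            for (int j = i + 1; j < sizes.Length; j++)
                total += sizes[i] * sizes[j];
        return total;
    }

    // No suite can be smaller than the product of the two largest parameters,
    // because one test covers only one value pair for those two parameters.
    public static int MinimumTests(int[] sizes)
    {
        var top = sizes.OrderByDescending(s => s).Take(2).ToArray();
        return top[0] * top[1];
    }
}
```

With the parameter sizes from the model above (31, 2, 2, 2, 3, 3, 2), PairCount gives 515 and MinimumTests gives 93 (31 states times the 3 values of Status or Handling), which matches the PICT output.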

I can't think of any easy way to test an FSM like this without getting really pedantic and employing proofs, using machine learning techniques, or brute force.
Brute force:
Write something that generates all 4320 test cases in some declarative form, initially with mostly incorrect expected results. I would recommend putting this in a CSV file and then using something like NUnit's parameterized tests to load all the test cases.
Most of these test cases will fail at first, so you will have to correct the declarative file by hand, picking a random sample of the test cases to fix.
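
As a rough sketch of the generation step, assuming the inputs are encoded as plain integers and booleans (the real enums would replace them) and that the last CSV column holds the expected next state, something like this produces one row per combination. The class name and the "?" placeholder are inventions for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TestCaseGenerator
{
    // Cartesian product of all input dimensions; each row is one test case.
    public static IEnumerable<string> CsvRows(int stateCount)
    {
        var bools = new[] { false, true };
        var three = new[] { 0, 1, 2 };
        return
            from state in Enumerable.Range(0, stateCount)
            from source in new[] { 0, 1 }
            from request in bools
            from type in bools
            from status in three
            from handling in three
            from completed in bools
            // Last column: expected next state, "?" until someone has reviewed the row.
            select string.Join(",", state, source, request, type, status, handling, completed, "?");
    }
}
```

CsvRows(30) yields the 4320 rows from the question; the rows can be written out with File.WriteAllLines and the reviewed file fed back into a parameterized test.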
Machine Learning technique:
You could employ support vector machines or MDA algorithms/heuristics to learn from the sample you took above and teach your ML program your FSM. Then run the algorithm on all 4320 inputs and see where the two disagree.

How many tests do you think are needed to "completely" test a function sum(int a, int b)? In C# it would be 2^32 * 2^32 = 18446744073709551616 tests... Much worse than in your case.
I would suggest the following:
Test the most likely situations and the boundary conditions.
Test some critical parts of your SUT separately.
Add test cases when bugs are found by QA or in production.
In this particular case the best way is to test how the system switches from one state to another. Create a DSL for testing the state machine and implement the most frequent use cases with it. For example:
Start
.UploadDocument("name1")
.SendDocumentOnReviewTo("user1")
.Review()
.RejectWithComment("Not enough details")
.AssertIsCompleted()
The example of creating simple tests for flows is here: http://slmoloch.blogspot.com/2009/12/design-of-selenium-tests-for-aspnet_09.html

Use SpecExplorer or NModel.

I had constructed a finite state machine for a piece of medical equipment.
The FSM was configurable through an XML format I had defined.
To define a state machine, one can draw on experience from digital circuit design, where state maps are used.
You have to use what I call a turnpike transition map. On the United States East Coast, most highways are nicknamed turnpikes. Turnpike authorities issue a turnpike toll pricing map. If a toll section had 50 exits, the pricing map would be a table of 50 rows x 50 columns, listing the exits exhaustively as both rows and columns. To find the toll charge for entering at exit 20 and leaving at exit 30, you simply look up the intersection of row 20 and column 30.
For a state machine of 30 states, the turnpike transition map would be a 30 x 30 matrix listing all the 30 possible states row and column wise. Let us decide the rows to be CURRENT states and columns to be NEXT states.
Each intersecting cell would list the "price" of transitioning from a CURRENT state(row) to a NEXT state(col). However instead of a single $ value, the cell would refer to a row in the Inputs table, which we could term as the transition id.
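
To make the map concrete, here is a minimal sketch of how such a table could be held in memory. It is not the controller linked at the end of this answer; the class and member names are invented for this example, and a cell simply holds a nullable transition id (null meaning "grayed out").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TurnpikeMap
{
    // Rows are CURRENT states, columns are NEXT states.
    // A cell holds the transition id (row of the Inputs table), or null if unreachable.
    private readonly int?[,] cells;

    public TurnpikeMap(int stateCount)
    {
        cells = new int?[stateCount, stateCount];
    }

    public void Allow(int current, int next, int transitionId)
    {
        cells[current, next] = transitionId;
    }

    public int? TransitionId(int current, int next)
    {
        return cells[current, next];
    }

    // The valid NEXT states on one row of the map.
    public IEnumerable<int> ReachableFrom(int current)
    {
        return Enumerable.Range(0, cells.GetLength(1))
                         .Where(next => cells[current, next].HasValue);
    }
}
```

For example, after Allow(0, 1, 17) and Allow(0, 5, 42), ReachableFrom(0) lists states 1 and 5, and TransitionId(1, 0) is null because that cell was never filled in.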
In the medical equipment FSM I developed, there were inputs that are String, enums, int, etc. The Inputs table listed these input stimulus column-wise.
To construct the Inputs table, you would write a simple routine to list all possible combinations of inputs. The table would be huge: in your case, it would have 4320 rows and hence 4320 transition ids. But it's not a tedious table, because you generate it programmatically. In my case, I wrote a simple JSP to display the transition inputs table (and the turnpike table) in a browser, or to download it as CSV for display in MS Excel.
There are two directions in constructing these two tables.
The design direction, where you construct the turnpike table of all possible transitions, graying out non-reachable transitions. Then construct the Inputs table of all expected inputs for each reachable transition only, with the row number as transition id. Each transition id is transcribed onto the respective cell of the turnpike transition map. However, since the FSM is a sparse matrix, not all transition ids will be used in the cells of the turnpike transition map. Also, a transition id can be used multiple times, because the same transition conditions can apply to more than one pair of states.
The test direction is the reverse: you start by constructing the Inputs table.
You have to write a general routine for the exhaustive transition test.
The routine would first read a transition sequencing table to bring the state machine to an entry-point state to start a test cycle. At each CURRENT state, it is poised to run through all 4320 transition ids. On each row of CURRENT states in the turnpike transition map, there would be a limited number of columns that are valid NEXT states.
You would want the routine to cycle through all 4320 rows of inputs that it reads from the Inputs table, to ensure unused transition ids have no effect on a CURRENT state. You want to test that all effectual transition ids are valid transitions.
But you cannot - because once an effectual transition is pumped in, it would change the state of the machine into a NEXT state and prevent you from completing testing the rest of the transition ids on that previous CURRENT state. Once the machine changes state, you have to start testing from transition id 0 again.
Transition paths can be cyclical or irreversible or having combination of cyclical and irreversible sections along the path.
Within your test routine, you need a register for each state to memorise the last transition id pumped into that state. Every time the test reaches an effectual transition id, that transition id is left in that register, so that when you complete a cycle and return to an already traversed state, you start iterating from the next transition id greater than the one stored in the register.
Your routine also has to take care of the irreversible sections of a transition path: when the machine is brought to a final state, the routine restarts the test from the entry-point state, reiterating the 4320 inputs from the next transition id greater than the one stored for a state. In this way, you would be able to exhaustively discover all the possible transition paths of the machine.
Fortunately, FSMs are sparse matrices of effectual transitions, so exhaustive testing does not consume the complete combination of (number of transition ids) x (number of possible states squared). However, difficulty arises if you are dealing with a legacy FSM where visual or temperature states cannot be fed back into the test system, and you have to monitor each state visually. That would be ugly, but still we spent an additional two weeks testing the equipment visually, going through only the effectual transitions.
You may not need a transition sequencing table (one per entry-point state, read by the test routine to bring the machine to the desired entry point) if your FSM lets you reach an entry point with a simple reset, after which applying a transition id simply brings it to an entry-point state. But having your routine capable of reading a transition sequencing table is useful because you frequently need to go into the midst of the state network and start your testing from there.
You should acquaint yourself with the use of transition and state maps because it is very useful to detect all the possible and undocumented states of a machine and interview users if they actually wanted them grayed out (transitions made ineffectual and states made unreachable).
The advantage I had was that it was a new piece of equipment and I had the choice to design the state-machine controller to read XML files, which meant I could change the behaviour of the state machine any way I wanted (actually, any way the customer wanted), and I was able to assure that unused transition ids were really ineffectual.
For the java listing of the finite state machine controller http://code.google.com/p/synthfuljava/source/browse/#svn/trunk/xml/org/synthful. Test routines not included.

Test based on the requirements. If a certain state is required to move to a certain other state whenever Completed is true, then write a test that automatically cycles through all the combinations of the other inputs (this should just be a couple for loops) to prove that the other inputs are correctly ignored. You should end up with one test for each transition arc, which I'd estimate would be somewhere on the order of 100 or 150 tests, not 4000.
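
A minimal sketch of one such arc test, assuming the machine can be called as a function from (state, inputs) to the next state. The Next method below is a hypothetical stand-in for the real machine, and the requirement "state 3 moves to state 7 whenever Completed is true" is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ArcTest
{
    // Hypothetical stand-in for the machine under test.
    public static int Next(int state, int source, bool request, bool type,
                           int status, int handling, bool completed)
    {
        return (state == 3 && completed) ? 7 : state;
    }

    // Cycles through every combination of the other inputs with Completed == true
    // and returns the combinations for which the machine did not move from -> to.
    public static List<string> Violations(int from, int to)
    {
        var failures = new List<string>();
        var bools = new[] { false, true };
        foreach (var source in new[] { 0, 1 })
        foreach (var request in bools)
        foreach (var type in bools)
        for (int status = 0; status < 3; status++)
        for (int handling = 0; handling < 3; handling++)
        {
            int actual = Next(from, source, request, type, status, handling, true);
            if (actual != to)
                failures.Add($"{source},{request},{type},{status},{handling} -> {actual}");
        }
        return failures;
    }
}
```

The test for the arc then asserts that Violations(3, 7) is empty. Returning the failing combinations rather than a bare boolean makes the failure message say exactly which inputs were not ignored.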

You might consider investigating Model Based Testing. There are a few tools available to help with test generation in situations like this. I usually recommend MBT.

Brute force combined with coverage measurement seems like a good place to start.

Related

Find right features in multiclass svm without PCA

I'm classifying users with a multiclass SVM (one-against-one), 3 classes. In the binary case, I would be able to plot the distribution of the weight of each feature in the hyperplane equation for different training sets. In that case, I don't really need a PCA to see the stability of the hyperplane and the relative importance of the features (reduced and centered, by the way). What would the alternative be for a multiclass SVM, given that for each training set you have 3 classifiers and you choose one class according to the result of the three classifiers (how does that work again? The class that appears the maximum number of times, or the biggest discriminant? Whichever it is, it does not really matter here). Does anyone have an idea?
And if it matters, I am writing in C# with Accord.
Thank you !

In a multi-class SVM that uses the one-vs-one strategy, the problem is divided into a set of smaller binary problems. For n possible classes, the one-vs-one strategy requires the creation of (n(n-1))/2 binary classifiers. In your example, with three classes, this would be
(n(n-1))/2 = (3(3-1))/2 = (3*2)/2 = 3
Each of those will be specialized in the following problems:
Distinguishing between class 1 and class 2 (let's call it svma).
Distinguishing between class 1 and class 3 (let's call it svmb)
Distinguishing between class 2 and class 3 (let's call it svmc)
Now, I see that you have actually asked multiple questions in your original post, so I will answer them separately. First I will clarify how the decision process works, and then explain how you could detect which features are the most important.
Since you mentioned Accord.NET, there are two ways this framework might be computing the multi-class decision. The default one is to use a Decision Directed Acyclic Graph (DDAG), that is nothing more but the sequential elimination of classes. The other way is by solving all binary problems and taking the class that won most of the time. You can configure them at the moment you are classifying a new sample by setting the method parameter of the SVM's Compute method.
Since the winning-most-of-the-time version is straightforward to understand, I will explain a little more about the default approach, the DDAG.
Decision using directed acyclic graphs
In this algorithm, we test each of the SVMs and eliminate the class that lost at each round. So for example, the algorithm starts with all possible classes:
Candidate classes: [1, 2, 3]
Now it asks svma to classify x, it decides for class 2. Therefore, class 1 lost and is not considered anymore in further tests:
Candidate classes: [2, 3]
Now it asks svmc to classify x, it decides for class 2. Therefore, class 3 lost and is not considered anymore in further tests:
Candidate classes: [2]
The final answer is thus 2.
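
The elimination walk-through above can be sketched in a few lines. This is not Accord.NET's implementation: the decide delegate is a stand-in for "the binary SVM trained on classes a and b", and the names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Ddag
{
    // decide(a, b) plays the role of the binary machine for classes a and b
    // and must return either a or b. The loser is dropped from the candidates.
    public static int Classify(IList<int> classes, Func<int, int, int> decide)
    {
        var candidates = new List<int>(classes);
        while (candidates.Count > 1)
        {
            int a = candidates[0];
            int b = candidates[1];
            int winner = decide(a, b);
            candidates.Remove(winner == a ? b : a); // removes by value, not by index
        }
        return candidates[0];
    }
}
```

With n classes this consults only n-1 of the n(n-1)/2 machines per sample, and the sequence of (a, b) pairs visited is the decision path.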
Detecting which features are the most useful
Now, since the one-vs-one SVM is decomposed into (n(n-1)/2) binary problems, the most straightforward way to analyse which features are the most important is by considering each binary problem separately. Unfortunately it might be tricky to globally determine which are the most important for the entire problem, but it will be possible to detect which ones are the most important to discriminate between class 1 and 2, or class 1 and 3, or class 2 and 3.
However, here I can offer a suggestion if you are using DDAGs. Using DDAGs, it is possible to extract the decision path that led to a particular decision. This means that it is possible to estimate how many times each of the binary machines was used when classifying your entire database. If you can estimate the importance of a feature for each of the binary machines, and estimate how many times a machine is used during the decision process in your database, perhaps you could take their weighted sum as an indicator of how useful a feature is in your decision process.
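
One possible reading of that weighted sum, as a sketch only: it assumes linear machines, takes the absolute weight of a feature as its importance in one machine, and normalizes the usage counts so they sum to one. All of these choices, and the names, are assumptions for this example rather than anything the framework provides.

```csharp
using System;
using System.Linq;

public static class FeatureImportance
{
    // weights[m][f]: weight of feature f in binary machine m (assumed linear).
    // usage[m]: how many times machine m was consulted over the whole data set.
    public static double[] Combine(double[][] weights, int[] usage)
    {
        int features = weights[0].Length;
        double totalUsage = usage.Sum();
        var combined = new double[features];
        for (int m = 0; m < weights.Length; m++)
            for (int f = 0; f < features; f++)
                combined[f] += Math.Abs(weights[m][f]) * usage[m] / totalUsage;
        return combined;
    }
}
```

A feature that matters only in a rarely used machine ends up with a small score even if its weight there is large.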
By the way, you might also be interested in trying one of the Logistic Regression Support Vector Machines using L1-regularization with a high C to perform sparse feature selection:
// Create a new linear machine
var svm = new SupportVectorMachine(inputs: 2);

// Create a new instance of the sparse logistic learning algorithm
var smo = new ProbabilisticCoordinateDescent(svm, inputs, outputs)
{
    // Set learning parameters
    Complexity = 100,
};

I'm not an expert in ML or SVM. I am a self-learner. However, my prototype outperformed some similar commercial and academic software in accuracy, while its training time is about 2 hours, in contrast to the days and weeks(!) of some competitors.
My recognition system (patterns in bio-cells) uses following approach to select best features:
1) Extract features and calculate the mean and variance for all classes.
2) Select those features where the class means are furthest apart and the variances are minimal.
3) Remove redundant features: those whose mean-histograms over the classes are similar.
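
Step 2 can be turned into a single number per feature. The sketch below is one way to do it for two classes, not the answerer's actual code: squared distance between the class means divided by the sum of the class variances, which resembles what is usually called the Fisher score. The names are invented for this example.

```csharp
using System;
using System.Linq;

public static class FeatureScore
{
    // Higher is better: class means far apart, variances small.
    // Uses population variance; yields Infinity (or NaN) when both variances are zero.
    public static double Separation(double[] classA, double[] classB)
    {
        double meanA = classA.Average();
        double meanB = classB.Average();
        double varA = classA.Average(v => (v - meanA) * (v - meanA));
        double varB = classB.Average(v => (v - meanB) * (v - meanB));
        return (meanA - meanB) * (meanA - meanB) / (varA + varB);
    }
}
```

Ranking the features by this score and keeping the top few corresponds to step 2; step 3 (dropping redundant features) still has to be done separately.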
In my prototype I'm using parametric features e.g feature "circle" with parameters diameter, threshold, etc.
The training is controlled by scripts defining which features with which argument-ranges are to use. So the software tests all possible combinations.
For some training-time optimization:
The software begins with 5 instances per class for extracting the features and increases the number once the 2nd condition is met.
There are probably academic names for some of these steps. Unfortunately I'm not aware of them; I "reinvented the wheel" myself.

Training error and Validation error in Multiple Output Neural Network

I am developing a program to study Neural Networks, by now I understand the differences (I guess) of dividing a dataset into 3 sets (training, validating & testing). My networks may be of just one output or multiple outputs, depending on the datasets and the problems. The learning algorithm is the back-propagation.
So, the problem basically is that I am getting confused with each error and the way to calculate it.
Which is the training error? If I want to use the MSE, is it (desired - output)^2? But then, what happens if my network has 2 or more outputs: is the training error going to be the sum over all outputs?
Then, the validation error is just using the validation data set to calculate the output and comparing the obtained results with the desired results, which gives me an error. Is it computed the same way as the training error? And with multiple outputs?
And finally, one thing that is not totally clear: when should the validation run? Somewhere I read that it could be once every 5 epochs, but is there any rule for this?
Thanks in advance for your time!

For multiple output neurons, to calculate the training error, in each epoch/iteration, you take each output value, get the difference to the target value for that neuron. Square it, do the same for the other output neurons, and then get the mean.
So eg with two output neurons,
MSE = (|op1 - targ1|^2 + |op2 - targ2|^2 ) / 2
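
The same formula for any number of output neurons, as a small sketch (the class and method names are made up for this example). Averaging the per-sample values over the data set is a common convention and is an assumption here, not something stated above.

```csharp
using System;
using System.Linq;

public static class NetworkError
{
    // Error for one sample: mean of the squared differences over the output neurons.
    public static double SampleMse(double[] outputs, double[] targets)
    {
        if (outputs.Length != targets.Length)
            throw new ArgumentException("outputs and targets must have the same length");
        double sum = 0;
        for (int i = 0; i < outputs.Length; i++)
        {
            double diff = outputs[i] - targets[i];
            sum += diff * diff;
        }
        return sum / outputs.Length;
    }

    // Error for a whole data set (training, validation or test): mean of the per-sample errors.
    public static double DataSetMse(double[][] outputs, double[][] targets)
    {
        return outputs.Select((o, i) => SampleMse(o, targets[i])).Average();
    }
}
```

For outputs (1, 3) against targets (0, 1), SampleMse is (1 + 4) / 2 = 2.5.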
The training, validation and test errors are calculated the same way. The difference is when they are run and how they are used.
The full validation set is usually checked on every training epoch. To speed up computation, you could run it only every 5 epochs.
The result of the validation check is not used to update the weights, only to decide when to exit training. It's used to decide whether the network has generalized on the data rather than overfitted.
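
One common way to make that exit decision, sketched under an assumption of my own: training stops once the validation error has failed to improve for a fixed number of consecutive checks (the "patience" parameter, which is not part of the answer above). The function only looks at a list of validation errors, so it is independent of the network code.

```csharp
using System;
using System.Collections.Generic;

public static class EarlyStopping
{
    // Returns the index of the check at which training would stop:
    // the first check where the error has not improved for `patience` checks in a row,
    // or the last index if that never happens.
    public static int StopIndex(IList<double> validationErrors, int patience)
    {
        double best = double.MaxValue;
        int sinceBest = 0;
        for (int i = 0; i < validationErrors.Count; i++)
        {
            if (validationErrors[i] < best)
            {
                best = validationErrors[i];
                sinceBest = 0;
            }
            else if (++sinceBest >= patience)
            {
                return i;
            }
        }
        return validationErrors.Count - 1;
    }
}
```

In practice the weights from the best check are kept, not the ones from the check where training stopped.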
Check the pseudocode in the first answer to this question:
What is the difference between train, validation and test set, in neural networks?

Looking for tips to build "TestMaker" (Questions and Responses) application with Evaluation Engine

I'm working on a new project.
My best analogy would be a psychological evaluation test maker.
Aspect #1.
The end-business-user needs to create test questions. With question types. And possible responses to the questions when applicable.
Examples:
1. Do you have red hair? (T/F)
2. What is your favorite color? (Single Response/Multiple Choice)
(Possible Responses: Red, Yellow, Black, White)
3. What size shoe do you wear (rounded to next highest size)? (Integer)
4. How much money do you have on you right now? (Dollar Amount (decimal))
So I need to be able to create questions, their question type, and for some of the questions, possible answers.
Here:
Number 1 is a known type of "True or False".
Number 2 is a known type of "Single Response/Multiple Choice" AND the end-business-user will create the possible responses.
Number 3 is a known type of "Integer". The end-user (person taking the evaluation) can basically put in any integer value.
Number 4 is a known type of "Decimal". Same thing as Integer.
Aspect #2.
The end-business-user needs to evaluate the person's responses. And assign some scalar value to a set of responses.
Example:
If someone responded:
1. T
2. Red
3. >=8
4. (Not considered for this situation)
Some psychiatrist-expert figures out that if someone answered with the above responses, they are 85% more at risk for depression than normal. (Where 85% is a number the end-business-user (psychiatrist) can enter as a parameter.)
So Aspect #2 is running through someone's responses, and determining a result.
The "response grid" would have to be setup so that it will go through (some or all) the possibilities in a priority ranking order, and then after all conditions are met (on a single row), exit out with the result.
Like this:
(1.) T (2.) Red (3.) >=8 ... Result = 85%
(1.) T (2.) Yellow (3.) >=8 ... Result = 70%
(1.) F (2.) Red (3.) >=8 ... Result = 50%
(1.) F (2.) Yellow (3.) >=8 ... Result = 40%
Once a match is found, you exit with the percentage. If you don't find a match, you go to the next rule.
Also, running with this psych evaluation mock example, I don't need to define every permutation. A lot of questions of psych evaluations are not actually used, they are just "fluff". So in my grid above, I have purposely left out question #4. It has no bearing on the results.
There can also be a "Did this person take this seriously?" evaluation grid:
(3.) >=20 (4.) >=$1,000 ... Result = False
(The possibility of having a shoe size >= 20 and having big dollars in your pocket is very low, thus you probably did not take the psych test seriously.)
If no rule is found, (in my real application, not this mock up), I would throw an exception or just not care. I would not need a default or fall-through rule.
In the above, Red and Yellow are "worrisome" favorite colors. If your favorite color is black or white, it has no bearing upon your depression risk factor.
I have used Business Rule Engines in the past. (InRule for example).
They are very expensive, and it is not in the budget.
BizTalk Business Rules Framework is a possibility. Not de$irable, but possible.
My issue with any Rules-Engine is that the "vocabulary" (I have limited experience with business rules engines, mind you) is based off of objects, with static properties.
public class Employee
{
    public string LastName { get; set; }
    public DateTime HireDate { get; set; }
    public decimal AnnualSalary { get; set; }

    // Raises the salary by the given percentage (5 means 5%).
    public void AdjustSalary(int percentage)
    {
        this.AnnualSalary = this.AnnualSalary + (this.AnnualSalary * percentage / 100m);
    }
}
With this it would be easy to create business rules.
If
    the (Employee's HireDate) is before (10 years ago)
then
    (Increase Their Salary) By (5) Percent.
else
    (Increase Their Salary) By (3) Percent.
But in my situation, the Test is composed of (dynamic) Questions and (dynamic) Responses, not predetermined properties.
So I guess I'm looking for some ideas to investigate on how to pull this off.
I know I can build a "TestMaker" application fairly quickly.
The biggest issue is integrating the Questions and (Possible Responses) into "evaluating rules".
Thanks for any tips.
Technologies:
DotNet 4.0 Framework
Sql Server 2008 Backend Database
VS2010 Pro, C#

If it's a small application, i.e. 10s of users as opposed to 1000s, and it's not business critical, then I would recommend Excel.
The advantages are that most business users are very familiar with Excel and most probably have it on their machines. Basically you ship an Excel workbook to your business users. They enter the questions and the type (Decimal, True/False etc.), then click a button which triggers an Excel macro. This could generate an XML configuration file or put the questions into SQL. Your application just reads it, displays it and collects responses as usual.
The main advantage of Excel comes in Aspect #2, the dynamic user-chosen business rules. In another sheet of the same Excel document, the end business user can specify as many of the permutations of the responses/questions as they feel like. Excel users are very familiar with entering simple formulas like =IF(AND(A1>20, A2<50), ...) etc. Again the user pushes a button and either generates a configuration file or inserts the data into SQL Server.
In your application you iterate through the Rules table and apply it to the responses.
Given the number of users, surveys etc., this solution would be much simpler than BizTalk or a full-on custom rules engine. BizTalk would be good if you needed to talk to multiple systems, integrate their data, enforce rules, create workflows etc. Most of the other rules engines are also geared towards really big, complex rule systems. The complexity in those rule systems isn't just the number of rules or permutations; it is mostly having to talk to multiple systems, "end dating" certain rules, putting in rules with future kick-off dates etc.
In your case an Excel-based system, or a similar datagrid on a web page, would work fine. Unless of course you are doing a project for Gartner or some other global data survey firm or a major government statistical organisation. In that case I would suggest BizTalk or other commercial rules engines.
In your case, it's SQL tables with Questions, Answer Types and Rules To Apply. Input is made user friendly via Excel, or "Excel on the web" via a datagrid, and then you just iterate through the rules and apply them to the response table.
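
A rough sketch of that last step, assuming each rule row has been loaded as a set of per-question conditions plus a result, and the responses as a dictionary keyed by question id. The Rule and RuleGrid types are inventions for this example; in a real system the predicates would be built from the stored operator and value (such as ">=" and 8).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Rule
{
    // One predicate per question id the rule cares about; all other questions are ignored.
    public Dictionary<int, Func<object, bool>> Conditions = new Dictionary<int, Func<object, bool>>();
    public double Result;
}

public static class RuleGrid
{
    // Rules are tried in priority order; the first rule whose conditions all hold wins.
    public static double? Evaluate(IEnumerable<Rule> rules, IDictionary<int, object> responses)
    {
        foreach (var rule in rules)
        {
            bool matches = rule.Conditions.All(c =>
                responses.TryGetValue(c.Key, out var value) && c.Value(value));
            if (matches) return rule.Result;
        }
        return null; // no rule matched; the caller decides whether that is an error
    }
}
```

Because a rule only lists the questions it cares about, "fluff" questions such as #4 in the example grid never need to appear in any rule.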

Are you sure you want to use a business rule engine for this problem?
As far as I understand it, the use case for a BRE is:
A mostly static execution flow
Some decisions that change often
In your use case, the whole system (Q/A flow and evaluation) is dynamic, so IMHO a simple domain-specific language for the whole system would be a better solution.
You might want to take some inspiration from testMaker - a web based software exactly for this workflow. (Disclaimer: I contributed a bit to this project.) Its scoring rules are quite basic, so this might not help you that much. It was designed to export data to SPSS and build reports from there...

Be sure you are modeling a database suitable for hierarchical objects. This article would help.
create table for test
create tables for questions, with columns question, testid, questiontype
create tables for answers, answer id,question id, answer and istrue columns
answers belong to questions
one question can have many answers
create table for user or employee
create table for user answers, answer id, selected answer
Evaluation (use general-purpose types so that both boolean and integer answers are covered, and wrap calls to the function in try/catch to handle mismatched or bad input):
static bool Evaluate(string questionType, IComparable answer, IComparable userAnswer)
{
    // questionType can be truefalse, biggerthan, smallerthan, equals
    switch (questionType)
    {
        case "biggerthan": return userAnswer.CompareTo(answer) > 0;
        case "smallerthan": return userAnswer.CompareTo(answer) < 0;
        case "truefalse":
        case "equals": return userAnswer.CompareTo(answer) == 0;
        default: throw new ArgumentException("Unknown question type: " + questionType);
    }
}
get output as a data dictionary and post here please.
without a data schema the help you get will be limited.

How to create unit tests for a fair distribution algorithm?

I have the following algorithm:
Given a list of accounts, I have to divide them fairly between system users.
Now, in order to ease the workload on the users I have to split them over days.
So, if an account has service orders they must be inserted to the list that will be distributed over 547 days (a year and a half). If an account has no service orders they must be inserted to the list that will be distributed over 45 days (a month and a half).
I am using the following LINQ extension from a question I asked before:
public static IEnumerable<IGrouping<TBucket, TSource>> DistributeBy<TSource, TBucket>(
this IEnumerable<TSource> source, IList<TBucket> buckets)
{
var tagged = source.Select((item,i) => new {item, tag = i % buckets.Count});
var grouped = from t in tagged
group t.item by buckets[t.tag];
return grouped;
}
and I can guarantee that it works.
My problem is that I don't really know how to unit test all those cases.
I might have 5 users and 2 accounts which might not be enough to be split over a year and a half or even a month and a half.
I might have exactly 547 accounts and 547 system users so each account will be handled by each system user each day.
Basically I don't know what kind of datasets I should create, because it seems that there are too many options, and what should I assert on because I have no idea how the distribution will be.

Start with boundary conditions (natural limits on the input of the method) and any known corner cases (places where the method behaves in an unexpected manner and you need special code to account for this).
For example:
How does the method behave when there are zero accounts?
Zero users?
Exactly one account and one user?
547 accounts and 547 users?
It sounds like you already know a lot of the expected boundary conditions here. The corner cases will be harder to think of initially, but you will probably have hit some of them as you developed the method. They will also naturally come up through manual testing: each time you find a bug, that is a new necessary test.
Once you have tested boundary conditions and corner cases you should also look at a "fair" sample of other situations, like your example of 5 users and 2 accounts. You can't (or at least, arguably don't need to) test all possible inputs into the method, but you can test a representative sample of inputs, for things like uneven division of accounts.

I think part of your issue is that your code describes how you are solving your problem, but not what problem you are trying to solve. Your current implementation is deterministic and gives you an answer, but you could easily swap it with another "fair allocator", which would give you an answer that would differ in the details (maybe different users would be allocated different accounts), but satisfy the same requirements (allocation is fair).
One approach would be to focus on what "fairness" means. Rather than checking the exact output of your current implementation, maybe reframe it so that at a high level, it looks like:
public interface IAllocator
{
    IAllocation Solve(IEnumerable<User> users, IEnumerable<Account> accounts);
}
and write tests which verify not the specific assignment of accounts to users, but that the allocation is fair, where "fair" is still to be defined: something along the lines of "every user should have the same number of accounts allocated, plus or minus one". Defining what fair is, or what the exact goal of your algorithm is, should help you identify the corner cases of interest. Working off a higher-level objective (the allocation should be fair, rather than matching one specific allocation) should allow you to easily swap implementations and verify whether the allocator is doing its job.
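A minimal sketch of what such a fairness check could look like. The `User`, `Account` and `IAllocation` types are not shown in the question, so this version is a simplification that uses plain strings and a dictionary; `RoundRobinAllocator` and `Fairness.IsFair` are hypothetical names introduced only for illustration, and the round-robin logic is a stand-in, not the asker's algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplification: users and accounts are plain strings, and an
// allocation maps each user to the list of accounts assigned to that user.
public interface IAllocator
{
    IDictionary<string, List<string>> Solve(IEnumerable<string> users, IEnumerable<string> accounts);
}

// Stand-in implementation (an assumption): deals accounts out to users in turn.
public class RoundRobinAllocator : IAllocator
{
    public IDictionary<string, List<string>> Solve(IEnumerable<string> users, IEnumerable<string> accounts)
    {
        var userList = users.ToList();
        var allocation = userList.ToDictionary(u => u, _ => new List<string>());
        if (userList.Count == 0) return allocation; // nobody to allocate to

        int i = 0;
        foreach (var account in accounts)
        {
            allocation[userList[i % userList.Count]].Add(account);
            i++;
        }
        return allocation;
    }
}

public static class Fairness
{
    // "Fair" in the sense suggested above: every user holds the same number
    // of accounts, plus or minus one. An empty allocation counts as fair.
    public static bool IsFair(IDictionary<string, List<string>> allocation)
    {
        if (allocation.Count == 0) return true;
        var counts = allocation.Values.Select(a => a.Count).ToList();
        return counts.Max() - counts.Min() <= 1;
    }
}
```

The same predicate can then be asserted for each dataset of interest (zero users, 5 users and 2 accounts, 547 users and 547 accounts) without the test knowing which user received which account, so a different allocator can be dropped in and checked against the same tests.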

Permutation/Algorithm to Solve Conditional Fill Puzzle

I've been digging around to see if something similar has been done previously, but have not seen anything with the mirrored conditions. To make the problem a little easier to digest, I'm going to present it in the context of filling a baseball team roster.
The given roster structure is organized as such: C, 1B, 2B, 3B, SS, 2B/SS (either or), 1B/3B, OF, OF, OF, OF, UT (can be any position)
Every player is eligible at at least one of the non-backup positions (backup slots being those that accept more than one position), and in many cases at more than one (e.g. a player who can play 1B and OF). Say that you are the manager of a team which already has some players on it, and you want to see whether you have room for a particular player at any of your slots, or whether you can move one or more players around to open up a slot where he is eligible.
My initial attempts were to use a conditional permutation and collect in a list all the possible unique "lineups" for each player, updating the open slots before moving to the next player. This also required (since the order that the player was moved would affect what positions were available for the next player) that the list being looped through was reordered and then looped through again. I still think that this is the way to go, but there are a number of pitfalls that have snagged the function.
The data to start the loop that you assume is given is:
1. List of positions the player being evaluated can play (the one being checked if he can fit)
2. List of players currently on the roster and the positions each of those is eligible at (I'm currently storing a list of lists and using the list index as the unique identifier of the player)
3. A list of the positions open as the roster currently is
It's proven a bigger headache than I originally anticipated. It was even suggested to me by a colleague that the situation I have (which involves, on a much larger scale, conditional assignments for each object) was NP-complete. I am certain that it is not, since once a player has been repositioned in a particular lineup being tested, the entire roster should not need to be iterated over again once another player has moved. That's the long and short of it and I finally decided to open it up to the forums.
Thanks for any help anyone can provide. Due to restrictions, I can't post portions of code (some of it is legacy). It is, however, being translated in .NET (C# at the moment). If there's additional information necessary, I'll try and rewrite some of the short pieces of the function for post.
Joseph G.
EDITED 07/24/2010
Thank you very much for the responses. I actually did look into using a genetic algorithm, but ultimately abandoned it because the work that would go into determining ordinal results was superfluous. The ultimate aim of the test is to determine if there is, in fact, a scenario that returns a positive. There's no need to determine the relative benefit of each working solution.
I appreciate the feedback on the likely lack of familiarity with the context I presented the problem. The actual model is in the distribution of build commands across multiple platform-specific build servers. It's accessible, but I'd rather not get into why certain build tasks can only be executed on certain systems and why certain systems can only execute certain types of build commands.
It appears that you have gotten the gist of what I was presenting, but here's a different model that's a little less specific. There are a set of discrete positions in an ordered array of lists as such (I'll refer to these as "positions"):
((2), (2), (3), (4), (5), (6), (4, 6), (3, 5), (7), (7), (7), (7), (7), (2, 3, 4, 5, 6, 7))
Additionally, there is an unordered array of lists (I'll refer to these as "employees"), each of which can only occupy a slot if its list has a member in common with the list at the position to which it would be assigned. After the initial assignments have been made, if an additional employee comes along, I need to determine if he can fill one of the open positions, and if not, if the current employees can be rearranged to allow one of the positions the employee CAN fill to be made available.
Brute force is something I'd like to avoid, because with this being on the order of 40 - 50 objects (and soon to be increasing), individual determinations will be very expensive to calculate at runtime.

I don't understand baseball at all so sorry if I'm on the wrong track. I do like rounders though, but there are only 2 positions to play in rounders, a batter or everyone else.
Have you considered using Genetic Algorithms to solve this problem? They are very good at solving NP-hard problems and work surprisingly well for rota and time schedule type problems as well.
You have a solution model which can easily be scored and easily manipulated which is a great start for a genetic algorithm.
For more complex problems where the total permutations are too large to calculate, a genetic algorithm should find a near optimum or excellent solution (along with lots and lots of other valid solutions) in a fairly short amount of time. Although if you wish to find the optimum solution every time, you are going to have to brute force it in all likelihood (I have only skimmed the problem so this may not be the case, but it sounds like it probably is).
In your example, you would have a solution class which represents a solution, i.e. a line-up for the baseball team. You randomly generate, say, 20 solutions, regardless of whether they are valid or not, then you have a rating algorithm that rates each solution. In your case, a better player in the line-up would score more than a worse player, and any invalid line-ups (for whatever reason) would force a score of 0.
Any 0 scoring solutions are killed off, and replaced with new random ones, and the rest of the solutions breed together to form new solutions. Theoretically and after enough time the pool of solutions should improve.
This has the benefit of not only finding lots of valid unique line-ups, but also rating them. You didn't specify in your problem the need to rate the solutions, but it offers plenty of benefits (for example if a player is injured, he can be temporarily rated as a -10 or whatever). All other players score based on their quantifiable stats.
It's scalable and performs well.

It sounds as though you have a bipartite matching problem. One partition has a vertex for each player on the roster. The other has a vertex for each roster position. There is an edge between a player vertex and a position vertex if and only if the player can play that position. You are interested in matchings: collections of edges such that no endpoint is repeated.
Given an assignment of players to positions (a matching) and a new player to be accommodated, there is a simple algorithm to determine if it can be done. Direct each edge in the current matching from the position to the player; direct the others from the player to the position. Now, using breadth-first search, look for a path from the new player to an unassigned position. If you find one, it tells you one possible series of reassignments. If you don't, there's no matching with all of the players.
For example, suppose player A can play positions 1 or 2:
A--1
 \
  \
   2
We provisionally assign A to 2. Now B shows up and can only play 2. Direct the graph:
A->1
 <
  \
B->2
We find a path B->2->A->1, which means "assign B to 2, displacing A to 1".
There is lots of pretty theory for dealing with hypothetical matchings. Genetic algorithms need not apply.
EDIT: I should add that because of the use of BFS, it computes the least disruptive sequence of reassignments.
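The search described above can be sketched roughly as follows, using the question's second model (slots and employees as sets of position codes, eligible when the sets overlap). `RosterMatcher`, `TryPlace` and the array-based representation are hypothetical choices made for this illustration; employees are identified by list index, and `-1` marks an open slot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RosterMatcher
{
    // slotAccepts[s] : position codes that slot s can be filled with
    // eligibility[e] : position codes that employee e can cover
    // occupant[s]    : employee currently in slot s, or -1 if the slot is open
    // Returns true and updates occupant in place if newEmployee can be seated,
    // possibly by shifting current occupants; otherwise returns false and
    // leaves occupant untouched.
    public static bool TryPlace(
        IReadOnlyList<ISet<int>> slotAccepts,
        IReadOnlyList<ISet<int>> eligibility,
        int[] occupant,
        int newEmployee)
    {
        var movesInto = new Dictionary<int, int>(); // slot -> employee who would take it
        var queue = new Queue<int>();
        queue.Enqueue(newEmployee);

        while (queue.Count > 0)
        {
            int employee = queue.Dequeue();
            for (int slot = 0; slot < slotAccepts.Count; slot++)
            {
                if (movesInto.ContainsKey(slot)) continue; // slot already reached
                if (!slotAccepts[slot].Overlaps(eligibility[employee])) continue;

                movesInto[slot] = employee;
                if (occupant[slot] == -1)
                {
                    ApplyPath(occupant, movesInto, slot); // open slot found
                    return true;
                }
                queue.Enqueue(occupant[slot]); // occupant would have to move on
            }
        }
        return false; // no augmenting path: the new employee cannot be seated
    }

    // Walks the path backwards from the open slot, moving each employee into
    // the slot reached from them, until the new employee has been seated.
    private static void ApplyPath(int[] occupant, Dictionary<int, int> movesInto, int openSlot)
    {
        int slot = openSlot;
        while (true)
        {
            int mover = movesInto[slot];
            int vacated = Array.IndexOf(occupant, mover); // -1 for the new employee
            occupant[slot] = mover;
            if (vacated == -1) return;
            slot = vacated;
        }
    }
}
```

With the A/B example (slot 0 accepts position 1, slot 1 accepts position 2, A sits in slot 1), calling `TryPlace` for B moves A to slot 0 and puts B in slot 1. Each slot is visited at most once per call, so one placement costs on the order of slots times employees rather than a walk over all permutations.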
