I am trying to profile my application to monitor the effects of a function, both before and after refactoring. I have performed an analysis of my application, and looking at the Summary I've noticed that the Hot Path list does not mention any of my own functions; it only mentions functions up to Application.Run().
I'm fairly new to profiling and would like to know how I could get more information about the Hot Path, as demonstrated in the MSDN documentation:
MSDN Example:
My Results:
I've noticed in the Output Window there are a lot of messages relating to a failure when loading symbols, a few of them are below;
Failed to load symbols for C:\Windows\system32\USP10.dll.
Failed to load symbols for C:\Windows\system32\CRYPTSP.dll.
Failed to load symbols for (Omitted)\WindowsFormsApplication1\bin\Debug\System.Data.SQLite.dll.
Failed to load symbols for C:\Windows\system32\GDI32.dll.
Failed to load symbols for C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7601.17514_none_41e6975e2bd6f2b2\comctl32.dll.
Failed to load symbols for C:\Windows\system32\msvcrt.dll.
Failed to load symbols for C:\Windows\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll.
Failed to load symbols for C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll.
Failed to load symbols for C:\Windows\Microsoft.Net\assembly\GAC_32\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll.
Unable to open file to serialize symbols: Error VSP1737: File could not be opened due to sharing violation: - D:\(Omitted)\WindowsFormsApplication1110402.vsp
(Formatted using code tool so it's readable)
Thanks for any pointers.
The "Hot Path" shown on the summary view is the most expensive call path based on the number of inclusive samples (samples from the function and also samples from functions called by the function) and exclusive samples (samples only from the function). A "sample" is just the fact the function was at the top of the stack when the profiler's driver captured the stack (this occurs at very small timed intervals). Thus, the more samples a function has, the more it was executing.
By default for sampling analysis, a feature called "Just My Code" is enabled that hides functions on the stack coming from non-user modules (it will show one level of non-user functions if they are called by a user function; in your case, Application.Run). Functions coming from modules without symbols loaded or from modules known to be from Microsoft would be excluded. Your "Hot Path" on the summary view indicates that the most expensive stack didn't have anything from what the profiler considers to be your code (other than Main). The example from MSDN shows more functions because the PeopleTrax.* and PeopleNS.* functions are coming from "user code". "Just My Code" can be turned off by clicking the "Show All Code" link on the summary view, but I would not recommend doing so here.
Take a look at the "Functions Doing The Most Individual Work" on the summary view. This displays functions that have the highest exclusive sample counts and are therefore, based on the profiling scenario, the most expensive functions to call. You should see more of your functions (or functions called by your functions) here. Additionally, the "Functions" and "Call Tree" view might show you more details (there's a drop-down at the top of the report to select the current view).
As for your symbol warnings, most of those are expected because they are Microsoft modules (not including System.Data.SQLite.dll). While you don't need the symbols for these modules to properly analyze your report, if you checked "Microsoft Symbol Servers" in "Tools -> Options -> Debugging -> Symbols" and reopened the report, the symbols for these modules should load. Note that it'll take much longer to open the report the first time because the symbols need to be downloaded and cached.
The other warning, about the failure to serialize symbols into the report file, means the report file could not be written to because something else had it open. Symbol serialization is an optimization that allows the profiler to load symbol information directly from the report file on the next analysis. Without symbol serialization, analysis simply has to perform the same amount of work as when the report was opened for the first time.
And finally, you may also want to try instrumentation instead of sampling in your profiling session settings. Instrumentation modifies modules that you specify to capture data on each and every function call (be aware that this can result in a much, much larger .vsp file). Instrumentation is ideal for focusing in on the timing of specific pieces of code, whereas sampling is ideal for general low-overhead profiling data collection.
Do you mind too much if I talk a bit about profiling, what works and what doesn't?
Let's make up an artificial program, some of whose statements are doing work that can be optimized away - i.e. they are not really necessary.
They are "bottlenecks".
Subroutine foo runs a CPU-bound loop that takes one second.
Also assume subroutine CALL and RETURN instructions take insignificant or zero time, compared to everything else.
Subroutine bar calls foo 10 times, but 9 of those times are unnecessary, which you don't know in advance and can't tell until your attention is directed there.
Subroutines A, B, C, ..., J are 10 subroutines, and they each call bar once.
The top-level routine main calls each of A through J once.
So the total call tree looks like this:
main
   A
      bar
         foo
         foo
         ... total 10 times for 10 seconds
   B
      bar
         foo
         foo
         ...
   ...
   J
      ...
(finished)
How long does it all take? 100 seconds, obviously.
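For concreteness, here is a minimal C# sketch of that artificial program (the structure mirrors the call tree above; only A and B are written out, and the one-second figure is just a busy-wait):

using System.Diagnostics;

static class ArtificialProgram
{
    // foo: a CPU-bound loop that takes roughly one second.
    static void Foo()
    {
        var sw = Stopwatch.StartNew();
        while (sw.ElapsedMilliseconds < 1000) { /* spin */ }
    }

    // bar: calls foo 10 times, but 9 of those calls are unnecessary.
    static void Bar()
    {
        for (int i = 0; i < 10; i++)
            Foo();
    }

    // A..J each call bar once; only A and B are shown here.
    static void A() { Bar(); }
    static void B() { Bar(); }
    // ... C() through J() look exactly the same.

    static void Main()
    {
        A();
        B();
        // ... C() through J(), for a total of roughly 100 seconds.
    }
}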
Now let's look at profiling strategies.
Stack samples (like say 1000 samples) are taken at uniform intervals.
Is there any self time? Yes. foo takes 100% of the self time.
It's a genuine "hot spot".
Does that help you find the bottleneck? No. Because it is not in foo.
What is the hot path? Well, the stack samples look like this:
main -> A -> bar -> foo (100 samples, or 10%)
main -> B -> bar -> foo (100 samples, or 10%)
...
main -> J -> bar -> foo (100 samples, or 10%)
There are 10 hot paths, and none of them look big enough to gain you much speedup.
IF YOU HAPPEN TO GUESS, and IF THE PROFILER ALLOWS, you could make bar the "root" of your call tree. Then you would see this:
bar -> foo (1000 samples, or 100%)
Then you would know that foo and bar were each independently responsible for 100% of the time and therefore are places to look for optimization.
You look at foo, but of course you know the problem isn't there.
Then you look at bar and you see the 10 calls to foo, and you see that 9 of them are unnecessary. Problem solved.
IF YOU DIDN'T HAPPEN TO GUESS, and instead the profiler simply showed you the percent of samples containing each routine, you would see this:
main 100%
bar 100%
foo 100%
A 10%
B 10%
...
J 10%
That tells you to look at main, bar, and foo. You see that main and foo are innocent. You look at where bar calls foo and you see the problem, so it's solved.
It's even clearer if in addition to showing you the functions, you can be shown the lines where the functions are called. That way, you can find the problem no matter how large the functions are in terms of source text.
NOW, let's change foo so that it does sleep(oneSecond) rather than be CPU bound. How does that change things?
What it means is it still takes 100 seconds by the wall clock, but the CPU time is zero. Sampling in a CPU-only sampler will show nothing.
So now you are told to try instrumentation instead of sampling. Contained among all the things it tells you, it also tells you the percentages shown above, so in this case you could find the problem, assuming bar was not very big. (There may be reasons to write small functions, but should satisfying the profiler be one of them?)
Actually, the main thing wrong with the sampler was that it can't sample during sleep (or I/O or other blocking), and it doesn't show you code line percents, only function percents.
By the way, 1000 samples gives you nice precise-looking percents. Suppose you took fewer samples. How many do you actually need to find the bottleneck? Well, since the bottleneck is on the stack 90% of the time, if you took only 10 samples, it would be on about 9 of them, so you'd still see it.
If you even took as few as 3 samples, the probability it would appear on two or more of them is 97.2%.**
High sample rates are way overrated, when your goal is to find bottlenecks.
Anyway, that's why I rely on random-pausing.
** How did I get 97.2 percent? Think of it as tossing a coin 3 times, a very unfair coin, where "1" means seeing the bottleneck. There are 8 possibilities:
outcome   #1s   probability
0 0 0      0    0.1^3 * 0.9^0 = 0.001
0 0 1      1    0.1^2 * 0.9^1 = 0.009
0 1 0      1    0.1^2 * 0.9^1 = 0.009
0 1 1      2    0.1^1 * 0.9^2 = 0.081
1 0 0      1    0.1^2 * 0.9^1 = 0.009
1 0 1      2    0.1^1 * 0.9^2 = 0.081
1 1 0      2    0.1^1 * 0.9^2 = 0.081
1 1 1      3    0.1^0 * 0.9^3 = 0.729
so the probability of seeing it 2 or 3 times is .081*3 + .729 = .972
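Equivalently, as a single binomial calculation: P(at least 2 of 3 samples show the bottleneck) = 3 * 0.9^2 * 0.1 + 0.9^3 = 0.243 + 0.729 = 0.972.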
I have a collection of 'Users' who are working on a task between two dates (firstPerformance and LastPerformance) for some number of hours. I would like to assign that information to a task by creating assignments and setting their Start, Finish and Work:
Assignment a = task.Assignments.Add(ResourceID: user.Resource.Id);
a.Start = user.Performances.FirstPerformance;
a.Finish = user.Performances.LastPerformance;
a.Work = user.Performances.TotalTime * 60; // TotalTime is in hours while Work is in minutes
Before I do that I remove all assignments on the task and set it to pjFixedWork:
task.Type = PjTaskFixedType.pjFixedWork;
foreach (Assignment a in task.Assignments) a.Delete();
This results in strange behavior:
On the first run it sets Work and Start correctly in Project and assigns the correct resource.
On the second run it sets Work, Start and Finish correctly.
And on the third run it also fixes the Units.
For example:
Start: 19.10.21 Finish: 02.02.22 Work: 288,75h Resource: Andreas
Start: 19.10.21 Finish: 27.01.22 Work: 288,75h Resource: Andreas [57%]
Start: 19.10.21 Finish: 27.01.22 Work: 288,75h Resource: Andreas [60%]
Of course the input is the same on all three runs!
While I still don't understand this behavior, I see something different again when there is more than one resource to be assigned.
The data I'm assigning looks like:
User 1 First: 14.09.2021 Last: 02.12.2021 Total Time: 82,75
User 2 First: 31.08.2021 Last: 28.01.2022 Total Time: 263,5
User 3 First: 08.09.2021 Last: 09.09.2021 Total Time: 4,5
User 4 First: 02.12.2021 Last: 02.12.2021 Total Time: 1
User 5 First: 19.10.2021 Last: 28.01.2022 Total Time: 223
User 6 First: 06.12.2021 Last: 25.01.2022 Total Time: 18,5
User 7 First: 07.09.2021 Last: 14.12.2021 Total Time: 126
The results of different runs look like:
Start: 31.08.21 Finish: 14.01.22 Work: 170h Resource: User1; User2; ...
Start: 31.08.21 Finish: 14.01.22 Work: 169,04h Resource: User1; User2; ...
Start: 31.08.21 Finish: 14.01.22 Work: 169,04h Resource: User1; User2; ... NO CHANGES
Since I have assigned a rate of 1 €/hour to the resources, I can see that something is going on, because the costs are:
User 1 0,21€
User 2 2,59€
User 3 0,16€
User 4 0,07€
User 5 30,79€
User 6 9,22€
User 7 126,00€
Which adds up to 169,04.
What am I doing wrong here?
Best Regards
Andreas
What am I doing wrong here?
Your code is not wrong. The 'wrong' part is trying to use Microsoft Project in a way it is not designed to work. Let me explain.
Microsoft Project uses a Critical Path Method algorithm to schedule tasks based on such things as duration of the tasks, links between them and other dependencies such as constraint dates and working hours (calendars). Given a set of these inputs, Project calculates task start and finish dates, work required, etc.
Problems arise when, instead of allowing the CPM algorithm to do the calculations, too many exact dates (for example) are forced into the schedule. Trying to force an exact amount of work for each resource also works against the CPM engine.
Having a task with more than a few resources is a good indication that it should be split up. Having tasks that are longer than a few weeks is also a good indication that they might be better split up. Break large tasks down into smaller parts.
Use Resource calendars to limit the availability of each resource as necessary. Use leveling to ensure resources are not overloaded. Let Project do what it does best--schedule tasks based on the constraints. Telling Project exactly when and how much each resource will work on each task is the opposite of how it is designed to be used. Don't fight it.
Rachel's answer is sufficient but here's my piece for what it's worth:
The resources that you assign to tasks in code should have the same period of performance as the task itself (i.e. if the task starts on 31.08.21 and finishes on 14.01.22, all of its resource assignments should also start and finish on those dates). When you start to mess around with that, MS Project starts to behave erratically and you'll eventually start seeing things in the schedule that make you think "What the heck is going on? This makes no sense". Assuming your VSTO add-in is going to be for production use, this is not something you want to subject your users to.
Don't worry about the assigned resource units on a task. I'm not sure exactly how MS Project calculates the resource assignment units (maybe Rachel can shed some light on this), but I find that they often don't make sense and are confusing. Personally I just ignore them and only worry about the total number of hours each resource assignment has. (FYI, be wary of the Effort Driven flag if you are working on a task that is set to fixed duration.)
Two other general things about working with resource assignments on tasks:
If you start to delete or add resources to a task in code, you need to set the task to 0% complete first. Results may vary otherwise. You may be doing this already but that's why I asked before.
When you delete resource assignments off a task, make sure the task is set to fixed duration.
You will probably want to store at least the original duration and start date of the task in memory before you start messing with the resources so you can make sure the task is still occurring at the same time after you have altered the resources assignments.
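Putting that advice together, here is a minimal sketch (it assumes the standard Microsoft Project interop object model; property names such as PercentComplete and Duration should be verified against your interop version, and AssignmentHelper/ResetAssignments are made-up names):

using MSProject = Microsoft.Office.Interop.MSProject;

static class AssignmentHelper
{
    // Hypothetical helper illustrating the reset sequence described above.
    public static void ResetAssignments(MSProject.Task task)
    {
        // Remember the original schedule so it can be checked afterwards.
        var originalStart = task.Start;
        var originalDuration = task.Duration;

        // Set the task to 0% complete before touching assignments.
        task.PercentComplete = 0;

        // Delete the existing assignments while the task is fixed duration.
        task.Type = MSProject.PjTaskFixedType.pjFixedDuration;
        foreach (MSProject.Assignment a in task.Assignments)
            a.Delete();

        // Switch back to fixed work before adding the new assignments.
        task.Type = MSProject.PjTaskFixedType.pjFixedWork;

        // ... add the new assignments here, then compare against
        // originalStart / originalDuration if the task must not move.
    }
}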
We are looking at using nDepend to start tracking some of our technical debt, particularly around hard to maintain methods and cyclomatic complexity.
I believe this is possible by taking a baseline report and then running a new analysis to provide the delta. Below is a very basic PowerShell script I've put together that does this.
$nDepend = "C:\_DEVELOPMENT\nDepend\NDepend.Console.exe"
$targetFile = "C:\_DEVELOPMENT\AssemblyToTest\CodeChallenge.Domain.ndproj"
$projectFolder = Split-Path -Path $targetFile
$outputFolder = "nDepend.Reports"
$previous = ""
Clear-Host
# See if we already have a .ndar file in the output folder, if we do back it up so we can do a comparison
if (Test-Path $projectFolder\$outputFolder\*.ndar)
{
Write-Output "Backing up previous NDAR report"
Copy-Item $projectFolder\$outputFolder\*.ndar $projectFolder\previous.ndar
$previous = ".\previous.ndar"
}
#The output path appears to be relative to the .ndproj file
& $nDepend $targetFile /Silent /OutDir .\$outputFolder /AnalysisResultToCompareWith .\previous.ndar
Here is the rule I've configured in nDepend:
failif count > 1 bobs
from m in Methods
where m.NbLinesOfCode > 10
where m.WasChanged()
select new { m, m.NbLinesOfCode }
The goal of this is not to break the build if we have methods over 10 lines, but rather to break the build if somebody edits an existing method that is too big and does not improve it (or make it worse). However the where m.WasChanged() part of the rule isn't being triggered regardless of how much code I add. If I comment it out it will alert me that there are plenty of methods that exceed 10 lines, but I only want to know about recently changed ones.
Am I using the rule wrong? Or perhaps my powershell is incorrectly using the /AnalysisResultToCompareWith parameter?
There are default rules like "Avoid making complex methods even more complex" in the rule group "Code Smells Regression" that are close to what you want to achieve. You can get inspired by their source code.
The key is to retrieve the methods changed with...
m.IsPresentInBothBuilds() &&
m.CodeWasChanged() &&
and then compare metric evolution since baseline by accessing m.OlderVersion().
An ICompareContext references two code base snapshots, the newer version and the older version. In this context the OlderVersion() extension method actually calls ICompareContext.OlderVersion(codeElement); from the doc:
Returns the older version of the codeElement object.
If codeElement is already the older version, returns the codeElement object.
If codeElement has been added and has no corresponding older version, returns null.
This method has a constant time complexity.
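Adapting your rule along those lines might look something like the sketch below (the rule name and the 10-line threshold are placeholders, and you may prefer failif over warnif as in your original):

// <Name>Changed methods over 10 lines must not grow</Name>
warnif count > 0
from m in Methods
where m.IsPresentInBothBuilds() &&
      m.CodeWasChanged() &&
      m.NbLinesOfCode > 10 &&
      m.NbLinesOfCode >= m.OlderVersion().NbLinesOfCode
select new { m, m.NbLinesOfCode, OldLoc = m.OlderVersion().NbLinesOfCode }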
I have created an app that, given enough data, fails to complete, with "The transaction log for database 'tempdb' is full due to 'ACTIVE_TRANSACTION'." and "Cannot find table 0."
The Stored Procedure that the report uses does not explicitly reference "tempdb" so it must be something that SQL Server manages on its own.
Anyway, I ran the "Resource Contention" analysis in Visual Studio 2013 via Analyze > Performance and Diagnostics.
When it finished, it told me in the "Concurrency Profiling Report" that there were 30,790 total contentions, with "Handle2" and "Multiple Handles 1" making up over 99% of the "Most Contended Resources", and "_CorExeMain", Thread ID 4936, as the "Most Contended Thread".
This is all interesting enough, I guess, but now that I know this, what can I do about it?
Is 30,790 an excessive amount of total contentions? It sounds like it, but I don't know. But again, assuming it is, this information doesn't really seem to tell me anything of value, that is: what can I do to improve the situation? How can I programmatically prevent or limit Resource and/or Thread contention?
There were no errors in the report generated, and the six messages were of the "for information only" variety. There was one Warning:
Warning 1 DA0022: # Gen 1 Collections / # Gen 2 Collections = 2.52; There is a relatively high rate of Gen 2 garbage collections occurring. If, by design, most of your program's data structures are allocated and persisted for a long time, this is not ordinarily a problem. However, if this behavior is unintended, your app may be pinning objects. If you are not certain, you can gather .NET memory allocation data and object lifetime information to understand the pattern of memory allocation your application uses.
...but the persisting of data structures for "a long time" is, indeed, by design.
UPDATE
I then ran the ".NET Memory Allocation" report, and here too I'm not sure what to make of it:
Is 124 million bytes excessive? Is there anything untoward about the functions allocating the most memory, or which types have the most memory allocated?
I also don't understand why the red vertical line moves about after the report has been generated; it was first around "616", then it moved to 0 (as seen in the screenshot above), and now it's around 120.
UPDATE 2
I see now (after running the final performance check (Instrumentation)) that the vertical red line is just a lemming - it follows the cursor wherever you drag it. This has some purpose, I guess...
This is what ended up working: a two-pronged attack:
0) The database guru refactored the Stored Procedure to be more efficient.
1) I reworked the data reading code by breaking it into two parts. I determined the first half of the data to be retrieved, and got that, then followed a short pause with a second pass, retrieving the rest of the data. In short, what used to be this:
. . .
ReadData(_unit, _monthBegin, _monthEnd, _beginYearStr, _endYearStr);
. . .
...is now this:
. . .
if // big honkin' amount of data
{
string monthBegin1 = _monthBegin;
string monthEnd1 = GetEndMonthOfFirstHalf(_monthBegin, _monthEnd, _beginYearStr, _endYearStr);
string monthBegin2 = GetBeginMonthOfSecondHalf(_monthBegin, _monthEnd, _beginYearStr, _endYearStr);
string monthEnd2 = _monthEnd;
string yearBegin1 = _beginYearStr;
string yearEnd1 = GetEndYearOfFirstHalf(_monthBegin, _monthEnd, _beginYearStr, _endYearStr);
string yearBegin2 = GetBeginYearOfSecondHalf(_monthBegin, _monthEnd, _beginYearStr, _endYearStr);
string yearEnd2 = _endYearStr;
ReadData(_unit, monthBegin1, monthEnd1, yearBegin1, yearEnd1);
Thread.Sleep(10000);
ReadData(_unit, monthBegin2, monthEnd2, yearBegin2, yearEnd2);
}
else // a "normal" Unit (not an overly large amount of data to be processed)
{
ReadData(_unit, _monthBegin, _monthEnd, _beginYearStr, _endYearStr);
}
. . .
It now successfully runs to completion without any exceptions. I don't know if the problem was entirely database-related, or if all Excel Interop activity was also problematic, but I can now generate Excel files of 1,341KB with this operation.
ETC = "Estimated Time of Completion"
I'm counting the time it takes to run through a loop and showing the user some numbers that tell him/her approximately how much time the full process will take. I feel like this is a common thing that everyone does on occasion and I would like to know if you have any guidelines that you follow.
Here's an example I'm using at the moment:
int itemsLeft; //This holds the number of items to run through.
double timeLeft;
TimeSpan TsTimeLeft;
List<double> average = new List<double>(); //Requires using System.Linq for Sum() below.
double milliseconds; //Accumulates how long the current loop has taken so far; reset every loop.

//The background worker calls this event once for each item. The total number
//of items is in the hundreds for this particular application and every loop takes
//roughly one second.
private void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    //An item has been completed!
    itemsLeft--;
    average.Add(milliseconds);

    //Get an average time per item and multiply it by the items left.
    timeLeft = average.Sum() / average.Count * itemsLeft;
    TsTimeLeft = TimeSpan.FromSeconds(timeLeft);
    this.Text = String.Format("ETC: {0}:{1:D2}:{2:D2} ({3:N2}s/file)",
        TsTimeLeft.Hours,
        TsTimeLeft.Minutes,
        TsTimeLeft.Seconds,
        average.Sum() / average.Count);

    //Only use the last 20-30 entries in the calculation to prevent an unnecessarily long List<>.
    if (average.Count > 30)
        average.RemoveRange(0, 10);

    milliseconds = 0;
}

//this.profiler.Interval = 10;
private void profiler_Tick(object sender, EventArgs e)
{
    milliseconds += 0.01; //A 10 ms tick adding 0.01 means this effectively counts seconds.
}
As I am a programmer at the very start of my career, I'm curious to see what you would do in this situation. My main concern is that I calculate and update the UI on every loop; is this bad practice?
Are there any dos and don'ts when it comes to estimations like this? Are there any preferred ways of doing it, e.g. update every second, update every ten logs, or calculate and update the UI separately? Also, when would an ETA/ETC be a good or a bad idea?
The real problem with estimating the time taken by a process is the quantification of the workload. Once you can quantify that, you can make a better estimate.
Examples of good estimates
File system I/O or network transfer. Whether or not the file system has bad performance, you can know in advance: you can quantify the total number of bytes to be processed and you can measure the speed. Once you have these, and once you can monitor how many bytes you have transferred, you get a good estimate. Random factors may still affect your estimate (e.g. another application starts in the meantime), but you still get a meaningful value.
Encryption on large streams. For the reasons above. Even if you are computing an MD5 hash, you always know how many blocks have been processed, how many are left, and the total.
Item synchronization. This is a little trickier. If you can assume that the per-unit workload is constant, or you can make a good estimate of the time required to process an item when the variance is low or insignificant, then you can again make a good estimate of the process. Take email synchronization: if you don't know the byte size of the messages (otherwise you fall into case 1), but experience tells you that the majority of emails are roughly the same size, then you can use the mean time taken to download/upload the already processed emails to estimate the time for a single email. This won't work in 100% of cases and is subject to error, but you will still see the progress bar advancing on a large account.
In general, the rule is that you can make a good estimate of the ETC/ETA (the ETA is actually the date and time the operation is expected to complete) if you have a homogeneous process about which you know the numbers. Homogeneity guarantees that the time to process one work item is comparable to the others, i.e. the time taken to process previous items can be used to estimate future ones. The numbers are what let you make correct calculations.
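As a concrete illustration of the byte-counting case above, a minimal sketch (the class and member names are made up) could look like this:

using System;
using System.Diagnostics;

class TransferEta
{
    private readonly long totalBytes;
    private readonly Stopwatch watch = Stopwatch.StartNew();
    private long bytesDone;

    public TransferEta(long totalBytes) { this.totalBytes = totalBytes; }

    // Call after each transferred chunk; returns the estimated time remaining.
    public TimeSpan Update(long chunkBytes)
    {
        bytesDone += chunkBytes;
        double bytesPerSecond = bytesDone / watch.Elapsed.TotalSeconds;
        double secondsLeft = (totalBytes - bytesDone) / bytesPerSecond;
        return TimeSpan.FromSeconds(secondsLeft);
    }
}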
Examples of bad estimates
Operations on a number of files of unknown size. This time you know only how many files you want to process (e.g. to download), but you don't know their size in advance. If the file sizes have high variance, you are in trouble. Having downloaded half of the files, when these were the smallest and add up to only 10% of the total bytes, can you say you are halfway? No! You just see the progress bar grow quickly to 50% and then crawl.
Heterogeneous processes, e.g. Windows installations. As pointed out by @HansPassant, Windows installations provide a worse-than-bad estimate. Installing Windows software involves several processes including file copying (this can be estimated), registry modifications (usually never estimated), and execution of transactional code. The real problem is the last one. Transactional processes involving execution of custom installer code are discussed below.
Execution of generic code. This can never be estimated. A code fragment involves conditional statements, and executing these changes the path taken depending on conditions external to the code. This means, for example, that a program behaves differently depending on whether you have a printer installed or not, whether you have a local or a domain account, and so on.
Conclusions
Estimating the duration of a software process is neither an impossible nor an exact/deterministic task.
It's not impossible because, even in the case of code fragments, you can either find a model for your code (take LU factorization as an example; that can be estimated), or you might redesign your code, splitting it into an estimation phase, where you first determine the branch conditions, and an execution phase, where all the pre-determined branches are taken. I said might because this task is in practice impossible: most code determines branches as effects of previous conditions, meaning that estimating a branch actually involves running the code. A chicken-and-egg circle.
It's not a deterministic process. Computer systems, especially multitasking ones, are affected by a number of random factors that may impact your estimated process. You will never get a correct estimate before running your process. At most, you can detect external factors and re-estimate. The gap between your estimate and the real duration mathematically converges to zero as you get closer to the end of the process (lim[x->N] |est(x) - real(x)| = 0, where N is the process duration).
If your user interface is so obscure that you have to explain that ETC doesn't mean Etcetera then you are doing it wrong. Every user understands what a progress bar does, don't help.
Nothing is quite as annoying as an inaccurate progress bar. Particularly ones that promise a quick finish but then don't deliver. I'd give the progress bar displayed by any installer on Windows as a good example of one that is fundamentally broken. Just not a shining example of an implementation that you should pursue.
Such a progress bar is broken because it is utterly impossible to guess up front how long it is going to take to install a program. File systems have very unpredictable perf. This is a very common problem with estimating execution time. Better UI models are the spinning dots you'd see in a video player and many programs in Windows 8. Or the marquee style supported by the common ProgressBar control. Just feedback that says "I'm not dead, working on it". Even the hour-glass cursor is better than a bad estimate. If you have something to report beyond a technicality that no user is really interested in then don't hesitate to display that. Like the number of files you've processed or the number of kilobytes you've downloaded. The actual value of the number isn't that useful, seeing the rate at which it increases is the interesting tidbit.
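For reference, the marquee style mentioned above is just a property on the stock WinForms ProgressBar (progressBar1 is assumed to be a control on your form):

// Indeterminate "I'm not dead, working on it" feedback.
progressBar1.Style = ProgressBarStyle.Marquee;
progressBar1.MarqueeAnimationSpeed = 30; // time between marquee steps, in milliseconds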
I inherited a large and fairly complex state machine. It has 31 possible states, all are really needed (big business process). It has the following inputs:
Enum: Current State (so 0 -> 30)
Enum: source (currently only 2 entries)
Boolean: Request
Boolean: Type
Enum: Status (3 states)
Enum: Handling (3 states)
Boolean: Completed
Breaking it into separate state machines doesn't seem feasible, as each state is distinct. I wrote tests for the most common inputs, with one test per input, all inputs constant, except for the State.
[Subject("Application Process States")]
public class When_state_is_meeting2Requested : AppProcessBase
{
Establish context = () =>
{
//Setup....
};
Because of = () => process.Load(jas, vac);
It Current_node_should_be_meeting2Requested = () => process.CurrentNode.ShouldBeOfType<meetingRequestedNode>();
It Can_move_to_clientDeclined = () => Check(process, process.clientDeclined);
It Can_move_to_meeting1Arranged = () => Check(process, process.meeting1Arranged);
It Can_move_to_meeting2Arranged = () => Check(process, process.meeting2Arranged);
It Can_move_to_Reject = () => Check(process, process.Reject);
It Cannot_move_to_any_other_state = () => AllOthersFalse(process);
}
No one is entirely sure what the output should be for each state and set of inputs. I have started to write tests for it. However, I'll need to write something like 4320 tests (30 * 2 * 2 * 2 * 3 * 3 * 2).
What suggestions do you have for testing state machines?
Edit: I am playing with all of the suggestions, and will mark an answer when I find one that works best.
I see the problem, but I'd definitely try splitting the logic out.
The big problem area in my eyes is:
It has 31 possible states to be in.
It has the following inputs:
Enum: Current State (so 0 -> 30)
Enum: source (currently only 2 entries)
Boolean: Request
Boolean: type
Enum: Status (3 states)
Enum: Handling (3 states)
Boolean: Completed
There is just far too much going on. The input is making the code hard to test. You've said it would be painful to split this up into more manageable areas, but it's equally, if not more, painful to test this much logic in one go. In your case, each unit test covers far too much ground.
This question I asked about testing large methods is similar in nature; I found my units were simply too big. You'll still end up with many tests, but they'll be smaller and more manageable, covering less ground. This can only be a good thing though.
Testing Legacy Code
Check out Pex. You say you inherited this code, so this is not actually Test-Driven Development. You simply want unit tests to cover each aspect. This is a good thing, as any further work will be validated. I've personally not used Pex properly yet, however I was wowed by the video I saw. Essentially it will generate unit tests based on the input, which in this case would be the finite state machine itself. It will generate test cases you would not have thought of. Granted this is not TDD, but in this scenario, testing legacy code, it should be ideal.
Once you have your test coverage, you can begin refactoring, or adding new features with the safety of good test coverage to ensure you don't break any existing functionality.
All-Pair Testing
To constrain the number of combinations to test and to be reasonably assured that you have the most important combinations covered, you should take a look at all-pairs testing.
The reasoning behind all-pairs testing is this: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing, which has as its limit the exhaustive testing of all possible inputs.
Also take a look at a previous answer here (shameless plug) for additional information and links to both all-pairs testing and PICT as a tool.
Example PICT model file
The model below generates 93 test cases, covering all pairs of the input parameters.
#
# This is a PICT model for testing a complex state machine at work
#
CurrentState :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Source :1,2
Request :True, False
Type :True, False
Status :State1, State2, State3
Handling :State1, State2, State3
Completed :True,False
#
# One can add constraints to the model to exclude impossible
# combinations if needed.
#
# For example:
# IF [Completed]="True" THEN CurrentState>15;
#
#
# This is the PICT output of "pict ComplexStateMachine.pict /s /r1"
#
# Combinations: 515
# Generated tests: 93
I can't think of any easy way to test an FSM like this without getting really pedantic and employing proofs, using machine learning techniques, or brute force.
Brute force:
Write something that will generate all 4320 test cases in some declarative manner, with mostly incorrect data. I would recommend putting this in a CSV file and then using something like NUnit's parameterized testing to load all the test cases.
Now most of these test cases will fail, so you will have to update the declarative file manually to be correct, taking just a random sample of the test cases to fix.
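A rough sketch of what that could look like with NUnit's TestCaseSource (the file name and column layout are made up; the assertion is only a placeholder for calls into your real state machine):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class StateMachineBruteForceTests
{
    // One raw CSV line per test case; skip the header row.
    private static IEnumerable<string> Cases() =>
        File.ReadLines("StateMachineCases.csv").Skip(1);

    [TestCaseSource(nameof(Cases))]
    public void Transition_matches_declared_expectation(string csvLine)
    {
        var fields = csvLine.Split(',');
        // fields: currentState, source, request, type, status, handling, completed, expectedState
        // Arrange/act/assert against the real state machine here; this is
        // only a skeleton showing how the declarative rows are fed in.
        Assert.That(fields.Length, Is.EqualTo(8));
    }
}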
Machine Learning technique:
You could employ support vector machines or MDA algorithms/heuristics to try to learn from the sample you took as mentioned above and teach your ML program your FSM. Then run the algorithm on all 4320 inputs and see where the two disagree.
How many tests do you think are needed to "completely" test the function sum(int a, int b)? In C# it would be something like 18446744056529682436 tests... much worse than in your case.
I would suggest the following:
Test the most likely situations and boundary conditions.
Test some critical parts of your SUT separately.
Add test cases when bugs are found by QA or in production.
In this particular case the best way is to test how the system switches from one state to another. Create a DSL to test the state machine and implement the most frequent use cases using it. For example:
Start
.UploadDocument("name1")
.SendDocumentOnReviewTo("user1")
.Review()
.RejectWithComment("Not enough details")
.AssertIsCompleted()
The example of creating simple tests for flows is here: http://slmoloch.blogspot.com/2009/12/design-of-selenium-tests-for-aspnet_09.html
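One way such a DSL might be backed is with a thin fluent wrapper around the state machine under test (a rough sketch; AppProcess and the methods it exposes are hypothetical stand-ins for your real process type):

using System;

public class ProcessDsl
{
    private readonly AppProcess process;   // your real state machine type

    private ProcessDsl(AppProcess process) { this.process = process; }

    public static ProcessDsl Start => new ProcessDsl(new AppProcess());

    public ProcessDsl UploadDocument(string name)        { process.Upload(name); return this; }
    public ProcessDsl SendDocumentOnReviewTo(string who) { process.SendForReview(who); return this; }
    public ProcessDsl Review()                           { process.Review(); return this; }
    public ProcessDsl RejectWithComment(string comment)  { process.Reject(comment); return this; }

    public void AssertIsCompleted()
    {
        if (!process.IsCompleted)
            throw new InvalidOperationException("Process is not completed");
    }
}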
Use SpecExplorer or NModel.
I had constructed a finite state machine for a piece of medical equipment.
The FSM was configurable through an XML format I had defined.
To define a state machine, one can rely on experience from digital circuit design, where state maps are used.
You have to use what I term a turnpike transition map. On the United States East Coast, most highways are nicknamed turnpikes. Turnpike authorities issue a turnpike toll pricing map. If a toll section has 50 exits, the pricing map has a 50-row x 50-column table, listing the exits exhaustively as both rows and columns. To find the toll charge for entering at exit 20 and exiting at exit 30, you simply look at the intersection of row 20 and column 30.
For a state machine of 30 states, the turnpike transition map would be a 30 x 30 matrix listing all the 30 possible states row and column wise. Let us decide the rows to be CURRENT states and columns to be NEXT states.
Each intersecting cell would list the "price" of transitioning from a CURRENT state(row) to a NEXT state(col). However instead of a single $ value, the cell would refer to a row in the Inputs table, which we could term as the transition id.
In the medical equipment FSM I developed, there were inputs that are String, enums, int, etc. The Inputs table listed these input stimulus column-wise.
To construct the Inputs table, you would write a simple routine to list all possible combinations of inputs. The table would be huge; in your case, it would have 4320 rows and hence 4320 transition ids. But it's not a tedious table, because you generate it programmatically. In my case, I wrote a simple JSP to list the transition Inputs table (and the turnpike table) in the browser, or download it as CSV to be displayed in MS Excel.
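In C#, the two tables might be represented along these lines (a sketch; the field types stand in for the real inputs listed in the question):

using System.Collections.Generic;

// One row of the Inputs table: a single combination of input stimuli.
public class TransitionInputs
{
    public int Source;
    public bool Request;
    public bool Type;
    public int Status;
    public int Handling;
    public bool Completed;
}

public class TurnpikeMap
{
    public const int StateCount = 31;                   // states 0..30

    // Inputs table: the list index is the transition id.
    public List<TransitionInputs> Inputs = new List<TransitionInputs>();

    // Turnpike transition map: [current, next] -> transition id, or -1 for a
    // grayed-out (unreachable) transition.
    public int[,] TransitionId = BuildEmptyMap();

    private static int[,] BuildEmptyMap()
    {
        var map = new int[StateCount, StateCount];
        for (int row = 0; row < StateCount; row++)
            for (int col = 0; col < StateCount; col++)
                map[row, col] = -1;
        return map;
    }
}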
There are two directions in constructing these two tables.
The design direction, where you construct the turnpike table of all possible transitions, graying out unreachable transitions. Then construct the Inputs table of all expected inputs for each reachable transition only, with the row number as the transition id. Each transition id is transcribed onto the respective cell of the turnpike transition map. However, since the FSM is a sparse matrix, not all transition ids will be used in the cells of the turnpike transition map. Also, a transition id can be used multiple times, because the same transition conditions can apply to more than one pair of states.
The test direction is the reverse, where you construct the Inputs table first.
You have to write a general routine for the exhaustive transition test.
The routine would first read a transition sequencing table to bring the state machine to an entry-point state to start a test cycle. At each CURRENT state, it is poised to run through all 4320 transition ids. On each row of CURRENT states in the turnpike transition map, there would be a limited number of columns with valid NEXT states.
You would want the routine to cycle through all 4320 rows of inputs that it reads from the Inputs table, to ensure that the unused transition ids have no effect on a CURRENT state, and to test that all the effectual transition ids are valid transitions.
But you cannot, because once an effectual transition is pumped in, it changes the state of the machine into a NEXT state and prevents you from finishing the rest of the transition ids on that previous CURRENT state. Once the machine changes state, you have to start testing from transition id 0 again.
Transition paths can be cyclical or irreversible or having combination of cyclical and irreversible sections along the path.
Within your test routine, you need a register for each state to memorise the last transition id pumped into that state. Every time the test reaches an effectual transition id, that transition id is left in the register, so that when you complete a cycle and return to an already traversed state, you start iterating from the next transition id greater than the one stored in the register.
Your routine also has to take care of the irreversible sections of a transition path: when the machine is brought to a final state, it restarts the test from the entry-point state, reiterating the 4320 inputs from the next transition id greater than the one stored for each state. In this way, you are able to exhaustively discover all the possible transition paths of the machine.
Fortunately, FSMs are sparse matrices of effectual transitions, so exhaustive testing does not consume the complete combination of (number of transition ids) x (number of possible states squared). However, difficulty arises if you are dealing with a legacy FSM where visual or temperature states cannot be fed back into the test system and you have to monitor each state visually. That was ugly, but we still spent an additional two weeks testing the equipment visually, going through only the effectual transitions.
You may not need a transition sequencing table (one for each entry-point state, for the test routine to read to bring the machine to a desired entry point) if your FSM allows you to reach an entry point with a simple reset plus a single transition id. But having your routine capable of reading a transition sequencing table is useful, because frequently you will need to go into the midst of the state network and start your testing from there.
You should acquaint yourself with the use of transition and state maps, because they are very useful for detecting all the possible and undocumented states of a machine and for interviewing users about whether they actually want those grayed out (transitions made ineffectual and states made unreachable).
The advantage I had was that it was a new piece of equipment and I had the choice to design the state-machine controller to read XML files, which meant I could change the behaviour of the state machine any way I wanted, or rather any way the customer wanted, and I was able to ensure that unused transition ids were really ineffectual.
For the Java listing of the finite state machine controller, see http://code.google.com/p/synthfuljava/source/browse/#svn/trunk/xml/org/synthful. Test routines are not included.
Test based on the requirements. If a certain state is required to move to a certain other state whenever Completed is true, then write a test that automatically cycles through all the combinations of the other inputs (this should just be a couple of for loops) to prove that the other inputs are correctly ignored. You should end up with one test for each transition arc, which I'd estimate would be somewhere on the order of 100 or 150 tests, not 4000.
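For example, one such per-arc test could loop over the "don't care" inputs like this (a sketch assuming NUnit; Source, Status, Handling, State, CreateProcess, Step and the Closed state are hypothetical stand-ins for the real types):

[Test]
public void Completed_always_moves_Meeting2Requested_to_Closed()
{
    foreach (Source source in Enum.GetValues(typeof(Source)))
    foreach (bool request in new[] { true, false })
    foreach (bool type in new[] { true, false })
    foreach (Status status in Enum.GetValues(typeof(Status)))
    foreach (Handling handling in Enum.GetValues(typeof(Handling)))
    {
        var process = CreateProcess(State.Meeting2Requested,
                                    source, request, type, status, handling,
                                    completed: true);
        process.Step();

        // Whatever the other inputs were, this arc must be taken.
        Assert.AreEqual(State.Closed, process.CurrentState,
            "Failed for {0}/{1}/{2}/{3}/{4}", source, request, type, status, handling);
    }
}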
You might consider investigating Model Based Testing. There are a few tools available to help with test generation in situations like this. I usually recommend MBT.
Brute force with coverage tests seems to be a good way to begin.