I wanted to store some pixels locations without allowing duplicates, so the first thing comes to mind is HashSet<Point> or similar classes. However this seems to be very slow compared to something like HashSet<string>.
For example, this code:
HashSet<Point> points = new HashSet<Point>();
using (Bitmap img = new Bitmap(1000, 1000))
{
for (int x = 0; x < img.Width; x++)
{
for (int y = 0; y < img.Height; y++)
{
points.Add(new Point(x, y));
}
}
}
takes about 22.5 seconds.
While the following code (which is not a good choice for obvious reasons) takes only 1.6 seconds:
HashSet<string> points = new HashSet<string>();
using (Bitmap img = new Bitmap(1000, 1000))
{
for (int x = 0; x < img.Width; x++)
{
for (int y = 0; y < img.Height; y++)
{
points.Add(x + "," + y);
}
}
}
So, my questions are:
Is there a reason for that? I checked this answer, but 22.5 sec is way more than the numbers shown in that answer.
Is there a better way to store points without duplicates?
There are two perf problems induced by the Point struct. Something you can see when you add Console.WriteLine(GC.CollectionCount(0)); to the test code. You'll see that the Point test requires ~3720 collections but the string test only needs ~18 collections. Not for free. When you see a value type induce so many collections then you need to conclude "uh-oh, too much boxing".
At issue is that HashSet<T> needs an IEqualityComparer<T> to get its job done. Since you did not provide one, it needs to fall back to one returned by EqualityComparer.Default<T>(). That method can do a good job for string, it implements IEquatable. But not for Point, it is a type that harks from .NET 1.0 and never got the generics love. All it can do is use the Object methods.
The other issue is that Point.GetHashCode() does not do a stellar job in this test, too many collisions, so it hammers Object.Equals() pretty heavily. String has an excellent GetHashCode implementation.
You can solve both problems by providing the HashSet with a good comparer. Like this one:
class PointComparer : IEqualityComparer<Point> {
public bool Equals(Point x, Point y) {
return x.X == y.X && x.Y == y.Y;
}
public int GetHashCode(Point obj) {
// Perfect hash for practical bitmaps, their width/height is never >= 65536
return (obj.Y << 16) ^ obj.X;
}
}
And use it:
HashSet<Point> list = new HashSet<Point>(new PointComparer());
And it is now about 150 times faster, easily beating the string test.
The main reason for the performance drop is all the boxing going on (as already explained in Hans Passant's answer).
Apart from that, the hash code algorithm worsens the problem, because it causes more calls to Equals(object obj) thus increasing the amount of boxing conversions.
Also note that the hash code of Point is computed by x ^ y. This produces very little dispersion in your data range, and therefore the buckets of the HashSet are overpopulated — something that doesn't happen with string, where the dispersion of the hashes is much larger.
You can solve that problem by implementing your own Point struct (trivial) and using a better hash algorithm for your expected data range, e.g. by shifting the coordinates:
(x << 16) ^ y
For some good advice when it comes to hash codes, read Eric Lippert's blog post on the subject.
Related
I have a System.Windows.Shapes.Polygon object, whose layout is determined completely by a series of points. I need to determine if this polygon is self-intersecting, i.e., if any of the sides of the polygon intersect any of the other sides at a point which is not a vertex.
Is there an easy/fast way to compute this?
Easy, slow, low memory footprint: compare each segment against all others and check for intersections. Complexity O(n2).
Slightly faster, medium memory footprint (modified version of above): store edges in spatial "buckets", then perform above algorithm on per-bucket basis. Complexity O(n2 / m) for m buckets (assuming uniform distribution).
Fast & high memory footprint: use a spatial hash function to split edges into buckets. Check for collisions. Complexity O(n).
Fast & low memory footprint: use a sweep-line algorithm, such as the one described here (or here). Complexity O(n log n)
The last is my favorite as it has good speed - memory balance, especially the Bentley-Ottmann algorithm. Implementation isn't too complicated either.
Check if any pair of non-contiguous line segments intersects.
For the sake of completeness i add another algorithm to this discussion.
Assuming the reader knows about axis aligned bounding boxes(Google it if not) It can be very efficient to quickly find pairs of edges that have theirs AABB's clashing using the "Sweep and Prune Algorithm". (google it). Intersection routines are then called on these pairs.
The advantage here is that you may even intersect a non straight edge(circles and splines) and the approach is more general albeit almost similarly efficient.
I am a new bee here and this question is old enough. However, here is my Java code for determining if any pair of sides of a polygon defined by an ordered set of points crossover. You can remove the print statements used for debugging. I have also not included the code for returning the first point of crossover found. I am using the Line2D class from the standard java library.
/**
* Checks if any two sides of a polygon cross-over.
* If so, returns that Point.
*
* The polygon is determined by the ordered sequence
* of Points P
*
* If not returns null
*
* #param V vertices of the Polygon
*
* #return
*/
public static Point verify(Point[] V)
{
if (V == null)
{
return null;
}
int len = V.length;
/*
* No cross-over if len < 4
*/
if (len < 4)
{
return null;
}
System.out.printf("\nChecking %d Sided Polygon\n\n", len);
for (int i = 0; i < len-1; i++)
{
for (int j = i+2; j < len; j++)
{
/*
* Eliminate combinations already checked
* or not valid
*/
if ((i == 0) && ( j == (len-1)))
{
continue;
}
System.out.printf("\nChecking if Side %3d cuts Side %3d: ", i, j);
boolean cut = Line2D.linesIntersect(
V[i].X,
V[i].Y,
V[i+1].X,
V[i+1].Y,
V[j].X,
V[j].Y,
V[(j+1) % len].X,
V[(j+1) % len].Y);
if (cut)
{
System.out.printf("\nSide %3d CUTS Side %3d. Returning\n", i, j);
return ( ... some point or the point of intersection....)
}
else
{
System.out.printf("NO\n");
}
}
}
return null;
}
I have a method that compares two byte arrays for equality, with the major caveat that it does not fail and exit early when it detects inequality. Basically the code is used to compare Cross-Site Request Forgery tokens, and avoid (as much as possible) the ability to use timing to hack the key. I wish I could find the link to the paper that discusses the attack in detail, but the important thing is that the code I have still has a statistically measurable bias for returning sooner if the two byte arrays are equal--although it is an order of magnitude better. So without further ado, here is the code:
public static bool SecureEquals(byte[] original, byte[] potential)
{
// They should be the same size, but we don't want to throw an
// exception if we are wrong.
bool isEqual = original.Length == potential.Length;
int maxLenth = Math.Max(original.Length, potential.Length);
for(int i=0; i < maxLength; i++)
{
byte originalByte = (i < original.Length) ? original[i] : (byte)0;
byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
isEqual = isEqual && (originalByte == potentialByte);
}
return isEqual;
}
The difference in average timing between equal and unequal tokens is consistently 10-25ms (depending on garbage collection cycles) shorter for unequal tokens. This is precisely what I want to avoid. If the relative timing were equal, or the average timing swapped based on the run I would be happy. The problem is that we are consistently running shorter for unequal tokens. In contrast, if we stopped the loop on the first unequal token, we could have up to an 80x difference in timing.
While this equality check is a major improvement over the typical eager return, it is still not good enough. Essentially, I don't want any consistent result for equality or inequality returning faster. If I could get the results into the range where garbage collection cycles will mask any consistent bias, I will be happy.
Anyone have a clue what is causing the timing bias toward inequality being faster? At first I thought it was the ternary operator returning an access to the array or a constant if the arrays were of unequal size. The problem is that I still get this bias if the two arrays are the same size.
NOTE: As requested, the links to the articals on Timing Attacks:
http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf (official paper, linked to from the blog post below)
http://codahale.com/a-lesson-in-timing-attacks/ (talks about the failure in Java's library)
http://en.wikipedia.org/wiki/Timing_attack (more general, but not as complete)
This line could be causing a problem:
isEqual = isEqual && (originalByte == potentialByte);
That won't bother evaluating the originalByte == potentialByte subexpression if isEquals is already false. You don't want the shortcircuiting here, so change it to:
isEqual = isEqual & (originalByte == potentialByte);
EDIT: Note that you're already effectively leaking the information about the size of the original data - because it will always run in a constant time until the potential array exceeds the original array in size, at which point the time will increase. It's probably quite tricky to avoid this... so I would go for the "throw an exception if they're not the right size" approach which explicitly acknowledges it, effectively.
EDIT: Just to go over the idea I included in comments:
// Let's assume you'll never really need more than this
private static readonly byte[] LargeJunkArray = new byte[1024 * 32];
public static bool SecureEquals(byte[] original, byte[] potential)
{
// Reveal that the original array will never be more than 32K.
// This is unlikely to be particularly useful.
if (potential.Length > LargeJunkArray.Length)
{
return false;
}
byte[] copy = new byte[potential.Length];
int bytesFromOriginal = Math.Min(original.Length, copy.Length);
// Always copy the same amount of data
Array.Copy(original, 0, copy, 0, bytesFromOriginal);
Array.Copy(LargeJunkArray, 0, copy, bytesFromOriginal,
copy.Length - bytesFromOriginal);
bool isEqual = original.Length == potential.Length;
for(int i=0; i < copy.Length; i++)
{
isEqual = isEqual & (copy[i] == potential[i]);
}
return isEqual;
}
Note that this assumes that Array.Copy will take the same amount of time to copy the same amount of data from any source - that may well not be true, based on CPU caches...
In the event that .NET should be smart enough to optimize this out – have you tried introducing a counter variable that counts the number of unequal characters, and returns true if and only if that number is zero?
public static bool SecureEquals(byte[] original, byte[] potential)
{
// They should be the same size, but we don't want to throw an
// exception if we are wrong.
int maxLength = Math.Max(original.Length, potential.Length);
int equals = maxLength - Math.Min(original.Length, potential.Length);
for(int i=0; i < maxLength; i++)
{
byte originalByte = (i < original.Length) ? original[i] : (byte)0;
byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
equals += originalByte != potentialByte ? 1 : 0;
}
return equals == 0;
}
I have no idea what is causing the difference -- a profiler seems like it would be a good tool to have here. But I'd consider going with a different approach altogether.
What I'd do in this situation is build a timer into the method so that it can measure its own timing when given two equal keys. (Use the Stopwatch class.) It should compute the mean and standard deviation of the success timing and stash that away in some global state. When you get an unequal key, you can then measure how much time it took you to detect the unequal key, and then make up the difference by spinning a tight loop until the appropriate amount of time has elapsed. You can choose a random time to spin consistent with a normal distribution based on the mean and standard deviation you've already computed.
The nice thing about this approach is that now you have a general mechanism that you can re-use when defeating other timing attacks, and the meaning of the code is obvious; no one is going to come along and try to optimize it away.
And like any other "Security" code, get it reviewed by a security professional before you ship it.
I have an idea of how I can improve the performance with dynamic code generation, but I'm not sure which is the best way to approach this problem.
Suppose I have a class
class Calculator
{
int Value1;
int Value2;
//..........
int ValueN;
void DoCalc()
{
if (Value1 > 0)
{
DoValue1RelatedStuff();
}
if (Value2 > 0)
{
DoValue2RelatedStuff();
}
//....
//....
//....
if (ValueN > 0)
{
DoValueNRelatedStuff();
}
}
}
The DoCalc method is at the lowest level and it is called many times during calculation. Another important aspect is that ValueN are only set at the beginning and do not change during calculation. So many of the ifs in the DoCalc method are unnecessary, as many of ValueN are 0. So I was hoping that dynamic code generation could help to improve performance.
For instance if I create a method
void DoCalc_Specific()
{
const Value1 = 0;
const Value2 = 0;
const ValueN = 1;
if (Value1 > 0)
{
DoValue1RelatedStuff();
}
if (Value2 > 0)
{
DoValue2RelatedStuff();
}
....
....
....
if (ValueN > 0)
{
DoValueNRelatedStuff();
}
}
and compile it with optimizations switched on the C# compiler is smart enough to only keep the necessary stuff. So I would like to create such method at run time based on the values of ValueN and use the generated method during calculations.
I guess that I could use expression trees for that, but expression trees works only with simple lambda functions, so I cannot use things like if, while etc. inside the function body. So in this case I need to change this method in an appropriate way.
Another possibility is to create the necessary code as a string and compile it dynamically. But it would be much better for me if I could take the existing method and modify it accordingly.
There's also Reflection.Emit, but I don't want to stick with it as it would be very difficult to maintain.
BTW. I'm not restricted to C#. So I'm open to suggestions of programming languages that are best suited for this kind of problem. Except for LISP for a couple of reasons.
One important clarification. DoValue1RelatedStuff() is not a method call in my algorithm. It's just some formula-based calculation and it's pretty fast. I should have written it like this
if (Value1 > 0)
{
// Do Value1 Related Stuff
}
I have run some performance tests and I can see that with two ifs when one is disabled the optimized method is about 2 times faster than with the redundant if.
Here's the code I used for testing:
public class Program
{
static void Main(string[] args)
{
int x = 0, y = 2;
var if_st = DateTime.Now.Ticks;
for (var i = 0; i < 10000000; i++)
{
WithIf(x, y);
}
var if_et = DateTime.Now.Ticks - if_st;
Console.WriteLine(if_et.ToString());
var noif_st = DateTime.Now.Ticks;
for (var i = 0; i < 10000000; i++)
{
Without(x, y);
}
var noif_et = DateTime.Now.Ticks - noif_st;
Console.WriteLine(noif_et.ToString());
Console.ReadLine();
}
static double WithIf(int x, int y)
{
var result = 0.0;
for (var i = 0; i < 100; i++)
{
if (x > 0)
{
result += x * 0.01;
}
if (y > 0)
{
result += y * 0.01;
}
}
return result;
}
static double Without(int x, int y)
{
var result = 0.0;
for (var i = 0; i < 100; i++)
{
result += y * 0.01;
}
return result;
}
}
I would usually not even think about such an optimization. How much work does DoValueXRelatedStuff() do? More than 10 to 50 processor cycles? Yes? That means you are going to build quite a complex system to save less then 10% execution time (and this seems quite optimistic to me). This can easily go down to less then 1%.
Is there no room for other optimizations? Better algorithms? An do you really need to eliminate single branches taking only a single processor cycle (if the branch prediction is correct)? Yes? Shouldn't you think about writing your code in assembler or something else more machine specific instead of using .NET?
Could you give the order of N, the complexity of a typical method, and the ratio of expressions usually evaluating to true?
It would surprise me to find a scenario where the overhead of evaluating the if statements is worth the effort to dynamically emit code.
Modern CPU's support branch prediction and branch predication, which makes the overhead for branches in small segments of code approach zero.
Have you tried to benchmark two hand-coded versions of the code, one that has all the if-statements in place but provides zero values for most, and one that removes all of those same if branches?
If you are really into code optimisation - before you do anything - run the profiler! It will show you where the bottleneck is and which areas are worth optimising.
Also - if the language choice is not limited (except for LISP) then nothing will beat assembler in terms of performance ;)
I remember achieving some performance magic by rewriting some inner functions (like the one you have) using assembler.
Before you do anything, do you actually have a problem?
i.e. does it run long enough to bother you?
If so, find out what is actually taking time, not what you guess. This is the quick, dirty, and highly effective method I use to see where time goes.
Now, you are talking about interpreting versus compiling. Interpreted code is typically 1-2 orders of magnitude slower than compiled code. The reason is that interpreters are continually figuring out what to do next, and then forgetting, while compiled code just knows.
If you are in this situation, then it may make sense to pay the price of translating so as to get the speed of compiled code.
I have some image processing code that loops through 2 multi-dimensional byte arrays (of the same size). It takes a value from the source array, performs a calculation on it and then stores the result in another array.
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
The loop currently takes ~11ms, which I assume is mostly due to accessing the byte arrays values as the calculation is pretty simple (2 multiplications and 1 addition).
Is there anything I can do to speed this up? It is a time critical part of my program and this code gets called 80-100 times per second, so any speed gains, however small will make a difference. Also at the moment xSize = 768 and ySize = 576, but this will increase in the future.
Update: Thanks to Guffa (see answer below), the following code saves me 4-5ms per loop. Although it is unsafe code.
int size = ResultImageData.Length;
int counter = 0;
unsafe
{
fixed (byte* r = ResultImageData, c = CurrentImageData, a = AlphaImageData)
{
while (size > 0)
{
*(r + counter) = (byte)(*(c + counter) * AlphaValue +
*(a + counter) * OneMinusAlphaValue);
counter++;
size--;
}
}
}
To get any real speadup for this code you would need to use pointers to access the arrays, that removes all the index calculations and bounds checking.
int size = ResultImageData.Length;
unsafe
{
fixed(byte* rp = ResultImageData, cp = CurrentImageData, ap = AlphaImageData)
{
byte* r = rp;
byte* c = cp;
byte* a = ap;
while (size > 0)
{
*r = (byte)(*c * AlphaValue + *a * OneMinusAlphaValue);
r++;
c++;
a++;
size--;
}
}
}
Edit:
Fixed variables can't be changed, so I added code to copy the pointers to new pointers that can be changed.
These are all independent calculations so if you have a multicore CPU you should be able to gain some benefit by parallelizing the calculation. Note that you'd need to keep the threads around and just hand them work to do since the overhead of thread creation would probably make this slower rather than faster if the threads are recreated each time.
The other thing that may work is farming the work off to the graphics processor. Look at this question for some ideas, for example, using Accelerator.
An option would be to use unsafe code: fixing the array in memory and use pointer operations. I doubt the speed increase will be that dramatic though.
One note: how are you timing? If you are using DateTime then be aware that this class has poor resolution. You should add an outer loop and repeat the operation say ten times -- I bet the result is less than 110ms.
for (int outer = 0; outer < 10; ++outer)
{
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
}
Since it appears that each cell in the matrix is calculated entirely independent of the others. You may want to look into having more than one thread handle this. To avoid the cost of creating threads you could have a thread pool.
If the matrix is of sufficient size, it could be a very nice speed gain. On the other hand, if it is too small, it may not help (even hurt). Worth a try though.
An example (pseudo code) could be like this:
void process(int x, int y) {
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
ThreadPool pool(3); // 3 threads big
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++) {
for (int y = 0; y < ySize; y++) {
pool.schedule(x, y); // this will add all tasks to the pool's work queue
}
}
pool.waitTilFinished(); // wait until all scheduled tasks are complete
EDIT: Michael Meadows mentioned in a comment that plinq may be a suitable alternative: http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
I'd recommend running a few empty tests to figure out what your theoretical bounds are. For example, take out the calculation from inside the loop and see how much time is saved. Try replacing the double loop with a single loop that runs the same number of times and see how much time that saves. Then you can be sure you are going down the right path for optimization (the two paths I see are flattening the double loop into a single loop and working with the multiplication [maybe using a lookup table would be faster]).
Just real quick, you can get an optimization by looping in reverse and comparing against 0. Most CPUs have a fast op for comparison to 0.
E.g.
int xSize = ResultImageData.GetLength(0) -1;
int ySize = ResultImageData.GetLength(1) -1; //minor optimization suggested by commenter
for (int x = xSize; x >= 0; --x)
{
for (int y = ySize; y >=0; --y)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
See http://dotnetperls.com/Content/Decrement-Optimization.aspx
You are probably suffering from Boundschecking. Like Jon Skeet states, a jagged array instead of a multidimensional (that is data[][] instead of data[,]) will be faster, strange as that may seem.
The compiler will optimize
for (int i = 0; i < data.Length; i++)
by eliminating the per-element range check. But it's some kind of special case, it won't do the same for Getlength().
For the same reason, caching or hoisting the Length property (putting it in a variable like xSize) also used to be a bad thing though I haven't been able to verify that with Framework 3.5
Try swapping the x and y for loops for a more linear memory access pattern and (thus) less cache misses, like so.
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int y = 0; y < ySize; y++)
{
for (int x = 0; x < xSize; x++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
If you are using LockBits to get at the image buffer, you should loop through y in the outer loop and x in the inner loop as that is how it is stored in memory (by row, not column). I would say that 11ms is pretty darn fast though...
Does the image data have to be stored in a multi-dimensional (rectangular) array? If you use jagged arrays instead, you may well find the JIT has more optimizations available (including removing the bounds checking).
If CurrentImageData and/or AlphaImageData don't change every time you run your code snippet, you could store the product prior to running the code snippet you show and avoid that multiplication in your loops.
Edit: Another thing I just thought of: Sometimes int operations are quicker than byte operations. Offset this with your processor cache utilization (you'll increase the data size considerably and stand a greater risk of a cache miss).
442,368 additions and 884,736 multiplications for the calculation i would think 11ms is actually extremely slow on a modern CPU.
while i don't know much about the specifics of .net i do know high speed calculation is not its strong suit. In the past i've built java apps with similar problems, i've always used C libraries to do the image / audio processing.
coming from a hardware perspective you want to make sure the memory accesses are sequential, that is step through the buffer in the order it exists in memory. you also may need to reorder this such that the compiler takes advantage of available instructions such as SIMD. How to approach this will end up being dependent on your compiler and i can't help on vs.net.
on an embedded DSP i would break out
(AlphaImageData[x, y] * OneMinusAlphaValue) and (CurrentImageData[x, y] * AlphaValue) and use SIMD instructions to calculate buffers, possibly in parallel before performing the addition. perhaps doing small enough chunks to keep the buffers in cache on the cpu.
i believe anything you do will require more direct access to the memory/cpu than .net allows.
You may also want to take a look at the Mono runtime and its Simd extensions. Perhaps some of your calculations can make use of the SSE acceleration as I gather that you basically do vector calculations (I don't know up to which vector size there is acceleration for multiplication but there is for some sizes)
(Blog post announcing Mono.Simd: http://tirania.org/blog/archive/2008/Nov-03.html)
Of course, that wouldn't work on Microsoft .NET but maybe you are interested in some experimentation.
Interestingly, image data is frequently pretty similar, meaning that the calculations are likely very repetitive. Have you explored doing a lookup table for the calculations? So any time 0.8 was multiplied by 128 - value[80,128] which you've precalculated to 102.4, you simply looked that up? You're basically trading memory space for CPU speed, but it could work for you.
Of course, if your image data has too high a resolution (and goes to too significant a digit), this may not be practical.
Note: I'm optimizing because of past experience and due to profiler software's advice. I realize an alternative optimization would be to call GetNeighbors less often, but that is a secondary issue at the moment.
I have a very simple function described below. In general, I call it within a foreach loop. I call that function a lot (about 100,000 times per second). A while back, I coded a variation of this program in Java and was so disgusted by the speed that I ended up replacing several of the for loops which used it with 4 if statements. Loop unrolling seems ugly, but it did make a noticeable difference in application speed. So, I've come up with a few potential optimizations and thought I would ask for opinions on their merit and for suggestions:
Use four if statements and totally ignore the DRY principle. I am confident this will improve performance based on past experience, but it makes me sad. To clarify, the 4 if statements would be pasted anywhere I called getNeighbors() too frequently and would then have the inside of the foreach block pasted within them.
Memoize the results in some mysterious manner.
Add a "neighbors" property to all squares. Generate its contents at initialization.
Use a code generation utility to turn calls to GetNeighbors into if statements as part of compilation.
public static IEnumerable<Square> GetNeighbors(Model m, Square s)
{
int x = s.X;
int y = s.Y;
if (x > 0) yield return m[x - 1, y];
if (y > 0) yield return m[x, y - 1];
if (x < m.Width - 1) yield return m[x + 1, y];
if (y < m.Height - 1) yield return m[x, y + 1];
yield break;
}
//The property of Model used to get elements.
private Square[,] grid;
//...
public Square this[int x, int y]
{
get
{
return grid[x, y];
}
}
Note: 20% of the time spent by the GetNeighbors function is spent on the call to m.get_Item, the other 80% is spent in the method itself.
Brian,
I've run into similar things in my code.
The two things I've found with C# that helped me the most:
First, don't be afraid necessarily of allocations. C# memory allocations are very, very fast, so allocating an array on the fly can often be faster than making an enumerator. However, whether this will help depends a lot on how you're using the results. The only pitfall I see is that, if you return a fixed size array (4), you're going to have to check for edge cases in the routine that's using your results.
Depending on how large your matrix of Squares is in your model, you may be better off doing 1 check up front to see if you're on the edge, and if not, precomputing the full array and returning it. If you're on an edge, you can handle those special cases separately (make a 1 or 2 element array as appropriate). This would put one larger statement in there, but that is often faster in my experience. If the model is large, I would avoid precomputing all of the neighbors. The overhead in the Squares may outweigh the benefits.
In my experience, as well, preallocating and returning vs. using yield makes the JIT more likely to inline your function, which can make a big difference in speed. If you can take advantage of the IEnumerable results and you are not always using every returned element, that is better, but otherwise, precomputing may be faster.
The other thing to consider - I don't know what information is saved in Square in your case, but if hte object is relatively small, and being used in a large matrix and iterated over many, many times, consider making it a struct. I had a routine similar to this (called hundreds of thousands or millions of times in a loop), and changing the class to a struct, in my case, sped up the routine by over 40%. This is assuming you're using .net 3.5sp1, though, as the JIT does many more optimizations on structs in the latest release.
There are other potential pitfalls to switching to struct vs. class, of course, but it can have huge performance impacts.
I'd suggest making an array of Squares (capacity four) and returning that instead. I would be very suspicious about using iterators in a performance-sensitive context. For example:
// could return IEnumerable<Square> still instead if you preferred.
public static Square[] GetNeighbors(Model m, Square s)
{
int x = s.X, y = s.Y, i = 0;
var result = new Square[4];
if (x > 0) result[i++] = m[x - 1, y];
if (y > 0) result[i++] = m[x, y - 1];
if (x < m.Width - 1) result[i++] = m[x + 1, y];
if (y < m.Height - 1) result[i++] = m[x, y + 1];
return result;
}
I wouldn't be surprised if that's much faster.
I'm on a slippery slope, so insert disclaimer here.
I'd go with option 3. Fill in the neighbor references lazily and you've got a kind of memoization.
ANother kind of memoization would be to return an array instead of a lazy IEnumerable, and GetNeighbors becomes a pure function that is trivial to memoize. This amounts roughly to option 3 though.
In any case, but you know this, profile and re-evaluate every step of the way. I am for example unsure about the tradeoff between the lazy IEnumerable or returning an array of results directly. (you avoid some indirections but need an allocation).
Why not make the Square class responsible of returning it's neighbours? Then you have an excellent place to do lazy initialisation without the extra overhead of memoization.
public class Square {
private Model _model;
private int _x;
private int _y;
private Square[] _neightbours;
public Square(Model model, int x, int y) {
_model = model;
_x = x;
_y = y;
_neightbours = null;
}
public Square[] Neighbours {
get {
if (_neightbours == null) {
_neighbours = GetNeighbours();
}
return _neighbours;
}
}
private Square[] GetNeightbours() {
int len = 4;
if (_x == 0) len--;
if (_x == _model.Width - 1) len--;
if (_y == 0) len--;
if (-y == _model.Height -1) len--;
Square [] result = new Square(len);
int i = 0;
if (_x > 0) {
result[i++] = _model[_x - 1,_y];
}
if (_x < _model.Width - 1) {
result[i++] = _model[_x + 1,_y];
}
if (_y > 0) {
result[i++] = _model[_x,_y - 1];
}
if (_y < _model.Height - 1) {
result[i++] = _model[_x,_y + 1];
}
return result;
}
}
Depending on the use of GetNeighbors, maybe some inversion of control could help:
public static void DoOnNeighbors(Model m, Square s, Action<s> action) {
int x = s.X;
int y = s.Y;
if (x > 0) action(m[x - 1, y]);
if (y > 0) action(m[x, y - 1]);
if (x < m.Width - 1) action(m[x + 1, y]);
if (y < m.Height - 1) action(m[x, y + 1]);
}
But I'm not sure, if this has better performance.