I'm not stuck with my code because I don't know how to write it but more because I'm split between two ways of doing and don't know which path to go.
I will first write my problem then post some code.
So I have a switch with 4 cases and then in my function called by the cases, I have two more cases (ifs). I have to rewrite the code used in all those cases but a little different in every one.
If I write all the code, I have less things to check therefor it's more performant but it makes the code less flexible.
private void FillDiagonalStartAndEnd(FastNoise noiseBiome, FastNoise noiseTransition)
{
switch (direction)
{
case DirectionOfBiomeCell.NE:
TransitionDiagonallyNorthEast(noiseBiome, noiseTransition); // Less checks
break;
case DirectionOfBiomeCell.SE:
towardsRight = 1;
TransitionDiagonallyED(noiseBiome, noiseTransition, new Vector2(spacePosX -1, spacePosZ + BIOME_CELL_SIZE), new Vector2(spacePosX + BIOME_CELL_SIZE - 1, spacePosZ)); // More flexible
break;
case DirectionOfBiomeCell.SW:
//TransitionLinearlySouth(noiseBiome, noiseTransition);
break;
case DirectionOfBiomeCell.NW:
//TransitionLinearlyWest(noiseBiome, noiseTransition);
break;
default:
Debug.Log("Default case in FillDiagonallyStartAndEnd");
break;
}
}
Above, I have the NE direction which is more performant I think and then SE direction which is flexible with parameters.
Then in the functions :
private void TransitionDiagonallyED(FastNoise noiseBiome, FastNoise noiseTransition, Vector2 start, Vector2 end)
{
//Variables for flexible function
int heightModificator;
//Start and ending values for diagonal line
int startValue = TerrainGen.GetNoise2D(noiseBiome, (int)start.x, (int)start.y, TerrainGen.min, TerrainGen.max);
int endValue = TerrainGen.GetNoise2D(noiseTransition, (int)end.x, (int)end.y, TerrainGen.min, TerrainGen.max);
//Step values
float endMinusStart = endValue - startValue;
float stepValue = endMinusStart / BIOME_CELL_SIZE;
//1 or 0 for height start of diagonal
heightModificator = (int)start.y > spacePosX ? 1 : 0;
for (int xz = 0; xz < BIOME_CELL_SIZE; xz++)
{
//Making diagonal and adjusting if it starts at the bottom or top of the square
transitionHeights[xz, Mathf.Abs((BIOME_CELL_SIZE - 1) * heightModificator - xz)] = Mathf.RoundToInt((startValue + stepValue * (xz + 1)) * towardsRight + (endValue - xz * stepValue) * towardsLeft);
}
}
I won't post all of the other function since it's very long but above you can see in the loop that I am using multiplication by 0 or 1. This is set in the heightModificator which doesn't exist in the other function with no parameters. This is very handy since it's flexible and set in the beginning of the class.
public int towardsRight = 0;
public int towardsLeft = 0;
Now the other function doesn't need this since it's only usable in the case the direction is NE
private void TransitionDiagonallyNorthEast(FastNoise noiseBiome, FastNoise noiseTransition)
{
float endMinusStart;
float stepValue;
if (position == PositionOfBiomeCell.outward)
{
//For diagonal
endMinusStart = valuesFromTransitionNoiseEND[BIOME_CELL_SIZE - 1] - valuesFromBiomeNoiseSTART[0];
stepValue = endMinusStart / BIOME_CELL_SIZE;
for (int xz = 0; xz < BIOME_CELL_SIZE; xz++)
{
//Filling diagonal
transitionHeights[xz, xz] = Mathf.RoundToInt(valuesFromBiomeNoiseSTART[0] + (xz + 1) * stepValue);
So, above you can see there is no multiplication by heightModificator. The functions aren't finished, so there would be two more loops of this kind in both of the function. I didn't want to write them since I don't know for which one to go. What is considered good practice, being more flexible but less performant or the opposite?
In my case this code will be executed A LOT of times. I would say in my Start() function about 2080 (because a lot of objects use this code). Furthermore, my two loops which loop around 496 times each will be nested in the first one.
Thanks for reading
Best regards.
What is considered good practice, being more flexible but less performant or the opposite?
Until you are sure there is a performance issue, don't look for a way to optimize it. There is a good chance that you're spending time on the wrong thing. For your example, a decent device won't even break a sweat with simple code like this being called 2000 times, even if you do the these calls every second.
On the other hand, having clean code will enable you to work faster and produce less bugs. And it's easier to optimize clean code to make it faster than to clean up unnecessary optimizations.
Related
for(int i = 0; i < gos.Length; i++)
{
float randomspeed = (float)Math.Round (UnityEngine.Random.Range (1.0f, 15.0f));
floats.Add (randomspeed);
_animator [i].SetFloat ("Speed", randomspeed);
}
Now what i get is only round numbers between 1 and 15. I mean i'm not getting numbers like 1.0 or 5.4 or 9.8 or 14.5 is it logical to have speed values like this ? If so how can i make that the random numbers will include also floats ?
Second how can i make sure that there will be no the same numbers ?
gos Length is 15
As noted in the other answer, you aren't getting fractional values, because you call Math.Round(), which has the express purpose of rounding to the nearest whole number (when called the way you do).
As for preventing duplicates, I question the need to ensure against duplicates. First, the number of possible values within the range you're selecting is large enough that the chances of getting duplicates is very small. Second, it appears you are selecting random speeds for some game object, and it seems to me that in that scenario, it's entirely plausible that once in a while you would find a pair of game objects with the same speed.
That said, if you still want to do that, I would advise against the linear searches recommended by the other answers. Game logic should be reasonably efficient, and in this scenario that would mean using a hash set. For example:
HashSet<float> values = new HashSet<float>();
while (values.Count < gos.Length)
{
float randomSpeed = UnityEngine.Random.Range(1.0f, 15.0f);
// The Add() method returns "true" if the value _wasn't_ already in the set
if (values.Add(randomSpeed))
{
_animator[values.Count - 1].SetFloat("Speed, randomSpeed);
}
}
// it's not clear from your question whether you really need the list of
// floats at the end, but if you do, this is a way to convert the hash set
// to a list
floats = values.ToList();
The reason you're not getting any decimals is because you're using Math.Round, this will either raise the float to the next whole number or lower it.
As for if it's logical, it depends.As for your case animation speed is usually done by floats because it can smoothly speed up and down.
Also to answer your question on how to avoid duplicates of the same float.. which in itself is already very unlikely, try doing this instead :
for(int i = 0; i < gos.Length; i++)
{
float randomspeed = 0f;
// Keep repeating this until we find an unique randomspeed.
while(randomspeed == 0f || floats.Contains(randomspeed))
{
// Use this is you want round numbers
//randomspeed = Mathf.Round(Random.Range(1.0f, 15.0f));
randomspeed = Random.Range(1.0f, 15.0f);
}
floats.Add (randomspeed);
_animator [i].SetFloat ("Speed", randomspeed);
}
Your first problem: if you use Math.Round(), you'll never get numbers like 5.4...
Second question: you can check for existance of the number before you add the number:
private float GenerateRandomSpeed()
{ return (float)UnityEngine.Random.Range (1.0f, 15.0f);}
for(int i = 0; i < gos.Length; i++)
{
float randomspeed= GenerateRandomSpeed();
while (floats.any(x=>x==randomspeed))
randomspeed=GenerateRandomSpeed();
floats.Add (randomspeed);
_animator [i].SetFloat ("Speed", randomspeed);
}
I didn't test it but i hope it can direct you to the answer.
I'm having a problem generating the Terras number sequence.
Here is my unsuccessful attempt:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Terras
{
class Program
{
public static int Terras(int n)
{
if (n <= 1)
{
int return_value = 1;
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
else
{
if ((n % 2) == 0)
{
// Even number
int return_value = 1 / 2 * Terras(n - 1);
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
else
{
// Odd number
int return_value = 1 / 2 * (3 * Terras(n - 1) + 1);
Console.WriteLine("Terras generated : " + return_value);
return return_value;
}
}
}
static void Main(string[] args)
{
Console.WriteLine("TERRAS1");
Terras(1); // should generate 1
Console.WriteLine("TERRAS2");
Terras(2); // should generate 2 1 ... instead of 1 and 0
Console.WriteLine("TERRAS5");
Terras(5); // should generate 5,8,4,2,1 not 1 0 0 0 0
Console.Read();
}
}
}
What am I doing wrong?
I know the basics of recursion, but I don’t understand why this doesn’t work.
I observe that the first number of the sequence is actually the number that you pass in, and subsequent numbers are zero.
Change 1 / 2 * Terros(n - 1); to Terros(n - 1)/2;
Also 1 / 2 * (3 * Terros(n - 1) + 1); to (3 * Terros(n - 1) + 1)/2;
1/2 * ... is simply 0 * ... with int math.
[Edit]
Recursion is wrong and formula is mis-guided. Simple iterate
public static void Terros(int n) {
Console.Write("Terros generated :");
int t = n;
Console.Write(" " + t);
while (t > 1) {
int t_previous = t;
if (t_previous%2 == 0) {
t = t_previous/2;
}
else {
t = (3*t_previous+1)/2;
}
Console.Write(", " + t);
}
Console.WriteLine("");
}
The "n is even" should be "t(subscript n-1) is even" - same for "n is odd".
int return_value = 1 / 2 * Terros(n - 1);
int return_value = 1 / 2 * (3 * Terros(n - 1) + 1);
Unfortunately you've hit a common mistake people make with ints.
(int)1 / (int)2 will always be 0.
Since 1/2 is an integer divison it's always 0; in order to correct the math, just
swap the terms: not 1/2*n but n/2; instead of 1/2* (3 * n + 1) put (3 * n + 1) / 2.
Another issue: do not put computation (Terros) and output (Console.WriteLine) in the
same function
public static String TerrosSequence(int n) {
StringBuilder Sb = new StringBuilder();
// Again: dynamic programming is far better here than recursion
while (n > 1) {
if (Sb.Length > 0)
Sb.Append(",");
Sb.Append(n);
n = (n % 2 == 0) ? n / 2 : (3 * n + 1) / 2;
}
if (Sb.Length > 0)
Sb.Append(",");
Sb.Append(n);
return Sb.ToString();
}
// Output: "Terros generated : 5,8,4,2,1"
Console.WriteLine("Terros generated : " + TerrosSequence(5));
The existing answers guide you in the correct direction, but there is no ultimate one. I thought that summing up and adding detail would help you and future visitors.
The problem name
The original name of this question was “Conjuncture of Terros”. First, it is conjecture, second, the modification to the original Collatz sequence you used comes from Riho Terras* (not Terros!) who proved the Terras Theorem saying that for almost all t₀ holds that ∃n ∈ ℕ: tₙ < t₀. You can read more about it on MathWorld and chux’s question on Math.SE.
* While searching for who is that R. Terras mentioned on MathWorld, I found not only the record on Geni.com, but also probable author of that record, his niece Astrid Terras, and her family’s genealogy. Just for the really curious ones. ☺
The formula
You got the formula wrong in your question. As the table of sequences for different t₀ shows, you should be testing for parity of tₙ₋₁ instead of n.
Formula taken from MathWorld.
Also the second table column heading is wrong, it should read t₀, t₁, t₂, … as t₀ is listed too.
You repeat the mistake with testing n instead of tₙ₋₁ in your code, too. If output of your program is precisely specified (e.g. when checked by an automatic judge), think once more whether you should output t₀ or not.
Integer vs float arithmetic
When making an operation with two integers, you get an integer. If a float is involved, the result is float. In both branches of your condition, you compute an expression of this form:
1 / 2 * …
1 and 2 are integers, therefore the division is integer division. Integer division always rounds down, so the expression is in fact
0 * …
which is (almost*) always zero. Mystery solved. But how to fix it?
Instead of multiplying by one half, you can divide by two. In even branch, division by 2 gives no remainder. In odd branch, tₙ₋₁ is odd, so 3 · tₙ₋₁ is odd too. Odd plus 1 is even, so division by two always produces remainder equal to zero in both branches. Integer division is enough, the result is precise.
Also, you could use float division, just replace 1 with 1.0. But this will probably not give correct results. You see, all members of the sequence are integers and you’re getting float results! So rounding with Math.Round() and casting to integer? Nah… If you can, always evade using floats. There are very few use cases for them, I think, most having something to do with graphics or numerical algorithms. Most of the time you don’t really need them and they just introduce round-off errors.
* Zero times whatever could produce NaN too, but let’s ignore the possibility of “whatever” being from special float values. I’m just pedantic.
Recursive solution
Apart from the problems mentioned above, your whole recursive approach is flawed. Obviously you intended Terras(n) to be tₙ. That’s not utterly bad. But then you forgot that you supply t₀ and search for n instead of the other way round.
To fix your approach, you would need to set up a “global” variable int t0 that would be set to given t₀ and returned from Terras(0). Then Terras(n) would really return tₙ. But you wouldn’t still know the value of n when the sequence stops. You could only repeat for bigger and bigger n, ruining time complexity.
Wait. What about caching the results of intermediate Terras() calls in an ArrayList<int> t? t[i] will contain result for Terras(i) or zero if not initialized. At the top of Terras() you would add if (n < t.Count() && t[n] != 0) return t[n]; for returning the value immediately if cached and not repeating the computation. Otherwise the computation is really made and just before returning, the result is cached:
if (n < t.Count()) {
t[n] = return_value;
} else {
for (int i = t.Count(); i < n; i++) {
t.Add(0);
}
t.Add(return_value);
}
Still not good enough. Time complexity saved, but having the ArrayList increases space complexity. Try tracing (preferably manually, pencil & paper) the computation for t0 = 3; t.Add(t0);. You don’t know the final n beforehand, so you must go from 1 up, till Terras(n) returns 1.
Noticed anything? First, each time you increment n and make a new Terras() call, you add the computed value at the end of cache (t). Second, you’re always looking just one item back. You’re computing the whole sequence from the bottom up and you don’t need that big stupid ArrayList but always just its last item!
Iterative solution
OK, let’s forget that complicated recursive solution trying to follow the top-down definition and move to the bottom-up approach that popped up from gradual improvement of the original solution. Recursion is not needed anymore, it just clutters the whole thing and slows it down.
End of sequence is still found by incrementing n and computing tₙ, halting when tₙ = 1. Variable t stores tₙ, t_previous stores previous tₙ (now tₙ₋₁). The rest should be obvious.
public static void Terras(int t) {
Console.Write("Terras generated:");
Console.Write(" " + t);
while (t > 1) {
int t_previous = t;
if (t_previous % 2 == 0) {
t = t_previous / 2;
} else {
t = (3 * t_previous + 1) / 2;
}
Console.Write(", " + t);
}
Console.WriteLine("");
}
Variable names taken from chux’s answer, just for the sake of comparability.
This can be deemed a primitive instance of dynamic-programming technique. The evolution of this solution is common to the whole class of such problems. Slow recursion, call result caching, dynamic “bottom-up” approach. When you are more experienced with dynamic programming, you’ll start seeing it directly even in more complicated problems, not even thinking about recursion.
I have an idea of how I can improve the performance with dynamic code generation, but I'm not sure which is the best way to approach this problem.
Suppose I have a class
class Calculator
{
int Value1;
int Value2;
//..........
int ValueN;
void DoCalc()
{
if (Value1 > 0)
{
DoValue1RelatedStuff();
}
if (Value2 > 0)
{
DoValue2RelatedStuff();
}
//....
//....
//....
if (ValueN > 0)
{
DoValueNRelatedStuff();
}
}
}
The DoCalc method is at the lowest level and it is called many times during calculation. Another important aspect is that ValueN are only set at the beginning and do not change during calculation. So many of the ifs in the DoCalc method are unnecessary, as many of ValueN are 0. So I was hoping that dynamic code generation could help to improve performance.
For instance if I create a method
void DoCalc_Specific()
{
const Value1 = 0;
const Value2 = 0;
const ValueN = 1;
if (Value1 > 0)
{
DoValue1RelatedStuff();
}
if (Value2 > 0)
{
DoValue2RelatedStuff();
}
....
....
....
if (ValueN > 0)
{
DoValueNRelatedStuff();
}
}
and compile it with optimizations switched on the C# compiler is smart enough to only keep the necessary stuff. So I would like to create such method at run time based on the values of ValueN and use the generated method during calculations.
I guess that I could use expression trees for that, but expression trees works only with simple lambda functions, so I cannot use things like if, while etc. inside the function body. So in this case I need to change this method in an appropriate way.
Another possibility is to create the necessary code as a string and compile it dynamically. But it would be much better for me if I could take the existing method and modify it accordingly.
There's also Reflection.Emit, but I don't want to stick with it as it would be very difficult to maintain.
BTW. I'm not restricted to C#. So I'm open to suggestions of programming languages that are best suited for this kind of problem. Except for LISP for a couple of reasons.
One important clarification. DoValue1RelatedStuff() is not a method call in my algorithm. It's just some formula-based calculation and it's pretty fast. I should have written it like this
if (Value1 > 0)
{
// Do Value1 Related Stuff
}
I have run some performance tests and I can see that with two ifs when one is disabled the optimized method is about 2 times faster than with the redundant if.
Here's the code I used for testing:
public class Program
{
static void Main(string[] args)
{
int x = 0, y = 2;
var if_st = DateTime.Now.Ticks;
for (var i = 0; i < 10000000; i++)
{
WithIf(x, y);
}
var if_et = DateTime.Now.Ticks - if_st;
Console.WriteLine(if_et.ToString());
var noif_st = DateTime.Now.Ticks;
for (var i = 0; i < 10000000; i++)
{
Without(x, y);
}
var noif_et = DateTime.Now.Ticks - noif_st;
Console.WriteLine(noif_et.ToString());
Console.ReadLine();
}
static double WithIf(int x, int y)
{
var result = 0.0;
for (var i = 0; i < 100; i++)
{
if (x > 0)
{
result += x * 0.01;
}
if (y > 0)
{
result += y * 0.01;
}
}
return result;
}
static double Without(int x, int y)
{
var result = 0.0;
for (var i = 0; i < 100; i++)
{
result += y * 0.01;
}
return result;
}
}
I would usually not even think about such an optimization. How much work does DoValueXRelatedStuff() do? More than 10 to 50 processor cycles? Yes? That means you are going to build quite a complex system to save less then 10% execution time (and this seems quite optimistic to me). This can easily go down to less then 1%.
Is there no room for other optimizations? Better algorithms? An do you really need to eliminate single branches taking only a single processor cycle (if the branch prediction is correct)? Yes? Shouldn't you think about writing your code in assembler or something else more machine specific instead of using .NET?
Could you give the order of N, the complexity of a typical method, and the ratio of expressions usually evaluating to true?
It would surprise me to find a scenario where the overhead of evaluating the if statements is worth the effort to dynamically emit code.
Modern CPU's support branch prediction and branch predication, which makes the overhead for branches in small segments of code approach zero.
Have you tried to benchmark two hand-coded versions of the code, one that has all the if-statements in place but provides zero values for most, and one that removes all of those same if branches?
If you are really into code optimisation - before you do anything - run the profiler! It will show you where the bottleneck is and which areas are worth optimising.
Also - if the language choice is not limited (except for LISP) then nothing will beat assembler in terms of performance ;)
I remember achieving some performance magic by rewriting some inner functions (like the one you have) using assembler.
Before you do anything, do you actually have a problem?
i.e. does it run long enough to bother you?
If so, find out what is actually taking time, not what you guess. This is the quick, dirty, and highly effective method I use to see where time goes.
Now, you are talking about interpreting versus compiling. Interpreted code is typically 1-2 orders of magnitude slower than compiled code. The reason is that interpreters are continually figuring out what to do next, and then forgetting, while compiled code just knows.
If you are in this situation, then it may make sense to pay the price of translating so as to get the speed of compiled code.
I have some image processing code that loops through 2 multi-dimensional byte arrays (of the same size). It takes a value from the source array, performs a calculation on it and then stores the result in another array.
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
The loop currently takes ~11ms, which I assume is mostly due to accessing the byte arrays values as the calculation is pretty simple (2 multiplications and 1 addition).
Is there anything I can do to speed this up? It is a time critical part of my program and this code gets called 80-100 times per second, so any speed gains, however small will make a difference. Also at the moment xSize = 768 and ySize = 576, but this will increase in the future.
Update: Thanks to Guffa (see answer below), the following code saves me 4-5ms per loop. Although it is unsafe code.
int size = ResultImageData.Length;
int counter = 0;
unsafe
{
fixed (byte* r = ResultImageData, c = CurrentImageData, a = AlphaImageData)
{
while (size > 0)
{
*(r + counter) = (byte)(*(c + counter) * AlphaValue +
*(a + counter) * OneMinusAlphaValue);
counter++;
size--;
}
}
}
To get any real speadup for this code you would need to use pointers to access the arrays, that removes all the index calculations and bounds checking.
int size = ResultImageData.Length;
unsafe
{
fixed(byte* rp = ResultImageData, cp = CurrentImageData, ap = AlphaImageData)
{
byte* r = rp;
byte* c = cp;
byte* a = ap;
while (size > 0)
{
*r = (byte)(*c * AlphaValue + *a * OneMinusAlphaValue);
r++;
c++;
a++;
size--;
}
}
}
Edit:
Fixed variables can't be changed, so I added code to copy the pointers to new pointers that can be changed.
These are all independent calculations so if you have a multicore CPU you should be able to gain some benefit by parallelizing the calculation. Note that you'd need to keep the threads around and just hand them work to do since the overhead of thread creation would probably make this slower rather than faster if the threads are recreated each time.
The other thing that may work is farming the work off to the graphics processor. Look at this question for some ideas, for example, using Accelerator.
An option would be to use unsafe code: fixing the array in memory and use pointer operations. I doubt the speed increase will be that dramatic though.
One note: how are you timing? If you are using DateTime then be aware that this class has poor resolution. You should add an outer loop and repeat the operation say ten times -- I bet the result is less than 110ms.
for (int outer = 0; outer < 10; ++outer)
{
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
}
Since it appears that each cell in the matrix is calculated entirely independent of the others. You may want to look into having more than one thread handle this. To avoid the cost of creating threads you could have a thread pool.
If the matrix is of sufficient size, it could be a very nice speed gain. On the other hand, if it is too small, it may not help (even hurt). Worth a try though.
An example (pseudo code) could be like this:
void process(int x, int y) {
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
ThreadPool pool(3); // 3 threads big
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++) {
for (int y = 0; y < ySize; y++) {
pool.schedule(x, y); // this will add all tasks to the pool's work queue
}
}
pool.waitTilFinished(); // wait until all scheduled tasks are complete
EDIT: Michael Meadows mentioned in a comment that plinq may be a suitable alternative: http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
I'd recommend running a few empty tests to figure out what your theoretical bounds are. For example, take out the calculation from inside the loop and see how much time is saved. Try replacing the double loop with a single loop that runs the same number of times and see how much time that saves. Then you can be sure you are going down the right path for optimization (the two paths I see are flattening the double loop into a single loop and working with the multiplication [maybe using a lookup table would be faster]).
Just real quick, you can get an optimization by looping in reverse and comparing against 0. Most CPUs have a fast op for comparison to 0.
E.g.
int xSize = ResultImageData.GetLength(0) -1;
int ySize = ResultImageData.GetLength(1) -1; //minor optimization suggested by commenter
for (int x = xSize; x >= 0; --x)
{
for (int y = ySize; y >=0; --y)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
See http://dotnetperls.com/Content/Decrement-Optimization.aspx
You are probably suffering from Boundschecking. Like Jon Skeet states, a jagged array instead of a multidimensional (that is data[][] instead of data[,]) will be faster, strange as that may seem.
The compiler will optimize
for (int i = 0; i < data.Length; i++)
by eliminating the per-element range check. But it's some kind of special case, it won't do the same for Getlength().
For the same reason, caching or hoisting the Length property (putting it in a variable like xSize) also used to be a bad thing though I haven't been able to verify that with Framework 3.5
Try swapping the x and y for loops for a more linear memory access pattern and (thus) less cache misses, like so.
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int y = 0; y < ySize; y++)
{
for (int x = 0; x < xSize; x++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
If you are using LockBits to get at the image buffer, you should loop through y in the outer loop and x in the inner loop as that is how it is stored in memory (by row, not column). I would say that 11ms is pretty darn fast though...
Does the image data have to be stored in a multi-dimensional (rectangular) array? If you use jagged arrays instead, you may well find the JIT has more optimizations available (including removing the bounds checking).
If CurrentImageData and/or AlphaImageData don't change every time you run your code snippet, you could store the product prior to running the code snippet you show and avoid that multiplication in your loops.
Edit: Another thing I just thought of: Sometimes int operations are quicker than byte operations. Offset this with your processor cache utilization (you'll increase the data size considerably and stand a greater risk of a cache miss).
442,368 additions and 884,736 multiplications for the calculation i would think 11ms is actually extremely slow on a modern CPU.
while i don't know much about the specifics of .net i do know high speed calculation is not its strong suit. In the past i've built java apps with similar problems, i've always used C libraries to do the image / audio processing.
coming from a hardware perspective you want to make sure the memory accesses are sequential, that is step through the buffer in the order it exists in memory. you also may need to reorder this such that the compiler takes advantage of available instructions such as SIMD. How to approach this will end up being dependent on your compiler and i can't help on vs.net.
on an embedded DSP i would break out
(AlphaImageData[x, y] * OneMinusAlphaValue) and (CurrentImageData[x, y] * AlphaValue) and use SIMD instructions to calculate buffers, possibly in parallel before performing the addition. perhaps doing small enough chunks to keep the buffers in cache on the cpu.
i believe anything you do will require more direct access to the memory/cpu than .net allows.
You may also want to take a look at the Mono runtime and its Simd extensions. Perhaps some of your calculations can make use of the SSE acceleration as I gather that you basically do vector calculations (I don't know up to which vector size there is acceleration for multiplication but there is for some sizes)
(Blog post announcing Mono.Simd: http://tirania.org/blog/archive/2008/Nov-03.html)
Of course, that wouldn't work on Microsoft .NET but maybe you are interested in some experimentation.
Interestingly, image data is frequently pretty similar, meaning that the calculations are likely very repetitive. Have you explored doing a lookup table for the calculations? So any time 0.8 was multiplied by 128 - value[80,128] which you've precalculated to 102.4, you simply looked that up? You're basically trading memory space for CPU speed, but it could work for you.
Of course, if your image data has too high a resolution (and goes to too significant a digit), this may not be practical.
Note: I'm optimizing because of past experience and due to profiler software's advice. I realize an alternative optimization would be to call GetNeighbors less often, but that is a secondary issue at the moment.
I have a very simple function described below. In general, I call it within a foreach loop. I call that function a lot (about 100,000 times per second). A while back, I coded a variation of this program in Java and was so disgusted by the speed that I ended up replacing several of the for loops which used it with 4 if statements. Loop unrolling seems ugly, but it did make a noticeable difference in application speed. So, I've come up with a few potential optimizations and thought I would ask for opinions on their merit and for suggestions:
Use four if statements and totally ignore the DRY principle. I am confident this will improve performance based on past experience, but it makes me sad. To clarify, the 4 if statements would be pasted anywhere I called getNeighbors() too frequently and would then have the inside of the foreach block pasted within them.
Memoize the results in some mysterious manner.
Add a "neighbors" property to all squares. Generate its contents at initialization.
Use a code generation utility to turn calls to GetNeighbors into if statements as part of compilation.
public static IEnumerable<Square> GetNeighbors(Model m, Square s)
{
int x = s.X;
int y = s.Y;
if (x > 0) yield return m[x - 1, y];
if (y > 0) yield return m[x, y - 1];
if (x < m.Width - 1) yield return m[x + 1, y];
if (y < m.Height - 1) yield return m[x, y + 1];
yield break;
}
//The property of Model used to get elements.
private Square[,] grid;
//...
public Square this[int x, int y]
{
get
{
return grid[x, y];
}
}
Note: 20% of the time spent by the GetNeighbors function is spent on the call to m.get_Item, the other 80% is spent in the method itself.
Brian,
I've run into similar things in my code.
The two things I've found with C# that helped me the most:
First, don't be afraid necessarily of allocations. C# memory allocations are very, very fast, so allocating an array on the fly can often be faster than making an enumerator. However, whether this will help depends a lot on how you're using the results. The only pitfall I see is that, if you return a fixed size array (4), you're going to have to check for edge cases in the routine that's using your results.
Depending on how large your matrix of Squares is in your model, you may be better off doing 1 check up front to see if you're on the edge, and if not, precomputing the full array and returning it. If you're on an edge, you can handle those special cases separately (make a 1 or 2 element array as appropriate). This would put one larger statement in there, but that is often faster in my experience. If the model is large, I would avoid precomputing all of the neighbors. The overhead in the Squares may outweigh the benefits.
In my experience, as well, preallocating and returning vs. using yield makes the JIT more likely to inline your function, which can make a big difference in speed. If you can take advantage of the IEnumerable results and you are not always using every returned element, that is better, but otherwise, precomputing may be faster.
The other thing to consider - I don't know what information is saved in Square in your case, but if hte object is relatively small, and being used in a large matrix and iterated over many, many times, consider making it a struct. I had a routine similar to this (called hundreds of thousands or millions of times in a loop), and changing the class to a struct, in my case, sped up the routine by over 40%. This is assuming you're using .net 3.5sp1, though, as the JIT does many more optimizations on structs in the latest release.
There are other potential pitfalls to switching to struct vs. class, of course, but it can have huge performance impacts.
I'd suggest making an array of Squares (capacity four) and returning that instead. I would be very suspicious about using iterators in a performance-sensitive context. For example:
// could return IEnumerable<Square> still instead if you preferred.
public static Square[] GetNeighbors(Model m, Square s)
{
int x = s.X, y = s.Y, i = 0;
var result = new Square[4];
if (x > 0) result[i++] = m[x - 1, y];
if (y > 0) result[i++] = m[x, y - 1];
if (x < m.Width - 1) result[i++] = m[x + 1, y];
if (y < m.Height - 1) result[i++] = m[x, y + 1];
return result;
}
I wouldn't be surprised if that's much faster.
I'm on a slippery slope, so insert disclaimer here.
I'd go with option 3. Fill in the neighbor references lazily and you've got a kind of memoization.
ANother kind of memoization would be to return an array instead of a lazy IEnumerable, and GetNeighbors becomes a pure function that is trivial to memoize. This amounts roughly to option 3 though.
In any case, but you know this, profile and re-evaluate every step of the way. I am for example unsure about the tradeoff between the lazy IEnumerable or returning an array of results directly. (you avoid some indirections but need an allocation).
Why not make the Square class responsible of returning it's neighbours? Then you have an excellent place to do lazy initialisation without the extra overhead of memoization.
public class Square {
private Model _model;
private int _x;
private int _y;
private Square[] _neightbours;
public Square(Model model, int x, int y) {
_model = model;
_x = x;
_y = y;
_neightbours = null;
}
public Square[] Neighbours {
get {
if (_neightbours == null) {
_neighbours = GetNeighbours();
}
return _neighbours;
}
}
private Square[] GetNeightbours() {
int len = 4;
if (_x == 0) len--;
if (_x == _model.Width - 1) len--;
if (_y == 0) len--;
if (-y == _model.Height -1) len--;
Square [] result = new Square(len);
int i = 0;
if (_x > 0) {
result[i++] = _model[_x - 1,_y];
}
if (_x < _model.Width - 1) {
result[i++] = _model[_x + 1,_y];
}
if (_y > 0) {
result[i++] = _model[_x,_y - 1];
}
if (_y < _model.Height - 1) {
result[i++] = _model[_x,_y + 1];
}
return result;
}
}
Depending on the use of GetNeighbors, maybe some inversion of control could help:
public static void DoOnNeighbors(Model m, Square s, Action<s> action) {
int x = s.X;
int y = s.Y;
if (x > 0) action(m[x - 1, y]);
if (y > 0) action(m[x, y - 1]);
if (x < m.Width - 1) action(m[x + 1, y]);
if (y < m.Height - 1) action(m[x, y + 1]);
}
But I'm not sure, if this has better performance.