I found some surprising results while drawing shapes in the .NET Compact Framework.
Method1 and Method2 both draw some rectangles, but Method1 is faster than Method2. Here is the code:
Method1:
int height = Height;
for (int i = 0; i < data.Length; i++)
{
barYPos = Helper.GetPixelValue(Point1, Point2, data[i]);
barRect.X = barXPos;
barRect.Y = barYPos;
barRect.Height = height - barYPos;
//
//rects.Add(barRect);
_gBmp.FillRectangle(_barBrush, barRect);
//
barXPos += (WidthOfBar + DistanceBetweenBars);
}
Method2:
for (int i = 0; i < data.Length; i++)
{
barYPos = Helper.GetPixelValue(Point1, Point2, data[i]);
barRect.X = barXPos;
barRect.Y = barYPos;
barRect.Height = Height - barYPos;
//
//rects.Add(barRect);
_gBmp.FillRectangle(_barBrush, barRect);
//
barXPos += (WidthOfBar + DistanceBetweenBars);
}
The only difference between the two is that in Method1 I store the Height of the control in a local variable.
Can anyone please explain the reason, and give some guidelines for drawing in the .NET Compact Framework?
Method 2 is slower because you're accessing the Height property on each iteration of your for loop. This property may trigger some time-consuming calculations, and putting it in a local variable outside the loop acts as a cache.
A call to a property in C# costs more than directly accessing a variable in memory, because properties are compiled to methods, usually with a backing field behind them (or worse, the getter may query something else).
If your application is indeed single threaded and you can afford to cache it, do so. Avoid properties in tight loops.
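To make the cost visible, here is a minimal sketch that counts how often a property getter runs. FakeControl, HeightReads and PropertyCachingDemo are hypothetical names invented for this illustration; they are not part of the Compact Framework, and a real control's getter may do considerably more work than this one.

```csharp
using System;

// Hypothetical stand-in for a control; only the getter call count matters here.
public class FakeControl
{
    private int _height = 240;

    // Number of times the Height getter has been invoked.
    public int HeightReads { get; private set; }

    public int Height
    {
        get { HeightReads++; return _height; }
    }
}

public static class PropertyCachingDemo
{
    // Reads the property on every iteration, as Method2 does.
    public static int SumUncached(FakeControl c, int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += c.Height - data[i];
        return sum;
    }

    // Reads the property once and reuses the local, as Method1 does.
    public static int SumCached(FakeControl c, int[] data)
    {
        int height = c.Height;
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += height - data[i];
        return sum;
    }
}
```

Both methods return the same sum, but the uncached version invokes the getter once per element while the cached version invokes it exactly once.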
I believe that's because in the second method you access Height data.Length times, whereas in the first method you only read it once.
I have a large list of Meshes, each with a large list of vertices which I iterate over. I iterate over them twice.
The first pass is to convert the vertices to latitude / longitude, and to find the centre point of these vertices, which I use to create a transformation matrix.
The second pass is to convert each point to a different coordinate system, and then multiply the point by the transformation matrix.
Since I will be processing millions of points, I'm attempting to see if it's possible to by-pass this double for-loop.
I've attempted this using delegates and events:
protected event TransformationMatrixCalculated CalculatedTransformationMatrix;
public async Task MyMethod(IList<Mesh> meshes)
{
var firstPass = 0;
var secondPass = 0;
var latLongs = new LatLongClass();
for (int i = 0; i < meshes.Count; i++)
{
var mesh = meshes[i];
var points = mesh.ControlPoints;
for (int j = 0; j < points.Count; j++)
{
firstPass++;
var v = points[j];
var wgs84 = ConvertToLatLong(v);
latLongs.AddPoint(wgs84);
var meshIndexCopy = i;
var vertexIndexCopy = j;
CalculatedTransformationMatrix += delegate
{
// Use calculated TransformMatrix once ready
UpdateMeshVertex(TransformMatrix, meshIndexCopy, vertexIndexCopy);
secondPass++;
};
}
}
// Now we have converted all the points we can calculate the center and the matrix
Center = latLongs.GetCenter();
TransformMatrix = CalcTransformationMatrix(Center);
// Inform all handlers that the matrix has been calculated and ready to use
CalculatedTransformationMatrix();
do
{
await Task.Delay(100);
}
while (firstPass != secondPass);
}
Firstly, with the above I needed to create copies of the index of the mesh/point when creating the delegate to ensure I had the correct indexes.
Secondly, the method needed to be awaitable, otherwise I'm assuming the event handlers would be called after the method had finished, causing issues down the road. So I added two counters & the Task.Delay which you can see above to ensure all the handlers had finished.
Thirdly, at present I have not found a way to use Parallel.ForEach, as the method I use to transform the points in both passes makes use of C++ bindings (the proj library), causing an Access Violation: 'Attempted to read or write protected memory'. This is something I'm looking into resolving, as it may be the best way to iterate over all the points.
The above method works, but is on average 10% slower than just creating two for loops. I'm assuming this is the overhead of delegates / event handlers.
Is there a way for me to set out what I want to achieve? Am I looking at this wrong? Or am I prematurely optimising?
So, been looking at this code for a good while now, and I am lost.
The point is to run a for loop that adds classes to an array, and then for each class runs through an array of points inside of that class, and add variations to it.
This then shows as a bunch of dots on a form, which are supposed to move independently of each other, but now follow each other completely.
It does not matter how much variation there is or anything, it's just 99 dots with the exact same acceleration, velocity, and location, and path.
The code is here, the method isn't touched by any other code, and the problem arises before it returns.
//Point of the method is to put variations of Baby into an array, and return that array
Dot.Class[] MutateAndListBaby(Dot.Class Baby)
{
//Making the empty array
Dot.Class[] BabyList = new Dot.Class[dots.Length];
//For loop that goes through the whole array
for (int i = 1; i < BabyList.Length; i++)
{
//For each iteration the for loop adds the class reference to the index, then puts the standard directions into that reference, and then sets a value preventing it from being changed in other code
BabyList[i] = new Dot.Class();
BabyList[i].Directions = Baby.Directions;
BabyList[i].StartupComplete = true;
//The zero index variation, when made like this, is not overridden, which would lead one to believe that how the directions are copied is the problem
//But it shouldn't be, BabyList[i].Directions = Baby.Directions; should be fire and forget, it should just add the Directions to the array and then leave it
BabyList[0] = new Dot.Class();
BabyList[0].Directions = new PointF[100];
for (int b = 0; b < BabyList[0].Directions.Length; b++)
{
BabyList[0].Directions[b] = new Point (5, 10);
}
BabyList[0].StartupComplete = true;
//The for loop that should add variation, but it seems like it somehow overrides itself
for (int b = 0; b < BabyList[i].Directions.Length; b++)
{
if (rand.Next(0, 101) >= 100)
{
int rando = rand.Next(-50, 51);
float mod = (float)rando / 50;
float x = BabyList[i].Directions[b].X;
x = x + mod;
BabyList[i].Directions[b].X = rand.Next(-5, 6);
}
if (rand.Next(0, 101) >= 100)
{
int rando = rand.Next(-50, 51);
float mod = (float)rando / 50;
float y = BabyList[i].Directions[b].Y;
y = y * mod;
BabyList[i].Directions[b].Y = rand.Next(-5, 6);
}
}
//Now one would assume this would create a unique dot that would move 100% independently, right? Since it's at the end of the for loop, nothing should change it
// Nope, somehow it makes every other dot copy its directions...
if (i == 5)
{
for (int b = 0; b < BabyList[5].Directions.Length; b++)
{
BabyList[5].Directions[b] = new PointF(-5f, -5f);
}
}
}
return BabyList;
}
}
}
With the code there, what I get is the 0 index dot going its own way, while the other 99 dots for some reason follow the 5th index's Directions, even though they should get their own variations later on in the code.
Any help would be much appreciated. It's probably something obvious, but trust me, I've been looking at this thing for quite a while and can't see anything.
If I understand you correctly, this might be the issue:
BabyList[i].Directions = Baby.Directions;
Directions is an array of PointF, and arrays are reference types. The line above does not copy the array. Is that what you assume? If I'm not misreading the code you're presenting, you're creating one Dot.Class with its own array of PointF at index 0 and filling the rest of your Dot.Class array with instances that share one single array.
Directions is an array, which is a reference type. When you assign a variable of this type
BabyList[i].Directions = Baby.Directions;
no new instance is created; the reference is just copied into the new variable, which still refers to the original instance. Essentially, in your loop only the very first item gets a new instance of Directions, as it's explicitly constructed. The rest share the instance that comes in as a member of the parameter passed to the method.
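The difference between sharing and copying the array can be sketched as follows. Vec2, Dot and ArrayCopyDemo are hypothetical stand-ins for PointF and Dot.Class so that the example compiles without System.Drawing; the same reasoning applies to a PointF[] because PointF is also a struct.

```csharp
using System;

// Hypothetical value type standing in for PointF.
public struct Vec2
{
    public float X;
    public float Y;
    public Vec2(float x, float y) { X = x; Y = y; }
}

// Hypothetical stand-in for Dot.Class.
public class Dot
{
    public Vec2[] Directions;
}

public static class ArrayCopyDemo
{
    // Assigning the array copies only the reference: parent and child share one array.
    public static Dot ShareDirections(Dot parent)
    {
        Dot child = new Dot();
        child.Directions = parent.Directions;
        return child;
    }

    // Clone() allocates a new array; since Vec2 is a struct, the elements are copied by value.
    public static Dot CopyDirections(Dot parent)
    {
        Dot child = new Dot();
        child.Directions = (Vec2[])parent.Directions.Clone();
        return child;
    }
}
```

With ShareDirections, a change made through the child's Directions is visible through the parent (and every other dot sharing it). With CopyDirections, each dot can be mutated independently.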
You probably want to change your if conditions from
if (rand.Next(0, 101) >= 100)
to
if (rand.Next(0, 100) < 99)
This will run an average of 99 times out of 100, whereas your current condition runs 1 out of 101 times (on average), because the upper bound of rand.Next is exclusive.
Oh, and Benjamin Podszun's answer about assigning the same array (not a copy of the same array) to Directions applies as well!
(Assuming that Directions isn't a getter that you created to return a copy of an array instead of a reference!)
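The odds quoted above follow from Random.Next(min, max) returning values from min up to but not including max. As a small sketch, the helper below enumerates every value such a call could return and counts how many satisfy a condition; RandomBoundsDemo and CountHits are hypothetical names used only for this illustration.

```csharp
using System;

public static class RandomBoundsDemo
{
    // Counts how many of the possible results of Random.Next(min, maxExclusive)
    // satisfy the given condition.
    public static int CountHits(int min, int maxExclusive, Func<int, bool> condition)
    {
        int hits = 0;
        for (int v = min; v < maxExclusive; v++)
        {
            if (condition(v)) hits++;
        }
        return hits;
    }
}
```

CountHits(0, 101, v => v >= 100) gives 1 out of 101 possible values, while CountHits(0, 100, v => v < 99) gives 99 out of 100.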
I'm quite new to C# and I want to create something like falling snow (dots) using Windows Forms.
I was already able to create the snowflakes at the top of the screen. I want to create a new dot every 0.1 s at a random x-position on the Form and store every snowflake's position in a List<Point>; on every Tick of the timer (0.1 s) I want each snowflake to move 3 px down and 1-3 px to the right.
But I have a problem with updating the snowflakes' positions. I don't know how to access each snowflake in the List to randomize its new position.
I tried foreach, but it gives me an error saying that I cannot change the variable in a foreach.
Example:
foreach (var snowflake in snowflakeList)
{
snowflake.Y += 3;
snowflake.X += moveRandom.Next(1, 4);
}
Can anyone please tell me how I can split the List<Point> of snowflakes into individual snowflakes, so that I can change the position of every single dot separately?
Thank you :-)
The simplest way is just to use the index of the collection:
for (int i = 0; i < snowflakeList.Count; i++)
{
var snowflake = snowflakeList[i];
snowflake.Y += 3;
snowflake.X += moveRandom.Next(1, 4);
snowflakeList[i] = snowflake;
}
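For reference, here is a self-contained sketch of the same read, modify, write-back pattern. Flake and SnowDemo are hypothetical names (Flake stands in for Point so the example needs no System.Drawing reference), and the horizontal step is passed in as a parameter instead of coming from moveRandom so that the result is predictable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type standing in for Point.
public struct Flake
{
    public int X;
    public int Y;
    public Flake(int x, int y) { X = x; Y = y; }
}

public static class SnowDemo
{
    // Moves every flake down by 3 pixels and right by dx pixels.
    public static void Step(List<Flake> flakes, int dx)
    {
        for (int i = 0; i < flakes.Count; i++)
        {
            Flake f = flakes[i];   // the list indexer returns a copy of the struct
            f.Y += 3;
            f.X += dx;
            flakes[i] = f;         // write the modified copy back into the list
        }
    }
}
```

Without the final assignment the list would stay unchanged, because only the local copy f would have been modified.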
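As in Andrew's answer, use a for loop. Because Point is a value type, you have to modify the stored element rather than a copy of it. Updating it in place like this only compiles if the snowflakes are kept in an array (here a hypothetical Point[] named snowflakeArray); the indexer of a List<Point> returns a copy, so with a list you must write the modified copy back as in the previous answer:
for (int i = 0; i < snowflakeArray.Length; i++)
{
snowflakeArray[i].Y += 3;
snowflakeArray[i].X += moveRandom.Next(1, 4);
}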
We are working on a video processing application using EmguCV and recently had to do some pixel level operation. I initially wrote the loops to go across all the pixels in the image as follows:
for (int j = 0; j < Img.Width; j++ )
{
for (int i = 0; i < Img.Height; i++)
{
// Pixel operation code
}
}
The time to execute the loops was pretty bad. Then I posted on the EmguCV forum and got a suggestion to switch the loops like this:
for (int j = Img.Width; j-- > 0; )
{
for (int i = Img.Height; i-- > 0; )
{
// Pixel operation code
}
}
I was very surprised to find that the code executed in half the time!
The only thing I can think of is that the comparison in the original loops accesses a property on every iteration, which the new loops no longer have to do. Is this the reason for the speed-up? Or is there something else? I was thrilled to see this improvement, and would love it if someone could clarify the reason for it.
The difference isn't the cost of branching, it's the fact that you are fetching an object property Img.Width and Img.Height in the inner loop. The optimizer has no way of knowing that these are constants for purposes of that loop.
You should get the same performance speedup by doing this.
int Width = Img.Width;   // plain locals: const would require a compile-time constant
int Height = Img.Height;
for (int j = 0; j < Width; j++ )
{
for (int i = 0; i < Height; i++)
{
// Pixel operation code
}
}
Edit:
As Joshua suggests, putting Width in the inner loop will have you walking through memory sequentially, which gives better cache locality and might be faster (depending on how big your bitmap is).
int Width = Img.Width;   // plain locals: const would require a compile-time constant
int Height = Img.Height;
for (int i = 0; i < Height; i++)
{
for (int j = 0; j < Width; j++ )
{
// Pixel operation code
}
}
I assume you are using the System.Drawing.Image class? Looking at the implementation of .Width and .Height I see they do a function call into GDI+ (GdipGetImageHeight and GdipGetImageWidth in gdiplus.dll), which seems to be rather expensive.
By going backwards you make those calls only when each loop starts, rather than on every iteration.
It's not the loop reversal that speeds things up -- it's the fact that you're accessing the Width and Height properties far fewer times.
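How much the number of property reads drops can be checked with a small counting sketch. CountingImage and LoopOrderDemo are hypothetical names made up for this illustration; the class merely imitates an image whose Width and Height getters are worth avoiding, and does not call GDI+.

```csharp
using System;

// Hypothetical image whose getters count how often they are invoked.
public class CountingImage
{
    private readonly int _width;
    private readonly int _height;
    public int PropertyReads;

    public CountingImage(int width, int height) { _width = width; _height = height; }

    public int Width  { get { PropertyReads++; return _width; } }
    public int Height { get { PropertyReads++; return _height; } }
}

public static class LoopOrderDemo
{
    // Forward loops: every condition check re-reads the property.
    public static int ForwardReads(CountingImage img)
    {
        img.PropertyReads = 0;
        for (int j = 0; j < img.Width; j++)
            for (int i = 0; i < img.Height; i++) { }
        return img.PropertyReads;
    }

    // Reversed loops: the property is read only in each loop's initializer.
    public static int ReversedReads(CountingImage img)
    {
        img.PropertyReads = 0;
        for (int j = img.Width; j-- > 0; )
            for (int i = img.Height; i-- > 0; ) { }
        return img.PropertyReads;
    }
}
```

For a W by H image the forward version performs (W + 1) + W * (H + 1) reads, while the reversed version performs 1 + W: Width once, and Height once per outer iteration. Hoisting both values into locals before the loops reduces this to two reads in total.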
It's because the CPUs are like hockey players, they go faster when going backward ;-)
More seriously:
This is not related to the direction of the loop in any way, but rather to the fact that in the original construct the loop control conditions dereference the Img object to read its Width or Height property on every single iteration, whereas the second construct evaluates these properties only when each loop is initialized.
Also, the fact that the new condition tests against the value 0 saves even the loading of an immediate value.
This probably explains the difference (assuming the work done inside the inner loop was relatively minimal, i.e. roughly the same as the work to test an Object.Property, since you indicate a roughly 50% gain).
Edit:
See Michael Stum's answer, which indicates that the Img.Width/Height reference is even more costly than thought. As sometimes happens with properties, the implementation of the object may run a significant amount of code to produce the value (for example, it may do a bunch of math to get the width each time, rather than caching it). This seems to be the case with this Img object, hence the benefit of doing this only once (if you are sure that the value will remain constant for the duration of the loop logic).
I have some image processing code that loops through 2 multi-dimensional byte arrays (of the same size). It takes a value from the source array, performs a calculation on it and then stores the result in another array.
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
The loop currently takes ~11ms, which I assume is mostly due to accessing the byte arrays values as the calculation is pretty simple (2 multiplications and 1 addition).
Is there anything I can do to speed this up? It is a time critical part of my program and this code gets called 80-100 times per second, so any speed gains, however small will make a difference. Also at the moment xSize = 768 and ySize = 576, but this will increase in the future.
Update: Thanks to Guffa (see answer below), the following code saves me 4-5ms per loop. Although it is unsafe code.
int size = ResultImageData.Length;
int counter = 0;
unsafe
{
fixed (byte* r = ResultImageData, c = CurrentImageData, a = AlphaImageData)
{
while (size > 0)
{
*(r + counter) = (byte)(*(c + counter) * AlphaValue +
*(a + counter) * OneMinusAlphaValue);
counter++;
size--;
}
}
}
To get any real speedup for this code you would need to use pointers to access the arrays; that removes all the index calculations and bounds checking.
int size = ResultImageData.Length;
unsafe
{
fixed(byte* rp = ResultImageData, cp = CurrentImageData, ap = AlphaImageData)
{
byte* r = rp;
byte* c = cp;
byte* a = ap;
while (size > 0)
{
*r = (byte)(*c * AlphaValue + *a * OneMinusAlphaValue);
r++;
c++;
a++;
size--;
}
}
}
Edit:
Fixed variables can't be changed, so I added code to copy the pointers to new pointers that can be changed.
These are all independent calculations so if you have a multicore CPU you should be able to gain some benefit by parallelizing the calculation. Note that you'd need to keep the threads around and just hand them work to do since the overhead of thread creation would probably make this slower rather than faster if the threads are recreated each time.
The other thing that may work is farming the work off to the graphics processor. Look at this question for some ideas, for example, using Accelerator.
An option would be to use unsafe code: fixing the array in memory and using pointer operations. I doubt the speed increase will be that dramatic though.
One note: how are you timing? If you are using DateTime then be aware that this class has poor resolution. You should add an outer loop and repeat the operation say ten times -- I bet the result is less than 110ms.
for (int outer = 0; outer < 10; ++outer)
{
for (int x = 0; x < xSize; x++)
{
for (int y = 0; y < ySize; y++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
}
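If DateTime is the concern, System.Diagnostics.Stopwatch is the usual alternative for this kind of measurement. The helper below is only a sketch: TimingDemo and AverageMilliseconds are hypothetical names, and the single warm-up call is an assumption about what you want to exclude from the measurement.

```csharp
using System;
using System.Diagnostics;

public static class TimingDemo
{
    // Runs the action 'repeats' times and returns the average milliseconds per run.
    public static double AverageMilliseconds(Action action, int repeats)
    {
        action();                              // warm-up run so JIT compilation is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < repeats; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / repeats;
    }
}
```

The blending loop would be passed in as a lambda or method group, for example TimingDemo.AverageMilliseconds(BlendOnce, 10), where BlendOnce is whatever method wraps the double loop.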
Since it appears that each cell in the matrix is calculated entirely independently of the others, you may want to look into having more than one thread handle this. To avoid the cost of creating threads you could use a thread pool.
If the matrix is of sufficient size, it could be a very nice speed gain. On the other hand, if it is too small, it may not help (or may even hurt). Worth a try though.
An example (pseudo code) could be like this:
void process(int x, int y) {
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
ThreadPool pool(3); // 3 threads big
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int x = 0; x < xSize; x++) {
for (int y = 0; y < ySize; y++) {
pool.schedule(x, y); // this will add all tasks to the pool's work queue
}
}
pool.waitTilFinished(); // wait until all scheduled tasks are complete
EDIT: Michael Meadows mentioned in a comment that PLINQ may be a suitable alternative: http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
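On .NET 4 or later, the pseudo code above can be approximated with Parallel.For from the Task Parallel Library, which takes care of the thread pool. This is a sketch under assumptions: ParallelBlendDemo is a hypothetical name, the arrays are passed in as parameters instead of being fields, and the work is split per x index rather than per pixel to keep the scheduling overhead low.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelBlendDemo
{
    public static void Blend(byte[,] result, byte[,] current, byte[,] alphaImage, float alpha)
    {
        int xSize = result.GetLength(0);
        int ySize = result.GetLength(1);
        float oneMinusAlpha = 1f - alpha;

        // One work item per x index; each item writes a disjoint slice of 'result',
        // so no locking is needed.
        Parallel.For(0, xSize, x =>
        {
            for (int y = 0; y < ySize; y++)
            {
                result[x, y] = (byte)(current[x, y] * alpha + alphaImage[x, y] * oneMinusAlpha);
            }
        });
    }
}
```

Whether this beats the single-threaded loop depends on the image size and core count, so it should be timed against the original.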
I'd recommend running a few empty tests to figure out what your theoretical bounds are. For example, take out the calculation from inside the loop and see how much time is saved. Try replacing the double loop with a single loop that runs the same number of times and see how much time that saves. Then you can be sure you are going down the right path for optimization (the two paths I see are flattening the double loop into a single loop and working with the multiplication [maybe using a lookup table would be faster]).
Just real quick, you can get an optimization by looping in reverse and comparing against 0. Most CPUs have a fast op for comparison to 0.
E.g.
int xSize = ResultImageData.GetLength(0) -1;
int ySize = ResultImageData.GetLength(1) -1; //minor optimization suggested by commenter
for (int x = xSize; x >= 0; --x)
{
for (int y = ySize; y >=0; --y)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
See http://dotnetperls.com/Content/Decrement-Optimization.aspx
You are probably suffering from bounds checking. As Jon Skeet states, a jagged array instead of a multidimensional one (that is, data[][] instead of data[,]) will be faster, strange as that may seem.
The JIT compiler will optimize
for (int i = 0; i < data.Length; i++)
by eliminating the per-element range check. But that is a special case; it won't do the same for GetLength().
For the same reason, caching or hoisting the Length property (putting it in a variable like xSize) also used to be a bad thing, though I haven't been able to verify that with Framework 3.5.
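Make sure the loops walk memory linearly, which means fewer cache misses. In a C# rectangular array the last index is the contiguous one, so for [x, y] indexing the original order (y in the inner loop) is already sequential. If your layout is the other way round (for example a buffer stored row by row), swap the x and y for loops like so, and time both versions.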
int xSize = ResultImageData.GetLength(0);
int ySize = ResultImageData.GetLength(1);
for (int y = 0; y < ySize; y++)
{
for (int x = 0; x < xSize; x++)
{
ResultImageData[x, y] = (byte)((CurrentImageData[x, y] * AlphaValue) +
(AlphaImageData[x, y] * OneMinusAlphaValue));
}
}
If you are using LockBits to get at the image buffer, you should loop through y in the outer loop and x in the inner loop as that is how it is stored in memory (by row, not column). I would say that 11ms is pretty darn fast though...
Does the image data have to be stored in a multi-dimensional (rectangular) array? If you use jagged arrays instead, you may well find the JIT has more optimizations available (including removing the bounds checking).
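A jagged version of the blend could look like the sketch below. JaggedBlendDemo and ToJagged are hypothetical names, the arrays are passed as parameters, and whether the JIT actually removes the bounds checks depends on the runtime version, so treat this as something to measure rather than a guaranteed win.

```csharp
using System;

public static class JaggedBlendDemo
{
    // Copies a rectangular array into a jagged one (one inner array per x index).
    public static byte[][] ToJagged(byte[,] source)
    {
        int xSize = source.GetLength(0);
        int ySize = source.GetLength(1);
        byte[][] jagged = new byte[xSize][];
        for (int x = 0; x < xSize; x++)
        {
            jagged[x] = new byte[ySize];
            for (int y = 0; y < ySize; y++)
            {
                jagged[x][y] = source[x, y];
            }
        }
        return jagged;
    }

    public static void Blend(byte[][] result, byte[][] current, byte[][] alphaImage, float alpha)
    {
        float oneMinusAlpha = 1f - alpha;
        for (int x = 0; x < result.Length; x++)
        {
            // Pull the inner arrays into locals so the inner loop indexes plain byte[] arrays.
            byte[] r = result[x];
            byte[] c = current[x];
            byte[] a = alphaImage[x];
            for (int y = 0; y < r.Length; y++)
            {
                r[y] = (byte)(c[y] * alpha + a[y] * oneMinusAlpha);
            }
        }
    }
}
```

Converting with ToJagged on every frame would eat the gain, so this only makes sense if the image data can be kept in jagged form from the start.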
If CurrentImageData and/or AlphaImageData don't change every time you run your code snippet, you could store the product prior to running the code snippet you show and avoid that multiplication in your loops.
Edit: Another thing I just thought of: Sometimes int operations are quicker than byte operations. Offset this with your processor cache utilization (you'll increase the data size considerably and stand a greater risk of a cache miss).
With 442,368 additions and 884,736 multiplications for the calculation, I would think 11ms is actually extremely slow on a modern CPU.
While I don't know much about the specifics of .NET, I do know high-speed calculation is not its strong suit. In the past I've built Java apps with similar problems, and I've always used C libraries to do the image/audio processing.
Coming from a hardware perspective, you want to make sure the memory accesses are sequential, that is, step through the buffer in the order it exists in memory. You may also need to reorder this so that the compiler can take advantage of available instructions such as SIMD. How to approach this will depend on your compiler, and I can't help with VS.NET.
On an embedded DSP I would break out
(AlphaImageData[x, y] * OneMinusAlphaValue) and (CurrentImageData[x, y] * AlphaValue) and use SIMD instructions to calculate buffers, possibly in parallel, before performing the addition, perhaps in small enough chunks to keep the buffers in cache on the CPU.
I believe anything you do will require more direct access to the memory/CPU than .NET allows.
You may also want to take a look at the Mono runtime and its Simd extensions. Perhaps some of your calculations can make use of the SSE acceleration as I gather that you basically do vector calculations (I don't know up to which vector size there is acceleration for multiplication but there is for some sizes)
(Blog post announcing Mono.Simd: http://tirania.org/blog/archive/2008/Nov-03.html)
Of course, that wouldn't work on Microsoft .NET but maybe you are interested in some experimentation.
Interestingly, image data is frequently pretty similar, meaning that the calculations are likely very repetitive. Have you explored using a lookup table for the calculations? For example, any time 128 is multiplied by 0.8, you would simply look up value[80, 128], which you've precalculated as 102.4. You're basically trading memory space for CPU speed, but it could work for you.
Of course, if your image data has too high a resolution (and goes to too significant a digit), this may not be practical.
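Because the inputs are bytes, one way to sketch the lookup idea is with two 256-entry tables, one per multiplication, rebuilt only when the alpha value changes. LookupBlendDemo and BuildTable are hypothetical names, and the arrays are passed as parameters for the sake of a self-contained example.

```csharp
using System;

public static class LookupBlendDemo
{
    // Builds table[v] = v * factor for every possible byte value.
    public static float[] BuildTable(float factor)
    {
        float[] table = new float[256];
        for (int v = 0; v < 256; v++)
        {
            table[v] = v * factor;
        }
        return table;
    }

    // Blends using precomputed products instead of multiplying per pixel.
    public static void Blend(byte[,] result, byte[,] current, byte[,] alphaImage,
                             float[] currentTable, float[] alphaTable)
    {
        int xSize = result.GetLength(0);
        int ySize = result.GetLength(1);
        for (int x = 0; x < xSize; x++)
        {
            for (int y = 0; y < ySize; y++)
            {
                result[x, y] = (byte)(currentTable[current[x, y]] + alphaTable[alphaImage[x, y]]);
            }
        }
    }
}
```

The tables would be created once with BuildTable(AlphaValue) and BuildTable(OneMinusAlphaValue). Each pixel then costs two table reads and one addition; whether that is faster than two multiplications on a given machine is something only a measurement can tell.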