Related
I've programmed a solution in Python which worked great, but required several libraries to install and a lot of burocratic setup to work. I've decided to build it with a GUI in C# on Visual Studio Community 2017 but in the first successful function the result was way slower than in Python. Which IMO it should actually be faster.
The code essentially is just doing a needle in a haystack image search, by getting all images from a folder and testing each needle (total 60 images) in a haystack, in python I return the string, but in C# I'm only printing.
My code in Python is the following:
def getImages(tela):
retorno = []
folder = 'Images'
img_rgb = cv2.imread(tela)
for filename in os.listdir(folder):
template = cv2.imread(os.path.join(folder,filename))
w, h = template.shape[:-1]
res = cv2.matchTemplate(img_rgb, template, cv2.TM_CCOEFF_NORMED)
threshold = .96
loc = np.where(res >= threshold)
if loc[0]>0:
retorno.append(filename[0]+filename[1].lower())
if len(retorno)> 1:
return retorno
and in C#:
Debug.WriteLine(ofd.FileName);
Image<Bgr, byte> source = new Image<Bgr, byte>(ofd.FileName);
string filepath = Directory.GetCurrentDirectory().ToString()+"\\Images";
DirectoryInfo d = new DirectoryInfo(filepath);
var files = d.GetFiles();
foreach (var fname in files){
Image<Bgr, byte> template = new Image<Bgr, byte>(fname.FullName);
Image<Gray, float> result = source.MatchTemplate(template, Emgu.CV.CvEnum.TemplateMatchingType.CcoeffNormed);
double[] minValues, maxValues;
Point[] minLocations, maxLocations;
result.MinMax(out minValues, out maxValues, out minLocations, out maxLocations);
if (maxValues[0] > 0.96) {
Debug.WriteLine(fname);
}
}
I didn't measure the time elapsed between each one, but I can say the result in C# takes about 3 seconds and in Python about 100ms.
There is room for optimization, if anyone would like to suggest any improvements, they are welcome.
The issue is that in Python code you finish the iteration when at least one match is added to retorno:
if len(retorno)> 1:
return retorno
In C# sample you continue iteration until all files are looped through.
I've combined the solutions proposed by denfromufa and HouseCat in the source code below, and did some overall cleanup, so you can see how your code could be. You will also notice minor readability improvements, since I wrote the refactored code using C# 7.0 / .NET 4.7.
Real Algorithm Optimization
Although denfromula correctly pointed out that implementation issue, and HouseCat mentioned using more CPU resources, the true gain relies on reducing the number of operations executed during your image search algorithm.
TURBO STAGE 1 - Suppose the MinMax() function goes through all your image's pixels to collect all those statistics, but you are only interested in using maxValue[0]. An extreme fine tuning would be to write a specific function which stops iterating through all your image's pixels when maxValue[0] goes below your minimum threshold. Apparently, that's all you need in your function. Remember: never burn all your processors computing lots of unused image statistics.
TURBO STAGE 2 - It looks like you are trying to recognize whether any image of your set of images matches your input screenshot (tela). If there are not too many images to be matched, and if you are constantly checking your screen for new matches, it is highly recommended to pre-load all those image match objects, and reuse them among your function calls. Constant disk IO operations and instantiating bitmap classes (for every single screenshot) leads to strong performance hit.
TURBO STAGE 3 - Just in case you are taking several screenshots per second, then try to reuse the screenshot's buffer. Constantly reallocating the whole screenshot's buffer when its dimensions simply did not change also causes performance loss.
TURBO STAGE 4 - This is hard to get, and depends on how much you want to invest on this. Think about your image recognition system as a big pipeline. The bitmaps as containers of data flowing among your stages (image matching stage, OCR stage, mouse position painting stage, video recording stage, etc). The idea is to create a fixed number of containers and reuse them, avoiding their creation and their destruction. The amount of containers is like the "buffer size" for your pipeline system. When the several stages of your pipeline finished using these containers, they are returned to the start of your pipeline, to a kind of container pool.
This last optimization this is really hard to achieve using these external libraries, because in most cases their API require some internal bitmap instantiation, and the fine tuning would also cause extreme software coupling between your library and the external one. So you will have to dig into these nice libraries to understand how they actually work, and build your own custom Framework. I can say it's a nice learning experience.
Those libraries are really cool for many purposes; they provide a generic API for improved functionality re-usability. This also means they address much more stuff than you actually need in a single API call. When it comes to high performance algorithms, you should always re-think what is the essential functionality you need from those libraries to achieve your goal, and if they are your bottleneck, do it by yourself.
I can say that a good fine-tuned image recognition algorithm doesn't take more than a few milliseconds to do what you want. I've experienced image recognition applications which do it almost instantaneously for larger screenshots (e.g. Eggplant Functional).
Now back to your code...
Your refactored code should look like below. I did not include all those fine-tuned algorithms I've mentioned - you should better ask separate questions for them in SO.
Image<Bgr, byte> source = new Image<Bgr, byte>(ofd.FileName);
// Preferably use Path.Combine here:
string dir = Path.Combine(Directory.GetCurrentDirectory(), "Images");
// Check whether directory exists:
if (!Directory.Exists(dir))
throw new Exception($"Directory was not found: '{dir}'");
// It looks like you just need filenames here...
// Simple parallel foreach suggested by HouseCat (in 2.):
Parallel.ForEach(Directory.GetFiles(dir), (fname) =>
{
Image<Gray, float> result = source.MatchTemplate(
new Image<Bgr, byte>(fname.FullName),
Emgu.CV.CvEnum.TemplateMatchingType.CcoeffNormed);
// By using C# 7.0, we can do inline out declarations here:
result.MinMax(
out double[] minValues,
out double[] maxValues,
out Point[] minLocations,
out Point[] maxLocations);
if (maxValues[0] > 0.96)
{
// ...
var result = ...
return result; // <<< As suggested by: denfromufa
}
// ...
});
Happy Tuning ;-)
This (denfromufa's answer) indeed explains your issue but to piggy back and add a few suggestions/optimizations as well:
1.) Your GetFiles can be replaced with a Parallel file enumerator, that is also recursive with children directories. I have shamelessly written a few on GitHub.
2.) You can parellelize the foreach loop into a Parallel.ForEach(files, fname () => { Code(); }); Again, my FileSearchBenchmark Repository on GitHub has plenty of File code execution in Parallel to provide examples.
I've been look ing for a method for connected component labeling in Emgu (c# wrapper for OpenCV). I've failed to find a direct method for such a basic CV strategy. However, I did come across many suggestions for doing it using FindContours and DrawContours but without code examples. So I had a go at it and it seems to work okay.
I'm dropping it here for two reasons.
So people searching for it can find a code example.
More importantly, i'm wondering if there are suggestions for optimization and improvements of this function. E.g. is the chain approximation method for FindContours efficient/appropriate?
public static Image<Gray, byte> LabelConnectedComponents(this Image<Gray, byte> binary, int startLabel)
{
Contour<Point> contours = binary.FindContours(
CHAIN_APPROX_METHOD.CV_CHAIN_APPROX_NONE,
RETR_TYPE.CV_RETR_CCOMP);
int count = startLabel;
for (Contour<Point> cont = contours;
cont != null;
cont = cont.HNext)
{
CvInvoke.cvDrawContours(
binary,
cont,
new MCvScalar(count),
new MCvScalar(0),
2,
-1,
LINE_TYPE.FOUR_CONNECTED,
new Point(0, 0));
++count;
}
return binary;
}
I would use the following:
connected component labeling (AForge / Accord.NET ?)
(although for many cases you will find it almost the same for the function that you wrote, give it a try to verify the results)
After this step you will probably find more regions that are close together and that belong to the same person.
Than you can use:
implement or search for implementation of hierarhical aglomerative clustering (HCA)
to combine close region.
P.S.:
is the chain approximation method for FindContours efficient/appropriate
if you are using NO_APPROX no approximations of chain code will be used. By using this you can get non-smoothed edges (with many small hills and valleys) but if that does not bother you than this parameter value is fine.
First of all, I am aware that this question really sounds as if I didn't search, but I did, a lot.
I wrote a small Mandelbrot drawing code for C#, it's basically a windows form with a PictureBox on which I draw the Mandelbrot set.
My problem is, is that it's pretty slow. Without a deep zoom it does a pretty good job and moving around and zooming is pretty smooth, takes less than a second per drawing, but once I start to zoom in a little and get to places which require more calculations it becomes really slow.
On other Mandelbrot applications my computer does really fine on places which work much slower in my application, so I'm guessing there is much I can do to improve the speed.
I did the following things to optimize it:
Instead of using the SetPixel GetPixel methods on the bitmap object, I used LockBits method to write directly to memory which made things a lot faster.
Instead of using complex number objects (with classes I made myself, not the built-in ones), I emulated complex numbers using 2 variables, re and im. Doing this allowed me to cut down on multiplications because squaring the real part and the imaginary part is something that is done a few time during the calculation, so I just save the square in a variable and reuse the result without the need to recalculate it.
I use 4 threads to draw the Mandelbrot, each thread does a different quarter of the image and they all work simultaneously. As I understood, that means my CPU will use 4 of its cores to draw the image.
I use the Escape Time Algorithm, which as I understood is the fastest?
Here is my how I move between the pixels and calculate, it's commented out so I hope it's understandable:
//Pixel by pixel loop:
for (int r = rRes; r < wTo; r++)
{
for (int i = iRes; i < hTo; i++)
{
//These calculations are to determine what complex number corresponds to the (r,i) pixel.
double re = (r - (w/2))*step + zeroX ;
double im = (i - (h/2))*step - zeroY;
//Create the Z complex number
double zRe = 0;
double zIm = 0;
//Variables to store the squares of the real and imaginary part.
double multZre = 0;
double multZim = 0;
//Start iterating the with the complex number to determine it's escape time (mandelValue)
int mandelValue = 0;
while (multZre + multZim < 4 && mandelValue < iters)
{
/*The new real part equals re(z)^2 - im(z)^2 + re(c), we store it in a temp variable
tempRe because we still need re(z) in the next calculation
*/
double tempRe = multZre - multZim + re;
/*The new imaginary part is equal to 2*re(z)*im(z) + im(c)
* Instead of multiplying these by 2 I add re(z) to itself and then multiply by im(z), which
* means I just do 1 multiplication instead of 2.
*/
zRe += zRe;
zIm = zRe * zIm + im;
zRe = tempRe; // We can now put the temp value in its place.
// Do the squaring now, they will be used in the next calculation.
multZre = zRe * zRe;
multZim = zIm * zIm;
//Increase the mandelValue by one, because the iteration is now finished.
mandelValue += 1;
}
//After the mandelValue is found, this colors its pixel accordingly (unsafe code, accesses memory directly):
//(Unimportant for my question, I doubt the problem is with this because my code becomes really slow
// as the number of ITERATIONS grow, this only executes more as the number of pixels grow).
Byte* pos = px + (i * str) + (pixelSize * r);
byte col = (byte)((1 - ((double)mandelValue / iters)) * 255);
pos[0] = col;
pos[1] = col;
pos[2] = col;
}
}
What can I do to improve this? Do you find any obvious optimization problems in my code?
Right now there are 2 ways I know I can improve it:
I need to use a different type for numbers, double is limited with accuracy and I'm sure there are better non-built-in alternative types which are faster (they multiply and add faster) and have more accuracy, I just need someone to point me where I need to look and tell me if it's true.
I can move processing to the GPU. I have no idea how to do this (OpenGL maybe? DirectX? is it even that simple or will I need to learn a lot of stuff?). If someone can send me links to proper tutorials on this subject or tell me in general about it that would be great.
Thanks a lot for reading that far and hope you can help me :)
If you decide to move the processing to the gpu, you can choose from a number of options. Since you are using C#, XNA will allow you to use HLSL. RB Whitaker has the easiest XNA tutorials if you choose this option. Another option is OpenCL. OpenTK comes with a demo program of a julia set fractal. This would be very simple to modify to display the mandlebrot set. See here
Just remember to find the GLSL shader that goes with the source code.
About the GPU, examples are no help for me because I have absolutely
no idea about this topic, how does it even work and what kind of
calculations the GPU can do (or how is it even accessed?)
Different GPU software works differently however ...
Typically a programmer will write a program for the GPU in a shader language such as HLSL, GLSL or OpenCL. The program written in C# will load the shader code and compile it, and then use functions in an API to send a job to the GPU and get the result back afterwards.
Take a look at FX Composer or render monkey if you want some practice with shaders with out having to worry about APIs.
If you are using HLSL, the rendering pipeline looks like this.
The vertex shader is responsible for taking points in 3D space and calculating their position in your 2D viewing field. (Not a big concern for you since you are working in 2D)
The pixel shader is responsible for applying shader effects to the pixels after the vertex shader is done.
OpenCL is a different story, its geared towards general purpose GPU computing (ie: not just graphics). Its more powerful and can be used for GPUs, DSPs, and building super computers.
WRT coding for the GPU, you can look at Cudafy.Net (it does OpenCL too, which is not tied to NVidia) to start getting an understanding of what's going on and perhaps even do everything you need there. I've quickly found it - and my graphics card - unsuitable for my needs, but for the Mandelbrot at the stage you're at, it should be fine.
In brief: You code for the GPU with a flavour of C (Cuda C or OpenCL normally) then push the "kernel" (your compiled C method) to the GPU followed by any source data, and then invoke that "kernel", often with parameters to say what data to use - or perhaps a few parameters to tell it where to place the results in its memory.
When I've been doing fractal rendering myself, I've avoided drawing to a bitmap for the reasons already outlined and deferred the render phase. Besides that, I tend to write massively multithreaded code which is really bad for trying to access a bitmap. Instead, I write to a common store - most recently I've used a MemoryMappedFile (a builtin .Net class) since that gives me pretty decent random access speed and a huge addressable area. I also tend to write my results to a queue and have another thread deal with committing the data to storage; the compute times of each Mandelbrot pixel will be "ragged" - that is to say that they will not always take the same length of time. As a result, your pixel commit could be the bottleneck for very low iteration counts. Farming it out to another thread means your compute threads are never waiting for storage to complete.
I'm currently playing with the Buddhabrot visualisation of the Mandelbrot set, looking at using a GPU to scale out the rendering (since it's taking a very long time with the CPU) and having a huge result-set. I was thinking of targetting an 8 gigapixel image, but I've come to the realisation that I need to diverge from the constraints of pixels, and possibly away from floating point arithmetic due to precision issues. I'm also going to have to buy some new hardware so I can interact with the GPU differently - different compute jobs will finish at different times (as per my iteration count comment earlier) so I can't just fire batches of threads and wait for them all to complete without potentially wasting a lot of time waiting for one particularly high iteration count out of the whole batch.
Another point to make that I hardly ever see being made about the Mandelbrot Set is that it is symmetrical. You might be doing twice as much calculating as you need to.
For moving the processing to the GPU, you have lots of excellent examples here:
https://www.shadertoy.com/results?query=mandelbrot
Note that you need an WebGL capable browser to view that link. Works best in Chrome.
I'm no expert on fractals but you seem to have come far already with the optimizations. Going beyond that may make the code much harder to read and maintain so you should ask yourself it is worth it.
One technique I've often observed in other fractal programs is this: While zooming, calculate the fractal at a lower resolution and stretch it to full size during render. Then render at full resolution as soon as zooming stops.
Another suggestion is that when you use multiple threads you should take care that each thread don't read/write memory of other threads because this will cause cache collisions and hurt performance. One good algorithm could be split the work up in scanlines (instead of four quarters like you did now). Create a number of threads, then as long as there as lines left to process, assign a scanline to a thread that is available. Let each thread write the pixel data to a local piece of memory and copy this back to main bitmap after each line (to avoid cache collisions).
The application I am working on currently requires functionality for Perspective Image Distortion. Basically what I want to do is to allow users to load an image into the application and adjust its perspective view properties based on 4 corner points that they can specify.
I had a look at ImageMagic. It has some distort functions with perpective adjustment but is very slow and some certain inputs are giving incorrect outputs.
Any of you guys used any other library or algorithm. I am coding in C#.
Any pointers would be much appreciated.
Thanks
This seems to be exactly what you (and I) were looking for:
http://www.codeproject.com/KB/graphics/YLScsFreeTransform.aspx
It will take an image and distort it using 4 X/Y coordinates you provide.
Fast, free, simple code. Tested and it works beautifully. Simply download the code from the link, then use FreeTransform.cs like this:
using (System.Drawing.Bitmap sourceImg = new System.Drawing.Bitmap(#"c:\image.jpg"))
{
YLScsDrawing.Imaging.Filters.FreeTransform filter = new YLScsDrawing.Imaging.Filters.FreeTransform();
filter.Bitmap = sourceImg;
// assign FourCorners (the four X/Y coords) of the new perspective shape
filter.FourCorners = new System.Drawing.PointF[] { new System.Drawing.PointF(0, 0), new System.Drawing.PointF(300, 50), new System.Drawing.PointF(300, 411), new System.Drawing.PointF(0, 461)};
filter.IsBilinearInterpolation = true; // optional for higher quality
using (System.Drawing.Bitmap perspectiveImg = filter.Bitmap)
{
// perspectiveImg contains your completed image. save the image or do whatever.
}
}
Paint .NET can do this and there are also custom implementations of the effect. You could ask for the source code or use Reflector to read it and get an idea of how to code it.
If it is a perspective transform, you should be able to specify a 4x4 transformation matrix that matches the four corners.
Calculate that matrix, then apply each pixel on the resulting image on the matrix, resulting in the "mapped" pixel. Notice that this "mapped" pixel is very likely going to lie between two or even four pixels. In this case, use your favorite interpolation algorithm (e.g. bilinear, bicubic) to get the interpolated color.
This really is the only way for it to be done and cannot be done faster. If this feature is crucial and you absolutely need it to be fast, then you'll need to offload the task to a GPU. For example, you can call upon the DirectX library to apply a perspective transformation on a texture. That can make it extremely fast, even when there is no GPU because the DirectX library uses SIMD instructions to accelerate matrix calculations and color interpolations.
Had the same problem. Here is the demo code with sources ported from gimp.
YLScsFreeTransform doesn't work as expected. Way better solution is ImageMagic
Here is how you use it in c#:
using(MagickImage image = new MagickImage("test.jpg"))
{
image.Distort(DistortMethod.Perspective, new double[] { x0,y0, newX0,newY0, x1,y1,newX1,newY1, x2,y2,newX2,newY2, x3,y3,newX3,newY3 });
control.Image = image.ToBitmap();
}
I have an dynamic List of Point, new Point can be added at any time. I want to draw lines to connect them using different color. Color is based on the index of those points. Here is the code:
private List<Point> _points;
private static Pen pen1 = new Pen(Color.Red, 10);
private static Pen pen2 = new Pen(Color.Yellow, 10);
private static Pen pen3 = new Pen(Color.Blue, 10);
private static Pen pen4 = new Pen(Color.Green, 10);
private void Init()
{
// use fixed 80 for simpicity
_points = new List<Point>(80);
for (int i = 0; i < 80; i++)
{
_points.Add(new Point(30 + i * 10, 30));
}
}
private void DrawLinesNormal(PaintEventArgs e)
{
for (int i = 0; i < _points.Count-1; i++)
{
if (i < 20)
e.Graphics.DrawLine(pen1, _points[i], _points[i + 1]);
else if (i < 40)
e.Graphics.DrawLine(pen2, _points[i], _points[i + 1]);
else if (i < 60)
e.Graphics.DrawLine(pen3, _points[i], _points[i + 1]);
else
e.Graphics.DrawLine(pen4, _points[i], _points[i + 1]);
}
}
I find this method is not fast enough when I have new points coming in at a high speed. Is there any way to make it faster? I did some research and someone said using GraphicsPath could be faster, but how?
[UPDATE] I collect some possible optimizations:
Using GrahpicsPath, Original Question
Change Graphics quality ( such as SmoothingMode/PixelOffsetMode...), also call SetClip to specify the only necessary region to render.
You won't be able to squeeze much more speed out of that code without losing quality or changing to a faster renderer (GDI, OpenGL, DirectX). But GDI will often be quite a bit faster (maybe 2x), and DirectX/OpenGL can be much faster (maybe 10x), depending on what you're drawing.
The idea of using a Path is that you batch many (in your example, 20) lines into a single method call, rather than calling DrawLine 20 times. This will only benefit you if you can arrange the incoming data into the correct list-of-points format for the drawing routine. Otherwise, you will have to copy the points into the correct data structure and this will waste a lot of the time that you are gaining by batching into a path. In the case of DrawPath, you may have to create a GraphicsPath from an array of points, which may result in no time saved. But if you have to draw the same path more than once, you can cache it, and you may then see a net benefit.
If new points are added to the list, but old ones are not removed (i.e. you are always just adding new lines to the display) then you would be able to use an offscreen bitmap to store the lines rendered so far. That way each time a point is added, you draw one line, rather than drawing all 80 lines every time.
It all depends on exactly what you're trying to do.
Doesn't really help to improve performance, but i would put the pens also into a list and writing all this lines in this way:
int ratio = _points.Count / _pens.Count;
for (int i = 0; i < _points.Count - 1; i++)
{
e.Graphics.DrawLine(_pens[i / ratio], _points[i], _points[i + 1]);
}
This is about as fast as you're going to get with System.Drawing. You might see a bit of gain using Graphics.DrawLines(), but you'd need to format your data differently to get the advantage of drawing a bunch of lines at once with the same pen. I seriously doubt GraphicsPath will be faster.
One sure way to improve speed is to reduce the quality of the output. Set Graphics.InterpolationMode to InterpolationMode.Low, Graphics.CompositingQuality to CompositingQuality.HighSpeed, Graphics.SmoothingMode to SmoothingMode.HighSpeed, Graphics.PixelOffsetMode to PixelOffsetMode.HighSpeed and Graphics.CompositingMode to CompositingMode.SourceCopy.
I remember a speed test once where someone compared Graphics to P/Invoke into GDI routines, and was quite surprised by the much faster P/Invoke speeds. You might check that out. I'll see if I can find that comparison... Apparently this was for the Compact Framework, so it likely doesn't hold for a PC.
The other way to go is to use Direct2D, which can be faster yet than GDI, if you have the right hardware.
Too late, but possibly somebody still need a solution.
I've created small library GLGDI+ with similiar (but not full/equal) GDI+ syntax, which run upon OpenTK: http://code.google.com/p/glgdiplus/
I'm not sure about stability, it has some issues with DrawString (problem with TextPrint from OpenTK). But if you need performance boost for your utility (like level editor in my case) it can be solution.
You might wanna look into the Brush object, and it's true that you won't get near real-time performance out of a GDI+ program, but you can easily maintain a decent fps as long as the geometry and number of objects stay within reasonable bounds. As for line drawing, I don't see why not.
But if you reach the point where you doing what you think is optimal, and all that is, drawing lines.. you should consider a different graphics stack, and if you like .NET but have issues with unmanaged APIs like OpenGL and DirectX, go with WPF or Silverlight, it's quite powerful.
Anyway, you could try setting up a System.Drawing.Drawing2D.GraphicsPath and then using a System.Drawing.Drawing2D.PathGradientBrush to a apply the colors this way. That's a single buffered draw call and if you can't get enough performance out of that. You'll have to go with something other entirely than GDI+
Not GDI(+) at all, but a completely different way to tackle this could be to work with a block of memory, draw your lines into there, convert it to a Bitmap object to instantly paint where you need to show your lines.
Of course this hinges in the extreme on fast ways to
draw lines of given color in the memory representation of choice and
convert that to the Bitmap to show.
Not in the .NET Framework, I think, but perhaps in a third party library? Isn't there a bitmap writer of sorts in Silverlight for stuff like this? (Not into Silverlight myself that much yet...)
At least it might be an out of the box way to approach this. Hope it helps.
I think you have to dispose pen object and e.Graphics object after drawing.
One more thing it is better if you write your drawLine code inside onPaint().
just override onPaint() method it support better drawing and fast too.