I have a project on the MonoGame platform. The purpose of the project is to calculate the view factor of geometry placed in the scene using the orthographic method. At a basic level, I put a basic cube in the scene and a camera across from it. Looking at the cube through the camera, I need to count the number of pixels the object covers when rendered with an orthographic projection. I already have a solution, but it is very slow: I count the number of pixels with a certain color and then divide that number by the total number of pixels on the screen. I have heard of a technique that involves using OcclusionQuery, but I guess I would have to do some shader programming to use it, which I know nothing about. Can you suggest another technique that is easy to implement and faster than what I currently do, or explain how OcclusionQuery works? For example, here I count the total number of grey pixels and then divide it by the total screen area.
Here is my code:
private void CalculateViewFactor(Color[] data)
{
    int objectPixelCount = 0;

    // The first pixel is assumed to be background, so use its color as the reference.
    Color background = data[0];

    foreach (Color item in data)
    {
        // A pixel belongs to the object if it differs from the background in any channel.
        if (item.R != background.R || item.G != background.G || item.B != background.B)
            objectPixelCount++;
    }

    Console.WriteLine(objectPixelCount);
    Console.WriteLine(data.Length);
    Console.WriteLine((float)objectPixelCount / data.Length);
}
Because the color of the first pixel on the screen is also the background color, I take the RGB values of the first pixel and compare them to every other pixel on the screen, counting the number of pixels whose color differs from the first pixel's.
But since this method is pretty slow, I want to adapt OcclusionQuery into my code. If you could help me, I would be grateful.
This is pretty tricky to do right, and I can only suggest an "alternative" approach, not necessarily a more performant or better-designed one.
In case you don't really need to know the exact number of drawn pixels, you can approximate it with a technique called Monte Carlo integration.
Start off by creating N points on the screen with random coordinates, and check the color at each of them. Divide the number of points that have your object's color by the total number of tested points (that is, N). What you get is an approximate ratio of the screen that your object occupies. If you now multiply this ratio by the total number of pixels on the screen (that is, WidthPx * HeightPx), you get an approximate number of pixels occupied by the object.
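As a rough sketch of that sampling loop in C# (assuming, as in the code above, that the whole back buffer is available as a Color[] and that the first pixel holds the background colour; sampleCount is whatever N you choose):

    private float EstimateViewFactor(Color[] data, int width, int height, int sampleCount)
    {
        var random = new Random();
        Color background = data[0];   // same convention as CalculateViewFactor above
        int hits = 0;

        for (int n = 0; n < sampleCount; n++)
        {
            int x = random.Next(width);     // random pixel coordinates
            int y = random.Next(height);
            Color sample = data[y * width + x];

            // Count the sample if it differs from the background in any channel.
            if (sample.R != background.R || sample.G != background.G || sample.B != background.B)
                hits++;
        }

        // Approximate fraction of the screen covered by the object;
        // multiply by (width * height) if you want an approximate pixel count.
        return (float)hits / sampleCount;
    }

Note that this only reduces the per-pixel comparison work on the CPU; if you still copy the entire back buffer into the Color[] array every frame, that transfer cost stays the same.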
Advantages:
Select a bigger N for a more accurate result, or a smaller N for better performance
The algorithm is simple, so it's hard to get wrong
Disadvantages:
It's random, not deterministic (you'll get a slightly different result every time)
It's approximate, never exact
You'll need to generate 2 * N random values (two per test point), and generating random values is a relatively expensive operation
I'm sure you'll later want to draw textures/shading on the screen, and then this technique won't work because you won't be able to distinguish your object's pixels from the others. You can still maintain a smaller offscreen buffer where you draw the same objects without any shading, each object in its own unique flat color, and then apply the Monte Carlo algorithm to that buffer, but of course that costs computing resources.
First of all, I am aware that this question really sounds as if I didn't search, but I did, a lot.
I wrote a small Mandelbrot drawing code for C#, it's basically a windows form with a PictureBox on which I draw the Mandelbrot set.
My problem is that it's pretty slow. Without a deep zoom it does a pretty good job, and moving around and zooming is pretty smooth, taking less than a second per drawing; but once I start to zoom in a little and get to places which require more calculations, it becomes really slow.
On other Mandelbrot applications my computer does really fine on places which work much slower in my application, so I'm guessing there is much I can do to improve the speed.
I did the following things to optimize it:
Instead of using the SetPixel/GetPixel methods on the bitmap object, I used the LockBits method to write directly to memory, which made things a lot faster.
Instead of using complex number objects (classes I made myself, not the built-in ones), I emulated complex numbers using two variables, re and im. Doing this allowed me to cut down on multiplications, because squaring the real part and the imaginary part is done a few times during the calculation, so I just save the squares in variables and reuse the results without recalculating them.
I use 4 threads to draw the Mandelbrot, each thread does a different quarter of the image and they all work simultaneously. As I understood, that means my CPU will use 4 of its cores to draw the image.
I use the Escape Time Algorithm, which as I understood is the fastest?
Here is how I move between the pixels and calculate; it's commented, so I hope it's understandable:
//Pixel by pixel loop:
for (int r = rRes; r < wTo; r++)
{
    for (int i = iRes; i < hTo; i++)
    {
        //These calculations determine which complex number corresponds to the (r,i) pixel.
        double re = (r - (w / 2)) * step + zeroX;
        double im = (i - (h / 2)) * step - zeroY;

        //Create the Z complex number
        double zRe = 0;
        double zIm = 0;

        //Variables to store the squares of the real and imaginary parts.
        double multZre = 0;
        double multZim = 0;

        //Start iterating with the complex number to determine its escape time (mandelValue)
        int mandelValue = 0;
        while (multZre + multZim < 4 && mandelValue < iters)
        {
            /* The new real part equals re(z)^2 - im(z)^2 + re(c); we store it in a temp variable
               tempRe because we still need re(z) in the next calculation. */
            double tempRe = multZre - multZim + re;

            /* The new imaginary part equals 2*re(z)*im(z) + im(c).
               Instead of multiplying by 2, I add re(z) to itself and then multiply by im(z),
               which means I do 1 multiplication instead of 2. */
            zRe += zRe;
            zIm = zRe * zIm + im;
            zRe = tempRe; // We can now put the temp value in its place.

            // Do the squaring now; they will be used in the next iteration.
            multZre = zRe * zRe;
            multZim = zIm * zIm;

            //Increase the mandelValue by one, because the iteration is now finished.
            mandelValue += 1;
        }

        //After the mandelValue is found, this colors its pixel accordingly (unsafe code, accesses memory directly):
        //(Unimportant for my question; I doubt the problem is with this because my code becomes really slow
        // as the number of ITERATIONS grows, and this only executes more as the number of pixels grows.)
        Byte* pos = px + (i * str) + (pixelSize * r);
        byte col = (byte)((1 - ((double)mandelValue / iters)) * 255);
        pos[0] = col;
        pos[1] = col;
        pos[2] = col;
    }
}
What can I do to improve this? Do you find any obvious optimization problems in my code?
Right now there are 2 ways I know I can improve it:
I need to use a different type for numbers: double has limited accuracy, and I'm sure there are non-built-in alternative types which are faster (they multiply and add faster) and have more accuracy. I just need someone to point me to where I should look and tell me if it's true.
I can move processing to the GPU. I have no idea how to do this (OpenGL maybe? DirectX? is it even that simple or will I need to learn a lot of stuff?). If someone can send me links to proper tutorials on this subject or tell me in general about it that would be great.
Thanks a lot for reading that far and hope you can help me :)
If you decide to move the processing to the GPU, you can choose from a number of options. Since you are using C#, XNA will allow you to use HLSL. RB Whitaker has the easiest XNA tutorials if you choose this option. Another option is OpenCL; OpenTK comes with a demo program of a Julia set fractal, which would be very simple to modify to display the Mandelbrot set. See here.
Just remember to find the GLSL shader that goes with the source code.
About the GPU, examples are no help for me because I have absolutely no idea about this topic, how does it even work and what kind of calculations the GPU can do (or how is it even accessed?)
Different GPU software works differently however ...
Typically a programmer will write a program for the GPU in a shader language such as HLSL, GLSL or OpenCL. The program written in C# will load the shader code and compile it, and then use functions in an API to send a job to the GPU and get the result back afterwards.
Take a look at FX Composer or RenderMonkey if you want some practice with shaders without having to worry about APIs.
If you are using HLSL, the rendering pipeline looks like this.
The vertex shader is responsible for taking points in 3D space and calculating their position in your 2D viewing field. (Not a big concern for you since you are working in 2D)
The pixel shader is responsible for applying shader effects to the pixels after the vertex shader is done.
OpenCL is a different story; it's geared towards general-purpose GPU computing (i.e., not just graphics). It's more powerful and can be used for GPUs, DSPs, and building supercomputers.
WRT coding for the GPU, you can look at Cudafy.Net (it does OpenCL too, which is not tied to NVidia) to start getting an understanding of what's going on and perhaps even do everything you need there. I've quickly found it - and my graphics card - unsuitable for my needs, but for the Mandelbrot at the stage you're at, it should be fine.
In brief: You code for the GPU with a flavour of C (Cuda C or OpenCL normally) then push the "kernel" (your compiled C method) to the GPU followed by any source data, and then invoke that "kernel", often with parameters to say what data to use - or perhaps a few parameters to tell it where to place the results in its memory.
When I've been doing fractal rendering myself, I've avoided drawing to a bitmap for the reasons already outlined and deferred the render phase. Besides that, I tend to write massively multithreaded code, which is really bad for trying to access a bitmap. Instead, I write to a common store - most recently I've used a MemoryMappedFile (a built-in .NET class), since that gives me pretty decent random access speed and a huge addressable area. I also tend to write my results to a queue and have another thread deal with committing the data to storage; the compute times of each Mandelbrot pixel will be "ragged" - that is to say, they will not always take the same length of time. As a result, your pixel commit could be the bottleneck for very low iteration counts. Farming it out to another thread means your compute threads are never waiting for storage to complete.
I'm currently playing with the Buddhabrot visualisation of the Mandelbrot set, looking at using a GPU to scale out the rendering (since it's taking a very long time with the CPU) and having a huge result set. I was thinking of targeting an 8 gigapixel image, but I've come to the realisation that I need to diverge from the constraints of pixels, and possibly away from floating point arithmetic due to precision issues. I'm also going to have to buy some new hardware so I can interact with the GPU differently - different compute jobs will finish at different times (as per my iteration count comment earlier), so I can't just fire batches of threads and wait for them all to complete without potentially wasting a lot of time waiting for one particularly high iteration count out of the whole batch.
Another point that I hardly ever see being made about the Mandelbrot set is that it is symmetrical about the real axis. You might be doing twice as much calculating as you need to.
For moving the processing to the GPU, you have lots of excellent examples here:
https://www.shadertoy.com/results?query=mandelbrot
Note that you need a WebGL-capable browser to view that link. It works best in Chrome.
I'm no expert on fractals, but you seem to have come far already with the optimizations. Going beyond that may make the code much harder to read and maintain, so you should ask yourself whether it is worth it.
One technique I've often observed in other fractal programs is this: While zooming, calculate the fractal at a lower resolution and stretch it to full size during render. Then render at full resolution as soon as zooming stops.
Another suggestion: when you use multiple threads, you should take care that each thread doesn't read/write memory used by other threads, because this causes cache collisions and hurts performance. One good approach is to split the work up into scanlines (instead of four quarters, as you do now). Create a number of threads, then, as long as there are lines left to process, assign a scanline to a thread that is available. Let each thread write the pixel data to a local piece of memory and copy it back to the main bitmap after each line (to avoid cache collisions); a sketch of this scheme follows.
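For illustration only, here is a minimal sketch of that scanline scheme (not the poster's code): ComputeMandelValue stands in for the per-pixel escape-time calculation, width and height are assumed to be in scope, the frame buffer holds one byte per pixel, and it relies on System.Threading:

    int nextLine = -1;
    byte[] frame = new byte[width * height];   // one byte per pixel, copied into the bitmap afterwards

    var workers = new Thread[Environment.ProcessorCount];
    for (int t = 0; t < workers.Length; t++)
    {
        workers[t] = new Thread(() =>
        {
            byte[] line = new byte[width];                 // thread-local scanline buffer
            int y;
            while ((y = Interlocked.Increment(ref nextLine)) < height)
            {
                for (int x = 0; x < width; x++)
                    line[x] = ComputeMandelValue(x, y);    // escape-time colour for this pixel
                Buffer.BlockCopy(line, 0, frame, y * width, width);   // one write-back per line
            }
        });
        workers[t].Start();
    }
    foreach (var w in workers) w.Join();

Each thread only ever touches its own line buffer while it computes, so threads never write to adjacent memory at the same time; the single BlockCopy per scanline is the only shared write.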
I'm developing an application in C# to extract text under different lighting conditions.
My problem is that sometimes there are different brightness levels in the image, like this:
So I can't use a single pre-calculated threshold for the whole image, or I will lose some letters.
I'm searching for an algorithm/snippet/function that can apply the right threshold/binarization to the image.
I found BradleyLocalThresholding in AForge; it is better than other non-adaptive methods, but it loses some details (for example, the G in the image becomes an O).
Can anyone suggest a better way?
Yes, use Niblack (OpenCV has it as a function): basically it uses the local average to construct a variable threshold. It works best for OCR. Depending on the image resolution, you might also want to bicubically upsample by a factor of 2x or 3x before thresholding.
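If you want to see the idea without pulling in another library, here is a naive, unoptimized C# sketch of Niblack's rule (threshold = local mean + k * local standard deviation) over a grayscale image stored as byte[h, w]; the window radius and k below are typical starting values, not something tuned to your images:

    static byte[,] NiblackThreshold(byte[,] gray, int windowRadius = 7, double k = -0.2)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new byte[h, w];

        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            double sum = 0, sumSq = 0;
            int count = 0;

            // Local window statistics (clamped at the image border).
            for (int dy = -windowRadius; dy <= windowRadius; dy++)
            for (int dx = -windowRadius; dx <= windowRadius; dx++)
            {
                int yy = y + dy, xx = x + dx;
                if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                double v = gray[yy, xx];
                sum += v; sumSq += v * v; count++;
            }

            double mean = sum / count;
            double stdDev = Math.Sqrt(Math.Max(0.0, sumSq / count - mean * mean));
            double threshold = mean + k * stdDev;   // Niblack's rule

            result[y, x] = (byte)(gray[y, x] > threshold ? 255 : 0);
        }
        return result;
    }

For real use you would compute the window sums from integral images instead of re-scanning the neighbourhood for every pixel; otherwise this gets very slow on large images.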
It's quite difficult since the quality of your images is so low, but you could try an iterative global thresholding approach as follows:
Randomly select an initial threshold estimate T (usually the mean).
Segment the signal using T, which yields two groups: G1, consisting of all points with values <= T, and G2, consisting of points with values > T.
Compute the mean value M1 of the points in G1 and the mean value M2 of the points in G2.
Compute a new threshold value T = (M1 + M2) / 2.
Repeat steps 2 through 4 until the change in T is small enough.
The trick is not to apply it to the whole image, but to break the image up into blocks (of, for example, 5x5) and apply it to each block individually, which would give you:
Below is an implementation in R which I'm sure you could reproduce
getT = function(y){
    t = mean(y)
    mu1 = mean(y[y>=t])
    mu2 = mean(y[y<t])
    i = 1
    while(abs(mu1 - mu2) > 1){
        cmu1 = mean(y[y>=t])
        cmu2 = mean(y[y<t])
        if(i > 1 & cmu1 == mu1 & cmu2 == mu2){
            print(paste('done t=', t))
            return(t)
        }else{
            mu1 = cmu1
            mu2 = cmu2
            t = (mu1 + mu2)/2
            print(paste('new t=', t))
        }
        i = i+1
    }
    return(t)
}
r = seq(1, nrow(image), by=5)
c = seq(1, ncol(image), by=5)
r[length(r)] = nrow(image)
c[length(c)] = ncol(image)
y = image
for(i in 2:length(r)){
    for(j in 2:length(c)){
        block = image[r[i-1]:r[i], c[j-1]:c[j]]
        t = getT(block)
        y[r[i-1]:r[i], c[j-1]:c[j]] = (block > t) + 0
    }
}
display(y)
The other option besides a local threshold would be to adjust for the varying illumination. There are methods that attempt to correct the illumination and make it uniform across the image. You could then use a constant threshold, or continue to use a local threshold, with perhaps better success. If the images are like the one you show, then you could use the brighter squares around the letters as the key to adjusting the illumination.
I am trying to find coordinates of one image inside of another using AForge framework:
ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching();
TemplateMatch[] matchings = tm.ProcessImage(new Bitmap("image.png"), new Bitmap(@"template.png"));
int x_coordinate = matchings[0].Rectangle.X;
ProcessImage takes about 2 minutes to perform.
Image's size is about 1600x1000 pixels
Template's size is about 60x60 pixels
Does anyone know how to speed up that process?
As an addition to the other answers, I would say that for your case:
Image's size is about 1600x1000 pixels Template's size is about 60x60 pixels
this framework is not the best fit. What you are trying to achieve is more "search for an image inside another image" than "compare two images of different resolution" (like a "Search Google for this image" feature).
About this so-called pyramid search:
it's true that the algorithm works much faster for bigger images. Actually, the image pyramid is itself based on template matching. If we take the most popular implementation (which I found and used):
private static bool IsSearchedImageFound(this Bitmap template, Bitmap image)
{
    const Int32 divisor = 4;
    const Int32 epsilon = 10;

    ExhaustiveTemplateMatching etm = new ExhaustiveTemplateMatching(0.90f);

    TemplateMatch[] tm = etm.ProcessImage(
        new ResizeNearestNeighbor(template.Width / divisor, template.Height / divisor).Apply(template),
        new ResizeNearestNeighbor(image.Width / divisor, image.Height / divisor).Apply(image)
        );

    if (tm.Length == 1)
    {
        Rectangle tempRect = tm[0].Rectangle;

        if (Math.Abs(image.Width / divisor - tempRect.Width) < epsilon
            &&
            Math.Abs(image.Height / divisor - tempRect.Height) < epsilon)
        {
            return true;
        }
    }

    return false;
}
It should give you a picture close to this one:
As a bottom line - try a different approach. Maybe something closer to Sikuli integration with .NET. Or you can try Accord.NET, the newer version of AForge.
If this is too much work, you can try to just extend your screenshot functionality with cropping of the page element that is required (Selenium example).
2 minutes seems too much for a recent CPU with the image and template sizes you are using. But there are a couple of ways to speed up the process. The first one is by using a smaller scale; this is called pyramid search. You can divide the image and template by 4, so that you will have an image of 400x250 and a template of 15x15, and match this smaller template. This will run much faster, but it will also be less accurate. You can then take the interesting pixels found with the 15x15 template and search the corresponding region in the 1600x1000 image using the 60x60 template, instead of searching the whole image.
Depending on the template details, you may try an even lower scale (1/8) instead.
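As an illustration of that two-pass approach with the AForge types already shown above, here is a rough sketch; image and template are assumed to be the full-size bitmaps, ResizeNearestNeighbor and Crop come from AForge.Imaging.Filters, and the margin and similarity thresholds are arbitrary starting points rather than values from the question:

    const int divisor = 4;
    const int margin = 32;   // extra pixels around the coarse hit to absorb scaling error

    // Pass 1: search the downscaled image for the downscaled template.
    Bitmap smallImage = new ResizeNearestNeighbor(image.Width / divisor, image.Height / divisor).Apply(image);
    Bitmap smallTemplate = new ResizeNearestNeighbor(template.Width / divisor, template.Height / divisor).Apply(template);
    var coarseMatcher = new ExhaustiveTemplateMatching(0.75f);
    TemplateMatch[] coarse = coarseMatcher.ProcessImage(smallImage, smallTemplate);

    if (coarse.Length > 0)
    {
        // Scale the coarse hit back up to full resolution and pad it a little.
        Rectangle roi = coarse[0].Rectangle;
        roi = new Rectangle(roi.X * divisor - margin, roi.Y * divisor - margin,
                            roi.Width * divisor + 2 * margin, roi.Height * divisor + 2 * margin);
        roi.Intersect(new Rectangle(0, 0, image.Width, image.Height));

        // Pass 2: full-resolution match, but only inside the small region of interest.
        Bitmap region = new Crop(roi).Apply(image);
        var fineMatcher = new ExhaustiveTemplateMatching(0.90f);
        TemplateMatch[] fine = fineMatcher.ProcessImage(region, template);

        if (fine.Length > 0)
        {
            // Translate back to full-image coordinates.
            int x = roi.X + fine[0].Rectangle.X;
            int y = roi.Y + fine[0].Rectangle.Y;
            Console.WriteLine("Template found at {0},{1}", x, y);
        }
    }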
Another thing to know is that a bigger template will run faster. This is counter-intuitive, but with a bigger template you will have fewer pixels to compare. So if possible, try to use a bigger template. Sometimes this optimization is not possible, if your template is already as big as it can be.
I have a problem. My company has given me an awfully boring task. We have two databases of dialog boxes. One of these databases contains images of horrific quality, the other very high quality.
Unfortunately, the dialogs of horrific quality contain important mappings to other info.
I have been tasked with, manually, going through all the bad images and matching them to good images.
Would it be possible to automate this process to any degree? Here is an example of two dialog boxes (randomly pulled from Google images) :
So I am currently trying to write a program in C# to pull these photos from the database, cycle through them, find the ones with common shapes, and return their IDs. What are my best options here?
I really see no reason to use any external libraries for this, I've done this sort of thing many times and the following algorithm works quite well. I'll assume that if you're comparing two images that they have the same dimensions, but you can just resize one if they don't.
badness := 0.0
For x, y over the entire image:
    r, g, b := color at x,y in image 1
    R, G, B := color at x,y in image 2
    badness += (r-R)*(r-R) + (g-G)*(g-G) + (b-B)*(b-B)
badness /= (image width) * (image height)
Now you've got a normalized badness value between two images: the lower the badness, the more likely the images match. This is simple and effective; there are a variety of things that make it work better or faster in certain cases, but you probably don't need anything like that. You don't even really need to normalize the badness, but this way you can come up with a single threshold for it if you want to look at several possible matches manually.
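If it helps, here is a direct C# translation of that pseudo-code using System.Drawing; it assumes the two bitmaps already have the same dimensions, and it uses GetPixel to keep the sketch short (switch to LockBits if you need to run this over many image pairs):

    static double Badness(Bitmap a, Bitmap b)
    {
        double badness = 0.0;
        for (int y = 0; y < a.Height; y++)
        {
            for (int x = 0; x < a.Width; x++)
            {
                Color p = a.GetPixel(x, y);
                Color q = b.GetPixel(x, y);
                badness += (p.R - q.R) * (p.R - q.R)
                         + (p.G - q.G) * (p.G - q.G)
                         + (p.B - q.B) * (p.B - q.B);
            }
        }
        // Normalized: lower means more similar.
        return badness / (a.Width * a.Height);
    }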
Since this question has gotten some more attention I've decided to add a way to speed this up in cases where you are processing many images many times. I used this approach when I had several tens of thousands of images that I needed to compare, and I was sure that a typical pair of images would be wildly different. I also knew that all of my images would be exactly the same dimensions. In a situation in which you are comparing dialog boxes your typical images may be mostly grey-ish, and some of your images may require resizing (although maybe that just indicates a mis-match), in which case this approach may not gain you as much.
The idea is to form a quad-tree where each node represents the average RGB value of the region it covers. So a 4x4 image would have a root node with RGB values equal to the average RGB value of the whole image, its children would have RGB values representing the average RGB value of their respective 2x2 regions, and their children would represent individual pixels. (In practice it is a good idea not to go deeper than a region of about 16x16; at that point you should just start comparing individual pixels.)
Before you start comparing images you will also need to decide on a badness threshold. You won't calculate badnesses above this threshold with any reliable accuracy, so this is basically the threshold at which you are willing to label an image as 'not a match'.
Now when you compare image A to image B, first compare the root nodes of their quad-tree representations. Calculate the badness just as you would for a single-pixel image, and if the badness exceeds your threshold then return immediately and report the badness at this level. Because you are using normalized badnesses, and since badnesses are calculated using squared differences, the badness at any particular level will be equal to or less than the badness at lower levels, so if it exceeds the threshold at any point you know it will also exceed the threshold at the level of individual pixels.
If the threshold test passes on an nxn image, just drop to the next level down and compare it like it was a 2nx2n image. Once you get low enough just compare the individual pixels. Depending on your corpus of images this may allow you to skip lots of comparisons.
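As a compact illustration, here is a C# sketch of the same coarse-to-fine idea, flattened into a set of precomputed "levels" (the image averaged down to 1x1, 2x2, 4x4, 8x8 and 16x16 cells) rather than an explicit tree; the level sizes, the GetPixel-based averaging and the threshold are all illustrative choices, but it keeps the key property that comparison stops as soon as a coarse level already exceeds the threshold:

    static readonly int[] LevelSizes = { 1, 2, 4, 8, 16 };

    // Precompute, for each level, the average R,G,B of every cell in a size x size grid.
    static double[][] BuildLevels(Bitmap bmp)
    {
        var levels = new double[LevelSizes.Length][];
        for (int l = 0; l < LevelSizes.Length; l++)
        {
            int size = LevelSizes[l];
            var sums = new double[size * size * 3];
            var counts = new int[size * size];

            for (int y = 0; y < bmp.Height; y++)
            for (int x = 0; x < bmp.Width; x++)
            {
                Color c = bmp.GetPixel(x, y);
                int cell = (y * size / bmp.Height) * size + (x * size / bmp.Width);
                sums[cell * 3 + 0] += c.R;
                sums[cell * 3 + 1] += c.G;
                sums[cell * 3 + 2] += c.B;
                counts[cell]++;
            }

            for (int cell = 0; cell < size * size; cell++)
                for (int ch = 0; ch < 3; ch++)
                    sums[cell * 3 + ch] /= Math.Max(1, counts[cell]);

            levels[l] = sums;
        }
        return levels;
    }

    // Compare two precomputed level sets; bail out as soon as a coarse level is already too bad.
    static double CompareLevels(double[][] a, double[][] b, double threshold)
    {
        double badness = 0;
        for (int l = 0; l < LevelSizes.Length; l++)
        {
            badness = 0;
            int cells = LevelSizes[l] * LevelSizes[l];
            for (int i = 0; i < cells * 3; i++)
            {
                double d = a[l][i] - b[l][i];
                badness += d * d;
            }
            badness /= cells;
            if (badness > threshold)
                break;   // this pair cannot be a match, no need to look at finer levels
        }
        return badness;
    }

BuildLevels runs once per image, so when you compare every bad image against every good image, the cheap coarse comparisons reject most pairs before any fine-grained work happens.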
I would personally go for an image hashing algorithm.
The goal of image hashing is to transform image content into a feature sequence, in order to obtain a condensed representation.
This feature sequence (i.e. a vector of bits) must be short enough for fast matching and preserve distinguishable features for similarity measurement to be feasible.
There are several algorithms that are freely available through open source communities.
A simple example can be found in this article, where Dr. Neal Krawetz shows how the Average Hash algorithm works (a small C# sketch of these steps follows the list):
Reduce size. The fastest way to remove high frequencies and detail is to shrink the image. In this case, shrink it to 8x8 so that there are 64 total pixels. Don't bother keeping the aspect ratio, just crush it down to fit an 8x8 square. This way, the hash will match any variation of the image, regardless of scale or aspect ratio.
Reduce color. The tiny 8x8 picture is converted to a grayscale. This changes the hash from 64 pixels (64 red, 64 green, and 64 blue) to 64 total colors.
Average the colors. Compute the mean value of the 64 colors.
Compute the bits. This is the fun part. Each bit is simply set based on whether the color value is above or below the mean.
Construct the hash. Set the 64 bits into a 64-bit integer. The order does not matter, just as long as you are consistent. (I set the bits from left to right, top to bottom using big-endian.)
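For reference, here is a small C# sketch of those five steps using System.Drawing (plus System.Linq for Average); the grayscale weights below are the usual luminance coefficients, which the quoted article does not prescribe, and the bit order is simply left to right, top to bottom:

    static ulong AverageHash(Bitmap source)
    {
        // 1. Reduce size: squash the image down to 8x8, ignoring aspect ratio.
        using (var small = new Bitmap(source, new Size(8, 8)))
        {
            // 2. Reduce color: convert the 64 pixels to grayscale.
            var gray = new double[64];
            for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
            {
                Color c = small.GetPixel(x, y);
                gray[y * 8 + x] = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;
            }

            // 3. Average the colors.
            double mean = gray.Average();

            // 4 + 5. One bit per pixel: 1 if above the mean, 0 otherwise.
            ulong hash = 0;
            for (int i = 0; i < 64; i++)
                if (gray[i] > mean)
                    hash |= 1UL << i;
            return hash;
        }
    }

    // Hamming distance between two hashes: the number of bits that differ.
    static int HammingDistance(ulong a, ulong b)
    {
        ulong x = a ^ b;
        int count = 0;
        while (x != 0) { count++; x &= x - 1; }
        return count;
    }

You would then rank or bucket image pairs by HammingDistance(hashA, hashB); near-identical dialogs should end up only a few bits apart.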
David Oftedal wrote a C# command-line application which can classify and compare images using the Average Hash algorithm.
(I tested his implementation with your sample images and I got a 98.4% similarity).
The main benefit of this solution is that you read each image only once, create the hashes and classify them based upon their similarity (using, for example, the Hamming distance).
In this way you decouple the feature extraction phase from the classification phase, and you can easily switch to another hashing algorithm if you find it's not accurate enough.
Edit
You can find a simple example here (It includes a test set of 40 images and it gets a 40/40 score).
Here's a topic discussing image similarity, with algorithms already implemented in the OpenCV library. You should have no problem importing the low-level functions into your C# application.
The Commercial TinEye API is a really good option.
I've done image matching programs in the past, and image processing technology these days is amazing; it's advanced so much.
PS: here's where those two random pics you pulled from Google came from: http://www.tineye.com/search/1ec9ebbf1b5b3b81cb52a7e8dbf42cb63126b4ea/
Since this is a one-off job, I'd make do with a script (choose your favorite language; I'd probably pick Perl) and ImageMagick. You could use C# to accomplish the same as the script, although with more code. Just call the command line utilities and parse the resulting output.
The script to check a pair for similarity would be about 10 lines as follows:
First retrieve the sizes with identify and check that the aspect ratios are nearly the same. If not, no match. If so, scale the larger image to the size of the smaller with convert. You should experiment a bit in advance with filter options to find the one that produces the most similarity in known-equivalent images; nine of them are available.
Then use the compare function to produce a similarity metric. Compare is smart enough to deal with translation and cropping. Experiment to find a similarity threshold that doesn't provide too many false positives.
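If you would rather stay in C# than script it in Perl, a minimal sketch of shelling out to ImageMagick's compare might look like the following; it assumes compare is installed and on the PATH (with ImageMagick 7 you may need to invoke it as "magick compare") and that the two files have already been scaled to the same size. compare writes the metric to standard error, e.g. "1524.75 (0.0232648)":

    using System.Diagnostics;

    static double RootMeanSquaredError(string fileA, string fileB)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "compare",
            Arguments = $"-metric RMSE \"{fileA}\" \"{fileB}\" null:",
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(psi))
        {
            string output = process.StandardError.ReadToEnd();
            process.WaitForExit();
            // Take the first token ("1524.75") and parse it as the similarity metric.
            return double.Parse(output.Trim().Split(' ')[0],
                                System.Globalization.CultureInfo.InvariantCulture);
        }
    }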
I would do something like this :
If you already know how the blurred images have been blurred, apply the same function to the high quality images before comparison.
Then compare the images using least-squares as suggested above.
The lowest value should give you a match; ideally, you would get 0 if both images are identical.
To speed things up, you could perform most comparisons on downsampled images and then refine on a selected subsample of the images.
If you don't know how they were blurred, try various likely functions (JPEG compression, downsampling, ...) and repeat.
You could try Content-Based Image Retrieval (CBIR).
To put it bluntly:
For every image in the database, generate a fingerprint using a Fourier transform
Load the source image and make a fingerprint of the image
Calculate the Euclidean distance between the source and all the images in the database
Sort the results
I think a hybrid approach would be best to solve your particular batch matching problem:
Apply the image hashing algorithm suggested by @Paolo Morreti to all images
For each image in one set, find the subset of images with a hash closer than a set distance
For this reduced search space you can now apply the expensive matching methods suggested by @Running Wild or @Raskolnikov ... the best one wins.
IMHO, the best solution is to blur both images and then use some similarity measure (correlation, mutual information, etc.) to get the top K (K=5 maybe?) choices.
If you extract the contours from the image, you can use ShapeContext to get a very good matching of images.
ShapeContext is built for exactly this (comparing images based on mutual shapes).
ShapeContext implementation links:
Original publication
A good PPT on the subject
CodeProject page about ShapeContext
*You might need to try a few contour extraction techniques like thresholding or a Fourier transform, or take a look at this CodeProject page about contour extraction
Good Luck.
If you calculate just the pixel difference of images, it will work only if the images are the same size, or if you know exactly how to scale them horizontally and vertically; also, you will not have any shift or rotation invariance.
So I recommend using a pixel-difference metric only if you have the simplest form of the problem (the images are the same in all characteristics but quality - and by the way, why is the quality different? JPEG artifacts, or just rescaling?); otherwise I recommend normalized cross-correlation, which is a more stable metric.
You can do it with FFTW or with OpenCV.
If the bad quality is just a result of lower resolution, then:
rescale high quality image to low quality image resolution (or rescale both to equal low resolution)
compare each pixel color to find closest match
So, for example, rescaling all of the images to 32x32 and comparing that set pixel by pixel should give you quite reasonable results, and it's still easy to do. The rescaling method can make a difference here, though.
You could try a block-matching algorithm, although I'm not sure of its exact effectiveness against your specific problem - http://scien.stanford.edu/pages/labsite/2001/ee368/projects2001/dropbox/project17/block.html - http://www.aforgenet.com/framework/docs/html/05d0ab7d-a1ae-7ea5-9f7b-a966c7824669.htm
Even if this does not work, you should still check out the Aforge.net library. There are several tools there (including block matching from above) that could help you in this process - http://www.aforgenet.com/
I really like Running Wild's algorithm and I think it can be even more effective if you could make the two images more similar, for example by decreasing the quality of the better one.
Running Wild's answer is very close. What you are doing here is calculating the Peak Signal-to-Noise Ratio, or PSNR, between each pair of images. In your case you really only need the Mean Squared Error, but the squaring component helps a great deal in calculating the difference between images.
PSNR Reference
Your code should look like:
sum = 0.0
for(imageHeight){
    for(imageWidth){
        errorR = firstImage(r,x,y) - secondImage(r,x,y)
        errorG = firstImage(g,x,y) - secondImage(g,x,y)
        errorB = firstImage(b,x,y) - secondImage(b,x,y)
        totalError = square(errorR) + square(errorG) + square(errorB)
        sum += totalError
    }
}
meanSquaredError = (sum / (imageHeight * imageWidth)) / 3
I assume the images from the two databases show the same dialog and that the images should be close to identical but of different quality? Then matching images will have the same (or very close to the same) aspect ratio.
If the low-quality images were produced from the high-quality images (or an equivalent image), then you should use the same image processing procedure as a preprocessing step on the high-quality image and match it against the low-quality image database. Then pixel-by-pixel comparison or histogram matching should work well.
Image matching can use a lot of resources if you have many images. Maybe a multipass approach is a good idea? For example:
Pass 1: use simple measures like aspect ratio to group images (width and height fields in the db?) (computationally cheap)
Pass 2: match or group by histogram for the 1st color channel (or all channels) (relatively computationally cheap)
I will also recommend OpenCV. You can use it with C, C++ and Python (and soon Java).
Just thinking out loud:
If you load the two images that should be compared as layers and combine them (subtract one from the other), you get a new image (some drawing programs can be scripted to do batch conversion, or you could use the GPU by writing a tiny DirectX or OpenGL program).
Next you would have to measure the brightness of the resulting image; the darker it is, the better the match.
Have you tried contour/thresholding techniques in combination with a walking average window (for RGB values)?