I'm developing an application in C# to extract text under different lighting conditions.
My problem is that sometimes there are different brightness levels in the image, like this:
So I can't use a pre-calculated threshold for the whole image, or I will lose some letters.
I'm looking for an algorithm/snippet/function that can apply the right threshold/binarization to the image.
I found BradleyLocalThresholding in AForge; it is better than other non-adaptive methods, but it loses some details (for example, the G in the image becomes an O).
Can anyone suggest a better way?
Yes, use Niblack (OpenCV has it as a function). Basically it uses the local average to construct a variable threshold, and it works best for OCR. Depending on the image resolution, you might also want to bicubically upsample by a factor of 2x or 3x before thresholding.
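If you would rather prototype the idea directly in C# before pulling in OpenCV, below is a minimal sketch of Niblack's rule (threshold = local mean + k * local standard deviation). The method name, window size, and k value are my own choices and would need tuning for your images; a real implementation would also use integral images so the local statistics are not recomputed for every window.

// Minimal Niblack local thresholding sketch (needs: using System;).
// gray is a grayscale image as byte[height, width]; typical tuning: window 15-31, k around -0.2.
static byte[,] NiblackThreshold(byte[,] gray, int window = 25, double k = -0.2)
{
    int h = gray.GetLength(0), w = gray.GetLength(1), r = window / 2;
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            // Mean and standard deviation over the local window (clipped at the borders).
            double sum = 0, sumSq = 0; int count = 0;
            for (int dy = -r; dy <= r; dy++)
                for (int dx = -r; dx <= r; dx++)
                {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    double v = gray[ny, nx];
                    sum += v; sumSq += v * v; count++;
                }
            double mean = sum / count;
            double std = Math.Sqrt(Math.Max(0, sumSq / count - mean * mean));
            double threshold = mean + k * std;
            result[y, x] = (byte)(gray[y, x] > threshold ? 255 : 0);
        }
    }
    return result;
}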
It's quite difficult since the quality of your images is so low, but you could try an iterative global thresholding approach as follows:
1. Select an initial estimate of the threshold T (the mean of the image is the usual choice).
2. Segment the image using T, which yields two groups: G1, consisting of all points with values <= T, and G2, consisting of all points with values > T.
3. Compute the mean values M1 and M2 of the points in G1 and G2, respectively.
4. Compute a new threshold T = (M1 + M2) / 2.
5. Repeat steps 2 through 4 until the change in T is small enough.
The trick is not to apply it to the whole image, but to break the image up into blocks of (for example) 5x5 pixels and apply it to each block individually, which would give you:
Below is an implementation in R, which I'm sure you could reproduce; a rough C# translation follows after the R code.
getT = function(y){
    t = mean(y)
    mu1 = mean(y[y >= t])
    mu2 = mean(y[y < t])
    i = 1
    # iterate until the two group means stop changing
    while(i > 0){
        cmu1 = mean(y[y >= t])
        cmu2 = mean(y[y < t])
        if(i > 1 & cmu1 == mu1 & cmu2 == mu2){
            print(paste('done t=', t))
            return(t)
        }else{
            mu1 = cmu1
            mu2 = cmu2
            t = (mu1 + mu2)/2
            print(paste('new t=', t))
        }
        i = i + 1
    }
}
# split the image into roughly 5x5 blocks and threshold each block separately
r = seq(1, nrow(image), by = 5)
c = seq(1, ncol(image), by = 5)
r[length(r)] = nrow(image)
c[length(c)] = ncol(image)

y = image
for(i in 2:length(r)){
    for(j in 2:length(c)){
        block = image[r[i-1]:r[i], c[j-1]:c[j]]
        t = getT(block)
        y[r[i-1]:r[i], c[j-1]:c[j]] = (block > t) + 0
    }
}
display(y)  # display() is assumed to come from an image package such as EBImage
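If you prefer to keep everything in C#, here is a rough, untested translation of the same per-block iterative thresholding, assuming you already have the image as a 2D byte array of gray levels; the method names and the convergence tolerance are my own choices.

// Iterative (isodata-style) threshold for one block of gray levels (needs: using System;).
static double GetThreshold(byte[,] gray, int r0, int r1, int c0, int c1)
{
    double t = 0; int n = 0;
    for (int r = r0; r < r1; r++)
        for (int c = c0; c < c1; c++) { t += gray[r, c]; n++; }
    t /= n; // start from the block mean

    while (true)
    {
        double sum1 = 0, sum2 = 0; int n1 = 0, n2 = 0;
        for (int r = r0; r < r1; r++)
            for (int c = c0; c < c1; c++)
            {
                if (gray[r, c] > t) { sum1 += gray[r, c]; n1++; }
                else { sum2 += gray[r, c]; n2++; }
            }

        if (n1 == 0 || n2 == 0) return t;          // uniform block: keep the current threshold
        double newT = (sum1 / n1 + sum2 / n2) / 2; // midpoint of the two group means
        if (Math.Abs(newT - t) < 0.5) return newT; // converged
        t = newT;
    }
}

// Binarize the image block by block.
static byte[,] BlockThreshold(byte[,] gray, int blockSize = 5)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    var result = new byte[h, w];
    for (int r0 = 0; r0 < h; r0 += blockSize)
        for (int c0 = 0; c0 < w; c0 += blockSize)
        {
            int r1 = Math.Min(r0 + blockSize, h), c1 = Math.Min(c0 + blockSize, w);
            double t = GetThreshold(gray, r0, r1, c0, c1);
            for (int r = r0; r < r1; r++)
                for (int c = c0; c < c1; c++)
                    result[r, c] = (byte)(gray[r, c] > t ? 255 : 0);
        }
    return result;
}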
The other option besides a local threshold would be to adjust for the varying illumination. There are methods that attempt to correct the illumination and make it uniform across the image. You could then use a constant threshold, or continue to use a local threshold, with perhaps better success. If the images are like the one you show, then you could use the brighter squares around the letters as the key to adjusting the illumination.
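If you want to try the illumination-correction route, one simple approach is to estimate the background with a heavy blur and divide the original image by that estimate, which leaves the letters dark on a roughly uniform background. Below is a minimal, unoptimized sketch under the assumption that the text is darker than its surroundings; the method name and radius are mine, and in practice you would replace the naive box blur with an integral image or a Gaussian blur from your imaging library.

// Rough illumination flattening: divide each pixel by a local background estimate (needs: using System;).
static byte[,] FlattenIllumination(byte[,] gray, int radius = 20)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            // Local background estimate: mean over a large window (a simple box blur).
            double sum = 0; int count = 0;
            for (int dy = -radius; dy <= radius; dy++)
                for (int dx = -radius; dx <= radius; dx++)
                {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    sum += gray[ny, nx]; count++;
                }
            double background = Math.Max(1.0, sum / count);

            // Normalize: the background maps to roughly 255, darker letters stay dark.
            result[y, x] = (byte)Math.Min(255.0, gray[y, x] / background * 255.0);
        }
    return result;
}

After this step a single global threshold (or the local threshold above) should behave much more consistently across the bright and dark regions.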
I am trying to compare two bitmaps to one another. One is premade, and the other consists of a small capture of the main screen, filtered for everything besides full white. I now need a way to compare the number of white pixels in the live Bitmap to the number in the premade Bitmap (101 white pixels). A way I know of would be the Bitmap.Get/SetPixel methods, but they are really slow, and as this is used in a rather time-critical application, they are unsuitable.
Especially since I could cut down the filtering process by a factor of 70 by following this guide:
https://www.codeproject.com/articles/617613/fast-pixel-operations-in-net-with-and-without-unsa
I also can't just compare the two Bitmaps directly, as the live one will usually not have its white pixels in the same positions, but it will share the same number of white pixels.
So yeah. It would be great if one of you had a time-efficient solution to this problem.
Edit
Huge oversight on my part. Looking at the filtering method, it becomes apparent that one can simply increment a counter every time a pixel is not filtered out.
So I just changed this line of code in the filter function
row[rIndex] = row[bIndex] = row[gIndex] = distance > toleranceSquared ? unmatchingValue : matchingValue;
to this
if (distance > toleranceSquared)
{
    row[rIndex] = row[bIndex] = row[gIndex] = unmatchingValue;
}
else
{
    row[rIndex] = row[bIndex] = row[gIndex] = matchingValue;
    WhitePixelCount += 1;
}
I have a project on the MonoGame platform. The purpose of the project is to calculate the view factor of geometry placed into the scene using the orthographic method. At a basic level, I put a cube in the scene and a camera across from the cube. As I look at the cube through the camera, I need to count the number of pixels of the object seen from that perspective under an orthographic projection.
I already have a solution, but it is very slow: I count the number of pixels with a certain color and then divide that number by the total number of pixels on the screen. I have heard of a technique that involves using OcclusionQuery, but I guess I have to do some shader programming in order to use it, of which I do not have a clue. Can you suggest another technique that is easier to implement and faster than what I currently do, or explain how OcclusionQuery works? Here, for example, I count the total number of grey pixels and then divide it by the total screen area.
You can find my code below:
private void CalculateViewFactor(Color[] data)
{
    int objectPixelCount = 0;

    // The first pixel is the background color; count every pixel that differs from it.
    Color color = data[0];

    foreach (Color item in data)
    {
        // Note: this needs to be OR, not AND; a pixel belongs to the object if any channel differs.
        if (item.R != color.R || item.G != color.G || item.B != color.B)
            objectPixelCount++;
    }

    Console.WriteLine(objectPixelCount);
    Console.WriteLine(data.Length);
    Console.WriteLine((float)objectPixelCount / data.Length);
}
Due to the fact that the color of the first pixel of the screen is also the color of the background, I take the RGB values of the first pixel, compare them to every other pixel on the screen, and count the number of pixels whose color differs from the first pixel's.
But since I know this method is pretty slow, I want to adapt OcclusionQuery into my code. If you could help me, I would be grateful.
This is pretty tricky to do right, and I can only suggest an "alternative", not necessarily a more performant or better-designed approach.
In case you don't really need to know the exact number of drawn pixels, you can approximate it. There is a technique called Monte Carlo integration.
Start off by creating N points on the screen with random coordinates and check the color at each of them. Divide the number of points that have your object's color by the total number of tested points (that is, N). What you get is an approximate ratio of the pixels your object occupies on the final screen. If you now multiply this ratio by the total number of pixels on the screen (that is, WidthPx * HeightPx), you get an approximate number of pixels occupied by the object. A small sketch follows after the lists below.
Advantages:
- Select a bigger N for a more accurate result, or a smaller N for better performance
- The algorithm is simple, so it is hard to get wrong
Disadvantages:
- It's random, not deterministic (you'll get a slightly different result every time)
- It's approximate, never exact
- You'll need to generate 2 * N random values (two for each test point), and generating random values is a relatively slow operation
- I'm sure that later you'll want to draw textures/shading on the screen, and then this technique won't work, as you won't be able to distinguish your object's pixels from the others. You can still maintain a smaller off-screen buffer where you draw the same objects without any shading, each object in its own unique flat color, and apply the Monte Carlo algorithm to that, but of course that costs computing resources.
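A minimal sketch of that idea, assuming you already have the back buffer contents in a Color[] array (as in your CalculateViewFactor) and treat the first pixel as the background color; the method name and the default N are my own choices:

// Approximate the fraction of the screen covered by the object by sampling n random pixels
// (needs: using System; using Microsoft.Xna.Framework;).
private static readonly Random Rng = new Random();

private float EstimateViewFactor(Color[] data, int screenWidth, int screenHeight, int n = 2000)
{
    Color background = data[0];            // assumption: the first pixel is background
    int hits = 0;

    for (int i = 0; i < n; i++)
    {
        int x = Rng.Next(screenWidth);
        int y = Rng.Next(screenHeight);
        Color c = data[y * screenWidth + x];

        if (c.R != background.R || c.G != background.G || c.B != background.B)
            hits++;                         // the sampled pixel belongs to the object
    }

    return (float)hits / n;                 // approximate ratio of object pixels to all pixels
}

Multiplying the returned ratio by screenWidth * screenHeight gives the approximate pixel count, if you need the absolute number rather than the view factor itself.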
I am trying to find the coordinates of one image inside another using the AForge framework:
ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching();
TemplateMatch[] matchings = tm.ProcessImage(new Bitmap("image.png"), new Bitmap(@"template.png"));
int x_coordinate = matchings[0].Rectangle.X;
ProcessImage takes about 2 minutes to run.
Image's size is about 1600x1000 pixels
Template's size is about 60x60 pixels
Does anyone know how to speed up that process?
In addition to the other answers, I would say that for your case:
Image's size is about 1600x1000 pixels. Template's size is about 60x60 pixels.
this framework is not the best fit. The thing you are trying to achieve is more "search an image inside another image" than "compare two images with different resolutions" (the way "Search Google for this image" can be used).
About this so-called pyramid search:
it's true that the algorithm works way faster for bigger images. Actually, the image pyramid is based on template matching. If we take the most popular implementation (which I found and used):
private static bool IsSearchedImageFound(this Bitmap template, Bitmap image)
{
    const Int32 divisor = 4;
    const Int32 epsilon = 10;

    ExhaustiveTemplateMatching etm = new ExhaustiveTemplateMatching(0.90f);

    TemplateMatch[] tm = etm.ProcessImage(
        new ResizeNearestNeighbor(template.Width / divisor, template.Height / divisor).Apply(template),
        new ResizeNearestNeighbor(image.Width / divisor, image.Height / divisor).Apply(image)
    );

    if (tm.Length == 1)
    {
        Rectangle tempRect = tm[0].Rectangle;

        if (Math.Abs(image.Width / divisor - tempRect.Width) < epsilon
            &&
            Math.Abs(image.Height / divisor - tempRect.Height) < epsilon)
        {
            return true;
        }
    }

    return false;
}
It should give you a picture close to this one:
As a bottom line: try a different approach, maybe something closer to Sikuli integration with .NET. Or you can try Accord.NET, the newer version of AForge.
If this is too much work, you can try to just extend your screenshot functionality with cropping of the page element that is required (a Selenium example).
2 minutes seems too much for a recent CPU with the image and template sizes you are using. But there are a couple of ways to speed up the process. The first one is to use a smaller scale; this is called pyramid search. You can divide the image and the template by 4, so that you have an image of 400x250 and a template of 15x15, and match this smaller template. This runs much faster but is also less accurate. You can then take the interesting locations found with the 15x15 template and search the corresponding region of the 1600x1000 image with the 60x60 template, instead of searching the whole image (a sketch is given at the end of this answer).
Depending on the template details you may try at an even lower scale (1/8) instead.
Another thing to know is that a bigger template runs faster. This is counter-intuitive, but with a bigger template there are fewer candidate positions to test. So if possible, try to use a bigger template. Sometimes this optimization is not possible, if your template is already as big as it can be.
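A rough sketch of that two-stage idea with AForge is shown below. The method name, the scale factor, and the similarity thresholds are my own choices, and it assumes your bitmaps are already in a pixel format AForge accepts (as in your current code); ProcessImage with a search-zone Rectangle is, as far as I recall, the overload AForge provides for restricting the search to a region.

using System;
using System.Drawing;
using AForge.Imaging;
using AForge.Imaging.Filters;

// Stage 1: match on images downscaled by 'divisor' to find a rough location quickly.
// Stage 2: re-run the full-resolution match, but only inside a small zone around that location.
static Rectangle? FindTemplate(Bitmap image, Bitmap template, int divisor = 4)
{
    var smallImage = new ResizeNearestNeighbor(image.Width / divisor, image.Height / divisor).Apply(image);
    var smallTemplate = new ResizeNearestNeighbor(template.Width / divisor, template.Height / divisor).Apply(template);

    var coarse = new ExhaustiveTemplateMatching(0.80f).ProcessImage(smallImage, smallTemplate);
    if (coarse.Length == 0)
        return null;

    // Scale the coarse hit back up and pad it a little to allow for rounding errors.
    Rectangle rough = coarse[0].Rectangle;
    int zx = Math.Max(0, rough.X * divisor - divisor);
    int zy = Math.Max(0, rough.Y * divisor - divisor);
    int zw = Math.Min(image.Width - zx, template.Width + 2 * divisor);
    int zh = Math.Min(image.Height - zy, template.Height + 2 * divisor);

    var fine = new ExhaustiveTemplateMatching(0.90f).ProcessImage(image, template, new Rectangle(zx, zy, zw, zh));
    return fine.Length > 0 ? fine[0].Rectangle : (Rectangle?)null;
}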
public void checkForCollision() {
    int headX = cells[0].x;
    int headY = cells[0].y;
    int noOfParts = nPoints;

    for (int i = 1; i < noOfParts; i++) {
        int tempX = cells[i].x;
        int tempY = cells[i].y;
        if (tempX == headX && tempY == headY) {
            JOptionPane.showMessageDialog(null, "Head hit body");
            // EndGameCollectScore etc.
        }
    }
}
EDIT: cells[] is an array of type Point, and noOfParts is just how many segments the snake has.
Main question
With the above code I'm trying to compare tempX to headX, but I would like to have a margin for error, e.g. +-5, and I'm unsure how to accomplish this. My reasoning is that the X and Y values might be a few digits apart, so if I allow roughly the radius of one snake segment (see the explanation of 'snake' under Alternative below), then even if the values are a tiny bit off the check should still come back positive.
OR
Alternative
Can anyone suggest a better way of doing this? Basically it's for a Snake game: headX and headY are the head of the snake, the remaining X and Y values in cells are the body, and I'm trying to detect whether the head hits the body.
I tested it and it seemed to work, but after testing again it seems it will only pick up the collision if I make the snake double back on itself for a few squares; e.g. if I cross the body perpendicularly it will not detect the collision.
Also, I am fairly certain that this method is called after each block the snake moves.
Cheers,
Shane.
P.S. Running on very little sleep and way too much sugar in my blood. If the above doesn't make a lot of sense and you need further clarification, let me know.
int eps = 5;
if (Math.abs(tempX - headX) <= eps && Math.abs(tempY - headY) <= eps) {
// ...
}
To check whether two points are within a given distance of each other, compute the distance between them. You can avoid taking a square root by comparing squared values, like this:
int distSq = (tempX-headX)*(tempX-headX) + (tempY-headY)*(tempY-headY);
int minDist = 5;
if (distSq < minDist*minDist) {
// too close
}
I don't know how your snake looks, but if it has a complex shape, looking for a hit can be expensive in terms of speed. You can speed up collision detection by doing a quick test first, to see whether a collision is possible at all. You can do this with a bounding box: keep track of the minimum and maximum x and y positions of the snake body, and only if a coordinate lies within these boundaries do you take the exact shape of the snake into account. How that is done depends on how the snake is represented: check each tile or each pixel the snake is made of, or possibly check whether the coordinate lies within a polygon if the snake outline is defined by one. (I'm not going to explain how this works here, but you will find algorithms if you google a bit.)
If you need to calculate the distance to another point (the snake head), you can use different metrics. If only horizontal and vertical movements are possible within the game, the so-called Manhattan or taxi distance can be used: d = |x1-x0| + |y1-y0|. It consists of adding the x and y distances. Alternatively, you can use the maximum of the two distances: d = Max(|x1-x0|, |y1-y0|), which corresponds to 2kay's approach.
If you need the exact distance, apply the Pythagorean formula. In order to compare the distance with the error margin, you don't need to calculate the square root; instead, compare the square of the distance with the square of the error margin, which saves time: (x1-x0)^2 + (y1-y0)^2 < error_margin^2.
I have made a simple webcam-based application that detects the "edges of motion", i.e. it draws a texture showing where the pixels of the current frame differ significantly from the previous frame. This is my code:
// LastTexture is a Texture2D of the previous frame.
// CurrentTexture is a Texture2D of the current frame.
// DifferenceTexture is another Texture2D.
// Variance is an int, default 100;
Color[] differenceData = new Color[CurrentTexture.Width * CurrentTexture.Height];
Color[] currentData = new Color[CurrentTexture.Width * CurrentTexture.Height];
Color[] lastData = new Color[LastTexture.Width * LastTexture.Height];
CurrentTexture.GetData<Color>(currentData);
LastTexture.GetData<Color>(lastData);
for (int i = 0; i < currentData.Length; i++)
{
    int sumCD = ColorSum(currentData[i]); // ColorSum is the same as c.R + c.B + c.G where c is a Color.
    int sumLD = ColorSum(lastData[i]);

    if ((sumCD > sumLD - Variance) && (sumCD < sumLD + Variance))
        differenceData[i] = new Color(0, 0, 0, 0); // Within +/- Variance (default 100): no significant change, so drawn black.
    else
        differenceData[i] = new Color(0, (byte)Math.Abs(sumCD - sumLD), 0); // Changed significantly, so drawn a shade of green.
}
DifferenceTexture = new Texture2D(game1.GraphicsDevice, CurrentTexture.Width, CurrentTexture.Height);
DifferenceTexture.SetData<Color>(differenceData);
LastTexture = new Texture2D(game1.GraphicsDevice,CurrentTexture.Width, CurrentTexture.Height);
LastTexture.SetData<Color>(currentData);
Is there a way to offload this calculation to the GPU using shaders? (It runs at about 25/26 fps with the above method, which is a bit slow.) I have a basic understanding of how HLSL shaders work and don't expect a full solution; I just want to know whether this is possible, how to get the "difference" texture data back from the shader, and whether it would actually be any faster.
Thanks in advance.
You could sample the two textures inside a pixel shader and write the difference out as the colour value. If you set up a render target, the colour information you output from the shader will be stored in that texture instead of the framebuffer.
I don't know what sort of speed gain you'd expect to see, but that's how I'd do it.
*edit - Oh, and I forgot to say: be aware of the sampling type you use, as it will affect the results. If you want your algorithm to translate directly to the GPU, start with point sampling.
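For reference, a minimal sketch of the C# side of that setup in XNA. It assumes you have written and compiled a pixel shader, loaded here as "DifferenceEffect" (a name I made up), that samples the current frame from the sprite texture and the previous frame from a second sampler, and that spriteBatch and game1 come from your existing code:

// One-time setup (e.g. in LoadContent).
RenderTarget2D differenceTarget = new RenderTarget2D(
    game1.GraphicsDevice, CurrentTexture.Width, CurrentTexture.Height);
Effect differenceEffect = game1.Content.Load<Effect>("DifferenceEffect"); // hypothetical .fx doing the per-pixel diff

// Each frame: render the difference into the render target instead of computing it on the CPU.
game1.GraphicsDevice.SetRenderTarget(differenceTarget);
game1.GraphicsDevice.Textures[1] = LastTexture;   // previous frame, read by the shader's second sampler
spriteBatch.Begin(SpriteSortMode.Immediate, BlendState.Opaque,
    SamplerState.PointClamp, DepthStencilState.None, RasterizerState.CullNone, differenceEffect);
spriteBatch.Draw(CurrentTexture, Vector2.Zero, Color.White);
spriteBatch.End();
game1.GraphicsDevice.SetRenderTarget(null);

// differenceTarget can now be drawn like any Texture2D. GetData<Color>() would copy it back
// to the CPU if you still need the raw values, but that read-back is the slow part, so avoid it if you can.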
Regarding your comment above about deciding to use a dual thread approach to your problem, check out the .Net Parallel Extensions CTP from Microsoft. microsoft.com
If you're not planning on deploying to an XBox360, this library works great with XNA, and I've seen massive speed improvements in certain loops and iterations.
You would basically only have to change a couple lines of code, for example:
for (int i = 0; i < currentData.Length; i++)
{
// ...
}
would change to:
Parallel.For(0, currentData.Length, delegate(int i)
{
// ...
});
to automatically make each core in your processor help out with the number crunching. It's fast and excellent.
Good luck!
The Sobel operator or something like it is used in game engines and other real-time applications for edge detection. It is trivial to write as a pixel shader.
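For reference, here is what the operator itself computes, sketched on the CPU over a plain 2D intensity array (the array layout and method name are mine); in a real-time setup you would express the same two 3x3 convolutions in the pixel shader instead:

// Sobel edge magnitude over a grayscale image stored as float[height, width] with 0..1 intensities
// (needs: using System;).
static float[,] Sobel(float[,] gray)
{
    // Horizontal and vertical 3x3 kernels.
    int[,] gx = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    int[,] gy = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    int h = gray.GetLength(0), w = gray.GetLength(1);
    var edges = new float[h, w];

    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++)
        {
            float sx = 0, sy = 0;
            for (int ky = -1; ky <= 1; ky++)
                for (int kx = -1; kx <= 1; kx++)
                {
                    float v = gray[y + ky, x + kx];
                    sx += gx[ky + 1, kx + 1] * v;
                    sy += gy[ky + 1, kx + 1] * v;
                }
            edges[y, x] = (float)Math.Sqrt(sx * sx + sy * sy); // gradient magnitude
        }

    return edges;
}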