I'm trying to find a digit within an image. To test my code, I took an image of the digit and then used AForge's exhaustive template matching algorithm to search for it in another image. But I think there is a problem: the digit is obviously not rectangular, whereas the template image that contains it is. That means a lot of pixels take part in the comparison that shouldn't. Is there any way to make this comparison while ignoring those pixels? If not in AForge, then maybe in EMGU/OpenCV or Octave?
Here's my code:
```csharp
using System.Drawing;
using AForge.Imaging;
using AForge.Imaging.Filters;

Grayscale gray = new GrayscaleRMY();
Bitmap template = (Bitmap)Bitmap.FromFile(@"5template.png");
template = gray.Apply(template);
Bitmap image = (Bitmap)Bitmap.FromFile(filePath);
Bitmap sourceImage = gray.Apply(image);
// 0.7f = minimum similarity for a location to be reported as a match
ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching(0.7f);
TemplateMatch[] matchings = tm.ProcessImage(sourceImage, template);
```
As mentioned above in the comment, you should preprocess your data to improve matching.
The first thing that comes to mind is morphological opening (erode, then dilate) to reduce the background noise.
1. Read in your image and invert it so that your characters are white.
2. Apply opening with the smallest possible structuring element/window (3x3).
3. You could also try slightly larger structuring elements (5x5).
4. Invert it back to the original.
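The invert, open, invert-back recipe can be sketched in plain C++ without any image library. This is a minimal sketch, assuming an 8-bit grayscale image stored as rows of `std::uint8_t` (0 = black, 255 = white) and a square k x k structuring element; `invert`, `filterSquare`, `openImage` and `removeDarkSpecks` are hypothetical helpers, not part of AForge, EMGU or Octave.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// 8-bit grayscale image, img[y][x]: 0 = black, 255 = white.
using Gray = std::vector<std::vector<std::uint8_t>>;

// Invert intensities so dark characters become white (and back again).
Gray invert(const Gray& img) {
    Gray out = img;
    for (auto& row : out)
        for (auto& v : row) v = static_cast<std::uint8_t>(255 - v);
    return out;
}

// Grayscale erosion (takeMax = false, minimum) or dilation (takeMax = true,
// maximum) over a k x k square window; pixels outside the image are ignored.
Gray filterSquare(const Gray& img, int k, bool takeMax) {
    assert(k > 0 && k % 2 == 1 && "window size must be odd");
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    const int r = k / 2;
    Gray out = img;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            std::uint8_t best = static_cast<std::uint8_t>(takeMax ? 0 : 255);
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    best = takeMax ? std::max(best, img[yy][xx])
                                   : std::min(best, img[yy][xx]);
                }
            out[y][x] = best;
        }
    return out;
}

// Opening = erosion followed by dilation with the same window.
Gray openImage(const Gray& img, int k) {
    return filterSquare(filterSquare(img, k, false), k, true);
}

// Whole recipe: invert, open with a k x k window (3 or 5), invert back.
Gray removeDarkSpecks(const Gray& img, int k = 3) {
    return invert(openImage(invert(img), k));
}
```

An isolated dark speck is erased while a character block at least k pixels thick survives. Strokes thinner than k pixels are erased as well, which is why it makes sense to start with the smallest window.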
See if that helps!
I've been using Microsoft Office Document Imaging for OCR to get the text from an image. For this image, I'd like to know which preprocessing steps (in the right order) will improve its quality before feeding it to the OCR engine, preferably in C#. The image of the screen is captured with a webcam. So far I've tried binarization (thresholding), Gaussian blur, sharpening, mean removal, and increasing the brightness and contrast of the image, but the OCR engine still can't get the exact text (maybe 50% success). Thanks.
I have played with your image a bit in C++ with my DIP lib, and here is the result:
```cpp
// picture is the author's own image class (not a standard library type)
picture pic0, pic1;
pic0.load("ocr_green.png");
pic0.pixel_format(_pf_u);    // RGB -> grayscale <0-765>
pic0.enhance_range();        // remove DC offset and use full dynamic range <0-765>
pic0.normalize(8, false);    // try to normalize illumination (equalize light) based on analysis of 8x8 squares; do not recolor saturated squares with the average color
pic0.enhance_range();        // remove DC offset and use full dynamic range <0-765>
pic1 = pic0;                 // copy result to pic1
pic0.pixel_format(_pf_rgba); // grayscale -> RGBA

int x, y, c, c0, c1;
for (y = 0; y < pic1.ys; y++)           // process all horizontal lines
{
    c0 = pic1.p[y][0].dd; c1 = c0;      // find min and max intensity in the line
    for (x = 0; x < pic1.xs; x++)
    {
        c = pic1.p[y][x].dd;
        if (c0 > c) c0 = c;
        if (c1 < c) c1 = c;
    }
    if (c1 - c0 < 700)                  // if the difference is not big enough, blacken the line...
        for (x = 0; x < pic1.xs; x++) pic1.p[y][x].dd = 0;
    else                                // ...otherwise binarize it
        for (x = 0; x < pic1.xs; x++)
            if (pic1.p[y][x].dd >= 155) pic1.p[y][x].dd = 765; else pic1.p[y][x].dd = 0;
}
pic1.pixel_format(_pf_rgba);            // grayscale -> RGBA
```
The left image (pic0) is just yours converted to grayscale, with the dynamic range enhanced to the maximum and the illumination equalized.
See "Enhancing dynamic range and normalizing illumination"; you will also find a description of my picture class there.
The right image (pic1) is binarized, but only on horizontal lines with a large enough change in pixel intensity (as mentioned in my comment); the rest is set to black.
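For readers without that `picture` class, here is a minimal plain-C++ sketch of two of the steps: a global min/max contrast stretch (an assumed stand-in for `enhance_range`; the illumination normalization done by `normalize(8, false)` is not reproduced) and the per-line binarization loop. It assumes intensities in 0..765 as in the answer, reuses its 700 and 155 thresholds as defaults, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Grayscale image with intensities in 0..765 (R+G+B), img[y][x].
using Img = std::vector<std::vector<int>>;

// Linear contrast stretch: map the image's [min, max] onto [0, maxValue].
void stretchRange(Img& img, int maxValue = 765) {
    int lo = std::numeric_limits<int>::max();
    int hi = std::numeric_limits<int>::min();
    for (const auto& row : img)
        for (int v : row) {
            lo = std::min(lo, v);
            hi = std::max(hi, v);
        }
    if (hi <= lo) return;  // empty or flat image: nothing to stretch
    for (auto& row : img)
        for (int& v : row) v = (v - lo) * maxValue / (hi - lo);
}

// Per-line binarization: lines whose intensity range is below minRange are set
// to black; all other lines are thresholded at `threshold`.
void binarizeRows(Img& img, int minRange = 700, int threshold = 155, int white = 765) {
    for (auto& row : img) {
        if (row.empty()) continue;
        const auto mm = std::minmax_element(row.begin(), row.end());
        const bool enoughContrast = (*mm.second - *mm.first) >= minRange;
        for (int& v : row) v = (enoughContrast && v >= threshold) ? white : 0;
    }
}
```

Low-contrast lines (mostly background) end up fully black, so only lines that actually cross the text get thresholded.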
This image is of very good quality for OCR. It will binarize seamlessly. Depending on the engine, you will perform the binarization yourself or let the engine do it.
You will probably have to blacken the bottom area so that the characters get separated. As the screen layout is fixed, this can easily be automated.
You also need to check whether this OCR engine knows this font.
You can delimit the white areas by profile analysis (summing the pixels along each row).
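Profile analysis along these lines can be sketched as follows, assuming a binarized image where `true` marks a white pixel; `rowProfile` and `whiteBands` are hypothetical names, and `minCount` is a noise threshold you would have to tune.

```cpp
#include <utility>
#include <vector>

// Binary image, img[y][x]: true = white pixel.
using Bin = std::vector<std::vector<bool>>;

// Horizontal projection profile: number of white pixels in each row.
std::vector<int> rowProfile(const Bin& img) {
    std::vector<int> profile;
    for (const auto& row : img) {
        int count = 0;
        for (bool p : row) count += p ? 1 : 0;
        profile.push_back(count);
    }
    return profile;
}

// Half-open row bands [first, second) in which the profile exceeds minCount.
std::vector<std::pair<int, int>> whiteBands(const std::vector<int>& profile, int minCount = 0) {
    std::vector<std::pair<int, int>> bands;
    int start = -1;  // -1 = not currently inside a band
    for (int y = 0; y < static_cast<int>(profile.size()); ++y) {
        const bool on = profile[y] > minCount;
        if (on && start < 0) start = y;
        if (!on && start >= 0) {
            bands.emplace_back(start, y);
            start = -1;
        }
    }
    if (start >= 0) bands.emplace_back(start, static_cast<int>(profile.size()));
    return bands;
}
```

The same idea applied to columns instead of rows separates the characters within a band.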
I would like to find a piece of an image inside another image. However, there are some regions of pixels in both images that I don't want to take into account. So I was thinking of using some type of mask, with ones marking the good pixels and zeros the bad ones.
I am using the MatchTemplate method from Emgu CV, and it does not accept a mask. Is there any other way of doing what I would like to do? Thank you!
```csharp
referenceImage.MatchTemplate(templateImage, Emgu.CV.CvEnum.TM_TYPE.CV_TM_CCORR_NORMED);
```
I thought of a solution. Assume that referenceImageMask and templateMask have 1s in the good pixels and 0s in the bad ones, and that referenceImage and templateImage have already been masked, so they also have 0s in the bad pixels.
Then the first template matching gives the unnormalized cross-correlation between the images.
The second template matching gives, for each possible offset, the number of pixels that were unmasked (nonzero) in both images at the same time.
Dividing the correlation by that number should then give the value I wanted: the average product over the pixels that are unmasked in both images.
```csharp
Image<Gray, float> imCorr = referenceImage.MatchTemplate(templateImage, Emgu.CV.CvEnum.TM_TYPE.CV_TM_CCORR);
Image<Gray, float> imCorrMask = referenceImageMask.MatchTemplate(templateMask, Emgu.CV.CvEnum.TM_TYPE.CV_TM_CCORR);
// Element-wise division by the overlap count. At offsets where no unmasked
// pixels overlap the count is 0, so the result there is meaningless; ignore those offsets.
imCorr = imCorr.Mul(imCorrMask.Pow(-1));
```
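To check the idea, here is a brute-force C++ sketch that computes the same quantity directly: for each offset, the sum of products over the pixels unmasked in both images, divided by the number of such pixels. It assumes single-channel float images and 0/1 masks stored as nested vectors; `maskedAverageProduct` is a hypothetical name, and this is far slower than OpenCV's optimized MatchTemplate.

```cpp
#include <cassert>
#include <vector>

// Single-channel image or 0/1 mask, m[y][x].
using Mat = std::vector<std::vector<float>>;

// For every offset (ox, oy) of tpl inside ref, average ref*tpl over the pixels
// where both refMask and tplMask are 1. This equals CCORR(masked images)
// divided by CCORR(masks). Offsets with no overlapping valid pixels get 0.
Mat maskedAverageProduct(const Mat& ref, const Mat& refMask,
                         const Mat& tpl, const Mat& tplMask) {
    assert(!ref.empty() && !tpl.empty());
    const int H = static_cast<int>(ref.size()), W = static_cast<int>(ref[0].size());
    const int h = static_cast<int>(tpl.size()), w = static_cast<int>(tpl[0].size());
    assert(h <= H && w <= W);
    Mat result(H - h + 1, std::vector<float>(W - w + 1, 0.0f));
    for (int oy = 0; oy + h <= H; ++oy)
        for (int ox = 0; ox + w <= W; ++ox) {
            float sum = 0.0f, count = 0.0f;
            for (int y = 0; y < h; ++y)
                for (int x = 0; x < w; ++x) {
                    // m is 1 only if the pixel is valid in both images.
                    const float m = refMask[oy + y][ox + x] * tplMask[y][x];
                    sum += m * ref[oy + y][ox + x] * tpl[y][x];
                    count += m;
                }
            result[oy][ox] = count > 0.0f ? sum / count : 0.0f;
        }
    return result;
}
```

Multiplying by the masks inside the loop means the images do not need to be pre-masked, unlike the two-MatchTemplate version.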
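Newer versions of Emgu CV let you pass the mask directly (older OpenCV versions only honored the mask for the TM_SQDIFF and TM_CCORR_NORMED methods, so check which methods your version supports with a mask):
```csharp
CvInvoke.MatchTemplate(actualImage, expectedImage, result, TemplateMatchingType.CcoeffNormed, mask);
```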
I have an image like this (white background and black text). If there is no noise, Tesseract recognizes the numbers very well, but as you can see, there is a lot of noise above and below the line of numbers.
When there is noise, Tesseract tries to recognize it as digits and adds extra numbers to the result, which is really bad. How can I make Tesseract ignore the noise? Preprocessing the image for more contrast or sharper text doesn't help at all.
If some tool could highlight only the line of text, that would be really good input for Tesseract. Please help me. Thanks, everybody.
You should try eroding and dilating:
The most basic morphological operations are two: erosion and dilation.
They have a wide array of uses, e.g.:
- Removing noise
- ...
You could try to downsample your binary image and upsample it again (pyrDown and pyrUp), or you could try to smooth your image with a Gaussian blur. And, as already suggested, erode and dilate your image.
I see 3 solutions for your problem:
1. As already suggested, try using erode and dilate or some kind of blur. It's the simplest solution.
2. Find all contours (findContours function) and then delete all contours with an area less than some value (try different values; you should find the correct one quite fast). Note that the value does not have to be constant: for example, you can try 80% of the average contour area (add up all contour areas, divide by the number of contours and multiply by 0.8).
3. Find all contours. Create a one-dimensional array of integers with length equal to your image height, and fill it with zeros. Now, for each contour:
   I. Find the top and the bottom point (the points with the largest and the smallest y coordinate). Let's name these points T and B.
   II. Add one to every element of the array whose index is between B.y and T.y (so if B = (1, 4) and T = (3, 11), add one to array[4], array[5], array[6], ..., array[11]).
   Then find the index of the biggest element of the array; let's name this index v. All contours for which B.y <= v <= T.y should be letters; the other contours are noise.
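Solutions 2 and 3 can be sketched as below, assuming the contours have already been found (for example with OpenCV's findContours) and reduced to an area and a vertical extent each; `keepByArea`, `YSpan` and `keepSpansOnTextLine` are hypothetical names, and 0.8 is the starting factor suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Solution 2: keep contours whose area is at least `fraction` of the mean area.
std::vector<bool> keepByArea(const std::vector<double>& areas, double fraction = 0.8) {
    std::vector<bool> keep;
    if (areas.empty()) return keep;
    const double mean = std::accumulate(areas.begin(), areas.end(), 0.0) / areas.size();
    for (double a : areas) keep.push_back(a >= fraction * mean);
    return keep;
}

// Vertical extent of one contour: minY = B.y (smaller y), maxY = T.y (larger y).
struct YSpan { int minY; int maxY; };

// Solution 3: every contour votes for the rows it covers; the row with the most
// votes (index v) is taken as the text line, and only contours covering v are kept.
std::vector<bool> keepSpansOnTextLine(const std::vector<YSpan>& spans, int imageHeight) {
    assert(imageHeight > 0);
    std::vector<int> votes(imageHeight, 0);
    for (const YSpan& s : spans)
        for (int y = std::max(0, s.minY); y <= std::min(imageHeight - 1, s.maxY); ++y)
            ++votes[y];
    // v is the INDEX of the largest element, not the vote count itself.
    const int v = static_cast<int>(std::max_element(votes.begin(), votes.end()) - votes.begin());
    std::vector<bool> keep;
    for (const YSpan& s : spans) keep.push_back(s.minY <= v && v <= s.maxY);
    return keep;
}
```

The voting approach does not depend on blob size, so it also rejects fairly large noise blobs as long as they lie above or below the line of text.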
You can easily remove this noise using image processing techniques (morphological operations like erode and dilate); OpenCV is a good choice for these operations.
Do connected-component labeling, that is, blob counting. The noise blobs will never match the size of the numbers, whereas morphological techniques also modify the numbers. Label the image, count the number of pixels in each labeled region and set a threshold (which is easy to choose, as you will only have numbers and noise). cvBlob is a C++ library for this, hosted on Google Code.
I had a similar problem: small specks of noise caused Tesseract to fail. I couldn't use OpenCV, because I was developing a feature on Android and OpenCV was unwanted because of its large size. I don't know if this solution is good, but here is what I did.
I found all black regions in the image (adding the points of each region to their own set). Then I checked whether the number of points in each region was smaller than some threshold, like 10, 25 or 50. If so, I made all points of that region white.
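The region-size filter described above can be sketched without OpenCV as a breadth-first flood fill. This assumes a binarized image where `true` marks a black pixel and uses 8-connectivity (4-connectivity would also work); `removeSmallBlackRegions` is a hypothetical name and `minPixels` plays the role of the 10/25/50 threshold.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Binary image, img[y][x]: true = black (ink), false = white (background).
using Ink = std::vector<std::vector<bool>>;

// Labels 8-connected black regions with a breadth-first flood fill and turns
// every region with fewer than minPixels pixels white. Returns the number of
// regions removed.
int removeSmallBlackRegions(Ink& img, int minPixels) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    std::vector<std::vector<bool>> seen(h, std::vector<bool>(w, false));
    int removed = 0;
    for (int sy = 0; sy < h; ++sy)
        for (int sx = 0; sx < w; ++sx) {
            if (!img[sy][sx] || seen[sy][sx]) continue;
            std::vector<std::pair<int, int>> region;  // all pixels of this blob
            std::queue<std::pair<int, int>> q;
            q.push({sy, sx});
            seen[sy][sx] = true;
            while (!q.empty()) {
                const std::pair<int, int> cur = q.front();
                q.pop();
                region.push_back(cur);
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int ny = cur.first + dy, nx = cur.second + dx;
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (img[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            q.push({ny, nx});
                        }
                    }
            }
            if (static_cast<int>(region.size()) < minPixels) {  // too small: treat as noise
                for (const auto& p : region) img[p.first][p.second] = false;
                ++removed;
            }
        }
    return removed;
}
```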
I am trying to find the coordinates of one image inside another using the AForge framework:
```csharp
using System.Drawing;
using AForge.Imaging;

ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching();
TemplateMatch[] matchings = tm.ProcessImage(new Bitmap("image.png"), new Bitmap(@"template.png"));
int x_coordinate = matchings[0].Rectangle.X;
```
ProcessImage takes about 2 minutes to run.
The image is about 1600x1000 pixels.
The template is about 60x60 pixels.
Does anyone know how to speed up this process?
In addition to the other answers, I would say that for your case (an image of about 1600x1000 pixels and a template of about 60x60 pixels) this framework is not the best fit. What you are trying to do is closer to searching for an image inside another image than to comparing two images at different resolutions (the way "Search Google for this image" can be used).
About the so-called pyramid search: it's true that the algorithm works much faster for bigger images. Actually, the image pyramid is based on template matching. If we take the most popular implementation (which I found and used):
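```csharp
using System;
using System.Drawing;
using AForge.Imaging;
using AForge.Imaging.Filters;

// Extension methods must be declared in a non-generic static class.
internal static class ImageSearchExtensions
{
    internal static bool IsSearchedImageFound(this Bitmap template, Bitmap image)
    {
        const Int32 divisor = 4;   // downscale both images by this factor
        const Int32 epsilon = 10;  // allowed size difference, in downscaled pixels

        ExhaustiveTemplateMatching etm = new ExhaustiveTemplateMatching(0.90f);

        // Match the downscaled template against the downscaled image.
        TemplateMatch[] tm = etm.ProcessImage(
            new ResizeNearestNeighbor(template.Width / divisor, template.Height / divisor).Apply(template),
            new ResizeNearestNeighbor(image.Width / divisor, image.Height / divisor).Apply(image)
        );

        // The match rectangle has the size of the downscaled template, so this test
        // only passes when the template is (almost) as large as the image.
        if (tm.Length == 1)
        {
            Rectangle tempRect = tm[0].Rectangle;
            if (Math.Abs(image.Width / divisor - tempRect.Width) < epsilon
                && Math.Abs(image.Height / divisor - tempRect.Height) < epsilon)
            {
                return true;
            }
        }
        return false;
    }
}
```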
It should give you a picture close to this one:
Bottom line: try a different approach, maybe something closer to Sikuli integration with .NET. Or you can try Accord.NET, the newer version of AForge.
If this is too much work, you can simply extend your screenshot functionality to crop the required page element (Selenium example).
Two minutes seems like too much for a recent CPU with the image and template sizes you are using. But there are a couple of ways to speed up the process. The first one is to work at a smaller scale. This is called pyramid search. You can try dividing the image and the template by 4, so that you have an image of 400x250 and a template of 15x15, and match this smaller template. This will run much faster, but it will also be less accurate. You can then take the promising locations found with the 15x15 template and search only the corresponding pixels of the 1600x1000 image with the 60x60 template, instead of searching the whole image.
Depending on the template details you may try at an even lower scale (1/8) instead.
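A two-level version of this coarse-to-fine search might look like the sketch below. It is not AForge's implementation: it assumes float grayscale images, uses the sum of squared differences as the match score, downsamples by block averaging, and keeps only the single best coarse candidate (a more robust version would refine several). `downscale`, `ssdSearch` and `pyramidSearch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Grayscale image, img[y][x].
using Gray = std::vector<std::vector<float>>;

struct Match { int x; int y; double ssd; };

// Shrink by an integer factor f by averaging f x f blocks (leftover edge pixels are dropped).
Gray downscale(const Gray& img, int f) {
    const int H = static_cast<int>(img.size()) / f;
    const int W = img.empty() ? 0 : static_cast<int>(img[0].size()) / f;
    Gray out(H, std::vector<float>(W, 0.0f));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < f; ++dy)
                for (int dx = 0; dx < f; ++dx) sum += img[y * f + dy][x * f + dx];
            out[y][x] = sum / (f * f);
        }
    return out;
}

// Exhaustive sum-of-squared-differences search of tpl in img, limited to top-left
// offsets x in [x0, x1] and y in [y0, y1] (clamped to the valid range).
Match ssdSearch(const Gray& img, const Gray& tpl, int x0, int y0, int x1, int y1) {
    assert(!img.empty() && !tpl.empty());
    const int H = static_cast<int>(img.size()), W = static_cast<int>(img[0].size());
    const int h = static_cast<int>(tpl.size()), w = static_cast<int>(tpl[0].size());
    assert(h <= H && w <= W);
    x0 = std::max(x0, 0); y0 = std::max(y0, 0);
    x1 = std::min(x1, W - w); y1 = std::min(y1, H - h);
    Match best{x0, y0, std::numeric_limits<double>::max()};
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            double ssd = 0.0;
            for (int ty = 0; ty < h; ++ty)
                for (int tx = 0; tx < w; ++tx) {
                    const double d = img[y + ty][x + tx] - tpl[ty][tx];
                    ssd += d * d;
                }
            if (ssd < best.ssd) best = Match{x, y, ssd};
        }
    return best;
}

// Two-level pyramid: full search at 1/f scale, then a full-resolution search
// only within +-f pixels of the upscaled coarse position.
Match pyramidSearch(const Gray& img, const Gray& tpl, int f = 4) {
    assert(f >= 1 && !tpl.empty() && static_cast<int>(tpl.size()) >= f &&
           static_cast<int>(tpl[0].size()) >= f);
    const Gray smallImg = downscale(img, f);
    const Gray smallTpl = downscale(tpl, f);
    const int big = std::numeric_limits<int>::max();
    const Match coarse = ssdSearch(smallImg, smallTpl, 0, 0, big, big);
    const int cx = coarse.x * f, cy = coarse.y * f;
    return ssdSearch(img, tpl, cx - f, cy - f, cx + f, cy + f);
}
```

With f = 4 and the sizes from the question, the full-resolution pass only tests a 9x9 neighbourhood of offsets instead of every position in the 1600x1000 image.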
Another thing to know is that a bigger template can run faster. This is counter-intuitive, but with a bigger template there are fewer positions to compare. Each position costs more, though, so the total work only drops once the template covers a large part of the image. So, if possible, try to use a bigger template; sometimes this optimization is not possible if your template is already as big as it can be.
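To put rough numbers on both points, assume a naive exhaustive search that compares every template pixel at every valid offset (AForge's internals may differ). For a $W \times H$ image and a $w \times h$ template the work is roughly:

$$
\begin{aligned}
C(W,H,w,h) &= (W-w+1)\,(H-h+1)\,w\,h \\
C(1600,1000,60,60) &= 1541 \cdot 941 \cdot 3600 \approx 5.2 \times 10^{9} \\
C(400,250,15,15) &= 386 \cdot 236 \cdot 225 \approx 2.0 \times 10^{7}
\end{aligned}
$$

Dividing both sizes by 4 therefore cuts the work by a factor of about 255, close to $4^4 = 256$. For the template size, the factor $(W-w+1)\,w$ peaks around $w \approx W/2$, so growing the template only reduces the work once it is a large fraction of the image: for example, $C(1600,1000,1500,900) = 101 \cdot 101 \cdot 1{,}350{,}000 \approx 1.4 \times 10^{10}$, still more than the 60x60 case.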
I need to take a full-color JPG image and remap its colors to an indexed palette. The palette will consist of specific colors populated from a database. I need to map each color of the image to its "closest" value in the index. I am sure there are different algorithms for comparing and calculating the "closest" value. I'm looking for C#/.NET managed-code libraries only.
(It will be used in a process where we have 120 or so specific button colors, and we want to map any image to those 120 colors to make a collage.)
Nothing will help you with GDI. It seems indexed images are too backward a technology for Microsoft to care. All you can do is read and write indexed image files.
There are usually two steps when quantizing the colors of an image:
1) Find the best palette for the image (color quantization)
2) Map the source colors to the found palette (color mapping)
From what I understand, you already have the palette in the database, which means the hardest part has been done for you. All you need to do is map the 24-bit colors to the provided palette colors. If you don't have a starting palette, you will have to compute it yourself using a quantization algorithm: octrees and median cut are the best known. Median cut gives better results but is slower and harder to implement and fine-tune.
To map the colors, the simplest algorithm in your case is to calculate the distance from your source color to all the palette colors and pick the nearest.
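```csharp
// Squared Euclidean distance between two System.Drawing.Color values
// (the square root is unnecessary when you only compare distances).
float ColorDistanceSquared(Color c1, Color c2)
{
    float deltaR = c2.R - c1.R;
    float deltaG = c2.G - c1.G;
    float deltaB = c2.B - c1.B;
    return deltaR * deltaR + deltaG * deltaG + deltaB * deltaB;
}
```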
You can also weight the channels so that blue has less weight, but don't go too overboard with it, or it will give horrible results; specifically, 30/59/11 won't work at all:
```csharp
// Weighted variant: red and green differences count 3x, blue 2x.
float ColorDistanceSquared(Color c1, Color c2)
{
    float deltaR = (c2.R - c1.R) * 3;
    float deltaG = (c2.G - c1.G) * 3;
    float deltaB = (c2.B - c1.B) * 2;
    return deltaR * deltaR + deltaG * deltaG + deltaB * deltaB;
}
```
Call that for every source color against all palette colors and take the minimum. If you cache your results in a map as you go, this will be very fast.
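A nearest-color lookup with such a cache could look like this in C++, assuming 8-bit RGB colors and the plain squared distance from `ColorDistanceSquared` above; `Rgb` and `PaletteMapper` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Squared Euclidean distance in RGB (same formula as ColorDistanceSquared).
int distanceSquared(Rgb a, Rgb b) {
    const int dr = int(a.r) - int(b.r);
    const int dg = int(a.g) - int(b.g);
    const int db = int(a.b) - int(b.b);
    return dr * dr + dg * dg + db * db;
}

// Maps colors to the index of the nearest palette entry and memoizes the
// answer per distinct source color, so repeated colors cost one hash lookup.
class PaletteMapper {
public:
    explicit PaletteMapper(std::vector<Rgb> palette) : palette_(std::move(palette)) {
        assert(!palette_.empty());
    }

    int nearestIndex(Rgb c) {
        // Pack the 24-bit color into one integer key.
        const std::uint32_t key = (std::uint32_t(c.r) << 16) | (std::uint32_t(c.g) << 8) | c.b;
        const auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        int best = 0;
        for (int i = 1; i < static_cast<int>(palette_.size()); ++i)
            if (distanceSquared(c, palette_[i]) < distanceSquared(c, palette_[best])) best = i;
        cache_.emplace(key, best);
        return best;
    }

    std::size_t cacheSize() const { return cache_.size(); }

private:
    std::vector<Rgb> palette_;
    std::unordered_map<std::uint32_t, int> cache_;
};
```

With about 120 palette entries, each distinct source color costs 120 distance computations once; photos usually repeat colors heavily, which is what makes the cache pay off.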
Also, the source colors will rarely match a palette color closely enough to avoid banding, flat areas and loss of detail in your image. To avoid that, you can use dithering. The simplest algorithm, and the one that gives the best results, is error-diffusion dithering.
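As an illustration, here is a minimal sketch using Floyd-Steinberg, one common error-diffusion kernel, to dither onto a fixed palette. It assumes float RGB pixels, discards error that would fall outside the image and does not clamp the accumulated values; `RgbF`, `nearest` and `ditherFloydSteinberg` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct RgbF { float r, g, b; };

// Index of the palette entry with the smallest squared RGB distance to c.
int nearest(const RgbF& c, const std::vector<RgbF>& pal) {
    assert(!pal.empty());
    int best = 0;
    float bestD = -1.0f;
    for (int i = 0; i < static_cast<int>(pal.size()); ++i) {
        const float dr = c.r - pal[i].r, dg = c.g - pal[i].g, db = c.b - pal[i].b;
        const float d = dr * dr + dg * dg + db * db;
        if (bestD < 0.0f || d < bestD) { bestD = d; best = i; }
    }
    return best;
}

// Floyd-Steinberg: each pixel's quantization error is pushed to unvisited
// neighbours with weights 7/16 (right), 3/16 (below-left), 5/16 (below) and
// 1/16 (below-right). Returns the chosen palette index for every pixel.
std::vector<std::vector<int>> ditherFloydSteinberg(std::vector<std::vector<RgbF>> img,
                                                   const std::vector<RgbF>& pal) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    std::vector<std::vector<int>> idx(h, std::vector<int>(w, 0));
    auto spread = [&](int y, int x, const RgbF& err, float k) {
        if (y < 0 || y >= h || x < 0 || x >= w) return;  // error outside the image is dropped
        img[y][x].r += err.r * k;
        img[y][x].g += err.g * k;
        img[y][x].b += err.b * k;
    };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const RgbF old = img[y][x];
            const int i = nearest(old, pal);
            idx[y][x] = i;
            const RgbF err{old.r - pal[i].r, old.g - pal[i].g, old.b - pal[i].b};
            spread(y, x + 1, err, 7.0f / 16);
            spread(y + 1, x - 1, err, 3.0f / 16);
            spread(y + 1, x, err, 5.0f / 16);
            spread(y + 1, x + 1, err, 1.0f / 16);
        }
    return idx;
}
```

With a black/white palette, a flat mid-gray row alternates between the two colors instead of collapsing to a single one, which is exactly the banding reduction dithering is meant to provide.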
Once you have mapped your colors, you will have to lock the Bitmap manually (LockBits) and write the indices into it, because .NET won't let you draw into an indexed image.
This process is called quantization. Since each color consists of 3 packed values, octrees are a good fit for this problem.
Check out this article with example code.
The article focuses on finding the ideal palette for the image, but in your case the second part works in reverse: you only reduce the image's most-used colors to the closest colors in the given palette.
I had to do this in a big .NET project. There's nothing in the framework for it, but this article quickly led me to a solution: http://codebetter.com/blogs/brendan.tompkins/archive/2004/01/26/6103.aspx
The word JPEG should ring alarm bells. The images are very likely to be in a heavily quantised colour space already, and further resampling may introduce aliasing. If you can, work from uncompressed images to reduce this effect.
The answer to your question is yes - you can save the images in an alternate format - but I'm not sure if the native functionality is adequate for what sounds like a quite complex requirement. If you are able to define the colour palette from the collection of images, you will likely improve the quality of the output.
The already referenced blog entry, 'Use GDI+ to Save Crystal-Clear GIF Images with .NET', contains useful references to code.