I'm trying to learn how to remove noise from a captcha image. I started by looking for patterns in the images.
1) The background is always orange.
2) The font is always the same and always the same size.
Now it's time to try to remove the noise, but in my searches I couldn't understand how to remove noise effectively from the captchas I have.
I'm familiar with C# and I was reading about OpenCV. How can I use it to remove the noise in the images I have?
Here's a very simple approach:
Obtain binary image. Load the image, convert to grayscale, and adaptive threshold.
Isolate desired characters. Perform morphological opening to remove the salt & pepper noise.
Remove small noise. Find contours and filter using contour area.
Invert image. The reason we invert the image is because when performing OCR, we want the desired text in black with the background in white.
Here's a visualization of each step:
Binary image
Morph opening + contour area filtering
Invert image for result
Here's the output with the other images
I implemented this approach in Python, but you can adapt the same strategy to C#.
import cv2
# Load image, grayscale, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,11,3)
# Morph open
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Remove noise by filtering using contour area
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 10:
        cv2.drawContours(opening, [c], -1, (0,0,0), -1)
# Invert image for result
result = 255 - opening
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
For images like these which are very binary, I would suggest OpenCV's morphological transformations. See here for a description of the different types.
This transformation is probably what you'd want to do to remove the noise, although it will slightly change the shape of your letters:
This is called "opening" and it erodes white space (completely erasing small little flecks like your noise) then it'll dilate the white space that remains, so larger pieces will stay about the same size. Try thresholding your image and then use opening with different size kernels to see which one works the best.
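To make the erode-then-dilate idea concrete, here is a minimal sketch in plain Python rather than OpenCV. It assumes a binary image stored as a list of lists (1 = white, 0 = black) and a fixed 3x3 square kernel, and it simply skips neighbours that fall outside the image. The helper names (erode, dilate, opening) are made up for this illustration; in real code you would call cv2.morphologyEx as in the answer above.

```python
def _window(img, y, x):
    """Yield the values in the 3x3 window around (y, x), skipping out-of-bounds cells."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield img[ny][nx]


def erode(img):
    # A pixel stays white only if every pixel in its window is white.
    return [[min(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    # A pixel becomes white if any pixel in its window is white.
    return [[max(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img):
    # Opening = erosion followed by dilation.
    return dilate(erode(img))


# One isolated white speck (top-left) plus a 4x4 white block.
noisy = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
```

In this toy case the speck disappears and the block comes back at its original size. With real letters, thin strokes can be eroded away as well, which is why it is worth trying several kernel sizes.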
Here is the documentation on the function you'll need for morphological opening.
I'm struggling to find an optimal binarization as a preprocessing step for OCR (Tesseract in C#).
The images are 1624 x 1728 pixels and contain car GUI elements (buttons, sliders, info boxes) and the corresponding text from a generated car navigation command interface (different use case scenarios like radio control, car control, etc.). The images contain multiple colors; most of the images are dark blue, and the text is white/gray or close to white. Unfortunately, I cannot share the images due to data privacy.
Problem: I cannot separate the text from the background in an efficient way (text should be black, everything else white), because the text color covers a wide range and partially overlaps with the background color (speaking of the grayscale images).
Current procedure: First I convert the RGB image from System.Drawing.Image to OpenCvSharp.Mat. Then I convert the Mat image from color to grayscale, and then from grayscale to binary.
This is the main code for the binarization:
Mat binarized = grayscaled.Threshold(tresh, maxVal, ThresholdTypes.BinaryInv);
I use 255 as maxVal. If I use tresh = 90, the binarized image looks OK overall (even if the Tesseract results are bad here), but some pixels of the text in the bottom control elements (and some other text) are white because tresh is too high, so some characters are blurry and incomplete.
If I use something like tresh = 40, the characters of the bottom control elements become complete and sharp (as they should be), but the background (middle of the image) turns completely black, which means that some text there disappears inside a big black chunk.
So the problem is the wide range of text pixel values in the grayscale image, which overlaps with the values of other elements or the background and makes the text extraction hard.
Note: I already tried adaptive thresholding (MeanC and GaussianC) with different thresholds, kernel sizes and mean subtraction constants, without good results.
Question: What would be an efficient solution for the preprocessing?
I'm thinking about writing a method that binarizes from RGB rather than from grayscale. The method would take an RGB image as input and turn that white text color range into black and everything else into white.
One approach is to remove any frequencies in the image lower than that of your text. This can be done by creating a blurred copy of the image, with a kernel a bit larger than your text, and subtract this blurred image from the original. This should keep high frequencies, i.e. text and other edges, while removing any vignetting or other gradients over the image. Keep in mind that the resulting image will have a different range of values, where some will probably be negative.
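A minimal sketch of the blur-and-subtract idea, in plain Python so it runs without any libraries. It assumes a grayscale image stored as a list of lists and uses a simple box blur; the names box_blur and high_pass and the radius value are made up for this illustration. With OpenCV you would more likely blur a floating-point copy of the image (for example with cv2.GaussianBlur) and subtract that.

```python
def box_blur(img, radius):
    """Mean of the (2*radius+1)^2 window around each pixel, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def high_pass(img, radius):
    """Original minus blurred copy. The result is signed, so keep it as float."""
    blurred = box_blur(img, radius)
    return [[img[y][x] - blurred[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


# A horizontal brightness gradient with one bright "text" pixel in the middle.
img = [[10 * x for x in range(9)] for _ in range(9)]
img[4][4] = 255
hp = high_pass(img, radius=2)
```

Away from the bright pixel and the image borders the gradient cancels out to about zero, the bright pixel stands out strongly, and its immediate neighbours go slightly negative. That is the "different range of values" mentioned above, so rescale or offset the result before thresholding it.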
Another option would be to split the image into sections, and use different thresholds in each, but that may lead to artifacts at the section boundaries.
So I have a program that scans cameras from multiple sources, takes a thumbnail of their view at a certain time, and saves the thumbnails as JPGs.
I would like to now scan these through my C# program and check if any of the created jpg files are completely black (either completely obstructed, or no signal in this case).
I am wondering what would be the best way of solving this problem. Not a color depth issue.
Thanks!
Use the GetPixel(x,y) function to check color at x,y location. You can iterate through the whole image and if they're all black then it's black. You can also check if majority of pixels are gray / black - if so then it's probably a very dim image.
Load picture.
Go through all pixels and check their RGB value.
If you find all of them below a certain threshold, assume the picture is black.
Beware: you should likely ignore single pixels not being black. Sensors are not perfect. Stuck pixels are a known phenomenon.
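The steps above, including the allowance for a few stuck pixels, could look roughly like this. It is a language-neutral sketch in Python, assuming the pixels are already available as (R, G, B) tuples with 0..255 channels; the function name and both default values are arbitrary choices for illustration, not recommendations.

```python
def is_mostly_black(pixels, threshold=16, max_bright_fraction=0.001):
    """Return True if almost every pixel has all channels at or below `threshold`.

    `max_bright_fraction` is the share of brighter pixels that is tolerated
    (stuck pixels, compression artifacts). Both defaults are assumptions.
    """
    total = 0
    bright = 0
    for r, g, b in pixels:
        total += 1
        if max(r, g, b) > threshold:
            bright += 1
    if total == 0:
        return False  # an empty image is not treated as black here
    return bright / total <= max_bright_fraction


# 10,000 near-black pixels with three stuck white pixels: still "black".
dark = [(2, 3, 1)] * 9997 + [(255, 255, 255)] * 3
# Half of the pixels are bright: not black.
normal = [(2, 3, 1)] * 5000 + [(200, 180, 160)] * 5000

dark_result = is_mostly_black(dark)
normal_result = is_mostly_black(normal)
```

A non-zero threshold matters for JPEGs in particular, because lossy compression rarely leaves "black" at exactly (0, 0, 0).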
I'm trying to draw a string using either TextRenderer.DrawText, Graphics.DrawString or GraphicsPath.AddString. The main purpose is to extract all fonts to bitmaps so I can edit them and use them as bitmaps with shaders in a game.
With TextRenderer.DrawText and Graphics.DrawString, I get varying amounts of padding on top, so I tried GraphicsPath.AddString. I extract the font family's ascent and descent heights, but they are wildly unusable together with the em height (using ascent and descent with the em height is how Microsoft suggests doing what I am trying to do, via http://msdn.microsoft.com/en-us/library/xwf9s90b%28v=vs.110%29.aspx). Has anyone ever successfully drawn pixel-perfect fonts using C#? Every time I try or look it up, the padding added by TextRenderer and Graphics screws up the drawing, and this new GraphicsPath method seems to have an issue with using a specific scale.
The usual methods using TextRenderer or MeasureString will give you a SizeF, containing the bounds of the string you measure. Most formats include a little slack so you can compose text by adding strings together.
The aim of these methods is to help create blocks of text by letting you measure when a line will be full or how many pixels to advance for the next line.
They are not really meant for measuring single characters.
For this there is a special StringFormat, GenericTypographic, as described here, which leaves out the white space.
To get an even more precise measurement, one can use GraphicsPath.AddString and then GetBounds, maybe after switching antialiasing off.
Now, if you wanted to draw a single character precisely, say centered on a Button this would do the job.
But you know all that and your aim is different. If I understand you correctly, you want to create bitmaps from each character in order to later join them to form text. This means you need them to line up correctly vertically, i.e. sit on the same baseline.
The sizes of the characters don't help you here. Normally you'd need the baseline of each character, which you don't get, at least not for anything descending like 'f' or even just ',' etc.
But it wouldn't help you either, because in GDI you don't print/draw to the baseline anyway.
What you should do, IMO, is either draw one long string with all characters, so that they're all lined up right, and then cut out the characters one by one; or draw each character on its own, but append some or all of the characters you know to have ascenders and descenders, and then only pick the first columns from the result.
So the only way I figured out how to do this is to first draw the string to a GraphicsPath, then measure all the empty spots in the path and get its height only after I've measured every spot, then redraw the string (I have an attempt counter to limit attempts while increasing the em size to pixel accuracy), taking the old size and new size into account through a modifier, and finally extract the final size and store it.
This is the only way I could get around the BS of every font having a weird top padding that isn't associated with its ascent and internal overflow (e.g. Ñ), or with its descent, in reference to a (0,0) point.
I have an image where I need to change the background colour (E.g. changing the background of the example image below to blue).
However, the image is anti-aliased so I cannot simply do a replace of the background colour with a different colour.
One way I have tried is creating a second image that is just the background and changing the colour of that and merging the two images into one, however this does not work as the border between the two images is fuzzy.
Is there any way to do this, or some other way to achieve this that I have not considered?
Example image
Just using GDI+
using System.Drawing;
using System.Drawing.Drawing2D;

// Draw the image on top of a solid background colour
Image image = Image.FromFile("cloud.png");
Bitmap bmp = new Bitmap(image.Width, image.Height);
using (Graphics g = Graphics.FromImage(bmp)) {
    g.Clear(Color.SkyBlue);
    g.InterpolationMode = InterpolationMode.NearestNeighbor;
    g.PixelOffsetMode = PixelOffsetMode.None;
    g.DrawImage(image, Point.Empty);
}
resulted in:
Abstractly
Each pixel in your image is a (R, G, B) vector, where each component is in the range [0, 1]. You want a transform, T, that will convert all of the pixels in your image to a new (R', G', B') under the following constraints:
black should stay black
T(0, 0, 0) = (0, 0, 0)
white should become your chosen color C*
T(1, 1, 1) = C*
A straightforward way to do this is to choose the following transform T:
T(c) = C* .* c (where .* denotes element-wise multiplication)
This is just standard image multiplication.
Concretely
If you're not worried about performance, you can use the (very slow) methods GetPixel and SetPixel on your Bitmap to apply this transform for each pixel in it. If it's not clear how to do this, just say so in a comment and I'll add a detailed explanation for that part.
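For reference, the per-pixel transform itself is tiny. Here is a sketch in Python that assumes 8-bit channels (0..255) instead of the [0, 1] range used above, so the product is divided by 255; the function name tint is made up for this example. The same arithmetic would go inside a GetPixel/SetPixel loop in C#.

```python
def tint(pixel, target):
    """T(c) = C* .* c for 8-bit channels: element-wise multiply, then rescale by 255."""
    return tuple(round(c * t / 255) for c, t in zip(pixel, target))


sky_blue = (135, 206, 235)  # the chosen colour C*

black = tint((0, 0, 0), sky_blue)        # black stays black
white = tint((255, 255, 255), sky_blue)  # white becomes C*
edge = tint((128, 128, 128), sky_blue)   # anti-aliased grey becomes a darker C*
```

Because grey edge pixels are mapped to proportionally darker shades of the target colour, the anti-aliasing is preserved instead of leaving a white fringe.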
Comparison
Compare this to the method presented by LarsTech. The method presented here is on the top; the method presented by LarsTech is on the bottom. Notice the undesirable edge effects on the bottom icon (white haze on the edges).
And here is the image difference of the two:
Afterthought
If your source image has a transparent (i.e. transparent-white) background and black foreground (as in your example), then you can simply make your transform T(a, r, g, b) = (a, 0, 0, 0) then draw your image on top of whatever background color you want, as LarsTech suggested.
If it is a uniform colour you want to replace you could convert this to an alpha. I wouldn't like to code it myself!
You could use GIMP's Color To Alpha source code (It's GPL), here's a version of it
P.S. Not sure how to get the latest.
Background removal/replacement is, IMO, more art than science. You won't find a one-size-fits-all algorithm for it, but depending on how desperate or interested you are in solving this problem, you may want to consider the following approach:
Let’s assume you have a color image.
Use your choice of decoding mechanism and generate a gray scale / luminosity image of your color image.
Plot a graph (metaphorically speaking) of numeric value of the pixel(x) vs number of pixels in the image for that value(y). Aka. a luminosity histogram.
Now if your background is large enough (or small), you’d see a part of the graph representing the distribution of a range of pixels which constitute your background. You may want to select a slightly wider range to handle the anti-aliasing (based on a fixed offset that you define if you are dealing with similar images) and call it the luminosity range for your background.
It would make your life easier if you know at least one pixel (sample/median pixel value) out of the range of pixels which defines your background, that way you can ‘look up’ the part of the graph which defines your background.
Once you have the range of luminosity pixels for the background, you may run through the original image pixels, compare their luminosity values with the range you have, if it falls within, replace the pixel in the original image with the desired color, preferably luminosity shifted based on the original pixel and the sample pixel, so that the replaced background looks anti-aliased too.
This is not a perfect solution and there are a lot of scenarios where it might fail / partially fail, but again it would work for the sample image that you had attached with your question.
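A simplified sketch of the replacement step, in Python. It skips building the histogram and instead assumes you already know one sample background pixel and a fixed tolerance, as suggested above. The Rec. 601 luminosity weights, the function names and the tolerance value are all assumptions made for this illustration.

```python
def luminosity(pixel):
    """Approximate luminosity using Rec. 601 weights (one common choice)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def replace_background(pixels, sample, new_color, tolerance=40):
    """Replace pixels whose luminosity is within `tolerance` of the sample pixel.

    The replacement is shifted by the luminosity difference between the pixel
    and the sample, so anti-aliased background pixels stay slightly darker
    or lighter than the new colour.
    """
    base = luminosity(sample)
    result = []
    for p in pixels:
        shift = luminosity(p) - base
        if abs(shift) <= tolerance:
            shifted = tuple(max(0, min(255, round(c + shift))) for c in new_color)
            result.append(shifted)
        else:
            result.append(p)  # foreground: leave untouched
    return result


pixels = [(255, 255, 255), (240, 240, 240), (0, 0, 0)]
out = replace_background(pixels, sample=(255, 255, 255), new_color=(0, 0, 255))
```

Here pure white becomes the new blue, the slightly darker pixel becomes a slightly darker blue, and the black foreground is left alone. Pixels that sit halfway between foreground and background fall outside the tolerance and keep their old colour, which is one of the ways this approach can partially fail.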
Also there are a lot of performance improvement opportunities, including GPGPU etc.
Another possible solution would be to use some of the pre-built third party image processing libraries, there are a few open source such as Camellia but I am not sure of what features are provided and how sophisticated they are.
Currently I'm looking for a rather fast and reasonably accurate algorithm in C#/.NET to do these steps in code:
Load an image into memory.
Starting from the color at position (0,0), find the unoccupied space.
Crop away this unnecessary space.
I've illustrated what I want to achieve:
What I can imagine is to get the color of the pixel at (0,0) and then do some unsafe line-by-line/column-by-column walking through all pixels until I meet a pixel with another color, then cut away the border.
I just fear that this is really really slow.
So my question is:
Are you aware of any quick algorithms (ideally without any 3rd party libraries) to cut away "empty" borders from an in-memory image/bitmap?
Side note: The algorithm should be "reasonably accurate", not 100% accurate. Some tolerance, like cropping one line too many or too few, would be perfectly OK.
Addition 1:
I've just finished implementing my brute force algorithm in the simplest possible manner. See the code over at Pastebin.com.
If you know your image is centered, you might try walking diagonally ( ie (0,0), (1,1), ...(n,n) ) until you have a hit, then backtrack one line at a time checking until you find an "empty" line (in each dimension). For the image you posted, it would cut a lot of comparisons.
You should be able to do that from 2 opposing corners concurrently to get some multi-core action.
Of course, hopefully you don't hit the pathological case of a 1-pixel-wide line in the center of the image :) Or the doubly pathological case of disconnected objects in your image such that the whole image is centered, but nothing crosses the diagonal.
One improvement you could make is to give your "hit color" some tolerance (adjustable maybe?)
The algorithm you are suggesting is a brute-force algorithm and will always work, for all types of images.
But for special cases, such as when the subject is centered and is a continuous blob of colors (as in your example), a binary-search-style algorithm can be applied.
Start from the center line (0, length/2) and work in one direction at a time, examining the lines as you would in a binary search.
Do this for all four sides.
This reduces the number of lines examined per side to about log2(n).
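A sketch of that binary search for the top edge, in Python. It only works under the assumptions stated above: the center row contains part of the subject, and the non-empty rows form one contiguous range. The image is assumed to be a list of rows, and the function names are made up for this example.

```python
def row_is_empty(img, y, background):
    """True if every pixel in row y equals the background value."""
    return all(p == background for p in img[y])


def first_content_row(img, background):
    """Binary search for the topmost non-empty row.

    Assumes the center row is non-empty and that all non-empty rows are
    contiguous, so rows are empty above the edge and non-empty from the
    edge down to the center.
    """
    lo, hi = 0, len(img) // 2  # invariant: row `hi` is non-empty (assumed)
    while lo < hi:
        mid = (lo + hi) // 2
        if row_is_empty(img, mid, background):
            lo = mid + 1  # the edge is below mid
        else:
            hi = mid      # mid is non-empty, so the edge is at or above mid
    return lo


# 16 rows of 4 pixels; rows 5..11 contain one foreground pixel each.
img = [[0] * 4 for _ in range(16)]
for y in range(5, 12):
    img[y][1] = 1

top = first_content_row(img, background=0)
```

Note that each examined row still has to be scanned pixel by pixel, so the saving is in the number of rows checked, not in the cost of checking one row. The other three sides follow the same pattern with rows or columns mirrored.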
For starters, your current algorithm is basically the best possible.
If you want it to run faster, you could code it in C++. This tends to be more efficient than managed unsafe code.
If you stay in C#, you can use Parallel Extensions to run it on multiple cores. That won't reduce the load on the machine, but it will reduce the latency, if any.
If you happen to have a precomputed thumbnail for the image, you can apply your algo on the thumbnail first to get a rough idea.
First, you can convert your bitmap to a byte[] using LockBits(), this will be much faster than GetPixel() and won't require you to go unsafe.
As long as you don't naively search the whole image and instead search one side at a time, you nailed the algorithm 95%. Just make sure you are not searching already cropped pixels, as this might actually make the algorithm worse than the naive one if you have two adjacent edges that crop a lot.
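Searching one side at a time, without revisiting rows that are already cropped, could look like this. It is a sketch in Python on a grayscale image stored as a list of rows, taking the background from the pixel at (0,0) as in the question; the function name and the tolerance parameter are assumptions for illustration. In C# the same loops would run over the byte[] obtained via LockBits().

```python
def crop_bounds(img, tolerance=0):
    """Return (left, top, right, bottom) of the non-background area, inclusive.

    The background value is taken from img[0][0]. Returns None if the whole
    image is background.
    """
    h, w = len(img), len(img[0])
    bg = img[0][0]

    def is_bg(value):
        return abs(value - bg) <= tolerance

    # Top and bottom: scan whole rows.
    top = 0
    while top < h and all(is_bg(v) for v in img[top]):
        top += 1
    if top == h:
        return None
    bottom = h - 1
    while all(is_bg(v) for v in img[bottom]):
        bottom -= 1

    # Left and right: only look at rows top..bottom, skipping cropped rows.
    rows = range(top, bottom + 1)
    left = 0
    while all(is_bg(img[y][left]) for y in rows):
        left += 1
    right = w - 1
    while all(is_bg(img[y][right]) for y in rows):
        right -= 1
    return left, top, right, bottom


# 6 rows x 8 columns, content in rows 2..3 and columns 3..5.
img = [[0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        img[y][x] = 200

bounds = crop_bounds(img)
```

The all(...) calls stop at the first non-background pixel, so rows and columns that contain content are abandoned early.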
A binary search can improve a tiny bit, but it's not that significant as it will maybe save you a line of search for each direction in the best case scenario.
Although I prefer Tarang's answer, I'd like to give some hints on how to 'isolate' objects in an image by referring to a given foreground color and background color. This is called 'segmentation' and is used in the field of 'optical inspection', where an image is not just cropped to some detected object; objects are also counted and measured. Things you can measure on an object include area, contour, diameter, etc.
First of all, you'll usually start walking through your image at x/y coordinates 0,0 and go from left to right and top to bottom until you find a pixel whose value differs from the background. The sensitivity of the segmentation is set by defining the grayscale value of the background as well as the grayscale value of the foreground. Conceptually you walk through the image by coordinates, but from the program's point of view you just walk through an array of pixels. That means you'll have to deal with the formula that converts an x/y coordinate to the pixel's index in the pixel array. This formula needs the width of the image.
As for your concern about cropping: once you've found the so-called 'pivot point' of your foreground object, you'll usually walk along the object using a formula that detects neighboring pixels with the same foreground value. If there is only one object to detect, as in your case, it's easy to store the coordinates of the pixels that are north-most, east-most, south-most and west-most. These 4 coordinates mark the rectangle your object fits in. With this information you can calculate the cropped image's width and height.
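A small sketch of the index formula and the extreme-coordinate bookkeeping, in Python. To keep it short it scans every pixel of a flat, row-major grayscale array instead of tracing neighbours from the pivot point, which gives the same rectangle when there is only one object. The function name and the tolerance parameter are made up for this example.

```python
def bounding_box(flat, width, background, tolerance=0):
    """Return (x, y, box_width, box_height) of all non-background pixels.

    `flat` is a row-major pixel array, so the pixel at (x, y) sits at
    index = y * width + x. Returns None if no foreground pixel is found.
    """
    height = len(flat) // width
    west, north, east, south = width, height, -1, -1
    for index, value in enumerate(flat):
        if abs(value - background) > tolerance:
            y, x = divmod(index, width)  # inverse of index = y * width + x
            west, east = min(west, x), max(east, x)
            north, south = min(north, y), max(south, y)
    if east < 0:
        return None
    return west, north, east - west + 1, south - north + 1


# 5 x 4 image with foreground pixels at (1, 1) and (3, 2).
width = 5
flat = [0] * (width * 4)
flat[1 * width + 1] = 255
flat[2 * width + 3] = 255

box = bounding_box(flat, width, background=0)
```

The result is the top-left corner plus the width and height of the cropped image, i.e. the rectangle spanned by the west-most, north-most, east-most and south-most foreground pixels.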