Background Info:
I am currently working on a screenshot tool which performs OCR on the snippet to copy text to the clipboard.
While it works fine most of the time, it has issues with small selections.
To help the OCR succeed, I upscale small snippets so they have a minimum width of 640 pixels or a minimum height of 480 pixels, scaling resolution and size accordingly.
The tesseract OCR engine has issues recognizing text on small selections.
I suspect that the image needs padding if the text is not somewhat centered, or does not have enough white or black around it to stand out properly from the background.
Question:
How can I detect whether an image needs padding before performing OCR on it?
Current pre-processing steps:
For pre-processing, I resize the image (if required) and convert the 24bppRGB image to an 8bppIndexed grayscale image.
I then create a histogram, calculate a global threshold, and binarize the image.
Examples:
A) Not recognizing any text:
B) Recognizing text properly:
Solution 1: The "petrol-head" approach
Well, if padding fixes it, then the "petrol-head" approach is to simply add it. Calculate how much white space is available before hitting black, somewhat like casting rays from the left, right, top, and bottom edges (or just resize the image if it is smaller than the needed width/height). Then add the needed amount of white "padding" accordingly, skipping this operation entirely if the image already exceeds 640x480 or whatever size makes the OCR work correctly. In pseudo-code, this would look roughly like this:
/* PSEUDO-CODE */
void make_ocr_readable(image) {
    if (image.width >= 640 && image.height >= 480) {
        doOCR(image);
    } else {
        ocr_readable_img = castrays(image); // cast rays, add white padding accordingly
        doOCR(ocr_readable_img);
    }
}
OR
/* PSEUDO-CODE */
void make_ocr_readable(image) {
    if (image.width >= 640 && image.height >= 480) {
        doOCR(image);
    } else {
        // create a 640x480 canvas and draw the snippet on top of it
        Bitmap padding = new Bitmap(640, 480, System.Drawing.Imaging.PixelFormat.Format32bppPArgb);
        ocr_readable_img = add_images(image, padding); // draws the image on top of the padding canvas
        doOCR(ocr_readable_img);
    }
}
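For reference, a concrete C# version of the second variant could look like the following. This is a minimal sketch, assuming System.Drawing is available and that the snippet should be centered on a white canvas; the method name PadForOcr is illustrative, not from the original:

using System;
using System.Drawing;
using System.Drawing.Imaging;

static Bitmap PadForOcr(Bitmap snippet)
{
    // Already large enough; no padding needed.
    if (snippet.Width >= 640 && snippet.Height >= 480)
        return snippet;

    int width = Math.Max(snippet.Width, 640);
    int height = Math.Max(snippet.Height, 480);

    Bitmap padded = new Bitmap(width, height, PixelFormat.Format32bppPArgb);
    using (Graphics g = Graphics.FromImage(padded))
    {
        // Fill with white so the text stands out from the added border.
        g.Clear(Color.White);
        // Center the original snippet on the canvas.
        g.DrawImage(snippet, (width - snippet.Width) / 2, (height - snippet.Height) / 2);
    }
    return padded;
}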
Solution 2:
As already suggested by #Ralf, you can take this issue up with the team behind the OCR engine, ask around on GitHub, or train the model yourself.
Related
I'm trying to learn how to remove noise from a captcha image. I started by trying to find patterns in the images.
1) The background is always orange:
2) The font is always the same and always the same size.
Now it's time to try to remove the noise, but in my searches I couldn't work out how to remove it effectively from the captchas I have.
I'm familiar with C# and I have been reading about OpenCV; how can I use it to remove the noise in the images I have?
Here's a very simple approach:
Obtain binary image. Load the image, convert to grayscale, and adaptive threshold.
Isolate desired characters. Perform morphological opening to remove the salt & pepper noise.
Remove small noise. Find contours and filter using contour area.
Invert image. The reason we invert the image is because when performing OCR, we want the desired text in black with the background in white.
Here's a visualization of each step:
Binary image
Morph opening + contour area filtering
Invert image for result
Here's the output with the other images
I implemented this approach in Python, but you can adapt the same strategy to C#.
import cv2

# Load image, grayscale, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 3)

# Morph open to remove salt & pepper noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Remove remaining specks by filtering on contour area
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 10:
        cv2.drawContours(opening, [c], -1, (0,0,0), -1)

# Invert image for result
result = 255 - opening

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
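Since you mentioned you're familiar with C#, here is a rough translation of the same steps. This is a sketch assuming the OpenCvSharp wrapper (an assumption on my part; Emgu CV or another OpenCV binding would work similarly):

using OpenCvSharp;

// Load image, grayscale, adaptive threshold
using var image = Cv2.ImRead("1.png");
using var gray = new Mat();
Cv2.CvtColor(image, gray, ColorConversionCodes.BGR2GRAY);
using var thresh = new Mat();
Cv2.AdaptiveThreshold(gray, thresh, 255, AdaptiveThresholdTypes.GaussianC, ThresholdTypes.Binary, 11, 3);

// Morph open to remove salt & pepper noise
using var kernel = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(3, 3));
using var opening = new Mat();
Cv2.MorphologyEx(thresh, opening, MorphTypes.Open, kernel, iterations: 1);

// Remove remaining specks by filtering on contour area
Cv2.FindContours(opening, out Point[][] contours, out _, RetrievalModes.External, ContourApproximationModes.ApproxSimple);
foreach (var c in contours)
{
    if (Cv2.ContourArea(c) < 10)
        Cv2.DrawContours(opening, new[] { c }, -1, Scalar.Black, thickness: -1);
}

// Invert so the text ends up black on white for OCR
using var result = new Mat();
Cv2.BitwiseNot(opening, result);
Cv2.ImWrite("result.png", result);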
For images like these, which are already close to binary, I would suggest OpenCV's morphological transformations. See here for a description of the different types.
This transformation is probably what you'd want to do to remove the noise, although it will slightly change the shape of your letters:
This is called "opening": it erodes the white space (completely erasing small flecks like your noise) and then dilates the white space that remains, so larger pieces stay about the same size. Try thresholding your image and then applying opening with different kernel sizes to see which one works best.
Here is the documentation on the function you'll need for morphological opening.
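If you want to experiment with kernel sizes quickly, a small loop like this sketch (again assuming the OpenCvSharp wrapper mentioned earlier; file names are placeholders) saves one output per size for side-by-side comparison:

using OpenCvSharp;

using var binary = Cv2.ImRead("captcha_thresholded.png", ImreadModes.Grayscale);

// Apply opening with a few kernel sizes and save each result.
foreach (int k in new[] { 2, 3, 5 })
{
    using var kernel = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(k, k));
    using var opened = new Mat();
    Cv2.MorphologyEx(binary, opened, MorphTypes.Open, kernel);
    Cv2.ImWrite($"opened_{k}x{k}.png", opened);
}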
I'm working on a Winforms app that contains a large map image (5500px by 2500px). I've set it up so the map starts in full size, but the user can zoom out to a few different scales to see more of the map. The user is able to drag the map around to shift what they are looking at (like Google Maps, Bing Maps, Civilization, etc.).
When the map is full sized (scale = 1.0), I am able to prevent the user from scrolling past the borders of the image. I do this by checking whether they are trying to move past 0, or past the image width minus the current window size, similar to this:
if (_currHScroll <= 0) {
    _currHScroll = 0;
}
This all works just fine. But, when I zoom out on the map (thus, making the image smaller), the limits for the bottom and right of the map break down. I know why this happens--because the Transform that is performed basically "compresses" the map a little bit, and so what used to be a 5000 px image is now smaller, depending on the scale. But, my limiters are based on the image size.
So, the user can scroll past the end of the map, and just sees white space. Worse things happen, I realize, but if possible I'd like to keep them from doing that.
I'm sure there is a straightforward way to do this, but I haven't figured it out yet. I've tried simply multiplying my calculation by the scale, but that didn't seem to work (it seems to underestimate the size initially, then overestimate at the smallest sizes). I've tried calculating the transformed location of the bottom right of the image and using that, but it turns out that number is inverted, and I can't work out what it relates to.
I'm including my transform-point method here. It works just fine: it tells me, regardless of zoom level, which pixel of the original image was clicked. So if someone clicks on point 200, 200 while the image is scaled at .5, it will report something like 400, 400 (but, as I said, I don't think the scale value is a simple multiplier; I'm using it here just for demonstration purposes).
public Point GetTransformedPoint(Point mousePoint) {
    Matrix clickTransform = _mapTransform.Clone();
    Point[] xPoints = { new Point(mousePoint.X, mousePoint.Y) };
    clickTransform.Invert();
    clickTransform.TransformPoints(xPoints);
    Debug.Print("Orig: {0}, {1} -- Trans: {2}, {3}", mousePoint.X, mousePoint.Y, xPoints[0].X, xPoints[0].Y);
    return xPoints[0];
}
Many thanks in advance. I'm sure it's something relatively easy that I'm overlooking, but after several hours, I'm just not finding it.
If I understand correctly, you can calculate the maximum scroll values with your method GetTransformedPoint by passing in your image's width and height as a Point. The result can then be used inside your check.
And by the way, you are right: the scale value is a multiplier used as a factor. The only thing is, you have to cast the result to an integer.
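To make that concrete, here is a minimal sketch of the clamping (the names _scale, _mapImage, _currVScroll, and ClientSize are illustrative assumptions, not from the question):

// Clamp the scroll offsets against the scaled image size so the view
// cannot move past the image borders at any zoom level.
int maxHScroll = (int)(_mapImage.Width * _scale) - ClientSize.Width;
int maxVScroll = (int)(_mapImage.Height * _scale) - ClientSize.Height;

if (_currHScroll < 0) _currHScroll = 0;
if (_currHScroll > maxHScroll) _currHScroll = maxHScroll;
if (_currVScroll < 0) _currVScroll = 0;
if (_currVScroll > maxVScroll) _currVScroll = maxVScroll;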
I have an application that scans images from a scanner, but some scanners put a black border around the saved image.
How can I remove that black border?
Thanks so much for your participation.
I've had good luck in the past processing images with the Magick.NET library. It's available on CodePlex, or you can install it using NuGet in Visual Studio. Documentation for the library is a little sparse, but it has served me well.
Depending upon the exact nature of the images you're dealing with, you might be able to do something as simple as cropping off the edges where the border is and then adding a white border (or whatever color; I just assumed you were scanning text documents or something) to bring the image back up to a standardized size. If a standardized size doesn't matter, you can of course just leave the image cropped. If that sounds like a viable solution, here's some code that should accomplish what you need:
using (MagickImage image = new MagickImage(@"path_to_original"))
{
    int width = image.Width, height = image.Height;

    // Crop away the border region (adjust the amounts for your scans).
    image.Crop(width - 800, height - 800);

    // If the image needs to be brought back up to a standardized size,
    // add a white border around the cropped result.
    image.BorderColor = new ColorRGB(System.Drawing.Color.White);
    image.Border(100, 100);

    image.Write(@"path_to_cropped_image_with_no_more_black_border_around_it");
}
You will, of course, need to plug in your own values for just how much width to crop off and add back in.
I need to create a program that will display lines or dots of coordinates read from a txt file. The application will be attached to the output of an eye-tracking program, and will display the data.
How do I display some sort of graphic at a particular coordinate on the screen?
Note: The window is full-screen, and I can use WPF or WinForms.
I would overlay your video with an Image element; something like:
<Grid>
    <Image x:Name="TrackingImage" />
    <MediaElement />
</Grid>
Then, in your code-behind, set the source to a WriteableBitmap. The documentation has an excellent sample, but to summarize it here:
// Create the bitmap once; Bgra32 is 4 bytes per pixel.
WriteableBitmap writeableSource = new WriteableBitmap(100, 100, 96, 96, PixelFormats.Bgra32, null);

// Calculate the number of bytes per pixel.
int _bytesPerPixel = (writeableSource.Format.BitsPerPixel + 7) / 8;

// Stride is the byte width of a single row: pixels per row times bytes per pixel.
int _stride = writeableSource.PixelWidth * _bytesPerPixel;

private void SomeUpdateFunction()
{
    // Define the rectangle of the writeable image we will modify.
    // The size is that of the writeable bitmap.
    Int32Rect _rect = new Int32Rect(0, 0, writeableSource.PixelWidth, writeableSource.PixelHeight);

    // Update the writeable bitmap with your pixel buffer (BGRA color data).
    writeableSource.WritePixels(_rect, pixelBuffer, _stride, 0);

    TrackingImage.Source = writeableSource;
}
Note that it uses WritePixels (specifically, this overload: MSDN).
Obviously you will need to modify the parameters to get the correct pixel in the correct place, but this is the right technique; a sketch of a buffer-filling helper follows.
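For instance, a hypothetical helper (my illustration, not part of the original answer) that fills a Bgra32 buffer with a small red dot at a gaze coordinate could look like this:

// Fill a width*height Bgra32 buffer with a 5x5 red dot centered at (x, y).
static byte[] MakeDotBuffer(int width, int height, int x, int y)
{
    byte[] buffer = new byte[width * height * 4]; // Bgra32: 4 bytes per pixel
    for (int dy = -2; dy <= 2; dy++)
    {
        for (int dx = -2; dx <= 2; dx++)
        {
            int px = x + dx, py = y + dy;
            if (px < 0 || px >= width || py < 0 || py >= height)
                continue;
            int i = (py * width + px) * 4;
            buffer[i + 0] = 0;   // blue
            buffer[i + 1] = 0;   // green
            buffer[i + 2] = 255; // red
            buffer[i + 3] = 255; // alpha (opaque)
        }
    }
    return buffer;
}

The returned buffer can then be passed as the pixelBuffer argument to WritePixels above.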
This answer was inspired by Drawing Pixels in WPF; it might be worth looking at if you need more info.
Various bitmap formats are instructions to put colored dots at specific locations. Why not use something like that? What ELSE do you need it to do?
Regarding your eye-tracking and point-data comment: if you want to composite the points with captured video, then you don't need to worry about how to display the images so much as how to add the dots to the video itself. The video player will do the displaying.
From what I know about screen capture and video codecs (not a whole lot), it is best to work with the uncompressed video before it gets encoded; otherwise you'll have to decode, add, and re-encode. I'd look for a way to hook into the capture program and add the live eye-tracker data to the captured frames.
I have an image where I need to change the background colour (E.g. changing the background of the example image below to blue).
However, the image is anti-aliased so I cannot simply do a replace of the background colour with a different colour.
One way I have tried is creating a second image that is just the background and changing the colour of that and merging the two images into one, however this does not work as the border between the two images is fuzzy.
Is there any way to do this, or some other approach to achieve it that I have not considered?
Example image
Just using GDI+
Image image = Image.FromFile("cloud.png");
Bitmap bmp = new Bitmap(image.Width, image.Height);
using (Graphics g = Graphics.FromImage(bmp)) {
    g.Clear(Color.SkyBlue);
    g.InterpolationMode = InterpolationMode.NearestNeighbor;
    g.PixelOffsetMode = PixelOffsetMode.None;
    g.DrawImage(image, Point.Empty);
}
resulted in:
Abstractly
Each pixel in your image is a (R, G, B) vector, where each component is in the range [0, 1]. You want a transform, T, that will convert all of the pixels in your image to a new (R', G', B') under the following constraints:
black should stay black
T(0, 0, 0) = (0, 0, 0)
white should become your chosen color C*
T(1, 1, 1) = C*
A straightforward way to do this is to choose the following transform T:
T(c) = C* .* c (where .* denotes element-wise multiplication)
This is just standard image multiplication.
Concretely
If you're not worried about performance, you can use the (very slow) GetPixel and SetPixel methods on your Bitmap to apply this transform to each pixel; a rough sketch follows. If anything is unclear, just say so in a comment and I'll add a detailed explanation.
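A minimal sketch of that per-pixel multiply, assuming System.Drawing and sky blue as the chosen C* (both are placeholders; substitute your own color):

using System.Drawing;

Bitmap bmp = new Bitmap("cloud.png");
Color target = Color.SkyBlue; // your chosen color C*

for (int y = 0; y < bmp.Height; y++)
{
    for (int x = 0; x < bmp.Width; x++)
    {
        Color c = bmp.GetPixel(x, y);
        // Element-wise multiply: black stays black, white becomes C*,
        // and anti-aliased gray edge pixels land proportionally in between.
        bmp.SetPixel(x, y, Color.FromArgb(
            c.A,
            c.R * target.R / 255,
            c.G * target.G / 255,
            c.B * target.B / 255));
    }
}
bmp.Save("cloud_tinted.png");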
Comparison
Compare this to the method presented by LarsTech. The method presented here is on the top; the method presented by LarsTech is on the bottom. Notice the undesirable edge effects on the bottom icon (white haze on the edges).
And here is the image difference of the two:
Afterthought
If your source image has a transparent (i.e. transparent-white) background and a black foreground (as in your example), then you can simply make your transform T(a, r, g, b) = (a, 0, 0, 0) and then draw your image on top of whatever background color you want, as LarsTech suggested.
If it is a uniform colour you want to replace, you could convert it to alpha. I wouldn't like to code it myself, though!
You could use GIMP's Color To Alpha source code (it's GPL); here's a version of it.
P.S. I'm not sure how to get the latest version.
Background removal/replacement is, in my opinion, more art than science; you won't find one algorithm that fits all cases. But depending on how desperate or interested you are in solving this problem, you may want to consider the following approach:
Let’s assume you have a color image.
Use your choice of decoding mechanism and generate a gray scale / luminosity image of your color image.
Plot a graph (metaphorically speaking) of pixel value (x) vs. the number of pixels in the image with that value (y), a.k.a. a luminosity histogram.
Now, if your background is large enough (or small enough), you'd see a part of the graph representing the distribution of the range of pixel values which constitute your background. You may want to select a slightly wider range to handle the anti-aliasing (based on a fixed offset that you define, if you are dealing with similar images) and call it the luminosity range of your background.
It would make your life easier if you know at least one pixel value (a sample/median pixel) from the range that defines your background; that way you can 'look up' the part of the graph which defines the background.
Once you have the luminosity range for the background, run through the original image's pixels and compare each pixel's luminosity value with that range. If it falls within the range, replace the pixel in the original image with the desired color, preferably luminosity-shifted based on the original pixel and the sample pixel, so that the replaced background looks anti-aliased too; a sketch of this loop follows.
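As a rough illustration of that loop (the luminance formula is the standard Rec. 601 approximation; the range values and file names are assumptions for the sake of the example):

using System.Drawing;

Bitmap bmp = new Bitmap("input.png");
Color replacement = Color.Blue;

// Assumed background luminosity range, taken from the histogram or from
// a known sample background pixel plus/minus a fixed offset.
int lumMin = 180, lumMax = 255;

for (int y = 0; y < bmp.Height; y++)
{
    for (int x = 0; x < bmp.Width; x++)
    {
        Color c = bmp.GetPixel(x, y);
        // Rec. 601 luminance approximation.
        int lum = (299 * c.R + 587 * c.G + 114 * c.B) / 1000;
        if (lum >= lumMin && lum <= lumMax)
        {
            // Shift the replacement by the pixel's relative luminosity so
            // anti-aliased edge pixels blend instead of forming a hard edge.
            double t = (double)(lum - lumMin) / (lumMax - lumMin);
            bmp.SetPixel(x, y, Color.FromArgb(
                c.A,
                (int)(replacement.R * t),
                (int)(replacement.G * t),
                (int)(replacement.B * t)));
        }
    }
}
bmp.Save("output.png");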
This is not a perfect solution, and there are plenty of scenarios where it might fail or partially fail, but it should work for the sample image attached to your question.
There are also a lot of opportunities for performance improvement, including GPGPU, etc.
Another possible solution would be to use a pre-built third-party image processing library; there are a few open-source ones, such as Camellia, but I am not sure what features they provide or how sophisticated they are.