I would like to ask for some insight/assistance on how I might improve my OCR accuracy. My target images are low resolution (screenshots) and I would very much prefer not to upscale them, as my program needs to perform fast.
I have two images. I see no apparent difference between them; however, Tesseract is having trouble with one of them.
[image 1] [image 2]
The first image is the problematic one; the result I am getting is: 251\n41\n31\n\n11\n11\n\n11\n
As you can see, something is wrong with how the spacing is handled: there are two consecutive newlines at the point where things start to go wrong.
Meanwhile, in the second image I get the expected result: 300\n60\n40\n\n1\n15\n15\n10\n6\n15\n
These images were created through the following preprocessing steps:
image.Alpha(AlphaOption.Remove);
image.BlackThreshold(new Percentage(27));
image.Negate(); // Original image has white text on black background
I have limited Tesseract's character set to digits only (01234567890-).
I have tried various segmentation modes (SparseText, SingleColumn, SingleBlock). I am running Tesseract 4.1. Do you guys have any pointers?
Or maybe you could tell me what resize algorithm is fast and good for OCR?
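For reference, the digit whitelist and the segmentation modes described above would typically be set up something like this with the charlesw/tesseract .NET wrapper (a sketch only; the tessdata path, language, and file name are placeholders, not my actual code):

using System;
using Tesseract;

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
    // Restrict recognition to digits and the minus sign
    engine.SetVariable("tessedit_char_whitelist", "0123456789-");

    using (var img = Pix.LoadFromFile(@"preprocessed.png"))
    // Try different segmentation modes here (SparseText, SingleColumn, SingleBlock, ...)
    using (var page = engine.Process(img, PageSegMode.SparseText))
    {
        Console.WriteLine(page.GetText());
    }
}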
If you are having issues with Tesseract and are considering a more robust library that requires no training, you can try a commercial library such as Leadtools. With the Leadtools OCR toolkit, I was able to get perfect results for both images using only the basic image processing built into the OCR demo. There are, however, more sophisticated image processing functions you can use for more complex tasks if need be. Besides the OCR demo, I was also able to get the same results, in JSON form, without any preprocessing, from one of the tutorials linked below. As a disclaimer, I work for this vendor.
https://www.leadtools.com/help/sdk/v21/tutorials/dotnet-console-export-ocr-results-to-json.html
Here's some simplified source code that would achieve the same task for one image and print out the raw text:
// Create a new OCR engine with the default settings
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);

// Create an OCR document to hold everything
using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
{
    // Add the input image as a new page
    using (IOcrPage page = ocrDocument.Pages.AddPage(inputFilename, null))
    {
        // Perform OCR on just the one page
        page.Recognize(null);

        // Build a string from the recognized characters
        string text = page.GetText(0);

        // Show output
        Console.WriteLine($"text: '{text}'");
    }
}
The results I got for the two images were "251\r\n41\r\n31\r\n1\r\n11\r\n11\r\n7\r\n4\r\n11\r\n" and "300\r\n60\r\n40\r\n1\r\n15\r\n15\r\n10\r\n6\r\n15\r\n".
Related
I'm using a C# wrapper for the Tesseract library (3.02 if I'm not mistaken) (https://github.com/charlesw/tesseract). I've got it running and giving output, but that output is essentially garbage. Often it gives nothing, and when it does give something, it's often a mess. I know it works in principle because I've tried it on some really clean images and it performs fine. I'm wondering if someone can help me diagnose the issues and suggest some ways I can improve Tesseract's accuracy. I've already converted all the images to black and white and the resolution is set at 300x300. I don't do any line straightening programmatically, but as you can see below, they're pretty straight.
This image works perfectly
This one does not work at all, producing either gibberish or nothing at all
I tried flipping the colors on some examples, thinking that it might give greater contrast (since most text is black on a white background, whereas the working ones were white text on black background). But:
Does not work at all, whereas
Again works perfectly.
I suspect this has something to do with the additional spacing between the letters in "INVOICE." But there must be some way to get decent results with a tighter font. Any suggestions are welcome, I'm a relative noob here.
If possible, you should consider using pictures with a higher resolution. The other problem with the Payments image is probably that the gap between the letters is too small. Tesseract cannot detect individual letters if they are (almost) connected to the next letter of the word.
I would suggest an image processing library like OpenCV to improve your results.
You could try erosion/dilation. This will separate the letters if the right parameters are used for the kernel. Use different kernels to see what works best for you.
// erosion_type and erosion_size are tuning parameters,
// e.g. MORPH_ELLIPSE and a size of 1 or 2
Mat element = getStructuringElement(erosion_type,
                                    Size(2 * erosion_size + 1, 2 * erosion_size + 1),
                                    Point(erosion_size, erosion_size));
erode(src, erosion_dst, element);
What helped me a lot when I was working on my project was using an adaptive threshold. I found this to be far more effective than just converting the image to grayscale or a plain binary image.
Note: this is Java code; it should be very similar in C, though.
Imgproc.adaptiveThreshold(cropedIm, cropedIm, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 29, 10);
This is what I get after selecting one of your images in Pixtern, an Android project of mine (source code on GitHub). I was using the adaptive threshold but no dilation/erosion, and the result is already quite good.
For the Payments image and similar ones:
Try using a normal threshold and inverting the image (black font, white background). Again, dilation/erosion can be applied afterwards. Java code:
//results in binary image
Imgproc.threshold(cropedIm, cropedIm, 127, 255, Imgproc.THRESH_BINARY);
//Inverting image
Core.bitwise_not(cropedIm, cropedIm);
Tesseract expects whole pages, or rather, it was trained on those.
If you give it just one or two characters or words, it won't work well.
I assume you have more of these images. Stitch them together as lines of text, so that each image becomes one line of text after the previous one, and it should work much better (a sketch follows below).
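A rough sketch of that stitching step (not the answerer's code) using System.Drawing: stack the small snippets vertically into one taller image, then feed that single bitmap to Tesseract. The method name and file list are placeholders.

using System.Drawing;
using System.Linq;

Bitmap StackVertically(string[] files)
{
    var images = files.Select(f => new Bitmap(f)).ToList();
    int width = images.Max(i => i.Width);
    int height = images.Sum(i => i.Height);

    var result = new Bitmap(width, height);
    using (var g = Graphics.FromImage(result))
    {
        g.Clear(Color.White);          // match the page background
        int y = 0;
        foreach (var img in images)
        {
            g.DrawImage(img, 0, y);    // one snippet per "text line"
            y += img.Height;
            img.Dispose();
        }
    }
    return result;                     // pass this single bitmap to Tesseract
}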
Furthermore, make sure you set the psm parameter correctly when using Tesseract. More on this: https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/
I'm trying to make a screen-sharing program. The program flow will be like this:
capture the screen
slice it into 9 parts
compare each new slice with the old slice
replace the slices that differ
upload to the web (with the new slices)
But I've got some problems with replacing the slices (in the replace function). From all the sources I have searched, it seems I need to convert the bitmap image (the slice) to a string before I can replace it, but there's no example of converting a bitmap double array to strings.
Is there any way to replace the image without converting it to a string?
Why would you need to replace bitmap data using a string as an intermediate? You can use bitmap manipulation functions just fine. Also, I'm having trouble understanding your algorithm. You get a bitmap of the whole screen. Then you cut it into 9 parts (are those the corners, edges and center?), compare each of the slices to its old version one by one, replace the ones that changed, and then you upload the whole bitmap? Don't you want to upload each of the slices separately, only uploading the ones that changed? Otherwise it doesn't really make sense to do the slicing at all, or does it?
Now, it's true that converting the data to a string lets you use string comparison functions and other things like that, but that's an awful idea. The fastest way to compare two byte arrays is to use the memcmp function in msvcrt.dll. This answer gives you the solution, including reading the data from the original bitmaps: https://stackoverflow.com/a/2038515/3032289
Then you just send the slices that aren't the same as their older versions and you're done, no replacing needed.
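For illustration, here is a rough sketch of that comparison, modeled on the linked answer: lock both slices and compare their raw bytes with memcmp. The class and method names are mine, and it assumes both slices have the same size and pixel format.

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

static class SliceComparer
{
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(IntPtr b1, IntPtr b2, long count);

    public static bool AreEqual(Bitmap a, Bitmap b)
    {
        var rect = new Rectangle(0, 0, a.Width, a.Height);
        BitmapData da = a.LockBits(rect, ImageLockMode.ReadOnly, a.PixelFormat);
        BitmapData db = b.LockBits(rect, ImageLockMode.ReadOnly, b.PixelFormat);
        try
        {
            // Compare the locked pixel buffers byte for byte
            int byteCount = da.Stride * da.Height;
            return memcmp(da.Scan0, db.Scan0, byteCount) == 0;
        }
        finally
        {
            a.UnlockBits(da);
            b.UnlockBits(db);
        }
    }
}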
Probably the best way is to perform a Base64 encoding
Google for base64 C++ source code.
Currently I have a huge interest in image processing and optical character recognition. After some basic recognition and some filters, I decided to start on something more difficult.
I'm trying to read the value out of these captchas:
http://img851.imageshack.us/img851/9579/57859946.png
I have written some filters for pre-processing:
- Replace color (to white)
- Remove blue lines
- Remove the lines that go through the text (two of them)
- Threshold the image (255)
Which outputs an image like this:
http://img232.imageshack.us/img232/2325/00i3q45j1zt.png
As you can see, there are holes in some letters. I first thought it might be better to leave the lines through the letters, but that made it worse. I'm using the Tesseract OCR engine, and I trained it using the Elephant font (the font the captcha uses). I also tried other OCR engines like GOCR, but they made everything worse. With Tesseract I now have a recognition rate of 20%. I'm coding in C# (.NET 4.0).
The captcha is generated by a software package named PHPCaptcha.
Now my question is:
Is there any algorithm or trick to fill up the holes in the letters? And is there any other way to get better recognition?
I'm excited to hear from you guys :)
Greetings,
Part 0 - Preface
i) Beforehand, you may want to read my OCR-related answer here, which may give you some tricks for using Tesseract.
ii) I assume you can just turn everything into black and white (in your case, processing in color doesn't give you an edge).
Part 1 - Preprocessing
To fill the holes after you've removed the blue lines, you can always dilate or perform a 'dilate-then-erode' operation. Here, dilation means you grow every foreground pixel in all 8 directions (making it bigger). Once you've dilated the pixels, see whether the characters can be recognized, or whether they are 'over-filled' (dilated too much). If the characters cannot be recognized or are dilated too much, you can then apply an erosion operation. Of course there are more advanced synthesis algorithms, but I think you are better off starting with a simpler image processing operation first.
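For illustration, a dilate-then-erode pass (morphological closing) could look roughly like this in C# with the OpenCvSharp bindings. The library choice, kernel size, and file names are assumptions, not part of the original answer:

using OpenCvSharp;

// Load the thresholded captcha as a single-channel image
using var src = Cv2.ImRead("captcha_preprocessed.png", ImreadModes.Grayscale);
using var dst = new Mat();

// Small elliptical kernel; tune the size until the holes close
// without gluing neighbouring letters together
using var kernel = Cv2.GetStructuringElement(MorphShapes.Ellipse, new Size(3, 3));

// Closing = dilate, then erode
Cv2.MorphologyEx(src, dst, MorphTypes.Close, kernel);

Cv2.ImWrite("captcha_closed.png", dst);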
Part 2 - OCR/Tesseract
With Tesseract, if you feed in the whole image, it performs line analysis and so on. Since characters in a captcha don't behave like normal text, doing line analysis or recognizing them as a group may deteriorate the recognition rate somewhat. So my suggestion is to recognize character by character first.
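A rough sketch of character-by-character recognition with the charlesw/tesseract .NET wrapper mentioned earlier; the cropping step is assumed to have happened already, and the traineddata name and file names are placeholders:

using System;
using Tesseract;

using (var engine = new TesseractEngine(@"./tessdata", "elephant", EngineMode.Default))
{
    // Tell Tesseract each input is a single character (PSM 10)
    engine.DefaultPageSegMode = PageSegMode.SingleChar;

    // glyph_0.png, glyph_1.png, ... are pre-cropped single characters
    foreach (var file in new[] { "glyph_0.png", "glyph_1.png", "glyph_2.png" })
    {
        using (var pix = Pix.LoadFromFile(file))
        using (var page = engine.Process(pix))
        {
            Console.Write(page.GetText().Trim());
        }
    }
}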
I am trying to extract a human from a video source so that I can use his image later. I need to extract only the human body and ignore the environment. The good thing is that the background is static. I have tried AForge and applied the CustomFrameDifferenceDetector filter, which compares the current frame to the static background image and extracts the pixels that differ (difference > threshold). It works well, but there is a problem when skin or part of the clothing has a color similar to the background. In these cases the filter ignores those parts, and the result has various holes in the body. Simply decreasing the threshold doesn't solve the problem, since body shadows and other noise increase (even with noise suppression).
Do you know of any known solution to this problem? Or is it still unsolved problem?
It's a hard issue to solve (and one of the reasons Microsoft's Kinect doesn't rely on visible light alone, and why blue/green screening is still so popular). I'd try to remove the holes (you should be able to predict where the body has to be). If you've got the processing power, use different thresholds and merge the results. You could also try filtering across frames (e.g. add the current frame's mask to the last frame's and normalize the result); this way you could track shapes you lose for a single frame much more consistently.
A different approach: use the detected shape/region only for estimating the position of the body, i.e. ignore its specific shape and place a premade shape above/around the estimated body position. This most likely won't work if you'd like some kind of bluescreen-like behaviour, but it might also help with closing holes.
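Since AForge is already in use here, one way to close those holes in the difference mask could be its morphology and hole-filling filters. A rough sketch; the filter parameters are guesses you would need to tune, and it assumes the mask is a binary 8bpp grayscale image:

using System.Drawing;
using AForge.Imaging.Filters;

// 'mask' is the binary motion mask produced by the frame-difference step
Bitmap CleanMask(Bitmap mask)
{
    // Closing (dilate then erode) bridges small gaps in the silhouette
    new Closing().ApplyInPlace(mask);

    // Fill enclosed background regions inside the body blob
    var fill = new FillHoles
    {
        CoupledSizeFiltering = false,
        MaxHoleWidth = 50,   // tune to the largest hole you expect
        MaxHoleHeight = 50
    };
    fill.ApplyInPlace(mask);

    return mask;
}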
Alturos.Yolo does exactly what you are looking for.
Yolo learns from annotated images how to detect the objects you are looking for. First you need to install the project, along with a set of already trained model data, using the NuGet Package Manager. In your case the YOLOv2-tiny model should suffice:
Install-Package Alturos.Yolo
Install-Package Alturos.YoloV2TinyVocData
Once installed, you can use it like this to detect a human in your image:
using (var yoloWrapper = new YoloWrapper("yolov2-tiny-voc.cfg", "yolov2-tiny-voc.weights", "voc.names"))
{
    var items = yoloWrapper.Detect(@"your_image.jpg");
    //if (items[0].Type == "Person") { ... }
}
The items array will contain information about all the objects found. You can check there if it's a human you are looking at, using the Type property.
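As a follow-up, checking the detections might look roughly like this inside the using block above (property names taken from Alturos.Yolo's YoloItem class; the confidence cutoff and the exact casing of the "person" label are assumptions to verify against voc.names):

foreach (var item in yoloWrapper.Detect(@"your_image.jpg"))
{
    // Each detection carries a class label, a confidence score and a bounding box
    if (item.Type == "person" && item.Confidence > 0.5)
    {
        Console.WriteLine($"Person at ({item.X}, {item.Y}), size {item.Width}x{item.Height}, confidence {item.Confidence:P0}");
    }
}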
I have a source image like the left picture and a set of elements like the right picture: [source image and elements]
...and I need to generate a mosaic picture like this.
But until now I have not worked with images, and I do not know where I should start.
I have worked with C# for several years, but you can give examples in other similar languages.
The result image you gave is apparently a ministeck pattern; back in 2011 they had downloadable software that seemed to do what you want. (It is no longer available from ministeck directly, but it seems that pfci.de still provides a download.)
So, if you're just looking to generate ministeck patterns from a given image, use their software. If you're after an algorithm to achieve something different, this won't help.
EDIT
OK, if you want to analyze your image yourself, you need to load it into a Bitmap object like this:
using (Bitmap b = new Bitmap(yourFileName))
{
    MessageBox.Show(string.Format("image size {0} by {1} pixels", b.Width, b.Height));
    MessageBox.Show(string.Format("color of pixel (100,100) is {0}", b.GetPixel(100, 100).ToString()));
}
The Bitmap object has several properties and methods that will help you analyze the image content. Try this to get started with analyzing your image, and don't forget to either dispose of your bitmap afterwards or wrap it in a using statement as shown above.
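Building on that, a very simplified mosaic approach (not from the original answer) is to walk over the source image in fixed-size cells, average each cell's color, and paint the cell with the closest color from your element palette. A sketch, with the palette and cell size as placeholder assumptions:

using System;
using System.Drawing;
using System.Linq;

static Bitmap BuildMosaic(Bitmap source, Color[] palette, int cellSize)
{
    var result = new Bitmap(source.Width, source.Height);
    using (var g = Graphics.FromImage(result))
    {
        for (int y = 0; y < source.Height; y += cellSize)
        {
            for (int x = 0; x < source.Width; x += cellSize)
            {
                // Average the colors inside this cell (GetPixel is slow but simple)
                long r = 0, gSum = 0, b = 0, n = 0;
                for (int dy = 0; dy < cellSize && y + dy < source.Height; dy++)
                for (int dx = 0; dx < cellSize && x + dx < source.Width; dx++)
                {
                    Color c = source.GetPixel(x + dx, y + dy);
                    r += c.R; gSum += c.G; b += c.B; n++;
                }
                var avg = Color.FromArgb((int)(r / n), (int)(gSum / n), (int)(b / n));

                // Pick the palette color with the smallest squared RGB distance
                Color best = palette.OrderBy(p =>
                    (p.R - avg.R) * (p.R - avg.R) +
                    (p.G - avg.G) * (p.G - avg.G) +
                    (p.B - avg.B) * (p.B - avg.B)).First();

                using (var brush = new SolidBrush(best))
                    g.FillRectangle(brush, x, y, cellSize, cellSize);
            }
        }
    }
    return result;
}

From there, generating an actual element pattern is mostly a matter of mapping each cell's chosen color back to one of your element pieces.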