Hello, I want to extract text blocks from images and pass them to OCR for better accuracy. I have been searching the internet but have not been able to find a suitable example of this. I am very new to this concept; can anyone please help me out?
This is what I want to achieve.
Note: I am using Emgu CV for OpenCV and for the OCR.
I mostly want to scan receipts.
If you can help with that, it would be great.
Is your text always in the same location? If so, you already know the location of the region of interest.
// Create the rectangle describing the region of interest
cv::Rect roi(0, 0, 500, 500);
// Create a cv::Mat restricted to that ROI
cv::Mat imageRoi = image(roi);
Then you can send this image to the OCR.
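Since you are using Emgu CV, the C# equivalent is roughly this sketch (System.Drawing and Emgu.CV namespaces; image is assumed to be a Mat you have already loaded):

// Crop the region of interest; the new Mat shares pixel data with 'image'
Rectangle roi = new Rectangle(0, 0, 500, 500);
Mat imageRoi = new Mat(image, roi);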
You can threshold your image into a binary image. After that, you can apply the morphological operation "dilate" (repeatedly) to join the letters. Once the letters are joined, use the findContours() function to extract each contour and its bounding box; a sketch of the whole pipeline follows.
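Here is a minimal sketch of that pipeline with the Emgu CV 4.x-style API. The file name, kernel size, and iteration count are assumptions you will want to tune for your receipts:

using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using Emgu.CV.Util;

class TextBlocks
{
    static void Main()
    {
        Mat src = CvInvoke.Imread("receipt.jpg", ImreadModes.Color);
        Mat gray = new Mat(), bin = new Mat();
        CvInvoke.CvtColor(src, gray, ColorConversion.Bgr2Gray);

        // Binarize; BinaryInv makes the text white, Otsu picks the threshold
        CvInvoke.Threshold(gray, bin, 0, 255, ThresholdType.BinaryInv | ThresholdType.Otsu);

        // Dilate with a wide kernel so neighboring letters merge into blocks
        Mat kernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle,
            new Size(15, 3), new Point(-1, -1));
        CvInvoke.Dilate(bin, bin, kernel, new Point(-1, -1), 3,
            BorderType.Constant, new MCvScalar());

        // Each contour's bounding box is one candidate text block for the OCR
        using (VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint())
        {
            CvInvoke.FindContours(bin, contours, null, RetrType.External,
                ChainApproxMethod.ChainApproxSimple);
            for (int i = 0; i < contours.Size; i++)
            {
                Rectangle box = CvInvoke.BoundingRectangle(contours[i]);
                Mat block = new Mat(src, box); // crop this and pass it to the OCR
            }
        }
    }
}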
I would like to ask for some insight on how I might improve my OCR accuracy. My target images are low-resolution screenshots, and I would much prefer not to upscale them, as my program needs to run fast.
I have two images. I see no apparent difference between them; however, Tesseract is having trouble with one of them.
[image 1 and image 2 omitted]
The first image is the problem; the result I get is: 251\n41\n31\n\n11\n11\n\n11\n
As you can see, something is wrong with how the spacing is handled: double newlines appear at the point where things start to go wrong.
Meanwhile, in the second image I get the expected result: 300\n60\n40\n\n1\n15\n15\n10\n6\n15\n
These images were created through the following preprocessing steps:
image.Alpha(AlphaOption.Remove);
image.BlackThreshold(new Percentage(27));
image.Negate(); // Original image has white text on black background
I have limited Tesseract's character set to digits only (01234567890-).
I have tried various segmentation modes (SparseText, SingleColumn, SingleBlock). I am running Tesseract 4.1. Do you guys have any pointers?
Or maybe you could tell me what resize algorithm is fast and good for OCR?
If you are having issues with Tesseract and are considering a more robust library that needs no training, you can try a commercial library such as Leadtools. With the Leadtools OCR toolkit, I was able to get perfect results for both images using only the basic image processing built into the OCR demo. There are, however, more sophisticated image processing functions you can use for more complex tasks if need be. Besides the OCR demo, I was also able to get the same results, in JSON form, without any preprocessing, from one of the tutorials linked below. As a disclaimer, I work for this vendor.
https://www.leadtools.com/help/sdk/v21/tutorials/dotnet-console-export-ocr-results-to-json.html
Here's some simplified source code that would achieve the same task for one image and print out the raw text:
// Create a new OCR engine with the default settings
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);

// Create an OCR document to hold everything
using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
{
    // Add the input image as a new page
    using (IOcrPage page = ocrDocument.Pages.AddPage(inputFilename, null))
    {
        // Perform OCR on just the one page
        page.Recognize(null);

        // Build a string from the recognized characters
        string text = page.GetText(0);

        // Show the output
        Console.WriteLine($"text: '{text}'");
    }
}
The results I got for the two images were "251\r\n41\r\n31\r\n1\r\n11\r\n11\r\n7\r\n4\r\n11\r\n" and "300\r\n60\r\n40\r\n1\r\n15\r\n15\r\n10\r\n6\r\n15\r\n".
I'm using a C# wrapper for the Tesseract library (3.02, if I'm not mistaken): https://github.com/charlesw/tesseract. I've got it running and producing output, but that output is essentially garbage: often it gives nothing, and when it does give something it's often a mess. I know it works in principle, because it performs well on some really clean images. I'm wondering if someone can help me diagnose the issues and suggest ways to improve Tesseract's accuracy. I've already converted all the images to black and white, and the resolution is set at 300x300. I don't straighten lines programmatically, but as you can see below they're pretty straight.
This image works perfectly
This one does not work at all, producing either gibberish or nothing at all
I tried flipping the colors on some examples, thinking it might give greater contrast (since most text is black on a white background, whereas the working images were white text on black). But the flipped version does not work at all, whereas the original again works perfectly.
I suspect this has something to do with the extra spacing between the letters in "INVOICE", but there must be some way to get decent results with a tighter font. Any suggestions are welcome; I'm a relative noob here.
If possible, you should consider using pictures with a higher resolution. The other problem with the Payments image is probably that the gap between the letters is too small: Tesseract cannot detect single letters if they are (almost) connected to the next letter of the word.
I would suggest an image processing library like OpenCV to improve your results.
You could try erosion/dilation, which will separate the letters if the right parameters are used for the kernel. Try different kernels to see what works best for you:
// Kernel shape and size; e.g. cv::MORPH_RECT with erosion_size = 1 gives a 3x3 kernel
int erosion_type = cv::MORPH_RECT;
int erosion_size = 1;
cv::Mat element = cv::getStructuringElement(erosion_type,
    cv::Size(2 * erosion_size + 1, 2 * erosion_size + 1),
    cv::Point(erosion_size, erosion_size));
cv::erode(src, erosion_dst, element);
What helped me a lot when I was working on my project was using an adaptive threshold. I found this to be far more effective than just converting the image to grayscale or to a plain binary image.
Note: this is Java code, but it should be very similar in C#.
Imgproc.adaptiveThreshold(cropedIm, cropedIm, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 29, 10);
This is what I get after selecting one of your images in Pixtern, an Android project of mine (source code on GitHub). I used the adaptive threshold but no dilation/erosion, and the result is already quite good.
[broken links removed]
For the Payments image and similar ones:
Try using a normal threshold and inverting the image (black font, white background). Again, dilation/erosion can be applied afterwards. Java code:
//results in binary image
Imgproc.threshold(cropedIm, cropedIm, 127, 255, Imgproc.THRESH_BINARY);
//Inverting image
Core.bitwise_not(cropedIm, cropedIm);
Tesseract expects whole pages, or rather, it was trained on them.
If you give it only one or two characters or words, it won't work well.
I assume you have more of these images. Stitch them together as lines of text, each image becoming one line after the previous, and it should work much better; see the sketch below.
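For instance, a minimal System.Drawing sketch of the stitching, one source image per line; the padding value is an arbitrary assumption:

using System.Collections.Generic;
using System.Drawing;
using System.Linq;

static class Stitcher
{
    // Stack several small word/line images vertically into one page-like
    // image, so Tesseract sees something closer to what it was trained on.
    public static Bitmap StitchLines(IList<Bitmap> lines, int padding = 10)
    {
        int width = lines.Max(l => l.Width) + 2 * padding;
        int height = lines.Sum(l => l.Height + padding) + padding;
        var page = new Bitmap(width, height);
        using (var g = Graphics.FromImage(page))
        {
            g.Clear(Color.White); // dark text on a white page
            int y = padding;
            foreach (var line in lines)
            {
                g.DrawImage(line, padding, y);
                y += line.Height + padding;
            }
        }
        return page;
    }
}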
Furthermore, make sure you set the psm parameter correctly when using Tesseract. More on this: https://www.pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/
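Assuming the charlesw/tesseract .NET wrapper, the segmentation mode is passed to Process(); here is a minimal sketch, where the tessdata path and image file name are assumptions:

using System;
using Tesseract;

class PsmDemo
{
    static void Main()
    {
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        {
            // Same digit whitelist the question describes
            engine.SetVariable("tessedit_char_whitelist", "0123456789-");
            using (var img = Pix.LoadFromFile("numbers.png"))
            // Try the segmentation modes one by one; SingleBlock (PSM 6)
            // is often a good fit for a column of numbers.
            using (var page = engine.Process(img, PageSegMode.SingleBlock))
            {
                Console.WriteLine(page.GetText());
            }
        }
    }
}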
Well, I am trying to make an object tracker. I produced the filtered image, which tracks the object and renders it in white; I used this to get the filtered image:
CvInvoke.cvInRangeS(HSVimg, low, high, THImg);
Now I am trying to get the contours and find the center point, so I used this (I can't test it yet):
using (Image<Gray, Byte> canny = smoothedRedMask.Canny(100.0, 50.0))
using (MemStorage stor = new MemStorage())
{
    Contour<Point> contours = canny.FindContours(
        Emgu.CV.CvEnum.CHAIN_APPROX_METHOD.CV_CHAIN_APPROX_SIMPLE,
        Emgu.CV.CvEnum.RETR_TYPE.CV_RETR_TREE,
        stor);
}
So I have two questions. What does the Canny method do?
And how do I draw a shape around the tracked object and then get its center point, using moments or any other method?
You don't have to write code; just point me to simple reference code that I can use.
The Canny function implements an edge detection algorithm; it uses a multi-stage process to detect a wide range of edges in images.
Refer to the Wikipedia article on it, or to a tutorial with code, to understand it better.
The other part of the question is a bit trickier, since drawing a shape around the tracked object will depend on the quality of the image produced by the Canny edge detection and also on the geometry of the object.
So you might want to adjust the values passed to the Canny function to suit your needs.
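To give you a concrete idea, here is a minimal sketch built on the same Emgu CV 2.x API as your snippet; smoothedRedMask and originalFrame are taken from your code and assumed to be valid images:

using (Image<Gray, Byte> canny = smoothedRedMask.Canny(100.0, 50.0))
using (MemStorage stor = new MemStorage())
{
    for (Contour<Point> c = canny.FindContours(
             Emgu.CV.CvEnum.CHAIN_APPROX_METHOD.CV_CHAIN_APPROX_SIMPLE,
             Emgu.CV.CvEnum.RETR_TYPE.CV_RETR_TREE,
             stor);
         c != null;
         c = c.HNext)
    {
        // Draw a rectangle around each contour on the original frame
        originalFrame.Draw(c.BoundingRectangle, new Bgr(0, 255, 0), 2);

        // Center point from image moments: (m10/m00, m01/m00)
        MCvMoments m = c.GetMoments();
        MCvPoint2D64f center = m.GravityCenter;
    }
}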
You can also refer to these YouTube video tutorials to better understand and code your object tracking logic.
Video 1
Video 2.
Hoping it helps.
I have a source image like the left picture and a set of elements like the right picture: [source image and elements omitted]
...and I need to generate a mosaic picture like this.
But until now I have not worked with images, and I do not know where I should start.
I have worked with C# for several years, but you can give examples in other similar languages.
The result image you show is apparently a ministeck pattern; in 2011 they had downloadable software that seemed to do what you want. (It is no longer available from ministeck directly, but it seems that pfci.de still provides a download.)
So, if you're just looking to generate ministeck patterns from a given image, use their software. If you're after an algorithm to achieve something different, this won't help.
EDIT
OK, if you want to analyze your image, you need to load it into an object like this:
using (Bitmap b = new Bitmap(yourFileName))
{
    MessageBox.Show(string.Format("image size {0} by {1} pixels", b.Width, b.Height));
    MessageBox.Show(string.Format("color of pixel (100,100) is {0}", b.GetPixel(100, 100).ToString()));
}
The Bitmap class has several properties and methods that will help you analyze the image content. Try this to get started, and don't forget to either dispose of your bitmap afterwards or wrap it in a using statement as shown above. A sketch of the next step follows.
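As a starting point for the mosaic itself, here is a minimal sketch that averages each square tile of the source image and picks the nearest color from a palette of element colors; the tile size and the palette are assumptions you would derive from your element set:

using System;
using System.Drawing;

static class Mosaic
{
    // For one square tile of the source image, average its pixels and return
    // the palette color (one per mosaic element) closest to that average.
    public static Color NearestPaletteColor(Bitmap b, int x0, int y0, int tile, Color[] palette)
    {
        long r = 0, g = 0, bl = 0;
        int n = 0;
        for (int y = y0; y < Math.Min(y0 + tile, b.Height); y++)
            for (int x = x0; x < Math.Min(x0 + tile, b.Width); x++)
            {
                Color c = b.GetPixel(x, y);
                r += c.R; g += c.G; bl += c.B; n++;
            }
        Color avg = Color.FromArgb((int)(r / n), (int)(g / n), (int)(bl / n));

        // Nearest neighbor in RGB space; a perceptual color distance would do better
        Color best = palette[0];
        int bestDist = int.MaxValue;
        foreach (Color p in palette)
        {
            int dr = avg.R - p.R, dg = avg.G - p.G, db = avg.B - p.B;
            int d = dr * dr + dg * dg + db * db;
            if (d < bestDist) { bestDist = d; best = p; }
        }
        return best;
    }
}

Calling this for every tile position (stepping by the tile size) gives you one element color per tile, which is essentially the mosaic pattern.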
My requirement is something like this:
Let's say there is a Bitmap with a big letter 'A'.
The Bitmap contains only two colors (black or white).
I need to skeletonize the big 'A' (see http://en.wikipedia.org/wiki/Topological_skeleton)
using the medial axis transform algorithm.
I tried my best with Google, but I ended up lost looking for a C#, C++, or at least pseudocode implementation of this algorithm.
I would appreciate it if someone could help me with this.
This page, http://www.cs.sunysb.edu/~algorith/files/thinning.shtml, has some sources on thinning that you may wish to review.
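If a thinning-style skeleton is good enough for your purposes, here is a minimal C# sketch of the classic Zhang-Suen thinning algorithm; note this is a substitute for, not an implementation of, the medial axis transform:

using System.Collections.Generic;

static class Thinning
{
    // img[y, x] == true means foreground. Thins until no pixel changes.
    public static void ZhangSuen(bool[,] img)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        bool changed = true;
        while (changed)
        {
            changed = false;
            for (int step = 0; step < 2; step++)
            {
                var toClear = new List<(int y, int x)>();
                for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++)
                {
                    if (!img[y, x]) continue;
                    // Neighbors clockwise from north: p2..p9
                    bool[] p =
                    {
                        img[y - 1, x], img[y - 1, x + 1], img[y, x + 1], img[y + 1, x + 1],
                        img[y + 1, x], img[y + 1, x - 1], img[y, x - 1], img[y - 1, x - 1]
                    };
                    int b = 0;                    // B(p1): foreground neighbors
                    foreach (bool v in p) if (v) b++;
                    int a = 0;                    // A(p1): 0->1 transitions around the ring
                    for (int i = 0; i < 8; i++)
                        if (!p[i] && p[(i + 1) % 8]) a++;
                    bool cond = step == 0
                        ? !(p[0] && p[2] && p[4]) && !(p[2] && p[4] && p[6])  // p2*p4*p6==0, p4*p6*p8==0
                        : !(p[0] && p[2] && p[6]) && !(p[0] && p[4] && p[6]); // p2*p4*p8==0, p2*p6*p8==0
                    if (b >= 2 && b <= 6 && a == 1 && cond)
                        toClear.Add((y, x));
                }
                foreach (var (y, x) in toClear) img[y, x] = false;
                if (toClear.Count > 0) changed = true;
            }
        }
    }
}

You would threshold your black-and-white Bitmap into the bool[,] first, run this, and write the surviving pixels back out.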
The following two articles are the ones in which the medial axis transform was first proposed, so you should be able to find the algorithm to implement there. Do not expect a C++/C# implementation, though.
A transformation for extracting new descriptors of shape
Shape description using weighted symmetric axis features
For the first one I was able to find a URL to a PDF. For the second one you will need access to ScienceDirect to download it.
Another approach you can use to extract the skeleton of a shape is the image foresting transform (IFT), which represents the binary image as a graph. I implemented skeletonization by IFT in Matlab using the following article:
Multiscale skeletons by image foresting transform and its applications to neuromorphometry