How to join 2 contours - C#

I'm trying to build an OCR/OCV app, and it works well. But in real-world scenarios, printed text is not perfect: it has defects such as ink spread or a cut through a character. Ink spread is manageable, but I'm stuck on how to join the two parts of a character when there is a cut, like in the image below.
I find contours before I do OCR/OCV:
using (VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint())
{
    // External contours only; a cut character shows up as two contours here.
    CvInvoke.FindContours(binaryimg, contours, null, RetrType.External, ChainApproxMethod.ChainApproxSimple);
    int count = contours.Size;
    for (int i = 0; i < count; i++)
    {
        double perimeter = CvInvoke.ArcLength(contours[i], true);
        using (VectorOfPoint approx = new VectorOfPoint())
        {
            CvInvoke.ApproxPolyDP(contours[i], approx, 0.04 * perimeter, true);
            CvInvoke.DrawContours(mainimg, contours, i, new MCvScalar(0, 255, 0), 2);
            Rectangle r = CvInvoke.BoundingRectangle(approx);
            id++; // running id of the detected character, declared elsewhere
            int area = r.Width * r.Height;
            int width = r.Width;
            int height = r.Height;
        }
    }
}
I get the height and width of each character, and inside those rectangles I do OCR and OCV. When there is a cut in a character, it gets detected as two contours. How do I join those? I tried opening and closing, but it didn't help much.

I'm not very familiar with OCR, but my understanding is that the first step is usually to run a text detector to find the parts of the image covered by text, forming regions of interest to feed to the OCR algorithm. Creating per-character ROIs seems like it could be problematic, since some fonts may have very small or even no spacing between some characters. Some OCR engines may also benefit from knowing the font.
Assuming the print defects cover more than one character, one option would be to detect and repair that specific defect. You could, for example, apply a horizontal blur filter, do some thresholding and line detection. Once you have found a defect, you could try to repair it by finding the places where it seems to cut characters, and fill those in (a small morphological sketch follows below).
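For illustration, here is a hedged Emgu CV sketch of a simpler variant of the repair idea: since the cuts run horizontally, a morphological close with a tall, narrow structuring element bridges the gap vertically without smearing neighbouring characters together horizontally (a plain square kernel, as in a generic open/close, tends not to help here). The kernel size is an assumption to tune to the defect height:
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using System.Drawing;

// A 1x5 rectangular kernel only bridges gaps in the vertical direction,
// so the two halves of a cut glyph merge into a single contour.
Mat kernel = CvInvoke.GetStructuringElement(
    ElementShape.Rectangle, new Size(1, 5), new Point(-1, -1));
Mat bridged = new Mat();
CvInvoke.MorphologyEx(binaryimg, bridged, MorphOp.Close, kernel,
    new Point(-1, -1), 1, BorderType.Default, new MCvScalar());
// Then run FindContours on 'bridged' instead of 'binaryimg'.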
Another approach might be to retrain the neural network on your particular dataset and defects to try to improve accuracy.
But it is very likely that some errors will still occur, so in the end you may still need a human proof-reading the result, or some system that informs the operator about sections the algorithm is uncertain about.

Related

Perceive Dimensions (or prominent points) of a Cuboid in a 2D image using OpenCV

I was wondering if it is possible to find the dimensions (in pixels) of a cube/cuboid in an image like the one shown below.
I know it's nearly impossible because there is no information about the depth, the viewing angle, etc. But at least, can one find the appropriate corners of the cube so that the length, width and height can be approximated?
Any help or information would be appreciated.
Thanks in advance.
I guess I could suggest a solution to the "at least" part of the question: you can find the corners of the cube by finding the lines in the image.
Firstly, find the edges in the image. If the target images are as plain and clear as the provided one, finding edges should be straightforward. Use cv::Canny().
cv::Mat img = cv::imread("cube.png");
cv::Mat edges;
cv::Canny(img, edges, 20, 60);
Secondly, in the edges image, detect the straight lines. Use either cv::HoughLines() or cv::HoughLinesP(). Here, I proceed with the former one:
std::vector<cv::Vec2f> lines;
cv::HoughLines(edges, lines, 0.6, CV_PI / 120, 50);
Please refer to the OpenCV documentation on Hough lines. I also took the visualization code from there.
The cv::HoughLines() function detects straight lines and, for each line, returns two values (ρ, the distance, and θ, the rotation angle) that define the line's equation in polar coordinates. The function often returns several lines for one source edge (as it did for a couple of edges here). In our case, we can remove such duplicates by filtering out lines with very close ρ values (see the sketch below).
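A hedged C# rendering of that deduplication step (kept in C# to match the rest of this page; the ρ and θ tolerances are assumptions to tune):
using System;
using System.Collections.Generic;
using System.Linq;

// lines: the detected Hough lines as (rho, theta) pairs.
static List<(float Rho, float Theta)> Dedupe(List<(float Rho, float Theta)> lines)
{
    var kept = new List<(float Rho, float Theta)>();
    foreach (var line in lines)
    {
        // Treat a line whose rho differs by only a few pixels (and whose
        // theta is close) from an already-kept line as a duplicate.
        bool duplicate = kept.Any(k =>
            Math.Abs(k.Rho - line.Rho) < 5f &&
            Math.Abs(k.Theta - line.Theta) < 0.1f);
        if (!duplicate)
            kept.Add(line);
    }
    return kept;
}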
What helps in our case is that the sides of the cube responsible for each dimension (length, width, and height) will have the same rotation angle θ in the found line equations. For instance, we can expect the vertical sides of the cube (responsible for the height dimension) to remain vertical and have their θ close to 0 or π (see the OpenCV documentation). We can find such lines in the vector of detected Hough lines:
std::vector<cv::Vec2f> vertical_lines;
std::copy_if(lines.begin(), lines.end(), std::back_inserter(vertical_lines), [](cv::Vec2f line) {
    // copy if θ is near 0 or CV_PI
    return ((0 <= line[1]) && (line[1] < CV_PI / 10)) ||
           ((line[1] < CV_PI) && (line[1] > CV_PI - CV_PI / 10));
});
The same reasoning applies to finding the lines for the rest of the cube sides. Just filter the found Hough lines by appropriate θ.
Now that we have the equations of the lines of interest, we can find their corresponding edge pixels (the code below is not optimal, just a demo):
std::vector<cv::Point> non_zero_points;
cv::findNonZero(edges, non_zero_points);
std::vector<std::vector<cv::Point>> corresponding_points(vertical_lines.size());
for (size_t i = 0; i < vertical_lines.size(); ++i)
    for (auto point : non_zero_points)
        // A pixel belongs to a line if it satisfies the polar line equation
        // x*cos(θ) + y*sin(θ) = ρ within a 2-pixel tolerance.
        if (abs(cos(vertical_lines[i][1]) * point.x + sin(vertical_lines[i][1]) * point.y - vertical_lines[i][0]) < 2)
            corresponding_points[i].push_back(point);
Now, for each found cluster, find the top-most and bottom-most points (or left-most/right-most for the other sides) to get your cube corners.
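A minimal C# sketch of that corner pick, assuming cluster is a List<Point> gathered from one near-vertical line:
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// The corner candidates of a near-vertical cluster are simply the
// points with the smallest and largest Y coordinate.
Point top = cluster.OrderBy(p => p.Y).First();
Point bottom = cluster.OrderBy(p => p.Y).Last();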
Please note the pixel I marked with exclamation marks. It got accidentally sorted into one of the vertical Hough lines, but it actually belongs to the non-vertical top side. It needs to be removed, either by some outlier detection or by some other approach to the corresponding-pixel search.
About retrieving the actual lengths of the sides: to my knowledge, this is really a non-trivial problem. Maybe this SO question would be a good place to start.

Local Thresholding or Binarization of Text in an image

I'm developing an application in C# to extract text under different lighting conditions.
My problem is that sometimes there are different brightness levels within the image, like this:
So I can't use one pre-calculated threshold for the whole image, or I will lose some letters.
I'm searching for an algorithm/snippet/function that can apply the right threshold/binarization to the image.
I found BradleyLocalThresholding in AForge. It is better than other non-adaptive methods, but it loses some detail (for example, the G in the image becomes an O).
Can anyone suggest a better way?
Yes, use Niblack (OpenCV has it as a function, in the ximgproc contrib module). Basically, it uses the local average to construct a variable threshold, and it works best for OCR. Depending on the image resolution, you might also want to bicubically upsample by a factor of 2x or 3x before thresholding.
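For a quick experiment from C#, here is a hedged sketch using Emgu CV's AdaptiveThreshold, a mean-based local threshold in the same spirit (Niblack itself lives in the ximgproc contrib module and is not exposed by every wrapper). The block size and constant are assumptions to tune to your scans:
using Emgu.CV;
using Emgu.CV.CvEnum;

Mat gray = CvInvoke.Imread("text.png", ImreadModes.Grayscale);
Mat binary = new Mat();
// Each pixel is compared against the mean of its 25x25 neighbourhood
// minus a small constant, so local brightness changes are tolerated.
CvInvoke.AdaptiveThreshold(gray, binary, 255,
    AdaptiveThresholdType.MeanC, ThresholdType.Binary, 25, 10);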
It's quite difficult since the quality of your images is so low, but you could try an iterative global thresholding approach, as follows:
1. Select an initial estimate of the threshold T (the mean is a common choice).
2. Segment the signal using T, which yields two groups: G1, consisting of all points with values <= T, and G2, consisting of the points with values > T.
3. Compute the mean values M1 and M2 of the points in G1 and G2, respectively.
4. Compute a new threshold value T = (M1 + M2) / 2.
5. Repeat steps 2 through 4 until the change in T is small enough.
The trick is not to apply it to the whole image, but to break the image up into blocks of (for example) 5x5 pixels and apply it to each block individually, which would give you:
Below is an implementation in R which I'm sure you could reproduce
getT = function(y){
  t = mean(y)
  mu1 = mean(y[y >= t])
  mu2 = mean(y[y < t])
  i = 1
  # iterate until the two group means stop changing
  while(i >= 1){
    cmu1 = mean(y[y >= t])
    cmu2 = mean(y[y < t])
    if(i > 1 & cmu1 == mu1 & cmu2 == mu2){
      print(paste('done t=', t))
      return(t)
    }else{
      mu1 = cmu1
      mu2 = cmu2
      t = (mu1 + mu2)/2
      print(paste('new t=', t))
    }
    i = i + 1
  }
}
r = seq(1, nrow(image), by = 5)
c = seq(1, ncol(image), by = 5)
r[length(r)] = nrow(image)
c[length(c)] = ncol(image)
y = image
for(i in 2:length(r)){
  for(j in 2:length(c)){
    block = image[r[i-1]:r[i], c[j-1]:c[j]]
    t = getT(block)
    y[r[i-1]:r[i], c[j-1]:c[j]] = (block > t) + 0
  }
}
display(y)
The other option besides a local threshold would be to adjust for the varying illumination. There are methods that attempt to correct the illumination and make it uniform across the image. You could then use a constant threshold, or continue to use a local threshold, with perhaps better success. If the images are like the one you show, then you could use the brighter squares around the letters as the key to adjusting the illumination.
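As a hedged sketch of that illumination-correction idea (Emgu CV; the blur kernel size and final threshold are assumptions to tune): estimate the slowly varying background with a blur much larger than a character, divide it out, then threshold globally.
using Emgu.CV;
using Emgu.CV.CvEnum;
using System.Drawing;

Mat gray = CvInvoke.Imread("page.png", ImreadModes.Grayscale);
Mat background = new Mat();
// A kernel much larger than a character blurs the text away,
// leaving only the illumination pattern.
CvInvoke.GaussianBlur(gray, background, new Size(51, 51), 0);
Mat flattened = new Mat();
// Divide by the background and rescale to the 0-255 range.
CvInvoke.Divide(gray, background, flattened, 255.0);
Mat binary = new Mat();
CvInvoke.Threshold(flattened, binary, 200, 255, ThresholdType.Binary);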

Checking to see if an image is Blank in C#

I've looked everywhere, but there doesn't seem to be a standard (that I could see) for how one would go about checking whether an image is blank, in C#.
I have a way of doing this, but would love to know what the correct way is of checking whether an image is blank, so everyone could also know in the future.
I'm not going to copy-paste a bunch of code in; if you want me to, it will be my pleasure, but I first want to explain how I go about checking whether an image is blank.
You take a .jpg image and get its width, for example 500 pixels.
Then you divide that by 2, giving you 250.
Then you check the colour of every pixel at location (250, i), where you iterate through the height of the image.
What this does is check only the middle vertical line of pixels of the image. It goes through all those pixels checking whether the colour is anything except white. I've done it this way so you don't have to search all 500 * height pixels, since you will almost always come across a colour in the middle of the page.
It's working... a bit slowly... There must be a better way to do this? You could change it to search 2/3/4 vertical lines to increase your chance of spotting a page that's not blank, but that will take even longer.
(Also note that using the file size of the image to check whether it contains something will not work in this case, since the sizes of a page with two sentences on it and a blank page are too close to one another.)
Edit, after the solution was added.
Resources to help with the implementation and understanding of the solution:
Writing unsafe code - pointers in C#
Using Pointers in C#
/unsafe (C# Compiler Options)
Bitmap.LockBits Method (Rectangle, ImageLockMode, PixelFormat)
(Note that on the first website, the stated "Pizelformat" is actually "PixelFormat" - a small error, I know; just mentioning it, as it might cause some confusion.)
After I implemented the method to speed up the pixel hunting, the speed didn't increase that much, so I suspect I'm doing something wrong.
Old time = 15.63 for 40 images.
New time = 15.43 for 40 images.
I saw in the great article DocMax quoted that the code "locks" in a set of pixels (or that's how I understood it). So what I did is lock in the middle column of pixels of each page. Would that be the right move?
private int testPixels(String sourceDir)
{
    // iterate through the .jpg images in the folder
    var q = from string x in Directory.GetFiles(sourceDir)
            where x.ToLower().EndsWith(".jpg")
            select new FileInfo(x);
    int holder = 1;
    foreach (var z in q)
    {
        Bitmap mybm = Bitmap.FromFile(z.FullName) as Bitmap;
        int blank = getPixelData2(mybm);
        if (blank == 0) // no colour found along the column: page is blank
        {
            holder = 0;
            break;
        }
    }
    return holder;
}
And then the method:
private unsafe int getPixelData2(Bitmap bm)
{
    // Lock just the middle 1-pixel-wide column, forcing 24bpp RGB.
    BitmapData bmd = bm.LockBits(
        new System.Drawing.Rectangle(bm.Width / 2, 0, 1, bm.Height),
        System.Drawing.Imaging.ImageLockMode.ReadOnly,
        System.Drawing.Imaging.PixelFormat.Format24bppRgb);
    for (int y = 0; y < bmd.Height; y++)
    {
        byte* row = (byte*)bmd.Scan0 + (y * bmd.Stride);
        // The locked region is one pixel wide, so that pixel's three
        // channels sit at offsets 0, 1 and 2 (BGR order).
        int blue = row[0];
        int green = row[1];
        int red = row[2];
        // Check to see if there is some form of colour
        if ((blue != 255) || (green != 255) || (red != 255))
        {
            bm.UnlockBits(bmd);
            bm.Dispose();
            return 1;
        }
    }
    bm.UnlockBits(bmd);
    bm.Dispose();
    return 0;
}
If you can tolerate the chance of getting it wrong, the approach seems fine; I have done something very similar in my case, although I always had a visual confirmation to deal with errors.
For performance, the key open question is how you are getting the pixels to test. If you are using Bitmap.GetPixel, you are bound to have performance problems. (Search for "Bitmap.GetPixel slow" on Google to see lots of discussion.)
Far better performance will come from getting all the pixels at once and then looping over them. I personally like Bob Powell's LockBits discussion for clarity and completeness. With that approach, checking all of the pixels may well be reasonable, depending on your performance needs.
If you're using System.Drawing.Bitmap you can speed things up substantially by:
Not using GetPixel to access the pixels; use LockBits and UnlockBits to copy the image bitmap to regular memory instead. See the examples in the MSDN documentation for usage, and the sketch after the notes below.
Not calling the Width, Height or Size properties inside the loop: call Size once, store the values in local variables and use those in the loop.
Notes:
When using System.Drawing.Bitmap your image may be in device memory, and accessing it may be time-consuming.
I don't remember whether loading an image into a Bitmap already converts it to RGB format (other formats are more difficult to work with), but if not, you can create an RGB Bitmap of the same size as your original image, get its Graphics object (Graphics.FromImage) and use DrawImage to draw the original image into the RGB bitmap.
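A hedged sketch combining that advice: lock the whole bitmap once as 24bpp, copy it to a managed array via Marshal.Copy (so no unsafe code is needed), and scan plain memory. The near-white threshold is an assumption:
using System;
using System.Drawing;
using System.Drawing.Imaging;

static bool IsBlank(Bitmap bm, byte whiteThreshold = 250)
{
    Rectangle rect = new Rectangle(0, 0, bm.Width, bm.Height);
    BitmapData bmd = bm.LockBits(rect, ImageLockMode.ReadOnly,
                                 PixelFormat.Format24bppRgb);
    try
    {
        int bytes = Math.Abs(bmd.Stride) * bmd.Height;
        byte[] pixels = new byte[bytes];
        System.Runtime.InteropServices.Marshal.Copy(bmd.Scan0, pixels, 0, bytes);
        for (int y = 0; y < bmd.Height; y++)
        {
            int rowStart = y * bmd.Stride;
            for (int x = 0; x < bmd.Width * 3; x++)
                if (pixels[rowStart + x] < whiteThreshold)
                    return false; // found a non-white channel value
        }
        return true;
    }
    finally
    {
        bm.UnlockBits(bmd);
    }
}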
Edit: Beat to the punch by DocMax.
In any case, for speed you can also try alternative libraries such as the excellent FreeImage, which includes C# wrappers.
Scale the image down to 1x1, then check the one remaining pixel:
new Bitmap(previousImage, new Size(1, 1));
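A hedged completion of that idea: the scaled-down pixel is roughly the average of the whole page, so compare it against a near-white tolerance rather than pure white. Note that a page with only a couple of sentences still averages out very bright, so this check can misclassify sparse pages:
using System.Drawing;

using (Bitmap tiny = new Bitmap(previousImage, new Size(1, 1)))
{
    // The constructor rescales with interpolation, which approximates
    // averaging all pixels down to one.
    Color c = tiny.GetPixel(0, 0);
    bool probablyBlank = c.R > 250 && c.G > 250 && c.B > 250;
}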

iTextSharp reporting text position incorrectly

I'm working on a text-extraction system for PDF files using iTextSharp. I have already created a class that implements ITextExtractionStrategy and implemented methods like RenderText() and GetResultantText(); I have also studied the LocationTextExtractionStrategy class provided by iTextSharp itself.
The problem I'm facing is that for a particular PDF document, the RenderText() method reports the horizontal position of a few text chunks incorrectly. This happens for around 15-20 chunks out of a total of 700+ text chunks on the page. I'm using the following simple code to get the text position in RenderText():
LineSegment segment = renderInfo.GetBaseline();
TextChunk location = new TextChunk(renderInfo.GetText(), segment.GetStartPoint(), segment.GetEndPoint(), renderInfo.GetSingleSpaceWidth());
chunks.Add(location);
After collecting all the text chunks, I try to draw them on a bitmap using the Graphics class and the following simple loop:
for (int k = 0; k < chunks.Count; k++)
{
    var ch = chunks[k];
    // PDF's Y axis points up while the bitmap's points down, hence the flip.
    g.DrawString(ch.text, fnt, Brushes.Black, ch.startLocation[Vector.I1],
        bmp.Height - ch.startLocation[Vector.I2], StringFormat.GenericTypographic);
}
The problem happens in the X (horizontal) dimension only, and only for these few text chunks: they appear slightly further to the left than their actual position. I was wondering if there's something wrong with my code here.
Shujaat
Finally figured this out. In PDF, computing actual text positions is more complicated than simply getting the baseline coordinates: you need to incorporate character and word spacing, horizontal and vertical scaling, and some other factors too. I did some correspondence with the iText guys, and they have now incorporated a new method in the TextRenderInfo class that provides actual character-by-character positions, taking care of all of the above factors.
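The method being referred to is presumably TextRenderInfo.GetCharacterRenderInfos(), available in later iTextSharp 5.x releases; a hedged sketch of how it would be used inside the strategy:
public virtual void RenderText(TextRenderInfo renderInfo)
{
    // Split the chunk into per-character render infos; each one carries
    // a baseline with spacing and scaling already applied.
    foreach (TextRenderInfo charInfo in renderInfo.GetCharacterRenderInfos())
    {
        Vector start = charInfo.GetBaseline().GetStartPoint();
        // start[Vector.I1] is the per-character X, start[Vector.I2] the Y.
    }
}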

Smooth polyline with minimal deformation

I've got a 2D closed polyline, which is reasonably smooth. The vertices that define the polyline, however, are not spaced equally: sometimes two will be very close, and sometimes as many as four will be very close together.
I'd like to smooth the polyline, but a regular averaging algorithm tends to shrink the area:
int n = V.Length;
PointF[] smooth_polyline = new PointF[n];
const float one_third = 1f / 3f;
for (int i = 0; i < n; i++)
{
    PointF prev = V[(i - 1 + n) % n]; // wrap the index around the closed polyline
    PointF next = V[(i + 1) % n];
    PointF pt = V[i];
    float ave_x = one_third * (prev.X + next.X + pt.X);
    float ave_y = one_third * (prev.Y + next.Y + pt.Y);
    smooth_polyline[i] = new PointF(ave_x, ave_y);
}
My polylines contain thousands of points and the angle between two adjacent segments is typically less than 1 degree.
Is there a better way to smooth these curves, something which will space the vertices more equally, without affecting the area too much?
I think you are looking for Chaikin's algorithm. There is a variant of this idea that makes the smoothed curve pass directly through (instead of "inside" of) the control points, but I'm having trouble googling it at the moment.
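For reference, a minimal sketch of one Chaikin corner-cutting pass for a closed polyline: every edge contributes two new vertices, at 1/4 and 3/4 of its length, which also tends to even out the vertex spacing:
using System.Collections.Generic;
using System.Drawing;

static PointF[] ChaikinOnce(PointF[] pts)
{
    int n = pts.Length;
    var result = new List<PointF>(2 * n);
    for (int i = 0; i < n; i++)
    {
        PointF a = pts[i];
        PointF b = pts[(i + 1) % n]; // wrap: the polyline is closed
        // Cut the corner: keep the points 1/4 and 3/4 along each edge.
        result.Add(new PointF(0.75f * a.X + 0.25f * b.X, 0.75f * a.Y + 0.25f * b.Y));
        result.Add(new PointF(0.25f * a.X + 0.75f * b.X, 0.25f * a.Y + 0.75f * b.Y));
    }
    return result.ToArray();
}
Repeated passes converge towards a quadratic B-spline of the control polygon; note that each pass doubles the vertex count, so with thousands of points you may want to combine it with a simplification step.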
You could look at the "curve simplification" literature, such as the Douglas-Peucker algorithm or this paper: http://www.cs.ait.ac.th/~guha/papers/simpliPoly.pdf.
This probably won't work well if you need evenly spaced vertices even when the adjacent line segments they define are nearly collinear.
You could also use splines to interpolate - just search Wikipedia.
Somebody has ported two smoothing algorithms to C#, with a CPOL (free) license, see here:
https://github.com/RobinCK/smooth-polyline
