Analyzing scanned images in C#

I am interested in analyzing a scanned document, a form, and I want to be able to detect whether someone has checked or filled in a check box in various places on the form (similar, perhaps, to a Scantron sheet), and maybe capture the image of a signature and such.
Since these check boxes will be at known locations, it seems I could ask for a few pixels at (x, y) and average them: if the result is darker than some threshold N, then the box is checked. However, I imagine that scanning could introduce a large shift in the actual position relative to the edge of the image.
As is clear, I am a newbie in this area. Does a framework exist (open source or commercial), or are there any patterns or examples anyone could point me to, to start down this path? (Or might this be impossible to do in .NET, and should I start looking into an unmanaged application?)

This is referred to as ICR (Intelligent Character Recognition).
It is an established field, and ICR systems perform edge detection because skewed scans are common.
You can try to do it yourself, but there is a lot to it.
LEADTOOLS is not free, and I don't work for them, but it is a good example of ICR as a tool (SDK):
LEADTOOLS ICR SDK
If you have the documents on paper, another option is to take them to a commercial scanning vendor.
They will have software designed for ICR, as well as high-end scanners meant to work with it.

I'm not familiar with .NET image processing, but I know image processing in general, so I'll give you the theory, plus references to OpenCV.
To compensate for skewing of the image, look into Fourier transforms and the Hough transform (Hough lines). What you'd basically want to do is run the Fourier transform, then turn the result into a black-and-white image. Find the strongest lines with HoughLines, then keep the longest of them. This line will be one of the axis lines; in my experimentation it was usually the vertical axis. Find its angle of deviation from a straight vertical line, and then (depending on the particular rotation algorithm) rotate the image by the negative of that amount.
If the rotation algorithm fills in with 0's (or with a white that's too far off the colour of the image), you can crop the image, using the angle found earlier to calculate the deviation (this is where all that trig you learned in school comes in handy).
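To make the rotation step concrete, here is a minimal C# (System.Drawing/GDI+) sketch. It assumes the skew angle, in degrees, has already been estimated by a line-detection step like the one above (the estimation itself is not shown); clearing to white before drawing sidesteps the fill-with-0's issue just mentioned.

using System.Drawing;

// Rotate the scan about its centre by the negative of the measured skew.
// skewAngleDegrees is assumed to come from a Hough-line (or similar) step.
static Bitmap Deskew(Bitmap source, float skewAngleDegrees)
{
    var result = new Bitmap(source.Width, source.Height);
    using (Graphics g = Graphics.FromImage(result))
    {
        g.Clear(Color.White);  // fill the uncovered corners with paper white
        g.TranslateTransform(source.Width / 2f, source.Height / 2f);
        g.RotateTransform(-skewAngleDegrees);
        g.TranslateTransform(-source.Width / 2f, -source.Height / 2f);
        g.DrawImage(source, 0, 0);
    }
    return result;
}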
Then find the bounding box that encloses the text on the page and crop down to that. When checking whether a box is checked or not, you'll want to look in an area about 5-10 pixels larger than the size of the checkbox (depending on resolution) to get the checkbox ROI.
With this, you can then check whether x% of the ROI is written in to decide whether the box was checked or not.
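A minimal sketch of that last check, assuming the image has already been deskewed; the 128 darkness cutoff and 15% ink fraction are made-up numbers to tune:

using System.Drawing;

// Returns true if enough of the checkbox ROI is dark to count as "checked".
static bool IsBoxChecked(Bitmap image, Rectangle roi, double inkFraction = 0.15)
{
    int dark = 0, total = 0;
    for (int y = roi.Top; y < roi.Bottom; y++)
        for (int x = roi.Left; x < roi.Right; x++)
        {
            Color c = image.GetPixel(x, y);
            if ((c.R + c.G + c.B) / 3 < 128) dark++;  // assumed darkness cutoff
            total++;
        }
    return total > 0 && (double)dark / total > inkFraction;
}

GetPixel is slow; if you are checking many boxes per page, LockBits over the ROI is the usual faster route.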

Related

Doc Stubborn and the search for Pdf User Space and Device Space Knowledge

After putting together enough code to parse a PDF file, I'm now stuck on how to handle the decoded stream content, which describes how to "draw" the actual page content. Apart from the concepts of operators ("draw this or that", "move from here to there"), which are mostly self-explanatory, I can't grasp the idea of user space or device space. I simply do not understand what they are or how I should represent them in code. Can anyone point me to a good source of technical information on the subject (maybe a book RATHER than the sea of words known as the "PDF specs")? Thank you in advance.
This is a slightly out-of-the-box suggestion, but you should try reading the Apple Quartz 2D documentation. Obviously you are not on OS X (since you have tagged c#), but I make this suggestion because the Quartz 2D drawing model is almost the same as the PDF drawing model. In fact, rendering a PDF content stream on OS X (and iOS) is very easy because every PDF operator has an equivalent Quartz call (in a framework called Core Graphics).
Start with this.
(The reason for this similarity is that the initial Mac OS X/NeXTSTEP drawing model was based on something called Display PostScript.)
As for user space and device space: they are pretty intuitive. The device space is just the coordinate system for the device: where the origin is and which direction the axes go. On OS X, for example, a screen's origin is at the top-left corner of the screen, whereas PDF page space has its origin (usually) at the bottom-left corner of the page. This means that EVERYTHING you draw has to be transformed appropriately, which seems pretty cumbersome, except that this so-called CTM (current transformation matrix) can be applied once. In the OS X case it involves a scale transform to flip the page and a translate to slide it down; once you have applied these two transforms to the drawing context, you can forget about the problem. I imagine that the Windows API you are using has a very similar solution.
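For what it's worth, the same one-time flip is easy to express in GDI+; a hedged C# sketch (the method and parameter names are mine, not from any PDF library):

using System.Drawing;

// Map PDF user space (origin bottom-left, y pointing up) onto GDI+ device
// space (origin top-left, y pointing down), applied once per page.
static void ApplyPdfPageTransform(Graphics g, float pageHeightInPoints)
{
    g.TranslateTransform(0, pageHeightInPoints); // slide the origin to the bottom edge
    g.ScaleTransform(1, -1);                     // flip the y axis
    // From here on, draw using PDF user-space coordinates directly.
}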
It would also be helpful if you read the Wikipedia entry on affine transforms.

Marking an interest point in an image using c++

I have a bitmap image like this
My requirement is to create a GUI to load the image and to change the contrast and other properties of the image, plus an algorithm to mark the particular area shown in silver in the figure, using C++ or C#. I am new to image processing, and through my search I have found that I can use the histogram of the image to find the required area. These are the steps:
Get the histogram
Search for intensity difference
Search for break in the line
Can someone suggest how I can proceed from here? Can I use OpenCV for this, or are other, more efficient methods available?
NOTE:
This image has many bright points, and the blob algorithm was not successful.
Any other suggestions for retrieving the correct coordinates of the rectangle-like object?
Thanks
OpenCV should work.
Convert your input image to greyscale.
adaptiveThreshold will convert it to black and white.
The feature detection documentation has a whole list of OpenCV feature detectors; choose one depending on the exact feature you're trying to detect.
E.g. have a look at the Simple Blob Detector, which lists the basic steps needed. Your silver rectangle certainly qualifies as a "simple blob" (no holes or other hard bits).
If all of your pictures look like that, it seems to me it wouldn't be complicated to segment the silver area and find its centre. Basically you will need to apply the algorithms below, in sequence:
Binarize the image using Otsu's adaptive threshold algorithm
Apply a labelling (blob) algorithm
If you have problems with noise, you can use an opening filter or a median filter before the blob algorithm
If you end up with only one blob (the one with the biggest area, I guess), use the moments algorithm to find its centre of mass. Then you have the X, Y coordinate you are looking for
These algorithms are classical image processing, so I guess it wouldn't be hard to find implementations. In any case, I may have them implemented in C# and can post them here later if you think they would solve your problem. A rough sketch of the binarization and centre-of-mass steps follows.
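Here is that sketch in plain C#; the labelling step is omitted, so it implicitly assumes the silver region dominates the bright pixels:

using System.Drawing;

// Otsu binarization followed by the centre of mass of the bright pixels.
static Point FindBrightCentre(Bitmap img)
{
    // Build a 256-bin grayscale histogram.
    var hist = new int[256];
    for (int y = 0; y < img.Height; y++)
        for (int x = 0; x < img.Width; x++)
        {
            Color c = img.GetPixel(x, y);
            hist[(c.R + c.G + c.B) / 3]++;
        }

    // Otsu: pick the threshold that maximizes between-class variance.
    int total = img.Width * img.Height, threshold = 0, wB = 0;
    long sum = 0, sumB = 0;
    double maxVar = 0;
    for (int i = 0; i < 256; i++) sum += (long)i * hist[i];
    for (int t = 0; t < 256; t++)
    {
        wB += hist[t];
        if (wB == 0) continue;
        int wF = total - wB;
        if (wF == 0) break;
        sumB += (long)t * hist[t];
        double mB = (double)sumB / wB, mF = (double)(sum - sumB) / wF;
        double between = (double)wB * wF * (mB - mF) * (mB - mF);
        if (between > maxVar) { maxVar = between; threshold = t; }
    }

    // Centre of mass (zeroth and first moments) of the bright pixels.
    long n = 0, sx = 0, sy = 0;
    for (int y = 0; y < img.Height; y++)
        for (int x = 0; x < img.Width; x++)
        {
            Color c = img.GetPixel(x, y);
            if ((c.R + c.G + c.B) / 3 > threshold) { n++; sx += x; sy += y; }
        }
    return n > 0 ? new Point((int)(sx / n), (int)(sy / n)) : Point.Empty;
}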
Maybe some research into DirectShow, a multimedia framework from Microsoft, will help you accomplish your task.

Get all pixel in shape

I have 4 shapes in an image,
and I want to get the pixels of one shape as a list of points.
The shapes all have the same colour.
List<Point> GetAllPixelInShape(Point x)
{
    // implementation needed
}
where x is a point inside the shape
Long story short, you could begin with a connected components / region labeling algorithm.
http://en.wikipedia.org/wiki/Connected-component_labeling
In OpenCV you can call findContours() to identify contours, which are the borders of your connected regions.
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut7.html
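If you'd rather stay in plain C# than pull in OpenCV, a simple BFS flood fill gives you the connected region directly. A sketch (the Bitmap parameter is added because the shape's pixels have to come from somewhere):

using System.Collections.Generic;
using System.Drawing;

// Collect every pixel of the same-coloured shape containing the seed point.
static List<Point> GetAllPixelsInShape(Bitmap image, Point seed)
{
    int target = image.GetPixel(seed.X, seed.Y).ToArgb();
    var result = new List<Point>();
    var visited = new bool[image.Width, image.Height];
    var queue = new Queue<Point>();
    queue.Enqueue(seed);
    visited[seed.X, seed.Y] = true;
    while (queue.Count > 0)
    {
        Point p = queue.Dequeue();
        result.Add(p);
        // Visit the four direct neighbours (4-connectivity).
        foreach (Point q in new[] { new Point(p.X + 1, p.Y), new Point(p.X - 1, p.Y),
                                    new Point(p.X, p.Y + 1), new Point(p.X, p.Y - 1) })
        {
            if (q.X < 0 || q.Y < 0 || q.X >= image.Width || q.Y >= image.Height) continue;
            if (visited[q.X, q.Y]) continue;
            if (image.GetPixel(q.X, q.Y).ToArgb() != target) continue;
            visited[q.X, q.Y] = true;
            queue.Enqueue(q);
        }
    }
    return result;
}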
OCR is an extremely difficult task, especially for a script like Arabic. Creating an OCR algorithm from scratch takes a lot of work and numerous algorithms working together. OCR for machine printed text is hard enough. Implementing an algorithm to read handwriting is not something I'd suggest trying until you have a year or two of image processing experience. If you haven't read textbooks and academic papers on OCR, you're likely to spend a lot of time reproducing work that has already been done.
If you're not familiar with contour tracing and/or blob analysis, then working with OpenCV may not be a good first step. Since you have a specific goal in mind, you might first try different algorithms in a user-friendly GUI that will save you coding time.
Consider downloading ImageJ so that you can see how the algorithms work. There are plugins for a variety of common image processing algorithms.
http://rsbweb.nih.gov/ij/
Your proposed method signature doesn't provide enough information to solve this. Your method will need to know the bounds of your shape (how long and wide it is, etc.), ideally as a set of points that indicate those bounds.
Once you have those, you could potentially apply the details of this article, in particular the algorithms specified in the answer, to solve your problem.

Continued - Vehicle License Plate Detection

Continuing from this thread:
What are good algorithms for vehicle license plate detection?
I've developed my image manipulation techniques to emphasise the license plate as much as possible, and overall I'm happy with them; here are two samples.
Now comes the most difficult part: actually detecting the license plate. I know there are a few edge detection methods, but my maths is quite poor, so I'm unable to translate some of the complex formulas into code.
My idea so far is to loop through every pixel in the image (a for loop based on image width and height) and compare each pixel against a list of colours, checking whether the colours keep alternating between the white of the license plate and the black of the text. If that turns out to be true, those pixels are copied into a new bitmap in memory, and an OCR scan is performed once the pattern stops being detected.
I'd appreciate some input on this as it might be a flawed idea, too slow or intensive.
Thanks
Your method of "see if the colors keep differentiating between the license plate white, and the black of the text" is basically searching for areas where the pixel intensity changes from black to white and vice-versa many times. Edge detection can accomplish essentially the same thing. However, implementing your own methods is still a good idea because you will learn a lot in the process. Heck, why not do both and compare the output of your method with that of some ready-made edge detection algorithm?
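To illustrate what "keeps differentiating" looks like in code, here is a hedged C# sketch that counts dark/light transitions per row of an already-binarized image; rows crossing the plate text should show unusually high counts (the 0.5 brightness cutoff is an assumption):

using System.Drawing;

// Count dark/light transitions along each row of a binary (or near-binary) image.
static int[] TransitionsPerRow(Bitmap binary)
{
    var counts = new int[binary.Height];
    for (int y = 0; y < binary.Height; y++)
    {
        bool prevDark = binary.GetPixel(0, y).GetBrightness() < 0.5f;
        for (int x = 1; x < binary.Width; x++)
        {
            bool dark = binary.GetPixel(x, y).GetBrightness() < 0.5f;
            if (dark != prevDark) counts[y]++;
            prevDark = dark;
        }
    }
    return counts; // look for bands of consecutive rows with high counts
}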
At some point you will want to have a binary image, say with black pixels corresponding to the "not-a-character" label, and white pixels corresponding to the "is-a-character" label. Perhaps the simplest way to do that is to use a thresholding function. But that will only work well if the characters have already been emphasized in some way.
As someone mentioned in your other thread, you can do that using the black hat operator, which results in something like this:
If you threshold the image above with, say, Otsu's method (which automatically determines a global threshold level), you get this:
There are several ways to clean that image. For instance, you can find the connected components and throw away those that are too small, too big, too wide or too tall to be a character:
Since the characters in your image are relatively large and fully connected this method works well.
Next, you could filter the remaining components based on the properties of the neighbors until you have the desired number of components (= number of characters). If you want to recognize the character, you could then calculate features for each character and input them to a classifier, which usually is built with supervised learning.
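The size filtering in particular is cheap to express. A hedged C# sketch, assuming you already have the components' bounding boxes, with all limits as made-up numbers to tune for your resolution:

using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Keep only components whose bounding boxes are plausibly characters.
static List<Rectangle> FilterCharacterCandidates(IEnumerable<Rectangle> boxes)
{
    return boxes.Where(r =>
        r.Width >= 5 && r.Width <= 60 &&        // not too narrow or wide
        r.Height >= 10 && r.Height <= 80 &&     // not too short or tall
        (double)r.Height / r.Width >= 0.8 &&    // plate characters tend to be
        (double)r.Height / r.Width <= 5.0)      // taller than they are wide
        .ToList();
}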
All the steps above are just one way to do it, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.
Colour, as good as it looks, will present quite a few challenges with shading and lighting conditions. It really depends on how robust you want to make it, but real-world cases have to deal with such issues.
I have done research on road footage (see my profile page, and look here for a sample) and have found that real-world road footage is extremely noisy in terms of lighting conditions; the colour of a yellow rear number plate can shift anywhere from brown to white.
Most algorithms use line detection and try to find a box with an aspect ratio within an acceptable range.
I suggest you do a literature review on the subject, but this was first achieved back in 1993 (if I remember correctly), so there will be thousands of articles.
This is quite a scientific domain, so a single algorithm will not solve it; you will need numerous pre- and post-processing steps.
In brief, my suggestion is to use the Hough transform to find lines, and then look for rectangles that form an acceptable aspect ratio.
Harris feature detection could provide important edges, but if the car is light-coloured this will not work.
If you have a lot of samples, you could try the face detection method developed by Paul Viola and Michael Jones. It's designed for face detection, but maybe it'll do fine with license plate detection (especially if combined with some other method).

Finding Sub-Images in a image

I have a fairly simple situation. I just don't know any specific terms to search for.
I have a single image, and within that image there are several other images that follow a basic pattern.
They are rectangles, and will possibly have a landmark image to base things off of.
An important part, is that I need to detect rotated/mis-scaled sub-images.
Basically, what I need to be able to do is split 'business cards' from a single image into properly aligned single images.
As I am also designing the cards to be scanned, I can put in whatever symbol would make detection easier (as I said, a landmark).
If your example is representative (which I doubt for some reason) then Hough transform is your friend (google it, there are plenty of explanations and code around). With it you'll be able to detect the rectangles.
Some examples of Hough transform in C# are http://www.koders.com/csharp/fid3A88BC1FF95FCA9D6A182698263A40EE7883CF26.aspx and http://www.shedletsky.com/hough/index.html
If what actually happens is that you scan some cards, and you have some control over the process, then I'd suggest that you ensure there is no overlap between cards, and provide a contrasting background (something very different from the cards). Then any edge-detection will get you close enough to what you've drawn in your example, and after that you can use Hough transform.
Alternatively, you can implement the paper http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.4239 which uses Hough transform to detect rectangles directly, without edge detection.
If I did not understand your problem, or you need clarifications, please edit your question further and post a comment on this answer.
Try AForge.NET (if you are using C#). It has a DocumentSkewChecker class that will calculate the angle of a rotated image.
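From memory of the AForge.NET API (worth double-checking against its documentation), usage looks roughly like this; DocumentSkewChecker expects an 8bpp grayscale image, hence the conversion, and scannedCards stands in for your loaded bitmap:

using System.Drawing;
using AForge.Imaging;
using AForge.Imaging.Filters;

// Convert to 8bpp grayscale, measure the skew, then rotate it away.
Bitmap gray = Grayscale.CommonAlgorithms.BT709.Apply(scannedCards);
var checker = new DocumentSkewChecker();
double angle = checker.GetSkewAngle(gray);
Bitmap deskewed = new RotateBilinear(-angle, true).Apply(gray);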
You can also try the ExhaustiveTemplateMatching class of AForge.NET.
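Again from memory of the AForge.NET API, a rough sketch of matching a landmark symbol; the 0.9f similarity threshold is an assumption to tune, and cardSheet and landmark stand in for your bitmaps:

using System;
using AForge.Imaging;

// Both bitmaps should be 8bpp grayscale or 24bpp RGB for this matcher.
var matcher = new ExhaustiveTemplateMatching(0.9f);
TemplateMatch[] matches = matcher.ProcessImage(cardSheet, landmark);
foreach (TemplateMatch m in matches)
    Console.WriteLine("Landmark at {0}, similarity {1:F2}", m.Rectangle, m.Similarity);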
