I have the following need. There are a number of forms (blanks), for example the ones used in surveys. I will call the unfilled forms image templates from now on. Apart from the image templates, I also have many images, which are essentially the image templates filled in with information. For example, there is a survey with two blank forms to fill in: these are the image templates. Many people have filled in the forms with their personal information, and these are the images.
The image templates are scanned in perfect shape. But many of the scanned images are tilted, not properly aligned, or possibly scaled. So I have the following requirement: for every image, the image template it belongs to must be recognized. After it is recognized, it must be properly deskewed, aligned and scaled to the image template.
I know this is a complex task and that's why I need a library, preferably a C# one. I have found AForge, but so far I have only seen a suitable method for skew correction. Essentially I need a library which takes an image template and an image as input, and sets a flag if the image does not match the image template. If it matches, it must return the appropriate skew angle, alignment and scaling.
If you have any ideas or used such a library, I will appreciate it greatly.
Wish you all the best,
Petar
The problem seems to be an image registration problem coupled with some template matching problem.
image registration
Depending on how the scanned document may be distorted (scale factor, rotation, skew...), one can register images using anything from a simple rigid transform (i.e. translation + rotation; only two corresponding points are needed) to a more complex one such as a non-rigid transform (more corresponding points are needed). The corresponding points can be given manually but ideally should be detected automatically.
The ITK library includes several methods for image registration.
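To make the two-point case concrete, here is a minimal Python sketch (not ITK, and not any particular library's API) that recovers scale, rotation angle and translation from two pairs of corresponding points. It assumes the distortion is a similarity transform, i.e. the rigid transform above plus a uniform scale factor; the function names are made up for illustration.

```python
import cmath
import math

def similarity_from_two_points(src, dst):
    """Estimate (scale, angle in degrees, translation) mapping src onto dst.

    src and dst are each a pair of (x, y) points; src[i] corresponds to dst[i].
    Points are treated as complex numbers, so the transform is z' = a*z + b.
    """
    p1, p2 = (complex(*p) for p in src)
    q1, q2 = (complex(*q) for q in dst)
    if p1 == p2:
        raise ValueError("the two source points must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and scale as a single complex factor
    b = q1 - a * p1             # translation
    return abs(a), math.degrees(cmath.phase(a)), (b.real, b.imag)

def apply_similarity(scale, angle_deg, translation, point):
    """Apply the estimated transform to one (x, y) point."""
    a = cmath.rect(scale, math.radians(angle_deg))
    z = a * complex(*point) + complex(*translation)
    return (z.real, z.imag)
```

With noisy, automatically detected points one would normally use more than two correspondences and a least-squares fit; two points are only the minimum.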
template matching problem
Once your images are aligned, an image can be compared against a database of possible templates by first extracting characteristic features from the image and then comparing them to the features stored for each template. This is very general and should be refined with respect to the kind of images used.
There is another way that combines both image registration and template matching:
the Bag of Features approach, which consists of extracting interest points (robust to several types of image deformation) from the image. The points generate a signature that characterizes the image, so comparing images becomes comparing signatures.
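As a rough illustration of the signature idea, the Python sketch below assumes the interest-point descriptors have already been extracted as numeric vectors and that a small vocabulary of "visual words" (typically cluster centres) is available. Both are assumptions, as are the function names and the 0.8 threshold. The signature is a normalized histogram of nearest words, and two signatures are compared with cosine similarity.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[i])),
    )

def signature(descriptors, vocabulary):
    """Normalized histogram counting how often each visual word occurs."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_template(image_sig, template_sigs, threshold=0.8):
    """Return (name, score); name is None when no template reaches the threshold."""
    name, score = max(
        ((n, cosine_similarity(image_sig, s)) for n, s in template_sigs.items()),
        key=lambda item: item[1],
    )
    return (name if score >= threshold else None), score
```

The `None` result plays the role of the "does not match any template" flag asked for in the question.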
I used to work for a company, Accusoft Pegasus, which has some interesting forms recognition software. I've not seen their FormFix tool in action in a few years, but it should be able to do what you need.
Using SharpMap (Windows Forms) how can I display an overlay image over the map tile background so that the image scales and moves accordingly with the map zoom and pan? I want to implement a georeferencing like functionality.
I tried using SharpMap.Layers.GdiImageLayer but I didn't find a way to position it to a custom position (real world coordinate) - it is getting placed by default at origin [0,0] (the background tile coordinates). Is there a way to define a relation (transformation) to place it in a particular position?
I managed to get closer to a solution using VectorLayer with RasterPointSymbolizer but the symbol (image loaded for symbol) would need to be transformed on map scaling/zoom (so that corners remain in the same real world coordinates) - the math calculation behind looks a little complicated for a solution that seems more like a workaround than a natural one. As a note - I don't need precise calculation using projection/real geoid as I am doing this only at building level.
Using GDAL - GeoTIFF might be a choice here by generating a GeoTIFF based on the original image and making the georeferencing metadata dynamic based on UI controls? The original image is in raster format (JPG) with no geographic metadata.
Is there a better solution?
If VectorLayer - RasterPointSymbolizer is the best choice, do you have an example for symbol(image) synchronization with the map view?
As far as I can tell, the GdiImageLayer calculates the position based on a world file (GdiImageLayer.SetEnvelope()). Here is the link to the method. This meta file has to have the same name and be in the same location as the image, but with the extension *.wld. There you set the position (top left corner) and the skew along the x and y axes, so rotation will be preserved. I never tried this, but I would expect zooming to work correctly.
(The wiki article offers also some good understanding for world files)
There might be another way if no world file is present and the information rests in the header. There is a GDalSample that loads a GeoTiff image onto an existing map. The example can be found in the WinFormSamples, under the Different kind of layers supported by [MapBox] example. The map shows the overlay only while magnifying and is not fully functional, but it may be a good hint.
I have an image that is a depth heatmap that I've filtered out anything further away than the first 25% of the image.
It looks something like this:
There are two blobs of color in the image, one is my hand (with part of my face behind it), and the other is the desk in the lower left corner. How can I search the image to find these blobs? I would like to be able to draw a rectangle around them if possible.
I can also do this (ignore shades, and filter to black or white):
Pick a random pixel as a seed pixel. This becomes area A. Repeatedly expand A until A doesn't get any bigger. That's your area.
The way to expand A is by looking for neighbor pixels to A, such that they have similar color to at least one neighboring pixel in A.
What "similar color" means to you is somewhat variable. If you can make exactly two colors, as you say in another answer, then "similar" is "equal". Otherwise, "similar" would mean colors that have RGB values or whatnot where each component of the two colors is within a small amount of each other (e.g. 255, 128, 128 is similar to 252, 125, 130).
You can also limit the selected pixels so they must be similar to the seed pixel, but that works better when a human is picking the seed. (I believe this is what is done in Photoshop, for example.)
This can be better than edge detection because you can deal with gradients without filtering them out of existence, and you don't need to process the resulting detected edges into a coherent area. It has the disadvantage that a gradient can go all the way from black to white and it'll register as the same area, but that may be what you want. Also, you have to be careful with the implementation or else it will be too slow.
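The region-growing procedure described above can be sketched in Python as follows. The image is assumed to be a plain list of rows of (r, g, b) tuples, and `tol` is the per-component tolerance that defines "similar"; both the representation and the names are illustrative choices. Using a queue and visiting each pixel once is one way to keep it from being too slow.

```python
from collections import deque

def similar(c1, c2, tol):
    """True when every colour component differs by at most tol."""
    return all(abs(a - b) <= tol for a, b in zip(c1, c2))

def grow_region(pixels, seed, tol=5):
    """Grow area A from seed (x, y); returns the set of (x, y) in the region.

    A neighbour joins the region when it is similar to the adjacent pixel
    that is already inside the region.
    """
    height, width = len(pixels), len(pixels[0])
    region = {seed}
    frontier = deque([seed])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in region:
                if similar(pixels[y][x], pixels[ny][nx], tol):
                    region.add((nx, ny))
                    frontier.append((nx, ny))
    return region

def bounding_box(region):
    """Rectangle (min_x, min_y, max_x, max_y) enclosing the region."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), min(ys), max(xs), max(ys)
```

To find every blob, one would repeat this with a new seed taken from the pixels not yet assigned to any region, and draw the rectangle returned by `bounding_box` for each region of interest.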
It might be overkill for what you need, but there's a great wrapper for C# for the OpenCV libraries.
I have successfully used OpenCV in C++ for blob detection, so you might find it useful for what you're trying to do.
http://www.emgu.com/wiki/index.php/Main_Page
and the wiki page on OpenCV:
http://en.wikipedia.org/wiki/OpenCV
Edited to add: Here is a blob detection library for Emgu in C#. There are even some nice features such as ordering the blobs by descending area (useful for filtering out noise).
http://www.emgu.com/forum/viewtopic.php?f=3&t=205
Edit Again:
If Emgu is too heavyweight, AForge.NET also includes some blob detection methods:
http://www.aforgenet.com/framework/
If the image really is only two or more distinct colours (very little blur between colours), it is an easy case for an edge detection algorithm.
You can use something like the code sample from this question : find a color in an image in c#
It will help you find the x/y of specific colors in your image. Then you could use the min x/max x and the min y/max y to draw your rectangles.
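A minimal Python sketch of the min/max idea, assuming the image is a list of rows of (r, g, b) tuples and that a single target colour identifies the object (the function name and the `tol` parameter are illustrative):

```python
def color_bounds(pixels, target, tol=0):
    """Return (min_x, min_y, max_x, max_y) of all pixels matching target, or None."""
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            if all(abs(a - b) <= tol for a, b in zip(color, target)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

Note that this gives one rectangle around all matching pixels, so two separate blobs of the same colour end up inside a single box.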
Detect object from image based on object color by C#.
There is an easy algorithm for detecting an object based on its color. The steps normally are:
Take the image
Apply your filtering
Apply greyscaling
Subtract the background and get your objects
Find the position of all objects
Mark the objects
First you have to choose a filtering method; there are many filtering methods available for C#. I mainly prefer the AForge filters, which offer a few for this purpose:
ColorFiltering
ChannelFiltering
HSLFiltering
YCbCrFiltering
EuclideanColorFiltering
My favorite is EuclideanColorFiltering. It is easy and simple. For information about the other filters you can visit the link below. You have to download the AForge DLL to apply these in your code.
More information about the exact steps can be found here: Link
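To show what Euclidean colour filtering does in principle, here is a small Python sketch. It is not the AForge API, only the underlying idea as I understand it: a pixel is kept when its colour lies within a given radius of a centre colour in RGB space, and replaced by a fill colour otherwise. The image representation (list of rows of (r, g, b) tuples) and the parameter names are assumptions.

```python
def euclidean_color_filter(pixels, center, radius, fill=(0, 0, 0)):
    """Keep pixels whose RGB distance to center is <= radius; fill the rest."""
    r2 = radius * radius
    return [
        [
            color if sum((a - b) ** 2 for a, b in zip(color, center)) <= r2 else fill
            for color in row
        ]
        for row in pixels
    ]
```

After this step everything that is not the object colour is black, which makes the later background subtraction and position finding much simpler.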
Basically I want to find the pixel location of a small image inside a large image.
I have searched for something similar to this but have had no luck.
It depends on how closely you want the result to match your query image. If you're trying to match corresponding parts of different photorealistic images, take a look at the Feature detection Wikipedia page. What you want to use depends on the transformation you expect one image to undergo to become the other.
That said, if you are looking for an exact pixel-by-pixel match, a brute-force search is probably a bad idea. It can be O(m^2*n^2) for an m*m image searched for within an n*n image. Using better algorithms, it can be improved to O(n^2), which is linear in the number of pixels. Downsampling both images and doing a hierarchical kind of search might be a good approach.
You could probably use the AForge Framework to do something like this. It offers a variety of image processing tools. Possibly you could use its blob extraction to extract blobs, then compare those blobs to a stored image you have and see if they match.
If the images are pixel-by-pixel equal, you could start by searching for one pixel that has the same color as pixel (0,0) in the small image. Once found, compare each pixel in the area that would be covered by the small image. If there are no differences you found your position. Else start over by searching for the next pixel matching (0,0).
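The search just described can be sketched in Python like this, assuming both images are lists of rows (each row a list of pixel values) and that an exact match is wanted; the function name is illustrative.

```python
def find_subimage(large, small):
    """Return (x, y) of the top-left corner where small occurs in large, or None.

    Both images are lists of rows, each row a list of pixel values.
    """
    lh, lw = len(large), len(large[0])
    sh, sw = len(small), len(small[0])
    first = small[0][0]
    for y in range(lh - sh + 1):
        for x in range(lw - sw + 1):
            if large[y][x] != first:
                continue  # cheap rejection on pixel (0, 0) of the small image
            if all(large[y + dy][x:x + sw] == small[dy] for dy in range(sh)):
                return x, y
    return None
```

The worst case is still the brute-force bound, but the early rejection on the first pixel makes typical searches much cheaper.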
Boyer-Moore search sounds like a solution here if you treat your pixels as characters and are looking for an exact match. It is much faster than per-pixel searching as well.
My program works with fax documents stored as separate bitmaps.
I wonder if there is a way to automatically detect page orientation (vertical or horizontal) so that the image preview is shown to the user in the right orientation (meaning rotated if necessary).
Any advice much appreciated!
EDIT: Clarification:
When the fax machine receives a multi-page document, it saves each page as a separate TIFF file.
My app has a built-in viewer displaying those files. All files are scaled to A4 format and saved as TIFF (so there is no chance to detect orientation from the height/width parameters).
My viewer displays images in portrait mode by default.
What I'd like to do is automagically detect when the original document was printed in landscape mode (e.g. wide Excel tables), and then show a rotated preview to the end user to speed up the preview process.
Obviously there are 4 possible fax orientations: portrait / landscape x 2 kinds of rotation.
I'm even interested in a simplified solution that only detects whether the original doc was landscape or portrait (I've noticed most landscape docs need to be rotated clockwise).
EDIT2: Idea
I think this might be an idea:
Suppose I draw horizontal and vertical lines across the page and check whether each line crosses any (black) point. Then we can compare which type of line (horizontal or vertical) is more often free of black points, and this decides the page orientation.
What do you think ?
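The idea from EDIT2 could be sketched in Python as below. It assumes the page has already been binarized into a list of rows with 1 for black and 0 for white, and it relies on the assumption that the gaps between text lines produce more ink-free lines along the reading direction than across it. The function names are made up for illustration.

```python
def count_blank_lines(page):
    """Return (blank_rows, blank_cols) for a page given as rows of 0/1 values."""
    blank_rows = sum(1 for row in page if not any(row))
    blank_cols = sum(1 for col in zip(*page) if not any(col))
    return blank_rows, blank_cols

def guess_text_direction(page):
    """'horizontal' if text lines run left to right, 'vertical' if rotated by 90 degrees."""
    rows, cols = count_blank_lines(page)
    if rows > cols:
        return "horizontal"
    if cols > rows:
        return "vertical"
    return "unknown"
```

Page margins add blank lines in both directions, so cropping to the inked area first is probably needed on real faxes, and scanner noise would call for a "nearly blank" threshold instead of a strict one.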
You could perform a Fast Fourier Transform (FFT) to convert your spatial image to a frequency/angle representation. Then find the angle with the most prominent frequency. It sounds complicated but it's not that hard, it's pretty efficient, and in effect it tests every possible angle at once, instead of being a hard-coded hack that only works for specific angles. Search for a sample implementation with search terms like Numerical Recipes and FFT.
You'd need OCR for that. Rolling your own OCR would be a bit difficult, but there might be a library or something out there worth looking into. Also, even with good OCR, it's not a 100% reliable solution.
I wonder if there are some properties of text you could use to help you do this.
For instance, based on a quick glance, there are far more vertical strokes in text (l, j, k, m, n, etc.) than horizontal ones, so maybe you could start with this.
But even detecting these isn't straightforward, you'd need to use some sort of filter like a Sobel or Prewitt. They both have horizontal and vertical versions, see here for more info.
Of course the vertical/horizontal lines of an excel spreadsheet would be the strongest edges so you'd have to ignore these and look only at the text.
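As a rough Python sketch of the Sobel idea, the function below sums the absolute response of the horizontal-gradient kernel (which fires on vertical edges) and of the vertical-gradient kernel (which fires on horizontal edges) over a greyscale image given as a list of rows of intensities. Comparing the two totals is an assumed decision rule, and it does nothing to suppress table ruling lines, which would have to be removed beforehand.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to horizontal edges

def edge_energy(gray):
    """Return (vertical_edge_energy, horizontal_edge_energy) for a greyscale image."""
    h, w = len(gray), len(gray[0])
    vertical = horizontal = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            vertical += abs(gx)
            horizontal += abs(gy)
    return vertical, horizontal
```

If the vertical-edge total clearly dominates, the text is probably upright or upside down; if the horizontal-edge total dominates, the page is probably rotated by 90 degrees.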
Alternative: Can you not just give the user an easy way to rotate the images, like the arrows in Windows Picture viewer or just show 4 thumbnail previews they can click on. You might need to cache the 4 versions (if you are rotating) so it's quick, but only if speed turns out to be an issue?
Here's a paper entitled "Combined Script and Page Orientation Estimation using the Tesseract OCR engine" [pdf]
I haven't been able to find an implementation of their work, but the approach looks good to me:
The basic idea behind the proposed approach is simple.
A shape classifier is trained on characters (classes) from all the scripts of interest. At run-time, the classifier is run independently on each connected component (CC) in the image and the process is repeated after rotating each CC into three other candidate orientations (90°, 180° and 270° from the input orientation).
The algorithm keeps track of the estimated number of characters in each script for a given orientation, and the accumulated classifier confidence score across all candidate orientations. The estimate of page orientation is chosen as the one with the highest cumulative confidence score, and the estimate of script is chosen as the one with the highest number of characters in that script for the best orientation estimate.
At some long-ago Flash conferences I recall seeing a demo of a Flash app that had a color picker. Based on the user's color choice the app would show the user a set of images within the approximate range of that color: a bunch of mostly red images, a bunch of mostly blue images, etc.
I'm looking for two things:
1) A link to a demo of this sort of app, ideally a Flash app
2) ActionScript or C# code that describes how to pick a bunch of images that fall within a color range.
I know how to extract the aggregate/average RGB from individual images and persist this info to a database. I need to know how exactly to select out images within a certain range of color tolerance. Could this be done purely using SQL and a knowledge of the alphanumeric assignments of RGB color codes, or is there a better way?
I could not find any sample code, but found an article that gives a high-level explanation of their process (from this other page about Flickr's feature to search for images with similar colors). Apparently, Google also lets you do this with their image search (but I don't know if that is from metadata tags or actual color matching).
Now to the actual answer:
Instead of just storing the average or aggregate color for an image, you will need to store a "color signature" of the image.
My first (educated guess) idea would entail these steps:
Generate the histogram for each color band from the image
Generate some factors that describe each histogram curve (mean, variance, std-dev, etc? -- these factors will make up your digital signature of your image)
Store those factors in your database (and each of these factors would have an index in the DB)
Then, you would take your input (either a color, range of colors, or source image), run your histogram algorithm against that source, and search for matches to your computed factors.
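A minimal Python sketch of such a signature, under the assumption that mean and standard deviation per colour band are enough as descriptive factors (they are the same values one would compute from each band's histogram). The pixel representation and function names are illustrative.

```python
from statistics import fmean, pstdev

def color_signature(pixels):
    """pixels: list of (r, g, b). Returns (mean_r, std_r, mean_g, std_g, mean_b, std_b)."""
    sig = []
    for channel in zip(*pixels):      # one tuple of values per colour band
        sig.extend((fmean(channel), pstdev(channel)))
    return tuple(sig)

def signature_distance(a, b):
    """Sum of absolute differences between two signatures (smaller means more alike)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Each of the six numbers would be stored in its own indexed column, and a search would compute the signature of the query and look for rows whose factors are close to it.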
The Flickr Hacks solution I cite in the comments is the best I've found: it involves resizing the image to 1x1 pixels using common algorithms which gives you an average color for the entire image. Clever.
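On the question of whether plain SQL is enough: with the average colour stored as three integer columns, a tolerance search is a simple range query. The sketch below uses Python's built-in sqlite3 module; the table name `images`, its columns, and the helper names are assumptions made for the example, and the average is computed directly instead of by resizing to 1x1.

```python
import sqlite3

def make_db():
    """In-memory database with one row per image and its average colour."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (name TEXT, r INTEGER, g INTEGER, b INTEGER)")
    return conn

def average_color(pixels):
    """Average (r, g, b) of a list of pixels, rounded down to integers."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))

def add_image(conn, name, pixels):
    conn.execute("INSERT INTO images VALUES (?, ?, ?, ?)", (name, *average_color(pixels)))

def find_by_color(conn, color, tolerance):
    """Names of images whose average colour is within tolerance on every channel."""
    r, g, b = color
    rows = conn.execute(
        "SELECT name FROM images "
        "WHERE r BETWEEN ? AND ? AND g BETWEEN ? AND ? AND b BETWEEN ? AND ? "
        "ORDER BY name",
        (r - tolerance, r + tolerance, g - tolerance, g + tolerance,
         b - tolerance, b + tolerance),
    ).fetchall()
    return [name for (name,) in rows]
```

This selects a cube around the chosen colour in RGB space. A perceptually better match would likely need a different colour space, which is outside what this sketch covers.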
You can use a set of descriptors for each image, then match those.
There's some great work on it here, for C#.
http://savvash.blogspot.com/p/compact-composite-descriptors.html
Also check out the free img(Rummager) tool, which can do what you want (i.e. find images matched by colour).