How to detect image orientation (text) - C#

My program works with fax documents stored as separate bitmaps.
I wonder if there is a way to automatically detect the page orientation (portrait or landscape) so I can show the image preview to the user the right way up (i.e. rotate it if necessary).
Any advice is much appreciated!
EDIT: Clarification:
When the fax machine receives a multi-page document, it saves each page as a separate TIFF file.
My app has a built-in viewer that displays those files. All files are scaled to A4 format and saved as TIFF, so there is no chance to detect the orientation from the height/width parameters.
My viewer displays images in portrait mode by default.
What I'd like to do is automagically detect when the original document was printed in landscape mode (e.g. wide Excel tables) and show a rotated preview to the end user, to speed up the preview process.
Obviously there are four possible fax orientations: portrait/landscape x two rotation directions.
I'd even be interested in a simplified solution that just detects whether the original document was landscape or portrait (I've noticed most landscape docs need to be rotated clockwise).
EDIT2: Idea
Here's an idea:
I could draw horizontal and vertical lines across the page and check whether each line cuts through any (black) pixels. Then I'd compare which kind of uncut line (horizontal or vertical) is more common, and that would decide the page orientation.
What do you think ?
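A minimal sketch of that idea, assuming the fax pages are loaded as System.Drawing bitmaps with dark text on a light background (the threshold is an assumption to tune, and GetPixel is slow, so a real implementation would use LockBits):

    using System.Drawing;

    static class ScanLineOrientation
    {
        const int DarkThreshold = 128; // assumption: tune for your faxes

        // Counts how many full scan lines cross the page without hitting a
        // dark pixel. Text pages have many clear horizontal gaps between
        // lines of text, so a surplus of clear rows suggests portrait.
        public static bool LooksPortrait(Bitmap bmp)
        {
            int clearRows = 0, clearCols = 0;

            for (int y = 0; y < bmp.Height; y++)
            {
                bool clear = true;
                for (int x = 0; x < bmp.Width && clear; x++)
                    if (bmp.GetPixel(x, y).R < DarkThreshold) clear = false;
                if (clear) clearRows++;
            }
            for (int x = 0; x < bmp.Width; x++)
            {
                bool clear = true;
                for (int y = 0; y < bmp.Height && clear; y++)
                    if (bmp.GetPixel(x, y).R < DarkThreshold) clear = false;
                if (clear) clearCols++;
            }

            // Normalize by dimension so the A4 aspect ratio doesn't bias it.
            return (double)clearRows / bmp.Height > (double)clearCols / bmp.Width;
        }
    }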

You could perform a Fast Fourier Transform (FFT) to convert your spatial image to a frequency/angle representation. Then find the angle with the most prominent frequency. It sounds complicated but it's not that hard, it's pretty efficient, and in effect it tests every possible angle at once, instead of being a hard-coded hack that only works for specific angles. Search for a sample implementation with search terms like Numerical Recipes and FFT.
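For a cheap prototype of the frequency idea, you could skip the full 2D FFT and run a 1-D DFT over the page's row and column projections; text lines produce strong periodic peaks in the projection perpendicular to them. This is a hedged simplification of the above, not the full angle sweep:

    using System;
    using System.Drawing;

    static class ProjectionDft
    {
        // True if the dominant periodicity is in the vertical (row-sum)
        // profile, i.e. the text lines are most likely horizontal (portrait).
        public static bool LooksPortrait(Bitmap bmp)
        {
            double[] rows = new double[bmp.Height];
            double[] cols = new double[bmp.Width];
            for (int y = 0; y < bmp.Height; y++)
                for (int x = 0; x < bmp.Width; x++)
                {
                    // Darkness of the pixel (0 = white, 255 = black).
                    double d = 255 - bmp.GetPixel(x, y).GetBrightness() * 255;
                    rows[y] += d;
                    cols[x] += d;
                }
            return PeakMagnitude(rows) > PeakMagnitude(cols);
        }

        // Naive O(n^2) DFT magnitude; fine for a prototype, but swap in a
        // real FFT (e.g. from Numerical Recipes) for production use.
        static double PeakMagnitude(double[] signal)
        {
            int n = signal.Length;
            double best = 0;
            for (int k = 1; k < n / 2; k++) // skip the DC component (k = 0)
            {
                double re = 0, im = 0;
                for (int t = 0; t < n; t++)
                {
                    re += signal[t] * Math.Cos(2 * Math.PI * k * t / n);
                    im -= signal[t] * Math.Sin(2 * Math.PI * k * t / n);
                }
                best = Math.Max(best, Math.Sqrt(re * re + im * im));
            }
            return best;
        }
    }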

You'd need OCR for that. Rolling your own OCR would be a bit difficult, but there might be a library or something out there worth looking into. Also, even with good OCR it's not a 100% reliable solution.

I wonder if there are some properties of text you could use to help you do this.
For instance, at a quick glance there are far more vertical strokes in text (l, j, k, m, n, etc.) than horizontal ones, so maybe you could start with that.
But even detecting these isn't straightforward; you'd need some sort of edge filter like a Sobel or Prewitt. Both have horizontal and vertical versions, see here for more info.
Of course, the vertical/horizontal lines of an Excel spreadsheet would be the strongest edges, so you'd have to ignore those and look only at the text.
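A rough sketch of that comparison, assuming a grayscale bitmap (Sobel Gx responds to vertical strokes, Gy to horizontal ones; replace GetPixel with LockBits for real use):

    using System;
    using System.Drawing;

    static class SobelOrientation
    {
        public static bool MoreVerticalEdges(Bitmap bmp)
        {
            int[,] kx = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };  // vertical edges
            int[,] ky = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };  // horizontal edges

            double sumX = 0, sumY = 0;
            for (int y = 1; y < bmp.Height - 1; y++)
                for (int x = 1; x < bmp.Width - 1; x++)
                {
                    double gx = 0, gy = 0;
                    for (int j = -1; j <= 1; j++)
                        for (int i = -1; i <= 1; i++)
                        {
                            int v = bmp.GetPixel(x + i, y + j).R; // assume grayscale
                            gx += kx[j + 1, i + 1] * v;
                            gy += ky[j + 1, i + 1] * v;
                        }
                    sumX += Math.Abs(gx);
                    sumY += Math.Abs(gy);
                }
            return sumX > sumY; // true: vertical strokes dominate
        }
    }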
Alternative: Can you not just give the user an easy way to rotate the images, like the arrows in Windows Picture viewer or just show 4 thumbnail previews they can click on. You might need to cache the 4 versions (if you are rotating) so it's quick, but only if speed turns out to be an issue?

Here's a paper entitled "Combined Script and Page Orientation Estimation using the Tesseract OCR engine" [pdf].
I haven't been able to find an implementation of their work, but the approach looks good to me:
The basic idea behind the proposed approach is simple.
A shape classifier is trained on characters (classes) from all the scripts of interest. At run-time, the classifier is run independently on each connected component (CC) in the image and the process is repeated after rotating each CC into three other candidate orientations (90°, 180° and 270° from the input orientation).
The algorithm keeps track of the estimated number of characters in each script for a given orientation, and the accumulated classifier confidence score across all candidate orientations. The estimate of page orientation is chosen as the one with the highest cumulative confidence score, and the estimate of script is chosen as the one with the highest number of characters in that script for the best orientation estimate.
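I haven't found their code either, but the accumulation logic they describe is easy to sketch. Everything below is a hypothetical skeleton: IShapeClassifier and ConnectedComponent are stand-ins for whatever classifier and connected-component extraction you plug in (Tesseract itself exposes this through its orientation/script detection mode):

    using System.Collections.Generic;
    using System.Linq;

    interface IShapeClassifier
    {
        // Hypothetical: confidence that the component is a valid character
        // when viewed at the given rotation.
        double Confidence(ConnectedComponent cc, int rotationDegrees);
    }

    class ConnectedComponent { /* pixel blob extracted from the page */ }

    static class PageOrientation
    {
        public static int Estimate(IEnumerable<ConnectedComponent> ccs,
                                   IShapeClassifier classifier)
        {
            int[] candidates = { 0, 90, 180, 270 };
            // Accumulate classifier confidence per candidate orientation.
            var score = candidates.ToDictionary(a => a, _ => 0.0);
            foreach (var cc in ccs)
                foreach (int angle in candidates)
                    score[angle] += classifier.Confidence(cc, angle);
            // Best orientation = highest cumulative confidence.
            return score.OrderByDescending(kv => kv.Value).First().Key;
        }
    }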

Related

Render DMD in Unity3D

I want to render a custom display from an emulation. Think like a dot matrix display from pinball machines.
How would I effectively go about this? (I suspect that naively writing to a texture of that size every frame will run way too slow.)
There has to be a good way to get this to render, but I'm having trouble finding an approach that actually performs well.
There are many options to do this, but without further details (DMD screen resolution, number of colors, animated or not, etc.) it's not easy to help. Here are a bunch of options that popped into my mind; hopefully the one you are looking for is somewhere here :)
1) There was a similar question, you can find it along with the answer here
2) If you want to display text only, there's a wide range of sites offering DMD fonts for free, e.g. here
3) You can also edit/extend the font set you download and display 'special characters' as graphics, or just use the standard ASCII table for the purpose if that's enough for your needs. e.g. ▓ █ ╔ ═ ╗ and similar "drawing characters"
You can find inspiration and ASCII art (including animated ones) e.g. here
4) Might be slow (again, "depends"), but you can go for a bitmap and .SetPixels with a Texture2D and DrawTexture (see the sketch after this list)
5) A bit "hacky", but you can save your anim phases into either bitmap data/array (readonly/constant variables for example, or read from disc in a managed way, or draw with the help of a free asset from the store, like this one here, etc) and do Graphics.DrawTexture
6) If the thing you want to display is 100% static (i.e. it's not actual data like score, but "hardcoded" animations like "TILT" text or such), you can create a Sprite Animation
7) You can mix the above and e.g. go for a font (#2) to display dynamic data on a canvas, and play the static animation around it making it look like the whole thing is dynamic
Hm. That's all right off the top of my head :)
Hope this helps!
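For what it's worth, here's a minimal sketch of option 4. The names and the 128x32 resolution are just assumptions (a typical pinball DMD size); hook the texture up to a RawImage or a material yourself:

    using UnityEngine;

    public class DmdRenderer : MonoBehaviour
    {
        const int W = 128, H = 32;          // assumed DMD resolution
        Texture2D tex;
        Color32[] pixels = new Color32[W * H];

        void Start()
        {
            tex = new Texture2D(W, H, TextureFormat.RGBA32, false);
            tex.filterMode = FilterMode.Point;  // keeps the dots crisp when scaled
            // Assign tex to whatever displays it, e.g.:
            // GetComponent<UnityEngine.UI.RawImage>().texture = tex;
        }

        // Call with each emulated frame (one byte per dot, 0..255 intensity).
        public void Push(byte[] frame)
        {
            for (int i = 0; i < W * H; i++)
                pixels[i] = new Color32(frame[i], (byte)(frame[i] / 3), 0, 255);
            tex.SetPixels32(pixels);
            tex.Apply(false);   // upload to the GPU; cheap at this resolution
        }
    }

At DMD resolutions a full SetPixels32 + Apply per frame is far below any performance concern.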

DrawString Not Scaling Properly With Transform

I'm essentially re-writing a document viewer with markups to move away from a COTS product and so far everything has been working VERY well. My code is based off of Mark Miller's Extensions to DrawTools (http://www.codeproject.com/Articles/17893/Extensions-to-DrawTools).
The old viewer stores pages and their markups based on x/y coordinates in inches and I have had NO trouble converting this to a pixel-based coordinate system and converting lines, boxes etc to the new viewer. The lines and boxes show up exactly where they are supposed to and have the correct size.
The problem has been displaying text markups, no matter what I do they always end up MUCH smaller than they should be.
I'm doing:
UserControl->OnPaint()
Create a Matrix Transform for:
Scale
Rotate
Translate
Apply Matrix to Graphics Object
Call method that draws the Page Image and then all of the Markups.
I have the X/Y coordinates and font size of the text to draw, and the resulting string DOES end up at the correct coordinates, but the text is WAY too small. The really bizarre part is that the original viewer is written in .NET, so I know the font and size SHOULD carry over, especially since everything else scales so well.
Here is an example of what I'm talking about. Please ignore the BackColor and Border of the "This is some Text", I haven't gotten around to getting that transformed yet since I've been so focused on getting the TEXT right.
(Screenshots omitted: the original viewer's output vs. my result, in which the text renders much smaller than it should.)
I ended up having to rework everything to be inch-unit based. The Font class simply doesn't have an easy way of scaling between units, and working in inches turned out to be the easiest solution.
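For anyone hitting the same wall, a minimal GDI+ illustration of that fix (the font family and parameter names are placeholders): specify the font size in inches via GraphicsUnit.Inch and set the Graphics page unit to match, so DrawString scales with everything else:

    using System.Drawing;

    static class MarkupText
    {
        // Draw a text markup whose position and font size are given in inches.
        public static void Draw(Graphics g, string text,
                                float xInches, float yInches, float sizeInches)
        {
            g.PageUnit = GraphicsUnit.Inch;   // coordinates are now in inches
            using (var font = new Font("Arial", sizeInches,
                                       FontStyle.Regular, GraphicsUnit.Inch))
            {
                g.DrawString(text, font, Brushes.Black,
                             new PointF(xInches, yInches));
            }
        }
    }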

Drawing a waveform in C#

I want to be able to display a WaveForm in C#, along with some simple features such as zooming and selection. I already have the data as a short[] of amplitude values.
However, I am an amateur when it comes to hand-coding GUIs. I have already found a possible helper class, WaveFormClass, that may help me achieve this, but as a backup I want to learn how to do it manually.
So may I ask for some methods and possibly some links that will help? Thanks!
NAudio has a WPF sample app that displays waveforms - you can get the source code from codeplex, the author also has an article about the topic here.
As with any chart, you'll have to iterate through the X values and draw the appropriate Y value taken from the sample array that you have.
If you want to pan left and right through the audio, you'll have to offset where you read from the array. If you add zoom-out capability, so that one pixel on the screen corresponds to several samples (try integer ratios at first), you'll have to aggregate those sample values and then draw the result.
If the words PIXEL and SAMPLE aren't yet in your vocabulary, get familiar with them before drawing the waveform, because no amount of other people's code will teach you the concepts.
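To make the pixel/sample mapping concrete, here's a hedged GDI+ sketch (call it from a WinForms OnPaint; the min/max-per-column approach is one common way to render waveforms, not the only one):

    using System;
    using System.Drawing;

    static class WaveformPainter
    {
        // offset: index of the first sample shown (panning).
        // samplesPerPixel: zoom level; each screen column covers this many samples.
        public static void Draw(Graphics g, short[] samples, Rectangle area,
                                int offset, int samplesPerPixel)
        {
            float midY = area.Top + area.Height / 2f;
            float yScale = area.Height / 2f / short.MaxValue;

            for (int x = 0; x < area.Width; x++)
            {
                int start = offset + x * samplesPerPixel;
                if (start >= samples.Length) break;
                int end = Math.Min(start + samplesPerPixel, samples.Length);

                // Min/max over the span, so no peaks are lost when zoomed out.
                short min = short.MaxValue, max = short.MinValue;
                for (int i = start; i < end; i++)
                {
                    if (samples[i] < min) min = samples[i];
                    if (samples[i] > max) max = samples[i];
                }
                g.DrawLine(Pens.LimeGreen,
                           area.Left + x, midY - max * yScale,
                           area.Left + x, midY - min * yScale);
            }
        }
    }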

image focus calculation

I'm trying to develop an image-focusing algorithm for some test automation work. I've chosen to use AForge.NET, since it seems like a nice, mature, .NET-friendly system.
Unfortunately, I can't seem to find information on building autofocus algorithms from scratch, so I've given it my best try:
Take an image and apply a Sobel edge-detection filter, which generates a greyscale edge outline. Generate a histogram and save the standard deviation. Move the camera one step closer to the subject and take another picture. If the standard deviation is smaller than the previous one, we're getting more in focus; otherwise, we've passed the optimal distance for taking pictures.
Is there a better way?
Update: HUGE flaw in this, by the way. As I get past the optimal focus point, my "image in focus" value continues growing. You'd expect a parabola-ish curve when plotting distance against focus value, but in reality you get something more logarithmic.
Update 2: Okay, I went back to this, and the current method we're exploring is this: given a few known edges (I know exactly what the objects in the picture are), I do a manual pixel-intensity comparison. As the resulting graph gets steeper, I'm more in focus. I'll post code once the core algorithm gets ported from MATLAB into C# (yeah, MATLAB... :S)
Update 3: Yay, final update. I came back to this again. The final code looks like this:
Step 1: Get an image from the list of images (I took a hundred photos through the focused point).
Step 2: Find an edge of the object I'm focusing on (in my case it's a rectangular object that's always in the same place, so I crop a HIGH and NARROW rectangle over one edge).
Step 3: Get the HorizontalIntensityStatistics (an AForge.NET class) for that cropped image.
Step 4: Get the Histogram (gray, in my case).
Step 5: Find the derivative of the values of the histogram.
Step 6: When the slope is at its largest, you're at the most focused point.
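In case it helps anyone, the steps above translate roughly into the following AForge.NET sketch (assuming a grayscale source image; the crop rectangle over the known edge is something you'd tune for your own scene):

    using System;
    using AForge.Imaging;
    using AForge.Imaging.Filters;

    static class FocusMetric
    {
        // Higher score = sharper edge = better focus.
        public static double Score(System.Drawing.Bitmap image,
                                   System.Drawing.Rectangle edgeStrip)
        {
            // Steps 2-4: crop the edge strip and get its intensity profile.
            var cropped = new Crop(edgeStrip).Apply(image);
            var stats = new HorizontalIntensityStatistics(cropped);
            int[] values = stats.Gray.Values;

            // Steps 5-6: the steepest slope of the profile is the score;
            // it peaks when the edge is sharpest.
            double maxSlope = 0;
            for (int i = 1; i < values.Length; i++)
                maxSlope = Math.Max(maxSlope, Math.Abs(values[i] - values[i - 1]));
            return maxSlope;
        }
    }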
You can have a look at the technique used in the NASA Curiosity Mars Rover.
The technique is described in this article
EDGETT, Kenneth S., et al. Curiosity’s Mars Hand Lens Imager (MAHLI) Investigation. Space science reviews, 2012, 170.1-4: 259-317.
which is available as a PDF here.
Quoting from the article:
7.2.2 Autofocus
Autofocus is anticipated to be the primary method by which MAHLI is focused on Mars. The autofocus command instructs the camera to move to a specified starting motor count position and collect an image, move a specified number of steps and collect another image, and keep doing so until reaching a commanded total number of images, each separated by a specified motor count increment. Each of these images is JPEG compressed (Joint Photographic Experts Group; see CCITT (1993)) with the same compression quality factor applied. The file size of each compressed image is a measure of scene detail, which is in turn a function of focus (an in-focus image shows more detail than a blurry, out of focus view of the same scene). As illustrated in Fig. 23, the camera determines the relationship between JPEG file size and motor count and fits a parabola to the three neighboring maximum file sizes. The vertex of the parabola provides an estimate of the best focus motor count position. Having made this determination, MAHLI moves the lens focus group to the best motor position and acquires an image; this image is stored, the earlier images used to determine the autofocus position are not saved.
Autofocus can be performed over the entire MAHLI field of view, or it can be performed on a sub-frame that corresponds to the portion of the scene that includes the object(s) to be studied. Depending on the nature of the subject and knowledge of the uncertainties in robotic arm positioning of MAHLI, users might elect to acquire a centered autofocus sub-frame or they might select an off-center autofocus sub-frame if positioning knowledge is sufficient to determine where the sub-frame should be located. Use of sub-frames to perform autofocus is highly recommended because this usually results in the subject being in better focus than is the case when autofocus is applied to the full CCD; further, the resulting motor count position from autofocus using a sub-frame usually results in a more accurate determination of working distance from pixel scale.
(Figure 23, not reproduced here, plots JPEG file size against motor count and shows the parabola fit described above.)
This idea was suggested also in this answer: https://stackoverflow.com/a/2173259/15485
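A hedged C# sketch of the MAHLI idea: JPEG-compress each frame at a fixed quality, take the stream length as the focus metric, then fit a parabola through the best sample and its two neighbours (the quality factor of 75 and the helper names are my assumptions):

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Linq;

    static class JpegFocus
    {
        // Bigger file = more detail = better focus (same scene, same quality).
        public static long JpegSize(Bitmap frame, long quality = 75)
        {
            var codec = ImageCodecInfo.GetImageEncoders()
                                      .First(c => c.MimeType == "image/jpeg");
            var prms = new EncoderParameters(1);
            prms.Param[0] = new EncoderParameter(Encoder.Quality, quality);
            using (var ms = new MemoryStream())
            {
                frame.Save(ms, codec, prms);
                return ms.Length;
            }
        }

        // Vertex of a parabola through (-1, y0), (0, y1), (1, y2): returns the
        // sub-step offset from the middle (best) sample to the estimated peak.
        public static double VertexOffset(double y0, double y1, double y2)
            => 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2);
    }

You'd call JpegSize once per motor step, find the largest of the resulting sizes, and apply VertexOffset to that sample and its two neighbours to interpolate the best motor position between steps.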
It may be a bit simplistic for your needs, but I've had good results with a simple algorithm that looks at the differences between neighbouring pixels. The sum of the differences between pixels two apart seems to be a reasonable measure of image contrast. I couldn't find the original paper by Brenner from the 70s, but it is mentioned in http://www2.die.upm.es/im/papers/Autofocus.pdf
Another issue is when the image is extremely out of focus, there is very little focus information, so it's hard to tell which way is 'moving closer' or to avoid a local maximum.
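For reference, Brenner's measure (which squares the 2-apart difference) is only a few lines, e.g.:

    using System.Drawing;

    static class BrennerFocus
    {
        // Sum of squared differences between each pixel and the pixel two
        // columns away; larger = more contrast = sharper focus.
        public static double Score(Bitmap bmp)
        {
            double sum = 0;
            for (int y = 0; y < bmp.Height; y++)
                for (int x = 0; x < bmp.Width - 2; x++)
                {
                    int d = bmp.GetPixel(x + 2, y).R - bmp.GetPixel(x, y).R;
                    sum += d * d;
                }
            return sum;
        }
    }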
I haven't built one myself, but my first thought would be to do a 2D DFT on a portion of the image. When out of focus, high frequencies will disappear automatically.
For a lazy prototype, you could try compressing a region of the image with JPEG (high quality) and looking at the output stream size. A big file means a lot of detail, which in turn implies the image is in focus. Beware that the camera should not be too noisy, and of course you can't compare file sizes across different scenes.
This might be useful. It's how a camera's AF system actually works - passive autofocus:
Contrast measurement
Contrast measurement is achieved by measuring contrast within a sensor field, through the lens. The intensity difference between adjacent pixels of the sensor naturally increases with correct image focus. The optical system can thereby be adjusted until the maximum contrast is detected. In this method, AF does not involve actual distance measurement at all and is generally slower than phase detection systems, especially when operating under dim light. As it does not use a separate sensor, however, contrast-detect autofocus can be more flexible (as it is implemented in software) and potentially more accurate. This is a common method in video cameras and consumer-level digital cameras that lack shutters and reflex mirrors. Some DSLRs (including Olympus E-420, Panasonic L10, Nikon D90, Nikon D5000, Nikon D300 in Tripod Mode, Canon EOS 5D Mark II, Canon EOS 50D) use this method when focusing in their live-view modes. A new interchangeable-lens system, Micro Four Thirds, exclusively uses contrast measurement autofocus, and is said to offer performance comparable to phase detect systems.
While Sobel is a decent choice, I would probably do an edge-magnitude calculation on the projections in the x and y directions over several small representative regions. Another .NET-friendly choice based on OpenCV is Emgu CV: http://www.emgu.com/wiki/index.php/Main_Page.
I wonder if the standard deviation is the best choice: if the image gets sharper, the Sobel-filtered image will contain brighter pixels at the edges, but at the same time fewer bright pixels, because the edges are getting thinner. Maybe you could try using the average of the 1% highest pixel values in the Sobel image?
Another flavor of focus metric might be:
Grab several images and average them (noise reduction).
Then FFT the averaged image and use the ratio of high-frequency to low-frequency energy.
The higher this ratio, the better the focus. A MATLAB demo is available (excluding the averaging stage) within the demos of the toolbox :)

Image browse by color/color range: need examples and/or code

At some long-ago Flash conferences I recall seeing a demo of a Flash app that had a color picker. Based on the user's color choice the app would show the user a set of images within the approximate range of that color: a bunch of mostly red images, a bunch of mostly blue images, etc.
I'm looking for two things:
1) A link to a demo of this sort of app, ideally a Flash app
2) ActionScript or C# code that describes how to pick a bunch of images that fall within a color range.
I know how to extract the aggregate/average RGB from individual images and persist this info to a database. I need to know exactly how to select images within a certain range of color tolerance. Could this be done purely using SQL and a knowledge of the numeric encoding of RGB color codes, or is there a better way?
I could not find any sample code, but found an article that gives a high-level explanation of their process (from this other page about Flickr's feature to search for images with similar colors). Apparently, Google also lets you do this with their image search (but I don't know if that is from metadata tags or actual color matching).
Now to the actual answer:
Instead of just storing the average or aggregate color for an image, you will need to store a "color signature" of the image.
My first (educated guess) idea would entail these steps:
Generate the histogram for each color band from the image
Generate some factors that describe each histogram curve (mean, variance, std-dev, etc. -- these factors will make up the digital signature of your image)
Store those factors in your database (and each of these factors would have an index in the DB)
Then, you would take your input (either a color, range of colors, or source image), run your histogram algorithm against that source, and search for matches to your computed factors.
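As a rough illustration of what those factors could look like, here's a sketch that computes the per-channel mean and standard deviation (directly from the pixels rather than from the histogram, which is equivalent for these two moments) and a naive Euclidean distance for matching; all names are made up:

    using System;
    using System.Drawing;

    class ColorSignature
    {
        public double MeanR, MeanG, MeanB, StdR, StdG, StdB;

        public static ColorSignature FromImage(Bitmap bmp)
        {
            double n = bmp.Width * (double)bmp.Height;
            double sr = 0, sg = 0, sb = 0, sr2 = 0, sg2 = 0, sb2 = 0;
            for (int y = 0; y < bmp.Height; y++)
                for (int x = 0; x < bmp.Width; x++)
                {
                    Color c = bmp.GetPixel(x, y);
                    sr += c.R; sg += c.G; sb += c.B;
                    sr2 += c.R * c.R; sg2 += c.G * c.G; sb2 += c.B * c.B;
                }
            return new ColorSignature
            {
                MeanR = sr / n, MeanG = sg / n, MeanB = sb / n,
                StdR = Math.Sqrt(sr2 / n - (sr / n) * (sr / n)),
                StdG = Math.Sqrt(sg2 / n - (sg / n) * (sg / n)),
                StdB = Math.Sqrt(sb2 / n - (sb / n) * (sb / n))
            };
        }

        // Simple distance on the means; a query color "within tolerance"
        // is then just DistanceTo(query) < tolerance.
        public double DistanceTo(ColorSignature o) =>
            Math.Sqrt(Math.Pow(MeanR - o.MeanR, 2)
                    + Math.Pow(MeanG - o.MeanG, 2)
                    + Math.Pow(MeanB - o.MeanB, 2));
    }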
The Flickr Hacks solution I cite in the comments is the best I've found: it involves resizing the image down to a single 1x1 pixel using common resampling algorithms, which gives you an average color for the entire image. Clever.
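In GDI+ that trick is a few lines; note that the built-in resampling only approximates a true average of every source pixel:

    using System.Drawing;
    using System.Drawing.Drawing2D;

    static class ImageColor
    {
        // Let the resampler collapse the whole image into one pixel.
        public static Color Average(Image img)
        {
            using (var one = new Bitmap(1, 1))
            using (var g = Graphics.FromImage(one))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                g.DrawImage(img, new Rectangle(0, 0, 1, 1));
                return one.GetPixel(0, 0);
            }
        }
    }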
You can use a set of descriptors for each image, then match on those.
There's some great work on this for C# here:
http://savvash.blogspot.com/p/compact-composite-descriptors.html
Also check out the free img(Rummager) tool, which can do what you want (i.e. find images matched by colour).
