I'm looking for some hints on how to extract a single character (in this case a number) from a screenshot (from a Flash game). I've tried almost all of the most popular commercial/non-commercial software, but none of it can handle a single character. I've tried Office Document Imaging, LeadTools, Tesseract-OCR, and Aspose.OCR. I've enclosed the image, a screenshot from the game with the number in question:
http://desmond.imageshack.us/Himg831/scaled.php?server=831&filename=44739012.png&res=landing
To me it seemed obvious that OCR would work, but I was surprised: OCR can't handle this.
Do you think it's impossible to use OCR to extract this number? Or maybe you know another solution to my problem?
Another option is an image-comparison method, but that's too slow, so I'd prefer not to use it.
Microsoft OneNote automatically parses all characters in images stored in it. Just paste in an image (or use Windows-S to fire up OneNote's screen grab).
Then right-click the image and use Get Text from Picture.
Although it thinks your image contains
‚3
I was wondering if it is possible to find the coordinates of a specific Run (text only; not drawings or other elements that have an offset parameter) on a page in a Word document using the Open XML SDK. I know that Open XML is basically, well, XML, and simple runs have no relative numerical position embedded in them.
I was reading through the Open XML SDK API and found no clues, but maybe I missed something. By coordinates I mean any tuple that could be mapped to pixels if I generated an image of the page (imagine taking a screenshot of the page).
I suspect, if this is possible, it is not trivial.
Appreciate your help!
The Open XML SDK does not include this functionality. This would require a layout engine, which is not part of the SDK.
Word is not a page-layout program; it's a word processor. Therefore:
No, it's not possible because...
The Word application lays out a page dynamically when the document is opened. Exactly how it is laid out and where things appear on screen (or on the printed page) depends on how Word calculates font size as well as line, character, and paragraph spacing (in all directions) for the currently selected printer driver. So the layout can vary, and thus it cannot be saved in the Open XML file.
I am trying to OCR images and extract the email address from them. Each image is supposed to contain one line of text, which is the email address.
I am using EmguCV.OCR to extract the text (email address) from those images. The target is a 100% accurate result.
We can fix the font and size of the text, for example Arial, 12 pt, so that every image has the email written in Arial 12 pt, black on a white background.
The problem is that the Tesseract OCR in EmguCV is not recognizing the text properly. It recognizes only about 80% of the characters accurately.
I am using preprocessing with Leptonica library.
Here are some sample images I am trying to recognize.
Is there any way to achieve the target of 100% accuracy?
With those sample images I can suggest two ways to solve the problem. JPEG artifacts (the result of lossy compression) are present in the images. Because of this, the letters become connected to each other (zoom in on the image in a program where you can see the actual pixels; Windows Photo Viewer worked fine for me). Tesseract OCR relies on the spacing between letters (it uses connected components) to do character recognition. Having any pieces connected throws off the recognition process, which means it tries to recognize a combination such as "co" as one letter.
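The connected-component point can be illustrated with a toy example: once a faint artifact bridges two glyphs, what were two components become one. A minimal, library-free Python sketch (the 0/1 "images" are invented for illustration):

```python
def count_components(binary):
    """Count 4-connected components of 1-pixels in a binary image
    (a list of rows of 0/1 values). Two touching letters form a
    single component, which is exactly what confuses the OCR."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1                      # found a new component
                stack = [(r, c)]
                while stack:                    # flood-fill it
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols and binary[y][x] and not seen[y][x]:
                        seen[y][x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return count

separate = [[1, 0, 1]]   # two glyphs with a clean gap between them
joined   = [[1, 1, 1]]   # a JPEG artifact bridges the gap
print(count_components(separate), count_components(joined))  # → 2 1
```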
Two possible solutions:
I'm not sure what preprocessing steps are already being done, but you'll want to do some thresholding to remove the lighter shades in the image (disconnecting the characters). However, you have to be careful with this, as it may remove more than you want.
If at any point in this process you have a higher-resolution image, or a non-JPEG/lossless format (e.g. PNG), then keep it in that format as you do the other processing steps, and avoid any further lossy compression. It sounds like these images don't come to you as shown above. This is the preferable solution, as you won't risk losing too much data.
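The thresholding step from the first suggestion can be sketched in Python. This is a library-free toy, and the pixel values and the cutoff of 128 are arbitrary assumptions; in practice you would tune the cutoff (or use an adaptive method such as Otsu's):

```python
def threshold(pixels, cutoff=128):
    """Binarize a grayscale image given as rows of 0-255 values:
    anything lighter than `cutoff` becomes white (255), the rest black (0).
    This removes the faint JPEG artifacts that connect characters."""
    return [[255 if p > cutoff else 0 for p in row] for row in pixels]

# A toy 1-row "image": two dark strokes joined by a light artifact (value 200)
row = [0, 10, 200, 15, 5]
print(threshold([row]))  # → [[0, 0, 255, 0, 0]] — the bridging artifact is gone
```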
I tried to recognize your images with ABBYY Cloud OCR SDK and got 100% accuracy.
You can use the Demo Tool to check the recognition accuracy for yourself.
I work for ABBYY and can give you more information about our technologies if you need it.
I have the following scanned document with a logo on it, and I have another black-and-white image with the same logo and style (shown in black and white below).
How do I determine whether the logo is present in this image or not?
Usually I will have many scanned documents. OCR will pick up MTNL, but sometimes these logos are made up of symbols not easily recognized by OCR.
The size and position of the logos change; they are often not fixed and may be placed anywhere on the document.
I want to organize and catalog scanned images based on the logos and symbols present. Most documents may or may not be in English and may or may not contain bar codes; in such cases a logo match will help.
I have looked at the AForge.NET library, but I am not sure which methods to combine for the search. Pixel-by-pixel search is very slow and fails if the source and target are of different sizes.
I have heard that YouTube does some sort of histogram or heat-signature match to see if a video contains any copyrighted material. It would be helpful if someone could guide me in this case.
My ideal choice would be C# and AForge.NET; otherwise, some command-line tool would be appreciated.
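For intuition, the histogram-matching idea mentioned above can be sketched in a few lines. This is a minimal, library-free Python illustration, not what YouTube actually does; the bin count and sample pixel values are arbitrary assumptions:

```python
def histogram(pixels, bins=8):
    """Coarse grayscale histogram of an image given as rows of 0-255 values,
    normalized so images of different sizes are comparable."""
    counts = [0] * bins
    for row in pixels:
        for p in row:
            counts[p * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]

def similarity(img_a, img_b, bins=8):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    ha, hb = histogram(img_a, bins), histogram(img_b, bins)
    return sum(min(a, b) for a, b in zip(ha, hb))

dark = [[10, 20], [30, 40]]
light = [[240, 250], [230, 220]]
print(similarity(dark, dark))   # → 1.0
print(similarity(dark, light))  # → 0.0
```

Histogram intersection is insensitive to the logo's position and, thanks to the normalization, tolerant of size differences, but it is easily fooled by unrelated images with similar tone; feature-based methods (as suggested in the answers below) are far more discriminative.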
You can try using AForge.NET.
Check these links:
1) http://www.aforgenet.com/articles/shape_checker/
2) http://www.codeproject.com/Articles/9727/Image-Processing-Lab-in-C
3) http://www.aforgenet.com/forum/viewtopic.php?f=4&t=323
Detect useful features in your logo image and look for those features in the scanned document. SIFT is a useful feature descriptor that is scale- and rotation-invariant. Other descriptors include SURF and HOG.
If you look around, there will be plenty of implementations, some of them even in C#.
You can use this small utility:
https://github.com/remdex/logoDetect
It worked for me. Perhaps it will work for you also.
I have a PDF and want to extract the text it contains. I've tried a few different PDF libraries, and they all return basically the same results: when extracting the text from a two-page document containing literally hundreds of words, only a dozen or so words from the header are returned.
Is there any way to tell if the text I'm after is actually text or a raster image of the text? I'm thinking of something along the lines of Firebug's "Inspect Element", but at this point I'll take any solution that tells me what I'm really looking at.
This project really doesn't justify attempting to use OCR. And although using fields in the PDF would be a simple solution, it's not an option, since the file is generated by a third party.
If Acrobat/Reader can select the text, then it is text.
Reasons your library might not be able to find the text in question:
Complex or bad fonts and encodings. Adobe can be very forgiving of garbage in, somehow managing to get good info out.
The text could be in an annotation rather than the page contents. It won't matter what program parses the content stream if you need to look in the annot array instead.
You didn't name a particular library, so it's possible that the library you're using doesn't look inside XObject forms. That's unlikely in even a remotely mature API, but stranger things have happened.
If you can get away with copy/paste from Reader, then just go that route.
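As a quick diagnostic for the text-vs-raster question, one crude heuristic is to scan the page content for text-showing operators (Tj/TJ) versus image XObjects. A minimal Python sketch, with the big caveat that it assumes uncompressed streams; real PDFs usually deflate their content streams, so a proper tool (or a zlib decompression pass) is needed first:

```python
import re

def classify_pdf_content(data: bytes) -> str:
    """Very rough heuristic: look for text-showing operators (Tj/TJ)
    and image XObjects in raw, uncompressed PDF content."""
    has_text = re.search(rb"\)\s*Tj|\]\s*TJ", data) is not None
    has_image = b"/Subtype /Image" in data or b"/Subtype/Image" in data
    if has_text and has_image:
        return "text and images"
    if has_text:
        return "text"
    if has_image:
        return "image only (OCR territory)"
    return "neither found (content may be compressed)"

# Toy fragments, not full PDFs:
print(classify_pdf_content(b"BT /F1 12 Tf (Hello) Tj ET"))        # → text
print(classify_pdf_content(b"<< /Subtype /Image /Width 600 >>"))  # → image only (OCR territory)
```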
Have you tried Amyuni PDF Creator .Net? It allows you to enumerate all components in a specified rectangular region of a page and inspect their types against a predefined list. You could run a quick test using the trial version and the following code sample for text extraction:
// Open a PDF file
axPDFCreactiveX1.Open(System.IO.Directory.GetCurrentDirectory() + "\\sampleBookmarks.pdf", "");
axPDFCreactiveX1.Refresh();

// Extract the raw text of page 1
String text = axPDFCreactiveX1.GetRawPageText(1);
MessageBox.Show(text);
Additionally, it provides Tesseract OCR integration in case you need it.
Disclaimer: I am part of the development team of this product.
Check this site out; it may contain some helpful code snippets: http://www.codeproject.com/KB/cs/PDFToText.aspx
I am currently working on a project in which my goal is to locate text in an image. OCR'ing the text is not my intention as of yet; I basically want to obtain the bounds of the text within an image. I am using the AForge.NET imaging component for manipulation. Any assistance of any kind would be appreciated.
Update 2/5/09:
I've since gone another route in my project. However, I did attempt to obtain text using MODI (Microsoft Office Document Imaging). It allows you to OCR an image and pull text from it with some ease.
This is an active area of research; there are literally oodles of academic papers on the subject. It's going to be difficult to give you assistance, especially without more details. Are you looking for specific types of text? Fonts? English only? Are you familiar with the academic literature?
"Text detection" is a standard problem in any OCR (optical character recognition) system and consequently there are lots of bits of code on the interwebs that deal with it.
I could start listing piles of links from Google, but I suggest you just do a search for "text detection" and start reading :). There is ample example code available as well.
Recognizing text inside an image is indeed a hot topic for researchers in that field, but it only really took off once CAPTCHAs became the norm as a defense against spam bots. Why use CAPTCHAs as protection? Because it is (or was) very hard to locate and read text inside an image!
The reason I mention CAPTCHAs is that most of the advancement* has been made within that tiny area, and I think your solution is best found there,
especially because CAPTCHAs are precisely about locating text (or something that resembles text) inside a cluttered image and then trying to read the letters correctly.
So if you can find yourself a good open-source CAPTCHA-breaking tool, you probably have all you need to continue your quest...
You could probably even throw away the most difficult code, the part that handles the character recognition itself, because those OCRs are built to read distorted text, something you don't have to do.
*: advancement in terms of visible, usable, and practical information for a "non-researcher"
If you're ok with using an online API for this, the API at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml can do text detection in addition to just OCR.
The stroke width transform can do that for you; it's at least what Microsoft developed for their mobile phone OS. A discussion of the implementation is at https://stackoverflow.com/