How to figure out direction of text or page with tesseract - c#

I have been trying to figure this out for a whole day now, and I really hope someone can help me out.
I am trying to write software that processes a PDF document. Processing means deleting empty pages and rotating pages that were scanned upside down.
Obviously I need some kind of OCR library here, so I went with Tesseract. Detecting empty pages was easy enough, but the Orientation property doesn't seem to work at all (EDIT: by not working I mean it always says "PageUp"). From what I have gathered so far, this property should tell me whether or not my page is upside down. Am I missing something? Maybe something that has to be included in the tessdata folder for this?
I also tried calling GetMeanConfidence, flipping the image, and comparing the two mean confidences, because in theory the page that isn't upside down should be easier to read. But the difference is so small that I don't think this is reliable.
I also switched the language argument of the TesseractEngine. I tried "eng", "deu", and "osd", all with the same result.
Bitmap image = new Bitmap(filepath);
var path = Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);
path = Path.Combine(path, "tessdata");
path = path.Replace("file:\\", "");
var engine = new TesseractEngine(path, "osd", EngineMode.TesseractOnly);
using (var img = PixConverter.ToPix(image))
{
    using (var page = engine.Process(img, PageSegMode.AutoOsd))
    {
        var pageIterator = page.AnalyseLayout();
        pageIterator.Begin();
        var pageProperties = pageIterator.GetProperties();
        Console.WriteLine(pageProperties.Orientation.ToString() + " " +
            pageProperties.TextLineOrder + " " + pageProperties.DeskewAngle + " " +
            pageProperties.WritingDirection);
        Console.WriteLine(page.GetMeanConfidence());
    }
}
I expect output that tells me whether the page is upside down, so that I know if it has to be rotated or not. Performance doesn't matter!
I am also open to different approaches and libraries (as long as they're free).
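The confidence comparison described above can be sketched in a few lines. This is only an illustration in Python, not part of the Tesseract API: the function name `needs_flip`, the `min_margin` parameter, and the assumption that both scores are on a 0 to 1 scale are all hypothetical. It shows why a tiny difference between the two readings has to be treated as "no decision" rather than as a signal.

```python
from typing import Optional


def needs_flip(conf_upright: float, conf_flipped: float,
               min_margin: float = 0.05) -> Optional[bool]:
    """Compare OCR confidence for a page as scanned and rotated by 180 degrees.

    Returns True if the rotated page reads clearly better, False if the
    page as scanned reads clearly better, and None if the two scores are
    closer than min_margin (an assumed, tunable threshold).
    """
    diff = conf_flipped - conf_upright
    if abs(diff) < min_margin:
        return None  # difference too small to trust
    return diff > 0


if __name__ == "__main__":
    print(needs_flip(0.62, 0.91))
    print(needs_flip(0.80, 0.81))
```

With a margin like this, the "difference is so minimal" case from the question simply comes back as inconclusive, which matches the observation that the comparison alone may not be reliable.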

Related

Using camera to analyze garage door status

I got the idea to use images from my IP camera, to register if my garage door is open or closed (or maybe even somewhere in between).
I figured it would be simple to put some identifiable markers on the door and then "read" their position programmatically, but I have no experience in image processing, and therefore no idea what it's called.
After a lot of reading, my guess is that I need to use Emgu CV in some way (I'm coding in C#).
Can someone point me in the right direction to get started?
What is the right method for what I am trying to achieve? Blob tracking?

I know this is old but I happen to have done exactly this recently.
I have an old smartphone on which I can remotely enable the LED flash, take a picture, and download it. I attached a shiny reflector at a specific location on the garage door, and the smartphone is fixed to the wall.
I implemented it in Python. I download the picture over HTTP, and after a few attempts I identified where to crop the full picture. Then I compute the brightness, which tends to indicate a valid detection above 150 (it's usually 200 when the reflector is there, 130 when it's not but the lights are on, and 10 when the reflector is not there and the lights are off).
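import time
from io import BytesIO

import requests
from PIL import Image, ImageStat

# baseUrl (camera base URL) and destination (output folder) are
# defined elsewhere in the script; they are not shown here.

def loadFromCam(url):
    print("GET " + baseUrl + url)
    return requests.get(baseUrl + url, timeout=(10, 10))

def brightness(im):
    # RMS value of the first band of the image
    stat = ImageStat.Stat(im)
    return stat.rms[0]

def loadImage():
    response = loadFromCam("cam/1/frame.jpg")
    with open(destination + 'frame.jpg', 'wb') as f:
        f.write(response.content)
    return Image.open(BytesIO(response.content))

def cropImage(img):
    # region of the frame where the reflector shows up
    left = 365
    top = 400
    right = 410
    bottom = 435
    return img.crop((left, top, right, bottom))

def toggleLed():
    loadFromCam("cam/1/led_toggle")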
And then how it's used:
toggleLed()
time.sleep(0.1)
image = loadImage()
toggleLed()
crop = cropImage(image)
crop.save(destination + "crop.jpg")
print("brightness:", brightness(crop))
The result is two files (full picture and crop), and the brightness amount.
Note: I just started with Python, so this may be ugly or not the recommended practice.
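To turn the brightness number into a state, something like the following could work. It is a rough sketch built only on the readings quoted above (about 200, 130 and 10, with 150 as the detection threshold); the function name `classify_brightness` and the `dark_threshold` of 70 are assumptions made for illustration and would need tuning for a real camera.

```python
def classify_brightness(value: float,
                        detect_threshold: float = 150.0,
                        dark_threshold: float = 70.0) -> str:
    """Map the brightness of the cropped region to a coarse state.

    detect_threshold comes from the readings described above;
    dark_threshold is an assumed midpoint between the "lights on"
    (about 130) and "lights off" (about 10) readings.
    """
    if value > detect_threshold:
        return "reflector"
    if value > dark_threshold:
        return "no reflector, lights on"
    return "no reflector, lights off"


if __name__ == "__main__":
    for reading in (200, 130, 10):
        print(reading, "->", classify_brightness(reading))
```

Keeping the thresholds as parameters makes it easy to recalibrate if the camera, the reflector or the lighting changes.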

PdfTextExtractor.GetTextFromPage suddenly giving empty string

We've been using the iTextSharp libraries for a couple of years now within an SSIS process to read some values out of a set of PDF exam documents. Everything has been running nicely until this week when suddenly we are getting the return of an empty string when calling the PdfTextExtractor.GetTextFromPage method. I'll include the code here:
// Read the data from the blob column where the PDF exists
byte[] byteBuffer = Row.FileData.GetBlobData(0, (int)Row.FileData.Length);
using (var pdfReader = new PdfReader(byteBuffer))
{
    // Here is the important stuff
    var extractStrategy = new LocationTextExtractionStrategy();
    // This call will extract the page with the proper data on it depending on the exam type
    // 1-page exams = NBOME - need to read first page for exam result data
    // 2-page exams = NBME - need to read second page for exam result data
    // The next two statements utilize this construct.
    var vendor = pdfReader.NumberOfPages == 1 ? "NBOME" : "NBME";
    // *** THIS NEXT LINE GIVES THE EMPTY STRING
    var newText = PdfTextExtractor.GetTextFromPage(pdfReader, pdfReader.NumberOfPages == 1 ? 1 : 2, extractStrategy);
    var stringList = newText.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    var fileParser = FileParseFactory.GetFileParse(stringList, vendor);
    // Populate our output variables
    Row.ParsedExamName = fileParser.GetExamName(stringList);
    Row.DateParsed = DateTime.Now;
    Row.ParsedId = fileParser.GetStudentId(stringList);
    Row.ParsedTestDate = fileParser.GetTestDate(stringList);
    Row.ParsedTestDateString = fileParser.GetTestDateAsString(stringList);
    Row.ParsedName = fileParser.GetStudentName(stringList);
    Row.ParsedTotalScore = fileParser.GetTestScore(stringList);
    Row.ParsedVendor = vendor;
}
This does not happen for all PDFs, by the way. To explain more, we are reading in exam files. One of the exam types (NBME) is still being read just fine, but the other type (NBOME) is not, even though the NBOME files were read without problems before this week.
This leads me to think it is an internal format change of the PDF file itself.
Another bit of information: the pdfReader itself has data (I can get a byte[] array of the data), but the call to get any text simply returns an empty string.
I'm sorry I'm not able to show any exam data or files - that information is sensitive.
Has anybody seen something like this? If so, any possible solutions?

Well - we have found our answer. The user was originally going to the NBOME web site and downloading the PDF exam result files to import into my parsing system. Like I said, this worked for quite some time. Recently (this week), however, the user stopped downloading the files and instead used a PDF printing feature to print the PDF files to PDF. When she did that, the problem occurred.
Bottom line: it looks like printing the PDF as a PDF may have been injecting some characters or something under the covers, which caused iTextSharp not to fail but to return an empty string. She should have just continued downloading them directly.
Thanks to those who offered some comments!

CSCore crashes when changing sample rate and writing to file

I'm trying to use the CSCore .NET library to convert sound files to a specific format for another process. I'm using it in Unity, but I don't think that's specific to the problem, as it otherwise works perfectly.
I've got this code right now:
using (IWaveSource source = CodecFactory.Instance.GetCodec(audioPath)) {
    using (IWaveSource destination = source.ToSampleSource()
        .ChangeSampleRate(16000)
        .ToMono()
        .ToWaveSource(16)) {
        audioPath = Application.dataPath + "/" + Path.GetFileNameWithoutExtension(audioPath) + "_temp_converted" + Path.GetExtension(audioPath);
        destination.WriteToFile(audioPath);
    }
}
It seems to be the combination of changing the sample rate and writing the file that causes the crash. If I remove the .ChangeSampleRate line (or replace 16000 with the current sample rate of the file) then it saves a mono 16-bit .wav file fine, and if I keep that line but don't try to write it to a file, Unity doesn't crash.
Has anyone else experienced this, or have any insight into what might be causing it? I'm starting to tear my hair out a bit with this!
Thanks.

DotNetNuke 7.1 HTML Module converting data:image into URI?

I am unable to use the drag-and-drop functionality within DotNetNuke version 7.1.
The drag-and-drop functionality of the Telerik RadEditor takes the browser's Base64 input and encases it in an img tag where the source is the data. E.g., src="data:image/jpeg;base64,[base64data]".
When using drag/drop to a RadEditor within the HTML Module and then saving the HTML content, that src definition is changed to a URI request by prepending the relative path for the DNN portal. E.g., src="/mysite/portals/0/data:image/jpeg;base64,[base64data]".
This converts what started out as a perfectly valid embedded image tag into a request and thereby causes the browser to request this "image" from the server. The server then returns a 414 error (URI too long).
Example without prepended relative path: http://jsfiddle.net/GGGH/27Tbb/2/
<img src="data:image/jpeg;base64,[stuff]">
Example with prepended relative path (won't display): http://jsfiddle.net/GGGH/NL85G/2/
<img src="mysite/portals/0/data:image/jpeg;base64,[stuff]">
Is there some configuration that I've missed? Prepending relative paths is OK for src="/somephysicalpath" but not for src="data:image...".

I ended up solving the problem prior to posting the question but wanted to add this knowledge to SO in case someone else encountered the same problem (has no one noticed this yet?). Also, perhaps, DNN or the community can improve upon my solution and that fix can make it into a new DNN build.
I've looked at the source code for RadEditor, RadEditorProvider and then finally the Html module itself. It seems the problem is in the FormatContent() method in EditHtml.ascx.cs, which calls the HtmlTextController's ManageRelativePaths() method. That method runs for all "src" (and "background") attributes in the Html content string. It post-processes the Html string that comes out of the RadEditor to add in the relative path. This is not appropriate when editing an embedded Base64 image that was dragged to the editor.
In order to fix this, and still allow for the standard functionality originally intended by the manufacturer, ManageRelativePaths in DotNetNuke.Modules.Html needs to be modified to make an exception if the URI begins with the string "data:image". Line 488 (as of version 7.1.0) is potentially appropriate. I added the following code (incrementing P as appropriate and positioned after the URI length was determined -- I'm sure there's a better way but this works fine):
// line 483, HtmlTextController.cs, DNN code included for positioning
while (P != -1)
{
    sbBuff.Append(strHTML.Substring(S, P - S + tLen));
    // added code: check for base64 image
    // (string.Compare with offsets avoids the exception Substring throws
    // when fewer than 10 characters remain after the token)
    bool skipThisToken = string.Compare(strHTML, P + tLen, "data:image", 0, 10, StringComparison.OrdinalIgnoreCase) == 0;
    // end added code - back to standard DNN
    //keep characters left of URL
    S = P + tLen;
    //save startpos of URL
    R = strHTML.IndexOf("\"", S);
    //end of URL
    if (R >= 0)
    {
        strURL = strHTML.Substring(S, R - S).ToLower();
    }
    else
    {
        strURL = strHTML.Substring(S).ToLower();
    }
    // added code to continue while loop after the integers were updated
    if (skipThisToken)
    {
        P = strHTML.IndexOf(strToken + "=\"", S + strURL.Length + 2, StringComparison.InvariantCultureIgnoreCase);
        continue;
    }
    // end added code -- the method continues from here (not reproduced)
This is probably not the best solution, as it's searching for a hard-coded value. Better would be functionality that allows developers to add tags later. (But, then again, EditHtml.ascx.cs and HtmlTextController both hard-code the two tags that they intend to post-process.)
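To illustrate the "configurable instead of hard-coded" idea, here is a small language-neutral sketch in Python. It is not DNN's code and does not reproduce what ManageRelativePaths really does: the regular expression, the function name `add_relative_path`, and the `skip_prefixes` parameter are all assumptions made for the example. The point is only that the list of prefixes to leave alone can be passed in rather than baked into the method.

```python
import re

# Matches src="..." and background="..." attributes (simplified on purpose).
_ATTR = re.compile(r'\b(?P<attr>src|background)="(?P<url>[^"]*)"', re.IGNORECASE)


def add_relative_path(html: str, base: str, skip_prefixes=("data:",)) -> str:
    """Prefix relative URLs with base, leaving skipped prefixes untouched.

    skip_prefixes is a hypothetical, configurable list; absolute URLs and
    root-relative paths are also left alone in this simplified version.
    """
    untouched = tuple(skip_prefixes) + ("http://", "https://", "/")

    def fix(match):
        url = match.group("url")
        if url.lower().startswith(untouched):
            return match.group(0)  # keep the attribute exactly as written
        return '{}="{}{}"'.format(match.group("attr"), base, url)

    return _ATTR.sub(fix, html)


if __name__ == "__main__":
    sample = '<img src="data:image/jpeg;base64,AAAA"><img src="logo.png">'
    print(add_relative_path(sample, "/mysite/portals/0/"))
```

Adding another scheme later would then be a matter of extending `skip_prefixes`, with no change to the matching logic.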
So, after making this small change, recompiling the DotNetNuke.Modules.Html.dll and deploying, drag-and-drop should be functional. Obviously this increases the complexity of an upgrade -- it would be better if this were fixed by DNN themselves. I verified that as of v7.2.2 this issue still exists.
UPDATE: Fixed in DNN Community Version 7.4.0

Silverlight deep zoom composing issue

Deep Zoom Composer itself is a very nice tool. I am wondering if there is any automatic way to compose? For example, I have 100 images and want to compose them automatically into a 10 * 10 Deep Zoom layout. I am implementing a background workflow that automatically composes the Deep Zoom output and publishes it. The Output Type I prefer is "Images" with "Export as a collection (multiple images)".
Any reference samples or documents? I am using VSTS2008 + C# + .Net 3.5.

Take a look at this post about DeepZoomTools.dll included in the app.

There's a great sample project here and if you really want to go crazy and generate images/tiles programmatically, you can try the sort of thing referenced in this MSDN article.
I haven't found a lot of real documentation regarding DeepZoomTools.dll myself, but I created a small test webservice to turn a single uploaded image into a Deep Zoom source. The relevant code is:
public string CreateDeepZoomImage(byte[] abyte, string fileName)
{
    ImageCreator ic = new ImageCreator();
    string filePath = Path.Combine(_uploadPath, fileName);
    // save the uploaded bytes to disk
    File.WriteAllBytes(filePath, abyte);
    // Strip the extension with GetFileNameWithoutExtension: TrimEnd(char[])
    // also removes trailing name characters that happen to occur in the extension.
    string baseName = Path.GetFileNameWithoutExtension(filePath);
    string outputDir = Path.Combine(Path.GetDirectoryName(filePath), baseName);
    string destination = Path.Combine(outputDir, "output.xml");
    ic.Create(filePath, destination);
    return "/Uploads/" + baseName + "/output.xml";
}
Where the return path is used like so:
ZoomImage.Source = new DeepZoomImageTileSource(new Uri(e.Result, UriKind.Relative));
(Forgive the sloppy code. It does work though.)
