Tesseract OCR simple example


Can anyone give me a simple example of using Tesseract OCR, preferably in C#?
I tried the demo found here.
I downloaded the English dataset, unzipped it to the C drive, and modified the code as follows:
string path = @"C:\pic\mytext.jpg";
Bitmap image = new Bitmap(path);
Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"C:\tessdata\", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
Unfortunately the code doesn't work. The program dies at the "ocr.Init(..." line, and I can't even get an exception using try-catch.
I was able to run VietOCR, but that is a very large project for me to follow. I need a simple example like the one above.

OK, I found the solution here:
tessnet2 fails to load
(the answer given by Adam).
Apparently I was using the wrong version of tessdata. I was following the source page's instructions intuitively, and that caused the problem.
It says:
Quick Tessnet2 usage
Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project.
Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the same directory.
After you download the binary and follow the link to download the language file, you see many language files, but none of them is the right version. You need to select all versions and go to the next page to find the correct one (tesseract-2.00.eng). They should either update the binary download link to version 3 or put the version 2 language file on the first page, or at least state in bold that this version mismatch is a big deal.
Anyway, I found it.
Thanks everyone.
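The version mismatch described above can be detected before calling Init. The following is a minimal sketch, assuming that the tesseract-2.00.eng package unpacks to several eng.* files (including eng.unicharset and eng.inttemp), while Tesseract 3 and later use a single eng.traineddata file. The TessdataCheck class and its DetectFormat method are hypothetical names for illustration and are not part of tessnet2.

```csharp
using System;
using System.IO;

public static class TessdataCheck
{
    // Hypothetical helper: guesses which data format a tessdata folder holds.
    // Returns "v3+" when <lang>.traineddata is present, "v2" for the older
    // multi-file layout, and "missing" when neither is found.
    public static string DetectFormat(string tessdataDir, string lang)
    {
        if (!Directory.Exists(tessdataDir))
            return "missing";

        if (File.Exists(Path.Combine(tessdataDir, lang + ".traineddata")))
            return "v3+";

        // Assumption: Tesseract 2 data includes <lang>.unicharset and <lang>.inttemp.
        bool hasV2Files =
            File.Exists(Path.Combine(tessdataDir, lang + ".unicharset")) &&
            File.Exists(Path.Combine(tessdataDir, lang + ".inttemp"));

        return hasV2Files ? "v2" : "missing";
    }
}
```

If a check like this reports "v3+" while the project references Tessnet2.dll, the language data is probably the wrong version for the library, which matches the failure described in the question.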

A simple example of testing Tesseract OCR in C#:
public static string GetText(Bitmap imgsource)
{
var ocrtext = string.Empty;
using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
using (var img = PixConverter.ToPix(imgsource))
{
using (var page = engine.Process(img))
{
ocrtext = page.GetText();
}
}
}
return ocrtext;
}
Note: the tessdata folder must exist in the build output directory: bin\Debug\
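A relative path such as "./tessdata" is resolved against the current working directory, which is not always the folder that contains the executable. A small sketch of one way to build an absolute path instead, using only the base class library; the TessdataLocator class is a hypothetical name, not part of the Tesseract wrapper:

```csharp
using System;
using System.IO;

public static class TessdataLocator
{
    // Hypothetical helper: builds an absolute path to a "tessdata" folder that
    // sits next to the running executable (for example bin\Debug\tessdata).
    public static string NextToExecutable()
    {
        // BaseDirectory is the folder the application was loaded from;
        // it does not change when the working directory changes.
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;
        return Path.Combine(baseDir, "tessdata");
    }
}
```

The result could then be passed as the first argument of the TesseractEngine constructor shown above, in place of the relative path.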

I was able to get it to work by following these instructions.
Download the sample code
Unzip it to a new location
Open ~\tesseract-samples-master\src\Tesseract.Samples.sln (I used Visual Studio 2017)
Install the Tesseract NuGet package for that project (or uninstall/reinstall as I had to)
Uncomment the last two meaningful lines in Tesseract.Samples.Program.cs:
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
Run (hit F5)
You should get this windows console output

Try updating the line to:
ocr.Init(@"C:\", "eng", false); // the path here should be the parent folder of tessdata

I had the same problem, and it is now resolved. My Tesseract2 folder contains subfolders for 32-bit and 64-bit. Since my system is 64-bit, I copied the files from the 64-bit folder into the main folder ("Tesseract2") and into the bin/Debug folder. Now my solution works fine.

In my case all of this worked, except that the character recognition was not accurate.
You need to consider a few things:
Use the correct tessnet2 library.
Use the correct tessdata language version.
tessdata should be somewhere outside your application folder, so that you can pass its full path as the Init parameter. Use ocr.Init(@"c:\tessdata", "eng", true);
Debugging will cause you headaches; to avoid them you need to update your app.config.
Use this. (I can't put the XML code here. Give me your email and I will email it to you.)
Hope this helps.

Here's a great working example project; Tesseract OCR Sample (Visual Studio) with Leptonica Preprocessing
Tesseract OCR 3.02.02 API can be confusing, so this guides you through including the Tesseract and Leptonica dll into a Visual Studio C++ Project, and provides a sample file which takes an image path to preprocess and OCR. The preprocessing script in Leptonica converts the input image into black and white book-like text.
Setup
To include this in your own projects, you will need to reference the header files and lib and copy the tessdata folders and dlls.
Copy the tesseract-include folder to the root folder of your project. Now Click on your project in Visual Studio Solution Explorer, and go to Project>Properties.
VC++ Directories>Include Directories:
..\tesseract-include\tesseract;..\tesseract-include\leptonica;$(IncludePath)
C/C++>Preprocessor>Preprocessor Definitions:
_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)
C/C++>Linker>Input>Additional Dependencies:
..\tesseract-include\libtesseract302.lib;..\tesseract-include\liblept168.lib;%(AdditionalDependencies)
Now you can include headers in your project's file:
#include <baseapi.h>
#include <allheaders.h>
Now copy the two dll files in tesseract-include and the tessdata folder in Debug to the Output Directory of your project.
When you initialize tesseract, you need to specify the location of the parent folder (!important) of the tessdata folder if it is not already the current directory of your executable file. You can copy my script, which assumes tessdata is installed in the executable's folder.
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init("D:\\tessdataParentFolder\\", ...
Sample
You can compile the provided sample, which takes one command line argument of the image path to use. The preprocess() function uses Leptonica to create a black and white book-like copy of the image which makes tesseract work with 90% accuracy. The ocr() function shows the functionality of the Tesseract API to return a string output. The toClipboard() can be used to save text to clipboard on Windows. You can copy these into your own projects.

This worked for me. I had 3-4 other PDF-to-text extractors, so that if one does not work another one will. The Tesseract code in particular can be used on Windows 7, 8, and Server 2008. I hope this is helpful to you.
do
{
// Sleep or Pause the Thread for 1 sec, if service is running too fast...
Thread.Sleep(millisecondsTimeout: 1000);
Guid tempGuid = ToSeqGuid();
string newFileName = tempGuid.ToString().Split('-')[0];
string outputFileName = appPath + "\\pdf2png\\" + fileNameithoutExtension + "-" + newFileName +
".png";
extractor.SaveCurrentImageToFile(outputFileName, ImageFormat.Png);
// Create text file here using Tesseract
foreach (var file in Directory.GetFiles(appPath + "\\pdf2png"))
{
try
{
var pngFileName = Path.GetFileNameWithoutExtension(file);
string[] myArguments =
{
"/C tesseract ", file,
" " + appPath + "\\png2text\\" + pngFileName
}; // /C makes cmd.exe close automatically when the command completes
string strParam = String.Join(" ", myArguments);
var myCmdProcess = new Process();
var theProcess = new ProcessStartInfo("cmd.exe", strParam)
{
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true,
WindowStyle = ProcessWindowStyle.Minimized
}; // Keep the cmd.exe window minimized
myCmdProcess.StartInfo = theProcess;
myCmdProcess.Exited += myCmdProcess_Exited;
myCmdProcess.Start();
//if (process)
{
/*
MessageBox.Show("cmd.exe process started: " + Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
Process.EnterDebugMode();
//ShowWindow(hWnd: process.Handle, nCmdShow: 2);
/*
MessageBox.Show("After EnterDebugMode() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.WaitForExit(60000);
/*
MessageBox.Show("After WaitForExit() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.Refresh();
Process.LeaveDebugMode();
//myCmdProcess.Dispose();
/*
MessageBox.Show("After LeaveDebugMode() cmd.exe process Exited: " +
Environment.NewLine);
*/
}
//process.Kill();
// Wait for the process to complete its task and exit automatically
Thread.Sleep(millisecondsTimeout: 1000);
// This works fine in a Windows 7 environment, but not in Windows 8
// Try the following code block
// Check whether the process has completely exited
if (!myCmdProcess.HasExited)
{
//process.WaitForExit(2000); // Try to wait for exit 2 more seconds
/*
MessageBox.Show(" Process of cmd.exe was exited by WaitForExit(); Method " +
Environment.NewLine);
*/
try
{
// If not, then Kill the process
myCmdProcess.Kill();
//myCmdProcess.Dispose();
//if (!myCmdProcess.HasExited)
//{
// myCmdProcess.Kill();
//}
MessageBox.Show(" Process of cmd.exe exited ( Killed ) successfully " +
Environment.NewLine);
}
catch (System.ComponentModel.Win32Exception ex)
{
MessageBox.Show(
" Exception: System.ComponentModel.Win32Exception " +
ex.ErrorCode + Environment.NewLine);
}
catch (NotSupportedException notSupporEx)
{
MessageBox.Show(" Exception: NotSupportedException " +
notSupporEx.Message +
Environment.NewLine);
}
catch (InvalidOperationException invalidOperation)
{
MessageBox.Show(
" Exception: InvalidOperationException " +
invalidOperation.Message + Environment.NewLine);
foreach (
var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
}
}
}
catch (Exception exception)
{
MessageBox.Show(
" Caught Exception in Generating image do{...}while{...} function " +
Environment.NewLine + exception.Message + Environment.NewLine);
}
}
// Delete png image here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Thread.Sleep(millisecondsTimeout: 1000);
// Read text from text file here
foreach (var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
} while (extractor.GetNextImage()); // Advance image enumeration...
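The loop above starts tesseract through cmd.exe and redirects its output streams without reading them. As a smaller alternative, here is a hedged sketch that starts the tesseract executable directly, assuming it is available on PATH. The TesseractCli class and its methods are hypothetical names introduced for illustration; only the "tesseract imagename outputbase" command-line form is taken from the tool itself.

```csharp
using System;
using System.Diagnostics;

public static class TesseractCli
{
    // Hypothetical helper: describes a direct call to the tesseract executable
    // (assumed to be on PATH) without going through cmd.exe.
    public static ProcessStartInfo BuildStartInfo(string imagePath, string outputBase)
    {
        return new ProcessStartInfo
        {
            FileName = "tesseract",
            // Quote both paths so that spaces in folder names survive.
            Arguments = "\"" + imagePath + "\" \"" + outputBase + "\"",
            CreateNoWindow = true,
            UseShellExecute = false
        };
    }

    // Runs tesseract and returns its exit code, or -1 if it did not finish in time.
    public static int Run(string imagePath, string outputBase, int timeoutMs)
    {
        using (Process process = Process.Start(BuildStartInfo(imagePath, outputBase)))
        {
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill(); // give up on a hung process
                return -1;
            }
            return process.ExitCode;
        }
    }
}
```

Tesseract writes its result to outputBase with ".txt" appended, so a caller would read that file after Run returns 0. The output streams are deliberately not redirected here, because a redirected stream that nobody reads can block the child process once its buffer fills.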

Admittedly this is an older question when Tesseract 3 was the version available, but it came up in my search results while looking for a related issue and the question, and other answers, highlight the still valid issue of the difficulty of actually getting Tesseract installed, let alone configuring it to work correctly.
There is a far simpler solution (and using the updated Tesseract 5 engine) which does all the work for you, in IronOcr.
(Disclaimer: I do work for Iron Software, though I feel that others can benefit from this information, particularly as it relates to the question of using Tesseract OCR in C# which IronOcr excels at).
using IronOcr;
var Ocr = new IronTesseract(); // nothing to configure
Ocr.Configuration.WhiteListCharacters = "0123456789"; // If digit only
using (var Input = new OcrInput(@"example.tiff"))
{
OcrResult Result = Ocr.Read(Input);
foreach (var Page in Result.Pages)
{
// Page object
int PageNumber = Page.PageNumber;
string PageText = Page.Text;
int PageWordCount = Page.WordCount;
// null if we don't set Ocr.Configuration.ReadBarCodes = true;
OcrResult.Barcode[] Barcodes = Page.Barcodes;
System.Drawing.Bitmap PageImage = Page.ToBitmap(Input);
int PageWidth = Page.Width;
int PageHeight = Page.Height;
foreach (var Paragraph in Page.Paragraphs)
{
// Pages -> Paragraphs
int ParagraphNumber = Paragraph.ParagraphNumber;
String ParagraphText = Paragraph.Text;
System.Drawing.Bitmap ParagraphImage = Paragraph.ToBitmap(Input);
int ParagraphX_location = Paragraph.X;
int ParagraphY_location = Paragraph.Y;
int ParagraphWidth = Paragraph.Width;
int ParagraphHeight = Paragraph.Height;
double ParagraphOcrAccuracy = Paragraph.Confidence;
OcrResult.TextFlow paragrapthText_direction = Paragraph.TextDirection;
foreach (var Line in Paragraph.Lines)
{
// Pages -> Paragraphs -> Lines
int LineNumber = Line.LineNumber;
String LineText = Line.Text;
System.Drawing.Bitmap LineImage = Line.ToBitmap(Input);
int LineX_location = Line.X;
int LineY_location = Line.Y;
int LineWidth = Line.Width;
int LineHeight = Line.Height;
double LineOcrAccuracy = Line.Confidence;
double LineSkew = Line.BaselineAngle;
double LineOffset = Line.BaselineOffset;
foreach (var Word in Line.Words)
{
// Pages -> Paragraphs -> Lines -> Words
int WordNumber = Word.WordNumber;
String WordText = Word.Text;
System.Drawing.Image WordImage = Word.ToBitmap(Input);
int WordX_location = Word.X;
int WordY_location = Word.Y;
int WordWidth = Word.Width;
int WordHeight = Word.Height;
double WordOcrAccuracy = Word.Confidence;
if (Word.Font != null)
{
// Word.Font is only set when using Tesseract Engine Modes rather than LSTM
String FontName = Word.Font.FontName;
double FontSize = Word.Font.FontSize;
bool IsBold = Word.Font.IsBold;
bool IsFixedWidth = Word.Font.IsFixedWidth;
bool IsItalic = Word.Font.IsItalic;
bool IsSerif = Word.Font.IsSerif;
bool IsUnderLined = Word.Font.IsUnderlined;
bool IsFancy = Word.Font.IsCaligraphic;
}
foreach (var Character in Word.Characters)
{
// Pages -> Paragraphs -> Lines -> Words -> Characters
int CharacterNumber = Character.CharacterNumber;
String CharacterText = Character.Text;
System.Drawing.Bitmap CharacterImage = Character.ToBitmap(Input);
int CharacterX_location = Character.X;
int CharacterY_location = Character.Y;
int CharacterWidth = Character.Width;
int CharacterHeight = Character.Height;
double CharacterOcrAccuracy = Character.Confidence;
// Output alternative symbols choices and their probability.
// Very useful for spellchecking
OcrResult.Choice[] Choices = Character.Choices;
}
}
}
}
}
}

Related

MS Word changes relative links back to absolute links

We have some documentation files in our company that go to customers, and I noticed that some links are saved as absolute references, so for example to \server001\Files..., which of course won't work for customers. So I wrote code that takes all files in a folder, and changes the links from absolute references to local references.
toc = WordprocessingDocument.Open(pathFile, true);
var baseUri = new Uri(pathFile, UriKind.Absolute);
IEnumerable<HyperlinkRelationship> all_hr = toc.MainDocumentPart.HyperlinkRelationships;
for (int index = 0; index < all_hr.Count(); index++)
{
HyperlinkRelationship hr = all_hr.ElementAt(index);
string link = hr.Uri.OriginalString.Replace("%20", " ");
if (!(link.EndsWith(".doc") || link.EndsWith(".docx")))
continue;
if (hr.Uri.IsAbsoluteUri)
{
var newHr = new Uri(link, UriKind.Absolute);
link = baseUri.MakeRelativeUri(newHr).OriginalString;
}
log.Info("Changed: " + hr.Uri.OriginalString + " to: " + link + " in file: " + sourceFile);
var hyperlinkRelationshipId = hr.Id;
toc.MainDocumentPart.DeleteReferenceRelationship(hr);
try
{
toc.MainDocumentPart.AddHyperlinkRelationship(new Uri(link, UriKind.Relative), false, hyperlinkRelationshipId);
}
catch (Exception)
{
// This would be reached if link is still absolute. Never called.
}
}
toc.Save();
toc.Close();
When I debug this, it finds absolute links just fine, and they are all stored back as relative links.
But if I then take a look at the files that were "fixed", suddenly different references that I did not even touch in the code are changed, and absolute again (to file:///\\Server001\Files\...).
Is my understanding of references in Word wrong, or what might be happening here?
Sample log output:
16:49:43,738 [DOC2PDF] INFO - Changed: file:///\\Server001\Files\Main\RSL.doc to: ..\Main\RSL.doc in file: file:///\\Server001\Files\Main\INDEX.doc

How can I fix my code to move files that already exist to another directory?

This is my first C# script and my first non-SQL script in general. I'm really proud of it (and I couldn't have done it this quickly without help from this community, thanks!), but I know it's going to be all kinds of messy.
The script loops through all files in a single directory, removes special characters from the file names, and renames the files to a standard, user-friendly, name. The script is looking for a specific set of files in the directory. If it finds a file that isn't supposed to be in the directory, it moves the file to a safe folder and renames it. If the folder
I'm working with 4 files that have dynamic names that will include numbers and special characters. The renaming process happens in two steps:
Remove special characters and numbers from the name. Ex: From "EOY 12.21.2018 - 12.28.2018 PRF.xls" to "EOYPRF.xls"
Rename the file to clearly label what the file is. Ex: From "EOYPRF.xls" to "EOY_CompanyName.xls"
There may be files added to this directory by accident, and since they are payroll files, they are highly confidential and cannot be moved unless they need to be moved (only if they are one of the 4 files), so I move them to a subdirectory in the same directory the files are stored in and rename them.
I am also trying to account for if my script or process messes up midway. This script is part of a larger automation process run in SSIS, so there are many failure points. It may be possible that the script fails and leaves one or all of the 4 files in the directory. If this is the case, I need to move the file out of the main directory before the user adds new, unaltered master files to be processed. If the directory contains files of the same final name ("EOY_CompanyName.xls") then it will not work properly.
I'm testing the script by placing files for the following three scenarios in the directory:
2 files that are not in any way associated with the 4 master files.
4 unaltered master files formatted with numbers and special characters: "EOY 12.21.2018 - 12.28.2018 PRF.xls"
4 master files already in their final state (simulating a failure before the files are moved to their final directory). Ex: "EOY_CompanyName.xls"
The problem I'm facing is in the rare scenario where there are both unaltered master files and final master files in the directory, the script runs up until the first unaltered file, removes the special characters, then fails at the final renaming step because a file already exists with the same name (Scenario 3 from the 3 points above). It'll then continue to run the script and will move one of the master files into the unexpected file directory and stop processing any other files for some reason. I really need some help from someone with experience.
I've tried so many things, but I think it's a matter of the order in which the files are processed. I have two files named "a.xls" and "b.xls" which are placeholders for unexpected files. They are the first two files in the directory and always get processed first. The 3rd file in the directory is the file named above in its unaltered form ("EOY 12.21.2018 - 12.28.2018 PRF.xls"). It gets renamed and moved into the unexpected files folder, but really it should be passed over to move the master files containing the final name ("EOY_CompanyName.xls") into the unexpected folder. I want to make sure that the script only processes new files whenever it's run, so I want to move any already processed files that failed to get moved via the script into another directory.
public void Main()
{
///Define paths and vars
string fileDirectory_Source = Dts.Variables["User::PayrollSourceFilePath"].Value.ToString();
string fileDirectory_Dest = Dts.Variables["User::PayrollDestFilePath"].Value.ToString();
string errorText = Dts.Variables["User::errorText"].Value.ToString();
DirectoryInfo dirInfo_Source = new DirectoryInfo(fileDirectory_Source);
DirectoryInfo dirInfo_Dest = new DirectoryInfo(fileDirectory_Dest);
string finalFileName = "";
List<string> files = new List<string>(new string[]
{
fileDirectory_Source + "EOY_PRF.xls",
fileDirectory_Source + "EOY_SU.xls",
fileDirectory_Source + "FS_PRF.xls",
fileDirectory_Source + "FS_SU.xls"
});
Dictionary<string, string> fileNameChanges = new Dictionary<string, string>();
fileNameChanges.Add("EOYReportPRF.xls", "EOY_PRF.xls");
fileNameChanges.Add("PayrollEOY.xls", "EOY_SU.xls");
fileNameChanges.Add("PRFFundingStatement.xls", "FS_PRF.xls");
fileNameChanges.Add("SUFundingStatement.xls", "FS_SU.xls");
///Determine if all files present
int count = dirInfo_Source.GetFiles().Length;
int i = 0;
///Loop through directory to standardize file names
try
{
foreach (FileInfo fi in dirInfo_Source.EnumerateFiles())
{
string cleanFileName = Regex.Replace(Path.GetFileNameWithoutExtension(fi.Name), "[0-9]|[.,/ -]", "").TrimEnd() + fi.Extension;
File.Move(fileDirectory_Source + Path.GetFileName(fi.Name), fileDirectory_Source + cleanFileName);
///Move unexpected files in source directory
if (!fileNameChanges.ContainsKey(cleanFileName))
{
errorText = errorText + "Unexpected File: " + cleanFileName.ToString() + " moved into the Unexpected File folder.\r\n";
File.Move(dirInfo_Source + cleanFileName, dirInfo_Source + "Unexpected Files\\" + Path.GetFileNameWithoutExtension(cleanFileName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + fi.Extension);
}
if (fileNameChanges.ContainsKey(cleanFileName))
{
///Final Friendly File Name from Dict
var friendlyName = fileNameChanges[cleanFileName];
///Handle errors produced by files that already exist
if (files.Contains(fileDirectory_Source + friendlyName))//File.Exists(fileDirectory_Source + friendlyName))
{
MessageBox.Show("File.Exists(dirInfo_Source + friendlyName)" + File.Exists(dirInfo_Source + friendlyName).ToString() + " cleanFileName " + cleanFileName);
errorText = errorText + "File already exists: " + friendlyName.ToString() + " moved into the Unexpected File folder.\r\n";
File.Move(dirInfo_Source + friendlyName, dirInfo_Source + "Unexpected Files\\" + Path.GetFileNameWithoutExtension(friendlyName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + Path.GetExtension(friendlyName));
return;
}
///Rename files to friendly name
File.Move(dirInfo_Source + cleanFileName, dirInfo_Source + friendlyName);
finalFileName = friendlyName.ToString();
}
///Count valid PR files
if (files.Contains(dirInfo_Source + finalFileName))
{
i++;
}
}
///Pass number of files in source folder to SSIS
Dts.Variables["User::FilesInSourceDir"].Value = i;
}
catch (Exception ex)
{
errorText = errorText + ("\r\nError at Name Standardization step: " + ex.Message.ToString()) + $"Filename: {finalFileName}\r\n";
}
///Search for missing files and store paths
try
{
if (i != 4)
{
var errors = files.Where(x => !File.Exists(x)).Select(x => x);
if (errors.Any())
errorText = (errorText + $" Missing neccessary files in PR Shared drive. Currently {i} valid files in directory.\r\n\n" + "Files missing\r\n" + string.Join(Environment.NewLine, errors) + "\r\n");
}
}
catch (Exception ex)
{
errorText = errorText + ("Error at Finding Missing PR Files step: " + ex.Message.ToString()) + "\r\n\n";
throw;
}
///Loop through directory to move files to encrypted location
try
{
if (i == 4)
foreach (FileInfo fi in dirInfo_Source.EnumerateFiles())
{
fi.MoveTo(fileDirectory_Dest + Path.GetFileName(fi.FullName));
}
}
catch (Exception ex)
{
errorText = errorText + ("Error at Move Files to Encrypted Directory step: " + ex.Message.ToString()) + "\r\n";
}
Dts.TaskResult = (int)ScriptResults.Success;
Dts.Variables["User::errorText"].Value = errorText;
}
#region ScriptResults declaration
/// <summary>
/// This enum provides a convenient shorthand within the scope of this class for setting the
/// result of the script.
///
/// This code was generated automatically.
/// </summary>
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
}
}
Ideally, I would like to move all files that are already in the folder before the new files are cleaned and renamed, so I don't receive errors or commit records to the database that already exist.
If you made it this far, thank you for your time and I appreciate you taking the hour it probably took to read this. You are a hero.

As I understand it, you want to move any of the four final-named files out of the way if they already exist, before doing anything else. I would go with the code below. Please note that I did not run it.
I hope I understood you correctly.
///Loop through directory to standardize file names
try
{
//Cleanup source folder
foreach (string filePath in files) //entries in files are already full paths
{
if (File.Exists(filePath))
{
//Time to move the file, it's old
string fileShortName = Path.GetFileName(filePath);
errorText = errorText + "Old File: " + fileShortName + " moved into the Old File folder.\r\n";
File.Move(filePath, fileDirectory_Source + "Old Files\\" + Path.GetFileNameWithoutExtension(fileShortName) + "_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + Path.GetExtension(fileShortName));
}
}
foreach (FileInfo fi in dirInfo_Source.GetFiles())
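The move-and-timestamp step used above for both "Unexpected Files" and "Old Files" could be factored into one helper so that every branch handles leftovers the same way. This is only a sketch under that assumption; FileArchiver and MoveWithTimestamp are hypothetical names that do not appear in the original script.

```csharp
using System;
using System.IO;

public static class FileArchiver
{
    // Hypothetical helper: moves filePath into archiveDir and appends a
    // timestamp to the name so that repeated runs are unlikely to collide.
    // Returns the new path, or null when the source file does not exist.
    public static string MoveWithTimestamp(string filePath, string archiveDir)
    {
        if (!File.Exists(filePath))
            return null;

        Directory.CreateDirectory(archiveDir); // does nothing if it already exists

        string stampedName =
            Path.GetFileNameWithoutExtension(filePath) + "_" +
            DateTime.Now.ToString("yyyyMMddHHmmssfff") +
            Path.GetExtension(filePath);

        string target = Path.Combine(archiveDir, stampedName);
        File.Move(filePath, target);
        return target;
    }
}
```

Calling it once for each of the four final names before the renaming loop starts would clear out files left behind by a failed run, which is the ordering the question asks for.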

Statically linking Mono using mkbundle in Windows

I'm having trouble statically linking Mono using mkbundle in Windows. In my attempts to figure out what's going on I hit a wall. When you pass the static flag to mkbundle in Windows, it looks for the file monosgen-2.0-static.lib in the Mono directory. This directory is defined by the line below:
string monoPath = GetEnv("MONOPREFIX", @"C:\Program Files (x86)\Mono");
The contents of this directory after installing mono 5.1.1 is:
First I noticed that the file naming convention is different from the one mkbundle is looking for (monosgen-2.0 should be mono-2.0-sgen). I can change this just fine; however, I suspect, given the file name, that the mono-2.0-sgen.lib file shown in the screenshot isn't statically compiled, because when I try to run my bundled application it first can't find the sgen DLL, and once it can, it can't find others.
At this point I'm wondering whether mkbundle officially works on Windows, and if it does, whether I am doing something fundamentally wrong. I have seen older posts asking for help setting up mkbundle on Windows and have posted questions about this myself. Most point to using mingw instead of cl.exe. Should I be using this instead?
The source for this snippet is shown below. You can find the entire source code here https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/mkbundle.cs.
if (style == "windows")
{
Func<string, string> quote = (pp) => { return "\"" + pp + "\""; };
string compiler = GetEnv("CC", "cl.exe");
string winsdkPath = GetEnv("WINSDK", @"C:\Program Files (x86)\Windows Kits\8.1");
string vsPath = GetEnv("VSINCLUDE", @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC");
string monoPath = GetEnv("MONOPREFIX", @"C:\Program Files (x86)\Mono");
string[] includes = new string[] {winsdkPath + @"\Include\um", winsdkPath + @"\Include\shared", vsPath + @"\include", monoPath + @"\include\mono-2.0", "." };
// string[] libs = new string[] { winsdkPath + #"\Lib\winv6.3\um\x86" , vsPath + #"\lib" };
var linkLibraries = new string[] { "kernel32.lib",
"version.lib",
"Ws2_32.lib",
"Mswsock.lib",
"Psapi.lib",
"shell32.lib",
"OleAut32.lib",
"ole32.lib",
"winmm.lib",
"user32.lib",
"libvcruntime.lib",
"advapi32.lib",
"OLDNAMES.lib",
"libucrt.lib" };
string glue_obj = "mkbundle_glue.obj";
string monoLib;
if (static_link)
monoLib = LocateFile (monoPath + @"\lib\monosgen-2.0-static.lib");
else {
Console.WriteLine ("WARNING: Dynamically linking the Mono runtime on Windows is not a tested option.");
monoLib = LocateFile (monoPath + @"\lib\monosgen-2.0.lib");
LocateFile (monoPath + @"\lib\monosgen-2.0.dll"); // in this case, the .lib is just the import library, and the .dll is also needed
}
var compilerArgs = new List<string>();
compilerArgs.Add("/MT");
foreach (string include in includes)
compilerArgs.Add(String.Format ("/I {0}", quote (include)));
if (!nomain || custom_main != null) {
compilerArgs.Add(quote(temp_c));
compilerArgs.Add(quote(temp_o));
if (custom_main != null)
compilerArgs.Add(quote(custom_main));
compilerArgs.Add(quote(monoLib));
compilerArgs.Add("/link");
compilerArgs.Add("/NODEFAULTLIB");
compilerArgs.Add("/SUBSYSTEM:windows");
compilerArgs.Add("/ENTRY:mainCRTStartup");
compilerArgs.AddRange(linkLibraries);
compilerArgs.Add("/out:"+ output);
string cl_cmd = String.Format("{0} {1}", compiler, String.Join(" ", compilerArgs.ToArray()));
Execute (cl_cmd);
}

How do I show the line number and frame in a catch exception?

I'm trying to write the line number and frame to a text file but I cannot get it to work. From what I've read online the way I've written this should work but it's not actually outputting any line numbers making my debugging quite hard. Can anyone assist in perhaps pointing out where my code could be wrong?
catch (Exception e)
{
var st = new StackTrace(e, true);
var frame = st.GetFrame(0);
var line = frame.GetFileLineNumber();
var sw = new System.IO.StreamWriter(filename, true);
sw.WriteLine(
DateTime.Now.ToString() + "\r\n"
+ e.Message + "\r\n"
+ e.InnerException + "\r\n"
+ e.Source + "\r\n"
+ frame + "\r\n"
+ line);
sw.Close();
}
It does output some information just not the line / frame numbers.
Here is an example of what's getting output.
22/08/2016 08:34:24
Input string was not in a correct format.
StringToNumber at offset 12099653 in file:line:column <filename unknown>:0:0 0
Also please note the application is running in Debug not Release.

Make sure your application is built in Debug mode. If it is built in Release mode, then some information, such as line numbers, is not included in exception texts.
Another possible reason is that you edited the Debug configuration for your project and disabled debug information (Project settings => Build => Advanced => Output debug info).
Finally, you need the .pdb files for line numbers, and they must be in the same folder as your .exe / .dll. By default they are there, unless you manually remove them or copy parts of your application to another location and run it from there.
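In the sample output from the question, frame 0 is StringToNumber, which appears to be a method inside the .NET runtime itself. Symbols for such frames are usually not available, so GetFrame(0) can report line 0 even in a Debug build with .pdb files in place. A minimal sketch that instead walks the frames until one carries line information; ExceptionLocation and Describe are hypothetical names:

```csharp
using System;
using System.Diagnostics;

public static class ExceptionLocation
{
    // Hypothetical helper: returns "file:line" for the first stack frame that
    // carries line information, or "unknown" when no frame has any.
    public static string Describe(Exception e)
    {
        var trace = new StackTrace(e, true); // true = capture file and line info
        StackFrame[] frames = trace.GetFrames();
        if (frames == null)
            return "unknown";

        foreach (StackFrame frame in frames)
        {
            int line = frame.GetFileLineNumber();
            if (line != 0) // 0 means no line information was found for this frame
                return frame.GetFileName() + ":" + line;
        }
        return "unknown";
    }
}
```

In the catch block from the question, the value of ExceptionLocation.Describe(e) could be written to the log in place of frame and line. If it still returns "unknown", the .pdb files are most likely missing or out of date.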

Ghostscript.NET - no output file when run as Windows service

I'm writing a Windows Service to scan a set of directories for new PDF files and convert them to TIFF with Ghostscript.NET. When I compiled and ran the code as a normal program it functioned perfectly, but when I used the same code as a Service the output TIFF never appears. I've set the destination directory to allow writing for Everyone, and the original PDF is being removed as it's supposed to, so it shouldn't be a permissions issue for the "Local System" user. Auditing the directory for access Failures and Successes just shows a list of Successes.
There is a function that reads the color population of the PDF to determine if it's a color document, or B&W scanned as color. That part works, so there isn't an issue accessing and reading the PDF.
I've also tried removing '-q' from the Ghostscript switches and I don't have any errors reported, and "-dDEBUG" outputs so much garbage I don't know what it's saying - but nothing is tagged as an error.
public static void ConvertPDF(string file, GSvalues gsVals)
{
gsProc = new Ghostscript.NET.Processor.GhostscriptProcessor();
System.Collections.Generic.List<string> switches = new System.Collections.Generic.List<string>();
switches.Add("-empty"); // GS.NET ignores the first switch
switches.Add("-r" + gsVals.Resolution); // dpi
switches.Add("-dDownScaleFactor=" + gsVals.ScaleFactor); // Scale the image back down
switches.Add("-sCompression=lzw"); // Compression
switches.Add("-dNumRenderingThreads=" + Environment.ProcessorCount);
switches.Add("-c \"30000000 setvmthreshold\"");
switches.Add("-dNOGC");
string device;
if (_checkPdf(file, gsVals.InkColorLevels, gsVals))
{
gsVals.WriteLog("Color PDF");
device = "-sDEVICE=tiffscaled24"; // 24bit Color TIFF
}
else
{
gsVals.WriteLog("Grayscale PDF");
device = "-sDEVICE=tiffgray"; // grayscale TIFF
}
switches.Add(device);
// Strip the filename out of the full path to the file
string filename = System.IO.Path.GetFileNameWithoutExtension(file);
// Set the output file tag
string oFileName = _setFileName(oPath + "\\" + filename.Trim(), GSvalues.Extension);
string oFileTag = "-sOutputFile=" + oFileName;
switches.Add(oFileTag);
switches.Add(file);
// Process the PDF file
try
{
string s = string.Empty;
foreach (string sw in switches) s += sw + ' ';
gsVals.DebugLog("Switches:\n\t" + s);
gsProc.StartProcessing(switches.ToArray(), new GsStdio());
while (gsProc.IsRunning) System.Threading.Thread.Sleep(1000);
}
catch (Exception e)
{
gsVals.WriteLog("Exception caught: " + e.Message);
Console.Read();
}
gsVals.DebugLog("Archiving PDF");
try
{
System.IO.File.Move(file, _setFileName(gsVals.ArchiveDir + "\\" + filename, ".pdf"));
}
catch (Exception e)
{
gsVals.WriteLog("Error moving PDF: " + e.Message);
}
}
private static string _setFileName(string path, string tifExt)
{
if (System.IO.File.Exists(path + tifExt)) return _setFileName(path, 1, tifExt);
else return path + tifExt;
}
private static string _setFileName(string path, int ctr, string tifExt)
{
// Test the proposed altered filename. If it exists, move to the next iteration
if(System.IO.File.Exists(path + '(' + ctr.ToString() + ')' + tifExt)) return _setFileName(path, ++ctr, tifExt);
else return path + '(' + ctr.ToString() + ')' + tifExt;
}
This is a sample output of the generated switches (pulled from the output log):
Switches: -empty -r220 -dDownScaleFactor=1 -sCompression=lzw -dNumRenderingThreads=4 -c "30000000 setvmthreshold" -dNOGC -sDEVICE=tiffscaled24 -sOutputFile=\\[servername]\amb_ops_scanning$\Test.tiff \\[servername]\amb_ops_scanning$\Test.pdf
Settings are read from an XML file and stored in a class, GSvalues. The class also handles writing to the System log for output, or to a text file in the normal Program version. GsStdio is a class for handling GS input and output, which just redirects all the output to the same logs as GSvalues. The only code changes between the Program version and the Service version are the Service handling code and the switch of output from a text file to the system logs. Nothing about the Ghostscript processing was changed.
This is being compiled as x86 for portability, but is being run on x64. GS 9.15 is installed, both x86 and x64 versions. GS.NET is version 4.0.30319 installed via NuGet into VS 2012. ILMerge 2.13.0307 is being used to package the GS.NET dll into the exe, also for portability. None of these things changed between the normal EXE and the Windows Service versions, and as I said the normal EXE works without any issues.

I got it working by using CreateProcessAsUser() from advapi32.dll, using code from this article.
I also had to restructure the order of the switches:
switches.Add("-c 30000000 setvmthreshold -f\"" + file + "\"");
The original source I'd used for speeding up the conversion left out the '-f' part, and didn't mention that -f is the switch that marks the input file. I don't know why the original order worked in GS.NET, but with plain gswin32c.exe I got an error saying that it was an invalid file, until I set the switch this way.
Oddly, the processes this method creates still run in Session 0, but it actually works. I'll keep tinkering, but for now it's working.
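For anyone building the switch list in code, the ordering described above could be sketched roughly as follows. This is an illustration rather than the exact code used: GsSwitchBuilder and Build are hypothetical names, and it assumes each list element is handed to Ghostscript as a separate argument, so the PostScript after -c is followed by -f and then the input file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, shown only to illustrate the switch order.
public static class GsSwitchBuilder
{
    // Ordinary switches come first, then the PostScript passed via -c,
    // then -f to end the -c section right before the input file.
    public static List<string> Build(string device, string outputFile, string inputFile)
    {
        var switches = new List<string>();
        switches.Add("-empty");                     // placeholder; the question notes GS.NET ignores the first switch
        switches.Add("-dNOGC");
        switches.Add("-sDEVICE=" + device);
        switches.Add("-sOutputFile=" + outputFile);
        switches.Add("-c");                         // start of PostScript code
        switches.Add("30000000");
        switches.Add("setvmthreshold");
        switches.Add("-f");                         // ends the -c section
        switches.Add(inputFile);                    // input file goes last
        return switches;
    }
}
```

The resulting list can be passed on with ToArray() in the same way as the switches list in the question.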

Tesseract OCR simple example - c#

Try updating the line to: ocr.Init(@"C:\", "eng", false); // the path here should be the parent folder of tessdata
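Since the question reports that Init dies without a catchable exception, it may be worth checking the data folder before calling it. The following is a minimal sketch using only System.IO. TessdataCheck and Validate are hypothetical names, and the assumption that language files are named "<language>.<something>" is mine. It only detects a missing folder or missing files, not a version mismatch of the language data.

```csharp
using System;
using System.IO;

// Hypothetical pre-check, not part of tessnet2.
public static class TessdataCheck
{
    // Returns null when the folder looks usable, otherwise a message
    // describing what is missing. 'language' is a prefix such as "eng".
    public static string Validate(string tessdataDir, string language)
    {
        if (!Directory.Exists(tessdataDir))
            return "Folder not found: " + tessdataDir;

        // Assumption: language data files are named "<language>.<something>".
        string[] files = Directory.GetFiles(tessdataDir, language + ".*");
        if (files.Length == 0)
            return "No '" + language + ".*' files in " + tessdataDir;

        return null;
    }
}
```

Logging the returned message before the Init call gives at least one diagnostic line even if the process then terminates.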

I had the same problem, and it's now resolved. My Tesseract2 folder contains subfolders for 32-bit and 64-bit. I copied the files from the 64-bit folder (as my system is 64-bit) into the main folder ("Tesseract2") and into the bin/Debug folder. Now my solution works fine.
