Statically linking Mono using mkbundle in Windows - c#

I'm having trouble statically linking Mono using mkbundle on Windows, and my attempts to figure out what's going on have hit a wall. When you pass the static flag to mkbundle on Windows, it looks for the file monosgen-2.0-static.lib under the Mono directory. This directory is defined by the line below:
string monoPath = GetEnv("MONOPREFIX", @"C:\Program Files (x86)\Mono");
The contents of this directory after installing Mono 5.1.1 are:
First, I noticed that the file naming convention differs from what mkbundle is looking for (monosgen-2.0 should be mono-2.0-sgen). I can change this easily; however, I suspect, given the file name, that the mono-2.0-sgen.lib file shown in the screenshot isn't a static library: when I try to run my bundled application it first can't find the sgen DLL, and once it can, it can't find others.
At this point I'm wondering whether mkbundle officially works on Windows and, if it does, whether I'm doing something fundamentally wrong. I have seen older posts asking for help setting up mkbundle on Windows and have posted questions about this myself. Most point to using mingw instead of cl.exe. Should I be using that instead?
The source for this snippet is shown below. You can find the entire source code here https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/mkbundle.cs.
if (style == "windows")
{
Func<string, string> quote = (pp) => { return "\"" + pp + "\""; };
string compiler = GetEnv("CC", "cl.exe");
string winsdkPath = GetEnv("WINSDK", @"C:\Program Files (x86)\Windows Kits\8.1");
string vsPath = GetEnv("VSINCLUDE", @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC");
string monoPath = GetEnv("MONOPREFIX", @"C:\Program Files (x86)\Mono");
string[] includes = new string[] {winsdkPath + @"\Include\um", winsdkPath + @"\Include\shared", vsPath + @"\include", monoPath + @"\include\mono-2.0", "." };
// string[] libs = new string[] { winsdkPath + @"\Lib\winv6.3\um\x86" , vsPath + @"\lib" };
var linkLibraries = new string[] { "kernel32.lib",
"version.lib",
"Ws2_32.lib",
"Mswsock.lib",
"Psapi.lib",
"shell32.lib",
"OleAut32.lib",
"ole32.lib",
"winmm.lib",
"user32.lib",
"libvcruntime.lib",
"advapi32.lib",
"OLDNAMES.lib",
"libucrt.lib" };
string glue_obj = "mkbundle_glue.obj";
string monoLib;
if (static_link)
monoLib = LocateFile (monoPath + @"\lib\monosgen-2.0-static.lib");
else {
Console.WriteLine ("WARNING: Dynamically linking the Mono runtime on Windows is not a tested option.");
monoLib = LocateFile (monoPath + @"\lib\monosgen-2.0.lib");
LocateFile (monoPath + @"\lib\monosgen-2.0.dll"); // in this case, the .lib is just the import library, and the .dll is also needed
}
var compilerArgs = new List<string>();
compilerArgs.Add("/MT");
foreach (string include in includes)
compilerArgs.Add(String.Format ("/I {0}", quote (include)));
if (!nomain || custom_main != null) {
compilerArgs.Add(quote(temp_c));
compilerArgs.Add(quote(temp_o));
if (custom_main != null)
compilerArgs.Add(quote(custom_main));
compilerArgs.Add(quote(monoLib));
compilerArgs.Add("/link");
compilerArgs.Add("/NODEFAULTLIB");
compilerArgs.Add("/SUBSYSTEM:windows");
compilerArgs.Add("/ENTRY:mainCRTStartup");
compilerArgs.AddRange(linkLibraries);
compilerArgs.Add("/out:"+ output);
string cl_cmd = String.Format("{0} {1}", compiler, String.Join(" ", compilerArgs.ToArray()));
Execute (cl_cmd);
}
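One way to see which of the two naming conventions an installation actually ships is to probe the lib directory directly. The sketch below is a diagnostic aid only and is not part of mkbundle: MonoLibProbe and TryFindFirstExisting are hypothetical names, and the candidate list simply combines the names mkbundle.cs looks for with the name seen in the Mono 5.1.1 install. It cannot tell a static library from an import library.

```csharp
using System;
using System.IO;

public static class MonoLibProbe
{
    // Names from mkbundle.cs plus the one observed in the Mono 5.1.1 install.
    public static readonly string[] Candidates =
    {
        "monosgen-2.0-static.lib",
        "monosgen-2.0.lib",
        "mono-2.0-sgen.lib"
    };

    // Same lookup rule as mkbundle's GetEnv call: environment variable first, then the default.
    public static string DefaultPrefix()
    {
        return Environment.GetEnvironmentVariable("MONOPREFIX")
               ?? @"C:\Program Files (x86)\Mono";
    }

    // Looks under <monoPrefix>/lib and reports the first candidate that exists.
    public static bool TryFindFirstExisting(string monoPrefix, out string fullPath)
    {
        string libDir = Path.Combine(monoPrefix, "lib");
        foreach (string name in Candidates)
        {
            string candidate = Path.Combine(libDir, name);
            if (File.Exists(candidate))
            {
                fullPath = candidate;
                return true;
            }
        }
        fullPath = string.Empty;
        return false;
    }
}
```

Calling MonoLibProbe.TryFindFirstExisting(MonoLibProbe.DefaultPrefix(), out var path) before running mkbundle shows which file name would need to be passed or renamed.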

Related

C# trying to getshortcuttargetfile to parse lnk file

I'm trying to build a simple archiving program. We have files scattered throughout the system so part of the routine checks to see if a shortcut was put into the system and if so, to fetch all of the files from where the shortcut points to. It was all working fine a few weeks ago, but when I applied some visual studio updates it seems to have ceased working.
var downloadFilePath = Path.Combine(storFolder, caseEntity.ID, caseDocs.FileName);
if (downloadFilePath.Contains(".lnk"))
{
try
{
Console.WriteLine("Link Found. Now Processing.");
DirectoryInfo source = new DirectoryInfo(GetShortcutTargetFile(downloadFilePath));
DirectoryInfo target = new DirectoryInfo(storFolder + "\\" + nuFolder);
CopyFilesRecursively(source, target);
}
catch { Console.WriteLine("Bad link, no documents found."); }
}
Using break points it looks like the problem is in the "DirectoryInfo source..." line but I can't figure out why it's failing.
This is the GetShortcutTargetFile function.
public static string GetShortcutTargetFile(string shortcutFilename)
{
string pathOnly = Path.GetDirectoryName(shortcutFilename);
string filenameOnly = Path.GetFileName(shortcutFilename);
Shell shell = new Shell();
Folder folder = shell.NameSpace(pathOnly);
FolderItem folderItem = folder.ParseName(filenameOnly);
if (folderItem != null)
{
ShellLinkObject link = (ShellLinkObject)folderItem.GetLink;
return link.Path;
}
return string.Empty; // not found
}
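Two details in the calling code make this harder to diagnose: Contains(".lnk") matches ".lnk" anywhere in the path, and the bare catch hides the actual exception. The sketch below addresses only those two points and does not claim to identify the underlying failure; ShortcutCheck, IsShortcut and TryResolve are hypothetical names.

```csharp
using System;
using System.IO;

public static class ShortcutCheck
{
    // True only when the file extension itself is .lnk (case-insensitive).
    public static bool IsShortcut(string path)
    {
        return string.Equals(Path.GetExtension(path), ".lnk",
                             StringComparison.OrdinalIgnoreCase);
    }

    // Runs the resolver and prints the real exception instead of swallowing it.
    public static bool TryResolve(Func<string, string> resolver, string path, out string target)
    {
        target = string.Empty;
        try
        {
            target = resolver(path) ?? string.Empty;
            return target.Length > 0;
        }
        catch (Exception ex)
        {
            Console.WriteLine("Resolving {0} failed: {1}: {2}",
                              path, ex.GetType().Name, ex.Message);
            return false;
        }
    }
}
```

With this in place, ShortcutCheck.TryResolve(GetShortcutTargetFile, downloadFilePath, out var target) prints the exception type and message, which is usually the quickest way to see why the "DirectoryInfo source" line fails.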

How to build multiple c# projects using MSBuild engine without using command prompt?

I am working on a Windows application project, and from that project I want to build several C# projects that are in one Visual Studio 2015 solution. I want to build them individually and programmatically using MSBuild, without using the command prompt, and to write the output to a log file rather than to the command prompt (that is, whether each project built successfully or had errors).
Do I need to use an MSBuild API, and how do I add it to this project?
I have seen many similar (though not identical) questions, but the answers didn't work for me. Can anybody help me with this?
using Microsoft.Build.Evaluation;
using Microsoft.Build.Execution;
using Microsoft.Build.Logging;
...
public static BuildResult Compile(string solution_name, out string buildLog)
{
buildLog = "";
string projectFilePath = solution_name;
ProjectCollection pc = new ProjectCollection();
Dictionary<string, string> globalProperty = new Dictionary<string, string>();
globalProperty.Add("nodeReuse", "false");
BuildParameters bp = new BuildParameters(pc);
bp.Loggers = new List<Microsoft.Build.Framework.ILogger>()
{
new FileLogger() {Parameters = @"logfile=buildresult.txt"}
};
BuildRequestData buildRequest = new BuildRequestData(projectFilePath, globalProperty, "4.0",
new string[] {"Clean", "Build"}, null);
BuildResult buildResult = BuildManager.DefaultBuildManager.Build(bp, buildRequest);
BuildManager.DefaultBuildManager.Dispose();
pc = null;
bp = null;
buildRequest = null;
if (buildResult.OverallResult == BuildResultCode.Success)
{
Console.ForegroundColor = ConsoleColor.Green;
}
else
{
if (Directory.Exists("C:\\BuildResults") == false)
{
Directory.CreateDirectory("C:\\BuildResults");
}
buildLog = File.ReadAllText("buildresult.txt");
Console.WriteLine(buildLog);
string fileName = "C:\\BuildResults\\" + DateTime.Now.Ticks + ".txt";
File.Move("buildresult.txt", fileName);
Console.ForegroundColor = ConsoleColor.Red;
Thread.Sleep(5000);
}
Console.WriteLine("Build Result " + buildResult.OverallResult.ToString());
Console.ForegroundColor = ConsoleColor.Gray;
Console.WriteLine("================================");
return buildResult;
}
This is some old code I had lying around.
I use this to programmatically build solutions and C# projects. The returned BuildResult has an OverallResult of BuildResultCode.Success or BuildResultCode.Failure.
The variable buildLog will contain the build output.
Note - the only way to access the build output that I am aware of is to use the above methodology of having a log file generated and then reading it in your C# code.
One thing to be aware of, and I never did find a fix for it, is that the application that runs this code may keep DLLs it loads from NuGet package directories in memory. This makes deleting those directories problematic. I found a workaround by having my application run as a Windows service: it seems that when it runs as a local service, it has enough permissions to delete files held in memory.
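The log handling in the failure branch of the code above can be pulled into a small helper. This is a sketch using only System.IO; BuildLogArchive and Archive are hypothetical names, and the timestamp format is an assumption chosen for readability in place of DateTime.Now.Ticks.

```csharp
using System;
using System.IO;

public static class BuildLogArchive
{
    // Moves logPath into archiveDir under a timestamped name and returns the new path.
    public static string Archive(string logPath, string archiveDir)
    {
        // CreateDirectory does nothing when the directory already exists,
        // so no Directory.Exists check is needed first.
        Directory.CreateDirectory(archiveDir);
        string name = DateTime.Now.ToString("yyyyMMdd-HHmmss-fff") + ".txt";
        string destination = Path.Combine(archiveDir, name);
        File.Move(logPath, destination);
        return destination;
    }
}
```

In the Compile method this would replace the Directory.Exists, Directory.CreateDirectory and File.Move lines, for example BuildLogArchive.Archive("buildresult.txt", "C:\\BuildResults").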

Runtime C# Compile All Files in a Folder from Code

I am successfully doing run-time compilation using the code below. Except that when the number of input *.cs files is large I get the following error.
System.SystemException: Error running mono.exe: The filename or extension is too long.
I believe this is due to the command line length limit as in this article.
Here is the code
var codeProvider = new CSharpCodeProvider (
new Dictionary<string, string> { { "CompilerVersion", "v4.0" } });
string EngineOutput = Path.GetFileName(dir) + ".dll";
parameters.GenerateExecutable = false;
parameters.OutputAssembly = EngineOutput;
CompilerResults results = codeProvider.CompileAssemblyFromFile (parameters, engineFiles.ToArray ());
Since all source files are in one folder, it would be great if there were a way to pass that folder programmatically using the -recurse option. A method that takes a folder, like codeProvider.CompileAssemblyFromDirectoryRecursive, would also have been great, but to my knowledge that doesn't exist.
I tried the following, but it didn't work. The compiler only picks the fake.cs and not the folder specified by /recurse.
parameters.CompilerOptions += " /recurse:" + dir + System.IO.Path.PathSeparator + "*.cs";
results = codeProvider.CompileAssemblyFromFile (parameters, new string[1] { "fake.cs" });
Thanks in advance.
I just replaced Path.PathSeparator with Path.DirectorySeparatorChar, and then it worked.
parameters.CompilerOptions += " /recurse:" + dir + System.IO.Path.DirectorySeparatorChar + "*.cs";
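The two members are easy to confuse: Path.PathSeparator separates entries in a list of paths (as in the PATH environment variable), while Path.DirectorySeparatorChar separates directory levels within one path. A minimal sketch of the working option string follows; RecurseOption and Build are hypothetical names.

```csharp
using System;
using System.IO;

public static class RecurseOption
{
    // Builds " /recurse:<dir><directory separator>*.cs" for CompilerParameters.CompilerOptions.
    public static string Build(string dir)
    {
        return " /recurse:" + dir + Path.DirectorySeparatorChar + "*.cs";
    }

    public static void Main()
    {
        // PathSeparator separates whole paths in a list; DirectorySeparatorChar separates folders.
        Console.WriteLine("PathSeparator: " + Path.PathSeparator);
        Console.WriteLine("DirectorySeparatorChar: " + Path.DirectorySeparatorChar);
        Console.WriteLine(Build("src"));
    }
}
```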

localization with .net compact framework

I'm developing an application for mobile devices (Windows CE 5.0, .NET Compact Framework 2.0 pre-installed) using .NET Compact Framework 3.5 and MS Visual Studio 2008.
I'm using the built-in option for creating localized forms. This works perfectly as long as I use the debugging function of Visual Studio with the mobile device connected to my desktop computer. In this case, Visual Studio deploys my application along with .NET compact framework 3.5. After disconnecting the mobile device and having installed my application it is still working as expected.
My problem is this: if I install the .NET Compact Framework using the CAB file provided by Microsoft and then install my application (also using the CAB file created by Visual Studio) without having used the debugger, the application works as well, but without localization. So I think some parts of the .NET framework must only be installed by the deployment function of Visual Studio, and those parts make .NET recognize the locale. Does anybody know which parts (libraries?) these are? Since the application will be provided to users who will not use Visual Studio, I have to find a solution for this.
I used the tutorial - guide to do resource localization using Compact Framework: http://www.codeproject.com/Articles/28234/Survival-guide-to-do-resource-localization-using-C
The answer is simple: it should work, but it does not.
There is clearly a bug in CABWiz, the Microsoft tool Visual Studio uses to generate CAB files. It has a problem with files that have the same name in different subfolders, as happens with localizations.
After hours of trying to fix it, I ended up with a solution inspired by the CodeProject guide given by Cornel in the previous answer: you have to "hack" the Visual Studio CAB generation process by using resource files with unique names, and then modify the INF file to specify the original name for deployment on the device.
To automate this a little more, I made a small EXE that is launched as a project post-build step:
FileInfo CurrentExeInfo = new FileInfo(System.Reflection.Assembly.GetExecutingAssembly().Location);
// Current Folder + bin\Debug
DirectoryInfo BinDebug = new DirectoryInfo( Path.Combine( CurrentExeInfo.Directory.FullName, @"bin\Debug") );
// Subfolders in \bin\Debug
Console.WriteLine(BinDebug.FullName);
string[] Dirs = Directory.GetDirectories(BinDebug.FullName, "*", SearchOption.TopDirectoryOnly);
// In each localization folder ...
foreach (string Dir in Dirs)
{
DirectoryInfo DirInfo = new DirectoryInfo(Dir);
// ... Resource files
string[] RFiles = Directory.GetFiles(Dir, "*.resources.dll");
foreach (string RFile in RFiles)
{
FileInfo RFileInfo = new FileInfo(RFile);
bool DoCopy = false;
// No underscore in resource name
if (!RFileInfo.Name.Contains("_") || RFileInfo.Name.IndexOf("_") == 0)
{
DoCopy = true;
}
// underscore in resource name
// --> Have to check if already a copy
else
{
// prefix removal
int PrefixIndex = RFileInfo.Name.IndexOf("_");
string TestFilename = RFileInfo.Name.Substring(PrefixIndex + 1);
if (!File.Exists(Path.Combine(Dir, TestFilename)))
{
// File without underscore does not exist, so must copy
DoCopy = true;
}
}
if (DoCopy)
{
// Copy file
string NewFileName = Path.Combine(Dir, DirInfo.Name.ToUpper() + "_" + RFileInfo.Name);
Console.WriteLine("Copying " + RFile + " -> " + NewFileName);
File.Copy(RFile, NewFileName, true);
}
}
}
And then this CAB patcher after normal CAB generation :
const string cabwizpath = @"C:\Program Files (x86)\Microsoft Visual Studio 9.0\SmartDevices\SDK\SDKTools\cabwiz.exe";
static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("Aborted: You must enter the inf file information");
Console.ReadLine();
return;
}
if (!File.Exists(args[0]))
{
Console.WriteLine("Aborted: Cannot find the INF file!");
Console.ReadLine();
return;
}
// string to search
Regex R = new Regex("\"[A-Z]{2,3}_(.+)\\.resources\\.dll\",\"([A-Z]{2,3})_(.+)\\.resources\\.dll\"");
// File reading
string inffile = File.ReadAllText(args[0]);
// Format replace from
// "FR_ProjectName.resources.dll","FR_ProjectName.resources.dll"
// To
// "ProjectName.resources.dll","FR_ProjectName.resources.dll"
inffile = R.Replace(inffile, "\"$1.resources.dll\",\"$2_$3.resources.dll\"");
// Rewriting file
File.WriteAllText(args[0], inffile);
Console.WriteLine("INF file patched ...");
// CAB generation ...
Console.WriteLine("Generating correct CAB ... ");
System.Diagnostics.ProcessStartInfo proc = new System.Diagnostics.ProcessStartInfo("\"" + cabwizpath + "\"", "\"" + args[0] + "\"");
proc.ErrorDialog = true;
proc.UseShellExecute = false;
proc.RedirectStandardOutput = true;
Process CabWiz = Process.Start(proc);
Console.WriteLine("\""+cabwizpath + "\" \""+ args[0]+"\"");
CabWiz.WaitForExit();
Console.WriteLine("CAB file generated (" + CabWiz.ExitCode + ") !");
}
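The regular expression in the patcher can be tried in isolation before pointing it at a real INF file. The sketch below reuses the same pattern and replacement string; InfPatchDemo and Patch are hypothetical names, and the sample line is the one from the comments in the patcher.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InfPatchDemo
{
    // Same pattern as the patcher: the first file name loses its culture prefix.
    static readonly Regex Pattern = new Regex(
        "\"[A-Z]{2,3}_(.+)\\.resources\\.dll\",\"([A-Z]{2,3})_(.+)\\.resources\\.dll\"");

    public static string Patch(string infText)
    {
        // $1 = name in the first entry, $2 = culture prefix, $3 = name in the second entry.
        return Pattern.Replace(infText, "\"$1.resources.dll\",\"$2_$3.resources.dll\"");
    }

    public static void Main()
    {
        string line = "\"FR_ProjectName.resources.dll\",\"FR_ProjectName.resources.dll\"";
        Console.WriteLine(Patch(line));
    }
}
```

Lines that do not carry a two- or three-letter upper-case prefix in both entries are left untouched, so running the patch twice on the same file does not change it further.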
I hope it helps.
More links about this :
culture specific string resource dll not working in .net compact framework when application is installed using cab file
https://social.msdn.microsoft.com/Forums/fr-FR/0fbcc5b0-a6ff-4236-961d-22c5f17ed2e7/smart-device-cab-project-includes-wrong-localized-resources

Tesseract OCR simple example

Hi. Can anyone give me a simple example of testing Tesseract OCR, preferably in C#?
I tried the demo found here.
I downloaded the English dataset, unzipped it to the C drive, and modified the code as follows:
string path = @"C:\pic\mytext.jpg";
Bitmap image = new Bitmap(path);
Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"C:\tessdata\", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
Unfortunately, the code doesn't work: the program dies at the "ocr.Init(..." line. I couldn't even get an exception using try-catch.
I was able to run VietOCR, but that is a very large project for me to follow. I need a simple example like the one above.
OK, I found the solution in the question "tessnet2 fails to load", in the answer given by Adam.
Apparently I was using the wrong version of tessdata. I was following the source page's instructions intuitively, and that caused the problem.
It says:
Quick Tessnet2 usage
Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project.
Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the same directory.
After you download the binary and follow the link to download the language file, there are many language files, but none of them is the right version. You need to select all versions and go to the next page for the correct version (tesseract-2.00.eng)! They should either update the binary download link to version 3 or put the version 2 language file on the first page, or at least mention in bold that this version issue is a big deal!
Anyway, I found it.
Thanks, everyone.
A simple example of testing Tesseract OCR in C#:
public static string GetText(Bitmap imgsource)
{
var ocrtext = string.Empty;
using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
using (var img = PixConverter.ToPix(imgsource))
{
using (var page = engine.Process(img))
{
ocrtext = page.GetText();
}
}
}
return ocrtext;
}
Info: The tessdata folder must exist in the output directory: bin\Debug\
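A missing or misplaced tessdata folder tends to fail with an unhelpful error, so it can be worth checking for it before creating the engine. The sketch below uses only System.IO; TessdataLocator and Locate are hypothetical names, and it assumes the Tesseract 3 or later convention of one <language>.traineddata file per language.

```csharp
using System;
using System.IO;

public static class TessdataLocator
{
    // Returns the tessdata directory under baseDirectory, or throws with a clear message.
    public static string Locate(string baseDirectory, string language)
    {
        string tessdata = Path.Combine(baseDirectory, "tessdata");
        if (!Directory.Exists(tessdata))
            throw new DirectoryNotFoundException("tessdata folder not found: " + tessdata);

        // Assumption: language data is stored as <language>.traineddata (Tesseract 3+).
        string trained = Path.Combine(tessdata, language + ".traineddata");
        if (!File.Exists(trained))
            throw new FileNotFoundException("Language file not found: " + trained, trained);

        return tessdata;
    }
}
```

The result can be passed to the engine in place of the relative path, for example TessdataLocator.Locate(AppDomain.CurrentDomain.BaseDirectory, "eng").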
I was able to get it to work by following these instructions.
Download the sample code
Unzip it to a new location
Open ~\tesseract-samples-master\src\Tesseract.Samples.sln (I used Visual Studio 2017)
Install the Tesseract NuGet package for that project (or uninstall/reinstall as I had to)
Uncomment the last two meaningful lines in Tesseract.Samples.Program.cs:
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
Run (hit F5)
You should get this windows console output
Try updating the line to:
ocr.Init(@"C:\", "eng", false); // the path here should be the parent folder of tessdata
I had the same problem, and now it's resolved. My tesseract2 folder contains subfolders for 32-bit and 64-bit; I copied the files from the 64-bit folder (as my system is 64-bit) to the main folder ("Tesseract2") and to the bin/Debug folder. Now my solution is working fine.
In my case all of this worked except for correct character recognition.
But you need to consider these few things:
Use the correct tessnet2 library.
Use the correct tessdata language version.
tessdata should be somewhere outside your application folder, so that you can pass its full path in the Init parameter. Use ocr.Init(@"c:\tessdata", "eng", true);
Debugging will cause you headaches, so you need to update your app.config.
Use this. (I can't put the XML code here. Give me your email and I will email it to you.)
Hope this helps.
Here's a great working example project: Tesseract OCR Sample (Visual Studio) with Leptonica Preprocessing
Tesseract OCR Sample (Visual Studio) with Leptonica Preprocessing
Tesseract OCR 3.02.02 API can be confusing, so this guides you through including the Tesseract and Leptonica dll into a Visual Studio C++ Project, and provides a sample file which takes an image path to preprocess and OCR. The preprocessing script in Leptonica converts the input image into black and white book-like text.
Setup
To include this in your own projects, you will need to reference the header files and lib and copy the tessdata folders and dlls.
Copy the tesseract-include folder to the root folder of your project. Now Click on your project in Visual Studio Solution Explorer, and go to Project>Properties.
VC++ Directories>Include Directories:
..\tesseract-include\tesseract;..\tesseract-include\leptonica;$(IncludePath)
C/C++>Preprocessor>Preprocessor Definitions:
_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)
C/C++>Linker>Input>Additional Dependencies:
..\tesseract-include\libtesseract302.lib;..\tesseract-include\liblept168.lib;%(AdditionalDependencies)
Now you can include headers in your project's file:
#include <baseapi.h>
#include <allheaders.h>
Now copy the two dll files in tesseract-include and the tessdata folder in Debug to the Output Directory of your project.
When you initialize tesseract, you need to specify the location of the parent folder (!important) of the tessdata folder if it is not already the current directory of your executable file. You can copy my script, which assumes tessdata is installed in the executable's folder.
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init("D:\\tessdataParentFolder\\", ...
Sample
You can compile the provided sample, which takes one command line argument of the image path to use. The preprocess() function uses Leptonica to create a black and white book-like copy of the image which makes tesseract work with 90% accuracy. The ocr() function shows the functionality of the Tesseract API to return a string output. The toClipboard() can be used to save text to clipboard on Windows. You can copy these into your own projects.
This worked for me. I had 3-4 more PDF-to-text extractors, so that if one does not work, another one will. This Tesseract code in particular can be used on Windows 7, 8 and Server 2008. Hope this is helpful to you.
do
{
// Sleep or Pause the Thread for 1 sec, if service is running too fast...
Thread.Sleep(millisecondsTimeout: 1000);
Guid tempGuid = ToSeqGuid();
string newFileName = tempGuid.ToString().Split('-')[0];
string outputFileName = appPath + "\\pdf2png\\" + fileNameithoutExtension + "-" + newFileName +
".png";
extractor.SaveCurrentImageToFile(outputFileName, ImageFormat.Png);
// Create text file here using Tesseract
foreach (var file in Directory.GetFiles(appPath + "\\pdf2png"))
{
try
{
var pngFileName = Path.GetFileNameWithoutExtension(file);
string[] myArguments =
{
"/C tesseract ", file,
" " + appPath + "\\png2text\\" + pngFileName
}; // /C makes cmd.exe run the command and then exit
string strParam = String.Join(" ", myArguments);
var myCmdProcess = new Process();
var theProcess = new ProcessStartInfo("cmd.exe", strParam)
{
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true,
WindowStyle = ProcessWindowStyle.Minimized
}; // Keep the cmd.exe window minimized
myCmdProcess.StartInfo = theProcess;
myCmdProcess.Exited += myCmdProcess_Exited;
myCmdProcess.Start();
//if (process)
{
/*
MessageBox.Show("cmd.exe process started: " + Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
Process.EnterDebugMode();
//ShowWindow(hWnd: process.Handle, nCmdShow: 2);
/*
MessageBox.Show("After EnterDebugMode() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.WaitForExit(60000);
/*
MessageBox.Show("After WaitForExit() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.Refresh();
Process.LeaveDebugMode();
//myCmdProcess.Dispose();
/*
MessageBox.Show("After LeaveDebugMode() cmd.exe process Exited: " +
Environment.NewLine);
*/
}
//process.Kill();
// Wait for the process to complete its task and exit on its own
Thread.Sleep(millisecondsTimeout: 1000);
// This works fine in Windows 7 Environment, and not in Windows 8
// Try following code block
// Check whether the process has not completely exited
if (!myCmdProcess.HasExited)
{
//process.WaitForExit(2000); // Try to wait for exit 2 more seconds
/*
MessageBox.Show(" Process of cmd.exe was exited by WaitForExit(); Method " +
Environment.NewLine);
*/
try
{
// If not, then Kill the process
myCmdProcess.Kill();
//myCmdProcess.Dispose();
//if (!myCmdProcess.HasExited)
//{
// myCmdProcess.Kill();
//}
MessageBox.Show(" Process of cmd.exe exited ( Killed ) successfully " +
Environment.NewLine);
}
catch (System.ComponentModel.Win32Exception ex)
{
MessageBox.Show(
" Exception: System.ComponentModel.Win32Exception " +
ex.ErrorCode + Environment.NewLine);
}
catch (NotSupportedException notSupporEx)
{
MessageBox.Show(" Exception: NotSupportedException " +
notSupporEx.Message +
Environment.NewLine);
}
catch (InvalidOperationException invalidOperation)
{
MessageBox.Show(
" Exception: InvalidOperationException " +
invalidOperation.Message + Environment.NewLine);
foreach (
var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
}
}
}
catch (Exception exception)
{
MessageBox.Show(
" Caught Exception in Generating image do{...}while{...} function " +
Environment.NewLine + exception.Message + Environment.NewLine);
}
}
// Delete png image here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Thread.Sleep(millisecondsTimeout: 1000);
// Read text from text file here
foreach (var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
} while (extractor.GetNextImage()); // Advance image enumeration...
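Much of the process handling above comes from going through cmd.exe and then polling with sleeps. A shorter alternative is to start tesseract directly and let WaitForExit enforce the timeout. This is a sketch under stated assumptions: TesseractCli, BuildStartInfo and Run are hypothetical names, the tesseract executable is assumed to be on the PATH, and the arguments follow the "tesseract imagename outputbase" form the code above already uses.

```csharp
using System;
using System.Diagnostics;

public static class TesseractCli
{
    static string Quote(string s) { return "\"" + s + "\""; }

    // Builds start info for: tesseract "<imagePath>" "<outputBase>"
    public static ProcessStartInfo BuildStartInfo(string imagePath, string outputBase)
    {
        return new ProcessStartInfo("tesseract", Quote(imagePath) + " " + Quote(outputBase))
        {
            CreateNoWindow = true,
            UseShellExecute = false,
            RedirectStandardError = true
        };
    }

    // Returns the exit code, or -1 if the process had to be killed after timeoutMs.
    public static int Run(string imagePath, string outputBase, int timeoutMs)
    {
        using (Process process = Process.Start(BuildStartInfo(imagePath, outputBase)))
        {
            // Drain stderr asynchronously so a full pipe cannot block the child process.
            process.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null) Console.Error.WriteLine(e.Data);
            };
            process.BeginErrorReadLine();

            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                process.WaitForExit();
                return -1;
            }
            return process.ExitCode;
        }
    }
}
```

Quoting both arguments keeps paths with spaces intact; a path that ends in a backslash would still need extra care because the backslash would escape the closing quote.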
Admittedly this is an older question when Tesseract 3 was the version available, but it came up in my search results while looking for a related issue and the question, and other answers, highlight the still valid issue of the difficulty of actually getting Tesseract installed, let alone configuring it to work correctly.
There is a far simpler solution (and using the updated Tesseract 5 engine) which does all the work for you, in IronOcr.
(Disclaimer: I do work for Iron Software, though I feel that others can benefit from this information, particularly as it relates to the question of using Tesseract OCR in C# which IronOcr excels at).
using IronOcr;
var Ocr = new IronTesseract(); // nothing to configure
Ocr.Configuration.WhiteListCharacters = "0123456789"; // If digit only
using (var Input = new OcrInput(@"example.tiff"))
{
OcrResult Result = Ocr.Read(Input);
foreach (var Page in Result.Pages)
{
// Page object
int PageNumber = Page.PageNumber;
string PageText = Page.Text;
int PageWordCount = Page.WordCount;
// null if we don't set Ocr.Configuration.ReadBarCodes = true;
OcrResult.Barcode[] Barcodes = Page.Barcodes;
System.Drawing.Bitmap PageImage = Page.ToBitmap(Input);
int PageWidth = Page.Width;
int PageHeight = Page.Height;
foreach (var Paragraph in Page.Paragraphs)
{
// Pages -> Paragraphs
int ParagraphNumber = Paragraph.ParagraphNumber;
String ParagraphText = Paragraph.Text;
System.Drawing.Bitmap ParagraphImage = Paragraph.ToBitmap(Input);
int ParagraphX_location = Paragraph.X;
int ParagraphY_location = Paragraph.Y;
int ParagraphWidth = Paragraph.Width;
int ParagraphHeight = Paragraph.Height;
double ParagraphOcrAccuracy = Paragraph.Confidence;
OcrResult.TextFlow paragrapthText_direction = Paragraph.TextDirection;
foreach (var Line in Paragraph.Lines)
{
// Pages -> Paragraphs -> Lines
int LineNumber = Line.LineNumber;
String LineText = Line.Text;
System.Drawing.Bitmap LineImage = Line.ToBitmap(Input);
int LineX_location = Line.X;
int LineY_location = Line.Y;
int LineWidth = Line.Width;
int LineHeight = Line.Height;
double LineOcrAccuracy = Line.Confidence;
double LineSkew = Line.BaselineAngle;
double LineOffset = Line.BaselineOffset;
foreach (var Word in Line.Words)
{
// Pages -> Paragraphs -> Lines -> Words
int WordNumber = Word.WordNumber;
String WordText = Word.Text;
System.Drawing.Image WordImage = Word.ToBitmap(Input);
int WordX_location = Word.X;
int WordY_location = Word.Y;
int WordWidth = Word.Width;
int WordHeight = Word.Height;
double WordOcrAccuracy = Word.Confidence;
if (Word.Font != null)
{
// Word.Font is only set when using Tesseract Engine Modes rather than LSTM
String FontName = Word.Font.FontName;
double FontSize = Word.Font.FontSize;
bool IsBold = Word.Font.IsBold;
bool IsFixedWidth = Word.Font.IsFixedWidth;
bool IsItalic = Word.Font.IsItalic;
bool IsSerif = Word.Font.IsSerif;
bool IsUnderLined = Word.Font.IsUnderlined;
bool IsFancy = Word.Font.IsCaligraphic;
}
foreach (var Character in Word.Characters)
{
// Pages -> Paragraphs -> Lines -> Words -> Characters
int CharacterNumber = Character.CharacterNumber;
String CharacterText = Character.Text;
System.Drawing.Bitmap CharacterImage = Character.ToBitmap(Input);
int CharacterX_location = Character.X;
int CharacterY_location = Character.Y;
int CharacterWidth = Character.Width;
int CharacterHeight = Character.Height;
double CharacterOcrAccuracy = Character.Confidence;
// Output alternative symbols choices and their probability.
// Very useful for spellchecking
OcrResult.Choice[] Choices = Character.Choices;
}
}
}
}
}
}
