I'm writing a Windows Service that scans a set of directories for new PDF files and converts them to TIFF with Ghostscript.NET. When I compiled and ran the code as a normal program it worked perfectly, but when the same code runs as a Service the output TIFF never appears. I've set the destination directory to allow writing for Everyone, and the original PDF is being removed as it's supposed to, so it shouldn't be a permissions issue for the "Local System" user. Auditing the directory for access Failures and Successes just shows a list of Successes.
There is a function that reads the color population of the PDF to determine if it's a color document, or B&W scanned as color. That part works, so there isn't an issue accessing and reading the PDF.
I've also tried removing '-q' from the Ghostscript switches and I don't have any errors reported, and "-dDEBUG" outputs so much garbage I don't know what it's saying - but nothing is tagged as an error.
public static void ConvertPDF(string file, GSvalues gsVals)
{
gsProc = new Ghostscript.NET.Processor.GhostscriptProcessor();
System.Collections.Generic.List<string> switches = new System.Collections.Generic.List<string>();
switches.Add("-empty"); // GS.NET ignores the first switch
switches.Add("-r" + gsVals.Resolution); // dpi
switches.Add("-dDownScaleFactor=" + gsVals.ScaleFactor); // Scale the image back down
switches.Add("-sCompression=lzw"); // Compression
switches.Add("-dNumRenderingThreads=" + Environment.ProcessorCount);
switches.Add("-c \"30000000 setvmthreshold\"");
switches.Add("-dNOGC");
string device;
if (_checkPdf(file, gsVals.InkColorLevels, gsVals))
{
gsVals.WriteLog("Color PDF");
device = "-sDEVICE=tiffscaled24"; // 24bit Color TIFF
}
else
{
gsVals.WriteLog("Grayscale PDF");
device = "-sDEVICE=tiffgray"; // grayscale TIFF
}
switches.Add(device);
// Strip the filename out of the full path to the file
string filename = System.IO.Path.GetFileNameWithoutExtension(file);
// Set the output file tag
string oFileName = _setFileName(oPath + "\\" + filename.Trim(), GSvalues.Extension);
string oFileTag = "-sOutputFile=" + oFileName;
switches.Add(oFileTag);
switches.Add(file);
// Process the PDF file
try
{
string s = string.Empty;
foreach (string sw in switches) s += sw + ' ';
gsVals.DebugLog("Switches:\n\t" + s);
gsProc.StartProcessing(switches.ToArray(), new GsStdio());
while (gsProc.IsRunning) System.Threading.Thread.Sleep(1000);
}
catch (Exception e)
{
gsVals.WriteLog("Exception caught: " + e.Message);
Console.Read();
}
gsVals.DebugLog("Archiving PDF");
try
{
System.IO.File.Move(file, _setFileName(gsVals.ArchiveDir + "\\" + filename, ".pdf"));
}
catch (Exception e)
{
gsVals.WriteLog("Error moving PDF: " + e.Message);
}
}
private static string _setFileName(string path, string tifExt)
{
if (System.IO.File.Exists(path + tifExt)) return _setFileName(path, 1, tifExt);
else return path + tifExt;
}
private static string _setFileName(string path, int ctr, string tifExt)
{
// Test the proposed altered filename. If it exists, move to the next iteration
if(System.IO.File.Exists(path + '(' + ctr.ToString() + ')' + tifExt)) return _setFileName(path, ++ctr, tifExt);
else return path + '(' + ctr.ToString() + ')' + tifExt;
}
This is a sample output of the generated switches (pulled from the output log):
Switches: -empty -r220 -dDownScaleFactor=1 -sCompression=lzw -dNumRenderingThreads=4 -c "30000000 setvmthreshold" -dNOGC -sDEVICE=tiffscaled24 -sOutputFile=\\[servername]\amb_ops_scanning$\Test.tiff \\[servername]\amb_ops_scanning$\Test.pdf
Settings are read from an XML file and stored in a class, GSvalues. The class also handles writing to the System log for output, or to a text file in the normal Program version. GsStdio is a class for handling GS input and output, which just redirects all the output to the same logs as GSvalues. The only code changes between the Program version and the Service version are the Service handling code and the switch of output from a text file to the system logs. Nothing about the Ghostscript processing was changed.
This is being compiled as x86 for portability, but is being run on x64. GS 9.15 is installed, both x86 and x64 versions. GS.NET is version 4.0.30319 installed via NuGet into VS 2012. ILMerge 2.13.0307 is being used to package the GS.NET dll into the exe, also for portability. None of these things changed between the normal EXE and the Windows Service versions, and as I said the normal EXE works without any issues.
I got it working by using CreateProcessAsUser() from advapi32.dll, using code from this article.
I also had to restructure the order of the switches:
switches.Add("-c 30000000 setvmthreshold -f\"" + file + "\"");
The original source I'd used for speeding up the conversion left out the '-f' part, and the fact that the -f was the tag marking the file. I don't know why this worked in GS.NET, but with normal gswin32c.exe I got an error saying that it was an invalid file, until I set the switch this way.
Oddly, the processes this method creates are still Session 0, but it actually works. I'll keep tinkering, but for now it's working.
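The switch ordering described above can be sketched as a small helper that builds the argument list for gswin32c.exe. This is only an illustration: `GsArgs` and `Build` are hypothetical names, not part of Ghostscript.NET, and `-dNOPAUSE`/`-dBATCH` are added on the assumption that the command-line executable should exit when it is done. The point it shows is that everything after `-c` is read as PostScript until `-f`, so the input file has to come after `-f`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Ghostscript.NET): builds the
// command-line arguments for gswin32c.exe in a safe order.
public static class GsArgs
{
    public static List<string> Build(string device, int resolution,
                                     string outputFile, string inputFile)
    {
        return new List<string>
        {
            "-dNOPAUSE",                      // assumption: do not prompt between pages
            "-dBATCH",                        // assumption: exit after the last file
            "-r" + resolution,
            "-sDEVICE=" + device,
            "-sOutputFile=" + outputFile,
            // Tokens after -c are interpreted as PostScript ...
            "-c", "30000000", "setvmthreshold",
            // ... until -f, which ends the PostScript and marks the input file.
            "-f", inputFile
        };
    }
}
```

When the list is joined into a single argument string for a process, paths that contain spaces still need to be quoted.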
The same code runs on two machines: one on Windows 10, the other on Windows 7.
The idea is to have a directory from a network drive replicate over to a local drive.
On Windows 10, the machine I am writing it on, it works exactly as intended.
On Windows 7, the target machine, it 'works', but the subfolder structure is messed up.
Example:
C:\target -> the target location
C:\targetNewFolderName1 -> What it's being copied to
C:\targetNewFolderName2
C:\targetNewFolderNameN
When it should be doing this (which it does on Windows 10, but not on Windows 7):
C:\target -> the target location
C:\target\NewFolderName1 -> What it's being copied to
C:\target\NewFolderName2
C:\target\NewFolderNameN
Master is a network directory, @"\\server\fu\bar\target"
Slave is a local directory, @"C:\target"
These are passed to the function.
Function header: private void CheckMasterToSlave(string MasterPath, string SlavePath, string BackupPath, string[] MasterFilesList, string[] SlaveFilesList)
The code snippet below is within a foreach: foreach (string master in MasterFilesList).
log.Info(master + " doesnt exist, copying");
string directoryCheck = (SlavePath + master.Substring(MasterPath.Length)).Substring(0,
(SlavePath + master.Substring(MasterPath.Length)).LastIndexOf("\\"));
if (!Directory.Exists(directoryCheck))
{
log.Debug(directoryCheck + " Directory not present, touching.");
try
{
Directory.CreateDirectory((SlavePath +
master.Substring(MasterPath.Length)).Substring(0, (SlavePath +
master.Substring(MasterPath.Length)).LastIndexOf("\\")));
}
catch
{
log.Error(master + " directory failed to be created in slave environment.");
}
}
try
{
File.Copy(master, SlavePath + master.Substring(MasterPath.Length));
log.Info(SlavePath + master.Substring(MasterPath.Length) + " Successfully created.");
BackupFile(master.Replace(MasterPath, SlavePath), BackupPath, SlavePath);
}
catch
{
log.Error(master + " failed to copy, backup has been halted for this file.");
}
I do not understand why this works as intended on Windows 10 but moving it to Windows 7 causes this issue.
What would be causing this, and how can I stop the new folder name from being appended to the parent folder's name on Windows 7?
Use Path.Combine to build a path name from different path components instead of just using string concatenation.
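A minimal sketch of what that could look like for the paths in the question. `PathMapper` and `MapToSlave` are hypothetical names, and the sketch assumes every master file path starts with the master root. Note that `Path.Combine` returns its second argument unchanged when that argument is rooted (starts with a separator), so the leading separator of the relative part is trimmed first.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a file under the master root to the
// equivalent location under the slave root.
public static class PathMapper
{
    private static readonly char[] Separators =
        { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };

    public static string MapToSlave(string masterRoot, string slaveRoot, string masterFile)
    {
        if (!masterFile.StartsWith(masterRoot, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("File is not under the master root.", "masterFile");

        // Part of the path below the master root, without a leading separator,
        // so that Path.Combine does not treat it as a rooted path.
        string relative = masterFile.Substring(masterRoot.Length).TrimStart(Separators);

        // Path.Combine inserts a separator only where one is missing.
        return Path.Combine(slaveRoot, relative);
    }
}
```

With this, the result is the same whether or not the master root was passed with a trailing separator, which plain concatenation does not guarantee.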
Alright, I was being stupid and forgot to switch to the Release build. Once the changes NineBerry mentioned were actually in the build, it worked.
I still do not understand why the original worked on Windows 10 but not on Windows 7, especially since the BackupFile portion does the same thing as the old 'wrong' way. But both work now.
Regardless, here is the updated bit.
log.Info(master + " doesnt exist, copying");
string[] EndDirectoryFile = master.Substring(MasterPath.Length).Split('\\');
string[] EndDirectory = new string[EndDirectoryFile.Length-1];
for (int i = 0; i < EndDirectoryFile.Length - 1; i++)
{
EndDirectory[i] = EndDirectoryFile[i];
}
string directoryCheck = Path.Combine(SlavePath, Path.Combine(EndDirectory));
if (!Directory.Exists(directoryCheck))
{
log.Debug(directoryCheck + " Directory not present, touching.");
try
{
Directory.CreateDirectory(directoryCheck);
}
catch
{
log.Error(master + " directory failed to be created in slave environment.");
}
}
try
{
File.Copy(master, SlavePath + master.Substring(MasterPath.Length));
log.Info(SlavePath + master.Substring(MasterPath.Length) + " Successfully created.");
BackupFile(master.Replace(MasterPath, SlavePath), BackupPath, SlavePath);
}
catch
{
log.Error(master + " failed to copy, backup has been halted for this file.");
}
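The split-and-rejoin loop above can probably be shortened. A sketch, assuming the destination path has already been built: `Path.GetDirectoryName` gives the folder part directly, and `Directory.CreateDirectory` does nothing if the folder already exists, so the `Directory.Exists` check becomes optional. `CopyHelper` and `CopyWithDirectories` are hypothetical names.

```csharp
using System;
using System.IO;

public static class CopyHelper
{
    // Copies sourceFile to destinationFile, creating any missing
    // destination folders first.
    public static void CopyWithDirectories(string sourceFile, string destinationFile)
    {
        string directory = Path.GetDirectoryName(destinationFile);
        if (!string.IsNullOrEmpty(directory))
        {
            // No-op when the directory is already present.
            Directory.CreateDirectory(directory);
        }
        File.Copy(sourceFile, destinationFile);
    }
}
```

Exceptions are left to the caller here, so the existing try/catch and logging around the copy would still apply.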
The setup - a local Windows 10 PC has multiple network printers installed. A GUI C# WinForm application (.NET) is constantly running in the background, and occasionally downloads a PDF file from a predefined URL (read from an *.ini file).
The problem occurs when the said PDF file is printed. Instead of accepting the number of copies sent from the application, the printer keeps printing just one copy of the file.
This is the relevant part of my code:
string webPrinter = "HP LaserJet PCL 6"; // is set in another part of the code
string iniFilePrinter = "hp LaserJet 1320 PCL 5"; // is set in another part of the code - read from the ini file
string dirName = "C:\\mydir";
string newDocName = "mydoc.pdf";
short numCopies = 1;
if(event1 == "event1") { // taken from another part of the code
numCopies = webNumCopies; // taken from another part of the code
} else if(event2 == "event2") {
numCopies = iniNumCopies; // taken from another part of the code - read from the ini file
}
var path = dirName + "\\" + newDocName;
try
{
using (var document = PdfiumViewer.PdfDocument.Load(path))
{
using (var printDocument = document.CreatePrintDocument())
{
System.Drawing.Printing.PrinterSettings settings = new System.Drawing.Printing.PrinterSettings();
string defaultPrinterName = settings.PrinterName;
printDocument.DocumentName = newDocName;
printDocument.PrinterSettings.PrintFileName = newDocName;
printDocument.PrinterSettings.Copies = numCopies;
printDocument.PrintController = new System.Drawing.Printing.StandardPrintController();
printDocument.PrinterSettings.PrinterName = webPrinter;
MessageBox.Show("Before: " + printDocument.PrinterSettings.Copies.ToString() + " --- " + newDocName);
if (!printDocument.PrinterSettings.IsValid)
{
printDocument.PrinterSettings.PrinterName = iniFilePrinter;
if(!printDocument.PrinterSettings.IsValid)
{
printDocument.PrinterSettings.PrinterName = defaultPrinterName;
}
}
MessageBox.Show("After: " + printDocument.PrinterSettings.Copies.ToString() + " --- " + newDocName);
printDocument.Print();
}
}
}
catch (Exception ex) {
MessageBox.Show(ex.Message);
}
The exception has never been triggered, and the print succeeds on every single try. The number of copies is changed inside the if/else when the conditions are met, and the MessageBox.Show() calls do show the expected number of copies (2, 3, 7, anything but 1 when it's not supposed to be 1) immediately before printDocument.Print() is invoked.
I've also tried printing unrelated documents from various other programs (MS Word, various custom applications, PDF readers and the like), and the number of copies has always been 1. However, software like Google Chrome or Firefox manages to get things printed in the specified number of copies.
I was thinking that there might be something about the printer's setting which makes it ignore the number of copies sent. Based on that assumption, I've checked the settings of all of the printers, and have found that the number of copies is actually set to 1.
If that is indeed the cause of my problem, how can I bypass that setting (without actually changing it), the way that Google Chrome and Firefox seem to be able to do it? I know that I could probably change that limit programmatically (set it to my number of copies, and then change it back to the original value, once the printing has been completed), but that doesn't seem like the proper way of doing it.
EDIT
I've expanded my code by including a print dialog, like this:
PrintDialog printDlg = new PrintDialog();
printDlg.Document = printDocument;
printDlg.AllowSelection = true;
printDlg.AllowSomePages = true;
if (printDlg.ShowDialog() == DialogResult.OK)
{
printDocument.Print();
}
Still, the results are the same - even when the user changes the number of copies within the print dialog, the printer ignores them. The same code was tested on another (local) printer, connected to an unrelated Windows 10 PC, and there the number of copies from the dialog was not ignored.
I've also noticed that the print dialog from my application, and that from notepad.exe are different (image below). Is there a way for me to call up the same print dialog notepad.exe uses? The reason I'd like to do this, is because that one gets the job done (xy number of copies in the print dialog, xy number of copies printed).
I'm developing a Windows application and I want it to run at Windows start-up. For that I use this code:
Code
public static void SetStartup(string AppName,
bool enable)
{
try
{
string runKey = @"SOFTWARE\Microsoft\Windows\CurrentVersion\Run";
Microsoft.Win32.RegistryKey startupKey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(runKey);
if (enable)
{
if (startupKey.GetValue(AppName) == null)
{
startupKey.Close();
startupKey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(runKey, true);
startupKey.SetValue(AppName, Assembly.GetExecutingAssembly().Location + " /StartMinimized");
startupKey.Close();
}
}
else
{
startupKey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(runKey, true);
startupKey.DeleteValue(AppName, false);
startupKey.Close();
}
}
catch
{
}
}
Calling code on application load
SetStartup(Application.ExecutablePath, true);
And this code works fine. It sets the application as a start-up application.
I checked by running the msconfig command from the Run window: it shows this application checked in the Start-up tab. But when I restart the system, the application doesn't start.
Can anyone tell me what the problem is and how I can solve it?
If everything points to it being in startup then I can only assume that part is correct, but the application is failing to start for some reason.
When an application is started from the Run key, its working directory is set to C:\Windows\System32.
I have had issues with applications that look for files in their home directory, such as config files, and are unable to find them.
Normally, files referenced the normal way will be found anyway, but if you are manually specifying a path in your code you can use:
string pathToDLL = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "LibraryFile.dll");
Using AppDomain.CurrentDomain.BaseDirectory should give the directory of your application's exe, rather than the working directory.
Could this be the cause of the problem?
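To illustrate the point about the working directory, here is a small sketch of a helper that resolves file names against the exe's folder instead of the current directory. `AppPaths.Resolve` is a hypothetical name, and `settings.xml` in the usage note is just an example file.

```csharp
using System;
using System.IO;

public static class AppPaths
{
    // Resolves a file name relative to the folder the application was
    // loaded from, regardless of the process working directory.
    public static string Resolve(string relativeName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, relativeName);
    }
}
```

For example, `AppPaths.Resolve("settings.xml")` keeps pointing next to the exe, whereas a bare relative path such as `"settings.xml"` is resolved against `Environment.CurrentDirectory`, which is the part that differs when Windows launches the program at logon.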
Also, I'm going to assume Vista upwards is the OS, and if that's the case then your application would have to be running as elevated to write to that registry. So, if UAC is off and the machine is restarted then your application, if it's set in the manifest to run as requireAdministrator, would fail silently.
Martyn
Finally I got an answer for this problem. Use a StreamWriter to create a URL link (.url file) to the application in the start-up folder, instead of creating an LNK shortcut there.
Create shortcut
private void appShortcutToStartup()
{
string linkName ="MytestLink";
string startDir = Environment.GetFolderPath(Environment.SpecialFolder.Startup);
if (!System.IO.File.Exists(startDir + "\\" + linkName + ".url"))
{
using (StreamWriter writer = new StreamWriter(startDir + "\\" + linkName + ".url"))
{
string app = System.Reflection.Assembly.GetExecutingAssembly().Location;
writer.WriteLine("[InternetShortcut]");
writer.WriteLine("URL=file:///" + app);
writer.WriteLine("IconIndex=0");
string icon = Application.StartupPath + "\\backup (3).ico";
writer.WriteLine("IconFile=" + icon);
writer.Flush();
}
}
}
Delete Shortcut
private void delappShortcutFromStartup()
{
string linkName ="MytestLink";
string startDir = Environment.GetFolderPath(Environment.SpecialFolder.Startup);
if (System.IO.File.Exists(startDir + "\\" + linkName + ".url"))
{
System.IO.File.Delete(startDir + "\\" + linkName + ".url");
}
}
This code works very fine.
I believe the simplest way is to follow the steps below:
1) Build your application
2) Navigate to your debug folder
3) Copy the exe and place it in your Startup location
C:\Documents and Settings\user\Start Menu\Programs\Startup
OR
Simply drag your exe over Start menu --> Programs --> Startup and drop it there (i.e.
release the mouse button)
I guess that would do the job
Hope it helps
Hi, can anyone give me a simple example of testing Tesseract OCR,
preferably in C#?
I tried the demo found here.
I downloaded the English dataset, unzipped it to the C: drive, and modified the code as follows:
string path = @"C:\pic\mytext.jpg";
Bitmap image = new Bitmap(path);
Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"C:\tessdata\", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
Unfortunately the code doesn't work. The program dies at the "ocr.Init(..." line. I couldn't even get an exception, even with try-catch.
I was able to run VietOCR, but that is a very large project for me to follow. I need a simple example like the one above.
OK, I found the solution here:
tessnet2 fails to load
(the answer given by Adam).
Apparently I was using the wrong version of tessdata. I was following the source page's instructions as written, and that caused the problem.
It says:
Quick Tessnet2 usage
Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project.
Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the
same directory.
After you download the binary and follow the link to download the language file, you are shown many language files, but none of them is the right version. You need to select all versions and go to the next page to find the correct one (tesseract-2.00.eng)! They should either update the binary download link to version 3 or put the version 2 language file on the first page, or at least mention in bold that this version issue is a big deal!
Anyway I found it.
Thanks everyone.
A simple example of testing Tesseract OCR in C#:
public static string GetText(Bitmap imgsource)
{
var ocrtext = string.Empty;
using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
using (var img = PixConverter.ToPix(imgsource))
{
using (var page = engine.Process(img))
{
ocrtext = page.GetText();
}
}
}
return ocrtext;
}
Info: The tessdata folder must exist in the output directory: bin\Debug\
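Since the failures discussed here come down to the tessdata folder, it may help to check for the language file before creating the engine. A sketch, assuming Tesseract 3 or later, where each language ships as `<lang>.traineddata`; `TessdataCheck` is a hypothetical helper, not part of the Tesseract wrapper.

```csharp
using System;
using System.IO;

public static class TessdataCheck
{
    // Returns true if <tessdataDir>/<language>.traineddata exists.
    public static bool HasLanguage(string tessdataDir, string language)
    {
        string file = Path.Combine(tessdataDir, language + ".traineddata");
        return File.Exists(file);
    }
}
```

Calling `TessdataCheck.HasLanguage(@"./tessdata", "eng")` before constructing the engine lets the program report a clear message instead of failing inside the native library.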
I was able to get it to work by following these instructions.
Download the sample code
Unzip it to a new location
Open ~\tesseract-samples-master\src\Tesseract.Samples.sln (I used Visual Studio 2017)
Install the Tesseract NuGet package for that project (or uninstall/reinstall as I had to)
Uncomment the last two meaningful lines in Tesseract.Samples.Program.cs:
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
Run (hit F5)
You should get this windows console output
Try updating the line to:
ocr.Init(@"C:\", "eng", false); // the path here should be the parent folder of tessdata
I had the same problem; it's now resolved. My tesseract2 folder contains separate folders for 32-bit and 64-bit. I copied the files from the 64-bit folder (my system is 64-bit) into the main folder ("Tesseract2") and into the bin/Debug folder. Now my solution is working fine.
In my case all of this worked, except for correct character recognition.
But you need to consider these few things:
Use the correct tessnet2 library
Use the correct tessdata language version
tessdata should be somewhere outside your application folder, where you can pass the full path in the Init parameter. Use ocr.Init(@"c:\tessdata", "eng", true);
Debugging will cause you a headache. For that you need to update your app.config
Use this. (I can't put the XML code here. Give me your email and I will email it to you.)
Hope this helps
Here's a great working example project; Tesseract OCR Sample (Visual Studio) with Leptonica Preprocessing
Tesseract OCR Sample (Visual Studio) with Leptonica Preprocessing
Tesseract OCR 3.02.02 API can be confusing, so this guides you through including the Tesseract and Leptonica dll into a Visual Studio C++ Project, and provides a sample file which takes an image path to preprocess and OCR. The preprocessing script in Leptonica converts the input image into black and white book-like text.
Setup
To include this in your own projects, you will need to reference the header files and lib and copy the tessdata folders and dlls.
Copy the tesseract-include folder to the root folder of your project. Now Click on your project in Visual Studio Solution Explorer, and go to Project>Properties.
VC++ Directories>Include Directories:
..\tesseract-include\tesseract;..\tesseract-include\leptonica;$(IncludePath)
C/C++>Preprocessor>Preprocessor Definitions:
_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)
C/C++>Linker>Input>Additional Dependencies:
..\tesseract-include\libtesseract302.lib;..\tesseract-include\liblept168.lib;%(AdditionalDependencies)
Now you can include headers in your project's file:
include
include
Now copy the two dll files in tesseract-include and the tessdata folder in Debug to the Output Directory of your project.
When you initialize tesseract, you need to specify the location of the parent folder (!important) of the tessdata folder if it is not already the current directory of your executable file. You can copy my script, which assumes tessdata is installed in the executable's folder.
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
api->Init("D:\\tessdataParentFolder\\", ...
Sample
You can compile the provided sample, which takes one command line argument of the image path to use. The preprocess() function uses Leptonica to create a black and white book-like copy of the image which makes tesseract work with 90% accuracy. The ocr() function shows the functionality of the Tesseract API to return a string output. The toClipboard() can be used to save text to clipboard on Windows. You can copy these into your own projects.
This worked for me. I had 3-4 more PDF-to-text extractors, so if one does not work another one will... This Tesseract code in particular can be used on Windows 7, 8 and Server 2008. Hope this is helpful to you.
do
{
// Sleep or Pause the Thread for 1 sec, if service is running too fast...
Thread.Sleep(millisecondsTimeout: 1000);
Guid tempGuid = ToSeqGuid();
string newFileName = tempGuid.ToString().Split('-')[0];
string outputFileName = appPath + "\\pdf2png\\" + fileNameithoutExtension + "-" + newFileName +
".png";
extractor.SaveCurrentImageToFile(outputFileName, ImageFormat.Png);
// Create text file here using Tesseract
foreach (var file in Directory.GetFiles(appPath + "\\pdf2png"))
{
try
{
var pngFileName = Path.GetFileNameWithoutExtension(file);
string[] myArguments =
{
"/C tesseract ", file,
" " + appPath + "\\png2text\\" + pngFileName
}; // /C makes cmd.exe close automatically when the command completes
string strParam = String.Join(" ", myArguments);
var myCmdProcess = new Process();
var theProcess = new ProcessStartInfo("cmd.exe", strParam)
{
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true,
WindowStyle = ProcessWindowStyle.Minimized
}; // Keep the cmd.exe window minimized
myCmdProcess.StartInfo = theProcess;
myCmdProcess.Exited += myCmdProcess_Exited;
myCmdProcess.Start();
//if (process)
{
/*
MessageBox.Show("cmd.exe process started: " + Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
Process.EnterDebugMode();
//ShowWindow(hWnd: process.Handle, nCmdShow: 2);
/*
MessageBox.Show("After EnterDebugMode() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.WaitForExit(60000);
/*
MessageBox.Show("After WaitForExit() cmd.exe process Exited: " +
Environment.NewLine +
"Process Name: " + myCmdProcess.ProcessName +
Environment.NewLine + " Process Id: " + myCmdProcess.Id
+ Environment.NewLine + "process.Handle: " +
myCmdProcess.Handle);
*/
myCmdProcess.Refresh();
Process.LeaveDebugMode();
//myCmdProcess.Dispose();
/*
MessageBox.Show("After LeaveDebugMode() cmd.exe process Exited: " +
Environment.NewLine);
*/
}
//process.Kill();
// Waits for the process to complete its task and exit automatically
Thread.Sleep(millisecondsTimeout: 1000);
// This works fine in Windows 7 Environment, and not in Windows 8
// Try following code block
// Check, if process is not completely exited
if (!myCmdProcess.HasExited)
{
//process.WaitForExit(2000); // Try to wait for exit 2 more seconds
/*
MessageBox.Show(" Process of cmd.exe was exited by WaitForExit(); Method " +
Environment.NewLine);
*/
try
{
// If not, then Kill the process
myCmdProcess.Kill();
//myCmdProcess.Dispose();
//if (!myCmdProcess.HasExited)
//{
// myCmdProcess.Kill();
//}
MessageBox.Show(" Process of cmd.exe exited ( Killed ) successfully " +
Environment.NewLine);
}
catch (System.ComponentModel.Win32Exception ex)
{
MessageBox.Show(
" Exception: System.ComponentModel.Win32Exception " +
ex.ErrorCode + Environment.NewLine);
}
catch (NotSupportedException notSupporEx)
{
MessageBox.Show(" Exception: NotSupportedException " +
notSupporEx.Message +
Environment.NewLine);
}
catch (InvalidOperationException invalidOperation)
{
MessageBox.Show(
" Exception: InvalidOperationException " +
invalidOperation.Message + Environment.NewLine);
foreach (
var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
}
}
}
catch (Exception exception)
{
MessageBox.Show(
" Caught Exception in Generating image do{...}while{...} function " +
Environment.NewLine + exception.Message + Environment.NewLine);
}
}
// Delete png image here
Directory.GetFiles(appPath + "\\pdf2png").ToList().ForEach(File.Delete);
Thread.Sleep(millisecondsTimeout: 1000);
// Read text from text file here
foreach (var textFile in Directory.GetFiles(appPath + "\\png2text", "*.txt",
SearchOption.AllDirectories))
{
loggingInfo += textFile +
" In Reading Text from generated text file by Tesseract " +
Environment.NewLine;
strBldr.Append(File.ReadAllText(textFile));
}
// Delete text file after reading text here
Directory.GetFiles(appPath + "\\png2text").ToList().ForEach(File.Delete);
} while (extractor.GetNextImage()); // Advance image enumeration...
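The same idea can be written without going through cmd.exe. A sketch, assuming `tesseract` is on the PATH and using its basic `tesseract <image> <outputbase>` form; `TesseractCli` is a hypothetical name. Paths are quoted so that spaces survive, and the redirected error stream is drained asynchronously, because a child process can block once a redirected pipe fills up.

```csharp
using System;
using System.Diagnostics;

public static class TesseractCli
{
    // Builds the quoted argument string: "image" "outputBase".
    public static string BuildArguments(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\"";
    }

    // Runs tesseract and returns its exit code, or -1 on timeout.
    public static int Run(string imagePath, string outputBase, int timeoutMs)
    {
        var info = new ProcessStartInfo("tesseract", BuildArguments(imagePath, outputBase))
        {
            CreateNoWindow = true,
            UseShellExecute = false,
            RedirectStandardError = true
        };
        using (var process = Process.Start(info))
        {
            // Drain stderr so the child cannot stall on a full pipe.
            process.ErrorDataReceived += (sender, e) => { };
            process.BeginErrorReadLine();

            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();   // give up after the timeout
                return -1;
            }
            return process.ExitCode;
        }
    }
}
```

Starting the executable directly also means there is no cmd.exe window to hide and no `/C` handling to get right.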
Admittedly this is an older question, from when Tesseract 3 was the version available, but it came up in my search results while looking for a related issue. The question and the other answers highlight a still-valid point: how difficult it is to actually get Tesseract installed, let alone configured to work correctly.
There is a far simpler solution (and using the updated Tesseract 5 engine) which does all the work for you, in IronOcr.
(Disclaimer: I do work for Iron Software, though I feel that others can benefit from this information, particularly as it relates to the question of using Tesseract OCR in C# which IronOcr excels at).
using IronOcr;
var Ocr = new IronTesseract(); // nothing to configure
Ocr.Configuration.WhiteListCharacters = "0123456789"; // If digit only
using (var Input = new OcrInput(@"example.tiff"))
{
OcrResult Result = Ocr.Read(Input);
foreach (var Page in Result.Pages)
{
// Page object
int PageNumber = Page.PageNumber;
string PageText = Page.Text;
int PageWordCount = Page.WordCount;
// null if we dont set Ocr.Configuration.ReadBarCodes = true;
OcrResult.Barcode[] Barcodes = Page.Barcodes;
System.Drawing.Bitmap PageImage = Page.ToBitmap(Input);
int PageWidth = Page.Width;
int PageHeight = Page.Height;
foreach (var Paragraph in Page.Paragraphs)
{
// Pages -> Paragraphs
int ParagraphNumber = Paragraph.ParagraphNumber;
String ParagraphText = Paragraph.Text;
System.Drawing.Bitmap ParagraphImage = Paragraph.ToBitmap(Input);
int ParagraphX_location = Paragraph.X;
int ParagraphY_location = Paragraph.Y;
int ParagraphWidth = Paragraph.Width;
int ParagraphHeight = Paragraph.Height;
double ParagraphOcrAccuracy = Paragraph.Confidence;
OcrResult.TextFlow paragrapthText_direction = Paragraph.TextDirection;
foreach (var Line in Paragraph.Lines)
{
// Pages -> Paragraphs -> Lines
int LineNumber = Line.LineNumber;
String LineText = Line.Text;
System.Drawing.Bitmap LineImage = Line.ToBitmap(Input);
int LineX_location = Line.X;
int LineY_location = Line.Y;
int LineWidth = Line.Width;
int LineHeight = Line.Height;
double LineOcrAccuracy = Line.Confidence;
double LineSkew = Line.BaselineAngle;
double LineOffset = Line.BaselineOffset;
foreach (var Word in Line.Words)
{
// Pages -> Paragraphs -> Lines -> Words
int WordNumber = Word.WordNumber;
String WordText = Word.Text;
System.Drawing.Image WordImage = Word.ToBitmap(Input);
int WordX_location = Word.X;
int WordY_location = Word.Y;
int WordWidth = Word.Width;
int WordHeight = Word.Height;
double WordOcrAccuracy = Word.Confidence;
if (Word.Font != null)
{
// Word.Font is only set when using Tesseract Engine Modes rather than LSTM
String FontName = Word.Font.FontName;
double FontSize = Word.Font.FontSize;
bool IsBold = Word.Font.IsBold;
bool IsFixedWidth = Word.Font.IsFixedWidth;
bool IsItalic = Word.Font.IsItalic;
bool IsSerif = Word.Font.IsSerif;
bool IsUnderLined = Word.Font.IsUnderlined;
bool IsFancy = Word.Font.IsCaligraphic;
}
foreach (var Character in Word.Characters)
{
// Pages -> Paragraphs -> Lines -> Words -> Characters
int CharacterNumber = Character.CharacterNumber;
String CharacterText = Character.Text;
System.Drawing.Bitmap CharacterImage = Character.ToBitmap(Input);
int CharacterX_location = Character.X;
int CharacterY_location = Character.Y;
int CharacterWidth = Character.Width;
int CharacterHeight = Character.Height;
double CharacterOcrAccuracy = Character.Confidence;
// Output alternative symbols choices and their probability.
// Very useful for spellchecking
OcrResult.Choice[] Choices = Character.Choices;
}
}
}
}
}
}
So I have a C# console app that goes through a huge list of files and copies missing files to a remote location using a WebDAV mapped drive.
In essence, I'm not using a custom client; I'm using the built-in Windows client.
net use j: http://127.69.69.69 /user:testme pass /persistent:yes
Everything seems to work just fine (I've increased the file size limits on the IIS 7.0 server, etc.) EXCEPT that occasionally I get a "Delayed Write Failure" popup from the OS.
My problem is that File.Copy() on the client does not bail out or throw an exception when this happens. How might I detect this?
I guess it's not a huge deal, but part of me wonders why no exception is thrown. It appears to move on to the next file without logging an error. My script checks whether the remote file size is the same and can replace a partial file on the next cycle.
Pretty simple code where it actually copies:
Log("\n copying " + lf.FullName);
try
{
if (!Directory.Exists(Path.Combine(remotePath, localDir.Name)))
Directory.CreateDirectory(Path.Combine(remotePath, localDir.Name));
}
catch (Exception e)
{
Log("Cannot create remote directory " + Path.Combine(remotePath, localDir.Name) + " " + e.Message , "error");
}
try
{
File.Copy(lf.FullName, Path.Combine(new string[] { remotePath, localDir.Name, lf.Name }), true);
}
catch (Exception e)
{
Log("Cannot copy file to " + Path.Combine(new string[] { remotePath, localDir.Name, lf.Name }) + " " + e.Message, "error");
}
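One way to surface such silent failures, along the lines of the size check mentioned above, is to compare lengths right after the copy and raise an error yourself. A sketch with a hypothetical `VerifiedCopy` helper; it only catches truncated files, and it assumes the mapped drive reports the real remote size, which may not hold while writes are still cached.

```csharp
using System;
using System.IO;

public static class VerifiedCopy
{
    // Copies the file, then throws if the destination size differs.
    public static void CopyAndVerify(string source, string destination)
    {
        File.Copy(source, destination, true);

        long expected = new FileInfo(source).Length;
        long actual = new FileInfo(destination).Length;
        if (actual != expected)
        {
            throw new IOException(string.Format(
                "Size mismatch after copy: expected {0} bytes, found {1}.",
                expected, actual));
        }
    }
}
```

Because the mismatch is thrown as an IOException, the existing catch block around the copy would log it like any other copy error.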