I have been reviewing replacements for the Office 2007 MODI OCR (the OCR in OneNote 2010 gives lower-quality results than 2007 :-( ). I notice that Windows 7 contains an OCR library once you install the optional TIFF IFilter feature.
The OCR component gets installed to
%programfiles%\Common Files\microsoft shared\OCR\7.0\xocr3.psp.dll
but I don't see any API for it.
Does anyone know how to interface with it, preferably from C#?
ANSWER: Found the solution. Once the optional TIFF IFilter Windows 7 feature is installed, I can get the text output of a screenshot using the code/exe at http://www.codeproject.com/KB/cs/IFilter.aspx. Also, if you add the same [HKEY_CLASSES_ROOT\.tiff\PersistentHandler] value for .png and .jpg, OCR also works for JPGs and PNGs.
Tessnet OCR is a good solution, but pretty old (last release from 2009). There are a couple of very good free OCR solutions available for .NET:
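A minimal sketch of the registry step, in Python rather than C#: it only builds the text of a .reg file that gives .png and .jpg the same PersistentHandler default value as .tiff. The function name build_reg_file and the all-zero GUID are placeholders of mine, not values from the answer; read the real GUID from HKEY_CLASSES_ROOT\.tiff\PersistentHandler on your own machine before importing anything.

```python
def build_reg_file(handler_guid, extensions):
    """Return .reg text that sets the PersistentHandler default value for each extension."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in extensions:
        # Accept both "png" and ".png".
        ext = ext if ext.startswith(".") else "." + ext
        lines.append(f"[HKEY_CLASSES_ROOT\\{ext}\\PersistentHandler]")
        # "@" is the .reg syntax for a key's default value.
        lines.append(f'@="{handler_guid}"')
        lines.append("")
    return "\r\n".join(lines)


# Placeholder only (assumption): replace with the value found under
# HKEY_CLASSES_ROOT\.tiff\PersistentHandler on your machine.
HANDLER_GUID = "{00000000-0000-0000-0000-000000000000}"

reg_text = build_reg_file(HANDLER_GUID, [".png", ".jpg"])
print(reg_text)
```

Save the output as a .reg file and review it before merging; writing under HKEY_CLASSES_ROOT normally needs administrator rights.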
Asprise C# OCR SDK. Very good and fast one.
Microsoft Research Project Hawaii Web-based (cloud) OCR solution with full docs and samples (discontinued 2013)
Bing OCR Web based (cloud) OCR replacement for above. (discontinued March 2014)
Try TessNet, using the suggestions I made to the Poster in this post (enlarge image, use separate process):
c# OCR can't recognize digits (tesseract 2)
I was exploring the Windows 7 DLLs and found three libraries that might be useful: thocr.psp.dll, xocr3.psp.dll, and ximage3b.dll. On this website and other similar websites I found out that ximage3b is a Windows system OCR engine. I have been looking for documentation online without success, but hey, at least I know that it's there. I will give you guys an update if I find out how to use it with C#/C/C++.
Related
Looking for suggestions for libraries that can generate PDF and RTF documents from stored data (not "HTML to PDF" or "URL to PDF"). With all functionality for adding images, encryption etc. We are currently looking for an alternative to PDFSharp-MigraDoc-GDI, which, although works with .NET Core, does not fully support it and we see compiler warnings - "This package may not be compatible with your project". We have also been getting issues on the IIS tier regarding GDI+. We've decided to play it safe and find an alternative. Does anyone have a solution that they would recommend? Thanks
As far as I know, you can write whole new documents using the Microsoft.Office.Interop library. Here is a post that talks about it (be careful about deploying things like this; you might need an Office installation running on the server):
https://www.c-sharpcorner.com/UploadFile/muralidharan.d/how-to-create-word-document-using-C-Sharp/
And I've found this post about using the library to print PDFs:
How do I convert Word files to PDF programmatically?
It's not much but hope that it helps, regards!
Well, I found a lot of material on this question, but unfortunately only bits and pieces, and nothing specific enough for someone looking for a starting point.
Draft:
I am planning a file format derived from BagIt. I want to implement it as a class library in C#/.NET.
Environment: Office 365 (SharePoint Document List) synced to Windows 10 by OneDrive.
These are the two planned features:
- "Versioning" or better "Diff", like for example in Word
- Continuous saving, also like in Word, or rather like in all Office apps
For those not familiar with these features: if you are working in Office 2017 and your file is saved on a SharePoint drive (I don't know exactly about private OneDrive), then you can switch on "continuous saving" (I don't know the English term). With this, your work is always saved and you don't need to save manually. It works on OneDrive and SharePoint.
If you have SharePoint and you are using its versioning feature, then you can have Word (I don't know about other Office apps) show the differences between the versions.
The "don't knows" are not the point.
I am searching for an outline, a starting point. Is it possible at all or is it only usable by Microsoft (at the moment)?
So it is SharePoint and C# (.net)
Links would be nice,
an outline also for other readers/potential questioners would be better.
Thanks in advance.
Now I have found the answer myself. For everyone with the same question: it is in the REST API. With it you can download different versions and diff them yourself.
I didn't find continuous saving, but I think it is somewhere in this documentation. If someone knows where right now, it would be nice if you could shorten my search. Otherwise I will post the link when I find it myself.
Here is the link:
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitemversion_get
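A rough sketch of "download versions and diff them yourself", in Python rather than C#. It assumes the Microsoft Graph form of the versions endpoints (/drives/{drive-id}/items/{item-id}/versions and .../versions/{version-id}/content), an OAuth bearer token you have already obtained, and that the payload is plain text. The helper names are mine, and a binary container would need its text parts extracted before diffing.

```python
import difflib
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def versions_url(drive_id, item_id):
    """URL that lists the stored versions of one drive item."""
    return f"{GRAPH}/drives/{drive_id}/items/{item_id}/versions"


def version_content_url(drive_id, item_id, version_id):
    """URL that downloads the content of one specific version."""
    return f"{versions_url(drive_id, item_id)}/{version_id}/content"


def fetch(url, token):
    """GET a URL with a bearer token (assumed to be acquired elsewhere)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def diff_versions(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff between two text versions."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    ))


# Local demonstration without any network call.
print(diff_versions("title\nfirst draft\n", "title\nsecond draft\n", "1.0", "2.0"))
```

In a real run you would call fetch(version_content_url(...), token).decode() for two version IDs taken from the versions listing and pass the results to diff_versions.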
Usually, I would use the Microsoft Office Interop library, but it requires the use of COM objects, which (as far as I know) isn't possible if I'm developing a Windows Universal app. What are some alternative methods I could use to convert Word and PowerPoint files to PDF from a Windows Universal App? Thanks!
There are a number of third-party libraries that will do this for you pretty easily. I have used Syncfusion (there's a free version) for a similar workflow with Word, Excel and PDF (not PowerPoint), and while I'm not a huge fan of the Syncfusion library overall (version 11), the Office/PDF stuff has done its job well.
Alternatively, if you're sure you will only be using the latest version of Office docs (extensions ending in x), you should be able to use any of the Open XML libraries to open and read the file, and then use something like iText to export to PDF. That might be trickier for more complicated documents (like PowerPoint slides).
Syncfusion seems to be a good set of components, even if in my project I only wanted to print/convert Office (.doc and .docx) documents.
Did you try this? And what do you think of:
the conversion speed
the quality of printing PDF files
the quality of converting Word files to PDF
Did you use the free "Community Edition"? I can hardly imagine that we can use these libraries at no cost! What is the drawback of using the free version?
I'm trying to do a project for school. The project is to create presentation software; an example is Microsoft PowerPoint. My goal is to mimic its use, but instead of customizing each slide, the user must be able to upload documents (Excel, PowerPoint, and Word). After uploading, the software must be able to convert each page to a "slide".
My medium will be Microsoft Visual C#. I would like to ask for any reading material, tutorials or suggestions on how I could attack this project. Currently I am able to get text from Microsoft Word and display it in a rich text box, but unfortunately I am not able to preserve its formatting (font style, font size, etc.). Although I have added the Microsoft Word 12.0 Object Library to my references in C#, I still do not know how it works.
My inspiration for this project is EasyWorship, a presentation software designed for church use. Their software can upload power point presentations only.
I really do need a lot of help. Please, and thanks!
I believe you are going to have to get a bit down and dirty with the COM Interop Assemblies available via the combination of Visual Studio w/Tools for Office as well as actually having the Office Suite installed.
MSDN has a rundown of the various Interop DLLs available; it may be a jumping-off point toward finding the entry point you need.
Additionally, there are various walkthroughs on MSDN for beginning development extending Office components, so that may get you a bit familiar with how to implement the assemblies in your application.
Read this article from MSDN
You create an add-in for Word 2007 by using Visual Studio 2005 Tools for the 2007 Microsoft Office System Second Edition. The add-in takes the structure of the current Word 2007 document, gathers information about all the headings, and creates a basic PowerPoint 2007 presentation, with corresponding Agenda and topic slides.
What you have in mind is not an easy task.
Consider installing a virtual image printer, for example the Virtual Image Printer driver. Then you can open your document in an instance of Microsoft Word, print the document to the image printer, wait for the images to be produced, and then display the images one at a time in your C# application.
I need to retire a 15-year-old system and preserve all its data. It can only print documents to one specific printer, an HP LaserJet 5. I can print the documents to PCL files, and I am looking for ways to convert all these files into PDFs programmatically, preferably in C#. Can anybody recommend a good library or command-line tool? Preferably free ;-)
The command-line tool GhostPCL (part of GhostPDL), by the same developers as Ghostscript, can convert PCL to PDF. Recent changes in their public source code repository provide a fully integrated source tree encompassing Ghostscript, GhostPCL and GhostXPS. This includes MS Visual Studio *.sln and *.vcproj files to build all or part of their products. The license is GPL or commercial (commercial licenses are to be obtained from Artifex).
The simplest solution I found is VeryPDF PCL Converter http://www.verypdf.com/pcltools/index.html. It has a command-line mode, a GUI (for the command line), a batch mode, and only costs $125. My company has paid for it. Hope this will help somebody too.
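A small sketch of driving GhostPCL in batch, in Python rather than C# (the same command line can be started from C# with System.Diagnostics.Process). It assumes the executable is on the PATH under the name gpcl6 (Windows builds are usually named something like gpcl6win64.exe) and uses the pdfwrite device with the -dNOPAUSE, -sDEVICE and -sOutputFile options. The helper names and the pcl_dump folder are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(pcl_path, pdf_path, exe="gpcl6"):
    """Build the GhostPCL argument list for one PCL -> PDF conversion."""
    return [
        exe,
        "-dNOPAUSE",                  # do not pause between pages
        "-sDEVICE=pdfwrite",          # write PDF output
        f"-sOutputFile={pdf_path}",
        str(pcl_path),
    ]


def convert_folder(folder, exe="gpcl6"):
    """Convert every *.pcl file in a folder to a PDF next to it."""
    converted = []
    for pcl in sorted(Path(folder).glob("*.pcl")):
        pdf = pcl.with_suffix(".pdf")
        # check=True raises CalledProcessError if GhostPCL exits with an error.
        subprocess.run(build_command(pcl, pdf, exe), check=True)
        converted.append(pdf)
    return converted


# Show the command that would run for one file; nothing is executed here.
print(" ".join(build_command("invoice.pcl", "invoice.pdf")))
```

Calling convert_folder("pcl_dump") would then process a whole directory of captured print jobs, stopping at the first file GhostPCL rejects.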
I've used Visual Softwares pcl2pdf on several projects, it worked well for me.
We are currently using Lincoln's PCL to PDF converter. It was simple to call and to embed in our C# application. It also provides good feedback through events, for example when a page has been converted, so you can even add progress bars.
Lincoln PCL to PDF Converter
I've used PCL to PDF for Windows and OS X which is based on GhostPCL.