Parsing Windows Journal (JNT) files - C#

According to .NET docs here, it should be possible to convert JNT files into XML using the Microsoft.Ink.JournalReader.ReadFromStream component. However, all the code samples on MSDN are old and don't work with Windows 7 x64 and the current version of Windows Journal. In particular, JournalReader.ReadFromStream throws a runtime type mismatch error when reading a JNT file stream.
The most recent code sample I could find targets .NET 3.0; the API docs for .NET 4.0 describe the JournalReader.ReadFromStream component, but the code samples are blank. Is there any up-to-date information on converting JNT format to XML?

Please work on this. You could extract the text and merge it into an invisible layer of the PDFs created from the JNTs. Then they would also be searchable.

One possibility is to batch print them to PDF and then parse the PDFs. You can find scripts for batch printing to PDF in the Journal Note to PDF repo.

Related

.NET Core PDF and RTF document generation

Looking for suggestions for libraries that can generate PDF and RTF documents from stored data (not "HTML to PDF" or "URL to PDF"), with full functionality for adding images, encryption, etc. We are currently looking for an alternative to PDFSharp-MigraDoc-GDI, which, although it works with .NET Core, does not fully support it, and we see the compiler warning "This package may not be compatible with your project". We have also been getting GDI+ issues on the IIS tier. We've decided to play it safe and find an alternative. Does anyone have a solution that they would recommend? Thanks
As far as I know, you can write whole new documents using the Microsoft.Office.Interop library. Here is a post that talks about it (be careful when deploying something like this: you might need an Office installation running on the server):
https://www.c-sharpcorner.com/UploadFile/muralidharan.d/how-to-create-word-document-using-C-Sharp/
And I've found this post about using the library to convert documents to PDF:
How do I convert Word files to PDF programmatically?
It's not much, but I hope it helps. Regards!

How to Add and update the comments for the PDF file using c#

I am currently working on inserting and updating comments in a PDF file using C# code. I would like to know any possible way to insert a comment inside a PDF file. If anyone knows about this, please reply with some references for this process.
Microsoft has very limited support for PDF in operating system APIs. It arrived only recently, in Windows 8, for modern applications (now called UWP), and that support doesn't go as far as updating comments.
So you need to use a third-party library. As far as I can tell, SharpPDF is the only free library worth considering, but I failed to open many PDFs with it, so I can't recommend it. I think you will need to look for a commercial library. I am aware of several that can do the job (e.g. XfiniumPDF, iTextSharp, etc.), and you'll get the documentation when you license them.

The best way to extract text from common document formats (primarily rtf, doc, docx, pdf, epub, mobi) that works with UWP?

I'd like to implement support for these types of files in my application, but for this I need something that will let me extract raw text from these filetypes.
I'm looking for either a solution that doesn't require any additional libraries, or an all-in-one library/NuGet package. I took a look at GemBox.Document but it doesn't seem to be working with UWP projects.
What would be the best option for this?
"I'm looking for either a solution that doesn't require any additional libraries, or an all-in-one library/NuGet package."
There is no such package.
In a standard UWP app you can read .rtf files with the RichEditBox control; the code sample in this document shows how to edit, load, and save a Rich Text Format (.rtf) file in a RichEditBox.
For .doc and .docx (MS Word documents, especially the post-2007 format), the usual tool is the Open XML SDK, which currently doesn't support the UWP platform.
For .pdf documents, you can refer to Franklin Chen's thread: [UWP]PDF Viewing on a Windows Universal App.
An EPUB file is a ZIP archive. To parse it, you can refer to the thread: [WP8.1][C#] How can i read an EPub file in c# on Windows Phone!?.
For mobi files, sorry, I couldn't find any useful development information for the moment. All I can suggest for now is converting them to PDF with a free online service.
In short, since the Open XML SDK currently doesn't support the UWP platform, it is not possible to find a single solution or package for a standard UWP app. You can try to find a web service that does this and use it from your app, or you can use a commercial library that can read documents in all these formats.
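For the EPUB case (an EPUB is a ZIP archive), here is a rough sketch of text extraction using only the built-in System.IO.Compression and System.Xml.Linq APIs. It assumes modern .NET with C# 8, and the class name EpubText is hypothetical. It follows the EPUB container convention: META-INF/container.xml points to the OPF package document, whose spine lists the XHTML content files in reading order. It simply dumps all text nodes, so script/style content and named HTML entities such as &nbsp; (which plain XML parsing rejects) would need extra handling.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class EpubText
{
    static readonly XNamespace Container = "urn:oasis:names:tc:opendocument:xmlns:container";
    static readonly XNamespace Opf = "http://www.idpf.org/2007/opf";

    // Reads META-INF/container.xml and returns the path of the OPF package document.
    public static string FindPackagePath(ZipArchive epub)
    {
        var entry = epub.GetEntry("META-INF/container.xml")
            ?? throw new InvalidDataException("META-INF/container.xml is missing; not an EPUB?");
        using var stream = entry.Open();
        return (string)XDocument.Load(stream)
            .Descendants(Container + "rootfile").First()
            .Attribute("full-path");
    }

    // Concatenates the text nodes of every spine item, in reading order.
    public static string ExtractText(ZipArchive epub)
    {
        string opfPath = FindPackagePath(epub);
        int slash = opfPath.LastIndexOf('/');
        string baseDir = slash >= 0 ? opfPath.Substring(0, slash + 1) : "";

        var opfEntry = epub.GetEntry(opfPath)
            ?? throw new InvalidDataException($"Package document {opfPath} is missing.");
        XDocument opf;
        using (var opfStream = opfEntry.Open()) opf = XDocument.Load(opfStream);

        // The manifest maps item ids to hrefs relative to the OPF file.
        var manifest = opf.Descendants(Opf + "item")
            .ToDictionary(i => (string)i.Attribute("id"), i => (string)i.Attribute("href"));

        var text = new StringBuilder();
        foreach (var itemref in opf.Descendants(Opf + "itemref"))
        {
            if (!manifest.TryGetValue((string)itemref.Attribute("idref"), out var href)) continue;
            var itemEntry = epub.GetEntry(baseDir + Uri.UnescapeDataString(href));
            if (itemEntry == null) continue;
            using var itemStream = itemEntry.Open();
            var xhtml = XDocument.Load(itemStream); // content documents are XHTML, i.e. XML
            text.AppendLine(string.Concat(
                xhtml.DescendantNodes().OfType<XText>().Select(t => t.Value)));
        }
        return text.ToString();
    }
}
```

Usage might look like `using var epub = ZipFile.OpenRead("book.epub"); string text = EpubText.ExtractText(epub);` (the file name is a placeholder).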

How convert PCL generated by HP LaserJet 5 into PDF in C#?

I need to retire a 15-year-old system and preserve all its data. It can only print documents to a specific printer, an HP LaserJet 5. I can print documents to PCL files and am looking for ways to convert all these files to PDF programmatically, preferably in C#. Can anybody recommend a good library or command-line tool? Preferably free ;-)
The command-line tool GhostPCL (part of GhostPDL), by the same developers as Ghostscript, can convert PCL to PDF. Recent changes in their public source code repository provide a fully integrated source tree encompassing Ghostscript, GhostPCL and GhostXPS. This includes MS Visual Studio *.sln and *.vcproj files to build all or part of their products. The license is GPL or commercial (commercial licenses are available from Artifex).
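A minimal sketch of batch-converting a folder of PCL files by shelling out to GhostPCL from C#. Assumptions to check against your installation: the executable name (builds have shipped under names such as gpcl6win64.exe on Windows, or gpcl6/pcl6 elsewhere, depending on version), that the spool files use a .pcl extension, and .NET Core 2.1 or later for ProcessStartInfo.ArgumentList. The -dNOPAUSE, -sDEVICE=pdfwrite and -sOutputFile= switches follow the usual Ghostscript-family conventions; the PclToPdf class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class PclToPdf
{
    // Assumption: adjust to the GhostPCL executable on your machine.
    public static string GhostPclExe = "gpcl6win64.exe";

    // Argument list for one PCL -> PDF conversion with the pdfwrite output device.
    public static List<string> BuildArguments(string pclFile, string pdfFile) => new List<string>
    {
        "-dNOPAUSE", "-sDEVICE=pdfwrite", "-sOutputFile=" + pdfFile, pclFile
    };

    // Runs GhostPCL once and returns its exit code (0 means success).
    public static int ConvertFile(string pclFile, string pdfFile)
    {
        var psi = new ProcessStartInfo(GhostPclExe) { UseShellExecute = false };
        foreach (var arg in BuildArguments(pclFile, pdfFile))
            psi.ArgumentList.Add(arg); // each entry is passed as one argument, quoted if needed
        using var process = Process.Start(psi);
        process.WaitForExit();
        return process.ExitCode;
    }

    // Converts every *.pcl file in a folder, writing each PDF next to its source file.
    public static void ConvertFolder(string folder)
    {
        foreach (var pcl in Directory.EnumerateFiles(folder, "*.pcl"))
        {
            int exitCode = ConvertFile(pcl, Path.ChangeExtension(pcl, ".pdf"));
            if (exitCode != 0)
                throw new IOException($"GhostPCL failed on {pcl} (exit code {exitCode}).");
        }
    }
}
```

Usage would be something like `PclToPdf.ConvertFolder(@"C:\spool");`, where the folder path is a placeholder.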
The simplest solution I found is VeryPDF PCL Converter http://www.verypdf.com/pcltools/index.html. It has a command-line mode, a GUI (for the command line), and a batch mode, and costs only $125. My company has paid for it. I hope this helps somebody too.
I've used Visual Softwares pcl2pdf on several projects, it worked well for me.
We are currently using Lincoln's PCL to PDF converter. It was simple to call and to embed in our C# application. It also provides good feedback through events (for example, when a page has been converted), so you can even add progress bars.
Lincoln PCL to PDF Converter
I've used PCL to PDF for Windows and OS X which is based on GhostPCL.

How can a Word document be created in C#? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a project where I would like to generate a report export in MS Word format. The report will include images/graphs, tables, and text. What is the best way to do this? Third party tools? What are your experiences?
The answer is going to depend slightly upon whether the application is running on a server or on the client machine. If you are running on a server, then you will want to use one of the XML-based Office formats, as there are known issues when using Office Automation on a server.
However, if you are working on the client machine, then you have a choice of either using Office Automation or using the Office Open XML format (see links below), which is supported by Microsoft Office 2000 and up, either natively or through service packs. One drawback, though, is that you might not be able to embed some kinds of graphs or images that you wish to show.
The best way to go about things will depend slightly upon how much time you have to invest in development. If you go the route of Office Automation, there are quite a few good tutorials out there that can be found via Google, and it is fairly simple to learn. However, the Office Open XML format is fairly new, so you might find the learning curve a bit steeper.
Office Open XML Information
Office Open XML - http://en.wikipedia.org/wiki/Office_Open_XML
OpenXML Developer - http://openxmldeveloper.org/default.aspx
Introducing the Office (2007) Open XML File Formats - http://msdn.microsoft.com/en-us/library/aa338205.aspx
DocX is a free library for creating .docx documents; it is actively developed and very easy and intuitive to use. Since CodePlex is shutting down, the project has moved to GitHub.
I have spent the last week or so getting up to speed on Office Open XML. We have a database application that stores survey data that we want to report in Microsoft Word. You can actually create Word 2007 (docx) files from scratch in C#. The Open XML SDK version 2 includes a cool application called the Document Reflector that will actually provide the C# code to fully recreate a Word document. You can use parts or all of the code, and substitute the bits you want to change on the fly. The help file included with the SDK has some good code samples as well.
There is no need for the Office Interop or any other Office software on the server - the new formats are 100% XML.
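To illustrate the "100% XML" point, here is a minimal sketch that writes a valid .docx with nothing but System.IO.Compression. It deliberately uses neither the Open XML SDK nor Office, so it is not what the SDK or Document Reflector would generate. The MinimalDocx class is hypothetical and .NET Core 3.0+ is assumed. A WordprocessingML package needs at least three parts: [Content_Types].xml, the package relationships in _rels/.rels, and the main document part word/document.xml.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class MinimalDocx
{
    const string ContentTypes =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    // Package-level relationship pointing at the main document part.
    const string Rels =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    // Writes a .docx containing one plain paragraph per string.
    public static void Write(Stream output, params string[] paragraphs)
    {
        var body = new StringBuilder();
        foreach (var p in paragraphs)
            body.Append("<w:p><w:r><w:t xml:space=\"preserve\">")
                .Append(SecurityElement.Escape(p)) // escapes < > & " '
                .Append("</w:t></w:r></w:p>");
        string document =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:body>" + body + "</w:body></w:document>";

        using var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);
        AddPart(zip, "[Content_Types].xml", ContentTypes);
        AddPart(zip, "_rels/.rels", Rels);
        AddPart(zip, "word/document.xml", document);
    }

    static void AddPart(ZipArchive zip, string name, string xml)
    {
        // UTF-8 without a byte-order mark, matching the encoding declared in the XML.
        using var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false));
        writer.Write(xml);
    }
}
```

For example, `using var file = File.Create("report.docx"); MinimalDocx.Write(file, "Survey results", "Total responses: 42");` (file name and text are placeholders). Tables, images and styles would need more parts and markup, which is where the SDK earns its keep.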
Have you considered using .RTF as an alternative?
It supports embedding images and tables as well as text, and it opens by default in Microsoft Word. Whilst its feature set is more limited (count out any advanced formatting), for something that looks, feels and opens like a Word document, it's not far off.
Your end users probably won't notice.
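To make the RTF suggestion concrete, here is a minimal sketch that builds an RTF string containing a bold title and a simple table. The SimpleRtf class is hypothetical; it uses only basic control words (\trowd, \cellx, \intbl, \cell, \row), with column boundaries given in twips. Images would additionally need a \pict group, which is not shown.

```csharp
using System.Text;

public static class SimpleRtf
{
    // Backslash and braces are RTF control characters and must be escaped.
    // (Non-ASCII text would additionally need \uN escapes; omitted here.)
    static string Escape(string s) =>
        s.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");

    // Builds a document with a bold title paragraph followed by a table.
    // columnWidthTwips: width of each column (1440 twips = 1 inch).
    public static string Build(string title, string[][] rows, int columnWidthTwips = 2000)
    {
        var sb = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}");
        sb.Append(@"\pard\b ").Append(Escape(title)).Append(@"\b0\par");
        foreach (var row in rows)
        {
            sb.Append(@"\trowd");                                   // start a row definition
            for (int i = 1; i <= row.Length; i++)
                sb.Append(@"\cellx").Append(i * columnWidthTwips);  // right edge of cell i
            foreach (var cell in row)
                sb.Append(@"\pard\intbl ").Append(Escape(cell)).Append(@"\cell");
            sb.Append(@"\row");                                     // end of the row
        }
        return sb.Append(@"\pard\par}").ToString();
    }
}
```

Writing the result with something like `File.WriteAllText("report.rtf", SimpleRtf.Build("Report", rows))` gives a file that Word opens by default (the file name is a placeholder).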
I have found Aspose Words to be the best as not everybody can open Office Open XML/*.docx format files and the Word interop and Word automation can be buggy. Aspose Words supports most document file types from Word 97 upwards.
It is a pay-for component but has great support. The other alternative as already suggested is RTF.
To generate Word documents with Office Automation within .NET, specifically in C# or VB.NET:
1. Add the Microsoft.Office.Interop.Word assembly reference to your project. The path is \Visual Studio Tools for Office\PIA\Office11\Microsoft.Office.Interop.Word.dll.
2. Follow the Microsoft code example you can find here: http://support.microsoft.com/kb/316384/en-us.
Schmidty, if you want to generate Word documents on a web server you will need a licence for each client (not just the web server). See this section in the first link Rob posted:
"Besides the technical problems, you must also consider licensing issues. Current licensing guidelines prevent Office applications from being used on a server to service client requests, unless those clients themselves have licensed copies of Office. Using server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."
If you meet the licensing requirements, I think you will need to use COM Interop - to be specific, the Office XP Primary Interop Assemblies.
Check out VSTO (Visual Studio Tools for Office). It is fairly simple to create a Word template, inject an XML data island into it, then send it to the client. When the user opens the doc in Word, Word reads the XML, transforms it into WordML and renders it. You will want to look at the ServerDocument class of the VSTO library. In my experience, no extra licensing is required.
I have had good success using the Syncfusion Backoffice DocIO which supports doc and docx formats.
In prior releases it did not support everything in Word, but based on your list we tested it with tables and text as a mail merge approach, and it worked fine.
Not sure about the import of images, though. Their blurb page http://www.syncfusion.com/products/DocIO/Backoffice/features/default.aspx says:
"Essential DocIO has support for inserting both Scalar and Vector images into the document, in almost all formats. Bitmap, gif, png and tiff are some of the common image types supported."
So it's worth considering.
As others have mentioned, you can build up an RTF document; there are some good RTF libraries around for .NET, like http://www.codeproject.com/KB/string/nrtftree.aspx
I faced this problem and created a small library for it. It was used in several projects, and then I decided to publish it. It is free and very simple, but I'm sure it will help you with the task. Invoke the Office Open XML Library, http://invoke.co.nz/products/docx.aspx.
I've written a blog post series on Open XML WordprocessingML document generation. My approach is that you create a template document that contains content controls, and in each content control you write an XPath expression that defines how to retrieve the content from an XML document that contains the data that drives the document generation process. The code is free and is licensed under the Microsoft Reciprocal License (Ms-RL). In the same blog post series, I also explore an approach where you write C# code in content controls. The document generation process then processes the template document and generates a C# program that generates the desired documents. One advantage of this approach is that you can use any data source for the document generation process. That code is also licensed under the Microsoft Reciprocal License.
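A heavily simplified sketch of the content-control idea, not the author's published code. It assumes (a hypothetical convention) that each content control's tag, w:tag/@w:val, holds the XPath expression. It evaluates that expression against the data XML via System.Xml.XPath, wrapped in string() so any result becomes text, and writes the value into the control's first text element. Reading and writing word/document.xml inside the .docx package is left out, and nested controls, repeating content and formatting are not handled.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ContentControlFiller
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // documentXml: the parsed word/document.xml part; data: the XML that drives generation.
    public static void Fill(XDocument documentXml, XDocument data)
    {
        foreach (var sdt in documentXml.Descendants(W + "sdt").ToList())
        {
            // Hypothetical convention: the control's tag value is an XPath into the data document.
            string xpath = (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val");
            var texts = sdt.Element(W + "sdtContent")?.Descendants(W + "t").ToList();
            if (string.IsNullOrEmpty(xpath) || texts == null || texts.Count == 0) continue;

            // string(...) yields the string value of the first matching node (or of any result).
            string value = (string)data.XPathEvaluate($"string({xpath})");

            // Put the value in the first text element and drop the placeholder's other fragments.
            texts[0].Value = value;
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
            foreach (var extra in texts.Skip(1)) extra.Remove();
        }
    }
}
```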
I currently do this exact thing.
If the document isn't very big, doesn't contain images and such, then I store it as an RTF with #MergeFields# in it and simply replace them with content, sending the result down to the user as an RTF.
For larger documents, including images and dynamically inserted images, I save the initial Word document as a Single Webpage *.mht file containing the #MergeFields# again. I then do the same as above. Using this, I can easily render a DataTable with some basic HTML table tags and replace one of the #MergeFields# with a whole table.
Images can be stored on your server and the url embedded into the document too.
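A minimal sketch of the #MergeFields# approach described in this answer: a regex replaces each #Name# token, escaping values for RTF, and a helper renders a DataTable as basic HTML table tags for the .mht variant. The MergeFields class is hypothetical. Two caveats: Word may split a token with formatting codes when it saves the RTF, so tokens are best typed in one go; and if the HTML part of the .mht file is quoted-printable encoded, inserted markup would need the same encoding (the table helper emits no attributes for that reason, though an "=" in cell data would still need care).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class MergeFields
{
    // Escapes plain text for RTF: \ { } are control characters, newlines become \par,
    // and non-ASCII chars become \uN? (N is the signed 16-bit code unit, '?' the fallback).
    public static string EscapeRtf(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c);
            else if (c == '\n') sb.Append(@"\par ");
            else if (c == '\r') continue;
            else if (c < 128) sb.Append(c);
            else sb.Append(@"\u").Append((short)c).Append('?');
        }
        return sb.ToString();
    }

    // Replaces each #FieldName# with its escaped value; unknown fields are left as they are.
    public static string Merge(string template, IDictionary<string, string> fields,
                               Func<string, string> escape) =>
        Regex.Replace(template, @"#(\w+)#", m =>
            fields.TryGetValue(m.Groups[1].Value, out var v) ? escape(v) : m.Value);

    // Renders a DataTable as a plain HTML table (no attributes), HTML-encoding all text.
    public static string ToHtmlTable(DataTable table)
    {
        var sb = new StringBuilder("<table><tr>");
        foreach (DataColumn col in table.Columns)
            sb.Append("<th>").Append(WebUtility.HtmlEncode(col.ColumnName)).Append("</th>");
        sb.Append("</tr>");
        foreach (DataRow row in table.Rows)
        {
            sb.Append("<tr>");
            foreach (DataColumn col in table.Columns)
                sb.Append("<td>").Append(WebUtility.HtmlEncode(row[col].ToString())).Append("</td>");
            sb.Append("</tr>");
        }
        return sb.Append("</table>").ToString();
    }
}
```

For the RTF case you would pass `MergeFields.EscapeRtf` as the escape function; for the .mht case the table HTML is already markup, so an identity function such as `s => s` fits, e.g. `MergeFields.Merge(mhtText, new Dictionary<string, string> { { "Orders", MergeFields.ToHtmlTable(ordersTable) } }, s => s)` (field and variable names are placeholders).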
Interestingly, the new Office 2007 file formats are actually zip files - if you rename the extension to .zip you can open them up and see their contents. This means you should be able to switch content such as images in and out using a simple C# zip library.
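A small sketch of that idea with the built-in System.IO.Compression classes, assuming .NET Core 3.0+ (DocxParts is a hypothetical name). It lists the parts of a package and swaps one part, such as an image under word/media/, for new bytes. The package stream must be readable, writable and seekable for the update. Replacing an image with one of a different format would also require a matching content type in [Content_Types].xml, which this does not touch.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxParts
{
    // Lists the parts (zip entries) stored inside a .docx/.xlsx/.pptx package.
    public static string[] ListParts(Stream package)
    {
        using var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);
        return zip.Entries.Select(e => e.FullName).ToArray();
    }

    // Replaces (or adds) a part, e.g. "word/media/image1.png", with new content.
    public static void ReplacePart(Stream package, string partName, byte[] newContent)
    {
        using var zip = new ZipArchive(package, ZipArchiveMode.Update, leaveOpen: true);
        zip.GetEntry(partName)?.Delete();
        using var entryStream = zip.CreateEntry(partName).Open();
        entryStream.Write(newContent, 0, newContent.Length);
    }
}
```

Typical use would be opening the document with `File.Open("report.docx", FileMode.Open, FileAccess.ReadWrite)` and calling `DocxParts.ReplacePart(stream, "word/media/image1.png", File.ReadAllBytes("chart.png"))`; the file and part names are placeholders.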
@Dale Ragan: That will work for the Office 2003 XML format, but that's not portable (as, say, .doc or .docx files would be).
To read/write those, you'll need to use the Word Object Library ActiveX control:
http://www.codeproject.com/KB/aspnet/wordapplication.aspx
@Danny Smurf: Actually, this article describes what would become the Office Open XML format, which Rob answered with. I will pay more attention to the links I post from now on to make sure they're not obsolete. I actually did a search on WordML, which is what it was called at the time.
I believe that the Office Open XML format is the best way to go.
LibreOffice also supports headless interaction via an API. Unfortunately, there's currently not much information about this feature yet. :(
You could also use Word document generator. It can be used for client-side or server-side deployment. From the project description:
"WordDocumentGenerator is an utility to generate Word documents from templates using Visual Studio 2010 and Open XML 2.0 SDK. WordDocumentGenerator helps generate Word documents both non-refresh-able as well as refresh-able based on predefined templates using minimum code changes. Content controls are used as placeholders for document generation. It supports Word 2007 and Word 2010."
Grab it: http://worddocgenerator.codeplex.com/
Download SDK: http://www.microsoft.com/en-us/download/details.aspx?id=5124
Another alternative is Windward Docgen (disclaimer - I'm the founder). With Windward you design the template in Word, including images, tables, graphs, gauges, and anything else you want. You can set tags where data from an XML or SQL datasource is inserted (including functionality like forEach loops, import, etc). And then generate the report to DOCX, PDF, HTML, etc.
