C#: How to convert HTML5/CSS3 into PDF document? - c#

It's obvious from the title what I want to do. I know it is possible to convert HTML to a PDF document using the very popular library iTextSharp. But what I learned from this post is that iTextSharp cannot render HTML5 and CSS3 styles correctly. Is there any free library that can do this?
Background:
I am using DevExtreme for report generation. It supports exporting charts to PDF, but my client wants some extra content in the PDF apart from the charts. DevExtreme does not support that, so I decided to write my own custom PDF exporter.
There are some libraries available, but I cannot rely on them since I can't predict in advance what issues they will cause in production. Correct me if I am wrong, but Microsoft provides no API for manipulating PDF files. We can create and manipulate Excel and Word files using Microsoft.Office.Interop.Excel.dll and Microsoft.Office.Interop.Word.dll, but I didn't find anything for PDF manipulation.
Please suggest what options I have.
Hope this makes sense..!

A few years back I used iTextSharp to convert our HTML manuals (XHTML/CSS/wiki) to PDF. It was... painful and a lot of work. So the first piece of news is: you will need quite a few weeks (two to four, depending on how polished the result must be) if you have more than a handful of HTML pages.
If you only have a very limited amount of pages, the quickest and dirtiest way is to make screenshots from your rendered pages and add those images to the pdf. Not very high-tech but quickly done.
If your style sheets can be sacrificed and you do not care about the formatting of the content to be identical, you can convert your html5 pages to xhtml so you can load them as XmlDocuments. Then you simply create a program which does some mapping from xml elements, such as <h1>MyTitle</h1> to some section of code which creates a pdf entity using iTextSharp. Basically that was the way I did it in my case. I also did some mapping from css style classes to some specific pdf formatting, but not to the extreme.
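The element-to-PDF mapping described above might look roughly like the sketch below. It is not the code I actually used, just a minimal illustration under some assumptions: iTextSharp 5.x is referenced, the input is well-formed XHTML without a DOCTYPE or named entities such as &nbsp;, and only h1, h2 and p are mapped. The class name XhtmlToPdf and the font sizes are made up for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class XhtmlToPdf
{
    // Hypothetical mapping table: XHTML element name -> font used in the PDF.
    // A real converter would also map CSS classes, lists, tables, images, etc.
    static readonly Dictionary<string, Font> Fonts = new Dictionary<string, Font>
    {
        { "h1", FontFactory.GetFont(FontFactory.HELVETICA, 18f, Font.BOLD) },
        { "h2", FontFactory.GetFont(FontFactory.HELVETICA, 14f, Font.BOLD) },
        { "p",  FontFactory.GetFont(FontFactory.HELVETICA, 10f, Font.NORMAL) },
    };

    public static void Convert(string xhtmlPath, string pdfPath)
    {
        // Assumes the file parses as XML (no DOCTYPE, no HTML-only entities).
        XDocument xhtml = XDocument.Load(xhtmlPath);

        using (var output = new FileStream(pdfPath, FileMode.Create))
        {
            var pdf = new Document(PageSize.A4);
            PdfWriter.GetInstance(pdf, output);
            pdf.Open();

            // Walk the elements in document order and emit one PDF paragraph
            // per mapped element; unmapped elements are skipped.
            foreach (XElement element in xhtml.Descendants())
            {
                Font font;
                // LocalName ignores the XHTML namespace, if one is declared.
                if (Fonts.TryGetValue(element.Name.LocalName, out font))
                {
                    pdf.Add(new Paragraph(element.Value, font) { SpacingAfter = 6f });
                }
            }

            // Note: iTextSharp throws here if nothing was added to the document.
            pdf.Close();
        }
    }
}
```

Calling XhtmlToPdf.Convert("manual.xhtml", "manual.pdf") (both file names are placeholders) would write one paragraph per heading or text block. Everything the table does not know about is silently dropped, which is exactly the "sacrifice your style sheets" trade-off.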
Also worth trying are converters from HTML (or XML) to TeX/LaTeX. If you are lucky, you will find one that does a good enough job. Then you can use pdftex to get your PDF.
Also, it is possible that you can print your documents to an xps printer and then convert the xps to pdf. Or you simply convince your customer that xps is what they want.

Related

How can I add accessibility tagging to non-tagged PDF with iTextSharp using PDF content nodes?

Per "Can I fully tag a non tagged PDF using iTextSharp?", I know we can't get perfect tagging compared to a human or to tagging the PDF as it's being created (the best options), but what about taking the PDF content objects and just doing a "best effort" semantic tagging to improve readability?
The basic principle, I would think, could be to sort all PDF content objects left to right, top to bottom, and then, for each text node, create simple P tags in that order so that they're at least spoken. If there are form objects intermixed, then tag those as well. Obviously if it's all paths and artifacts, there isn't much you can do with it, but plenty of PDFs have text nodes. I can't rely on Adobe Reader trying to determine reading order.
For example, the content structure of a PDF is a simple example that has Text content nodes that could be tagged. We can't control the source PDF generation, but need to manipulate the PDF by adding headers/footers, etc and want it all tagged together.
How can we achieve this with iTextSharp? We have the commercial version 5.5.10.0.
For example, abcpdf has a function called MakeAccessible that attempts this and works fairly well. However, we want to use iTextSharp.

Converting XSL-FO to HTML

I have a set of XSL-FO documents which are used for PDF generation. I also have a requirement to export the same output data (which is in the PDF) as an HTML file. Further, I need the HTML to have styling similar to the PDF.
Is there any way to convert XSL-FO to XHTML using C#?
NOTE : I know one option is to use "RenderX:FO2HTML". But since it's a commercial product, I would like to learn about any other options available and do a comparison before continuing further.
I use the RenderX fo2html stylesheet a lot, and I recommend it to my customers because it is zero cost. Thus I have built it into a number of client solutions. You have to go through the RenderX online store to get it, but it costs nothing.
Write or find an XSLT stylesheet which converts XSL-FO into XHTML, and modify it if necessary to get the rendering you require. A web search for "XSL-FO to HTML" finds at least one such stylesheet.
Though this is somewhat backward. Normally the document starts in some semantic markup language (such as XHTML), and a stylesheet converts it into XSL-FO for rendering.
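For the C# side of the XSLT approach, a minimal sketch could look like the following. It uses the built-in XslCompiledTransform class (XSLT 1.0). The embedded stylesheet is a toy written for this example, not the RenderX one: it only turns fo:block into a p element and copies a few formatting attributes into an inline style. The class name FoToXhtml and the choice of attributes are assumptions.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class FoToXhtml
{
    // Toy stylesheet (an assumption for illustration): each fo:block becomes
    // a <p>, and a handful of FO attributes are copied into a CSS style.
    const string Stylesheet = @"
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'
    exclude-result-prefixes='fo'>
  <xsl:output method='xml' indent='yes' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <html xmlns='http://www.w3.org/1999/xhtml'>
      <body><xsl:apply-templates select='//fo:flow'/></body>
    </html>
  </xsl:template>
  <xsl:template match='fo:block'>
    <p xmlns='http://www.w3.org/1999/xhtml'>
      <xsl:attribute name='style'>
        <xsl:for-each select='@font-size | @font-weight | @text-align | @color'>
          <xsl:value-of select='concat(name(), "":"", ., "";"")'/>
        </xsl:for-each>
      </xsl:attribute>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string foXml)
    {
        // Compile the stylesheet once; in real code, load it from a file.
        var xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(sheet);
        }

        // Run the transform, honouring the xsl:output settings above.
        var result = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(foXml)))
        using (XmlWriter writer = XmlWriter.Create(result, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return result.ToString();
    }
}
```

Swapping the toy stylesheet for a complete FO-to-XHTML stylesheet should only require changing what is passed to Load, provided that stylesheet is XSLT 1.0, since XslCompiledTransform does not support XSLT 2.0.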

Markup Language To Pdf or Html

Is there a markup language that can be used in conjunction with a well-supported .NET open source project to generate PDF or HTML documents, with very fine control over the output in terms of style and anchoring for both?
Documents will be partly static and partly auto-generated from the XML comments of some class libraries.
To clarify the question: I know HTML is a markup language. The reason I don't want to use it to store the content directly is that all of the HTML-to-PDF tools and libraries I have looked at have patchy support for creating tables of contents and indexes and for turning hyperlinks into PDF document anchors.
I would opt for HTML documents. Markdown comes to mind. But as far as "very fine" control goes, you can always just use HTML. It is THE HyperText Markup Language, after all.
There have been many questions like this before on Stack Overflow. I think the consensus is that you should have one markup language, rather than two.
HTML is - by definition (hypertext MARKUP LANGUAGE) - the markup language of choice and all you need to do is convert that to PDF. The other way around, from PDF to HTML is quite a bit tougher.
In order to convert HTML to PDF there's a truckload of tools, depending on what exact needs you have for the resulting PDF and what kind of CSS you need to support.
I'd always go for a rendering engine that's used in browsers (instead of something like iText or Prince), because you want to make sure your docs look like they do in a browser. You'd end up with Winnovative or something based on WebKit like the API by htm2pdf.
XSL-FO is the recommended solution. It provides a great level of control over the document layout and there are several tools for XSL-FO to PDF conversion.

Generate PDF/ Microsoft Word Reports using ASP.NET

I want to ask for the best way to generate PDF and Microsoft Word documents using ASP.NET.
I have used XSLT transformation, but the results were not good, and the majority of XSLT processors are commercial, not free.
I need to create a simple document that has a header, a footer, and some tables and images.
Can anyone point me to the best technology for this job?
Thanks
I had this question a little while ago.
I wrote some really neat stuff for PDF generation.
iTextSharp or XSL-FO to create a PDF dynamically with fillable forms?
PM me and I can send you some files.
iText is a good free library for creating PDF documents: http://itextpdf.com. Works great with both WinForms and ASP.NET.

Wanted: ASP.NET control to view/print PDF, TIFF, possibly more?

I'm looking for an ASP.NET control that will allow viewing and printing of PDF and TIFF files within a web form. I'm willing to use more than one control if needed (one for PDF, one for TIFF, shown or hidden based on file extension), but I have not been able to find a good TIFF viewer.
Files are stored on our LAN in a shared folder, and this application is an intranet site.
Open source / free licensing preferred, but I'm willing to look at paid options as well.
http://www.alternatiff.com/ is one of the viewers that I've seen used for this type of viewing of tiffs.
You can get a free licence of ABCPDF (provided you link back to their site) which will do the conversion from TIFF to PDF for you as per @Chris Lively's suggestion.
It'll also do conversion from PDF to TIFF if you decide to do things backwards.
It makes sense to present the content in a common format. If you wanted to you can embed the PDF in the browser to create the 'seamless' experience you're looking for using something like PDFObject.
As @BenCr says though, PDF is a really common format and the tools already exist to open and work with them, so introducing new ways to perform existing tasks could actually end up complicating matters unnecessarily.
I'm in total agreement with @BenCr on this.
Viewing PDFs is an extremely common thing to do. This isn't a "technical" issue by any stretch.
It sounds like you have some type of faxing solution in place that is creating these documents. Most likely multi-page TIFF and PDFs.
If this is the case you might want to just convert the TIFFs to PDFs to begin with and run everything through Adobe's pdf reader. Every online fax solution does this.
You could try http://issuu.com/ and they appear to have an API too if you want to go that deep.
We used the Seadragon control to do this. I think it was overkill and we should have just rolled our own; that would have been cheaper than integrating it. TIFFs and PDFs are converted to PNG on the server side. I don't think you can do better than that, especially with PDFs (assuming you don't want to use Acrobat Reader to display them). Convert PDFs to PNG using Xpdf/Poppler.
How about using Google Docs Viewer?
EDIT: Probably not working, since the viewer has to read the document from your URL; when it's on the Intranet, this won't work.
If you can mess about with MIME types, mainly by serving the .tiff files with an application/pdf MIME type, you should be able to get Acrobat to open TIFF files directly, by effectively fooling the browser into opening TIFF files with Acrobat. Then all you need is a trusty old iframe to get a familiar UI with print buttons.
