Markup Language To Pdf or Html

Is there a markup language that can be used together with a well-supported open source .NET project to generate PDF or HTML documents, with very fine control over the output in terms of style and anchoring for both?
Documents will be partly static and partly auto-generated from the XML comments of some class libraries.
To clarify the question: I know HTML is a markup language. The reason I don't want to use it to store the content directly is that all of the HTML-to-PDF tools and libraries I have looked at have patchy support for creating tables of contents and indexes, and for turning hyperlinks into PDF document anchors.

I would opt for HTML documents. Markdown comes to mind. But as far as 'very fine' control goes, you can always just use HTML. It is THE HyperText Markup Language, after all.

There have been many questions like this on Stack Overflow before. I think the consensus is that you should have one markup language rather than two.
HTML is, by definition (HyperText MARKUP LANGUAGE), the markup language of choice, and all you need to do is convert it to PDF. The other way around, from PDF to HTML, is quite a bit tougher.
There is a truckload of tools for converting HTML to PDF; which one fits depends on what exactly you need from the resulting PDF and what kind of CSS you need to support.
I'd always go for a rendering engine that's used in browsers (instead of something like iText or Prince), because you want to make sure your docs look like they do in a browser. You'd end up with Winnovative or something based on WebKit, like the API by htm2pdf.

XSL-FO is the recommended solution. It provides a great level of control over the document layout, and there are several tools for XSL-FO to PDF conversion.

Related

C#: How to convert HTML5/CSS3 into PDF document?

The title says what I want to do. I know it is possible to convert HTML to a PDF document using the very popular library iTextSharp. But what I learned from this post is that iTextSharp cannot render HTML5 and CSS3 styles correctly. Is there any free library that can do this?
Background:
I am using DevExtreme for report generation. It supports exporting charts to PDF, but my client wants some extra content in the PDF apart from the charts. DevExtreme does not support that, so I decided to write my own custom PDF exporter.
There are some libraries available, but I cannot rely on them since I can't predict what issues they will cause in production in the future. Correct me if I am wrong, but Microsoft provides no API for manipulating PDF files. We can create and manipulate Excel and Word files using Microsoft.Office.Interop.Excel.dll and Microsoft.Office.Interop.Word.dll, but I didn't find anything for PDF manipulation.
Please suggest what options I have.
Hope this makes sense!

A few years back I used iTextSharp to get our HTML manuals (in XHTML/CSS/wiki format) into PDF. It was... painful and a lot of work. So, the first news is: you will need quite a few weeks (two to four, depending on the level of perfection you want) if what you have is more than a few HTML pages.
If you only have a very limited number of pages, the quickest and dirtiest way is to take screenshots of your rendered pages and add those images to the PDF. Not very high-tech, but quickly done.
If your style sheets can be sacrificed and you do not care about the formatting of the content to be identical, you can convert your html5 pages to xhtml so you can load them as XmlDocuments. Then you simply create a program which does some mapping from xml elements, such as <h1>MyTitle</h1> to some section of code which creates a pdf entity using iTextSharp. Basically that was the way I did it in my case. I also did some mapping from css style classes to some specific pdf formatting, but not to the extreme.
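The element-to-PDF mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `IPdfSink`, `XhtmlToPdfMapper` and `RecordingSink` are hypothetical names that do not come from iTextSharp, the input is assumed to be well-formed XHTML without a DOCTYPE, and a real sink would wrap whatever calls your PDF library provides.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical abstraction over the PDF library (not an iTextSharp type).
// A real implementation would create the corresponding PDF entities.
public interface IPdfSink
{
    void AddHeading(int level, string text);
    void AddParagraph(string text);
}

public static class XhtmlToPdfMapper
{
    // Loads the XHTML as an XmlDocument and maps known elements to sink calls.
    public static void Render(string xhtml, IPdfSink sink)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        Walk(doc.DocumentElement, sink);
    }

    private static void Walk(XmlNode node, IPdfSink sink)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element) continue;
            switch (child.LocalName)
            {
                case "h1": sink.AddHeading(1, child.InnerText); break;
                case "h2": sink.AddHeading(2, child.InnerText); break;
                case "p": sink.AddParagraph(child.InnerText); break;
                // Unknown elements (body, div, ...) are treated as containers.
                default: Walk(child, sink); break;
            }
        }
    }
}

// Stand-in sink that only records the calls, so the mapping can be checked
// without any PDF library installed.
public sealed class RecordingSink : IPdfSink
{
    public readonly List<string> Calls = new List<string>();
    public void AddHeading(int level, string text) { Calls.Add("h" + level + ":" + text); }
    public void AddParagraph(string text) { Calls.Add("p:" + text); }
}
```

Mapping CSS classes to PDF formatting would fit into the same switch, for example by reading the class attribute of each element before calling the sink.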
Converters from HTML (or XML) to TeX/LaTeX are also worth trying. If you are lucky, you will find one which does a good enough job. Then you can use pdftex and get your PDF.
It may also be possible to print your documents to an XPS printer and then convert the XPS to PDF. Or you simply convince your customer that XPS is what they want.

Generating wide html pages to PDFs

I'm using C# .NET and want to convert an arbitrarily sized HTML table into a PDF. I have tried ExpertPDF and Essential Objects HTMLtoPDF, but they seem to mess up on very wide tables.
Any tools or ideas to fix this?

These tools use HTML rendering engines, like a regular browser. They take the rendered content and write it into a PDF. They don't know how to handle wide content (except by scaling it to fit onto a PDF page).
You need to change your HTML (create a printer-friendly version of your page that is not that wide) and feed the HTML-to-PDF converter something that it can chew.

Converting XSL-FO to HTML

I have a set of XSL-FO documents which are used for PDF generation. I also have a requirement to export the same output data (which is in the PDFs) as an HTML file. Further, the HTML needs to have styles similar to those in the PDF.
Is there any way to convert XSL-FO to XHTML using C#?
NOTE: I know one option is to use "RenderX:FO2HTML". But since it's a commercial product, I would like to learn about any other options available and compare them before going further.

I use the RenderX fo2html stylesheet a lot, and I recommend it to my customers because it is zero cost. Thus I have built it into a number of client solutions. You have to go through the RenderX online store to get it, but it costs nothing.

Write or find an XSLT stylesheet which converts XSL-FO into XHTML, and modify it if necessary to get the rendering you require. A web search for "XSL-FO to HTML" finds at least one such stylesheet.
Though this is somewhat backward. Normally the document starts in some semantic markup language (such as XHTML), and a stylesheet converts it into XSL-FO for rendering.
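Since the question asks about C#, here is a minimal sketch of applying such a stylesheet with the XslCompiledTransform class from the .NET base class library. The class name `FoToXhtml` and the tiny `DemoStylesheet` are made up for illustration; the demo stylesheet only turns fo:block into a p element and is nowhere near a complete XSL-FO to XHTML conversion.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class FoToXhtml
{
    // Illustrative stylesheet only: maps fo:block to <p> and lets the
    // built-in XSLT rules pass text through. A real one handles far more.
    public const string DemoStylesheet = @"
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'
    exclude-result-prefixes='fo'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='fo:block'>
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the given XSLT stylesheet to an XSL-FO document and
    // returns the result as a string.
    public static string Transform(string stylesheetXml, string foXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        using (var input = XmlReader.Create(new StringReader(foXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling FoToXhtml.Transform with a stylesheet you found or wrote, plus the contents of one of the existing XSL-FO files, gives the XHTML as a string that can then be written to disk.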

Server Side HTML to PDF

I'm trying to find a C# library that will allow me to "print" one of my HTML pages to a PDF file. I can't seem to find out whether one currently exists. I've found several that will let you build a page, but haven't found one that would generate the PDF based only on HTML.
EDIT: I have no budget for this at work, so it will need to be an open source/free product. If there is none, I'm aware of iTextSharp and will have to generate the PDF programmatically (which is what I'm hoping to avoid :) )

I've had a lot of luck with ActivePDF WebGrabber. It's kind of odd to use compared to standard managed libraries (ActivePDF is unmanaged), but it gets the job done.

iTextSharp comes with a little companion: XML Worker
For a demo, have a look here
Even though the documentation refers to the Java API, the adaptation to C# should be straightforward.

I've experimented with iTextSharp and it works for basic conversion, but it gets complicated when you get into styles and formatting. I've also heard wkhtmltopdf is out there as another option.
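For what it's worth, wkhtmltopdf is a command-line tool, so from C# one way to use it is to start it as an external process. The sketch below assumes wkhtmltopdf is already installed on the server and uses its basic "input file, then output file" invocation. `WkHtmlToPdfRunner` is a made-up helper name, and the executable path you pass in depends on your installation.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfRunner
{
    // Builds the argument string: quoted input path followed by quoted output path.
    public static string BuildArguments(string inputHtml, string outputPdf)
    {
        return "\"" + inputHtml + "\" \"" + outputPdf + "\"";
    }

    // Runs wkhtmltopdf and returns its exit code (0 normally means success).
    // exePath is an assumption: point it at wherever the tool is installed.
    public static int Convert(string exePath, string inputHtml, string outputPdf)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(inputHtml, outputPdf),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

The input can be a local file or a URL. On a web server it is worth writing the output to a temporary file and checking the exit code before streaming the PDF back.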

Lightly styled text library for WPF?

Does anyone know of a lightly-marked-up-text to styled-text formatting library (i.e. something like Markdown# or Textile.NET), but one which produces a native XAML document (or rather, a FlowDocument model or similar that can be displayed directly in a WPF app), to avoid the use of a WebBrowser?
Bonus points for something lightweight. I'm hoping for something that will tolerate very frequent updates in the source text.
Alternatively, is there a lightweight HTML rendering control that can be used in WPF? (I don't consider the standard WebBrowser to be lightweight.)

I don't know of such a library pre-built, but I do have some thoughts for you that may be helpful.
The first big question in my mind is why you want to use something primitive like Markdown when you could be using RichTextBox. Markdown is required for StackOverflow and similar sites because of the limitations of the browser. But if your app is WPF this is not an issue.
One guess as to why you might want to do this is that you want your documents to be editable both in WPF and in a lowest-common-denominator web application. In that case you will need an engine that renders the markdown to HTML anyway, so why not leverage that same engine to convert the markdown to XAML?
Converting arbitrary HTML to XAML is very difficult, but converting the sort of HTML that a Markdown converter would spit out is another matter entirely. Most Markdown-style converters spit out only a few simple HTML tags, all of which are trivially convertible to equivalent XAML.
If you use a Markdown-to-HTML converter, it will have done all of the really heavy lifting for you (parsing the text, etc.) and left you with an XML-like document (HTML, to be precise) that is relatively easy to parse. Also, if you are using the Markdown-to-HTML converter elsewhere, you can be confident that your Markdown syntax is parsed exactly the same way for both HTML and XAML use, because it will be the same parser in each case.
So basically what I am thinking is:
string html = MarkdownEngine.MarkdownToHtml(markdown);
string xaml = MarkdownHtmlToXamlTranslator.HtmlToXaml(html);
Where you design your implementation of MarkdownHtmlToXamlTranslator around whatever the markdown engine actually spits out. It could be a very simple XSLT, or you could use LINQ to XML along with XDocument construction techniques. Either way it should be a very small bit of code.
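To make the LINQ to XML variant a bit more concrete, here is a rough sketch of what MarkdownHtmlToXamlTranslator might look like. It assumes the Markdown engine emits well-formed, XML-parsable HTML (closed tags, no named entities such as &nbsp;) and handles only a handful of tags; the tag list, font sizes and font family are arbitrary choices for illustration, not something any particular Markdown engine requires.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MarkdownHtmlToXamlTranslator
{
    private static readonly XNamespace Xaml =
        "http://schemas.microsoft.com/winfx/2006/xaml/presentation";

    // Converts the simple HTML a Markdown engine emits into FlowDocument XAML.
    public static string HtmlToXaml(string html)
    {
        // Wrap in a root so a sequence of block elements parses as one document.
        XElement root = XElement.Parse("<root>" + html + "</root>",
                                       LoadOptions.PreserveWhitespace);
        var flowDocument = new XElement(Xaml + "FlowDocument",
            root.Elements().Select(ConvertBlock));
        return flowDocument.ToString(SaveOptions.DisableFormatting);
    }

    // Block-level mapping: headings become styled paragraphs, and anything
    // else (including p) becomes a plain Paragraph.
    private static XElement ConvertBlock(XElement e)
    {
        var paragraph = new XElement(Xaml + "Paragraph");
        switch (e.Name.LocalName)
        {
            case "h1":
                paragraph.Add(new XAttribute("FontSize", "24"),
                              new XAttribute("FontWeight", "Bold"));
                break;
            case "h2":
                paragraph.Add(new XAttribute("FontSize", "18"),
                              new XAttribute("FontWeight", "Bold"));
                break;
        }
        paragraph.Add(e.Nodes().Select(ConvertInline).Where(n => n != null));
        return paragraph;
    }

    // Inline mapping: em/strong/code become Italic/Bold/monospace Span.
    private static XNode ConvertInline(XNode node)
    {
        if (node is XText text) return new XText(text.Value);
        if (!(node is XElement e)) return null; // skip comments and the like

        XElement result;
        switch (e.Name.LocalName)
        {
            case "em": case "i": result = new XElement(Xaml + "Italic"); break;
            case "strong": case "b": result = new XElement(Xaml + "Bold"); break;
            case "code":
                result = new XElement(Xaml + "Span",
                    new XAttribute("FontFamily", "Consolas"));
                break;
            default: result = new XElement(Xaml + "Span"); break;
        }
        result.Add(e.Nodes().Select(ConvertInline).Where(n => n != null));
        return result;
    }
}
```

In a WPF app the resulting string could be loaded with XamlReader.Parse and assigned to a FlowDocumentScrollViewer or RichTextBox. Lists, links and images would need further cases along the same lines.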
