Server Side HTML to PDF

I'm trying to find a C# library that will let me "print" one of my HTML pages to a PDF file, but I can't tell whether one currently exists. I've found several that let you build a page programmatically, but none that clearly generates the PDF directly from HTML.
EDIT: I'm not allowed a budget for this at work, so it needs to be an open source/free product. If there isn't one, I'm aware of iTextSharp and will have to generate the PDF programmatically (which is what I'm hoping to avoid :) )

I've had a lot of luck with ActivePDF WebGrabber. It's kind of odd to use compared to standard managed libraries (ActivePDF is unmanaged), but it gets the job done.

iTextSharp comes with a little companion: XML Worker.
For a demo, have a look here
Even though the documentation refers to the Java API, the adaptation to C# should be straightforward.

I've experimented with iTextSharp and it works for basic conversion, but it gets complicated once you get into styles and formatting. I've also heard of wkhtmltopdf as another option.
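Since wkhtmltopdf is a command line tool, one way to use it from C# is to start it as a child process. The following is only a minimal sketch under some assumptions: the wkhtmltopdf executable is installed and on the PATH of the account running the server, the HtmlToPdf class and its method names are hypothetical, and the file names are placeholders.

```csharp
using System;
using System.Diagnostics;

public static class HtmlToPdf
{
    // Builds the command line "wkhtmltopdf <input> <output>", quoting both paths.
    public static string BuildArguments(string input, string outputPdf)
    {
        return "\"" + input + "\" \"" + outputPdf + "\"";
    }

    // Runs wkhtmltopdf and returns its exit code (0 normally means success).
    // Assumption: "wkhtmltopdf" can be resolved through the PATH.
    public static int Convert(string input, string outputPdf, int timeoutMs = 60000)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "wkhtmltopdf",
            Arguments = BuildArguments(input, outputPdf),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Do not let a hung conversion block the server forever.
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("wkhtmltopdf did not finish in time.");
            }
            return process.ExitCode;
        }
    }
}
```

A call such as HtmlToPdf.Convert("page.html", "page.pdf") would then write the PDF next to the HTML file. The input can also be a URL instead of a local file.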

Related

Does MuPdf library have unicode or text search functionality?

Background
I am working on a WPF Windows application and I want to add an embedded PDF viewer with only basic functionality: PDF viewing, text search and page navigation.
I tried the embedded Internet Explorer approach with Adobe PDF Reader installed (this way), but it does not suit our requirements because Adobe PDF Reader has too many external links, which cannot be allowed for security reasons.
Therefore, I am trying to use the moonpdf library. It works fine for our requirements, but the only problem is that it has no text search functionality. (I think it renders PDF pages as images.)
I then downloaded the moonpdf source code and realized that moonpdf wraps libmupdf.dll for C#.
I can modify the moonpdf source code and mupdf source code for our requirement if needed.
My Question
Is there any text search functionality in mupdf? If so, how can I use it?

In the basic mupdf library, there are several functions for searching for text. They come in a few different variants, each of which searches a page for a text string and returns the areas of all hits of the given text. You need to iterate over the pages yourself (in order to do a forward or reverse search).
fz_quad hits[1000];
/* count is the number of matches on this page; hits receives their quads */
int count = fz_search_page(ctx, page, needle, hits, nelem(hits));
That said, I do not know how or even if "moonpdf" has wrapped these functions.

You can certainly extract the text from a document, the MuPDF library will do that. I believe it's up to you to apply your own search criteria after that. I'm afraid I'm not expert enough to answer the 'how to' part of it though. I imagine one of the mutool examples would be helpful here though. I'll see if I can get one of the developers to answer.

How to export data in Groupbox into PDF?

I have built a very simple Windows Forms application using C#, but I am stuck and could not find any help anywhere.
My application queries some information and shows it in a groupbox that contains textboxes, labels and buttons. I would like to export or convert the data in the groupbox into a PDF.
Is there a way to implement that?
Thank you so much in advance~

You may want to have a look at http://www.pdfsharp.com/PDFsharp/. It's a pretty good framework for what you need.
Good luck

There are many options. The following SO link provides a solution that appears to work:
convert windows form to pdf file
Another solution:
Use the following code to take a screenshot:
http://www.developerfusion.com/code/4630/capture-a-screen-shot/
Then use PDFsharp to save it as a PDF.

There's really a ton of different solutions for this, but you will almost certainly need a 3rd party library for what you're trying to do. Google around for a PDF library for C#, as there are several.
EDIT: some possible solutions may be PDFjet, PDFsharp, ABCPDF (do note, if this is part of a commercial application, make sure that the licensing allows that use)

How to convert rtf or docx to PDF server-side (without having OpenType fonts rasterized or substituted)

Does anyone know of a tool that permits rtf/doc/docx conversion to PDF from C#?
These documents contain all sorts of fonts (TrueType or OpenType), and I am looking for a tool that will not rasterize the OpenType ones and will render them fairly accurately.
Many thanks in advance for any pointers!

Have a look at Aspose.Words.
It works server side and does not require Office Automation.
I've used it in a commercial project and it works really well. But it's a commercial component, so you need to pay for a license (which is not that cheap).
EDIT
Please have a look at this thread in the Aspose forum

I've used iTextSharp myself in a C# project recently, and it worked pretty well.
Unfortunately, I'm not able to give you any example, as that would make a huge post (and the code is pretty confidential) :)
But here's the link to their site, if that helps:
http://itextpdf.com/
Hope this fits your needs!

Batch conversion of docx to clean HTML

I'm starting to wonder if this is even possible. I've searched for solutions on Google and come up with nothing that works exactly how I'd like it to.
It would help to explain what I'm after. I work for the database group at my university's IT department. My main job is to take the specs of a report in a docx file, copy them over to Dreamweaver, fix some formatting, and put the result onto their website. My issue is that it's ridiculously tedious to do this over and over. I figured, hey, I haven't written anything in C# for some time now, perhaps I could write an application to grab a docx file, convert it to HTML, fix the CSS, stick the header and footer from the webpage on there, and save the result. I originally planned to have it process files one by one, but it probably wouldn't be difficult to have it take a list of files and batch convert them.
I've found these relevant topics on how to accomplish this, but they don't fit my needs well enough.
http://www.techrepublic.com/blog/howdoi/how-do-i-modify-word-documents-using-c/190
This is probably fine for a few documents, but since it's just automating an instance of Word, I feel like it'd be slow and memory intensive. I'd prefer to avoid opening and closing an instance of Word 50+ times.
http://openxmldeveloper.org/articles/333.aspx
This is what I started using. XSLT has the benefit of not needing Word to be installed or run for each file. After some searching I got a proof of concept working. It takes in a docx file, decompresses it, grabs the document.xml from that, and applies the DocX2Html.xsl file I scavenged from OpenXML viewer. I believe that was originally provided by MS for SharePoint servers so they could render Word documents in a browser. Or something along those lines.
After adjusting that code to fit my needs, and having issues with the objXSLT.Load() method, I ended up using ILMerge to make the XSL into a DLL. No idea why I kept getting a compile error when using the plain old XSL file, but the DLL worked fine, so I was satisfied. Here (http://pastebin.com/a5HBAakJ) is my current code. It does the job of converting docx to HTML just fine (other than random spaces between some words), but the resulting file has ridiculously ugly HTML. An example of this monstrosity can be found here (http://pastebin.com/b8sPGmFE).
Does anyone know how I could remedy this? I'm thinking perhaps I need to make a new XSL file, as the one MS provided is what's responsible for sticking all those tags and extra code in there. My issue with that is that I don't know anything about how to do that. Perhaps there's an alternative version already out there. All I'd need is one that will preserve tables and text formatting. Images aren't needed.
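For reference, the unzip-and-transform pipeline described above can be sketched roughly as follows. This is not the code behind the pastebin link, only a minimal sketch: the DocxTransform class and its method names are hypothetical, it assumes .NET 4.5 or later (for System.IO.Compression), and it takes the stylesheet as a string rather than as a compiled DLL.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Xsl;

public static class DocxTransform
{
    // A .docx file is a ZIP archive; the main body lives in word/document.xml.
    public static string ReadDocumentXml(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }

    // Applies an XSLT stylesheet (passed as text) to the document XML.
    public static string ApplyXslt(string documentXml, string xsltText)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltText)))
            transform.Load(xsltReader);

        using (XmlReader input = XmlReader.Create(new StringReader(documentXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For batch conversion, the stylesheet could be loaded once and reused for every file instead of being reloaded per document.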

This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx
The author Eric White blogged about his experiences developing that tool. You can see that list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml

Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:
Open the Word document with Aspose.Words.
Save the Word document as HTML.
Use something like SgmlReader or HTML Agility Pack (or even Regular Expressions if it is suitable) to remove unwanted HTML tags/attributes.
Since you wrote that you work at a university, I'm not sure whether commercial packages are an option, though.
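The last step (removing unwanted tags/attributes) could look roughly like the following if regular expressions turn out to be suitable. This is only a sketch: the HtmlCleaner class is hypothetical, the list of attributes and tags to drop is an assumption about what Word-generated HTML typically contains, and a real parser such as HTML Agility Pack is safer for anything beyond simple markup.

```csharp
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Strips presentational attributes and unwraps <span>/<font> tags.
    // Assumption: attribute values are quoted, as in typical generated HTML.
    public static string Clean(string html)
    {
        // Remove style, class and lang attributes together with their values.
        string result = Regex.Replace(
            html,
            @"\s+(?:style|class|lang)\s*=\s*(?:""[^""]*""|'[^']*')",
            "",
            RegexOptions.IgnoreCase);

        // Remove <span> and <font> open/close tags but keep their content.
        result = Regex.Replace(
            result,
            @"</?(?:span|font)\b[^>]*>",
            "",
            RegexOptions.IgnoreCase);

        return result;
    }
}
```

Tables and basic text formatting (<b>, <i>, <table> and so on) are left alone, since only the listed attributes and wrapper tags are touched.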

Hi not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.
I am a web developer who had the same issues, so I created my own tool:
http://www.convertwordtohtml.com
We are also working on a new version that will have even better conversion quality and one-click conversion, e.g. you can right click on a Word file and it will be converted directly to HTML and the code placed into the clipboard. The current version also supports command line access and the new version will have a server version too.
There is a free trial version downloadable from the site, and if you have any questions do contact me any time.

How to generate an index at end of a PDF file?

Given an existing PDF document, I would like to tack on an index to the end of the file to show the pages on which key words show up. It would be best if I don't have to give a list of words to look for and the list of words is automatically generated. However, if a list of words must be given, I can work with it. I'm looking to do this either through a C# library or a command line tool. It needs to run as part of another command line app.
Is there anything out there that is capable of this?
This "PDF Index Everthing" (http://www.pdfstore.com/details.asp?ProdID=799) seems to be on the right track, but requires interaction through its GUI.

I don't actually have a C# solution but hopefully this will still help...
pdflib is an excellent PDF development library. It is one of the better libs available. As far as I know it doesn't have a C# binding. PDF is a random access object-based file format, and although there are many libraries that allow creating PDFs, most freely available libs don't support adding pages to existing PDFs. pdflib does support adding pages with its PDI option, so it may be worth checking out.
Updated Info:
Check out the iText# library and
merging pdf files with C# and iText
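Whichever library ends up extracting the text and appending the pages, the index itself is just a mapping from each key word to the pages it occurs on. A minimal sketch of that part, assuming the text of each page has already been extracted into a list of strings and that a list of key words is given (PdfIndexBuilder and its method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfIndexBuilder
{
    // pageTexts[i] is the extracted text of page i + 1.
    // keywords should be a set with a case-insensitive comparer
    // if "Invoice" and "invoice" are to be treated as the same word.
    public static SortedDictionary<string, SortedSet<int>> Build(
        IList<string> pageTexts, ISet<string> keywords)
    {
        var index = new SortedDictionary<string, SortedSet<int>>(
            StringComparer.OrdinalIgnoreCase);

        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Split the page into runs of letters (any Unicode letter).
            foreach (Match match in Regex.Matches(pageTexts[i], @"\p{L}+"))
            {
                string word = match.Value;
                if (!keywords.Contains(word))
                    continue;

                SortedSet<int> pages;
                if (!index.TryGetValue(word, out pages))
                {
                    pages = new SortedSet<int>();
                    index[word] = pages;
                }
                pages.Add(i + 1); // page numbers are 1-based
            }
        }
        return index;
    }

    // Formats one index line, for example "total: 1, 2".
    public static string FormatEntry(string word, IEnumerable<int> pages)
    {
        return word + ": " + string.Join(", ", pages);
    }
}
```

The formatted entries would then be written onto new pages appended to the document. Generating the word list automatically would need an extra step, such as counting word frequencies and dropping common stop words.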
