.NET server based PDF generation - c#

I'd like to dynamically generate content and then render it to a PDF file. This processing would take place on a remote hosting server, so using virtual printers etc. is out. Does anyone have a recommendation for a .NET library (preferably C#) that would work?
I know that I could generate a bunch of PS code and package it myself but I'd prefer something a little less tricksy at this stage.
Thanks!

Have a look at http://itextsharp.sourceforge.net/. It's open source.
Tutorial: http://itextdocs.lowagie.com/tutorial/

I have had good success using SharpPDF.

I have had success using Siberix
http://www.siberix.com/
Corporate License: $350 USD (A single license covers unlimited number of company's developer seats, unlimited number of company's web servers and unlimited number of distributions as a part of your application.)

Free PDF Generator .NET (WkHtmlToPdf wrapper) can generate a good-looking PDF from an HTML template with one line of code:
var pdfBytes = (new NReco.PdfGenerator.HtmlToPdfConverter()).GeneratePdf(htmlContent);
(all you need is one DLL, no external dependencies)

We use the Amyuni PDF Converter and have used it successfully for several years. Our usage is via the COM interface, but it does support a .NET interface.

I've had good experiences with Winnovative's HTML to PDF.
And bad ones with Open Source HTML Doc (Problems with form elements + CSS).

I have been looking for a high performing docx to pdf tool for a while now. Our system has an e-government aspect and is generating a very high number of reports to the user community. At this point, performance is paramount.
Earlier tools I have used did not do simultaneous conversion, instead each exe needed to wait for completion of the other. I have tried Aspose.words and I am very happy with the results.
First of all, it was very easy and seamless to integrate and deploy in our project. Very smooth.
Secondly, the speed of conversion is way better due to the fact that multiple jobs run in parallel.
Thirdly, not only fast, but even with no formatting errors. Considering that we are providing a multi-lingual system and some reports include both English and Arabic fields (mind right-to-left alignment!), this was very important.
And finally, the file size was quite small, which again is very important as tens of thousands of documents are created through our system.
Our first implementation used the Microsoft Office Interop library. We converted docx documents to PDF using the code below. The library converted the documents perfectly, so we deployed it to the report generation server. After a while, however, we noticed that each conversion was waiting for the other executables to finish. This caused a big delay when converting documents at the same time, which is why we started searching for a new tool to convert docx files to PDF.
See Image
The code below shows how to convert docx documents to PDF files using Aspose.Words for .NET.
See Image 2

RDLC & the Report Viewer controls can generate PDF either at the Client's discretion or at server command which can then be served as a PDF mime-type.

I've used PDF4NET from O2solutions with much success. They support all sorts of scenarios and digital signing of the pdf.

If your data is mostly in XML, you could also look at a XSL-FO solution - we're using Alt-Soft's Xml2Pdf with great success. The "server" version is a bit of a misnomer - it's really just a single DLL you need to include in your Winforms, WPF or ASP.NET app - that's all!
Works like a charm (if you're familiar with XSLT and XSL-FO, or willing to learn it).
Marc

We used a set of third party DLLs from PDFSharp who in turn use DLLs from MigraDoc. I'm not privy to all the reasons that we went that direction (the decision was made by a senior developer), but I can tell you that:
It seems to be in active development.
It had most of the features we needed.
The source code is available. Although it used some patterns and conventions that I hadn't seen before, once I got on to them, it was fairly easy to make the changes. I added support for using the System.Drawing.Image directly rather than saving files.
It is not documented well either internally or externally.

I used iTextSharp in .NET 6 as shown below, but it had trouble loading scripts and CDN-hosted stylesheets; it only works with inline styles. The returned bytes can be saved using File.WriteAllBytes().
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
using System.IO;
using System.Net.Http;
using PageSize = iTextSharp.text.PageSize;
public static byte[] GenratePdfBytes(string htmlContent)
{
    byte[] pdfBytes;
    // A4 page with 10pt left/right/top margins and no bottom margin
    var pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
    var html = new StringReader(htmlContent);
    // HTMLWorker only understands basic HTML with inline styles
    var htmlparser = new HTMLWorker(pdfDoc);
    using (var memoryStream = new MemoryStream())
    {
        // Attach the writer so the document is rendered into the memory stream
        var writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
        pdfDoc.Open();
        htmlparser.Parse(html);
        pdfDoc.Close();
        pdfBytes = memoryStream.ToArray();
    }
    return pdfBytes;
}

There are a few ways to do this, in my experience. The choice depends on the application, the complexity of what you are trying to generate, whether the resulting PDF needs to be a commercial print-ready file or just a PDF report for sharing/archiving, and what output volume you need within your budget. Most higher-end PDF libraries come with a large price tag.
I have used various techniques based on the complexity. There are libraries that generate PDFs by building PDF elements from the ground up; in this case you could use something like iText, or others that can add content on top of a PDF.
If you need to make minor adjustments, i.e. use an existing PDF as a template and add some content (text/images), there are libraries that can just stamp text and images on top (e.g. http://www.pdfsharp.net/).
If you are generating invoices or reports, you could use an HTML template, merge data (replacing {tokens} etc.) and then convert the HTML to PDF using a different type of mechanism (e.g. https://www.nrecosite.com/pdf_generator_net.aspx).
If you need full control over styling, client-generated templates (IDML) etc., there are APIs for that: you can integrate with InDesign Server and use it to generate print-ready PDF files. I have built an API like this, but this is another level of PDF generation.
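For the HTML-template route mentioned above, the token-merging step needs nothing beyond the base class library. Here is a minimal sketch; the class name `HtmlTemplate`, the method `Merge` and the exact `{Token}` syntax are my own assumptions for illustration, not part of any of the libraries named above. The merged HTML would then be handed to whichever HTML-to-PDF converter you choose.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTemplate
{
    // Matches placeholders such as {CustomerName}; this token syntax is an assumption.
    private static readonly Regex Token = new Regex(@"\{(\w+)\}", RegexOptions.Compiled);

    // Replaces each known {token} with its HTML-encoded value.
    // Unknown tokens are left untouched so they are easy to spot in the output.
    public static string Merge(string template, IReadOnlyDictionary<string, string> values)
    {
        return Token.Replace(template, match =>
            values.TryGetValue(match.Groups[1].Value, out var value)
                ? WebUtility.HtmlEncode(value)
                : match.Value);
    }
}
```

Usage would look something like `HtmlTemplate.Merge("<p>{Name}</p>", new Dictionary<string, string> { ["Name"] = "A & B" })`. Encoding the values matters because invoice data often contains characters such as `&` or `<` that would otherwise break the markup.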

Related

Converting MS Office Docx with a good compatibility

After spending hours and hours on Stack Overflow and programmers' forums, I've decided to use Syncfusion on our project.
Our main targets are:
convert to PDF or directly print existing Doc and Docx files
these documents can be quite complex (including shapes, images...)
using Word Interop cannot be a solution for us
Although we are able to convert or print without errors, the original Word documents are not rendered well (parts of shapes missing...).
Is anybody using this component without problems? Or do you know of other, better components (Aspose?)
Yes, you can meet these requirements using Aspose.Words. With Aspose.Words for .NET API, you can easily render any complex Microsoft Word document or Text/Html/Mhtml file to PDF format with high fidelity. Please see the following simple code:
// Load Word document in memory for processing
Document doc = new Document(MyDir + "Document.docx");
// Send it directly to printer
doc.Print("printerName");
// Convert DOCX to PDF
doc.Save(MyDir + "DocxToPdf.pdf");
This means that if you convert a Microsoft Word document into PDF, XPS or print it using Aspose.Words, the output will appear almost exactly as if it was done by Microsoft Word application. Please check Aspose.Words for .NET Documentation for more details.
I work with Aspose as Developer Evangelist.
Can you please send your example document to Syncfusion by creating a support ticket? We will be able to check it and provide a solution. (I work for Syncfusion)
Thanks for all the answers (in this post and others on SO).
But after trying:
4 or 5 .NET libraries
using very simple, but also complex, Doc and Docx files
and especially complex doc files mixing complex shapes, grouping schemes, image imports...
Here is my conclusion (IMO):
there is no better processing (printing and creating PDF) than using MS Office and its automation from .NET
and even though it is not recommended by MS itself, we are very satisfied with our "doc printing server"
it is printing more than 30 or 50 "Build of Manufacturer" documents with more than 30 pages each, including A3 plans
it has been working very well for 4 months, without major bugs
If somebody is interested, I can post the tips and links I used to properly run MS Office in a "Windows Printer Server mode".
Best regards from Toulouse
Docentric Toolkit is primarily a mail merge library but also has a complete DOM and a high fidelity PDF/XPS rendering engine. Still, I haven't tried converting a Word document containing decorations and styles with effects.
document = Document.Load("Test1.docx");
document.Save("Test1.pdf");

Convert docx to postscript

I need to convert a Word document (docx) to a postscript file so that I can use this postscript file to generate PDF using the Ghostscript command line tool.
How do I generate the postscript file from the docx?
I need to code using .NET/C#. I found about LaTeX which generates postscript but how do I make my Word file be used with LaTeX or any other tool to get the postscript generated?
There are three main products I will mention that understand DOCX.
The obvious one is MS Word. It produces the definitive rendering of all DOCX files. Nothing is ever going to be exactly the same. By definition it is always correct. However, it is not really designed for automated conversion, and getting it to do this kind of thing is fraught with difficulty. On a legal level the EULA may conflict with your chosen solution.
OpenOffice.org is a great product. The EULA is much more accommodating. The freeness is attractive. However, while it will produce a pretty good output for most DOCX documents, it does not for all. While it is similar to MS Word it is not the same, and this is something you may notice, particularly for more complex documents. Probably more importantly, again it's not designed for automated conversions, and trying to get it to do this can be fraught and tiresome.
WordGlue .NET (on which I work) is a native .NET library that understands DOCX. It is designed specifically to produce output which is the same as MS Word. While I'm not going to say it is perfect (it's a big task), it is superior to OpenOffice.org in that it does actually attempt this as a specific design decision. However, probably the biggest advantage is that it is designed for high performance multi-threaded server side conversion. It's native .NET and thus low impact in terms of security.
Products like ABCpdf (on which I work) will integrate with these three applications to allow conversion direct to PDF. Why bother going via PostScript if you want PDF? However, if you really want to save as PostScript you can do that too.
Or indeed you can write your own code to integrate with these products. Just be aware of the caveats above regarding fraughtness and tiresomeness relating to MS Office and OpenOffice.org. To get these things working unattended requires an awful lot of attention.
You need to print it to a PostScript file from an application which can read .docx files. Or you could just export direct to PDF from the app; as far as I know, anything which reads .docx and can print can also write a PDF file.
If you have a Windows computer you can use the command line:
"%ProgramFiles%\Windows NT\Accessories\wordpad.exe" /pt foobaar.docx "printerThatDumpsPS"
You can find file printers for PostScript printing for free on the internet, or you can use Adobe PDF, PDF-XChange or any PS printer if you have one. You can use C# to temporarily change the printer's settings so that it does this for you.
So, for example, using PDF-XChange as follows,
"%ProgramFiles%\Windows NT\Accessories\wordpad.exe" /pt foobaar.docx "PDF-XChange Printer 2012"
produces a PDF file without much of a trace of which program was used, assuming PDF-XChange was set to save the file without asking.
This produces a passable document but, yeah, it loses quite a few features. It might be enough, though.
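Since the question asked for .NET/C#, the WordPad command line above can be launched from code with `System.Diagnostics.Process`. This is only a sketch under the same assumptions as the answer (WordPad is installed at that path and the named printer already exists); the class name `WordPadPrinter` and the 60-second timeout are arbitrary choices of mine.

```csharp
using System;
using System.Diagnostics;

public static class WordPadPrinter
{
    // Builds the same command line as above: wordpad.exe /pt "<file>" "<printer>".
    public static ProcessStartInfo BuildPrintCommand(string docxPath, string printerName)
    {
        string wordpad = Environment.ExpandEnvironmentVariables(
            @"%ProgramFiles%\Windows NT\Accessories\wordpad.exe");
        return new ProcessStartInfo
        {
            FileName = wordpad,
            Arguments = $"/pt \"{docxPath}\" \"{printerName}\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Starts the print job and waits for WordPad to exit.
    // Returns false if the process could not start or did not finish in time.
    public static bool Print(string docxPath, string printerName, int timeoutMs = 60000)
    {
        using (Process process = Process.Start(BuildPrintCommand(docxPath, printerName)))
        {
            if (process == null) return false;
            if (process.WaitForExit(timeoutMs)) return true;
            process.Kill(); // do not leave a hung WordPad instance behind
            return false;
        }
    }
}
```

Keeping the command construction separate from the launch makes it possible to log or inspect the exact command before running it on a server.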

What options do I have to produce a PDF report from code in .NET for scientific data (winforms)

I have a "legacy" VB.NET application (winforms) written for .NET 1.1, and re-compiled under 2.0 that produces a report in HTML via a custom XmlTextWriter wrapper that is suited for HTML. The user then would print the report into pdf if they wanted to.
That was 2003, and now technology has changed a bit, especially within the C#/VB.NET world, and customers want to skip the HTML part, and go to PDF directly. What are my options for open source, or low cost PDF libraries that work well with .NET and must support tables with pictures (generated bitmaps from code) and text.
Here is a screenshot of the resulting HTML rendering
Obviously this needs some cleaning up and tidying, but I am interested in knowing which technology to pursue in this project.
This related question might be what I need, or it may be out of date by now. I don't have any data sources that will provide all the information I want displayed. Currently it is collected from various classes within the application in order to be displayed as HTML.
Does anybody have direct experience with iTextSharp or SharpPDF?
Thanks for any advice.
Update 1:
found possible duplicate here.
I have used iTextSharp to produce PDF reports before. Although you have to get used to the library (and it is an extensive library), once you get the hang of it it isn't so bad. I found the book iText In Action to be very helpful. Even though the book is about the original Java library, not the .NET port, most of the methods and classes are named the same so it wasn't really a problem.
My #1 piece of advice when working with iTextSharp is that you'll be writing a lot of the same code, over and over again. (i.e. creating a table cell, setting the fonts, sizes, colors, and borders for that table cell, setting text...). Do yourself a favor and make your own little Utility class that will do all of your gruntwork for you -- otherwise you'll end up with 2000 lines of code that just create a few tables with some special formatting.
In addition, this site has a series of brief articles that I found useful when I was first learning iTextSharp.
Edit:
If you're interested in an XHTML-->PDF converter, I just found this blog post by Darin Dimitriov that shows how to port the open-source Java flying-saucer library to .NET. He makes it look easy!
Interestingly enough, it seems that flying-saucer uses iText under the hood to perform the conversion.
Report.NET is a .NET PDF library specific for report generation supporting the features you asked for. It is smaller than iTextPdf, but perhaps sufficient for your needs:
http://sourceforge.net/projects/report/
(license is LGPL).
You can use this free print driver:
http://www.dopdf.com/
When you print to it, it outputs PDF.
I was researching this topic two months ago, and basically you have two ways:
DLLs
open source iTextSharp - well, don't expect too much from it. It can generate PDFs from simple web pages, but your table seems quite complicated, so I don't think you will succeed with it without some tweaking of its source code
paid options - I've used ABCpdf; it works pretty smoothly. Not everything is rendered as well as in a browser, but it does its job. I believe there are many more libraries like this
Command line tools
If you are lucky enough to have full control over the server, I think these will be your best option
wkhtmltopdf - good user opinions
htmldoc
I have not tried those two, though; I was using a hosting service, so they were not an option for me
I just wrote a TIP in CodeProject on how to do this without using any external DLL in a couple of lines.
Here's the short code copied:
// ----------------------------------------------------------------------------------------------
// If you run this on Windows 10 (having its default printer "Microsoft Print to PDF" installed)
// This should print a PDF file named "CreatedByCSharp.pdf" in your "MyDocuments" folder
// containing the string "When nothing goes right, go left"
// ----------------------------------------------------------------------------------------------
// If not present, you will need to add a reference to System.Drawing in your project References
using System;
using System.Drawing;
using System.Drawing.Printing;
void PrintPDF()
{
    // Set the output dir and file name
    string directory = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
    string file = "CreatedByCSharp.pdf";
    PrintDocument pDoc = new PrintDocument()
    {
        PrinterSettings = new PrinterSettings()
        {
            PrinterName = "Microsoft Print to PDF",
            PrintToFile = true,
            PrintFileName = System.IO.Path.Combine(directory, file),
        }
    };
    pDoc.PrintPage += new PrintPageEventHandler(Print_Page);
    pDoc.Print();
}
void Print_Page(object sender, PrintPageEventArgs e)
{
    // Here you can play with the font style (and much much more, this is just an ultra-basic example)
    Font fnt = new Font("Courier New", 12);
    // Insert the desired text into the PDF file
    e.Graphics.DrawString("When nothing goes right, go left", fnt, System.Drawing.Brushes.Black, 0, 0);
}
I ended up using iTextSharp to produce image flashcards with some tricky formatting after becoming deeply frustrated with other libraries.
Where it really paid off was the competent documentation compared to the other options. I believe there is also an option to automatically parse HTML/XML.
I'd also suggest taking a look at the PD4ML HTML-to-PDF conversion library. It has quite a modest price for a paid product, but it supports lots of features and is constantly updated.

ASP.Net Converting and Merging documents into single PDF

I need to have the ability to convert and merge various documents into a single Pdf.
The documents could be of varying types, such as Word, Open Office, Images, Text, Web pages (by URL) and the PDF would usually consist of 2-3 documents.
At the moment, we are using BCL Technologies easyPDF with Microsoft Office installed onto the Server. This handles most documents but we haven't had it doing Open Office ones yet.
We currently produce around 100-1000 of these PDFs per day.
The reason I am asking the question is that performance is a key issue. The PDF is generated for users on the fly, so the waiting times we are currently getting of 30-60 seconds are becoming unacceptable.
We have done some caching around documents when they are initially uploaded, so the main task that happens when a user requests a PDF is merging a number of already generated PDFs.
Does anyone else have any other tools they have used that work reliably for most common document types and above all, quickly? When put like that, it seems like I'm asking a lot!
Edit:
Thanks for all the great advice, I'll look into some of these and compare performance.
Just to add to all this, money is not really an object. We're more than happy to pay for different applications to perform each task as well as looking into various hardware options to distribute the load as much as possible.
Merging multiple PDF documents is normally simple enough (as long as they don't need to be merged on the same page) - you could compare your merge performance with something like iTextSharp (.NET version of iText) to be sure it isn't a bottleneck - otherwise the conversion from other formats to PDF is likely the bottleneck.
In almost all cases, the method used to convert X to PDF is to execute the application's print command, targeted at a software PDF printer, to create a temporary PDF file.
This means:
The target application (for example Office) is opened and closed
The document has to travel through the printing service
In your situation, are you converting arbitrary documents submitted by the users, or do the documents come from a stored library of files? If it's a library, you could make a PDF copy of each file as it is added to the library (instead of when the user makes a request), and then only merge the PDF files.
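The "convert once, merge later" idea above could be sketched as a small cache keyed by the content hash of the source file, so that an unchanged document is never converted twice. Everything here is hypothetical: `ConvertedPdfCache` is my own name, and the `convertToPdf` delegate stands in for whatever converter you actually use (easyPDF, ABCpdf, etc.).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public sealed class ConvertedPdfCache
{
    private readonly string _cacheDir;
    // Hypothetical converter: takes a source file path, returns the PDF bytes.
    private readonly Func<string, byte[]> _convertToPdf;

    public ConvertedPdfCache(string cacheDir, Func<string, byte[]> convertToPdf)
    {
        _cacheDir = cacheDir;
        _convertToPdf = convertToPdf;
        Directory.CreateDirectory(cacheDir);
    }

    // Returns the path of the cached PDF, converting only when this
    // exact file content has not been seen before.
    public string GetOrConvert(string sourcePath)
    {
        string pdfPath = Path.Combine(_cacheDir, HashFile(sourcePath) + ".pdf");
        if (!File.Exists(pdfPath))
        {
            File.WriteAllBytes(pdfPath, _convertToPdf(sourcePath));
        }
        return pdfPath;
    }

    // SHA-256 of the file content, rendered as a hex string usable as a file name.
    private static string HashFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Calling `GetOrConvert` at upload time moves the slow conversion out of the user's request; the on-the-fly step is then only the merge. The sketch is not thread-safe, so concurrent uploads of the same file would need a lock or an atomic rename.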
We use ABC Pdf. I don't know if it will be fast enough for your needs, but it seems to work for our use.
I had a very similar issue where we had documents that were already existing in PDF format and needed to allow the user to see them all combined together. We purchased the PDF4NET product which was about $500 from what I recall. It was extremely easy to use and they provide awesome examples of how to use the tools.
O2 Solutions - PDF4NET
Here is the code sample that they provide for merging. The top line looks like it just outputs the file; the next two lines allow for streaming the content back to the user.
PDFFile.MergeFilesToDisk( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
PDFDocument doc = PDFFile.MergeFilesToDoc( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
doc.SaveToStream( stream );
You say you're using Microsoft Office to open these files, I would imagine this is the bottleneck rather than the actual PDF creation.
Is it possible to distill these documents into a more accessible format (html/xml/database), so that it's not necessary to open office every time a PDF needs to be created?
While I have no PDF conversion suggestions I can say that this problem sounds like one which could be distributed over a number of nodes. Do you find that the PDF generation is CPU-bound or are there other limiting factors? Before expending too much effort on rewriting the PDF library interface you might want to see what the bottlenecks are.

How can a Word document be created in C#? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a project where I would like to generate a report export in MS Word format. The report will include images/graphs, tables, and text. What is the best way to do this? Third party tools? What are your experiences?
The answer is going to depend slightly upon whether the application is running on a server or on the client machine. If you are running on a server then you are going to want to use one of the XML based office generation formats, as there are known issues when using Office Automation on a server.
However, if you are working on the client machine then you have a choice of either using Office Automation or using the Office Open XML format (see links below), which is supported by Microsoft Office 2000 and up either natively or through service packs. One drawback to this though is that you might not be able to embed some kinds of graphs or images that you wish to show.
The best way to go about things will depend slightly upon how much time you have to invest in development. If you go the route of Office Automation, there are quite a few good tutorials out there that can be found via Google, and it is fairly simple to learn. However, the Office Open XML format is fairly new, so you might find the learning curve to be a bit higher.
Office Open XML Information
Office Open XML - http://en.wikipedia.org/wiki/Office_Open_XML
OpenXML Developer - http://openxmldeveloper.org/default.aspx
Introducing the Office (2007) Open XML File Formats - http://msdn.microsoft.com/en-us/library/aa338205.aspx
DocX is a free library for creating DocX documents; it is actively developed and very easy and intuitive to use. Since CodePlex is dying, the project has moved to GitHub.
I have spent the last week or so getting up to speed on Office Open XML. We have a database application that stores survey data that we want to report in Microsoft Word. You can actually create Word 2007 (docx) files from scratch in C#. The Open XML SDK version 2 includes a cool application called the Document Reflector that will actually provide the C# code to fully recreate a Word document. You can use parts or all of the code, and substitute the bits you want to change on the fly. The help file included with the SDK has some good code samples as well.
There is no need for the Office Interop or any other Office software on the server - the new formats are 100% XML.
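To illustrate why no Office installation is needed: a .docx file is a ZIP package containing XML parts, so a bare-bones document can be written with the base class library alone. The sketch below is not the Open XML SDK approach described above; it hand-writes the three parts a minimal WordprocessingML package needs, and the class name `MinimalDocx` is my own. For anything beyond a toy document, the SDK or one of the libraries mentioned in this thread is the more practical route.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class MinimalDocx
{
    // Declares the content type of each part in the package.
    private const string ContentTypes =
        @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" +
        @"<Types xmlns=""http://schemas.openxmlformats.org/package/2006/content-types"">" +
        @"<Default Extension=""rels"" ContentType=""application/vnd.openxmlformats-package.relationships+xml""/>" +
        @"<Default Extension=""xml"" ContentType=""application/xml""/>" +
        @"<Override PartName=""/word/document.xml"" ContentType=""application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml""/>" +
        @"</Types>";

    // Points the package at its main document part.
    private const string RootRels =
        @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" +
        @"<Relationships xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">" +
        @"<Relationship Id=""rId1"" Type=""http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"" Target=""word/document.xml""/>" +
        @"</Relationships>";

    // A body with a single paragraph; {0} is replaced with the escaped text.
    private const string DocumentXml =
        @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" +
        @"<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" +
        @"<w:body><w:p><w:r><w:t xml:space=""preserve"">{0}</w:t></w:r></w:p></w:body></w:document>";

    // Returns the bytes of a .docx containing one paragraph of plain text.
    public static byte[] Create(string paragraphText)
    {
        using (var buffer = new MemoryStream())
        {
            // The archive must be disposed before reading the buffer so the ZIP directory is written.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                AddPart(zip, "[Content_Types].xml", ContentTypes);
                AddPart(zip, "_rels/.rels", RootRels);
                AddPart(zip, "word/document.xml",
                    string.Format(DocumentXml, SecurityElement.Escape(paragraphText)));
            }
            return buffer.ToArray();
        }
    }

    private static void AddPart(ZipArchive zip, string name, string xml)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
        {
            writer.Write(xml);
        }
    }
}
```

A call such as `File.WriteAllBytes("report.docx", MinimalDocx.Create("Hello from the server"))` should give a file that Word can open. Tables and images need many more parts and relationships, which is where the Document Reflector output mentioned above saves time.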
Have you considered using .RTF as an alternative?
It supports embedding images and tables as well as text, and opens by default using Microsoft Word. Whilst its feature set is more limited (count out any advanced formatting), for something that looks and feels and opens like a Word document it's not far off.
Your end users probably won't notice.
I have found Aspose Words to be the best as not everybody can open Office Open XML/*.docx format files and the Word interop and Word automation can be buggy. Aspose Words supports most document file types from Word 97 upwards.
It is a pay-for component but has great support. The other alternative as already suggested is RTF.
To generate Word documents with Office Automation within .NET, specifically in C# or VB.NET:
Add the Microsoft.Office.Interop.Word assembly reference to your project. The path is \Visual Studio Tools for Office\PIA\Office11\Microsoft.Office.Interop.Word.dll.
Follow the Microsoft code example you can find here: http://support.microsoft.com/kb/316384/en-us.
Schmidty, if you want to generate Word documents on a web server you will need a licence for each client (not just the web server). See this section in the first link Rob posted:
"Besides the technical problems, you must also consider licensing issues. Current licensing guidelines prevent Office applications from being used on a server to service client requests, unless those clients themselves have licensed copies of Office. Using server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."
If you meet the licensing requirements, I think you will need to use COM Interop - to be specific, the Office XP Primary Interop Assemblies.
Check out VSTO (Visual Studio Tools for Office). It is fairly simple to create a Word template, inject an xml data island into it, then send it to the client. When the user opens the doc in Word, Word reads the xml and transforms it into WordML and renders it. You will want to look at the ServerDocument class of the VSTO library. No extra licensing is required from my experience.
I have had good success using the Syncfusion Backoffice DocIO which supports doc and docx formats.
In prior releases it did not support everything in Word, but according to your list we tested it with tables and text as a mail merge approach and it worked fine.
Not sure about the import of images though. On their blurb page http://www.syncfusion.com/products/DocIO/Backoffice/features/default.aspx it says
Essential DocIO has support for inserting both Scalar and Vector images into the document, in almost all formats. Bitmap, gif, png and tiff are some of the common image types supported.
So it's worth considering.
As others have mentioned you can build up a RTF document, there are some good RTF libraries around for .net like http://www.codeproject.com/KB/string/nrtftree.aspx
I faced this problem and created a small library for this. It was used in several projects and then I decided to publish it. It is free and very very simple, but I'm sure it will help you with the task. Invoke the Office Open XML Library, http://invoke.co.nz/products/docx.aspx.
I've written a blog post series on Open XML WordprocessingML document generation. My approach is that you create a template document that contains content controls, and in each content control you write an XPath expression that defines how to retrieve the content from an XML document that contains the data that drives the document generation process. The code is free, and is licensed under the Microsoft Reciprocal License (Ms-RL). In that same blog post series, I also explore an approach where you write C# code in content controls. The document generation process then processes the template document and generates a C# program that generates the desired documents. One advantage of this approach is that you can use any data source as the source of data for the document generation process. That code is also licensed under the Microsoft Reciprocal License.
I currently do this exact thing.
If the document isn't very big, doesn't contain images and such, then I store it as an RTF with #MergeFields# in it and simply replace them with content, sending the result down to the user as an RTF.
For larger documents, including images and dynamically inserted images, I save the initial Word document as a Single Webpage *.mht file containing the #MergeFields# again. I then do the same as above. Using this, I can easily render a DataTable with some basic Html table tags and replace one of the #MergeFields# with a whole table.
Images can be stored on your server and the url embedded into the document too.
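The RTF variant of the #MergeFields# replacement described above can be sketched as follows. The names `RtfMerge`, `Escape` and `Merge` are mine, not from any library. The one non-obvious step is escaping: backslashes and braces are control characters in RTF, and non-ASCII characters are written as `\uN?` escapes, so raw values should not be pasted in unescaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RtfMerge
{
    // Escapes plain text so it can be inserted safely into RTF body content.
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);          // RTF control characters
            }
            else if (c == '\n')
            {
                sb.Append("\\line ");               // line break inside a paragraph
            }
            else if (c == '\r')
            {
                // skipped; the following '\n' emits the line break
            }
            else if (c > 127)
            {
                // \uN? takes a signed 16-bit decimal value; '?' is the fallback character
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Replaces every #Key# placeholder in the template with the escaped value.
    public static string Merge(string rtfTemplate, IDictionary<string, string> fields)
    {
        var result = new StringBuilder(rtfTemplate);
        foreach (KeyValuePair<string, string> field in fields)
        {
            result.Replace("#" + field.Key + "#", Escape(field.Value));
        }
        return result.ToString();
    }
}
```

One thing to watch for: when a template is edited in Word, a placeholder can end up split across formatting runs in the saved RTF, in which case a plain string replace will not find it. Typing each placeholder in one go, without changing formatting mid-token, usually avoids that.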
Interestingly, the new Office 2007 file formats are actually zip files - if you rename the extension to .zip you can open them up and see their contents. This means you should be able to switch content such as images in and out using a simple C# zip library.
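A sketch of that idea using the built-in `System.IO.Compression` types rather than a third-party zip library. `DocxPartSwapper` is a hypothetical name, and the part path in the usage note is only the name Word typically gives an embedded image; check the actual package contents first.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DocxPartSwapper
{
    // Replaces the content of one part inside a .docx package, in place.
    public static void ReplacePart(string docxPath, string partName, byte[] newContent)
    {
        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry old = zip.GetEntry(partName);
            if (old == null)
            {
                throw new FileNotFoundException("Part not found in package.", partName);
            }
            old.Delete();

            // Recreate the entry under the same name so existing relationships still resolve.
            ZipArchiveEntry entry = zip.CreateEntry(partName);
            using (Stream stream = entry.Open())
            {
                stream.Write(newContent, 0, newContent.Length);
            }
        }
    }
}
```

For example, `DocxPartSwapper.ReplacePart("template.docx", "word/media/image1.png", File.ReadAllBytes("chart.png"))` would swap a picture while keeping its size and position in the document, provided the replacement has the same image format. On .NET Framework, `ZipFile` lives in the System.IO.Compression.FileSystem assembly, which has to be referenced explicitly.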
@Dale Ragan: That will work for the Office 2003 XML format, but that's not portable (as, say, .doc or .docx files would be).
To read/write those, you'll need to use the Word Object Library ActiveX control:
http://www.codeproject.com/KB/aspnet/wordapplication.aspx
@Danny Smurf: Actually this article describes what will become the Office Open XML format which Rob answered with. I will pay more attention to the links I post from now on to make sure they're not obsolete. I actually did a search on WordML, which is what it was called at the time.
I believe that the Office Open XML format is the best way to go.
LibreOffice also supports headless interaction via API. Unfortunately there's currently not much information about this feature yet.. :(
You could also use Word document generator. It can be used for client-side or server-side deployment. From the project description:
WordDocumentGenerator is an utility to generate Word documents from
templates using Visual Studio 2010 and Open XML 2.0 SDK.
WordDocumentGenerator helps generate Word documents both
non-refresh-able as well as refresh-able based on predefined templates
using minimum code changes. Content controls are used as placeholders
for document generation. It supports Word 2007 and Word 2010.
Grab it: http://worddocgenerator.codeplex.com/
Download SDK: http://www.microsoft.com/en-us/download/details.aspx?id=5124
Another alternative is Windward Docgen (disclaimer - I'm the founder). With Windward you design the template in Word, including images, tables, graphs, gauges, and anything else you want. You can set tags where data from an XML or SQL datasource is inserted (including functionality like forEach loops, import, etc). And then generate the report to DOCX, PDF, HTML, etc.
