I have a situation where in a web application a user may need a variable list of PDFs to be printed. That is, given a large list of PDFs, the user may choose an arbitrary subset of that list to print. These PDFs are stored on the file system. I need a method to allow users to print these batches of PDFs relatively easily (thus, asking the user to click each PDF and print is not an option) and without too much of a hit on performance.
A couple of options I've thought about:
1) I have a colleague who uses a PDF library that I could use to take the PDFs and combine them on the fly and then send that PDF to the user for printing. I don't know if this method will mess up any sort of page numbering. This may be an "ok" method but I worry about the performance hit of this.
2) I've thought about creating an ActiveX control that I would pass the PDFs off to and let it invoke the printing features. My concern is that this is needlessly complex and may present some odd user interactions.
So, I'm looking for the best option to use in this scenario, which is probably not one of the ones I've gone through.
The best solution I have for you is number 1. There are plenty of libraries that will merge documents. With the one I've used, the numbering should not be an issue, since all the pages are already rendered.
If you go with ActiveX you are going to limit yourself to IE, which might be acceptable. The only other idea would be to use a smart client so you can have more control; then you could serve up the PDFs via a web service.
I think concatenating the documents is the way to go.
For tools, I recommend iText#. It's free.
You can download it here: iTextSharp
iText# (iTextSharp) is a port of the iText open source java library for PDF generation written entirely in C# for the .NET platform. Use the iText mailing list to get support.
I agree with #1. You could do some tests to see what the performance hit would be like.
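To put rough numbers on option 1, something like the sketch below might help. It is Python using only the standard library, and it is not tied to any particular PDF library: `resolve_selection`, `timed_merge`, and the `merge` callable are hypothetical names, and the actual merging would be done by whichever library you pick (iTextSharp in the answers above).

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple


def resolve_selection(base_dir: Path, known_names: Iterable[str],
                      selected: Sequence[str]) -> List[Path]:
    """Map the user's chosen names to file paths, keeping the chosen order.

    Names are checked against the known list so a request cannot point
    at arbitrary files on the server.
    """
    allowed = set(known_names)
    unknown = [name for name in selected if name not in allowed]
    if unknown:
        raise ValueError("unknown documents requested: %s" % unknown)
    return [base_dir / name for name in selected]


def timed_merge(paths: Sequence[Path],
                merge: Callable[[Sequence[Path]], bytes]) -> Tuple[bytes, float]:
    """Run the supplied merge function and report how long it took in seconds."""
    start = time.perf_counter()
    merged = merge(paths)
    return merged, time.perf_counter() - start
```

Running `timed_merge` over a few realistic batch sizes should show fairly quickly whether merging on the fly is acceptable or whether merged output needs to be cached.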
We are generating PDFs from a web app a couple of different ways: iTextSharp, HTML -> Rotativa, and RDLC.
Is there any way in any of those tools to modify the ViewerPreferences dictionary inside the PDF so as to disable the "shrink to fit" option?
The PDF format supports this option; I've found documentation for that.
I'm aware that not all viewers honor the request not to shrink to fit, but we're using stock Adobe Reader across the board, so it's OK.
I was able to find this for iTextSharp, which reads a PDF, modifies it, and saves it, so I have to believe there is a way to set it before generation, but I can't find it:
Determine properties such as if PDF is Simplex or Duplex in iTextSharp
It'd be awesome if Rotativa had a way too...since we use that for some reports
We also have some done in RDLC style, if there is a way to do it there...
The reason we have to do it is that one of our apps prints labels, and the amount of data leaves no room for fudging it. Printing them from a web app is problematic, even when we control the ecosystem.
Unfortunately, our IT group will not use the registry settings to change the default on the machines, so we have to do it through code.
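For reference, the entry in question is `/PrintScaling /None` inside the catalog's `/ViewerPreferences` dictionary (part of the PDF format since version 1.6). The sketch below does not use iTextSharp, Rotativa, or RDLC. It is plain Python that writes a minimal one-page PDF by hand, only to show what the target entry looks like so generated files can be checked for it. The function name and the 288 x 144 point page size are made up for illustration.

```python
def build_label_pdf() -> bytes:
    """Build a minimal one-page PDF whose catalog asks viewers not to scale on print."""
    objects = [
        # Object 1: the catalog, carrying the viewer preference.
        b"<< /Type /Catalog /Pages 2 0 R"
        b" /ViewerPreferences << /PrintScaling /None >> >>",
        # Object 2: the page tree with a single page.
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # Object 3: an empty page (size chosen arbitrarily for the example).
        b"<< /Type /Page /Parent 2 0 R /Resources << >>"
        b" /MediaBox [0 0 288 144] >>",
    ]
    out = bytearray(b"%PDF-1.6\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Whether a given viewer honors the entry is up to the viewer, as noted above.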
I am thoroughly confused with something I want to do and am looking for some advice.
One of my clients has to produce a monthly invoice detailing all of the company's expenditure, plus two other such invoices. The client is sure that he only needs these invoices, and they are simple to produce as far as logic is concerned.
Now, to make the actual invoice, I don't really want to use reporting solutions like Telerik, SSRS, etc., as I think they are overkill for my purpose. At the same time, I am not sure how I can get the printer to print the invoices on neat pages without cutting anything off.
I am very tempted to just give the output in a webpage and ask my client to print them off from there.
Am I not looking at this the right way? Is this possible?
I could use iTextSharp or something similar to produce PDFs. In fact, I think I will go ahead with this if it isn't possible to just output to an HTML page and get the printer to recognize the page breaks somehow.
Because this is a very small job, I don't want to spend too much time on it as the cost of this freelance project is minimal too.
The reason printing to a new page is important is that my client has a few shops he deals with and he would want to print each of his customers their own invoices. I can get him to produce each customer's invoice separately and print them but it is not ideal way to deal with it.
thanks
There is a css property which should tell a browser to break a page: page-break-before.
But if you have a wide list of browsers to support, it would be better to get an HTML-to-PDF conversion library or just use iTextSharp (as far as I know, there is even a module/class that converts HTML to PDF with iTextSharp), as printing web pages has many issues.
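A minimal sketch of the CSS approach, assuming each customer's invoice is wrapped in its own element. It is written in Python only to keep the example self-contained; `render_invoices` and the `invoice` class name are invented for illustration. The relevant part is the one CSS rule.

```python
from html import escape
from typing import Dict, List


def render_invoices(invoices: Dict[str, List[str]]) -> str:
    """Return one HTML page where each customer's invoice prints on its own page."""
    sections = []
    for customer, lines in invoices.items():
        items = "".join("<li>%s</li>" % escape(line) for line in lines)
        sections.append(
            '<div class="invoice"><h1>Invoice for %s</h1><ul>%s</ul></div>'
            % (escape(customer), items)
        )
    style = (
        "<style>"
        # Break before every invoice except the first, so page one is not blank.
        ".invoice + .invoice { page-break-before: always; }"
        "</style>"
    )
    return "<!DOCTYPE html><html><head>%s</head><body>%s</body></html>" % (
        style, "".join(sections))
```

The client would then open the single page and print once. How faithfully the break is honored still depends on the browser.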
In the past, when I wanted to create a reusable document, I used Word or Excel XML formats.
See: http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
They are easy to create and tweak. All you have to do is save the document in Office XML format, open it up in WordPad to see where to make your changes, and then recreate the dynamic parts in your code.
SSRS has a drag and drop interface for designing reports and has a PDF output option. If the data is in a SQL server database then even with the learning curve it should be easier to do SSRS reports.
I'm looking for an asp.NET control that will allow for viewing and printing of a pdf and TIFF within a web form. I'm willing to use more than 1 control if needed (1 control for pdf, 1 for Tiff, show and hide based on file extension), but I have not been able to find a good Tiff viewer.
Files are stored on our LAN in a shared folder, and this application is an intranet site.
Open source / free licensing preferred, but I'm willing to look at paid options as well.
http://www.alternatiff.com/ is one of the viewers that I've seen used for this type of viewing of tiffs.
You can get a free licence of ABCPDF (provided you link back to their site) which will do the conversion from TIFF to PDF for you as per @Chris Lively's suggestion.
It'll also do conversion from PDF to TIFF if you decide to do things backwards.
It makes sense to present the content in a common format. If you wanted to you can embed the PDF in the browser to create the 'seamless' experience you're looking for using something like PDFObject.
As @BenCr says though, PDF is a really common format and the tools already exist to open and work with them, so introducing new ways to perform existing tasks could actually end up complicating matters unnecessarily.
I'm in total agreement with @BenCr on this.
Viewing PDFs is an extremely common thing to do. This isn't a "technical" issue by any stretch.
It sounds like you have some type of faxing solution in place that is creating these documents. Most likely multi-page TIFF and PDFs.
If this is the case you might want to just convert the TIFFs to PDFs to begin with and run everything through Adobe's pdf reader. Every online fax solution does this.
You could try http://issuu.com/ and they appear to have an API too if you want to go that deep.
We used the Seadragon control to do this. I think it was overkill and we should have just rolled our own; it would have been cheaper than integrating it. TIFFs and PDFs are converted to PNG on the server side. I don't think you can do better than that, especially with PDFs (assuming you don't want to use Acrobat Reader to display them). Convert PDFs to PNG using Xpdf/Poppler.
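For the PDF-to-PNG step, a sketch along these lines might do, assuming Poppler's `pdftoppm` tool is installed and on the PATH. The Python function names are hypothetical. `-png` selects PNG output and `-r` sets the resolution in DPI.

```python
import subprocess
from typing import List


def pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 150) -> List[str]:
    """Build the Poppler command line that renders every page of a PDF to PNG.

    Output files are named from the prefix plus a page number.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, output_prefix]


def render_pdf_to_png(pdf_path: str, output_prefix: str, dpi: int = 150) -> None:
    """Run pdftoppm; raises CalledProcessError if the conversion fails."""
    subprocess.run(pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
```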
How about using Google Docs Viewer?
EDIT: Probably not working, since the viewer has to read the document from your URL; when it's on the Intranet, this won't work.
If you can mess about with MIME types, mainly by serving the .tiff files with an application/pdf MIME type, you should be able to get Acrobat to open TIFF files directly by effectively fooling the browser into opening them with Acrobat. Then all you need is a trusty old iframe to get a familiar UI with print buttons.
Where I work we get PDF templates from our clients and we convert them into html templates that we can change out tokens in the page with other info and mail them out to their clients.
The reason we convert them into html is because the text can wrap if any of the info is too long.
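The token step described here can be sketched roughly as follows. This is Python, and the `$name` token syntax is an assumption, since the actual token format is not stated. `fill_template` is a made-up name.

```python
from html import escape
from string import Template
from typing import Mapping


def fill_template(html_template: str, values: Mapping[str, str]) -> str:
    """Replace $tokens in an HTML template with HTML-escaped values.

    Tokens with no matching value are left in place rather than raising.
    """
    escaped = {key: escape(value) for key, value in values.items()}
    return Template(html_template).safe_substitute(escaped)
```

Because the result is ordinary HTML, long values wrap naturally when the page is laid out, which is the property the conversion to HTML is there to provide.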
The process can be slow since we can only do 1 at a time on a single computer because of a GDI problem. So we have a farm that creates the pdf docs that need to be mailed out.
The GDI problem.
http://support.microsoft.com/kb/939884/en-us/
The hotfix does not seem to fix the problem.
Is there a better way of doing this that would be more efficient or easier, without having to go from PDF -> HTML -> PDF?
Have a look at iTextSharp. You can use the PdfStamper object to dink with the original PDF and spit out customized versions very quickly. You might need to do a little massaging of the source PDF (define some form objects in Acrobat, etc.) to get it to do exactly what you want, but it's very high performance compared to the process you're describing. We use it to generate thousands of PDFs a day (with customized info inserted for each customer), and it's free. It is a little tricky to get your head around (steep learning curve). I'd highly suggest buying the eBook "iText in Action"; the samples there make the whole thing much easier to grok.
I need to have the ability to convert and merge various documents into a single Pdf.
The documents could be of varying types, such as Word, Open Office, Images, Text, Web pages (by URL) and the PDF would usually consist of 2-3 documents.
At the moment, we are using BCL Technologies easyPDF with Microsoft Office installed onto the Server. This handles most documents but we haven't had it doing Open Office ones yet.
We currently produce around 100-1000 of these PDFs per day.
The reason I am asking the question is that performance is a key issue. The PDF is generated for users on the fly, so the waiting times we are currently getting of 30-60 seconds are becoming unacceptable.
We have done some caching around documents when they are initially uploaded, so the main task that happens when a user requests a PDF is merging a number of already generated PDFs.
Does anyone else have any other tools they have used that work reliably for most common document types and above all, quickly? When put like that, it seems like I'm asking a lot!
Edit:
Thanks for all the great advice, I'll look into some of these and compare performance.
Just to add to all this, money is not really an object. We're more than happy to pay for different applications to perform each task as well as looking into various hardware options to distribute the load as much as possible.
Merging multiple PDF documents is normally simple enough (as long as they don't need to be merged on the same page). You could compare your merge performance with something like iTextSharp (the .NET version of iText) to be sure it isn't a bottleneck; otherwise, the conversion from other formats to PDF is likely the bottleneck.
In almost all cases, the method used to convert X to PDF is to execute the application's print command, targeted at a software PDF printer, to create a temporary PDF file.
This means:
The target application (for example Office) is opened and closed
The document has to travel through the printing service
In your situation, are you converting arbitrary documents submitted by the users, or do the documents come from a stored library of files? If it's a library, you could make a PDF copy of each file as it is added to the library (instead of when the user makes a request), and then only merge the PDF files.
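The "convert once, when the file is added" idea could look roughly like this. It is a Python sketch with hypothetical names: `convert(source, target)` stands in for whatever converter is actually used, and keying the cache on a hash of the file contents is just one possible choice.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_pdf(source: Path, cache_dir: Path,
               convert: Callable[[Path, Path], None]) -> Path:
    """Return a PDF for `source`, converting only if no cached copy exists.

    The cache key is a hash of the file's bytes, so an edited upload gets
    a fresh conversion while an unchanged one is reused.
    """
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    target = cache_dir / (digest + ".pdf")
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        convert(source, target)  # the slow step, done once per distinct file
    return target
```

Calling this at upload time leaves only the merge to do when a user requests the combined PDF.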
We use ABC Pdf. I don't know if it will be fast enough for your needs, but it seems to work for our use.
I had a very similar issue where we had documents that were already existing in PDF format and needed to allow the user to see them all combined together. We purchased the PDF4NET product which was about $500 from what I recall. It was extremely easy to use and they provide awesome examples of how to use the tools.
O2 Solutions - PDF4NET
Here is the code sample that they provide for merging. The first line looks like it just writes the merged file to disk; the other two lines allow for streaming the content back to the user.
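// Merge the files and write the result to disk.
PDFFile.MergeFilesToDisk( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
// Merge the files into an in-memory document, then write it to a stream (for example, the response stream).
PDFDocument doc = PDFFile.MergeFilesToDoc( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
doc.SaveToStream( stream );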
You say you're using Microsoft Office to open these files. I would imagine this is the bottleneck rather than the actual PDF creation.
Is it possible to distill these documents into a more accessible format (HTML/XML/database), so that it's not necessary to open Office every time a PDF needs to be created?
While I have no PDF conversion suggestions I can say that this problem sounds like one which could be distributed over a number of nodes. Do you find that the PDF generation is CPU-bound or are there other limiting factors? Before expending too much effort on rewriting the PDF library interface you might want to see what the bottlenecks are.
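One rough way to see where the time goes is to record both wall-clock and CPU time per stage. This is a Python sketch, and the `stage` helper and `timings` dictionary are invented names. If a stage's wall time is much larger than its CPU time, this process is mostly waiting (on disk, the print service, or another process such as Office) rather than computing. Note that `time.process_time` only counts CPU used by the current process, not by external programs it launches.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple

# stage name -> (wall-clock seconds, CPU seconds used by this process)
timings: Dict[str, Tuple[float, float]] = {}


@contextmanager
def stage(name: str) -> Iterator[None]:
    """Accumulate wall-clock and CPU time spent inside the `with` block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        wall, cpu = timings.get(name, (0.0, 0.0))
        timings[name] = (wall + time.perf_counter() - wall_start,
                         cpu + time.process_time() - cpu_start)
```

Wrapping the conversion and the merge in separate `with stage("convert"):` and `with stage("merge"):` blocks and comparing the totals should indicate whether adding nodes would help or whether one step needs replacing.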