Compare PDF documents in Adobe Acrobat via SDK

We are planning to implement a solution for comparing different revisions of a PDF document in our .NET Windows Forms application. Adobe Acrobat has a nice feature for comparing two documents, but I have not been able to find any information about whether it is possible to create a plug-in (or something else) that lets us use this feature from our application.
I would really appreciate it if any of you could point me in the right direction on how to build such a solution.
I have also looked at other threads here on Stack Overflow about comparing PDF documents, particularly these:
How to compare two PDF-files
PDF-libraries
I did not find a library or SDK there that would let us compare PDF documents in a way that is easy for users of the system to understand.
Do you know any good solutions to solve this problem?
All help appreciated! :)

Do you know the structure of the PDF files in advance, or do you want to compare them without knowing anything about them? If you know the PDF files, you can read the values of specific fields and compare those values between files, instead of comparing the entire PDF file.
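A field-by-field comparison like the one suggested here could be sketched roughly as follows. This assumes the field values of each revision have already been extracted into dictionaries by whatever PDF library you use; the extraction step is not shown, and FieldDiff and Compare are hypothetical names, not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares field values that were already extracted
// from two PDF revisions (extraction itself depends on your PDF library).
public static class FieldDiff
{
    // Returns one line per field that was removed, added, or changed.
    public static List<string> Compare(
        IDictionary<string, string> oldFields,
        IDictionary<string, string> newFields)
    {
        var changes = new List<string>();
        var names = oldFields.Keys
            .Union(newFields.Keys)
            .OrderBy(n => n, StringComparer.Ordinal);

        foreach (var name in names)
        {
            bool inOld = oldFields.TryGetValue(name, out var oldValue);
            bool inNew = newFields.TryGetValue(name, out var newValue);

            if (inOld && !inNew)
                changes.Add($"removed {name}");
            else if (!inOld && inNew)
                changes.Add($"added {name}");
            else if (!string.Equals(oldValue, newValue, StringComparison.Ordinal))
                changes.Add($"changed {name}: '{oldValue}' -> '{newValue}'");
        }
        return changes;
    }
}
```

The resulting list can be shown to users directly, which tends to be easier to read than a page-level visual diff when only form data changes between revisions.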

Related

Is it possible to create multilingual help (.chm) file?

I need to create multilingual help (.chm) file for my WPF application. Please suggest best way to create it.

I'd try to steer clear of that if possible. CHM is a proprietary format and, although it has been reverse engineered, I think you'll get far more benefit from doing a truly portable solution like on-disk HTML.
Back when we still used CHM files, we found no easy way to embed multi-language capability into a single file and we had to provide translations in independent CHM files, leading to massive duplication of things like charts, pictures and so forth (this was many years ago so you should check if the situation has improved since then, if you really want to use CHM).
The support for Unicode was, shall we say, less than adequate, and there were numerous security problems which caused many customers to disallow use of CHM files - seriously, who in their right mind allows arbitrary code to be run by a help system?
With on-disk HTML, not only did this duplication disappear (since each language version could link to the same common images), we also got much better Unicode support and the ability to have a default front page (in English) with links to alternative front pages for other locales.
And we gained a big boost in portability since it's an open standard. That means we could pretty much run it in any browser on any platform.
And, on top of that, it appears Microsoft don't support it any more. From the Wikipedia CHM article:
In 2002, Microsoft announced security risks associated with the .CHM format, as well as security bulletins and patches. They have since announced their intentions not to develop the .CHM format further.
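If you go the on-disk HTML route, picking the front page for the user's locale could look something like the sketch below. The folder layout (helpRoot/<culture>/index.html, with helpRoot/index.html as the English default) and the HelpLocator name are assumptions made for illustration only.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper for locating on-disk HTML help. The layout
// <helpRoot>/<culture>/index.html plus <helpRoot>/index.html is an assumption.
public static class HelpLocator
{
    public static string FindFrontPage(string helpRoot, CultureInfo culture)
    {
        // Walk from the specific culture (e.g. "it-IT") to its parents ("it").
        // The invariant culture has an empty name and ends the loop.
        for (var c = culture; !string.IsNullOrEmpty(c.Name); c = c.Parent)
        {
            string candidate = Path.Combine(helpRoot, c.Name, "index.html");
            if (File.Exists(candidate))
                return candidate;
        }

        // No localized page found: fall back to the default front page.
        return Path.Combine(helpRoot, "index.html");
    }
}
```

A WPF application could pass CultureInfo.CurrentUICulture and open the returned path in the default browser.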

As the comments in the previous answer state, the CHM format is both a very old format by Microsoft, as well as being proprietary. Distributing HTML files with your application will accomplish the same functionality as a single CHM file. Worrying about the language of the user interface the user will see is more of a non-issue; chances are if the user is reading Italian help, he or she will be using (1) an Italian-localized version of Windows and (2) an Italian-localized web browser.
That being said, because CHM is an old format and seemingly partially unsupported now, you can generate the same file based on the reverse-engineered specification that the CHM files follow. Furthermore, because CHM is merely a binary file format encompassing HTML files, encoding the HTML files using UTF-8 will accomplish getting the help documents themselves in whichever language you desire.
There is no Microsoft-supported CHM .NET API, so you'll have to output the binary yourself using Streams / BinaryWriter.

How to generate an index at end of a PDF file?

Given an existing PDF document, I would like to tack on an index to the end of the file to show the pages on which key words show up. It would be best if I don't have to give a list of words to look for and the list of words is automatically generated. However, if a list of words must be given, I can work with it. I'm looking to do this either through a C# library or a command line tool. It needs to run as part of another command line app.
Is there anything out there that is capable of this?
This "PDF Index Everthing" (http://www.pdfstore.com/details.asp?ProdID=799) seems to be on the right track, but requires interaction through its GUI.

I don't actually have a C# solution but hopefully this will still help...
pdflib is an excellent PDF development library. It is one of the better libs available. As far as I know it doesn't have a C# binding. PDF is a random-access, object-based file format, and although there are many libraries that allow for creating PDFs, most freely available libs don't support adding pages to existing PDFs. pdflib does support adding pages with its pdi option, so it may be worth checking out.
Updated Info:
Check out the iText# library and
merging pdf files with C# and iText
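Whichever library ends up extracting the text and appending the pages, the index itself is just a map from each word to the pages it appears on. Here is a rough sketch of that step, assuming the text of every page has already been extracted into a list of strings. PdfIndexBuilder is a hypothetical name, and the minimum word length is only a crude stand-in for a real stop-word list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a back-of-book index from page text that has
// already been extracted (extraction and page appending are library-specific).
public static class PdfIndexBuilder
{
    // pageTexts[0] is the text of page 1, and so on.
    public static SortedDictionary<string, SortedSet<int>> Build(
        IList<string> pageTexts, int minWordLength = 4)
    {
        var index = new SortedDictionary<string, SortedSet<int>>(StringComparer.Ordinal);

        for (int i = 0; i < pageTexts.Count; i++)
        {
            // \p{L}+ matches runs of letters in any script.
            foreach (Match m in Regex.Matches(pageTexts[i], @"\p{L}+"))
            {
                if (m.Value.Length < minWordLength)
                    continue; // crude filter for short, common words

                string word = m.Value.ToLowerInvariant();
                if (!index.TryGetValue(word, out var pages))
                    index[word] = pages = new SortedSet<int>();

                pages.Add(i + 1); // page numbers are 1-based
            }
        }
        return index;
    }

    // Formats entries such as "invoice: 1, 3" for printing on the index page.
    public static IEnumerable<string> Format(
        SortedDictionary<string, SortedSet<int>> index)
    {
        return index.Select(e => e.Key + ": " + string.Join(", ", e.Value));
    }
}
```

The formatted lines would then be written onto new pages appended to the document with the PDF library of your choice.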

ASP.Net Converting and Merging documents into single PDF

I need to have the ability to convert and merge various documents into a single Pdf.
The documents could be of varying types, such as Word, Open Office, Images, Text, Web pages (by URL) and the PDF would usually consist of 2-3 documents.
At the moment, we are using BCL Technologies easyPDF with Microsoft Office installed onto the Server. This handles most documents but we haven't had it doing Open Office ones yet.
We currently produce around 100-1000 of these PDFs per day.
The reason I am asking the question is that performance is a key issue. The PDF is generated for users on the fly, so the waiting times of 30-60 seconds we are currently getting are becoming unacceptable.
We have done some caching around documents when they are initially uploaded, so the main task that happens when a user requests a PDF is merging a number of already generated PDFs.
Does anyone else have any other tools they have used that work reliably for most common document types and above all, quickly? When put like that, it seems like I'm asking a lot!
Edit:
Thanks for all the great advice, I'll look into some of these and compare performance.
Just to add to all this, money is not really an object. We're more than happy to pay for different applications to perform each task as well as looking into various hardware options to distribute the load as much as possible.

Merging multiple PDF documents is normally simple enough (as long as they don't need to be merged on the same page) - you could compare your merge performance with something like iTextSharp (.NET version of iText) to be sure it isn't a bottleneck - otherwise the conversion from other formats to PDF is likely the bottleneck.
In almost all cases, the method used to convert X to PDF is to execute the application's print command, targeted at a software PDF printer, to create a temporary PDF file.
This means:
The target application (for example Office) is opened and closed
The document has to travel through the printing service
In your situation, are you converting arbitrary documents submitted by the users, or do the documents come from a stored library of files? If it's a library, you could make a PDF copy of each file as it is added to the library (instead of when the user makes a request), and then only merge the PDF files.
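Converting each file once, when it is added to the library, could be sketched like this. ConvertedPdfCache is a hypothetical class, and the conversion itself is passed in as a delegate, since it depends on the tool you use (easyPDF, Office automation, and so on).

```csharp
using System;
using System.IO;

// Hypothetical cache: converts a document once, when it is added to the
// library, so that a user request only has to merge existing PDFs.
public sealed class ConvertedPdfCache
{
    private readonly string _cacheDir;
    // (sourcePath, pdfPath): plug in your own converter here.
    private readonly Action<string, string> _convertToPdf;

    public ConvertedPdfCache(string cacheDir, Action<string, string> convertToPdf)
    {
        _cacheDir = cacheDir;
        _convertToPdf = convertToPdf;
        Directory.CreateDirectory(cacheDir);
    }

    // Call at upload time; returns the path of the cached PDF copy.
    public string GetOrConvert(string sourcePath)
    {
        string pdfPath = Path.Combine(
            _cacheDir, Path.GetFileNameWithoutExtension(sourcePath) + ".pdf");

        // Reconvert only if there is no copy yet or the source is newer.
        if (!File.Exists(pdfPath) ||
            File.GetLastWriteTimeUtc(pdfPath) < File.GetLastWriteTimeUtc(sourcePath))
        {
            _convertToPdf(sourcePath, pdfPath);
        }
        return pdfPath;
    }
}
```

Note that this sketch keys the cache on the file name alone, so two source files with the same name would collide; a real implementation would probably key on a document ID or content hash.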

We use ABC Pdf. I don't know if it will be fast enough for your needs, but it seems to work for our use.

I had a very similar issue where we had documents that were already existing in PDF format and needed to allow the user to see them all combined together. We purchased the PDF4NET product which was about $500 from what I recall. It was extremely easy to use and they provide awesome examples of how to use the tools.
O2 Solutions - PDF4NET
Here is the code sample that they provide for merging. The first line looks like it just writes the merged file to disk; the next two lines allow for streaming the content back to the user.
PDFFile.MergeFilesToDisk( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
PDFDocument doc = PDFFile.MergeFilesToDoc( "append.pdf", "unicode.pdf", "multicolumntextandimages.pdf" );
doc.SaveToStream( stream );

You say you're using Microsoft Office to open these files; I would imagine this is the bottleneck rather than the actual PDF creation.
Is it possible to distill these documents into a more accessible format (HTML/XML/database), so that it's not necessary to open Office every time a PDF needs to be created?

While I have no PDF conversion suggestions I can say that this problem sounds like one which could be distributed over a number of nodes. Do you find that the PDF generation is CPU-bound or are there other limiting factors? Before expending too much effort on rewriting the PDF library interface you might want to see what the bottlenecks are.
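One simple way to see where the time goes before distributing anything is to time each stage separately. A minimal sketch using Stopwatch follows; StageTimer is a hypothetical helper, not part of any PDF library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical profiling helper: accumulates elapsed time per named stage.
public sealed class StageTimer
{
    private readonly Dictionary<string, TimeSpan> _totals =
        new Dictionary<string, TimeSpan>();

    // Runs the work, adds its duration to the stage total, returns its result.
    public T Measure<T>(string stage, Func<T> work)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return work();
        }
        finally
        {
            sw.Stop();
            // soFar is TimeSpan.Zero when the stage has not been seen yet.
            _totals.TryGetValue(stage, out var soFar);
            _totals[stage] = soFar + sw.Elapsed;
        }
    }

    public IReadOnlyDictionary<string, TimeSpan> Totals => _totals;
}
```

Wrapping the conversion and merge calls in Measure("convert", ...) and Measure("merge", ...) and logging Totals after each request should show which stage dominates the 30-60 seconds.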

Creating PowerPoint presentations programmatically

Is there a way to programmatically create PowerPoint presentations? If possible, I'd like to use C# and create PowerPoint 2003 presentations.

Yes, you can.
You will want to look into MSDN which has a pretty good introduction to it.
A word of warning: Microsoft Office interop is compatible with an API which is now more than 10 years old. Because of this, it is downright nasty to use sometimes. If you have the money to invest in a good book or two, I think it would be money well spent.
Here's a starting point for you. Use the search feature on the MSDN webpage. It's good for any Microsoft C# .NET style stuff.
Specifically, with regard to your question, this link should help: Automate PowerPoint from C#. Edit: that link is now dead :(. These two links are fairly close to the original KB article:
Automate Powerpoint from C# 1/2
Automate Powerpoint from C# 2/2
Finally, to whoever downvoted this: we were all learning once. How to do something as a beginner is most definitely programming-related, regardless of how new someone might be.

OpenXML looks like the way to go from a web app.
Using the interop libraries is not recommended, as others have stated.

You can also look at Aspose Slides, a component for .NET and Java that makes it easy to generate PowerPoint documents.

If you don't really need PowerPoint-compatible output, consider using a markup language such as LaTeX with the Beamer package to produce a PDF of the presentation, or use HTML and JavaScript in a manner similar to Slidy. If you need fancy effects, it might still be easier to use SVG, and you'd have the benefit of getting output that can be reliably viewed with free software.

http://msdn.microsoft.com/hi-in/magazine/cc163471(en-us).aspx
Use this link. Although this is in VB.NET, C# supports the same.

You may also try out SlideMight, a tool for merging hierarchical data with PowerPoint templates.
SlideMight supports:
text substitution in text fields, tables and notes
image substitution, from raw data, files and URLs
images in tables nested
iterations over data to create slides
iterations to populate tables, possibly spanning multiple slides
special formatting for specific cell values
hyperlinks to generated slides
Input data format is at this time just JSON.
There are versions for Windows and Mac OS X.
More information is at http://www.SlideMight.com
Disclaimer:
I am the owner of Delftware Technology, the company that developed SlideMight.
And I am one of the developers.

You can use the Essential Presentation product from Syncfusion Software Private Limited. This product can be used to:
Create and manipulate PowerPoint presentations
Open, modify, and save existing PowerPoint presentations
Convert PowerPoint presentations to PDF or Image
More information is at https://help.syncfusion.com/file-formats/presentation/overview
Disclaimer:
I work for Syncfusion Software Private Limited

Batch Printing PDFs from ASP.NET

I have a situation where in a web application a user may need a variable list of PDFs to be printed. That is, given a large list of PDFs, the user may choose an arbitrary subset of that list to print. These PDFs are stored on the file system. I need a method to allow users to print these batches of PDFs relatively easily (thus, asking the user to click each PDF and print is not an option) and without too much of a hit on performance.
A couple of options I've thought about:
1) I have a colleague who uses a PDF library that I could use to take the PDFs and combine them on the fly and then send that PDF to the user for printing. I don't know if this method will mess up any sort of page numbering. This may be an "ok" method but I worry about the performance hit of this.
2) I've thought about creating an ActiveX that I would pass the PDFs off to and let it invoke the printing features. My concern is that this is needlessly complex and may present some odd user interactions.
So, I'm looking for the best option to use in this scenario, which is probably not one of the ones I've gone through.

The best solution I have for you is number 1. There are plenty of libraries that will merge documents. With the one I've used, the numbering should not be an issue since all the pages are already rendered.
If you go with ActiveX you are going to limit yourself to IE, which might be acceptable. The only other idea would be to use a smart client so you can have more control... then you could serve up the PDFs via a web service.

I think concatenating the documents is the way to go.
For tools I recommend iText#. It's free.
You can download it here: iTextSharp
iText# (iTextSharp) is a port of the iText open source Java library for PDF generation, written entirely in C# for the .NET platform. Use the iText mailing list to get support.
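As a rough idea of what the concatenation could look like, here is a sketch written against the iTextSharp 5.x API (PdfReader, PdfCopy). Treat it as a starting point under that version assumption rather than a tested implementation; PdfMerger is a hypothetical class name.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Sketch assuming iTextSharp 5.x; PdfMerger is a hypothetical helper name.
public static class PdfMerger
{
    // Concatenates the input files, in order, into one output PDF.
    public static void Merge(IEnumerable<string> inputPaths, string outputPath)
    {
        using (var output = new FileStream(outputPath, FileMode.Create))
        {
            var document = new Document();
            var copy = new PdfCopy(document, output);
            document.Open();

            foreach (string path in inputPaths)
            {
                var reader = new PdfReader(path);
                // iText page numbers are 1-based.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                reader.Close();
            }

            // Closing the document finishes writing the merged file.
            document.Close();
        }
    }
}
```

In an ASP.NET page the FileStream could be swapped for a MemoryStream whose bytes are written to the response, so the user receives a single PDF to print.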

I agree with #1. You could do some tests to see what the performance hit would be like.
