Convert XPS to PDF in WPF C# application - c#

I need to convert an XPS file that I create with my application to a PDF file. What is the best way to do so? I would prefer to do this from inside C# using a managed assembly.
Open source is preferred over commercial third-party solutions.

You can use the XpsDocument class to read the XPS files, then use a PDF library (such as Report.NET or #PDF) to export it. I used #PDF back in .NET 1.1, but I am not sure whether it can be easily converted to .NET 2.0.
#PDF:
http://sharppdf.sourceforge.net/
Report.NET:
http://report.sourceforge.net/

An open source managed assembly might be hard to find, but you can look at tallcomponents.com for a commercial product that might help. You can also have a look at GhostScript.com: it's open source and supports both XPS and PDF, although you may have issues redistributing it without a license.

XPS to PDF document conversion using Ghostscript (GhostXPS). The code snippet below converts an XPS file to PDF:
Process process = new Process();
process.StartInfo.FileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "ghostxps-9.54.0-win32", "gxpswin32.exe");
process.StartInfo.Arguments = $"-sDEVICE=pdfwrite -sOutputFile=\"{pdfFilePath}\" -dNOPAUSE \"{xpsFilePath}\"";
process.Start();
process.WaitForExit();
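A slightly more defensive variant of the snippet above can be sketched as follows. It keeps the same GhostXPS switches, but disposes the process and checks the exit code. The class name XpsToPdf, the gxpsExePath parameter, and the assumption that GhostXPS returns a non-zero exit code on failure are illustrative and do not come from the original answer.

```csharp
using System;
using System.Diagnostics;

public static class XpsToPdf
{
    // Builds the GhostXPS argument string; paths are quoted so spaces survive.
    public static string BuildArguments(string xpsFilePath, string pdfFilePath)
    {
        return $"-sDEVICE=pdfwrite -sOutputFile=\"{pdfFilePath}\" -dNOPAUSE \"{xpsFilePath}\"";
    }

    // gxpsExePath is a hypothetical parameter: point it at wherever gxpswin32.exe is deployed.
    public static void Convert(string gxpsExePath, string xpsFilePath, string pdfFilePath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gxpsExePath,
            Arguments = BuildArguments(xpsFilePath, pdfFilePath),
            UseShellExecute = false,      // required for stream redirection
            CreateNoWindow = true,
            RedirectStandardError = true
        };

        using (Process process = Process.Start(startInfo))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            // Assumption: a non-zero exit code signals a failed conversion.
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    $"GhostXPS exited with code {process.ExitCode}: {errors}");
        }
    }
}
```

Calling XpsToPdf.Convert(exePath, xpsFilePath, pdfFilePath) blocks until the conversion finishes, so the PDF can be opened right after the call returns.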

Although it is not free, Amyuni PDF Creator .Net supports loading XPS files and saving them as PDF.
Usual disclaimer applies.

Related

Extracting the first page of multiple PDFs & saving them as Image

I have about 400 ebooks, all in PDF format, and my task is to extract the cover from every one of them (which is the first page of every PDF) and export them all as separate image (PNG or JPEG) files
So I will end up with 400 ebooks and 400 images of their covers.
I have Windows
Any advice greatly appreciated.
Use ghostscript to render tiff or jpg from the pdf. You have fine grained control over the result.
If this is a commercial application, you need a commercial license. If you use the application commercially, but inside your organisation, you are allowed to use the GPLed version of ghostscript.
Ghostscript can be found here. The PDF interpreter in many opensource packages relies on the gs PDF interpreter. Imagemagick for example, requires ghostscript libraries.
Download GS here: http://ghostscript.com/download/gsdnld.html
Use the C# Process class to execute Ghostscript; there is an SO topic on this: How to run a C# console application with the console hidden
The commandline for tiff will be:
D:\gs\gs9.20>bin\gswin64c.exe -sOutputFile=d:\some%02d.tiff -dBATCH -dNOPAUSE -sDEVICE=tiff24nc -sCompression=lzw -r150 -sPageList=1 d:\PDFReference.pdf
This will create one some01.tiff file on d:\ in 150dpi resolution.
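For the 400-ebook case, the same idea can be looped from C#. The sketch below is only an illustration: it swaps the TIFF device for png16m and uses -dFirstPage/-dLastPage instead of -sPageList to select the cover page, and the class name CoverExtractor and the gsExePath, pdfFolder and outputFolder parameters are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CoverExtractor
{
    // Arguments that render only page 1 of a PDF to a 24-bit PNG at 150 dpi.
    // Note: Ghostscript treats '%' in -sOutputFile as a format specifier,
    // so file names containing '%' would need extra handling.
    public static string BuildArguments(string pdfPath, string pngPath)
    {
        return "-dBATCH -dNOPAUSE -sDEVICE=png16m -r150 -dFirstPage=1 -dLastPage=1 " +
               $"-sOutputFile=\"{pngPath}\" \"{pdfPath}\"";
    }

    // gsExePath is hypothetical, e.g. the full path to gswin64c.exe.
    public static void ExtractCovers(string gsExePath, string pdfFolder, string outputFolder)
    {
        Directory.CreateDirectory(outputFolder);
        foreach (string pdfPath in Directory.GetFiles(pdfFolder, "*.pdf"))
        {
            string pngPath = Path.Combine(
                outputFolder, Path.GetFileNameWithoutExtension(pdfPath) + ".png");

            var startInfo = new ProcessStartInfo
            {
                FileName = gsExePath,
                Arguments = BuildArguments(pdfPath, pngPath),
                UseShellExecute = false,
                CreateNoWindow = true
            };

            using (Process process = Process.Start(startInfo))
            {
                process.WaitForExit();
                if (process.ExitCode != 0)
                    Console.Error.WriteLine($"Ghostscript failed for {pdfPath}");
            }
        }
    }
}
```

Each PDF gets one PNG named after the source file, so the covers stay matched to their ebooks.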
The following thread is suitable for your request. converting pdf file to an jpeg image
One solution is to use a third party library. ImageMagick is a very popular, freely available tool. You can get a .NET wrapper for it here. The original ImageMagick download page is here.
http://www.codeproject.com/KB/library/pdftoimages.aspx Convert PDF pages to image files using the Solid Framework
http://www.print-driver.com/howto/convert_pdf_to_jpeg.html Universal Document Converter
http://www.makeuseof.com/tag/6-ways-to-convert-a-pdf-file-to-a-jpg-image/ 6 Ways To Convert A PDF To A JPG Image
And you also can take a look at this thread: how to open a page from a pdf file in pictureBox in C#
If you use this process to convert a PDF to tiff, you can use this class to retrieve the bitmap from tiff.

Can I save a DOCX file as HTML using the DOCX Library?

I am using the DOCX library to manipulate *.docx files.
I would like to save a *docx file as an html file, but this code:
using (DocX sourceDoc = DocX.Create(sourceFilename))
{
sourceDoc.SaveAs(sourceHTMLFileName);
}
...does not work (sourceHTMLFileName is "Bla.html").
Is it possible? If so, how?
The author of DocX has stated in a blog post that his library does not support this feature yet. (I got the link from the CodePlex page for the library.)
Quote from the link:
I would love to add this functionality to DocX, however there is a problem.
[...]
The only easy way to do this conversion, is to use Microsoft’s Office interop libraries
[...]
Is there no way to do conversions without having Word.exe installed on my machine. I didn’t say that, I said there is no easy way. This looks very promising, now if I could only find the time.
He suggests a workaround using Interop but that might not be possible depending on your environment.
Using SaveAs with a file that ends in .html simply saves a .docx file with the wrong extension; there is no conversion done.

Print Pdf in C# [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I'm new to c#. I was looking all over the net for tutorials on how to print pdf, but couldn't find one.
Then I thought, is it possible to read it using itextpdf, like mentioned here
Reading PDF content with itextsharp dll in VB.NET or C#
then print it. If so, how?
A very straight forward approach is to use an installed Adobe Reader or any other PDF viewer capable of printing:
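Process p = new Process( );
p.StartInfo = new ProcessStartInfo( )
{
    CreateNoWindow = true,
    UseShellExecute = true, // verbs such as "print" need the shell; this is not the default on .NET Core
    Verb = "print",
    FileName = path //put the correct path here
};
p.Start( );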
Another way is to use a third party component, e.g. PDFView4NET
I wrote a little helper method around the adobereader to bulk-print pdf from c#...:
public static bool Print(string file, string printer) {
    try {
        Process.Start(
            Registry.LocalMachine.OpenSubKey(
                @"SOFTWARE\Microsoft\Windows\CurrentVersion" +
                @"\App Paths\AcroRd32.exe").GetValue("").ToString(),
            string.Format("/h /t \"{0}\" \"{1}\"", file, printer));
        return true;
    } catch { }
    return false;
}
One cannot rely on the return-value of the method btw...
Another approach, if you simply wish to print a PDF file programmatically, is to use the LPR command:
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/lpr
LPR is available on newer versions of Windows too (e.g. Vista/7), but you need to enable it in the Optional Windows Components.
For example:
Process.Start("lpr", @"-S printerdnsalias -P raw C:\files\file.pdf");
You can also use the printer IP address instead of the alias.
This assumes that your printer supports PDF Direct Printing; otherwise this will only work for PostScript and ASCII files. Also, the printer needs to have a network interface installed and you need to know its IP address or alias.
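The LPR call can be wrapped in a small helper that keeps the executable name and its arguments separate, which is what Process.Start expects. This is a minimal sketch: the class name LprPrinter and its parameter names are made up for illustration, and only the -S and -P switches from the example above are used.

```csharp
using System.Diagnostics;

public static class LprPrinter
{
    // Builds the start info for "lpr -S <host> -P <queue> <file>".
    public static ProcessStartInfo BuildStartInfo(string printerHost, string queueName, string filePath)
    {
        return new ProcessStartInfo
        {
            FileName = "lpr",  // must be enabled as an optional Windows component
            Arguments = $"-S {printerHost} -P {queueName} \"{filePath}\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs lpr and returns its exit code.
    public static int Print(string printerHost, string queueName, string filePath)
    {
        using (Process process = Process.Start(BuildStartInfo(printerHost, queueName, filePath)))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For the example above this would be LprPrinter.Print("printerdnsalias", "raw", @"C:\files\file.pdf"); the verbatim string keeps the backslashes in the path from being read as escape sequences.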
Use PDFiumViewer. I searched for a long time till I came up with a similar solution, then I found this clean piece of code that does not rely on sending raw files to the printer (which is bad if they get interpreted as text files..) or using Acrobat or Ghostscript as a helper (both would need to be installed, which is a hassle):
https://stackoverflow.com/a/41751184/586754
PDFiumViewer comes via nuget, the code example above is complete. Pass in null values for using the default printer.
I had the same problem printing a PDF file. There's a NuGet package called Spire.Pdf that's very simple to use. The free version has a limit of 10 pages; however, in my case it was the best solution, since I don't want to depend on Adobe Reader and I don't want to install any other components.
https://www.nuget.org/packages/Spire.PDF/
PdfDocument pdfdocument = new PdfDocument();
pdfdocument.LoadFromFile(pdfPathAndFileName);
pdfdocument.PrinterName = "My Printer";
pdfdocument.PrintDocument.PrinterSettings.Copies = 2;
pdfdocument.PrintDocument.Print();
pdfdocument.Dispose();
You can create the PDF document using PdfSharp. It is an open source .NET library.
When trying to print the document it gets worse. I have looked all over for an open source way of doing it. There are some ways to do it using AcroRd32.exe, but it all depends on the version, and it cannot be done without Acrobat Reader staying open.
I finally ended up using VintaSoftImaging.NET SDK. It costs some money but is much cheaper than the alternative, and it solves the problem really easily.
var doc = new Vintasoft.Imaging.Print.ImagePrintDocument { DocumentName = @"C:\Test.pdf" };
doc.Print();
That just prints to the default printer without showing. There are several alternatives and options.
The best way to print a PDF automatically from C# is to use the printer's "direct PDF" feature. You just need to copy the PDF file to the printer's network share name; the printer takes care of the rest.
In my experience this is about 10 times faster than the other methods. However, the printer model must support direct PDF printing and have at least 128 MB of DRAM, which any modern printer easily meets.
I wrote and released a small Nuget Package which can be used to print a PDF file to a printerdriver. It can also print to a XPS file or PDF file. Here is a link to it.
It is possible to use Ghostscript to read PDF files and print them to a named printer.
Looks like the usual suspects like pdfsharp and migradoc are not able to do that (pdfsharp only if you have Acrobat (Reader) installed).
I found here
https://vishalsbsinha.wordpress.com/2014/05/06/how-to-programmatically-c-net-print-a-pdf-file-directly-to-the-printer/
code ready for copy/paste. It uses the default printer and, from what I can see, it doesn't use any libraries; it sends the PDF bytes directly to the printer. So I assume the printer also needs to support it. On the one 10-year-old printer I tested, it worked flawlessly.
Most other approaches - without commercial libraries or applications - require you to draw yourself in the printing device context. Doable but will take a while to figure it out and make it work across printers.
The easiest way is to create C# Process and launch external tool to print your PDF file
private static void ExecuteRawFilePrinter() {
Process process = new Process();
process.StartInfo.FileName = "c:\\Program Files (x86)\\RawFilePrinter\\RawFilePrinter.exe";
process.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
process.StartInfo.Arguments = string.Format("-p \"c:\\Users\\Me\\Desktop\\mypdffile.pdf\" \"gdn02ptr006\"");
process.Start();
process.WaitForExit();
}
The code above launches RawFilePrinter.exe (similar to 2Printer.exe, but with better support). It is not free, but making a donation allows you to use it everywhere and redistribute it with your application. Latest version to download: http://bigdotsoftware.pl/rawfileprinter
It depends on what you are trying to print. You need a third party PDF printer application, or, if you are printing data of your own, you can use the report viewer in Visual Studio. It can output reports to Excel and PDF files.
It is also possible to do it with an embedded web browser. Note, however, that because this might be a local file, and because the control is not actually the browser itself, there is no DOM and therefore no ready state to wait on.
Here is the code for the approach I worked out on a win form web browser control:
private void button1_Click(object sender, EventArgs e)
{
    webBrowser1.Navigate(@"path\to\file");
}
private void webBrowser1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    //Progress Changed fires multiple times, however after the Navigated event it is fired only once,
    //and at this point it is ready to print
    webBrowser1.ProgressChanged += (o, args) =>
    {
        webBrowser1.Print();//Note this does not print only brings up the print preview dialog
        //Should be on a separate task to ensure the main thread
        //can fully initialize the print dialog
        Task.Factory.StartNew(() =>
        {
            Thread.Sleep(1000);//We need to wait before we can send enter
            //This assumes that the print preview is still in focus
            Action g = () =>
            {
                SendKeys.SendWait("{ENTER}");
            };
            this.Invoke(g);
        });
    };
}
I advise you to try the 2Printer command line tool from:
http://www.doc2prn.com/
A command line example to print all PDF files from the folder "C:\Input" is below. You can simply call it from your C# code.
2Printer.exe -s "C:\Input\*.PDF" -prn "Canon MP610 series Printer"
If you have Adobe Acrobat installed (the "Adobe PDF" virtual printer ships with Acrobat rather than with the free Reader), you should be able to just select it as the printer. And VOILA! You can print to PDF!
printDocument1.PrinterSettings.PrinterName = "Adobe PDF";
printDocument1.Print();
Just as simple as that!!!
Open, import, edit, merge, convert Acrobat PDF documents with a few lines of code using the intuitive API of Ultimate PDF. By using 100% managed code written in C#, the component takes advantage of the numerous built-in features of the .NET Framework to enhance performance. Moreover, the library is CLS compliant, and it does not use any unsafe blocks for minimal permission requirements. The classes are fully documented with detailed example code which helps shorten your learning curve. If your development environment is Visual Studio, enjoy the full integration of the online documentation. Just mark or select a keyword and press F1 in your Visual Studio IDE, and the online documentation is represented instantly. A high-performance and reliable PDF library which lets you add PDF functionality to your .NET applications easily with a few lines of code.
PDF Component for NET

Use a System.Drawing.Printing.PrintDocument to generate a PDF in memory

Does anyone know if the following is possible and if so what the best way of doing it is for free?
I am generating a PrintDocument in a project I am currently working on and displaying a print dialog box so a user can choose which printer they want to use, etc. This is currently a Windows Forms application, and if a user wants to print to a PDF they can select to print to CutePDF or something similar.
However, I am now putting an ASP.NET web frontend on the application and want to use the same code to generate the PrintDocument, but print it to a PDF on the fly and serve it up via the Response stream as a PDF download.
So my question is....How can I use the current PrintDocument and generate a PDF in memory from it??
Thanks
The System.Drawing code for a PrintDocument can be reused to generate a PDF document with ABCpdf .NET. See the System.Drawing example...
You would have to use a 3rd party component in order to generate the PDF. The following article has some links to some such components: Generating PDF Files from .Net
You're in a world of hurt if you think you're going to run the "same code" that deals with printers in both a forms app and an ASP.NET app.
You might be in luck, however, as it appears that PDFsharp + MigraDoc might be able to do this for you.
I think you will find there are no tools that will take a PrintDocument as input and render a PDF as output. The only way to do what you want is to "print" the PrintDocument to a "PDF printer driver", basically a virtual printer that generates a PDF instead of printing the actual output. There are a plethora of products on the market for that. A couple that are cheap and widely used are as follows:
Ghostscript with RedMon (open source/GPL or commercial licenses available)
Foxit Software's PDF Creator ($29)
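The virtual-printer approach can be sketched roughly as below. This is only an illustration under several assumptions: it uses the "Microsoft Print to PDF" printer built into Windows 10 and later (not one of the products listed above), the class and method names are hypothetical, and System.Drawing.Printing is not officially supported inside services or ASP.NET, so it may not behave reliably on a web server.

```csharp
using System.Drawing.Printing;
using System.IO;

public static class PrintToPdfSketch
{
    // Assumption: the built-in Windows 10+ virtual printer; not named in the thread.
    public const string DefaultPrinterName = "Microsoft Print to PDF";

    // Prints an existing PrintDocument to a temporary PDF file and returns its bytes.
    public static byte[] RenderToPdfBytes(PrintDocument document,
                                          string printerName = DefaultPrinterName)
    {
        string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");

        document.PrinterSettings.PrinterName = printerName;
        document.PrinterSettings.PrintToFile = true;        // write to a file, no save dialog
        document.PrinterSettings.PrintFileName = tempPath;
        document.PrintController = new StandardPrintController(); // no "Printing..." status dialog

        try
        {
            document.Print();
            // Caveat: the spooler may still be writing the file at this point;
            // real code may need to wait until the file is released.
            return File.ReadAllBytes(tempPath);
        }
        finally
        {
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

The returned byte array could then be written to the response stream with a content type of application/pdf.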
You really should be looking at iTextSharp (it is mentioned on the iText.NET page recommended earlier)
http://itextsharp.sourceforge.net/
PrintDocument is meant for Windows Forms applications but is up and coming in Silverlight, see this video... http://silverlight.net/learn/videos/all/printing-api-basics/
If you wish to continue with the PrintDocument and a web application, I think Silverlight 4 (which is in beta right now) is the only way to go; otherwise you're going to need a lightweight Windows Forms application installed locally for the end user, perhaps one that uses web services.
iTextSharp is a great tool for generating PDFs with .NET on the Internet. I highly recommend it; I've used iText with Java...and have been using iTextSharp for the past few years.
There are several ports of iText for .NET (A very popular open-source PDF library for Java).
http://www.ujihara.jp/iTextdotNET/en/

Programatically Break Apart a PDF created by a scanner into separate PDF documents

I have PDF documents from a scanner. Each PDF contains forms filled out and signed by staff for a day's work. I want to place a bar code or a standard area for OCR text on every form type so the batch scan can be programmatically broken apart into separate PDF documents based on form type.
I would like to do this in Microsoft .NET 2.0.
I can purchase the required Adobe or other namespaces/DLLs needed to accomplish the task if there are no open source namespaces/DLLs available.
Not a free or open source option, but you might also look at ABCPdf by webSuperGoo as another alternative to Adobe.
You can research the iTextSharp library, which can split pdf files.
But it isn't very good for reading the actual pdfs. So I have no idea how it would know where to split them.
There are companies that already do this for you.
You can research the kwiktag company.
iTextSharp will help you split, reassemble, and apply barcodes to PDFs in .NET languages. I don't think it can OCR a document, but I haven't looked (I used the ABBYY FineReader engine).
From the title of your question I'm assuming that you just need to break apart PDF files and that they are already OCR'd. There are a few open source .NET PDF libraries out there. I have successfully used PDFSharp in a project of my own.
Here is a quick snippet that shows how to cull out each page from a PDF document using PDFSharp:
// Requires: using PdfSharp.Pdf; using PdfSharp.Pdf.IO;
string filePath = @"c:\file.pdf";
// Import mode is needed so that pages can be added to another document
using (PdfDocument ipdf = PdfReader.Open(filePath, PdfDocumentOpenMode.Import))
{
    int i = 1;
    foreach (PdfPage page in ipdf.Pages)
    {
        using (PdfDocument opdf = new PdfDocument())
        {
            opdf.Version = ipdf.Version;
            opdf.AddPage(page);
            opdf.Save("page " + i++ + ".pdf");
        }
    }
}
Assuming also that you need to access the text in the document for grouping you can use the PdfPage.Contents property.
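Once the first page of each form is known (for example from a detected bar code), the batch can be cut into multi-page documents rather than single pages. The sketch below assumes the caller already has the 0-based start page of every form; the class name BatchSplitter, the startPages list and the outputPrefix parameter are hypothetical, and the bar code or OCR detection itself is not shown.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class BatchSplitter
{
    // Turns sorted 0-based start pages into { start, endExclusive } pairs.
    public static List<int[]> ToRanges(IList<int> startPages, int pageCount)
    {
        List<int[]> ranges = new List<int[]>();
        for (int i = 0; i < startPages.Count; i++)
        {
            int end = (i + 1 < startPages.Count) ? startPages[i + 1] : pageCount;
            ranges.Add(new int[] { startPages[i], end });
        }
        return ranges;
    }

    // Writes one PDF per form: outputPrefix + "1.pdf", outputPrefix + "2.pdf", ...
    public static void Split(string inputPath, IList<int> startPages, string outputPrefix)
    {
        // Import mode lets pages be copied into other documents.
        using (PdfDocument input = PdfReader.Open(inputPath, PdfDocumentOpenMode.Import))
        {
            int part = 1;
            foreach (int[] range in ToRanges(startPages, input.PageCount))
            {
                using (PdfDocument output = new PdfDocument())
                {
                    for (int p = range[0]; p < range[1]; p++)
                        output.AddPage(input.Pages[p]);
                    output.Save(outputPrefix + part++ + ".pdf");
                }
            }
        }
    }
}
```

For an 8-page scan with forms starting on pages 0, 3 and 5, ToRanges yields the page ranges 0-2, 3-4 and 5-7, so Split would write three files.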
You can use several, try these free tools:
PDF Toolkit
Multivalent
Check out the Tesseract .NET wrapper (v2.04.0) around the C++ OCR engine of the same name, originally developed at HP in the 1980s and 1990s; it won awards for its ingenuity.
