Printing PDF programmatically bloats file size

OK, so I've been given the task of building a lightweight printing feature to replace a third-party tool that costs a significant amount of money and, on top of that, has far too many features.
I've managed to build a little system that polls some data and calls an endpoint on an on-premise MVC app, which in turn prints the document.
All is great, but I'm really struggling to figure out why the PDF file size bloats when it hits the print queue.
Currently the file is 822KB. When I print it manually via Adobe, the PDF is compressed to 342KB,
but when I print it through the system it bloats to an astonishing 4.22MB.
To note, I am using the PDFium SDK NuGet package to take away some of the heavy lifting. Having said that, I do use System.Drawing.Printing to craft the settings that I pass to PDFium.
A little code to demonstrate printing:
public bool PrintPDF(string printer,
    string filePath)
{
    try
    {
        // Printer name and file path are hard-coded for this demonstration.
        var printerSettings = new PrinterSettings
        {
            PrinterName = "Hewlett-Packard HP LaserJet P2015 Series",
            Copies = 1,
        };
        // Assumed: the default page settings of the selected printer.
        var pageSettings = new PageSettings(printerSettings);
        using (var document = PdfDocument.Load(@"C:\folder\Documentation\test.pdf"))
        {
            using (var printDocument = document.CreatePrintDocument())
            {
                printDocument.PrinterSettings = printerSettings;
                printDocument.DefaultPageSettings = pageSettings;
                printDocument.DocumentName = "test.pdf";
                // StandardPrintController prints without showing a status dialog.
                printDocument.PrintController = new StandardPrintController();
                printDocument.Print();
            }
        }
        return true;
    }
    catch (System.Exception ex)
    {
        new Email().SendEmail("", "TEST ERR", ex.Message, "email address");
        return false;
    }
}
At the moment I'd be happy if it printed at the file's actual size (822KB) rather than bloating it.
I'd really appreciate some guidance and a nudge in the right direction.

PDF is (usually) a vector representation of the page; it's a page description. PDF can contain bitmap data as well, but for text and line art it's usually vector, and white space simply isn't included in the description at all.
When you print, behind the scenes the application creates a device context compatible with the printer you select, replays the drawing commands it used to draw the content on the display, and then tells the printer context to print.
That causes the device driver to be handed the GDI commands to draw the page. Depending on the printer type (i.e. what page description language it understands), the device driver can simply pass on the commands (for a GDI printer), convert them to a high-level vector representation (like PostScript), or render them to a bitmap. Some drivers may do a combination of these approaches. The result is then sent to the printer.
The Adobe PDF 'printer' works by co-opting the Windows PostScript printer driver, which converts GDI commands into vector PostScript operations, which are easily turned into vector PDF operations, resulting in a small representation of the page.
It sounds to me like your printer (or possibly printer driver) is 'dumb' and wants, or is being sent, a big bitmap. Once upon a time, in the days when printers ran on serial interfaces and 9600 baud was fast, it was worth keeping the file size small and having the printer be smart, because it took a long time to send the data. Nowadays that's less of a concern; even several megabytes can transfer rapidly, and if you send a pre-rendered bitmap to the printer, the printer can be dumb and still print fast, because all it has to do is transfer the bits.
You haven't really said what you mean when you "print manually using Adobe" or "use the system" so I can't tell you more than that, but my guess would be that your big PDF simply contains a large (compressed) image.
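To get a feel for the sizes involved, here is a rough back-of-the-envelope sketch of how large one uncompressed, rendered page can be. The paper size, resolution and bit depths are illustrative assumptions rather than values taken from the question, and real drivers usually compress the raster data before sending it.

```csharp
using System;

static class RasterSizeEstimate
{
    // Uncompressed size in bytes of one page rendered to a bitmap.
    // All inputs are illustrative assumptions, not measured values.
    public static long PageBytes(double widthInches, double heightInches, int dpi, int bitsPerPixel)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        long bitsPerRow = widthPx * bitsPerPixel;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round each row up to whole bytes
        return bytesPerRow * heightPx;
    }

    static void Main()
    {
        // US Letter at 600 dpi, monochrome and 24-bit colour.
        Console.WriteLine(PageBytes(8.5, 11, 600, 1));
        Console.WriteLine(PageBytes(8.5, 11, 600, 24));
    }
}
```

Under these assumptions a single Letter page comes to roughly 4.2 million bytes at 1 bit per pixel and roughly 101 million bytes at 24 bits per pixel, so a rendered page can easily dwarf a vector description of the same content.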

Related

Exporting WPF Canvas to PDF

I've been attempting to find an easy solution to exporting a Canvas in my WPF Application to a PDF Document.
So far, the best solution has been to use the PrintDialog and set it up to automatically use the Microsoft Print to PDF 'printer'. The only problem I have had with this is that although the PrintDialog is skipped, a FileDialog still appears to choose where the file should be saved.
Sadly, this is a deal-breaker because I would like to run this over a large number of canvases with automatically generated PDF names (well, programmatically provided anyway).
Other solutions I have looked at include:
Using PrintDocument, but from my experimentation I would have to iterate through all of my Canvas's children and manually invoke the correct Draw method for each (which would be rather time-consuming for the many custom elements I have with transformations).
Exporting as a PNG image and then embedding that in a PDF. Although this works, TextBlocks within my canvas are no longer text, so this isn't ideal.
Using the third-party library PDFSharp, which has the same downside as PrintDocument: a lot of custom logic for each element.
With PDFSharp, I did find a method for generating the XGraphics from a Canvas, but no way of then consuming that object to make a PDF page.
So does anybody know how I can skip or automate the PDF PrintDialog, or consume PDFSharp XGraphics to make a page? Or are there any other ideas for directions to take this, besides writing a whole library to convert each of my Canvas elements to PDF elements?

If you look at the output port of Microsoft Print to PDF on a recent Windows installation, you may note it is set to PORTPROMPT:, and that is exactly what causes the request for a filename.
You might note that lower down I have several ports set to a filename, and the fourth one down is called "My Print to PDF".
It is very last-century methodology: when I print with a duplicate printer under a different name, I can use different page ratios etc. without altering the built-in standard one. The output file will naturally be built:
A) In exactly one repeatable location, which I can monitor and then rename the file based on the source that called the print sequence, such that if it is my current default printer I can right-click files to print to a known \folder\file.pdf
B) The same port can be used via certain /pt (printto) command combinations to output, not just to that default port location, but to a given folder\name such as
"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" /pt listIN.doc "My Print to PDF" "My Print to PDF" "listOUT.pdf"
Other drivers usually charge for the convenience of WPF programmable renaming, but I will leave you that PrintVisual challenge for another of your three wishes.
MS suggest XPS is best, but then they would be promoting it as a PDF competitor.
It does not need to be Doc[X]2PDF it could be [O]XPS2PDF or aPNG2PDF or many pages TIFF2PDF etc. etc. Any of those are Native to Win 10 also other 3rd party apps such as [Free]Office with a PrintTo verb will do XLS[X]2PDF. Imagination becomes pagination.
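The /pt command line above can also be launched from C#. The following is only a minimal sketch: the class and method names are hypothetical, the printer name "My Print to PDF" refers to the duplicate printer described above, and it assumes WordPad is installed at the path used in that command.

```csharp
using System;
using System.Diagnostics;

static class PrintToPdfPort
{
    // Builds the same argument layout as the command above:
    // /pt "input" "printer" "printer" "output"
    // (the printer name is passed twice, exactly as in that command).
    public static string BuildArguments(string inputFile, string printerName, string outputFile)
    {
        return string.Format("/pt \"{0}\" \"{1}\" \"{1}\" \"{2}\"",
            inputFile, printerName, outputFile);
    }

    public static void Print(string inputFile, string printerName, string outputFile)
    {
        // Assumption: WordPad lives at the location used in the command above.
        string wordPad = Environment.ExpandEnvironmentVariables(
            @"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE");
        var startInfo = new ProcessStartInfo
        {
            FileName = wordPad,
            Arguments = BuildArguments(inputFile, printerName, outputFile),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit(); // block until WordPad has handed the job over
        }
    }
}
```

A call such as PrintToPdfPort.Print("listIN.doc", "My Print to PDF", "listOUT.pdf") would then mirror the command line shown earlier.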

I had great success generating PDFs using PDFSharp in combination with SkiaSharp (for more advanced graphics).
Let me begin from the very end:
you save the PdfDocument object in the following way:
PdfDocument yourDocument = ...;
string filename = @"your\file\path\document.pdf";
yourDocument.Save(filename);
creating the PdfDocument with a page can be achieved the following way (adjust the parameters to fit your needs):
PdfDocument yourDocument = new PdfDocument();
yourDocument.PageLayout = PdfPageLayout.SinglePage;
yourDocument.Info.Title = "Your document title";
PdfPage yourPage = yourDocument.AddPage();
// Orientation and Size are properties of the page, not of the document.
yourPage.Orientation = PageOrientation.Landscape;
yourPage.Size = PageSize.A4;
the PdfPage object's content (as an example I'm putting a string and an image) is filled in the following way:
using (XGraphics gfx = XGraphics.FromPdfPage(yourPage))
{
    XFont yourFont = new XFont("Helvetica", 20, XFontStyle.Bold);
    // Draw a centered string 10 mm below the top of the page.
    gfx.DrawString(
        "Your string in the page",
        yourFont,
        XBrushes.Black,
        new XRect(0, XUnit.FromMillimeter(10), yourPage.Width, yourFont.GetHeight()),
        XStringFormats.Center);
    using (Stream s = new FileStream(@"path\to\your\image.png", FileMode.Open))
    {
        XImage image = XImage.FromStream(s);
        // Scale the height so the image keeps its aspect ratio.
        var imageRect = new XRect()
        {
            Location = new XPoint() { X = XUnit.FromMillimeter(42), Y = XUnit.FromMillimeter(42) },
            Size = new XSize() { Width = XUnit.FromMillimeter(42), Height = XUnit.FromMillimeter(42.0 * image.PixelHeight / image.PixelWidth) }
        };
        gfx.DrawImage(image, imageRect);
    }
}
Of course, the font objects can be created as static members of your class.
And this is, in short to answer your question, how you consume the XGraphics object to create a PDF page.
Let me know if you need more assistance.

Reduce pdf size in .Net

I have a PDF template that is 230KB in size. In my Web API, for each of multiple users I take a copy of that template, push data into it, and merge the results using the iTextSharp library. For 1500 users, the total file size reaches 320 MB.
I tried using BitMiracle, which reduced the file size to 160 MB, but that is still a large file.
I used Acrobat Pro's "Save as Other" > "Reduced Size PDF" option, which reduced the file size to 25 MB.
I want to decrease the file size to 25 MB in my Web API using C#, which will be hosted on a server later.
The user is not supposed to edit the PDF; they will just store it as a record. Can I generate a PostScript file and then use Acrobat Distiller to decrease the size? If yes, how can I do it?
I am using Ghostscript.NET. I wrote this method, and it does not throw any error, but I am unable to find the path of the generated PostScript file.
public void convertToPs(string file)
{
    try
    {
        Process printProcess = new Process();
        printProcess.StartInfo.FileName = file;
        // "printto" asks the shell to print the file to the named printer.
        printProcess.StartInfo.Verb = "printto";
        printProcess.StartInfo.Arguments = "\"Ghostscript PDF\"";
        printProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
        printProcess.StartInfo.CreateNoWindow = true;
        printProcess.Start();
        // Wait until the PostScript file is created
        try
        {
            printProcess.WaitForExit();
        }
        catch (InvalidOperationException) { }
        printProcess.Dispose();
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace.
        throw;
    }
}
Please help

Does the template have embedded fonts? The merging probably doesn't combine those fonts. If you don't need embedded fonts, you could remove them. Adobe does a good job of combining embedded fonts.
If you want, you can send me one of these big PDF documents so that I can understand why the file gets so big. I am a developer of a PDF library, and getting the smallest PDF is one interesting use case I am working on.
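Since the question already uses iTextSharp, one thing that may be worth trying is merging with PdfSmartCopy, which tries to reuse identical resources across the copied pages instead of writing them once per document. This is a sketch of a swapped-in technique, not the asker's code: it assumes iTextSharp 5.x, the class and parameter names are hypothetical, and whether it helps depends on how the fonts were embedded in each copy.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class SmartMerge
{
    // Merges the given PDFs into one file, letting PdfSmartCopy
    // share identical streams (fonts, images) between pages.
    // Assumes at least one input with at least one page.
    public static void Merge(IEnumerable<string> inputPaths, string outputPath)
    {
        using (var output = new FileStream(outputPath, FileMode.Create))
        {
            var document = new Document();
            var copy = new PdfSmartCopy(document, output);
            copy.SetFullCompression(); // must be set before the document is opened
            document.Open();
            foreach (string path in inputPaths)
            {
                var reader = new PdfReader(path);
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                copy.FreeReader(reader);
                reader.Close();
            }
            document.Close();
        }
    }
}
```

PdfSmartCopy does extra work to compare resources, so it is typically slower than a plain PdfCopy; for 1500 documents it would be worth measuring both the time and the resulting size.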

c# printing through PDF drivers, print to file option will output PS instead of PDF

After struggling for a whole day, I identified the issue, but this didn't solve my problem.
In short:
I need to open a PDF, convert it to BW (grayscale), search for some words and insert some notes near the words found. At first look it seems easy, but I discovered how hard PDF files are to process (they have no concept of "words", and so on).
Now the first task, converting to grayscale, just drove me crazy. I didn't find a working solution, either commercial or free. I came up with this solution:
open the PDF
print with Windows drivers, using some free PDF printers
This is quite ugly, since I will force the users of the C# application to install such third-party software, but that is it for the moment. I tested FreePDF, CutePDF and PDFCreator. All of them work "stand-alone" as expected.
Now when I try to print from C#, obviously, I don't want the print dialog; I just want to select the BW option and print (i.e. convert).
The following code just uses a PDF library, shown for clarity only.
Aspose.Pdf.Facades.PdfViewer viewer = new Aspose.Pdf.Facades.PdfViewer();
viewer.BindPdf(txtPDF.Text);
viewer.PrintAsGrayscale = true;
//viewer.RenderingOptions = new RenderingOptions { UseNewImagingEngine = true };
//Set attributes for printing
//viewer.AutoResize = true; //Print the file with adjusted size
//viewer.AutoRotate = true; //Print the file with adjusted rotation
viewer.PrintPageDialog = false; //Do not produce the page number dialog when printing
////PrinterJob printJob = PrinterJob.getPrinterJob();
//Create objects for printer and page settings and PrintDocument
System.Drawing.Printing.PrinterSettings ps = new System.Drawing.Printing.PrinterSettings();
System.Drawing.Printing.PageSettings pgs = new System.Drawing.Printing.PageSettings();
//System.Drawing.Printing.PrintDocument prtdoc = new System.Drawing.Printing.PrintDocument();
//prtdoc.PrinterSettings = ps;
//Set printer name
//ps.PrinterName = prtdoc.PrinterSettings.PrinterName;
ps.PrinterName = "CutePDF Writer";
ps.PrintToFile = true;
ps.PrintFileName = @"test.pdf";
//
//ps.
//Set PageSize (if required)
//pgs.PaperSize = new System.Drawing.Printing.PaperSize("A4", 827, 1169);
//Set PageMargins (if required)
//pgs.Margins = new System.Drawing.Printing.Margins(0, 0, 0, 0);
//Print document using printer and page settings
viewer.PrintDocumentWithSettings(ps);
//viewer.PrintDocument();
//Close the PDF file after printing
viewer.Close();
What I discovered, and what seems to be little explained, is that if you set
ps.PrintToFile = true;
no matter which C# PDF library or PDF printer driver is used, Windows will just skip the PDF driver and output PS (PostScript) files instead of PDF files, which obviously will not be recognized by Adobe Reader.
Now the question (and I am positive that others who want to print PDFs from C# may have encountered this) is how to print to CutePDF, for example, and still suppress any filename dialog?
In other words, just print silently with programmatically selected filename from C# application. Or somehow convince "print to file" to go through PDF driver, not Windows default PS driver.
Thanks very much for any hints.

I solved the conversion to grayscale with a commercial component, as described in the post below, where I also posted my complete solution in case anyone struggles like I did.
Converting PDF to Grayscale pdf using ABC PDF
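For the "silent print with a chosen filename" part of the question, one commonly reported alternative is the built-in "Microsoft Print to PDF" printer, which is reported to honor PrintToFile and PrintFileName and write a PDF without prompting. The sketch below is hedged accordingly: it assumes Windows 10 or later with that printer installed, the class and method names are hypothetical, and it has not been verified against CutePDF or the other drivers tested in the question.

```csharp
using System.Drawing;
using System.Drawing.Printing;

static class SilentPdfPrint
{
    // Prints one page of text to a PDF file without showing any dialog.
    // Assumption: the "Microsoft Print to PDF" printer is installed.
    public static void PrintTextToPdf(string text, string outputPath)
    {
        using (var document = new PrintDocument())
        {
            document.PrinterSettings = new PrinterSettings
            {
                PrinterName = "Microsoft Print to PDF",
                PrintToFile = true,
                PrintFileName = outputPath
            };
            // StandardPrintController suppresses the "Printing..." status dialog.
            document.PrintController = new StandardPrintController();
            document.PrintPage += (sender, e) =>
            {
                using (var font = new Font("Arial", 12))
                {
                    e.Graphics.DrawString(text, font, Brushes.Black, e.MarginBounds);
                }
            };
            document.Print();
        }
    }
}
```

The same PrinterSettings object could be handed to a PDF library's print call instead of drawing text by hand; whether a given library respects it is something to check in that library's documentation.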

Printing multiple images to the Fax virtual printer using PrintDocument on Win7&Vista results in corrupt TIFF

I have an application that uses PrintDocument to print multiple images as a single print job with one image per page. I am having an issue with a very specific but common configuration and am wondering how to correct it.
The code works without issue on any of the physical printers I have access to and the Microsoft XPS Document Writer on all Desktop Windows OSes. However printing to the Fax virtual-printer (the one that comes standard on Windows operating systems) only works on Windows XP and Windows 8. When I print to the Fax printer on Windows Vista or Windows 7, it indicates it prints successfully; but, if there was more than one page and you open the .tif image that is created Windows Photo Viewer says
Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large.
That message is the one from Windows 7, the text may be slightly different on Vista. If there is only a single image (thus a single page) it works fine.
If the fax is sent, it comes out blank. I have also tried opening the .tif image in GIMP and processing it with ImageMagick, both of which failed, indicating it was a bad .tif file.
This is the code that produces the issue, some robustness has been removed to allow for a more concise example.
internal void Print( string printerName )
{
    PrintDocument printDocument = new PrintDocument
    {
        PrinterSettings = new PrinterSettings {PrinterName = printerName}
    };
    IEnumerable<string> filesToPrint = new[]{"File1.png", "File2.png"};
    IEnumerator<string> enumerator = filesToPrint.GetEnumerator();
    // Position the enumerator on the first file before printing starts.
    enumerator.MoveNext();
    printDocument.PrintPage += (sender, args) =>
    {
        string fileName = enumerator.Current;
        using (var img = System.Drawing.Image.FromFile(fileName))
        {
            args.Graphics.DrawImage(img, args.PageBounds);
        }
        // Request another page only if there is another file to print.
        var moveNext = enumerator.MoveNext();
        args.HasMorePages = moveNext;
        if (!moveNext)
        {
            enumerator.Dispose();
        }
    };
    printDocument.Print();
}
Is this simply an issue with the Fax printer on those operating systems or is there something wrong with the above code? How might I resolve this issue?
This Microsoft hotfix does not specifically mention Faxing but does have the right error message, so I tried applying it. It made no difference.

I pursued this in the Microsoft Partner Support Community. The ultimate reply there was that they have confirmed the issue and the "escalation engineers are working at this issue."
The workaround I came up with myself is to print using System.Printing rather than the System.Drawing.Printing.
internal override void Print(string printerName, string baseFilePath, string baseFileName)
{
    using (var printQueue = GetPrintQueue(printerName))
    {
        XpsDocumentWriter xpsDocumentWriter = PrintQueue.CreateXpsDocumentWriter(printQueue);
        IEnumerable<string> filesToPrint = GetFilesToPrint(baseFilePath, baseFileName);
        PrintUsingCollator(xpsDocumentWriter, filesToPrint);
    }
}
private static void PrintUsingCollator(XpsDocumentWriter xpsDocumentWriter,
    IEnumerable<string> filesToPrint)
{
    SerializerWriterCollator collator = xpsDocumentWriter.CreateVisualsCollator();
    collator.BeginBatchWrite();
    foreach (var fileName in filesToPrint)
    {
        Image image = CreateImage(fileName);
        ArrangeElement(image);
        collator.Write(image);
    }
    collator.EndBatchWrite();
}
/// <remarks>
/// This method needs to be called in order for the element to print the right size.
/// </remarks>
private static void ArrangeElement(UIElement element)
{
    var box = new Viewbox {Child = element};
    box.Measure(new Size(double.PositiveInfinity, double.PositiveInfinity));
    box.Arrange(new Rect(box.DesiredSize));
}
System.Printing is a newer API based on WPF. The downside to it in this particular case is memory usage. Because System.Windows.Controls.Image is GC-ed (whereas System.Drawing.Image requires explicit disposal) and I am loading relatively large images, it does seem to thrash memory a bit and ultimately be a bit slower on large jobs.

Printing A PDF Automatically to a specific printer and tray

I have a C# application in which, when the user clicks Print, the application creates a PDF in a MemoryStream using iTextSharp. I need to print this PDF automatically to a specific printer and tray.
I have searched for this, but all I can find uses JavaScript, and it doesn't print to a specific tray.
Does anyone have an example of doing this?
Thank you.

You can change printer tray with this code.
string _paperSource = "TRAY 2"; // Printer Tray
string _paperName = "8x17"; // Printer paper name
//Tested code comment. The commented code was the one I tested, but when
//I was writing the post I realized that could be done with less code.
//PaperSize pSize = new PaperSize() //Tested code :)
//PaperSource pSource = new PaperSource(); //Tested code :)
/// Find selected paperSource and paperName.
foreach (PaperSource _pSource in printDoc.PrinterSettings.PaperSources)
    if (_pSource.SourceName.ToUpper() == _paperSource.ToUpper())
    {
        printDoc.DefaultPageSettings.PaperSource = _pSource;
        //pSource = _pSource; //Tested code :)
        break;
    }
foreach (PaperSize _pSize in printDoc.PrinterSettings.PaperSizes)
    if (_pSize.PaperName.ToUpper() == _paperName.ToUpper())
    {
        printDoc.DefaultPageSettings.PaperSize = _pSize;
        //pSize = _pSize; //Tested code :)
        break;
    }
//printDoc.DefaultPageSettings.PaperSize = pSize; //Tested code :)
//printDoc.DefaultPageSettings.PaperSource = pSource; //Tested code :)
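The tray and paper names used in the comparison above depend on what the driver reports, so it can help to list them first. This is a small sketch with hypothetical class and method names; it only uses System.Drawing.Printing and assumes it runs on Windows with at least one printer installed.

```csharp
using System;
using System.Drawing.Printing;

static class TrayLister
{
    // Writes the tray names and raw ids reported by the driver of one printer.
    public static void ListTrays(string printerName)
    {
        var settings = new PrinterSettings { PrinterName = printerName };
        if (!settings.IsValid)
        {
            Console.WriteLine("Printer not found: " + printerName);
            return;
        }
        foreach (PaperSource source in settings.PaperSources)
        {
            Console.WriteLine("{0} (RawKind {1})", source.SourceName, source.RawKind);
        }
    }

    static void Main()
    {
        // Walk every installed printer and show its trays.
        foreach (string name in PrinterSettings.InstalledPrinters)
        {
            Console.WriteLine(name);
            ListTrays(name);
        }
    }
}
```

The SourceName values printed here are the strings to compare against in the loop above; the RawKind values are driver-specific numeric ids.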

In the past I spent a lot of time searching the web for solutions to print PDF files to specific printer trays.
My requirement was: collect several PDF files from a server directory and send each file to a different printer tray in a loop.
So I tested a lot of third-party tools (trials) and best practices found on the web.
Generally, all these tools can be divided into two classes: a) those that send PDF files to the printer directly (silently, with no UI), and b) those that open PDF files in a UI using a built-in PDF previewer working with the .NET PrintDocument.
The only solution that met my requirement was PDFPrint from VeryPDF (drawback: it's not free, but my company bought it). All the other tools and solutions didn't work reliably; that is, calling their print routines with a parameter such as id = 258 (which defines tray 2, as reported by the installed printer) either printed the PDF file from tray 3, or opened the PDF in a print previewer (UI) with lost images or totally blank content, and so on.
Hope that helps a little bit.

There is a tool called pdfprint:
http://www.verypdf.com/pdfprint/index.html
And here they discuss some solutions:
http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/da99765f-2706-4bb6-aa0e-b90730294cb4
