Rotate PDF without truncating contents using PDFsharp - c#

After rotating to portrait view, how might I ensure all content is visible (within page bounds)?
PDFsharp truncates my page when rotated by +90 degrees but not when rotated by -90 degrees.
PdfDocument outputDocument = new PdfDocument();
XPdfForm old = XPdfForm.FromFile(in_filename);
PdfPage newP = outputDocument.AddPage();
if (old.Page.Orientation == PageOrientation.Landscape)
{
old.Page.Rotate= (old.Page.Rotate - 90) % 360;
old.Page.Orientation = PageOrientation.Portrait;
}
newP.Height = old.Page.Height;
newP.Width = old.Page.Width;
// Get a graphics object for page1
XGraphics gfx = XGraphics.FromPdfPage(newP);
// Draw the page identified by the page number like an image
gfx.DrawImage(old, new XRect(0, 0, old.PointWidth, old.PointHeight));
The above works for a few PDF test cases, but I suspect that is coincidence or luck.
I am using PDFsharp 1.50 beta.

There is a known problem with PDFsharp 1.50 beta with respect to importing rotated PDFs. That problem is still under investigation.
PDF files come in many different variations, therefore it is very difficult to ensure that code works in all cases.

I ended up doing the following:
(Note: this only had to work for landscape to portrait.)
var output = new PdfDocument();
var outputPage = output.AddPage();
using (var stream = new MemoryStream(Convert.FromBase64String(pdfBase64String)))
{
using (var input = XPdfForm.FromStream(stream))
{
outputPage.Height = input.PointWidth;
outputPage.Width = input.PointHeight;
using (var graphics = XGraphics.FromPdfPage(outputPage))
{
graphics.RotateAtTransform(90, new XPoint(input.PointHeight / 2, input.PointHeight / 2));
graphics.DrawImage(input, new XRect(0, 0, input.PointWidth, input.PointHeight));
}
}
}
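The choice of rotation centre in the snippet above can be checked with plain arithmetic. The following is a minimal sketch, assuming PDFsharp applies the standard rotation matrix in its top-left-origin page space; `RotationCheck`, `RotateAt` and `RotatedBounds` are hypothetical helper names, not part of PDFsharp.

```csharp
using System;

public static class RotationCheck
{
    // Rotates (x, y) by angleDegrees around (cx, cy) with the usual 2D rotation matrix.
    public static (double X, double Y) RotateAt(double x, double y, double angleDegrees, double cx, double cy)
    {
        double r = angleDegrees * Math.PI / 180.0;
        double dx = x - cx, dy = y - cy;
        return (cx + dx * Math.Cos(r) - dy * Math.Sin(r),
                cy + dx * Math.Sin(r) + dy * Math.Cos(r));
    }

    // Bounding box of a width x height rectangle placed at the origin after a
    // 90-degree rotation around (height / 2, height / 2), as in the answer above.
    public static (double MinX, double MinY, double MaxX, double MaxY) RotatedBounds(double width, double height)
    {
        double c = height / 2;
        var corners = new[] { (0.0, 0.0), (width, 0.0), (0.0, height), (width, height) };
        double minX = double.MaxValue, minY = double.MaxValue;
        double maxX = double.MinValue, maxY = double.MinValue;
        foreach (var (x, y) in corners)
        {
            var p = RotateAt(x, y, 90, c, c);
            minX = Math.Min(minX, p.X);
            minY = Math.Min(minY, p.Y);
            maxX = Math.Max(maxX, p.X);
            maxY = Math.Max(maxY, p.Y);
        }
        return (minX, minY, maxX, maxY);
    }
}
```

For a landscape A4 form (842 x 595 points), `RotationCheck.RotatedBounds(842, 595)` spans roughly (0, 0) to (595, 842), which is exactly the swapped page size set on `outputPage`. Under the stated assumption, that is why rotating around (H/2, H/2) keeps the content inside the page.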

Related

How to add an image to whole page with iText 7 C#

I have .png files that are A4 size, and I am adding them to a .pdf. It works, but the image does not cover the whole page; white edges remain around it. How can I make my image cover the whole page?
String dest = "C:\\ImagePaged.pdf";
PdfWriter writer = new PdfWriter(dest);
// Creating a PdfDocument
PdfDocument pdfDoc = new PdfDocument(writer);
// Creating a Document
iText.Layout.Document document2 = new iText.Layout.Document(pdfDoc);
// process and save pages one by one
for (int i = 0; i < 10; i++) //count of .png images
{
iText.IO.Image.ImageData imageData = iText.IO.Image.ImageDataFactory.Create($"C:\\ImagePage{i}.png");
Image image = new Image(imageData);
document2.Add(image);
}
document2.Close();
I guess I need to set the page margins somehow, but how do I do that?
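Pretty sure it is this one:
// SetMargins is defined on the layout Document (document2 in the question), not on PdfDocument
document2.SetMargins(0, 0, 0, 0);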

C# Cropping image return wrong coordinates

I've been trying to crop a specific image using Selenium and different cropping methods for a few days.
An important note before my code - the following method used to work 2 weeks ago and for some reason it now returns an image with wrong coordinates
// Go to site
Driver.Navigate().GoToUrl("http://google.com");
Screenshot screenshot = driver.GetScreenshot();
using (var ms = new MemoryStream(screenshot.AsByteArray))
using (var imgShot = Image.FromStream(ms))
using (var src = new Bitmap(imgShot))
{
IWebElement element = driver.FindElement(By.XPath("//canvas"));
Rectangle cropRect = new Rectangle(element.Location.X, element.Location.Y, element.Size.Width, element.Size.Height);
var clone = src.Clone(cropRect, src.PixelFormat);
clone.Save(filePath);
}
Things I tried:
1) I usually use Firefox driver for this purpose, I tried using ChromeDriver instead and got the same result.
2) I checked the element's coordinates using the following console command: $0.getBoundingClientRect() and the position I got in my code matches it.
3) I tried 4 different cropping methods including this one:
IWebElement element = Driver.FindElement(By.XPath("//canvas"));
string filename = @"C:\Users\User\Desktop\test.png";
Screenshot screenshot = Driver.GetScreenshot();
screenshot.SaveAsFile(filename, ImageFormat.Png);
Rectangle cropRect = new Rectangle(element.Location.X, element.Location.Y,
element.Size.Width, element.Size.Height);
using (Image imgShot = Image.FromFile(filename))
using (Bitmap original = new Bitmap(imgShot))
using (Bitmap target = new Bitmap(original, new Size(cropRect.Width, cropRect.Height)))
using (Graphics g = Graphics.FromImage(target))
{
g.DrawImage(original, new Rectangle(0, 0, target.Width, target.Height),
cropRect,
GraphicsUnit.Pixel);
target.Save(@"C:\Users\User\Desktop\test1.png", ImageFormat.Png);
}
Just to be clear, the image I get is totally blank. In a different website the image I get is not blank so I can tell it's just in the wrong coordinates.
4) I tried a different website and different elements and they were all in the wrong coordinates.
5) I tried to Google it and found so many different approaches that didn't work. This answer however, says something about resolution which was my best guess. I tried playing with both the original and the target's resolution and saw no difference. The set resolution method was called either before or after the Graphics variable was created and still, zero change.
The funny thing is, it used to work 2 weeks ago but I never changed the code...
You are getting a blank image probably because the area is not yet rendered when GetScreenshot is called.
Try to wait to see if it's the case:
Thread.Sleep(3000);
Screenshot screenshot = ((ITakesScreenshot)element).GetScreenshot();
It could also be due to the implementation in the page preventing web scrapers, in which case there's nothing much you can do without digging in the code.
Note that you shouldn't use element.Location since it returns the coordinates relative to the document and not from the viewport.
You should also consider calling GetScreenshot directly on a IWebElement if the driver supports it.
Here's a working example to capture a footer:
var options = new ChromeOptions();
options.AddArgument("disable-infobars");
var driver = new ChromeDriver(options);
driver.Url = "https://stackoverflow.com/questions";
IWebElement element = driver.FindElement(By.CssSelector("#footer"));
string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), @"screenshot.png");
try {
Thread.Sleep(500);
Screenshot screenshot = ((ITakesScreenshot)element).GetScreenshot();
screenshot.SaveAsFile(filePath, ScreenshotImageFormat.Png);
}
catch (WebDriverException) {
var result = ((IJavaScriptExecutor)driver).ExecuteScript(
"var elm = arguments[0];" +
"elm.scrollIntoView(true);" +
"var rect = elm.getBoundingClientRect();" +
"return [rect.left, rect.top, rect.width, rect.height];"
, element);
int[] pts = Array.ConvertAll(((IReadOnlyCollection<object>)result).ToArray(), Convert.ToInt32);
var rect = new Rectangle(pts[0], pts[1], pts[2], pts[3]);
Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
using (var mstream = new MemoryStream(screenshot.AsByteArray))
using (var bitmap = (Bitmap)Image.FromStream(mstream, false, false)) {
rect.Intersect(new Rectangle(0, 0, bitmap.Width, bitmap.Height));
if (rect.IsEmpty)
throw new ArgumentOutOfRangeException("Cropping rectangle is out of range.");
var clone = bitmap.Clone(rect, bitmap.PixelFormat);
clone.Save(filePath);
}
}
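The difference between document and viewport coordinates mentioned above comes down to subtracting the scroll offset and clamping to the screenshot size. Here is a minimal sketch of that arithmetic, assuming the screenshot covers only the viewport and one CSS pixel equals one screenshot pixel; `ViewportCrop` and `ToScreenshotRect` are hypothetical names, not Selenium API.

```csharp
using System;

public static class ViewportCrop
{
    // Converts document-relative element bounds into a crop rectangle inside a
    // viewport-sized screenshot. Returns a zero-sized rectangle when the element
    // lies completely outside the visible area.
    public static (int X, int Y, int Width, int Height) ToScreenshotRect(
        int docX, int docY, int width, int height,
        int scrollX, int scrollY, int shotWidth, int shotHeight)
    {
        // Document coordinates minus the scroll offset give viewport coordinates.
        int left = Math.Max(docX - scrollX, 0);
        int top = Math.Max(docY - scrollY, 0);
        int right = Math.Min(docX - scrollX + width, shotWidth);
        int bottom = Math.Min(docY - scrollY + height, shotHeight);

        if (right <= left || bottom <= top)
            return (0, 0, 0, 0);

        return (left, top, right - left, bottom - top);
    }
}
```

An element at document position (100, 1200) on a page scrolled down by 1000 pixels ends up at (100, 200) in the screenshot. Using the unadjusted `element.Location` there would crop the wrong area or fall outside the bitmap.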
So apparently what actually changed 2 weeks ago wasn't my code but the Firefox version installed on my machine. The current version, v52, returns elements at the wrong coordinates.
Uninstalling it and reinstalling the previous version, v47, solved the issue.

iTextSharp CreateInk method: curves and corners

I'm using iTextSharp to draw markup graphics in PDF documents, using PdfAnnotation.CreateInk. I'm trying to draw rectangles, passing in an array of five co-ordinates. I know iTextSharp has a dedicated function for drawing rectangles, but I'm trying to use just one method to draw the various kinds of markup lines I need, and it seems that CreateInk should be able to do that.
The problem is that CreateInk is drawing the rectangle with curves rather than corners. I haven't been able to figure out how to change this. This answer suggests that the solution might be to create a PdfAppearance; I haven't yet figured out whether that will work. Here's my code (the first bit converts a list of list of points to an array of arrays, which InkList requires):
public void MarkupInk(List<List<float>> InkList){
float[][] Coords = new float[InkList.Count][];
for (int i = 0; i < InkList.Count; i++) {
float[] thisarr = InkList [i].ToArray ();
Coords[i] = new float[InkList[i].Count];
System.Array.Copy (thisarr, Coords [i], InkList [i].Count);
}
using(MemoryStream ms = new MemoryStream ()){
PdfReader reader = new PdfReader (textmaster.pdfDocArr);
PdfStamper stamper = new PdfStamper (reader, ms);
pagerect = reader.GetPageSize (textmaster.currentfirstpage + 1);
PdfAnnotation an2 = PdfAnnotation.CreateInk (stamper.Writer, pagerect, "", Coords);
an2.Color = strokeColor;
an2.BorderStyle = new PdfBorderDictionary (strokeWeight, PdfBorderDictionary.STYLE_SOLID);
stamper.AddAnnotation (an2, textmaster.currentfirstpage+1);
stamper.Close ();
textmaster.pdfDocArr = ms.ToArray ();
reader.Close ();
}
}
Any suggestions are much appreciated. Thanks!
EDIT: following @mkl's code I now have code that creates PDFAnnotations with appearances. And most of those annotations show up correctly in the viewing applications I use. But there is one odd behavior that I have not been able to fix.
What's happening is that the most recently created annotation does not appear in the viewing applications until I've created another annotation. So if I create annotation A, it's invisible until I create annotation B, at which point annotation A appears and B does not. Creating annotation C causes annotation B to appear, and so on.
This behavior persists even if I close the pdf file and the viewing application and re-load from disk. So the data describing the last-created annotation exists as part of the pdf file, but it doesn't render until I've created a new annotation.
I suspect there's something I'm still missing in the code I'm using to create the annotations and pdfAppearances. Here's code that creates a single line annotation:
public void WriteLineAnnotation(List<float> polyCoords){
using (MemoryStream ms = new MemoryStream ()) {
PdfReader reader = new PdfReader (textmaster.pdfDocArr);
PdfStamper stamper = new PdfStamper (reader, ms) { AnnotationFlattening = true };
pagerect = reader.GetPageSize (textmaster.currentfirstpage + 1);
//Create the pdfAnnotation polyline
PdfAnnotation polyann = PdfAnnotation.CreatePolygonPolyline (stamper.Writer, pagerect, "", false, new PdfArray (polyCoords.ToArray ()));
polyann.Color = strokeColor;
polyann.BorderStyle = new PdfBorderDictionary (strokeWeight, PdfBorderDictionary.STYLE_SOLID);
polyann.Flags = iTextSharp.text.pdf.PdfAnnotation.FLAGS_PRINT;
//create the PdfAppearance and set attributes
PdfAppearance app = PdfAppearance.CreateAppearance (stamper.Writer, pagerect.Width, pagerect.Height);
app.SetColorStroke (strokeColor);
app.MoveTo (polyCoords [0], polyCoords [1]);
app.LineTo (polyCoords [2], polyCoords [3]);
app.Stroke ();
//set the annotation's appearance, add annotation, clean up
polyann.SetAppearance (PdfName.N, app);
stamper.AddAnnotation (polyann, textmaster.currentfirstpage + 1);
stamper.Close ();
reader.Close ();
//create bytearray from memorystream and send to pdf renderer
textmaster.pdfDocArr = ms.ToArray ();
}
}
Is there something obvious that I'm missing? Thanks in advance for any help.
Which annotation type to use
I know iTextSharp has a dedicated function for drawing rectangles, but I'm trying to use just one method to draw the various kinds of markup lines I need, and it seems that CreateInk should be able to do that.
Please be aware that iText not merely has separate functions for those different forms, these different functions also create different types of PDF annotations.
In particular the Ink annotation -- which you would like to use for all forms -- is specified as
An ink annotation (PDF 1.3) represents a freehand “scribble” composed of one or more disjoint paths.
(Section 12.5.6.13 - Ink Annotations - ISO 32000-1)
A freehand “scribble” is commonly not considered to be a sequence of straight lines and sharp corners but something softer and more rounded. It is therefore only natural that PDF viewers display an ink annotation, given the coordinates of the corners of a rectangle, with curves rather than corners.
Of course you can use an appearance stream to provide a visualization of the appearance but that would be a small misuse of this PDF feature.
Instead I would propose you use a different kind of annotation to draw the various kinds of markup lines you need: Polyline annotations. They are specified as:
Polygon annotations (PDF 1.5) display closed polygons on the page. Such polygons may have any number of vertices connected by straight lines. Polyline annotations (PDF 1.5) are similar to polygons, except that the first and last vertex are not implicitly connected.
(Section 12.5.6.9 - Polygon and Polyline Annotations - ISO 32000-1)
iText(Sharp) provides a method for this kind of annotations, too:
/**
* Creates a polygon or -line annotation
* @param writer the PdfWriter
* @param rect the annotation position
* @param contents the textual content of the annotation
* @param polygon if true, the we're creating a polygon annotation, if false, a polyline
* @param vertices an array with the vertices of the polygon or -line
* @since 5.0.2
*/
public static PdfAnnotation CreatePolygonPolyline(
PdfWriter writer, Rectangle rect, String contents, bool polygon, PdfArray vertices)
You might eventually still have to create appearances, as not all PDF viewers (in particular so-called "pre-viewers") generate appearances; instead they count on appearances being provided in the PDFs they pre-display.
Examples
Without own appearance
using (PdfReader pdfReader = new PdfReader(inputPath))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
PdfArray vertices = new PdfArray(new int[]{ 100, 100, 300, 300, 100, 300, 300, 100 });
PdfAnnotation polyLine = PdfAnnotation.CreatePolygonPolyline(pdfStamper.Writer, pdfReader.GetPageSize(1),
"", false, vertices);
polyLine.Color = BaseColor.GREEN;
pdfStamper.AddAnnotation(polyLine, 1);
}
adds this:
in PDF viewers supporting annotations including appearance generation according to the specification.
With own appearance
using (PdfReader pdfReader = new PdfReader(inputPath))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream))
{
Rectangle pageSize = pdfReader.GetPageSize(1);
PdfArray vertices = new PdfArray(new int[] { 100, 100, 300, 300, 100, 300, 300, 100 });
PdfAnnotation polyLine = PdfAnnotation.CreatePolygonPolyline(pdfStamper.Writer, pageSize,
"", false, vertices);
polyLine.Color = BaseColor.GREEN;
PdfAppearance appearance = PdfAppearance.CreateAppearance(pdfStamper.Writer, pageSize.Width, pageSize.Height);
appearance.SetColorStroke(BaseColor.RED);
appearance.MoveTo(100, 100);
appearance.LineTo(300, 300);
appearance.LineTo(100, 300);
appearance.LineTo(300, 100);
appearance.Stroke();
polyLine.SetAppearance(PdfName.N, appearance);
pdfStamper.AddAnnotation(polyLine, 1);
}
adds this
in PDF viewers supporting annotations which bring along their appearance according to the specification.
(I explicitly used a different color in my appearance to make sure the PDF viewer shows my appearance and does not create one itself.)
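Because a polyline is not implicitly closed, a rectangle drawn this way has to repeat its first corner, which matches the "five co-ordinates" the question mentions. A minimal sketch of building such a vertex list follows; `PolylineVertices` and `ClosedRectangle` are hypothetical helper names, not part of iTextSharp.

```csharp
using System;

public static class PolylineVertices
{
    // Returns the flat x/y list for a rectangle outline: four corners plus the
    // first corner repeated, so that an open polyline still looks closed.
    public static float[] ClosedRectangle(float left, float bottom, float right, float top)
    {
        return new[]
        {
            left, bottom,
            right, bottom,
            right, top,
            left, top,
            left, bottom // back to the start
        };
    }
}
```

The resulting array could be wrapped in a `PdfArray` and passed as the `vertices` argument of `CreatePolygonPolyline`, the same way the question's `WriteLineAnnotation` passes `polyCoords.ToArray()`. Alternatively, passing `true` for `polygon` with only the four corners lets the viewer close the shape.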

wpf - how to print a whole page to pdf with PDFSharp

So I want to know how to print my entire WPF page to a PDF file with PDFSharp.
I've already been looking at several articles but I can't seem to figure it out.
I want the pdf to look something like this:
I've already looked up articles about drawing strings, lines, you name it. But creating every line, string and shape individually looks like a sloppy and bad idea to me.
Can anyone help me with this?
Articles will also be appreciated!
Thanks in advance
I would say first export your control to an image with RenderTargetBitmap and then use a library to export it to PDF.
Maybe this sample might help?
http://www.techcognition.com/post/Create-PDF-File-From-WPF-Window-using-iTextsharp-1001
With this library
Here is the control-to-image class I'm using with success (I'm able to get a PNG snapshot of complex UI controls with a very deep parent-child hierarchy).
The source is a WPF control container (UserControl, Grid, whatever).
The path is the full path of the PNG output file (C:\Temp\myImage.png).
public class ControlToImageSnapshot
{
/// <summary>
/// Converts the control to a PNG image
/// </summary>
/// <param name="source">Control to export</param>
/// <param name="path">Export destination</param>
/// <param name="zoom">Desired size</param>
public static void SnapShotPng(FrameworkElement source, string path, double zoom = 1.0)
{
try
{
var dir = Path.GetDirectoryName(path);
if (dir != null && !Directory.Exists(dir))
{
Directory.CreateDirectory(dir);
}
RenderTargetBitmap renderTarget = new RenderTargetBitmap((int)source.ActualWidth, (int)source.ActualHeight, 96, 96, PixelFormats.Pbgra32);
VisualBrush sourceBrush = new VisualBrush(source);
DrawingVisual drawingVisual = new DrawingVisual();
DrawingContext drawingContext = drawingVisual.RenderOpen();
using (drawingContext)
{
drawingContext.DrawRectangle(sourceBrush, null, new Rect(new Point(0, 0), new Point(source.ActualWidth, source.ActualHeight)));
}
renderTarget.Render(drawingVisual);
PngBitmapEncoder encoder = new PngBitmapEncoder();
encoder.Frames.Add(BitmapFrame.Create(renderTarget));
using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write))
{
encoder.Save(stream);
}
createPdfFromImage(path, @"C:\Temp\myfile.pdf");
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
public static void createPdfFromImage(string imageFile, string pdfFile)
{
// Both streams are disposed when the block ends
using (var output = new FileStream(pdfFile, FileMode.Create))
using (var input = new FileStream(imageFile, FileMode.Open, FileAccess.Read))
{
var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER.Rotate(), 0, 0, 0, 0);
// One writer bound to the output file (the original also attached a second, unused MemoryStream writer)
iTextSharp.text.pdf.PdfWriter.GetInstance(document, output).SetFullCompression();
document.Open();
var image = iTextSharp.text.Image.GetInstance(input);
// Scale the image down to the page while keeping its aspect ratio
image.ScaleToFit(document.PageSize.Width, document.PageSize.Height);
document.Add(image);
document.Close();
}
//open pdf file
Process.Start("explorer.exe", pdfFile);
}
}
With PDFsharp it's quite easy: you can pass in byte arrays for the image and the PDF. I've used this function quite a lot when dealing with images in PDFsharp.
It is fairly self-explanatory:
Open the PDF and the image as memory streams.
Set up the PDF and choose the page to draw on.
I always set Interpolate to false because I get better results with the kind of images I'm dealing with; if your image has shading, set it to true.
Then all that is left to do is draw the image on the PDF and return the result as a byte array.
public static byte[] AddImageToPdf(byte[] pdf, byte[] img, double x, double y)
{
using (var msPdf = new MemoryStream(pdf))
{
using (var msImg = new MemoryStream(img))
{
var image = Image.FromStream(msImg);
var document = PdfReader.Open(msPdf);
var page = document.Pages[0];
var gfx = XGraphics.FromPdfPage(page);
var ximg = XImage.FromGdiPlusImage(image);
ximg.Interpolate = false;
gfx.DrawImage(
ximg,
XUnit.FromCentimeter(x),
XUnit.FromCentimeter(y),
ximg.PixelWidth * 72 / ximg.HorizontalResolution,
ximg.PixelHeight * 72 / ximg.VerticalResolution);
using (var msFinal = new MemoryStream())
{
document.Save(msFinal);
return msFinal.ToArray();
}
}
}
}
It's hardcoded for the first page of the PDF (Pages[0]) but is easily extended to take a page index; I'll leave that as an exercise. At the end you get a byte array containing your PDF, and no files need to touch the disk en route if you render your control's image into a memory stream and pass it in. Another answer in this topic has a good way of getting the control image.
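The width and height arguments passed to DrawImage above convert pixels to PDF points (1/72 inch) using the image resolution. A minimal sketch of that conversion on its own follows; `ImageUnits` and `PixelsToPoints` are hypothetical names, not PDFsharp API.

```csharp
using System;

public static class ImageUnits
{
    // Converts a pixel count to PDF points (72 per inch) for a given resolution in DPI.
    public static double PixelsToPoints(int pixels, double dpi)
    {
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi), "Resolution must be positive.");
        return pixels * 72.0 / dpi;
    }
}
```

A 2480 pixel wide scan at 300 DPI comes out at 595.2 points, about the width of an A4 page. Width should use the horizontal resolution and height the vertical one, since the two can differ.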

PDF Compression with iTextSharp [closed]

I am currently trying to recompress a PDF that has already been created. I am trying to find a way to recompress the images in the document to reduce the file size.
I have been trying to do this with the DataLogics PDE and iTextSharp libraries, but I cannot find a way to recompress the image streams.
I have thought about looping over the XObjects, getting the images and then dropping the DPI down to 96, or using the libjpeg C# implementation to change the quality of the image, but getting it back into the PDF stream always seems to end in memory corruption or some other issue.
Any samples will be appreciated.
Thanks
iText and iTextSharp have some methods for replacing indirect objects. Specifically there's PdfReader.KillIndirect() which does what it says and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference) which you can then use to replace what you killed off.
In pseudo C# code you'd do:
var oldImage = PdfReader.GetPdfObject();
var newImage = YourImageCompressionFunction(oldImage);
PdfReader.KillIndirect(oldImage);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)oldImage);
Converting the raw bytes to a .Net image can be tricky, I'll leave that up to you or you can search here. Mark has a good description here. Also, technically PDFs don't have a concept of DPI, that's for printers mostly. See the answer here for more on that.
Using the method above your compression algorithm can actually do two things, physically shrink the image as well as apply JPEG compression. When you physically shrink the image and add it back it will occupy the same amount of space as the original image but with less pixels to work with. This will get you what you consider to be DPI reduction. The JPEG compression speaks for itself.
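The "DPI reduction" described above can be put into numbers: the image keeps its size on the page, so fewer pixels spread over the same width means a lower effective resolution. A minimal sketch, with `EffectiveResolution` and `EffectiveDpi` as hypothetical names:

```csharp
using System;

public static class EffectiveResolution
{
    // Effective resolution of an image drawn across displayedPoints (1/72 inch units).
    public static double EffectiveDpi(int pixels, double displayedPoints)
    {
        if (displayedPoints <= 0)
            throw new ArgumentOutOfRangeException(nameof(displayedPoints), "Size must be positive.");
        return pixels / (displayedPoints / 72.0);
    }
}
```

An image 3000 pixels wide drawn across 720 points (10 inches) has an effective 300 DPI. Shrunk to 90% (2700 pixels) and drawn at the same size, it drops to 270 DPI.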
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG on your desktop called "LargeImage.jpg" and creates a new PDF from it. Then it opens the PDF, extracts the image, physically shrinks it to 90% of the original size, applies 85% JPEG compression and writes it back to the PDF. See the comments in the code for more of an explanation. The code needs lots more null/error checking. Also look for NOTE comments where you'll need to expand to handle other situations.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//Our working folder
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Large image to add to sample PDF
string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
//Name of large PDF to create
string largePDF = Path.Combine(workingFolder, "Large.pdf");
//Name of compressed PDF to create
string smallPDF = Path.Combine(workingFolder, "Small.pdf");
//Create a sample PDF containing our large image, for demo purposes only, nothing special here
using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document()) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
doc.SetMargins(0, 0, 0, 0);
doc.NewPage();
doc.Add(importImage);
doc.Close();
}
}
}
//Now we're going to open the above PDF and compress things
//Bind a reader to our large PDF
PdfReader reader = new PdfReader(largePDF);
//Create our output PDF
using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
//Bind a stamper to the file and our reader
using (PdfStamper stamper = new PdfStamper(reader, fs)) {
//NOTE: This code only deals with page 1, you'd want to loop more for your code
//Get page 1
PdfDictionary page = reader.GetPageN(1);
//Get the xobject structure
PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
if (xobject != null) {
PdfObject obj;
//Loop through each key
foreach (PdfName name in xobject.Keys) {
obj = xobject.Get(name);
if (obj.IsIndirect()) {
//Get the current key as a PDF object
PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
//See if its an image
if (imgObject.Get(PdfName.SUBTYPE).Equals(PdfName.IMAGE)) {
//NOTE: There's a bunch of different types of filters, I'm only handling the simplest one here which is basically raw JPG, you'll have to research others
if (imgObject.Get(PdfName.FILTER).Equals(PdfName.DCTDECODE)) {
//Get the raw bytes of the current image
byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
//Will hold bytes of the compressed image later
byte[] newBytes;
//Wrap a stream around our original image
using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
//Convert the bytes into a .Net image
using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
//Shrink the image to 90% of the original
using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
//Convert the image to bytes using JPG at 85%
newBytes = ConvertImageToBytes(newImage, 85);
}
}
}
//Create a new iTextSharp image from our bytes
iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
//Kill off the old image
PdfReader.KillIndirect(obj);
//Add our image in its place
stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
}
}
}
}
}
}
}
this.Close();
}
//Standard image save code from MSDN, returns a byte array
private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
if (compressionLevel < 0) {
compressionLevel = 0;
} else if (compressionLevel > 100) {
compressionLevel = 100;
}
ImageCodecInfo jgpEncoder = GetEncoder(ImageFormat.Jpeg);
System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
EncoderParameters myEncoderParameters = new EncoderParameters(1);
EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
myEncoderParameters.Param[0] = myEncoderParameter;
using (MemoryStream ms = new MemoryStream()) {
image.Save(ms, jgpEncoder, myEncoderParameters);
return ms.ToArray();
}
}
//standard code from MSDN
private static ImageCodecInfo GetEncoder(ImageFormat format) {
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageDecoders();
foreach (ImageCodecInfo codec in codecs) {
if (codec.FormatID == format.Guid) {
return codec;
}
}
return null;
}
//Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
int newWidth = Convert.ToInt32(sourceImage.Width * scaleFactor);
int newHeight = Convert.ToInt32(sourceImage.Height * scaleFactor);
var thumbnailBitmap = new Bitmap(newWidth, newHeight);
using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
g.CompositingQuality = CompositingQuality.HighQuality;
g.SmoothingMode = SmoothingMode.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
g.DrawImage(sourceImage, imageRectangle);
}
return thumbnailBitmap;
}
}
}
I don't know about iTextSharp, but you have to rewrite a PDF file if anything is changed, as it contains an xref table (index) with the exact file position of each object. This means if even one byte is added or removed, the PDF becomes corrupted.
Your best bet for recompressing the images is JBIG2 if they are B&W, or JPEG2000 otherwise, for which Jasper library will happily encode JPEG2000 codestreams for placement into PDF files at whatever quality you so desire.
If it were me I'd do it all from code without the PDF libraries. Just find all images (anything between stream and endstream after an occurrence of JPXDecode (JPEG2000), JBIG2Decode (JBIG2) or DCTDecode (JPEG)), pull that out, reencode it with Jasper, then stick it back in again and update the xref table.
To update the xref table, find the positions of each object (starting 00001 0 obj) and just update the new positions in the xref table. It's not too much work, less than it sounds. You might be able to get all the offsets with a single regular expression (I'm not a C# programmer, but in PHP it would be that simple.)
Then finally update the value of the startxref tag in the trailer with the offset of the beginning of the xref table (where it says xref in the file).
Otherwise you'll end up decoding the entire PDF and rewriting it all, which will be slow, and you might lose something along the way.
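The offset bookkeeping described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a classic (uncompressed) cross-reference table, no object streams, and no stream data that happens to contain text resembling an object header. `XrefSketch`, `FindObjectOffsets` and `FormatEntry` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class XrefSketch
{
    // Maps object numbers to the byte offset of their "N G obj" header.
    public static SortedDictionary<int, long> FindObjectOffsets(byte[] pdf)
    {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);
        var offsets = new SortedDictionary<int, long>();
        // Only accept headers at the start of the file or directly after a line break.
        foreach (Match m in Regex.Matches(text, @"(?:^|(?<=[\r\n]))(\d+) (\d+) obj\b"))
        {
            offsets[int.Parse(m.Groups[1].Value)] = m.Index;
        }
        return offsets;
    }

    // Formats one in-use xref entry: 10-digit offset, 5-digit generation, 'n', 20 bytes in total.
    public static string FormatEntry(long offset, int generation)
    {
        return $"{offset:D10} {generation:D5} n \n";
    }
}
```

Running `FindObjectOffsets` over the rewritten bytes and emitting `FormatEntry` for each object gives the body of a new xref table. The startxref value then has to point at the position of the `xref` keyword, as described above.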
There is an example on how to find and replace images in an existing PDF by the creator of iText. It's actually a small excerpt from his book. Since it's in Java, here's a simple replacement:
public void ReduceResolution(PdfReader reader, long quality) {
int n = reader.XrefSize;
for (int i = 0; i < n; i++) {
PdfObject obj = reader.GetPdfObject(i);
if (obj == null || !obj.IsStream()) {continue;}
PdfDictionary dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName subType = (PdfName)PdfReader.GetPdfObject(
dict.Get(PdfName.SUBTYPE)
);
if (!PdfName.IMAGE.Equals(subType)) {continue;}
PRStream stream = (PRStream )obj;
try {
PdfImageObject image = new PdfImageObject(stream);
PdfName filter = (PdfName) image.Get(PdfName.FILTER);
if (
PdfName.JBIG2DECODE.Equals(filter)
|| PdfName.JPXDECODE.Equals(filter)
|| PdfName.CCITTFAXDECODE.Equals(filter)
|| PdfName.FLATEDECODE.Equals(filter)
) continue;
System.Drawing.Image img = image.GetDrawingImage();
if (img == null) continue;
var ll = image.GetImageBytesType();
int width = img.Width;
int height = img.Height;
using (System.Drawing.Bitmap dotnetImg =
new System.Drawing.Bitmap(img))
{
// set codec to jpeg type => jpeg index codec is "1"
System.Drawing.Imaging.ImageCodecInfo codec =
System.Drawing.Imaging.ImageCodecInfo.GetImageEncoders()[1];
// set parameters for image quality
System.Drawing.Imaging.EncoderParameters eParams =
new System.Drawing.Imaging.EncoderParameters(1);
eParams.Param[0] =
new System.Drawing.Imaging.EncoderParameter(
System.Drawing.Imaging.Encoder.Quality, quality
);
using (MemoryStream msImg = new MemoryStream()) {
dotnetImg.Save(msImg, codec, eParams);
msImg.Position = 0;
stream.SetData(msImg.ToArray());
stream.SetData(
msImg.ToArray(), false, PRStream.BEST_COMPRESSION
);
stream.Put(PdfName.TYPE, PdfName.XOBJECT);
stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
stream.Put(PdfName.FILTER, filter);
stream.Put(PdfName.FILTER, PdfName.DCTDECODE);
stream.Put(PdfName.WIDTH, new PdfNumber(width));
stream.Put(PdfName.HEIGHT, new PdfNumber(height));
stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
}
}
}
catch {
// throw;
// iText[Sharp] can't handle all image types...
}
finally {
// may or may not help
reader.RemoveUnusedObjects();
}
}
}
You'll notice it's only handling JPEG. The logic is reversed (instead of explicitly handling only DCTDECODE/JPEG) so you can uncomment some of the ignored image types and experiment with the PdfImageObject in the code above. In particular, most of the FLATEDECODE images (.bmp, .png, and .gif) are represented as PNG (confirmed in the DecodeImageBytes method of the PdfImageObject source code). As far as I know, .NET's built-in PNG encoder gives you no control over the compression level. There are some references to support this here and here. You can try a stand-alone PNG optimization executable, but you also have to figure out how to set PdfName.BITSPERCOMPONENT and PdfName.COLORSPACE in the PRStream.
For completeness' sake, since your question specifically asks about PDF compression, here's how you compress a PDF with iTextSharp:
PdfStamper stamper = new PdfStamper(
reader, YOUR-STREAM, PdfWriter.VERSION_1_5
);
stamper.Writer.CompressionLevel = 9;
int total = reader.NumberOfPages + 1;
for (int i = 1; i < total; i++) {
reader.SetPageContent(i, reader.GetPageContent(i));
}
stamper.SetFullCompression();
stamper.Close();
You might also try and run the PDF through PdfSmartCopy to get the file size down. It removes redundant resources, but like the call to RemoveUnusedObjects() in the finally block, it may or may not help. That will depend on how the PDF was created.
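A minimal sketch of the PdfSmartCopy pass mentioned above, assuming iTextSharp 5.x. The class name, method name, and paths are placeholders for illustration, not part of the original answer:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SmartCopyExample
{
    // Copies every page of inputPath into outputPath through PdfSmartCopy,
    // which reuses identical streams instead of writing them again.
    public static void Copy(string inputPath, string outputPath)
    {
        PdfReader reader = new PdfReader(inputPath);
        try
        {
            using (FileStream fs = new FileStream(outputPath, FileMode.Create))
            {
                Document document = new Document();
                PdfSmartCopy copy = new PdfSmartCopy(document, fs);
                document.Open();
                // Page numbers in iTextSharp are 1-based.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, i));
                }
                // Closing the document finishes the copy and flushes the output.
                document.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Whether this shrinks the file depends on how many duplicate resources the source PDF carries, so compare the sizes before and after.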
IIRC iText[Sharp] doesn't deal well with JBIG2DECODE, so @Alasdair's suggestion looks good - if you want to take the time learning the Jasper library and using the brute-force approach.
Good luck.
EDIT - 2012-08-17, comment by @Craig:
To save the PDF after compressing the jpegs using the ReduceResolution() method above:
a. Instantiate a PdfReader object:
PdfReader reader = new PdfReader(pdf);
b. Pass the PdfReader to the ReduceResolution() method above.
c. Pass the altered PdfReader to a PdfStamper. Here's one way using a MemoryStream:
// Save the altered PDF; you can then pass the byte array to a database, etc.
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
}
return ms.ToArray();
}
Or you can use any other Stream if you don't need to keep the PDF in memory. E.g. use a FileStream and save directly to disk.
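As a rough sketch of the FileStream variant, again assuming iTextSharp 5.x (where PdfStamper is disposable, as in the snippet above). The class and method names are made up for illustration:

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class SaveToDiskExample
{
    // Writes the (already altered) reader straight to outputPath
    // without holding the whole PDF in memory.
    public static void Save(PdfReader reader, string outputPath)
    {
        using (FileStream fs = new FileStream(outputPath, FileMode.Create))
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            // Nothing to do here: disposing the stamper writes the PDF.
        }
    }
}
```

Call it with the PdfReader that was passed through ReduceResolution(), for example SaveToDiskExample.Save(reader, "compressed.pdf").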
I've written a library to do just that. It will also OCR the PDFs using Tesseract or Cuneiform and create searchable, compressed PDF files. It's a library that uses several open source projects (iTextSharp, jbig2 encoder, AForge, muPDF#) to complete the task. You can check it out here: http://hocrtopdf.codeplex.com/
I am not sure if you are considering other libraries, but you can easily recompress existing images using the Docotic.Pdf library (Disclaimer: I work for the company).
Here is some sample code:
static void RecompressExistingImages(string fileName, string outputName)
{
using (PdfDocument doc = new PdfDocument(fileName))
{
foreach (PdfImage image in doc.Images)
image.RecompressWithGroup4Fax();
doc.Save(outputName);
}
}
There are also RecompressWithFlate, RecompressWithGroup3Fax, RecompressWithJpeg and Uncompress methods.
The library will convert color images to bilevel ones if needed. You can specify deflate compression level, JPEG quality etc.
I would also ask you to think twice before using the approach suggested by @Alasdair. If you are going to deal with PDF files that you didn't create, then the task is far more complex than it might seem.
To start with, a great many images are compressed with codecs other than JPXDecode, JBIG2Decode or DCTDecode. PDFs can also contain inline images.
PDF files saved using newer versions of the standard (1.5 or later) can contain cross-reference streams. This means that reading and updating such files is more complex than just finding and updating some numbers at the end of the file.
So, please, use a PDF library.
A simple way to compress a PDF is to use gsdll32.dll (Ghostscript) and Cyotek.GhostScript.dll (a wrapper):
public static void CompressPDF(string sInFile, string sOutFile, int iResolution)
{
string[] arg = new string[]
{
"-sDEVICE=pdfwrite",
"-dNOPAUSE",
"-dSAFER",
"-dBATCH",
"-dCompatibilityLevel=1.5",
"-dDownsampleColorImages=true",
"-dDownsampleGrayImages=true",
"-dDownsampleMonoImages=true",
"-sPAPERSIZE=a4",
"-dPDFFitPage",
"-dDOINTERPOLATE",
"-dColorImageDownsampleThreshold=1.0",
"-dGrayImageDownsampleThreshold=1.0",
"-dMonoImageDownsampleThreshold=1.0",
"-dColorImageResolution=" + iResolution.ToString(),
"-dGrayImageResolution=" + iResolution.ToString(),
"-dMonoImageResolution=" + iResolution.ToString(),
"-sOutputFile=" + sOutFile,
sInFile
};
using(GhostScriptAPI api = new GhostScriptAPI())
{
api.Execute(arg);
}
}
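If you would rather not depend on the wrapper, the same switches can be passed to the Ghostscript console executable. This is only a sketch under assumptions: it is not part of the answer above, the class and method names are hypothetical, and gsExePath must point at your own Ghostscript console binary (for example gswin32c.exe on 32-bit Windows installs):

```csharp
using System.Diagnostics;
using System.Globalization;

public static class GhostscriptCli
{
    // Builds the argument string; file paths are quoted so spaces survive.
    public static string BuildArguments(string inFile, string outFile, int resolution)
    {
        string res = resolution.ToString(CultureInfo.InvariantCulture);
        return string.Join(" ", new string[]
        {
            "-sDEVICE=pdfwrite",
            "-dNOPAUSE",
            "-dBATCH",
            "-dSAFER",
            "-dCompatibilityLevel=1.5",
            "-dDownsampleColorImages=true",
            "-dDownsampleGrayImages=true",
            "-dDownsampleMonoImages=true",
            "-dColorImageResolution=" + res,
            "-dGrayImageResolution=" + res,
            "-dMonoImageResolution=" + res,
            "-sOutputFile=\"" + outFile + "\"",
            "\"" + inFile + "\""
        });
    }

    // Runs Ghostscript and returns its exit code (0 means success).
    public static int Run(string gsExePath, string inFile, string outFile, int resolution)
    {
        ProcessStartInfo psi = new ProcessStartInfo(
            gsExePath, BuildArguments(inFile, outFile, resolution));
        psi.UseShellExecute = false;
        psi.CreateNoWindow = true;
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
            return p.ExitCode;
        }
    }
}
```

Keeping BuildArguments separate from Run makes it easy to log or inspect the exact command line before anything is executed.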
