Manipulate PDF objects

Manipulate PDF objects - c#

I am trying to manipulate a PDF, it functions as a template. What I am trying is replacing 'placeholders' in the PDF template with my data. So someone makes a PDF template in Scribus for example, and adds an empty image with the name "company_logo". My application sees an image placeholder with the name "company_logo" and it adds the company logo there.
I can browse AcroFields with iTextSharp library and set text in a text field (for example) but AcroFields doesn't list the image placeholder. I've got the feeling that AcroFields is not what I am looking for.
So how can I get a list (or tree) of all objects from the PDF and read their properties (like position, size, contents, etc).
P.S. I do not necessarily need to use iTextSharp, any other PDF lib will do as well. Preferably free.
A little pseudo code to make myself more clear
var object = Pdf.GetObjectById("company_logo");
object.SetValue(myImage);
object.SetPosition(x, y);

From your pseudo-code example, we understand that you want to replace the stream of an object that contains an image. There are several examples on how to do this.
For instance, in the SpecialID example, we create a PDF where we mark a specific image with a special ID. In the ResizeImage example, we track that image based on that special ID and we replace the stream:
object = reader.getPdfObject(i);
if (object == null || !object.isStream())
continue;
stream = (PRStream)object;
if (value.equals(stream.get(key))) {
PdfImageObject image = new PdfImageObject(stream);
BufferedImage bi = image.getBufferedImage();
if (bi == null) continue;
int width = (int)(bi.getWidth() * FACTOR);
int height = (int)(bi.getHeight() * FACTOR);
BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
AffineTransform at = AffineTransform.getScaleInstance(FACTOR, FACTOR);
Graphics2D g = img.createGraphics();
g.drawRenderedImage(bi, at);
ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
ImageIO.write(img, "JPG", imgBytes);
stream.clear();
stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
stream.put(PdfName.TYPE, PdfName.XOBJECT);
stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
stream.put(key, value);
stream.put(PdfName.FILTER, PdfName.DCTDECODE);
stream.put(PdfName.WIDTH, new PdfNumber(width));
stream.put(PdfName.HEIGHT, new PdfNumber(height));
stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
}
You will find another example in the book The Best iText Questions on StackOverflow where I answered the following question: PDF Convert to Black And White PNGs
I wrote the ReplaceImage example to show how to replace the image:
public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
orig.clear();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
stream.writeContent(baos);
orig.setData(baos.toByteArray(), false);
for (PdfName name : stream.getKeys()) {
orig.put(name, stream.get(name));
}
}
As you can see, it isn't as trivial as saying:
var object = Pdf.GetObjectById("company_logo");
object.SetValue(myImage);
As I explained in my comment, this doesn't make sense:
object.SetPosition(x, y);
The objects we're manipulating are streams that are used as Image XObjects. The advantage of having Image XObjects is that they can be reused. For instance: if you have the same logo on every page, then you want to store the bytes of that image only once and reuse the same logo multiple times. This means that the object with the image bytes doesn't know anything about its position. The position is determined in the content stream. It depends on the CTM.

Did you have a look at the scribus scripting capabilities?
Since you create a tamplate in scribus You could also write a short python script which replaces your placeholders with your final data and exports the final PDF.
Since scribus 1.5 it is also possible to call the python scripts from the commandline.

Related

Broken images extracted by PdfSharp C# Library

I am trying to execute the sample program of PDFSharp library. I have already provided the references in the project.
PROBLEM: The code executes without any error and I see that the method ExportJpegImage() is called 7 times (which is the number of images in the pdf). But when I try to open the images written by the program ( in .jpeg), the Windows cannot open them.
static void Main(string[] args)
{
Console.WriteLine("Starting PDF Sharp sample program...");
const string filename = #"D:/Test/test.pdf";
PdfDocument document = PdfReader.Open(filename);
int imageCount = 0;
// Iterate pages
foreach (PdfPage page in document.Pages)
{
// Get resources dictionary
PdfDictionary resources = page.Elements.GetDictionary("/Resources");
if (resources != null)
{
// Get external objects dictionary
PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
if (xObjects != null)
{
ICollection<pdfitem> items = xObjects.Elements.Values;
// Iterate references to external objects
foreach (PdfItem item in items)
{
PdfReference reference = item as PdfReference;
if (reference != null)
{
PdfDictionary xObject = reference.Value as PdfDictionary;
// Is external object an image?
if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
{
ExportJpegImage(xObject, ref imageCount);
}
}
}
}
}
}
}
static void ExportJpegImage(PdfDictionary image, ref int count)
{
// Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
byte[] stream = image.Stream.Value;
FileStream fs = new FileStream(String.Format(#"D:\Test\Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(stream);
bw.Close();
}

Please read the note preceding that example:
Note: This snippet shows how to export JPEG images from a PDF file. PDFsharp cannot convert PDF pages to JPEG files. This sample does not handle non-JPEG images. It does not (yet) handle JPEG images that have been flate-encoded.
There are several different formats for non-JPEG images in PDF. Those are not supported by this simple sample and require several hours of coding, but this is left as an exercise to the reader.
Thus, the images you want to extract from your PDF most likely simply are not embedded as JPEG images (at least not without further filtering like deflation). To transform that data into something a normal image viewer can handle, you most likely have to invest several hours of coding.

Add PDF to Image ReportViewer

I am trying to print an RDL report with terms and conditions PDF. The problem is that the report itself is a Queue of images, whereas the T&C's are in PDF format. So whenever I do an "Enqueue", adding to the streams, it's looking at that PDF like one big image, as opposed to two pages. This causes a GDI+ generics error. Is there anyway for me to convert the PDF into the proper image format so that I can combine these documents? Here's the code I have so far:
internal static void DoPrintInvoice(int orderID, SalesOrderBLL.DocumentType doctype, string printer, int copies, List<string> lines)
{
using (var context = rempscoDataContext.CreateReadOnlyContext())
using (MiniProfiler.Current.Step("DoPrintInvoice()"))
{
//Customer Opt-Out
// Generate Report
using (var report = GetSalesOrderReport(orderID, _DocumentTypeDescriptions[doctype], doctype != DocumentType.InvoiceLetterhead, lines))
{
// returns queue of streams.
var streams = PrintingBLL.RenderStreams(report, landscape: false);
// returns byte array
var TermsAndConditions = GetTermsAndConditions();
//convert byte array to memory stream.
var TCStream = new MemoryStream(TermsAndConditions);
//conditional to add T&C's to stream.
if (doctype == DocumentType.OrderAcknowledgement)
{
streams.Enqueue(TCStream);
}
ParallelInvoke(
() => SaveSalesOrderPDF(orderID, doctype, report),
() => PrintingBLL.PrintStreams(streams, string.Format("Sales Order ({0})", report.DisplayName), printer, copies, false)
);
}
}
}
I've tried to convert the terms and conditions into an image, and back to a byte array but it gives me the same GDI generic issue. Any help would be greatly appreciated!

Have you looked at PDFSharp? I have had good luck working with it in the past for rendering PDFs.
www.pdfsharp.com

ITextSharp returning same size pdf when I'm trying to compress pdf file with different levels

I'm reading a pdf and injecting some content using itextsharp. The resulting byte[] is passed to the method below along with the compression level.
public static byte[] method(byte[] pdf,int compressionlevel)
{
using (MemoryStream outputPdfStream1 = new MemoryStream())
{
//PdfReader reader1 = new PdfReader(pdf);
//PdfStamper stamper1 = new PdfStamper(reader1, outputPdfStream1);
//int level = (int)compressionlevel;
//if (level <= 9)
// stamper1.Writer.CompressionLevel = (int)compressionlevel;
//else
// stamper1.Writer.SetFullCompression();
//stamper1.SetFullCompression();
//stamper1.Close();
//byte[] newfile = outputPdfStream1.ToArray();
//return newfile;
PdfReader reader = new PdfReader(pdf);
PdfStamper stamper = new PdfStamper(reader, outputPdfStream1,PdfWriter.VERSION_1_5);
int level = (int)compressionlevel;
if (level <= 9)
{
stamper.Writer.CompressionLevel = level;
}
else
stamper.Writer.SetFullCompression();
int total = reader.NumberOfPages + 1;
for (int i = 1; i < total; i++)
{
reader.SetPageContent(i, reader.GetPageContent(i));
}
stamper.SetFullCompression();
stamper.Close();
byte[] newfile = outputPdfStream1.ToArray();
return newfile;
}
}
If I comment stamper.SetFullCompression(); then this method is returning same size of byte array irrespective of the compression level am passing(am passing from 0 to 9 in each case)..
If I uncomment stamper.SetFullCompression(); this method is returning fully compressed version of the input byte irrespective of the compression level!!!
What is the purpose/difference of stamper.SetFullCompression(); and stamper.Writer.SetFullCompression();?
What is the correct way to use different compression levels so that I can see different sizes in each case?

You have a couple of questions and I'll try my best to answer them.
PdfStamper is a helper class that ultimately uses another class called PdfStamperImp to do most of the work. PdfStamperImp is derived from PdfWriter and when you use stamper.Writer you are actually getting back this implementation class. Many of the properties on PdfStamper also pass directly through to the implementation class. So these two calls actually do the same thing.
stamper.SetFullCompression();
stamper.Writer.SetFullCompression();
Another point of confusion is that SetFullCompression and the CompressionLevel aren't actually related at all. "Full compression" represents a feature that was added in PDF 1.5 called "Object Streams" that allows grouping PDF objects together to potentially allow for greater compression. There's actually no requirement that what we think of as "compression" actually occurs but in reality I think it would always happen. (Possibly a super simple document might get larger with this enabled, not sure and don't feel like testing.)
The CompressionLevel is actually what you normally think of as compression, a number from 0 to 9 or -1 to mean default (which currently equals six I think). This property is actually part of the PdfStream class which many classes ultimately derive from. This setting doesn't "trickle down", however. Since you are importing a stream from another location via GetPageContent() and SetPageContent() that specific stream has its own compression settings unrelated to the Writer's compression settings. There's actually a third parameter that you can pass to SetPageContent() to set your specific compression level if you want.
reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);

PDF Compression with iTextSharp [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I am currently trying to recompress a pdf that has already been created, I am trying to find a way to recompress the images that are in the document, to reduce the file size.
I have been trying to do this with the DataLogics PDE and iTextSharp libraries but I can not find a way to do the stream recompression of the items.
I have though about looping over the xobjects and getting the images and then dropping the DPI down to 96 or using the libjpeg C# implimentation to change the quality of the image but getting it back into the pdf stream seems to always end up, with memory corruption or some other issue.
Any samples will be appreciated.
Thanks

iText and iTextSharp have some methods for replacing indirect objects. Specifically there's PdfReader.KillIndirect() which does what it says and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference) which you can then use to replace what you killed off.
In pseudo C# code you'd do:
var oldImage = PdfReader.GetPdfObject();
var newImage = YourImageCompressionFunction(oldImage);
PdfReader.KillIndirect(oldImage);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)oldImage);
Converting the raw bytes to a .Net image can be tricky, I'll leave that up to you or you can search here. Mark has a good description here. Also, technically PDFs don't have a concept of DPI, that's for printers mostly. See the answer here for more on that.
Using the method above your compression algorithm can actually do two things, physically shrink the image as well as apply JPEG compression. When you physically shrink the image and add it back it will occupy the same amount of space as the original image but with less pixels to work with. This will get you what you consider to be DPI reduction. The JPEG compression speaks for itself.
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG on your desktop called "LargeImage.jpg" and creates a new PDF from it. Then it opens the PDF, extracts the image, physically shrinks it to 90% of the original size, applies 85% JPEG compression and writes it back to the PDF. See the comments in the code for more of an explanation. The code needs lots more null/error checking. Also looks for NOTE comments where you'll need to expand to handle other situations.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//Our working folder
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Large image to add to sample PDF
string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
//Name of large PDF to create
string largePDF = Path.Combine(workingFolder, "Large.pdf");
//Name of compressed PDF to create
string smallPDF = Path.Combine(workingFolder, "Small.pdf");
//Create a sample PDF containing our large image, for demo purposes only, nothing special here
using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document()) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
doc.SetMargins(0, 0, 0, 0);
doc.NewPage();
doc.Add(importImage);
doc.Close();
}
}
}
//Now we're going to open the above PDF and compress things
//Bind a reader to our large PDF
PdfReader reader = new PdfReader(largePDF);
//Create our output PDF
using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
//Bind a stamper to the file and our reader
using (PdfStamper stamper = new PdfStamper(reader, fs)) {
//NOTE: This code only deals with page 1, you'd want to loop more for your code
//Get page 1
PdfDictionary page = reader.GetPageN(1);
//Get the xobject structure
PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
if (xobject != null) {
PdfObject obj;
//Loop through each key
foreach (PdfName name in xobject.Keys) {
obj = xobject.Get(name);
if (obj.IsIndirect()) {
//Get the current key as a PDF object
PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
//See if its an image
if (imgObject.Get(PdfName.SUBTYPE).Equals(PdfName.IMAGE)) {
//NOTE: There's a bunch of different types of filters, I'm only handing the simplest one here which is basically raw JPG, you'll have to research others
if (imgObject.Get(PdfName.FILTER).Equals(PdfName.DCTDECODE)) {
//Get the raw bytes of the current image
byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
//Will hold bytes of the compressed image later
byte[] newBytes;
//Wrap a stream around our original image
using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
//Convert the bytes into a .Net image
using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
//Shrink the image to 90% of the original
using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
//Convert the image to bytes using JPG at 85%
newBytes = ConvertImageToBytes(newImage, 85);
}
}
}
//Create a new iTextSharp image from our bytes
iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
//Kill off the old image
PdfReader.KillIndirect(obj);
//Add our image in its place
stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
}
}
}
}
}
}
}
this.Close();
}
//Standard image save code from MSDN, returns a byte array
private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
if (compressionLevel < 0) {
compressionLevel = 0;
} else if (compressionLevel > 100) {
compressionLevel = 100;
}
ImageCodecInfo jgpEncoder = GetEncoder(ImageFormat.Jpeg);
System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
EncoderParameters myEncoderParameters = new EncoderParameters(1);
EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
myEncoderParameters.Param[0] = myEncoderParameter;
using (MemoryStream ms = new MemoryStream()) {
image.Save(ms, jgpEncoder, myEncoderParameters);
return ms.ToArray();
}
}
//standard code from MSDN
private static ImageCodecInfo GetEncoder(ImageFormat format) {
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageDecoders();
foreach (ImageCodecInfo codec in codecs) {
if (codec.FormatID == format.Guid) {
return codec;
}
}
return null;
}
//Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
int newWidth = Convert.ToInt32(sourceImage.Width * scaleFactor);
int newHeight = Convert.ToInt32(sourceImage.Height * scaleFactor);
var thumbnailBitmap = new Bitmap(newWidth, newHeight);
using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
g.CompositingQuality = CompositingQuality.HighQuality;
g.SmoothingMode = SmoothingMode.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
g.DrawImage(sourceImage, imageRectangle);
}
return thumbnailBitmap;
}
}
}

I don't know about iTextSharp, but you have to rewrite a PDF file if anything is changed, as it contains an xref table (index) with the exact file position of each object. This means if even one byte is added or removed, the PDF becomes corrupted.
Your best bet for recompressing the images is JBIG2 if they are B&W, or JPEG2000 otherwise, for which Jasper library will happily encode JPEG2000 codestreams for placement into PDF files at whatever quality you so desire.
If it were me I'd do it all from code without the PDF libraries. Just find all images (anything between stream and endstream after an occurance of JPXDecode (JPEG2000), JBIG2Decode (JBIG2) or DCTDecode (JPEG)) pull that out, reencode it with Jasper, then stick it back in again and update the xref table.
To update the xref table, find the positions of each object (starting 00001 0 obj) and just update the new positions in the xref table. It's not too much work, less than it sounds. You might be able to get all the offsets with a single regular expression (I'm not a C# programmer, but in PHP it would be that simple.)
Then finally update the value of the startxref tag in the trailer with the offset of the beginning of the xref table (where it says xref in the file).
Otherwise you'll end up decoding the entire PDF and rewriting it all, which will be slow, and you might lose something along the way.

There is an example on how to find and replace images in an existing PDF by the creator of iText. It's actually a small excerpt from his book. Since it's in Java, here's a simple replacement:
public void ReduceResolution(PdfReader reader, long quality) {
int n = reader.XrefSize;
for (int i = 0; i < n; i++) {
PdfObject obj = reader.GetPdfObject(i);
if (obj == null || !obj.IsStream()) {continue;}
PdfDictionary dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName subType = (PdfName)PdfReader.GetPdfObject(
dict.Get(PdfName.SUBTYPE)
);
if (!PdfName.IMAGE.Equals(subType)) {continue;}
PRStream stream = (PRStream )obj;
try {
PdfImageObject image = new PdfImageObject(stream);
PdfName filter = (PdfName) image.Get(PdfName.FILTER);
if (
PdfName.JBIG2DECODE.Equals(filter)
|| PdfName.JPXDECODE.Equals(filter)
|| PdfName.CCITTFAXDECODE.Equals(filter)
|| PdfName.FLATEDECODE.Equals(filter)
) continue;
System.Drawing.Image img = image.GetDrawingImage();
if (img == null) continue;
var ll = image.GetImageBytesType();
int width = img.Width;
int height = img.Height;
using (System.Drawing.Bitmap dotnetImg =
new System.Drawing.Bitmap(img))
{
// set codec to jpeg type => jpeg index codec is "1"
System.Drawing.Imaging.ImageCodecInfo codec =
System.Drawing.Imaging.ImageCodecInfo.GetImageEncoders()[1];
// set parameters for image quality
System.Drawing.Imaging.EncoderParameters eParams =
new System.Drawing.Imaging.EncoderParameters(1);
eParams.Param[0] =
new System.Drawing.Imaging.EncoderParameter(
System.Drawing.Imaging.Encoder.Quality, quality
);
using (MemoryStream msImg = new MemoryStream()) {
dotnetImg.Save(msImg, codec, eParams);
msImg.Position = 0;
stream.SetData(msImg.ToArray());
stream.SetData(
msImg.ToArray(), false, PRStream.BEST_COMPRESSION
);
stream.Put(PdfName.TYPE, PdfName.XOBJECT);
stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
stream.Put(PdfName.FILTER, filter);
stream.Put(PdfName.FILTER, PdfName.DCTDECODE);
stream.Put(PdfName.WIDTH, new PdfNumber(width));
stream.Put(PdfName.HEIGHT, new PdfNumber(height));
stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
}
}
}
catch {
// throw;
// iText[Sharp] can't handle all image types...
}
finally {
// may or may not help
reader.RemoveUnusedObjects();
}
}
}
You'll notice it's only handling JPEG. The logic is reversed (instead of explicitly handling only DCTDECODE/JPEG) so you can uncomment some of the ignored image types and experiment with the PdfImageObject in the code above. In particular, most of the FLATEDECODE images (.bmp, .png, and .gif) are represented as PNG (confirmed in the DecodeImageBytes method of the PdfImageObject source code). As far as I know, .NET does not support PNG encoding. There are some references to support this here and here. You can try a stand-alone PNG optimization executable, but you also have to figure out how to set PdfName.BITSPERCOMPONENT and PdfName.COLORSPACE in the PRStream.
For completeness sake, since your question specifically asks about PDF compression, here's how you compress a PDF with iTextSharp:
PdfStamper stamper = new PdfStamper(
reader, YOUR-STREAM, PdfWriter.VERSION_1_5
);
stamper.Writer.CompressionLevel = 9;
int total = reader.NumberOfPages + 1;
for (int i = 1; i < total; i++) {
reader.SetPageContent(i, reader.GetPageContent(i));
}
stamper.SetFullCompression();
stamper.Close();
You might also try and run the PDF through PdfSmartCopy to get the file size down. It removes redundant resources, but like the call to RemoveUnusedObjects() in the finally block, it may or may not help. That will depend on how the PDF was created.
IIRC iText[Sharp] doesn't deal well with JBIG2DECODE, so #Alasdair's suggestion looks good - if you want to take the time learning the Jasper library and using the brute-force approach.
Good luck.
EDIT - 2012-08-17, comment by #Craig:
To save the PDF after compressing the jpegs using the ReduceResolution() method above:
a. Instantiate a PdfReader object:
PdfReader reader = new PdfReader(pdf);
b. Pass the PdfReader to the ReduceResolution() method above.
c. Pass the altered PdfReader to a PdfStamper. Here's one way using a MemoryStream:
// Save altered PDF. then you can pass the btye array to a database, etc
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
}
return ms.ToArray();
}
Or you can use any other Stream if you don't need to keep the PDF in memory. E.g. use a FileStream and save directly to disk.

I've written a library to do just that. It will also OCR the pdf's using Tesseract or Cuneiform and create searchable, compressed PDF files. It's a library that uses several open source projects (iTextsharp, jbig2 encoder, Aforge, muPDF#) to complete the task. You can check it out here http://hocrtopdf.codeplex.com/

I am not sure if you are considering other libraries, but you can easily recompress existing images using Docotic.Pdf library (Disclaimer: I work for the company).
Here is some sample code:
static void RecompressExistingImages(string fileName, string outputName)
{
using (PdfDocument doc = new PdfDocument(fileName))
{
foreach (PdfImage image in doc.Images)
image.RecompressWithGroup4Fax();
doc.Save(outputName);
}
}
There are also RecompressWithFlate, RecompressWithGroup3Fax, RecompressWithJpeg and Uncompress methods.
The library will convert color images to bilevel ones if needed. You can specify deflate compression level, JPEG quality etc.
I am also ask you to think twice before using approach suggested by #Alasdair. If you are going to deal with PDF files that weren't created by you than the task is far more complex that it might seem.
To start with, there is great deal of images compressed by codecs other than JPXDecode, JBIG2Decode or DCTDecode. And PDF can also contain inline images.
PDF files saved using newer versions of standard (1.5 or newer) can contain cross-reference streams. It means that reading and updating such files is more complex than just finding/updating some numbers at the end of the file.
So, please, use a PDF library.

A simple way to compress PDF is using gsdll32.dll (Ghostscript) and Cyotek.GhostScript.dll (wrapper):
public static void CompressPDF(string sInFile, string sOutFile, int iResolution)
{
string[] arg = new string[]
{
"-sDEVICE=pdfwrite",
"-dNOPAUSE",
"-dSAFER",
"-dBATCH",
"-dCompatibilityLevel=1.5",
"-dDownsampleColorImages=true",
"-dDownsampleGrayImages=true",
"-dDownsampleMonoImages=true",
"-sPAPERSIZE=a4",
"-dPDFFitPage",
"-dDOINTERPOLATE",
"-dColorImageDownsampleThreshold=1.0",
"-dGrayImageDownsampleThreshold=1.0",
"-dMonoImageDownsampleThreshold=1.0",
"-dColorImageResolution=" + iResolution.ToString(),
"-dGrayImageResolution=" + iResolution.ToString(),
"-dMonoImageResolution=" + iResolution.ToString(),
"-sOutputFile=" + sOutFile,
sInFile
};
using(GhostScriptAPI api = new GhostScriptAPI())
{
api.Execute(arg);
}
}

Load image to ReportView dynamically

My name is Ed and i need load image from ReportView dinamic.How i can do this?
I work windows forms,c# 3.0 and linq to sql, i need load image to my reports dinamic.
Thanks.

I'm assuming that you are using the Microsoft Report Viewer Component from C# and you want to add an image to the report dynamically.
This is certainly possible, you need to create a class with a byte[] property that represents the serialized bitmap.
class ReportImage {
public byte[] Image {get;set;}
// Other stuff here if you want...
}
Set the property of this object the a 24 bit per pixel serialized version of your Bitmap (i.e. save your bitmap to a MemoryStream, then call MemoryStream.ToArray()). You must use 24 bits per pixel, and the format you save to must be BMP, this seems to be required in the Report Viewer.
You can then bind to the Object Data Source, (see the MSDN documentation for details on binding to Objects, also see the example here). Use the Image item to display your image in the report.
The limitation is that the images in your report must be fixed size. You'll have to resample the images beforehand to fit them in, or, as Jon suggests, dynamically create the RDLC file for the report.

This answer is very helpful (it got me past having little "broken image" boxes on my report), but a little misleading.
It is, strictly speaking, NOT a requirement that the "image" (which is actually a byte array) be a BMP format. In a test project I was able to read jpeg files from disk (i.e. File.ReadAllBytes(filename); ) and add the resulting byte arrays to a byte[] property in a List of "rptrow" (where rptrow is a object that represents all the data for one row in a report table). The images on the report had the MIMEType set to "image/jpeg" and a Source property of "Database". I also noticed that it did not matter what MIMEType I used as long as something was specified, (i.e. not blank).
I was in a hurry, so I didn't even consider checking the statement that it had to be a 24bpp image.
Simplified rptobj:
public class rptobj
{
public string FileName { get; set; }
public byte[] Photo { get; set; }
private List<rptobj> photos;
public List<rptobj> GetList()
{
if (photos == null)
{
photos = LoadPhotos();
}
return photos;
}
private List<rptobj> LoadPhotos()
{
var rslt = new List<rptobj>();
byte[] rawData;
var path = HttpContext.Current.Server.MapPath(#"~\images");
DirectoryInfo di = new DirectoryInfo(path);
FileSystemInfo[] fis = di.GetFileSystemInfos("*.jpg");
foreach(var fi in fis){
rawData = File.ReadAllBytes(string.Format(#"{0}\{1}", path, fi.Name ));
rslt.Add(new rptobj() { Photo = rawData, FileName = fi.Name });
}
return rslt;
}
}

Short answer is that you cannot do this, at least not with the built in report viewer functions.
However, if you are sure you want to do this, you can attempt to dynamically create RDLC files. If you create your RDLC files dynamically, you can dynamically add images to the reports.
You can find some sample code on how to create RDLC files dynamically here.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Manipulate PDF objects - c#

Related

Broken images extracted by PdfSharp C# Library

Add PDF to Image ReportViewer

ITextSharp returning same size pdf when I'm trying to compress pdf file with different levels

PDF Compression with iTextSharp [closed]

Load image to ReportView dynamically

Categories

Resources