I need to reduce PDF file size at the code level. If you have a solution, please share it with me.
The images should end up as small as possible.
Thank you.
The following code example demonstrates how to compress a PDF document by reducing the quality of the images it contains. You can give it a try.
using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.Graphics;

//Loads the PDF document
PdfDocument doc = new PdfDocument("Image.pdf");
//Disables the incremental update
doc.FileInfo.IncrementalUpdate = false;
//Traverses all pages
foreach (PdfPageBase page in doc.Pages)
{
    //Extracts images from page
    Image[] images = page.ExtractImages();
    if (images != null && images.Length > 0)
    {
        //Traverses all images
        for (int j = 0; j < images.Length; j++)
        {
            Image image = images[j];
            PdfBitmap bp = new PdfBitmap(image);
            //Reduces the quality of the image
            bp.Quality = 20;
            //Replaces the old image in the document with the compressed image
            page.ReplaceImage(j, bp);
        }
    }
}
//Saves and closes the resultant document
doc.SaveToFile("Output.pdf");
doc.Close();
Note: This example uses Spire.PDF, and I'm an employee of Spire. See the article "How to Compress PDF Document in C#, VB.NET" for details.
Related
I have a PDF file containing some images, and I want to replace each of them with the content of another PDF. My code walks through the PDF and collects the image references:
PdfDocument pdf = new PdfDocument(new PdfReader(args[0]), new PdfWriter(args[1]));
for (int i = 1; i <= pdf.GetNumberOfPages(); ++i)
{
    PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
    PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
    PdfDictionary xObjects = resources.GetAsDictionary(PdfName.XObject);
    foreach (PdfName imgRef in xObjects.KeySet())
    {
        // image reference
    }
}
For each image I have a corresponding PDF that should replace it. I tried to Put the other PDF (which always has a single page) into the XObject dictionary as an object:
PdfDocument other = new PdfDocument(new PdfReader("replacement.pdf"));
xObjects.Put(imgRef, other.GetFirstPage().GetPdfObject().Clone());
However, an exception is thrown when the PdfDocument is closed:
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
How can I replace the image with (the content of) another PDF?
Update
I also tried a few other approaches, which may have improved things slightly. To get around the previous error message, I copy the page into the original PDF:
var page = other.GetFirstPage().CopyTo(pdf);
However, replacing the xObject doesn't work:
xObjects.Put(imgRef, page.GetPdfObject());
This results in a corrupted PDF.
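To copy a page into another document so it can be used as an image replacement, use PdfPage#CopyAsFormXObject. A copied page is a /Page dictionary, not an XObject stream, so putting it into the /XObject resources produces an invalid file; CopyAsFormXObject instead wraps the page content in a form XObject that can be drawn wherever the image was.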
So let's assume we have this PDF as a template and we want to replace the image of a desert with the contents of another PDF:
Let's also assume the PDF that we want to use as a replacement looks as follows:
The issue is that if we blindly replace the original image with the contents of the PDF, chances are we will get something like this:
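The replacement therefore seems to work, yet the visual result is wrong. The issue is that coordinates work differently for plain raster images and vector XObjects (the PDF replacements): an image XObject is always drawn into a 1 × 1 unit square, which the page content then scales to the desired size with a cm operator, whereas a form XObject is drawn in the units of its /BBox, so the same scaling makes it far too large. So we also need to adjust the transformation matrix (/Matrix key) of our newly created XObject.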
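To make the coordinate difference concrete, here is a small arithmetic sketch in plain C# (no iText involved). It applies the same page-level cm matrix to a 1 × 1 image and to a form whose /BBox is the size of a page, then shows how a /Matrix that scales by 1 / max(width, height), which is what adjustXObjectSize does, brings the form back to the intended size. The cm values and the A4-sized BBox (595 × 842 points, assumed to start at (0, 0)) are made-up example numbers, and the XObjectPlacement helper is hypothetical.

```csharp
using System;

// PDF matrices in [a b c d e f] order map (x, y) to (a*x + c*y + e, b*x + d*y + f).
public static class XObjectPlacement
{
    public static (double X, double Y) Transform(double[] m, double x, double y) =>
        (m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]);

    // Rendered size of the box (0,0)-(w,h) under m (assumes scale/translate only, no rotation).
    public static (double Width, double Height) Extent(double[] m, double w, double h)
    {
        var (x0, y0) = Transform(m, 0, 0);
        var (x1, y1) = Transform(m, w, h);
        return (x1 - x0, y1 - y0);
    }

    // Same idea as adjustXObjectSize: uniform scale so the longer side becomes 1 unit.
    public static double[] FitToUnitSquare(double w, double h)
    {
        double s = 1.0 / Math.Max(w, h);
        return new[] { s, 0, 0, s, 0, 0 };
    }

    // Matrix that applies `first`, then `second` (a form's /Matrix comes before the page's cm).
    public static double[] Concat(double[] first, double[] second) => new[]
    {
        first[0] * second[0] + first[1] * second[2],
        first[0] * second[1] + first[1] * second[3],
        first[2] * second[0] + first[3] * second[2],
        first[2] * second[1] + first[3] * second[3],
        first[4] * second[0] + first[5] * second[2] + second[4],
        first[4] * second[1] + first[5] * second[3] + second[5],
    };
}
```

With cm = [200 0 0 150 50 600], the image covers 200 × 150 points, while the unadjusted form would span 119000 × 126300 points. After concatenating FitToUnitSquare(595, 842) with cm, the form is 150 points tall and about 141 points wide: because the scale is uniform, a non-square form keeps its aspect ratio and leaves some empty space along its shorter side.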
So the code could look like this:
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Xobject;

PdfDocument pdf = new PdfDocument(new PdfReader(@"template.pdf"), new PdfWriter(@"out.pdf"));
for (int i = 1; i <= pdf.GetNumberOfPages(); ++i) {
    PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
    PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
    PdfDictionary xObjects = resources?.GetAsDictionary(PdfName.XObject);
    if (xObjects == null) continue; // this page has no XObjects to replace
    IDictionary<PdfName, PdfStream> toReplace = new Dictionary<PdfName, PdfStream>();
    foreach (PdfName imgRef in xObjects.KeySet()) {
        PdfStream previousXobject = xObjects.GetAsStream(imgRef);
        PdfDocument imageReplacementDoc =
            new PdfDocument(new PdfReader(@"insert.pdf"));
        // Copy the replacement page as a form XObject owned by the target document
        PdfXObject imageReplacement = imageReplacementDoc.GetPage(1).CopyAsFormXObject(pdf);
        toReplace[imgRef] = imageReplacement.GetPdfObject();
        adjustXObjectSize(imageReplacement);
        imageReplacementDoc.Close();
    }
    // Modify the dictionary only after we are done iterating over its keys
    foreach (var x in toReplace) {
        xObjects.Put(x.Key, x.Value);
    }
}
pdf.Close();
UPD: Implementation of adjustXObjectSize (thanks, mkl):
// Also needs: using System; using iText.Kernel.Geom;
private void adjustXObjectSize(PdfXObject pageXObject) {
    // Uniform scale so the longer side becomes 1 unit, i.e. the form fits the unit square an image occupies
    float scaleXobject = 1 / Math.Max(pageXObject.GetWidth(), pageXObject.GetHeight());
    AffineTransform transform = new AffineTransform();
    transform.Scale(scaleXobject, scaleXobject);
    float[] matrix = new float[6];
    transform.GetMatrix(matrix);
    pageXObject.GetPdfObject().Put(PdfName.Matrix, new PdfArray(matrix));
}
And the visual result after running the above code on the samples I described would look like this:
I have a lot of PDF files with text. To prevent copying, I added a watermark; however, the watermark can easily be removed simply by editing the PDF.
Using C#, how can I convert a PDF into a PDF with each page being an image of the text? I understand this isn't foolproof, as OCR can be used to extract the text, but I want to make it that little bit harder.
Thanks for your help.
I used Ghostscript.NET (https://github.com/jhabjan/Ghostscript.NET) to break up each page into a bitmap, which you can convert into any other format you want:
using System.Drawing;
using Ghostscript.NET.Rasterizer;
...
using (GhostscriptRasterizer raster = new GhostscriptRasterizer())
{
    raster.Open(filename);
    pages = raster.PageCount;
    _bitpages = new Bitmap[raster.PageCount];
    for (int i = 1; i < pages + 1; i++)
    {
        _bitpages[i - 1] = (Bitmap)raster.GetPage(dpi, dpi, i);
        // convert and save image here
    }
    raster.Close();
}
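The dpi passed to GetPage decides how large each bitmap is, and therefore how large the resulting image-only PDF will be. PDF page sizes are measured in points (72 per inch), so a page renders to roughly width_pt / 72 × dpi pixels. A minimal sketch for estimating that before rasterizing; the RasterSize helper is hypothetical, and Ghostscript's own rounding may differ by a pixel.

```csharp
using System;

public static class RasterSize
{
    // PDF user space units are points: 72 per inch.
    private const double PointsPerInch = 72.0;

    // Approximate bitmap size (in pixels) of a page rendered at the given dpi.
    public static (int Width, int Height) PixelSize(double widthPt, double heightPt, int dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return ((int)Math.Round(widthPt / PointsPerInch * dpi),
                (int)Math.Round(heightPt / PointsPerInch * dpi));
    }

    // Uncompressed 24-bit RGB size in bytes: a rough upper bound before PNG/JPEG compression.
    public static long RawBytes(int widthPx, int heightPx) => 3L * widthPx * heightPx;
}
```

For a US Letter page (612 × 792 points) at 150 dpi this gives 1275 × 1650 pixels, about 6.3 MB of raw RGB data per page before compression, which is why a moderate dpi is usually preferable for this use case.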
Please let me know which method can be used to convert a PDF to an image in iText 7.
In iTextSharp, there was an option to convert a PDF file to images. Here is the link: PDF to Image Using iTextSharp
http://www.c-sharpcorner.com/UploadFile/a0927b/create-pdf-document-and-convert-to-image-programmatically/
Below is sample code I created using the following reference link:
itext7 pdf to image
This is not working as expected: it does not convert the PDF to an image, but creates a blank 1 KB image file.
string fileName = System.IO.Path.GetFileNameWithoutExtension(inputFilePath);
var pdfReader = new PdfReader(inputFilePath);
var pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfReader);
int pagesLength = pdfDoc.GetNumberOfPages() + 1;
for (int i = 1; i < pagesLength; i++)
{
    if (!File.Exists(System.IO.Path.Combine(imageFileDir, fileName + "_" +
        (startIndex + i) + ".png")) && i < pagesLength)
    {
        PdfPage pdfPages = pdfDoc.GetPage(i);
        PdfWriter writer = new PdfWriter(System.IO.Path.Combine(imageFileDir, fileName + "_" + (startIndex + i) + ".png"), new WriterProperties().SetFullCompressionMode(true));
        PdfDocument pdf = new PdfDocument(writer);
        PdfFormXObject pageCopy = pdfPages.CopyAsFormXObject(pdf);
        iText.Layout.Element.Image image = new iText.Layout.Element.Image(pageCopy);
    }
}
Quoting Bruno:
iText does not convert PDFs to raster images (such as .jpg, .png,...). You are misinterpreting the examples that create an Image instance based on an existing page. Those examples create an XObject that can be reused in a new PDF as if it were a vector image; they don't convert a PDF page to a raster image.
What you can use for this (and what we at iText use internally for testing) is Ghostscript. It takes a PDF as input and converts it to a series of images, one image per page.
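If you would rather call the Ghostscript command line directly than go through a wrapper library, a sketch like the following builds the process to start. -dSAFER, -dBATCH, -dNOPAUSE, -sDEVICE=png16m, -r and -sOutputFile are standard Ghostscript options; the executable names (gswin64c on Windows, gs elsewhere), the assumption that Ghostscript is on the PATH, and the GhostscriptCli helper itself are assumptions to adapt to your installation.

```csharp
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class GhostscriptCli
{
    // Assumption: Ghostscript is installed and on PATH under its usual executable name.
    public static string DefaultExecutable =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "gswin64c" : "gs";

    // Builds (but does not start) a process that renders every page of pdfPath to PNG.
    public static ProcessStartInfo BuildPdfToPng(string pdfPath, string outputPattern, int dpi)
    {
        var psi = new ProcessStartInfo(DefaultExecutable) { UseShellExecute = false };
        string[] args =
        {
            "-dSAFER",                        // restrict file access by the PostScript interpreter
            "-dBATCH", "-dNOPAUSE",           // no prompts between pages, exit when done
            "-sDEVICE=png16m",                // 24-bit colour PNG output
            $"-r{dpi}",                       // rendering resolution in dpi
            $"-sOutputFile={outputPattern}",  // %d is replaced by the page number
            pdfPath,
        };
        foreach (string arg in args)
            psi.ArgumentList.Add(arg);        // .NET quotes each argument as needed
        return psi;
    }
}
```

A call such as Process.Start(GhostscriptCli.BuildPdfToPng("input.pdf", "page-%d.png", 150)) followed by WaitForExit() should then produce page-1.png, page-2.png and so on, one file per page.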
I'm at the last step of completing a PDF generator. I am using iTextSharp, and thanks to help from Stack Overflow I can stamp a base64 image without any problem.
My question is: how would I iterate over the posted files and add a new page with the posted image files on it? Here is my current way of stamping an image; however, it comes from base64. I need to add uploaded images selected in my application to the PDF, preferably while the stamper is open. I just can't seem to make my code work.
I feel this should be easy to iterate through, but I can't get the logic right. Please help:
PdfContentByte pdfContentByte = stamper.GetOverContent(1);
PdfContentByte pdfContentByte2 = stamper.GetOverContent(4);
var image = iTextSharp.text.Image.GetInstance(
    Convert.FromBase64String(match.Groups["data"].Value)
);
image.SetAbsolutePosition(270, 90);
image.ScaleToFit(250f, 100f);
pdfContentByte.AddImage(image);
//stamping base64 image works perfect - now i need to stamp the uploaded images onto a new page in the same document before stamper closes.
var imagepath = "//test//";
HttpFileCollection uploadFilCol = HttpContext.Current.Request.Files;
for (int i = 0; i < uploadFilCol.Count; i++)
{
    HttpPostedFile file = uploadFilCol[i];
    using (FileStream fs = new FileStream(imagepath + "Invoice-" +
        HttpContext.Current.Request.Form.Get("genUUID") + file, FileMode.Open))
    {
        HttpPostedFile file = uploadFilCol[i];
        pdfContentByte2.AddImage(file);
    }
}
My posted files come from an input element on an HTML page:
<input type="file" id="file" name="files[]" runat="server" multiple />
The basic steps:
1. Iterate over the HttpFileCollection.
2. Read each HttpPostedFile into a byte array.
3. Create an iText Image from the byte array in the previous step.
4. Set the image's absolute position, and optionally scale it as needed.
5. Add the image at the specified page number with GetOverContent().
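For the step that reads each HttpPostedFile into a byte array, note that BinaryReader.ReadBytes(ContentLength) relies on the declared length. A slightly more defensive sketch copies the whole stream into memory instead. It accepts any Stream, so postedFile.InputStream can be passed straight in, and the result can go to iTextSharp's Image.GetInstance(byte[]). The UploadBytes helper is hypothetical, and rewinding seekable streams is a design choice, not something iTextSharp requires.

```csharp
using System;
using System.IO;

public static class UploadBytes
{
    // Reads the entire content of `input` into a byte array without relying on a declared length.
    public static byte[] ReadAll(Stream input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (input.CanSeek) input.Position = 0; // rewind in case the stream was already read
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```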
Here is a quick snippet to get you started. It is not tested, and it assumes you already have the PdfReader, Stream, and PdfStamper set up, along with a working file upload:
HttpFileCollection uploadFilCol = HttpContext.Current.Request.Files;
for (int i = 0; i < uploadFilCol.Count; i++)
{
    HttpPostedFile postedFile = uploadFilCol[i];
    using (var br = new BinaryReader(postedFile.InputStream))
    {
        var imageBytes = br.ReadBytes(postedFile.ContentLength);
        var image = Image.GetInstance(imageBytes);
        // still not sure if you want to add a new blank page, but
        // here's how
        //stamper.InsertPage(
        //    APPEND_NEW_PAGE_NUMBER, reader.GetPageSize(APPEND_NEW_PAGE_NUMBER - 1)
        //);
        // image absolute position
        image.SetAbsolutePosition(absoluteX, absoluteY);
        // scale image if needed
        // image.ScaleAbsolute(...);
        // PAGE_NUMBER => add image to specific page number
        stamper.GetOverContent(PAGE_NUMBER).AddImage(image);
    }
}
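The snippet leaves absoluteX, absoluteY and the scaling open. As one possible way to fill them in, this plain-C# helper (hypothetical, not part of iTextSharp) computes the size of an image fitted into a box with its aspect ratio preserved, similar in spirit to ScaleToFit, and the lower-left corner that centres it on the page. Positions are in points measured from the bottom-left corner of the page; the box you fit into (for example the page minus some margin) is your choice.

```csharp
using System;

public static class ImagePlacement
{
    // Size after uniformly scaling (w, h) to fit inside (maxW, maxH), preserving aspect ratio.
    public static (float Width, float Height) FitInside(float w, float h, float maxW, float maxH)
    {
        float scale = Math.Min(maxW / w, maxH / h);
        return (w * scale, h * scale);
    }

    // Lower-left corner that centres an image of size (imgW, imgH) on a page of size (pageW, pageH).
    // PDF coordinates start at the bottom-left corner of the page.
    public static (float X, float Y) CenterOnPage(float imgW, float imgH, float pageW, float pageH) =>
        ((pageW - imgW) / 2f, (pageH - imgH) / 2f);
}
```

For a 1000 × 500 image fitted into 500 × 700 on a 612 × 792 page, this yields a 500 × 250 image placed at (56, 271); those values could then be passed to image.ScaleAbsolute(...) and image.SetAbsolutePosition(...).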
How can I remove page breaks from a PDF, so the output would be a single-'page' PDF? For example, if a normal page is 400x900 and I have 4 pages, the resulting file would be 1600x900. I previously did this for TIFF files (Remove page breaks in multi-page tif to make one long page), but would like to do it with PDF. Could I possibly convert to PostScript, remove whatever code means 'page break', and then convert back to PDF?
This can be done with the iTextSharp library by using a single-column PdfPTable and sizing the document according to the combined size of all the pages.
You'll of course need a reference to the iTextSharp DLL, plus these using directives:
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
Here's a simple example:
public static void MergePages()
{
    //Original PDF containing page breaks.
    using (PdfReader reader = new PdfReader(@"C:\Users\cmilne\Desktop\AA0081913.pdf"))
    {
        int pages = reader.NumberOfPages;
        float postProcessPageHeight = 0;
        float postProcessPageWidth = 0;
        // Pages are stacked vertically: sum the heights, keep the widest width
        for (int p = 1; p <= pages; p++)
        {
            var size = reader.GetPageSize(p);
            postProcessPageHeight += size.Height;
            if (size.Width > postProcessPageWidth)
                postProcessPageWidth = size.Width;
        }
        var rect = new Rectangle(postProcessPageWidth, postProcessPageHeight);
        using (Document document = new Document(rect, 0, 0, 0, 0))
        {
            //Declare location\name of new PDF not containing page breaks.
            PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(@"C:\Users\cmilne\Desktop\AA0081913_NEW.pdf", FileMode.Create));
            document.Open();
            PdfImportedPage page;
            PdfPTable table = new PdfPTable(1);
            table.WidthPercentage = 100;
            for (int i = 1; i <= pages; i++)
            {
                page = writer.GetImportedPage(reader, i);
                table.AddCell(iTextSharp.text.Image.GetInstance(page));
            }
            document.Add(table);
            document.Close();
        }
    }
}
The final page size must be smaller than 14400 by 14400 points (this is the maximum iTextSharp allows). A US Letter page (8.5 x 11 inches) is 612 x 792 points, so at most about 18 pages can be stacked (14400 / 792 ≈ 18.2).
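The combined size and the limit can be checked up front, before creating the Document. A minimal sketch in plain C#, stacking pages vertically as in the MergePages example (heights are summed, the widest page sets the width). The 14400 figure is the limit quoted above; treating it as inclusive is an assumption, and the StackedPageSize helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class StackedPageSize
{
    // Limit (in points) quoted in the answer; treated as inclusive here (assumption).
    public const float MaxDimension = 14400f;

    // Stacking pages vertically: total height is the sum, width is the widest page.
    public static (float Width, float Height) Combine(IEnumerable<(float Width, float Height)> pages)
    {
        float width = 0, height = 0;
        foreach (var (w, h) in pages)
        {
            height += h;
            width = Math.Max(width, w);
        }
        return (width, height);
    }

    public static bool Fits((float Width, float Height) size) =>
        size.Width <= MaxDimension && size.Height <= MaxDimension;

    // How many pages of the given height fit into one stacked page.
    public static int MaxPages(float pageHeight) => (int)Math.Floor(MaxDimension / pageHeight);
}
```

Eighteen Letter pages combine to 612 × 14256 points and fit; a nineteenth pushes the height to 15048 points, past the limit.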
Use the iTextSharp C# library. It gives you a lot of options for manipulating PDFs. I've used it before when I had to write an import application for a closed-source document repository, and it worked like a charm. The only downside is that the documentation is somewhat spotty, because they want you to buy their book. However, you can browse their Java API for free, since it's almost identical to the C# version, and experiment to find the C# equivalents.
iText: http://itextpdf.com/