Which method can be used to convert a PDF to an image in iText 7?
In iTextSharp there was an option to convert a PDF file to images. Here is the link: PDF to Image Using iTextSharp
http://www.c-sharpcorner.com/UploadFile/a0927b/create-pdf-document-and-convert-to-image-programmatically/
Below is sample code I wrote based on the following reference:
itext7 pdf to image
It does not work as expected: instead of converting the PDF to an image, it creates a blank 1 KB image.
string fileName = System.IO.Path.GetFileNameWithoutExtension(inputFilePath);
var pdfReader = new PdfReader(inputFilePath);
var pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfReader);
int pagesLength = pdfDoc.GetNumberOfPages() + 1;
for (int i = 1; i < pagesLength; i++)
{
    if (!File.Exists(System.IO.Path.Combine(imageFileDir, fileName + "_" +
        (startIndex + i) + ".png")) && i < pagesLength)
    {
        PdfPage pdfPages = pdfDoc.GetPage(i);
        PdfWriter writer = new PdfWriter(System.IO.Path.Combine(imageFileDir, fileName + "_" + (startIndex + i) + ".png"), new WriterProperties().SetFullCompressionMode(true));
        PdfDocument pdf = new PdfDocument(writer);
        PdfFormXObject pageCopy = pdfPages.CopyAsFormXObject(pdf);
        iText.Layout.Element.Image image = new iText.Layout.Element.Image(pageCopy);
    }
}
Quoting Bruno:
iText does not convert PDFs to raster images (such as .jpg, .png,...). You are misinterpreting the examples that create an Image instance based on an existing page. Those examples create an XObject that can be reused in a new PDF as if it were a vector image; they don't convert a PDF page to a raster image.
What you can use for this (and what we at iText use internally for testing) is Ghostscript. It takes a PDF as input and converts it to a series of images, one image per page.
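Because iText itself cannot rasterize, one option is to run the Ghostscript command-line tool from C#. The following is a minimal sketch, not iText's internal test setup. It assumes Ghostscript is installed and that its console executable is on the PATH (gswin64c on 64-bit Windows, gs elsewhere), and the class and method names are hypothetical. The switches (-dNOPAUSE, -dBATCH, -dSAFER, -sDEVICE=png16m, -r, -sOutputFile) are standard Ghostscript options, and Ghostscript replaces %d in the output pattern with the page number.

```csharp
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical helper: renders every page of a PDF to PNG by running Ghostscript.
public static class GhostscriptPdfToPng
{
    // Builds the Ghostscript argument string. png16m = 24-bit RGB PNG,
    // -r sets the resolution in dpi, %d in outputPattern becomes the page number.
    public static string BuildArguments(string inputPdf, string outputPattern, int dpi)
    {
        return "-dNOPAUSE -dBATCH -dSAFER -sDEVICE=png16m " +
               $"-r{dpi} -sOutputFile=\"{outputPattern}\" \"{inputPdf}\"";
    }

    // Runs Ghostscript, waits for it to finish, and returns its exit code (0 = success).
    // Assumption: the console executable is on the PATH under the name chosen below.
    public static int Convert(string inputPdf, string outputPattern, int dpi = 150)
    {
        string exe = RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "gswin64c" : "gs";
        var startInfo = new ProcessStartInfo(exe, BuildArguments(inputPdf, outputPattern, dpi))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For example, GhostscriptPdfToPng.Convert("input.pdf", @"C:\out\page_%d.png") should produce page_1.png, page_2.png, and so on; a non-zero exit code means Ghostscript reported an error.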
I have a PDF file with some images that I want to replace with other PDFs. The following code walks through the PDF and collects the image references:
PdfDocument pdf = new PdfDocument(new PdfReader(args[0]), new PdfWriter(args[1]));
for (int i = 1; i <= pdf.GetNumberOfPages(); ++i)
{
    PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
    PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
    PdfDictionary xObjects = resources.GetAsDictionary(PdfName.XObject);
    foreach (PdfName imgRef in xObjects.KeySet())
    {
        // image reference
    }
}
For each image I have a corresponding PDF (always a single page) that should replace it. My first attempt was to Put the page object of the other PDF directly into the XObject dictionary:
PdfDocument other = new PdfDocument(new PdfReader("replacement.pdf"));
xObjects.Put(imgRef, other.GetFirstPage().GetPdfObject().Clone());
But an exception is thrown when the PdfDocument is closed:
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
How can I replace the image with the content of another PDF?
Update
I also tried a few other approaches, which may have improved the results. To get past the previous error, I copy the page into the original PDF:
var page = other.GetFirstPage().CopyTo(pdf);
However, replacing the XObject with the copied page does not work:
xObjects.Put(imgRef, page.GetPdfObject());
This results in a corrupted PDF.
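To copy a page of one document into another so that it can be used as an image replacement, use PdfPage#CopyAsFormXObject. A page dictionary is not an XObject, so putting it into the /XObject resources directly cannot work; CopyAsFormXObject wraps the page content in a form XObject that belongs to the target document.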
So let's assume we have this PDF as a template and we want to replace the image of a desert with the contents of another PDF:
Let's also assume the PDF that we want to use as a replacement looks as follows:
The issue is that if we blindly replace the original image with the contents of the PDF, chances are we will get something like this:
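So everything appears to have worked, yet the visual result is wrong. The reason is that coordinates work differently for raster images and for form XObjects (the PDF replacements): an image XObject is always drawn into a 1 by 1 unit square, which the page's content stream scales to the display size with a cm operator, while a form XObject is drawn in its own coordinate system, whose extent is given by its /BBox. So we also need to adjust the transformation matrix (/Matrix key) of the newly created XObject.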
So the code could look like this:
// Requires: using System.Collections.Generic; using iText.Kernel.Pdf; using iText.Kernel.Pdf.Xobject;
PdfDocument pdf = new PdfDocument(new PdfReader(@"template.pdf"), new PdfWriter(@"out.pdf"));
for (int i = 1; i <= pdf.GetNumberOfPages(); ++i) {
    PdfDictionary pageDict = pdf.GetPage(i).GetPdfObject();
    PdfDictionary resources = pageDict.GetAsDictionary(PdfName.Resources);
    PdfDictionary xObjects = resources.GetAsDictionary(PdfName.XObject);
    // Collect the replacements first instead of modifying the XObject dictionary while enumerating its keys
    IDictionary<PdfName, PdfStream> toReplace = new Dictionary<PdfName, PdfStream>();
    foreach (PdfName imgRef in xObjects.KeySet()) {
        PdfStream previousXobject = xObjects.GetAsStream(imgRef);
        PdfDocument imageReplacementDoc =
            new PdfDocument(new PdfReader(@"insert.pdf"));
        // Copy page 1 of the replacement PDF into the template as a form XObject
        PdfXObject imageReplacement = imageReplacementDoc.GetPage(1).CopyAsFormXObject(pdf);
        toReplace[imgRef] = imageReplacement.GetPdfObject();
        adjustXObjectSize(imageReplacement);
        imageReplacementDoc.Close();
    }
    foreach (var x in toReplace) {
        xObjects.Put(x.Key, x.Value);
    }
}
pdf.Close();
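UPD: Implementation of adjustXObjectSize (thanks, mkl):
// Requires: using System; using iText.Kernel.Geom; using iText.Kernel.Pdf; using iText.Kernel.Pdf.Xobject;
private void adjustXObjectSize(PdfXObject pageXObject) {
    // Uniform scale that maps the larger side of the XObject to 1 unit
    float scaleXobject = 1 / Math.Max(pageXObject.GetWidth(), pageXObject.GetHeight());
    AffineTransform transform = new AffineTransform();
    transform.Scale(scaleXobject, scaleXobject);
    float[] matrix = new float[6];
    transform.GetMatrix(matrix);
    // Store the scale as the form XObject's /Matrix
    pageXObject.GetPdfObject().Put(PdfName.Matrix, new PdfArray(matrix));
}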
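The arithmetic behind adjustXObjectSize can be checked without iText. The sketch below (class and method names are hypothetical) computes the same six /Matrix entries [a b c d e f] for a uniform scale of 1 / max(width, height). Since the page's cm operator stretches the unit square to the original image's box, scaling the form XObject so that its larger side becomes 1 unit makes it fit inside that unit square, and therefore inside the area where the original image was drawn.

```csharp
using System;

// Hypothetical helper mirroring the math in adjustXObjectSize.
public static class XObjectScale
{
    // Returns the PDF matrix [a b c d e f] for a uniform scale that maps the
    // larger side of a width x height form XObject to 1 unit (no rotation, no translation).
    public static float[] UnitSquareMatrix(float width, float height)
    {
        if (width <= 0 || height <= 0)
            throw new ArgumentException("Width and height must be positive.");
        float scale = 1 / Math.Max(width, height);
        return new float[] { scale, 0, 0, scale, 0, 0 };
    }
}
```

For a 600 by 400 point replacement page the scale is 1/600, so the form occupies 1 by 0.667 units before the page's cm operator is applied.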
And the visual result after running the above code on the samples I described would look like this:
I have a lot of PDF files with text. To prevent copying, I added a watermark; however, the watermark can easily be removed simply by editing the PDF.
Using C#, how can I convert a PDF into a PDF with each page being an image of the text? I understand this isn't foolproof, as OCR can be used to extract the text, but I want to make it that little bit harder.
Thanks for your help.
I used Ghostscript.NET (https://github.com/jhabjan/Ghostscript.NET) to render each page to a bitmap, which you can then convert into any other format you want:
using Ghostscript.NET.Rasterizer;
using System.Drawing;
...
using (GhostscriptRasterizer raster = new GhostscriptRasterizer())
{
    raster.Open(filename);
    pages = raster.PageCount;
    _bitpages = new Bitmap[raster.PageCount];
    for (int i = 1; i < pages + 1; i++)
    {
        // Page numbers are 1-based; the first two arguments are the horizontal and vertical dpi
        _bitpages[i - 1] = (Bitmap)raster.GetPage(dpi, dpi, i);
        // convert and save image here
    }
    raster.Close();
}
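The dpi passed to GetPage determines the bitmap size, and therefore the size of an image-only PDF built from those bitmaps. PDF page dimensions are measured in points (72 per inch), so a page rendered at a given dpi is roughly width * dpi / 72 by height * dpi / 72 pixels. A small sketch of that arithmetic (the helper name is hypothetical, and the renderer's own rounding may differ by a pixel):

```csharp
using System;

// Hypothetical helper: estimates the pixel size of a page rendered at a given dpi.
public static class RasterSize
{
    // PDF user-space units are points: 72 points = 1 inch.
    private const double PointsPerInch = 72.0;

    public static (int Width, int Height) PixelSize(double widthPt, double heightPt, int dpi)
    {
        int w = (int)Math.Round(widthPt * dpi / PointsPerInch);
        int h = (int)Math.Round(heightPt * dpi / PointsPerInch);
        return (w, h);
    }
}
```

For a US Letter page (612 by 792 points), PixelSize(612, 792, 150) gives 1275 by 1650 pixels; doubling the dpi quadruples the number of pixels, which is worth keeping in mind when balancing legibility against file size.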
I need to reduce the size of a PDF file in code. If you have a solution, please share it with me.
The images should be as small as possible.
Thank you.
The following code example demonstrates how to compress a PDF document by reducing the quality of the images it contains. You can give it a try.
// Requires: using Spire.Pdf; using Spire.Pdf.Graphics; using System.Drawing;
//Loads the PDF document
PdfDocument doc = new PdfDocument("Image.pdf");
//Disables the incremental update
doc.FileInfo.IncrementalUpdate = false;
//Traverses all pages
foreach (PdfPageBase page in doc.Pages)
{
    //Extracts images from page
    Image[] images = page.ExtractImages();
    if (images != null && images.Length > 0)
    {
        //Traverses all images
        for (int j = 0; j < images.Length; j++)
        {
            Image image = images[j];
            PdfBitmap bp = new PdfBitmap(image);
            //Reduces the quality of the image
            bp.Quality = 20;
            //Replaces the old image in the document with the compressed image
            page.ReplaceImage(j, bp);
        }
    }
}
//Saves and closes the resultant document
doc.SaveToFile("Output.pdf");
doc.Close();
Note: This example uses Spire.PDF, and I'm an employee of Spire. See the article How to Compress PDF Document in C#, VB.NET for more details.
I'm at the last step of completing a PDF generator. I'm using iTextSharp and, thanks to help from Stack Overflow, I can stamp a Base64 image without any problem.
My question is how to iterate over the posted files and add a new page containing the posted images. Below is how I currently stamp an image, but that image comes from Base64. I need to add uploaded images, selected in my application, to the PDF, preferably while the stamper is still open. I just can't get my code to work.
I feel this iteration should be easy, but I can't get the logic right. Please help:
PdfContentByte pdfContentByte = stamper.GetOverContent(1);
PdfContentByte pdfContentByte2 = stamper.GetOverContent(4);
var image = iTextSharp.text.Image.GetInstance(
    Convert.FromBase64String(match.Groups["data"].Value)
);
image.SetAbsolutePosition(270, 90);
image.ScaleToFit(250f, 100f);
pdfContentByte.AddImage(image);
// Stamping the Base64 image works perfectly. Now I need to stamp the uploaded
// images onto a new page in the same document before the stamper closes.
var imagepath = "//test//";
HttpFileCollection uploadFilCol = HttpContext.Current.Request.Files;
for (int i = 0; i < uploadFilCol.Count; i++)
{
    HttpPostedFile file = uploadFilCol[i];
    using (FileStream fs = new FileStream(imagepath + "Invoice-" +
        HttpContext.Current.Request.Form.Get("genUUID") + file, FileMode.Open))
    {
        HttpPostedFile file = uploadFilCol[i];
        pdfContentByte2.AddImage(file);
    }
}
My posted files come from an input element on an HTML page:
<input type="file" id="file" name="files[]" runat="server" multiple />
The basic steps:
1. Iterate over the HttpFileCollection.
2. Read each HttpPostedFile into a byte array.
3. Create an iText Image from the byte array of the previous step.
4. Set the image's absolute position, and optionally scale it as needed.
5. Add the image to the specified page number with GetOverContent().
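The second step (reading each HttpPostedFile into a byte array) can also be done without relying on ContentLength. A minimal sketch, assuming only that the upload is available as a Stream (for example HttpPostedFile.InputStream): copy it into a MemoryStream and take the resulting bytes, which can then be passed to iTextSharp's Image.GetInstance(byte[]). The class and method names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper for step 2: read an uploaded file's stream into memory.
public static class UploadBytes
{
    // Reads the remaining contents of the stream (from its current position) into a byte array.
    public static byte[] ReadAll(Stream input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

If the stream has already been read once, rewind it first (input.Position = 0 when input.CanSeek is true), because the helper only copies what remains.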
Here is a quick snippet to get you started. It is untested and assumes that you have already set up the PdfReader, Stream, and PdfStamper, along with a working file upload:
// absoluteX, absoluteY, PAGE_NUMBER and APPEND_NEW_PAGE_NUMBER are placeholders
HttpFileCollection uploadFilCol = HttpContext.Current.Request.Files;
for (int i = 0; i < uploadFilCol.Count; i++)
{
    HttpPostedFile postedFile = uploadFilCol[i];
    using (var br = new BinaryReader(postedFile.InputStream))
    {
        var imageBytes = br.ReadBytes(postedFile.ContentLength);
        // fully qualified to avoid a clash with System.Drawing.Image
        var image = iTextSharp.text.Image.GetInstance(imageBytes);
        // still not sure if you want to add a new blank page, but
        // here's how
        //stamper.InsertPage(
        //    APPEND_NEW_PAGE_NUMBER, reader.GetPageSize(APPEND_NEW_PAGE_NUMBER - 1)
        //);
        // image absolute position
        image.SetAbsolutePosition(absoluteX, absoluteY);
        // scale image if needed
        // image.ScaleAbsolute(...);
        // PAGE_NUMBER => add image to specific page number
        stamper.GetOverContent(PAGE_NUMBER).AddImage(image);
    }
}
I'm looking for a way to convert each page of a PDF to JPEG.
I found the code below at the following link:
http://imagemagick.codeplex.com/discussions/257915
Here are two ways to convert a PDF to an image. You can either read all the images at once or read just one image. For this to work, you need to have Ghostscript installed.
using (ImageList imageList = new ImageList())
{
    imageList.ReadImages(inputFile);
    int pageIndex = 0;
    foreach (Image page in imageList)
    {
        page.Write(outputDirectory + "\\Page." + pageIndex + ".jpg");
        pageIndex++;
    }
}
using (Image firstPage = new Image())
{
    firstPage.Read(inputFile + "[0]");
    firstPage.Write(outputDirectory + "\\FirstPage.jpg");
}
Here the ImageList class is not recognized, even though I have added the DLL. Do you have any working samples? Also, the Ghostscript download is an EXE file, not a DLL; is that expected?