Using itextsharp to remove inline images from pdf

There are several examples of removing or resizing images using iTextSharp on the net, but I'm unable to find examples of removing inline images.
I'm using the following code to remove XObject images:
PdfWriter writer = st.Writer;
PdfDictionary pg = reader.GetPageN(1);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsDictionary())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
//PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(type))
{
int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = reader.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance((PRIndirectReference)obj);
string filter = tg.Get(PdfName.FILTER).ToString();
if (filter == "/DCTDecode")
{
PdfReader.KillIndirect(obj);
Stream stBrasao2 = File.OpenRead(pasta_recurso + "brasao.jpg");
iTextSharp.text.Image img2 = iTextSharp.text.Image.GetInstance(stBrasao2);
writer.AddDirectImageSimple(img2, (PRIndirectReference)obj);
break;
}
}
}
}
}
Is there any way to adapt this to remove inline images rather than XObject images?
Thanks.

The code above won't remove inline images. With iText, that task is done as Bruno Lowagie pointed out in the comments. In the end, my solution was to preprocess the PDF with PDFsharp before handing it to iText: I use PDFsharp to read the PDF stream, remove the image bytes, and then write out a file for iText.
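For readers wondering what "removing the image bytes" can look like: in a decoded page content stream, an inline image is written as a `BI` operator, a few key/value pairs, an `ID` operator, the raw image data, and an `EI` operator. The following is only a naive sketch of that idea, not the code used in the answer above. The class name `InlineImageStripper` is made up for illustration, and the scanner assumes the content stream is already uncompressed and that the image data does not happen to contain a standalone `EI` token (real binary data can, so a production version needs a proper content-stream parser).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: naive removal of BI ... ID ... EI sequences
// from an already-decoded PDF content stream.
public static class InlineImageStripper
{
    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    private static bool IsWhite(byte b) =>
        b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;

    // True when the two-letter operator starts at index i and is
    // delimited by whitespace (or by the start/end of the data).
    private static bool IsToken(byte[] data, int i, char c0, char c1)
    {
        if (i < 0 || i + 1 >= data.Length) return false;
        if (data[i] != (byte)c0 || data[i + 1] != (byte)c1) return false;
        bool startOk = i == 0 || IsWhite(data[i - 1]);
        bool endOk = i + 2 == data.Length || IsWhite(data[i + 2]);
        return startOk && endOk;
    }

    // Index of the next standalone operator at or after 'from', or -1.
    private static int FindToken(byte[] data, int from, char c0, char c1)
    {
        for (int i = Math.Max(from, 0); i + 1 < data.Length; i++)
            if (IsToken(data, i, c0, c1)) return i;
        return -1;
    }

    // Returns a copy of 'content' with every complete BI..ID..EI run removed.
    public static byte[] Strip(byte[] content)
    {
        var output = new List<byte>(content.Length);
        int pos = 0;
        while (pos < content.Length)
        {
            int bi = FindToken(content, pos, 'B', 'I');
            int id = bi < 0 ? -1 : FindToken(content, bi + 2, 'I', 'D');
            // Image data starts after the single whitespace byte that follows ID.
            int ei = id < 0 ? -1 : FindToken(content, id + 3, 'E', 'I');
            if (ei < 0)
            {
                // No complete inline image remains: keep the rest unchanged.
                for (int k = pos; k < content.Length; k++) output.Add(content[k]);
                break;
            }
            // Keep everything before BI, then skip past EI.
            for (int k = pos; k < bi; k++) output.Add(content[k]);
            pos = ei + 2;
        }
        return output.ToArray();
    }
}
```

With iTextSharp 5, the input bytes could presumably come from `reader.GetPageContent(pageNumber)` and be written back with `reader.SetPageContent(pageNumber, bytes)` before stamping, but that wiring is an assumption and has not been tested against the PDFs from the question.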

Related

iTextSharp is not working with foreach loop

I'm trying to insert images into a PDF file. The code works when I give the image a static position, but it does not work inside a foreach loop. The positions of the images (top, left) are saved in the database, so the foreach loop reads each position and places the image dynamically.
I have to place multiple images at dynamic positions, so the foreach loop is required.
A second problem is that this code exports only the first page of the PDF file; I want to export all pages, i.e. the complete PDF document with the images. Can anyone help me sort out these problems?
string pdfFile = Server.MapPath("~/files/" + arg);
ViewBag.file = pdfFile;
var getAllPostitions = db.DraggedElements.Where(l => l.doc_name == arg).ToList();
ViewBag.tag_positions = getAllPostitions;
string imagepath = Server.MapPath("~/images/sign.png");
string DEST = @"e:/TestComplete.pdf";
//string IMG = @"C:Saved//TestImage.JPG";
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFile);
iTextSharp.text.Rectangle Size = reader.GetPageSizeWithRotation(1);
Document document = new Document(Size);
FileStream fs = new FileStream(DEST, FileMode.Create, FileAccess.Write);
iTextSharp.text.pdf.PdfWriter weiter = iTextSharp.text.pdf.PdfWriter.GetInstance(document, fs);
document.Open();
PdfContentByte cb = weiter.DirectContent;
PdfImportedPage page = weiter.GetImportedPage(reader, 1);
cb.AddTemplate(page, 0, 0);
iTextSharp.text.Image signImg = iTextSharp.text.Image.GetInstance(imagepath);
signImg.ScaleToFit(50, 50);
foreach (var position in getAllPostitions)
{
string topStr = position.top;
string topStr1 = topStr.Split(':')[1];
string topStr2 = topStr1.Split('p')[0];
float top = float.Parse(topStr2);
string leftStr = position.left;
string leftStr1 = leftStr.Split(':')[1];
string leftStr2 = leftStr1.Split('p')[0];
float left = float.Parse(leftStr2);
signImg.SetAbsolutePosition(top, left);
document.Add(signImg);
}
document.Close();
fs.Close();
weiter.Close();
reader.Close();
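One detail worth checking in the loop above: iTextSharp's `SetAbsolutePosition` takes the x coordinate first and the y coordinate second, and PDF coordinates are measured from the bottom-left corner of the page, whereas CSS-style `top`/`left` values are measured from the top-left. The sketch below shows one way the stored strings could be parsed and converted. The `PositionHelper` class and its method names are hypothetical, and it assumes the stored values look like `top: 120px` and use the same scale as the PDF page; if the HTML preview was rendered at a different size, an extra scale factor would be needed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for turning stored CSS-like offsets into PDF coordinates.
public static class PositionHelper
{
    // Parses strings such as "top: 120px" or "left:35.5px" into a number.
    public static float ParseCssPixels(string css)
    {
        string value = css.Substring(css.IndexOf(':') + 1).Trim();
        if (value.EndsWith("px", StringComparison.OrdinalIgnoreCase))
            value = value.Substring(0, value.Length - 2);
        return float.Parse(value.Trim(), CultureInfo.InvariantCulture);
    }

    // Converts a top-left-origin offset into a bottom-left-origin position.
    // The y value is where the bottom edge of the image should sit.
    public static (float X, float Y) ToPdfPosition(
        float left, float top, float pageHeight, float imageHeight)
    {
        return (left, pageHeight - top - imageHeight);
    }
}
```

Inside the loop, the result would then be passed as `signImg.SetAbsolutePosition(pos.X, pos.Y)`, using the page height from `reader.GetPageSizeWithRotation(1)` and the image's scaled height. Whether this alone resolves the behaviour described in the question is not confirmed by the source.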

Convert Base64 from PDF to Bitmap [duplicate]

Is there any way to convert an HTML document (a file, not a URL) to an image, or a PDF to an image?
I am able to do this using the Ghostscript DLL. Is there any other way to do it without using the Ghostscript DLL?
I am developing a C# Windows application.

The best free NuGet package I know for this is Docnet.Core, which can save every page of a PDF as a PNG at a custom resolution and can be used in .NET Core projects.
The project is on GitHub with good examples, but here is my code for reading a PDF with more than one page:
string webRootPath = _hostingEnvironment.WebRootPath;
string fullPath = webRootPath + "/uploads/user-manual/file.pdf";
string fullPaths = webRootPath + "/uploads/user-manual";
using (var library = DocLib.Instance)
{
using (var docReader = library.GetDocReader(fullPath, 1080, 1920))
{
for (int i = 0; i < docReader.GetPageCount(); i++) // page indices start at 0, so begin there to include the first page
{
using (var pageReader = docReader.GetPageReader(i))
{
var bytes = EmailTemplates.GetModifiedImage(pageReader);
System.IO.File.WriteAllBytes(fullPaths+"/page_image_" +i+".png", bytes);
}
}
}
}
You can find other functions in their GitHub repo.

Use LibPdf, for PDF to Image conversion
The LibPdf library converts a PDF file to an image. Supported image formats are PNG and BMP, but you can easily add more.
Usage example:
using (FileStream file = File.OpenRead(@"..\path\to\pdf\file.pdf")) // in file
{
var bytes = new byte[file.Length];
file.Read(bytes, 0, bytes.Length);
using (var pdf = new LibPdf(bytes))
{
byte[] pngBytes = pdf.GetImage(0,ImageType.PNG); // image type
using (var outFile = File.Create(@"..\path\to\pdf\file.png")) // out file
{
outFile.Write(pngBytes, 0, pngBytes.Length);
}
}
}
You should also look at ImageMagick, a freely available and powerful tool. It is capable of doing what you want and also provides some .NET bindings (as well as bindings for several other languages).
In its simplest form, it is just a matter of running a command:
convert file.pdf imagefile.png

Try Freeware.Pdf2Png, check below url:
PDF to PNG converter.
byte[] png = Freeware.Pdf2Png.Convert(pdf, 1);
https://www.nuget.org/packages/Freeware.Pdf2Png/1.0.1?_src=template
The package's About info says it is under the MIT license; I checked this on March 22, 2022.
But as Mitya said, please double-check.

You can use either of the libraries below for PDF to image conversion.
Use Aspose.Pdf (link below):
http://www.aspose.com/docs/display/pdfnet/Convert+all+PDF+pages+to+JPEG+Images
code sample:
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(MyPdfPath);
using (FileStream imageStream = new FileStream("MyOutputImage.png", FileMode.Create))
{
    Resolution resolution = new Resolution(300);
    PngDevice pngDevice = new PngDevice(resolution);
    // Render the requested page into the output stream.
    pngDevice.Process(pdfDocument.Pages[PageNo], imageStream);
    imageStream.Close();
}
Use Bytescout PDF Renderer link below:
http://bytescout.com/products/developer/pdfrenderersdk/convert-pdf-to-png-basic-examples
code sample :
MemoryStream ImageStream = new MemoryStream();
RasterRenderer renderer = new RasterRenderer();
renderer.RegistrationName = "demo";
renderer.RegistrationKey = "demo";
// Load PDF document.
renderer.LoadDocumentFromFile(FilePath);
for (int i = 0; i < renderer.GetPageCount(); i++)
{
// Render each page of the document as a PNG into the stream.
renderer.RenderPageToStream(i, RasterOutputFormat.PNG, ImageStream);
}
Image im = Image.FromStream(ImageStream);
im.Save("MyOutputImage.png");
ImageStream.Close();

Using Docnet, and based on the example on GitHub, I wrote the following. It is very simple and works:
//...
using Docnet.Core;
using System.IO;
using Docnet.Core.Models;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
//paths
string pathPdf = @"C:\pathToPdfFile\lorem-ipsum.pdf";
string finalPathWithFileName = @"C:\pathToFinalImageFile\finalFile.png";
//using docnet
using (var docReader = DocLib.Instance.GetDocReader(pathPdf, new PageDimensions(1080, 1920)))
{
//open pdf file
using (var pageReader = docReader.GetPageReader(0))
{
var rawBytes = pageReader.GetImage();
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
var characters = pageReader.GetCharacters();
//using bitmap to create a png image
using (var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb))
{
AddBytes(bmp, rawBytes);
using (var stream = new MemoryStream())
{
//saving and exporting
bmp.Save(stream, ImageFormat.Png);
File.WriteAllBytes(finalPathWithFileName, stream.ToArray());
};
};
};
};
//extra methods
private static void AddBytes(Bitmap bmp, byte[] rawBytes)
{
var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
var bmpData = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
var pNative = bmpData.Scan0;
Marshal.Copy(rawBytes, 0, pNative, rawBytes.Length);
bmp.UnlockBits(bmpData);
}

Spire.PDF library can be used for PDF to Image conversion, such as PDF to PNG, JPG, EMF and TIFF etc.
The following code example shows how to convert a PDF to PNG:
//Load a PDF
PdfDocument doc = new PdfDocument();
doc.LoadFromFile("PdfFilePath");
//Save to PNG images
for (int i = 0; i < doc.Pages.Count; i++)
{
String fileName = String.Format("ToImage-img-{0}.png", i);
using (Image image = doc.SaveAsImage(i,300,300))
{
image.Save(fileName, System.Drawing.Imaging.ImageFormat.Png);
}
}
doc.Close();
More conversion examples can be found in the library's documentation. It also provides a free community edition but with some limitations.

While using Ghostscript with ImageMagick is a potential option, it is incredibly slow: every page takes around 5 seconds or more. Docnet is a much better option for converting PDFs to images. The following code converts all pages of a PDF file into images, and does so quickly.
public void SavePDFtoJPGDocnet(string fileName)
{
string FilePath = @"C:\SampleFileFolder\doc.pdf";
string DestinationFolder = @"C:\SampleFileFolder\";
IDocLib DocNet = DocLib.Instance;
//you are specifying the max resolution of image on any side, actual resolution will be limited by longer side,
//preserving the aspect ratio
var docReader = DocNet.GetDocReader(
FilePath,
new PageDimensions(1440, 2560));
for (int i = 0; i < docReader.GetPageCount(); i++)
{
using (var pageReader = docReader.GetPageReader(i))
{
var rawBytes = pageReader.GetImage();
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
var characters = pageReader.GetCharacters();
var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb);
DocnetClass.AddBytes(bmp, rawBytes);
//DocnetClass.DrawRectangles(bmp, characters);
var stream = new MemoryStream();
bmp.Save(stream, ImageFormat.Png);
File.WriteAllBytes(DestinationFolder + "/page_image_" + i + ".png", stream.ToArray());
}
}
}

Freeware.Pdf2Png worked great for my needs.
It does not only convert to Png, you can save to the image format of your choice.
In MS Visual Studio, run this in the Package Manager Console:
PM> NuGet\Install-Package Freeware.Pdf2Png -Version 1.0.1
Or add it via the NuGet Package Manager GUI: search for Freeware.Pdf2Png and it should come up.
Once the reference is added to your project, code similar to this should do what you need to convert a PDF to an Image.
using (FileStream fs = new FileStream(FullFilePath, FileMode.Open))
{
byte[] buff = Freeware.Pdf2Png.Convert(fs, 1);
MemoryStream ms = new MemoryStream(buff);
Image img = Image.FromStream(ms);
img.Save(TiffFilePath, System.Drawing.Imaging.ImageFormat.Tiff);
}
FullFilePath - a string that is the Full File Path to the PDF to be converted.
TiffFilePath - a string that is the Full File Path of the newly created Image that you would like to save.
Unfortunately I was not able to find any c# code or proper algorithm to do this conversion without a 3rd party DLL. If any of you have good information for that please do share it!

In case someone wants to use Ghostscript.NET.
Ghostscript.NET (written in C#) is the most complete managed wrapper library around the Ghostscript library (32-bit and 64-bit), an interpreter for the PostScript language and PDF.
It depends on an executable that you have to install on your machine. Here is a link where you can see and download the latest version of the installer.
https://www.ghostscript.com/download/gsdnld.html
P.S. I had some troubles with the latest version 9.50 not being able to count the pages.
I prefer using the 9.26 version.
https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw32.exe
https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw64.exe
Next step is to find and install Ghostscript.NET from Nuget.
I download the PDF from CDN url and use the MemoryStream to open and process the PDF file. Here is a sample code:
using (WebClient myWebClient = new WebClient())
{
using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
{
/* custom switches can be added before the file is opened
rasterizer.CustomSwitches.Add("-dPrinted");
*/
byte[] buffer = myWebClient.DownloadData(pdfUrl);
using (var ms = new MemoryStream(buffer))
{
rasterizer.Open(ms);
var image = rasterizer.GetPage(0, 0, 1);
var imageURL = "MyCDNpath/Images/" + filename + ".png";
_ = UploadFileToS3(image, imageURL);
}
}
}
You can also use it with temporary FileStream. Here is another example. Note that the File is temporary and has DeleteOnClose mark.
using (WebClient myWebClient = new WebClient())
{
using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
{
/* custom switches can be added before the file is opened
rasterizer.CustomSwitches.Add("-dPrinted");
*/
byte[] buffer = myWebClient.DownloadData(pdfUrl);
int bufferSize = 4096;
using (var fileStream = System.IO.File.Create("TempPDFolder/" + pdfName, bufferSize, System.IO.FileOptions.DeleteOnClose))
{
// now use that fileStream to save the pdf stream
fileStream.Write(buffer, 0, buffer.Length);
rasterizer.Open(fileStream);
var image = rasterizer.GetPage(0, 0, 1);
var imageURL = "MyCDNpath/Images/" + filename + ".png";
_ = UploadFileToS3(image, imageURL);
}
}
}
Hope it will help someone struggling to get high quality images from pdf for free.

How do I append a PDF file from binary to an already 'in-progress' PDF, using iTextSharp?

In code, I am in the process of creating a PDF document using iTextSharp. I have already added content to the document and closed it, successfully returning it in a response to a web browser.
What I am trying to do is append another PDF document to the one I am creating but it has to come from binary or an object of type Byte[].
I realize that there is the available method document.Add(stuff) but I am trying to convert the binary to an object and then essentially add that to the document in progress. I have seen questions and posts similar to my scenario but they are mostly dealing with Images.
Here is what I have...
while (sqlExpDocDataReader.Read())
{
// Read data and fill temp. objects
string docName = sqlExpDocDataReader["docName"].ToString();
string docType = sqlExpDocDataReader["docType"].ToString();
Byte[] docData = (Byte[])sqlExpDocDataReader["docData"];
// Get current page size
var pageWidth = document.PageSize.Width;
var pageHeight = document.PageSize.Height;
// Is this an image or PDF?
if (docType.Contains("pdf"))
{
// Could I use a memory stream somehow?
MemoryStream ms = new MemoryStream(docData.ToArray());
}
else
{
// Here I see how to do it with images.
Image doc = Image.GetInstance(docData);
doc.ScaleToFit(pageWidth, pageHeight); // width, height
document.Add(doc);
}
}
Any ideas?

With a bit more digging, here is how I was able to resolve my issue...
Basically, I created a MemoryStream object from my binary data and then created a PdfReader to read that object, where normally we would read a file.
I then looped through each page of the reader object (or file, if you'd like) and appended the pages as they were found.
if (docType.Contains("pdf"))
{
MemoryStream ms = new MemoryStream(docData.ToArray());
PdfReader pdfReader = new PdfReader(ms);
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(pdfReader, i);
document.Add(iTextSharp.text.Image.GetInstance(page));
}
}

public static byte[] UnificarImagenesPDF(IEnumerable<DocumentoDTO> documentos)// "documents" is a list of objects that are located in the database, the images and pdf are stored in a binary attribute of "documents"
{
using (MemoryStream workStream = new MemoryStream())
{
iTextSharp.text.Document doc = new iTextSharp.text.Document();//to create a itextSharp Document
PdfWriter writer = PdfWriter.GetInstance(doc, workStream);
doc.Open();
foreach (DocumentoDTO d in documentos)// "documentos" has an attribute where the document extension type is saved (eg pdf, jpg, png, etc)
{
try
{
if (d.sExtension == ".pdf")
{
MemoryStream ms = new MemoryStream(d.bBinarios.ToArray());
PdfReader pdfReader = new PdfReader(ms); //
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(pdfReader, i);
doc.Add(resizeImagen(iTextSharp.text.Image.GetInstance(page)));//Each sheet of the PDF document is added to the document created in itextsharp, and the resizeImage function is used so that the images are centered in the ITEXTSHARP document
doc.NewPage();// add a new page on ITEXTSHARP document
}
}
if (d.sExtension != ".pdf")
{
doc.Add(resizeImagen(Image.GetInstance((byte[])d.bBinarios)));
doc.NewPage();
}
}
catch
{ }
}
doc.Close();
writer.Close();
return workStream.ToArray();
}
}
private static iTextSharp.text.Image resizeImagen(iTextSharp.text.Image image)
{
if (image.Height > image.Width)
{
//Portrait image: scale to a height of 700 units.
float percentage = 0.0f;
percentage = 700 / image.Height;
image.ScalePercent(percentage * 100);
}
else
{
//Landscape or square image: scale to a width of 540 units.
float percentage = 0.0f;
percentage = 540 / image.Width;
image.ScalePercent(percentage * 100);
}
return image;
}
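The `resizeImagen` function above picks one dimension based on orientation, so a portrait image that is also relatively wide can still exceed the other bound. As a small illustration of the arithmetic only, the sketch below computes a single percentage that fits both limits at once. The `FitHelper` class and `FitPercent` name are hypothetical, and the 540 by 700 limits are simply the values used in the code above.

```csharp
using System;

// Hypothetical helper: uniform scale percentage that fits a box.
public static class FitHelper
{
    // Returns the percentage by which an image of (width x height) must be
    // scaled so that it fits inside (maxWidth x maxHeight) without distortion.
    public static float FitPercent(float width, float height, float maxWidth, float maxHeight)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        if (height <= 0) throw new ArgumentOutOfRangeException(nameof(height));
        // The tighter of the two ratios decides the final scale.
        float scale = Math.Min(maxWidth / width, maxHeight / height);
        return scale * 100f;
    }
}
```

The result could be passed to `image.ScalePercent(...)`. iTextSharp's own `image.ScaleToFit(540, 700)`, which is used elsewhere on this page, should give an equivalent result in a single call.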

Scaling PDF Existing Images

I'm trying to go through all the images in a PDF and resize them to reduce the PDF file size. I'm using iTextSharp in C#. Below is my code so far. When I look at the output file none of the images were modified.
PdfReader pdf = new PdfReader(input);
using (PdfStamper stp = new PdfStamper(pdf, output))
{
for (int i = 1; i <= pdf.NumberOfPages; i++)
{
PdfDictionary page = pdf.GetPageN(i);
PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
PdfDictionary xObjects = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
if (xObjects == null)
continue;
foreach (PdfName name in xObjects.Keys)
{
PdfObject xObj = xObjects.Get(name);
if (!xObj.IsIndirect())
continue;
//Filter non-images
PdfDictionary xObjDic = (PdfDictionary)PdfReader.GetPdfObject(xObj);
PdfName xObjType = (PdfName)PdfReader.GetPdfObject(xObjDic.Get(PdfName.SUBTYPE));
if (!PdfName.IMAGE.Equals(xObjType))
continue;
//Get embedded image
int refId = ((PRIndirectReference)xObj).Number;
PRStream objStream = (PRStream)pdf.GetPdfObject(refId);
PdfImageObject objImg = new PdfImageObject(objStream);
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(objImg.GetImageAsBytes());
//Resize
img.ScaleAbsolute(200, 200);
img.SetDpi(72, 72);
PdfReader.KillIndirect(xObj);
stp.Writer.AddDirectImageSimple(img, (PRIndirectReference)xObj);
break;
}
}
}
pdf.Close();

Getting Image Details For Adding To DOCX

I am adding images to a DOCX files using the WordprocessingDocument method found here (Open XML) http://msdn.microsoft.com/en-us/library/bb497430.aspx.
I can add images, but the sizing is not correct.
MainDocumentPart mainPart = doc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
using (System.IO.FileStream stream = new System.IO.FileStream(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read))
{
imagePart.FeedData(stream);
}
return AddImageToBody(doc, mainPart.GetIdOfPart(imagePart), fileName);
private static Drawing AddImageToBody(WordprocessingDocument wordDoc, string relationshipId, string filename)
{
long imageWidthEMU = 1900000;
long imageHeightEMU = 350000;
double imageWidthInInches = imageWidthEMU / 914400.0;
double imageHeightInInches = imageHeightEMU / 914400.0;
new DW.Extent();
//Define the reference of the image.
var element =
new Drawing(
new DW.Inline(
new DW.Extent() { Cx = imageWidthEMU, Cy = imageHeightEMU },
As you can see, the sizes (width and height) are specified manually. I am unable to get them dynamically. How can I get the correct image size to pass to this code?
Thanks.

Your problem is solved here:
Inserting Image into DocX using OpenXML and setting the size
There is no way to solve it with an instruction in the XML file; OOXML offers only the <a:fill> and <a:tile> options.
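Since the extent has to be computed in code, the arithmetic is worth spelling out. The question's own code divides by 914400 to get inches, so going the other way is pixels divided by the image resolution (DPI), multiplied by 914400. The sketch below shows only that conversion; `EmuHelper` and `PixelsToEmu` are made-up names, and the pixel size and DPI are assumed to be read from the image file by whatever imaging API is available.

```csharp
using System;

// Hypothetical helper: converts pixel dimensions to EMUs for DW.Extent.
public static class EmuHelper
{
    // English Metric Units per inch, the same constant used in the question.
    public const long EmusPerInch = 914400;

    // pixels / dpi gives inches; inches * 914400 gives EMUs.
    public static long PixelsToEmu(int pixels, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return (long)Math.Round(pixels / dpi * EmusPerInch);
    }
}
```

For example, with `System.Drawing` the inputs could come from `Image.Width`, `Image.Height`, `Image.HorizontalResolution` and `Image.VerticalResolution`, and the two results would replace the hard-coded `imageWidthEMU` and `imageHeightEMU` values.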
