I have a PDF that contains a vector image. I asked the client about it, and they said they created the image in Illustrator and saved it as a PDF. Is there a way I can extract that image and convert it to a PNG? I've tried code from the following:
Extract image from PDF using itextsharp
http://www.vbforums.com/showthread.php?530736-2005-Extract-Images-from-a-PDF-file-using-iTextSharp
and a couple of other links that I can no longer find, but none of them seem to work. My theory is that they extract embedded images such as JPEGs, BMPs, and PNGs, but what I am faced with is a direct export from Illustrator.
Should I be using an Illustrator SDK, or is there a way to do it with iTextSharp? Also, I need to convert it to a standard image format, like PNG, and send the stream to a calling app, so I'll need to be able to grab the stream.
You will not be able to do this with iText, since it cannot render or rasterize vector graphics in PDF files.
Option 1:
If a GPL license works for you, you could rasterize your PDF file with ImageMagick + GNU Ghostscript, but AFAIK you will have to write the output to a file in this case.
Command line sample:
convert -density 300 -depth 8 c:\temp\mydoc.pdf c:\temp\myrasterimage.png
There is also a .NET wrapper on CodePlex that might work for you: ImageMagick.NET
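Since the question asks for a stream rather than a file, here is a minimal sketch of one possible workaround, not a tested solution: ImageMagick can write to standard output when the output name is `png:-`, and a `[0]` suffix on the input selects the first page. The class and method names are hypothetical, and the sketch assumes ImageMagick and Ghostscript are installed.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class PdfRasterizer // hypothetical helper, not part of any library
{
    // Runs ImageMagick's convert and captures the PNG it writes to stdout.
    public static MemoryStream RasterizeFirstPage(string pdfPath, int dpi)
    {
        var startInfo = new ProcessStartInfo
        {
            // Assumption: this resolves to ImageMagick's convert. On Windows,
            // "convert" may resolve to the system's own convert.exe, so a
            // full path to the ImageMagick binary is safer there.
            FileName = "convert",
            // "[0]" selects the first page; "png:-" sends PNG data to stdout.
            Arguments = $"-density {dpi} -depth 8 \"{pdfPath}[0]\" png:-",
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            var result = new MemoryStream();
            // Read the binary output before waiting, so the pipe cannot fill up.
            process.StandardOutput.BaseStream.CopyTo(result);
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "convert exited with code " + process.ExitCode);
            result.Position = 0; // rewind so the caller can read from the start
            return result;
        }
    }
}
```

A call such as `PdfRasterizer.RasterizeFirstPage(@"c:\temp\mydoc.pdf", 300)` would then hand back the PNG bytes in memory, at the cost of starting an external process per conversion.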
Option 2:
If a commercial library is an option for you, you could try with Amyuni PDF Creator .Net. You can either use the method IacDocument.ExportToJpg, which requires writing into a file, or you can use the method IacDocument.DrawCurrentPage, which can be useful for writing the output into a memory stream.
Sample code for exporting one page using IacDocument.DrawCurrentPage into a memory stream:
const int twipsPerInch = 1440;
const int MM_ISOTROPIC = 7;
private static MemoryStream RasterizePDF(string filePath, int pageIndex, int targetDPI)
{
Amyuni.PDFCreator.IacDocument doc = new Amyuni.PDFCreator.IacDocument();
doc.SetLicenseKey("Evaluation", "07EFC00...77C23E29");
FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read);
doc.Open(fs, "");
//Get the width and height of the target page
Amyuni.PDFCreator.IacPageFormat format = doc.GetPage(pageIndex).GetPageFormat();
doc.CurrentPageNumber = pageIndex;
//Create Image
Bitmap img = new Bitmap((int)(format.Width * targetDPI / twipsPerInch), (int)(format.Length * targetDPI / twipsPerInch), PixelFormat.Format32bppArgb);
Graphics g = Graphics.FromImage(img);
//set image object background to white
g.Clear(Color.White);
//Get a device context for the graphics object
IntPtr hdc = g.GetHdc();
SetMapMode(hdc, MM_ISOTROPIC);
// set scaling factor
SetWindowExtEx(hdc, twipsPerInch, twipsPerInch, 0);
SetViewportExtEx(hdc, targetDPI, targetDPI, 0);
//draw the contents of the PDF document on to the graphic context
doc.DrawCurrentPage(hdc, false);
//clean up
g.ReleaseHdc(hdc);
g.Dispose();
// Save the bitmap as png into the resulting stream
MemoryStream resultStrm = new MemoryStream();
img.Save(resultStrm, ImageFormat.Png);
//Prepare the stream to be read later on
resultStrm.Position = 0;
return resultStrm;
}
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetMapMode(IntPtr hdc, int MapMode);
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetWindowExtEx(IntPtr hdc, int nXExtent, int nYExtent, int not_used);
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetViewportExtEx(IntPtr hdc, int nXExtent, int nYExtent, int not_used);
Disclaimer: I currently work as a developer of the library
Modern versions of AI use PDF as an export format. It's an enhanced form of PDF containing important metadata for Illustrator, but ultimately it is PDF.
Yes, most PDF packages are aimed at extracting bitmaps, as these come in atomic lumps. If your embedded image is vector, then it has been dropped in using a format that most packages will not understand.
Illustrator may have used its own metadata to delimit the image. If this is the case, it will be difficult to extract. However, it may have used a PDF analog like the Form XObject. If I were designing Illustrator, I would probably do both.
So it is probably possible to extract, though perhaps a little tricky. It is impossible to say more without seeing the document.
If you would like to mail your Illustrator file to us at ABCpdf, we will certainly see what we can suggest. :-)
I am looking to convert PDF files into images. Docnet is able to convert the PDF into a byte[], and its samples show how to save this byte[] to an image file using Bitmap. See the Docnet documentation.
However, that solution won't work on a Linux machine, since Bitmap requires a few libraries to be pre-installed on the system.
I've tried ImageSharp to convert the byte[] using SixLabors.ImageSharp.Image.Load<Bgra32>(rawBytes); however, it throws an unhandled exception: SixLabors.ImageSharp.InvalidImageContentException: PNG Image does not contain a data chunk.
Does anyone know of an alternative way to achieve this?
PS - I'm open to exploring any other free, supported, cross-platform alternatives for converting PDF files to images.
This works fine with ImageSharp: if Docnet works for you, then ImageSharp will too.
The trick is to use the Image.LoadPixelData<Bgra32>(rawBytes, width, height); API, not the Image.Load<Bgra32>(encodedBytes); one.
using Docnet.Core;
using Docnet.Core.Models;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using var docReader = DocLib.Instance.GetDocReader(
"wikipedia_0.pdf",
new PageDimensions(1080, 1920));
using var pageReader = docReader.GetPageReader(0);
var rawBytes = pageReader.GetImage();
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
// This is the important line: here you are passing a byte array that
// represents the pixels directly, whereas Image.Load<Bgra32>()
// expects an encoded image in PNG, JPEG, etc. format.
using var img = Image.LoadPixelData<Bgra32>(rawBytes, width, height);
// you are likely going to want this as well otherwise you might end up with transparent parts.
img.Mutate(x => x.BackgroundColor(Color.White));
img.Save("wikipedia_0.png");
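To make the raw-versus-encoded distinction concrete, here is a small sketch that does not need Docnet at all. The 2x2 pixel buffer is made up for illustration; it stands in for what pageReader.GetImage() returns, assuming 4 bytes per pixel in BGRA order.

```csharp
using System.IO;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

// Hypothetical 2x2 image: raw BGRA pixels, 4 bytes per pixel, no file header.
int width = 2, height = 2;
byte[] rawBytes = new byte[width * height * 4];
for (int i = 0; i < rawBytes.Length; i += 4)
{
    rawBytes[i] = 255;     // blue channel
    rawBytes[i + 3] = 255; // alpha channel (fully opaque)
}

// Raw pixels carry no dimensions, so width and height must be supplied.
using var fromRaw = Image.LoadPixelData<Bgra32>(rawBytes, width, height);

// Encoding to PNG produces a self-describing byte array (header + chunks).
using var pngStream = new MemoryStream();
fromRaw.SaveAsPng(pngStream);
byte[] encodedBytes = pngStream.ToArray();

// Only encoded bytes like these are valid input for Image.Load.
using var fromEncoded = Image.Load<Bgra32>(encodedBytes);
```

Saving into a MemoryStream this way is also an option if the PNG needs to be returned from a service rather than written to disk.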
My goal is to add a company logo to every page of an existing PDF (not as a watermark).
Due to the specifics of the PDF file and the logo, I can only place the logo on top of the PDF content (not underneath), and the logo has to support transparency.
One more limitation is I have to use .NET Core.
Posting this with an answer, because I could not find a clear solution.
Suggestions/corrections/improvements are welcome.
Hope someone finds this useful.
The newest iTextSharp library to support .NET Core is iText7; however, I cannot use it legitimately: neither making my code open source nor purchasing the licence is an option for me. Therefore I use an old, third-party library:
Install-Package iTextSharp.LGPLv2.Core
The latest version at the time of this post, and the one I'm using, is 1.3.2.
The following usings are required:
using System;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
To achieve image transparency in the PDF, the image has to be opened in the correct format:
var preImage = System.Drawing.Image.FromFile(imagePath);
var image = Image.GetInstance(preImage, ImageFormat.Png);
When adding the image, it is also important not to add it as an inline image:
canvas.AddImage(image);//do not put .AddImage(image, true);
Here is the complete code:
var imagePath = "logo.png";
var pdfPath = "edit_this.pdf";
//load pdf file
var pdfBytes = File.ReadAllBytes(pdfPath);
var oldFile = new PdfReader(pdfBytes);
//load image
var preImage = System.Drawing.Image.FromFile(imagePath);
var image = Image.GetInstance(preImage, ImageFormat.Png);
preImage.Dispose();
//optional: if image is wider than the page, scale down the image to fit the page
var sizeWithRotation = oldFile.GetPageSizeWithRotation(1);
if (image.Width > sizeWithRotation.Width)
image.ScalePercent(sizeWithRotation.Width / image.Width * 100);
//set image position in top left corner
//in pdf files, coordinates start in the bottom left corner
image.SetAbsolutePosition(0, sizeWithRotation.Height - image.ScaledHeight);
//in production, I use MemoryStream
//I put FileStream here to test the code in console application
using (var newFileStream = new FileStream("with_logo.pdf", FileMode.Create))
{
//setup PdfStamper
var stamper = new PdfStamper(oldFile, newFileStream);
//iterate through the pages in the original file
for (var i = 1; i <= oldFile.NumberOfPages; i++)
{
//get canvas for current page
var canvas = stamper.GetOverContent(i);
//add image with pre-set position and size
canvas.AddImage(image);
}
stamper.Close();
}
This code works with local files.
In my (real-world) case, I receive PDF files as a Base64 string, add a logo from local storage, convert the result back to a Base64 string, and output it on a web page.
I open the image as PNG explicitly (hardcoded) because I control which extension the logo has. If necessary, you can set the image format dynamically.
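The Base64 round trip described above could look roughly like the following sketch. It reuses only the calls already shown in the code (PdfReader, PdfStamper, GetOverContent, AddImage); the class and method names are hypothetical, and the image is assumed to be already scaled and positioned as in the full listing.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class LogoStamper // hypothetical helper name
{
    // Takes a Base64-encoded PDF and returns a Base64-encoded copy with the
    // logo drawn over every page. `image` is assumed to be prepared already
    // (format, scale, absolute position) as in the code above.
    public static string AddLogo(string pdfBase64, Image image)
    {
        var reader = new PdfReader(Convert.FromBase64String(pdfBase64));
        using (var output = new MemoryStream())
        {
            var stamper = new PdfStamper(reader, output);
            for (var i = 1; i <= reader.NumberOfPages; i++)
            {
                // Over content is drawn on top of the existing page content.
                stamper.GetOverContent(i).AddImage(image);
            }
            stamper.Close(); // writes the finished PDF into the stream
            reader.Close();

            // ToArray still works after the stamper has closed the stream.
            return Convert.ToBase64String(output.ToArray());
        }
    }
}
```

Using ToArray rather than GetBuffer matters here, because GetBuffer can return unused trailing bytes that would corrupt the Base64 output.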
I have an array of image bytes and I would like to set its resolution. The original image can be JPEG, PNG, or BMP; the output is PNG. I am using ImageMagick to convert the image and do some manipulations.
using (var image = this.Convert(originalImage, height, width))
using (var stream = new MemoryStream())
{
image.Quality = 90;
image.Write(stream, MagickFormat.Png);
return stream.ToArray(); // GetBuffer() may include unused trailing bytes
}
I tried to modify image.GetExifProfile, but had no success (at least for PNG images).
I can't use any command-line tool (like ImageMagick or ExifTool) here.
There are three EXIF tags I need to modify:
XResolution
YResolution
ResolutionUnit
I can achieve this successfully with Bitmap, but it adds resource overhead (I need to create a MemoryStream, and so on).
I have found a PDF specification, but making it all work would take time.
Can anyone point me in the right direction?
Thanks.
I convert a PDF file to a BitmapImage in C#. After I manipulate it (resize, rotate), I want to save it to a new PNG or JPEG file, but I haven't found out how to do that. I'm developing a Windows Store app in C#.
According to this blog post: Save XAML as PNG in a Windows Store App
You should be able to do this using the BitmapEncoder class. The BitmapSource.CopyPixels method will give you the pixel data that BitmapEncoder requires.
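As a rough sketch of the encoding step only, assuming the pixel data has already been copied into a byte array in BGRA order (how you obtain it depends on your bitmap type), the WinRT BitmapEncoder can be used like this. The helper name and the 96 DPI values are assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

static class PngSaver // hypothetical helper name
{
    // Encodes raw BGRA pixels as a PNG into the given file.
    public static async Task SavePngAsync(
        byte[] bgraPixels, uint width, uint height, StorageFile file)
    {
        using (IRandomAccessStream stream =
            await file.OpenAsync(FileAccessMode.ReadWrite))
        {
            // Use BitmapEncoder.JpegEncoderId instead to write a JPEG.
            BitmapEncoder encoder = await BitmapEncoder.CreateAsync(
                BitmapEncoder.PngEncoderId, stream);

            encoder.SetPixelData(
                BitmapPixelFormat.Bgra8,
                BitmapAlphaMode.Premultiplied,
                width, height,
                96, 96,          // assumed DPI; adjust as needed
                bgraPixels);

            await encoder.FlushAsync(); // commits the encoded image
        }
    }
}
```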
Use LibPdf for PDF-to-image conversion.
This library converts a PDF file to an image. The supported image formats are PNG and BMP, but you can easily add more.
Usage example:
using (FileStream file = File.OpenRead(@"..\path\to\pdf\file.pdf")) // in file
{
var bytes = new byte[file.Length];
file.Read(bytes, 0, bytes.Length);
using (var pdf = new LibPdf(bytes))
{
byte[] pngBytes = pdf.GetImage(0,ImageType.PNG); // image type
using (var outFile = File.Create(@"..\path\to\pdf\file.png")) // out file
{
outFile.Write(pngBytes, 0, pngBytes.Length);
}
}
}
You should also look at ImageMagick, a freely available and powerful tool. It's capable of doing what you want and also provides some .NET bindings (as well as bindings for several other languages).
string imgfile = @"C:\users\me\desktop\test.jpg";
Bitmap bmp = new Bitmap(imgfile);
Bitmap bw = ConvertTo1Bpp(bmp); //make b+w
Document doc =
new Document(new iTextSharp.text.Rectangle(bmp.Width, bmp.Height));
PdfWriter.GetInstance(doc,
new System.IO.FileStream(
@"C:\users\me\desktop\test.pdf",
System.IO.FileMode.Create,
System.IO.FileAccess.ReadWrite));
iTextSharp.text.ImgJBIG2 i =
((iTextSharp.text.ImgJBIG2)iTextSharp.text.ImgJBIG2.GetInstance(
bmp, System.Drawing.Imaging.ImageFormat.Bmp));
doc.Open();
doc.Add(i);
doc.Close();
I can't find any good documentation for this with iTextSharp. What I am trying to do is take a JPEG file and convert it into a PDF, embedded as a black-and-white JBIG2 image. The error I get is an InvalidCastException between "iTextSharp.text.ImageRaw" and "iTextSharp.text.ImageJBig2"... Is there an alternative to what I have above?
EDIT
I now believe ImgJBig2 just represents an image that is already encoded in JBIG2. What I am looking for is something that will take a Bitmap and encode it into a black-and-white JBIG2 bitmap that I can put into a PDF.
As far as I can tell, there are not many options for encoding JBIG2, and no free native .NET ones.
Windows Imaging SDK ($2500+)
JBIG2 Compression Codec SDK for .NET ($1,800)
jbig2enc (free, but C code only)
The last one builds a CLI program that you might be able to P/Invoke into or, at worst, script against, so I think that's your best option if you don't want to pay. Does JBIG2 really offer that much better compression than other formats?