I want to export images from a PDF file using the PdfSharp library.
I found code on the web that exports images from a PDF. If an image is encoded with DCTDecode, it works without any problems, but if the image is encoded with FlateDecode, I am not able to export it.
This is the code:
static void Main(string[] args)
{
// Extract images from the PDF
const string filename = "d://eresult.pdf";
PdfDocument document = PdfReader.Open(filename);
int imageCount = 0;
// Iterate pages
foreach (PdfPage page in document.Pages)
{
// Get resources dictionary
PdfDictionary resources = page.Elements.GetDictionary("/Resources");
if (resources != null)
{
// Get external objects dictionary
PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
if (xObjects != null)
{
ICollection<PdfItem> items = xObjects.Elements.Values;
// Iterate references to external objects
foreach (PdfItem item in items)
{
PdfReference reference = item as PdfReference;
if (reference != null)
{
PdfDictionary xObject = reference.Value as PdfDictionary;
// Is external object an image?
if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
{
ExportImage(xObject, ref imageCount);
}
}
}
}
}
}
System.Diagnostics.Debug.Write(imageCount + " images exported.", "Export Images");
}
static void ExportImage(PdfDictionary image, ref int count)
{
string filter = image.Elements.GetName("/Filter");
switch (filter)
{
case "/DCTDecode":
ExportJpegImage(image, ref count);
break;
case "/FlateDecode":
ExportAsPngImage(image, ref count);
break;
}
}
static void ExportJpegImage(PdfDictionary image, ref int count)
{
// Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
byte[] stream = image.Stream.Value;
FileStream fs = new FileStream(String.Format("Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(stream);
bw.Close();
}
static void ExportAsPngImage(PdfDictionary image, ref int count)
{
int width = image.Elements.GetInteger(PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfImage.Keys.Height);
int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
byte[] decodedBytes = flate.Decode(image.Stream.Value);
System.Drawing.Imaging.PixelFormat pixelFormat;
switch (bitsPerComponent)
{
case 1:
pixelFormat = PixelFormat.Format1bppIndexed;
break;
case 8:
pixelFormat = PixelFormat.Format8bppIndexed;
break;
case 24:
pixelFormat = PixelFormat.Format24bppRgb;
break;
default:
throw new Exception("Unknown pixel format " + bitsPerComponent);
}
Bitmap bmp = new Bitmap(width, height, pixelFormat);
var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
for (int i = 0; i < height; i++)
{
int offset = i * length;
int scanOffset = i * bmpData.Stride;
Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
}
bmp.UnlockBits(bmpData);
using (FileStream fs = new FileStream(@"D:\\" + String.Format("Image{0}.png", count++), FileMode.Create, FileAccess.Write))
{
bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
}
// TODO: You can put the code here that converts from PDF internal image format to a Windows bitmap
// and use GDI+ to save it in PNG format.
// It is the work of a day or two for the most important formats. Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
// We don't need that feature at the moment and therefore will not implement it.
// If you write the code for exporting images I would be pleased to publish it in a future release
// of PDFsharp.
}
With this code the image is exported, but its colors are very different from those of the image in the PDF file.
Image data and color palette are different objects in the PDF file. Images can have masks and these would be different objects, too.
When saving image data to a PNG file, you also may have to get the color palette and include the color data in the PNG file.
Maybe the code shown on the PDFsharp forum works better than your code:
http://forum.pdfsharp.net/viewtopic.php?p=6755#p6755
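To illustrate the point about the palette being a separate object, here is a minimal sketch of how 8-bit palette indices could be expanded to RGB once both byte arrays have been read from the PDF. It assumes the palette is a flat run of R, G, B triples (as in an /Indexed /DeviceRGB color space) and that each pixel is one byte; the class and method names are hypothetical and not part of PDFsharp.

```csharp
using System;

static class IndexedImageSketch
{
    // Expands 8-bit palette indices into 24-bit RGB bytes.
    // 'palette' holds consecutive R, G, B triples (assumption: /Indexed /DeviceRGB).
    public static byte[] ExpandIndexedToRgb(byte[] indices, byte[] palette)
    {
        if (palette.Length % 3 != 0)
            throw new ArgumentException("Palette length must be a multiple of 3.");

        int colorCount = palette.Length / 3;
        byte[] rgb = new byte[indices.Length * 3];

        for (int i = 0; i < indices.Length; i++)
        {
            int index = indices[i];
            if (index >= colorCount)
                throw new ArgumentException("Pixel index " + index + " is outside the palette.");

            // Copy the three color components of the palette entry.
            rgb[i * 3] = palette[index * 3];
            rgb[i * 3 + 1] = palette[index * 3 + 1];
            rgb[i * 3 + 2] = palette[index * 3 + 2];
        }
        return rgb;
    }

    static void Main()
    {
        byte[] palette = { 255, 0, 0, 0, 0, 255 }; // two entries: red, blue
        byte[] pixels = { 0, 1, 1, 0 };
        Console.WriteLine(string.Join(",", ExpandIndexedToRgb(pixels, palette)));
    }
}
```

The resulting bytes are in RGB order and tightly packed, so they would still need channel reordering and row padding before being copied into a GDI+ bitmap.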
I've seen a ton of stackoverflow articles for reducing image size, but none of them maintain the original image type (or so I've found). They usually have steps to reduce pixel dimensions, reduce image quality, and convert to a specific type of image (usually jpeg).
I have a group of images that I need to resize. They have various image types, and the filenames are all stored in a database, which makes converting from one image type to another somewhat problematic. I can't just change the filename from png to jpg because then the database won't point at a real file.
Does anyone have an example of how to resize or reduce images to 256 kilobytes while maintaining the original image type?
For example, here is the code I'm currently fiddling with.
public static byte[] ResizeImageFile(Image oldImage, int targetSize) // Set targetSize to 1024
{
Size newSize = CalculateDimensions(oldImage.Size, targetSize);
using (Bitmap newImage = new Bitmap(newSize.Width, newSize.Height, PixelFormat.Format24bppRgb))
{
using (Graphics canvas = Graphics.FromImage(newImage))
{
canvas.SmoothingMode = SmoothingMode.AntiAlias;
canvas.InterpolationMode = InterpolationMode.HighQualityBicubic;
canvas.PixelOffsetMode = PixelOffsetMode.HighQuality;
canvas.DrawImage(oldImage, new Rectangle(new Point(0, 0), newSize));
MemoryStream m = new MemoryStream();
newImage.Save(m, ImageFormat.Jpeg);
return m.GetBuffer();
}
}
}
Maybe there is a way to get the file info or MIME type first and then switch on it when calling .Save to pick the image type?
Here is what I came up with (based on some examples I found online that weren't 100% complete).
private void EnsureImageRequirements(string filePath)
{
try
{
if (File.Exists(filePath))
{
// If images are larger than 300 kilobytes
FileInfo fInfo = new FileInfo(filePath);
if (fInfo.Length > 300000)
{
Image oldImage = Image.FromFile(filePath);
ImageFormat originalFormat = oldImage.RawFormat;
// manipulate the image / Resize
Image tempImage = RefactorImage(oldImage, 1200);
// Dispose before deleting the file
oldImage.Dispose();
// Delete the existing file and copy the image to it
File.Delete(filePath);
// Ensure encoding quality is set to an acceptable level
ImageCodecInfo[] encoders = ImageCodecInfo.GetImageEncoders();
// Set encoder quality to 50 (on a scale of 0 to 100)
EncoderParameters eps = new EncoderParameters
{
Param = { [0] = new EncoderParameter(Encoder.Quality, 50L) }
};
ImageCodecInfo ici = (from codec in encoders where codec.FormatID == originalFormat.Guid select codec).FirstOrDefault();
// Save the reformatted image and use original file format (jpeg / png / etc) and encoding
tempImage.Save(filePath, ici, eps);
// Clean up RAM
tempImage.Dispose();
}
}
}
catch (Exception ex)
{
this._logger.Error("Could not resize oversized image " + filePath, ex);
}
}
private static Image RefactorImage(Image imgToResize, int maxPixels)
{
int sourceWidth = imgToResize.Width;
int sourceHeight = imgToResize.Height;
int destWidth = sourceWidth;
int destHeight = sourceHeight;
// Resize if needed
if (sourceWidth > maxPixels || sourceHeight > maxPixels)
{
float thePercent = 0;
float thePercentW = 0;
float thePercentH = 0;
thePercentW = maxPixels / (float) sourceWidth;
thePercentH = maxPixels / (float) sourceHeight;
if (thePercentH < thePercentW)
{
thePercent = thePercentH;
}
else
{
thePercent = thePercentW;
}
destWidth = (int)(sourceWidth * thePercent);
destHeight = (int)(sourceHeight * thePercent);
}
Bitmap tmpImage = new Bitmap(destWidth, destHeight, PixelFormat.Format24bppRgb);
Graphics g = Graphics.FromImage(tmpImage);
g.InterpolationMode = InterpolationMode.HighQualityBilinear;
g.DrawImage(imgToResize, 0, 0, destWidth, destHeight);
g.Dispose();
return tmpImage;
}
I am converting a number of JPEG or PNG images to PDF using the iTextSharp DLL. The conversion works, but the size of the PDF is a worry: if I convert 9 JPEG images (total size 4.5 MB) into a single PDF, the result is 12.3 MB. Below is the conversion code.
private bool CreatePdf(string stFilePath_in, List<ImageData> lstImageData_in, string doctype, string stproCompid)
{
bool flag = false;
StringBuilder builder = new StringBuilder();
try
{
this.UtilityProgress(lstImageData_in.Count);
builder.Append(stFilePath_in);
builder.Append(@"\");
builder.Append(lstImageData_in[0].Barcode);
builder.Append(".pdf");
Document document = new Document(PageSize.LETTER, 10f, 10f, 42f, 35f);
PdfWriter.GetInstance(document, new FileStream(builder.ToString(), FileMode.OpenOrCreate));
document.Open();
IOrderedEnumerable<ImageData> enumerable = from files in lstImageData_in
orderby files.PageNo
select files;
if (enumerable != null)
{
DbFileData data2;
foreach (ImageData data in enumerable)
{
Bitmap bitmap = new Bitmap(data.FilePath);
iTextSharp.text.Image instance = iTextSharp.text.Image.GetInstance(bitmap, ImageFormat.Png);
if (instance.Height > instance.Width)
{
float num = 0f;
num = 700f / instance.Height;
instance.ScalePercent(num * 100f);
}
else
{
float num2 = 0f;
num2 = 540f / instance.Width;
instance.ScalePercent(num2 * 100f);
}
instance.Border = 15;
instance.BorderColor = BaseColor.BLACK;
instance.BorderWidth = 3f;
document.Add(instance);
document.NewPage();
bitmap.Dispose();
}
document.Close();
if (doctype == "AR")
{
//data2.m_stInvoiceNo = lstImageData_in[0].Barcode.Substring(2);
data2.m_stInvoiceNo = lstImageData_in[0].Barcode.ToString();
data2.m_doctype = "AR";
}
else
{
data2.m_stInvoiceNo = lstImageData_in[0].Barcode.ToString();
data2.m_doctype = "PO";
}
data2.m_stImgLocation = builder.ToString();
string str = DateTime.Now.ToString("MM/dd/yy,hh:mm:ss");
data2.m_dtDate = DateTime.Now.Date;
data2.m_stTime = str.Substring(str.IndexOf(",") + 1);
data2.m_stcompid = stproCompid;
this.OnPdfFileCreationCompleted(data2);
return true;
}
flag = false;
}
catch (Exception exception)
{
flag = false;
StringBuilder builder2 = new StringBuilder();
builder2.Append(builder.ToString());
builder2.Append(": \t");
builder2.Append(exception.Message);
this.m_excepLogger.LogException(builder2.ToString());
}
return flag;
}
The OP creates the iTextSharp Image object like this:
Bitmap bitmap = new Bitmap(data.FilePath);
iTextSharp.text.Image instance = iTextSharp.text.Image.GetInstance(bitmap, ImageFormat.Png);
What this actually means is that the original image file is decoded into a bitmap, and then iTextSharp is asked to use the bitmap as if it was a PNG image.
In case of JPG images this usually means that the amount of data required to store the image explodes.
To prevent such size explosions one should allow iTextSharp to directly work with the original image file data, in the context at hand:
iTextSharp.text.Image instance = iTextSharp.text.Image.GetInstance(data.FilePath);
I want to read a DICOM image using SimpleITK, convert it into a bitmap, and then display the result in a PictureBox. But when I try to do this, an ArgumentException is thrown. How can I solve this?
Here is my code:
OpenFileDialog dialog = new OpenFileDialog();
dialog.Title = "Open";
dialog.Filter = "DICOM Files (*.dcm;*.dic)|*.dcm;*.dic|All Files (*.*)|*.*";
dialog.ShowDialog();
if (dialog.FileName != "")
{
using (sitk.ImageFileReader reader = new sitk.ImageFileReader())
{
reader.SetFileName(dialog.FileName);
reader.SetOutputPixelType(sitk.PixelIDValueEnum.sitkFloat32);
sitk.Image image = reader.Execute();
var castedImage = sitk.SimpleITK.Cast(image,
sitk.PixelIDValueEnum.sitkFloat32);
var size = castedImage.GetSize();
int length = size.Aggregate(1, (current, i) => current * (int)i);
IntPtr buffer = castedImage.GetBufferAsFloat();
// Declare an array to hold the bytes of the bitmap.
byte[] rgbValues = new byte[length];
// Copy the RGB values into the array.
Marshal.Copy(buffer, rgbValues, 0, length);
Stream stream = new MemoryStream(rgbValues);
Bitmap newBitmap = new Bitmap(stream);
//I have tried in this way, but it generated ArgumentException too
//Bitmap newBitmap = new Bitmap((int)image.GetWidth(), (int)image.GetHeight(), (int)image.GetDepth(), PixelFormat.Format8bppIndexed, buffer);
Obraz.pic.Image = newBitmap;
}
}
Thank you for your comments and attempts to help. After some consultation and my own searching on the internet, I solved this issue. The first problem was the pixel representation of the image: I had to change Float32 to UInt8 to get eight bits per pixel.
var castedImage = sitk.SimpleITK.Cast(image2, sitk.PixelIDValueEnum.sitkUInt8);
Then I could create a Bitmap using the constructor that was commented out in the question, but with (int)image.GetWidth() instead of (int)image.GetDepth() as the stride argument.
Bitmap newBitmap = new Bitmap((int)image.GetWidth(), (int)image.GetHeight(), (int)image.GetWidth(), PixelFormat.Format8bppIndexed, buffer);
Unfortunately, a new problem appeared: the image, which was supposed to be grayscale, was displayed in strange colors. The solution I found was to replace the bitmap's palette with a grayscale one:
ColorPalette pal = newBitmap.Palette;
for (int i = 0; i <= 255; i++)
{
// create greyscale color table
pal.Entries[i] = Color.FromArgb(i, i, i);
}
newBitmap.Palette = pal; // you need to re-set this property to force the new ColorPalette
I'm saving a bitmap to a file on my hard drive inside a loop (all the JPEG files within a directory are being saved to a database). The save works fine on the first pass through the loop, but then gives the subject error on the second pass. I thought perhaps the file was getting locked, so I tried generating a unique file name for each pass, and I'm also calling Dispose() on the bitmap after the file gets saved. Any idea what is causing this error?
Here is my code:
private string fileReducedDimName = @"c:\temp\Photos\test\filePhotoRedDim";
...
foreach (string file in files)
{
int i = 0;
//if the file dimensions are big, scale the file down
Stream photoStream = File.OpenRead(file);
byte[] photoByte = new byte[photoStream.Length];
photoStream.Read(photoByte, 0, System.Convert.ToInt32(photoByte.Length));
Image image = Image.FromStream(new MemoryStream(photoByte));
Bitmap bm = ScaleImage(image);
bm.Save(fileReducedDimName + i.ToString() + ".jpg", ImageFormat.Jpeg);//error occurs here
Array.Clear(photoByte,0, photoByte.Length);
bm.Dispose();
i ++;
}
...
Thanks
Here's the scale image code: (this seems to be working ok)
protected Bitmap ScaleImage(System.Drawing.Image Image)
{
//reduce dimensions of image if appropriate
int destWidth;
int destHeight;
int sourceRes;//resolution of image
int maxDimPix;//largest dimension of image pixels
int maxDimInch;//largest dimension of image inches
Double redFactor;//factor to reduce dimensions by
if (Image.Width > Image.Height)
{
maxDimPix = Image.Width;
}
else
{
maxDimPix = Image.Height;
}
sourceRes = Convert.ToInt32(Image.HorizontalResolution);
maxDimInch = Convert.ToInt32(maxDimPix / sourceRes);
//Assign size red factor based on max dimension of image (inches)
if (maxDimInch >= 17)
{
redFactor = 0.45;
}
else if (maxDimInch < 17 && maxDimInch >= 11)
{
redFactor = 0.65;
}
else if (maxDimInch < 11 && maxDimInch >= 8)
{
redFactor = 0.85;
}
else//smaller than 8" dont reduce dimensions
{
redFactor = 1;
}
destWidth = Convert.ToInt32(Image.Width * redFactor);
destHeight = Convert.ToInt32(Image.Height * redFactor);
Bitmap bm = new Bitmap(destWidth, destHeight,
PixelFormat.Format24bppRgb);
bm.SetResolution(Image.HorizontalResolution, Image.VerticalResolution);
Graphics grPhoto = Graphics.FromImage(bm);
grPhoto.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
grPhoto.DrawImage(Image,
new Rectangle(0, 0, destWidth, destHeight),
new Rectangle(0, 0, Image.Width, Image.Height),
GraphicsUnit.Pixel);
grPhoto.Dispose();
return bm;
}
If I'm reading the code right, your i variable is zero every time through the loop.
It is hard to diagnose exactly what is wrong. I would recommend using statements to ensure that your instances are disposed of properly, although it looks like they already are.
I originally thought it might be an issue with ScaleImage, so I tried a different resize function (C# GDI+ Image Resize Function) and it worked, but i was still reset to zero at the beginning of each pass. Once you move the initialization of i outside the loop, your scale method works as well.
private void MethodName()
{
string fileReducedDimName = @"c:\pics";
int i = 0;
foreach (string file in Directory.GetFiles(fileReducedDimName, "*.jpg"))
{
//if the file dimensions are big, scale the file down
using (Image image = Image.FromFile(file))
{
using (Bitmap bm = ScaleImage(image))
{
bm.Save(fileReducedDimName + @"\" + i.ToString() + ".jpg", ImageFormat.Jpeg);//error occurs here
//this is all redundant code - do not need
//Array.Clear(photoByte, 0, photoByte.Length);
//bm.Dispose();
}
}
//ResizeImage(file, 50, 50, fileReducedDimName + @"\" + i.ToString()+".jpg");
i++;
}
}
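The effect of the counter's position can be shown without any image code. The sketch below is only an illustration with hypothetical names: it builds the output file names the loop would use, once with the counter reset on every pass (as in the question) and once with it declared before the loop.

```csharp
using System;
using System.Collections.Generic;

static class OutputNameSketch
{
    // Builds the file name used on each pass of the loop.
    // resetCounterEachPass = true mimics "int i = 0;" inside the loop body.
    public static List<string> BuildNames(string prefix, int fileCount, bool resetCounterEachPass)
    {
        var names = new List<string>();
        int i = 0;
        for (int n = 0; n < fileCount; n++)
        {
            if (resetCounterEachPass)
                i = 0;

            names.Add(prefix + i + ".jpg");
            i++;
        }
        return names;
    }

    static void Main()
    {
        // Every pass targets the same file when the counter is reset.
        Console.WriteLine(string.Join(" ", BuildNames("filePhotoRedDim", 3, true)));
        // Each pass gets its own file when the counter lives outside the loop.
        Console.WriteLine(string.Join(" ", BuildNames("filePhotoRedDim", 3, false)));
    }
}
```

With the counter reset, every save targets the same path, which is consistent with the first pass succeeding and the second one failing.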
How do I extract images that are FlateDecode-encoded (such as PNG) out of a PDF document with PDFsharp?
I found this comment in a PDFsharp sample:
// TODO: You can put the code here that converts from PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
Does anyone have a solution for this problem?
Thanks for your replies.
EDIT: Because I'm not able to answer my own question within 8 hours, I'm doing it this way:
Thanks for your very fast reply.
I added some code to the method "ExportAsPngImage", but I didn't get the results I wanted. It just extracts a few more images (PNG), and they don't have the right colors and are distorted.
Here's my current code:
PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
byte[] decodedBytes = flate.Decode(bytes);
System.Drawing.Imaging.PixelFormat pixelFormat;
switch (bitsPerComponent)
{
case 1:
pixelFormat = PixelFormat.Format1bppIndexed;
break;
case 8:
pixelFormat = PixelFormat.Format8bppIndexed;
break;
case 24:
pixelFormat = PixelFormat.Format24bppRgb;
break;
default:
throw new Exception("Unknown pixel format " + bitsPerComponent);
}
Bitmap bmp = new Bitmap(width, height, pixelFormat);
var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
for (int i = 0; i < height; i++)
{
int offset = i * length;
int scanOffset = i * bmpData.Stride;
Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
}
bmp.UnlockBits(bmpData);
using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
{
bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
}
Is that the right way? Or should I choose another way? Thanks a lot!
I know this answer might be a few years too late, but maybe it will help others.
The distortion occurs in my case because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) does not seem to return the correct value. As Vive la déraison pointed out under your question, the bitmap you fill with Marshal.Copy uses BGR byte order. So reversing the bytes before Marshal.Copy and rotating the bitmap afterwards will do the job.
The resulting code looks like this:
private static void ExportAsPngImage(PdfDictionary image, ref int count)
{
int width = image.Elements.GetInteger(PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfImage.Keys.Height);
var canUnfilter = image.Stream.TryUnfilter();
byte[] decodedBytes;
if (canUnfilter)
{
decodedBytes = image.Stream.Value;
}
else
{
PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
decodedBytes = flate.Decode(image.Stream.Value);
}
int bitsPerComponent = 0;
while (decodedBytes.Length - ((width * height) * bitsPerComponent / 8) != 0)
{
bitsPerComponent++;
}
System.Drawing.Imaging.PixelFormat pixelFormat;
switch (bitsPerComponent)
{
case 1:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
break;
case 8:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
break;
case 16:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
break;
case 24:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
break;
case 32:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
break;
case 64:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
break;
default:
throw new Exception("Unknown pixel format " + bitsPerComponent);
}
decodedBytes = decodedBytes.Reverse().ToArray();
Bitmap bmp = new Bitmap(width, height, pixelFormat);
BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
int length = (int)Math.Ceiling(width * (bitsPerComponent / 8.0));
for (int i = 0; i < height; i++)
{
int offset = i * length;
int scanOffset = i * bmpData.Stride;
Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
}
bmp.UnlockBits(bmpData);
bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
}
The code might need some optimisation, but it did export FlateDecoded Images correctly in my case.
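As far as I can tell, reversing the whole buffer and then rotating the bitmap by 180 degrees has the net effect of reversing the byte order inside each pixel, at least for tightly packed 24-bit data. A possible alternative, sketched here under that assumption and with hypothetical names, is to swap the first and third byte of each pixel directly:

```csharp
using System;

static class ChannelSwapSketch
{
    // Swaps the first and third byte of every 3-byte pixel in place,
    // turning RGB into BGR (or back).
    // Assumption: tightly packed 24-bit pixels with no row padding.
    public static void SwapRedAndBlue(byte[] pixels)
    {
        if (pixels.Length % 3 != 0)
            throw new ArgumentException("Buffer length must be a multiple of 3.");

        for (int i = 0; i < pixels.Length; i += 3)
        {
            byte tmp = pixels[i];
            pixels[i] = pixels[i + 2];
            pixels[i + 2] = tmp;
        }
    }

    static void Main()
    {
        byte[] pixels = { 1, 2, 3, 4, 5, 6 }; // two pixels: (1,2,3) and (4,5,6)
        SwapRedAndBlue(pixels);
        Console.WriteLine(string.Join(",", pixels));
    }
}
```

This only covers the 24-bit case; indexed and 32-bit images would need their own handling.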
To get a Windows BMP, you just have to create a Bitmap header and then copy the image data into the bitmap. PDF images are byte aligned (every new line starts on a byte boundary) while Windows BMPs are DWORD aligned (every new line starts on a DWORD boundary (a DWORD is 4 bytes for historical reasons)).
All information you need for the Bitmap header can be found in the filter parameters or can be calculated.
The color palette is another FlateEncoded object in the PDF. You also copy that into the BMP.
This must be done for several formats (1 bit per pixel, 8 bpp, 24 bpp, 32 bpp).
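The alignment difference described above can be sketched as follows. This is a minimal illustration with hypothetical names, not PDFsharp code: it computes the row length for byte-aligned and DWORD-aligned data and copies each row into a padded buffer. It does not write a BMP header and does not change the row order.

```csharp
using System;

static class RowAlignmentSketch
{
    // Bytes per row when every row starts on a byte boundary (PDF image data).
    public static int PdfRowBytes(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Bytes per row when every row starts on a 4-byte (DWORD) boundary (Windows BMP).
    public static int BmpRowBytes(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies byte-aligned rows into a DWORD-aligned buffer; padding bytes stay zero.
    public static byte[] PadRows(byte[] source, int width, int height, int bitsPerPixel)
    {
        int srcStride = PdfRowBytes(width, bitsPerPixel);
        int dstStride = BmpRowBytes(width, bitsPerPixel);
        if (source.Length < srcStride * height)
            throw new ArgumentException("Source buffer is too small for the given dimensions.");

        byte[] dest = new byte[dstStride * height];
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(source, row * srcStride, dest, row * dstStride, srcStride);
        }
        return dest;
    }

    static void Main()
    {
        // A 3-pixel-wide 24 bpp row needs 9 bytes in the PDF but 12 in a BMP.
        Console.WriteLine(PdfRowBytes(3, 24) + " -> " + BmpRowBytes(3, 24));
    }
}
```

When the width happens to make the row length a multiple of four, both values are equal and a single block copy is enough, which may explain why simpler code works for some images and not for others.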
Here's my full code for doing this.
I'm extracting a UPS shipping label from a PDF so I know the format in advance. If your extracted image is of an unknown type then you'll need to check the bitsPerComponent and handle it accordingly. I also only handle the first image here on the first page.
Note: I'm using TryUnfilter to 'deflate' which uses whatever filter is applied and decodes the data in-place for me. No need to call 'Deflate' explicitly.
var file = @"c:\temp\PackageLabels.pdf";
var doc = PdfReader.Open(file);
var page = doc.Pages[0];
{
// Get resources dictionary
PdfDictionary resources = page.Elements.GetDictionary("/Resources");
if (resources != null)
{
// Get external objects dictionary
PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
if (xObjects != null)
{
ICollection<PdfItem> items = xObjects.Elements.Values;
// Iterate references to external objects
foreach (PdfItem item in items)
{
PdfReference reference = item as PdfReference;
if (reference != null)
{
PdfDictionary xObject = reference.Value as PdfDictionary;
// Is external object an image?
if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
{
// do something with your image here
// only the first image is handled here
var bitmap = ExportImage(xObject);
bitmap.Save(@"c:\temp\exported.png", System.Drawing.Imaging.ImageFormat.Png);
}
}
}
}
}
}
Using these helper functions
private static Bitmap ExportImage(PdfDictionary image)
{
string filter = image.Elements.GetName("/Filter");
switch (filter)
{
case "/FlateDecode":
return ExportAsPngImage(image);
default:
throw new ApplicationException(filter + " filter not implemented");
}
}
private static Bitmap ExportAsPngImage(PdfDictionary image)
{
int width = image.Elements.GetInteger(PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfImage.Keys.Height);
int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
var canUnfilter = image.Stream.TryUnfilter();
var decoded = image.Stream.Value;
Bitmap bmp = new Bitmap(width, height, System.Drawing.Imaging.PixelFormat.Format8bppIndexed);
BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
Marshal.Copy(decoded, 0, bmpData.Scan0, decoded.Length);
bmp.UnlockBits(bmpData);
return bmp;
}
This is my code so far. It works with many PNG files, but not with ones that come from Adobe Photoshop with an indexed color space:
private bool ExportAsPngImage(PdfDictionary image, string SaveAsName)
{
int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
var ColorSpace = image.Elements.GetArray(PdfImage.Keys.ColorSpace);
System.Drawing.Imaging.PixelFormat pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb; // 24 bpp just to initialize
if (ColorSpace is null) //no colorspace.. bufferedimage?? is in BGR order instead of RGB so change the byte order. Right now it works
{
byte[] origineel_byte_boundary = image.Stream.UnfilteredValue;
bitsPerComponent = (origineel_byte_boundary.Length) / (width * height);
switch (bitsPerComponent)
{
case 4:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppPArgb;
break;
case 3:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
break;
default:
{
MessageBox.Show("Unknown pixel format " + bitsPerComponent, "Error", MessageBoxButtons.OK, MessageBoxIcon.Warning);
return false;
}
break;
}
Bitmap bmp = new Bitmap(width, height, pixelFormat); //copy raw bytes to "master" bitmap so we are out of pdf format to work with
System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
System.Runtime.InteropServices.Marshal.Copy(origineel_byte_boundary, 0, bmd.Scan0, origineel_byte_boundary.Length);
bmp.UnlockBits(bmd);
Bitmap bmp2 = new Bitmap(width, height, pixelFormat);
for (int indicex = 0; indicex < bmp.Width; indicex++)
{
for (int indicey = 0; indicey < bmp.Height; indicey++)
{
Color nuevocolor = bmp.GetPixel(indicex, indicey);
Color colorintercambiado = Color.FromArgb(nuevocolor.A, nuevocolor.B, nuevocolor.G, nuevocolor.R);
bmp2.SetPixel(indicex, indicey, colorintercambiado);
}
}
using (FileStream fs = new FileStream(SaveAsName, FileMode.Create, FileAccess.Write))
{
bmp2.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
}
bmp2.Dispose();
bmp.Dispose();
}
else
{
// this is the Photoshop case... work needs to be done here. I'm able to get the color palette, but I have no idea how to put it back or create the PNG file...
switch (bitsPerComponent)
{
case 4:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
break;
default:
{
MessageBox.Show("Unknown pixel format " + bitsPerComponent, "Error", MessageBoxButtons.OK, MessageBoxIcon.Warning);
return false;
}
break;
}
if ((ColorSpace.Elements.GetName(0) == "/Indexed") && (ColorSpace.Elements.GetName(1) == "/DeviceRGB"))
{
//we need to create the palette
int paletteColorCount = ColorSpace.Elements.GetInteger(2);
List<System.Drawing.Color> paletteList = new List<Color>();
//Color[] palette = new Color[paletteColorCount+1]; // there is always 1 more color: the /Indexed value is the highest index, not the count
PdfObject paletteObj = ColorSpace.Elements.GetObject(3);
PdfDictionary paletteReference = (PdfDictionary)paletteObj;
byte[] palettevalues = paletteReference.Stream.Value;
for (int index = 0; index < (paletteColorCount + 1); index++)
{
//palette[index] = Color.FromArgb(255, palettevalues[(index*3)], palettevalues[(index*3)+1], palettevalues[(index*3)+2]); // RGB
paletteList.Add(Color.FromArgb(255, palettevalues[(index * 3)], palettevalues[(index * 3) + 1], palettevalues[(index * 3) + 2])); // RGB, fully opaque
}
}
}
return true;
}
A PDF may contain images with masks and with different color space options, which is why simply decoding an image object may not work properly in some cases.
So the code also needs to check for image masks (/ImageMask) and other properties of image objects (to see whether the image uses inverted colors or indexed colors) in order to recreate the image the way it is displayed in the PDF. See Image object, /ImageMask, and /Decode in the official PDF Reference.
Not sure if PDFSharp is capable of finding Image Mask objects inside PDF but iTextSharp is able to access image mask objects (see PdfName.MASK object types).
Commercial tools like PDF Extractor SDK are able to extract images in both original form and in "as rendered" form.
I work for ByteScout, maker of PDF Extractor SDK
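To illustrate the /Decode point, here is a minimal sketch of how a two-entry /Decode array could be applied to 8-bit grayscale samples. It assumes the linear mapping described in the PDF Reference, where [0 1] leaves samples unchanged and [1 0] inverts them; the names are hypothetical and reading the array from the image dictionary is not shown.

```csharp
using System;

static class DecodeArraySketch
{
    // Maps 8-bit samples through a two-entry /Decode array [dMin dMax]
    // and scales the result back to the 0..255 range.
    public static byte[] ApplyDecode(byte[] samples, double dMin, double dMax)
    {
        byte[] result = new byte[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            // Linear interpolation between dMin and dMax (assumed mapping).
            double value = dMin + samples[i] * (dMax - dMin) / 255.0;
            double scaled = Math.Round(value * 255.0);
            result[i] = (byte)Math.Max(0.0, Math.Min(255.0, scaled));
        }
        return result;
    }

    static void Main()
    {
        // A /Decode array of [1 0] inverts the samples.
        byte[] inverted = ApplyDecode(new byte[] { 0, 100, 255 }, 1.0, 0.0);
        Console.WriteLine(string.Join(",", inverted));
    }
}
```

Images with more than one color component carry one pair per component, so a fuller version would apply a separate pair to each channel.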
This may not directly answer the question, but another option for extracting images from a PDF is FreeSpire.PDF, which can extract images from a PDF easily. It is available as a NuGet package: https://www.nuget.org/packages/FreeSpire.PDF/. It handles all the image formats and can export as PNG. Their sample code is:
using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using Spire.Pdf;
namespace ExtractImagesFromPDF
{
class Program
{
static void Main(string[] args)
{
//Instantiate an object of Spire.Pdf.PdfDocument
PdfDocument doc = new PdfDocument();
//Load a PDF file
doc.LoadFromFile("sample.pdf");
List<Image> ListImage = new List<Image>();
for (int i = 0; i < doc.Pages.Count; i++)
{
// Get an object of Spire.Pdf.PdfPageBase
PdfPageBase page = doc.Pages[i];
// Extract images from Spire.Pdf.PdfPageBase
Image[] images = page.ExtractImages();
if (images != null && images.Length > 0)
{
ListImage.AddRange(images);
}
}
if (ListImage.Count > 0)
{
for (int i = 0; i < ListImage.Count; i++)
{
Image image = ListImage[i];
image.Save("image" + (i + 1).ToString() + ".png", System.Drawing.Imaging.ImageFormat.Png);
}
System.Diagnostics.Process.Start("image1.png");
}
}
}
}
(code taken from https://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/How-to-Extract-Image-From-PDF-in-C.html)