Updating existing markup (FreeText Callout) PDF using itext7 .NET - c#

I have a code below to update existing markup (FreeText Callout) PDF using itext7 .NET. It does not appear correctly, but edit it in the bluebeam then it is shown the correct content as this image:
What am I missing?
public void UpdateMarkupCallout()
{
string inPDF = #"C:\in PDF.pdf";
string outPDF = #"C:\out PDF.pdf";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(inPDF), new PdfWriter(outPDF));
int numberOfPages = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= numberOfPages; i++)
{
PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
PdfArray annotArray = page.GetAsArray(PdfName.Annots);
if (annotArray == null)
{
continue;
}
int size = annotArray.Size();
for (int x = 0; x < size; x++)
{
PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
if (curAnnot.GetAsString(PdfName.Contents) != null)
{
string contents = curAnnot.GetAsString(PdfName.Contents).ToString();
if (contents != "" && contents.Contains("old content"))
{
curAnnot.Put(PdfName.Contents, new PdfString("new content"));
}
}
}
}
pdfDoc.Close();
}
The attached files: here

The answer is in Java but conversion to C# should be a matter of some easy letter case replacements and small tweaks.
Unfortunately, there is no silver bullet solution here, at least not without significant effort.
1. Partial proper solution
There are several issues here. First, you are only updating /Contents key, while the annotations you are editing also have /RC key which stands for A rich text string (see Adobe XML Architecture, XML Forms Architecture (XFA) Specification, version 3.3) that shall be used to generate the appearance of the annotation. (ISO 32000).
On top of that, the appearance (/AP entry) must be regenerated. as dictated by the specification. This is not what iText is capable of doing at the moment, so you will have to do it yourself.
You need to determine the area where the text must be drawn, taking /RD, or rect diff entry into account.
To create your appearance you can use pdfHTML add-on which would process the rich text representation from /RC into layout elements that you can transfer to an XObject that you can put into /AP.
With the code similar to the following:
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
PdfArray annotArray = page.getAsArray(PdfName.Annots);
if (annotArray == null) {
continue;
}
int size = annotArray.size();
for (int x = 0; x < size; x++) {
PdfDictionary curAnnot = annotArray.getAsDictionary(x);
if (curAnnot.getAsString(PdfName.Contents) != null) {
String contents = curAnnot.getAsString(PdfName.Contents).toString();
if (!contents.isEmpty() && contents.contains("old content")) //set layer for a FreeText with this content
{
curAnnot.put(PdfName.Contents, new PdfString("new content"));
String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
Document document = Jsoup.parse(richText);
for (Element element : document.select("p")) {
element.html("new content");
}
curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
Rectangle bbox = curAnnot.getAsRectangle(PdfName.Rect);
Rectangle textBbox = bbox.clone();
// left, top, right, bottom
PdfArray rectDiff = curAnnot.getAsArray(PdfName.RD);
if (rectDiff != null) {
textBbox.applyMargins(rectDiff.getAsNumber(1).floatValue(),
rectDiff.getAsNumber(2).floatValue(),
rectDiff.getAsNumber(3).floatValue(),
rectDiff.getAsNumber(0).floatValue(), false);
}
float leftRectDiff = rectDiff != null ? rectDiff.getAsNumber(0).floatValue() : 0;
float topRectDiff = rectDiff != null ? rectDiff.getAsNumber(1).floatValue() : 0;
List<IElement> elements = HtmlConverter.convertToElements(document.body().outerHtml());
PdfFormXObject appearance = new PdfFormXObject(
new Rectangle(0, 0, bbox.getWidth(), bbox.getHeight()));
Canvas canvas = new Canvas(new PdfCanvas(appearance, pdfDocument),
new Rectangle(leftRectDiff, topRectDiff, textBbox.getWidth(), textBbox.getHeight()));
canvas.setProperty(Property.RENDERING_MODE, RenderingMode.HTML_MODE);
for (IElement ele : elements) {
if (ele instanceof IBlockElement) {
canvas.add((IBlockElement) ele);
}
}
curAnnot.getAsDictionary(PdfName.AP).put(PdfName.N, appearance.getPdfObject());
}
}
}
}
pdfDocument.close();
You would get the result that looks like that:
You can see that the new text is displayed as expected, but the overall visual representation is far from our expectations - the background filling, the borders and the arrows are missing. So to generate the appearance properly you would have to further explore other PDF properties such as /CL (arrow descriptors), /BS (border style), /C (background color) etc. This takes quite some time - reading up on the spec, parsing the relevant entries and applying those in your drawing operations. You can get some inspiration from PdfFormField class implementation.
2. Easy solution without any guarantees
In case you expect the text in your annotation to consist of only one line, be plain Latin text and in general the variability of the input documents is small, you can take the current appearance and assume that the text string will be written there in one chunk (it's the case for your input document).
Note that this is a hacky approach which is prone to many potential errors/bugs.
Sample code:
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
PdfArray annotArray = page.getAsArray(PdfName.Annots);
if (annotArray == null) {
continue;
}
int size = annotArray.size();
for (int x = 0; x < size; x++) {
PdfDictionary curAnnot = annotArray.getAsDictionary(x);
if (curAnnot.getAsString(PdfName.Contents) != null) {
String contents = curAnnot.getAsString(PdfName.Contents).toString();
String oldContent = "old content";
if (!contents.isEmpty() && contents.contains(oldContent)) {
String newContent = "new content";
curAnnot.put(PdfName.Contents, new PdfString(newContent));
String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
Document document = Jsoup.parse(richText);
for (Element element : document.select("p")) {
element.html(newContent);
}
curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
PdfStream currentAppearance = curAnnot.getAsDictionary(PdfName.AP).getAsStream(PdfName.N);
String currentBytes = new String(currentAppearance.getBytes(), StandardCharsets.UTF_8);
currentBytes = currentBytes.replace("(" + oldContent + ") Tj", "(" + newContent + ") Tj");
currentAppearance.setData(currentBytes.getBytes(StandardCharsets.UTF_8));
}
}
}
}
pdfDocument.close();
Visual result (as you can see, this is what we want):
3. Non-compliant solution
Another way, which is not compliant with the PDF specification, is to remove /AP entry whatsoever. You can do it in the very same loop with curAnnot.remove(PdfName.AP);. Most major PDF viewers are going to regenerate the appearance themselves. However, my viewer generated the appearance in not the most appealing way:
So as you can see the result will depend on the PDF-viewer and this very well illustrates the reason why PDF specification mandates presence of /AP. Once again, this way is not compliant with the PDF spec .

Related

How to get and set value for /BSIColumnData of an annotation PDF itext c#

How to get and set value for /BSIColumnData of an annotation (markup) in PDF using itext c# as attached file?
I'm using Itext7 code below, but it is error at BSIColumnData:
public void BSIcontents ()
{
string pdfPath = #"C:\test PDF.pdf";
iText.Kernel.Pdf.PdfReader pdfReader = new iText.Kernel.Pdf.PdfReader(pdfPath);
iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfReader);
int numberOfPages = pdfDoc.GetNumberOfPages();
int z = 0;
for (int i = 1; i <= numberOfPages; i++)
{
iText.Kernel.Pdf.PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
iText.Kernel.Pdf.PdfArray annotArray = page.GetAsArray(iText.Kernel.Pdf.PdfName.Annots);
if (annotArray == null)
{
z++;
continue;
}
int size = annotArray.Size();
for (int x = 0; x < size; x++)
{
iText.Kernel.Pdf.PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
if (curAnnot != null)
{
if (curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData) != null)
{
MessageBox.Show("BSIColumnData: " + curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData).ToString());
}
}
}
}
pdfReader.Close();
}
In Bluebeam Revu, you can see as below:
In Itext-rups 5.5.9, you can see as below:
I see two errors:
You try to use the BSIColumnData name like this:
iText.Kernel.Pdf.PdfName.BSIColumnData
This assumes that there is already a static PdfName member for your custom name. But of course there isn't, there only are predefined members for standard names used in itext itself. If you want to work with other names, you have to create a PdfName instance yourself and use that instance, e.g. like this
var BSIColumnData = new iText.Kernel.Pdf.PdfName("BSIColumnData");
You try to retrieve the value of that name as string
curAnnot.GetAsString(iText.Kernel.Pdf.PdfName.BSIColumnData)
but it is clear from your RUPS screenshot that the value of that name is an array of strings. Thus, even after correcting as described in the first item GetAsString(BSIColumnData) will return null. Instead do
var BSIColumnData = new iText.Kernel.Pdf.PdfName("BSIColumnData");
var array = curAnnot.GetAsArray(BSIColumnData);
After checking if (array != null) you now can access the strings at their respective indices using array.GetAsString(index).

How to split a PDF by titles with a specific size or font

I'm trying to split a PDF by his titles depending of his size or font.
Currently I could only extract the fonts of a PDF but I need to know the size or font to see if that text is a title. I don't know how to read the PDF by his titles with an specific size or font, I know it can extract the text but is just that, simple text, how do you know which size or font has that text ?
By the way, I have been able to split the PDF by his bookmarks and also the bookmark's "kids"
but I need to split the PDF more deeper. Is that why I'm trying to get the titles to split the PDF by them.
I made some research but I couldn't get something very useful for this case.
Here some questions:
How do you get the size?
How do you get the font?
How do you iterate the PDF line by line?
How do you check a text(line) if has a specific font?
Some code
HashSet<String> fontNames = new HashSet<string>();
PdfDictionary resources;
for (int p = 1; p <= reader.NumberOfPages; p++)
{
PdfDictionary dic = reader.GetPageN(p);
resources = dic.GetAsDict(PdfName.RESOURCES);
if (resources != null)
{
//get fonts dictionary
PdfDictionary fonts = resources.GetAsDict(PdfName.FONT);
if (fonts != null)
{
PdfDictionary font;
foreach (PdfName key in fonts.Keys)
{
font = fonts.GetAsDict(key);
String name = font.GetAsName(PdfName.BASEFONT).ToString();
fontNames.Add(name);
}
}
}
}
Another way
List<object[]> fonts2 = BaseFont.GetDocumentFonts(reader);
Other code: get text
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
words = currentText.Split('\n');
for (int j = 0, len = words.Length; j < len; j++)
{
line = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(words[j]));
}
I used this:
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
and the renderInfo parameter has all what I needed it.
renderInfo.GetFont().PostscriptFontName; // Font Name
renderInfo.GetBaseline().GetStartPoint(); // Coordinates - (56.6929 , 727.8466, 1)
renderInfo.GetAscentLine().GetEndPoint(); // Coordinates - (96.78018, 737.1749, 1)

Adding a group of annotations from one pdf in all pages of another pdf and create a new output pdf

I have created a reader for Input file and one for the Markup file. I am not sure if I should loop through the annotations and then add them one by one to the output or if there is a way to pull all the annotations from the markup file and add them to the input file retaining their x,z coordinates.
I have the below code, and I am not sure what to do at the commented section. The AddAnnotation method only takes PdfAnnotation as input but I am not sure how to convert the PdfDictionary to PdfAnnotaiton.
class Program
{
public static string inputFile = #"E:\pdf-sample.pdf";
public static string markupFile = #"E:\StampPdf.pdf";
public static string outputFile = #"E:\pdf.pdf";
public static PdfReader inputReader = new PdfReader(inputFile);
public static PdfReader markupReader = new PdfReader(markupFile);
static void Main(string[] args)
{
PdfDocument inputDoc = new PdfDocument(inputReader, new PdfWriter(outputFile));
PdfDocument markupDoc = new PdfDocument(markupReader);
int n = inputDoc.GetNumberOfPages();
for (int i = 1; i <= n; i++)
{
PdfPage page = inputDoc.GetPage(i);
PdfDictionary markupPage = markupDoc.GetFirstPage().GetPdfObject();
PdfArray annots = markupPage.GetAsArray(PdfName.Annots);
if(annots != null)
{
for(int j=0; j < annots.Size(); j++)
{
PdfDictionary annotItem = annots.GetAsDictionary(i);
//******
//page.AddAnnotation(?);
//******
}
}
}
inputDoc.Close();
}
}
I tried another variation after I found new GetAnnotations method in iText7. Here the code runs fine but I am not able to open the O/P file and get an error that the file is corrupted. Also when I ran inputDoc.Close() instead of the last line given below, I got an error “Pdf indirect object belongs to other PDF document. Copy object to current pdf document.”
PdfReader ireader = new PdfReader(inputFile);
PdfDocument inputDoc = new PdfDocument(ireader, new PdfWriter(outputFile));
PdfReader mreader = new PdfReader(markupFile);
PdfDocument markupDoc = new PdfDocument(mreader);
var annots = markupDoc.GetFirstPage().GetAnnotations();
if (annots != null)
{
for (int j = 0; j < annots.Count(); j++)
{
inputDoc.GetFirstPage().AddAnnotation(annots[j]);
}
}
ireader.Close();
mreader.Close();
markupDoc.Close();
inputDoc.SetCloseWriter(true);
Maybe try this :
if (annots != null)
{
for (int j = 0; j < annots.Size(); j++)
{
PdfDictionary annotItem = annots.GetAsDictionary(i);
PdfLineAnnotation lineAnnotation = new PdfLineAnnotation(annotItem);
page.AddAnnotation(lineAnnotation);
}
}
If it doesn't work, here is some documentation (unfortunately in Java)
http://developers.itextpdf.com/examples/actions-and-annotations/clone-creating-and-adding-annotations
If you could post Pdf with annotations you wish to copy - maybe I can debug and try something more.

c# itextsharp, locate words not chunks in page with their location for adding sticky notes

I already read all related StackOverflow and haven't find a decent solution to this. I want to open a PDF, get the text (words) and their coordinates then further, add a sticky note to some of them.
Seems to be mission impossible, I'm stucked.
How come this code will correctly find all words in a page (but not their coordinates)?
using (PdfReader reader = new PdfReader(path))
{
StringBuilder sb = new StringBuilder();
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
for (int page = 5; page <= 5; page++)
{
string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
Console.WriteLine(text);
}
//txt = sb.ToString();
}
But this one gets coordinates, but for "chunks" that cannot rely they are in proper order.
PdfReader reader = new PdfReader(path);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
LocationTextExtractionStrategyEx strategy;
for (int i = 5; i <= 5; i++) // reader.NumberOfPages
{
//strategy = parser.ProcessContent(i, new SimpleTextExtractionStrategy());
// new MyLocationTextExtractionStrategy("sample", System.Globalization.CompareOptions.None)
strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));
foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
{
if (chunk.m_text.Trim() == "MCU_MOSI")
Console.WriteLine("Bingo"); // <-- NEVER HIT
}
//Console.WriteLine(strategy.m_SearchResultsList.ToString()); // strategy.GetResultantText() +
}
This uses a class from this post (little modified by me)
Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp
But only finds useless "chunks".
So the question is can with iTextSharp really locate words in page so I can add some sticky notes nearby? Thank you.
It looks like the chunk.m_text only contains one letter at a time which is why it this will never be true:
if (chunk.m_text.Trim() == "MCU_MOSI")
What you could do instead is have each chunk text added to a string and see if it contains your text.
PdfReader reader = new PdfReader(path);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
LocationTextExtractionStrategyEx strategy;
string str = string.Empty;
for (int i = 5; i <= 5; i++) // reader.NumberOfPages
{
strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));
var x = strategy.m_SearchResultsList;
foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
{
str += chunk.m_text;
if (str.Contains("MCU_MOSI"))
{
str = string.Empty;
Vector location = chunk.m_endLocation;
Console.WriteLine("Bingo");
}
}
}
Note for the example of the location, I made m_endLocation public.

Convert Pdf file pages to Images with itextsharp

I want to convert Pdf pages in Images using ItextSharp lib.
Have any idea how to convert each page in image file
iText/iTextSharp can generate and/or modify existing PDFs but they do not perform any rendering which is what you are looking for. I would recommend checking out Ghostscript or some other library that knows how to actually render a PDF.
you can use ImageMagick convert pdf to image
convert -density 300 "d:\1.pdf" -scale #1500000 "d:\a.jpg"
and split pdf can use itextsharp
here is the code from others.
void SplitePDF(string filepath)
{
iTextSharp.text.pdf.PdfReader reader = null;
int currentPage = 1;
int pageCount = 0;
//string filepath_New = filepath + "\\PDFDestination\\";
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
//byte[] arrayofPassword = encoding.GetBytes(ExistingFilePassword);
reader = new iTextSharp.text.pdf.PdfReader(filepath);
reader.RemoveUnusedObjects();
pageCount = reader.NumberOfPages;
string ext = System.IO.Path.GetExtension(filepath);
for (int i = 1; i <= pageCount; i++)
{
iTextSharp.text.pdf.PdfReader reader1 = new iTextSharp.text.pdf.PdfReader(filepath);
string outfile = filepath.Replace((System.IO.Path.GetFileName(filepath)), (System.IO.Path.GetFileName(filepath).Replace(".pdf", "") + "_" + i.ToString()) + ext);
reader1.RemoveUnusedObjects();
iTextSharp.text.Document doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));
iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
doc.Open();
for (int j = 1; j <= 1; j++)
{
iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader1, currentPage);
pdfCpy.SetFullCompression();
pdfCpy.AddPage(page);
currentPage += 1;
}
doc.Close();
pdfCpy.Close();
reader1.Close();
reader.Close();
}
}
You can use Ghostscript
to convert the PDF files into Images, I used the following parameters to convert the needed PDF into tiff image with multiple frames :
gswin32c.exe -sDEVICE=tiff12nc -dBATCH -r200 -dNOPAUSE -sOutputFile=[Output].tiff [PDF FileName]
Also you can use the -q parameter for silent mode
You can get more information about its output devices from here
After that I can easily load the tiff frames like the following
using (FileStream stream = new FileStream(#"C:\tEMP\image_$i.tiff", FileMode.Open, FileAccess.Read, FileShare.Read))
{
BitmapDecoder dec = BitmapDecoder.Create(stream, BitmapCreateOptions.IgnoreImageCache, BitmapCacheOption.None);
BitmapEncoder enc = BitmapEncoder.Create(dec.CodecInfo.ContainerFormat);
enc.Frames.Add(dec.Frames[frameIndex]);
}
I did it with MuPDFCore NuGet. Here is the link to guide I used : https://giorgiobianchini.com/MuPDFCore/MuPDFCore.pdf
using System;
using System.Threading.Tasks;
using MuPDFCore;
using VectSharp.Raster;
MuPDFContext context = new MuPDFContext();
MuPDFDocument document = new MuPDFDocument(context, #"C:\install\test.pdf");
//Renderers: one per page
MuPDFMultiThreadedPageRenderer[] renderers = new MuPDFMultiThreadedPageRenderer[document.Pages.Count];
//Page size: one per page
RoundedSize[] renderedPageSizes = new RoundedSize[document.Pages.Count];
//Boundaries of the tiles that make up each page: one array per page, with one element per thread
RoundedRectangle[][] tileBounds = new RoundedRectangle[document.Pages.Count][];
//Addresses of the memory areas where the image data of the tiles will be stored: one array per page, with one element per thread
IntPtr[][] destinations = new IntPtr[document.Pages.Count][];
//Cycle through the pages in the document to initialise everything
for (int i = 0; i < document.Pages.Count; i++)
{
//Initialise the renderer for the current page, using two threads (total number of threads: number of pages x 2
renderers[i] = document.GetMultiThreadedRenderer(i, 2);
//Determine the boundaries of the page when it is rendered with a 1.5x zoom factor
RoundedRectangle roundedBounds = document.Pages[i].Bounds.Round(2);//quality ..can use 0.5 ,1 etc.
renderedPageSizes[i] = new RoundedSize(roundedBounds.Width, roundedBounds.Height);
//Determine the boundaries of each tile by splitting the total size of the page by the number of threads.
tileBounds[i] = renderedPageSizes[i].Split(renderers[i].ThreadCount);
destinations[i] = new IntPtr[renderers[i].ThreadCount];
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
//Allocate the required memory for the j-th tile of the i-th page.
//Since we will be rendering with a 24-bit-per-pixel format, the required memory in bytes is height x width x 3.
destinations[i][j] = System.Runtime.InteropServices.Marshal.AllocHGlobal(tileBounds[i][j].Height * tileBounds[i][j].Width * 3);
}
}
//Start the actual rendering operations in parallel.
Parallel.For(0, document.Pages.Count, i =>
{
renderers[i].Render(renderedPageSizes[i], document.Pages[i].Bounds, destinations[i], PixelFormats.RGB);
});
//The code in this for-loop is not really part of MuPDFCore - it just shows an example of using VectSharp to "stitch" the tiles up and produce the full image.
for (int i = 0; i < document.Pages.Count; i++)
{
//Create a new (empty) image to hold the whole page.
VectSharp.Page renderedPage = new VectSharp.Page(renderedPageSizes[i].Width,
renderedPageSizes[i].Height);
//Draw each tile onto the image.
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
//Create a raster image object containing the pixel data. Yay, we do not need to copy/marshal anything!
VectSharp.RasterImage tile = new VectSharp.RasterImage(destinations[i][j], tileBounds[i][j].Width,
tileBounds[i][j].Height, false, false);
//Draw the tile on the main image page.
renderedPage.Graphics.DrawRasterImage(tileBounds[i][j].X0, tileBounds[i][j].Y0, tile);
}
//Save the full page as a PNG image.
renderedPage.SaveAsPNG(#"C:\install\page"+ i.ToString() + ".png");
}
//Clean-up code.
for (int i = 0; i < document.Pages.Count; i++)
{
//Release the allocated memory.
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
System.Runtime.InteropServices.Marshal.FreeHGlobal(destinations[i][j]);
}
//Release the renderer (if you skip this, the quiescent renderer’s threads will not be stopped, and your application will never exit!
renderers[i].Dispose();
}
document.Dispose();
context.Dispose();
}
you can extract Image from PDF
and save as JPG
here is the sample code
you need Itext Sharp
public IEnumerable<System.Drawing.Image> ExtractImagesFromPDF(string sourcePdf)
{
// NOTE: This will only get the first image it finds per page.
var pdf = new PdfReader(sourcePdf);
var raf = new RandomAccessFileOrArray(sourcePdf);
try
{
for (int pageNum = 1; pageNum <= pdf.NumberOfPages; pageNum++)
{
PdfDictionary pg = pdf.GetPageN(pageNum);
// recursively search pages, forms and groups for images.
PdfObject obj = ExtractImagesFromPDF_FindImageInPDFDictionary(pg);
if (obj != null)
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
PdfImageObject pdfImage = new PdfImageObject((PRStream)pdfStrem);
System.Drawing.Image img = pdfImage.GetDrawingImage();
yield return img;
}
}
}
finally
{
pdf.Close();
raf.Close();
}
}

Categories

Resources