Pdfsharp Out of Memory Exception when Combine Multi Pdf File - c#

I have to convert into a single pdf a large number (but undefined) pdf into one for this, I'm using the code PDFsharp here.
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
//nProcessedFile = 0;
//nStepConverted++;
//sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
//outputDocument.Save(sNameLastCombineFile);
//outputDocument.Close();
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
For small numbers of files the code works but then at some point
generates an exception of out of memory
is there a solution?
p.s
I was thinking of saving the files in step and then the remaining aggiungingere so liebrare memory but I can not find the way.
UPDATE1:
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
outputDocument = PdfReader.Open(sNameLastCombineFile,PdfDocumentOpenMode.Modify);
}
UPDATE 2 versione 1.32
Complete example
Error on line:
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
Text error:
Cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6.
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
List<String> filesToPrint = new List<string>();
filesToPrint = Directory.GetFiles(#"D:\Downloads\RACCOLTA\FILE PDF", "*.pdf").ToList();
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
try
{
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
inputDocument = PdfReader.Open(sNameLastCombineFile , PdfDocumentOpenMode.Modify);
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.ReadKey();
}
}
}
}
UPDATE3
Code that generate exception out of memory
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
newPage = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(newPage);
newPage.Close();
}
I can not exactly which row general exception

I had a simular issue, saving, closing and reopening the PdfDocument did not really help.
I am adding al lot (100+) large (upto 5Mb) images (tiff, jpg, etc) to a pdf document where every images has its own page. It crashed around image #50. After the save-close-reopen it did finish the whole document but was still getting close to max memory, around 3Gb. Some more images and it would still crash.
After more refining, I implemented a using for the XGraphics object, it was a little better again but not much.
The big step forward was disposing of the XImage within the loop! After that the application never used more than 100-200Kb, I removed the save-close-reopen for the PdfDocument and it was no problem.

After saving and closing outputDocument (the code is commented out in your snippet), you have to open outputDocument again, using PdfDocumentOpenMode.Modify.
It could help to add using(...) for the inputDocument.
If your code is running as a 32-bit process, then switching to 64 bit will allow your process to use more than 2 GB of RAM (assuming your computer has more than 2 GB RAM).
Update: The message "Cannot handle iref streams" means you have to use PDFsharp 1.50 Prerelease, available on NuGet.

Related

How to not lose notes added to PDF when merging PDF's with PDFSharp

I am using PDFSharp to merge multiple PDF documents saved on temp and network locations to then print them as one single task.
But it has come to my attention that when a PDF has some notes added to it that they become lost after the merge.
I am using the following Method's to merge the PDF's. (input is a list with file locations.)
public string CombinePdf(List<string> files)
{
PdfDocument outDoc = new PdfDocument();
foreach(string file in files)
{
using (PdfDocument pdfDocument = PdfReader.Open(FormatPath(file), PdfDocumentOpenMode.Import, PdfReadAccuracy.Strict))
{
CopyPages(pdfDocument, outDoc);
}
}
string outPdfFileName = $"{Guid.NewGuid()}.pdf";
string outPdfPath = Path.Combine(Path.GetTempPath(), outPdfFileName);
outDoc.Save(outPdfPath);
return outPdfPath;
}
private void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}

PDFSharp not disposing resources (memory leak)

I am writing a little tool for myself to merge PDF files using PDFSharp library. I am using the latest pre-release version (1.5) of PDFSharp.
I come across a problem where documents that are loaded into memory is not releasing when going out of scope. I tracked down this memory leak to the following part of the code:
using (var mergedDocument = new PdfDocument())
{
for (var i = 0; i < SelectedDocuments.Count; i++)
{
using (var document = PdfReader.Open(SelectedDocuments[i].FilePath, PdfDocumentOpenMode.Import))
{
for (var j = 0; j < document.PageCount; j++)
{
mergedDocument.AddPage(document.Pages[j]);
}
}
}
mergedDocument.Save(savePath);
}
An example would be I have 10 pdf documents which totals on 178 Mb. The merged document that is created is also around 178 Mb. When above code finishes executing memory usage holds at 356 Mb. When I merge more documents this memory leak keeps going up and eventually cause a crash.
I tried removing using statements and using Dispose() when I wanted the document to be released from memory, however it does not work as well.
Any help would be appreciated. Thank you.
Edit:
To be more precise:
var parentDirectory = Directory.GetParent(SelectedDocuments[0].FilePath);
var savePath = parentDirectory + "\\MergedDocument.pdf";
using (var mergedDocument = new PdfDocument())
{
for (var i = 0; i < SelectedDocuments.Count; i++)
{
using (var document = PdfReader.Open(SelectedDocuments[i].FilePath, PdfDocumentOpenMode.Import))
{
for (var j = 0; j < document.PageCount; j++)
{
mergedDocument.AddPage(document.Pages[j]);
}
}
}
mergedDocument.Save(savePath);
}
SelectedDocuments is a list which holds a bunch of file paths to the selected PDF files.
I ended up using iTextSharp instead with the following code to avoid memory issues:
var parentDirectory = Directory.GetParent(SelectedDocuments[0].FilePath);
var savePath = parentDirectory + "\\MergedDocument.pdf";
using (var fs = new FileStream(savePath, FileMode.Create))
{
using (var document = new Document())
{
using (var pdfCopy = new PdfCopy(document, fs))
{
document.Open();
for (var i = 0; i < SelectedDocuments.Count; i++)
{
using (var pdfReader = new PdfReader(SelectedDocuments[i].FilePath))
{
for (var page = 0; page < pdfReader.NumberOfPages;)
{
pdfCopy.AddPage(pdfCopy.GetImportedPage(pdfReader, ++page));
}
}
}
}
}
}

c# / openxml merge of word documents to one, fails --closed--

I have to merge several word documents with a small c# console application. So far so good. The documents are generated in arcplan. Around 30 files are generated, but somehow some documents are corrupted, but still shows me Content.
If I merge now all files which are correct my document is fine but if i have a corrupted file in my bunch of files any corrupted generates an empty page. I debugged it of course, but i dont see anything going wrong which explains the empty page.
the arguments are like this:
"C:\temp\Report_C_01.docx" "C:\temp\Report_D_01.docx" "C:\temp\Report_E_01.docx"
here´s my Code:
public static void Merge(params String[] filepaths)
{
String pathName = Path.GetDirectoryName(filepaths[0]);
subfolder = Path.Combine(pathName, "Output\\"); //Wird für den gemergten File benötigt
if (filepaths != null && filepaths.Length > 1)
{
WordprocessingDocument myDoc = WordprocessingDocument.Open(#filepaths[0], true); //Wordfiles werden geöffnet
MainDocumentPart mainPart = myDoc.MainDocumentPart;
for (int i = 1; i < filepaths.Length; i++)
{
String altChunkId = "AltChunkId" + i;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
FileStream fileStream = File.Open(#filepaths[i], FileMode.Open);
chunk.FeedData(fileStream);
DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
altChunk.Id = altChunkId;
//new page, if you like it...
mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page } )));
//next document
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
}
mainPart.Document.Save();
myDoc.Close();
for (int i = 0; i < 1; i++)
{
String fileNameWE = Path.GetFileName(filepaths[i]);
File.Copy(filepaths[i], subfolder + fileNameWE);
}
foreach (String fp in filepaths)
{
File.Delete(fp);
}
}
else
{
Console.WriteLine("Nur 1 Argument");
}
}
Hope someone can help me.
Best regards
Christian
Fixed it. It seems word cannot merge different Formats in one Document. So if you have 2 documents with a footer and other 3 without it just won´t work. Obviously it can happen that some customers have these kind of issues; at least the Code is fine

Import pdf at a specific page

I have the requirement to merge pdf together. I need to import a pdf at a specific page into another one.
Let me illustrate this to you.
I have two pdf, first one is 50 pages long and the second one is 4pages long. I need to import the second one at the 13th page of the first pdf.
I don't find any exemple. There are plenty exemple on how to merge pdf but nothing about merging at a specific page.
Based on this exemple it look like I need to iterate over all pages one by one and import them in a new pdf. That look a bit painfull espicially if you have big pdf and need to merge many. I would create x new pdf to merge x+1 pdf.
Is there something I don't understand or is it really the way to go?
Borrowing from the example, this should be easy to do with a few modifications. You just need to add all the pages before the merge, then all the pages from the second document, then all the rest of the original pages.
Try something like this (not tested or robust - just a starting point maybe):
// Used the ExtractPages as a starting point.
public void MergeDocuments(string sourcePdfPath1, string sourcePdfPath2,
string outputPdfPath, int insertPage) {
PdfReader reader1 = null;
PdfReader reader2 = null;
Document sourceDocument1 = null;
Document sourceDocument2 = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try {
reader1 = new PdfReader(sourcePdfPath1);
reader2 = new PdfReader(sourcePdfPath2);
// Note, I'm assuming pages are 0 based. If that's not the case, change to 1.
sourceDocument1 = new Document(reader1.GetPageSizeWithRotation(0));
sourceDocument2 = new Document(reader2.GetPageSizeWithRotation(0));
pdfCopyProvider = new PdfCopy(sourceDocument1,
new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
sourceDocument1.Open();
sourceDocument2.Open();
int length1 = reader1.NumberOfPages;
int length2 = reader2.NumberOfPages;
int page1 = 0; // Also here I'm assuming pages are 0-based.
// Having these three loops is the key. First is pages before the merge.
for (;page1 < insertPage && page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
// These are the pages from the second document.
for (int page2 = 0; page2 < length2; page2++) {
importedPage = pdfCopyProvider.GetImportedPage(reader2, page2);
pdfCopyProvider.AddPage(importedPage);
}
// Finally, add the remaining pages from the first document.
for (;page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument1.Close();
sourceDocument2.Close();
reader1.Close();
reader2.Close();
} catch (Exception ex) {
throw ex;
}
}

Saving 2 copies of PDF template, 2nd file corrupt - iTextSharp

I have a PDF template file for 60 labels per page. My goal was to make copies of the template as needed, fill in the form data and then merge the files into a single PDF (or provide links to individual files...either works)
The problem is that the 2nd PDF copy comes out corrupt regardless of date.
The workflow is user selects a date. The lunch orders for that day are gathered into a generic list that in turn is used to fill in the form fields on the template. At 60, the file is saved as a temp file and a new copy of the template is used for the next 60 names, etc...
09/23/2013 through 09/25 have data. On the 25th there are only 38 orders, so this works as intended. On 09/24/2013 there are over 60 orders, the first page works, but the 2nd page is corrupt.
private List<string> CreateLabels(DateTime orderDate)
{
// create file name to save
string fName = ConvertDateToStringName(orderDate) + ".pdf"; // example 09242013.pdf
// to hold Temp File Names
List<string> tempFNames = new List<string>();
// Get path to template/save directory
string path = Server.MapPath("~/admin/labels/");
string pdfPath = path + "8195a.pdf"; // template file
// Get the students and their lunch orders
List<StudentLabel> labels = DalStudentLabel.GetStudentLabels(orderDate);
// Get number of template pages needed
decimal recCount = Convert.ToDecimal(labels.Count);
decimal pages = Decimal.Divide(recCount, 60);
int pagesNeeded = Convert.ToInt32(Math.Ceiling(pages));
// Make the temp names
for (int c = 0; c < pagesNeeded; c++)
{
tempFNames.Add(c.ToString() + fName); //just prepend a digit to the date string
}
//Create copies of the empty templates
foreach (string tName in tempFNames)
{
try
{ File.Delete(path + tName); }
catch { }
File.Copy(pdfPath, path + tName);
}
// we know we need X pages and there is 60 per page
int x = 0;
// foreach page needed
for (int pCount = 0; pCount < pagesNeeded; pCount++)
{
// Make a new page
PdfReader newReader = new PdfReader(pdfPath);
// pCount.ToString replicates temp names
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Open))
{
PdfStamper stamper = new PdfStamper(newReader, stream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
StudentLabel lbl = null;
string lblInfo = "";
// fill in acro fields with lunch data
foreach (string fieldKey in fieldKeys)
{
try
{
lbl = labels[x];
}
catch
{
break;
} // if we're out of labels, then we're done
lblInfo = lbl.StudentName + "\n";
lblInfo += lbl.Teacher + "\n";
lblInfo += lbl.MenuItem;
form.SetField(fieldKey, lblInfo);
x++;
if (x % 60 == 0) // reached 60, time for new page
{
break;
}
}
stamper.Writer.CloseStream = false;
stamper.FormFlattening = true;
stamper.Close();
newReader.Close();
stream.Flush();
stream.Close();
}
}
return tempFNames;
}
Why are you pre-allocating your files? My guess is that's your problem. You're binding a PdfStamper to a PdfReader for input and an exact copy of the same pdf to a FileStream object for output. The PdfStamper will generate your output file for you, you don't need to help it. You're trying to append new data to an existing file and I'm not quite sure what happens in that case (as I've never actually seen anyone do it.)
So drop your whole File.Copy pre-allocation and change your FileStream declaration to:
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Create, FileAccess.Write, FileShare.None))
You'll obviously also need to adjust how your return array gets populated, too.

Categories

Resources