I have a C# .NET program running in Visual Studio. Its function is to merge PDF files together based on their file names.
It works fine for the first 12 or so PDFs, but then it suddenly exits with code 0 partway through. I am not using multi-threading, and the PDF library I am using is iTextSharp.
public static string MergePDFs(List<string> fileNames, ref string targetPdf)
{
    string merged = "";
    using (FileStream stream = new FileStream(targetPdf, FileMode.Create))
    {
        Document document = new Document();
        PdfCopy pdf = new PdfCopy(document, stream);
        PdfReader reader = null;
        try
        {
            document.Open();
            foreach (var file in fileNames)
            {
                reader = new PdfReader(file);
                pdf.AddDocument(reader);
                reader.Close();
            }
        }
        catch (Exception)
        {
            // merged = "false";
            if (reader != null) reader.Close();
        }
        finally
        {
            if (document != null)
            {
                Console.WriteLine("Closing document.");
                document.Close(); // exits here
                Console.WriteLine("Document closed.");
            }
        }
    }
    return merged;
}
The output in the debugger is:
...
Closing document.
The program has exited with code 0 (0x0)
I tried printing the exception from the catch block, but it was never reached; the try block executed successfully. Removing the try/catch/finally blocks 'fixed' the issue.
This is only a partial fix, since there is now no way to handle errors.
public static string MergePDFs(List<string> fileNames, ref string targetPdf)
{
    string merged = "";
    using (FileStream stream = new FileStream(targetPdf, FileMode.Create))
    {
        Document document = new Document();
        PdfCopy pdf = new PdfCopy(document, stream);
        PdfReader reader = null;
        document.Open();
        foreach (var file in fileNames)
        {
            reader = new PdfReader(file);
            pdf.AddDocument(reader);
            reader.Close();
        }
        document.Close();
    }
    return merged;
}
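To keep some error handling with this version, one option (a sketch, not from the original post; the paths and file names are hypothetical) is to leave MergePDFs free of try/catch and catch at the call site instead:

    string target = @"C:\temp\merged.pdf"; // hypothetical output path
    var inputs = new List<string> { @"C:\temp\a.pdf", @"C:\temp\b.pdf" };
    try
    {
        MergePDFs(inputs, ref target);
    }
    catch (Exception ex)
    {
        // Handle or log the failure without swallowing it inside the merge routine
        Console.WriteLine("Merge failed: " + ex.Message);
    }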
Related
I need to split a PDF, and I have been using this code:
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

private static void PDF_ExtractPages(string pdfFilePath, string outputPath)
{
    int pageNameSuffix = 0;
    // Initialize a new PdfReader instance with the contents of the source PDF file:
    //PdfReader reader = new PdfReader(pdfFilePath);
    using (PdfReader reader = new PdfReader(pdfFilePath))
    {
        FileInfo file = new FileInfo(pdfFilePath);
        string pdfFileName = file.Name.Substring(0, file.Name.LastIndexOf(".")) + "-";
        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
        {
            pageNameSuffix++;
            string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + newPdfFileName + ".pdf", FileMode.Create));
            document.Open();
            copy.AddPage(copy.GetImportedPage(reader, pageNumber));
            document.Close();
            copy.Close();
        }
    }
}
and I get this error on the line:
using (PdfReader reader = new PdfReader(pdfFilePath))
type used in a using statement must be implicitly convertible to 'System.IDisposable'
If I remove the using and write:
PdfReader reader = new PdfReader(pdfFilePath);
the code works, but the reader object is never disposed, and when I try to open a PDF I get a message like: "this document is open by another user".
Is there a way to dispose the PdfReader and everything inside it?
Thanks!
The version of PdfReader you are using doesn't implement IDisposable, which is why the using statement won't compile.
You can use a try/finally block instead, which is an equivalent way to guarantee cleanup.
PdfReader reader = null;
try
{
    reader = new PdfReader(pdfFilePath);
    // Some code here
}
catch (Exception)
{
    // Take some action.
}
finally
{
    // Close() releases the underlying file; guard in case the constructor threw
    if (reader != null) reader.Close();
}
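Applied to the split routine above, a minimal sketch might look like this (the names follow the question's code; swapping in Path.GetFileNameWithoutExtension and Path.Combine for the manual string work is my change, not from the original post):

    private static void PDF_ExtractPages(string pdfFilePath, string outputPath)
    {
        PdfReader reader = null;
        try
        {
            reader = new PdfReader(pdfFilePath);
            string pdfFileName = Path.GetFileNameWithoutExtension(pdfFilePath) + "-";
            for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
            {
                string newPdfFileName = pdfFileName + pageNumber;
                using (FileStream fs = new FileStream(Path.Combine(outputPath, newPdfFileName + ".pdf"), FileMode.Create))
                {
                    Document document = new Document();
                    PdfCopy copy = new PdfCopy(document, fs);
                    document.Open();
                    copy.AddPage(copy.GetImportedPage(reader, pageNumber));
                    document.Close();
                    copy.Close();
                }
            }
        }
        finally
        {
            // Always release the source file handle, even if a page copy throws
            if (reader != null) reader.Close();
        }
    }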
I'm getting a merged PDF with the code below, but the file is corrupted and I can't open it. 'targetPDF' is the final merged PDF file and 'fileNames' holds all the individual PDFs. Please help. Thanks in advance.
using (FileStream stream = new FileStream(targetPDF, FileMode.Create, FileAccess.Write))
{
    Document document = new Document();
    PdfCopy pdf = new PdfCopy(document, stream);
    if (pdf == null)
    {
        return;
    }
    document.Open();
    foreach (string file in fileNames)
    {
        PdfReader reader = new PdfReader(file);
        reader.ConsolidateNamedDestinations();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            PdfImportedPage page = pdf.GetImportedPage(reader, i);
            pdf.AddPage(page);
            //pdf.AddDocument(new PdfReader(file));
            //pdf.AddPage(pdf.GetImportedPage(reader, 1));
        }
        reader.Close();
    }
}
Change these lines:

Document document = new Document();
PdfCopy pdf = new PdfCopy(document, stream);

to:

using (Document document = new Document())
{
    using (PdfCopy pdf = new PdfCopy(document, stream))
    {
        //do stuff here...
    }
}

so that when the work is done, all streams are closed and the files are no longer locked.
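For completeness, here is a minimal sketch of the whole merge with every object wrapped in a using (the method name and signature are illustrative, not from the original post):

    using System.Collections.Generic;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

    public static void MergePdfs(List<string> fileNames, string targetPDF)
    {
        using (FileStream stream = new FileStream(targetPDF, FileMode.Create, FileAccess.Write))
        using (Document document = new Document())
        using (PdfCopy pdf = new PdfCopy(document, stream))
        {
            document.Open();
            foreach (string file in fileNames)
            {
                PdfReader reader = new PdfReader(file);
                reader.ConsolidateNamedDestinations();
                pdf.AddDocument(reader); // copies all pages of the source file
                pdf.FreeReader(reader);  // lets the writer release the reader's objects
                reader.Close();
            }
        }
        // Disposing Document and PdfCopy closes them, which writes the PDF trailer;
        // a missing trailer is what makes the merged file unopenable.
    }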
I have an issue with trying to create a large PDF file. Basically I have a list of byte arrays, each containing a PDF as a byte array. I wanted to merge the byte arrays into a single PDF. This works great for smaller files (under 2,000 pages), but when I tried creating a 12,000-page file it bombed.

Originally I was using MemoryStream, but after some research, a common solution was to use a FileStream instead. So I tried a file stream approach, but got similar results. The list contains 3,800 records, each containing 4 pages. MemoryStream bombs after around 570 records, FileStream after about 680. The file size after the code crashed was 60MB.

What am I doing wrong? Here is the code I have; it crashes on the "copy.AddPage(curPg);" line inside the "for" loop.
private byte[] MergePDFs(List<byte[]> PDFs)
{
    iTextSharp.text.Document doc = new iTextSharp.text.Document();
    byte[] completePDF;
    Guid uniqueId = Guid.NewGuid();
    string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");
    //using (MemoryStream ms = new MemoryStream())
    using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
        doc.Open();
        int i = 0;
        foreach (byte[] PDF in PDFs)
        {
            i++;
            // Create a reader
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
            // Cycle through all the pages
            for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
            {
                // Read a page
                iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
                // Add the page over to the rest of them
                copy.AddPage(curPg);
            }
            // Close the reader
            reader.Close();
        }
        // Close the document
        doc.Close();
        // Close the copier
        copy.Close();
        // Convert the memorystream to a byte array
        //completePDF = ms.ToArray();
    }
    //return completePDF;
    return GetPDFsByteArray(tempFileName);
}
A couple of notes:
PdfCopy implements IDisposable, so you should try and see if a using helps.
PdfCopy.FreeReader() will help.
Anyway, not sure if you're using MVC or WebForms, but here's a simple working HTTP handler, tested with a 15-page, 125KB test file, that runs on my workstation:
<%@ WebHandler Language="C#" Class="MergeFiles" %>

using System;
using System.Collections.Generic;
using System.Web;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public class MergeFiles : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        List<byte[]> pdfs = new List<byte[]>();
        var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
        for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);
        var Response = context.Response;
        Response.ContentType = "application/pdf";
        Response.AddHeader(
            "content-disposition",
            "attachment; filename=MergeLotsOfPdfs.pdf"
        );
        Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
    }

    byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
    {
        using (var ms = new MemoryStream())
        {
            using (Document document = new Document())
            {
                using (PdfCopy copy = new PdfCopy(document, ms))
                {
                    document.Open();
                    for (int i = 0; i < pdfs.Count; ++i)
                    {
                        using (PdfReader reader = new PdfReader(
                            new RandomAccessFileOrArray(pdfs[i]), null))
                        {
                            copy.AddDocument(reader);
                            copy.FreeReader(reader);
                        }
                    }
                }
            }
            return ms.ToArray();
        }
    }

    public bool IsReusable { get { return false; } }
}
I tried to make the output file similar to what you described in the question, but YMMV, depending on how large the individual PDFs you're dealing with are.
So after a lot of messing around, I realized that there just was no way around it. However, I did manage to find a work-around: instead of returning a byte array, I return a temp file path, which I then transmit and delete afterwards.
private string MergeLotsOfPDFs(List<byte[]> PDFs)
{
    Document doc = new Document();
    Guid uniqueId = Guid.NewGuid();
    string tempFileName = Server.MapPath("~/__" + uniqueId.ToString() + ".pdf");
    using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        PdfCopy copy = new PdfCopy(doc, ms);
        doc.Open();
        int i = 0;
        foreach (byte[] PDF in PDFs)
        {
            i++;
            // Create a reader
            PdfReader reader = new PdfReader(new RandomAccessFileOrArray(PDF), null);
            // Cycle through all the pages
            for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
            {
                // Read a page
                PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
                // Add the page over to the rest of them
                copy.AddPage(curPg);
            }
            // Free the reader once all of its pages have been copied
            // (this is a lie, it still costs money, hue hue hue :)~ )
            copy.FreeReader(reader);
            reader.Close();
        }
        // Close the document
        doc.Close();
        // Close the copier
        copy.Close();
    }
    // Return temp file path
    return tempFileName;
}
And here is how I send that data to the client:

// Send the merged PDF file to the user.
System.Web.HttpResponse response = System.Web.HttpContext.Current.Response;
response.ClearContent();
response.ClearHeaders();
response.ContentType = "application/pdf";
response.AddHeader("Content-Disposition", "attachment; filename=1094C.pdf;");
response.WriteFile(tempFileName);
response.Flush(); // Sends all currently buffered output to the client.
DeleteFile(tempFileName); // Call right after Flush but before close.
response.SuppressContent = true; // Stop sending any further HTTP content to the client.
HttpContext.Current.ApplicationInstance.CompleteRequest(); // Causes ASP.NET to bypass all events and filtering in the HTTP pipeline chain of execution and directly execute the EndRequest event.
Lastly, here is a fancy DeleteFile method
private void DeleteFile(string fileName)
{
    if (File.Exists(fileName))
    {
        try
        {
            File.Delete(fileName);
        }
        catch (Exception)
        {
            // Could not delete the file, wait and try again
            try
            {
                System.GC.Collect();
                System.GC.WaitForPendingFinalizers();
                File.Delete(fileName);
            }
            catch
            {
                // Could not delete the file still
            }
        }
    }
}
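A variant worth noting (an assumption on my part, not from the original post): wrapping the send in try/finally guarantees the temp file is deleted even if writing to the response fails. The names below are the ones used in the snippets above.

    try
    {
        response.WriteFile(tempFileName);
        response.Flush();
    }
    finally
    {
        // Runs whether or not WriteFile threw, so the temp file never leaks
        DeleteFile(tempFileName);
    }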
We have a system that stores some custom templating data in a Word document. Sometimes, updating this data causes Word to complain that the document is corrupted. When that happens, if I unzip the docx file and compare the contents to the previous version, the only difference appears to be the expected change in the customXML\item.xml file. If I re-zip the contents using 7zip, it seems to work OK (Word no longer complains that the document is corrupt).
The (simplified) code:
void CreateOrReplaceCustomXml(string filename, MyCustomData data)
{
    using (var doc = WordprocessingDocument.Open(filename, true))
    {
        var part = GetCustomXmlParts(doc).SingleOrDefault();
        if (part == null)
        {
            part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        }

        var serializer = new DataContractSerializer(typeof(MyCustomData));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, data);
            stream.Seek(0, SeekOrigin.Begin);
            part.FeedData(stream);
        }
    }
}
IEnumerable<CustomXmlPart> GetCustomXmlParts(WordprocessingDocument doc)
{
    return doc.MainDocumentPart.CustomXmlParts
        .Where(part =>
        {
            using (var stream = doc.Package.GetPart(part.Uri).GetStream())
            using (var streamReader = new StreamReader(stream))
            {
                return streamReader.ReadToEnd().Contains("Some.Namespace");
            }
        });
}
Any suggestions?
Since re-zipping works, the content itself seems to be well-formed, so it sounds like the zip process is at fault.
Open the corrupted docx in 7-Zip and take note of the values in the "method" column (especially for customXML\item.xml).
Compare those values to a working docx - are they the same or different? Method "Deflate" works.
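If the method does differ, one programmatic equivalent of the manual 7-Zip repair (a sketch; the folder and file paths are hypothetical) is to re-zip the extracted contents with plain Deflate entries via System.IO.Compression:

    using System.IO;
    using System.IO.Compression;

    static void RezipDocx(string extractedFolder, string outputDocx)
    {
        if (File.Exists(outputDocx)) File.Delete(outputDocx);
        // ZipFile.CreateFromDirectory writes standard Deflate-compressed entries,
        // which Word accepts, mirroring the 7-Zip re-zip described above.
        ZipFile.CreateFromDirectory(extractedFolder, outputDocx, CompressionLevel.Optimal, false);
    }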
I faced the same issue, and it turned out to be due to encoding.
Do you specify the same encoding when serializing and deserializing?
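For instance (a sketch, not from the original post; the XmlWriterSettings and the UTF8Encoding choice are assumptions), you can force an explicit UTF-8 declaration when feeding the part. This is a drop-in replacement for the serialization block in CreateOrReplaceCustomXml above, and additionally needs System.Xml and System.Text:

    var serializer = new DataContractSerializer(typeof(MyCustomData));
    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    using (var stream = new MemoryStream())
    {
        // Write through an XmlWriter so the declared encoding matches the actual bytes
        using (var writer = XmlWriter.Create(stream, settings))
        {
            serializer.WriteObject(writer, data);
        }
        stream.Seek(0, SeekOrigin.Begin);
        part.FeedData(stream);
    }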
A couple of suggestions:
a. Try doc.Package.Flush(); after you write the data back into the custom XML part.
b. You may have to delete all custom parts and add a new one. We are using the following code and it seems to work fine.
public static void ReplaceCustomXML(WordprocessingDocument myDoc, string customXML)
{
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
    CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
    {
        ts.Write(customXML);
        ts.Flush();
    }
}
public static MemoryStream GetCustomXmlPart(MainDocumentPart mainPart)
{
    foreach (CustomXmlPart part in mainPart.CustomXmlParts)
    {
        using (XmlTextReader reader =
            new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
        {
            reader.MoveToContent();
            if (reader.Name.Equals("aaaa", StringComparison.OrdinalIgnoreCase))
            {
                string str = reader.ReadOuterXml();
                byte[] byteArray = Encoding.ASCII.GetBytes(str);
                MemoryStream stream = new MemoryStream(byteArray);
                return stream;
            }
        }
    }
    return null;
}
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
{
    StreamReader reader = new StreamReader(memStream);
    string FullXML = reader.ReadToEnd();
    ReplaceCustomXML(myDoc, FullXML);
    myDoc.Package.Flush();
    //Code to save file
}
The code snippet below returns a corrupt PDF document; however, if I return mergedDocument instead, it always returns a valid PDF. mergedDocument is based on a PDF file I created using Word, whereas completedDocument is entirely programmatically generated. The code "works" in that it throws no exceptions. Why is iTextSharp creating a corrupt PDF?
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
    using (Document document = new Document())
    {
        PdfCopy copy = new PdfCopy(document, streamCompleted);
        document.Open();
        copy.Open();
        foreach (var item in eventItems)
        {
            byte[] mergedDocument = null;
            PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
            using (MemoryStream streamTemplate = new MemoryStream())
            {
                using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
                {
                    foreach (var token in item.DataTokens)
                    {
                        if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
                        {
                            stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
                        }
                    }
                    stamper.FormFlattening = true;
                    stamper.Writer.CloseStream = false;
                }
                mergedDocument = new byte[streamTemplate.Length];
                streamTemplate.Position = 0;
                streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
            }
            reader = new PdfReader(mergedDocument);
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                document.SetPageSize(PageSize.A4);
                copy.AddPage(copy.GetImportedPage(reader, i));
            }
        }
        completedDocument = new byte[streamCompleted.Length];
        streamCompleted.Position = 0;
        streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
    }
}
return completedDocument;
You need to close the document and copy objects to flush the PDF writing buffer. This, however, causes some problems when trying to read the stream into an array. The fix for that is to use the ToArray() method of the MemoryStream, which still works on closed streams. The changes I made have comments on them.
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
    using (Document document = new Document())
    {
        PdfCopy copy = new PdfCopy(document, streamCompleted);
        document.Open();
        copy.Open();
        foreach (var item in eventItems)
        {
            byte[] mergedDocument = null;
            PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
            using (MemoryStream streamTemplate = new MemoryStream())
            {
                using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
                {
                    foreach (var token in item.DataTokens)
                    {
                        if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
                        {
                            stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
                        }
                    }
                    stamper.FormFlattening = true;
                    stamper.Writer.CloseStream = false;
                }
                //Copy the stream's bytes
                mergedDocument = streamTemplate.ToArray();
            }
            reader = new PdfReader(mergedDocument);
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                document.SetPageSize(PageSize.A4);
                copy.AddPage(copy.GetImportedPage(reader, i));
            }
        }
        //Close the document and the copy once everything has been added
        document.Close();
        copy.Close();
        //ToArray() can operate on closed streams
        completedDocument = streamCompleted.ToArray();
    }
}
return completedDocument;
Also make sure your HTML doesn't contain an hr tag when converting HTML to PDF:
hdnEditorText.Value.Replace("\"", "'").Replace("<hr />", "").Replace("<hr/>", "")