I have a large (6 pages, 222 fields) fillable PDF that I am using as a template with iTextSharp PdfReader. When this object instantiates it takes 5 minutes or more. I have tried:
string pdfPath = Path.Combine(context.Server.MapPath("~/apps/ssgenpdf/App_Data"), "07-2011 Worksheets.pdf");
reader = new PdfReader(pdfPath);
alternatively I have tried reading the file into a memory stream and passing the memory stream to the PdfReader constructor. Additionally I have tried using:
reader = new PdfReader(new RandomAccessFileOrArray(pdfPath), null);
none of these alternatives show significant gains.
This is an ASP.Net app, and so my interim solution is to do this creation on application start and caching the reader, then I check to see if I get a valid reader from the cache and instantiate a new reader from that reader. Now I routinely see under 50 millisecond response from this approach.
My concern is that this does not seem scalable if others in my group want to use this "fillable PDF as template with iTextSharp" strategy. Does anyone have any suggestions for alternate strategies to balance performance with scalability?
Make sure the PDF you are using are same on the server as well as locally.sometime from source control they get corrupted.[I have face the issue with VSS with fillable pdf forms which i have created via nitro.]
Also better to ask the question and relevant forum http://forum.pdfsharp.net
Related
I have a routine that reads XPS documents and chops off pages to separate documents. Originally it read one document, decided how to chop it, closed it and wrote out the new files.
Features were added, this was causing headaches with cleaning up old files before running the routine and so I saved all the chopped pieces to be written out at the end.
ChoppedXPS is a dictionary, the key is the filename, the data is the FixedDocument prepared from the original:
foreach (String OneReport in ChoppedXPS.Keys)
{
File.Delete(OneReport);
using (XpsDocument TargetFile = new XpsDocument(OneReport, FileAccess.ReadWrite))
{
XpsDocumentWriter Writer = XpsDocument.CreateXpsDocumentWriter(TargetFile);
Writer.Write(ChoppedXPS[OneReport]);
Logger($"{OneReport} written to disk", 2);
}
Application.DoEvents();
}
If the FixedDocument being written out here contains graphics the source file is opened by the Writer.Write line and left open until the program is closed.
The XpsDocumentWriter does not seem to implement anything that can be used to clean it up.
(Yeah, that Application.DoEvents is ugly--this is an in-house program used by two people, it's not worth the hassle of making this run in the background and without it a big enough task can cause Windows to decide it's non-responsive and kill it. And, yes, I know how to indent--I took them out to make it all fit this screen.)
.Net 4.5, using some C# 8.0 features.
I found a workaround for this problem. I'm not going to try to post the whole thing as I had to change the whole data handling but the heart of it:
using (XPSDocument Source = new XPSDocument(SourceFile, FileAccess.Read)
{
[the using loop from my question]
}
I'm still hoping for understanding and something more appropriate than this approach.
Yes--this produces a warning that Source is unused, but the compiler isn't eliminating it so it does work.
I am trying to Generate a Single PDF File From Multiple Memory Streams, I am having a lot of trouble determining the proper way to merge 2 PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both source PDF MemoryStreams. It seems simple and I think the code below is set up properly but the resulting PDF memory stream does not contain both the Files Combined.
I am having a lot of trouble determining the proper way to merge 2 PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both source PDF MemoryStreams. It seems simple and I think the code below is set up properly but the resulting PDF memory stream does not Contain Merged Documents.
I have found multiple ways documented on the Internet as the "proper" way to do the merge. The actual sample code with iText 7 seems to be unusually complex (in that is mixes multiple concepts into one sample repeatedly - as in doesn't reduce the concept to the simplest possible code), and seems to fail to demonstrate simple concepts. For instance, their PDFMerge documentation has no sample code at all in the documentation (nor does anything else I looked at in the class documentation). The examples they have online actually always mix merging from files (not MemoryStreams or byte[]) with other concepts like adding page numbers or adding Table of Contents. So they never just show one concept and they never start with anything other than files. My PDFs are coming out of a database and we just need to merge them into one PDF memory stream and save it back out. My concern is that maybe I am not creating the MemoryStream properly when I initialize the PDFWriter. As none of their samples ever do anything but initial with an actual file, I was unable to confirm this was done properly. I also fully qualified all objects in the code because I want to leave the old iTextSharp code in place while I am upgrading to the new iText 7. This was done to make sure an iTextSharp object of the same name wasn't inadvertently being unknowingly used.
Also, in the interest of making the source as easy as possible to read I removed some of the declarations and initialization of objects being used. Everything was traced through and all values are fully loaded with proper values as you trace through the code. I am assuming the problem is that I didn't prepare the PDF objects properly or that I have to do something special with the PDFWriter on the Destination PDF Document (ms) before the the PDFMerge object.
List<byte[]> streams = new List<byte[]>();
somelist.ForEach(item=>
{
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
pdfWriter.SetCloseStream(false);
HtmlConverter.ConvertToPdf(strContent, pdfWriter);
streams.Add(workStream.ToArray());
pdfWriter.Close();
}
}
MemoryStream ms = new MemoryStream();
PdfWriter writer = new PdfWriter(ms);
PdfDocument document = new PdfDocument(writer);
PdfMerger merger = new PdfMerger(document);
streams.ForEach(stream =>
{
Stream msDoc = new MemoryStream(stream);
PdfDocument doc = new PdfDocument(new PdfReader(msDoc));
merger.Merge(doc, 1, doc.GetNumberOfPages());
doc.Close();
});
ByteContent = ms.ToArray();
document.Close();
Merging is a really straightforward process:
var SourceDocument1 = new PdfDocument(new PdfReader(SRC));
var SourceDocument2 = new PdfDocument(new PdfReader(SRC1));
byte[] result;
using (var memoryStream = new MemoryStream())
{
var pdfWriter = new PdfWriter(memoryStream);
var pdfDocument = new PdfDocument(pdfWriter);
PdfMerger merge = new PdfMerger(pdfDocument);
merge.Merge(SourceDocument1, 1, SourceDocument1.GetNumberOfPages())
.Merge(SourceDocument2, 1, SourceDocument2.GetNumberOfPages());
merge.Close();
result = memoryStream.ToArray();
}
File.WriteAllBytes(#"C:\temp\file.pdf", result);
this will merge SRC with SRC1.
There are a lot of examples on Github, such as this one (there is also a whole folder with merge examples).
I am writing the destination document in the end, just to make sure it's being created correctly, but you can do whatever you want to with the MemoryStream, of course.
I am using iTextSharp (5.5.5.90) to generate PDF files. I am using paragraphs and importing pages from readers and such. Here is how I create my document, from there I just append what I need:
FileStream fs = new FileStream("filename.pdf", FileMode.Create, FileAccess.Write, FileShare.None);
Document doc = new Document(new Rectangle(PageSize.LETTER), 58, 58, 100, 50);
PdfWriter writer = PdfWriter.GetInstance(doc, fs);
Once the file is created, I add paragraphs like this:
doc.Add(new Paragraph("Paragraph text"));
And import pages from readers like this:
writer.DirectContent.AddTemplate(writer.GetImportedPage(reader, page), 0, 0);
My question is how would I go back to page one after generating the entire document and add an element to page one? I will be adding a barcode (I know how to add barcodes, tables and such where I want them on the current page), but I don't know how to "go back" to page one to add an element.
Here is the full code, but you won't be able to compile it because of dependencies. Also, don't get caught up in the details of the full code, as this is a large project to create dynamically generated documents. https://pastebin.com/kABi7fzW
I can't attest as to the exactly calls to iTextSharp's as we approached our documentation quite differently; open a Word template, at the data as DataTables, etc., do a MailMerge, close and reopen and save as PDF. Sounds more involved but doesn't require the granular level of detail you're doing of creating the document paragraph by paragraph but it does allow the document generator to worry about content and not style placement (handled via Word, manually and external to the application).
From experience with iTextSharp, you'll have a lot of trouble trying to float an element on top of a section to insert the barcode. The document generation tool has an annoying tendency to not quite work in this scenario. We endured many weeks of back and forth with iTextSharp support and a version upgrade and still couldn't get it to behave properly in all scenarios.
As discussed in the comments and given how you've already written your code (I doubt you'll scrap all that code and start with a MailMerge unless you really, really have to), you'll need to insert a placeholder block that you can locate via iTextSharp's PdfBuilder api. I'd imagine that setting a bookmark location would likely be the easiest way.
If it's possible (preferable?) to have the barcode on a page of its own, then you already have the code needed to do this (circa line 324 in your pastebin link) with;
// create doc...
// reopen doc and get page count
doc.NewPage();
// add barcode with page count + 1
// save
I have a sql server db. In there are many, many rows. Each row has a column that contains a stored pdf.
The db is a gig in size. So we can expect roughly half that size is due to the pdfs.
now I have a requirement to join all those pdf's ... into 1 pdf. Don't ask why.
Can you suggest the best way forward and which component will be best suited for this job. There are many answers available:
How can I join two PDF's using iTextSharp?
Merge memorystreams to one itext document
How to merge multiple pdf files (generated in run time)?
as to how to join two (or more pdfs). But what I'm asking for is in terms of performance. We literally dealing with around 50 000 pdfs that need to be merged into 1 almighty pdf
[Edit Solution] Brought time to merge 1000 pdfs from 4m30s to 21s
public void MergePDFs(string targetPDF, string sourceDir)
{
using (FileStream stream = new FileStream(targetPDF, FileMode.Create))
{
var files = Directory.GetFiles(sourceDir);
Document pdfDoc = new Document(PageSize.A4);
PdfCopy pdf = new PdfCopy(pdfDoc, stream);
pdfDoc.Open();
Console.WriteLine("Merging files count: " + files.Length);
int i = 1;
var watch = System.Diagnostics.Stopwatch.StartNew();
foreach (string file in files)
{
Console.WriteLine(i + ". Adding: " + file);
pdf.AddDocument(new PdfReader(file));
i++;
}
if (pdfDoc != null)
pdfDoc.Close();
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
MessageBox.Show(elapsedMs.ToString());
}
}
I just did a C#/Winforms project with PDFSharp and merging images to PDFs and it worked phenomenally with a traditional folder structure. I imagine that it would work similarly with data stored PDFs so long as you can pull them into a memory stream first then merge them.
Some suggestions:
1) Recommend doing it in a multi-threaded environment so you can work on multiple PDFs at a time.
2) Open only what you need and close as soon as the operation is complete. So say you have three documents that need to be merged into one. Create a blank PDF. Open first into a memory stream, open blank. Append first to blank. Close first, save blank, close blank. Repeat for second and third. This way you control how much memory you are taking up at any one point in time. In this way I was able to append millions of images, but control memory usage.
3) Ensure you are using the Using statements when utilizing objects. This will help with memory cleanup and eliminate the need for calling garbage collector which is looked down upon.
4) Separate your business (work) from your UI as best you can so you can cancel the operation at any point in time, or view current status as it progresses through.
5) Log everything that is done so that you can go back and correct one-offs for the PDFs that didn't make it through the first pass.
I need to populate XFA form fields in a PDF (created with Adobe LiveCycle Designer). We're attempting to use iText (actually iTextSharp with C#) to parse the PDF, populate the XFA fields and then save the modified PDF back out.
All the examples I can find with iText (very few iTextSharp examples) talk about modifying AcroForm fields. This PDF does NOT have AcroForm fields and uses XFA only.
Pointers to any non-standard resources would be helpful (I've already done the requisite Googling on the topic and haven't found anything useful).
Code examples here would be awesome from anyone who has actually done what I'm trying to do.
If you can get a data packet into the PDF, the XFA runtime in Acrobat would populate those fields with the data in the data packet.
If you want to see what one of these looks like, create a form in LiveCycle Designer (comes with Acrobat Pro), add some fields to it, and save it as a dynamic PDF. Open the form in Acrobat and type some values into the fields and save it.
Open the PDF with a tool that lets you peer at the PDF data and you'll find /Catalog/AcroForm/XFA a stream that has an <xfa:datasets> packet with the values you typed. That's what you'll need to create yourself and insert into the PDF.
The XDP spec includes a description of the data packet and the merge algorithm. You can find it here:
http://partners.adobe.com/public/developer/xml/index_arch.html
Alternately, you buy the LiveCycle server from Adobe which lets you do all this programmatically in a number of ways including through web service calls.
using (FileStream existingPdf = new FileStream("existing.pdf", FileMode.Open))
using (FileStream sourceXml = new FileStream("source.xml", FileMode.Open))
using (FileStream newPdf = new FileStream("new.pdf", FileMode.Create))
{
// Open existing PDF
PdfReader pdfReader = new PdfReader(existingPdf);
// PdfStamper, which will create
PdfStamper stamper = new PdfStamper(pdfReader, newPdf);
stamper.AcroFields.Xfa.FillXfaForm(sourceXml);
stamper.Close();
pdfReader.Close();
}
iTextSharp can work with XFA. To remove all doubts, please take a look at sample on iText website:
http://itextpdf.com/examples/iia.php?id=165
I have the same issue and I think I found the solution. I am using Powershell to inspect the pdf object.
Load the iTextSharp DLL.
Add-Type -Path "C:\Users\micah\Desktop\itextsharp.dll"
Load the PDF into a variable:
$PDF = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList "C:\Users\micah\Desktop\test.pdf"
Inspect this object:
$PDF.AcroFields.XFA.DomDocument.XDP.DataSets.Data.TopMostSubForm | Get-Member
What you SHOULD see, is that all your fields on your PDF are in this object as a property.
You can get a quick view of all your fields like this:
$PDF.AcroFields.XFA.DomDocument.XDP.DataSets.Data.TopMostSubForm | Select-Object -Property "*"
That should be the magic ticket. This location is both fields from the original form AND the XFA portion of the form.
It says that in the book because itext does not do this. Can you convert your PDF?