Adding PDF from stream while creating PDF with iTextSharp - c#

I have to create a new PDF document, but I also have to attach existing PDF files to the new one.
It was easy enough to attach them at the end. But I have to attach these in the middle of the document so it's easier to know which is related to which.
So, essentially it would look like this:
New coverpage (from html)
New invoicerow (from html)
old invoiceRow (from stream)
New invoicerow (from html)
old invoiceRow (from stream)
But the issue I am running into is that either the file doesn't accept the stream from the existing PDF, so I only get the newly generated rows, or one of the existing PDFs is all I see.
The code I have so far, which works for generating the new rows like I want them (before adding the existing PDFs that is) looks like this:
private async Task<byte[]> CreateHtmlString(List<Invoice> invoices)
{
byte[] bytes;
using (MemoryStream memoryStream = new MemoryStream())
{
using (Document document = new Document(PageSize.A4, 10F, 10F, 10F, 0F))
{
using (PdfWriter pdfWriter = PdfWriter.GetInstance(document, memoryStream))
{
pdfWriter.CloseStream = false;
if (!document.IsOpen())
{
document.Open();
}
StringBuilder sbHeader = new StringBuilder();
sbHeader.Append("<!DOCTYPE html>");
sbHeader.Append("<html>");
sbHeader.Append("<body>");
sbHeader.Append("<table style='table-layout: auto; width: 100%;'>");
sbHeader.Append("<tr>");
sbHeader.Append("</tr>");
foreach (var invoice in invoices)
{
sbHeader.Append("<tr>");
sbHeader.Append("</tr>");
}
sbHeader.Append("</table>");
sbHeader.Append("</body>");
sbHeader.Append("</html>");
using (StringReader srHtml = new StringReader(sbHeader.ToString()))
{
HTMLWorker htmlparser = new HTMLWorker(document);
using (MemoryStream ms = new MemoryStream())
{
htmlparser.Parse(srHtml);
}
}
foreach (var invoice in invoices)
{
document.NewPage();
StringBuilder sbRow = new StringBuilder();
sbRow.Append("<div style='page-break-before:always'> </div>");
sbRow.Append("<table style='table-layout: auto; width: 100%;'>");
sbRow.Append("<tr>");
sbRow.Append("</tr>");
foreach (var acknowledgeJournal in invoice.AcknowledgementJournals)
{
sbRow.Append("<tr>");
sbRow.Append("</tr>");
}
sbRow.Append("</table>");
sbRow.Append("<table style='table-layout: auto; width: 100%;'>");
sbRow.Append("<tr>");
sbRow.Append("</tr>");
foreach (var invoiceItem in invoice.InvoiceItems)
{
sbRow.Append("<tr>");
sbRow.Append("</tr>");
}
sbRow.Append("</table>");
sbRow.Append("</body>");
sbRow.Append("</html>");
using (StringReader srHtml = new StringReader(sbRow.ToString()))
{
HTMLWorker htmlparser = new HTMLWorker(document);
using (MemoryStream ms = new MemoryStream())
{
htmlparser.Parse(srHtml);
foreach (var attachment in invoice.Attachments)
{
var retrievedAttachment = await getPdf();
retrievedAttachment.CopyTo(memoryStream);
}
}
}
}
bytes = memoryStream.ToArray();
string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Combined.pdf");
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
fs.Write(bytes, 0, bytes.Length);
}
return bytes;
}
}
}
}
Is it at all possible to do what I want to do this way? Or will I have to change it, and make one stream per page, and then merge them after creation?
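One approach that may work without merging afterwards is to import each page of the existing PDF into the document being written, rather than copying the attachment's raw bytes into the output stream (two PDF files appended byte for byte do not form one valid PDF). The following is only a sketch, assuming iTextSharp 5.x; the class name AttachmentImporter and the method AppendPdf are hypothetical, the host document is assumed to be A4 as in the question, and rotated source pages are not handled.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class AttachmentImporter
{
    // Appends every page of an existing PDF to a document that is
    // already open for writing through the given writer.
    public static void AppendPdf(Document document, PdfWriter writer, byte[] existingPdf)
    {
        // The reader is deliberately not closed here: the writer may still
        // read from it until the document itself is closed.
        PdfReader reader = new PdfReader(existingPdf);
        PdfContentByte canvas = writer.DirectContent;

        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
        {
            // Start a new page with the size of the imported page.
            document.SetPageSize(reader.GetPageSizeWithRotation(pageNumber));
            document.NewPage();

            // Draw the imported page onto the new page at the origin.
            PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
            canvas.AddTemplate(page, 0, 0);
        }

        // Assumption: later pages of the host document should be A4 again.
        document.SetPageSize(PageSize.A4);
    }
}
```

In the loop from the question, a call such as AttachmentImporter.AppendPdf(document, pdfWriter, attachmentBytes) would take the place of retrievedAttachment.CopyTo(memoryStream). It is probably also worth reading memoryStream.ToArray() only after the document has been closed, since the PDF is not complete before that point.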

Related

iText7 - CopyPagesTo from pdfDocument to another and write to MemoryStream

I have two html strings, I'm trying to write both of them to 2 memoryStreams, then read one pdfDocument and CopyPagesTo to the other pdfDocument, then finally return the memoryStream with the combined documents.
{
    string Document = "<p>Hello to All</p>";
    string newPage = "<p>Additional Page</p>";
    var mainStream = new MemoryStream();
    using (PdfWriter pdfWriter = new PdfWriter(mainStream))
    {
        pdfWriter.SetCloseStream(false);
        using (var document = HtmlConverter.ConvertToDocument(Document, pdfWriter)) { }
    }
    mainStream.Position = 0;
    var newPageStream = new MemoryStream();
    using (PdfWriter pdfWriter1 = new PdfWriter(newPageStream))
    {
        pdfWriter1.SetCloseStream(false);
        using (var document = HtmlConverter.ConvertToDocument(newPage, pdfWriter1)) { }
    }
    mainStream = addNewPage(mainStream, newPageStream);
    return mainStream;
}
Code to add the new page:
public static MemoryStream addNewPage(MemoryStream mainStream, MemoryStream newPageStream)
{
    PdfWriter writer = new PdfWriter(mainStream);
    PdfDocument pdfDoc = new PdfDocument(writer);
    pdfDoc.InitializeOutlines();
    PdfDocument addedDoc = new PdfDocument(new PdfReader(newPageStream));
    addedDoc.CopyPagesTo(1, addedDoc.GetNumberOfPages(), pdfDoc);
    mainStream.Position = 0;
    using (PdfReader reader = new PdfReader(mainStream))
    {
        writer.SetCloseStream(false);
        using (pdfDoc = new PdfDocument(reader, writer))
        {
            int z = pdfDoc.GetNumberOfPages();
        }
    }
    addedDoc.Close();
    return mainStream;
}
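A likely problem in the code above is that a PdfWriter is opened on mainStream while that stream already holds a finished PDF, so the new output and the old bytes end up in the same stream. A simpler arrangement is to write the merged result to a third stream and copy the pages of each source into it. The following is a minimal sketch, assuming iText 7 for .NET and that each source PDF is already available as a byte array; the names PdfMergeSketch and Merge are hypothetical.

```csharp
using System.IO;
using iText.Kernel.Pdf;

// Hypothetical helper, not part of iText.
public static class PdfMergeSketch
{
    // Copies all pages of every source PDF, in order, into one new PDF.
    // At least one source with at least one page is expected; closing a
    // target document without pages raises an error.
    public static byte[] Merge(params byte[][] sources)
    {
        var output = new MemoryStream();
        var target = new PdfDocument(new PdfWriter(output));

        foreach (byte[] source in sources)
        {
            // Open each source read-only from its own stream.
            var sourceDoc = new PdfDocument(new PdfReader(new MemoryStream(source)));
            sourceDoc.CopyPagesTo(1, sourceDoc.GetNumberOfPages(), target);
            sourceDoc.Close();
        }

        // Closing the target finishes the PDF; ToArray() still works
        // on the MemoryStream after it has been closed.
        target.Close();
        return output.ToArray();
    }
}
```

With the streams from the question, this could be called as PdfMergeSketch.Merge(mainStream.ToArray(), newPageStream.ToArray()) once both HTML conversions have finished.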

ItextSharp PDF has no Header

I have created a console app that uses a memory stream to create a PDF file, encrypt it, and then add it as an attachment.
using (Stream output = new MemoryStream())
{
    Document document = new Document();
    using (var stream = new MemoryStream())
    {
        PdfWriter.GetInstance(document, stream);
        document.Open();
        var image = Image.GetInstance(renderedPayslip);
        image.ScaleToFit(600, 820);
        image.SetAbsolutePosition(2, 10);
        document.Add(image);
        using (var newTestStream = new MemoryStream())
        {
            stream.CopyTo(newTestStream);
            newTestStream.Position = 0;
            document.Close();
            PdfReader reader = new PdfReader(stream.ToArray());
            PdfStamper stamper = new PdfStamper(reader, newTestStream);
            PdfEncryptor.Encrypt(reader, newTestStream, true, "secret", "secret", PdfWriter.ALLOW_PRINTING);
            reader.Close();
        }
        //return sm.ToArray();
    }
The problem is that it keeps giving the message "PDF Header not found".
Can someone help, please?
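Two things in the snippet may be contributing: stream.CopyTo(newTestStream) runs before document.Close(), when the PDF is not yet complete, and both a PdfStamper and PdfEncryptor.Encrypt are pointed at the same reader and output stream. The following is a sketch of a simpler ordering, assuming iTextSharp 5.x: finish the plain PDF first, then encrypt its bytes into a separate stream. The names PayslipPdf and CreateEncrypted are hypothetical, and renderedPayslip is assumed to be the image bytes from the question.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class PayslipPdf
{
    public static byte[] CreateEncrypted(byte[] renderedPayslip)
    {
        byte[] plainPdf;
        using (var stream = new MemoryStream())
        {
            var document = new Document();
            PdfWriter.GetInstance(document, stream);
            document.Open();

            var image = Image.GetInstance(renderedPayslip);
            image.ScaleToFit(600, 820);
            image.SetAbsolutePosition(2, 10);
            document.Add(image);

            // Close the document first so the PDF is complete,
            // then take the bytes (ToArray works on a closed stream).
            document.Close();
            plainPdf = stream.ToArray();
        }

        using (var output = new MemoryStream())
        {
            // Encrypt the finished PDF into a separate stream.
            var reader = new PdfReader(plainPdf);
            PdfEncryptor.Encrypt(reader, output, true, "secret", "secret", PdfWriter.ALLOW_PRINTING);
            reader.Close();
            return output.ToArray();
        }
    }
}
```

The returned byte array can then be wrapped in a new MemoryStream for the attachment.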

itextsharp(xmlworker) parsing is slow

I have been using iTextSharp to convert an MVC view to PDF. The view uses inline styling. Everything works fine with the code below, but the parsing is slow:
using (var ms = new MemoryStream())
{
    using (var doc = new Document(PageSize.A4, 0, 1, 0, 0))
    {
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            doc.Open();
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlcontent);
            //Above line is too slow
            doc.Close();
        }
    }
As suggested by the experts here, I made the following modifications:
Registered the fonts
Moved the styling to a separate CSS file
Now I am using the code below, but the generated PDF is blank. It retains the styles but has no fonts, and even this approach takes the same time to parse.
using (var ms = new MemoryStream())
{
using (var doc = new Document(PageSize.A4, 0, 1, 0,0))
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
// css
var cssResolver = new StyleAttrCSSResolver();
var cssFile = XMLWorkerHelper.GetCSS((new FileStream(Server.MapPath("~/Content/scptpdf.css"), FileMode.Open, FileAccess.Read)));
cssResolver.AddCss(cssFile);
// html
var fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.Register(Server.MapPath("~/Content/fonts/arial.ttf"));
fontProvider.Register(Server.MapPath("~/Content/fonts/arialbd."));
fontProvider.AddFontSubstitute("calibri","ARIAL");
var cssAppliers = new CssAppliersImpl(fontProvider);
var htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
var pdf = new PdfWriterPipeline(doc, writer);
var html = new HtmlPipeline(htmlContext, pdf);
var css = new CssResolverPipeline(cssResolver, html);
var worker = new XMLWorker(css,true);
var p = new XMLParser(worker);
byte[] byteArray = Encoding.UTF8.GetBytes(pdftext);
var htmlstream = new MemoryStream(byteArray);
p.Parse(htmlstream);
//XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlcontent);
doc.Close();
}
}
I need to overcome the latency. Can someone help with this? Thanks in advance.
I removed the font types. Now iTextSharp uses its own, and it is fast too.

itextsharp "the document has no pages" error when i have anchor tag

I am converting some HTML to PDF. It works fine, but when I have an anchor tag in my HTML I get the error "the document has no pages".
My code is
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
var st = new StyleSheet();
GetStyleSheetForUnicodeCharacters(st);
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
var parsedHtmlElements = HTMLWorker.ParseToList(sr, st);
foreach (var htmlElement in parsedHtmlElements)
{
pdfDoc.Add(htmlElement as IElement);
}
pdfDoc.Close();
data = ms.ToArray();
}
}
}
}
The problem may be invalid HTML. One way to check is to run your HTML source through a validator such as the W3C Markup Validation Service.
Have you already tried adding a page with:
pdfDoc.NewPage();
I think your code should look like this:
byte[] data;
using (var sr = new StringReader(sw.ToString()))
{
var st = new StyleSheet();
GetStyleSheetForUnicodeCharacters(st);
using (var ms = new MemoryStream())
{
using (var pdfDoc = new Document())
{
using (var w = PdfWriter.GetInstance(pdfDoc, ms))
{
pdfDoc.Open();
pdfDoc.NewPage(); // add Page here
var parsedHtmlElements = HTMLWorker.ParseToList(sr, st);
foreach (var htmlElement in parsedHtmlElements)
{
pdfDoc.Add(htmlElement as IElement);
}
pdfDoc.Close();
data = ms.ToArray();
}
}
}
}
You can also add a blank page by using:
pdfDoc.NewPage();
w.PageEmpty = false;
Kind regards, chris
Check whether any HTML tags are mismatched or malformed. For example, a broken closing tag such as /td> raises the error above.

iTextSharp is producing a corrupt PDF

The code snippet below returns a corrupt PDF document; however, if I return mergedDocument instead, it always returns a valid PDF. mergedDocument is based on a PDF file I created using Word, whereas completedDocument is entirely programmatically generated. The code "works" in that it throws no exceptions. Why is iTextSharp creating a corrupt PDF?
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
mergedDocument = new byte[streamTemplate.Length];
streamTemplate.Position = 0;
streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
completedDocument = new byte[streamCompleted.Length];
streamCompleted.Position = 0;
streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
}
}
return completedDocument;
You need to close the document and copy objects to flush the PDF writing buffer. This, however, causes some problems when trying to read the stream into an array. The fix for that is to use the ToArray() method of the MemoryStream which still works on closed streams. The changes I made have comments on them.
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
//Copy the stream's bytes
mergedDocument = streamTemplate.ToArray();
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
//Close the document and the copy (after the loop, once all pages are added)
document.Close();
copy.Close();
//ToArray() can operate on closed streams
completedDocument = streamCompleted.ToArray();
}
}
return completedDocument;
Also make sure your HTML doesn't contain any hr tags when converting HTML to PDF:
hdnEditorText.Value.Replace("\"", "'").Replace("<hr />", "").Replace("<hr/>", "")
