iTextSharp PDF header with HTML string C#

I'm trying to generate PDF reports with iTextSharp that include customer information, a header, a footer and so on. All of these reports are currently generated using the EVO APIs. As part of a migration, we plan to generate them with the iTextSharp APIs instead.
I need to know whether it is possible to pass a ready-to-render HTML string to the iTextSharp PDF header (the existing EVO design accepts an HTML string and builds the PDF from it) instead of using page events and designing the header with PdfPTable and PdfPCell. The number of reports is huge, and we would like to avoid the rework.

I need to know if there is any possibility to provide a ready to render HTML string to iTextSharp PDF header (Existing EVO design accepts HTML string and build PDF), instead of using PageEvents to design with PDFPTable and PDFPCell
You will have to use page events to draw headers or footers, but there is no need to use PdfPTable explicitly there. You can actually render HTML during a page event, e.g. like this:
[Test]
public void CreatePdfWithHtmlHeader()
{
string htmlHeader = "<!DOCTYPE html><html><body><table style=\"width: 100%; border: 1px solid black;\"><tr><td>A</td><td>B</td></tr></table></body></html>";
using (FileStream output = new FileStream(@"C:\Temp\test-results\content\html-header.pdf", FileMode.Create, FileAccess.Write))
using (Document document = new Document(PageSize.A4))
{
PdfWriter writer = PdfWriter.GetInstance(document, output);
writer.PageEvent = new HtmlPageEventHelper(htmlHeader);
document.Open();
document.Add(new Paragraph("1"));
document.NewPage();
document.Add(new Paragraph("2"));
}
}
This makes use of the following two small helper classes.
HtmlPageEventHelper is a page event listener that draws a given HTML snippet into the page header. It can alternatively or additionally write into the page footer; simply use the appropriate column coordinates:
public class HtmlPageEventHelper : PdfPageEventHelper
{
public HtmlPageEventHelper(string html)
{
this.html = html;
}
public override void OnEndPage(PdfWriter writer, Document document)
{
base.OnEndPage(writer, document);
ColumnText ct = new ColumnText(writer.DirectContent);
XMLWorkerHelper.GetInstance().ParseXHtml(new ColumnTextElementHandler(ct), new StringReader(html));
ct.SetSimpleColumn(document.Left, document.Top, document.Right, document.GetTop(-20), 10, Element.ALIGN_MIDDLE);
ct.Go();
}
string html = null;
}
For more complex HTML snippets you may want to replace the XMLWorkerHelper.GetInstance().ParseXHtml call with a customized parser call as presented in @Skary's answer.
ColumnTextElementHandler is an IElementHandler implementation that adds content (generated e.g. by parsing HTML) to a ColumnText:
public class ColumnTextElementHandler : IElementHandler
{
public ColumnTextElementHandler(ColumnText ct)
{
this.ct = ct;
}
ColumnText ct = null;
public void Add(IWritable w)
{
if (w is WritableElement)
{
foreach (IElement e in ((WritableElement)w).Elements())
{
ct.AddElement(e);
}
}
}
}
By the way, the test above produces a PDF with this content:
[Screenshots of the resulting PDF pages not reproduced.]
Disclaimer: I predominantly work with Java and have not used the XmlWorker before. Thus, this code may have considerable potential for improvement.

I am not sure I have understood your question correctly.
If you are asking how to convert HTML to PDF using iTextSharp, here is the solution I found some time ago:
using (Document document = new Document(size))
{
var writer = PdfWriter.GetInstance(document, stream);
document.Open();
document.NewPage();
document.Add(new Chunk(""));
var tagProcessors = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
tagProcessors.RemoveProcessor(HTML.Tag.IMG);
tagProcessors.AddProcessor(HTML.Tag.IMG, new CustomImageTagProcessor());
var charset = Encoding.UTF8;
CssFilesImpl cssFiles = new CssFilesImpl();
cssFiles.Add(XMLWorkerHelper.GetInstance().GetDefaultCSS());
var cssResolver = new StyleAttrCSSResolver(cssFiles);
cssResolver.AddCss(srcCssData, "utf-8", true);
var hpc = new HtmlPipelineContext(new CssAppliersImpl(new XMLWorkerFontProvider()));
hpc.SetAcceptUnknown(true).AutoBookmark(true).SetTagFactory(tagProcessors);
var htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, writer));
var pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
var worker = new XMLWorker(pipeline, true);
var xmlParser = new XMLParser(true, worker, charset);
xmlParser.Parse(new StringReader(srcFileData));
document.Close();
}
To get it to work, you need to add a custom image tag processor that handles the inline images in the HTML you pass to the above conversion code:
public class CustomImageTagProcessor : iTextSharp.tool.xml.html.Image
{
public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
{
IDictionary<string, string> attributes = tag.Attributes;
string src;
if (!attributes.TryGetValue(HTML.Attribute.SRC, out src))
return new List<IElement>(1);
if (string.IsNullOrEmpty(src))
return new List<IElement>(1);
if (src.StartsWith("data:image/", StringComparison.InvariantCultureIgnoreCase))
{
// data:[<MIME-type>][;charset=<encoding>][;base64],<data>
var base64Data = src.Substring(src.IndexOf(",") + 1);
var imagedata = Convert.FromBase64String(base64Data);
var image = iTextSharp.text.Image.GetInstance(imagedata);
var list = new List<IElement>();
var htmlPipelineContext = GetHtmlPipelineContext(ctx);
list.Add(GetCssAppliers().Apply(new Chunk((iTextSharp.text.Image)GetCssAppliers().Apply(image, tag, htmlPipelineContext), 0, 0, true), tag, htmlPipelineContext));
return list;
}
else
{
return base.End(ctx, tag, currentContent);
}
}
}
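The data URI handling in the tag processor above can be tried out in isolation. The following is only a minimal sketch using nothing but the base class library; the class name DataUriSketch and the method TryDecodeBase64 are made up for illustration and are not part of iTextSharp. Unlike the processor above, it also checks for the ";base64" marker and returns null instead of throwing on malformed input.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: decodes a base64 data URI
// of the form data:[<MIME-type>][;charset=<encoding>];base64,<data>
public static class DataUriSketch
{
    // Returns the decoded bytes, or null if src is not a base64 data URI.
    public static byte[] TryDecodeBase64(string src)
    {
        if (string.IsNullOrEmpty(src) ||
            !src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            return null;

        int comma = src.IndexOf(',');
        if (comma < 0)
            return null;

        // Everything before the first comma is the header part.
        string header = src.Substring(0, comma);
        if (!header.EndsWith(";base64", StringComparison.OrdinalIgnoreCase))
            return null;

        // Trim stray whitespace around the payload (e.g. "base64, iVBOR...").
        string payload = src.Substring(comma + 1).Trim();
        try
        {
            return Convert.FromBase64String(payload);
        }
        catch (FormatException)
        {
            // Payload is not valid base64.
            return null;
        }
    }
}
```

A non-null result could then be passed to iTextSharp.text.Image.GetInstance(byte[]) as in the processor above, while a null result would fall through to base.End.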

Related

IText7 PDFHtml generator with header and footer for C#

I am trying to generate a PDF from predefined HTML content. I managed to generate the content, but without the required HTML header and HTML footer, and the Arabic language is not supported either.
My requirements:
Arabic language support.
The ability to generate more than 10 pages.
The footer may differ from one page to another.
There is a web application that sends a request to a WCF service, and the service returns a byte array containing the PDF.
I have been searching for a good tool for a couple of days. I found SelectPdf, which is perfect except that it is NOT free, so the only remaining option is iText 7 pdfHTML.
The thing is, this library has good documentation for Java, and I am really struggling to follow the C# examples and to convert the Java API calls to C# code.
Has anyone done something similar with C# before?

After a long process of searching and trying, I have got it working and achieved the following features:
Image in the header.
Base64 image in the footer, plus the ability to write some text on the other side of the footer.
Generating the same footer for all pages except for the last one.
An unlimited number of generated pages.
Page numbering.
All the previous features were free of charge, however, supporting Arabic language needs a license, so I have to pay anyway :)
Kindly find below my C# code and post your improvements if you have any.
public class Pdfgenerator
{
public const string FONT = "Fonts/NotoNaskhArabic-Regular2.ttf";
public static string HEADER_TEXT = "<table width=\"100%\" border=\"0\"><tr><td>Header</td><td align=\"right\">Some title</td></tr></table>";
public static string FOOTER_TEXT = "<table width=\"100%\" border=\"0\"><tr><td>Footer</td><td align=\"right\">Some title</td></tr></table>";
public MemoryStream createPdf()
{
string apPath = System.Web.Hosting.HostingEnvironment.ApplicationPhysicalPath;
// Collect the PDF in memory so that the caller can return it as a byte array.
MemoryStream file = new MemoryStream();
// Loaded but not used further in this sample.
PdfFont f = PdfFontFactory.CreateFont(apPath + FONT, PdfEncodings.IDENTITY_H);
string header = "pdfHtml Header and footer example using page-events";
Header headerHandler = new Header(header);
Footer footerHandler = new Footer();
PdfWriter writer1 = new PdfWriter(file);
PdfDocument pdfDocument1 = new PdfDocument(writer1);
pdfDocument1.AddEventHandler(PdfDocumentEvent.START_PAGE, headerHandler);
pdfDocument1.AddEventHandler(PdfDocumentEvent.END_PAGE, footerHandler);
ConverterProperties converterProperties = new ConverterProperties().SetBaseUri(apPath);
HtmlConverter.ConvertToDocument(File.ReadAllText(apPath + "content.html"), pdfDocument1, converterProperties);
// Remember the last page so that the footer handler can skip it.
footerHandler.lastPage = pdfDocument1.GetLastPage();
// Closing the document flushes the remaining pages and closes the underlying stream.
pdfDocument1.Close();
// MemoryStream.ToArray still works after the stream has been closed.
return new MemoryStream(file.ToArray());
}
}
Generating the header:
class Header : IEventHandler
{
private string header;
private Image image;
public Header(string header)
{
string apPath = System.Web.Hosting.HostingEnvironment.ApplicationPhysicalPath;
this.header = header;
image = new Image(ImageDataFactory.Create(apPath + "Images/RANDOM_PHOTO.jpg"));
}
public void HandleEvent(Event @event)
{
PdfDocumentEvent docEvent = (PdfDocumentEvent)@event;
PdfDocument pdf = docEvent.GetDocument();
PdfPage page = docEvent.GetPage();
Rectangle pageSize = page.GetPageSize();
Canvas canvas = new Canvas(new PdfCanvas(page), pdf, pageSize);
canvas.SetFontSize(18);
// Draw the header image onto the page
canvas.Add(image);
canvas.Close();
}
}
Generating the footer:
class Footer : IEventHandler
{
public PdfPage lastPage = null;
protected PdfFormXObject placeholder;
protected float side = 20;
protected float x = 300;
protected float y = 25;
protected float space = 4.5f;
protected float descent = 3;
public Footer()
{
placeholder = new PdfFormXObject(new Rectangle(0, 0, side, side));
}
public void HandleEvent(Event @event)
{
PdfDocumentEvent docEvent = (PdfDocumentEvent)@event;
PdfDocument pdf = docEvent.GetDocument();
PdfPage page = docEvent.GetPage();
int pageNumber = pdf.GetPageNumber(page);
Rectangle pageSize = page.GetPageSize();
// Creates drawing canvas
PdfCanvas pdfCanvas = new PdfCanvas(page);
Canvas canvas = new Canvas(pdfCanvas, pdf, pageSize);
IList<iText.Layout.Element.IElement> elements = HtmlConverter.ConvertToElements("<table border=\"0\"><tr><td><img src=\"data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\" alt=\"Italian Trulli\"></td></tr></table>");
Paragraph p = new Paragraph();
foreach (iText.Layout.Element.IElement e in elements)
{
if (e is IBlockElement) {
p.Add((IBlockElement)e);
}
}
// Draw the footer on every page except the last one.
if (lastPage != docEvent.GetPage())
{
canvas.ShowTextAligned(p, 25, 75, TextAlignment.LEFT);
}
canvas.Close();
// Create placeholder object to write number of pages
pdfCanvas.AddXObject(placeholder, x + space, y - descent);
pdfCanvas.Release();
}
public void writeTotal(PdfDocument pdf)
{
Canvas canvas = new Canvas(placeholder, pdf);
canvas.ShowTextAligned(pdf.GetNumberOfPages().ToString(),
0, descent, TextAlignment.LEFT);
canvas.Close();
}
}
I was trying to get a stream as output, so if you want that as well, you can use the following in your main service:
public byte[] GetData()
{
MemoryStream stream = new Pdfgenerator().createPdf();
return stream.ToArray();
}

iTextSharp HTML to PDF with different orientation

I need to create a single PDF from a few HTML pages. The HTML pages contain tables, and each table has a different number of columns, so the pages have to be exported to the PDF with different orientations.
E.g.:
htmlPg1 --> 4 columns
htmlPg2 --> 15 columns
In this scenario, the first HTML page should come out in portrait mode and the second one in landscape.
In 'Code Block 02' below, lst is a list whose items have two properties (please see 'Code Block 01').
If an item's Oriantation is 0, it is treated as landscape, otherwise as portrait.
Code Block 01
public class PdfExportDoc
{
public int Oriantation { get; set; }
public string Html { get; set; }
}
Everything works correctly except the orientation.
Code Block 02
using (var ms = new MemoryStream())
{
using (var doc = new Document())
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
foreach (var ele in lst)
{
using (var srHtml = new StringReader(ele.Html))
{
if (ele.Oriantation == 0)
{
doc.SetPageSize(PageSize.A4.Rotate());
}
else
{
doc.SetPageSize(PageSize.A4);
}
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
doc.NewPage();
}
}
doc.Close();
}
}
bytes = ms.ToArray();
}
In the resulting PDF, the data of htmlPg1 spreads over two pages and htmlPg2 takes only one. [Screenshot of the actual output not reproduced.]
What I actually need is this: [Screenshot of the desired output not reproduced.]
Please point me in the right direction.

This solved my problem.
I get your point, Bruno. You said in your deleted answer that NewPage will not add a new page if the current page is blank. So I added doc.NewPage both before and after the ParseXHtml call. Anyway, thanks for your earlier pointers.
foreach (var ele in lst)
{
using (var srHtml = new StringReader(ele.Html))
{
if (ele.Oriantation == 1)
{
doc.SetPageSize(PageSize.A4.Rotate());
}
else
{
doc.SetPageSize(PageSize.A4);
}
doc.NewPage();
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
doc.NewPage();
}
}

“corrupt” document when Streaming In Memory Merge Word Document using OpenXML

I am trying to merge several Word documents using OpenXML in ASP.NET MVC 5, but I constantly get a message from Microsoft Word that the document is corrupt.
private Stream GenerateDocument(DocumentType documentType)
{
using (var templateStream = File.OpenRead(GetTemplatePath(documentType)))
{
//some code
var result = documentGenerator.Generate();
return result;
}
}
private Stream MergeDocuments(DocumentLibraryModel documentLibrary)
{
var documentTypes = documentLibrary.DocumentTypes.GetEnumerator();
var mainStream = GenerateDocument(documentTypes.Current);
using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
{
XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);
documentTypes.MoveNext();
while (documentTypes.MoveNext())
{
WordprocessingDocument tempDocument = WordprocessingDocument.Open(GenerateDocument(documentTypes.Current), true);
XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
newBody.Add(tempBody);
mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
mainDocument.MainDocumentPart.Document.Save();
mainDocument.Package.Flush();
}
}
return mainStream;
}
However the document opens as corrupted.
Any ideas?

The problem lies in this:
XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
newBody.Add(tempBody);
You are adding a body to a body, which generates an invalid Word document. A Word document can contain only one Body at a time.
I would recommend cloning the elements instead of parsing XML.
You can do this:
using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
{
mainDocument.MainDocumentPart.Document.Body = new Body();
documentTypes.MoveNext();
while (documentTypes.MoveNext())
{
using (WordprocessingDocument tempDocument = WordprocessingDocument.Open(GenerateDocument(documentTypes.Current), false))
{
foreach (var element in tempDocument.MainDocumentPart.Document.Body.Elements())
{
mainDocument.MainDocumentPart.Document.Body.AppendChild(element.CloneNode(true));
}
}
}
mainDocument.MainDocumentPart.Document.Save();
}

import named destinations in pdf

I am developing an application in which a Word document is converted to PDF. My problem is rather complicated, so please help me out.
My Word document has a TOC, bookmarks, endnotes and hyperlinks. When I save this document as PDF, only the bookmarks are converted. After long research I found that PDF documents do not support bookmark-to-bookmark hyperlinks; a link needs either a page number or a named destination.
So I chose named destinations for this purpose, but I am stuck again, because a simple "Save As" cannot generate named destinations in the PDF. So I printed the Word document to the Adobe PDF printer and got the named destinations as required, but this document has neither bookmarks nor hyperlinks. So I decided to generate two PDFs from the Word document, the first with the "Save As" option and the second by printing:
test.pdf (by Save As) (contains bookmarks, hyperlinks)
test_p.pdf (by printing) (only contains named destinations)
Then I researched once again and found a way to extract all named destinations from test_p.pdf into XML using an iTextSharp function. Unfortunately, I have not found any way to import this XML back into test.pdf, which is why I came here.
Please guide me on what to do next if this approach is OK; otherwise, suggest any other approach to accomplish this.

I wrote a class to replace URLs in my PDF files some time ago:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using iTextSharp.text.pdf;
namespace ReplaceLinks
{
public class ReplacePdfLinks
{
Dictionary<string, PdfObject> _namedDestinations;
PdfReader _reader;
public string InputPdf { set; get; }
public string OutputPdf { set; get; }
public Func<Uri, string> UriToNamedDestination { set; get; }
public void Start()
{
updatePdfLinks();
saveChanges();
}
private PdfArray getAnnotationsOfCurrentPage(int pageNumber)
{
var pageDictionary = _reader.GetPageN(pageNumber);
var annotations = pageDictionary.GetAsArray(PdfName.ANNOTS);
return annotations;
}
private static bool hasAction(PdfDictionary annotationDictionary)
{
return annotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK);
}
private static bool isUriAction(PdfDictionary annotationAction)
{
return annotationAction.Get(PdfName.S).Equals(PdfName.URI);
}
private void replaceUriWithLocalDestination(PdfDictionary annotationAction)
{
var uri = annotationAction.Get(PdfName.URI) as PdfString;
if (uri == null)
return;
if (string.IsNullOrWhiteSpace(uri.ToString()))
return;
var namedDestination = UriToNamedDestination(new Uri(uri.ToString()));
if (string.IsNullOrWhiteSpace(namedDestination))
return;
PdfObject entry;
if (!_namedDestinations.TryGetValue(namedDestination, out entry))
return;
annotationAction.Remove(PdfName.S);
annotationAction.Remove(PdfName.URI);
var newLocalDestination = new PdfArray();
annotationAction.Put(PdfName.S, PdfName.GOTO);
var xRef = ((PdfArray)entry).First(x => x is PdfIndirectReference);
newLocalDestination.Add(xRef);
newLocalDestination.Add(PdfName.FITH);
annotationAction.Put(PdfName.D, newLocalDestination);
}
private void saveChanges()
{
using (var fileStream = new FileStream(OutputPdf, FileMode.Create, FileAccess.Write, FileShare.None))
using (var stamper = new PdfStamper(_reader, fileStream))
{
stamper.Close();
}
}
private void updatePdfLinks()
{
_reader = new PdfReader(InputPdf);
_namedDestinations = _reader.GetNamedDestinationFromStrings();
var pageCount = _reader.NumberOfPages;
for (var i = 1; i <= pageCount; i++)
{
var annotations = getAnnotationsOfCurrentPage(i);
if (annotations == null || !annotations.Any())
continue;
foreach (var annotation in annotations.ArrayList)
{
var annotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(annotation);
if (!hasAction(annotationDictionary))
continue;
var annotationAction = annotationDictionary.Get(PdfName.A) as PdfDictionary;
if (annotationAction == null)
continue;
if (!isUriAction(annotationAction))
continue;
replaceUriWithLocalDestination(annotationAction);
}
}
}
}
}
To use it:
new ReplacePdfLinks
{
InputPdf = @"test.pdf",
OutputPdf = "mod.pdf",
UriToNamedDestination = uri =>
{
if (uri.Host.ToLowerInvariant().Contains("google.com"))
{
return "entry1";
}
return string.Empty;
}
}.Start();
This sample will modify all of the urls containing google.com to point to a specific named destination "entry1".
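If several hosts need to be redirected to different named destinations, the UriToNamedDestination delegate can be built from a lookup table instead of hard-coded if statements. This is only a sketch under that assumption; the class DestinationMapSketch and the method FromHostTable are hypothetical names introduced here, not part of iTextSharp or of the class above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a delegate suitable for
// ReplacePdfLinks.UriToNamedDestination from a host-to-destination table.
public static class DestinationMapSketch
{
    public static Func<Uri, string> FromHostTable(IDictionary<string, string> hostToDestination)
    {
        return uri =>
        {
            string host = uri.Host.ToLowerInvariant();
            foreach (var pair in hostToDestination)
            {
                // Same substring match as in the usage sample above.
                if (host.Contains(pair.Key))
                    return pair.Value;
            }
            // An empty result makes ReplacePdfLinks leave the link untouched.
            return string.Empty;
        };
    }
}
```

It would be assigned as UriToNamedDestination = DestinationMapSketch.FromHostTable(table), where table maps lower-case host fragments such as "google.com" to destination names such as "entry1".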
And this code creates a sample file to test the above class:
void WriteFile()
{
using (var doc = new Document(PageSize.LETTER))
{
using (var fs = new FileStream("test.pdf", FileMode.Create))
{
using (var writer = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
var blueFont = FontFactory.GetFont("Arial", 12, Font.NORMAL, BaseColor.BLUE);
doc.Add(new Chunk("Go to URL", blueFont).SetAction(new PdfAction("http://www.google.com/", false)));
doc.NewPage();
doc.Add(new Chunk("Go to Test", blueFont).SetLocalGoto("entry1"));
doc.NewPage();
doc.Add(new Chunk("Test").SetLocalDestination("entry1"));
doc.Close();
}
}
}
}

iTextSharp: table in landscape

I'm using iTextSharp to generate a large document. In this document I want some specific pages in landscape. All the rest is portrait. Does anyone know how I can do this?
Starting a new document is not an option.
Thanks!

You can set the document size and it will affect the next pages. Some snippets:
Set up your document somewhere (you know that already):
var document = new Document();
PdfWriter pdfWriter = PdfWriter.GetInstance(
document, new FileStream(destinationFile, FileMode.Create)
);
pdfWriter.SetFullCompression();
pdfWriter.StrictImageSequence = true;
pdfWriter.SetLinearPageMode();
Now loop over your pages (you probably do that as well already) and decide what page size you want per page:
for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++) {
// Define the page size here, _before_ you start the page.
// You can easily switch from landscape to portrait to whatever
document.SetPageSize(new Rectangle(600, 800));
if (document.IsOpen()) {
document.NewPage();
} else {
document.Open();
}
}

Try this code:
using System;
using System.IO;
using iText.Kernel.Events;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
namespace iText.Samples.Sandbox.Events
{
public class PageOrientations
{
public static readonly String DEST = "results/sandbox/events/page_orientations.pdf";
public static readonly PdfNumber PORTRAIT = new PdfNumber(0);
public static readonly PdfNumber LANDSCAPE = new PdfNumber(90);
public static readonly PdfNumber INVERTEDPORTRAIT = new PdfNumber(180);
public static readonly PdfNumber SEASCAPE = new PdfNumber(270);
public static void Main(String[] args)
{
FileInfo file = new FileInfo(DEST);
file.Directory.Create();
new PageOrientations().ManipulatePdf(DEST);
}
protected void ManipulatePdf(String dest)
{
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
// The default page orientation is set to portrait in the custom event handler.
PageOrientationsEventHandler eventHandler = new PageOrientationsEventHandler();
pdfDoc.AddEventHandler(PdfDocumentEvent.START_PAGE, eventHandler);
Document doc = new Document(pdfDoc);
doc.Add(new Paragraph("A simple page in portrait orientation"));
eventHandler.SetOrientation(LANDSCAPE);
doc.Add(new AreaBreak());
doc.Add(new Paragraph("A simple page in landscape orientation"));
eventHandler.SetOrientation(INVERTEDPORTRAIT);
doc.Add(new AreaBreak());
doc.Add(new Paragraph("A simple page in inverted portrait orientation"));
eventHandler.SetOrientation(SEASCAPE);
doc.Add(new AreaBreak());
doc.Add(new Paragraph("A simple page in seascape orientation"));
doc.Close();
}
private class PageOrientationsEventHandler : IEventHandler
{
private PdfNumber orientation = PORTRAIT;
public void SetOrientation(PdfNumber orientation)
{
this.orientation = orientation;
}
public void HandleEvent(Event currentEvent)
{
PdfDocumentEvent docEvent = (PdfDocumentEvent) currentEvent;
docEvent.GetPage().Put(PdfName.Rotate, orientation);
}
}
}
}
