Conver itext7 PdfDocument to byte array in c# - c#

I am using itext7 pdfhtml (4.0.3) to convert Html to pdf in memory. Below method is taking html in memory and returning PdfDocument object of itext7. I need to convert that PdfDocument object to byte array or stream.
Please let me know how we can achieve that.
private iText.Kernel.Pdf.PdfDocument CreatePdf( string html)
{
byte[] bytes = Encoding.ASCII.GetBytes(html);
ConverterProperties properties = new ConverterProperties();
properties.SetBaseUri(path);
MemoryStream myMemoryStream = new MemoryStream(bytes);
PdfWriter writer = new(myMemoryStream);
iText.Kernel.Pdf.PdfDocument pdf = new iText.Kernel.Pdf.PdfDocument(writer);
pdf.SetDefaultPageSize(PageSize.A4);
pdf.SetTagged();
HtmlConverter.ConvertToDocument(html,pdf,properties);
return pdf;
}

to #mkl's point, you're kind of overdoing it. Here's a simple example:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
var pdf = new PdfDocument(new PdfWriter(memoryStream));
pdf.SetDefaultPageSize(PageSize.A4);
pdf.SetTagged();
HtmlConverter.ConvertToPdf(html, pdf, new ConverterProperties());
result = memoryStream.ToArray();
}
File.WriteAllBytes(#"/tmp/file.pdf", result);
the memoryStream will have your in memory representation of your conversion. I've added the WriteAllBytes bits just so you can see for yourself.
Another note, if you do not require setting any PdfDocument properties, you can use an even simpler version:
var html = "<h1>hi mom</h1>";
byte[] result;
using (var memoryStream = new MemoryStream())
{
HtmlConverter.ConvertToPdf(html, memoryStream);
result = memoryStream.ToArray();
}
File.WriteAllBytes(#"/tmp/file.pdf", result);

Related

iTextSharp throws an exception when it reads a PDF created with an empty signature container (to embed the received signature from the client)

Help please. I'm trying to get a "deferred" digital signature and embed it in a PDF document. On the server (.net core 3.1) I create a test pdf and add an empty container for the signature. To do this, I use the iTextSharp package (version 5.5.13.3):
public static byte[] GetPdfBytesToSign(byte[] unsignedPdf, string signatureFieldName)
{
//unsignedPdf - source PDF
using (PdfReader reader = new PdfReader(unsignedPdf))
{
//looking for existing signatures
int existingSignaturesNumber = reader.AcroFields.GetSignatureNames().Count;
using (MemoryStream tempPdf = new MemoryStream())
{
//add an empty container for a new signature
using (PdfStamper st = PdfStamper.CreateSignature(reader, tempPdf, '\0', null, true))
{
PdfSignatureAppearance appearance = st.SignatureAppearance;
appearance.SetVisibleSignature(new Rectangle(205, 700, 390, 800), reader.NumberOfPages,
//set the name of the field, it will be needed later to insert a signature
$"{signatureFieldName}{existingSignaturesNumber + 1}");
ExternalBlankSignatureContainer external = new ExternalBlankSignatureContainer(PdfName.ADOBE_PPKLITE, PdfName.ADBE_PKCS7_DETACHED);
MakeSignature.SignExternalContainer(appearance, external, BufferSize);
//get a stream that contains the sequence we want to sign
using (Stream contentStream = appearance.GetRangeStream())
{
MemoryStream memoryStream = new MemoryStream();
contentStream.CopyTo(memoryStream);
byte[] bytesToSign = memoryStream.ToArray();
return bytesToSign;
}
}
}
}
}
And here is the problem!!! For some reason, iText can't read the PDF it created with an empty signature container (to embed the received signature from the client).
public static byte[] GetSignedPdf(byte[] unsignedPdfWithContainer, byte[] signature)
{
using var pdfReader = new PdfReader(unsignedPdfWithContainer);//<--throwing an exception here
using var output = new MemoryStream();
var external = new MyExternalSignatureContainer(signature);
MakeSignature.SignDeferred(pdfReader, SignatureFieldName, output, external);
return output.ToArray();
}
I am getting the following exception:
iTextSharp.text.exceptions.InvalidPdfException: "Unexpected '>>' at file pointer 1432"
Test(source) pdf is created like this:
public static byte[] CreatePdf()
{
using (MemoryStream ms = new MemoryStream())
{
Document document = new Document(PageSize.A4, 25, 25, 30, 30);
PdfWriter writer = PdfWriter.GetInstance(document, ms);
document.Open();
document.Add(new Paragraph("Hello World!"));
document.Close();
writer.Close();
return ms.ToArray();
}
}

itext7 CopyPagesTo not opening PDF

I am trying to add a Cover Page PDF file to another PDF file. I am using CopyPagesTo method. CoverPageFilePath will go before any pages in the pdfDocumentFile. I then need to rewrite that new file to the same location. When I run the code and open the new pdf file I get an error about it being damaged.
public static void iText7MergePDF()
{
byte[] modifiedPdfInBytes = null;
string pdfCoverPageFilePath = #"PathtoCoverPage\Cover Page.pdf";
PdfDocument pdfDocumentCover = new PdfDocument(new iText.Kernel.Pdf.PdfReader(pdfCoverPageFilePath));
string pdfDocumentFile =#"PathtoFullDocument.pdf";
var buffer = File.ReadAllBytes(pdfDocumentFile);
using (var originalPdfStream = new MemoryStream(buffer))
using (var modifiedPdfStream = new MemoryStream())
{
var pdfReader = new iText.Kernel.Pdf.PdfReader(originalPdfStream);
var pdfDocument = new PdfDocument(pdfReader, new PdfWriter(modifiedPdfStream));
int numberOfPages = pdfDocumentCover.GetNumberOfPages();
pdfDocumentCover.CopyPagesTo(1, numberOfPages, pdfDocument);
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
}
System.IO.File.WriteAllBytes(pdfGL, modifiedPdfInBytes);
}
Whenever you have some other type, like a StreamWriter, or here a PdfWriter writing to a Stream, it may not write all the data to the Stream immediately.
Here you Close the pdfDocument for all the data to be written to the MemoryStream.
ie this
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
Should be
pdfDocument.Close();
modifiedPdfInBytes = modifiedPdfStream.ToArray();

Generate one pdf document with multiple pages converting from html using IText 7

I'm working with IText 7, I've been able to get one html page and generate a pdf for that page, but I need to generate one pdf document from multiple html pages and separated by pages. For example: I have Page1.html, Page2.html and Page3.html. I will need a pdf document with 3 pages, the first page with the content of Page1.html, second page with the content of Page2.html and like that...
This is the code I have and it's working for one html page:
ConverterProperties properties = new ConverterProperties();
PdfWriter writer = new PdfWriter(pdfRoot, new WriterProperties().SetFullCompressionMode(true));
PdfDocument pdfDocument = new PdfDocument(writer);
pdfDocument.AddEventHandler(PdfDocumentEvent.END_PAGE, new HeaderPdfEventHandler());
HtmlConverter.ConvertToPdf(htmlContent, pdfDocument, properties);
Is it possible to loop against the multiple html pages, add a new page to the PdfDocument for every html page and then have only one pdf generated with one page per html page?
UPDATE
I've been following this example and trying to translate it from Java to C#, I'm trying to use PdfMerger and loop around the html pages... but I'm receiving the Exception Cannot access a closed stream, on this line:
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
It looks like is related to the ByteArrayOutputStream baos instance. Any suggestions? This is my current code:
foreach (var html in htmlList)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
merger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfDocument.Close();
You are using RandomAccessSourceFactory and passing there a closed stream which you wrote a PDF document into. RandomAccessSourceFactory expects an input stream instead that is ready to be read.
First of all you should use MemoryStream which is native to .NET world. ByteArrayOutputStream is the class that was ported from Java for internal purposes (although it extends MemoryStream as well). Secondly, you don't have to use RandomAccessSourceFactory - there is a simpler way.
You can create a new MemoryStream instance from the bytes of the MemoryStream that you used to create a temporary PDF with the following line:
baos = new MemoryStream(baos.ToArray());
As an additional remark, it's better to close PdfMerger instance directly instead of closing the document - closing PdfMerger closes the underlying document as well.
All in all, we get the following code that works:
foreach (var html in htmlList)
{
MemoryStream baos = new MemoryStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
baos = new MemoryStream(baos.ToArray());
temp = new PdfDocument(new PdfReader(baos, rp));
pdfMerger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfMerger.Close();
Maybe not so succinctly. I use "using". Similar answer
private byte[] CreatePDF(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
//Create one pdf document
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
//Create one pdf merger
var pdfMerger = new PdfMerger(pdfDoc);
//Create two identical pdfs
for (int i = 0; i < 2; i++)
{
using (var newStream = new MemoryStream(CreateDocument(html)))
{
ReaderProperties rp = new ReaderProperties();
using (var newPdf = new PdfDocument(new PdfReader(newStream, rp)))
{
pdfMerger.Merge(newPdf, 1, newPdf.GetNumberOfPages());
}
}
}
}
binData = workStream.ToArray();
}
}
return binData;
}
Create pdf
private byte[] CreateDocument(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
ConverterProperties props = new ConverterProperties();
using (var document = HtmlConverter.ConvertToDocument(html, pdfDoc, props))
{
}
}
binData = workStream.ToArray();
}
}
return binData;
}

How to convert DocX from Xceed.Words.NET library to pdf and save it in a memory stream

I want to convert word byte array to pdf byte array.
I am using Xceed.Words.NET library
var stream = new MemoryStream(sourceFile.AttachedFile);
var doc = DocX.Load(stream);
var ms = new MemoryStream();
doc.SaveAs(ms);
var wByteArray = ms.GetBuffer();
Use this:
var stream = new MemoryStream(sourceFile.AttachedFile);
using (var document = DocX.Load(stream))
{
stream = new MemoryStream();
DocX.ConvertToPdf(document, stream);
}
var bytes = stream.ToArray();
As mentioned in the comment, you need a professional version of DocX library to convert a Word document to PDF.
If you're looking for free solution then perhaps you could try out GemBox.Document, its free version does support converting to PDF, but it has a document size limitation.
You can use it like this:
ComponentInfo.SetLicense("FREE-LIMITED-KEY");
var stream = new MemoryStream(sourceFile.AttachedFile);
var document = DocumentModel.Load(stream, LoadOptions.DocxDefault);
stream = new MemoryStream();
document.Save(stream, SaveOptions.PdfDefault);
var bytes = stream.ToArray();

PDFsharp save to MemoryStream

I want to save a PdfSharp.Pdf.PdfDocument by its Save method to a Stream, but it doesn't attach the PDF header settings to it. So when I read back the Stream and return it to the user, he see that the PDF file is invalid. Is there a solution to attach the PDF header settings when PDFsharp saves to memory?
If you think there is an issue with PdfDocument.Save, then please report this on the PDFsharp forum (but please be more specific with your error description).
Your "solution" looks like a hack to me.
"pdfRenderer.Save" calls "PdfDocument.Save" internally.
Whatever the problem is - your "solution" still calls the same Save routine.
Edit:
To get a byte[] containing a PDF file, you only have to call:
MemoryStream stream = new MemoryStream();
document.Save(stream, false);
byte[] bytes = stream.ToArray();
Early versions of PDFsharp do not reset the stream position.
So you have to call
ms.Seek(0, SeekOrigin.Begin);
to reset the stream position before reading from the stream; this is no longer required for current versions.
Using ToArray can often be used instead of reading from the stream.
Edit 2: instead of stream.ToArray() it may be more efficient to use stream.GetBuffer(), but this buffer is usually larger than the PDF file and you only have to use stream.Length bytes from that buffer. Very useful for method that take a byte[] along with a length parameter.
So the solution:
MigraDoc.DocumentObjectModel.Document doc = new MigraDoc.DocumentObjectModel.Document();
MigraDoc.Rendering.DocumentRenderer renderer = new DocumentRenderer(doc);
MigraDoc.Rendering.PdfDocumentRenderer pdfRenderer = new MigraDoc.Rendering.PdfDocumentRenderer();
pdfRenderer.PdfDocument = pDoc;
pdfRenderer.DocumentRenderer = renderer;
using (MemoryStream ms = new MemoryStream())
{
pdfRenderer.Save(ms, false);
byte[] buffer = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Flush();
ms.Read(buffer, 0, (int)ms.Length);
}
There is this MigraDoc stuff which comes with PdfSharp, but i hardly found any proper doc/faq for it. After hours of googling i've found a snippet which was something like this. Now it works.
I found simpler solution:
byte[] fileContents = null;
using(MemoryStream stream = new MemoryStream())
{
pdfDoc.Save(stream, true);
fileContents = stream.ToArray();
}
Source:
http://usefulaspandcsharp.wordpress.com/2010/03/09/save-a-pdf-to-a-byte-array-using-pdf-sharpmigradoc/
For MigraDoc (ver 1.30) I could save it with
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
renderer.Document = report.m_Document;
renderer.RenderDocument();
using (MemoryStream stream = new MemoryStream())
{
renderer.PdfDocument.Save(stream, false);
... your code in here
}
Thanks Misnyo Solution. But for me it works like this:
Document document = new Document();
PdfDocumentRenderer pdfRenderer = new PdfDocumentRenderer();
//Add to document here.......
//render the document with pdf renderer
pdfRenderer.Document = document;
pdfRenderer.RenderDocument();
//Save renderer result into stream
using(MemoryStream ms = new MemoryStream())
{
pdfRenderer.PdfDocument.Save(ms, false);
byte[] buffer = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Flush();
ms.Read(buffer, 0, (int)ms.Length);
ms.Position = 0;
}

Categories

Resources