iText7 Merge of Multiple PDF MemoryStreams Not Working

I am trying to generate a single PDF file from multiple MemoryStreams. I am having a lot of trouble determining the proper way to merge two PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both sources. It seems simple, and I think the code below is set up properly, but the resulting PDF MemoryStream does not contain the merged documents.
I have found multiple approaches documented on the Internet as the "proper" way to do the merge. The actual sample code for iText 7 seems unusually complex: it repeatedly mixes multiple concepts into one sample instead of reducing each concept to the simplest possible code, and so fails to demonstrate the simple cases. For instance, the PdfMerger documentation has no sample code at all (nor does anything else I looked at in the class documentation). The online examples always mix merging from files (not MemoryStreams or byte[]) with other concepts such as adding page numbers or a table of contents, so they never show one concept on its own and never start with anything other than files.
My PDFs come out of a database; we just need to merge them into one PDF MemoryStream and save it back out. My concern is that I may not be creating the MemoryStream properly when I initialize the PdfWriter. Since none of the samples initialize it with anything but an actual file, I was unable to confirm this. I also fully qualified all objects in the code because I want to leave the old iTextSharp code in place while I upgrade to iText 7; this ensures an iTextSharp object of the same name isn't used inadvertently.
Also, to make the source as easy as possible to read, I removed some of the declarations and initializations of the objects being used. I traced through the code, and all values are loaded properly. I assume the problem is that I didn't prepare the PDF objects properly, or that I have to do something special with the PdfWriter on the destination PDF document (ms) before creating the PdfMerger object.
List<byte[]> streams = new List<byte[]>();
somelist.ForEach(item=>
{
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
pdfWriter.SetCloseStream(false);
HtmlConverter.ConvertToPdf(strContent, pdfWriter);
streams.Add(workStream.ToArray());
pdfWriter.Close();
}
});
MemoryStream ms = new MemoryStream();
PdfWriter writer = new PdfWriter(ms);
PdfDocument document = new PdfDocument(writer);
PdfMerger merger = new PdfMerger(document);
streams.ForEach(stream =>
{
Stream msDoc = new MemoryStream(stream);
PdfDocument doc = new PdfDocument(new PdfReader(msDoc));
merger.Merge(doc, 1, doc.GetNumberOfPages());
doc.Close();
});
ByteContent = ms.ToArray();
document.Close();

Merging is a really straightforward process:
var SourceDocument1 = new PdfDocument(new PdfReader(SRC));
var SourceDocument2 = new PdfDocument(new PdfReader(SRC1));
byte[] result;
using (var memoryStream = new MemoryStream())
{
var pdfWriter = new PdfWriter(memoryStream);
var pdfDocument = new PdfDocument(pdfWriter);
PdfMerger merge = new PdfMerger(pdfDocument);
merge.Merge(SourceDocument1, 1, SourceDocument1.GetNumberOfPages())
.Merge(SourceDocument2, 1, SourceDocument2.GetNumberOfPages());
merge.Close();
result = memoryStream.ToArray();
}
File.WriteAllBytes(@"C:\temp\file.pdf", result);
This will merge SRC with SRC1.
There are a lot of examples on Github, such as this one (there is also a whole folder with merge examples).
I am writing the destination document in the end, just to make sure it's being created correctly, but you can do whatever you want to with the MemoryStream, of course.
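For the in-memory case from the question, the same idea can be sketched as follows. This assumes the iText 7 for .NET packages are referenced; `PdfMergeSketch` and `MergeAll` are hypothetical names, not part of iText. One likely reason the question's code produces an incomplete PDF is that `ms.ToArray()` runs before `document.Close()`: iText finishes writing the file only when the target document is closed, so the sketch reads the bytes afterwards.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PdfMergeSketch
{
    // Hypothetical helper: merges in-memory PDFs into a single in-memory PDF.
    public static byte[] MergeAll(IEnumerable<byte[]> sources)
    {
        using (var output = new MemoryStream())
        {
            var target = new PdfDocument(new PdfWriter(output));
            var merger = new PdfMerger(target);

            foreach (byte[] source in sources)
            {
                // Each source PDF is opened from its own MemoryStream.
                var doc = new PdfDocument(new PdfReader(new MemoryStream(source)));
                merger.Merge(doc, 1, doc.GetNumberOfPages());
                doc.Close();
            }

            // Closing the target completes the PDF; do this BEFORE reading the bytes.
            target.Close();

            // MemoryStream.ToArray() still works after the stream has been closed.
            return output.ToArray();
        }
    }
}
```

With the question's variables this would be called as `ByteContent = PdfMergeSketch.MergeAll(streams);`.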

Related

Read a PDF document in Blazor WebAssembly with iText7

I am having a bit of a struggle reading a PDF using iText7 in Blazor WebAssembly.
The InputFile component creates an IBrowserFile:
<div>
<InputFile OnChange="@OnFileSelection"></InputFile>
<div class="row">
<textarea>@outputText</textarea>
</div>
</div>
I can then read the file with a Stream, and iText7 will supposedly read that, but it won't give a page count or anything else that I have tried. Execution doesn't seem to get past the line that creates the reader, and never reaches the pageCount.
int pageCount = 0;
IBrowserFile pdfFile = e.File;
Stream stream = pdfFile.OpenReadStream();
PdfDocument pdfDoc = new PdfDocument(new PdfReader(stream));
pageCount = pdfDoc.GetNumberOfPages();
stream.Close();
outputText = $"{pageCount}";
StateHasChanged();
I have also tried reading the Stream into a MemoryStream first, same outcome. I have followed the information here:
https://learn.microsoft.com/en-us/aspnet/core/blazor/file-uploads?view=aspnetcore-6.0&pivots=webassembly
Same outcome.
Is there a way to handle the PDF file in such a way as the functionality of iText7 remains intact, so you can get page counts, extracted text etc?
The file I am testing on is below the 500kb limit, it is 66kb. I don't need to display the PDF - I just need to know what the contents of it are ideally on a page by page basis, but for now, simply being able to read a page or get a page count would be a big step forward.

If you look at your developer console, you'll find that it's emitting the error:
Synchronous reads are not supported.
You'll notice you aren't using await anywhere, and, unfortunately, neither is iText7. Blazor strongly enforces use of asynchronous semantics and if it's violated, you'll see an error like this.
Fortunately, you can still make this work. You said:
I have also tried reading the Stream into a MemoryStream first, same outcome.
You should show what you tried, but my hunch is it looked something like this:
var copy = new MemoryStream();
stream.CopyTo(copy);
copy.Position = 0;
PdfDocument pdfDoc = new PdfDocument(new PdfReader(copy));
This will lead to the exact same error on the line .CopyTo. And for the same reasons. If you instead make the copy process properly use async semantics, it will work:
var copy = new MemoryStream();
await stream.CopyToAsync(copy);
copy.Position = 0;
PdfDocument pdfDoc = new PdfDocument(new PdfReader(copy));
Notice await stream.CopyToAsync(copy);. You'll need to make your surrounding method async in order for the await to work, but the return type ought to be a Task already. (And if it isn't, you can make it so.)
Using this, I was able to see the page count display in your text area.
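The copy step can be wrapped in a small helper so the rest of the handler only ever sees a fully buffered stream. This is a sketch using only the base class library; `StreamBuffering` and `BufferAsync` are hypothetical names introduced for illustration.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamBuffering
{
    // Hypothetical helper: copies any stream into memory using asynchronous reads only.
    public static async Task<MemoryStream> BufferAsync(Stream source)
    {
        var copy = new MemoryStream();
        await source.CopyToAsync(copy);
        copy.Position = 0; // rewind so the consumer reads from the start
        return copy;
    }
}
```

Inside an async handler (assumed here to have the shape `async Task OnFileSelection(InputFileChangeEventArgs e)`), the result of `await StreamBuffering.BufferAsync(e.File.OpenReadStream())` can be passed to `new PdfReader(...)`, because a MemoryStream supports the synchronous reads that iText7 performs.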

How to save one pdf in different places

I create a PDF file using PdfStamper, and I want to save it to two different files (that is, change the path given to the PdfStamper). Do I need to create a new PdfStamper, or is there a way to save the same file in multiple places?
// that's my code
PdfStamper stamper = new PdfStamper(rdr, new System.IO.FileStream(path, System.IO.FileMode.Create));

If I understand you correctly, you need to put the same file in different places, right? It seems to me the most logical thing is to perform all the necessary operations on one PDF file and then make a copy of it using the method System.IO.File.Copy(path, new_path);
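A minimal sketch of that suggestion, using only System.IO. `PdfCopySketch` and `CopyTo` are hypothetical names; the sketch assumes the stamper has already been closed (stamper.Close()) so that the file at the source path is complete before it is copied.

```csharp
using System.IO;

public static class PdfCopySketch
{
    // Hypothetical helper: duplicates a finished file to one or more additional locations.
    public static void CopyTo(string sourcePath, params string[] destinationPaths)
    {
        foreach (string destination in destinationPaths)
        {
            // overwrite: true replaces an existing file instead of throwing.
            File.Copy(sourcePath, destination, overwrite: true);
        }
    }
}
```

An alternative design is to have the stamper write into a MemoryStream and then call File.WriteAllBytes once per destination, which avoids reading the first file back from disk.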

Writing a multipaged tif file from memorystream vs filestream?

I am trying to use the LibTiff.Net library and am rewriting the TiffCP merge tool to use memory streams.
This library has a Tiff class, and when you pass a stream to this class, it can merge TIFF images into that stream.
For testing, I passed a FileStream and got what I wanted: it merged the images and I was able to see a multipage TIFF.
But when I pass a MemoryStream, I can verify that the page data is being added to the stream as I loop through, yet when I write it to a file at the end, I can see only the first page.
var mso = new MemoryStream();
var fso = new FileStream(@"C:\test\ttest.tif",FileMode.OpenOrCreate); //This works
using (Tiff outImage = Tiff.ClientOpen("custom", "w", mso, tso))
{
//...
//..
System.Drawing.Image tiffImg = System.Drawing.Image.FromStream(mso, true);
tiffImg.Save(@"C:\test\test2.tiff", System.Drawing.Imaging.ImageFormat.Tiff);
tiffImg.Dispose();
//..
//..
}
P.S.: I need it in a MemoryStream because of some folder permissions on servers and vendor API reasons.

You are probably using the memory stream before the data has actually been written into it.
Please call the Tiff.Flush() method before accessing the data in the memory stream, and make sure you call Tiff.WriteDirectory() for each page you create.
EDIT:
Please also take a look at Bob Powell's article on Generating Multi-Page TIFF files. The article shows how to use EncoderParameters to actually generate a multipage TIFF.
Using
tiffImg.Save(@"C:\test\test2.tiff", System.Drawing.Imaging.ImageFormat.Tiff);
you are probably saving only the first frame.
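The EncoderParameters approach can be sketched roughly as follows. This is not code from the article; it assumes System.Drawing (GDI+, Windows) is available and that each page is already loaded as a separate Image. `MultiPageTiffSketch` and `Save` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

public static class MultiPageTiffSketch
{
    // Hypothetical helper: writes every image in 'pages' as one frame of a single TIFF file.
    public static void Save(IReadOnlyList<Image> pages, string path)
    {
        ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Tiff.Guid);

        using (var parameters = new EncoderParameters(1))
        {
            // The first frame opens the file in multi-frame mode.
            parameters.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
            pages[0].Save(path, tiffCodec, parameters);

            // Each further image is appended as a new page.
            for (int i = 1; i < pages.Count; i++)
            {
                parameters.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
                pages[0].SaveAdd(pages[i], parameters);
            }

            // Flush finalizes the multi-frame file.
            parameters.Param[0] = new EncoderParameter(Encoder.SaveFlag, (long)EncoderValue.Flush);
            pages[0].SaveAdd(parameters);
        }
    }
}
```

Image.Save also has an overload that takes a Stream in place of the path, which may suit the MemoryStream requirement in the question.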

Convert HTML with CSS to PDF using iTextSharp

I am working on an ASP.NET website in C#. I want to convert an HTML DIV that contains various HTML elements such as divs, labels, tables, and images with CSS styles (background color, cssClass, etc.) into a PDF using the iTextSharp DLL, but I am facing an issue: the CSS is not being applied. Can anyone help by providing an example or code snippet?

Install 2 NuGet packages iTextSharp and itextsharp.xmlworker and use the following code:
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
byte[] pdf; // result will be here
var cssText = File.ReadAllText(MapPath("~/css/test.css"));
var html = File.ReadAllText(MapPath("~/css/test.html"));
using (var memoryStream = new MemoryStream())
{
var document = new Document(PageSize.A4, 50, 50, 60, 60);
var writer = PdfWriter.GetInstance(document, memoryStream);
document.Open();
using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
{
using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlMemoryStream, cssMemoryStream);
}
}
document.Close();
pdf = memoryStream.ToArray();
}

Check out Pechkin, a C# wrapper for wkhtmltopdf.
Specifically at this point in time (considering a pending pull request) I'd check out this fork that addresses a couple of bugs (particularly helpful in IIS based on my experience).
If you don't go with the fork, or you run into other stability issues, you may want to look at having some kind of "render queue" (e.g. in a database) and a background process (e.g. a Windows service) that periodically runs over the queue, renders the content, and stores the binary output somewhere (either in the database as well, or on the file system). This depends entirely on your use case, though.
Alternatively, consider the similar solution that @DaveDev linked to in a comment.

How to store formatted snippets of Microsoft Word documents in sql server

I need to extract formatted text snippets from a Word document and store them in an SQL Server table for later processing, and then reinsert them into the Word document using C#.
I've had a look at the Word DOM, and it seems that I need to use a combination of the Document.Load(), Document.Save(), Range.Copy(), and Range.Paste() methods to create a file for each snippet, which I then load into the DB.
Isn't there an easier (more efficient) way?
By the way, the code snippets can be hidden text, and I was thinking about storing the snippets as RTF.

In the end I used Aspose.Words for .NET to extract the code snippets I'm interested in from the Word file and store them as RTF:
// Get interesting code snippets (in this case text runs with
// style "tw4winMark")
Document sourceDocument = new Document(fileName);
var runs = sourceDocument.GetChildNodes(NodeType.Run, true)
.OfType<Run>()
.Where(r => r.Font.StyleName == "tw4winMark").ToList();
// Store snippets into temporary document
// Read Aspose documentation for details
Document document = new Document();
if (runs.Count > 0) {
NodeImporter nodeImporter = new NodeImporter(
runs[0].Document,
document,
ImportFormatMode.KeepSourceFormatting
);
foreach (Run run in runs) {
Run importedRun = nodeImporter.ImportNode(run, true) as Run;
importedRun.Font.Hidden = false;
document.Sections[0].Body.Paragraphs[0].AppendChild(importedRun);
}
}
// save temporary document in MemoryStream as RTF
RtfSaveOptions saveOptions = new RtfSaveOptions();
MemoryStream ms = new MemoryStream();
document.Save(ms, saveOptions);
// retrieve RTF from MemoryStream
ms.Seek(0, SeekOrigin.Begin);
StreamReader sr = new StreamReader(ms);
string rtf = sr.ReadToEnd();
One can then store the RTF in a text field of the database as usual and edit it in an RTF text control.

Load the document, select the range via a Range object, then use the XML property of the Range object to get the XML of that range and store it.
You can later insert the XML into another document using the reverse process.
Editing the snippets might prove interesting, though, because I'm not aware of any web-based, Word-compatible editors.
