Merging XPS documents duplicates the last one - C#

I have a problem merging multiple XPS documents into one. After the merge, the resulting XPS contains the last XPS document duplicated. Here is my merge function (a modified version of the one in this question):
public XpsDocument CreateXPSStream(List<XpsDocument> ListToMerge)
{
    var memoryStream = new MemoryStream();
    Package container = Package.Open(memoryStream, FileMode.Create);
    string pack = "pack://temp.xps";
    PackageStore.RemovePackage(new Uri(pack));
    PackageStore.AddPackage(new Uri(pack), container);
    XpsDocument xpsDoc = new XpsDocument(container, CompressionOption.SuperFast, "pack://temp.xps");
    FixedDocumentSequence seqNew = new FixedDocumentSequence();
    foreach (var sourceDocument in ListToMerge)
    {
        FixedDocumentSequence seqOld = sourceDocument.GetFixedDocumentSequence();
        foreach (DocumentReference r in seqOld.References)
        {
            DocumentReference newRef = new DocumentReference();
            ((IUriContext)newRef).BaseUri = ((IUriContext)r).BaseUri;
            newRef.Source = r.Source;
            seqNew.References.Add(newRef);
        }
    }
    XpsDocumentWriter xpsWriter = XpsDocument.CreateXpsDocumentWriter(xpsDoc);
    xpsWriter.Write(seqNew);
    //xpsDoc.Close();
    //container.Close();
    return xpsDoc;
}
The result is passed to a DocumentViewer, which displays it to the user.

I created the following function, and it works for me.
public void MergeXpsDocument(string newFile, List<XpsDocument> sourceDocuments)
{
    if (File.Exists(newFile))
    {
        File.Delete(newFile);
    }
    XpsDocument xpsDocument = new XpsDocument(newFile, System.IO.FileAccess.ReadWrite);
    XpsDocumentWriter xpsDocumentWriter = XpsDocument.CreateXpsDocumentWriter(xpsDocument);
    FixedDocumentSequence fixedDocumentSequence = new FixedDocumentSequence();
    foreach (XpsDocument doc in sourceDocuments)
    {
        FixedDocumentSequence sourceSequence = doc.GetFixedDocumentSequence();
        foreach (DocumentReference dr in sourceSequence.References)
        {
            DocumentReference newDocumentReference = new DocumentReference();
            newDocumentReference.Source = dr.Source;
            (newDocumentReference as IUriContext).BaseUri = (dr as IUriContext).BaseUri;
            FixedDocument fd = newDocumentReference.GetDocument(true);
            newDocumentReference.SetDocument(fd);
            fixedDocumentSequence.References.Add(newDocumentReference);
        }
    }
    xpsDocumentWriter.Write(fixedDocumentSequence);
    xpsDocument.Close();
}

Related

How to merge all pdf files from a PDF Portfolio to a normal pdf file using C# iText7?

I took this C# example and tried to get the attachments as a PdfDocument, but I couldn't figure out how to do it.
In the end I would like to simply merge every pdf file contained in a portfolio into a single "normal" pdf file. Every non-pdf attachment should be ignored.
Edit:
(Okay, sorry for being too vague. By saying what I want to achieve, I simply wanted to make it easier for you guys to help me. I did not want to make you write the program for me.)
So, here's part of the code from the linked example:
protected void ManipulatePdf(String dest)
{
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
    PdfDictionary root = pdfDoc.GetCatalog().GetPdfObject();
    PdfDictionary names = root.GetAsDictionary(PdfName.Names);
    PdfDictionary embeddedFiles = names.GetAsDictionary(PdfName.EmbeddedFiles);
    PdfArray namesArray = embeddedFiles.GetAsArray(PdfName.Names);
    // Remove the description of the embedded file
    namesArray.Remove(0);
    // Remove the reference to the embedded file.
    namesArray.Remove(0);
    pdfDoc.Close();
}
Instead of removing anything from the source document, I would like to know how to get the PdfDocument object(s) out of the PdfArray if possible.
Sample file:
http://www.mediafire.com/file/c4tw07wci8swdx9/NPort_5000.pdf/file
Solution by mkl ported to C#:
PdfNameTree embeddedFilesTree = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);
IDictionary<string, PdfObject> embeddedFilesMap = embeddedFilesTree.GetNames();
List<PdfStream> embeddedPdfs = new List<PdfStream>();
foreach (PdfObject pdfObject in embeddedFilesMap.Values)
{
    if (!(pdfObject is PdfDictionary))
        continue;
    PdfDictionary filespecDict = (PdfDictionary)pdfObject;
    PdfDictionary embeddedFileDict = filespecDict.GetAsDictionary(PdfName.EF);
    if (embeddedFileDict == null)
        continue;
    PdfStream embeddedFileStream = embeddedFileDict.GetAsStream(PdfName.F);
    if (embeddedFileStream == null)
        continue;
    PdfName subtype = embeddedFileStream.GetAsName(PdfName.Subtype);
    // Equals (as in the Java version) also copes with a missing Subtype entry
    if (!PdfName.ApplicationPdf.Equals(subtype))
        continue;
    embeddedPdfs.Add(embeddedFileStream);
}
if (embeddedPdfs.Count > 0)
{
    PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().SetFullCompressionMode(true));
    PdfDocument flatPdfDocument = new PdfDocument(pdfWriter);
    PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
    RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
    foreach (PdfStream pdfStream in embeddedPdfs)
    {
        PdfReader embeddedReader = new PdfReader(sourceFactory.CreateSource(pdfStream.GetBytes()), new ReaderProperties());
        PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader);
        pdfMerger.Merge(embeddedPdfDocument, 1, embeddedPdfDocument.GetNumberOfPages());
    }
    flatPdfDocument.Close();
}
To merge all pdf files from a PDF Portfolio to a normal pdf file you have to walk the name tree of EmbeddedFiles, retrieve the streams of all PDFs therein, and then merge all these PDFs.
You can do this as follows for a portfolio loaded in a PdfDocument pdfDocument (Java version; the OP edited a port to C# into his question body):
PdfNameTree embeddedFilesTree = pdfDocument.getCatalog().getNameTree(PdfName.EmbeddedFiles);
Map<String, PdfObject> embeddedFilesMap = embeddedFilesTree.getNames();
List<PdfStream> embeddedPdfs = new ArrayList<PdfStream>();
for (Map.Entry<String, PdfObject> entry : embeddedFilesMap.entrySet()) {
    PdfObject pdfObject = entry.getValue();
    if (!(pdfObject instanceof PdfDictionary))
        continue;
    PdfDictionary filespecDict = (PdfDictionary) pdfObject;
    PdfDictionary embeddedFileDict = filespecDict.getAsDictionary(PdfName.EF);
    if (embeddedFileDict == null)
        continue;
    PdfStream embeddedFileStream = embeddedFileDict.getAsStream(PdfName.F);
    if (embeddedFileStream == null)
        continue;
    PdfName subtype = embeddedFileStream.getAsName(PdfName.Subtype);
    if (!PdfName.ApplicationPdf.equals(subtype))
        continue;
    embeddedPdfs.add(embeddedFileStream);
}
Assert.assertFalse("No embedded PDFs found", embeddedPdfs.isEmpty());
try (   PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().setFullCompressionMode(true));
        PdfDocument flatPdfDocument = new PdfDocument(pdfWriter)   ) {
    PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
    RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
    for (PdfStream pdfStream : embeddedPdfs) {
        try (   PdfReader embeddedReader = new PdfReader(sourceFactory.createSource(pdfStream.getBytes()), new ReaderProperties());
                PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader)   ) {
            pdfMerger.merge(embeddedPdfDocument, 1, embeddedPdfDocument.getNumberOfPages());
        }
    }
}
(FlattenPortfolio test testFlattenNPort_5000)

PDFBox doesn't embed all fonts

I'm using PDFBox in a C# project to create PDF/A and PDF/A-3 files. Almost everything works, except that when I use PDFBox to convert a normal PDF file to PDF/A, not all fonts are embedded. If I use Word to save as PDF/A, the font in question is embedded. How do I embed ArialMT too?
private PDDocumentCatalog makeDocPDFAcompliant(String producer, String creator)
{
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    PDMetadata metadata = new PDMetadata(doc);
    cat.setMetadata(metadata);
    List<Dictionary<string, PDFont>> lstFonts = new List<Dictionary<string, PDFont>>();
    List<PDFont> lstPDFonts = new List<PDFont>();
    List pages = cat.getAllPages();
    Iterator it = pages.iterator();
    // Collect the font dictionary of every page
    while (it.hasNext())
    {
        PDPage page = (PDPage)it.next();
        var pageFont = page.getResources().getFonts();
        lstFonts.Add(pageFont.ToDicitonary<string, PDFont>());
    }
    // Build a list of the distinct fonts used in the document
    foreach (Dictionary<string, PDFont> d in lstFonts)
    {
        foreach (KeyValuePair<string, PDFont> entry in d)
        {
            PDFont font = entry.Value;
            if (!lstPDFonts.Contains(font))
            {
                lstPDFonts.Add(font);
            }
        }
    }
    //PDType0Font font0 = PDType0Font.Load(doc,)
    XMPMetadata xmp = new XMPMetadata();
    XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
    xmp.addSchema(pdfaid);
    pdfaid.setConformance("A");
    pdfaid.setPart(java.lang.Integer.valueOf(1));
    pdfaid.setAbout("");
    metadata.importXMPMetadata(xmp);
    //System.IO.Stream asset = Zaumzeug.Properties.Resources.sRGB_Color_Space_Profile
    System.IO.Stream stream = new System.IO.MemoryStream(Zaumzeug.Properties.Resources.sRGB_Color_Space_Profile);
    InputStream colorProfile = new ikvm.io.InputStreamWrapper(stream);
    PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
    oi.setInfo("sRGB IEC61966-2.1");
    oi.setOutputCondition("sRGB IEC61966-2.1");
    oi.setOutputConditionIdentifier("sRGB IEC61966-2.1");
    oi.setRegistryName("http://www.color.org");
    cat.addOutputIntent(oi);
    doc.save(@"D:\Examples .Net\Data\FontsNormalA.pdf");
    return cat;
}

iTextSharp AcroForm - multi-field not copying

I have a PDF with buttons that take you out to web links. I used iTextSharp to split it into separate PDFs (one per page) per outside requirements. ISSUE: Any button that has multiple positions lost its actions.
QUESTION: Does anyone know how to update these actions? I can open the new file, but I'm not sure how to use the PdfStamper to add an AA (additional actions) entry to this annotation.
When opening the original file, you can get to the additional action like this:
var r = new PdfReader(f.FullName);
var positionsOfThisButton = r.AcroFields.GetFieldPositions("14");
var field = r.AcroForm.GetField("14");
var targetObject = PdfReader.GetPdfObject(field.Ref);
var kids = targetObject.GetAsArray(PdfName.KIDS);
foreach (var k in kids) {
    var ko = (PdfDictionary)(k.IsIndirect() ? PdfReader.GetPdfObject(k) : k);
    var aaObj = ko.Get(PdfName.AA);
    //(aaObj is NULL in the new file)
    var aa = (PdfDictionary)(aaObj.IsIndirect() ? PdfReader.GetPdfObject(aaObj) : aaObj);
    var dObj = aa.Get(PdfName.D);
    var d = (PdfDictionary)(dObj.IsIndirect() ? PdfReader.GetPdfObject(dObj) : dObj);
    Debug.WriteLine("S:" + d.GetAsName(PdfName.S).ToString());
    //returns S:/Uri
    Debug.WriteLine("URI:" + d.GetAsString(PdfName.URI).ToString());
    //returns URI:http://www.somesite.com/etc
}
Thanks for any help.
FYI ONLY - The following is how I split the files:
List<byte[]> Get(FileInfo f) {
    List<byte[]> outputFiles = new List<byte[]>();
    var reader = new PdfReader(f.FullName);
    int n = reader.NumberOfPages;
    reader.Close();
    for (int i = n; i > 0; i--) {
        reader = new PdfReader(f.FullName);
        using (var document = new Document(reader.GetPageSizeWithRotation(1))) {
            using (var outputStream = new MemoryStream()) {
                using (var writer = new PdfCopy(document, outputStream)) {
                    writer.SetMergeFields();
                    writer.PdfVersion = '6';
                    document.Open();
                    writer.AddDocument(reader, new List<int> { i });
                    document.Close();
                    writer.Close();
                }
                outputFiles.Insert(0, outputStream.ToArray());
            }
        }
        reader.Close();
    }
    return outputFiles;
}

Merge a list of pdfs and create new bookmarks (C#)

The project is in C# and uses iTextSharp.
I have a dictionary with a title (string) and file content (byte array). I loop through this dictionary and merge all files together. What I need now is to add bookmarks to the start of the first page in each file, but I should not add any new pages or text to the final document. I have tried different solutions, but all seem to add a table of contents page, a new page before each page or some text at the start of the page.
None of the files have bookmarks originally.
I am looking for a bookmarks structure that looks something like this:
File1
File2
SomeCategory
File3
File4
I would very much appreciate it if anyone could point me in the right direction.
My function for merging the files looks like this:
/// <summary>
/// Merge PDF files, and stamp certificates. This is a modified version of the example in the link below.
/// See: http://www.codeproject.com/Articles/28283/Simple-NET-PDF-Merger for more information.
/// </summary>
/// <param name="sourceFiles">Files to be merged</param>
/// <returns>Byte array with the combined files.</returns>
public static byte[] MergeFiles(Dictionary<string, byte[]> sourceFiles)
{
    var document = new Document();
    var output = new MemoryStream();
    try
    {
        // Initialize pdf writer
        var writer = PdfWriter.GetInstance(document, output);
        writer.PageEvent = new PdfPageEvents();
        // Open document to write
        document.Open();
        var content = writer.DirectContent;
        // Iterate through all pdf documents
        foreach (var sourceFile in sourceFiles)
        {
            // Create pdf reader
            var reader = new PdfReader(sourceFile.Value);
            var numberOfPages = reader.NumberOfPages;
            // Iterate through all pages
            for (var currentPageIndex = 1; currentPageIndex <= numberOfPages; currentPageIndex++)
            {
                // Determine page size for the current page
                document.SetPageSize(reader.GetPageSizeWithRotation(currentPageIndex));
                // Create page
                document.NewPage();
                var importedPage = writer.GetImportedPage(reader, currentPageIndex);
                // Determine page orientation
                var pageOrientation = reader.GetPageRotation(currentPageIndex);
                if ((pageOrientation == 90) || (pageOrientation == 270))
                {
                    content.AddTemplate(importedPage, 0, -1f, 1f, 0, 0,
                        reader.GetPageSizeWithRotation(currentPageIndex).Height);
                }
                else
                {
                    content.AddTemplate(importedPage, 1f, 0, 0, 1f, 0, 0);
                }
                // Add stamp to certificates
                if (sourceFile.Key.IsValidDocumentReference())
                    AddStamp(content, document, sourceFile.Key, currentPageIndex, numberOfPages);
            }
        }
    }
    catch (Exception exception)
    {
        throw new Exception("An unexpected exception occurred during the merging process", exception);
    }
    finally
    {
        document.Close();
    }
    // ToArray returns only the bytes written; GetBuffer would include unused capacity
    return output.ToArray();
}
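A side note on returning the bytes of a MemoryStream from a merge function like this one, as a minimal sketch rather than anything specific to iTextSharp: GetBuffer() returns the stream's whole internal buffer, including any unused capacity, whereas ToArray() returns only the bytes that were written. The names BufferSketch and Measure are made up for illustration, and the initial capacity of 16 is an arbitrary choice that makes the difference visible.

```csharp
using System;
using System.IO;

public static class BufferSketch
{
    // Returns { length of GetBuffer(), length of ToArray() } after writing 3 bytes
    // into a stream created with an initial capacity of 16 bytes.
    public static int[] Measure()
    {
        using (var ms = new MemoryStream(16))
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
            return new[] { ms.GetBuffer().Length, ms.ToArray().Length };
        }
    }

    public static void Main()
    {
        var lengths = Measure();
        Console.WriteLine("GetBuffer: {0} bytes", lengths[0]); // whole buffer: 16
        Console.WriteLine("ToArray:   {0} bytes", lengths[1]); // written bytes only: 3
    }
}
```

For a PDF, the extra bytes from GetBuffer() end up after the end-of-file marker, so ToArray() is usually the safer choice when the result is saved or sent as a file.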
Thanks to Bruno Lowagie who pointed me in the right direction, I was able to produce a solution to my problem.
This is my solution:
public static byte[] MergeFilesAndAddBookmarks(Dictionary<PrintDocument, byte[]> sourceFiles)
{
    using (var ms = new MemoryStream())
    {
        using (var document = new Document())
        {
            using (var copy = new PdfCopy(document, ms))
            {
                // Group the files by chapter number
                var files = sourceFiles.GroupBy(f => f.Key.ChapterNumber);
                document.Open();
                var outlines = new List<Dictionary<string, object>>();
                var pageIndex = 1;
                foreach (var chapterGroup in files)
                {
                    // One top-level bookmark per chapter
                    var map = new Dictionary<string, object>();
                    outlines.Add(map);
                    map.Add("Title", chapterGroup.First().Key.ChapterName);
                    var kids = new List<Dictionary<string, object>>();
                    map.Add("Kids", kids);
                    foreach (var sourceFile in chapterGroup)
                    {
                        using (var reader = new PdfReader(sourceFile.Value))
                        {
                            // add the pages
                            var n = reader.NumberOfPages;
                            for (var page = 0; page < n;)
                            {
                                if (page == 0)
                                {
                                    // Child bookmark pointing at the first page of this file
                                    var kid = new Dictionary<string, object>();
                                    kids.Add(kid);
                                    kid["Title"] = sourceFile.Key.Title;
                                    kid["Action"] = "GoTo";
                                    kid["Page"] = String.Format("{0} Fit", pageIndex);
                                }
                                copy.AddPage(copy.GetImportedPage(reader, ++page));
                            }
                            pageIndex += n;
                            reader.Close();
                        }
                    }
                }
                copy.Outlines = outlines;
                document.Close();
                copy.Close();
                ms.Close();
            }
        }
        return ms.ToArray();
    }
}
public class PrintDocument
{
    public string Title { get; set; }
    public string ChapterName { get; set; }
    public int ChapterNumber { get; set; }
}
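To illustrate how the "Page" values in the outline dictionaries line up with the merged document, here is a minimal sketch in plain C# without iTextSharp. It assumes the page count of each file is already known; OutlineSketch, BuildOutlines, and the sample titles and page counts are hypothetical. It only builds a flat list (no "Kids" nesting), using the same keys as the solution above.

```csharp
using System;
using System.Collections.Generic;

public static class OutlineSketch
{
    // Builds one outline entry per file, pointing at the first page of that file
    // in the merged output. Each pair holds (title, number of pages).
    public static List<Dictionary<string, object>> BuildOutlines(
        IEnumerable<KeyValuePair<string, int>> pageCounts)
    {
        var outlines = new List<Dictionary<string, object>>();
        var pageIndex = 1; // page numbers in the "Page" entry are 1-based

        foreach (var file in pageCounts)
        {
            outlines.Add(new Dictionary<string, object>
            {
                ["Title"] = file.Key,
                ["Action"] = "GoTo",
                ["Page"] = string.Format("{0} Fit", pageIndex)
            });
            pageIndex += file.Value; // the next file starts after this file's pages
        }
        return outlines;
    }

    public static void Main()
    {
        var files = new[]
        {
            new KeyValuePair<string, int>("File1", 3),
            new KeyValuePair<string, int>("File2", 1),
            new KeyValuePair<string, int>("File3", 4)
        };
        // Prints: File1 -> 1 Fit, File2 -> 4 Fit, File3 -> 5 Fit (one per line)
        foreach (var entry in BuildOutlines(files))
            Console.WriteLine("{0} -> {1}", entry["Title"], entry["Page"]);
    }
}
```

Because the bookmarks are only entries in the document outline, nothing is drawn on the pages themselves and no extra pages are added.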

ITextSharp: How to get an image embedded resource

I'm parsing HTML that contains some images.
These images are stored as embedded resources, not in the file system.
As far as I know, I need to set a custom image provider in HtmlPipelineContext, and this provider needs to return either the image path or the iTextSharp image.
The question is: does anybody know which method of AbstractImageProvider I need to implement, and how?
This is my code:
var list = new List<string> { text };
byte[] renderedBuffer;
using (var outputMemoryStream = new MemoryStream())
{
    using (var pdfDocument = new Document(PageSize.A4, 30, 30, 30, 30))
    {
        var pdfWriter = PdfWriter.GetInstance(pdfDocument, outputMemoryStream);
        pdfWriter.CloseStream = false;
        pdfDocument.Open();
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(new CssAppliersImpl());
        htmlContext.SetImageProvider(new MyImageProvider());
        htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
        ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        CssResolverPipeline pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(pdfDocument, pdfWriter)));
        XMLWorker worker = new XMLWorker(pipeline, true);
        XMLParser p = new XMLParser(worker);
        foreach (var htmlText in list)
        {
            using (var htmlViewReader = new StringReader(htmlText))
            {
                p.Parse(htmlViewReader);
            }
        }
    }
    renderedBuffer = new byte[outputMemoryStream.Position];
    outputMemoryStream.Position = 0;
    outputMemoryStream.Read(renderedBuffer, 0, renderedBuffer.Length);
}
Thanks in advance.
Using a custom image provider doesn't seem to be supported. The only thing it really supports is changing root paths.
However, here's one solution to the problem:
Create a new HTML tag, <resimg src="{resource name}"/>, and write a custom tag processor for it.
Here's the implementation:
/// <summary>
/// Our custom HTML Tag to add an IElement.
/// </summary>
public class ResourceImageHtmlTagProcessor : AbstractTagProcessor
{
    public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
    {
        var src = tag.Attributes["src"];
        var bitmap = (Bitmap)Resources.ResourceManager.GetObject(src);
        if (bitmap == null)
            throw new RuntimeWorkerException("No resource with the name: " + src);
        var converter = new ImageConverter();
        var image = Image.GetInstance((byte[])converter.ConvertTo(bitmap, typeof(byte[])));
        HtmlPipelineContext htmlPipelineContext = this.GetHtmlPipelineContext(ctx);
        return new List<IElement>(1)
        {
            this.GetCssAppliers().Apply(
                new Chunk((Image)this.GetCssAppliers().Apply(image, tag, htmlPipelineContext), 0f, 0f, true),
                tag,
                htmlPipelineContext)
        };
    }
}
To configure your new processor, replace the line where you specify the TagFactory with the following:
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(new ResourceImageHtmlTagProcessor(), new[] { "resimg" });
htmlContext.SetTagFactory(tagProcessorFactory);
