How to Define a PDF Outline Using MigraDoc

I noticed when using MigraDoc that if I add a paragraph with any of the heading styles (e.g., "Heading1"), an entry is automatically placed in the document outline. My question is, how can I add entries in the document outline without showing the text in the document? Here is an example of my code:
var document = new Document();
var section = document.AddSection();
// The following line adds an entry to the document outline, but it also
// adds a line of text to the current section. How can I add an
// entry to the document outline without adding any text to the page?
var paragraph = section.AddParagraph("TOC Level 1", "Heading1");

I used a hack: I added white text on a white background with a font size of about 0.001, so the outline entries are created but the text is effectively invisible to the user.
For a clean solution, mix PDFsharp and MigraDoc code. The hack works for me and is much easier to implement.
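The hack described above might look roughly like the following sketch. The helper name AddHiddenOutlineEntry is hypothetical (it is not part of MigraDoc), and the spacing adjustments are an assumption about how to keep the hidden line from leaving a visible gap; the answer itself only mentions the white color and the tiny font size.

```csharp
using MigraDoc.DocumentObjectModel;

static class HiddenOutlineSketch
{
    // Hypothetical helper, not part of MigraDoc.
    public static Paragraph AddHiddenOutlineEntry(Section section, string title, string headingStyle)
    {
        // A paragraph with a heading style still produces an outline entry.
        Paragraph paragraph = section.AddParagraph(title, headingStyle);

        // Make the text practically invisible: tiny and white.
        paragraph.Format.Font.Size = Unit.FromPoint(0.001);
        paragraph.Format.Font.Color = Colors.White;

        // Assumption: dropping the heading's spacing avoids a visible gap on the page.
        paragraph.Format.SpaceBefore = Unit.FromPoint(0);
        paragraph.Format.SpaceAfter = Unit.FromPoint(0);
        return paragraph;
    }
}
```

It could be called as HiddenOutlineSketch.AddHiddenOutlineEntry(section, "TOC Level 1", "Heading1"). The text is still present in the PDF, so it can show up when a reader selects or searches text.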

After reading ThomasH's answer, I realized that I was already mixing PDFsharp and MigraDoc code. Since I am using a PdfDocumentRenderer, I was able to add a custom outline to the PdfDocument property of that renderer. Here is an example of what I ended up doing to create a custom outline:
var document = new Document();
// Populate the MigraDoc document here
...
// Render the document
var renderer = new PdfDocumentRenderer(false, PdfFontEmbedding.Always)
{
    Document = document
};
renderer.RenderDocument();
// Create the custom outline
var pdfSharpDoc = renderer.PdfDocument;
var rootEntry = pdfSharpDoc.Outlines.Add(
    "Level 1 Header", pdfSharpDoc.Pages[0]);
rootEntry.Outlines.Add("Level 2 Header", pdfSharpDoc.Pages[1]);
// Etc.
// Save the document
pdfSharpDoc.Save(outputStream);

I've got a method that is slightly less of a hack. Here's the basic approach:
1) Add a bookmark to a paragraph, and save the returned bookmark field object together with the name of the outline entry in a list. Do not set the paragraph's OutlineLevel (or set it to BodyText).
// Defined previously
List<dynamic> Bookmarks = new List<dynamic>();
// In your bookmarking method, P is a Paragraph already created somewhere
Bookmarks.Add(new { Bookmark = P.AddBookmark("C1"), Name = "Chapter 1", Depth = 0 });
2) At the end of your MigraDoc layout, before rendering, prepare the pages (pdfwriter is the PdfDocumentRenderer):
pdfwriter.PrepareRenderPages();
3) Build a dictionary that maps each bookmark's parent's parent (this will be the paragraph) to a page number (page numbers are initialized to -1):
// Cast the dynamic Bookmark member (not the list item itself) to BookmarkField
var Pages = Bookmarks.Select(x => ((BookmarkField)x.Bookmark).Parent.Parent).ToDictionary(x => x, x => -1);
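4) Now fill in the page numbers by iterating through the objects on each page and finding the matches. GetDocumentObjectsFromPage takes a 1-based page number, while the PDFsharp Pages collection is 0-based, hence the i - 1:
for (int i = 1; i <= pdfwriter.PageCount; i++)
    foreach (var s in pdfwriter.DocumentRenderer.GetDocumentObjectsFromPage(i).Where(x => Pages.ContainsKey(x)))
        Pages[s] = i - 1;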
5) You now have a dictionary that maps each bookmark's paragraph to a page index. Once the pages have been rendered into the PDFsharp document, you can use it to add your outlines directly to that document. The loop also walks down the depth tree, so you can have nested outlines:
foreach (dynamic d in Bookmarks)
{
    // Start at the root outline collection and descend to the requested depth
    var o = pdfwriter.PdfDocument.Outlines;
    for (int i = 0; i < d.Depth; i++)
        o = o.Last().Outlines;
    BookmarkField BK = d.Bookmark;
    int PageNumber = Pages[BK.Parent.Parent];
    o.Add(d.Name, pdfwriter.PdfDocument.Pages[PageNumber], true, PdfOutlineStyle.Regular);
}

Related

Copy/Clone an Excel Chart with EPPlus

I'm using EPPlus in my project. I know you can copy an existing Shape, but there is no method for copying an existing Chart.
I have set up a workbook with template charts that I need to duplicate, updating the series to point to different data tables/sets.
I can populate the data without trouble and can create new charts, but then I need to size, position and style them. It would be a lot easier to clone the chart template and modify only the series and position.
Currently I use this approach:
// wb is an ExcelWorkbook
ExcelWorksheet ws = wb.Worksheets[sheetIdx];
ExcelChart chart = (ExcelChart)ws.Drawings[0];
ExcelChart cc = ws.Drawings.AddChart("Chart " + (i + 2), eChartType.ColumnClustered);
// invoke methods that will position and size new chart
// copy starting chart xml so will have identical styling, series, legend etc
var xml = XDocument.Parse(chart.ChartXml.InnerXml);
XNamespace nsC = "http://schemas.openxmlformats.org/drawingml/2006/chart";
XNamespace nsA = "http://schemas.openxmlformats.org/drawingml/2006/main";
// modify xml to update Category, Title and Values formulas
var fs = xml.Descendants(nsC + "f");
foreach (var f in fs)
{
    f.Value = ws.Cells[f.Value].Offset(chartNumRows + 1, 0).FullAddressAbsolute;
}
// set new chart xml to modified xml.
cc.ChartXml.InnerXml = xml.ToString();
This works, but there are several drawbacks.
1) The Series property of the clone (cc in my example) has not been set. Peeking at the EPPlus code, this is because it is only populated during object construction. If I could get this property to update, I could easily resolve the second issue.
2) I need to remove all series and add new ones, and because the Series property isn't initialised properly this is harder than it should be.
Any help getting the properties to initialise in the chart, or a better method of cloning the original, would be greatly appreciated!

It seems there is no built-in functionality for this, and any other reload methods I could come up with required too many changes to the EPPlus source. So until I find a better solution, I have added the following method to EPPlus\Drawings\ExcelDrawings.cs:
public ExcelChart CloneChart(ExcelChart SourceChart, String NewName)
{
    // Create clone
    var tempClone = this.AddChart(NewName, SourceChart.ChartType, null);
    tempClone.ChartXml.InnerXml = SourceChart.ChartXml.InnerXml;
    // Reload clone
    using (tempClone.Part.Stream = new MemoryStream())
    {
        // Create chart object using the temp clone's package and xml.
        // UTF-8 rather than ASCII, so non-ASCII titles and labels are not turned into '?'
        var chartXmlBytes = Encoding.UTF8.GetBytes(tempClone.ChartXml.OuterXml);
        tempClone.Part.Stream.Write(chartXmlBytes, 0, chartXmlBytes.Length);
        var finalClone = ExcelChart.GetChart(this, tempClone.TopNode);
        // Remove old from collection
        var index = _drawingNames[tempClone.Name];
        var draw = _drawings[index];
        for (int i = index + 1; i < _drawings.Count; i++)
            _drawingNames[_drawings[i].Name]--;
        _drawingNames.Remove(draw.Name);
        _drawings.Remove(draw);
        // Add new to collection
        finalClone.Name = tempClone.Name;
        _drawings.Add(finalClone);
        _drawingNames.Add(finalClone.Name, _drawings.Count - 1);
        // Done
        return finalClone;
    }
}

Add Comment in to selected Text in Word Document Using OpenXML c#

I need to use OpenXML to add comments to a Word document. A comment should be attached to a specific location, word, or group of words. OpenXML normally returns the text of a document as run elements, but the words I want to comment on are spread across different run elements. As a result I cannot attach the comment to the words I actually want, because I cannot place the CommentRangeStart and CommentRangeEnd objects precisely.
My current implementation is as follows:
foreach (var paragraph in document.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>())
{
    foreach (var run in paragraph.Elements<Run>())
    {
        var item = run.Elements<Text>().FirstOrDefault(b => b.Text.Trim() == "My words selection to add comment");
        if (item != null)
        {
            run.InsertBefore(new CommentRangeStart() { Id = id }, item);
            var cmtEnd = run.InsertAfter(new CommentRangeEnd() { Id = id }, item);
            run.InsertAfter(new Run(new CommentReference() { Id = id }), cmtEnd);
        }
    }
}
More detail: suppose the document contains the following runs.
<w:r><w:t>This </w:t></w:r>
<w:r><w:t>is </w:t></w:r>
<w:r><w:t>a first paragraph</w:t></w:r>
How can I add a comment to the text "is a first para" in that case?
In other cases the OpenXML document contains a single run element, as below.
<w:r><w:t>This is a first paragraph</w:t></w:r>
In both of these cases, how do I add a comment to my specific selection of words? I have added a screenshot that shows exactly what I want.

If the style doesn't differ between the runs, and you are allowed to modify the document, you could merge all the runs in a paragraph and then split the text you want to comment on into a run of its own.
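A minimal sketch of that idea, using the Open XML SDK, might look like this. The helper name MarkCommentRange is hypothetical. It assumes that every direct child run of the paragraph has the same formatting as the first run, that the runs contain only plain text (tabs, breaks and images would be lost), and that a Comment with the same id is added to the comments part separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class CommentRangeSketch
{
    // Hypothetical helper: merges the paragraph's runs, then isolates `target`
    // in its own run surrounded by comment range markers.
    public static bool MarkCommentRange(Paragraph paragraph, string target, string commentId)
    {
        List<Run> runs = paragraph.Elements<Run>().ToList();
        string fullText = string.Concat(runs.Select(r => r.InnerText));
        int start = fullText.IndexOf(target, StringComparison.Ordinal);
        if (runs.Count == 0 || start < 0) return false;

        // Assumption: all runs share the formatting of the first run.
        RunProperties props = runs[0].RunProperties;

        Run MakeRun(string text)
        {
            var run = new Run();
            if (props != null) run.AppendChild(props.CloneNode(true));
            // Preserve leading and trailing spaces in the split pieces.
            run.AppendChild(new Text(text) { Space = SpaceProcessingModeValues.Preserve });
            return run;
        }

        string before = fullText.Substring(0, start);
        string after = fullText.Substring(start + target.Length);

        var replacement = new List<OpenXmlElement>();
        if (before.Length > 0) replacement.Add(MakeRun(before));
        replacement.Add(new CommentRangeStart { Id = commentId });
        replacement.Add(MakeRun(target));
        replacement.Add(new CommentRangeEnd { Id = commentId });
        replacement.Add(new Run(new CommentReference { Id = commentId }));
        if (after.Length > 0) replacement.Add(MakeRun(after));

        // Insert the new elements where the old runs started, then drop the old runs.
        foreach (OpenXmlElement element in replacement) paragraph.InsertBefore(element, runs[0]);
        foreach (Run run in runs) run.Remove();
        return true;
    }
}
```

With the three-run example from the question, calling MarkCommentRange(paragraph, "is a first para", id) would leave one run for "This ", one for the commented text, and one for "graph", with the range markers as siblings of the runs rather than children of a run.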

Get the formatting of a table that a specific string of text exisits in and create a new table with the same formatting

Using OpenXML in C#, we need to:
1) Find a specific string of text in a Word document (this text will always exist in a table cell).
2) Get the formatting of the text and of the table that the text exists in.
3) Create a new table with the same text and table formatting, while pulling in text values for the cells from a nested List.
This is the code that I currently have, with comments marking the places where I am not sure what to do:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(fileWordFile, true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    Body body = mainPart.Document.Body;
    IEnumerable<Paragraph> paragraphs = body.Elements<Paragraph>();
    Paragraph targetParagraph = null;
    //Comment 1: Loop through paragraphs and search for a specific string of text in word document
    foreach (Paragraph paragraph in paragraphs) {
        if (paragraph.Elements<Run>().Any()) {
            Run run = paragraph.Elements<Run>().First();
            if (run.Elements<Text>().Any()) {
                Text text = run.Elements<Text>().First();
                if (text.Text.Equals("MY SEARCH STRING")) {
                    targetParagraph = paragraph;
                    // Comment 2: How can I get the formatting of the table that contains this text??
                }
            }
        }
    }
    //Comment 3: Create table with same formatting as where the text was found
    Table table1 = new Table();
    TableProperties tableProperties1 = new TableProperties();
    //Comment 4: How can I set these properties to be the same as the one found at "Comment 2"??
    wordDoc.Close();
    wordDoc.Dispose();
}

If you're looking for text elements that are inside a table cell, you can use a LINQ query to get there quickly without needing to use a heap of nested loops.
// Find the first text element matching the search string
// where the text is inside a table cell.
var textElement = body.Descendants<Text>()
                      .FirstOrDefault(t => t.Text == searchString &&
                                           t.Ancestors<TableCell>().Any());
Once you have your match, the easiest way to duplicate the containing table with all its formatting and contents is simply to clone it.
if (textElement != null)
{
    // get the table containing the matched text element and clone it
    Table table = textElement.Ancestors<Table>().First();
    Table tableCopy = (Table)table.CloneNode(deep: true);
    // do stuff with copied table (see below)
}
After that, you can add things to the corresponding cell of the copied table. It's not entirely clear what you meant by "pulling in text values for the cell from a nested List" (what list? nested where?), so I'll just show a contrived example. (This code would replace the "do stuff" comment in the code above.)
// find the table cell containing the search string in the copied table
var targetCell = tableCopy.Descendants<Text>()
                          .First(t => t.InnerText == searchString)
                          .Ancestors<TableCell>()
                          .First();
// get the properties from the first paragraph in the target cell (so we can copy them)
var paraProps = targetCell.Descendants<ParagraphProperties>().First();
// now add new stuff to the target cell
List<string> stuffToAdd = new List<string> { "foo", "bar", "baz", "quux" };
foreach (string item in stuffToAdd)
{
    // for each item, clone the paragraph properties, then add a new paragraph
    var propsCopy = (ParagraphProperties)paraProps.CloneNode(deep: true);
    targetCell.AppendChild(new Paragraph(propsCopy, new Run(new Text(item))));
}
Lastly, you need to add the copied table to the document somewhere or you won't see it. You don't say in your question where you would want this to appear, so I'll just put it at the end of the document. You can use methods like InsertAfter, InsertAt, InsertBefore, etc. to insert the table relative to other elements.
body.AppendChild(tableCopy);
Hope this helps.

How to get Page Count per section in a PDF

I'm rendering a PDF document with MigraDoc.
Each section has one or more paragraphs of text.
Currently this is how I create a document:
var document = new Document();
var pdfRenderer = new PdfDocumentRenderer(true);
pdfRenderer.Document = document;
for (int i = 0; i < 10; i++) {
    Section section = document.AddSection();
    section.PageSetup.PageFormat = PageFormat.A4;
    for (int j = 0; j < 5; j++) {
        var paragraphText = GetParaText(i, j); // some large text can span multiple pages
        section.AddParagraph(paragraphText);
        //Want page count per section?
        // Section 1 -> 5 , Section 2 ->3 etc.
        // int count = CalculateCurrentPageCount(); //*EDIT*
    }
}
// Create the PDF document
pdfRenderer.RenderDocument();
pdfRenderer.Save(filename);
Edit: Currently I use the following code to get the page count.
But it takes a lot of time, possibly because every page is rendered twice.
public int CalculateCurrentPageCount()
{
    var tempDocument = document.Clone();
    tempDocument.BindToRenderer(null);
    var pdfRenderer = new PdfDocumentRenderer(true);
    pdfRenderer.Document = tempDocument;
    pdfRenderer.RenderDocument();
    int count = pdfRenderer.PdfDocument.PageCount;
    Console.WriteLine("-- Count :" + count);
    return count;
}
Some of the sections can span multiple pages, depending on the content added.
Is it possible to find out how many pages (in the PDF) a Section took to render?
Edit 2: Is it possible to tag a section and find out which page it starts on?

Thanks for the help. I calculated it like this (i.e. to get the count in code):
First I tagged each section with a running count of the sections created so far:
newsection.Tag = num_sections_in_doc; // count changes every time I add a section
Then I used GetDocumentObjectsFromPage:
var x = new Dictionary<int, int>();
int numpages = pdfRenderer.PdfDocument.PageCount;
for (int idx = 0; idx < numpages; idx++)
{
    DocumentObject[] docObjects = pdfRenderer.DocumentRenderer.GetDocumentObjectsFromPage(idx + 1);
    if (docObjects != null && docObjects.Length > 0)
    {
        Section section = docObjects[0].Section;
        int sectionTag = -1;
        if (section != null)
            sectionTag = (int)section.Tag;
        if (sectionTag >= 0)
        {
            // count a section only once
            if (!x.ContainsKey(sectionTag))
                x.Add(sectionTag, idx + 1);
        }
    }
}
x.Keys are the section tags,
and x.Values are the start pages (1-based) of each section.
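From those start pages, the page count of each section can be derived by subtracting consecutive start pages. The following is a minimal sketch of that last step; the class and method names are hypothetical, and it assumes that every section starts on a new page, so that a section ends on the page before the next one starts.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SectionPageCounts
{
    // startPages: section tag -> first page of that section (1-based).
    // totalPages: number of pages in the rendered document.
    public static Dictionary<int, int> FromStartPages(IDictionary<int, int> startPages, int totalPages)
    {
        // Order the sections by the page they start on.
        var ordered = startPages.OrderBy(kv => kv.Value).ToList();
        var counts = new Dictionary<int, int>();
        for (int i = 0; i < ordered.Count; i++)
        {
            // The last section runs to the end of the document.
            int nextStart = i + 1 < ordered.Count ? ordered[i + 1].Value : totalPages + 1;
            counts[ordered[i].Key] = nextStart - ordered[i].Value;
        }
        return counts;
    }
}
```

For example, with start pages 1, 6 and 9 for sections 0, 1 and 2 in a 10-page document, SectionPageCounts.FromStartPages(x, numpages) gives 5, 3 and 2 pages respectively.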

If you want to display the page count in the PDF, use paragraph.AddSectionPagesField().
See also:
https://stackoverflow.com/a/19499231/162529
To get the count in code: you can add a tag to any document object (e.g. to any paragraph) and then use docRenderer.GetDocumentObjectsFromPage(...) to query the objects for a specific page. This allows you to find out which section the objects on this page belong to.
Alternatively, create each section in a separate document and then combine them into one PDF using docRenderer.RenderPage(...) as shown here:
http://www.pdfsharp.net/wiki/MixMigraDocAndPdfSharp-sample.ashx
The sample scales pages down to thumbnail size - you would draw them 1:1, each on a new page.
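For the first option (displaying the count in the PDF), a footer using the section pages field might be set up as in the sketch below. The class and method names and the footer wording are illustrative only; the field methods are the ones MigraDoc provides on Paragraph.

```csharp
using MigraDoc.DocumentObjectModel;

static class SectionFooterSketch
{
    // Hypothetical helper: adds a footer such as "Page 3 of 10 (this section: 5 pages)".
    public static void AddPageCountFooter(Section section)
    {
        Paragraph footer = section.Footers.Primary.AddParagraph();
        footer.AddText("Page ");
        footer.AddPageField();          // current page number
        footer.AddText(" of ");
        footer.AddNumPagesField();      // total pages in the document
        footer.AddText(" (this section: ");
        footer.AddSectionPagesField();  // pages in the current section
        footer.AddText(" pages)");
    }
}
```

The fields are filled in by the renderer, so this only displays the numbers in the document; it does not make them available to your code.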

Modifying a paragraph added using InsertParagraphAfter()

var p1 = document.Paragraphs.Add(ref o);
p1.Range.InsertParagraphAfter();
Now I want to grab the paragraph that was just created using InsertParagraphAfter() and modify it. How can I access it?

InsertParagraphAfter is supposed to extend the current selection to include the new paragraph. So if you start by creating an empty selection at the end of the existing paragraph, the current selection should be set to the new paragraph after calling InsertParagraphAfter.
Note that I have not tested the following code (I have not even tried compiling it), so I may be way off.
var p1 = document.Paragraphs.Add(ref o);
// Set the selection to the end of the paragraph.
document.Range(p1.Range.End, p1.Range.End).Select();
p1.Range.InsertParagraphAfter();
// InsertParagraphAfter should expand the active selection to include
// the newly inserted paragraph.
var newParagraph = document.Application.Selection;

You can accomplish this by adding each new paragraph relative to the previous one:
Paragraph p1 = document.Paragraphs.Add(System.Reflection.Missing.Value);
p1.Range.Text = "Foo";
p1.Range.InsertParagraphAfter();
// Add new paragraph relative to first paragraph
Paragraph p2 = document.Paragraphs.Add(p1.Range);
p2.Range.Text = "Bar";
p2.Range.InsertParagraphAfter();
// Add new paragraph relative to the second paragraph
Paragraph p3 = document.Paragraphs.Add(p2.Range);
p3.Range.Text = "Baz";

I know this is way old, but I could not resist.
Here is a working solution in Visual Basic (rng is the range of one paragraph):
rng.InsertParagraphAfter()
If rng.Paragraphs(1).Next IsNot Nothing Then
    rng.Paragraphs(1).Next.Style = ActiveDocument.Styles(WdBuiltinStyle.wdStyleNormal)
End If
