OpenXML Remove text from template

I have a number of .docx templates that customers download, but certain words need to be changed or removed from the document for different customers. I can't find anything on how to remove text:
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
    foreach (Text element in doc.MainDocumentPart.Document.Body.Descendants<Text>())
    {
        // This is fine
        element.Text = element.Text.Replace("DocumentDate", wordReferenceTemplatesMV.DocumentDate);
        // Need help on how to remove text
        element.Text = element.Text.Remove???("TextToRemove")
    }
}

Why not just replace it with an empty string?
element.Text = element.Text.Replace("TextToRemove", string.Empty);

Most text values live in Run elements. You can loop through all the Run elements and check the text of each one. It should be something like:
Body body = wordprocessingDocument.MainDocumentPart.Document.Body;
foreach (Run r in body.Descendants<Run>())
{
    string sText = r.InnerText;
    //...compare the text with the value
    // Note: sometimes the text is broken across two runs; you need to find a way, based on your requirements, to connect them.
}
If you want to delete the text, you can just delete the run by calling its Remove() method:
r.Remove();
If you use the file as a template, I usually set some special property on the Run element so that I can find it more reliably later.
For example, inside the run loop, before checking the text, you can check the highlight color first (the null-conditional operators guard against runs that have no properties or no highlight):
if (r.RunProperties?.Highlight?.Val?.Value == DocumentFormat.OpenXml.Wordprocessing.HighlightColorValues.Yellow)
{
    string sText = r.InnerText;
    ....
}
Hope it helps.

If you don't want the element any more, you can delete the whole element:
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
    // ToList() (from System.Linq) takes a snapshot, so removing elements does not cut the enumeration short
    foreach (Text element in doc.MainDocumentPart.Document.Body.Descendants<Text>().ToList())
    {
        if (element.Text == "TextToRemove")
            element.Remove();
    }
}
Edit
If you're left with an empty line, the chances are that the Text was inside a Paragraph that is now empty. In that case you want to remove the Paragraph instead. The direct parent of a Text is normally its Run, so walk up to the enclosing Paragraph:
if (element.Text == "TextToRemove")
    element.Ancestors<Paragraph>().FirstOrDefault()?.Remove();

I don't think it's the paragraph element that causes the empty line after the removal.
Clients send over a template with an address block such as:
[address1]
[address2]
[city]
[town]
[state]
[zip]
The fields are populated from the database with the replace function, but if an address doesn't contain an [address2] value, that field is what I need to remove. If I remove the text, I'm still left with an empty line between [address1] and [city]. The [address2] field isn't in its own paragraph.
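One possible cause, offered only as a guess since the template's XML isn't shown: if the address lines share a single paragraph, they are probably separated by line-break elements (w:br, the Break class in the SDK), and the break after [address2] survives when only the text is removed. A minimal sketch under that assumption follows; the PlaceholderCleaner class and its method name are hypothetical, not part of the Open XML SDK.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class PlaceholderCleaner
{
    // Removes every Text whose content equals `placeholder`. If a line break
    // immediately follows the text, the break is removed as well so that no
    // blank line is left behind. Returns the number of Text elements removed.
    public static int RemoveWithTrailingBreak(OpenXmlElement root, string placeholder)
    {
        int removed = 0;

        // ToList() snapshots the matches so removing nodes does not disturb enumeration.
        foreach (Text text in root.Descendants<Text>().Where(t => t.Text == placeholder).ToList())
        {
            // Case 1: the break sits right after the text inside the same run.
            var next = text.NextSibling();

            // Case 2: the text ends its run and the following run starts with the break.
            if (next == null && text.Parent?.NextSibling() is Run nextRun)
                next = nextRun.ChildElements.FirstOrDefault(c => !(c is RunProperties));

            if (next is Break)
                next.Remove();

            text.Remove();
            removed++;
        }

        return removed;
    }
}
```

It could be called as PlaceholderCleaner.RemoveWithTrailingBreak(doc.MainDocumentPart.Document.Body, "[address2]") before the remaining placeholders are replaced. If the template separates the lines some other way (for example with separate paragraphs), this sketch will remove the text but leave the layout unchanged.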

Related

Distinct() values still letting in duplicates

This is another programming issue where I think everything looks fine, but it does not work as intended.
What I'm trying to do is scrape all links from a web page with HtmlAgilityPack and add them to a DataGridView, but NOT add duplicates.
Code:
webBrowser.Navigate(url);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser.DocumentText);
if (debug)
{
    Helpers.SaveDebugToFile(@"Debug\[google.com]-" + DateTime.Now.ToString("hhmmssffffff") + "-debug.html", webBrowser.DocumentText);
}
List<string> values = new List<string>();
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute href = link.Attributes["href"];
    if (href.Value.Contains("google.") || href.Value.Contains("search?") || href.Value.StartsWith("/") || href.Value.Length < 5)
    {
        // Ignore.
    }
    else
    {
        // DO NOT ADD TO THE DATAGRID IF href.Value ALREADY EXISTS IN COLUMN 1 //
        values.Add(href.Value);
    }
}
foreach (var value in values.Distinct().ToList())
{
    DataGridViewLinks.Rows.Add(value, randomKeyword);
}
The code works, but it still adds duplicates to the first column even though I'm only adding Distinct() values (or at least that's what I intended).
I can't see the reason for this issue. I have looked over the code a good few times and don't see anything obviously wrong.

As already mentioned in the comments, most likely the values aren't exactly equal somewhere (different casing, leading or trailing whitespace, ...).
It would be better to check for duplicates, with a defined casing and with whitespace trimmed, at the point where you insert into the "values" list.
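A minimal sketch of that idea, assuming that trimming whitespace and ignoring case is the right notion of "same link" for this grid (URL paths can be case-sensitive, so that is a judgment call). The LinkDeduper class is a hypothetical helper name, not something from the question.

```csharp
using System;
using System.Collections.Generic;

public static class LinkDeduper
{
    // Returns the links in their original order, keeping only the first
    // occurrence of each one after trimming and ignoring case.
    public static List<string> DistinctNormalized(IEnumerable<string> links)
    {
        // HashSet.Add returns false when an equal key is already present.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string link in links)
        {
            string key = link.Trim();
            if (key.Length > 0 && seen.Add(key))
                result.Add(key);
        }

        return result;
    }
}
```

In the question's code this would replace values.Distinct().ToList() with LinkDeduper.DistinctNormalized(values). If duplicates still show up, the grid is probably being filled more than once (for example, once per navigation), which this helper cannot detect on its own.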

Instead of calling Distinct directly in the foreach loop, store its result in a List first and inspect the values you are getting. That will tell you whether the problem is in this section of the code or somewhere else. Possibly the list is being appended to while the loop is iterating.

passing a list to a list<T>

I am working with OpenXML and have something that is making me pull my hair out. Basically, I am editing a pre-existing document that serves as a template. The template should keep its first and second pages, so every section I add (paragraph, table, etc.) should be inserted between those two pages. I have already accomplished that; I can insert a simple table this way:
DocTable docTable = new DocTable();
Paragraph paragraph = doc.MainDocumentPart.Document.Body.Descendants<Paragraph>()
.Where<Paragraph>(p => p.InnerText.Equals("some Text")).First();
Table table = docTable.createTable(Convert.ToInt16(2), Convert.ToInt16(2));
mainPart.Document.Body.InsertAfter(table, paragraph);
I basically search for the paragraph at the end of page 1 and insert the table after it. My problem is that I don't receive a single section from the front-end web page; I receive a list of sections. I defined this list as a list of objects without a specific type, since it can contain tables, paragraphs and other things.
So basically I have this:
List<Object> listOfSections = new List<Object>();
I receive the sections from the front end and identify what each one is by its key, like this:
foreach (DocumentAtributes section in sections.atributes)
{
    if (section.key != "Document")
    {
        checkSection(mainPart, section, listOfSections);
    }
}
public void checkSection(MainDocumentPart mainPart, DocumentAtributes section, List<Object> listOfSections)
{
    switch (section.key)
    {
        case "Table":
            DocTable docTable = new DocTable();
            Table table = docTable.createTable(Convert.ToInt16(section.rows), Convert.ToInt16(section.cols));
            listOfSections.Add(new Run(table));
            break;
        case "Paragraph":
            DocRun accessTypeTitle = new DocRun();
            Run permissionTitle = accessTypeTitle.createParagraph(section.text, PARAGRAPHCOLOR, Convert.ToInt16(section.fontSize), DEFAULTFONT, section.align);
            listOfSections.Add(permissionTitle);
            break;
        case "Image":
            DocImage docImage = new DocImage();
            Run image = docImage.imageCreatorFromDisk(mainPart, "abcd", Convert.ToInt16(section.width), Convert.ToInt16(section.height), section.align, null, null, section.wrapChoice, section.base64);
            listOfSections.Add(image);
            break;
    }
}
I need a way to pass this list to InsertAfter. It has to be the whole list: I can't add the objects one at a time, because each section inserted after the paragraph ends up in front of the ones inserted before it, and I want the order to be the same as in sections.atributes.
So InsertAfter would need to accept a list, and I have a list of objects; the method would look like this: insertAfter(List, refChild)
Can I cast my list of objects, or do something else? I need some help here.

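You can iterate the list in reverse so that the first element in the list ends up immediately after the paragraph, followed by the second, then the third, and so on. Because the list is declared as List<Object>, each item has to be cast to OpenXmlElement before it is passed to InsertAfter:
for (int i = listOfSections.Count - 1; i >= 0; i--)
{
    mainPart.Document.Body.InsertAfter((OpenXmlElement)listOfSections[i], paragraph);
}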
If you start with a list with elements:
Element1
Element2
Element3
Element4
And the document starts with just:
Paragraph
Then after each iteration you would end up with:
Iteration 1
Paragraph
Element4
Iteration 2
Paragraph
Element3
Element4
Iteration 3
Paragraph
Element2
Element3
Element4
and finally, Iteration 4
Paragraph
Element1
Element2
Element3
Element4
which is the desired result.
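The walk-through above can be reproduced with plain strings, which makes it easy to check the ordering without opening a document. This is only an illustration: InsertOrderDemo is a hypothetical helper, and List<string>.Insert stands in for the SDK's InsertAfter.

```csharp
using System;
using System.Collections.Generic;

public static class InsertOrderDemo
{
    // Returns a copy of `document` with every item inserted directly after
    // `anchor`. Walking the items in reverse keeps their original order.
    public static List<string> InsertAllAfter(
        IEnumerable<string> document, string anchor, IReadOnlyList<string> items)
    {
        var result = new List<string>(document);
        int anchorIndex = result.IndexOf(anchor);
        if (anchorIndex < 0)
            throw new ArgumentException("Anchor not found in document.", nameof(anchor));

        for (int i = items.Count - 1; i >= 0; i--)
            result.Insert(anchorIndex + 1, items[i]); // always right after the anchor

        return result;
    }
}
```

An alternative that avoids the reverse loop is to iterate forward and move the reference along: InsertAfter returns the element it inserted, so that element can serve as the reference for the next insertion.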

Add Comment in to selected Text in Word Document Using OpenXML c#

I need to use OpenXML to add comments to a Word document. A comment has to be attached to a location, a word, or several words. Normally OpenXML returns the text of a Word document as run elements, but the words I want to comment on are spread across different run elements. As a result I can't add the comment to the words I actually want, because I can't place the CommentRangeStart and CommentRangeEnd objects around them.
My current implementation is as follows.
foreach (var paragraph in document.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>())
{
    foreach (var run in paragraph.Elements<Run>())
    {
        var item = run.Elements<Text>().FirstOrDefault(b => b.Text.Trim() == "My words selection to add comment");
        if (item != null)
        {
            run.InsertBefore(new CommentRangeStart() { Id = id }, item);
            var cmtEnd = run.InsertAfter(new CommentRangeEnd() { Id = id }, item);
            run.InsertAfter(new Run(new CommentReference() { Id = id }), cmtEnd);
        }
    }
}
More detail: the document may contain the text split across runs like this:
<w:r><w:t>This </w:t></w:r>
<w:r><w:t>is </w:t></w:r>
<w:r><w:t>a first paragraph</w:t></w:r>
How can I add a comment to the text "is a first para" in that case?
In other cases the document contains a single run element, as below.
<w:r><w:t>This is a first paragraph</w:t></w:r>
So in both of these cases, how do I add a comment to my specific selection of words? I have added a screenshot here which shows exactly what I want.

If the formatting doesn't differ between the runs, and if you are allowed to modify the document, you could merge all the runs in a paragraph and then split the text you want to comment on into its own run.
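Whichever way the runs are merged or split, the first step is working out where the selection starts and ends in terms of runs. Below is a minimal sketch of that bookkeeping on plain strings, one string per run; RunTextLocator is a hypothetical helper and it deliberately does not touch the Open XML SDK.

```csharp
using System;
using System.Collections.Generic;

public static class RunTextLocator
{
    // Finds the first occurrence of `phrase` in the concatenated run texts and
    // reports which run it starts in, the character offset inside that run,
    // which run it ends in, and the (exclusive) end offset inside that run.
    // Returns null when the phrase does not occur.
    public static (int StartRun, int StartOffset, int EndRun, int EndOffset)? Locate(
        IReadOnlyList<string> runTexts, string phrase)
    {
        if (string.IsNullOrEmpty(phrase)) return null;

        string joined = string.Concat(runTexts);
        int start = joined.IndexOf(phrase, StringComparison.Ordinal);
        if (start < 0) return null;
        int end = start + phrase.Length; // exclusive, in joined-text coordinates

        int startRun = -1, startOffset = 0, endRun = -1, endOffset = 0;
        int position = 0; // index in the joined text where the current run begins

        for (int i = 0; i < runTexts.Count; i++)
        {
            int runEnd = position + runTexts[i].Length;
            if (startRun < 0 && start < runEnd)
            {
                startRun = i;
                startOffset = start - position;
            }
            if (end <= runEnd)
            {
                endRun = i;
                endOffset = end - position;
                break;
            }
            position = runEnd;
        }

        return (startRun, startOffset, endRun, endOffset);
    }
}
```

For the three-run example in the question, the runs would be passed as "This ", "is " and "a first paragraph" with the phrase "is a first para". With those positions known, the boundary runs can be split at the reported offsets, and the CommentRangeStart and CommentRangeEnd elements placed in the paragraph before the first and after the last of the resulting runs.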

Get the formatting of a table that a specific string of text exists in and create a new table with the same formatting

Using OpenXML in C#, we need to:
Find a specific string of text on a Word document (this text will always exist in a table cell)
Get the formatting of the text and the table that the text exists in.
Create a new table with the same text and table formatting while pulling in text values for the cell from a nested List
This is the code that I currently have, with comments marking the places where I am not sure how to proceed:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(fileWordFile, true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    Body body = mainPart.Document.Body;
    IEnumerable<Paragraph> paragraphs = body.Elements<Paragraph>();
    Paragraph targetParagraph = null;
    //Comment 1: Loop through paragraphs and search for a specific string of text in word document
    foreach (Paragraph paragraph in paragraphs) {
        if (paragraph.Elements<Run>().Any()) {
            Run run = paragraph.Elements<Run>().First();
            if (run.Elements<Text>().Any()) {
                Text text = run.Elements<Text>().First();
                if (text.Text.Equals("MY SEARCH STRING")) {
                    targetParagraph = paragraph;
                    // Comment 2: How can I get the formatting of the table that contains this text??
                }
            }
        }
    }
    //Comment 3: Create table with same formatting as where the text was found
    Table table1 = new Table();
    TableProperties tableProperties1 = new TableProperties();
    //Comment 4: How can I set these properties to be the same as the one found at "Comment 2"??
    wordDoc.Close();
    wordDoc.Dispose();
}

If you're looking for text elements that are inside a table cell, you can use a LINQ query to get there quickly without needing to use a heap of nested loops.
// Find the first text element matching the search string
// where the text is inside a table cell.
var textElement = body.Descendants<Text>()
.FirstOrDefault(t => t.Text == searchString &&
t.Ancestors<TableCell>().Any());
Once you have your match, the easiest way to duplicate the containing table with all its formatting and contents is simply to clone it.
if (textElement != null)
{
// get the table containing the matched text element and clone it
Table table = textElement.Ancestors<Table>().First();
Table tableCopy = (Table)table.CloneNode(deep: true);
// do stuff with copied table (see below)
}
After that, you can add things to the corresponding cell of the copied table. It's not entirely clear what you meant by "pulling in text values for the cell from a nested List" (what list? nested where?), so I'll just show a contrived example. (This code would replace the "do stuff" comment in the code above.)
// find the table cell containing the search string in the copied table
var targetCell = tableCopy.Descendants<Text>()
.First(t => t.InnerText == searchString)
.Ancestors<TableCell>()
.First();
// get the properties from the first paragraph in the target cell (so we can copy them)
var paraProps = targetCell.Descendants<ParagraphProperties>().First();
// now add new stuff to the target cell
List<string> stuffToAdd = new List<string> { "foo", "bar", "baz", "quux" };
foreach (string item in stuffToAdd)
{
// for each item, clone the paragraph properties, then add a new paragraph
var propsCopy = (ParagraphProperties)paraProps.CloneNode(deep: true);
targetCell.AppendChild(new Paragraph(propsCopy, new Run(new Text(item))));
}
Lastly, you need to add the copied table to the document somewhere or you won't see it. You don't say in your question where you would want this to appear, so I'll just put it at the end of the document. You can use methods like InsertAfter, InsertAt, InsertBefore, etc. to insert the table relative to other elements.
body.AppendChild(tableCopy);
Hope this helps.

C# openxml removal of paragraph

I am trying to remove a paragraph from a .docx file using OpenXML (I'm using some placeholder text to generate documents from a template-like .docx file), but whenever I remove a paragraph it breaks the foreach loop that I'm using to iterate through the elements.
MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants();
foreach (OpenXmlElement elem in elems) {
    if (elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
    {
        Run run = (Run)elem.Parent;
        Paragraph p = (Paragraph)run.Parent;
        p.RemoveAllChildren();
        p.Remove();
    }
}
This works: it removes my placeholder and the paragraph it is in, but the foreach loop stops iterating, and I have more things to do in that loop.
Is this an OK way to remove a paragraph in C# using OpenXML? Why is my foreach loop stopping, and how can I make it not stop? Thanks.

This is the "Halloween Problem", so called because it was noticed by some developers on Halloween, and it looked spooky to them. It is the problem of using declarative code (queries) with imperative code (deleting nodes) at the same time. If you think about it, you are iterating through a linked list, and if you start deleting nodes in the linked list, you totally mess up the iterator. A simple way to avoid this problem is to "materialize" the results of the query in a List, and then you can iterate through the list and delete nodes at will. The only difference in the following code is that it calls ToList after calling the Descendants axis.
MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants().ToList();
foreach (OpenXmlElement elem in elems) {
    if (elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
    {
        Run run = (Run)elem.Parent;
        Paragraph p = (Paragraph)run.Parent;
        p.RemoveAllChildren();
        p.Remove();
    }
}
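The same snapshot idea can be tried on an ordinary collection. The sketch below is an analogy on List<string>, not Open XML code, and SnapshotDemo is a hypothetical name. One difference worth knowing: a List<T> throws InvalidOperationException if it is modified while being enumerated, whereas the question reports that the Descendants enumeration simply stops.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Removes every entry equal to `unwanted` from `items` and returns the list.
    public static List<string> RemoveAll(List<string> items, string unwanted)
    {
        // ToList() copies the query results first, so the loop walks the copy
        // while the original list is free to change underneath it.
        foreach (string item in items.Where(x => x == unwanted).ToList())
            items.Remove(item); // removes one matching entry per iteration

        return items;
    }
}
```

Dropping the ToList() call makes the loop enumerate the live list while it is being changed, which is the situation the answer above describes.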
However, I have to note that I see another bug in your code. There is nothing to stop Word from splitting up that text node into multiple text elements from multiple runs. While in most cases, your code will work fine, sooner or later, you or a user is going to take some action (like selecting a character, and accidentally hitting the bold button on the ribbon) and then your code will no longer work.
If you really want to work at the text level, then you need to use code such as what I introduce in this screen-cast: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/08/04/introducing-textreplacer-a-new-class-for-powertools-for-open-xml.aspx
In fact, you could probably use that code verbatim to handle your use case, I believe.
Another approach, more flexible and powerful, is detailed in:
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx
While that screen-cast is about PresentationML, the same principles apply to WordprocessingML.
But even better, given that you are using WordprocessingML, is to use content controls. For one approach to document generation, see:
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
And for lots of information about using content controls in general, see:
http://www.ericwhite.com/blog/content-controls-expanded
-Eric

You have to use two loops: the first one collects the items you want to delete, and the second one deletes them.
Something like this:
List<Paragraph> paragraphsToDelete = new List<Paragraph>();
foreach (OpenXmlElement elem in elems) {
    if (elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
    {
        Run run = (Run)elem.Parent;
        Paragraph p = (Paragraph)run.Parent;
        paragraphsToDelete.Add(p);
    }
}
foreach (var p in paragraphsToDelete)
{
    p.RemoveAllChildren();
    p.Remove();
}

Dim elems As IEnumerable(Of OpenXmlElement) = MainPart.Document.Body.Descendants().ToList()
For Each elem As OpenXmlElement In elems
    ' IndexOf returns 0 when the text starts with "fullname", so compare with >= 0
    If elem.InnerText.IndexOf("fullname") >= 0 Then
        elem.RemoveAllChildren()
    End If
Next
