C# openxml removal of paragraph - c#

I am trying to remove a paragraph from a .docx file using OpenXML (I use placeholder text to generate documents from a template-like docx file), but whenever I remove a paragraph it breaks the foreach loop I'm using to iterate through the elements.
MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants();
foreach(OpenXmlElement elem in elems){
if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
{
Run run = (Run)elem.Parent;
Paragraph p = (Paragraph)run.Parent;
p.RemoveAllChildren();
p.Remove();
}
}
This works: it removes my placeholder and the paragraph it is in, but the foreach loop stops iterating, and I need to do more things in that loop.
Is this an acceptable way to remove a paragraph in C# using OpenXML? Why does my foreach loop stop, and how can I keep it from stopping? Thanks.

This is the "Halloween Problem", so called because it was first noticed on Halloween. It is the problem of mixing declarative code (queries) with imperative code (deleting nodes) at the same time. If you think about it, you are iterating through a linked list, and if you start deleting nodes in the linked list, you totally mess up the iterator. A simple way to avoid this problem is to "materialize" the results of the query in a List; then you can iterate through the list and delete nodes at will. The only difference in the following code is that it calls ToList (from System.Linq) after calling the Descendants axis.
MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants().ToList();
foreach(OpenXmlElement elem in elems){
if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
{
Run run = (Run)elem.Parent;
Paragraph p = (Paragraph)run.Parent;
p.RemoveAllChildren();
p.Remove();
}
}
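To see why the lazy query stops early, here is a minimal sketch using a toy linked list instead of a real document. It is only a model of the behavior described above, not the SDK's actual implementation; the names BuildLinks, Walk and Remove are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy singly linked list: next[i] is the index of the node after i, -1 marks the end.
int[] BuildLinks() => new[] { 1, 2, 3, 4, -1 };

// Lazy walk: the link to the following node is read only when the consumer asks for it.
IEnumerable<int> Walk(int[] next)
{
    for (int i = 0; i != -1; i = next[i])
        yield return i;
}

// Detach node i: its predecessor skips over it and its own link is cleared.
void Remove(int[] next, int i)
{
    int prev = Array.IndexOf(next, i);
    if (prev >= 0) next[prev] = next[i];
    next[i] = -1;
}

// Lazy iteration: removing node 1 while walking cuts the walk short.
var lazyLinks = BuildLinks();
var visitedLazy = new List<int>();
foreach (int i in Walk(lazyLinks))
{
    visitedLazy.Add(i);
    if (i == 1) Remove(lazyLinks, i);
}

// Materialized iteration: ToList takes a snapshot before anything is removed.
var snapshotLinks = BuildLinks();
var visitedSnapshot = new List<int>();
foreach (int i in Walk(snapshotLinks).ToList())
{
    visitedSnapshot.Add(i);
    if (i == 1) Remove(snapshotLinks, i);
}

Console.WriteLine(string.Join(",", visitedLazy));     // 0,1
Console.WriteLine(string.Join(",", visitedSnapshot)); // 0,1,2,3,4
```

The snapshot costs one extra list, which is usually a fair price for being able to delete freely inside the loop.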
However, I have to note that I see another bug in your code. There is nothing to stop Word from splitting up that text node into multiple text elements from multiple runs. While in most cases, your code will work fine, sooner or later, you or a user is going to take some action (like selecting a character, and accidentally hitting the bold button on the ribbon) and then your code will no longer work.
If you really want to work at the text level, then you need to use code such as what I introduce in this screen-cast: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/08/04/introducing-textreplacer-a-new-class-for-powertools-for-open-xml.aspx
In fact, you could probably use that code verbatim to handle your use case, I believe.
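The split-run problem can be illustrated without the SDK at all. The sketch below builds a hand-made WordprocessingML paragraph with System.Xml.Linq (the paragraph content is invented for the example) and shows that comparing individual text elements misses a placeholder that Word has split across two runs, while comparing the concatenated paragraph text still finds it. It only demonstrates detection; it is not the TextReplacer code from the screen-cast.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const string placeholder = "##MY_PLACE_HOLDER##";

// A paragraph whose placeholder is split across two runs,
// for example because part of it was accidentally made bold.
var paragraph = new XElement(w + "p",
    new XElement(w + "r",
        new XElement(w + "t", "##MY_PLACE")),
    new XElement(w + "r",
        new XElement(w + "rPr", new XElement(w + "b")),
        new XElement(w + "t", "_HOLDER##")));

// Check 1: does any single text element equal the placeholder?
bool anySingleTextMatches = paragraph
    .Descendants(w + "t")
    .Any(t => t.Value == placeholder);

// Check 2: does the text of the whole paragraph equal the placeholder?
string paragraphText = string.Concat(
    paragraph.Descendants(w + "t").Select(t => t.Value));
bool paragraphMatches = paragraphText == placeholder;

Console.WriteLine(anySingleTextMatches); // False
Console.WriteLine(paragraphMatches);     // True
```

If the goal is only to delete the paragraph that holds the placeholder, matching on the paragraph's combined text may be enough; replacing text in place across runs needs more careful handling.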
Another approach, more flexible and powerful, is detailed in:
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx
While that screen-cast is about PresentationML, the same principles apply to WordprocessingML.
But even better, given that you are using WordprocessingML, is to use content controls. For one approach to document generation, see:
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
And for lots of information about using content controls in general, see:
http://www.ericwhite.com/blog/content-controls-expanded
-Eric

You have to use two loops: the first collects the items you want to delete, and the second deletes them.
Something like this:
List<Paragraph> paragraphsToDelete = new List<Paragraph>();
foreach(OpenXmlElement elem in elems){
if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
{
Run run = (Run)elem.Parent;
Paragraph p = (Paragraph)run.Parent;
paragraphsToDelete.Add(p);
}
}
foreach (var p in paragraphsToDelete)
{
p.RemoveAllChildren();
p.Remove();
}

Dim elems As IEnumerable(Of OpenXmlElement) = MainPart.Document.Body.Descendants().ToList()
For Each elem As OpenXmlElement In elems
If elem.InnerText.IndexOf("fullname") >= 0 Then
elem.RemoveAllChildren()
End If
Next

Related

Distinct() values still letting in duplicates

This is another programming issue in which I think everything looks fine but does not work as intended.
What I'm trying to do is scrape all links from a webpage with htmlagilitypack and add them to a datagrid, but NOT to add duplicates to the datagrid.
Code:
webBrowser.Navigate(url);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser.DocumentText);
if (debug)
{
Helpers.SaveDebugToFile(@"Debug\[google.com]-" + DateTime.Now.ToString("hhmmssffffff") + "-debug.html", webBrowser.DocumentText);
}
List<string> values = new List<string>();
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
HtmlAttribute href = link.Attributes["href"];
if (href.Value.Contains("google.") || href.Value.Contains("search?") || href.Value.StartsWith("/") || href.Value.Length < 5)
{
// Ignore.
}
else
{
// DO NOT ADD TO THE DATAGRID IF href.Value ALREADY EXISTS IN COLUMN 1 //
values.Add(href.Value);
}
}
foreach (var value in values.Distinct().ToList())
{
DataGridViewLinks.Rows.Add(value, randomKeyword);
}
The code runs, but it still adds duplicates to the first column, even though I'm only adding Distinct() values (or at least that's what I intended).
I can't see the reason for this issue. I have looked over the code a good few times and don't see anything obviously wrong.
EDIT:
As was already mentioned in the comments above, most likely the content isn't exactly equal somewhere (different casing, leading or trailing whitespace, ...).
It would be better to check for duplicates (with defined casing, and with whitespace removed) when inserting into the "values" list.
Instead of calling Distinct directly in the foreach loop, you can store its result in a List and inspect the values you are getting. That tells you whether the problem is in this section of code or somewhere else. Possibly this code runs more than once, so rows are appended to the grid again on each run.
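A minimal sketch of the normalization idea from the answers above, using made-up example URLs. Distinct() with the default comparer only removes exact matches, so values that differ in casing or surrounding whitespace survive. Treating URLs as case-insensitive is an assumption made for the example; it is not always correct for the path part of a URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var hrefs = new List<string>
{
    "http://example.com/a",
    "http://example.com/a ",   // trailing space
    "HTTP://EXAMPLE.COM/A",    // different casing
    "http://example.com/b",
    "http://example.com/b",    // exact duplicate
};

// Default Distinct() only collapses the exact duplicate.
int plainDistinctCount = hrefs.Distinct().Count();

// Normalize on insert: trim whitespace and compare case-insensitively.
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
var unique = new List<string>();
foreach (string href in hrefs)
{
    string normalized = href.Trim();
    if (seen.Add(normalized))   // Add returns false if the value is already present
        unique.Add(normalized);
}

Console.WriteLine(plainDistinctCount); // 4
Console.WriteLine(unique.Count);       // 2
```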

C# - Duplicates in List of string Lists instead of proper values

I'm reading from an XML file using foreach (as shown below) and writing the information found into a List, which is later added to a list of lists. My problem is that the moment the foreach loop adds another element to my list of lists, it somehow erases the content of the previous elements and adds x copies of the same one instead. E.g. the first loop is OK, on the second loop it erases the first element and adds 2 of the same, on the 3rd loop it adds 3 identical lists, etc.
It might be a simple problem, but I really cannot think of a solution at the moment.
Code:
static List<List<string>> AddPapers(XmlNodeList nodelist)
{
var papers = new List<List<string>>();
var paper = new List<string>();
foreach (XmlNode node in nodelist)
{
paper.Clear();
for (int i = 0; i < node.ChildNodes.Count; i++)
{
paper.Add(node.ChildNodes[i].InnerText);
}
papers.Add(paper);
}
return papers;
}
More info: this is a simplified version without all the fancy stuff I do with the XML, but the problem is the same.
The paper list is correct every time I check it, so the problem should be in adding to papers. I honestly have no idea why, or even how, it can erase the contents of papers and add the same values on its own.
The problem is that you're only calling paper.Clear, which clears the list that you just added, but then you re-populate it with new items and add it again.
Instead, you should create a new instance of the list on each iteration, so you're not always modifying the same list over and over again (remember a List<T> is a reference type, so you're only adding a reference to the list).
For example:
static List<List<string>> AddPapers(XmlNodeList nodelist)
{
var papers = new List<List<string>>();
foreach (XmlNode node in nodelist)
{
// Create a new list on each iteration
var paper = new List<string>();
for (int i = 0; i < node.ChildNodes.Count; i++)
{
paper.Add(node.ChildNodes[i].InnerText);
}
papers.Add(paper);
}
return papers;
}
Also, using System.Linq extension methods, your code can be reduced to:
static List<List<string>> GetChildrenInnerTexts(XmlNodeList nodes)
{
return nodes.Cast<XmlNode>()
.Select(node => node.ChildNodes.Cast<XmlNode>()
.Select(child => child.InnerText)
.ToList())
.ToList();
}
The issue is with references. You need to create a new 'paper' list instead of clearing the existing one.
Inside your first foreach loop, change
paper.Clear()
to
paper = new List<string>();
When you clear the object, every index of papers still holds a reference to that same list object, so they all show whatever it contains last.
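The reference behavior described in these answers can be checked directly. This is a small sketch with invented sample values ("first", "second"); it reproduces the Clear-and-reuse pattern from the question and shows that both entries of the outer list are the same object.

```csharp
using System;
using System.Collections.Generic;

var papers = new List<List<string>>();
var paper = new List<string>();

foreach (string title in new[] { "first", "second" })
{
    paper.Clear();        // empties the one list that papers already refers to
    paper.Add(title);
    papers.Add(paper);    // stores another reference to that same list
}

// Both entries point at the same list, so both show the last value written.
bool sameObject = object.ReferenceEquals(papers[0], papers[1]);

Console.WriteLine(sameObject);    // True
Console.WriteLine(papers[0][0]);  // second
Console.WriteLine(papers[1][0]);  // second
```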

OpenXML Remove text from template

I have a number of .docx templates that customers download, but certain words need to be changed or removed from the document for different customers. I can't find anything on how to remove text:-
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
foreach (Text element in doc.MainDocumentPart.Document.Body.Descendants<Text>())
{
//This is fine
element.Text = element.Text.Replace("DocumentDate", wordReferenceTemplatesMV.DocumentDate);
//Need help on how to remove text
element.Text = element.Text.Remove???("TextToRemove")
}
}
Why not just replace it with an empty string?
element.Text = element.Text.Replace("TextToRemove", string.Empty);
Most text values are in Run elements, so you can loop through all the Run elements and check their text. It should be something like:
Body body = wordprocessingDocument.MainDocumentPart.Document.Body;
// ToList() takes a snapshot so runs can be removed inside the loop
foreach (Run r in body.Descendants<Run>().ToList())
{
string sText = r.InnerText;
//...compare the text with the value
// Note: sometimes the text is split across two runs; depending on your requirements you may need to join them.
}
If you want to delete the text, you can just delete the run by calling its Remove() method:
r.Remove();
Some more details about Runs and text objects:
If you use the file as a template, I usually set some special property on the Run element so that I can find it more reliably later.
For example, inside the run loop, before checking the text, you can check the highlight color first.
if (r.RunProperties?.Highlight?.Val?.Value == DocumentFormat.OpenXml.Wordprocessing.HighlightColorValues.Yellow) // RunProperties and Highlight may be null
{
string sText = r.InnerText ;
....
}
Hope it helps.
If you don't want the element any more then you can delete the whole element:
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
foreach (Text element in doc.MainDocumentPart.Document.Body.Descendants<Text>().ToList()) // snapshot, so removing inside the loop is safe
{
if (element.Text == "TextToRemove")
element.Remove();
}
}
Edit
If you're left with an empty line, chances are the Text was inside a Paragraph that is now empty. In that case you want to remove the Paragraph instead. Note that the parent of a Text is its Run, so walk up to the enclosing Paragraph:
if (element.Text == "TextToRemove")
element.Ancestors<Paragraph>().FirstOrDefault()?.Remove();
I don't think it's the paragraph element causing the empty line when removed.
Clients send over a template with an address block as:-
[address1]
[address2]
[city]
[town]
[state]
[zip]
The fields are populated from the database with the replace function, but if an address doesn't contain an [address2] value, that placeholder is what I need to remove. If I remove the text, I'm still left with an empty line between [address1] and [city]. The [address2] field isn't in its own paragraph.

passing a list to a list<T>

I am working with OpenXML and have something that is making me pull my hair out. Basically I am editing a pre-existing document. It is a template, and the template should keep its first and second pages, so every section I add (paragraph, table, etc.) should go between those 2 pages. I have already accomplished that; I can insert a simple table this way:
DocTable docTable = new DocTable();
Paragraph paragraph = doc.MainDocumentPart.Document.Body.Descendants<Paragraph>()
.Where<Paragraph>(p => p.InnerText.Equals("some Text")).First();
Table table = docTable.createTable(Convert.ToInt16(2), Convert.ToInt16(2));
mainPart.Document.Body.InsertAfter(table, paragraph);
Basically, I search for the paragraph at the end of page 1 and insert the table after it. My problem is that I don't receive a single section from the front-end web page; I receive a list of sections. I defined this list as a list of objects without a specific type, since it can contain tables, paragraphs and other things.
So basically I have this:
List<Object> listOfSections = new List<Object>();
I receive the sections from the front end and identify what each one is by its key, like this:
foreach (DocumentAtributes section in sections.atributes)
{
if(section.key != "Document")
{
checkSection(mainPart, section, listOfSections);
}
}
public void checkSection(MainDocumentPart mainPart,DocumentAtributes section,List<Object> listOfSections)
{
switch (section.key)
{
case "Table":
DocTable docTable = new DocTable();
Table table = docTable.createTable(Convert.ToInt16(section.rows), Convert.ToInt16(section.cols));
listOfSections.Add(new Run(table));
break;
case "Paragraph":
DocRun accessTypeTitle = new DocRun();
Run permissionTitle = accessTypeTitle.createParagraph(section.text, PARAGRAPHCOLOR, Convert.ToInt16(section.fontSize), DEFAULTFONT,section.align);
listOfSections.Add(permissionTitle);
break;
case "Image":
DocImage docImage = new DocImage();
Run image = docImage.imageCreatorFromDisk(mainPart, "abcd", Convert.ToInt16(section.width), Convert.ToInt16(section.height), section.align, null, null, section.wrapChoice, section.base64);
listOfSections.Add(image);
break;
}
}
I need a way to pass this list to InsertAfter. It has to be the whole list: I can't add the objects one by one, because each new section would be inserted directly after the paragraph, ahead of the ones already inserted, and I want the order to be the same as in sections.atributes.
So InsertAfter would need to accept a list, and I have a list of objects; the method would look like this: insertAfter(List, refChild)
Can I cast my list of objects, or do something else? I need some help here.
You can iterate the list in reverse to have the first element in the list immediately after the paragraph, followed by the second, then the third etc.
for (int i = listOfSections.Count - 1; i >= 0; i--)
{
mainPart.Document.Body.InsertAfter((OpenXmlElement)listOfSections[i], paragraph); // cast needed because the list holds Object
}
If you start with a list with elements:
Element1
Element2
Element3
Element4
And the document starts with just:
Paragraph
Then after each iteration you would end up with:
Iteration 1
Paragraph
Element4
Iteration 2
Paragraph
Element3
Element4
Iteration 3
Paragraph
Element2
Element3
Element4
and finally, Iteration 4
Paragraph
Element1
Element2
Element3
Element4
which is the desired result.
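The ordering argument can be checked with a plain LinkedList<string> standing in for the document body; the node names are the ones from the walkthrough above, plus a made-up "LastPage" node for the content that follows. The sketch also shows a second option, advancing a cursor to each newly inserted node, which gives the same order without reversing. As far as I know, InsertAfter in the Open XML SDK returns the inserted element, so the cursor variant should carry over, but treat that as an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sections = new List<string> { "Element1", "Element2", "Element3", "Element4" };

// Option A: iterate in reverse, always inserting right after the same anchor.
var reversed = new LinkedList<string>();
LinkedListNode<string> anchor = reversed.AddLast("Paragraph");
reversed.AddLast("LastPage");
for (int i = sections.Count - 1; i >= 0; i--)
    reversed.AddAfter(anchor, sections[i]);

// Option B: iterate forward, moving the cursor to the node just inserted.
var forward = new LinkedList<string>();
LinkedListNode<string> cursor = forward.AddLast("Paragraph");
forward.AddLast("LastPage");
foreach (string section in sections)
    cursor = forward.AddAfter(cursor, section);

// Both print: Paragraph | Element1 | Element2 | Element3 | Element4 | LastPage
Console.WriteLine(string.Join(" | ", reversed));
Console.WriteLine(string.Join(" | ", forward));
```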

I'm using the property findelements with selenium and C#, but it keeps giving the same error

This is the part of the code I was using to get the elements, but instead of their text it keeps giving me the following (or something similar):
System.Collections.ObjectModel.ReadOnlyCollection`1[OpenQA.Selenium.IWebElement]
This is what shows up in the rows of the DataGridView.
IList<IWebElement> ruas = Gdriver.FindElements(By.ClassName("search-title"));
String[] AllText = new String[ruas.Count];
int i = 0;
foreach (IWebElement element in ruas)
{
AllText[i++] = element.Text;
table.Rows.Add(ruas);
}
First thing: as far as I understand, the elements you are talking about are not contained in a table. It's a list: <ul class="list-unstyled list-inline">... (going by the comment you left with the site link)
If you want to find those elements you can use the code below:
var elements = driver.FindElements(By.CssSelector("ul.list-inline > li > a"));
// Here you can iterate through the links and do whatever you want with them
foreach (var element in elements)
{
Console.WriteLine(element.Text);
}
// Here is the collection of links texts
var linkNames = elements.Select(e => e.Text).ToList();
Considering the error you get, I assume you are using a DataGridView to hold the collected data. DataGridView is a Windows Forms control for displaying data, not a data store, and passing the whole collection to Rows.Add makes the cell show the collection's type name instead of the element text. There is no standard Selenium class for storing table data. There are multiple approaches for this, but I can't suggest one because I don't know what you are trying to achieve.
Here is how I answered my own question:
IList<string> all = new List<string>();
foreach (var element in Gdriver.FindElements(By.ClassName("search-title")))
{
all.Add(element.Text);
table.Rows.Add(element.Text);
}
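The type name that appeared in the grid is simply what ToString() returns for a collection. The sketch below reproduces that without Selenium or a DataGridView: a ReadOnlyCollection<string> stands in for the collection that FindElements returns, and a plain list of object arrays stands in for the grid rows. Both stand-ins and the sample street names are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Stand-in for the collection returned by FindElements (here: the element texts).
var found = new ReadOnlyCollection<string>(new List<string> { "Rua A", "Rua B" });

// Stand-in for grid rows: each row is an array of cell values.
var wrongRows = new List<object[]>();
var rightRows = new List<object[]>();

foreach (string text in found)
{
    wrongRows.Add(new object[] { found }); // the whole collection as one cell
    rightRows.Add(new object[] { text });  // one element's text as one cell
}

// A cell is displayed through ToString(); a collection has no useful override,
// so it prints its type name.
var wrongCells = wrongRows.Select(row => row[0].ToString()).ToList();
var rightCells = rightRows.Select(row => row[0].ToString()).ToList();

Console.WriteLine(wrongCells[0]); // System.Collections.ObjectModel.ReadOnlyCollection`1[System.String]
Console.WriteLine(string.Join(", ", rightCells)); // Rua A, Rua B
```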
