How to replace the InnerText of a Comment - c#

I've tried the following:
comment.InnerText=comment.InnerText.Replace(comment.InnerText,new_text);
Which doesn't work because we can only read the InnerText property. How do I effectively change the InnerText value so I can save the modifications to WordProcessing.CommentsPart.Comments and MainDocumentPart.Document ?
EDIT: DocumentFormat.OpenXml.Wordprocessing.Comment is comment's class.
EDIT 2: The method:
public void updateCommentInnerTextNewWorkItem(List<Tuple<Int32, String, String>> list){
//DOCX.CDOC.Comments -> WordProcessingCommentsPart.Comments
//DOCX._CIT -> Dictionary<int,string>
foreach (var comm in DOCX.CDOC.Comments)
{
foreach (var item in list)
{
foreach (var item_cit in DOCX._CIT)
{
if (((Comment)comm).InnerText.Contains("<tag>") && item.Item3.Contains(item_cit.Value))
{
comm.InnerXml = comm.InnerXml.Replace(comm.InnerText, item.Item1 + "");
//comm.InnerText.Replace(comm.InnerText,item.Item1+"");
//DOCX.CDOC.Comments.Save();
//DOCX.DOC.MainDocumentPart.Document.Save();
}
if (((Comment)comm).InnerText.Contains("<tag class") && item.Item3.Contains(item_cit.Value))
{
//comm.InnerText.Replace(comm.InnerText, item.Item1 + "");
comm.InnerXml = comm.InnerXml.Replace(comm.InnerText, item.Item1 + "");
//DOCX.CDOC.Comments.Save();
//DOCX.DOC.MainDocumentPart.Document.Save();
}
}
}
}
DOCX.CDOC.Comments.Save();
DOCX.DOC.MainDocumentPart.Document.Save();
}

It's read-only because it returns the XML content with all XML tags removed. So setting it would strip it of all XML tags.
If the text you want to replace does not span tags you could just replace the text in the XML:
comment.InnerXml=comment.InnerXml.Replace(comment.InnerText,new_text);

It is not quite that easy (but still not complex). A comment has structure, just like the document body: it can contain Paragraphs, Runs, etc. InnerText just returns the concatenated text of all runs of all paragraphs in the comment, which is why you cannot simply set it.
So first you have to remove all of the comment's paragraphs:
comment.RemoveAllChildren<Paragraph>();
The next step is to add a new paragraph with a run that contains the text you need:
Paragraph paragraph = new Paragraph();
Run run = new Run();
Text text = new Text();
text.Text = "My comment";
run.Append(text);
paragraph.Append(run);
comment.Append(paragraph);
Finally, do not forget to save the changes:
doc.MainDocumentPart.WordprocessingCommentsPart.Comments.Save();
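The steps above can be gathered into one helper. This is only a sketch, assuming the Open XML SDK (DocumentFormat.OpenXml package) is referenced; the class name CommentEditor, the method name ReplaceCommentText, and the lookup by comment id are hypothetical and not part of the original answer.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CommentEditor
{
    // Hypothetical helper: replaces the whole body of the comment with the given id.
    // Returns false when the document has no comments part or no matching comment.
    public static bool ReplaceCommentText(string path, string commentId, string newText)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            WordprocessingCommentsPart part = doc.MainDocumentPart.WordprocessingCommentsPart;
            if (part == null) return false;

            Comment comment = part.Comments.Elements<Comment>()
                .FirstOrDefault(c => c.Id != null && c.Id.Value == commentId);
            if (comment == null) return false;

            // Drop the existing paragraphs, then add a single paragraph/run/text.
            comment.RemoveAllChildren<Paragraph>();
            comment.AppendChild(new Paragraph(new Run(new Text(newText))));

            part.Comments.Save();
            return true;
        }
    }
}
```

Any formatting inside the old comment is lost, because the whole paragraph tree is replaced rather than edited in place.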

This is a little complex; I have had the same problem.
You will need the XmlElement class. Suppose, for example, there is a variable named xmlDoc that is an instance of XmlDocument.
Use the SelectSingleNode method to get a reference to the XmlNode you want to edit, then cast the XmlNode to XmlElement (suppose the XmlNode is named 'node'):
XmlElement XmlEle = (XmlElement)node;
Or, more simply, you can use this:
XmlElement XmlEle = (XmlElement)xmlDoc.SelectSingleNode("dict/integer");
And now you can use the variable XmlEle to replace the InnerText because it's just a reference.
Like this:
XmlEle.InnerText = TopNumber.ToString();

Just don't use InnerXml; set the Text of each run instead:
foreach (Paragraph paragraph in document.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>())
{
bool ss = paragraph.InnerXml.Contains("commentRangeStart");
bool ee = paragraph.InnerXml.Contains("commentRangeEnd");
if (ss && ee)
{
foreach (Run run in paragraph.Elements<Run>())
{
foreach (Text text in run.Elements<Text>())
{
text.Text = "your word " ;
}
}
}
}

Related

how to get value from xml by Linq

I am reading a huge XML file (5 GB) using the following code. I successfully get the first element, Testid, but fail to get another element, TestMin, which is under a different namespace.
This is the XML I have:
The value comes back as null.
What is wrong here?
EDIT
GMileys' answer gives an error like: The ':' character, hexadecimal value 0x3A, cannot be included in a name
The element es:qRxLevMin is a child element of xn:attributes, but it looks like you are trying to select it as a child of xn:vsDataContainer, it is a grandchild of that element. You could try changing the following:
var dataqrxlevmin = from atts in pin.ElementsAfterSelf(xn + "VsDataContainer")
select new
{
qrxlevmin = (string)atts.Element(es + "qRxLevMin"),
};
To this (Elements takes a single XName rather than a path, so chain one call per level):
var dataqrxlevmin = from atts in pin.Elements(xn + "VsDataContainer").Elements(xn + "attributes")
select new
{
qrxlevmin = (string)atts.Element(es + "qRxLevMin"),
};
Note: each level needs its own Elements call. A formatted path such as "{0}VsDataContainer/{1}attributes" is not a valid XName and throws the "':' character cannot be included in a name" error.
What about this approach?
XDocument doc = XDocument.Load(path);
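// XName.Get takes the namespace URI, not the prefix: replace "un" and "es"
// with the URIs declared for those prefixes in your file.
XName utranCellName = XName.Get("UtranCell", "un");
XName qRxLevMinName = XName.Get("qRxLevMin", "es");
var cells = doc.Descendants(utranCellName);
foreach (var cell in cells)
{
// The explicit cast turns the element (or null) into its string value.
string qRxLevMin = (string)cell.Descendants(qRxLevMinName).FirstOrDefault();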
// Do something with the value
}
Try this code, which is very similar to yours but simpler.
using (XmlReader xr = XmlReader.Create(path))
{
xr.MoveToContent();
XNamespace un = xr.LookupNamespace("un");
XNamespace xn = xr.LookupNamespace("xn");
XNamespace es = xr.LookupNamespace("es");
while (!xr.EOF)
{
if(xr.LocalName != "UtranCell")
{
xr.ReadToFollowing("UtranCell", un.NamespaceName);
}
if(!xr.EOF)
{
XElement utranCell = (XElement)XElement.ReadFrom(xr);
}
}
}
Actually, the namespace was the culprit. What I did was first load the small section I got from the ReadFrom method into an XDocument, then remove all the namespaces, and then read the value. Simple :)
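Stripping namespaces works, but the value can also be read with the namespaces left in place. The sketch below is an illustration only: the XML from the question was not preserved, so the element layout and the namespace URIs ("urn:example:xn", "urn:example:es") are placeholders, and the names NamespaceSketch and FirstQRxLevMin are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceSketch
{
    // Placeholder URI; use the xmlns:es value declared in the real file.
    static readonly XNamespace Es = "urn:example:es";

    public static string FirstQRxLevMin(XElement container)
    {
        // Descendants matches at any depth, so the intermediate
        // xn:attributes level does not have to be spelled out.
        return (string)container.Descendants(Es + "qRxLevMin").FirstOrDefault();
    }

    public static void Main()
    {
        XElement container = XElement.Parse(
            "<xn:VsDataContainer xmlns:xn='urn:example:xn' xmlns:es='urn:example:es'>" +
            "<xn:attributes><es:qRxLevMin>-115</es:qRxLevMin></xn:attributes>" +
            "</xn:VsDataContainer>");

        Console.WriteLine(FirstQRxLevMin(container)); // prints -115
    }
}
```

The cast to string returns null when no matching element exists, so a missing value does not throw.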

How to iterate a xml file with XmlReader class

My XML is stored in a file that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<metroStyleManager>
<Style>Blue</Style>
<Theme>Dark</Theme>
<Owner>CSRAssistant.Form1, Text: CSR Assistant</Owner>
<Site>System.ComponentModel.Container+Site</Site>
<Container>System.ComponentModel.Container</Container>
</metroStyleManager>
I am iterating over it this way, but there is a glitch:
XmlReader rdr = XmlReader.Create(System.IO.Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath) + @"\Products.xml");
while (rdr.Read())
{
if (rdr.NodeType == XmlNodeType.Element)
{
string xx1= rdr.LocalName;
string xx = rdr.Value;
}
}
string xx = rdr.Value; always returns an empty string.
When the element is Style, the value should be Blue, as in the file, but I always get an empty string. Can you say why?
Another requirement is that I only want to iterate within <metroStyleManager></metroStyleManager>.
Can anyone help with these two points? Thanks.
Blue is the value of Text node, not of Element node. You either need to add another if to get value of text nodes, or you can read inner xml of current element node:
rdr.MoveToContent();
while (rdr.Read())
{
if (rdr.NodeType == XmlNodeType.Element)
{
string name = rdr.LocalName;
string value = rdr.ReadInnerXml();
}
}
You can also use Linq to Xml to get names and values of root children:
var xdoc = XDocument.Load(path_to_xml);
var query = from e in xdoc.Root.Elements()
select new {
e.Name.LocalName,
Value = (string)e
};
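For the second requirement in the question (staying inside metroStyleManager), one possible approach is ReadSubtree combined with ReadElementContentAsString. This is a sketch that assumes every child element holds only text, as in the sample file; ReaderSketch and ReadChildren are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderSketch
{
    // Returns element name -> text for the direct children of rootName.
    public static Dictionary<string, string> ReadChildren(TextReader input, string rootName)
    {
        var values = new Dictionary<string, string>();
        using (XmlReader rdr = XmlReader.Create(input))
        {
            if (!rdr.ReadToFollowing(rootName)) return values; // root not found

            // The subtree reader stops at the end tag of rootName.
            using (XmlReader inner = rdr.ReadSubtree())
            {
                inner.Read(); // position on the root element itself
                inner.Read(); // move to the first node inside it
                while (!inner.EOF)
                {
                    if (inner.NodeType == XmlNodeType.Element)
                    {
                        string name = inner.LocalName;
                        // Reads the text and advances past the end tag,
                        // so no extra Read() is needed in this branch.
                        values[name] = inner.ReadElementContentAsString();
                    }
                    else
                    {
                        inner.Read(); // skip whitespace and the closing tag
                    }
                }
            }
        }
        return values;
    }

    public static void Main()
    {
        string xml = "<metroStyleManager><Style>Blue</Style><Theme>Dark</Theme></metroStyleManager>";
        foreach (var pair in ReadChildren(new StringReader(xml), "metroStyleManager"))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

ReadElementContentAsString throws if a child contains nested elements, so a deeper file would need a different strategy.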
You can use the XmlDocument class for this.
XmlDocument doc = new XmlDocument();
doc.Load(filename);
foreach (XmlNode node in doc.ChildNodes)
{
if (node.Name == "metroStyleManager")
{
foreach (XmlNode subNode in node.ChildNodes)
{
string key = subNode.LocalName; // Style, Theme, etc.
string value = subNode.InnerText; // Blue, Dark, etc. (Value is null for element nodes)
}
}
else
{
...
}
}
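You can use XDocument xDoc = XDocument.Load(strFilePath) to load the XML file.
Then you can iterate over the children of the root element:
foreach (XElement xeNode in xDoc.Element("metroStyleManager").Elements())
{
// xeNode.Name.LocalName is the element name (Style, Theme, etc.)
// xeNode.Value is its text content (Blue, Dark, etc.)
string name = xeNode.Name.LocalName;
string value = xeNode.Value;
}
Hope it helps.
BTW, XDocument is in the System.Xml.Linq namespace.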

A better way to handle XML updates

I have a DataGridView control that is populated with some values.
I also have an XML file. The user can change the value in the Warning column of the DataGridView, and that change needs to be saved to the XML file.
The program below does the job:
XDocument xdoc = XDocument.Load(filePath);
//match the record
foreach (var rule in xdoc.Descendants("Rule"))
{
foreach (var row in dgRulesMaster.Rows.Cast<DataGridViewRow>())
{
if (rule.Attribute("id").Value == row.Cells[0].Value.ToString())
{
rule.Attribute("action").Value = row.Cells[3].Value.ToString();
}
}
}
//save the record
xdoc.Save(filePath);
Matching the grid values with the XML document and for the matched values, updating the needed XML attribute.
Is there a better way to code this?
Thanks
You could do something like this:
var rules = dgRulesMaster.Rows.Cast<DataGridViewRow>()
.Select(x => new {
RuleId = x.Cells[0].Value.ToString(),
IsWarning = x.Cells[3].Value.ToString() });
var tuples = from n in xdoc.Descendants("Rule")
from r in rules
where n.Attribute("id").Value == r.RuleId
select new { Node = n, Rule = r };
foreach(var tuple in tuples)
tuple.Node.Attribute("action").Value = tuple.Rule.IsWarning;
This is basically the same, just a bit more LINQ-y. Whether or not this is "better" is debatable. One thing I removed is the conversion of IsWarning first to string, then to int and finally back to string. It now is converted to string once and left that way.
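Both versions above compare every rule against every row. If the number of rules grows, a dictionary lookup avoids the nested comparison. The sketch below is an assumption-laden illustration: a plain dictionary (rule id to new action) stands in for the DataGridView rows, and RuleUpdater and ApplyActions are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RuleUpdater
{
    // actionsById stands in for the grid rows: rule id -> new "action" value.
    // Returns the number of Rule elements that were updated.
    public static int ApplyActions(XDocument xdoc, IDictionary<string, string> actionsById)
    {
        int updated = 0;
        foreach (XElement rule in xdoc.Descendants("Rule"))
        {
            // The cast yields null instead of throwing when "id" is missing.
            string id = (string)rule.Attribute("id");
            if (id != null && actionsById.TryGetValue(id, out string action))
            {
                rule.SetAttributeValue("action", action);
                updated++;
            }
        }
        return updated;
    }

    public static void Main()
    {
        XDocument xdoc = XDocument.Parse(
            "<Rules><Rule id='1' action='allow'/><Rule id='2' action='allow'/></Rules>");
        var changes = new Dictionary<string, string> { { "2", "warn" } };

        Console.WriteLine(ApplyActions(xdoc, changes)); // prints 1
    }
}
```

With the real grid, the dictionary could be built from the rows with ToDictionary on the two cell values before calling the helper.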
XPath lets you target nodes in the XML with a lot of power. Microsoft's example of using the XPathNavigator to modify an XML file is as follows:
XmlDocument document = new XmlDocument();
document.Load("contosoBooks.xml");
XPathNavigator navigator = document.CreateNavigator();
XmlNamespaceManager manager = new XmlNamespaceManager(navigator.NameTable);
manager.AddNamespace("bk", "http://www.contoso.com/books");
foreach (XPathNavigator nav in navigator.Select("//bk:price", manager))
{
if (nav.Value == "11.99")
{
nav.SetValue("12.99");
}
}
Console.WriteLine(navigator.OuterXml);
Source: http://msdn.microsoft.com/en-us/library/zx28tfx1(v=vs.80).aspx

Neatest way to extract 'text' value from an XmlNode?

Given the code:
var doc = new XmlDocument();
doc.LoadXml(@"<a>
<b>test
<c>test2</c>
</b>
</a>");
var node = doc.SelectNodes("/a/b")[0];
I then want to extract the 'text' value of node b (in this case "test") without retrieving the text of all child nodes, as InnerText does.
I find myself resorting to this code:
var elementText = node.ChildNodes.Cast<XmlNode>().First(a => a.NodeType == XmlNodeType.Text).Value;
Unfortunately, node.Value does something else in this case.
Is there a neater, built-in way that avoids the LINQ cast and does not involve something like this?
foreach (var childNode in node.ChildNodes)
if (childNode.NodeType==XmlNodeType.Text)
...
I prefer XDocument to XmlDocument, I think it's easier to work with. You can easily get a value using the Element method to find the "b" element, and then using the Value property.
using(var stream = new MemoryStream())
{
using(var streamWriter = new StreamWriter(stream))
{
streamWriter.Write(@"<a>
<b>test
<c>test2</c>
</b>
</a>");
streamWriter.Flush();
streamWriter.BaseStream.Seek(0, SeekOrigin.Begin);
var doc = XDocument.Load(stream);
Console.WriteLine(doc.Element("a").Element("b").FirstNode.ToString());
}
}
EDIT: As noted in comments, that would get the incorrect value. I've updated it correctly.
In LINQ2XML you can do this
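// Descendants returns elements only, so take their child nodes before filtering for text.
foreach (XText elm in doc.Descendants("b").Nodes().OfType<XText>())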
{
//elm has the text
}
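You want to use node.InnerText instead of Value. So you would have this:
string s = null;
foreach (XmlNode child in node.ChildNodes)
{
if (string.IsNullOrEmpty(s))
{
// Start with the text of the first child.
s = child.InnerText;
}
else
{
// Remove the text contributed by later child nodes.
s = s.Replace(child.InnerText, "");
}
}
// Trim returns a new string, so assign the result back.
s = s?.Trim();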
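Another option, not mentioned in the answers above, is the XPath text() node test, which selects only the text nodes that are direct children of the element. A minimal sketch follows; OwnTextSketch and FirstOwnText are hypothetical names.

```csharp
using System;
using System.Xml;

public static class OwnTextSketch
{
    // Returns the first text node directly under the element, trimmed,
    // or null when the element has no text of its own.
    public static string FirstOwnText(XmlNode node)
    {
        XmlNode textNode = node.SelectSingleNode("text()");
        return textNode == null ? null : textNode.Value.Trim();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<a><b>test<c>test2</c></b></a>");

        Console.WriteLine(FirstOwnText(doc.SelectSingleNode("/a/b"))); // prints test
    }
}
```

Text inside the nested c element is not returned, because text() only matches children of the context node.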

Grab all text from html with Html Agility Pack

Input
<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>
Output
foo
bar
baz
I know of htmldoc.DocumentNode.InnerText, but it will give foobarbaz - I want to get each text, not all at a time.
XPATH is your friend :)
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>");
foreach(HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
Console.WriteLine("text=" + node.InnerText);
}
var root = doc.DocumentNode;
var sb = new StringBuilder();
foreach (var node in root.DescendantNodesAndSelf())
{
if (!node.HasChildNodes)
{
string text = node.InnerText;
if (!string.IsNullOrEmpty(text))
sb.AppendLine(text.Trim());
}
}
This does what you need, but I am not sure if this is the best way. Maybe you should iterate through something other than DescendantNodesAndSelf for optimal performance.
I was in the need of a solution that extracts all text but discards the content of script and style tags. I could not find it anywhere, but I came up with the following which suits my own needs:
StringBuilder sb = new StringBuilder();
IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where( n =>
n.NodeType == HtmlNodeType.Text &&
n.ParentNode.Name != "script" &&
n.ParentNode.Name != "style");
foreach (HtmlNode node in nodes) {
Console.WriteLine(node.InnerText);
}
var pageContent = "{html content goes here}";
var pageDoc = new HtmlDocument();
pageDoc.LoadHtml(pageContent);
var pageText = pageDoc.DocumentNode.InnerText;
The specified example for html content:
<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>
will produce the following output:
foo bar baz
public string html2text(string html) {
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(@"<html><body>" + html + "</body></html>");
return doc.DocumentNode.SelectSingleNode("//body").InnerText;
}
This workaround is based on Html Agility Pack. You can also install it via NuGet (package name: HtmlAgilityPack).
https://github.com/jamietre/CsQuery
Have you tried CsQuery? Though it is not actively maintained, it is still my favorite for parsing HTML to text. Here is a one-liner showing how simple it is to get the text from HTML:
var text = CQ.CreateDocument(htmlText).Text();
Here's a complete console application:
using System;
using CsQuery;
public class Program
{
public static void Main()
{
var html = "<div><h1>Hello World <p> some text inside h1 tag under p tag </p> </h1></div>";
var text = CQ.CreateDocument(html).Text();
Console.WriteLine(text); // Output: Hello World some text inside h1 tag under p tag
}
}
I understand that the OP asked for HtmlAgilityPack only, but CsQuery is a less popular library and one of the best solutions I have found, so I wanted to share it in case someone finds it helpful. Cheers!
I just changed and fixed some people's answers to work better:
var document = new HtmlDocument();
document.LoadHtml(result);
var sb = new StringBuilder();
foreach (var node in document.DocumentNode.DescendantsAndSelf())
{
if (!node.HasChildNodes && node.Name == "#text" && node.ParentNode.Name != "script" && node.ParentNode.Name != "style")
{
string text = node.InnerText?.Trim();
// string has no HasValue() method; string.IsNullOrEmpty also covers the null case.
if (!string.IsNullOrEmpty(text) && !text.StartsWith('<') && !text.EndsWith('>'))
sb.AppendLine(System.Web.HttpUtility.HtmlDecode(text));
}
}
