How to get text value from XDocument? - c#

I have an XDocument, for example:
<cars>
<name>
<ford>model1</ford>
textvalue
<renault>model2</renault>
</name>
</cars>
How do I get the text value from the XDocument? How can I tell textvalue apart from the elements around it?

LINQ to XML (XLinq) represents text values as XText nodes, so you can easily check whether a node is an XText, or check its NodeType:
// get all text nodes
var textNodes = document.DescendantNodes()
.Where(x => x.NodeType == XmlNodeType.Text);
However, it seems you only want to find the somewhat lonely piece of text named textvalue. There is no dedicated way to recognize this valid but unusual construct. You can either check whether the parent is named 'name', or check whether the text node is alone in its parent:
// get 'lost' text nodes: text that shares its parent with other nodes (mixed content)
var lostTextNodes = document.DescendantNodes()
.Where(x => x.NodeType == XmlNodeType.Text)
.Where(x => x.Parent != null && x.Parent.Nodes().Count() > 1);
Edit: one extra comment. I see that many people claim this XML is invalid, and I have to disagree. Although it's not pretty, it is still well-formed according to my knowledge (and validators): XML allows mixed content, i.e. text interleaved with child elements.
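Combining the two snippets, here is a minimal, self-contained sketch that parses the sample XML and picks out the text node that shares its parent with element siblings. It assumes the XML is available as a string; the Parent null check is an addition that skips any text sitting directly under the document node.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Sample document from the question: "textvalue" is mixed content inside <name>.
const string xml = @"<cars>
  <name>
    <ford>model1</ford>
    textvalue
    <renault>model2</renault>
  </name>
</cars>";

// Default load options drop whitespace-only text nodes such as the indentation.
XDocument document = XDocument.Parse(xml);

// All text nodes, including element text such as "model1".
var textNodes = document.DescendantNodes()
    .Where(x => x.NodeType == XmlNodeType.Text)
    .Cast<XText>()
    .ToList();

// Text nodes that share their parent with other nodes (mixed content).
var lostTextNodes = textNodes
    .Where(t => t.Parent != null && t.Parent.Nodes().Count() > 1)
    .Select(t => t.Value.Trim())
    .ToList();

foreach (string text in lostTextNodes)
{
    Console.WriteLine(text); // textvalue
}
```

Trim() is applied because the text node also carries the line breaks and indentation around "textvalue".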

You can use the Nodes() method to iterate over the child nodes of the <name> element (the root element <cars> has only <name> as its child). From there on, text nodes are represented by XText instances, and their text value is available through their Value property (Trim() strips the surrounding line breaks and indentation):
string textValue = yourDoc.Root.Element("name").Nodes().OfType<XText>().First().Value.Trim();
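A short sketch of the difference between an element's Value and its XText children. It uses the question's XML collapsed onto one line, which is an assumption made for brevity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XDocument yourDoc = XDocument.Parse(
    "<cars><name><ford>model1</ford>textvalue<renault>model2</renault></name></cars>");

// The text sits inside <name>, not directly under the root <cars>.
XElement name = yourDoc.Root.Element("name");

// XElement.Value concatenates the text of all descendants, child elements included.
string allText = name.Value;

// Only the direct text-node children of <name>.
string textValue = name.Nodes().OfType<XText>().First().Value.Trim();

Console.WriteLine(allText);   // model1textvaluemodel2
Console.WriteLine(textValue); // textvalue
```

This is why name.Value alone is not enough here: it mixes the model names into the result.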

Assuming the variable "doc" contains the XDocument representing your XML above,
doc.XPathSelectElement("cars/name").Nodes().OfType<XText>()
This should give you all the XText nodes directly inside <name>, i.e. the plain text you are looking for. XPathSelectElement is an extension method, so it needs using System.Xml.XPath;.
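As a sketch, the same selection can also be written entirely in XPath with a text() step, through the XPathEvaluate extension method. This assumes the one-line version of the sample XML; XPathEvaluate returns object, so the node-set result is cast to IEnumerable before use.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

XDocument doc = XDocument.Parse(
    "<cars><name><ford>model1</ford>textvalue<renault>model2</renault></name></cars>");

// Element-based: select <name>, then filter its child nodes down to text.
var viaElement = doc.XPathSelectElement("cars/name")
    .Nodes()
    .OfType<XText>()
    .Select(t => t.Value)
    .ToList();

// Pure XPath: text() selects the text-node children directly.
// For a node-set, XPathEvaluate returns an enumerable of LINQ to XML nodes.
var viaTextStep = ((IEnumerable)doc.XPathEvaluate("cars/name/text()"))
    .Cast<XText>()
    .Select(t => t.Value)
    .ToList();

Console.WriteLine(string.Join(",", viaElement));  // textvalue
Console.WriteLine(string.Join(",", viaTextStep)); // textvalue
```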

Related

Adding a child XElement if it doesn't exist

I have a few solutions for this, but I was wondering whether there is a neat way to do it.
<Project>
<Test>
<Name value="zero" />
<Name value="One" />
<Name value="Two" />
</Test>
</Project>
I have a reference to testElement, and I want to add a new child XElement to it only when it doesn't already exist.
This is what I am currently doing. It's only sample code, typed to be equivalent to my real code, so pardon minor mistakes.
XElement element = (from item
in testElement.Elements("Name")
where item.Attribute("value") == "zero"
select item).SingleOrDefault();
if (element == null)
{
testElement.Add(newElement);
}
Is there a better way to do this? Maybe a simpler check?
You could use the XPath extension methods (available via using System.Xml.XPath;) to avoid the lengthy LINQ query; it is simpler and easier.
Example:
XElement element = testElement.XPathSelectElement("Name[@value='zero']");
if (element == null)
{
testElement.Add(newElement);
}
The XPath Name[@value='zero'] in the example above selects a child element named Name that has an attribute (@) named value whose value is zero. So the LINQ query in the question is reduced to a single XPathSelectElement call, and the rest remains the same.
Optionally, the same code can be written more compactly:
if (testElement.XPathSelectElement("Name[@value='zero']") == null)
{
testElement.Add(newElement);
}
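A runnable sketch of the XPath approach, assuming the sample XML with self-closing Name elements. AddNameIfMissing is a hypothetical helper name, and splicing the value into the XPath string assumes it never contains a single quote.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

XElement testElement = XElement.Parse(
    "<Test><Name value=\"zero\" /><Name value=\"One\" /><Name value=\"Two\" /></Test>");

// Hypothetical helper: appends <Name value="..."/> only if no such child exists.
// Assumes value has no single quote, because it is spliced into the XPath literal.
void AddNameIfMissing(XElement parent, string value)
{
    if (parent.XPathSelectElement($"Name[@value='{value}']") == null)
    {
        parent.Add(new XElement("Name", new XAttribute("value", value)));
    }
}

AddNameIfMissing(testElement, "zero");  // already present: nothing added
AddNameIfMissing(testElement, "Three"); // missing: appended at the end

Console.WriteLine(testElement.Elements("Name").Count()); // 4
```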
You are comparing an XAttribute object to a string, which won't give you the result you want (C# has no == operator for that pair of types, so the compiler should reject it).
item.Attribute("value") == "zero"
Try changing it to:
(string)item.Attribute("value") == "zero"
This converts the attribute's value to a string before the comparison. If the attribute is missing, the cast returns null, so that element simply doesn't match.
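Alternatively, here is a sketch that keeps plain LINQ but combines this cast with Any(). AddIfMissing is a hypothetical helper; the bare <Name /> in the sample is there only to show that a missing attribute does not throw.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement testElement = XElement.Parse(
    "<Test><Name value=\"zero\" /><Name value=\"One\" /><Name /></Test>");

// Any() stops at the first match and, unlike SingleOrDefault(), does not throw
// when duplicates exist. The (string) cast yields null for a missing attribute.
// Returns true if a new element was added.
bool AddIfMissing(XElement parent, string value)
{
    bool exists = parent.Elements("Name")
        .Any(item => (string)item.Attribute("value") == value);
    if (!exists)
    {
        parent.Add(new XElement("Name", new XAttribute("value", value)));
    }
    return !exists;
}

bool addedZero = AddIfMissing(testElement, "zero"); // false: already there
bool addedTwo = AddIfMissing(testElement, "Two");   // true: appended
Console.WriteLine($"{addedZero} {addedTwo}");       // False True
```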

Selecting all nodes containing text with XPath

I have been struggling with this problem for the past couple of days. Say I want to get all the text() nodes from an HTML document, but I only want to retrieve the XPath of the node that contains the text. Example:
foreach (var textNode in node.SelectNodes(".//text()"))
//do stuff here
However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:
/html[1]/body[1]/div[1]/a[1]/#text
Yet I only want the containing node of the text, for example:
/html[1]/body[1]/div[1]/a[1]
Could anyone point me toward a better XPath solution that retrieves all nodes containing text, but returns the XPath only up to the containing node?
Instead of:
.//text()
use:
.//*[normalize-space(text())]
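This selects all descendant elements of the context (current) node whose first text-node child is not whitespace-only. Note that in XPath 1.0, normalize-space(text()) looks only at the first text child; to require at least one non-whitespace text child anywhere among the children, use .//*[text()[normalize-space()]].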
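To see the difference between the two predicates, here is a small sketch using LINQ to XML's XPath support on a made-up document (plain XML, not HtmlAgilityPack). LoadOptions.PreserveWhitespace is needed so the whitespace-only text node before <b> survives parsing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// <p> has two text children: " " (whitespace only) and " tail".
XDocument doc = XDocument.Parse(
    "<div><p> <b>bold</b> tail</p><a>link</a></div>",
    LoadOptions.PreserveWhitespace);

// normalize-space(text()) converts the node-set to the string of its FIRST node.
var firstTextOnly = doc.XPathSelectElements("//*[normalize-space(text())]")
    .Select(e => e.Name.LocalName)
    .ToList();

// text()[normalize-space()] is true if ANY text child has non-whitespace content.
var anyText = doc.XPathSelectElements("//*[text()[normalize-space()]]")
    .Select(e => e.Name.LocalName)
    .ToList();

Console.WriteLine(string.Join(",", firstTextOnly));
Console.WriteLine(string.Join(",", anyText));
```

Here <p> is expected only in the second list, because its first text child is the lone space.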
Why not just cut the last step off the text node's path:
string[] elements = textNode.XPath.Split(new char[1] { '/' });
return String.Join("/", elements, 0, elements.Length - 1);
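A sketch of the same idea as a small helper, using LastIndexOf instead of Split/Join. ContainingPath is a hypothetical name, and the input is the example path from the question. With Split/Join, the count has to be elements.Length - 1: splitting on the leading '/' produces an empty first entry, so Length - 2 would also drop a[1].

```csharp
using System;

// Hypothetical helper: drops the final step (e.g. "/#text") from a path string.
// A path without any '/' after position 0 is returned unchanged.
string ContainingPath(string textNodePath)
{
    int lastSlash = textNodePath.LastIndexOf('/');
    return lastSlash > 0 ? textNodePath.Substring(0, lastSlash) : textNodePath;
}

string path = ContainingPath("/html[1]/body[1]/div[1]/a[1]/#text");

// Split/Join variant: elements = "", "html[1]", "body[1]", "div[1]", "a[1]", "#text".
string[] elements = "/html[1]/body[1]/div[1]/a[1]/#text".Split('/');
string viaJoin = String.Join("/", elements, 0, elements.Length - 1);

Console.WriteLine(path);    // /html[1]/body[1]/div[1]/a[1]
Console.WriteLine(viaJoin); // /html[1]/body[1]/div[1]/a[1]
```

If you have the HtmlAgilityPack node itself, textNode.ParentNode.XPath should give the containing element's path without any string handling.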

Get tags around text in HTML document using C#

I would like to search an HTML file for a certain string and then extract the tags. Given:
<div_outer><div_inner>Happy birthday</div_inner></div_outer>
I would like to search the HTML for "Happy birthday" and then have a function return some sort of tag structure: this is the innermost tag, this is the tag outside that one, etc. So, <div_inner></div_inner> then <div_outer></div_outer>.
Any ideas? I am thinking of HtmlAgilityPack, but I haven't been able to figure out how to do it.
Thanks as always, guys.
HAP (HtmlAgilityPack) is indeed a good fit for this.
You can use the OuterHtml and ParentNode properties of an HtmlNode to get the enclosing elements and markup.
You could use XPath for this. I use the expression (//*[text()='Happy birthday'])[1]/ancestor-or-self::*, which finds the first (for simplicity) element whose text content is Happy birthday and then returns all of its ancestors (parent, grandparent, etc.) plus the node itself. The parentheses matter: without them, [1] applies to each parent's matching children rather than to the whole document:
// needs using HtmlAgilityPack; and using System.Linq; (for Reverse and ToList)
var doc = new HtmlDocument();
doc.LoadHtml("<div_outer><div_inner>Happy birthday</div_inner></div_outer>");
var ancestors = doc.DocumentNode
.SelectNodes("(//*[text()='Happy birthday'])[1]/ancestor-or-self::*")
.Reverse()
.ToList();
The nodes are returned in document order (outermost first), so I used the Enumerable.Reverse method to put the innermost tag first.
This will return 2 nodes: div_inner and div_outer.
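If the input is well-formed XML rather than real-world HTML, LINQ to XML can do the same without XPath. This sketch assumes the closing tags are </div_inner></div_outer>; AncestorsAndSelf() yields the element first and then its ancestors outward, so no Reverse() is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Well-formed version of the question's markup (HTML often is not, which is where HAP helps).
XDocument doc = XDocument.Parse(
    "<div_outer><div_inner>Happy birthday</div_inner></div_outer>");

// First element, in document order, with a direct text child equal to the search string.
XElement hit = doc.Descendants()
    .FirstOrDefault(e => e.Nodes().OfType<XText>().Any(t => t.Value == "Happy birthday"));

// Innermost tag first, then each enclosing tag; empty list if nothing matched.
List<string> tagNames = hit?.AncestorsAndSelf().Select(e => e.Name.LocalName).ToList()
    ?? new List<string>();

Console.WriteLine(string.Join(" -> ", tagNames)); // div_inner -> div_outer
```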

How to find an XML tag with a specific attribute (in C#)

I need to get a list of tags that have a specific attribute. I am using DITA XML, and I need to find all tags that have an href attribute.
The problem is that the attribute may appear on any tag, so XPath will not work in this case. For example, an image tag may have an href, a topicref tag may have an href, and so on.
So I need an XmlNodeList (like the one returned by GetElementsByTagName). Ideally, I'd like a method such as getElementByAttributeName that returns an XmlNodeList.
I might have misunderstood your problem here, but I think you could possibly use an XPath expression.
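var nodes = doc.SelectNodes("//*[@href='pic1.jpg']");
The above should return all elements with href='pic1.jpg', where doc is the XmlDocument; SelectNodes returns an XmlNodeList. To match every element that has an href attribute, whatever its value, use //*[@href].
If you're using LINQ to XML (XDocument) instead, the following approach might work for you: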
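XDocument document = XDocument.Load(xmlReader);
// XAttribute does not override Equals, so Attributes().Contains(new XAttribute(...))
// never matches; compare the attribute's value instead ((string) yields null if absent).
var attrList = document.Descendants()
.Where(d => (string)d.Attribute("href") == "pic1.jpg");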
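Since the question asks for an XmlNodeList, here is a sketch with XmlDocument.SelectNodes. The sample document is a made-up, DITA-like fragment; only the href attribute matters, not the element names.

```csharp
using System;
using System.Xml;

// Made-up DITA-like sample; element names are illustrative only.
var doc = new XmlDocument();
doc.LoadXml(
    "<map><topicref href=\"intro.dita\"/><topicgroup>" +
    "<image href=\"pic1.jpg\"/><title>No link</title></topicgroup></map>");

// Any element, at any depth, that carries an href attribute.
XmlNodeList withHref = doc.SelectNodes("//*[@href]");

// The same query restricted to one value.
XmlNodeList pic1 = doc.SelectNodes("//*[@href='pic1.jpg']");

foreach (XmlElement e in withHref)
{
    Console.WriteLine($"{e.Name}: {e.GetAttribute("href")}");
}
```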

Create XML subtree from string in LINQ?

I want to modify all the text nodes using some functions in C#.
I want to insert another XML subtree created from a string.
For example, I want to change this
<root>
this is a test
</root>
to
<root>
this is <subtree>another</subtree> test
</root>
I have this piece of code, but it inserts a text node; I want to create an XML subtree and insert that instead of a plain text node.
List<XText> textNodes = element.DescendantNodes().OfType<XText>().ToList();
foreach (XText textNode in textNodes)
{
String node = System.Text.RegularExpressions.Regex.Replace(textNode.Value, "a", "<subtree>another</subtree>");
textNode.ReplaceWith(new XText(node));
}
You can split the original XText node into several, and add an XElement in between. Then you replace the original node with the three new nodes.
List<XNode> newNodes = Regex.Split(textNode.Value, "a").Select(p => (XNode) new XText(p)).ToList();
newNodes.Insert(1, new XElement("subtree", "another")); // substitute this with something better
textNode.ReplaceWith(newNodes);
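The answer above inserts a single element. Below is a sketch that generalizes it to every match in every text node. The \b word boundaries in the pattern are an assumption; the question's plain "a" would also match the a inside words such as "real".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

XElement root = XElement.Parse("<root>this is a test, a real one</root>");

// Materialize the list first, because the loop replaces nodes in the tree.
foreach (XText textNode in root.DescendantNodes().OfType<XText>().ToList())
{
    string[] parts = Regex.Split(textNode.Value, @"\ba\b");
    if (parts.Length == 1) continue; // no match: leave the node alone

    // Interleave the text pieces with a new <subtree> element at each match.
    var newNodes = new List<XNode> { new XText(parts[0]) };
    for (int i = 1; i < parts.Length; i++)
    {
        newNodes.Add(new XElement("subtree", "another"));
        newNodes.Add(new XText(parts[i]));
    }
    textNode.ReplaceWith(newNodes);
}

Console.WriteLine(root.ToString(SaveOptions.DisableFormatting));
// <root>this is <subtree>another</subtree> test, <subtree>another</subtree> real one</root>
```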
I guess XmlDocument.CreateDocumentFragment would be much easier, though it isn't LINQ; the only reason for using LINQ was ease of use.
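For comparison, here is a sketch of the CreateDocumentFragment route with XmlDocument. It splits the raw text first and escapes each piece with SecurityElement.Escape before joining it with the subtree markup, so characters such as & or < in the original text cannot break the fragment. The \b word-boundary pattern is an assumption, chosen so that an a inside other words is left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security;
using System.Text.RegularExpressions;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<root>this is a test</root>");

// Copy the text nodes first, because the loop modifies the tree.
List<XmlNode> textNodes = doc.SelectNodes("//text()").Cast<XmlNode>().ToList();

foreach (XmlNode textNode in textNodes)
{
    string[] parts = Regex.Split(textNode.Value, @"\ba\b");
    if (parts.Length == 1) continue; // no match: leave the node alone

    // Escape each raw piece, then join the pieces with the subtree markup.
    XmlDocumentFragment fragment = doc.CreateDocumentFragment();
    fragment.InnerXml = string.Join("<subtree>another</subtree>",
        parts.Select(p => SecurityElement.Escape(p)));

    // Replacing a node with a fragment puts the fragment's children in its place.
    textNode.ParentNode.ReplaceChild(fragment, textNode);
}

Console.WriteLine(doc.OuterXml); // <root>this is <subtree>another</subtree> test</root>
```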
