c# htmlagility select specific xpath

I have this HTML code:
<div>
<time class="departure"><span></span>value1<time class="return">
<span></span>value2</time>
</div>
I'm using the C# code below:
var nodes = doc.DocumentNode.SelectNodes("//time[@class='departure']");
foreach (var node in nodes)
{
    Console.WriteLine(node.InnerHtml);
    if (node.InnerText.Trim() == DepartTime)
    {
        ReturnTime = node.SelectSingleNode("time").InnerText; // null reference here
    }
}
As you can see, I check whether the departure time (DepartTime) exists, and if it does I want to return the InnerText of the first time element after it. But this doesn't seem to work: I get a null reference exception.

Solved it by:
foreach (var node in nodes)
{
    if (node.InnerText.Trim() == DepartTime)
    {
        ReturnTime = node.ParentNode.SelectNodes("time")[1].InnerText.Trim();
    }
}
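
The null reference in the question comes from SelectSingleNode("time"), which searches the children of the current node rather than the nodes that follow it. A minimal sketch of the same lookup with the XPath following-sibling axis is shown below. It uses System.Xml instead of HtmlAgilityPack so that it runs without extra packages, and it assumes a well-formed version of the markup in which the two time elements are siblings. The class and method names are made up for illustration. The same XPath expression should carry over to HtmlAgilityPack, but that is not tested here.

```csharp
using System;
using System.Xml;

static class SiblingTimeDemo
{
    // Returns the trimmed text of the first <time> sibling that follows the
    // departure element whose text equals departTime, or null if none matches.
    public static string FindReturnTime(XmlDocument doc, string departTime)
    {
        XmlNodeList departures = doc.SelectNodes("//time[@class='departure']");
        foreach (XmlNode node in departures)
        {
            if (node.InnerText.Trim() != departTime)
                continue;

            // "time" alone would search the children of this node; the
            // following-sibling axis looks at later nodes under the same parent.
            XmlNode next = node.SelectSingleNode("following-sibling::time[1]");
            return next?.InnerText.Trim();
        }
        return null;
    }

    static void Main()
    {
        // Assumed, well-formed version of the markup from the question.
        var doc = new XmlDocument();
        doc.LoadXml("<div><time class=\"departure\"><span></span>value1</time>" +
                    "<time class=\"return\"><span></span>value2</time></div>");

        Console.WriteLine(FindReturnTime(doc, "value1"));
    }
}
```

Unlike indexing with [1] into the parent's time elements, this does not depend on the departure element being the first time element under its parent.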

Related

XML Node Attribute returning as NULL when populated?

I have an XML doc from which I'm pulling out a specific node and all of its attributes. In debug mode I can see that I'm getting the specific nodes and all of their attributes. However, when I try to get the attribute value, it can't find it and returns a NULL value. I've done some searching and looked at some examples, and from what I can tell I should be getting the value, but I'm not and I don't see what I'm doing wrong.
I'm trying to get the StartTime value.
Here is the XML that is returned.
Here you can see in debug and with the Text Visualizer the value should be there.
The code I'm trying.
XmlNodeList nodes = xmlDoc.GetElementsByTagName("PlannedAbsences");
if (nodes != null && nodes.Count > 0)
{
    foreach (XmlNode node in nodes)
    {
        if (node.Attributes != null)
        {
            var nameAttribute = node.Attributes["StartTime"];
            if (nameAttribute != null)
            {
                //var startDate = nameAttribute.Value;
            }
        }
    }
}

Using the XDocument class contained within the System.Xml.Linq namespace, grab the sub elements from the PlannedAbsences parent, then iterate over sub elements retrieving the value of the desired attribute.
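// requires: using System; using System.Linq; using System.Xml.Linq;
var xmlDoc = XDocument.Load(@"path to xml file");
// Fall back to an empty sequence so the loop does not fail when the root is not PlannedAbsences
var absences = xmlDoc.Element("PlannedAbsences")?.Elements("Absence")
    ?? Enumerable.Empty<XElement>();
foreach (var item in absences)
{
    // ?.Value yields null instead of throwing when StartTime is missing
    var xElement = item.Attribute("StartTime")?.Value;
    Console.WriteLine(xElement);
}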
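
As a self-contained illustration of the approach above, here is a small sketch that parses an inline document instead of loading a file. The XML shape (a PlannedAbsences root with Absence children carrying StartTime) is an assumption, since the question's XML is not shown, and the class and method names are hypothetical. It relies on the explicit XAttribute-to-string conversion, which returns null for a missing attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class AbsenceDemo
{
    // Collects StartTime values, skipping Absence elements without the attribute.
    public static List<string> StartTimes(XDocument doc)
    {
        return doc.Descendants("Absence")
                  .Select(a => (string)a.Attribute("StartTime")) // null when missing
                  .Where(v => v != null)
                  .ToList();
    }

    static void Main()
    {
        // Hypothetical document shape; the question's XML is not shown.
        var doc = XDocument.Parse(
            "<PlannedAbsences>" +
            "<Absence StartTime=\"2016-01-04T08:00:00\" />" +
            "<Absence />" +
            "</PlannedAbsences>");

        foreach (string start in StartTimes(doc))
            Console.WriteLine(start);
    }
}
```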

Accessing subnode in xml

I have an XML document like this:
<abcd>
<abcd1>
<hgjh>txt</hgjh>
<addedit>true</addedit>
<Db>txtDB</DB>
<server>ser</server>
</abcd1>
<abcd1>
<hgjh>txt</hgjh>
<addedit>false</addedit>
<Db>txtDB</DB>
<server>ser</server>
</abcd1>
</abcd>
Now, based on the addedit flag: if its value is true, I need to show the DB name and server name, for any number of nodes in the XML. I have included two of them for reference.
Please help me out with this. I have tried a lot of C# code but have not been able to get the required functionality.

Hope this will help you:
XmlDocument Xmlabcd = new XmlDocument();
Xmlabcd.LoadXml(xml); // xml = your XML string
// Iterate over the <abcd1> elements under the root <abcd> element
foreach (XmlNode v in Xmlabcd.DocumentElement.ChildNodes)
{
    if (v.ChildNodes.Count > 0)
    {
        bool addedit = false;
        string DBName = string.Empty;
        string serverName = string.Empty;
        foreach (XmlNode child in v.ChildNodes)
        {
            if (child.Name.Equals("addedit"))
            {
                addedit = child.InnerText == "true";
            }
            if (child.Name.Equals("Db"))
            {
                DBName = child.InnerText;
            }
            if (child.Name.Equals("server"))
            {
                serverName = child.InnerText;
            }
        }
        // Code to show Db and server, or to add them to a list, based on the value of addedit
    }
}

There is a mistake in the XML: the opening tag "Db" does not match the closing tag "DB".
You can use XPath to select the nodes with addedit = true:
var list = xmlDoc.DocumentElement.SelectNodes("abcd1[addedit='true']");
foreach (XmlNode node in list)
{
    Console.WriteLine(node.SelectSingleNode("DB").InnerText);
    Console.WriteLine(node.SelectSingleNode("server").InnerText);
}
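
For reference, a runnable sketch of the XPath approach follows. It assumes the tag mismatch has been corrected to DB on both sides, and it guards against missing child elements rather than dereferencing them directly. The class name, method name, and the "DB / server" output format are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class AddEditDemo
{
    // Returns "DB / server" for every <abcd1> whose <addedit> is "true".
    public static List<string> EnabledEntries(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.SelectNodes("abcd1[addedit='true']"))
        {
            // SelectSingleNode returns null when the child element is absent.
            string db = node.SelectSingleNode("DB")?.InnerText ?? "";
            string server = node.SelectSingleNode("server")?.InnerText ?? "";
            result.Add(db + " / " + server);
        }
        return result;
    }

    static void Main()
    {
        // The question's XML with the Db/DB mismatch corrected (assumption).
        string xml =
            "<abcd>" +
            "<abcd1><hgjh>txt</hgjh><addedit>true</addedit><DB>txtDB</DB><server>ser</server></abcd1>" +
            "<abcd1><hgjh>txt</hgjh><addedit>false</addedit><DB>txtDB</DB><server>ser</server></abcd1>" +
            "</abcd>";

        foreach (string entry in EnabledEntries(xml))
            Console.WriteLine(entry);
    }
}
```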

XDocument, it says that a node is text but it is an element

I am reading an XML that contains a tag like this:
<source><bpt id="1">&lt;donottranslate&gt;</bpt><ph id="2">($ T_353_1 Parent ID $)</ph><ept id="1">&lt;/donottranslate&gt;</ept></source>
When reading the source node, I get that this node's type is Text, but it should be Element.
This is XML that I receive, and I cannot change it.
Do you know how I can get this sorted out?
This is my code:
XDocument doc = XDocument.Load(fileName, LoadOptions.PreserveWhitespace);
foreach (var elUnit in doc.Descendants("trans-unit"))
{
    if (elUnit.AttributeString("translate").ToString() == "no")
    {
        foreach (var elSource in elUnit.Elements("source"))
        {
            string text = "";
            foreach (var node in elSource.DescendantNodes().Where(n => XmlNodeType.Text == n.NodeType).ToList())
            {
                //When reading that "source" node, it enters inside this code
Thanks

First, check whether your XML is well-formed:
http://www.w3schools.com/xml/xml_validator.asp
http://chris.photobooks.com/xml/default.htm
I could get this to work:
//using System.Xml.Linq;
var str = "<source><bpt id=\"1\">&lt;donottranslate&gt;</bpt>" +
          "<ph id=\"2\">($ T_353_1 Parent ID $)</ph>" +
          "<ept id=\"1\">&lt;/donottranslate&gt;</ept></source>";
XElement element = XElement.Parse(str);
Console.WriteLine(element);
The output is this:
<source>
  <bpt id="1">&lt;donottranslate&gt;</bpt>
  <ph id="2">($ T_353_1 Parent ID $)</ph>
  <ept id="1">&lt;/donottranslate&gt;</ept>
</source>
Please provide a code sample for more help if this example is not sufficient.

Finally, I solved this by checking whether the node text is valid:
if (System.Security.SecurityElement.IsValidText(text.XmlDecodeEntities()))

How can I get this with XPath

I'm writing a crawler for a site and came across this problem.
From this HTML...
<div class="Price">
<span style="font-size: 14px; text-decoration: line-through; color: #444;">195.90 USD</span>
<br />
131.90 USD
</div>
I need to get only 131.90 USD using XPath.
Tried this...
"//div[@class='Price']"
But it returns a different result.
How can I achieve this?
EDIT
I'm using this C# code (simplified for demonstration)
protected override DealDictionary GrabData(HtmlAgilityPack.HtmlDocument html) {
    var price = Helper.GetInnerHtml(html.DocumentNode, "//div[@class='Price']/text()");
}
Helper Class
public static class Helper {
    public static String GetInnerText(HtmlDocument doc, String xpath) {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0) {
            var node = nodes[0];
            return node.InnerText.TrimHtml();
        }
        return String.Empty;
    }
    public static String GetInnerText(HtmlNode inputNode, String xpath) {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0) {
            var node = nodes[0];
            var comments = node.ChildNodes.OfType<HtmlCommentNode>().ToList();
            foreach (var comment in comments)
                comment.ParentNode.RemoveChild(comment);
            return node.InnerText.TrimHtml();
        }
        return String.Empty;
    }
    public static String GetInnerHtml(HtmlDocument doc, String xpath) {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0) {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return String.Empty;
    }
    public static string GetInnerHtml(HtmlNode inputNode, string xpath) {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0) {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return string.Empty;
    }
}

The XPath you tried is a good start:
//div[@class='Price']
The //div part selects any <div> element in the document, and the predicate restricts that selection to <div> elements that have a class attribute whose value is Price.
So far, so good. But because you select a <div> element, what you get back is the <div> element including all of its contents.
In the fragment you show above, you have the following hierarchical structure:
<div> element
    <span> element
        text node
    <br> element
    text node
So what you are actually interested in is the latter text node. You can use text() in XPath to select text nodes. Since in this case you want the text node that is an immediate child of the <div> element you found, your XPath should look like the line below. If the parser also keeps whitespace-only text nodes, the predicate text()[normalize-space()] skips them.
//div[@class='Price']/text()
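
To make the behavior of text() concrete, here is a minimal sketch. It uses System.Xml rather than HtmlAgilityPack so that it runs without extra packages, which works here only because the fragment from the question happens to be well-formed XML. The class and method names are made up for illustration, and the normalize-space() predicate is an addition that skips whitespace-only text nodes if the parser keeps them.

```csharp
using System;
using System.Xml;

static class PriceDemo
{
    // Returns the trimmed text that is a direct child of the Price <div>,
    // ignoring the struck-through price inside the <span>.
    public static string CurrentPrice(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // text() matches only text nodes that are direct children of the div;
        // [normalize-space()] drops nodes that contain nothing but whitespace.
        XmlNode text = doc.SelectSingleNode(
            "//div[@class='Price']/text()[normalize-space()]");
        return text?.Value.Trim();
    }

    static void Main()
    {
        string xml =
            "<div class=\"Price\">\n" +
            "<span style=\"text-decoration: line-through;\">195.90 USD</span>\n" +
            "<br />\n" +
            "131.90 USD\n" +
            "</div>";

        Console.WriteLine(CurrentPrice(xml));
    }
}
```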

How do I check particular attributes exist or not in XML?

Part of the XML content:
<section name="Header">
<placeholder name="HeaderPane"></placeholder>
</section>
<section name="Middle" split="20">
<placeholder name="ContentLeft" ></placeholder>
<placeholder name="ContentMiddle"></placeholder>
<placeholder name="ContentRight"></placeholder>
</section>
<section name="Bottom">
<placeholder name="BottomPane"></placeholder>
</section>
I want to check each node and, if the attribute split exists, assign its value to a variable.
Inside a loop, I try:
foreach (XmlNode xNode in nodeListName)
{
    if (xNode.ParentNode.Attributes["split"].Value != "")
    {
        parentSplit = xNode.ParentNode.Attributes["split"].Value;
    }
}
But this is wrong, because the condition checks only the value, not whether the attribute exists. How should I check for the existence of an attribute?

You can actually index directly into the Attributes collection (if you are using C# not VB):
foreach (XmlNode xNode in nodeListName)
{
    XmlNode parent = xNode.ParentNode;
    if (parent.Attributes != null
        && parent.Attributes["split"] != null)
    {
        parentSplit = parent.Attributes["split"].Value;
    }
}

If your code is dealing with XmlElement objects (rather than XmlNode objects), there is the method XmlElement.HasAttribute(string name).
So if you are only looking for attributes on elements (which is what the question appears to need), it may be more robust to cast to an element, check for null, and then use the HasAttribute method.
foreach (XmlNode xNode in nodeListName)
{
    XmlElement xParentEle = xNode.ParentNode as XmlElement;
    if ((xParentEle != null) && xParentEle.HasAttribute("split"))
    {
        parentSplit = xParentEle.Attributes["split"].Value;
    }
}

Just for newcomers: recent versions of C# (6.0 and later) allow the null-conditional operator ?. to guard against null before reading a member:
parentSplit = xNode.ParentNode.Attributes["split"]?.Value;
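
A small self-contained sketch of this null-conditional pattern, combined with the cast to XmlElement from the previous answer, is shown below. The layout root element wrapping the sections is a hypothetical addition (the question shows only part of the XML), and the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

static class SplitDemo
{
    // Returns the parent's split attribute value, or null when the parent is
    // not an element or has no such attribute.
    public static string ParentSplit(XmlNode node)
    {
        return (node.ParentNode as XmlElement)?.Attributes["split"]?.Value;
    }

    static void Main()
    {
        // Hypothetical <layout> root wrapped around sections from the question.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<layout>" +
            "<section name=\"Header\"><placeholder name=\"HeaderPane\" /></section>" +
            "<section name=\"Middle\" split=\"20\"><placeholder name=\"ContentLeft\" /></section>" +
            "</layout>");

        foreach (XmlNode placeholder in doc.SelectNodes("//placeholder"))
        {
            string name = placeholder.Attributes["name"]?.Value;
            string split = ParentSplit(placeholder);
            Console.WriteLine(name + ": " + (split ?? "(no split)"));
        }
    }
}
```

Note that a null result here means "attribute absent", which is distinct from an attribute that is present but empty.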

You can use LINQ to XML:
XDocument doc = XDocument.Load(file);
var result = (from ele in doc.Descendants("section")
              select ele).ToList();
foreach (var t in result)
{
    if (t.Attributes("split").Count() != 0)
    {
        // Exists
    }
    // Suggestion from @UrbanEsc
    if (t.Attributes("split").Any())
    {
    }
}
OR
XDocument doc = XDocument.Load(file);
var result = (from ele in doc.Descendants("section").Attributes("split")
              select ele).ToList();
foreach (var t in result)
{
    // Response.Write("<br/>" + t.Value);
}

var splitEle = xn.Attributes["split"];
if (splitEle != null) {
    return splitEle.Value;
}

EDIT
Disregard - you can't use ItemOf (that's what I get for typing before I test). I'd strikethrough the text if I could figure out how...or maybe I'll simply delete the answer, since it was ultimately wrong and useless.
END EDIT
You can use the ItemOf(string) property in the XmlAttributeCollection to see if the attribute exists. It returns null if it's not found.
foreach (XmlNode xNode in nodeListName)
{
    if (xNode.ParentNode.Attributes.ItemOf["split"] != null)
    {
        parentSplit = xNode.ParentNode.Attributes["split"].Value;
    }
}
XmlAttributeCollection.ItemOf Property (String)

You can use the GetNamedItem method to check and see if the attribute is available. If null is returned, then it isn't available. Here is your code with that check in place:
foreach (XmlNode xNode in nodeListName)
{
    if (xNode.ParentNode.Attributes.GetNamedItem("split") != null)
    {
        if (xNode.ParentNode.Attributes["split"].Value != "")
        {
            parentSplit = xNode.ParentNode.Attributes["split"].Value;
        }
    }
}

Another way to handle the situation is exception handling.
Every time a non-existent attribute is accessed, your code recovers from the exception and continues with the loop. In the catch block you can handle the error the same way you would in the else branch when the check (... != null) returns false. Of course, throwing and handling exceptions is a relatively costly operation, which might not be ideal depending on the performance requirements.
