XML Parsing with HtmlAgilityPack - c#

I'm parsing XML with HtmlAgilityPack in a web service worker role, but something is wrong: when I select the child node "link", I get an empty string.
The XML looks like this:
<link>
http://www.webtekno.com/google/google-ve-razer-dan-oyun-konsolu.html
</link>
My code to get the link from the RSS feed is:
HtmlNodeCollection nodeList = doc.DocumentNode.SelectNodes("//item");
foreach (HtmlNode node in nodeList)
{
string newsUri = node.ChildNodes["link"].InnerText;
}
I think I get an empty string because the link node contains a newline before and after the URL. How can I get the link from the node?

Put this line before loading the HtmlDocument:
HtmlNode.ElementsFlags["link"] = HtmlElementFlag.Closed;
That is all.
By default, the flag for link is HtmlElementFlag.Empty, so the parser treats it as an empty element, like meta and img tags, and the text ends up outside the node.
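Since an RSS feed is XML rather than HTML, another option is to skip HtmlAgilityPack and use an XML parser, which has no HTML-specific rules for link. The following is only a sketch of that alternative using System.Xml.Linq, not the approach from the answer above; the class name RssLinks and the method Extract are hypothetical, and it assumes the item and link elements carry no XML namespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssLinks
{
    // Returns the trimmed text of every <link> that is a direct child of an <item>.
    public static List<string> Extract(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
                  .Select(item => item.Element("link"))
                  .Where(link => link != null)
                  // Trim removes the newlines surrounding the URL.
                  .Select(link => link.Value.Trim())
                  .ToList();
    }
}
```

Trim() takes care of the line breaks before and after the URL, which would otherwise be part of the value.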

Related

Failing to read value XML attribute in C#

I am trying to read a file produced by another developer. The file looks something like this. I am trying to read the value of 'ProfileName', but when I look at the object in memory, I see null for the Value (capital V) property. The only place I can see the string "GolfLeague-Dual" is in the OuterXml property, but I would have to parse that string just to get it.
<?xml version="1.0"?>
<TopNode>
<ProfileSettings>
<ProfileName value="GolfLeague-Dual" />
</ProfileSettings>
</TopNode>
Here is my code to try to read this:
XmlDocument doc = new XmlDocument();
doc.Load(directory + @"\Settings.xml");
XmlElement root = doc.DocumentElement;
XmlNodeList nodes = root.SelectNodes("//ProfileSettings");
foreach (XmlNode node in nodes) {
Console.WriteLine(node["ProfileName"].Value);
}
Your code is trying to get the inner value of the node, not an attribute called value. Try this instead...
foreach (XmlNode node in nodes) {
Console.WriteLine(node["ProfileName"].Attributes["value"].Value);
}
Here's a working dotnetfiddle...
https://dotnetfiddle.net/pmJKbX
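The reason Value is null is that XmlNode.Value is always null for element nodes; the data here lives in an attribute. A minimal sketch of reading it with a guard against a missing element follows; the class name ProfileReader and the method ReadProfileName are made up for illustration, and the XPath assumes the file layout shown in the question.

```csharp
using System.Xml;

public static class ProfileReader
{
    // Returns the "value" attribute of <ProfileName>, or null if the element is absent.
    public static string ReadProfileName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Address the element directly instead of looping over its parent.
        var element = doc.SelectSingleNode("/TopNode/ProfileSettings/ProfileName") as XmlElement;

        // GetAttribute returns an empty string when the attribute does not exist.
        return element?.GetAttribute("value");
    }
}
```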

How read content of a span tag using HtmlAgilityPack?

I'm using HtmlAgilityPack to scrape data from a site. The page contains many p, header and span tags, and I need to scrape the data from one particular span tag.
var webGet = new HtmlWeb();
var document = webGet.Load(URL);
foreach (HtmlNode node in document.DocumentNode.SelectNodes("\\span"))
{
string strData = node.InnerText.Trim();
}
I tried matching on a keyword in the parent tag, but that did not work for all kinds of URLs.
Please help me fix it.
What is the error?
You can start by fixing this:
foreach (HtmlNode node in document.DocumentNode.SelectNodes("\\span"))
it should be:
foreach (HtmlNode node in document.DocumentNode.SelectNodes("//span"))
But I want one exact value. For example, the source contains many span tags, such as <span>abc</span>, <span>def</span>, <span>pqr</span>, <span>xyz</span>, and I want the result "pqr". Is there a way to get it by the position or index of the tag?
If you want to get, for example, the third span tag in the document (note the parentheses; without them, //span[3] matches a span that is the third span child of its own parent):
doc.DocumentNode.SelectSingleNode("(//span)[3]");
If you want to get the node containing the text "pqr":
doc.DocumentNode.SelectSingleNode("//span[contains(text(),'pqr')]");
You can use SelectNodes for the latter to get all span tags containing "pqr" in the text.
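The difference between //span[3] and (//span)[3] is easy to miss, so here is a small sketch of it. To keep the example self-contained it uses XmlDocument from System.Xml on a well-formed snippet rather than HtmlAgilityPack; the XPath expressions themselves are the same. The class name SpanIndexDemo, the method Select and the sample markup are made up for illustration.

```csharp
using System.Xml;

public static class SpanIndexDemo
{
    // Four spans spread over two parents, so no parent has a third span child.
    private const string Markup =
        "<root><div><span>abc</span><span>def</span></div>" +
        "<div><span>pqr</span><span>xyz</span></div></root>";

    // Returns the text of the first node matching the XPath, or null if nothing matches.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

With this markup, Select("//span[3]") finds nothing because the position is counted per parent, while Select("(//span)[3]") counts across the whole document and yields "pqr".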

xml parsing error (xpath, HTMLagilitypack)

I am trying to parse an XML file. All nodes have opening and closing tags except one node, which on some lines appears only as a self-closing tag: <persons/>
Most of the time it appears like this: <persons> ... </persons>
I cannot get values from the XML when this node is self-closed like this.
Here is my code:
foreach (HtmlNode man in bm.SelectNodes(".//persons"))
{
//store values
}
How can I overcome this issue? Even if the nodes at the start look like this:
<persons> </persons>
as soon as there is a tag like this in the middle of the file:
<persons/>
I cannot get the values of the remaining <persons> </persons> nodes on the following lines.
Why are you using HtmlNode? XmlNode would be just fine.
Otherwise, show more code.
Did you step through the code? Did you encounter any error?
Try this:
internal string ParseXML()
{
string ppl = "";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlString);
foreach (XmlElement node in doc.SelectNodes(".//persons"))
{
string text = node.InnerText; //or loop through its children as well
ppl += text;
}
return ppl;
}
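An XML parser treats <persons/> as an ordinary empty element, so it does not disturb the siblings that follow it. The sketch below illustrates this with XmlDocument; the class name PersonsReader, the method ReadPersons and the sample input in the note are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class PersonsReader
{
    // Returns the trimmed text of every <persons> element, in document order.
    // A self-closing <persons/> contributes an empty string.
    public static List<string> ReadPersons(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlElement persons in doc.SelectNodes("//persons"))
        {
            result.Add(persons.InnerText.Trim());
        }
        return result;
    }
}
```

For an input such as <root><persons>Ann</persons><persons/><persons>Bob</persons></root>, the list contains "Ann", an empty string, and "Bob", so the elements after the self-closing tag are still read.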

Parse Complete Web Page

How can I parse a complete HTML web page, rather than specific nodes, using HTML Agility Pack or any other technique?
I am using this code, but it only parses one specific kind of node. I need to parse the complete page and get clean, readable contents.
List<string> list = new List<string>();
string url = "https://www.google.com";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
list.Add(node.InnerText);
}
To get all descendant text nodes use something like
var textNodes = doc.DocumentNode.SelectNodes("//text()").
Select(t=>t.InnerText);
To get all non empty descendant text nodes
var textNodes = doc.DocumentNode.
SelectNodes("//text()[normalize-space()]").
Select(t=>t.InnerText);
Alternatively, use SelectNodes("//*"). The * (asterisk) is the element wildcard, so //* selects every element on the page; a bare "*" selects only the element children of the current node.

HTML Agility Pack get all input fields

I found some code on the internet that finds all the href tags and changes them to google.com, but how can I tell the code to find all the input fields and put custom text in there?
This is the code I have right now:
HtmlDocument doc = new HtmlDocument();
doc.Load(path);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
HtmlAttribute att = link.Attributes["href"];
att.Value = "http://www.google.com";
}
doc.Save("file.htm");
Please, can someone help me, I can't seem to find any information about this on the internet :(.
Change the XPath selector to //input to select all the input nodes:
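// SelectNodes returns null when nothing matches, so guard before looping.
var inputs = doc.DocumentNode.SelectNodes("//input");
if (inputs != null)
{
foreach (HtmlNode input in inputs)
{
// SetAttributeValue also creates the attribute when the input has no value yet.
input.SetAttributeValue("value", "some text");
}
}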
Your current code selects all a elements that have an href attribute: "//a[@href]".
You want it to select all input elements: "//input".
Of course, the inner part of the loop will need to change to match what you are looking for.
I suggest you read up on XPath.
