Scraping the 3rd node using HtmlAgilityPack - C#

In a webpage there are several nodes with class='inner', but I need only the 3rd one. If I use
string x = textBox1.Text;
string q = "";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("myweb_link" + x);
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='inner']");
if (nodes != null)
{
    foreach (HtmlNode n in nodes)
    {
        q = n.InnerText;
        q = System.Net.WebUtility.HtmlDecode(q);
        q = q.Trim();
        MessageBox.Show(q);
    }
}
else
    MessageBox.Show("nothing found ");
it gives me all the nodes with class='inner', which is what I expected.
But I want only the 3rd node. How can I get that?

Get the third node from the nodes variable using the indexer:
var thirdNode = nodes[2];
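
One caveat with the indexer: `nodes[2]` throws if fewer than three nodes matched, and `SelectNodes` returns null when nothing matches. Below is a minimal sketch of a guarded lookup. The `ThirdInner` helper and the sample HTML in the note are made up for illustration, and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using HtmlAgilityPack;

public static class InnerNodeDemo
{
    // Hypothetical helper: returns the trimmed, decoded text of the 3rd
    // <div class='inner'>, or null if fewer than three such nodes exist.
    public static string ThirdInner(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='inner']");
        if (nodes == null || nodes.Count < 3)
            return null;

        // The indexer is zero-based, so index 2 is the third node.
        return System.Net.WebUtility.HtmlDecode(nodes[2].InnerText).Trim();
    }
}
```

Calling `InnerNodeDemo.ThirdInner(...)` on a page with only one or two matching divs returns null instead of throwing. The same node can also be requested directly by passing the XPath `(//div[@class='inner'])[3]` to `SelectSingleNode`, since XPath positions are one-based.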

Related

Scrape data with HtmlAgilityPack for a tag which doesn't have class

Here is my C# code. I am trying to scrape data from a website using HtmlAgilityPack, but it prints "nothing found" every time and I can't see what I am doing wrong.
HtmlAgilityPack.HtmlWeb webb = new HtmlAgilityPack.HtmlWeb();
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
HtmlAgilityPack.HtmlDocument doc = webb.Load("mywebsite");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[@class='unstyled']//li//a");
if (nodes != null)
{
    foreach (HtmlNode n in nodes)
    {
        string q = n.InnerText;
        q = System.Net.WebUtility.HtmlDecode(q);
        q = q.Trim();
        Console.WriteLine(q);
    }
}
else
{
    Console.WriteLine("nothing found");
}
Here is the picture of the tag from which I am trying to capture data; I need the data from the <a> tag.
The XPath used to select the tag is incorrect. Use:
HtmlNodeCollection nodes =
    doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");
This should select all the anchor nodes; you can then loop through them to read InnerHtml or, as in the sample, the href attribute.
A working sample is shown below:
string s = "<ul class='unstyle no-overflow'><li><ul class='unstyled'><li><a href='http://www.smsconnexion.com'>SMS ConneXion</a></li></ul><ul class='unstyled'><li><a href='http://www.celusion.com'>Celusion</a></li></ul></li></ul>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);
HtmlNodeCollection nodes =
    doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");
foreach (var node in nodes)
{
    Console.WriteLine(node.Attributes["href"].Value);
}
Console.ReadLine();

How to get a text between nodes

I have a problem extracting the text between nodes. My code prints the entire span node, but I would like to get only the hour values, e.g. 4:45, 5:15, etc.
var html = @"https://programtv.onet.pl/";
HtmlWeb web = new HtmlWeb();
var htmldoc = web.Load(html);
var findhours = htmldoc.DocumentNode.SelectNodes("//div[@id='boxTV1']//div[@class='hours']//span[@class='hour']");
if (findhours != null)
{
    foreach (var x in findhours)
    {
        Console.WriteLine(x.OuterHtml);
    }
}
else
{
    Console.WriteLine("node = null");
}
Console.ReadLine();
You can simply use the InnerText property of your HtmlNode object instead of OuterHtml; see the HtmlAgilityPack documentation for details.
foreach (var x in findhours)
{
    Console.WriteLine(x.InnerText);
}

Not able to read XML string in C#

I have created an XML string and am looping over it to get the values, but execution never enters the foreach loop. The same loop works in my other code.
My code is below.
XML string:
<SuggestedReadings>
    <Suggestion Text="Customer Centricity" Link="http://wdp.wharton.upenn.edu/book/customer-centricity/?utm_source=Coursera&utm_medium=Web&utm_campaign=custcent" SuggBy="Pete Fader's" />
    <Suggestion Text="Global Brand Power" Link="http://wdp.wharton.upenn.edu/books/global-brand-power/?utm_source=Coursera&utm_medium=Web&utm_campaign=glbrpower" SuggBy="Barbara Kahn's" />
</SuggestedReadings>
Code:
// Escape bare ampersands so the string is well-formed XML
string str = CD.SRList.Replace("&", "&amp;");
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(str);
XmlNode SuggestionListNode = xmlDoc.SelectSingleNode("/SuggestedReadings/Suggestion");
foreach (XmlNode node in SuggestionListNode)
{
    COURSESUGGESTEDREADING CSR = new COURSESUGGESTEDREADING();
    var s = db.COURSESUGGESTEDREADINGS.OrderByDescending(o => o.SRID);
    CSR.SRID = (s == null ? 0 : s.FirstOrDefault().SRID) + 1;
    CSR.COURSEID = LibId;
    CSR.TEXT = node.Attributes.GetNamedItem("Text").Value;
    CSR.LINK = node.Attributes.GetNamedItem("Link").Value;
    CSR.SUGBY = node.Attributes.GetNamedItem("SuggBy").Value;
    CSR.ACTIVEFLAG = "Y";
    CSR.CREATEDBY = CD.CreatedBy;
    CSR.CREATEDDATE = DateTime.Now;
    db.COURSESUGGESTEDREADINGS.Add(CSR);
}
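You should use SelectNodes, not SelectSingleNode, since you are trying to get multiple nodes out of the XML document. SelectSingleNode returns only the first Suggestion element, and looping over a single XmlNode enumerates its child nodes, of which a self-closing Suggestion has none.
Use this:
XmlNodeList SuggestionListNode = xmlDoc.SelectNodes("//Suggestion");
foreach (XmlNode node in SuggestionListNode)
{
    // read node.Attributes here, as in the original loop body
}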
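
To make the difference concrete, here is a small sketch using only `System.Xml`. The class name, method names, and the trimmed-down sample XML are made up for illustration. `CountChildrenOfFirst` reproduces the question's loop and visits nothing, while `ReadTexts` uses `SelectNodes` and visits every Suggestion.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SuggestionDemo
{
    // Trimmed-down sample data in the shape of the question's XML.
    private const string SampleXml =
        "<SuggestedReadings>" +
        "<Suggestion Text=\"Customer Centricity\" SuggBy=\"Pete Fader\" />" +
        "<Suggestion Text=\"Global Brand Power\" SuggBy=\"Barbara Kahn\" />" +
        "</SuggestedReadings>";

    // Mirrors the question: foreach over one XmlNode walks its child nodes.
    // Attributes are not child nodes, so a self-closing element yields none.
    public static int CountChildrenOfFirst()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        XmlNode first = doc.SelectSingleNode("/SuggestedReadings/Suggestion");
        int count = 0;
        foreach (XmlNode child in first)
            count++;
        return count;
    }

    // SelectNodes returns every matching element, so the loop runs once per Suggestion.
    public static List<string> ReadTexts()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        var texts = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/SuggestedReadings/Suggestion"))
            texts.Add(node.Attributes["Text"].Value);
        return texts;
    }
}
```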
You can also try this with LINQ to XML:
XDocument xdoc = XDocument.Load("data.xml");
var xmlData = from lv1 in xdoc.Descendants("Suggestion")
              select new {
                  Text = lv1.Attribute("Text").Value,
                  Link = lv1.Attribute("Link").Value,
                  SuggBy = lv1.Attribute("SuggBy").Value
              };
foreach (var item in xmlData)
{
    // your logic here
}

How to get xml element values using XmlNodeList in c#

I need to store the element values that are inside the "member" nodes. I have tried the following code but can't get it to work. How can I get the values? Any help would be appreciated.
XML:
<ListInventorySupplyResponse xmlns="http://mws.amazonaws.com/FulfillmentInventory/2010-10-01/">
    <ListInventorySupplyResult>
        <InventorySupplyList>
            <member>
                <SellerSKU>043859634910</SellerSKU>
                <FNSKU>X000IA4045</FNSKU>
                <ASIN>B005YV4DJO</ASIN>
                <Condition>NewItem</Condition>
                <TotalSupplyQuantity>7</TotalSupplyQuantity>
                <InStockSupplyQuantity>7</InStockSupplyQuantity>
                <EarliestAvailability>
                    <TimepointType>Immediately</TimepointType>
                </EarliestAvailability>
                <SupplyDetail>
                </SupplyDetail>
            </member>
        </InventorySupplyList>
    </ListInventorySupplyResult>
    <ResponseMetadata>
        <RequestId>58c9f4f4-6f60-496a-8d71-8fe99ce301c9</RequestId>
    </ResponseMetadata>
</ListInventorySupplyResponse>
C# Code:
string a = Convert.ToString(oInventorySupplyRes.ToXML());
XmlDocument oXdoc = new XmlDocument();
oXdoc.LoadXml(a);
XmlNodeList oInventorySupplyListxml = oXdoc.SelectNodes("//member");
foreach (XmlNode itmXml in oInventorySupplyListxml)
{
    // var cond = itmXml.InnerXml.ToString();
    var asinVal = itmXml.SelectSingleNode("ASIN").Value;
    var TotalSupplyQuantityVal = itmXml.SelectSingleNode("TotalSupplyQuantity").Value;
}
Result View shows "Enumeration yielded no results" and Count = 0.
Edit 1:
string a = Convert.ToString(oInventorySupplyRes.ToXML());
var status = oInventorySupplyResult.InventorySupplyList;
XmlDocument oXdoc = new XmlDocument();
var doc = XDocument.Parse(a);
var r = doc.Descendants("member")
    .Select(member => new
    {
        ASIN = member.Element("ASIN").Value,
        TotalSupplyQuantity = member.Element("TotalSupplyQuantity").Value
    });
private string mStrXMLStk = Application.StartupPath + "\\Path.xml";
private System.Xml.XmlDocument mXDoc = new XmlDocument();
mXDoc.Load(mStrXMLStk);
// The document declares a default namespace, so map it to a prefix for XPath
XmlNamespaceManager ns = new XmlNamespaceManager(mXDoc.NameTable);
ns.AddNamespace("m", "http://mws.amazonaws.com/FulfillmentInventory/2010-10-01/");
XmlNode XNode = mXDoc.SelectSingleNode("/m:ListInventorySupplyResponse/m:ListInventorySupplyResult/m:InventorySupplyList/m:member", ns);
if (XNode != null)
{
    // ChildNodes is zero-based, so iterate from 0 to Count - 1
    for (int IntI = 0; IntI < XNode.ChildNodes.Count; IntI++)
    {
        XmlNode Node = XNode.ChildNodes[IntI];
        // Store the value in an array: Node.LocalName is the element name, Node.InnerText its value
    }
}
Try this code. It worked for me.
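Try using this XPath together with a namespace manager:
XmlNamespaceManager ns = new XmlNamespaceManager(oXdoc.NameTable);
ns.AddNamespace("m", "http://mws.amazonaws.com/FulfillmentInventory/2010-10-01/");
string xPath = "/m:ListInventorySupplyResponse/m:ListInventorySupplyResult/m:InventorySupplyList/m:member";
XmlNodeList oInventorySupplyListxml = oXdoc.SelectNodes(xPath, ns);
"//member" does search at every depth, but an unprefixed name in XPath only matches elements that are in no namespace. Every element here inherits the default namespace declared on the root (the xmlns attribute), so the query returns nothing until that namespace is mapped to a prefix. Inside the loop, read values with InnerText; Value is null for element nodes.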
I think this will help you:
string a = Convert.ToString(oInventorySupplyRes.ToXML());
XmlDocument oXdoc = new XmlDocument();
oXdoc.LoadXml(a);
// Map the document's default namespace to a prefix so XPath can match its elements
XmlNamespaceManager ns = new XmlNamespaceManager(oXdoc.NameTable);
ns.AddNamespace("m", "http://mws.amazonaws.com/FulfillmentInventory/2010-10-01/");
XmlElement root = oXdoc.DocumentElement;
XmlNodeList fromselectors = root.SelectNodes("m:ListInventorySupplyResult/m:InventorySupplyList/m:member/m:ASIN", ns);
XmlNodeList toselectors = root.SelectNodes("m:ListInventorySupplyResult/m:InventorySupplyList/m:member/m:TotalSupplyQuantity", ns);
foreach (XmlNode m in fromselectors)
{
    // m.InnerXml holds the ASIN value; use it wherever you want
}
foreach (XmlNode n in toselectors)
{
    // n.InnerXml holds the TotalSupplyQuantity value; use it wherever you want
}
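
The LINQ to XML attempt in Edit 1 fails for a related reason: `Descendants("member")` looks for a `member` element in no namespace, while the response puts every element in the namespace declared on the root. Here is a hedged sketch of a namespace-aware version. The `InventoryDemo` class and `ReadMembers` method are hypothetical names, and the code assumes all elements inherit the root element's namespace, as they do in the XML shown in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InventoryDemo
{
    // Returns (ASIN, TotalSupplyQuantity) pairs for every <member> element.
    public static List<(string Asin, int Quantity)> ReadMembers(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Take the namespace from the root element instead of hard-coding the URI.
        XNamespace ns = doc.Root.Name.Namespace;

        return doc.Descendants(ns + "member")
            .Select(m => (
                // The explicit string conversion yields null if the element is missing.
                Asin: (string)m.Element(ns + "ASIN"),
                // The explicit int conversion throws if the element is missing.
                Quantity: (int)m.Element(ns + "TotalSupplyQuantity")))
            .ToList();
    }
}
```

Passing the string from `oInventorySupplyRes.ToXML()` to `ReadMembers` should give one tuple per member, which can then be stored or mapped to whatever type the application needs.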

HTML Agility Pack - Get Text From 1st STRONG Tag Inside SPAN Tag

There are 5 STRONG tags inside a SPAN tag in my HTML document.
How do I get the text from the first STRONG tag inside the SPAN tag?
Here is my code so far:
var web = new HtmlWeb();
var doc = web.Load(url);
var nodes = doc.DocumentNode.SelectNodes("//span[@class='advisory_link']/strong");
foreach (var node in nodes)
{
    richTextBox1.Text = node.InnerHtml;
}
var nodes = doc.DocumentNode.SelectNodes("//span[@class='advisory_link']//strong[1]");
if (nodes != null)
{
    // Return the first match in document order
    foreach (var node in nodes)
    {
        string Description = node.InnerHtml;
        return Description;
    }
}
return null;
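
Note that `//strong[1]` means "every strong that is the first strong child of its own parent", so it can match more than one node when the span contains nested elements; the loop above still works because it returns on the first match. A minimal alternative sketch asks for exactly one node with a parenthesized XPath and `SelectSingleNode`. The `FirstStrongText` helper is a made-up name, and the HtmlAgilityPack package is assumed to be referenced.

```csharp
using HtmlAgilityPack;

public static class FirstStrongDemo
{
    // Hypothetical helper: text of the first <strong> anywhere inside
    // <span class='advisory_link'>, or null if there is none.
    public static string FirstStrongText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The parentheses apply [1] to the whole result set, not per parent.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "(//span[@class='advisory_link']//strong)[1]");

        // SelectSingleNode returns null when nothing matches.
        return node?.InnerText.Trim();
    }
}
```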
