How to get content of <div> with HtmlAgilityPack - C#

I have this HTML source:
<div class="lit-plot">
<b class="red">خلاصه داستان :</b>
Content
</div>
I want to get the value of <div> (not <b> and only the string "Content") with HtmlAgilityPack. What is the best way to do this?
Here is what I am doing. movieDesHTMLSource is loaded from the given HTML source, but I don't know how to access the InnerHtml!
string movieDes;
// Extract the movie's description HTML source
var movieDesHTMLSource = new HtmlAgilityPack.HtmlDocument();
movieDesHTMLSource.LoadHtml(postPageHTMLDes[95].InnerHtml);
var src = movieDesHTMLSource.DocumentNode.SelectNodes("//div[contains(@class,'lit-plot')]");

Use the XPath text() node test to select only the text nodes that are direct children of the div, which excludes the <b> element.
var html = @"<body>
<div class='lit-plot'>
<b class='red'>خلاصه داستان :</b>
Content
</div>
</body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//div[contains(@class,'lit-plot')]/text()");
foreach (HtmlNode node in htmlNodes)
{
Console.WriteLine(node.InnerText.Trim());
}
Fiddle here : https://dotnetfiddle.net/mXFs8k
If you control the markup, I recommend wrapping the content in its own tag, such as <p> or <span>, so that you can target it directly with HtmlAgilityPack.
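If the markup cannot be changed, another option is to walk the div's direct children and keep only the text nodes. The following is a minimal sketch of that idea, not the answer's own method; the class name DivTextExample and the method name GetOwnText are made up for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class DivTextExample
{
    // Returns the text that sits directly inside the first div whose class
    // contains "lit-plot", skipping child elements such as <b>.
    // Returns null when no such div exists.
    public static string GetOwnText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var div = doc.DocumentNode.SelectSingleNode("//div[contains(@class,'lit-plot')]");
        if (div == null)
        {
            return null;
        }

        // Keep only direct text children, decode entities, drop whitespace-only nodes.
        var parts = div.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

For the sample HTML in the question, DivTextExample.GetOwnText(html) should yield just the string Content, because the whitespace-only text node before <b> is filtered out.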

Related

How to get value of nested img src with Html Agility Pack?

I'm trying to get nested img src values with Html Agility Pack and have tried multiple things with no success. There are 17 img elements whose src I need to grab, all of them nested, and I can't figure it out for the life of me. Here is the bare-bones HTML; I need the value of src on the last line of each img tag:
<div class="largeTitle">
<article class="articleItem" data-id="0000">
<a href="#blank_link" class="img">
<img class=" lazyloaded" data-src="#blank_link" alt="test" onerror="script"
src="image_link.jpg">
</a>
</article>
<article class="articleItem" data-id="0001">
<a href="#blank_link" class="img">
<img class=" lazyloaded" data-src="#blank_link" alt="test" onerror="script"
src="image_link.jpg">
</a>
</article>
</div>
With the url you mentioned in comments, you can do:
var web = new HtmlWeb();
var doc = web.Load("https://www.investing.com/");
var images = doc.DocumentNode.SelectNodes("//*[contains(@class,'js-articles')]//a[@class='img']//img");
foreach(var image in images)
{
string source = image.Attributes["data-src"].Value;
string label = image.Attributes["alt"].Value;
Console.WriteLine($"\"{label}\" {source}");
}
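The loop above throws if SelectNodes finds nothing or if an image lacks the attribute being read. A more defensive variant might look like the sketch below. It assumes the anchor carries class="img" as in the question's HTML; ImageAttributeExample and GetImageAttribute are illustrative names, not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class ImageAttributeExample
{
    // Returns the given attribute (for example "src" or "data-src") of every
    // <img> nested inside <a class="img">. Images that lack the attribute
    // are skipped instead of throwing.
    public static List<string> GetImageAttribute(string html, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//a[@class='img']//img");
        if (images == null)
        {
            return new List<string>();
        }

        return images
            .Select(img => img.GetAttributeValue(attributeName, ""))
            .Where(value => value.Length > 0)
            .ToList();
    }
}
```

GetAttributeValue returns the supplied default when the attribute is missing, which avoids the null dereference that image.Attributes["..."].Value can cause.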

I've been trying to get data from a website with HtmlAgilityPack

I have tried many approaches but could not solve my problem. I don't know how to write the node path for the SelectSingleNode method. I built an XPath in my C# code to reach my node, but when I run it I get a NullReferenceException because the path matches nothing. How should I construct the path, or is there another solution?
This is an example of the HTML code:
<html>
<body>
<div id="container">
<div id="box">
<div class="box">
<div class="boxContent">
<div class="userBox">
<div class="userBoxContent">
<div class="userBoxElement">
<ul id ="namePart">
<li>
<span class="namePartContent">
</span>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
And this is my C# code:
namespace AgilityTrial
{
class Program
{
static void Main(string[] args)
{
Uri url = new Uri("https://....");
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
string html = client.DownloadString(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
string path = @"//html/body/div[@id='container']/div[@id='classifiedDetail']"+
"/div[@class='classifiedDetail']/div[@class='classifiedDetailContent']"+
"/div[@class='classifiedOtherBoxes']/div[@class='classifiedUserBox']"+
"/div[@class='classifiedUserContent']/ul[@id='phoneInfoPart']/li"+
"/span[@class='pretty-phone-part show-part']";
var tds = doc.DocumentNode.SelectSingleNode(path);
var date = tds.InnerHtml;
Console.WriteLine(date);
}
}
}
Take your namePartContent span node as an example. To fetch its text you would simply do this:
doc.DocumentNode.SelectSingleNode(".//span[@class='namePartContent']")?.InnerText;
This searches for a single span node whose class is namePartContent, beginning at the root node, in your case <html>. The ?. operator yields null instead of throwing when no node matches.
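The same idea applies to the long path in the question: anchoring on one id and matching the class loosely is less brittle than spelling out every ancestor. A minimal sketch, assuming the ids and class names from the question's path are really present in the downloaded HTML (ShortPathExample and GetPhoneText are illustrative names):

```csharp
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class ShortPathExample
{
    // Finds the span below <ul id="phoneInfoPart"> whose class attribute
    // contains "pretty-phone-part" and returns its trimmed text,
    // or null when the page has no such node.
    public static string GetPhoneText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var span = doc.DocumentNode.SelectSingleNode(
            "//ul[@id='phoneInfoPart']//span[contains(@class,'pretty-phone-part')]");

        return span?.InnerText.Trim();
    }
}
```

If this still returns null, the element may be added by JavaScript after the page loads, in which case it is not in the string returned by DownloadString at all.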

C# HtmlAgilityPack get content from all div with given class

I have an HTML file that looks like this:
<div class="user_meals">
<div class="name">Name Surname</div>
<div class="day_meals">
<div class="meal">First Meal</div>
</div>
<div class="day_meals">
<div class="meal">Second Meal</div>
</div>
<div class="day_meals">
<div class="meal">Third Meal</div>
</div>
<div class="day_meals">
<div class="meal">Fourth Meal</div>
</div>
<div class="day_meals">
<div class="meal">Fifth Meal</div>
</div>
This code repeats a few times.
I want to get the name and surname, which is inside the <div> tag with class "name".
This is my code using HtmlAgilityPack:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"C:\workspace\file.html");
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='name']"))
{
    string value = node.InnerText;
}
But it doesn't work. Visual Studio throws an exception:
An unhandled exception of type 'System.NullReferenceException'.
You are using the wrong method to load HTML from a path: LoadHtml expects HTML markup, not the location of a file. Use Load instead.
The error you are getting is quite misleading, because none of the properties are null and the standard tips from "What is a NullReferenceException, and how do I fix it?" don't apply.
Essentially, SelectNodes correctly returns null because no elements match the query, and foreach then throws when it tries to iterate over null.
Fixed code:
HtmlDocument doc = new HtmlDocument();
// either doc.Load(@"C:\workspace\file.html") or pass HTML:
doc.LoadHtml("<div class='user_meals'><div class='name'>Name Surname</div></div> ");
var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");
// SelectNodes returns null if nothing is found, so check before iterating
if (nodes == null)
{
    throw new InvalidOperationException("No div with class 'name' was found.");
}
foreach (HtmlNode node in nodes)
{
    string value = node.InnerText;
    Console.WriteLine(value);
}

Inject variable into html input tag value using Html Agility Pack C#

Is it possible using the C# HTML Agility Pack to insert a variable into the selected node?
I have created my HTML form, loaded it, and selected the input node that I want. Now I would like to inject a SAML Response into its value field.
Here is a bit of the code that I have, first the HTML document:
<html xmlns="http://www.w3.org/1999/xhtml">
<head id="Head1" runat="server">
<title></title>
</head>
<body runat="server" id="bodySSO">
<form id="frmSSO" runat="server" enableviewstate="False">
<div style="display:none" >
<input id="SAMLResponse" name="SAMLResponse" type="text" runat="server" enableviewstate="False" value=""/>
<input id="Query" name="Query" type="text" runat="server" enableviewstate="False" value=""/>
</div>
</form>
</body>
</html>
and here is the function which loads the HTML document and selects the node I want:
public static string GetHTMLForm(SamlAssertion samlAssertion)
{
HtmlAgilityPack.HtmlDocument HTMLSamlDocument = new HtmlAgilityPack.HtmlDocument();
HTMLSamlDocument.Load(@"C:\HTMLSamlForm.html");
HtmlNode node = HTMLSamlDocument.DocumentNode.SelectNodes("//input[@id='SAMLResponse']").First();
//Code that will allow me to inject into the value field my SAML Response
}
EDIT:
Ok so I have achieved injecting the SAML Response packet into the "value" field of the html input tag with this:
HtmlAgilityPack.HtmlDocument HtmlDoc = new HtmlAgilityPack.HtmlDocument();
String SamlInjectedPath = "C:\\SamlInjected.txt";
HtmlDoc.Load(@"C:\HTMLSamlForm.txt");
var SAMLResposeNode = HtmlDoc.DocumentNode.SelectSingleNode("//input[@id='SAMLResponse']").ToString();
SAMLResposeNode = "<input id='SAMLResponse' name='SAMLResponse' type='text' runat='server' enableviewstate='False' value='" + samlAssertion + "'/>";
Now I just need to be able to add that injected tag back into the original HTML document.
OK, I have solved this using the following:
HtmlAgilityPack.HtmlDocument HtmlDoc = new HtmlAgilityPack.HtmlDocument();
HtmlDoc.Load(@"C:\HTMLSamlForm.html");
var SamlNode = HtmlNode.CreateNode("<input id='SAMLResponse' name='SAMLResponse' type='text' runat='server' enableviewstate='False' value='" + samlAssertion + "'/>");
foreach (HtmlNode node in HtmlDoc.DocumentNode.SelectNodes("//input[@id='SAMLResponse']"))
{
string value = node.Attributes.Contains("value") ? node.Attributes["value"].Value : " ";
node.ParentNode.ReplaceChild(SamlNode, node);
}
Then in order to check the contents of the new HTML file I output it using this:
System.IO.File.WriteAllText(@"C:\SamlInjected.txt", HtmlDoc.DocumentNode.OuterHtml);
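Replacing the whole node works, but it may be simpler to set the attribute on the existing node, which keeps the other attributes untouched. The sketch below assumes the response is already Base64-encoded text, so it contains no characters that need HTML escaping; SamlFormExample and InjectSamlResponse are illustrative names only.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class SamlFormExample
{
    // Loads the form markup, writes base64Response into the value attribute
    // of <input id="SAMLResponse">, and returns the modified HTML.
    // Assumption: base64Response is Base64 text and needs no HTML escaping.
    public static string InjectSamlResponse(string formHtml, string base64Response)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(formHtml);

        var input = doc.DocumentNode.SelectSingleNode("//input[@id='SAMLResponse']");
        if (input == null)
        {
            throw new InvalidOperationException("No input with id 'SAMLResponse' was found.");
        }

        // Creates the attribute if it is missing, otherwise overwrites it.
        input.SetAttributeValue("value", base64Response);

        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string can be written out with System.IO.File.WriteAllText in the same way as above.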

How to set innertext of class

I got this code in my webpage:
<div class="goog-inline-block goog-flat-menu-button-caption">
TestText
</div>
I was wondering how I can access this class and change TestText into some other string using C#.
I was trying with HtmlCollection, but there's no InnerText option.
EDIT: I can't change the code above.
Assuming you are using ASP.NET and your div is inside at least one container that has the runat="server" attribute, e.g. a form:
<form id="form1" runat="server">
<div class="goog-inline-block goog-flat-menu-button-caption">
TestText
</div>
</form>
you can simply do this:
var xml = form1.InnerHtml;
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
// SelectNodes returns every match; SelectSingleNode would return one node,
// and foreach over it would walk its children instead.
var nodes = doc.SelectNodes("//div[contains(@class,'goog-inline-block goog')]");
foreach (XmlNode node in nodes)
{
    node.InnerText = " changed Text";
}
form1.InnerHtml = xml = doc.InnerXml;
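Or, using LINQ to XML (XDocument):
XDocument doc = XDocument.Parse(xml);
// Descendants finds the div at any depth; the cast yields null when class is absent
var nodes = doc.Descendants("div")
    .Where(s => ((string)s.Attribute("class") ?? "")
        .Contains("goog-inline-block goog"))
    .ToList();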
foreach (XElement elem in nodes)
{
elem.Value = "changed text";
}
form1.InnerHtml = doc.ToString();
Add the runat="server" and id attribute to it so you have:
<div id="mydiv" class="goog-inline-block goog-flat-menu-button-caption" runat="server" >
TestText
</div>
You can then change the class attribute with:
mydiv.Attributes["class"] = "classOfYourChoice";
or
mydiv.InnerText = "Your Selected text";
Hope this helps.
In ASP.NET you need to give the div an id and set runat="server" on it, like this:
<div id="divTest" runat="server" class="goog-inline-block goog-flat-menu-button-caption">
TestText
</div>
Then, in the C# code-behind:
divTest.InnerText = "Change Text From Here.";
Try this; if it doesn't work, please explain your question in more detail.
