How to get an element by class? - C#

I am trying to use webBrowser1 to get hold of a download link via its href, but the problem is that I must find it by its class name.
<body>
<iframe scrolling="no" frameborder="0" allowtransparency="true" tabindex="0" name="twttrHubFrame" style="position: absolute; top: -9999em; width: 10px; height: 10px;" src="http://platform.twitter.com/widgets/hub.html">
<div id="main">
<div id="header">
<div style="float:left;">
<div id="content">
<h1 style="background-image:url('http://static.mp3skull.com/img/bgmen.JPG'); background-repeat:repeat-x;">Rush Mp3 Download</h1>
<a id="bitrate" onclick="document.getElementById('ofrm').submit(); return false;" rel="nofollow" href="">
<form id="ofrm" method="POST" action="">
<div id="song_html" class="show1">
<div class="left">
<div id="right_song">
<div style="font-size:15px;">
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left; width:27px; text-align:center;">
<div style="margin-left:8px; float:left;">
<a style="color:green;" target="_blank" rel="nofollow" href="http://dc182.4shared.com/img/1011303409/865387c9/dlink__2Fdownload_2F6QmedN8H_3Ftsid_3D20111211-54337-a79f8d10/preview.mp3">Download</a>
</div>
<div style="margin-left:8px; float:left;">
<div style="margin-left:8px; float:left;">
<div style="clear:both;"></div>
</div>
<div id="player155580779" class="player" style="float:left; margin-left:10px;"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
I searched all over Google, but I only found PHP examples.
I understand you would do something along these lines:
HtmlElement downloadlink = webBrowser1.Document.GetElementById("song_html").All[0];
URL = downloadlink.GetAttribute("href");
but I do not understand how to do it by the class "show1".
Please point me in the right direction with examples and/or a website where I can learn how to do this; I have searched and have no clue.
EDIT: I pretty much need the href link ("http://dc182.4shared.com/img/1011303409/865387c9/dlink__2Fdownload_2F6QmedN8H_3Ftsid_3D20111211-54337-a79f8d10/preview.mp3"), so how would I obtain it?

There is nothing built into the WebBrowser control to retrieve an element by class name. Since you know it is going to be an <a> element, the best you can do is get all <a> elements and search for the one you want:
var links = webBrowser1.Document.GetElementsByTagName("a");
foreach (HtmlElement link in links)
{
    // The WebBrowser (IE) DOM exposes the class attribute as "className"
    if (link.GetAttribute("className") == "show1")
    {
        //do something
    }
}
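A caveat on the == comparison above: GetAttribute("className") returns the whole class attribute, so an element with two classes (for example class="song show1") would not match "show1". Note also that in the question's markup show1 sits on the div with id song_html, not on the <a>. Below is a minimal sketch of a token-based check; ClassNameUtils and HasClass are hypothetical names, and it assumes you pass in the string returned by GetAttribute("className").

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the WebBrowser API): checks whether a class
// attribute value, such as the string returned by GetAttribute("className"),
// contains a given class token.
public static class ClassNameUtils
{
    // The class attribute is a whitespace-separated list of tokens.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '\f' };

    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrEmpty(classAttribute) || string.IsNullOrEmpty(className))
        {
            return false;
        }

        // "song show1" -> ["song", "show1"]; whole tokens are compared,
        // so "show10" does not match "show1".
        return classAttribute
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className, StringComparer.Ordinal);
    }
}
```

Inside the loop above, the check would read if (ClassNameUtils.HasClass(link.GetAttribute("className"), "show1")). Since show1 is on the song_html div in the question, you could also run the same check over GetElementsByTagName("div") and then read the href of the first <a> inside the matching div.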

Extension method for HtmlDocument.
It returns a list of elements with a given tag whose className matches the given class name.
It can also be used to collect elements by tag only, or by class name only.
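using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

internal static class Utils
{
    // Returns the elements matching the given tag and/or class name (both optional).
    internal static List<HtmlElement> getElementsByTagAndClassName(this HtmlDocument doc, string tag = "", string className = "")
    {
        List<HtmlElement> lst = new List<HtmlElement>();
        bool empty_tag = String.IsNullOrEmpty(tag);
        bool empty_cn = String.IsNullOrEmpty(className);
        if (empty_tag && empty_cn) return lst;
        // All elements when no tag is given, otherwise only elements with that tag
        HtmlElementCollection elmts = empty_tag ? doc.All : doc.GetElementsByTagName(tag);
        if (empty_cn)
        {
            lst.AddRange(elmts.Cast<HtmlElement>());
            return lst;
        }
        for (int i = 0; i < elmts.Count; i++)
        {
            // The WebBrowser (IE) DOM exposes the class attribute as "className"
            if (elmts[i].GetAttribute("className") == className)
            {
                lst.Add(elmts[i]);
            }
        }
        return lst;
    }
}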
Usage:
WebBrowser wb = new WebBrowser();
// ... navigate, then query wb.Document once the page has loaded (e.g. in the DocumentCompleted handler)
List<HtmlElement> lst_div = wb.Document.getElementsByTagAndClassName("div"); // all div elements
List<HtmlElement> lst_err_elmnts = wb.Document.getElementsByTagAndClassName(String.Empty, "error"); // all elements with the "error" class
List<HtmlElement> lst_div_err = wb.Document.getElementsByTagAndClassName("div", "error"); // all divs with the "error" class

Following these answers, I wrote a method that hides divs by class name.
I'm sharing it for anyone who needs it.
public void HideDivByClassName(WebBrowser browser, string classname)
{
    if (browser.Document != null)
    {
        var byTagName = browser.Document.GetElementsByTagName("div");
        foreach (HtmlElement element in byTagName)
        {
            if (element.GetAttribute("className") == classname)
            {
                element.Style = "display:none";
            }
        }
    }
}

Related

Get the href link of an element with classname having spaces

I have been trying to get the link of an element using its class name, but I always get an error saying that no element was found:
List<IWebElement> LinkElements = Selenium.Selenium.driver.FindElementsByClassName("column.wrap-text").ToList();
I managed to get the links I want using the code below, but I know it is not a good approach.
try
{
    Selenium.Selenium.driver.Navigate().GoToUrl(txt_url.Text);
    List<IWebElement> LinkElements = Selenium.Selenium.driver.FindElementsByTagName("a").ToList();
    List<string> ValidLinks = new List<string>();
    foreach (IWebElement LinkElement in LinkElements)
    {
        string LinkString = LinkElement.GetAttribute("href");
        if (LinkString != null)
        {
            if (LinkString.Contains("documents"))
            {
                list.Items.Add(LinkString);
            }
        }
    }
}
catch (Exception)
{ }
Below is the HTML for the element from which I want to extract the href link ("/view/garnimii#/Testing%20Folder/MyFile.txt") that contains the title name. I have tried every approach I could think of, but I cannot read the element with FindElementsByClassName or with an XPath (which is very vague here). Can anyone please help me with this?
<div class="wrapper fluid-element">
<div class="wrapper fluid-element">
<div class="wrapper fluid-element">
<div class="column wrap-text">
<a title="MyFile.txt" href="https://drive.corp.amazon.com/documents/garnimii#/Testing%20Folder/MyFile.txt">MyFile.txt</a>
</div>
</div>
<div class="column actions resource-actions-view">
<a data-turbolink="true" href="/view/garnimii#/Testing%20Folder/MyFile.txt"><i class="fa fa-external-link"></i> View
</a></div>
<div class="column actions resource-actions-share">
<a data-target="#resource-modal-share" data-toggle="modal"
href="/share/garnimii#/Testing%20Folder/MyFile.txt">
<i class="fa fa-share-alt"></i> Share
</a>
</div>
<div class="column actions resource-actions-rename resource-header-actions">
<a data-resource-basename="MyFile.txt" data-resource-id="8a520062-5dbe-46ba-b4b0-b672f6481c17"
data-root-path="/" data-target="#resource-modal-rename" data-toggle="modal" href="#resource-modal-rename">
<i class="fa fa-pencil"></i> Rename
</a>
</div>
</div>
</div>
Update
foreach (IWebElement LinkElement in LinkElements)
{
    string LinkString = LinkElement.GetAttribute("title");
    if (LinkString != null)
    {
        if (LinkString.Contains("MyFile.txt"))
        {
            // read href from the element itself, not from the title string
            list.Items.Add(LinkElement.GetAttribute("href"));
        }
    }
}

You can also try the XPath //a:
List<IWebElement> LinkElements = Selenium.Selenium.driver.FindElementsByXPath("//a").ToList();
List<string> ValidLinks = new List<string>();
foreach (IWebElement LinkElement in LinkElements)
{
    Console.WriteLine(LinkElement.GetAttribute("href"));
}
Print all the href values first. If your output contains all the hrefs you expect, you can proceed to add them to the other list.
Update:
string LinkString = Selenium.Selenium.driver.FindElementByXPath("//a[@title='MyFile.txt']").GetAttribute("href");

FindElementsByClassName can only locate elements by a single class name.
For multiple class names, use an XPath or CSS selector.
So instead of
List<IWebElement> LinkElements = Selenium.Selenium.driver.FindElementsByClassName("column.wrap-text").ToList();
Try using
List<IWebElement> LinkElements = Selenium.Selenium.driver.FindElementsByCssSelector("div.column.wrap-text").ToList();
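When the class list is available as one string, as in class="column wrap-text" from the markup in the question, a small helper can build the compound CSS selector shown above. This is a sketch that assumes class names needing no CSS escaping; SelectorUtils and ToCompoundSelector are hypothetical names, not Selenium APIs.

```csharp
using System;
using System.Linq;

// Hypothetical helper: builds a compound CSS selector from a tag name and a
// class attribute value, e.g. ("div", "column wrap-text") -> "div.column.wrap-text".
// Assumption: the class names contain no characters that need CSS escaping.
public static class SelectorUtils
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '\f' };

    public static string ToCompoundSelector(string tagName, string classAttribute)
    {
        if (classAttribute == null)
        {
            throw new ArgumentNullException(nameof(classAttribute));
        }

        // Each class token becomes its own ".token" part; joining them without
        // spaces means "one element carrying all of these classes".
        var tokens = classAttribute.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        return (tagName ?? string.Empty) + string.Concat(tokens.Select(t => "." + t));
    }
}
```

For example, ToCompoundSelector("div", "column wrap-text") returns "div.column.wrap-text", which can be passed to FindElementsByCssSelector as above. An exact XPath attribute match such as //div[@class='column wrap-text'] also works, but only while the attribute value is exactly that string.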

get div information with html agility pack

Hi, I want to process information on an HTML page. With the following code I can get the information, and it is received in this order:
new-link-1
new-link-2
new-link-3
But when it reaches the new-link-no-title section, the order breaks and changes to:
new-link-3
new-link-1
new-link-2
At the end, the program stops with an ArgumentOutOfRangeException.
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = await web.LoadFromWebAsync(Link);
foreach ((var item, int index) in doc.DocumentNode.SelectNodes(".//div[@class='new-link-1']").WithIndex())
{
    var x = item.SelectNodes("//div[@class='new-link-2']")[index].InnerText;
    var xx = item.SelectNodes("//div[@class='new-link-3']//a")[index];
    MessageBox.Show(item.InnerText);
    MessageBox.Show(x);
    MessageBox.Show(xx.Attributes["href"].Value);
}
And the HTML:
<div id="new-link">
<ul>
<li>
<div class="new-link-1"> Season 5</div>
<div class="new-link-2"> Completed</div>
<div class="new-link-3">
Download with direct link
</div>
</li>
<li class="new-link-no-titel">
<div class="new-link-1"> Season 6</div>
<div class="new-link-2"> Now airing</div>
<div class="new-link-3">
<i class="fa fa-arrow-down" title="Now airing">
</i>
</div>
</li>
<li>
<div class="new-link-1"> Episode 1</div>
<div class="new-link-2"> Aired</div>
<div class="new-link-3">
Direct link download
</div>
</li>
<li>
<div class="new-link-1"> Episode 7</div>
<div class="new-link-2"> Aired</div>
<div class="new-link-3">
Download with direct link
</div>
</li>
</ul>
</div>

This is what I found to be the issue with your code.
foreach ((var item, int index) in doc.DocumentNode.SelectNodes(".//div[@class='new-link-1']").WithIndex()) // -> gives 4 values for index
item.SelectNodes("//div[@class='new-link-2']") // -> this produces 4 nodes
item.SelectNodes("//div[@class='new-link-3']//a") // -> this produces only 3 nodes
Issue:
When you search with //div, you search all nodes in the document, not just the ones under the item you are currently on. Because the last query finds only 3 nodes while index goes up to 3, the links stop lining up with their rows after the row without a link, and [index] eventually goes out of range, which causes the ArgumentOutOfRangeException.
Solution/suggestion: an XPath that starts with // searches from the root node. If you prefix it with a dot (.//), only the descendants of the current node are considered. (Excerpt from here)
foreach (HtmlNode item in doc.DocumentNode.SelectNodes(".//li"))
{
    // The leading dot keeps each query inside the current <li>
    var x0 = item.SelectSingleNode(".//div[@class='new-link-1']");
    var x = item.SelectSingleNode(".//div[@class='new-link-2']");
    var xx = item.SelectSingleNode(".//a"); // null for the row that has no link
    if (x0 != null) MessageBox.Show(x0.InnerText);
    if (x != null) MessageBox.Show(x.InnerText);
    string href = xx?.GetAttributeValue("href", null);
    if (href != null)
        MessageBox.Show(href);
}
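The // versus .// behaviour is plain XPath 1.0, so it can be checked without HtmlAgilityPack. This sketch uses the built-in System.Xml engine on a small, well-formed stand-in for the question's list; the href values are invented, and XPathScopeDemo and CountLinksFrom are hypothetical names.

```csharp
using System.Xml;

// Uses System.Xml (not HtmlAgilityPack) to show that "//a" evaluated from a
// context node still searches the whole document, while ".//a" only searches
// that node's descendants.
public static class XPathScopeDemo
{
    public const string Sample =
        "<ul>" +
        "<li><div class='new-link-3'><a href='1.mp4'>one</a></div></li>" +
        "<li><div class='new-link-3'>no link here</div></li>" +
        "<li><div class='new-link-3'><a href='3.mp4'>three</a></div></li>" +
        "</ul>";

    // Counts <a> elements found from the li at liIndex, absolute vs relative.
    public static (int Absolute, int Relative) CountLinksFrom(int liIndex)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode li = doc.SelectNodes("//li")[liIndex];

        int absolute = li.SelectNodes("//a").Count;  // whole document
        int relative = li.SelectNodes(".//a").Count; // only inside this <li>
        return (absolute, relative);
    }
}
```

From the second li, //a still finds both links in the document while .//a finds none, which is the same kind of mismatch that made [index] run out of range in the question.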

Get specific href values or link from email which is parsed as html in c#

I am processing emails in my C# service. I need to extract certain links from them to add to a DB. I am using HtmlAgilityPack. The div and p tags turn out to be interchangeable in the parsed email. I have to extract the links present below the labels 'Scheduler Link', 'Data Path' and 'Link' from the email. After cleaning it up, sample data is as follows:
<html>
<body>
...... // contains some other tags which I don't need; they may include hrefs, but I don't need them
<div align="justify" style="margin:0;"></div>
<div align="justify" style="margin:0;"></div>
<div align="justify" style="margin:0;">Scheduler link :</div>
<div align="justify" style="margin:0;"></div>
<div style="margin:0;"><a href="https://something.com/requests/26428">
https://something.com/requests/26428</a>
</div>
<div style="margin:0;"></div>
<div style="margin:0;"></div>
<div style="margin:0;"></div>
<div align="justify" style="margin:0;">Data path :</div>
<div align="left" style="text-align:justify;margin:0;"><a
href="file:///\\mycompany.com\ABC\OPQ1234\jui\tui245.5t_2rtfg_tyui">
\\mycompany.com\ABC\OPQ1234\jui\tui245.5t_2rtfg_tyui</a>
</div>
<div align="left" style="text-align:justify;margin:0;"><a
href="file:///\\mycompany.com\ABC\OPQ1234\tui245.5t_2rtfg_tyui">
\\mycompany.com\ABC\OPQ1234\tui245.5t_2rtfg_tyui</a>
</div>
<div align="justify" style="margin:0;"></div>
<div align="justify" style="margin:0;">Link :</div>
<div align="justify" style="margin:0;"><a
href="https://Thisisanotherlink.abcdef/sites/this/498592/rkjfb/3874y">
This is some text</a></div>
<div align="justify" style="margin:0 0 5pt 0;">This is another text</div>
...... // contains some other tags which I don't need
</body>
</html>
I am looking for the div tags of 'Scheduler Link', 'Data Path' and 'Link' using regular expressions as follows:
HtmlNode schedulerLink = doc.DocumentNode.SelectSingleNode("//*[text()[contains(.,'" + Regex.Match(body, _keyValuePairs["scheduler"]).Value.ToString() + "')]]");
HtmlNode dataPath = doc.DocumentNode.SelectSingleNode("//*[text()[contains(.,'" + Regex.Match(body, _keyValuePairs["datapath"]).Value.ToString() + "')]]");
HtmlNode link = doc.DocumentNode.SelectSingleNode("//*[text()[contains(.,'" + Regex.Match(body, _keyValuePairs["link"]).Value.ToString() + "')]]");
These queries return the respective div nodes. The number of links under each of the three labels varies from email to email, and so does the order of the tags. I need to capture the links under each label in a list. I am using the following code:
foreach (HtmlNode link in schedulerLink.Descendants())
{
    string hrefValue = link.GetAttributeValue("href", string.Empty);
    if (!(link.InnerText.Contains("\r\n")))
    {
        if (link.InnerText.Contains("/"))
        {
            schedulersList.Add(link.InnerText.Trim());
        }
    }
}
Descendants() sometimes does not return the correct number of nodes. Also, how do I get the links under each of the 3 labels into 3 different lists, since Descendants() returns all the nodes below a node?

If I understand correctly, you want to capture the content of the first href attribute after a specific string like "Scheduler link". I don't know HtmlAgilityPack, but my approach would be to just search the email body with a regex like this:
Scheduler link(?:\s|\S)*?href="([^"]+)
This regex should capture the content of the first href attribute after every occurrence of "Scheduler link" in the mail.
You can try it here: Regex101
To find the other types of links just replace the Scheduler link part with the respective string.
I hope this is helpful.
Additional info about the regex:
Scheduler link matches the string literally
(?:\s|\S)*? is a non-capturing group that lazily matches any characters, including line breaks, up to the first occurrence of the literal string href="
([^"]+) captures everything up to the next " character
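In C#, the regex above could be applied with System.Text.RegularExpressions roughly as follows. This is a sketch: LabelLinkRegex and FirstHrefAfter are hypothetical names, the label is escaped with Regex.Escape, and matching is case-sensitive (you could pass RegexOptions.IgnoreCase if the casing varies, e.g. "Scheduler Link" versus "Scheduler link").

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch of the regex approach: for every occurrence of a label such as
// "Scheduler link", capture the first href="..." value that follows it.
public static class LabelLinkRegex
{
    public static List<string> FirstHrefAfter(string body, string label)
    {
        // Regex.Escape keeps the label literal; (?:\s|\S)*? lazily skips any
        // characters (including newlines) up to the first href=".
        var pattern = Regex.Escape(label) + @"(?:\s|\S)*?href=""([^""]+)";
        var results = new List<string>();
        foreach (Match m in Regex.Matches(body, pattern))
        {
            results.Add(m.Groups[1].Value); // group 1 = the href value
        }
        return results;
    }
}
```

For the Data path section, which has two links in the sample, this returns only the first href, as the regex is designed to do.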

Since the three kinds of links in your question have different hrefs,
one way of doing it is the following:
var html = @"<html> <body> <div align='justify' style='margin:0;'></div> <div align='justify' style='margin:0;'></div> <div align='justify' style='margin:0;'>Scheduler link :</div> <div align='justify' style='margin:0;'></div> <div style='margin:0;'><a href='https://something.com/requests/26428'> https://something.com/requests/26428</a> </div> <div style='margin:0;'></div> <div style='margin:0;'></div> <div style='margin:0;'></div> <div align='justify' style='margin:0;'>Data path :</div> <div align='left' style='text-align:justify;margin:0;'><a href='file:///\\mycompany.com\ABC\OPQ1234\jui\tui245.5t_2rtfg_tyui'> \\mycompany.com\ABC\OPQ1234\jui\tui245.5t_2rtfg_tyui</a> </div> <div align='left' style='text-align:justify;margin:0;'><a href='file:///\\mycompany.com\ABC\OPQ1234\tui245.5t_2rtfg_tyui'> \\mycompany.com\ABC\OPQ1234\tui245.5t_2rtfg_tyui</a> </div> <div align='justify' style='margin:0;'></div> <div align='justify' style='margin:0;'>Link :</div> <div align='justify' style='margin:0;'><a href='https://Thisisanotherlink.abcdef/sites/this/498592/rkjfb/3874y'> This is some text</a></div> <div align='justify' style='margin:0 0 5pt 0;'>This is another text</div> </body></html>";
var document = new HtmlDocument();
document.LoadHtml(html);
var schedulerNodes = document.DocumentNode.SelectNodes("//a[contains(@href, \"something\")]");
var dataPathNodes = document.DocumentNode.SelectNodes("//a[contains(@href, \"mycompany\")]");
var linkNodes = document.DocumentNode.SelectNodes("//a[contains(@href, \"Thisisanotherlink\")]");
foreach (var item in schedulerNodes)
{
    Debug.WriteLine(item.GetAttributeValue("href", ""));
    Debug.WriteLine(item.InnerText);
}
foreach (var item in dataPathNodes)
{
    Debug.WriteLine(item.GetAttributeValue("href", ""));
    Debug.WriteLine(item.InnerText);
}
foreach (var item in linkNodes)
{
    Debug.WriteLine(item.GetAttributeValue("href", ""));
    Debug.WriteLine(item.InnerText);
}
Hope that helps!
EDIT:
var result = document.DocumentNode.SelectNodes("//div//text()[normalize-space()] | //a");
// select all non-empty text nodes and all a tags
string sch = "Scheduler link :";
string dataLink = "Data path :";
string linkpath = "Link :";
foreach (var item in result)
{
    if (item.InnerText.Trim().Contains(sch))
    {
        // skip the results until we reach the Scheduler label
        var processResult = result.SkipWhile(x => !x.InnerText.Trim().Equals(sch)).Skip(1);
        Debug.WriteLine("====================Scheduler link=========================");
        foreach (var subitem in processResult)
        {
            Debug.WriteLine(subitem.GetAttributeValue("href", ""));
            // if href then add to list TODO
            if (subitem.InnerText.Contains(dataLink)) // break when the Data path label appears
            {
                break;
            }
        }
    }
    if (item.InnerText.Trim().Contains(dataLink))
    {
        var processResult = result.SkipWhile(x => !x.InnerText.Trim().Equals(dataLink)).Skip(1);
        Debug.WriteLine("====================Data link=========================");
        foreach (var subitem in processResult)
        {
            Debug.WriteLine(subitem.GetAttributeValue("href", ""));
            if (subitem.InnerText.Contains(linkpath)) // break when the Link label appears
            {
                break;
            }
        }
    }
    if (item.InnerText.Trim().Contains(linkpath))
    {
        var processResult = result.SkipWhile(x => !x.InnerText.Trim().Equals(linkpath)).Skip(1);
        Debug.WriteLine("====================Link=========================");
        foreach (var subitem in processResult)
        {
            // "Link :" is the last label in the sample, so this reads to the end
            var hrefValue = subitem.GetAttributeValue("href", "");
            Debug.WriteLine(hrefValue);
        }
    }
}
I have explained the logic in the code comments.
Hope that helps.
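The three loops above share one idea: find a label, then collect hrefs until the next label appears. A hypothetical single-pass restructuring of that idea is sketched below (LinkGrouper and Group are made-up names). It works on plain (text, href) pairs in document order so it does not depend on HtmlAgilityPack; with HtmlAgilityPack you might build the pairs from the result collection with something like result.Select(n => (n.InnerText, n.Name == "a" ? n.GetAttributeValue("href", null) : null)), though that projection is an untested assumption.

```csharp
using System.Collections.Generic;

// Hypothetical single-pass alternative: walk the nodes in document order,
// remember the most recent label, and file every href seen after it under
// that label until the next label appears.
public static class LinkGrouper
{
    // Each item is (text, href): text nodes have href == null, <a> nodes have their href.
    public static Dictionary<string, List<string>> Group(
        IEnumerable<(string Text, string Href)> nodesInDocumentOrder,
        IReadOnlyList<string> labels)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var label in labels)
        {
            groups[label] = new List<string>();
        }

        string current = null;
        foreach (var (text, href) in nodesInDocumentOrder)
        {
            var trimmed = text?.Trim() ?? string.Empty;
            // A node whose trimmed text is exactly a label starts a new section.
            if (groups.ContainsKey(trimmed))
            {
                current = trimmed;
                continue;
            }
            // Links before the first label are ignored.
            if (current != null && !string.IsNullOrEmpty(href))
            {
                groups[current].Add(href);
            }
        }
        return groups;
    }
}
```

Because the sections are delimited by whichever label comes next, this does not assume a fixed order of labels or a fixed number of links per label.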

I've been trying to get data from a website with HtmlAgilityPack

I have tried a lot of approaches, but I couldn't solve my problem. I don't know how to write the node path to pass to the SelectSingleNode() method. I built an XPath to reach my node in my C# code, but when I run it I get a NullReferenceException because of that path. How can I build the correct path, or is there another solution?
This is an example of the HTML code:
<html>
<body>
<div id="container">
<div id="box">
<div class="box">
<div class="boxContent">
<div class="userBox">
<div class="userBoxContent">
<div class="userBoxElement">
<ul id="namePart">
<li>
<span class="namePartContent">
</span>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
And this is my C# code:
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

namespace AgilityTrial
{
    class Program
    {
        static void Main(string[] args)
        {
            Uri url = new Uri("https://....");
            WebClient client = new WebClient();
            client.Encoding = Encoding.UTF8;
            string html = client.DownloadString(url);
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);
            string path = @"//html/body/div[@id='container']/div[@id='classifiedDetail']" +
                "/div[@class='classifiedDetail']/div[@class='classifiedDetailContent']" +
                "/div[@class='classifiedOtherBoxes']/div[@class='classifiedUserBox']" +
                "/div[@class='classifiedUserContent']/ul[@id='phoneInfoPart']/li" +
                "/span[@class='pretty-phone-part show-part']";
            var tds = doc.DocumentNode.SelectSingleNode(path);
            var date = tds.InnerHtml;
            Console.WriteLine(date);
        }
    }
}

Take as an example your namePartContent span node. If you want to fetch that data you would simply do this:
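doc.DocumentNode.SelectSingleNode(".//span[@class='namePartContent']")?.InnerText;
This searches for the first span node whose class is namePartContent anywhere below the document's root node, so the long absolute path is not needed. The ?. operator makes the whole expression evaluate to null instead of throwing a NullReferenceException when no node matches.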
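To see why the long path returns null while the short one works, here is a sketch using System.Xml (not HtmlAgilityPack) on a trimmed, well-formed version of the sample markup. The span text "placeholder" and the SpanLookupDemo and FindText names are invented for illustration. XmlNode.SelectSingleNode, like HtmlAgilityPack's method of the same name, returns null when nothing matches, so a single non-matching step (here the classifiedDetail id) makes the whole path fail.

```csharp
using System.Xml;

// Uses System.Xml (not HtmlAgilityPack) to show that SelectSingleNode returns
// null when any step of a long absolute path does not match, whereas a short
// descendant query keyed on the class still finds the node.
public static class SpanLookupDemo
{
    public const string Sample =
        "<html><body><div id='container'><div id='box'>" +
        "<ul id='namePart'><li><span class='namePartContent'>placeholder</span></li></ul>" +
        "</div></div></body></html>";

    public static string FindText(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        // ?. turns "no match" into a null result instead of a NullReferenceException.
        return doc.SelectSingleNode(xpath)?.InnerText;
    }
}
```

FindText(".//span[@class='namePartContent']") returns "placeholder", while a path that includes div[@id='classifiedDetail'] returns null because no element has that id in this markup.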

Traverse the dom with CsQuery

I'm trying to learn how to use CsQuery to traverse a dom to get specific text.
The html looks like this:
<div class="featured-rows">
<div class="row">
<div class="featured odd" data-genres-filter="MA0000002613">
<div class="album-cover">
<div class="artist">
Half Japanese
</div>
<div class="title">
<div class="label"> Joyful Noise </div>
<div class="styles">
<div class="rating allmusic">
<div class="rating average">
<div class="headline-review">
</div>
<div class="featured even" data-genres-filter="MA0000002572, MA0000002613">
</div>
<div class="row">
<div class="row">
<div class="row">
My code attempt looks like this:
public void GetRows()
{
    var artistName = string.Empty;
    var html = GetHtml("http://www.allmusic.com/newreleases");
    var rows = html.Select(".featured-rows");
    foreach (var row in rows)
    {
        var odd = row.Cq().Find(".featured odd");
        foreach (var artist in odd)
        {
            artistName = artist.Cq().Text();
        }
    }
}
The first select, for .featured-rows, works, but then I don't know how to get down to .artist to get the text.

You should try something similar to this:
var html = GetHtml("http://www.allmusic.com/newreleases");
var query = CQ.Create(html);
var row = query[".artist>a"];
string link = row.Attr("href");
string text = row.Text();
CsQuery is a port of jQuery, so you can search for jQuery examples and adapt them.
UPDATE:
To traverse all artists and titles:
var rows = query[".featured.odd"];
foreach (var row in rows)
{
    // row is a single DOM element; wrap it with Cq() to run selectors inside it
    var artistsLink = row.Cq().Find(".artist>a");
    var title = row.Cq().Find(".title");
    // here do whatever you need with this
}

List<string> artists = html[".featured .artist a"].Select(dom=>dom.TextContent).ToList();
where html is your CQ object.
var odd = row.Cq().Find(".featured odd");
should be
var odd = row.Cq().Find(".featured.odd");
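To make the difference concrete: ".featured odd" is a descendant selector (an element named odd somewhere inside a .featured element), while ".featured.odd" matches a single element that carries both classes. Below is a minimal sketch of the second rule that handles only plain class selectors; CompoundClassMatch and Matches are hypothetical names, not CsQuery APIs.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating what a compound class selector such as
// ".featured.odd" means: the element must carry every listed class.
public static class CompoundClassMatch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n', '\f' };

    public static bool Matches(string compoundSelector, string classAttribute)
    {
        if (compoundSelector == null)
        {
            throw new ArgumentNullException(nameof(compoundSelector));
        }

        // ".featured.odd" -> ["featured", "odd"]; only plain class selectors are handled.
        var required = compoundSelector.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
        var present = (classAttribute ?? string.Empty)
            .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

        // Every required class must appear among the element's class tokens.
        return required.Length > 0 && required.All(c => present.Contains(c));
    }
}
```

With the question's markup, the div with class="featured odd" matches ".featured.odd", while the div with class="featured even" does not.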
