Getting only the DIRECT InnerText of an IHTMLElement - c#

Consider the following html code:
<div id='x'><div id='y'>Y content</div>X content</div>
I'd like to extract only the content of 'x'. However, its innerText property includes the content of 'y' as well. I tried iterating over its children and all properties but they only return the inner tags.
How can I access through the IHTMLElement interface only the actual data of 'x'?
Thanks

Use something like:
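function getText(el) {
    var txt = el.innerHTML;
    // replace() returns a new string, so assign the result back.
    // The greedy pattern removes everything from the first "<" to the last ">".
    txt = txt.replace(/<.*>/g, "");
    return txt;
}
Since el.innerHTML returns
<div id='y'>Y content</div>X content
the function getText would return
X content
Note that this only keeps text that follows the last tag, so it works for this markup but not for text placed before or between child elements.
Maybe this'll help.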

Use the childNodes collection, which returns both child elements and text nodes.
You need to QueryInterface for IHTMLDOMNode from the IHTMLElement to get it (in C#, a cast).

Here is the final code as suggested by Sheng (just a part of the sample, of course):
mshtml.IHTMLElementCollection c = ((mshtml.HTMLDocumentClass)(wbBrowser.Document)).getElementsByTagName("div");
foreach (IHTMLElement div in c)
{
    if (div.className == "lyricbox")
    {
        IHTMLDOMNode divNode = (IHTMLDOMNode)div;
        IHTMLDOMChildrenCollection children = (IHTMLDOMChildrenCollection)divNode.childNodes;
        foreach (IHTMLDOMNode child in children)
        {
            // nodeType 3 is a text node; child elements carry no text in nodeValue
            if (child.nodeType == 3)
            {
                Console.WriteLine(child.nodeValue);
            }
        }
    }
}

Since innerText always includes the text of child elements, I guess there is no direct way.
Maybe solve the issue on the server side by generating the content the following way:
<div id='x'><div id='y'>Y content</div>X content</div>
<div id='x-plain'>_plain X content_</div>
"Plain X content" stands for the content your C# code generates for the element.
You can then access it with document.getElementById('x-plain').innerHTML.

Related

How to collect a Value out of multiple div elements in a list with Selenium in C#?

I have a list on a website that stores the part number and the order number.
The list contains several div elements, and I would like to export every part number in it.
The list looks like this:
<div class="spareValue">
<span class="label">OrderNumber:</span>
<span class="PartNumber">180011</span>
</div>
<div class="spareValue">
<span class="label">SparePartNumber:</span>
<span class="PartNumber">01002523</span>
</div>
How can I export every OrderNumber and put them into a list in C# so that I can work with the values?
There are a lot of ways to do that:
var spans = driver.FindElements(By.CssSelector("div.spareValue span"));
var nbrspans = spans.Count;
var Listordernumber = new List<string>();
// Spans come in (label, value) pairs, so step through them two at a time.
for (int i = 0; i < nbrspans; i += 2)
{
    if (spans[i].GetAttribute("textContent") != "OrderNumber:") continue;
    Listordernumber.Add(spans[i + 1].GetAttribute("textContent"));
}
Listordernumber then contains the result.
If you prefer LINQ, you could use this:
string path = "//div[@class='spareValue' and ./span[text()='OrderNumber:']]/span[@class='PartNumber']";
var Listordernumber = driver.FindElements(By.XPath(path)).Select(s => s.GetAttribute("textContent")).ToList();
OK, you want every part number that belongs to an order number. That leads me to this XPath: find the div that has an order number in it, then find the part number element inside it.
Then find them all (or none).
Finally, put them all in a neat list, selecting the part number text:
string path = "//div[@class='spareValue' and ./span[text()='OrderNumber:']]/span[@class='PartNumber']";
var elements = driver.FindElements(By.XPath(path));
var listOrderNumber = elements.Select(e => e.Text).ToList();
var els = driver.FindElements(By.XPath("//*[text()='OrderNumber:']"));
foreach (var el in els)
{
    var part = el.FindElement(By.XPath("./../span[@class='PartNumber']"));
    Console.WriteLine("OrderNumber: " + part.Text);
}
First, find all elements whose text is "OrderNumber:", which leaves out all the others.
Then iterate through those elements, go to each one's parent node, and find the element inside the parent whose class name is "PartNumber".

Looping through elements in a div in CsQuery

I'm trying to open an HTML file, loop through the divs that match a certain criteria, and then loop through the p tags that match a certain criteria within those divs.
CQ dom = CQ.CreateFromFile("page.html");
CQ document_divs = dom["div"];
document_divs.Each((i, document_div) =>
{
    string divid = document_div.Id;
    if (divid.Contains("page"))
    {
        CQ page_ptags = document_div["p"];
        page_ptags.Each((j, page_ptag) =>
        {
            lblOutput.Text = page_ptag.Id;
        });
    }
});
It is selecting the divs fine, but I'm not sure how to select the p tags within a div. I know there is something wrong with this line:
CQ page_ptags = document_div["p"];
But what should I change?
Try this:
CQ page_ptags = document_div.Cq().Find("p");
When you loop through a CQ object, each element is of type IDomObject.
That's why you need to either wrap it in a CQ object or use the native DOM functions to work with it.

Get HtmlAgilityPack Node using exact HTML search or Converting HTMLElement to HTMLNode

I have created a HTMLElement picker (DOM) by using the default .net WebBrowser.
The user can pick (select) a HTMLElement by clicking on it.
I want to get the HtmlAgilityPack.HTMLNode corresponding to the HTMLElement.
The easiest way (in my mind) is to use doc.DocumentNode.SelectSingleNode(EXACTHTMLTEXT), but it does not really work (because the function only accepts XPath expressions).
How can I do this?
A sample HTMLElement selected by a user looks like this (the OuterHtml code):
<a onmousedown="return wow" class="l" href="http://site.com"><em>Great!!!</em> <b>come and see more</b></a>
Of course, any element can be selected, that's why I need a way to get the HTMLNode.
Same concept, but a bit simpler because you don't have to know the element type:
HtmlNode n = doc.DocumentNode.Descendants().FirstOrDefault(x => x.OuterHtml.Equals(text, StringComparison.InvariantCultureIgnoreCase));
I came up with a solution. I don't know if it's the best (if somebody knows a better way to achieve this, I would appreciate hearing about it).
Here is the method that will get the HtmlNode:
public HtmlNode GetNode(string text)
{
    if (text.StartsWith("<"))
    {
        // Get the type of the element (a, p, div, etc.): the tag name ends at the first space or '>'
        string type = "";
        for (int i = 1; i < text.Length; i++)
        {
            if (text[i] == ' ' || text[i] == '>')
            {
                type = text.Substring(1, i - 1);
                break;
            }
        }
        try
        {
            // Look for a node of that type whose OuterHtml equals the HtmlElement's OuterHtml.
            // If such a node exists, that's the node we want to use.
            HtmlNode n = doc.DocumentNode.SelectNodes("//" + type).Where(x => x.OuterHtml == text).First();
            return n;
        }
        catch (Exception)
        {
            throw new Exception("Cannot find the HTML element in the HTML Page");
        }
    }
    else
    {
        throw new Exception("Invalid HTML Element supplied. The selected HTML element must start with <");
    }
}
The idea is that you pass the OuterHtml of the HtmlElement. Example:
HtmlElement el=....
HtmlNode N = GetNode(el.OuterHtml);

replacing value only from an Xelement using linq

EDIT: A bit more detailed HTML document... In short: how do I actually do the lookup, and where precisely should element.SetValue or element.Value appear in the query?
Edit 2: The list of monkey ids was not clear, so I have added proper ids and additional properties to my lookup data object; sorry for the confusion! I used a list because my data source could come from anywhere, and because I do not really know the proper usage of Dictionary (I am a newbie to coding, which is why my question is all over the place; please bear with me).
I have an XElement which is a properly formatted HTML document. I am trying to replace only the value of an HTML element with a value contained in a List object, for example:
<div id="pageContainer">
<p> some guy wants to <b>buy</b> a <h4><label id="monkey23">monkeyfield</label></h4> for some price that I do not have a clue about, maybe we should <i>suggest</i> a list of other monkeys he may like:
</p>
<h3>list of special monekeys you may want chappy...</h3>
<br />
<ul>
<li><label id="monkey13">monkeyfield</label></li>
<li><label id="monkey3">monkeyfield</label></li>
<li><label id="animal4">animalfield</label></li>
<li><label id="seacreature5">seacreaturefield</label></li>
<li><label id="mamal1">mamal field</label></li>
</ul>
</div>
Note: the value "monkeyfield" is a temporary placeholder shown on screen to identify the field; once the values from the data source are bound, the new values should appear.
public class LookupData
{
    public string id { get; set; }
    public string value { get; set; }
    public string Type { get; set; }
    public string Url { get; set; }
}
...
public void DataTransformerMethod()
{
    var data = new List<LookupData>();
    data.Add(new LookupData { id = "monkey3", value = "special monkey from africa" });
    data.Add(new LookupData { id = "monkey13", value = "old monkey from china" });
    data.Add(new LookupData { id = "seacreature5", value = "sea monkey" });
    data.Add(new LookupData { id = "animal4", value = "rhino" });
    data.Add(new LookupData { id = "mamal1", value = "some mamal creature" });
    //what linq query will iterate over the document and set the values from the values
    //found in the list?
    var answer = from x in HtmlDocAsAXelement
                 where x.Attributes()
                        .Any(a=> data.AsEnumerable().Where(f=> f.Name == a.Name) );
    //somehow I should use .SetValue(a.value)???
    SaveTheNewXElement(answer ); //all other original data must stay in tact...
}
Well, you need to iterate over all the XElements which need changing - and set their value by just calling the Value setter:
element.Value = "newvalue";
It would be trickier if the element had multiple text nodes and you only wanted to change one of them, but as there's no other content within the element, this should be fine for you.
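For the trickier case mentioned above, here is a minimal sketch, assuming the document is loaded as an XElement. It reads and changes only the text nodes that are direct children of an element and leaves nested elements alone. The names DirectText and GetDirectText are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DirectText
{
    // Concatenates only the text nodes that are direct children of the element.
    // Text inside nested elements is not included.
    public static string GetDirectText(XElement element)
    {
        return string.Concat(element.Nodes().OfType<XText>().Select(t => t.Value));
    }

    public static void Main()
    {
        XElement x = XElement.Parse(
            "<div id='x'><div id='y'>Y content</div>X content</div>");

        Console.WriteLine(GetDirectText(x)); // prints "X content"

        // Change only the direct text node; the nested div keeps its content.
        XText direct = x.Nodes().OfType<XText>().First();
        direct.Value = "new X content";
        Console.WriteLine(x.ToString(SaveOptions.DisableFormatting));
    }
}
```

Setting XElement.Value, by contrast, replaces all child nodes of the element, nested elements included, which is why the text nodes are addressed individually here.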
EDIT: After the discussion, I would do something like this:
Dictionary<string, string> replacements = data.ToDictionary(x => x.id,
                                                            x => x.value);
foreach (XElement element in HtmlDocAsAXelement.Descendants())
{
    string newValue;
    string id = (string) element.Attribute("id");
    if (id != null && replacements.TryGetValue(id, out newValue))
    {
        element.Value = newValue;
    }
}
You can't do that "easily", because there is no foreach equivalent in LINQ. That's on purpose. See http://blogs.msdn.com/b/ericlippert/archive/2009/05/18/foreach-vs-foreach.aspx and LINQ equivalent of foreach for IEnumerable<T>.
I would suggest you just do a normal foreach over the query results.
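The dictionary-plus-foreach approach from the answers above can be tried end to end with a small self-contained program. This is only a sketch: the class LabelFiller, the method Fill, and the shortened sample markup are assumptions made for illustration, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LabelFiller
{
    // Sets the value of every descendant element whose id attribute has an
    // entry in replacements, and returns how many elements were changed.
    public static int Fill(XElement root, IDictionary<string, string> replacements)
    {
        int changed = 0;
        // ToList() takes a snapshot so the tree can be modified inside the loop.
        foreach (XElement element in root.Descendants().ToList())
        {
            string id = (string)element.Attribute("id");
            if (id != null && replacements.TryGetValue(id, out string newValue))
            {
                element.Value = newValue;
                changed++;
            }
        }
        return changed;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<ul>" +
            "<li><label id=\"monkey3\">monkeyfield</label></li>" +
            "<li><label id=\"animal4\">animalfield</label></li>" +
            "<li><label id=\"other\">left as is</label></li>" +
            "</ul>");

        var replacements = new Dictionary<string, string>
        {
            { "monkey3", "special monkey from africa" },
            { "animal4", "rhino" }
        };

        int changed = Fill(root, replacements);
        Console.WriteLine(changed + " element(s) updated");
        Console.WriteLine(root);
    }
}
```

Elements without an id, or with an id that is missing from the dictionary, are left untouched, so the rest of the document stays intact.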

Find specific data in html with HtmlElement(Collection) and webbrowser

I want to find a div with the class name XYZ then in it I want to loop through a bunch of elements named ABC. Then grab the links (a href) inside and possibly other information.
How do I find the div with XYZ from webBrowser1.Document.Links and any subitems I want?
First, you said you want to find a div with the class name XYZ, so why are you looking in webBrowser1.Document.Links? Find the div first, then get to the links within it.
HtmlDocument doc = webBrowser.Document;
HtmlElementCollection col = doc.GetElementsByTagName("div");
foreach (HtmlElement element in col)
{
    string cls = element.GetAttribute("className");
    if (String.IsNullOrEmpty(cls) || !cls.Equals("XYZ"))
        continue;
    HtmlElementCollection childDivs = element.Children.GetElementsByName("ABC");
    foreach (HtmlElement childElement in childDivs)
    {
        //grab links and other stuff same way
    }
}
Also note the use of "className" instead of "class": it returns the element's CSS class, whereas "class" returns an empty string. This is documented on MSDN for SetAttribute but not for GetAttribute, which causes a little confusion.
