I'm trying to generate CSS selectors for arbitrary elements on a webpage using C#. Some background:
I use a form with a WebBrowser control. While navigating, one can ask for the CSS selector of the element under the cursor. Getting the HtmlElement is trivial, of course:
WebBrowser.Document.GetElementFromPoint(<Point>);
The ambition is to create a 'strict' CSS selector leading up to the element under the cursor, à la:
html > body > span:eq(2) > li:eq(5) > div > div:eq(3) > span > a
This selector is based on :eq operators because it's meant to be handled by jQuery and/or Sizzle (these two support :eq; standard CSS selectors don't. Thumbs up to @BoltClock for helping me clarify this). So, you get the picture. To achieve this, we pass the retrieved HtmlElement to the method below, which ascends the DOM tree by asking each element it comes across for its Parent:
private static List<String> GetStrictCssForHtmlElement(HtmlElement element)
{
    List<String> familyTree;
    for (familyTree = new List<String>(); element != null; element = element.Parent)
    {
        string ordinalString = CalculateOrdinalPositionAmongSameTagSiblings(element);
        if (ordinalString == null) return null;
        familyTree.Add(element.TagName.ToLower() + ordinalString);
    }
    familyTree.Reverse();
    return familyTree;
}
private static string CalculateOrdinalPositionAmongSameTagSiblings(HtmlElement element, bool simplifyEq0 = true)
{
    int count = 0;
    int positionAmongSameTagSiblings = -1;
    if (element.Parent != null)
    {
        foreach (HtmlElement child in element.Parent.Children)
        {
            if (element.TagName.ToLower() == child.TagName.ToLower())
            {
                count++;
                if (element == child)
                {
                    positionAmongSameTagSiblings = count - 1;
                }
            }
        }
        if (positionAmongSameTagSiblings == -1) return null; // Couldn't find the child among its parent's children!?
    }
    return (count > 1) ? (":eq(" + positionAmongSameTagSiblings + ")") : (simplifyEq0 ? "" : ":eq(0)");
}
This method has worked reliably for a variety of pages. However, there's one particular page that is doing my head in:
http://www.delicious.com/recent
Trying to retrieve the CSS selector of any element in the list (at the center of the page) fails for one very simple reason:
After the ascent hits the first SPAN element on its way up (you can spot it by inspecting the page with IE9's web developer tools), it tries to process it by calculating its ordinal position among its same-tag siblings. To do that, we need to ask its Parent node for the siblings. This is where things get weird. The SPAN element reports that its Parent is a DIV element with id="recent-index". However, that's not the immediate parent of the SPAN (the immediate parent is LI class="wrap isAdv"). This causes the method to fail because, unsurprisingly, it can't find the SPAN among the DIV's children.
But it gets even weirder. I retrieved and isolated the HtmlElement of the SPAN itself. Then I got its Parent and used it to descend back down to the SPAN element using:
HtmlElement regetSpanElement = spanElement.Parent.Children[0].Children[1].Children[1].Children[0].Children[2].Children[0];
This led us back to the SPAN node we began with ... with one twist, however:
regetSpanElement.Parent.TagName;
This now reports LI as the parent. How can this be? Any insight?
Thank you again in advance.
Notes:
I saved the HTML code (as it's presented in WebBrowser.Document.Html) and inspected it myself to be 100% sure that nothing funny is going on (i.e. different markup being served to the WebBrowser control than the one I see in IE9). That's not happening: the structure matches 100% for the path concerned.
I am running WebBrowser control in IE9-mode using the instructions outlined here:
http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
I'm trying to get the WebBrowser control and IE9 to behave as similarly as possible.
I suspect that the effects observed might be due to some script running behind my back. However, my knowledge of web programming isn't deep enough to pin it down.
Edit: Typos
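For reference, the sibling-counting walk in GetStrictCssForHtmlElement can be exercised outside the WebBrowser control. This is a minimal sketch, not the original code: it assumes a small, made-up XHTML snippet and uses LINQ to XML's XElement.Parent and Elements() in place of HtmlElement.Parent and Children. Because XElement's parent/child links are always consistent, the "index < 0" branch (the failure described above) never fires here; it just makes that check explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for a page: a small XHTML tree parsed with LINQ to XML.
var page = XElement.Parse(
    "<html><body><div/><span/><span><li/><li><a/></li></span></body></html>");

// Same walk as GetStrictCssForHtmlElement: climb via Parent and record each
// element's zero-based index among siblings that share its tag name.
string StrictSelector(XElement element)
{
    var parts = new List<string>();
    for (var e = element; e != null; e = e.Parent)
    {
        string tag = e.Name.LocalName.ToLowerInvariant();
        string suffix = "";
        if (e.Parent != null)
        {
            List<XElement> sameTag = e.Parent.Elements()
                .Where(c => c.Name.LocalName.ToLowerInvariant() == tag)
                .ToList();
            int index = sameTag.IndexOf(e);
            if (index < 0) return null;  // the parent does not list this child
            if (sameTag.Count > 1) suffix = ":eq(" + index + ")";
        }
        parts.Add(tag + suffix);
    }
    parts.Reverse();
    return string.Join(" > ", parts);
}

XElement anchor = page.Descendants("a").First();
Console.WriteLine(StrictSelector(anchor)); // html > body > span:eq(1) > li:eq(1) > a
```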
Relying on :eq() is tough! It is difficult to reliably re-select elements out of a DOM that is dynamic. Sure, it may work on very static pages, but pages are only getting more dynamic every day. You might consider changing strategy a little: try a smarter, more flexible selector. Perhaps pop in some JavaScript like so:
predictCss = function(s, noid, noclass, noarrow) {
    var path, node = s;
    var psep = noarrow ? ' ' : ' > ';
    if (s.length != 1) return path; // undefined: requires exactly one element
    while (node.length) {
        var realNode = node[0];
        var name = (realNode.localName || realNode.tagName || realNode.nodeName);
        if (!name || name == '#document') break;
        name = name.toLowerCase();
        if (node.parent().children(name).length > 1) {
            if (realNode.id && !noid) {
                try {
                    var idtest = $(name + '#' + realNode.id);
                    if (idtest.length == 1) return name + '#' + realNode.id + (path ? psep + path : '');
                } catch (ex) {} // just ignore the exception, it was a bad ID
            } else if (realNode.className && !noclass) {
                // trim() avoids empty class segments from leading/trailing spaces
                name += '.' + realNode.className.trim().split(/\s+/).join('.');
            }
        }
        var parent = node.parent();
        if (name[name.length - 1] == '.') {
            name = name.substring(0, name.length - 1);
        }
        var siblings = parent.children(name);
        //// If you really want to use eq:
        //if (siblings.length > 1) name += ':eq(' + siblings.index(node) + ')';
        path = name + (path ? psep + path : '');
        node = parent;
    }
    return path;
};
And use it to generate a variety of selectors:
var elem = $('#someelement');
var epath = predictCss(elem, true, true, false);
var epathclass = predictCss(elem, true, false, false);
var epathclassid = predictCss(elem, false, false, false);
Then use each:
var relem = $(epathclassid);
if (relem.length === 0) {
    relem = $(epathclass);
    if (relem.length === 0) {
        relem = $(epath);
    }
}
And if your best selector still matches more than one element, you'll have to get creative in how you match a DOM element: perhaps Levenshtein distance, perhaps some specific text, or you can fall back to :eq. Hope that helps!
Btw, I assumed you have jQuery, given the Sizzle reference. You could inject the above as a self-executing anonymous function in a script tag appended as the last child of body, for example.
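The Levenshtein idea mentioned above could serve as a tie-breaker on the C# side. This is a minimal sketch of the standard dynamic-programming edit distance (keeping two rolling rows); the candidate texts and the "recorded" text are hypothetical, standing in for the text of each element a selector matched and the text saved when the selector was first generated.

```csharp
using System;

// Standard dynamic-programming edit distance, keeping only two rows:
// the minimum number of single-character insertions, deletions and
// substitutions needed to turn a into b.
int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> first j chars of b: j insertions

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i; // first i chars of a -> "": i deletions
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,   // insertion
                                        prev[j] + 1),      // deletion
                               prev[j - 1] + cost);        // substitution or match
        }
        (prev, curr) = (curr, prev); // the finished row becomes "previous"
    }
    return prev[b.Length];
}

// Hypothetical tie-breaker: pick the matched element whose text is closest
// to the text recorded when the selector was generated.
string recorded = "Sign up now";
string[] candidates = { "Sign in", "Sign up now!", "Log out" };
string best = candidates[0];
foreach (string candidate in candidates)
{
    if (Levenshtein(candidate, recorded) < Levenshtein(best, recorded))
        best = candidate;
}
Console.WriteLine(best); // Sign up now!
```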
I'm running into an issue with a function I wrote for a Selenium test case. When I run jQuery on a web element ID (#AmountToggle), it displays all the attributes. I want to verify this one ("lastChild"):
but when I run this code:
driver.FindElement(By.CssSelector("#AmountToggle")).GetAttribute("lastChild")
it's returning null?!
Why is this and how can I get the correct value of this attribute?
So it looks like "lastChild" is a property, not an attribute, although I can't say for sure from what you have above.
The difference is that an attribute appears in the HTML markup itself, for example:
<a href="..." id="...">Link</a>
Here href and id are attributes. If lastChild doesn't appear in the HTML like that, it isn't an attribute.
First try comparing these two in the javascript console:
$("#AmountToggle").attr("lastChild")
$("#AmountToggle").prop("lastChild")
The following is a workaround when you have issues with Selenium finding things. This logic will allow you to find things inside of iframes with ease, and also will allow you to use pseudo-selectors to find elements:
public static string GetFullyQualifiedXPathToElement(string cssSelector, bool isFullJQuery = false, bool noWarn = false)
{
    if (cssSelector.Contains("$(") && !isFullJQuery) {
        isFullJQuery = true;
    }
    string finder_method = @"
        function getPathTo(element) {
            if(typeof element == 'undefined') return '';
            if (element.tagName == 'HTML')
                return '/HTML[1]';
            if (element===document.body)
                return '/HTML[1]/BODY[1]';
            var ix= 0;
            var siblings = element.parentNode.childNodes;
            for (var i= 0; i< siblings.length; i++) {
                var sibling= siblings[i];
                if (sibling===element)
                    return getPathTo(element.parentNode)+'/'+element.tagName+'['+(ix+1)+']';
                if (sibling.nodeType===1 && sibling.tagName===element.tagName)
                    ix++;
            }
        }
    ";
    if(isFullJQuery) {
        cssSelector = cssSelector.TrimEnd(';');
    }
    string executable = isFullJQuery ? string.Format("{0} return getPathTo({1}[0]);", finder_method, cssSelector) : string.Format("{0} return getPathTo($('{1}')[0]);", finder_method, cssSelector.Replace("'", "\""));
    string xpath = string.Empty;
    try {
        xpath = BaseTest.Driver.ExecuteJavaScript<string>(executable);
    } catch (Exception e) {
        if (!noWarn) {
            //Warn about failure with custom message.
        }
    }
    if (!noWarn && string.IsNullOrEmpty(xpath)) {
        //Warn about failure with custom message.
        //string.Format("Supplied cssSelector did not point to an element. Selector is \"{0}\".", cssSelector);
    }
    return xpath;
}
This method uses jQuery, which supports more extensive CSS selector options (such as pseudo-selectors) and reliably finds elements given a good search query. It uses jQuery to find the element, then generates an explicit XPath to that element in the DOM and returns it. With the explicit XPath, you can then tell Selenium to find the element using By.XPath.
It looks like the value of lastChild is an element itself. If that is true, here is how you might use this in your example:
driver.FindElement(By.XPath(GetFullyQualifiedXPathToElement("$($('#AmountToggle').prop('lastChild'))", true)));
Note three things here. First, I used prop in jQuery; change that to attr if that turns out to be the correct call. Second, the method itself appends [0] to the expression, which turns a jQuery object into a regular JavaScript DOM element (what getPathTo expects). Since prop('lastChild') already returns a DOM node, it is wrapped in $() so that [0] gives that node back. Finally, note the cssSelector value passed in: you can pass just a selector, such as "#SomeElementId > div", or full jQuery, such as "$('#SomeElementId > div')".
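The getPathTo function builds an indexed XPath by counting preceding same-tag siblings. Here is a sketch of the same idea in C#, assuming LINQ to XML over a made-up XHTML snippet rather than a live browser DOM, with a check that the generated path selects the original element again via XPathSelectElement (System.Xml.XPath). ElementsBeforeSelf counts only element siblings, which is what the nodeType===1 test in getPathTo achieves.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Mirrors getPathTo: one /tag[n] step per ancestor, where n is the 1-based
// position among preceding siblings with the same tag name.
string GetPathTo(XElement element)
{
    int position = element.ElementsBeforeSelf(element.Name).Count() + 1;
    string step = "/" + element.Name.LocalName + "[" + position + "]";
    return element.Parent == null ? step : GetPathTo(element.Parent) + step;
}

// Hypothetical markup standing in for the page under test.
var doc = XDocument.Parse(
    "<html><body><div id='AmountToggle'><span>a</span><span>b</span></div></body></html>");
XElement target = doc.Descendants("span").Last();

string xpath = GetPathTo(target);
Console.WriteLine(xpath); // /html[1]/body[1]/div[1]/span[2]

// The explicit path should select the very same element again.
Console.WriteLine(ReferenceEquals(doc.XPathSelectElement(xpath), target)); // True
```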
I am able to retrieve data using C# & XPath and display it in a list, but I would like to know how to perform two unique actions.
To start with, my code example looks like this:
protected async override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    string htmlPagePurchase = "";
    using (var client = new HttpClient())
    {
        htmlPagePurchase = await client.GetStringAsync(MyURI);
    }
    HtmlDocument htmlDocumentPurchase = new HtmlDocument();
    htmlDocumentPurchase.LoadHtml(htmlPagePurchase);
    foreach (var div in htmlDocumentPurchase.DocumentNode.SelectNodes("//div[contains(@id, 'odyContent')]"))
    {
        PurchaseDetails newPurchase = new PurchaseDetails();
        newPurchase.Expiry = div.SelectSingleNode(".//ex1").InnerText.Trim();
        Purchase.Add(newPurchase);
    }
    lstPurchase.ItemsSource = Purchase;
}
Firstly, if there is no "ex1" node within the page, can I have a null value returned, or have the node ignored? I need this because some of the pages I use contain the data I want in an alternative node (I can't control this), and I don't want the app to crash if one of the nodes isn't contained in that page.
Secondly, if the node contains no text, can I force an output? For example, within a list of "ex1" nodes, some contain an expiry date, but one "ex1" node has no date because it hasn't expired yet. When that happens, can I return my own value, such as 'hasn't expired'?
This is being compiled in a Windows Phone 8.0 Silverlight App.
This code should work by checking the node and value, and using your defaultValue if no real value is found.
var node = xmlDoc.SelectSingleNode(".//ex1");
return (node == null || string.IsNullOrEmpty((node.InnerText ?? "").Trim()) ? defaultValue : node.InnerText.Trim());
.NET Fiddle: https://dotnetfiddle.net/3DAjKH
UPDATE FOR INTEGRATING WITH PROVIDED CODE SAMPLE
This should work within your loop.
var exNode = div.SelectSingleNode(".//ex1");
if (exNode == null || string.IsNullOrEmpty((exNode.InnerText ?? "").Trim()))
    newPurchase.Expiry = "N/A"; // Default value
else
    newPurchase.Expiry = exNode.InnerText.Trim(); // reuse the node found above
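The same check can be wrapped in a reusable helper. HtmlAgilityPack's HtmlNode also has SelectSingleNode and InnerText, but to keep this sketch standalone it uses System.Xml's XmlDocument instead; the helper name TextOrDefault and the sample markup are hypothetical. Passing "hasn't expired" as the fallback covers the second part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: trimmed text of the first node matching xpath under
// context, or fallback when that node is missing or holds only whitespace.
string TextOrDefault(XmlNode context, string xpath, string fallback)
{
    XmlNode node = context.SelectSingleNode(xpath);
    string text = node?.InnerText?.Trim();
    return string.IsNullOrEmpty(text) ? fallback : text;
}

// Made-up markup: one dated ex1, one empty ex1, and one div with no ex1 at all.
var doc = new XmlDocument();
doc.LoadXml(
    "<page>" +
    "<div id='bodyContent1'><ex1>2014-05-01</ex1></div>" +
    "<div id='bodyContent2'><ex1>  </ex1></div>" +
    "<div id='bodyContent3'><other>x</other></div>" +
    "</page>");

var expiries = new List<string>();
foreach (XmlNode div in doc.SelectNodes("//div[contains(@id, 'odyContent')]"))
{
    expiries.Add(TextOrDefault(div, ".//ex1", "hasn't expired"));
}
Console.WriteLine(string.Join(", ", expiries));
// 2014-05-01, hasn't expired, hasn't expired
```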
I am using LINQ to XML in my project. I am dealing with very large XML documents; for ease of understanding, I have included a small sample.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <StackOverflowReply xmlns="http://xml.stack.com/RRAND01234">
      <processStatus>
        <statusCode1>P</statusCode1>
        <statusCode2>P</statusCode2>
        <statusCode3>P</statusCode3>
        <statusCode4>P</statusCode4>
      </processStatus>
    </StackOverflowReply>
  </soap:Body>
</soap:Envelope>
Here is my C# LINQ to XML code:
XNamespace x = "http://xml.stack.com/RRAND01234";
var result = from StackOverflowReply in XDocument.Parse(Myxml).Descendants(x + "StackOverflowReply")
             select new
             {
                 status1 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode1").Value,
                 status2 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode2").Value,
                 status3 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode3").Value,
                 status4 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode4").Value,
                 status5 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode5").Value,
             };
Here I get the exception "Object reference not set to an instance of an object." because the tag <statusCode5> is not in my XML. In this case I want a more detailed exception message, such as "Missing tag statusCode5". How can I get this message from my exception?
There's no easy way (that I'm aware of) to find out exactly what element(s) was/were missing in a LINQ to XML statement. What you can do however is use (string) on the element to handle missing elements - but that can get tricky if you have a chain of elements.
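Applied to your current code, it would look like this:
status5 = (string)StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode5")
The cast only guards the last call in the chain. Because the missing element here is the last one (statusCode5), you get null instead of an exception; but if an intermediate element such as processStatus were missing, the second .Element() call would still throw.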
You could change your LINQ to focus only on the subnodes, like this:
XNamespace x = "http://xml.stack.com/RRAND01234";
var result = from StackOverflowReply in XDocument.Parse(Myxml).Descendants(x + "processStatus")
             select new
             {
                 status1 = (string)StackOverflowReply.Element(x + "statusCode1"),
                 status2 = (string)StackOverflowReply.Element(x + "statusCode2"),
                 status3 = (string)StackOverflowReply.Element(x + "statusCode3"),
                 status4 = (string)StackOverflowReply.Element(x + "statusCode4"),
                 status5 = (string)StackOverflowReply.Element(x + "statusCode5"),
             };
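If you do want the exception message to name the missing tag, as the question asks, one option is a small helper that checks for null before reading Value. This is a sketch with a hypothetical helper name (RequiredValue) and a trimmed-down version of the sample XML. LINQ queries run lazily, so the exception surfaces when the query is enumerated (here by ToList), not where it is declared.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: like parent.Element(name).Value, but when the child is
// missing it throws an exception whose message names the missing tag.
string RequiredValue(XElement parent, XName name)
{
    XElement child = parent.Element(name);
    if (child == null)
        throw new InvalidOperationException("Missing tag " + name.LocalName);
    return child.Value;
}

XNamespace x = "http://xml.stack.com/RRAND01234";
var doc = XDocument.Parse(
    "<StackOverflowReply xmlns='http://xml.stack.com/RRAND01234'>" +
    "<processStatus><statusCode1>P</statusCode1></processStatus>" +
    "</StackOverflowReply>");

string message = "";
try
{
    var result = (from status in doc.Descendants(x + "processStatus")
                  select new
                  {
                      status1 = RequiredValue(status, x + "statusCode1"),
                      status5 = RequiredValue(status, x + "statusCode5"),
                  }).ToList(); // the query runs (and throws) here
}
catch (InvalidOperationException ex)
{
    message = ex.Message;
}
Console.WriteLine(message); // Missing tag statusCode5
```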
However, if your XML is complex and you have different depths (nested elements), you'll need a more robust solution to avoid a bunch of conditional operator checks or multiple queries.
I have something that might help if that is the case - I'll have to dig it up.
EDIT For More Complex XML
I've had similar challenges with some XML I have to deal with at work. In lieu of an easy way to determine what node was the offending node, and not wanting to have hideously long ternary operators, I wrote an extension method that worked recursively from the specified starting node down to the one I was looking for.
Here's a somewhat simple and contrived example to demonstrate.
<SomeXML>
  <Tag1>
    <Tag1Child1>Value1</Tag1Child1>
    <Tag1Child2>Value2</Tag1Child2>
    <Tag1Child3>Value3</Tag1Child3>
    <Tag1Child4>Value4</Tag1Child4>
  </Tag1>
  <Tag2>
    <Tag2Child1>
      <Tag2Child1Child1>SomeValue1</Tag2Child1Child1>
      <Tag2Child1Child2>SomeValue2</Tag2Child1Child2>
      <Tag2Child1Child3>SomeValue3</Tag2Child1Child3>
      <Tag2Child1Child4>SomeValue4</Tag2Child1Child4>
    </Tag2Child1>
    <Tag2Child2>
      <Tag2Child2Child1>
        <Tag2Child2Child1Child1 />
        <Tag2Child2Child1Child2 />
      </Tag2Child2Child1>
    </Tag2Child2>
  </Tag2>
</SomeXML>
In the above XML, I had no way of knowing (prior to parsing) whether any of the child elements were missing, so after some searching and fiddling I came up with the following extension method (it needs to live in a static class):
public static XElement GetChildFromPath(this XElement currentElement, List<string> elementNames, int position = 0)
{
    // Stop as soon as a step along the path is missing.
    if (currentElement == null)
    {
        return null;
    }
    if (position == elementNames.Count - 1)
    {
        return currentElement.Element(elementNames[position]);
    }
    else
    {
        XElement nextElement = currentElement.Element(elementNames[position]);
        return GetChildFromPath(nextElement, elementNames, position + 1);
    }
}
Basically, the method takes the XElement it's called on, plus a List<string> of the element names in path order (with the one I want as the last one) and a position (an index into the list), and then works its way down the path until it finds the element in question or runs out of elements in the path. It's not as elegant as I would like it to be, but I haven't had time to refactor it.
I would use it like this (based on the sample XML above):
MyClass myObj = (from x in XDocument.Parse(myXML).Descendants("SomeXML")
select new MyClass() {
Tag1Child1 = (string)x.GetChildFromPath(new List<string>() {
"Tag1", "Tag1Child1" }),
Tag2Child1Child4 = (string)x.GetChildFromPath(new List<string>() {
"Tag2", "Tag2Child1", "Tag2Child1Child4" }),
Tag2Child2Child1Child2 = (string)x.GetChildFromPath(new List<string>() {
"Tag2", "Tag2Child2", "Tag2Child2Child1",
"Tag2Child2Child1Child2" })
}).SingleOrDefault();
Not as elegant as I'd like it to be, but at least it allows me to parse an XML document that may have missing nodes without blowing chunks. Another option was to do something like:
Tag2Child2Child1Child2 = x.Element("Tag2") == null ?
    "" : x.Element("Tag2").Element("Tag2Child2") == null ?
    "" : x.Element("Tag2").Element("Tag2Child2").Element("Tag2Child2Child1") == null ?
    "" : x.Element("Tag2").Element("Tag2Child2").Element("Tag2Child2Child1")
        .Element("Tag2Child2Child1Child2") == null ?
    "" : x.Element("Tag2")
        .Element("Tag2Child2")
        .Element("Tag2Child2Child1")
        .Element("Tag2Child2Child1Child2").Value
That would get really ugly for an object that had dozens of properties.
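Another way to avoid both the recursion and the ternary chain is LINQ to XML's Elements extension method on IEnumerable<XElement>: each step returns an empty sequence rather than null when a tag is missing, so the chain never throws and FirstOrDefault gives null at the end. A sketch against a cut-down version of the sample XML above:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(
    "<SomeXML>" +
    "<Tag1><Tag1Child1>Value1</Tag1Child1></Tag1>" +
    "<Tag2><Tag2Child1><Tag2Child1Child4>SomeValue4</Tag2Child1Child4></Tag2Child1></Tag2>" +
    "</SomeXML>");

// Elements(name) also exists as an extension on IEnumerable<XElement>, so
// each step yields an empty sequence instead of null when a tag is missing.
string present = (string)root.Elements("Tag2").Elements("Tag2Child1")
                             .Elements("Tag2Child1Child4").FirstOrDefault();
string missing = (string)root.Elements("Tag2").Elements("Tag2Child2")
                             .Elements("Tag2Child2Child1").FirstOrDefault();

Console.WriteLine(present);             // SomeValue4
Console.WriteLine(missing ?? "(null)"); // (null)
```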
Anyway, if this is of use to you feel free to use/adapt/modify as you need.
I have created an HtmlElement picker (DOM) using the default .NET WebBrowser.
The user can pick (select) an HtmlElement by clicking on it.
I want to get the HtmlAgilityPack.HtmlNode corresponding to the HtmlElement.
The easiest way (in my mind) would be doc.DocumentNode.SelectSingleNode(EXACTHTMLTEXT), but that doesn't work, because the method only accepts an XPath expression.
How can I do this?
A sample HtmlElement selected by a user looks like this (its OuterHtml):
<a onmousedown="return wow" class="l" href="http://site.com"><em>Great!!!</em> <b>come and see more</b></a>
Of course, any element can be selected; that's why I need a general way to get the HtmlNode.
Same concept, but a bit simpler because you don't have to know the element type:
HtmlNode node = doc.DocumentNode.Descendants().FirstOrDefault(n => n.OuterHtml.Equals(text, StringComparison.InvariantCultureIgnoreCase));
I came up with a solution. I don't know if it's the best one (if somebody knows a better way to achieve this, I would appreciate hearing it).
Here is the method that gets the HtmlNode:
public HtmlNode GetNode(string text)
{
    if (text.StartsWith("<")) // get the type of the element (a, p, div etc.)
    {
        string type = "";
        for (int i = 1; i < text.Length; i++)
        {
            // The tag name ends at the first space or '>' ("<em>" has no attributes).
            if (text[i] == ' ' || text[i] == '>')
            {
                type = text.Substring(1, i - 1);
                break;
            }
        }
        // Check whether any node of the same type has an OuterHtml equal to the
        // HtmlElement's OuterHtml. If such a node exists, that's the node we want.
        HtmlNodeCollection candidates = doc.DocumentNode.SelectNodes("//" + type); // null when nothing matches
        HtmlNode n = candidates == null ? null : candidates.FirstOrDefault(x => x.OuterHtml == text);
        if (n == null)
        {
            throw new Exception("Cannot find the HTML element in the HTML Page");
        }
        return n;
    }
    else
    {
        throw new Exception("Invalid HTML Element supplied. The selected HTML element must start with <");
    }
}
The idea is that you pass the OuterHtml of the HtmlElement. Example:
HtmlElement el=....
HtmlNode N = GetNode(el.OuterHtml);
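The character loop in GetNode only pulls the tag name out by scanning for a delimiter; a regular expression is a compact alternative. This is a sketch with a hypothetical helper name (TagNameOf). It looks only at the opening tag and lower-cases the result, on the assumption that HtmlAgilityPack stores element names in lower case (worth verifying for your version), so "//" + TagNameOf(...) can feed the same SelectNodes lookup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the tag name at the start of an OuterHtml string,
// lower-cased, or null when the text does not start with an opening tag.
string TagNameOf(string outerHtml)
{
    Match m = Regex.Match(outerHtml ?? "", @"^\s*<([A-Za-z][A-Za-z0-9]*)");
    return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
}

string sample = "<a onmousedown=\"return wow\" class=\"l\" href=\"http://site.com\">" +
                "<em>Great!!!</em> <b>come and see more</b></a>";
Console.WriteLine(TagNameOf(sample));                   // a
Console.WriteLine(TagNameOf("<EM>Great!!!</EM>"));      // em
Console.WriteLine(TagNameOf("plain text") ?? "(null)"); // (null)
```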
I need to get the rendering parameters programmatically from my sublayout. Currently I do this as such:
var sublayout = ((Sublayout)this.Parent);
//Get all rendering
var renderings = Sitecore.Context.Item.Visualization.GetRenderings(Sitecore.Context.Device, true);
//Get the first rendering that matches the current sublayout's path
var sublayoutRendering = renderings.FirstOrDefault(r => r.RenderingItem.InnerItem["Path"] == sublayout.Path);
if (sublayoutRendering != null)
Response.Write(sublayoutRendering.RenderingItem.Parameters);
This solution came from this question and works perfectly until I have two sublayouts of the same type on the page. When that happens, renderings.FirstOrDefault(r => r.RenderingItem.InnerItem["Path"] == sublayout.Path); naturally returns the first rendering that matches the sublayout's path, for both sublayouts.
How can I differentiate between them? I can see nothing that I can use to tie them together!
Edit:
To be clear, I add my sublayout in Presentation > Details, then when I click my control I set the fields in the 'Control Properties' window. I have a field called Module Source which always comes back the same - it always populates as the one highest up in the order. The values are definitely different for each sublayout but I cannot get them from the renderings.
Not sure if I'm missing something, but you can get the sublayout's rendering parameters directly from the Sublayout. I use the following in the base Sublayout class I use for all my Sitecore sublayouts, and it has no problems with rendering parameters when the same sublayout is inserted multiple times :)
// Requires: using System.Collections.Specialized; using System.Web.UI; using Sitecore.Web;
protected Sitecore.Web.UI.WebControls.Sublayout CurrentSublayout
{
    get
    {
        // Walk up the control tree until we reach the Sublayout hosting this control.
        Control c = Parent;
        while (c != null && !(c is Sitecore.Web.UI.WebControls.Sublayout))
        {
            c = c.Parent;
        }
        return c as Sitecore.Web.UI.WebControls.Sublayout;
    }
}
protected NameValueCollection CurrentParameters
{
    get
    {
        if (CurrentSublayout == null)
            return null;
        // Parameters is the parameter string of this particular sublayout instance.
        NameValueCollection parms = WebUtil.ParseUrlParameters(CurrentSublayout.Parameters);
        var sanitizedValues = new NameValueCollection();
        for (int i = 0; i < parms.Count; i++)
        {
            // Skip parameters that were left empty.
            if (!string.IsNullOrEmpty(parms[i]))
                sanitizedValues.Add(parms.Keys[i], parms[i]);
        }
        return sanitizedValues;
    }
}
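To see what the sanitizing loop in CurrentParameters does without a Sitecore instance, here is a standalone sketch. It assumes the parameter string is URL-encoded like a query string and uses System.Web.HttpUtility.ParseQueryString as a stand-in for Sitecore's WebUtil.ParseUrlParameters; the helper name SanitizedParameters and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Stand-in for CurrentParameters: parse a URL-encoded parameter string
// (HttpUtility instead of Sitecore's WebUtil) and drop empty values.
NameValueCollection SanitizedParameters(string raw)
{
    NameValueCollection parms = HttpUtility.ParseQueryString(raw ?? "");
    var sanitized = new NameValueCollection();
    for (int i = 0; i < parms.Count; i++)
    {
        if (!string.IsNullOrEmpty(parms[i]))
            sanitized.Add(parms.Keys[i], parms[i]);
    }
    return sanitized;
}

// Hypothetical parameter string for one sublayout instance.
var p = SanitizedParameters("ModuleSource=%2Fsitecore%2Fcontent%2FModuleA&Title=&Css=wide");
Console.WriteLine(p.Count);           // 2
Console.WriteLine(p["ModuleSource"]); // /sitecore/content/ModuleA
```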
You may also want to check the cache settings on your sublayout: if it is Cacheable without VaryByParm, it is not going to work for you, because every instance can be served the same cached output.