Adding conditional statements to an XPath query - c#

I am able to retrieve data using C# & XPath and display it in a list, but I would like to know how to perform two unique actions.
To start with, my code example looks like this:
protected async override void OnNavigatedTo(NavigationEventArgs e)
{
base.OnNavigatedTo(e);
string htmlPagePurchase = "";
using (var client = new HttpClient())
{
htmlPagePurchase = await client.GetStringAsync(MyURI);
}
HtmlDocument htmlDocumentPurchase = new HtmlDocument();
htmlDocumentPurchase.LoadHtml(htmlPagePurchase);
foreach (var div in htmlDocumentPurchase.DocumentNode.SelectNodes("//div[contains(@id, 'odyContent')]"))
{
PurchaseDetails newPurchase = new PurchaseDetails();
newPurchase.Expiry = div.SelectSingleNode(".//ex1").InnerText.Trim();
Purchase.Add(newPurchase);
}
lstPurchase.ItemsSource = Purchase;
}
Firstly, if there is no "ex1" node within the page, can I request a null value be returned or for it to be ignored? I need to do this as some of the pages I use contain the data I want in an alternative node (I can't control this) and I don't want the app to crash if one of the nodes isn't contained within that page.
Secondly, if the node contains no text, can I force an output? For example, within a list of "ex1" nodes, some contain an expiry date but one "ex1" node does not include any date because it hasn't expired yet. When that happens, can I return my own value, such as 'hasn't expired'?
This is being compiled in a Windows Phone 8.0 Silverlight App.

This code should work by checking the node and value, and using your defaultValue if no real value is found.
var node = xmlDoc.SelectSingleNode(".//ex1");
return (node == null || string.IsNullOrEmpty((node.InnerText ?? "").Trim()) ? defaultValue : node.InnerText.Trim());
.NET Fiddle: https://dotnetfiddle.net/3DAjKH
UPDATE FOR INTEGRATING WITH PROVIDED CODE SAMPLE
This should work within your loop.
var exNode = div.SelectSingleNode(".//ex1");
if (exNode == null || string.IsNullOrEmpty((exNode.InnerText ?? "").Trim()))
newPurchase.Expiry = "N/A"; // Default value
else
newPurchase.Expiry = exNode.InnerText.Trim(); // Reuse the node already fetched above
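If the same check is needed for several nodes, it could be pulled into a small helper. The following is only a sketch: `NodeText` and `OrDefault` are hypothetical names, not part of HtmlAgilityPack, and the helper takes a plain string so that it does not depend on any parsing library.

```csharp
using System;

// Hypothetical helper (not part of HtmlAgilityPack): returns a fallback
// when the text taken from a node is missing, empty, or only whitespace.
public static class NodeText
{
    public static string OrDefault(string innerText, string defaultValue)
    {
        // A null innerText stands for "the node was not found".
        if (innerText == null)
            return defaultValue;

        string trimmed = innerText.Trim();
        return trimmed.Length == 0 ? defaultValue : trimmed;
    }
}
```

Inside the loop this might be called as `newPurchase.Expiry = NodeText.OrDefault(exNode == null ? null : exNode.InnerText, "hasn't expired");`.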

Related

Selenium C# - Web element attribute is present but cannot be found?

Running into an issue with a function I wrote for a Selenium test case... When I run a jQuery query on a web element ID (#AmountToggle), it displays all the attributes. I want to verify this one ("lastChild"):
but when I run this code:
driver.FindElement(By.CssSelector("#AmountToggle")).GetAttribute("lastChild")
it's returning null?!
Why is this and how can I get the correct value of this attribute?
So it looks like "lastChild" is a property, not an attribute. Although I cannot assert that for sure with what you have above.
The difference being, an attribute would appear as something in the direct html, such as:
<a href="..." id="...">Link</a>
Where href and id are attributes. If lastChild doesn't appear in the html like the above examples, it won't be considered an attribute.
First try comparing these two in the javascript console:
$("#AmountToggle").attr("lastChild")
$("#AmountToggle").prop("lastChild")
The following is a workaround when you have issues with Selenium finding things. This logic will allow you to find things inside of iframes with ease, and also will allow you to use pseudo-selectors to find elements:
public static string GetFullyQualifiedXPathToElement(string cssSelector, bool isFullJQuery = false, bool noWarn = false)
{
if (cssSelector.Contains("$(") && !isFullJQuery) {
isFullJQuery = true;
}
string finder_method = @"
function getPathTo(element) {
if(typeof element == 'undefined') return '';
if (element.tagName == 'HTML')
return '/HTML[1]';
if (element===document.body)
return '/HTML[1]/BODY[1]';
var ix= 0;
var siblings = element.parentNode.childNodes;
for (var i= 0; i< siblings.length; i++) {
var sibling= siblings[i];
if (sibling===element)
return getPathTo(element.parentNode)+'/'+element.tagName+'['+(ix+1)+']';
if (sibling.nodeType===1 && sibling.tagName===element.tagName)
ix++;
}
}
";
if(isFullJQuery) {
cssSelector = cssSelector.TrimEnd(';');
}
string executable = isFullJQuery ? string.Format("{0} return getPathTo({1}[0]);", finder_method, cssSelector) : string.Format("{0} return getPathTo($('{1}')[0]);", finder_method, cssSelector.Replace("'", "\""));
string xpath = string.Empty;
try {
xpath = BaseTest.Driver.ExecuteJavaScript<string>(executable);
} catch (Exception e) {
if (!noWarn) {
//Warn about failure with custom message.
}
}
if (!noWarn && string.IsNullOrEmpty(xpath)) {
//Warn about failure with custom message.
//string.Format("Supplied cssSelector did not point to an element. Selector is \"{0}\".", cssSelector);
}
return xpath;
}
This method uses jQuery, which has more extensive search options than Selenium's CSS selectors (such as pseudo-selectors), so it should find the element whenever the query itself is correct. It uses jQuery to locate the element, generates an explicit XPath to that element in the DOM, and returns that XPath. With the explicit XPath, you can then tell Selenium to find the element using XPath.
It looks like the value of lastChild is an element itself. If that is true, here is how you might use this in your example:
driver.FindElement(By.XPath(GetFullyQualifiedXPathToElement("$('#AmountToggle').prop('lastChild')[0]", true)));
Note three things here. The first is that I used "prop" in jQuery. Change that to "attr" if that was the correct call. Also, note the [0] index. This will return the jQuery element value as a regular JavaScript DOM element, which is what the method above uses. The final thing to note is the cssSelector value passed in. You can pass in just a selector to this method, such as "#SomeElementId > div", or you can pass in full jQuery, such as "$('#SomeElementId > div')".

Is there a better way to remove span elements but leave child nodes?

I want to remove all span elements (without attributes) but leave the inner html. I have created the following code snippet which appears to work but I can't help thinking this is overly complicated for such a task. Is there a better way?
var config = Configuration.Default.WithDefaultLoader().WithCss();
var parser = new HtmlParser(config);
var document = parser.Parse("<p><span><span><em>span text</em></span> </span> span text</p>");
foreach (var element in document.Descendents())
{
var parent = element.Parent;
while (parent != null)
{
var span = parent as IHtmlSpanElement;
if (span != null && !span.Attributes.Any())
{
span.Replace(span.ChildNodes.ToArray());
}
parent = parent.Parent;
}
}
document.Body.InnerHtml.Dump();
// outputs: <p><em>span text</em> span text</p>
What you want is a replacement. Luckily, such an API exists, which you already use (Replace). However, most of your boilerplate code can also be replaced with standard APIs (like QuerySelectorAll):
var config = Configuration.Default.WithDefaultLoader().WithCss();
var parser = new HtmlParser(config);
var document = parser.Parse("<p><span><span><em>span text</em></span> </span> span text</p>");
foreach (var element in document.QuerySelectorAll("span").Where(m => m.Attributes.Length == 0))
{
element.Replace(element.ChildNodes.ToArray());
}
document.Body.InnerHtml.Dump();
Note: I've only placed the Where to have the same condition as you placed in your code - namely no attribute should be found on these span elements.
Hope this helps!

Get rendering parameters when multiple sublayouts of the same type are on the page

I need to get the rendering parameters programmatically from my sublayout. Currently I do this as such:
var sublayout = ((Sublayout)this.Parent);
//Get all rendering
var renderings = Sitecore.Context.Item.Visualization.GetRenderings(Sitecore.Context.Device, true);
//Get the first rendering that matches the current sublayout's path
var sublayoutRendering = renderings.FirstOrDefault(r => r.RenderingItem.InnerItem["Path"] == sublayout.Path);
if (sublayoutRendering != null)
Response.Write(sublayoutRendering.RenderingItem.Parameters);
This solution came from this question and works perfectly until I have two sublayouts of the same type on the page. When this occurs, renderings.FirstOrDefault(r => r.RenderingItem.InnerItem["Path"] == sublayout.Path); obviously returns the first rendering that matches the sublayout's path, so both sublayouts get the same parameters.
How can I differentiate between them? I can see nothing that I can use to tie them together!
Edit:
To be clear, I add my sublayout in Presentation > Details, then when I click my control I set the fields in the 'Control Properties' window. I have a field called Module Source which always comes back the same - it always populates as the one highest up in the order. The values are definitely different for each sublayout but I cannot get them from the renderings.
Not sure if I'm missing something, but you can get the sublayout's rendering parameters directly from the Sublayout. I use the following in the base class for all my Sitecore sublayouts, and it has no problems with rendering parameters when the same sublayout is inserted multiple times :)
protected Sitecore.Web.UI.WebControls.Sublayout CurrentSublayout
{
get
{
Control c = Parent;
while (c != null && !(c is Sitecore.Web.UI.WebControls.Sublayout))
{
c = c.Parent;
if (c == null)
break;
}
return c as Sitecore.Web.UI.WebControls.Sublayout;
}
}
protected NameValueCollection CurrentParameters
{
get
{
if (CurrentSublayout == null)
return null;
NameValueCollection parms = WebUtil.ParseUrlParameters(CurrentSublayout.Parameters);
var sanitizedValues = new NameValueCollection();
for (int i = 0; i < parms.Count; i++)
{
if (!string.IsNullOrEmpty(parms[i]))
sanitizedValues.Add(parms.Keys[i], parms[i]);
}
return sanitizedValues;
}
}
You may want to check the cache settings on your sublayout. If caching is enabled without both Cacheable and VaryByParm set, it is not going to work for you.

HtmlElement.Parent returns wrong parent

I'm trying to generate CSS selectors for random elements on a webpage by means of C#. Some background:
I use a form with a WebBrowser control. While navigating one can ask for the CSS selector of the element under the cursor. Getting the html-element is trivial, of course, by means of:
WebBrowser.Document.GetElementFromPoint(<Point>);
The ambition is to create a 'strict' css selector leading up to the element under the cursor, a-la:
html > body > span:eq(2) > li:eq(5) > div > div:eq(3) > span > a
This selector is based on :eq operators since it's meant to be handled by jQuery and/or SizzleJS (these two support :eq; standard CSS selectors don't. Thumbs up @BoltClock for helping me clarify this). So, you get the picture. In order to achieve this goal, we supply the retrieved HtmlElement to the method below and start ascending the DOM tree by asking for the Parent of each element we come across:
private static List<String> GetStrictCssForHtmlElement(HtmlElement element)
{
List<String> familyTree;
for (familyTree = new List<String>(); element != null; element = element.Parent)
{
string ordinalString = CalculateOrdinalPositionAmongSameTagSimblings(element);
if (ordinalString == null) return null;
familyTree.Add(element.TagName.ToLower() + ordinalString);
}
familyTree.Reverse();
return familyTree;
}
private static string CalculateOrdinalPositionAmongSameTagSimblings(HtmlElement element, bool simplifyEq0 = true)
{
int count = 0;
int positionAmongSameTagSimblings = -1;
if (element.Parent != null)
{
foreach (HtmlElement child in element.Parent.Children)
{
if (element.TagName.ToLower() == child.TagName.ToLower())
{
count++;
if (element == child)
{
positionAmongSameTagSimblings = count - 1;
}
}
}
if (positionAmongSameTagSimblings == -1) return null; // Couldn't find child in parent's offsprings!?
}
return ((count > 1) ? (":eq(" + positionAmongSameTagSimblings + ")") : ((simplifyEq0) ? ("") : (":eq(0)")));
}
This method has worked reliably for a variety of pages. However, there's one particular page which does my head in:
http://www.delicious.com/recent
Trying to retrieve the CSS selector of any element in the list (at the center of the page) fails for one very simple reason:
After the ascent hits the first SPAN element on its way up (you can spot it by inspecting the page with IE9's web-dev tools), it tries to process it by calculating its ordinal position among its same-tag siblings. To do that we need to ask its Parent node for the siblings. This is where things get weird. The SPAN element reports that its Parent is a DIV element with id="recent-index". However, that's not the immediate parent of the SPAN (the immediate parent is LI class="wrap isAdv"). This causes the method to fail because, unsurprisingly, it fails to spot the SPAN among the children.
But it gets even weirder. I retrieved and isolated the HtmlElement of the SPAN itself. Then I got its Parent and used it to descend back down to the SPAN element using:
HtmlElement regetSpanElement = spanElement.Parent.Children[0].Children[1].Children[1].Children[0].Children[2].Children[0];
This led us back to the SPAN node we began with, with one twist however:
regetSpanElement.Parent.TagName;
This now reports LI as the parent X-X. How can this be? Any insight?
Thank you again in advance.
Notes:
I saved the HTML code (as it's presented inside WebBrowser.Document.Html) and inspected it myself to be 100% sure that nothing funny is taking place (i.e. different code being served to the WebBrowser control than the one I see in IE9). That's not happening; the structure matches 100% for the path concerned.
I am running WebBrowser control in IE9-mode using the instructions outlined here:
http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
Trying to get WebBrowser control and IE9 to run as similarly as possible.
I suspect that the effects observed might be due to some script running behind my back. However my knowledge is not so far reaching in terms of web-programming to pin it down.
Edit: Typos
Relying on :eq() is tough! It is difficult to reliably re-select out of a DOM that is dynamic. Sure it may work on very static pages, but things are only getting more dynamic every day. You might consider changing strategy a little bit. Try using a smarter more flexible selector. Perhaps pop in some javascript like so:
predictCss = function(s, noid, noclass, noarrow) {
var path, node = s;
var psep = noarrow ? ' ' : ' > ';
if (s.length != 1) return path; //throw 'Requires one element.';
while (node.length) {
var realNode = node[0];
var name = (realNode.localName || realNode.tagName || realNode.nodeName);
if (!name || name == '#document') break;
name = name.toLowerCase();
if(node.parent().children(name).length > 1){
if (realNode.id && !noid) {
try {
var idtest = $(name + '#' + realNode.id);
if (idtest.length == 1) return name + '#' + realNode.id + (path ? '>' + path : '');
} catch (ex) {} // just ignore the exception, it was a bad ID
} else if (realNode.className && !noclass) {
name += '.' + realNode.className.split(/\s+/).join('.');
}
}
var parent = node.parent();
if (name[name.length - 1] == '.') {
name = name.substring(0, name.length - 1);
}
var siblings = parent.children(name); // declared with var to avoid leaking a global
//// If you really want to use eq:
//if (siblings.length > 1) name += ':eq(' + siblings.index(node) + ')';
path = name + (path ? psep + path : '');
node = parent;
}
return path;
}
And use it to generate a variety of selectors:
var elem = $('#someelement');
var epath = self.model.util.predictCss(elem, true, true, false);
var epathclass = self.model.util.predictCss(elem, true, false, false);
var epathclassid = self.model.util.predictCss(elem, false, false, false);
Then use each:
var relem= $(epathclassid);
if(relem.length === 0){
relem = $(epathclass);
if(relem.length === 0){
relem = $(epath);
}
}
And if your best selector still comes out with more than one element, you'll have to get creative in how you match a DOM element: perhaps Levenshtein distance, perhaps some specific text, or you can fall back to eq. Hope that helps!
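Since the host application in the question is C#, the Levenshtein idea could also be applied on that side, for example by comparing the text of each candidate element with the text remembered from the original element. The following is only a sketch of the standard dynamic-programming algorithm; `TextDistance` is a hypothetical name and nothing here comes from jQuery or the WebBrowser control.

```csharp
using System;

// Hypothetical helper: classic Levenshtein edit distance between two strings.
public static class TextDistance
{
    // Minimum number of single-character insertions, deletions, or
    // substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        if (a == null) a = "";
        if (b == null) b = "";

        // Only two rows of the distance table are kept at any time.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertOrDelete = Math.Min(current[j - 1] + 1, previous[j] + 1);
                current[j] = Math.Min(insertOrDelete, previous[j - 1] + cost);
            }
            // Swap the rows for the next iteration.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.Length];
    }
}
```

The candidate with the smallest distance to the remembered text would then be treated as the best match; what counts as "close enough" is a judgment call for the page in question.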
Btw, I assumed you have jQuery - due to the sizzle reference. You could inject the above in a self-executing anonymous function in a script tag appended to the last child of body for example.

How to get values from tag? [duplicate]

I am developing a Windows Forms application which is interacting with a web site.
Using a WebBrowser control I am controlling the web site and I can iterate through the tags using:
HtmlDocument webDoc1 = this.webBrowser1.Document;
HtmlElementCollection aTags = webDoc1.GetElementsByTagName("a");
Now, I want to get a particular text from the tag which is below:
Show Assigned<br>
Here I want to get the number 244, which is the value of assignedto in the tag above, and save it into a variable for further use.
How can I do this?
You can try splitting the string on ';', and then splitting each piece on '=', like this:
string aTag = ...;
foreach(var splitted in aTag.Split(';'))
{
if(splitted.Contains("="))
{
var leftSide = splitted.Split('=')[0];
var rightSide = splitted.Split('=')[1];
if(leftSide == "assignedto")
{
MessageBox.Show(rightSide); //It should be 244
//Or...
int num = int.Parse(rightSide);
}
}
}
Another option is to use regular expressions, which you can test here: www.regextester.com. And some more info on regexes: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
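As a rough sketch of the regular-expression option, assuming the href looks like the one in the question (parameters separated by '&' or by '&amp;'): `HrefParser` and `GetParameter` are hypothetical names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls one parameter value out of an href string.
public static class HrefParser
{
    // Returns the value of the named parameter, or null if it is absent.
    public static string GetParameter(string href, string name)
    {
        // The name must directly follow '?', '&' or ';' (the ';' covers "&amp;"),
        // so "assignedto" inside "@filter=status,assignedto" is not matched.
        string pattern = @"[?&;]" + Regex.Escape(name) + @"=([^&;]*)";
        Match match = Regex.Match(href, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With the href from the question, `HrefParser.GetParameter(href, "assignedto")` would be expected to give "244", which can then be passed to int.Parse.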
Hope it helps!
If all cases are similar to this and you don't mind a reference to System.Web in your Windows Forms application, you can do something like this:
using System;
public class Program
{
static void Main()
{
string href = @"issue?status=-1,1,2,3,4,5,6,7&
@sort=-activity&@search_text=&@dispname=Show Assigned&
@filter=status,assignedto&@group=priority&
@columns=id,activity,title,creator,status&assignedto=244&
@pagesize=50&@startwith=0";
href = System.Web.HttpUtility.HtmlDecode(href);
var querystring = System.Web.HttpUtility.ParseQueryString(href);
Console.WriteLine(querystring["assignedto"]);
}
}
This is a simplified example, and first you need to extract the href attribute text, but that should not be complex. Once you have the href attribute text, you can take advantage of the fact that it is basically a query string and reuse the code in .NET that already parses query strings.
To complete the example, to obtain the href attribute text you could do:
HtmlElementCollection aTags = webBrowser.Document.GetElementsByTagName("a");
foreach (HtmlElement element in aTags)
{
string href = element.GetAttribute("href");
}
