Scraping ASP.NET page, simulate click

I am trying to scrape all pages of http://www.menorcarentals.com/en/villas, but I keep getting the first page back every time.
My approach is to find all inputs and selects on the page and then set the value of __EVENTTARGET to the name of the button I want to click. This has worked for me before, but this site just won't budge.
Method to get all input fields:
public static Dictionary<string, string> GetInputFields(CQ dom)
{
Dictionary<string, string> result = new Dictionary<string, string>();
foreach (var v in dom.Find("input"))
{
var value = v.Cq().Attr("value");
var key = v.Cq().Attr("name");
if (!string.IsNullOrWhiteSpace(value))
{
if (!result.ContainsKey(key))
{
result.Add(key, value);
}
else
{
result[key] = value;
}
}
}
// Get all selects
foreach (var s in dom.Select("select"))
{
var select = s.Cq();
var key = select.Attr("name");
foreach (var option in select.Children("option"))
{
var opt = option.Cq();
if(!string.IsNullOrWhiteSpace(opt.Attr("selected")))
{
if (!result.ContainsKey(key))
{
result.Add(key, opt.Val());
}
else
{
result[key] = opt.Val();
}
}
}
}
return result;
}
My code to run through the different pages:
string searchPageUrl = "http://www.menorcarentals.com/en/villas";
var html = DownloadHelper.Download(searchPageUrl);
while (true)
{
CQ dom = html;
// parse page and get info i need here
// Find the next page
var pagination = dom.Select("#ctl00_Content_dpVillas").Children();
bool foundCurrent = false;
string clickElementName = string.Empty;
foreach (var pagi in pagination)
{
if (pagi.Classes.Any(x=>x.ToLower() == "current"))
{
foundCurrent = true;
}
else if (foundCurrent)
{
var href = pagi.Cq().Attr("href");
clickElementName = RegexHelper.Match(@"doPostBack\(\'([^']+)", href);
break;
}
}
if (string.IsNullOrWhiteSpace(clickElementName))
{
break; // no more pages
}
var inputFields = ScraperHelper.GetInputFields(html);
// Simulate that we click the next button
if (!inputFields.ContainsKey("__EVENTTARGET"))
inputFields.Add("__EVENTTARGET", String.Empty);
inputFields["__EVENTTARGET"] = clickElementName;
html = DownloadHelper.Post(searchPageUrl, inputFields);
}

Turn off JavaScript and cookies in your browser (delete the cookies before turning them off) and then look at the page as CsQuery will actually receive it.
That may explain why you can't parse anything: for example, the actual content of the page may be loaded with AJAX.
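
If the pager is still present in the HTML with JavaScript disabled, the postback itself can be checked in isolation. The following is a minimal sketch and not a confirmed fix for this site. It assumes the pager links call the standard WebForms __doPostBack function, and it leaves out button-type inputs, since a browser only posts a submit button when that button itself was clicked. PostBackSketch and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PostBackSketch
{
    // Input types a browser leaves out of a form post unless
    // (for submit/image) that control was the one clicked.
    private static readonly HashSet<string> SkippedTypes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "submit", "button", "image", "reset" };

    // Pulls the first argument out of javascript:__doPostBack('target','argument').
    public static string ExtractEventTarget(string href)
    {
        if (string.IsNullOrEmpty(href)) return string.Empty;
        Match match = Regex.Match(href, @"__doPostBack\('([^']*)'");
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    // Builds the POST fields for a simulated link click.
    // Empty values are kept; unchecked checkboxes and radios are not handled here.
    public static Dictionary<string, string> BuildPostBackFields(
        IEnumerable<(string Name, string Type, string Value)> inputs,
        string eventTarget)
    {
        var fields = new Dictionary<string, string>();
        foreach (var input in inputs)
        {
            if (string.IsNullOrEmpty(input.Name)) continue;
            if (input.Type != null && SkippedTypes.Contains(input.Type)) continue;
            fields[input.Name] = input.Value ?? string.Empty;
        }

        // __doPostBack fills these two hidden fields before submitting the form.
        // Assumption: the link passes an empty second argument.
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = string.Empty;
        return fields;
    }
}
```

Unlike GetInputFields above, this keeps fields whose value is empty, which is what a browser would post. Whether that difference matters for this particular site is untested.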

Related

How can I make an auto injection detector in c#?

I want to build an automatic injection scanner for any given website, and I have to use C#.
I tried some things that I found online and none of them worked for me, until I found Selenium, but I keep getting this error message: "OpenQA.Selenium.ElementNotInteractableException: 'element not interactable'", and I have no idea why.
I didn't find anything helpful online, and I think the problem may be with Selenium.
I am trying to find SQL, JS and Bash injections, but the script fails when I try to interact with an input. I am using OWASP Juice Shop to test my code.
This is my code:
static int _crntTypeOfInjection;
const int ESQL = 0, EJS = 1, EBASH = 2;
static public bool IsImportantInput(string type)
{
bool valid = false;
string[] importantTypes = new string[] { "text", "email", "password", "search", "url" };
foreach (string check in importantTypes)
{
if (type == check)
{
return true;
}
}
return false;
}
public static string getCrntInjection()
{
switch (_crntTypeOfInjection)
{
case ESQL:
return "\' OR 1=1;--";
break;
case EBASH:
return "; echo Test";
break;
case EJS:
return "<img src=\"http:\\\\url.to.file.which\\not.exist\" onerror=alert(\"JS injection success\");>";
break;
}
return "defult";
}
static public bool AttackSuccessful(string normalPage, string InjectedPage, string MainUrl, string afterClickUrl)
{
if (afterClickUrl != MainUrl || InjectedPage.Contains("Internal Server Error") || InjectedPage.Contains("JS injection success") || InjectedPage.Contains("Test"))
{
return true;
}
return false;
}
static public void Injection(string url)
{
string InjectedPage = "", NormalPage = "", AfterClickUrl = "";
var driver = new ChromeDriver("C:\\Users\\nirya\\");
driver.Url = url;
Console.WriteLine(driver.PageSource);
Actions a = new Actions(driver);
foreach (var button in driver.FindElements(By.CssSelector("button")))
{
// INJECTED PAGE
a.MoveByOffset(0, 0).Click().Perform();
foreach (IWebElement input in driver.FindElements(By.TagName("input")))
{
Console.WriteLine(input.Text);
Console.WriteLine(input.TagName);
try
{
if (IsImportantInput(input.GetAttribute("type")))
{
input.Click(); // *** HERE IS THE PROBLEM ***
input.Clear();
input.SendKeys(getCrntInjection());
}
}
catch (NoSuchElementException)
{
continue;
}
}
button.Click();
InjectedPage = driver.PageSource;
AfterClickUrl = driver.Url;
driver.Navigate().Back();
// NORMAL PAGE
a.MoveByOffset(0, 0).Click().Perform();
foreach (IWebElement input in driver.FindElements(By.CssSelector("input")))
{
try
{
if (IsImportantInput(input.GetAttribute("type")))
{
input.Clear();
input.SendKeys("normal");
}
}
catch (NoSuchElementException)
{
continue;
}
}
button.Click();
NormalPage = driver.PageSource;
driver.Navigate().Back();
if (AttackSuccessful(NormalPage, InjectedPage, url, AfterClickUrl))
{
// add to database
}
}
}
static void Main(string[] args)
{
Injection("http://localhost:3000/#/login");
}
Is there a problem with my code? Or is there another library that I can use instead?

Popup / Alert windows when working with Selenium C#

I am writing a program that orders statements from a registry. I cannot switch to the site's pop-up window: Selenium does not see that a new window is being created.
Is it possible to reach the element through XPath without switching to the pop-up window, through a browser function, or in some other way in Selenium (Chrome)?
New window detection function:
public static string ClickAndSwitchWindow(IWebElement elementToBeClicked,
IWebDriver driver, int timer = 2000)
{
System.Collections.Generic.List<string> previousHandles = new
System.Collections.Generic.List<string>();
System.Collections.Generic.List<string> currentHandles = new
System.Collections.Generic.List<string>();
previousHandles.AddRange(driver.WindowHandles);
elementToBeClicked.Click();
Thread.Sleep(timer);
for (int i = 0; i < 20; i++)
{
currentHandles.Clear();
currentHandles.AddRange(driver.WindowHandles);
foreach (string s in previousHandles)
{
currentHandles.RemoveAll(p => p == s);
}
if (currentHandles.Count == 1)
{
driver.SwitchTo().Window(currentHandles[0]);
Thread.Sleep(100);
return currentHandles[0];
}
else
{
Thread.Sleep(500);
}
}
return null;
}
The piece of code itself:
//After this click of this element, a window opens:
//"Send request"
IWebElement PopWindowsstart = ww.Until(ExpectedConditions.ElementIsVisible(By.XPath("/html/body/div[1]/div[6]/div[4]/div/div/section/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/div/div[1]/div/div/div/div[1]/div/div/div/div[4]/div/div/div/div[1]/div/div/div/div[1]/div/div/span/span")));
//Search for a new window
string newWin = ClickAndSwitchWindow(PopWindowsstart, Browser, 2500);
PopupWindowFinder finder = new PopupWindowFinder(Browser);
//Switch to a new window
Browser.SwitchTo().Window(newWin);
//Statement Number:
IWebElement NumExctract = ww.Until(ExpectedConditions.ElementIsVisible(By.XPath("div[class='v-label v-label-tipFont tipFont v-label-undef-w'] b")));
//Read check
MessageBox.Show(NumExctract.Text);
//"Continue work"
ww.Until(ExpectedConditions.ElementIsVisible(By.XPath("/html/body/div[7]/div/div/div/div[3]/div/div/div/div[1]/div/div/div/div[2]/div/div/div/div[1]/div/div/div/div[1]/div/div/span/span"))).Click();
//"Change"
ww.Until(ExpectedConditions.ElementIsVisible(By.XPath("/html/body/div[1]/div[6]/div[4]/div/div/section/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/div/div[1]/div/div/div/div[2]/div/div/div/div[1]/div/div/div/div[2]/div/div/span/span"))).Click();
Thread.Sleep(300000);

Let's make this a bit easier.
If you need to switch to a popup, try the below.
public static string SwitchToPopup()
{
var mainHandle = Driver.CurrentWindowHandle;
var handles = Driver.WindowHandles;
foreach (var handle in handles)
{
if (mainHandle == handle)
{
continue;
}
Driver.SwitchTo().Window(handle);
break;
}
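// Return the URL of the window that now has focus (assumes Driver is the shared IWebDriver)
var result = Driver.Url;
return result;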
}
When you need to switch back, use:
public static void GoToMainHandle()
{
var handles = Driver.WindowHandles;
foreach (var handle in handles)
{
Driver.SwitchTo().Window(handle);
break;
}
}
That being said, an XPath like yours should never be used. Please look at https://www.w3schools.com/xml/xpath_intro.asp and rewrite it. When you let Chrome generate the XPath for you, you get something like:
ww.Until(ExpectedConditions.ElementIsVisible(By.XPath("/html/body/div[1]/div[6]/div[4]/div/div/section/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/div/div[1]/div/div/div/div[2]/div/div/div/div[1]/div/div/div/div[2]/div/div/span/span"))).Click();
If a developer adds a div anywhere along that path, all of your tests will fail. If your developers are not providing unique identifiers, work with them to resolve that. You should have IDs, class names, etc.
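
Both points can be sketched together, assuming the Selenium WebDriver and Selenium.Support packages. The first method waits for a new window handle instead of sleeping. The second locates the statement number by class name: the selector on the "Statement Number" line of the question is CSS syntax, so it is passed to By.CssSelector here rather than By.XPath. PopupSketch and its method names are hypothetical, and the choice of the v-label-tipFont class as the stable hook is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class PopupSketch
{
    // Clicks the element, waits until a window handle appears that was not
    // there before the click, switches to it and returns the handle.
    // WebDriverWait throws WebDriverTimeoutException if none appears in time.
    public static string ClickAndSwitchToNewWindow(
        IWebDriver driver, IWebElement trigger, TimeSpan timeout)
    {
        var before = new HashSet<string>(driver.WindowHandles);
        trigger.Click();

        var wait = new WebDriverWait(driver, timeout);
        // Until keeps polling while the lambda returns null.
        string newHandle = wait.Until(
            d => d.WindowHandles.FirstOrDefault(h => !before.Contains(h)));

        driver.SwitchTo().Window(newHandle);
        return newHandle;
    }

    // Locates the statement number by class instead of an absolute path.
    public static IWebElement FindStatementNumber(IWebDriver driver)
    {
        return driver.FindElement(By.CssSelector("div.v-label-tipFont b"));
    }
}
```

If the pop-up turns out to be an element inside the same page rather than a separate browser window, no new handle will ever appear. In that case no switch is needed and the element can be located directly.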

Try:
public static void PopUp()
{
_webDriver.SwitchTo().Alert().Accept();
}

C# web scraper navigate to aspx link

I am building a C# Windows Phone 8.1 app. Part of the app needs to look up information on a specific web page. One of the fields I need is a URL that appears on certain items on the page; however, the URL is in a relative format:
FullArticle.aspx?a=323495
Is there a way in C#, using HtmlAgilityPack, HttpWebRequest, etc., to find the link to the actual page? A code snippet is below.
private static TileUpdate processSingleNewsItem(HtmlNode newsItemNode)
{
Debug.WriteLine("");
var articleImage = getArticleImage(getNode(newsItemNode, "div", "nw-container-panel-articleimage"));
var articleDate = getArticleDate(getNode(newsItemNode, "div", "nw-container-panel-articledate"));
var articleSummary = getArticleSummary(getNode(newsItemNode, "div", "nw-container-panel-textarea"));
var articleUrl = getArticleUrl(getNode(newsItemNode, "div", "nw-container-panel-articleimage"));
return new TileUpdate{
Date = articleDate,
Headline = articleSummary,
ImagePath = articleImage,
Url = articleUrl
};
}
private static string getArticleUrl(HtmlNode parentNode)
{
var imageNode = parentNode.Descendants("a").FirstOrDefault();
Debug.WriteLine(imageNode.GetAttributeValue("href", null));
return imageNode.GetAttributeValue("href", null);
}
private static HtmlNode getNode(HtmlNode parentNode, string nodeType, string className)
{
var children = parentNode.Elements(nodeType).Where(o => o.Attributes["class"].Value == className);
return children.First();
}
Would appreciate any ideas or solutions. Cheers!

In my web crawler here's what I do:
foreach (HtmlNode link in doc.DocumentNode.SelectNodes(@"//a[@href]"))
{
HtmlAttribute att = link.Attributes["href"];
if (att == null) continue;
string href = att.Value;
if (href.StartsWith("javascript", StringComparison.InvariantCultureIgnoreCase)) continue; // ignore javascript on buttons using a tags
Uri urlNext = new Uri(href, UriKind.RelativeOrAbsolute);
// Make it absolute if it's relative
if (!urlNext.IsAbsoluteUri)
{
urlNext = new Uri(urlRoot, urlNext);
}
...
}
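
Applied to the link from the question, the same idea can be reduced to a small helper. This is a sketch only: LinkResolver is a made-up name, and the page address in the usage note is a placeholder, since the real site URL is not given.

```csharp
using System;

public static class LinkResolver
{
    // Resolves an href against the address of the page it was found on.
    // An href that is already absolute is returned as-is.
    public static string ToAbsolute(string pageUrl, string href)
    {
        var baseUri = new Uri(pageUrl, UriKind.Absolute);
        return new Uri(baseUri, href).AbsoluteUri;
    }
}
```

For example, ToAbsolute("http://example.com/news/Default.aspx", "FullArticle.aspx?a=323495") returns "http://example.com/news/FullArticle.aspx?a=323495".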

Sitecore Dynamic Placeholders Allowed Renderings

I'm implementing Dynamic Placeholders in Sitecore 7 as described in the articles
http://trueclarity.wordpress.com/2012/06/19/dynamic-placeholder-keys-in-sitecore/
http://www.techphoria414.com/Blog/2011/August/Dynamic_Placeholder_Keys_Prototype
It is working correctly such that I can add the same Rendering to the layout and the renderings will go in the appropriate Dynamic Placeholder. However when I click to add a Rendering to the Dynamic Placeholder, the placeholder settings aren't being used.
What I am expecting is to be prompted with the allowed renderings that may be placed on the Dynamic Placeholder. Instead the Rendering/Layout tree is presented to manually select the rendering - giving Content Editors the ability to add disallowed renderings to the placeholder.
I have debugged the code: the correct Placeholder Settings item is found for the Dynamic Placeholder and the list of allowed renderings is retrieved. However, despite being set in the args, the list is not presented to the user. See the code below.
public class GetDynamicKeyAllowedRenderings : GetAllowedRenderings
{
//string that ends in a GUID
public const string DynamicKeyRegex = @"(.+){[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}}";
public new void Process(GetPlaceholderRenderingsArgs args)
{
Assert.IsNotNull(args, "args");
// get the placeholder key
string placeholderKey = args.PlaceholderKey;
var regex = new Regex(DynamicKeyRegex);
Match match = regex.Match(placeholderKey);
// if the placeholder key text followed by a Guid
if (match.Success && match.Groups.Count > 0)
{
// Is a dynamic placeholder
placeholderKey = match.Groups[1].Value;
}
else
{
return;
}
Item placeholderItem = null;
if (ID.IsNullOrEmpty(args.DeviceId))
{
placeholderItem = Client.Page.GetPlaceholderItem(placeholderKey, args.ContentDatabase,
args.LayoutDefinition);
}
else
{
using (new DeviceSwitcher(args.DeviceId, args.ContentDatabase))
{
placeholderItem = Client.Page.GetPlaceholderItem(placeholderKey, args.ContentDatabase,
args.LayoutDefinition);
}
}
// Retrieve the allowed renderings for the Placeholder
List<Item> collection = null;
if (placeholderItem != null)
{
bool allowedControlsSpecified;
args.HasPlaceholderSettings = true;
collection = this.GetRenderings(placeholderItem, out allowedControlsSpecified);
if (allowedControlsSpecified)
{
args.CustomData["allowedControlsSpecified"] = true;
}
}
if (collection != null)
{
if (args.PlaceholderRenderings == null)
{
args.PlaceholderRenderings = new List<Item>();
}
args.PlaceholderRenderings.AddRange(collection);
}
}
}
As this code was developed for Sitecore 6.5 / 6.6, I wonder whether the jump to Sitecore 7.0 brought a change that affects the latter half of the code.

I have found the source of the issue by decompiling the Sitecore 7 Kernel and viewing the default GetAllowedRenderings class. If allowed renderings are found, the ShowTree option needs to be set to false. See below:
public class GetDynamicKeyAllowedRenderings : GetAllowedRenderings
{
//string that ends in a GUID
public const string DynamicKeyRegex = @"(.+){[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}}";
public new void Process(GetPlaceholderRenderingsArgs args)
{
Assert.IsNotNull(args, "args");
// get the placeholder key
string placeholderKey = args.PlaceholderKey;
var regex = new Regex(DynamicKeyRegex);
Match match = regex.Match(placeholderKey);
// if the placeholder key text followed by a Guid
if (match.Success && match.Groups.Count > 0)
{
// Is a dynamic placeholder
placeholderKey = match.Groups[1].Value;
}
else
{
return;
}
// Same as Sitecore.Pipelines.GetPlaceholderRenderings.GetAllowedRenderings from here but with fake placeholderKey
// i.e. the placeholder without the Guid
Item placeholderItem = null;
if (ID.IsNullOrEmpty(args.DeviceId))
{
placeholderItem = Client.Page.GetPlaceholderItem(placeholderKey, args.ContentDatabase,
args.LayoutDefinition);
}
else
{
using (new DeviceSwitcher(args.DeviceId, args.ContentDatabase))
{
placeholderItem = Client.Page.GetPlaceholderItem(placeholderKey, args.ContentDatabase,
args.LayoutDefinition);
}
}
List<Item> collection = null;
if (placeholderItem != null)
{
bool allowedControlsSpecified;
args.HasPlaceholderSettings = true;
collection = this.GetRenderings(placeholderItem, out allowedControlsSpecified);
if (allowedControlsSpecified)
{
// Hide the Layout/Rendering tree to show the Allowed Renderings
args.Options.ShowTree = false;
}
}
if (collection != null)
{
if (args.PlaceholderRenderings == null)
{
args.PlaceholderRenderings = new List<Item>();
}
args.PlaceholderRenderings.AddRange(collection);
}
}
}
This seems to be a change introduced in Sitecore 7.
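
To see what the key-stripping step at the top of Process does on its own, the regex can be tried outside Sitecore. This sketch uses no Sitecore types. DynamicKeySketch, GetBaseKey and the sample key in the usage note are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class DynamicKeySketch
{
    // Same pattern as above: any text followed by a GUID in braces.
    public const string DynamicKeyRegex =
        @"(.+){[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}}";

    // Returns the key without its trailing {GUID},
    // or null when the key is not a dynamic placeholder key.
    public static string GetBaseKey(string placeholderKey)
    {
        Match match = Regex.Match(placeholderKey, DynamicKeyRegex);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

GetBaseKey("/main/column{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}") returns "/main/column", which is the key the placeholder settings lookup then receives. A key without a GUID suffix returns null, matching the early return in Process.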

How to do pagination in couchdb using loveseat in c#

I need to do page load on scroll down in my application. I am using couchdb as my back end and I found a pagination option in couchdb which I think would satisfy my issue.
The thing is I can't find any working examples for pagination anywhere. I need someone's help in making my application work with this one.
Take a look at this for reference: https://github.com/soitgoes/LoveSeat/blob/master/LoveSeat/PagingHelper.cs
This is my code. I am getting an error in the options = model.GetOptions(); line, saying "object reference not set to an instance of an object".
public List<newVO> Getdocs(IPageableModel model)
{
List<newVO> resultList = new List<newVO>();
var etag = "";
ViewOptions options = new ViewOptions();
options = model.GetOptions();
options.StartKeyDocId = lastId;
options.Limit = 13;
options.Skip = 1;
var result = oCouchDB.View<newVO>("GetAlldocs", options);
//model.UpdatePaging(options, result);
if (result.StatusCode == HttpStatusCode.NotModified)
{
response.StatusCode = "0";
return null;
}
if (result != null)
{
foreach (newVO newvo in result.Items)
{
resultList.Add(newvo );
}
}
return resultList;
}
Thanks in advance. All ideas are welcome.

I've not used the LoveSeat paging implementation, but you can use the Limit and Skip properties on the ViewOptions to achieve paging:
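// Must be declared inside a static class, as for any extension method
public static IEnumerable<T> GetPage<T>(this ICouchDatabase couchDatabase,
string viewName,
string designDoc,
int page,
int pageSize)
{
// page is zero-based: page 0 skips nothing, page 1 skips the first pageSize rows
return couchDatabase.View<T>(viewName, new ViewOptions
{
Skip = page * pageSize,
Limit = pageSize
}, designDoc).Items;
}
This simple extension method will get a page of data from a CouchDB view.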
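
The skip/limit arithmetic can be checked without a database. The sketch below is an in-memory stand-in rather than LoveSeat code: it applies the same zero-based page calculation to an ordinary sequence, with LINQ's Skip and Take playing the roles of the Skip and Limit view options. PagingSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns the rows of a zero-based page: skips page * pageSize rows,
    // then takes at most pageSize rows.
    public static List<T> GetPage<T>(IEnumerable<T> rows, int page, int pageSize)
    {
        return rows.Skip(page * pageSize).Take(pageSize).ToList();
    }
}
```

With 30 rows numbered 1 to 30 and a page size of 13, page 0 holds rows 1 to 13, page 1 holds rows 14 to 26, and page 2 holds the remaining rows 27 to 30.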
