Find an element on a web page using C#

Hello!
My goal is to retrieve data from HPE Service Manager (HPSM). I use the WebBrowser component because it seems the easiest way to me. Authorization and opening the search page worked without problems, but things turned strange as I proceeded.
I need to find the element with id "X13" on the page, but
wb.Document.GetElementById("X13");
returns null, although executing getElementById in the IE and Chrome dev consoles finds it.
How can I find the element I need using C# and the WebBrowser component?
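No answer to this question is recorded here. Two common causes of this symptom are calling GetElementById before the page has finished loading, and the element living inside a frame, which the top-level document does not search. Whether HPSM actually renders "X13" inside a frame is an assumption. A minimal sketch under that assumption, with FrameSearch and FindById as hypothetical helper names:

```csharp
using System;
using System.Windows.Forms;

static class FrameSearch
{
    // Depth-first search for an element id in a document and all of its frames.
    public static HtmlElement FindById(HtmlDocument doc, string id)
    {
        if (doc == null) return null;

        HtmlElement hit = doc.GetElementById(id);
        if (hit != null) return hit;

        HtmlWindow window = doc.Window;
        if (window == null || window.Frames == null) return null;

        foreach (HtmlWindow frame in window.Frames)
        {
            try
            {
                HtmlElement inner = FindById(frame.Document, id);
                if (inner != null) return inner;
            }
            catch (UnauthorizedAccessException)
            {
                // A frame from another domain may refuse access; skip it.
            }
        }
        return null;
    }
}
```

It would typically be called from the control's DocumentCompleted handler, for example FrameSearch.FindById(wb.Document, "X13"). If the page builds the element with script after loading, the call may still return null and need to be retried later.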

Related

Execute script with HtmlAgilityPack [duplicate]

I'm trying to scrape a particular webpage which works as follows.
First the page loads, then it runs some sort of javascript to fetch the data it needs to populate the page. I'm interested in that data.
If I get the page with HtmlAgilityPack, the script doesn't run, so I get what is essentially a mostly blank page.
Is there a way to force it to run a script, so I can get the data?
You are getting what the server is returning - the same as a web browser. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only - it has no way to interpret the javascript or bind it to its internal representation of the document. If you wanted to run the script you would need a web browser. The perfect answer to your problem would be a complete "headless" web browser. That is something that incorporates an HTML parser, a javascript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser, except without the rendering part of it. At this time there isn't such a thing that works entirely within the .NET environment.
Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.
Also see my answer to a similar question: Load a DOM and Execute javascript, server side, with .Net which discusses the available technology in .NET to do this. Most of the pieces exist right now but just aren't quite there yet or haven't been integrated in the right way, unfortunately.
You can use Awesomium for this, http://www.awesomium.com/. It works fairly well but has no support for x64 and is not thread safe. I'm using it to scan some web sites 24x7 and it's running fine for at least a couple of days in a row but then it usually crashes.

phantomjs not finding element mvc

I have an MVC app to test, and my code works with Chrome but not with PhantomJS. It can't find a simple input control with id = "password" on the logon page. I've tried different selectors, XPath, by class, by ID, even different controls on the page. It can find the "body" tag and the very next inside tag "app-controller", but nothing is working to find anything past the next level in, the view-manager tag.
I've also tried driver.waitforpageload and thread.sleep to make sure things are loaded first, to no avail. Any ideas?
It can't find a simple input control with id = "password" on the logon
page.
and
It can find the "body" tag and the very next inside tag
"app-controller", but nothing is working to find anything past the
next level in, the view-manager tag.
and
I've even tested google.com to see if it was even loading and it can
find controls on that page successfully.
Considering the above, I'm guessing that the login page is the page your site shows right after navigating to the URL.
I encountered the same issue, which at first led me to think that PhantomJS could not identify the elements on the page. That was until I started debugging.
Because PhantomJS loads everything headless, more or less in its own 'container', it turned out that it wasn't even able to load my test site, because it didn't have all the necessary prerequisites satisfied.
To debug this and check whether you face the same issue, print the content of your page to a log file.
You can achieve this by using: var pageSource = driver.PageSource;
It's highly likely that the last node present will be the body element.
If that's the case, you need to see what prerequisites are required for your site to start working: authentication, HTTPS certificates, etc.
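The debugging step described above can be sketched as follows. The helper takes the page source as a plain string, which is assumed to come from driver.PageSource. PageDump, DumpAndCheck, the log path and the marker text are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

static class PageDump
{
    // Writes the page source to a log file and reports whether the
    // given marker (for example id="password") appears in it.
    public static bool DumpAndCheck(string pageSource, string logPath, string marker)
    {
        File.WriteAllText(logPath, pageSource);
        return pageSource.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

A call such as PageDump.DumpAndCheck(driver.PageSource, "phantom-page.html", "id=\"password\"") returning false suggests that the page never rendered the login form, rather than the locator being wrong.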

Selenium webdriver C# - Unable to find the element in a grid developed using angular UI

I am trying to automate a web application developed with AngularJS through Selenium WebDriver (C#). I am trying to click on a cell in an Angular UI grid; I tried finding it by CSS selector and by XPath, but neither helped.
The CSS selector contains a dynamically generated ID: #\31 460691734316-0-uiGrid-00KQ-cell > div
The XPath is also dynamic: //*[@id="1460691734316-0-uiGrid-00KQ-cell"]/div
I also tried
driver.FindElements(By.CssSelector("*[id^='1460'][id$='cell']"));
but it didn't help.
Any help will be highly appreciated. I can send more details if needed.
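One way to avoid depending on the generated prefix is to match only the part of the ID that looks stable. The sketch below assumes the ID has the form prefix-rowIndex-uiGrid-columnUid-cell, which is inferred from the single example above and not confirmed. GridSelectors and CellDiv are hypothetical names.

```csharp
using System;

static class GridSelectors
{
    // Builds a CSS selector for the <div> inside a grid cell,
    // matching on the ID suffix and ignoring the generated prefix.
    public static string CellDiv(int rowIndex, string columnUid)
    {
        return string.Format("[id$='-{0}-uiGrid-{1}-cell'] > div", rowIndex, columnUid);
    }
}
```

The result can be passed to By.CssSelector, for example driver.FindElement(By.CssSelector(GridSelectors.CellDiv(0, "00KQ"))). If the grid sits inside an iframe, no selector will match until the driver has switched to that frame.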
For my particular problem, with an HTML page containing iframes and developed with AngularJS, the following trick saved me a lot of time. In the DOM I clearly saw that there is an iframe which wraps all the content, so the following code was supposed to work:
driver.switchTo().frame(0);
waitUntilVisibleByXPath("//h2[contains(text(), 'Creative chooser')]");
But it did not work and reported something like "Cannot switch to frame. Window was closed". Then I modified the code to:
driver.switchTo().defaultContent();
driver.switchTo().frame(0);
waitUntilVisibleByXPath("//h2[contains(text(), 'Creative chooser')]");
After this everything went smoothly. So evidently Angular was doing something with the iframes, and just after the page loaded, when you would expect the driver to be focused on the default content, it was actually focused on some frame that Angular had already removed. Hope this helps some of you.
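The snippets in this answer use the Java bindings, while the question is about C#. A rough C# equivalent is sketched below. It assumes the Selenium WebDriver and Support packages are referenced. FrameHelper and WaitInFirstFrame are hypothetical names, and the wait only stands in for the answerer's own waitUntilVisibleByXPath helper, whose implementation is not shown.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

static class FrameHelper
{
    // Resets focus to the top-level document, enters the first iframe,
    // then waits until an element matching the XPath is displayed.
    public static IWebElement WaitInFirstFrame(IWebDriver driver, string xpath, TimeSpan timeout)
    {
        driver.SwitchTo().DefaultContent();
        driver.SwitchTo().Frame(0);

        var wait = new WebDriverWait(driver, timeout);
        return wait.Until(d =>
        {
            IWebElement el = d.FindElement(By.XPath(xpath));
            return el.Displayed ? el : null; // null keeps the wait polling
        });
    }
}
```

For the page in this answer the call would look like FrameHelper.WaitInFirstFrame(driver, "//h2[contains(text(), 'Creative chooser')]", TimeSpan.FromSeconds(10)), where the ten second timeout is an arbitrary choice.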
What about trying to find the element with Selenium IDE, which is a Firefox plugin?
In the IDE, you can easily find the selector by selecting the element in the GUI.
Rather than identifying the element specifically by its ID, could you use the elements around it instead? Is this cell within a table and at a consistent position? Is there a parent element you could more consistently select and iterate through the children in your C# program to identify the appropriate cell you're looking for?

Parse links from WebBrowser if source code is not updated

Here is my problem: I need to parse links from a site. Everything would be fine, but the links are rendered by a script and do not appear in the page source. More precisely, they do appear, but they are the old ones.
Here is the site: http://54.join.ru/resume?q=
I need to parse the links to the resumes. The first page works fine. But when you go to some other page, for example page 5, the resumes change while the source code still contains the old links, i.e. those that were on the first page.
Can anybody suggest how I can parse the new links? I am writing in C# using WebBrowser.
Use Selenium WebDriver.
Selenium-WebDriver was developed to better support dynamic web pages
where elements of a page may change without the page itself being
reloaded.
Thus you will be able to access elements on a web page that has been changed dynamically by javascript.
The following code, for example, finds an element by a given class name:
IWebElement we = driver.FindElement(By.ClassName("ra-elements-list__new-window-link"));

C# WebKit and CKEditor iframe scraping

I am currently working on a project that involves using a web browser control in C#. After many struggles I managed to integrate WebKit into a WinForms application and run a website with CKEditor in it, but it gave me two issues.
1. The image uploader works fine, but it doesn't send a callback, or WebKit cannot process it. Is there any way to make it work?
2. When I try to scrape the HTML document to get the iframe with webKitBrowser1.Document.GetElementById("cke_1_contents").LastChild, I get the iframe element, but I have no idea how to get its content, because it says that it doesn't have any children.
Can anyone suggest what to do next, or offer an alternative?
I use VS2008 and .NET 3.5.
I can't answer this question in the context of the WebKit-based control, but I suggest that you try the native WinForms WebBrowser control. It works great as the host for CKEditor, once the WebBrowser Feature Control has been implemented.
Then, if I was to do web-scraping on a page with CKEditor, I'd try something like this to get the current editor content (from C#):
dynamic pageDocument = webBrowser.Document.DomDocument;
var ckeDocument = pageDocument.getElementsByClassName("cke_wysiwyg_frame").item(0).contentDocument;
MessageBox.Show((string)ckeDocument.documentElement.outerHTML);
