Read a full page with a dynamically loaded ASPX form in C#

I need to read this page from a WCF service:
http://bvmf.bmfbovespa.com.br/cias-listadas/empresas-listadas/ResumoEmpresaPrincipal.aspx?codigoCvm=9512&idioma=pt-br
Specifically, I want to read a node that the server generates dynamically, the one with class="ficha responsive".
When I use a method like
HtmlDocument doc = web.Load("http://bvmf.bmfbovespa.com.br/cias-listadas/empresas-listadas/ResumoEmpresaPrincipal.aspx?codigoCvm=9512&idioma=pt-br");
I do not get the full page, because the page loads that content dynamically through this form:
form name="aspnetForm"
method="post"
action="ResumoEmpresaPrincipal.aspx?codigoCvm=9512&idioma+=+pt+-+br&idioma=pt-br"
id="aspnetForm"
How can I load the full page, or post data to this web form, in C#? Or is there another way to load the full HTML content?

The solution for reading the full page content is in this post:
Scraping webpage generated by javascript with C#
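Once the rendered HTML is available as a string (for example from a browser control, as the linked post describes), the node from the question could be picked out with HtmlAgilityPack roughly as follows. This is only a sketch: FichaExtractor and ExtractFicha are hypothetical names, and it assumes the target is the first element whose class list contains "ficha".

```csharp
using System;
using HtmlAgilityPack;

public static class FichaExtractor
{
    // Hypothetical helper: returns the trimmed inner text of the first element
    // whose class list contains "ficha", or null when there is no such element.
    public static string ExtractFicha(string renderedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(renderedHtml);

        // The concat/normalize-space idiom matches "ficha" as a whole class token,
        // so class="ficha responsive" matches but class="fichario" does not.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' ficha ')]");

        return node?.InnerText.Trim();
    }
}
```

The XPath runs against whatever string is passed in, so it only finds the node if that string already contains the dynamically generated markup.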

Related

Html Agility Pack, Web scraping [duplicate]

How can I scrape data that is dynamically generated by JavaScript in an HTML document using C#?
Using WebRequest and HttpWebResponse in C#, I'm able to get the whole HTML source code as a string, but the difficulty is that the data I want isn't contained in the source code; it is generated dynamically by JavaScript.
On the other hand, if the data I want is already in the source code, I can extract it easily using regular expressions.
I have downloaded HtmlAgilityPack, but I don't know whether it handles the case where items are generated dynamically by JavaScript.
Thank you very much!
When you make the WebRequest, you're asking the server to give you the page file. That file's content hasn't yet been parsed or executed by a web browser, so the JavaScript on it hasn't done anything yet.
You need a tool that executes the JavaScript on the page if you want to see what the page looks like after a browser has processed it. One option is the built-in .NET WebBrowser control: http://msdn.microsoft.com/en-au/library/aa752040(v=vs.85).aspx
The WebBrowser control can navigate to and load the page, and you can then query its DOM, which will have been altered by the JavaScript on the page.
EDIT (example):
Uri uri = new Uri("http://www.somewebsite.com/somepage.htm");
webBrowserControl.AllowNavigation = true;
// Optional, but I use this because it stops JavaScript errors from breaking the scraper.
webBrowserControl.ScriptErrorsSuppressed = true;
// Start scraping only after the document has finished loading, so do the work in the handler registered here.
webBrowserControl.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowserControl_DocumentCompleted);
webBrowserControl.Navigate(uri);

private void webBrowserControl_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlElementCollection divs = webBrowserControl.Document.GetElementsByTagName("div");
    foreach (HtmlElement div in divs)
    {
        // do something
    }
}
You could also take a look at a tool like Selenium for scraping pages that use JavaScript.
http://www.andykelk.net/tech/headless-browser-testing-with-phantomjs-selenium-webdriver-c-nunit-and-mono

Include HTML webform in ASP/C# on page load

I made a web form in HTML, and I have a website in C#.
I would like this form to show up every time the page is loaded.
What is the best way to integrate, include, or call the form?
Which page do I have to modify: Default.aspx or Default.aspx.cs?
The purpose of this project is to show the form whenever the cookie is not set, as checked in the aspx code.
I guess I have to modify the aspx part that checks whether the cookie value is set, and show or hide the web form based on that value.
You could do this using a combination of JavaScript (to check whether cookies are enabled) and jQuery. If cookies aren't enabled, have a placeholder DIV that can hold the HTML content you want to show. Then use $.ajax (http://api.jquery.com/jquery.ajax/) from the browser to load the HTML content, and set the DIV's innerHTML property to the returned HTML.
Hope this works for you!
Some progress.
This is how I modified my pages for my needs. With these changes the HTML web form displays correctly.
In the head of default.master I added:
In default.aspx I added:
with the entire HTML of my HTML page (html tag included).
Now I need to modify this page so that the pop-up is shown only if a cookie value is not set.
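The remaining step, showing the form only when the cookie has no value, comes down to a check like the one sketched below. This is an illustration under assumptions, not the poster's code: CookieGate, ShouldShowForm and the cookie name "formSeen" are hypothetical, and the helper works on the raw Cookie request header so it can be run on its own.

```csharp
using System;

public static class CookieGate
{
    // Hypothetical helper: returns true when the raw Cookie request header
    // carries no non-empty value for cookieName, i.e. the form should be shown.
    public static bool ShouldShowForm(string cookieHeader, string cookieName)
    {
        if (string.IsNullOrWhiteSpace(cookieHeader))
            return true;

        // A Cookie header looks like "name1=value1; name2=value2".
        foreach (string pair in cookieHeader.Split(';'))
        {
            string[] parts = pair.Trim().Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == cookieName && parts[1].Length > 0)
                return false;
        }
        return true;
    }
}
```

In a Web Forms code-behind the same decision is usually made against Request.Cookies, for example by testing whether Request.Cookies["formSeen"] is null in Page_Load and setting the Visible property of the control that wraps the form accordingly.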

C# getting the HTML document from a Javascript Iframe

I am trying to access the webpage http://www.pof.com from C# code.
After logging in successfully as a user, I found that the document is loaded inside an iframe, and I am not familiar with how to access it.
All I want to do is get the HTML of the page that is loaded in the iframe and follow some of the links on that site.
Use the following code:
document.getElementById('iframe1').contentWindow.document
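or, to look up an element inside the iframe,
var iframeDocument = document.getElementById('iframe1').contentWindow.document;
var elemVal;
if (iframeDocument) {
    // 'someElementId' is a placeholder for the id of the element you want inside the iframe
    elemVal = iframeDocument.getElementById('someElementId');
}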

How to extract dynamic ajax content from a web page

My requirement is to extract specific content from a web page. The page has a section that is populated using AJAX. When I view the page source, it does not show the content loaded using AJAX. The section content changes based on which check box is selected. If we select the 'India' check box, the section displays all the details for India. The page source shows only the default content, not the content displayed using AJAX. I checked the page source after selecting the check box, and it still shows only the default value. How can I get that section's content?
In C# you can use HtmlAgilityPack to crawl the data, but webBrowser.DocumentText does not include the AJAX-loaded content, so you cannot query that content with XPath. Instead, wait until the WebBrowser control has loaded the page completely, then add the code below to the DocumentCompleted handler:
// IHTMLDocument2 comes from the mshtml interop assembly.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
IHTMLDocument2 currentDoc = (IHTMLDocument2)this.webBrowser1.Document.DomDocument;
doc.LoadHtml(currentDoc.activeElement.innerHTML);
Use Firebug in Firefox. Under the Net tab you will see the extra content being loaded.
