I am using this code:
HttpWebResponse objHttpWebResponse = (HttpWebResponse)objHttpWebRequest.GetResponse();
return new StreamReader(objHttpWebResponse.GetResponseStream()).ReadToEnd();
I get the page content successfully, but my problem is that some dynamic content on the page is populated by JavaScript functions, and it seems that the content is fetched before those functions finish executing, so those parts of the page come back unpopulated. Is there any way to solve this, i.e. wait until the page is completely loaded, including all of its content?
Edit:
Regarding "#ElDog" answer, i tried the following code but with no luck to:
WebBrowser objWebBrowser = new WebBrowser();
objWebBrowser.DocumentCompleted += objWebBrowser_DocumentCompleted;
objWebBrowser.Navigate(url);
and in the DocumentCompleted event I executed the following code:
string content = ((WebBrowser)(sender)).Document.Body.InnerHtml;
But the JavaScript functions still didn't execute.
HttpWebRequest is not going to execute JavaScript at all. It just gives you what a web browser gets in the raw response. To execute JavaScript you would need to embed a web browser (or browser emulation) in your code.
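For reference, a rough sketch of that idea with the WinForms WebBrowser control. This is an assumption-laden sketch, not a drop-in fix: it must run inside a WinForms app on an STA thread, and it polls a hypothetical element with id "content" to detect when the page's scripts have finished filling in the data.

using System;
using System.Windows.Forms;

public class RenderedPageForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { ScriptErrorsSuppressed = true };

    public RenderedPageForm(string url)
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted fires when the document has loaded, not when every script has finished,
        // so keep polling until the element the scripts populate (hypothetical id "content") has data.
        var timer = new Timer { Interval = 500 };
        timer.Tick += delegate
        {
            if (browser.Document == null) return;
            HtmlElement marker = browser.Document.GetElementById("content");
            if (marker != null && !string.IsNullOrEmpty(marker.InnerHtml))
            {
                timer.Stop();
                string renderedHtml = browser.Document.Body.InnerHtml; // now includes script-generated content
                // ... use renderedHtml here ...
            }
        };
        timer.Start();
    }
}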
Related
I am creating a website in Visual Studio 2010. I need to open a new form and send information to it from the first form. I used a text file (I write from the first page to the file and read the file in the new form) and this worked, but I want to make the connection with a GET/POST request instead. I got this code from How to make an HTTP POST web request.
The project compiles, but the request exceeds the time limit. Below I have attached the code and the error.
Code from first page
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var postData = text;
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
var response = (HttpWebResponse)request.GetResponse();
var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Code from second page
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var response = (HttpWebResponse)request.GetResponse();
var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Error
Operation timed out
Description: An unhandled exception occurred while executing the current web request. Examine the stack trace for more information about this error and the code snippet that caused it.
Exception Details: System.Net.WebException: The operation timed out
Source error:
136: }
137:
138: var response = (HttpWebResponse)request.GetResponse(); // Error here
139:
140: var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
I also tried the second variant from that source, but I get the same error. Please help.
So there are quite a few ways to send data and "things" from one web page to the next.
Session() is certainly one possible way.
Another is to use parameters in the URL; you often see that on many web sites.
Even as I write this post - we see the URL on StackOverFlow as this:
stackoverflow.com/questions/66294186/http-request-get-post?noredirect=1#comment117213494_66294186
So, the above is how stack overflow is passing values.
So Session() and parameters in the URL are common.
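For completeness, a minimal sketch of those two approaches (the page names, keys, and the txtCompany control here are hypothetical):

// Passing a value with Session():
// On the first page, before redirecting:
Session["Company"] = txtCompany.Text;
Response.Redirect("Page2.aspx");
// On the second page:
string company = (string)Session["Company"];

// Passing a value as a URL parameter:
// On the first page:
Response.Redirect("Page2.aspx?company=" + Server.UrlEncode(txtCompany.Text));
// On the second page:
string company = Request.QueryString["company"];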
However, ASP.NET has a "feature" in which you can pass the previous page to the next. It then becomes a simple matter to pluck/get/grab/use things from that first page in the next page you load. This feature is part of ASP.NET, and it does all the dirty work of passing that previous page for you!!!
Passing and getting values from the previous page is about as common a need as they come, and sure enough it has been dealt with: it is a built-in feature of ASP.NET.
So a really easy approach is this: when you click a button and jump to the next page, if things are set up correctly, you can simply use the "previous" page!!!
You can do this on page load:
if (IsPostBack == false)
{
TextBox txtCompany = (TextBox)PreviousPage.FindControl("txtCompany");
Debug.Print("Value of text box company on previous page = " + txtCompany.Text);
}
This approach is nice, since you don't really have to decide ahead of time if you want 2 or 20 values from controls on the previous page - you really don't care.
How does this work?
The previous page is ONLY valid with one of two approaches.
First way:
The button you drop on the form will often have "code behind" that of course jumps or goes to the next page in question.
That command (in code behind) is typically this:
Response.Redirect("some aspx web page to jump to")
The above does NOT pass the previous page.
However, if you use this:
Server.Transfer("some aspx web page to jump to")
Then the previous page IS PASSED and you can use it!!!!
So in the next page, in the page load event, you can use "PreviousPage" as per above.
So Server.Transfer("to the next page") WILL ALLOW use of "previous page" in your code.
So you can pick up any control, any value. You can even reference, say, a gridview and the row the user has selected. In effect the whole previous page is transferred and available for use via "PreviousPage" as per above. You can NOT grab viewstate, but you can set up public methods in that previous page to expose members of viewstate if that is also required.
You will of course have to use FindControl, but it is the previous page.
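As a sketch of that idea (the page and control names here are hypothetical), the first page can expose whatever you need as a public property, and the second page can read it without FindControl if it declares a PreviousPageType directive pointing at the first page:

// On the first page (Form1.aspx.cs): expose a value as a public property.
public partial class Form1 : System.Web.UI.Page
{
    public string CompanyName
    {
        get { return txtCompany.Text; }   // txtCompany is a TextBox on Form1.aspx
    }
}

// On the second page (Form2.aspx), add:  <%@ PreviousPageType VirtualPath="~/Form1.aspx" %>
// Then in Form2.aspx.cs, on the first load after Server.Transfer() or a PostBackUrl post:
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack && PreviousPage != null)
    {
        string company = PreviousPage.CompanyName;   // strongly typed, no FindControl needed
    }
}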
The other way (to allow use of previous page).
You don't use code behind to trigger the jump to the new page (with Server.Transfer()), but instead you set the post-back URL on the button in that first page. That is WHAT the post-back URL is for!!! (to pass the current page to the post-back URL).
eg this:
<asp:Button ID="Button1" runat="server" Text="View Hotels"
PostBackUrl="~/HotelGrid.aspx" />
So you use the "post back" URL feature of the button.
Now, when you click on that button, it will jump to the 2nd page, and once again previous page can be used as per above. And of course with post-back URL set, then of course you don't need a code behind stub to jump to that page.
So this is very much a "basic" feature of asp.net, and is a built-in means to transfer the previous page to the next. Kind of like asp.net "101".
So this perhaps most common, basic need to pass values from a previous web page is not only built in, it is in fact called "PreviousPage"!!!
Rules:
Previous page only works if you use a Server.Transfer("to the page")
Response.Request("to the page") does NOT allow use of previous page.
Many other controls besides the button also have a post-back URL setting - and again, if navigation happens through a control's post-back URL, then use of the previous page is allowed because that control caused the page navigation.
The previous page can ONLY be used on the first page load (IsPostBack == false).
Using post-back URL in a button of course means a code behind stub is not required for the page jump. And once again, using post-back URL will ensure that page previous can be used in the next page.
However, there are cases in which you don't want to hard-code the URL, or some additional logic has to run in that button's code stub before such navigation to the next page (or to decide whether it happens at all).
Then the post-back URL is not all that practical, but you can resort to Server.Transfer() in that code behind, and AGAIN this allows use of the built-in "previous page".
Just keep in mind that whatever you need/want/will grab from the previous page HAS to occur on the FIRST page load of the page we jumped to. On any additional button post back, with the regular life cycle and use of controls and events in code behind on that page, you will NOT have use of the previous page AFTER the first page load has occurred (previous page will be null and empty).
You can try it this way:
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var postData = text;
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
HttpWebResponse httpResponse = (HttpWebResponse)request.GetResponse();
string result;
using (StreamReader streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
    result = streamReader.ReadToEnd();
}
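On the receiving page (Form2.aspx in the question), the posted body then has to be read out of the request. A small sketch, assuming the sender builds the body as a named form field (e.g. var postData = "text=" + HttpUtility.UrlEncode(text);) rather than raw text:

// In Form2.aspx.cs
protected void Page_Load(object sender, EventArgs e)
{
    // With Content-Type application/x-www-form-urlencoded and a body of "text=...",
    // the value shows up in the Form collection.
    string posted = Request.Form["text"];   // null if the field was not sent
    if (posted != null)
    {
        // ... use the posted value ...
    }
}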
Is there a way to get the fully rendered html of a web page using WebClient instead of the page source? I'm trying to scrape some data from the page's html. My current code is like this:
WebClient client = new WebClient();
var result = client.DownloadString("https://somepageoutthere.com/");
//using CsQuery
CQ dom = result;
var someElementHtml = dom["body > main"];
WebClient will only return the raw response of the URL you requested. It will not run any JavaScript on the page (JavaScript runs on the client), so if JavaScript is changing the page DOM in any way, you will not get that through WebClient.
You are better off using some other tools. Look for those that will render the HTML and execute the JavaScript in the page.
I don't know what you mean by "fully rendered", but if you mean "with all data loaded by ajax calls", the answer is: no, you can't.
The data which is not present in the initial html page is loaded through javascript in the browser, and WebClient has no idea what javascript is, and cannot interpret it, only browsers do.
To get this kind of data, you need to identify these calls (if you don't know the URL of the data web service, you can use tools like Fiddler), simulate/replay them from your application, and then, if successful, get the response data and extract what you need from it (this will be easy if the data comes as JSON, and more tricky if it comes as HTML).
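A minimal sketch of that replay idea (the endpoint URL and headers here are purely hypothetical; substitute whatever Fiddler shows the page actually calling):

using System.Net;

using (var client = new WebClient())
{
    // Mimic what the browser sent for the AJAX call; many handlers check these headers.
    client.Headers[HttpRequestHeader.Accept] = "application/json";
    client.Headers["X-Requested-With"] = "XMLHttpRequest";
    client.Headers[HttpRequestHeader.Referer] = "https://somepageoutthere.com/";
    // Hypothetical data endpoint discovered with Fiddler:
    string json = client.DownloadString("https://somepageoutthere.com/api/data?page=1");
    // Parse the JSON with your preferred library (e.g. Json.NET) and extract the fields you need.
}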
Better to use http://html-agility-pack.net.
It has all the functionality needed to scrape web data, and there is good help on the site.
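For example, a minimal sketch with the Html Agility Pack (note that, like WebClient, it only parses the served HTML and does not execute JavaScript; the selector below is just an illustration):

using HtmlAgilityPack;

var web = new HtmlWeb();
HtmlDocument doc = web.Load("https://somepageoutthere.com/");
// XPath query; "//body/main" is an illustrative selector for the element you want.
HtmlNode main = doc.DocumentNode.SelectSingleNode("//body/main");
if (main != null)
{
    string someElementHtml = main.InnerHtml;
}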
In summary, what I'm trying to do is "open" a page with driver.Navigate().GoToUrl("http://somepage.com") and then immediately block the response from "http://somepage.com/something.asmx/GetStuff", so that I can verify that some element has a given class before the response is loaded, e.g. driver.FindElement(By.CssSelector("button.some-button")).GetAttribute("class").Contains("disabled"), and then check the disabled state.
Is something like this possible, and if so, how do I go about it?
My question is similar to Selenium Webdriver c# without waiting for page to load in what it's trying to achieve.
Cast your instance of IWebDriver (FirefoxDriver, ChromeDriver, etc.) to IJavaScriptExecutor and replace the jQuery $.ajax() method with a stub, such as:
var driver = Driver as IJavaScriptExecutor;
driver.ExecuteScript("window.originalAjax = $.ajax;");
driver.ExecuteScript("$.ajax = function() {};");
// navigate to the page, check the class
driver.ExecuteScript("$.ajax = window.originalAjax;");
So when your request calls into $.ajax it will hit a blank method.
This has the downside that you cannot easily get the request to 'continue' after blocking it, as no request was ever created. You would have to refresh the page without doing the above steps which could give some sort of false positive.
So I'm trying to read the source of a URL, let's say domain.xyz. No problem, I can simply get it to work using HttpWebRequest.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
My problem is that it will return the page source, but without the source of the iframe inside this page. I only get something like this:
<iframe src="http://anotherdomain.xyz/frame_that_only_works_on_domain_xyz"></iframe>
I figured out that I can easily get the src of the iframe with WebBrowser, or basic string functions (the results are the same), and create another HttpWebRequest using that address. The problem is that if I view the full page (where the frame was inserted) in a browser (Chrome), I get the expected results. But if I copy the src to another tab, the contents are not the same. It says that the content I want to view is blocked because it's only allowed through domain.xyz.
So my final question is:
How can I simulate the request through a specified domain, or get the full, rendered page source?
That's likely the referer property of the web request: typically a browser tells the web server where it found the link to the page it is requesting.
That means, when you create the web request for the iframe, you set the referer property of that request to the page containing the link.
If that doesn't work, cookies may be another option. I.e. you have to collect the cookies sent for the first request, and send them with the second request.
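A minimal sketch of both ideas combined (the URLs are the placeholders from the question; the key parts are sharing one CookieContainer between the two requests and setting Referer on the second one):

using System.IO;
using System.Net;

var cookies = new CookieContainer();

// First request: the page that embeds the iframe.
var pageRequest = (HttpWebRequest)WebRequest.Create("http://domain.xyz/");
pageRequest.CookieContainer = cookies;                  // collect any cookies the site sets
string pageHtml;
using (var response = (HttpWebResponse)pageRequest.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    pageHtml = reader.ReadToEnd();
}

// Second request: the iframe src extracted from pageHtml.
var frameRequest = (HttpWebRequest)WebRequest.Create("http://anotherdomain.xyz/frame_that_only_works_on_domain_xyz");
frameRequest.CookieContainer = cookies;                 // replay the same cookies
frameRequest.Referer = "http://domain.xyz/";            // tell the server we came from the embedding page
using (var response = (HttpWebResponse)frameRequest.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string frameHtml = reader.ReadToEnd();
}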
I'm downloading a web site using WebClient
public void download()
{
client = new WebClient();
client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
client.Encoding = Encoding.UTF8;
client.DownloadStringAsync(new Uri(eUrl.Text));
}
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
SaveFileDialog sd = new SaveFileDialog();
if (sd.ShowDialog() == DialogResult.OK)
{
StreamWriter writer = new StreamWriter(sd.FileName,false,Encoding.Unicode);
writer.Write(e.Result);
writer.Close();
}
}
This works fine. But I am unable to read content that is loaded using ajax. Like this:
<div class="center-box-body" id="boxnews" style="width:768px;height:1167px; ">
loading .... </div>
<script language="javascript">
ajax_function('boxnews',"ajax/category/personal_notes/",'');
</script>
This "ajax_function" downloads data from server on the client side.
How can I download the full web html data?
To do so, you would need to host a Javascript runtime inside of a full-blown web browser. Unfortunately, WebClient isn't capable of doing this.
Your only option would be automation of a WebBrowser control. You would need to send it to the URL, wait until both the main page and any AJAX content has been loaded (including triggering that load if user action is required to do so), then scrape the entire DOM.
If you are only scraping a particular site, you are probably better off just pulling the AJAX URL yourself (simulating all required parameters), rather than pulling the web page that calls for it.
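A minimal sketch of that last approach, using the ajax_function call from the question (the base URL and any extra parameters are assumptions; check the actual request in your browser's network tab or Fiddler):

using System;
using System.Net;
using System.Text;

using (var client = new WebClient())
{
    client.Encoding = Encoding.UTF8;
    // The page calls ajax_function('boxnews', "ajax/category/personal_notes/", ''),
    // so request that relative URL directly (the base URL here is a placeholder).
    client.Headers["X-Requested-With"] = "XMLHttpRequest";   // many AJAX handlers check this
    string boxNewsHtml = client.DownloadString(new Uri("http://example.com/ajax/category/personal_notes/"));
    // boxNewsHtml is the fragment the script would have injected into <div id="boxnews">.
}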
I think you'd need to use a WebBrowser control to do this since you actually need the javascript on the page to run to complete the page load. Depending on your application this may or may not be possible for you -- note it's a Windows.Forms control.
When you visit a page in a browser, it:
1. downloads a document from the requested URL
2. downloads anything referenced by an img, link, script, etc. tag (anything that references an external file)
3. executes JavaScript where applicable.
The WebClient class only performs step 1. It encapsulates a single http request and response. It does not contain a script engine, and does not, as far as I know, find image tags, etc that reference other files and initiate further requests to obtain those files.
If you want to get a page once it's been modified by an AJAX call and handler, you'll need to use a class that has the full capabilities of a web browser, which pretty much means using a web browser that you can somehow automate server-side. The WebBrowser control does this, but it's for WinForms only, I think. I shudder to think of the security issues here, or the demand that would be placed on the server if multiple users are taking advantage of this facility simultaneously.
A better question to ask yourself is: why are you doing this? If the data you're really interested in is being obtained via AJAX (probably through a web service), why not skip the WebClient step and just go straight to the source?