I am creating a website in Visual Studio 2010. I need to open a new form and send information to it from the first form. I used a text file (I write from the first page to a file and read that file in the new form) and that worked. But I want to make the connection with a GET/POST request. I took this code from How to make an HTTP POST web request.
The project compiles, but the request exceeds the time limit. I have attached the code and the error below.
Code from first page
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var postData = text;
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
var response = (HttpWebResponse)request.GetResponse();
var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Code from second page
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var response = (HttpWebResponse)request.GetResponse();
var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Error
Operation timed out
Description: An unhandled exception occurred while executing the current web request. Examine the stack trace for more information about this error and the code snippet that caused it.
Exception Details: System.Net.WebException: The operation timed out
Source error:
136: }
137:
138: var response = (HttpWebResponse)request.GetResponse(); // Error here
139:
140: var responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
I also tried the second variant from that source, but I get the same error. Please help.
So there are quite a few ways to send data and "things" from one web page to the next.
Session() is certainly one possible way.
Another is to use parameters in the URL; you often see that on many web sites.
Even as I write this post - we see the URL on StackOverFlow as this:
stackoverflow.com/questions/66294186/http-request-get-post?noredirect=1#comment117213494_66294186
So, the above is how Stack Overflow passes values.
So Session() and parameters in the URL are both common approaches.
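For completeness, a minimal sketch of those two approaches in the pages' code behind (the control name and the key are made up for illustration):

// On the first page: Session, or a query-string parameter.
Session["Company"] = txtCompany.Text;                                           // Session approach
Response.Redirect("Form2.aspx?company=" + Server.UrlEncode(txtCompany.Text));   // query-string approach

// On the second page:
string fromSession = (string)Session["Company"];
string fromUrl = Request.QueryString["company"];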
However, ASP.NET has a "feature" in which you can pass the previous page to the next. So then it becomes a simple matter to pluck/get/grab/use things from that first page in the next page you load. This feature is PART of ASP.NET, and it will do all the dirty work of passing that previous page for you!
Hum, I wonder if people ever have to pass and get values from the previous page? I bet this most common idea MUST have been dealt with, right? Not only is this about as common as breathing air, it is also a feature of ASP.NET.
So a REALLY easy approach is this: you click on a button and jump to the next page in question. Well, if things are set up correctly, then you can simply use the "previous" page!
You can do this on page load:
if (IsPostBack == false)
{
// PreviousPage.FindControl returns a Control, so cast it to the concrete type
TextBox txtCompany = (TextBox)PreviousPage.FindControl("txtCompany");
Debug.Print("Value of text box company on previous page = " + txtCompany.Text);
}
This approach is nice, since you don't really have to decide ahead of time if you want 2 or 20 values from controls on the previous page - you really don't care.
How does this work?
The previous page is ONLY valid with one of two approaches.
First way:
The button you drop on the form will often have "code behind" that of course jumps or goes to the next page in question.
That command (in code behind) is typically this:
Response.Redirect("some aspx web page to jump to")
The above does NOT pass the previous page.
However, if you use this:
Server.Transfer("some aspx web page to jump to")
Then the previous page IS PASSED and you can use it!!!!
So in the next page, in the page load event, you can use PreviousPage as per above.
So Server.Transfer("to the next page") WILL ALLOW use of PreviousPage in your code.
So you can pick up any control, any value. You can even reference, say, a GridView and the row the user has selected. In effect the whole previous page is transferred and available through PreviousPage as per above. You can NOT grab ViewState directly, but you can set up public methods or properties in that previous page to expose members of ViewState if that is also required.
You will of course have to use FindControl, but it is the previous page.
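As a rough sketch of that last point about exposing members (the page, control, and property names below are hypothetical, not from the question): the first page exposes a public property, and the second page reads it through PreviousPage on its first load after a Server.Transfer().

using System;
using System.Diagnostics;

// Code-behind of the FIRST page (Form1.aspx.cs)
public partial class Form1 : System.Web.UI.Page
{
    // Expose a value (which could come from ViewState) to the next page.
    public string CompanyName
    {
        get { return txtCompany.Text; }
    }

    protected void cmdNext_Click(object sender, EventArgs e)
    {
        // Server.Transfer keeps PreviousPage available on the target page.
        Server.Transfer("Form2.aspx");
    }
}

// Code-behind of the SECOND page (Form2.aspx.cs)
public partial class Form2 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack && PreviousPage != null)
        {
            // Cast PreviousPage to the first page's type to use its public members
            // (instead of, or in addition to, FindControl).
            Form1 prev = PreviousPage as Form1;
            if (prev != null)
            {
                Debug.Print("Company from previous page = " + prev.CompanyName);
            }
        }
    }
}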
The other way (to allow use of previous page).
You don't use code behind to trigger the jump to the new page (with Server.Transfer()); instead you set the post-back URL on the button in that first page. That is WHAT the post-back URL is for (to pass the current page to the page at the post-back URL).
eg this:
<asp:Button ID="Button1" runat="server" Text="View Hotels"
PostBackUrl="~/HotelGrid.aspx" />
So you use the "post back" URL feature of the button.
Now, when you click on that button, it will jump to the 2nd page, and once again PreviousPage can be used as per above. And of course with the post-back URL set, you don't need a code behind stub to jump to that page.
So this is very much a "basic" feature of ASP.NET, and is a built-in means to transfer the previous page to the next. Kind of like ASP.NET "101".
So this perhaps common, in fact MOST COMMON, basic need to pass values from a previous web page is not only built in, it is in fact called "PreviousPage"!
Rules:
Previous page only works if you use a Server.Transfer("to the page").
Response.Redirect("to the page") does NOT allow use of the previous page.
The post-back URL setting exists on the button and in fact on many other controls; if the control that causes the page navigation has its post-back URL set, then use of the previous page is again allowed.
The previous page can ONLY be used on the first page load (IsPostBack == false).
Using post-back URL in a button of course means a code behind stub is not required for the page jump. And once again, using post-back URL will ensure that page previous can be used in the next page.
However, there are cases in which you don't want to hard code the URL, or perhaps some additional logic runs in that button code stub before the navigation to the next page (or decides whether it occurs at all).
Then ok post-back URL is not all that practical, but you can then resort to and use Server.Transfer() in that code behind, and AGAIN this allows use of the built in "previous page".
Just keep in mind that whatever you need/want/will grab from the previous page HAS to occur on the FIRST page load of the page you jumped to. On any additional button post back, the regular life cycle and use of controls and events in code behind on that page will NOT have use of the previous page after the first page load has occurred (PreviousPage will be null and empty).
You can try it that way.
var request = (HttpWebRequest)WebRequest.Create("http://localhost:55590/WebSite2/Form2.aspx");
var postData = text;
var data = Encoding.ASCII.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;
using (var stream = request.GetRequestStream())
{
stream.Write(data, 0, data.Length);
}
HttpWebResponse httpResponse = (HttpWebResponse)request.GetResponse();
string result;
using (StreamReader streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
result = streamReader.ReadToEnd();
}
Related
Here is the code below
List<int> j = new List<int>();
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(result.SiteURL);
webRequest.AllowAutoRedirect = false;
HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse();
j.Add((int)response.StatusCode);
What I want to do is get all the response codes, separate them (like 2xx, 3xx, 4xx-5xx), and put them in different lists, because I need their counts, like how many 4xx responses there are or how many 200 responses there are. Or is there another way to do it?
result.SiteURL is the URL used for the requests. The problem is that the last line of the code doesn't return or get anything. What am I missing here?
edit: The main problem is that whatever I try I only get one response code, and that is mostly 200 OK. But for youtube.com (etc.) there should be 74 OK (200) responses, 1 No Content (204) response and 2 Moved Permanently (301) responses according to https://tools.pingdom.com/#!/fMjhr/youtube.com. How am I going to get them?
You misunderstand the result shown by pingdom.
Pingdom requests a web page just like a browser would: it loads the page itself, as well as all resources referenced by the page: style sheets, scripts, images, etc.
Your code only loads the main HTML page, which has great availability and always returns 200 OK.
If you want to reproduce pingdom's results, you'll need to parse the HTML page and load the page's resources as well. Keep in mind that parsing HTML is a non-trivial task (browser vendors put a lot of effort in it), so you might want to reconsider whether this is worth your time.
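If you do want to approximate pingdom's numbers yourself, here is a rough sketch of the idea. It assumes the HtmlAgilityPack package for the parsing, only looks at img, script, and link tags, and simplifies the URL handling, so treat it as an outline rather than a finished crawler:

// Tally status codes, grouped by class (2xx, 3xx, 4xx, 5xx), for a page and the resources it references.
var pageUri = new Uri("http://www.youtube.com/");
var codesByClass = new Dictionary<int, List<int>>();    // e.g. codesByClass[2] holds all 2xx codes seen

string html;
using (var client = new WebClient())
{
    html = client.DownloadString(pageUri);
}

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var nodes = doc.DocumentNode.SelectNodes("//img[@src] | //script[@src] | //link[@href]");
if (nodes != null)
{
    foreach (var node in nodes)
    {
        string attr = node.Name == "link"
            ? node.GetAttributeValue("href", "")
            : node.GetAttributeValue("src", "");

        Uri resourceUri;
        if (string.IsNullOrEmpty(attr) || !Uri.TryCreate(pageUri, attr, out resourceUri))
            continue;

        int code;
        try
        {
            var req = (HttpWebRequest)WebRequest.Create(resourceUri);
            req.AllowAutoRedirect = false;               // keep 3xx codes visible instead of following them
            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                code = (int)resp.StatusCode;
            }
        }
        catch (WebException ex)
        {
            var errResp = ex.Response as HttpWebResponse;
            if (errResp == null) continue;               // no HTTP status at all (DNS error, timeout, ...)
            code = (int)errResp.StatusCode;
        }

        int statusClass = code / 100;
        if (!codesByClass.ContainsKey(statusClass)) codesByClass[statusClass] = new List<int>();
        codesByClass[statusClass].Add(code);             // codesByClass[4].Count = number of 4xx responses
    }
}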
So I'm trying to read the source of a URL, let's say domain.xyz. No problem, I can simply get it to work using HttpWebRequest.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
My problem is that it will return the page source, but without the source of the iframe inside this page. I only get something like this:
<iframe src="http://anotherdomain.xyz/frame_that_only_works_on_domain_xyz"></iframe>
I figured out that I can easily get the src of the iframe with WebBrowser, or with basic string functions (the results are the same), and create another HttpWebRequest using that address. The problem is that if I view the full page (where the frame is inserted) in a browser (Chrome), I get the expected results. But if I copy the src to another tab, the contents are not the same. It says that the content I want to view is blocked because it's only allowed through domain.xyz.
So my final question is:
How can I simulate the request through a specified domain, or get the full, rendered page source?
That's likely the Referer property of the web request: typically a browser tells the web server where it found the link to the page it is requesting.
That means, when you create the web request for the iframe, you set the referer property of that request to the page containing the link.
If that doesn't work, cookies may be another option. I.e. you have to collect the cookies sent for the first request, and send them with the second request.
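A rough sketch of both ideas together, using the placeholder domains from the question (which part actually unblocks the frame depends on the site):

// Share one CookieContainer across both requests, and set the Referer on the iframe request.
var cookies = new CookieContainer();

// 1) Request the outer page first, so any cookies it sets are captured.
var outerReq = (HttpWebRequest)WebRequest.Create("http://domain.xyz/");
outerReq.CookieContainer = cookies;
string outerHtml;
using (var outerResp = (HttpWebResponse)outerReq.GetResponse())
using (var reader = new StreamReader(outerResp.GetResponseStream()))
{
    outerHtml = reader.ReadToEnd();      // parse the iframe src out of this
}

// 2) Request the iframe content, pretending we navigated there from the outer page.
var frameReq = (HttpWebRequest)WebRequest.Create("http://anotherdomain.xyz/frame_that_only_works_on_domain_xyz");
frameReq.CookieContainer = cookies;      // send back whatever cookies the outer page set
frameReq.Referer = "http://domain.xyz/"; // tell the server where the link came from
string frameHtml;
using (var frameResp = (HttpWebResponse)frameReq.GetResponse())
using (var reader = new StreamReader(frameResp.GetResponseStream()))
{
    frameHtml = reader.ReadToEnd();
}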
I'm trying to do some web scraping from a simple form in C#.
My issue is trying to figure out the action to post to and how to work out the post params.
The form I am trying to submit has:
<form method="post" action="./"
As the page sits at www.foobar.com I am creating a WebRequest object in my C# code and posting to this address.
The other issue with this is that I am not sure of the post values as the inputs only have ids not names:
<input name="ctl00$MainContent$txtSearchName" type="text" maxlength="8" id="MainContent_txtSearchName" class="input-large input-upper">
So I read this: c# - programmatically form fill and submit login, amongst others and my code looks like this:
var httpRequest = WebRequest.Create("https://www.foobar.com/");
var values = "SearchName=Foo&SearchLastName=Bar";
byte[] send = Encoding.Default.GetBytes(values);
httpRequest.Method = "POST";
httpRequest.ContentType = "application/x-www-form-urlencoded";
httpRequest.ContentLength = send.Length;
Stream sout = httpRequest.GetRequestStream();
sout.Write(send, 0, send.Length);
sout.Flush();
sout.Close();
WebResponse res = httpRequest.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string returnvalue = sr.ReadToEnd();
File.WriteAllText(@"C:\src\test.html", returnvalue);
However, the resulting html page that is created does not show the search results, it shows the initial search form.
I am assuming the POST is failing. My questions are about the POST I am making.
Does action="./" mean it posts back to the same page?
Do I need to submit all the form values (or can I get away with only submitting one or two)?
Is there any way to infer what the correct post parameter names are from the form?
Or am I missing something completely about web scraping and submitting forms in server side code?
What I would suggest is not doing all of this work manually, but letting your computer take a bit of the workload. You can use a tool such as Fiddler and the Fiddler Request To Code Plugin in order to programmatically generate the C# code for duplicating the web request. You can then modify it to take whatever dynamic input you may need.
If this isn't the route you'd like to take, you should make sure that you are requesting this data with the correct cookies (if applicable) and that you are supplying ALL POST data, no matter how menial it may seem.
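For a typical ASP.NET WebForms page (which the ctl00$... name suggests this is), the usual pattern is: GET the page first, carry its cookies forward, copy the hidden __VIEWSTATE/__EVENTVALIDATION fields into your POST body, and use the name attributes (not the ids) for your own fields. A rough sketch, with the extraction of the hidden fields left as a placeholder:

var cookies = new CookieContainer();

// 1) GET the form page so we have its cookies and hidden fields.
var getReq = (HttpWebRequest)WebRequest.Create("https://www.foobar.com/");
getReq.CookieContainer = cookies;
string formHtml;
using (var resp = (HttpWebResponse)getReq.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream()))
{
    formHtml = reader.ReadToEnd();
}

// 2) Pull __VIEWSTATE and __EVENTVALIDATION out of formHtml (regex or an HTML parser).
string viewState = "...extracted from formHtml...";
string eventValidation = "...extracted from formHtml...";

// 3) POST back to the same URL (action="./"), using the real field names from the markup.
string postData =
    "__VIEWSTATE=" + Uri.EscapeDataString(viewState) +
    "&__EVENTVALIDATION=" + Uri.EscapeDataString(eventValidation) +
    "&" + Uri.EscapeDataString("ctl00$MainContent$txtSearchName") + "=Foo";

var postReq = (HttpWebRequest)WebRequest.Create("https://www.foobar.com/");
postReq.Method = "POST";
postReq.ContentType = "application/x-www-form-urlencoded";
postReq.CookieContainer = cookies;          // same session as the GET

byte[] body = Encoding.UTF8.GetBytes(postData);
postReq.ContentLength = body.Length;
using (var stream = postReq.GetRequestStream())
{
    stream.Write(body, 0, body.Length);
}

using (var postResp = (HttpWebResponse)postReq.GetResponse())
using (var reader = new StreamReader(postResp.GetResponseStream()))
{
    string resultHtml = reader.ReadToEnd();
}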
My problem is that I can't get a div's InnerText from a table. I have successfully extracted different kinds of data, but I don't know how to read the div from the table.
In the following picture I've highlighted the div, and I need to get the InnerText from it, in this case the number 3.
Click here for first picture
I'm trying to accomplish this using following path:
"//div[#class='kal']//table//tr[2]/td[1]/div[#class='cipars']"
But I'm getting following Error:
Click here for Error message picture
Assuming that the rest of the code is written correctly, could anyone point me in the right direction? I have been trying to figure this one out, but I can't get any results.
So your problem is that you are relying on positions within your XPath. Whilst this can be OK in some cases, it is not here, because you are expecting the first td in a given tr to have a div with that class.
Looking at the source in Chrome, it shows this is not always the case. You can see this by comparing the "1" element in the calendar, to "2" and "3". You'll notice the "1" element has a number of elements around it, which the others don't.
Your original XPath query does not return an element, this is why you are getting the error. In the event the XPath query you give HtmlAgilityPack does not result in a DOM element, it will return null.
Now, because you've not shown your entire code, I don't know how this code is being run. However, I am guessing you are trying to loop through all of the calendar items. Regardless, you have multiple ways of doing this, but I will show you that with the descendant XPath selector, you can just grab the whole lot in one go:
//div[@class='kal']//table//descendant::div[@class='cipars']
This will return all of the calendar items (i.e. 1 through 30).
However, to get all the items in a particular row, you can just stick that tr into the query:
//div[@class='kal']//table//tr[3]/descendant::div[@class='cipars']
This would return 2 to 8 (the second row of calendar items).
To target a specific one, well, you'll have to make an assumption about the source code of the website. It looks like every "cipars" div has an ancestor td with a class of datums... so to get the "3" value from your question:
//div[@class='kal']//table//tr[3]//td[@class='datums'][2]/div[@class='cipars']
Hopefully this is enough to show the issue at least.
Edit
Although you do have an XPath problem, you also have another issue.
The site is created very strangely. The calendar is loaded in a strange way. When I hit that URL, the calendar is created by some Javascript calling an XML web service (written in PHP) that then calculates the full table to be used for the calendar.
Due to the fact this is Javascript (client side code), HtmlAgilityPack won't execute it. Therefore, HtmlAgilityPack doesn't even "see" the table. Hence the queries against it come back as "not found" (null).
Ways around this: 1) Use a tool that will execute the scripts. By this, I mean load up a browser. A great tool to use for this is called Selenium. This will probably be the better overall solution because it means all the scripting used by the site will actually run. You can still use XPath with it, so your queries will not change; see the sketch below.
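For example, a rough Selenium sketch of that idea (this assumes the Selenium WebDriver and Chrome driver packages are installed, the page URL is a guess based on the web service address below, and the XPath is the corrected one from above):

// requires: using OpenQA.Selenium; using OpenQA.Selenium.Chrome;
using (IWebDriver driver = new ChromeDriver())
{
    // The browser runs the page's Javascript, so the calendar table actually exists in the DOM.
    driver.Navigate().GoToUrl("http://lekcijas.va.lv/");

    // In practice you may need an explicit wait here until the calendar has been built.
    IWebElement cell = driver.FindElement(
        By.XPath("//div[@class='kal']//table//tr[3]//td[@class='datums'][2]/div[@class='cipars']"));

    Console.WriteLine(cell.Text);   // should print "3"
}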
The second way is to send a request off to the same web service that the page does. This basically gets back the same HTML that the page is getting, and uses that with HtmlAgilityPack. How do we do that?
Well, you can easily POST data to a web service using C#. Just for ease of use I've stolen the code from this SO question. With this, we can send the same request the page is, and get the same HTML back.
So to send some POST data, we generate a method like so.....
public static string SendPost(string url, string postData)
{
string webpageContent = string.Empty;
byte[] byteArray = Encoding.UTF8.GetBytes(postData);
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.ContentLength = byteArray.Length;
using (Stream webpageStream = webRequest.GetRequestStream())
{
webpageStream.Write(byteArray, 0, byteArray.Length);
}
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
{
using (StreamReader reader = new StreamReader(webResponse.GetResponseStream()))
{
webpageContent = reader.ReadToEnd();
}
}
return webpageContent;
}
We can call it like so:
string responseBody = SendPost("http://lekcijas.va.lv/lekcijas_request.php", "nodala=IT&kurss=1&gads=2013&menesis=9&c_dala=");
How did I get this? Well, the PHP file we are calling is the web service the page calls, and the POST data is what the page sends. The way I found out what data it sends to the service is by debugging the Javascript (using Chrome's developer console), but you may notice it's pretty much the same thing that is in the URL. That seems to be intentional.
The responseBody that is returned is the physical HTML of just the table for the calendar.
What do we do with it now? We load that up into HtmlAgilityPack, because it is able to accept pure HTML.
var document = new HtmlDocument();
document.LoadHtml(responseBody);
Now, we stick that original XPath in:
var node = document.DocumentNode.SelectSingleNode("//div[@class='kal']//table//tr[3]//td[@class='datums'][2]/div[@class='cipars']");
Now, we print out what should hopefully be "3":
Console.WriteLine(node.InnerText);
My output, running it locally, is indeed: 3.
However, although this would get you over the problem you are having, I am assuming the rest of the site is like this. If this is the case, you may still be able to work around it using technique above, but tools like Selenium were created for this very reason.
I'm using C# to download the HTML of a webpage, but when I check the actual code of the web page and my downloaded code, they are completely different. Here is the code:
public static string getSourceCode(string url) {
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "GET";
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
// read the response as UTF-8 and return the data
using (StreamReader sRead = new StreamReader(resp.GetResponseStream(), Encoding.UTF8)) {
string sourceCode = sRead.ReadToEnd();
resp.Close();
return sourceCode;
}
}
private void button1_Click(object sender, EventArgs e) {
string url = "http://www.booking.com/hotel/tr/nena.en-gb.html?label=gog235jc-hotel-en-tr-mina-nobrand-tr-com-T002-1;sid=fcc1c6c78f188a42870dcbe1cabf2fb4;dcid=1;origin=disamb;srhash=3938286438;srpos=5";
string sourceCode = Finder.getSourceCode(url);
StreamWriter sw = new StreamWriter("HotelPrice.txt"); // Here the downloaded code is completely different from the web page's code.
sw.Write(sourceCode);
sw.Close();
#region //Get Score Value
int StartIndex = sourceCode.IndexOf("<strong id=\"rsc_total\">") + 23;
sourceCode = sourceCode.Substring(StartIndex, 3);
#endregion
}
Most likely the cause of the difference is that when you use the browser to request the same page it is part of a session, which is not established when you request the page using WebRequest.
Looking at the URL, it looks like the query parameter sid is a session identifier or a nonce of some sort. The page probably verifies that against the actual session id, and when it determines that they are different it gives you some sort of "Oops... wrong session" response.
In order to mimic the browser's request you will have to make sure you generate the proper request which may need to include one or more of the following:
cookies (previously sent to you by the webserver)
a valid/proper user agent
some specific query parameters (again depending on what the page expects)
potentially a referrer URL
authentication credentials
The best way to determine what you need is to follow a conversation between your browser and the web server serving that page from start to finish and see exactly which pages are requested, in what order, and what information is passed back and forth. You can accomplish this using Wireshark or Fiddler - both free tools!
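As a rough illustration of the first few items on that list, this is how such things can be attached to an HttpWebRequest (the header values below are placeholders, not what booking.com actually requires):

var cookies = new CookieContainer();                   // reuse this across requests so cookies accumulate

var req = (HttpWebRequest)WebRequest.Create("http://www.booking.com/some-page.html");
req.Method = "GET";
req.CookieContainer = cookies;                         // cookies previously sent by the server
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36";   // a browser-like user agent
req.Referer = "http://www.booking.com/";               // the page the server thinks we came from
// req.Credentials = new NetworkCredential("user", "pass");          // only if authentication is required

using (var resp = (HttpWebResponse)req.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
{
    string html = reader.ReadToEnd();
}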
I ran into the same problem when trying to use HttpWebRequest to crawl a page, and the page used ajax to load all the data I was after. In order to get the ajax calls to occur I switched to the WebBrowser control.
This answer provides an example of how to use the control outside of a WinForms app. You'll want to hook up to the browser's DocumentCompleted event before parsing the page. Be warned, this event may fire multiple times before the page is ready to be parsed. You may want to add something like this
if(browser.ReadyState == WebBrowserReadyState.Complete)
to your event handler, to know when the page is completely done loading.
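A minimal sketch of that pattern (WinForms-style usage; the URL and the parsing step are placeholders):

var browser = new WebBrowser();
browser.ScriptErrorsSuppressed = true;

browser.DocumentCompleted += (sender, e) =>
{
    // DocumentCompleted can fire once per frame, so only parse when the whole page is ready.
    if (browser.ReadyState == WebBrowserReadyState.Complete)
    {
        string renderedHtml = browser.Document.Body.InnerHtml;   // the DOM after scripts have run
        // ... parse renderedHtml here ...
    }
};

browser.Navigate("http://example.com/page-that-loads-data-with-ajax");
// Note: the WebBrowser control needs an STA thread and a running message loop,
// which is what the linked answer about using it outside WinForms deals with.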