HttpWebRequest acts differently from the same request sent through a browser - C#

I am working with the Shift4Shop web API. These guys used to be known as threeDCart, if that helps anyone. It's an eCommerce platform.
We are trying to apply a promotion code to an open cart.
Support has verified there is no API way to do that.
There is a URL that will apply the promotion. This is often emailed to customers so they can apply the promo if they choose to.
We can paste the correct URL into Chrome, Brave, Edge, or Firefox and it correctly applies the promotion.
We used private tabs for the different browser tests, and the browsers were 'cold': we launched the browser and immediately entered the URL.
We think this eliminates the possibility that cookies are necessary.
https://www.mywebsite.com/continue_order.asp?orderkey=CDC886A7O4Srgyn278668&ApplyPromo=40pro
However, when I try to do this in C#, I get a response that is redirected to a page that says 'The cart is empty'.
The promotion is not applied.
I am stumped as to how the website could respond differently to the same URL when it comes from a browser as opposed to the C# System.Net library.
Here is the C# code I am using:
using System.IO;
using System.Net;

// I really build this from my data, but this is the resulting URL
string url = "https://www.mywebsite.com/continue_order.asp?orderkey=CDC886A7O4Srgyn278668&ApplyPromo=40pro";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

string result = "";
using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
using (StreamReader rdr = new StreamReader(response.GetResponseStream()))
{
    result = rdr.ReadToEnd();
}
You can also call view_cart.asp with the same parameters, and in the browsers that also causes the promo to be applied.
I have tried setting the request method to each of [not set, "GET", "get"].
There has to be something about the request settings that are preventing this from working.
I do not know what else to try.
Any thoughts are appreciated.

As per Shift4Shop support, continue_order.asp returns a 302.
The browsers land on continue_order.asp and process that page.
They then continue on to view_order.asp.
The two pages together perform functionality that you cannot get by just calling continue_order.asp.
Thanks to Savoy w/ Shift4Shop for helping on that.
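Given that, one approach worth trying (a sketch, not a confirmed fix) is to handle the 302 yourself: fetch continue_order.asp with redirects disabled, capture any cookies it sets in a shared CookieContainer, then follow the Location header with the same cookies, imitating the two-page sequence the browser performs. The URL and orderkey below are just the placeholders from the question:

using System;
using System.IO;
using System.Net;

class PromoApplier
{
    static void Main()
    {
        // Placeholder URL from the question.
        string url = "https://www.mywebsite.com/continue_order.asp?orderkey=CDC886A7O4Srgyn278668&ApplyPromo=40pro";

        // One cookie container shared across both requests, so any session
        // cookie set by continue_order.asp is replayed on the second page.
        CookieContainer cookies = new CookieContainer();

        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookies;
        req.AllowAutoRedirect = false; // handle the 302 ourselves

        string location;
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            location = resp.Headers["Location"]; // e.g. view_order.asp
        }

        // Resolve a possibly relative Location against the original URL and
        // follow it with the same cookies, mimicking what the browser does.
        Uri target = new Uri(new Uri(url), location);
        HttpWebRequest follow = (HttpWebRequest)WebRequest.Create(target);
        follow.CookieContainer = cookies;

        using (HttpWebResponse resp2 = (HttpWebResponse)follow.GetResponse())
        using (StreamReader rdr = new StreamReader(resp2.GetResponseStream()))
        {
            Console.WriteLine(rdr.ReadToEnd());
        }
    }
}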

Related

How to retrieve HTML Page without getting redirected?

I want to scrape the HTML of a website. When I access this website with my browser (no matter whether Chrome or Firefox), I have no problem accessing the website and its HTML.
But when I try to fetch the HTML with C# using HttpWebRequest and parse it with HtmlAgilityPack, the website redirects me to another page, and so I end up parsing the HTML of the redirect target.
Any idea how to solve this problem?
I thought the site recognises my program as a program and redirects immediately, so I tried using Selenium with a ChromeDriver and a FirefoxDriver, but also no luck; I get redirected immediately.
The Website: https://www.jodel.city/7700#!home
private void bt_load_Click(object sender, EventArgs e)
{
    var url = @"https://www.jodel.city/7700#!home"; // was #"...", which does not compile
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.AllowAutoRedirect = false;
    // req.Referer = "http://www.muenchen.de/";
    var resp = req.GetResponse();
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
        string returnedContent = sr.ReadToEnd();
        Console.WriteLine(returnedContent);
    }
}
And of course, cookies are to blame again, because cookies are great and amazing.
So, let's look at what happens in Chrome the first time you visit the site:
(I went to https://www.jodel.city/7700#!home):
Yes, I got a 302 redirect, but I also got told by the server to set a __cfduid cookie (twice actually).
When you visit the site again, you are correctly let into the site:
Notice how this time a __cfduid cookie was sent along? That's the key here.
Your C# code needs to:
1. Go to the site once, get redirected, but grab the cookie value from the response headers.
2. Go BACK to the site with the correct cookie value in the request headers.
You can go to the first link in this post to see an example of how to set cookie values for requests.
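A minimal sketch of that two-step dance with HttpWebRequest (assuming the __cfduid cookie is all the server checks; a shared CookieContainer records the Set-Cookie headers from the first response and replays them on the second):

using System;
using System.IO;
using System.Net;

class CookieRetry
{
    static void Main()
    {
        string url = "https://www.jodel.city/7700";

        // A shared cookie jar records Set-Cookie headers (the __cfduid
        // cookie here) from the first, redirected response.
        CookieContainer jar = new CookieContainer();

        // First visit: we get redirected, but the cookie lands in the jar.
        HttpWebRequest first = (HttpWebRequest)WebRequest.Create(url);
        first.CookieContainer = jar;
        using (first.GetResponse()) { }

        // Second visit: the cookie is sent automatically, so the server
        // lets us through to the real page.
        HttpWebRequest second = (HttpWebRequest)WebRequest.Create(url);
        second.CookieContainer = jar;
        using (HttpWebResponse resp = (HttpWebResponse)second.GetResponse())
        using (StreamReader rdr = new StreamReader(resp.GetResponseStream()))
        {
            Console.WriteLine(rdr.ReadToEnd());
        }
    }
}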

unshortening urls

I am trying to unshorten URLs and have not been able to find code (VB.NET/C#) to do this. These are the Twitter shortened URLs, and I guess I could access one of the web services available and do an HttpWebRequest, but I would prefer to find some programmatic way of doing this.
You can get it directly from the response of the shortened URL, since it will return a MovedPermanently status code and the location of the real URL. (This should work for most sites, without the need for navigating to the real URL.)
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://t.co/xqbLEi6s");
req.AllowAutoRedirect = false;
var resp = req.GetResponse();
string realUrl = resp.Headers["Location"];
Other test data: http://goo.gl/zdf2n, http://tinyurl.com/8xc9vca, http://x.co/iEup, http://is.gd/vTOlz6, http://bit.ly/FUA4YU
There is no magic way to unshorten a URL without asking the service which created the URL (and the way to ask will be different for each service), or more pragmatically, just opening the URL and watching where it redirects to.
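For shorteners that chain into one another (t.co wrapping bit.ly, for example), you can loop until the response is no longer a redirect. A sketch, assuming the hosts answer HEAD requests (some only answer GET):

using System;
using System.Net;

class Unshortener
{
    // Follows redirects one hop at a time until a non-redirect status
    // comes back, which handles chained shorteners.
    static string Unshorten(string url)
    {
        while (true)
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.AllowAutoRedirect = false;
            req.Method = "HEAD"; // we only need the headers

            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                int code = (int)resp.StatusCode;
                if (code < 300 || code >= 400)
                    return url; // no more redirects; this is the real URL

                // Resolve relative Location headers against the current URL.
                url = new Uri(new Uri(url), resp.Headers["Location"]).ToString();
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(Unshorten("http://t.co/xqbLEi6s"));
    }
}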

vexed! POST returns a 302 found object moved error in HttpWebRequest

Someone please help; I've been struggling with this lousy problem!
What I'm doing: I have an ASPX page from which I originate a GET and then a POST to an HTTPS page, with a view to logging in to it. I have spent quite a bit of time comparing my GET and POST construction to a browser GET/POST using Fiddler (a protocol analyzer), and my requests are fine.
However, when I log in through the browser, everything works fine. When I run my page, I can see the correct GET and POST, but I get a 302 Found 'object moved' error.
Originally I thought this was a cookie issue, but after much experimentation I'm pretty sure it has nothing to do with cookies. I disabled cookies AND JavaScript in the browser, and the pages work fine without either. I then simulated the exact GET/POST.
This is my situation:
My GET and the browser's GET are EXACTLY THE SAME.
The 200 OK response from the site is EXACTLY the same EXCEPT three VIEWSTATE variables, which have slightly different lengths (why? why different even if the GET is the same?).
My POST and the browser's POST are EXACTLY the same EXCEPT the 3 VIEWSTATE variables (which I fill correctly from the GET).
And yet the browser logs in, while I get a 302 Found / object moved error.
A couple of other things:
a) I copied the POST body from a recent browser POST, replaced my POST params with it, and that got me the right response! This indicates that:
- my headers are just fine
- my coding setup / environment etc. are fine
- something is fishy in the VIEWSTATE values, which can only be because the browser sent them to me in the first place (there is no corruption in my parsing of the GET VIEWSTATE variables and using them in the POST; it's perfectly fine)
Update: I have also tried WebClient just to check. No difference; same 302.
Update: The 'object moved' points to an error page which says 'a serious error occurred blah blah'. The POST is causing an error at the server, and the ONLY difference between the good POST (the browser's) and mine is the VIEWSTATE variables.
So, WHAT AM I DOING WRONG? Why is this cruel world tormenting me?!!
(PS: one other difference in the browser sequence; not sure if it matters.)
Browser:
CONNECT
GET
GET (for a favicon, which returns an error)
CONNECT
POST (success)
Me:
CONNECT
GET
POST (flaming failure, 302 - page moved)
And for those who care, here is my POST header construction code:
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.UserAgent = chromeUserAgent;
//myRequest.CookieContainer = cCookies;
myRequest.ContentType = "application/x-www-form-urlencoded";
myRequest.Accept = chromeAccept;
myRequest.Referer = url;
myRequest.AllowAutoRedirect = false;
myRequest.Host = "thesitethatskillingme.com";
myRequest.Headers.Add("Origin", "https://thesitethatskillingme.com");
myRequest.Headers.Add("Accept-Encoding", chromeAcceptEncoding);
myRequest.Headers.Add("Accept-Language", chromeAcceptLanguage);
myRequest.Headers.Add("Accept-Charset", chromeAcceptCharset);
myRequest.Headers.Add("Cache-Control", "max-age=0");
myRequest.ServicePoint.Expect100Continue = false; // Chrome does not send Expect: 100-continue
myRequest.Method = "POST";
myRequest.KeepAlive = true;

// Write the form body as ASCII bytes
ASCIIEncoding ascii = new ASCIIEncoding();
byte[] bData = ascii.GetBytes(data);
myRequest.ContentLength = bData.Length;
using (Stream oStream = myRequest.GetRequestStream())
    oStream.Write(bData, 0, bData.Length);
...and then read the stream, etc. No cookies.
I finally figured it out, and hopefully someone else who chances upon the same problem won't have to go through this again. Most HTTP gurus and people familiar with web development would probably never hit it, but a newbie quite well could.
So what was the problem? I had narrowed it down to VIEWSTATE, which I always suspected (see my post above...). It turns out that all I had to do was Server.UrlEncode the parsed VIEWSTATE values before putting them into the POST. That's it. It took me all day to get there.
SO, as a lesson for other newcomers:
If you are trying to POST to a page through code and need to send it VIEWSTATE variables that you parsed from the GET, first Server.UrlEncode them before building the parameters. For example:
- do the GET
- read the response stream into a string
- parse the string (I use HtmlAgilityPack; fabulous)
- param1 = name + "=" + Server.UrlEncode(value) + "&"
- POST param = param1 + param2 + ...
- send this in the POST; voila, it works
Because I had never, ever programmed with HttpWebRequest etc., I started by narrowing down the problem, eliminating cookies, JavaScript, GET construction, and POST construction one by one using Fiddler (a great analyzer tool, free), and then finally did a byte comparison using BeyondCompare; that's when I caught the VIEWSTATE variable modifications.
I learned a lesson on URL encoding, and hopefully you won't have to!
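To make the sequence concrete, here is a sketch of the GET-parse-encode-POST round trip described above. The URL and the extra form fields are hypothetical, and it uses HttpUtility.UrlEncode, which does the same job as Server.UrlEncode outside of a page context:

using System.IO;
using System.Net;
using System.Text;
using System.Web;           // HttpUtility
using HtmlAgilityPack;

class ViewStatePost
{
    static void Main()
    {
        string url = "https://example.com/login.aspx"; // hypothetical

        // 1. GET the page and read the HTML.
        string html;
        HttpWebRequest get = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse resp = get.GetResponse())
        using (StreamReader rdr = new StreamReader(resp.GetResponseStream()))
            html = rdr.ReadToEnd();

        // 2. Parse the hidden VIEWSTATE field with HtmlAgilityPack.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        string viewState = doc.DocumentNode
            .SelectSingleNode("//input[@id='__VIEWSTATE']")
            .GetAttributeValue("value", "");

        // 3. URL-encode the value before building the POST body; the raw
        //    value contains '+' and '=' characters that break form encoding.
        string postData = "__VIEWSTATE=" + HttpUtility.UrlEncode(viewState)
            + "&username=" + HttpUtility.UrlEncode("me")       // hypothetical fields
            + "&password=" + HttpUtility.UrlEncode("secret");

        // 4. POST it back.
        HttpWebRequest post = (HttpWebRequest)WebRequest.Create(url);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.ASCII.GetBytes(postData);
        post.ContentLength = body.Length;
        using (Stream s = post.GetRequestStream())
            s.Write(body, 0, body.Length);
    }
}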

page posting issue when working in Screen Scraping

I am working on screen scraping and have done it successfully on 3 websites; I have an issue with the last website.
Here is my URL: when I hit it with my parameter, the site simply posts to another page and shows the result fine on that page.
Here is My Test
However, when I hit it from my application, since I don't have an option to post there, it only fetches the HTML of the requested page, which is obviously my above-mentioned test link that actually has the parameter in the URL to get the result.
How can I handle this situation?
Please give me a hint.
Thanks
Here is my C# code; I am using HtmlAgilityPack:
HtmlWeb hw = new HtmlWeb();
string url = "http://mysampleURL";
HtmlDocument doc = hw.Load(url);
Use the WebClient class to post the form of the first page with the expected input values. The input values can be found in the source of the first page, but it's also possible to capture them using Fiddler, which is IMHO a great tool for these scenarios.
Example:
using System.Collections.Specialized;
using System.Net;
using System.Text;

NameValueCollection values = new NameValueCollection();
values.Add("action", "hotelPackageWizard#searchHotelOnly");
values.Add("packageType", "HOTEL_ONLY");
// etc..

WebClient webclient = new WebClient();
webclient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
byte[] responseArray = webclient.UploadValues("http://www.expedia.com/Hotels?rfrr=-905&", "POST", values);
string response = Encoding.ASCII.GetString(responseArray);
If the resource requires a POST, then you MUST submit a POST.
This is a fairly simple task. Here is an example from Rick Strahl's blog. The code is a bit rustic, but it works and will get you heading in the right direction:
string lcUrl = "http://www.west-wind.com/testpage.wwd";
HttpWebRequest loHttp = (HttpWebRequest)WebRequest.Create(lcUrl);

// *** Send any POST data
string lcPostData =
    "Name=" + HttpUtility.UrlEncode("Rick Strahl") +
    "&Company=" + HttpUtility.UrlEncode("West Wind ");
loHttp.Method = "POST";
byte[] lbPostBuffer = Encoding.GetEncoding(1252).GetBytes(lcPostData);
loHttp.ContentLength = lbPostBuffer.Length;

Stream loPostData = loHttp.GetRequestStream();
loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length);
loPostData.Close();

HttpWebResponse loWebResponse = (HttpWebResponse)loHttp.GetResponse();
Encoding enc = Encoding.GetEncoding(1252);
StreamReader loResponseStream = new StreamReader(loWebResponse.GetResponseStream(), enc);
string lcHtml = loResponseStream.ReadToEnd();

loWebResponse.Close();
loResponseStream.Close();
For screen-scraping tasks that involve posting forms such as log-ins, maintaining cookies, and taking care of XSRF tokens, one solution is to use cURL. But it is not easy.
I then explored Selenium, and I love it. There are two steps: 1) install Selenium IDE (works only in Firefox); 2) install Selenium RC Server.
After starting Selenium IDE, go to the site you are trying to automate and start recording the events you perform on the site. Think of it as recording a macro in the browser. Afterwards, you get the code output for the language you want.
Just so you know, Browsermob uses Selenium for load testing and for automating tasks in the browser.
I've uploaded a PPT that I made a while back; it should save you a good amount of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html
On that link, select the regular-download option.
I spent a good amount of time figuring this out, so I thought it might save somebody's time.

HTTP Post in C# console app doesn't return the same thing as a browser request

I have a C# console app (.NET 2.0 framework) that does an HTTP post using the following code:
StringBuilder postData = new StringBuilder(100);
postData.Append("post.php?");
postData.Append("Key1=");
postData.Append(val1);
postData.Append("&Key2=");
postData.Append(val2);
byte[] dataArray = Encoding.UTF8.GetBytes(postData.ToString());
HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create("http://example.com/");
httpRequest.Method = "POST";
httpRequest.ContentType = "application/x-www-form-urlencoded";
httpRequest.ContentLength = dataArray.Length;
Stream requestStream = httpRequest.GetRequestStream();
requestStream.Write(dataArray, 0, dataArray.Length);
requestStream.Flush();
requestStream.Close();
HttpWebResponse webResponse = (HttpWebResponse)httpRequest.GetResponse();
if (httpRequest.HaveResponse == true) {
Stream responseStream = webResponse.GetResponseStream();
StreamReader responseReader = new System.IO.StreamReader(responseStream, Encoding.UTF8);
String responseString = responseReader.ReadToEnd();
}
The outputs from this are:
webResponse.ContentLength = -1
webResponse.ContentType = text/html
webResponse.ContentEncoding is blank
The responseString is HTML with a title and body.
However, if I paste the same URL into a browser (http://example.com/post.php?Key1=some_value&Key2=some_other_value), I get a small XML snippet like:
<?xml version="1.0" ?>
<RESPONSE RESULT="SUCCESS"/>
with none of the same HTML as in the application. Why are the responses so different? I need to parse the returned result which I am not getting in the HTML. Do I have to change how I do the post in the application? I don't have control over the server side code that accepts the post.
If you are indeed supposed to use the POST HTTP method, you have a couple of things wrong. First, this line:
postData.Append("post.php?");
is incorrect. You want to post to post.php; you don't want to post the value "post.php?" to the page. Just remove this line entirely.
This piece:
... WebRequest.Create("http://example.com/");
needs post.php added to it, so...
... WebRequest.Create("http://example.com/post.php");
Again this is assuming you are actually supposed to be POSTing to the specified page instead of GETing. If you are supposed to be using GET, then the other answers already supplied apply.
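Putting those two fixes together, a sketch of the corrected request might look like this (assuming POST really is what the endpoint expects; val1 and val2 are placeholders standing in for the question's values):

// Corrected per the advice above: the page name goes in the URL,
// and only the form fields go in the request body.
string val1 = "some_value", val2 = "some_other_value"; // placeholders
StringBuilder postData = new StringBuilder(100);
postData.Append("Key1=");
postData.Append(val1);
postData.Append("&Key2=");
postData.Append(val2);

byte[] dataArray = Encoding.UTF8.GetBytes(postData.ToString());
HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create("http://example.com/post.php");
httpRequest.Method = "POST";
httpRequest.ContentType = "application/x-www-form-urlencoded";
httpRequest.ContentLength = dataArray.Length;
using (Stream requestStream = httpRequest.GetRequestStream())
{
    requestStream.Write(dataArray, 0, dataArray.Length);
}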
You'll want to get an HTTP sniffer tool like Fiddler and compare the headers that are being sent from your app to the ones being sent by the browser. There will be something different that is causing the server to return a different response. When you tweak your app to send the same thing browser is sending you should get the same response. (It could be user-agent, cookies, anything, but something is surely different.)
I've seen this in the past.
When you run from a browser, the "User-Agent" in the header is "Mozilla ...".
When you run from a program, it's different and generally specific to the language used.
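If the User-Agent turns out to be the difference, it is a one-line fix. A sketch (the URL is the question's placeholder, and the UA string below is just an example):

// Hypothetical: make the request identify itself like a browser.
HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create("http://example.com/post.php");
httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36";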
I think you need to use a GET request instead of a POST. If the URL you're using has query-string values (like ?Key1=some_value&Key2=some_other_value), then it's expecting a GET. Instead of adding POST values to your web request, just put the data in the query string:
HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create("http://example.com/?val1=" + val1 + "&val2=" + val2);
httpRequest.Method = "GET";
httpRequest.ContentType = "application/x-www-form-urlencoded";
....
So, the result you're getting is different when you POST the data from your app because the server-side code produces different output when it can't find the data it expects in the query string.
In your code you specify the POST method, which sends the data to the PHP file without putting it in the web address. When you put the information in the address bar, that is not the POST method; that is the GET method. The name may be confusing, but GET just means the data is sent to the PHP file through the web address instead of behind the scenes, not that it is supposed to get any information. When you put the address in the browser, it is using a GET.
Create a simple html form and specify POST as the method and your url as the action. You will see that the information is sent without appearing in the address bar.
Then do the same thing but specify GET. You will see the information you sent in the address bar.
I believe the problem has something to do with the way your headers are set up for the WebRequest.
I have seen strange cases where attempting to simulate a browser by changing headers in the request makes a difference to the server.
The short answer is that your console application is not a web browser and the web server of example.com is expecting to interact with a browser.
You might also consider changing the ContentType to be "multipart/form-data".
What I find odd is that you are essentially posting nothing. The work is being done by the query string. Therefore, you probably should be using a GET instead of a POST.
Is the form expecting a cookie? That is another possible reason why it works in the browser and not from the console app.
