Not able to access a page as if from the US - C#

I want to access the following URL as if I were browsing from the US:
http://www.tillys.com/product/Say-What/Short-Dresses/SAY-WHAT--Ribbed-Tank-Midi-Dress/Heather-Grey/285111595
I've tried setting cookies and more, but the request still redirects to the site's home page. Is there any way I can access this page? Below is the function I am using:
public static string getUrlContent(string url)
{
    var myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myHttpWebRequest.Method = "GET";
    myHttpWebRequest.AllowAutoRedirect = true;
    myHttpWebRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    myHttpWebRequest.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36";
    myHttpWebRequest.Headers.Add("Accept-Language", "en-US,en;q=0.8");
    myHttpWebRequest.Headers.Add("Cookie", "=en%5FUS; wlcme=true");
    var response = (HttpWebResponse)myHttpWebRequest.GetResponse();
    Console.WriteLine("Content length is {0}", response.ContentLength);
    Console.WriteLine("Content type is {0}", response.ContentType);
    // Get the stream associated with the response and read it ONCE --
    // a second ReadToEnd() on the same reader would return an empty string.
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8);
    var jsonStr = readStream.ReadToEnd();
    Console.WriteLine(jsonStr);
    return jsonStr;
}

If www.tillys.com is using geo-fencing, it will serve different content based on a lookup of the requesting IP address. In that case there is nothing C# (or any other language) can do about it from your current location.
You'll need to either proxy your request through a VPN (see How to send WebRequest via proxy?) or deploy your code to a data center in the US. For example, if you use Azure you can deploy to several different data centers throughout the world, including several in the US. Once your code is running in the US, it should be able to access the US version of the page.
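As a sketch of the proxy route: the proxy host, port, and credentials below are placeholders, not a real service, so substitute a US-based proxy or VPN endpoint you actually control.

```csharp
// Route the request through a US-based HTTP proxy so the site sees a US IP.
// "us-proxy.example.com", the port, and the credentials are placeholders.
var request = (HttpWebRequest)WebRequest.Create(
    "http://www.tillys.com/product/Say-What/Short-Dresses/SAY-WHAT--Ribbed-Tank-Midi-Dress/Heather-Grey/285111595");
request.Proxy = new WebProxy("us-proxy.example.com", 8080)
{
    // Only needed if the proxy requires authentication.
    Credentials = new NetworkCredential("user", "password")
};
request.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.108 Safari/537.36";
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());
}
```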


'System.Net.WebException' when accessing WebClient. Works fine on browser

I want to download a string from a website; I made this PHP file to show an example.
(This happens across my whole website, not just this file.)
The link http://swageh.co/information.php can't be downloaded using a WebClient from any PC.
I'd prefer to use WebClient, but no matter what I try, DownloadString fails. It works fine in a browser.
It returns an error 500:
An unhandled exception of type 'System.Net.WebException' occurred in System.dll
Additional information: The underlying connection was closed: An unexpected error occurred on a send.
Did you change something on the server side?
All of the following options are working just fine for me right now (all return "false" with a StatusCode of 200):
var client = new WebClient();
var stringResult = client.DownloadString("http://swageh.co/information.php");
Also HttpWebRequest:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.GetResponse().GetResponseStream();
Newer HttpClient:
var client = new HttpClient();
var req = new HttpRequestMessage(HttpMethod.Get, "http://swageh.co/information.php");
var res = client.SendAsync(req);
var stringResult = res.Result.Content.ReadAsStringAsync().Result;
It's because your website is responding with 301 Moved Permanently; see Get where a 301 URL redirects to.
This shows how to automatically follow the redirect: Using WebClient in C# is there a way to get the URL of a site after being redirected? Look at Christophe Debove's answer rather than the accepted one.
Interestingly, this doesn't work. I tried making the headers the same as Chrome's, as below; perhaps use Telerik Fiddler to see what is happening.
var strUrl = "http://theurl_inhere";
var headers = new WebHeaderCollection();
headers.Add("Accept-Language", "en-US,en;q=0.9");
headers.Add("Cache-Control", "no-cache");
headers.Add("Pragma", "no-cache");
headers.Add("Upgrade-Insecure-Requests", "1");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.Method = "GET";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
foreach (string key in headers)
    request.Headers.Add(key, headers[key]);
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
var strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();
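If the goal is just to see where the 301 points before following it, one approach (a minimal sketch) is to disable auto-redirect so the redirect response itself comes back, then read its Location header:

```csharp
var request = (HttpWebRequest)WebRequest.Create("http://swageh.co/information.php");
request.AllowAutoRedirect = false;  // stop at the 301 instead of following it
using (var response = (HttpWebResponse)request.GetResponse())
{
    // With AllowAutoRedirect = false, a 3xx comes back as a normal response
    // rather than being followed (or throwing).
    Console.WriteLine((int)response.StatusCode);      // e.g. 301
    Console.WriteLine(response.Headers["Location"]);  // the redirect target
}
```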

The underlying connection was closed exception while WebClient DownloadString

Just a piece of code
WebClient wc = new WebClient();
String str = wc.DownloadString(new Uri("http://content.warframe.com/dynamic/rss.php"));
And I got an exception:
An unhandled exception of type 'System.Net.WebException' occurred in
System.dll
Additional information: The underlying connection was closed: The
connection was closed unexpectedly.
I've heard that this is a bug in .NET (I am using 3.5), but I've tried other methods to obtain the content of this link (it's RSS/XML) with no luck so far:
var webrequest = (HttpWebRequest)WebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
var resp = webrequest.GetResponse();
//HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse(); // Won't work either
The code above won't work either; both throw the same exception.
Fiddler logs:
SESSION STATE: Aborted.
Response Entity Size: 512 bytes.
== FLAGS ==================
BitFlags: [ResponseGeneratedByFiddler] 0x100
X-ABORTED-WHEN: Done
X-CLIENTIP: 127.0.0.1
X-CLIENTPORT: 2747
X-EGRESSPORT: 2748
X-FAILSESSION-WHEN: ReadingResponse
X-HOSTIP: 205.185.216.10
X-PROCESSINFO: willitwork.vshost:3300
== TIMING INFO ============
ClientConnected: 10:29:11.706
ClientBeginRequest: 10:29:11.713
GotRequestHeaders: 10:29:11.713
ClientDoneRequest: 10:29:11.713
Determine Gateway: 0ms
DNS Lookup: 164ms
TCP/IP Connect: 74ms
HTTPS Handshake: 0ms
ServerConnected: 10:29:11.953
FiddlerBeginRequest: 10:29:11.953
ServerGotRequest: 10:29:11.953
ServerBeginResponse: 10:29:12.372
GotResponseHeaders: 00:00:00.000
ServerDoneResponse: 10:29:12.372
ClientBeginResponse: 10:29:12.385
ClientDoneResponse: 10:29:12.385
Overall Elapsed: 0:00:00.672
The response was buffered before delivery to the client.
== WININET CACHE INFO ============
This URL is not present in the WinINET cache. [Code: 2]
* Note: Data above shows WinINET's current cache state, not the state at the time of the request.
* Note: Data above shows WinINET's Medium Integrity (non-Protected Mode) cache only.
Also, the 504 doesn't make sense, since I can get the data from the link via Chrome / Firefox / IE.
I got this working in another language, but I'm forced to do it in C# (I've written too much code to rewrite it all).
I've added some settings as Fiddler suggested:
myHttpWebRequest1.ProtocolVersion = HttpVersion.Version11;
myHttpWebRequest1.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";
myHttpWebRequest1.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
At least now I get a 504 error instead of "unknown", but I can still view the content in a web browser, so the 504 is misleading.
Edit: There is no error when I add
myHttpWebRequest1.Headers["Accept-Encoding"] = "gzip";
but now the output is garbled and unreadable.
I had the same error. You can add a User-Agent to your httpRequest:
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
OK, I got this all fixed and working!
static void Main(string[] args)
{
    Uri url = new Uri(@"http://content.warframe.com/dynamic/rss.php");
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // MAGIC LINE GOES HERE \/
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    // Assign the response object of HttpWebRequest to a HttpWebResponse variable.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream streamResponse = response.GetResponseStream())
        {
            using (StreamReader streamRead = new StreamReader(streamResponse))
            {
                Char[] readBuff = new Char[2000];
                int count = streamRead.Read(readBuff, 0, 2000);
                while (count > 0)
                {
                    String outputData = new String(readBuff, 0, count);
                    Console.Write(outputData);
                    count = streamRead.Read(readBuff, 0, 2000);
                }
            }
        }
    }
    Console.ReadKey();
}
Besides not using the WebClient.DownloadString method, I had to add the decompression line:
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
Thanks for the tips (especially the Fiddler one; the Decode button saved me time finding what was wrong).
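For what it's worth, the same fix applies to the newer HttpClient, where decompression is configured on the handler rather than the request; a minimal sketch:

```csharp
// HttpClient equivalent of setting HttpWebRequest.AutomaticDecompression.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using (var client = new HttpClient(handler))
{
    string body = client.GetStringAsync("http://content.warframe.com/dynamic/rss.php").Result;
    Console.WriteLine(body);  // decoded RSS/XML text, not raw gzip bytes
}
```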
Check this answer:
..The underlying connection was closed: An unexpected error occurred on a receive
So this may work for you:
var webRequest = (HttpWebRequest)WebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
webRequest.KeepAlive = false;
var resp = webRequest.GetResponse();
EDIT:
You are right; check this instead:
http://msdn.microsoft.com/cs-cz/library/system.net.httpwebrequest.keepalive%28v=vs.110%29.aspx
Here is working code that will print out the received response content:
static void Main(string[] args)
{
    // Create a new HttpWebRequest object. Make sure that
    // a default proxy is set if you are behind a firewall.
    HttpWebRequest myHttpWebRequest1 =
        (HttpWebRequest)WebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
    myHttpWebRequest1.KeepAlive = false;
    // Assign the response object of HttpWebRequest to a HttpWebResponse variable.
    HttpWebResponse myHttpWebResponse1 =
        (HttpWebResponse)myHttpWebRequest1.GetResponse();
    Console.WriteLine("\nThe HTTP request headers for the first request are: \n{0}", myHttpWebRequest1.Headers);
    Stream streamResponse = myHttpWebResponse1.GetResponseStream();
    StreamReader streamRead = new StreamReader(streamResponse);
    Char[] readBuff = new Char[256];
    int count = streamRead.Read(readBuff, 0, 256);
    Console.WriteLine("The contents of the HTML page are.......\n");
    while (count > 0)
    {
        String outputData = new String(readBuff, 0, count);
        Console.Write(outputData);
        count = streamRead.Read(readBuff, 0, 256);
    }
    Console.WriteLine();
    // Close the stream objects.
    streamResponse.Close();
    streamRead.Close();
    Console.ReadKey();
}

C# using HttpWebRequest POST method doesn't work

Hey, I'm trying to figure out how to use HttpWebRequest to send a POST request to a login page (say, Yahoo Mail) and examine the returned page source.
But using my POST method I still get the login page. Here is my method:
public static string GetResponse(string sURL, ref CookieContainer cookies, string sParameters)
{
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(sURL);
    httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36";
    httpRequest.CookieContainer = cookies;
    httpRequest.Method = "POST";  // HTTP method names are case-sensitive
    httpRequest.ContentType = "application/x-www-form-urlencoded";
    // Use the byte count, not the character count -- they can differ in UTF-8.
    byte[] postBytes = Encoding.UTF8.GetBytes(sParameters);
    httpRequest.ContentLength = postBytes.Length;
    httpRequest.AllowAutoRedirect = true;
    using (Stream stream = httpRequest.GetRequestStream())
    {
        stream.Write(postBytes, 0, postBytes.Length);
    }
    HttpWebResponse httpWebResponse = (HttpWebResponse)httpRequest.GetResponse();
    string sResponse;
    using (Stream stream = httpWebResponse.GetResponseStream())
    {
        StreamReader reader = new StreamReader(stream, System.Text.Encoding.GetEncoding(936));
        sResponse = reader.ReadToEnd();
    }
    return sResponse;
}
The code to call the method is:
string sParameter = ".tries=1&.src=ym&.md5=&.hash=&.js=&.last=&promo=&.intl=us&.lang=en-US&.bypass=&.partner=&.u=eip09319532h1&.v=0&.challenge=3QjvX9eEFtJRrABhZp9kgS9IT.VO&.yplus=&.emailCode=&pkg=&stepid=&.ev=&hasMsgr=0&.chkP=Y&.done=http%3A%2F%2Fmail.yahoo.com&.pd=ym_ver%3D0%26c%3D%26ivt%3D%26sg%3D&.ws=1&.cp=0&nr=0&pad=3&aad=3&login=username%40yahoo.com&passwd=xxxxx&.persistent=&.save=&passwd_raw=";
System.Net.CookieContainer cookies = new System.Net.CookieContainer(); // was null, which disables cookie tracking entirely
string sResponse = GetResponse(sUrl, ref cookies, sParameter);
The string sParameter was obtained by examining the data posted to the server in Firefox's Firebug plugin. But in the parameters I posted above, I masked my user id and password.
I wanted to re-use the session so I passed a CookieContainer object as reference to the method.
It compiles and runs, but the returned page is not in a logged-in state.
I have read several similar questions on Stack Overflow but still can't make my method work. Your help is appreciated.

Screen scraping: unable to authenticate into a site utilizing ASP .NET Forms Authentication

Using a C# WebRequest, I am attempting to screen scrape a website utilizing ASP.NET Forms Authentication.
First, the application performs a GET to the login page and extracts the __VIEWSTATE and __EVENTVALIDATION keys from hidden input fields, and the .NET SessionId from its cookie. Next, the application performs a POST with the username, password, other required form fields, and the three aforementioned .NET variables to the form action.
From a Fiddler session using Chrome to authenticate into the website, I am expecting a 302 with a token stored in a cookie to allow navigation of the secure area of the site. I cannot understand why I keep getting 302s without a token, redirecting me to the website's non-authenticated home page. In Fiddler, my application's request looks exactly the same as the request made from within Chrome or Firefox.
// Create a request using a URL that can receive a post.
var request = (HttpWebRequest)WebRequest.Create(LoginUrl);
// Set the Method property of the request to POST.
_container = new CookieContainer();
request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers["Accept-Encoding"] = "gzip,deflate,sdch";
request.Headers["Accept-Language"] = "en-US,en;q=0.8";
var response = (HttpWebResponse)request.GetResponse();
_container.Add(response.Cookies);
string responseFromServer;
using (var decompress = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
{
using (var reader = new StreamReader(decompress))
{
// Read the content.
responseFromServer = reader.ReadToEnd();
}
}
var doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
var hiddenFields = doc.DocumentNode.SelectNodes("//input[@type='hidden']").ToDictionary(input => input.GetAttributeValue("name", ""), input => input.GetAttributeValue("value", ""));
request = (HttpWebRequest)WebRequest.Create(LoginUrl);
request.Method = "POST";
request.CookieContainer = _container;
// Create POST data and convert it to a byte array. Modify this line accordingly
var postData = String.Format("ddlsubsciribers={0}&memberfname={1}&memberpwd={2}&chkRemberMe=true&Imgbtn=LOGIN&__EVENTTARGET&__EVENTARGUMENT&__LASTFOCUS", Agency, Username, Password);
postData = hiddenFields.Aggregate(postData, (current, field) => current + ("&" + field.Key + "=" + field.Value));
ServicePointManager.ServerCertificateValidationCallback = AcceptAllCertifications;
var byteArray = Encoding.UTF8.GetBytes(postData);
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36";
// Set the ContentType property of the WebRequest.
request.ContentType = "application/x-www-form-urlencoded";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers["Accept-Encoding"] = "gzip,deflate,sdch";
request.Headers["Accept-Language"] = "en-US,en;q=0.8";
// Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length;
// Get the request stream.
var dataStream = request.GetRequestStream();
// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
// Close the Stream object.
dataStream.Close();
// Get the response.
response = (HttpWebResponse)request.GetResponse();
_container.Add(response.Cookies);
// Clean up the streams.
dataStream.Close();
response.Close();
As it turned out, some characters in the __EVENTVALIDATION variable were being encoded into a line break, and ASP.NET then threw out the session, assuming it had become corrupt. The solution was to escape the ASP.NET variables using Uri.EscapeDataString:
postData = hiddenFields.Aggregate(postData, (current, field) => current + ("&" + field.Key + "=" + Uri.EscapeDataString(field.Value)));
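To illustrate why the escaping matters: __VIEWSTATE and __EVENTVALIDATION are base64-style values that can contain '+', '/', and '=', all of which have special meaning in a form-urlencoded body. The value below is made up for illustration:

```csharp
// A made-up viewstate-style value containing '+', '/', and '='.
string raw = "wEWBgK+7/XOBQ==";
Console.WriteLine(Uri.EscapeDataString(raw));  // wEWBgK%2B7%2FXOBQ%3D%3D
// Sent unescaped, the server decodes '+' as a space, the validation
// token no longer matches, and ASP.NET discards the session.
```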

matweb.com: How to get source of page?

I have url like:
http://www.matweb.com/search/DataSheet.aspx?MatGUID=849e2916ab1541be9ff6a17b78f95c82
I want to download source code from that page using this code:
private static string urlTemplate = @"http://www.matweb.com/search/DataSheet.aspx?MatGUID=";

static string GetSource(string guid)
{
    try
    {
        Uri url = new Uri(urlTemplate + guid);
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.Method = "GET";
        HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
        Stream responseStream = webResponse.GetResponseStream();
        StreamReader responseStreamReader = new StreamReader(responseStream);
        String result = responseStreamReader.ReadToEnd();
        return result;
    }
    catch (Exception)
    {
        return null;
    }
}
When I do so I get:
You do not seem to have cookies enabled. MatWeb Requires cookies to be enabled.
OK, that I understand, so I added these lines:
CookieContainer cc = new CookieContainer();
webRequest.CookieContainer = cc;
I got:
Your IP Address has been restricted due to excessive use. The problem may be compounded when an IP address may be shared by many people in a company or through an internet service provider. We apologize for any inconvenience.
I can understand this, but I don't get this message when I visit the page in a web browser. What can I do to get the source code? Some cookies or HTTP headers?
It probably doesn't like your UserAgent. Try this:
webRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)"; //maybe substitute your own in here
It looks like you're doing something the company doesn't like, if you got an "excessive use" response: you are downloading pages too fast.
When you use a browser you might fetch up to one page per second; with an application you can fetch several pages per second, and that's probably what their web server is detecting. Hence the excessive-use block.
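If the request rate is the problem, throttling the scraper to roughly browser speed may avoid the block. A sketch using the GetSource method from the question; the one-second delay is a guess at what the site tolerates, and the GUID list is hypothetical:

```csharp
// Hypothetical list of material GUIDs to fetch.
var guids = new[] { "849e2916ab1541be9ff6a17b78f95c82" };
foreach (string guid in guids)
{
    string source = GetSource(guid);  // the method from the question
    Console.WriteLine(source == null ? "request failed" : "ok");
    // Wait between requests so the server does not flag excessive use.
    Thread.Sleep(1000);
}
```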
