C# WebClient receives 403 when getting HTML from a site

I am trying to download the HTML from a site and parse it. I am actually only interested in the OpenGraph data in the head section. For most sites, using WebClient, HttpClient, or HtmlAgilityPack works, but for some domains I get a 403, for example: westelm.com
I have tried setting the headers to be exactly the same as the ones my browser sends, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using (WebClient client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from automated requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.

I used your question to resolve the same problem. I don't know if you have already fixed this, but I'll tell you how it worked for me.
A page was giving me a 403 for the same reason. The thing is: you need to emulate a web browser from the code by sending the right set of headers.
I added one of your headers that I wasn't using (like Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch and parse the page:
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));

    // Browser-like headers; some sites refuse requests that lack them.
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    if (response == null)
        return string.Empty;

    // Caveat: this assumes the server answers with gzip. If it picks deflate
    // or br from the Accept-Encoding list above, GZipStream will throw.
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
    using var streamReader = new StreamReader(decompressedStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
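As a side note on the decompression: instead of inflating the stream by hand, you can let the handler do it, which also copes with servers that answer with deflate instead of gzip. A minimal sketch under that assumption; the HtmlFetcher name is mine, not from the thread:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class HtmlFetcher
{
    // Handler-level decompression: the handler sends the matching
    // Accept-Encoding values and transparently inflates the response.
    private static readonly HttpClient Client = new HttpClient(new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    });

    public static async Task<string> GetHtmlAsync(string url)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
        request.Headers.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
        using var response = await Client.SendAsync(request).ConfigureAwait(false);
        // ReadAsStringAsync sees the already-decompressed body.
        return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
    }
}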
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
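Since the original question is specifically about OpenGraph data: once the HTML has been fetched by any of the methods above, pulling out the og: tags with HtmlAgilityPack takes only a few lines. A minimal sketch; the helper name is mine:

using System.Collections.Generic;
using HtmlAgilityPack;

static Dictionary<string, string> GetOpenGraphData(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var result = new Dictionary<string, string>();
    // OpenGraph data lives in <meta property="og:..." content="..."> tags in the head.
    var metas = doc.DocumentNode.SelectNodes("//meta[starts-with(@property, 'og:')]");
    if (metas == null)
        return result;
    foreach (var meta in metas)
        result[meta.GetAttributeValue("property", "")] = meta.GetAttributeValue("content", "");
    return result;
}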

Related

How to get JSON from a secured API

I made an API; for this example I will call it https://testapp.azurewebsites.net.
I set up the Authentication / Authorization in Azure for Google, Facebook and Microsoft accounts.
Then I want to consume it in Xamarin for my Android/iOS app, so I built the login button etc., but once I'm authenticated I can't get the JSON from my API URL: "https://testapp.azurewebsites.net/api/Test/allCoordinates".
It works perfectly in my browser and in Postman. This is my code in C#:
var requestCoord = new OAuth2Request("GET", new Uri(URL), null, e.Account);
var responseCoord = await requestCoord.GetResponseAsync(); // works for the Google userinfo endpoint, but not for my API
string coordJson = await responseCoord.GetResponseTextAsync();
var mapTest = JsonConvert.DeserializeObject<List<CustomPin>>(coordJson);
In Postman it works, and Postman generates this C# code for the request:
var client = new RestClient("https://testapp.azurewebsites.net/api/Test/allCoordinates?access_token=ya29.a0AfH6SMCEGy4tP_zngNEhAcpf31d3O_ZYl7NE9QJjbKrW0KPh-dC7PjNmz-KOCbkySRtuwDJdg2ckhiTaTdIEsONxVhFhK3NpnUk9iITyCB1BnwpWJwNEEivxg0pL93UPP9r4UYf1dHEiTVd63eydfV7HoKlxExMFtS8");
client.Timeout = -1;
var request = new RestRequest(Method.GET);
request.AddHeader("User-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36");
request.AddHeader("Cookie", "ARRAffinity=d2e047f134af60dd8e0802593ad5206002e99e56a6231fee0e85747cfa96ea6f; AppServiceAuthSession=Dh0GnGQjaNoBXKv8r4lM9BoJkAA1UFSLuoDDAVP1qGrPP3ICauM1Glsb+Q7NhU+4m+IuPh5ZqGv2bzU6FtvEqri4Io88RuP6ZzKPayXSJKn4WbkzteU59if76yVY/KSjmwjbdUTC47yO+XO2snKygYlGZ9+pVlgaF/UdmW6OLWDlqPvJ069oSXkkZb/gGV5m6dHzYvfn3PcJ4HJmfEPQDclsRvRYUmpIY11hWcRUiiVx26o/SE+IaytRfWxkGk4g/thMFW3IOFtw09DdGXma/Qik8ANybClwXZ7G/3i1VyHQLM9TnU3UGcjtArLUFVj4T3jNkdaVioxtNQWJcDvwN54OL24eNFMM4Ov7Rbo7t2QtQrW73KxOrG/RyJHvBTHTyhjmAw6Hb7wg7VwcJvpKwcJKFBWH5ntvouFhj/DmCrzBuG/Cz6K+81ocEnHBLsHcx9qHrEBXCU3FlMQbogDcRRo1om78IwK+OxKoY+CzDWAJW3taJLl+jVO6QgFtbyqZKErzxEX1jeVcHTWTBdTImYaiA6zs1KKCSgo+rR3G0GWxvyWt9XCZwZD/5E+MYK3pxWFduKmmsEjSYrCgQ7Yhwe2bQg2bvX2HPScfo+yKVoIzQHNArqDr2NVTaWRUt2zN3GoLzSDxe5YgDjHXyo0ES6mEbEKsKy4dYDD7uRS/rdHRTTHUdih5i169sHvJlj0UFyaU8MV+J/dxbuMmNysqOmzVUU18oWQntE48RN35/Js=");
IRestResponse response = client.Execute(request);
Console.WriteLine(response.Content);
But how do I find the cookie data in Xamarin, and how do I call this URL?
If you have any idea how to make this HTTP GET against a protected web API, it would help me a lot!
Thanks a lot.
If you need more details, let me know. ;)
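For what it's worth: the Postman snippet above passes the token in the query string. A common pattern with a plain HttpClient is to send the token from the Xamarin.Auth account as an Authorization header instead. This is only a sketch, and whether the Azure App Service authentication layer accepts the provider's token this way depends on how it is configured:

using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Xamarin.Auth;

static async Task<List<CustomPin>> GetCoordinatesAsync(Account account)
{
    using var client = new HttpClient();
    // The Xamarin.Auth account stores the provider's token in Properties.
    client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", account.Properties["access_token"]);
    string json = await client.GetStringAsync(
        "https://testapp.azurewebsites.net/api/Test/allCoordinates");
    return JsonConvert.DeserializeObject<List<CustomPin>>(json);
}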

C# HttpClient 504 Gateway Timeout when not using Fiddler proxy

I have this simple code to instantiate an HttpClient object and send a few web requests, but I am running into a few problems that I will explain shortly:
var client = WebHelper.CreateGzipHttpClient(new WebProxy("127.0.0.1", 8888));
client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.9");
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate, br");
client.DefaultRequestHeaders.Add("Sec-Fetch-Mode", "navigate");
client.DefaultRequestHeaders.Add("Sec-Fetch-Site", "none");
client.DefaultRequestHeaders.Add("Sec-Fetch-User", "?1");
client.DefaultRequestHeaders.Add("Upgrade-Insecure-Requests", "1");
await client.GetAsync("https://www.example.com");
await client.GetAsync("https://www.bestbuy.com");
await client.GetAsync("https://www.costco.com");
If I remove the request to example.com, the subsequent requests fail (504 Gateway Timeout on bestbuy.com). This doesn't make any sense to me, so I was wondering if someone on SO could enlighten me as to why that is.
Furthermore, if I remove the WebProxy from the HttpClient, only the request to example.com will succeed, and the other 2 will fail.
What is going on and how can I fix it?
public static HttpClient CreateGzipHttpClient(WebProxy proxy = null)
{
    HttpClientHandler handler = new HttpClientHandler()
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
        Proxy = proxy
    };
    return new HttpClient(handler);
}
Fixed by removing the Fiddler-related SSL certificates in Internet Explorer's Internet Options. They weren't removed even after uninstalling Fiddler.
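For anyone debugging something similar: catching the exception and walking its inner exceptions usually makes a TLS trust failure from leftover interception certificates visible immediately. A small diagnostic sketch (my own, not from the thread), with UseProxy disabled so stale system proxy settings can't interfere:

using System;
using System.Net.Http;
using System.Threading.Tasks;

static async Task ProbeAsync(string url)
{
    // UseProxy = false rules out lingering system proxy settings.
    using var client = new HttpClient(new HttpClientHandler { UseProxy = false });
    try
    {
        using var response = await client.GetAsync(url);
        Console.WriteLine($"{url}: {(int)response.StatusCode} {response.StatusCode}");
    }
    catch (HttpRequestException ex)
    {
        // TLS trust failures from stale interception certs surface here.
        for (Exception e = ex; e != null; e = e.InnerException)
            Console.WriteLine($"{e.GetType().Name}: {e.Message}");
    }
}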

Get webpage content in ASP.NET using C#

I want to fill my multiline textbox with a webpage's content. This is my code:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
valuetxt.Text = htmlBody.InnerText;
This code works fine for some URLs, but for others (https) it gives me an error:
Could not find file 'C:\Program Files\IIS Express\www.justdial.com
or:
The remote server returned an error: (403) Forbidden
Can anyone help me? Thanks in advance, sorry for my bad English.
Are you behind a proxy? Even on the open internet, depending on your network configuration, you might need to set credentials on your connection before making the request:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
request.Credentials = new NetworkCredential("user", "password");
It seems the address in the urltxt variable doesn't start with http:// or https://, so you get the error because of relative addressing.
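That would explain the "Could not find file" message: without a scheme, the path is resolved against the local filesystem. A small guard you could apply before creating the request (a sketch; the helper name is mine):

using System;

// Prepend a scheme when the user typed a bare host like "www.justdial.com".
static string NormalizeUrl(string input)
{
    string url = input.Trim();
    if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
        !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
        url = "https://" + url;
    }
    return url;
}

// Usage: WebRequest request = WebRequest.Create(NormalizeUrl(urltxt.Text));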
Add a user agent to your request to connect to https sites properly. Note that UserAgent lives on HttpWebRequest, so cast for it:
((HttpWebRequest)request).UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";

Why can't I use HttpClient to log in to this ASP.NET website?

There's an ASP.NET website from a third party that requires one to log on. I need to get some data from the website and parse it, so I figured I'd use HttpClient to post the necessary credentials to the website, just as the browser would. Then, after that POST request, I figured I'd be able to use the cookie values I received to make further requests to the (authorization-only) URLs.
I'm down to the point where I can successfully POST the credentials to the login url and receive three cookies: ASP.NET_SessionId, .ASPXAUTH, and a custom value used by the website itself, each with their own values. I figured that since the HttpClient I set up is using an HttpHandler that is using a CookieContainer, the cookies would be sent along with each further request, and I'd remain logged in.
However, this does not appear to be working. If I use the same HttpClient instance to then request one of the secured areas of the website, I'm just getting the login form again.
The code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
    CookieContainer = cookieContainer,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));

var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);

// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("customercode", "password"),
    new KeyValuePair<string, string>("customerid", "username"),
    new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;

loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");

var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
When using the proper credentials, cookies contains three items, including a .ASPXAUTH cookie and a session id, which suggests that the login succeeded. However:
var text = await client.GetStringAsync("https://some.website/secureaction");
...this just returns the login form again, and not the content I get when I log in using the browser and navigate to /secureaction.
What am I missing?
EDIT: here's the complete request my application is making alongside the request Chrome is making. They are identical, save for the cookie values. I ran them through WinDiff: the lines marked <! are the lines sent by my application, the ones marked !> are sent by Chrome.
GET https://some.website/secureaction
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept-Encoding: gzip, deflate, sdch, br
Upgrade-Insecure-Requests: 1
Host: some.website
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cookie:
<! customCookie=7CF190C0;
<! .ASPXAUTH=37D61E47(shortened for readability);
<! ASP.NET_SessionId=oqwmfwahpvf0qzpiextx0wtb
!> ASP.NET_SessionId=kn4t4rmeu2lfrgozjjga0z2j;
!> customCookie=8D43E263;
!> .ASPXAUTH=C2477BA1(shortened for readability)
The HttpClient application gets a 302 redirect to /login; Chrome gets a 200 response containing the requested page.
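As an aside, you can observe that 302 from code rather than through a proxy by turning off automatic redirects on the handler. A quick sketch, reusing the cookieContainer from the snippet above:

var probeHandler = new HttpClientHandler
{
    CookieContainer = cookieContainer,
    AllowAutoRedirect = false // surface the 302 instead of silently following it
};
using var probeClient = new HttpClient(probeHandler);
using var probeResponse = await probeClient.GetAsync("https://some.website/secureaction");
Console.WriteLine((int)probeResponse.StatusCode);  // 302 when not authenticated
Console.WriteLine(probeResponse.Headers.Location); // the /login redirect target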
As requested, here's how I eventually made it work. I had to do a simple GET request to /login first, and then do the POST with the login credentials. I don't recall exactly what value is set by that GET (I assume a cookie with some encoded value the server wants), but the HttpClient takes care of the cookies anyway, so it just works. Here's the final, working code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
    CookieContainer = cookieContainer,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));

// First do a GET to the login page, allowing the server to set certain
// required cookie values.
var initialGetRequest = new HttpRequestMessage(HttpMethod.Get, loginUri);
await client.SendAsync(initialGetRequest);

var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);

// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("customercode", "password"),
    new KeyValuePair<string, string>("customerid", "username"),
    new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;

loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");

var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));

Fake HTTP POST request - ViewState

I am trying to fake a POST request to a site built with C#.
I used Wireshark to sniff the communication between my computer and the server.
I noticed that the client sends ViewState data (encoded in Base64), and I would like to know how to fake it in my request.
My POST code:
public static void sendPostRequest(string responseUri, CookieCollection responseCookies)
{
    HttpWebRequest mPostRequest =
        (HttpWebRequest)WebRequest.Create("http://tickets.cinema-city.co.il/webtixsnetglilot/SelectSeatPage2.aspx?dtticks=" + responseUri + "&hideBackButton=1");
    mPostRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36";
    mPostRequest.KeepAlive = false;
    mPostRequest.Method = "POST"; // method names are case-sensitive on the wire
    mPostRequest.ContentType = "application/x-www-form-urlencoded";

    CookieContainer mCookies = new CookieContainer();
    foreach (Cookie cookie in responseCookies)
    {
        mCookies.Add(cookie);
    }
    mPostRequest.CookieContainer = mCookies;

    // Note: no request body is written here, so no form fields
    // (including __VIEWSTATE) are actually sent with the POST.
    HttpWebResponse myHttpWebResponse2 = (HttpWebResponse)mPostRequest.GetResponse();
}
If you can "fake" signed/encrypted data you don't really need to deal with fake posts - just steal all SSL traffic :).
View state comes in original response for the page encrypted - so you simply need to parse original response (use Html Agility Pack) and send that view state back in post request.
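A rough sketch of that round trip, assuming HtmlAgilityPack and an HttpClient built on a cookie-aware handler so the session carries over. The field names __VIEWSTATE and __EVENTVALIDATION are the standard ASP.NET hidden fields; the URL and remaining form fields are placeholders:

using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

static async Task<string> PostWithViewStateAsync(HttpClient client, string pageUrl)
{
    // 1. GET the page and pull the hidden ASP.NET fields out of the form.
    string html = await client.GetStringAsync(pageUrl);
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    string GetHidden(string name) =>
        doc.DocumentNode.SelectSingleNode($"//input[@name='{name}']")
           ?.GetAttributeValue("value", "") ?? "";

    // 2. POST them back together with whatever fields the form expects.
    var form = new Dictionary<string, string>
    {
        ["__VIEWSTATE"] = GetHidden("__VIEWSTATE"),
        ["__EVENTVALIDATION"] = GetHidden("__EVENTVALIDATION"),
        // ... your own form fields go here ...
    };
    var response = await client.PostAsync(pageUrl, new FormUrlEncodedContent(form));
    return await response.Content.ReadAsStringAsync();
}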
