I have this simple code to instantiate an HttpClient object and send a few web requests, but I'm running into a few problems that I'll explain shortly:
var client = WebHelper.CreateGzipHttpClient(new WebProxy("127.0.0.1", 8888));
client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.9");
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate, br");
client.DefaultRequestHeaders.Add("Sec-Fetch-Mode", "navigate");
client.DefaultRequestHeaders.Add("Sec-Fetch-Site", "none");
client.DefaultRequestHeaders.Add("Sec-Fetch-User", "?1");
client.DefaultRequestHeaders.Add("Upgrade-Insecure-Requests", "1");
await client.GetAsync("https://www.example.com");
await client.GetAsync("https://www.bestbuy.com");
await client.GetAsync("https://www.costco.com");
If I remove the request to example.com, the subsequent requests fail (504 Gateway Timeout on bestbuy.com). That doesn't make any sense to me, so I was wondering if someone on SO could explain why.
Furthermore, if I remove the WebProxy from the HttpClient, only the request to example.com succeeds, and the other two fail.
What is going on and how can I fix it?
public static HttpClient CreateGzipHttpClient(WebProxy proxy = null)
{
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
Proxy = proxy
};
return new HttpClient(handler);
}
Fixed by removing the Fiddler-related SSL certificates in Internet Explorer's Internet Options. These weren't removed, even after uninstalling Fiddler.
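There is a second issue, separate from the certificate problem. The client in the question advertises `br` in Accept-Encoding, but CreateGzipHttpClient only enables GZip and Deflate decompression. A Brotli response would therefore reach the caller still compressed. Below is a minimal sketch that keeps the browser-like headers but only advertises encodings the handler can decode. The name CreateDecompressingClient is hypothetical and not part of the original code.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical variant of CreateGzipHttpClient: Accept-Encoding lists only the
// encodings the handler will decompress, so "br" is deliberately left out.
static HttpClient CreateDecompressingClient(WebProxy proxy = null)
{
    var handler = new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
        Proxy = proxy // null keeps the default proxy behaviour, as in the original
    };
    var client = new HttpClient(handler);

    // Listed explicitly for clarity; they match the decompression flags above.
    client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
    client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("deflate"));

    client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
    client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
    return client;
}
```

Usage mirrors the question, e.g. `var client = CreateDecompressingClient(new WebProxy("127.0.0.1", 8888));`.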
I am trying to download the HTML from a site and parse it. I am only interested in the OpenGraph data in the head section. For most sites, WebClient, HttpClient or HtmlAgilityPack works, but for some domains I get a 403, for example westelm.com.
I have tried setting the headers to be exactly the same as when I use the browser, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using(WebClient client = new WebClient()) {
client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
client.Headers["Accept-Encoding"] = "gzip, deflate, br";
client.Headers["Accept-Language"] = "en-US,en;q=0.9";
doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to solve the same problem. I don't know if you've already fixed this, but here's how it worked for me.
A page was giving me a 403 for the same reason. The thing is: you need to emulate a web browser from code by sending a lot of headers.
I added headers from your question that I wasn't sending before (such as Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the page.
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    // Browser-like headers: some sites answer 403 to requests without them
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    // Only advertise encodings this method can decode below
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, br");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    // SendAsync never returns null; check the status code instead
    if (!response.IsSuccessStatusCode)
        return string.Empty;
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    // Decompress according to what the server actually sent (BrotliStream needs .NET Core 2.1+)
    var encodings = response.Content.Headers.ContentEncoding;
    using Stream body = encodings.Contains("br") ? (Stream)new BrotliStream(responseStream, CompressionMode.Decompress)
        : encodings.Contains("gzip") ? new GZipStream(responseStream, CompressionMode.Decompress)
        : responseStream;
    using var streamReader = new StreamReader(body);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
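The question also asks how to pull the OpenGraph data out once the HTML is downloaded. Below is a rough sketch that collects `og:` meta tags using only the standard library. It assumes the tags are simple `<meta property="og:..." content="...">` elements. A regex is adequate only for plain head markup of this kind; an HTML parser such as HtmlAgilityPack is more robust. The helper name ExtractOpenGraph is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: maps og:* property names to their content values.
static Dictionary<string, string> ExtractOpenGraph(string html)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    var metaTag = new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);
    // Attribute with a double- or single-quoted value; both branches share the "value" group.
    var attribute = new Regex(@"(?<name>[\w:-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)')");

    foreach (Match tag in metaTag.Matches(html))
    {
        string? key = null, content = null;
        foreach (Match attr in attribute.Matches(tag.Value))
        {
            string name = attr.Groups["name"].Value.ToLowerInvariant();
            string value = WebUtility.HtmlDecode(attr.Groups["value"].Value);
            // Some sites use name="og:..." instead of property="og:..."
            if ((name == "property" || name == "name") && value.StartsWith("og:", StringComparison.OrdinalIgnoreCase))
                key = value;
            else if (name == "content")
                content = value;
        }
        if (key != null && content != null)
            result[key] = content;
    }
    return result;
}
```

Combined with the method above, this could be used as `var og = ExtractOpenGraph(await GetHtmlResponseAsync(httpClient, url));`.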
I'm making a POST request to a website to send username/password information in order to get a cookie containing an authentication token. I build, test, and debug the process in a console application, then set it to run as part of a Windows Service.
When I am running in the console app, there are 2 cookies being returned from the post request: the JSESSIONID and the AuthToken. After I deploy and run it in the Windows Service, I only see the JSESSIONID.
I set up Fiddler to watch the Windows Service, and I can see the AuthToken being passed back in the response to my POST request, but I am unable to get it from the cookie container.
Additionally, if I install the process on my local machine and run it through my Fiddler proxy, I get the AuthToken, but if I disable the proxy, no AuthToken appears in the cookies.GetCookies result.
I'm at a loss as to why it will operate fine when running as a console app on my local machine, but the AuthToken fails to return when run as a service on a remote machine.
My local machine is running .NET 4.7, and the server where the service is installed is running .NET 4.5.2.
Here's the code I'm using:
public string SubmitPost(Uri uri, string action, string contentPost, bool putRequest)
{
CookieContainer cookies = new CookieContainer();
WebRequestHandler handler = new WebRequestHandler();
handler.Proxy = new WebProxy("http://<FIDDLER PROXY>:8888", false, new string[] {});
X509Certificate cert = X509Certificate2.CreateFromSignedFile(m_CertPath);
handler.ClientCertificates.Add(cert);
handler.CookieContainer = cookies;
string resultContent = "";
using (var client = new HttpClient(handler))
{
AddClientHeadersForPost(uri, client);
cookies.Add(uri, m_jar);
var content = new StringContent(contentPost, Encoding.UTF8);
content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
// StringContent computes Content-Length from the encoded bytes; contentPost.Length counts characters, which is wrong for non-ASCII input
HttpResponseMessage result = client.PostAsync(action, content).Result;
resultContent = result.Content.ReadAsStringAsync().Result;
IEnumerable<Cookie> responseCookies = cookies.GetCookies(uri).Cast<Cookie>();
Logger.InfoFormat("{0} cookies", responseCookies.Count());
foreach (Cookie item in responseCookies)
{
Logger.InfoFormat("Cookie: {0}", item.Name);
if (item.Name.IndexOf("auth", StringComparison.OrdinalIgnoreCase) >= 0) // "AuthToken" does not contain lowercase "auth"
{
Logger.InfoFormat("Auth Token: {0}", item.Value);
}
m_jar.Add(item);
}
}
return (resultContent);
}
protected virtual void AddClientHeadersForPost(Uri uri, HttpClient client)
{
client.BaseAddress = uri;
client.DefaultRequestHeaders.TryAddWithoutValidation("Host", "<HOST URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("Origin", "<ORIGIN URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("X-Requested-With", "XMLHttpRequest");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "*/*");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8");
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36");
client.DefaultRequestHeaders.TryAddWithoutValidation("DNT", "1");
client.DefaultRequestHeaders.TryAddWithoutValidation("Referer", "<REFERER URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("X-CSRF-TOKEN", m_csrf);
client.DefaultRequestHeaders.TryAddWithoutValidation("X-2-CSRF-TOKEN", m_sppCSRF);
client.DefaultRequestHeaders.TryAddWithoutValidation("Connection", "keep-alive");
client.DefaultRequestHeaders.ExpectContinue = false;
}
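Fiddler shows the AuthToken in the response, so one way to narrow this down is to compare the raw Set-Cookie headers with what the CookieContainer kept. The container may drop a cookie whose Domain or Path attribute does not match the request URI. Below is a minimal sketch, assuming the raw values are read with `result.Headers.TryGetValues("Set-Cookie", out IEnumerable<string> raw)` after PostAsync. FindSetCookie and GetCookieAttribute are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the raw Set-Cookie header for the named cookie, or null if absent.
static string? FindSetCookie(IEnumerable<string> setCookieHeaders, string cookieName)
{
    return setCookieHeaders.FirstOrDefault(h =>
        h.TrimStart().StartsWith(cookieName + "=", StringComparison.OrdinalIgnoreCase));
}

// Hypothetical helper: value of an attribute such as Domain or Path,
// "" for a flag attribute such as Secure, or null if the attribute is missing.
static string? GetCookieAttribute(string setCookie, string attributeName)
{
    foreach (string part in setCookie.Split(';').Skip(1)) // skip the name=value pair
    {
        string[] pair = part.Split(new[] { '=' }, 2);
        if (pair[0].Trim().Equals(attributeName, StringComparison.OrdinalIgnoreCase))
            return pair.Length > 1 ? pair[1].Trim() : "";
    }
    return null;
}
```

Logging `GetCookieAttribute(raw, "Domain")` and `"Path"` for the AuthToken header can help: a value that differs from `uri` would explain why `cookies.GetCookies(uri)` does not return it.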
There's an ASP.NET website from a third party that requires you to log on. I need to get some data from the website and parse it, so I figured I'd use HttpClient to post the necessary credentials to the website, the same way the browser does. Then, after that POST request, I figured I'd be able to use the cookie values I received to make further requests to the (authorization-only) URLs.
I've got to the point where I can successfully POST the credentials to the login URL and receive three cookies: ASP.NET_SessionId, .ASPXAUTH, and a custom value used by the website itself, each with its own value. I figured that since the HttpClient I set up uses an HttpClientHandler with a CookieContainer, the cookies would be sent along with each further request and I'd remain logged in.
However, this does not appear to be working. If I use the same HttpClient instance to then request one of the secured areas of the website, I'm just getting the login form again.
The code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler() { CookieContainer = cookieContainer, AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate };
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
When using the proper credentials, cookies contains three items, including a .ASPXAUTH cookie and a session id, which suggests that the login succeeded. However:
var text = await client.GetStringAsync("https://some.website/secureaction");
...this just returns the login form again, and not the content I get when I log in using the browser and navigate to /secureaction.
What am I missing?
EDIT: here's the complete request my application is making and the request Chrome is making. They are identical, save for the cookie values. I ran them through WinDiff: the lines marked <! are sent by my application, and the ones marked !> are sent by Chrome.
GET https://some.website/secureaction
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept-Encoding: gzip, deflate, sdch, br
Upgrade-Insecure-Requests: 1
Host: some.website
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cookie:
<! customCookie=7CF190C0;
<! .ASPXAUTH=37D61E47(shortened for readability);
<! ASP.NET_SessionId=oqwmfwahpvf0qzpiextx0wtb
!> ASP.NET_SessionId=kn4t4rmeu2lfrgozjjga0z2j;
!> customCookie=8D43E263;
!> .ASPXAUTH=C2477BA1(shortened for readability)
The HttpClient application gets a 302 redirect to /login; Chrome gets a 200 response containing the requested page.
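The diff above already shows the cookies being sent. The CookieContainer side can also be checked in isolation with the offline sketch below. The cookie values are made-up placeholders, not values from the real site. If both cookies come back for /secureaction, the container is doing its job, and the missing piece is more likely server-side state set before login.

```csharp
using System;
using System.Net;

// Offline check with placeholder values: cookies stored for the login URI with
// Path "/" should be offered again for other paths on the same host.
var container = new CookieContainer();
var loginUri = new Uri("https://some.website/login");
container.Add(loginUri, new Cookie("ASP.NET_SessionId", "session-value", "/"));
container.Add(loginUri, new Cookie(".ASPXAUTH", "auth-value", "/"));

// Cookie header the handler would attach to a request for the secured page.
string secureHeader = container.GetCookieHeader(new Uri("https://some.website/secureaction"));
// A different host should get no cookies at all.
string otherHostHeader = container.GetCookieHeader(new Uri("https://other.website/"));
Console.WriteLine(secureHeader);
```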
As requested, here's how I eventually made it work. I had to do a simple GET request to /login first, and then do a POST with the login credentials. I don't recall what value exactly is being set by that GET (I assume a cookie with some encoded value the server wants), but the HttpClient takes care of the cookies anyway, so it just works. Here's the final, working code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
CookieContainer = cookieContainer,
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
// First do a GET to the login page, allowing the server to set certain
// required cookie values.
var initialGetRequest = new HttpRequestMessage(HttpMethod.Get, loginUri);
await client.SendAsync(initialGetRequest);
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
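The failure mode here was a silent 302 to /login, so it can help to check for that explicitly instead of comparing page contents. Below is a minimal sketch, assuming a probe client built with `AllowAutoRedirect = false` and sharing the same `cookieContainer`, so the 302 stays visible. IsLoginRedirect is a hypothetical helper.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper: true when the server answered with a redirect whose
// target path starts with /login, i.e. the session was not accepted.
static bool IsLoginRedirect(HttpResponseMessage response)
{
    int status = (int)response.StatusCode;
    if (status < 300 || status > 399 || response.Headers.Location == null)
        return false;

    Uri location = response.Headers.Location;
    // Location may be relative ("/login?ReturnUrl=...") or absolute.
    string path = location.IsAbsoluteUri ? location.AbsolutePath : location.OriginalString;
    return path.StartsWith("/login", StringComparison.OrdinalIgnoreCase);
}
```

For example, `if (IsLoginRedirect(await probeClient.GetAsync("https://some.website/secureaction")))` would indicate that the login did not take effect, where `probeClient` is the hypothetical non-redirecting client described above.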
I'm trying to automate a WebSocket service that denies the connection unless a User-Agent is sent with the CONNECT request.
I tried sending the upgrade request with HttpWebRequest and setting User-Agent using the property.
Using Fiddler to debug, this is the request that was sent out:
CONNECT *.*.com:443 HTTP/1.1
Host: *.*.com:443
Connection: keep-alive
How do I add the User-Agent string to the CONNECT request and then upgrade to using WebSocket protocol?
My code so far:
public void Login ( Action onEnd = null ) {
var req = CreateUpgradeRequest();
var res = GetResponse(req);
}
private HttpWebRequest CreateUpgradeRequest ( ) {
HttpWebRequest request = WebRequest.Create("https://lobby35.runescape.com/") as HttpWebRequest;
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36";
request.Connection = "Upgrade";
SetWebSocketHeader(request, "Key", "5LENZfSifyj/Rw1ghTvpgw==");
SetWebSocketHeader(request, "Version", "13");
SetWebSocketHeader(request, "Extensions", "permessage-deflate; client_max_window_bits");
SetWebSocketHeader(request, "Protocol", "jagex");
return request;
}
You cannot use WebRequest to create a WebSocket connection. You will need ClientWebSocket, and set the header with ClientWebSocket.Options.SetRequestHeader.
Note that you may have issues adding that header; see the question "Setting User-Agent HTTP header in ClientWebSocket".
Update: since you cannot add that header with ClientWebSocket, try WebSocket4Net.
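Below is a sketch of the ClientWebSocket route, configured with the handshake details from the question. It assumes the `jagex` value belongs in the subprotocol list, and that `SetRequestHeader("User-Agent", ...)` may be rejected on some runtimes, as the linked question suggests; the exception is caught rather than assumed away. Sec-WebSocket-Key and Sec-WebSocket-Version are generated by ClientWebSocket itself. CreateLobbySocket is a hypothetical helper.

```csharp
using System;
using System.Net.WebSockets;

// Hypothetical helper: configures, but does not connect, a ClientWebSocket.
static ClientWebSocket CreateLobbySocket(string userAgent, out bool userAgentAccepted)
{
    var socket = new ClientWebSocket();
    // Sent as Sec-WebSocket-Protocol; the key and version headers are generated automatically.
    socket.Options.AddSubProtocol("jagex");
    try
    {
        socket.Options.SetRequestHeader("User-Agent", userAgent);
        userAgentAccepted = true;
    }
    catch (ArgumentException)
    {
        // Some runtimes (e.g. .NET Framework) treat User-Agent as a restricted header here.
        userAgentAccepted = false;
    }
    return socket;
}
```

Connecting would then look like `await socket.ConnectAsync(new Uri("wss://lobby35.runescape.com/"), CancellationToken.None);`. The `wss://` scheme is an assumption derived from the `https://` URL in the question. When traffic goes through Fiddler, the CONNECT line is the tunnel to the proxy, and the upgrade GET carrying these headers travels inside that tunnel.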
HttpClient client = new HttpClient(new HttpClientHandler() { AllowAutoRedirect = true }) { };
HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, url);
request.Headers.Add("ContentType", "audio/mpeg");
request.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
request.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36");
request.Headers.Add("Accept-Encoding", "gzip, deflate");
request.Headers.Add("Host", "fsa.zedge.net");
request.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
request.Headers.Add("Accept-Language", "en-US,en;q=0.8,vi;q=0.6");
request.Headers.Add("Cache-Control", "max-age=0");
request.Headers.Add("Connection", "keep-alive");
HttpResponseMessage response = await client.GetAsync(new Uri(url,UriKind.Absolute));
response.EnsureSuccessStatusCode();
string responseUri = response.RequestMessage.RequestUri.AbsoluteUri;
An exception of type 'System.Net.Http.HttpRequestException' occurred in System.Net.Http.DLL but was not handled in user code
Additional information: Response status code does not indicate success: 403 (Forbidden).
I have a URL that, when requested, redirects to another URL, and the redirect target is different every time. The redirected URL is tied to a session that expires, so every time I request a new URL, I just get told that the session has failed.
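One thing worth checking in the code above: the HttpRequestMessage is built with all those headers but never sent. `client.GetAsync(new Uri(url, UriKind.Absolute))` issues a separate request without them, so the server probably receives no User-Agent at all, which may be what triggers the 403. Below is a minimal sketch that sends the prepared request with SendAsync instead; the helper names are hypothetical. It drops the ContentType entry, because a GET carries no body, and leaves out the duplicated Accept-Encoding and the manual Host header.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: browser-like headers from the question on a request that is actually sent.
static HttpRequestMessage BuildBrowserLikeRequest(string url)
{
    var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url, UriKind.Absolute));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8,vi;q=0.6");
    request.Headers.TryAddWithoutValidation("Cache-Control", "max-age=0");
    // Host comes from the URL; Accept-Encoding is better left to HttpClientHandler.AutomaticDecompression.
    return request;
}

// Hypothetical helper: sends the prepared request and returns the final URL after redirects,
// read from response.RequestMessage as in the question.
static async Task<string> GetFinalUriAsync(HttpClient client, string url)
{
    using HttpRequestMessage request = BuildBrowserLikeRequest(url);
    using HttpResponseMessage response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    return response.RequestMessage!.RequestUri!.AbsoluteUri;
}
```

Reusing a single HttpClient, with its default cookie handling, across calls keeps any session cookie set during the redirect chain. That may help with the expiring-session problem, though whether it is enough depends on the site.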