I am building an application that automatically scrapes a webpage for updates and their respective download links, and downloads them if needed. Right now I am trying to work out the best way to download a ZIP file via HttpClient. The page I am accessing requires authentication, so I use HttpClient to authenticate and create a CookieContainer, which lets me reach the required files. However, I've looked all through the API and cannot figure out how to perform the download itself. I've tried various methods, but none have worked. This is what the method looks like so far; I still haven't found a good way to download the ZIP:
var baseAddress = new Uri(url);
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
{
client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0");
client.DefaultRequestHeaders.Add("Connection", "keep-alive");
client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
client.DefaultRequestHeaders.Add("Accept", "application/json, text/javascript, */*; q=0.01");
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("username", username),
new KeyValuePair<string, string>("password", password),
});
//Download Files
string[] download = getDownload().Split('/');
foreach (string link in download)
{
string path = @".\" + link + ".zip";
string dlink = url + link;
var uri = new Uri(dlink);
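A minimal sketch of how the missing download step could look, assuming the authenticated client from above; the method name is hypothetical, and the response body is streamed straight to disk so a large ZIP is never buffered in memory:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class Downloader
{
    // Streams one file from `uri` to `path` using an already-authenticated client.
    public static async Task DownloadFileAsync(HttpClient client, Uri uri, string path)
    {
        // ResponseHeadersRead makes GetAsync return as soon as the headers arrive,
        // so the body can be copied to disk in chunks instead of buffered whole.
        using (var response = await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var httpStream = await response.Content.ReadAsStreamAsync())
            using (var fileStream = File.Create(path))
            {
                await httpStream.CopyToAsync(fileStream);
            }
        }
    }
}
```

Inside the `foreach` loop above, this would be called as `await Downloader.DownloadFileAsync(client, uri, path);`. Because the client was built on the handler holding the `CookieContainer`, the authentication cookies are sent automatically with the download request.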
I would like to send an HTTP POST request in order to log in to a website using HttpClient and FormUrlEncodedContent. I can inspect what the response is supposed to look like in my Chrome browser, but when I recreate the POST request my browser makes, I don't get the anticipated response. The website I'm trying to log into is https://www.lectio.dk/lectio/31/login.aspx.
I guess I'm in doubt as to what I should supply to the FormUrlEncodedContent (using the Network tab in Chrome I can see the request headers and form data, but how do I select what to supply?). Currently my code looks like this.
CookieContainer container = new CookieContainer();
HttpClientHandler handler = new HttpClientHandler();
handler.CookieContainer = container;
HttpClient client = new HttpClient(handler);
Console.WriteLine(client.DefaultRequestHeaders + "\n" + "--------------------------");
Dictionary<string, string> vals = new Dictionary<string, string>
{
{"user-agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"},
{"accept-language","en-GB,en-AS;q=0.9,en-DK;q=0.8,en;q=0.7,da-DK;q=0.6,da;q=0.5,en-US;q=0.4"},
{"accept-encoding","gzip, deflate, br"},
{"accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3"},
{"m$Content$username2","username"},
{"m$Content$passwordHidden","password"},
{"__EVENTTARGET","m$Content$submitbtn2"},
};
FormUrlEncodedContent content = new FormUrlEncodedContent(vals);
var response = client.PostAsync("https://www.lectio.dk/lectio/31/login.aspx", content);
var responseString = response.Result.Content.ReadAsStringAsync().Result;
Console.WriteLine(responseString);
handler.Dispose();
client.Dispose();
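One thing worth untangling here is that the dictionary above mixes HTTP headers (user-agent, accept, and so on) with the actual form fields. Only the form's `<input>` fields belong in the body; headers go on the client. A small local sketch (field names taken from the question, values as placeholders) that serializes the content without any network call:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

class FormContentDemo
{
    static void Main()
    {
        // Only the form's <input> fields belong in the POST body.
        var fields = new Dictionary<string, string>
        {
            { "m$Content$username2", "username" },
            { "m$Content$passwordHidden", "password" },
            { "__EVENTTARGET", "m$Content$submitbtn2" },
        };
        var content = new FormUrlEncodedContent(fields);

        // ReadAsStringAsync needs no network call; it just serializes the body.
        string body = content.ReadAsStringAsync().Result;
        Console.WriteLine(body);

        // Headers such as User-Agent are set on the client instead:
        var client = new HttpClient();
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");
    }
}
```

Note that an ASP.NET WebForms login page like this one typically also expects the hidden `__VIEWSTATE` and `__EVENTVALIDATION` fields, which have to be scraped from a prior GET of the login page and included in the POST body.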
The idea is to be logged into the website (I guess my CookieContainer will take care of that?) so that I can scrape some data.
I'm making a POST request to a website to send username/login information in order to get a cookie containing an authentication token. I built, tested, and debugged the process running in a console application, then set it to run as part of a Windows service.
When I run it as the console app, the POST request returns two cookies: the JSESSIONID and the AuthToken. After I deploy it and run it in the Windows service, I only see the JSESSIONID.
I set up Fiddler to watch the Windows service, and I can see the AuthToken being passed back in the response to my POST request, but I am unable to get it from the CookieContainer.
Additionally, if I install the process on my local machine and run it through my Fiddler proxy, I am able to get the AuthToken, but if I disable the proxy, no AuthToken is contained in the cookies.GetCookies result.
I'm at a loss as to why it works fine running as a console app on my local machine, while the AuthToken fails to appear when run as a service on a remote machine.
My local machine is running .NET 4.7; the server where the service is installed is running 4.5.2.
Here's the code I'm using:
public string SubmitPost(Uri uri, string action, string contentPost, bool putRequest)
{
CookieContainer cookies = new CookieContainer();
WebRequestHandler handler = new WebRequestHandler();
handler.Proxy = new WebProxy("http://<FIDDLER PROXY>:8888", false, new string[] {});
X509Certificate cert = X509Certificate2.CreateFromSignedFile(m_CertPath);
handler.ClientCertificates.Add(cert);
handler.CookieContainer = cookies;
string resultContent = "";
using (var client = new HttpClient(handler))
{
AddClientHeadersForPost(uri, client);
cookies.Add(uri, m_jar);
// This overload sets Content-Type and Content-Length correctly; setting
// ContentLength from the string length by hand breaks for multi-byte UTF-8.
var content = new StringContent(contentPost, Encoding.UTF8, "application/json");
HttpResponseMessage result = client.PostAsync(action, content).Result;
resultContent = result.Content.ReadAsStringAsync().Result;
IEnumerable<Cookie> responseCookies = cookies.GetCookies(uri).Cast<Cookie>();
Logger.InfoFormat("{0} cookies", responseCookies.Count());
foreach (Cookie item in responseCookies)
{
Logger.InfoFormat("Cookie: {0}", item.Name);
if (item.Name.Contains("auth"))
{
Logger.InfoFormat("Auth Token: {0}", item.Value);
}
m_jar.Add(item);
}
}
return (resultContent);
}
protected virtual void AddClientHeadersForPost(Uri uri, HttpClient client)
{
client.BaseAddress = uri;
client.DefaultRequestHeaders.TryAddWithoutValidation("Host", "<HOST URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("Origin", "<ORIGIN URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("X-Requested-With", "XMLHttpRequest");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "*/*");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.8");
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36");
client.DefaultRequestHeaders.TryAddWithoutValidation("DNT", "1");
client.DefaultRequestHeaders.TryAddWithoutValidation("Referer", "<REFERER URL>");
client.DefaultRequestHeaders.TryAddWithoutValidation("X-CSRF-TOKEN", m_csrf);
client.DefaultRequestHeaders.TryAddWithoutValidation("X-2-CSRF-TOKEN", m_sppCSRF);
client.DefaultRequestHeaders.TryAddWithoutValidation("Connection", "keep-alive");
client.DefaultRequestHeaders.ExpectContinue = false;
}
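One reason `GetCookies` can come back empty even though Fiddler shows the Set-Cookie header is how `CookieContainer` scopes cookies by host and by the Secure flag. A small local sketch (all hosts here are placeholders) that demonstrates those scoping rules without any network traffic:

```csharp
using System;
using System.Net;

class CookieScopeDemo
{
    static void Main()
    {
        var jar = new CookieContainer();
        var apiUri = new Uri("https://api.example.com/");

        // A cookie without an explicit Domain is host-only: it is returned
        // for the exact host it was set on...
        jar.Add(apiUri, new Cookie("AuthToken", "abc123", "/"));
        Console.WriteLine(jar.GetCookies(apiUri).Count);   // 1

        // ...but not for a sibling host on the same parent domain.
        var wwwUri = new Uri("https://www.example.com/");
        Console.WriteLine(jar.GetCookies(wwwUri).Count);   // 0

        // A Secure cookie is only handed out for https lookups, so a plain-http
        // lookup sees AuthToken but not the secure session cookie.
        jar.Add(apiUri, new Cookie("JSESSIONID", "xyz", "/") { Secure = true });
        var httpUri = new Uri("http://api.example.com/");
        Console.WriteLine(jar.GetCookies(httpUri).Count);  // 1 (AuthToken only)
    }
}
```

If the server marks the AuthToken cookie Secure, or sets a domain or path that does not match the URI passed to `GetCookies`, the container will hold the cookie but never return it for that lookup, which matches the symptom described above.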
There's an ASP.NET website from a third party that requires you to log on. I need to get some data from the website and parse it, so I figured I'd use HttpClient to POST the necessary credentials to the website, the same way the browser does. Then, after that POST request, I'd be able to use the cookie values I received to make further requests to the (authorization-only) URLs.
I'm down to the point where I can successfully POST the credentials to the login URL and receive three cookies: ASP.NET_SessionId, .ASPXAUTH, and a custom value used by the website itself, each with its own value. I figured that since the HttpClient I set up uses an HttpClientHandler with a CookieContainer, the cookies would be sent along with each further request, and I'd remain logged in.
However, this does not appear to be working. If I use the same HttpClient instance to request one of the secured areas of the website, I just get the login form again.
The code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler() { CookieContainer = cookieContainer, AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate };
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
When using the proper credentials, cookies contains three items, including a .ASPXAUTH cookie and a session id, which suggests that the login succeeded. However:
var text = await client.GetStringAsync("https://some.website/secureaction");
...this just returns the login form again, and not the content I get when I log in using the browser and navigate to /secureaction.
What am I missing?
EDIT: here are the complete request my application makes and the request Chrome makes. They are identical except for the cookie values. I ran them through WinDiff: the lines marked <! are sent by my application, the ones marked !> are sent by Chrome.
GET https://some.website/secureaction
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept-Encoding: gzip, deflate, sdch, br
Upgrade-Insecure-Requests: 1
Host: some.website
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cookie:
<! customCookie=7CF190C0;
<! .ASPXAUTH=37D61E47(shortened for readability);
<! ASP.NET_SessionId=oqwmfwahpvf0qzpiextx0wtb
!> ASP.NET_SessionId=kn4t4rmeu2lfrgozjjga0z2j;
!> customCookie=8D43E263;
!> .ASPXAUTH=C2477BA1(shortened for readability)
The HttpClient application gets a 302 redirect to /login; Chrome gets a 200 response containing the requested page.
As requested, here's how I eventually made it work. I had to do a simple GET request to /login first, and then do a POST with the login credentials. I don't recall what value exactly is being set by that GET (I assume a cookie with some encoded value the server wants), but the HttpClient takes care of the cookies anyway, so it just works. Here's the final, working code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
CookieContainer = cookieContainer,
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
// First do a GET to the login page, allowing the server to set certain
// required cookie values.
var initialGetRequest = new HttpRequestMessage(HttpMethod.Get, loginUri);
await client.SendAsync(initialGetRequest);
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("customercode", "password"),
new KeyValuePair<string, string>("customerid", "username"),
new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
I've been stuck on this for a while now. I found a few links on SO, but they didn't work for me. The answers give code without comments, and since I'm doing this for the first time I didn't understand it and couldn't get it to work; it gives me a 403 Forbidden error.
C# download file from the web with login
C# https login and download file
http://codesimplified.blogspot.hr/2013/11/asynchronous-file-download-from-web.html
This is my code (part of it is copy-pasted; I did some research, so I added comments reflecting my way of thinking, though I'm not sure they are correct):
private void button_Click(object sender, RoutedEventArgs e)
{
logtsk.Start(); // logtsk = new Task(Login); first time using async methods too (did research, there's probably a better way)
}
private async void Login()
{
using (var handler = new HttpClientHandler()) // the handler carries extra options (cookies, certificates) for the client
{
var request = new HttpRequestMessage(HttpMethod.Post, "https://somesite.com/login");
request.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"); // read that this can help with 403 errors
request.Headers.Add("Accept", "html/txt"); // same, supposedly can fix a 403
var CookieJar = new CookieContainer(); // cookies from the login request are stored here
handler.CookieContainer = CookieJar; // bind it to the handler
var hc = new HttpClient(handler); // create the client
var byteArray = new UTF8Encoding().GetBytes("user.name@gmail.com:password123"); // builds the "user:pass" bytes for Basic auth, I guess
hc.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray)); // I have no idea what this does... (copied)
var formData = new List<KeyValuePair<string, string>>(); // not sure why this is needed after building the data in the two lines above... this code is copied
formData.Add(new KeyValuePair<string, string>("username", "user.name@gmail.com")); // don't know
formData.Add(new KeyValuePair<string, string>("password", "password123")); // ...
formData.Add(new KeyValuePair<string, string>("scope", "all")); // no idea...
request.Content = new FormUrlEncodedContent(formData); // serializes the form fields into the request body
var response = await hc.SendAsync(request); // sends the request
MessageBox.Show(response.ToString()); // debugging... I like message boxes
using (FileStream fileStream = new FileStream("c:\\table.xls", FileMode.Create, FileAccess.Write, FileShare.None))
{
// copy the content from the response to the file stream
var responseFile = await hc.GetAsync("https://somesite.com/subtab/table.xls");
await responseFile.Content.CopyToAsync(fileStream); // "hc" has the cookies stored, so this request should be authenticated and download correctly, right?
}
}
}
The code is bad, I know that, and it got mashed up too, but the exceptions thrown are too generic and contain no useful info. The code currently throws errors in the HttpClient requests (when sending the request), and when I do get it to run (don't ask how), it gives a 403.
Can someone please write it out the way it's supposed to look and work, with comments, so I can finally understand how to think in HTTP? I want to do it with HttpClient, but I'm fine with any other approach if it's explained well. Thank you!
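A hedged sketch of how the flow is usually structured; the URLs, field names, and credentials below are placeholders taken from the question. It drops the Basic-auth header (the form POST already carries the credentials; sending both is what the mashed-up code was doing), logs in once, and then reuses the same client, whose handler holds the cookies, for the download:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class LoginAndDownload
{
    static async Task Main()
    {
        var cookies = new CookieContainer();
        using (var handler = new HttpClientHandler { CookieContainer = cookies })
        using (var client = new HttpClient(handler))
        {
            // Some servers reject requests without a browser-like User-Agent (a common 403 cause).
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");

            // 1. POST the login form; the handler stores any Set-Cookie values.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "username", "user.name@example.com" }, // placeholder
                { "password", "password123" },           // placeholder
            });
            var login = await client.PostAsync("https://somesite.com/login", form);
            login.EnsureSuccessStatusCode();

            // 2. Reuse the same client: the session cookie rides along automatically.
            using (var response = await client.GetAsync("https://somesite.com/subtab/table.xls"))
            using (var file = File.Create("table.xls"))
            {
                response.EnsureSuccessStatusCode();
                await (await response.Content.ReadAsStreamAsync()).CopyToAsync(file);
            }
        }
    }
}
```

The key idea is that authentication state lives in the `CookieContainer` on the handler, not in the individual requests, so any request made through the same client after a successful login is already "logged in".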
I'm using WinRT, and I'm trying to parse an HTML page for results.
But to get the results, I must fill out a search page and hit the submit button.
Is it possible to do that in code in WinRT?
If you find your button using a WinJS query, you can programmatically fire the click event like this:
element.fireEvent("onclick");
I guess you haven't downloaded the page yet (or displayed it in a WebView). To make a request, have a closer look at HttpClient and HttpClientHandler. Depending on whether the page uses GET or POST, you will additionally need to create an HttpRequestMessage. Look for the URL of the form (most often the form's action attribute) to find your request URI.
Example:
var ClientHandler = new HttpClientHandler();
ClientHandler.UseCookies = true;
ClientHandler.AllowAutoRedirect = true;
ClientHandler.UseDefaultCredentials = true;
ClientHandler.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
var Client = new HttpClient(ClientHandler);
Client.DefaultRequestHeaders.Add("Accept", "text/html, application/xhtml+xml, */*");
Client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");
var Response = await Client.GetAsync(RequestUri);
Your RequestUri could be something like http://www.example.com/search?query=search. But if the page uses POST to submit your query, I think you need to create an HttpRequestMessage, as below:
var RequestMessage = new HttpRequestMessage();
RequestMessage.Content = new StringContent(YourPostData, Encoding.UTF8, "application/x-www-form-urlencoded");
RequestMessage.Method = HttpMethod.Post;
RequestMessage.RequestUri = new Uri(OtherRequestUri);
Response = await Client.SendAsync(RequestMessage);
To parse the content of the response, I think your best option is the HtmlAgilityPack.
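For the parsing step, a short sketch of what HtmlAgilityPack usage could look like (it is a NuGet package, so this assumes it is installed; the HTML string here is a made-up stand-in for whatever the search actually returns):

```csharp
using System;
using HtmlAgilityPack;

class ParseDemo
{
    static void Main()
    {
        // In practice this string would come from:
        // await Response.Content.ReadAsStringAsync()
        var html = "<html><body><div class='result'><a href='/item/1'>First hit</a></div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query over the parsed DOM; adjust the selector to the real page.
        foreach (var link in doc.DocumentNode.SelectNodes("//div[@class='result']/a"))
        {
            Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
    }
}
```

`SelectNodes` returns null when nothing matches, so on a real page you would want a null check before the loop.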