WebClient DownloadFile with Authorization not working - C#

I have tried just about everything I can think of to get this to work, including several things I've found online. All I'm trying to do is download a file (which has a direct link) from a website that I have to log in to.
I tried doing the following, with the "UploadValues":
WebClient myWebClient = new WebClient();
NameValueCollection myNameValueCollection = new NameValueCollection();
myNameValueCollection.Add("username", this.UserName);
myNameValueCollection.Add("password", this.Password);
byte[] responseArray = myWebClient.UploadValues(felony, myNameValueCollection);
myWebClient.DownloadFile(felony, localfelony);
and I've also tried putting the login info in the headers. I've also tried just setting the credentials, as you can see from the commented-out code:
WebClient client = new WebClient();
//client.UseDefaultCredentials = false;
//client.Credentials = new NetworkCredential(this.UserName, this.Password);
client.Headers.Add(HttpRequestHeader.Authorization, "Basic " + Convert.ToBase64String(Encoding.ASCII.GetBytes(this.UserName + ":" + this.Password)));
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");
//client.Headers.Add(HttpRequestHeader.Cookie, this.webBrowser.Document.Cookie);
client.DownloadFile(felony, localfelony);
No matter what I try, the only thing I can get it to download is a file that ends up being the login page, as if it didn't accept the login info I passed.
I've looked at the headers and such, and I don't see anything out of the ordinary that would explain why this isn't working. Any ideas?

I could swear I had tried this before, but I guess maybe I had it just a little different or something. So it worked like this:
WebClient client = new WebClient();
client.UseDefaultCredentials = false;
client.Credentials = new NetworkCredential(this.UserName, this.Password);
client.Headers.Add(HttpRequestHeader.Cookie, "_gat=1; b46467afcb0b4bf5a47b2c6b22e3d284=mt84peq7u4r0bst72ejs5lb7p6; https://docs.stlucieclerk.com/=1,1; _ga=GA1.2.12049534.1467911267");
client.DownloadFile(webaddress, localname);
It was the cookie in the header that made it work. I thought I'd done that before, but maybe I did something involving a cookie that was different.

This seems to be an authentication/authorization issue.
There could be many reasons for this, such as:
1) Maybe the authentication/authorization mechanism uses some kind of hash.
2) Maybe you are using the wrong kind of authentication mechanism ("Basic", as I can see).
3) Maybe you are getting authenticated but not authorized.
The best way to find the root cause is to use Fiddler:
Log in using the UI page and try to download the file, capturing the Fiddler session while you do. Then try to do the same with whatever code you have and capture that session as well. Compare the two Fiddler sessions to find the difference.
Hope this helps.

Try temporarily changing the certificate validation:
// Save the current validation callback so it can be restored afterwards.
System.Net.Security.RemoteCertificateValidationCallback r = System.Net.ServicePointManager.ServerCertificateValidationCallback;
// Temporarily accept every server certificate.
System.Net.ServicePointManager.ServerCertificateValidationCallback =
    delegate (object s, System.Security.Cryptography.X509Certificates.X509Certificate certificate, System.Security.Cryptography.X509Certificates.X509Chain chain, System.Net.Security.SslPolicyErrors sslPolicyErrors)
    { return true; };
//Do downloading here...
// Restore the original validation callback.
System.Net.ServicePointManager.ServerCertificateValidationCallback = r;
This does mean, however, that the WebClient will accept any certificate, so see this post for more info.

Related

C# HttpClient gets new SessionID every request

The title explains it mostly. I have declared my HttpClient, HttpClientHandler, and CookieContainer as class variables.
private HttpClient client;
private HttpClientHandler handler;
private CookieContainer cookies;
Then in the form creation I initialize the variables like so:
public FrmMain()
{
    InitializeComponent();
    handler = new HttpClientHandler();
    cookies = new CookieContainer();
    handler.AllowAutoRedirect = true;
    handler.UseCookies = true;
    handler.CookieContainer = cookies;
    client = new HttpClient(handler);
    client.DefaultRequestHeaders.Connection.Clear();
    client.DefaultRequestHeaders.ConnectionClose = false;
    client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36");
}
Later on in the program, when I call the requests, I am able to log in to the URL (in this case a device on my local net) just fine. As part of the troubleshooting for this, I started printing the cookie data to the console each time a request is made. When I initially log on, it gives me a single cookie, a sessionID. Any subsequent requests that I make using the same client give me a new sessionID. This causes my requests to get a return code of BadRequest, most likely because it is trying to route me back to the login page. I know that I am successfully logging in with the first request, because printing the response content gives me the HTML of the index page that I am redirected to upon a successful login. I've tested all the data I'm sending via Postman, where I'm able to do a login request and then do whatever other requests I need without issue. The only difference between Postman and my program is that in my program I am getting a new sessionID for every request instead of it persisting. Does anyone know why my cookies are not persisting despite the client handler, client, and cookie container all being declared at class scope?
It turns out my issue was not a cookie issue. The cookie was supposed to change to a new sessionID after logging in, and the sessionID never changes after that; HttpClient was persisting the cookie correctly. The real issue was a hidden CSRFToken that I was submitting with the form data. It changes with each request, and while I was doing the steps to get the new one before each POST, I was not actually assigning it.
I'd like to thank Jonathan. If I hadn't jumped back in to try some tweaks to make sure the program was only loading the initializations once, I probably wouldn't have been looking in the area where I was neglecting to assign the new CSRFToken.
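For anyone hitting the same thing, here is a hedged sketch of that fix, assuming the class-level client from the question; the field name "CSRFToken", the regex, and the helper names are illustrative placeholders, not the poster's actual code:
// Requires: System.Collections.Generic, System.Net.Http, System.Text.RegularExpressions, System.Threading.Tasks.
// Illustrative only: pull the hidden token out of the HTML (the input name is an assumption).
private static string ExtractCsrfToken(string html)
{
    Match match = Regex.Match(html, "name=\"CSRFToken\"\\s+value=\"([^\"]+)\"");
    return match.Success ? match.Groups[1].Value : string.Empty;
}

private async Task<HttpResponseMessage> PostWithCsrfAsync(string url, Dictionary<string, string> fields)
{
    // The token changes on every request, so fetch the page again right before posting...
    string page = await client.GetStringAsync(url);

    // ...and, crucially, assign the fresh token into the form data (the step that was missing).
    fields["CSRFToken"] = ExtractCsrfToken(page);

    return await client.PostAsync(url, new FormUrlEncodedContent(fields));
}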

Retrieve web page content like a browser

After I learned some things about different technologies, I wanted to make a small project using UWP + NoSQL. I wanted to build a small UWP app that grabs the horoscope and displays it on my Raspberry Pi every morning.
So I took a WebClient and did the following:
WebClient client = new WebClient();
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
string downloadString = client.DownloadString("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
But it seems that the site detects that this request isn't coming from a browser, since the interesting part is not in the content (when I check with a browser, it is in the initial HTML, according to Fiddler).
I also tried with ScrapySharp but I got the same result. Any idea why?
(I've already done the UWP part, so I don't want to change the topic of my personal project just because it is detected as a "bot".)
EDIT
It seems I wasn't clear enough. The issue is **not** that I'm unable to parse the HTML; the issue is that I don't receive the expected HTML when using ScrapySharp/WebClient.
EDIT2
Here is what I retrieve: http://pastebin.com/sXi4JJRG
And I don't get (for example) the "Star ratings by domain" section or the related images for each star.
You can read the entire content of the web page using the code snippet shown below:
internal static string ReadText(string Url, int TimeOutSec)
{
    using (HttpClient _client = new HttpClient() { Timeout = TimeSpan.FromSeconds(TimeOutSec) })
    {
        _client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("text/html"));
        // GetAsync returns a Task, so block on the result in this synchronous helper.
        using (HttpResponseMessage _responseMsg = _client.GetAsync(Url).Result)
        using (HttpContent content = _responseMsg.Content)
        {
            return content.ReadAsStringAsync().Result;
        }
    }
}
Or, more simply:
public static void DownloadString(string address)
{
    WebClient client = new WebClient();
    string reply = client.DownloadString(address);
    Console.WriteLine(reply);
}
(re: https://msdn.microsoft.com/en-us/library/fhd1f0sw(v=vs.110).aspx)
Yes, WebClient won't give you the expected result. Many sites use scripts to load content, so to emulate a browser you also have to run the page scripts.
I have never done anything similar, so my answer is purely theoretical.
To solve the problem you need a "headless browser".
I know of two projects for this (I have never tried either of them):
http://webkitdotnet.sourceforge.net/ - it seems to be outdated
http://www.awesomium.com/
Ok, I think I know what's going on: I compared the real output (no fancy user agent strings) to the output as supplied by your pastebin and found something interesting. On line 213, your pastebin has:
<li class="dropdown"><a href="/us/profiles/zodiac/index-profile-zodiac-sign.aspx" class="dropdown-toggle" data-hov...ck">Forecast Tarot Readings</div>
Mind the data-hov...ck near the end. In the real output, this was:
<li class="dropdown">Astrology
followed by about 600 lines of code, including the aforementioned 'interesting part'. On line 814, it says:
<div class="bot-explore-col-subtitle f14 blocksubtitle black">Forecast Tarot Readings</div>
which, starting with the ck in black, matches up with the rest of the pastebin output. So either pastebin has condensed the output, or the original output was already condensed before it was pasted.
I created a new console application, inserted your code, and got the result I expected, including the 600 lines of html you seem to miss:
static void Main(string[] args)
{
    WebClient client = new WebClient();
    client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
    string downloadString = client.DownloadString("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
    File.WriteAllText(@"D:\Temp\source-mywebclient.html", downloadString);
}
My WebClient is from System.Net, and changing the UserAgent hardly has any effect; only a couple of links come out a bit different.
So, to sum it up: your problem has nothing to do with content that is inserted dynamically after the initial GET, but possibly with WebClient combined with UWP. There's another question regarding WebClient and UWP on the site, (UWP) WebClient and downloading data from URL, which states you should use HttpClient. Maybe that's a solution?
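In case it helps, here is a minimal, hedged sketch of that HttpClient route (System.Net.Http, which is available to UWP apps); the URL and User-Agent are the ones from the question, and the console wrapper is just for illustration:
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Send the same User-Agent the question used; TryAddWithoutValidation avoids header parsing quirks.
            client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2");

            // DownloadString equivalent: fetch the raw HTML of the page.
            string html = await client.GetStringAsync("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
            Console.WriteLine(html.Length);
        }
    }
}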
Some time ago I used http://www.nrecosite.com/phantomjs_wrapper_net.aspx. It worked well, and as Anton mentioned it is a headless browser. Maybe it will be of some help.
I'm wondering if all the 'interesting parts' you expect to see 'in the content' are images? Are you aware that you have to retrieve any images separately? The fact that an HTML page contains <img ... /> tags does not magically display them as well. As you can see with Fiddler, after retrieving a page, the browser then retrieves all images, style sheets, JavaScript and other items that are referenced, but not included, in the page. (You might need to clear the browser cache to see this happen.)
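If it does turn out to be the images, a rough, hedged sketch of fetching them separately could look like this (the regex and the file naming are just an illustration, not a robust parser):
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class ImageFetcher
{
    static void Main()
    {
        var client = new WebClient();
        string html = client.DownloadString("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");

        // Crude extraction of <img src="..."> values; an HTML parser such as HtmlAgilityPack would be safer.
        foreach (Match m in Regex.Matches(html, "<img[^>]+src=\"([^\"]+)\""))
        {
            // Resolve relative paths against the page's host, then download each image on its own.
            var imageUri = new Uri(new Uri("http://www.horoscope.com/"), m.Groups[1].Value);
            client.DownloadFile(imageUri, Path.GetFileName(imageUri.LocalPath));
        }
    }
}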

WebClient 404 protocol error on valid URL - C#

I have a WebClient that calls a URL which works fine when I view it in a browser, which led me to believe I would need to add headers to my call.
I have done this, but am still getting the error.
I do have other calls to the same API that work fine, and I have checked that all the parameters I am passing across are exactly the same as expected (case, spelling).
using (var wb = new WebClient())
{
    wb.Proxy = proxy;
    wb.Headers.Add("Accept-Language", "en-US");
    wb.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
    wb.Headers.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");
    byte[] response = wb.UploadValues("http://myserver/api/account/GetUser",
        new NameValueCollection()
        {
            { "email", register.Email },
        });
    userDetails = Encoding.UTF8.GetString(response);
}
Does anyone have an idea why I am still getting the protocol error on a call that works perfectly fine in a browser?
UploadValues uses an HTTP POST. Are you sure that is what you want? If you are viewing it in a browser, it is likely a GET, unless you are filling out some sort of web form.
One might surmise that what you are trying to do is GET this response: "http://myserver/api/account/GetUser?email=blah@blah.com",
in which case you would just formulate that URL, with query parameters, and execute a GET using one of the DownloadString overloads.
using (var wb = new WebClient())
{
    wb.Proxy = proxy;
    userDetails = wb.DownloadString("http://myserver/api/account/GetUser?email=" + register.Email);
}
The Wikipedia article on REST has a nice table that outlines the semantics of each HTTP verb, which may help choosing the appropriate WebClient method to use for your use cases.

C# WebRequest returning 401

There is a web file within my intranet that my computer is authorized to read and write. I can open up IE or Firefox and view the file by typing in the URL. I need to write a C# desktop app that reads/writes to that file. Even though my computer has access, all my attempts so far result in 401, unauthorized access errors. The program needs to work from any computer whose account has been authorized, so I cannot hard-code any username/password. I've never done anything like this, but I was able to scrounge the following from several sites:
WebRequest objRequest = HttpWebRequest.Create("https://site.com/file");
objRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
objRequest.Proxy = WebRequest.DefaultWebProxy;
objRequest.Proxy.Credentials = CredentialCache.DefaultCredentials;
WebResponse objResponse = (WebResponse)objRequest.GetResponse();
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
    string str = sr.ReadToEnd();
    sr.Close();
    //... Do stuff with str
}
If it matters, I'm working in .NET 2.0
Just ran into the same problem; it all started working when I added:
objRequest.UseDefaultCredentials = true;
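For context, a minimal sketch of where that line fits into the code from the question (the URL is the placeholder from the question; with UseDefaultCredentials the request forwards the logged-on Windows identity instead of a hard-coded username/password):
WebRequest objRequest = HttpWebRequest.Create("https://site.com/file");
objRequest.UseDefaultCredentials = true; // send the current user's Windows credentials
objRequest.Proxy = WebRequest.DefaultWebProxy;
objRequest.Proxy.Credentials = CredentialCache.DefaultCredentials;
using (WebResponse objResponse = objRequest.GetResponse())
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
    string str = sr.ReadToEnd();
    //... Do stuff with str
}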
Did you try using Fiddler to inspect the actual request that was sent to the server?
You can also check if the server requires a client certificate to allow the connection.
Since you are accessing an intranet server, do you really need to set the proxy part? I mean most of the time, the proxy is configured to ignore local addresses anyway.
This won't work if NTLM credentials are required:
objRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
You need to pass in the actual credentials like:
NetworkCredential networkCredential = new NetworkCredential(UserName, Password, Domain);
CredentialCache credCache = new CredentialCache();
credCache.Add(new Uri(url), "NTLM", networkCredential);
objRequest.Credentials = credCache;

How to get the source code as a registered user

I downloaded the source code of a site, but when I downloaded it I saw that the site identified my program as a guest. I searched Google and figured out that I can send a cookie when I "ask" for the source code.
This is what I have managed to do, and it still doesn't identify me as a registered user:
CookieContainer cj = new CookieContainer();
string all = "";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Url);
req.CookieContainer = cj;
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
CookieCollection cs=cj.GetCookies(req.RequestUri);
CookieContainer cc = new CookieContainer();
cc.Add(cs);
req.CookieContainer = cc;
StreamReader read = new StreamReader(res.GetResponseStream());
all = read.ReadToEnd();
read.Close();
return all;
What is wrong here?
Thanks very much for the help :)
(If it helps, I have the login details of a registered user of the site.)
You would have to use the cookie that the server left behind in your cookie cache, the one that identified you as an authenticated user in a previous session. You'll need to use the Cookie(name, value) constructor. Getting the value is the tricky part; look through your cookie cache to see if you can find it. It is still going to fail if the server expires the cookie.
Using a tool that lets you look at the HTTP headers and cookie values is important for debugging this. Firebug is very nice.
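For completeness, a minimal sketch of attaching such a cookie by hand before making the request (the cookie name "SESSIONID" and its value are placeholders you would copy from your browser's cookie store, not the site's real cookie):
CookieContainer cookies = new CookieContainer();
cookies.Add(new Uri(Url), new Cookie("SESSIONID", "value-copied-from-browser"));

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Url);
req.CookieContainer = cookies;

using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader read = new StreamReader(res.GetResponseStream()))
{
    return read.ReadToEnd();
}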
