Web site login for data scraping - C#

I am attempting to scrape data from my various remote transmitters. I have one brand of transmitter that I can log into with the following C# code:
public static string getSourceCode(string url, string user, string pass)
{
    SecureString pw = new SecureString();
    foreach (char c in pass.ToCharArray()) pw.AppendChar(c);
    NetworkCredential credential = new NetworkCredential(user, pw, url);
    CredentialCache cache = new CredentialCache();
    cache.Add(new Uri(url), "Basic", credential);
    Uri realLink = new Uri(url);
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(realLink);
    req.Credentials = CredentialCache.DefaultNetworkCredentials;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    StreamReader sr = new StreamReader(resp.GetResponseStream());
    string sourceCode = sr.ReadToEnd();
    sr.Close();
    resp.Close();
    return sourceCode;
}
The second brand of transmitter (I'm hesitant to put the URL out in public), instead of returning a web page requesting a username and password, pops up a box requesting them. Using the above code just returns an unauthorized error.
Fiddler shows the following being sent when I successfully log in to the site:
GET http://lasvegas3abn.dyndns.tv:125/measurements.htm HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; Touch)
Accept-Encoding: gzip, deflate
Host: lasvegas3abn.dyndns.tv:125
Authorization: Basic dXNlcjpsaW5lYXI=
Connection: Keep-Alive
DNT: 1
Any suggestions?

Instead of:
req.Credentials = CredentialCache.DefaultNetworkCredentials;
you can specify a credential that uses a specific username and password:
req.Credentials = new NetworkCredential("username", "password");
This should enable you to get through the login prompt (assuming that you specify the correct username and password).
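If the device is issuing an HTTP Basic challenge (which the Authorization: Basic line in the Fiddler capture suggests), a minimal sketch of the scraping method with explicit credentials could look like the following. The pre-emptive header shown in the comment is an option for embedded servers that show a browser login box but never return a clean 401 challenge; treat it as an assumption about this particular transmitter, not a confirmed fix:
// using System; using System.IO; using System.Net; using System.Text;
public static string GetSourceCode(string url, string user, string pass)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

    // Let the framework answer a 401 challenge with these credentials.
    req.Credentials = new NetworkCredential(user, pass);

    // If the server expects the header up front, it can be sent pre-emptively instead:
    // req.Headers["Authorization"] = "Basic " +
    //     Convert.ToBase64String(Encoding.ASCII.GetBytes(user + ":" + pass));

    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
        return sr.ReadToEnd();
    }
}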

Related

Content-Length or Chunked Encoding cannot be set for an operation that does not write data: what should I do?

I am trying to access a specific page on a site and pull information out of it.
I sent a GET request to the homepage and got a response status code == OK,
then I sent another GET request to the page that contains the JSON I want to retrieve, and the response status code == OK.
Now I want to retrieve the information itself, so I send a GET request for the resource (another URL that the last page loads),
And I get the error at this line:
HttpWebResponse oHttpResponseIndicesApiUrl = (HttpWebResponse)oHttpRequestIndicesApiUrl.GetResponse();
"Content-Length or Chunked Encoding cannot be set for an operation
that does not write data"
I set all the headers just like the GET request shown in Chrome's Inspect -> Network tab for the URL I want (that is where I can see the request headers).
This is the code that I run:
HttpWebRequest oHttpRequestIndicesApiUrl = (HttpWebRequest)WebRequest.Create(sIndicesApiURL);
LOG.DebugFormat("{0}:calculateIndexSecurityWeights(), Create get request to '{1}'", Name, sIndicesApiURL);
oHttpRequestIndicesApiUrl.CookieContainer = new CookieContainer();
foreach (Cookie oCookie in oHttpResponseIndicesParmsUrl.Cookies)
{
    oHttpRequestIndicesApiUrl.CookieContainer.Add(oCookie);
}
oHttpRequestIndicesApiUrl.AllowAutoRedirect = false;
oHttpRequestIndicesApiUrl.Accept = ("application/json, text/plain, */*");
oHttpRequestIndicesApiUrl.Headers.Add("accept-encoding", "gzip, deflate, br");
oHttpRequestIndicesApiUrl.Headers.Add("accept-language", "he-IL");
oHttpRequestIndicesApiUrl.KeepAlive = true;
oHttpRequestIndicesApiUrl.ContentLength = 120;
oHttpRequestIndicesApiUrl.ContentType = "application/json;charset=UTF-8";
oHttpRequestIndicesApiUrl.Host = "api.tase.co.il";
oHttpRequestIndicesApiUrl.Headers.Add("origin", "https://www.tase.co.il");
oHttpRequestIndicesApiUrl.Referer = sIndicesParamsURL;
oHttpRequestIndicesApiUrl.Headers.Add("sec-fetch-mode", "cors");
oHttpRequestIndicesApiUrl.Headers.Add("sec-fetch-site", "same-site");
oHttpRequestIndicesApiUrl.Headers.Add("upgrade-insecure-requests", "1");
oHttpRequestIndicesApiUrl.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36";
LOG.DebugFormat("{0}:calculateIndexSecurityWeights(), Set headers to '{1}'", Name, sIndicesApiURL);
HttpWebResponse oHttpResponseIndicesApiUrl = (HttpWebResponse)oHttpRequestIndicesApiUrl.GetResponse();
if (oHttpResponseIndicesApiUrl.StatusCode != HttpStatusCode.OK)
{
    // response failed
    throw new ApplicationException(string.Format("get response from url '{0}' failed, Status Code: '{1}', Status Description '{2}'", sIndicesApiURL, oHttpResponseIndicesApiUrl.StatusCode, oHttpResponseIndicesApiUrl.StatusDescription));
}
I can't understand why this is happening.
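For what it's worth, the exception message names the usual cause: a Content-Length (or chunked encoding) set on a request that never writes a body, and the code above assigns ContentLength = 120 to what is otherwise a GET. A minimal sketch of the same request with the body-only properties left out, keeping the question's variable names as placeholders:
// Sketch: a GET sends no request body, so ContentLength / ContentType are not set.
HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create(sIndicesApiURL);
oRequest.Method = "GET";
oRequest.CookieContainer = new CookieContainer();
oRequest.Accept = "application/json, text/plain, */*";
// oRequest.ContentLength = 120;                        // only valid when a body is written
// oRequest.ContentType = "application/json;charset=UTF-8";

using (HttpWebResponse oResponse = (HttpWebResponse)oRequest.GetResponse())
using (StreamReader oReader = new StreamReader(oResponse.GetResponseStream()))
{
    string sJson = oReader.ReadToEnd();
}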

Why can't I use HttpClient to log in to this ASP.NET website?

There's an ASP.NET website from a third party that requires one to log on. I need to get some data from the website and parse it, so I figured I'd use HttpClient to post the necessary credentials to the website, the same as the browser would do. Then, after that POST request, I figured I'd be able to use the cookie values I received to make further requests to the (authorization-only) URLs.
I'm down to the point where I can successfully POST the credentials to the login URL and receive three cookies: ASP.NET_SessionId, .ASPXAUTH, and a custom value used by the website itself, each with their own values. I figured that since the HttpClient I set up is using an HttpClientHandler with a CookieContainer, the cookies would be sent along with each further request, and I'd remain logged in.
However, this does not appear to be working. If I use the same HttpClient instance to then request one of the secured areas of the website, I'm just getting the login form again.
The code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler() { CookieContainer = cookieContainer, AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate };
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("customercode", "password"),
    new KeyValuePair<string, string>("customerid", "username"),
    new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
When using the proper credentials, cookies contains three items, including a .ASPXAUTH cookie and a session id, which suggests that the login succeeded. However:
var text = await client.GetStringAsync("https://some.website/secureaction");
...this just returns the login form again, and not the content I get when I log in using the browser and navigate to /secureaction.
What am I missing?
EDIT: here's the complete request my application is making alongside the request Chrome is making. They are identical, save for the cookie values. I ran them through WinDiff: the lines marked <! are the lines sent by my application, the ones marked !> are sent by Chrome.
GET https://some.website/secureaction
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept-Encoding: gzip, deflate, sdch, br
Upgrade-Insecure-Requests: 1
Host: some.website
Accept-Language:nl-NL,
>> nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept: text/html,
>> application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cookie:
<! customCookie=7CF190C0;
<! .ASPXAUTH=37D61E47(shortened for readability);
<! ASP.NET_SessionId=oqwmfwahpvf0qzpiextx0wtb
!> ASP.NET_SessionId=kn4t4rmeu2lfrgozjjga0z2j;
!> customCookie=8D43E263;
!> .ASPXAUTH=C2477BA1(shortened for readability)
The HttpClient application gets a 302 redirect to /login; Chrome gets a 200 response containing the requested page.
As requested, here's how I eventually made it work. I had to do a simple GET request to /login first, and then do a POST with the login credentials. I don't recall what value exactly is being set by that GET (I assume a cookie with some encoded value the server wants), but the HttpClient takes care of the cookies anyway, so it just works. Here's the final, working code:
const string loginUri = "https://some.website/login";
var cookieContainer = new CookieContainer();
var clientHandler = new HttpClientHandler()
{
    CookieContainer = cookieContainer,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var client = new HttpClient(clientHandler);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));
// First do a GET to the login page, allowing the server to set certain
// required cookie values.
var initialGetRequest = new HttpRequestMessage(HttpMethod.Get, loginUri);
await client.SendAsync(initialGetRequest);
var loginRequest = new HttpRequestMessage(HttpMethod.Post, loginUri);
// These form values correspond with the values posted by the browser
var formContent = new FormUrlEncodedContent(new[]
{
    new KeyValuePair<string, string>("customercode", "password"),
    new KeyValuePair<string, string>("customerid", "username"),
    new KeyValuePair<string, string>("HandleForm", "Login")
});
loginRequest.Content = formContent;
loginRequest.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393");
loginRequest.Headers.Referrer = new Uri("https://some.website/Login?ReturnUrl=%2f");
loginRequest.Headers.Host = "some.website";
loginRequest.Headers.Connection.Add("Keep-Alive");
loginRequest.Headers.CacheControl = new System.Net.Http.Headers.CacheControlHeaderValue() { NoCache = true };
loginRequest.Headers.AcceptLanguage.ParseAdd("nl-NL");
loginRequest.Headers.AcceptEncoding.ParseAdd("gzip, deflate");
loginRequest.Headers.Accept.ParseAdd("text/html, application/xhtml+xml, image/jxr, */*");
var response = await client.SendAsync(loginRequest);
var responseString = await response.Content.ReadAsStringAsync();
var cookies = cookieContainer.GetCookies(new Uri(loginUri));
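With the handler's CookieContainer now holding the session, .ASPXAUTH, and custom cookies, the secured page can then be requested with the same client instance, for example:
// Same client and handler, so the stored cookies accompany the request automatically.
var text = await client.GetStringAsync("https://some.website/secureaction");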

WebClient (403) Forbidden

I'm using Dropbox to store my files, and when I use a direct link I get an HTTP redirect. But I'm able to get the redirect URI with...
var request = WebRequest.Create(MySQLData);
request.Method = "HEAD";
var response = request.GetResponse();
However, I get "The remote server returned an error: (403) Forbidden." So then I added a User-Agent header, but it still returned the same error. I'm not sure what to try next.
WebClient wc = new WebClient();
wc.DownloadProgressChanged += new DownloadProgressChangedEventHandler(MySQLData_Check);
wc.DownloadFileCompleted += DownloadCompleted_MySQLData;
var request = WebRequest.Create(MySQLData);
request.Method = "HEAD";
var response = request.GetResponse();
wc.Headers.Add("User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)");
wc.DownloadFileAsync(new Uri(response.ResponseUri.ToString()), RootWindow_TextBox_SaveToDir.Text + "/" + "MySQLData");
Turns out it was because I was using ?raw=1, rather than ?dl=1 in the URI string.
For example...
string MyDownload = "https://dl.dropboxusercontent.com/u/********/MyFile.zip?dl=1";
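Putting it together, a minimal sketch of the corrected download (the URL is the question's placeholder, and the event handlers and text box are assumed to exist elsewhere in the class):
// Sketch: Dropbox direct link with ?dl=1 downloaded asynchronously via WebClient.
string myDownload = "https://dl.dropboxusercontent.com/u/********/MyFile.zip?dl=1";
WebClient wc = new WebClient();
wc.Headers.Add("User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)");
wc.DownloadProgressChanged += MySQLData_Check;            // handlers from the question, assumed to exist
wc.DownloadFileCompleted += DownloadCompleted_MySQLData;
wc.DownloadFileAsync(new Uri(myDownload), RootWindow_TextBox_SaveToDir.Text + "/" + "MyFile.zip");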

C# WebClient returns 403 Forbidden

I am trying to emulate the process of accepting a trade offer in Steam. I have asked Steam support and they confirm that this action is allowed as long as I do not disrupt their service to other players.
So here are the details:
The URL for accepting a trade offer is https://steamcommunity.com/tradeoffer/OfferID/accept
Here is their AJAX code for doing so:
return $J.ajax(
    {
        url: 'https://steamcommunity.com/tradeoffer/' + nTradeOfferID + '/accept',
        data: rgParams,
        type: 'POST',
        crossDomain: true,
        xhrFields: { withCredentials: true }
    });
Here are the headers I tracked using IE10:
Request POST /tradeoffer/xxxxxxx/accept HTTP/1.1
Accept */*
Content-Type application/x-www-form-urlencoded; charset=UTF-8
Referer http://steamcommunity.com/tradeoffer/xxxxxxx/
Accept-Language en-CA
Origin http://steamcommunity.com
Accept-Encoding gzip, deflate
User-Agent Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
Host steamcommunity.com
Content-Length 51
DNT 1
Connection Keep-Alive
Cache-Control no-cache
The post body:
sessionid=SESSIONID&tradeofferid=OfferID
Cookie:
Sent sessionid SessionID
Sent __utmc XXXXX
Sent steamLogin XXXXX
Sent webTradeEligibility XXXXXXX
Sent Steam_Language XXXXXXX
Sent timezoneOffset XXXXXXXX
Sent __utma XXXXXXXXXXXXX
Sent __utmz XXXXXXXXXXXXX
Sent steamMachineAuth XXXXXXXXXXXXX
Sent strInventoryLastContext XXXXXXXXX
Sent steamRememberLogin XXXXXXXXXXXX
Sent steamCC_XXXXXXXXXXXX XXXXXXX
Sent __utmb XXXXXXX
Sent tsTradeOffersLastRead XXXXXXX
The initiator of the request is XMLHttpRequest.
In my code I did:
public bool AcceptOffer(string offerID)
{
    string path = "tradeoffer/" + offerID + "/";
    // Simulate the browser opening the trade offer window
    _steamWeb.Get(new Uri(WebAPI.SteamCommunity + path));
    NameValueCollection data = new NameValueCollection();
    data.Add("sessionid", _steamWeb.SessionID);
    data.Add("tradeofferid", offerID);
    string result = _steamWeb.Post(new Uri("https://steamcommunity.com/" + path + "accept"), data);
    return true;
}
_steamWeb contains a cookie-aware WebClient which is used to do all the POST/GET requests.
Here are parts of the code for the cookie-aware WebClient:
protected override WebRequest GetWebRequest(Uri address)
{
    HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
    if (request != null)
        request.CookieContainer = _cookieContainer;
    if (_lastPage != null)
        request.Referer = _lastPage;
    _lastPage = address.ToString();
    return request;
}

protected override WebResponse GetWebResponse(WebRequest request)
{
    WebResponse response = base.GetWebResponse(request); // 403 exception here
    ReadCookies(response);
    return response;
}
Here are the headers that I am setting:
void SetCommonHeaders(Uri uri)
{
    _webClient.Headers[HttpRequestHeader.Accept] = "text/html, application/xhtml+xml, */*";
    _webClient.Headers[HttpRequestHeader.AcceptLanguage] = "en-CA";
    _webClient.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded; charset=UTF-8";
    _webClient.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";
    _webClient.Headers[HttpRequestHeader.Host] = uri.Host;
    _webClient.Headers.Add("DNT", "1");
}
Here are the cookie headers of the request I am sending:
sessionid=XXXX;
steamMachineAuthXXXXX=XXXXXX;
steamLogin=XXXXXXX;
steamRememberLogin=XXXXXXXX;
Steam_Language=english;
webTradeEligibility=XXXXXXXXX;
steamCC_XXXXX=CA;
tsTradeOffersLastRead=XXXXXXXXX
I did not set those cookies manually; all of them are obtained via GET requests to steamcommunity.com.
I am pretty much sending the same request as the browser, but I am getting 403 Forbidden with my POST. I have tried setting the X-Requested-With = XMLHttpRequest header, but it is not helping. I see they are doing something with credentials (withCredentials) in the AJAX call, so am I supposed to do something similar in my HttpWebRequest posts? Thanks.
Problem solved; there were two things:
The Keep-Alive header was not being sent properly due to a .NET bug.
I had encoded the sessionid twice.
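For reference, here is a sketch of how both points could be addressed in the cookie-aware WebClient shown above (assumptions, not the poster's exact fix): WebClient will not accept a hand-written Connection header, so Keep-Alive is set on the underlying HttpWebRequest inside the GetWebRequest override, and the sessionid is added to the NameValueCollection raw so it is only form-encoded once when the POST body is written:
// Sketch: set KeepAlive on the HttpWebRequest itself instead of via a Connection header.
protected override WebRequest GetWebRequest(Uri address)
{
    HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
    if (request != null)
    {
        request.CookieContainer = _cookieContainer;
        request.KeepAlive = true;        // instead of a manual "Connection: Keep-Alive" header
        if (_lastPage != null)
            request.Referer = _lastPage;
    }
    _lastPage = address.ToString();
    return request;
}

// When building the POST body, add the session id unencoded; it is encoded once on the way out:
// data.Add("sessionid", _steamWeb.SessionID);    // not HttpUtility.UrlEncode(_steamWeb.SessionID)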

Log in to site programmatically and redirect browser to signed in state

I want to sign in to a site when a link is clicked and then redirect the browser there with a signed-in session. I'm having some trouble, and here is what I've tried:
First I get the session cookies from the login site:
CookieContainer cookies = new CookieContainer();
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://someuri.com");
myHttpWebRequest.CookieContainer = cookies;
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
myHttpWebResponse.Close();
Then I post to the sign in page to get signed in:
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create("http://signInURL.com");
getRequest.CookieContainer = cookies;
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version11;
getRequest.AllowAutoRedirect = true;
getRequest.ContentType = "application/x-www-form-urlencoded";
byte[] byteArray = Encoding.ASCII.GetBytes(PostParameterStringWithSignInInfo);
getRequest.ContentLength = byteArray.Length;
Stream newStream = getRequest.GetRequestStream();
newStream.Write(byteArray, 0, byteArray.Length);
newStream.Close();
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
Then I figured I needed to set the cookies on the client:
CookieCollection cooki = getRequest.CookieContainer.GetCookies(new Uri("http://someUri.com"));
for (int i = 0; i < cooki.Count; i++)
{
    Cookie c = cooki[i];
    Response.Cookies.Add(new HttpCookie(c.Name, c.Value));
}
And then redirect to where you end up being signed in:
Response.Redirect("http://URLwhenBeingSignedIn.com");
This doesn't work. When redirected, I'm still logged out.
I tried to do this with Fiddler and succeeded in signing in and getting redirected:
Get the session cookies:
GET / HTTP/1.1
Content-type: application/x-www-form-urlencoded
Host: someuri.com
Post to the sign in page to get signed in:
POST /signIn HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: http://someuri.com
Accept-Language: en-GB,en;q=0.7,tr;q=0.3
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Content-Length: 90
DNT: 1
Host: signInURL.com
Pragma: no-cache
Cookie: JSESSIONID=fromBefore; Cookie2=fromBefore
PostParameterStringWithSignInInfo
Perhaps there's an easier way than the one I've described, now that you can see the Fiddler requests that work; if so, I'm happy to see it.
