Get HTML code into a string in C# when account authorization is needed - c#

I want to download the source of a web page (plug.dj) into a string. That is not hard in itself, but when I tested my program an error appeared:
The remote server returned an error: (401) Unauthorized.
I think I cannot download the code because I am not logged in to this website. I tried adding credentials to my code, but I have no idea what they should look like. Users can log in with Google, Facebook or Twitter.
My code:
WebRequest request = WebRequest.Create("http://plug.dj/drum-bass/");
request.Credentials = new NetworkCredential("username", "password"); // placeholders: the real username and password would go here
request.Method = "GET";
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream);
string content = reader.ReadToEnd();
reader.Close();
response.Close();
Does anyone know how to solve this?
I have an idea for solving it, but I don't know whether it is feasible: maybe the program could use the web browser's data and get the information that way.

As long as you only want to download a public (anonymous-access) page of this website, I can't see any reason why you would need to pass user credentials or why you would get a 401 error code: no authentication is needed.
I just tried it at my end and it is working as desired:
using System.Net;
//...
using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
string htmlCode = client.DownloadString("http://plug.dj");
}
Results:
<!doctype html><!--[if lt IE 7 ]><html lang="en" class="no-js ie6"><![endif]--><!--[if IE 7 ]><html lang="en" class="no-js ie7"><![endif]--><!--[if IE 8 ]><html lang="en" class="no-js ie8"><![endif]--><!--[if (gte IE 9)|!(IE)]><!--><html class="no-js" lang="en"><!--<![endif]--><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><title>plug.dj – </title><link rel="icon" ...
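If a 401 does come back for a particular URL, it can help to look at the status the server returned instead of letting the exception end the program. The following is a minimal sketch and not part of the answer above: `PageFetcher`, `StatusOf` and `TryDownload` are hypothetical names, and it assumes `WebClient` is available (recent .NET versions mark it obsolete in favour of `HttpClient`).

```csharp
using System;
using System.Net;

public static class PageFetcher
{
    // Returns the HTTP status carried by a WebException, or null when the
    // failure happened before any HTTP response arrived (DNS, timeout, ...).
    public static HttpStatusCode? StatusOf(WebException ex)
    {
        return (ex.Response as HttpWebResponse)?.StatusCode;
    }

    // Downloads the page into a string. On failure it returns null and
    // reports the status code (if any) through the out parameter.
    public static string TryDownload(string url, out HttpStatusCode? status)
    {
        status = null;
        try
        {
            using (WebClient client = new WebClient())
            {
                string html = client.DownloadString(url);
                // DownloadString only returns on success; reported as OK for simplicity.
                status = HttpStatusCode.OK;
                return html;
            }
        }
        catch (WebException ex)
        {
            status = StatusOf(ex); // HttpStatusCode.Unauthorized for a 401
            return null;
        }
    }
}
```

A caller could then branch on `status == HttpStatusCode.Unauthorized` to tell an authentication problem apart from a network failure.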

Related

403 forbidden when posting to a url

I am posting a request to a website:
request = (HttpWebRequest)WebRequest.Create("https://www.footlocker.dk/api/users/carts/current/entries?timestamp=1611595223668");
request.Method = "POST";
using (var streamWriter = new StreamWriter(request.GetRequestStream()))
{
string json = "{\"user\":\"test\"," + "\"password\":\"bla\"}";
streamWriter.Write(json);
}
var httpResponse = (HttpWebResponse)request.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
}
When I submit this request, I am getting a 403 forbidden, with following html:
<html>
<head>
<title>footlocker.dk</title>
<style>
#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}
</style>
</head>
<body style="margin:0">
<p id="cmsg">Please enable JS and disable any ad blocker</p>
<script>
var dd={'cid':'AHrlqAAAAAMA2k9UvgFgVkIAk04eSQ==','hsh':'A55FBF4311ED6F1BF9911EB71931D5','t':'fe','r':'b','s':17434,'host':'geo.captcha-delivery.com'}</script><script src="https://ct.captcha-delivery.com/c.js">
</script>
</body>
</html>
Is there any way I can make the site think that JS is enabled?
Why are you trying to make this request? As per the Footlocker Terms of Service,
You may not without the prior written permission of Foot Locker, use any computer code, data mining software, "robot", "bot", "spider", "scraper" or other automatic device, or program, algorithm or methodology having similar processes or functionality, or any manual process, to monitor or copy any of the web pages, data or content found on this Site or App, or accessed through this Site or App.
I'm assuming you're attempting to perform unauthorized scraping/monitoring of this site, and I'd highly advise you stop as that's against the aforementioned terms and conditions.
Maybe try to specify a user-agent.

Error 405 Method not Allowed on WebRequest

I am trying to grab the page source from the page below. It gives me a 405 error. If I request the home page it works fine, but for this specific page I get Method Not Allowed. Any thoughts?
WebRequest request = WebRequest.Create("https://www.realtor.com/realestateandhomes-search/California/counties");
request.UseDefaultCredentials = true;
request.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
Console.WriteLine(responseFromServer);
The site thinks you are a bot.
Details:
I tried it with HttpClient (recommended: it doesn't throw an exception upon receiving a non-200 response code) and inspected the response HTML. Here is the important snippet:
<p>
As you were browsing, something about your browser made us think you might be a bot. There are a few reasons this might happen, including:
</p>
<ul>
<li>You're a power user moving through this website with super-human speed</li>
<li>You've disabled JavaScript and/or cookies in your web browser</li>
<li>A third-party browser plugin is preventing JavaScript from running. Additional information is available in this
<a title='Third party browser plugins that block javascript' href='http://ds.tl/help-third-party-plugins' target='_blank'>
support article
</a>.
</li>
</ul>
If you want the full response, try running this:
// Returns Task rather than void so callers can await it and observe exceptions.
async System.Threading.Tasks.Task LogResponse()
{
    using System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();
    var response = await client.GetAsync("https://www.realtor.com/realestateandhomes-search/California/counties");
    Console.WriteLine(await response.Content.ReadAsStringAsync());
}
Side complaint against realtor.com: 405 (The method specified in the Request-Line is not allowed) is a rather poor response code for this; a 403 (The server understood the request, but is refusing to fulfill it) seems better suited.
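To illustrate the point that HttpClient hands back a non-success response instead of throwing, here is a small sketch that runs without network access. `RejectingHandler` and `StatusProbe` are hypothetical names, and the handler is only a stand-in for a server that answers 405; it is not how realtor.com itself behaves internally.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a server that rejects every request with 405.
public sealed class RejectingHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.MethodNotAllowed)
        {
            Content = new StringContent("<p>you might be a bot</p>")
        };
        return Task.FromResult(response);
    }
}

public static class StatusProbe
{
    // Returns the numeric status code and the body; 4xx/5xx do not throw here.
    public static async Task<(int Status, string Body)> FetchAsync(HttpClient client, string url)
    {
        using HttpResponseMessage response = await client.GetAsync(url);
        string body = await response.Content.ReadAsStringAsync();
        return ((int)response.StatusCode, body);
    }
}
```

Passing `new HttpClient(new RejectingHandler())` to `StatusProbe.FetchAsync` yields the 405 and the body as ordinary values; with a default `HttpClient` the same method would talk to the real site.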

Web Server Does Not Allow Using Post Method

I want to connect to a website with my user ID and password, fetch my data from it and store it in a text file, but I get error 405, Method Not Allowed. Can somebody help me figure this out?
Here is the HTML of the login page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>blablbablablabl</title>
</head>
<script type="text/javascript">
function login() {
setTimeout('window.close()',1000);
}
</script>
<body>
<div><h3>blablablaasdasd</h3><form onSubmit="javascript:login();" style='margin- top:10px;' id='loginPageForm' action='http://website.com' method='post' target='_blank'
<div>
<input name='t:ac' type='hidden' value='$002f$002website.com$002fclient$002fdefault$002fsearch$002faccount$003f' />
<input name='t:formdata' type='hidden' value='H4sIAAAAAAAAAJWQv0oDQRDGx4NAMJ1gEURstN2zMI02BkEQDgkc1mFvb7xs2Ntdd/ZMbKx8CRufQKz0CVLY+Q4+gI2FlYV7J6Lg/274mJnv932XD9CarMAyIXdiFA+4d0YnppB6czysCJ3mJZKDnnEF45aLETLPLZJ3Jz0mjEMlM5ZxQtbPgsiF35Wo8tUUfWXXDmad+8Xb5wjmEugIo8N3tR8+elhIxvyYx4rrIk69k7rYmloP8++uf8Hq/xdr4IxAorTKSkkkjZ5d5RuHTxd3EcDUfmtpOdHEuJyO4BSgwXyTfr2pT1qTJeh+sUU1hw9Btn8MIkxpjUbtiTXk/nOO8/Sxe3N9thNBlEBbKBm29xrvunpUWAahrr6R6qrbr+bD9Q/jCx9ggTUPAgAA' /></div>
<label for='identity'>Card Number:</label><div><input type='text' name='j_username' /</div>
<div style='clear:both;'></div>
<label for='password'>Password:</label>
<div><input name='j_password' type='password' class='pass' value='' /><input type='submit' value='Login' /></div></form></div>
</body>
</html>
Here is the C# code I am using to try to reach the server:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://website.com/file.html");
request.AllowAutoRedirect = true;
request.Timeout = 10000; // timeout 10s
request.Method = "POST";
String formContent = "t:ac=$002f$002website.com$002fclient$002fdefault$002fsearch$002faccount$003f&t:formdata=H4sIAAAAAAAAAJWQv0oDQRDGx4NAMJ1gEURstN2zMI02BkEQDgkc1mFvb7xs2Ntdd/ZMbKx8CRufQKz0CVLY+Q4+gI2FlYV7J6Lg/274mJnv932XD9CarMAyIXdiFA+4d0YnppB6czysCJ3mJZKDnnEF45aLETLPLZJ3Jz0mjEMlM5ZxQtbPgsiF35Wo8tUUfWXXDmad+8Xb5wjmEugIo8N3tR8+elhIxvyYx4rrIk69k7rYmloP8++uf8Hq/xdr4IxAorTKSkkkjZ5d5RuHTxd3EcDUfmtpOdHEuJyO4BSgwXyTfr2pT1qTJeh+sUU1hw9Btn8MIkxpjUbtiTXk/nOO8/Sxe3N9thNBlEBbKBm29xrvunpUWAahrr6R6qrbr+bD9Q/jCx9ggTUPAgAA&j_username=johndoe0&j_password=12345";
byte[] byteArray = Encoding.UTF8.GetBytes(formContent);
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = byteArray.Length;
Stream dataStream = request.GetRequestStream();
dataStream.Write(byteArray, 0, byteArray.Length);
dataStream.Close();
// Get the response ...
WebResponse response;
response = (HttpWebResponse)request.GetResponse();//ERROR OCCURS HERE!!!
dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
richTextBox1.AppendText(HttpUtility.UrlDecode(reader.ReadToEnd()));
reader.Close();
dataStream.Close();
response.Close();
EDIT: Problem solved, found another URL in that website that allows POST method.
@GSiry's solution is probably the way to go if you control the server you fetch data from.
Otherwise, the issue is about adjusting your request to whatever HTTP method the remote server accepts: Method Not Allowed is supposed to mean that the server won't take some particular methods while accepting others, and for good reasons. See more on request safety and idempotence.
What happens if you use GET instead of POST?
EDIT: Assuming you are really POSTing to the same URL from both the HTML form and your C# request (which does not seem to be the case anyway), the reason why it behaves differently is not obvious and is in fact dependent on the server implementation. That means we can only guess; for example, the server might not like the user agent it gets (or the lack of one) from your C# code.
Anyway, I stand by the advice of using GET. There seems to be no reason at all to issue a POST request, since you don't intend to modify website.com/file.html, which is the stated purpose of POST method.
EDIT2: It's not necessary to use POST for a login per se. HTTP authentication can be performed through form parameters, through HTTP request headers, or through the authority part of the URL itself (http://username:password@website.com/your_file.html). But this depends entirely on the concrete server implementation.
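As an illustration of the request-header option, here is a minimal sketch of HTTP Basic authentication. It only helps if the server actually uses Basic authentication, which the form-based login in the question may well not; `BasicAuth`, `Header` and `BuildGet` are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class BasicAuth
{
    // Builds "Authorization: Basic base64(user:password)".
    public static AuthenticationHeaderValue Header(string user, string password)
    {
        string token = Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password));
        return new AuthenticationHeaderValue("Basic", token);
    }

    // Creates a GET request that carries the credentials in its headers.
    public static HttpRequestMessage BuildGet(string url, string user, string password)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Authorization = Header(user, password);
        return request;
    }
}
```

The resulting request can be sent with `HttpClient.SendAsync`. Because Base64 is an encoding and not encryption, the credentials are readable by anyone who can observe the traffic unless the URL uses https.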
If you can't access the server logs, I'm afraid you're in for a trial-and-error session. Start by mimicking the browser's request exactly. Firebug or the developer console in Chrome or Safari will be your friends: they show exactly which headers are passed along with the browser request so that the POST method is allowed.
On a side note, what you should be using for the authentication procedure is SSL/TLS (https://...).
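One concrete part of mimicking the browser is the form encoding. A browser percent-encodes each form value, whereas the hand-built string in the question sends characters such as `+` and `=` from the `t:formdata` value unescaped, and a server decoding `application/x-www-form-urlencoded` data reads a bare `+` as a space. Whether that contributes to the 405 here is only a guess. The sketch below lets `FormUrlEncodedContent` do the encoding; `LoginForm` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginForm
{
    // Wraps the fields in content whose Content-Type is
    // application/x-www-form-urlencoded, with names and values percent-encoded.
    public static FormUrlEncodedContent Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return new FormUrlEncodedContent(fields);
    }

    // Returns the encoded body as text, useful for comparing against
    // what the browser's developer console shows.
    public static Task<string> EncodeAsync(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return Build(fields).ReadAsStringAsync();
    }
}
```

The content returned by `Build` can be passed directly to `HttpClient.PostAsync(url, content)`, which also sets the Content-Type and Content-Length headers.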
If you are using MVC, it might be as simple as adding the
[HttpPost]
attribute to the controller function that accepts your post request
If you're trying to access a web service, add the following section to the target site's Web.config under system.web:
<webServices>
<protocols>
<add name="HttpPost"/>
</protocols>
</webServices>

in C#, how can I get the HTML content of a website before displaying it?

I have a web browser project in C#, and I am thinking of the following system: when the user types a URL and clicks the "go" button, my browser fetches the content of that website (it shouldn't visit the page, I mean it shouldn't display anything). Then I want to look for a specific keyword, for example "violence"; if it is present, I can navigate the browser to a local page that shows a warning. In short: in C#, how can I get the content of a website before visiting it?
Sorry for my English,
Thanks in advance!
System.Net.WebClient:
string url = "http://www.google.com";
System.Net.WebClient wc = new System.Net.WebClient();
string html = wc.DownloadString(url);
You can use WebRequest and WebResponse to load a site:
example:
string GetPageSource (string url)
{
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(url);
webrequest.Method = "GET";
HttpWebResponse webResponse = (HttpWebResponse)webrequest.GetResponse();
string responseHtml;
using (StreamReader responseStream = new StreamReader(webResponse.GetResponseStream()))
{
responseHtml = responseStream.ReadToEnd().Trim();
}
return responseHtml;
}
After that you can check responseHtml for keywords, for example with a regular expression.
You can make an HTTP request (via HttpClient to the site) and parse the results looking for the various keywords. Then you can make the decision whether or not to visibly 'navigate' the user there.
There's an HTTP client sample on Dev Center that may help.
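The fetch-then-decide flow described in these answers could be sketched as follows. This is an illustration under assumptions, not a complete filter: `ContentFilter`, `ContainsKeyword` and `IsBlockedAsync` are hypothetical names, the match is a plain case-insensitive substring search, and `GetStringAsync` throws `HttpRequestException` on a non-success status.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ContentFilter
{
    // One shared HttpClient for the lifetime of the application.
    private static readonly HttpClient Client = new HttpClient();

    // Case-insensitive substring check on the downloaded HTML.
    public static bool ContainsKeyword(string html, string keyword)
    {
        return html.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Downloads the page without displaying it and reports whether
    // the keyword occurs anywhere in its source.
    public static async Task<bool> IsBlockedAsync(string url, string keyword)
    {
        string html = await Client.GetStringAsync(url);
        return ContainsKeyword(html, keyword);
    }
}
```

In the "go" button handler, the browser control would navigate to the local warning page when `await ContentFilter.IsBlockedAsync(url, "violence")` is true and to `url` otherwise. Note that the page is downloaded twice this way, once for the check and once for display.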

C# 403 error because the file contains an inaccessible image? or what?

I'm trying to get a stream from a URL, http://actueel.nl.pwc.com/site/syndicate.jsp, but I get a 403 error. It doesn't require a login. I used Fiddler to check why IE can open it while my code can't. What I found was that two connections were made when opening the link in IE: one succeeded while the other got a 403. The 403 was for a sub-link to a GIF image. It seems the XML is a public file, but the image it references is located in an inaccessible folder.
I need to know how to ignore the image so I can still get the rest of the stream. This is my test code (by the way, I also tried WebClient and setting headers):
try
{
WebRequest request = WebRequest.Create("http://actueel.nl.pwc.com/site/syndicate.jsp");
request.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
MessageBox.Show(reader.ReadToEnd());
}
catch(Exception ex){
MessageBox.Show(ex.Message);
}
Thanks for your reactions ;)
I agree with Dmytro. The WebRequest is NOT attempting to download the GIF image referenced in the JSP file; only the content of the JSP itself is being downloaded. Try looking carefully (in Fiddler) at the IE request compared to yours, not only the URL but also all the request/response headers, and see if anything else is missing, such as cookies or Accept headers.
Comparing the two requests with Wireshark and wget, the differences were in the headers only.
The remote server requires both a User-Agent and an Accept header.
For example:
WebRequest request = WebRequest.Create("http://actueel.nl.pwc.com/site/syndicate.jsp");
((HttpWebRequest)request).UserAgent = "stackoverflow.com/q/4233673/111013";
((HttpWebRequest) request).Accept = "*/*";
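The same two headers can also be set with HttpClient. The sketch below assumes nothing about the server beyond what the answer reports; `FeedRequest` is a hypothetical name and `ExampleFetcher/1.0` is a made-up User-Agent value chosen because it parses as a valid product token.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class FeedRequest
{
    // Builds a GET request carrying the User-Agent and Accept headers.
    public static HttpRequestMessage Build(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd("ExampleFetcher/1.0"); // hypothetical value
        request.Headers.Accept.ParseAdd("*/*");
        return request;
    }

    // Sends the request and returns the body, throwing on a non-success status.
    public static async Task<string> FetchAsync(HttpClient client, string url)
    {
        using HttpRequestMessage request = Build(url);
        using HttpResponseMessage response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Headers that should accompany every request can alternatively be set once on `HttpClient.DefaultRequestHeaders`.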
