How to get the content of a web page? [duplicate] - c#

This question already exists:
Closed 11 years ago.
Possible Duplicate:
Reading web page by sending username & password?
My problem is this: there is a site with frequently updated data that I would like to fetch at regular intervals for later reporting.
To get that data I have to provide a user ID and password.
I have used HttpWebRequest to get the data, but the problem is that the response text returns "Your browser doesn't support frame" instead of the data I want.
How can I get it?

Most likely you are having this problem because you are not setting the user agent in your request, e.g. with a WebClient:
using(WebClient wc = new WebClient())
{
wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
string htmlResult = wc.DownloadString(someUrl);
}
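The question also mentions that a user ID and password are required. If the site uses HTTP authentication (an assumption; a form-based login would instead require posting the login form and carrying cookies), credentials can be attached to the same WebClient:

using (WebClient wc = new WebClient())
{
    wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
    // NetworkCredential covers Basic/NTLM-style authentication; the values are placeholders
    wc.Credentials = new NetworkCredential("yourUserId", "yourPassword");
    string htmlResult = wc.DownloadString(someUrl);
}

Also, since the response complains about frames, the page is probably a frameset; the actual data may live in a frame, so requesting the frame's src URL (visible in the returned HTML) directly may return the content you want.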

You can also use the WebBrowser control to solve your problem. The approach works like this: first, load the specific web page into the WebBrowser control, then wait until the document has finished loading (the DocumentCompleted event). Once it has loaded, you can retrieve the web page's stream using the DocumentStream property.
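A minimal sketch of that approach, assuming a Windows Forms project (the URL is a placeholder; a console app would need to host a message loop on an STA thread):

WebBrowser browser = new WebBrowser();
browser.ScriptErrorsSuppressed = true;
browser.DocumentCompleted += (s, e) =>
{
    // Fires once the document (including server-rendered frames) has loaded
    using (StreamReader reader = new StreamReader(browser.DocumentStream))
    {
        string html = reader.ReadToEnd();
        // ... use the html ...
    }
};
browser.Navigate("http://example.com/data-page");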
Hope this helps.


Sending a POST request to a URL from C# [duplicate]

This question already has answers here:
Send HTTP POST request in .NET
(16 answers)
Closed 5 years ago.
I have a Node server hosted on Azure, where I can send a POST request to the API for it to perform some function. The API itself works; I have tested it with Postman.
A call to the API would look something like this:
http://website.com/api/Foo?name=bar&second=example
This doesn't necessarily need to return anything, as the call is silent and does something in the background. (note: perhaps it must return something and this is a hole in my understanding of the concept?)
Using C#, how can I make a web request to this URL?
I am already constructing the URL based on parameters passed to my method (so name and type as above could be whatever was passed to the method)
It's the POSTing to this URL that I cannot get working correctly.
This is the code I have tried:
void MakeCall(string name, string second)
{
string url = "http://website.com/api/Foo?name="+name+"&second="+second;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "POST";
request.ContentType = "application/json";
request.ContentLength = url.Length;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
}
You have to create a request stream and write the request body to it. The link below shows several ways to do this with HttpWebRequest, HttpClient, or third-party libraries:
Posting data using C#
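For example, a minimal sketch with HttpClient, keeping the parameters in the query string as in the question (whether the Node API reads the query string or expects a body is an assumption to confirm):

static async Task MakeCallAsync(string name, string second)
{
    string url = "http://website.com/api/Foo?name=" + Uri.EscapeDataString(name)
               + "&second=" + Uri.EscapeDataString(second);
    using (var client = new HttpClient()) // System.Net.Http
    {
        // POST with an empty body; the parameters travel in the query string
        HttpResponseMessage response = await client.PostAsync(url, new StringContent(string.Empty));
        response.EnsureSuccessStatusCode(); // throws if the server returned an error status
    }
}

Note that the original snippet set ContentLength to the length of the URL without ever writing a body, which is why it failed.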

Retrieve web page content like a browser

After learning a few things about different technologies, I wanted to make a small project using UWP + NoSQL: a small UWP app that grabs the horoscope and displays it on my Raspberry Pi every morning.
So I took a WebClient and did the following:
WebClient client = new WebClient();
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
string downloadString = client.DownloadString("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
But it seems that the site detects that the request isn't coming from a browser, since the interesting part is not in the content (when I check with a browser, it is in the initial HTML, according to Fiddler).
I also tried with ScrapySharp but got the same result. Any idea why?
(I've already done the UWP part, so I don't want to change the topic of my personal project just because it is detected as a "bot")
EDIT
It seems I wasn't clear enough. The issue is *not* that I'm unable to parse the HTML; the issue is that I don't receive the expected HTML when using ScrapySharp/WebClient.
EDIT2
Here is what I retrieve: http://pastebin.com/sXi4JJRG
For example, I don't get the "Star ratings by domain" section or the related images for each star.
You can read the entire content of the web page using the code snippet shown below:
internal static string ReadText(string Url, int TimeOutSec)
{
    using (HttpClient _client = new HttpClient() { Timeout = TimeSpan.FromSeconds(TimeOutSec) })
    {
        _client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("text/html"));
        // GetAsync returns a Task, so block on .Result in this synchronous helper
        using (HttpResponseMessage _responseMsg = _client.GetAsync(Url).Result)
        using (HttpContent content = _responseMsg.Content)
        {
            return content.ReadAsStringAsync().Result;
        }
    }
}
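Usage would look something like this (the URL is a placeholder), fetching a page with a 30-second timeout:
string html = ReadText("http://www.example.com", 30);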
Or in a simple way:
public static void DownloadString(string address)
{
    WebClient client = new WebClient();
    string reply = client.DownloadString(address);
    Console.WriteLine(reply);
}
(re: https://msdn.microsoft.com/en-us/library/fhd1f0sw(v=vs.110).aspx)
Yes, WebClient won't give you the expected result. Many sites use scripts to load content, so to emulate a browser you also need to run the page's scripts.
I have never done anything similar, so my answer is purely theoretical.
To solve the problem you need a "headless browser".
I know of two projects for this (I have never tried either of them):
http://webkitdotnet.sourceforge.net/ - it seems to be outdated
http://www.awesomium.com/
Ok, I think I know what's going on: I compared the real output (no fancy user agent strings) to the output as supplied by your pastebin and found something interesting. On line 213, your pastebin has:
<li class="dropdown"><a href="/us/profiles/zodiac/index-profile-zodiac-sign.aspx" class="dropdown-toggle" data-hov...ck">Forecast Tarot Readings</div>
Mind the data-hov...ck near the end. In the real output, this was:
<li class="dropdown">Astrology
followed by about 600 lines of code, including the aforementioned 'interesting part'. On line 814, it says:
<div class="bot-explore-col-subtitle f14 blocksubtitle black">Forecast Tarot Readings</div>
which, starting with the ck in black, matches up with the rest of the pastebin output. So either pastebin has condensed the output, or the original output you captured was already condensed.
I created a new console application, inserted your code, and got the result I expected, including the 600 lines of HTML you seem to be missing:
static void Main(string[] args)
{
WebClient client = new WebClient();
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
string downloadString = client.DownloadString("http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
File.WriteAllText(@"D:\Temp\source-mywebclient.html", downloadString);
}
My WebClient is from System.Net, and changing the UserAgent hardly has any effect; only a couple of links come out a bit different.
So, to sum it up: your problem has nothing to do with content being inserted dynamically after the initial GET, but possibly with WebClient combined with UWP. There's another question regarding WebClient and UWP on the site, (UWP) WebClient and downloading data from URL, which states you should use HttpClient. Maybe that's a solution?
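A minimal sketch of that suggestion with System.Net.Http.HttpClient (usable from UWP; must be called from an async method):

using (var client = new HttpClient())
{
    client.DefaultRequestHeaders.UserAgent.ParseAdd(
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2");
    string downloadString = await client.GetStringAsync(
        "http://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=2");
}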
Some time ago I used http://www.nrecosite.com/phantomjs_wrapper_net.aspx and it worked well; as Anton mentioned, it is a headless browser. Maybe it will be of some help.
I'm wondering if all the 'interesting parts' you expect to see 'in the content' are images. Are you aware that you have to retrieve any images separately? The fact that an HTML page contains <img ... /> tags does not magically display them as well. As you can see with Fiddler, after retrieving a page the browser then retrieves all images, style sheets, JavaScript, and all other items that are specified but not included in the page. (You might need to clear the browser cache to see this happen.)
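If images turn out to be the missing part, you have to fetch each one yourself; a rough sketch (the naive regex is just for illustration, a real HTML parser would be more robust):

string page = client.DownloadString(pageUrl);
foreach (Match m in Regex.Matches(page, "<img[^>]+src=\"([^\"]+)\""))
{
    // Resolve relative src values against the page URL, then fetch the bytes
    Uri imageUri = new Uri(new Uri(pageUrl), m.Groups[1].Value);
    byte[] imageBytes = client.DownloadData(imageUri);
    // ... save or display imageBytes ...
}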

Make a request to StackExchange Api [duplicate]

This question already has answers here:
.NET: Is it possible to get HttpWebRequest to automatically decompress gzip'd responses?
(3 answers)
Closed 7 years ago.
I'm trying to request a list of tags from StackExchange in JSON format by URL, but the problem is that I'm getting some broken text instead of JSON, so I can't even parse it.
P.S. I did it with the help of RestSharp.
private void Refresh()
{
var client = new RestClient("http://api.stackexchange.com/2.2/tags?order=desc&sort=popular&site=stackoverflow");
var result = client.Execute(new RestRequest(Method.GET));
var array = JsonConvert.DeserializeObject<Root>(result.Content);
Platforms = array.Platforms;
}
If you make a GET request to this URL using Fiddler, you will see that the response has this header:
Content-Encoding: gzip
This means the response is compressed with gzip. The good news is that HttpWebRequest can handle that:
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
After you add this line you will get nice, readable JSON.
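In context, that looks something like this sketch against the question's URL:

var request = (HttpWebRequest)WebRequest.Create(
    "http://api.stackexchange.com/2.2/tags?order=desc&sort=popular&site=stackoverflow");
// Transparently decompress the gzip/deflate-encoded response body
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string json = reader.ReadToEnd(); // now plain, readable JSON
}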
As @peeskillet mentions, this looks like compressed data. Please have a look at
What is the canonical method for an HTTP client to instruct an HTTP server to disable gzip responses? and especially this answer.
Something like
Accept-Encoding: *;q=0
should help.
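With the RestSharp code from the question, that header could be added like this (a sketch; it assumes the server honors the header, and it trades bandwidth for simplicity):

var client = new RestClient("http://api.stackexchange.com/2.2/tags?order=desc&sort=popular&site=stackoverflow");
var request = new RestRequest(Method.GET);
request.AddHeader("Accept-Encoding", "*;q=0"); // ask the server not to compress the response
var result = client.Execute(request);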

ASP Classic VBScript to ASP.NET C# Conversion

I am familiar with ASP.NET, but not with Visual Basic.
Here is the Visual Basic code:
myxml="http://api.ipinfodb.com/v3/ip-city/?key="&api_key&"&ip=" &UserIPAddress&"&format=xml"
set xml = server.CreateObject("MSXML2.DOMDocument.6.0")
xml.async = "false"
xml.resolveExternals = "false"
xml.setProperty "ServerHTTPRequest", true
xml.load(myxml)
response.write "<p><strong>First result</strong><br />"
for i = 0 to 10
    response.write xml.documentElement.childNodes(i).nodename & " : "
    response.write xml.documentElement.childNodes(i).text & "<br/>"
Next
response.write "</p>"
What is going on in this code?
How can I convert this to ASP.NET (C#)?
Based on a quick glance at the site you linked to in a comment, it looks like the intended functionality is to make a request to a URL and receive the response. The first example given on that site is:
http://api.ipinfodb.com/v3/ip-city/?key=<your_api_key>&ip=74.125.45.100
You can probably use something like the System.Net.WebClient class to make an HTTP request and receive the response. The example on MSDN can be modified for your URL. Maybe something like this:
var client = new WebClient();
client.Headers.Add ("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
var data = client.OpenRead(@"http://api.ipinfodb.com/v3/ip-city/?key=<your_api_key>&ip=74.125.45.100");
var reader = new StreamReader(data);
var result = reader.ReadToEnd();
data.Close();
reader.Close();
(There's also the WebRequest class, which appears to share roughly the same functionality.)
At that point the result variable contains the response from the API. Which you can handle however you need to.
From the looks of the Visual Basic code, I think you should create two methods to "convert" this to an ASP.NET C# web page:
LoadXmlData method - use an XmlDocument to load from the URL via the XmlDocument's Load function. Read ASP.net load XML file from URL for an example.
BuildDisplay method - use an ASP.NET PlaceHolder or Panel to create a container to inject the paragraph tag and individual results into.
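A rough C# equivalent of the VBScript loop, following that approach (userIpAddress is a placeholder for however you obtain the visitor's IP, and <your_api_key> stays as in the question):

var xml = new System.Xml.XmlDocument();
xml.Load("http://api.ipinfodb.com/v3/ip-city/?key=<your_api_key>&ip=" + userIpAddress + "&format=xml");

Response.Write("<p><strong>First result</strong><br />");
for (int i = 0; i <= 10 && i < xml.DocumentElement.ChildNodes.Count; i++)
{
    Response.Write(xml.DocumentElement.ChildNodes[i].Name + " : ");
    Response.Write(xml.DocumentElement.ChildNodes[i].InnerText + "<br/>");
}
Response.Write("</p>");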

Get only response headers [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
HttpWebResponse: closing the stream
Using ASP.NET, is it possible to make a request and get only the response headers? I have to make a request to a big file, but I only need the response headers; I don't care about the content of the file.
I would like to know if there is something similar to PHP's get_headers (http://php.net/manual/en/function.get-headers.php).
I'm not sure how to do this natively, but at a minimum, you could use a custom HTTP handler (ASHX) file to create the headers you need, and return nothing else in the response.
Update:
If you set WebRequest.Method = "HEAD", then the server should automatically return only the headers. This is according to the W3 spec.
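A minimal sketch of that (the URL is a placeholder):

var request = (HttpWebRequest)WebRequest.Create("http://example.com/bigfile.zip");
request.Method = "HEAD"; // ask the server for headers only, no body
using (var response = (HttpWebResponse)request.GetResponse())
{
    long length = response.ContentLength;                // Content-Length header
    string type = response.ContentType;                  // Content-Type header
    string modified = response.Headers["Last-Modified"]; // any other header by name
}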
