WebClient or HttpWebRequest to retrieve hrefs and URL - C#

How do I use either WebClient or HttpWebRequest to do two things:
1) Say after downloading the resource as a string using:
var result = x.DownloadString("http://randomsite.com");
there's a relative URL (with a query string) in a link:
Click here to get your name and age
How do I click (follow) that link using WebClient, after initially loading the resource in result? I was able to use HtmlAgilityPack to isolate the href, but I would now like to follow it in code.
2) If the HttpWebRequest does not redirect but instead loads the same page with different parameters, how would I use WebClient to retrieve the new URL that is generated?
i.e. if I call
var result = x.DownloadString("http://randomsite.com");
but this actually calls
http://randomsite.com/q?site=default
I then want to retrieve the second URL.
Thanks in advance

You can construct the absolute URL from the base URL of the page you just downloaded and the relative link, like this:
Uri baseUri = new Uri("http://randomsite.com");
Uri myUri = new Uri(baseUri, "/q?name=john&age=50");
Console.WriteLine(myUri.ToString()); // gives you http://randomsite.com/q?name=john&age=50
This also works if your base URL has query parameters.
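To illustrate how the Uri(Uri, string) constructor resolves different kinds of relative links, here is a minimal sketch. The LinkResolver class name and the sample URLs are made up for this example; only the constructor itself comes from the snippet above.

```csharp
using System;

static class LinkResolver
{
    // Hypothetical helper: resolves an href found in a page against
    // the URL the page was loaded from.
    public static Uri Resolve(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href);
    }
}

// A root-relative href replaces the whole path and query of the base:
//   LinkResolver.Resolve("http://randomsite.com/dir/page?x=1", "/q?name=john&age=50")
//   gives http://randomsite.com/q?name=john&age=50
// A path-relative href is resolved against the base's directory:
//   LinkResolver.Resolve("http://randomsite.com/dir/page?x=1", "q?name=john")
//   gives http://randomsite.com/dir/q?name=john
```

In both cases the query string of the base URL is dropped, so any parameters you need must be part of the href itself.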
As for the second question, I guess you meant that the request was redirected and you want the final URL instead? The easiest way to get it is to sub-class WebClient, as described here.
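For the first question, the complete flow looks like this:
Uri baseUri = new Uri("http://randomsite.com");
using (var client = new WebClient())
{
    // Download the initial page from the base URL.
    var result = client.DownloadString(baseUri);
    // get href via HtmlAgilityPack...
    Uri myUri = new Uri(baseUri, "/q?name=john&age=50");
    // Follow the link by downloading the resolved absolute URL.
    result = client.DownloadString(myUri);
}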
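The linked description of the WebClient sub-class is not reproduced here, so the following is only a sketch of one common way to do it, not necessarily what that link showed. The class name TrackingWebClient and its ResponseUri property are hypothetical; GetWebResponse and WebResponse.ResponseUri are the framework members it relies on.

```csharp
using System;
using System.Net;

// Hypothetical sub-class that remembers the URL of the last response.
class TrackingWebClient : WebClient
{
    // URL that actually answered the last request (after any redirects);
    // null until a request has completed.
    public Uri ResponseUri { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        ResponseUri = response.ResponseUri;
        return response;
    }
}
```

After calling DownloadString on an instance, ResponseUri holds the URL the content was finally served from. If the server never sends a redirect and simply applies default parameters internally, ResponseUri will equal the URL you requested, because the client never sees a second URL.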

Related

Webclient does not understand blob:http uri

I am trying to download data from a blob:http link. The problem is that WebClient complains about the URL format. WebClient does not recognize the format blob:http://localhost/7420f6fc-9c83-43a3-aa53-4a68ebec9518. Is there another way to download this data without using Azure calls?
using (var client = new WebClient())
{
    // NotSupportedException: The URI prefix is not recognized.
    var model = client.DownloadData(new Uri("blob:http://localhost/7420f6fc-9c83-43a3-aa53-4a68ebec9518"));
    // Also tried
    var model2 = client.DownloadData("blob:http://localhost/7420f6fc-9c83-43a3-aa53-4a68ebec9518");
}

Uri class doesn't handle the protocol-relative URL

*EDIT: This doesn't happen on Windows but on Mono 4.2.2 Linux (C# Online Compiler).
I want to parse a protocol-relative URL and get the host name etc. For now I prepend "http:" before processing it, since the C# Uri class couldn't handle a protocol-relative URL. Could you tell me if there's a better way or a good library?
// Protocol-relative URL
var uriString = "//www.example.com/bluh/bluh.css";
var uri = new Uri(uriString);
Console.WriteLine(uriString); // "//www.example.com/bluh/bluh.css"
Console.WriteLine(uri.Host); // "Empty" string
// Absolute URL
var fixUriString = uriString.StartsWith("//") ? "http:" + uriString : uriString;
var fixUri = new Uri(fixUriString);
Console.WriteLine(fixUriString); // "http://www.example.com/bluh/bluh.css"
Console.WriteLine(fixUri.Host); // "www.example.com"
This works:
Uri uri = null;
if (Uri.TryCreate("//forum.xda-developers.com/pixel-c", UriKind.Absolute, out uri))
{
    Console.WriteLine(uri.Authority);
    Console.WriteLine(uri.Host);
}
returns
forum.xda-developers.com
forum.xda-developers.com
It also worked for me using the Uri(string) constructor.
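If you know the URL of the page the link came from, another option is to resolve the protocol-relative link against that page instead of hard-coding "http:". This is a minimal sketch; the ProtocolRelative class name and the example.org page URL are assumptions for illustration, and the behaviour on Mono 4.2.2 has not been checked here.

```csharp
using System;

static class ProtocolRelative
{
    // Hypothetical helper: resolves "//host/path" using the scheme of
    // the page the link was found on.
    public static Uri Resolve(Uri pageUri, string link)
    {
        return new Uri(pageUri, link);
    }
}

// Example:
//   var page = new Uri("https://example.org/index.html");
//   var css = ProtocolRelative.Resolve(page, "//www.example.com/bluh/bluh.css");
//   css.Scheme is "https" and css.Host is "www.example.com"
```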

How do I get the destination URL of a shortened URL?

I have an API (https://www.readability.com/developers/api/parser#idm386426118064) to extract the contents of web pages, but when I pass it a shortened URL or a URL that redirects to another one, it gives an error.
I am developing a Windows Phone 8.1 (XAML) app. Is there any way to get the destination URL in C#, or any workaround?
eg url - http://www.bing.com/r/2/BB7Q4J4?a=1&m=EN-IN
You could intercept the Location header value before the HttpClient follows it like this:
using (var handler = new HttpClientHandler())
{
    handler.AllowAutoRedirect = false;
    using (var client = new HttpClient(handler))
    {
        var response = await client.GetAsync("shortUrl");
        var longUrl = response.Headers.Location.ToString();
    }
}
This solution is the most efficient because it issues only one request.
It is possible, however, that the short URL references another short URL, in which case this method returns an intermediate URL rather than the final destination.
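One way to cope with chained short URLs while still reading the Location header yourself is to repeat the request in a loop. The following is only a sketch under assumptions: RedirectResolver, ResolveAsync, Next and the maxHops limit are hypothetical names introduced here, and error handling is left out.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class RedirectResolver
{
    // Hypothetical helper: follows Location headers by hand, at most maxHops times.
    // If the limit is reached, the last URL seen is returned, which may still redirect.
    public static async Task<Uri> ResolveAsync(Uri start, int maxHops = 10)
    {
        using (var handler = new HttpClientHandler { AllowAutoRedirect = false })
        using (var client = new HttpClient(handler))
        {
            Uri current = start;
            for (int hop = 0; hop < maxHops; hop++)
            {
                using (var response = await client.GetAsync(current, HttpCompletionOption.ResponseHeadersRead))
                {
                    int status = (int)response.StatusCode;
                    Uri location = response.Headers.Location;
                    // Anything other than a 3xx with a Location header ends the chain.
                    if (status < 300 || status >= 400 || location == null)
                        return current;
                    current = Next(current, location);
                }
            }
            return current;
        }
    }

    // A Location header may be relative, so resolve it against the current URL.
    public static Uri Next(Uri current, Uri location)
    {
        return location.IsAbsoluteUri ? location : new Uri(current, location);
    }
}
```

This issues one request per hop, so it gives up the single-request advantage as soon as the chain is longer than one redirect.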
An alternative solution would be to allow the HttpClient to follow the Location header value and observe the destination:
using (var client = new HttpClient())
{
    var response = client.GetAsync("shortUrl").Result;
    var longUrl = response.RequestMessage.RequestUri;
}
This method is both terser and more reliable than the first.
The drawback is that this code issues at least two requests instead of one.
You can get the ResponseUri from GetResponse():
string redirectedURL = WebRequest.Create("http://www.bing.com/r/2/BB7Q4J4?a=1&m=EN-IN")
    .GetResponse()
    .ResponseUri
    .ToString();
Interesting article, by the way.
You need to inspect the headers returned from the URL.
If you get HTTP return codes 301 or 302, then you are being notified that the page is redirecting you to another URL.
See http://www.w3.org/Protocols/HTTP/HTRESP.html for more details about HTTP return codes.

Uri constructor with dontEscape is obsolete, what is the alternative?

My question is about passing a URL to HttpWebRequest without escaping. I searched the forums and the internet, but I didn't find a good solution.
I have the following URL: string URL= www.website.com/sub/redirec\t\bs\dd
So when I create a Uri like this:
Uri uri = new Uri(URL);
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(uri);
In this case, on a GET request I get the following URL: www.website.com/sub/redirect%5Ct%5Cbc%5Cdd
The "\" character is replaced by "%5C", and it is crucial for me that this does not happen.
I can avoid that with:
Uri uri = new Uri(URL, true); //bool dontEscape
But this constructor is obsolete. How can I get the same effect without using the obsolete constructor?
Use this:
Uri uri = new Uri(Uri.EscapeUriString(URL));

.NET URI: How can I change ONE part of a URI?

Often I want to change just one part of a URI and get a new URI object back.
In my current dilemma, I want to append .nyud.net to the host, to use the CoralCDN.
I have a fully qualified URI fullUri. How can I, in effect, do this:
fullUri.Host = fullUri.Host + ".nyud.net";
This needs to work for almost any URL, and the PORT of the request needs to be maintained.
Any help would be much appreciated.
You can use a UriBuilder to modify individual parts of a Uri:
Uri uri = new Uri("http://stackoverflow.com/questions/2163191/");
UriBuilder builder = new UriBuilder(uri);
builder.Host += ".nyud.net";
Uri result = builder.Uri;
// result is "http://stackoverflow.com.nyud.net/questions/2163191/"
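Since the question asks for the port to be maintained, here is a small sketch that wraps the same UriBuilder approach in a reusable method and shows it on a URL with a non-default port. The HostRewriter class, the AppendToHost method and the example.com URL are hypothetical names chosen for this illustration.

```csharp
using System;

static class HostRewriter
{
    // Hypothetical helper: returns a copy of uri with suffix appended to the host.
    // Scheme, port, path and query are carried over by UriBuilder.
    public static Uri AppendToHost(Uri uri, string suffix)
    {
        var builder = new UriBuilder(uri);
        builder.Host += suffix;
        return builder.Uri;
    }
}

// Example:
//   var result = HostRewriter.AppendToHost(new Uri("http://example.com:8080/a/b?x=1"), ".nyud.net");
//   result.Host is "example.com.nyud.net", result.Port is 8080,
//   result.PathAndQuery is "/a/b?x=1"
```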
