I want to check if a URI returns a valid result.
Example:
String path = String.Format("{0}/agreements/{1}.gif", PicRoot, languageTwoLetterCode);
WebRequest request = WebRequest.Create(new Uri(path, UriKind.Relative));
...
This throws a NotSupportedException, so I figure I should be providing an absolute URI.
All the examples I can find use a hardcoded root (like www.example.com). This is of course unacceptable, because the actual root of the website is not known in advance.
How can I either check the result from a relative URI or find the current root?
Are there better ways to check if say "/content/pics/agreements/en.gif" returns a gif or a 404?
You can get the root of the web site from the server object.
You could use Server.MapPath to translate the virtual path into a physical path, and then check whether that file exists on the server (using File.Exists).
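If the check has to go through an absolute URI instead, the root can be taken from the URL of the current request rather than hardcoded. A minimal sketch, assuming the current request URL is available as a Uri (in classic ASP.NET that would typically be Request.Url); SiteUrls and ToAbsolute are hypothetical names used only for illustration:

```csharp
using System;

public static class SiteUrls
{
    // requestUrl stands in for the URL of the request being handled
    // (an assumption; the question does not show where it comes from).
    public static Uri ToAbsolute(Uri requestUrl, string rootRelativePath)
    {
        // Scheme + host (+ port), e.g. "http://www.example.com"
        string root = requestUrl.GetLeftPart(UriPartial.Authority);

        // Resolve the root-relative path against that root.
        return new Uri(new Uri(root), rootRelativePath);
    }
}
```

The resulting absolute Uri can then be passed to WebRequest.Create without the NotSupportedException that a relative Uri causes.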
I have an endpoint used to find a book by its book reference.
The book reference is a string that can contain whitespace and any kind of special character, e.g. mybook, my book, my-book, my/book, book++
// GET api/books/reference/{reference}
[HttpGet("reference/{reference}")]
public ActionResult<BookItem> FindByReference(string reference)
This is what I get when testing:
GET api/books/reference/mybook
OK
GET api/books/reference/my book
OK
GET api/books/reference/my-book
OK
GET api/books/reference/my+book
404 Not found
GET api/books/reference/my/book
404 Not found
GET api/books/reference/book++
404 Not found
What is a proper way to encode this reference parameter IN THE URL PATH so that it gets properly resolved by routing? Is that even possible?
Encoding the URL is the responsibility of the client calling the API; a client must supply a valid URL if it wants a proper response. If you encode your examples you will get:
GET api/books/reference/mybook
GET api/books/reference/my%20book
GET api/books/reference/my-book
GET api/books/reference/my%2Bbook
GET api/books/reference/my%2Fbook
GET api/books/reference/book%2B%2B
These should now work.
If you want help with the actual encoding you will have to edit your question with the client source code.
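For a client written in C#, the encoding above can be produced with Uri.EscapeDataString. This is only a sketch under that assumption, since the client code is not shown; BookUrls and ForReference are hypothetical names:

```csharp
using System;

public static class BookUrls
{
    // Builds the request path for a book reference, percent-encoding
    // characters such as ' ', '+' and '/' so they stay inside one path segment.
    public static string ForReference(string reference)
    {
        return "api/books/reference/" + Uri.EscapeDataString(reference);
    }
}
```

Uri.EscapeDataString is used here rather than an encoder that turns spaces into '+', because '+' in a path segment is a literal plus sign.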
I want to fetch a webpage from the internet and get the absolute URLs of some images on the page, using HtmlAgilityPack in C#.
The problem is...
The website first redirects the URL to another one, and the src attribute in the <img> tag is then a relative URL.
Currently, I have some code like this:
using HtmlAgilityPack;
HtmlDocument webpageDocument = new HtmlWeb().Load("http://xyz.example.com/");
HtmlNodeCollection nodes = webpageDocument.DocumentNode.SelectNodes("//img");
String url = nodes[0].Attributes["src"].Value.ToString();
The code above fetches a webpage from the given example URL, selects the <img> elements from the DOM tree, and reads the src attribute of the first one.
It works if the <img> has an absolute URL. Unfortunately, the website I want to handle gives me a relative URI (e.g. /img/01.png). I need the absolute URL so that I can do further processing on the image.
So I need to know the base URL for a given src, but I have failed to find it. In other words, I don't know how to get the location of the webpage after the redirect.
The server side is not mine (I have no control over it).
Consider using ResponseUri (on the response object), and, to avoid a second request, give the HTML Agility Pack parser a string with the content of the page instead.
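Once the final page location is known, the Uri constructor that takes a base Uri can turn the src value into an absolute URL. A minimal sketch, assuming pageUri is the ResponseUri of the response that delivered the page; ImageUrls and Resolve are hypothetical names:

```csharp
using System;

public static class ImageUrls
{
    // pageUri: location of the page after any redirects
    //          (assumed to come from HttpWebResponse.ResponseUri).
    // src:     value of the <img> src attribute, relative or absolute.
    public static Uri Resolve(Uri pageUri, string src)
    {
        // An absolute src is returned as-is; a relative one is
        // resolved against the page location.
        return new Uri(pageUri, src);
    }
}

// Possible wiring (not compiled here, needs HtmlAgilityPack and a network call):
//   var response = (HttpWebResponse)WebRequest.Create("http://xyz.example.com/").GetResponse();
//   string html = new StreamReader(response.GetResponseStream()).ReadToEnd();
//   var doc = new HtmlDocument();
//   doc.LoadHtml(html);
//   Uri abs = ImageUrls.Resolve(response.ResponseUri, src);
```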
All,
In HTML, it is my understanding that a url that starts // (e.g. //www.google.com) refers to a protocol-less url that should be requested in the same scheme as that in which the page was served.
However, the following C# code fails:
var uri = new Uri("//www.google.com", UriKind.RelativeOrAbsolute);
Assert.IsTrue(uri.IsAbsoluteUri);
Am I missing something here? At the moment I am rolling my own regex to find out if a URI is absolute:
return Regex.IsMatch(url, @"^(https?:)?//");
It's not absolute. It is relative to the scheme of the page it is accessed from, whether that page is served over HTTP, HTTPS, or something else.
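Because such a reference only becomes absolute once the scheme of the containing page is known, one option is to resolve it against that page's URI instead of testing it in isolation. A small sketch, assuming the page URI is available; SchemeRelative and Resolve are hypothetical names:

```csharp
using System;

public static class SchemeRelative
{
    // pageUri is the (absolute) URI of the page that contains the reference.
    public static Uri Resolve(Uri pageUri, string reference)
    {
        // For a reference such as "//www.google.com" the result takes
        // its scheme from pageUri and its host from the reference.
        return new Uri(pageUri, reference);
    }
}
```

With this approach the regex is only needed if the page URI is genuinely unknown.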
I need to pass a local path to HttpWebRequest in C#. I have test.xml on my C: drive and I need to fetch that XML file through HttpWebRequest, but it throws an exception on the
HttpWebRequest rqst = (HttpWebRequest)HttpWebRequest.Create(Uri.EscapeUriString(urlServ))
line: "Invalid URI: The Authority/Host could not be parsed."
My code:
string urlServ = "file:\\c:\\test.xml";
try
{
HttpWebRequest rqst = (HttpWebRequest)HttpWebRequest.Create(Uri.EscapeUriString(urlServ));
rqst.KeepAlive = false;
}
catch{}
I believe a file: URI is supposed to be written with forward slashes, not backslashes. So, use this:
string urlServ = "file:///c:/test.xml";
I noticed that when I typed it into my browser with backslashes, Firefox converted it to forward slashes for me.
You should use WebRequest.Create(uri) - this will automatically create the right object based on the URI type (e.g. file, http, etc). Now you can use the same code for real web pages or local test files.
I saw this in the documentation of FileWebRequest:
Do not use the FileWebRequest constructor. Use the WebRequest.Create
method to initialize new instances of the FileWebRequest class. If the
URI scheme is file://, the Create method returns a FileWebRequest
object.
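The behaviour described in that quote can be sketched as follows. LocalOrRemote and Open are hypothetical names, and the pragma is only there because WebRequest is marked obsolete in newer .NET versions:

```csharp
using System;
using System.Net;

public static class LocalOrRemote
{
    // Returns a request object whose concrete type depends on the URI scheme:
    // a FileWebRequest for file:// URIs, an HttpWebRequest for http(s):// URIs.
    public static WebRequest Open(string url)
    {
#pragma warning disable SYSLIB0014 // WebRequest is obsolete in newer .NET versions
        return WebRequest.Create(new Uri(url));
#pragma warning restore SYSLIB0014
    }
}
```

Note that the result must not be cast to HttpWebRequest when the URI is a file: URI; working against the WebRequest base class keeps the same code usable for local test files and real web pages.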
The goal of my program is to grab a webpage and then generate a list of absolute links to the pages it links to.
The problem I am having is when a page redirects to another page without the program knowing, it makes all the relative links wrong.
For example:
I give my program this link: moodle.pgmb.si/moodle/course/view.php?id=1
On this page, if it finds the link href="signup.php" meaning signup.php in the current directory, it errors because there is no directory above the root.
However this error is invalid because the page's real location is:
moodle.pgmb.si/moodle/login/index.php
Meaning that "signup.php" is linking to moodle.pgmb.si/moodle/login/signup.php, which is a valid page, not moodle.pgmb.si/moodle/course/signup.php like my program thinks.
So my question is how is my program supposed to know that the page it received is at another location?
I am doing this in C#, using the following code to get the HTML:
WebRequest wrq = WebRequest.Create(address);
WebResponse wrs = wrq.GetResponse();
StreamReader strdr = new StreamReader(wrs.GetResponseStream());
string html = strdr.ReadToEnd();
strdr.Close();
wrs.Close();
You should be able to use the ResponseUri property of the WebResponse class. It contains the URI of the internet resource that actually provided the response data, as opposed to the resource that was requested. You can then use this URI to build correct links.
http://msdn.microsoft.com/en-us/library/system.net.webresponse.responseuri.aspx
What I would do is first check if each link is absolute or relative by searching for an "http://" within it. If it's absolute, you're done. If it's relative, then you need to append the path to the page you're scanning in front of it.
There are a number of ways you could get the current path: you could Split() it on the slashes ("/"), then recombine all but the last one. Or you could search for the last occurrence of a slash and then take a substring of up to and including that position.
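As an alternative to splitting strings by hand, the Uri class can do both the absolute/relative check and the path combining. This is a sketch of that swapped-in approach, not the method described above; it assumes pageUri is the page's actual location (for example the ResponseUri of the response), and LinkResolver and ToAbsolute are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Resolves every href against the page location.
    // Absolute hrefs pass through unchanged; hrefs that cannot
    // be parsed at all are skipped.
    public static List<Uri> ToAbsolute(Uri pageUri, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        foreach (string href in hrefs)
        {
            Uri resolved;
            if (Uri.TryCreate(pageUri, href, out resolved))
            {
                result.Add(resolved);
            }
        }
        return result;
    }
}
```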
Edit: Re-reading the question, I'm not sure I understand. href="signup.php" is a relative link, which should go to the /signup.php. So the current behavior you mentioned is correct "moodle.pgmb.si/moodle/course/signup.php."
The problem is that, if the URL isn't a relative or absolute URL, then you have no way of knowing where it goes unless you request it. Even then, it might not actually be served from where you think it is located, because it might be implemented as an HTTP redirect or something similar on the server side.
So if you want to be exhaustive, what you can do is:
Use your current technique to grab a list of all links on the page.
Attempt to request each of those pages. Then if you:
Get a 200 response code then all is good - it's there.
Get a 404 response code you know the page does not exist
Get a 3XX response code then you know where the web server expects that content to actually originate from.
Your HttpWebResponse object has a StatusCode property. Note that you should also handle any possible WebException errors - these too carry a WebResponse (in the exception's Response property) with a StatusCode (for example 404 or 5xx).
You can also look at the HttpWebResponse Headers property - the Location header.
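The steps above could be sketched roughly like this. LinkChecker, LinkState, Classify and Check are hypothetical names. The sketch assumes http:// or https:// links and turns automatic redirect following off so that a 3XX status stays visible:

```csharp
using System;
using System.Net;

public static class LinkChecker
{
    public enum LinkState { Ok, Missing, Redirected, Other }

    // Maps a status code onto the three cases from the list above.
    public static LinkState Classify(HttpStatusCode status)
    {
        int code = (int)status;
        if (code == 200) return LinkState.Ok;
        if (code == 404) return LinkState.Missing;
        if (code >= 300 && code < 400) return LinkState.Redirected;
        return LinkState.Other;
    }

    // Requests the link once and classifies the outcome.
    public static LinkState Check(Uri link)
    {
#pragma warning disable SYSLIB0014 // WebRequest is obsolete in newer .NET versions
        var request = (HttpWebRequest)WebRequest.Create(link);
#pragma warning restore SYSLIB0014
        request.AllowAutoRedirect = false; // keep 3XX visible instead of following it

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return Classify(response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            // Error statuses such as 404 or 5xx arrive as exceptions.
            var errorResponse = ex.Response as HttpWebResponse;
            if (errorResponse == null) throw; // no HTTP response (DNS failure, timeout, ...)
            using (errorResponse)
            {
                return Classify(errorResponse.StatusCode);
            }
        }
    }
}
```

For a Redirected result, the Location header of the response says where the server expects the content to be.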