Converting a web address to a valid href value


Firstly, this seems like something that should have been asked before, but I cannot find anything that answers my question.
In brief, my task is to render an anchor link on a web page based on a user-defined web address. Because the address is user-defined, it could be in any format, for example:
http://www.example.com
https://www.example.com
www.example.com
example.com
What I need to do with this value is set it as the href property of an anchor tag. The problem is that (in Chrome at least) only the first two examples work, because they are recognised as absolute URLs. The last two examples are treated as relative paths, so they resolve against the current domain.
So the ultimate question is: what is the best way to format these values so that a consistent absolute URL is used? I could check for http/https and add it if missing, but I was hoping there might be an out-of-the-box .NET class that would be more reliable.
In addition, as this is a user-defined value, it could be complete junk anyway, so a function to validate the URL would be a nice bonus too.

We ran into this problem a few months back, and needed a consistent way of ensuring the URLs were absolute. We also wanted a way of removing http(s):// for displaying the URL on the web page.
I came up with this function:
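using System;

public static string FormatUrl(string url, bool? includeHttp = null)
{
    // bool? so that the null default compiles; null leaves the URL unchanged.
    // The scheme is compared case-insensitively instead of lower-casing the
    // whole URL, because paths and query strings can be case-sensitive.
    switch (includeHttp)
    {
        case true:
            // Add http:// when no scheme is present
            if (!(url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                  url.StartsWith("https://", StringComparison.OrdinalIgnoreCase)))
                url = "http://" + url;
            break;
        case false:
            // Strip the scheme, e.g. for displaying the URL on the page
            if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
                url = url.Substring("http://".Length);
            else if (url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                url = url.Substring("https://".Length);
            break;
    }
    return url;
}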
I know you're after an "out of the box" library, but this may be of some help.
I think the problem with an "out of the box" solution is that the function can't know whether the URL should be http:// or https://. My function assumes it's going to be http://, but some URLs need https://. If Microsoft were to build something like this into the framework, it would be buggy from the start.

You can try using this overload of the Uri class:
Uri Constructor (String)
This constructor creates a Uri instance from a URI string. It parses the URI, puts it in canonical format, and makes any required escape encodings.
This constructor does not ensure that the Uri refers to an accessible resource.
This constructor assumes that the string parameter references an absolute URI and is equivalent to calling the Uri constructor with UriKind set to Absolute. If the string parameter passed to the constructor is a relative URI, this constructor will throw a UriFormatException.
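This tries to construct a canonical Uri from the user input, and the resulting object has plenty of properties for checking and extracting the URL parts you need. Note that input without a scheme, such as www.example.com or example.com, is not an absolute URI, so this constructor throws UriFormatException for it.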
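A minimal sketch combining the two ideas in this question, assuming that input with no scheme should be treated as http:// (the same assumption FormatUrl makes). UrlHelper and NormalizeUrl are hypothetical names, not framework APIs. It uses Uri.TryCreate instead of the constructor, so junk input yields null rather than an exception, which also covers the validation part of the question.

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper: turns input such as "www.example.com" into an
    // absolute http(s) URL, or returns null when it cannot be parsed.
    // Assumption: a missing scheme means "http".
    public static string NormalizeUrl(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return null;

        string candidate = input.Trim();

        // Prepend a scheme only when none is present, so that Uri sees the
        // host as a host instead of as a relative path.
        bool hasWebScheme =
            candidate.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            candidate.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        if (!hasWebScheme)
            candidate = "http://" + candidate;

        // TryCreate returns false instead of throwing UriFormatException.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri))
            return null;

        // AbsoluteUri is the canonical, escaped form (e.g. adds a trailing "/").
        return uri.AbsoluteUri;
    }
}
```

For example, NormalizeUrl("example.com") returns "http://example.com/", which is safe to use as an href. If some sites need https://, the same caveat as in the previous answer applies: the helper cannot know that.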

Related

How to access properties of Uri defined with UriKind.Relative

My method receives a URI as a string and attempts to parse it into a predictable and consistent format. The incoming URL could be absolute (http://www.test.com/myFolder) or relative (/myFolder). Absolute URIs are easy enough to work with, but I've hit some stumbling blocks working with relative ones. Most notable is the fact that, although the constructor for Uri allows you to designate a relative URI using UriKind.Relative (or UriKind.RelativeOrAbsolute), it doesn't appear to have any properties available when you do this.
Specifically, it throws this exception: System.InvalidOperationException : This operation is not supported for a relative URI.
It makes sense that you wouldn't be able to access, say, the Scheme or Authority properties (although it seems odd that they throw invalid operation exceptions instead of just returning empty strings), but even properties like PathAndQuery or Fragment exhibit the same behavior. In fact, pretty much the only properties that don't throw exceptions for relative URIs are the IsX flags and OriginalString, which just returns the string you passed to the object in the first place.
Given that that constructor explicitly allows you to declare a relative URI, this all seems like a baffling omission. Is there something I'm missing here? Is there any way to handle relative URIs as their component parts, or do I need to just treat it as a string? Am I completely failing to understand what "relative URI" means in this case?
To replicate:
var uri = new Uri("/myFolder", UriKind.Relative);
string foo = uri.PathAndQuery; // throws InvalidOperationException
Visual Studio Pro 2015,
.NET 4.5.2 (if any of that makes a difference)

It's by design that an exception is thrown when accessing e.g. PathAndQuery for a relative URI; see the Uri source code.
As a rather quick and dirty workaround to parse some segments, you could construct a temporary absolute uri out of the relative one, using a dummy base uri (scheme and host) which you ignore.
String url = "/myFolder?bar=1#baz";
Uri uri = new Uri(url, UriKind.RelativeOrAbsolute);
if (!uri.IsAbsoluteUri)
{
Uri baseUri = new Uri("http://foo.com");
uri = new Uri(baseUri, uri);
}
String pathAndQuery = uri.PathAndQuery; // /myFolder?bar=1
String query = uri.Query; // ?bar=1
String fragment = uri.Fragment; // #baz
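The same workaround can be wrapped in a small helper so that callers get path, query and fragment whether the input is absolute or relative. This is a sketch: RelativeUriParts, Split and the example.invalid base are hypothetical, and it assumes absolute inputs always start with http:// or https://. It passes UriKind.Relative explicitly rather than RelativeOrAbsolute, because .NET on Linux/macOS may interpret a leading "/" as an absolute file path.

```csharp
using System;

public static class RelativeUriParts
{
    // Hypothetical placeholder base: only used so that Uri exposes its
    // component properties. Its scheme and host never appear in the result.
    private static readonly Uri DummyBase = new Uri("http://example.invalid");

    // Returns (path, query, fragment) for an absolute http(s) URL or a relative one.
    public static (string Path, string Query, string Fragment) Split(string url)
    {
        bool isAbsolute =
            url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

        // Relative references are resolved against the dummy base.
        Uri uri = isAbsolute
            ? new Uri(url, UriKind.Absolute)
            : new Uri(DummyBase, new Uri(url, UriKind.Relative));

        return (uri.AbsolutePath, uri.Query, uri.Fragment);
    }
}
```

Split("/myFolder?bar=1#baz") and Split("http://www.test.com/myFolder?bar=1#baz") then yield the same three parts.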

unable to get complete url after # using query string in asp.net c# [duplicate]

I know that on the client side (JavaScript) you can use window.location.hash, but I could not find any way to access it from the server side. I'm using ASP.NET.

We had a situation where we needed to persist the URL hash across ASP.NET postbacks. As the browser does not send the hash to the server, the only way to do it is to use some JavaScript:
When the form submits, grab the hash (window.location.hash) and store it in a server-side hidden input field. Put this field in a DIV with an id of "urlhash" so it is easy to find later.
On the server you can use this value if you need to do something with it. You can even change it if you need to.
On page load on the client, check the value of this hidden field. Find it through the DIV that contains it, because the auto-generated ID won't be known in advance. Yes, you could do some trickery here with .ClientID, but we found it simpler to use the wrapper DIV, as it lets all of this JavaScript live in an external file and be used generically.
If the hidden input field has a valid value, set that as the URL hash (window.location.hash again) and/or perform other actions.
We used jQuery to simplify selecting the field and so on; all in all it ends up being a few jQuery calls, one to save the value and another to restore it.
Before submit:
$("form").submit(function() {
$("input", "#urlhash").val(window.location.hash);
});
On page load:
var hashVal = $("input", "#urlhash").val();
if (IsHashValid(hashVal)) {
window.location.hash = hashVal;
}
IsHashValid() can check for "undefined" or other things you don't want to handle.
Also, make sure you use $(document).ready() appropriately, of course.

RFC 2396 [1], section 4.1:
When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI.
(emphasis added)
[1]: https://www.rfc-editor.org/rfc/rfc2396#section-4

That's because the browser doesn't transmit that part to the server, sorry.

Probably the only choice is to read it on the client side and transfer it manually to the server (GET/POST/AJAX).
You may also want to see how to work with the back button and browser history at Malcan.

Just in case you aren't actually trying to see the fragment on a GET/POST, but want to know how to access that part of a Uri object in your server-side code: it is under Uri.Fragment (MSDN docs).
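A small illustration for a Uri you construct yourself on the server, for example from a stored or user-supplied URL. FragmentDemo and GetFragment are hypothetical names. This will not recover the hash from an incoming request, because the browser never sends it, so Request.Url has an empty Fragment.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the fragment of an absolute URL, including the leading '#',
    // or an empty string when there is none.
    public static string GetFragment(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl);
        return uri.Fragment;
    }
}
```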

Possible solution for GET requests:
New Link format: http://example.com/yourDirectory?hash=video01
Call this function toward the top of your controller or of http://example.com/yourDirectory/index.php:
function redirect()
{
if (!empty($_GET['hash'])) {
/** Sanitize & Validate $_GET['hash']
If valid return string
If invalid: return empty or false
******************************************************/
$validHash = sanitizeAndValidateHashFunction($_GET['hash']);
if (!empty($validHash)) {
$url = './#' . $validHash;
} else {
$url = '/your404page.php';
}
header("Location: $url");
exit; // stop executing the rest of the page after redirecting
}
}
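The same idea in C#, as a sketch for an ASP.NET page rather than PHP. HashRedirect and GetRedirectTarget are hypothetical, and the whitelist pattern is an assumption standing in for sanitizeAndValidateHashFunction; adjust it to whatever your anchors look like. "/your404page.php" is kept as the same placeholder the PHP version uses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashRedirect
{
    // Assumption: a valid hash is 1-64 letters, digits, '_' or '-' (e.g. "video01").
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    private static readonly Regex ValidHash = new Regex(@"\A[A-Za-z0-9_-]{1,64}\z");

    // Returns the Location to redirect to, or null when no "hash" value was supplied.
    public static string GetRedirectTarget(string hashParam)
    {
        if (string.IsNullOrEmpty(hashParam))
            return null;

        return ValidHash.IsMatch(hashParam) ? "./#" + hashParam : "/your404page.php";
    }
}
```

In a page you might call it with Request.QueryString["hash"] and pass a non-null result to Response.Redirect.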

Rewriting forward slashes in a query parameter

In my ASP.NET Core app (Angular 4 front end) I accept a URL like this:
example.com/report;url=http%3A%2F%2Fexample2.com
I would like to create a rewrite rule that allowed people to enter the following url:
example.com/report;url=http://example2.com
I can't find out how to do this.
I tried:
var options = new RewriteOptions()
.AddRewrite(@"(.*);url=http:\/\/([^;]*)(.*)", "$1;url=http%3A%2F%2F$2$3", skipRemainingRules: false)
.AddRewrite(@"^report.*", "index.html", skipRemainingRules: true);
app.UseRewriter(options);
This didn't work, but even if it did, it wouldn't account for URLs that have slashes after the domain, i.e. subdirectories. I don't think that is possible with a group-matching pattern; it needs a find-and-replace operation on the matched group.
Other web servers have a configurable option to decode slashes. I can't find any reference to one in the ASP.NET Core docs. Is this possible?

You're going to want to pass the URL as a query-string parameter. There really is no way to get what you want by letting the user type an unencoded URL as a parameter in the address bar; it will always need to be encoded.
Instead of:
http://example.com/report;url=http%3A%2F%2Fexample2.com
use:
http://example.com/report?url=http%3A%2F%2Fexample2.com
Rather than having the user type all of this into the browser address bar, I would instead create a user interface that lets a user request a report and hit submit. The form would need a textbox for the URL, and it would send a GET request to your site after encoding the contents of the URL textbox into the 'url' parameter.
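The encoding step can be sketched with Uri.EscapeDataString, which percent-encodes ':' and '/' as well as '?', '&' and '=', so the target URL cannot be confused with the outer URL's own structure. ReportLinks and BuildReportUrl are hypothetical names, and the base address mirrors the example URL in this answer.

```csharp
using System;

public static class ReportLinks
{
    // Builds a report link whose target URL travels in the 'url' query parameter.
    public static string BuildReportUrl(string targetUrl)
    {
        return "http://example.com/report?url=" + Uri.EscapeDataString(targetUrl);
    }
}
```

BuildReportUrl("http://example2.com") produces http://example.com/report?url=http%3A%2F%2Fexample2.com, and subdirectories are encoded the same way (every '/' becomes %2F).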
Using a URL re-write module is probably going against the grain here.

Unfortunately based on the code of the RewriteRule class, it is not possible. You should be able to create your own custom rewrite rule though which would implement IRule interface and URL encode part of your request path.
The source code of the RewriteRule can be found here
https://github.com/aspnet/BasicMiddleware/blob/dev/src/Microsoft.AspNetCore.Rewrite/Internal/RewriteRule.cs
And here is the UrlEncoder to encode the value of the 'url'
https://github.com/dotnet/corefx/blob/master/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/UrlEncoder.cs
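The string transformation such a custom rule would need can be sketched on its own: find ";url=" in the path and encode everything up to the next ';', which also handles slashes in subdirectories. This is a hypothetical helper; wiring it into an IRule implementation (its ApplyRule method) is not shown, and it swaps in Uri.EscapeDataString instead of UrlEncoder to stay self-contained.

```csharp
using System;

public static class UrlParamEncoder
{
    private const string Marker = ";url=";

    // Encodes the value after ";url=" (up to the next ';', if any).
    public static string EncodeUrlSegment(string path)
    {
        int start = path.IndexOf(Marker, StringComparison.Ordinal);
        if (start < 0)
            return path; // nothing to rewrite

        start += Marker.Length;
        int end = path.IndexOf(';', start);
        if (end < 0)
            end = path.Length;

        string raw = path.Substring(start, end - start);
        // Unescape first so that an already-encoded value is not double-encoded.
        string encoded = Uri.EscapeDataString(Uri.UnescapeDataString(raw));
        return path.Substring(0, start) + encoded + path.Substring(end);
    }
}
```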

What is the difference between ResolveUrl and ResolveClientUrl?

I have been using ResolveUrl for adding CSS and Javascript in ASP.NET files.
But I often see ResolveClientUrl as an option too. What is the difference between the two?
When should I use ResolveClientUrl?

ResolveUrl creates the URL relative to the root.
ResolveClientUrl creates the URL relative to the current page.
You can use whichever one you want; however, ResolveUrl is more commonly used.
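The difference between the two output forms can be illustrated with plain System.Uri, outside ASP.NET. This is an analogy, not how ASP.NET implements either method: AbsolutePath gives the root-relative form (like ResolveUrl for an app at the site root), and MakeRelativeUri gives a path relative to the current page (like ResolveClientUrl). RelativeLinkDemo and its methods are hypothetical.

```csharp
using System;

public static class RelativeLinkDemo
{
    // Root-relative form, e.g. "/HomePage.aspx"
    public static string RootRelative(string target) => new Uri(target).AbsolutePath;

    // Page-relative form, e.g. "../HomePage.aspx" when the current page is one folder deep
    public static string PageRelative(string currentPage, string target)
    {
        var page = new Uri(currentPage);
        return page.MakeRelativeUri(new Uri(target)).ToString();
    }
}
```

The page-relative result changes with the requesting page's depth, while the root-relative one does not, which is why ResolveUrl output looks more stable.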

Here's a simple example:
//Returns: ../HomePage.aspx
String ClientURL = ResolveClientUrl("~/HomePage.aspx");
//Returns: /HomePage.aspx
String RegURL = ResolveUrl("~/HomePage.aspx");
//Returns: C:\inetpub\wwwroot\MyProject\HomePage.aspx
String ServerMappedPath = Server.MapPath("~/HomePage.aspx");
//Returns: ~/HomePage.aspx
String appRelVirtPath = AppRelativeVirtualPath;
//Returns: http://localhost:4913/
String baseUrl = Request.Url.GetLeftPart(UriPartial.Authority) + Request.ApplicationPath;
//Returns: "http://localhost:4913/HomePage.aspx"
String absUri = Request.Url.AbsoluteUri;

According to the MSDN documentation:
ResolveClientUrl
A fully qualified URL to the specified resource suitable for use on the browser.
Use the ResolveClientUrl method to return a URL string suitable for use by the client to access resources on the Web server, such as image files, links to additional pages, and so on.
ResolveUrl
The converted URL.
If the relativeUrl parameter contains an absolute URL, the URL is returned unchanged. If the relativeUrl parameter contains a relative URL, that URL is changed to a relative URL that is correct for the current request path, so that the browser can resolve the URL.
For example, consider the following scenario:
A client has requested an ASP.NET page that contains a user control that has an image associated with it.
The ASP.NET page is located at /Store/page1.aspx.
The user control is located at /Store/UserControls/UC1.ascx.
The image file is located at /UserControls/Images/Image1.jpg.
If the user control passes the relative path to the image (that is, /Store/UserControls/Images/Image1.jpg) to the ResolveUrl method, the method will return the value /Images/Image1.jpg.
I think this explains it quite well.

In short:
Page.ResolveUrl("~/..."): creates the URL from the root of the app.
and
Page.ResolveClientUrl("~/..."): creates the URL relative to the current page (e.g. ../../..).
But in my tests in ASP.NET, Page.ResolveUrl is better, because its output is stable and not relative.

Using Page.ResolveUrl is better if you are trying to get a JavaScript-friendly URL.
For example, if you are opening an iframe from the parent page, you need a root-based URL that works regardless of the current page, to pass to the iframe's src property.

Truncating Query String & Returning Clean URL C# ASP.net

I would like to take the original URL, truncate the query string parameters, and return a cleaned-up version of the URL. I would like this to happen across the whole application, so doing it in global.asax would be ideal. I think a 301 redirect would be in order as well.
i.e.:
in: www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media
out: www.website.com/default.aspx
What would be the best way to achieve this?

System.Uri is your friend here. This has many helpful utilities on it, but the one you want is GetLeftPart:
string url = "http://www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media";
Uri uri = new Uri(url);
Console.WriteLine(uri.GetLeftPart(UriPartial.Path));
This gives the output: http://www.website.com/default.aspx
[The Uri class does require the protocol, http://, to be specified]
GetLeftPart basically says "get the left part of the URI up to and including the part I specify". This can be Scheme (just the http:// bit), Authority (the www.website.com part), Path (the /default.aspx) or Query (the query string).
Assuming you are on an aspx web page, you can then use Response.Redirect(newUrl) to redirect the caller.
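For the application-wide 301 the question asks about, a sketch of the decision step: only redirect when there actually is a query string, otherwise redirecting to the same URL would loop. QueryRedirect and TryStripQuery are hypothetical names; it assumes an absolute URL, since Uri needs the scheme.

```csharp
using System;

public static class QueryRedirect
{
    // Returns true, with the query-less URL in cleanUrl, when 'url' has a query string.
    public static bool TryStripQuery(string url, out string cleanUrl)
    {
        var uri = new Uri(url);
        cleanUrl = uri.GetLeftPart(UriPartial.Path);
        return uri.Query.Length > 0;
    }
}
```

In Global.asax you could, for example, call it from Application_BeginRequest with Request.Url.AbsoluteUri and, when it returns true, call Response.RedirectPermanent(cleanUrl) (available from ASP.NET 4.0). Keep the caveats of the next answer about losing the parameters in mind.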

Here is a simple trick
Dim uri = New Uri(Request.Url.AbsoluteUri)
dim reqURL = uri.GetLeftPart(UriPartial.Path)

Here is a quick way of getting the root path sans the full path and query.
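// Request.Url.AbsoluteUri.Replace(Request.Url.PathAndQuery, "") breaks when PathAndQuery is "/",
// because Replace then strips every slash; GetLeftPart avoids that edge case.
string path = Request.Url.GetLeftPart(UriPartial.Authority);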

This may look a little better.
string rawUrl = String.Concat(this.GetApplicationUrl(), Request.RawUrl);
if (rawUrl.Contains("/post/"))
{
bool hasQueryStrings = Request.QueryString.Keys.Count > 1;
if (hasQueryStrings)
{
Uri uri = new Uri(rawUrl);
rawUrl = uri.GetLeftPart(UriPartial.Path);
HtmlLink canonical = new HtmlLink();
canonical.Href = rawUrl;
canonical.Attributes["rel"] = "canonical";
Page.Header.Controls.Add(canonical);
}
}
Followed by a function to properly fetch the application URL.
Works perfectly.

I'm guessing that you want to do this because you want your users to see pretty looking URLs. The only way to get the client to "change" the URL in its address bar is to send it to a new location - i.e. you need to redirect them.
Are the query string parameters going to affect the output of your page? If so, you'll have to look at how to maintain state between requests (session variables, cookies, etc.) because your query string parameters will be lost as soon as you redirect to a page without them.
There are a few ways you can do this globally (in order of preference):
If you have direct control over your server environment then a configurable server module like ISAPI_ReWrite or IIS 7.0 URL Rewrite Module is a great approach.
A custom IHttpModule is a nice, reusable roll-your-own approach.
You can also do this in global.asax, as you suggest.
You should only use the 301 response code if the resource has indeed moved permanently. Again, this depends on whether your application needs the query string parameters. If you use a permanent redirect, a browser that respects the 301 response code will skip loading a URL like .../default.aspx?utm_source=twitter&utm_medium=social-media and load .../default.aspx directly, so you will never even see the query string parameters.
Finally, you can use POST method requests. This gives you clean URLs and lets you pass parameters in, but will only work with <form> elements or requests you create using JavaScript.

Take a look at the UriBuilder class. You can create one with a url string, and the object will then parse this url and let you access just the elements you desire.
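A minimal UriBuilder sketch: clear only the Query property and read the result back through builder.Uri, whose AbsoluteUri omits default ports such as :80. QueryStripper and WithoutQuery are hypothetical names. Unlike GetLeftPart(UriPartial.Path), this keeps any fragment.

```csharp
using System;

public static class QueryStripper
{
    // Drops the query string; scheme, host, path and fragment are kept.
    public static string WithoutQuery(string url)
    {
        var builder = new UriBuilder(url) { Query = string.Empty };
        return builder.Uri.AbsoluteUri;
    }
}
```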

After completing whatever processing you need to do on the query string, just split the url on the question mark:
Dim _CleanUrl As String = Request.Url.AbsoluteUri.Split("?"c)(0)
Response.Redirect(_CleanUrl)
Granted, my solution is in VB.NET, but I'd imagine that it could be ported over pretty easily. And since we are only looking for the first element of the split, it even "fails" gracefully when there is no querystring.
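A C# port of the VB.NET line, as a sketch; SplitStripper and StripQuery are hypothetical names. String.Split returns the whole string as element 0 when there is no '?', which is the graceful fallback mentioned above.

```csharp
using System;

public static class SplitStripper
{
    // Keeps everything before the first '?'. Note: a fragment that follows
    // the query string is dropped as well.
    public static string StripQuery(string absoluteUri) => absoluteUri.Split('?')[0];
}
```

In a page this would be used as Response.Redirect(SplitStripper.StripQuery(Request.Url.AbsoluteUri)).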
