How to check that a uri string is valid - c#

How do you check that a uri string is valid (that you can feed it to the Uri constructor)?
So far I only have the following but for obvious reasons I'd prefer a less brute way:
Boolean IsValidUri(String uri)
{
try
{
new Uri(uri);
return true;
}
catch
{
return false;
}
}
I tried Uri.IsWellFormedUriString but it doesn't seem to like everything that you can throw at the constructor. For example:
String test = @"C:\File.txt";
Console.WriteLine("Uri.IsWellFormedUriString says: {0}", Uri.IsWellFormedUriString(test, UriKind.RelativeOrAbsolute));
Console.WriteLine("IsValidUri says: {0}", IsValidUri(test));
The output will be:
Uri.IsWellFormedUriString says: False
IsValidUri says: True
Update/Answer
The Uri constructor uses kind Absolute by default. This was causing a discrepancy when I tried using Uri.TryCreate and the constructor. You do get the expected outcome if you match the UriKind for both the constructor and TryCreate.
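To make the point about matching kinds concrete, here is a minimal sketch. The class name UriKindCheck, both method names, and the sample strings in the note below are illustrative assumptions, not part of the original post. It passes the same UriKind to Uri.TryCreate and to the constructor so that the two can be compared directly.

```csharp
using System;

public static class UriKindCheck
{
    // Returns true when the string parses as a Uri of the requested kind.
    public static bool IsValidUri(string uri, UriKind kind)
    {
        Uri ignored;
        return Uri.TryCreate(uri, kind, out ignored);
    }

    // Asks the same question of the constructor, using an explicit kind
    // instead of the default (UriKind.Absolute).
    public static bool ConstructorAccepts(string uri, UriKind kind)
    {
        if (uri == null)
        {
            return false; // the constructor would throw ArgumentNullException
        }
        try
        {
            new Uri(uri, kind);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }
}
```

With UriKind.Absolute both methods reject a string such as "relative/path", and with UriKind.RelativeOrAbsolute both accept it, so the discrepancy only shows up when the kinds differ.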

A well-formed URI implies conformance with certain RFCs. The local path in your example is not conformant with these. Read more in the IsWellFormedUriString documentation.
A false result from that method does not imply that the Uri class will not be able to parse the input. While the URI input might not be RFC conformant, it still can be a valid URI.
Update: And to answer your question - as the Uri documentation shows, there is a static method called TryCreate that will attempt exactly what you want and return true or false (and the actual Uri instance if true).

Since the accepted answer doesn't provide an explicit example, here is some code to validate URIs in C#:
Uri outUri;
if (Uri.TryCreate("ThisIsAnInvalidAbsoluteURI", UriKind.Absolute, out outUri)
&& (outUri.Scheme == Uri.UriSchemeHttp || outUri.Scheme == Uri.UriSchemeHttps))
{
//Do something with your validated Absolute URI...
}

Assuming we only want to support absolute URI and HTTP requests, here is a function that does what you want:
public static bool IsValidURI(string uri)
{
if (!Uri.IsWellFormedUriString(uri, UriKind.Absolute))
return false;
Uri tmp;
if (!Uri.TryCreate(uri, UriKind.Absolute, out tmp))
return false;
return tmp.Scheme == Uri.UriSchemeHttp || tmp.Scheme == Uri.UriSchemeHttps;
}

In my case I just wanted to test the URI, and I didn't want the check to slow down the application.
Boolean IsValidUri(String uri){
return Uri.IsWellFormedUriString(uri, UriKind.Absolute);
}

Try it:
private bool IsValidUrl(string address)
{
return Uri.IsWellFormedUriString(address, UriKind.RelativeOrAbsolute);
}

In your case the uri argument is an absolute path that refers to a file location, so, as the documentation for the method describes, it returns false.

Related

Why Uri.TryCreate throws NRE when url contains Turkish character?

I have encountered an interesting situation where I get NRE from Uri.TryCreate method when it's supposed to return false.
You can reproduce the issue like below:
Uri url;
if (Uri.TryCreate("http:Ç", UriKind.RelativeOrAbsolute, out url))
{
Console.WriteLine("success");
}
I guess it's failing during the parse, but when I try "http:A", for example, it returns true and parses it as a relative URL. As I understand it, even if parsing fails it should just return false, so what could be the problem here? This seems like a bug in the implementation, because the documentation doesn't mention any exception for this method.
The error occurs in .NET 4.6.1 but not 4.0
This is a bug in the .NET framework. You can open a ticket on MicrosoftConnect.
The exception will be raised in this method
void System.Uri.CreateUriInfo(System.Uri.Flags cF)
on line 2290 (inspect the reference source) executing following statement:
// This is NOT an ImplicitFile uri
idx = (ushort)m_Syntax.SchemeName.Length;
At this time, the m_Syntax object will be null, because during parsing, it will be discarded.
Method
void InitializeUri(ParsingError err, UriKind uriKind, out UriFormatException e)
line 121:
if (m_Syntax.IsSimple)
{
if ((err = PrivateParseMinimal()) != ParsingError.None)
{
if (uriKind != UriKind.Absolute && err <= ParsingError.LastRelativeUriOkErrIndex)
{
// RFC 3986 Section 5.4.2 - http:(relativeUri) may be considered a valid relative Uri.
m_Syntax = null; // convert to relative uri
e = null;
m_Flags &= Flags.UserEscaped; // the only flag that makes sense for a relative uri
}
// ...
}
// ...
}
The PrivateParseMinimal() method returns ParsingError.BadAuthority and uriKind == UriKind.RelativeOrAbsolute by your specification.
The PrivateParseMinimal() method looks for any of the following character sequences: "//", "\", "/\", "/". And since there are no such sequences in your input string, a ParsingError.BadAuthority code will be returned.
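Until the framework bug is fixed, one possible workaround is to wrap the call. The sketch below is an assumption on my part rather than anything the answer proposes: SafeUri.TryCreate is a hypothetical helper that treats the NullReferenceException described above as an ordinary parse failure.

```csharp
using System;

public static class SafeUri
{
    // Hypothetical wrapper around Uri.TryCreate. On affected framework
    // versions the inner call can throw NullReferenceException for inputs
    // such as "http:Ç"; here that is reported as "could not parse".
    public static bool TryCreate(string input, UriKind kind, out Uri result)
    {
        try
        {
            return Uri.TryCreate(input, kind, out result);
        }
        catch (NullReferenceException)
        {
            result = null;
            return false;
        }
    }
}
```

Catching NullReferenceException is normally a code smell, so a wrapper like this is best kept narrow and removed once the runtime in use no longer shows the problem.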

Strange behavior in Uri-class (.NET)

Why does the Uri class URL-decode the URL that I pass to its constructor, and how can I prevent this?
Example (look at the querystring value "options"):
string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url); // http://www.example.com/default.aspx?id=1&name=andreas&options=one=1&two=2&three=3
Update:
// ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
Request.QueryString["options"] = one=1&two=2&three=3
// ?id=1&name=andreas&options=one=1&two=2&three=3
Request.QueryString["options"] = one=1
This is my problem :)
why exactly?
you can get to the encoded version using uri.AbsoluteUri
EDIT
Console.WriteLine("1) " + uri.AbsoluteUri);
Console.WriteLine("2) " + uri.Query);
OUT:
1) http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
2) ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
I would expect that from a Uri class. I am quite sure that it still gets you in a good place if you use it with e.g. WebClient class (i.e. WebClient.OpenRead (Uri uri)). What's the problem in your case?
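The difference between the string forms of a Uri can be seen side by side with a small sketch like the following. The class and method names (UriForms.Describe) are illustrative assumptions; the three properties it reads are standard members of System.Uri.

```csharp
using System;

public static class UriForms
{
    // Returns three string forms of the same absolute URI.
    public static string[] Describe(string url)
    {
        Uri uri = new Uri(url);
        return new[]
        {
            uri.OriginalString, // exactly the string passed to the constructor
            uri.AbsoluteUri,    // canonical form, percent-escapes kept
            uri.ToString()      // display form, which may be unescaped
        };
    }
}
```

For the URL in the question, the first two entries keep the escaped options value, which is why passing the Uri object (rather than its ToString() output) to other APIs generally preserves the encoding.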
This is how .NET behaves internally. In previous versions you could use another Uri constructor that accepted a boolean value indicating whether or not to escape, but it has been deprecated.
The only way around it is hackish: accessing some private method directly by means of reflection:
string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url);
MethodInfo mi = uri.GetType().GetMethod("CreateThis", BindingFlags.NonPublic | BindingFlags.Instance);
if (mi != null)
mi.Invoke(uri, new object[] { url, true, UriKind.RelativeOrAbsolute });
This worked for me in quick test, but not ideal as you "hack" into .NET internal code.

HttpWebRequest Url escaping

I know, the title sounds like this question has been addressed many times. But I am struggling with a specific case and I am very confused over it. Hopefully a seasoned C#'er could point me in the correct direction.
I have the code:
string serviceURL = "https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports";
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(serviceURL);
Now when I quickwatch dataRequest, I see that:
RequestUri: {https://www.domain.com/service/tables/bucketname/tables/testtable/imports}
It looks like the HttpWebRequest has changed both occurrences of %2F to /. However, the server needs the requested Uri to be exactly as serviceURL is written, containing the %2F.
Is there any way to get the HttpWebRequest class to call the Url:
https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports
Many thanks! I am at a complete loss here...
-Brett
Kyle posted the answer in a comment, so to make it official:
GETting a URL with an url-encoded slash
It's a weird work around, but nevertheless gets the job done.
As long as the problem lies in %2F being unescaped to "/" there are solutions out there. One involving a hack and for newer versions of .Net, an app.config setting. Check here: How to make System.Uri not to unescape %2f (slash) in path?
However, I have yet to figure out how to prevent it from unescaping some specifically escaped characters, like '(' and ')' (%28 and %29). I have tried all the settings and hacks that I found out there to prevent the Uri class from delivering a partially unescaped path for the WebRequest. The solutions will happily prevent %2F from being unescaped, but not %28 and %29, and possibly not most of the other characters that were specifically escaped.
It seems like the WebRequest is specifically asking for 1 value from the Uri object to create the "GET /path HTTP/1.1" syntax: Uri.PathAndQuery which again calls its UriParser.GetComponents.
If you want to download from mediafire and the URL contains %28 and %29, you will get into an infinite redirect loop, as .Net keeps changing %28 and %29 to '(' and ')' and following the redirect (exception: "Too many automatic redirections were attempted").
So this is a solution for those who are stuck and have not been able to find a way to prevent the unescape of some characters.
The only way I have found to override this (currently using .Net 4.6) and deliver my own PathAndQuery has been a combination of inheriting from UriParser and hacking its use.
public sealed class MyUriParser : System.UriParser
{
private UriParser _originalParser;
private MethodInfo _getComponentsMethod;
public MyUriParser(UriParser originalParser) : base()
{
if (_originalParser == null)
{
_originalParser = originalParser;
_getComponentsMethod = typeof(UriParser).GetMethod("GetComponents", BindingFlags.NonPublic | BindingFlags.Instance);
if (_getComponentsMethod == null)
{
throw new MissingMethodException("UriParser", "GetComponents");
}
}
}
private static Regex rx = new Regex(@"^(?<Scheme>[^:]+):(?://((?<User>[^@/]+)@)?(?<Host>[^@:/?#]+)(:(?<Port>\d+))?)?(?<Path>([^?#]*)?)?(\?(?<Query>[^#]*))?(#(?<Fragment>.*))?$",RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.Singleline);
private Match m = null;
protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
{
var original = (string)_getComponentsMethod.Invoke(_originalParser, BindingFlags.InvokeMethod, null, new object[] { uri, components, format }, null);
if (components == UriComponents.PathAndQuery)
{
var reg = rx.Match(uri.OriginalString);
var path = reg.Groups["Path"].Value;
var query = reg.Groups["Query"];
// Group.Value is never null, so test Success to see whether a query was present.
if (query.Success) return $"{path}?{query.Value}";
return path;
}
return original;
}
}
And then hacking it into the Uri instance by replacing its UriParser with this one.
public static Uri CreateUri(string url)
{
var uri = new Uri(url);
if (url.Contains("%28") || url.Contains("%29"))
{
var originalParser = ReflectionHelper.GetValueByReflection(uri, "m_Syntax") as UriParser;
var parser = new MyUriParser(originalParser);
ReflectionHelper.SetValueByReflection(parser, "m_Scheme", "http");
ReflectionHelper.SetValueByReflection(parser, "m_Port", 80);
ReflectionHelper.SetValueByReflection(uri, "m_Syntax", parser);
}
return uri;
}
Due to the way UriParser works, it normally needs to be registered to have its port and scheme name set, so these two values have to be set by reflection, as we are not registering it the correct way. I have not found a way to register "http", as it already exists. The ReflectionHelper is just a class I have, but it can be quickly replaced with normal reflection code.
Then call it like this:
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(CreateUri(serviceURL));
string serviceURL = Uri.EscapeUriString("https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports");

Alternatives to .NET provided apis regarding uris and urls

I've recently come to the realization that the .NET APIs for working with URLs and URIs frequently come up short in achieving even basic functionality (at least easily), including things such as: generating a FQDN URL from a relative path, forcing https or back to http, getting the root of the site, combining relative URLs properly and so forth.
Are there any alternative libraries out there that have put all of these type of functionality in a simple and reliable project?
I've certainly found myself doing much the same URI-manipulation code more than once, in .NET, but I don't see your cases as places it lacks.
Full URI from relative Uri:
new Uri(base, relative) // (works whether relative is a string or a Uri).
Obtaining the actual FQDN:
string host = uri.Host;
string fqdn = host.EndsWith(".") ? host : host + ".";
Forcing https or back to http:
UriBuilder toHttp = new UriBuilder(someUri);
toHttp.Scheme = "http";
toHttp.Port = 80;
return toHttp.Uri;
UriBuilder toHttps = new UriBuilder(someUri);
toHttps.Scheme = "https";
toHttps.Port = 443;
return toHttps.Uri;
Getting the root of the site:
new Uri(startingUri, "/");
Combining relative urls properly:
new Uri(baseUri, relUri); // We had this one already.
Only two of these are more than a single method call, and of those obtaining the FQDN is pretty obscure (unless rather than wanting the dot-ended FQDN you just wanted the absolute URI, in which case we're back to a single method call).
There is a single method version of the HTTPS/HTTP switching, though it's actually more cumbersome since it calls several properties of the Uri object. I can live with it taking a few lines to do this switch.
Still, to provide a new API one need only supply:
public static Uri SetHttpPrivacy(this Uri uri, bool privacy)
{
UriBuilder ub = new UriBuilder(uri);
if(privacy)
{
ub.Scheme = "https";
ub.Port = 443;
}
else
{
ub.Scheme = "http";
ub.Port = 80;
}
return ub.Uri;
}
I really can't see how an API could possibly be any more concise in the other cases.
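As a worked illustration of the one-liners above, here is a small sketch. The class name UriRecipes, its method names, and the example.com addresses in the note are assumptions made for the example; the calls themselves are the same Uri and UriBuilder members used in this answer.

```csharp
using System;

public static class UriRecipes
{
    // Resolves a relative reference against a base URI,
    // i.e. new Uri(base, relative).
    public static string Resolve(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative).AbsoluteUri;
    }

    // Root of the site that hosts the given absolute URI.
    public static string SiteRoot(string absoluteUri)
    {
        return new Uri(new Uri(absoluteUri), "/").AbsoluteUri;
    }

    // Switches the scheme to https on the default port.
    public static string ForceHttps(string absoluteUri)
    {
        UriBuilder ub = new UriBuilder(absoluteUri);
        ub.Scheme = "https";
        ub.Port = 443;
        return ub.Uri.AbsoluteUri;
    }
}
```

Note that resolution depends on the trailing slash of the base: resolving "d" against "http://example.com/a/b/c" replaces the last segment, whereas resolving "c/d" against "http://example.com/a/b/" appends to it.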
XUri is a nice class that is part of the open source project from MindTouch
http://developer.mindtouch.com/en/ref/dream/MindTouch.Dream/XUri?highlight=XUri
This article includes a quick sample on how to use it.
http://blog.developer.mindtouch.com/2009/05/18/consuming-rest-services-and-tdd-with-plug/
I am a fan of it. A little overkill assembly wise if you are going to just use the XUri portion, but there are other really nice things in the library too.
I use a combination of extensions with 'System.IO.Path' object as well.
These are just blurbs for example.
public static Uri SecureIfRemote(this Uri uri){
if(!System.Web.HttpContext.Current.Request.IsSecureConnection &&
!System.Web.HttpContext.Current.Request.IsLocal){
return new Uri......(build secure uri here)
}
return uri;
}
public static NameValueCollection ParseQueryString(Uri uri){
return uri.Query.ParseQueryString();
}
public static NameValueCollection ParseQueryString(this string s)
{
//return
return HttpUtility.ParseQueryString(s);
}

Detecting address type given a string

I have a text box in which users are allowed to enter addresses in these forms:
somefile.htm
someFolder/somefile.htm
c:\somepath\somemorepath\somefile.htm
http://someaddress
\\somecomputer\somepath\somefile.htm
or any other source that navigates to some content, containing some markup.
Should I also put a drop down list near the text box, asking what type of address is this, or is there a reliable way that can auto-detect the type of the address in the text box?
I don't think there is a particularly nice way of automatically doing this without crafting your own detection.
If you don't mind catching an exception in the failure case (which generally I do), then the snippet below will work for your examples (noting that it will also identify directories as being of type file)
public string DetectScheme(string address)
{
Uri result;
if (Uri.TryCreate(address, UriKind.Absolute, out result))
{
// You can only get Scheme property on an absolute Uri
return result.Scheme;
}
try
{
new FileInfo(address);
return "file";
}
catch
{
throw new ArgumentException("Unknown scheme supplied", "address");
}
}
I would suggest using a regex to determine the paths, similar to
public enum FileType
{
Url,
Unc,
Drive,
Other,
}
public static FileType DetermineType(string file)
{
System.Text.RegularExpressions.MatchCollection matches = System.Text.RegularExpressions.Regex.Matches(file, @"^(?:(?<unc>\\\\)|(?<drive>[a-zA-Z]:\\)|(?<url>http://)).*$", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
if (matches.Count > 0)
{
// A named group has a non-empty Value only when its alternative matched.
if (matches[0].Groups["unc"].Value != string.Empty) return FileType.Unc;
if (matches[0].Groups["drive"].Value != string.Empty) return FileType.Drive;
if (matches[0].Groups["url"].Value != string.Empty) return FileType.Url;
}
return FileType.Other;
}
If there is only a limited number of formats, you can validate against these and only allow valid ones. This will make auto-detection a bit easier as you will be able to use the same logic for that.
Check Uri.HostNameType Property and Uri.Scheme Property
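A minimal sketch of that property-based approach might look like this. The class name AddressKind, the method Classify, and the returned labels are illustrative assumptions; it relies on Uri.Scheme together with the related Uri.IsUnc and Uri.IsFile properties and does not use HostNameType.

```csharp
using System;

public static class AddressKind
{
    // Classifies an address using only Uri properties.
    // The returned labels are made up for this example.
    public static string Classify(string address)
    {
        Uri uri;
        if (!Uri.TryCreate(address, UriKind.Absolute, out uri))
        {
            // Not an absolute URI: e.g. somefile.htm or someFolder/somefile.htm
            return "relative";
        }
        if (uri.IsUnc)
        {
            return "unc";   // UNC path to a network share
        }
        if (uri.IsFile)
        {
            return "file";  // other file: URIs, including local paths
        }
        return uri.Scheme;  // "http", "https", "ftp", ...
    }
}
```

How local and UNC paths are parsed can vary by platform and runtime, so the "unc" and "file" branches are worth testing on the target system before relying on them.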
