Check for a Valid URL string - c#

I have an MVC application and I am going through a strange situation right now.
I am allowing users to store and display website URLs. I am using the following method to check whether a URL is valid:
if(Uri.TryCreate(urlString, UriKind.Absolute, out uri))
{
// Do something
}
else
{
// Invalid Url
}
This method does not work when the URL starts with "www" or directly with the domain name.
I want the first branch of the if statement to be bulletproof.
Thanks for your help.

Maybe just try to create a new URI in a try-catch block.
Something like:
Uri myUri = new Uri("http://www.stackoverflow.com");
If you end up in the catch block (new Uri throws a UriFormatException), then it's a wrong URL.
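Neither approach above accepts input such as "www.example.com", because an absolute Uri needs a scheme. A minimal sketch of one way around that, assuming it is acceptable to default to http:// when the user typed no scheme; UrlValidator and TryNormalizeUrl are hypothetical names, not part of the original code:

```csharp
using System;

public static class UrlValidator
{
    // Hypothetical helper: returns true and an absolute http(s) Uri when the
    // input parses as a web URL, prepending "http://" if no scheme was typed.
    public static bool TryNormalizeUrl(string input, out Uri result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        string candidate = input.Trim();

        // A scheme is present only if "://" appears before any other '/', '?' or '#'.
        int schemeEnd = candidate.IndexOf("://", StringComparison.Ordinal);
        int firstDelimiter = candidate.IndexOfAny(new[] { '/', '?', '#' });
        bool hasScheme = schemeEnd > 0 && firstDelimiter == schemeEnd + 1;

        if (!hasScheme)
        {
            // Assumption: schemeless input is meant as a plain web address.
            candidate = "http://" + candidate;
        }

        Uri uri;
        if (Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            result = uri;
            return true;
        }

        return false;
    }
}
```

This only checks that the string parses as an http or https URL; it does not check that the host exists, so it is a stricter filter rather than a truly bulletproof one.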

Related

How to get all urls from a webpage belonging to same Domain

Currently I am using this code to do the above:
Uri baseUri = new Uri(url);
Uri myUri = new Uri(baseUri, strRef);
domain = baseUri.Host;
Console.WriteLine(myUri.ToString());
strRef = myUri.ToString();
if (strRef.Contains(domain))
{
//THIS MEANS IT BELONGS TO SAME DOMAIN...
}
But with this code I am having an issue. Suppose the main URL is http://www.xxx.co.uk
Then the above code treats a URL like http://www.news.xxx.co.uk as an external link. Is that correct? If not, does anyone know a better solution?
I think you are on the right track. But to also catch the latter URL (http://www.news.xxx.co.uk/), you could do a quick fix like this:
domain = baseUri.Host.Replace("www.", string.Empty);
Cheers!
vote if helpful.
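Comparing with strRef.Contains(domain) can also match the domain when it only appears in a path or query string, or as part of a longer host. A minimal sketch that compares hosts instead, assuming "same domain" means the base host (without a leading "www.") or any subdomain of it; DomainMatcher and IsSameSite are hypothetical names:

```csharp
using System;

public static class DomainMatcher
{
    // Hypothetical helper: true when candidate's host equals the base host
    // (ignoring a leading "www.") or is a subdomain of it.
    public static bool IsSameSite(Uri baseUri, Uri candidate)
    {
        string root = StripWww(baseUri.Host);
        string host = candidate.Host;

        return host.Equals(root, StringComparison.OrdinalIgnoreCase)
            || host.EndsWith("." + root, StringComparison.OrdinalIgnoreCase);
    }

    private static string StripWww(string host)
    {
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }
}
```

With baseUri set to http://www.xxx.co.uk, this accepts http://www.news.xxx.co.uk/ and rejects hosts such as xxx.co.uk.evil.com, because the match is anchored on a dot boundary at the end of the host.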

C# string parsing containing in a list

I'm trying to create a parsing system in C# to stop my program from fetching images from "banned" websites that are kept in a list. I have tried using a bool with a Regex.Replace operation, but unfortunately it didn't work out.
To elaborate on what I exactly would like, this is an example:
I have a list:
List<string> Bannedsites = new List<string> { "site" };
if(Bannedsites.Contains(input))
{
// Don't go to that site
}
else
{
// Go to that site
}
The problem I mostly run into is this: I have "site" in the list, but if someone enters "site " with a trailing space, it falls through to the else branch because that exact string isn't in the list. The same happens if someone enters "site?", and we know a question mark at the end of a URL usually makes no difference when accessing the site, so they bypass the check again. Is it possible to block the request whenever the input contains "site" anywhere WITHIN the string? Sorry if this is simple, but I haven't been able to figure it out and Google didn't help.
Thanks in advance!
You can use LINQ's .Any to help with that:
if(Bannedsites.Any(x => input.Contains(x))) {
// Don't go to that site
} else {
// Go to that site
}
Remember to use .ToUpperInvariant() on everything to make it case-insensitive.
If you make sure that you only have the domain names (and arguably IP addresses) in the list Bannedsites, then you can look for the domain only.
To get the domain of a Uri, do as follows:
var uri = new Uri("http://stackoverflow.com/questions/11060418/c-sharp-string-parsing-containing-in-a-list");
Console.WriteLine(uri.DnsSafeHost);
The output is:
stackoverflow.com
Now you can get it to work like this (remember to store in upper case in Bannedsites):
var uri = new Uri(input);
if(Bannedsites.Contains(uri.DnsSafeHost.ToUpper(CultureInfo.InvariantCulture)))
{
//Don't go to that site
}
else
{
//Go to that site
}
This will also ensure that the domain didn't appear as a part of another string by chance, for example as part of a parameter.
Also note that this method will give you subdomains, so:
var uri = new Uri("http://msdn.microsoft.com/en-US/");
Console.WriteLine(uri.DnsSafeHost);
returns:
msdn.microsoft.com
and not only:
microsoft.com
You may also verify that the uri is valid with uri.IsWellFormedOriginalString():
var uri = new Uri(input);
if(uri.IsWellFormedOriginalString() && Bannedsites.Contains(uri.DnsSafeHost))
{
//Don't go to that site
}
else
{
//Go to that site
}
Now, let's say that you want to take into account the detail of subdomains, well, you can do this:
var uri = new Uri(input);
if(uri.IsWellFormedOriginalString() && Bannedsites.Any(x => uri.DnsSafeHost.EndsWith(x)))
{
// Don't go to that site
}
else
{
// Go to that site
}
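One caveat with a plain EndsWith is that a banned entry like "example.com" also matches an unrelated host such as "notexample.com". A minimal sketch that walks up the host one label at a time instead, assuming the ban list holds bare domain names; BanList, BannedHosts and IsBanned are hypothetical names, and "example.com" is a placeholder entry:

```csharp
using System;
using System.Collections.Generic;

public static class BanList
{
    // Assumption: the list contains bare host names only. The comparer makes
    // lookups case-insensitive, so entries need not be stored in upper case.
    private static readonly HashSet<string> BannedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "example.com" };

    // Hypothetical helper: true when the input's host, or any parent domain
    // of that host, is in the ban list.
    public static bool IsBanned(string input)
    {
        if (input == null)
        {
            return false;
        }

        Uri uri;
        if (!Uri.TryCreate(input.Trim(), UriKind.Absolute, out uri))
        {
            return false; // not an absolute URL, nothing to fetch
        }

        string host = uri.DnsSafeHost;
        while (true)
        {
            if (BannedHosts.Contains(host))
            {
                return true;
            }

            int dot = host.IndexOf('.');
            if (dot < 0)
            {
                return false;
            }

            // Drop the left-most label: img.example.com -> example.com
            host = host.Substring(dot + 1);
        }
    }
}
```

Because only the parsed host is inspected, a trailing space, a trailing "?" or a banned name inside a query string does not change the result.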
Lastly, if you are banning particular pages rather than whole websites (in which case caring for the subdomains makes no sense), then you can do as follows:
var uri = new Uri(input);
if(uri.IsWellFormedOriginalString() && Bannedsites.Contains((uri.DnsSafeHost + uri.AbsolutePath)))
{
//Don't go to that site
}
else
{
//Go to that site
}
Using AbsolutePath you take care of those "?" and "#" often used to pass parameters, and any other character that doesn't change the requested page.
You may also consider using Uri.Compare and store a list of Uri instead of a list of strings.
I leave you the task of making the comparisons case invariant as RFC 1035 says:
"
For all parts of the DNS that are part of the official protocol, all
comparisons between character strings (e.g., labels, domain names, etc.)
are done in a case-insensitive manner.
"

Correctly replacing a string with another string

I am working on a site that was coded in C# and uses an SSL cert for "secure.mydomain.com".
To switch from http to https it uses the following code:
if (useSsl)
{
if (!String.IsNullOrEmpty(ConfigurationManager.AppSettings["SharedSSL"]))
{
//shared SSL
result = ConfigurationManager.AppSettings["SharedSSL"];
}
else
{
//SSL
result = result.Replace("http:/", "https://");
}
}
This will switch from "http://mydomain.com" to "https://mydomain.com", but I need "https://secure.mydomain.com". If I change the code to result = result.Replace("http:/", "https://secure"); it takes me to an error page because it is trying to go to "https://secure".
I have been searching for 3 weeks to find a solution and tried some of them, but none worked. Any suggestions on how to correct this?
You have missed the extra forward slash in your http:// prefix, and Replace returns a new string, so assign the result:
result = result.Replace("http://", "https://secure.");
Hopefully this will work for you.
As stated by soniic, you've missed a /.
This means your string will look like
https://secure/mydomain.com
That's why you're being redirected to https://secure instead of https://secure.mydomain.com
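As an alternative to string replacement, the scheme and host can be swapped with UriBuilder, which avoids counting slashes by hand. This is a minimal sketch rather than the site's actual code; SecureUrl and ToSecure are hypothetical names, and the secure host is passed in as a parameter:

```csharp
using System;

public static class SecureUrl
{
    // Hypothetical helper: rewrites an absolute URL to https on the given host,
    // keeping the path and query string unchanged.
    public static string ToSecure(string url, string secureHost)
    {
        var builder = new UriBuilder(url)
        {
            Scheme = Uri.UriSchemeHttps,
            Host = secureHost,
            Port = -1 // -1 means "use the default port for the scheme"
        };

        return builder.Uri.AbsoluteUri;
    }
}
```

For example, ToSecure("http://mydomain.com/cart?id=1", "secure.mydomain.com") yields "https://secure.mydomain.com/cart?id=1". Resetting Port matters because the builder otherwise carries over port 80 from the original http URL.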

IIS treats double-encoded forward slashes in URLs differently on the first request than it does on subsequent requests

Recently my team was asked to implement an HttpModule for an ASP.NET MVC application that handled double-encoded URLs on IIS 7 and .NET 3.5. Here's the crux of the problem:
We sometimes get URLs that have double-encoded forward slashes that look like so:
http://www.example.com/%252fbar%5cbaz/foo
There are other formats that we have to handle as well, but they all have one thing in common: a double-encoded forward slash.
To fix this, we wrote an HttpModule that only acts when a URL has a double encoded forward slash, and we redirect it to a sane URL. The details aren't important, but there are two bits that are:
We can't control the fact that these URLs have double-encoded forward slashes
And we have not upgraded to .NET 4.0 yet, nor is it on the immediate horizon.
Here's the problem:
The first request after IIS starts up shows a different URL than the second request does.
If we used the URL from the above example, the first request to IIS would look like:
http://www.example.com/bar/baz/foo
and the second request would look like:
http://www.example.com/%252fbar%5cbaz/foo
This was done by inspecting the Application.Request.Url.AbsolutePath property while debugging.
Here's the smallest code example that should reproduce the problem (create a new MVC application, and register the following HttpModule):
public class ForwardSlashHttpModule : IHttpModule
{
internal IHttpApplication Application { get; set; }
public void Dispose()
{
Application = null;
}
public void Init(HttpApplication context)
{
Initialize(new HttpApplicationAdapter(context));
}
internal void Initialize(IHttpApplication context)
{
Application = context;
context.BeginRequest += context_BeginRequest;
}
internal void context_BeginRequest(object sender, EventArgs e)
{
var url = Application.Request.Url.AbsolutePath; //<-- Problem point
//Do stuff with Url here.
}
}
Then, call the same URL on localhost:
http://www.example.com/%252fbar%5c/foo
NB: Make sure to insert a Debugger.Launch() call before the line in context_BeginRequest so that you'll be able to see it the first time IIS launches
When you execute the first request, you should see:
http://example.com/bar/foo
on subsequent requests, you should see:
http://example.com//bar/foo.
My question is: Is this a bug in IIS? Why does it provide different URLs when calling Application.Request.Url.AbsolutePath the first time, but not for any subsequent request?
Also: It doesn't matter whether the first request is for a double encoded URL or not, the second request will always be handled appropriately by IIS (or at least, as appropriate as handling double-encoded forward slashes can be). It's that very first request that is the problem.
Update
I tried a few different properties to see if one had different values on the first request:
First Request
string u = Application.Request.Url.AbsoluteUri;
"http://example.com/foo/baz/bar/"
string x = Application.Request.Url.OriginalString;
"http://example.com:80/foo/baz/bar"
string y = Application.Request.RawUrl;
"/%2ffo/baz/bar"
bool z = Application.Request.Url.IsWellFormedOriginalString();
true
The only interesting thing is that Application.Request.RawUrl emits a single-encoded forward slash (%2f) and translates the encoded backslash (%5c) to a forward slash (although everything else does that as well).
The RawUrl is still partially encoded on the first request.
Second Request
string u = Application.Request.Url.AbsoluteUri;
"http://example.com//foo/baz/bar"
string x = Application.Request.Url.OriginalString;
"http://example.com:80/%2ffoo/baz/bar"
string y = Application.Request.RawUrl;
"/%2ffoo/baz/bar"
bool z = Application.Request.Url.IsWellFormedOriginalString();
false
Interesting points from the second request:
IsWellFormedOriginalString() is false. On the first request it was true.
The RawUrl is the same (potentially helpful).
The AbsoluteUri is different. On the second request, it has two forward slashes.
Update
Application.Request.ServerVariables["URL"] = /quotes/gc/v12/CMX
Application.Request.ServerVariables["CACHE_URL"] = http://example.com:80/%2ffoo/baz/bar
Open Questions
This seems like a bug in either IIS or .NET. Is it?
This only matters for the very first request made by an application after an iisreset
Besides using RawUrl (as we'd have to worry about a lot of other problems if we parsed the Raw Url instead of using the 'safe' URL provided by .NET), what other methods are there for us to handle this?
Keep in mind, the physical impact of this problem is low: For it to be an actual problem, the first request to the web server from a client would have to be for the above specific URL, and the chances of that happening are relatively low.
Request.Url can be decoded already - I wouldn't trust it for what you are doing.
See the internal details at:
Querystring with url-encoded ampersand prematurely decoded within Request.Url
The solution is to access the values directly via Request.RawUrl.
I realize your problem is with the path, but it seems the same thing is going on. Try RawUrl and see if it works for you instead.
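If RawUrl does turn out to be stable across requests, the module could normalize the path itself instead of relying on Request.Url. A minimal sketch under that assumption; RawUrlNormalizer and NormalizePath are hypothetical names, and the input is the raw path-and-query string as returned by Request.RawUrl:

```csharp
using System;

public static class RawUrlNormalizer
{
    // Hypothetical helper: strips the query string, undoes up to two levels of
    // percent-encoding, and collapses the resulting slashes into a clean path.
    public static string NormalizePath(string rawUrl)
    {
        int queryStart = rawUrl.IndexOf('?');
        string path = queryStart >= 0 ? rawUrl.Substring(0, queryStart) : rawUrl;

        // Decode at most twice so that %252f -> %2f -> /
        for (int i = 0; i < 2; i++)
        {
            string decoded = Uri.UnescapeDataString(path);
            if (decoded == path)
            {
                break;
            }
            path = decoded;
        }

        // Treat backslashes as forward slashes, then collapse duplicates.
        path = path.Replace('\\', '/');
        while (path.Contains("//"))
        {
            path = path.Replace("//", "/");
        }

        return path;
    }
}
```

For the example URL above, NormalizePath("/%252fbar%5cbaz/foo") gives "/bar/baz/foo". Decoding twice is only reasonable when the result is used to build a redirect target; it should not be fed straight into file-system lookups without further validation.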
This really isn't an answer, but possibly a step in the right direction. I haven't had time to create a test harness to prove anything.
I followed this.PrivateAbsolutePath through Reflector and it goes on and on. There is a lot of string manipulation when it's accessed.
public string AbsolutePath
{
get
{
if (this.IsNotAbsoluteUri)
{
throw new InvalidOperationException(SR.GetString("net_uri_NotAbsolute"));
}
string privateAbsolutePath = this.PrivateAbsolutePath; //HERE
if (this.IsDosPath && (privateAbsolutePath[0] == '/'))
{
privateAbsolutePath = privateAbsolutePath.Substring(1);
}
return privateAbsolutePath;
}
}

Truncating Query String & Returning Clean URL C# ASP.net

I would like to take the original URL, truncate the query string parameters, and return a cleaned up version of the URL. I would like it to occur across the whole application, so performing through the global.asax would be ideal. Also, I think a 301 redirect would be in order as well.
For example:
in: www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media
out: www.website.com/default.aspx
What would be the best way to achieve this?
System.Uri is your friend here. This has many helpful utilities on it, but the one you want is GetLeftPart:
string url = "http://www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media";
Uri uri = new Uri(url);
Console.WriteLine(uri.GetLeftPart(UriPartial.Path));
This gives the output: http://www.website.com/default.aspx
[The Uri class does require the protocol, http://, to be specified]
GetLeftPart basically says "get the left part of the uri up to and including the part I specify". This can be Scheme (just the http:// bit), Authority (the www.website.com part), Path (the /default.aspx) or Query (the querystring).
Assuming you are on an aspx web page, you can then use Response.Redirect(newUrl) to redirect the caller.
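To apply this application-wide without redirecting requests that are already clean, the decision can be isolated in a small helper. A minimal sketch; CleanUrl and GetRedirectTarget are hypothetical names, and the caller is assumed to pass in Request.Url:

```csharp
using System;

public static class CleanUrl
{
    // Hypothetical helper: returns the URL without its query string, or null
    // when the request has no query string and no redirect is needed.
    public static string GetRedirectTarget(Uri requestUrl)
    {
        if (string.IsNullOrEmpty(requestUrl.Query))
        {
            return null;
        }

        return requestUrl.GetLeftPart(UriPartial.Path);
    }
}
```

In Application_BeginRequest in global.asax, the result could be checked for null and otherwise sent as the redirect location. Response.RedirectPermanent issues a 301 on .NET 4.0 and later; on earlier versions the status code and Location header would need to be set by hand.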
Here is a simple trick
Dim uri = New Uri(Request.Url.AbsoluteUri)
dim reqURL = uri.GetLeftPart(UriPartial.Path)
Here is a quick way of getting the root path sans the full path and query.
string path = Request.Url.AbsoluteUri.Replace(Request.Url.PathAndQuery,"");
This may look a little better.
string rawUrl = String.Concat(this.GetApplicationUrl(), Request.RawUrl);
if (rawUrl.Contains("/post/"))
{
bool hasQueryStrings = Request.QueryString.Keys.Count > 1;
if (hasQueryStrings)
{
Uri uri = new Uri(rawUrl);
rawUrl = uri.GetLeftPart(UriPartial.Path);
HtmlLink canonical = new HtmlLink();
canonical.Href = rawUrl;
canonical.Attributes["rel"] = "canonical";
Page.Header.Controls.Add(canonical);
}
}
Followed by a function to properly fetch the application URL.
Works perfectly.
I'm guessing that you want to do this because you want your users to see pretty looking URLs. The only way to get the client to "change" the URL in its address bar is to send it to a new location - i.e. you need to redirect them.
Are the query string parameters going to affect the output of your page? If so, you'll have to look at how to maintain state between requests (session variables, cookies, etc.) because your query string parameters will be lost as soon as you redirect to a page without them.
There are a few ways you can do this globally (in order of preference):
If you have direct control over your server environment then a configurable server module like ISAPI_ReWrite or IIS 7.0 URL Rewrite Module is a great approach.
A custom IHttpModule is a nice, reusable roll-your-own approach.
You can also do this in the global.asax as you suggest
You should only use the 301 response code if the resource has indeed moved permanently. Again, this depends on whether your application needs to use the query string parameters. If you use a permanent redirect a browser (that respects the 301 response code) will skip loading a URL like .../default.aspx?utm_source=twitter&utm_medium=social-media and load .../default.aspx - you'll never even know about the query string parameters.
Finally, you can use POST method requests. This gives you clean URLs and lets you pass parameters in, but will only work with <form> elements or requests you create using JavaScript.
Take a look at the UriBuilder class. You can create one with a url string, and the object will then parse this url and let you access just the elements you desire.
After completing whatever processing you need to do on the query string, just split the url on the question mark:
Dim _CleanUrl as String = Request.Url.AbsoluteUri.Split("?")(0)
Response.Redirect(_CleanUrl)
Granted, my solution is in VB.NET, but I'd imagine that it could be ported over pretty easily. And since we are only looking for the first element of the split, it even "fails" gracefully when there is no querystring.
