How to check if a url is valid or not [duplicate]

This question already has answers here:
How to check whether a string is a valid HTTP URL?
(11 answers)
Closed 8 years ago.
I am trying to filter invalid URLs out from valid ones using .NET.
I am using the Uri.TryCreate() method for this.
One of its overloads has the following syntax:
public static bool TryCreate(Uri baseUri,string relativeUri,out Uri result)
This is what I am doing:
Uri uri = null;
var domainList = new List<string>();
domainList.Add("asas");
domainList.Add("www.stackoverflow.com");
domainList.Add("www.codera.org");
domainList.Add("www.joker.testtest");
domainList.Add("about.me");
domainList.Add("www.ma.tt");
var correctList = new List<string>();
foreach (var item in domainList)
{
    if (Uri.TryCreate(item, UriKind.RelativeOrAbsolute, out uri))
    {
        correctList.Add(item);
    }
}
With the above code I expect asas and www.joker.testtest to be removed from the list, but they are not.
Can someone help me out with this?
Update:
I just tried Uri.IsWellFormedUriString; this didn't help either.
Another update:
List of valid URIs
http://www.ggogle.com
www.abc.com
www.aa.org
www.aas.co
www.hhh.net
www.ma.tt
List of invalid URIs
asas
as##SAd
this.not.valid
www.asa.toptoptop

You seem to be confused about what exactly a URL (or URI, the difference is not significant here) is. For example, http://stackoverflow.com is a valid absolute URL. On the other hand, stackoverflow.com is technically a valid relative URL, but it would refer to the file named stackoverflow.com in the current directory, not the website with that name. That said, stackoverflow.com is also a registered domain name.
If you want to check whether a domain name is valid, you need to define what exactly you mean by “valid”:
Is it a valid domain name? Check whether the string consists of parts separated by dots, where each part may contain letters, digits and hyphens (-). For example, asas and this.not.valid are both valid domain names.
Could it be an Internet domain name? Domain names on the Internet (as opposed to an intranet) are specific in that they always have a TLD (top-level domain). So, asas certainly isn't an Internet domain name, but this.not.valid could be.
Is it a domain name under an existing TLD? You can download the list of all TLDs and check against that. For example, this.not.valid wouldn't be considered valid under this rule, but thisisnotvalid.com would.
Is it a registered domain name?
Does the domain name resolve to an IP address? A domain name could be registered, but it still may not have an IP address in its DNS record.
Does the computer the domain name points to respond to requests? The requests that make the most sense are a simple HTTP request (e.g. trying to access http://domaininquestion/) or ping.
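The first three checks are purely syntactic and can be sketched without touching the network. The sketch below is one possible reading of them, not a definitive implementation: the class name DomainChecks, its method names and the knownTlds parameter are made up for illustration, and the label rule (1 to 63 characters, no leading or trailing hyphen) is the usual hostname convention rather than something stated above. The caller is assumed to supply the TLD set, for example loaded from a downloaded TLD list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class DomainChecks
{
    // One label: letters, digits and hyphens, 1 to 63 characters,
    // not starting or ending with a hyphen (assumed convention).
    private static readonly Regex Label =
        new Regex(@"\A[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z");

    // Check 1: is it syntactically a domain name?
    public static bool IsValidDomainName(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > 253) return false;
        return name.Split('.').All(label => Label.IsMatch(label));
    }

    // Check 2: could it be an Internet domain name (has at least one dot)?
    public static bool CouldBeInternetDomain(string name)
    {
        return IsValidDomainName(name) && name.Contains(".");
    }

    // Check 3: is the last label in a caller-supplied set of known TLDs?
    public static bool HasKnownTld(string name, ISet<string> knownTlds)
    {
        if (!CouldBeInternetDomain(name)) return false;
        string tld = name.Substring(name.LastIndexOf('.') + 1);
        return knownTlds.Contains(tld.ToLowerInvariant());
    }
}
```

With this, asas and this.not.valid pass IsValidDomainName, asas fails CouldBeInternetDomain, and this.not.valid fails HasKnownTld for any real TLD list. The remaining checks (registration, DNS resolution, responsiveness) need network access, for example System.Net.Dns.GetHostEntry for the DNS step.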

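Try this one:
public static bool IsWellFormedUriString(
    string uriString,
    UriKind uriKind
)
Or alternatively, you can do this using a regular expression like:
^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
Note that this pattern only accepts http:// URLs whose last label is two or three letters long, so it rejects https:// URLs and longer TLDs such as .info.
Take a look at this list.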

The problem is that none of the URLs you have added here qualify as absolute URLs. For that you have to prefix the URL with its protocol (scheme).
You can test and find out that
www.stackoverflow.com - Relative URL
http://www.stackoverflow.com - Absolute URL
//www.stackoverflow.com - Absolute URL (Strictly speaking, RFC 3986: "Uniform Resource Identifier (URI): Generic Syntax", Section 4.2 calls this a network-path reference: it names a host but takes its scheme from the context.)
The point is that you have to prefix at least // to show that the string names a host rather than a relative path.
So, in a nutshell, since all your URLs are valid relative URLs, they pass all your tests.
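One way to act on this is to ask Uri.TryCreate for absolute URLs only and then check the scheme. The following is a minimal sketch under that assumption; the class and method names (UrlChecks, IsAbsoluteHttpUrl) are invented for the example.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class UrlChecks
{
    // True only for absolute URLs whose scheme is http or https.
    public static bool IsAbsoluteHttpUrl(string candidate)
    {
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

With UriKind.Absolute, a string such as www.stackoverflow.com is rejected because it has no scheme, while http://www.stackoverflow.com is accepted. This still says nothing about whether the host exists.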

All your examples are valid:
some are absolute URLs and some are relative, which is why none of them are removed.
Alternatively, for each Uri you could construct an HttpWebRequest
and then check for a successful response.

After reading the other answers, I understand that you are not looking to check whether the domain exists or responds to a ping; you need to test the names against your grammar, i.e. the syntax of a domain name, right?
For that you need to rely on a regex test, with a proper rule to evaluate each domain name; if a name fails, exclude it from the list.
You can adopt these patterns, modify one to suit your needs, and then test it against every element in the list.

All of your URIs are well-formed URIs, so TryCreate and IsWellFormedUriString will not help in your case.
Based on the answer here, the solution is to try opening the URI (MyClient is the custom client class from that answer, not shown here):
using (var client = new MyClient()) {
    client.HeadOnly = true;
    // fine, no content downloaded
    string s1 = client.DownloadString("http://www.stackoverflow.com");
    // throws, since the host does not exist
    string s2 = client.DownloadString("http://www.joker.testtest");
}

Related

How can I check if a string is url friendly

I'm making an ecommerce application and I want the user to be able to put content at a URL they have specified. If a user were to put in something like "/thank-you!", how can I clean the string so that it is a valid URL, or check that it is in a valid URL format? I would want the URL to basically always be hyphenated between words, so "/thank-you" rather than "/thankyou". What's the best approach for achieving this? I'm working in C# with .NET MVC 4.

Alas, I cannot comment 'possible duplicate' yet (How to check whether a string is a valid HTTP URL?).
As this must be an answer, however: one way to validate a URL string is to use the Uri.TryCreate functionality. See also https://msdn.microsoft.com/en-us/library/system.uri.trycreate(v=vs.110).aspx
Uri is also the preferred data type for URLs, rather than string.
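For the cleaning half of the question, a common approach is to lower-case the input and collapse every run of characters other than letters and digits into a single hyphen. The sketch below assumes that ASCII-only rule; SlugHelper, ToSlug and IsSlug are hypothetical names, not part of .NET or MVC.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class SlugHelper
{
    // Turns arbitrary input such as "/thank-you!" into "/thank-you".
    public static string ToSlug(string input)
    {
        string s = (input ?? string.Empty).Trim().ToLowerInvariant();
        // Replace each run of characters other than a-z and 0-9 with one hyphen.
        s = Regex.Replace(s, "[^a-z0-9]+", "-");
        // Drop hyphens left over at either end and restore the leading slash.
        return "/" + s.Trim('-');
    }

    // Checks that a path is already in the "/word-word" form.
    public static bool IsSlug(string path)
    {
        return Regex.IsMatch(path ?? string.Empty, @"\A/[a-z0-9]+(-[a-z0-9]+)*\z");
    }
}
```

This cannot turn "/thankyou" into "/thank-you", since finding word boundaries in run-together text needs a dictionary; it only normalizes separators that are already present.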

How to check what is wrong in domain URL in C#

I want to correct a domain URL when an invalid one is entered.
Input ------> expected output:
1)http:/localhost:1234/------>http://localhost:1234/
2)http://localhost:1234------>http://localhost:1234/
3)http:localhost:1234/------->http://localhost:1234/
4)http:localhost:1234-------->http://localhost:1234/
5)localhost:1234/------------>http://localhost:1234/
6)localhost:1234------------->http://localhost:1234/
All of the above test cases should also work with HTTPS.
More test cases may be needed.
I have the nopCommerce code that shows a warning, but it only works with the current store.
How can I write code that checks whether an entered domain is valid and returns the corrected domain?

My understanding of the question is that you want to take in a given URL and output a correction. At the very minimum you are looking for the string "localhost:1234". You could use a regular expression to check for the existence of this string. If it is found, output "http://localhost:1234/".
The regular expression is localhost:1234 (written /localhost:1234/g in JavaScript notation) and can be found here: http://regexr.com/3e2n8
To check this regular expression in C#, you would write the following (.NET patterns take no surrounding slashes or g flag):
Regex regex = new Regex(@"localhost:1234");
Match match = regex.Match("http:/localhost:1234/"); // your given string
if (match.Success)
{
    // your given string contains localhost:1234
}
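A more general variant does not hard-code the host: strip whatever malformed scheme prefix is present, then rebuild the URL. The sketch below is an assumption about what the question's six test cases require, covering only host[:port] inputs with an optional http or https prefix; UrlFixer and Normalize are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class UrlFixer
{
    // Rebuilds inputs such as "http:/localhost:1234" as "http://localhost:1234/".
    public static string Normalize(string input)
    {
        string s = (input ?? string.Empty).Trim();
        // Match an http/https prefix followed by a colon and any number of slashes.
        Match m = Regex.Match(s, @"\A(https?):/*", RegexOptions.IgnoreCase);
        // Default to http when no scheme was given.
        string scheme = m.Success ? m.Groups[1].Value.ToLowerInvariant() : "http";
        string rest = m.Success ? s.Substring(m.Length) : s;
        // Remove stray slashes around the host, then add exactly one at the end.
        return scheme + "://" + rest.Trim('/') + "/";
    }
}
```

The result can then be passed to Uri.TryCreate with UriKind.Absolute as a final sanity check.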

In any domain name the following are important:
www..com:
The port number is 80 by default.
Still, to check the name and get an exception when it is invalid, see this question:
Best way to determine if a domain name would be a valid in a "hosts" file?

What is best way to normalize an URI to extract just the domain name?

For example:
http://www.google.co.uk
www.google.co.uk
google.co.uk
will all be converted to:
google.co.uk
I would have liked to use the System.Uri class but this only seems to accept urls with a scheme.

Extracting the domain name is easy
The UriBuilder class normalises URLs and handles many edge cases like a missing scheme. This makes it easy to extract the domain name. For example, these all give you www.google.co.uk:
new UriBuilder("www.google.co.uk").Host
new UriBuilder("http://www.google.co.uk").Host
new UriBuilder("ftp://www.google.co.uk:21/some/path").Host
...but removing www. is hard
The problem seems easy, but it's not. You can't reliably remove subdomains like www because there's no real distinction. The domain is www.google.co.uk, including www. There's nothing special about co.uk that makes google part of the domain and www not part of it — it just happens that co.uk is managed by the registrar, and google.co.uk is managed by Google.
To give you an idea of the problem, here's an incomplete list of domain suffixes which includes nearly 7100 entries so far. Notably, which part is which isn't even consistent:
URL                      The domain you want
---------------------    -------------------
http://www.crews.aero    crews.aero
http://www.crew.aero     www.crew.aero
The best approach would be what Google itself does for Chrome's omnibox: fetch the (incomplete) list of domain suffixes, cache it temporarily, and compare domain names against the list of domain suffixes. You can see the result for yourself: type "crews.aero" in the Chrome omnibox and it will be treated as a URL, or type "crew.aero" and it will be treated as a search.
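The comparison against a suffix list can be sketched as follows: find the longest listed suffix that the host ends with, then keep exactly one more label in front of it. This is a simplified sketch that assumes the caller has already loaded the suffixes into a set and that ignores the wildcard and exception rules a real suffix list may contain; SuffixMatcher and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; names are illustrative only.
public static class SuffixMatcher
{
    // Returns the suffix plus one label, or null if nothing matches.
    public static string GetRegistrableDomain(string host, ISet<string> suffixes)
    {
        string lower = host.ToLowerInvariant();
        // A host that is itself a suffix has no registrable part.
        if (suffixes.Contains(lower)) return null;
        string[] labels = lower.Split('.');
        // Walk from the longest candidate suffix to the shortest.
        for (int i = 1; i < labels.Length; i++)
        {
            string candidate = string.Join(".", labels, i, labels.Length - i);
            if (suffixes.Contains(candidate))
                return string.Join(".", labels, i - 1, labels.Length - i + 1);
        }
        return null;
    }

    // Convenience overload: let UriBuilder extract the host first.
    public static string FromUrl(string url, ISet<string> suffixes)
    {
        return GetRegistrableDomain(new UriBuilder(url).Host, suffixes);
    }
}
```

With a set containing aero, crew.aero, uk and co.uk, this reproduces the table above: www.crews.aero gives crews.aero, while www.crew.aero is returned unchanged.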

Try this code:
var url = "www.google.co.uk"; // your input URL
if (!url.Contains("://"))
{
    url = "http://" + url; // add a scheme so that Uri can parse it
}
var result = new Uri(url).Host;

How to handle paths to files with extra parameters in C#?

I'm downloading files from the Internet inside my application. I'm dealing with multiple file types, so I need to be able to detect what type a file is before my application can continue. The problem I ran into is that some of the URLs the files are downloaded from contain extra parameters.
For example:
http://www.myfaketestsite.com/myaudio.mp3?id=20
Originally I was using String.EndsWith(). Obviously this doesn't work anymore. Any idea on how to detect the file type?

Wrap the URL in a Uri class. It will split it up into different segments that you can use, or you can use the helper methods on the Uri class itself:
var uri = new Uri("http://www.myfaketestsite.com/myaudio.mp3?id=20");
string path = uri.GetLeftPart(UriPartial.Path);
// path = "http://www.myfaketestsite.com/myaudio.mp3"
Your question is a duplicate of:
Truncating Query String & Returning Clean URL C# ASP.net
Get url without querystring
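To get from the cleaned URL to the file type itself, one option is to hand the Uri's path to System.IO.Path.GetExtension. A minimal sketch, assuming the type can be inferred from the extension in the path; FileTypeHelper and GetExtension are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class FileTypeHelper
{
    // Returns the lower-cased extension (including the dot), or "" if none.
    public static string GetExtension(string url)
    {
        var uri = new Uri(url);
        // AbsolutePath excludes the query string and fragment.
        return Path.GetExtension(uri.AbsolutePath).ToLowerInvariant();
    }
}
```

If the server does not put a meaningful extension in the path, the Content-Type response header is the more reliable signal.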

You could always split on the question mark to eliminate the parameters. e.g.
string s = "http://www.myfaketestsite.com/myaudio.mp3?id=20";
string withoutQueryString = s.Split('?')[0];
If no question mark exists, it won't matter, as you'll still be grabbing the value from the zero index. You can then do your logic on the withoutQueryString string.

How do I use a pattern Url to extract a segment from an actual Url?

If I have a series of "pattern" Urls of the form:
http://{username}.sitename.com/
http://{username}.othersite.net/
http://mysite.com/{username}
and I have an actual Url of the form:
http://joesmith.sitename.com/
Is there any way that I can match a pattern Url and in turn use it to extract the username portion out of the actual Url? I've thought of nasty ways to do it, but it just seems like there should be a more intuitive way to accomplish this.
ASP.NET MVC uses a similar approach to extract the various segments of the URL when it is building its routes. Given the example:
{controller}/{action}
So given the Url of the form, Home/Index, it knows that it is the Home controller calling the Index action method.

Not sure I understand this question correctly but you can just use a regular expression to match anything between 'http://' and the first dot.

A very simple regex will do:
https?://([a-z0-9\.-]*[a-z0-9])\.sitename\.com
This will allow any subdomain that only contains valid subdomain characters. Example of allowed subdomains:
joesmith.sitename.com
joe.smith.sitename.com
joe-smith.sitename.com
a-very-long-subdomain.sitename.com
As you can see, you might want to complicate the regex slightly. For instance, you could limit the number of characters allowed in the subdomain.

It seems that the quickest and easiest solution is to build on Machine's answer.
var givenUri = "http://joesmith.sitename.com/";
var patternUri = "http://{username}.sitename.com/";
patternUri = patternUri.Replace("{username}", @"([a-z0-9\.-]*[a-z0-9])");
var result = Regex.Match(givenUri, patternUri, RegexOptions.IgnoreCase).Groups;
if (!String.IsNullOrEmpty(result[1].Value))
    return result[1].Value;
Seems to work great.

Well, this "pattern URL" is a format you've made up, right? Basically, you'll just need to process it yourself.
If its format is:
anything inside "{ }" is a thing to capture, everything else must match as is
Then you'd just find the start and end index of each pair of braces and match everything else literally. When you reach a placeholder, capture characters only until you hit whatever literal text comes after its closing '}'.
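One way to implement that processing without manual index bookkeeping is to convert each pattern URL into a regular expression: escape the literal text and turn every {name} placeholder into a named capture group. The sketch below assumes placeholders stand for a single run of letters, digits and hyphens (so joe.smith would not match); UrlPattern, ToRegex and Extract are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class UrlPattern
{
    // Turns "http://{username}.sitename.com/" into a regex with a named group.
    public static Regex ToRegex(string pattern)
    {
        // Escape everything so that literal text must match as is.
        // Regex.Escape turns "{" into "\{" and leaves "}" untouched.
        string escaped = Regex.Escape(pattern);
        // Replace each escaped placeholder with a named capture group.
        string body = Regex.Replace(
            escaped,
            @"\\\{(\w+)\}",
            m => "(?<" + m.Groups[1].Value + ">[a-z0-9-]+)");
        return new Regex(@"\A" + body + @"\z", RegexOptions.IgnoreCase);
    }

    // Returns the value of the token from the first matching pattern, or null.
    public static string Extract(string url, string token, params string[] patterns)
    {
        foreach (string pattern in patterns)
        {
            Match match = ToRegex(pattern).Match(url);
            if (match.Success) return match.Groups[token].Value;
        }
        return null;
    }
}
```

Calling Extract with the three pattern URLs from the question and the token name username returns the captured segment from whichever pattern matches first.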

There are definitely different ways - ultimately though your server must be configured to handle (and possibly route) these different subdomain requests.
What I would do would be to answer all subdomain requests (except maybe some reserved words, like 'www', 'mail', etc.) on sitename.com with a single handler or page (I'm assuming ASP.NET here based on your C# tag).
I'd use the request path, which is easy enough to get, with some simple string parsing/regex routines (remove the 'http://', grab the first token up until '.' or '/' or '\', etc.) and then use that in a session, making sure to observe URL changes.
Alternately, you could map certain virtual paths to request urls ('joesmith.sitename.com' => 'sitename.com/index.aspx?username=joesmith') via IIS but that's kind of nasty too.
Hope this helps!
