I have, for example, the following URL:
http://www.beta.microsoft.com/path/page.htm
and I need to retrieve the name from it, which in this case is:
microsoft
I need to get the name of the website, without the subdomain, www, the .com extension and so on: only the name.
How do I get it in the fastest and most convenient way?
Din.
It sounds like you mean the domain name:
new Uri(url).Host
You could make an array of all the domain extensions, strip the matching one from the end of the host (replacing it with String.Empty), and then pick the last item from Split('.'). This will give you what you want most of the time; otherwise it is not possible to know which part is the right one.
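A minimal sketch of that extension-list idea, assuming a short hypothetical list of extensions (a real one would be much longer). The class and method names are illustrative only. It strips the extension from the end of the host rather than using string.Replace, which would also remove ".com" from the middle of a host like "www.command.com".

```csharp
using System;
using System.Linq;

public static class SiteNameExtractor
{
    // Hypothetical, deliberately short list of extensions; a real one would be far longer.
    // Compound extensions come before their shorter endings (".co.uk" before ".uk").
    private static readonly string[] KnownExtensions =
        { ".com.ar", ".co.uk", ".com", ".net", ".org", ".uk", ".ar" };

    public static string GetSiteName(string url)
    {
        string host = new Uri(url).Host;   // e.g. "www.beta.microsoft.com"

        // Remove the extension only from the END of the host. A plain string.Replace
        // would also hit ".com" inside a host such as "www.command.com".
        string ext = KnownExtensions.FirstOrDefault(
            e => host.EndsWith(e, StringComparison.OrdinalIgnoreCase));
        if (ext != null)
            host = host.Substring(0, host.Length - ext.Length);   // "www.beta.microsoft"

        // What is left ends with the name we want.
        string[] labels = host.Split('.');
        return labels[labels.Length - 1];                          // "microsoft"
    }
}
```

For the URL in the question, GetSiteName("http://www.beta.microsoft.com/path/page.htm") returns "microsoft". Any host whose extension is missing from the list will still come out wrong, which is the limitation mentioned above.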
UPDATE:
This code does what I wanted, but I'm guessing there is a better way to do this, maybe a regex or something along those lines.
http://pastebin.com/SVkiJ1Vq
Related
I am totally new to Regex and have been trying to do this with little success.
Basically what I want to do is to create a regex that matches any google domain such as Google.com, Google.co.uk, etc.
So far I have ^http://www.google\.com/.*$, but this only matches Google.com. How can I modify it to allow any extension besides com?
Thanks!
You could use alternation, but then you would have to supply all TLDs you want to allow:
^http://www\.google\.(?:com|co\.uk|de|es)/.*$
Add more options separated by pipes. Alternatively, you could allow any TLD (whether valid or not) with this:
^http://www\.google\.[a-z.]+/.*$
However, this would also match something like http://www.google.myowndomain.com/. I don't think there is any approach that allows only valid domains without listing them all.
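To make the trade-off concrete, here is a small sketch that runs both patterns from this answer through .NET's Regex.IsMatch. The class and member names are made up for the example; the patterns are copied unchanged.

```csharp
using System.Text.RegularExpressions;

public static class GooglePatterns
{
    // Only the listed extensions are accepted.
    public const string Listed = @"^http://www\.google\.(?:com|co\.uk|de|es)/.*$";

    // Any run of lowercase letters and dots is accepted as the extension, valid or not.
    public const string AnyExtension = @"^http://www\.google\.[a-z.]+/.*$";

    public static bool Matches(string pattern, string url) => Regex.IsMatch(url, pattern);
}
```

With these, "http://www.google.co.uk/search?q=x" matches Listed, "http://www.google.fr/" only matches AnyExtension, and "http://www.google.myowndomain.com/" is rejected by Listed but accepted by AnyExtension.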
By the way, if you want to make that slash and the path/query at the end optional, change that to one of the following:
^http://www\.google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://www\.google\.[a-z.]+(?:/.*)?$
And then you could go another step further and make the www. optional:
^http://(?:www\.)?google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://(?:www\.)?google\.[a-z.]+(?:/.*)?$
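A sketch of the most relaxed listed-extension variant wrapped in a reusable check (the names are hypothetical). Because the extension alternation must be followed by either "/" or the end of the string, a host that merely starts with a Google domain is still rejected.

```csharp
using System.Text.RegularExpressions;

public static class GoogleUrlCheck
{
    // "www." and the trailing "/path?query" are both optional here.
    private static readonly Regex Listed =
        new Regex(@"^http://(?:www\.)?google\.(?:com|co\.uk|de|es)(?:/.*)?$");

    public static bool IsGoogleUrl(string url) => Listed.IsMatch(url);
}
```

For example, "http://google.de" and "http://www.google.es/maps?q=1" pass, while "http://www.google.com.evil.example/" does not.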
You see, matching all possible valid URLs for a given problem is not an easy task; it needs careful consideration ;).
Depending on the language you are using, there might be better options with built-in URL-parsing functions. In PHP, for instance, this would be a much easier approach:
$domain = parse_url($urlStr, PHP_URL_HOST);
$isGoogle = preg_match('/^(?:www\.)?google\.[a-z.]+/', $domain);
Or (since this is not perfect anyway, as outlined above) you could abandon regex altogether and do the check like this:
$isGoogle = strpos($domain, 'google.') !== false;
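For comparison, a sketch of the same parse-first idea in C#, where System.Uri plays the role of parse_url. The class and method names are hypothetical; the regex is the one from the PHP snippet.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GoogleHostCheck
{
    // Parse the URL first, then test only the host (like parse_url(..., PHP_URL_HOST)).
    public static bool IsGoogleHost(string url)
    {
        string host = new Uri(url).Host;
        return Regex.IsMatch(host, @"^(?:www\.)?google\.[a-z.]+");
    }

    // Looser variant, like the strpos() check: any host containing "google.".
    public static bool ContainsGoogle(string url) =>
        new Uri(url).Host.Contains("google.");
}
```

Note that IsGoogleHost rejects "http://maps.google.com/" (the host starts with "maps."), while ContainsGoogle accepts it.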
I get a URL such as
http://orders.mealsandyou.com/default.php
I don't want to use string functions on it to get the main domain, i.e.
mealsandyou.com
Is there any function in C# to do that? Uri.Authority and the like give the subdomain too...
Suggestions welcome, not workarounds
.NET doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD, the very last bit of the domain string, e.g. .com, .net, .uk, etc. The position of everything else depends on the particular TLD (so you can't assume the next-to-last part is the "domain name": for .co.uk it would be co).
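A small sketch of the naive "last two labels" approach the paragraph above warns against (the names are hypothetical). It works for the question's URL but returns the public suffix itself for a .co.uk host.

```csharp
using System;

public static class NaiveDomain
{
    // ASSUMPTION being demonstrated: the "domain name" is always the last two labels.
    public static string LastTwoLabels(string url)
    {
        string[] labels = new Uri(url).Host.Split('.');
        if (labels.Length < 2) return string.Join(".", labels);
        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }
}
```

LastTwoLabels("http://orders.mealsandyou.com/default.php") gives "mealsandyou.com", but LastTwoLabels("http://orders.example.co.uk/") gives "co.uk". A common way around this is to check the host against a list of public suffixes (such as Mozilla's Public Suffix List) instead of counting labels.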
In any case, I think you're taking the wrong approach. URL rewriting is far better suited to this sort of thing. Have a read of this: learn.iis.net/page.aspx/460/using-the-url-rewrite-module
I have the following so far:
^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
Been testing against these:
https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
http://www.foo.com
http://www.foo.com/
http://blog.foo.com/
http://blog.foo.com.ar/
http://foo.com
http://blog.foo.com
http://foo.com.ar
I'm using the following tool to test the regexes: regex tester
So far I've been able to yield the following groups:
1. full protocol
2. reduced protocol
3. full domain name
4. subdomain?
5. top level domain
6. port
7. port number
8. rest of the url
9. rest of the "directory"
10. no idea how to drop this group
11. page name
12. argument string
13. argument string
14. hash tag
15. hash tag
I will be using this regex to change the subdomain for my application for cross-domain redirect hyperlinks.
Using Request.Url as a parameter, I want to redirect from
http://example.com or http://www.example.com to http://blog.example.com
How can I achieve this?
I can't really tell what, if any, the current subdomain (either nothing, www, blog, or forum, for instance) actually is...
What I actually need is some way to find out what the main domain is: from http://www.example.com, http://blog.example.com, or http://example.com, I want to get example.com.
What would be the best way to make this replacement?
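To see what the question's regex actually captures, here is a sketch that runs it unchanged in .NET and returns groups 3, 4 and 5 (full domain, "subdomain?", "top level domain"). The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class UrlRegexGroups
{
    // The pattern from the question, unchanged (verbatim string, so backslashes are literal).
    public const string Pattern =
        @"^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$";

    // Returns groups 3, 4 and 5, or null if the URL does not match at all.
    public static string[] DomainGroups(string url)
    {
        Match m = Regex.Match(url, Pattern);
        if (!m.Success) return null;
        return new[] { m.Groups[3].Value, m.Groups[4].Value, m.Groups[5].Value };
    }
}
```

For "http://www.foo.com" this gives "www.foo.com", "www", "foo.com", but for "http://foo.com" it gives "foo.com", "foo", "com". Group 4 is simply the first label, so it cannot say whether a subdomain is present. If I read the pattern correctly, the trailing-slash test URLs such as "http://www.foo.com/" do not match at all, because the page-name group needs at least two characters after the last slash. A group you don't need (like group 10) can be made non-capturing with (?:...), at the cost of renumbering the groups after it.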
This may not be the answer you're looking for... but IMO the best way would be to make use of the System.Uri class.
The Uri class will easily extract the Host for you, and you can then split the host on the "." delimiter; that should give you easy access to the current subdomain.
This is just my opinion, and it's formed especially because I find it hard to maintain regex code like ^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
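A minimal sketch of the split-the-host suggestion applied to the redirect in the question. It assumes, hypothetically, that the site's own domain is always the last two labels of the host (true for example.com, wrong for something like example.co.uk). UriBuilder keeps the scheme, port, path and query while the host is swapped.

```csharp
using System;

public static class SubdomainRedirect
{
    // Builds the same URL on "blog.<domain>".
    // ASSUMPTION: the domain is the last two labels of the host.
    public static Uri ToBlogSubdomain(Uri requestUrl)
    {
        string[] labels = requestUrl.Host.Split('.');
        string baseDomain = labels.Length >= 2
            ? labels[labels.Length - 2] + "." + labels[labels.Length - 1]
            : requestUrl.Host;

        var builder = new UriBuilder(requestUrl) { Host = "blog." + baseDomain };
        return builder.Uri;
    }
}
```

In an ASP.NET page, where Request.Url is a Uri, this could be used as Response.Redirect(SubdomainRedirect.ToBlogSubdomain(Request.Url).AbsoluteUri). You would want to skip the redirect when the host already starts with "blog.", to avoid a loop.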
You can use the Uri class to parse the strings. There are many properties available in addition to Segments:
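// ASP.NET page code: write out each path segment of the URL.
Uri myUri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");
foreach (string segment in myUri.Segments)   // "/", "dir/", "1/", "2/", "search.html"
    Response.Write(segment + "<br />");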
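A few of the other properties, collected in a small console-friendly helper (the class and method names are hypothetical). For the URL above they give Scheme "https", Host "www.google.com.ar", Port 8080, AbsolutePath "/dir/1/2/search.html", Query "?arg=0-a&arg1=1-b&arg3-c" and Fragment "#hash".

```csharp
using System;
using System.Collections.Generic;

public static class UriParts
{
    // Collects some of the properties System.Uri exposes besides Segments.
    public static Dictionary<string, string> Describe(string url)
    {
        var uri = new Uri(url);
        return new Dictionary<string, string>
        {
            ["Scheme"] = uri.Scheme,
            ["Host"] = uri.Host,
            ["Port"] = uri.Port.ToString(),
            ["AbsolutePath"] = uri.AbsolutePath,
            ["Query"] = uri.Query,
            ["Fragment"] = uri.Fragment,
        };
    }
}
```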
I think you should reconsider whether a regex is really needed in this case.
I think extracting the main domain from a URL is quite simple: in the case of "http://www.example.com/?blah=111" you can take the part before the third slash, perform a String.Split('.') and concatenate the last two array items with a dot. In the case of "http://www.example.com" it is even easier, since there is no third slash.
Regex patterns are very error-prone and quite hard to maintain, and in my opinion you won't gain any advantage from one here. I recommend getting rid of the regex. The result may be 2-3 more lines of code, but it will work, and your code will be much more readable and easier to understand.
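Those few extra lines might look like this: a sketch that follows the string-only recipe literally, without System.Uri. The names are hypothetical, and it assumes the URL contains "//", has no port, and that the domain is the last two labels (so it shares the .co.uk caveat).

```csharp
public static class MainDomainByString
{
    public static string MainDomain(string url)
    {
        // "http://www.example.com/?blah=111".Split('/') gives
        // ["http:", "", "www.example.com", "?blah=111"]; index 2 is the host part.
        string hostPart = url.Split('/')[2];

        // Join the last two dot-separated items.
        string[] parts = hostPart.Split('.');
        if (parts.Length < 2) return hostPart;
        return parts[parts.Length - 2] + "." + parts[parts.Length - 1];
    }
}
```

Both "http://www.example.com/?blah=111" and "http://www.example.com" come out as "example.com".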
If I have a series of "pattern" URLs of the form:
http://{username}.sitename.com/
http://{username}.othersite.net/
http://mysite.com/{username}
and I have an actual URL of the form:
http://joesmith.sitename.com/
Is there any way that I can match a pattern URL and in turn use it to extract the username portion out of the actual URL? I've thought of nasty ways to do it, but it seems like there should be a more intuitive way to accomplish this.
ASP.NET MVC uses a similar approach to extract the various segments of the URL when it is building its routes. Given the example:
{controller}/{action}
So, given a URL of the form Home/Index, it knows that it should call the Index action method on the Home controller.
I'm not sure I understand this question correctly, but you can just use a regular expression to match anything between 'http://' and the first dot.
A very simple regex will do:
https?://([a-z0-9\.-]*[a-z0-9])\.sitename\.com
This will allow any subdomain that only contains valid subdomain characters. Examples of allowed subdomains:
joesmith.sitename.com
joe.smith.sitename.com
joe-smith.sitename.com
a-very-long-subdomain.sitename.com
As you can see, you might want to complicate the regex slightly. For instance, you could limit it to only allow a certain number of characters in the subdomain.
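A sketch of that length limit applied in C#: the same character classes, anchored at the start, with the captured subdomain capped at 63 characters. The limit of 63 is only an example value, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SitenameSubdomain
{
    // Same idea as the pattern above, with the subdomain limited to 1..63 characters.
    private static readonly Regex Pattern = new Regex(
        @"^https?://([a-z0-9.-]{0,62}[a-z0-9])\.sitename\.com",
        RegexOptions.IgnoreCase);

    // Returns the subdomain, or null when the URL does not match.
    public static string GetSubdomain(string url)
    {
        Match m = Pattern.Match(url);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

All four example subdomains above are still accepted ("joe.smith" is captured for joe.smith.sitename.com), while a 70-character subdomain is rejected.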
It seems the quickest and easiest solution is to build on Machine's answer.
// requires: using System.Text.RegularExpressions;
var givenUri = "http://joesmith.sitename.com/";
var patternUri = "http://{username}.sitename.com/";
// Swap the placeholder for a capture group (verbatim string, balanced parentheses).
patternUri = patternUri.Replace("{username}", @"([a-z0-9\.-]*[a-z0-9])");
var result = Regex.Match(givenUri, patternUri, RegexOptions.IgnoreCase).Groups;
if (!String.IsNullOrEmpty(result[1].Value))
    return result[1].Value;   // "joesmith"
Seems to work great.
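A slightly more general sketch of the same replace-the-placeholder idea that handles the whole series of pattern URLs from the question. Regex.Escape is applied first so the literal dots only match dots; it turns "{" into "\{" (but leaves "}" alone), which is why the escaped placeholder is "\{username}". The class name, method name and the username character class are assumptions.

```csharp
using System.Text.RegularExpressions;

public static class UrlTemplateMatcher
{
    // Turns a pattern URL such as "http://{username}.sitename.com/" into an anchored regex.
    private static Regex ToRegex(string template)
    {
        string escaped = Regex.Escape(template);   // "." -> "\.", "{" -> "\{"
        string pattern = escaped.Replace(@"\{username}", @"([a-z0-9.-]*[a-z0-9])");
        return new Regex("^" + pattern + "$", RegexOptions.IgnoreCase);
    }

    // Tries each template in order; returns the captured username or null.
    public static string ExtractUsername(string url, params string[] templates)
    {
        foreach (string template in templates)
        {
            Match m = ToRegex(template).Match(url);
            if (m.Success) return m.Groups[1].Value;
        }
        return null;
    }
}
```

Called with the three templates from the question, both "http://joesmith.sitename.com/" and "http://mysite.com/joesmith" yield "joesmith".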
Well, this "pattern URL" is a format you've made up, right? Basically, you'll just need to process it.
If its format is:
anything inside "{ }" is a thing to capture, everything else must be as is
Then you'd just find the start and end index of those braces and match everything else literally. When you get to a place where a brace pair is, capture only the characters up to whatever literal text comes after its closing '}'.
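The scanning approach above can be sketched without any regex. This is one possible reading of it, with hypothetical names: literal text is compared case-insensitively, and each {token} captures up to the first occurrence of the literal text that follows it (or to the end of the URL).

```csharp
using System;
using System.Collections.Generic;

public static class TemplateScanner
{
    // Assumes a well-formed template (every '{' has a matching '}').
    // Returns the captured values, or null if the URL does not fit.
    public static Dictionary<string, string> MatchTemplate(string template, string url)
    {
        var values = new Dictionary<string, string>();
        int t = 0, u = 0;   // current positions in template and url

        while (t < template.Length)
        {
            // Literal text up to the next '{' (or the end) must match exactly.
            int open = template.IndexOf('{', t);
            string literal = open < 0 ? template.Substring(t) : template.Substring(t, open - t);
            if (url.Length - u < literal.Length ||
                !string.Equals(url.Substring(u, literal.Length), literal, StringComparison.OrdinalIgnoreCase))
                return null;
            u += literal.Length;
            if (open < 0) break;

            // Read the token name between the braces.
            int close = template.IndexOf('}', open);
            string name = template.Substring(open + 1, close - open - 1);
            t = close + 1;

            // The value stops where the following literal text begins.
            int nextOpen = template.IndexOf('{', t);
            string next = nextOpen < 0 ? template.Substring(t) : template.Substring(t, nextOpen - t);
            int end = next.Length == 0
                ? url.Length
                : url.IndexOf(next, u, StringComparison.OrdinalIgnoreCase);
            if (end <= u) return null;   // literal not found, or empty value
            values[name] = url.Substring(u, end - u);
            u = end;
        }

        // The whole URL must have been consumed.
        return u == url.Length ? values : null;
    }
}
```

For example, MatchTemplate("http://{username}.sitename.com/", "http://joesmith.sitename.com/") returns a dictionary with username = "joesmith", and a URL on another site returns null.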
There are definitely different ways; ultimately, though, your server must be configured to handle (and possibly route) these different subdomain requests.
What I would do would be to answer all subdomain requests (except maybe some reserved words, like 'www', 'mail', etc.) on sitename.com with a single handler or page (I'm assuming ASP.NET here based on your C# tag).
I'd use the request URL, which is easy enough to get, with some simple string parsing/regex routines (remove the 'http://', grab the first token up until '.' or '/' or '\', etc.) and then use that in a session, making sure to observe URL changes.
Alternatively, you could map certain virtual paths to request URLs ('joesmith.sitename.com' => 'sitename.com/index.aspx?username=joesmith') via IIS, but that's kind of nasty too.
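A sketch of that parsing step in ASP.NET terms, where Request.Url is already a System.Uri, so the "remove the 'http://'" part is done by Uri.Host. The reserved-word list, class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SubdomainUser
{
    // Hypothetical reserved names that are never treated as usernames.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "www", "mail" };

    // Returns the first label of the host (the text up to the first '.'),
    // or null when the host is not a subdomain of siteDomain or the label is reserved.
    public static string GetUsername(Uri requestUrl, string siteDomain)
    {
        string host = requestUrl.Host;   // Uri has already removed "http://", port and path
        if (!host.EndsWith("." + siteDomain, StringComparison.OrdinalIgnoreCase))
            return null;                 // e.g. the bare "sitename.com" or another site

        string first = host.Split('.')[0];
        return Reserved.Contains(first) ? null : first;
    }
}
```

GetUsername(Request.Url, "sitename.com") would then give "joesmith" for http://joesmith.sitename.com/ and null for www.sitename.com or sitename.com itself.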
Hope this helps!
I am using dasBlog for my blog, and one of my categories is C#. When I click on this tag, it takes me to /CategoryView.aspx?category=C, thus removing the hash. Does anyone know a quick way of either:
fixing this and keeping C# as the tag OR
locating the data store for the categories and changing to "CSharp"
Thanks in advance
Andrew
There is no way to make dasBlog use exactly the "/CategoryView.aspx?category=C#" URL, because '#' is a special character in links: it starts the fragment. You could also do what Stack Overflow does (using "c%23" instead of "c#"), but I think you'll need to change some code in dasBlog to achieve this.
You can also rename this category using the category renamer.
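A sketch of the percent-encoding side of that fix in C#. It only shows how such a link could be built; dasBlog itself would still have to generate and accept it. The class and method names are hypothetical.

```csharp
using System;

public static class CategoryLink
{
    // Percent-encodes the category so "#" travels as "%23" instead of starting a fragment.
    public static string Build(string category) =>
        "/CategoryView.aspx?category=" + Uri.EscapeDataString(category);
}
```

Build("C#") gives "/CategoryView.aspx?category=C%23". With the raw '#', new Uri("http://example.com/CategoryView.aspx?category=C#").Query is only "?category=C", which is exactly the truncation described in the question (example.com is a placeholder host).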