I hope you can help me with this one.
Is it possible to have a URL like this: http://example.com/xxxyyy
When users access the above link, I'd like to extract the xxxyyy part of the URL for further use.
I'd like to do this WITHOUT subdomains, as I don't know how many different 'xxxyyy's I'll have to accept (e.g. http://example.com/europe, http://example.com/spam and so on).
Regards,
Morten
It depends on exactly what you're trying to do, but you should find what you need here: Making Sense of ASP.NET Paths
For example:
string path = Request.ApplicationPath;
Check the documentation here.
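As a minimal sketch of pulling the xxxyyy part out of such a URL, the System.Uri class can be used directly. The class and method names below (PathExample, FirstSegment) are made up for illustration; in an ASP.NET page the Uri would presumably come from Request.Url, and the server still has to be set up (routing or URL rewriting) so that such requests reach your code at all.

```csharp
using System;

public static class PathExample
{
    // Returns the first path segment of a URL, or an empty string if there is none.
    // For http://example.com/xxxyyy the AbsolutePath is "/xxxyyy".
    public static string FirstSegment(Uri url)
    {
        string[] parts = url.AbsolutePath.Split(
            new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 0 ? parts[0] : string.Empty;
    }
}
```

For example, PathExample.FirstSegment(new Uri("http://example.com/europe")) gives "europe".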
I'm making an ecommerce application and I want users to be able to put content at a URL they specify. If a user enters something like "/thank-you!", how can I clean the string so that it is a valid URL, or check that it is in a valid URL format? I would want the URL to always be hyphenated between words, so "/thank-you" rather than "/thankyou". What's the best approach for achieving this? I'm using C# with ASP.NET MVC 4.
Alas, I cannot comment 'possible duplicate' yet (How to check whether a string is a valid HTTP URL?).
As this must be an answer, however, one way to validate a URL string is the Uri.TryCreate method. See also https://msdn.microsoft.com/en-us/library/system.uri.trycreate(v=vs.110).aspx
Uri is also the preferred data type for URLs, rather than string.
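A rough sketch of both halves of the question, under some assumptions: IsHttpUrl uses Uri.TryCreate plus a scheme check (without the scheme check, other absolute URI kinds such as file paths can also pass), and Slugify is a hypothetical helper that lowercases the input and collapses anything other than a-z and 0-9 into single hyphens. Neither name comes from the framework, and Slugify cannot guess word boundaries, so "/thankyou" stays "thankyou".

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCleaning
{
    // True only for absolute http/https URLs.
    public static bool IsHttpUrl(string candidate)
    {
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Hypothetical slug helper: "/thank-you!" becomes "thank-you".
    public static string Slugify(string input)
    {
        string lowered = input.ToLowerInvariant();
        // Replace every run of characters outside a-z / 0-9 with one hyphen.
        string hyphenated = Regex.Replace(lowered, "[^a-z0-9]+", "-");
        return hyphenated.Trim('-');
    }
}
```

The slug can then be appended to a known site root and the result checked with IsHttpUrl before it is stored.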
I'm writing some kind of page scraper, and one of the things I'm looking to do is combine the current URL with a URL fragment extracted from the current page.
Like this:
if (WebPath.IsAbsolute(urlFragment))
    links.Add(new Uri(urlFragment));
else
    links.Add(new Uri(currentUrl, urlFragment));
Easy peasy - this approach works most of the time, for both relative and absolute Uris.
However, some pages look like http://example.com/couple/of/folders/, with the url fragment couple/of/otherfolders/. And every single browser out there interprets that as http://example.com/couple/of/otherfolders.
Of course, my code yields http://example.com/couple/of/folders/couple/of/otherfolders. Which totally looks correct from the Uri's point of view - but I don't get how a browser can interpret this otherwise.
Now, I've searched for a solution to this problem, but I only found people who didn't know how to combine two urls, so that didn't get me very far. Closest thing I found was this question: How do you combine URL fragments in Java the same way browsers do? , but the answer doesn't tackle my particular problem.
Does anybody know what I'm missing?
Edit - this is the IsAbsolute method (I know I should replace it with new Uri(link).IsAbsoluteUri):
public static bool IsAbsolute(string path)
{
    // Case-insensitive scheme check without allocating an upper-cased copy.
    return path.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
        || path.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
}
Normally, browsers wouldn’t do that. But when there’s a <base> element, its href replaces the current page’s URL for the page’s URL-resolving purposes.
Check for a <base> and use it in place of currentUrl if it exists.
Also, thanks for reminding me to fix all my scrapers!
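A minimal sketch of the <base> handling described above, assuming the scraper has already pulled the href attribute out of the page (that extraction is not shown). LinkResolver and Resolve are illustrative names; the only framework call relied on is the Uri(Uri, string) constructor, which also accepts an absolute string as its second argument, so no separate IsAbsolute check is needed.

```csharp
using System;

public static class LinkResolver
{
    // pageUrl:  URL the page was fetched from.
    // baseHref: value of <base href="...">, or null/empty if the page has none.
    // link:     href found in the page (relative or absolute).
    public static Uri Resolve(Uri pageUrl, string baseHref, string link)
    {
        // A <base> href may itself be relative, so resolve it against the page URL first.
        Uri baseUri = string.IsNullOrEmpty(baseHref)
            ? pageUrl
            : new Uri(pageUrl, baseHref);
        return new Uri(baseUri, link);
    }
}
```

With pageUrl http://example.com/couple/of/folders/, baseHref http://example.com/ and link couple/of/otherfolders/, this resolves to http://example.com/couple/of/otherfolders/, which matches what the browsers do.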
I have, for example, the following URL:
http://www.beta.microsoft.com/path/page.htm
and I need to retrieve the name from it, which in this case is:
microsoft
I need to get the name of the website - without the sub-domain, www, .com extension and other stuff - only the name.
How do I get it in the fastest and most convenient way?
Din.
It sounds like you mean the domain name:
new Uri(string).Host
You could make an array of all the domain extensions, replace each with String.Empty to remove it, and then pick the last item from Split('.'). This will give you what you want most of the time. Otherwise it is not possible to know which part is the right one.
UPDATE:
This code does what you wanted, but I'm guessing there is a better way to do this, maybe a regex or something in that direction.
http://pastebin.com/SVkiJ1Vq
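Since the pastebin code is not reproduced here, the following is only a sketch of the approach described above, not the linked code: strip a known extension from the host, then take the last remaining label. The suffix list is an assumption supplied by the caller; a short hand-written list will miss real-world extensions, so treat the result as a heuristic.

```csharp
using System;
using System.Linq;

public static class SiteNameExample
{
    // suffixes: extensions to strip, each with a leading dot, e.g. ".com", ".com.ar".
    public static string SiteName(string url, string[] suffixes)
    {
        string host = new Uri(url).Host.ToLowerInvariant();

        // Try longer suffixes first so ".com.ar" wins over ".ar".
        foreach (string suffix in suffixes.OrderByDescending(s => s.Length))
        {
            if (host.EndsWith(suffix, StringComparison.Ordinal))
            {
                host = host.Substring(0, host.Length - suffix.Length);
                break;
            }
        }

        // What is left is e.g. "www.beta.microsoft"; the name is the last label.
        string[] labels = host.Split('.');
        return labels[labels.Length - 1];
    }
}
```

Called with "http://www.beta.microsoft.com/path/page.htm" and the suffixes ".com", ".net" and ".com.ar", this returns "microsoft".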
I have the following so far:
^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
Been testing against these:
https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
http://www.foo.com
http://www.foo.com/
http://blog.foo.com/
http://blog.foo.com.ar/
http://foo.com
http://blog.foo.com
http://foo.com.ar
I'm using the following tool to test the regexes: regex tester
So far I've been able to yield the following groups:
full protocol
reduced protocol
full domain name
subdomain?
top level domain
port
port number
rest of the url
rest of the "directory"
no idea how to drop this group
page name
argument string
argument string
hash tag
hash tag
I will be using this regex to change the subdomain for my application for cross-domain redirect hyperlinks.
Using Request.Url as a parameter, I want to redirect from
http://example.com or http://www.example.com to http://blog.example.com
How can I achieve this?
I can't really tell what the current subdomain, if any (nothing, www, blog, or forum, for instance), actually is...
What would be the best way to make this replacement?
What I actually need is some way to find out what the top level domain is. In each of http://www.example.com, http://blog.example.com, and http://example.com, I want to get example.com.
This may not be the answer you're looking for... but IMO the best way would be to make use of the System.Uri class.
The Uri class will easily extract the Host for you, and you can then split the host on the "." delimiter, which should easily give you access to the current subdomain.
This is just my opinion, formed especially because I find it hard to maintain regex code like ^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
You can use the Uri class to parse the strings. There are many properties available in addition to Segments:
Uri MyUri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");
foreach (String Segment in MyUri.Segments)
    Response.Write(Segment + "<br />");
I think you should reconsider whether usage of a RegEx is really needed in this case;
I think extracting the top level domain from a URL is quite simple; in the case of "http://www.example.com/?blah=111" you can simply take the part before the third slash, perform a String.Split('.'), and concatenate the last two array items. In the case of "http://www.example.com" it is even easier.
Regex patterns are very error-prone and quite hard to maintain, and in my opinion you won't gain any advantage from using one. I recommend getting rid of the regex. Perhaps the result will be two or three more lines of code, but it will work, and your code will be much more readable and easier to understand.
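The Uri-based suggestions in these answers could be combined roughly as follows. This is a sketch under one loud assumption: the base domain is taken to be the last two labels of the host, which is right for example.com but wrong for hosts such as foo.com.ar from the question's test list (it would yield com.ar). SubdomainSwap and WithSubdomain are illustrative names.

```csharp
using System;

public static class SubdomainSwap
{
    // Returns the same URL with its subdomain replaced, e.g. www -> blog.
    public static Uri WithSubdomain(Uri url, string subdomain)
    {
        string[] labels = url.Host.Split('.');

        // Assumption: the base domain is the last two labels ("example.com").
        string baseDomain = labels.Length <= 2
            ? url.Host
            : labels[labels.Length - 2] + "." + labels[labels.Length - 1];

        // UriBuilder keeps the scheme, port, path and query; only the host changes.
        var builder = new UriBuilder(url) { Host = subdomain + "." + baseDomain };
        return builder.Uri;
    }
}
```

In the asker's scenario the call would look like SubdomainSwap.WithSubdomain(Request.Url, "blog"), and the result could be passed to a redirect.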
If I have a series of "pattern" Urls of the form:
http://{username}.sitename.com/
http://{username}.othersite.net/
http://mysite.com/{username}
and I have an actual Url of the form:
http://joesmith.sitename.com/
Is there any way that I can match a pattern Url and in turn use it to extract the username portion out the actual Url? I've thought of nasty ways to do it, but it just seems like there should be a more intuitive way to accomplish this.
ASP.NET MVC uses a similar approach to extract the various segments of the URL when it is building its routes. Given the example:
{controller}/{action}
So given the Url of the form, Home/Index, it knows that it is the Home controller calling the Index action method.
Not sure I understand this question correctly but you can just use a regular expression to match anything between 'http://' and the first dot.
A very simple regex will do:
':https?://([a-z0-9\.-]*[a-z0-9])\.sitename\.com'
This will allow any subdomain that only contains valid subdomain characters. Example of allowed subdomains:
joesmith.sitename.com
joe.smith.sitename.com
joe-smith.sitename.com
a-very-long-subdomain.sitename.com
As you can see, you might want to complicate the regex slightly. For instance, you could limit it to only allow a certain amount of characters in the subdomain.
It seems the quickest and easiest solution is going off of Machine's answer.
var givenUri = "http://joesmith.sitename.com/";
var patternUri = "http://{username}.sitename.com/";
patternUri = patternUri.Replace("{username}", @"([a-z0-9\.-]*[a-z0-9])");
var result = Regex.Match(givenUri, patternUri, RegexOptions.IgnoreCase).Groups;
if (!String.IsNullOrEmpty(result[1].Value))
    return result[1].Value;
Seems to work great.
Well, this "pattern URL" is a format you've made up, right? You'll basically just need to process it.
If the format of it is:
anything inside "{ }" is a thing to capture, everything else must be as is
Then you'd just find the start/end index of those brackets, and match everything else. Then when you get to a place where one is, make sure you only look for chars such that they don't match whatever 'token' comes after the next ending '}'.
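One way to sketch that processing, rather than scanning for bracket indexes by hand, is to turn the pattern URL into a regular expression: literal parts are escaped with Regex.Escape and each {name} placeholder becomes a named group. UrlPattern, ToRegex and Extract are hypothetical names, and the choice to let a placeholder match anything except '/' and '.' is an assumption that fits the three example patterns.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class UrlPattern
{
    // Converts e.g. "http://{username}.sitename.com/" into an anchored regex.
    public static Regex ToRegex(string pattern)
    {
        // Regex.Split keeps captured text, so placeholder names land at odd indexes.
        string[] parts = Regex.Split(pattern, @"\{(\w+)\}");
        var sb = new StringBuilder("^");
        for (int i = 0; i < parts.Length; i++)
        {
            sb.Append(i % 2 == 0
                ? Regex.Escape(parts[i])              // literal text, matched as is
                : "(?<" + parts[i] + ">[^/.]+)");     // placeholder, captured by name
        }
        sb.Append("$");
        return new Regex(sb.ToString(), RegexOptions.IgnoreCase);
    }

    // Returns the captured value, or null if the URL does not fit the pattern.
    public static string Extract(string pattern, string url, string name)
    {
        Match match = ToRegex(pattern).Match(url);
        return match.Success ? match.Groups[name].Value : null;
    }
}
```

Trying each pattern in turn against http://joesmith.sitename.com/ and taking the first non-null result would give "joesmith" from the first pattern.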
There are definitely different ways - ultimately though your server must be configured to handle (and possibly route) these different subdomain requests.
What I would do would be to answer all subdomain requests (except maybe some reserved words, like 'www', 'mail', etc.) on sitename.com with a single handler or page (I'm assuming ASP.NET here based on your C# tag).
I'd use the request path, which is easy enough to get, with some simple string parsing/regex routines (remove the 'http://', grab the first token up until '.' or '/' or '\', etc.) and then use that in a session, making sure to observe URL changes.
Alternately, you could map certain virtual paths to request urls ('joesmith.sitename.com' => 'sitename.com/index.aspx?username=joesmith') via IIS but that's kind of nasty too.
Hope this helps!
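For the single-handler idea, a small sketch of reading the username from the request's host name while skipping reserved subdomains. The reserved list and the names HostUser and UserFromHost are assumptions; in ASP.NET the host would presumably come from Request.Url.Host, and DNS/IIS still has to route the wildcard subdomains to the site.

```csharp
using System;
using System.Collections.Generic;

public static class HostUser
{
    // Subdomains that should never be treated as usernames (illustrative list).
    private static readonly HashSet<string> Reserved =
        new HashSet<string> { "www", "mail" };

    // Returns the subdomain of siteDomain used in host, or null if there is
    // none or it is reserved.
    public static string UserFromHost(string host, string siteDomain)
    {
        string suffix = "." + siteDomain;
        if (!host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            return null;

        string sub = host.Substring(0, host.Length - suffix.Length).ToLowerInvariant();
        return Reserved.Contains(sub) ? null : sub;
    }
}
```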