Get specific subdomain from URL in foo.bar.car.com - c#

Given a URL as follows:
foo.bar.car.com.au
I need to extract foo.bar.
I came across the following code :
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
if (host.Split('.').Length > 2)
{
int lastIndex = host.LastIndexOf(".");
int index = host.LastIndexOf(".", lastIndex - 1);
return host.Substring(0, index);
}
}
return null;
}
This gives me foo.bar.car, but I want foo.bar. Should I just use Split and take elements 0 and 1?
But then there might be a leading www.
Is there an easy way for this?

Given your requirement (you want the 1st two levels, not including 'www.') I'd approach it something like this:
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
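var nodes = host.Split('.');
// Skip a leading "www" label, if present.
int startNode = nodes[0] == "www" ? 1 : 0;
// Guard against hosts with too few labels.
if (nodes.Length < startNode + 2) return null;
return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);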
}
return null;
}

I faced a similar problem and, based on the preceding answers, wrote this extension method. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the consumer of the method considers to be the root. In the OP's case, the call would be
Uri uri = new Uri("http://foo.bar.car.com.au");
uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns foo.bar
uri.DnsSafeHost.GetSubdomain(); // returns foo.bar.car
Here's the extension method:
/// <summary>Gets the subdomain portion of a url, given a known "root" domain</summary>
public static string GetSubdomain(this string url, string domain = null)
{
var subdomain = url;
if(subdomain != null)
{
if(domain == null)
{
// Since we were not provided with a known domain, assume that second-to-last period divides the subdomain from the domain.
var nodes = url.Split('.');
var lastNodeIndex = nodes.Length - 1;
if(lastNodeIndex > 0)
domain = nodes[lastNodeIndex-1] + "." + nodes[lastNodeIndex];
}
// Verify that what we think is the domain is truly the ending of the hostname... otherwise we're hooped.
if (domain == null || !subdomain.EndsWith(domain))
throw new ArgumentException("Site was not loaded from the expected domain");
// Strip the domain from the end only, which should leave us with the subdomain and a trailing dot IF there is a subdomain.
subdomain = subdomain.Substring(0, subdomain.Length - domain.Length);
// Check if we have anything left. If we don't, there was no subdomain, the request was directly to the root domain:
if (string.IsNullOrWhiteSpace(subdomain))
return null;
// Quash any trailing periods
subdomain = subdomain.TrimEnd(new[] {'.'});
}
return subdomain;
}

You can use the following nuget package Nager.PublicSuffix. It uses the PUBLIC SUFFIX LIST from Mozilla to split the domain.
PM> Install-Package Nager.PublicSuffix
Example
var domainParser = new DomainParser();
var data = await domainParser.LoadDataAsync();
var tldRules = domainParser.ParseRules(data);
domainParser.AddRules(tldRules);
var domainName = domainParser.Get("sub.test.co.uk");
//domainName.Domain = "test";
//domainName.Hostname = "sub.test.co.uk";
//domainName.RegistrableDomain = "test.co.uk";
//domainName.SubDomain = "sub";
//domainName.TLD = "co.uk";

private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
String[] subDomains = host.Split('.');
return subDomains[0] + "." + subDomains[1];
}
return null;
}

OK, first. Are you specifically looking in 'com.au', or are these general Internet domain names? Because if it's the latter, there is simply no automatic way to determine how much of the domain is a "site" or "zone" or whatever and how much is an individual "host" or other record within that zone.
If you need to be able to figure that out from an arbitrary domain name, you will want to grab the list of TLDs from the Mozilla Public Suffix project (http://publicsuffix.org) and use their algorithm to find the TLD in your domain name. Then you can assume that the portion you want ends with the last label immediately before the TLD.
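The suffix-matching idea can be sketched roughly as follows. This is only an illustration: the class name SuffixSplitter and the five-entry suffix set are made up for the example, and the real Public Suffix List also has wildcard and exception rules that this sketch ignores.

```csharp
using System;
using System.Collections.Generic;

public static class SuffixSplitter
{
    // Tiny hand-picked sample for illustration only; real code should load
    // the full list published at publicsuffix.org.
    private static readonly HashSet<string> SampleSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "com", "au", "com.au", "uk", "co.uk" };

    // Returns the labels to the left of the registrable domain, or null if there are none.
    public static string GetSubdomain(string host)
    {
        string[] labels = host.Split('.');
        // Try the longest candidate suffix first; i is the index where the suffix starts.
        for (int i = 1; i < labels.Length; i++)
        {
            string candidate = string.Join(".", labels, i, labels.Length - i);
            if (SampleSuffixes.Contains(candidate))
            {
                // labels[i - 1] is the registered name; everything before it is the subdomain.
                return i - 1 > 0 ? string.Join(".", labels, 0, i - 1) : null;
            }
        }
        return null; // no known suffix matched
    }
}
```

With this sample set, SuffixSplitter.GetSubdomain("foo.bar.car.com.au") yields "foo.bar", and a bare "car.com.au" yields null.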

I would recommend using Regular Expression. The following code snippet should extract what you are looking for...
string input = "foo.bar.car.com.au";
var match = Regex.Match(input, @"^\w+\.\w+"); // first two labels only
var output = match.Value; // "foo.bar"

In addition to the NuGet Nager.PublicSuffix package specified in this answer, there is also the NuGet Louw.PublicSuffix package, which according to its GitHub project page is a .NET Core library that parses the Public Suffix List and is based on the Nager.PublicSuffix project, with the following changes:
Ported to .NET Core Library.
Fixed library so it passes ALL the comprehensive tests.
Refactored classes to split functionality into smaller focused classes.
Made classes immutable. Thus DomainParser can be used as singleton and is thread safe.
Added WebTldRuleProvider and FileTldRuleProvider.
Added functionality to know if Rule was a ICANN or Private domain rule.
Use async programming model
The page also states that many of above changes were submitted back to original Nager.PublicSuffix project.

Related

Sub domain as query string

Is there any way in ASP.net C# to treat sub-domain as query string?
I mean if the user typed london.example.com then I can read that he is after london data and run a query based on that. example.com does not currently have any sub-domains.
This is a DNS problem more than a C#/ASP.Net/IIS problem. In theory, you could use a wildcard DNS record. In practice, you run into this problem from the link:
The exact rules for when a wild card will match are specified in RFC 1034, but the rules are neither intuitive nor clearly specified. This has resulted in incompatible implementations and unexpected results when they are used.
So you can try it, but it's not likely to end well. Moreover, you can fiddle with things until it works in your testing environment, but that won't be able to guarantee things go well for the general public. You'll likely do much better choosing a good DNS provider with an API, and writing code to use the API to keep individual DNS entries in sync. You can also set up your own public DNS server, though I strongly recommend using a well-known and reputable commercial DNS host.
An additional problem you can run into is the TLS/SSL certificate (because of course you're gonna use HTTPS. Right? RIGHT!?) You can try a wild card certificate and probably be okay, but depending on what else you do you may find it's not adequate; suddenly you're needing to provision a separate SSL certificate for every city entry in your database, and that can be a real pain, even via the Let's Encrypt service.
If you do try it, IIS is easily capable of mapping the requests to your ASP.Net app based on a wildcard host name, and ASP.Net itself is easily capable of reading and parsing the host name out of the request and returning different results based on that. IIS URL re-writing should be able to help with this, though I'm not sure whether you can do stock MVC routing in C#/ASP.Net based on this attribute.
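As a rough sketch of the host-name parsing step, assuming the root domain is known up front: the helper below is hypothetical (HostParsing and GetLeadingLabels are names invented for this example), and in an ASP.NET app the host argument would typically come from Request.Url.Host.

```csharp
using System;

public static class HostParsing
{
    // Returns the part of the host left of rootDomain (e.g. "london"), or null
    // when the host is the root domain itself or belongs to another domain.
    public static string GetLeadingLabels(string host, string rootDomain)
    {
        string suffix = "." + rootDomain;
        if (!host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            return null;
        // Host names are case-insensitive, so normalise before using the value as a lookup key.
        return host.Substring(0, host.Length - suffix.Length).ToLowerInvariant();
    }
}
```

HostParsing.GetLeadingLabels("london.example.com", "example.com") gives "london", which can then drive the data query; a request to example.com itself gives null.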
I would add to the previous answers that, after you fix the DNS and translate the subdomain into parameters, you can use RewritePath to pass those parameters to your pages.
For example, let's say a function PathTranslate() translates london.example.com to example.com/default.aspx?Town=1.
Then you use RewritePath to keep the subdomain and at the same time send your parameters to your page.
string sThePathToReWrite = PathTranslate();
if (sThePathToReWrite != null){
HttpContext.Current.RewritePath(sThePathToReWrite, false);
}
string PathTranslate()
{
string sCurrentPath = HttpContext.Current.Request.Path;
string sCurrentHost = HttpContext.Current.Request.Url.Host;
//... lot of code ...
return strTranslatedUrl;
}
A low tech solution can be like this: (reference: https://www.pavey.me/2016/03/aspnet-c-extracting-parts-of-url.html)
public static List<string> SubDomains(this HttpRequest Request)
{
// variables
string[] requestArray = Request.Host().Split(".".ToCharArray());
var subDomains = new List<string>();
// make sure this is not an ip address
if (Request.IsIPAddress())
{
return subDomains;
}
// make sure we have all the parts necessary
if (requestArray == null)
{
return subDomains;
}
// last part is the tld (e.g. .com)
// second to last part is the domain (e.g. mydomain)
// the remaining parts are the sub-domain(s)
if (requestArray.Length > 2)
{
for (int i = 0; i <= requestArray.Length - 3; i++)
{
subDomains.Add(requestArray[i]);
}
}
// return
return subDomains;
}
// e.g. www
public static string SubDomain(this HttpRequest Request)
{
if (Request.SubDomains().Count > 0)
{
// handle cases where multiple sub-domains (e.g. dev.www)
return Request.SubDomains().Last();
}
else
{
// handle cases where no sub-domains
return string.Empty;
}
}
// e.g. azurewebsites.net
public static string Domain(this HttpRequest Request)
{
// variables
string[] requestArray = Request.Host().Split(".".ToCharArray());
// make sure this is not an ip address
if (Request.IsIPAddress())
{
return string.Empty;
}
// special case for localhost
if (Request.IsLocalHost())
{
return Request.Host().ToLower();
}
// make sure we have all the parts necessary
if (requestArray == null)
{
return string.Empty;
}
// make sure we have all the parts necessary
if (requestArray.Length > 1)
{
return $"{requestArray[requestArray.Length - 2]}.{requestArray[requestArray.Length - 1]}";
}
// return empty string
return string.Empty;
}
Following question is similar to yours:
Using the subdomain as a parameter

Get Last two folder's name from URL using C#

I have a URL from which I need to get the folder names after "bussiness" and before the page name, i.e. "paradise-villas-little.aspx", in the URL below.
http://test.com/anc/bussiness/accommo/resort/paradise-villas-little.aspx
I can't figure out how to do this. I have tried RawUrl, but it returns the full URL. Please help.
UPDATE: This is just one example of the URL format; I need to handle such URLs dynamically.
You can create a little helper and parse the URL from its Uri segments:
public static class Helper
{
public static IEnumerable<String> ExtractSegments(this Uri uri, String exclusiveStart)
{
bool startFound = false;
foreach (var seg in uri.Segments.Select(i => i.Replace(@"/","")))
{
if (startFound == false)
{
if (seg == exclusiveStart)
startFound = true;
}
else
{
if (!seg.Contains("."))
yield return seg;
}
}
}
}
And call it like this :
Uri uri = new Uri(@"http://test.com/anc/bussiness/accommo/resort/paradise-villas-little.aspx");
var found = uri.ExtractSegments("bussiness").ToList();
Then found contains "accommo" and "resort", and this method is extensible to any URL length, with or without file name at the end.
Nothing sophisticated in this implementation, just regular string operations:
string url = "http://test.com/anc/bussiness/accommo/resort/paradise-villas-little.aspx";
string startAfter = "bussiness"; // spelled exactly as it appears in the URL
string pageName = "paradise-villas-little.aspx";
char delimiter = '/'; //not platform specific
var from = url.IndexOf(startAfter) + startAfter.Length + 1;
var to = url.Length - from - pageName.Length - 1;
var strings = url.Substring(from, to).Split(delimiter);
You may want to add validations though.
You have to use built-in string methods. The best is to use String Split.
String url = "http://test.com/anc/bussiness/accommo/resort/paradise-villas-little.aspx";
String[] url_parts = url.Split('/'); //Now you have all the parts of the URL all folders and page. Access the folder names from string array.
Hope this helps
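A minimal sketch of the Split approach, under two assumptions: the last path part is always the page name, and the marker folder appears as a whole path part. The names UrlFolders and FoldersAfter are invented for the example, and it splits Uri.AbsolutePath rather than the raw string so the scheme and host do not end up in the array.

```csharp
using System;
using System.Linq;

public static class UrlFolders
{
    // Returns the path parts after the marker folder and before the final part (the page).
    public static string[] FoldersAfter(string url, string marker)
    {
        string[] parts = new Uri(url).AbsolutePath
            .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        int start = Array.IndexOf(parts, marker);
        if (start < 0) return new string[0]; // marker folder not present
        // Skip up to and including the marker, then leave out the last part (the page name).
        return parts.Skip(start + 1).Take(parts.Length - start - 2).ToArray();
    }
}
```

For the URL in the question, UrlFolders.FoldersAfter(url, "bussiness") returns "accommo" and "resort".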

Removing ../ in the middle of a relative path

I want to get from this
"../lib/../data/myFile.xml"
to this
"../data/myFile.xml"
I guess I could do it by manipulating the string, searching for "../" and canceling them out with the preceding folders but I was looking for an already existing C# solution.
I tried instantiating a Uri from this string and calling ToString() on it, but that didn't help: it leaves the string unchanged.
You can always try to use:
Path.GetFullPath("../lib/../data/myFile.xml")
It behaves as you want with absolute paths but you might end up with strange behaviors with relative paths since it always bases itself from the current working directory. For instance:
Path.GetFullPath("/lib/../data/myFile.xml") // C:\data\myFile.xml
Path.GetFullPath("../lib/../data/myFile.xml") // C:\Program Files (x86)\data\myFile.xml
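One way to sidestep the working-directory dependence, assuming a runtime that has the Path.GetFullPath(path, basePath) overload and Path.GetRelativePath (both available from .NET Core 2.1 onward), is to resolve against a throwaway base directory and then make the result relative to that same base again. This is a sketch, not a general solution: the base below is only three levels deep, so it supports paths that climb at most three levels, and the names RelativePaths and Collapse are invented for the example.

```csharp
using System;
using System.IO;

public static class RelativePaths
{
    // Collapses "folder/.." pairs while keeping the path relative.
    // Assumption: the path climbs at most three levels above its starting point,
    // because the throwaway base directory below is only three levels deep.
    public static string Collapse(string relativePath)
    {
        // The base directory does not need to exist; it is only used for path arithmetic.
        string baseDir = Path.Combine(Path.GetTempPath(), "a", "b", "c");
        string full = Path.GetFullPath(relativePath, baseDir);
        // Normalise to forward slashes so the result looks the same on every platform.
        return Path.GetRelativePath(baseDir, full).Replace(Path.DirectorySeparatorChar, '/');
    }
}
```

Calling RelativePaths.Collapse("../lib/../data/myFile.xml") should give "../data/myFile.xml" regardless of the current working directory.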
Sounds like you may either need to parse/rebuild the path yourself, or use some kind of well constructed regular expression to do this for you.
Taking the parse/rebuild route, you could do something like:
public static string NormalisePath(string path)
{
var components = path.Split(new Char[] {'/'});
var retval = new Stack<string>();
foreach (var bit in components)
{
if (bit == "..")
{
if (retval.Any())
{
var popped = retval.Pop();
if (popped == "..")
{
retval.Push(popped);
retval.Push(bit);
}
}
else
{
retval.Push(bit);
}
}
else
{
retval.Push(bit);
}
}
var final = retval.ToList();
final.Reverse();
return string.Join("/", final.ToArray());
}
(and yes, you'd probably want better variable names/commenting/etc.)
You can use a regular expression to do this:
public static string NormalisePath(string path)
{
return new Regex(@"\.{2}/.*/(?=\.\.)").Replace(path, "");
}

Umbraco. Get node's url in console application

I work with Umbraco from a console application.
When I try to get the NiceUrl for a node, it fails because UmbracoContext.Current is null.
I can get the node path as IDs, like "-1,1067,1080", but I don't know how to convert it to URL format.
How can I get the NiceUrl for a node in a console application?
Here is what I did:
In my console application I get the node by ID, simply like this:
Node someNode = new Node(nodeId);
When I try to get the NiceUrl:
string url = someNode.NiceUrl;
I get an ArgumentNullException.
I looked into why and found this answer: NiceUrl uses UmbracoContext, so it cannot work because the context is null.
I also can't use this: UmbracoContext.Current.ContentCache.GetById(someidhere).Url
Thanks.
Without the UmbracoContext I don't think it's possible in V6 to get the URL of an IContent node.
I looked through the Umbraco source code and decided to recreate the way it's done there. I came up with this, which worked for my needs.
https://gist.github.com/petergledhill/ca2a3a0ea81b06abcb08
public static class ContentExtensions
{
public static string RelativeUrl(this IContent content)
{
var pathParts = new List<string>();
var n = content;
while (n != null)
{
pathParts.Add(n.UrlName());
n = n.Parent();
}
pathParts.RemoveAt(pathParts.Count() - 1); //remove root node
pathParts.Reverse();
var path = "/" + string.Join("/", pathParts);
return path;
}
public static string UrlName(this IContent content)
{
return new DefaultUrlSegmentProvider().GetUrlSegment(content).ToLower();
}
}
Yes, you can't use: UmbracoContext.Current.ContentCache because this is accessing the same context.
It looks like you are using v6+, so instead you will need to use the API services that Umbraco provides, specifically the ContentService.
There is a thread here that looks into the same thing you are asking: http://our.umbraco.org/forum/developers/api-questions/37981-Using-v6-API-ContentService-in-external-application
And an example of a solution here: https://github.com/sitereactor/umbraco-console-example

Alternatives to .NET provided apis regarding uris and urls

I've recently come to the realization that the .NET APIs for working with URLs and URIs frequently come up short in achieving even basic functionality (at least easily), including things such as: generating a fully qualified URL from a relative path, forcing https or going back to http, getting the root of the site, combining relative URLs properly, and so forth.
Are there any alternative libraries out there that provide all of this type of functionality in a simple and reliable package?
I've certainly found myself doing much the same URI-manipulation code more than once, in .NET, but I don't see your cases as places it lacks.
Full URI from relative Uri:
new Uri(baseUri, relative) // (works whether relative is a string or a Uri).
Obtaining the actual FQDN:
string host = uri.Host;
string fqdn = host.EndsWith(".") ? host : host + ".";
Forcing https or back to http:
UriBuilder toHttp = new UriBuilder(someUri);
toHttp.Scheme = "http";
toHttp.Port = 80;
return toHttp.Uri;
UriBuilder toHttps = new UriBuilder(someUri);
toHttps.Scheme = "https";
toHttps.Port = 443;
return toHttps.Uri;
Getting the root of the site:
new Uri(startingUri, "/");
Combining relative urls properly:
new Uri(baseUri, relUri); // We had this one already.
Only two of these are more than a single method call, and of those obtaining the FQDN is pretty obscure (unless rather than wanting the dot-ended FQDN you just wanted the absolute URI, in which case we're back to a single method call).
There is a single method version of the HTTPS/HTTP switching, though it's actually more cumbersome since it calls several properties of the Uri object. I can live with it taking a few lines to do this switch.
Still, to provide a new API one need only supply:
public static Uri SetHttpPrivacy(this Uri uri, bool privacy)
{
UriBuilder ub = new UriBuilder(uri);
if(privacy)
{
ub.Scheme = "https";
ub.Port = 443;
}
else
{
ub.Scheme = "http";
ub.Port = 80;
}
return ub.Uri;
}
I really can't see how an API could possibly be any more concise in the other cases.
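For reference, the one-liners above can be wrapped like this. The wrapper names (UriHelpers, Resolve, Root, WithScheme) are invented for the example. WithScheme is a variation on the answer's code: it sets Port to -1, which tells UriBuilder to use the default port of the scheme, at the cost of discarding any custom port on the original URI.

```csharp
using System;

public static class UriHelpers
{
    // Resolves a relative reference against a base URI.
    public static Uri Resolve(Uri baseUri, string relative) => new Uri(baseUri, relative);

    // Returns the site root, e.g. http://example.com/
    public static Uri Root(Uri uri) => new Uri(uri, "/");

    // Switches the scheme and lets the port fall back to that scheme's default.
    // Note: this drops a custom port if the original URI had one.
    public static Uri WithScheme(Uri uri, string scheme)
    {
        var builder = new UriBuilder(uri) { Scheme = scheme, Port = -1 };
        return builder.Uri;
    }
}
```

Given new Uri("http://example.com/a/b/c.html"), Resolve with "../d.html" produces http://example.com/a/d.html, Root produces http://example.com/, and WithScheme with "https" produces https://example.com/a/b/c.html.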
XUri is a nice class that is part of the open source project from MindTouch
http://developer.mindtouch.com/en/ref/dream/MindTouch.Dream/XUri?highlight=XUri
This article includes a quick sample on how to use it.
http://blog.developer.mindtouch.com/2009/05/18/consuming-rest-services-and-tdd-with-plug/
I am a fan of it. A little overkill assembly wise if you are going to just use the XUri portion, but there are other really nice things in the library too.
I use a combination of extensions with 'System.IO.Path' object as well.
These are just blurbs for example.
public static Uri SecureIfRemote(this Uri uri){
if(!System.Web.HttpContext.Current.Request.IsSecureConnection &&
!System.Web.HttpContext.Current.Request.IsLocal){
return new Uri......(build secure uri here)
}
return uri;
}
public static NameValueCollection ParseQueryString(this Uri uri){
return uri.Query.ParseQueryString();
}
public static NameValueCollection ParseQueryString(this string s)
{
//return
return HttpUtility.ParseQueryString(s);
}
