Regex for getting domain and subdomain in C#


I need to reliably extract the domain and subdomain from the current URL. This is required to fetch the right data from the database and then call a web API with the correct parameters.
In particular, I am facing issues with local and production URLs. For example:
In local, I have
http://sample.local.example.com
http://test.dev.example.com
In production, I have
http://client.example.com
http://program.live.example.com
I need
Subdomain as: sample / test / client / program
Domain as: example
So far I have used the following C# code to identify them. It works fine locally, but I am sure it will cause an issue in production at some point. Basically, for the subdomain it takes the first part of the host, and for the domain it takes the last part before ".com".
var host = Request.Url.Host;
var domains = host.Split('.');
var subDomain = domains[0];
string mainDomain = string.Empty;
#if DEBUG
mainDomain = domains[2];
#else
mainDomain = domains[1];
#endif
return Tuple.Create(mainDomain, subDomain);

Instead of a regex, I think LINQ should help you here. Try:
public static (string, string) GetDomains(Uri url)
{
var domains = url.Host.Substring(0, url.Host.LastIndexOf(".")).Split('.');
var subDomain = string.Join("/", domains.Take(domains.Length - 1));
var mainDomain = domains.Last();
return (mainDomain, subDomain);
}
output for "http://program.live.example.com"
example
program/live

This regex should work for you:
Match match = Regex.Match(temp, @"http://(\w+)\.?.*\.(\w+)\.com$");
string subdomain = match.Groups[1].Value;
string domain = match.Groups[2].Value;
http://(\w+)\. matches 1 or more word characters as group 1 before a dot and after http://
.* matches zero or more occurrences of any character
\.(\w+)\.com matches 1 or more word characters as group 2 before .com and after a dot
$ specifies the end of the string
\.? makes the dot optional to catch the case where there is nothing between group 1 and group 2, as in http://client.example.com

You are on the right track: the domain name is the second-to-last value in the array.
var host = Request.Url.Host;
var domains = host.Split('.');
string subDomain = domains[0]; // Host never contains '/', so no further splitting is needed
string mainDomain = domains[domains.Length-2];
return Tuple.Create(mainDomain, subDomain);
If you want all the subdomains you can put a loop here.
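The split-based approach above can be sketched without the #if DEBUG switch. This is a minimal sketch, assuming the top-level domain is always a single label such as .com (a host ending in .co.uk would need a different rule); the class name HostParts and the method name Split are made up for illustration.

```csharp
using System;

public static class HostParts
{
    // Returns (domain, subdomain) for hosts such as "program.live.example.com".
    // Assumption: the TLD is one label, so the domain is the second-to-last label.
    public static (string Domain, string Subdomain) Split(Uri url)
    {
        string[] labels = url.Host.Split('.');

        // A host such as "localhost" has no separate domain or subdomain.
        if (labels.Length < 2)
            return (url.Host, string.Empty);

        string domain = labels[labels.Length - 2];

        // The subdomain is the first label, but only when one exists before the domain.
        string subdomain = labels.Length > 2 ? labels[0] : string.Empty;

        return (domain, subdomain);
    }
}
```

Because the domain is counted from the end of the host, the same code handles both the local hosts (four labels) and the production hosts (three or four labels).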

Related

how to detect specific part of the URL and modify it?

I am developing website using asp.net. In there I mainly use URL to pass parameters.
I have URL structure like this
http://localhost:51247/yyy/zzz/hrforum/(if its in my local PC)
http://test.com/yyy/zzz/hrforum/
I need to detect that zzz part and replace it with another word. I tried many things, including regex patterns, but it seems I am doing it the wrong way. Please help me detect it, modify it, and rebuild the URL.
Code I tried
Regex myRegex = new Regex(@"/([\w\s]+?\;){2}/");
var match = myRegex.Match(fullUrl);
var firstName = match.Groups[0].Value;
But this is not working.

The easiest method of doing this would be to use the Uri.Segments property. For example:
Uri uriAddress1 = new Uri("http://test.com/yyy/zzz/hrforum/");
Uri uriAddress2 = new Uri("http://localhost:51247/yyy/zzz/hrforum/");
Console.WriteLine(uriAddress1.Segments[2] == uriAddress2.Segments[2]);
Console.WriteLine("Segment 2 of Address 1: {0} Segment 2 of Address 2: {1}", uriAddress1.Segments[2].Trim('/'),uriAddress2.Segments[2].Trim('/'));
Output:
True
Segment 2 of Address 1: zzz Segment 2 of Address 2: zzz

I'm not sure what you want to achieve but to answer this question:
How to detect specific part of the URL and modify it?
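I think you can use the Uri class instead of a regex.
var uri = new Uri("http://test.com/yyy/zzz/hrforum/");
var pathName = uri.PathAndQuery;
// RemoveEmptyEntries drops the empty strings produced by the leading and trailing '/'
foreach (var item in pathName.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))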
{
Console.WriteLine(item);
}
// output:
// yyy
// zzz
// hrforum
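The answers above show how to detect the segment but not how to rebuild the URL afterwards. One possible way, sketched here with Uri.Segments and UriBuilder, is to swap the segment and reassemble the path. The class name UrlSegmentReplacer and the method ReplaceSegment are hypothetical, and the sketch assumes the replacement text needs no URL escaping.

```csharp
using System;

public static class UrlSegmentReplacer
{
    // Replaces the path segment at the given index of Uri.Segments.
    // Index 0 is always the root "/", so "zzz" in /yyy/zzz/hrforum/ is index 2.
    public static Uri ReplaceSegment(Uri source, int index, string replacement)
    {
        string[] segments = source.Segments;
        if (index < 1 || index >= segments.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        // Every segment except possibly the last ends with '/'; keep that slash.
        bool hadSlash = segments[index].EndsWith("/", StringComparison.Ordinal);
        segments[index] = hadSlash ? replacement + "/" : replacement;

        // UriBuilder keeps the scheme, host, port and query of the source URL.
        var builder = new UriBuilder(source) { Path = string.Concat(segments) };
        return builder.Uri;
    }
}
```

Calling ReplaceSegment(new Uri("http://test.com/yyy/zzz/hrforum/"), 2, "abc") should give http://test.com/yyy/abc/hrforum/, and the localhost URL with its port is handled the same way.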

How to retrieve the locale(country) code from URL?

I have a URL, which is like http://example.com/UK/Deal.aspx?id=322
My target is to remove the locale(country) part, to make it like http://example.com/Deal.aspx?id=322
Since the URL may have other similar formats like: https://ssl.example.com/JP/Deal.aspx?id=735, using "substring" function is not a good idea.
What I can think about is to use the following method for separating them, and map them back later.
HttpContext.Current.Request.Url.Scheme
HttpContext.Current.Request.Url.Host
HttpContext.Current.Request.Url.AbsolutePath
HttpContext.Current.Request.Url.Query
And, suppose HttpContext.Current.Request.Url.AbsolutePath will be:
/UK/Deal.aspx?id=322
I am not sure how to deal with this since my boss asked me not to use "regular expression"(he thinks it will impact performance...)
Except "Regular Expression", is there any other way to remove UK from it?
p.s.: the UK part may be JP, DE, or other country code.
By the way, for USA, there is no country code, and the url will be http://example.com/Deal.aspx?id=322
Please also take this situation into consideration.
Thank you.

Assuming that the URL contains a two-letter ISO country code, you can use the UriBuilder class to remove it from the path without using a regex.
E.g.
var sourceUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
if (IsLocaleEnabled(sourceUri))
{
var builder = new UriBuilder(sourceUri);
builder.Path
= builder.Path.Replace(sourceUri.Segments[1] /* remove UK/ */, string.Empty);
// Construct the Uri with new path
Uri newUri = builder.Uri;
}
Update:
// Cache the instance for performance benefits.
static readonly Regex regex = new Regex(@"^[a-zA-Z]{2}\/$", RegexOptions.Compiled);
/// <summary>
/// Regex to check if Url segments have the 2 letter
/// ISO code as first ocurrance after root
/// </summary>
private bool IsLocaleEnabled(Uri sourceUri)
{
// Update: Compiled regex are way much faster than using non-compiled regex.
return regex.IsMatch(sourceUri.Segments[1]);
}
For performance benefits you must cache it (means keep it in static readonly field). There's no need to parse a pre-defined regex on every request. This way you'll get all the performance benefits you can get.
Result - http://example.com/Deal.aspx?id=322

It all depends on whether the country code always has the same position. If it doesn't, then some more details on the possible formats are required. Maybe you could check whether the first segment has two characters, to be sure it really is a country code (not sure if this is reliable though). Or you could start from the filename, if it's always in the format /[optionalCountryCode]/deal.aspx?...
How about these two approaches (on string level):
public string RemoveCountryCode()
{
Uri originalUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
string hostAndPort = originalUri.GetLeftPart(UriPartial.Authority);
// v1: if country code is always there, always has same position and always
// has format 'XX' this is definitely the easiest and fastest
string trimmedPathAndQuery = originalUri.PathAndQuery.Substring("/XX/".Length);
// v2: if country code is always there, always has same position but might
// not have a fixed format (e.g. XXX)
trimmedPathAndQuery = string.Join("/", originalUri.PathAndQuery.Split('/').Skip(2));
// in both cases you need to join it with the authority again
return string.Format("{0}/{1}", hostAndPort, trimmedPathAndQuery);
}

If the AbsolutePath will always have the format /XX/...pagename.aspx?id=### where XX is the two letter country code, then you can just strip off the first 3 characters.
Example that removes the first 3 characters:
var targetURL = HttpContext.Current.Request.Url.AbsolutePath.Substring(3);
If the country code could be different lengths, then you could find the index of the second / character and start the substring from there.
var sourceURL = HttpContext.Current.Request.Url.AbsolutePath;
var firstOccurrence = sourceURL.IndexOf('/');
// Start searching one character past the first '/', otherwise the same index is returned
var secondOccurrence = sourceURL.IndexOf('/', firstOccurrence + 1);
var targetURL = sourceURL.Substring(secondOccurrence);

The easy way would be to treat as string, split it by the "/" separator, remove the fourth element, and then join them back with the "/" separator again:
string myURL = "https://ssl.example.com/JP/Deal.aspx?id=735";
List<string> myURLsplit = myURL.Split('/').ToList();
myURLsplit.RemoveAt(3); // RemoveAt returns void, so it cannot be chained
myURL = string.Join("/", myURLsplit);
RESULT: https://ssl.example.com/Deal.aspx?id=735
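None of the string-level approaches above cover the USA case from the question, where the URL has no country code at all. A regex-free sketch that handles both cases is shown below. It assumes a country code is always exactly two ASCII letters in the first path segment, so a real two-letter folder name would be stripped as well; the class name LocaleStripper and its members are hypothetical.

```csharp
using System;
using System.Linq;

public static class LocaleStripper
{
    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Removes a leading two-letter segment such as "UK/" or "JP/" if present.
    // URLs without such a segment (the USA case) are returned unchanged.
    public static Uri RemoveCountryCode(Uri source)
    {
        string[] segments = source.Segments; // e.g. ["/", "UK/", "Deal.aspx"]
        if (segments.Length < 2)
            return source;

        string first = segments[1];
        bool looksLikeCode = first.Length == 3
            && first[2] == '/'
            && IsAsciiLetter(first[0])
            && IsAsciiLetter(first[1]);
        if (!looksLikeCode)
            return source;

        // Rebuild the path from everything after the country code segment.
        // UriBuilder carries over scheme, host, port and query string.
        var builder = new UriBuilder(source)
        {
            Path = "/" + string.Concat(segments.Skip(2))
        };
        return builder.Uri;
    }
}
```

Since only character comparisons are involved, this also sidesteps the performance concern about regular expressions raised in the question.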

Remove the first and last parts of a URL string from AbsolutePath in ASP.NET

I'm not good with manipulating strings and could use a little help.
I'd have a URL (http://localhost/mySite/default.aspx) and I have the AbsolutePath as a string that I'm working with (/mySite/default.aspx):
string mySubUrl = Request.Url.AbsolutePath;
What I'm trying to do is remove the first and last parts of the AbsolutePath. In this example, removing "mySite" and "default.aspx", which would leave me with just "/".
There also may be instances where the URL is longer or shorter, e.g., http://localhost/mySite/mySubFolder/default.aspx, in which case after removing the first and last parts of the AbsolutePath I would be left with '/mySubFolder/'.
I did try working a little with Uri segments but didn't get too far:
string absolutePath = Request.Url.AbsolutePath;
Uri uri = new Uri(absolutePath);
string[] pathSegments = uri.Segments;

Quick solution:
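// Segments[0] is the root "/", Segments[1] is the first folder, and the last segment is the page name
string[] pathSegments = Request.Url.Segments.Skip(2).Take(Request.Url.Segments.Length - 3).ToArray();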

The Request.Url.AbsolutePath already removes the left part of the Url for you, so it will give you something like /subSection/subFolder/default.aspx.
Then, you can remove the last part like this:
string absolutePath = Request.Url.AbsolutePath;
string[] urlSegments = absolutePath.Split('/');
urlSegments = urlSegments.Skip(1).Take(urlSegments.Length - 2).ToArray();
string url = string.Join("/", urlSegments);
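To get exactly the results described in the question ("/" for /mySite/default.aspx and "/mySubFolder/" for /mySite/mySubFolder/default.aspx), both the first and the last part have to be dropped. A minimal sketch under that reading of the question follows; the class name PathTrimmer and the method MiddlePath are made up for illustration.

```csharp
using System;
using System.Linq;

public static class PathTrimmer
{
    // Drops the first and last parts of an absolute path and returns what
    // is left, always with a leading and trailing '/'.
    public static string MiddlePath(string absolutePath)
    {
        // RemoveEmptyEntries discards the empty string before the leading '/'.
        string[] parts = absolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // Skip the first part and take everything except the last part.
        var middle = parts.Skip(1).Take(Math.Max(parts.Length - 2, 0));

        string joined = string.Join("/", middle);
        return joined.Length == 0 ? "/" : "/" + joined + "/";
    }
}
```

In a page this would be called as PathTrimmer.MiddlePath(Request.Url.AbsolutePath).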

Remove characters after specific character in string, then remove substring?

I feel kind of dumb posting this when this seems kind of simple and there are tons of questions on strings/characters/regex, but I couldn't find quite what I needed (except in another language: Remove All Text After Certain Point).
I've got the following code:
[Test]
public void stringManipulation()
{
String filename = "testpage.aspx";
String currentFullUrl = "http://localhost:2000/somefolder/myrep/test.aspx?q=qvalue";
String fullUrlWithoutQueryString = currentFullUrl.Replace("?.*", "");
String urlWithoutPageName = fullUrlWithoutQueryString.Remove(fullUrlWithoutQueryString.Length - filename.Length);
String expected = "http://localhost:2000/somefolder/myrep/";
String actual = urlWithoutPageName;
Assert.AreEqual(expected, actual);
}
I tried the solution in the question above (hoping the syntax would be the same!) but nope. I want to first remove the queryString which could be any variable length, then remove the page name, which again could be any length.
How can I remove the query string from the full URL such that this test passes?

For string manipulation, if you just want to kill everything after the ?, you can do this
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.IndexOf("?");
if (index >= 0)
input = input.Substring(0, index);
Edit: To remove everything after the last slash, do something like
string input = "http://www.somesite.com/somepage.aspx?whatever";
int index = input.LastIndexOf("/");
if (index >= 0)
input = input.Substring(0, index); // or index + 1 to keep slash
Alternately, since you're working with a URL, you can do something with it like this code
System.Uri uri = new Uri("http://www.somesite.com/what/test.aspx?hello=1");
string fixedUri = uri.AbsoluteUri.Replace(uri.Query, string.Empty);
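For the test in the question, the two steps (drop the query string, then drop the page name) can be combined. The sketch below uses Uri.GetLeftPart(UriPartial.Path), which returns the scheme, authority and path without the query, and then cuts after the last slash. The class name UrlTrimmer and the method FolderUrl are hypothetical names chosen for this example.

```csharp
using System;

public static class UrlTrimmer
{
    // Returns the URL up to and including the last '/' of its path,
    // with the query string and page name removed.
    public static string FolderUrl(string fullUrl)
    {
        var uri = new Uri(fullUrl);

        // Scheme + host + port + path, without "?q=qvalue".
        string withoutQuery = uri.GetLeftPart(UriPartial.Path);

        // Keep everything up to and including the last '/'.
        return withoutQuery.Substring(0, withoutQuery.LastIndexOf('/') + 1);
    }
}
```

Unlike the Replace(uri.Query, string.Empty) variant, this does not depend on the URL actually having a query string.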

To remove everything before the first /
input = input.Substring(input.IndexOf("/"));
To remove everything after the first /
input = input.Substring(0, input.IndexOf("/") + 1);
To remove everything before the last /
input = input.Substring(input.LastIndexOf("/"));
To remove everything after the last /
input = input.Substring(0, input.LastIndexOf("/") + 1);
An even simpler solution for removing characters after a specified char is to use the String.Remove() method as follows:
To remove everything after the first /
input = input.Remove(input.IndexOf("/") + 1);
To remove everything after the last /
input = input.Remove(input.LastIndexOf("/") + 1);

Here's another simple solution. The following code will return everything before the '|' character:
if (path.Contains('|'))
path = path.Split('|')[0];
In fact, you could have as many separators as you want, but assuming you only have one separation character, here is how you would get everything after the '|':
if (path.Contains('|'))
path = path.Split('|')[1];
(All I changed in the second piece of code was the index of the array.)

The Uri class is generally your best bet for manipulating Urls.

To remove everything before a specific char, use below.
string1 = string1.Substring(string1.IndexOf('$') + 1);
What this does is take everything up to and including the $ char and remove it. If you instead want to remove everything after a character, keep the part before it with string1.Substring(0, string1.IndexOf('$')).
But for a URL, I would use the built-in .NET class to take care of that.

Request.QueryString helps you get the parameters and values included within the URL.
Example:
string http = "http://dave.com/customers.aspx?customername=dave";
string customername = Request.QueryString["customername"].ToString();
So the customername variable should be equal to dave.
regards

I second Hightechrider: there is a specialized Uri class already built for you.
I must also point out, however, that PHP's replaceAll uses regular expressions for the search pattern, which you can do in .NET as well. Look at the Regex class.

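You can use .NET's built-in method to remove a query string parameter,
i.e., queryString.Remove("whatever");
Here "whatever" is the name of the query string parameter you want to remove. Note that Request.QueryString itself is read-only, so call Remove on a writable copy, for example one returned by HttpUtility.ParseQueryString(Request.Url.Query).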
Try this...
I hope this will help.

You can use this extension method to remove query parameters (everything after the ?) in a string
public static string RemoveQueryParameters(this string str)
{
int index = str.IndexOf("?");
return index >= 0 ? str.Substring(0, index) : str;
}

how to get a text from textbox that is between two dots

I just want to get the text from a textbox that is between two dots, for example: www. abc.org . h

in C#
string url = "www.google.com";
string[] split_strings = url.Split('.');
Console.WriteLine(split_strings[1]);
Get String From Textbox:
string url = textbox_url.Text;
string[] split_strings = url.Split('.');
Console.WriteLine(split_strings[1]);
But please, use try and catch ;)

You'll need to be a bit more specific with your question I think. Now, if you're just looking to extract the middle part of the address, something like the following should do the job:
var parts = textbox.Text.Split(new char[] {'.'});
if (parts.Length < 3) throw new InvalidOperationException("Invalid address.");
var middlePart = parts[1];

Is that as specific as your requirement is?
does it only have to work for www.SOMESITE.com
what about other tld extensions like, .net, .org, .co.uk, .ie etc...
what about other subdomains like, www2., api., news. etc...
what about domains with no subdomain like, google.com, theregister.co.uk, bit.ly
if that's as simple as your requirement is,
then
textBox.Text.Replace("www.", "").Replace(".com", "");
though I've a feeling you haven't thought through or fully explained your requirements.
If it is a more complex scenario, you might want to look at Regular expressions.

string haystack= "www.google.com";
string needle = "google";
string myWord = GetWordFromString(haystack, needle);
private string GetWordFromString(string haystack, string needle)
{
if (haystack.ToLower().Contains(needle))
{
return needle;
}
return null; // needle not found; without this the method does not compile
}
I re-read the post with the comments, and I can see that you probably don't know which word you are going to extract... I think the first answer is the one that you are looking for.
There's also regular expressions for extracting the domainname out of a url if that is your specific need.
Something like this:
public static string ExtractDomainName(string Url)
{
return System.Text.RegularExpressions.Regex.Replace(
Url,
@"^([a-zA-Z]+:\/\/)?([^\/]+)\/.*?$",
"$2"
);
}

string text = "www. abc.org . h";
int left = Math.Max(text.IndexOf('.'), 0),
right = Math.Min(text.LastIndexOf('.'), text.Length - 1);
string result = text.Substring(left+1, right - left-1).Trim();
