At first glance, I concede that this question looks like a duplicate of this question and any other related to it:
Regular expression that matches valid IPv6 addresses
That question in fact has an answer that nearly answers my question, but not fully.
The code from that question which I have issues with, yet had the most success with, is as shown below:
private string RemoveIPv6(string sInput)
{
string pattern = @"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
//That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
//if (IsCompressedIPv6(sInput))
// sInput = UncompressIPv6(sInput);
string output = Regex.Replace(sInput, pattern, "");
if (output.Contains("Addresses"))
output = output.Substring(0, "Addresses: ".Length);
return output;
}
The issue I have with the regex pattern provided in that answer, David M. Syzdek's answer, is that it doesn't match and remove the full form of the IPv6 addresses I'm throwing at it.
I'm mainly using the regex pattern to replace IPv6 addresses in strings with an empty string.
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
None of these addresses is matched in full by the regex.
The regex will return me the remaining parts of the string as shown below:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As can be seen, it leaves remnants of the IPv6 addresses, which are hard to detect and remove because of the varying forms they take. Below is the regex pattern by itself for better analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Therefore, my question is, how can this regex pattern be corrected so it can match, and therefore allow the complete removal of any IPv6 addresses, from a string that doesn't solely contain the IPv6 address(es) itself?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I am not using the built-in function to resolve DNS entries for a good reason, which I don't think will matter for the moment, therefore I am using nslookup.
In case it is needed, the code that calls that function is shown below. (It is itself part of another function/method.)
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
while (reader.Peek() != -1)
{
if (LinesRead > 3)
{
tempRead = reader.ReadLine();
tempRead = RemoveIPv6(tempRead);
if (tempRead.Contains("Addresses"))
output += tempRead;
else if (lastRead.Contains("Addresses"))
output += tempRead.Trim() + Environment.NewLine;
else
output += tempRead + Environment.NewLine;
lastRead = tempRead;
}
else
garbagecan = reader.ReadLine();
LinesRead++;
}
}
return output;
The corrected regex should remove only IPv6 addresses and leave IPv4 addresses untouched. The string passed to the regex will not contain the IPv6 address(es) alone; it will almost always contain other details, so the index at which the addresses appear is unpredictable. It should also be noted that, for some reason, the regex skips all other IPv6 addresses after the first occurrence.
Apologies if there are any missing details, I will try my best to include them in when alerted. I would also prefer working code samples, if possible, as I have almost zero knowledge regarding regex.
(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
Using lookarounds, you can enforce a complete match rather than a partial match. See demo.
https://regex101.com/r/cT0hV4/5
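To see why the lookarounds help, here is a minimal sketch that uses only two branches of the long pattern (the `{1,7}:` branch and the `{1,4}...{1,3}` branch), so it is an illustration rather than a full IPv6 matcher. The class and method names are made up for this example. In .NET, alternation takes the first branch that succeeds, so without anchoring the `{1,7}:` branch stops at `::` and leaves the tail behind. The trailing `(?=\s|$)` rejects that partial match and forces the engine to try the later branch.

```csharp
using System;
using System.Text.RegularExpressions;

static class Ipv6LookaroundDemo
{
    // Two branches taken from the long pattern; NOT a complete IPv6 regex.
    const string Branches =
        @"(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,4}(?::[0-9a-fA-F]{1,4}){1,3}";

    // No anchoring: the first branch that matches wins, even if it is partial.
    public static string StripLoose(string input)
    {
        return Regex.Replace(input, Branches, "");
    }

    // Same branches wrapped in the lookarounds from the answer above:
    // the match must start at line start or after whitespace,
    // and must end before whitespace or at the end of the input.
    public static string StripAnchored(string input)
    {
        string anchored = @"(?:^|(?<=\s))(?:" + Branches + @")(?=\s|$)";
        return Regex.Replace(input, anchored, "");
    }

    static void Main()
    {
        string line = "Addresses: 2404:6800:4003:c02::8a";
        Console.WriteLine("[" + StripLoose(line) + "]");    // [Addresses: 8a]
        Console.WriteLine("[" + StripAnchored(line) + "]"); // [Addresses: ]
    }
}
```

Because `Regex.Replace` replaces every match in the input, the anchored version also handles several whitespace-separated addresses on one line, and a plain IPv4 address contains no colon, so it is left alone.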
(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
Demo: Regex101
Github Repository
I want to read the value of the X-Forwarded-For header in a request.
I've tried
HttpContext.Current.Request.Headers["X-Forwarded-For"].Split(new char[] { ',' }).FirstOrDefault();
in C#.
Or do I need to split the header by ":" and then take the second string?
I am asking this because Wikipedia says
The general format of the field is:
X-Forwarded-For: client1, proxy1, proxy2
The format that you get in return is client1, proxy1, proxy2
So you split it with the comma, and get the first to see the ip of your client.
If it helps, this is a simple way of getting the user's IP address, taking the X_FORWARDED_FOR header into account:
var forwardedFor = Request.ServerVariables["HTTP_X_FORWARDED_FOR"];
var userIpAddress = String.IsNullOrWhiteSpace(forwardedFor) ?
Request.ServerVariables["REMOTE_ADDR"] : forwardedFor.Split(',').Select(s => s.Trim()).FirstOrDefault();
Don't forget that X-Forwarded-For can contain whatever client writes there. It can contain XSS or SQL-injection inside.
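One way to guard against arbitrary content is to accept the first entry only if it parses as an IP address. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the header value is passed in as a plain string so the example does not depend on a web context.

```csharp
using System;
using System.Net;

static class ForwardedForReader
{
    // Returns the first entry of the header value if it is a syntactically
    // valid IP address; otherwise returns null.
    public static string FirstValidClient(string headerValue)
    {
        if (string.IsNullOrEmpty(headerValue)) return null;

        string first = headerValue.Split(',')[0].Trim();

        IPAddress ip;
        // Re-serialise the parsed address so no raw client text is passed on.
        return IPAddress.TryParse(first, out ip) ? ip.ToString() : null;
    }
}
```

For example, `FirstValidClient("203.0.113.7, 10.0.0.1")` yields `203.0.113.7`, while a value whose first entry is markup or SQL yields `null`. This only checks syntax; it says nothing about whether the address was spoofed.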
Sometimes the first entry contains one of the reserved local (private) addresses, which is not useful. Also, the first position(s) are open to spoofing.
Update - April 2018: Sampling the cases of a live production website where the first address is local (private) indicates some configuration issue on the end user's network or his ISP. The cases are occurring only rarely (<1%) and consistently for the same end users.
The answer below suggests walking from right to left until you hit a public address. Not sure anyone actually does this but it points out the issue.
https://husobee.github.io/golang/ip-address/2015/12/17/remote-ip-go.html
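The right-to-left walk could be sketched roughly as follows. This is an illustration, not the linked article's code: the names are hypothetical, and only loopback and the three RFC 1918 IPv4 ranges are treated as non-public, which is an assumption you may need to widen (for example for IPv6 unique local addresses).

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class ForwardedForWalker
{
    // Assumption: "private" means the RFC 1918 IPv4 ranges only.
    static bool IsPrivateV4(IPAddress ip)
    {
        if (ip.AddressFamily != AddressFamily.InterNetwork) return false;
        byte[] b = ip.GetAddressBytes();
        return b[0] == 10
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)
            || (b[0] == 192 && b[1] == 168);
    }

    // Walks the comma-separated entries from right to left and returns the
    // first one that parses as an IP address and is neither loopback nor private.
    public static string RightmostPublic(string headerValue)
    {
        if (string.IsNullOrEmpty(headerValue)) return null;

        string[] parts = headerValue.Split(',');
        for (int i = parts.Length - 1; i >= 0; i--)
        {
            IPAddress ip;
            if (IPAddress.TryParse(parts[i].Trim(), out ip)
                && !IPAddress.IsLoopback(ip)
                && !IsPrivateV4(ip))
            {
                return ip.ToString();
            }
        }
        return null; // no public address found
    }
}
```

The idea behind starting from the right is that the rightmost entries were appended by proxies closer to your server, so they are harder for a client to forge than the leftmost ones.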
I'm trying to extract the domain name from a string in C#. You don't necessarily have to use a RegEx but we should be able to extract yourdomain.com from all of the following:
yourdomain.com
www.yourdomain.com
http://www.yourdomain.com
http://www.yourdomain.com/
store.yourdomain.com
http://store.yourdomain.com
whatever.youdomain.com
*.yourdomain.com
Also, any TLD is acceptable, so replace .com in all of the above with .net, .org, .co.uk, etc.
If no scheme present (no colon in string), prepend "http://" to make it a valid URL.
Pass string to Uri constructor.
Access the Uri's Host property.
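The three steps above can be sketched like this. The class and method names are hypothetical, and `Uri.TryCreate` is used in place of the constructor so that unparseable input returns null instead of throwing.

```csharp
using System;

static class HostExtractor
{
    public static string GetHost(string input)
    {
        if (string.IsNullOrEmpty(input)) return null;

        // Step 1: no colon means no scheme, so prepend one.
        string candidate = input.IndexOf(':') >= 0 ? input : "http://" + input;

        // Steps 2 and 3: parse as an absolute URI and read its Host.
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri) ? uri.Host : null;
    }
}
```

With this sketch, `GetHost("www.yourdomain.com")` and `GetHost("http://www.yourdomain.com/")` both give `www.yourdomain.com`. Note that the colon test is only a heuristic: an input such as `host:8080` contains a colon but no scheme, so it would need separate handling.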
Now you have the hostname. What exactly you consider the ‘domain name’ of a given hostname is a debatable point. I'm guessing you don't simply mean everything after the first dot.
It's not possible to distinguish hostnames like ‘whatever.youdomain.com’ from domains-in-an-SLD like ‘warwick.ac.uk’ from just the strings. Indeed, there is even a bit of grey area about what is and isn't a public SLD, given the efforts of some registrars to carve out their own niches.
A common approach is to maintain a big list of SLDs and other suffixes used by unrelated entities. This is what web browsers do to stop unwanted public cookie sharing. Once you've found a public suffix, you could add the one nearest prefix in the host name split by dots to get the highest-level entity responsible for the given hostname, if that's what you want. Suffix lists are hell to maintain, but you can piggy-back on someone else's efforts.
Alternatively, if your app has the time and network connection to do it, it could start sniffing for information on the hostname. eg. it could do a whois query for the hostname, and keep looking at each parent until it got a result and that would be the domain name of the lowest-level entity responsible for the given hostname.
Or, if all that's too much work, you could try just chopping off any leading ‘www.’ present!
I would recommend trying this yourself, using Regulator and a regex cheat sheet.
http://sourceforge.net/projects/regulator/
http://regexlib.com/CheatSheet.aspx
You can also find some good information on regular expressions at Coding Horror.
Have a look at this other answer. It was for PHP but you'll easily get the regex out of the 4-5 lines of PHP and you can benefit from the discussion that followed (see Alnitak's answer).
A regex doesn't really fit your requirement of "any TLD", since the format and number of TLDs is quite large and continually in flux. If you limited your scope to:
(?<domain>[^\.]+\.([A-Z]+$|co\.[A-Z]$))
You would catch .anything and .co.anything, which I imagine covers most realistic cases...
I'm using IPAddress.TryParse() to parse IP addresses. However, it's a little too permissive (parsing "1" returns 0.0.0.1). I'd like to limit the input to dotted octet notation. What's the best way to do this?
(Note: I'm using .NET 2.0)
Edit
Let me clarify:
I'm writing an app that will scan a range of IPs looking for certain devices (basically a port scanner). When the user enters "192.168.0.1" for the starting address, I want to automatically fill in "192.168.0.255" as the ending address. The problem is that when they type "1", it parses as "0.0.0.1" and the ending address fills in as "0.0.0.255" - which looks goofy.
If you are interested in parsing the format, then I'd use a regular expression. Here's a good one (source):
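bool IsDottedDecimalIP(string possibleIP)
{
// The regex enforces the dotted-quad shape; TryParse then rejects out-of-range octets.
Regex r = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");
System.Net.IPAddress parsed; // .NET 2.0 has no inline "out" declarations
return r.IsMatch(possibleIP) && System.Net.IPAddress.TryParse(possibleIP, out parsed);
}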
That regex doesn't catch invalid IPs but does enforce your pattern. The TryParse checks their validity.
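If you would rather avoid the regex, a rough alternative is to split on dots and parse each octet yourself. This is a sketch, not part of the answer above; the names are hypothetical, and `EndOfRange` assumes the "last octet becomes 255" behaviour described in the question. It uses only APIs available in .NET 2.0.

```csharp
using System;
using System.Globalization;

static class DottedQuad
{
    // True only for exactly four dot-separated decimal octets in 0..255.
    public static bool TryParse(string text, out byte[] octets)
    {
        octets = null;
        if (text == null) return false;

        string[] parts = text.Split('.');
        if (parts.Length != 4) return false;

        byte[] result = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            // NumberStyles.None allows digits only: no sign, no whitespace.
            if (!byte.TryParse(parts[i], NumberStyles.None,
                               CultureInfo.InvariantCulture, out result[i]))
                return false;
        }
        octets = result;
        return true;
    }

    // Returns the start address with its last octet set to 255, or null
    // when the input is not in dotted-quad form.
    public static string EndOfRange(string start)
    {
        byte[] o;
        if (!TryParse(start, out o)) return null;
        return o[0] + "." + o[1] + "." + o[2] + ".255";
    }
}
```

Here `EndOfRange("192.168.0.1")` gives `192.168.0.255`, while `EndOfRange("1")` gives `null`, so the ending field can simply stay empty until the user has typed a full address. Octets with leading zeros such as `01` are accepted and read as decimal.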
An IP address is actually a 32 bit number - it is not xxx.xxx.xxx.xxx - that's just a human readable format for the same. So IP address 1 is actually 0.0.0.1.
EDIT: Given the clarification, you could either go with a regex as has been suggested, or you could expand the shortcuts to your liking. So if you want "1" to appear as "1.0.0.0", you could append the missing octets and still use the parse method.