Regular expression that matches all valid format IPv6 addresses - c#

At first glance, I concede that this question looks like a duplicate of this question and any other related to it:
Regular expression that matches valid IPv6 addresses
That question in fact has an answer that nearly answers my question, but not fully.
The code from that question which I have issues with, yet had the most success with, is as shown below:
private string RemoveIPv6(string sInput)
{
string pattern = #"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
//That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
//if (IsCompressedIPv6(sInput))
// sInput = UncompressIPv6(sInput);
string output = Regex.Replace(sInput, pattern, "");
if (output.Contains("Addresses"))
output = output.Substring(0, "Addresses: ".Length);
return output;
}
The issues I had with the regex pattern as provided in this answer, David M. Syzdek's Answer, is that it doesn't match and remove the full form of the IPv6 addresses I'm throwing at it.
I'm using the regex pattern to mainly replace IPv6 addresses in strings with blanks or null value.
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
All either don't get fully matched by the regex, or failed to be completely matched.
The regex will return me the remaining parts of the string as shown below:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As can be seen, it has left remnants of the IPv6 addresses, which is hard to detect and remove, due to the varying formats that the remnants take on. Below is the regex pattern by itself for better analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Therefore, my question is, how can this regex pattern be corrected so it can match, and therefore allow the complete removal of any IPv6 addresses, from a string that doesn't solely contain the IPv6 address(es) itself?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I am not using the built-in function to resolve DNS entries for a good reason, which I don't think will matter for the moment, therefore I am using nslookup.
As for the code that is calling that function, if required, is as below: (It itself is also another function/method, or rather part of one)
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
while (reader.Peek() != -1)
{
if (LinesRead > 3)
{
tempRead = reader.ReadLine();
tempRead = RemoveIPv6(tempRead);
if (tempRead.Contains("Addresses"))
output += tempRead;
else if (lastRead.Contains("Addresses"))
output += tempRead.Trim() + Environment.NewLine;
else
output += tempRead + Environment.NewLine;
lastRead = tempRead;
}
else
garbagecan = reader.ReadLine();
LinesRead++;
}
}
return output;
The corrected regex should only allow the removal of IPv6 addresses, and leave IPv4 addresses untouched. The string that will be passed to the regex will not contain the IPv6 address(es) alone, and will almost always contain other details, and as such, it is unpredictable at which index will the addresses appear. The regex is also skipping all other IPv6 addresses after the first occuring IPv6 addresses as well for some reason, it should be noted.
Apologies if there are any missing details, I will try my best to include them in when alerted. I would also prefer working code samples, if possible, as I have almost zero knowledge regarding regex.

(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
Using lookarounds you can enforce a complete match rather than a partial match.See demo.
https://regex101.com/r/cT0hV4/5

(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
Demo: Regex101
Github Repository

Related

Why does EmailAddressAttribute.IsValid and MailAddress think that emails which contain "ª" are valid? [duplicate]

This question already has answers here:
Can an email address contain international (non-english) characters?
(7 answers)
Closed 1 year ago.
I have this C# code:
void Main()
{
// method 1 - using MailAddress
var email = "fooªbar#cander.com";
Console.WriteLine(IsValidEmail(email));
// method 2 - using EmailAddressAttribute
var validator = new System.ComponentModel.DataAnnotations.EmailAddressAttribute();
Console.WriteLine(validator.IsValid(email));
}
bool IsValidEmail(string email)
{
try
{
var addr = new System.Net.Mail.MailAddress(email);
return addr.Address == email;
}
catch
{
return false;
}
}
That validates the fooªbar#cander.com email address. And... It validates it althougt it has the "ª" symbol. Why? According to: What characters are allowed in an email address? it shoudn't be valid
It validates it althougt it has the "ª" symbol. Why?
Because your Regex allows "one or more \word characters" before the #, and ª is a word character:
RegexStorm uses the .net engine: you can see that the \w pattern (a single word character) has successfully matched an ª (one match)
According to: What characters are allowed in an email address? it shoudn't be valid
Alas, the regular expression you have used does not accurately implement the specification given in the linked question
When it comes to validating email addresses, genuinely I don't think you should try and control it to a very fine degree - it's a headache to form and maintain a complex Regex that considers every variation and it doesn't really bring much benefit, it just generates a pain point for users whose valid emails don't validate because of a bug in your Regex.
When we test for email validity, we basically only check that it contains an #.. what's the worst that can happen if a user types it in wrong?
(apologies if that picture appears huge; it looks reasonable on a cellphone but I recall that iPhone screenshots sometimes end up looking a bit oversized on web)

How do you get an email address within a string

I am pulling many emails from an Exchange 2003 server and from those emails, trying to determine which are bounce-backs (invalid) so I can remove them from our contacts.
What would be the most efficient method of searching the email bodies to find email addresses on the bounce backs?
You might want to look at this page, which has several variants of regexes for matching email addresses and explains the trade-offs for selecting each. You should definitely read it before picking one here.
Just use a regex.
\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b
This is the regex that we use in a lot of our applications for email validation;
public static bool CheckEmail(string email)
{
//validate Email
Regex regex = new Regex(#"^([a-zA-Z0-9_\-\.\']+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$", RegexOptions.IgnoreCase);
Match match = regex.Match(email);
return match.Success;
}
The actual process for correctly identifying a bounced email, rather than an auto-reply or genuine message is a little more complicated, but this will at least give you the email address.
I pulled a few of the answers here into something like this. It actually returns each email address from the string (sometimes there are multiples from the mail host and target address). I can then match each of the email addresses up against the outbound addresses we sent, to verify. I used the article from #plinth to get a better understanding of the regular expression and modified the code from #Chris Bint
However, I'm still wondering if this is the fastest way to monitor 10,000+ emails? Are there any more efficient methods (while still using c#)? The live code won't recreate the Regex object every time within the loop.
public static MatchCollection CheckEmail(string email)
{
Regex regex = new Regex(#"\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(email);
return matches;
}

c# regex - matching optionals after a named group

I'm sure this has been quite numerous times but though i've checked all similar questions, i couldn't come up with a solution.
The problem is that i've an input urls similar to;
http://www.justin.tv/peacefuljay
http://www.justin.tv/peacefuljay#/w/778713616/3
http://de.justin.tv/peacefuljay#/w/778713616/3
I want to match the slug part of it (in above examples, it's peacefuljay).
Regex i've tried so far are;
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)?
http://.*\.justin\.tv/(?<Slug>.*)(?:#.)
But i can't come with a solution. Either it fails in the first url or in others.
Help appreciated.
The easiest way of parsing a Uri is by using the Uri class:
string justin = "http://www.justin.tv/peacefuljay#/w/778713616/3";
Uri uri = new Uri(justin);
string s1 = uri.LocalPath; // "/peacefuljay"
string s2 = uri.Segments[1]; // "peacefuljay"
If you insisnt on a regex, you can try someting a bit more specific:
Match mate = Regex.Match(str, #"http://(\w+\.)*justin\.tv(?:/(?<Slug>[^#]*))?");
(\w+\.)* - Ensures you match the domain, not anywhere else in the string (eg, hash or query string).
(?:/(?<Slug>[^#]*))? - Optional group with the string you need. [^#] limits the characters you expect to see in your slug, so it should eliminate the need of the extra group after it.
As I see it there's no reason to treat to the parts after the "slug".
Therefore you only need to match all characters after the host that aren't "/" or "#".
http://.*\.justin\.tv/(?<Slug>[^/#]+)
http://.*\.justin\.tv/(?<Slug>.*)#*?
or
http://.*\.justin\.tv/(?<Slug>.*)(#|$)

C# Extracting a name from a string

I want to extract 'James\, Brown' from the string below but I don't always know what the name will be. The comma is causing me some difficuly so what would you suggest to extract James\, Brown?
OU=James\, Brown,OU=Test,DC=Internal,DC=Net
Thanks
A regex is likely your best approach
static string ParseName(string arg) {
var regex = new Regex(#"^OU=([a-zA-Z\\]+\,\s+[a-zA-Z\\]+)\,.*$");
var match = regex.Match(arg);
return match.Groups[1].Value;
}
You can use a regex:
string input = #"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
Match m = Regex.Match(input, "^OU=(.*?),OU=.*$");
Console.WriteLine(m.Groups[1].Value);
A quite brittle way to do this might be...
string name = #"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string[] splitUp = name.Split("=".ToCharArray(),3);
string namePart = splitUp[1].Replace(",OU","");
Console.WriteLine(namePart);
I wouldn't necessarily advocate this method, but I've just come back from a departmental Christmas lyunch and my brain is not fully engaged yet.
I'd start off with a regex to split up the groups:
Regex rx = new Regex(#"(?<!\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
String[] segments = rx.Split(test);
But from there I would split up the parameters in the array by splitting them up manually, so that you don't have to use a regex that depends on more than the separator character used. Since this looks like an LDAP query, it might not matter if you always look at params[0], but there is a chance that the name might be set as "CN=". You can cover both cases by just reading the query like this:
String name = segments[0].Split('=', 2)[1];
That looks suspiciously like an LDAP or Active Directory distinguished name formatted according to RFC 2253/4514.
Unless you're working with well known names and/or are okay with a fragile hackaround (like the regex solutions) - then you should start by reading the spec.
If you, like me, generally hate implementing code according to RFCs - then hope this guy did a better job following the spec than you would. At least he claims to be 2253 compliant.
If the slash is always there, I would look at potentially using RegEx to do the match, you can use a match group for the last and first names.
^OU=([a-zA-Z])\,\s([a-zA-Z])
That RegEx will match names that include characters only, you will need to refine it a bit for better matching for the non-standard names. Here is a RegEx tester to help you along the way if you go this route.
Replace \, with your own preferred magic string (perhaps & #44;), split on remaining commas or search til the first comma, then replace your magic string with a single comma.
i.e. Something like:
string originalStr = #"OU=James\, Brown,OU=Test,DC=Internal,DC=Net";
string replacedStr = originalStr.Replace("\,", ",");
string name = replacedStr.Substring(0, replacedStr.IndexOf(","));
Console.WriteLine(name.Replace(",", ","));
Assuming you're running in Windows, use PInvoke with DsUnquoteRdnValueW. For code, see my answer to another question: https://stackoverflow.com/a/11091804/628981
If the format is always the same:
string line = GetStringFromWherever();
int start = line.IndexOf("=") + 1;//+1 to get start of name
int end = line.IndexOf("OU=",start) -1; //-1 to remove comma
string name = line.Substring(start, end - start);
Forgive if syntax is not quite right - from memory. Obviously this is not very robust and fails if the format ever changes.

IP address parsing in .NET

I'm using IPAddress.TryParse() to parse IP addresses. However, it's a little too permissive (parsing "1" returns 0.0.0.1). I'd like to limit the input to dotted octet notation. What's the best way to do this?
(Note: I'm using .NET 2.0)
Edit
Let me clarify:
I'm writing an app that will scan a range of IPs looking for certain devices (basically a port scanner). When the user enters "192.168.0.1" for the starting address, I want to automatically fill in "192.168.0.255" as the ending address. The problem is that when they type "1", it parses as "0.0.0.1" and the ending address fills in as "0.0.0.255" - which looks goofy.
If you are interested in parsing the format, then I'd use a regular expression. Here's a good one (source):
bool IsDottedDecimalIP(string possibleIP)
{
Regex R = New Regex(#"\b(?:\d{1,3}\.){3}\d{1,3}\b");
return R.IsMatch(possibleIP) && Net.IPAddress.TryParse(possibleIP, null);
}
That regex doesn't catch invalid IPs but does enforce your pattern. The TryParse checks their validity.
An IP address is actually a 32 bit number - it is not xxx.xxx.xxx.xxx - that's just a human readable format for the same. So IP address 1 is actually 0.0.0.1.
EDIT: Given the clarification, you could either go with a regex as has been suggested, or you could format the short cuts to your liking, so if you want "1" to appears as "1.0.0.0". you could append that and still use the parse method.

Categories

Resources