At first glance, I concede that this question looks like a duplicate of this question and any other related to it:
Regular expression that matches valid IPv6 addresses
That question in fact has an answer that nearly answers my question, but not fully.
The code from that question which I have issues with, yet had the most success with, is as shown below:
private string RemoveIPv6(string sInput)
{
string pattern = #"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
//That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
//if (IsCompressedIPv6(sInput))
// sInput = UncompressIPv6(sInput);
string output = Regex.Replace(sInput, pattern, "");
if (output.Contains("Addresses"))
output = output.Substring(0, "Addresses: ".Length);
return output;
}
The issues I had with the regex pattern as provided in this answer, David M. Syzdek's Answer, is that it doesn't match and remove the full form of the IPv6 addresses I'm throwing at it.
I'm using the regex pattern to mainly replace IPv6 addresses in strings with blanks or null value.
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
All either don't get fully matched by the regex, or failed to be completely matched.
The regex will return me the remaining parts of the string as shown below:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As can be seen, it has left remnants of the IPv6 addresses, which is hard to detect and remove, due to the varying formats that the remnants take on. Below is the regex pattern by itself for better analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Therefore, my question is, how can this regex pattern be corrected so it can match, and therefore allow the complete removal of any IPv6 addresses, from a string that doesn't solely contain the IPv6 address(es) itself?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I am not using the built-in function to resolve DNS entries for a good reason, which I don't think will matter for the moment, therefore I am using nslookup.
As for the code that is calling that function, if required, is as below: (It itself is also another function/method, or rather part of one)
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
while (reader.Peek() != -1)
{
if (LinesRead > 3)
{
tempRead = reader.ReadLine();
tempRead = RemoveIPv6(tempRead);
if (tempRead.Contains("Addresses"))
output += tempRead;
else if (lastRead.Contains("Addresses"))
output += tempRead.Trim() + Environment.NewLine;
else
output += tempRead + Environment.NewLine;
lastRead = tempRead;
}
else
garbagecan = reader.ReadLine();
LinesRead++;
}
}
return output;
The corrected regex should only allow the removal of IPv6 addresses, and leave IPv4 addresses untouched. The string that will be passed to the regex will not contain the IPv6 address(es) alone, and will almost always contain other details, and as such, it is unpredictable at which index will the addresses appear. The regex is also skipping all other IPv6 addresses after the first occuring IPv6 addresses as well for some reason, it should be noted.
Apologies if there are any missing details, I will try my best to include them in when alerted. I would also prefer working code samples, if possible, as I have almost zero knowledge regarding regex.
(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
Using lookarounds you can enforce a complete match rather than a partial match.See demo.
https://regex101.com/r/cT0hV4/5
(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
Demo: Regex101
Github Repository
Good day~
I am not really good at this regular expression.
So I need your help, please.
Condition:
Users can input their email addressed and name together.
I want to extract email address and user name out of string.
string pattern1 = "Peter Jackson<peter#jackson.com>";
From that string I want to get "Peter Jackson" and "<peter#jackson.com>".
string pattern2 = "Peter Jackson(peter#jackson.com)";
However, people always make mistakes like below.
And they can also use "[" instead of "<".
so...
string pattern3 = "Peter Jackson[peter#jackson.com]";
Even some stupid users can input like...
string pattern4 = "Peter Jackson{peter#jackson.com}";
So, I had to look for the characters which are "<", "(", "[" and "{".
I tried
string regularExpressionPattern = #"^(<|(|[|{)(.*?)^(}|]|)|>)";
But I think I've done something wrong.
And I also try to think that people could input more mistake like....
string pattern5 = "Peter Jackson<peter#jackson.com>mistake";
Could anyone help this problem?
Advanced thanks.
PS: I know how to split string with a character. So it won't help. I needa proper regular expression.
I believe the regular expression you are looking for is as follows:
(.*?)[<([{](.*?)[>)\]}]
You would want group 1 and group 2.
(.*?)[([<{](.*?)[)\]>}]
I believe this should be adequate.
The name will be in the first captured group, and the email will be in the second.
Seems like (.*)[<\(\[{](.*)[>\)]}] works for me.
Groups for the name, and the email address as well.
http://regexr.com?33sq4 is my test.
I am pulling many emails from an Exchange 2003 server and from those emails, trying to determine which are bounce-backs (invalid) so I can remove them from our contacts.
What would be the most efficient method of searching the email bodies to find email addresses on the bounce backs?
You might want to look at this page, which has several variants of regexes for matching email addresses and explains the trade-offs for selecting each. You should definitely read it before picking one here.
Just use a regex.
\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b
This is the regex that we use in a lot of our applications for email validation;
public static bool CheckEmail(string email)
{
//validate Email
Regex regex = new Regex(#"^([a-zA-Z0-9_\-\.\']+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$", RegexOptions.IgnoreCase);
Match match = regex.Match(email);
return match.Success;
}
The actual process for correctly identifying a bounced email, rather than an auto-reply or genuine message is a little more complicated, but this will at least give you the email address.
I pulled a few of the answers here into something like this. It actually returns each email address from the string (sometimes there are multiples from the mail host and target address). I can then match each of the email addresses up against the outbound addresses we sent, to verify. I used the article from #plinth to get a better understanding of the regular expression and modified the code from #Chris Bint
However, I'm still wondering if this is the fastest way to monitor 10,000+ emails? Are there any more efficient methods (while still using c#)? The live code won't recreate the Regex object every time within the loop.
public static MatchCollection CheckEmail(string email)
{
Regex regex = new Regex(#"\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(email);
return matches;
}
I'm a complete newbie to RegEx and I'm sure it'll be brilliant to use once I know how to use it. :P
I have a couple of textBoxes and I was wondering if anyone could me acomplish what I need.
In the EMail textbox, I'd like to make sure the user writes in a valid email. xxx#yyy.zzz
Is there a way for RegEx to help me out?
I'd also really like a way to format the name the user writes down. So if a user writes in "SerGIo TAPIA gutTIerrez I want to format that string (behind the scenes before saving it) to "Sergio Tapia Gutierrez" Can RegEx do this?
Thanks so much SO.
(inb4 Rex :P )
A complete and accurate regex for email validation is surprisingly difficult, I trust you can use google to find some examples.
The general rule for email validation is to actually try to send an email.
Well, this is an easy one! :)
no, there exists no regex that can validate* e-mail addresses;
no, regex cannot transform "SerGIo TAPIA gutTIerrez" into "Sergio Tapia Gutierrez". Sure, some language like Perl (and other perhaps) can mix-in some fancy stuff inside regex-es to do this, but it is not regex that actually performs the transformation. Regex only matches text, plain and simple.
* by 'valid' I mean see if the address actually exists.
This is one way, but there are many others.
public static bool isEmail(string emailAddress)
{
if(string.IsNullOrEmpty(emailAddress))
return false;
Regex EmailAddress = new Regex(#"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*#([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$");
return EmailAddress.IsMatch(emailAddress);
}
http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
public bool TestEmailRegex(string emailAddress)
{
// string patternLenient = #"\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*";
// Regex reLenient = new Regex(patternLenient);
string patternStrict = #"^(([^<>()[\]\\.,;:\s#""]+"
+ #"(\.[^<>()[\]\\.,;:\s#""]+)*)|("".+""))#"
+ #"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
+ #"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
+ #"[a-zA-Z]{2,}))$";
Regex reStrict = new Regex(patternStrict);
// bool isLenientMatch = reLenient.IsMatch(emailAddress);
// return isLenientMatch;
bool isStrictMatch = reStrict.IsMatch(emailAddress);
return isStrictMatch;
}