Regex to replace email addresses - c#

I want to replace email addresses in a string with something else, but it does not work for me.
string body = "this is a test abc@emailadx.com";
string pattern = @"\b[!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,6}\b";
Regex.Replace(body, pattern, "Hidden Email Address");
return body;
Any hints would be appreciated.

You want to do this:
return Regex.Replace(body, pattern, "Hidden Email Address");
If you look at the documentation for Regex.Replace, you'll see that it returns the newly replaced string. It does not affect the string that was passed in.
NOTE: this assumes you're using C#, which I'm guessing from the syntax.
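FURTHERMORE: If your regex still isn't working well, try this one from the Regular Expressions Cookbook (by Goyvaerts & Levithan):
@"^[\w!#$%&'*+/=?`{|}~^.-]+@[A-Z0-9.-]+$"
Use it with RegexOptions.IgnoreCase, since it only lists uppercase letters. It is also anchored with ^ and $, so it checks a whole string; to find addresses inside a larger string, drop the anchors.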
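Putting the fix together, a minimal sketch might look like this. It assumes the question's pattern (with the @ restored) is good enough; the class name EmailMasker and the added RegexOptions.IgnoreCase are illustrative choices, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // Pattern from the question, with '@' restored. RegexOptions.IgnoreCase is
    // an addition so that upper-case addresses are caught as well.
    private static readonly Regex EmailPattern = new Regex(
        @"\b[!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,6}\b",
        RegexOptions.IgnoreCase);

    // Regex.Replace never modifies its input (strings are immutable);
    // it returns a new string, and that new string is what must be returned.
    public static string HideEmails(string body)
    {
        return EmailPattern.Replace(body, "Hidden Email Address");
    }
}
```

For example, HideEmails("this is a test abc@emailadx.com") returns "this is a test Hidden Email Address", while the original body variable is unchanged.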

Related

Why do EmailAddressAttribute.IsValid and MailAddress think that emails which contain "ª" are valid? [duplicate]

This question already has answers here:
Can an email address contain international (non-english) characters?
(7 answers)
Closed 1 year ago.
I have this C# code:
void Main()
{
    // method 1 - using MailAddress
    var email = "fooªbar@cander.com";
    Console.WriteLine(IsValidEmail(email));

    // method 2 - using EmailAddressAttribute
    var validator = new System.ComponentModel.DataAnnotations.EmailAddressAttribute();
    Console.WriteLine(validator.IsValid(email));
}

bool IsValidEmail(string email)
{
    try
    {
        var addr = new System.Net.Mail.MailAddress(email);
        return addr.Address == email;
    }
    catch
    {
        return false;
    }
}
Both methods validate the fooªbar@cander.com email address, even though it contains the "ª" symbol. Why? According to "What characters are allowed in an email address?", it shouldn't be valid.
It validates it although it has the "ª" symbol. Why?
Because your regex allows "one or more \w (word) characters" before the @, and ª is a word character.
RegexStorm uses the .NET engine, and testing there shows that the \w pattern (a single word character) successfully matches an ª (one match).
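The same behavior can be checked directly in C#. This sketch relies on two documented .NET behaviors: by default \w matches any Unicode letter (ª, U+00AA, is one), and RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9]. The class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET semantics: \w covers Unicode letters, so 'ª' (U+00AA) matches.
    public static bool UnicodeWordMatches(string s) =>
        Regex.IsMatch(s, @"^\w+$");

    // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z_0-9].
    public static bool AsciiWordMatches(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
}
```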
According to "What characters are allowed in an email address?", it shouldn't be valid.
Alas, the regular expression you have used does not accurately implement the specification given in the linked question.
When it comes to validating email addresses, I genuinely don't think you should try to control it to a very fine degree. A complex regex that considers every variation is a headache to write and maintain, and it doesn't bring much benefit; it just creates a pain point for users whose valid addresses are rejected because of a bug in your regex.
When we test for email validity, we basically only check that the address contains an @. What's the worst that can happen if a user types it in wrong?
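A minimal sketch of that lenient approach, assuming "an @ with something on either side of it" is enough for your form. The method name LooksLikeEmail is hypothetical.

```csharp
using System;

public static class LenientEmailCheck
{
    // Deliberately loose: require an '@' that is neither the first nor the
    // last character, and leave everything else to the mail system.
    public static bool LooksLikeEmail(string s)
    {
        if (string.IsNullOrWhiteSpace(s))
            return false;
        int first = s.IndexOf('@');
        int last = s.LastIndexOf('@');
        return first > 0 && last < s.Length - 1;
    }
}
```

It does not look for a single @, because quoted local parts may legally contain one.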

Regular expression that matches all valid format IPv6 addresses

At first glance, I concede that this question looks like a duplicate of this question and any other related to it:
Regular expression that matches valid IPv6 addresses
That question in fact has an answer that nearly answers my question, but not fully.
The code from that question that I've had the most success with, but still have issues with, is shown below:
private string RemoveIPv6(string sInput)
{
    string pattern = @"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
    //That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690

    //if (IsCompressedIPv6(sInput))
    //    sInput = UncompressIPv6(sInput);
    string output = Regex.Replace(sInput, pattern, "");
    if (output.Contains("Addresses"))
        output = output.Substring(0, "Addresses: ".Length);
    return output;
}
The issue I have with the regex pattern from David M. Syzdek's answer is that it doesn't match and remove the full IPv6 addresses I'm throwing at it.
I'm mainly using the regex pattern to replace IPv6 addresses in strings with blanks (empty strings).
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
None of these are fully matched by the regex.
After the replacement, the remaining parts of the strings are:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As you can see, it leaves remnants of the IPv6 addresses, which are hard to detect and remove because they take on varying formats. Below is the regex pattern by itself for easier analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
So my question is: how can this regex pattern be corrected so that it matches, and therefore allows the complete removal of, any IPv6 address in a string that contains more than just the IPv6 address(es)?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I have a good reason for not using the built-in function to resolve DNS entries, which I don't think matters for the moment, so I am using nslookup.
In case it's required, the code that calls that function is below (it is itself part of another method):
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
    while (reader.Peek() != -1)
    {
        if (LinesRead > 3)
        {
            tempRead = reader.ReadLine();
            tempRead = RemoveIPv6(tempRead);
            if (tempRead.Contains("Addresses"))
                output += tempRead;
            else if (lastRead.Contains("Addresses"))
                output += tempRead.Trim() + Environment.NewLine;
            else
                output += tempRead + Environment.NewLine;
            lastRead = tempRead;
        }
        else
            garbagecan = reader.ReadLine();
        LinesRead++;
    }
}
return output;
The corrected regex should remove only IPv6 addresses and leave IPv4 addresses untouched. The string passed to the regex will not contain the IPv6 address(es) alone; it will almost always contain other details, so it is unpredictable at which index the addresses will appear. It should also be noted that, for some reason, the regex skips all IPv6 addresses after the first one.
Apologies if any details are missing; I'll do my best to add them when alerted. I would also prefer working code samples if possible, as I know almost nothing about regex.
(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
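Using lookarounds, you can enforce a complete match rather than a partial match: the address must be preceded by whitespace or the start of the string, and followed by whitespace or the end of the string. See the demo: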
https://regex101.com/r/cT0hV4/5
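To see why the lookahead helps, here is a reduced sketch that keeps only two alternatives from the long pattern, in their original order. .NET tries alternatives from left to right and accepts the first one that succeeds, so the shorter alternative matches 2404:6800:4003:c02:: and leaves 8a behind; adding (?=\s|$) makes that short match fail and forces the engine on to the next alternative. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Two alternatives taken from the long IPv6 pattern, in the same order.
    private const string Alternatives =
        @"([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}";

    // Without an end condition, the first alternative that fits wins.
    public static string MatchPlain(string s) =>
        Regex.Match(s, Alternatives).Value;

    // Requiring whitespace or end of input afterwards rejects the short match,
    // so the second alternative is tried and consumes the whole address.
    public static string MatchWithLookahead(string s) =>
        Regex.Match(s, "(?:" + Alternatives + @")(?=\s|$)").Value;
}
```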
(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
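If a pure-regex solution is not required, another option (a different technique from the answers above) is to let the framework do the parsing: treat each run of non-whitespace as a token and drop the tokens that System.Net.IPAddress.TryParse recognizes as IPv6. This sketch assumes, as in the question's examples, that nslookup separates addresses from labels with whitespace. IPv4 addresses parse as AddressFamily.InterNetwork and are therefore kept; the class name is hypothetical.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Text.RegularExpressions;

public static class IPv6Stripper
{
    // Each run of non-whitespace characters is treated as a candidate token.
    private static readonly Regex Token = new Regex(@"\S+");

    // Replaces every token that parses as an IPv6 address with an empty string.
    // IPv4 addresses and ordinary words are kept, and whitespace is preserved.
    public static string RemoveIPv6(string input)
    {
        return Token.Replace(input, m =>
            IPAddress.TryParse(m.Value, out var ip)
                && ip.AddressFamily == AddressFamily.InterNetworkV6
                ? string.Empty
                : m.Value);
    }
}
```

Because surrounding whitespace is kept, a line such as "Addresses:  2404:6800:4003:c02::8a" comes back with trailing spaces; call Trim() if that matters.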

How to find string between some characters using regex in c#

Good day~
I am not very good with regular expressions, so I need your help, please.
Condition:
Users can enter their email address and name together.
I want to extract the email address and the user name from the string.
string pattern1 = "Peter Jackson<peter@jackson.com>";
From that string I want to get "Peter Jackson" and "<peter@jackson.com>".
However, people often make mistakes like the one below:
string pattern2 = "Peter Jackson(peter@jackson.com)";
They can also use "[" instead of "<", so...
string pattern3 = "Peter Jackson[peter@jackson.com]";
Some careless users might even enter something like...
string pattern4 = "Peter Jackson{peter@jackson.com}";
So I need to look for the characters "<", "(", "[" and "{".
I tried
string regularExpressionPattern = @"^(<|(|[|{)(.*?)^(}|]|)|>)";
But I think I've done something wrong.
I also expect that people could make further mistakes like:
string pattern5 = "Peter Jackson<peter@jackson.com>mistake";
Could anyone help with this problem?
Thanks in advance.
PS: I know how to split a string on a character, so that won't help. I need a proper regular expression.
I believe the regular expression you are looking for is as follows:
(.*?)[<([{](.*?)[>)\]}]
You would want group 1 and group 2.
(.*?)[([<{](.*?)[)\]>}]
I believe this should be adequate.
The name will be in the first captured group, and the email will be in the second.
Seems like (.*)[<\(\[{](.*)[>\)\]}] works for me.
Group 1 captures the name and group 2 the email address.
http://regexr.com?33sq4 is my test.
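Putting the first answer's pattern into C#, a sketch might look like this. The class and method names are hypothetical; the pattern is anchored with ^, the square brackets are escaped inside the character classes for clarity, and anything after the closing bracket (such as the "mistake" in pattern5) is ignored. It does not check that the opening and closing brackets are of the same kind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameEmailParser
{
    // Group 1: everything before the first opening bracket (the name).
    // Group 2: everything up to the next closing bracket (the email).
    private static readonly Regex Pattern = new Regex(@"^(.*?)[<(\[{](.*?)[>)\]}]");

    public static bool TryParse(string input, out string name, out string email)
    {
        Match m = Pattern.Match(input);
        name = m.Success ? m.Groups[1].Value.Trim() : string.Empty;
        email = m.Success ? m.Groups[2].Value.Trim() : string.Empty;
        return m.Success;
    }
}
```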

How do you get an email address within a string

I am pulling many emails from an Exchange 2003 server and from those emails, trying to determine which are bounce-backs (invalid) so I can remove them from our contacts.
What would be the most efficient method of searching the email bodies to find email addresses on the bounce backs?
You might want to look at this page, which has several variants of regexes for matching email addresses and explains the trade-offs for selecting each. You should definitely read it before picking one here.
Just use a regex (with RegexOptions.IgnoreCase, since the pattern only lists uppercase letters):
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
This is the regex we use for email validation in a lot of our applications:
public static bool CheckEmail(string email)
{
    //validate Email
    Regex regex = new Regex(@"^([a-zA-Z0-9_\-\.\']+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$", RegexOptions.IgnoreCase);
    Match match = regex.Match(email);
    return match.Success;
}
The actual process for correctly identifying a bounced email, rather than an auto-reply or genuine message is a little more complicated, but this will at least give you the email address.
I pulled a few of the answers here into something like this. It returns each email address in the string (sometimes there are several, from the mail host and the target address). I can then match each address against the outbound addresses we sent, to verify. I used the article from @plinth to get a better understanding of the regular expression and modified the code from @Chris Bint.
However, I'm still wondering whether this is the fastest way to monitor 10,000+ emails. Are there any more efficient methods (while still using C#)? The live code won't recreate the Regex object on every iteration of the loop.
public static MatchCollection CheckEmail(string email)
{
    Regex regex = new Regex(@"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
    MatchCollection matches = regex.Matches(email);
    return matches;
}
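On the performance question, a common approach is to build the Regex once in a static readonly field and reuse it for every message, optionally with RegexOptions.Compiled, which costs more at startup but can make repeated matching faster. Whether that matters for 10,000 messages is worth measuring rather than assuming. The class name and the case-insensitive de-duplication with a HashSet are illustrative additions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BounceScanner
{
    // Built once and shared across all messages. RegexOptions.Compiled trades
    // extra startup time for potentially faster repeated matching.
    private static readonly Regex EmailRegex = new Regex(
        @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the distinct addresses found in a message body (case-insensitive).
    public static ISet<string> ExtractAddresses(string body)
    {
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in EmailRegex.Matches(body))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```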

How can I use RegEx to make sure a valid email is written in my TextBox?

I'm a complete newbie to RegEx and I'm sure it'll be brilliant to use once I know how to use it. :P
I have a couple of TextBoxes and I was wondering if anyone could help me accomplish what I need.
In the Email TextBox, I'd like to make sure the user enters a valid email address, such as xxx@yyy.zzz.
Is there a way for RegEx to help me out?
I'd also really like a way to format the name the user writes down. So if a user writes "SerGIo TAPIA gutTIerrez", I want to format that string (behind the scenes, before saving it) to "Sergio Tapia Gutierrez". Can RegEx do this?
Thanks so much SO.
(inb4 Rex :P )
A complete and accurate regex for email validation is surprisingly difficult to write; I trust you can use Google to find some examples.
The general rule for email validation is to actually try to send an email.
Well, this is an easy one! :)
no, there exists no regex that can validate* e-mail addresses;
no, regex cannot transform "SerGIo TAPIA gutTIerrez" into "Sergio Tapia Gutierrez". Sure, some languages like Perl (and perhaps others) can mix fancy stuff into regexes to do this, but it is not the regex that actually performs the transformation. Regex only matches text, plain and simple.
* by 'valid' I mean see if the address actually exists.
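In C#, the capitalization is likewise done by ordinary code rather than by a pattern. One sketch uses TextInfo.ToTitleCase; the input is lowercased first because ToTitleCase leaves words that are entirely uppercase (which it treats as acronyms) unchanged, so "TAPIA" would otherwise stay as it is. Using the invariant culture is an assumption; a culture-specific TextInfo may suit some names better.

```csharp
using System;
using System.Globalization;

public static class NameFormatter
{
    // Lowercase everything, then capitalize the first letter of each word.
    public static string ToProperCase(string name)
    {
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        return textInfo.ToTitleCase(name.ToLowerInvariant());
    }
}
```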
This is one way, but there are many others.
public static bool isEmail(string emailAddress)
{
    if (string.IsNullOrEmpty(emailAddress))
        return false;
    Regex EmailAddress = new Regex(@"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$");
    return EmailAddress.IsMatch(emailAddress);
}
http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
public bool TestEmailRegex(string emailAddress)
{
    // string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
    // Regex reLenient = new Regex(patternLenient);
    string patternStrict = @"^(([^<>()[\]\\.,;:\s@""]+"
        + @"(\.[^<>()[\]\\.,;:\s@""]+)*)|("".+""))@"
        + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
        + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
        + @"[a-zA-Z]{2,}))$";
    Regex reStrict = new Regex(patternStrict);
    // bool isLenientMatch = reLenient.IsMatch(emailAddress);
    // return isLenientMatch;
    bool isStrictMatch = reStrict.IsMatch(emailAddress);
    return isStrictMatch;
}
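For the TextBox itself, a sketch could reuse the isEmail pattern above (with the @ restored) in a static Regex with a match timeout. The timeout is an added precaution, since nested quantifiers such as ([-.\w]*[0-9a-zA-Z])* can backtrack heavily on some inputs that don't match. The class name and the 250 ms limit are arbitrary choices; in WinForms you could call IsEmail from the TextBox's Validating event.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailTextBoxValidator
{
    // Pattern from the isEmail answer, compiled once with a match timeout
    // as a guard against excessive backtracking.
    private static readonly Regex EmailAddress = new Regex(
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250));

    public static bool IsEmail(string emailAddress)
    {
        if (string.IsNullOrEmpty(emailAddress))
            return false;
        try
        {
            return EmailAddress.IsMatch(emailAddress);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat input that takes too long to evaluate as invalid.
            return false;
        }
    }
}
```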
