Email entry regex validation - c#

I am using the following regex to validate an email address:
"^[-a-zA-Z0-9][-.a-zA-Z0-9]*@[-.a-zA-Z0-9]+(\.[-.a-zA-Z0-9]+)*\.(com|edu|info|gov|int|mil|net|org|biz|name|museum|coop|aero|pro|[a-zA-Z]{2})$"
Unfortunately, this does not allow email addresses with underscores (which I originally called hyphens). Ex.:
first_last@abc.com
How can I modify this to allow underscores?

_ is not hyphen, it is underscore. Hyphen is -
If it is okay to start an email address with an underscore, add _ to both of the character classes that appear before @
^[-a-zA-Z0-9_][-.a-zA-Z0-9_]*@...
If the email id cannot start with an _, add it only to the second character class:
^[-a-zA-Z0-9][-.a-zA-Z0-9_]*@...
That said, your regex has a couple of issues:
It accepts email addresses starting with a hyphen; is this intended? If not, remove the - from the first character class to make it [a-zA-Z0-9]
It accepts consecutive periods after the first character, thereby making 3...@example.com a valid id. Is this by design?
The RFC specification for email addresses is quite complicated. See these threads for more information. Also don't forget to check the one and only perfect and official regex for validating email addresses (be warned that you might find it a little longer than what sanity would suggest).
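To see the effect of each suggestion, here is a minimal sketch using Python's re module (chosen only for illustration; the question is about C#, and the character classes used here behave the same way in both engines). The names ORIGINAL, WITH_UNDERSCORE, STRICT_START and accepts are made up for this example.

```python
import re

# Shared tail of the pattern: everything from the @ onwards, as in the question.
TLDS = r"(com|edu|info|gov|int|mil|net|org|biz|name|museum|coop|aero|pro|[a-zA-Z]{2})"
DOMAIN = r"@[-.a-zA-Z0-9]+(\.[-.a-zA-Z0-9]+)*\." + TLDS + r"$"

# The question's pattern: no underscore allowed in the local part.
ORIGINAL = re.compile(r"^[-a-zA-Z0-9][-.a-zA-Z0-9]*" + DOMAIN)

# Underscore added to both character classes before the @.
WITH_UNDERSCORE = re.compile(r"^[-a-zA-Z0-9_][-.a-zA-Z0-9_]*" + DOMAIN)

# Hypothetical stricter variant: the first character must be a letter or digit.
STRICT_START = re.compile(r"^[a-zA-Z0-9][-.a-zA-Z0-9_]*" + DOMAIN)


def accepts(pattern, address):
    """Return True if the compiled pattern matches the address."""
    return pattern.match(address) is not None


if __name__ == "__main__":
    for addr in ["first_last@abc.com", "3...@example.com", "-a@example.com"]:
        print(addr, accepts(ORIGINAL, addr),
              accepts(WITH_UNDERSCORE, addr), accepts(STRICT_START, addr))
```

The sketch shows that adding the underscore fixes the original complaint, while the two issues listed above (leading hyphen, consecutive periods) remain unless the classes are tightened further.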

"^[-_a-zA-Z0-9][-_.a-zA-Z0-9]*@[-_.a-zA-Z0-9]+(\.[-_.a-zA-Z0-9]+)*\.(com|edu|info|gov|int|mil|net|org|biz|name|museum|coop|aero|pro|[a-zA-Z]{2})$"
Possibly?

^[-a-zA-Z0-9_][-.a-zA-Z0-9_]*@[-.a-zA-Z0-9]+(\.[-.a-zA-Z0-9]+)*\.(com|edu|info|gov|int|mil|net|org|biz|name|museum|coop|aero|pro|[a-zA-Z]{2})$
I added "_" to your two character classes.

Regular-expressions.info has a very good discussion of e-mail address validation by regex, including his preferred regex for "99% of all e-mail addresses in use today", and another to match e-mail addresses as defined by RFC-2822.
I won't do the author a disservice by copying his work here. But I do think it's worthy of a read since it's directly related to your question.

There is also an interesting blog post about email validation on Larry Osterman's website.
This is a followup post to the original post in which he attempts to generate a regular expression to validate an email address. His RegExp is:
string strRegex = @"^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}" +
@"\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\" +
@".)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
His notes:
The key thing to note in this grammar is that the local-part is almost free-form when it comes to the local part. And there are characters allowed in the local part like !, *, $, etc that are totally legal according to RFC2822 that aren't allowed.
and ...
Adi Oltean pointed out that V2 of the .Net framework contains the System.Net.MailAddress class which contains a built-in validator.
It looks like the System.Net.Mail.MailAddress constructor validates email addresses and you can catch a FormatException to ensure that the email is valid.

Extracting words from lines that match different patterns

I'm monitoring incoming e-mail subjects, and each subject may contain a specially formatted code which I use to reference something else down the line.
These codes can be anywhere within the string, and sometimes not at all - and so the problem I'm having is my lack of RegEx skills (which I assume is the best option for this solution?).
An example of a subject would be:
"Please refer to reference MZ5051CLA"
or
"Attention for Mr Danshi, RE. 11123MTX"
The codes I'm looking to extract in these scenarios are "MZ5051CLA" and "11123MTX".
The format of MZ5051CLA will be:
- Always starts with "MZ"
- Followed by a number
- Always ends with "CLA"
Is there a simple way to evaluate the subject as a whole and extract any words that match the codes only?
I've looked at various solutions to my problem here on SO, but they're either overly complicated or I can't quite relate.
Edit:
As ShashishChandra pointed out, the idea is to monitor multiple mailboxes, each with their own code formats. So my idea was to implement a regex setting for each mailbox.
Perhaps this was important to mention initially, since a solution to catch all formats in one regex won't work. Apologies for that.
Try this regex:
^.*(?:(MZ\d+CLA)|RE\.\s+(\d+MTX))$
The below regex would match only the first string MZ5051CLA
\bMZ\d+CLA\b
But this would match both strings, MZ5051CLA and 11123MTX:
\b[A-Z0-9]+$
All uppercase alphanumeric characters at the end of a line are matched.
This would get you an uppercase alphanumeric string at the end of a line (such as MZ5051CLA), or a number followed by MTX anywhere in the line:
(?:\b[A-Z0-9]+$|\b\d+MTX\b)
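Since the question's edit asks for one regex setting per mailbox, the word-boundary patterns can be kept in a lookup table. The following is only a sketch in Python; the mailbox names, the MAILBOX_PATTERNS table and the extract_codes helper are hypothetical, and the MTX pattern is an assumption based on the single example in the question.

```python
import re

# Hypothetical per-mailbox settings: mailbox name -> pattern for its code format.
MAILBOX_PATTERNS = {
    "mailbox_a": re.compile(r"\bMZ\d+CLA\b"),  # starts with MZ, digits, ends with CLA
    "mailbox_b": re.compile(r"\b\d+MTX\b"),    # assumed format: digits followed by MTX
}


def extract_codes(mailbox, subject):
    """Return every code in the subject that matches the mailbox's pattern."""
    return MAILBOX_PATTERNS[mailbox].findall(subject)


if __name__ == "__main__":
    print(extract_codes("mailbox_a", "Please refer to reference MZ5051CLA"))
    print(extract_codes("mailbox_b", "Attention for Mr Danshi, RE. 11123MTX"))
```

Because the patterns are not anchored to the start or end of the line, a code is found wherever it appears in the subject, and a subject without a code simply yields an empty list.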
Both Codes in One Pattern
It seems that the codes must include at least one uppercase letter and at least one digit. For that kind of pattern, a password-validation technique is commonly used, and I would suggest:
\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*
In the demo, see how only the correct groups are matched. Of course false positives are possible.
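A quick way to try the lookahead pattern outside the demo is sketched below in Python (the CODE and find_codes names are made up for the example). It also shows the kind of false positive mentioned above.

```python
import re

# \b                      start at a word boundary
# (?=[A-Z0-9]*[A-Z])      lookahead: the run must contain at least one uppercase letter
# [A-Z0-9]*[0-9][A-Z0-9]* the run itself, which must contain at least one digit
CODE = re.compile(r"\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*")


def find_codes(subject):
    """Return all runs of uppercase letters/digits that contain both a letter and a digit."""
    return CODE.findall(subject)


if __name__ == "__main__":
    print(find_codes("Please refer to reference MZ5051CLA"))
    print(find_codes("Attention for Mr Danshi, RE. 11123MTX"))
    print(find_codes("Meet on the 2ND floor"))  # false positive: 2ND
```

Words such as "RE" or "Mr" are skipped because they contain no digit, while anything that happens to mix uppercase letters and digits (like 2ND) is picked up as well.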
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
So, in that case if you don't mind false positives, then use: /^(?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+)$/. This will work well in general.

Regex to detect hyphen character

Live demo: http://regex101.com/r/wW6wC4
I'm trying to add a regex expression that allows email addresses like:
asdf.asdf@test-dom-a.com
([\w+\.]+@[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
^---- Thought this would allow hyphens...
what am I missing here?
Your pattern requires that the hyphen appears after a period. Try this instead:
([\w+.]+@[\w-]{1,})(\.)([0-9a-zA-Z.-]+)
Or more simply:
([\w+.]+@[\w.-]+)
Although the second pattern doesn't require that the second part of the address contains a period.
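The difference between the three patterns can be checked with a short Python sketch (the ORIGINAL, FIXED and SIMPLE names are made up here; fullmatch is used so that the whole address has to match, which is an assumption about how the pattern will be applied).

```python
import re

# The question's pattern: [\w] before the first period does not include "-".
ORIGINAL = re.compile(r"([\w+\.]+@[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})")

# Hyphen added to the class before the first period.
FIXED = re.compile(r"([\w+.]+@[\w-]{1,})(\.)([0-9a-zA-Z.-]+)")

# Simpler variant: does not insist on a period in the domain.
SIMPLE = re.compile(r"([\w+.]+@[\w.-]+)")

address = "asdf.asdf@test-dom-a.com"

if __name__ == "__main__":
    print(ORIGINAL.fullmatch(address))      # None: "test" is followed by "-", not "."
    print(FIXED.fullmatch(address).groups())
    print(SIMPLE.fullmatch("user@localhost") is not None)
```

With the fixed pattern, the first group captures everything up to the first period of the domain, hyphens included.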
Your hyphen code appears in the segment that checks characters after the first period in the domain name. You need to add it to the match block before the domain name:
([\w+\.]+@[\w\-]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
^^---- check here as well.
In reality, I would search for a more comprehensive email regex - the one you have doesn't seem robust enough IMHO.
Your regex:
([\w+\.]+@[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
This allows a hyphen only in the part of the domain after the first period.
To allow it anywhere use:
^([\w+.-]+@[\w-]+)(\.)([0-9a-zA-Z.-]+)$
Or, to allow it only in the middle (not in the first or last position), use:
^[\w+.-]*@\w[\w-]*\.[\w-]*[0-9a-zA-Z.]+$
Working Demo: http://regex101.com/r/lQ1nV7
Your pattern only accepts hyphens in strings of the form "asd@fge.hj-kl", which, as you can see, is not what you want. Try one of these instead:
([\w+\.]+)@([0-9a-zA-Z\.\-]{1,})\.com
([\w+\.]+)@([0-9a-zA-Z\.\-]{1,})\.([\w]{1,})

Incorporating length validation into an existing email address format regex

I am using this regex in C# for email format validation based on information from http://www.regular-expressions.info/email.html:
Regex.IsMatch("test@gmail.com",
@"^[A-Z0-9'._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$", RegexOptions.IgnoreCase);
I would also like to validate that the total length of the email is between 5 - 254 characters. How should this regex be modified to satisfy the length condition?
I don't want to check the length of the string in C# explicitly.
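Although it's probably cleaner to just do the length checks separately, you can incorporate your length constraint by adding a lookahead to the start of your expression. Note the $ inside the lookahead: without it, the lookahead only checks that at least 5 characters are present and never enforces the upper bound.
^(?=.{5,254}$)[A-Z0-9'._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$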
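The role of the end anchor inside the lookahead is easy to miss, so here is a small Python sketch comparing a length lookahead with and without a closing $ (BOUNDED, UNBOUNDED and is_valid are names invented for this example; re.IGNORECASE stands in for RegexOptions.IgnoreCase).

```python
import re

# The e-mail pattern from the question, without the leading ^.
BASE = r"[A-Z0-9'._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$"

# Lookahead anchored with $: the whole string must be 5 to 254 characters long.
BOUNDED = re.compile(r"^(?=.{5,254}$)" + BASE, re.IGNORECASE)

# Lookahead without $: only requires that at least 5 characters exist.
UNBOUNDED = re.compile(r"^(?=.{5,254})" + BASE, re.IGNORECASE)


def is_valid(address):
    """Return True if the address matches the format and the 5-254 length limit."""
    return BOUNDED.match(address) is not None


if __name__ == "__main__":
    too_long = "a" * 250 + "@b.co"  # 255 characters
    print(is_valid("test@gmail.com"))
    print(is_valid(too_long))
    print(UNBOUNDED.match(too_long) is not None)
```

The unanchored lookahead still accepts the 255-character address, because .{5,254} is satisfied by any prefix of the string.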
I suggest you not impose such a restriction; as paxdiablo said, many questions on this already exist on SO (see: Email validation in regex). If you really want a regex for the length check alone:
^.{1,254}$
http://rick.measham.id.au/paste/explain.pl?regex=%5E.%7B1%2C150%7D%24

Password Regex (client side javascript)

I need a regex for the following criteria:
At least 7 alphanumeric characters with 1 special character
I used this:
^.*(?=.{7,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$!%^&+=]).*$
It works fine if I type Password1! but doesn't work for PASSWORD1!.
It also won't work for: Stmaryshsp1tal!
I am using the jQuery validation plugin, where I specify the regex.
When I use a regular expression validator and specify the following regex:
^.*(?=.{7,})(?=(.*\W){1,}).*$
It works perfectly without any issues. When I set this regex in the jQuery validation I am using, it doesn't work.
Please can someone shed some light on this? I want to understand why my first regex doesn't work.
(?=.\d)(?=.[a-z])
tries to match a digit and a lowercase letter at the same place. Remember that (?= ... ) does not consume anything.
What you want is probably:
^(?=.*\W)(?=(.*\w){7})
This is exactly the same as verifying that your string matches both ^.*\W (at least one special character) and ^(.*\w){7} (7 alphanumeric characters). Note that it also matches if there are more.
Try this regex:
\S*[@#$!%^&+=]+\S*(?<=\S{7,})
EDIT3: Ok, this is last edit ;).
This will also match other special characters. So if you want to limit the set of valid characters, change \S to a range of all valid characters.
Here is the regex; I think it can handle all possible combinations:
^(?=.{7,})\w*[.@#$!%^&+=]+(\w*[.@#$!%^&+=]*)*$
Here is the link for this regex: http://regexr.com?2tuh5
As a good tool for quickly testing regular expressions I'd suggest http://regexpal.com/ (no relations ;) ). Sometimes simplifying your expression helps a lot.
Then you might want to try something like ^[a-zA-Z0-9@#$!%^&+=]{7,}$
Update 2 now including digits
^.*(?=.{7,})(?=.*\d)(?=.*[a-zA-Z])(?=.*[@#$%^&+=!]).*$
This matches:
Stmarysh3sptal!, password1!, PASSWORD1P!!!!!!@#^^ASSWORD1, 122ss121a212!!
... but not:
Password1, PASSWORD1PASSWORD1, PASSWORD!, Password!, 1221121212!! etc
The reason it matches Password1! but not PASSWORD1! is this clause:
(?=.*[a-z])
That requires at least one lowercase letter in the password. The pattern says that the password must be at least 7 characters long, and contain both uppercase and lowercase letters, at least one number, and at least one of @#$!%^&+=. PASSWORD1! fails because there are no lowercase letters in it.
The second pattern accepts PASSWORD1! because it's a far, far weaker password requirement. All it requires is that the password is 7+ characters and has at least one special character in it (other than _). The {1,} is unnecessary, by the way.
If I were you, I'd avoid weakening the password and just leave it as it is. If I wanted to allow all-lowercase or all-uppercase passwords for some reason, I'd simply change it to
^(?=.*\d)(?=.*[a-zA-Z])(?=.*[@#$!%^&+=]).{7,}$
...thus not weakening the password requirements any more than I had to.
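The behaviour described in this answer can be reproduced with a few lines of Python (a sketch only; the question targets client-side JavaScript, but these lookaheads behave the same way in both engines, and the STRICT and RELAXED names are invented here).

```python
import re

# The question's first pattern: needs lowercase AND uppercase, a digit and a special character.
STRICT = re.compile(
    r"^.*(?=.{7,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$!%^&+=]).*$"
)

# The relaxed pattern from this answer: any letter, a digit, a special character, 7+ characters.
RELAXED = re.compile(r"^(?=.*\d)(?=.*[a-zA-Z])(?=.*[@#$!%^&+=]).{7,}$")

if __name__ == "__main__":
    for candidate in ["Password1!", "PASSWORD1!", "Password1", "Pw1!"]:
        print(candidate,
              STRICT.match(candidate) is not None,
              RELAXED.match(candidate) is not None)
```

Each lookahead is an independent check made from the same position, which is why dropping (?=.*[a-z]) and (?=.*[A-Z]) in favour of (?=.*[a-zA-Z]) changes only the letter-case requirement and nothing else.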

How to match a comma separated list of emails with regex?

Trying to validate a comma-separated email list in the textbox with asp:RegularExpressionValidator, see below:
<asp:RegularExpressionValidator ID="RegularExpressionValidator1"
runat="server" ErrorMessage="Wrong email format (separate multiple email by comma [,])" ControlToValidate="txtEscalationEmail"
Display="Dynamic" ValidationExpression="([\w+-.%]+@[\w-.]+\.[A-Za-z]{2,4},?)" ValidationGroup="vgEscalation"></asp:RegularExpressionValidator>
It works just fine when I test it at http://regexhero.net/tester/, but it doesn't work on my page.
Here's my sample input:
test@test.com,test1@test.com
I've tried a suggestion in this post, but couldn't get it to work.
p.s. I don't want a discussion on proper email validation
This Regex will allow emails with spaces after the commas.
^[\W]*([\w+\-.%]+@[\w\-.]+\.[A-Za-z]{2,4}[\W]*,{1}[\W]*)*([\w+\-.%]+@[\w\-.]+\.[A-Za-z]{2,4})[\W]*$
Playing around with this, a colleague came up with this RegEx that's more accurate. The above answer seems to let through an email address list where the first element is not an email address. Here's the update which also allows spaces after the commas.
Try this:
^([\w+-.%]+@[\w-.]+\.[A-Za-z]{2,4},?)+$
Adding the + after the parentheses means that the preceding group can be present 1 or more times.
Adding the ^ and $ means that anything between the start of the string and the start of the match (or the end of the match and the end of the string) causes the validation to fail.
The answer selected as best also matches a string like abc@xyz.comxyz@abc.com, which is invalid.
The following regex works well for comma-separated email ids.
^([\w+-.%]+@[\w.-]+\.[A-Za-z]{2,4})(,[\w+-.%]+@[\w.-]+\.[A-Za-z]{2,4})*$
It will match a single email id or comma-separated email ids, but not if a comma is missing.
The first group matches a single email id. The second group is made optional by the '*' quantifier (zero or more repetitions), but each repetition must begin with ',', which is what forces the email ids to be comma-separated.
A simple modification of @Donut's answer allows adjacent commas, all TLDs of two characters or more, and arbitrary whitespace between email addresses and commas.
^([\w+-.%]+@[\w-.]+\.[A-Za-z]{2,}(\s*,?\s*)*)+$
You will need to split and remove whitespace and empty strings on your side, but this should be an overall better user experience.
Examples of matched lists:
person@example.co,chris@o.com,simon@example.capetown
person@example.co ,, chris@o.com, simon@example.capetown
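The "split and remove whitespace and empty strings" step mentioned above could look like the following Python sketch (split_addresses is a hypothetical helper name; it assumes the list has already passed validation).

```python
def split_addresses(raw):
    """Split a comma-separated list, dropping surrounding whitespace and empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    print(split_addresses("person@example.co ,, chris@o.com, simon@example.capetown"))
```

Adjacent commas produce empty pieces after the split, and the `if part.strip()` filter is what discards them.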
^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+,*[\W]*)+$
This will also work. It's a little bit stricter on emails, and doesn't require that there be more than one email address entered or that a comma be present at all.
The following RegEx will work even with some of the weirdest emails out there, and it supports a comma between emails.
((?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]),?)+
A few Examples:
Valid: planetearth@solar.com
Valid: planet.earth@solar.com
Valid: planet.earth@solar.com,blue.planet@solar.com
Valid: planet-earth@solar-system.com,/#!$%&'*+-/=?^_`{}|~@solar.org,"!#$%&'-/=^_`{}|~.a"@solar.org
Invalid: planet earth@solar.com
Hope This helps.
^([\w+.%-]+@[\w.-]+\.[A-Za-z]{2,})( *,+ *(?1))*( *,* *)$
The point about requiring a comma between groups, but not necessarily at the end, is handled here. I'm mostly adding this as it includes a nice subgroup with the (?1), so you only define the actual email address regex once, and then can muck about with delimiters. Note that (?1) is PCRE/Perl subroutine syntax; the .NET and JavaScript regex engines do not support it.
Email address ref here: https://www.regular-expressions.info/email.html
The regex below is less restrictive and more appropriate for validating a manually-entered list of comma-separated email addresses. It allows for adjacent commas.
^([\w+-.%]+@[\w-.]+\.[A-Za-z]{2,4},*[\W]*)+$
Use the following regex; it should resolve your problem. It also tolerates spaces before and after the comma.
/^((([a-zA-Z0-9_\-.]+)@([a-zA-Z0-9_\-.]+)\.([a-zA-Z\s?]{2,5}){1,25})(\s?,\s*?))$/
I'm a bit late to the party, I know, but I figured I'd add my two cents, since the accepted answer has the problem of matching email addresses next to each other without a comma.
My proposed regex is this:
^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}(,[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,})*$
It's similar to the accepted answer, but solves the problem I was talking about. Instead of searching for "an email address followed by an optional comma" one or more times, which is what the accepted answer does, this regex searches for "an email address, followed by any number of comma-prefixed email addresses".
That solves the problem by grouping the comma with the email address after it, and making the entire group optional, instead of just the comma.
Notes:
This regex is meant to be used with the case-insensitive flag enabled.
You can use whichever regex to match an email address you please, I just used the one that I was already using. You would just replace each [A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,} with whichever regex you want to use.
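To isolate the effect of where the comma is grouped, the sketch below builds both list patterns from the same single-address regex. It is written in Python for illustration; EMAIL, OPTIONAL_COMMA and COMMA_PREFIXED are names made up for this example, and re.IGNORECASE plays the role of the case-insensitive flag.

```python
import re

# Single-address pattern from this answer; swap in any other address regex here.
EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"

# "An email address followed by an optional comma", one or more times.
OPTIONAL_COMMA = re.compile(r"^(" + EMAIL + r",?)+$", re.IGNORECASE)

# "An email address, followed by any number of comma-prefixed email addresses".
COMMA_PREFIXED = re.compile(r"^" + EMAIL + r"(," + EMAIL + r")*$", re.IGNORECASE)

if __name__ == "__main__":
    for text in ["test@test.com,test1@test.com",
                 "abc@xyz.comxyz@abc.com",
                 "test@test.com,"]:
        print(text,
              OPTIONAL_COMMA.match(text) is not None,
              COMMA_PREFIXED.match(text) is not None)
```

Only the comma-prefixed form rejects two addresses run together and a trailing comma, because a comma can no longer be skipped between addresses or appear without an address after it.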
The solution that worked for me is the following:
^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)(,([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+))*
The easiest solution would be as following. This will match the string with comma-separated list. Use the following regex in your code.
Regex: '[^,]+,?'
^([\w+-.%]+@[\w-.]+\.[A-Za-z]+)(, ?[\w+-.%]+@[\w-.]+\.[A-Za-z]+)*$
Works correctly with 0 or 1 spaces after each comma and also for long domain extensions
This works for me in JS and TS
^([a-z0-9!#$%&'*+/=?^_`{|}~.-]+@[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*)(([, ]+[a-z0-9!#$%&'*+/=?^_`{|}~.-]+@[a-z0-9]([a-z0-9-]*[a-z0-9])\.([a-z0-9]([a-z0-9-]*[a-z0-9]))*)?)*$
You can check it out here
https://regex101.com/r/h0l9ks/1
The regex I have for this works well, except that a comma is required after every email address.
^((\s*?)[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z,]{2,4}(\s*?),)*
The explanation is as follows:
(\s*?) allows spaces at the start.
[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z,]{2,4} is a common email pattern.
(\s*?) allows spaces at the end too.
, requires a comma after each address.
For me, this one works perfectly for multiple emails:
^(\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]{2,4}\s*?,?\s*?)+$
RegEx Component          Explanation
^                        Matches the start of the string.
\w+                      Matches one or more word characters (letters, digits or underscores).
((-\w+)|(\.\w+))*        Matches zero or more occurrences of a hyphen followed by one or more word characters or a period followed by one or more word characters.
\@                       Matches the @ symbol.
[A-Za-z0-9]+             Matches one or more letters or digits.
((\.|-)[A-Za-z0-9]+)*    Matches zero or more occurrences of a period or hyphen followed by one or more letters or digits.
\.[A-Za-z0-9]{2,4}       Matches a period followed by two to four letters or digits.
\s*?,?\s*?               Matches optional whitespace followed by an optional comma followed by optional whitespace.
+                        Matches one or more occurrences of the entire expression.
$                        Matches the end of the string.
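The same component-by-component explanation can be kept next to the pattern itself. The sketch below uses Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows trailing comments (.NET offers a similar option, RegexOptions.IgnorePatternWhitespace); EMAIL_LIST is a name made up for the example.

```python
import re

EMAIL_LIST = re.compile(r"""
    ^(                          # start of string, then one group per address
        \w+                     # one or more word characters
        ((-\w+)|(\.\w+))*       # zero or more "-word" or ".word" parts
        @                       # the @ symbol
        [A-Za-z0-9]+            # one or more letters or digits
        ((\.|-)[A-Za-z0-9]+)*   # zero or more ".label" or "-label" parts
        \.[A-Za-z0-9]{2,4}      # a period followed by two to four letters or digits
        \s*?,?\s*?              # optional whitespace, optional comma, optional whitespace
    )+$                         # one or more addresses, then end of string
""", re.VERBOSE)

if __name__ == "__main__":
    for text in ["a.b@example.com, c-d@test.org",
                 "a.b@example.com",
                 "not an email",
                 "a@b.comc@d.com"]:
        print(repr(text), EMAIL_LIST.match(text) is not None)
```

Because the comma is optional inside the repeated group, two addresses written with no separator at all (the last test string) are also accepted, which is worth keeping in mind when choosing between the patterns in this thread.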
