How can I catch the following incorrect email addresses with a regex in C#?
eg:
test@test.com.net.com
or
test@test.net.net.org
These are being validated as correct email addresses. Any thoughts? Thanks
While both test@test.com.net.com and test@test.net.net.org are syntactically valid, their domain parts do not point to existing domains.
For this kind of test, you may want to extract the domain part you are interested in and query the DNS (see RFC 2821 and RFC 2822) to see if it exists.
Since you are using .NET, by the way, I would suggest taking a look at our EmailVerify.NET, a leading email validation library which can validate the syntax (according to the latest IETF standards), the domain parts and the presence of a mailbox for your email addresses.
You may want to just use something like:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
For a list, please see this page.
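For what it's worth, here is a minimal sketch of how a pattern like the one above could be used from C#. The class and method names are made up for illustration, the `^`/`$` anchors are my addition so that the whole input has to match, and `@` is assumed as the separator between local part and domain. It only checks syntax, so it will still accept addresses whose domains do not exist.

```csharp
using System.Text.RegularExpressions;

public static class EmailSyntax
{
    // Pattern from the answer above, anchored with ^ and $ (an assumption) so the
    // whole input must match; '@' separates the local part from the domain.
    private static readonly Regex Pattern = new Regex(
        @"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*" +
        @"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true when the input is syntactically shaped like an email address.
    public static bool LooksLikeEmail(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

Note that `EmailSyntax.LooksLikeEmail("test@test.com.net.com")` returns true, which is exactly the point made above: the syntax is fine, only the domain is bogus.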
Please consider this regex:
([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})
It should match well-formed email addresses; if an address does not match, it is not in the correct format. Hope it helps :)
EmailAddressAttribute in .NET 4.6.1 allows a dot at the end.
That means the following address, including its trailing dot, is considered valid: someone@google.com.
For Microsoft this email address is valid.
But for PayPal, for example, it is not.
So does anybody know whether a dot at the end of an email address is valid or not?
There is a lot of conflicting information about whether this is legal, or valid. Those are two different questions, and I'm going to try and explain a bit why.
Email addresses are described in part by RFC 5322 - Internet Message Format which explains email formats in excruciating detail.
In section 3.4.1 - Addr-spec, the email address format is explained. I'm paraphrasing for brevity but the general format is:
local-part@domain
With local-part described as one of the following: dot-atom / quoted-string / obs-local-part, and domain described as dot-atom / domain-literal / obs-domain.
So it's a domain name, which is described in RFC 1034 - Domain Names - Concepts And Facilities.
A domain name can be ambiguous or unambiguous, depending on the absence or presence of the trailing dot. Ambiguous domain names are not guaranteed to resolve to a location, although most (if not all at this point) DNS search lists append a period behind the scenes if one is not present, as a quality-of-life improvement. Unambiguous domain names must contain a trailing period; it is basically a terminating character in DNS.
Thomas Flinkow already mentioned what the source looks like, I just wanted to give some context as to why - historically - the regex might be the way it is. A trailing period is legal, but validity is defined by the mail providers.
Well, since I did not find any documentation on that, I checked the source of the EmailAddressAttribute to see if any comments explained whether or not
someone@google.com.
is considered valid, but I did not find comments regarding that.
What I did find is this regular expression, which is used to determine whether or not an address is invalid:
^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*
(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|
[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09
\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*
(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$
which is obviously quite long. The interesting part however is this little part at the very end:
\.?
which means match between 0 and 1 "." characters.
Therefore I feel like it is intentionally built in that email addresses ending with a period are considered valid, although I have not found any external resources on whether email addresses ending with a period are actually allowed by any email providers.
For validation, I would advise not relying only on the EmailAddressAttribute, but writing your own validator (since EmailAddressAttribute is sealed and you cannot derive your own attribute from it), which could look somewhat like this:
// Requires: using System.ComponentModel.DataAnnotations;
public bool IsValidEmailAddress(string email)
{
    // The attribute treats null as valid, so reject it explicitly.
    if (email == null) return false;
    var emailValidator = new EmailAddressAttribute();
    return emailValidator.IsValid(email) && !email.EndsWith(".");
}
In the code above, the attribute provides the basic check, and !email.EndsWith(".") rejects addresses with a trailing period that the attribute would otherwise accept.
TL;DR: The definite answer seems to be what Yannick Meeus has written:
A trailing period is legal, but validity is defined by the mail
providers.
and therefore Microsoft seems to have conformed to the rules, even though in practice only few (none?) mail providers allow a trailing period. So you have to decide whether you also want to conform to the formal rules and allow the trailing ".", or whether you want to exclude it (as demonstrated in the sample code above).
I get a URL such as
http://orders.mealsandyou.com/default.php
I don't want to use string functions to get the main domain, i.e.
mealsandyou.com
Is there any function in C# to do that? Uri.Authority and the like return the subdomain too...
Suggestions welcome, not workarounds
.NET doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, e.g. .com, .net, .uk etc. Everything else under that depends on the particular TLD for its position (so you can't assume the next-to-last part is the "domain name"; for .co.uk it would be .co).
In any case I think you're taking the wrong approach. URL rewriting is far more suited to this sort of thing. Have a read of this: learn.iis.net/page.aspx/460/using-the-url-rewrite-module
How would I check on registration that a user types in a specific email address? For example, I want my registration form to only allow these email addresses:
@gmail.com
@yahoo.com
@live.com
Like this:
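// RegularExpressionAttribute requires the whole value to match, so the pattern also covers the local part.
[RegularExpression(@"^[^@\s]+@(gmail|yahoo|live)\.com$", ErrorMessage = "Invalid domain in email address. The domain must be gmail.com, yahoo.com or live.com")]
public string EmailAddress { get; set; }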
You don't even need a regular expression; you can just use the Split() method to obtain the part of the email address after the "@" and check it against your list of allowed providers.
This by itself doesn't guarantee that it's a well-formed email address (that may require a regex, and a somewhat complicated one), but it will make sure that the address ends with one of the domains on your list.
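A rough sketch of that split-and-check idea follows. The class and method names are invented for the example, and the allowed list is hard-coded here only for brevity; in a real application it would probably come from configuration.

```csharp
using System;
using System.Collections.Generic;

public static class AllowedEmailDomains
{
    // Case-insensitive set of the providers mentioned in the question.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "gmail.com", "yahoo.com", "live.com"
        };

    public static bool IsAllowed(string email)
    {
        if (string.IsNullOrWhiteSpace(email)) return false;

        // Expect exactly one '@' with something in front of it.
        string[] parts = email.Trim().Split('@');
        if (parts.Length != 2 || parts[0].Length == 0) return false;

        return Allowed.Contains(parts[1]);
    }
}
```

As noted above, this only checks the domain; it says nothing about whether the local part is well formed.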
You could use a RegularExpressionValidator control and an expression to look for the email domains. You can find a sample at http://www.regexplib.com if you don't already have one.
You're probably going to want a CustomValidator as well that performs an identical server-side check. Users can circumvent your RegularExpressionValidator if they disable JavaScript.
There is nothing built in, but you can use [RegularExpression]. You can also write a custom EmailAttribute deriving from RegularExpressionAttribute.
A very good implementation is done here
You can use the following regular expression to check an email address:
^[a-z0-9_\+-]+(\.[a-z0-9_\+-]+)*@[a-z0-9]+(\.[a-z0-9]+)*\.([a-z]{2,4})$
Besides this, you can use Data Annotation Extension, which has an [Email] attribute for validating an email address.
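To illustrate the custom-attribute idea, here is a small sketch. SimpleEmailAttribute is a made-up name, and the pattern is a variant of the one above in which I assume `@` as the separator and `*` after the optional dotted groups; adjust it to whatever rule you actually want.

```csharp
using System.ComponentModel.DataAnnotations;

// Hypothetical attribute that bakes an email pattern into RegularExpressionAttribute.
public sealed class SimpleEmailAttribute : RegularExpressionAttribute
{
    public SimpleEmailAttribute()
        : base(@"^[a-z0-9_\+-]+(\.[a-z0-9_\+-]+)*@[a-z0-9]+(\.[a-z0-9]+)*\.([a-z]{2,4})$")
    {
        ErrorMessage = "Please enter a valid email address.";
    }
}
```

It would then be applied as `[SimpleEmail] public string EmailAddress { get; set; }`. Keep in mind that this pattern only accepts lower-case input and that the base attribute treats null and empty strings as valid, so combine it with [Required] if the field is mandatory.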
I have the following so far:
^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
Been testing against these:
https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
https://google.com:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
http://www.foo.com
http://www.foo.com/
http://blog.foo.com/
http://blog.foo.com.ar/
http://foo.com
http://blog.foo.com
http://foo.com.ar
I'm using the following tool to test the regexes: regex tester
So far I've been able to yield the following groups:
1. full protocol
2. reduced protocol
3. full domain name
4. subdomain?
5. top level domain
6. port
7. port number
8. rest of the url
9. rest of the "directory"
10. no idea how to drop this group
11. page name
12. argument string
13. argument string
14. hash tag
15. hash tag
I will be using this regex to change the subdomain for my application for cross-domain redirect hyperlinks.
Using Request.Url as a parameter, I want to redirect from
http://example.com or http://www.example.com to http://blog.example.com
How can I achieve this?
I can't really tell what, if any, the current subdomain ( either nothing, www, blog, or forum, for instance) actually is...
What would be the best way to make this replacement?
What I actually need is some way to find out what the top-level domain is: for any of http://www.example.com, http://blog.example.com, or http://example.com, I want to get example.com.
This may not be the answer you're looking for... but IMO the best way would be to make use of the System.Uri class.
The Uri class will easily extract the Host for you - and you can then split the host on "." delimiter - that should easily give you access to the current subdomain.
This is just my opinion, formed mainly because I find it hard to maintain regex code like ^((http[s]?|ftp):\/\/)(([^.:\/\s]*)[\.]([^:\/\s]+))(:([^\/]*))?(((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?)?$
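As a sketch of the Uri-based approach for the redirect in the question: the helper below assumes you already know your site's own domain (passed in as baseDomain, a hypothetical parameter), which sidesteps having to guess where the subdomain ends. The names are invented for the example.

```csharp
using System;

public static class SubdomainRedirect
{
    // baseDomain is the site's own domain, e.g. "example.com" (an assumption:
    // the application knows this value up front).
    public static Uri WithSubdomain(Uri url, string baseDomain, string subdomain)
    {
        string host = url.Host;
        bool belongs =
            host.Equals(baseDomain, StringComparison.OrdinalIgnoreCase) ||
            host.EndsWith("." + baseDomain, StringComparison.OrdinalIgnoreCase);
        if (!belongs)
            throw new ArgumentException("URL is not under " + baseDomain, "url");

        // UriBuilder keeps scheme, port, path and query; only the host changes.
        var builder = new UriBuilder(url) { Host = subdomain + "." + baseDomain };
        return builder.Uri;
    }
}
```

In an ASP.NET page this could be called as `SubdomainRedirect.WithSubdomain(Request.Url, "example.com", "blog")`, and the result passed to a redirect.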
You can use the Uri class to parse the strings. There are many properties available in addition to Segments:
Uri MyUri = new Uri("https://www.google.com.ar:8080/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash");
foreach (String Segment in MyUri.Segments)
Response.Write(Segment + "<br />");
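Besides Segments, the other Uri properties cover most of the groups the regex in the question is trying to capture. A small sketch (the class and method names are just for illustration):

```csharp
using System;

public static class UriParts
{
    // Joins the main components of a URL with '|' so they are easy to inspect.
    public static string Describe(string url)
    {
        var uri = new Uri(url);
        return string.Join("|", new[]
        {
            uri.Scheme,          // protocol without "://"
            uri.Host,            // full host name, subdomain included
            uri.Port.ToString(), // explicit or default port
            uri.AbsolutePath,    // directories plus page name
            uri.Query,           // argument string including '?'
            uri.Fragment         // hash including '#'
        });
    }
}
```

For the first test URL in the question, this yields `https|www.google.com.ar|8080|/dir/1/2/search.html|?arg=0-a&arg1=1-b&arg3-c|#hash`.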
I think you should reconsider whether usage of a RegEx is really needed in this case;
I think extracting the top-level domain from a URL is quite simple; in the case of "http://www.example.com/?blah=111" you can simply take the part before the third slash, perform a String.Split('.') and concatenate the last two array items. In the case of "http://www.example.com", it is even easier.
Regex patterns are very error-prone and quite hard to maintain, and in my opinion you won't gain any advantage from using one. I recommend getting rid of the regex. Perhaps the result will be 2 - 3 more lines of code, but it will work, and your code will be much more readable and easier to understand.
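The split-and-join idea described above could look roughly like this. I am using Uri.Host instead of cutting at the third slash, which is a substitution on my part, and the names are invented for the example.

```csharp
using System;

public static class NaiveDomain
{
    // Joins the last two dot-separated labels of the URL's host.
    public static string LastTwoLabels(string url)
    {
        string host = new Uri(url).Host;
        string[] labels = host.Split('.');
        if (labels.Length < 2) return host;
        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }
}
```

This returns example.com for http://www.example.com/?blah=111, but be aware that it returns com.ar for http://blog.foo.com.ar/, so it only works when the domain sits directly under a single-label TLD.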
I'm trying to extract the domain name from a string in C#. You don't necessarily have to use a RegEx but we should be able to extract yourdomain.com from all of the following:
yourdomain.com
www.yourdomain.com
http://www.yourdomain.com
http://www.yourdomain.com/
store.yourdomain.com
http://store.yourdomain.com
whatever.youdomain.com
*.yourdomain.com
Also, any TLD is acceptable, so replace all the above with .net, .org, .co.uk, etc.
If no scheme present (no colon in string), prepend "http://" to make it a valid URL.
Pass string to Uri constructor.
Access the Uri's Host property.
Now you have the hostname. What exactly you consider the ‘domain name’ of a given hostname is a debatable point. I'm guessing you don't simply mean everything after the first dot.
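The three steps above can be sketched as follows. HostExtractor and GetHost are hypothetical names, and the method returns null rather than throwing when the input cannot be parsed, which is a design choice of this sketch.

```csharp
using System;

public static class HostExtractor
{
    public static string GetHost(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return null;
        string candidate = input.Trim();

        // Step 1: no colon means no scheme, so prepend one.
        if (candidate.IndexOf(':') < 0) candidate = "http://" + candidate;

        // Steps 2 and 3: parse as an absolute URI and read its Host.
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri) ? uri.Host : null;
    }
}
```

This gives www.yourdomain.com for both www.yourdomain.com and http://www.yourdomain.com/; reducing that to yourdomain.com is the harder part discussed next.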
It's not possible to distinguish hostnames like ‘whatever.youdomain.com’ from domains-in-an-SLD like ‘warwick.ac.uk’ from just the strings. Indeed, there is even a bit of grey area about what is and isn't a public SLD, given the efforts of some registrars to carve out their own niches.
A common approach is to maintain a big list of SLDs and other suffixes used by unrelated entities. This is what web browsers do to stop unwanted public cookie sharing. Once you've found a public suffix, you could add the one nearest prefix in the host name split by dots to get the highest-level entity responsible for the given hostname, if that's what you want. Suffix lists are hell to maintain, but you can piggy-back on someone else's efforts.
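To make the suffix-list approach concrete, here is a minimal sketch. The tiny hard-coded set is purely illustrative and nowhere near complete; a real implementation would load a maintained suffix list instead. All names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SuffixLookup
{
    // Illustrative sample only; a real list has thousands of entries.
    private static readonly HashSet<string> Suffixes =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "com", "net", "org", "co.uk", "ac.uk", "com.ar"
        };

    // Returns the public suffix plus the one label in front of it,
    // or null when no known suffix matches.
    public static string RegistrableDomain(string host)
    {
        if (string.IsNullOrEmpty(host)) return null;
        string[] labels = host.TrimEnd('.').ToLowerInvariant().Split('.');

        // i is where the candidate suffix starts; small i means a longer suffix,
        // so the longest known suffix wins.
        for (int i = 1; i < labels.Length; i++)
        {
            string suffix = string.Join(".", labels, i, labels.Length - i);
            if (Suffixes.Contains(suffix)) return labels[i - 1] + "." + suffix;
        }
        return null;
    }
}
```

With this sample list, store.yourdomain.com gives yourdomain.com and www.warwick.ac.uk gives warwick.ac.uk, while a host under a suffix that is not in the list gives null.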
Alternatively, if your app has the time and network connection to do it, it could start sniffing for information on the hostname. eg. it could do a whois query for the hostname, and keep looking at each parent until it got a result and that would be the domain name of the lowest-level entity responsible for the given hostname.
Or, if all that's too much work, you could try just chopping off any leading ‘www.’ present!
I would recommend trying this yourself, using Regulator and a regex cheat sheet.
http://sourceforge.net/projects/regulator/
http://regexlib.com/CheatSheet.aspx
You can also find some good info on regular expressions at Coding Horror.
Have a look at this other answer. It was for PHP but you'll easily get the regex out of the 4-5 lines of PHP and you can benefit from the discussion that followed (see Alnitak's answer).
A regex doesn't really fit your requirement of "any TLD", since the number of TLDs is quite large and their format is continually in flux. If you limited your scope to the following (matched case-insensitively):
(?<domain>[^\.]+\.([A-Z]+$|co\.[A-Z]+$))
You would catch .anything and .co.anything, which I imagine covers most realistic cases...
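Here is a quick sketch of trying that pattern from C#. It assumes case-insensitive matching and `[A-Z]+` after `co\.` so that suffixes like .co.uk match in full; the class and method names are made up.

```csharp
using System.Text.RegularExpressions;

public static class DomainRegex
{
    private static readonly Regex Pattern = new Regex(
        @"(?<domain>[^\.]+\.([A-Z]+$|co\.[A-Z]+$))",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the captured domain, or null when the input does not end in one.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups["domain"].Value : null;
    }
}
```

Because the pattern is anchored at the end of the string, anything after the host (a trailing slash, a path) has to be stripped first, otherwise there is no match.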