Regex to detect hyphen character - c#

Live demo: http://regex101.com/r/wW6wC4
I'm trying to add a regex expression that allows email addresses like:
asdf.asdf#test-dom-a.com
([\w+\.]+#[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
^---- Thought this would allow hyphens...
what am I missing here?

Your pattern requires that the hyphen appears after a period. Try this instead:
([\w+.]+#[\w-]{1,})(\.)([0-9a-zA-Z.-]+)
Demonstration
Or more simply:
([\w+.]+#[\w.-]+)
Note, however, that the second pattern doesn't require the domain part of the address to contain a period.
Demonstration
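As a quick check of the patterns above, here is a minimal sketch in Python. Python's re module is used only for illustration, on the assumption that these character classes behave the same way in C#. The '#' separator is kept exactly as it appears in the question's sample, and the address "asdf#localhost" is made up for the example.

```python
import re

# Patterns copied from the question and the answer above.
ORIGINAL = re.compile(r"([\w+\.]+#[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})")
FIXED = re.compile(r"([\w+.]+#[\w-]{1,})(\.)([0-9a-zA-Z.-]+)")
SIMPLE = re.compile(r"([\w+.]+#[\w.-]+)")


def matches_fully(pattern, text):
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


address = "asdf.asdf#test-dom-a.com"
print(matches_fully(ORIGINAL, address))           # False: hyphen before the first period
print(matches_fully(FIXED, address))              # True
print(matches_fully(SIMPLE, address))             # True
print(matches_fully(FIXED, "asdf#localhost"))     # False: a period is required
print(matches_fully(SIMPLE, "asdf#localhost"))    # True: no period required
```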

Your hyphen code appears in the segment that checks characters after the first period in the domain name. You need to add it to the match block before the domain name:
([\w+\.]+#[\w\-]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
^^---- check here as well.
In reality, I would search for a more comprehensive email regex - the one you have doesn't seem robust enough IMHO.

Your regex:
([\w+\.]+#[\w]{1,})(\.)([0-9a-zA-Z\.\-]{1,})
This allows a hyphen only in the part of the domain after the first period.
To allow it anywhere use:
^([\w+.-]+#[\w-]+)(\.)([0-9a-zA-Z.-]+)$
OR to allow it only in between use (except first and last position):
^[\w+.-]*#\w[\w-]*\.[\w-]*[0-9a-zA-Z.]+$
Working Demo: http://regex101.com/r/lQ1nV7
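A small Python sketch of how the "in between" pattern treats hyphens at the edges of the domain. Python is used only for illustration; the two rejected addresses are made-up test inputs, and '#' is the separator used in the question's sample.

```python
import re

# The "in between" pattern from the answer above.
IN_BETWEEN = re.compile(r"^[\w+.-]*#\w[\w-]*\.[\w-]*[0-9a-zA-Z.]+$")


def is_accepted(address):
    """Return True when the address matches the pattern."""
    return IN_BETWEEN.search(address) is not None


print(is_accepted("asdf.asdf#test-dom-a.com"))  # True: hyphens inside the domain
print(is_accepted("asdf#-test.com"))            # False: hyphen right after the separator
print(is_accepted("asdf#test.com-"))            # False: hyphen as the last character
```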

Your pattern matches strings of the form "asd#fge.hj-kl", which, as you can see, is not the form you want. Try one of these instead:
([\w+\.]+)#([0-9a-zA-Z\.\-]{1,})\.com
([\w+\.]+)#([0-9a-zA-Z\.\-]{1,})\.([\w]{1,})

Related

RegEx to capture text between two delimiter characters including 'shared'

If I have the following text...
The quick :brown:fox: jumped over the lazy :dog:.
I would like a regular expression to capture all the words that are between 2 : characters. In the above example it should return :brown:, :fox:, :dog:.
So far, I have this (\:{1}.\w*\s*\:{1}) which returns :brown: and :dog:. I can't quite figure out how to share the : between the 2 matching groups so that it will also return ':fox:'.
Here is a simple pattern which can be made to work:
(?<=:)(\w+)(?=:)
This uses lookarounds to make sure that one or more word characters are surrounded before and after by colons. Check the demo below to see it working.
The match would be available as the first capture group. Actually, it should also be available as the entire match itself, because lookarounds do not consume anything.
Demo
I like the above lookaround approach because it is clean and simple (at least in my mind). If, for some reason, you don't want any lookarounds, then just use the following pattern:
:(\w+):
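But note that now you explicitly have to access the first capture group to obtain the matching word without colons on either side. Also note that this pattern consumes both colons, so two words that share a colon, as in :brown:fox:, cannot both be matched; in the example it finds brown and dog but not fox.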
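To compare the two patterns on the sentence from the question, here is a minimal Python sketch. Python's re module is an assumption made for illustration only, since the question does not name a language.

```python
import re

text = "The quick :brown:fox: jumped over the lazy :dog:."

# Lookarounds assert the colons without consuming them, so a colon can be shared.
with_lookarounds = re.findall(r"(?<=:)(\w+)(?=:)", text)

# Without lookarounds, each match consumes both of its colons.
without_lookarounds = re.findall(r":(\w+):", text)

# Re-attach the colons to get the form asked for in the question.
wrapped = [f":{word}:" for word in with_lookarounds]

print(with_lookarounds)     # ['brown', 'fox', 'dog']
print(without_lookarounds)  # ['brown', 'dog']
print(wrapped)              # [':brown:', ':fox:', ':dog:']
```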

Extracting words from lines that match different patterns

I'm monitoring incoming e-mail subjects, and each subject may contain a specially formatted code which I use to reference something else down the line.
These codes can be anywhere within the string, and sometimes not at all - and so the problem I'm having is my lack of RegEx skills (which I assume is the best option for this solution?).
An example of a subject would be:
"Please refer to reference MZ5051CLA"
or
"Attention for Mr Danshi, RE. 11123MTX"
The codes I'm looking to extract in these scenarios are "MZ5051CLA" and "11123MTX".
The format of MZ5051CLA will be:
- Always starts with "MZ"
- Followed by a number
- Always ends with "CLA"
Is there a simple way to evaluate the subject as a whole and extract any words that match the codes only?
I've looked at various solutions to my problem here on SO, but they're either overly complicated or I can't quite relate.
Edit:
As ShashishChandra pointed out, the idea is to monitor multiple mailboxes, each with their own code formats. So my idea was to implement a regex setting for each mailbox.
Perhaps this was important to mention initially, since a solution to catch all formats in one regex won't work. Apologies for that.
Try this regex:
^.*(?:(MZ\d+CLA)|RE\.\s+(\d+MTX))$
Demo
The regex below would match only the first string, MZ5051CLA:
\bMZ\d+CLA\b
DEMO
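Since the question's edit mentions one regex setting per mailbox, that idea might be sketched as follows in Python. The mailbox names "claims" and "matters" are hypothetical, and the MTX pattern assumes that those codes are always digits followed by "MTX", which the question does not state explicitly.

```python
import re

# Hypothetical mailbox names; each mailbox gets its own code pattern.
MAILBOX_PATTERNS = {
    "claims": re.compile(r"\bMZ\d+CLA\b"),
    "matters": re.compile(r"\b\d+MTX\b"),  # assumed format: digits followed by "MTX"
}


def extract_codes(mailbox, subject):
    """Return every code in the subject that matches the mailbox's pattern."""
    return MAILBOX_PATTERNS[mailbox].findall(subject)


print(extract_codes("claims", "Please refer to reference MZ5051CLA"))     # ['MZ5051CLA']
print(extract_codes("matters", "Attention for Mr Danshi, RE. 11123MTX"))  # ['11123MTX']
print(extract_codes("claims", "No code in this subject"))                 # []
```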
But this would match both strings, MZ5051CLA and 11123MTX:
\b[A-Z0-9]+$
It matches the run of uppercase letters and digits at the end of a line.
DEMO
This would get you the alphanumeric string which starts with MZ and ends with CLA, or starts with a number and ends with MTX:
(?:\b[A-Z0-9]+$|\b\d+MTX\b)
DEMO
Both Codes in One Pattern
It seems that the codes must include at least one uppercase letter and at least one digit. For that kind of pattern, a password-validation technique is commonly used, and I would suggest:
\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*
In the demo, see how only the correct groups are matched. Of course false positives are possible.
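The lookahead pattern above can be tried on the two sample subjects with a short Python sketch. Python is assumed here only for illustration, and the third subject is a made-up example of the kind of false positive mentioned above.

```python
import re

# Pattern from the answer above: a run of uppercase letters and digits that
# contains at least one letter (checked by the lookahead) and at least one digit.
CODE = re.compile(r"\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*")


def find_codes(subject):
    """Return all substrings of the subject that look like a code."""
    return CODE.findall(subject)


print(find_codes("Please refer to reference MZ5051CLA"))     # ['MZ5051CLA']
print(find_codes("Attention for Mr Danshi, RE. 11123MTX"))   # ['11123MTX']
print(find_codes("Meeting in ROOM12 about MZ5051CLA"))       # ['ROOM12', 'MZ5051CLA']
```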
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
So, in that case if you don't mind false positives, then use: /^(?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+)$/. This will work well in general.

Password Regex (client side javascript)

I need a regex for the following criteria:
At least 7 alphanumeric characters with 1 special character
I used this:
^.*(?=.{7,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[##$!%^&+=]).*$
It works fine if I type Password1! but doesn't work for PASSWORD1!.
It also won't work for: Stmaryshsp1tal!
I am using the jQuery validation plugin, where I specify the regex.
When I use a regular expression validator and specify the following regex:
^.*(?=.{7,})(?=(.*\W){1,}).*$
It works perfectly without any issues. When I set this regex in the jQuery validation I am using, it doesn't work.
Can someone please shed some light on this? I want to understand why my first regex doesn't work.
(?=.\d)(?=.[a-z])
tries to match a digit and a lowercase letter at the same position. Remember that a lookahead (?= ... ) does not consume any characters.
What you want is probably:
^(?=.*\W)(?=(.*\w){7})
This is exactly the same as verifying that your string matches both ^.*\W (at least one special character) and ^(.*\w){7} (at least 7 alphanumeric characters; it also matches if there are more).
Try this regex:
\S*[##$!%^&+=]+\S*(?<=\S{7,})
EDIT3: Ok, this is last edit ;).
This will also match other special characters. If you want to limit the set of valid characters, change \S to a character class of all valid characters.
Here is a regex that I think can handle all possible combinations:
^(?=.{7,})\w*[.##$!%^&+=]+(\w*[.##$!%^&+=]*)*$
Here is the link for this regex: http://regexr.com?2tuh5
As a good tool for quickly testing regular expressions I'd suggest http://regexpal.com/ (no relations ;) ). Sometimes simplifying your expression helps a lot.
Then you might want to try something like ^[a-zA-Z0-9##$!%^&+=]{7,}$
Update 2 now including digits
^.*(?=.{7,})(?=.*\d)(?=.*[a-zA-Z])(?=.*[##$%^&+=!]).*$
This matches:
Stmarysh3sptal!, password1!, PASSWORD1P!!!!!!##^^ASSWORD1, 122ss121a212!!
... but not:
Password1, PASSWORD1PASSWORD1, PASSWORD!, Password!, 1221121212!! etc
The reason it matches Password1! but not PASSWORD1! is this clause:
(?=.*[a-z])
That requires at least one lowercase letter in the password. The pattern says that the password must be at least 7 characters long, and contain both uppercase and lowercase letters, at least one number, and at least one of ##$!%^&+=. PASSWORD1! fails because there are no lowercase letters in it.
The second pattern accepts PASSWORD1! because it's a far, far weaker password requirement. All it requires is that the password is 7+ characters and has at least one special character in it (other than _). The {1,} is unnecessary, by the way.
If I were you, I'd avoid weakening the password requirements and just leave the pattern as it is. If I wanted to allow all-lowercase or all-uppercase passwords for some reason, I'd simply change it to
^(?=.*\d)(?=.*[a-zA-Z])(?=.*[##$!%^&+=]).{7,}$
...thus not weakening the password requirements any more than I had to.
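The difference between the original pattern and the relaxed one can be checked with a short sketch. It is written in Python for illustration, on the assumption that the lookaheads behave the same way in the JavaScript engine used by the validation plugin. The character class is copied verbatim from the question, and "Pw1!" is a made-up input.

```python
import re

# Original pattern from the question: needs lowercase AND uppercase letters.
STRICT = re.compile(r"^.*(?=.{7,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[##$!%^&+=]).*$")

# Relaxed pattern from the answer above: any letter will do.
RELAXED = re.compile(r"^(?=.*\d)(?=.*[a-zA-Z])(?=.*[##$!%^&+=]).{7,}$")


def accepts(pattern, password):
    """Return True when the password satisfies the pattern."""
    return pattern.search(password) is not None


for password in ("Password1!", "PASSWORD1!", "Password1", "Pw1!"):
    print(password, accepts(STRICT, password), accepts(RELAXED, password))
# Password1! True True
# PASSWORD1! False True
# Password1 False False
# Pw1! False False
```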

C# Regex: only letters followed by an optional

I am looking for a way to get words out of a sentence. I am pretty far with the following expression:
\b([a-zA-Z]+?)\b
but in some cases it counts a word when I don't want it to, e.g. a word followed by more than one period, like "text..". So in my regex I want the period to appear at the end of a word zero or one time. Inserting \.? did not do the trick, and variations on this have not yielded anything fruitful either.
Hope someone can help!
A single dot means any character. You must escape it as
\.?
Maybe you want an expression like this:
\w+\.?
or
\p{L}+\.?
You need to add \.? (and not .?) because the period has special meaning in regexes.
To avoid a match on your example "text..", it is not enough to add \.? to check whether the first character after the word is a dot; you also need to look one character further and check the second character after the word.
I ended up with something like this:
\w{2,}\.?[^.]
You should also consider that a sentence does not always end with a period; it may also end with ! or ? and the like.
I usually use rubulator.com to quickly test a regexp
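One way to express "look one character further" is a negative lookahead. This is a different technique from the pattern above, not the answerer's own, and it is sketched here in Python on the assumption that the same lookahead syntax is available in C#. The sample sentence is made up.

```python
import re

# Letters-only words; the negative lookahead (?!\.\.) rejects a word that is
# immediately followed by two or more periods.
WORD = re.compile(r"\b[a-zA-Z]+\b(?!\.\.)")


def words(sentence):
    """Return the words that are not followed by more than one period."""
    return WORD.findall(sentence)


print(words("Some text.. and more text. Done!"))
# ['Some', 'and', 'more', 'text', 'Done']
```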

I have two problems, one of them is a regex

I am updating some code that I didn't write and part of it is a regex as follows:
\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]
I understand that .*? does a non-greedy match of everything in the second register.
What does ?:\s* in the first and third registers do?
Update: As requested, language is C# on .NET 3.5
The syntax (?:) is a way of putting parentheses around a subexpression without separately extracting that part of the string.
The author wanted to match the (.*?) part in the middle, and didn't want the spaces at the beginning or the end to get in the way. Now you can use \1 or $1 (or whatever the appropriate method is in your particular language) to refer to the domain name, instead of the first chunk of spaces at the beginning of the string.
?: makes the parentheses non-capturing. In that regex, you'll only pull out one piece of information, $1, which contains the middle (.*?) expression.
What does ?:\s* in the first and third registers do?
It's matching zero or more whitespace characters, without capturing them.
The regex author intends to allow trailing whitespace in the square-bracket-tags, matching all DNS labels following the "www." like so:
[url]www.foo.com[/url] # foo.com
[url ]www.foo.com[/url ] # same
[url ]www.foo.com[/url] # same
[url]www.foo.com[/url ] # same
Note that the regex also matches:
[url]www.[/url] # empty string!
and fails to match
[url]stackoverflow.com[/url] # no match, bummer
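The cases listed above can be reproduced with a short sketch. It uses Python rather than the C# of the question, on the assumption that non-capturing groups behave the same way in both; the helper name domain_after_www is made up for the example.

```python
import re

# The regex from the question; (?:\s*) groups without capturing,
# so the only capture group is the middle (.*?).
URL_TAG = re.compile(r"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]")


def domain_after_www(text):
    """Return the text captured after "www.", or None when nothing matches."""
    match = URL_TAG.search(text)
    return match.group(1) if match else None


print(URL_TAG.groups)                                   # 1
print(domain_after_www("[url]www.foo.com[/url]"))       # foo.com
print(domain_after_www("[url ]www.foo.com[/url ]"))     # foo.com
print(repr(domain_after_www("[url]www.[/url]")))        # ''
print(domain_after_www("[url]stackoverflow.com[/url]")) # None
```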
You may find this Regular Expressions Cheat Sheet very helpful (hopefully). I spent ages trying to learn Regex with no luck. And once I read this cheat-sheet - I immediately understood what I previously failed to learn.
http://krijnhoetmer.nl/stuff/regex/cheat-sheet/
