I need to match this string 011Q-0SH3-936729 but not 345376346 or asfsdfgsfsdf
It has to contain characters AND numbers AND dashes
Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 and I want it to be able to match anyone of those. Reason for this is that I don't really know if the format is fixed and I have no way of finding out either so I need to come up with a generic solution for a pattern with any number of dashes and the pattern recurring any number of times.
Sorry this is probably a stupid question, but I really suck at Regular expressions.
TIA
foundMatch = Regex.IsMatch(subjectString,
#"^ # Start of the string
(?=.*\p{L}) # Assert that there is at least one letter
(?=.*\p{N}) # and at least one digit
(?=.*-) # and at least one dash.
[\p{L}\p{N}-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace);
should do what you want. If by letters/digits you meant "only ASCII letters/digits" (and not international/Unicode letters, too), then use
foundMatch = Regex.IsMatch(subjectString,
#"^ # Start of the string
(?=.*[A-Z]) # Assert that there is at least one letter
(?=.*[0-9]) # and at least one digit
(?=.*-) # and at least one dash.
[A-Z0-9-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
EDIT:
this will match any of the key provided in your comments:
^[0-9A-Z]+(-[0-9A-Z]+)+$
this means the key starts with the digit or letter and have at leats one dash symbol:
Without more info about the regularity of the dashes or otherwise, this is the best we can do:
Regex.IsMatch(input,#"[A-Z0-9\-]+\-[A-Z0-9]")
Although this will also match -A-0
Most naive implementation EVER (might get you started):
([0-9]|[A-Z])+(-)([0-9]|[A-Z])+(-)([0-9]|[A-Z])+
Tested with Regex Coach.
EDIT:
That does match only three groups; here another, slightly better:
([0-9A-Z]+\-)+([0-9A-Z]+)
Are you applying the regex to a whole string (i.e., validating or filtering)? If so, Tim's answer should put you right. But if you're plucking matches from a larger string, it gets a bit more complicated. Here's how I would do that:
string input = #"Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 but not 345-3763-46 or ASFS-DFGS-FSDF or ASD123FGH987.";
Regex pluckingRegex = new Regex(
#"(?<!\S) # start of 'word'
(?=\S*\p{L}) # contains a letter
(?=\S*\p{N}) # contains a digit
(?=\S*-) # contains a hyphen
[\p{L}\p{N}-]+ # gobble up letters, digits and hyphens only
(?!\S) # end of 'word'
", RegexOptions.IgnorePatternWhitespace);
foreach (Match m in pluckingRegex.Matches(input))
{
Console.WriteLine(m.Value);
}
output: 011Q-0SH3-936729
011Q-0SH3-936729-SDF3
000-222-AAAA
011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729
The negative lookarounds serve as 'word' boundaries: they insure the matched substring starts either at the beginning of the string or after a whitespace character ((?<!\S)), and ends either at the end of the string or before a whitespace character ((?!\S)).
The three positive lookaheads work just like Tim's, except they use \S* to skip whatever precedes the first letter/digit/hyphen. We can't use .* in this case because that would allow it to skip to the next word, or the next, etc., defeating the purpose of the lookahead.
Related
This seems like it should be easy, but I'm not so good with regex, and this doesn't seem to be easy to find on google.
I need a regex that starts with the string 'SP-multiple digits' and ends with the string '- multiple digits'
For example i have to match '-12' in "Sp-1234-12".
My attempt was: [^*-]*$ -> This case matches everything after the minus but i need the minus included.
For that digit and hyphen format, you could use a capture group for the part of the string that you want:
^Sp(?:-\d+)*(-\d+)$
Explanation
^ Start of string
Sp Match literally
(?:-\d+)* Optionally repeat - and 1+ digits
(-\d+) Capture group 1, match - and 1+ digits
$ End of string
Regex demo
Note that in C# you can use [0-9] instead of \d to match only digits 0-9
I use this pattern but I do not get the answer. Regex reg = new Regex(#"^([A-Za-z][A-Za-z0-9\.]*(?:[A-Za-z])){6,30}#mydomain.com$");
I want my string to start with a letter and end with a letter, with a combination of letters, numbers, and dots, provided it is between 6 and 30 characters long.
For example: a.124b#mydomain.com or abc.1e#mydomain.com and ...
string pattern = #"^[a-z][a-z0-9.]{4,28}[a-z]#mydomain\.com$";
string input = #"a.124b#mydomain.com";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
now the explanation:
^[a-z] start of the string, and one letter
[a-z0-9.]{4,28} letters, digits and dot character (you don't need to escape it when in square brackets), repeated between 4 and 28 times
[a-z] another single letter
(those in combination amont to 6 to 30 characters)
#mydomain\.com$ rest of your mail address and end of string.
notice also the RegexOptions.IgnoreCase - when you know you don't care about case, it makes letter groups a bit more readable
the error you made in your regex was adding the quantifier for your complete capture group - meaning a repetition of 6-30 times of the whole group.
i also recommend https://regex101.com/ for all your regex needs
Here is one option:
^(?![0-9.]|.*[0-9.]#)[a-zA-Z0-9.]{6,30}#mydomain\.com$
See the online demo
^ - Start line anchor.
(?![0-9.]|.*[0-9.]#) - Negative lookahead to prevent start with dot/digit or end dot/digit before the "#".
[a-zA-Z0-9.]{6,30} - 6-30 Characters specified in class.
#mydomain\.com - Literally match "#mydomain.com". Notice the backslash before the dot to make it literal (outside a character class).
$ - End line anchor.
I was going to mention a case-insensitive alternative, but it looks like #FranzGleichmann got you covered =)
I've read through a lot of really interesting stuff recently about regex. Especially about creating your own regex boundaries
One thing that I don't think I've seen done (I'm 100% it has been done, but I haven't noticed any examples) is how to exclude a regex match if it's preceded by a 'special character', such as & ! % $ #. For example:
If I use the regex (Note this is from C#)
([A-Z]{2,}\\b)
It will match any capital letters that are two or more in length, and use the \b boundary to make sure the two capital letters don't start with or end with any other letters. But here's where I'm not sure how this would behave:
AA -Match
sAB -No Match
ACs -No Match
!AD -Match
AF! -Match
I would like to know how to select only two or more capital letters that aren't preceded by a lower case letter/number/symbol, or followed by a lower case letter/number/special characters.
I've seen people use spaces, so make sure the string starts with or ends with a space, but that doesn't work if it's at the beginning or end of a line.
So, the output I would look for from the example above would be:
AA -Match
sAB -No Match
ACs -No Match
!AD -No Match
AF! -No Match
Any help is appreciated.
You just need to use a lookbehind and a lookahead:
(?<![a-z\d!##$%^&*()])[A-Z]{2,}(?![a-z\d!##$%^&*()])
See regex demo
The (?<![a-z\d!##$%^&*()]) lookbehind makes sure there is no lowercase letters ([a-z]), digits (\d), or special characters that you defined. If there is one, the match is failed, nothing is returned.
The (?![a-z\d!##$%^&*()]) lookahead also fails a match if the same characters are found after the ALLCAPS letters.
See more details on Lookahead and Lookbehind Zero-Length Assertions here.
I think it's enough to just precede the pattern you have with a negation of lower case letter and any symbols you want to exclude. My example only excludes !, but you can add to the list as appropriate. ^ inside brackets negates what is inside them. So, for example, you can incorporate the pattern
/[^a-z!][A-Z]{2,}[^a-z!]/g
I have the following input text:
#"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
#"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
Unquoted must have at least one character and can contain any letter, number, underscore, or hyphen.
Quoted value can contain any character including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
See demo
Note that you cannot debug .NET-specific regexes at regex101.com, you need to test them in .NET-compliant environment.
Use string methods.
Split
string myLongString = ""#"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
string[] nameValues = myLongString.Split('#');
From there either use Split function with "=" or use IndexOf("=").
I have this password regex for an application that is being built its purpose is to:
Make sure users use between 6 - 12 characters.
Make sure users use either one special character or one number.
Also that its case insensitive.
The application is in .net I have the following regex:
I have the following regex for the password checker, bit lengthy but for your viewing if you feel any of this is wrong please let me know.
^(?=.*\d)(?=.*[A-Za-z]).{6-12}$|^(?=.*[A-Za-z])(?=.*[!#$%&'\(\)\*\+-\.:;<=>\?#\[\\\]\^_`\{\|\}~0x0022]|.*\s).{6,12}$
Just a break down of the regex to make sure your all happy it’s correct.
^ = start of string ”^”
(?=.*\d) = must contain “?=” any set of characters “.*” but must include a digit “\d”.
(?=.*[A-Za-z]) = must contain “?=” any set of characters “.*” but must include an insensitive case letter.
.{6-12}$ = must contain any set of characters “.” but must have between 6-12 characters and end of string “$”.
|^ = or “|” start of string “^”
(?=.*[A-Za-z]) = must contain “?=” any set of characters “.*” but must include an insensitive case letter.
(?=.*[!#$%&'\(\)\*\+-\.:;<=>\?#\[\\\]\^_`\{\|\}~0x0022]|.*\s) = must contain “?=” any set of characters “.*” but must include at least special character we have defined or a space ”|.*\s)”. “0x0022” is Unicode for single quote “ character.
.{6,12}$ = set of characters “.” must be between 6 – 12 and this is the end of the string “$”
It's quite long winded, seems to be doing the job but I want to know if there is simpler methods to write this sort of regex and I want to know how I can shorten it if its possible?
Thanks in Advanced.
Does it have to be regex? Looking at the requirements, all you need is String.Length and String.IndexOfAny().
First, good job at providing comments for your regex. However, there is a much better way. Simply write your regex from the get-go in free-spacing mode with lots of comments. This way you can document your regex right in the source code (and provide indentation to improve readability when there are lots of parentheses). Here is how I would write your original regex in C# code:
if (Regex.IsMatch(usernameString,
#"# Validate username having a digit and/or special char.
^ # Either... Anchor to start of string.
(?=.*\d) # Assert there is a digit AND
(?=.*[A-Za-z]) # assert there is an alpha.
.{6-12} # Match any name with length from 6 to 12.
$ # Anchor to end of string.
| ^ # Or... Anchor to start of string
(?=.*[A-Za-z]) # Assert there is an alpha AND
(?=.* # assert there is either a special char
[!#$%&'\(\)\*\+-\.:;<=>\?#\[\\\]\^_`\{\|\}~\x22]
| .*\s # or a space char.
) # End specialchar-or-space assertion.
.{6-12} # Match any name with length from 6 to 12.
$ # Anchor to end of string.
", RegexOptions.IgnorePatternWhitespace)) {
// Valid username.
} else {
// Invalid username.
}
The code snippet above uses the preferable #"..." string syntax which simplifies the escaping of metacharacters. This original regex erroneously separates the two numbers of the curly brace quantifier using a dash, i.e. .{6-12}. The correct syntax is to separate these numbers with a comma, i.e. .*{6,12}. (Maybe .NET allows using the .{6-12} syntax?) I've also changed the 0x0022 (the " double quote char) to \x22.
That said, yes the original regex can be improved a bit:
if (Regex.IsMatch(usernameString,
#"# Validate username having a digit and/or special char.
^ # Anchor to start of string.
(?=.*?[A-Za-z]) # Assert there is an alpha.
(?: # Group for assertion alternatives.
(?=.*?\d) # Either assert there is a digit
| # or assert there is a special char
(?=.*?[!#$%&'()*+-.:;<=>?#[\\\]^_`{|}~\x22\s]) # or space.
) # End group of assertion alternatives.
.{6,12} # Match any name with length from 6 to 12.
$ # Anchor to end of string.
", RegexOptions.IgnorePatternWhitespace)) {
// Valid username.
} else {
// Invalid username.
}
This regex eliminates the global alternative and instead uses a non-capture group for the "digit or specialchar" assertion alternatives. Also, you can eliminate the non-capture group for the "special char or whitespace" alternatives by simply adding the \s to the list of special chars. I've also added a lazy modifier to the dot-stars in the assertions, i.e. .*? - (this may make the regex match a bit faster.) A bunch of unnecessary escapes were removed from the specialchar character class.
But as Stema cleverly pointed out, you can combine the digit and special char to simplify this even further:
if (Regex.IsMatch(usernameString,
#"# Validate username having a digit and/or special char.
^ # Anchor to start of string
(?=.*?[A-Za-z]) # Assert there is an alpha.
# Assert there is a special char, space
(?=.*?[!#$%&'()*+-.:;<=>?#[\\\]^_`{|}~\x22\s\d]) # or digit.
.{6,12} # Match any name with length from 6 to 12.
$ # Anchor to end of string.
", RegexOptions.IgnorePatternWhitespace)) {
// Valid username.
} else {
// Invalid username.
}
Other than that, there is really nothing wrong with your original regex with regard to accuracy. However, logically, this formula allows a username to end with whitespace which is probably not a good idea. I would also explicitly specify a whitelist of allowable chars in the name rather than using the overly permissive "." dot.
I am not sure if it makes sense what you are doing, but to achieve that, your regex can be simpler
^(?=.*[A-Za-z])(?=.*[\d\s!#$%&'\(\)\*\+-\.:;<=>\?#\[\\\]\^_`\{\|\}~0x0022]).{6,12}$
Why using alternatives? Just Add \d and \s to the character class.