I'm using C# 2012 and I cannot get this regular expression to work.
I need to validate the text so that dots or dashes are mandatory as separators between the groups of digits in the input text:
[0-9]{9}(-|.)[\s]?[0-9]{4}(-|.)[0-9]{4}(-|.)0-9[0-9]{2}(-|.)[0-9]{4}
A valid text should be as follows:
0706570-39.2014.8.02.0001
but the expression above returns true for the text below, although it should return false:
...Certidão de Casamento nº 00287301551
982200032250000901391 - Cartório Privativo....
^[0-9]{9}(-|\.)[\s]?[0-9]{4}(-|\.)[0-9]{4}(-|\.)0-9[0-9]{2}(-|\.)[0-9]{4}$
Add the anchors ^...$ to denote the start and end of the string. Also escape the dot: \. matches a literal dot, while an unescaped . matches any character.
You need to use the following regex:
\b[0-9]{7}[-.][0-9]{2}[-.][0-9]{4}[-.][0-9][.-][0-9]{2}[-.][0-9]{4}\b
If the expression must match individual full strings, replace \b...\b with ^...$.
Note that (-|.) is pointless, since an unescaped . already matches -; your intention was presumably to match a literal dot. To match a literal dot, you need to escape it as \. (as vks shows) or put it into a character class, [.]. A character class such as [-.] is also a bit more efficient here, since it involves less backtracking than the alternation operator |. Apart from that, the original expression expects different digit groups than the sample text contains. Just for the sake of demonstration, a "fixed" version of it would be [0-9]{7}(-|\.)\s?[0-9]{2}(-|\.)[0-9]{4}(-|\.)[0-9]{1}(-|\.)[0-9]{2}(-|\.)[0-9]{4}
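To check the suggestion against the sample texts, here is a minimal sketch. The names wholeString, insideText and IsCaseNumber are made up for illustration, and the sentence around the number in the last call is invented test input.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later); in older projects this would go inside Main.
// [-.] is a character class, so only a literal dash or dot is accepted as separator.
const string Core = @"[0-9]{7}[-.][0-9]{2}[-.][0-9]{4}[-.][0-9][-.][0-9]{2}[-.][0-9]{4}";

var wholeString = new Regex("^" + Core + "$");    // the input must be exactly one number
var insideText = new Regex(@"\b" + Core + @"\b"); // the number may appear inside longer text

bool IsCaseNumber(string text) => wholeString.IsMatch(text);

Console.WriteLine(IsCaseNumber("0706570-39.2014.8.02.0001"));                       // True
Console.WriteLine(IsCaseNumber("Certidão nº 00287301551 982200032250000901391"));   // False
Console.WriteLine(insideText.IsMatch("Processo 0706570-39.2014.8.02.0001 arquivado")); // True
```

The anchored form is the one to use for validating a single field; the \b form is for searching inside a document.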
I am trying to create regular expression for following type of strings:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ), numerical digits only, and either no ‘Z’ or a ‘Z’ suffix.
For example, XD35Z should pass but XD01HW should not pass.
So far I have tried the following:
@"XD\d+Z?" - XD35Z passes, but unfortunately XD01HW passes as well
@"XD\d+$Z" - XD01HW fails, which is what I want, but XD35Z also fails
I have also tried @"XD\d{1,}Z"? but it did not work either
I need a single regex which will give me appropriate results for both types of strings.
Try this regex:
^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$
I'm using quantifying braces to explicitly limit the allowed numbers of each character/group. And the ^ and $ anchors make sure that the regex matches only the whole line (string).
Broken into logical pieces this regex checks
^(XI|YV|XD|YQ|XZ){1} Starts with exactly one of the allowed prefixes
\d+ Is followed by one or more digits
Z{0,1}$ Ends with between 0 and 1 Z
You're misusing the $, which represents the end of the string in a regex.
It should be: @"^XD\d+Z?$" (notice that the $ appears at the end of the regex, after the Z?)
The regex following the behaviour you want is:
^(XI|YV|XD|YQ|XZ)\d+Z?$
Explanation:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ)
^(XI|YV|XD|YQ|XZ)
numerical digits only
\d+
either no ‘Z’ or a ‘Z’ suffix
Z?$
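A quick way to try the anchored pattern on the strings from the question is sketched below. The variable name code and the extra inputs XD35 and AB35Z are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern: one allowed prefix, one or more digits, an optional trailing Z.
var code = new Regex(@"^(XI|YV|XD|YQ|XZ)\d+Z?$");

foreach (var input in new[] { "XD35Z", "XD35", "XD01HW", "AB35Z" })
{
    Console.WriteLine($"{input}: {code.IsMatch(input)}");
}
// XD35Z: True
// XD35: True
// XD01HW: False
// AB35Z: False
```

If only the ASCII digits 0 to 9 should be accepted, [0-9] can be used in place of \d, because \d in .NET also matches decimal digits from other scripts by default.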
I'm working on some code inherited from someone else and trying to understand some regular expression code in C#:
Regex.Replace(query, @"""[^""~]+""([^~]|$)",
m => string.Format(field + "_exact:{0}", m.Value))
What is the above regular expression doing? This is in relation to input from a user performing a search. It's doing a replace of the query string using the pattern provided in the second argument, with the value of the third. But what is that regular expression? For the life of me, it doesn't make sense. Thanks.
As far as I can see, xanatos' answer is correct. I tried to understand the regex, so here it comes:
"[^"~]+"([^~]|$)
You can test our regex and play with the single parts for better understanding at http://www.regexpal.com/
1.) a single character
"
The first pattern is a literal character, the double quote. Since there is no anchor in front of it, it can occur anywhere in the input.
2.) a character class
[^"~]
The next expression is the []-bracket. This is a character class. It defines a set of characters, any one of which may come next. It is a placeholder for one single character... So let's look inside to see which content is allowed:
^"~
The definition of the character class begins with a caret (^), which is a special character. Typing a caret directly after the opening square bracket negates the character class. So it's "upside down": every character that is not listed in the class matches and is a valid character.
In this case, every character is possible except the two excluded ones: " and ~.
3.) a special character
+
The next expression, a plus, tells the engine to attempt to match the preceding token one or more times.
So the character class defined above has to occur once or several times in a row for the expression to match.
4.) a single character
"
To match, the input must then contain one more double quote. This is the closing quote that corresponds to the opening one in 1.), since the character class in 2.), repeated by 3.), does not permit a double quote in between.
5.) a group with alternation
([^~]|$)
The first structure to examine here is the ()-bracket. This is a capturing group.
It is not a lookaround: whatever the group matches becomes part of the overall match (and is captured as group 1).
Inside the group there are two alternatives, which are connected by a logical OR, the pipe symbol: |
So what follows the closing double quote must be either
[^~] one single character from the class of everything except the character ~
or
$ the end of the string (or the end of a line, if multiline mode is enabled in the regex engine)
I'll try to edit my answer to a better format, since this is my first post, I first have to check out how this is working.. :)
Update:
To "detect" an asterisk/star at the beginning or end of the line, you have to do the following:
First, it is a special character, so you have to escape it with a backslash: \*
To define the position, you can use:
^ to look at the beginning of the line,
$ end of the line
The overall expression would be:
^\* at the front of the expression to require an * at the beginning of the line
\*$ at the end of the expression to require an * at the end.
In your case, you can add \*$ as a further alternative in the last group to allow an * at the end:
([^~]|$|\*$)
and to force an * at the end, delete the other alternatives:
(\*$)
The @ makes it necessary to escape every " with a second ", so "". Without the @ you would escape the " as \", but I consider it better to always use @ in regexes, because the \ is used quite often, and it's tedious and unreadable to always have to escape it as \\.
Let's see what the regex really is:
Console.WriteLine(@"""[^""~]+""([^~]|$)");
is
"[^"~]+"([^~]|$)
So now we can look at the "real" regex.
It looks for a " followed by one or more non-" and non-~ followed by another " followed by a non-~ or the end of the string. Note that the match could start after the start of the string and it could end before the end of the string (with a non-~)
For example in
car"hello"help
it would match "hello"h
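The behaviour described above can be reproduced with a small sketch. The field name "title" and the sample queries are assumptions made up for the example; only the pattern and the Regex.Replace call come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// The verbatim string and the regular string below are the same pattern.
string verbatim = @"""[^""~]+""([^~]|$)";
string regular = "\"[^\"~]+\"([^~]|$)";
Console.WriteLine(verbatim == regular);                    // True

var quoted = new Regex(verbatim);

// The match includes the character that follows the closing quote.
Console.WriteLine(quoted.Match("car\"hello\"help").Value); // "hello"h

// A quoted phrase that is directly followed by ~ is not matched.
Console.WriteLine(quoted.IsMatch("\"hello\"~2"));          // False

// The replacement from the question; "title" is a hypothetical field name.
string field = "title";
string result = quoted.Replace("\"big car\" red",
    m => string.Format(field + "_exact:{0}", m.Value));
Console.WriteLine(result);                                 // title_exact:"big car" red
```

So the call appears to prefix every quoted phrase with the field name plus _exact:, unless the phrase is immediately followed by a ~.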
Pattern is
Regex splRegExp = new System.Text.RegularExpressions.Regex(@"[\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,-,_]");
All characters work except '-'. Please advise.
Use
@"[,#+\\?\d%.*&^$(!)#_-]"
No need for all those commas.
If you place a - inside a character class, it means a literal dash only if it's at the start or end of the class. Otherwise it denotes a range like A-Z. As Damien put it, the range ,-, is indeed rather small (and doesn't contain the -, of course).
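The effect of the dash position can be seen in a small sketch. The three classes below are shortened for illustration and are not the full pattern from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, illustrative character classes.
var dashInMiddle = new Regex(@"[#,-,_]");  // ",-," is a range from ',' to ',': no dash in it
var dashAtEnd = new Regex(@"[#,_-]");      // a trailing '-' is a literal dash
var dashEscaped = new Regex(@"[#,\-,_]");  // an escaped '-' is a literal dash anywhere

Console.WriteLine(dashInMiddle.IsMatch("-")); // False
Console.WriteLine(dashAtEnd.IsMatch("-"));    // True
Console.WriteLine(dashEscaped.IsMatch("-"));  // True
```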
'-' has to be the first character in your character class.
Regex splRegExp = new System.Text.RegularExpressions.Regex(@"[-,\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,_]");
You need to escape the - character for it to work (inside a character class, an unescaped - between two characters denotes a range).
Try this:
"[\,#,+,\,?,\d,%,.,?,*,&,^,$,(,!,),#,\-,_]"
I am trying to write a regular expression that matches only any of the following:
{string}
{string[]}
{string[6]}
where instead of 6 in the last line, there could be any positive integer. Also, wherever string appears in the above lines, there could be int.
Here is the regular expression I initially wrote : {(string|int)(([]|[[0-9]])?)}. It worked well but wouldn't allow more than one digit within the square bracket. To overcome this problem, I modified it this way : {(string|int)(([]|[[0-9]*])?)}.
The modified regex seems to have serious problems. It matches {string[][]}. Can you please tell me what causes it to match this? Also, when I try to enclose [0-9] within parentheses, I get an exception saying "too many )'s". Why does this happen?
Can you please tell me how to write the regular expression that would satisfy my requirements?
Note: I am trying this in C#
You need to escape the special characters like {} and []:
\{(string|int)(\[\d*\])?\}
You might need to use [0-9] instead of \d depending on the engine you use.
You need to escape the [ as it indicates a character set in regular expressions; otherwise [[0-9]*] would be interpreted as character class of [ and 0–9, * quantifier, followed by a literal ]. So:
{(string|int)(\[[0-9]*])?}
And since the quantifier * allows zero repetitions too, you don’t need the special case for an empty [].
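Here is a minimal sketch that tries the escaped pattern in C#. The ^ and $ anchors and the name typeSpec are additions for this example, on the assumption that the whole string has to have this form.

```csharp
using System;
using System.Text.RegularExpressions;

// Braces and brackets are escaped so that they are taken literally.
var typeSpec = new Regex(@"^\{(string|int)(\[[0-9]*\])?\}$");

foreach (var input in new[] { "{string}", "{string[]}", "{string[6]}", "{int[42]}", "{string[][]}" })
{
    Console.WriteLine($"{input}: {typeSpec.IsMatch(input)}");
}
// {string}: True
// {string[]}: True
// {string[6]}: True
// {int[42]}: True
// {string[][]}: False
```

Note that [0-9]* also accepts [0] and [007]. If the number really has to be a positive integer without leading zeros, ([1-9][0-9]*)? between the escaped brackets would be one way to express that.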
Please suggest how to write the following regular expression in C#/.NET:
I need a regex that matches non-alphabetic characters (anything other than a to z and A to Z) that are also non-numeric (anything other than 0 to 9).
In other words, I want the reverse of the usual class: a regular expression for everything other than letters and digits (0 to 9).
Kindly suggest a solution.
You can use a negated character class here:
[^a-zA-Z0-9]
The above regex will match a single character that is not a Latin lowercase or uppercase letter or a digit.
The ^ at the start of the character class (the part between [ and ]) negates the complete class so that it matches anything not in the class, instead of normal character class behavior.
To make it useful, you probably want one of those:
Zero or more such characters
[^a-zA-Z0-9]*
The asterisk (*) here signifies that the preceding part can be repeated zero or more times.
One or more such characters
[^a-zA-Z0-9]+
The plus (+) here signifies that the preceding part can be repeated one or more times.
A complete (possibly empty) string, consisting only of such characters
^[^a-zA-Z0-9]*$
Here the characters ^ and $ have a meaning as anchors, matching the start and end of the string, respectively. This ensures that the entire string consists of characters not in that character class and no other characters come before or after them.
A complete (non-empty) string, consisting only of such characters
^[^a-zA-Z0-9]+$
Elaborating a bit, this won't (and can't) make sure that you won't use any other characters, possibly from other scripts. The string аеΒ would be completely valid with the above regular expression, because it uses letters from Greek and Cyrillic. Furthermore there are other pitfalls. The string á will pass above regular expression, while the string ́a will not (because it constructs the letter á from the letter a and a combining diacritical mark).
So negated character classes have to be taken with care at times.
I can also use numerals from other scripts, if I wanted to: ١٢٣ :-)
You can use the character class
[^\p{L&}\p{Nd}]
if you need to take care of the above things.
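To see the difference between the ASCII-only class and a Unicode-aware one in C#, here is a small sketch. It assumes \p{L} (any letter) as a stand-in for \p{L&}, which is a narrower class (cased letters only) and which I would not rely on every engine accepting. The variable names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

var asciiOnly = new Regex(@"^[^a-zA-Z0-9]+$");
// Assumption: \p{L} (all letters) is used here instead of \p{L&}.
var unicodeAware = new Regex(@"^[^\p{L}\p{Nd}]+$");

string greek = "\u03B1\u03B2\u03B3";        // three Greek lowercase letters
string arabicDigits = "\u0661\u0662\u0663"; // three Arabic-Indic digits

Console.WriteLine(asciiOnly.IsMatch("!?#"));           // True
Console.WriteLine(asciiOnly.IsMatch("abc123"));        // False
Console.WriteLine(asciiOnly.IsMatch(greek));           // True: not a-z, A-Z or 0-9
Console.WriteLine(asciiOnly.IsMatch(arabicDigits));    // True
Console.WriteLine(unicodeAware.IsMatch("!?#"));        // True
Console.WriteLine(unicodeAware.IsMatch(greek));        // False: these are letters
Console.WriteLine(unicodeAware.IsMatch(arabicDigits)); // False: these are decimal digits
```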
just negate the class:
[^A-Za-z0-9]
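To respect locale settings, use the POSIX character class (this syntax comes from POSIX-style engines; the .NET regex engine does not support it):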
[^[:alnum:]]