Regex to match format of (x), (x) - c#

Can somebody help me write a regex for a format like (x), (x)
where x can be any single-digit number? I am able to match a format like (x) as follows:
Regex rgx = new Regex(@"^\(([^)]+)\)$", RegexOptions.IgnoreCase);

If you don't need to capture the non-numbers, then the only pattern actually required is \d, which matches a single digit.
Each match of \d will be one individual digit found as the parser works across the string.
For example:
// Requires: using System.Linq; using System.Text.RegularExpressions;
var values = Regex.Matches("(1) (2)", @"\d")
.OfType<Match>()
.Select (mt => mt.ToString())
.ToArray();
Console.WriteLine ("Numbers found: {0}", string.Join(", ", values));
// Writes out->
// Numbers found: 1, 2
Errata
The example you gave uses RegexOptions.IgnoreCase. This actually slows down pattern matching, because the parser has to fold every character to a common upper- or lowercase form before comparing it with the target text. Casing is also culture-sensitive (unless RegexOptions.CultureInvariant is specified), so the current culture's case mappings have to be consulted, which adds further work.
Since you are matching only digits and parentheses, that option makes no sense.
If you don't believe me, look at Jeff Atwood's (Stack Overflow co-founder) answer to "Is regex case insensitivity slower?"

Are you looking for something like this?
\(([0-9])\),\s?\([0-9]\)
Also, when trying to write Regexps, I would recommend using Regex101.com.
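A minimal sketch of the same check in Python (one of the languages this page already uses; it assumes Python's re treats this simple pattern the same way .NET does). It anchors the whole string with fullmatch, so text around the pair is rejected, and it adds a second capture group so both digits come back. The helper name parse_pair is hypothetical.

```python
import re

# The answer's pattern with a second capture group; re.ASCII limits \d to 0-9.
PAIR = re.compile(r"\((\d)\),\s?\((\d)\)", re.ASCII)

def parse_pair(text):
    """Return the two digits of '(x), (y)' as ints, or None if the format is wrong."""
    m = PAIR.fullmatch(text)  # fullmatch behaves like wrapping the pattern in ^...$
    return (int(m.group(1)), int(m.group(2))) if m else None

print(parse_pair("(3), (7)"))   # (3, 7)
print(parse_pair("(12), (7)"))  # None: x must be a single digit
```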

Related

Match all 'X' from 'Y' until 'Z'

Well, I hope the title is not too confusing. My task is to match (and replace) all Xs that are between Y and Z.
I use X,Y,Z since those values may vary at runtime, but that's not a problem at all.
What I've tried so far is this:
pattern = ".*Y.*?(X).*?Z.*";
This actually works, but only for one X. I simply can't figure out how to match all Xs between those "tags".
I also tried this:
pattern = @"((Y|\G).*?)(?!Z)(X)";
But this matches all Xs, ignoring the "tags".
What is the correct pattern to solve my problem? Thanks in advance :)
Edit
some more information:
X is a single char, Y and Z are strings
A more real life test string:
Some.text.with.dots [nodots]remove.dots.here[/nodots] again.with.dots
=> match .s between [nodots] and [/nodots]
(Note: I used XML-like syntax here, but that's not guaranteed, so unfortunately I cannot use a simple XML or HTML parser.)
In C#, if you need to replace some text inside some block of text, you may match the block(s) with a simple regex like (?s)(START)(.*?)(END) and then inside a match evaluator make the necessary replacements in the matched blocks.
In your case, you may use something like
var res = Regex.Replace(str, @"(?s)(\[nodots])(.*?)(\[/nodots])",
m => string.Format(
"{0}{1}{2}",
m.Groups[1].Value, // Restoring start delimiter
m.Groups[2].Value.Replace(".",""), // Modifying inner contents
m.Groups[3].Value // Restoring end delimiter
)
);
See the C# online demo
Pattern details:
(?s) - an inline version of the RegexOptions.Singleline modifier flag
(\[nodots]) - Group 1: starting delimiter (literal string [nodots])
(.*?) - Group 2: zero or more characters, as few as possible (thanks to (?s), this includes newlines)
(\[/nodots]) - Group 3: end delimiter (literal string [/nodots])
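The same "match the block, then fix up its inside" technique, sketched in Python with re.sub and a callback. Because the question says X, Y and Z vary at runtime, this sketch assumes they arrive as plain strings and passes Y and Z through re.escape so characters like [ are taken literally; the function name replace_between is hypothetical.

```python
import re

def replace_between(text, x, start, end, repl=""):
    """Replace every occurrence of x that lies between start and end."""
    block = re.compile(
        "(" + re.escape(start) + ")(.*?)(" + re.escape(end) + ")",
        re.DOTALL,  # same role as (?s): let . cross newlines
    )
    # Keep both delimiters; only the inner text (group 2) is modified.
    return block.sub(lambda m: m.group(1) + m.group(2).replace(x, repl) + m.group(3), text)

s = "Some.text.with.dots [nodots]remove.dots.here[/nodots] again.with.dots"
print(replace_between(s, ".", "[nodots]", "[/nodots]"))
# Some.text.with.dots [nodots]removedotshere[/nodots] again.with.dots
```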

How to match all regular expression groups with or without character between them

Here is my regular expression
(".+?")*([^{}\s]+)*({.+?})*
Generally, matching with this expression works well, but only if there is some character between the matched groups. For example, this:
{1.0 0b1 2006-01-01_12:34:56.789} {1.2345 0b100000 2006-01-01_12:34:56.789}
produces two matches:
1. {1.0 0b1 2006-01-01_12:34:56.789}
2. {1.2345 0b100000 2006-01-01_12:34:56.789}
but this:
{1.0 0b1 2006-01-01_12:34:56.789}{1.2345 0b100000 2006-01-01_12:34:56.789}
produces only one match, containing the last object:
{1.2345 0b100000 2006-01-01_12:34:56.789}
P.S. I'm using the g switch for a global match.
EDIT:
I did some research in the meantime and need to provide additional details. I pasted the whole regular expression, which also matches words and strings, so the asterisks after the groups are necessary.
EDIT2: Here is example text:
COMMAND STATUS {OBJECT1}{OBJECT2} "TEXT1" "TEXT2"
As a result I want this groups:
COMMAND
STATUS
{OBJECT1}
{OBJECT2}
"TEXT1"
"TEXT2"
Here is my actual C# code:
var regex = new Regex("(\".+?\")*([^{}\\s]+)*({.+?})*");
var matches = regex.Matches(responseString);
return matches
.Cast<Match>()
.Where(match => match.Success && !string.IsNullOrWhiteSpace(match.Value))
.Select(match => CommandParameter.Parse(match.Value));
You can use the following regex to capture all the {...}s:
(".+?"|[^{}\s]+|{[^}]+?})
See demo here.
My approach to capturing anything between a pair of single delimiter characters is to use a negated character class containing the closing character. Also, since you are matching non-empty text, you'd be better off using the + quantifier, which ensures that at least one character is matched.
EDIT:
Instead of making each group optional, you should use alternative lists.
You also have an extra * quantifier on the ({.+?}) sub-pattern.
You can use this regex:
("[^"]*"|{[^}]*}|[^{}\s]+)
RegEx Demo
Note how it matches both kinds of groups: those with a space between them and those without.
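A quick way to check the alternation-based pattern, sketched in Python (assuming Python's re handles these constructs like the JavaScript/.NET flavors here, which it does for this pattern). The braces are escaped only for readability, and the name TOKEN is hypothetical. Because each token type is a separate alternative and nothing is optional, every match consumes exactly one token, so adjacent {...} groups come out separately.

```python
import re

# One alternative per token type: quoted string, {...} object, bare word.
TOKEN = re.compile(r'("[^"]*"|\{[^}]*\}|[^{}\s]+)')

text = 'COMMAND STATUS {OBJECT1}{OBJECT2} "TEXT1" "TEXT2"'
print(TOKEN.findall(text))
# ['COMMAND', 'STATUS', '{OBJECT1}', '{OBJECT2}', '"TEXT1"', '"TEXT2"']
```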

Checking C# input for regex value of "IntegerxInteger"

I am new to C# and I am trying to detect whether a user entered input in the form IntxInt (e.g. 2x5 or 10x15). Neither number will go over 15.
From what I have gathered, I can just write a regex to detect this. I have been confused about writing regexes in C#, though, and no examples have been very useful yet.
I found this example here
string pattern = @"\*\d*\.txt";
Regex rgx = new Regex(pattern);
input = rgx.Replace(input, "");
And my guess of making it fit the above criteria would be something like this
string pattern = @"[0-9][0-9]*[x|X][0-9][0-9]*";
My reasoning is that I need at least one digit, possibly followed by another (I'm not sure how to limit it to zero or one more digit), then an X, then the same thing as the first part.
Is this right/wrong, why?
If it is correct, how do I test the input I receive?
Use this regex:
^(?i)(?:1[0-5]|[1-9])x(?:1[0-5]|[1-9])$
(?i) makes x case-insensitive so it can match both x and X.
(?:1[0-5]|[1-9]) matches digits from "1" to "15".
Here is a regex demo.
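A Python sketch of how to test input against the answer's pattern, and of why the guess from the question is too loose: [x|X] also accepts a literal |, and [0-9]* has no upper bound. Two assumptions: capture groups are added to get the numbers back, and (?i) is replaced by re.IGNORECASE because Python 3.11+ rejects inline global flags that are not at the very start of a pattern. The names parse_dimensions, DIMENSIONS and GUESS are hypothetical. In C#, the anchored pattern with Regex.IsMatch does the same job.

```python
import re

# Answer's pattern with capture groups; IGNORECASE lets x match X too.
DIMENSIONS = re.compile(r"(1[0-5]|[1-9])x(1[0-5]|[1-9])", re.IGNORECASE)

def parse_dimensions(text):
    """Return (a, b) if text is exactly AxB with 1 <= A, B <= 15, else None."""
    m = DIMENSIONS.fullmatch(text.strip())  # strip() drops e.g. a trailing newline
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# The guess from the question, for comparison.
GUESS = re.compile(r"[0-9][0-9]*[x|X][0-9][0-9]*")

print(parse_dimensions("10X15"))       # (10, 15)
print(parse_dimensions("16x2"))        # None
print(bool(GUESS.fullmatch("3|4")))    # True: '|' is a literal inside [...]
print(bool(GUESS.fullmatch("999x1")))  # True: no upper bound on digits
```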

Regex to ensure that in a string such as "05123:12315", the first number is less than the second?

I must have strings in the format x:y where x and y have to be five digits (zero padded) and x <= y.
Example:
00515:02152
What Regex will match this format?
If possible, please explain the solution briefly to help me learn.
EDIT: Why do I need Regex? I've written a generic tool that takes input and validates it according to a configuration file. An unexpected requirement popped up that would require me to validate a string in the format I've shown (using the configuration file). I was hoping to solve this problem using the existing configuration framework I've coded up, as splitting and parsing would be out of the scope of this tool. For an outstanding requirement such as this, I don't mind having some unorthodox/messy regex, as long as it's not 10000 lines long. Any intelligent solutions using Regex are appreciated! Thanks.
Description
This expression validates that the first 5-digit number is smaller than the second, where the zero-padded 5-digit numbers are in a colon-delimited string formatted like 01234:23456. Note that the comparison is strict: equal values such as 00515:00515, which the question allows, are rejected.
^
(?:
(?=0....:[1-9]|1....:[2-9]|2....:[3-9]|3....:[4-9]|4....:[5-9]|5....:[6-9]|6....:[7-9]|7....:[8-9]|8....:[9])
|(?=(.)(?:0...:\1[1-9]|1...:\1[2-9]|2...:\1[3-9]|3...:\1[4-9]|4...:\1[5-9]|5...:\1[6-9]|6...:\1[7-9]|7...:\1[8-9]|8...:\1[9]))
|(?=(..)(?:0..:\2[1-9]|1..:\2[2-9]|2..:\2[3-9]|3..:\2[4-9]|4..:\2[5-9]|5..:\2[6-9]|6..:\2[7-9]|7..:\2[8-9]|8..:\2[9]))
|(?=(...)(?:0.:\3[1-9]|1.:\3[2-9]|2.:\3[3-9]|3.:\3[4-9]|4.:\3[5-9]|5.:\3[6-9]|6.:\3[7-9]|7.:\3[8-9]|8.:\3[9]))
|(?=(....)(?:0:\4[1-9]|1:\4[2-9]|2:\4[3-9]|3:\4[4-9]|4:\4[5-9]|5:\4[6-9]|6:\4[7-9]|7:\4[8-9]|8:\4[9]))
)
\d{5}:\d{5}$
Live demo: http://www.rubular.com/r/w1QLZhNoEa
Note that this uses the x option, which ignores all whitespace and allows comments; if you use it without x, the expression needs to be all on one line.
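To see the lookahead trick in action, here is a Python sketch that compiles the pattern verbatim with re.VERBOSE standing in for the x option, then cross-checks it against plain integer comparison. It assumes Python's re evaluates these lookaheads and backreferences like Ruby/.NET, which holds for these constructs. Each alternative says "the first k digits are equal (captured and back-referenced), and the next digit of x is smaller than the next digit of y". The helper first_is_smaller is hypothetical.

```python
import re

# The answer's pattern, copied verbatim. re.VERBOSE ignores the layout whitespace.
# (In Python, $ also accepts a trailing newline; use \Z to forbid one.)
LESS_THAN = re.compile(r"""
^
(?:
 (?=0....:[1-9]|1....:[2-9]|2....:[3-9]|3....:[4-9]|4....:[5-9]|5....:[6-9]|6....:[7-9]|7....:[8-9]|8....:[9])
 |(?=(.)(?:0...:\1[1-9]|1...:\1[2-9]|2...:\1[3-9]|3...:\1[4-9]|4...:\1[5-9]|5...:\1[6-9]|6...:\1[7-9]|7...:\1[8-9]|8...:\1[9]))
 |(?=(..)(?:0..:\2[1-9]|1..:\2[2-9]|2..:\2[3-9]|3..:\2[4-9]|4..:\2[5-9]|5..:\2[6-9]|6..:\2[7-9]|7..:\2[8-9]|8..:\2[9]))
 |(?=(...)(?:0.:\3[1-9]|1.:\3[2-9]|2.:\3[3-9]|3.:\3[4-9]|4.:\3[5-9]|5.:\3[6-9]|6.:\3[7-9]|7.:\3[8-9]|8.:\3[9]))
 |(?=(....)(?:0:\4[1-9]|1:\4[2-9]|2:\4[3-9]|3:\4[4-9]|4:\4[5-9]|5:\4[6-9]|6:\4[7-9]|7:\4[8-9]|8:\4[9]))
)
\d{5}:\d{5}$
""", re.VERBOSE)

def first_is_smaller(s):
    return LESS_THAN.match(s) is not None

# Cross-check against integer comparison on a few sample values.
samples = [0, 9, 515, 2152, 12345, 99998, 99999]
for a in samples:
    for b in samples:
        assert first_is_smaller(f"{a:05d}:{b:05d}") == (a < b)

print(first_is_smaller("00515:02152"))  # True
print(first_is_smaller("00515:00515"))  # False: equal values are rejected
```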
The language you want to recognize is finite, so the easiest thing to do is just list all the cases separated by "or". The regexp you want is:
(00000:(00000|00001| ... |99999))| ... |(99998:(99998|99999))|(99999:99999)
That regexp will be several billion characters long and take quite some time to execute, but it is what you asked for: a regular expression that matches the stated language.
Obviously that's impractical. Now is it clear why regular expressions are the wrong tool for this job? Use a regular expression to match 5 digits - colon - five digits, and then once you know you have that, split up the string and convert the two sets of digits to integers that you can compare.
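To put a number on "impractical": the alternation needs one case per pair x <= y, and with N = 10^5 possible five-digit values the count is

```latex
\[
\#\{(x, y) : 0 \le x \le y \le 99999\}
  = \frac{N(N+1)}{2}
  = \frac{10^5 \cdot (10^5 + 1)}{2}
  = 5\,000\,050\,000, \qquad N = 10^5 .
\]
```

Even at only six characters per case (the five digits of y plus a separator), that is about 3 × 10^10 characters.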
As for x <= y: well, you are using the wrong tool. Regex really can't help you here, and even if you get a solution, it will be too complex and too difficult to extend.
Regex is a text-processing tool for matching patterns in regular languages. It is very weak when it comes to semantics: it cannot identify meaning in the given string. In your case, enforcing the x <= y condition requires knowing the numerical values.
For example, it can match a sequence of digits, or a mix of digits and other characters, but it cannot sensibly do things like:
match a number greater than 15 and less than 1245, or
match a date between two given dates.
So wherever matching a pattern involves applying semantics to the matched string, regex is not an option.
The appropriate approach here is to split the string on the colon and then compare the numbers. Leading zeros are not a problem, since integer parsing (for example int.Parse("00515") in C#) simply ignores them.
You can't generally* do this with regex. You can use regex to match the pattern and extract the numbers, then compare the numbers in your code.
For example to match such format (without comparing the numbers) and get the numbers you could use:
^(\d{5}):(\d{5})\z
*) You probably could in this case (as the numbers are always 5 digits and zero padded), but it wouldn't be nice.
You should do something like this instead:
bool IsCorrect(string s)
{
string[] split = s.Split(':');
int number1, number2;
if (split.Length == 2 && split[0].Length == 5 && split[1].Length == 5)
{
if (int.TryParse(split[0], out number1) && int.TryParse(split[1], out number2) && number1 <= number2)
{
return true;
}
}
return false;
}
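The regex-then-compare approach suggested above (match ^(\d{5}):(\d{5})\z, then compare the numbers in code) can be sketched in Python as follows; fullmatch plays the role of the ^...\z anchors, and the name is_correct mirrors the C# method but is otherwise hypothetical. One side benefit over splitting plus int.TryParse: the regex only admits digits, so inputs like "+1234" are rejected before parsing.

```python
import re

# Step 1: validate the shape; re.ASCII restricts \d to 0-9.
FORMAT = re.compile(r"(\d{5}):(\d{5})", re.ASCII)

def is_correct(s):
    m = FORMAT.fullmatch(s)
    # Step 2: compare the numeric values; int() ignores the leading zeros.
    return m is not None and int(m.group(1)) <= int(m.group(2))

print(is_correct("00515:02152"))  # True
print(is_correct("02152:00515"))  # False
```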
With regex you can't make comparisons to see if a number is bigger than another number.
Let me show you a good example of why you shouldn't try to do this. This is a regex that (nearly) does the same job.
https://gist.github.com/anonymous/ad74e73f0350535d09c1
Raw file:
https://gist.github.com/anonymous/ad74e73f0350535d09c1/raw/03ea835b0e7bf7ac3c5fb6f9c7e934b83fb09d95/gistfile1.txt
Except that it only handles 3 digits. For 4 digits, the program that generates these regexes fails with an OutOfMemoryException, even with gcAllowVeryLargeObjects enabled; memory use reached 5 GB before it crashed. You don't want most of your app to be a regex, right?
This is not a Regex's job.
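The growth is easy to reproduce. The generator below is a hypothetical Python sketch, not the program behind the gist: it emits one alternative per pair x <= y of zero-padded numbers of the given width, which is exactly the brute-force language from the earlier answer, and prints the resulting pattern length. The length grows roughly a hundredfold per extra digit.

```python
import re

def brute_force_pattern(width):
    """Hypothetical generator: one alternative per pair x <= y of zero-padded numbers."""
    nums = [str(i).zfill(width) for i in range(10 ** width)]
    alternatives = [f"{x}:{y}" for i, x in enumerate(nums) for y in nums[i:]]
    return "^(?:" + "|".join(alternatives) + ")$"

for width in (1, 2, 3):
    print(width, len(brute_force_pattern(width)))
# 1 225
# 2 30305
# 3 4004005

p = re.compile(brute_force_pattern(2))  # still small enough to compile
print(bool(p.fullmatch("05:17")), bool(p.fullmatch("17:05")))  # True False
```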
This is a two-step process, because regex is a text parser, not an analyzer. That said, regex is perfect for validating that we have the 5:5 digit pattern; the pattern below checks for the \d\d\d\d\d:\d\d\d\d\d form. If that form is not found, the match fails and the whole validation fails. If it is found, we can use regex/LINQ to parse out the numbers and then check them.
This code would be inside a method to do the check
var data = "00515:02151";
var pattern = @"
^ # starting from the beginning of the string...
(?=[\d:]{11}) # Are there at least 11 characters consisting only of digits and colons? Fail if not
(?=\d{5}:\d{5}) # Does it fall into our pattern? If not fail the match
((?<Values>[^:]+)(?::?)){2}
";
// IgnorePatternWhitespace lets us lay out and comment the pattern; unescaped whitespace and # comments are ignored by the parser
var result = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => mt.Groups["Values"].Captures
.OfType<Capture>()
.Select (cp => int.Parse(cp.Value)))
.FirstOrDefault();
// Two values at this point 515, 2151
bool valid = ((result != null) && (result.First () <= result.Last ())); // x <= y, as required
Console.WriteLine (valid); // True
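In JavaScript, this approach works:
var input = "00515:02152";
// The callback receives the whole match, followed by each capture group.
var result = input.replace(/^(\d{5}):(\d{5})$/, function (match, x, y) {
// Keep the string when x <= y; otherwise replace it with an empty string.
return parseInt(x, 10) <= parseInt(y, 10) ? match : "";
});
var valid = result !== ""; // true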
FIDDLE http://jsfiddle.net/VdzF7/

Why doesn't finite repetition in lookbehind work in some flavors?

I want to extract the two middle digits from a date in dd/mm/yy format, while also allowing single digits for day and month.
This is what I came up with:
(?<=^[\d]{1,2}\/)[\d]{1,2}
I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}\/ before it.
This doesn't work for many combinations; I have tested 10/10/10, 11/12/13, etc.
But to my surprise (?<=^\d\d\/)[\d]{1,2} worked.
But the [\d]{1,2} should also match if \d\d did, or am I wrong?
On lookbehind support
Major regex flavors support lookbehind to varying degrees; some impose certain restrictions, and some don't support it at all.
Javascript: not supported (at the time of writing; ECMAScript 2018 later added lookbehind without length restrictions)
Python: fixed length only
Java: finite length only
.NET: no restriction
References
regular-expressions.info/Flavor comparison
Python
In Python, where only fixed length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating on two different fixed-length lookbehinds, e.g. something like this:
(?<=^\d\/)\d{1,2}|(?<=^\d\d\/)\d{1,2}
Or perhaps you can put both lookbehinds as alternates of a non-capturing group:
(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}
(note that you can just use \d without the brackets).
That said, it's probably much simpler to use a capturing group instead:
^\d{1,2}\/(\d{1,2})
Note that findall returns what group 1 captures if you have only one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (as in this case).
This snippet illustrates all of the above points:
import re
p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')
print(p.findall("12/34/56")) # ['34']
print(p.findall("1/23/45")) # ['23']
p = re.compile(r'^\d{1,2}\/(\d{1,2})')
print(p.findall("12/34/56")) # ['34']
print(p.findall("1/23/45")) # ['23']
p = re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
# raises re.error: look-behind requires fixed-width pattern
References
regular-expressions.info/Lookarounds, Character classes, Alternation, Capturing groups
Java
Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern. This is demonstrated by the following snippet:
String text =
"12/34/56 date\n" +
"1/23/45 another date\n";
Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group());
} // "34", "23"
Note that (?m) is the embedded Pattern.MULTILINE so that ^ matches the start of every line. Note also that since \ is an escape character for string literals, you must write "\\" to get one backslash in Java.
C-Sharp
C# supports full regex on lookbehind. The following snippet shows how you can use + repetition on a lookbehind:
var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";
Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
Console.WriteLine(m);
} // "23", "34", "45", "56"
Note that unlike Java, in C# you can use @-quoted (verbatim) strings so that you don't have to escape \.
For completeness, here's how you'd use the capturing group option in C#:
Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}
Given the previous text, this prints:
Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56
Related questions
How can I match on, but exclude a regex pattern?
Unless there's a specific reason for using the lookbehind which isn't noted in the question, how about simply matching the whole thing and only capturing the bit you're interested in instead?
JavaScript example:
>>> /^\d{1,2}\/(\d{1,2})\/\d{1,2}$/.exec("12/12/12")[1]
"12"
To quote regular-expressions.info:
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind. Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regex of which the length of the match can be predetermined. This means you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.
In other words, your regex does not work because you're using a variable-width expression inside a lookbehind, and your regex engine does not support that.
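The "how many steps to step back" idea can be made concrete with a simplified Python model. This is an illustrative toy, not how any particular engine is implemented, and the function name find_after is hypothetical: to emulate (?<=behind)target, it steps back every allowed distance k at each position and tests whether exactly text[pos-k:pos] matches behind. A finite range of k is what makes this possible; with unbounded repetition there would be no fixed list of distances to try. Pattern.fullmatch with pos/endpos keeps ^ tied to the real start of the string, as in a real lookbehind.

```python
import re

def find_after(text, behind, target, min_back, max_back):
    """Toy model of a bounded lookbehind (?<=behind)target."""
    behind_re = re.compile(behind)
    target_re = re.compile(target)
    found = []
    pos = 0
    while pos <= len(text):
        # Step back each allowed distance and test the text just before pos.
        looks_ok = any(
            k <= pos and behind_re.fullmatch(text, pos - k, pos)
            for k in range(min_back, max_back + 1)
        )
        m = target_re.match(text, pos) if looks_ok else None
        if m:
            found.append(m.group())
            pos = m.end() if m.end() > pos else pos + 1
        else:
            pos += 1
    return found

# ^\d{1,2}/ is always 2 or 3 characters long, so two step-back distances suffice.
print(find_after("12/34/56", r"^\d{1,2}/", r"\d{1,2}", 2, 3))  # ['34']
print(find_after("1/23/45", r"^\d{1,2}/", r"\d{1,2}", 2, 3))   # ['23']
```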
In addition to those listed by @polygenelubricants, there are two more exceptions to the "fixed length only" rule. In PCRE (the regex engine for PHP, Apache, et al.) and Oniguruma (Ruby 1.9, Textmate), a lookbehind may consist of an alternation in which each alternative may match a different number of characters, as long as the length of each alternative is fixed. For example:
(?<=\b\d\d/|\b\d/)\d{1,2}(?=/\d{2}\b)
Note that the alternation has to be at the top level of the lookbehind subexpression. You might, like me, be tempted to factor out the common elements, like this:
(?<=\b(?:\d\d|\d)/)\d{1,2}(?=/\d{2}\b)
...but it wouldn't work; at the top level, the subexpression now consists of a single alternative with a non-fixed length.
The second exception is much more useful: \K, supported by Perl and PCRE. It effectively means "pretend the match really started here." Whatever appears before it in the regex is treated as a positive lookbehind. As with .NET lookbehinds, there are no restrictions; whatever can appear in a normal regex can be used before the \K.
\b\d{1,2}/\K\d{1,2}(?=/\d{2}\b)
But most of the time, when someone has a problem with lookbehinds, it turns out they shouldn't even be using them. As @insin pointed out, this problem can be solved much more easily by using a capturing group.
EDIT: Almost forgot JGSoft, the regex flavor used by EditPad Pro and PowerGrep; like .NET, it has completely unrestricted lookbehinds, positive and negative.
