Checking C# input for regex value of "IntegerxInteger" - c#

I am new to C# and I am trying to detect whether a user entered something of the form IntxInt (e.g. 2x5 or 10x15). Neither number will go over 15.
From what I have gathered, I can use a regex to detect this. I have been confused about writing regexes in C#, though, and no examples have been very useful yet.
I found this example here
string pattern = @"\*\d*\.txt";
Regex rgx = new Regex(pattern);
input = rgx.Replace(input, "");
And my guess of making it fit the above criteria would be something like this
string pattern = @"[0-9][0-9]*[x|X][0-9][0-9]*";
My reasoning is that I need at least one digit, possibly followed by another (I am not sure how to limit it to 0 or 1 more digits), followed by an X, and then the same thing as the first part.
Is this right/wrong, why?
If this is correct, how do I "test" the input I get in?

Use this regex:
^(?i)(?:1[0-5]|[1-9])x(?:1[0-5]|[1-9])$
(?i) makes x case-insensitive so it can match both x and X.
(?:1[0-5]|[1-9]) matches the numbers from 1 to 15.
Here is a regex demo.
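To address the "how do I test the input" part, here is a minimal sketch in Python rather than C#. It uses the same alternation as the pattern above. The function name is_valid_dimensions is made up for illustration, and re.IGNORECASE with fullmatch stands in for the inline (?i) and the ^...$ anchors.

```python
import re

# Same alternation as the answer: 10-15 first, then a single digit 1-9.
# re.IGNORECASE plays the role of (?i); fullmatch plays the role of ^...$.
DIMENSIONS = re.compile(r"(?:1[0-5]|[1-9])x(?:1[0-5]|[1-9])", re.IGNORECASE)

def is_valid_dimensions(text: str) -> bool:
    """Return True when text looks like '2x5' or '10X15' (each side 1..15)."""
    return DIMENSIONS.fullmatch(text) is not None

print(is_valid_dimensions("2x5"))    # True
print(is_valid_dimensions("10X15"))  # True
print(is_valid_dimensions("16x2"))   # False
```

In C# the equivalent check would be a call to Regex.IsMatch with the anchored pattern from the answer.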

Related

Regular expression with a limited length, spaces and other limitations

I'm looking for the following regex:
The match can be empty.
If it is not empty, it must contain at least 2 characters which are English letters or digits.
The regex must allow spaces between words.
This is what I came up with:
^[a-zA-Z0-9]{2,}$
It works fine, but it does not accept spaces between words.
You can use this regex, which matches all kinds of spaces (even a hard space) and also allows an empty string:
(?i)^(?:[a-z0-9]{2}[a-z0-9\p{Zs}]*|)$
C#:
var rg11x = new Regex(@"(?i)^(?:[a-z0-9]{2}[a-z0-9\p{Zs}]*|)$");
var tst = rg11x.IsMatch(""); // true
var tst1 = rg11x.Match("Mc Donalds").Value; // Mc Donalds
You can use ^[a-zA-Z\d]{2}[a-zA-Z\d\s]*?$
Here is also a useful site for learning and testing regex patterns.
http://regex101.com/
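A rough Python sketch of the same idea (empty input, or at least two letters/digits followed by letters, digits, and spaces). Python's built-in re module has no \p{Zs}, so this sketch substitutes \s, which is an assumption and not an exact equivalent. The name is_valid_name is hypothetical.

```python
import re

# Optional group: either nothing at all, or two alphanumerics followed by
# any mix of alphanumerics and whitespace.
# re.ASCII keeps \s and case-insensitive [a-z] limited to ASCII characters.
NAME = re.compile(r"(?:[a-z0-9]{2}[a-z0-9\s]*)?", re.IGNORECASE | re.ASCII)

def is_valid_name(text: str) -> bool:
    return NAME.fullmatch(text) is not None

print(is_valid_name(""))            # True
print(is_valid_name("Mc Donalds"))  # True
print(is_valid_name("a"))           # False
```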

c# String - Split Pascal Case

I've been trying to get a C# regex command to turn something like
EYDLessThan5Days
into
EYD Less Than 5 Days
Any ideas?
The code I used :
public static string SplitPascalCase(this string value) {
Regex NameExpression = new Regex("([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z0-9]+)",
RegexOptions.Compiled);
return NameExpression.Replace(value, " $1").Trim();
}
Out:
EYD Less Than5 Days
But it still gives me the wrong result.
I already asked about this for JavaScript, but when I implemented the same logic in C#, it failed.
Please help me. Thanks.
Use lookarounds in your regex so that it won't consume any characters and it allows overlapping of matches.
(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])
Replace the matched boundaries with a space.
Regex.Replace(yourString, @"(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])", " ");
Explanation:
(?<=[A-Za-z])(?=[A-Z][a-z]) Matches the boundary between an upper- or lowercase letter and an uppercase letter that is immediately followed by a lowercase letter. For example, in the string ABc this matches the boundary between A and Bc. In the string aBc it matches the boundary between a and Bc.
| Logical OR operator.
(?<=[a-z0-9])(?=[0-9]?[A-Z]) Matches the boundary between a lowercase letter or digit and an optional digit that is immediately followed by an uppercase letter. For example, in the string a9A this matches the boundary between a and 9A, and also the boundary between 9 and A, because [0-9] is optional in the positive lookahead.
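The boundary-matching approach explained above can be sketched in Python as follows. Both lookbehinds are fixed-width, so the pattern is accepted unchanged by Python's re module. The function name split_pascal_case mirrors the question's method and is otherwise arbitrary.

```python
import re

# Zero-width boundaries: nothing is consumed, only positions are matched,
# and each matched position is replaced by a single space.
BOUNDARY = re.compile(
    r"(?<=[A-Za-z])(?=[A-Z][a-z])"      # letter, then Upper + lower
    r"|(?<=[a-z0-9])(?=[0-9]?[A-Z])"    # lower/digit, then optional digit + Upper
)

def split_pascal_case(value: str) -> str:
    return BOUNDARY.sub(" ", value)

print(split_pascal_case("EYDLessThan5Days"))  # EYD Less Than 5 Days
```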
You can just match and join..
var arr = Regex.Matches(str, @"[A-Z]+(?=[A-Z][a-z]+)|\d|[A-Z][a-z]+").Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(String.Join(" ",arr));
The regex isn't complex at all, it is just capturing each and joining them with a " "
Something like this should do
string pattern=@"(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)|(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";
Regex.Replace(input,pattern," ");

C# regex match, match.Success returns false even after following the rules

Friends,
I want to match a string like
"int lnum[];" so I am trying to match it with a pattern like this
[A-Za-z_0-9] [A-Za-z_0-9]\[\]
but it does not seem to work.
I looked up rules at http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
string pJavaLine = "int lnum[]";
match = Regex.Match(pJavaLine, @"[A-Za-z_0-9] [A-Za-z_0-9]\[\] ", RegexOptions.IgnoreCase);
if (match.Success) {
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
the match.Success returns false.
Would anybody please let me know a possible way to get this.
Each of your character classes, like [A-Za-z_0-9], matches only a single character. If you want to match more than one character, you need to add something to the end. For example, [A-Za-z_0-9]+ -- the + means 1 or more of these. You could also use * for 0 or more, or specify a range, like {2,5} for 2-5 characters.
That said, you can use this pattern to match that string:
[A-Za-z_0-9]+ [A-Za-z_0-9]+\[\]
The \w is loosely equivalent to [A-Za-z_0-9] (see link in jessehouwing's comment below), so you can probably simply use:
\w+ \w+\[\]
Check here for more info on the standard Character Classes.
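A small Python sketch of the difference between a bare character class and one followed by +. The capturing parentheses are an addition for illustration: the question's code reads Groups[1], but its pattern defines no group.

```python
import re

line = "int lnum[]"

# One character per class: "t l" is followed by "n", not "[]", so no match.
single = re.search(r"[A-Za-z_0-9] [A-Za-z_0-9]\[\]", line)

# "+" lets each class consume a whole identifier; parentheses capture them.
repeated = re.search(r"([A-Za-z_0-9]+) ([A-Za-z_0-9]+)\[\]", line)

print(single)             # None
print(repeated.groups())  # ('int', 'lnum')
```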

Regex which ensures no character is repeated

I need to ensure that an input string follows these rules:
It should contain upper case characters only.
NO character should be repeated in the string.
eg. ABCA is not valid because 'A' is being repeated.
For the upper case thing, [A-Z] should be fine.
But I am lost on how to ensure there are no repeated characters.
Can someone suggest some method using regular expressions ?
You can do this with .NET regular expressions although I would advise against it:
string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");
Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:
bool result = s.Cast<char>().Distinct().Count() == s.Length;
Alternatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which ones you have seen.
This cannot be done via regular expressions, because they are context-free. You need at least a context-sensitive grammar, so the only way to achieve this is to write the function by hand.
See formal grammar for background theory.
Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)
Use negative lookahead and backreference.
string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
string s1 = "ABCDEF";
string s2 = "ABCDAEF";
string s3 = "ABCDEBF";
Console.WriteLine(Regex.IsMatch(s1, pattern));//True
Console.WriteLine(Regex.IsMatch(s2, pattern));//False
Console.WriteLine(Regex.IsMatch(s3, pattern));//False
\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.
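For comparison, here is a Python sketch that puts the lookahead-plus-backreference pattern next to the "count distinct characters" check suggested in an earlier answer. Both function names are made up for this example. The set-based version rejects the empty string so that it agrees with the regex, which requires at least one letter.

```python
import re

# Negative lookahead: reject if any character appears again later on.
# fullmatch replaces the ^...$ anchors of the C# pattern.
NO_REPEATS = re.compile(r"(?!.*(.).*\1)[A-Z]+")

def unique_upper_regex(s: str) -> bool:
    return NO_REPEATS.fullmatch(s) is not None

def unique_upper_set(s: str) -> bool:
    # Non-empty, only A-Z, and as many distinct characters as characters.
    return bool(s) and all("A" <= c <= "Z" for c in s) and len(set(s)) == len(s)

for sample in ("ABCDEF", "ABCDAEF", "ABCDEBF"):
    print(sample, unique_upper_regex(sample), unique_upper_set(sample))
```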
This isn't regex, and it would be slow, but you could put the characters of the string into an array and then iterate through the array, comparing each element with the ones after it.
=Waldo
It can be done using what is called a backreference.
I am a Java programmer, so I will show you how it is done in Java (for C#, see here).
final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());
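The regular expression is ([A-Z]).*\\1, which translates to: any character from 'A' to 'Z' captured as group 1 (([A-Z])), followed by anything (.*), followed by whatever group 1 captured (\\1).
In C#, the backreference inside the pattern is also written \1 ($1 is the syntax used in replacement strings).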
Hope this helps.

Why doesn't finite repetition in lookbehind work in some flavors?

I want to parse the 2 digits in the middle from a date in dd/mm/yy format but also allowing single digits for day and month.
This is what I came up with:
(?<=^[\d]{1,2}\/)[\d]{1,2}
I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}\/ before it.
This doesn't work on many combinations, I have tested 10/10/10, 11/12/13, etc...
But to my surprise (?<=^\d\d\/)[\d]{1,2} worked.
But the [\d]{1,2} should also match if \d\d did, or am I wrong?
On lookbehind support
Major regex flavors vary in their support for lookbehind; some impose certain restrictions, and some don't support it at all.
Javascript: not supported (at the time of writing; lookbehind was added in ES2018)
Python: fixed length only
Java: finite length only
.NET: no restriction
References
regular-expressions.info/Flavor comparison
Python
In Python, where only fixed length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating on two different fixed-length lookbehinds, e.g. something like this:
(?<=^\d\/)\d{1,2}|(?<=^\d\d\/)\d{1,2}
Or perhaps you can put both lookbehinds as alternates of a non-capturing group:
(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}
(note that you can just use \d without the brackets).
That said, it's probably much simpler to use a capturing group instead:
^\d{1,2}\/(\d{1,2})
Note that findall returns what group 1 captures if you only have one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (such as in this case).
This snippet illustrates all of the above points:
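import re

# Two fixed-width lookbehinds as alternatives of a non-capturing group
p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')
print(p.findall("12/34/56")) # ['34']
print(p.findall("1/23/45")) # ['23']
# Capturing group instead of lookbehind
p = re.compile(r'^\d{1,2}\/(\d{1,2})')
print(p.findall("12/34/56")) # ['34']
print(p.findall("1/23/45")) # ['23']
# The original pattern does not compile in Python:
# re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
# raises re.error: look-behind requires fixed-width pattern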
References
regular-expressions.info/Lookarounds, Character classes, Alternation, Capturing groups
Java
Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern. This is demonstrated by the following snippet:
String text =
"12/34/56 date\n" +
"1/23/45 another date\n";
Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group());
} // "34", "23"
Note that (?m) is the embedded Pattern.MULTILINE so that ^ matches the start of every line. Note also that since \ is an escape character for string literals, you must write "\\" to get one backslash in Java.
C-Sharp
C# supports full regex syntax in lookbehind. The following snippet shows how you can use + repetition in a lookbehind:
var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";
Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
Console.WriteLine(m);
} // "23", "34", "45", "56"
Note that unlike Java, in C# you can use an @-quoted (verbatim) string so that you don't have to escape \.
For completeness, here's how you'd use the capturing group option in C#:
Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}
Given the previous text, this prints:
Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56
Related questions
How can I match on, but exclude a regex pattern?
Unless there's a specific reason for using the lookbehind which isn't noted in the question, how about simply matching the whole thing and only capturing the bit you're interested in instead?
JavaScript example:
>>> /^\d{1,2}\/(\d{1,2})\/\d{1,2}$/.exec("12/12/12")[1]
"12"
To quote regular-expressions.info:
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind.
Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regex of which the length of the match can be predetermined. This means you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.
In other words your regex does not work because you're using a variable-width expression inside a lookbehind and your regex engine does not support that.
In addition to those listed by @polygenelubricants, there are two more exceptions to the "fixed length only" rule. In PCRE (the regex engine for PHP, Apache, et al) and Oniguruma (Ruby 1.9, Textmate), a lookbehind may consist of an alternation in which each alternative may match a different number of characters, as long as the length of each alternative is fixed. For example:
(?<=\b\d\d/|\b\d/)\d{1,2}(?=/\d{2}\b)
Note that the alternation has to be at the top level of the lookbehind subexpression. You might, like me, be tempted to factor out the common elements, like this:
(?<=\b(?:\d\d|\d)/)\d{1,2}(?=/\d{2}\b)
...but it wouldn't work; at the top level, the subexpression now consists of a single alternative with a non-fixed length.
The second exception is much more useful: \K, supported by Perl and PCRE. It effectively means "pretend the match really started here." Whatever appears before it in the regex is treated as a positive lookbehind. As with .NET lookbehinds, there are no restrictions; whatever can appear in a normal regex can be used before the \K.
\b\d{1,2}/\K\d{1,2}(?=/\d{2}\b)
But most of the time, when someone has a problem with lookbehinds, it turns out they shouldn't even be using them. As @insin pointed out, this problem can be solved much more easily by using a capturing group.
EDIT: Almost forgot JGSoft, the regex flavor used by EditPad Pro and PowerGrep; like .NET, it has completely unrestricted lookbehinds, positive and negative.
