Find multiple, non consecutive, occurrences of a character in string - c#

Can somebody please help me write a RegEx expression which can check whether a string contains more than one occurrence of an uppercase (or lowercase, it doesn't matter) letter, but not necessarily in a row?
I need to have at least two (or even better, n) occurrences in a string. If n=2, valid situations would be PassWord or PAssword or PASSWord.
When I tried this /(?=([A-Z]{2,3}))/g it matched PassWOrd but not PassWord.
What is strange to me is that it also matched PaSSWOrd. I thought the 3 in {2,3} actually means that no more than 3 uppercase characters will be matched. Why is SSWO matched?
I tried similar variations but none of them worked for me (nothing strange, as I'm not very familiar with RegEx).
Can this be done using RegEx?

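The (?=([A-Z]{2,3})) regex matches 2 to 3 consecutive uppercase ASCII letters anywhere inside a string. Because it is a lookahead, it is tested at every position, so in PaSSWOrd it finds the overlapping runs SSW, SWO and WO, which is why the whole SSWO stretch appears matched. You want to match a string that only contains 2 to 3 uppercase ASCII letters, not necessarily consecutively.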
To match a string that only contains two uppercase ASCII letters (no more no less), use the following expression:
^(?:[^A-Z]*[A-Z]){2}[^A-Z]*$
Or, if you only allow ASCII letters in the whole string:
^(?:[a-z]*[A-Z]){2}[a-z]*$
Pattern details
^ - start of string
(?:[^A-Z]*[A-Z]){2} - exactly 2 consecutive occurrences of
[^A-Z]* - zero or more chars other than ASCII uppercase letters
[A-Z] - one ASCII uppercase letter
[^A-Z]* - zero or more chars other than ASCII uppercase letters
$ - end of string.
In C#, use
var strs = new List<string> { "PassWord", "PAssword", "PASSWord"};
var n = 2;
var pat = $@"^(?:[^A-Z]*[A-Z]){{{n}}}[^A-Z]*$";
foreach (var s in strs) {
    Console.WriteLine("{0}: {1}", s, Regex.IsMatch(s, pat));
}
Result:
PassWord: True
PAssword: True
PASSWord: False
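The exact-count pattern above can be relaxed to an "at least n" check, which is closer to what the question asked for. The following is only a minimal sketch: the helper name HasAtLeastUpper is hypothetical, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when s contains at least n uppercase ASCII letters,
// consecutive or not. Dropping the trailing [^A-Z]*$ part of the pattern
// turns "exactly n" into "at least n".
static bool HasAtLeastUpper(string s, int n) =>
    Regex.IsMatch(s, $@"^(?:[^A-Z]*[A-Z]){{{n}}}");

foreach (var s in new[] { "PassWord", "PAssword", "PASSWord", "Password" })
{
    Console.WriteLine("{0}: {1}", s, HasAtLeastUpper(s, 2));
}
// PassWord: True
// PAssword: True
// PASSWord: True
// Password: False
```

With this variant PASSWord is accepted for n = 2, because five uppercase letters satisfy "at least two".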
Note that in case you need to require 2 uppercase ASCII letters in a string where other chars can be any chars, you do not need a regex, use LINQ:
var strs = new List<string> { "PassWord", "PAssword", "PASSWord"};
var n = 2;
foreach (var s in strs) {
    var res = s.Count(c => (c >= 65 && c <= 90));
    Console.WriteLine("{0}: {1}", s, res == n);
}
The .Count(c => (c >= 65 && c <= 90)) part counts the uppercase ASCII letters anywhere in the string, and res == n yields a boolean telling whether that number equals n. It can easily be adjusted for a numeric range check.
If you need Unicode compatibility, replace .Count(c => (c >= 65 && c <= 90)) with .Count(Char.IsUpper).
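The range check and the Unicode-aware count mentioned above could be combined roughly as follows. This is a sketch under assumptions: the helper name UpperCountInRange and its min/max parameters are made up for illustration, and the snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical helper: counts uppercase letters with char.IsUpper (Unicode-aware)
// and checks that the count lies in the inclusive range [min, max].
static bool UpperCountInRange(string s, int min, int max)
{
    int count = s.Count(char.IsUpper);
    return count >= min && count <= max;
}

Console.WriteLine(UpperCountInRange("PassWord", 2, 3));  // True  (2 uppercase letters)
Console.WriteLine(UpperCountInRange("PASSWord", 2, 3));  // False (5 uppercase letters)
Console.WriteLine(UpperCountInRange("Password", 2, 3));  // False (1 uppercase letter)
```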

Related

How to extract first character and numbers after that until find a character (a-z) in a string - C# 2.0

How to extract numbers from a second index of a string until find a character (a-z) in a string?
I am using C# version 2.0 (can't upgrade for some reasons).
Here are some examples;
M000067TGFD45F = M000067
B000064TFR765TXT = B000064
B000065TFR765 = B000065
B000067TGFD = B000067
I have tried Regex("[^0-9]") which works if there is no character after digits (4th example)
"B" + regexOnlyNumbers.Replace(mystring, string.Empty);
You can use
string text = "M000067TGFD45F";
Match m = Regex.Match(text, @"^[A-Z][0-9]+");
if (m.Success)
{
    Console.WriteLine(m.Value);
}
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[0-9]+ - one or more ASCII digits.
Alternatively, you might consider a ^[A-Z][^A-Z]+ pattern, where [^A-Z]+ matches any one or more chars other than uppercase ASCII letters.
To ignore case, use RegexOptions.IgnoreCase: Regex.Match(text, @"^[A-Z][0-9]+", RegexOptions.IgnoreCase).
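To see the pattern applied to all four inputs from the question, something like the following could be used. It is a sketch only: ExtractPrefix is a hypothetical helper name, and the snippet is written with top-level statements for brevity, so it needs a much newer compiler than the C# 2.0 mentioned in the question (the Regex.Match call itself is the same).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the leading letter plus the digits that follow it,
// or an empty string when the input does not start with that shape.
static string ExtractPrefix(string text)
{
    Match m = Regex.Match(text, @"^[A-Z][0-9]+", RegexOptions.IgnoreCase);
    return m.Success ? m.Value : "";
}

string[] samples = { "M000067TGFD45F", "B000064TFR765TXT", "B000065TFR765", "B000067TGFD" };
foreach (string sample in samples)
{
    Console.WriteLine("{0} = {1}", sample, ExtractPrefix(sample));
}
// M000067TGFD45F = M000067
// B000064TFR765TXT = B000064
// B000065TFR765 = B000065
// B000067TGFD = B000067
```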

How to find string which contains two numbers with a character between them

I want to find a pattern that contains two positive integers with a % or - character between them in a string. Consider the string "Приветственный_3%5": it contains the two numbers 3 and 5 with a % sign between them. I want to find strings that have such a portion, two numbers with a '%' or '-' sign between them.
You can use regular expressions for this. And you can even extract the integer values with Regex:
var input = "Приветственный_32%50";
var searchPattern = @"(\d+)[%-](\d+)";
var matches = Regex.Matches(input, searchPattern);
if (matches.Count == 1) {
    // A single occurrence of this pattern has been found in the input string.
    // We can extract the numbers from the Groups of this Match.
    // Group 0 is the entire match, groups 1 and 2 are the groups we captured with the parentheses
    var firstNumber = matches[0].Groups[1].Value;
    var secondNumber = matches[0].Groups[2].Value;
}
Regex pattern explanation:
(\d+) ==> matches one or more digits and captures it in a group with the parentheses.
[%-] ==> matches a single % or - character
(\d+) ==> matches one or more digits and captures it in a group with the parentheses.
Alternatively, create a simple function that follows these steps:
Loop through the whole text, since you want to check all of it.
If the current char is a digit, read the full number, for example 123.
You need to extract all of its digits before moving on to the next step.
If % or - is the next char after the number and the char after that is a digit,
extract the second number.
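The manual scan described in these steps might look roughly like this. It is a sketch, not the answerer's code: the name TryFindPair and its out parameters are hypothetical, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;

// Hypothetical helper following the steps above: find a run of digits,
// require '%' or '-' right after it, then read the second run of digits.
static bool TryFindPair(string text, out string first, out string second)
{
    first = second = "";
    int i = 0;
    while (i < text.Length)
    {
        if (!char.IsDigit(text[i])) { i++; continue; }

        int start = i;
        while (i < text.Length && char.IsDigit(text[i])) i++;   // i now sits just past the first number

        bool hasSeparator = i < text.Length && (text[i] == '%' || text[i] == '-');
        if (hasSeparator && i + 1 < text.Length && char.IsDigit(text[i + 1]))
        {
            int secondStart = i + 1;
            int j = secondStart;
            while (j < text.Length && char.IsDigit(text[j])) j++; // j now sits just past the second number
            first = text.Substring(start, i - start);
            second = text.Substring(secondStart, j - secondStart);
            return true;
        }
        // No separator or no second number here: keep scanning after this number.
    }
    return false;
}

if (TryFindPair("Приветственный_32%50", out string a, out string b))
{
    Console.WriteLine("{0} and {1}", a, b);   // 32 and 50
}
```

The helper stops at the first pair it finds; collecting every pair would need a list and a loop that continues after each hit.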

Regex: Combination of String Length and Starts With patterns

I want to check if the string contains exactly 11 characters, not more or less, and also if it starts with the numbers '09', so my pattern is:
Regex rg = new Regex(@"^(?=09)(?={11})");
Console.WriteLine(rg.IsMatch("09123456789"));
^(?=09) is working correctly, but when I add second part, (?={11}), an exception will be thrown. What's the right pattern?
You may achieve that without a regex:
if (s.Length == 11 && s.StartsWith("09") && s.All(Char.IsDigit))
If you do not need the string to consist of digits only, remove the s.All(Char.IsDigit) check.
Note that ^(?=09)(?={11}) matches a start of string position (with ^), then checks if the string starts with 09 substring, and then requires {11} literal char sequence at the beginning of a string. That can't work since 09 != {1.
If you need a regex you may use
\A09[0-9]{9}\z
or, to match not only digits:
\A09.{9}\z
where
\A - asserts the start of a string
09 - matches a literal char sequence 09
.{9} - matches 9 chars other than LF
\z - the very end of the string.
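The \A and \z anchors listed above differ from ^ and $ in one detail that matters for a strict length check: in .NET, $ also matches just before a final line feed, while \z does not. A small sketch of the difference, assuming top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

string withNewline = "09123456789\n";

// $ also matches just before a final line feed, so this reports True.
Console.WriteLine(Regex.IsMatch(withNewline, @"^09[0-9]{9}$"));     // True
// \z only matches at the very end of the string, so this reports False.
Console.WriteLine(Regex.IsMatch(withNewline, @"\A09[0-9]{9}\z"));   // False
// Both forms accept the plain 11-character value.
Console.WriteLine(Regex.IsMatch("09123456789", @"\A09[0-9]{9}\z")); // True
```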
new Regex(@"^09[0-9]{9}$");
This is the pattern for a valid string.
You can use
Regex rg = new Regex(@"^09.{9}$", RegexOptions.Compiled);
bool CheckStringRegex(string str)
{
    return rg.IsMatch(str);
}
but I suggest doing it more simply, without regexes:
bool CheckString(string str)
{
    return str.Length == 11 && str.StartsWith("09");
}
How about ^(?=09)\w{11}$ ?

Retrieve textboxdata in "" (double code) [duplicate]

This is the input string: 23x^45*y or 2x^2 or y^4*x^3.
I am matching ^[0-9]+ after letter x. In other words I am matching x followed by ^ followed by numbers. Problem is that I don't know that I am matching x, it could be any letter that I stored as variable in my char array.
For example:
foreach (char cEle in myarray) // cEle is letter in char array x, y, z, ...
{
match CEle in regex(input) //PSEUDOCODE
}
I am new to regex and I know that this can be done if I define regex variables, but I don't know how.
You can use the pattern @"[cEle]\^\d+", which you can create dynamically from your character array:
string s = "23x^45*y or 2x^2 or y^4*x^3";
char[] letters = { 'e', 'x', 'L' };
string regex = string.Format(@"[{0}]\^\d+",
    Regex.Escape(new string(letters)));
foreach (Match match in Regex.Matches(s, regex))
    Console.WriteLine(match);
Result:
x^45
x^2
x^3
A few things to note:
It is necessary to escape the ^ inside the regular expression otherwise it has a special meaning "start of line".
It is a good idea to use Regex.Escape when inserting literal strings from a user into a regular expression, to avoid that any characters they type get misinterpreted as special characters.
This will also match the x from the end of variables with longer names like tax^2. This can be avoided by requiring a word boundary (\b).
If you write x^1 as just x then this regular expression will not match it. This can be fixed by using (\^\d+)?.
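The last two notes can be combined in one pattern. The sketch below swaps in a different technique from the one named in the notes: instead of \b it uses letter lookarounds, because \b would also reject the x in 3x^4 (a digit followed by a letter is not a word boundary). The helper name FindTerms is hypothetical, the letters array is assumed to contain only letters, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds terms such as x^45, or a bare y, for the given variable letters.
// (?<![A-Za-z]) and (?![A-Za-z]) reject letters that are part of a longer name (the x in tax),
// and (?:\^\d+)? makes the exponent optional.
static string[] FindTerms(string input, char[] letters)
{
    string pattern = string.Format(@"(?<![A-Za-z])[{0}](?![A-Za-z])(?:\^\d+)?",
        Regex.Escape(new string(letters)));
    return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}

Console.WriteLine(string.Join(", ", FindTerms("tax^2 + 3x^4 + y", new[] { 'x', 'y' })));
// x^4, y
```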
From my point of view, the easiest and fastest way to implement this is the following:
Input: This?_isWhat?IWANT
string tokenRef = "?";
Regex pattern = new Regex($@"([^{tokenRef}\/>]+)");
The pattern skips my tokenRef and yields the following output:
Group1 This
Group2 _isWhat
Group3 IWANT
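Run against the sample input, the pieces listed above come back as three separate matches rather than three groups of one match. A minimal sketch, assuming top-level statements (C# 9 or later); wrapping tokenRef in Regex.Escape is an addition for safety and is not part of the original snippet:

```csharp
using System;
using System.Text.RegularExpressions;

string tokenRef = "?";
// Regex.Escape is an added precaution in case the token is a regex metacharacter.
Regex pattern = new Regex($@"([^{Regex.Escape(tokenRef)}\/>]+)");

foreach (Match m in pattern.Matches("This?_isWhat?IWANT"))
{
    Console.WriteLine(m.Value);
}
// This
// _isWhat
// IWANT
```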
Try using this pattern for capturing the number but excluding the x^ prefix:
(?<=x\^)[0-9]+
string strInput = "23x^45*y or 2x^2 or y^4*x^3";
foreach (Match match in Regex.Matches(strInput, @"(?<=x\^)[0-9]+"))
    Console.WriteLine(match);
This should print :
45
2
3
Do not forget to use the option IgnoreCase for matching, if required.

Special letters check

How can I check if a TextBox contains numbers, letters, and also have special letters like "õ, ä, ö, ü"?
I use code to check numbers and letters:
Regex.IsMatch(Value, "^[a-z0-9]+$", RegexOptions.IgnoreCase)
To check whether the TextBox contains letters and digits only, you can use:
bool isValid = textBox.Text.All(char.IsLetterOrDigit);
Consider the following example:
string str = "Something123õäö";
bool isValid = str.All(char.IsLetterOrDigit);
You will get true for the above case.
Does "How can you strip non-ASCII characters from a string? (in C#)" contain any pointers?
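You could include Unicode characters using the \uXXXX syntax within the regex for any additional letters you specifically want to test for.
Regex.IsMatch(Value, "^[a-z0-9\u00c0-\u00f6]+$", RegexOptions.IgnoreCase)
Note that this range ends at ö (U+00F6), so it does not cover ü (U+00FC); extend it to \u00ff if you need that letter, keeping in mind that the range also contains the × sign (U+00D7) and, once extended, ÷ (U+00F7).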
Just loop over every char and compare it either with a list of explicitly allowed chars or with char.GetUnicodeCategory for letters and digits:
var allowed = new[] { 'ö', 'ä' };
var isOK = textBox1.Text.All(c =>
    char.GetUnicodeCategory(c) == UnicodeCategory.LowercaseLetter ||
    char.GetUnicodeCategory(c) == UnicodeCategory.UppercaseLetter ||
    char.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber ||
    allowed.Contains(c));
UnicodeCategory.LowercaseLetter covers lowercase letters (not only 'a'..'z' but also non-ASCII ones such as 'ö' and 'ä'), UnicodeCategory.UppercaseLetter covers uppercase letters, and UnicodeCategory.DecimalDigitNumber covers digits, so the allowed array is only needed for extra characters outside these categories.
If you want to validate all "word characters", just use \w. To see whether a whole string consists only of word characters or digits, use the regex ^(\w|\d)+$ (since \w already includes digits, as well as the underscore, ^\w+$ is equivalent).
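The approaches from the answers above behave differently on underscores, which the following sketch tries to show side by side. It assumes top-level statements (C# 9 or later); the sample inputs are made up, and \p{L} and \p{Nd} are the Unicode categories for letters and decimal digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] inputs = { "Something123õäöü", "with_underscore", "two words" };
foreach (string input in inputs)
{
    // Letters and digits only, checked char by char.
    bool viaLinq = input.Length > 0 && input.All(char.IsLetterOrDigit);
    // Same idea as a regex: Unicode letters (\p{L}) and decimal digits (\p{Nd}).
    bool viaCategories = Regex.IsMatch(input, @"\A[\p{L}\p{Nd}]+\z");
    // \w additionally accepts the underscore.
    bool viaWord = Regex.IsMatch(input, @"\A\w+\z");
    Console.WriteLine("{0}: {1} {2} {3}", input, viaLinq, viaCategories, viaWord);
}
// Something123õäöü: True True True
// with_underscore: False False True
// two words: False False False
```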
