Regex match 2 out of 4 groups

Regex match 2 out of 4 groups - c#

I want a single Regex expression to match 2 groups of lowercase, uppercase, numbers or special characters. Length needs to also be grater than 7.
I currently have this expression
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$
It, however, forces the string to have lowercase and uppercase and digit or special character.
I currently have this implemented using 4 different regex expressions that I interrogate with some C# code.
I plan to reuse the same expression in JavaScript.
This is sample console app that shows the difference between 2 approaches.
class Program
{
private static readonly Regex[] Regexs = new[] {
new Regex("[a-z]", RegexOptions.Compiled), //Lowercase Letter
new Regex("[A-Z]", RegexOptions.Compiled), // Uppercase Letter
new Regex(#"\d", RegexOptions.Compiled), // Numeric
new Regex(#"[^a-zA-Z\d\s:]", RegexOptions.Compiled) // Non AlphaNumeric
};
static void Main(string[] args)
{
Regex expression = new Regex(#"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$", RegexOptions.ECMAScript & RegexOptions.Compiled);
string[] testCases = new[] { "P#ssword", "Password", "P2ssword", "xpo123", "xpo123!", "xpo123!123##", "Myxpo123!123##", "Something_Really_Complex123!#43#2*333" };
Console.WriteLine("{0}\t{1}\t", "Single", "C# Hack");
Console.WriteLine("");
foreach (var testCase in testCases)
{
Console.WriteLine("{0}\t{2}\t : {1}", expression.IsMatch(testCase), testCase,
(testCase.Length >= 8 && Regexs.Count(x => x.IsMatch(testCase)) >= 2));
}
Console.ReadKey();
}
}
Result Proper Test String
------- ------- ------------
True True : P#ssword
False True : Password
True True : P2ssword
False False : xpo123
False False : xpo123!
False True : xpo123!123##
True True : Myxpo123!123##
True True : Something_Really_Complex123!#43#2*333

For javascript you can use this pattern that looks for boundaries between different character classes:
^(?=.*(?:.\b.|(?i)(?:[a-z]\d|\d[a-z])|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$
if a boundary is found, you are sure to have two different classes.
pattern details:
\b # is a zero width assertion, it's a boundary between a member of
# the \w class and an other character that is not from this class.
.\b. # represents the two characters with the word boundary.
boundary between a letter and a number:
(?i) # make the subpattern case insensitive
(?:
[a-z]\d # a letter and a digit
| # OR
\d[a-z] # a digit and a letter
)
boundary between an uppercase and a lowercase letter:
[a-z][A-Z] | [A-Z][a-z]
since all alternations contains at least two characters from two different character classes, you are sure to obtain the result you hope.

You could use possessive quantifiers (emulated using atomic groups), something like this:
((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}
Since using possessive matching will prevent backtracking, you won't run into the two groups being two consecutive groups of lowercase letters, for instance. So the full regex would be something like:
^(?=.*((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}).{8,}$
Though, were it me, I'd cut the lookahead, just use the expression ((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}, and check the length separately.

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+", #"$1");
dosage_value = Regex.Replace(dosage_value, #"(\d)%\s+", #"$1%");
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d+)?)", #"$1 ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+%", #"$1% ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+:", #"$1:");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+e", #"$1e");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+E", #"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!

For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of replace, check if capture group 2 exists. If it does, use is in the replacement, else add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match a as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = #"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
As in .NET \d can also match digits from other languages, \s can also match a newline and the start of the pattern might be a partial match, a bit more precise match can be:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?

I think you need something like this:
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", #"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 2 - ((E|e|%|:)+)
Any of special symbols like E e % :
Group 1 and Group 2 could be separated with any number of whitespaces.
If it's not working as you asking, please provide some samples to test.

For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store it in first capturing group (\d+)
// Then match one ore more whitespaces again.
// Then match part with exponent ([eE][-+]?\d+) and store it in second capturing group.
// It will match lower or uppercase 'e' with optional (due to ? operator) dash/plus sign and one ore more digits.
// Then match zero or more white spaces.
var expForMatch = Regex.Match(input, #"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, #"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, #"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, #"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-', if there is a one, check if there is a number or right parenthesis before it. If there is NOT then save the '-'.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4

Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(#"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)

Why is my regex not finding any matches?

So I have a string that looks like this with the spaces and everything.
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0
And here is my pattern
static string pattern = #" id: (.*),
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Then to get my match I do it like so..
static Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
For some reason it's returning false when I compile even though on every website where I can test my pattern it returns a match with the values.
Why is it returning false when I compile?

Alternate solution:
Pattern
(?:: ?)(.*)(?:,)|(?:: ?)(.*)
Explanation:
1st Alternative (?:: ?)(.*)(?:,)
Non-capturing group (?:: ?)
: matches the character : literally (case sensitive)
? matches the character literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times
as possible, giving back as needed (greedy)
1st Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many
times as possible, giving back as needed (greedy)
Non-capturing group (?:,)
, matches the character , literally (case sensitive)
2nd Alternative (?:: ?)(.*)
Non-capturing group (?:: ?)
: matches the character : literally (case sensitive)
? matches the character literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times
as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many
times as possible, giving back as needed (greedy)
You loose the distinctness of specifying ID etc - but you reley on non-named capturing groups with implicit ordering anyways - so some place for refinement. If you think they might skip params or reorder them, I would keep the named identifiers part of the pattern and add names to the capturing groups so they are decoupled from ordering.

The problem is that you have different number of spaces. To ignore this problem in any case you can use a pattern to match multiple spaces: \s+. Also you should replace your new lines with a pattern for new line: [\n\r]+ (note that this will match any number of new lines)
So your pattern becomes:
static string pattern = #"\s+id: (.*),[\n\r]+\s+name: (.*),[\n\r]+\s+member: (.*),[\n\r]+\s+language: (.*),[\n\r]+\s+isLoggedIn: (.*)";

There are different ways of solving this. Here is mine:
string pattern = #"^id:\s*(.+),[\n|\r|\r\n]\s+name:(.+),[\n|\r|\r\n]\s+member:\s+(.+),[\n|\r|\r\n]\s+language:\s+(.+),[\n|\r|\r\n]\s+isLoggedIn:\s+(.+)$";
It will account for any space in-between as well as any combination of carriage return/line feed.

var str = #"
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
var matches = Regex.Matches(str, #"(?im)(?'attr'\w+):\s+(?'val'[^,]+)");
if (matches.Count == 0)
Console.WriteLine("No matches found");
else
matches.Cast<Match>().ToList().ForEach(m =>
Console.WriteLine($"Match: '{m.Value}' [Attribute = {m.Groups["attr"].Value}, Value = {m.Groups["val"].Value}]"));

I tried it on Regex101, your pattern have spacing issues (number of spaces don't match).
You can use the following regex for spaces and new line chars so no more need to worry for how many spaces:
id: (.*),\s*name: (.*),\s*member: (.*),\s*language: (.*),\s*isLoggedIn: (.*)

Talking about the initial code, check if the amount of spaces is equal in the string and pattern. This code finds the match:
var myString =
#" id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
string pattern =
#" id:.*,
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
However, you shouldn't use it like that, but replace spaces with ( +) or make use of other solutions provided here.

Regex - Trying to write match validation for Swedish Social Number

So I want the formats xxxxxx-xxxx AND xxxxxxxx-xxxx to be possible. I've managed to fix the first section before the dash, but the last four digits are troublesome. It does require to match at least 4 characters, but I also want the regex to return false if there's more than 4 characters. How do I do it?
This is how it looks so far:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And this is the results:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs false
Match m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs true, and this is what I don't want

The \d{6,8} pattern matches 6, 7 or 8 digits, so that will already invalidate your regex pattern. Besdies, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, and will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
See the regex demo.
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - 1 or 0 - or whitespaces
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass RegexOptions.ECMAScriptoption. Example:
var res = Regex.IsMatch(s, #"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);

You are forgetting the $ at the end:
var regex = new Regex(#"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you van also use \b to test for boundaries:
var regex = new Regex(#"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the RegEx to fix the problems mentioned in the comments. The commentors are right, of course. In my earlier post I just wanted to explain why the RegEx matched the longer string.

Regular expression match all numbers after the last dash?

Trying to find the last instance of numbers after last dash in a string so
test-123-2-456 would return 456
123-test would return ""
123-test-456 would return 456
123-test-456sdfsdf would return 456
123-test-asd456 would return 456
The expression, #"[^-]*$", does not match the numbers though, and I have tried using [\d] but to no avail.

Sure, the simplest solution would be something like this:
(\d+)[^-]*$
This will match one or more digits, captured in group 1, followed by zero or more of any character other than a hyphen, followed by the end of the string. In other words, it will match any sequence of digits as long as there are no hyphens between that sequence and the end of the string. You then just have to extract group 1 from the match. For example:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach(var str in inputs)
{
var m = Regex.Match(str, #"(\d+)[^-]*$");
Console.WriteLine("{0} --> {1}", str, m.Groups[1].Value);
}
Produces:
test-123-2-456 --> 456
123-test -->
123-test-456 --> 456
123-test-456sdfsdf --> 456
123-test-asd456 --> 456
Alternatively, if you could use a negative lookahead like this:
\d+(?!.*-)
This will match one or more digit characters so long as they are not followed by a hyphen. Only the digits will be included in the match.
Note that these two options behave differently if there are two or more sets of numbers after the last -, e.g. foo-123bar456. In this case it's not entirely clear what you want to happen, but the first pattern will simply match everything starting from the first sequence of digits to the end (123bar456) with group 1 only containing the first sequence of digits (123). If you'd like to change this so that it only captures the last sequence of digits, place a \d inside the character class (i.e. (\d+)[^\d-]*$). The second second pattern would produce a separate match for each sequence digits (in this example, 123 and 456) but the Regex.Match method will only give you the first match.

I suggest to apply two regex-functions. Take the result of the first one as the input for the second one.
The first regex is:
-[0-9]+[^-]+$ // Take the last peace of your string lead by a minus (-)
// followed by digits ([0-9]+)
// and some ugly rest that doesn't contain another minus ([^-]+$)
The second regex is:
-[0-9]+ // Seperate the relevant digits from the ugly rest
// You know that there can only be one minus + digits part in it
Tested here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

The latest group from this RegEx can get the last number for you:
[^-A-z][0-9]+[^A-z]
If you are looking at groups, you can write this code by matching groups to get the latest number:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
var m = Regex.Match(str, #"([0-9]*)");
if(m.Groups.Length>1) //This will avoid the values starting with numbers only.
Console.WriteLine("{0} --> {1}", str, m.Groups[m.Groups.Length-1].Value);

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex match 2 out of 4 groups - c#

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

Replace floating numbers in math equation with letter variables

Why is my regex not finding any matches?

Regex - Trying to write match validation for Swedish Social Number

Regular expression match all numbers after the last dash?

Categories

Resources