Why is my regex not finding any matches? - c#

So I have a string that looks like this with the spaces and everything.
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0
And here is my pattern
static string pattern = #" id: (.*),
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Then to get my match I do it like so..
static Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
For some reason it's returning false when I compile even though on every website where I can test my pattern it returns a match with the values.
Why is it returning false when I compile?

Alternate solution:
Pattern
(?:: ?)(.*)(?:,)|(?:: ?)(.*)
Explanation:
1st Alternative (?:: ?)(.*)(?:,)
Non-capturing group (?:: ?)
: matches the character : literally (case sensitive)
? matches the character literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times
as possible, giving back as needed (greedy)
1st Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many
times as possible, giving back as needed (greedy)
Non-capturing group (?:,)
, matches the character , literally (case sensitive)
2nd Alternative (?:: ?)(.*)
Non-capturing group (?:: ?)
: matches the character : literally (case sensitive)
? matches the character literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times
as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many
times as possible, giving back as needed (greedy)
You loose the distinctness of specifying ID etc - but you reley on non-named capturing groups with implicit ordering anyways - so some place for refinement. If you think they might skip params or reorder them, I would keep the named identifiers part of the pattern and add names to the capturing groups so they are decoupled from ordering.

The problem is that you have different number of spaces. To ignore this problem in any case you can use a pattern to match multiple spaces: \s+. Also you should replace your new lines with a pattern for new line: [\n\r]+ (note that this will match any number of new lines)
So your pattern becomes:
static string pattern = #"\s+id: (.*),[\n\r]+\s+name: (.*),[\n\r]+\s+member: (.*),[\n\r]+\s+language: (.*),[\n\r]+\s+isLoggedIn: (.*)";

There are different ways of solving this. Here is mine:
string pattern = #"^id:\s*(.+),[\n|\r|\r\n]\s+name:(.+),[\n|\r|\r\n]\s+member:\s+(.+),[\n|\r|\r\n]\s+language:\s+(.+),[\n|\r|\r\n]\s+isLoggedIn:\s+(.+)$";
It will account for any space in-between as well as any combination of carriage return/line feed.

var str = #"
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
var matches = Regex.Matches(str, #"(?im)(?'attr'\w+):\s+(?'val'[^,]+)");
if (matches.Count == 0)
Console.WriteLine("No matches found");
else
matches.Cast<Match>().ToList().ForEach(m =>
Console.WriteLine($"Match: '{m.Value}' [Attribute = {m.Groups["attr"].Value}, Value = {m.Groups["val"].Value}]"));

I tried it on Regex101, your pattern have spacing issues (number of spaces don't match).
You can use the following regex for spaces and new line chars so no more need to worry for how many spaces:
id: (.*),\s*name: (.*),\s*member: (.*),\s*language: (.*),\s*isLoggedIn: (.*)

Talking about the initial code, check if the amount of spaces is equal in the string and pattern. This code finds the match:
var myString =
#" id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
string pattern =
#" id:.*,
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
However, you shouldn't use it like that, but replace spaces with ( +) or make use of other solutions provided here.

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+", #"$1");
dosage_value = Regex.Replace(dosage_value, #"(\d)%\s+", #"$1%");
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d+)?)", #"$1 ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+%", #"$1% ");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+:", #"$1:");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+e", #"$1e");
dosage_value = Regex.Replace(dosage_value, #"(\d)\s+E", #"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of replace, check if capture group 2 exists. If it does, use is in the replacement, else add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match a as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = #"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
As in .NET \d can also match digits from other languages, \s can also match a newline and the start of the pattern might be a partial match, a bit more precise match can be:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, #"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", #"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 2 - ((E|e|%|:)+)
Any of special symbols like E e % :
Group 1 and Group 2 could be separated with any number of whitespaces.
If it's not working as you asking, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store it in first capturing group (\d+)
// Then match one ore more whitespaces again.
// Then match part with exponent ([eE][-+]?\d+) and store it in second capturing group.
// It will match lower or uppercase 'e' with optional (due to ? operator) dash/plus sign and one ore more digits.
// Then match zero or more white spaces.
var expForMatch = Regex.Match(input, #"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, #"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, #"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, #"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Regex C# Matching string from two words in exact order and returning capture of non-matched words

C# Regex
I have the following list of strings:
"New patient, brief"
"New patient, limited"
"Established patient, brief"
"Established patient, limited"
"New diet patient"
"Established diet patient"
"School Physical"
"Deposition, 1 hour"
"Deposition, 2 hour"
I would like to separate these strings into groups using regex.
The first pattern I see is:
"New" or "Established" -- will be the first word of the matched pattern. This word will need to be captured and returned. Of this pattern, "patient" must be present without need to capture. Any word after "patient" must be captured.
I've tried: ((?=.*\bNew\b))(?=.*\bpatient\b)([A-Za-z0-9\-]+)
but the return match gives:
Full match 0-3 `New`
Group 1. 0-0 ``
Group 2. 0-3 `New`
Not at all what I am looking for.
string input = "New patient, limited";
string pattern = #"((?=.*\bNew\b))(?=.*\bpatient\b)([A-Za-z0-9\-]+)";
MatchCollection matches = Regex.Matches(input, pattern);
GroupCollection groups = matches[0].Groups;
foreach (Match match in matches)
{
Console.WriteLine("First word: {0}", match.Groups[1].Value);
Console.WriteLine("Last words: {0}", match.Groups[2].Value);
Console.WriteLine();
}
Console.WriteLine();
Thank you for any help with this.
Edit #1
For strings like "New patient, limited"
output should be: "New" "limited"
For strings like "Deposition, 1 hour" where "hour" is present,
output should be: "Deposition, 1 hour"
For strings where there are no words after "patient" but "patient" is present, like
"New diet patient",
output should be: "New" "diet"
For strings where neither "patient" nor "hour" is present, the entire string should be returned. i.e like "School Physical" should return the entire string,
"School Physical".
As I said, this is my ultimate quest. At the moment, I am trying to focus on separating out only the first pattern :). Much Thanks.
I suggest using
^(?:(?!\b(?:New|Established)\b).)*$|\b(New|Established)\s+(?:patient\b\W*)?(.+)
See the regex demo
Details
^(?:(?!\b(?:New|Established)\b).)*$ - any string that has no New or Established as whole words
| - or
\b(New|Established) - a whole word New or Established (put into Group 1)
\s+ - 1+ whitespaces
(?:patient\b\W*)? - an optional non-capturing group matching 1 or 0 occurrences of patient followed with word boundary and 0+ non-word chars
(.+) - Group 2: any 1 or more chars other than line break chars.
The code will look like
var match = Regex.Match(s, #"^(?:(?!\b(?:New|Established)\b).)*$|\b(New|Established)\s+(?:patient\b\W*)?(.+)");
If Group 1 is not matched (!match.Groups[1].Success), grab the whole match, match.Value. Else, grab match.Groups[1].Value and match.Groups[2].Value.
Results:

Regex - Trying to write match validation for Swedish Social Number

So I want the formats xxxxxx-xxxx AND xxxxxxxx-xxxx to be possible. I've managed to fix the first section before the dash, but the last four digits are troublesome. It does require to match at least 4 characters, but I also want the regex to return false if there's more than 4 characters. How do I do it?
This is how it looks so far:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And this is the results:
var regex = new Regex(#"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
Match m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs false
Match m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs true, and this is what I don't want
The \d{6,8} pattern matches 6, 7 or 8 digits, so that will already invalidate your regex pattern. Besdies, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, and will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
See the regex demo.
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - 1 or 0 - or whitespaces
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass RegexOptions.ECMAScriptoption. Example:
var res = Regex.IsMatch(s, #"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
You are forgetting the $ at the end:
var regex = new Regex(#"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you van also use \b to test for boundaries:
var regex = new Regex(#"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the RegEx to fix the problems mentioned in the comments. The commentors are right, of course. In my earlier post I just wanted to explain why the RegEx matched the longer string.

Regex with balancing groups

I need to write regex that capture generic arguments (that also can be generic) of type name in special notation like this:
System.Action[Int32,Dictionary[Int32,Int32],Int32]
lets assume type name is [\w.]+ and parameter is [\w.,\[\]]+
so I need to grab only Int32, Dictionary[Int32,Int32] and Int32
Basically I need to take something if balancing group stack is empty, but I don't really understand how.
UPD
The answer below helped me solve the problem fast (but without proper validation and with depth limitation = 1), but I've managed to do it with group balancing:
^[\w.]+ #Type name
\[(?<delim>) #Opening bracet and first delimiter
[\w.]+ #Minimal content
(
[\w.]+
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))* #Cutting param if balanced before comma and placing delimiter
((?<open>\[))* #Counting [
((?<-open>\]))* #Counting ]
)*
(?(open)|(?<param-delim>))\] #Cutting last param if balanced
(?(open)(?!) #Checking balance
)$
Demo
UPD2 (Last optimization)
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
(?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
(?:(?<open>\[)[\w.]+)?
(?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
I suggest capturing those values using
\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*
See the regex demo.
Details:
\w+(?:\.\w+)* - match 1+ word chars followed with . + 1+ word chars 1 or more times
\[ - a literal [
(?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
,? - an optional comma
(?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
\w+ - one or more word chars (perhaps, you would like [\w.]+)
(?:\[[^][]*])? - 1 or 0 (change ? to * to match 1 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].
A C# demo below:
var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = #"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
var result = Regex.Matches(line, pattern)
.Cast<Match>()
.SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
.Select(t => t.Value))
.ToList();
foreach (var s in result) // DEMO
Console.WriteLine(s);
UPDATE: To account for unknown depth [...] substrings, use
\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*
See the regex demo

Regex match 2 out of 4 groups

I want a single Regex expression to match 2 groups of lowercase, uppercase, numbers or special characters. Length needs to also be grater than 7.
I currently have this expression
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$
It, however, forces the string to have lowercase and uppercase and digit or special character.
I currently have this implemented using 4 different regex expressions that I interrogate with some C# code.
I plan to reuse the same expression in JavaScript.
This is sample console app that shows the difference between 2 approaches.
class Program
{
private static readonly Regex[] Regexs = new[] {
new Regex("[a-z]", RegexOptions.Compiled), //Lowercase Letter
new Regex("[A-Z]", RegexOptions.Compiled), // Uppercase Letter
new Regex(#"\d", RegexOptions.Compiled), // Numeric
new Regex(#"[^a-zA-Z\d\s:]", RegexOptions.Compiled) // Non AlphaNumeric
};
static void Main(string[] args)
{
Regex expression = new Regex(#"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$", RegexOptions.ECMAScript & RegexOptions.Compiled);
string[] testCases = new[] { "P#ssword", "Password", "P2ssword", "xpo123", "xpo123!", "xpo123!123##", "Myxpo123!123##", "Something_Really_Complex123!#43#2*333" };
Console.WriteLine("{0}\t{1}\t", "Single", "C# Hack");
Console.WriteLine("");
foreach (var testCase in testCases)
{
Console.WriteLine("{0}\t{2}\t : {1}", expression.IsMatch(testCase), testCase,
(testCase.Length >= 8 && Regexs.Count(x => x.IsMatch(testCase)) >= 2));
}
Console.ReadKey();
}
}
Result Proper Test String
------- ------- ------------
True True : P#ssword
False True : Password
True True : P2ssword
False False : xpo123
False False : xpo123!
False True : xpo123!123##
True True : Myxpo123!123##
True True : Something_Really_Complex123!#43#2*333
For javascript you can use this pattern that looks for boundaries between different character classes:
^(?=.*(?:.\b.|(?i)(?:[a-z]\d|\d[a-z])|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$
if a boundary is found, you are sure to have two different classes.
pattern details:
\b # is a zero width assertion, it's a boundary between a member of
# the \w class and an other character that is not from this class.
.\b. # represents the two characters with the word boundary.
boundary between a letter and a number:
(?i) # make the subpattern case insensitive
(?:
[a-z]\d # a letter and a digit
| # OR
\d[a-z] # a digit and a letter
)
boundary between an uppercase and a lowercase letter:
[a-z][A-Z] | [A-Z][a-z]
since all alternations contains at least two characters from two different character classes, you are sure to obtain the result you hope.
You could use possessive quantifiers (emulated using atomic groups), something like this:
((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}
Since using possessive matching will prevent backtracking, you won't run into the two groups being two consecutive groups of lowercase letters, for instance. So the full regex would be something like:
^(?=.*((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}).{8,}$
Though, were it me, I'd cut the lookahead, just use the expression ((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}, and check the length separately.

Categories

Resources