Regex - Trying to write match validation for Swedish Social Number - c#

So I want the formats xxxxxx-xxxx AND xxxxxxxx-xxxx to be possible. I've managed to fix the first section before the dash, but the last four digits are troublesome. The regex does require at least 4 digits there, but I also want it to return false if there are more than 4. How do I do it?
This is how it looks so far:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And these are the results:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs False
m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs True, and this is what I don't want

The \d{6,8} pattern matches 6, 7 or 8 digits, so it also accepts a 7-digit prefix, which neither of your formats allows. Besides, [-|(\s)]{0,1} matches 1 or 0 occurrences of -, |, (, ) or a whitespace char, so it will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
See the regex demo.
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - 1 or 0 - or whitespaces
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass the RegexOptions.ECMAScript option. Example:
var res = Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
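As a quick check, the anchored pattern can be run against the inputs from the question. This is only a minimal sketch: the class and method names (PersonalNumber, IsValid) are made up for illustration, and the last input is added to show that | is no longer accepted as a separator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PersonalNumber
{
    // 6 or 8 digits, an optional '-' or whitespace char, then exactly 4 digits.
    private static readonly Regex Pattern =
        new Regex(@"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);

    public static bool IsValid(string s) => Pattern.IsMatch(s);

    public static void Main()
    {
        string[] samples =
        {
            "840204-2344",
            "19840204-2344",
            "19840204-23",
            "19840204-2323423423",
            "19840204|2323"
        };
        foreach (string s in samples)
        {
            // Only the first two samples are expected to be valid.
            Console.WriteLine($"{s}: {IsValid(s)}");
        }
    }
}
```

Keeping the Regex in a static readonly field avoids rebuilding it on every call, which matters if the check runs on each form submission.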

You are forgetting the $ at the end:
var regex = new Regex(@"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you can also use \b to test for boundaries:
var regex = new Regex(@"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the regex to fix the problems mentioned in the comments. The commenters are right, of course. In my earlier post I just wanted to explain why the regex matched the longer string.
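A small sketch of the \b variant used to pull a number out of free text. The helper name FindFirst and the sample sentences are assumptions made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PersonalNumberSearch
{
    // Returns the first number found in the text, or null when there is none.
    public static string FindFirst(string text)
    {
        Match m = Regex.Match(text, @"\b(\d{6}|\d{8})-\d{4}\b");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindFirst("Customer 19840204-2344 called today."));
        // The trailing \b rejects a number that keeps going after 4 digits.
        Console.WriteLine(FindFirst("Ref 19840204-2323423423") ?? "(no match)");
    }
}
```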

Related

Extracting integer ranges separated with hyphen

I try to filter some strings I streamed for some useful information in C#.
I got two possible string structures:
string examplestring1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
string examplestring2 = "e ne 570 - 590 ne 5390 - 5410 ne 11380 - 11390 e";
I'd like to get an array or a List of strings in the format of "xxx - xxx". Like:
string[] example = new string[]{"570 - 590","5390 - 5410","11380 - 11390"};
I tried to use Regex:
List<string> numbers = new List<string>();
numbers.AddRange(Regex.Split(examplestring2, @"\D+"));
At least that gives me a list containing only the numbers. But it doesn't work for examplestring1, since there are dates in it.
I also tried playing around with the regex pattern, but things like the following do not work:
Regex.Split(examplestring1, @"\D+" + " - " + @"\D+");
I'd be grateful for a solution, or at least a hint on how to solve this.
You can use
var results = Regex.Matches(text, @"\d+\s*-\s*\d+").Cast<Match>().Select(x => x.Value);
See the regex demo. If there must be a single regular space on both sides of the -, you can use the \d+ - \d+ regex.
If you want to match any hyphen or dash character, you can use [\p{Pd}\xAD] instead of -.
Note that \d in .NET matches any Unicode digit. To match only ASCII digits, use the RegexOptions.ECMAScript option: Regex.Matches(text, @"\d+\s*-\s*\d+", RegexOptions.ECMAScript).
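Put together as a runnable sketch, using the first example string from the question. The class and method names (RangeExtractor, Extract) are hypothetical. Note that the Cast and Select calls need System.Linq.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeExtractor
{
    // Collects every "number - number" pair; the dates contain no hyphen, so they are skipped.
    public static string[] Extract(string text) =>
        Regex.Matches(text, @"\d+\s*-\s*\d+", RegexOptions.ECMAScript)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string example1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
        Console.WriteLine(string.Join(" | ", Extract(example1)));
    }
}
```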

Getting matching result using Regex

I need to find the matching result i.e a string using Regex. Let me demonstrate the scenario using sample inputs.
string input = "xb-cv_107_20190608_032214_006"; // <- 1st case
string input = "yb-ha_107_20190608_032214_006__foobar"; // <- 2nd case
string input = "fv_vgf_ka01mq3286__20190426_084135_039"; // <- 3rd case
string input = "fv_vgf_ka01mq3286__2090426_084135_039"; // <- 4th case
For 1st case input, output required= "xb-cv_107_20190608_032214_006".
For 2nd case input, output required= "yb-ha_107_20190608_032214_006".
For 3rd case input, output required= "fv_vgf_ka01mq3286__20190426_084135_039".
For 4th case input, output required= null since the pattern does not match.
The procedure to get the output is:
Check using regex whether the input ends with _ followed by 8 digits, then _, 6 digits, _ and 3 digits.
Or check whether that same sequence is present and is followed by __ and then any text.
Till now, I have come up with this Regex expression:
string pattern = @".+[_][0-9]{8}[_][0-9]{6}[_][0-9]{3}([_]{2})?";
var result = Regex.Match(input, pattern)?.Groups[0].Value ;
You may use
var result = Regex.Match(input, @"^(.+_[0-9]{8}_[0-9]{6}_[0-9]{3})__")?.Groups[1].Value;
Regex details:
^ - start of string
( - Group 1 start:
.+ - any 1+ chars other than LF, as many as possible
_[0-9]{8}_[0-9]{6}_[0-9]{3} - _, 8 digits, _, 6 digits, _, 3 digits
) - end of Group 1
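__ - two underscores. As written, the pattern requires these underscores after the timestamp, so inputs that end right after the 3 digits (cases 1 and 3) only match if __ is replaced with (?:__|$).
If there is a match, the result holds the value that resides in Group 1.
If there is no match, result is an empty string rather than null: Regex.Match never returns null, it returns Match.Empty, whose groups have an empty Value.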
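The following sketch covers all four cases from the question. It is a variation on the pattern above, not the answer's exact code: it assumes (?:__|$) in place of the trailing __, and it checks Success explicitly so that a failed match yields null. The names StampedName and Extract are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StampedName
{
    // Group 1 is everything up to and including _8digits_6digits_3digits,
    // which must be followed by "__" or by the end of the string.
    public static string Extract(string input)
    {
        Match m = Regex.Match(input, @"^(.+_[0-9]{8}_[0-9]{6}_[0-9]{3})(?:__|$)");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "xb-cv_107_20190608_032214_006",
            "yb-ha_107_20190608_032214_006__foobar",
            "fv_vgf_ka01mq3286__20190426_084135_039",
            "fv_vgf_ka01mq3286__2090426_084135_039"
        };
        foreach (string input in inputs)
        {
            Console.WriteLine(Extract(input) ?? "(null)");
        }
    }
}
```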

What should the regex be for an alphanumeric string where the only letters allowed are "w", "d" or "h"?

I need the regex to validate the user entered string, the string may be of following formats.
ex: 1w 2d 1h or 2w 1h or 1w 2d. Likewise combinations of numbers and w,d and h.
I am looking for the regex to allow the combinations number and w or d or h.
Is it possible to have a regex like that way?
You could say you want
\d+ any number, 1 or more times
[wdh] one of w, d, h
(?: |$) space or end of string
Then put this in a group, loop it 1 or more times
(?: start of non-capture group
)+ end of non-capture group, 1 or more times
^ and $ start and end of string respectively
Result
^(?:\d+[wdh](?: |$))+$
Edit: I see you've added more requirements in the comments of your question, this regex will not fulfil your most recent requirements
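For reference, here is how the ^(?:\d+[wdh](?: |$))+$ pattern behaves on a few inputs. This is a sketch with made-up names (DurationFormat, IsValid). It also shows the limitation mentioned in the edit: the pattern does not enforce ordering or uniqueness of the units.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DurationFormat
{
    // One or more "digits + unit" tokens, each followed by a single space or the end of the string.
    private static readonly Regex Pattern = new Regex(@"^(?:\d+[wdh](?: |$))+$");

    public static bool IsValid(string s) => Pattern.IsMatch(s);

    public static void Main()
    {
        string[] samples = { "1w 2d 1h", "2w 1h", "1w 2d", "1w2d", "3x", "1h 1h" };
        foreach (string s in samples)
        {
            // "1h 1h" is accepted too, because repeated units are not ruled out.
            Console.WriteLine($"\"{s}\": {IsValid(s)}");
        }
    }
}
```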
We can try writing a rudimentary parser to check the input string.
using System;
using System.Text.RegularExpressions;
string input = "1w 2d 1h";
string[] parts = Regex.Split(input, @"\s+");
bool success = true;
if (parts.Length > 3)
{
    success = false;
}
else
{
    // Anchored, so each part must be exactly digits followed by w, d or h.
    Regex regex = new Regex(@"^\d+[wdh]$");
    foreach (string part in parts)
    {
        if (!regex.IsMatch(part))
        {
            success = false;
        }
    }
}
if (success)
{
    Console.WriteLine("MATCH");
}
else
{
    Console.WriteLine("NO MATCH");
}
This answer might carry its own weight if, in your C# application, you also needed to extract the numerical values of each component.
Try ^(?:\d+w ?)?(?:\d+d ?)?(?:\d+h)?$
Explanation:
^ - match beginning of a string
(?:...) - non-capturing group
\d+ - match one or more digits
w, d, h - match literally w, d, h respectively
? - match preceding pattern zero or one time
Demo
Finally, the following string seems to be working fine for me...
^(?:\d+w)?((?:\d+d)|(?: \d+d))?((?:\d+h)|(?: \d+h))?$
Thanks, everyone for helping.
This works for your requirement. Hope this helps! Ordering (w->d->h) guaranteed.
^(\d+w){0,1}\s*(\d+d){0,1}\s*(\d+h){0,1}$
Test cases:
10w 12d 11h
1w 2d
2w 1h
1d 1h
1d 1h
3w 2h

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
    expression = rx.Replace(expression, letter.ToString(), 1);
    letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-'. If there is one, check whether there is a number or a right parenthesis before it. If there is NOT, keep the '-' as part of the number.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
    letter++; // char is incremented
    return letter.ToString();
});
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)
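The same idea wrapped in a helper, run on the (5-2)+3 case that went wrong in the question. A sketch only: Letterizer and Replace are hypothetical names, and like the code above it does not handle expressions with more than 26 numbers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Letterizer
{
    private static readonly Regex Number =
        new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");

    public static string Replace(string expression)
    {
        char letter = 'a';
        // The evaluator runs once per match, left to right;
        // letter++ yields the current letter and then advances it.
        return Number.Replace(expression, m => (letter++).ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Replace("(5-2)+3"));                             // (a-b)+c
        Console.WriteLine(Replace("((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)")); // ((a+b)*(c+d))-((e*f)-g)
    }
}
```

The minus in 5-2 stays in place because it is preceded by a digit, so the lookbehind stops it from being absorbed into the number.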

Why is my regex not finding any matches?

So I have a string that looks like this with the spaces and everything.
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0
And here is my pattern
static string pattern = @" id: (.*),
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Then to get my match I do it like so..
static Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
For some reason it's returning false when I compile even though on every website where I can test my pattern it returns a match with the values.
Why is it returning false when I compile?
Alternate solution:
Pattern
(?:: ?)(.*)(?:,)|(?:: ?)(.*)
Explanation:
1st alternative: (?:: ?)(.*)(?:,)
  Non-capturing group (?:: ?)
    : matches the character : literally (case sensitive)
    a space followed by ? matches the space character literally, between zero and one times, as many times as possible, giving back as needed (greedy)
  1st capturing group (.*)
    .* matches any character (except for line terminators)
    * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  Non-capturing group (?:,)
    , matches the character , literally (case sensitive)
2nd alternative: (?:: ?)(.*)
  Non-capturing group (?:: ?)
    : matches the character : literally (case sensitive)
    a space followed by ? matches the space character literally, between zero and one times, as many times as possible, giving back as needed (greedy)
  2nd capturing group (.*)
    .* matches any character (except for line terminators)
    * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
You lose the explicitness of naming id etc. in the pattern, but you rely on unnamed capturing groups with implicit ordering anyway, so there is some room for refinement. If you think they might skip or reorder parameters, I would keep the identifiers as part of the pattern and name the capturing groups, so that they are decoupled from ordering.
The problem is that you have different number of spaces. To ignore this problem in any case you can use a pattern to match multiple spaces: \s+. Also you should replace your new lines with a pattern for new line: [\n\r]+ (note that this will match any number of new lines)
So your pattern becomes:
static string pattern = @"\s+id: (.*),[\n\r]+\s+name: (.*),[\n\r]+\s+member: (.*),[\n\r]+\s+language: (.*),[\n\r]+\s+isLoggedIn: (.*)";
There are different ways of solving this. Here is mine:
string pattern = @"^id:\s*(.+),[\n|\r|\r\n]\s+name:(.+),[\n|\r|\r\n]\s+member:\s+(.+),[\n|\r|\r\n]\s+language:\s+(.+),[\n|\r|\r\n]\s+isLoggedIn:\s+(.+)$";
It will account for any space in-between as well as any combination of carriage return/line feed.
var str = @"
id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
var matches = Regex.Matches(str, @"(?im)(?'attr'\w+):\s+(?'val'[^,]+)");
if (matches.Count == 0)
    Console.WriteLine("No matches found");
else
    matches.Cast<Match>().ToList().ForEach(m =>
        Console.WriteLine($"Match: '{m.Value}' [Attribute = {m.Groups["attr"].Value}, Value = {m.Groups["val"].Value}]"));
I tried it on Regex101: your pattern has spacing issues (the number of spaces doesn't match).
You can use the following regex, which uses \s* for spaces and newline chars, so you no longer need to worry about how many spaces there are:
id: (.*),\s*name: (.*),\s*member: (.*),\s*language: (.*),\s*isLoggedIn: (.*)
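A minimal sketch of that \s* pattern in use. The input below deliberately has uneven indentation on every line; the names ProfileParser and Parse, and the choice to return the five captured values as an array, are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ProfileParser
{
    private static readonly Regex Pattern = new Regex(
        @"id: (.*),\s*name: (.*),\s*member: (.*),\s*language: (.*),\s*isLoggedIn: (.*)",
        RegexOptions.IgnoreCase);

    // Returns the five captured values, or null when the text does not match.
    public static string[] Parse(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success) return null;
        // Groups[0] is the whole match, so skip it.
        return m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
    }

    public static void Main()
    {
        string text = "   id: 123456789,\n      name: 'HappyDev',\n  member: false,\n        language: 0,\n isLoggedIn: 0";
        string[] values = Parse(text);
        Console.WriteLine(values == null ? "No match" : string.Join(" | ", values));
    }
}
```

RegexOptions.Singleline is left out here so that each (.*) stays on its own line. If the input uses \r\n line endings and has text after the last value, the last group may pick up a trailing \r, because . in .NET only excludes \n; calling Trim() on the values handles that.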
Talking about the initial code, check if the amount of spaces is equal in the string and pattern. This code finds the match:
var myString =
@" id: 123456789,
name: 'HappyDev',
member: false,
language: 0,
isLoggedIn: 0";
string pattern =
@" id:.*,
name: (.*),
member: (.*),
language: (.*),
isLoggedIn: (.*)";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(myString);
if (m.Success)
{
Console.WriteLine(m.Value);
}
However, you shouldn't use it like that, but replace spaces with ( +) or make use of other solutions provided here.
