Regex Remove unnecessary symbols with certain rules c# - c#

I want to remove all unnecessary characters so that each name is valid. Here are the rules:
• Has length between 3 and 16 characters
• Contains only letters, numbers, hyphens and underscores
• Has no redundant symbols before, after or in between
This is the input:
Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt
My regex so far is at https://regexr.com/4ahls, and I want to remove:
#smith
!lleg#l
ch#rs

Your own regex \b([a-zA-Z0-9_-]){3,16}\b is close, but \b does not do the job here: it allows a partial match inside a token such as #smith, yielding smith. Because # is not a word character, the position between # and s is a word boundary, so the match can start at s. Instead, require that each name is preceded by a space or the start of the string, and followed by a space, a comma, or the end of the string. Try this regex:
(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)
This should give you matches to only words that follow your rules.
Note: When a character class contains a literal -, keep it at the very start or very end of the class (or escape it). Placed between two characters, it is read as a range operator and can match unexpected characters.
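As a quick check of the lookaround pattern above, here is a minimal C# sketch that runs it against the input from the question. It is written as top-level statements (C# 9 or later); the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt";

// A name must start at the beginning of the input or right after a space,
// and must end right before a space, a comma, or the end of the input.
var validNames = Regex.Matches(input, @"(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", validNames));
// prints: Jeff, john45, peter-ivanov, jeffbutt
```

Note that too_long_username is rejected because it has 17 characters, and no shorter slice of it satisfies both lookarounds.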

You could try this pattern: (?=^[a-zA-Z0-9-_]{3,16}$).+
Positive lookaheads (?=...) are generally used to assert that some rule holds, which is what you want here. Note that the anchors make this a check of a single name, not of the whole comma-separated input. Explanation:
^ - match the beginning of the string
[a-zA-Z0-9-_]{3,16} - match at least 3 and at most 16 characters from the character class: a-zA-Z - all letters, 0-9 - digits, -_ - hyphen or underscore
$ - end of the string
If this assertion succeeds, match everything with .+

You do not actually need a regex to solve this. Use good old string.Split() and process the names (this needs using System.Linq;):
var input = "Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt";
var listOfNames = input.Split(new[] {",", " "}, StringSplitOptions.RemoveEmptyEntries)
    .Where(l => l.Length >= 3 && l.Length <= 16) // filter by length (3 to 16)
    .Where(l => l.All(c => char.IsDigit(c) || char.IsLetter(c) || c == '-' || c == '_')) // letters, digits, hyphens, underscores
    .ToList();
Now you have a list of four names. To turn it back into a string, just join the names:
var singleLine = string.Join(", ", listOfNames);
// singleLine is "Jeff, john45, peter-ivanov, jeffbutt"

Related

Regex expression to capture only numeric fields and strip $ and comma, no match if there are any alphanumeric

I'm trying to write a regex that will strip out $ and , from a value and not match at all if there are any other non-numerics.
$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> no match
I have gotten sort of close with this:
^(?:[$,]*)(([0-9.]{1,3})(?:[,.]?))+(?:[$,]*)$
However, this doesn't properly capture the numeric value with $1, because the repeated digit groups end up as separate sub-captures, as you can see here: https://regex101.com/r/4bOJtB/1
You can use a named capturing group to capture all parts of the number and then concatenate them. That said, it is more straightforward to remove the characters you do not need as a post-processing step.
Here is an example code:
var pattern = @"^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$";
var tests = new[] {"$100", "$12,203.00", "12JAN2022"};
foreach (var test in tests) {
    // Concatenate every capture of group "v"; an empty result means no match
    var result = string.Concat(Regex.Match(test, pattern)?
        .Groups["v"].Captures.Cast<Capture>().Select(x => x.Value));
    Console.WriteLine("{0} -> {1}", test, result.Length > 0 ? result : "No match");
}
See the C# demo. Output:
$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> No match
The regex is
^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$
See the regex demo. Details:
^ - start of string
\$* - zero or more dollar symbols
(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+)) - either one to three digits (captured into Group "v") and then zero or more occurrences of a comma and then three digits (captured into Group "v"), or one or more digits (captured into Group "v")
(?<v>\.\d+)? - an optional occurrence of . and one or more digits (all captured into Group "v")
$ - end of string.
I don't know how to achieve this in a single regex, but in my opinion, dividing the problem into smaller steps is a good idea: it is easier to implement, and easier to understand and maintain later, than one clever pattern.
First, replace all $ and , with an empty string:
[\$\,] => ``
Then match only digits and periods as a capture group (you may need to adjust this to your requirements on where periods are allowed, etc.):
^((\d{1,3}\.?)+)$
Hope this helps!
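The two steps above could be sketched in C# roughly as follows. TryCleanAmount is a hypothetical helper name, not something from the question, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips $ and , and reports whether what is left is numeric.
bool TryCleanAmount(string value, out string cleaned)
{
    // Step 1: remove every dollar sign and comma.
    cleaned = Regex.Replace(value, @"[$,]", "");
    // Step 2: accept only digits and periods, using the pattern from the answer.
    return Regex.IsMatch(cleaned, @"^((\d{1,3}\.?)+)$");
}

foreach (var test in new[] { "$100", "$12,203.00", "12JAN2022" })
{
    var ok = TryCleanAmount(test, out var cleaned);
    Console.WriteLine("{0} -> {1}", test, ok ? cleaned : "no match");
}
// prints:
// $100 -> 100
// $12,203.00 -> 12203.00
// 12JAN2022 -> no match
```

The second pattern is permissive (it would also accept something like 1.2.3), so a stricter pattern such as ^\d+(\.\d+)?$ may be a better fit if only one decimal point is allowed.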

Regex split and merge into single record

In my C# application I'm using the regex below to split the string: ([A-Z0-9]{20}\d{0}). However, it splits ErrorCode and ErrorMsg into two different records, and I need ErrorCode and ErrorMsg in a single array record.
For Example:
Current Logic:
[0] 05300030000GN0030018
[1] Field is required.
But I need like below one
[0] 05300030000GN0030018Field is required.
Assuming the msg is never empty and \d{0} was used to fail any match if the next char after [A-Z0-9]{20} is a digit, you can use
var result = Regex.Matches(input, @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
See the regex demo. Note that in case msg can be empty you need to use a (?!\d) lookahead instead of \D: @"\b[A-Z0-9]{20}(?!\d).*?(?=\b[A-Z0-9]{20}(?!\d)|\z)".
Details:
\b - word boundary (need to make sure the char limit is fine)
[A-Z0-9]{20} - twenty uppercase ASCII letters or digits
\D - a non-digit char
.*? - any zero or more chars as few as possible
(?=\b[A-Z0-9]{20}\D|\z) - a positive lookahead that requires a word boundary, twenty uppercase ASCII letters or digits and a non-digit or end of string immediately to the right of the current location.
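To see the pattern at work, here is a small sketch. The input string is made up for illustration (the question does not show its real input); it assumes each record is a 20-character code immediately followed by its message, and that a message ends with a non-word character such as a period so that \b can hold before the next code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample: two records, each a 20-character code followed by its message.
var input = "05300030000GN0030018Field is required.05300030000GN0030019Name is invalid.";

// Each match runs from one code up to (but not including) the next code, or to the end.
var records = Regex.Matches(input, @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

for (var i = 0; i < records.Count; i++)
    Console.WriteLine("[{0}] {1}", i, records[i]);
// prints:
// [0] 05300030000GN0030018Field is required.
// [1] 05300030000GN0030019Name is invalid.
```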

C# How to filter DataTable rows containing alphanumerics with special characters using Regex

I have data in a C# DataTable, and I want to keep only the values that are alphanumeric with special characters, like:
HOAUD039#
HOAUD00$
So I tried the regex below in my LINQ query:
var matches =
    dt.AsEnumerable()
      .Where(row => Regex.IsMatch(row["Empolyee_CRC"].ToString(),
          "^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"))
      .CopyToDataTable();
This returns both the plain alphanumeric values and the alphanumeric values with special characters.
My question is simple: what is the right way to return only the values that are alphanumeric with special characters?
I've also tried this regex, but it does not work either:
^(?:[\d,\/().]*[a-zA-Z][a-zA-Z\d,\/().]*)?$
You can try this; based on your example patterns, it will do:
^(?=.*\d)(?=.*[A-Za-z])(?=.*[!@#$&()\\-`.+,\/\"]).*$
Explanation
^ - Anchor to start of string.
(?=.*\d) - Requires at least one digit in the match.
(?=.*[A-Za-z]) - Requires at least one letter in the match.
(?=.*[!@#$&()\\-`.+,\/\"]) - Requires at least one special character in the match.
.* - Match anything except newline.
$ - End of string.
In your first regex you are using a single character class, which matches any one of the listed characters at each position, but you have 3 requirements.
In your second regex, everything is optional due to the * and the ?.
You could use 3 positive lookaheads to assert your requirements:
^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[A-Z])[A-Z\d!@#$&()`.+,\/\-]+$
In C#:
string pattern = @"^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,\/-]+$";
That will match:
^ Start of string
(?=.*\d) Assert a digit
(?=.*[!@#$&()`.+,\/\-]) Assert a special character
(?=.*[A-Za-z]) Assert a lowercase or uppercase letter
[A-Za-z\d!@#$&()`.+,\/-]+ Match 1+ times only the allowed characters
$ End of the string
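A short sketch of how the three-lookahead pattern behaves on the sample values. The first two values come from the question; HOAUD039 is a made-up plain alphanumeric added for contrast, and the special-character set is assumed to begin with !@#$.

```csharp
using System;
using System.Text.RegularExpressions;

// One lookahead per requirement (digit, special character, letter),
// followed by a whitelist of the characters allowed in the whole value.
var pattern = @"^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,\/-]+$";

foreach (var value in new[] { "HOAUD039#", "HOAUD00$", "HOAUD039" })
    Console.WriteLine("{0} -> {1}", value, Regex.IsMatch(value, pattern));
// prints:
// HOAUD039# -> True
// HOAUD00$ -> True
// HOAUD039 -> False
```

In the LINQ query from the question, the same Regex.IsMatch call would go inside the Where clause.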

Regex Match all characters until reach character, but also include last match

I'm trying to find all Color Hex codes using Regex.
I have this string value for example - #FF0000FF#0038FFFF#51FF00FF#F400FFFF and I use this:
#.+?(?=#)
pattern to match all characters until it reaches #, but it stops at the last character, which should be the last match.
I'm kind of new to this Regex stuff. How could I also get the last match?
Your regex does not match the last value because your regex (with the positive lookahead (?=#)) requires a # to appear after an already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
See the regex demo
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
A more precise pattern you may use is #[A-Fa-f0-9]{8}, it matches a # and then any 8 hex chars, digits or letters from a to f and A to F.
Don't rely upon any characters after the #; match hex characters and it will work every time.
(?i)#[a-f0-9]+
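To make the difference concrete, this sketch compares the lookahead pattern from the question with the stricter 8-hex-digit pattern on the sample string. Variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "#FF0000FF#0038FFFF#51FF00FF#F400FFFF";

// Original pattern: every match needs another # right after it, so the last code is lost.
var withLookahead = Regex.Matches(s, @"#.+?(?=#)")
    .Cast<Match>().Select(m => m.Value).ToList();

// Stricter pattern: a # followed by exactly 8 hex digits, no dependence on what comes next.
var hexCodes = Regex.Matches(s, @"#[A-Fa-f0-9]{8}")
    .Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(withLookahead.Count); // prints: 3
Console.WriteLine(string.Join(" ", hexCodes));
// prints: #FF0000FF #0038FFFF #51FF00FF #F400FFFF
```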

Regular Expression: Section names with unknown length?

I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document)
How can I capture the full section (1.2.3.4.5) into a group?
I use C# but any regex is fine, I can translate it.
UPDATED
Use this regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
Explanation:
\d - a digit (0-9)
[\.\d]* - any character of: '.', digits (0-9), 0 or more times, as many as possible
(?<!\.) - zero-width negative lookbehind assertion: the match must not end with a '.'
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*").Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
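To check how a repeated group behaves on deeper numbering, here is a small sketch. The input string is made up, and the group is written as non-capturing, (?:...), since only the whole match is used.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input with one deeply nested section and one with long numbers.
var text = "1.2.3.4.5.6.7.8.9.10.11.12 or 252525.262626.272727 or 7";

// One or more digits, then any number of ".digits" repetitions: no depth limit.
var sections = Regex.Matches(text, @"\d+(?:\.\d+)*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var section in sections)
    Console.WriteLine(section);
// prints:
// 1.2.3.4.5.6.7.8.9.10.11.12
// 252525.262626.272727
// 7
```

Because the quantifier * repeats the ".digits" group as often as needed, the pattern does not have to be extended for each additional level.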
Well, if you know you can't go beyond 5, then you can do
@"1+((.2+)((.3+)((.4+)(.5+)?)?)?)?"
and you can expand on that pattern for every symbol, up to a finite number of symbols (note that an unescaped . matches any character; use \. for a literal dot).
The + means any number of occurrences of the symbol, but at least 1. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the group to be captured, for example: (?:abc)
I omitted them to make the regex more readable.
The ? after a parenthesis means 1 or 0 of the preceding group.
Now, if you don't know how far your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n", then my intuition tells me that you can't do that with regex.
