Regex split and merge into single record - c#

In my C# application I'm using the regex ([A-Z0-9]{20}\d{0}) to split a string, but it puts ErrorCode and ErrorMsg into two separate records. I need ErrorCode and ErrorMsg together in a single array element.
For Example:
Current Logic:
[0] 05300030000GN0030018
[1] Field is required.
But I need it like this:
[0] 05300030000GN0030018Field is required.

Assuming the msg is never empty and \d{0} was used to fail any match if the next char after [A-Z0-9]{20} is a digit, you can use
var result = Regex.Matches(input, @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Value)
.ToList();
See the regex demo. Note that if msg can be empty, you need a (?!\d) lookahead instead of \D: @"\b[A-Z0-9]{20}(?!\d).*?(?=\b[A-Z0-9]{20}(?!\d)|\z)".
Details:
\b - a word boundary (ensures the 20-character code does not start in the middle of a longer run of word characters)
[A-Z0-9]{20} - twenty uppercase ASCII letters or digits
\D - a non-digit char
.*? - any zero or more chars as few as possible
(?=\b[A-Z0-9]{20}\D|\z) - a positive lookahead that requires a word boundary, twenty uppercase ASCII letters or digits and a non-digit or end of string immediately to the right of the current location.
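To see the pattern at work, here is a minimal sketch. The class name RecordSplitter, the method SplitRecords, and the second record in the usage comment are made up for illustration; only the first record comes from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // One record = a 20-character code of uppercase letters/digits,
    // followed by a message whose first character is not a digit.
    private static readonly Regex RecordPattern = new Regex(
        @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)",
        RegexOptions.Singleline);

    // Returns each "code + message" pair as a single string.
    public static List<string> SplitRecords(string input) =>
        RecordPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
}

// Usage (hypothetical input with two records):
// RecordSplitter.SplitRecords(
//     "05300030000GN0030018Field is required.05300030000GN0030019Value is invalid.");
// [0] 05300030000GN0030018Field is required.
// [1] 05300030000GN0030019Value is invalid.
```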

Related

How to read first set of numbers between strings using regex?

I have the following strings and need to extract the numbers as follows :
1) M123123123AD123 => 123123123
2) M1231231212MN23 => 1231231212
3) G12312312312DD => 12312312312
I am currently reading it using "\d+[0-9]". This works well if there is 1 number after the second set of characters. But if there are multiple numbers after the second character set, the above regex string picks them too. For example, 'M123123123AD123' will give 123123123123. But the last 123 should not be there.
You want to get a streak of digits in between two letters.
You can use
(?<=[a-zA-Z])\d+(?=[a-zA-Z])
See the .NET regex demo.
Or, if you want to get the digits after the leading non-digit chars, use
(?<=^\D+)\d+(?=[a-zA-Z])
See this .NET regex demo.
In C#, you can use Regex.Match:
var result = Regex.Match(text, @"(?<=^\D+)\d+(?=[a-zA-Z])")?.Value;
Regex details:
(?<=[a-zA-Z]) - right before the current location, there must be an ASCII letter (use \p{L} to match any letter)
(?<=^\D+) - right before the current location, there must be start of string + any one or more non-digit chars (use \D* if the digits can appear at the start of string)
\d+ - one or more digits
(?=[a-zA-Z]) - right after the current location, there must be an ASCII letter (use \p{L} to match any letter).
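As a quick check against the three sample strings, the second pattern can be wrapped as follows. This is only a sketch: DigitExtractor and TryExtract are hypothetical names. Regex.Match does not return null when nothing matches, so the code tests Match.Success rather than relying on a null check.

```csharp
using System.Text.RegularExpressions;

public static class DigitExtractor
{
    // Digits that follow the leading non-digit chars and precede an ASCII letter.
    // The variable-length lookbehind (?<=^\D+) is supported by the .NET regex engine.
    private static readonly Regex Pattern = new Regex(@"(?<=^\D+)\d+(?=[a-zA-Z])");

    public static bool TryExtract(string text, out string digits)
    {
        Match m = Pattern.Match(text);
        digits = m.Value;   // empty string when there is no match
        return m.Success;
    }
}

// Usage:
// DigitExtractor.TryExtract("M123123123AD123", out var d);  // d == "123123123"
// DigitExtractor.TryExtract("G12312312312DD", out var e);   // e == "12312312312"
```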

Get string after the last comma or the last number using Regex in C#

How can I get the string after the last comma or the last number using regex for these examples:
"Flat 1, Asker Horse Sports" -- get the string after ","; result: "Asker Horse Sports"
"9 Walkers Barn" -- get the string after "9"; result: "Walkers Barn"
I need a regex that supports both cases, or two different regex rules, one for each case.
I tried /,[^,]*$/ and (.*),[^,]*$ to get the strings after the last comma but no luck.
You can use
[^,\d\s][^,\d]*$
See the regex demo (and a .NET regex demo).
Details
[^,\d\s] - any char other than a comma, digit, or whitespace
[^,\d]* - zero or more chars other than a comma or digit
$ - end of string.
In C#, you may also tell the regex engine to search for the match from the end of the string with the RegexOptions.RightToLeft option. This can make matching more efficient, although it might not be necessary if the input strings are short:
var output = Regex.Match(text, @"[^,\d\s][^,\d]*$", RegexOptions.RightToLeft)?.Value;
You were on the right track with the capture group in (.*),[^,]*$, but the group should contain the part that you are looking for.
If there has to be a comma or digit present, you could match until the last occurrence of either of them, and capture what follows in the capturing group.
^.*[\d,]\s*(.+)$
^ Start of string
.* Match any char except a newline 0+ times
[\d,] Match either , or a digit
\s* Match 0+ whitespace chars
(.+) Capture group 1, match any char except a newline 1+ times
$ End of string
.NET regex demo | C# demo
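The capture-group pattern above can be used from C# roughly like this. It is a sketch under the assumption that an empty string is an acceptable result when the input contains neither a comma nor a digit; AddressTail and AfterLastCommaOrDigit are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AddressTail
{
    // Greedy .* runs to the last comma or digit that still leaves
    // at least one character for group 1 to capture.
    private static readonly Regex Pattern = new Regex(@"^.*[\d,]\s*(.+)$");

    public static string AfterLastCommaOrDigit(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}

// Usage:
// AddressTail.AfterLastCommaOrDigit("Flat 1, Asker Horse Sports");  // "Asker Horse Sports"
// AddressTail.AfterLastCommaOrDigit("9 Walkers Barn");              // "Walkers Barn"
```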

Regex Remove unnecessary symbols with certain rules c#

I want to remove all unnecessary characters so the name can be valid, here are the rules :
• Has length between 3 and 16 characters
• Contains only letters, numbers, hyphens and underscores
• Has no redundant symbols before, after or in between
This is the input:
Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt
My Regex so far is : https://regexr.com/4ahls, and I want to remove:
#smith
!lleg#l
ch#rs
Your own regex \b([a-zA-Z0-9_-]){3,16}\b is close, but \b does not do what you need here. In a word like #smith, # is not a word character, so the position between # and s is a word boundary and the regex matches smith. Instead, require that the name is preceded by a space or the start of the string, and followed by a space, a comma, or the end of the string. Try this regex:
(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)
Demo
This should give you matches to only words that follow your rules.
Note: Always place - at the very start or very end of a character class (or escape it). Between two other characters it is read as a range operator and can give unexpected results.
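Applied to the input from the question, the lookaround pattern can be tried out like this. This is a sketch; UsernameFilter and ValidNames are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UsernameFilter
{
    // A name must start after a space or at the start of the string,
    // and end before a space, a comma, or the end of the string.
    private static readonly Regex NamePattern =
        new Regex(@"(?<= |^)[a-zA-Z0-9_-]{3,16}(?=[ ,]|$)");

    public static List<string> ValidNames(string input) =>
        NamePattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
}

// Usage:
// UsernameFilter.ValidNames(
//     "Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt");
// -> Jeff, john45, peter-ivanov, jeffbutt
```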
You could try this pattern: (?=^[a-zA-Z0-9-_]{3,16}$).+
Generally positive lookaheads (?=...) are used to assert that some rules are valid, as you want to do. Explanation:
^ - match beginning of a string
[a-zA-Z0-9-_]{3,16} - match at least 3 and at most 16 characters from the character class: a-zA-Z - ASCII letters, 0-9 - digits, -_ - hyphen or underscore
$ - end of a string
If this assertion succeeds, match everything with .+
Demo
You actually do not need a regex to solve this. Use good old string.Split() and process the names:
using System;
using System.Linq;
var input = "Jeff, john45, ab, cd, peter-ivanov, #smith, sh, too_long_username, !lleg#l ch#rs, jeffbutt";
var listOfNames = input.Split(new[] {",", " "}, StringSplitOptions.RemoveEmptyEntries)
.Where(l => l.Length >= 3 && l.Length <= 16) // keep names of 3 to 16 characters
.Where(l => l.All(c => char.IsDigit(c) || char.IsLetter(c) || c == '-' || c == '_')) // letters, digits, hyphens and underscores only
.ToList();
Now you have a list of four names. If you want to turn it back into a string, just join the names:
var singleLine = string.Join(", ", listOfNames);
// singleLine is "Jeff, john45, peter-ivanov, jeffbutt"

Regex Match all characters until reach character, but also include last match

I'm trying to find all Color Hex codes using Regex.
I have this string value for example - #FF0000FF#0038FFFF#51FF00FF#F400FFFF and I use this:
#.+?(?=#)
pattern to match all characters until it reaches the next #, but it misses the last color code, which should be the final match.
I'm kind of new to this Regex stuff. How could I also get the last match?
Your regex does not match the last value because your regex (with the positive lookahead (?=#)) requires a # to appear after an already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
See the regex demo
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
A more precise pattern you may use is #[A-Fa-f0-9]{8}, it matches a # and then any 8 hex chars, digits or letters from a to f and A to F.
Don't rely on arbitrary characters after the #; match hex characters explicitly and it will work every time.
(?i)#[a-f0-9]+
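The difference between the lookahead pattern from the question and the stricter 8-hex-digit pattern can be checked with a short sketch. HexColors, WithLookahead, and Strict are hypothetical names; the input is the string from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HexColors
{
    // Pattern from the question: every match needs a following '#'.
    public static List<string> WithLookahead(string s) =>
        Regex.Matches(s, @"#.+?(?=#)").Cast<Match>().Select(m => m.Value).ToList();

    // Stricter pattern: '#' followed by exactly eight hex digits.
    public static List<string> Strict(string s) =>
        Regex.Matches(s, @"#[A-Fa-f0-9]{8}").Cast<Match>().Select(m => m.Value).ToList();
}

// Usage with "#FF0000FF#0038FFFF#51FF00FF#F400FFFF":
// HexColors.WithLookahead(...) returns 3 values (the last code is missed)
// HexColors.Strict(...)        returns 4 values, ending with "#F400FFFF"
```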

Regular Expression: Section names with unknown length?

I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document).
How can I capture the full section (1.2.3.4.5) into a group?
I use C#, but any regex flavor is fine; I can translate it.
UPDATED
Use this Regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
Explanation:
\d - a digit (0-9)
[\.\d]* - zero or more dots or digits, as many as possible
(?<!\.) - a zero-width negative lookbehind asserting that the match does not end with a dot
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*").Cast<Match>()
.Select(m => m.Value)
.ToArray();
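The two patterns above agree on well-formed section numbers but behave differently on stray dots. The sketch below makes that visible; SectionNumbers, Loose, and Strict are hypothetical names, and the malformed input is invented for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SectionNumbers
{
    // Lookbehind variant: a digit, then any mix of dots and digits, not ending in a dot.
    public static string[] Loose(string text) =>
        Regex.Matches(text, @"\d[\.\d]*(?<!\.)")
            .Cast<Match>().Select(m => m.Value).ToArray();

    // Strict variant: runs of digits separated by single dots.
    public static string[] Strict(string text) =>
        Regex.Matches(text, @"\d+(\.\d+)*")
            .Cast<Match>().Select(m => m.Value).ToArray();
}

// Usage with the invented input "1.2. and 1..2":
// SectionNumbers.Loose(...)  -> "1.2", "1..2"
// SectionNumbers.Strict(...) -> "1.2", "1", "2"
```

If doubled dots should never count as part of a section number, the strict variant is probably the safer choice.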
Well, if you know the numbers can't go beyond 5, then you can do
@"1+((\.2+)((\.3+)((\.4+)(\.5+)?)?)?)?"
and you can expand on that pattern for every symbol, up to a finite number of symbols.
The + means any number of occurrences of the symbol, but at least 1. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the group to be captured, for example: (?:abc)
I omitted them to make the regex more readable.
The ? after a closing parenthesis means 1 or 0 occurrences of the preceding group.
Now, if you don't know how far your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n", then my intuition tells me that you can't do that with regex.
