Regex Match all characters until reach character, but also include last match - c#

I'm trying to find all Color Hex codes using Regex.
For example, I have the string #FF0000FF#0038FFFF#51FF00FF#F400FFFF and I use the pattern
#.+?(?=#)
to match everything up to the next #, but it stops before the last color code, which should be the final match.
I'm fairly new to regex. How can I also get the last match?

Your regex does not match the last value because the positive lookahead (?=#) requires a # to appear right after each consumed value, and there is no # at the end of the string.
You may use
#[^#]+
See the regex demo
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
A more precise pattern is #[A-Fa-f0-9]{8}: it matches a # followed by exactly 8 hex characters (digits, or letters from a to f in either case).
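To see the difference between the three patterns side by side, here is a minimal sketch. The class name HexColorDemo and the helper FindAll are made up for illustration; only Regex.Matches and the LINQ calls come from the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class HexColorDemo
{
    // Returns the text of every match of `pattern` found in `input`.
    public static List<string> FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();
}
```

With s = "#FF0000FF#0038FFFF#51FF00FF#F400FFFF", FindAll(s, @"#.+?(?=#)") should return only the first three codes, because the lookahead cannot succeed after the last one, while FindAll(s, @"#[^#]+") and FindAll(s, @"#[A-Fa-f0-9]{8}") should each return all four.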

Don't rely on whatever characters follow the #; match the hex characters explicitly and it
will work every time.
(?i)#[a-f0-9]+

Related

How to read first set of numbers between strings using regex?

I have the following strings and need to extract the numbers as follows :
1) M123123123AD123 => 123123123
2) M1231231212MN23 => 1231231212
3) G12312312312DD => 12312312312
I am currently reading it using "\d+[0-9]". This works well if there is 1 number after the second set of characters. But if there are multiple numbers after the second character set, the above regex string picks them too. For example, 'M123123123AD123' will give 123123123123. But the last 123 should not be there.
You want to get a streak of digits in between two letters.
You can use
(?<=[a-zA-Z])\d+(?=[a-zA-Z])
See the .NET regex demo.
Or, if you want to get the digits after the leading non-digit chars, use
(?<=^\D+)\d+(?=[a-zA-Z])
See this .NET regex demo.
In C#, you can use Regex.Match:
var result = Regex.Match(text, @"(?<=^\D+)\d+(?=[a-zA-Z])")?.Value;
Regex details:
(?<=[a-zA-Z]) - right before the current location, there must be an ASCII letter (use \p{L} to match any letter)
(?<=^\D+) - right before the current location, there must be start of string + any one or more non-digit chars (use \D* if the digits can appear at the start of string)
\d+ - one or more digits
(?=[a-zA-Z]) - right after the current location, there must be an ASCII letter (use \p{L} to match any letter).
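As a rough sketch of how the two lookaround patterns behave on the sample strings from the question, the helper below wraps each one in a method. The class and method names are hypothetical; the patterns are the ones described above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class DigitRunDemo
{
    // First run of digits that sits directly between two ASCII letters.
    public static string BetweenLetters(string text) =>
        Regex.Match(text, @"(?<=[a-zA-Z])\d+(?=[a-zA-Z])").Value;

    // First run of digits after the leading non-digit characters,
    // followed by an ASCII letter.
    public static string AfterLeadingNonDigits(string text) =>
        Regex.Match(text, @"(?<=^\D+)\d+(?=[a-zA-Z])").Value;
}
```

Both methods return an empty string when nothing matches, because Regex.Match returns an unsuccessful Match object (whose Value is empty) rather than null.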

regular expression: The beginning and end of the string with a letter with a specified length

I use this pattern, but it does not give the expected result:
Regex reg = new Regex(@"^([A-Za-z][A-Za-z0-9\.]*(?:[A-Za-z])){6,30}@mydomain.com$");
I want my string to start with a letter and end with a letter, with a combination of letters, numbers, and dots in between, provided it is between 6 and 30 characters long.
For example: a.124b@mydomain.com or abc.1e@mydomain.com and so on.
string pattern = @"^[a-z][a-z0-9.]{4,28}[a-z]@mydomain\.com$";
string input = @"a.124b@mydomain.com";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
Now the explanation:
^[a-z] start of the string, and one letter
[a-z0-9.]{4,28} letters, digits and the dot character (you don't need to escape it inside square brackets), repeated between 4 and 28 times
[a-z] another single letter
(together these amount to 6 to 30 characters)
@mydomain\.com$ the rest of your mail address and the end of the string.
Notice also RegexOptions.IgnoreCase: when you know you don't care about case, it makes the letter classes a bit more readable.
The error in your regex was applying the quantifier to the complete capture group, which means repeating the whole group 6 to 30 times.
I also recommend https://regex101.com/ for all your regex needs.
Here is one option:
^(?![0-9.]|.*[0-9.]@)[a-zA-Z0-9.]{6,30}@mydomain\.com$
See the online demo
^ - Start line anchor.
(?![0-9.]|.*[0-9.]@) - Negative lookahead that rejects a leading dot/digit, or a dot/digit right before the "@".
[a-zA-Z0-9.]{6,30} - 6 to 30 characters from the specified class.
@mydomain\.com - Literally match "@mydomain.com". Notice the backslash before the dot to make it literal (outside a character class).
$ - End line anchor.
I was going to mention a case-insensitive alternative, but it looks like @FranzGleichmann got you covered =)
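The two answers above can be compared with a small sketch like the following. The class and method names are invented for illustration, and it assumes the domain is literally mydomain.com as in the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answers.
public static class MailAddressDemo
{
    // First answer: letter, 4-28 letters/digits/dots, letter (case-insensitive).
    private static readonly Regex Explicit = new Regex(
        @"^[a-z][a-z0-9.]{4,28}[a-z]@mydomain\.com$", RegexOptions.IgnoreCase);

    // Second answer: negative lookahead rejects a digit/dot at either end.
    private static readonly Regex Lookahead = new Regex(
        @"^(?![0-9.]|.*[0-9.]@)[a-zA-Z0-9.]{6,30}@mydomain\.com$");

    public static bool IsValidExplicit(string address) => Explicit.IsMatch(address);

    public static bool IsValidLookahead(string address) => Lookahead.IsMatch(address);
}
```

Both methods should accept a.124b@mydomain.com and abc.1e@mydomain.com, and reject addresses whose local part is shorter than 6 characters or starts or ends with a digit or a dot.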

Regex split and merge into single record

In my C# application I'm using the regex ([A-Z0-9]{20}\d{0}) to split a string. However, it splits the ErrorCode and ErrorMsg into two different records, and I need ErrorCode and ErrorMsg in a single array record.
For example:
Current logic:
[0] 05300030000GN0030018
[1] Field is required.
But I need the following:
[0] 05300030000GN0030018Field is required.
Current Implementation:
Expected output
Assuming the msg is never empty and \d{0} was used to fail any match if the next char after [A-Z0-9]{20} is a digit, you can use
var result = Regex.Matches(input, @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
See the regex demo. Note that if msg can be empty, you need to use a (?!\d) lookahead instead of \D: @"\b[A-Z0-9]{20}(?!\d).*?(?=\b[A-Z0-9]{20}(?!\d)|\z)".
Details:
\b - word boundary (need to make sure the char limit is fine)
[A-Z0-9]{20} - twenty uppercase ASCII letters or digits
\D - a non-digit char
.*? - any zero or more chars as few as possible
(?=\b[A-Z0-9]{20}\D|\z) - a positive lookahead that requires a word boundary, twenty uppercase ASCII letters or digits and a non-digit or end of string immediately to the right of the current location.
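A minimal sketch of the approach, assuming the input is a concatenation of records like the one in the question. The class name, method name, and the second record used in the usage note are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class ErrorRecordDemo
{
    // Each match is a 20-character code plus the message that follows it,
    // up to the next code or the end of the input.
    public static List<string> SplitRecords(string input) =>
        Regex.Matches(input,
                      @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)",
                      RegexOptions.Singleline)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();
}
```

For an input such as "05300030000GN0030018Field is required.05300030000GN0030019Value is too long." this should yield two records, each holding the code and its message together. Because of the \b, a new record is only recognized when the preceding message ends in a non-word character such as the period here.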

C# How to filter DataTable rows containing alphanumeric values with special characters using Regex

I have below data in my C# Datatable
What I want is to filter the rows whose values are alphanumeric with special characters, like:
HOAUD039#
HOAUD00$
So I tried the regex below in my LINQ query:
var matches =
    dt.AsEnumerable()
      .Where(row => Regex.IsMatch(row["Empolyee_CRC"].ToString(),
                                  "^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"))
      .CopyToDataTable();
which returns both purely alphanumeric results and alphanumeric results with special characters, like below:
My question is simple: what is the right way to show only the results that are alphanumeric with special characters?
I've also tried this regex, but it does not work either:
^(?:[\d,\/().]*[a-zA-Z][a-zA-Z\d,\/().]*)?$
Based on your example values, you can try this:
^(?=.*\d)(?=.*[A-Za-z])(?=.*[!@#$&()\\-`.+,\/\"]).*$
Explanation
^ - Anchor to start of string.
(?=.*\d) - Lookahead requiring at least one digit in the match.
(?=.*[A-Za-z]) - Lookahead requiring at least one letter in the match.
(?=.*[!@#$&()\\-`.+,\/\"]) - Lookahead requiring at least one special character in the match.
.* - Match anything except newline.
$ - End of string.
Demo
In your regex you are using a single character class, which accepts any one of the listed characters at each position, but you have 3 separate requirements.
In your second regex, everything is optional because of the * and the ?.
You could use 3 positive lookaheads to assert your requirements:
^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[A-Z])[A-Z\d!@#$&()`.+,\/\-]+$
In C#:
string pattern = @"^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,\/-]+$";
That will match:
^ Start of string
(?=.*\d) Assert a digit
(?=.*[!@#$&()`.+,\/\-]) Assert a special character
(?=.*[A-Za-z]) Assert a lowercase or uppercase character
[A-Za-z\d!@#$&()`.+,\/-]+ Match 1+ times only the allowed characters
$ End of the string
Regex demo | C# Demo
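The three-lookahead pattern can be tried without a DataTable. The sketch below filters a plain list of strings instead. The class and method names are hypothetical, and replacing the DataTable column with a string collection is a simplification made for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class SpecialCharFilterDemo
{
    // Requires at least one digit, one special character and one letter,
    // and allows only characters from the listed set.
    private static readonly Regex MixedPattern = new Regex(
        @"^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,\/-]+$");

    // Keeps only the values that satisfy all three requirements.
    public static List<string> KeepMixed(IEnumerable<string> values) =>
        values.Where(v => MixedPattern.IsMatch(v)).ToList();
}
```

Given the values HOAUD039#, HOAUD00$, HOAUD039, ABC# and 1234#, only the first two should be kept: the third has no special character, the fourth has no digit, and the fifth has no letter. The same predicate could be passed to the Where call on dt.AsEnumerable() from the question.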

Regex pattern to separate string with semicolon and plus

Here I have used the below mentioned code.
MatchCollection matches = Regex.Matches(cellData, @"(^\[.*\]$|^\[.*\]_[0-9]*$)");
The only thing this pattern is not doing is separating the bracketed parts at the semicolon and plus in the main string.
A sample string is
[dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount]
I am trying to extract
[dbServer]
[ciDBNAME]
[dbLogin]
[dbPasswd]
[SIM_ErrorFound#1]
[#IterationCount]
from the string.
To extract the parts in square brackets from [dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount] (which is what I assume you're trying to do),
the regular expression (I haven't quoted it) should be
\[([^\]]*)\]
You should not use ^ and $, as you're not interested in the start and end of the string. The parentheses will capture every instance of zero or more characters inside square brackets.
If you want to be more specific about what you're capturing in the brackets, you'll need to change the [^\]] to something else.
Your regex - (^\[.*\]$|^\[.*\]_[0-9]*$) - matches any full string that starts with [, then contains zero or more chars other than a newline, and ends with ] (\]$) or with _ followed with 0+ digits (_[0-9]*$). You could also write the pattern as ^\[.*](?:_[0-9]*)?$ and it would work the same.
However, you need to match multiple substrings inside a larger string. Thus, you should have removed the ^ and $ anchors and retried. Then, you would find out that .* is too greedy and matches from the first [ up to the last ]. To fix that, it is best to use a negated character class solution. E.g. you may use [^][]* that matches 0+ chars other than [ and ].
Edit: It seems you need to get only the text inside square brackets.
You need to use a capturing group, a pair of unescaped parentheses around the part of the pattern you need to get and then access the value by the group ID (unnamed groups are numbered starting with 1 from left to right):
var results = Regex.Matches(s, @"\[([^][]+)]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
See the .NET regex demo
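A short sketch of both variants, with and without the surrounding brackets, using the sample string from the question. The class and method names are hypothetical, and the character class is written as [^\[\]] here, which is intended to be equivalent to the [^][] form above but with both brackets escaped explicitly.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answers.
public static class BracketDemo
{
    private static readonly Regex Bracketed = new Regex(@"\[([^\[\]]+)\]");

    // Whole matches, brackets included, e.g. "[dbServer]".
    public static List<string> WithBrackets(string s) =>
        Bracketed.Matches(s).Cast<Match>().Select(m => m.Value).ToList();

    // Only the text captured by group 1, e.g. "dbServer".
    public static List<string> InsideBrackets(string s) =>
        Bracketed.Matches(s).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}
```

For the sample string, WithBrackets should return the six bracketed tokens listed in the question, and InsideBrackets the same six values without the brackets.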
