Regex Conditional Values - c#

If I have a string like the following that can have two possible values (although the value JB37 can be variable)
String One\r\nString Two\r\n
String One\r\nJB37\r\n
And I only want to capture the string if the value following String One\r\n does NOT equal String Two\r\n, how would I code that in Regex?
So normally without any condition, this is what I want:
String One\r\n(.+?)\r\n

With regex, you may resort to a negative lookahead:
String One\r\n(?!String Two(?:\r\n|$))(.*?)(?:\r\n|$)
See the regex demo
You may also use [^\r\n] instead of .:
String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)
If you use RegexOptions.Multiline, you will also be able to use
(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$
See yet another demo.
Details
(?m) - a RegexOptions.Multiline option that makes ^ match start of a line and $ end of line positions
String One\r\n - String One text followed with a CRLF line ending
(?!String Two\r?$) - a negative lookahead that fails the match if immediately to the right of the current location, there is String Two at the end of the line
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible, up to the leftmost occurrence of
\r?$ - an optional CR and end of the line (note that in a .NET regex, $ matches only in front of LF, not CR, in the multiline mode, thus, \r? is necessary).
C# demo:
var m = Regex.Match(s, @"(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If CR can be missing, add ? after each \r in the pattern.
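As a minimal sketch of the non-multiline variant above, the following wraps the [^\r\n] pattern in a helper and runs it against both sample inputs from the question. The class and method names (StringOneCapture, TryCapture) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringOneCapture
{
    // The negative lookahead rejects a following line that is exactly "String Two".
    private static readonly Regex Pattern = new Regex(
        @"String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)");

    // Returns true and sets 'value' to the line after "String One" when that
    // line is not "String Two"; returns false otherwise.
    public static bool TryCapture(string input, out string value)
    {
        Match m = Pattern.Match(input);
        value = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Calling StringOneCapture.TryCapture("String One\r\nJB37\r\n", out var v) should yield "JB37" in v, while the "String Two" input should return false.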

Related

Regex: Check if there are more than x line breaks

I need to validate a string according to the occurrence of line breaks.
The input is okay if there are no more than, say, 6 line breaks.
The input is not okay if there are more than, say, 6 line breaks.
Of course, other characters can (but do not have to) occur between the line breaks.
I need to solve this solely within the regular expression because I cannot add any additional code.
I thought about something like this:
/^(\r\n|\r|\n){0,6}$/ // not working :[
You can use
Regex.IsMatch(input, @"^.*(?:\n.*){0,6}\z")
Or, if your line endings can also be a lone CR or a lone LF, bear in mind that in a .NET regex, . (without the RegexOptions.Singleline option) matches any char but LF, so it still matches CR. You will need to use something like
Regex.IsMatch(input, @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z")
The regex matches
^ - start of string
.* - any zero or more chars other than line feed (\n) char as many as possible (= a line)
(?:\n.*){0,6} - zero to six consecutive occurrences of an LF char and then any zero or more chars other than an LF char as many as possible
\z - the very end of string.
The second pattern matches
^ - start of string
[^\r\n]* - zero or more chars other than LF and CR as many as possible
(?:(?:\r\n?|\n)[^\r\n]*){0,6} - zero to six occurrences of
(?:\r\n?|\n) - either CRLF, or CR, or LF
[^\r\n]* - zero or more chars other than LF and CR as many as possible
\z - the very end of string.
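The second pattern can be tried out with a small helper like the one below. This is only a sketch: the names LineBreakLimit and HasAtMostSixLineBreaks are made up for the example, and the limit of 6 is hard-coded as in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakLimit
{
    // Up to six line breaks (CRLF, lone CR, or lone LF) between runs of
    // non-line-break characters, anchored to the whole string.
    private static readonly Regex AtMostSix = new Regex(
        @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z");

    public static bool HasAtMostSixLineBreaks(string input)
    {
        return AtMostSix.IsMatch(input);
    }
}
```

For instance, a string made of six LF characters should pass, and a string made of seven should fail, because \z cannot be reached after the sixth repetition.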

Regex option "Multiline"

I have a regex to match date format with comma.
yyyy/mm/dd or yyyy/mm
For example:
2016/09/02,2016/08,2016/09/30
My code:
string data="21535300/11/11\n";
Regex reg = new Regex(@"^(20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|30|31))?,?)*$",
RegexOptions.Multiline);
if (!reg.IsMatch(data))
"Error".Dump();
else
"True".Dump();
I use the Multiline option.
If the string data ends with "\n", any input matches this regex.
For example:
string data="test\n";
string data="2100/1/1";
I find option definition in MSDN. It says:
It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.
I don't understand why this happens.
Can anyone explain it?
Thanks.
Your regex can match an empty line that you get once you add a newline at the end of the string. "test\n" contains 2 lines, and the second one gets matched.
See your regex pattern in a free-spacing mode:
^ # Matches the start of a line
( # Start of Group 1
20\d{2}/
(0[1-9]|1[012])
(/
(0[1-9]|[12]\d|30|31)
)?,?
)* # End of group 1 - * quantifier makes it match 0+ times
$ # End of line
If you do not want it to match an empty line, replace the last )* with )+.
An alternative is to use a more unrolled pattern like
^20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?(,20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?)*$
See the regex demo. Inside the code, it is advisable to use a block and build the pattern dynamically:
string date = @"20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?";
Regex reg = new Regex(string.Format("^{0}(,{0})*$", date), RegexOptions.Multiline);
As you can see, the first block (after the start of the line ^ anchor) is obligatory here, and thus an empty line will never get matched.
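To see the difference side by side, here is a minimal sketch that compiles both the question's pattern and the unrolled pattern with RegexOptions.Multiline. The class and member names (DateListValidator, MatchesOriginal, MatchesUnrolled) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateListValidator
{
    // One date block: yyyy/mm with an optional /dd part.
    private const string Date = @"20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?";

    // The pattern from the question: the * quantifier lets it match an empty line.
    private static readonly Regex Original = new Regex(
        @"^(20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|30|31))?,?)*$",
        RegexOptions.Multiline);

    // The unrolled pattern: the first date block is obligatory.
    private static readonly Regex Unrolled = new Regex(
        "^" + Date + "(," + Date + ")*$",
        RegexOptions.Multiline);

    public static bool MatchesOriginal(string data)
    {
        return Original.IsMatch(data);
    }

    public static bool MatchesUnrolled(string data)
    {
        return Unrolled.IsMatch(data);
    }
}
```

With "test\n", MatchesOriginal should return true (the empty second line matches), whereas MatchesUnrolled should return false; a valid list such as "2016/09/02,2016/08,2016/09/30" should still pass the unrolled check.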

Can't understand why regex doesn't work with start/end markers of the string

Here is my test regex with options IgnoreCase and Singleline :
^\s*((?<test1>[-]?\d{0,10}.\d{3})(?<test2>\d)?(?<test3>\d)?){1,}$
and input data:
24426990.568 128364695.70706 -1288.460
If I omit ^ (match start of line) and $ (match end of line)
\s*((?<test1>[-]?\d{0,10}.\d{3})(?<test2>\d)?(?<test3>\d)?){1,}
then everything works perfectly.
Why doesn't it work with string start/end markers (^/$)?
Thanks in advance.
Unless RegexOptions.Multiline is set, ^ and $ literally mean the start and end of the input string (RegexOptions.Singleline only changes what . matches).
They only mean the start of a line and the end of a line in multiline mode.
Please note that this means the pattern has to cover the entire input string.
So if you use:
24426990.568 128364695.70706 -1288.460
as your input string, then the beginning is the first white space and the end of the string is the final 0.
As your pattern matches exactly one instance of what you are looking for, the regex will fail when used with ^ and $. This is because it is looking for one instance of that pattern in the input string, but there are three.
You have two options:
Remove the ^ and $
Change the pattern to match at least one time
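One possible reading of the second option is sketched below. It rests on two assumptions that are not in the original answer: the \s* is moved inside the repeated group so that every number may be preceded by whitespace, and the dot is escaped so it matches only a literal period. The names NumberLineMatcher and Test1Values are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberLineMatcher
{
    // Assumption: whitespace belongs to each repetition, and "." is meant literally.
    private static readonly Regex Pattern = new Regex(
        @"^(?:\s*(?<test1>-?\d{0,10}\.\d{3})(?<test2>\d)?(?<test3>\d)?)+$",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every value captured by the test1 group, or an empty array
    // when the whole input does not match.
    public static string[] Test1Values(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return new string[0];
        }
        return m.Groups["test1"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }
}
```

Because .NET keeps every capture of a repeated group, the Captures collection of test1 exposes one entry per number on the line.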

C# - Removing single word in string after certain character

I have string that I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases with one step.
Any advice would be appreciated.
Change .*, which matches any characters, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible until it hits a space. Adding a ? right after that * tells it instead to make the capture as small as possible--to stop as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
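The negated-whitespace idea can be sketched in C# as follows, using \S (equivalent to [^\s]) and an empty replacement so that the space after each removed word is left in place. BackslashTagRemover and Strip are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashTagRemover
{
    // Removes a backslash together with the run of non-whitespace
    // characters that follows it; works mid-string and at the end.
    public static string Strip(string input)
    {
        return Regex.Replace(input, @"\\\S*", "");
    }
}
```

Applied to the question's input, Strip(@"testing a\determiner checking test one\pronoun") should produce the desired "testing a checking test one".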

regex to match string that contains brackets

I am reading in a file and verifying the contents of the file by checking each line. The string lines look like this:
CMD: [THIS_IS_THE_CMD]
DELAY: [5]
FLAGS: [ANY]
All I need to check is that the line follows that exact form and what is in between the brackets is either text (I have tried [A-Z_] but it's not working) or a number depending on the line.
What I have so far:
string line = "CMD: [THIS_IS_THE_CMD]";
if(!VerifyLine(@"^CMD: \[", line))
{
// No match, set error
}
private static bool VerifyLine(string regExp, string line)
{
Regex reg = new Regex(regExp);
return reg.IsMatch(line);
}
But this does not check the contents in between the brackets and it does not check for the closing bracket.
This should do it for you:
([A-Z_]*):\s*\[(\w*)\]
First group matches the part before the colon, second matches the part inside the []s.
First part can be any uppercase letter or underscore, second part can be any alphanumeric character of any case, or an underscore.
Additionally, you might use the following extras. When the input holds several lines, they need RegexOptions.Multiline so that ^ and $ match at line boundaries instead of only at the start and end of the input:
^([A-Z_]*):\s*\[(\w*)\]$ // will only match whole lines
^\s*([A-Z_]*):\s*\[(\w*)\]\s*$ // same as above but ignores extra whitespace
// on the beginning and end of lines
Different things you might use to capture the groups depending on your file format:
[A-Z] // matches any capital letter
[A-Za-z] // matches any letter
[A-Za-z0-9] // matches any alphanumeric character
\w // matches any "word character", which is any alnum character or _
Try this: ^\w+:\s*\[(\w+)\]. Here \w matches letters, digits, and the underscore.
Another pattern matches upper case only: ^[A-Z\d_]+:\s*\[([A-Z\d_]+)\]
You tried ^CMD: \[, which contains a literal space. A literal space in a pattern matches exactly one space character (unless RegexOptions.IgnorePatternWhitespace is set). To allow any amount of whitespace, use \s* instead:
if(!VerifyLine(@"^CMD:\s*\[", line))
...
explain:
\s Matches any white-space character.
* Matches the previous element zero or more times.
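Putting the answers together, a per-line check could look like the sketch below. It assumes, based only on the three sample lines, that CMD and FLAGS carry upper-case text with underscores and that DELAY carries a number; the names LineVerifier and IsValid are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineVerifier
{
    // Assumption: CMD and FLAGS hold upper-case letters and underscores.
    private static readonly Regex TextLine = new Regex(
        @"^(?:CMD|FLAGS): \[[A-Z_]+\]$");

    // Assumption: DELAY holds one or more digits.
    private static readonly Regex NumberLine = new Regex(
        @"^DELAY: \[[0-9]+\]$");

    // Each line is checked on its own, so RegexOptions.Multiline is not needed.
    public static bool IsValid(string line)
    {
        return TextLine.IsMatch(line) || NumberLine.IsMatch(line);
    }
}
```

Both the closing bracket and the bracket contents are now part of the check, so a line such as "CMD: [X" or "DELAY: [five]" should be rejected.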
