Matching pattern to end of line - c#

I am trying to get some in-line comments from a text file and need some help with the expression.
this comes before selection
this is on the same line %% this is the first group and it can have any character /[{3$5!+-p
here is some more text in the middle
this stuff is also on a line with a comment %% this is the second group of stuff !##%^()<>/~`
this goes after the selections
I am trying to get everything that follows %%\s+. Here is what I tried:
%%\s+(.*)$
But that matches all text following the first %%. Not sure where to go from here.

By default, most engines do not let the dot match newlines
and do not run in multi-line mode.
That means %%\s+(.*)$ should not match unless it finds
%% on the last line of the string.
Instead of fighting the external switches, use inline modifiers (?...) that
override them.
Use (?-s)%%\s+(.*), which turns off dot-all (Singleline) mode.
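The effect of the inline modifier can be checked with a small sketch. The names InlineComments and Extract are hypothetical, RegexOptions.Singleline is passed only to show that (?-s) overrides it, and the TrimEnd('\r') call is an added assumption for input with Windows line endings (in .NET the dot matches a carriage return).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InlineComments
{
    // Hypothetical helper: returns the text after each "%% " marker, one entry per line.
    public static string[] Extract(string text) =>
        // (?-s) switches Singleline off inside the pattern, even though the
        // option is passed from outside, so .* stops at the end of the line.
        Regex.Matches(text, @"(?-s)%%\s+(.*)", RegexOptions.Singleline)
             .Cast<Match>()
             // In .NET the dot still matches '\r', so drop it for CRLF input.
             .Select(m => m.Groups[1].Value.TrimEnd('\r'))
             .ToArray();
}
```

Calling InlineComments.Extract on the sample text from the question should return one string per commented line rather than everything after the first %%.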

Since . matches any character but a newline by default, you needn't use $:
%%\s+(.*)
Explanation:
%% - two literal % symbols
\s+ - 1 or more whitespace
(.*) - zero or more characters other than a newline (captured into Group 1)
C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "THE_STRING";
var result = Regex.Matches(s, @"%%\s+(.*)")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value) // text captured after %%
    .ToList();
Console.WriteLine(string.Join("\n", result));

Related

Why is Regex.Replace giving me weird result for last group

A simple example:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2})", "$1-$2-$3 $4")
This outputs to:
123-456-789 10999999999
But why? I have specifically set the group indexes I need, and each group contains the exact value (checked in the debugger).
Here is a fiddle:
https://dotnetfiddle.net/dkAPx3
Match the rest of the string with .* to truncate it:
Regex.Replace("12345678910999999999", @"^(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4")
I'd also add ^ at the start to match the beginning of the string.
Your regex matched and replaced only the first part of the string; the unmatched remainder is left in place. Add .* to the end of the pattern:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4"); // results in "123-456-789 10"
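As an alternative sketch (not taken from the answers above), the formatted value can be built from the match itself with Match.Result, which expands a replacement pattern against a single match and ignores whatever the pattern did not consume. The names PhoneFormat and Format are hypothetical, and returning the input unchanged on a failed match is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormat
{
    // Hypothetical helper: formats the first 11 digits and drops the rest.
    public static string Format(string digits)
    {
        Match m = Regex.Match(digits, @"(\d{3})(\d{3})(\d{3})(\d{2})");
        // Assumption: inputs that do not match are returned as they are.
        if (!m.Success) return digits;
        // Result expands $1..$4 from this match only; unmatched text is not copied.
        return m.Result("$1-$2-$3 $4");
    }
}
```

With the input from the question, PhoneFormat.Format("12345678910999999999") gives "123-456-789 10" without needing a trailing .* in the pattern.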

How can I filter out certain combinations?

I'm trying to filter the input of a TextBox using a Regex. I need up to three numbers before the decimal point and I need two after it. This can be in any form.
I've tried changing the regex commands around, but it creates errors and single inputs won't be valid. I'm using a TextBox in WPF to collect the data.
bool containsLetter = Regex.IsMatch(units.Text, "^[0-9]{1,3}([.] [0-9] {1,3})?$");
if (containsLetter == true)
{
MessageBox.Show("error");
}
return containsLetter;
I want the regex filter to accept these types of inputs:
111.11,
11.11,
1.11,
1.01,
100,
10,
1,
As mentioned in the comments, spaces in a regex pattern are interpreted literally.
Therefore, in this part of your regex:
([.] [0-9] {1,3})
a space is expected between . and [0-9],
and likewise after [0-9], where the regex would match 1 to 3 spaces.
That said, if the spaces are there for readability, there are several ways to lay out your regex.
1) Put the comments out of the regex:
string myregex = @"\s" // Match any whitespace once
    + @"\n" // Match one newline character
    + @"[a-zA-Z]"; // Match any letter
2) Add comments within your regex by using the syntax (?#comment)
needle(?# this will find a needle)
3) Activate free-spacing mode within your regex:
nee # this will find a nee...
dle # ...dle (the split means nothing when white-space is ignored)
doc: https://www.regular-expressions.info/freespacing.html
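Applied to the TextBox check from the question, free-spacing mode could look like the sketch below. In .NET this mode is enabled with RegexOptions.IgnorePatternWhitespace (or the inline (?x) modifier). The names UnitsFilter and IsValid are hypothetical, and requiring exactly two digits after the dot is an assumption taken from the question's description rather than from the {1,3} in the posted pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnitsFilter
{
    // Whitespace and #-comments inside the pattern are ignored in this mode.
    private static readonly Regex Pattern = new Regex(@"
        ^[0-9]{1,3}      # one to three digits before the decimal point
        (?:              # optional fractional part
            [.][0-9]{2}  # a literal dot and exactly two digits (assumption)
        )?
        $                # end of input",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: true when the text is an accepted value.
    public static bool IsValid(string text) => Pattern.IsMatch(text);
}
```

Note that IsValid returns true for accepted input, so an error message would be shown when it returns false, which is the opposite of how containsLetter is used in the question.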

Extract sub string between two string with new line

Please help me write a regular expression to extract the entire content between the lines of * characters.
Note the number of * characters can vary.
I tried (\*\n)([\s\S]*)(\n\*) but it groups everything as 1 block instead of 2.
Expected Output
1.
Thanks for contacting us
Regards,
XXX
2.
It wAS a pleasure talking with you
Good to see you today
Test string:
*******
Thanks for contacting us
Regards,
XXX
************
It wAS a pleasure talking with you
Good to see you today
*******
You may use
var results = Regex.Matches(s, @"(?s)\*{3,}(.*?)(?=\*{3,}|$)")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value.Trim())
    .ToList();
Details
(?s) - RegexOptions.Singleline inline modifier
\*{3,} - 3 or more asterisks
(.*?) - Group 1: any 0+ chars, as few as possible as *? is a lazy quantifier
(?=\*{3,}|$) - a positive lookahead (required to obtain overlapping matches) that matches a location that is followed with 3 or more asterisks or the end of string.
The .Select(x => x.Groups[1].Value.Trim()) part grabs the value inside Group 1 and trims off leading/trailing whitespace.
Another way is to match the ***+ line, then capture all lines not starting with 3 or more asterisks into Group 1:
(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)
This pattern can be used in the code above as is.
Details
(?m) - a RegexOptions.Multiline modifier to make ^ and $ match start/end of a line
^ - start of line
\*{3,} - 3 or more asterisks
.* - the rest of the line (or use \r?$ to make sure the end of a line is reached)
((?:\r?\n(?!\*{3,}).*)*) - Group 1: zero or more sequences of
\r?\n(?!\*{3,}) - CRLF or LF line ending that is not followed with 3 or more *s
.* - rest of the line
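One detail worth checking with either pattern: the closing line of asterisks also produces a match, with an empty Group 1. A minimal sketch that drops such empty blocks follows; the names StarBlocks and Extract are hypothetical, and filtering out empty entries is an assumption about the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StarBlocks
{
    // Hypothetical helper: returns the text between lines of 3 or more asterisks.
    public static List<string> Extract(string text) =>
        Regex.Matches(text, @"(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value.Trim())
             // The final asterisk line matches with nothing after it; skip it.
             .Where(block => block.Length > 0)
             .ToList();
}
```

For the test string in the question this should yield the two expected blocks and no trailing empty entry.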

How to read line by line in C#?

How can I read the text below line by line?
CurrentTime=04/24/16 09:57:23
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
CurrentTime=04/24/16 09:57:23
631706
aaa
bbb
I want to write an expression that matches from one CurrentTime line to the next.
For example :
(?<=(?<=CurrentTime\=)[0-9].*)\n(.*)
I wrote this pattern, but it didn't solve my problem. It only reads one line after the current time, but I want to read all the lines:
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
up to the next CurrentTime marker.
You can use
(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)
The pattern matches
^ - start of a line (since the /m modifier is used)
CurrentTime= - the sequence of literal characters CurrentTime=
([^\n]*)\n - matches and captures into Group 1 zero or more characters other than a newline and will just match the following newline
(.*?) - Group 2 capturing zero or more any characters (incl. a newline since the DOTALL /s modifier is used) but as few as possible, up to the first
(?=^CurrentTime=|\z) - CurrentTime= at the beginning of a line, or the end of string (\z)
Performance Update
To unroll the current regex, just use negated character classes with some additional grouping and a negative lookahead:
(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)
In .NET:
var input = "CurrentTime=04/24/16 09:57:23\nsan-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\nass\nbbbb\nCurrentTime=04/24/16 09:57:23\n631706\naaa\nbbb\n";
var pat = @"(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)";
var results = Regex.Matches(input, pat)
.Cast<Match>()
.Select(p => p.Groups[2].Value) // Get the capture group 2 values
.ToList();
Console.WriteLine(string.Join("\n---\n", results));

How to match a whole line with or without '\n' using regex

I want to match all comments in a text file and I use the following regex to match single line comment:
//(.*?)\r?\n
But it could not match the last line if the last line is a single comment line such as:
// test
So, how can I write a single regex that matches a whole line with or without a trailing '\n' in C#? Thanks!
You could write your regex as:
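//(.*?)\r?$
With RegexOptions.Multiline, the $ anchor matches at the end of each line; without it, $ matches only at the end of the string (or before a final newline).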
You could consider this instead:
//(.*?)\s*$
This excludes any trailing whitespace (including newlines) from your capture group, because the lazy (.*?) leaves it for \s* to consume. I assume you wouldn't care about capturing trailing spaces.
The limitations of a regex to match comments are important to keep in mind: it will match some non-comment items in code. For example, consider this line:
get_web_page('http://www.foo.com');
If you only want to match comments on their own line, you could do this:
^\s*//(.*)\s*$
If you need to match comments that come after code, as well, the above problem can't be overcome easily with a regex.
Update: I am assuming that your code iterates through the file line by line, the most common case. However, if you have the entire file in a string and are matching the regex against that, you can enable multi-line mode for this to work.
^\s*//(.*?)\s*$
A whole line starts with '^' and ends with '$', with optional whitespace characters '\s' around the comment.
You don't need to specify a newline, because . matches any character except a newline.
//(.*)
The last line does not contain a newline at the end, that is why your //(.*?)\r?\n regex fails.
You need to use a non-capturing group with a $ anchor as an alternative:
//(.*?)(?:\r?\n|$)
       ^^^^^^^^^^^
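A minimal sketch of that pattern in C#, assuming the whole file is held in one string. The names LineComments and Extract are hypothetical, and trimming the captured text is an added convenience rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineComments
{
    // Hypothetical helper: returns the text of every // comment, including
    // one on the last line that has no trailing newline.
    public static List<string> Extract(string source) =>
        // (?:\r?\n|$) accepts either a line break or the end of the string.
        Regex.Matches(source, @"//(.*?)(?:\r?\n|$)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value.Trim())
             .ToList();
}
```

As noted earlier in this thread, a pattern like this also matches // inside string literals such as URLs, so it is only a rough filter.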
