Extract substring between two strings with new lines - C#

Please help me write a regular expression to extract the entire content between the lines of * characters.
Note that the number of * characters can vary.
I tried (\*\n)([\s\S]*)(\n\*), but it groups everything as one block instead of two.
Expected Output
1.
Thanks for contacting us
Regards,
XXX
2.
It wAS a pleasure talking with you
Good to see you today
Test string:
*******
Thanks for contacting us
Regards,
XXX
************
It wAS a pleasure talking with you
Good to see you today
*******

You may use
var results = Regex.Matches(s, @"(?s)\*{3,}(.*?)(?=\*{3,}|$)")
.Cast<Match>()
.Select(x => x.Groups[1].Value.Trim())
.ToList();
Details
(?s) - RegexOptions.Singleline inline modifier
\*{3,} - 3 or more asterisks
(.*?) - Group 1: any 0+ chars, as few as possible as *? is a lazy quantifier
(?=\*{3,}|$) - a positive lookahead (required to obtain overlapping matches) that matches a location that is followed with 3 or more asterisks or the end of string.
The .Select(x => x.Groups[1].Value.Trim()) part grabs the value inside Group 1 and trims off leading/trailing whitespace.
Another way is to match the line of 3 or more asterisks, then capture all following lines that do not start with 3 or more asterisks into Group 1:
(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)
This pattern can be used in the above code as is, too.
Details
(?m) - a RegexOptions.Multiline modifier to make ^ and $ match start/end of a line
^ - start of line
\*{3,} - 3 or more asterisks
.* - the rest of the line (or use \r?$ to make sure the end of a line is reached)
((?:\r?\n(?!\*{3,}).*)*) - Group 1: zero or more sequences of
\r?\n(?!\*{3,}) - CRLF or LF line ending that is not followed with 3 or more *s
.* - the rest of the line
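As a rough sketch, both patterns above can be run against the test string from the question as follows. The helper name ExtractBlocks is hypothetical, LF line endings are assumed, and the Where filter is an addition that is not part of the answer: the last asterisk line has nothing after it, so each pattern also yields one empty trailing match for this input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Test string from the question (LF line endings assumed).
var s = "*******\nThanks for contacting us\nRegards,\nXXX\n" +
        "************\nIt wAS a pleasure talking with you\nGood to see you today\n*******";

// Hypothetical helper: collect trimmed, non-empty Group 1 values.
List<string> ExtractBlocks(string input, string pattern) =>
    Regex.Matches(input, pattern)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value.Trim())
         .Where(v => v.Length > 0) // drop the empty match after the last asterisk line
         .ToList();

// Lazy dot pattern with a lookahead for the next delimiter.
var lazy = ExtractBlocks(s, @"(?s)\*{3,}(.*?)(?=\*{3,}|$)");

// Line-by-line pattern that skips lines starting with 3+ asterisks.
var unrolled = ExtractBlocks(s, @"(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)");

Console.WriteLine(string.Join("\n---\n", lazy));
Console.WriteLine(lazy.SequenceEqual(unrolled));
```

For this input, both lists should hold the same two blocks.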

Related

Regex C# - optional group in the middle

I have such source text with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
The regex has to match the whole string and select all characters after the first "." as Group 1. I tried to make subgroups and mark Group 3 as optional, but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How should it be changed to capture the desired values?

This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.

You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible (as *? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
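A minimal sketch of how the groups from this pattern might be combined into the desired value. The helper name ExtractSuffix and the "." used to rejoin the parts are assumptions for illustration; Group 3 is only appended when it took part in the match.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$");

// Hypothetical helper: join Group 2 and, when it participated, Group 3.
string ExtractSuffix(string line)
{
    var m = pattern.Match(line);
    if (!m.Success) return "";

    var first = m.Groups[2].Value;
    // Group 3 is optional, so check Success before reading its value.
    return m.Groups[3].Success ? first + "." + m.Groups[3].Value : first;
}

foreach (var line in new[]
{
    "GH22-O0-TFS-SFSD 00-1-006.19135",
    "GH22-O0-TFS-SFSD 00-1-006.1.19135",
})
{
    Console.WriteLine(ExtractSuffix(line));
}
```

Note that the 0* parts drop leading zeros of each number, so this only fits inputs where those zeros are not wanted in the result.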

I hope I understand your goals and this should work:
.*?\.([\d.]+)
.*?\. - lazily match everything leading up to the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1

Regex - Extract second position digit from string

I have a regex:
var thisMatch = Regex.Match(result, @"(?-s).+(?=[\r\n]+The information appearing in this document)", RegexOptions.IgnoreCase);
This returns the line before "The information appearing in this document" just fine.
The output of my regex is
10 880 $10,000 $800 $25 $10
I need to extract 880, which will always be in second position (the number before 880 can vary, so \d{0,2} shouldn't be used).
How can I grab the second position number?

You can use something like
(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)
In C#:
var output = Regex.Match(result, @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)", RegexOptions.Multiline)?.Value;
Or, you could capture the number and grab it from a group with
^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document
In C#:
var output = Regex.Match(result, @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document", RegexOptions.Multiline)?.Groups[1].Value;
Regex details:
(?<= - start of a positive lookbehind that requires its pattern to match immediately to the left of the current location:
^ - start of a line (due to the RegexOptions.Multiline)
\S+ - one or more non-whitespace chars
[\p{Zs}\t]+ - one or more horizontal whitespaces
) - end of the lookbehind
\d+ - one or more digits (use \S+ if you are sure this will always be the non-whitespace char streak)
(?= - start of a positive lookahead that requires its pattern to match immediately to the right of the current location:
.* - the rest of the line (as . does not match an LF char)
[\r\n]+ - one or more CR/LF chars
The information appearing in this document - literal text
) - end of the lookahead.
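The two variants can be compared side by side with a small sketch like the one below. The sample input (a header line, the data line, then the marker line) is an assumption built from the question, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input: a header line, the data line, then the marker line.
var result = "Some header\n" +
             "10 880 $10,000 $800 $25 $10\n" +
             "The information appearing in this document is illustrative.";

const string marker = "The information appearing in this document";

// Variant 1: lookbehind + lookahead, the match value itself is the number.
var viaLookarounds = Regex.Match(
    result,
    @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+" + marker + ")",
    RegexOptions.Multiline).Value;

// Variant 2: consume the whole line and read the number from Group 1.
var viaGroup = Regex.Match(
    result,
    @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+" + marker,
    RegexOptions.Multiline).Groups[1].Value;

Console.WriteLine(viaLookarounds);
Console.WriteLine(viaGroup);
```

Both variants should give the second token of the line directly above the marker. The lookbehind variant relies on .NET supporting variable-length lookbehinds, so it may not port to other regex engines.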

You could also use
\d+\s(\d+)
This matches a leading number (\d+), separated by a whitespace character (\s) from the number you are looking for, which is captured in a group ((\d+)) so you can easily access it.

Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping" but creates 3 matches correctly.
But I need the "Additional test" text in the second group as well.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I get only one huge match because the second group takes everything until the end.
How can I match everything until a new line starting with a date begins, and create a new match from there on?

If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char as many as possible and then an optional line, a newline followed with any zero or more chars other than a newline char as many as possible.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
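A small sketch of the multi-line variant applied to the sample text from the question. LF line endings are assumed, and the variable names and the " | " separator used for printing are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question (LF line endings assumed).
var text = "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test\n" +
           "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test\n" +
           "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test";

var pattern = @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)";

var entries = Regex.Matches(text, pattern);

foreach (Match m in entries)
{
    var stamp = m.Groups[1].Value;                     // date and time
    var body = m.Groups[2].Value.Replace("\n", " | "); // all lines up to the next date
    Console.WriteLine("[" + stamp + "] " + body);
}
```

Each entry should come out as its own match, with the continuation line kept inside Group 2.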

The character class [,.:\w\s]* contains \s, which also matches newlines, so with the * quantifier it matches too much.
You can instead match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group using (.*\r?\n.*):
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (or of a line when RegexOptions.Multiline is enabled, which is needed to match every entry)
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2

How to read line by line in C#?

How can I read the text below line by line?
CurrentTime=04/24/16 09:57:23
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
CurrentTime=04/24/16 09:57:23
631706
aaa
bbb
I want to read everything from one CurrentTime line to the next.
For example:
(?<=(?<=CurrentTime\=)[0-9].*)\n(.*)
I wrote this pattern, but it didn't solve my problem. It only reads one line after the current time. I want to read all the lines:
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
up to the next CurrentTime marker.

You can use
(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)
The pattern matches
^ - start of a line (since the /m modifier is used)
CurrentTime= - the sequence of literal characters CurrentTime=
([^\n]*)\n - matches and captures into Group 1 zero or more characters other than a newline and will just match the following newline
(.*?) - Group 2 capturing zero or more any characters (incl. a newline since the DOTALL /s modifier is used) but as few as possible, up to the first
(?=^CurrentTime=|\z) - CurrentTime= at the beginning of a line, or the end of string (\z)
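A sketch of the lazy pattern in use, reading both groups for each block. The input is the sample from the question with LF line endings assumed, and splitting Group 2 into lines is an illustrative addition.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample data from the question (LF line endings assumed).
var input = "CurrentTime=04/24/16 09:57:23\n" +
            "san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\nass\nbbbb\n" +
            "CurrentTime=04/24/16 09:57:23\n631706\naaa\nbbb\n";

var blocks = Regex.Matches(input, @"(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)");

foreach (Match m in blocks)
{
    var timestamp = m.Groups[1].Value; // text after "CurrentTime=" on the marker line

    // Group 2 holds every line up to the next marker, so split it to go line by line.
    var lines = m.Groups[2].Value.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

    Console.WriteLine(timestamp + " -> " + lines.Length + " line(s)");
    foreach (var line in lines)
        Console.WriteLine("  " + line);
}
```

Group 2 keeps the trailing newline of each block, which is why empty entries are removed when splitting.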
Performance Update
To unroll the current regex, just use negated character classes with some additional grouping and a negative lookahead:
(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)
In .NET:
var input = "CurrentTime=04/24/16 09:57:23\nsan-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\nass\nbbbb\nCurrentTime=04/24/16 09:57:23\n631706\naaa\nbbb\n";
var pat = @"(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)";
var results = Regex.Matches(input, pat)
.Cast<Match>()
.Select(p => p.Groups[2].Value) // Get the capture group 2 values
.ToList();
Console.WriteLine(string.Join("\n---\n", results));

Matching pattern to end of line

I am trying to get some in-line comments from a text file and need some help with the expression.
this comes before selection
this is on the same line %% this is the first group and it can have any character /[{3$5!+-p
here is some more text in the middle
this stuff is also on a line with a comment %% this is the second group of stuff !@#%^()<>/~`
this goes after the selections
I am trying to get everything that follows %%\s+. Here is what I tried:
%%\s+(.*)$
But that matches all text following the first %%. Not sure where to go from here.

In most engines, by default the dot does not match newlines and multi-line mode is off.
That means %%\s+(.*)$ should not match unless it finds %% on the last line of the string.
Instead of fighting the external options, use inline modifiers (?...) that override them.
Use (?-s)%%\s+(.*), which turns off dot-all mode.

Since . matches any character but a newline by default, you needn't use $:
%%\s+(.*)
Explanation:
%% - two literal % symbols
\s+ - 1 or more whitespace
(.*) - 0 or more any characters other than a newline (captured into Group 1)
C# demo:
var s = "THE_STRING";
var result = Regex.Matches(s, @"%%\s+(.*)")
.Cast<Match>()
.Select(p=>p.Groups[1].Value)
.ToList();
Console.WriteLine(string.Join("\n", result));
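For a version with concrete input instead of the THE_STRING placeholder, a sketch along these lines may help. The sample text is shortened from the question, and the TrimEnd('\r') call is an assumed precaution for files with CRLF line endings, since in .NET the dot excludes only \n and would otherwise leave a trailing \r in the group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened sample text based on the question.
var s = "this comes before selection\n" +
        "this is on the same line %% this is the first group\n" +
        "here is some more text in the middle\n" +
        "this stuff is also on a line with a comment %% this is the second group of stuff\n" +
        "this goes after the selections";

var comments = Regex.Matches(s, @"%%\s+(.*)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value.TrimEnd('\r')) // guard against CRLF input
    .ToList();

Console.WriteLine(string.Join("\n", comments));
```

Each match stops at the end of its own line because no Singleline option is set.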
