Please help me write a regular expression to extract the entire content between the lines of * characters.
Note the number of * characters can vary.
I tried (\*\n)([\s\S]*)(\n\*) but it groups everything as 1 block instead of 2.
Expected Output
1.
Thanks for contacting us
Regards,
XXX
2.
It wAS a pleasure talking with you
Good to see you today
Test string:
*******
Thanks for contacting us
Regards,
XXX
************
It wAS a pleasure talking with you
Good to see you today
*******
You may use
var results = Regex.Matches(s, @"(?s)\*{3,}(.*?)(?=\*{3,}|$)")
.Cast<Match>()
.Select(x => x.Groups[1].Value.Trim())
.ToList();
See the regex demo
Details
(?s) - RegexOptions.Singleline inline modifier
\*{3,} - 3 or more asterisks
(.*?) - Group 1: any 0+ chars, as few as possible as *? is a lazy quantifier
(?=\*{3,}|$) - a positive lookahead (required to obtain overlapping matches) that matches a location that is followed with 3 or more asterisks or the end of string.
The .Select(x => x.Groups[1].Value.Trim()) part grabs the value inside Group 1 and trims off leading/trailing whitespace.
Another way is to match the ***+ line, then capture all lines not starting with 3 or more asterisks into Group 1:
(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)
See this regex demo (it can be used in the above code as is, too.)
Details
(?m) - a RegexOptions.Multiline modifier to make ^ and $ match start/end of a line
^ - start of line
\*{3,} - 3 or more asterisks
.* - the rest of the line (or use \r?$ to make sure the end of a line is reached)
((?:\r?\n(?!\*{3,}).*)*) - Group 1: zero or more sequences of
\r?\n(?!\*{3,}) - CRLF or LF line ending that is not followed with 3 or more *s
.* - the rest of the line
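As a rough sketch, the line-based pattern above could be dropped into the same LINQ pipeline like this. The class name AsteriskBlocks, the method name Extract and the extra Where filter are illustrative assumptions, not part of the original answer; the filter is there because the closing asterisk line produces a match whose Group 1 is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AsteriskBlocks
{
    // Pattern from the answer: an asterisk line, then every following line
    // that does not start with 3+ asterisks, captured into Group 1.
    private const string Pattern = @"(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)";

    public static List<string> Extract(string text) =>
        Regex.Matches(text, Pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value.Trim())
             .Where(block => block.Length > 0) // skip the empty block after the last asterisk line
             .ToList();

    public static void Main()
    {
        var s = "*******\nThanks for contacting us\nRegards,\nXXX\n************\n"
              + "It wAS a pleasure talking with you\nGood to see you today\n*******";
        foreach (var block in Extract(s))
        {
            Console.WriteLine(block);
            Console.WriteLine("---");
        }
    }
}
```

With the test string from the question, this should yield two blocks, one per section between asterisk lines.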
Related
I have such source text with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
Regex has to match the whole string and select all characters after first "." - which is my Group 1. I tried to make subgroups and mark Group 3 as optional but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How it should be changed to capture desired values?
This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.
You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible (as *? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
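A minimal sketch of how the groups from this pattern might be combined into the desired value in C#. The class and method names are hypothetical, and joining Group 2 and Group 3 with a dot is an assumption about what the asker wants; note that 0* drops leading zeros from each numeric part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumber
{
    // Pattern from the answer: Group 2 is the first number after the first dot,
    // Group 3 (optional) is the number after a second dot.
    private const string Pattern = @"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$";

    public static string Extract(string line)
    {
        var m = Regex.Match(line, Pattern);
        if (!m.Success) return string.Empty; // no match: return an empty string

        // Group 3 only participates when the optional ".digits" part is present.
        return m.Groups[3].Success
            ? m.Groups[2].Value + "." + m.Groups[3].Value
            : m.Groups[2].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("GH22-O0-TFS-SFSD 00-1-006.19135"));   // 19135
        Console.WriteLine(Extract("GH22-O0-TFS-SFSD 00-1-006.1.19135")); // 1.19135
    }
}
```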
I hope I understand your goals and this should work:
.*?\.([\d.]+)
.*?\. - loosely capture everything leading up to the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1
I have a regex:
var thisMatch = Regex.Match(result, @"(?-s).+(?=[\r\n]+The information appearing in this document)", RegexOptions.IgnoreCase);
This returns the line before "The information appearing in this document" just fine.
The output of my regex is
10 880 $10,000 $800 $25 $10
I need to extract 880, which will always be in the second position (the number before 880 could vary, so \d{0,2} shouldn't be relied on).
How can I grab the second position number?
You can use something like
(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)
See the .NET regex demo. In C#:
var output = Regex.Match(result, @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)", RegexOptions.Multiline)?.Value;
Or, you could capture the number and grab it from a group with
^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document
See this regex demo. In C#:
var output = Regex.Match(result, @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document", RegexOptions.Multiline)?.Groups[1].Value;
Regex details:
(?<= - start of a positive lookbehind that requires its pattern to match immediately to the left of the current location:
^ - start of a line (due to the RegexOptions.Multiline)
\S+ - one or more non-whitespace chars
[\p{Zs}\t]+ - one or more horizontal whitespaces
) - end of the lookbehind
\d+ - one or more digits (use \S+ if you are sure this will always be the non-whitespace char streak)
(?= - start of a positive lookahead that requires its pattern to match immediately to the right of the current location:
.* - the rest of the line (as . does not match an LF char)
[\r\n]+ - one or more CR/LF chars
The information appearing in this document - literal text
) - end of the lookahead.
If you insert
\d+\s(\d+)
this will capture a leading number (\d+), separated by a whitespace (\s) from the number you're looking for ((\d+)), captured in a capture group so you can easily access it.
Check the tab Split List in this online demo
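For illustration only, here is one way the \d+\s(\d+) pattern could be applied in C# to the line already extracted by the first regex. The class and method names are made up for this sketch, and it assumes the line starts with the leading number as in the sample output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondNumber
{
    // Returns the second number on the line, or an empty string if the
    // line does not contain "number, whitespace, number".
    public static string Extract(string line)
    {
        var m = Regex.Match(line, @"\d+\s(\d+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("10 880 $10,000 $800 $25 $10")); // 880
    }
}
```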
Hello, I'm kind of new to regex and have a small, maybe simple, question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but it creates 3 matches correctly.
But I need the "Additional test" text in the second group as well.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I get only one huge match, because the second group takes everything until the end.
How can I match everything until a new line starting with a date, and begin a new match from there?
If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: the rest of the line (zero or more chars other than a newline, as many as possible), then an optional second line: a newline followed by zero or more chars other than a newline.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
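A short sketch of how the multi-line variant might be used from C#, pairing each timestamp (Group 1) with its text (Group 2). The class name DatedEntries and the choice of KeyValuePair as the result type are assumptions made for the example; Trim() is added to drop a trailing CR if the input uses CRLF line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DatedEntries
{
    // Pattern from the answer: timestamp in Group 1, everything up to the
    // next line starting with a date (or the end of the string) in Group 2.
    private const string Pattern =
        @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)";

    public static List<KeyValuePair<string, string>> Parse(string text) =>
        Regex.Matches(text, Pattern)
             .Cast<Match>()
             .Select(m => new KeyValuePair<string, string>(
                 m.Groups[1].Value,
                 m.Groups[2].Value.Trim()))
             .ToList();

    public static void Main()
    {
        var text = "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test\n"
                 + "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test";
        foreach (var entry in Parse(text))
        {
            Console.WriteLine("[" + entry.Key + "]");
            Console.WriteLine(entry.Value);
        }
    }
}
```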
You are using \s repeatedly using the * quantifier with the character class [,.:\w\s]* and \s also matches newlines and will match too much.
You can match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group, using (.*\r?\n.*):
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (or of each line when the multiline option is enabled, which is needed to get one match per entry)
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
Regex demo
How can I read the text below block by block?
CurrentTime=04/24/16 09:57:23
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
CurrentTime=04/24/16 09:57:23
631706
aaa
bbb
I want to write an expression that captures everything from one CurrentTime line to the next.
For example :
(?<=(?<=CurrentTime\=)[0-9].*)\n(.*)
I wrote this pattern, but it didn't solve my problem. It only reads one line after the current time, but I want to read all of the lines:
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
up to the next CurrentTime marker.
You can use
(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)
See the regex demo
The pattern matches
^ - start of a line (since the /m modifier is used)
CurrentTime= - the sequence of literal characters CurrentTime=
([^\n]*)\n - matches and captures into Group 1 zero or more characters other than a newline and will just match the following newline
(.*?) - Group 2 capturing zero or more any characters (incl. a newline since the DOTALL /s modifier is used) but as few as possible, up to the first
(?=^CurrentTime=|\z) - CurrentTime= at the beginning of a line, or the end of string (\z)
Performance Update
To unroll the current regex, just use negated character classes with some additional grouping and a negative lookahead:
(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)
See another regex demo
In .NET:
var input = "CurrentTime=04/24/16 09:57:23\nsan-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\nass\nbbbb\nCurrentTime=04/24/16 09:57:23\n631706\naaa\nbbb\n";
var pat = @"(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)";
var results = Regex.Matches(input, pat)
.Cast<Match>()
.Select(p => p.Groups[2].Value) // Get the capture group 2 values
.ToList();
Console.WriteLine(string.Join("\n---\n", results));
See the Ideone demo
See regex demo at RegexStorm
I am trying to get some in-line comments from a text file and need some help with the expression.
this comes before selection
this is on the same line %% this is the first group and it can have any character /[{3$5!+-p
here is some more text in the middle
this stuff is also on a line with a comment %% this is the second group of stuff !##%^()<>/~`
this goes after the selections
I am trying to get everything that follows %%\s+. Here is what I tried:
%%\s+(.*)$
But that matches all text following the first %%. Not sure where to go from here.
In most engines, the dot does not match newlines by default,
and multi-line mode is off.
That means %%\s+(.*)$ should not match unless it finds
%% on the last line of the string.
Instead of fighting the external switches, use inline modifiers (?..) that
override them.
Use (?-s)%%\s+(.*), which turns off dot-all mode.
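To illustrate the point about inline modifiers, the sketch below runs both patterns with RegexOptions.Singleline deliberately switched on from the outside. The class name, helper method and sample text are invented for the example; the expectation is that the plain pattern swallows everything after the first %%, while the (?-s) version still stops at each line end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InlineComments
{
    // Runs the pattern with Singleline (dot-all) enabled externally and
    // returns the Group 1 value of every match.
    public static List<string> Extract(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.Singleline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    public static void Main()
    {
        var text = "same line %% first comment\nmiddle text\nother line %% second comment\nafter";

        // Dot-all is on: one match running to the end of the string.
        Console.WriteLine(Extract(text, @"%%\s+(.*)").Count);

        // The inline (?-s) turns dot-all off again: one match per comment.
        Console.WriteLine(Extract(text, @"(?-s)%%\s+(.*)").Count);
    }
}
```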
Since . matches any character but a newline by default, you needn't use $:
%%\s+(.*)
See regex demo
Explanation:
%% - two literal % symbols
\s+ - 1 or more whitespace
(.*) - 0 or more any characters other than a newline (captured into Group 1)
C# demo:
var s = "THE_STRING";
var result = Regex.Matches(s, @"%%\s+(.*)")
.Cast<Match>()
.Select(p=>p.Groups[1].Value)
.ToList();
Console.WriteLine(string.Join("\n", result));