How to read line by line in C#?

How can I read the text below line by line?
CurrentTime=04/24/16 09:57:23
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
CurrentTime=04/24/16 09:57:23
631706
aaa
bbb
I want to write an expression that captures everything from one CurrentTime line to the next.
For example:
(?<=(?<=CurrentTime\=)[0-9].*)\n(.*)
I wrote this pattern, but it didn't solve my problem: it only reads one line after the current time. I want to read all the lines:
san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}
ass
bbbb
up to the next CurrentTime flag.

You can use
(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)
See the regex demo
The pattern matches
^ - start of a line (since the /m modifier is used)
CurrentTime= - the sequence of literal characters CurrentTime=
([^\n]*)\n - captures zero or more characters other than a newline into Group 1, then matches (without capturing) the following newline
(.*?) - Group 2: zero or more of any character (including newlines, since the DOTALL /s modifier is used), as few as possible, up to the first
(?=^CurrentTime=|\z) - CurrentTime= at the beginning of a line, or the end of string (\z)
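For comparison with the .NET snippet further down, here is a minimal sketch that applies this first, lazy (?sm) pattern in C# and keeps both the timestamp (Group 1) and the block body (Group 2). It assumes the sample text uses \n line endings; the variable names input, pattern and blocks are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question (assumed to use \n line endings).
var input = "CurrentTime=04/24/16 09:57:23\n" +
            "san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\n" +
            "ass\nbbbb\n" +
            "CurrentTime=04/24/16 09:57:23\n" +
            "631706\naaa\nbbb\n";

// s: . also matches newlines; m: ^ matches at the start of every line.
var pattern = @"(?sm)^CurrentTime=([^\n]*)\n(.*?)(?=^CurrentTime=|\z)";

// Pair each timestamp (Group 1) with the lines that follow it (Group 2).
var blocks = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => (Time: m.Groups[1].Value, Body: m.Groups[2].Value))
    .ToList();

foreach (var (time, body) in blocks)
{
    Console.WriteLine($"[{time}]");
    Console.Write(body);
}
```

Because the lookahead does not consume the next CurrentTime= line, that line is still available as the start of the following match.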
Performance Update
To unroll the current regex, just use negated character classes with some additional grouping and a negative lookahead:
(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)
See another regex demo
In .NET (with using System.Linq; and using System.Text.RegularExpressions; at the top of the file):
var input = "CurrentTime=04/24/16 09:57:23\nsan-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\nass\nbbbb\nCurrentTime=04/24/16 09:57:23\n631706\naaa\nbbb\n";
var pat = @"(?m)^CurrentTime=([^\n]*)\n([^\n]*(?:\n(?!CurrentTime=)[^\n]*)*)";
var results = Regex.Matches(input, pat)
.Cast<Match>()
.Select(p => p.Groups[2].Value) // Get the capture group 2 values
.ToList();
Console.WriteLine(string.Join("\n---\n", results));
See the Ideone demo
See regex demo at RegexStorm
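If you literally want to read the input line by line, a regex is not strictly required. The following sketch uses a swapped-in, non-regex technique (it is not part of the answer above): it walks the text with StringReader.ReadLine and starts a new group at every line that begins with CurrentTime=. The sample input and variable names are assumptions; for a file on disk, File.ReadLines could feed the same loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

var input = "CurrentTime=04/24/16 09:57:23\n" +
            "san-ls-02022;ENEXRHO;1505;{call Pm_I_AuthFailToPTSF(631706,-21,?)}\n" +
            "ass\nbbbb\n" +
            "CurrentTime=04/24/16 09:57:23\n" +
            "631706\naaa\nbbb\n";

// Each inner list holds the lines that follow one CurrentTime= marker.
var groups = new List<List<string>>();

using (var reader = new StringReader(input))
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith("CurrentTime=", StringComparison.Ordinal))
        {
            groups.Add(new List<string>());   // a marker starts a new block
        }
        else if (groups.Count > 0)
        {
            groups[^1].Add(line);             // append to the current block
        }
        // Lines that appear before the first marker are ignored.
    }
}

foreach (var g in groups)
{
    Console.WriteLine(string.Join("\n", g));
    Console.WriteLine("---");
}
```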

Related

Regex expression to capture only numeric fields and strip $ and comma, no match if there are any alphanumeric

I'm trying to write a regex that will strip out $ and , from a value and not match at all if there are any other non-numerics.
$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> no match
I have gotten sort of close with this:
^(?:[$,]*)(([0-9.]{1,3})(?:[,.]?))+(?:[$,]*)$
However, this doesn't properly capture the numeric value in $1, because the repeated digit groups are stored as separate captures of the same group, as you can see here: https://regex101.com/r/4bOJtB/1
You can use a named capturing group to capture all parts of the number and then concatenate them. That said, it is more straightforward to remove all the characters you do not need as a post-processing step.
Here is some example code:
// requires: using System; using System.Linq; using System.Text.RegularExpressions;
var pattern = @"^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$";
var tests = new[] {"$100", "$12,203.00", "12JAN2022"};
foreach (var test in tests) {
// Concatenate every capture of group "v" (empty when there is no match)
var result = string.Concat(Regex.Match(test, pattern)
.Groups["v"].Captures.Cast<Capture>().Select(x => x.Value));
Console.WriteLine("{0} -> {1}", test, result.Length > 0 ? result : "No match");
}
See the C# demo. Output:
$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> No match
The regex is
^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$
See the regex demo. Details:
^ - start of string
\$* - zero or more dollar symbols
(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+)) - either one to three digits (captured into Group "v") and then zero or more occurrences of a comma and then three digits (captured into Group "v"), or one or more digits (captured into Group "v")
(?<v>\.\d+)? - an optional occurrence of . and one or more digits (all captured into Group "v")
$ - end of string.
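The .NET feature this answer relies on is that a group which captures several times keeps every capture in Group.Captures, while Group.Value (and $1 in a replacement) holds only the last one. That is why the $1 in the question showed just one piece of the number. A small sketch, reusing the answer's pattern on one of the sample values:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$";
var m = Regex.Match("$12,203.00", pattern);

// Group.Value is only the LAST capture of group "v"...
string lastOnly = m.Groups["v"].Value;
// ...while Group.Captures keeps every capture, in match order.
string[] all = m.Groups["v"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(lastOnly);                 // .00
Console.WriteLine(string.Join(" | ", all));  // 12 | 203 | .00
Console.WriteLine(string.Concat(all));       // 12203.00
```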
I don't know how to achieve this in a single regex, but in my opinion, dividing the problem into smaller steps is a good idea: it is easier to implement, and easier to maintain and understand later, without spending time deciphering the magic.
Replace all $ and , characters with an empty string:
[\$\,] => ``
Then match only digits and periods as a capture group (you may need to adjust this to your requirements on where periods are allowed, etc.):
^((\d{1,3}\.?)+)$
Hope this helps!
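The two steps described in this answer can be sketched in C# as follows. Normalize is a hypothetical helper name, and the second pattern is taken unchanged from the answer, so it inherits the answer's loose rules about where periods may appear.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the cleaned number, or null when the input is rejected.
static string? Normalize(string value)
{
    // Step 1: strip every $ and , character.
    string stripped = Regex.Replace(value, @"[\$\,]", "");
    // Step 2: accept only digits and periods (pattern from the answer).
    Match m = Regex.Match(stripped, @"^((\d{1,3}\.?)+)$");
    return m.Success ? m.Groups[1].Value : null;
}

foreach (var test in new[] { "$100", "$12,203.00", "12JAN2022" })
{
    string shown = Normalize(test) ?? "no match";
    Console.WriteLine($"{test} -> {shown}");
}
```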

Regex replace with bracket variable in C#

I am sure that has been asked before, but I cannot find the appropriate question(s).
Being new to C#'s Regex, I want to mimic what is possible with e.g. sed and awk, where I would write s/_(20[0-9]{2})[.0-9]{1}/\1/g to obtain a 4-digit year from 2000 onward that has an underscore as a prefix and a digit or a dot afterwards. The \1 refers to the value within the parentheses.
Example: Both fx_201902.csv and fx_2019.csv should give me back myYear=2019. I was not successful with:
string myYear = Regex.Replace(Path.GetFileName(x), @"_20([0-9]{2})[.0-9]{1}", "\1")
How do I have to escape? Or is this kind of replacement not possible? If so, how would I do that?
Edit: My issue is how to do the \1 in C#, in other words, how to extract the value of a regex group. Please forgive my typos in the original post; I am trying the new SO app and submitted earlier than intended.
I'd suggest a more robust regex: _(20(?:0[1-9]|[1-9][0-9]))[\d.]
Explanation:
_ - match _ literally
(...) - first capturing group
20 - match 20 literally
(?:...) - non-capturing group
0[1-9]|[1-9][0-9] - alternation: match 0 followed by a non-zero digit, OR a non-zero digit followed by any digit; this matches any year from 2001 to 2099
[\d.] - match dot or digit
And below is how you use capturing groups:
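var regex = new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");
// Groups[1] holds the text captured by the first (...) group
Console.WriteLine(regex.Match("fx_201902.csv").Groups[1].Value); // 2019
Console.WriteLine(regex.Match("fx_20190.csv").Groups[1].Value);  // 2019
Console.WriteLine(regex.Match("fx_2019.csv").Groups[1].Value);   // 2019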
To extract the year using Regex.Replace, you need to capture only the year part of the string into a group and replace the entire string with just that capture group. That means you also need to match the characters before and after the year, for example with:
^.*_(20[0-9]{2})[.0-9].*$
You can then replace the match with $1, e.g.:
Regex r = new Regex(@"^.*_(20[0-9]{2})[.0-9].*$");
string filename = "fx_201902.csv";
string myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
filename = "fx_2019.csv";
myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
Output:
2019
2019
If you want to exclude the year 2000 from your match, change the regex to
^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$
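One detail worth checking with this approach: Regex.Replace returns the input unchanged when the pattern does not match, so a file name without a valid year comes back as the whole name rather than as an empty string. A short sketch with the year-2000-excluding pattern; the file names are hypothetical test inputs and the "(no year)" marker is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var r = new Regex(@"^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$");

// Hypothetical file names picked to exercise the edge cases.
string[] names = { "fx_201902.csv", "fx_2019.csv", "fx_2010.csv", "fx_2000.csv" };
var years = new List<string>();

foreach (var name in names)
{
    // Replace would hand back "fx_2000.csv" untouched, so test IsMatch first.
    years.Add(r.IsMatch(name) ? r.Replace(name, "$1") : "(no year)");
}

Console.WriteLine(string.Join(", ", years)); // 2019, 2019, 2010, (no year)
```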
You might use a capturing group for the first 4 digits and match what is before and after the 4 digits.
.*_(20[0-9]{2})[0-9]*\.\w+$
Explanation
.*_ Match the last underscore
(20[0-9]{2}) Match 20 and 2 digits
[0-9]*\. Match 0 or more occurrences of a digit followed by a dot
\w+$ Match 1 or more word characters until the end of the string.
Regex demo | C# demo
In the replacement use:
$1
For example
string[] strings = {"fx_2019.csv", "fx_201902.csv"};
foreach (string s in strings)
{
string myYear = Regex.Replace(s, @".*_(20[0-9]{2})[0-9]*\.\w+$", "$1");
Console.WriteLine(myYear);
}
Output
2019
2019
Your second example does not contain the month's digits. If you still want to capture them, make that part optional:
Regex.Replace(Path.GetFileName(x), @"_20([1-9]{2})([.0-9]{2})?", "\1")
Note that I only added 3 characters to your query: (, ) and ?
To fix the replacement, change it from \1 to $1, as documented (with the parentheses placed correctly). To also capture 2020, 2030, etc. (still excluding 2000), use the alternation operator with combinations of [0-9]{1} and [1-9]{1}:
Regex.Replace(Path.GetFileName(x), @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?", "$1")
It is worth mentioning that $2 matches the last 2 digits (the combination [1-9]{1}[0-9]{1} | [0-9]{1}[1-9]{1}), and when the first alternative matches, $3 and $4 hold the second-to-last and the last digit respectively.
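A caveat on the Regex.Replace calls in this answer: Replace substitutes only the matched text (here "_201902"), so the file name's prefix and extension survive. The sketch below uses the answer's year pattern written with a single | for alternation (a doubled || would add an empty alternative) and contrasts Replace with reading Groups[1] from Regex.Match. The file name is one of the question's examples; Path.GetFileName is omitted because the input is already a bare name.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern, with single | alternation.
var pattern = @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?";
string file = "fx_201902.csv";

// Replace rewrites only the matched "_201902", keeping "fx" and ".csv".
string replaced = Regex.Replace(file, pattern, "$1");

// Match + Groups[1] returns only the captured year.
string year = Regex.Match(file, pattern).Groups[1].Value;

Console.WriteLine(replaced); // fx2019.csv
Console.WriteLine(year);     // 2019
```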

Regex Match all characters until reach character, but also include last match

I'm trying to find all Color Hex codes using Regex.
For example, I have this string value: #FF0000FF#0038FFFF#51FF00FF#F400FFFF, and I use this
#.+?(?=#)
pattern to match all characters until it reaches the next #, but it misses the last code, which should also be a match.
I'm kind of new to this Regex stuff. How could I also get the last match?
Your regex does not match the last value because your regex (with the positive lookahead (?=#)) requires a # to appear after an already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
See the regex demo
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
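To see the difference described above, this sketch runs the question's lookahead pattern and the negated-class pattern on the question's sample string and compares the results. The variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "#FF0000FF#0038FFFF#51FF00FF#F400FFFF";

// Question's pattern: every match must be followed by another #.
var withLookahead = Regex.Matches(s, @"#.+?(?=#)")
    .Cast<Match>().Select(m => m.Value).ToList();

// Negated class: stops before the next # or at the end of the string.
var withNegatedClass = Regex.Matches(s, @"#[^#]+")
    .Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(withLookahead.Count);     // 3
Console.WriteLine(withNegatedClass.Count);  // 4
Console.WriteLine(string.Join(", ", withNegatedClass));
```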
A more precise pattern you may use is #[A-Fa-f0-9]{8}, it matches a # and then any 8 hex chars, digits or letters from a to f and A to F.
Don't rely on arbitrary characters after the #; match hex characters instead and it will work every time.
(?i)#[a-f0-9]+
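The hex-only patterns matter once the codes are not packed back to back. The sketch below uses a hypothetical input in which the same four codes are separated by punctuation and spaces: #[^#]+ drags the separators into its matches, while #[A-Fa-f0-9]{8} and (?i)#[a-f0-9]+ return only the codes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the same codes, separated by punctuation and spaces.
var s = "#FF0000FF, #0038FFFF; #51FF00FF #F400FFFF";

var loose = Regex.Matches(s, @"#[^#]+").Cast<Match>().Select(m => m.Value).ToList();
var strict = Regex.Matches(s, @"#[A-Fa-f0-9]{8}").Cast<Match>().Select(m => m.Value).ToList();
var caseless = Regex.Matches(s, @"(?i)#[a-f0-9]+").Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join("|", loose));    // #FF0000FF, |#0038FFFF; |#51FF00FF |#F400FFFF
Console.WriteLine(string.Join("|", strict));   // #FF0000FF|#0038FFFF|#51FF00FF|#F400FFFF
Console.WriteLine(string.Join("|", caseless)); // #FF0000FF|#0038FFFF|#51FF00FF|#F400FFFF
```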

Extract sub string between two string with new line

Please help me write a regular expression to extract the entire content between the lines of * characters.
Note the number of * characters can vary.
I tried (\*\n)([\s\S]*)(\n\*) but it groups everything as 1 block instead of 2.
Expected Output
1.
Thanks for contacting us
Regards,
XXX
2.
It wAS a pleasure talking with you
Good to see you today
Test string:
*******
Thanks for contacting us
Regards,
XXX
************
It wAS a pleasure talking with you
Good to see you today
*******
You may use
var results = Regex.Matches(s, @"(?s)\*{3,}(.*?)(?=\*{3,}|$)")
.Cast<Match>()
.Select(x => x.Groups[1].Value.Trim())
.Where(x => x.Length > 0) // the closing asterisk line yields an empty Group 1
.ToList();
See the regex demo
Details
(?s) - RegexOptions.Singleline inline modifier
\*{3,} - 3 or more asterisks
(.*?) - Group 1: any 0+ chars, as few as possible as *? is a lazy quantifier
(?=\*{3,}|$) - a positive lookahead (required to obtain overlapping matches) that matches a location that is followed with 3 or more asterisks or the end of string.
The .Select(x => x.Groups[1].Value.Trim()) part grabs the value inside Group 1 and trims off leading/trailing whitespace.
Another way is to match the line of 3 or more asterisks, then capture all following lines that do not start with 3 or more asterisks into Group 1:
(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)
See this regex demo (it can also be used as is in the code above).
Details
(?m) - a RegexOptions.Multiline modifier to make ^ and $ match start/end of a line
^ - start of line
\*{3,} - 3 or more asterisks
.* - the rest of the line (or use \r?$ to make sure the end of a line is reached)
((?:\r?\n(?!\*{3,}).*)*) - Group 1: zero or more sequences of
\r?\n(?!\*{3,}) - CRLF or LF line ending that is not followed with 3 or more *s
.* - the rest of the line
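A runnable sketch of this second pattern on the question's test string, assuming \n line endings. Note that the final ******* line also produces a match whose Group 1 is empty, so the sketch filters out empty values; that filtering step is an addition, not part of the answer's pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "*******\n" +
        "Thanks for contacting us\n" +
        "Regards,\n" +
        "XXX\n" +
        "************\n" +
        "It wAS a pleasure talking with you\n" +
        "Good to see you today\n" +
        "*******";

var results = Regex.Matches(s, @"(?m)^\*{3,}.*((?:\r?\n(?!\*{3,}).*)*)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value.Trim())
    // The closing ******* line matches too, with an empty Group 1.
    .Where(v => v.Length > 0)
    .ToList();

for (int i = 0; i < results.Count; i++)
{
    Console.WriteLine($"{i + 1}.\n{results[i]}");
}
```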

Matching pattern to end of line

I am trying to get some in-line comments from a text file and need some help with the expression.
this comes before selection
this is on the same line %% this is the first group and it can have any character /[{3$5!+-p
here is some more text in the middle
this stuff is also on a line with a comment %% this is the second group of stuff !##%^()<>/~`
this goes after the selections
I am trying to get everything that follows %%\s+. Here is what I tried:
%%\s+(.*)$
But that matches all text following the first %%. Not sure where to go from here.
By default, in most engines the dot does not match newlines and multiline mode is off.
That means %%\s+(.*)$ should not match unless it finds %% on the last line of the string.
Instead of trying to fight it, use inline modifiers (?...) that override external switches.
Use (?-s)%%\s+(.*), which turns off dot-all mode.
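To illustrate the override in .NET, the sketch below deliberately passes RegexOptions.Singleline from the outside (simulating an environment that turns dot-all on) and shows that the inline (?-s) switches it back off for the pattern. The two-comment input text is a shortened, hypothetical version of the question's sample.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "this is on the same line %% first comment\n" +
           "more text\n" +
           "another line %% second comment\n";

// Singleline is passed on purpose, so . also matches \n.
var withoutOverride = Regex.Matches(text, @"%%\s+(.*)", RegexOptions.Singleline)
    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

// The inline (?-s) turns Singleline off again for this pattern.
var withOverride = Regex.Matches(text, @"(?-s)%%\s+(.*)", RegexOptions.Singleline)
    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

Console.WriteLine(withoutOverride.Count); // 1: (.*) ran to the end of the text
Console.WriteLine(withOverride.Count);    // 2: one match per comment
```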
Since . matches any character but a newline by default, you needn't use $:
%%\s+(.*)
See regex demo
Explanation:
%% - two literal % symbols
\s+ - 1 or more whitespace
(.*) - 0 or more any characters other than a newline (captured into Group 1)
C# demo:
var s = "THE_STRING";
var result = Regex.Matches(s, @"%%\s+(.*)")
.Cast<Match>()
.Select(p=>p.Groups[1].Value)
.ToList();
Console.WriteLine(string.Join("\n", result));
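One thing to watch for if the text comes from a file with Windows \r\n line endings (an assumption; the question does not say): in .NET, . matches every character except \n, so (.*) keeps the trailing \r. A sketch with the question's sample converted to \r\n, contrasting (.*) with [^\r\n]*:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Question's sample text, assumed here to use \r\n line endings.
var s = "this comes before selection\r\n" +
        "this is on the same line %% this is the first group and it can have any character /[{3$5!+-p\r\n" +
        "here is some more text in the middle\r\n" +
        "this stuff is also on a line with a comment %% this is the second group of stuff !##%^()<>/~`\r\n" +
        "this goes after the selections\r\n";

// (.*) stops only at \n, so every capture still ends with \r.
var raw = Regex.Matches(s, @"%%\s+(.*)")
    .Cast<Match>().Select(p => p.Groups[1].Value).ToList();

// [^\r\n]* stops before either line-ending character.
var clean = Regex.Matches(s, @"%%\s+([^\r\n]*)")
    .Cast<Match>().Select(p => p.Groups[1].Value).ToList();

Console.WriteLine(string.Join("\n", clean));
```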
