C# regular expression

I'm trying to write a regex in C# and would appreciate suggestions. Here is the text:
E1A: pop(+)
call T
call E1
E1B: return
TA: call F
call T1
I want to split it like this:
1)
E1A: pop(+)
call T
call E1
2)
E1B: return
3)
TA: call F
call T1
I thought of using a lookahead, but it's not working because of the .+
Here is what I hoped would work, but it doesn't:
"[A-Z0-9]+[:].+(?=([A-Z0-9]+[:]))"
Does anyone have a better idea?
EDIT: The labels "E1A", "E1B", "TA" vary. The only thing that stays the same is that they consist of letters and digits followed by ":".

Regex regexObj = new Regex(
@"^ # Start of line
[A-Z0-9]+: # Match identifier
(?: # Match...
(?!^[A-Z0-9]+:) # (unless it's the start of the next identifier)
. # ... any character,
)* # repeat as needed.",
RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
MatchCollection allMatchResults = regexObj.Matches(subjectString);
Now allMatchResults.Count will be the number of matches in subjectString, and allMatchResults[i] will contain the ith match.
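A self-contained sketch of this approach, assuming the identifiers are always uppercase letters and digits (as in the sample), so that lines such as "call T" are never taken as the start of a new block. BlockSplitter is a hypothetical helper name. Each match is trimmed because it otherwise ends with the line break before the next identifier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockSplitter
{
    // Same pattern as above on one line; IgnorePatternWhitespace drops the spaces.
    // Multiline: ^ matches at every line start. Singleline: . also matches '\n'.
    private static readonly Regex BlockRegex = new Regex(
        @"^ [A-Z0-9]+: (?: (?!^[A-Z0-9]+:) . )*",
        RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Returns one string per block, with the trailing line break trimmed off.
    public static List<string> Split(string text) =>
        BlockRegex.Matches(text)
                  .Cast<Match>()
                  .Select(m => m.Value.Trim())
                  .ToList();
}
```

For the sample text this should give three blocks, starting with "E1A:", "E1B:" and "TA:" respectively.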

Related

How to write a regular expression that captures tags in a comma-separated list?

Here is my input:
#
tag1, tag with space, !##%^, 🦄
I would like to match it with a regex and yield the following elements easily:
tag1
tag with space
!##%^
🦄
I know I could do it this way:
var match = Regex.Match(input, @"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());
But that's cheating, as it involves messing around with C#. There must be a neat way to do this with a regex. There just must be... right? ;D
The question is: how to write a regular expression that would allow me to iterate through captures and extract tags, without the need of splitting and trimming?
This works (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses .NET's CaptureCollection to collect a variable number of fields in a single record.
You could extend the regex further to get all records at once, where each record contains its own variable number of fields.
The regex has built-in trimming as well.
Expanded:
(?ms) # Inline modifiers: multi-line, dot-all
^ \# \s+ # Beginning of record
(?: # Quantified group, 1 or more times, get all fields of record at once
\s* # Trim leading wsp
( # (1 start), # Capture collector for variable fields
(?: # One char at a time, but not comma or begin of record
(?!
,
| ^ \# \s+
)
.
)*?
) # (1 end)
\s*
(?: , | $ ) # End of this field, comma or EOL
)+
C# code:
string sOL = @"
#
tag1, tag with space, !##%^, 🦄";
Regex RxOL = new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
Console.WriteLine("-------------------------");
for (int i = 0; i < ccOL1.Count; i++)
Console.WriteLine(" '{0}'", ccOL1[i].Value );
_mOL = _mOL.NextMatch();
}
Output:
-------------------------
'tag1'
'tag with space'
'!##%^'
'??'
''
Press any key to continue . . .
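The output above ends with an empty capture (''): after the last tag, the quantified group runs once more and matches an empty field at the end of the line. (The '??' is most likely just the Windows console being unable to display the emoji.) A minimal sketch that keeps the same pattern but skips empty captures, assuming a real tag is never empty; TagParser is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagParser
{
    // The pattern from the answer: group 1 is captured once per field of a record.
    private static readonly Regex RecordRegex =
        new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");

    // Gathers every capture of group 1 across all records,
    // dropping the empty capture left by the final repetition.
    public static List<string> Parse(string input) =>
        RecordRegex.Matches(input)
                   .Cast<Match>()
                   .SelectMany(m => m.Groups[1].Captures.Cast<Capture>())
                   .Select(c => c.Value)
                   .Where(v => v.Length > 0)
                   .ToList();
}
```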
Nothing wrong with cheating ;]
string input = @"#
tag1, tag with space, !##%^, 🦄";
string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
You can do this without a regex at all. Just split it like this:
var result = input.Split(new []{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries).Skip(1).SelectMany(x=> x.Split(new []{','},StringSplitOptions.RemoveEmptyEntries).Select(y=> y.Trim()));

How to match string that contains ^ in regular expression?

I tried to build a regular expression using an online tool but did not succeed. Here is the string I need to match:
27R4FF^27R4FF Text until end
always starts with alphanumeric (case-insensitive)
then always caret sign ^ (no space before & after)
then alphanumeric string
then always one white space
then string until end.
Here is the regular expression that is not working for me:
((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))
C# code:
string txt = "784SFS^784SFS Value is here";
var regs = @"((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))";
Regex r = new Regex(regs, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
Console.Write(m.Success ? "matched" : "didn't match");
Console.ReadLine();
Help appreciated. Thanks
Verbatim ^[^\W_]+\^[^\W_]+[ ].*$
^ # BOS
[^\W_]+ # Alphanum
\^ # Caret
[^\W_]+ # Alphanum
[ ] # Space
.* # Anything
$ # EOS
Output
** Grp 0 - ( pos 0 , len 28 )
27R4FF^27R4FF Text until end
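For reference, the question's pattern expects the part right after the caret to start with a letter (`[a-z][a-z]*[0-9]+...`), but both sample strings have a digit there, so it cannot match. A small sketch wrapping the verbatim pattern from this answer; CaretLine is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class CaretLine
{
    // [^\W_] = "not a non-word char and not '_'", i.e. a letter or digit.
    // Then a caret, more letters/digits, exactly one space, and the rest of the line.
    private static readonly Regex Pattern = new Regex(@"^[^\W_]+\^[^\W_]+[ ].*$");

    public static bool IsMatch(string line) => Pattern.IsMatch(line);
}
```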
I wasn't sure whether the text until the end of the string should be matched.
This works for
27R4FF^27R4FF Text
^\w+\^\w+\s\w+$
If the text at the end can contain more spaces, try:
^\w+\^\w+\s[\w\s]+$
Try this: https://regex101.com/r/hD0hV0/2
^[\da-z]+\^[\da-z]+\s.*$
...or commented (assumes RegexOptions.IgnorePatternWhitespace if you're using the format in code):
^ # always starts...
[\da-z]+ # ...with alphanumeric (case-insensitive)
\^ # then always caret sign ^ (no space before & after)
[\da-z]+ # then alphanumeric string
\s # then always one white space
.* # then string...
$ # ...until end.
The other answers don't actually match what you describe (at the time of this writing) because \w matches underscore and you didn't mention any limitations on "the string at the end".
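To illustrate the point about \w, a sketch with one of the \w-based patterns above and the [\da-z] pattern, the latter compiled with RegexOptions.IgnoreCase (as in the question's code) so that [a-z] also covers uppercase letters. The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WordCharCheck
{
    // \w-based pattern: \w also accepts '_', and the tail may only hold word chars and spaces.
    public static readonly Regex WordPattern = new Regex(@"^\w+\^\w+\s[\w\s]+$");

    // Alphanumeric-only pattern; IgnoreCase makes [a-z] match A-Z too.
    public static readonly Regex AlnumPattern =
        new Regex(@"^[\da-z]+\^[\da-z]+\s.*$", RegexOptions.IgnoreCase);
}
```

With these, "27_R4FF^27R4FF Text" is accepted only by the \w pattern, while a tail with punctuation such as "Text, until end." is accepted only by the [\da-z] pattern.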

How to create Regex that contains Not colon char?

I have created a regular expression that seems to be working somewhat:
// look for years starting with 19 or 20 followed by two digits surrounded by spaces.
// Instead of ending space, the year may be followed by a '.' or ';'
static Regex regex = new Regex(@" 19\d{2} | 19\d{2}. | 19\d{2}; | 20\d {2} | 20\d{2}. | 20\d{2}; ");
// Trying to add 'NOT followed by a colon'
static Regex regex = new Regex(@" 19\d{2}(?!:) | 19\d{2}. | 19\d{2}; | 20\d{2}(?!:) | 20\d{2}. | 20\d{2}; ");
// Trying to optimize --
//static Regex regex = new Regex(@" (19|20)\d{2}['.',';']");
You can see where I tried to optimize a bit.
But more importantly, it still finds a match for "2002:" (a year followed by a colon).
How do I make it not do that?
I think I am looking for some sort of NOT operator?
(?:19|20)\d{2}(?=[ ,;.])
Try this. See the demo:
https://regex101.com/r/sJ9gM7/103
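A quick sketch of the lookahead version; the helper name YearFinder and the sample sentence are made up for illustration. The lookahead (?=[ ,;.]) requires one of those characters after the year without consuming it, so a year followed by ":" is skipped. Note that it does not check the character before the year.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class YearFinder
{
    // 19xx or 20xx, followed by a space, comma, semicolon or period.
    // (?=...) only looks at that character; it is not part of the match.
    private static readonly Regex YearRegex = new Regex(@"(?:19|20)\d{2}(?=[ ,;.])");

    public static string[] Find(string text) =>
        YearRegex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "Founded 1999. Moved 2002: rebuilt 2010; closed 2015 now" this should return 1999, 2010 and 2015, but not 2002.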
I'd rather go with \b here, this will help deal with other punctuation that may appear after/before the years:
\b(?:19|20)[0-9]{2}\b
C#:
static Regex regex = new Regex(@"\b(?:19|20)[0-9]{2}\b");
Tested in Expresso.
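One caveat worth checking against the question's requirement: ":" is not a word character, so \b alone still matches the year in "2002:". A sketch comparing the pattern above with a hypothetical variant that appends a negative lookahead (?!:); the variant and the class name are illustrations, not part of the answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryYears
{
    // The \b pattern from the answer above.
    public static readonly Regex Plain = new Regex(@"\b(?:19|20)[0-9]{2}\b");

    // Hypothetical variant: additionally rejects a year immediately followed by ':'.
    public static readonly Regex NoColon = new Regex(@"\b(?:19|20)[0-9]{2}\b(?!:)");

    public static string[] Years(Regex rx, string text) =>
        rx.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```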
The problem in your regex is the unescaped dot: "." matches any character, including the colon.
You should have something like this:
static Regex regex =
new Regex(@" 19\d{2} | 19\d{2}[.] | 19\d{2}; | 20\d{2} | 20\d{2}[.] | 20\d{2}; ");
This did it for me:
// look for years starting with 19 or 20 followed by two digits surrounded by spaces.
// Instead of ending space, the year may also be followed by a '.' or ';'
// but may not be followed by a colon, dash or any other unspecified character.
// optimized --
static Regex regex = new Regex(@"(19|20)\d{2} | (19|20)\d{2};| (19|20)\d{2}[.]");
Used Regex Tester here:
http://regexhero.net/tester/

Regex replace between and including tags

I have the following line of text (META Title):
Buy [ProductName][Text] at a great price [/Text] from [ShopName] today.
I am replacing depending on what values I have.
I have it working as I require however I can't find the correct regex to replace:
[Text] at a great price [/Text]
The words (in and between the square brackets) change, so the only thing that will remain the same is:
[][/]
e.g. I may also want to replace
[TestText]some test text[/TestText] with nothing.
I have this working:
System.Text.RegularExpressions.Regex.Replace(SEOContent, @"\[Text].*?\[/Text]", @"");
I presumed the regex of:
[.*?].*?\[/.*?]
Would work but it didn't! - I'm coding in ASP.NET C#
Thanks in advance,
Dave
Use a named capture to get the node name of [..], then find it again using \k<..>.
(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\])
Broken down using Ignore Pattern Whitespace and an example program.
string pattern = @"
( # Begin our Match
\[ # Look for the [ escape anchor
(?<Tag>[^\]]+) # Place anything that is not another ] into the named match Tag
\] # Anchor of ]
[^\[]+ # Get all the text to the next anchor
\[/ # Anchor of the closing [...] tag
\k<Tag> # Use the named capture subgroup Tag to balance it out
\] # Properly closed end tag/node.
) # Match is done";
string text = "[TestText]some test text[/TestText] with nothing.";
Console.WriteLine (Regex.Replace(text, pattern, "Jabberwocky", RegexOptions.IgnorePatternWhitespace));
// Outputs
// Jabberwocky with nothing.
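The question's guess [.*?].*?\[/.*?] fails because the unescaped [.*?] is a character class that matches a single ".", "*" or "?" character, not a bracketed tag. A short sketch applying the backreference pattern above to the META title from the question; TagStripper is a hypothetical name, and it assumes the text between the tags contains no "[".

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // [Name]...[/Name]: the named group Tag captures the opening name,
    // and \k<Tag> requires the closing tag to repeat exactly that name.
    private static readonly Regex PairedTag =
        new Regex(@"\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\]");

    public static string Strip(string text) => PairedTag.Replace(text, "");
}
```

Here [ProductName] and [ShopName] survive, because no matching [/ProductName] or [/ShopName] follows them.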
As an aside, I would actually create a tokenizing regex (using a regex If, i.e. a conditional, with the above pattern) that identifies the sections with named captures. Then, in the replace, use a MatchEvaluator to replace the identified tokens, such as:
string pattern = @"
(?(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\]) # If statement to check []..[/] situation
( # Yes it is, match into named captures
\[
(?<Token>[^\]]+) # What is the text inside the [ ], into Token
\]
(?<TextOptional>[^\[]+) # Optional text to reuse
\[
(?<Closing>/[^\]]+) # The closing tag info
\]
)
| # Else, let's start a new check for either [] or plain text
(?(\[) # If a [ is found it is a token.
( # Yes process token
\[
(?<Token>[^\]]+) # What is the text inside the [ ], into Token
\]
)
| # Or (the "no" branch of the second if) it is just plain text
(?<Text>[^\[]+) # Put it into the text match capture.
)
)
";
string text = @"Buy [ProductName] [Text]at a great price[/Text] from [ShopName] today.";
Console.WriteLine (
Regex.Replace(text,
pattern,
new MatchEvaluator((mtch) =>
{
if (mtch.Groups["Text"].Success) // If just text, return it.
return mtch.Groups["Text"].Value;
if (mtch.Groups["Closing"].Success) // If a Closing match capture group reports success, then process
{
return string.Format("Reduced Beyond Comparison (Used to be {0})", mtch.Groups["TextOptional"].Value);
}
// Otherwise it's just a plain old token, swap it out.
switch ( mtch.Groups["Token"].Value )
{
case "ProductName" : return "Jabberwocky";
case "ShopName" : return "StackOverFlowiZon";
}
return "???"; // If we get to here...we have failed...need to determine why.
}),
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture));
// Outputs:
// Buy Jabberwocky Reduced Beyond Comparison (Used to be at a great price) from StackOverFlowiZon today.
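A shorter variation on the same idea, without the regex conditional: one alternative for paired tags and one for single tokens, with a MatchEvaluator lambda that looks tokens up in a dictionary. TokenReplacer, the replacement values (Widget, Acme) and the choice to drop paired sections entirely are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Alternative 1: a [Name]text[/Name] pair (backreference to Tag).
    // Alternative 2: a single [Token].
    private static readonly Regex TokenRegex = new Regex(
        @"\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\] | \[(?<Token>[^\]]+)\]",
        RegexOptions.IgnorePatternWhitespace);

    public static string Replace(string text, IDictionary<string, string> values) =>
        TokenRegex.Replace(text, m =>
        {
            if (m.Groups["Tag"].Success)
                return "";                       // drop the whole paired section
            string token = m.Groups["Token"].Value;
            return values.TryGetValue(token, out string value)
                ? value                          // known token: substitute it
                : m.Value;                       // unknown token: leave it as-is
        });
}
```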

C# Regex Grouping

I have to write a function that takes a string in one of 2 forms:
XX..X,YY..Y where XX..X is at most 4 characters and YY..Y is at most 26 characters (X and Y are digits or A or B)
XX..X where XX..X is at most 8 characters (X is a digit or A or B)
e.g. 12A,784B52 or 4453AB
How can I use Regex grouping to match this behavior?
Thanks.
P.S. Sorry if this is too localized.
You can use named captures for this:
Regex regexObj = new Regex(
@"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
Match match = regexObj.Match(subjectString);
string X = match.Groups["X"].Value;
string Y = match.Groups["Y"].Value;
I don't know what happens if there is no group Y, perhaps you might need to wrap the last line in an if statement.
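On the question of a missing Y: in .NET a group that did not take part in the match reports Success == false and an empty Value, so checking Success is enough. A sketch using the pattern above, matched once instead of twice; CodeParser.TryParse is a hypothetical helper that returns null for Y when there is no comma part.

```csharp
using System.Text.RegularExpressions;

public static class CodeParser
{
    private static readonly Regex CodeRegex = new Regex(
        @"\b                       # word boundary
          (?:
              (?<X>[AB\d]{1,4})    # 1-4 characters --> group X
              ,
              (?<Y>[AB\d]{1,26})   # 1-26 characters --> group Y
            |
              (?<X>[AB\d]{1,8})    # 1-8 characters --> group X
          )
          \b                       # word boundary",
        RegexOptions.IgnorePatternWhitespace);

    public static bool TryParse(string s, out string x, out string y)
    {
        Match m = CodeRegex.Match(s);
        if (!m.Success)
        {
            x = null;
            y = null;
            return false;
        }
        x = m.Groups["X"].Value;
        Group yGroup = m.Groups["Y"];
        // Y only participates in the "XX..X,YY..Y" form.
        y = yGroup.Success ? yGroup.Value : null;
        return true;
    }
}
```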
