Regex string between two strings includes brackets - C#

I am struggling with a regex.
An example test string is:
Path.parent.child[0]
I need to extract "parent".
I always know the start ("Path") and the end ("child[0]") of the test string. The end can also be "child[1]", or "child" without the "[]".
I tried: (?<=Path.).*?(?=.child[0])
But it does not find a match. I think the [] in the child string is the problem.

A regex which might work is ^[^.]*\.(.*)\.[^.]*\[?\d?\]?$
^ beginning of the line
[^.]* anything except a dot
\. a literal dot
(.*) anything, captured as a group
\. a literal dot again
[^.]* anything except a dot
\[?\d?\]? optional brackets and index. Could be (\[\d\])? if you don't mind another group.
$ end of the line
Check it on Regex101.com
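Here is a minimal C# sketch of how the pattern could be used. The class and method names are illustrative, not from the question. It reads the captured group with Regex.Match. It also includes a lookaround variant of the question's attempt with the dot and brackets escaped, assuming the start is always "Path". In the original (?<=Path.).*?(?=.child[0]), the [0] is a character class that matches the single digit 0, so the lookahead looks for "child0" and never succeeds.

```csharp
using System.Text.RegularExpressions;

static class PathParts
{
    // Pattern from the answer: group 1 captures everything between the
    // first dot and the last dot.
    static readonly Regex FirstToLastDot =
        new Regex(@"^[^.]*\.(.*)\.[^.]*\[?\d?\]?$");

    // Lookaround variant (assumption: the start is always "Path"); "\." and
    // "\[ \]" are escaped so they match literally, and the index is optional.
    static readonly Regex Lookaround =
        new Regex(@"(?<=^Path\.).*?(?=\.child(?:\[\d+\])?$)");

    // Returns group 1, or null when the input does not match.
    public static string ExtractWithGroup(string s)
    {
        Match m = FirstToLastDot.Match(s);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns the whole match (the lookarounds are not part of it), or null.
    public static string ExtractWithLookaround(string s)
    {
        Match m = Lookaround.Match(s);
        return m.Success ? m.Value : null;
    }

    // The question's original pattern, kept unchanged for comparison.
    public static bool OriginalPatternMatches(string s) =>
        Regex.IsMatch(s, @"(?<=Path.).*?(?=.child[0])");
}
```

For "Path.parent.child[0]", "Path.parent.child[1]" and "Path.parent.child", both methods return "parent". Note that the [^.]* after the last dot already consumes "child[0]" by itself, so the trailing \[?\d?\]? only documents the optional index.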
But if you always know the start and the end, you could switch to a non-regex approach:
var start = "Path";
var end = "child[0]";
var s = start+".parent."+end;
// +1 for the first dot, -2 for two dots
var extracted = s.Substring(start.Length+1, s.Length-start.Length-end.Length-2);
Console.WriteLine(extracted);

Related

C# RegEx to match specific strings

I need to match (using a regex) strings like these:
required: custodian_{number 1 - 9}_{fieldType either txt or ssn}
optional: _{fieldLength 1-999}
So for example:
custodian_1_ssn_1 is valid
custodian_1_ssn_1_255 is valid
custodian or custodian_ or custodian_1 or custodian_1_ or custodian_1_ssn or custodian_1_ssn_ or custodian_1_ssn_1_ are not valid
Currently I am working with this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?[0-9]?[0-9]?)?
as my regex, and my API picks up:
custodian_1_txt_1
custodian_1_ssn_1
custodian_1_txt_1_255 <---- not matching the last "5"
Any thoughts?
You may use this pattern:
^custodian(?:_[a-z0-9]+)+$
^ Asserts position at the beginning of the line.
custodian Matches the literal substring custodian.
(?:_[a-z0-9]+)+ Non-capturing group, repeated one or more times: _ followed by one or more lowercase letters or digits.
$ Asserts position at the end of the line.
You can check the correct matches here.
You can easily modify the pattern to add the substring signer in a non-capturing group:
^(?:custodian|signer)(?:_[a-z0-9]+)+$
I suggest using \d for the numbers. Try this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?\d*)?
I replaced the final [0-9]?[0-9]? of your pattern with \d* so that it matches all of the trailing digits.
You could use anchors to assert the start ^ and the end $ of the string. In the last part, make at least the first [1-9] non-optional. Otherwise it would match an underscore at the end:
^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$
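Here is a small C# sketch that checks this anchored pattern against the valid and invalid examples from the question. The FieldNames class and IsValid method are illustrative names. Because of ^ and $, Regex.IsMatch only succeeds when the whole name fits the pattern. By contrast, the looser ^custodian(?:_[a-z0-9]+)+$ also accepts custodian_1, which the question lists as invalid.

```csharp
using System.Text.RegularExpressions;

static class FieldNames
{
    // Anchored pattern from the answer above. The optional tail must start
    // with a non-zero digit, so a trailing "_" on its own is rejected.
    static readonly Regex FieldName = new Regex(
        @"^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$");

    // True only when the entire name matches.
    public static bool IsValid(string name) => FieldName.IsMatch(name);
}
```

The [1-9]?[0-9] part accepts numbers from 0 to 99. If the custodian number really is limited to 1-9, as the question states, a plain [1-9] there would be the tighter choice.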
If you're only interested in the last digits, this super generic regex will do:
(?:.+)_(\d+)
If you do need to match the whole string, this worked:
^(?:custodian|signer)_\d+_(?:txt|ssn)(?:_\d+)?_(\d+)$
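This C# sketch reads group 1 from both patterns in this answer. The TrailingNumber class and its method names are illustrative. It shows that the generic pattern returns the last run of digits because the greedy .+ backtracks to the last "_". The whole-string pattern additionally validates the rest of the name.

```csharp
using System.Text.RegularExpressions;

static class TrailingNumber
{
    // Generic pattern: greedy ".+" backtracks to the last "_" that is
    // followed by digits, and group 1 holds those digits.
    static readonly Regex Generic = new Regex(@"(?:.+)_(\d+)");

    // Whole-string pattern: validates the name and captures the last number.
    static readonly Regex WholeName =
        new Regex(@"^(?:custodian|signer)_\d+_(?:txt|ssn)(?:_\d+)?_(\d+)$");

    // Returns group 1 of the first match, or null when there is no match.
    static string Group1(Regex pattern, string input)
    {
        Match m = pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string LastDigitsAnywhere(string input) => Group1(Generic, input);

    public static string LastDigitsValidated(string input) => Group1(WholeName, input);
}
```

Both return "255" for custodian_1_txt_1_255. For the invalid custodian_1_ssn, the generic pattern still returns "1", while the anchored one returns null.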

(Regular Expression) Check that a string starts with specified text followed by a number

I'd like to check that a string starts with "Test" and is followed by a number.
I want to do this with a regex but can't get it right.
Below is an example; I only want to get
"Test2018.txt", "Test2019.txt"
List<string> fileNames = new List<string>() {"Test2018.txt", "Test2019.txt", "TestEvent2018.txt", "TestEvent2019.txt"};
fileNames.Where(p => Regex.IsMatch(p, "Test^[0-9]+*") == true);
You could use this Regex:
^Test[0-9]+\.txt$
Where
^ denotes the start of the line.
Test matches the literal text.
[0-9] matches any numeric digit 0-9.
+ means at least once.
\. matches the period.
txt matches the literal text.
$ denotes the end of the line.
And in C#:
var matchingFiles = fileNames.Where(p => Regex.IsMatch(p, @"^Test[0-9]+\.txt$"));
^ matches the start of the string, so it doesn’t make sense for it to go in the middle of your pattern. I think by * you also meant .*, but there’s no need to check the rest of the string when calling IsMatch without an ending anchor. (That means [0-9]+ can also become [0-9].)
No need to use == true on booleans, either.
fileNames.Where(p => Regex.IsMatch(p, "^Test[0-9]"))
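Here is a C# sketch that runs both suggested filters on the list from the question. The FileFilter class and method names are illustrative. Unlike these two, the question's "Test^[0-9]+*" does not filter anything: in .NET, constructing it throws an ArgumentException because the * directly follows the + quantifier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FileFilter
{
    // Whole-name check: "Test", one or more digits, then ".txt".
    public static List<string> StrictMatches(IEnumerable<string> fileNames) =>
        fileNames.Where(p => Regex.IsMatch(p, @"^Test[0-9]+\.txt$")).ToList();

    // Prefix-only check: "Test" followed by at least one digit;
    // the rest of the name is not inspected.
    public static List<string> PrefixMatches(IEnumerable<string> fileNames) =>
        fileNames.Where(p => Regex.IsMatch(p, "^Test[0-9]")).ToList();
}
```

Both filters return "Test2018.txt" and "Test2019.txt" for the question's list. They differ on a name such as "Test2020.csv", which only the prefix check accepts.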

Regex pattern to separate string with semicolon and plus

I am using the following code:
MatchCollection matches = Regex.Matches(cellData, @"(^\[.*\]$|^\[.*\]_[0-9]*$)");
The only thing this pattern does not do is separate the parts joined by the semicolons and the plus sign.
A sample string is:
[dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount]
I am trying to extract
[dbServer]
[ciDBNAME]
[dbLogin]
[dbPasswd]
[SIM_ErrorFound#1]
[#IterationCount]
from the string.
To extract the parts in square brackets from [dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount] (which is what I assume you're trying to do),
the regular expression (shown here without quotes) should be
\[([^\]]*)\]
You should not use ^ and $, because you're not interested in the start and end of the string. The parentheses capture the zero or more characters inside each pair of square brackets.
If you want to be more specific about what you're capturing in the brackets, you'll need to change the [^\]] to something else.
Your regex, (^\[.*\]$|^\[.*\]_[0-9]*$), matches any full string that starts with [, then contains zero or more chars other than a newline, and ends either with ] (\]$) or with ] followed by _ and 0+ digits (\]_[0-9]*$). You could also write the pattern as ^\[.*](?:_[0-9]*)?$ and it would work the same.
However, you need to match multiple substrings inside a larger string, so the first step is to remove the ^ and $ anchors. You would then find that .* is too greedy and matches from the first [ up to the last ]. To fix that, it is best to use a negated character class. For example, [^][]* matches 0+ chars other than [ and ].
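The greediness point can be shown with a short C# sketch on the sample string from the question. The class and method names are illustrative. With the anchors removed, \[.*\] still returns a single match that spans the whole input, because the sample starts with [ and ends with ]. The negated class [^][]* cannot cross a bracket, so each bracketed part becomes its own match.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class BracketDemo
{
    public const string Sample =
        "[dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount]";

    // Greedy: ".*" runs to the end and backtracks only to the LAST "]".
    public static string[] Greedy(string s) =>
        Regex.Matches(s, @"\[.*\]").Cast<Match>().Select(m => m.Value).ToArray();

    // Negated class: "[^][]*" stops before the next "[" or "]".
    public static string[] Negated(string s) =>
        Regex.Matches(s, @"\[[^][]*]").Cast<Match>().Select(m => m.Value).ToArray();
}
```

Greedy(Sample) yields one match equal to the whole sample. Negated(Sample) yields the six bracketed items listed in the question.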
Edit: It seems you need to get only the text inside square brackets.
You need to use a capturing group, a pair of unescaped parentheses around the part of the pattern you need to get and then access the value by the group ID (unnamed groups are numbered starting with 1 from left to right):
var results = Regex.Matches(s, @"\[([^][]+)]")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
See the .NET regex demo
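Here is a self-contained C# sketch that wraps the snippet above in a method. For comparison, it also runs the earlier \[([^\]]*)\] pattern. The BracketContents class and method names are illustrative. Both return group 1, which is the text without the surrounding brackets.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class BracketContents
{
    // Pattern from this answer: one or more chars other than "[" and "]".
    public static List<string> Inner(string s) =>
        Regex.Matches(s, @"\[([^][]+)]")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    // Earlier answer's pattern: zero or more chars other than "]".
    public static List<string> InnerAlt(string s) =>
        Regex.Matches(s, @"\[([^\]]*)\]")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();
}
```

On the sample string, both return dbServer, ciDBNAME, dbLogin, dbPasswd, SIM_ErrorFound#1 and #IterationCount. They differ on an empty pair "[]". The + version skips it, while the * version returns an empty string for it.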

C# - Removing a single word in a string after a certain character

I have a string in which I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping to handle both cases in one step.
Any advice would be appreciated.
Change .*, which matches any character, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
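To make the * versus + difference concrete, this C# sketch counts matches on the sentence above and on the question's input. The BackslashWords class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

static class BackslashWords
{
    public const string Sentence =
        "Sometimes I experience \\ an uncontrollable compulsion \\ to intersperse backslash \\ characters throughout my sentences!";

    // "\" followed by zero or more word characters: a lone "\" still matches.
    public static int CountStar(string s) => Regex.Matches(s, @"\\\w*").Count;

    // "\" followed by one or more word characters: a lone "\" does not match.
    public static int CountPlus(string s) => Regex.Matches(s, @"\\\w+").Count;
}
```

On the sentence, the * version finds 3 matches (the lone backslashes) and the + version finds none. On the question's input, both find the 2 tagged words. Which one to use depends on whether stray backslashes should also be removed.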
With your current pattern, .* tells the parser to be "greedy", that is, to take as much of the string as possible, so the match runs on to the last space. Adding a ? right after that * makes it lazy instead: it keeps the match as small as possible and stops at the first space.
Next, you want to end not just at a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together in parentheses, and the group tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
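Here is a C# sketch that runs the three replacement approaches from these answers on the question's input so their results can be compared. The RemoveTaggedWords class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

static class RemoveTaggedWords
{
    public const string Input = @"testing a\determiner checking test one\pronoun";

    // "\" plus the word characters after it, replaced with nothing.
    public static string WordChars(string s) => Regex.Replace(s, @"\\\w*", "");

    // Lazy ".*?" up to the next whitespace or the end, replaced with a space.
    public static string LazyToSpace(string s) => Regex.Replace(s, @"\\.*?(\s|$)", " ");

    // "\" plus every following non-whitespace character, replaced with nothing.
    public static string NonSpace(string s) => Regex.Replace(s, @"(\\[^\s]*)", "");
}
```

WordChars and NonSpace both return "testing a checking test one". LazyToSpace returns the same text with a trailing space, because "\pronoun" at the end is replaced by " ". Calling TrimEnd() on the result removes that space. The \w* and [^\s]* versions differ when a tag contains punctuation. For @"a\don't b", \w* stops at the apostrophe and leaves "a't b", while [^\s]* removes the whole token and gives "a b".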

Replace with wildcards

I need some advice. Suppose I have the following string: Read Variable
I want to find all pieces of text like this in a string and turn each of them into the following: Variable = MessageBox.Show. Some additional examples:
"Read Dog" --> "Dog = MessageBox.Show"
"Read Cat" --> "Cat = MessageBox.Show"
Can you help me? I need quick advice on using Regex in C#. I think this is a job involving wildcards, but I do not know how to use them very well... Also, I need this for a school project tomorrow... Thanks!
Edit: This is what I have done so far and it does not work: Regex.Replace(String, "Read ", " = Messagebox.Show").
You can do this
string ns = Regex.Replace(yourString, @"Read\s+(.*?)(?:\s|$)", "$1 = MessageBox.Show");
\s+ matches one or more whitespace characters
(.*?)(?:\s|$) matches zero or more characters up to the first whitespace character (i.e. \s) or the end of the string (i.e. $)
$1 refers to the first captured group, i.e. (.*?)
You might want to clarify your question... but here goes:
If you want to match the next word after "Read " in regex, use Read (\w*) where \w is the word character class and * is the greedy match operator.
If you want to match everything after "Read " in regex, use Read (.*)$ where . will match all characters and $ means end of line.
With either regex, you can use a replace of $1 = MessageBox.Show as $1 will reference the first matched group (which was denoted by the parenthesis).
Complete code:
replacedString = Regex.Replace(inStr, @"Read (.*)$", "$1 = MessageBox.Show");
The problem with your attempt is that it cannot know the replacement string should be inserted after your variable. Let's assume that valid variable names contain letters, digits and underscores (which can be conveniently matched with \w). That means any other character ends the variable name. You can then match the variable name, capture it (using parentheses), and put it in the replacement string with $1:
output = Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
Note that \s+ matches one or more whitespace characters, and \w+ matches one or more letters, digits and underscores. If you want to restrict variable names to letters only, this is the place to change it:
output = Regex.Replace(input, @"Read\s+([a-zA-Z]+)", "$1 = MessageBox.Show");
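Here is a small C# sketch that wraps the \w+ version in a method and applies it to several inputs. The ReadRewriter class and Rewrite method are illustrative names. Because the capture stops at the first non-word character, each "Read X" in a longer text is rewritten independently, and punctuation after the name is kept.

```csharp
using System.Text.RegularExpressions;

static class ReadRewriter
{
    // "Read", one or more whitespace characters, then the variable name
    // (letters, digits, underscores) captured in group 1; "$1" reinserts it.
    public static string Rewrite(string input) =>
        Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
}
```

For example, Rewrite("Read Dog") gives "Dog = MessageBox.Show", and "Read Dog\nRead Cat" becomes "Dog = MessageBox.Show\nCat = MessageBox.Show".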
Here is a good tutorial.
Finally, note that in C# it is advisable to write regular expressions as verbatim strings (@"..."). Otherwise, you have to double-escape every backslash so that it reaches the regex engine, and that really lessens the readability of the regex.
