Regular expression and removing signs - c#

I'm new to regular expressions. I've got a little problem and I can't find the answer. I'm looking for redundant brackets using this regular expression:
public Regex RedundantBrackets = new Regex("[(](\\s?)[a-z](\\s?)[)]");
When I find something, I want to modify the string in this way:
text1 (text2) text3 => text1 text2 text3 - so as you can see, I only want to remove the brackets. How can I do this? I tried the Replace method, but with it I can only replace every character of "(text2)".
Thanks in advance!

Try this replace
Regex.Replace("text1 (text2) text3", // Input
@"([()])", // Pattern to match
string.Empty) // Replacement
/* result: text1 text2 text3 */
Explanation
Regex.Replace scans the whole string and replaces every match it finds. Our pattern is ([()]), which breaks down as follows:
The outer ( and ) form a capturing group. The group is not needed for this replacement, so [()] alone works the same; if you do open a group, it needs a closing ), otherwise the pattern is not balanced.
[ and ] define a character set, which matches exactly one character out of that set. A common example is [A-Z], which matches any one character from A to Z. Here we define our own set. Remember: to the regex engine, [ and ] mean we are looking for one character, chosen from the characters listed inside.
( and ) inside the set are literal characters, so [()] (which could equally be written [)(]) is a set of two characters: the opening and the closing parenthesis.
Taken together, the pattern matches any single character that is either a ( or a ). Each match is replaced with string.Empty.
When we run the replace on your text it finds two matches: the ( before text2 and the ) after text2. Both are replaced with string.Empty.
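To make the explanation concrete, here is a minimal sketch that lists the individual matches before replacing them. The variable names are illustrative, and the pattern is written without the optional capturing group.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "text1 (text2) text3";

// Each match of [()] is exactly one character: either "(" or ")".
foreach (Match m in Regex.Matches(input, @"[()]"))
{
    Console.WriteLine($"index {m.Index}: {m.Value}");
}
// prints:
// index 6: (
// index 12: )

// Replacing every match with the empty string strips the parentheses.
string stripped = Regex.Replace(input, @"[()]", string.Empty);
Console.WriteLine(stripped); // text1 text2 text3
```

Note that this removes every parenthesis in the string, not only the ones around a single word.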

First off, it can be handy to use verbatim strings so you don't have to escape the backslashes etc.
public Regex RedundantBrackets = new Regex(@"[(]\s?([a-z]+)\s?[)]");
We want to wrap [a-z]+ in parentheses because that's what we're trying to capture. We can then use $1 to place that capture into the replacement:
RedundantBrackets.Replace("text (text) text", "$1");
EDIT: I forgot to add repetition to [a-z] => [a-z]+
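A short sketch of how this capture-group version behaves, assuming the pattern above. The second input string is a made-up example to show that only parentheses around a single lowercase word are affected.

```csharp
using System;
using System.Text.RegularExpressions;

var redundantBrackets = new Regex(@"[(]\s?([a-z]+)\s?[)]");

// Parentheses around a single lowercase word are removed...
string simple = redundantBrackets.Replace("text (text) text", "$1");
Console.WriteLine(simple); // text text text

// ...while parentheses around anything else are left alone,
// because [a-z]+ cannot match the "+ 1" part.
string mixed = redundantBrackets.Replace("f (x) and g (x + 1)", "$1");
Console.WriteLine(mixed); // f x and g (x + 1)
```

Because [a-z] does not include digits, a word such as text2 would not match either; widening the class (for example to \w) is one way to cover that case.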

This removes every character that is not a word character (\w) or a space, which includes the brackets:
finalString = Regex.Replace(finalString, @"[^\w ]", "");

Related

RegEx to find non-existence of white space prefix but not include the character in the match?

I have the following RegEx for the purpose of finding and adding whitespace:
(\S)(\()
For a string like "SomeText(Somemoretext)", which I want to update to "SomeText (Somemoretext)", it matches "t(", so my replace eliminates the "t" from the string, which is not good. I also do not know what the character could be; I'm merely trying to detect the absence of whitespace.
Is there a better expression to use, or is there a way to exclude the found character from the returned match so that I can safely replace without catching characters I do not want to replace?
Thanks
I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.
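For comparison, here is a minimal sketch that runs the substitution approach next to a lookbehind variant on the string from the question (extended with an already-spaced bracket). The lookbehind pattern `(?<=\S)\(` is not from the answer above; it is shown only as the alternative being avoided.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "SomeText(Somemoretext) already (spaced)";

// Substitution: capture the non-space character and put it back via $1.
string viaGroup = Regex.Replace(input, @"(\S)\(", "$1 (");

// Lookbehind: assert a non-space character precedes "(" without consuming it.
string viaLookbehind = Regex.Replace(input, @"(?<=\S)\(", " (");

Console.WriteLine(viaGroup);      // SomeText (Somemoretext) already (spaced)
Console.WriteLine(viaLookbehind); // SomeText (Somemoretext) already (spaced)
```

The two variants agree on this input. They can differ when two opening brackets are adjacent, because the capture-group version consumes the preceding character and the lookbehind version does not.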
If you need to detect the absence of space before a specific character (such as bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect words (characters in [a-zA-Z0-9_]) that are followed by a bracket without a space in between.
(If I understood your problem correctly) you can replace the full match with " (" (a space followed by the bracket) and get exactly what you need.
In case you need to look for a missing space before a symbol (like a bracket) in any kind of text (the text may be non-word, such as punctuation), you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
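When using this, your result will be in group 1 rather than in the full match (which now contains the whole line, if the line matches). Note that because of the anchors and \S*, this only matches a line that contains no whitespace at all.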

Regex - how do i match the first part of an indexed path

For the lines
Tester[0]/Test[4]/testId
Tester[0]/Test[4]/testId
Test[1]/Test[4]/testId
Test[2]/Test[4]/testId
I want to match the first part of the path including the first [, n and ] and first /
so for line above I would get
Tester[0]
Tester[0]
Test[1]
Test[2]
I have tried using
var rx = new Regex(@"^\[.*\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);
I get
res == "testId";
rather than
res == "Test[4]/testId"
which is what I'm hoping for,
so it's matching the first opening square bracket and the last closing bracket.
I need it to match only the first closing bracket.
Update:
Sorry, I am also trying to match the first forward slash.
Tester[0]/
Tester[0]/
Test[1]/
Test[2]/
Solution:
Make the quantifiers lazy with "?" so that only the first part is removed:
var rx = new Regex(@"^.*?\[.*?\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);
I'm assuming this was your original regex pattern: ^.*\[.*\]/ (the pattern in your question does not match the lines).
This pattern uses greedy quantifiers (*), so, even though we only requested one match, the pattern itself matches more than we'd like. As you noticed, it matched until the second occurrence of the square brackets.
We can make this pattern non-greedy by adding question marks to the quantifiers: ^.*?\[.*?\]/.
Although this works for your use-case, a better pattern may be: ^[^/]+/. This removes any character up to the first forward-slash. The [^ ... ] is a negative character class (the brackets are unrelated to the brackets in the strings we're matching against). In this case, it matches any character that isn't a forward-slash.
For this simple text manipulation, though, we could just use a String.Substring() instead of regular expressions:
line.Substring(line.IndexOf('/') + 1);
This is faster and easier to understand than a regular expression pattern.
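The variants discussed above can be compared side by side. This is a minimal sketch using the first sample line from the question; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "Tester[0]/Test[4]/testId";

// Greedy quantifiers run to the last "]" followed by "/", removing too much.
string greedy = Regex.Replace(line, @"^.*\[.*\]/", "");

// Lazy quantifiers stop at the first "]" followed by "/".
string lazy = Regex.Replace(line, @"^.*?\[.*?\]/", "");

// Negated character class: everything up to and including the first "/".
string negated = Regex.Replace(line, @"^[^/]+/", "");

// Plain string handling, no regex involved.
string viaSubstring = line.Substring(line.IndexOf('/') + 1);

// To keep the first part instead of removing it, take the match itself.
string firstPart = Regex.Match(line, @"^[^/]+/").Value;

Console.WriteLine(greedy);       // testId
Console.WriteLine(lazy);         // Test[4]/testId
Console.WriteLine(negated);      // Test[4]/testId
Console.WriteLine(viaSubstring); // Test[4]/testId
Console.WriteLine(firstPart);    // Tester[0]/
```

The Substring version assumes the line contains a "/"; if it does not, IndexOf returns -1 and the whole line is returned unchanged.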
You can also use lookahead and lookbehind to find the match and replace accordingly.
With the lookaround approach, your regex would look like this:
(?=/).*(?<=])
Is this the sort of thing you are looking for?
updated
var str="Tester[0]/Test[4]/testId\nTester[0]/Test[4]/testId\nTest[1]/Test[4]/testId\nTest[2]/Test[4]/testId"
console.log(str)
// Tester[0]/Test[4]/testId
// Tester[0]/Test[4]/testId
// Test[1]/Test[4]/testId
// Test[2]/Test[4]/testId
var str2=str.replace(/\/.+/mg,"")
console.log(str2)
// Tester[0]
// Tester[0]
// Test[1]
// Test[2]
This works by starting the match at the first '/' on each line and ending when the line ends, and it replaces that match with an empty string (""). The m flag enables multi-line mode and the g flag makes the match global.

Regex pattern to separate string with semicolon and plus

Here I have used the code below.
MatchCollection matches = Regex.Matches(cellData, @"(^\[.*\]$|^\[.*\]_[0-9]*$)");
The only thing this pattern is not doing is separating the semicolon and plus from the main string.
A sample string is
[dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount]
I am trying to extract
[dbServer]
[ciDBNAME]
[dbLogin]
[dbPasswd]
[SIM_ErrorFound#1]
[#IterationCount]
from the string.
To extract the stuff in square brackets from [dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount] (which is what I assume you're trying to do),
the regular expression (I haven't quoted it) should be
\[([^\]]*)\]
You should not use ^ and $, as you're not interested in the start and end of the string. The parentheses will capture every run of zero or more characters inside square brackets.
If you want to be more specific about what you're capturing in the brackets, you'll need to change the [^\]]* to something else.
Your regex - (^\[.*\]$|^\[.*\]_[0-9]*$) - matches any full string that starts with [, then contains zero or more chars other than a newline, and ends with ] (\]$) or with _ followed with 0+ digits (_[0-9]*$). You could also write the pattern as ^\[.*](?:_[0-9]*)?$ and it would work the same.
However, you need to match multiple substrings inside a larger string. Thus, you should have removed the ^ and $ anchors and retried. Then, you would find out that .* is too greedy and matches from the first [ up to the last ]. To fix that, it is best to use a negated character class solution. E.g. you may use [^][]* that matches 0+ chars other than [ and ].
Edit: It seems you need to get only the text inside square brackets.
You need to use a capturing group, a pair of unescaped parentheses around the part of the pattern you need to get and then access the value by the group ID (unnamed groups are numbered starting with 1 from left to right):
var results = Regex.Matches(s, @"\[([^][]+)]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
See the .NET regex demo
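The greediness problem and the negated-class fix can be checked on the sample string from the question. This is a minimal sketch; it keeps the brackets in the output (as the question's expected list does) by reading the full match value, and it writes the negated class as `[^\[\]]`, which is an equivalent, more explicit spelling of `[^][]`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "[dbServer];[ciDBNAME];[dbLogin];[dbPasswd] AND [SIM_ErrorFound#1]_+[#IterationCount]";

// Greedy .* runs from the first "[" to the last "]": a single match.
int greedyCount = Regex.Matches(s, @"\[.*\]").Count;

// A negated class cannot cross a bracket, so each token matches separately.
var tokens = Regex.Matches(s, @"\[[^\[\]]+\]")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(greedyCount); // 1
foreach (var token in tokens)
{
    Console.WriteLine(token);
}
// [dbServer]
// [ciDBNAME]
// [dbLogin]
// [dbPasswd]
// [SIM_ErrorFound#1]
// [#IterationCount]
```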

Regex to extract text from a pattern in C#

I have a string pattern which contains an ID and text, to make markup easier for our staff.
The pattern to create a "fancy" button in our CMS is:
(button_section_ID:TEXT)
Example:
(button_section_25:This is a fancy button)
How do I extract the "This is a fancy button" part of that pattern? The pattern will always be the same. I tried to do some substring stuff but that got complicated very fast.
Any help would be much appreciated!
If the text is always in the format you specified, you just need to trim the parentheses and then split on the colon (:):
var res = input.Trim('(', ')').Split(':')[1];
If the string is a substring, use a regex:
var match = Regex.Match(input, @"\(button_section_\d+:([^()]+)\)");
var res = match.Success ? match.Groups[1].Value : "";
See this regex demo.
Explanation:
\(button_section_ - a literal (button_section_
\d+ - 1 or more digits
: - a colon
([^()]+) - Group 1 capturing 1+ characters other than ( and ) (you may replace with ([^)]*) to make matching safer and allow an empty string and ( inside this value)
\) - a literal )
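The following .NET regex will give you a match containing a group with the text you want:
var match = Regex.Match(input, @"\(button_section_..:(.*)\)");
The parentheses define a capture group, which will give you everything between the button section prefix and the final closing parenthesis (read it from match.Groups[1].Value). Note that .. matches exactly two characters, so this assumes a two-character ID; use \d+ for IDs of any length.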
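As a usage sketch, the first pattern can be applied to a longer string containing several buttons. The sample text and the extra capture group around the ID are assumptions added for illustration; they are not part of the original question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical CMS text containing two button markers.
string content = "Intro (button_section_25:This is a fancy button) and (button_section_7:Buy now).";

// Group 1: the numeric ID, group 2: the button text (no parentheses allowed inside).
var buttons = Regex.Matches(content, @"\(button_section_(\d+):([^()]+)\)")
    .Cast<Match>()
    .Select(m => (Id: int.Parse(m.Groups[1].Value), Text: m.Groups[2].Value))
    .ToList();

foreach (var button in buttons)
{
    Console.WriteLine($"{button.Id}: {button.Text}");
}
// 25: This is a fancy button
// 7: Buy now
```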

Replace with wildcards

I need some advice. Suppose I have the following string: Read Variable
I want to find all pieces of text like this in a string and turn each of them into the following: Variable = MessageBox.Show. Some additional examples:
"Read Dog" --> "Dog = MessageBox.Show"
"Read Cat" --> "Cat = MessageBox.Show"
Can you help me? I need some quick advice on using RegEx in C#. I think it is a job involving wildcards, but I do not know how to use them very well... Also, I need this for a school project tomorrow... Thanks!
Edit: This is what I have done so far and it does not work: Regex.Replace(String, "Read ", " = Messagebox.Show").
You can do this
string ns = Regex.Replace(yourString, @"Read\s+(.*?)(?:\s|$)", "$1 = MessageBox.Show");
\s+ matches 1 to many space characters
(.*?)(?:\s|$) matches 0 to many characters till the first space (i.e \s) or till the end of the string is reached(i.e $)
$1 represents the first captured group i.e (.*?)
You might want to clarify your question... but here goes:
If you want to match the next word after "Read " in regex, use Read (\w*) where \w is the word character class and * is the greedy zero-or-more quantifier.
If you want to match everything after "Read " in regex, use Read (.*)$ where . will match any character except a newline and $ means end of line.
With either regex, you can use a replacement of $1 = MessageBox.Show, as $1 references the first captured group (which is denoted by the parentheses).
Complete code:
replacedString = Regex.Replace(inStr, @"Read (.*)$", "$1 = MessageBox.Show");
The problem with your attempt is that it cannot know that the replacement string should be inserted after your variable. Let's assume that valid variable names contain letters, digits and underscores (which can be conveniently matched with \w). That means any other character ends the variable name. Then you can match the variable name, capture it (using parentheses) and put it in the replacement string with $1:
output = Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
Note that \s+ matches one or more arbitrary whitespace characters. \w+ matches one or more letters, digits and underscores. If you want to restrict variable names to letters only, this is the place to change it:
output = Regex.Replace(input, @"Read\s+([a-zA-Z]+)", "$1 = MessageBox.Show");
Here is a good tutorial.
Finally, note that in C# it is advisable to write regular expressions as verbatim strings (@"..."). Otherwise, you will have to double-escape everything so that the backslashes get through to the regex engine, and that really lessens the readability of the regex.
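A minimal sketch tying these points together: it shows that the verbatim and the double-escaped spellings produce the same pattern, then applies the \w+ version to a few statements. The input text and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern written twice: verbatim string vs. regular string.
string verbatim = @"Read\s+(\w+)";
string escaped = "Read\\s+(\\w+)";
Console.WriteLine(verbatim == escaped); // True

// Hypothetical input with three "Read" statements, one per line.
string source = "Read Dog\nRead Cat\nRead   total_count";

// $1 re-inserts the captured variable name in front of the assignment.
string output = Regex.Replace(source, verbatim, "$1 = MessageBox.Show");
Console.WriteLine(output);
// Dog = MessageBox.Show
// Cat = MessageBox.Show
// total_count = MessageBox.Show
```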
