Subexpression inside Regex - c#

I have the following Regex which I use for syntax highlighting:
static Regex inQuotes = new Regex("(\"|\').*(\"|\')", RegexOptions.Compiled);
However, there is an issue. Whenever I encounter text like:
"text_example1' or
'text_example2"
it actually changes the color of the text, because these two cases are considered a match. What I want to do is change this Regex so that the second (\"|\') is replaced with something else.
I was thinking about subexpressions and wondering how I could change it so that, once the first quote (" or ') is matched, the closing quote must be the same character as the first rather than either " or '.

(\"|\').*?(\1)
You can use a backreference here to achieve what you want. Also make your expression non-greedy (.*?) instead of greedy (.*). See demo.
https://regex101.com/r/nM7nT5/3
string strRegex = @"(\""|\').*?(\1)";
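A minimal sketch of the backreference approach, assuming a recent .NET SDK with top-level statements. The sample strings are made up for illustration; the pattern is the one from the answer, written without the redundant escapes and the extra group around \1.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 captures the opening quote; \1 requires the same character to close.
var inQuotes = new Regex(@"(""|').*?\1", RegexOptions.Compiled);

// Hypothetical sample inputs (not from the original question).
string[] samples =
{
    "\"text_example0\"",   // matching double quotes
    "'text_example0'",     // matching single quotes
    "\"text_example1'",    // mismatched quotes
    "'text_example2\"",    // mismatched quotes
};

foreach (var s in samples)
{
    Console.WriteLine($"{s} -> {inQuotes.IsMatch(s)}");
}
```

The two mismatched inputs no longer match, because the closing quote has to repeat whatever group 1 captured.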

Related

Regex to match a word beginning with a period and ending with an underscore?

I'm quite the Regex novice, but I have a series of strings similar to this: "[$myVar.myVar_STATE]". I need to replace the 2nd myVar, the one that begins with a period and ends with an underscore. I need to match it exactly, as sometimes I'll have "[$myVar.myVar_moreMyVar_STATE]" and in that case I wouldn't want to replace anything.
I've tried things like "\b.myVar_\b", "\.\bmyVar_\b" and several more, all with no luck.
How about this:
\[\$myVar\.([^_]+)_STATE\]
Matches:
[$myVar.myVar_STATE] // matches and captures 'myVar'
[$myVar.myVar_moreMyVar_STATE] // no match
Working regex example:
http://regex101.com/r/yM9jQ3
Or if _STATE was variable, you could use this: (as long as the text in the STATE part does not have underscores in it.)
\[\$myVar\.([^_]+)_[^_]+\]
Working regex example:
http://regex101.com/r/kW8oE1
Edit: Following the OP's comments below, this should be what he's going for:
(\[\$myVar\.)([^_]+)(_[^_]+\])
Regex replace example:
http://regex101.com/r/pU6yL8
C#
var pattern = @"(\[\$myVar\.)([^_]+)(_[^_]+\])";
var replaced = Regex.Replace(input, pattern, "$1" + newVar + "$3");
What about something like:
.*\.(myVar_).*
This looks for anything then a . and "myVar_" followed by anything.
It matches:
"[$myVar.myVar_STATE]"
And only the first myVar_ here:
"[$myVar.myVar_moremyVar_STATE]"
See it in action.
This should do it:
\[\$myVar\.(.*?)_STATE\]
You can use this little trick to pick out the groups, and build the replacement at the end, like so:
var replacement = "something";
var input = @"[$myVar.myVar_STATE]";
var pattern = @"(\[\$myVar\.)(.*?)_(.*?)]";
// $1 is the prefix, $2 the old variable name, $3 the STATE part
var replaced = Regex.Replace(input, pattern, "$1" + replacement + "_$3]");
C# already has builtin method to do this
string text = ".asda_";
Response.Write((text.StartsWith(".") && text.EndsWith("_")));
Is Regex really required?
string input = "[$myVar.myVar_STATE]";
string oldVar = "myVar";
string newVar = "myNewVar";
string result = input.Replace("." + oldVar + "_STATE]", "." + newVar + "_STATE]");
In case "STATE" is a variable part, then we'll need to use Regex. The easiest way is to use this Regex pattern which matches a position between a prefix and a suffix. Prefix and suffix are used for searching but are not included in the resulting match:
(?<=prefix)find(?=suffix)
result =
Regex.Replace(input, @"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])", newVar);
Explanation:
The prefix part is \., which stands for ".".
The find part is the escaped old variable to be replaced. Regex escaping makes sure that characters with a special meaning in Regex are escaped.
The suffix part is _[A-Z]+], an underscore followed by at least one letter followed by "]". Note: the second ] does not need to be escaped. An opening bracket [ would have to be escaped like this: \[. We cannot use \w for word characters for the STATE part, as \w includes underscores. You might have to adapt the [A-Z] part to exactly match all possible states (e.g. if the state has digits, use [A-Z0-9]).
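The lookaround replacement above can be sketched as follows, assuming a recent .NET SDK with top-level statements. The variable names single and multi are introduced here for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

string oldVar = "myVar";
string newVar = "myNewVar";

// (?<=\.) requires a preceding "." and (?=_[A-Z]+]) requires an underscore,
// uppercase letters and "]" to follow. Neither lookaround is part of the match,
// so only oldVar itself is replaced.
string pattern = @"(?<=\.)" + Regex.Escape(oldVar) + @"(?=_[A-Z]+])";

string single = Regex.Replace("[$myVar.myVar_STATE]", pattern, newVar);
string multi = Regex.Replace("[$myVar.myVar_moreMyVar_STATE]", pattern, newVar);

Console.WriteLine(single); // [$myVar.myNewVar_STATE]
Console.WriteLine(multi);  // [$myVar.myVar_moreMyVar_STATE] (unchanged)
```

The second input is left alone because the text after the underscore starts with a lowercase letter, which [A-Z]+ rejects.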

Regular expression and removing signs

I'm new to regular expressions. I've got a little problem and I can't find the answer. I'm looking for redundant brackets using this regular expression:
public Regex RedundantBrackets = new Regex("[(](\\s?)[a-z](\\s?)[)]");
When I find something I want to modify the string in this way:
text1 (text2) text3 => text1 text2 text3 - so as you can see I only want to remove the brackets. How can I do this? I tried the Replace method, but with it I can only replace the whole of "(text2)".
Thanks in advance!
Try this replace
Regex.Replace("text1 (text2) text3", // Input
@"([()])", // Pattern to match
string.Empty); // Replacement
/* result: text1 text2 text3 */
Explanation
Regex.Replace scans the whole string for matches and replaces each one it finds. Our match pattern is ([()]), which breaks down as follows:
The outer ( and ) form a capture group. An opening ( in a pattern needs a closing ), otherwise the pattern is unbalanced.
[ and ] define a set (a character class). The most common example is [A-Z], which matches any single character from A to Z. Here we define our own set. Remember that [ and ] tell the regex engine to match exactly one character, chosen from the set of characters listed inside.
The ( and ) inside our set [()], which could equally be written [)(], give us a set of two characters: the opening and closing parentheses.
Taken all together, we match any single character that is either a ( or a ). When a match is found, the ( or ) is replaced with string.Empty.
When we run the replace on your text it finds two matches: the ( before text2 and the ) after it. Both are replaced with string.Empty.
First off, it can be handy to use verbatim strings so you don't have to escape the slashes etc.
public Regex RedundantBrackets = new Regex(@"[(]\s?([a-z]+)\s?[)]");
We want to wrap [a-z]+ in parentheses because that's what we're trying to capture. We can then use $1 to place that capture into the replacement:
RedundantBrackets.Replace("text (text) text", "$1");
EDIT: I forgot to add repetition to [a-z] => [a-z]+
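A short sketch of the capture-group replacement, assuming a recent .NET SDK with top-level statements. One deliberate change from the answer: the character class is widened to [a-z0-9]+, because the question's sample "text2" contains a digit that [a-z]+ would not match.

```csharp
using System;
using System.Text.RegularExpressions;

// "(" , optional whitespace, a captured run of lowercase letters/digits,
// optional whitespace, ")". The digits in the class are an assumption added here.
var redundantBrackets = new Regex(@"[(]\s?([a-z0-9]+)\s?[)]");

// "$1" re-inserts only the captured text, dropping the brackets around it.
string result = redundantBrackets.Replace("text1 (text2) text3", "$1");

Console.WriteLine(result); // text1 text2 text3
```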
This will remove all characters except word characters and spaces:
finalString = Regex.Replace(finalString, @"[^\w ]", "");

Regex for word.otherword

I want a Regular Expression for a word.otherword form. I tried \b[a-z]\.[a-z]\b, but it gives me an error at the \. part, saying Unrecognized escape sequence. Any idea what's wrong? I'm working under .NET C#. Thanks!
LE:
john.Smith or JoHn.SmItH or JOHN.SMITH should work.
John Smith or john!Smith or john.Smith.Smith shouldn't work.
Try this :
foundMatch = Regex.IsMatch(SubjectString, @"\b[a-z]\.[a-z]\b");
Probably you were not using @?
Your regex only matches a.a, i.e. a single character on each side of the dot. Since you want it to match complete words, you need a quantifier, e.g.
\b[a-z]+\.[a-z]+\b
Finally you may want to use the case insensitive match to allow for words with capital letters to be matched too :
foundMatch = Regex.IsMatch(SubjectString, @"\b[a-z]+\.[a-z]+\b", RegexOptions.IgnoreCase);
This will match all words.words with at least one character for each word regardless of capitalization.
This will match word.otherword only if there is whitespace before the first word or it is at the start of the string, and only if there is whitespace after the second word or it is at the end of the string.
foundMatch = Regex.IsMatch(SubjectString, @"(?<=\s|^)\b[a-z]+\.[a-z]+\b(?=\s|$)", RegexOptions.IgnoreCase);
Try this regex for word.word format:
@"\b([a-z]+)\.\1"
For word.otherword use this:
@"\b[a-z]+\.[a-z]+\b"
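Note that a pattern delimited only by \b still finds "john.Smith" inside "john.Smith.Smith". If the goal is to validate the whole input against the examples in the question, one option is to anchor the pattern with ^ and $. The sketch below assumes that reading of the requirement and a recent .NET SDK with top-level statements; the name wordDotWord is made up here.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ tie the pattern to the entire input instead of any substring.
var wordDotWord = new Regex(@"^[a-z]+\.[a-z]+$", RegexOptions.IgnoreCase);

string[] inputs =
{
    "john.Smith", "JoHn.SmItH", "JOHN.SMITH",        // should be accepted
    "John Smith", "john!Smith", "john.Smith.Smith",  // should be rejected
};

foreach (var input in inputs)
{
    Console.WriteLine($"{input} -> {wordDotWord.IsMatch(input)}");
}
```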

Regular expression for replacement operation

Is there any regular expression that will replace everything except alphanumeric?
My attempt (not working)
string str = "This is a string;;;; having;;; and It also 5555 777has dot (.) Many dots(.....)";
Regex rgx2 = new Regex("^[a-zA-Z0-9]+");
string result1 = rgx2.Replace(str, "");
Use [^a-zA-Z0-9]+ instead of ^[a-zA-Z0-9]+
The ^ symbol in your second regex means 'at start of string', the way it is written. In order to have it negate the set it needs to be the first character after opening bracket:
[^a-zA-Z0-9]+
However, this will remove the - characters that you previously replaced spaces with. You probably want to exclude that character as well:
[^a-zA-Z0-9-]+
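To see the difference between the two positions of ^, here is a small sketch using the string from the question, assuming a recent .NET SDK with top-level statements. The variable names anchored and stripped are introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string str = "This is a string;;;; having;;; and It also 5555 777has dot (.) Many dots(.....)";

// ^ outside the brackets anchors to the start: only the leading "This" is removed.
string anchored = Regex.Replace(str, "^[a-zA-Z0-9]+", "");

// ^ as the first character inside the brackets negates the set:
// every run of non-alphanumeric characters (spaces included) is removed.
string stripped = Regex.Replace(str, "[^a-zA-Z0-9]+", "");

Console.WriteLine(anchored);
Console.WriteLine(stripped);
```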

C# regex replace unexpected behavior

Given $displayHeight = "800";, replace the number (800 here) with the int value y_res.
resultString = Regex.Replace(
im_cfg_contents,
@"\$displayHeight[\s]*=[\s]*""(.*)"";",
Convert.ToString(y_res));
In Python I'd use re.sub and it would work. In .NET it replaces the whole line, not the matched group.
What is a quick fix?
Building on a couple of the answers already posted. The Zero-width assertion allows you to do a regular expression match without placing those characters in the match. By placing the first part of the string in a group we've separated it from the digits that you want to be replaced. Then by using a zero-width lookbehind assertion in that group we allow the regular expression to proceed as normal but omit the characters in that group in the match. Similarly, we've placed the last part of the string in a group, and used a zero-width lookahead assertion. Grouping Constructs on MSDN shows the groups as well as the assertions.
resultString = Regex.Replace(
im_cfg_contents,
@"(?<=\$displayHeight[\s]*=[\s]*"")(.*)(?="";)",
Convert.ToString(y_res));
Another approach would be to use the following code. The modification to the regular expression is just placing the first part in a group and the last part in a group. Then in the replace string, we add back in the first and third groups. Not quite as nice as the first approach, but not quite as bad as writing out the $displayHeight part. Substitutions on MSDN shows how the $ characters work.
resultString = Regex.Replace(
im_cfg_contents,
@"(\$displayHeight[\s]*=[\s]*"")(.*)("";)",
"${1}" + Convert.ToString(y_res) + "${3}");
Try this:
resultString = Regex.Replace(
im_cfg_contents,
@"\$displayHeight[\s]*=[\s]*""(.*)"";",
@"$$displayHeight = """ + Convert.ToString(y_res) + @""";");
It replaces the whole string because you've matched the whole string - nothing about this statement tells C# to replace just the matched group, it will find and store that matched group sure, but it's still matching the whole string overall.
You can either change your replacer to:
@"$$displayHeight = """ + Convert.ToString(y_res) + @""";"
..or you can change your pattern to just match the digits, i.e.:
@"[0-9]+"
..or you could see if C# regex supports lookarounds (I'm not sure if it does offhand) and change your match accordingly.
You could also try this, though I think it is a little slower than my other method:
resultString = Regex.Replace(
im_cfg_contents,
"(?<=\\$displayHeight[\\s]*=[\\s]*\").*(?=\";)",
Convert.ToString(y_res));
Check this pattern out
(?<=(\$displayHeight\s*=\s*"))\d+(?=";)
A word about "lookaround".
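A runnable sketch of the lookaround variant, assuming a recent .NET SDK with top-level statements. .NET regular expressions do support lookbehind, including variable-length lookbehind such as \s*. The file contents and the $displayWidth line are hypothetical, added only to show that other lines are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical config contents; only the $displayHeight line comes from the question.
string contents = "$displayWidth = \"1280\";\n$displayHeight = \"800\";\n";
int yRes = 600;

// The lookbehind and lookahead must be present around the digits,
// but only the digits themselves form the match and get replaced.
string pattern = @"(?<=\$displayHeight\s*=\s*"")\d+(?="";)";
string result = Regex.Replace(contents, pattern, yRes.ToString());

Console.Write(result);
```

Because the replacement text is just the number, there is no need to rebuild the surrounding $displayHeight = "..."; text or to escape $ in the replacement string.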
