Detecting "\" (backslash) using Regex - c#

I have a C# Regex like
[\"\'\\/]+
that I want to use to evaluate and return error if certain special characters are found in a string.
My test string is:
\test
I have a call to this method to validate the string:
public static bool validateComments(string input, out string errorString)
{
errorString = null;
bool result;
result = !Regex.IsMatch(input, "[\"\'\\/]+"); // result is true if no match
// return an error if match
if (result == false)
errorString = "Comments cannot contain quotes (double or single) or slashes.";
return result;
}
However, I am unable to match the backslash. I have tried several tools such as regexpal and a VS2012 extension that both seem to match this regex just fine, but the C# code itself won't. I do realize that C# is escaping the string as it is coming in from a Javascript Ajax call, so is there another way to match this string?
It does match /test or 'test or "test, just not \test

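Backslash is also the escape character inside the regex itself, so it has to be escaped twice: once for the C# string literal and once for the regex. Try "[\"\'\\\\/]+" (so double escape the \)
Note that you could have @"[""'\\/]+" and perhaps it would be more readable :-) (with the @ verbatim prefix, the only character you have to escape is the ", by writing a second "")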
You don't really need the +, because in the end [...] means "one of", and it's enough for you.
Don't eat what you can't chew... Instead of regexes use
// result is true if no match
result = input.IndexOfAny(new[] { '"', '\'', '\\', '/' }) == -1;
I don't think anyone ever lost the work because he preferred IndexOf instead of a regex :-)
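To make the two layers of escaping concrete, here is a minimal sketch comparing the pattern from the question with the doubled-backslash and verbatim forms, plus the IndexOfAny check. The class and member names are made up for illustration, and the trailing + is dropped because it does not change whether IsMatch succeeds.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo // hypothetical name, for illustration only
{
    // Pattern from the question: the compiler turns \\/ into \/ ,
    // which the regex engine reads as an escaped forward slash.
    public const string Original = "[\"'\\/]";

    // Four backslashes in a regular literal reach the regex engine as \\ ,
    // which matches one literal backslash.
    public const string Doubled = "[\"'\\\\/]";

    // The same regex as Doubled, written as a verbatim literal.
    public const string Verbatim = @"[""'\\/]";

    // Regex-free check: true if any forbidden character is present.
    public static bool HasForbiddenChar(string input) =>
        input.IndexOfAny(new[] { '"', '\'', '\\', '/' }) != -1;
}
```

With the input @"\test", Regex.IsMatch returns false for BackslashDemo.Original and true for both Doubled and Verbatim; HasForbiddenChar also returns true.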

You can solve this by making the string verbatim with @, like this:
result = !Regex.IsMatch(input, @"[\""\'\\/]+");

Since backslashes are used as escapes inside regexes themselves, I find it best to use verbatim strings when working with the regex library:
string input = @"\test";
bool result = !Regex.IsMatch(input, @"[""'\\]+");
//                                     ^^
// You need to double the double-quotes when working with verbatim strings;
// All other characters, including backslashes, remain unchanged.
if (!result) {
Console.WriteLine("Comments cannot contain quotes (double or single) or slashes.");
}
The only issue with that is that you must double your double-quotes (which is ironically what you need to do in your case).
Demo on ideone.

For the trivial case, I am able to use regexhero.net for your test expression using the simple:
\\
to validate
\test
The code generated by RegExHero:
string strRegex = @"\\";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"\test";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}

Related

Using parentheses in input string for regex

I am trying to add items to a multi line TextBox. The TextBox should not take duplicate lines. If a duplicate is present then it should rename it to name (1). I am using Regex for this purpose.
Following is my function:
private string Rename(string input, string[] lines)
{
string output = string.Empty;
if (lines.Contains(input))
{
Regex regEx = new Regex(string.Format(@"\b{0}\b", input), RegexOptions.ExplicitCapture);
string[] str = lines.Select(x => x).Where(y => regEx.IsMatch(y)).ToArray();
regEx = new Regex(string.Format(@"\b{0}\b \(\d+\)", input));
string[] matchedStrings = str.Select(x => x).Where(y => regEx.IsMatch(y)).ToArray();
if (matchedStrings.Length > 0)
{
return string.Format("{0} ({1})", input, (matchedStrings.Length + 1));
}
else
{
return string.Format("{0} (1)", input, matchedStrings.Length);
}
}
else
{
return input;
}
}
This is how I call the function in a button click:
// textBox2 is a multiline text box. textBox1 is where the input is taken from
textBox2.Text += Rename(textBox1.Text, textBox2.Lines) + Environment.NewLine;
The above code works for normal text. For example:
if input is abc and same input is given again, it returns abc (1). After this if I give input as abc (1), then the first regex pattern returns zero matches. Because of this, I am unable to rename appropriately.
You need to escape the input when injecting it to a regex:
//                                                vvvvvvvvvvvvv     v
Regex regEx = new Regex(string.Format(@"\b{0}\b", Regex.Escape(input)), RegexOptions.ExplicitCapture);
When input is abc (1), without escaping the regex becomes \babc (1)\b. See? The parentheses are unescaped, so (1) is read as a group containing a literal 1 rather than as the literal text (1).
Please note that the RegexOptions.ExplicitCapture option you're using does not make parentheses literal. It only tells the regex engine to treat unnamed parentheses as grouping rather than capturing.
As a rule of thumb, always escape strings injected into a regex.
The second issue is that the closing \b fails to match after the closing ). A ) is not a word character, so the end of the string after ) is not a word boundary. To fix it, you can give the closing \b an alternative that matches any position preceded by a ). This can be written as (?<=\)), which matches an empty string preceded by a ). So the regex initialisation becomes:
//                                           v  vvvvvvvvv
Regex regEx = new Regex(string.Format(@"\b{0}(\b|(?<=\)))", Regex.Escape(input)), RegexOptions.ExplicitCapture);
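The effect of both fixes can be checked in isolation. The following is only a sketch: EscapeDemo and its two method names are hypothetical helpers, not part of the original code, and they wrap the unescaped pattern from the question next to the escaped pattern with the lookbehind alternative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo // hypothetical helper, not from the original post
{
    // Pattern from the question: the input is injected without escaping.
    public static Regex BuildNaivePattern(string input) =>
        new Regex(string.Format(@"\b{0}\b", input), RegexOptions.ExplicitCapture);

    // Pattern from the answer: the input is escaped, and the trailing
    // boundary also accepts a position right after a closing parenthesis.
    public static Regex BuildPattern(string input) =>
        new Regex(string.Format(@"\b{0}(\b|(?<=\)))", Regex.Escape(input)),
                  RegexOptions.ExplicitCapture);
}
```

For the input abc (1), BuildNaivePattern produces a regex that matches the text abc 1 but not abc (1), while BuildPattern matches abc (1) as intended and still matches plain inputs such as abc.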

Replace one character but not two in a string

I want to replace single occurrences of a character but not two in a string using C#.
For example, I want to replace & by an empty string but not when the occurrence is &&. Another example, a&b&&c would become ab&&c after the replacement.
If I use a regex like &[^&], it will also match the character after the & and I don't want to replace it.
Another solution I found is to iterate over the string characters.
Do you know a cleaner solution to do that?
To only match one & (not preceded or followed by &), use look-arounds (?<!&) and (?!&):
(?<!&)&(?!&)
See regex demo
You tried to use a negated character class that still matches a character, and you need to use a look-ahead/look-behind to just check for some character absence/presence, without consuming it.
See regular-expressions.info:
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u).
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt.
You can match both & and && (or any number of repetition) and only replace the single one with an empty string:
str = Regex.Replace(str, "&+", m => m.Value.Length == 1 ? "" : m.Value);
You can use this regex: @"(?<!&)&(?!&)"
var str = Regex.Replace("a&b&&c", @"(?<!&)&(?!&)", "");
Console.WriteLine(str); // ab&&c
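To see the difference between consuming a neighbouring character and merely checking it, here is a small side-by-side sketch of the three patterns discussed so far. The class and method names are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleAmpersandDemo // hypothetical name
{
    // Negated class: the character after the '&' becomes part of the match
    // and is removed together with it.
    public static string WithNegatedClass(string s) =>
        Regex.Replace(s, "&[^&]", "");

    // Lookarounds: the neighbours are checked but not consumed.
    public static string WithLookarounds(string s) =>
        Regex.Replace(s, @"(?<!&)&(?!&)", "");

    // Match whole runs of '&' and drop only the runs of length one.
    public static string WithEvaluator(string s) =>
        Regex.Replace(s, "&+", m => m.Value.Length == 1 ? "" : m.Value);
}
```

For a&b&&c, WithNegatedClass returns a& (it also removes the b and the c, and keeps one & from the pair), while WithLookarounds and WithEvaluator both return ab&&c. The negated class additionally misses a single & at the very end of the string, because it needs a following character to match.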
You can go with this:
public static string replacement(string oldString, char charToRemove)
{
string newString = "";
bool found = false;
foreach (char c in oldString)
{
if (c == charToRemove && !found)
{
found = true;
continue;
}
newString += c;
}
return newString;
}
Which is as generic as possible
I would use something like this, which IMO should be better than using Regex:
public static class StringExtensions
{
public static string ReplaceFirst(this string source, char oldChar, char newChar)
{
if (string.IsNullOrEmpty(source)) return source;
int index = source.IndexOf(oldChar);
if (index < 0) return source;
var chars = source.ToCharArray();
chars[index] = newChar;
return new string(chars);
}
}
I'll contribute to this statement from the comments:
in this case, only the substring with odd number of '&' will be replaced by all the "&" except the last "&" . "&&&" would be "&&" and "&&&&" would be "&&&&"
This is a pretty neat solution using balancing groups (though I wouldn't call it particularly clean nor easy to read).
Code:
string str = "11&222&&333&&&44444&&&&55&&&&&";
str = Regex.Replace(str, "&((?:(?<2>&)(?<-2>&)?)*)", "$1$2");
Output:
11222&&333&&44444&&&&55&&&&
ideone demo
It always matches the first & (not captured).
If it's followed by an even number of &, they're matched and stored in $1. The second group is captured by the first of the pair, but then it's subtracted by the second.
However, if there's an odd number of &, the optional group (?<-2>&)? does not match, and the group is not subtracted. Then, $2 will capture an extra &.
For example, matching the subject "&&&&", the first char is consumed and it isn't captured (1). The second and third chars are matched, but $2 is subtracted (2). For the last char, $2 is captured (3). The last 3 chars were stored in $1, and there's an extra & in $2.
Then, the substitution "$1$2" == "&&&&".

How have I screwed up my regex?

I am really confused here. I have written a snippet of code in C# that is passed a possible file pathway. If it contains a character specified in a regex string, it should return false. However, the regex function Match refuses to find anything matching (I even set it to a singular character I knew was in the string), resulting in severe irritation from me.
The code is:
static bool letterTest(string pathway)
{
bool validPath = false;
char[] c = Path.GetInvalidPathChars();
string test = new string(c);
string regex = "["+test+"]";
string spTest = "^[~#%&*\\{}+<>/\"|]";
Match match = Regex.Match(pathway, spTest);
if (!match.Success)
{
validPath = true;
}
return validPath;
}
The string I pass to it is: @"C:/testing/invalid#symbol"
What am I doing wrong/misunderstanding with the regex, or is it something other than the regex that I have messed up?
Remove the initial caret from your regex:
[~#%&*\\{}+<>/\"|]
You are requiring that the path begin with one of those characters. By removing that constraint, it will search the whole string for any of those characters.
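A quick way to confirm this is to run the same character class with and without the anchor. The sketch below assumes nothing beyond the pattern from the question; CaretDemo and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretDemo // hypothetical name
{
    // Character class taken from the question (regular string literal).
    private const string CharClass = "[~#%&*\\{}+<>/\"|]";

    // Anchored: only the first character of the input is tested.
    public static bool MatchesAtStart(string path) =>
        Regex.IsMatch(path, "^" + CharClass);

    // Unanchored: a match at any position counts.
    public static bool MatchesAnywhere(string path) =>
        Regex.IsMatch(path, CharClass);
}
```

For C:/testing/invalid#symbol, MatchesAtStart returns false because the path starts with C, whereas MatchesAnywhere returns true because of the / and # further along.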
But why not use the framework to do the work for you?
Check this out: Check if a string is a valid Windows directory (folder) path
Instead of a regular expression you can just do the following.
static bool letterTest(string pathway)
{
char[] badChars = Path.GetInvalidPathChars();
return pathway.All(c => !badChars.Contains(c));
// or
// return !pathway.Any(c => badChars.Contains(c));
// or
// return badChars.All(bc => !pathway.Contains(bc));
// or
// return !badChars.Any(bc => pathway.Contains(bc));
}
Someone has already pointed out the caret that was anchoring your match to the first character. But there's another error you may not be aware of yet. This one has to do with your use of string literals. What you have now is a traditional, C-style string literal:
"[~#%&*\\{}+<>/\"|]"
...which becomes this regex:
[~#%&*\{}+<>/"|]
The double backslash has become a single backslash, which is treated as an escape for the following brace (\{). The brace doesn't need escaping inside a character class, but it's not considered a syntax error.
However, the regex will not detect a backslash as you intended. To do that, you need two backslashes in the regex, so there should be four backslashes in the string literal:
"[~#%&*\\\\{}+<>/\"|]"
Alternatively, you can use a C# verbatim string literal. Backslashes have no special meaning in a verbatim string. The only thing that needs special handling is the quotation mark, which you escape by adding another quotation mark:
@"[~#%&*\\{}+<>/""|]"
you have to escape the / literal
"^[~#%&*\\{}+<>\/\"|]"
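A caret negates a character class only when it comes first inside the brackets. Here it sits before the class, where it anchors the match to the start of the string. Removing it from spTest solves this issue.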
string spTest = "[~#%&*\\{}+<>/\"|]";

RegEx.Replace to Replace Whole Words and Skip when Part of the Word

I am using regex to replace certain keywords from a string (or Stringbuilder) with the ones that I choose. However, I fail to build a valid regex pattern to replace only whole words.
For example, if I have InputString = "fox foxy" and want to replace "fox" with "dog", the output is "dog dogy".
What is the valid RegEx pattern to take only "fox" and leave "foxy"?
public string Replace(string KeywordToReplace, string Replacement)
{
this.Replacement = Replacement;
this.KeywordToReplace = KeywordToReplace;
Regex RegExHelper = new Regex(KeywordToReplace, RegexOptions.IgnoreCase);
string Output = RegExHelper.Replace(InputString, Replacement);
return Output;
}
Thanks!
Regexes support a special escape sequence that represents a word boundary. Word characters are letters, digits, and the underscore (roughly [a-zA-Z0-9_]; by default .NET also counts Unicode letters and digits). So a word boundary is a position between a character that belongs to this group and one that doesn't. The escape sequence is \b:
\bfox\b
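Applied to the example from the question, this could look like the sketch below. WholeWordDemo and ReplaceWholeWord are hypothetical names, and the call to Regex.Escape is an addition of mine so that keywords containing regex metacharacters are still treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo // hypothetical name
{
    // Replaces the keyword only where it stands as a whole word.
    // Regex.Escape is an assumption added here, not part of the question's code.
    public static string ReplaceWholeWord(string input, string keyword, string replacement) =>
        Regex.Replace(input,
                      @"\b" + Regex.Escape(keyword) + @"\b",
                      replacement,
                      RegexOptions.IgnoreCase);
}
```

ReplaceWholeWord("fox foxy", "fox", "dog") returns dog foxy. Note that \b only helps when the keyword itself starts and ends with a word character.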
Do not forget to put the '@' symbol before your "\bword\b" string.
For example:
address = Regex.Replace(address, @"\bNE\b", "Northeast");
The @ symbol makes the string verbatim, so the backslash (\) reaches the regex engine as-is instead of being treated as a C# escape character!
You need to use word boundaries:
KeywordToReplace = @"\byourWord\b";

Regexp skip pattern

Problem
I need to replace all asterisk symbols('*') with percent symbol('%'). The asterisk symbols in square brackets should be ignored.
Example
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "Hel[*o], w*rld!";
var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");
Assert.AreEqual("Hel[*o], w%rld!", output);
}
Try using a look ahead:
\*(?![^\[\]]*\])
Here's a bit stronger solution, which takes care of [] blocks better, and even escaped \[ characters:
string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
\\. # Match an escaped character. (to skip over it)
|
\[ # Match a character class
(?:\\.|[^\]])* # which may also contain escaped characters (to skip over it)
\]
|
(?<Asterisk>\*) # Match `*` and add it to a group.
";
text = Regex.Replace(text, pattern,
match => match.Groups["Asterisk"].Success ? "%" : match.Value,
RegexOptions.IgnorePatternWhitespace);
If you don't care about escaped characters you can simplify it to:
\[ # Skip a character class
[^\]]* # until the first ']'
\]
|
(?<Asterisk>\*)
Which can be written without comments as: @"\[[^\]]*\]|(?<Asterisk>\*)".
To understand why it works we need to understand how Regex.Replace works: for every position in the string it tries to match the regex. If it fails, it moves one character. If it succeeds, it moves over the whole match.
Here, we have dummy matches for the [...] blocks so we may skip over the asterisks we don't want to replace, and match only the lonely ones. That decision is made in a callback function that checks if Asterisk was matched or not.
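As a sketch of how the simplified pattern and the callback fit together (the class and method names are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SkipBracketsDemo // hypothetical name
{
    // Either a whole [...] block (a dummy match that is put back unchanged)
    // or a lone asterisk captured in the named group.
    private static readonly Regex Pattern =
        new Regex(@"\[[^\]]*\]|(?<Asterisk>\*)");

    public static string ReplaceOutsideBrackets(string input) =>
        Pattern.Replace(input,
            m => m.Groups["Asterisk"].Success ? "%" : m.Value);
}
```

ReplaceOutsideBrackets("Hel[*o], w*rld!") returns Hel[*o], w%rld!. An unmatched ] with no opening bracket before it, as in Hel*o], does not start a block, so the asterisk in front of it is replaced as well.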
I couldn't come up with a pure RegEx solution. Therefore I am providing you with a pragmatic solution. I tested it and it works:
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
var input = "H*]e*l[*o], w*rl[*d*o] [o*] [o*o].";
var actual = ReplaceAsterisksNotInSquareBrackets(input);
var expected = "H%]e%l[*o], w%rl[*d*o] [o*] [o*o].";
Assert.AreEqual(expected, actual);
}
private static string ReplaceAsterisksNotInSquareBrackets(string s)
{
Regex rx = new Regex(@"(?<=\[[^\[\]]*)(?<asterisk>\*)(?=[^\[\]]*\])");
var matches = rx.Matches(s);
s = s.Replace('*', '%');
foreach (Match match in matches)
{
s = s.Remove(match.Groups["asterisk"].Index, 1);
s = s.Insert(match.Groups["asterisk"].Index, "*");
}
return s;
}
EDITED
Okay here is my final attempt ;)
Using negative lookbehind (?<!) and negative lookahead (?!).
var output = Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");
This also passes the test in the comment to another answer "Hel*o], w*rld!"
