How have I screwed up my regex? - c#

I am really confused here. I have written a snippet of code in C# that is passed a possible file pathway. If it contains a character specified in a regex string, it should return false. However, the regex function Match refuses to find anything matching (I even set it to a singular character I knew was in the string), resulting in severe irritation from me.
The code is:
static bool letterTest(string pathway)
{
bool validPath = false;
char[] c = Path.GetInvalidPathChars();
string test = new string(c);
string regex = "["+test+"]";
string spTest = "^[~#%&*\\{}+<>/\"|]";
Match match = Regex.Match(pathway, spTest);
if (!match.Success)
{
validPath = true;
}
return validPath;
}
The string I pass to it is: @"C:/testing/invalid#symbol"
What am I doing wrong/misunderstanding with the regex, or is it something other than the regex that I have messed up?

Remove the initial caret from your regex:
[~#%&*\\{}+<>/\"|]
You are requiring that the path begin with one of those characters. By removing that constraint, it will search the whole string for any of those characters.
But why not use the framework to do the work for you?
Check this out: Check if a string is a valid Windows directory (folder) path
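To make the effect of the leading caret concrete, here is a minimal sketch that runs the question's pattern with and without the anchor against the question's test path. The class name CaretDemo and the helper Matches are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretDemo
{
    // Hypothetical helper: true if the pattern matches somewhere in the input.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string path = "C:/testing/invalid#symbol";

        // Anchored: only the first character ('C') is tested against the class.
        Console.WriteLine(Matches(path, "^[~#%&*\\{}+<>/\"|]")); // False

        // Unanchored: every position is tried, so the '/' and '#' are found.
        Console.WriteLine(Matches(path, "[~#%&*\\{}+<>/\"|]"));  // True
    }
}
```

Note that the character class also contains /, so any path written with forward slashes is reported as invalid by the unanchored version.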

Instead of a regular expression you can just do the following.
static bool letterTest(string pathway)
{
char[] badChars = Path.GetInvalidPathChars();
return pathway.All(c => !badChars.Contains(c));
// or
// return !pathway.Any(c => badChars.Contains(c));
// or
// return badChars.All(bc => !pathway.Contains(bc));
// or
// return !badChars.Any(bc => pathway.Contains(bc));
}

Someone has already pointed out the caret that was anchoring your match to the first character. But there's another error you may not be aware of yet. This one has to do with your use of string literals. What you have now is a traditional, C-style string literal:
"[~#%&*\\{}+<>/\"|]"
...which becomes this regex:
[~#%&*\{}+<>/"|]
The double backslash has become a single backslash, which is treated as an escape for the following brace (\{). The brace doesn't need escaping inside a character class, but it's not considered a syntax error.
However, the regex will not detect a backslash as you intended. To do that, you need two backslashes in the regex, so there should be four backslashes in the string literal:
"[~#%&*\\\\{}+<>/\"|]"
Alternatively, you can use a C# verbatim string literal. Backslashes have no special meaning in a verbatim string. The only thing that needs special handling is the quotation mark, which you escape by adding another quotation mark:
@"[~#%&*\\{}+<>/""|]"
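The difference between the three spellings can be checked directly. The following is a small sketch, with a hypothetical class name and constant names, that tests each literal against an input containing a backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashLiteralDemo
{
    // Regex seen by the engine: [~#%&*\{}+<>/"|]  (no backslash in the class)
    public const string TwoSlashLiteral = "[~#%&*\\{}+<>/\"|]";

    // Regex seen by the engine: [~#%&*\\{}+<>/"|]  (backslash is in the class)
    public const string FourSlashLiteral = "[~#%&*\\\\{}+<>/\"|]";

    // Verbatim spelling of the same regex as FourSlashLiteral.
    public const string VerbatimLiteral = @"[~#%&*\\{}+<>/""|]";

    public static void Main()
    {
        string input = @"dir\file"; // contains one backslash

        Console.WriteLine(Regex.IsMatch(input, TwoSlashLiteral));  // False
        Console.WriteLine(Regex.IsMatch(input, FourSlashLiteral)); // True
        Console.WriteLine(FourSlashLiteral == VerbatimLiteral);    // True
    }
}
```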

you have to escape the / literal
"^[~#%&*\\{}+<>\/\"|]"

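Inside the brackets a leading caret negates the character group, but here it sits outside the brackets, where it anchors the match to the start of the string. Removing it from spTest solves this issue.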
string spTest = "[~#%&*\\{}+<>/\"|]";

Related

Trim Non-alphanum from beginning and end of string

What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried adding the characters that I do not need manually, but that doesn't work well. I just need to trim anything that is not alphanumeric.
I tried using this function:
string something = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");
But it removes all characters that are non alpha numeric from the string. What I basically want is like this:
"test1" -> test1
#!#!2test# -> 2test
(test3) -> test3
##test4---- -> test4
I do want to support unicode characters but not symbols..
EDIT:
The output of the example should be:
Littering aaaannnndóú
Regards
Assuming you want to trim non-alphanumeric characters from the start and end of your string:
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
.TakeWhile(char.IsLetterOrDigit)
.ToArray());
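One caveat with the LINQ snippet above: TakeWhile stops at the first non-alphanumeric character it meets, so an input like "a b" would come back as "a". A minimal sketch that trims from both ends and leaves the middle untouched might look like this (AlnumTrim and TrimNonAlphanumeric are hypothetical names):

```csharp
using System;

public static class AlnumTrim
{
    // Removes leading and trailing characters that are not letters or digits.
    // Characters between the first and last alphanumeric are kept as-is.
    public static string TrimNonAlphanumeric(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        while (start <= end && !char.IsLetterOrDigit(s[start])) start++;
        while (end >= start && !char.IsLetterOrDigit(s[end])) end--;

        return s.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        Console.WriteLine(TrimNonAlphanumeric("(test3)"));     // test3
        Console.WriteLine(TrimNonAlphanumeric("##test4----")); // test4
        Console.WriteLine(TrimNonAlphanumeric("--a b--"));     // a b
    }
}
```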
@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+","$1"
You can use String function String.Trim Method (Char[]) in .NET library to trim the unnecessary characters from the given string.
From MSDN : String.Trim Method (Char[])
Removes all leading and trailing occurrences of a set of characters
specified in an array from the current String object.
Before trimming the unwanted characters, you first need to identify whether each character is a letter or digit; if it is non-alphanumeric, you can use the String.Trim Method (Char[]) function to remove it.
You need to use the Char.IsLetterOrDigit() function to identify whether a character is alphanumeric or not.
From MSDN: Char.IsLetterOrDigit()
Indicates whether a Unicode character is categorized as a letter or a
decimal digit.
Try This:
string str = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
foreach (char ch in str)
{
if (!char.IsLetterOrDigit(ch))
str = str.Trim(ch);
}
Output:
1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9
If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.
string result = string.Concat(input.Where(char.IsLetterOrDigit));
which you can easily convert into an extension method
public static class Extensions
{
public static string ToAlphaNum(this string input)
{
return string.Concat(input.Where(char.IsLetterOrDigit));
}
}
that you can use like this :
string testString = "#!#!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123
Note: this will remove every non-alphanumeric character from your string, if you really need to remove only those at the beginning/end, please add more details about what defines a beginning or an end and add more examples.
And you could also replace all the non-letters/numbers at the beginning and/or end of the line:
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
used as
resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
If you really want to remove characters only at the beginning and end of the whole string, rather than line by line, remove the "^$ match at line breaks" option (RegexOptions.Multiline).
If you wanted to include leading or trailing underscores, as characters to be retained, you could simplify the regex to:
^\W+|\W+$
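The practical difference between the two patterns is how they treat the underscore, which \w counts as a word character. A short sketch comparing them, without the Multiline option and with hypothetical class and method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTrimDemo
{
    // Trims everything that is not a Unicode letter or number, underscores included.
    public static string TrimStrict(string s)
    {
        return Regex.Replace(s, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");
    }

    // Trims non-word characters only, so leading/trailing underscores survive.
    public static string TrimKeepUnderscores(string s)
    {
        return Regex.Replace(s, @"^\W+|\W+$", "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimStrict("--_test_--"));          // test
        Console.WriteLine(TrimKeepUnderscores("--_test_--")); // _test_
    }
}
```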
The core of the regex:
[^\p{L}\p{N}]
is a negated character class: it matches any character that is not in the Unicode category of letters \p{L} or numbers \p{N}.
In other words:
Trim non-unicode alphanumeric characters
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture
Match this alternative «^[^\p{L}\p{N}]*»
Assert position at the beginning of a line «^»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Assert position at the end of a line «$»
Created with RegexBuddy
Without using regex:
In Java, you could do: (in c# syntax would be nearly the same with same functionality)
while (true) {
if (word.length() == 0) {
return ""; // bad
}
if (!Character.isLetter(word.charAt(0))) {
word = word.substring(1);
continue; // so we are doing front first
}
if (!Character.isLetter(word.charAt(word.length()-1))) {
word = word.substring(0, word.length()-1);
continue; // then we are doing end
}
break; // if front is done, and end is done
}
you could use this pattern
^[^[:alnum:]]+|[^[:alnum:]]+$
with g option
Demo

Detecting "\" (backslash) using Regex

I have a C# Regex like
[\"\'\\/]+
that I want to use to evaluate and return error if certain special characters are found in a string.
My test string is:
\test
I have a call to this method to validate the string:
public static bool validateComments(string input, out string errorString)
{
errorString = null;
bool result;
result = !Regex.IsMatch(input, "[\"\'\\/]+"); // result is true if no match
// return an error if match
if (result == false)
errorString = "Comments cannot contain quotes (double or single) or slashes.";
return result;
}
However, I am unable to match the backslash. I have tried several tools such as regexpal and a VS2012 extension that both seem to match this regex just fine, but the C# code itself won't. I do realize that C# is escaping the string as it is coming in from a Javascript Ajax call, so is there another way to match this string?
It does match /test or 'test or "test, just not \test
The \ is an escape character in regexes as well as in C# string literals, so it has to be escaped twice. Try "[\"\'\\\\/]+".
Note that you could write @"[""'\\/]+" instead, which is perhaps more readable :-) (with the @ prefix, the only character you have to escape is the ", by doubling it as "").
You don't really need the +, because in the end [...] means "one of", and it's enough for you.
Don't eat what you can't chew... Instead of regexes use
// result is true if no match
result = input.IndexOfAny(new[] { '"', '\'', '\\', '/' }) == -1;
I don't think anyone ever lost the work because he preferred IndexOf instead of a regex :-)
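For comparison, here is a small sketch that puts the IndexOfAny check next to the regex written as a verbatim string. Both are intended to reject the same four characters; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentCheck
{
    private static readonly char[] Forbidden = { '"', '\'', '\\', '/' };

    // True when the input contains none of the forbidden characters.
    public static bool IsValidIndexOf(string input)
    {
        return input.IndexOfAny(Forbidden) == -1;
    }

    // Same check with a regex; the verbatim literal yields the regex ["'\\/]
    public static bool IsValidRegex(string input)
    {
        return !Regex.IsMatch(input, @"[""'\\/]");
    }

    public static void Main()
    {
        Console.WriteLine(IsValidIndexOf(@"\test")); // False
        Console.WriteLine(IsValidRegex(@"\test"));   // False
        Console.WriteLine(IsValidIndexOf("test"));   // True
        Console.WriteLine(IsValidRegex("test"));     // True
    }
}
```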
You can solve this by making the string verbatim with @, like this:
result = !Regex.IsMatch(input, @"[\""\'\\/]+");
Since backslashes are used as escapes inside regex themselves, I find it best to use verbatim strings when working with the regex library:
string input = @"\test";
bool result = !Regex.IsMatch(input, @"[""'\\]+");
// ^^
// You need to double the double-quotes when working with verbatim strings;
// All other characters, including backslashes, remain unchanged.
if (!result) {
Console.WriteLine("Comments cannot contain quotes (double or single) or slashes.");
}
The only issue with that is that you must double your double-quotes (which is ironically what you need to do in your case).
Demo on ideone.
For the trivial case, I am able to use regexhero.net for your test expression using the simple:
\\
to validate
\test
The code generated by RegExHero:
string strRegex = @"\\";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"\test";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}

Replacing double backwards slashes with single ones in c#

I need to replace double backslashes with single ones so that something like this
\\\\servername\\dir1\\subdir1\\
becomes
\\servername\dir1\subdir1\
I tried this
string dir = "\\\\servername\\dir1\\subdir1\\";
string s = dir.Replace(@"\\", @"\");
The result I get is
\\servername\\dir1\\subdir1\\
Any ideas?
You don't need to replace anything here. The backslashes are escaped, that's why they are doubled.
Just like \t represents a tabulator, \\ represents a single \. You can see the full list of Escape Sequences on MSDN.
string dir = "\\\\servername\\dir1\\subdir1\\";
Console.WriteLine(dir);
This will output \\servername\dir1\subdir1\.
BTW: You can use the verbatim string to make it more readable:
string dir = @"\\servername\dir1\subdir1\";
There is no problem with the code for replacing. The result that you get is:
\servername\dir1\subdir1\
When you are looking at the result in the debugger, it's shown as it would be written as a literal string, so a backslash character is shown as two backslash characters.
The string that you create isn't what you think it is. This code:
string dir = "\\\\servername\\dir1\\subdir1\\";
produces a string containing:
\\servername\dir1\subdir1\
The replacement code does replace the \\ at the beginning of the string.
If you want to produce the string \\\\servername\\dir1\\subdir1\\, you use:
string dir = @"\\\\servername\\dir1\\subdir1\\";
or:
string dir = "\\\\\\\\servername\\\\dir1\\\\subdir1\\\\";
The string "\\\\servername\\dir1\\subdir1\\" is the same as @"\\servername\dir1\subdir1\". To put a backslash in a string you either prefix the literal with the @ symbol or write a double backslash instead of a single one.
Why do you need that? Because in C# the backslash introduces escape sequences.
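The points made in these answers can be verified with a few lines. This is a minimal sketch with hypothetical names; it compares the escaped and verbatim literals and shows what the question's Replace call actually does to the string.

```csharp
using System;

public static class UncPathDemo
{
    // Regular literal: each \\ in source is one backslash in the string.
    public const string Escaped = "\\\\servername\\dir1\\subdir1\\";

    // Verbatim literal: backslashes are taken as written.
    public const string Verbatim = @"\\servername\dir1\subdir1\";

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim); // True
        Console.WriteLine(Escaped);             // \\servername\dir1\subdir1\

        // Only the leading pair of backslashes is a real double backslash,
        // so this is the only place the replacement changes anything.
        Console.WriteLine(Escaped.Replace(@"\\", @"\")); // \servername\dir1\subdir1\
    }
}
```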

Why doesn't my code compile?

I am using regular expression in code behind file and defining string as
string ValEmail = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
if (Regex.IsMatch(email, "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"))
{ }
else
{ }
It gives me warning and does not compile. How can I define such string combination?.
In C# the backslash is a special character, if it is to represent a backslash we need to inform the compiler as such.
This can be achieved by escaping it with a backslash:
string ValEmail = "\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*";
Or using an @ prefix when constructing the string:
string ValEmail = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
The backslash is the escape character in C# strings. You have to escape each backslash with another backslash ("\\") or just add an @ before your string:
string ValEmail = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
Use @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*" so that the backslashes are taken literally and need no escaping.
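As a quick check that both spellings compile to the same pattern, here is a small sketch. The class name, constant names, and the sample addresses are hypothetical; the pattern itself is the one from the question and is used unanchored, as in the original call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternDemo
{
    // Regular literal: every regex backslash is doubled.
    public const string EscapedPattern =
        "\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*";

    // Verbatim literal: the regex is written exactly as the engine sees it.
    public const string VerbatimPattern =
        @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    public static bool LooksLikeEmail(string email)
    {
        return Regex.IsMatch(email, VerbatimPattern);
    }

    public static void Main()
    {
        Console.WriteLine(EscapedPattern == VerbatimPattern);  // True
        Console.WriteLine(LooksLikeEmail("user@example.com")); // True
        Console.WriteLine(LooksLikeEmail("no-at-sign"));       // False
    }
}
```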

How do i strip special characters from the end of a string?

I need to strip unknown characters from the end of a string returned from an SQL database. I also need to log when a special character occurs in the string.
What's the best way to do this?
You can use the Trim() method to trim blanks or specific characters from the end of a string. If you need to trim a certain number of characters you can use the Substring() method. You can use Regexs (System.Text.RegularExpressions namespace) to match patterns in a string and detect when they occur. See MSDN for more info.
If you need more help you'll need to provide a bit more info on what exactly you're trying to do.
First define what are unknown characters (chars other than 0-9, a to z and A to Z ?) and put them in an array
Loop through the characters of the string and check whether each one occurs in that array; if so, remove it.
You can also call String.Replace with the unknown character as the first parameter and an empty string as the replacement.
Since you've specified that the legal characters are only alphanumeric, you could do something like this:
Match m = Regex.Match(original, "^([0-9A-Za-z]*)(.*)$");
string good = m.Groups[1].Value;
string bad = m.Groups[2].Value;
if (bad.Length > 0)
{
// log bad characters
}
Console.WriteLine(good);
Your definition of the problem is not precise yet this is a fast trick to do so:
string input;
...
var trimed = input.TrimEnd(new[] {'#','$',...} /* array of unwanted characters */);
if(trimed != input)
myLogger.Log(input.Replace(trimed, ""));
check out the Regex.Replace methods...there are lots of overloads. You can use the Match methods for the logging to identify all matches.
String badString = "HELLO WORLD!!!!";
Regex regex = new Regex("!{1,}$" );
String newString = regex.Replace(badString, String.Empty);
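If "special" is taken to mean anything that is not a letter or digit (an assumption, since the question does not define it), stripping the tail and capturing what was removed for logging could be sketched as follows. The names TrailingStrip and StripTrailing are hypothetical, and Console.WriteLine stands in for whatever logger is in use.

```csharp
using System;

public static class TrailingStrip
{
    // Returns the input without its trailing non-alphanumeric run;
    // the removed run is handed back through 'removed' so it can be logged.
    public static string StripTrailing(string original, out string removed)
    {
        int end = original.Length;
        while (end > 0 && !char.IsLetterOrDigit(original[end - 1])) end--;

        removed = original.Substring(end);
        return original.Substring(0, end);
    }

    public static void Main()
    {
        string removed;
        string clean = StripTrailing("HELLO WORLD!!!!", out removed);

        Console.WriteLine(clean); // HELLO WORLD
        if (removed.Length > 0)
        {
            Console.WriteLine("Stripped: " + removed); // Stripped: !!!!
        }
    }
}
```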
