How to escape '-' (minus) with Regex? C# [duplicate]

I'm trying to write a method that replaces all occurrences of the characters in the input array (charsToReplace) with the replacementCharacter using regex. The version I have written does not work if the array contains any characters that may change the meaning of the regex pattern, such as ']' or '^'.
public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
{
    if(str.IsNullOrEmpty())
    {
        return string.Empty;
    }
    var pattern = $"[{new string(charsToReplace)}]";
    return Regex.Replace(str, pattern, replacementCharacter.ToString());
}
So ReplaceAll("/]a", new[] { '/', ']' }, 'a') should return "aaa".

Inside a character class, only four characters require escaping: ^, -, ] and \. You can't use Regex.Escape here because it does not escape - and ], as they are not "special" outside a character class. Regex.Escape is meant only for literal characters (or sequences) that appear outside character classes.
An unescaped ] closes your character class prematurely, and that is the main reason your code does not work.
So, the fixed pattern variable definition can look like
var pattern = $"[{string.Concat(charsToReplace).Replace(@"\", @"\\").Replace("-", @"\-").Replace("^", @"\^").Replace("]", @"\]")}]";
See an online C# demo.
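A minimal sketch of the same idea as a reusable helper, assuming you only need the four characters listed above escaped. The class and method names (CharClassEscaping, EscapeForCharClass) are hypothetical, and a MatchEvaluator is used so that a replacement character such as $ is not treated as a substitution token:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class CharClassEscaping
{
    // Hypothetical helper: escapes only the characters that are special
    // inside a .NET character class: \ ] ^ -
    public static string EscapeForCharClass(char[] chars)
    {
        var sb = new StringBuilder(chars.Length * 2);
        foreach (char c in chars)
        {
            if (c == '\\' || c == ']' || c == '^' || c == '-')
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
    {
        if (string.IsNullOrEmpty(str) || charsToReplace == null || charsToReplace.Length == 0)
        {
            return str;
        }
        string pattern = "[" + EscapeForCharClass(charsToReplace) + "]";
        string replacement = replacementCharacter.ToString();
        // A MatchEvaluator returns the text verbatim, so '$' is never
        // interpreted as a substitution token.
        return Regex.Replace(str, pattern, m => replacement);
    }
}
```

With this helper, ReplaceAll("/]a", new[] { '/', ']' }, 'a') builds the pattern [/\]] and returns "aaa".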

I suggest using LINQ, not regular expressions:
using System.Linq;
...
public static string ReplaceAll(
    string str, char[] charsToReplace, char replacementCharacter)
{
    // Note the IsNullOrEmpty syntax: it is a static method, string.IsNullOrEmpty(str)
    // we should validate charsToReplace as well
    if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
        return str; // let's just do nothing (say, not turn null into empty string)
    return string.Concat(str.Select(c => charsToReplace.Contains(c)
        ? replacementCharacter
        : c));
}
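If charsToReplace can be large, a HashSet<char> avoids the linear Contains scan for every character of the input. This is a hypothetical variant (CharReplacer is an illustrative name), not part of the answer above; since no regex is involved, no escaping is needed at all:

```csharp
using System.Collections.Generic;

public static class CharReplacer
{
    // Hypothetical variant: same behaviour as the LINQ version, but with
    // O(1) set lookups instead of scanning charsToReplace per character.
    public static string ReplaceAll(string str, char[] charsToReplace, char replacementCharacter)
    {
        if (string.IsNullOrEmpty(str) || charsToReplace == null || charsToReplace.Length == 0)
        {
            return str;
        }
        var set = new HashSet<char>(charsToReplace);
        char[] buffer = str.ToCharArray();
        for (int i = 0; i < buffer.Length; i++)
        {
            if (set.Contains(buffer[i]))
            {
                buffer[i] = replacementCharacter;
            }
        }
        return new string(buffer);
    }
}
```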
If you insist on Regex, note that we should Regex.Escape the chars within charsToReplace. However, according to the documentation, Regex.Escape doesn't escape - and ], which have special meaning inside regular expression brackets.
public static string ReplaceAll(
    string str, char[] charsToReplace, char replacementCharacter) {
    if (string.IsNullOrEmpty(str) || null == charsToReplace || charsToReplace.Length <= 0)
        return str;
    string charsSet = string.Concat(charsToReplace
        .Select(c => new char[] { ']', '-' }.Contains(c) // in case of '-' and ']'
            ? $@"\{c}"                                      // escape them as well
            : Regex.Escape(c.ToString())));
    return Regex.Replace(
        str,
        $"[{charsSet}]+",
        m => new string(replacementCharacter, m.Length));
}
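To see which characters Regex.Escape leaves alone, a quick check like the following can help (EscapeCheck is an illustrative name). Based on the documented escape set, ] and - come back unchanged while [ and ^ are escaped, which is why the code above handles ] and - by hand:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeCheck
{
    // True when Regex.Escape returns the character unchanged.
    public static bool IsLeftUnescaped(char c)
    {
        string s = c.ToString();
        return Regex.Escape(s) == s;
    }

    // Prints the result for the characters discussed above.
    public static void PrintReport()
    {
        foreach (char c in new[] { ']', '-', '[', '^', '\\' })
        {
            Console.WriteLine($"'{c}' left unescaped by Regex.Escape: {IsLeftUnescaped(c)}");
        }
    }
}
```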

Related

C# How to add a hyphen before each uppercase character in a string

How do I find the uppercase letters of an existing string and add (-) before each of them?
string inputStr = "fhkSGJndjHkjsdA";
string outputStr = String.Concat(inputStr.Where(x => Char.IsUpper(x)));
Console.WriteLine(outputStr);
Console.ReadKey();
This code finds the uppercase letters and prints them on the screen, but I want it to print:
fhk-S-G-Jndj-Hkjsd-A
How can I achieve this?

I think that using a RegEx would be much easier:
string outputStr = Regex.Replace(inputStr, "([A-Z])", "-$1");

Another option using LINQ's Aggregate:
using System;
using System.Linq;
using System.Text;
...
string inputStr = "fhkSGJndjHkjsdA";
var result = inputStr.Aggregate(new StringBuilder(),
    (acc, symbol) =>
    {
        if (Char.IsUpper(symbol))
        {
            acc.Append('-');
            acc.Append(symbol);
        }
        else
        {
            acc.Append(symbol);
        }
        return acc;
    }).ToString();
Console.WriteLine(result);

Where filters a sequence of values based on a predicate, and String.Concat then concatenates the remaining values, giving you SGJHA.
Instead, you could use Select: check each character, and return it prefixed with a - if it is uppercase, or the same char as a string otherwise.
string inputStr = "fhkSGJndjHkjsdA";
String outputStr = String.Concat(inputStr.Select(c => Char.IsUpper(c) ? "-" + c : c.ToString()));
Console.WriteLine(outputStr);
Output
fhk-S-G-Jndj-Hkjsd-A
C# demo
To also find Unicode uppercase characters with a regex, you could use \p{Lu}, which matches any letter in the Unicode "uppercase letter" category, the same category Char.IsUpper checks.
In the replacement, you can use the full match $0 prefixed with a -:
string inputStr = "fhkSGJndjHkjsdA";
string outputStr = Regex.Replace(inputStr, @"\p{Lu}", "-$0");
Console.WriteLine(outputStr);
Output
fhk-S-G-Jndj-Hkjsd-A
C# demo
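The difference between [A-Z] and \p{Lu} only shows up with non-ASCII input. A small sketch comparing the two patterns (the HyphenateUpper class is hypothetical) on a string containing the uppercase letter Ä:

```csharp
using System.Text.RegularExpressions;

public static class HyphenateUpper
{
    // ASCII-only: only A..Z get a hyphen in front of them.
    public static string AsciiOnly(string input) =>
        Regex.Replace(input, "[A-Z]", "-$0");

    // Unicode-aware: any letter in the uppercase category (Lu), e.g. 'Ä'.
    public static string UnicodeAware(string input) =>
        Regex.Replace(input, @"\p{Lu}", "-$0");
}
```

For "abcÄxY", AsciiOnly leaves the Ä alone and only hyphenates Y, while UnicodeAware hyphenates both.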

Replace one character but not two in a string

I want to replace single occurrences of a character but not two in a string using C#.
For example, I want to replace & by an empty string but not when the occurrence is &&. Another example: a&b&&c would become ab&&c after the replacement.
If I use a regex like &[^&], it will also match the character after the & and I don't want to replace it.
Another solution I found is to iterate over the string characters.
Do you know a cleaner solution to do that?

To only match one & (not preceded or followed by &), use look-arounds (?<!&) and (?!&):
(?<!&)&(?!&)
See regex demo
You tried to use a negated character class, which still matches (consumes) a character. You need a lookahead/lookbehind to check for the presence or absence of a character without consuming it.
See regular-expressions.info:
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u).
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt.
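A side-by-side sketch (the SingleAmpersand class is illustrative) makes the difference concrete: the negated class also swallows the character after each &, while the lookarounds only inspect the neighbours. For "a&b&&c", the first gives "a&" and the second gives "ab&&c":

```csharp
using System.Text.RegularExpressions;

public static class SingleAmpersand
{
    // Negated class: "&[^&]" consumes the character after '&',
    // so that character is removed as well.
    public static string WithNegatedClass(string s) =>
        Regex.Replace(s, "&[^&]", "");

    // Lookarounds: only check that no '&' precedes or follows,
    // so only the lone '&' itself is removed.
    public static string WithLookarounds(string s) =>
        Regex.Replace(s, "(?<!&)&(?!&)", "");
}
```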

You can match both & and && (or any number of repetitions) and replace only the single one with an empty string:
str = Regex.Replace(str, "&+", m => m.Value.Length == 1 ? "" : m.Value);

You can use this regex: @"(?<!&)&(?!&)"
var str = Regex.Replace("a&b&&c", @"(?<!&)&(?!&)", "");
Console.WriteLine(str); // ab&&c

You can go with this:
public static string replacement(string oldString, char charToRemove)
{
    // Note: this removes only the first occurrence of charToRemove,
    // wherever it is; later occurrences are kept.
    string newString = "";
    bool found = false;
    foreach (char c in oldString)
    {
        if (c == charToRemove && !found)
        {
            found = true;
            continue;
        }
        newString += c;
    }
    return newString;
}
This is as generic as possible.

I would use something like this, which IMO should be better than using Regex:
public static class StringExtensions
{
    public static string ReplaceFirst(this string source, char oldChar, char newChar)
    {
        if (string.IsNullOrEmpty(source)) return source;
        int index = source.IndexOf(oldChar);
        if (index < 0) return source;
        var chars = source.ToCharArray();
        chars[index] = newChar;
        return new string(chars);
    }
}

I'll contribute to this statement from the comments:
in this case, only the substrings with an odd number of '&' will be replaced by all the "&" except the last one. "&&&" would be "&&" and "&&&&" would be "&&&&"
This is a pretty neat solution using balancing groups (though I wouldn't call it particularly clean or easy to read).
Code:
string str = "11&222&&333&&&44444&&&&55&&&&&";
str = Regex.Replace(str, "&((?:(?<2>&)(?<-2>&)?)*)", "$1$2");
Output:
11222&&333&&44444&&&&55&&&&
ideone demo
It always matches the first & (not captured).
If it's followed by an even number of &, they're matched and stored in $1. Within each pair, group 2 is captured by the first & and then subtracted (popped) by the second.
However, if there's an odd number of &, the optional group (?<-2>&)? does not match for the last one, so that capture is not subtracted. Then $2 holds an extra &.
For example, when matching the subject "&&&&", the first char is consumed and isn't captured (1). The second and third chars are matched, but $2 is subtracted (2). For the last char, $2 is captured (3). The last 3 chars were stored in $1, and there's an extra & in $2.
Then, the substitution "$1$2" == "&&&&".
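If the balancing-group pattern is hard to maintain, the same rule (odd-length runs lose one &, even-length runs are kept) can be written with a MatchEvaluator, along the lines of the "&+" answer above. This is an alternative technique, not the balancing-group approach itself; AmpersandRuns is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class AmpersandRuns
{
    // Matches each run of '&' and decides per run:
    // odd length -> drop one '&', even length -> keep the run unchanged.
    public static string Collapse(string s) =>
        Regex.Replace(s, "&+", m => m.Length % 2 == 1 ? m.Value.Substring(1) : m.Value);
}
```

For the sample input "11&222&&333&&&44444&&&&55&&&&&" this produces the same output as the balancing-group version.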

Escape character in C#'s Split()

I am parsing some delimiter separated values, where ? is specified as the escape character in case the delimiter appears as part of one of the values.
For instance, if : is the delimiter and a certain field has the value 19:30, this needs to be written as 19?:30.
Currently, I use string[] values = input.Split(':'); in order to get an array of all values, but after learning about this escape character, this won't work anymore.
Is there a way to make Split take escape characters into account? I have checked the overload methods, and there does not seem to be such an option directly.

string[] substrings = Regex.Split("aa:bb:00?:99:zz", @"(?<!\?):");
which yields:
aa
bb
00?:99
zz
Or, since you probably want to unescape ?: at some point anyway, replace the escaped sequence in the input with another token, split, and then replace the token back.
(This requires the System.Text.RegularExpressions namespace to be used.)
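A sketch of the "replace, split, replace back" idea, assuming ? is the escape character, : the delimiter, and that the placeholder character U+0000 never occurs in the input. EscapedSplit is a hypothetical name, and the sketch does not handle an escaped escape character followed by the delimiter (??:):

```csharp
using System.Linq;

public static class EscapedSplit
{
    // Assumption: '\0' never appears in the input, so it is safe as a placeholder.
    public static string[] Split(string input, char delimiter = ':', char escape = '?')
    {
        const char placeholder = '\0';
        string escapedDelimiter = new string(new[] { escape, delimiter }); // e.g. "?:"
        return input
            .Replace(escapedDelimiter, placeholder.ToString()) // hide escaped delimiters
            .Split(delimiter)                                   // split on the real ones
            .Select(part => part.Replace(placeholder, delimiter)) // restore as plain ':'
            .ToArray();
    }
}
```

EscapedSplit.Split("aa:bb:00?:99:zz") gives aa, bb, 00:99 and zz, with the escape character already removed.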

This kind of stuff is always fun to code without using Regex.
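The following does the trick with one caveat: the escape character always escapes the next character; there is no logic to check for only the valid escape sequences (?; and ??). So the string one?two;three??;four?;five will be split into onetwo, three? and four;five (the ? before t is silently dropped).
using System.Collections.Generic;

// Extension methods must be declared in a non-nested static class.
public static class EscapedSplitExtensions
{
    public static IEnumerable<string> Split(this string text, char separator, char escapeCharacter, bool removeEmptyEntries)
    {
        string buffer = string.Empty;
        bool escape = false; // true right after an unescaped escape character
        foreach (var c in text)
        {
            if (!escape && c == separator)
            {
                // Unescaped separator: emit the current field.
                if (!removeEmptyEntries || buffer.Length > 0)
                {
                    yield return buffer;
                }
                buffer = string.Empty;
            }
            else if (c == escapeCharacter)
            {
                // A doubled escape character yields a literal one.
                escape = !escape;
                if (!escape)
                {
                    buffer = string.Concat(buffer, c);
                }
            }
            else
            {
                // Any other character, including an escaped separator, is kept.
                buffer = string.Concat(buffer, c);
                escape = false;
            }
        }
        if (buffer.Length != 0)
        {
            yield return buffer;
        }
    }
}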

No, String.Split has no such option. You will need to use a regex (exactly which one depends on how you want your "escape character" to behave). In the worst case, I suppose you'll have to do the parsing manually.

check content of string input

How can I check whether my input is a particular kind of string, e.g. no digits, no "/", and so on?

Well, to check that an input is actually an object of type System.String, you can simply do:
bool IsString(object value)
{
    return value is string;
}
To check that a string contains only letters, you could do something like this:
bool IsAllAlphabetic(string value)
{
    foreach (char c in value)
    {
        if (!char.IsLetter(c))
            return false;
    }
    return true;
}
If you wanted to combine these, you could do so:
bool IsAlphabeticString(object value)
{
    string str = value as string;
    return str != null && IsAllAlphabetic(str);
}

If you mean "is the string completely letters", you could do:
string myString = "RandomStringOfLetters";
bool allLetters = myString.All( c => Char.IsLetter(c) );
This is based on LINQ and the Char.IsLetter method.

It's not entirely clear what you want, but you can probably do it with a regular expression. For example to check that your string contains only letters in a-z or A-Z you can do this:
string s = "dasglakgsklg";
if (Regex.IsMatch(s, "^[a-z]+$", RegexOptions.IgnoreCase))
{
    Console.WriteLine("Only letters in a-z.");
}
else
{
    // Not only letters in a-z.
}
If you also want to allow spaces, underscores, or other characters, simply add them between the square brackets in the regular expression. Note that some characters (], \, ^ and -) have a special meaning inside regular expression character classes and need to be escaped with a backslash.
You can also use \p{L} instead of [a-z] to match any Unicode character that is considered to be a letter, including letters in foreign alphabets.
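A sketch of both checks, assuming you want the whole string to consist of letters (LetterCheck is an illustrative name). It anchors with \z instead of $, because in .NET $ also matches just before a final newline, so "abc\n" would otherwise pass:

```csharp
using System.Text.RegularExpressions;

public static class LetterCheck
{
    // ASCII letters only (a-z in either case).
    public static bool IsAsciiLetters(string s) =>
        Regex.IsMatch(s, @"^[a-z]+\z", RegexOptions.IgnoreCase);

    // Any Unicode letter (category L), including accented and non-Latin letters.
    public static bool IsUnicodeLetters(string s) =>
        Regex.IsMatch(s, @"^\p{L}+\z");
}
```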

using System.Linq;
...
bool onlyAlphas = s.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));

Something like this (have not tested) may fit your (vague) requirement.
if (input is string str)
{
    // test for legal characters?
    string pattern = "^[A-Za-z]+$";
    if (Regex.IsMatch(str, pattern))
    {
        // legal string? do something
    }
    // or (requires using System.Linq;)
    if (str.Any(c => !char.IsLetter(c)))
    {
        // NOT legal string
    }
}

C# 3.0 Remove chars from string

I have a string and want to:
remove all characters except all english letters (a..z)
replace all whitespaces sequences with a single whitespace
How would you do that with C# 3.0 ?

Regex (edited)?
string s = "lsg #~A\tSd 2£R3 ad"; // note tab
s = Regex.Replace(s, @"[^a-zA-Z\s]", ""); // drop everything but letters and whitespace first
s = Regex.Replace(s, @"\s+", " ");        // then collapse whitespace runs: "lsg A Sd R ad"

Of course the Regex solution is the best one (I think).
But someone HAS to do it in LINQ, so I had some fun. There you go:
bool inWhiteSpace = false;
string test = "lsg #~A\tSd 2£R3 ad";
var chars = test.Where(c => ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || char.IsWhiteSpace(c))
    .Select(c => {
        // First whitespace char of a run becomes ' ', the rest become char.MinValue
        c = char.IsWhiteSpace(c) ? inWhiteSpace ? char.MinValue : ' ' : c;
        inWhiteSpace = c == ' ' || c == char.MinValue;
        return c;
    })
    .Where(c => c != char.MinValue);
string result = new string(chars.ToArray());
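For comparison, the same state machine written as a plain loop, without the side effect inside Select; LetterCleanerLoop is a hypothetical name. Like the LINQ version, it drops other characters before deciding on spaces, so "a 1 b" yields "a b" rather than two spaces:

```csharp
using System.Text;

public static class LetterCleanerLoop
{
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool lastWasSpace = false;
        foreach (char c in input)
        {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            {
                sb.Append(c);
                lastWasSpace = false;
            }
            else if (char.IsWhiteSpace(c) && !lastWasSpace)
            {
                // First whitespace char of a run becomes a single space.
                sb.Append(' ');
                lastWasSpace = true;
            }
            // Any other character (digits, symbols) is dropped and does not
            // change lastWasSpace.
        }
        return sb.ToString();
    }
}
```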

Using regular expressions of course!
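string myCleanString = Regex.Replace(stringToCleanUp, @"[\W]", "");         // removes every non-word char (keeps letters, digits, _)
// or
string myCleanString = Regex.Replace(stringToCleanUp, @"[^a-zA-Z0-9]", ""); // keeps only ASCII letters and digits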

I think you can do this with a regular expression, as Marc and boekwurm mentioned.
Try these links as well: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
Note: [a-z] is a range of characters. It matches any character in the specified range; for example, “[a-z]” matches any lowercase alphabetic character in the range “a” through “z”.
Regular expressions also provide special characters to represent common character ranges. You could use “[0-9]” to match any numeric digit, or you can use “\d” (which in .NET also matches other Unicode decimal digits). Similarly, “\D” matches any non-digit character. Use “\s” to match any white-space character, and use “\S” to match any non-white-space character.
