Regex replace special character - c#

I need help with my regex.
I need to remove the special characters found at the start of a word.
For example, I have a text like this:
.just a $#text this should not be incl#uded
The output should be like this
just a text this should not be incl#uded
I've been testing my regex here, but I can't make it work:
([\!-\/\;-\@]+)[\w\d]+
How do I limit the regex to check only the text that starts with special characters?
Thank you

Use \B[!-/;-@]+\s*\b:
var result = Regex.Replace(s, @"\B[!-/;-@]+\s*\b", "");
See the regex demo
Details
\B - a position other than a word boundary (here, there must be the start of the string or a non-word char immediately to the left of the current position)
[!-/;-@]+ - 1 or more ASCII punctuation or symbol chars
\s* - 0+ whitespace chars
\b - a word boundary; there must be a letter/digit/underscore immediately to the right of the current location.
If you plan to remove all punctuation and symbols, use
var result = Regex.Replace(s, @"\B[\p{P}\p{S}]+\s*\b", "");
See another regex demo.
Note that \p{P} matches any punctuation character and \p{S} matches any symbol.
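To make the difference between the two patterns concrete, here is a minimal sketch that runs both on sample input. The helper names StripAsciiPrefix and StripUnicodePrefix are hypothetical, the ASCII ranges are assumed to be !-/ and ;-@, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers wrapping the two patterns from the answer.
static string StripAsciiPrefix(string s) =>
    Regex.Replace(s, @"\B[!-/;-@]+\s*\b", "");

static string StripUnicodePrefix(string s) =>
    Regex.Replace(s, @"\B[\p{P}\p{S}]+\s*\b", "");

string ascii = ".just a $#text this should not be incl#uded";
Console.WriteLine(StripAsciiPrefix(ascii));
// prints: just a text this should not be incl#uded

// "\u00A1" is an inverted exclamation mark, "\u20AC" is the euro sign.
string unicode = "\u00A1hola \u20AC5 a-b";
// The ASCII ranges do not cover these characters, so nothing is removed.
Console.WriteLine(StripAsciiPrefix(unicode) == unicode);        // True
// The Unicode categories cover both; the hyphen inside "a-b" is kept.
Console.WriteLine(StripUnicodePrefix(unicode) == "hola 5 a-b"); // True
```

The # inside incl#uded and the hyphen inside a-b survive because \B fails there: a word character sits immediately to the left.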

Use a lookbehind:
(^[.$#]+|(?<= )[.$#]+)
The ^[.$#]+ part matches the special characters at the start of the string.
The (?<= )[.$#]+ part matches the special characters at the start of a word inside the sentence, that is, right after a space.
Add your own special characters to the character class [] as needed.
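As a minimal sketch of how this pattern could be applied in C#, the snippet below passes it to Regex.Replace. The helper name StripLeadingSymbols is hypothetical, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes runs of '.', '$' or '#' found at the start
// of the string or immediately after a space.
static string StripLeadingSymbols(string text) =>
    Regex.Replace(text, @"^[.$#]+|(?<= )[.$#]+", "");

Console.WriteLine(StripLeadingSymbols(".just a $#text this should not be incl#uded"));
// prints: just a text this should not be incl#uded
```

The # in incl#uded is left alone because it is neither at the start of the string nor preceded by a space.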

Following are two possible options based on your question details. Hope it helps.
string input = ".just a $#text this should not be incl#uded";
// Remove all the special characters from the whole string
string output1 = Regex.Replace(input, @"[^0-9a-zA-Z\ ]+", "");
// Remove leading special characters from each word in the string; other special characters are kept
// (Select requires using System.Linq)
var split = input.Split();
string output2 = string.Join(" ", split.Select(s => Regex.Replace(s, @"^[^0-9a-zA-Z]+", "")));

A negative lookahead is fine here:
(?![\.\$#].*)[\S]+
https://regex101.com/r/i0aacp/11/
[\S]+ matches one or more non-whitespace characters
(?![\.\$#].*) is a negative lookahead: the run matched by [\S]+ must not start with any of . $ #
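Unlike the replace-based answers, this pattern matches the parts to keep, so the matches have to be collected and joined back together. A minimal sketch, assuming single spaces between words are acceptable in the result; the helper name KeepPlainWords is hypothetical and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every match of the negative-lookahead
// pattern and joins the matches with single spaces.
static string KeepPlainWords(string text)
{
    MatchCollection matches = Regex.Matches(text, @"(?![\.\$#].*)[\S]+");
    return string.Join(" ", matches.Cast<Match>().Select(m => m.Value));
}

Console.WriteLine(KeepPlainWords(".just a $#text this should not be incl#uded"));
// prints: just a text this should not be incl#uded
```

Each match starts at the first character of a word that is not one of . $ #, which is why the leading symbols drop out while incl#uded is kept whole.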

Related

How to split Alphanumeric with Symbol in C#

I want to split an alphanumeric string into two parts, a numeric part and an alphabetic part, separated by a special character such as -
string mystring = "1- Any Thing";
I want to store them like this:
numberPart = 1
alphaPart = Any Thing
For this I am using Regex:
Regex re = new Regex(@"([a-zA-Z]+)(\d+)");
Match result = re.Match("1- Any Thing");
string alphaPart = result.Groups[1].Value;
string numberPart = result.Groups[2].Value;
If there is no space in the string, it works fine, but with a space and a symbol both alphaPart and numberPart come back empty. What am I doing wrong? My Regex expression might be wrong for this type of filter; please advise.
Try this:
(\d+)(?:[^\w]+)?([a-zA-Z\s]+)
Demo
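Explanation:
(\d+) - captures one or more digits
[^\w]+ - matches one or more non-word characters (anything other than letters, digits and underscore)
? - makes the preceding group optional, so there may or may not be anything between the number and the word (for example, when there is no space between them)
[a-zA-Z\s]+ - matches letters, including any whitespace between them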
Start of string is matched with ^.
Digits are matched with \d+.
Any non-alphanumeric characters are matched with [\W_] or \W.
Anything is matched with .*.
Use
(?s)^(\d+)\W*(.*)
See proof
(?s) makes . match linebreaks. So, it literally matches everything.
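A minimal sketch of how the (?s)^(\d+)\W*(.*) pattern could be used to fill the two variables from the question. The helper name SplitNumberAndText and the choice to return empty strings when nothing matches are assumptions; the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: group 1 is the leading number, group 2 is everything
// after the separator characters. Returns two empty strings on no match.
static (string Number, string Text) SplitNumberAndText(string input)
{
    Match m = Regex.Match(input, @"(?s)^(\d+)\W*(.*)");
    return m.Success ? (m.Groups[1].Value, m.Groups[2].Value) : ("", "");
}

var (numberPart, alphaPart) = SplitNumberAndText("1- Any Thing");
Console.WriteLine(numberPart); // prints: 1
Console.WriteLine(alphaPart);  // prints: Any Thing
```

Because \W* may match zero characters, an input without a separator such as "12Abc" still splits into 12 and Abc.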

Remove Dashes but Not Hyphens

I want to remove dashes before, after, and between spaced words, but not hyphenated words.
This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.
should become:
This is a test-sentence. Test One-Two--Three---Four.
Remove multiple dashes ---.
Keep multiple hyphens Three---Four.
I was trying to do it with this:
http://rextester.com/SXQ57185
string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
string regex = @"(?<!\w)\-(?!\-)|(?<!\-)\-(?!\w)";
sentence = Regex.Replace(sentence, regex, "");
Console.WriteLine(sentence);
But the output is:
This is a test-sentence. Test - One-TwoThree-Four--.
What I would recommend is a combination of a positive lookbehind and a positive lookahead against the characters that you don't want the dashes to be next to. In your case, that would be spaces and full stops. If either the lookbehind or the lookahead matches, you want to remove that dash.
This would be: ((?<=[\s\.])\-+)|(\-+(?=[\s\.])).
Breaking this down:
((?<=[\s\.])\-+) - match hyphens that follow either a space or a full stop
| - or
(\-+(?=[\s\.])) - match hyphens that are followed by either a space or a full stop
Here's a JavaScript example showcasing that:
const string = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';
const regex = /((?<=[\s\.])\-+)|(\-+(?=[\s\.]))/g;
console.log(string.replace(regex, ''));
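And this can also be seen on Regex101.
Note that removing the standalone dashes leaves doubled spaces behind, so you'll probably also want to collapse them afterwards. .Trim() in C# only strips whitespace from the ends of the string, so for the spaces in the middle use something like Regex.Replace(s, @"\s{2,}", " ").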
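The same lookaround pattern can be sketched in C#, which also supports lookbehind. The second Regex.Replace that collapses runs of spaces is an addition of this sketch, not part of the pattern above; the helper name RemoveLooseDashes is hypothetical and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes dashes that touch a space or a full stop,
// then collapses the runs of spaces left behind into single spaces.
static string RemoveLooseDashes(string sentence)
{
    string noDashes = Regex.Replace(sentence, @"(?<=[\s.])-+|-+(?=[\s.])", "");
    return Regex.Replace(noDashes, @" {2,}", " ").Trim();
}

string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
Console.WriteLine(RemoveLooseDashes(sentence));
// prints: This is a test-sentence. Test One-Two--Three---Four.
```

Without the second replace the result would contain two doubled spaces, one between "is" and "a" and one between "Test" and "One".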
You can use \b|\s for this task.
/(\b|\s)(-{3})(\b|\s)/g
DEMO
Breakdown shamelessly copied from regex101.com:
/(\b|\s)(-{3})(\b|\s)/g
1st Capturing Group (\b|\s)
1st Alternative \b
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
2nd Alternative \s
\s matches any whitespace character (equal to [\r\n\t\f\v ])
2nd Capturing Group (-{3})
-{3} matches the character - literally (case sensitive)
{3} Quantifier — Matches exactly 3 times
3rd Capturing Group (\b|\s)
1st Alternative \b
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
2nd Alternative \s
\s matches any whitespace character (equal to [\r\n\t\f\v ])
You may just match all hyphens in between word chars, and remove all others with a simple
Regex.Replace(s, @"\b(-+)\b|-", "$1")
See the regex demo
Details
\b(-+)\b - word boundary, followed with 1+ hyphens, and then again a word boundary (that is, hyphen(s) in between letters, digits and underscores)
| - or
- - a hyphen in other contexts (it will be removed).
See the C# demo:
var s = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
var result = Regex.Replace(s, @"\b(-+)\b|-", "$1");
Console.WriteLine(result);
// => This is  a test-sentence. Test  One-Two--Three---Four.
// (doubled spaces remain where standalone dashes were removed)

C# RegEx Keep Line Breaks

I need to remove all special characters except line breaks. Does anyone know of a RegEx that would accomplish this task? Here is my RegEx:
string b = "ABC\r\nVVV";
string a = Regex.Replace(b, "[^\\x20-\\x7E]", "");
You match any char other than a char from space to tilde with "[^\\x20-\\x7E]". So, it matches CR and LF symbols. To avoid matching those chars, add them to the character class, and it is best to add + after the ] to match 1 or more occurrences to remove whole sequences at once:
string a = Regex.Replace(b, "[^\\x20-\\x7E\r\n]+", "");
See the regex demo at RegexStorm.
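A minimal sketch that exercises the corrected pattern on input that actually contains characters to remove. The helper name StripNonPrintable and the sample string are assumptions; the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes everything outside printable ASCII
// (space through tilde) except carriage return and line feed.
static string StripNonPrintable(string text) =>
    Regex.Replace(text, "[^\\x20-\\x7E\r\n]+", "");

// \u0001 is a control character, \u00E9 is a non-ASCII letter, \t is a tab.
string cleaned = StripNonPrintable("ABC\u0001\u00E9\r\nVVV\t");
Console.WriteLine(cleaned == "ABC\r\nVVV"); // True
```

In a regular (non-verbatim) string literal the \r and \n are placed into the character class as the actual CR and LF characters, which the regex engine treats as literals.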

Escape special characters in xml

Using Regex, how can I escape special characters in xml attribute values?
Given the following xml as string:
"<node attr=\"<Sample>\"></node>"
I want to get:
"<node attr=\"&lt;Sample&gt;\"></node>"
The System.Security.SecurityElement.Escape function won't work, as it tries to escape every special character (including the tags' opening/closing angle brackets).
string text = "<node attr=\"<Sample>\"></node>";
string pattern = @"(?<=\b\w+\s*=\s*"")<\w+>(?="")";
string result = Regex.Replace(text, pattern, m => SecurityElement.Escape(m.Value));
Console.WriteLine(text);
Console.WriteLine(result);
Where:
?<= - positive lookbehind
\b - start the match at a word boundary
\w+ - match one or more word characters
\s* - match zero or more white-space characters
?= - positive lookahead

.NET RegEx for letters and spaces

I am trying to create a regular expression in C# that allows only alphanumeric characters and spaces. Currently, I am trying the following:
string pattern = @"^\w+$";
Regex regex = new Regex(pattern);
if (regex.IsMatch(value) == false)
{
// Display error
}
What am I doing wrong?
If you just need English, try this regex:
"^[A-Za-z ]+$"
The brackets specify a set of characters
A-Z: All capital letters
a-z: All lowercase letters
' ': Spaces
If you need Unicode / internationalization, you can try this regex:
@"^[\p{L}\s]+$"
See https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#word-character-w
This regex will match all unicode letters and spaces, which may be more than you need, so if you just need English / basic Roman letters, the first regex will be simpler and faster to execute.
Note that in both regexes I have included the ^ and $ anchors, which match at the start and end of the string. If you need to pull this out of a larger string and it doesn't need to be the entire string, you can remove those two anchors.
Try this for all letters with spaces:
@"^[\p{L} ]+$"
The character class \w does not match spaces. Try replacing it with [\w ] (there's a space after the \w) to match word characters and spaces. You could also replace the space with \s if you want to match any whitespace.
If, other than 0-9, a-z and A-Z, you also need to cover accented letters like ï, é, æ, Ć or Ş, then you should use the Unicode categories \p{...} for matching, i.e. (note the space):
string pattern = @"^[\p{L}\p{Nd} ]+$";
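A minimal validation sketch along these lines, using the Unicode categories \p{L} (letters) and \p{Nd} (decimal digits), which .NET supports. The helper name IsLettersDigitsSpaces and the sample values are assumptions; the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the whole value consists only of
// Unicode letters, decimal digits and plain spaces.
static bool IsLettersDigitsSpaces(string value) =>
    Regex.IsMatch(value, @"^[\p{L}\p{Nd} ]+$");

Console.WriteLine(IsLettersDigitsSpaces("Caf\u00E9 42"));  // True: accented letter is allowed
Console.WriteLine(IsLettersDigitsSpaces("snake_case"));    // False: underscore is rejected, unlike with \w
Console.WriteLine(IsLettersDigitsSpaces("tab\there"));     // False: only the plain space is allowed
Console.WriteLine(IsLettersDigitsSpaces(""));              // False: + requires at least one character
```

Note that in .NET $ also matches just before a final newline, so a value ending in "\n" would pass; replacing $ with \z is the stricter choice if that matters.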
This regex works great for me.
Regex rgx = new Regex("[^a-zA-Z0-9_ ]+");
if (rgx.IsMatch(yourstring))
{
var err = "Special characters are not allowed in Tags";
}
