Remove punctuation from string with Regex - c#

I'm really bad with Regex but I want to remove all these .,;:'"$##!?/*&^-+ out of a string
string x = "This is a test string, with lots of: punctuations; in it?!.";
How can I do that ?

First, please read here for information on regular expressions. It's worth learning.
You can use this:
Regex.Replace("This is a test string, with lots of: punctuations; in it?!.", #"[^\w\s]", "");
Which means:
[ #Character block start.
^ #Not these characters (letters, numbers).
\w #Word characters.
\s #Space characters.
] #Character block end.
In the end it reads "replace any character that is not a word character or a space character with nothing."

This code shows the full RegEx replace process and gives a sample Regex that only keeps letters, numbers, and spaces in a string - replacing ALL other characters with an empty string:
//Regex to remove all non-alphanumeric characters
System.Text.RegularExpressions.Regex TitleRegex = new
System.Text.RegularExpressions.Regex("[^a-z0-9 ]+",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);
return ParsedString;
And I've also stored the code here for future use:
http://code.justingengo.com/post/Use%20a%20Regular%20Expression%20to%20Remove%20all%20Punctuation%20from%20a%20String
Sincerely,
S. Justin Gengo
http://www.justingengo.com

Related

Regex replace special character

I need help in my regex.
I need to remove the special character found in the start of text
for example I have a text like this
.just a $#text this should not be incl#uded
The output should be like this
just a text this should not be incl#uded
I've been testing my regex here but i can't make it work
([\!-\/\;-\#]+)[\w\d]+
How do I limit the regex to check only the text that starts in special characters?
Thank you
Use \B[!-/;-#]+\s*\b:
var result = Regex.Replace(s, #"\B[!-/;-#]+\s*\b", "");
See the regex demo
Details
\B - the position other than a word boundary (there must be start of string or a non-word char immediately to the left of the current position)
[!-/;-#]+ - 1 or more ASCII punctuation
\s* - 0+ whitespace chars
\b - a word boundary, there must be a letter/digit/underscore immediately to the right of the current location.
If you plan to remove all punctuation and symbols, use
var result = Regex.Replace(s, #"\B[\p{P}\p{S}]+\s*\b", "");
See another regex demo.
Note that \p{P} matches any punctuation symbols and \p{S} matches any symbols.
Use lookahead:
(^[.$#]+|(?<= )[.$#]+)
The ^[.$#]+ is used to match the special characters at the start of a line.
The (?<= )[.$#]+) is used to matching the special characters at the start of a word which is in the sentence.
Add your special characters in the character group [] as you need.
Following are two possible options from your question details. Hope it will help you.
string input = ".just a $#text this should not be incl#uded";
//REMOVING ALL THE SPECIAL CHARACTERS FROM THE WHOLE STRING
string output1 = Regex.Replace(input, #"[^0-9a-zA-Z\ ]+", "");
// REMOVE LEADING SPECIAL CHARACTERS FROM EACH WORD IN THE STRING. WILL KEEP OTHER SPECIAL CHARACTERS
var split = input.Split();
string output2 = string.Join(" ", split.Select(s=> Regex.Replace(s, #"^[^0-9a-zA-Z]+", "")).ToArray());
Negative lookahead is fine here :
(?![\.\$#].*)[\S]+
https://regex101.com/r/i0aacp/11/
[\S] match any character
(?![\.\$#].*) negative lookahead means those characters [\S]+ should not start with any of \.\$#

Make Regex Match word containing spetial characters

My Code is like this:
string currentPageSlug = "securities/EBR#03L$ZZZ";
string patern= #"securities/(\w+)[\#\$]";
string res = Regex.Match(currentPageSlug, patern).Value;
Console.WriteLine(res);
which gives me this result:
securities/EBR#
but I want to get:
securities/EBR#03L$ZZZ
whole word including all special characters (# and $ and maybe others too)
my regex pattern does not seem to work.
Your regex matches words followed by a single special character. You need to include [#$] in the repeating construct +, like this:
string patern= #"securities/((?:\w|[#$])+)";
Note that since # and $ are used inside a character class, it is not necessary to escape them with a backslash \.

Regex to remove dots from string

I have this
regex Regex.Replace(listing.Company, #"[^A-Za-z0-9_\.~]+", "-");
listing.Company is a string, this works but when a string has dots it does not remove them.
Could you please help me out
In your current regex, you have \. in your exclusion, which will cause it to be ignored by Regex.Replace. Also, your regex does nothing to convert the input string to lower case. You can do that afterwards, but doing it before your Replace makes your pattern simpler.
Try this method out:
var output = Regex.Replace(listing.Company.ToLower(), "[^a-z0-9_]+", "-");
Perhaps you are looking for something like this:
string res = Regex.Replace(listing.Company, #"[\W+\.~]", "-");
Here regex engine will look for any character other than A-Z, a-z, underscore along with dot and ~ and will replace it with "-".
Demo
try
Regex.Replace(listing.Company.ToLower(), #"[^a-z0-9_]+", "-");
you are excluding \. which is for dot.
Also, if you want it in lower letters, you need to convert the string to lower case first.

C# - Removing single word in string after certain character

I have string that I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = "testing a\determiner checking test one\pronoun";
Regex regex = new Regex(#"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backlash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases with one step.
Any advice would be appreciated.
Change .* which match any characters, to \w*, which only match word characters.
Regex regex = new Regex(#"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible until it hits a space. Adding a ? right after that * tells it instead to make the capture as small as possible--to stop as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol captures the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = #"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(#"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].

.NET RegEx for letters and spaces

I am trying to create a regular expression in C# that allows only alphanumeric characters and spaces. Currently, I am trying the following:
string pattern = #"^\w+$";
Regex regex = new Regex(pattern);
if (regex.IsMatch(value) == false)
{
// Display error
}
What am I doing wrong?
If you just need English, try this regex:
"^[A-Za-z ]+$"
The brackets specify a set of characters
A-Z: All capital letters
a-z: All lowercase letters
' ': Spaces
If you need unicode / internationalization, you can try this regex:
#"$[\\p{L}\\s]+$"
See https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#word-character-w
This regex will match all unicode letters and spaces, which may be more than you need, so if you just need English / basic Roman letters, the first regex will be simpler and faster to execute.
Note that for both regex I have included the ^ and $ operator which mean match at start and end. If you need to pull this out of a string and it doesn't need to be the entire string, you can remove those two operators.
try this for all letter with space :
#"[\p{L} ]+$"
The character class \w does not match spaces. Try replacing it with [\w ] (there's a space after the \w to match word characters and spaces. You could also replace the space with \s if you want to match any whitespace.
If, other then 0-9, a-z and A-Z, you also need to cover any accented letters like ï, é, æ, Ć or Ş then you should better use the Unicode properties \p{...} for matching, i.e. (note the space):
string pattern = #"^[\p{IsLetter}\p{IsDigit} ]+$";
This regex works great for me.
Regex rgx = new Regex("[^a-zA-Z0-9_ ]+");
if (rgx.IsMatch(yourstring))
{
var err = "Special charactes are not allowed in Tags";
}

Categories

Resources