How can I remove all special chars from UTF8 text in c#? - c#

I would like to remove all special characters from my UTF8 text, but I can't find any matching regular expression.
My text looks like this:
ASDÉÁPŐÓÖŰ_->,.!"%=%!HMHF
I would like to remove only these chars: _->,.!"%=%!
I tried this regex:
result = Regex.Replace(text, @"([^a-zA-Z0-9_]|^\s)", "");
But it also removes my UTF-8 (accented) characters.
I don't want to remove the accented characters, only the punctuation and symbol glyphs.

Regex.Replace(text, @"([^\w]|_)", "")

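Do you want only numbers and letters?
Then this is your solution (note that it also removes accented letters, since they fall outside a-z and A-Z):
result = Regex.Replace(text, "[^0-9a-zA-Z]+", "");
You could also specify a range of the ASCII table if you want custom control over what stays in your string. This example keeps only ASCII characters (0x00 to 0x7F) and removes everything else:
result = Regex.Replace(text, @"[^\x00-\x7F]+", "");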
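Since the question asks to keep the accented letters, a pattern based on Unicode categories may be closer to what was wanted. The following is only a sketch: the class and method names (`SpecialCharStripper`, `KeepLettersAndDigits`) are made up for illustration. It keeps anything in the Unicode letter (`\p{L}`) or decimal digit (`\p{Nd}`) categories and removes the rest.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharStripper
{
    // Hypothetical helper: removes every character that is not a Unicode
    // letter (\p{L}) or decimal digit (\p{Nd}), so accented letters are kept.
    public static string KeepLettersAndDigits(string text)
    {
        return Regex.Replace(text, @"[^\p{L}\p{Nd}]", "");
    }

    public static void Main()
    {
        string text = "ASDÉÁPŐÓÖŰ_->,.!\"%=%!HMHF";
        Console.WriteLine(KeepLettersAndDigits(text)); // ASDÉÁPŐÓÖŰHMHF
    }
}
```

This assumes the accented letters are stored as single precomposed characters. If the text may contain a base letter followed by a combining mark, calling `text.Normalize()` first or adding `\p{M}` to the character class is one option.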

Related

Html.Decoded &shy; is problematic in string functions

string local= HttpUtility.HtmlDecode(GetLocalizedSupportPhone()).Replace("-", "").Replace(" ", "");
I am getting this string:
"0&shy;12&shy;4 41&shy;481&shy;73"
from the GetLocalizedSupportPhone() method. The HtmlDecode method returns:
"0-12-4 41-481-73"
I have a list of phone numbers such as "01244148173", "01244148173", etc., which are plain digit strings without any space or HTML characters.
Problem scenario: all I want to do is take the decoded local string ("0-12-4 41-481-73"), replace the &shy; characters as well as " " with an empty string, and compare the resulting local string with the list items. If a matching list item exists, I remove that list item.
But strangely, the .Replace() method replaces the space character with an empty string but is unable to replace "-" with an empty string.
I am just curious why this is happening. Why can none of the string methods (I also tried .Split()) detect the "-"?
There are different types of hyphens. &shy; is a soft hyphen. Specifically, the soft hyphen is character code 173 (U+00AD) and the hyphen on your keyboard is character code 45 (U+002D).
Try this instead.
var r = HttpUtility.HtmlDecode("0&shy;12&shy;4 41&shy;481&shy;73")
.Replace((char)173, ' ')
.Replace(" ", "");
That will replace the soft hyphen with a space and then your second replace will get rid of that.
Another option would be to use a regular expression to remove all non-numeric values.
Regex nonNumeric = new Regex(@"\D");
var r = nonNumeric.Replace(
HttpUtility.HtmlDecode("0&shy;12&shy;4 41&shy;481&shy;73"),
string.Empty);
This might help if you're just looking to strip spaces and soft hyphens from a string without having to deal with HTML decoding:
var regex = new Regex(@"\u00ad| ");
var result = regex.Replace(stringWithSoftHyphens, string.Empty);
I tried doing this with Trim((char)173), but it (and methods like Split) did not seem to handle the soft hyphen character the way the Regex class can. Keep in mind that Trim only removes characters from the start and end of a string, so it cannot reach soft hyphens in the middle.
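To make the difference between the two hyphens concrete, here is a small sketch. It writes the decoded string directly with `\u00AD` escapes instead of calling HttpUtility.HtmlDecode, so it runs without System.Web; that substitution, and the names `SoftHyphenDemo` and `StripSoftHyphensAndSpaces`, are assumptions made for the example.

```csharp
using System;

public static class SoftHyphenDemo
{
    // Hypothetical helper: strips soft hyphens (U+00AD) and spaces.
    public static string StripSoftHyphensAndSpaces(string phone)
    {
        return phone.Replace("\u00AD", "").Replace(" ", "");
    }

    public static void Main()
    {
        // Stand-in for the decoded form of "0&shy;12&shy;4 41&shy;481&shy;73".
        string decoded = "0\u00AD12\u00AD4 41\u00AD481\u00AD73";

        // The keyboard hyphen-minus (U+002D) never occurs, so nothing changes.
        Console.WriteLine(decoded.Replace("-", "") == decoded); // True

        Console.WriteLine(StripSoftHyphensAndSpaces(decoded)); // 01244148173
    }
}
```

The cleaned value can then be compared against the plain entries in the phone number list.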

Regex to match only numbers , no apostrophes

I want to match only numbers in the following string
String : "40’000"
Match : "40000"
Basically, I am trying to ignore the apostrophe.
I am using C#, in case it matters.
I can't use any C# methods; I need to use only Regex.
Replace like this; it removes every character except digits:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d]", "");
Since you said "I just want to pick up numbers only", how about doing it without regex?
var s = "40’000";
var result = new string(s.Where(char.IsDigit).ToArray());
Console.WriteLine(result); // 40000
I suggest using a regex to find the special characters rather than the digits, and then replacing them with an empty string.
A simple (?=\S)\D should be enough; the (?=\S) lookahead keeps the pattern from matching whitespace, for example at the end of the number.
Replace like this; it removes every character except digits and periods:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d.]", "");
Don't complicate your life, use Regex.Replace:
string s = "40'000";
string replaced = Regex.Replace(s, @"\D", "");
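As a quick check that `\D` handles both the typographic apostrophe (U+2019) from the question and the plain ASCII one, here is a minimal sketch. The names `DigitsOnly` and `Extract` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitsOnly
{
    // Hypothetical helper: removes every non-digit character.
    public static string Extract(string input)
    {
        return Regex.Replace(input, @"\D", "");
    }

    public static void Main()
    {
        Console.WriteLine(Extract("40\u2019000")); // 40000
        Console.WriteLine(Extract("40'000"));      // 40000
    }
}
```

In .NET, `\d` matches any Unicode decimal digit by default, so digits from other scripts would also be kept. Passing RegexOptions.ECMAScript restricts `\d` to 0-9 if that matters for your input.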

Need RegEx to remove all alphabets from string

I need a regex to remove all letters (A-Z and a-z) from a string; everything else, including any kind of special character, should remain intact. I tried @"[^\d]" but it returns only the numbers in the string.
String : asd!## $%dfdf4545D jasjkd #(*)jdjd56
desired output : !## $%4545 #(*)56
Just replace all undesired characters with an empty string:
string filtered = Regex.Replace(input, "[A-Za-z]", "");
Try the following regular expression:
[^a-zA-Z]
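This matches every character that is not an English letter, i.e. the characters you want to keep. To remove the letters themselves, replace matches of [a-zA-Z] instead.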
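The two patterns in these answers do opposite things, which is easy to mix up. Here is a small sketch contrasting them; `LetterFilter` and its method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterFilter
{
    // Removes English letters and keeps everything else.
    public static string RemoveLetters(string input)
    {
        return Regex.Replace(input, "[A-Za-z]", "");
    }

    // Negated class: removes everything except English letters.
    public static string KeepOnlyLetters(string input)
    {
        return Regex.Replace(input, "[^A-Za-z]", "");
    }

    public static void Main()
    {
        string input = "ab!1 cD";
        Console.WriteLine("[" + RemoveLetters(input) + "]");   // [!1 ]
        Console.WriteLine("[" + KeepOnlyLetters(input) + "]"); // [abcD]
    }
}
```

For the question as asked, `RemoveLetters` is the behavior that leaves digits and special characters intact.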

Performance issue when removing numbers from large string

I have a function containing the following code:
Text = Text.Where(c => !Char.IsDigit(c)).Aggregate<char, string>(null, (current, c) => current + c);
but it is rather slow. Is there any way I can speed it up?
Try this regex:
Text = Regex.Replace(Text, @"\d+", "");
\d+ is more efficient than just \d because it removes multiple consecutive digits at once.
Yes, you can use Regex.Replace:
Text = Regex.Replace(Text, "\\d", "");
The regular expression matches a single digit. Regex.Replace replaces each occurrence of it in the Text string with an empty string "".
All those concatenations are probably killing you. The easiest/best is probably a regex:
Text = Regex.Replace(Text, "\\d", "");
Or you can try making only one new string instance:
Text = new string(Text.Where(c => !Char.IsDigit(c)).ToArray());
Try Regex.Replace:
"In a specified input string, replaces strings that match a regular expression pattern with a specified replacement string."
Regex.Replace(Text, "\\d+", "");
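If you would rather avoid regex altogether, another way to get rid of the repeated string concatenation is a StringBuilder sized to the input. This is a sketch only, with made-up names (`DigitRemover`, `RemoveDigits`), and it has not been benchmarked against the regex versions above.

```csharp
using System;
using System.Text;

public static class DigitRemover
{
    // Hypothetical helper: copies every non-digit character into a single
    // pre-sized buffer instead of building a new string per character.
    public static string RemoveDigits(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!char.IsDigit(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDigits("a1b22c333d")); // abcd
    }
}
```

The original Aggregate call allocates a new string for every character it keeps, which is why it slows down so much on large inputs.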

Visual c# Replace special characters and white space from a string

I want to replace white space and special characters with a hyphen.
I want to replace all non-letter characters, such as ?,(,),{,},[,],<,>,",',!,#<# etc., with a hyphen.
This replaces every character that is not alphanumeric or whitespace:
var input = "this i$ s#m3 inp^t";
var replaced = Regex.Replace(input, @"[^\d\w\s]","-");
Console.WriteLine(replaced);
// Output: this i- s-m3 inp-t
Depending on how you define "special characters", you can just do:
yourString = Regex.Replace(yourString,@"\W","-");
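The question also asks for white space to become a hyphen, which the first pattern above leaves alone. Here is one possible sketch that treats white space and special characters alike and collapses each run into a single hyphen. The names `Hyphenator` and `ReplaceNonAlphanumeric` are hypothetical, and restricting "letters" to ASCII is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Hyphenator
{
    // Hypothetical helper: replaces each run of characters that are not
    // ASCII letters or digits (including white space) with one hyphen.
    public static string ReplaceNonAlphanumeric(string input)
    {
        return Regex.Replace(input, "[^A-Za-z0-9]+", "-");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceNonAlphanumeric("this i$ s#m3 inp^t"));
        // this-i-s-m3-inp-t
    }
}
```

Dropping the `+` gives one hyphen per replaced character instead of one per run, if that is what you need.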
