Remove non-alphanumerical characters excluding space - c#

I have this statement:
String cap = Regex.Replace(winCaption, @"[^\w\.#-]", "");
that transforms "Hello | World!?" to "HelloWorld".
But I want to preserve the space character, for example: "Hello | World!?" to "Hello  World".
How can I do this?

Just add a space to your set of characters: [^\w.#\- ]
var winCaption = "Hello | World!?";
String cap = Regex.Replace(winCaption, @"[^\w\.#\- ]", "");
Note that you have to escape the dash (-) character once it is no longer the last character in the set, since it is normally used to denote a range of characters (for instance, [A-Za-z0-9]).
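A minimal sketch of the escaped-dash version as a complete program. The class name CaptionCleaner and the method name Clean are made up for illustration; only Regex.Replace and the pattern come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptionCleaner
{
    // Hypothetical helper: keeps word characters, '.', '#', '-' and the space
    // character, and removes everything else.
    public static string Clean(string caption)
    {
        // The dash is escaped so it is read as a literal '-' and not as the
        // start of a range running from '#' to ' '.
        return Regex.Replace(caption, @"[^\w.#\- ]", "");
    }

    public static void Main()
    {
        // '|', '!' and '?' are removed; both spaces around the '|' are kept.
        Console.WriteLine(Clean("Hello | World!?")); // Hello  World
    }
}
```

The two spaces in the result are the ones that surrounded the removed '|'.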

Here you go...
string cap = Regex.Replace(winCaption, @"[^\w \.#-]", "");

Try this:
String cap = Regex.Replace(winCaption, @"[^\w\.#\- ]", "");

Related

Regex to allow some special character with unicodes

I want to allow some special characters like (, ), \, _, ., etc.,
as well as emojis, which I take to be everything outside [\u0000-\u007F].
Valid names are
"🎊❤️\U0001f923\U0001f923😊😊🎉🎉πŸ˜πŸ˜"
"12333🎊❤️\U0001f923\U0001f923😊😊🎉🎉πŸ˜πŸ˜.txt"
"123()-213🎊❤️\U0001f923\U0001f923😊😊🎉🎉πŸ˜πŸ˜.txt"
Invalid special characters should be replaced with ""
"123^&*()!@#$🎊❤️\U0001f923\U0001f923😊😊🎉🎉πŸ˜πŸ˜" should be
"123()🎊❤️\U0001f923\U0001f923😊😊🎉🎉πŸ˜πŸ˜"
This regex does it for some special characters
string filename = "12%&^%^&% \U0001f973\U0001f973.xlsx";
string output = Regex.Replace(filename, @"[^\w\s\.\\[\]()|_-]+", "");
prints "12 .xlsx"
For unicode characters (like emojis)
string output = Regex.Replace(filename, @"[\u0000-\u007F]+", "");
prints "\U0001f973\U0001f973"
Combining the two, I want
"12 \U0001f973\U0001f973.xlsx"
I have tried
Test 1
string output = Regex.Replace(filename, @"[^\w\s\.\\[\]()|_-]+|[^\u0000-\u007F]+", "");
"12 .xlsx" // but no luck
Test 2
string output = Regex.Replace(filename, @"[^-\w\s\.\\\[\]()|_\u0000-\u007F]+", "");
prints ""
You need the "opposite" of your unicode range in order to be able to add it to your negated character class. Try:
[^\u0080-\uFFFF\w\s\.\\[\]()|_-]+
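As a rough sketch, here is that combined pattern applied to the sample file name from the question. The class name FileNameSanitizer and the method name Sanitize are hypothetical; the pattern itself is the one suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameSanitizer
{
    // Hypothetical helper: removes every character that is neither in the
    // allowed ASCII set nor in the range \u0080-\uFFFF.
    public static string Sanitize(string fileName)
    {
        // Emojis outside the BMP are stored as two surrogate code units, and
        // both units fall inside \u0080-\uFFFF, so they survive the replace.
        return Regex.Replace(fileName, @"[^\u0080-\uFFFF\w\s\.\\[\]()|_-]+", "");
    }

    public static void Main()
    {
        string filename = "12%&^%^&% \U0001f973\U0001f973.xlsx";
        // Expected: "12 " followed by the two emojis and ".xlsx"
        Console.WriteLine(Sanitize(filename));
    }
}
```

Note that this keeps every non-ASCII character, not only emojis, so accented letters and other scripts pass through as well.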

Regex for text between two characters

I'm trying to get some text between two strings in C# using a regular expression.
The text is in a variable (tb1.product_name): Example Text | a:10,Colour:Green
Get all text before |, in this case, Example Text
Get all text between : and ,, in this case, 10
Using two different regexes.
I tried:
Regex.Match(tb1.product_name, @"\:([^,]*)\)").Groups[1].Value
But this doesn't work.
If regex is not strictly necessary, you can do this simply by using string.Substring and string.IndexOf:
string str = "Example Text | a:10,Colour:Green";
string strBeforeVerticalBar = str.Substring(0, str.IndexOf('|'));
string strInBetweenColonAndComma = str.Substring(str.IndexOf(':') + 1, str.IndexOf(',') - str.IndexOf(':') - 1);
Edit 1:
I feel regex might be overkill for something as simple as this. Also, if you use what I suggested, you can add Trim() at the end to remove surrounding whitespace, if any. Like:
string strBeforeVerticalBar = str.Substring(0, str.IndexOf('|')).Trim();
string strInBetweenColonAndComma = str.Substring(str.IndexOf(':') + 1, str.IndexOf(',') - str.IndexOf(':') - 1).Trim();
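string str = @"Example Text |a:10,Colour: Green";
// The vertical bar is escaped so it matches a literal '|' instead of acting as alternation
Match match = Regex.Match(str, @"^([A-Za-z\s]*)\|");
Match match2 = Regex.Match(str, @":([0-9]*),");
// output: Example Text (Trim removes the space before the '|')
Console.WriteLine(match.Groups[1].Value.Trim());
// output: 10
Console.WriteLine(match2.Groups[1].Value);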

How can I cut out the below pattern from a string using Regex?

I have a string which will have the word "TAG" followed by an integer, an underscore and another word.
Eg: "TAG123_Sample"
I need to cut the "TAGXXX_" pattern and get only the word Sample. That is, I have to remove the word "TAG", the integer that follows it, and the underscore.
I wrote the following code but it doesn't work. What have I done wrong? How can I do this? Please advise.
static void Main(string[] args)
{
String sentence = "TAG123_Sample";
String pattern = @"TAG[^\d]_";
String replacement = "";
Regex r = new Regex(pattern);
String res = r.Replace(sentence,replacement);
Console.WriteLine(res);
Console.ReadLine();
}
You're currently negating (matching NOT a digit), you need to modify the regex as follows:
String s = "TAG123_Sample";
String r = Regex.Replace(s, @"TAG\d+_", "");
Console.WriteLine(r); //=> "Sample"
Explanation:
TAG match 'TAG'
\d+ digits (0-9) (1 or more times)
_ '_'
You can use String.Split for this:
string[] s = "TAG123_Sample".Split('_');
Console.WriteLine(s[1]);
https://msdn.microsoft.com/en-us/library/b873y76a.aspx
Try this; it should work in this case:
resultString = Regex.Replace(sentence,
@"^ # Match start of string
[^_]* # Match 0 or more characters except underscore
_ # Match the underscore", "", RegexOptions.IgnorePatternWhitespace);
No regex is necessary if your string contains 1 underscore and you need to get a substring after it.
Here is a Substring+IndexOf-based approach:
var res = sentence.Substring(sentence.IndexOf('_') + 1); // => Sample

How to remove white spaces from sentence?

I want to remove all white spaces from string variable which contains a sentence.
Here is my code:
string s = "This text contains white spaces";
string ns = s.Trim();
Variable "ns" should look like "Thistextcontainswhitespaces", but it doesn't (the method s.Trim() isn't working). What am I missing or doing wrong?
The Trim method only removes whitespace from the beginning and end of a string.
string s = " String surrounded with whitespace ";
string ns = s.Trim();
Will create this string: "String surrounded with whitespace"
To remove all spaces from a string use the Replace method:
string s = "This text contains white spaces";
string ns = s.Replace(" ", "");
This will create this string: "Thistextcontainswhitespaces"
Try this.
s= s.Replace(" ", String.Empty);
Or using Regex
s = Regex.Replace(s, @"\s+", String.Empty);
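The two approaches differ once the string contains whitespace other than the space character. A small sketch, with a made-up class name and a sample string that is not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Removes only the space character (U+0020).
    public static string RemoveSpaces(string s)
    {
        return s.Replace(" ", String.Empty);
    }

    // Removes every character matched by \s: spaces, tabs, newlines, etc.
    public static string RemoveAllWhitespace(string s)
    {
        return Regex.Replace(s, @"\s+", String.Empty);
    }

    public static void Main()
    {
        // Illustrative input containing a tab and a newline.
        string s = "This text\tcontains\nwhite spaces";
        Console.WriteLine(RemoveSpaces(s));        // tab and newline are still there
        Console.WriteLine(RemoveAllWhitespace(s)); // Thistextcontainswhitespaces
    }
}
```

If the input is known to contain only plain spaces, String.Replace is the simpler choice.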

Regular expressions: extract all words out of quotes

By using Regular Expressions how can I extract all text in double quotes, and all words out of quotes in such string:
01AB "SET 001" IN SET "BACK" 09SS 76 "01 IN" SET
First regular expression should extract all text inside double quotes like
SET 001
BACK
01 IN
The second expression should extract all other words in the string
01AB
IN
SET
09SS
76
SET
For the first case, "(.*?)" works fine. How can I extract all words outside the quotes?
Try this expression:
(?:^|")([^"]*)(?:$|")
The groups matched by it will exclude the quotation marks, because they are enclosed in non-capturing parentheses (?: and ). Of course you need to escape the double-quotes for use in C# code.
If the target string starts and/or ends in a quoted value, this expression will match empty groups as well (for the initial and for the trailing quote).
Try this regex:
\"[^\"]*\"
Use Regex.Matches for texts in double quotes, and use Regex.Split for all other words:
var strInput = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";
var otherWords = Regex.Split(strInput, "\"[^\"]*\"");
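Regex.Split returns the chunks of text between the quoted parts, so each chunk still has to be broken into individual words. One possible way to finish the job, with hypothetical names (QuoteSplitter, Quoted, Unquoted):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteSplitter
{
    // Text inside double quotes, without the quotes themselves.
    public static string[] Quoted(string input)
    {
        return Regex.Matches(input, "\"([^\"]*)\"")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    // Words outside the quotes: split on the quoted parts first,
    // then split every remaining chunk on spaces.
    public static string[] Unquoted(string input)
    {
        return Regex.Split(input, "\"[^\"]*\"")
                    .SelectMany(chunk => chunk.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                    .ToArray();
    }

    public static void Main()
    {
        var strInput = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";
        Console.WriteLine(string.Join("|", Quoted(strInput)));   // SET 001|BACK|01 IN
        Console.WriteLine(string.Join("|", Unquoted(strInput))); // 01AB|IN|SET|09SS|76|SET
    }
}
```

This assumes the quotes are balanced and never nested or escaped, as in the sample string.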
Maybe you can try replacing the quoted text with an empty string, then collapsing the doubled spaces left behind:
Regex r = new Regex("\".*?\"", RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.Singleline);
string p = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";
Console.Write(r.Replace(p, "").Replace("  ", " "));
You need to negate the pattern in your first expression.
(?!pattern)
If you need all blocks of the sentence, both quoted and unquoted, there is a simpler way to separate the source string, using Regex.Split:
static Regex QuotedTextRegex = new Regex(@"("".*?"")", RegexOptions.IgnoreCase | RegexOptions.Compiled);
var result = QuotedTextRegex
.Split(sourceString)
.Select(v => new
{
value = v,
isQuoted = v.Length > 0 && v[0] == '\"'
});
