Regex to allow some special characters along with Unicode characters - C#

I want to allow some special characters such as (, ), \, _ and ., as well as emojis,
i.e. anything outside the ASCII range [\u0000-\u007F].
Valid names are
"🎊❤️🤣🤣😊😊🎉🎉😁😁"
"12333🎊❤️🤣🤣😊😊🎉🎉😁😁.txt"
"123()-213🎊❤️🤣🤣😊😊🎉🎉😁😁.txt"
Invalid special characters should be replaced with ""
"123^&*()!##$🎊❤️🤣🤣😊😊🎉🎉😁😁" should become
"123()🎊❤️🤣🤣😊😊🎉🎉😁😁"
This regex does it for some special characters
string filename = "12%&^%^&% \U0001f973\U0001f973.xlsx";
string output = Regex.Replace(filename, @"[^\w\s\.\\[\]()|_-]+", "");
prints "12 .xlsx"
For Unicode characters (like emojis), removing the ASCII range keeps only the emojis:
string output = Regex.Replace(filename, @"[\u0000-\u007F]+", "");
prints "\U0001f973\U0001f973"
When combining the two, I want
"12 \U0001f973\U0001f973.xlsx"
I have tried
Test 1
string output = Regex.Replace(filename, @"[^\w\s\.\\[\]()|_-]+|[^\u0000-\u007F]+", "");
"12 .xlsx" // but no luck
Test 2
string output = Regex.Replace(filename, @"[^-\w\s\.\\\[\]()|_\u0000-\u007F]+", "");
prints "";

You need the "opposite" of your unicode range in order to be able to add it to your negated character class. Try:
[^\u0080-\uFFFF\w\s\.\\[\]()|_-]+
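A minimal sketch of how that pattern could be applied, assuming the allowed punctuation is exactly the set from the question. The class and method names (FileNameSanitizer, Sanitize) are hypothetical. Note that the \u0080-\uFFFF range keeps every non-ASCII character, not only emojis; emojis survive because each half of their surrogate pair falls inside that range.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class FileNameSanitizer
{
    // Negated class: anything that is NOT non-ASCII (U+0080..U+FFFF), a word
    // character, whitespace, or one of . \ [ ] ( ) | _ - gets matched and removed.
    private static readonly Regex Disallowed =
        new Regex(@"[^\u0080-\uFFFF\w\s.\\\[\]()|_-]+");

    public static string Sanitize(string fileName)
    {
        return Disallowed.Replace(fileName, "");
    }
}
```

With the filename from the question, FileNameSanitizer.Sanitize("12%&^%^&% \U0001f973\U0001f973.xlsx") returns "12 \U0001f973\U0001f973.xlsx".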

Related

Capture substring within delimiters and excluding characters using regex

What would a regex pattern look like that captures a substring between two delimiters, but excludes some characters (if any) after the first delimiter and before the last delimiter?
The input string looks for instance like this:
var input = @"Not relevant {
#AddInfoStart Comment:String:=""This is a comment"";
AdditionalInfo:String:=""This is some additional info"" ;
# } also not relevant";
The capture should contain the substring between "{" and "}", excluding any spaces, newlines and the "#AddInfoStart" string after the start delimiter "{" (if present), and also excluding any spaces, newlines, ";" and "#" characters before the end delimiter "}" (if present).
The captured string should look like this
Comment:String:=""This is a comment"";
AdditionalInfo:String:=""This is some additional info""
It is possible that there are blanks before or after the ":" and ":=" internal delimiters, and also that the value after ":=" is not always marked as a string, for instance something like:
{ Val1 : Real := 1.7 }
Arrays use the following syntax:
arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL
This is my solution:
Remove the content outside the brackets
Use a regular expression to get the values inside the brackets
Code:
var input = @"Not relevant {
#AddInfoStart Comment:String:=""This is a comment"";
Val1 : Real := 1.7
AdditionalInfo:String:=""This is some additional info"" ;
# } also not relevant";
// remove content outside brackets
input = Regex.Replace(input, @".*\{", string.Empty);
input = Regex.Replace(input, @"\}.*", string.Empty);
string property = @"(\w+)";
string separator = @"\s*:\s*"; // ":" with or without whitespace
string type = @"(\w+)";
string equals = @"\s*:=\s*"; // ":=" with or without whitespace
string text = @"""?(.*?)"""; // value between ""
string number = @"(\d+(\.\d+)?)"; // number like 123 or with a . separator such as 1.45
string value = $"({text}|{number})"; // value can be a string or number
string pattern = $"{property}{separator}{type}{equals}{value}";
var result = Regex.Matches(input, pattern)
.Cast<Match>()
.Select(match => new
{
FullMatch = match.Groups[0].Value, // full match is always the 1st group
Property = match.Groups[1].Value,
Type = match.Groups[2].Value,
Value = match.Groups[3].Value
})
.ToList();

Regex for text between two characters

I'm trying to extract text between two delimiters in C# using a regular expression.
The text is in a variable (tb1.product_name): Example Text | a:10,Colour:Green
Get all text before |, in this case, Example Text
Get all text between : and ,, in this case, 10
I want two different regexes, one for each.
I tried:
Regex.Match(tb1.product_name, @"\:([^,]*)\)").Groups[1].Value
But this doesn't work.
If regex is not strictly required, you can do this simply with string.Substring and string.IndexOf:
string str = "Example Text | a:10,Colour:Green";
string strBeforeVerticalBar = str.Substring(0, str.IndexOf('|'));
string strInBetweenColonAndComma = str.Substring(str.IndexOf(':') + 1, str.IndexOf(',') - str.IndexOf(':') - 1);
Edit 1:
I feel Regex might be overkill for something as simple as this. Also, if you use what I suggested, you can add Trim() at the end to remove whitespace, if any. Like:
string strBeforeVerticalBar = str.Substring(0, str.IndexOf('|')).Trim();
string strInBetweenColonAndComma = str.Substring(str.IndexOf(':') + 1, str.IndexOf(',') - str.IndexOf(':') - 1).Trim();
string str = @"Example Text |a:10,Colour: Green";
Match match = Regex.Match(str, @"^([A-Za-z\s]*)\|"); // \| matches a literal pipe; a bare | is alternation
Match match2 = Regex.Match(str, @":([0-9]*),");
//output Example Text
Console.WriteLine(match.Groups[1].Value.Trim()); // Trim drops the space before the pipe
//output 10
Console.WriteLine(match2.Groups[1].Value);
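If a single expression is preferred over two, both parts can be captured with named groups. This is only a sketch under the assumption that the input always has the shape "name | key:value,...". ProductNameParser and TryParse are hypothetical names, not from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating one regex with two named groups.
public static class ProductNameParser
{
    // name  : everything before the first '|'
    // value : everything between the first ':' after the pipe and the next ','
    private static readonly Regex Pattern =
        new Regex(@"^(?<name>[^|]*)\|[^:]*:(?<value>[^,]*),");

    public static bool TryParse(string text, out string name, out string value)
    {
        Match m = Pattern.Match(text);
        // Trim removes the optional spaces around the delimiters.
        name = m.Success ? m.Groups["name"].Value.Trim() : null;
        value = m.Success ? m.Groups["value"].Value.Trim() : null;
        return m.Success;
    }
}
```

For "Example Text | a:10,Colour:Green" this yields name "Example Text" and value "10"; for input without a pipe, colon and comma in that order, TryParse returns false.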

Regex replace special characters defined by client

I need a C# function that replaces all special characters specified by the client in a string. Example:
string value1 = @"‹¥ó×¬¢ÝÆ";
string input1 = @"Thi¥s is\123a strÆing";
string output1 = Regex.Replace(input1, value1, "");
I want a result like this: output1 = Thi s is\123a str ing
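Why do you need regex? This is more efficient, concise and also readable. Note that input1.Except(value1) would not work here, because Except is a set operation and also drops repeated characters from the input:
string result = string.Concat(input1.Where(c => !value1.Contains(c)));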
If you don't want to remove them but replace them with a different string, you can still use a similar (but not as efficient) approach:
string replacement = "[foo]";
var newChars = input1.SelectMany(c => value1.Contains(c) ? replacement : c.ToString());
string result = string.Concat( newChars ); // Thi[foo]s is\123a str[foo]ing
Someone asked for a regex?
string value1 = @"^\-[]‹¥ó×¬¢ÝÆ";
string input1 = @"T-^\hi¥s is\123a strÆing";
// Handles ]^-\ by escaping them
string value1b = Regex.Replace(value1, @"([\]\^\-\\])", @"\$1");
// Creates a [...] regex and uses it
string input1b = Regex.Replace(input1, "[" + value1b + "]", " ");
The basic idea is to build a [...] character class. But first you have to escape the characters that have special meaning inside a [...]: ], ^, - and \. Note that you don't need to escape the [
Note that this solution isn't compatible with non-BMP Unicode characters (characters that take up two chars, i.e. a surrogate pair).
A solution that is compatible with them is more complex, but for normal use it shouldn't be a problem.
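One possible way to handle non-BMP characters, offered as a sketch and not as the answer's own method, is to drop the character class and build an alternation of escaped text elements instead. ClientCharStripper, BuildPattern and Replace are hypothetical names. The sketch assumes the client string lists characters one after another; a base character directly followed by combining marks (or, on newer runtimes, a whole grapheme cluster) is treated as a single element.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: alternation of escaped text elements instead of [...].
public static class ClientCharStripper
{
    public static string BuildPattern(string unwanted)
    {
        var parts = new List<string>();
        // A text element keeps both halves of a surrogate pair together.
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(unwanted);
        while (e.MoveNext())
        {
            // Outside a character class, Regex.Escape covers all metacharacters.
            parts.Add(Regex.Escape(e.GetTextElement()));
        }
        return string.Join("|", parts);
    }

    public static string Replace(string input, string unwanted, string replacement)
    {
        if (string.IsNullOrEmpty(unwanted)) return input; // nothing to strip
        // The evaluator inserts the replacement literally, so "$" is not special.
        return Regex.Replace(input, BuildPattern(unwanted), m => replacement);
    }
}
```

An alternation is usually slower than a character class, so this trade-off only makes sense when the client list may contain characters outside the BMP.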

Visual c# Replace special characters and white space from a string

I want to replace white space and special characters with a hyphen.
I want to replace all non-letter characters with a hyphen, like ?,(,),{,},[,],<,>,",',!,#<# etc
This replaces every character that is neither alphanumeric nor whitespace:
var input = "this i$ s#m3 inp^t";
var replaced = Regex.Replace(input, @"[^\d\w\s]","-");
Console.WriteLine(replaced);
// Output: this i- s-m3 inp-t
Depending on how you define "special characters", you can just do:
yourString = Regex.Replace(yourString,@"\W","-");
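Because \W also matches whitespace, a space next to a special character produces two hyphens in a row. If a single hyphen per run is wanted, which is an assumption about the intent here, adding a + quantifier collapses each run. Hyphenator and Hyphenate are hypothetical names used only for this sketch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper showing the difference between \W and \W+.
public static class Hyphenator
{
    // Replaces each run of non-word characters (whitespace included) with one hyphen.
    // Underscores count as word characters for \W and are left untouched.
    public static string Hyphenate(string input)
    {
        return Regex.Replace(input, @"\W+", "-");
    }
}
```

For "this i$ s#m3 inp^t", the plain \W version gives "this-i--s-m3-inp-t", while Hyphenator.Hyphenate gives "this-i-s-m3-inp-t".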

Remove non-alphanumerical characters excluding space

I have this statement:
String cap = Regex.Replace(winCaption, @"[^\w\.#-]", "");
that transforms "Hello | World!?" to "HelloWorld".
But I want to preserve the space character, for example: "Hello | World!?" to "Hello  World".
How can I do this?
Just add a space to your set of characters: [^\w.#\- ]
var winCaption = "Hello | World!?";
String cap = Regex.Replace(winCaption, @"[^\w\.#\- ]", "");
Note that you have to escape the 'dash' (-) character since it normally is used to denote a range of characters (for instance, [A-Za-z0-9])
Here you go...
string cap = Regex.Replace(winCaption, @"[^\w \.#-]", "");
Try this:
String cap = Regex.Replace(winCaption, @"[^\w\.#\- ]", "");
