Regex & C#: Replace all Special Characters except Emojis - c#

I need to replace all special characters in a string except the following (which includes alphabetic characters):
:)
:P
;)
:D
:(
This is what I have now:
string input = "Hi there!!! :)";
string output = Regex.Replace(input, "[^0-9a-zA-Z]+", "");
This removes all special characters. How can I modify it to keep the emoticons listed above but still remove every other special character?

You can use a known technique: match and capture what you want to keep, match without capturing what you want to remove, and replace with a backreference to Group 1:
(:(?:[D()P])|;\))|[^0-9a-zA-Z\s]
Replace with $1. Note that I added \s to the character class so that whitespace is kept; remove it if you do not need spaces.
Pattern explanation:
(:(?:[D()P])|;\)) - Group 1 (what we need to keep):
:(?:[D()P]) - a : followed by either D, (, ) or P
| - or
;\) - a ;) substring
(here, you may extend the capture group with more |-separated branches).
| - or ...
[^0-9a-zA-Z\s] - match any char other than ASCII digits, ASCII letters and whitespace (as mentioned, remove \s if you do not need to keep spaces).
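
The capture-and-replace technique above could be wired into C# roughly as follows. This is a minimal sketch, not the answerer's own code: the class name EmoticonCleaner and the method name Clean are made up for illustration, and only the pattern and the $1 replacement come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmoticonCleaner
{
    // Group 1 captures an emoticon to keep; the second branch matches
    // any single character that should be removed.
    private static readonly Regex Pattern =
        new Regex(@"(:(?:[D()P])|;\))|[^0-9a-zA-Z\s]");

    // Hypothetical helper name, not part of any library.
    public static string Clean(string input)
    {
        // "$1" puts the emoticon back when Group 1 took part in the match
        // and expands to an empty string when the second branch matched.
        return Pattern.Replace(input, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("Hi there!!! :)"));
    }
}
```

Because the emoticon branch comes first in the alternation, a : or ; that starts a listed emoticon is consumed by Group 1 before the removal branch can match it.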

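I would use a regex to match all the emoticons and select them out of the text. Note that this keeps only the emoticons and discards everything else, and that it needs using System.Linq:
string input = "Hi there!!! :)";
string output = string.Concat(Regex.Matches(input, "[;:][DP)(]+").Cast<Match>().Select(x => x.Value));
Pattern [;:][DP)(]+
[;:] starts with : or ; (inside a character class | is a literal character, not alternation, so it is left out)
[DP)(] ends with D, P, ) or (
+ one or more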

Related

Regular expression : Excluding last part

I'm looking to apply a regular expression to an input string.
Regular expression:(.*)\\(.*)_(.*)_(.*)-([0-9]{4}).*
Test entries:
Parkman\L9\B137598_00_T-3298-B
Parkman\L9\B137598_00_T-3298
The result should be B137598_00_T-3298 for both test entries. The problem is that if I append 4 more digits to a test entry, the result becomes, for example, B137598_00_T-3298-5555.
What I need is for anything after the 3298 to be ignored.
What changes can I make to achieve that?
You can use a single capture group with a bit more specific pattern:
\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b
The pattern matches:
\w Match a single word char
\\\w+\\ Match 1+ word chars between backslashes
( Capture group 1
(?:[^\W_]+_){2} Repeat 2 times word chars without _ followed by a single _
[^\W_]+ Match 1+ word chars without _
-[0-9]{4} Match - and 4 digits
) Close group 1
\b A word boundary
Or a bit broader pattern with a match only, where \w also matches an underscore, and asserting \ to the left:
(?<=\\)\w+-[0-9]{4}\b
C# code:
string s1 = @"Parkman\L9\B137598_00_T-3298-B";
string s2 = @"Parkman\L9\B137598_00_T-3298";
string pattern = @"\w+_[0-9]{2}_T-[0-9]{4}";
var matches = Regex.Matches(s1, pattern);
Console.WriteLine("s1: {0}", matches[0]);
matches = Regex.Matches(s2, pattern);
Console.WriteLine("s2: {0}", matches[0]);
The result:
s1: B137598_00_T-3298
s2: B137598_00_T-3298
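
For comparison, the capture-group pattern from the first answer can be used in much the same way. The sketch below assumes a hypothetical helper, PartNumberExtractor.Extract, that returns Group 1 or an empty string when nothing matches; only the pattern itself is taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartNumberExtractor
{
    // Pattern from the answer: Group 1 is the part after the last backslash,
    // up to and including the 4 digits that follow the hyphen.
    private static readonly Regex Pattern =
        new Regex(@"\w\\\w+\\((?:[^\W_]+_){2}[^\W_]+-[0-9]{4})\b");

    // Hypothetical helper: returns the captured value, or "" if no match.
    public static string Extract(string path)
    {
        Match m = Pattern.Match(path);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Extract(@"Parkman\L9\B137598_00_T-3298-B"));
        Console.WriteLine(Extract(@"Parkman\L9\B137598_00_T-3298-5555"));
    }
}
```

The trailing \b is what rejects a fifth digit directly after the four required ones, while still allowing a following hyphen or the end of the string.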

C# Regex - starts with pattern1 not contain pattern2

For the following input, which contains all of these lines:
a1.aaa[SUBSCRIBED]
a1.bbb
a1.ccc
b1.ddd
d1.ddd[SUBSCRIBED]
I want to get the output:
bbb
ccc
which means: all the words that come after "a1." and do not contain the substring "[SUBSCRIBED]".
Why regex? The following is crystal clear:
var result = strings
.Where(s => s.StartsWith("a1.") && !s.Contains("[SUBSCRIBED]"))
.Select(s => s.Substring(3));
Tim's answer makes sense. However, if you insist on a regex, I would venture that it would look like this:
^a1\.(.*)(?<!\[SUBSCRIBED\])$
with ^a1\. meaning the line starts with a1.
(.*) capturing any number of characters
and the negative lookbehind (?<!\[SUBSCRIBED\])$ rejecting text that ends with [SUBSCRIBED]
You may use
^a1\.(?!.*\[SUBSCRIBED])(.*)
Details
^ - start of string
a1\. - a literal a1. substring
(?!.*\[SUBSCRIBED]) - a negative lookahead that fails the match if a [SUBSCRIBED] substring is present after any 0+ chars (other than newline if the RegexOptions.Singleline option is not used)
(.*) - Group 1: the rest of the line up to the end (if you use RegexOptions.Singleline option, . will match newlines as well).
C# code:
var result = string.Empty;
var m = Regex.Match(s, @"^a1\.(?!.*\[SUBSCRIBED])(.*)");
if (m.Success)
{
result = m.Groups[1].Value;
}
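
If the five entries arrive as one multi-line string rather than as separate strings, the same lookahead pattern can be applied with RegexOptions.Multiline so that ^ matches at the start of every line. This is a sketch under that assumption; SubscriptionFilter and Unsubscribed are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubscriptionFilter
{
    // Hypothetical helper: returns the text after "a1." for every line
    // that starts with "a1." and does not contain "[SUBSCRIBED]".
    public static List<string> Unsubscribed(string text)
    {
        return Regex.Matches(text, @"^a1\.(?!.*\[SUBSCRIBED])(.*)", RegexOptions.Multiline)
            .Cast<Match>()
            // In .NET, . does not match \n but does match \r,
            // so trim a trailing \r left over from Windows line endings.
            .Select(m => m.Groups[1].Value.TrimEnd('\r'))
            .ToList();
    }

    public static void Main()
    {
        string text = "a1.aaa[SUBSCRIBED]\na1.bbb\na1.ccc\nb1.ddd\nd1.ddd[SUBSCRIBED]";
        foreach (string item in Unsubscribed(text))
        {
            Console.WriteLine(item);
        }
    }
}
```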

I think my regular expression pattern in C# is incorrect

I'm checking to see if my regular expression matches my string.
I have a filename that looks like somename_something.txt and I want to match it to somename_*.txt, but my code is failing when I try to pass something that should match. Here is my code.
string pattern = "somename_*.txt";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
using (ZipFile zipFile = ZipFile.Read(fullPath))
{
foreach (ZipEntry e in zipFile)
{
Match m = r.Match("somename_something.txt");
if (!m.Success)
{
throw new FileNotFoundException("A filename with format: " + pattern + " not found.");
}
}
}
The asterisk is matching the underscore and throwing it off.
Try:
somename_(\w+)\.txt
Here (\w+) captures one or more word characters at this position, and \. matches a literal dot.
You can see it match here: https://regex101.com/r/qS8wA5/1
In General
The regex in this code applies * to the _, meaning zero or more underscores, which is not what you intended. The * denotes zero or more of the preceding item. Instead try
^somename_(.*)\.txt$
This matches exactly the first part "somename_".
Then anything (.*)
And finally the end ".txt". The backslash escapes the 'dot'.
More Specific
You can also restrict the middle part of the match to letters only, with no digits or symbols:
^somename_[a-z]*\.txt$
As written, your regular expression
somename_*.txt
matches (in a case-insensitive manner):
the literal text somename, followed by
zero or more underscore characters (_), followed by
any single character (other than newline), followed by
the literal text txt
And it will match that anywhere in the source text. You probably want to write something like
Regex myPattern = new Regex( @"
^ # anchor the match to start-of-text, followed by
somename # the literal 'somename', followed by
_ # a literal underscore character, followed by
.* # zero or more of any character (except newline), followed by
\. # a literal period/fullstop, followed by
txt # the literal text 'txt'
$ # with the match anchored at end-of-text
" , RegexOptions.IgnoreCase|RegexOptions.IgnorePatternWhitespace
) ;
Hi I think the pattern should be
string pattern = "somename_.*\\.txt";
Regards
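
To see the difference between the original wildcard-style pattern and an anchored regex, the two can be compared side by side. The sketch below is only illustrative: FileNamePatterns and its method names are made up, and the anchored pattern is the ^somename_.*\.txt$ form suggested in the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNamePatterns
{
    // The pattern from the question: "_*" means zero or more underscores
    // and "." means any single character.
    public static bool MatchesOriginal(string name)
    {
        return Regex.IsMatch(name, "somename_*.txt", RegexOptions.IgnoreCase);
    }

    // Anchored pattern: literal "somename_", then anything, then literal ".txt".
    public static bool MatchesAnchored(string name)
    {
        return Regex.IsMatch(name, @"^somename_.*\.txt$", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOriginal("somename_something.txt"));
        Console.WriteLine(MatchesAnchored("somename_something.txt"));
    }
}
```

The original pattern fails on somename_something.txt because after the underscores it allows only one arbitrary character before txt, yet it would accept an unrelated string such as somenameXtxt.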

How to replace words following certain character and extract rest with REGEX

Assume that I have the following sentence:
select PathSquares from tblPathFinding where RouteId=470
and StartingSquareId=267 and ExitSquareId=13
Now I want to replace the word that follows each = and keep the rest of the sentence.
Let's say I want to replace the word following = with %.
Words are separated by a space character.
So this sentence would become
select PathSquares from tblPathFinding where RouteId=%
and StartingSquareId=% and ExitSquareId=%
Which regex can I use to achieve this?
.NET 4.5, C#
Use a lookbehind to match all the non-space or word characters that come right after the = symbol. Replacing the matched characters with % will give you the desired output.
@"(?<==)\S+"
OR
@"(?<==)\w+"
Replacement string:
%
string str = @"select PathSquares from tblPathFinding where RouteId=470
and StartingSquareId=267 and ExitSquareId=13";
string result = Regex.Replace(str, @"(?<==)\S+", "%");
Console.WriteLine(result);
Explanation:
(?<==) Asserts that the match must be preceded by an = symbol.
\w+ If so, matches the following one or more word characters (the \S+ variant matches one or more non-whitespace characters instead).
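
The two variants behave the same on the sample sentence but differ when punctuation follows a value. The sketch below illustrates this with a made-up input and hypothetical helper names (ValueMasker, MaskNonSpace, MaskWord); only the two patterns come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValueMasker
{
    // \S+ consumes everything up to the next whitespace, punctuation included.
    public static string MaskNonSpace(string text)
    {
        return Regex.Replace(text, @"(?<==)\S+", "%");
    }

    // \w+ stops at the first non-word character, so trailing punctuation survives.
    public static string MaskWord(string text)
    {
        return Regex.Replace(text, @"(?<==)\w+", "%");
    }

    public static void Main()
    {
        string text = "RouteId=470, StartingSquareId=267";
        Console.WriteLine(MaskNonSpace(text));
        Console.WriteLine(MaskWord(text));
    }
}
```

Which variant fits depends on whether characters such as commas or quotes next to the value should be replaced along with it.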

How can I replace specific word in c# with parenthesis?

Consider the following string:
string s = "The man is (old).";
If I use:
Regex.Replace(s, @"\b\(old\)\b", @"<b>$&</b>");
The output is :
The man is (old).
But I would change the whole of the (old) word like this:
The man is <b>(old)</b>.
How can I do this?
\b won't match because ( and ) are not word characters. Is there a reason you put them there? You could just leave them out:
string replaced = Regex.Replace(s, @"\(old\)", @"<b>$&</b>");
According to the specs:
\b : The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.
The space and ( are both nonalphanumeric, and so are ) and ., so \b matches at neither position.
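
The effect of \b around the parentheses can be checked directly. The following is a small sketch with made-up helper names (WordBoundaryDemo, Bold, BoldWithBoundaries); the two patterns are the ones discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Without \b: matches the literal text "(old)" wherever it occurs.
    public static string Bold(string s)
    {
        return Regex.Replace(s, @"\(old\)", "<b>$&</b>");
    }

    // With \b: only matches when a word character sits directly
    // before "(" and directly after ")".
    public static string BoldWithBoundaries(string s)
    {
        return Regex.Replace(s, @"\b\(old\)\b", "<b>$&</b>");
    }

    public static void Main()
    {
        string s = "The man is (old).";
        Console.WriteLine(Bold(s));
        Console.WriteLine(BoldWithBoundaries(s));
    }
}
```

With \b in place, the pattern would only match in a string like x(old)y, where letters touch both parentheses, which is the opposite of what the question wants.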
You might not even need a regex... try
string result = s.Replace("(old)", "<b>(old)</b>");
or
string result = s.Replace("(", "<b>(").Replace(")", ")</b>");
