C# - Removing single word in string after certain character - c#

I have a string from which I would like to remove any word that follows a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases in one step.
Any advice would be appreciated.

Change .*, which matches any characters, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");

".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
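The difference between the two quantifiers can be checked with a short sketch. The class name, method name, and the short "lone backslash" sample string are made up for illustration; only the patterns come from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashTagDemo
{
    // Hypothetical helper: removes a backslash plus any word characters after it.
    public static string StripTags(string text) => Regex.Replace(text, @"\\\w*", "");

    public static void Main()
    {
        string input = @"testing a\determiner checking test one\pronoun";
        Console.WriteLine(StripTags(input)); // testing a checking test one

        // Illustrative string where every backslash is followed by a space.
        string lone = @"a \ b \ c";
        // \w* accepts zero word characters, so the bare backslashes still match.
        Console.WriteLine(Regex.Matches(lone, @"\\\w*").Count); // 2
        // \w+ needs at least one word character, so nothing matches here.
        Console.WriteLine(Regex.Matches(lone, @"\\\w+").Count); // 0
    }
}
```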

With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible, so the match runs to the last space rather than the first. Adding a ? right after the * tells it instead to make the match as small as possible, so it stops at the first space.
Next, you want to end not just at a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together using parentheses and the group tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
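To see the greedy and lazy behaviour side by side, here is a minimal sketch using the two patterns from this thread. The class and method names are hypothetical. Note that the lazy version, as written above, replaces the final tag with a space, so the result ends with one trailing space; the TrimEnd call is an added suggestion, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: .* runs to the last whitespace it can still end on.
    public static string Greedy(string s) => Regex.Replace(s, @"\\.*\s", " ");

    // Lazy: .*? stops at the first whitespace, or at the end of the string.
    public static string Lazy(string s) => Regex.Replace(s, @"\\.*?(\s|$)", " ");

    public static void Main()
    {
        string input = @"testing a\determiner checking test one\pronoun";
        Console.WriteLine("[" + Greedy(input) + "]");         // [testing a one\pronoun]
        Console.WriteLine("[" + Lazy(input) + "]");           // [testing a checking test one ]
        Console.WriteLine("[" + Lazy(input).TrimEnd() + "]"); // [testing a checking test one]
    }
}
```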

Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
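A small sketch of how this pattern behaves in C#, assuming the same input as the question. The hyphenated sample and the names used here are invented for illustration; they show where "everything up to whitespace" differs from "word characters only".

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonWhitespaceDemo
{
    // Hypothetical helper: removes a backslash and everything up to the next whitespace.
    public static string StripToWhitespace(string s) => Regex.Replace(s, @"\\[^\s]*", "");

    public static void Main()
    {
        string input = @"testing a\determiner checking test one\pronoun";
        Console.WriteLine(StripToWhitespace(input)); // testing a checking test one

        // Illustrative tag containing a hyphen, which is not a word character.
        string hyphenated = @"one\pro-noun two";
        Console.WriteLine(Regex.Replace(hyphenated, @"\\\w*", "")); // one-noun two
        Console.WriteLine(StripToWhitespace(hyphenated));           // one two
    }
}
```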

Related

Regex string between two strings includes brackets

I am struggling with a regex expression.
Test string is e.g.
Path.parent.child[0]
I need to extract the "parent"
I always know the start "Path" and the end "child[0]" of the test string. The end can also be "child[1]" or come without "[]".
I tried: (?<=Path.).*?(?=.child[0])
But it does not find a match. I think the [] in the child string is the problem.
A regex which might work is ^[^.]*\.(.*)\.[^.]*\[?\d?\]?$
^ beginning of the line
[^.]* anything except a dot
\. a literal dot
(.*) anything, captured as a group
\. a literal dot again
[^.]* anything except a dot
\[?\d?\]? optional brackets and index. Could be (\[\d\])? if you don't mind another group.
$ end of the line
Check it on Regex101.com
But, if you always know the start and the end, you could switch to a non-Regex approach:
var start = "Path";
var end = "child[0]";
var s = start+".parent."+end;
// +1 for the first dot, -2 for two dots
var extracted = s.Substring(start.Length+1, s.Length-start.Length-end.Length-2);
Console.WriteLine(extracted);
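If a lookaround-based regex is still preferred, one option is to let Regex.Escape handle the metacharacters. In the original attempt, [0] is read as a character class that matches the single character 0, and each unescaped dot matches any character, so the literal text ".child[0]" is never found. The sketch below is one possible way to build the pattern; the helper name Between is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathMiddleDemo
{
    // Hypothetical helper: returns the text between "start." and ".end",
    // escaping both delimiters so that '.' and '[' are treated literally.
    public static string Between(string s, string start, string end)
    {
        string pattern = "(?<=" + Regex.Escape(start + ".") + ").*?(?=" + Regex.Escape("." + end) + ")";
        Match m = Regex.Match(s, pattern);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(Between("Path.parent.child[0]", "Path", "child[0]")); // parent
        Console.WriteLine(Between("Path.parent.child", "Path", "child"));       // parent
    }
}
```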

How to match words that don't start or end with certain characters using Regex?

I want to find word matches that don't start or end with some specific characters.
For example, I have this input and I only want to match the highlighted word:
"string" string 'string'
And exclude other words that start and end with either " or '.
I am currently using this pattern:
But I do not know what pattern I should use that would exclude words that start and end with some specified characters.
Can some one give me some advice on what pattern I should use? Thank you
The pattern you're currently using also matches the quoted words, because \b asserts a position between a word character [a-zA-Z0-9_] and a non-word character, and such positions exist between " and s and between g and ". You can use one of the following methods:
Negate specific characters (negative lookbehind/lookahead)
This method allows you to specify a character, set of characters, or substring to negate from a match.
(?<!['"])\bstring\b(?!['"]) - see it in use here
(?<!['"]) - ensure neither ' nor " precedes.
(?!['"]) - ensure neither ' nor " follows.
Allow specific characters (positive lookbehind/lookahead)
This method allows you to specify a character, set of characters, or substring to ensure match.
(?<=\s|^)\bstring\b(?=\s|$) - see it in use here
(?<=\s|^) - ensure whitespace or the beginning of the line precedes.
(?=\s|$) - ensure whitespace or the end of the line follows.
A combination of both above
This method allows you to negate specific cases while allowing others (not commonly used and not really needed for the problem presented, but may be useful to you or others).
Something like (?<=\s|^)string(?=\s+(?!stop)|$) would ensure the word isn't followed by the word stop
Something like (?<=(?<!stop\s*)\s+|^)string(?=\s+|$) would ensure the word doesn't follow the word stop - note that quantifiers (\s+) in lookbehinds are not allowed in most regex engines, but .NET allows them.
Something like (?<=\s|^)\bstring\b(?=\s|$)(?!\z) would ensure the word isn't at the end of the string (different from end of line if multi-line).
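The first two lookaround patterns above can be tried against the question's sample input with a sketch like the following. The class and method names are hypothetical; note that inside a C# verbatim string a double quote has to be written as "".

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteLookaroundDemo
{
    // Negative lookarounds: reject matches that touch a single or double quote.
    public static MatchCollection Unquoted(string s) =>
        Regex.Matches(s, @"(?<!['""])\bstring\b(?!['""])");

    public static void Main()
    {
        string sample = "\"string\" string 'string'";

        foreach (Match m in Unquoted(sample))
        {
            Console.WriteLine($"{m.Value} at {m.Index}"); // string at 9
        }

        // Positive lookarounds: require whitespace or a line edge on both sides.
        Console.WriteLine(Regex.Matches(sample, @"(?<=\s|^)\bstring\b(?=\s|$)").Count); // 1
    }
}
```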
This regex will pick string if it is between spaces: \sstring\s
var sample = "\"string\" string \"string\" astring 'string_ string?string string ";
var regx = new Regex(@"\sstring\s");
var matches = regx.Matches(sample);
foreach (Match mt in matches)
{
    Console.WriteLine($"{mt.Value} {mt.Index,3} {mt.Length,3}");
}

How to match exactly one or more characters inside boundary

Currently I am using this pattern: [HelloWorld]{1,}.
So if my input is: Hello -> It matches.
But if my input is WorldHello -> It still matches, which is not right.
So how can I make the input string match exactly the value inside the pattern?
Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}
In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And .{1,} or .+ both match 1 or more characters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want the word to stand alone, and not be part of a longer word?
Then you could use wordboundaries \b, which indicate the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]) or the beginning of a line ^ or the end of a line $.
For example @"\bHelloWorld\b" to get a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time was preceded by @.
That is because in a verbatim string the backslashes don't have to be escaped.
It seems you need an online regex tester to check your pattern. For example, you can find one of them here, and you can also study the C# regex reference here
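The three variants discussed above (character set, literal word, literal word with boundaries) can be compared with a minimal sketch. The class name and helper names are made up for illustration; the patterns and test strings are the ones from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HelloWorldDemo
{
    // Character set: one or more characters taken from the set {H,e,l,o,W,r,d}.
    public static bool CharSet(string s) => Regex.IsMatch(s, "[HelloWorld]{1,}");

    // Literal word, allowed anywhere inside a longer string.
    public static bool Literal(string s) => Regex.IsMatch(s, "HelloWorld");

    // Literal word that must stand alone between word boundaries.
    public static bool WholeWord(string s) => Regex.IsMatch(s, @"\bHelloWorld\b");

    public static void Main()
    {
        Console.WriteLine(CharSet("WorldHello"));           // True
        Console.WriteLine(Literal("WorldHello"));           // False
        Console.WriteLine(Literal("blaHelloWorldbla"));     // True
        Console.WriteLine(WholeWord("blaHelloWorldbla"));   // False
        Console.WriteLine(WholeWord("bla HelloWorld bla")); // True
    }
}
```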
Try this pattern:
[a-zA-Z]{1,}
You can test it online

Regex - How to replace a certain word given a few starting letters

I have the following string with me - "ct lungs, mediastinum". Now I want to do a Regex.Replace such that word starting with the letters "media" in the expression is converted to "chest".
So, the following strings should be converted to "ct chest no contrast" -
"ct media no contrast"
"ct medias no contrast"
"ct mediastin no contrast"
etc.
I wrote
Regex.Replace(myString,@"\bmedia.*\b"," chest ")
but this is taking everything after "media" and "media" included and changing it to "chest". So, if I use the above on the given example then the words "no contrast" are lost. What can I do to only replace the word starting with "media" to "chest" and leave everything after that in the string as it is?
Thanks a lot!
The .* is greedy, meaning it will try to match as many characters as possible. You can make it match as few as possible by using .*? instead.
Regex.Replace(myString,@"\bmedia.*?\b"," chest ")
string text = Regex.Replace( inputString, @"media\w*", "chest" , RegexOptions.None );
This means replace media + 0 or more matches of any word character with chest.
You may want to use:
\bmedia\w*
\b means word boundary, so you will only do it if the word starts with media
In regex \S* means non whitespace character zero or more times. So try with this one:
Regex.Replace(myString,@"\bmedia\S*"," chest ")
You can switch the \S* to [a-zA-Z]* if you want to allow only letters.
Make your * quantifier non-greedy by following it with ?. This means it will stop consuming at the first word boundary it finds, not the last one (the end of the string).
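As a quick check of the suggestions above, the sketch below applies \bmedia\w* to the question's sample strings. The class and method names are hypothetical, and the replacement is "chest" without surrounding spaces so that the existing spacing is preserved; that choice is an assumption about the desired output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MediaReplaceDemo
{
    // Hypothetical helper: replaces any whole word starting with "media" by "chest".
    public static string Fix(string s) => Regex.Replace(s, @"\bmedia\w*", "chest");

    public static void Main()
    {
        string[] inputs =
        {
            "ct media no contrast",
            "ct medias no contrast",
            "ct mediastin no contrast",
        };

        foreach (string input in inputs)
        {
            Console.WriteLine(Fix(input)); // ct chest no contrast
        }

        // The greedy original swallows the rest of the string.
        string greedy = Regex.Replace("ct media no contrast", @"\bmedia.*\b", " chest ");
        Console.WriteLine(greedy.Contains("contrast")); // False
    }
}
```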
Without Regex, maybe something like this (requires using System.Linq)?
// Swap every space-separated word that starts with "media" for "chest".
var words = myString.Split(' ').Select(w => w.StartsWith("media") ? "chest" : w);
myString = string.Join(" ", words);

Replace with wildcards

I need some advice. Suppose I have the following string: Read Variable
I want to find all pieces of text like this in a string and turn each of them into the following: Variable = MessageBox.Show. So as additional examples:
"Read Dog" --> "Dog = MessageBox.Show"
"Read Cat" --> "Cat = MessageBox.Show"
Can you help me? I need quick advice on using RegEx in C#. I think it is a job involving wildcards, but I do not know how to use them very well... Also, I need this for a school project tomorrow... Thanks!
Edit: This is what I have done so far and it does not work: Regex.Replace(String, "Read ", " = Messagebox.Show").
You can do this
string ns= Regex.Replace(yourString,@"Read\s+(.*?)(?:\s|$)","$1 = MessageBox.Show");
\s+ matches 1 to many space characters
(.*?)(?:\s|$) matches 0 to many characters till the first space (i.e \s) or till the end of the string is reached(i.e $)
$1 represents the first captured group i.e (.*?)
You might want to clarify your question... but here goes:
If you want to match the next word after "Read " in regex, use Read (\w*) where \w is the word character class and * is the greedy "zero or more" quantifier.
If you want to match everything after "Read " in regex, use Read (.*)$ where . will match all characters and $ means end of line.
With either regex, you can use a replace of $1 = MessageBox.Show as $1 will reference the first matched group (which was denoted by the parenthesis).
Complete code:
replacedString = Regex.Replace(inStr, @"Read (.*)$", "$1 = MessageBox.Show");
The problem with your attempt is that it cannot know that the replacement string should be inserted after your variable. Let's assume that valid variable names contain letters, digits and underscores (which can be conveniently matched with \w). That means any other character ends the variable name. Then you could match the variable name, capture it (using parentheses) and put it in the replacement string with $1:
output = Regex.Replace(input, @"Read\s+(\w+)", "$1 = MessageBox.Show");
Note that \s+ matches one or more arbitrary whitespace characters. \w+ matches one or more letters, digits and underscores. If you want to restrict variable names to letters only, this is the place to change it:
output = Regex.Replace(input, @"Read\s+([a-zA-Z]+)", "$1 = MessageBox.Show");
Here is a good tutorial.
Finally, note that in C# it is advisable to write regular expressions as verbatim strings (@"..."). Otherwise, you will have to double escape everything so that the backslashes get through to the regex engine, and that really lessens the readability of the regex.
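Putting the capture-group approach together, a minimal sketch might look like this. The class name, method name, and the multi-statement sample strings are invented for illustration; the patterns are the ones proposed in the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReadRewriteDemo
{
    // Hypothetical helper: captures the word after "Read" and reuses it via $1.
    public static string Rewrite(string s) =>
        Regex.Replace(s, @"Read\s+(\w+)", "$1 = MessageBox.Show");

    public static void Main()
    {
        Console.WriteLine(Rewrite("Read Dog"));               // Dog = MessageBox.Show
        Console.WriteLine(Rewrite("Read Dog then Read Cat")); // Dog = MessageBox.Show then Cat = MessageBox.Show

        // For comparison: capturing to the end of the line takes everything after "Read ".
        Console.WriteLine(Regex.Replace("Read Dog now", @"Read (.*)$", "$1 = MessageBox.Show")); // Dog now = MessageBox.Show
    }
}
```

Capturing with \w+ keeps the match limited to a single variable name, so several "Read X" occurrences in one string are rewritten independently.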
