Remove whitespace near a character using regex in a long text - c#

How do I delete one or more whitespace characters next to a character in a long text? I do not want to remove whitespace that is not adjacent to the matching character; only the whitespace directly next to it should go, not all whitespace in the input string. For example:
[text][space][space]![space][text] should result in [text]![text]
[text][space][space]![space][space][space][text] should result in [text]![text]
[text][space]![space][space][text] should result in [text]![text]
[text][space]![space][text] should result in [text]![text]
[text]![space][space][text] should result in [text]![text]
[text][space][space]![text] should result in [text]![text]
[text][space][space]! should result in [text]!
![space][space][text] should result in ![text]
The code I am going to write is:
// Needs using System.Text; assumes `text` holds the input string.
var sb = new StringBuilder(text);
for (int i = 0; i < sb.Length; i++)
{
    if (sb[i] == '!') // the desired character "!"
    {
        // remove all whitespace after the character until a non-whitespace
        // character is found or the string ends
        while (i + 1 < sb.Length && char.IsWhiteSpace(sb[i + 1]))
        {
            sb.Remove(i + 1, 1);
        }
        // remove all whitespace before the character until a non-whitespace
        // character is found or the string starts
        while (i > 0 && char.IsWhiteSpace(sb[i - 1]))
        {
            sb.Remove(i - 1, 1);
            i--; // the character itself has shifted one position to the left
        }
    }
}
string result = sb.ToString();
Is there a better way of removing whitespaces near a character using Regex?
UPDATE: I do not want to remove other white spaces which are not present adjacent to the matching string. For example:
some_text[space]some_other_text[space][space]![space]some_text[space]some_other_text
should become
some_text[space]some_other_text!some_text[space]some_other_text

string input = "This is   text !   with far too much " +
"whitespace.";
string pattern = "\\s*!\\s*"; // optional whitespace, the '!', optional whitespace
string replacement = "!";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement); // "This is   text!with far too much whitespace."
taken from http://msdn.microsoft.com/de-de/library/vstudio/xwewhkd1.aspx
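A minimal sketch of the same pattern applied to a few of the cases from the question. The class name TrimAroundChar and the helper Collapse are made up for illustration; Regex.Escape is used on the assumption that the target may be a regex metacharacter. Note that \s also matches tabs and line breaks, not only spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimAroundChar
{
    // Removes every run of whitespace directly before or after `target`.
    // Whitespace elsewhere in the input is left untouched.
    public static string Collapse(string input, char target)
    {
        string literal = target.ToString();
        string pattern = @"\s*" + Regex.Escape(literal) + @"\s*";
        // A MatchEvaluator avoids '$' being read as a substitution token.
        return Regex.Replace(input, pattern, m => literal);
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("text  ! text", '!')); // text!text
        Console.WriteLine(Collapse("a b  !  c d", '!'));  // a b!c d
        Console.WriteLine(Collapse("!  text", '!'));      // !text
    }
}
```

Because the replacement is applied to every match, all occurrences of the character in the text are handled in a single call.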

Related

Regex to match last character in string - C#

How can I remove the last ';' in a string?
In case of comment at the end of the string I need to return the ';' before the comment.
Example:
"line 1 //comment
line2;
extra text; //comment may also contain ;."
You didn't say what you want to do with the character, so here is a solution that replaces it:
string pattern = "(?<!//.*);(?=[^;]*(//|$))";
Console.WriteLine(Regex.Replace("line 1 //comment", pattern, "#"));
Console.WriteLine(Regex.Replace("line2;", pattern, "#"));
Console.WriteLine(Regex.Replace("extra; text; //comment may also contain ;.", pattern, "#"));
Output:
line 1 //comment
line2#
extra; text# //comment may also contain ;.
This is slightly ugly with Regex, but here it is:
var str = @"line 1 //comment
line2; test;
extra text; //comment may also contain ;.";
var matches = Regex.Matches(str, @"^(?:(?<!//).)+(;)", RegexOptions.Multiline);
if (matches.Count > 0)
{
Console.WriteLine(matches[matches.Count - 1].Groups[1].Index);
}
We get a match for the last semicolon in each line (that isn't preceded by a comment), then we look at the last of these matches.
We have to do this on a line-by-line basis, as comments apply for the whole line.
If you want to process each line individually (your question doesn't say this, but it implies it), then loop over matches instead of just looking at the last one.
If you want to replace each semicolon with another character, then you can do something like this:
const string replacement = "#";
var result = Regex.Replace(str, @"^((?:(?<!//).)+);", "$1" + replacement, RegexOptions.Multiline);
If you want to remove it entirely, then simply:
var result = Regex.Replace(str, @"^((?:(?<!//).)+);", "$1", RegexOptions.Multiline);
If you just want to remove the final semicolon in the entire string, then you can just use string.Remove:
var matches = Regex.Matches(str, @"^(?:(?<!//).)+(;)", RegexOptions.Multiline);
if (matches.Count > 0)
{
str = str.Remove(matches[matches.Count - 1].Groups[1].Index, 1);
}

Regex replace Windows line break characters

I have this bit of code, which is supposed to replace the Windows linebreak character (\r\n) with an empty character.
However, it does not seem to replace anything: when I inspect the string after the regex is applied, the line-break characters are still there.
private void SetLocationsAddressOrGPSLocation(Location location, string locationString)
{
//Regex check for the characters a-z|A-Z.
//Remove any \r\n characters (Windows Newline characters)
locationString = Regex.Replace(locationString, @"[\\r\\n]", "");
int test = Regex.Matches(locationString, @"[\\r\\n]").Count; //Curiously, this outputs 0
int characterCount = Regex.Matches(locationString, @"[a-zA-Z]").Count;
//If there were characters, set the location's address to the locationString
if (characterCount > 0)
{
location.address = locationString;
}
//Otherwise, set the location's coordinates to the locationString.
else
{
location.coordinates = locationString;
}
} //End void SetLocationsAddressOrGPSLocation()
You are using a verbatim string literal, so the \\ reaches the regex engine as two backslashes, which it treats as a literal \.
So your regex is actually matching \, r and n.
Use
locationString = Regex.Replace(locationString, @"[\r\n]+", "");
This [\r\n]+ pattern will make sure you will remove each and every \r and \n symbol, and you won't have to worry if you have a mix of newline characters in your file. (Sometimes, I have both \n and \r\n endings in text files).
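The difference between the two patterns can be seen in a small sketch. The class name LineBreakDemo, the helper StripLineBreaks and the sample address are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Strips every carriage return and line feed from the input.
    public static string StripLineBreaks(string s)
    {
        return Regex.Replace(s, @"[\r\n]+", "");
    }

    public static void Main()
    {
        string input = "12 Main St\r\nSpringfield\n";

        // The pattern from the question: the regex engine sees [\\r\\n],
        // a character class containing a backslash, 'r' and 'n'.
        string wrong = Regex.Replace(input, @"[\\r\\n]", "");

        Console.WriteLine(wrong.Contains("\r\n"));                             // True: the line break survived
        Console.WriteLine(StripLineBreaks(input) == "12 Main StSpringfield"); // True
    }
}
```

With the doubled backslashes, the letters r and n are stripped from the text while the actual line breaks stay in place.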

REGEX: What is the meaning of . followed by +?

Sorry to ask this question, but I'm really stuck. This code belongs to someone who has already left the company, and it is causing problems.
protected override string CleanDataLine(string line)
{
//the regular expression for GlobalSight log
Regex regex = new Regex("\".+\"");
Match match = regex.Match(line);
if (match.Success)
{
string matchPart = match.Value;
matchPart =
matchPart.Replace(string.Format("\"{0}\"",
Delimiter), string.Format("\"{0}\"", "*+*+"));
matchPart = matchPart.Replace(Delimiter, '_');
matchPart =
matchPart.Replace(string.Format("\"{0}\"", "*+*+"),
string.Format("\"{0}\"", Delimiter));
line = line.Replace(match.Value, matchPart);
}
return line;
}
I've spent too much time researching. What was he trying to accomplish?
Thanks for helping.
That regex matches
a quote ("),
followed by one or more (+) characters (any character except newlines (.), as many as possible),
followed by a quote ".
It's not a very good regex. For example, in the string foo "bar" baz "bam" boom, it will match "bar" baz "bam".
If the intention is to match a quoted string, a more appropriate regex would be "[^"]*".
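A short sketch of the difference between the greedy pattern and the negated character class, using the example string from above. The class name QuoteMatchDemo and the helper FirstMatch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteMatchDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string input = "foo \"bar\" baz \"bam\" boom";

        // Greedy .+ runs to the last quote on the line.
        Console.WriteLine(FirstMatch(input, "\".+\""));     // "bar" baz "bam"

        // [^"]* cannot cross a quote, so the match stops at the first closing quote.
        Console.WriteLine(FirstMatch(input, "\"[^\"]*\"")); // "bar"
    }
}
```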
. is any character except \n, + means 1 or more.
So: .+ is "1 or more characters"
The dot matches any character except line breaks.
+ is "one or more" (equal to {1,})
line is just some text; it could be Hello World, or anything really.
In new Regex("\".+\""), each \" is an escaped quote, so the pattern looks for a double quote, followed by .+ (any character except a newline, one or more times), followed by another double quote.
If the pattern matches, the code grabs the matched text via match.Value.
From there it is an ordinary search and replace on the matched text.

How to replace the text between two characters in c#

I am a bit confused about how to write a regex that finds the text between the two delimiters { } and replaces it with other text in C#. How can I do this?
I tried this.
StreamReader sr = new StreamReader(@"C:abc.txt");
string line;
line = sr.ReadLine();
while (line != null)
{
if (line.StartsWith("<"))
{
if (line.IndexOf('{') == 29)
{
string s = line;
int start = s.IndexOf("{");
int end = s.IndexOf("}");
string result = s.Substring(start+1, end - start - 1);
}
}
//write the line to console window
Console.WriteLine(line);
//Read the next line
line = sr.ReadLine();
}
//close the file
sr.Close();
Console.ReadLine();
I want to replace the found text (result) with other text.
Use Regex with pattern: \{([^\}]+)\}
Regex yourRegex = new Regex(@"\{([^\}]+)\}");
string result = yourRegex.Replace(yourString, "anyReplacement");
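Note that replacing the whole match also removes the braces. If the braces should stay, one option is to rebuild them in the replacement. The following is only a sketch: the class name BraceReplaceDemo and the helper ReplaceBraced are hypothetical, and it uses [^{}]* rather than the pattern above so that empty {} pairs are handled too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceReplaceDemo
{
    // Replaces the text inside every {...} pair and keeps the braces themselves.
    public static string ReplaceBraced(string input, string replacement)
    {
        // [^{}]* matches the content between one opening and one closing brace.
        // A MatchEvaluator is used so that '$' in the replacement is not
        // interpreted as a substitution token.
        return Regex.Replace(input, @"\{[^{}]*\}", m => "{" + replacement + "}");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceBraced("data{value here} data", "new")); // data{new} data
        Console.WriteLine(ReplaceBraced("{a} and {b}", "x"));             // {x} and {x}
    }
}
```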
string s = "data{value here} data";
int start = s.IndexOf("{");
int end = s.IndexOf("}", start);
string result = s.Substring(start+1, end - start - 1);
s = s.Replace(result, "your replacement value");
To get the string between the braces that is to be replaced, use the Regex pattern
string errString = "This {match here} uses 3 other {match here} to {match here} the {match here}ation";
string toReplace = Regex.Match(errString, @"\{([^\}]+)\}").Groups[1].Value;
Console.WriteLine(toReplace); // prints 'match here'
To then replace the text found you can simply use the Replace method as follows:
string correctString = errString.Replace(toReplace, "document");
Explanation of the Regex pattern:
\{ # Escaped curly brace, means "starts with a '{' character"
( # Parentheses in a regex mean "put (capture) the stuff
# in between into the Groups array"
[^\}] # Any character that is not a '}' character
+ # One or more occurrences of the aforementioned "non '}' char"
) # Close the capturing group
\} # "Ends with a '}' character"
The following regular expression will match the criteria you specified:
string pattern = @"^(\<.{27})(\{[^}]*\})(.*)";
The following would perform a replace:
string result = Regex.Replace(input, pattern, "$1 REPLACE $3");
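For the input: "<012345678901234567890123456{sdfsdfsdf}sadfsdf" this gives the output "<012345678901234567890123456 REPLACE sadfsdf". Note that this pattern expects the { at zero-based index 28; to mirror the question's IndexOf('{') == 29 check exactly, use .{28} instead of .{27}.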
You need two calls to Substring(), rather than one: One to get textBefore, the other to get textAfter, and then you concatenate those with your replacement.
int start = s.IndexOf("{");
int end = s.IndexOf("}");
//I skip the check that end is valid to avoid clutter
string textBefore = s.Substring(0, start);
string textAfter = s.Substring(end+1);
string replacedText = textBefore + newText + textAfter;
If you want to keep the braces, you need a small adjustment:
int start = s.IndexOf("{");
int end = s.IndexOf("}");
string textBefore = s.Substring(0, start + 1); // include the opening brace
string textAfter = s.Substring(end);
string replacedText = textBefore + newText + textAfter;
The simplest way, if you want to avoid regex altogether, is to use the Split method. Here is one approach:
string s = "sometext {getthis}";
string result= s.Split(new char[] { '{', '}' })[1];
You can use the Regex expression that some others have already posted, or you can use a more advanced Regex that uses balancing groups to make sure the opening { is balanced by a closing }.
That expression is then (?<BRACE>\{)([^\}]*)(?<-BRACE>\})
You can test this expression online at RegexHero.
You simply match your input string with this Regex pattern, then use the replace methods of Regex, for instance:
var result = Regex.Replace(input, @"(?<BRACE>\{)([^\}]*)(?<-BRACE>\})", textToReplaceWith);
For more C# Regex Replace examples, see http://www.dotnetperls.com/regex-replace.

Replace special character with white space through regex

I have a function which replaces characters.
public static string Replace(string value)
{
value = Regex.Replace(value, "[\n\r\t]", " ");
return value;
}
For value="abc\nbcd abcd abcd\ ", any unwanted whitespace in the string should also be removed, meaning I want a result like this:
value="abcabcdabcd". Please help me change the regex pattern to get the desired result. Thanks a lot.
If you need to remove any number of whitespace characters from the string, probably you're looking for something like this:
value = Regex.Replace(value, @"\s+", "");
where \s matches any whitespace character and + means one or more times.
Instead of replacing your newline, tab, etc. characters with a space, just replace all whitespace with nothing:
public static string RemoveWhitespace(string value)
{
return Regex.Replace(value, "\\s", "");
}
\s is a special character group that matches all whitespace characters. (The backslash is doubled because the backslash has a special meaning in C# strings as well.) The following MSDN link contains the exact definition of that character group:
Character Classes: White-Space Character: \s
You may want to try \s, which matches whitespace. With the statement Regex.Replace(value, @"\s", ""), the output will be "abcabcdabcd".
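As a rough sketch, the regex approach can be compared with a LINQ-based alternative that filters characters with char.IsWhiteSpace. The alternative is not taken from the answers above, and the class and method names are made up for illustration. For common whitespace such as spaces, tabs and line breaks, both should give the same result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Regex approach: remove every run of whitespace.
    public static string RemoveWithRegex(string value)
    {
        return Regex.Replace(value, @"\s+", "");
    }

    // Non-regex alternative: keep only the characters that are not whitespace.
    public static string RemoveWithLinq(string value)
    {
        return new string(value.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    public static void Main()
    {
        string value = "abc\nbcd \t abcd\r\n";
        Console.WriteLine(RemoveWithRegex(value)); // abcbcdabcd
        Console.WriteLine(RemoveWithLinq(value));  // abcbcdabcd
    }
}
```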
