How do I remove multiple offending characters from my string? [duplicate]

This question already has answers here:
Remove characters from C# string
(22 answers)
Closed 9 years ago.
Here's my working code:
string Input;
string Output;
Input = data;
Output = Input.Replace(")", "");
Here, I am simply removing the closing parenthesis ")" from my string, if it exists. Now how do I expand the list of offending characters like ")" to include "(" and "-" as well?
I realize I can write 2 more Output-like statements, but I'm wondering if there is a better way...

If you're just doing a couple of replacements (I see you're only doing three), the easiest way, without worrying about Regex or StringBuilder, is to chain the three Replace calls into one statement:
Output = Input.Replace("(", "").Replace(")", "").Replace("-", "");
... which is marginally better than storing the result in Output every time.

Output = Regex.Replace(Input, "[()-]", "");
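The [] characters in the expression create a character class, so the brackets themselves are not matched; the pattern matches any single one of the characters listed inside them. The "-" is placed last so that it is taken literally rather than as a range.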

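LINQ solution (filter with Where; Except would also remove duplicate characters, because it is a set operation):
Output = new String(Input.Where(c => !"()-".Contains(c)).ToArray());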
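A small sketch contrasting the two LINQ operators on a phone-number-like input. Enumerable.Except is a set difference, so besides dropping the unwanted characters it keeps only the first occurrence of every other character; Where is a plain filter. The class and method names are hypothetical.

```csharp
using System.Linq;

public static class ExceptVsWhere
{
    // Set difference: yields each *distinct* character of input that is not in bad,
    // so repeated letters or digits are silently dropped as well.
    public static string WithExcept(string input, string bad) =>
        new string(input.Except(bad).ToArray());

    // Plain filter: keeps every character (duplicates included) that is not in bad.
    public static string WithWhere(string input, string bad) =>
        new string(input.Where(c => bad.IndexOf(c) < 0).ToArray());
}
```

For "(555) 123-4567" with "()-", WithWhere returns "555 1234567", while WithExcept returns "5 123467" because the repeated 5s disappear.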

As an alternative to Regex, it may be easier to manage the replacements as a collection and apply them with a StringBuilder:
var replacements = new[] { "(", ")", "-" };
var output = new StringBuilder(Input);
foreach (var r in replacements)
output.Replace(r, string.Empty);
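A self-contained version of the StringBuilder approach, assuming the strings to remove arrive as a params array; Replacements.RemoveAll is a hypothetical name. StringBuilder.Replace throws on an empty search string, so empty entries are skipped.

```csharp
using System.Text;

public static class Replacements
{
    // Applies each removal in turn to one StringBuilder buffer instead of
    // creating a new intermediate string per Replace call.
    public static string RemoveAll(string input, params string[] unwanted)
    {
        var output = new StringBuilder(input);
        foreach (var r in unwanted)
        {
            // StringBuilder.Replace rejects a null or empty oldValue, so skip those.
            if (!string.IsNullOrEmpty(r))
                output.Replace(r, string.Empty);
        }
        return output.ToString();
    }
}
```

For example, RemoveAll("(914) 395-1430", "(", ")", "-") gives "914 3951430".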

You can use Regex.Replace() (see its documentation in the .NET API reference).

You can use a List that contains your bad words, then use ForEach to iterate over it and replace every bad string:
StringBuilder output = new StringBuilder("(Hello) W,o.r;ld");
List<string> badwords = new List<string>();
badwords.Add("(");
badwords.Add(")");
badwords.Add(",");
badwords.Add(".");
badwords.Add(";");
badwords.ForEach(bad => output = output.Replace(bad, String.Empty));
//Result "Hello World"
Kind regards.
//Edit:
Implemented changes suggested by Khan.

This will also let you do the same thing:
private static string ReplaceBadWords(string[] BadStrings, string input)
{
StringBuilder sb = new StringBuilder(input);
BadStrings.ToList().ForEach(b =>
{
if(b != "")
{
sb = sb.Replace(b, string.Empty);
}
});
return sb.ToString();
}
Sample usage would be:
string[] BadStrings = new string[]
{
")",
"(",
"random",
""
};
string input = "Some random text()";
string output = ReplaceBadWords(BadStrings, input);

I'd probably use a regular expression, as it's terse and to the point. If you're scared of regular expressions, you can teach the computer to write them for you. Here's a simple class for cleaning strings; you just provide it with a list of invalid characters:
class StringCleaner
{
private Regex regex ;
public StringCleaner( string invalidChars ) : this ( (IEnumerable<char>) invalidChars )
{
return ;
}
public StringCleaner ( params char[] invalidChars ) : this( (IEnumerable<char>) invalidChars )
{
return ;
}
public StringCleaner( IEnumerable<char> invalidChars )
{
const string HEX = "0123456789ABCDEF" ;
SortedSet<char> charSet = new SortedSet<char>( invalidChars ) ;
StringBuilder sb = new StringBuilder( 2 + 6*charSet.Count ) ;
sb.Append('[') ;
foreach ( ushort c in charSet )
{
sb.Append(@"\u" )
.Append( HEX[ ( c >> 12 ) & 0x000F ] )
.Append( HEX[ ( c >> 8 ) & 0x000F ] )
.Append( HEX[ ( c >> 4 ) & 0x000F ] )
.Append( HEX[ ( c >> 0 ) & 0x000F ] )
;
}
sb.Append(']') ;
this.regex = new Regex( sb.ToString() ) ;
}
public string Clean( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
string value = this.regex.Replace(s,"") ;
return value ;
}
}
Once you have that, it's easy:
static void Main(string[] args)
{
StringCleaner cleaner = new StringCleaner( "aeiou" ) ;
string dirty = "The quick brown fox jumped over the lazy dog." ;
string clean = cleaner.Clean(dirty) ;
Console.WriteLine( clean ) ;
return;
}
At the end of which clean is Th qck brwn fx jmpd vr th lzy dg.
Easy!
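The same idea in a more compact form, as a sketch: every character is written as a \uXXXX escape, so characters such as ], -, ^ and \ are always taken literally inside the class. CharStripper is a hypothetical name, and unlike the class above it rejects an empty set, since a bare [] would not form a valid character class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class CharStripper
{
    private readonly Regex regex;

    public CharStripper(IEnumerable<char> invalidChars)
    {
        // Build the body of the character class as \uXXXX escapes (4 hex digits each).
        string body = string.Concat(invalidChars.Distinct().Select(c => $"\\u{(int)c:X4}"));
        if (body.Length == 0)
            throw new ArgumentException("At least one character is required.", nameof(invalidChars));
        regex = new Regex("[" + body + "]");
    }

    // Removes every occurrence of any invalid character.
    public string Clean(string s) =>
        string.IsNullOrEmpty(s) ? s : regex.Replace(s, "");
}
```

new CharStripper("aeiou").Clean("The quick brown fox jumped over the lazy dog.") produces the same "Th qck brwn fx jmpd vr th lzy dg." as the original class.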

Related

Remove specific characters except last

I have a text string and I want to replace the dots with underscores except for the last character found in the string.
Example:
input = "video.coffee.example.mp4"
result = "video_coffee_example.mp4"
I have code, but it replaces every dot, including the last one.
First option (fails):
static string replaceForUnderScore(string file)
{
return file = file.Replace(".", "_");
}
I implemented a second option that works for me, but I find it very long-winded and not very optimized:
static string replaceForUnderScore(string file)
{
string result = "";
var splits = file.Split(".");
var extension = splits.LastOrDefault();
splits = splits.Take(splits.Count() - 1).ToArray();
foreach (var strItem in splits)
{
result = result + "_" + strItem;
}
result = result.Substring(1, result.Length-1);
string finalResult = result + "."+extension;
return finalResult;
}
Is there a better way to do it?

Since you work with files, I suggest using the Path class: all we want is to change the file name only, while keeping the extension intact:
static string replaceForUnderScore(string file) =>
Path.GetFileNameWithoutExtension(file).Replace('.', '_') + Path.GetExtension(file);
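One thing to keep in mind is that Path.GetFileNameWithoutExtension also drops any directory part. A sketch that puts the directory back, assuming the input may be a full path; FileNames.ReplaceDotsInName is a hypothetical name.

```csharp
using System.IO;

public static class FileNames
{
    // Replaces every '.' in the file name except the one that starts the
    // extension; dots in the directory part are left untouched.
    public static string ReplaceDotsInName(string file)
    {
        // GetDirectoryName returns "" when there is no directory part.
        string dir = Path.GetDirectoryName(file) ?? string.Empty;
        string name = Path.GetFileNameWithoutExtension(file).Replace('.', '_');
        return Path.Combine(dir, name + Path.GetExtension(file));
    }
}
```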

You can replace all the dots with an underscore except for the last dot by asserting that there is still a dot present to the right when matching one.
string result = Regex.Replace(input, @"\.(?=[^.]*\.)", "_");
The result will be
video_coffee_example.mp4
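A runnable version of the lookahead pattern, with the Regex created once and reused; DotReplacer is a hypothetical wrapper name. Inputs with zero or one dot come back unchanged, because no dot in them has another dot to its right.

```csharp
using System.Text.RegularExpressions;

public static class DotReplacer
{
    // A '.' is replaced only when another '.' still follows it somewhere to the
    // right, so the last dot (the one before the extension) survives.
    private static readonly Regex AllButLastDot = new Regex(@"\.(?=[^.]*\.)");

    public static string Apply(string input) => AllButLastDot.Replace(input, "_");
}
```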

Regex will help you to do this.
Add the namespace using System.Text.RegularExpressions;
And use this code:
var regex = new Regex(Regex.Escape("."));
var newText = regex.Replace("video.coffee.example.mp4", "_", 2);
Here we specify the maximum number of replacements (2), so only the first two dots are replaced.
The output would be the following:
video_coffee_example.mp4
Additionally, you can update the code to replace any number of dots excluding the last one.
var replaceChar = '.';
var regex = new Regex(Regex.Escape(replaceChar.ToString()));
var replaceWith = "_";
// The text to process
var text = "video.coffee.example.mp4";
// Count how many chars to replace excluding extension
var replaceCount = text.Count(s => s == replaceChar) - 1;
var newText = regex.Replace(text, replaceWith, replaceCount);

Off the top of my head but this might work.
return $"{file.Replace(".mp4","").Replace(".","_")}.mp4";

The simplest (and probably fastest) way is just to iterate over the string:
static string replaceForUnderScore(string file)
{
StringBuilder sb = new StringBuilder( file.Length ) ;
int lastDot = -1 ;
for ( int i = 0 ; i < file.Length ; ++i )
{
char c = file[i] ;
// if we found a '.', replace it with '_' and save its position
if ( c == '.' )
{
c = '_' ;
lastDot = i ;
}
sb.Append( c ) ;
}
// if we changed any '.' to '_', convert the last such replacement back to '.'
if ( lastDot >= 0 )
{
sb.Replace ( '_' , '.' , lastDot, 1 );
}
return sb.ToString();
}
Another approach would be to use System.IO.Path. It's certainly the most succinct:
static string replaceForUnderScore( string file )
{
string ext = Path.GetExtension( file ) ;
string name = Path
.GetFileNameWithoutExtension( file )
.Replace( '.' , '_' )
;
return Path.ChangeExtension( name , ext ) ;
}
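The loop above can also be shortened with String.LastIndexOf, which locates the last dot directly so that only the text before it needs rewriting. A sketch; LastDot.ReplaceAllButLast is a hypothetical name, and unlike the Path-based versions it treats the input as a plain string, so dots in a directory part would be replaced too.

```csharp
public static class LastDot
{
    // Finds the last '.' once, then replaces dots only in the part before it.
    public static string ReplaceAllButLast(string file)
    {
        int last = file.LastIndexOf('.');
        if (last <= 0)
            return file; // no dot at all, or the only dot is the first character
        return file.Substring(0, last).Replace('.', '_') + file.Substring(last);
    }
}
```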

Keep only numeric value from a string?

I have some strings like this
string phoneNumber = "(914) 395-1430";
I would like to strip out the parentheses and the dash; in other words, just keep the numeric values.
So the output could look like this
9143951430
How do I get the desired output ?

You can do any of the following:
Use regular expressions. You can use a regular expression with either
A negated character class that defines the characters you don't want (anything other than a decimal digit):
private static readonly Regex rxNonDigits = new Regex( @"[^\d]+");
In which case, you can take either of these approaches:
// simply replace the offending substrings with an empty string
private string CleanStringOfNonDigits_V1( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
string cleaned = rxNonDigits.Replace(s, "") ;
return cleaned ;
}
// split the string into an array of good substrings
// using the bad substrings as the delimiter. Then use
// String.Join() to splice things back together.
private string CleanStringOfNonDigits_V2( string s )
{
if (string.IsNullOrEmpty(s)) return s;
string cleaned = String.Join( "", rxNonDigits.Split(s) );
return cleaned ;
}
a positive character set that defines what you do want (decimal digits):
private static Regex rxDigits = new Regex( @"[\d]+") ;
In which case you can do something like this:
private string CleanStringOfNonDigits_V3( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
StringBuilder sb = new StringBuilder() ;
for ( Match m = rxDigits.Match(s) ; m.Success ; m = m.NextMatch() )
{
sb.Append(m.Value) ;
}
string cleaned = sb.ToString() ;
return cleaned ;
}
You're not required to use a regular expression, either.
You could use LINQ directly, since a string is an IEnumerable<char>:
private string CleanStringOfNonDigits_V4( string s )
{
if ( string.IsNullOrEmpty(s) ) return s;
string cleaned = new string( s.Where( char.IsDigit ).ToArray() ) ;
return cleaned;
}
If you're only dealing with western alphabets where the only decimal digits you'll see are ASCII, skipping char.IsDigit will likely buy you a little performance:
private string CleanStringOfNonDigits_V5( string s )
{
if (string.IsNullOrEmpty(s)) return s;
string cleaned = new string(s.Where( c => (uint)(c - '0') <= 9 ).ToArray() ) ;
return cleaned;
}
Finally, you can simply iterate over the string, chucking the characters you don't want, like this:
private string CleanStringOfNonDigits_V6( string s )
{
if (string.IsNullOrEmpty(s)) return s;
StringBuilder sb = new StringBuilder(s.Length) ;
for (int i = 0; i < s.Length; ++i)
{
char c = s[i];
if ( c < '0' ) continue ;
if ( c > '9' ) continue ;
sb.Append(s[i]);
}
string cleaned = sb.ToString();
return cleaned;
}
Or this:
private string CleanStringOfNonDigits_V7(string s)
{
if (string.IsNullOrEmpty(s)) return s;
StringBuilder sb = new StringBuilder(s);
int j = 0 ;
int i = 0 ;
while ( i < sb.Length )
{
bool isDigit = char.IsDigit( sb[i] ) ;
if ( isDigit )
{
sb[j++] = sb[i++];
}
else
{
++i ;
}
}
sb.Length = j;
string cleaned = sb.ToString();
return cleaned;
}
From the standpoint of clarity and cleanness of code, version 1 is what you want. It's hard to beat a one-liner.
If performance matters, my suspicion is that version 7, the last version, is the winner. It creates only one temporary object, a StringBuilder, and does the transformation in place within the StringBuilder's buffer.
The other options all do more work.
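The ASCII shortcut in version 5 needs care: c - '0' is negative for characters such as '(', ' ' or '-', so a bare signed "< 10" test would accept them. A sketch of the usual unsigned-range trick; AsciiDigits is a hypothetical name. Note that, unlike char.IsDigit, it rejects non-ASCII decimal digits such as the Arabic-Indic three (U+0663).

```csharp
using System.Linq;

public static class AsciiDigits
{
    // Casting to uint makes anything below '0' wrap around to a huge value,
    // so one comparison accepts exactly '0'..'9'.
    public static bool IsAsciiDigit(char c) => (uint)(c - '0') <= 9;

    public static string Keep(string s) =>
        string.IsNullOrEmpty(s) ? s : new string(s.Where(IsAsciiDigit).ToArray());
}
```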

Use a regular expression:
string result = Regex.Replace(phoneNumber, @"[^\d]", "");

Try something like this:
return new String(input.Where(Char.IsDigit).ToArray());

string phoneNumber = "(914) 395-1430";
var numbers = String.Join("", phoneNumber.Where(char.IsDigit));

He means everything, @gleng:
Regex rgx = new Regex(@"\D");
str = rgx.Replace(str, "");

Instead of a regular expression, you can use a LINQ method:
phoneNumber = String.Concat(phoneNumber.Where(c => c >= '0' && c <= '9'));
or:
phoneNumber = String.Concat(phoneNumber.Where(Char.IsDigit));

How do I use regex to split only on commas not in angle brackets?

I have the string
DoubleGeneric<DoubleGeneric<int,string>,string>
I am trying to grab the 2 type arguments:
DoubleGeneric<int,string> and string
Initially I was using a split on ','. This worked, but only if the generic args are not themselves generic.
My Code:
string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string[] innerTypes = m.Groups[2].Value.Split( ',' );
foreach( string strInnerType in innerTypes ) {
Console.WriteLine( strInnerType );
}
Question:
How do I do a regex split on commas that are not encapsulated in angle brackets?

Both commas are between angle brackets! Regex does a bad job when parsing a complex nested syntax. The question should be, how to find a comma, which is between angle brackets that are themselves not between angle brackets. I don't think that this can be done with regex.
If possible, try to work with Reflection. You might also use CS-Script to compile your code snippet and then use Reflection to retrieve the information you need.

To split the example you have given you can use the following. However, this is not generic; it could be made generic based on the other strings that you expect. Depending on the variation of the strings you have, this method could get complex; but I would suggest that the use of Roslyn here is overkill...
string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
Regex Reg =
new Regex(@"(?i)<\s*\p{L}+\s*<\s*\p{L}+\s*,\s*\p{L}+\s*>\s*,\s*\p{L}+\s*>");
Match m = Reg.Match(fullName);
string str = m.ToString().Trim(new char[] { '<', '>' });
Regex rr = new Regex(@"(?i),(?!.*>\s*)");
string[] strArr = rr.Split(str);
I hope this helps.

The answers are correct, using Regex is the wrong approach.
I ended up doing a linear pass, replacing the commas that sit inside angle brackets with ~ characters, and then doing a split.
static void Main( string[] args ) {
string fullName = "Outer<blah<int,string>,int,blah<int,int>>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string inner = m.Groups[2].Value;
var genArgs = ParseInnerGenericArgs( inner );
foreach( string s in genArgs ) {
Console.WriteLine(s);
}
Console.ReadKey();
}
private static IEnumerable<string> ParseInnerGenericArgs( string inner ) {
List<string> pieces = new List<string>();
int angleCount = 0;
StringBuilder sb = new StringBuilder();
for( int i = 0; i < inner.Length; i++ ) {
string currChar = inner[i].ToString();
if( currChar == ">" ) {
angleCount--;
}
if( currChar == "<" ) {
angleCount++;
}
if( currChar == "," && angleCount > 0 ) {
sb.Append( "~" );
} else {
sb.Append( currChar );
}
}
foreach( string item in sb.ToString().Split( ',' ) ) {
pieces.Add(item.Replace('~',','));
}
return pieces;
}
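The same depth counter can split directly, without the ~ placeholder, which also avoids corrupting an argument that happens to contain a ~. A sketch, assuming the angle brackets are balanced; GenericArgs.SplitTopLevel is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class GenericArgs
{
    // Splits on commas at nesting depth 0 only; commas inside <...> stay
    // part of the current argument.
    public static List<string> SplitTopLevel(string inner)
    {
        var pieces = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in inner)
        {
            if (c == '<') depth++;
            else if (c == '>') depth--;

            if (c == ',' && depth == 0)
            {
                // Top-level comma: finish the current argument.
                pieces.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        pieces.Add(current.ToString());
        return pieces;
    }
}
```

SplitTopLevel("blah<int,string>,int,blah<int,int>") yields "blah<int,string>", "int" and "blah<int,int>".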

Here is the regex I will use:
\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$
([\w\.]+) matches "DoubleGeneric".
(\<.+\>)? matches the possible generic args like DoubleGeneric<OtherGeneric<int, ...>>
The key point is that no matter how many nested generic args you have you will have only one ">," in the whole expression.
You can use m.Groups[1] and m.Groups[4] to get the first and second type.

How to make these 2 methods more Efficient [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Hi guys this is my first question ever on SO so please go easy on me.
I am playing with lambdas/LINQ while building myself a few utility methods.
The first method takes a string like
"AdnanRazaBhatti"
and breaks it up like
"Adnan Raza Bhatti"
The second method takes a string like the first method and also takes
out String[] brokenResults
and returns the broken-up string like the first method, as well as filling the brokenResults array as follows:
"Adnan" "Raza" "Bhatti"
Questions:
A. Can you please suggest how to make these methods more efficient?
B. When I try to use StringBuilder, it tells me that extension methods like Where and Select do not exist for the StringBuilder class. Why is that? The indexer does work on StringBuilder to get the characters, e.g. StringBuilder s = new StringBuilder("Dang"); char c = s[0]; here c will be 'D'.
Code
Method 1:
public static string SplitCapital( string source )
{
string result = "";
int i = 0;
//Separate all the Capital Letter
var charUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
//If there is only one Capital letter then it is already atomic.
if ( charUpper.Count( ) > 1 ) {
var strLower = source.Split( charUpper );
foreach ( string s in strLower )
if ( i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s ) )
result += charUpper.ElementAt( i++ ) + s + " ";
return result;
}
return source;
}
Method 2:
public static string SplitCapital( string source, out string[] brokenResults )
{
string result = "";
int i = 0;
var strUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
if ( strUpper.Count( ) > 1 ) {
var strLower = source.Split( strUpper );
brokenResults = (
from s in strLower
where i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s )
select result = strUpper.ElementAt( i++ ) + s + " " ).ToArray( );
result = "";
foreach ( string s in brokenResults )
result += s;
return result;
}
else { brokenResults = new string[] { source }; }
return source;
}
Note:
I am planning to use these utility methods to break up the table column names I get from my database.
For example, if the column name is "BooksId", I will programmatically break it up as "Books Id" using one of these methods. I know there are other ways of renaming the column names, like in the design window or [dataset].[tableName].HeadersRow.Cells[0].Text = "Books Id", but I am also planning to use this method elsewhere in the future.
Thanks

You can use the following extension methods to split your string based on capital letters:
public static string Wordify(this string camelCaseWord)
{
/* CamelCaseWord will become Camel Case Word,
if the word is all upper, just return it*/
if (!Regex.IsMatch(camelCaseWord, "[a-z]"))
return camelCaseWord;
return string.Join(" ", Regex.Split(camelCaseWord, @"(?<!^)(?=[A-Z])"));
}
To split a string in a string array, you can use this:
public static string[] SplitOnVal(this string text,string value)
{
return text.Split(new[] { value }, StringSplitOptions.None);
}
If we take your example for consideration, the code will be as follows:
string strTest = "AdnanRazaBhatti";
var capitalCase = strTest.Wordify(); //Adnan Raza Bhatti
var brokenResults = capitalCase.SplitOnVal(" "); //split on spaces into an array
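Since the regex split already produces the parts, the asker's second method (joined string plus out array) can be a sketch that splits once and joins the result; CamelCase.SplitCapital is a hypothetical name. Consecutive capitals are split individually, so "BooksID" would give "Books", "I", "D".

```csharp
using System.Text.RegularExpressions;

public static class CamelCase
{
    // Zero-width split: (?<!^) skips position 0, (?=[A-Z]) splits before each capital.
    private static readonly Regex BeforeCapital = new Regex(@"(?<!^)(?=[A-Z])");

    public static string SplitCapital(string source, out string[] parts)
    {
        parts = BeforeCapital.Split(source);
        return string.Join(" ", parts);
    }
}
```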

Check this code
public static string SeperateCamelCase(this string value)
{
return Regex.Replace(value, "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1");
}

Looks to me like regular expressions is the way to go.
I think [A-Z][a-z]+ might be a good one to start with.
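A sketch of that starting point using Regex.Matches; CapitalWords.Words is a hypothetical name. Anything that does not fit the pattern (digits, all-caps runs such as "ID") is simply not returned, so it suits names like "BooksId" better than "BooksID".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CapitalWords
{
    // Each match is one capital letter followed by one or more lowercase letters.
    public static string[] Words(string source) =>
        Regex.Matches(source, "[A-Z][a-z]+").Cast<Match>().Select(m => m.Value).ToArray();
}
```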

Updated version. A StringBuilder is used to reduce memory usage.
string SplitCapital(string str)
{
//Search all capital letters and store indexes
var indexes = str
.Select((c, i) => new { c = c, i = i }) // Select information about char and position
.Where(c => Char.IsUpper(c.c)) // Get only capital chars
.Select(cl => cl.i); // Get indexes of capital chars
// If no indexes are found, or if the index count equals the source string length, return the source string
if (!indexes.Any() || indexes.Count() == str.Length)
{
return str;
}
// Create string builder from the source string
var sb = new StringBuilder(str);
// Reverse indexes and remove 0 if necessary
foreach (var index in indexes.Reverse().Where(i => i != 0))
{
// Insert spaces before capital letter
sb.Insert(index, ' ');
}
return sb.ToString();
}
string SplitCapital(string str, out string[] parts)
{
var splitted = SplitCapital(str);
parts = splitted.Split(new[] { ' ' }, StringSplitOptions.None);
return splitted;
}

How do I replace multiple spaces with a single space in C#?

How can I replace multiple spaces in a string with only one space in C#?
Example:
1 2 3  4    5
would be:
1 2 3 4 5

I like to use:
myString = Regex.Replace(myString, @"\s+", " ");
Since it will catch runs of any kind of whitespace (e.g. tabs, newlines, etc.) and replace them with a single space.
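A self-contained sketch of that one-liner with the Regex built once; Whitespace.Collapse is a hypothetical name. Leading and trailing runs also become a single space rather than being removed.

```csharp
using System.Text.RegularExpressions;

public static class Whitespace
{
    // \s matches spaces, tabs, newlines and other Unicode whitespace, so every
    // run of them (in any mix) becomes one ordinary space.
    private static readonly Regex Runs = new Regex(@"\s+");

    public static string Collapse(string s) => Runs.Replace(s, " ");
}
```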

string sentence = "This is a sentence with multiple    spaces";
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");

string xyz = "1  2   3  4    5";
xyz = string.Join( " ", xyz.Split( new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ));

I think Matt's answer is the best, but I don't believe it's quite right. If you want to replace newlines, you must use:
myString = Regex.Replace(myString, @"\s+", " ", RegexOptions.Multiline);

Another approach which uses LINQ:
var list = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join(" ", list);

It's much simpler than all that:
while(str.Contains("  ")) str = str.Replace("  ", " ");

Regex can be rather slow even with simple tasks. This creates an extension method that can be called on any string.
public static class StringExtension
{
public static String ReduceWhitespace(this String value)
{
var newString = new StringBuilder();
bool previousIsWhitespace = false;
for (int i = 0; i < value.Length; i++)
{
if (Char.IsWhiteSpace(value[i]))
{
if (previousIsWhitespace)
{
continue;
}
previousIsWhitespace = true;
}
else
{
previousIsWhitespace = false;
}
newString.Append(value[i]);
}
return newString.ToString();
}
}
It would be used as such:
string testValue = "This contains   too          much  whitespace.";
testValue = testValue.ReduceWhitespace();
// testValue = "This contains too much whitespace."

myString = Regex.Replace(myString, " {2,}", " ");

For those who don't like Regex, here is a method that uses a StringBuilder:
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
StringBuilder stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || c != ' ' || (c == ' ' && input[i - 1] != ' '))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
In my tests, this method was 16 times faster on average with a very large set of small-to-medium sized strings, compared to a static compiled Regex. Compared to a non-compiled or non-static Regex, this should be even faster.
Keep in mind that it does not remove leading or trailing spaces, only multiple occurrences of them.

This is a shorter version, which should only be used if you are only doing this once, as it creates a new instance of the Regex class every time it is called.
temp = new Regex(" {2,}").Replace(temp, " ");
If you are not too familiar with regular expressions, here's a short explanation:
The {2,} quantifier makes the regex match the character preceding it (here a space) repeated two or more times.
The .Replace(temp, " ") replaces all matches in the string temp with a single space.
If you want to use this multiple times, here is a better option, as RegexOptions.Compiled compiles the regex to IL once, when the Regex object is constructed at run time:
Regex singleSpacify = new Regex(" {2,}", RegexOptions.Compiled);
temp = singleSpacify.Replace(temp, " ");
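A sketch of the reusable variant, keeping the compiled Regex in a static readonly field so it is constructed only once per process; SpaceRuns.Squeeze is a hypothetical name. Only spaces are affected; tabs and newlines pass through unchanged.

```csharp
using System.Text.RegularExpressions;

public static class SpaceRuns
{
    // Built once; RegexOptions.Compiled generates IL for the pattern at
    // construction time, which pays off when Replace is called often.
    private static readonly Regex TwoOrMoreSpaces = new Regex(" {2,}", RegexOptions.Compiled);

    public static string Squeeze(string s) => TwoOrMoreSpaces.Replace(s, " ");
}
```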

You can simply do this in a one-line solution!
string s = "welcome  to      london";
s = s.Replace(" ", "()").Replace(")(", "").Replace("()", " ");
You can choose other brackets (or even other characters) if you like.

No Regex, no LINQ: this removes leading and trailing spaces as well as reducing any embedded multiple-space segments to one space.
string myString = "  0 1  2    3   4 5  ";
myString = string.Join(" ", myString.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries));
result:"0 1 2 3 4 5"

// My sample string
string str = "hi you  are    a demo";
// Split the words based on white space
var demo = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
// Join the values back and add a single space in between
str = string.Join(" ", demo);
// output: "hi you are a demo"

Consolidating other answers, per Joel, and hopefully improving slightly as I go:
You can do this with Regex.Replace():
string s = Regex.Replace (
" 1 2 4 5",
@"[ ]{2,}",
" "
);
Or with String.Split():
static class StringExtensions
{
public static string Join(this IList<string> value, string separator)
{
return string.Join(separator, value.ToArray());
}
}
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");

I just wrote a new Join that I like, so I thought I'd re-answer, with it:
public static string Join<T>(this IEnumerable<T> source, string separator)
{
return string.Join(separator, source.Select(e => e.ToString()).ToArray());
}
One of the cool things about this is that it works with collections that aren't strings, by calling ToString() on the elements. Usage is still the same:
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");

Many answers provide the right output, but for those looking for the best performance, I improved Nolonar's answer (which was the best answer for performance) by about 10%.
public static string MergeSpaces(this string str)
{
if (str == null)
{
return null;
}
else
{
StringBuilder stringBuilder = new StringBuilder(str.Length);
int i = 0;
foreach (char c in str)
{
if (c != ' ' || i == 0 || str[i - 1] != ' ')
stringBuilder.Append(c);
i++;
}
return stringBuilder.ToString();
}
}

Use the regex pattern:
[ ]+    # spaces only
var text = Regex.Replace(inputString, @"[ ]+", " ");

I know this is pretty old, but I ran across it while trying to accomplish almost the same thing. I found this solution in RegexBuddy. This pattern replaces every run of two or more spaces with a single space and also trims leading and trailing spaces.
pattern: (?m:^ +| +$|( ){2,})
replacement: $1
It's a little difficult to read since we're dealing with empty space, so here it is again with the "spaces" replaced with a "_".
pattern: (?m:^_+|_+$|(_){2,}) <-- don't use this, just for illustration.
The "(?m:" construct enables the "multi-line" option. I generally like to include whatever options I can within the pattern itself so it is more self contained.
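A sketch of that pattern wired into Regex.Replace; TrimAndSqueeze is a hypothetical name. Leading and trailing runs hit the first two alternatives, where group 1 does not participate, so $1 expands to an empty string; interior runs keep the one space captured by group 1.

```csharp
using System.Text.RegularExpressions;

public static class TrimAndSqueeze
{
    // (?m:...) turns on multiline mode, so ^ and $ also match at each line
    // boundary and every line is trimmed, not just the whole string.
    private static readonly Regex Pattern = new Regex("(?m:^ +| +$|( ){2,})");

    public static string Apply(string s) => Pattern.Replace(s, "$1");
}
```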

I can remove extra whitespace with this:
while (word.Contains("  ")) // double space
word = word.Replace("  ", " "); // replace double space by single space
word = word.Trim(); // remove whitespace from the start and end

Without using regular expressions:
while (myString.IndexOf("  ", StringComparison.Ordinal) != -1)
{
myString = myString.Replace("  ", " ");
}
OK to use on short strings, but will perform badly on long strings with lots of spaces.

Try this method:
private string removeNestedWhitespaces(char[] st)
{
StringBuilder sb = new StringBuilder();
int indx = 0, length = st.Length;
while (indx < length)
{
// copy the current character; if it is a space, skip the spaces that follow it
bool isSpace = st[indx] == ' ';
sb.Append(st[indx]);
indx++;
while (isSpace && indx < length && st[indx] == ' ')
indx++;
}
return sb.ToString();
}
Use it like this:
string test = removeNestedWhitespaces("1  2   3  4    5".ToCharArray());

Here is a slight modification of Nolonar's original answer.
It checks whether the character is any kind of whitespace, not just a space.
It collapses each run of whitespace characters to a single character (the first one in the run), so a run that starts with a tab stays a tab.
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
var stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || !char.IsWhiteSpace(c) || (char.IsWhiteSpace(c) &&
!char.IsWhiteSpace(input[i - 1])))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}

How about going rogue?
public static string MinimizeWhiteSpace(
this string _this)
{
if (_this != null)
{
var returned = new StringBuilder();
var inWhiteSpace = false;
var length = _this.Length;
for (int i = 0; i < length; i++)
{
var character = _this[i];
if (char.IsWhiteSpace(character))
{
if (!inWhiteSpace)
{
inWhiteSpace = true;
returned.Append(' ');
}
}
else
{
inWhiteSpace = false;
returned.Append(character);
}
}
return returned.ToString();
}
else
{
return null;
}
}

Mix of StringBuilder and Enumerable.Aggregate() as extension method for strings:
using System;
using System.Linq;
using System.Text;
public static class StringExtension
{
public static string CondenseSpaces(this string s)
{
return s.Aggregate(new StringBuilder(), (acc, c) =>
{
if (c != ' ' || acc.Length == 0 || acc[acc.Length - 1] != ' ')
acc.Append(c);
return acc;
}).ToString();
}
public static void Main()
{
const string input = "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     ";
Console.WriteLine(" Input: \"{0}\"", input);
Console.WriteLine("Output: \"{0}\"", StringExtension.CondenseSpaces(input));
}
}
Executing this program produces the following output:
Input: "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     "
Output: " (five leading spaces) (five internal spaces) (five trailing spaces) "

Old skool:
string oldText = "  1  2   3    4     5  ";
string newText = oldText
.Replace("  ", " " + (char)22 )
.Replace( (char)22 + " ", "" )
.Replace( (char)22 + "", "" );
Assert.That( newText, Is.EqualTo( " 1 2 3 4 5 " ) );

You can create a StringsExtensions file with a method like RemoveDoubleSpaces().
StringsExtensions.cs
public static string RemoveDoubleSpaces(this string value)
{
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);
value = regex.Replace(value, " ");
// this removes space at the end of the value (like "demo ")
// and space at the start of the value (like " hi")
value = value.Trim(' ');
return value;
}
And then you can use it like this:
string stringInput = "  hi  here is   a demo  ";
string stringCleaned = stringInput.RemoveDoubleSpaces();

I looked over the proposed solutions but could not find one that handled a mix of whitespace characters acceptably for my case, for example:
Regex.Replace(input, @"\s+", " ") - it will eat your line breaks if they are mixed with spaces; for example, the sequence \n \n will be replaced with a single space
Regex.Replace(source, @"(\s)\s+", "$1") - the result depends on the first whitespace character of each run, meaning that it again might eat your line breaks
Regex.Replace(source, @"[ ]{2,}", " ") - it won't work correctly when there's a mix of whitespace characters, for example "\t \t "
Probably not perfect, but a quick solution for me was:
Regex.Replace(input, @"\s+",
(match) => match.Value.IndexOf('\n') > -1 ? "\n" : " ", RegexOptions.Multiline)
The idea is that a line break wins over spaces and tabs.
This won't handle Windows line breaks correctly, but it would be easy to adjust it to work with them too. I don't know regex that well; maybe it is possible to fit this into a single pattern.
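A compiled-and-tested form of the evaluator idea, as a sketch; LineAwareWhitespace is a hypothetical name. RegexOptions.Multiline is left out because it only changes how ^ and $ behave, and this pattern contains neither. As noted, a run containing "\r\n" still contains '\n', so it comes out as a bare "\n".

```csharp
using System.Text.RegularExpressions;

public static class LineAwareWhitespace
{
    // Each whitespace run becomes "\n" if it contains a line feed, otherwise " ".
    public static string Collapse(string input) =>
        Regex.Replace(input, @"\s+", m => m.Value.IndexOf('\n') >= 0 ? "\n" : " ");
}
```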

The following code collapses all runs of multiple spaces into a single space:
public string RemoveMultipleSpacesToSingle(string str)
{
string text = str;
do
{
//text = text.Replace("  ", " ");
text = Regex.Replace(text, @"\s+", " ");
} while (text.Contains("  "));
return text;
}
