C# How can I remove spaces before sentence ending

I was trying to remove spaces before sentence ending but had no success. I was thinking of doing it with Split function but it didn't go well. The only thing I succeeded at was adding spaces after sentence ending. Here is my code:
static void Main(string[] args)
{
System.Windows.Forms.OpenFileDialog dlgOpen = new System.Windows.Forms.OpenFileDialog();
if (dlgOpen.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
StreamReader sr = new StreamReader(dlgOpen.FileName);
string dat1 = sr.ReadToEnd();
string dat2 = Path.GetDirectoryName(dlgOpen.FileName);
string dat3 = Path.GetFileNameWithoutExtension(dlgOpen.FileName);
string dat4 = Path.GetExtension(dlgOpen.FileName);
dat2 = dat2 + "/" + dat3 + "_norm" + dat4;
sz1(ref dat1);
Console.Write(dat1);
StreamWriter sw = new StreamWriter(dat2, false);
sw.WriteLine(dat1);
sw.Flush();
sw.Close();
Console.ReadLine();
}
}
static void sz1(ref string dat1)
{
char[] ArrayCharacters = { '.', ':', ',', ';', '!', '?' };
int i = -1;
dat1 = dat1.Trim();
for (int k = 0; k < dat1.Length; k++)
{
dat1 = dat1.Replace("  ", " ");
}
do
{
i = dat1.IndexOfAny(ArrayCharacters, i + 1);
if (i != -1)
{
dat1 = dat1.Insert((i + 1), " ");
dat1 = dat1.Replace("  ", " ");
}
} while (i != -1);
do
{
i = dat1.IndexOfAny(ArrayCharacters, i + 1);
if (i != -1)
{
dat1 = dat1.Insert((i - 1), " ");
dat1 = dat1.Replace("  ", " ");
dat1 = dat1.Remove(i - 1, 1);
}
} while (i != -1);
}

One option is using regex:
string pattern = "\\s+$";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(dat1, replacement);
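Note that the pattern above only trims whitespace at the very end of the string. If the goal is to remove the spaces that sit directly before each punctuation mark, a minimal sketch could look like the following. The punctuation set is copied from the question's ArrayCharacters; the class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuationSpacing
{
    // Assumed punctuation set, taken from the question's ArrayCharacters.
    // The capture group keeps the punctuation mark; the spaces/tabs before it are dropped.
    private static readonly Regex SpaceBeforePunctuation =
        new Regex(@"[ \t]+([.:,;!?])");

    // Hypothetical helper: removes spaces and tabs that directly precede a punctuation mark.
    public static string RemoveSpacesBeforePunctuation(string text)
    {
        if (text == null) return null;
        return SpaceBeforePunctuation.Replace(text, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveSpacesBeforePunctuation("Hello , world !  How are you ?"));
        // prints: Hello, world!  How are you?
    }
}
```

Spaces after a punctuation mark are left alone, so this can be combined with whatever rule you use for adding a space after sentence endings.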

If you're just learning programming, then one solution that you should be familiar with is to use a loop to walk the string one character at a time, and since we're examining the end of the string, it makes sense to walk it backwards.
I'm assuming from your code (though it would be nice if you clarified it in your question) that you have a set of characters that are allowed at the end of a sentence, and you would like to leave these characters alone, but remove any additional spaces.
The logic, then, would be to start from the end of the string, and if a character is a valid ending character leave it alone. Otherwise, if it's a space, remove it. And finally, if it's neither then we're done.
Below is a method that uses this logic, along with a StringBuilder variable that is used to store the result. Starting at the end of the string, we capture the last characters, adding them to the result if they're valid and skipping them if they're spaces, until we reach a "regular" character, at which point we keep the rest of the string:
static string TrimEndSpaces(string input)
{
// If the input is null, there's nothing to do - just return null
if (input == null) return input;
// Our array of valid ending punctuation
char[] validEndingPunctuation = { '.', ':', ',', ';', '!', '?' };
// This will contain our final result
var result = new StringBuilder();
// Walk backwards through the input string
for (int i = input.Length - 1; i >= 0; i--)
{
if (validEndingPunctuation.Contains(input[i]))
{
// Valid character, so add it and keep going backwards
result.Insert(0, input[i]);
continue;
}
if (input[i] == ' ')
{
// Space character at end - skip it
continue;
}
// Regular character found - we're done. Add the rest of the string
result.Insert(0, input.Substring(0, i + 1));
break;
}
return result.ToString();
}
And here is an example usage, with some test sentences with varying endings of spaces, valid characters, null strings, empty strings, etc:
private static void Main()
{
var testInput = new List<string>
{
null,
"",
" ",
"Normal sentence test.",
"Test with spaces .",
"Test with multiple ending chars !?!?!",
"Test with only spaces at end ",
"Test with spaces after punctuation. ",
"Test with mixed punctuation and spaces ! ? ! ? ! "
};
foreach (var test in testInput)
{
// Format output so we can "see" null and empty strings
var original = test ?? "<null>";
if (original.Length == 0) original = "<empty>";
// Show original and the result. Wrap result in <> so we know where it ends.
Console.WriteLine($"{original.PadRight(50, '-')} = <{TrimEndSpaces(test)}>");
}
GetKeyFromUser("\nDone! Press any key to exit...");
}

If you just want to remove them from the end, you can use:
if(myString.EndsWith(" ") == true)
{
myString = myString.TrimEnd();
}
Of course, you will need to consider the ending symbol ".", "!" or "?", you might want to exclude that char if the spaces are just before it.
Another approach would be:
var keepTrimming = true;
while(keepTrimming == true)
{
if(myString.EndsWith(" ") == true)
{
myString= myString.Remove(myString.Length - 1);
}
else
{
keepTrimming = false;
}
}

Related

Remove text between quotes

I have a program, in which you can input a string. But I want text between quotes " " to be removed.
Example:
in: Today is a very "nice" and hot day.
out: Today is a very "" and hot day.
Console.WriteLine("Enter text: ");
text = Console.ReadLine();
int letter;
string s = null;
string s2 = null;
for (s = 0; s < text.Length; letter++)
{
if (text[letter] != '"')
{
s = s + text[letter];
}
else if (text[letter] == '"')
{
s2 = s2 + letter;
letter++;
(text[letter] != '"')
{
s2 = s2 + letter;
letter++;
}
}
}
I don't know how to write the string without text between quotes to the console.
I am not allowed to use a complex method like regex.
This should do the trick. It checks every character in the string for quotes.
If it finds a quote, it sets the quotesOpened flag to true, so it ignores the characters that follow.
When it encounters the next quote, it sets the flag back to false and resumes copying characters.
Console.WriteLine("Enter text: ");
text = Console.ReadLine();
int letterIndex;
string s2 = "";
bool quotesOpened = false;
for (letterIndex= 0; letterIndex< text.Length; letterIndex++)
{
if (text[letterIndex] == '"')
{
quotesOpened = !quotesOpened;
s2 = s2 + text[letterIndex];
}
else
{
if (!quotesOpened)
s2 = s2 + text[letterIndex];
}
}
Hope this helps!
A take without regular expressions, which I like better, but okay:
string input = "abc\"def\"ghi";
string output = input;
int firstQuoteIndex = input.IndexOf("\"");
if (firstQuoteIndex >= 0)
{
int secondQuoteIndex = input.IndexOf("\"", firstQuoteIndex + 1);
if (secondQuoteIndex >= 0)
{
output = input.Substring(0, firstQuoteIndex + 1) + input.Substring(secondQuoteIndex);
}
}
Console.WriteLine(output);
What it does:
It searches for the first occurrence of "
Then it searches for the second occurrence of "
Then it takes the first part, including the first " and the second part, including the second "
You could improve this yourself by searching until the end of the string and replace all occurrences. You have to remember the new 'first index' you have to search on.
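The improvement suggested above (keep searching from a new start index until no quote pair is left) could be sketched roughly like this. The class and method names are made up for the example, and an unmatched trailing quote is simply copied through unchanged, which is an assumption about the desired behavior.

```csharp
using System;
using System.Text;

public static class QuoteStripper
{
    // Hypothetical helper: empties every "..." pair but keeps the quote characters.
    public static string EmptyAllQuotedSections(string input)
    {
        var output = new StringBuilder();
        int searchFrom = 0;
        while (true)
        {
            int open = input.IndexOf('"', searchFrom);
            int close = open < 0 ? -1 : input.IndexOf('"', open + 1);
            if (close < 0)
            {
                // No complete pair left: copy the remainder unchanged and stop.
                output.Append(input, searchFrom, input.Length - searchFrom);
                break;
            }
            // Copy everything up to and including the opening quote...
            output.Append(input, searchFrom, open - searchFrom + 1);
            // ...then the closing quote, skipping the text in between.
            output.Append('"');
            // This is the new "first index" for the next search.
            searchFrom = close + 1;
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EmptyAllQuotedSections("Today is a very \"nice\" and hot day."));
        // prints: Today is a very "" and hot day.
    }
}
```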
string text = @" Today is a very ""nice"" and hot day. Second sentense with ""text"" test";
Regex r = new Regex("\"([^\"]*)\"");
var a = r.Replace(text,string.Empty);
Please try this.
First we need to split string and then remove odd items:
private static String Remove(String s)
{
var rs = s.Split(new[] { '"' }).ToList();
return String.Join("\"\"", rs.Where((_, index) => index % 2 == 0)); // use the element position; IndexOf would return the first match for duplicate segments
}
static void Main(string[] args)
{
var test = Remove("hello\"world\"\"yeah\" test \"fhfh\"");
return;
}
This would be a possible solution:
String cmd = "This is a \"Test\".";
// This is a "".
String newCmd = cmd.Split('\"')[0] + "\"\"" + cmd.Split('\"')[2];
Console.WriteLine(newCmd);
Console.Read();
You simply split the text at " and then add both parts together and add the old ". Not a very nice solution, but it works anyway.
Edit:
cmd[0] = "This is a "
cmd[1] = "Test"
cmd[2] = "."
You can do it like this:
Console.WriteLine("Enter text: ");
var text = Console.ReadLine();
var skipping = false;
var result = string.Empty;
foreach (var c in text)
{
if (!skipping || c == '"') result += c;
if (c == '"') skipping = !skipping;
}
Console.WriteLine(result);
Console.ReadLine();
The result string is created by adding characters from the original string as long we are not between quotes (using the skipping variable).
Collect the indexes of all quotes, then remove the text between each pair of quotes using Substring.
static void Main(string[] args)
{
string text = @" Today is a very ""nice"" and hot day. Second sentense with ""text"" test";
var foundIndexes = new List<int>();
foundIndexes.Add(0);
for (int i = 0; i < text.Length; i++)
{
if (text[i] == '"')
foundIndexes.Add(i);
}
string result = "";
for(int i =0; i<foundIndexes.Count; i+=2)
{
int length = 0;
if(i == foundIndexes.Count - 1)
{
length = text.Length - foundIndexes[i];
}
else
{
length = foundIndexes[i + 1] - foundIndexes[i]+1;
}
result += text.Substring(foundIndexes[i], length);
}
Console.WriteLine(result);
Console.ReadKey();
}
Output: Today is a very "" and hot day. Second sentense with "" test
Here dotNetFiddle

Need RegEx or some other way for separating quoted tokens containing escaped quotes

Basically, my task is to parse this command line:
-p "This is a string ""with quotes""" d:\1.txt "d:\some folder\1.out"
What I need is to split this string into:
-p
This is a string "with quotes"
d:\1.txt
d:\some folder\1.out
I searched (yes, I really did), but all examples I found either had not included escaped quotes or used \" for escape symbol.
I would use a real csv-parser instead, for example the only one available in .NET:
string str = "-p \"This is a string \"\"with quotes\"\"\" d:\\1.txt \"d:\\some folder\\1.out\"";
var allLineFields = new List<string[]>();
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(str)))
{
parser.Delimiters = new string[] { " " };
parser.HasFieldsEnclosedInQuotes = true; // <--- !!!
string[] lineFields;
while ((lineFields = parser.ReadFields()) != null)
{
allLineFields.Add(lineFields);
}
}
With your sample string the list contains a single string[] with your four tokens:
-p
This is a string "with quotes"
d:\1.txt
d:\some folder\1.out
Using a regex (if you insist on not using a parser as Tim Schmelter's answer suggested), something like this should work (it matches the given string, but I can't guarantee it's completely bullet-proof):
((?:"(?:[^"]|"")*")|\S+)
Breaking it down, you are grouping either:
A quote ", followed by any number of non-quote characters [^"] or doubled quotes "", followed by a closing quote "
A bunch (one or more) of non-space characters \S
See here to play around with it.
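For reference, here is a rough sketch of how that pattern might be used from C#. The regex only finds the tokens, so the outer quotes still have to be stripped and the doubled quotes collapsed afterwards. The class and method names are hypothetical, and like the pattern itself this is not guaranteed to cover every edge case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommandLineTokens
{
    // Same pattern as above, written as a regular C# string literal.
    private const string TokenPattern = "((?:\"(?:[^\"]|\"\")*\")|\\S+)";

    // Hypothetical helper: splits a command line into tokens and unescapes "" inside quoted tokens.
    public static List<string> Split(string commandLine)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(commandLine, TokenPattern))
        {
            string token = m.Value;
            if (token.Length >= 2 && token[0] == '"' && token[token.Length - 1] == '"')
            {
                // Strip the outer quotes, then turn each doubled quote into a single one.
                token = token.Substring(1, token.Length - 2).Replace("\"\"", "\"");
            }
            tokens.Add(token);
        }
        return tokens;
    }

    public static void Main()
    {
        string line = "-p \"This is a string \"\"with quotes\"\"\" d:\\1.txt \"d:\\some folder\\1.out\"";
        foreach (string token in Split(line))
            Console.WriteLine(token);
    }
}
```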
A handwritten version:
private static string[] ParseArguments(string text)
{
if (string.IsNullOrWhiteSpace(text)) return new string[0];
var entries = new List<string>(8);
var stringBuilder = new StringBuilder(64);
var inString = false;
var l = text.Length;
for (var i = 0; i < l; i++)
{
var c = text[i];
if (inString)
{
if (c == '"')
{
if (i != l - 1 && text[i + 1] == '"')
{
stringBuilder.Append(c);
i++;
}
else inString = false;
}
else stringBuilder.Append(c);
}
else if (c == '"') inString = true;
else if (char.IsWhiteSpace(c))
{
if (stringBuilder.Length == 0) continue;
entries.Add(stringBuilder.ToString());
stringBuilder.Length = 0;
}
else stringBuilder.Append(c);
}
if (stringBuilder.Length != 0) entries.Add(stringBuilder.ToString());
return entries.ToArray();
}

Add spaces in string but not between quotes

I have this string:
string myString = "do Output.printString(\"Do you want to Hit (h) or Stand (s)?\");";
My string as plain text:
do Output.printString("Do you want to Hit (h) or Stand (s)?");
I want to make it:
do Output . printString ("Do#you#want#to Hit#(h)#or#Stand#(s)?");
The idea is that there is a space between each word but if there is a string within apostrophes I want it to be WITHOUT SPACE and after this function I can do:
s.Split(' ');
and get the string in one string.
What I did is:
public static string PrepareForSplit(this string s)
{
string ret = "";
if (s.Contains("\""))
{
bool equalsAppear = false;
foreach (var nextChar in s)
{
char charToConcat;
if (nextChar == '"')
{
equalsAppear= equalsAppear == true ? false : true;
}
if (nextChar == ' ' && equalsAppear)
{
charToConcat = '#';
}
else
{
charToConcat = nextChar;
}
ret += charToConcat;
}
}
if (String.IsNullOrWhiteSpace(ret))
ret = s;
string[] symbols = {"{", "}", "(", ")", "[", "]", ".",
",", ";", "+", "-", "*", "/", "&", "|", "<", ">", "=", "~","#"};
foreach(var symbol in symbols)
if(ret.Contains(symbol))
{
if (!ret.Contains('"') || !((symbol=="-") || symbol==","))
{
ret = ret.Replace(symbol, " " + symbol + " ");
}
}
if(ret.Contains("\t"))
{
ret = Regex.Replace(ret, @"\t", " ");
}
return ret;
}
My problem is that in the end of this function I get this string:
do Output . printString ( "Do#you#want#to#Hit# ( h ) #or#Stand# ( s ) ?" ) ;
As you can see, the string that is supposed to be without spaces still contains spaces, so my program does not behave as it should. Can someone please help?
I would use a regular expression to extract your string.
You probably enter the starting string like this:
string source = "do Output.printString(\"Do you want to Hit (h) or Stand (s)?\");";
Try this regular expression:
\("([^\"]+)
The group between round brackets (i.e. the capturing group) is what you're looking for.
Edit: use it like this (based on http://www.dotnetperls.com/regex-match)
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// First we see the input string.
string source = "do Output.printString(\"Do you want to Hit (h) or Stand (s)?\");";
// Here we call Regex.Match.
Match match = Regex.Match(source, "\\(\"([^\"]+)");
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine("result: "+ key);
}
else
{
Console.WriteLine("nothing found");
}
Console.ReadLine();
}
}
Edit2: now it works :-)
I suggest you to split the whole string at the apostrophes. This will make it much easier to differentiate between parts that are within apostrophes and the others.
string[] parts = s.Split('"');
Now you have:
parts[0] ==> "do Output . printString ("
parts[1] ==> "Do#you#want#to Hit#(h)#or#Stand#(s)?"
parts[2] ==> ");"
I.e., the even indexes in parts[] are outside the apostrophes, the odd indexes are within.
// Treat the parts not between apostrophes:
for (int i = 0; i < parts.Length; i += 2) {
parts[i] = InsertSpacesBetweenWords(parts[i]);
}
string result = String.Join("\"", parts);
By the way: In your example, you can simplify
equalsAppear = equalsAppear == true ? false : true;
to
equalsAppear = !equalsAppear;
by using the logical NOT operator !.
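Putting the split idea together, a minimal sketch might look like this. InsertSpacesBetweenWords is only a stand-in here: it pads a handful of symbols with spaces, which is an assumption and not the full symbol list from the question. Replacing spaces with # inside the quoted parts is likewise assumed from the desired output shown in the question.

```csharp
using System;

public static class SplitPreparer
{
    // Hypothetical stand-in: pads a few symbols with spaces (the real symbol list is longer).
    private static string InsertSpacesBetweenWords(string code)
    {
        foreach (string symbol in new[] { "(", ")", ".", ";" })
            code = code.Replace(symbol, " " + symbol + " ");
        return code;
    }

    public static string PrepareForSplit(string s)
    {
        string[] parts = s.Split('"');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = (i % 2 == 0)
                ? InsertSpacesBetweenWords(parts[i])  // even index: outside the quotes
                : parts[i].Replace(' ', '#');         // odd index: inside the quotes
        }
        return string.Join("\"", parts);
    }

    public static void Main()
    {
        string source = "do Output.printString(\"Do you want to Hit (h) or Stand (s)?\");";
        string prepared = PrepareForSplit(source);
        // RemoveEmptyEntries drops the empty tokens caused by adjacent padding spaces.
        foreach (string token in prepared.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            Console.WriteLine(token);
    }
}
```

Because the symbol padding only ever touches the even-indexed parts, the brackets inside the quoted text are left untouched.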

How to make a first letter capital in C#

How can the first letter in a text be set to capital?
Example:
it is a text. = It is a text.
public static string ToUpperFirstLetter(this string source)
{
if (string.IsNullOrEmpty(source))
return string.Empty;
// convert to char array of the string
char[] letters = source.ToCharArray();
// upper case the first char
letters[0] = char.ToUpper(letters[0]);
// return the array made of the new char array
return new string(letters);
}
It'll be something like this:
// precondition: before must not be an empty string
String after = before.Substring(0, 1).ToUpper() + before.Substring(1);
polygenelubricants' answer is fine for most cases, but you potentially need to think about cultural issues. Do you want this capitalized in a culture-invariant way, in the current culture, or a specific culture? It can make a big difference in Turkey, for example. So you may want to consider:
CultureInfo culture = ...;
text = char.ToUpper(text[0], culture) + text.Substring(1);
or if you prefer methods on String:
CultureInfo culture = ...;
text = text.Substring(0, 1).ToUpper(culture) + text.Substring(1);
where culture might be CultureInfo.InvariantCulture, or the current culture etc.
For more on this problem, see the Turkey Test.
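To make the culture point concrete, here is a small sketch. The helper name is made up, and the Turkish example assumes the runtime has culture data for tr-TR available, which may not be the case in globalization-invariant setups.

```csharp
using System;
using System.Globalization;

public static class Capitalizer
{
    // Hypothetical helper: upper-cases the first character using the given culture's casing rules.
    public static string CapitalizeFirst(string text, CultureInfo culture)
    {
        if (string.IsNullOrEmpty(text)) return text;
        return char.ToUpper(text[0], culture) + text.Substring(1);
    }

    public static void Main()
    {
        Console.WriteLine(CapitalizeFirst("it is a text.", CultureInfo.InvariantCulture));
        // prints: It is a text.

        try
        {
            // Under Turkish casing rules, 'i' is expected to become the dotted capital
            // 'İ' (U+0130) rather than 'I', provided culture data is available.
            Console.WriteLine(CapitalizeFirst("it is a text.", new CultureInfo("tr-TR")));
        }
        catch (CultureNotFoundException)
        {
            Console.WriteLine("tr-TR culture data is not available on this runtime.");
        }
    }
}
```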
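If you are using C#, you can also call the Visual Basic helper (this needs a reference to Microsoft.VisualBasic, and note that ProperCase capitalizes the first letter of every word, not just the first one):
Microsoft.VisualBasic.Strings.StrConv(sourceString, Microsoft.VisualBasic.VbStrConv.ProperCase)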
I use this variant:
private string FirstLetterCapital(string str)
{
return Char.ToUpper(str[0]) + str.Remove(0, 1);
}
If you are sure that str variable is valid (never an empty-string or null), try:
str = Char.ToUpper(str[0]) + str[1..];
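Note that str[1..] on a string compiles to a Substring call, so this still allocates an intermediate string, just like the Substring-based solutions; the range syntax is only shorter to write. To actually avoid the extra allocation, concatenate spans instead, for example with string.Concat(ReadOnlySpan<char>, ReadOnlySpan<char>) and str.AsSpan(1).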
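A minimal sketch of a span-based variant, assuming .NET Core 3.0 or later (where the span overloads of string.Concat exist). The class name is made up, and the invariant culture is used just to keep the example short.

```csharp
using System;

public static class SpanCapitalizer
{
    // Hypothetical helper: builds the result in one allocation, without an intermediate Substring.
    public static string CapitalizeFirst(string str)
    {
        if (string.IsNullOrEmpty(str)) return str;

        // One-character buffer on the stack for the upper-cased first letter.
        Span<char> first = stackalloc char[1];
        first[0] = char.ToUpperInvariant(str[0]);
        ReadOnlySpan<char> head = first;

        // AsSpan(1) is a view over the rest of the string; no copy is made until Concat.
        return string.Concat(head, str.AsSpan(1));
    }

    public static void Main()
    {
        Console.WriteLine(CapitalizeFirst("it is a text."));
        // prints: It is a text.
    }
}
```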
I realize this is an old post, but I recently had this problem and solved it with the following method.
private string capSentences(string str)
{
string s = "";
if (str[str.Length - 1] == '.')
str = str.Remove(str.Length - 1, 1);
char[] delim = { '.' };
string[] tokens = str.Split(delim);
for (int i = 0; i < tokens.Length; i++)
{
tokens[i] = tokens[i].Trim();
tokens[i] = char.ToUpper(tokens[i][0]) + tokens[i].Substring(1);
s += tokens[i] + ". ";
}
return s;
}
In the sample below clicking on the button executes this simple code outBox.Text = capSentences(inBox.Text.Trim()); which pulls the text from the upper box and puts it in the lower box after the above method runs on it.
Take the first letter out of the word and then extract it to the other string.
strFirstLetter = strWord.Substring(0, 1).ToUpper();
strFullWord = strFirstLetter + strWord.Substring(1);
text = new String(
new [] { char.ToUpper(text.First()) }
.Concat(text.Skip(1))
.ToArray()
);
This function capitalizes the first letter of every word in a string and lowercases the rest:
public static string FormatSentence(string source)
{
var words = source.Split(' ').Select(t => t.ToCharArray()).ToList();
words.ForEach(t =>
{
for (int i = 0; i < t.Length; i++)
{
t[i] = i.Equals(0) ? char.ToUpper(t[i]) : char.ToLower(t[i]);
}
});
return string.Join(" ", words.Select(t => new string(t)));
}
string str = "it is a text";
// First use the Trim() method to remove unnecessary whitespace at the beginning and the end (for example, " This string ".Trim() returns "This string").
str = str.Trim();
char theFirstLetter = str[0]; // take the first letter of the string, at index 0
theFirstLetter = char.ToUpper(theFirstLetter); // char.ToUpper returns the uppercased letter, so assign it back
str = theFirstLetter + str.Substring(1); // prepend the uppercased letter to the rest of the string; Substring(1) skips index 0 and returns everything from index 1 to the end
Console.WriteLine(str); // now it should output "It is a text"
static String UppercaseWords(String BadName)
{
String FullName = "";
if (BadName != null)
{
String[] FullBadName = BadName.Split(' ');
foreach (string Name in FullBadName)
{
String SmallName = "";
if (Name.Length > 1)
{
SmallName = char.ToUpper(Name[0]) + Name.Substring(1).ToLower();
}
else
{
SmallName = Name.ToUpper();
}
FullName = FullName + " " + SmallName;
}
}
FullName = FullName.Trim();
FullName = FullName.TrimEnd();
FullName = FullName.TrimStart();
return FullName;
}
string Input = " it is my text";
Input = Input.TrimStart();
//Create a char array
char[] Letters = Input.ToCharArray();
//Make first letter a capital one
string First = char.ToUpper(Letters[0]).ToString();
//Concatenate
string Output = string.Concat(First,Input.Substring(1));
Try this code snippet:
char[] nm = "this is a test".ToCharArray();
if (char.IsLower(nm[0])) nm[0] = char.ToUpper(nm[0]);
Console.WriteLine(new string(nm)); // prints: This is a test

How do I replace multiple spaces with a single space in C#?

How can I replace multiple spaces in a string with only one space in C#?
Example:
1 2   3  4 5
would be:
1 2 3 4 5
I like to use:
myString = Regex.Replace(myString, @"\s+", " ");
Since it will catch runs of any kind of whitespace (e.g. tabs, newlines, etc.) and replace them with a single space.
string sentence = "This is  a sentence with   multiple    spaces";
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");
string xyz = "1   2   3   4   5";
xyz = string.Join( " ", xyz.Split( new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ));
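I think Matt's answer is the best, with one note on newlines: \s already matches them, so his call replaces newlines as written. RegexOptions.Multiline only changes how ^ and $ match, so passing it, as in the call below, is harmless but not required:
myString = Regex.Replace(myString, @"\s+", " ", RegexOptions.Multiline);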
Another approach which uses LINQ:
var list = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join(" ", list);
It's much simpler than all that:
while(str.Contains("  ")) str = str.Replace("  ", " ");
Regex can be rather slow even with simple tasks. This creates an extension method that can be used off of any string.
public static class StringExtension
{
public static String ReduceWhitespace(this String value)
{
var newString = new StringBuilder();
bool previousIsWhitespace = false;
for (int i = 0; i < value.Length; i++)
{
if (Char.IsWhiteSpace(value[i]))
{
if (previousIsWhitespace)
{
continue;
}
previousIsWhitespace = true;
}
else
{
previousIsWhitespace = false;
}
newString.Append(value[i]);
}
return newString.ToString();
}
}
It would be used as such:
string testValue = "This  contains   too    much  whitespace.";
testValue = testValue.ReduceWhitespace();
// testValue = "This contains too much whitespace."
myString = Regex.Replace(myString, " {2,}", " ");
For those, who don't like Regex, here is a method that uses the StringBuilder:
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
StringBuilder stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || c != ' ' || (c == ' ' && input[i - 1] != ' '))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
In my tests, this method was 16 times faster on average with a very large set of small-to-medium sized strings, compared to a static compiled Regex. Compared to a non-compiled or non-static Regex, this should be even faster.
Keep in mind, that it does not remove leading or trailing spaces, only multiple occurrences of such.
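If you want to check a claim like this on your own data, a rough timing sketch could look like the following. The sample string and iteration count are arbitrary assumptions, there is no warm-up or statistical treatment, and the numbers will vary by machine and runtime, so treat it only as a starting point (a dedicated benchmarking tool would be more reliable).

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceBenchmark
{
    // Static compiled regex, as in the comparison described above.
    private static readonly Regex MultiSpace = new Regex(" {2,}", RegexOptions.Compiled);

    public static string WithRegex(string input) => MultiSpace.Replace(input, " ");

    // Same logic as the StringBuilder approach: skip a space if the previous char was a space.
    public static string WithBuilder(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != ' ' || i == 0 || input[i - 1] != ' ')
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        const string sample = "This  contains   too    much whitespace.";
        const int iterations = 200000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) WithRegex(sample);
        Console.WriteLine("Regex:         " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) WithBuilder(sample);
        Console.WriteLine("StringBuilder: " + sw.ElapsedMilliseconds + " ms");
    }
}
```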
This is a shorter version, which should only be used if you are only doing this once, as it creates a new instance of the Regex class every time it is called.
temp = new Regex(" {2,}").Replace(temp, " ");
If you are not too acquainted with regular expressions, here's a short explanation:
The {2,} quantifier applies to the character preceding it (here, a space) and matches runs of two or more of them.
The .Replace(temp, " ") replaces all matches in the string temp with a space.
If you want to use this multiple times, here is a better option, as RegexOptions.Compiled compiles the regex to IL once, when the Regex object is constructed, so later calls reuse it:
Regex singleSpacify = new Regex(" {2,}", RegexOptions.Compiled);
temp = singleSpacify.Replace(temp, " ");
You can simply do this with a one-line solution!
string s = "welcome   to    london";
s = s.Replace(" ", "()").Replace(")(", "").Replace("()", " ");
You can choose other brackets (or even other characters) if you like.
no Regex, no Linq... removes leading and trailing spaces as well as reducing any embedded multiple space segments to one space
string myString = "  0 1   2  3 4 5  ";
myString = string.Join(" ", myString.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries));
result:"0 1 2 3 4 5"
// My sample string
string str = "hi you  are   a demo";
// Split the words based on white space
var demo = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
// Join the values back and add a single space in between
str = string.Join(" ", demo);
// output: string str = "hi you are a demo";
Consolidating other answers, per Joel, and hopefully improving slightly as I go:
You can do this with Regex.Replace():
string s = Regex.Replace (
"   1   2    4   5",
@"[ ]{2,}",
" "
);
Or with String.Split():
static class StringExtensions
{
public static string Join(this IList<string> value, string separator)
{
return string.Join(separator, value.ToArray());
}
}
//...
string s = "   1   2    4   5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
I just wrote a new Join that I like, so I thought I'd re-answer, with it:
public static string Join<T>(this IEnumerable<T> source, string separator)
{
return string.Join(separator, source.Select(e => e.ToString()).ToArray());
}
One of the cool things about this is that it works with collections that aren't strings, by calling ToString() on the elements. Usage is still the same:
//...
string s = "   1   2    4   5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
Many answers are providing the right output but for those looking for the best performances, I did improve Nolanar's answer (which was the best answer for performance) by about 10%.
public static string MergeSpaces(this string str)
{
if (str == null)
{
return null;
}
else
{
StringBuilder stringBuilder = new StringBuilder(str.Length);
int i = 0;
foreach (char c in str)
{
if (c != ' ' || i == 0 || str[i - 1] != ' ')
stringBuilder.Append(c);
i++;
}
return stringBuilder.ToString();
}
}
Use the regex pattern
[ ]+ #only space
var text = Regex.Replace(inputString, @"[ ]+", " ");
I know this is pretty old, but ran across this while trying to accomplish almost the same thing. Found this solution in RegEx Buddy. This pattern will replace all double spaces with single spaces and also trim leading and trailing spaces.
pattern: (?m:^ +| +$|( ){2,})
replacement: $1
It's a little difficult to read since we're dealing with empty space, so here it is again with the "spaces" replaced with a "_".
pattern: (?m:^_+|_+$|(_){2,}) <-- don't use this, just for illustration.
The "(?m:" construct enables the "multi-line" option. I generally like to include whatever options I can within the pattern itself so it is more self contained.
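For completeness, here is a short sketch of how that pattern and replacement might be used from C#. The class and method names are made up. It relies on the fact that $1 is empty when the capture group did not take part in the match, so leading and trailing runs are removed while inner runs become one space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceNormalizer
{
    // ^ +      : leading spaces on a line (group 1 unused, so they are replaced with nothing)
    // +$       : trailing spaces on a line (same)
    // ( ){2,}  : two or more spaces; group 1 holds one space, which is what $1 puts back
    public static string Normalize(string input)
    {
        return Regex.Replace(input, @"(?m:^ +| +$|( ){2,})", "$1");
    }

    public static void Main()
    {
        Console.WriteLine("<" + Normalize("  1  2   3 ") + ">");
        // prints: <1 2 3>
    }
}
```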
I can remove whitespaces with this
while (word.Contains("  ")) // double space
    word = word.Replace("  ", " "); // replace double space by single space
word = word.Trim(); // remove whitespace from start and end
Without using regular expressions:
while (myString.IndexOf("  ", StringComparison.CurrentCulture) != -1)
{
myString = myString.Replace("  ", " ");
}
OK to use on short strings, but will perform badly on long strings with lots of spaces.
try this method
private string removeNestedWhitespaces(char[] st)
{
    StringBuilder sb = new StringBuilder();
    int indx = 0, length = st.Length;
    while (indx < length)
    {
        // Always keep the current character (the first space of a run is kept too)
        sb.Append(st[indx]);
        // After a space, skip any further spaces that directly follow it
        if (st[indx] == ' ')
        {
            while (indx + 1 < length && st[indx + 1] == ' ')
                indx++;
        }
        indx++;
    }
    return sb.ToString();
}
use it like this:
string test = removeNestedWhitespaces("1 2   3  4 5".ToCharArray());
Here is a slight modification on Nolonar original answer.
Checking if the character is not just a space, but any whitespace, use this:
It will replace any multiple whitespace character with a single space.
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
var stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || !char.IsWhiteSpace(c) || (char.IsWhiteSpace(c) &&
!char.IsWhiteSpace(input[i - 1])))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
How about going rogue?
public static string MinimizeWhiteSpace(
this string _this)
{
if (_this != null)
{
var returned = new StringBuilder();
var inWhiteSpace = false;
var length = _this.Length;
for (int i = 0; i < length; i++)
{
var character = _this[i];
if (char.IsWhiteSpace(character))
{
if (!inWhiteSpace)
{
inWhiteSpace = true;
returned.Append(' ');
}
}
else
{
inWhiteSpace = false;
returned.Append(character);
}
}
return returned.ToString();
}
else
{
return null;
}
}
Mix of StringBuilder and Enumerable.Aggregate() as extension method for strings:
using System;
using System.Linq;
using System.Text;
public static class StringExtension
{
public static string CondenseSpaces(this string s)
{
return s.Aggregate(new StringBuilder(), (acc, c) =>
{
if (c != ' ' || acc.Length == 0 || acc[acc.Length - 1] != ' ')
acc.Append(c);
return acc;
}).ToString();
}
public static void Main()
{
const string input = "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     ";
Console.WriteLine(" Input: \"{0}\"", input);
Console.WriteLine("Output: \"{0}\"", StringExtension.CondenseSpaces(input));
}
}
Executing this program produces the following output:
Input: "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     "
Output: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Old skool:
string oldText = "   1 2  3   4    5     ";
string newText = oldText
.Replace("  ", " " + (char)22 )
.Replace( (char)22 + " ", "" )
.Replace( (char)22 + "", "" );
Assert.That( newText, Is.EqualTo( " 1 2 3 4 5 " ) );
You can create a StringsExtensions file with a method like RemoveDoubleSpaces().
StringsExtensions.cs
public static string RemoveDoubleSpaces(this string value)
{
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);
value = regex.Replace(value, " ");
// this removes space at the end of the value (like "demo ")
// and space at the start of the value (like " hi")
value = value.Trim(' ');
return value;
}
And then you can use it like this:
string stringInput = "   hi here     is  a demo ";
string stringCleaned = stringInput.RemoveDoubleSpaces();
I looked over the proposed solutions and could not find one that handles a mix of whitespace characters in a way acceptable for my case. For example:
Regex.Replace(input, @"\s+", " ") - it will eat your line breaks if they are mixed with spaces; for example, a \n \n sequence will be replaced with a single space
Regex.Replace(source, @"(\s)\s+", "$1") - the result depends on the first whitespace character, meaning that it again might eat your line breaks
Regex.Replace(source, @"[ ]{2,}", " ") - it won't work correctly when there's a mix of whitespace characters, for example "\t \t "
Probably not perfect, but a quick solution for me was:
Regex.Replace(input, @"\s+",
(match) => match.Value.IndexOf('\n') > -1 ? "\n" : " ", RegexOptions.Multiline)
The idea is that a line break wins over spaces and tabs.
This won't handle Windows line breaks correctly, but it would be easy to adjust for that too. I don't know regex that well; maybe it is possible to fit this into a single pattern.
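One possible adjustment for Windows line breaks, as a sketch only: check for "\r\n" first, then for a bare "\n", and fall back to a space. The class and method names are made up, and collapsing several consecutive line breaks into one is an assumption carried over from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // Hypothetical helper: collapses each whitespace run, letting a line break win over spaces and tabs.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"\s+", match =>
        {
            if (match.Value.Contains("\r\n")) return "\r\n";   // Windows line break in the run
            if (match.Value.IndexOf('\n') > -1) return "\n";   // Unix line break in the run
            return " ";                                        // only spaces, tabs, etc.
        });
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("first \t line \r\n \r\n second    line"));
    }
}
```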
The following code collapses every run of multiple spaces into a single space:
public string RemoveMultipleSpacesToSingle(string str)
{
string text = str;
do
{
//text = text.Replace("  ", " ");
text = Regex.Replace(text, @"\s+", " ");
} while (text.Contains("  "));
return text;
}
