English to PigLatin Conversion of a Sentence - c#

I am making a program to convert English to PigLatin. However, my solution only seems to work with one word. If I enter more than one word, only the last is translated.
testing one translation
Would simply output:
translationway
I've looked at some solutions, but most are in the same fashion as mine, or use "simplified" solutions beyond the scope of my knowledge.
Code:
static void Main(string[] args)
{
Console.WriteLine("Enter a sentence to convert to PigLatin:");
string sentence = Console.ReadLine();
string pigLatin = ToPigLatin(sentence);
Console.WriteLine(pigLatin);
}
static string ToPigLatin (string sentence)
{
string firstLetter,
restOfWord,
vowels = "AEIOUaeiou";
int currentLetter;
foreach (string word in sentence.Split())
{
firstLetter = sentence.Substring(0, 1);
restOfWord = sentence.Substring(1, sentence.Length - 1);
currentLetter = vowels.IndexOf(firstLetter);
if (currentLetter == -1)
{
sentence = restOfWord + firstLetter + "ay";
}
else
{
sentence = word + "way";
}
}
return sentence;
}
All help is greatly appreciated!
Edit
Thanks to great feedback, I've updated my code:
static string ToPigLatin (string sentence)
{
const string vowels = "AEIOUaeiou";
List<string> pigWords = new List<string>();
foreach (string word in sentence.Split(' '))
{
string firstLetter = word.Substring(0, 1);
string restOfWord = word.Substring(1, word.Length - 1);
int currentLetter = vowels.IndexOf(firstLetter);
if (currentLetter == -1)
{
pigWords.Add(restOfWord + firstLetter + "ay");
}
else
{
pigWords.Add(word + "way");
}
}
return string.Join(" ", pigWords);
}
Would it be very complex to adapt this code to work with consonant clusters?
For example, right now testing one translation prints as:
estingtay oneway ranslationtay
While, as I understand PigLatin rules, it should read:
estingtay oneway anslationtray

Just place += instead of = here:
if (currentLetter == -1)
{
sentence += restOfWord + firstLetter + "ay";
}
else
{
sentence += word + "way";
}
In your version, you were overwriting sentence on each iteration of the loop.
Edit
I've made a lot of changes to the code:
public static string ToPigLatin(string sentence)
{
const string vowels = "AEIOUaeiou";
List<string> newWords = new List<string>();
foreach (string word in sentence.Split(' '))
{
string firstLetter = word.Substring(0, 1);
string restOfWord = word.Substring(1, word.Length - 1);
int currentLetter = vowels.IndexOf(firstLetter);
if (currentLetter == -1)
{
newWords.Add(restOfWord + firstLetter + "ay");
}
else
{
newWords.Add(word + "way");
}
}
return string.Join(" ", newWords);
}
As Panagiotis-Kanavos rightly said, don't build your output on top of your input; build it from your input. That is why I added the newWords list (some might prefer a StringBuilder; I don't).
You were also misusing your variables inside the loop: the Substring calls were applied to sentence instead of word. That is now fixed.
If you have any questions about this, don't hesitate to ask.

private static void Main(string[] args)
{
Console.WriteLine("Enter a sentence to convert to PigLatin:");
string sentence = Console.ReadLine();
var pigLatin = GetSentenceInPigLatin(sentence);
Console.WriteLine(pigLatin);
Console.ReadLine();
}
private static string GetSentenceInPigLatin(string sentence)
{
const string vowels = "AEIOUaeiou";
var returnSentence = "";
foreach (var word in sentence.Split())
{
var firstLetter = word.Substring(0, 1);
var restOfWord = word.Substring(1, word.Length - 1);
var currentLetter = vowels.IndexOf(firstLetter, StringComparison.Ordinal);
if (currentLetter == -1)
{
returnSentence += restOfWord + firstLetter + "ay ";
}
else
{
returnSentence += word + "way ";
}
}
return returnSentence;
}

I came up with a short LINQ implementation for this:
string.Join(" ", "testing one translation".Split(' ')
.Select(word => "aeiouy".Contains(word[0])
? word.Skip(1).Concat(word.Take(1))
: word.ToCharArray())
.Select(word => word.Concat("way".ToCharArray()))
.Select(word => string.Concat(word)));
Output: "testingway neoway translationway"
Of course, I'd probably refactor it to something like this:
"testing one translation"
.Split(' ')
.Select(word => word.ToCharsWithStartingVowelLast())
.Select(word => word.WithEnding("way"))
.Select(word => string.Concat(word))
.Join(' ');
static class Extensions {
public static IEnumerable<char> ToCharsWithStartingVowelLast(this string word)
{
return "aeiouy".Contains(word[0])
? word.Skip(1).Concat(word.Take(1))
: word.ToCharArray();
}
public static IEnumerable<char> WithEnding(this IEnumerable<char> word, string ending)
{
return word.Concat(ending.ToCharArray())
}
public static string Join(this IEnumerable<IEnumerable<char>> words, char separator)
{
return string.Join(separator, words.Select(word => string.Concat(word)));
}
}
Update:
With your edit, you asked about consonant clusters. One of the things I like about doing this with LINQ is that it's pretty simple to just update that part of the pipeline, and make it all work:
public static IEnumerable<char> ToCharsWithStartingConsonantsLast(this string word)
{
return word.SkipWhile(c => c.IsConsonant()).Concat(word.TakeWhile(c => c.IsConsonant()));
}
public static bool IsConsonant(this char c)
{
return !"aeiouy".Contains(c);
}
The entire pipeline, without refactoring to extension methods, now looks like this:
string.Join(" ", "testing one translation".Split(' ')
.Select(word => word.SkipWhile(c => !"aeiouy".Contains(c)).Concat(word.TakeWhile(c => !"aeiouy".Contains(c))))
.Select(word => word.Concat("way".ToCharArray()))
.Select(word => string.Concat(word)))
and outputs "estingtway oneway anslationtrway".
Update 2:
I noticed I wasn't handling word endings correctly. Here's an update that takes care of only adding w to the ending when the word (without the ending) ends with a vowel:
string.Join(" ", "testing one translation".Split(' ')
.Select(word => word.SkipWhile(c => !"aeiouy".Contains(c)).Concat(word.TakeWhile(c => !"aeiouy".Contains(c))))
.Select(word =>
{
var ending = "aeiouy".Contains(word.Last()) ? "way" : "ay";
return word.Concat(ending.ToCharArray());
})
.Select(word => string.Concat(word)))
Output: "estingtay oneway anslationtray". Note how it's only the step that handles adding the ending that changed - all other parts of the algorithm were unchanged.
Given how simple this now is, I'd probably only use two extension methods: Join(this IEnumerable<IEnumerable<char>> words, char separator) and IsVowel(this char c) (the implementation of the latter should be easy given the code samples above). This yields the following final implementation:
"testing one translation"
.Split(' ')
.Select(word => word.SkipWhile(c => !c.IsVowel()).Concat(word.TakeWhile(c => !c.IsVowel())))
.Select(word => word.Concat((word.Last().IsVowel() ? "way" : "ay").ToCharArray()))
.Select(word => string.Concat(word))
.Join(' ');
It's also really easy to see here what we do to translate:
1. Split the sentence into words
2. Shuffle any leading consonants to the end of the word (it's admittedly not apparent at first sight that this is what happens, but I can't find a simpler way to express it except by wrapping it in an extension method)
3. Add the ending
4. Convert IEnumerable<char>s to strings
5. Re-join the words into a sentence
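For readers who find the char-sequence pipeline hard to follow, the same steps can be sketched with plain string operations. This is only a sketch under stated assumptions: the class and method names (PigLatin, Translate, TranslateWord) are made up for illustration, 'y' is treated as a consonant, and the ending follows the asker's rule ("way" when the word starts with a vowel, otherwise "ay"), which differs slightly from the ending rule in the LINQ version above.

```csharp
using System;
using System.Linq;

public static class PigLatin
{
    // Assumption: only a, e, i, o, u count as vowels; 'y' is treated as a consonant.
    private static bool IsVowel(char c) => "aeiouAEIOU".IndexOf(c) >= 0;

    public static string Translate(string sentence)
    {
        var words = sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // split into words
            .Select(TranslateWord);                                      // translate each word
        return string.Join(" ", words);                                  // re-join into a sentence
    }

    private static string TranslateWord(string word)
    {
        // Length of the leading consonant cluster ("tr" in "translation").
        int cluster = word.TakeWhile(c => !IsVowel(c)).Count();

        // Words that start with a vowel just get "way" appended.
        if (cluster == 0)
            return word + "way";

        // Otherwise move the whole cluster to the end and append "ay".
        return word.Substring(cluster) + word.Substring(0, cluster) + "ay";
    }
}
```

Calling PigLatin.Translate("testing one translation") should give "estingtay oneway anslationtray". Skipping empty entries in Split also avoids the crash that Substring(0, 1) causes on an empty word when the input contains consecutive spaces.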

Related

problems with for loop string

I'm new to programming in c# and I'm trying to figure out how I could
potentially reverse all words except words containing e in a string.
my current code will detect words containing e, and just writes them down in another textbox:
string text = txbInput.Text;
var words = text.Split(' ');
for (int i = 0; i < words.Length; i++)
{
if (words[i].Contains('e'))
{
txbOutput.Text += words[i];
}
}
Current:
Input: chicken crossing the road
Output: chickenthe
Expected outcome:
Input: chicken crossing the road
Output: chicken gnissorc the daor
You can simply split the word on the space character, then, for each word, select either the word itself, or the word reversed (depending on whether or not it contains the 'e' character), and then join them back together again with the space character:
txbOutput.Text = string.Join(" ", txbInput.Text.Split(' ')
.Select(word => word.Contains("e") ? word : string.Concat(word.Reverse())));
Outputs: chicken gnissorc the daor
using System;
namespace ConsoleApp4
{
class Program
{
static void Main(string[] args)
{
var input = "chicken crossing the road";
foreach (var item in input.Split(' '))
{
if (item.Contains('e'))
{
Console.Write(item + ' ');
}
else
{
Console.Write(Reverse(item) + ' ');
}
}
}
public static string Reverse(string s)
{
char[] charArray = s.ToCharArray();
Array.Reverse(charArray);
return new string(charArray);
}
}
}
EDIT
foreach (var item in input.Split(' '))
{
if (item.Contains('e'))
{
txbOutput.Text = txbOutput.Text+ item + ' ';
}
else
{
txbOutput.Text= txbOutput.Text+ Reverse(item) + ' ';
}
}
You can try the following code, which reverses every word (it does not skip words containing 'e'):
string.Join(" ",
str.Split(' ')
.Select(x => new String(x.Reverse().ToArray()))
.ToArray());
Copied from - https://www.declarecode.com/code-solutions/csharp/caprogramtoreverseeachwordinthegivenstring

Is there a simple way to apply grammatical casing to a string?

I am developing a Xamarin.Forms application on UWP
I have an Editor control - Basically a multi-line TextBox
I am trying to apply some simple grammatical casing to the string basically the following:
Capitalise the word "I"
Capitalise the First word
Capitalise the First word after a full stop.
I have managed to do the first two, and am a bit stuck on the third and was wondering if there is an easier way or whether my algorithm can be adapted.
What I have so far is:
public static string ToGramaticalCase(this string s)
{
var thingsToCapitalise = new String[] {"i"};
string newString = string.Empty;
if (!string.IsNullOrEmpty(s))
{
var wordSplit = s.Split(' ');
if (wordSplit.Count() > 1)
{
var wordToCapitalise = wordSplit.First();
wordToCapitalise = wordToCapitalise.Substring(0, 1).ToUpper() + wordToCapitalise.Substring(1);
var value = wordToCapitalise + s.Substring(wordToCapitalise.Length);
foreach (var item in thingsToCapitalise)
{
value = value.Replace(string.Format(" {0} ", item), string.Format(" {0} ", item.ToUpper()));
}
newString = value;
}
}
return newString;
}
This method will capitalize all words after ". ":
[Test]
public void Test()
{
var result = NewSentenceWithUpperLetter("Sentence one. sentence two.");
// result will be 'Sentence one. Sentence two.'
}
private string NewSentenceWithUpperLetter(string text)
{
var splitted = text.Split(' ');
for (var i = 1; i < splitted.Length; i++)
{
if (splitted[i - 1].EndsWith("."))
{
splitted[i] = splitted[i][0].ToString().ToUpper() + splitted[i].Substring(1);
}
}
return string.Join(" ", splitted);
}
Just split the string also on full stop. Change this line:
var wordSplit = s.Split(' ');
Into this:
var wordSplit = s.Split(new char[] { ' ', '.' },StringSplitOptions.RemoveEmptyEntries);
Edit
This extension method would do what you want:
public static string ToTitleCase(this string input)
{
string output =
String.Join(" ", input.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries)
.ToList()
.Select(x => x = x.Length>1?
x.First().ToString().ToUpper() + x.Substring(1):
x.First().ToString().ToUpper()));
output =
String.Join(".", output.Split(new char[] { '.' },StringSplitOptions.RemoveEmptyEntries)
.ToList()
.Select(x => x = x.Length > 1 ?
x.First().ToString().ToUpper() + x.Substring(1) :
x.First().ToString().ToUpper()));
return output;
}
Test string: string input = "i try this test sentence .now it works as i want";
Output: I Try This Test Sentence .Now It Works As I Want
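Since the question asks for three rules (capitalise the first word, the word "I", and the first word after a full stop) rather than title case, a single character-by-character pass may be closer to what is wanted. The sketch below is an illustration only: the names Casing and ToSentenceCase are hypothetical, and it assumes that any '.' ends a sentence, which will misfire on abbreviations and version numbers.

```csharp
using System;
using System.Text;

public static class Casing
{
    public static string ToSentenceCase(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        var sb = new StringBuilder(text.Length);
        bool startOfSentence = true; // the very first letter counts as a sentence start

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsLetter(c))
            {
                // A standalone "i" has no letter directly before or after it.
                bool wordStart = i == 0 || !char.IsLetter(text[i - 1]);
                bool wordEnd = i == text.Length - 1 || !char.IsLetter(text[i + 1]);
                bool standaloneI = c == 'i' && wordStart && wordEnd;

                if (startOfSentence || standaloneI)
                    c = char.ToUpperInvariant(c);
                startOfSentence = false;
            }
            else if (c == '.')
            {
                // Assumption: every full stop starts a new sentence.
                startOfSentence = true;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, "i try this test sentence. now it works as i want" should become "I try this test sentence. Now it works as I want". Because it never splits and re-joins, the original spacing and punctuation are left untouched.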

Split string and keeping the delimiter without Regex C#

Problem: spending too much time solving simple problems. Oh, here's the simple problem.
Input: string inStr, char delimiter
Output: string[] outStrs where string.Join("", outStrs) == inStr and each item in outStrs before the last item must end with the delimiter. If inStr ends with the delimiter, then the last item in outStrs ends with the delimiter as well.
Example 1:
Input: "my,string,separated,by,commas", ','
Output: ["my,", "string,", "separated,", "by,", "commas"]
Example 2:
Input: "my,string,separated,by,commas,", ','
Output: ["my,", "string,", "separated,", "by,", "commas,"] (notice trailing comma)
Solution with Regex: here
I want to avoid using Regex, simply because this requires only character comparison. It's algorithmically just as complex to do as what string.Split() does. It bothers me that I cannot find a more succinct way to do what I want.
My bad solution, which doesn't work for me... it should be faster and more succinct.
var outStr = inStr.Split(new[]{delimiter},
StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + delimiter).ToArray();
if (inStr.Last() != delimiter) {
var lastOutStr = outStr.Last();
outStr[outStr.Length-1] = lastOutStr.Substring(0, lastOutStr.Length-1);
}
Using LINQ:
string input = "my,string,separated,by,commas";
string[] groups = input.Split(',');
string[] output = groups
.Select((x, idx) => x + (idx < groups.Length - 1 ? "," : string.Empty))
.Where(x => x != "")
.ToArray();
Split the string into groups, then transform every group that is not the last element by appending a comma to it.
Just thought of another way you could do it, but I don't think this method is as clear:
string[] output = (input + ',').Split( new[] { "," }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + ',').ToArray();
Seems pretty simple to me without using Regex:
string inStr = "dasdasdas";
char delimiter = 'a';
string[] result = inStr.Split(new[] { delimiter }, System.StringSplitOptions.RemoveEmptyEntries);
// Every item gets the delimiter back, except the last one when the input does not end with it.
int amountOfLoops = inStr[inStr.Length - 1] == delimiter ? result.Length : result.Length - 1;
for (int i = 0; i < amountOfLoops; i++)
{
result[i] += delimiter;
}
public static IEnumerable<string> SplitAndKeep(this string s, string[] delims)
{
int start = 0, index;
string selectedSeperator = null;
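// Note: this IndexOfAny(string[], int, out string) overload is not part of the framework;
// the answer assumes a custom extension method that reports which separator matched.
while ((index = s.IndexOfAny(delims, start, out selectedSeperator)) != -1)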
{
if (selectedSeperator == null)
continue;
if (index - start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, selectedSeperator.Length);
start = index + selectedSeperator.Length;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
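For the single-character case stated in the question, a plain IndexOf loop is probably the most direct approach. The following is a minimal sketch, not taken from any of the answers; SplitHelper and SplitKeepingDelimiter are hypothetical names. It keeps empty segments, so string.Join("", result) always reproduces the input, even when delimiters are adjacent.

```csharp
using System;
using System.Collections.Generic;

public static class SplitHelper
{
    public static string[] SplitKeepingDelimiter(string input, char delimiter)
    {
        var parts = new List<string>();
        int start = 0;
        int index;

        // Each part runs from 'start' up to and including the next delimiter.
        while ((index = input.IndexOf(delimiter, start)) != -1)
        {
            parts.Add(input.Substring(start, index - start + 1));
            start = index + 1;
        }

        // Whatever follows the last delimiter (if anything) is the final part.
        if (start < input.Length)
            parts.Add(input.Substring(start));

        return parts.ToArray();
    }
}
```

For "my,string,separated,by,commas" this should yield "my,", "string,", "separated,", "by," and "commas"; with a trailing comma in the input, the last item becomes "commas,".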

How to split string preserving whole words?

I need to split a long sentence into parts while preserving whole words. Each part should have a given maximum number of characters (including spaces, dots, etc.).
For example:
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
Output:
1 part: "Silver badges are awarded for"
2 part: "longer term goals. Silver badges are"
3 part: "uncommon."
Try this:
static void Main(string[] args)
{
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
string[] words = sentence.Split(' ');
var parts = new Dictionary<int, string>();
string part = string.Empty;
int partCounter = 0;
foreach (var word in words)
{
if (part.Length + word.Length < partLength)
{
part += string.IsNullOrEmpty(part) ? word : " " + word;
}
else
{
parts.Add(partCounter, part);
part = word;
partCounter++;
}
}
parts.Add(partCounter, part);
foreach (var item in parts)
{
Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
}
Console.ReadLine();
}
I knew there had to be a nice LINQ-y way of doing this, so here it is for the fun of it:
var input = "The quick brown fox jumps over the lazy dog.";
var charCount = 0;
var maxLineLength = 11;
var lines = input.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
.Select(g => string.Join(" ", g));
// That's all :)
foreach (var line in lines) {
Console.WriteLine(line);
}
Obviously this code works only as long as the query is not parallel, since it depends on charCount to be incremented "in word order".
I've been testing Jon's and Lessan's answers, but they don't work properly if your max length needs to be absolute, rather than approximate. As their counter increments, it doesn't count the empty space left at the end of a line.
Running their code against the OP's example, you get:
1 part: "Silver badges are awarded for " - 29 Characters
2 part: "longer term goals. Silver badges are" - 36 Characters
3 part: "uncommon. " - 13 Characters
The "are" on line two, should be on line three. This happens because the counter does not include the 6 characters from the end of line one.
I came up with the following modification of Lessan's answer to account for this:
public static class ExtensionMethods
{
public static string[] Wrap(this string text, int max)
{
var charCount = 0;
var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max)
? max - (charCount % max) : 0) + w.Length + 1) / max)
.Select(g => string.Join(" ", g.ToArray()))
.ToArray();
}
}
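The root of the problem described above is that a running character count cannot know how much space was left unused at the end of earlier lines. An alternative is to track only the length of the current line. The sketch below does that with an ordinary loop; it is a swapped-in greedy approach rather than a variant of the GroupBy technique, and the names WordWrap and Wrap are hypothetical. It assumes a word longer than max is placed on a line of its own rather than being broken.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordWrap
{
    public static List<string> Wrap(string text, int max)
    {
        var lines = new List<string>();
        var line = new StringBuilder();

        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Length the current line would have with this word added
            // (one extra character for the space when the line is not empty).
            int needed = line.Length == 0 ? word.Length : line.Length + 1 + word.Length;

            // Start a new line if the word does not fit on the current one.
            if (needed > max && line.Length > 0)
            {
                lines.Add(line.ToString());
                line.Clear();
            }

            if (line.Length > 0)
                line.Append(' ');
            line.Append(word);
        }

        // Flush the last, partially filled line.
        if (line.Length > 0)
            lines.Add(line.ToString());

        return lines;
    }
}
```

For the question's sentence and a limit of 35, this should produce "Silver badges are awarded for", "longer term goals. Silver badges" and "are uncommon.", with no line exceeding the limit.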
Split the string on a space, then build up new strings from the resulting array, stopping before your limit for each new segment.
Untested pseudo-code:
string[] words = sentence.Split(new char[] {' '});
IList<string> sentenceParts = new List<string>();
sentenceParts.Add(string.Empty);
int partCounter = 0;
foreach (var word in words)
{
if(sentenceParts[partCounter].Length + word.Length > myLimit)
{
partCounter++;
sentenceParts.Add(string.Empty);
}
sentenceParts[partCounter] += word + " ";
}
It seems like everyone is using some form of "Split then rebuild the sentence"...
I thought I would take a stab at this the way my brain would logically think about doing this manually, which is:
Split on length
Go backwards to the nearest space and use that chunk
Remove the used chunk and start over
The code ended up being a little more complex than I was hoping for, however I believe it handles most (all?) edge cases - including words that are longer than maxLength, when the words end exactly on the maxLength, etc.
Here's my function:
private static List<string> SplitWordsByLength(string str, int maxLength)
{
List<string> chunks = new List<string>();
while (str.Length > 0)
{
if (str.Length <= maxLength) //if remaining string is less than length, add to list and break out of loop
{
chunks.Add(str);
break;
}
string chunk = str.Substring(0, maxLength); //Get maxLength chunk from string.
if (char.IsWhiteSpace(str[maxLength])) //if next char is a space, we can use the whole chunk and remove the space for the next line
{
chunks.Add(chunk);
str = str.Substring(chunk.Length + 1); //Remove chunk plus space from original string
}
else
{
int splitIndex = chunk.LastIndexOf(' '); //Find last space in chunk.
if (splitIndex != -1) //If space exists in string,
chunk = chunk.Substring(0, splitIndex); // remove chars after space.
str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1)); //Remove chunk plus space (if found) from original string
chunks.Add(chunk); //Add to list
}
}
return chunks;
}
Test usage:
string testString = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
int length = 35;
List<string> test = SplitWordsByLength(testString, length);
foreach (string chunk in test)
{
Console.WriteLine(chunk);
}
Console.ReadLine();
At first I was thinking this might be a Regex kind of thing but here's my shot at it:
List<string> parts = new List<string>();
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");
foreach(var piece in pieces)
{
if(piece.Length + tempString.Length + 1 > partLength)
{
parts.Add(tempString.ToString());
tempString.Clear();
}
tempString.Append(" " + piece);
}
Expanding on Jon's answer above: I needed to switch g with g.ToArray(), and also change max to (max + 2) to get an exact wrapping on the max'th character.
public static class ExtensionMethods
{
public static string[] Wrap(this string text, int max)
{
var charCount = 0;
var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
.Select(g => string.Join(" ", g.ToArray()))
.ToArray();
}
}
And here is sample usage as NUnit tests:
[Test]
public void TestWrap()
{
Assert.AreEqual(2, "A B C".Wrap(4).Length);
Assert.AreEqual(1, "A B C".Wrap(5).Length);
Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
Assert.AreEqual(2, " TEST TEST TEST TEST ".Wrap(10).Length);
Assert.AreEqual("TEST TEST", " TEST TEST TEST TEST ".Wrap(10)[0]);
}
Joel, there is a little bug in your code that I've corrected here:
public static string[] StringSplitWrap(string sentence, int MaxLength)
{
List<string> parts = new List<string>();
string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");
foreach (var piece in pieces)
{
if (piece.Length + tempString.Length + 1 > MaxLength)
{
parts.Add(tempString.ToString());
tempString.Clear();
}
tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
}
if (tempString.Length>0)
parts.Add(tempString.ToString());
return parts.ToArray();
}
This works:
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
List<string> lines =
sentence
.Split(' ')
.Aggregate(new [] { "" }.ToList(), (a, x) =>
{
var last = a[a.Count - 1];
if ((last + " " + x).Length > partLength)
{
a.Add(x);
}
else
{
a[a.Count - 1] = (last + " " + x).Trim();
}
return a;
});
It gives me:
Silver badges are awarded for
longer term goals. Silver badges
are uncommon.
While CsConsoleFormat† was primarily designed to format text for console, it supports generating plain text as well.
var doc = new Document().AddChildren(
new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
TextWrap = TextWrapping.WordWrap
}
);
var bounds = new Rect(0, 0, 35, Size.Infinity);
string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);
And, if you actually need trimmed strings like in your question:
List<string> lines = text.Trim()
.Split(new[] { Environment.NewLine }, StringSplitOptions.None)
.Select(s => s.Trim())
.ToList();
In addition to word wrap on spaces, you get proper handling of hyphens, zero-width spaces, no-break spaces etc.
† CsConsoleFormat was developed by me.

How do I replace multiple spaces with a single space in C#?

How can I replace multiple spaces in a string with only one space in C#?
Example:
1 2 3  4  5
would be:
1 2 3 4 5
I like to use:
myString = Regex.Replace(myString, @"\s+", " ");
Since it will catch runs of any kind of whitespace (e.g. tabs, newlines, etc.) and replace them with a single space.
string sentence = "This is a sentence with multiple    spaces";
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");
string xyz = "1   2   3   4   5";
xyz = string.Join( " ", xyz.Split( new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ));
I think Matt's answer is the best. Note that \s already matches newlines, so RegexOptions.Multiline is not required for that (it only changes how ^ and $ match); passing it is harmless:
myString = Regex.Replace(myString, @"\s+", " ", RegexOptions.Multiline);
Another approach which uses LINQ:
var list = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join(" ", list);
It's much simpler than all that:
while(str.Contains("  ")) str = str.Replace("  ", " ");
Regex can be rather slow even with simple tasks. This creates an extension method that can be used off of any string.
public static class StringExtension
{
public static String ReduceWhitespace(this String value)
{
var newString = new StringBuilder();
bool previousIsWhitespace = false;
for (int i = 0; i < value.Length; i++)
{
if (Char.IsWhiteSpace(value[i]))
{
if (previousIsWhitespace)
{
continue;
}
previousIsWhitespace = true;
}
else
{
previousIsWhitespace = false;
}
newString.Append(value[i]);
}
return newString.ToString();
}
}
It would be used as such:
string testValue = "This contains     too much  whitespace.";
testValue = testValue.ReduceWhitespace();
// testValue = "This contains too much whitespace."
myString = Regex.Replace(myString, " {2,}", " ");
For those, who don't like Regex, here is a method that uses the StringBuilder:
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
StringBuilder stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || c != ' ' || (c == ' ' && input[i - 1] != ' '))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
In my tests, this method was 16 times faster on average with a very large set of small-to-medium sized strings, compared to a static compiled Regex. Compared to a non-compiled or non-static Regex, this should be even faster.
Keep in mind, that it does not remove leading or trailing spaces, only multiple occurrences of such.
This is a shorter version, which should only be used if you are only doing this once, as it creates a new instance of the Regex class every time it is called.
temp = new Regex(" {2,}").Replace(temp, " ");
If you are not too acquainted with regular expressions, here's a short explanation:
The {2,} quantifier makes the regex match the character preceding it (here, a space) two or more times in a row.
The .Replace(temp, " ") replaces all matches in the string temp with a space.
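If you want to use this multiple times, here is a better option: the Regex is constructed once and reused, and RegexOptions.Compiled compiles the pattern to IL when it is constructed (at run time, not at compile time):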
Regex singleSpacify = new Regex(" {2,}", RegexOptions.Compiled);
temp = singleSpacify.Replace(temp, " ");
You can simply do this in one line solution!
string s = "welcome   to    london";
s = s.Replace(" ", "()").Replace(")(", "").Replace("()", " ");
You can choose other brackets (or even other characters) if you like.
no Regex, no Linq... removes leading and trailing spaces as well as reducing any embedded multiple space segments to one space
string myString = "   0 1 2  3   4               5  ";
myString = string.Join(" ", myString.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries));
result:"0 1 2 3 4 5"
// My sample string
string str ="hi you    are  a demo";
// Split the words on white space
var demo = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
// Join the values back and add a single space in between
str = string.Join(" ", demo);
// output: string str ="hi you are a demo";
Consolidating other answers, per Joel, and hopefully improving slightly as I go:
You can do this with Regex.Replace():
string s = Regex.Replace (
"  1  2    4 5",
@"[ ]{2,}",
" "
);
Or with String.Split():
static class StringExtensions
{
public static string Join(this IList<string> value, string separator)
{
return string.Join(separator, value.ToArray());
}
}
//...
string s = "  1  2    4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
I just wrote a new Join that I like, so I thought I'd re-answer, with it:
public static string Join<T>(this IEnumerable<T> source, string separator)
{
return string.Join(separator, source.Select(e => e.ToString()).ToArray());
}
One of the cool things about this is that it works with collections that aren't strings, by calling ToString() on the elements. Usage is still the same:
//...
string s = "  1  2    4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
Many answers are providing the right output but for those looking for the best performances, I did improve Nolanar's answer (which was the best answer for performance) by about 10%.
public static string MergeSpaces(this string str)
{
if (str == null)
{
return null;
}
else
{
StringBuilder stringBuilder = new StringBuilder(str.Length);
int i = 0;
foreach (char c in str)
{
if (c != ' ' || i == 0 || str[i - 1] != ' ')
stringBuilder.Append(c);
i++;
}
return stringBuilder.ToString();
}
}
Use the regex pattern
[ ]+ #only space
var text = Regex.Replace(inputString, @"[ ]+", " ");
I know this is pretty old, but ran across this while trying to accomplish almost the same thing. Found this solution in RegEx Buddy. This pattern will replace all double spaces with single spaces and also trim leading and trailing spaces.
pattern: (?m:^ +| +$|( ){2,})
replacement: $1
It's a little difficult to read since we're dealing with empty space, so here it is again with the spaces replaced with "_".
pattern: (?m:^_+|_+$|(_){2,}) <-- don't use this, just for illustration.
The "(?m:" construct enables the "multi-line" option. I generally like to include whatever options I can within the pattern itself so it is more self contained.
I can remove whitespaces with this
while (word.Contains("  ")) // double space
word = word.Replace("  ", " "); // replace double space with a single space
word = word.Trim(); // remove whitespace from start and end
Without using regular expressions:
while (myString.IndexOf("  ", StringComparison.CurrentCulture) != -1)
{
myString = myString.Replace("  ", " ");
}
OK to use on short strings, but will perform badly on long strings with lots of spaces.
try this method
private string removeNestedWhitespaces(char[] st)
{
StringBuilder sb = new StringBuilder();
int indx = 0, length = st.Length;
while (indx < length)
{
sb.Append(st[indx]);
// After appending a space, skip the rest of the run so only one space is kept.
if (st[indx] == ' ')
{
while (indx + 1 < length && st[indx + 1] == ' ')
indx++;
}
indx++;
}
return sb.ToString();
}
use it like this:
string test = removeNestedWhitespaces("1 2  3   4    5".ToCharArray());
Here is a slight modification of Nolonar's original answer.
It checks for any whitespace character rather than just a space.
Each run of whitespace is reduced to its first character.
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
var stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || !char.IsWhiteSpace(c) || (char.IsWhiteSpace(c) &&
!char.IsWhiteSpace(input[i - 1])))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
How about going rogue?
public static string MinimizeWhiteSpace(
this string _this)
{
if (_this != null)
{
var returned = new StringBuilder();
var inWhiteSpace = false;
var length = _this.Length;
for (int i = 0; i < length; i++)
{
var character = _this[i];
if (char.IsWhiteSpace(character))
{
if (!inWhiteSpace)
{
inWhiteSpace = true;
returned.Append(' ');
}
}
else
{
inWhiteSpace = false;
returned.Append(character);
}
}
return returned.ToString();
}
else
{
return null;
}
}
Mix of StringBuilder and Enumerable.Aggregate() as extension method for strings:
using System;
using System.Linq;
using System.Text;
public static class StringExtension
{
public static string CondenseSpaces(this string s)
{
return s.Aggregate(new StringBuilder(), (acc, c) =>
{
if (c != ' ' || acc.Length == 0 || acc[acc.Length - 1] != ' ')
acc.Append(c);
return acc;
}).ToString();
}
public static void Main()
{
const string input = "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     ";
Console.WriteLine(" Input: \"{0}\"", input);
Console.WriteLine("Output: \"{0}\"", StringExtension.CondenseSpaces(input));
}
}
Executing this program produces the following output:
Input: "     (five leading spaces)     (five internal spaces)     (five trailing spaces)     "
Output: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Old skool:
string oldText = "   1 2  3   4    5     ";
string newText = oldText
.Replace(" ", " " + (char)22 )
.Replace( (char)22 + " ", "" )
.Replace( (char)22 + "", "" );
Assert.That( newText, Is.EqualTo( " 1 2 3 4 5 " ) );
You can create a StringsExtensions file with a method like RemoveDoubleSpaces().
StringsExtensions.cs
public static string RemoveDoubleSpaces(this string value)
{
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);
value = regex.Replace(value, " ");
// this removes space at the end of the value (like "demo ")
// and space at the start of the value (like " hi")
value = value.Trim(' ');
return value;
}
And then you can use it like this:
string stringInput ="  hi here     is  a demo ";
string stringCleaned = stringInput.RemoveDoubleSpaces();
I looked over the proposed solutions and could not find one that handles a mix of whitespace characters in a way acceptable for my case. For example:
Regex.Replace(input, @"\s+", " ") - it will eat your line breaks if they are mixed with spaces; for example, a \n \n sequence will be replaced with a single space
Regex.Replace(source, @"(\s)\s+", "$1") - the result depends on the first whitespace character, meaning that it again might eat your line breaks
Regex.Replace(source, @"[ ]{2,}", " ") - it won't work correctly when there's a mix of whitespace characters, for example "\t \t "
Probably not perfect, but a quick solution for me was:
Regex.Replace(input, @"\s+",
(match) => match.Value.IndexOf('\n') > -1 ? "\n" : " ", RegexOptions.Multiline)
The idea is that a line break wins over spaces and tabs.
This won't handle Windows line breaks correctly, but it would be easy to adjust for that too. I don't know regex that well; maybe it is possible to fit this into a single pattern.
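One possible adjustment for Windows line breaks is to let the replacement callback check for "\r\n" before checking for a bare "\n". This is only a sketch of that idea, not the answerer's code; WhitespaceCollapser and Collapse are hypothetical names, and it assumes that a whitespace run containing any "\r\n" should collapse to a single "\r\n".

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // One shared instance; the pattern matches any run of whitespace.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Collapse(string input)
    {
        return Whitespace.Replace(input, match =>
        {
            string run = match.Value;
            if (run.Contains("\r\n")) return "\r\n"; // Windows line break wins first
            if (run.IndexOf('\n') >= 0) return "\n"; // then a bare line feed
            return " ";                              // otherwise a single space
        });
    }
}
```

A run of spaces and tabs then becomes one space, while a run such as " \r\n " becomes a single "\r\n". Consecutive blank lines are still merged into one line break, just as in the approach above.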
The following code collapses runs of multiple spaces into a single space:
public string RemoveMultipleSpacesToSingle(string str)
{
string text = str;
do
{
//text = text.Replace("  ", " ");
text = Regex.Replace(text, @"\s+", " ");
} while (text.Contains("  "));
return text;
}
