C# split string but keep split chars / separators [duplicate] - c#

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?

If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.

If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2

Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}

Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on

This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");

A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}

This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}

Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}

I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.

To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");

result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)

Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.

veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.

using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}

I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two

I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());

I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

Related

Method that takes a message and index, creates a substring using the index

Problem: I want to write a method that takes a message/index pair like this:
("Hello, I am *Name1, how are you doing *Name2?", 2)
The index refers to the asterisk delimited name in the message. So if the index is 1, it should refer to *Name1, if it's 2 it should refer to *Name2.
The method should return just the name with the asterisk (*Name2).
I have attempted to play around with substrings, taking the first delimited * and ending when we reach a character that isn't a letter, number, underscore or hyphen, but the logic just isn't setting in.
I know this is similar to a few problems on SO but I can't find anything this specific. Any help is appreciated.
This is what's left of my very vague attempt so far. Based on this thread:
public string GetIndexedNames(string message, int index)
{
int strStart = message.IndexOf("#") + "#".Length;
int strEnd = message.LastIndexOf(" ");
String result = message.Substring(strStart, strEnd - strStart);
}
If you want to do it the old school way, then something like:
public static void Main(string[] args)
{
string message = "Hello, I am *Name1, how are you doing *Name2?";
string name1 = GetIndexedNames(message, "*", 1);
string name2 = GetIndexedNames(message, "*", 2);
Console.WriteLine(message);
Console.WriteLine(name1);
Console.WriteLine(name2);
Console.ReadLine();
}
public static string GetIndexedNames(string message, string singleCharDelimiter, int index)
{
string valid = "abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";
string[] parts = message.Split(singleCharDelimiter.ToArray());
if (parts.Length >= index)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < parts[index].Length; i++)
{
string character = parts[index].Substring(i, 1);
if (valid.Contains(character))
{
sb.Append(character);
}
else
{
return sb.ToString();
}
}
return sb.ToString();
}
return "";
}
You can try using regular expressions to match the names. Assuming that name is a sequence of word characters (letters or digits):
using System.Linq;
using System.Text.RegularExpressions;
...
// Either name with asterisk *Name or null
// index is 1-based
private static ObtainName(string source, int index) => Regex
.Matches(source, #"\*\w+")
.Cast<Match>()
.Select(match => match.Value)
.Distinct() // in case the same name repeats several times
.ElementAtOrDefault(index - 1);
Demo:
string name = ObtainName(
"Hello, I am *Name1, how are you doing *Name2?", 2);
Console.Write(name);
Outcome:
*Name2
Perhaps not the most elegant solution, but if you want to use IndexOf, use a loop:
public static string GetIndexedNames(string message, int index, char marker='*')
{
int lastFound = 0;
for (int i = 0; i < index; i++) {
lastFound = message.IndexOf(marker, lastFound+1);
if (lastFound == -1) return null;
}
var space = message.IndexOf(' ', lastFound);
return space == -1 ? message.Substring(lastFound) : message.Substring(lastFound, space - lastFound);
}

Deleting lines. My algorithm for solving the problem does not work

I need to decide to complete the solution so that all text is removed after any comment markers. Any spaces at the end of the line must be removed
My code:
public static string StripComments(string text, string[] commentSymbols)
{
string[] str = text.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < str.Length; i++)
{
int index=0;
if (str[i].IndexOf(commentSymbols[0]) != -1)
{
index = str[i].IndexOf(commentSymbols[0], StringComparison.Ordinal)-1;
str[i] = str[i].Remove(index)+"\n";
}
else if (str[i].IndexOf(commentSymbols[1]) != -1)
{
index = str[i].IndexOf(commentSymbols[1],StringComparison.Ordinal)-1;
str[i] = str[i].Remove(index)+"\n";
}
else {
str[i]+="\n";
}
}
str[str.Length-1]=str[str.Length-1].Remove(str[str.Length-1].LastIndexOf('\n'));
return string.Join("",str).TrimEnd();
}
For example:
string stripped = StripCommentsSolution.StripComments("apples, pears # and bananas\ngrapes\nbananas !apples", new [] { "#", "!" })
// result should == "apples, pears\ngrapes\nbananas"
When running tests, it throws an error:
Expected string length 6 but was 8. Strings differ at index 1.
Expected: "a \ n b \ nc" But was: "a \ n b \ nc"
cause of #evgeny20 mentioned in comments, he want's it without regexes here one solution i found, without it.
public static string StripComments(string text, string[] commentSymbols)
{
string[] lines = text.Split(new[] { "\n" }, StringSplitOptions.None);
lines = lines.Select(x => x.Split(commentSymbols, StringSplitOptions.None).First().TrimEnd()).ToArray();
return string.Join("\n", lines);
}
Best Regards and happy learning
Here is my Solution, have a try
you need this using...
using System.Text.RegularExpressions;
public static string StripComments(string text, string[] commentSymbols)
{
foreach (string commentSymbol in commentSymbols)
{
text = Regex.Replace(text, $"{commentSymbol}[^\\n]*", "");
}
text = Regex.Replace(text, "\\s*(\\n+)", "$1");
return text;
}
Edit:
Regex changed that line ends are trimmed too
Best Regards
In general, your code seems a resonable approach. It does however not cover:
more or less than two commentSymbols
lines containing more than one comment symbol
The following code handles these cases:
public static string StripComments(string text, string[] commentSymbols)
{
string[] str = text.Split('\n');
for (int i = 0; i < str.Length; i++)
{
int index = -1;
// find first occurrence of any commentSymbol in str[i]
foreach (var commentSymbol in commentSymbols)
{
int indexOfThisSymbol = str[i].IndexOf(commentSymbol);
if (indexOfThisSymbol != -1 && (index == -1 || indexOfThisSymbol < index))
index = indexOfThisSymbol;
}
if (index != -1)
{
// strip off comment symbol and everything after it
str[i] = str[i].Substring(0, index);
}
// strip off trailing spaces
str[i] = str[i].TrimEnd();
}
// combine again
return string.Join("\n", str);
}
i can provide a second Solution like i understand your request.
Please have a look.
public static string StripComments(string text, string[] commentSymbols)
{
var textLines = text.Split("\n");
for (int i = 0; i < textLines.Length;i++ )
{
foreach (string commentSymbol in commentSymbols)
{
textLines[i] = Regex.Replace(textLines[i], $"{Regex.Escape(commentSymbol)}[^\\n]*", "");
}
textLines[i] = textLines[i].TrimEnd();
}
return string.Join("\n",textLines);
}
this preserves all line breaks and removes comments and whitespaces as requested in your link.
If it isn't right, you can give a more constructive example. At least I'm trying to help you.
EDIT: this solution now supports Regex Specialcharacters as commentsymbol
With this i passed the test in codewars successfully
Best Regards

Split string and keeping the delimiter without Regex C#

Problem: spending too much time solving simple problems. Oh, here's the simple problem.
Input: string inStr, char delimiter
Output: string[] outStrs where string.Join("", outStrs) == inStr and each item in outStrs before the last item must end with the delimiter. If inStr ends with the delimiter, then the last item in outStrs ends with the delimiter as well.
Example 1:
Input: "my,string,separated,by,commas", ','
Output: ["my,", "string,", "separated,", "by,", "commas"]
Example 2:
Input: "my,string,separated,by,commas,", ','
Output: ["my,", "string,", "separated,", "by,", "commas,"] (notice trailing comma)
Solution with Regex: here
I want to avoid using Regex, simply because this requires only character comparison. It's algorithmically just as complex to do as what string.Split() does. It bothers me that I cannot find a more succinct way to do what I want.
My bad solution, which doesn't work for me... it should be faster and more succinct.
var outStr = inStr.Split(new[]{delimiter},
StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + delimiter).ToArray();
if (inStr.Last() != delimiter) {
var lastOutStr = outStr.Last();
outStr[outStr.Length-1] = lastOutStr.Substring(0, lastOutStr.Length-1);
}
Using LINQ:
string input = "my,string,separated,by,commas";
string[] groups = input.Split(',');
string[] output = groups
.Select((x, idx) => x + (idx < groups.Length - 1 ? "," : string.Empty))
.Where(x => x != "")
.ToArray();
Split the string into groups, then transform every group that is not the last element by appending a comma to it.
Just thought of another way you could do it, but I don't think this method is as clear:
string[] output = (input + ',').Split( new[] { "," }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + ',').ToArray();
Seems pretty simple to me without using Regex:
string inStr = "dasdasdas";
char delimiter = 'A';
string[] result = inStr.Split(new string[] { inStr }, System.StringSplitOptions.RemoveEmptyEntries);
string lastItem = result[result.Length - 1];
int amountOfLoops = lastItem[lastItem.Length - 1] == delimiter ? result.Length - 1 : result.Length - 2;
for (int i = 0; i < amountOfLoops; i++)
{
result[i] += delimiter;
}
public static IEnumerable<string> SplitAndKeep(this string s, string[] delims)
{
int start = 0, index;
string selectedSeperator = null;
while ((index = s.IndexOfAny(delims, start, out selectedSeperator)) != -1)
{
if (selectedSeperator == null)
continue;
if (index - start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, selectedSeperator.Length);
start = index + selectedSeperator.Length;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}

Count words and spaces in string C#

I want to count words and spaces in my string. String looks like this:
Command do something ptuf(123) and bo(1).ctq[5] v:0,
I have something like this so far
int count = 0;
string mystring = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
foreach(char c in mystring)
{
if(char.IsLetter(c))
{
count++;
}
}
What should I do to count spaces also?
int countSpaces = mystring.Count(Char.IsWhiteSpace); // 6
int countWords = mystring.Split().Length; // 7
Note that both use Char.IsWhiteSpace which assumes other characters than " " as white-space(like newline). Have a look at the remarks section to see which exactly .
you can use string.Split with a space
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
When you get a string array the number of elements is the number of words, and the number of spaces is the number of words -1
if you want to count spaces you can use LINQ :
int count = mystring.Count(s => s == ' ');
This will take into account:
Strings starting or ending with a space.
Double/triple/... spaces.
Assuming that the only word seperators are spaces and that your string is not null.
private static int CountWords(string S)
{
if (S.Length == 0)
return 0;
S = S.Trim();
while (S.Contains(" "))
S = S.Replace(" "," ");
return S.Split(' ').Length;
}
Note: the while loop can also be done with a regex: How do I replace multiple spaces with a single space in C#?
Here's a method using regex. Just something else to consider. It is better if you have long strings with lots of different types of whitespace. Similar to Microsoft Word's WordCount.
var str = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
int count = Regex.Matches(str, #"[\S]+").Count; // count is 7
For comparison,
var str = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
str.Count(char.IsWhiteSpace) is 17, while the regex count is still 7.
I've got some ready code to get a list of words in a string:
(extension methods, must be in a static class)
/// <summary>
/// Gets a list of words in the text. A word is any string sequence between two separators.
/// No word is added if separators are consecutive (would mean zero length words).
/// </summary>
public static List<string> GetWords(this string Text, char WordSeparator)
{
List<int> SeparatorIndices = Text.IndicesOf(WordSeparator.ToString(), true);
int LastIndexNext = 0;
List<string> Result = new List<string>();
foreach (int index in SeparatorIndices)
{
int WordLen = index - LastIndexNext;
if (WordLen > 0)
{
Result.Add(Text.Substring(LastIndexNext, WordLen));
}
LastIndexNext = index + 1;
}
return Result;
}
/// <summary>
/// returns all indices of the occurrences of a passed string in this string.
/// </summary>
public static List<int> IndicesOf(this string Text, string ToFind, bool IgnoreCase)
{
int Index = -1;
List<int> Result = new List<int>();
string T, F;
if (IgnoreCase)
{
T = Text.ToUpperInvariant();
F = ToFind.ToUpperInvariant();
}
else
{
T = Text;
F = ToFind;
}
do
{
Index = T.IndexOf(F, Index + 1);
Result.Add(Index);
}
while (Index != -1);
Result.RemoveAt(Result.Count - 1);
return Result;
}
/// <summary>
/// Implemented - returns all the strings in uppercase invariant.
/// </summary>
public static string[] ToUpperAll(this string[] Strings)
{
string[] Result = new string[Strings.Length];
Strings.ForEachIndex(i => Result[i] = Strings[i].ToUpperInvariant());
return Result;
}
In addition to Tim's entry, in case you have padding on either side, or multiple spaces beside each other:
Int32 words = somestring.Split( // your string
new[]{ ' ' }, // break apart by spaces
StringSplitOptions.RemoveEmptyEntries // remove empties (double spaces)
).Length; // number of "words" remaining
using namespace;
namespace Application;
class classname
{
static void Main(string[] args)
{
int count;
string name = "I am the student";
count = name.Split(' ').Length;
Console.WriteLine("The count is " +count);
Console.ReadLine();
}
}
if you need whitespace count only try this.
string myString="I Love Programming";
var strArray=myString.Split(new char[] { ' ' });
int countSpace=strArray.Length-1;
How about indirectly?
int countl = 0, countt = 0, count = 0;
foreach(char c in str)
{
countt++;
if (char.IsLetter(c))
{
countl++;
}
}
count = countt - countl;
Console.WriteLine("No. of spaces are: "+count);

Count how many words in each sentence

I'm stuck on how to count how many words are in each sentence, an example of this is: string sentence = "hello how are you. I am good. that's good."
and have it come out like:
//sentence1: 4 words
//sentence2: 3 words
//sentence3: 2 words
I can get the number of sentences
public int GetNoOfWords(string s)
{
return s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
label2.Text = (GetNoOfWords(sentance).ToString());
and i can get the number of words in the whole string
public int CountWord (string text)
{
int count = 0;
for (int i = 0; i < text.Length; i++)
{
if (text[i] != ' ')
{
if ((i + 1) == text.Length)
{
count++;
}
else
{
if(text[i + 1] == ' ')
{
count++;
}
}
}
}
return count;
}
then button1
int words = CountWord(sentance);
label4.Text = (words.ToString());
But I can't count how many words are in each sentence.
Instead of looping over the string as you do in CountWords I would just use;
int words = s.Split(' ').Length;
It's much more clean and simple. You split on white spaces which returns an array of all the words, the length of that array is the number of words in the string.
Why not use Split instead?
var sentences = "hello how are you. I am good. that's good.";
foreach (var sentence in sentences.TrimEnd('.').Split('.'))
Console.WriteLine(sentence.Trim().Split(' ').Count());
If you want number of words in each sentence, you need to
string s = "This is a sentence. Also this counts. This one is also a thing.";
string[] sentences = s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string sentence in sentences)
{
Console.WriteLine(sentence.Split(' ').Length + " words in sentence *" + sentence + "*");
}
Use CountWord on each element of the array returned by s.Split:
string sentence = "hello how are you. I am good. that's good.";
string[] words = sentence.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries).Length;
for (string sentence in sentences)
{
int noOfWordsInSentence = CountWord(sentence);
}
string text = "hello how are you. I am good. that's good.";
string[] sentences = s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
IEnumerable<int> wordsPerSentence = sentences.Select(s => s.Trim().Split(' ').Length);
As noted in several answers here, look at String functions like Split, Trim, Replace, etc to get you going. All answers here will solve your simple example, but here are some sentences which they may fail to analyse correctly;
"Hello, how are you?" (no '.' to parse on)
"That apple costs $1.50." (a '.' used as a decimal)
"I like whitespace . "
"Word"
If you only need a count, I'd avoid Split() -- it takes up unnecessary space. Perhaps:
static int WordCount(string s)
{
int wordCount = 0;
for(int i = 0; i < s.Length - 1; i++)
if (Char.IsWhiteSpace(s[i]) && !Char.IsWhiteSpace(s[i + 1]) && i > 0)
wordCount++;
return ++wordCount;
}
public static void Main()
{
Console.WriteLine(WordCount(" H elloWor ld g ")); // prints "4"
}
It counts based on the number of spaces (1 space = 2 words). Consecutive spaces are ignored.
Does your spelling of sentence in:
int words = CountWord(sentance);
have anything to do with it?

Categories

Resources