Count how many words in each sentence - c#

I'm stuck on how to count how many words are in each sentence, an example of this is: string sentence = "hello how are you. I am good. that's good."
and have it come out like:
//sentence1: 4 words
//sentence2: 3 words
//sentence3: 2 words
I can get the number of sentences
public int GetNoOfWords(string s)
{
return s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
label2.Text = (GetNoOfWords(sentance).ToString());
and I can get the number of words in the whole string
public int CountWord (string text)
{
int count = 0;
for (int i = 0; i < text.Length; i++)
{
if (text[i] != ' ')
{
if ((i + 1) == text.Length)
{
count++;
}
else
{
if(text[i + 1] == ' ')
{
count++;
}
}
}
}
return count;
}
then button1
int words = CountWord(sentance);
label4.Text = (words.ToString());
But I can't count how many words are in each sentence.

Instead of looping over the string as you do in CountWord, I would just use:
int words = s.Split(' ').Length;
It's much cleaner and simpler. Splitting on spaces returns an array of all the words, and the length of that array is the number of words in the string.
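One caveat with splitting on a single space: repeated, leading or trailing spaces produce empty entries, and those get counted as words too. A minimal sketch of a more tolerant count, assuming white space is the only word separator you care about (WordCounter and CountWords are names made up for this example):

```csharp
using System;

public static class WordCounter
{
    // Counts whitespace-separated words. A null separator array makes
    // Split use all white-space characters as delimiters, and
    // RemoveEmptyEntries drops the empty strings that repeated,
    // leading or trailing white space would otherwise produce.
    public static int CountWords(string s)
    {
        if (string.IsNullOrWhiteSpace(s))
            return 0;
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

With this, WordCounter.CountWords(" hello   how are you ") gives 4, whereas a plain Split(' ') on the same string also counts the empty entries.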

Why not use Split instead?
var sentences = "hello how are you. I am good. that's good.";
foreach (var sentence in sentences.TrimEnd('.').Split('.'))
Console.WriteLine(sentence.Trim().Split(' ').Count());

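If you want the number of words in each sentence, you need to split the text into sentences first and then split each sentence into words:
string s = "This is a sentence. Also this counts. This one is also a thing.";
string[] sentences = s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sentence in sentences)
{
    // Trim first: every sentence after the first starts with a space,
    // which would otherwise be counted as an extra empty "word".
    Console.WriteLine(sentence.Trim().Split(' ').Length + " words in sentence *" + sentence + "*");
}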

Use CountWord on each element of the array returned by s.Split:
string text = "hello how are you. I am good. that's good.";
string[] sentences = text.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sentence in sentences)
{
    int noOfWordsInSentence = CountWord(sentence);
}

string text = "hello how are you. I am good. that's good.";
string[] sentences = text.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
IEnumerable<int> wordsPerSentence = sentences.Select(s => s.Trim().Split(' ').Length);

As noted in several answers here, look at String functions like Split, Trim, Replace, etc. to get you going. All answers here will solve your simple example, but here are some sentences which they may fail to analyse correctly:
"Hello, how are you?" (no '.' to parse on)
"That apple costs $1.50." (a '.' used as a decimal)
"I like whitespace . "
"Word"

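As a rough sketch of how some of the edge cases listed above could be handled, the version below treats '.', '!' and '?' as sentence terminators and splits words on any white space. SentenceStats and WordsPerSentence are names made up for this example, and the decimal-point case is deliberately not solved: "That apple costs $1.50." is still cut at the decimal point.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceStats
{
    // Assumption for this sketch: only these characters end a sentence.
    private static readonly char[] SentenceEnders = { '.', '!', '?' };

    // Returns the word count of each sentence, in order of appearance.
    public static List<int> WordsPerSentence(string text)
    {
        var counts = new List<int>();
        foreach (string sentence in text.Split(SentenceEnders, StringSplitOptions.RemoveEmptyEntries))
        {
            // A null separator array splits on any white-space character.
            int words = sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
            if (words > 0) // skip fragments that contain only white space
                counts.Add(words);
        }
        return counts;
    }
}
```

For the original example this returns 4, 3 and 2; "Hello, how are you?" gives a single count of 4, and "Word" gives a single count of 1.
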
If you only need a count, I'd avoid Split() -- it takes up unnecessary space. Perhaps:
static int WordCount(string s)
{
int wordCount = 0;
for(int i = 0; i < s.Length - 1; i++)
if (Char.IsWhiteSpace(s[i]) && !Char.IsWhiteSpace(s[i + 1]) && i > 0)
wordCount++;
return ++wordCount;
}
public static void Main()
{
Console.WriteLine(WordCount(" H elloWor ld g ")); // prints "4"
}
It counts based on the number of spaces (1 space = 2 words). Consecutive spaces are ignored.

Does your spelling of sentence in:
int words = CountWord(sentance);
have anything to do with it?

Related

How can I make it so that text.Split(' ')[0] increments?

How can I make it so that text.Split(' ')[0] increments? I would like it to do text.Split(' ')[++] but putting that ++ in there doesn't work. The goal is to have the code count the "search" words. Sorry, new to c#.
using System;
namespace TESTER
{
class Program
{
static void Main(string[] args)
{
int wordCount = 0;
int index = 0;
string text = "I ate a donut on national donut day and it tasted like a donut";
string search = "donut";
// skip whitespace until first word
while (index < text.Length)
{
if (search == text.Split(' ')[0])
{
wordCount++;
}
}
Console.WriteLine(wordCount);
}
}
}
You could just do this:
string text = "I ate a donut on national donut day and it tasted like a donut";
string search = "donut";
int wordCount = text.Split(' ').Count(x => x == search);
Console.WriteLine(wordCount);
That gives 3.
Try doing this.
using System;
namespace TESTER
{
class Program
{
static void Main(string[] args)
{
int wordCount = 0;
int index = 0;
string text = "I ate a donut on national donut day and it tasted like a donut";
string search = "donut";
// split once, then walk the resulting array by index
string[] wordArray = text.Split(' ');
while (index < wordArray.Length)
{
if (search == wordArray[index])
{
wordCount++;
}
index++;
}
Console.WriteLine(wordCount);
}
}
}
You can use this:
using System;
using System.Linq;
namespace TESTER
{
class Program
{
static void Main(string[] args)
{
string text = "I ate a donut on national donut day and it tasted like a donut";
string search = "donut";
var wordCount = text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.Count(x => x == search);
Console.WriteLine(wordCount);
}
}
}
If you want a case-insensitive search use:
var wordCount = text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Count(
x => string.Equals(x, search, StringComparison.InvariantCultureIgnoreCase)
);
The answer of Enigmativity is the right one. That's how you should do what you want.
But you're learning and using LINQ won't make it easier.
Your variable text is of type string. When you call the member function Split(...) on a string (or String, which is the same type), it returns an array of strings, that is, a string[]. To use the [] indexer, you can declare a variable of that type.
string[] words;
Then you assign the result of your text.Split(' ') to it.
words = text.Split(' ');
This gives you access to all entries through the variable words.
string str = words[0];
To count without LINQ you can iterate through the array. I think this was your intention with the [++]. You have two options:
use a for loop or a foreach loop.
int wordCount = 0;
for (int i = 0; i < words.Length; i++)
{
    if (words[i] == search)
        ++wordCount;
}
or the foreach loop
// let's pretend it's a real program here and
// reset the wordCount rather than declaring it
wordCount = 0;
foreach (string word in words)
{
    if (word == search)
        ++wordCount;
}
Incrementing with the ++ operator, or its opposite --:
These need a number. For instance an int.
int number = 0;
This you can increment with:
number++;
Now number will have the value of 1.
You can use it in the indexer of an array. But you do need an integer.
Consider this code:
int index = 0;
while(index < words.Length)
{
Console.WriteLine( words[ index++ ] );
}
Here you have the array words. In the indexer you request the entry at the position that index currently holds. Then index is incremented by 1, so index will be 14 when the while loop exits. 14 is the number of words in your initial string.

Count chars and words in text, can it be optimized?

I want to count chars in a big text, I do it with this code:
string s = textBox.Text;
int chars = 0;
int words = 0;
foreach(var v in s.ToCharArray())
chars++;
foreach(var v in s.Split(' '))
words++;
this code works but it seems pretty slow with large text, so how can i improve this?
You don't need another char-array, you can use String.Length directly:
int chars = s.Length;
int words = s.Split().Length;
Side-note: if you call String.Split without an argument all white-space characters are used as delimiter. Those include spaces, tab-characters and new-line characters. This is not a complete list of possible word delimiters but it's better than " ".
You are also counting consecutive spaces as different "words". Use StringSplitOptions.RemoveEmptyEntries:
string[] wordSeparators = { "\r\n", "\n", ",", ".", "!", "?", ";", ":", " ", "-", "/", "\\", "[", "]", "(", ")", "<", ">", "#", "\"", "'" }; // this list is probably too extensive, tim.schmelter#myemail.com would count as 4 words, but it should give you an idea
string[] words = s.Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries);
int wordCount = words.Length;
You can do this in a single pass through without making a copy of your string:
int chars = 0;
int words = 0;
//keep track of spaces so as to only count nonspace-space-nonspace transitions
//it is initialized to true to count the first word only when we come to it
bool lastCharWasSpace = true;
foreach (var c in s)
{
chars++;
if (c == ' ')
{
lastCharWasSpace = true;
}
else if (lastCharWasSpace)
{
words++;
lastCharWasSpace = false;
}
}
Note the reason I do not use string.Split here is that it does a bunch of string copies under the hood to return the resulting array. Since you're not using the contents but instead are only interested in the count, this is a waste of time and memory - especially if you have a big enough text that has to be shuffled off to main memory, or worse yet swap space.
Do be aware that string.Split does on the other hand by default use a longer list of delimiters than just ' ', so you may want to add other conditions to the if statement.
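As a sketch of what those "other conditions" might look like, the variant below swaps the comparison with ' ' for char.IsWhiteSpace, so tabs and line breaks also separate words. It is still a single pass with no intermediate arrays. TextStats and Count are names made up for this example, and it needs C# 7 or later for the tuple return type.

```csharp
using System;

public static class TextStats
{
    // Single pass over the text: a word starts at every transition from
    // white space (or the start of the text) to a non-white-space character.
    public static (int Chars, int Words) Count(string s)
    {
        int words = 0;
        bool lastCharWasSpace = true; // so the first word is counted when we reach it
        foreach (char c in s)
        {
            if (char.IsWhiteSpace(c))
            {
                lastCharWasSpace = true;
            }
            else if (lastCharWasSpace)
            {
                words++;
                lastCharWasSpace = false;
            }
        }
        // The character count is simply the string length.
        return (s.Length, words);
    }
}
```

For the input "one\ttwo\r\nthree  four" this reports 20 characters and 4 words.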
You can simply use
int numberOfLetters = textBox.Text.Length;
or use LINQ
int numberOfLetters = textBox.Text.ToCharArray().Count();
or
int numberOfLetters = 0;
foreach (char letter in textBox.Text)
{
    numberOfLetters++;
}
var chars = textBox.Text.Length;
var words = textBox.Text.Count(c => c == ' ') + 1;

Using String Split

I have a text
Category2,"Something with ,comma"
When I split this by ',' it should give me two strings:
Category2
"Something with ,comma"
but it actually splits the string at every comma.
How can I achieve my expected result?
Thanks
Just call variable.Split(new char[] { ',' }, 2). Complete documentation in MSDN.
There are a number of things that you could be wanting to do here so I will address a few:
Split on the first comma
string[] parts = text.Split(new char[] { ',' }, 2);
Split on every comma
string[] parts = text.Split(new char[] { ',' });
Split on a comma not inside double quotes
var result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
Last one taken from C# Regex Split
Specify the maximum number of strings you want in the array:
string[] parts = text.Split(new char[] { ',' }, 2);
String.Split works at the simplest, fastest level - so it splits the text on all of the delimiters you pass into it, and it has no concept of special rules like double-quotes.
If you need a CSV parser which understands double-quotes, then you can write your own or there are some excellent open source parsers available - e.g. http://www.codeproject.com/KB/database/CsvReader.aspx - this is one I've used in several projects and recommend.
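If adding a third-party library is not an option, the framework also ships a quote-aware parser. A minimal sketch, assuming the Microsoft.VisualBasic assembly is available to your project (on .NET Framework you may need to add a reference to it); CsvLine and ParseFields are names made up for this example:

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLine
{
    // Parses one line of comma-separated text, treating commas inside
    // double quotes as part of the field rather than as delimiters.
    public static string[] ParseFields(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Note that the surrounding quotes are removed from the returned fields, so the example line from the question comes back as Category2 and Something with ,comma.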
Try this:
public static class StringExtensions
{
public static IEnumerable<string> SplitToSubstrings(this string str)
{
int startIndex = 0;
bool isInQuotes = false;
for (int index = 0; index < str.Length; index++ )
{
if (str[index] == '\"')
isInQuotes = !isInQuotes;
bool isStartOfNewSubstring = (!isInQuotes && str[index] == ',');
if (isStartOfNewSubstring)
{
yield return str.Substring(startIndex, index - startIndex).Trim();
startIndex = index + 1;
}
}
yield return str.Substring(startIndex).Trim();
}
}
Usage is pretty simple:
foreach(var str in text.SplitToSubstrings())
Console.WriteLine(str);

How to split string preserving whole words?

I need to split long sentence into parts preserving whole words. Each part should have given maximum number of characters (including space, dots etc.).
For example:
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
Output:
1 part: "Silver badges are awarded for"
2 part: "longer term goals. Silver badges are"
3 part: "uncommon."
Try this:
static void Main(string[] args)
{
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
string[] words = sentence.Split(' ');
var parts = new Dictionary<int, string>();
string part = string.Empty;
int partCounter = 0;
foreach (var word in words)
{
if (part.Length + word.Length < partLength)
{
part += string.IsNullOrEmpty(part) ? word : " " + word;
}
else
{
parts.Add(partCounter, part);
part = word;
partCounter++;
}
}
parts.Add(partCounter, part);
foreach (var item in parts)
{
Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
}
Console.ReadLine();
}
I knew there had to be a nice LINQ-y way of doing this, so here it is for the fun of it:
var input = "The quick brown fox jumps over the lazy dog.";
var charCount = 0;
var maxLineLength = 11;
var lines = input.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
.Select(g => string.Join(" ", g));
// That's all :)
foreach (var line in lines) {
Console.WriteLine(line);
}
Obviously this code works only as long as the query is not parallel, since it depends on charCount to be incremented "in word order".
I've been testing Jon's and Lessan's answers, but they don't work properly if your max length needs to be absolute, rather than approximate. As their counter increments, it doesn't count the empty space left at the end of a line.
Running their code against the OP's example, you get:
1 part: "Silver badges are awarded for " - 29 Characters
2 part: "longer term goals. Silver badges are" - 36 Characters
3 part: "uncommon. " - 13 Characters
The "are" on line two, should be on line three. This happens because the counter does not include the 6 characters from the end of line one.
I came up with the following modification of Lessan's answer to account for this:
public static class ExtensionMethods
{
public static string[] Wrap(this string text, int max)
{
var charCount = 0;
var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max)
? max - (charCount % max) : 0) + w.Length + 1) / max)
.Select(g => string.Join(" ", g.ToArray()))
.ToArray();
}
}
Split the string on spaces, then build up new strings from the resulting array, stopping before your limit for each new segment.
Untested pseudo-code:
string[] words = sentence.Split(new char[] {' '});
IList<string> sentenceParts = new List<string>();
sentenceParts.Add(string.Empty);
int partCounter = 0;
foreach (var word in words)
{
if(sentenceParts[partCounter].Length + word.Length > myLimit)
{
partCounter++;
sentenceParts.Add(string.Empty);
}
sentenceParts[partCounter] += word + " ";
}
It seems like everyone is using some form of "Split then rebuild the sentence"...
I thought I would take a stab at this the way my brain would logically think about doing this manually, which is:
Split on length
Go backwards to the nearest space and use that chunk
Remove the used chunk and start over
The code ended up being a little more complex than I was hoping for, however I believe it handles most (all?) edge cases - including words that are longer than maxLength, when the words end exactly on the maxLength, etc.
Here's my function:
private static List<string> SplitWordsByLength(string str, int maxLength)
{
List<string> chunks = new List<string>();
while (str.Length > 0)
{
if (str.Length <= maxLength) //if remaining string is less than length, add to list and break out of loop
{
chunks.Add(str);
break;
}
string chunk = str.Substring(0, maxLength); //Get maxLength chunk from string.
if (char.IsWhiteSpace(str[maxLength])) //if next char is a space, we can use the whole chunk and remove the space for the next line
{
chunks.Add(chunk);
str = str.Substring(chunk.Length + 1); //Remove chunk plus space from original string
}
else
{
int splitIndex = chunk.LastIndexOf(' '); //Find last space in chunk.
if (splitIndex != -1) //If space exists in string,
chunk = chunk.Substring(0, splitIndex); // remove chars after space.
str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1)); //Remove chunk plus space (if found) from original string
chunks.Add(chunk); //Add to list
}
}
return chunks;
}
Test usage:
string testString = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
int length = 35;
List<string> test = SplitWordsByLength(testString, length);
foreach (string chunk in test)
{
Console.WriteLine(chunk);
}
Console.ReadLine();
At first I was thinking this might be a Regex kind of thing but here's my shot at it:
List<string> parts = new List<string>();
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");
foreach(var piece in pieces)
{
if(piece.Length + tempString.Length + 1 > partLength)
{
parts.Add(tempString.ToString());
tempString.Clear();
}
tempString.Append(" " + piece);
}
Expanding on Jon's answer above: I needed to replace g with g.ToArray(), and also change max to (max + 2), to get exact wrapping at the max'th character.
public static class ExtensionMethods
{
public static string[] Wrap(this string text, int max)
{
var charCount = 0;
var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
.Select(g => string.Join(" ", g.ToArray()))
.ToArray();
}
}
And here is sample usage as NUnit tests:
[Test]
public void TestWrap()
{
Assert.AreEqual(2, "A B C".Wrap(4).Length);
Assert.AreEqual(1, "A B C".Wrap(5).Length);
Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
Assert.AreEqual(2, " TEST TEST TEST TEST ".Wrap(10).Length);
Assert.AreEqual("TEST TEST", " TEST TEST TEST TEST ".Wrap(10)[0]);
}
Joel there is a little bug in your code that I've corrected here:
public static string[] StringSplitWrap(string sentence, int MaxLength)
{
List<string> parts = new List<string>();
string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");
foreach (var piece in pieces)
{
if (piece.Length + tempString.Length + 1 > MaxLength)
{
parts.Add(tempString.ToString());
tempString.Clear();
}
tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
}
if (tempString.Length>0)
parts.Add(tempString.ToString());
return parts.ToArray();
}
This works:
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
List<string> lines =
sentence
.Split(' ')
.Aggregate(new [] { "" }.ToList(), (a, x) =>
{
var last = a[a.Count - 1];
if ((last + " " + x).Length > partLength)
{
a.Add(x);
}
else
{
a[a.Count - 1] = (last + " " + x).Trim();
}
return a;
});
It gives me:
Silver badges are awarded for
longer term goals. Silver badges
are uncommon.
While CsConsoleFormat† was primarily designed to format text for console, it supports generating plain text as well.
var doc = new Document().AddChildren(
new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
TextWrap = TextWrapping.WordWrap
}
);
var bounds = new Rect(0, 0, 35, Size.Infinity);
string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);
And, if you actually need trimmed strings like in your question:
List<string> lines = text.Trim()
.Split(new[] { Environment.NewLine }, StringSplitOptions.None)
.Select(s => s.Trim())
.ToList();
In addition to word wrap on spaces, you get proper handling of hyphens, zero-width spaces, no-break spaces etc.
† CsConsoleFormat was developed by me.

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
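To make the look-behind behaviour concrete, here is a small sketch. KeepDelimiters and SplitAfter are names made up for this example, and the filter on empty parts is an addition that drops any empty entry Regex.Split may return, for instance when the input ends with a delimiter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepDelimiters
{
    // Splits after each '.', ',' or ';' so that the delimiter stays
    // attached to the end of the part that precedes it.
    public static string[] SplitAfter(string input)
    {
        return Regex.Split(input, @"(?<=[.,;])")
                    .Where(part => part.Length > 0) // drop empty entries
                    .ToArray();
    }
}
```

For the input "a,b;c." this returns the three parts "a,", "b;" and "c.".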
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
@"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields (plus an empty string between ")" and "^", because those two delimiters are adjacent):
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer as well...
Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777 - cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, @"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but it's not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method to do this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To keep the delimiter attached to the preceding part instead of getting it as a separate entry, try this:
string[] substrings = Regex.Split(input, @"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway).
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<string> afterSplit = new List<string>();
string toSplit;
for (hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
    if (toSplit[counter] == /* split characters */)
    {
        // Substring takes a start index and a length, not an end index
        afterSplit.Add(toSplit.Substring(hold, counter - hold));
        hold = counter;
    }
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no empty string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() call is there in case your delimiter contains characters which regex interprets as special pattern commands (like * or parentheses) and which therefore have to be escaped.
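A small sketch of why the escaping matters, using "*" as the delimiter. DelimiterSplit and SplitKeep are names made up for this example. Without Regex.Escape the pattern would be "(*)", which the regex engine rejects as invalid.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplit
{
    // Splits text on a literal delimiter string and keeps each delimiter
    // as its own entry. The capture group makes Regex.Split include the
    // delimiter in the result; Regex.Escape makes characters such as '*'
    // match literally instead of acting as pattern syntax.
    public static List<string> SplitKeep(string text, string delimiter)
    {
        return Regex.Split(text, "(" + Regex.Escape(delimiter) + ")")
                    .Where(p => p != string.Empty)
                    .ToList();
    }
}
```

DelimiterSplit.SplitKeep("2*3*4", "*") returns "2", "*", "3", "*", "4".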
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = @"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
@"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string @this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = @this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(@this.Substring(0, next));
@this = new string(@this.Skip(next).ToArray());
}
splits.Add(@this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
