How to find the next word in a sentence in C#?

I have a string
"bat and ball not pen or boat not phone"
I want to pick the word that follows each "not",
for example "not pen" and "not phone",
but I was unable to do it. I have tried to pick up the word by using the index and Substring, but I couldn't get it to work:
tempTerm = tempTerm.Trim().Substring(0, tempTerm.Length - (orterm.Length + 1)).ToString();

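How about using a Regex?
Something like:
// Requires using System; and using System.Text.RegularExpressions;
string s = "bat and ball not pen or boat not phone";
// \b stops "not" from matching inside words such as "knot"
Regex reg = new Regex(@"\bnot\s+\w+");
MatchCollection matches = reg.Matches(s);
foreach (Match match in matches)
{
    string sub = match.Value; // "not pen", then "not phone"
    Console.WriteLine(sub);
}
See Learn Regular Expression (Regex) syntax with C# and .NET for more details.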
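If you want just the word after "not" ("pen" rather than "not pen") and want to ignore case, you can put a capturing group around the word and read it back from Groups[1]. The following is a minimal sketch. The NextWordFinder class and its WordsAfter method are hypothetical names, and the sketch assumes the keyword is a plain word made of letters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextWordFinder
{
    // Returns the word that follows each whole-word occurrence of keyword, ignoring case.
    // Assumes keyword is a plain word: \b only behaves as expected next to word characters.
    public static List<string> WordsAfter(string text, string keyword)
    {
        string pattern = @"\b" + Regex.Escape(keyword) + @"\s+(\w+)";
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            result.Add(m.Groups[1].Value); // group 1 holds only the following word
        }
        return result;
    }
}
```

For the question's sentence, WordsAfter(s, "not") returns "pen" and "phone".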

You can split the sentence, and then just loop through looking for "not":
string sentence = "bat and ball not pen or boat not phone";
string[] words = sentence.Split(new char[] {' '});
List<string> wordsBesideNot = new List<string>();
for (int i = 0; i < words.Length - 1; i++)
{
if (words[i].Equals("not"))
wordsBesideNot.Add(words[i + 1]);
}
// At this point, wordsBesideNot is { "pen", "phone" }

string myStr = "bat and ball not pen or boat not phone";
List<string> someList = new List<string>();
String[] parts = myStr.Split(' ');
for (int i = 0; i < parts.Length; i++)
    if (parts[i] == "not" && i + 1 < parts.Length)
        someList.Add(parts[i + 1]);
This should get you all the words that follow "not". You could compare case-insensitively if need be.
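A case-insensitive comparison could look like the following minimal sketch. The SplitNextWord class name is hypothetical. Passing an empty separator array to Split makes it split on any whitespace. RemoveEmptyEntries discards the empty strings that repeated spaces would otherwise produce.

```csharp
using System;
using System.Collections.Generic;

public static class SplitNextWord
{
    public static List<string> WordsAfter(string text, string keyword)
    {
        // An empty separator array means "split on any whitespace"
        string[] parts = text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        // Stop one short of the end so parts[i + 1] always exists
        for (int i = 0; i + 1 < parts.Length; i++)
        {
            if (string.Equals(parts[i], keyword, StringComparison.OrdinalIgnoreCase))
            {
                result.Add(parts[i + 1]);
            }
        }
        return result;
    }
}
```

Punctuation stays attached to the words in this version: "pen." comes back as "pen.".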

You can use this regex: not\s\w+\b. It will match the desired phrases:
not pen
not phone

I'd say start by splitting your string into an array - it will make this kind of thing a whole lot easier.

In C# I would do something like this:
// Original string
string s = "bat and ball not pen or boat not phone";
// Separator
string separator = "not ";
// Length of the separator
int length = separator.Length;
// Strings are immutable, so reassigning sCopy below never changes s
string sCopy = s;
// List to store the words; you could use an array if
// you count the 'not's first.
List<string> stringList = new List<string>();
// While the separator ("not ") exists in the string
while (sCopy.IndexOf(separator) != -1)
{
    // Index of the next separator
    int index = sCopy.IndexOf(separator);
    // Remove anything before the separator and the
    // separator itself.
    sCopy = sCopy.Substring(index + length);
    // In case of multiple spaces, remove them.
    sCopy = sCopy.TrimStart(' ');
    // If more words follow, this word ends at the next space
    if (sCopy.IndexOf(' ') != -1)
    {
        // Cut the word out of sCopy and add it to the list
        string sub = sCopy.Substring(0, sCopy.IndexOf(' '));
        stringList.Add(sub);
    }
    // Otherwise the word is the rest of the string
    else
    {
        stringList.Add(sCopy);
    }
}
The words in the list are pen and phone. This will fail when you get odd characters such as full stops. If you don't know how the string is going to be constructed, you might need something more complex.

using System;
using System.Collections.Generic;
public class StringHelper
{
/// <summary>
/// Gets the surrounding words of a given word in a given text.
/// </summary>
/// <param name="text">The text in which to search for the word.</param>
/// <param name="word">The word to search for in the text.</param>
/// <param name="prev">The number of previous words to include in the result.</param>
/// <param name="next">The number of next words to include in the result.</param>
/// <param name="all">If true, returns a phrase for every occurrence of the word; otherwise only for the first.</param>
/// <returns>A list of phrases, each made up of the search word and its surrounding words.</returns>
public static List<string> GetSurroundingWords(string text, string word, int prev, int next, bool all = false)
{
var phrases = new List<string>();
var words = text.Split();
var indices = new List<int>();
var index = -1;
while ((index = Array.IndexOf(words, word, index + 1)) != -1)
{
indices.Add(index);
if (!all && indices.Count == 1)
break;
}
foreach (var ind in indices)
{
    // Clamp per occurrence without overwriting prev/next, so later matches aren't truncated
    var prevCount = Math.Min(prev, ind);
    // Only words.Length - ind - 1 words follow the match
    var nextCount = Math.Min(next, words.Length - ind - 1);
    var picked = new List<string>();
    for (var i = prevCount; i >= 1; i--)
        picked.Add(words[ind - i]);
    picked.Add(word);
    for (var i = 1; i <= nextCount; i++)
        picked.Add(words[ind + i]);
    phrases.Add(string.Join(" ", picked));
}
return phrases;
}
}
[TestClass]
public class StringHelperTests
{
private const string Text = "Date and Time in C# are handled by DateTime class in C# that provides properties and methods to format dates in different datetime formats.";
[TestMethod]
public void GetSurroundingWords()
{
// Arrange
var word = "class";
var expected = new [] { "DateTime class in C#" };
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 2);
// Assert
Assert.AreEqual(expected.Length, actual.Count);
Assert.AreEqual(expected[0], actual[0]);
}
[TestMethod]
public void GetSurroundingWords_NoMatch()
{
// Arrange
var word = "classify";
var expected = new List<string>();
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 2);
// Assert
Assert.AreEqual(expected.Count, actual.Count);
}
[TestMethod]
public void GetSurroundingWords_MoreSurroundingWordsThanAvailable()
{
// Arrange
var word = "class";
var expected = "Date and Time in C# are handled by DateTime class in C#";
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 50, 2);
// Assert
Assert.AreEqual(expected.Length, actual[0].Length);
Assert.AreEqual(expected, actual[0]);
}
[TestMethod]
public void GetSurroundingWords_ZeroSurroundingWords()
{
// Arrange
var word = "class";
var expected = "class";
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 0, 0);
// Assert
Assert.AreEqual(expected.Length, actual[0].Length);
Assert.AreEqual(expected, actual[0]);
}
[TestMethod]
public void GetSurroundingWords_AllInstancesOfSearchWord()
{
// Arrange
var word = "and";
var expected = new[] { "Date and Time", "properties and methods" };
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 1, true);
// Assert
Assert.AreEqual(expected.Length, actual.Count);
Assert.AreEqual(expected[0], actual[0]);
Assert.AreEqual(expected[1], actual[1]);
}
}

Related

List[i].Replace in for-loop won't return string

I'm trying to make Hangman (I'm still a newbie). The program chooses a random word out of a text file, and the word is turned into an array. I have to put it in a label, with the label's text modified to match what's in the letter list. The thing is, it doesn't show anything in the label and I can't figure out why.
The for-loop is the modifier: once it has modified every string in the list, it should return the word with the correct letters or "_".
At first I tried letterlist[i] = Letter or letterlist[i] = "_", but then if I typed in a correct letter, it showed only that letter.
For example: word = "pen". If I typed in "p", it resulted in "ppp".
letterlist = new List<string>();
char[] wordarray = woord.GetWordcharArray(); //word in charArrays
string newwordstring = new string(wordarray);
for (int i = 0; i < wordarray.Length; i++)
{
letterlist.Add(" "); //adds empty strings in list with the length of the word
}
/*
* For-loop for every string in List to check and modify if it's correct or not
*/
for (int i = 0; i < letterlist.Count; i++)
{
if (letterlist[i].Contains(Letter) && newwordstring.Contains(Letter)) //right answer: letter[i] = Letter
{
letterlist[i].Replace(Letter, Letter);
}
else if (letterlist[i].Contains(" ") && newwordstring.Contains(Letter)) //right answer: letter[i] = ""
{
letterlist[i].Replace(" ", Letter);
}
else if (letterlist[i].Contains("_") && newwordstring.Contains(Letter)) //right answer: letter[i] = "_"
{
letterlist[i].Replace("_", Letter);
}
else if (letterlist[i].Contains(" ") && !newwordstring.Contains(Letter)) //wrong answer: letter[i] = ""
{
letterlist[i].Replace(" ", "_");
}
else if (letterlist[i].Contains("_") && !newwordstring.Contains(Letter)) //wrong answer: letter[i] = "_"
{
letterlist[i].Replace(" ", "_");
}
}
/*
* empty += every modified letterlist[i]-string
*/
string empty = "";
foreach (string letter in letterlist)
{
empty += letter;
}
return empty;
New code, but it only shows "___" (one "_" for each letter in the word):
char[] wordarray = woord.GetWordcharArray(); //word in charArrays
string newwordstring = new string(wordarray); //actual word
string GuessedWord = new string('_', newwordstring.Length);//word that shows in form
bool GuessLetter(char letterguess)
{
bool guessedright = false;
StringBuilder builder = new StringBuilder(GuessedWord);
for(int i = 0; i < GuessedWord.Length; i++)
{
if(char.ToLower(wordarray[i]) == Convert.ToChar(Letter))
{
builder[i] = wordarray[i];
guessedright = true;
}
}
GuessedWord = builder.ToString();
return guessedright;
}
return GuessedWord;
First of all, note that C# strings are immutable, which means letterlist[i].Replace(" ", "_") does not replace spaces with underscores. It returns a new string in which spaces have been replaced with underscores.
Therefore, you should reassign this result:
letterlist[i] = letterlist[i].Replace(" ", "_");
Second, Replace(Letter, Letter) won't do much.
Third, in your first for loop, you set every item in letterlist to " ".
I don't understand then why you expect (in your second for loop) letterlist[i].Contains("_") to ever be true.
Finally, I'll leave here something you might find interesting (especially the use of StringBuilder):
using System;
using System.Text;
class Hangman
{
static void Main()
{
Hangman item = new Hangman();
item.Init();
Console.WriteLine(item.Guessed); // ____
item.GuessLetter('t'); // true
Console.WriteLine(item.Guessed); // T__t
item.GuessLetter('a'); // false
Console.WriteLine(item.Guessed); // T__t
item.GuessLetter('e'); // true
Console.WriteLine(item.Guessed); // Te_t
}
string Word {get;set;}
string Guessed {get;set;}
void Init()
{
Word = "Test";
Guessed = new string('_',Word.Length);
}
bool GuessLetter(char letter)
{
bool guessed = false;
// use a stringbuilder so you can change any character
var sb = new StringBuilder(Guessed);
// for each character of Word, we check if it is the one we claimed
for(int i=0; i<Word.Length; i++)
{
// Let's put both characters to lower case so we can compare them right
if(Char.ToLower(Word[i]) == Char.ToLower(letter)) // have we found it?
{
// Yeah! So we put it in the stringbuilder at the same place
sb[i] = Word[i];
guessed = true;
}
}
// reassign the stringbuilder's representation to Guessed
Guessed = sb.ToString();
// tell if you guessed right
return guessed;
}
}
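Another option is to rebuild the displayed word from scratch after every guess instead of patching list entries in place. This is sketched below, assuming the guesses are stored in lower case in a HashSet<char>; the HangmanMask name is hypothetical. Because every position is decided from the word itself, a correct guess can no longer turn "pen" into "ppp".

```csharp
using System.Collections.Generic;
using System.Text;

public static class HangmanMask
{
    // Shows each letter of word that has been guessed and '_' for the rest.
    // Assumes guessed holds lower-case letters.
    public static string Render(string word, ISet<char> guessed)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            sb.Append(guessed.Contains(char.ToLowerInvariant(c)) ? c : '_');
        }
        return sb.ToString();
    }
}
```

After guessed.Add('p'), Render("pen", guessed) returns "p__".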

Substring replacements according to a mapping

Given a string, I need to replace substrings according to a given mapping. The mapping determines where to start the replacement, the length of text to be replaced and the replacement string. The mapping is according to the following scheme:
public struct mapItem
{
public int offset;
public int length;
public string newString;
}
For example: given a mapping {{0,3,"frog"},{9,3,"kva"}} and a string
"dog says gav"
we replace starting at position 0 a substring of the length 3 to the "frog", i.e.
dog - > frog
and starting the position 9 a substring of the length 3 to the "kva", i.e.
gav->kva
The new string becomes:
"frog says kva"
How can I do it efficiently?
You have to make sure that the replacements take into account the shift produced by preceding replacements. Also, using a StringBuilder is more efficient, as it doesn't allocate new memory at each operation as string operations do. (Strings are immutable, which means that a completely new string is created at each string operation.) The code below uses a MapItem type with Offset, Length and NewString properties, i.e. the question's mapItem with capitalized names:
var maps = new List<MapItem> { ... };
var sb = new StringBuilder("dog says gav");
int shift = 0;
foreach (MapItem map in maps.OrderBy(m => m.Offset)) {
sb.Remove(map.Offset + shift, map.Length);
sb.Insert(map.Offset + shift, map.NewString);
shift += map.NewString.Length - map.Length;
}
string result = sb.ToString();
The OrderBy makes sure that the replacements are executed from left to right. If you know that the mappings are provided in this order, you can drop the OrderBy.
A simpler way is to start with the replacement at the right end and work backwards, so that the character shifts do not alter the positions of replacements not yet executed:
var sb = new StringBuilder("dog says gav");
foreach (MapItem map in maps.OrderByDescending(m => m.Offset)) {
sb.Remove(map.Offset, map.Length);
sb.Insert(map.Offset, map.NewString);
}
string result = sb.ToString();
In case the mappings are already ordered in ascending order, a simple reverse for-statement seems appropriate:
var sb = new StringBuilder("dog says gav");
for (int i = maps.Count - 1; i >= 0; i--) {
MapItem map = maps[i];
sb.Remove(map.Offset, map.Length);
sb.Insert(map.Offset, map.NewString);
}
string result = sb.ToString();
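For a self-contained version, here is a minimal sketch that assumes a record with Offset, Length and NewString properties (a hypothetical stand-in for the question's mapItem struct). It applies the mappings right to left and also rejects overlapping mappings, because the position-based approach has no sensible result for them. MapReplacer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for the question's mapItem struct
public record MapItem(int Offset, int Length, string NewString);

public static class MapReplacer
{
    public static string Apply(string input, IEnumerable<MapItem> maps)
    {
        var ordered = maps.OrderBy(m => m.Offset).ToList();
        // Each mapping must start at or after the end of the previous one
        for (int i = 1; i < ordered.Count; i++)
        {
            if (ordered[i].Offset < ordered[i - 1].Offset + ordered[i - 1].Length)
                throw new ArgumentException("Mappings overlap.", nameof(maps));
        }
        var sb = new StringBuilder(input);
        // Apply right to left so the offsets of pending mappings never move
        for (int i = ordered.Count - 1; i >= 0; i--)
        {
            sb.Remove(ordered[i].Offset, ordered[i].Length);
            sb.Insert(ordered[i].Offset, ordered[i].NewString);
        }
        return sb.ToString();
    }
}
```

With the question's mapping, Apply("dog says gav", new[] { new MapItem(0, 3, "frog"), new MapItem(9, 3, "kva") }) returns "frog says kva".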
You can write an extension method like this (OrderBy needs using System.Linq;):
public static class ExtensionMethod
{
public static string ReplaceSubstringByMap(this string str, List<mapItem> map)
{
int offsetShift = 0;
foreach (mapItem mapItem in map.OrderBy(x => x.offset))
{
str = str.Remove(mapItem.offset + offsetShift, mapItem.length).Insert(mapItem.offset + offsetShift, mapItem.newString);
offsetShift += mapItem.newString.Length - mapItem.length;
}
return str;
}
}
And invoke it like this:
var map = new List<mapItem>
{
new mapItem
{
offset = 0,
length = 3,
newString = "frog"
},
new mapItem
{
offset = 9,
length = 3,
newString = "kva"
}
};
string str = "dog says gav";
var result = str.ReplaceSubstringByMap(map);

Count words and spaces in string C#

I want to count words and spaces in my string. String looks like this:
Command do something ptuf(123) and bo(1).ctq[5] v:0,
I have something like this so far
int count = 0;
string mystring = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
foreach(char c in mystring)
{
if(char.IsLetter(c))
{
count++;
}
}
What should I do to count spaces also?
int countSpaces = mystring.Count(Char.IsWhiteSpace); // 6
int countWords = mystring.Split().Length; // 7
Note that both use Char.IsWhiteSpace, which treats characters other than " " (such as newline) as whitespace. Have a look at the Remarks section of the Char.IsWhiteSpace documentation to see exactly which ones.
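If you need both numbers without allocating the array that Split creates, a single pass over the characters also works. The following is a minimal sketch; WordSpaceCounter is a hypothetical name. It treats a word as a maximal run of non-whitespace characters, so leading, trailing or repeated whitespace does not inflate the word count.

```csharp
public static class WordSpaceCounter
{
    public static (int Words, int Spaces) Count(string text)
    {
        int words = 0, spaces = 0;
        bool inWord = false;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                spaces++;
                inWord = false; // the current word (if any) has ended
            }
            else if (!inWord)
            {
                words++;        // first character of a new word
                inWord = true;
            }
        }
        return (words, spaces);
    }
}
```

For the question's string this gives 7 words and 6 spaces, the same as the two one-liners above.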
You can use string.Split with a space:
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
The number of elements in the resulting string array is the number of words, and the number of spaces is the number of words - 1 (assuming single spaces between words and none at the start or end).
If you want to count spaces, you can use LINQ:
int count = mystring.Count(s => s == ' ');
This takes into account strings that start or end with a space, and double, triple, ... spaces. It assumes that the only word separators are spaces and that your string is not null.
private static int CountWords(string S)
{
    S = S.Trim();
    // An empty or all-space string has no words
    if (S.Length == 0)
        return 0;
    // Collapse every run of spaces into a single space
    while (S.Contains("  "))
        S = S.Replace("  ", " ");
    return S.Split(' ').Length;
}
Note: the while loop can also be done with a regex: How do I replace multiple spaces with a single space in C#?
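With a regex, the collapsing loop becomes a single Regex.Replace call. The following is a minimal sketch; WordCounter.CountWordsRegex is a hypothetical name. Note that \s also matches tabs and newlines. Unlike the space-only loop above, it therefore treats every kind of whitespace as a separator.

```csharp
using System.Text.RegularExpressions;

public static class WordCounter
{
    public static int CountWordsRegex(string s)
    {
        // Trim first, then replace every run of whitespace with a single space
        string collapsed = Regex.Replace(s.Trim(), @"\s+", " ");
        // An all-whitespace input trims to "", which contains no words
        return collapsed.Length == 0 ? 0 : collapsed.Split(' ').Length;
    }
}
```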
Here's a method using regex. Just something else to consider. It is better if you have long strings with lots of different types of whitespace. Similar to Microsoft Word's WordCount.
var str = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
int count = Regex.Matches(str, @"\S+").Count; // count is 7
For comparison, str.Count(char.IsWhiteSpace) counts every whitespace character. If the words were separated by runs of spaces, tabs or newlines, it could be something like 17, while the regex count would still be 7.
I've got some ready code to get a list of words in a string:
(extension methods, must be in a static class)
/// <summary>
/// Gets a list of words in the text. A word is any string sequence between two separators.
/// No word is added if separators are consecutive (would mean zero length words).
/// </summary>
public static List<string> GetWords(this string Text, char WordSeparator)
{
List<int> SeparatorIndices = Text.IndicesOf(WordSeparator.ToString(), true);
int LastIndexNext = 0;
List<string> Result = new List<string>();
foreach (int index in SeparatorIndices)
{
int WordLen = index - LastIndexNext;
if (WordLen > 0)
{
Result.Add(Text.Substring(LastIndexNext, WordLen));
}
LastIndexNext = index + 1;
}
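// The text after the last separator is a word too
if (LastIndexNext < Text.Length)
{
    Result.Add(Text.Substring(LastIndexNext));
}
return Result;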
}
/// <summary>
/// returns all indices of the occurrences of a passed string in this string.
/// </summary>
public static List<int> IndicesOf(this string Text, string ToFind, bool IgnoreCase)
{
int Index = -1;
List<int> Result = new List<int>();
string T, F;
if (IgnoreCase)
{
T = Text.ToUpperInvariant();
F = ToFind.ToUpperInvariant();
}
else
{
T = Text;
F = ToFind;
}
do
{
Index = T.IndexOf(F, Index + 1);
Result.Add(Index);
}
while (Index != -1);
Result.RemoveAt(Result.Count - 1);
return Result;
}
/// <summary>
/// Returns a new array with every string converted to upper case using the invariant culture.
/// </summary>
public static string[] ToUpperAll(this string[] Strings)
{
    string[] Result = new string[Strings.Length];
    for (int i = 0; i < Strings.Length; i++)
        Result[i] = Strings[i].ToUpperInvariant();
    return Result;
}
In addition to Tim's entry, in case you have padding on either side, or multiple spaces beside each other:
Int32 words = somestring.Split( // your string
new[]{ ' ' }, // break apart by spaces
StringSplitOptions.RemoveEmptyEntries // remove empties (double spaces)
).Length; // number of "words" remaining
using System;
namespace Application;
class classname
{
static void Main(string[] args)
{
int count;
string name = "I am the student";
count = name.Split(' ').Length;
Console.WriteLine("The count is " +count);
Console.ReadLine();
}
}
If you only need the number of spaces, try this (it assumes the words are separated by single spaces):
string myString="I Love Programming";
var strArray=myString.Split(new char[] { ' ' });
int countSpace=strArray.Length-1;
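How about indirectly? Note that this counts every character that is not a letter (digits and punctuation included), so it only gives the number of spaces when the string contains nothing but letters and spaces:
string str = "I Love Programming";
int countl = 0, countt = 0, count = 0;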
foreach(char c in str)
{
countt++;
if (char.IsLetter(c))
{
countl++;
}
}
count = countt - countl;
Console.WriteLine("No. of spaces are: "+count);

How can I search through a string in C# and replace areas bounded by a pattern?

We have tried a few solutions that use XML parsers. They all fail because the strings are not always 100% valid XML. Here's our problem.
We have strings that look like this:
var a = "this is a testxxx of my data yxxx and of these xxx parts yxxx";
var b = "hello testxxx world yxxx ";
and we want to turn them into:
"this is a testxxx3yxxx and of these xxx1yxxx";
"hello testxxx1yxxx ";
The key here is that we want to do something to the data between xxx and yxxx. In the example above I would need a function that counts words and replaces the strings with a word count.
Is there a way we can process the string a and apply a function to change the data that's between the xxx and yxxx? Any function right now as we're just trying to get an idea of how to code this.
You can use Split method:
var parts = a.Split(new[] {"xxx", "yxxx"}, StringSplitOptions.None)
.Select((s, index) =>
{
string s1 = index%2 == 1 ? string.Format("{0}{2}{1}", "xxx", "yxxx", s + "1") : s;
return s1;
});
var result = string.Join("", parts);
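If it's always going to be xxx and yxxx, you can use a regex as suggested:
var stringBuilder = new StringBuilder();
Regex regex = new Regex("xxx(.*?)yxxx");
int last = 0; // end of the previous match
foreach (Match match in regex.Matches(a))
{
    // copy the text between the previous match and this one unchanged
    stringBuilder.Append(a, last, match.Index - last);
    var value = match.Groups[1].Value;
    // do something to value and then append it to the string builder
    stringBuilder.Append(string.Format("{0}{1}{2}", "xxx", value, "yxxx"));
    last = match.Index + match.Length;
}
// copy whatever follows the last match
stringBuilder.Append(a, last, a.Length - last);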
I suppose this is as basic as it gets.
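Using Regex.Replace will replace all the matches with your choice of text. The lazy .+? stops each match at the nearest yxxx; a greedy .+ would swallow everything from the first xxx to the last yxxx. Something like this:
Regex rgx = new Regex("xxx.+?yxxx");
string cleaned = rgx.Replace(a, "replacementtext");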
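Regex.Replace also has an overload that takes a MatchEvaluator, which computes each replacement from the match. That is enough for what the question asks: replacing each delimited section with its word count. The following is a minimal sketch; DelimitedWordCount is a hypothetical name. It assumes sections never span lines, because . does not match \n unless RegexOptions.Singleline is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimitedWordCount
{
    // Replaces the text between each "xxx" ... "yxxx" pair with its word count.
    public static string Apply(string input)
    {
        return Regex.Replace(input, "xxx(.*?)yxxx", m =>
        {
            string inner = m.Groups[1].Value;
            // An empty separator array means "split on any whitespace"
            int words = inner.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries).Length;
            return "xxx" + words + "yxxx";
        });
    }
}
```

Swapping the body of the lambda for any other string-to-string function gives the "apply a function to the data between xxx and yxxx" behaviour the question describes.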
This code will process each of the parts delimited by "xxx". It preserves the "xxx" separators. If you do not want to preserve the "xxx" separators, remove the two lines that say "result.Append(separator);".
Given:
"this is a testxxx of my data yxxx and there are many of these xxx parts yxxx"
It prints:
"this is a testxxx>> of my data y<<xxx and there are many of these xxx>> parts y<<xxx"
I'm assuming that's the kind of thing you want. Add your own processing to "processPart()".
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
string separator = "xxx";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator, start + separator.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator.Length;
result.Append(separator);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator);
index = end + separator.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return ">>" + part + "<<";
}
}
}
[EDIT] Here's the code amended to work with two different separators:
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a test<pre> of my data y</pre> and there are many of these <pre> parts y</pre>";
string separator1 = "<pre>";
string separator2 = "</pre>";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator1, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator2, start + separator1.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator1.Length;
result.Append(separator1);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator2);
index = end + separator2.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return "|" + part + "|";
}
}
}
The IndexOf() method returns the index of the first occurrence of a given substring. I would suggest doing something like this:
var searchme = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
var startindex = searchme.IndexOf("xxx");
// Look for the end marker after the start marker; adding its length gives the index just past the last 'x'
var endindex = searchme.IndexOf("yxxx", startindex + 3) + "yxxx".Length;
var stringpiece = searchme.Substring(startindex, endindex - startindex); // "xxx of my data yxxx"
You can repeat that, starting each search at the previous endindex, while startindex != -1. Also check that the "yxxx" search did not return -1 before using its result.
Here is a little sample program that counts characters instead of words; to count words, you should only need to change the processor function.
var a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
a = ProcessString(a, CountChars);
string CountChars(string a)
{
return a.Length.ToString();
}
string ProcessString(string a, Func<string, string> processor)
{
int idx_start, idx_end = -4;
while ((idx_start = a.IndexOf("xxx", idx_end + 4)) >= 0)
{
idx_end = a.IndexOf("yxxx", idx_start + 3);
if (idx_end < 0)
break;
var string_in_between = a.Substring(idx_start + 3, idx_end - idx_start - 3);
var newString = processor(string_in_between);
a = a.Substring(0, idx_start + 3) + newString + a.Substring(idx_end, a.Length - idx_end);
idx_end -= string_in_between.Length - newString.Length;
}
return a;
}
I would use Regex Groups:
Here my solution to get the parts in the string:
private static IEnumerable<string> GetParts( string searchFor, string begin, string end ) {
string exp = string.Format("({0}(?<searchedPart>.+?){1})+", begin, end);
Regex regex = new Regex(exp);
MatchCollection matchCollection = regex.Matches(searchFor);
foreach (Match match in matchCollection) {
Group @group = match.Groups["searchedPart"];
yield return @group.ToString();
}
}
you can use it like to get the parts:
string a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
IEnumerable<string> parts = GetParts(a, "xxx", "yxxx");
To replace the parts in the original string, you can use each group's start position and length (@group.Index, @group.Length).
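Here is a minimal sketch of that idea; GroupSpanReplacer and its transform parameter are hypothetical names. It walks the matches from last to first, so replacing one group's text never shifts the Index of a group that has not been processed yet. It also wraps the delimiters in Regex.Escape in case they contain regex metacharacters.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GroupSpanReplacer
{
    public static string ReplaceParts(string input, string begin, string end, Func<string, string> transform)
    {
        var regex = new Regex(Regex.Escape(begin) + "(?<searchedPart>.+?)" + Regex.Escape(end));
        MatchCollection matches = regex.Matches(input);
        var sb = new StringBuilder(input);
        // Last match first, so earlier Index values stay valid after each edit
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Group g = matches[i].Groups["searchedPart"];
            sb.Remove(g.Index, g.Length);
            sb.Insert(g.Index, transform(g.Value));
        }
        return sb.ToString();
    }
}
```

For example, ReplaceParts(b, "xxx", "yxxx", s => s.Trim().ToUpperInvariant()) turns "hello testxxx world yxxx " into "hello testxxxWORLDyxxx ".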

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split characters were ',', '.' and ';', I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");
(?<=PATTERN) is a positive look-behind for PATTERN. It matches at every position where the preceding text fits PATTERN, so there is a match (and a split) after each occurrence of any of the characters.
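To see what the look-behind split produces, here is a small sketch wrapped in a hypothetical helper. The split point sits just after each delimiter, so every delimiter stays attached to the piece before it. As documented for Regex.Split, a delimiter at the very end of the input yields an empty final element.

```csharp
using System.Text.RegularExpressions;

public static class LookBehindSplit
{
    // Splits at the zero-width position just after each '.', ',' or ';'
    public static string[] SplitAfterPunctuation(string text) =>
        Regex.Split(text, @"(?<=[.,;])");
}
```

SplitAfterPunctuation("a,b.c;d") returns "a,", "b.", "c;" and "d".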
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you want to split a mathematical formula, you can use the following regex:
@"([*()\^\/]|(?<!E)[\+\-])"
This ensures that constants like 1E-02 are not split into 1E, - and 02.
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields the following (plus an empty string between ) and ^, because those two delimiters are adjacent):
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer as well...
Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)"), where yourmatch is whatever your separator is.
Supposing the original string was
777 - cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, @"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
var text = "[a link|http://www.google.com]";
var result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but it's not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method to do this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To keep each delimiter attached to the preceding piece instead of getting it as a separate element, try this:
string[] substrings = Regex.Split(input, @"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway).
When you find a splitter, then spin off a substring.
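In C#:
string toSplit = "a,b;c";
char[] splitChars = { ',', ';' }; // the split characters
List<string> afterSplit = new List<string>();
int hold = 0; // start of the current piece
for (int counter = 0; counter < toSplit.Length; counter++)
{
    if (Array.IndexOf(splitChars, toSplit[counter]) >= 0)
    {
        afterSplit.Add(toSplit.Substring(hold, counter - hold)); // text before the splitter
        afterSplit.Add(toSplit[counter].ToString());             // the splitter itself
        hold = counter + 1;
    }
}
afterSplit.Add(toSplit.Substring(hold)); // text after the last splitter
// afterSplit is now "a", ",", "b", ";", "c"
That will do what you're asking. Adjacent splitters produce empty strings between them, which you can filter out if you don't want them.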
veggerby's answer, modified to:
have no empty string items in the list;
have a fixed string as the delimiter, like "ab", instead of a single character.
var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there in case your delimiter contains characters that regex interprets as special pattern syntax (like * or parentheses), which therefore have to be escaped.
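The same idea extends to several string delimiters at once: escape each one and join them with |. The following is a minimal sketch; the MultiDelimiterSplit class and its SplitAndKeep method are hypothetical names. It sorts longer delimiters first so that, for example, "ab" wins over "a" at the same position. It assumes at least one non-empty delimiter is passed.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiDelimiterSplit
{
    public static string[] SplitAndKeep(string text, params string[] delimiters)
    {
        // Longest delimiters first, each escaped so metacharacters are taken literally
        var alternatives = delimiters.OrderByDescending(d => d.Length).Select(Regex.Escape);
        // The capturing group makes Regex.Split include each delimiter in the output
        string pattern = "(" + string.Join("|", alternatives) + ")";
        return Regex.Split(text, pattern).Where(p => p.Length > 0).ToArray();
    }
}
```

SplitAndKeep("1*2+3", "*", "+") returns "1", "*", "2", "+", "3".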
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = @"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do this with a multiline string but needed to keep the line breaks, so I did this:
string x =
@"line 1 {0}
line 2 {1}
";
// Assumes the source file uses CRLF line endings, so every line but the last ends in '\r'
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(l => l.Contains('\r') ? l + '\n' : l)
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across the same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string @this, char[] delimiters, int count = int.MaxValue)
{
    var splits = new List<string>();
    int next;
    // Search from index 1 so the delimiter at the start of the remainder is not found again
    while (splits.Count + 1 < count && @this.Length > 1 && (next = @this.IndexOfAny(delimiters, 1)) > 0)
    {
        splits.Add(@this.Substring(0, next)); // piece before the delimiter
        @this = @this.Substring(next);        // the remainder keeps the delimiter at its start
    }
    splits.Add(@this);
    return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
