Using String Split - c#

I have a text
Category2,"Something with ,comma"
when I split this by ',' it should give me two string
Category2
"Something with ,comma"
but in actual it split string from every comma.
how can I achieve my expected result.
Thanx

Just call variable.Split(new char[] { ',' }, 2). Complete documentation in MSDN.

There are a number of things that you could be wanting to do here so I will address a few:
Split on the first comma
String text = text.Split(new char[] { ',' }, 2);
Split on every comma
String text = text.Split(new char[] {','});
Split on a comma not in "
var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
Last one taken from C# Regex Split

Specify the maximum number of strings you want in the array:
string[] parts = text.Split(new char[] { ',' }, 2);

String.Split works at the simplest, fastest level - so it splits the text on all of the delimiters you pass into it, and it has no concept of special rules like double-quotes.
If you need a CSV parser which understands double-quotes, then you can write your own or there are some excellent open source parsers available - e.g. http://www.codeproject.com/KB/database/CsvReader.aspx - this is one I've used in several projects and recommend.

Try this:
public static class StringExtensions
{
public static IEnumerable<string> SplitToSubstrings(this string str)
{
int startIndex = 0;
bool isInQuotes = false;
for (int index = 0; index < str.Length; index++ )
{
if (str[index] == '\"')
isInQuotes = !isInQuotes;
bool isStartOfNewSubstring = (!isInQuotes && str[index] == ',');
if (isStartOfNewSubstring)
{
yield return str.Substring(startIndex, index - startIndex).Trim();
startIndex = index + 1;
}
}
yield return str.Substring(startIndex).Trim();
}
}
Usage is pretty simple:
foreach(var str in text.SplitToSubstrings())
Console.WriteLine(str);

Related

Separating string on each character and putting it in an array

I'm pulling a string from a MySQL database containing all ID's from friends in this format:
5+6+12+33+1+9+
Now, whenever i have to add a friend it's simple, I just take the the string and add whatever ID and a "+". My problem lies with separating all the ID's and putting it in an array. I'm currently using this method
string InputString = "5+6+12+33+1+9+";
string CurrentID = string.Empty;
List<string> AllIDs = new List<string>();
for (int i = 0; i < InputString.Length; i++)
{
if (InputString.Substring(i,1) != "+")
{
CurrentID += InputString.Substring(i, 1);
}
else
{
AllIDs.Add(CurrentID);
CurrentID = string.Empty;
}
}
string[] SeparatedIDs = AllIDs.ToArray();
Even though this does work it just seems overly complicated for what i'm trying to do.
Is there an easier way to split strings or cleaner ?
Try this:-
var result = InputString.Split(new char[] { '+' });
You can use other overloads of Split
as well if you want to remove empty spaces.
You should to use the Split method with StringSplitOptions.RemoveEmptyEntries. RemoveEmptyEntries helps you to avoid empty string as the last array element.
Example:
char[] delimeters = new[] { '+' };
string InputString = "5+6+12+33+1+9+";
string[] SeparatedIDs = InputString.Split(delimeters,
StringSplitOptions.RemoveEmptyEntries);
foreach (string SeparatedID in SeparatedIDs)
Console.WriteLine(SeparatedID);
string[] IDs = InputString.Split('+'); will split your InputString into an array of strings using the + character
var result = InputString.Split('+', StringSplitOptions.RemoveEmptyEntries);
extra options needed since there is a trailing +

Get first 250 words of a string?

How do I get the first 250 words of a string?
You need to split the string. You can use the overload without parameter(whitespaces are assumed).
IEnumerable<string> words = str.Split().Take(250);
Note that you need to add using System.Linq for Enumerable.Take.
You can use ToList() or ToArray() ro create a new collection from the query or save memory and enumerate it directly:
foreach(string word in words)
Console.WriteLine(word);
Update
Since it seems to be quite popular I'm adding following extension which is more efficient than the Enumerable.Take approach and also returns a collection instead of the (deferred executed) query.
It uses String.Split where white-space characters are assumed to be the delimiters if the separator parameter is null or contains no characters. But the method also allows to pass different delimiters:
public static string[] GetWords(
this string input,
int count = -1,
string[] wordDelimiter = null,
StringSplitOptions options = StringSplitOptions.None)
{
if (string.IsNullOrEmpty(input)) return new string[] { };
if(count < 0)
return input.Split(wordDelimiter, options);
string[] words = input.Split(wordDelimiter, count + 1, options);
if (words.Length <= count)
return words; // not so many words found
// remove last "word" since that contains the rest of the string
Array.Resize(ref words, words.Length - 1);
return words;
}
It can be used easily:
string str = "A B C D E F";
string[] words = str.GetWords(5, null, StringSplitOptions.RemoveEmptyEntries); // A,B,C,D,E
yourString.Split(' ').Take(250);
I guess. You should provide more info.
String.Join(" ", yourstring.Split().Take(250).ToArray())
Addition to Tim answer, this is what you can try
IEnumerable<string> words = str.Split().Take(250);
StringBuilder firstwords = new StringBuilder();
foreach(string s in words)
{
firstwords.Append(s + " ");
}
firstwords.Append("...");
Console.WriteLine(firstwords.ToString());
Try this one:
public string TakeWords(string str,int wordCount)
{
char lastChar='\0';
int spaceFound=0;
var strLen= str.Length;
int i=0;
for(; i<strLen; i++)
{
if(str[i]==' ' && lastChar!=' ')
{
spaceFound++;
}
lastChar=str[i];
if(spaceFound==wordCount)
break;
}
return str.Substring(0,i);
}
It's possible without calling Take().
string[] separatedWords = stringToProcess.Split(new char[] {' '}, 250, StringSplitOptions.RemoveEmptyEntries);
Which also allows splitting based on more than just space " " and therefore fixes occurrences when spaces are incorrectly missing in input.
string[] separatedWords = stringToProcess.Split(new char[] {' ', '.', ',' ':', ';'}, 250, StringSplitOptions.RemoveEmptyEntries);
Use StringSplitOptions.None, if you want empty strings to be returned instead.

Count how many words in each sentence

I'm stuck on how to count how many words are in each sentence, an example of this is: string sentence = "hello how are you. I am good. that's good."
and have it come out like:
//sentence1: 4 words
//sentence2: 3 words
//sentence3: 2 words
I can get the number of sentences
public int GetNoOfWords(string s)
{
return s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
label2.Text = (GetNoOfWords(sentance).ToString());
and i can get the number of words in the whole string
public int CountWord (string text)
{
int count = 0;
for (int i = 0; i < text.Length; i++)
{
if (text[i] != ' ')
{
if ((i + 1) == text.Length)
{
count++;
}
else
{
if(text[i + 1] == ' ')
{
count++;
}
}
}
}
return count;
}
then button1
int words = CountWord(sentance);
label4.Text = (words.ToString());
But I can't count how many words are in each sentence.
Instead of looping over the string as you do in CountWords I would just use;
int words = s.Split(' ').Length;
It's much more clean and simple. You split on white spaces which returns an array of all the words, the length of that array is the number of words in the string.
Why not use Split instead?
var sentences = "hello how are you. I am good. that's good.";
foreach (var sentence in sentences.TrimEnd('.').Split('.'))
Console.WriteLine(sentence.Trim().Split(' ').Count());
If you want number of words in each sentence, you need to
string s = "This is a sentence. Also this counts. This one is also a thing.";
string[] sentences = s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string sentence in sentences)
{
Console.WriteLine(sentence.Split(' ').Length + " words in sentence *" + sentence + "*");
}
Use CountWord on each element of the array returned by s.Split:
string sentence = "hello how are you. I am good. that's good.";
string[] words = sentence.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries).Length;
for (string sentence in sentences)
{
int noOfWordsInSentence = CountWord(sentence);
}
string text = "hello how are you. I am good. that's good.";
string[] sentences = s.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
IEnumerable<int> wordsPerSentence = sentences.Select(s => s.Trim().Split(' ').Length);
As noted in several answers here, look at String functions like Split, Trim, Replace, etc to get you going. All answers here will solve your simple example, but here are some sentences which they may fail to analyse correctly;
"Hello, how are you?" (no '.' to parse on)
"That apple costs $1.50." (a '.' used as a decimal)
"I like whitespace . "
"Word"
If you only need a count, I'd avoid Split() -- it takes up unnecessary space. Perhaps:
static int WordCount(string s)
{
int wordCount = 0;
for(int i = 0; i < s.Length - 1; i++)
if (Char.IsWhiteSpace(s[i]) && !Char.IsWhiteSpace(s[i + 1]) && i > 0)
wordCount++;
return ++wordCount;
}
public static void Main()
{
Console.WriteLine(WordCount(" H elloWor ld g ")); // prints "4"
}
It counts based on the number of spaces (1 space = 2 words). Consecutive spaces are ignored.
Does your spelling of sentence in:
int words = CountWord(sentance);
have anything to do with it?

How to delete words between specified chars?

I want to create a method that reads from every line from a file. Next, it has to check between the pipes and determine if there are words that are more than three characters long, and are only numbers. In the file are strings organized like this:
What's going on {noway|that's cool|1293328|why|don't know|see}
With this sentence, the software should remove 1293328.
The resulting sentence would be:
What's going on {noway|that's cool|don't know}
Until now I am reading every line from the file and I made the functions that determine if the words between | | have to be deleted or not (checking a string like noway,that's cool, etc)
I don't know how to get the strings between the pipes.
You can split a string by a character using the Split method.
string YourStringVariable = "{noway|that's cool|1293328|why|don't know|see}";
YourStringVariable.Split('|'); //Returns an array of the strings between the brackets
What's about:
string RemoveValues(string sentence, string[] values){
foreach(string s in values){
while(sentence.IndexOf("|" + s) != -1 && sentence.IndexOf("|" + s) != 0){
sentence = sentence.Remove(sentence.IndexOf("|" + s), s.Lenght + 1);
}
}
return sentence;
}
In your case:
string[] values = new string[3]{ "1293328", "why", "see" };
string sentence = RemoveValues("noway|that's cool|1293328|why|don't know|see", values);
//result: noway|that's cool|don't know
string YourStringVariable = "{noway|that's cool|1293328|why|don't know|see}";
string[] SplitValue=g.Split('|');
string FinalValue = string.Empty;
for (int i = 0; i < SplitValue.Length; i++)
{
if (!SplitValue[i].ToString().Any(char.IsDigit))
{
FinalValue += SplitValue[i]+"|";
}
}

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

Categories

Resources