I want to split a string consisting of one or more two-letter codes separated by commas into two-letter substrings and put them in a string array or other suitable data structure. The result will eventually be data-bound to a combo box, so this needs to be taken into account.
The string I want to manipulate can be empty, consist of a single two-letter code, or be made up of multiple two-letter codes separated by commas (and possibly a space).
I was thinking of using a simple string array but I'm not sure if this is the best way to go.
So... what data structure would you recommend that I use and how would you implement it?
Definitely at least start with a string array, because it's the return type of string.Split():
string MyCodes = "AB,BC,CD";
char[] delimiters = new char[] {',', ' '};
string[] codes = MyCodes.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
Update: added space to the delimiters. That will have the effect of trimming spaces from your result strings.
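Here is a quick check of how the delimiter array and RemoveEmptyEntries handle the three input shapes from the question: empty, a single code, and several codes with optional spaces. This is a sketch only. CodeSplitter and SplitCodes are hypothetical names, and treating null like an empty string is an assumption.

```csharp
using System;

// Hypothetical helper class, for illustration only
static class CodeSplitter
{
    private static readonly char[] Delimiters = { ',', ' ' };

    // Splits "", "AB", "AB,BC" or "AB, BC, CD" into two-letter codes
    public static string[] SplitCodes(string codes)
    {
        // Split would throw on null, so treat null like an empty string (assumption)
        if (codes == null)
            return new string[0];

        // RemoveEmptyEntries drops the "" produced between ',' and ' ' in "AB, BC",
        // and also turns an empty input into an empty array
        return codes.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }
}
```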
Would something like this work?
var list = theString.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).ToList();
My answer is "right", but I suggest Joel Coehoorn's answer.
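public static string[] SplitItems(string inp)
{
    // Return an empty array for null or empty input (Split would throw on null)
    if (string.IsNullOrEmpty(inp))
        return new string[0];
    // Split on commas and spaces; RemoveEmptyEntries drops the "" left between ',' and ' '
    return inp.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}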
If you are simply going to bind to the structure then a String[] ought to be fine - if you need to work with the data before you use it as a data source then a List<String> is probably a better choice.
Here is an example:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
String s = "ab,cd,ef";
// either a String[]
String[] array = s.Split(new Char[] {','});
// or a List<String>
List<String> list = new List<String>(s.Split(new Char[] { ',' }));
}
}
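As a rough illustration of the "work with the data first" case, this sketch builds a List<string>, skips duplicates and sorts it before it would be used as a data source. The de-duplication and sorting are assumed examples of preprocessing, and CodeListBuilder and BuildCodeList are hypothetical names. Both string[] and List<string> implement IList, which is what a WinForms ComboBox.DataSource accepts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only
static class CodeListBuilder
{
    // Builds a sorted, duplicate-free list of codes ready to use as a data source
    public static List<string> BuildCodeList(string s)
    {
        var list = new List<string>();
        if (s == null)
            return list;

        foreach (string code in s.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Unlike an array, a List<string> can grow, so we can filter while adding
            if (!list.Contains(code))
                list.Add(code);
        }

        // Ordinal sort so the order does not depend on the current culture
        list.Sort(StringComparer.Ordinal);
        return list;
    }
}
```

In a form you would then assign it with something like comboBox1.DataSource = CodeListBuilder.BuildCodeList(codes); where comboBox1 and codes are placeholder names.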
Related
Consider a number of strings, which are assumed to contain "keys" of the form "Wxxx", where x are digits from 0-9. Each one can contain either one only, or multiple ones, separated by ',' followed by two spaces. For example:
W123
W432
W546, W234, W167
The ones that contain multiple "keys" need to be split up, into an array. So, the last one in the above examples should be split into an array like this: {"W546", "W234", "W167"}.
As a quick solution, String.Split comes to mind, but as far as I am aware, it can only take one character, like ','. The problem is that it would return an array like this: {"W546", " W234", " W167"}. The two spaces at the start of every entry from the second one onwards can probably be removed using Substring, but is there a better solution?
For context, these values are being held in a spreadsheet, and are assumed to have undergone data validation to ensure the "keys" are separated by a comma followed by two spaces.
while ((ws.Cells[row,1].Value != null) && !ws.Cells[row,1].Value.ToString().Equals(""))
{
// there can be one key, or multiple keys separated by ','
if (ws.Cells[row,keysCol].Value.ToString().Contains(','))
{
// there are multiple
// need to split the ones in this cell separated by a comma
}
else
{
// there is one
}
row++;
}
You can just specify ',' and ' ' as separators and RemoveEmptyEntries.
Using your sample of single keys and a string containing multiple keys you can just handle them all the same and get your list of individual keys:
List<string> cells = new List<string>() { "W123", "W432", "W546, W234, W167" };
List<string> keys = new List<string>();
foreach (string cell in cells)
{
keys.AddRange(cell.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
Split can handle strings where there's nothing to split, and AddRange will accept your single keys as well as the multi-key split results.
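On the question's idea of removing the leading spaces with Substring: trimming each piece after a plain comma split does the same job and works whether one or two spaces follow the comma. A sketch, assuming the keys never contain commas themselves; KeySplitter and SplitKeys are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only
static class KeySplitter
{
    // Splits "W546,  W234,  W167" (or a single "W123") into individual keys
    public static List<string> SplitKeys(string cell)
    {
        return cell
            .Split(',')
            .Select(key => key.Trim())     // removes the one or two spaces after each comma
            .Where(key => key.Length > 0)  // ignores stray separators such as a trailing ","
            .ToList();
    }
}
```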
You could use an old favorite: regular expressions.
Here are two flavors, 'Loop' and 'LINQ'.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var list = new List<string>{"W848","W998, W748","W953, W9484, W7373","W888"};
        Console.WriteLine("LINQ");
        list.ForEach(l => TestSplitRegexLinq(l));
        Console.WriteLine();
        Console.WriteLine("Loop");
        list.ForEach(l => TestSplitRegexLoop(l));
    }

    private static void TestSplitRegexLinq(string s)
    {
        string pattern = @"[W][0-9]*";
        var reg = new Regex(pattern);
        // Cast<Match>() is needed because MatchCollection is not generic on older frameworks
        reg.Matches(s).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Value));
    }

    private static void TestSplitRegexLoop(string s)
    {
        string pattern = @"[W][0-9]*";
        var reg = new Regex(pattern);
        foreach (Match m in reg.Matches(s))
        {
            Console.WriteLine(m.Value);
        }
    }
}
Just replace the Console.WriteLine calls with whatever you need, e.g. myList.Add(m.Value).
You will need to add the namespace: using System.Text.RegularExpressions;
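Note that [W][0-9]* also matches a lone "W" with no digits after it. If every key has at least one digit, W[0-9]+ is stricter; the sketch below also collects the matches into a List<string> instead of printing them. The \b word boundaries are an added assumption that keys are standalone words, and KeyExtractor and ExtractKeys are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class KeyExtractor
{
    // \b...\b keeps "W12" from being found inside a longer token such as "XW12"
    private static readonly Regex KeyPattern = new Regex(@"\bW[0-9]+\b");

    public static List<string> ExtractKeys(string s)
    {
        return KeyPattern.Matches(s)
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToList();
    }
}
```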
Eliminate the extra space first (using Replace()), then use split.
var input = "W546, W234, W167";
var normalized = input.Replace(", ",",");
var array = normalized.Split(',');
This way, you treat a comma followed by a space exactly the same as you'd treat a comma. If there might be two spaces you can also replace that:
var input = "W546,  W234,  W167";
var normalized = input.Replace("  ", " ").Replace(", ", ",");
var array = normalized.Split(',');
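If the number of spaces after each comma can vary, Regex.Replace can collapse any whitespace around a comma in a single step before splitting. A minimal sketch, assuming the keys contain no commas; KeyNormalizer and SplitNormalized are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class KeyNormalizer
{
    public static string[] SplitNormalized(string input)
    {
        // Turn ", ", ",  ", " ,\t" and similar into a bare ","
        string normalized = Regex.Replace(input.Trim(), @"\s*,\s*", ",");
        return normalized.Split(',');
    }
}
```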
After trying this in .NET fiddle, I think I may have a solution:
// if there are multiple
string keys = ws.Cells[row,keysCol].Value.ToString();
// remove spaces
string keys_normalised = keys.Replace(" ", string.Empty);
Console.WriteLine("Checking that spaces have been removed: " + keys_normalised + "\n");
string[] splits = keys_normalised.Split(',');
for (int i = 0; i < splits.Length; i++)
{
Console.WriteLine(splits[i]);
}
This produces the following output in the console:
Checking that spaces have been removed: W456,W234,W167
W456
W234
W167
When I program in C# using the Split function:
string[] singleStr=str.Split(';');
where str is "column111.dwg&186&0;".
Why is singleStr.Length 2? Why does it return a second element when there is nothing after the semicolon?
It sounds like you need to use StringSplitOptions.RemoveEmptyEntries. From the documentation:
The return value does not include array elements that contain an empty string
string str = "column111.dwg&186&0;";
string[] singleStr = str.Split(new char[] {';'}, StringSplitOptions.RemoveEmptyEntries);
foreach (var item in singleStr)
{
Console.WriteLine(item);
}
The output will be only:
column111.dwg&186&0
If we don't use StringSplitOptions.RemoveEmptyEntries in this case, the singleStr array has 2 items: column111.dwg&186&0 and "".
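To see this directly, the sketch below wraps both calls for the string from the question; the extra element returned by the plain Split is string.Empty, not null. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical demo class, for illustration only
static class TrailingSeparatorDemo
{
    // "column111.dwg&186&0;" -> { "column111.dwg&186&0", "" }
    public static string[] SplitDefault(string s) => s.Split(';');

    // "column111.dwg&186&0;" -> { "column111.dwg&186&0" }
    public static string[] SplitNoEmpty(string s) =>
        s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
}
```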
If you don't want to have empty entries, use this construction:
string[] singleStr=str.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);
Basically, it's correct to have two items in your array as you do have a semicolon in your string.
One of the overloads of the Split method allows you to set the StringSplitOptions to RemoveEmptyEntries.
E.g.:
var parts = yourString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
Use this:
string[] singleStr=str.Split(new[]{';'}, StringSplitOptions.RemoveEmptyEntries);
It will remove empty strings from the result. Note that it does not remove entries that contain only whitespace.
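Because RemoveEmptyEntries only drops zero-length strings, an input such as "a; ;b" still yields a " " element. On .NET 5 and later, StringSplitOptions.TrimEntries can be combined with RemoveEmptyEntries to drop those too; the sketch below trims and filters manually instead, which also works on older frameworks. The names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, for illustration only
static class SplitWhitespaceDemo
{
    // Keeps whitespace-only pieces: "a; ;b" -> "a", " ", "b"
    public static string[] RemoveEmptyOnly(string s) =>
        s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

    // Trims each piece, then drops the ones that became empty: "a; ;b" -> "a", "b"
    public static string[] RemoveBlank(string s) =>
        s.Split(';')
         .Select(part => part.Trim())
         .Where(part => part.Length > 0)
         .ToArray();
}
```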
Answer to your question:
The array length is 2 because Split sees column111.dwg&186&0; and splits it at the ';', which gives column111.dwg&186&0 and, after the ';', an empty string (not null).
If you wish to ignore empty entries, try using
String.Split Method (Char[], StringSplitOptions)
Returns a string array that contains the substrings in this string
that are delimited by elements of a specified Unicode character array.
A parameter specifies whether to return empty array elements.
StringSplitOptions Enumeration
RemoveEmptyEntries:
The return value does not include array elements that contain an empty string
This is because it splits the string everywhere it finds a ";" in the string and that results in an empty entry because your ";" is at the end of the string.
You can use the following call to remove empty entries:
str.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
I am trying to split a string into a string[] made of the words the string originally held, using the following code.
private string[] ConvertWordsFromFile(String NewFileText)
{
char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<' , '>','/','@','#','$','%','^','&','*','"','(',')',';'};
string[] words = NewFileText.Split(delimiterChars);
return words;
}
I am then using this to add the words to a dictionary that keeps track of words (keys) and their frequencies (values). Duplicated words are not added as new keys; only their value is incremented. However, the last word is counted as a different word and is therefore made into a new key. How can I fix this?
This is the code I have for adding words to the dictionary :
public void AddWord(String newWord)
{
newWord = newWord.ToLower();
try
{
MyWords.Add(newWord, 1);
}
catch (ArgumentException)
{
MyWords[newWord]++;
}
}
To clarify, the problem I am having is that even if the word at the end of a string is a duplicate, it is still treated as a new word and therefore becomes a new key.
Random guess: a space at the end produces an empty word that you don't expect. If so, use the correct option for Split:
var words = NewFileText.Split(delimiterChars,
StringSplitOptions.RemoveEmptyEntries);
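Here is a sketch of the whole word count under one extra assumption: if the file text ends with a line break and '\r' and '\n' are not delimiters, the break stays attached to the last word, which would also make that word look new. This version therefore adds '\r', '\n' and '\t' to the question's delimiters, keeps RemoveEmptyEntries, and uses TryGetValue instead of catching ArgumentException. WordCounter and CountWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only
static class WordCounter
{
    // The question's delimiters (duplicates removed) plus line breaks and tabs (assumption)
    private static readonly char[] Delimiters =
    {
        ' ', ',', '.', ':', '/', '|', '<', '>', '#', '$', '%', '^', '&', '*',
        '"', '(', ')', ';', '\r', '\n', '\t'
    };

    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>();
        foreach (string raw in text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = raw.ToLowerInvariant();
            int current;
            // TryGetValue leaves current at 0 for a new word, so no exception is needed
            counts.TryGetValue(word, out current);
            counts[word] = current + 1;
        }
        return counts;
    }
}
```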
Split is not the best choice for what you want to do, because you end up having this kind of problem and you also have to specify all the delimiters, etc.
A much better option is to use a regular expression instead of your ConvertWordsFromFile method, as follows:
Regex.Split(theTextToBeSplitted, @"\W+")
This line returns an array containing all the 'words'. Once you have that, the next step is to create your dictionary. If you can use LINQ in your code, the easiest and cleanest way to do what you want is this:
var theTextToBeSplitted = "#Hi, this is a 'little' test: <I hope it is useful>";
var myDictionary = Regex.Split(theTextToBeSplitted, @"\W+")
    // the leading '#' and trailing '>' would otherwise produce "" entries
    .Where(x => x.Length > 0)
    .GroupBy(x => x)
    .ToDictionary(x => x.Key, x => x.Count());
That's all you need.
Good luck!
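A variation, since the question lowercases every word: GroupBy and ToDictionary both accept an IEqualityComparer, so counting can be made case-insensitive without calling ToLower. The Where filter drops the empty strings Regex.Split returns when the text starts or ends with a non-word character. RegexWordCounter and CountWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class RegexWordCounter
{
    public static Dictionary<string, int> CountWords(string text)
    {
        return Regex.Split(text, @"\W+")
            .Where(w => w.Length > 0) // leading/trailing non-word characters yield ""
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```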
I have strings like this:
/Administration/References
/Administration/Menus/Home
etc
Is there an easy way to find the 1st, 2nd and 3rd words that appear in these strings and place them into an array, i.e. the text between the slashes?
The easiest way in this case is
var words = myString.Split(new[]{'/'}, StringSplitOptions.RemoveEmptyEntries);
This will give you an array of all the words separated by the slashes.
The StringSplitOptions.RemoveEmptyEntries option makes sure that you don't get empty entries: since the string starts with a /, Split would otherwise give an empty first element in the array. If you have a trailing /, it would give an empty last element as well.
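To get at the 1st, 2nd and 3rd words individually, index into the result after checking its length, since /Administration/References has only two segments. A sketch; PathSegments, GetSegment and returning null for a missing segment are assumptions, not part of the answer.

```csharp
using System;

// Hypothetical helper class, for illustration only
static class PathSegments
{
    // Returns the segment at the given zero-based position, or null if there is none
    public static string GetSegment(string path, int index)
    {
        string[] words = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return index >= 0 && index < words.Length ? words[index] : null;
    }
}
```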
string.Split(new char[] { '/' })
See MSDN for more info:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx
I think what you are looking for is the split method on string i.e.
string[] words = yourstring.Split('/');
The method below returns a List containing the 1st line, 2nd line and so on; each list item is the array of slash-separated parts of that line.
private List<string[]> ParseText(string text)
{
string[] lines = text.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
List<string[]> list = new List<string[]>();
foreach (var item in lines)
{
list.Add(item.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries));
}
return list;
}
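Splitting on System.Environment.NewLine only works when the text uses the current platform's line ending ("\r\n" on Windows, "\n" on Linux and macOS). A sketch that accepts both, assuming the input may come from either kind of system; MultiLineParser is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only
static class MultiLineParser
{
    public static List<string[]> ParseText(string text)
    {
        // Treat both "\r\n" (Windows) and "\n" (Unix) as line breaks
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        var list = new List<string[]>();
        foreach (string line in lines)
        {
            list.Add(line.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries));
        }
        return list;
    }
}
```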
I'm doing simple string input parsing and I am in need of a string tokenizer. I am new to C# but have programmed Java, and it seems natural that C# should have a string tokenizer. Does it? Where is it? How do I use it?
You could use String.Split method.
class ExampleClass
{
public ExampleClass()
{
string exampleString = "there is a cat";
// Split string on spaces. This will separate all the words in a string
string[] words = exampleString.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
// there
// is
// a
// cat
}
}
}
For more information, see Sam Allen's article about splitting strings in C# (Performance, Regex).
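One difference from Java's StringTokenizer worth knowing: Split(' ') returns an empty string for each extra space in a run, whereas StringTokenizer never returns empty tokens. Passing StringSplitOptions.RemoveEmptyEntries gives the tokenizer-like behavior. A small sketch with hypothetical names:

```csharp
using System;

// Hypothetical demo class, for illustration only
static class Tokenizer
{
    // "there  is a cat" (two spaces) -> "there", "", "is", "a", "cat"
    public static string[] SplitKeepEmpty(string s) => s.Split(' ');

    // Same input with RemoveEmptyEntries -> "there", "is", "a", "cat"
    public static string[] Tokenize(string s) =>
        s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```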
I just want to highlight the power of C#'s Split method and give a more detailed comparison, particularly from someone who comes from a Java background.
Java's StringTokenizer takes a single string whose characters are all treated as delimiters; C#'s Split likewise splits on multiple delimiters, and it can also split on whole strings, making regular expressions less necessary (although if one needs regex, use regex by all means!). Take for example this:
str.Split(new char[] { ' ', '.', '?' })
This splits on three different delimiters, returning an array of tokens. We can also remove empty entries by passing a second parameter:
str.Split(new char[] { ' ', '.', '?' }, StringSplitOptions.RemoveEmptyEntries)
One thing Java's StringTokenizer does have that I believe C#'s Split is lacking (at least Java 7 has this feature) is the ability to return the delimiter(s) as tokens. C#'s Split discards the delimiters. This could be important in, say, some NLP applications, but for more general-purpose applications this might not be a problem.
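C# can keep the delimiters as well, just not through String.Split: when the Regex.Split pattern contains capturing parentheses, the captured text is included in the result array. A sketch using the same three delimiters; the Where filter drops the empty string produced when the input ends with a delimiter, and the class name is hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class DelimiterKeepingSplitter
{
    // The parentheses make each matched delimiter appear as its own element
    public static string[] SplitKeepingDelimiters(string s) =>
        Regex.Split(s, @"([ .?])")
             .Where(part => part.Length > 0)
             .ToArray();
}
```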
The Split method of a string is what you need. In fact, the Java documentation describes StringTokenizer as a legacy class whose use is discouraged in new code, recommending String's split method instead.
I think the nearest in the .NET Framework is
string.Split()
For complex splitting you could use a regex creating a match collection.
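As an example of the match-collection approach, the sketch below treats a double-quoted run as one token and any other run of non-whitespace characters as a token. The pattern is an assumption: it does not handle escaped quotes, and QuotedTokenizer is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class QuotedTokenizer
{
    // Either a "quoted run" (kept with its quotes) or a run of non-whitespace characters
    private static readonly Regex TokenPattern = new Regex("\"[^\"]*\"|\\S+");

    public static List<string> Tokenize(string input) =>
        TokenPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
}
```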
_words = new List<string>(YourText.ToLower().Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetter).ToArray())));
Or
_words = new List<string>(YourText.Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetterOrDigit).ToArray())));
The method most similar to Java's is:
Regex.Split(string, pattern);
where
string - the text you need to split
pattern - the regular expression pattern that matches the separators between the pieces
use Regex.Split(string,"#|#");
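For instance, a single pattern can split on either a comma or a semicolon and swallow any whitespace around it. The input and the PatternSplitter name are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only
static class PatternSplitter
{
    // Separator: optional whitespace, then ',' or ';', then optional whitespace
    public static string[] SplitList(string input) =>
        Regex.Split(input.Trim(), @"\s*[,;]\s*");
}
```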
Read this: the Split function has an overload that takes an array of separators.
http://msdn.microsoft.com/en-us/library/system.stringsplitoptions.aspx
If you're trying to do something like splitting command line arguments in a .NET Console app, you're going to have issues because .NET is either broken or is trying to be clever (which means it's as good as broken). I needed to be able to split arguments by the space character, preserving any literals that were quoted so they didn't get split in the middle. This is the code I wrote to do the job:
// Requires: using System.Collections.Generic; using System.Text;
private static List<string> Tokenise(string value, char separator)
{
    List<string> result = new List<string>();
    // Runs of separators just produce empty tokens, which are skipped below
    value = value.Trim();
    StringBuilder sb = new StringBuilder();
    bool insideQuote = false;
    foreach (char c in value)
    {
        if (c == '"')
        {
            // Toggle quote state; the quote character itself is kept in the token
            insideQuote = !insideQuote;
        }
        if ((c == separator) && !insideQuote)
        {
            // End of a token: keep it only if it has non-whitespace content
            if (sb.ToString().Trim().Length > 0)
            {
                result.Add(sb.ToString().Trim());
                sb.Clear();
            }
        }
        else
        {
            sb.Append(c);
        }
    }
    // Flush the last token
    if (sb.ToString().Trim().Length > 0)
    {
        result.Add(sb.ToString().Trim());
    }
    return result;
}
If you are using C# 3.0 (.NET Framework 3.5) or later, you could write an extension method on System.String that does the splitting you need. You can then use syntax like:
myString.SplitByMyTokens();
More info and a useful example from Microsoft: http://msdn.microsoft.com/en-us/library/bb383977.aspx
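A sketch of such an extension method. SplitByMyTokens is the hypothetical name used above, and the separator set (commas, semicolons and whitespace) is an assumption. Extension methods have to be declared in a non-nested, non-generic static class.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical separator set; adjust to your own "tokens"
    private static readonly char[] MyTokens = { ',', ';', ' ', '\t', '\r', '\n' };

    // The "this" modifier makes the method callable as myString.SplitByMyTokens()
    public static string[] SplitByMyTokens(this string value) =>
        value.Split(MyTokens, StringSplitOptions.RemoveEmptyEntries);
}
```

With it in scope, "a, b;c".SplitByMyTokens() returns the three pieces a, b and c.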