Splitting a string with a space/two spaces after the character - c#

Consider a number of strings, each of which is assumed to contain "keys" of the form "Wxxx", where each x is a digit from 0-9. A string contains either a single key or multiple keys separated by ',' followed by two spaces. For example:
W123
W432
W546, W234, W167
The strings that contain multiple keys need to be split into an array. So the last one in the examples above should become an array like this: {"W546", "W234", "W167"}.
A quick solution that comes to mind is String.Split, but as far as I am aware it takes a single character, like ','. The problem is that it would return an array like this: {"W546", " W234", " W167"}. The leading spaces in every entry from the second one onwards could probably be removed using Substring, but is there a better solution?
For context, these values are being held in a spreadsheet, and are assumed to have undergone data validation to ensure the "keys" are separated by a comma followed by two spaces.
while ((ws.Cells[row, 1].Value != null) && !ws.Cells[row, 1].Value.ToString().Equals(""))
{
    // there can be one key, or multiple keys separated by ','
    if (ws.Cells[row, keysCol].Value.ToString().Contains(','))
    {
        // there are multiple
        // need to split the ones in this cell separated by a comma
    }
    else
    {
        // there is one
    }
    row++;
}

You can specify both ',' and ' ' as separators and pass StringSplitOptions.RemoveEmptyEntries.
Using your sample of single keys and a string containing multiple keys, you can handle them all the same way and get your list of individual keys:
List<string> cells = new List<string>() { "W123", "W432", "W546, W234, W167" };
List<string> keys = new List<string>();
foreach (string cell in cells)
{
    keys.AddRange(cell.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
Split copes with strings where there is nothing to split, and AddRange accepts your single keys as well as the multi-key split results.
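If you would rather not treat the space as a separator at all, a small variation (my own sketch, not part of the answer above) is to split on ',' only and Trim() each piece. That tolerates zero, one or two spaces after each comma. The names KeySplitter and SplitKeys are hypothetical.

```csharp
using System;
using System.Linq;

public static class KeySplitter
{
    // Hypothetical helper: turns one cell value into its individual "Wxxx" keys.
    public static string[] SplitKeys(string cell)
    {
        return cell
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // split on the comma only
            .Select(key => key.Trim())                                   // drop the spaces that follow each comma
            .Where(key => key.Length > 0)                                // ignore pieces that were only whitespace
            .ToArray();
    }
}
```

A single key such as "W123" comes back as a one-element array, so the Contains(',') branch in the question's loop is not needed. If the separator is guaranteed to be exactly a comma plus two spaces, the string-array overload cell.Split(new[] { ",  " }, StringSplitOptions.None) is another option.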

You could use an old favorite: regular expressions.
Here are two flavors, 'Loop' and 'LINQ'.
static void Main(string[] args)
{
    var list = new List<string> { "W848", "W998, W748", "W953, W9484, W7373", "W888" };
    Console.WriteLine("LINQ");
    list.ForEach(l => TestSplitRegexLinq(l));
    Console.WriteLine();
    Console.WriteLine("Loop");
    list.ForEach(l => TestSplitRegexLoop(l));
}
private static void TestSplitRegexLinq(string s)
{
    string pattern = @"[W][0-9]*";
    var reg = new Regex(pattern);
    // Cast<Match>() makes the MatchCollection usable with LINQ on every .NET version
    reg.Matches(s).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Value));
}
private static void TestSplitRegexLoop(string s)
{
    string pattern = @"[W][0-9]*";
    var reg = new Regex(pattern);
    foreach (Match m in reg.Matches(s))
    {
        Console.WriteLine(m.Value);
    }
}
Just replace the Console.WriteLine with anything you want, e.g. myList.Add(m.Value).
You will need to add the namespace: using System.Text.RegularExpressions; (plus using System.Linq; for the LINQ version).
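A minimal sketch of that suggestion, collecting the matches into a List<string> instead of printing them (KeyExtractor and ExtractKeys are hypothetical names). It uses the + quantifier (one or more digits) rather than *, so that a lone "W" with no digits is not reported as a key.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyExtractor
{
    // Matches a literal 'W' followed by at least one digit, e.g. "W546".
    private static readonly Regex KeyPattern = new Regex(@"W[0-9]+");

    // Hypothetical helper: collects every key found in one cell into a list.
    public static List<string> ExtractKeys(string cell)
    {
        var keys = new List<string>();
        foreach (Match m in KeyPattern.Matches(cell))
        {
            keys.Add(m.Value);
        }
        return keys;
    }
}
```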

Eliminate the extra space first (using Replace()), then use split.
var input = "W546, W234, W167";
var normalized = input.Replace(", ",",");
var array = normalized.Split(',');
This way, you treat a comma followed by a space exactly the same as a plain comma. If there might be two spaces, you can collapse them first as well:
var input = "W546,  W234,  W167";
var normalized = input.Replace("  ", " ").Replace(", ", ",");
var array = normalized.Split(',');
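Instead of chaining one Replace call per possible number of spaces, a swapped-in alternative (not part of this answer) is Regex.Split with the pattern ",\s*", which treats a comma followed by any amount of whitespace as a single separator. CommaSplitter and SplitOnComma are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class CommaSplitter
{
    // Hypothetical helper: a comma plus any run of whitespace (none, one, two, ...) is one separator.
    public static string[] SplitOnComma(string input)
    {
        return Regex.Split(input.Trim(), @",\s*");
    }
}
```

When the input contains no comma, Regex.Split returns a one-element array holding the input, so single keys need no special case.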

After trying this in .NET fiddle, I think I may have a solution:
// if there are multiple
string keys = ws.Cells[row,keysCol].Value.ToString();
// remove spaces
string keys_normalised = keys.Replace(" ", string.Empty);
Console.WriteLine("Checking that spaces have been removed: " + keys_normalised + "\n");
string[] splits = keys_normalised.Split(',');
for (int i = 0; i < splits.Length; i++)
{
    Console.WriteLine(splits[i]);
}
This produces the following output in the console:
Checking that spaces have been removed: W456,W234,W167
W456
W234
W167

Related

C# use split to separate two parts of string

I have the following text in an Excel spreadsheet cell:
"Calories (kcal) "
(minus quotes).
I can get the value of the cell into my code:
string nutrientLabel = dataRow[0].ToString().Trim();
I'm new to C# and need help separating "Calories" and "(kcal)" into two different variables that I can upload into my online system. I need the result to be two strings:
nutrientLabel = Calories
nutrientUOM = kcal
I've googled the hell out of this and found out how to separate them and display them with Console.WriteLine, but I need the values actually stored in two variables.
char[] paraSeparator = new char[] { '(', ')' };
foreach (DataRow dataRow in nutrientsdataTable.Rows)
{
    string nutrientLabel = dataRow[0].ToString().Trim();
    Console.WriteLine("=======================================");
    Console.WriteLine("Para separated strings :\n");
    // splitting "Calories (kcal)" on the parentheses gives { "Calories ", "kcal" }
    string[] result = nutrientLabel.Split(paraSeparator,
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string str in result)
    {
        Console.WriteLine(str);
    }
}
You can use a simple regex for this:
var reg = new Regex(@"(?<calories>\d+)\s\((?<kcal>\d+)\)");
Which essentially says:
Match at least one number and store it in the group 'calories'
Match a space and an opening parenthesis
Match at least one number and store it in the group 'kcal'
Match a closing parenthesis
Then we can extract the results using the named groups:
var sampleInput = "15 (35)";
var match = reg.Match(sampleInput);
var calories = match.Groups["calories"].Value;
var kcal = match.Groups["kcal"].Value;
Note that calories and kcal are still strings here; you'll need to parse them into an integer (or decimal).
string [] s = dataRow[0].ToString().Split(' ');
nutrientLabel = s[0];
nutrientUOM = s[1].Replace(")","").Replace("(","");
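The regex answer above matches digits ("15 (35)"), whereas the cell in the question holds text like "Calories (kcal)". A sketch applying the same named-group idea to that text: the pattern and the NutrientParser/TryParse names are my own assumptions, and it assumes the unit is the only parenthesised part, at the end of the cell. Unlike Split(' '), it also copes with a hypothetical multi-word label such as "Total Fat (g)".

```csharp
using System.Text.RegularExpressions;

public static class NutrientParser
{
    // Captures everything before the parentheses as the label and the text inside them as the unit,
    // e.g. "Calories (kcal)" -> label "Calories", unit "kcal". Surrounding whitespace is ignored.
    private static readonly Regex LabelPattern =
        new Regex(@"^\s*(?<label>[^(]+?)\s*\((?<uom>[^)]*)\)\s*$");

    // Hypothetical helper: returns false when the cell does not have the "Label (unit)" shape.
    public static bool TryParse(string cell, out string label, out string uom)
    {
        Match match = LabelPattern.Match(cell);
        label = match.Success ? match.Groups["label"].Value : null;
        uom = match.Success ? match.Groups["uom"].Value : null;
        return match.Success;
    }
}
```

Inside the question's loop this would be called as NutrientParser.TryParse(dataRow[0].ToString(), out nutrientLabel, out nutrientUOM).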

How to save the strings in array and display the next string array if match found?

I read a *.txt file in C# and displayed it in the console.
My text file looks like a table:
diwas hey
ivonne how
pokhara d kd
lekhanath when
dipisha dalli hos
dfsa sasf
Now I want to search for the string "pokhara" and, if it is found, display "d kd"; if it is not found, display "Not found".
What I tried:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
foreach (string line in lines)
{
    string[] words = line.Split();
    foreach (string word in words)
    {
        if (word == "pokhara")
        {
            Console.WriteLine("Match Found");
        }
    }
}
My problem:
The match is found, but how do I display the next word of the line? Also, sometimes the value in the second column is two words separated by a space (e.g. "d kd"), and I need to show both words.
I guess your delimiter is the tab character; then you can use String.Split and LINQ:
var lineFields = System.IO.File.ReadLines(@"C:\readme.txt")
.Select(l => l.Split('\t'));
var matches = lineFields
.Where(arr => arr.First().Trim() == "pokhara")
.Select(arr => arr.Last().Trim());
// if you just want the first match:
string result = matches.FirstOrDefault(); // is null if not found
If you don't know the delimiter, as your comment suggests, you have a problem: if you don't even know the rules for how the fields are separated, it's very likely that your code will be incorrect. So first determine the business logic by asking the people who created the text file, then use the correct delimiter in String.Split.
If it's a space, you can either use string.Split() (without arguments), which splits on all whitespace including spaces, tabs and newline characters, or use string.Split(' '), which splits only on the space. But note that the space is a bad delimiter if the fields can contain spaces as well. In that case either use a different delimiter or wrap the fields in quote characters, like "text with spaces". Then I suggest a real text parser like Microsoft.VisualBasic.FileIO.TextFieldParser, which can also be used from C#. It has a HasFieldsEnclosedInQuotes property.
This works:
string[] lines = System.IO.File.ReadAllLines(@"C:\readme.txt");
string stringTobeDisplayed = string.Empty;
bool found = false;
foreach (string line in lines)
{
    stringTobeDisplayed = string.Empty;
    string[] words = line.Split();
    // I assume that the first word in every line is the key word to be found
    if (words[0].Trim() == "pokhara")
    {
        Console.WriteLine("Match Found");
        found = true;
        for (int i = 1; i < words.Length; i++)
        {
            // put a space back between the remaining words, e.g. "d kd"
            stringTobeDisplayed += (i > 1 ? " " : "") + words[i];
        }
        Console.WriteLine(stringTobeDisplayed);
    }
}
if (!found)
{
    Console.WriteLine("Not found");
}
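A sketch that returns the value instead of printing it, assuming (as the answers above do) that the key is the first whitespace-delimited token and everything after it is the value. It works on any sequence of lines so it can be tested without a file; KeyLookup and Find are hypothetical names. The Split overload with a count of 2 keeps the rest of the line together. It does not handle keys that themselves contain spaces; for that you need a real delimiter as discussed above.

```csharp
using System;
using System.Collections.Generic;

public static class KeyLookup
{
    private static readonly char[] Whitespace = { ' ', '\t' };

    // Hypothetical helper: returns the text after the key on the first line that starts with it,
    // or null when no line starts with the key.
    public static string Find(IEnumerable<string> lines, string key)
    {
        foreach (string line in lines)
        {
            // count = 2: first token is the key, the second element holds the whole remainder, e.g. "d kd"
            string[] parts = line.Trim().Split(Whitespace, 2, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0 && parts[0] == key)
            {
                return parts.Length > 1 ? parts[1].Trim() : string.Empty;
            }
        }
        return null;
    }
}
```

Usage with the file from the question: Console.WriteLine(KeyLookup.Find(System.IO.File.ReadLines(@"C:\readme.txt"), "pokhara") ?? "Not found");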

Checking if string contains multiple letters

I have a collection of objects called dictionaryWords.
For each word in this collection I need to check whether it contains certain letters. If it contains one or more of those letters, it is removed from the collection.
Example:
Collection before removal: ['abc','dee','fff']
letters to check for: e,f
Collection after removal: ['abc']
Rather than specifying multiple letters is there a way to check against an array?
My code:
foreach (DictionaryWord word in dictionaryWords)
{
    if (!word.Contains("D") && !word.Contains("E")) // optimize this line
    {
        // Word does not contain letters, word is good
    }
}
How can I replace the "optimize this line" condition with "if word contains any letter from an array of values"?
Thanks,
Andrew
Try something like this (Any requires using System.Linq;):
// isolate the letters
string[] letters = new string[] { "D", "E", "F" }; // other strings or letters
// iterate over your data
foreach (DictionaryWord word in dictionaryWords)
{
    // check that the word does not contain any of the letters
    if (!letters.Any(x => word.Contains(x)))
    {
        // Word does not contain letters, word is good
    }
}
Why not abstract this out to a function?
void CheckForLetters(IEnumerable<DictionaryWord> source, IEnumerable<char> letters) {
    foreach (var word in source) {
        if (letters.Any(c => word.Contains(c))) {
            // It has the letter
        }
    }
}
Create a method that accepts an array of characters to compare against.
The prototype would look like the following: bool isLetterInWord(DictionaryWord word, char[] letters);
Assuming that you can map your DictionaryWord class to a string, you can use a Linq query for this:
var words = new string[] { "abc", "def", "fe" };
var lettersToExclude = new char[] { 'e', 'f' };
var goodWords = from x in words where !lettersToExclude.Intersect(x).Any() select x;
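Use HashSets and make use of the IsSubsetOf method. Note that IsSubsetOf checks whether the string contains all of the letters; to check whether it contains any of them, use Overlaps instead: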
http://msdn.microsoft.com/en-us/library/bb358446(v=vs.110).aspx
string str = "teststing";
var strset = new HashSet<char>(str.ToCharArray());
var charstoCheck = new HashSet<char>(new Char[] {'z'});
bool isstrcontainschars = charstoCheck.IsSubsetOf(strset);
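Since the question wants a word removed when it contains any of the letters, here is a sketch using HashSet<char>.Overlaps together with List<T>.RemoveAll. It assumes the words are available as a List<string> (the question's DictionaryWord type is not shown), and WordFilter/RemoveWordsContaining are hypothetical names. Matching is case-sensitive, so 'e' does not match 'E'.

```csharp
using System.Collections.Generic;

public static class WordFilter
{
    // Hypothetical helper: removes, in place, every word that contains at least one of the given letters.
    // Overlaps returns true if the set shares any element with the sequence (a string is an IEnumerable<char>),
    // i.e. "contains any", whereas IsSubsetOf means "contains all".
    public static void RemoveWordsContaining(List<string> words, IEnumerable<char> letters)
    {
        var banned = new HashSet<char>(letters);
        words.RemoveAll(word => banned.Overlaps(word));
    }
}
```

With the question's example, ["abc", "dee", "fff"] filtered on 'e' and 'f' leaves ["abc"].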
You could use Regex for that.
string value = "abcdef";
var result = new Regex("[def]").Replace(value, string.Empty);
It finds any of the letters listed in the character class passed to the Regex constructor and replaces them with whatever you want. In my example, I replaced them with string.Empty, so result is "abc".
In your case, you could do this:
if (new Regex("[def]").IsMatch(word))
{
//Do something.
}

How do I know which delimiter was used when delimiting a string on multiple delimiters? (C#)

I read strings from a file and they come in various styles:
item0 item1 item2
item0,item1,item2
item0_item1_item2
I split them like this:
string[] split_line = lines[i].Split(new char[] { ' ', ',', '_' });
I change an item (column) and then I stitch the strings back together using a StringBuilder.
But now, when putting the string back together, I have to use the right delimiter.
Is it possible to know which delimiter was used when splitting the string?
UPDATE
The caller passes me the first item, so that I only change that line.
Unless you keep track of the splitting yourself (one delimiter at a time), you can't.
Otherwise, you could create a regular expression to capture the item and the delimiter and go from there.
Instead of passing in an array of characters, you can use a Regex to split the string. The advantage of doing this is that you can capture the splitting character: Regex.Split inserts any captures between the elements in the array, like so:
string[] space = Regex.Split("123 456 789", @"([,_ ])");
// Results in { "123", " ", "456", " ", "789" }
string[] comma = Regex.Split("123,456,789", @"([,_ ])");
// Results in { "123", ",", "456", ",", "789" }
string[] underscore = Regex.Split("123_456_789", @"([,_ ])");
// Results in { "123", "_", "456", "_", "789" }
Then you can edit all the items in the array with something like this (the items sit at the even indices, the captured delimiters at the odd ones):
for (int x = 0; x < space.Length; x += 2)
    space[x] = space[x] + "x";
Console.WriteLine(String.Join("", space));
// Will print: 123x 456x 789x
One thing to be wary of when dealing with multiple separators is lines that contain spaces, commas and underscores together, e.g.
37,hello world,238_3
This code will preserve all the distinct separators, but the results might not be what you expect; e.g. the output for the line above would be:
37x,hellox worldx,238x_3x
As I mentioned, the caller passes me the first item, so I tried something like this:
// find the right row
if (lines[i].ToLower().StartsWith(rowID))
{
// we have to know which delim was used to split the string since this will be
// used when stitching back the string together.
for (int delim = 0; delim < delims.Length; delim++)
{
// we split the line into an array and then use the array index as our column index
split_line = lines[i].Trim().Split(delims[delim]);
// we found the right delim
if (split_line.Length > 1)
{
delim_used = delims[delim];
break;
}
}
}
Basically, for each line I try the delimiters one at a time and check the resulting array length. If it is > 1, that delimiter worked; otherwise I skip to the next one. I am relying on this documented behavior of Split: "If this instance does not contain any of the characters in separator, the returned array consists of a single element that contains this instance."
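A shorter way to find the delimiter than splitting once per candidate is String.IndexOfAny, which returns the position of the first character from a given set. A sketch, assuming (like the loop above) that a line uses only one kind of delimiter; DelimitedLine and ReplaceColumn are hypothetical names.

```csharp
using System;

public static class DelimitedLine
{
    private static readonly char[] Delimiters = { ' ', ',', '_' };

    // Hypothetical helper: replaces one column of a line and rejoins it with the delimiter the line already uses.
    public static string ReplaceColumn(string line, int column, string newValue)
    {
        int index = line.IndexOfAny(Delimiters);
        if (index < 0)
        {
            // No delimiter at all: the whole line is a single column.
            return column == 0 ? newValue : line;
        }
        char delimiter = line[index];            // the first delimiter found is taken as the one in use
        string[] fields = line.Split(delimiter);
        if (column < fields.Length)
        {
            fields[column] = newValue;
        }
        return string.Join(delimiter.ToString(), fields);
    }
}
```

For a mixed line such as "37,hello world,238_3" this picks ',' and treats "hello world" as one column, which may or may not be what you want.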

Regex.Split adding empty strings to result array

I have a Regex to split out words, operators and brackets in simple logic statements (e.g. "WORD1 & WORD2 | (WORd_3 & !word_4 )"). The Regex I've come up with is "(?[A-Za-z0-9_]+)|(?[&!\|()]{1})". Here is a quick test program:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("* Test Project *");
            string testExpression = "!(LIONV6 | NOT_superCHARGED) &RHD";
            string removedSpaces = testExpression.Replace(" ", "");
            string[] expectedResults = new string[] { "!", "(", "LIONV6", "|", "NOT_superCHARGED", ")", "&", "RHD" };
            string[] splits = Regex.Split(removedSpaces, @"(?[A-Za-z0-9_]+)|(?[&!\|()]{1})");
            Console.WriteLine("Expected\n{0}\nActual\n{1}", expectedResults.AllElements(), splits.AllElements());
            Console.WriteLine("*** Any Key to finish ***");
            Console.ReadKey();
        }
    }
    public static class Extensions
    {
        // Formats the array as 'a','b','c', so that empty elements show up as ''
        public static string AllElements(this string[] str)
        {
            string output = "";
            if (str != null)
            {
                foreach (string item in str)
                {
                    output += "'" + item + "',";
                }
            }
            return output;
        }
    }
}
The Regex does the required job of splitting out words and operators into an array in the right sequence, but the result array contains many empty elements, and I can't work out why. It's not a serious problem, as I just ignore empty elements when consuming the array, but I'd like the Regex to do all the work if possible, including ignoring spaces.
Try this:
string[] splits = Regex.Split(removedSpaces, @"(?[A-Za-z0-9_]+)|(?[&!\|()]{1})").Where(x => x != String.Empty).ToArray();
The empty strings are just because of the way Split works. From the help page:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
What Split does by default is treat your matches as delimiters. So in effect, what it returns by default is a lot of empty strings between the adjacent matches (compare what you would expect if you split ",,,," on ",": you'd probably expect all the gaps).
Also from that help page though is:
If capturing parentheses are used in a Regex.Split expression, any
captured text is included in the resulting string array.
This is the reason you are getting what you actually want in there at all. So effectively it is now showing you the text that has been split (all the empty strings) with the delimiters too.
What you are doing may well be better done by just matching the regular expression (with Regex.Matches), since what your regular expression describes is actually what you want to match.
Something like this (using some linq to convert to a string array):
Regex.Matches(testExpression, @"([A-Za-z0-9_]+)|([&!\|()]{1})")
.Cast<Match>()
.Select(x=>x.Value)
.ToArray();
Note that because this is taking positive matches it doesn't need the spaces to be removed first.
var matches = Regex.Matches(removedSpaces, @"(\w+|[&!|()])");
foreach (var match in matches)
Console.Write("'{0}', ", match); // '!', '(', 'LIONV6', '|', 'NOT_superCHARGED', ')', '&', 'RHD',
Actually, you don't need to delete spaces before extracting your identifiers and operators, the regex I proposed will ignore them anyway.
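Putting the Regex.Matches suggestions together, a self-contained tokenizer sketch (LogicTokenizer and Tokenize are hypothetical names) that returns a string[] directly from the question's unmodified test expression, spaces included. It uses [A-Za-z0-9_] rather than \w, because in .NET \w also matches non-ASCII letters and digits.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LogicTokenizer
{
    // Either a run of word characters (ASCII letters, digits, underscore) or a single operator/bracket.
    // Whitespace matches neither alternative, so it is skipped without a separate Replace step.
    private static readonly Regex TokenPattern = new Regex(@"[A-Za-z0-9_]+|[&!|()]");

    public static string[] Tokenize(string expression)
    {
        return TokenPattern.Matches(expression)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

For "!(LIONV6 | NOT_superCHARGED) &RHD" this yields the same eight tokens as expectedResults in the question.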
