The "Why" Behind String.Empty at the End of a String - c#

Background
I am working with a delimited string and was using String.Split to put each substring into an array when I noticed that the last spot in the array was "". It was throwing off my results since I was looking for a specific substring at the last index in the array and I eventually came across this post explaining all strings end with string.Empty.
Example
The following shows this behavior in action. When I split my sentence and write each substring to the console, we can see the last element is the empty string:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
foreach (var word in wordArray)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
Question & "Fix"
I get conceptually that there are empty strings between every character, but why does String behave like this even for the end of a string? It seems confusing that "ABC" is equivalent to "ABC" + "" or ABC + "" + "" + "" so why not treat the string literally as only "ABC"?
There is a "fix" around it to get the "true" substrings I wanted:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
var wordList = new List<string>();
wordList.AddRange(wordArray);
wordList.RemoveAt(wordList.LastIndexOf(string.Empty));
foreach (var word in wordList)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
But I still don't understand why the end of the string gets treated with the same behavior since there is not another character following it where an empty string would be needed. Does it serve some purpose for the compiler?

Empty strings are the 0 of strings. There are literally infinity of them everywhere.
It's only natural that "ABC" is equivalent to "ABC" + "" or ABC + "" + "" + "". Just like it's natural that 3 is equivalent to 3 + 0 or 3 + 0 + 0 + 0.
and the fact that you have an empty string after "Hello,this,is,my,string!".Split('!')" does mean something. It means that your string ended with a "!"

This is happening because you are using StringSplitOptions.None while one of your delimiter values occurs at the end of the string. The entire purpose of that option is to create the behavior you are observing: it splits a string containing N delimiters into exactly N + 1 pieces.
To see the behavior you want, use StringSplitOptions.RemoveEmptyEntries:
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.RemoveEmptyEntries);
As for why you are seeing what you're seeing. The behavior StringSplitOptions.None is to find all the places where the delimiters are in the input string and return an array of each piece before and after the delimiters. This could be useful, for example, if you're parsing a string that you know to have exactly N parts, but where some of them could be blank. So for example, splitting the following on a comma delimiter, they would each yield exactly 3 parts:
a,b,c
a,b,
a,,c
a,,
,b,c
,b,
,,c
,,
If you want to allow empty values between delimiters, but not at the beginning or end, you can strip off delimiters at the beginning or end of the string before splitting:
var wordArray = Regex
.Replace(mySentence, "^[,!]|[,!]$", "")
.Split(new[] {",", "!"}, StringSplitOptions.None);

"" is the gap in-between each letter of Hello,this,is,my,string! So when the string is split by , and ! the result is Hello, this, is, my, string, "". The "" being the empty character between the end of the string and !.
If you replaced "" with a visible character (say #) your string would look like this #H#e#l#l#o#,#t#h#i#s#,#i#s#,#m#y#,#s#t#r#i#n#g#!#.

Related

C# Replace Everything before the first Space

I need to remove everything in a string before the first occurrence of a space.
Every string starts with a number and followed by a space
Replace the number and the space, thus leaving the rest of the string in tact
For Example:
22 The cats of India
4 Royal Highness
562 Eating Potatoes
42 Biscuits in the 2nd fridge
2564 Niagara Falls at 2 PM
I just need:
The cats of India
Royal Highness
Eating Potatoes
Biscuits in the 2nd fridge
Niagara Falls at 2 PM
Basically remove every number before the first space, including the first space.
I tried this:
foreach (string line in lines)
{
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
}
This works for numbers below 10. After it hits 2 digits, it doesn't work properly.
How should I change my code?
If you want to make sure you only match digits at the beginning of the string, you can use the following regex:
^\d+\p{Zs}
See demo
Declare it like:
public static readonly Regex rx = new Regex(#"^\d+\p{Zs}", RegexOptions.Compiled);
The ^\d+\p{Zs} regex means: one or more digits at the start of the string followed with 1 whitespace.
And then use it like
string newline = rx.Replace(line, string.Empty);
EDIT: To make sure the line has no leading whitespace, we can add .Trim() to strip it like:
Regex rx = new Regex(#"^\d+\p{Zs}", RegexOptions.Compiled);
string newline = rx.Replace(line.Trim(), string.Empty);
I know you already found a resolution to your issue. But I am going to explain why your code didn't work in the first place.
Your data has extra spaces which is why you are trimming it: line.Trim(). But the real problem lies in the the following statement:
string newline = line.Trim().Remove(0, line.IndexOf(' ') + 1);
You are making the assumption about the order of the operation and the fact that string data type is not immutable. When the operation of Trim() function is complete it returns a whole new string which is used in the Remove() operation. But the IndexOf() function is done on the original line of data.
So the correct line of code would be the following:
foreach (string line in lines)
{
// trim the line first
var temp = line.Trim();
// now perform all operation on the new temporary string
string newline = temp.Remove(0, temp.IndexOf(' ') + 1);
// debugging purpose
Console.WriteLine(newline);
}
Another solution:
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2564 Niagara Falls at 2 PM"
};
foreach (var line in lines)
{
var newLine = string.Join(" ", line.Split(' ').Skip(1));
}
Use a regex like so:
string newline = Regex.Replace(line, #"^\s*\d+\s*", "");
This will remove numbers only, not other text before the first space.
This is what you are looking for
foreach (string line in lines)
{
string newline = line.Replace(line.Split(new Char[]{' '})[0] + ' ',string.Empty);
}
UPDATE
string search=line.Split(new Char[]{' '})[0];
int pos=line.indexOf(search);
string newline = line.Substring(0, pos) + string.Empty + line.Substring(pos + search.Length);
FULL CODE
using System;
public class Program
{
public static void Main()
{
var lines = new string[]
{
"22 The cats of India",
"4 Royal Highness",
"562 Eating Potatoes",
"42 Biscuits in the 2nd fridge",
"2 Niagara Falls at 2 PM"
};
foreach(string line in lines){
string search=line.Split(new Char[]{' '})[0];
int pos=line.IndexOf(search);
string newline = line.Substring(0, pos) + string.Empty + line.Substring(pos + search.Length);
Console.WriteLine(newline);
}
}
}

Is it possible to split a string into an array of strings and remove sections not between delimiters using String.Split or regex?

I was wanting to split a string with a known delimiter between different parts into an array of strings using a method (e.g. MethodToSplitIntoArray(String toSplit) like in the example below. The values are string values which can have any character except for '{', '}', or ',' so am unable to delimit on any other character. The string can also contain undesired white space at the start and end as the file can be generated from multiple different sources, the desired information will also be inbetween "{" "}" and separated by a comma.
String valueCombined = " {value},{value1},{value2} ";
String[] values = MethodToSplitIntoArray(valueCombined);
foreach(String value in values)
{
//Do something with array
Label.Text += "\r\nString: " + value;
}
Where the label would show:
String: value
String: value1
String: value2
My current implementation of splitting method is below. It splits the values but includes any spaces before the first parenthesis and anything between them.
private String[] MethodToSplitIntoArray(String toSplit)
{
return filesPassed.Split(new string[] { "{", "}" }, StringSplitOptions.RemoveEmptyEntries);
}
I though this would separate out the strings between the curly braces and remove the rest of the string, but my output is:
String:
String: value
String: ,
String: value1
String: ,
String: value2
String:
What am I doing wrong in my split that I'm still getting the string values outside of the parenthesis? Ideally I would like to use regex or String.Split if its possible
For those with similar problems check out DotNet Perls on splitting
Making the assumption that commas are not permitted inside a curly brace pair, and that outside a curly brace pair only commas or whitespace will appear, it seems to me that the most straightforward, easy-to-read way to approach this is to first split on commas, then trim the results of that (to remove whitespace), and then finally to remove the first and last characters (which at that point should only be the curly braces):
valuesCombined.Split(',').Select(s => s.Trim().Substring(1, s.Length - 2)).ToArray();
I believe that including the curly braces in the initial split operation just makes everything harder, and is more likely to break in hard-to-identify ways (i.e. bad data will result in weirder results than if you use something like the above).
Add , to delimeters:
return filesPassed.Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
Not sure if you are expecting those spaces in the front and end so added some trimming to prevent empty results for those.
private String[] MethodToSplitIntoArray(String toSplit)
{
return toSplit.Trim().Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
}
This might be one of the way to get all the values as u are looking for
String valueCombined = " {value},{value1},{value2} ";
String[] values = valueCombined.Split(new string[] { "},{" }, StringSplitOptions.RemoveEmptyEntries);
int lastVal = values.Count() - 1;
values[0] = values[0].Replace("{", "");
values[lastVal] = values[lastVal].Replace("}", "");
What I did here is that splited the string with "},{" and then removed { from the first array item and } from the last array item.
Try regex and linq.
return Regex.Split(toSplit, "[.{.}.,]").Where(x => !string.IsNullOrWhiteSpace(x)).ToArray();
Though very late but can you try this:
Regex.Split(" { value},{ value1},{ value2};", #"\s*},{\s*|{\s*|},?;?").Where(s => string.IsNullOrWhiteSpace(s) == false).ToArray()

getting string and numbers

I got a string
string newString = "[17, Appliance]";
how can I put the 17 and Appliance in two separate variables while ignoring the , and the [ and ]?
I tried looping though it but the loop doesn't stop when it reaches the ,, not to mention it separated 1 & 7 instead of reading it as 17.
For example, you could use this:
newString.Split(new[] {'[', ']', ' ', ','}, StringSplitOptions.RemoveEmptyEntries);
This is another option, even though I wouldn't go with it, especially if you might have more than one [something, anothersomething] in the string.
But there you go:
string newString = "assuming you might [17, Appliance] have it like this";
int first = newString.IndexOf('[')+1; // location of first after the `[`
int last = newString.IndexOf(']'); // location of last before the ']'
var parts = newString.Substring(first, last-first).Split(','); // an array of 2
var int_bit = parts.First ().Trim(); // you could also go with parts[0]
var string_bit = parts.Last ().Trim(); // and parts[1]
This may not be the most performant method, but I'd go with it for ease of understanding.
string newString = "[17, Appliance]";
newString = newString.Replace("[", "").Replace("]",""); // Remove the square brackets
string[] results = newString.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries); // Split the string
// If your string is always going to contain one number and one string:
int num1 = int.Parse(results[0]);
string string1 = results[1];
You'd want to include some validation to ensure your first element is indeed a number (use int.TryParse), and that there are indeed two elements returned after you split the string.

How do I know which delimiter was used when delimiting a string on multiple delimiters? (C#)

I read strings from a file and they come in various styles:
item0 item1 item2
item0,item1,item2
item0_item1_item2
I split them like this:
string[] split_line = line[i].split(new char[] {' ',',','_'});
I change an item (column) and then i stitch the strings back together using string builder.
But now when putting the string back I have to use the right delimiter.
Is it possible to know which delimiter was used when splitting the string?
UPDATE
the caller will pass me the first item so that I only change that line.
Unless you keep track of splitting action (one at the time) you don't.
Otherwise, you could create a regular expression, to catch the item and the delimiter and go from there.
Instead of passing in an array of characters, you can use a Regex to split the string instead. The advantage of doing this, is that you can capture the splitting character. Regex.Split will insert any captures between elements in the array like so:
string[] space = Regex.Split("123 456 789", #"([,_ ])");
// Results in { "123", " ", "456", " ", "789" }
string[] comma = Regex.Split("123,456,789", #"([,_ ])");
// Results in { "123", ",", "456", ",", "789" }
string[] underscore = Regex.Split("123_456_789", #"([,_ ])");
// Results in { "123", "_", "456", "_", "789" }
Then you can edit all items in the array with something like
for (int x = 0; x < space.Length; x += 2)
space[x] = space[x] + "x";
Console.WriteLine(String.Join("", space));
// Will print: 123x 456x 789x
One thing to be wary of when dealing with multiple separators is if there are any lines that have spaces, commas and underscores in them. e.g.
37,hello world,238_3
This code will preserve all the distinct separators but your results might not be expected. e.g. the output of the above would be:
37x,hellox worldx,238x_3x
As I mentioned that the caller passes me the first item so I tried something like this:
// find the right row
if (lines[i].ToLower().StartsWith(rowID))
{
// we have to know which delim was used to split the string since this will be
// used when stitching back the string together.
for (int delim = 0; delim < delims.Length; delim++)
{
// we split the line into an array and then use the array index as our column index
split_line = lines[i].Trim().Split(delims[delim]);
// we found the right delim
if (split_line.Length > 1)
{
delim_used = delims[delim];
break;
}
}
}
basically I iterate each line over the delims and check the resulting array length. If it is > 1 that means that delim worked otherwise skip to next one. I am using split functions property "If this instance does not contain any of the characters in separator, the returned array consists of a single element that contains this instance."

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string "
I want to remove space from start of starting and end from a string.
The result should be: "i am a string"
How can I do this in c#?
String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working then it highly likely that the "spaces" aren't spaces but some other non printing or white space character, possibly tabs. In this case you need to use the String.Trim method which takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
Source
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim)); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"
I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.
txt = txt.Trim();
Or you can split your string to string array, splitting by space and then add every item of string array to empty string.
May be this is not the best and fastest method, but you can try, if other answer aren't what you whant.
text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();
Use the Trim method.
static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex.
Regex r = new Regex(#"\s+");
// C.
// Strip multiple spaces.
string s3 = r.Replace(s1, #" ");
Console.WriteLine(s3);
// D.
// Strip multiple spaces.
string s4 = r.Replace(s2, #" ");
Console.WriteLine(s4);
Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.
He saw a cute dog.
You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

Categories

Resources