Background
I am working with a delimited string and was using String.Split to put each substring into an array when I noticed that the last spot in the array was "". It was throwing off my results since I was looking for a specific substring at the last index in the array and I eventually came across this post explaining all strings end with string.Empty.
Example
The following shows this behavior in action. When I split my sentence and write each substring to the console, we can see the last element is the empty string:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
foreach (var word in wordArray)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
Question & "Fix"
I get conceptually that there are empty strings between every character, but why does String behave like this even at the end of a string? It seems confusing that "ABC" is equivalent to "ABC" + "" or "ABC" + "" + "" + "", so why not treat the string literally as only "ABC"?
There is a "fix" around it to get the "true" substrings I wanted:
public class Program
{
static void Main(string[] args)
{
const string mySentence = "Hello,this,is,my,string!";
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.None);
var wordList = new List<string>();
wordList.AddRange(wordArray);
wordList.RemoveAt(wordList.LastIndexOf(string.Empty));
foreach (var word in wordList)
{
var message = word;
if (word == string.Empty) message = "Empty string";
Console.WriteLine(message);
}
Console.ReadKey();
}
}
But I still don't understand why the end of the string gets treated with the same behavior since there is not another character following it where an empty string would be needed. Does it serve some purpose for the compiler?
Empty strings are the 0 of strings. There are infinitely many of them everywhere.
It's only natural that "ABC" is equivalent to "ABC" + "" or "ABC" + "" + "" + "". Just like it's natural that 3 is equivalent to 3 + 0 or 3 + 0 + 0 + 0.
And the fact that you get an empty string at the end of "Hello,this,is,my,string!".Split('!') does mean something: it means that your string ended with a "!".
This is happening because you are using StringSplitOptions.None while one of your delimiter values occurs at the end of the string. The entire purpose of that option is to create the behavior you are observing: it splits a string containing N delimiters into exactly N + 1 pieces.
To see the behavior you want, use StringSplitOptions.RemoveEmptyEntries:
var wordArray = mySentence.Split(new[] {",", "!"}, StringSplitOptions.RemoveEmptyEntries);
As for why you are seeing what you're seeing: the behavior of StringSplitOptions.None is to find all the places where the delimiters occur in the input string and return an array of the pieces before and after each delimiter. This can be useful, for example, if you're parsing a string that you know has exactly N parts, some of which may be blank. For example, splitting each of the following on a comma delimiter yields exactly 3 parts:
a,b,c
a,b,
a,,c
a,,
,b,c
,b,
,,c
,,
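A minimal sketch of the point above, assuming a plain comma delimiter and the eight sample inputs from the list: with StringSplitOptions.None every input yields three parts, while RemoveEmptyEntries returns however many non-blank parts there happen to be. The variable names are illustrative only.

```csharp
using System;

// The eight sample inputs from the list above; each contains exactly two commas.
string[] inputs = { "a,b,c", "a,b,", "a,,c", "a,,", ",b,c", ",b,", ",,c", ",," };

foreach (string input in inputs)
{
    // None keeps empty entries, so N delimiters always give N + 1 pieces.
    int kept = input.Split(new[] { ',' }, StringSplitOptions.None).Length;
    // RemoveEmptyEntries discards the blanks, so the count varies per input.
    int nonEmpty = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Length;
    Console.WriteLine($"{input,-6} None: {kept}  RemoveEmptyEntries: {nonEmpty}");
}
```

If the position of each field matters (for example, column 2 is always the city), None is probably the option to keep, since RemoveEmptyEntries shifts later fields forward.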
If you want to allow empty values between delimiters, but not at the beginning or end, you can strip off delimiters at the beginning or end of the string before splitting:
var wordArray = Regex
.Replace(mySentence, "^[,!]|[,!]$", "")
.Split(new[] {",", "!"}, StringSplitOptions.None);
"" is the gap between each pair of characters in Hello,this,is,my,string! So when the string is split by , and ! the result is Hello, this, is, my, string, "". The final "" is the empty string between the ! and the end of the string.
If you replaced "" with a visible character (say #) your string would look like this #H#e#l#l#o#,#t#h#i#s#,#i#s#,#m#y#,#s#t#r#i#n#g#!#.
I have a string in my C# code:
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f", i.e. any ',' inside double quotes should be replaced by a space.
I tried using string.Split but it is not working for me.
OK, I can't be bothered to think of a regex approach so I am going to offer an old fashioned loop approach which will work:
string DoReplace(string input)
{
bool isInner = false;//flag to detect if we are in the inner string or not
string result = "";//result to return
foreach(char c in input)//loop each character in the input string
{
if(isInner && c == ',')//if we are in an inner string and it is a comma, append space
result += " ";
else//otherwise append the character
result += c;
if(c == '"')//if we have hit an inner quote, toggle the flag
isInner = !isInner;
}
return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern, it is matching any instance where there is a "X,X" sequence where X is any letter, and is replacing it with the very same sequence, with a space in between the letters instead of a comma.
You could easily extend this to match more than one letter, etc., as needed.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This works like the first one, but replaces any comma sandwiched between letters inside quotes with a space, and repeats until no such comma is left.
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("", str3);
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for ( int i = 1; i < pieces.Length; i += 2 )
{
pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h
This is a pattern that occurs quite often in one part of our framework.
Given an array of strings, we have to concatenate all of them, separated by semicolons.
I'd like to know the most elegant way to do this.
I've seen some variations across our codebase, and whenever I have to do this, I have to think it through again.
My current pattern is this:
String[] values = new String[] {"a","b","c","d"};
String concat = String.Empty;
foreach(String s in values)
{
    // Add the separator only once something has already been appended
    if(String.IsNullOrEmpty(concat) == false)
        concat += ", ";
    concat += s;
}
What bugs me is the if statement. I could insert the first item before the loop and then use a for loop starting at index 1, but this doesn't improve readability.
What are your suggestions?
You can use string.Join():
String[] values = new String[] {"a","b","c","d"};
var concat = string.Join(", ", values);
This will result in something looking like this:
a, b, c, d
try:
var result = string.Join(",", values.Where(s => !string.IsNullOrEmpty(s)));
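To illustrate the difference the filter makes, here is a small sketch. The sample array is made up, and the semicolon separator is taken from the question's wording rather than from its code.

```csharp
using System;
using System.Linq;

// Hypothetical sample data with a null and an empty entry mixed in.
string[] values = { "a", null, "b", "", "c" };

// Plain Join treats null as empty, which shows up as doubled separators.
string joinedAll = string.Join(";", values);

// Filtering first keeps only the real values.
string joinedNonEmpty = string.Join(";", values.Where(s => !string.IsNullOrEmpty(s)));

Console.WriteLine(joinedAll);      // a;;b;;c
Console.WriteLine(joinedNonEmpty); // a;b;c
```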
I have a List of words I want to ignore like this one :
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or for .Net 3:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.
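A quick sketch of that difference, using a made-up address that contains both "Northampton" and a trailing "North":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ignoreList = new List<string> { "North", "South", "East", "West" };
string text = "12 Northampton Way North"; // hypothetical sample address

// Substring replacement also eats the start of "Northampton".
string replaced = ignoreList.Aggregate(text, (acc, word) => acc.Replace(word, ""));

// Whole-word filtering only drops tokens that equal an ignored word.
string filtered = string.Join(" ", text.Split(' ').Where(w => !ignoreList.Contains(w)));

Console.WriteLine(replaced); // 12 ampton Way  (note the trailing space)
Console.WriteLine(filtered); // 12 Northampton Way
```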
Regex r = new Regex(string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}
If you know that the list of word contains only characters that do not need escaping inside a regular expression then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
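As a rough end-to-end sketch of the escaped, lookaround-based version: the ignore entries and the sample address below are hypothetical, picked so that one entry ("St.") contains a regex metacharacter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical ignore list; "St." contains a dot that must be escaped.
var ignoreList = new List<string> { "North", "St." };

// Escape each entry, then require a space or the string boundary on both sides.
string alternatives = string.Join("|", ignoreList.Select(x => Regex.Escape(x)));
var regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)", alternatives));

// "Northampton" survives because "North" is followed by a letter, not a space.
string cleaned = regex.Replace("14th St. North Northampton", "");

// The removed words leave their surrounding spaces behind.
Console.WriteLine("[" + cleaned + "]");
```

If the leftover runs of spaces matter for the later comparison, they would need to be collapsed in a separate step.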
If it's a short string as in your example, you can just loop through the strings and replace one at a time. If you want to get fancy you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.
LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert the array to a List<string>
.Except(ignoreList) // Remove ignored keywords
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
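One caveat worth checking before relying on this: Except is a set operation, so besides removing the ignored words it also drops repeated words from the input. A small sketch with a made-up address that repeats a token:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ignoreList = new List<string> { "North", "South", "East", "West" };
string[] words = "5 West 5 Street".Split(' '); // hypothetical input with a repeated token

// Except yields distinct elements only, so the second "5" is dropped.
string viaExcept = string.Join(" ", words.Except(ignoreList));

// Where keeps every occurrence that is not in the ignore list.
string viaWhere = string.Join(" ", words.Where(w => !ignoreList.Contains(w)));

Console.WriteLine(viaExcept); // 5 Street
Console.WriteLine(viaWhere);  // 5 5 Street
```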
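Why not just Keep It Simple?
public static string Trim(string text)
{
    var rv = text.Trim();
    foreach (var ignore in ignoreList) {
        // Only act when the string ends with an ignored word
        // (Replace then removes every occurrence of that word)
        if (rv.EndsWith(ignore)) {
            rv = rv.Replace(ignore, string.Empty);
        }
    }
    return rv;
}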
You can do this using an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:
string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "
public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
    var rv = "";
    var words = text.Split(' ');
    foreach (var word in words)
    {
        var present = false;
        foreach (var ignore in ignoreList)
            if (word == ignore)
                present = true;
        // Keep a single space between the words that survive
        if (!present)
            rv += (rv.Length > 0 ? " " : "") + word;
    }
    return rv;
}
If you have a list, I think you're going to have to touch all the items. You could create a massive RegEx with all your ignore keywords and replace to String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.
This probably has a simple answer, but I must not have had enough coffee to figure it out on my own:
If I had a comma delimited string such as:
string list = "Fred,Sam,Mike,Sarah";
How would I get each element, add quotes around it, and stick it back in a string like this:
string newList = "'Fred','Sam','Mike','Sarah'";
I'm assuming iterating over each one would be a start, but I got stumped after that.
One solution that is ugly:
int number = 0;
string newList = "";
foreach (string item in list.Split(new char[] {','}))
{
if (number > 0)
{
newList = newList + "," + "'" + item + "'";
}
else
{
newList = "'" + item + "'";
}
number++;
}
string s = "A,B,C";
string replaced = "'"+s.Replace(",", "','")+"'";
Thanks for the comments, I had missed the external quotes.
Of course.. if the source was an empty string, would you want the extra quotes around it or not ? And what if the input was a bunch of whitespaces... ? I mean, to give a 100% complete solution I'd probably ask for a list of unit tests but I hope my gut instinct answered your core question.
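To make the empty-input question concrete, here is a small sketch. Both helper names are hypothetical; the second one is just one possible answer to "no extra quotes for empty input", using StringSplitOptions.RemoveEmptyEntries.

```csharp
using System;
using System.Linq;

// Hypothetical helper wrapping the Replace one-liner from above.
string QuoteViaReplace(string csv) => "'" + csv.Replace(",", "','") + "'";

// Hypothetical alternative that skips empty entries, so empty input stays empty.
string QuoteViaSplit(string csv) =>
    string.Join(",", csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(x => "'" + x + "'"));

Console.WriteLine(QuoteViaReplace("Fred,Sam,Mike,Sarah")); // 'Fred','Sam','Mike','Sarah'
Console.WriteLine(QuoteViaReplace(""));                    // ''
Console.WriteLine(QuoteViaSplit(""));                      // prints an empty line
```

Which of the two behaviors is right depends on what consumes the result, so this is a choice to make explicitly rather than a fix.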
Update: A LINQ-based alternative has also been suggested (with the added benefit of using String.Format and therefore not having to worry about leading/trailing quotes):
string list = "Fred,Sam,Mike,Sarah";
string newList = string.Join(",", list.Split(',').Select(x => string.Format("'{0}'", x)).ToList());
Following Jon Skeet's example above, this is what worked for me. I already had a List<String> variable called __messages so this is what I did:
string sep = String.Join(", ", __messages.Select(x => "'" + x + "'"));
string[] bits = list.Split(','); // Param arrays are your friend
for (int i=0; i < bits.Length; i++)
{
bits[i] = "'" + bits[i] + "'";
}
return string.Join(",", bits);
Or you could use LINQ, particularly with a version of String.Join which supports IEnumerable<string>...
return list.Split(',').Select(x => "'" + x + "'").JoinStrings(",");
There's an implementation of JoinStrings elsewhere on SO... I'll have a look for it.
EDIT: Well, there isn't quite the JoinStrings I was thinking of, so here it is:
public static string JoinStrings<T>(this IEnumerable<T> source,
string separator)
{
StringBuilder builder = new StringBuilder();
bool first = true;
foreach (T element in source)
{
if (first)
{
first = false;
}
else
{
builder.Append(separator);
}
builder.Append(element);
}
return builder.ToString();
}
These days string.Join has a generic overload instead though, so you could just use:
return string.Join(",", list.Split(',').Select(x => $"'{x}'"));
string[] splitList = list.Split(',');
string newList = "'" + string.Join("','", splitList) + "'";
Based off Jon Skeet's example, but modernized for .NET 4+:
// [ "foo", "bar" ] => "\"foo\"", "\"bar\""
string.Join(", ", strList.Select(x => $"\"{x}\""));
I think the simplest thing would be to Split and then Join.
string nameList = "Fred,Sam,Mike,Sarah";
string[] names = nameList.Split(',');
string quotedNames = "'" + string.Join("','", names) + "'";
I can't write C# code, but this simple JavaScript code is probably easy to adapt:
var s = "Fred,Sam,Mike,Sarah";
alert(s.replace(/\b/g, "'"));
It just replaces word boundaries (the start and end of the string, and each transition between word characters and punctuation) with a single quote.
string list = "Fred,Sam,Mike,Sarah";
string[] splitList = list.Split(',');
for (int i = 0; i < splitList.Length; i++)
splitList[i] = String.Format("'{0}'", splitList[i]);
string newList = String.Join(",", splitList);
If you are using JSON, the following JavaScript would help:
var keys = list.split(',');
console.log(JSON.stringify(keys));
My Requirements:
Separate items using commas.
Wrap all items in list in double-quotes.
Escape existing double-quotes in the string.
Handle null-strings to avoid errors.
Do not bother wrapping null-strings in double-quotes.
Terminate with carriage-return and line-feed.
string.Join(",", lCol.Select(s => s == null ? null : ("\"" + s.Replace("\"", "\"\"") + "\""))) + "\r\n";
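A short usage sketch of that one-liner, assuming lCol is a List<string>; the row contents are invented to exercise each requirement (embedded quotes, a null, and an embedded comma).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: a plain value, a value with quotes, a null, and a value with a comma.
var lCol = new List<string> { "plain", "say \"hi\"", null, "a,b" };

// Nulls stay unquoted (Join writes nothing for them); everything else is
// wrapped in double quotes with embedded quotes doubled.
string line = string.Join(",", lCol.Select(s =>
    s == null ? null : "\"" + s.Replace("\"", "\"\"") + "\"")) + "\r\n";

Console.Write(line); // "plain","say ""hi""",,"a,b"
```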
The C# implementation of @PhiLho's JavaScript regular expression solution looks something like the following:
Regex regex = new Regex(
@"\b",
RegexOptions.ECMAScript
| RegexOptions.Compiled
);
string list = "Fred,Sam,Mike,Sarah";
string newList = regex.Replace(list,"'");
My "less sophisticated" approach ...
I suppose it's always good practice to use a StringBuilder because the list can be very large.
string list = "Fred,Sam,Mike,Sarah";
StringBuilder sb = new StringBuilder();
string[] listArray = list.Split(new char[] { ',' });
for (int i = 0; i < listArray.Length; i++)
{
sb.Append("'").Append(listArray[i]).Append("'");
if (i != (listArray.Length - 1))
sb.Append(",");
}
string newList = sb.ToString();
Console.WriteLine(newList);
Are you going to be processing a lot of CSV? If so, you should also consider using a library to do this. Don't reinvent the wheel. Unfortunately I haven't found a library quite as simple as Python's CSV library, but I have seen FileHelpers (free) reviewed at MSDN Magazine and it looks pretty good. There are probably other free libraries out there as well. It all depends on how much processing you will be doing though. Often it grows and grows until you realize you would be better off using a library.
Here is a C# 6 solution using String Interpolation.
string newList = string.Join(",", list.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => $"'{x}'")
.ToList());
Or, if you prefer the C# 5 option with String.Format:
string newList = string.Join(",", list.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => String.Format("'{0}'", x))
.ToList());
Using the StringSplitOptions will remove any empty values so you won't have any empty quotes, if that's something you're trying to avoid.
I have found another solution to this problem.
I build a list from the selected item values in the grid using LINQ, and then build a comma-delimited, quoted string from that collection using String.Join().
String str1 = String.Empty;
String str2 = String.Empty;
//str1 = String.Join(",", values); if you use this method,result "X,Y,Z"
str1 =String.Join("'" + "," + "'", values);
//The result of str1 is "X','Y','Z"
str2 = str1.Insert(0, "'").Insert(str1.Length+1, "'");
//The result of str2 is 'X','Y','Z'
I hope this helps!
For people who love extension methods like me, here it is:
public static string MethodA(this string[] array, string seperatedCharecter = "|")
{
return array.Any() ? string.Join(seperatedCharecter, array) : string.Empty;
}
public static string MethodB(this string[] array, string seperatedChar = "|")
{
return array.Any() ? MethodA(array.Select(x => $"'{x}'").ToArray(), seperatedChar) : string.Empty;
}
When I work with some database queries, there are some occasions that we need to create part of the string query like this.
So this is my one line approach for this using Split and Join.
String newList = "'" + String.Join("','", list.Split(',')) + "'";