Is there a way to do a string.Split on whitespace - c#

I have a string such as "mystring    theEnd", and I want to do a string.Split on whitespace rather than on a single space, so that I get a string[] containing "mystring" and "theEnd". There is an unknown amount of whitespace between "mystring" and "theEnd", which is why splitting on one space is not enough. Is there a way to do this?

How about:
string[] bits = text.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
(Or text.Split specifying the exact whitespace characters you want to split on, or using null as Henk suggested.)
Or you could use a regex to handle all whitespace characters:
Regex regex = new Regex(@"\s+");
string[] bits = regex.Split(text);
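As a minimal sketch of the two approaches above, the following compares the null-separator form of Split with the regex form. The sample input is an assumption, and the code is written as top-level statements (C# 9 or later). Regex.Split returns empty strings when the input starts or ends with whitespace, so the sketch trims the input first.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input: two words separated by a mixed run of spaces and a tab.
string text = "mystring \t   theEnd";

// A null separator array makes Split treat whitespace characters as delimiters;
// RemoveEmptyEntries drops the empty strings found between adjacent delimiters.
string[] viaSplit = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

// \s+ treats each run of whitespace as one delimiter.
string[] viaRegex = Regex.Split(text.Trim(), @"\s+");

Console.WriteLine(string.Join("|", viaSplit));  // mystring|theEnd
Console.WriteLine(string.Join("|", viaRegex));  // mystring|theEnd
```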

Simplest is to do:
a.Split(new [] {' ', '\t'},StringSplitOptions.RemoveEmptyEntries)
Thanks Jon :)

Related

Split string with specific requirements

Let's say I have the string
string Song = "The-Sun - Is Red";
I need to split it from the '-' char, but only if the char before and after is a space.
I don't want it to split at "The-Sun"'s dash, but rather at "Sun - Is"'s dash.
The code I was using to split was
string[] SongTokens = Song.Split('-');
But that obviously splits at the first dash, I believe. I only need to split when there is a space before and after the '-'.
Thanks
I need to split it from the '-' char, but only if the char before and after is a space.
You can use a non-regex solution like this:
string[] SongTokens = Song.Split(new[] {" - "}, StringSplitOptions.RemoveEmptyEntries);
Result: the array contains "The-Sun" and "Is Red".
See String.Split Method (String[], StringSplitOptions) at MSDN for more details. The first argument, separator, is a string array that delimits the substrings in this string; it may also be an empty array that contains no delimiters, or null.
The StringSplitOptions.RemoveEmptyEntries removes all empty elements from the resulting array. You may use StringSplitOptions.None to keep the empty elements.
Yet this can fail if the dash may be surrounded by either a hard (non-breaking) space or a regular space on each side. In that case, you'd rather choose a regex solution like this:
string[] SongTokens = Regex.Split(Song, @"\p{Zs}+-\p{Zs}+")
.Where(x => !String.IsNullOrWhiteSpace(x))
.ToArray();
The \p{Zs}+ pattern matches any Unicode "horizontal" whitespace, 1 or more occurrences.
string[] SongTokens = Song.Split(new string[] {" - "}, StringSplitOptions.None);
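To illustrate the difference between the literal separator and the \p{Zs} pattern, here is a small sketch. The second input, with no-break spaces (U+00A0) around the dash, is a hypothetical example and not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical inputs: the second uses no-break spaces (U+00A0) around the dash.
string plain = "The-Sun - Is Red";
string hard = "The-Sun\u00A0-\u00A0Is Red";

// The literal " - " separator matches ordinary spaces only.
string[] plainTokens = plain.Split(new[] { " - " }, StringSplitOptions.RemoveEmptyEntries);
string[] hardLiteral = hard.Split(new[] { " - " }, StringSplitOptions.RemoveEmptyEntries);

// \p{Zs} is the Unicode space-separator category, which includes U+00A0.
string[] hardRegex = Regex.Split(hard, @"\p{Zs}+-\p{Zs}+")
    .Where(x => !string.IsNullOrWhiteSpace(x))
    .ToArray();

Console.WriteLine(plainTokens.Length);  // 2
Console.WriteLine(hardLiteral.Length);  // 1 (the literal separator never matched)
Console.WriteLine(hardRegex.Length);    // 2
```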

Regex + Convert line of numbers separated by white space into array

I'm trying to convert a string that contains multiple numbers, where each number is separated by white space, into a double array.
For example, the original string looks like:
originalString = "50 12.2 30 48.1"
I've been using Regex.Split(originalString, @"\s*"), but it's returning an array that looks like:
[50
""
12
"."
2
""
...]
Any help is much appreciated.
Using this instead
originalString.Split(new char[]{'\t', '\n', ' ', '\r'}, StringSplitOptions.RemoveEmptyEntries);
No need to reach for a RegEx every time :)
What about string[] myArray = originalString.Split(' ');
I don't see the need for a RegEx here..
If you really want to use a RegEx, use the pattern \s+ instead of \s*.
The * means zero or more, but you want to split on one or more space character.
Working example with a RegEx:
string originalString = "50 12.2 30 48.1";
string[] arr = Regex.Split(originalString, @"\s+");
foreach (string s in arr)
Console.WriteLine(s);
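The effect of * versus + can be checked side by side. This is a rough sketch using the string from the question. Because \s* also matches the empty string, it produces a split point at every position, so the exact pieces are printed rather than assumed here.

```csharp
using System;
using System.Text.RegularExpressions;

string originalString = "50 12.2 30 48.1";

// \s* can match zero characters, so it finds a "delimiter" between every pair of characters.
string[] star = Regex.Split(originalString, @"\s*");

// \s+ needs at least one whitespace character, so only the real gaps are delimiters.
string[] plus = Regex.Split(originalString, @"\s+");

Console.WriteLine(star.Length > plus.Length);  // True
Console.WriteLine(plus.Length);                // 4
Console.WriteLine(string.Join("|", plus));     // 50|12.2|30|48.1
```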
Regex.Split(originalString, @"\s+").Where(s => !string.IsNullOrWhiteSpace(s))
The Where returns an IEnumerable with the null/whitespace entries filtered out. If you still want it as an array, just add .ToArray() to that chain of calls.
The + character is necessary because you need a MINIMUM of one to make this a correct match.
I would stick with String.Split, supplying all whitespace characters that you are expecting.
In regular expressions, \s is equivalent to [ \t\r\n] (plus some other characters specific to the flavour in use); we can represent these through a char[]:
string[] nums = originalString.Split(
new char[] { ' ', '\t', '\r', '\n' },
StringSplitOptions.RemoveEmptyEntries);
The default behaviour if you pass null as a separator to String.Split is to split on whitespace. That includes anything that matches the Unicode IsWhiteSpace test. Within the ASCII range that means tab, line feed, vertical tab, form feed, carriage return and space.
Also you can avoid empty fields by passing the RemoveEmptyEntries option.
originalString = "50 12.2 30 48.1";
string[] fields = originalString.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
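The question asks for a double array, which none of the snippets above produce. One possible way to finish the job is sketched below. The use of CultureInfo.InvariantCulture is an assumption: it makes '.' the decimal separator regardless of the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;

string originalString = "50 12.2 30 48.1";

double[] numbers = originalString
    // Null separator: split on any whitespace and skip empty fields.
    .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
    // Assumption: the input always uses '.' as the decimal separator.
    .Select(field => double.Parse(field, CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(numbers.Length);  // 4
foreach (double n in numbers)
    Console.WriteLine(n.ToString(CultureInfo.InvariantCulture));
```

If the input may contain malformed fields, double.TryParse with the same culture avoids the FormatException that double.Parse throws.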

Getting punctuation from the end of a string only

I'm looking for a C# snippet to remove and store any punctuation from the end of a string only.
Example:
Test! would return !
Test;; would return ;;
Test?:? would return ?:?
!!Test!?! would return !?!
I have a rather clunky solution at the moment but wondered if anybody could suggest a more succinct way to do this.
My punctuation list is
new char[] { '.', ':', '-', '!', '?', ',', ';' }
You could use the following regular expression:
\p{P}*$
This breaks down to:
\p{P} - Unicode punctuation
* - Any number of times
$ - End of line anchor
If you know that there will always be some punctuation at the end of the string, use + for efficiency.
And use it like this in order to get the punctuation:
string punctuation = Regex.Match(myString, @"\p{P}*$").Value;
To actually remove it:
string noPunctuation = Regex.Replace(myString, @"\p{P}*$", string.Empty);
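Both steps can be done with a single match, since the match index marks where the trailing punctuation begins. The following is a sketch run against the examples from the question. The helper name SplitTrailingPunctuation is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text before the trailing punctuation, and the punctuation itself.
static (string Body, string Punctuation) SplitTrailingPunctuation(string input)
{
    // \p{P}*$ always succeeds; the match is empty when no punctuation trails.
    Match match = Regex.Match(input, @"\p{P}*$");
    return (input.Substring(0, match.Index), match.Value);
}

foreach (string sample in new[] { "Test!", "Test;;", "Test?:?", "!!Test!?!", "Test" })
{
    var (body, punctuation) = SplitTrailingPunctuation(sample);
    Console.WriteLine($"{sample} -> body \"{body}\", punctuation \"{punctuation}\"");
}
```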
Use a regex:
resultString = Regex.Replace(subjectString, @"[.:!?,;-]+$", "");
Explanation:
[.:!?,;-] # Match a character that's one of the enclosed characters
+ # Do this once or more (as many times as possible)
$ # Assert position at the end of the string
As Oded suggested, use \p{P} instead of [.:!?,;-] if you want to remove all punctuation characters, not just the ones from your list.
To also "store" the punctuation, you could split the string:
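splitArray = Regex.Split(subjectString, @"(?<!\p{P})(?=\p{P}+$)");
Then splitArray[0] contains the part before the punctuation and splitArray[1] the punctuation characters, if there are any. The (?<!\p{P}) lookbehind restricts the split to the start of the trailing run; with the lookahead alone, the pattern also matches before every later punctuation character and breaks a run such as ?:? into single characters.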
Using Linq:
var punctuationMap = new HashSet<char>(new char[] { '.', ':', '-', '!', '?', ',', ';' });
var endPunctuationChars = aString.Reverse().
TakeWhile(ch => punctuationMap.Contains(ch));
var result = new string(endPunctuationChars.Reverse().ToArray());
The HashSet is not mandatory, you can use Linq's Contains on the array directly.

String.Split cut separator

Is it possible to use String.Split without cutting separator from string?
For example I have string
convertSource = "http://www.domain.com http://www.domain1.com";
I want to build array and use code below
convertSource.Split(new[] { " http" }, StringSplitOptions.RemoveEmptyEntries)
I get such array
[1] http://www.domain.com
[2] ://www.domain1.com
I would like to keep the http. It seems String.Split not only separates the string but also cuts off the separator.
This is screaming for Regular Expressions:
Regex regEx = new Regex(@"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)");
Match match = regEx.Match("http://www.domain.com http://www.domain1.com");
IList<string> values = new List<string>();
while (match.Success)
{
values.Add(match.Value);
match = match.NextMatch();
}
string[] array = Regex.Split(convertSource, @"(?=http://)");
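A lookahead asserts that "http://" follows without consuming it, which is why the separator survives the split. As a variation on the line above (an assumption, not the answerer's exact pattern), letting the regex consume the whitespace in front of the lookahead leaves no trailing spaces on the pieces:

```csharp
using System;
using System.Text.RegularExpressions;

string convertSource = "http://www.domain.com http://www.domain1.com";

// \s+ is consumed as the delimiter; (?=http://) only checks what comes next,
// so "http://" stays attached to the following piece.
string[] urls = Regex.Split(convertSource, @"\s+(?=http://)");

foreach (string url in urls)
    Console.WriteLine(url);
// http://www.domain.com
// http://www.domain1.com
```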
That's because you use " http" as separator.
Try this:
string separator = " ";
convertSource.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
The Split method works by cutting the string wherever it finds the separator you provide, and it removes the separator from the result as well.
From what you describe, there are other ways to split the string while keeping the delimiters. If you only want to remove leading or trailing spaces from your string, I would suggest the .Trim() method: convertSource.Trim()

Split a string by word using one of any or all delimiters?

I may have just hit the point where I'm overthinking it, but I'm wondering: is there a way to designate a list of special characters that should all be considered delimiters, and then split a string using that list? Example:
"battlestar.galactica-season 1"
should be returned as
battlestar galactica season 1
I'm thinking regex, but I'm kind of flustered at the moment; I've been staring at it for too long.
EDIT:
Thanks, guys, for confirming my suspicion that I was overthinking it, lol. Here is what I ended up with:
//remove the delimiter
string[] tempString = fileTitle.Split(@"\/.-<>".ToCharArray());
fileTitle = "";
foreach (string part in tempString)
{
fileTitle += part + " ";
}
return fileTitle;
I suppose I could just replace the delimiters with " " spaces as well... I will select an answer as soon as the timer is up!
The built-in String.Split method can take a collection of characters as delimiters.
string s = "battlestar.galactica-season 1";
string[] words = s.Split('.', '-');
The standard split method does that for you. It takes an array of characters:
public string[] Split(
params char[] separator
)
You can just call an overload of split:
myString.Split(new char[] { '.', '-', ' ' }, StringSplitOptions.RemoveEmptyEntries);
The char array is a list of delimiters to split on.
"battlestar.galactica-season 1".Split(new string[] { ".", "-" }, StringSplitOptions.RemoveEmptyEntries);
This may not be complete but something like this.
string value = "battlestar.galactica-season 1";
char[] delimiters = new char[] { '\r', '\n', '.', '-' };
string[] parts = value.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < parts.Length; i++)
{
Console.WriteLine(parts[i]);
}
Are you trying to split the string (make multiple strings), or do you just want to replace the special characters with a space, as your example might also suggest (make one altered string)?
For the first option just see the other answers :)
If you want to replace you could use
string title = "battlestar.galactica-season 1".Replace('.', ' ').Replace('-', ' ');
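The two routes can be compared directly. This is a minimal sketch that assumes '.' and '-' are the only delimiters of interest. Joining the split pieces with string.Join avoids the trailing space that appending part + " " in a loop leaves behind.

```csharp
using System;

string fileTitle = "battlestar.galactica-season 1";
char[] delimiters = { '.', '-' };  // assumed delimiter list

// Route 1: split on the delimiters, then join the pieces with single spaces.
string[] words = fileTitle.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
string joined = string.Join(" ", words);

// Route 2: replace each delimiter with a space directly.
string replaced = fileTitle.Replace('.', ' ').Replace('-', ' ');

Console.WriteLine(joined);              // battlestar galactica season 1
Console.WriteLine(joined == replaced);  // True
```

The two results differ when delimiters are adjacent: the split route collapses them, while Replace produces one space per delimiter.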
For more information on Split, with easy examples, see the following page, which also covers splitting on words (multiple characters):
C# Split Function explained
