Split string with specific requirements - c#

Let's say I have the string
string Song = "The-Sun - Is Red";
I need to split it on the '-' character, but only when the characters before and after it are both spaces.
I don't want it to split at the dash in "The-Sun", but rather at the dash in "Sun - Is".
The code I was using to split was
string[] SongTokens = Song.Split('-');
But that splits at every '-', including the one in "The-Sun". I only need to split where the '-' has a space before and after it.
Thanks

You can use a non-regex solution like this:
string[] SongTokens = Song.Split(new[] {" - "}, StringSplitOptions.RemoveEmptyEntries);
Result: an array with two elements, "The-Sun" and "Is Red".
See more details about the String.Split Method (String[], StringSplitOptions) on MSDN. The first argument, separator, is a string array that delimits the substrings in this string, an empty array that contains no delimiters, or null.
The StringSplitOptions.RemoveEmptyEntries removes all empty elements from the resulting array. You may use StringSplitOptions.None to keep the empty elements.
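A minimal runnable sketch of this overload (C# top-level statements; the helper name SplitOnSpacedDash is hypothetical). The second input is an assumed example with two separators back to back, which leaves an empty string between them: StringSplitOptions.None keeps it, RemoveEmptyEntries drops it.

```csharp
using System;

// Splits on the three-character separator " - " (space, dash, space).
string[] SplitOnSpacedDash(string text, StringSplitOptions options) =>
    text.Split(new[] { " - " }, options);

string song = "The-Sun - Is Red";
Console.WriteLine(string.Join(" | ", SplitOnSpacedDash(song, StringSplitOptions.None))); // The-Sun | Is Red

// Two separators in a row leave an empty string between them.
string doubled = "A" + " - " + " - " + "B"; // "A -  - B"
Console.WriteLine(SplitOnSpacedDash(doubled, StringSplitOptions.None).Length);               // 3
Console.WriteLine(SplitOnSpacedDash(doubled, StringSplitOptions.RemoveEmptyEntries).Length); // 2
```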
However, splitting on " - " fails when the dash is surrounded by hard (non-breaking) spaces, or by more than one space on either side. In that case, a regex solution like this is a better choice:
string[] SongTokens = Regex.Split(Song, @"\p{Zs}+-\p{Zs}+")
.Where(x => !String.IsNullOrWhiteSpace(x))
.ToArray();
The \p{Zs}+ pattern matches one or more characters from the Unicode space-separator category (Zs), such as the regular space and the non-breaking space; it does not match tabs or line breaks.
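A sketch contrasting the two approaches on an assumed input whose dash is surrounded by non-breaking spaces (U+00A0); SplitOnSpacedDash is a hypothetical helper name. Split(" - ") finds no separator and returns the whole string, while the \p{Zs}+ pattern matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Splits on a dash surrounded by one or more Unicode space separators (category Zs),
// e.g. a regular space (U+0020) or a non-breaking space (U+00A0).
string[] SplitOnSpacedDash(string text) =>
    Regex.Split(text, @"\p{Zs}+-\p{Zs}+")
         .Where(x => !string.IsNullOrWhiteSpace(x))
         .ToArray();

// The separator here uses non-breaking spaces, which Split(" - ") does not match.
string song = "The-Sun\u00A0-\u00A0Is Red";
Console.WriteLine(song.Split(new[] { " - " }, StringSplitOptions.None).Length); // 1
Console.WriteLine(string.Join(" | ", SplitOnSpacedDash(song)));                 // The-Sun | Is Red
```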

string[] SongTokens = Song.Split(new string[] {" - "}, StringSplitOptions.None);

Related

Need to create a Regular expression to Split String after first \r\n

I am stuck on the following problem.
Here are a few input strings:
"abacuses\r\n25"
"alphabet\r\n56,\r\n57"
"animals\r\n44,\r\n45,\r\n47"
I need the output to be split like this:
"abacuses\r\n25" should be split into A) abacuses B) 25
"alphabet\r\n56,\r\n57" should be split into A) alphabet B) 56,57
"animals\r\n44,\r\n45,\r\n47" should be split into A) animals B) 44,45,47
So far I have tried the following, but it doesn't work:
string[] ina = Regex.Split(indexname, @"\r\n\D+");
string[] ina = Regex.Split(indexname, @"\r\n\");
Please help.
No regex is needed for your example; you are basically parsing a string:
string input = "animals\r\n44,\r\n45,\r\n47";
var split = input.Split(new char[]{'\r','\n',','}, StringSplitOptions.RemoveEmptyEntries);
var name = split[0]; //animals
var args = string.Join(",", split.Skip(1)); //44,45,47
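The same parsing wrapped in a small helper and run over all three sample inputs from the question. ParseEntry is a hypothetical name, and the sketch assumes every input contains at least the name token.

```csharp
using System;
using System.Linq;

// Splits on CR, LF and comma; the first token is the name, the rest are the numbers.
(string Name, string Numbers) ParseEntry(string input)
{
    string[] split = input.Split(new[] { '\r', '\n', ',' }, StringSplitOptions.RemoveEmptyEntries);
    return (split[0], string.Join(",", split.Skip(1)));
}

foreach (string entry in new[] { "abacuses\r\n25", "alphabet\r\n56,\r\n57", "animals\r\n44,\r\n45,\r\n47" })
{
    var (name, numbers) = ParseEntry(entry);
    Console.WriteLine($"A) {name} B) {numbers}");
}
```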
Many people use regex for parsing, but regex is not a parsing language; it is a pattern matcher, used to find substrings in a string. If you can simply Split your string, just do that: it is much easier to understand than a regular expression.
If you need to split a string at the first \r\n, you can use String.Split with a count argument:
var line = "animals\r\n44,\r\n45,\r\n47";
var res = line
.Split(new[] {"\r\n"}, 2, StringSplitOptions.RemoveEmptyEntries);
// Demo output
Console.WriteLine(res[0]);
if (res.GetLength(0) > 1)
Console.WriteLine(res[1].Replace("\r\n", "")); // In the second value, linebreaks should be removed
The 2 in .Split(new[] {"\r\n"}, 2, StringSplitOptions.RemoveEmptyEntries) means that the string is split into at most 2 parts. Since the string is processed from left to right, the split occurs at the first "\r\n" substring found, and everything after it goes into the second element.
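A runnable sketch of the count-limited split; SplitAtFirstLineBreak is a hypothetical helper, and it assumes non-empty input. The first WriteLine shows, for comparison, that without a count every "\r\n" becomes a split point.

```csharp
using System;

// Splits at the first "\r\n" only (count = 2), then removes the remaining
// line breaks from the second part. Assumes the input is non-empty.
(string Name, string Numbers) SplitAtFirstLineBreak(string line)
{
    string[] parts = line.Split(new[] { "\r\n" }, 2, StringSplitOptions.RemoveEmptyEntries);
    string rest = parts.Length > 1 ? parts[1].Replace("\r\n", "") : "";
    return (parts[0], rest);
}

string input = "animals\r\n44,\r\n45,\r\n47";

// Without a count, every "\r\n" is a split point.
Console.WriteLine(input.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).Length); // 4

var (name, numbers) = SplitAtFirstLineBreak(input);
Console.WriteLine(name);    // animals
Console.WriteLine(numbers); // 44,45,47
```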

Is it possible to split a string into an array of strings and remove sections not between delimiters using String.Split or regex?

I want to split a string with a known delimiter between its parts into an array of strings using a method (e.g. MethodToSplitIntoArray(String toSplit)), as in the example below. The values are strings that can contain any character except '{', '}', or ',', so I cannot delimit on any other character. The string can also contain unwanted whitespace at the start and end, as the file can be generated from multiple different sources. The desired information is always between "{" and "}", and the groups are separated by commas.
String valueCombined = " {value},{value1},{value2} ";
String[] values = MethodToSplitIntoArray(valueCombined);
foreach(String value in values)
{
//Do something with array
Label.Text += "\r\nString: " + value;
}
Where the label would show:
String: value
String: value1
String: value2
My current implementation of the splitting method is below. It splits out the values, but it also returns any spaces before the first curly brace and anything between the groups.
private String[] MethodToSplitIntoArray(String toSplit)
{
return toSplit.Split(new string[] { "{", "}" }, StringSplitOptions.RemoveEmptyEntries);
}
I thought this would separate out the strings between the curly braces and discard the rest of the string, but my output is:
String:
String: value
String: ,
String: value1
String: ,
String: value2
String:
What am I doing wrong in my split that I'm still getting the text outside of the curly braces? Ideally I would like to use regex or String.Split, if that's possible.
For those with similar problems, check out DotNet Perls on splitting.
Making the assumption that commas are not permitted inside a curly brace pair, and that outside a curly brace pair only commas or whitespace will appear, it seems to me that the most straightforward, easy-to-read way to approach this is to first split on commas, then trim the results of that (to remove whitespace), and then finally to remove the first and last characters (which at that point should only be the curly braces):
valueCombined.Split(',').Select(s => s.Trim()).Select(s => s.Substring(1, s.Length - 2)).ToArray();
I believe that including the curly braces in the initial split operation just makes everything harder, and is more likely to break in hard-to-identify ways (i.e. bad data will result in weirder results than if you use something like the above).
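A runnable sketch of the split-trim-strip approach under the same assumptions (no commas inside the braces, only commas and whitespace outside them). Trimming in a separate Select step matters: the Substring length has to be computed from the trimmed string, not from the untrimmed piece.

```csharp
using System;
using System.Linq;

string[] MethodToSplitIntoArray(string toSplit) =>
    toSplit.Split(',')
           .Select(s => s.Trim())                     // remove surrounding whitespace first
           .Select(s => s.Substring(1, s.Length - 2)) // then strip the leading '{' and trailing '}'
           .ToArray();

foreach (string value in MethodToSplitIntoArray(" {value},{value1},{value2} "))
    Console.WriteLine("String: " + value);
```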
Add ',' to the delimiters:
return toSplit.Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
I'm not sure whether you expect those spaces at the start and end, so I added a Trim() call to prevent whitespace-only results for them.
private String[] MethodToSplitIntoArray(String toSplit)
{
return toSplit.Trim().Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
}
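A quick way to see what the Trim() call buys, using the question's sample input: the outer spaces are not empty strings, so RemoveEmptyEntries alone keeps them.

```csharp
using System;

char[] delimiters = { '{', '}', ',' };
string valueCombined = " {value},{value1},{value2} ";

// Without Trim(): " " before the first brace and after the last one survive.
string[] untrimmed = valueCombined.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(untrimmed.Length); // 5

// With Trim(): only the three values remain.
string[] MethodToSplitIntoArray(string toSplit) =>
    toSplit.Trim().Split(new[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(string.Join(", ", MethodToSplitIntoArray(valueCombined))); // value, value1, value2
```

Note that Trim() only removes whitespace at the ends of the whole string; an input such as "{a}, {b}" would still produce a " " element between the two values.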
This is one way to get all the values you are looking for:
String valueCombined = " {value},{value1},{value2} ";
String[] values = valueCombined.Trim().Split(new string[] { "},{" }, StringSplitOptions.RemoveEmptyEntries);
int lastVal = values.Length - 1;
values[0] = values[0].Replace("{", "");
values[lastVal] = values[lastVal].Replace("}", "");
This splits the string on "},{" and then removes '{' from the first array item and '}' from the last array item.
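The same approach as a self-contained function (SplitOnInnerBraces is a hypothetical name). This sketch trims the input first, because otherwise the leading and trailing spaces in the sample input stay attached to the first and last values; it assumes the input contains at least one brace-delimited value.

```csharp
using System;

string[] SplitOnInnerBraces(string valueCombined)
{
    // Trim so that '{' and '}' are the very first and last characters.
    string[] values = valueCombined.Trim()
        .Split(new[] { "},{" }, StringSplitOptions.RemoveEmptyEntries);
    int lastVal = values.Length - 1;
    values[0] = values[0].Replace("{", "");             // drop the opening brace
    values[lastVal] = values[lastVal].Replace("}", ""); // drop the closing brace
    return values;
}

Console.WriteLine(string.Join(", ", SplitOnInnerBraces(" {value},{value1},{value2} "))); // value, value1, value2
```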
Try Regex and LINQ:
return Regex.Split(toSplit, "[{},]").Where(x => !string.IsNullOrWhiteSpace(x)).ToArray();
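A sketch of the Regex/LINQ approach with a character class that lists only the three delimiters. Inside [...] the braces are literal characters; a '.' in the class would also split on periods, which the question allows inside values. The sample value v.2 is an assumed input chosen to show that.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Splits on '{', '}' or ',' and drops the empty and whitespace-only pieces.
string[] MethodToSplitIntoArray(string toSplit) =>
    Regex.Split(toSplit, "[{},]")
         .Where(x => !string.IsNullOrWhiteSpace(x))
         .ToArray();

foreach (string value in MethodToSplitIntoArray(" {value},{value1},{v.2} "))
    Console.WriteLine("String: " + value);
```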
This is very late, but you can try this:
Regex.Split(" { value},{ value1},{ value2};", @"\s*},{\s*|{\s*|},?;?").Where(s => string.IsNullOrWhiteSpace(s) == false).ToArray()

Is there a way to do a string.Split on whitespace

I have a string "mystring theEnd" and I want to do a string.Split on whitespace, not just on a single space. I want to get a string[] that contains "mystring" and "theEnd", but there is an unknown number of spaces between "mystring" and "theEnd", which is why I need to split on whitespace. Is there a way to do this?
How about:
string[] bits = text.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
(Or text.Split specifying the exact whitespace characters you want to split on, or using null as Henk suggested.)
Or you could use a regex to handle all whitespace characters:
Regex regex = new Regex(@"\s+");
string[] bits = regex.Split(text);
Simplest is to do:
a.Split(new [] {' ', '\t'},StringSplitOptions.RemoveEmptyEntries)
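A sketch comparing the suggestions above on an assumed input that mixes spaces and a tab between the two words.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "mystring \t   theEnd"; // assumed sample: a space, a tab and three more spaces

// Splits on the listed characters; RemoveEmptyEntries drops the empty strings
// that appear between consecutive separators.
string[] bySpaceAndTab = text.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

// A null separator array means "split on every character for which char.IsWhiteSpace is true".
string[] byAnyWhitespace = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

// \s+ treats each run of whitespace as a single separator.
string[] byRegex = Regex.Split(text, @"\s+");

Console.WriteLine(string.Join("|", bySpaceAndTab));   // mystring|theEnd
Console.WriteLine(string.Join("|", byAnyWhitespace)); // mystring|theEnd
Console.WriteLine(string.Join("|", byRegex));         // mystring|theEnd
```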
Thanks Jon :)

Regex + Convert line of numbers separated by white space into array

I'm trying to convert a string that contains multiple numbers, where each number is separated by white space, into a double array.
For example, the original string looks like:
originalString = "50 12.2 30 48.1"
I've been using Regex.Split(originalString, @"\s*"), but it's returning an array that looks like:
[50
""
12
"."
2
""
...]
Any help is much appreciated.
Use this instead:
originalString.Split(new char[]{'\t', '\n', ' ', '\r'}, StringSplitOptions.RemoveEmptyEntries);
No need to reach for RegEx every time :)
What about string[] myArray = originalString.Split(' '); instead?
I don't see the need for a RegEx here.
If you really want to use a RegEx, use the pattern \s+ instead of \s*.
The * means zero or more, but you want to split on one or more whitespace characters.
Working example with a RegEx:
string originalString = "50 12.2 30 48.1";
string[] arr = Regex.Split(originalString, @"\s+");
foreach (string s in arr)
Console.WriteLine(s);
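One caveat with Regex.Split and \s+: a match at the very start or end of the input produces an empty first or last element. The padded input below is an assumed example.

```csharp
using System;
using System.Text.RegularExpressions;

// \s+ treats each run of whitespace as one separator...
string[] parts = Regex.Split("50 12.2   30 48.1", @"\s+");
Console.WriteLine(string.Join("|", parts)); // 50|12.2|30|48.1

// ...but a match at the start or end of the input yields an empty first/last element.
string[] padded = Regex.Split("  50 12.2 ", @"\s+");
Console.WriteLine(padded.Length); // 4: "", "50", "12.2", ""
```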
Regex.Split(originalString, @"\s+").Where(s => !string.IsNullOrWhiteSpace(s))
The Where returns an IEnumerable<string> with the empty and whitespace-only entries filtered out. If you still want an array, just add .ToArray() to that chain of calls.
The + character is necessary because you need a MINIMUM of one to make this a correct match.
I would stick with String.Split, supplying all whitespace characters that you are expecting.
In regular expressions, \s is equivalent to [ \t\r\n] (plus some other characters specific to the flavour in use); we can represent these through a char[]:
string[] nums = originalString.Split(
new char[] { ' ', '\t', '\r', '\n' },
StringSplitOptions.RemoveEmptyEntries);
The default behaviour if you pass null as a separator to String.Split is to split on whitespace. That includes anything that matches the Unicode IsWhiteSpace test. Within the ASCII range that means tab, line feed, vertical tab, form feed, carriage return and space.
Also you can avoid empty fields by passing the RemoveEmptyEntries option.
originalString = "50 12.2 30 48.1";
string[] fields = originalString.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
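Since the question ultimately asks for a double array, here is a sketch that finishes the conversion after splitting. ParseNumbers is a hypothetical name, and the sketch assumes the numbers use '.' as the decimal separator, hence CultureInfo.InvariantCulture.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Splits on any whitespace (null separator), drops empty entries, then parses each token.
// InvariantCulture makes '.' the decimal separator regardless of the machine's locale.
double[] ParseNumbers(string originalString) =>
    originalString
        .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
        .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
        .ToArray();

double[] numbers = ParseNumbers("50 12.2 30 48.1");
Console.WriteLine(numbers.Length); // 4
Console.WriteLine(string.Join(", ", numbers.Select(n => n.ToString(CultureInfo.InvariantCulture)))); // 50, 12.2, 30, 48.1
```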

Quick way of splitting a mixed alphanum string into text and numeric parts?

Say I have a string such as
abc123def456
What's the best way to split the string into an array such as
["abc", "123", "def", "456"]
string input = "abc123def456";
Regex re = new Regex(@"\D+|\d+");
string[] result = re.Matches(input).OfType<Match>()
.Select(m => m.Value).ToArray();
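The Matches approach as a small reusable helper (SplitAlphaNum is a hypothetical name). Because every character is either a digit or a non-digit, the matches cover the whole input and no empty elements are produced.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each match is either a run of non-digits (\D+) or a run of digits (\d+).
string[] SplitAlphaNum(string input) =>
    Regex.Matches(input, @"\D+|\d+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join(", ", SplitAlphaNum("abc123def456"))); // abc, 123, def, 456
```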
string[] result = Regex.Split("abc123def456", "([0-9]+)");
The above uses any run of digits as the delimiter; wrapping the pattern in parentheses (a capturing group) tells Regex.Split to keep the delimiters in the returned array as well.
Note: In the example snippet we will get an empty element as the last entry of our array.
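A sketch showing the effect of the capturing group and the trailing empty element on the same input.

```csharp
using System;
using System.Text.RegularExpressions;

// With a capturing group, the matched digit runs are included in the result.
string[] withCapture = Regex.Split("abc123def456", "([0-9]+)");
Console.WriteLine(withCapture.Length);            // 5
Console.WriteLine(string.Join("|", withCapture)); // abc|123|def|456|

// Without the parentheses, the digit runs are dropped.
string[] withoutCapture = Regex.Split("abc123def456", "[0-9]+");
Console.WriteLine(string.Join("|", withoutCapture)); // abc|def|
```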
The boundary you look for can be described as "A position where a digit follows a non-digit, or where a non-digit follows a digit."
So:
string[] result = Regex.Split("abc123def456", @"(?<=\D)(?=\d)|(?<=\d)(?=\D)");
Use [0-9] and [^0-9], respectively, if \d and \D are not specific enough.
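A sketch of the lookaround split (SplitAtDigitBoundaries is a hypothetical name). The split points are zero-width, so no characters are consumed and, unlike the capturing-group version, no trailing empty element appears; the second input is an assumed example.

```csharp
using System;
using System.Text.RegularExpressions;

// Zero-width split points: between a non-digit and a digit, or between a digit and a non-digit.
string[] SplitAtDigitBoundaries(string input) =>
    Regex.Split(input, @"(?<=\D)(?=\d)|(?<=\d)(?=\D)");

Console.WriteLine(string.Join("|", SplitAtDigitBoundaries("abc123def456"))); // abc|123|def|456
Console.WriteLine(string.Join("|", SplitAtDigitBoundaries("1a22b")));        // 1|a|22|b
```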
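Add spaces around each run of digits, then split on the spaces (in .NET replacement strings, a captured group is referenced as $1, not \1):
Regex.Replace("abc123def456", @"(\d+)", " $1 ").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);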
I hope it works.
You could convert the string to a char array and then loop through the characters. As long as the characters are of the same type (letter or number) keep adding them to a string. When the next character no longer is of the same type (or you've reached the end of the string), add the temporary string to the array and reset the temporary string to null.
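A sketch of this loop-based approach (SplitByCharType is a hypothetical name). It treats every non-digit character as "text", uses a StringBuilder as the temporary buffer instead of string concatenation, and flushes the buffer whenever the character type changes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Walks the string once, starting a new token whenever the character type
// (digit vs. non-digit) differs from the previous character.
string[] SplitByCharType(string input)
{
    var result = new List<string>();
    var current = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        // Close the current token when the type changes.
        if (i > 0 && char.IsDigit(input[i]) != char.IsDigit(input[i - 1]))
        {
            result.Add(current.ToString());
            current.Clear();
        }
        current.Append(input[i]);
    }
    if (current.Length > 0)
        result.Add(current.ToString()); // flush the last token
    return result.ToArray();
}

Console.WriteLine(string.Join(", ", SplitByCharType("abc123def456"))); // abc, 123, def, 456
```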
