Split string and keeping the delimiter without Regex C# - c#

Problem: spending too much time solving simple problems. Oh, here's the simple problem.
Input: string inStr, char delimiter
Output: string[] outStrs where string.Join("", outStrs) == inStr and each item in outStrs before the last item must end with the delimiter. If inStr ends with the delimiter, then the last item in outStrs ends with the delimiter as well.
Example 1:
Input: "my,string,separated,by,commas", ','
Output: ["my,", "string,", "separated,", "by,", "commas"]
Example 2:
Input: "my,string,separated,by,commas,", ','
Output: ["my,", "string,", "separated,", "by,", "commas,"] (notice trailing comma)
Solution with Regex: here
I want to avoid using Regex, simply because this requires only character comparison. It's algorithmically just as complex to do as what string.Split() does. It bothers me that I cannot find a more succinct way to do what I want.
My bad solution, which doesn't work for me... it should be faster and more succinct.
var outStr = inStr.Split(new[]{delimiter},
StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + delimiter).ToArray();
if (inStr.Last() != delimiter) {
var lastOutStr = outStr.Last();
outStr[outStr.Length-1] = lastOutStr.Substring(0, lastOutStr.Length-1);
}

Using LINQ:
string input = "my,string,separated,by,commas";
string[] groups = input.Split(',');
string[] output = groups
.Select((x, idx) => x + (idx < groups.Length - 1 ? "," : string.Empty))
.Where(x => x != "")
.ToArray();
Split the string into groups, then transform every group that is not the last element by appending a comma to it.
Just thought of another way you could do it, but I don't think this method is as clear:
string[] output = (input + ',').Split( new[] { "," }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + ',').ToArray();

Seems pretty simple to me without using Regex:
string inStr = "dasdasdas";
char delimiter = 'A';
string[] result = inStr.Split(new string[] { inStr }, System.StringSplitOptions.RemoveEmptyEntries);
string lastItem = result[result.Length - 1];
int amountOfLoops = lastItem[lastItem.Length - 1] == delimiter ? result.Length - 1 : result.Length - 2;
for (int i = 0; i < amountOfLoops; i++)
{
result[i] += delimiter;
}

public static IEnumerable<string> SplitAndKeep(this string s, string[] delims)
{
int start = 0, index;
string selectedSeperator = null;
while ((index = s.IndexOfAny(delims, start, out selectedSeperator)) != -1)
{
if (selectedSeperator == null)
continue;
if (index - start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, selectedSeperator.Length);
start = index + selectedSeperator.Length;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}

Related

Regular expression to split a string (which contains newline characters) into equal length chunks

I have got a string let's say
Test Subject\r\nTest Comments...
I want to write a regular expression which would split the string to chunks of n characters say n=6 and the split process should not be affected by newline characters (\r\n).
The code which i have come up with is
string pattern = ".{1," + 6 + "}";
string noteDetails = "Test Subject\r\nTest Comments...";
List<string> noteComments = Regex.Matches(noteDetails, pattern).Cast<Match>().Select(x => x.Value).ToList();`
But the output which i am getting is
Test S
ubject
Test C
omment
s...
The desired output is
Test S
ubject
\r\nTe
st Com
ments.
..
If \r\n is not present then the code works fine. The bottom line is \r\n should also be considered as normal characters.
Thanks in advance
You do not need regex. Use string methods :
string input = "Test Subject\nTest Comment";
string[] results = input.ToCharArray()
.Where(x => x != '\n')
.Select((x, i) => new { chr = x, index = i })
.GroupBy(x => x.index / 6)
.Select(x => string.Join("", x.Select(y => y.chr)))
.ToArray();
A second more traditional approach, because Regex is rarely the best choice:
var stringToSplit = #"Test Subject\r\nTest Comments...";
var length = stringToSplit.Length;
var lineLength = 6;
var lastIndex = 0;
for(int i = 0; i < length - lineLength ; i+= lineLength)
{
lastIndex = i;
Console.WriteLine(stringToSplit.Substring(i, lineLength));
}
if (lastIndex < length)
{
Console.WriteLine(stringToSplit.Substring(lastIndex + lineLength, (length - (lastIndex + lineLength))));
}
And the output:
Test S
ubject
\r\nTe
st Com
ments.
..

Best way to split string into lines with maximum length, without breaking words

I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}
Even when this post is 3 years old I wanted to give a better solution using Regex to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$#"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}
~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start); ;
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms
My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}
An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}

How to split string with date in c#

i have string with date , i want to split it with date and string
For example :
I have this type of strings data
9/23/2013/marking abandoned based on notes below/DB
12/8/2012/I think the thid is string/SG
and i want to make it like as
9/23/2013 marking abandoned based on notes below/DB
12/8/2013 I think the thid is string/SG
so, i don't know how to split these strings and store in different columns of table.
pls help me.
string[] vals = { "9/23/2013/marking abandoned based on notes below/DB",
"12/8/2012/I think the thid is string/SG" };
var regex = #"(\d{1,2}/\d{1,2}/\d{4})/(.*)";
var matches = vals.Select(val => Regex.Match(vals, regex));
foreach (var match in matches)
{
Console.WriteLine ("{0} {1}", match.Groups[1], match.Groups[2]);
}
prints:
9/23/2013 marking abandoned based on notes below/DB
12/8/2012 I think the thid is string/SG
(\d{1,2}/\d{1,2}/\d{4})/(.*) breaks down to
(\d{1,2}/\d{1,2}/\d{4}):
\d{1,2} - matches any one or two digit number
/ - matches to one / symbol
\d{4} - matches to four digit number
(...) - denotes first group
(.*) - matches everything else and creates second group
Another way to do it with LINQ:
var inputs = new[]{
"9/23/2013/marking abandoned based on notes below/DB",
"12/8/2012/I think the thid is string/SG"
};
foreach (var item in inputs)
{
int counter = 0;
var r = item.Split('/')
.Aggregate("", (a, b) =>
a + ((counter++ == 3) ? "\t" : ((counter == 1) ? "" : "/")) + b);
Console.WriteLine(r);
}
Or you may use the IndexOf and Substring methods:
foreach (var item in inputs)
{
var lastPos =
item.IndexOf('/',
1 + item.IndexOf('/',
1 + item.IndexOf('/')));
if (lastPos != -1)
{
var r = String.Join("\t",
item.Substring(0, lastPos),
item.Substring(lastPos + 1, item.Length - lastPos - 1));
Console.WriteLine(r);
}
}
Perhaps with pure string methods, the third slash separates the date and the text:
string line = "9/23/2013/marking abandoned based on notes below/DB";
int slashIndex = line.IndexOf('/');
if(slashIndex >= 0)
{
int slashCount = 1;
while(slashCount < 3 && slashIndex >= 0)
{
slashIndex = line.IndexOf('/', slashIndex + 1);
if(slashIndex >= 0) slashCount++;
}
if(slashCount == 3)
{
Console.WriteLine("Date:{0} Text: {1}"
, line.Substring(0, slashIndex)
, line.Substring(slashIndex +1));
}
}
For what it's worth, here is a extension method to "break" a string in half on nth occurence of astring:
public static class StringExtensions
{
public static string[] BreakOnNthIndexOf(this string input, string value, int breakOn, StringComparison comparison)
{
if (breakOn <= 0)
throw new ArgumentException("breakOn must be greater than 0", "breakOn");
if (value == null) value = " "; // fallback on white-space
int slashIndex = input.IndexOf(value, comparison);
if (slashIndex >= 0)
{
int slashCount = 1;
while (slashCount < breakOn && slashIndex >= 0)
{
slashIndex = input.IndexOf(value, slashIndex + value.Length, comparison);
if (slashIndex >= 0) slashCount++;
}
if (slashCount == breakOn)
{
return new[] {
input.Substring(0, slashIndex),
input.Substring(slashIndex + value.Length)
};
}
}
return new[]{ input };
}
}
Use it in this way:
string line1 = "9/23/2013/marking abandoned based on notes below/DB";
string line2 = "12/8/2012/I think the thid is string/SG";
string[] res1 = line1.BreakOnNthIndexOf("/", 3, StringComparison.OrdinalIgnoreCase);
string[] res2 = line2.BreakOnNthIndexOf("/", 3, StringComparison.OrdinalIgnoreCase);

remove chars in string except last

I'm tring to remove the char '.' from a string except the last occurrence; for example the string
12.34.56.78
should became
123456.78
I'm using this loop:
while (value != null && value.Count(c => c == '.') > 1)
{
value = value.Substring(0, value.IndexOf('.')) + value.Substring(value.IndexOf('.') + 1);
}
I wonder if there is a cleaner way (maybe using linq?) to do this whitout an explicit loop?
(I know there is a very similar question but is about perl and things are quite different)
int lastIndex = value.LastIndexOf('.');
if (lastIndex > 0)
{
value = value.Substring(0, lastIndex).Replace(".", "")
+ value.Substring(lastIndex);
}
Perhaps a mixture of string methods and Linq:
string str = "12.34.56.78";
Char replaceChar = '.';
int lastIndex = str.LastIndexOf(replaceChar);
if (lastIndex != -1)
{
IEnumerable<Char> chars = str
.Where((c, i) => c != replaceChar || i == lastIndex);
str = new string(chars.ToArray());
}
Demo
I would do that way:
search for the last '.' ;
substring [0 .. indexOfLastDot] ;
remove in place any '.' of the substring
concatenate the substring with the rest of the original string, [indexOfLastDot .. remaining]
OR
search for the last '.'
for each enumerated char of the string
if it’s a '.' and i ≠ indexOfLastDot, remove it
var splitResult = v.Split(new char[] { '.' }).ToList();
var lastSplit = splitResult.Last();
splitResult.RemoveAt(splitResult.Count - 1);
var output = string.Join("", splitResult) + "." + lastSplit;
I would do it that way. The neatest way isn't always the shortest way.
Something like this should do the trick. Whether it is "good" or not is another matter. Note also that there is no error checking. Might want to check for null or empty string and that the string has at least one "." in it.
string numbers = "12.34.56.78";
var parts = String.Split(new char [] {'.'});
string newNumbers = String.Join("",parts.Take(parts.Length-1)
.Concat(".")
.Concat(parts.Last());
I don't claim that this would have great performance characteristics for long strings, but it does use Linq ;-)
you do not have to use loop:
//string val = "12345678";
string val = "12.34.56.78";
string ret = val;
int index = val.LastIndexOf(".");
if (index >= 0)
{
ret = val.Substring(0, index).Replace(".", "") + val.Substring(index);
}
Debug.WriteLine(ret);

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

Categories

Resources