Split a string on multiple delimiters and keep them in the output

Split a string on multiple delimiters and keep them in the output - c#

I have an string that can be 2 to N chars long. I also have 4 ocode (each 2 chars long).
Is there a way to so something like:
var tmpArray = inputStr.Split(char1, char2, char3, char4).ToArray();
Say that the opcodes are A,B,C,D or 8 and I have this string AB123456789C123412341234B123 the array would be like this:
A
B
123456789
C
123412341234
B
123

This is all you need.
string toSplit = "AB123456789C123412341234B123";
string pattern = #"([ABCD])";
IEnumerable<string> substrings = Regex.Split(toSplit, pattern).Where(i => !String.IsNullOrWhiteSpace(i));
Test here: http://www.beansoftware.com/Test-Net-Regular-Expressions/Split-String.aspx
All you have to do is declare a character class [...] involving all your characters you want to split on, then you encompass that in (...) parens to keep the delimiters.

Try this,
private char[] alphabets = {'A','B','C', 'D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
var input = "AB123456789C123412341234B123";
var result = input.SplitAndKeep(alphabets).ToList();
public static class Extensions
{
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if (index - start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
}

Use Regex.Split
var str = "AB123456789C123412341234B123";
Regex r = new Regex(#"([A-Z])|(\d*)");
var parts = r.Split(str).Where(x=> !string.IsNullOrWhiteSpace(x)).ToArray();
if you want just A,B,C and D, use this
Regex r = new Regex(#"([A-D])|(\d*)");

I find that Regex lookahead/lookbehind fits this scenario. The pattern basically says to split when a single letter is found behind or ahead of the current position. Then use Linq to not return any empty spaces as part of the result, which in this sample case the empty element would be the first element.
Lookahead/Lookbehind Reference
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string data = "AB123456789C123412341234B123";
var pieces = Regex.Split(data, "(?<=[a-zA-Z])|(?=[a-zA-Z])").Where(p => !String.IsNullOrEmpty(p));
foreach (string p in pieces)
Console.WriteLine(p);
}
}
Results:
A
B
123456789
C
123412341234
B
123
Fiddle Demo

Try first splitting on A only. Then split each result on B only. Then split each result on C only etc...
Splitting on A gives you:
B123456789C123412341234B123
which you know starts with A
Splitting on B gives you:
123456789C123412341234,
123
each of which you know starts with B and so on.

Related

How to remove all but the first occurences of a character from a string?

I have a string of text and want to ensure that it contains at most one single occurrence of a specific character (,). Therefore I want to keep the first one, but simply remove all further occurrences of that character.
How could I do this the most elegant way using C#?

This works, but not the most elegant for sure :-)
string a = "12,34,56,789";
int pos = 1 + a.IndexOf(',');
return a.Substring(0, pos) + a.Substring(pos).Replace(",", string.Empty);

You could use a counter variable and a StringBuilder to create the new string efficiently:
var sb = new StringBuilder(text.Length);
int maxCount = 1;
int currentCount = 0;
char specialChar = ',';
foreach(char c in text)
if(c != specialChar || ++currentCount <= maxCount)
sb.Append(c);
text = sb.ToString();
This approach is not the shortest but it's efficient and you can specify the char-count to keep.
Here's a more "elegant" way using LINQ:
int commasFound = 0; int maxCommas = 1;
text = new string(text.Where(c => c != ',' || ++commasFound <= maxCommas).ToArray());
I don't like it because it requires to modify a variable from a query, so it's causing a side-effect.

Regular expressions are elegant, right?
Regex.Replace("Eats, shoots, and leaves.", #"(?<=,.*),", "");
This replaces every comma, as long as there is a comma before it, with nothing.
(Actually, it's probably not elegant - it may only be one line of code, but it may also be O(n^2)...)

If you don't deal with large strings and you reaaaaaaly like Linq oneliners:
public static string KeepFirstOccurence (this string #string, char #char)
{
var index = #string.IndexOf(#char);
return String.Concat(String.Concat(#string.TakeWhile(x => #string.IndexOf(x) < index + 1)), String.Concat(#string.SkipWhile(x=>#string.IndexOf(x) < index)).Replace(#char.ToString(), ""));
}

You could write a function like the following one that would split the string into two sections based on the location of what you were searching (via the String.Split() method) for and it would only remove matches from the second section (using String.Replace()) :
public static string RemoveAllButFirst(string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
You could simply call it by using :
var output = RemoveAllButFirst(input,",");
A prettier approach might actually involve building an extension method that handled this a bit more cleanly :
public static class StringExtensions
{
public static string RemoveAllButFirst(this string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the
// original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
}
which would be called via :
var output = input.RemoveAllButFirst(",");
You can see a working example of it here.

static string KeepFirstOccurance(this string str, char c)
{
int charposition = str.IndexOf(c);
return str.Substring(0, charposition + 1) +
str.Substring(charposition, str.Length - charposition)
.Replace(c, ' ').Trim();
}

Pretty short with Linq; split string into chars, keep distinct set and join back to a string.
text = string.Join("", text.Select(c => c).Distinct());

How to Return true if String array having mix of Int and String type value converted to int in C#

I have string which I split to see if any of the split value is string. If so I want to return true else false.
string words = "1 2 c 5";
Easy approach, I can follow by converting into int array and then compare value side by side.
int[] iar = words.Split(' ').Select(s => int.TryParse(s, out n) ? n : 0).ToArray();
Can any one recommend better approach?

You can simply check without using Split:
var result = words.Any(c => !char.IsWhiteSpace(c)
&& !char.IsDigit(c));
Or using Split:
var result = words.Split()
.Any(w => w.Any(c => !char.IsDigit(c)));
The point is you can use char.IsDigit to check instead of using int.Parse or int.TryParse.

You could do this with a simple little method:
public static bool CheckForNum(string[] wordsArr)
{
int i = 0;
foreach (string s in wordsArr)
{
if (Int32.TryParse(s, out i))
{
return true;
}
}
return false;
}
Using:
bool result = CheckForNum(words.Split(' '));
Console.Write(result);

Why not use a regular expression? If a string has words and numbers in it, it must have letters and number characters. I don't entirely understand the logic in your question, so you may need to adjust the logic here.
using System;
using System.Text.RegularExpressions;
...
string words = "1 2 c 5";
Match numberMatch = Regex.Match(words, #"[0-9]", RegexOptions.IgnoreCase);
Match letterMatch = Regex.Match(words, #"[a-zA-Z]", RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (numberMatch.Success && letterMatch.Success)
{
// there are letters and numbers
}

How can I search through a string in C# and replace areas bounded by a pattern?

We tried a few solutions now that try and use XML parsers. All fail because the strings are not always 100% valid XML. Here's our problem.
We have strings that look like this:
var a = "this is a testxxx of my data yxxx and of these xxx parts yxxx";
var b = "hello testxxx world yxxx ";
"this is a testxxx3yxxx and of these xxx1yxxx";
"hello testxxx1yxxx ";
The key here is that we want to do something to the data between xxx and yxxx. In the example above I would need a function that counts words and replaces the strings with a word count.
Is there a way we can process the string a and apply a function to change the data that's between the xxx and yxxx? Any function right now as we're just trying to get an idea of how to code this.

You can use Split method:
var parts = a.Split(new[] {"xxx", "yxxx"}, StringSplitOptions.None)
.Select((s, index) =>
{
string s1 = index%2 == 1 ? string.Format("{0}{2}{1}", "xxx", "yxxx", s + "1") : s;
return s1;
});
var result = string.Join("", parts);

If it always going to xxx and yxxx, you can use regex as suggested.
var stringBuilder = new StringBuilder();
Regex regex = new Regex("xxx(.*?)yxxx");
var splitGroups = Regex.Match(a);
foreach(var group in splitGroups)
{
var value = splitGroupsCopy[i];
// do something to value and then append it to string builder
stringBuilder.Append(string.Format("{0}{1}{2}", "xxx", value, "yxxx"));
}
I suppose this is as basic as it gets.

Using Regex.Replace will replace all the matches with your choice of text, something like this:
Regex rgx = new Regex("xxx.+yxxx");
string cleaned = rgx.Replace(a, "replacementtext");

This code will process each of the parts delimited by "xxx". It preserves the "xxx" separators. If you do not want to preserve the "xxx" separators, remove the two lines that say "result.Append(separator);".
Given:
"this is a testxxx of my data yxxx and there are many of these xxx parts yxxx"
It prints:
"this is a testxxx>> of my data y<<xxx and there are many of these xxx>> parts y<<xxx"
I'm assuming that's the kind of thing you want. Add your own processing to "processPart()".
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
string separator = "xxx";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator, start + separator.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator.Length;
result.Append(separator);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator);
index = end + separator.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return ">>" + part + "<<";
}
}
}
[EDIT] Here's the code amended to work with two different separators:
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a test<pre> of my data y</pre> and there are many of these <pre> parts y</pre>";
string separator1 = "<pre>";
string separator2 = "</pre>";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator1, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator2, start + separator1.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator1.Length;
result.Append(separator1);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator2);
index = end + separator2.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return "|" + part + "|";
}
}
}

The indexOf() function will return to you the index of the first occurrence of a given substring.
(My indices might be a bit off, but) I would suggest doing something like this:
var searchme = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
var startindex= searchme.indexOf("xxx");
var endindex = searchme.indexOf("yxxx") + 3; //added 3 to find the index of the last 'x' instead of the index of the 'y' character
var stringpiece = searchme.substring(startindex, endindex - startindex);
and you can repeat that while startindex != -1
Like I said, the indices might be slightly off, you might have to add a +1 or -1 somewhere, but this will get you along nicely (I think).
Here is a little sample program that counts chars instead of words. But you should just need to change the processor function.
var a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
a = ProcessString(a, CountChars);
string CountChars(string a)
{
return a.Length.ToString();
}
string ProcessString(string a, Func<string, string> processor)
{
int idx_start, idx_end = -4;
while ((idx_start = a.IndexOf("xxx", idx_end + 4)) >= 0)
{
idx_end = a.IndexOf("yxxx", idx_start + 3);
if (idx_end < 0)
break;
var string_in_between = a.Substring(idx_start + 3, idx_end - idx_start - 3);
var newString = processor(string_in_between);
a = a.Substring(0, idx_start + 3) + newString + a.Substring(idx_end, a.Length - idx_end);
idx_end -= string_in_between.Length - newString.Length;
}
return a;
}

I would use Regex Groups:
Here my solution to get the parts in the string:
private static IEnumerable<string> GetParts( string searchFor, string begin, string end ) {
string exp = string.Format("({0}(?<searchedPart>.+?){1})+", begin, end);
Regex regex = new Regex(exp);
MatchCollection matchCollection = regex.Matches(searchFor);
foreach (Match match in matchCollection) {
Group #group = match.Groups["searchedPart"];
yield return #group.ToString();
}
}
you can use it like to get the parts:
string a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
IEnumerable<string> parts = GetParts(a, "xxx", "yxxx");
To replace the parts in the original String you can use the Regex Group to determine Length and StartPosition (#group.Index, #group.Length).

Getting parts of a string and combine them in C#?

I have a string like this: C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg
Now, what I want to do is to dynamically combine the last 4 numbers, in this case its 10000080 as result. My idea was ti split this and combine them in some way, is there an easier way? I cant rely on the array index, because the path can be longer or shorter as well.
Is there a nice way to do that?
Thanks :)

A compact way using string.Join and Regex.Split.
string text = #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string newString = string.Join(null, Regex.Split(text, #"[^\d]")); //10000080

Use String.Split
String toSplit = "C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
String[] parts = toSplit.Split(new String[] { #"\" });
String result = String.Empty;
for (int i = 5, i > 1; i--)
{
result += parts[parts.Length - i];
}
// Gives the result 10000080

You can rely on array index if the last part always is the filename.
since the last part is always
array_name[array_name.length - 1]
the 4 parts before that can be found by
array_name[array_name.length - 2]
array_name[array_name.length - 3]
etc

If you always want to combine the last four numbers, split the string (use \ as the separator), start counting from the last part and take 4 numbers, or the 4 almost last parts.
If you want to take all the digits, just scan the string from start to finish and copy just the digits to a new string.

string input = "C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
string[] parts = toSplit.Split(new char[] {'\\'});
IEnumerable<string> reversed = parts.Reverse();
IEnumerable<string> selected = reversed.Skip(1).Take(4).Reverse();
string result = string.Concat(selected);
The idea is to extract the parts, reverse them to keep only the last 4 (excluding the file name) and re reversing to rollback to the initial order, then concat.

Using LINQ:
string path = #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg";
var parts = Path.GetDirectoryName(path).Split('\\');
string numbersPart = parts.Skip(parts.Count() - 4)
.Aggregate((acc, next) => acc + next);
Result: "10000080"

var r = new Regex(#"[^\d+]");
var match = r
.Split(#"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg")
.Aggregate((i, j) => i + j);
return match.ToString();

to find the number you can use regex:
(([0-9]{2})\\){4}
use concat all inner Group ([0-9]{2}) to get your searched number.
This will always find your searched number in any position in the given string.
Sample Code:
static class TestClass {
static void Main(string[] args) {
string[] tests = { #"C:\Projects\test\whatever\files\media\10\00\00\80\test.jpg",
#"C:\Projects\test\whatever\files\media\10\00\00\80\some\foldertest.jpg",
#"C:\10\00\00\80\test.jpg",
#"C:\10\00\00\80\test.jpg"};
foreach (string test in tests) {
int number = ExtractNumber(test);
Console.WriteLine(number);
}
Console.ReadLine();
}
static int ExtractNumber(string path) {
Match match = Regex.Match(path, #"(([0-9]{2})\\){4}");
if (!match.Success) {
throw new Exception("The string does not contain the defined Number");
}
//get second group that is where the number is
Group #group = match.Groups[2];
//now concat all captures
StringBuilder builder = new StringBuilder();
foreach (var capture in #group.Captures) {
builder.Append(capture);
}
//pares it as string and off we go!
return int.Parse(builder.ToString());
}
}

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?

If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.

If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2

Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}

Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on

This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");

A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}

This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}

Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}

I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.

To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");

result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)

Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.

veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.

using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}

I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two

I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());

I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Split a string on multiple delimiters and keep them in the output - c#

Use Regex.Split var str = "AB123456789C123412341234B123"; Regex r = new Regex(#"([A-Z])|(\d)"); var parts = r.Split(str).Where(x=> !string.IsNullOrWhiteSpace(x)).ToArray(); if you want just A,B,C and D, use this Regex r = new Regex(#"([A-D])|(\d)");

Try first splitting on A only. Then split each result on B only. Then split each result on C only etc... Splitting on A gives you: B123456789C123412341234B123 which you know starts with A Splitting on B gives you: 123456789C123412341234, 123 each of which you know starts with B and so on.

Related

How to remove all but the first occurences of a character from a string?

How to Return true if String array having mix of Int and String type value converted to int in C#

How can I search through a string in C# and replace areas bounded by a pattern?

Getting parts of a string and combine them in C#?

C# split string but keep split chars / separators [duplicate]

Categories

Resources

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Split a string on multiple delimiters and keep them in the output - c#

Use Regex.Split var str = "AB123456789C123412341234B123"; Regex r = new Regex(#"([A-Z])|(\d*)"); var parts = r.Split(str).Where(x=> !string.IsNullOrWhiteSpace(x)).ToArray(); if you want just A,B,C and D, use this Regex r = new Regex(#"([A-D])|(\d*)");

Try first splitting on A only. Then split each result on B only. Then split each result on C only etc... Splitting on A gives you: B123456789C123412341234B123 which you know starts with A Splitting on B gives you: 123456789C123412341234, 123 each of which you know starts with B and so on.

Related

How to remove all but the first occurences of a character from a string?

How to Return true if String array having mix of Int and String type value converted to int in C#

How can I search through a string in C# and replace areas bounded by a pattern?

Getting parts of a string and combine them in C#?

C# split string but keep split chars / separators [duplicate]

Categories

Resources

Use Regex.Split var str = "AB123456789C123412341234B123"; Regex r = new Regex(#"([A-Z])|(\d)"); var parts = r.Split(str).Where(x=> !string.IsNullOrWhiteSpace(x)).ToArray(); if you want just A,B,C and D, use this Regex r = new Regex(#"([A-D])|(\d)");