Split a string into an array - c#

I want to split a string into an array of substrings. The string is delimited by spaces, but a space may also appear inside a substring. The split strings must all be the same length.
Example:
"a b aab bb  aaa" -> "a b", "aab", "bb ", "aaa"
I have the following code:
var T = Regex.Split(S, @"(?<=\G.{4})").Select(x => x.Substring(0, 3));
But I need to parameterize this code to split by various lengths (3, 4, 5, or n), and I don't know how to do this. Please help.
If it is impossible to parameterize the regex, a pure LINQ version is fine.

You can use the same regex, but "parameterize" it by inserting the desired number into the string.
In C# 6.0, you can do it like this:
var n = 5;
var T = Regex.Split(S, $@"(?<=\G.{{{n}}})").Select(x => x.Substring(0, n-1));
Prior to that you could use string.Format:
var n = 5;
var regex = string.Format(@"(?<=\G.{{{0}}})", n);
var T = Regex.Split(S, regex).Select(x => x.Substring(0, n-1));

It seems rather easy with LINQ:
var source = "a b aab bb  aaa";
var results =
Enumerable
.Range(0, source.Length / 4 + 1)
.Select(n => source.Substring(n * 4, 3))
.ToList();
Or use the Interactive Extensions from Microsoft's Reactive Extensions team (NuGet "Ix-Main"):
var results =
source
.Buffer(3, 4)
.Select(x => new string(x.ToArray()))
.ToList();
Both give you the output you require.
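The LINQ version above hard-codes 4 and 3, so here is a minimal sketch of how it could be parameterized. SplitFixed and FixedWidthSplitter are hypothetical names, and the code assumes each field is exactly `width` characters wide and followed by a single delimiter character. The sample call assumes a 15-character input in which "bb" is padded with a space to three characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedWidthSplitter
{
    // Hypothetical helper: every field is `width` characters wide and is
    // followed by one delimiter character, which is dropped.
    public static List<string> SplitFixed(string source, int width)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (width < 1) throw new ArgumentOutOfRangeException(nameof(width));

        int step = width + 1;                          // field plus its delimiter
        int count = (source.Length + step - 1) / step; // ceiling division

        return Enumerable
            .Range(0, count)
            .Select(i => i * step)
            // Clamp the length so a short final field does not throw.
            .Select(start => source.Substring(start, Math.Min(width, source.Length - start)))
            .ToList();
    }
}
```

Calling FixedWidthSplitter.SplitFixed("a b aab bb  aaa", 3) should give "a b", "aab", "bb " and "aaa". Clamping the length avoids the ArgumentOutOfRangeException that a plain Substring(start, width) throws when the last field is short.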

A lookbehind (?<=pattern) matches a zero-length string. To split using spaces as delimiters, the match has to actually consume a " " (the space has to be in the main pattern, outside the lookbehind).
Regex for length = 3: @"(?<=\G.{3}) " (note the trailing space)
Code for length n:
var n = 3;
var S = "a b aab bb  aaa";
var regex = @"(?<=\G.{" + n + @"}) ";
var T = Regex.Split(S, regex);

Related

Extract only Coefficients from a polynomial Equation in C# using Regular Expression?

String p = "f(x) = 0.0000001122*x^5 - 0.0000184003*x^4 + 0.0009611014*x^3 - 0.0179035548*x^2 - 0.7956585082*x + 79.9900932407";
String expr1 = p.ToString().Replace(" ", "");
var results = Regex.Matches(expr1, @"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?").Cast<Match>().Select(m => m.Value).ToList();
Console.WriteLine(results[9]);
I am able to extract the coefficients of the equation, but the output also contains the powers of x, which I don't want.
Can anyone please help me with this?
I am not very familiar with regular expressions.
Thank you.
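You can try var results = Regex.Matches(expr1, @"\d+\.?\d+").Cast<Match>().Select(m => m.Value).ToList(); as long as the coefficients are numbers written only with digits. The pattern needs at least two digits, so single-digit exponents are skipped, but the signs are not captured.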
\d+ matches one or more digits
\.? matches 0 or 1 "."
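With a regex alone it may be difficult; with LINQ it is easier. Take every second match and you get what you want:
Regex.Matches(expr1, @"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?").Cast<Match>().Select(m => m.Value).Where((x, i) => i % 2 == 0).ToList();
It uses the indexed overload of Where. Note that this assumes every coefficient is followed by an exponent; in the sample the linear term x has none, so the constant term 79.9900932407 lands at an odd index and is dropped.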
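Another way to keep the exponents out of the result is to let the regex consume them without capturing them. The following is only a sketch under assumptions: PolynomialCoefficients, Extract and the group name coef are hypothetical names, exponents are assumed to be written as ^ followed by an optionally signed integer, and terms with an implicit coefficient (a bare x) produce no entry.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PolynomialCoefficients
{
    // First alternative: swallow an exponent such as "^5" without capturing it.
    // Second alternative: capture a signed decimal number in the group "coef".
    private static readonly Regex TermPattern = new Regex(
        @"\^[-+]?\d+|(?<coef>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)");

    public static List<string> Extract(string expression)
    {
        // Remove spaces so that a sign sits directly in front of its number.
        string compact = expression.Replace(" ", "");

        return TermPattern.Matches(compact)
            .Cast<Match>()
            .Where(m => m.Groups["coef"].Success) // skip the exponent matches
            .Select(m => m.Groups["coef"].Value)
            .ToList();
    }
}
```

For the polynomial in the question this should return six strings, from 0.0000001122 through +79.9900932407, with the signs kept. Because the exponent is matched as a whole, multi-digit exponents are skipped as well.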
Try this:
String p = "f(x) = 0.0000001122*x^5 - 0.0000184003*x^4 + 0.0009611014*x^3 - 0.0179035548*x^2 - 0.7956585082*x + 79.9900932407";
String expr1 = p.ToString().Replace(" ", "");
var results = Regex.Matches(expr1, @"(?<coe>[-+]?[0-9]*\.[0-9]+)").Cast<Match>().Select(m => m.Groups["coe"].Value).ToList();
foreach (var result in results)
{
Console.WriteLine($"{result}");
}
Console.Read();
The result :
0.0000001122
-0.0000184003
+0.0009611014
-0.0179035548
-0.7956585082
+79.9900932407

Can you split a string and keep the split char(s)?

Is there a way to split a string but keep the split char(s)? If you do this:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' });
you get
A
B
C
D
E
F
G
H
Is there a way to use split so it would keep the split char:
A
+B
+C
+D
+E
+F
+G
+H
or if you were to have + in front of A then
+A
+B
+C
+D
+E
+F
+G
+H
You can use Regex.Split with a pattern that doesn't consume delimiter characters:
var pattern = @"(?=\+)";
var ans = Regex.Split(src, pattern);
This will create an empty entry if there is a leading + as there is an implied split before the +.
You could use LINQ to remove the empty entries if they aren't wanted:
var ans2 = Regex.Split(src, pattern).Where(s => !String.IsNullOrEmpty(s)).ToArray();
Alternatively, you could use Regex.Matches to extract the full matching patterns:
var ans3 = Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();
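The lookahead split can be wrapped in a small helper that takes any delimiter character. This is just a sketch: DelimiterKeepingSplit and SplitKeeping are hypothetical names, and the helper assumes that empty entries are never wanted.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterKeepingSplit
{
    // Hypothetical helper: splits in front of every delimiter so that the
    // delimiter stays attached to the piece that follows it.
    public static string[] SplitKeeping(string input, char delimiter)
    {
        // Regex.Escape makes characters such as '+' literal inside the pattern.
        string pattern = "(?=" + Regex.Escape(delimiter.ToString()) + ")";

        return Regex.Split(input, pattern)
            .Where(part => part.Length > 0) // drop the empty entry a leading delimiter causes
            .ToArray();
    }
}
```

With this, SplitKeeping("A+B+C", '+') should give A, +B, +C, and SplitKeeping("+A+B", '+') should give +A, +B.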
You could do:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' }).Select(x => "+" + x);

extracting strings between 2 chars - all occurrences

I would like to do something like this:
My string example: "something;123:somethingelse;156:somethingelse2;589:somethingelse3"
I would like to get an array with values extracted from the string example. These values lie between ";" and ":" : 123, 156, 589
I have tried this, but I do not know how to iterate to get all occurrences:
string str = stringExample.Split(';', ':')[1];
string[i] arr = str;
Thank you for helping me.
LINQ is your friend here, something like this would do:
str.Split(';').Select(s => s.Split(':')[0]).Skip(1)
I would work with named groups:
string stringExample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
Regex r = new Regex(";(?<digit>[0-9]+):");
foreach (Match item in r.Matches(stringExample))
{
var digit = item.Groups["digit"].Value;
}
You can use a regular expression like this:
Regex r = new Regex(@";(\d+):");
string s = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
foreach(Match m in r.Matches(s))
Console.WriteLine(m.Groups[1]);
;(\d+): matches one or more digits standing between ; and :, and Groups[1] selects the content inside the parentheses, i.e. the digits.
Output:
123
156
589
To get these strings into an array use:
string[] numberStrings = r.Matches(s).OfType<Match>()
.Select(m => m.Groups[1].Value)
.ToArray();
If you want to extract all three numbers, you could use this approach:
string stringExample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
string[] allTokens = stringExample.Split(';', ':'); // remove [1] since you want the whole array
string[] allNumbers = allTokens.Where(str => str.All(Char.IsDigit)).ToArray();
Result is:
allNumbers {string[3]} string[]
[0] "123" string
[1] "156" string
[2] "589" string
This sounds like a perfect case for a regular expression.
var sample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
var regex = new Regex(@"(?<=;)(\d+)(?=:)");
var matches = regex.Matches(sample);
var array = matches.Cast<Match>().Select(m => m.Value).ToArray();

Using Regex, how to find repeating patterns between 2 characters?

How can I use regex to find anything between two ASCII codes?
ASCII code STX (\u0002) and ETX (\u0003)
Example string "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX"
Using Regex on the above my matches should be
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
Did a bit of googling and I tried the following pattern but it didn't find anything.
@"^\u0002.*\u0003$"
UPDATE: Thank you all, some great answers below and all seem to work!
You could use Regex.Split.
var input = (char)2 + ",T1," + (char)3 + (char)2 + ",1,1,1,1,1,1," + (char)3 + (char)2 + ",A,1,0,B,ERR" + (char)3;
var result = Regex.Split(input, "\u0002|\u0003").Where(r => !String.IsNullOrEmpty(r));
You may use a non-regex solution, too (based on Wyatt's answer):
var result = input.Split(new[] {'\u0002', '\u0003'}) // split with the known char delimiters
.Where(p => !string.IsNullOrEmpty(p)) // Only take non-empty ones
.ToList();
A Regex solution I suggested in comments:
var res = Regex.Matches(input, "(?s)\u0002(.*?)\u0003")
.OfType<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
var s = "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX";
s = s.Replace("STX", "\u0002");
s = s.Replace("ETX", "\u0003");
var result1 = Regex.Split(s, @"[\u0002\u0003]").Where(a => a != String.Empty).ToList();
result1.ForEach(a=>Console.WriteLine(a));
Console.WriteLine("------------ OR WITHOUT REGEX ---------------");
var result2 = s.Split(new char[] { '\u0002','\u0003' }, StringSplitOptions.RemoveEmptyEntries).ToList();
result2.ForEach(a => Console.WriteLine(a));
output:
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
------------ OR WITHOUT REGEX ---------------
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR

What is a good approach for splitting this string in C#?

I need to split a string in C# that is formatted as follows:
"(11)123456(14)abc123(18)gt567"
With the desired result being a string array such as:
"(11)123456"
"(14)abc123"
"(18)gt567"
I'm guessing that a Regular Expression might be involved but that is one of my weak areas.
var s = "(11)123456(14)abc123(18)gt567";
Regex r = new Regex(@"\(\d+\)\w+");
var matches = r.Matches(s);
string[] array = new string[matches.Count];
for(int i = 0; i < matches.Count; i++)
array[i] = matches[i].Captures[0].Value;
var result = "(11)123456(14)abc123(18)gt567"
.Split(new string[]{"("}, StringSplitOptions.RemoveEmptyEntries)
.Select(i => "(" + i).ToList();
Something like:
string theString = "(11)123456(14)abc123(18)gt567";
Regex reSplit = new Regex(@"\(\d+\)[^\(]+");
var matches = reSplit.Matches(theString);
That will give you a collection of Match objects that you can then examine.
To get an array of strings:
var splits = matches.Cast<Match>().Select(m => m.Value).ToArray();
You can use a regex along with its Split method to get the array of parts.
var s = "(11)123456(14)abc123(18)gt567";
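var pattern = new Regex(@"(\([^\(]+)");
var components = pattern.Split(s).Where(c => c.Length > 0).ToArray();
The pattern matches a left parenthesis followed by any number of characters up until the next left parenthesis. Because the pattern is wrapped in a capturing group, Split includes the matched text in its result; the Where call drops the empty strings that Split returns between adjacent matches.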
If you need to deal with whitespace such as new lines, you might need to tweak the pattern or the RegexOptions a little.
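A zero-width lookahead is another option: it splits in front of each tag without consuming anything, so no capturing group is needed. This is a sketch only; TaggedFieldSplitter and SplitFields are hypothetical names, and it assumes every field starts with digits in parentheses, as in the sample.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TaggedFieldSplitter
{
    // Hypothetical helper: splits at every position that is immediately
    // followed by a "(digits)" tag, keeping the tag with its field.
    public static string[] SplitFields(string input)
    {
        return Regex.Split(input, @"(?=\(\d+\))")
            .Where(part => part.Length > 0) // drop the empty entry before the first tag
            .ToArray();
    }
}
```

For "(11)123456(14)abc123(18)gt567" this should return (11)123456, (14)abc123 and (18)gt567. Requiring digits inside the parentheses means a stray ( in the data does not start a new field.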
