Using Regex, how to find repeating patterns between 2 characters? - c#

How can I use regex to find anything between 2 ASCII codes?
ASCII code STX (\u0002) and ETX (\u0003)
Example string "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX"
Using Regex on the above my matches should be
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
Did a bit of googling and I tried the following pattern but it didn't find anything.
@"^\u0002.*\u0003$"
UPDATE: Thank you all, some great answers below and all seem to work!

You could use Regex.Split.
var input = (char)2 + ",T1," + (char)3 + (char)2 + ",1,1,1,1,1,1," + (char)3 + (char)2 + ",A,1,0,B,ERR" + (char)3;
var result = Regex.Split(input, "\u0002|\u0003").Where(r => !String.IsNullOrEmpty(r));

You may use a non-regex solution, too (based on Wyatt's answer):
var result = input.Split(new[] {'\u0002', '\u0003'}) // split with the known char delimiters
.Where(p => !string.IsNullOrEmpty(p)) // Only take non-empty ones
.ToList();
A Regex solution I suggested in comments:
var res = Regex.Matches(input, "(?s)\u0002(.*?)\u0003")
.OfType<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
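The difference between the anchored pattern from the question and the lazy pattern above can be seen in a small sketch. It assumes the input really contains the control characters U+0002 and U+0003 (not the literal text "STX"/"ETX"); the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the frames are delimited by the real control characters.
const char Stx = '\u0002';
const char Etx = '\u0003';
string framed = $"{Stx},T1,{Etx}{Stx},1,1,1,1,1,1,{Etx}{Stx},A,1,0,B,ERR{Etx}";

// Anchored and greedy: ^...$ plus .* can yield at most one match spanning the whole input.
int anchoredCount = Regex.Matches(framed, @"^\u0002.*\u0003$").Count;

// Unanchored and lazy: .*? stops at the first ETX, so each frame is its own match.
string[] frames = Regex.Matches(framed, @"\u0002(.*?)\u0003", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(anchoredCount);
Console.WriteLine(string.Join(" | ", frames));
```

If the anchored pattern finds nothing at all, it may be worth checking whether the input actually holds the control characters or only their names.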

var s = "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX";
s = s.Replace("STX", "\u0002");
s = s.Replace("ETX", "\u0003");
var result1 = Regex.Split(s, @"[\u0002\u0003]").Where(a => a != String.Empty).ToList();
result1.ForEach(a=>Console.WriteLine(a));
Console.WriteLine("------------ OR WITHOUT REGEX ---------------");
var result2 = s.Split(new char[] { '\u0002','\u0003' }, StringSplitOptions.RemoveEmptyEntries).ToList();
result2.ForEach(a => Console.WriteLine(a));
output:
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
------------ OR WITHOUT REGEX ---------------
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR

Related

Can you split a string and keep the split char(s)?

Is there a way to split a string but keep the split char(s), if you do this:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' });
you get
A
B
C
D
E
F
G
H
Is there a way to use split so it would keep the split char:
A
+B
+C
+D
+E
+F
+G
+H
or if you were to have + in front of A then
+A
+B
+C
+D
+E
+F
+G
+H
You can use Regex.Split with a pattern that doesn't consume delimiter characters:
var pattern = @"(?=\+)";
var ans = Regex.Split(src, pattern);
This will create an empty entry if there is a leading + as there is an implied split before the +.
You could use LINQ to remove the empty entries if they aren't wanted:
var ans2 = Regex.Split(src, pattern).Where(s => !String.IsNullOrEmpty(s)).ToArray();
Alternatively, you could use Regex.Matches to extract the full matching patterns:
var ans3 = Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();
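As a sketch of the leading-delimiter behaviour described above (the sample strings and variable names are assumptions, not from the question), the same lookahead split can be run on an input with and without a leading +:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?=\+) is a zero-width lookahead: it marks a split point before each '+'
// without consuming the '+' itself.
string pattern = @"(?=\+)";

string[] plain = Regex.Split("A+B+C", pattern);

// With a leading '+', the split point at index 0 can leave an empty first entry,
// so empty strings are filtered out here.
string[] leading = Regex.Split("+A+B+C", pattern)
    .Where(s => !string.IsNullOrEmpty(s))
    .ToArray();

Console.WriteLine(string.Join(" ", plain));
Console.WriteLine(string.Join(" ", leading));
```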
You could do:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' }).Select(x => "+" + x);

Splitting text and integers into array/list

I'm trying to find a way to split a string by its letters and numbers, but I've had no luck.
An example:
I have a string "AAAA000343BBB343"
I need to split it either into 2 values, "AAAA000343" and "BBB343", or into 4: "AAAA", "000343", "BBB", "343".
Any help would be much appreciated
Thanks
Here is a RegEx approach to split your string into 4 values
string input = "AAAA000343BBB343";
string[] result = Regex.Matches(input, @"[a-zA-Z]+|\d+")
.Cast<Match>()
.Select(x => x.Value)
.ToArray(); //"AAAA" "000343" "BBB" "343"
So you can use regex
For
"AAAA000343" and "BBB343"
var regex = new Regex(@"[a-zA-Z]+\d+");
var result = regex
.Matches("AAAA000343BBB343")
.Cast<Match>()
.Select(x => x.Value);
// result outputs: "AAAA000343" and "BBB343"
For
4 "AAAA" "000343" "BBB" "343"
See @fubo's answer
Try this:
var numAlpha = new Regex("(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)");
var match = numAlpha.Match("codename123");
var Character = match.Groups["Alpha"].Value;
var Integer = match.Groups["Numeric"].Value;

extracting strings between 2 chars - all occurrences

I would like to do something like this:
My string example: "something;123:somethingelse;156:somethingelse2;589:somethingelse3"
I would like to get an array with values extracted from the string example. These values lie between ";" and ":" : 123, 156, 589
I have tried this, but I do not know how to iterate to get all occurrences:
string str = stringExample.Split(';', ':')[1];
string[i] arr = str;
Thank you for helping me.
LINQ is your friend here, something like this would do:
str.Split(';').Select(s => s.Split(':')[0]).Skip(1)
I would work with named groups:
string stringExample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
Regex r = new Regex(";(?<digit>[0-9]+):");
foreach (Match item in r.Matches(stringExample))
{
var digit = item.Groups["digit"].Value;
}
You can use a regular expression like this:
Regex r = new Regex(@";(\d+):");
string s = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
foreach(Match m in r.Matches(s))
Console.WriteLine(m.Groups[1]);
;(\d+): matches one or more digits standing between ; and :, and Groups[1] selects the content inside the parentheses, i.e. the digits.
Output:
123
156
589
To get these strings into an array use:
string[] numberStrings = r.Matches(s).OfType<Match>()
.Select(m => m.Groups[1].Value)
.ToArray();
So you want to extract all 3 numbers, you could use this approach:
string stringExample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
string[] allTokens = stringExample.Split(';', ':'); // remove [1] since you want the whole array
string[] allNumbers = allTokens.Where(str => str.All(Char.IsDigit)).ToArray();
Result is:
allNumbers {string[3]} string[]
[0] "123" string
[1] "156" string
[2] "589" string
This sounds like a perfect case for a regular expression.
var sample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";
var regex = new Regex(@"(?<=;)(\d+)(?=:)");
var matches = regex.Matches(sample);
var array = matches.Cast<Match>().Select(m => m.Value).ToArray();
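The capture-group pattern and the lookaround pattern from the answers above should yield the same values; the difference is only where the digits end up (Groups[1] versus the whole match). A minimal sketch for comparison, with illustrative variable names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sample = "something;123:somethingelse;156:somethingelse2;589:somethingelse3";

// Delimiters are part of the match, so the digits are read from capture group 1.
string[] viaGroup = Regex.Matches(sample, @";(\d+):")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

// Delimiters are only asserted by lookbehind/lookahead, so the match itself is the digits.
string[] viaLookaround = Regex.Matches(sample, @"(?<=;)\d+(?=:)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(",", viaGroup));
Console.WriteLine(string.Join(",", viaLookaround));
```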

regex expression last match C#

I have a string like this:
test- qweqw (Barcelona - Bayer) - testestsetset
And I need to capture the word Bayer. I tried this regex (between "-" and ")"):
(?<=-)(.*)(?=\))
Example: https://regex101.com/r/wI9zD0/2
As you can see, it does not work quite correctly. What should I fix?
Here's a different regex to do what you are looking for:
-\s([^()]+)\)
https://regex101.com/r/wI9zD0/3
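To see why the original pattern overshoots, it can help to run both patterns side by side. In this sketch (variable names are illustrative), the greedy .* starts right after the first - and runs up to the last ), while the [^()]+ version cannot cross a parenthesis:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "test- qweqw (Barcelona - Bayer) - testestsetset";

// Greedy: the lookbehind is satisfied right after the first '-', and .* extends to the last ')'.
string greedy = Regex.Match(input, @"(?<=-)(.*)(?=\))").Groups[1].Value;

// Restricted: [^()]+ cannot cross '(' or ')', so only the "- Bayer)" part can match.
string restricted = Regex.Match(input, @"-\s([^()]+)\)").Groups[1].Value;

Console.WriteLine($"[{greedy}]");
Console.WriteLine($"[{restricted}]");
```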
You don't need regex for that, you can use LINQ:
string input = "test - qweqw(Barcelona - Bayer) - testestsetset";
string res = String.Join("", input.SkipWhile(c => c != '(')
.SkipWhile(c => c != '-').Skip(1)
.TakeWhile(c => c != ')'))
.Trim();
Console.WriteLine(res); // Bayer

Split a string into an array

I want to split a string into an array of substrings. The string is delimited by spaces, but a space may also appear inside a substring. The resulting substrings must all be the same length.
Example:
"a b aab bb  aaa" -> "a b", "aab", "bb ", "aaa"
I have the following code:
var T = Regex.Split(S, @"(?<=\G.{4})").Select(x => x.Substring(0, 3));
But I need to parameterize this code so it can split by various lengths (3, 4, 5 or n), and I don't know how to do this. Please help.
If it is impossible to parameterize the regex, a pure LINQ version is OK.
You can use the same regex, but "parameterize" it by inserting the desired number into the string.
In C# 6.0, you can do it like this:
var n = 5;
var T = Regex.Split(S, $@"(?<=\G.{{{n}}})").Select(x => x.Substring(0, n-1));
Prior to that you could use string.Format:
var n = 5;
var regex = string.Format(@"(?<=\G.{{{0}}})", n);
var T = Regex.Split(S, regex).Select(x => x.Substring(0, n-1));
It seems rather easy with LINQ:
var source = "a b aab bb  aaa";
var results =
Enumerable
.Range(0, source.Length / 4 + 1)
.Select(n => source.Substring(n * 4, 3))
.ToList();
Or use Interactive Extensions from Microsoft's Reactive Framework team (NuGet "Ix-Main"):
var results =
source
.Buffer(3, 4)
.Select(x => new string(x.ToArray()))
.ToList();
Both give you the output you require.
A lookbehind (?<=pattern) matches a zero-length string. To split using spaces as delimiters, the match has to actually consume the " " (the space has to be in the main pattern, outside the lookbehind).
Regex for length = 3: @"(?<=\G.{3}) " (note the trailing space)
Code for length n:
var n = 3;
var S = "a b aab bb  aaa";
var regex = @"(?<=\G.{" + n + @"}) ";
var T = Regex.Split(S, regex);
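If the regex is not essential, the fixed-width split can also be sketched with a plain loop. This assumes each field is n characters wide and is followed by exactly one separator character; SplitFixed is a hypothetical helper name, not something from the answers above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: cuts fields of width n that are each followed by one separator char.
static List<string> SplitFixed(string s, int n)
{
    var parts = new List<string>();
    for (int i = 0; i < s.Length; i += n + 1)
    {
        // Guard the final field in case the input ends early.
        int len = Math.Min(n, s.Length - i);
        parts.Add(s.Substring(i, len));
    }
    return parts;
}

// Build the sample from its fields so the padding inside "bb " stays visible.
string sample = string.Join(" ", new[] { "a b", "aab", "bb ", "aaa" });
List<string> fields = SplitFixed(sample, 3);
Console.WriteLine(string.Join("|", fields));
```

Stepping by n + 1 skips the separator after each field, so no trimming with Substring is needed afterwards.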
