Can you split a string and keep the split char(s)? - c#

Is there a way to split a string but keep the split char(s), if you do this:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' });
you get
A
B
C
D
E
F
G
H
Is there a way to use split so it would keep the split char:
A
+B
+C
+D
+E
+F
+G
+H
or if you were to have + in front of A then
+A
+B
+C
+D
+E
+F
+G
+H

You can use Regex.Split with a pattern that doesn't consume delimiter characters:
var pattern = #"(?=\+)";
var ans = Regex.Split(src, pattern);
This will create an empty entry if there is a leading + as there is an implied split before the +.
You could use LINQ to remove the empty entries if they aren't wanted:
var ans2 = Regex.Split(src, pattern).Where(s => !String.IsNullOrEmpty(s)).ToArray();
Alternatively, you could use Regex.Matches to extract the full matching patterns:
var ans3 = Regex.Matches(src, #"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();

You could do:
"A+B+C+D+E+F+G+H".Split(new char[] { '+' }).Select(x => "+" + x);

Related

Replace text in camelCase inside the string

I have string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change text in { } to cameCase
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that? What is the best way to do that? Please help
I have no experience with Regex. So, I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while(i < str1.Length)
{
if (str1[i] == '{')
{
str3 += "{" + char.ToLower(str1[i + 1]);
i = i + 2;
} else
{
str3 += str1[i];
i++;
}
}
You can do it with regex of course.
But you can do it also with LINQ like this:
var result = String.Join("/{",
str1.Split(new string[1] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => k = !k.StartsWith("/") ? Char.ToLowerInvariant(k[0]) + k.Substring(1) : k));
What is done here is: Splitting into 3 parts:
"/api/agencies/"
"AgencyGuid}/contactpersons"
"ContactPersonGuid}"
After that we are selecting from each element such value: "If you start with "/" it means you are the first element. If so - you should be returned without tampering. Otherwise : take first char (k[0]) change it to lowercase ( Char.ToLowerInvariant() ) and concatenate with the rest.
At the end Join those three (one unchanged and two changed) strings
With Regex you can do it as:
var regex = new Regex(#"\/{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
in regex we search for pattern "/{\w" meaning find "/{" and one letter (\w). This char will be taken into a group ( because of () surrounding) and after that run Regex and replace such group to m.ToString().ToLower()
I probably wouldn't use regex, but since you asked
Regex.Replace(
"/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
#"\{[^\}]+\}",
m =>
$"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
RegexOptions.ExplicitCapture
)
This assumes string interpolation in c# 6, but you can do the same thing by concatenating.
Explanation:
{[^}]+} - grab all letters that follow an open mustache that are not a close mustache and then the close mustache
m => ... - A lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - return a new string by taking the an open mustache, the first letter lowercased, then the rest of the string, then a close mustache.

Using Regex, how to find repeating patterns between 2 characters?

How an I use regex to find anything between 2 ASCII codes?
ASCII code STX (\u0002) and ETX (\u0003)
Example string "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX"
Using Regex on the above my matches should be
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
Did a bit of googling and I tried the following pattern but it didn't find anything.
#"^\u0002.*\u0003$"
UPDATE: Thank you all, some great answers below and all seem to work!
You could use Regex.Split.
var input = (char)2 + ",T1," + (char)3 + (char)2 + ",1,1,1,1,1,1," + (char)3 + (char)2 + ",A,1,0,B,ERR" + (char)3;
var result = Regex.Split(input, "\u0002|\u0003").Where(r => !String.IsNullOrEmpty(r));
You may use a non-regex solution, too (based on Wyatt's answer):
var result = input.Split(new[] {'\u0002', '\u0003'}) // split with the known char delimiters
.Where(p => !string.IsNullOrEmpty(p)) // Only take non-empty ones
.ToList();
A Regex solution I suggested in comments:
var res = Regex.Matches(input, "(?s)\u0002(.*?)\u0003")
.OfType<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
var s = "STX,T1,ETXSTX,1,1,1,1,1,1,ETXSTX,A,1,0,B,ERRETX";
s = s.Replace("STX", "\u0002");
s = s.Replace("ETX", "\u0003");
var result1 = Regex.Split(s, #"[\u0002\u0003]").Where(a => a != String.Empty).ToList();
result1.ForEach(a=>Console.WriteLine(a));
Console.WriteLine("------------ OR WITHOUT REGEX ---------------");
var result2 = s.Split(new char[] { '\u0002','\u0003' }, StringSplitOptions.RemoveEmptyEntries).ToList();
result2.ForEach(a => Console.WriteLine(a));
output:
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR
------------ OR WITHOUT REGEX ---------------
,T1,
,1,1,1,1,1,1,
,A,1,0,B,ERR

Split a string into an array

I want to split a string to an array of sub-strings. The string is delimited by space, but space may appear inside the sub-strings too. And spliced strings must be of the same length.
Example:
"a b aab bb aaa" -> "a b", "aab", "bb ", "aaa"
I have the following code:
var T = Regex.Split(S, #"(?<=\G.{4})").Select(x => x.Substring(0, 3));
But I need to parameterize this code, split by various length(3, 4, 5 or n) and I don't know how do this. Please help.
If impossible to parameterize Regex, fully linq version ok.
You can use the same regex, but "parameterize" it by inserting the desired number into the string.
In C# 6.0, you can do it like this:
var n = 5;
var T = Regex.Split(S, $#"(?<=\G.{{{n}}})").Select(x => x.Substring(0, n-1));
Prior to that you could use string.Format:
var n = 5;
var regex = string.Format(#"(?<=\G.{{{0}}})", n);
var T = Regex.Split(S, regex).Select(x => x.Substring(0, n-1));
It seems rather easy with LINQ:
var source = "a b aab bb aaa";
var results =
Enumerable
.Range(0, source.Length / 4 + 1)
.Select(n => source.Substring(n * 4, 3))
.ToList();
Or using Microsoft's Reactive Framework's team's Interactive Extensions (NuGet "Ix-Main") and do this:
var results =
source
.Buffer(3, 4)
.Select(x => new string(x.ToArray()))
.ToList();
Both give you the output you require.
A lookbehind (?<=pattern) matches a zero-length string. To split using spaces as delimiters, the match has to actually return a "" (the space has to be in the main pattern, outside the lookbehind).
Regex for length = 3: #"(?<=\G.{3}) " (note the trailing space)
Code for length n:
var n = 3;
var S = "a b aab bb aaa";
var regex = #"(?<=\G.{" + n + #"}) ";
var T = Regex.Split(S, regex);
Run this code online

replacing characters in a single field of a comma-separated list

I have string in my c# code
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f" i.e. ',' which is inside inverted comma should be replaced by space.
I tried using string.split but it is not working for me.
OK, I can't be bothered to think of a regex approach so I am going to offer an old fashioned loop approach which will work:
string DoReplace(string input)
{
bool isInner = false;//flag to detect if we are in the inner string or not
string result = "";//result to return
foreach(char c in input)//loop each character in the input string
{
if(isInner && c == ',')//if we are in an inner string and it is a comma, append space
result += " ";
else//otherwise append the character
result += c;
if(c == '"')//if we have hit an inner quote, toggle the flag
isInner = !isInner;
}
return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern, it is matching any instance where there is a "X,X" sequence where X is any letter, and is replacing it with the very same sequence, with a space in between the letters instead of a comma.
You could easily extend this if you needed to to have it match more than one letter, etc, as needed.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This does something similar compared to the first one, but just removes any comma that is sandwiched by letters surrounded by quotes, and repeats it until all cases are removed.
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("", str3);
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for ( int i = 1; i < pieces.Length; i += 2 )
{
pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h

Losing characters in strings after performing a split with RegEx

I want to split a string into 2 strings,
my string looks like:
HAMAN*3 whitespaces*409991
I want to have two strings the first with 'HAMAN' and the second should contain '409991'
PROBLEM: My second string has '09991' instead of '409991' after implementing this code:
string str = "HAMAN 409991 ";
string[] result = Regex.Split(str, #"\s4");
foreach (var item in result)
{
MessageBox.Show(item.ToString());
}
CURRENT SOLUTION with PROBLEM:
Split my original string when I find whitespace followed by the number 4. The character '4' is missing in the second string.
PREFERED SOLUTION:
To split when I find 3 whitespaces followed by the digit 4. And have '4' mentioned in my second string.
Try this
string[] result = Regex.Split(str, #"\s{3,}(?=4)");
Here is the Demo
Positive lookahead is your friend:
Regex.Split(str, #"\s+(?=4)");
Or you could not use Regex:
var str = "HAMAN 409991 ";
var result = str.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
EXAMPLE
Alternative if you need it to start with SPACESPACESPACE4:
var str = new[] {
"HAMAN 409991 ",
"HAMAN 409991",
"HAMAN 509991"
};
foreach (var s in str)
{
var result = s.Trim()
.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries)
.Select(a => a.Trim())
.ToList();
if (result.Count != 2 || !result[1].StartsWith("4"))
continue;
Console.WriteLine("{0}, {1}", result[0], result[1]);
}
That's because you're splitting including the 4. If you want to split by three-consecutive-spaces then you should specify exactly that:
string[] result = Regex.Split(str, #"\s{3}");

Categories

Resources