Regex.Split on "() " and "?" - c#

myString = "First?Second Third";
String[] result = Regex.Split(myString, @"( )\?");
This should result in:
First,
Second,
Third
What am I missing? (I also need brackets to split on for something else)

I guess with ( ) you meant whitespace. You don't need a capturing group there. Just use alternation, or a character class:
String[] result = Regex.Split(myString, @"\s|\?");
// OR
String[] result = Regex.Split(myString, @"[\s?]");
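A small sketch that runs the question's pattern next to the two suggested patterns (and plain string.Split) on the same input. The class and method names are illustrative only. It also shows why the original pattern appeared to do nothing: @"( )\?" means "a space followed by a literal ?", and the input never contains that sequence, so Regex.Split returns the whole string unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitOnSpaceOrQuestion
{
    // Thin wrapper so each pattern can be tried on the same input.
    public static string[] SplitWith(string input, string pattern) =>
        Regex.Split(input, pattern);

    public static void Main()
    {
        string myString = "First?Second Third";

        // Original pattern: a space followed by a literal '?'. The input never
        // contains " ?", so nothing is split and the whole string comes back.
        Console.WriteLine(string.Join(" | ", SplitWith(myString, @"( )\?")));

        // Alternation: any whitespace character OR a literal '?'.
        Console.WriteLine(string.Join(" | ", SplitWith(myString, @"\s|\?")));

        // Character class: the same delimiter set written as [...];
        // inside a class '?' is already literal, so it needs no backslash.
        Console.WriteLine(string.Join(" | ", SplitWith(myString, @"[\s?]")));

        // No regex at all: string.Split with the two delimiter characters.
        Console.WriteLine(string.Join(" | ", myString.Split(' ', '?')));
    }
}
```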

Using string methods:
myString = "First?Second Third";
String[] result = myString.Split(' ','?');

I'm not quite sure what you are trying to do with the quotes. Remember that in a regular expression, parentheses denote a logical group; they do not escape a space. What you want instead is to split on an explicit set of characters, which is written with square brackets []. You should use the following pattern to split:
String[] result = Regex.Split(myString, @"[\?\s]");
Note that \? is an escaped question mark (as you had in your original), and \s matches any whitespace character. So the pattern says: split the string on any one of the characters listed inside the [], namely ? (written as \?) and whitespace (written as \s).
EDIT AFTER MORE INFO FROM OP:
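I also saw, after answering this post, that you edited the top comment to mention parentheses as well, in which case I would go with:
String[] result = Regex.Split(myString, @"[\?(\s)]");
Inside a character class the parentheses lose their grouping meaning and are matched literally, so this pattern splits on ?, (, ) and any whitespace character.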
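Once parentheses are part of the delimiter set, a delimiter at the very end of the string (or two delimiters in a row, such as ") ") leaves an empty string in the Regex.Split result. The sketch below uses a hypothetical input and illustrative names to show that, and filters the empty entries out with LINQ.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitWithParentheses
{
    // Inside [...] the parentheses are ordinary characters, so this class
    // splits on '?', '(', ')' and any whitespace character.
    const string Delimiters = @"[\?(\s)]";

    // Raw split: a delimiter at the very end (or two delimiters in a row)
    // leaves an empty string in the result.
    public static string[] RawSplit(string input) => Regex.Split(input, Delimiters);

    // Same split with the empty entries filtered out.
    public static string[] NonEmptySplit(string input) =>
        RawSplit(input).Where(s => s.Length > 0).ToArray();

    public static void Main()
    {
        string sample = "First?Second Third(Fourth)"; // hypothetical input
        Console.WriteLine(RawSplit(sample).Length); // prints 5 (last entry is "")
        Console.WriteLine(string.Join(",", NonEmptySplit(sample)));
    }
}
```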

You need to put those chars inside [] to create a character class: [\s\?]. This will split on:
a space
?
You can use \s to handle "any" whitespace char.

Related

Split string by different marks

How can I split a string by several different symbols, for example the dot . and the dash -, in C#?
string str = "sally-vikram.dean.sarah-ray";
I want to do it without first replacing them all with the same mark:
str = str.Replace("-", ".");
and then splitting by the dot:
string[] words = str.Split('.');
I want to get:
sally
vikram
dean
sarah
ray
string.Split can actually take an array of values:
string[] words = str.Split('.', '-');
For your use case, a regex character class (MSDN) is a good choice:
string[] words = Regex.Split(str, "[.-]");
Note: since - is also used to define a character range like a-z, it's good practice to put the - at the end of the character class. Otherwise, escape it, e.g. \-.
This approach is most appropriate if you expect to need further delimiters or other requirements, find the regex more readable, and performance isn't an issue (Regex.Split is generally much slower than the equivalent String.Split).
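Both answers can be checked side by side on the question's string. In the sketch below (class and method names are illustrative), the regex class lists the dot first and the dash last: inside [...] the dot is a literal dot rather than "any character", and the trailing dash is read as a literal rather than a range.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitOnDotOrDash
{
    // string.Split with a params char[] of delimiters.
    public static string[] WithStringSplit(string s) => s.Split('.', '-');

    // Regex character class: '.' is literal inside [...], and '-' is placed
    // last so it is not interpreted as a range operator.
    public static string[] WithRegex(string s) => Regex.Split(s, "[.-]");

    public static void Main()
    {
        string str = "sally-vikram.dean.sarah-ray";
        foreach (string word in WithStringSplit(str)) Console.WriteLine(word);

        // Same pieces from the regex version.
        Console.WriteLine(string.Join(",", WithRegex(str)));
    }
}
```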

Regular expression to split on spaces, but included phrases in quotes?

Here's an example string -
"EP(DebugFlag="N",UILogFlag="N")" Other words here
I'd like to split the string by spaces, but need to keep quoted phrases together - even if there are quotes within quotes. So I'd like the sample string to be split as -
"EP(DebugFlag="N",UILogFlag="N")"
Other
words
here
I'm not sure how to take the quotes into consideration (finding the starting and ending one). Is there an easy way to do this?
Thanks!
You mean something like this:
string example = @"EP(DebugFlag='N',UILogFlag='N') Other words here";
var result = example.Split(new string[]{" "}, StringSplitOptions.RemoveEmptyEntries).ToList();
foreach(var phrase in result)
{
Console.WriteLine("{0}", phrase);
}
Note: as @CodeCaster suggested, I should mention that I replaced the double quotes with single quotes to provide a working example. If your sample differs from mine, you have to provide the exact text without the surrounding quotes.
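Splitting on every space only works while no quoted phrase contains a space. A different technique is to match the tokens instead of splitting between them: a token is one or more pieces, each either a "..." span (which may contain spaces) or a single non-whitespace character. The sketch below assumes the quotes are balanced. The question's "nested" quotes work here only because they pair up left to right and the inner values (such as "N") contain no spaces; inner quoted text with spaces would be cut apart. Class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSplitDemo
{
    // A token is a run of pieces, each either a "..." span or one
    // non-whitespace character. Assumes quotes are balanced.
    static readonly Regex Token = new Regex(@"(?:""[^""]*""|\S)+");

    // Returns every token found, in order; whitespace between tokens is dropped.
    public static string[] SplitKeepingQuotes(string input) =>
        Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string example = "\"EP(DebugFlag=\"N\",UILogFlag=\"N\")\" Other words here";
        foreach (string token in SplitKeepingQuotes(example))
            Console.WriteLine(token);
    }
}
```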

Regular Expression - Remove zeroes inside an expression

I need to remove leading zeroes from the numerical part of an expression (using the .NET 2.0 C# Regex class).
Ex:
PAR0000034 -> PAR34
WP0003204 -> WP3204
I tried the following:
// keep starting characters, get rid of leading zeroes, keep remaining digits
string result = Regex.Replace(inputStr, "^(.+)(0+)(/d*)", "$1$3", RegexOptions.IgnoreCase);
Obviously, it did not work. I need a bit of help to find the mistake.
You don't need a regular expression for that; the Split method can do it for you.
Splitting on '0', removing empty entries (i.e. those between the multiple zeroes), and limiting the result to two strings gives you the two strings before and after the leading zeroes. Then you just put those two strings together again:
string result = String.Concat(
input.Split(new char[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries)
);
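A runnable sketch of the Split approach on the two examples from the question (names are illustrative). With a count of 2 and RemoveEmptyEntries, the run of zeroes after the prefix is skipped and the last element is the untouched remainder, so later zeroes such as the 0 in "3204" survive.

```csharp
using System;

public static class RemoveZeroesWithSplit
{
    // Split on '0' into at most two parts, dropping empty entries, then
    // glue the prefix and the remainder back together. The remainder keeps
    // any later zeroes (e.g. the 0 in "3204").
    public static string Strip(string input) =>
        string.Concat(input.Split(new[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries));

    public static void Main()
    {
        Console.WriteLine(Strip("PAR0000034")); // PAR34
        Console.WriteLine(Strip("WP0003204"));  // WP3204
    }
}
```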
In your expression the .+ part is greedy, so it grabs as much of the string as it can, zeroes included. Also, use a backslash instead of a slash for the digit class \d:
string result = Regex.Replace(inputStr, @"^([^0]+)(0+)(\d*)", "$1$3");
Or use a lookbehind instead:
string result = Regex.Replace(inputStr, "(?<=[a-zA-Z])0+", "");
This works for me:
Regex.Replace("PPP00001001", "([^0]*)0+(.*)", "$1$2");
The phrase "leading zeroes" is confusing, since the zeroes you're talking about aren't actually at the beginning of the string. But if I understand you correctly, you want this:
string result = Regex.Replace(inputStr, "^(.*?)0+", "$1");
There are actually several ways to do it, with and without regex, but the above is probably the shortest and easiest to understand. The important part is the .*? lazy quantifier. It ensures that the pattern a) finds only the first run of zeroes, and b) deletes all of the "leading" zeroes in that run.
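The regex answers in this thread can be compared directly. The sketch below collects the four pattern/replacement pairs quoted in the answers (the table and names are illustrative) and applies each one to the question's two examples; on these inputs they all produce PAR34 and WP3204. Note that the lookbehind variant only removes zeroes that directly follow a letter, which is why the 0 inside "3204" is kept.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveZeroesWithRegex
{
    // Pattern/replacement pairs taken from the answers in this thread.
    static readonly (string Pattern, string Replacement)[] Variants =
    {
        (@"^([^0]+)(0+)(\d*)", "$1$3"), // zero-free prefix, the zeroes, the digits
        ("(?<=[a-zA-Z])0+", ""),        // zeroes directly after a letter
        ("([^0]*)0+(.*)", "$1$2"),      // everything before / after the first run
        ("^(.*?)0+", "$1"),             // lazy prefix, then the first run of zeroes
    };

    public static int Count => Variants.Length;

    public static string Apply(int variant, string input) =>
        Regex.Replace(input, Variants[variant].Pattern, Variants[variant].Replacement);

    public static void Main()
    {
        foreach (string input in new[] { "PAR0000034", "WP0003204" })
            for (int v = 0; v < Count; v++)
                Console.WriteLine($"{input} -> {Apply(v, input)}");
    }
}
```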

Regex.Split everything inside square brackets []

I'm really a n00b when it comes to regular expressions. I've been trying to Split a string wherever there's a [----anything inside-----] for example.
string s = "Hello Word my name_is [right now I'm hungry] Julian";
string[] words = Regex.Split( s, "------");
The outcome would be "Hello Word my name_is " and " Julian"
The regex you want to use is:
Regex.Split( s, "\\[.*?\\]" );
Square brackets are special characters (specifying a character group), so they have to be escaped with a backslash. Inside the square brackets, you want any sequence of characters EXCEPT a close square bracket. There are a couple of ways to handle that. One is to specify [^\]]* (explicitly specifying "not a close square bracket"). The other, as I used in my answer, is to specify that the match is not greedy by affixing a question mark after it. This tells the regular expression processor not to greedily consume as many characters as it can, but to stop as soon as the next expression is matched.
@"\[.*?\]" will match the brackets together with the text between them.
Another way to write it:
Regex.Split(str, @"\[[^]]*\]");
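A sketch showing both answers on the question's string, plus why the lazy .*? (or the negated class) matters once a string has more than one bracketed group: a greedy .* runs to the last ] and swallows everything in between. The second input and all names are hypothetical; the negated class is written with an escaped \] for clarity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitOnBracketedText
{
    // Lazy quantifier: stop at the first ']' after each '['.
    public static string[] Lazy(string s) => Regex.Split(s, @"\[.*?\]");

    // Negated class: any run of characters that are not ']'.
    public static string[] NegatedClass(string s) => Regex.Split(s, @"\[[^\]]*\]");

    // For contrast: a greedy .* runs to the LAST ']' in the string.
    public static string[] Greedy(string s) => Regex.Split(s, @"\[.*\]");

    public static void Main()
    {
        string s = "Hello Word my name_is [right now I'm hungry] Julian";
        foreach (string part in Lazy(s)) Console.WriteLine($"\"{part}\"");

        string two = "a [x] b [y] c"; // hypothetical input with two bracketed groups
        Console.WriteLine(Lazy(two).Length);   // 3 pieces: "a ", " b ", " c"
        Console.WriteLine(Greedy(two).Length); // 2 pieces: "a ", " c"
    }
}
```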

regex split to implement tokenizer

I have a string that can contain any characters, and I want to split it by the following separators:
"+"
",OU="
Can anyone show me how to do this with Regex.Split?
I tried many times, but still no luck.
I'm using C#.
I think you can use string.Split, which lets you specify multiple separators:
string[] separator = new string[]{"+", ",OU="};
string[] resultTokens = testString.Split(separator, StringSplitOptions.None);
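A runnable version of the string.Split answer. The question does not show its actual input, so the sample string and the names below are hypothetical. The string[] overload treats each separator as a whole substring, so ",OU=" is matched as one unit rather than as individual characters.

```csharp
using System;

public static class SplitOnStringSeparators
{
    // Each entry is matched as a complete substring, not as a set of chars.
    static readonly string[] Separators = { "+", ",OU=" };

    public static string[] Tokenize(string s) =>
        s.Split(Separators, StringSplitOptions.None);

    public static void Main()
    {
        // Hypothetical input; the question does not show its actual string.
        string testString = "CN=Bob+UID=42,OU=Sales,OU=EU";
        foreach (string token in Tokenize(testString)) Console.WriteLine(token);
    }
}
```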
For the Regex version:
string[] split = Regex.Split(yourstring, @"\+|,OU=");
You may have needed a backslash in front of the "+" to treat it as a literal, and since you're probably defining the regex in a regular string literal, the backslash itself has to be escaped too. Square brackets can be easier to read. Use a non-capturing group (?:...) for the alternation, because Regex.Split returns any text captured by parentheses as extra array elements:
"(?:[+]|,[Oo][Uu]=)"
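The difference between the capturing form "([+]|,[Oo][Uu]=)" and the non-capturing form "(?:[+]|,[Oo][Uu]=)" shows up directly in the result array: Regex.Split includes text captured by parentheses in its output, so the capturing form hands the separators back as extra elements. The input and names below are hypothetical; the [Oo][Uu] part also makes the OU match case-insensitive, which the lowercase "ou=" in the sample exercises.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitCaptureDemo
{
    // With a capturing group, Regex.Split also returns the captured separators.
    public static string[] Capturing(string s) => Regex.Split(s, "([+]|,[Oo][Uu]=)");

    // A non-capturing group (?:...) returns only the pieces between separators.
    public static string[] NonCapturing(string s) => Regex.Split(s, "(?:[+]|,[Oo][Uu]=)");

    public static void Main()
    {
        string input = "CN=Bob+UID=42,OU=Sales,ou=EU"; // hypothetical input
        Console.WriteLine(string.Join(" | ", Capturing(input)));
        Console.WriteLine(string.Join(" | ", NonCapturing(input)));
    }
}
```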
