Regular expression to split on spaces, but keep phrases in quotes together? - c#

Here's an example string -
"EP(DebugFlag="N",UILogFlag="N")" Other words here
I'd like to split the string by spaces, but need to keep quoted phrases together - even if there are quotes within quotes. So I'd like the sample string to be split as -
"EP(DebugFlag="N",UILogFlag="N")"
Other
words
here
I'm not sure how to take the quotes into consideration (finding the starting and ending one). Is there an easy way to do this?
Thanks!

You mean something like this:
string example = @"EP(DebugFlag='N',UILogFlag='N') Other words here";
var result = example.Split(new string[]{" "}, StringSplitOptions.RemoveEmptyEntries).ToList();
foreach(var phrase in result)
{
Console.WriteLine("{0}", phrase);
}
Note: as @CodeCaster suggested, I should mention that I replaced the double quotes with single quotes to provide a "working example". If your sample differs from mine, please provide the exact text without the surrounding quotes.
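For the original input with double quotes, one possible approach is to match tokens instead of splitting. The sketch below is not from the answer above: QuotedSplitter and SplitKeepingQuotes are hypothetical names, and it assumes that a quoted phrase starts with a quote at the beginning of a token and ends at the first quote that is followed by whitespace or the end of the string.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class QuotedSplitter
{
    // First alternative: a quote, then as few characters as possible, then a
    // quote that is followed by whitespace or the end of the input.
    // Second alternative: any run of non-whitespace characters.
    private const string TokenPattern = @""".*?""(?=\s|$)|\S+";

    public static string[] SplitKeepingQuotes(string input)
    {
        return Regex.Matches(input, TokenPattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Called with the sample string from the question, this returns the quoted phrase as one token followed by Other, words and here. A quote that appears in the middle of an unquoted word is treated as an ordinary character, which may or may not suit your data.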

Related

Split string by different marks

How can I split a string by several different symbols, for example dot . and - in a C# string:
string str = "sally-vikram.dean.sarah-ray";
but without first replacing them all with the same mark:
str = str.Replace("-", ".");
and then splitting by dot, for example:
string[] words = str.Split('.');
to get:
sally
vikram
dean
sarah
ray
string.Split can actually take an array of values:
string[] words = str.Split('.', '-');
For your use case, a regex character class (MSDN) is a good choice:
string[] words = Regex.Split(str, "[.-]");
Note: Since - is also used to define a character range like a-z, it's good practice to put the - at the end of the character class. Otherwise, just escape it, e.g. \-.
This is most appropriate if you expect to need further delimiters or other requirements, find the regex more readable, and performance isn't an issue (Regex.Split is typically slower than the equivalent String.Split).

Concatenate and split strings

Maybe this is a silly question. But I haven't found an answer yet.
I've got some strings. I would like to concatenate them and then split the resulting string at a later time. I would like to know if there's something available inside the .NET Framework. The Join and Split methods of String work quite well. The problem is how to escape the separator character.
For example, I would like to use the "#" as separator character. If I have "String1", "Str#ing2" and "String3", I would like to obtain "String1#Str##ing2#String3".
Is there something that does what I need or do I have to write my own function?
Thank you.
Just escape the separator on the way in.
var inputs = new[] { "String1", "Str#ing2", "String3" };
var joined = string.Join("#", inputs.Select(i => i.Replace("#", "##")));
You can then split on single # chars.
var split = Regex.Split(joined, "(?<!#)#(?!#)");
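This uses zero-width negative lookbehind and lookahead assertions to require that the characters before and after the # are not another #. Each resulting piece still contains the doubled ##, so call .Replace("##", "#") on it to recover the original text. You should also test cases where # is at the start or end of an input string, because the escaped # then sits right next to a separator and the two can no longer be told apart.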
Call .Replace("#", "##") on each string before passing them to .Join
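Doubling the separator is ambiguous at the boundaries: { "a#", "b" } and { "a", "#b" } both join to "a###b". If the exact output format from the question is not required, one way around this is to use a separate escape character. The sketch below swaps in that different technique, with backslash as an assumed escape character; EscapedJoin and its members are hypothetical names, not part of the .NET Framework.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EscapedJoin
{
    private const char Separator = '#';
    private const char Escape = '\\'; // assumption: backslash as escape character

    public static string Join(IEnumerable<string> parts)
    {
        return string.Join(Separator.ToString(), parts.Select(EscapePart));
    }

    // Prefix every separator or escape character with the escape character.
    private static string EscapePart(string part)
    {
        var sb = new StringBuilder();
        foreach (char c in part)
        {
            if (c == Separator || c == Escape) sb.Append(Escape);
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static List<string> Split(string joined)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < joined.Length; i++)
        {
            char c = joined[i];
            if (c == Escape && i + 1 < joined.Length)
            {
                // Take the next character literally, whatever it is.
                current.Append(joined[++i]);
            }
            else if (c == Separator)
            {
                // An unescaped separator ends the current part.
                result.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        result.Add(current.ToString());
        return result;
    }
}
```

With this scheme "String1", "Str#ing2" and "String3" join to String1#Str\#ing2#String3 and split back into the same three strings. An empty input list does not round-trip (it comes back as a single empty string), so handle that case separately if it matters.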

Regex.Split on "() " and "?"

string myString = "First?Second Third";
String[] result = Regex.Split(myString, @"( )\?");
Should result in:
First,
Second,
Third
What am I missing? (I also need brackets to split on for something else)
I guess with ( ), you meant whitespace. You don't need any capturing group there. Just use alternation, or a character class:
String[] result = Regex.Split(myString, @"\s|\?");
// OR
String[] result = Regex.Split(myString, @"[\s?]");
Using string methods:
myString= "First?Second Third";
String[] result = myString.Split(' ','?');
I'm not quite sure what you are trying to do with the quotes. Remember that in a regular expression, parentheses denote a logical group; they do not escape a space. To split on an explicit set of characters, use a character class, which is written with brackets []. You should use the following pattern to split:
String[] result = Regex.Split(myString, @"[\?\s]");
Note that \? is an escaped question mark (as you had in your original). White-space characters are matched by \s. Thus, my solution splits the string on any of the characters listed inside the [], namely ? (escaped as \?) and whitespace (written as \s).
EDIT AFTER MORE INFO FROM OP:
I also saw, after answering this post, that you edited the top comment to say you wanted a logical grouping for the white-space, in which case I would go with:
String[] result = Regex.Split(myString, @"[\?(\s)]");
Note that inside [] the parentheses are literal characters, so this pattern also splits on ( and ).
You need to put those characters inside [] to create a character class: [\s\?]. This will split on:
a space
?
You can use \s to handle "any" whitespace char.
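The point about capturing groups matters because Regex.Split includes captured text in its result. The following is a minimal sketch of the difference; SplitDemo and its method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class SplitDemo
{
    // Character class: the delimiters are dropped from the result.
    public static string[] SplitOnWhitespaceOrQuestionMark(string s)
    {
        return Regex.Split(s, @"[\s?]");
    }

    // Capturing group: the captured delimiter is kept as its own element.
    public static string[] SplitKeepingSpaces(string s)
    {
        return Regex.Split(s, @"( )");
    }
}
```

SplitOnWhitespaceOrQuestionMark("First?Second Third") gives First, Second, Third, while SplitKeepingSpaces("Second Third") gives Second, a single space, and Third.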

How to split string preserving spaces and any number of \n characters

I want to split the string and create a collection, with the following rules:
The string should be split into words.
1) If the string contains '\n', it should be treated as a separate '\n' word.
2) If the string contains more than one '\n', each one should be treated as its own '\n' word.
3) No space should be removed from the string. The only exception is that a space between two \n can be ignored.
PS: I tried a lot with string split: I first split on the \n characters and created a collection. The downside is that if I have two consecutive \n, I'm unable to create two dummy words in the collection. Any help would be greatly appreciated.
Is there any way to do this using regex?
Split with a regex like this:
(?<=[\S\n])(?=\s)
Something like:
var substrings = Regex.Split(input, @"(?<=[\S\n])(?=\s)");
This will not remove any spaces at all, but removing them was not required, so that should be fine.
If you really want the spaces between \ns to be removed, you could split with something like:
(?<=[\S\n])(?=\s)(?:[ \t]+(?=\n))?
Looks like homework. As such, read up on \b.
Should set you in the right direction.
Read up on the zero-width assertions. With them you can define a split position between e.g. \s and \S without actually matching either adjacent character.
edit:
Here's another question where the OP asked about those constructs.
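As an alternative to splitting, the rules in the question can also be approached by matching tokens. This is only a sketch under one reading of those rules: every \n becomes its own token, each word keeps the spaces in front of it, and any leftover run of spaces becomes a token of its own. NewlineTokenizer and Tokenize are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class NewlineTokenizer
{
    // \n            a single newline as its own token
    // [^\S\n]*\S+   optional non-newline whitespace followed by a word
    // [^\S\n]+      leftover whitespace that is not followed by a word
    private const string TokenPattern = @"\n|[^\S\n]*\S+|[^\S\n]+";

    public static List<string> Tokenize(string input)
    {
        return Regex.Matches(input, TokenPattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the input "a b\n\nc" this yields "a", " b", "\n", "\n", "c". Spaces that sit between two \n come out as separate tokens, which the caller can drop if they should be ignored.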

regex split to implement tokenizer

I have a string with all possible chars and now I want to split it by following
"+"
",OU="
can anyone show me how to do this with regex.split?
I tried many times, but still no luck
I'm using C#
I think you can use String.Split, which lets you specify multiple separators.
string[] separator = new string[]{"+", ",OU="};
string[] resultTokens = testString.Split(separator, StringSplitOptions.None);
For the Regex version:
string[] split = Regex.Split(yourstring, @"\+|,OU=");
You may have needed a backslash in front of the "+" to treat it as a literal, and you're probably defining the regex using a string, so the string itself will want the backslash character escaped. It can be easier to read to use square brackets instead.
"([+]|,[Oo][Uu]=)"
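If escaping delimiters by hand is error-prone, Regex.Escape can build the pattern from the literal separators. The sketch below assumes a non-empty list of non-empty delimiters; DelimiterSplit and SplitOnLiterals are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class DelimiterSplit
{
    public static string[] SplitOnLiterals(string input, params string[] delimiters)
    {
        // Escape each delimiter so characters like + lose their regex meaning,
        // and try longer delimiters first in case one is a prefix of another.
        string pattern = string.Join("|",
            delimiters.OrderByDescending(d => d.Length)
                      .Select(d => Regex.Escape(d)));
        return Regex.Split(input, pattern);
    }
}
```

For example, SplitOnLiterals("CN=a+CN=b,OU=c", "+", ",OU=") returns CN=a, CN=b and c.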
