Using Regex to split string by different characters based on occurrence - c#

I'm currently replacing a very old (and long) C# string parsing class that I think could be condensed into a single regex statement. Being a newbie to Regex, I'm having some issues getting it working correctly.
Description of the possible input strings:
The input string can have up to three words separated by spaces. It can stop there, or it can have an = followed by any number of further words separated by commas. The words can also be enclosed in quotes. If a word is in quotes and contains a space, it should NOT be split at the space.
Examples of input and expected output elements in the string array:
Input1: this is test
Output1: {"this", "is", "test"}
Input2: this is test=param1,param2,param3
Output2: {"this", "is", "test", "param1", "param2", "param3"}
Input3: use file "c:\test file.txt"=param1 , param2,param3
Output3: {"use", "file", "c:\test file.txt", "param1", "param2", "param3"}
Input4: log off
Output4: {"log", "off"}
And the most complex one:
Input5: use object "c:\test file.txt"="C:\Users\layer.shp" | ( object = 10 ),param2
Output5: {"use", "object", "c:\test file.txt", "C:\Users\layer.shp | ( object = 10 )", "param2"}
So to break this down:
I need to split by spaces up to the first three words
Then, if there is an =, ignore the = and split by commas instead.
If one of the first three words is surrounded by quotes and contains a space, INCLUDE that space (don't split)
Here's the closest regex I've got:
\w+|"[\w\s\:\\\.]*"+([^,]+)
This seems to split the string based on spaces, and by commas after the =. However, it seems to include the = for some reason if one of the first three words is surrounded by quotes. Also, I'm not sure how to split by space only up to the first three words in the string, and the rest by comma if there is an =.
It looks like part of my solution is to use quantifiers with {}, but I've been unable to set it up properly.

Without regex. Regex is best kept for cases that plain string methods cannot handle:
using System;
using System.Collections.Generic;
using System.Linq;

string[] inputs = {
    "this is test",
    "this is test=param1,param2,param3",
    "use file \"c:\\test file.txt\"=param1 , param2,param3",
    "log off",
    "use object \"c:\\test file.txt\"=\"C:\\Users\\layer.shp\" | ( object = 10 ),param2"
};
foreach (string input in inputs)
{
    List<string> splitArray;
    int equalPosition = input.IndexOf('=');
    if (equalPosition < 0)
    {
        // No '=': every space-separated word is an element.
        splitArray = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
    }
    else
    {
        // Words before the first '=' are split on spaces.
        // Caveat: this also splits a quoted word that contains a space.
        splitArray = input.Substring(0, equalPosition).Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
        // Everything after the first '=' is split on commas and trimmed.
        string end = input.Substring(equalPosition + 1);
        splitArray.AddRange(end.Split(',').Select(p => p.Trim()));
    }
    // Wrap unquoted elements in quotes for display.
    string output = string.Join(",", splitArray.Select(x => x.Contains("\"") ? x : "\"" + x + "\""));
    Console.WriteLine(output);
}
Console.ReadLine();
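One thing the plain string-method version does not handle is the quoted word with a space (Input3 and Input5), because it splits the part before the = on every space. A minimal sketch of one way around that is below. It assumes the first = outside quotes marks the start of the parameter list, and that quotes only need stripping from the leading words. `CommandParser` and `Parse` are hypothetical names, not part of the original class, and quotes in the parameter list are left as they are.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommandParser
{
    // Either a quoted run (content captured in group 1) or a run of
    // characters that are not whitespace, quotes, or '='.
    private static readonly Regex HeadToken = new Regex("\"([^\"]*)\"|[^\\s\"=]+");

    public static List<string> Parse(string input)
    {
        // Locate the first '=' that is not inside a quoted section.
        int eq = -1;
        bool inQuotes = false;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '"') inQuotes = !inQuotes;
            else if (input[i] == '=' && !inQuotes) { eq = i; break; }
        }

        string head = eq < 0 ? input : input.Substring(0, eq);
        var result = new List<string>();

        // Leading words: split on spaces, but keep quoted text together.
        foreach (Match m in HeadToken.Matches(head))
            result.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Value);

        // Parameters: split on commas and trim (assumption: no commas inside parameters).
        if (eq >= 0)
        {
            foreach (string part in input.Substring(eq + 1).Split(','))
                result.Add(part.Trim());
        }
        return result;
    }
}
```

Calling `CommandParser.Parse` on Input3 keeps `c:\test file.txt` as a single element. The regex is only used for the quote-aware part; the comma split stays a plain `Split`.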

Related

How to split a string with a certain character without considering the length of the character?

I'm trying to work on this string
abc
def
--------------
efg
hij
------
xyz
pqr
--------------
Now I have to split the string with the - character.
So far I'm first splitting the string into lines, then finding each line of - characters and replacing it with a single *, then combining the whole string and splitting it again.
I'm trying to get the data as
string[] set =
{
"abc
def",
"efg
hij",
"xyz
pqr"
}
Is there a better way to do this?
var splitStrings = yourString.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If I understand your question, the code above solves it.
String.Split accepts a specific character or set of characters (here -) to split on, and RemoveEmptyEntries drops the empty strings produced by consecutive dashes.
The output is an array of strings; pick whichever ones you need.
Example:
http://www.dotnetperls.com/split
I'm confused about exactly what you're asking, but won't this work?
string[] seta =
{
"abc\ndef",
"efg\nhij",
"xyz\npqr"
};
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\n = LF (Line Feed) // Used as a new line character in Unix
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r
If I'm understanding your question about splitting -'s then the following should work.
string s = "abc-def-efg-hij-xyz-pqr"; // example?
string[] letters = s.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
If this is what your array looks like at the moment, then you can loop through it as follows:
string[] seta = {
"abc-def",
"efg-hij",
"xyz-pqr"
};
foreach (var letter in seta)
{
string[] letters = letter.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
// do something with letters?
}
The code below should help:
string m = "adasd------asdasd---asdasdsad-------asdasd------adsadasd---asdasd---asdadadad-asdadsa-asdada-s---adadasd-adsd";
var array = m.Split('-');
List<string> myCollection = new List<string>();
if (array.Length > 0)
{
foreach (string item in array)
{
if (item != "")
{
myCollection.Add(item);
}
}
}
string[] str = myCollection.ToArray();
If it does, don't forget to mark my answer. Thanks!
string set = "abc----def----------------efg----hij--xyz-------pqr";
var splitStrings = set.Split(new char[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
EDIT:
He wants to split the string no matter how many '-' characters there are. The call above does that, because RemoveEmptyEntries discards the empty strings between consecutive dashes.
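For the multi-line input in the question, splitting on every single - leaves the line breaks around each block in the results. A minimal sketch of an alternative, assuming a separator is always a whole line made up only of - characters, is to split on those lines with a regex and trim each block. `DashSplitter` and `SplitOnDashLines` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DashSplitter
{
    public static string[] SplitOnDashLines(string text)
    {
        // With RegexOptions.Multiline, ^ and $ match at line boundaries.
        // The optional \r lets the pattern also match lines ending in \r\n.
        return Regex.Split(text, @"^-+\r?$", RegexOptions.Multiline)
            .Select(block => block.Trim())        // drop the newlines around each block
            .Where(block => block.Length > 0)     // drop the empty tail after the last separator
            .ToArray();
    }
}
```

Because the pattern only matches complete lines of dashes, a - inside a data line (for example `a-b`) is left alone, which a plain `Split('-')` would not do.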

Is it possible to split a string into an array of strings and remove sections not between delimiters using String.Split or regex?

I want to split a string with a known delimiter between its parts into an array of strings using a method (e.g. MethodToSplitIntoArray(String toSplit)), as in the example below. The values are strings that can contain any character except '{', '}', or ',', so I am unable to delimit on any other character. The string can also contain undesired white space at the start and end, as the file can be generated from multiple different sources. The desired information is always in between "{" and "}" and separated by commas.
String valueCombined = " {value},{value1},{value2} ";
String[] values = MethodToSplitIntoArray(valueCombined);
foreach(String value in values)
{
//Do something with array
Label.Text += "\r\nString: " + value;
}
Where the label would show:
String: value
String: value1
String: value2
My current implementation of the splitting method is below. It splits the values but also returns any spaces before the first curly brace and anything between the brace pairs.
private String[] MethodToSplitIntoArray(String toSplit)
{
return toSplit.Split(new string[] { "{", "}" }, StringSplitOptions.RemoveEmptyEntries);
}
I thought this would separate out the strings between the curly braces and remove the rest of the string, but my output is:
String:
String: value
String: ,
String: value1
String: ,
String: value2
String:
What am I doing wrong in my split that I'm still getting the string values outside of the curly braces? Ideally I would like to use regex or String.Split if it's possible.
For those with similar problems check out DotNet Perls on splitting
Making the assumption that commas are not permitted inside a curly brace pair, and that outside a curly brace pair only commas or whitespace will appear, it seems to me that the most straightforward, easy-to-read way to approach this is to first split on commas, then trim the results of that (to remove whitespace), and then finally to remove the first and last characters (which at that point should only be the curly braces):
valueCombined.Split(',').Select(s => s.Trim()).Select(s => s.Substring(1, s.Length - 2)).ToArray();
I believe that including the curly braces in the initial split operation just makes everything harder, and is more likely to break in hard-to-identify ways (i.e. bad data will result in weirder results than if you use something like the above).
Add , to the delimiters:
return toSplit.Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
Not sure if you are expecting those spaces at the start and end, so I added some trimming to prevent whitespace-only results for those.
private String[] MethodToSplitIntoArray(String toSplit)
{
return toSplit.Trim().Split(new char[] { '{', '}', ',' }, StringSplitOptions.RemoveEmptyEntries);
}
This might be one way to get all the values you are looking for:
String valueCombined = " {value},{value1},{value2} ";
// Trim first so the leading and trailing spaces do not end up in the values.
String[] values = valueCombined.Trim().Split(new string[] { "},{" }, StringSplitOptions.RemoveEmptyEntries);
int lastVal = values.Length - 1;
values[0] = values[0].Replace("{", "");
values[lastVal] = values[lastVal].Replace("}", "");
What I did here is split the string on "},{" and then remove { from the first array item and } from the last array item.
Try regex and LINQ.
return Regex.Split(toSplit, "[{},]").Where(x => !string.IsNullOrWhiteSpace(x)).ToArray();
Though this is very late, you can try this:
Regex.Split(" { value},{ value1},{ value2};", @"\s*},{\s*|{\s*|},?;?").Where(s => string.IsNullOrWhiteSpace(s) == false).ToArray()
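Another option is to match the wanted parts instead of splitting on the unwanted ones. The sketch below assumes, as the question states, that values never contain '{', '}', or ','. Anything outside a brace pair (spaces, commas) is ignored simply because it is never matched. `BraceExtractor` and `ExtractBraced` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BraceExtractor
{
    public static string[] ExtractBraced(string toSplit)
    {
        // \{([^{}]*)\} captures whatever sits between one '{' and the next '}'.
        return Regex.Matches(toSplit, @"\{([^{}]*)\}")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}
```

With the input from the question, `BraceExtractor.ExtractBraced(" {value},{value1},{value2} ")` yields `value`, `value1`, and `value2`, with no need to trim or filter empty entries afterwards.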

How do I know which delimiter was used when delimiting a string on multiple delimiters? (C#)

I read strings from a file and they come in various styles:
item0 item1 item2
item0,item1,item2
item0_item1_item2
I split them like this:
string[] split_line = line[i].Split(new char[] {' ',',','_'});
I change an item (column) and then I stitch the string back together using a StringBuilder.
But now when putting the string back I have to use the right delimiter.
Is it possible to know which delimiter was used when splitting the string?
UPDATE
the caller will pass me the first item so that I only change that line.
Unless you keep track of each splitting action (one delimiter at a time), you can't.
Otherwise, you could create a regular expression that captures both the item and the delimiter and go from there.
Instead of passing in an array of characters, you can use a Regex to split the string instead. The advantage of doing this, is that you can capture the splitting character. Regex.Split will insert any captures between elements in the array like so:
string[] space = Regex.Split("123 456 789", @"([,_ ])");
// Results in { "123", " ", "456", " ", "789" }
string[] comma = Regex.Split("123,456,789", @"([,_ ])");
// Results in { "123", ",", "456", ",", "789" }
string[] underscore = Regex.Split("123_456_789", @"([,_ ])");
// Results in { "123", "_", "456", "_", "789" }
Then you can edit all items in the array with something like
for (int x = 0; x < space.Length; x += 2)
space[x] = space[x] + "x";
Console.WriteLine(String.Join("", space));
// Will print: 123x 456x 789x
One thing to be wary of when dealing with multiple separators is if there are any lines that have spaces, commas and underscores in them. e.g.
37,hello world,238_3
This code will preserve all the distinct separators but your results might not be expected. e.g. the output of the above would be:
37x,hellox worldx,238x_3x
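Applied to the original problem (change one column, keep the line's own delimiters), the capture-group split could be wrapped roughly as follows. This is only a sketch: `DelimiterPreservingEditor` and `ReplaceColumn` are hypothetical names, and it assumes columns are counted from zero and that the delimiter set is space, comma, and underscore as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterPreservingEditor
{
    public static string ReplaceColumn(string line, int column, string newValue)
    {
        // The capture group makes Regex.Split return the delimiters too:
        // items land at even indexes, delimiters at odd indexes.
        string[] parts = Regex.Split(line, @"([,_ ])");

        int index = column * 2;
        if (column < 0 || index >= parts.Length)
            throw new ArgumentOutOfRangeException(nameof(column));

        parts[index] = newValue;

        // Concatenating everything restores the original delimiters.
        return string.Concat(parts);
    }
}
```

Since the delimiters travel with the array, there is no need to remember which one was used; the caveat about lines that mix several delimiters still applies.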
As I mentioned, the caller passes me the first item, so I tried something like this:
// find the right row
if (lines[i].ToLower().StartsWith(rowID))
{
// we have to know which delim was used to split the string since this will be
// used when stitching back the string together.
for (int delim = 0; delim < delims.Length; delim++)
{
// we split the line into an array and then use the array index as our column index
split_line = lines[i].Trim().Split(delims[delim]);
// we found the right delim
if (split_line.Length > 1)
{
delim_used = delims[delim];
break;
}
}
}
Basically, I try each delimiter on the line and check the resulting array length. If it is > 1, that delimiter worked; otherwise I skip to the next one. I am relying on this documented property of Split: "If this instance does not contain any of the characters in separator, the returned array consists of a single element that contains this instance."
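The same idea can be written as a small self-contained helper. This is a sketch under the same assumption as the loop above, namely that each line uses exactly one of the candidate delimiters; `DelimiterDetector` and `Detect` are hypothetical names. Checking whether the trimmed line contains the character is equivalent to checking that Split returns more than one element.

```csharp
using System;

public static class DelimiterDetector
{
    // Returns the first candidate that occurs in the trimmed line,
    // or null when none of them is present.
    public static char? Detect(string line, char[] candidates)
    {
        string trimmed = line.Trim();
        foreach (char candidate in candidates)
        {
            if (trimmed.IndexOf(candidate) >= 0)
                return candidate;
        }
        return null;
    }
}
```

The order of `candidates` matters for lines that contain more than one of them, because the first hit wins.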

Insert space between characters

I have a string,
string aString = "a,aaa,aaaa,aaaaa,,,,,";
whose elements I want to insert into a List. But when I do so using the following method,
List<string> aList = new List<string>();
aList.AddRange(aString.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries));
MessageBox.Show(aList.Count.ToString());
I get a count of only 4, but there are actually 8 elements, even though the final ones between the commas are blank.
But if I pass the string as shown below,
string aString = "a,aaa,aaaa,aaaaa, , , , ,";
it is shown as 8 elements. Please help me with this. By default, the program retrieves the string like so:
a,aaa,aaaa,aaaaa,,,,,
It would be great if I could add spaces to the empty areas, or find some other way to add everything between the commas to the list, including the blank areas. Thank you :)
Don't use StringSplitOptions.RemoveEmptyEntries
string aString = "a,aaa,aaaa,aaaaa,,,,,";
var newStr = String.Join(", ", aString.Split(','));
I think you must remove StringSplitOptions.RemoveEmptyEntries:
aList.AddRange(aString.Split(new string[] { "," }, StringSplitOptions.None));
You can just remove the spaces before splitting:
aList.AddRange(aString.Replace(" ", "").Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries));
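To make the effect of the option concrete, here is a minimal sketch that keeps the blank entries and, as the question asks, turns each one into a single space. `BlankPreservingSplit` and `SplitKeepingBlanks` are hypothetical names. Note that a plain split of the sample string gives 9 entries rather than 8, because the text after the final comma counts as one more (empty) entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlankPreservingSplit
{
    public static List<string> SplitKeepingBlanks(string csv)
    {
        // Split(',') without RemoveEmptyEntries keeps empty strings
        // for consecutive commas and for a trailing comma.
        return csv.Split(',')
            .Select(item => item.Length == 0 ? " " : item) // replace blanks with one space
            .ToList();
    }
}
```

If the trailing empty entry is not wanted, it can be dropped after the split; that choice depends on how the source data is meant to be read.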

Split a string by word using one of any or all delimiters?

I may have just hit the point where I'm overthinking it, but I'm wondering: is there a way to designate a list of special characters that should all be considered delimiters, and then split a string using that list? Example:
"battlestar.galactica-season 1"
should be returned as
battlestar galactica season 1
I'm thinking regex, but I'm kind of flustered at the moment; I've been staring at it for too long.
EDIT:
Thanks, guys, for confirming my suspicion that I was overthinking it. Here is what I ended up with:
//remove the delimiter
string[] tempString = fileTitle.Split(@"\/.-<>".ToCharArray());
fileTitle = "";
foreach (string part in tempString)
{
fileTitle += part + " ";
}
return fileTitle;
I suppose I could just replace the delimiters with " " spaces as well... I will select an answer as soon as the timer is up!
The built-in String.Split method can take a collection of characters as delimiters.
string s = "battlestar.galactica-season 1";
string[] words = s.Split('.', '-');
The standard split method does that for you. It takes an array of characters:
public string[] Split(
params char[] separator
)
You can just call an overload of split:
myString.Split(new char[] { '.', '-', ' ' }, StringSplitOptions.RemoveEmptyEntries);
The char array is a list of delimiters to split on.
"battlestar.galactica-season 1".Split(new string[] { ".", "-" }, StringSplitOptions.RemoveEmptyEntries);
This may not be complete but something like this.
string value = "battlestar.galactica-season 1";
char[] delimiters = new char[] { '\r', '\n', '.', '-' };
string[] parts = value.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < parts.Length; i++)
{
Console.WriteLine(parts[i]);
}
Are you trying to split the string (make multiple strings), or do you just want to replace the special characters with a space, as your example might also suggest (make one altered string)?
For the first option just see the other answers :)
If you want to replace you could use
string title = "battlestar.galactica-season 1".Replace('.', ' ').Replace('-', ' ');
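If the list of delimiters grows, chaining Replace calls gets unwieldy. A minimal sketch that combines both ideas (split on a set of delimiters, then join with single spaces) is below. `TitleNormalizer` and `Normalize` are hypothetical names, and the delimiter set is an assumption to adjust as needed.

```csharp
using System;

public static class TitleNormalizer
{
    // Assumed delimiter set; extend it with whatever characters should count as separators.
    private static readonly char[] Delimiters = { '.', '-', '_', ' ' };

    public static string Normalize(string title)
    {
        // RemoveEmptyEntries collapses runs of delimiters, so "a..b" does not
        // produce a double space in the result.
        string[] words = title.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

`TitleNormalizer.Normalize("battlestar.galactica-season 1")` returns `battlestar galactica season 1`, which is the output asked for in the question, and `string.Join` avoids the trailing space that the concatenation loop in the question's edit leaves behind.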
For more information on Split, with easy examples (including splitting on words, i.e. multiple characters), see:
C# Split Function explained
