How to parse out all unique variables with a certain naming convention? - c#

I have a code file and I need to find all unique objects of type TADODataSet, but they aren't defined in this 30,000 line file I have.
I wrote a console application that splits each line into individual words and adds that word to a list if it contains ADODataSet (the naming convention prefix for the objects I'm interested in) but this didn't work quite right because of how I'm splitting my lines of code.
This is all of my code:
static void Main(string[] args)
{
string file = @"C:\somePath\Form1.cs";
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
string[] lines = File.ReadAllLines(file);
foreach (string line in lines)
{
string[] words = line.Split(' ');
foreach (string word in words)
{
if (word.ToLower().Contains("adodataset"))
datasets.Add(word);
}
}
if (datasets.Count > 0)
{
using (StreamWriter sw = new StreamWriter(output))
{
foreach (string dataset in datasets.Distinct())
{
sw.WriteLine(dataset);
}
}
Console.WriteLine(String.Format("Wrote {0} data sets to {1}", datasets.Distinct().Count(), output));
Console.ReadKey();
}
}
But this didn't work as I hoped, and added "words" such as these:
SQLText(ADODataSetEnrollment->FieldByName("Age1")->AsString)
SQLText(ADODataSetEnrollment->FieldByName("Age2")->AsString)
SQLText(ADODataSetEnrollment->FieldByName("Age3")->AsString)
I'm only interested in ADODataSetEnrollment, so I should only have 1 entry for that variable in my output file but because that line of code doesn't contain a space it's treated as a single "word".
How can I split my lines array instead, so that way I can find unique variables?

Have you tried regex matching? With Regex you can, for example, write
Regex.IsMatch(word, @"(?i)(?<!\w)adodataset(?!\w)")
> (?i) makes the match case-insensitive
> (?<!\w) means not preceded by a word character (a letter, digit, or underscore)
> (?!\w) means not followed by a word character
> Regex.IsMatch(...) returns a bool value
Note that (?!\w) rejects longer names such as ADODataSetEnrollment; drop it and capture the rest of the name with \w* if you want the full identifier.

Ended up with this as a solution:
string file = @"C:\somePath\Form1.cs";
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
string[] lines = File.ReadAllLines(file);
decimal i = 0;
foreach (string line in lines)
{
string[] words = line.Split(' ');
foreach (string word in words)
{
if (word.ToLower().Contains("adodataset"))
{
int start = word.ToLower().IndexOf("adodataset");
string dsWord = String.Empty;
string temp = word.Substring(start, word.Length - start);
foreach (char c in temp)
{
if (Char.IsLetter(c))
dsWord += c;
else
break;
}
if (dsWord != String.Empty)
datasets.Add(dsWord);
}
}
i++;
Console.Write("\r{0}% ", Math.Round(i / lines.Count() * 100, 2));
}
if (datasets.Count > 0)
{
using (StreamWriter sw = new StreamWriter(output))
{
foreach (string dataset in datasets.Distinct())
sw.WriteLine(dataset);
}
Console.WriteLine(String.Format("Wrote {0} data sets to {1}", datasets.Distinct().Count(), output));
Console.ReadKey();
}
Pretty crude, but it did what I needed it to do. I'll happily accept someone else's answer, though, if they know a better way to use Regex to pull out just the variable name from within the line of code, rather than the whole line itself.

You can try this solution:
string file = File.ReadAllText(@"text.txt");
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
var a = Regex.Matches(file, @"\W(ADODataSet\w*)", RegexOptions.IgnoreCase);
foreach (Match m in a)
{
datasets.Add(m.Groups[1].Value);
}
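For what it's worth, here is a minimal sketch of the same regex idea that also removes duplicates, since the question asks for unique names. The class and method names (DataSetNameFinder, FindNames) are made up for illustration, and I'm assuming \b (word boundary) is acceptable in place of \W so that a name at the very start of the text is matched too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DataSetNameFinder
{
    // Returns each distinct identifier that starts with "ADODataSet",
    // in order of first appearance. Names are compared ignoring case.
    public static List<string> FindNames(string source)
    {
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        List<string> names = new List<string>();

        // \b anchors at a word boundary; \w* consumes the rest of the identifier.
        foreach (Match m in Regex.Matches(source, @"\bADODataSet\w*", RegexOptions.IgnoreCase))
        {
            // HashSet.Add returns false when the name was already recorded.
            if (seen.Add(m.Value))
                names.Add(m.Value);
        }
        return names;
    }
}
```

You would pass it the result of File.ReadAllText and write the returned list to the output file.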

Related

Splitting a line of text that has key value pairs where value can be empty

I need to split a line of text
The general syntax for a delivery instruction is |||name|value||name|value||…..|||
Each delivery instruction starts and ends with 3 pipe characters - |||
A delivery instruction is a set of name/value pairs separated by a single pipe eg name|value
Each name value pair is separated by 2 pipe characters ||
Names and Values may not contain the pipe character
The value of any pair may be a blank string.
I need a regex that will help me resolve the above problem.
My latest attempt with my limited Regex skills:
string SampleData = "|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
string[] Split = Regex.Matches(SampleData, @"(\|\|\|(?:\w+\|\w*\|\|)*\|)").Cast<Match>().Select(m => m.Value).ToArray();
The expected output should be as follows(based on the sample data string provided):
env|af245g
mail_idx|39
gen_date|2016/01/03 11:40:06
docm_name|Client Statement (01.03.2015−31.03.2015)
docm_cat_name|Client Statement
docm_type_id|9100
docm_type_name|Client Statement
addr_type_id|1
addr_type_name|Postal address
addr_street_nr|
addr_street_name|Robinson Road
addr_po_box|
addr_po_box_type|
addr_postcode|903334
addr_city|Singapore
addr_state|
addr_country_id|29955
addr_country_name|Singapore
obj_nr|10000023
bp_custr_type|Customer
access_portal|Y
access_library|Y
avsr_team_id|13056
pri_avsr_id|
pri_avsr_name|
ctact_phone|
dlv_type_id|5001
dlv_type_name|Channel to standard mail
ao_id|14387
ao_name|Corp Limited
ao_title|
ao_mob_nr|
ao_email_addr|
You can also do it without using Regex. It's just simple splitting.
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string sub = nameValues.Substring(3, nameValues.Length - 6);
Dictionary<string, string> dic = new Dictionary<string, string>();
string[] subsub = sub.Split(new string[] {"||"}, StringSplitOptions.None);
foreach (string item in subsub)
{
string[] nameVal = item.Split('|');
dic.Add(nameVal[0], nameVal[1]);
}
foreach (var item in dic)
{
// Retrieve key and value here i.e:
// item.Key
// item.Value
}
Hope this helps.
I think you're making this more difficult than it needs to be. This regex yields the desired result:
@"[^|]+\|([^|]*)"
Assuming you're dealing with a single, well-formed delivery instruction, there's no need to match the starting and ending triple-pipes. You don't need to worry about the double-pipe separators either, because the "name" part of the "name|value" pair is always present. Just look for the first thing that looks like a name with a pipe following it, and everything up to the next pipe character is the value.
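A minimal sketch of how that pattern could be applied, assuming a single well-formed instruction as described above. The class and method names (DeliveryInstructionParser, Parse) are hypothetical, and I've added a capture group around the name so both halves of each pair are available; the pattern as posted only captures the value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DeliveryInstructionParser
{
    // Group 1 = name (one or more non-pipe characters),
    // group 2 = value (zero or more non-pipe characters, so it may be empty).
    private static readonly Regex Pair = new Regex(@"([^|]+)\|([^|]*)");

    public static List<KeyValuePair<string, string>> Parse(string instruction)
    {
        List<KeyValuePair<string, string>> pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(instruction))
        {
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

A list of pairs is used rather than a Dictionary so that a repeated name does not throw; switch to a Dictionary if names are known to be unique.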
(?<=\|\|\|).*?(?=\|\|\|)
You can use this to get all the key/value pairs between the ||| markers. See the demo:
https://regex101.com/r/fM9lY3/59
string strRegex = @"(?<=\|\|\|).*?(?=\|\|\|)";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
Here's a variation of @Syed Muhammad Zeeshan's code that runs faster. Note that it only works when no value is blank, because RemoveEmptyEntries also discards empty values and the remaining names and values then pair up incorrectly:
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string[] nameArray = nameValues.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dic = new Dictionary<string, string>();
// Names sit at even indices, values at the odd index that follows.
for (int i = 0; i + 1 < nameArray.Length; i += 2)
{
dic.Add(nameArray[i], nameArray[i + 1]);
}
Interesting; I'd like to try this:
class Program
{
static void Main(string[] args)
{
string nameValueList = "|||zeeshan|1||ali|2||ahsan|3|||";
while (nameValueList != "|||")
{
nameValueList = nameValueList.TrimStart('|');
string nameValue = GetNameValue(ref nameValueList);
Console.WriteLine(nameValue);
}
Console.ReadLine();
}
private static string GetNameValue(ref string nameValues)
{
string retVal = string.Empty;
while(nameValues[0] != '|') // for name
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
while (nameValues[0] != '|') // for value
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
return retVal;
}
}
https://dotnetfiddle.net/WRbsRu

Split string by commas ignoring any punctuation marks (including ',') in quotation marks

How can I split string (from a textbox) by commas excluding those in double quotation marks (without getting rid of the quotation marks), along with other possible punctuation marks (e.g. ' . ' ' ; ' ' - ')?
E.g. If someone entered the following into the textbox:
apple, orange, "baboons, cows", rainbow, "unicorns, gummy bears"
How can I split the above string into the following (say, into a List)?
apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"
Thank you for your help!
You could try the below regex which uses positive lookahead,
string value = @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";
string[] lines = Regex.Split(value, @", (?=(?:""[^""]*?(?: [^""]*)*))|, (?=[^"",]+(?:,|$))");
foreach (string line in lines) {
Console.WriteLine(line);
}
Output:
apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"
Try this:
Regex str = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match m in str.Matches(input))
{
Console.WriteLine(m.Value.TrimStart(','));
}
You may also try to look at FileHelpers
Much like a CSV parser, instead of Regex, you can loop through each character, like so:
public List<string> ItemStringToList(string inputString)
{
var itemList = new List<string>();
var currentItem = "";
var quotesOpen = false;
for (int i = 0; i < inputString.Length; i++)
{
if (inputString[i] == '"')
{
quotesOpen = !quotesOpen;
continue;
}
if (inputString[i] == ',' && !quotesOpen)
{
itemList.Add(currentItem);
currentItem = "";
continue;
}
if (currentItem == "" && inputString[i] == ' ') continue;
currentItem += inputString[i];
}
if (currentItem != "") itemList.Add(currentItem);
return itemList;
}
Example test usage:
var test1 = ItemStringToList("one, two, three");
var test2 = ItemStringToList("one, \"two\", three");
var test3 = ItemStringToList("one, \"two, three\"");
var test4 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test5 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test6 = ItemStringToList("one, \"two, three\", four, \"five six, seven\"");
var test7 = ItemStringToList("\"one, two, three\", four, \"five six, seven\"");
You could change it to use StringBuilder if you want faster character joining.
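Here is a rough sketch of what the StringBuilder variant might look like. It is not a drop-in copy of the method above: the names (QuotedListSplitter, Split, AddItem) are hypothetical, it keeps the quotation marks in the output (which is what the question asked for), and it skips empty items. It assumes quotes are balanced and never escaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class QuotedListSplitter
{
    // Splits on commas that are outside double quotes.
    // Quotation marks are kept as part of the item.
    public static List<string> Split(string input)
    {
        List<string> items = new List<string>();
        StringBuilder current = new StringBuilder();
        bool insideQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                insideQuotes = !insideQuotes;
                current.Append(c); // keep the quotation mark in the item
            }
            else if (c == ',' && !insideQuotes)
            {
                AddItem(items, current); // a comma outside quotes ends the item
            }
            else
            {
                current.Append(c);
            }
        }
        AddItem(items, current); // flush the last item
        return items;
    }

    // Trims surrounding whitespace, skips empty items, and resets the buffer.
    private static void AddItem(List<string> items, StringBuilder current)
    {
        string item = current.ToString().Trim();
        if (item.Length > 0)
            items.Add(item);
        current.Length = 0;
    }
}
```

Because only commas outside quotes end an item, other punctuation inside the quotes (periods, semicolons, hyphens) passes through untouched.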
Try this; it should work. You can split a string array in many ways. If you want to split on whitespace, just pass a space (' ') to Split.
namespace LINQExperiment1
{
class Program
{
static void Main(string[] args)
{
string[] sentence = new string[] { "apple", "orange", "baboons cows", " rainbow", "unicorns gummy bears" };
Console.WriteLine("option 1:"); Console.WriteLine("-------");
// option 1: Select returns one string[] per element
// of the source array.
IEnumerable<string[]> words1 =
sentence.Select(w => w.Split(' '));
// to get each word, we have to use two foreach loops
foreach (string[] segment in words1)
foreach (string word in segment)
Console.WriteLine(word);
Console.WriteLine();
Console.WriteLine("option 2:"); Console.WriteLine("-------");
// option 2: SelectMany flattens the per-element arrays
// into a single sequence of strings
IEnumerable<string> words2 =
sentence.SelectMany(segment => segment.Split(','));
// with SelectMany we have every string individually
foreach (var word in words2)
Console.WriteLine(word);
// option 3: identical to Opt 2 above written using
// the Query Expression syntax (multiple froms)
IEnumerable<string> words3 =from segment in sentence
from word in segment.Split(' ')
select word;
}
}
}
This was trickier than I thought, a good practical problem I think.
Below is the solution I came up with for this. One thing I don't like about my solution is having to add double quotations back and the other one being names of the variables :p:
internal class Program
{
private static void Main(string[] args)
{
string searchString =
@"apple, orange, ""baboons, cows. dogs- hounds"", rainbow, ""unicorns, gummy bears"", abc, defghj";
char delimeter = ',';
char excludeSplittingWithin = '"';
string[] splittedByExcludeSplittingWithin = searchString.Split(excludeSplittingWithin);
List<string> splittedSearchString = new List<string>();
for (int i = 0; i < splittedByExcludeSplittingWithin.Length; i++)
{
if (i == 0 || splittedByExcludeSplittingWithin[i].StartsWith(delimeter.ToString()))
{
string[] splitttedByDelimeter = splittedByExcludeSplittingWithin[i].Split(delimeter);
for (int j = 0; j < splitttedByDelimeter.Length; j++)
{
splittedSearchString.Add(splitttedByDelimeter[j].Trim());
}
}
else
{
splittedSearchString.Add(excludeSplittingWithin + splittedByExcludeSplittingWithin[i] +
excludeSplittingWithin);
}
}
foreach (string s in splittedSearchString)
{
if (s.Trim() != string.Empty)
{
Console.WriteLine(s);
}
}
Console.ReadKey();
}
}
Another Regex solution:
private static IEnumerable<string> Parse(string input)
{
// if used frequently, should be instantiated with Compiled option
Regex regex = new Regex(@"(?<=^|,\s)(\""(?:[^\""]|\""\"")*\""|[^,\s]*)");
return regex.Matches(input).Cast<Match>().Select(m => m.Value);
}

SubString editing

I've tried a few different methods and none of them work correctly, so I'm looking for someone to show me how to do it. I want my application to read in a file chosen through an OpenFileDialog.
When the file is read in, I want to go through it and run this function, which uses LINQ to insert the data into my DB.
objSqlCommands.sqlCommandInsertorUpdate
However, I want to go through the string counting the number of commas found. When the count reaches four, I want to take only the characters up to the next comma, and repeat this until the end of the file. Can someone show me how to do this?
Based on the answers given here, my code now looks like this:
string fileText = File.ReadAllText(ofd.FileName).Replace(Environment.NewLine, ",");
int counter = 0;
int idx = 0;
List<string> foo = new List<string>();
foreach (char c in fileText.ToArray())
{
idx++;
if (c == ',')
{
counter++;
}
if (counter == 4)
{
string x = fileText.Substring(idx);
foo.Add(fileText.Substring(idx, x.IndexOf(',')));
counter = 0;
}
}
foreach (string s in foo)
{
objSqlCommands.sqlCommandInsertorUpdate("INSERT", s);//laClient[0]);
}
However, I am getting a "length cannot be less than 0" error on the foo.Add call. Any ideas?
A somewhat hacky example. You would pass this the entire text from your file as a single string.
string str = "1,2,3,4,i am some text,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20";
int counter = 0;
int idx = 0;
List<string> foo = new List<string>();
foreach (char c in str.ToArray())
{
idx++;
if (c == ',')
{
counter++;
}
if (counter == 4)
{
string x = str.Substring(idx);
foo.Add(str.Substring(idx, x.IndexOf(',')));
counter = 0;
}
}
foreach(string s in foo)
{
Console.WriteLine(s);
}
Console.Read();
Prints:
i am some text
9
13
17
As Raidri indicates in his answer, String.Split is definitely your friend. To catch every fifth word, you could try something like this (not tested):
string fileText = File.ReadAllText(OpenDialog.FileName).Replace(Environment.NewLine, ",");
string[] words = fileText.Split(',');
List<string> everyFifthWord = new List<string>();
for (int i = 4; i < words.Length; i += 5)
{
everyFifthWord.Add(words[i]);
}
The above code reads the selected file from the OpenFileDialog, then replaces every newline with a ",". Then it splits the string on ",", and starting with the fifth word takes every fifth word in the string and adds it to the list.
File.ReadAllText reads a text file into a string, and Split turns that string into an array separated at the commas:
File.ReadAllText(OpenDialog.FileName).Split(',')[4]
If you have more than one line use:
File.ReadAllLines(OpenDialog.FileName).Select(l => l.Split(',')[4])
This gives an IEnumerable<string> where each string contains the wanted part from one line of the file.
It's not clear to me if you're after every fifth piece of text between the commas or if there are multiple lines and you want only the fifth piece of text on each line. So I've done both.
Every fifth piece of text:
var text = "1,2,3,4,i am some text,6,7,8,9"
+ ",10,11,12,13,14,15,16,17,18,19,20";
var everyFifth =
text
.Split(',')
.Where((x, n) => n % 5 == 4);
Only the fifth piece of text on each line:
var lines = new []
{
"1,2,3,4,i am some text,6,7",
"8,9,10,11,12,13,14,15",
"16,17,18,19,20",
};
var fifthOnEachLine =
lines
.Select(x => x.Split(',')[4]);
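One caveat with indexing [4] directly: a line with fewer than five fields throws an IndexOutOfRangeException. A minimal sketch of a guarded version, assuming such lines should simply be skipped (the names FifthField and FromLines are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class FifthField
{
    // Returns the fifth comma-separated field of each line,
    // skipping lines that have fewer than five fields.
    public static List<string> FromLines(IEnumerable<string> lines)
    {
        List<string> result = new List<string>();
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            if (fields.Length > 4)
                result.Add(fields[4]);
        }
        return result;
    }
}
```

Passing File.ReadAllLines(path) as the argument would apply it to a whole file.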

How to get a specific column's values from a CSV file using C#?

I'm working on a C# WinForms project.
I want to get the first column's values from a CSV file.
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
}
------------------
no |name |
------------------
1 |wwwwww
2 |yyyyy
3 |aaaaa
4 |bbbbbb
Right now I am using the code above. It gives me the values row by row. I want all the name values in listA.
Does anyone have an idea?
There is no way to read a column in a CSV file without reading the whole file. You can use a wrapper (for example, the LINQ to CSV library), but it will just "hide" the reading operation.
Yes - you're currently splitting on ;
Try using a comma instead.
Better to use a dedicated library btw...
Some frown upon Regex, but I think it provides good flexibility. Here is an example inspired by Adrian Mejia. Basically, you can choose particular characters between which the delimiter is valid in context, i.e. a comma in "hello, world" or 'hello, world' would be valid.
static void Main(string[] args)
{
string csv = "Hello,1,3.5,25,\"speech marks\",'inverted commas'\r\nWorld,2,4,60,\"again, more speech marks\",'something else in inverted commas, with a comma'";
// General way to create grouping constructs which are valid 'text' fields
string p = "{0}([^{0}]*){0}"; // match group '([^']*)' (inverted commas) or \"([^\"]*)\" (speech marks)
string c = "(?<={0}|^)([^{0}]*)(?:{0}|$)"; // commas or other delimiter group (?<=,|^)([^,]*)(?:,|$)
char delimiter = ','; // this can be whatever delimiter you like
string p1 = String.Format(p, "\""); // speechmarks group (0)
string p2 = String.Format(p, "'"); // inverted comma group (1)
string c1 = String.Format(c, delimiter); // delimiter group (2)
/*
* The first capture group will be speech marks ie. "some text, "
* The second capture group will be inverted commas ie. 'this text'
* The third is everything else separated by commas i.e. this,and,this will be [this][and][this]
* You can extend this to customise delimiters that represent text where a comma between is a valid entry eg. "this text, complete with a pause, is perfectly valid"
*
* */
//string pattern = "\"([^\"]*)\"|'([^']*)'|(?<=,|^)([^,]*)(?:,|$)";
string pattern = String.Format("{0}|{1}|{2}", new object[] { p1, p2, c1 }); // The actual pattern to match based on groups
string text = csv;
// If you're reading from a text file then this will do the trick. Uses the ReadToEnd() to put the whole file to a string.
//using (TextReader tr = new StreamReader("PATH TO MY CSV FILE", Encoding.ASCII))
//{
// text = tr.ReadToEnd(); // just read the whole stream
//}
string[] lines = text.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries); // if you have a blank line just remove it?
Regex regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase); // compile for speed
List<object> rowsOfColumns = new List<object>();
foreach (string row in lines)
{
List<string> columns = new List<string>();
// Find matches.
MatchCollection matches = regex.Matches(row);
foreach (Match match in matches)
{
for (int ii = 0; ii < match.Groups.Count; ii++)
{
if (match.Groups[ii].Success) // ignore things that don't match
{
columns.Add(match.Groups[ii].Value.TrimEnd(new char[] { delimiter })); // strip the delimiter
break;
}
}
}
// Do something with your columns here (add to List for example)
rowsOfColumns.Add(columns);
}
}
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
string[] values = line.Split(',');
// Only the first column is needed.
listA.Add(values[0]);
}
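If the goal is the name column rather than the first one, a small sketch along these lines might help. It assumes the first row is a header, that fields never contain the delimiter or quotes, and the names (CsvColumnReader, ReadColumn) are hypothetical. It takes a TextReader so it works with a StreamReader over the file or with a StringReader in a test.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original answers.
public static class CsvColumnReader
{
    // Reads every value of the column whose header matches columnName.
    public static List<string> ReadColumn(TextReader reader, string columnName, char delimiter)
    {
        List<string> values = new List<string>();
        string header = reader.ReadLine();
        if (header == null)
            return values; // empty input

        // Locate the requested column in the header row.
        string[] names = header.Split(delimiter);
        int index = -1;
        for (int i = 0; i < names.Length; i++)
        {
            if (string.Equals(names[i].Trim(), columnName, StringComparison.OrdinalIgnoreCase))
            {
                index = i;
                break;
            }
        }
        if (index < 0)
            throw new ArgumentException("Column not found: " + columnName);

        // Every remaining row is still read in full; only one field is kept.
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(delimiter);
            if (index < fields.Length)
                values.Add(fields[index].Trim());
        }
        return values;
    }
}
```

For real-world CSV with quoted fields, a dedicated library is still the safer choice, as suggested above.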

How to split a space-delimited list of paths where paths can include spaces in .NET 2?

For instance:
c:\dir1 c:\dir2 "c:\my files" c:\code "old photos" "new photos"
Should be read as a list:
c:\dir1
c:\dir2
c:\my files
c:\code
old photos
new photos
I can write a function which parses the string linearly but wondered if the .NET 2.0 toolbox has any cool tricks one could use?
Since you have to hit every character I think a brute force is going to give you the best performance.
That way you hit every character exactly once.
And it limits the number of comparisons performed.
static void Main(string[] args)
{
string input = @"c:\dir1 c:\dir2 ""c:\my files"" c:\code ""old photos"" ""new photos""";
List<string> splitInput = MySplit(input);
foreach (string s in splitInput)
{
System.Diagnostics.Debug.WriteLine(s);
}
System.Diagnostics.Debug.WriteLine(input);
}
public static List<string> MySplit(string input)
{
List<string> split = new List<string>();
StringBuilder sb = new StringBuilder();
bool splitOnQuote = false;
char quote = '"';
char space = ' ';
foreach (char c in input.ToCharArray())
{
if (splitOnQuote)
{
if (c == quote)
{
if (sb.Length > 0)
{
split.Add(sb.ToString());
sb.Length = 0; // StringBuilder.Clear() is not available before .NET 4
}
splitOnQuote = false;
}
else { sb.Append(c); }
}
else
{
if (c == space)
{
if (sb.Length > 0)
{
split.Add(sb.ToString());
sb.Length = 0; // StringBuilder.Clear() is not available before .NET 4
}
}
else if (c == quote)
{
if (sb.Length > 0)
{
split.Add(sb.ToString());
sb.Length = 0; // StringBuilder.Clear() is not available before .NET 4
}
splitOnQuote = true;
}
else { sb.Append(c); }
}
}
if (sb.Length > 0) split.Add(sb.ToString());
return split;
}
Usually for this type of problem one could develop a regular expression to parse out the fields. ( "(.*?)" ) would give you all the string values in quotes. You could strip all those values from your string, and then do a simple split on space after all the quoted items are out.
static void Main(string[] args)
{
string myString = "\"test\" test1 \"test2 test3\" test4 test6 \"test5\"";
string myRegularExpression = @"""(.*?)""";
List<string> listOfMatches = new List<string>();
myString = Regex.Replace(myString, myRegularExpression, delegate(Match match)
{
string v = match.ToString();
listOfMatches.Add(v);
return "";
});
var array = myString.Split(' ');
foreach (string s in array)
{
if(s.Trim().Length > 0)
listOfMatches.Add(s);
}
foreach (string match in listOfMatches)
{
Console.WriteLine(match);
}
Console.Read();
}
Unfortunately, I don't think there is any sort of C# kungfu that makes it much simpler. I should add that obviously, this algorithm gives you the items out of order... so if that matters... this isn't a good solution.
Here's a regex-only solution which captures both space-delimited and quoted paths. Quoted paths are stripped of the quotes, multiple spaces don't cause empty list entries. Edge case of mixing a quoted path with a non-quoted path without intervening space is interpreted as multiple entries.
It can be optimized by disabling captures for unused groups but I opted for more readability instead.
static Regex re = new Regex(@"^([ ]*((?<r>[^ ""]+)|[""](?<r>[^""]*)[""]))*[ ]*$");
public static IEnumerable<string> RegexSplit(string input)
{
var m = re.Match(input ?? "");
if(!m.Success)
throw new ArgumentException("Malformed input.");
return from Capture capture in m.Groups["r"].Captures select capture.Value;
}
Assuming that a space acts as a delimiter except when enclosed in quotes (to allow paths to contain spaces), I'd recommend the following algorithm:
ignore_space = false;
i = 0;
list_of_breaks = [];
while(i < input_length)
{
if(charat(i) is a space and ignore_space is false)
{
add i to list_of_breaks;
}
else if(charat(i) is a quote)
{
ignore_space = ! ignore_space;
}
i = i + 1;
}
split the input at the indices listed in list_of_breaks
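The algorithm above could be sketched in C# roughly as follows. The names (PathListSplitter, Split, AddPart) are hypothetical, and instead of collecting break indices first it cuts each piece as soon as a break is found, which comes to the same thing. It avoids var and LINQ and sticks to calls that, as far as I know, exist in .NET 2.0. Quotes are assumed to be balanced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class PathListSplitter
{
    // Splits on spaces that are outside double quotes and
    // strips the surrounding quotes from each piece.
    public static List<string> Split(string input)
    {
        List<string> parts = new List<string>();
        bool ignoreSpace = false; // true while inside a quoted path
        int start = 0;            // index where the current piece begins

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '"')
            {
                ignoreSpace = !ignoreSpace;
            }
            else if (c == ' ' && !ignoreSpace)
            {
                AddPart(parts, input, start, i); // break at index i
                start = i + 1;
            }
        }
        AddPart(parts, input, start, input.Length); // last piece
        return parts;
    }

    // Adds input[start..end) without its surrounding quotes, skipping empty pieces.
    private static void AddPart(List<string> parts, string input, int start, int end)
    {
        string part = input.Substring(start, end - start).Trim('"');
        if (part.Length > 0)
            parts.Add(part);
    }
}
```

Skipping empty pieces means runs of several spaces do not produce blank entries.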
