Don't split by escaped string - C# [duplicate] - c#

This question already has answers here:
How can I Split(',') a string while ignore commas in between quotes?
(3 answers)
C# Regex Split - commas outside quotes
(7 answers)
Closed 5 years ago.
I need to split a csv file by comma apart from where the columns is between quote marks. However, what I have here does not seem to be achieving what I need and comma's in columns are being split into separate array items.
public List<string> GetData(string dataFile, int row)
{
try
{
var lines = File.ReadAllLines(dataFile).Select(a => a.Split(';'));
var csv = from line in lines select (from piece in line select piece.Split(',')).ToList();
var foo = csv.ToList();
var result = foo[row][0].ToList();
return result;
}
catch
{
return null;
}
}
private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
public static string Escape(string s)
{
if (s.Contains(QUOTE))
s = s.Replace(QUOTE, ESCAPED_QUOTE);
if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
s = QUOTE + s + QUOTE;
return s;
}
I am not sure where I can use my escape function in this case.
Example:
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths"
The string Advanced, Maths are being split into two different array items which I don't want

You could use regex, linq or just loop through each character and use Booleans to figure out what the current behaviour should be. This question actually got me thinking, as I'd previously just looped through and acted on each character. Here is Linq way of breaking an entire csv document up, assuming the end of line can be found with ';':
private static void Main(string[] args)
{
string example = "\"Hello World, My name is Gumpy!\",20,male;My sister's name is Amy,29,female";
var result1 = example.Split(';')
.Select(s => s.Split('"')) // This will leave anything in abbreviation marks at odd numbers
.Select(sl => sl.Select((ss, index) => index % 2 == 0 ? ss.Split(',') : new string[] { ss })) // if it's an even number split by a comma
.Select(sl => sl.SelectMany(sc => sc));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}

Not sure how this performes - but you can solve that with Linq.Aggregate like this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static IEnumerable<string> SplitIt(
char[] splitters,
string text,
StringSplitOptions opt = StringSplitOptions.None)
{
bool inside = false;
var result = text.Aggregate(new List<string>(), (acc, c) =>
{
// this will check each char of your given text
// and accumulate it in the (empty starting) string list
// your splitting chars will lead to a new item put into
// the list if they are not inside. inside starst as false
// and is flipped anytime it hits a "
// at end we either return all that was parsed or only those
// that are neither null nor "" depending on given opt's
if (!acc.Any()) // nothing in yet
{
if (c != '"' && (!splitters.Contains(c) || inside))
acc.Add("" + c);
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
if (c != '"' && (!splitters.Contains(c) || inside))
acc[acc.Count - 1] = (acc[acc.Count - 1] ?? "") + c;
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
);
if (opt == StringSplitOptions.RemoveEmptyEntries)
return result.Where(r => !string.IsNullOrEmpty(r));
return result;
}
public static void Main()
{
var s = ",,Degree,Graduate,08-Dec-17,Level 1,\"Advanced, Maths\",,";
var spl = SplitIt(new[]{','}, s);
var spl2 = SplitIt(new[]{','}, s, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", spl));
Console.WriteLine(string.Join("|", spl2));
}
}
Output:
|Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths||
Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths

The function gets comma separated fields within a string, excluding commas embedded in a quoted field
The assumptions
It should return empty fields ,,
There are no quotes within a quote field (as per the example)
The method
I uses a for loop with i as a place holder of the current field
It scans for the next comma or quote and if it finds a quote it scans for the next comma to create the field
It needed to be efficient otherwise we would use regex or Linq
The OP didn't want to use a CSV library
Note : There is no error checking, and scanning each character would be faster this was just easy to understand
Code
public List<string> GetFields(string line)
{
var list = new List<string>();
for (var i = 0; i < line.Length; i++)
{
var firstQuote = line.IndexOf('"', i);
var firstComma = line.IndexOf(',', i);
if (firstComma >= 0)
{
// first comma is before the first quote, then its just a standard field
if (firstComma < firstQuote || firstQuote == -1)
{
list.Add(line.Substring(i, firstComma - i));
i = firstComma;
continue;
}
// We have found quote so look for the next comma afterwards
var nextQuote = line.IndexOf('"', firstQuote + 1);
var nextComma = line.IndexOf(',', nextQuote + 1);
// if we found a comma, then we have found the end of this field
if (nextComma >= 0)
{
list.Add(line.Substring(i, nextComma - i));
i = nextComma;
continue;
}
}
list.Add(line.Substring(i)); // if were are here there are no more fields
break;
}
return list;
}
Tests 1
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths",another
Degree
Graduate
08-Dec-17
Level 1
"Advanced, Maths"
another
Tests 2
,Degree,Graduate,08-Dec-17,\"asdasd\",Level 1,\"Advanced, Maths\",another
<Empty Line>
Degree
Graduate
08-Dec-17
"asdasd"
Level 1
"Advanced, Maths"
another

Related

Editing string in C#

given a string with words separated by spaces how would you go about merging two words if one of them is made by one character only ? An example should clarify:
"a bcd tttt" => "abcd tttt"
"abc d hhhh" => "abcd hhhh"
I would like to merge the single characer word with the one on the left in all cases where it is not the first word in the string, in this case i would like to merge it with the one on the right.
I am trying to loop through the string and create some logic but it turned out to be more complex than i was expecting.
Try the below program's approach:
using System;
using System.Text;
public class Program
{
public static void Main()
{
var delimiter=new char[]{' '};
var stringToMerge="abc d hhhh";
var splitArray=stringToMerge.Split(delimiter);
var stringBuilder=new StringBuilder();
for(int wordIndex=0;wordIndex<splitArray.Length;wordIndex++)
{
var word=splitArray[wordIndex];
if(wordIndex!=0 && word.Length>1)
{
stringBuilder.Append(" ");
}
stringBuilder.Append(word);
}
Console.WriteLine(stringBuilder.ToString());
}
}
Basically, you split the string to words, then using StringBuilder, build a new string, inserting a space before a word only if the word is larger than one character.
One way to approach this is to first use string.Split(' ') to get an array of words, which is easier to deal with.
Then you can loop though the words, handling single character words by concatenating them with the previous word, with special handling for the first word.
One such approach:
public static void Main()
{
string data = "abcd hhhh";
var words = data.Split(' ');
var sb = new StringBuilder();
for (int i = 0; i < words.Length; ++i)
{
var word = words[i];
if (word.Length == 1)
{
sb.Append(word);
if (i == 0 && i < words.Length - 1) // Single character first word is special case: Merge with next word.
sb.Append(words[++i]); // Note the "++i" to increment the loop counter, skipping the next word.
}
else
{
sb.Append(' ' + word);
}
}
var result = sb.ToString();
Console.WriteLine(result);
}
Note that this will concatenate multiple instances of single-letter words, so that "a b c d e" will result in "abcde" and "ab c d e fg" will result in "abcde fg". You don't actually specify what should happen in this case.
if you want to do it with a plain for loop and string walking:
using System;
using System.Text;
public class Program
{
public static void Main()
{
Console.WriteLine(MergeOrphant("bcd a tttt") == "bcda tttt");
Console.WriteLine(MergeOrphant("bcd a tttt a") == "bcda tttta");
Console.WriteLine(MergeOrphant("a bcd tttt") == "abcd tttt");
Console.WriteLine(MergeOrphant("a b") == "ab");
}
private static string MergeOrphant(string source)
{
var stringBuilder = new StringBuilder();
for (var i = 0; i < source.Length; i++)
{
if (i == 1 && char.IsWhiteSpace(source[i]) && char.IsLetter(source[i - 1])) {
i++;
}
if (i > 0 && char.IsWhiteSpace(source[i]) && char.IsLetter(source[i - 1]) && char.IsLetter(source[i + 1]) && (i + 2 == source.Length || char.IsWhiteSpace(source[i + 2])) )
{
i++;
}
stringBuilder.Append(source[i]);
}
return stringBuilder.ToString();
}
}
Quite short with Regex.
string foo = "a bcd b tttt";
foo = Regex.Replace(foo, #"^(\w) (\w{2,})", "$1$2");
foo = Regex.Replace(foo, #"(\w{2,}) (\w)\b", "$1$2");
Be aware \w is [a-zA-Z0-9_] if you need an other definition you have to define you own character class.
My answer would not be the best practice but it works for your second case, but still you should be clear about the letter merging rules.
public static void Main()
{
Console.WriteLine(Edit("abc d hhhh") == "abcd hhhh");
Console.WriteLine(Edit("abc d hhhh a") == "abcd hhhha");
Console.WriteLine(Edit("abc d hhhh a b") == "abcd hhhhab");
Console.WriteLine(Edit("abc d hhhh a def g") == "abcd hhhha defg");
}
public static string Edit(string str)
{
var result = string.Empty;
var split = str.Split(' ', StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < split.Length; i++)
{
if(i == 0)
result += split[i];
else
{
if (i > 0 && split[i].Length == 1)
{
result += split[i];
}
else
{
result += $" {split[i]}";
}
}
}
return result;
}
As I have mentioned above, this does not work for your 1st case which is : Edit("a bcd") would not generate "abcd".
Expanding on Matthew's answer,
If you don't want the extra space in the output you can change the last line to;
Console.WriteLine(result.TrimStart(' '));

C# - ignore Split character if it's inside parentheses

I'm writing the following method in C# to parse a CSV file and write values to a SQL Server database.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
public List<Entity> ParseEntityExclusionFile(List<string> entries, string urlFile)
{
entries.RemoveAt(0);
List<Entity> entities = new List<Entity>();
foreach (string line in entries)
{
Entity exclusionEntity = new Entity();
string[] lineParts = line.Split(',').Select(p => p.Trim('\"')).ToArray();
exclusionEntity.Id = 1000;
exclusionEntity.Names = lineParts[3];
exclusionEntity.Identifier = $"{lineParts[25]}" + $" | " + $"Classification: " + ${lineParts[0]}";
entities.Add(exclusionEntity);
}
return entities;
}
The data in some columns of the csv file are comma-separated values inside a set of parentheses, meant to represent one value for that column. So for any values like that, I need to capture it as one value to go into one field in the database. How would I adjust/add-to the line of code string[] lineParts = line.Split(',').Select(p => p.Trim('\"')).ToArray(); to instruct the application that if it encounters a column with open parenthesis, capture all the data after the open parenthesis, including the commas, until the close parenthesis, all as one value?
EDIT: it seems the Select(p => p.Trim('\"')).ToArray(); part of the above line of code is confusing some folks - don't worry about that part - I just need to know how I would go about adding 'exception' code to create a condition where Split(',') is ignored if the commas happen to be in between a set of parentheses. One field in the csv file looks like this (1,2,3,4) - currently the code parses it as four fields, whereas I need that parsed as one field like 1,2,3,4 OR (1,2,3,4) it actually doesn't matter whether the resulting fields contain the parentheses or not.
EDIT 2: I appreciate the suggestions of using a .NET CSV library - however, everything is working perfectly in this project outside of this one field in the csv file containing a set of parentheses with comma-separated values inside - I feel as though it's a bit overkill to install and configure an entire library, including having to set up new models and properties, just for this one column of data.
Try this code:
public static class Ex
{
private static string Peek(this string source, int peek) => (source == null || peek < 0) ? null : source.Substring(0, source.Length < peek ? source.Length : peek);
private static (string, string) Pop(this string source, int pop) => (source == null || pop < 0) ? (null, source) : (source.Substring(0, source.Length < pop ? source.Length : pop), source.Length < pop ? String.Empty : source.Substring(pop));
public static string[] ParseCsvLine(this string line)
{
return ParseCsvLineImpl(line).ToArray();
IEnumerable<string> ParseCsvLineImpl(string l)
{
string remainder = line;
string field;
while (remainder.Peek(1) != "")
{
(field, remainder) = ParseField(remainder);
yield return field;
}
}
}
private const string DQ = "\"";
private static (string field, string remainder) ParseField(string line)
{
if (line.Peek(1) == DQ)
{
var (_, split) = line.Pop(1);
return ParseFieldQuoted(split);
}
else
{
var field = "";
var (head, tail) = line.Pop(1);
while (head != "," && head != "")
{
field += head;
(head, tail) = tail.Pop(1);
}
return (field, tail);
}
}
private static (string field, string remainder) ParseFieldQuoted(string line)
{
var field = "";
var head = "";
var tail = line;
while (tail.Peek(1) != "" && (tail.Peek(1) != DQ || tail.Peek(2) == DQ + DQ))
{
if (tail.Peek(2) == DQ + DQ)
{
(head, tail) = tail.Pop(2);
field += DQ;
}
else
{
(head, tail) = tail.Pop(1);
field += head;
}
}
if (tail.Peek(2) == DQ + ",")
{
(head, tail) = tail.Pop(2);
}
else if (tail.Peek(1) == DQ)
{
(head, tail) = tail.Pop(1);
}
return (field, tail);
}
}
It handles double-quotes, and double-double-quotes.
You can then do this:
string line = "45,\"23\"\",34\",66"; // 45,"23"",34",66
string[] fields = line.ParseCsvLine();
That produces:
45
23",34
66
Here's an updated version of my code that deals with ( and ) as delimiters. It deals with nested delimiters and treats them as part of the field string.
You would need to remove the " as you see fit - I'm not entirely sure why you are doing this.
Also, this is no longer CSV. The parenthesis are not a normal part of CSV. I've changed the name of the method to ParseLine as a result.
public static class Ex
{
private static string Peek(this string source, int peek) => (source == null || peek < 0) ? null : source.Substring(0, source.Length < peek ? source.Length : peek);
private static (string, string) Pop(this string source, int pop) => (source == null || pop < 0) ? (null, source) : (source.Substring(0, source.Length < pop ? source.Length : pop), source.Length < pop ? String.Empty : source.Substring(pop));
public static string[] ParseLine(this string line)
{
return ParseLineImpl(line).ToArray();
IEnumerable<string> ParseLineImpl(string l)
{
string remainder = line;
string field;
while (remainder.Peek(1) != "")
{
(field, remainder) = ParseField(remainder);
yield return field;
}
}
}
private const string GroupOpen = "(";
private const string GroupClose = ")";
private static (string field, string remainder) ParseField(string line)
{
if (line.Peek(1) == GroupOpen)
{
var (_, split) = line.Pop(1);
return ParseFieldQuoted(split);
}
else
{
var field = "";
var (head, tail) = line.Pop(1);
while (head != "," && head != "")
{
field += head;
(head, tail) = tail.Pop(1);
}
return (field, tail);
}
}
private static (string field, string remainder) ParseFieldQuoted(string line) => ParseFieldQuoted(line, false);
private static (string field, string remainder) ParseFieldQuoted(string line, bool isNested)
{
var field = "";
var head = "";
var tail = line;
while (tail.Peek(1) != "" && tail.Peek(1) != GroupClose)
{
if (tail.Peek(1) == GroupOpen)
{
(head, tail) = tail.Pop(1);
(head, tail) = ParseFieldQuoted(tail, true);
field += GroupOpen + head + GroupClose;
}
else
{
(head, tail) = tail.Pop(1);
field += head;
}
}
if (tail.Peek(2) == GroupClose + ",")
{
(head, tail) = tail.Pop(isNested ? 1 : 2);
}
else if (tail.Peek(1) == GroupClose)
{
(head, tail) = tail.Pop(1);
}
return (field, tail);
}
}
It's used like this:
string line = "45,(23(Fo(,,(,)),(\"Bar\")o),34),66"; // 45,(23(Fo(,,(,)),("Bar")o),34),66
string[] fields = line.ParseLine();
Console.WriteLine(fields.All(f => line.Contains(f))); // True == maybe code is right, False == code is WRONG
And it gives me:
45
23(Fo(,,(,)),("Bar")o),34
66
First, calling line.Trim('\"') will not strip "any existing double-quotes"; it will only remove all leading and trailing instances of the '\"' char.
var line = "\"\"example \"goes here\"";
var trimmed = line.Trim('\"');
Console.WriteLine(trimmed); //output: example "goes here
Here's how you strip all of the '\"' char:
var line = "\"\"example \"goes here\"";
var trimmed = string.Join(string.Empty, line.Split('"'));
Console.WriteLine(trimmed); //output: example goes here
Notice you can also nix the escape because the " is inside of single quotes.
I'm making an assumption that what your string inputs look like this:
"OneValue,TwoValue,(OneB,TwoB),FiveValue"
or if you have quotes (I'm also assuming you won't actually have quotes inside, but we'll solve for that anyway:
"\"OneValue,TwoValue,(OneB,TwoB),FiveValue\"\""
And I'm expecting your final string[] lineparts variable to have the values in this hard declaration after processing:
var lineparts = new string[] { "OneValue", "TwoValue", "OneB, TwoB", "FiveValue" };
The first solution I can think of is to first split by '(', then iterate over the collection, conditionally splitting by ')' or ',', depending on which side of the opening parenthesis the current element is on. Pretty sure this is linear, so that's neat:
const string l = ",(";
const string r = "),";
const char c = ',';
const char a = '"';
var line = "\"One,Two,(OneB,TwoB),Five\"";
line = string.Join(string.Empty, line.Split(a)); //Strip "
var splitL = line.Split(l); //,(
var partsList = new List<string>();
foreach (var value in splitL)
{
if (value.Contains(r))//),
{
//inside of parentheses, so we keep the values before the ),
var splitR = value.Split(r);//),
//I don't like literal indexes, but we know we have at least one element because we have a value.
partsList.Add(splitR[0]);
//Everything else is after the closing parenthesis for this group, and before the parenthesis after that
//so we'll parse it all into different values.
//The literal index is safe here because split always returns two values if any value is found.
partsList.AddRange(splitR[1].Split(c));//,
}
else
{
//before the parentheses, so these are all different values
partsList.AddRange(value.Split(c));//,
}
}
var lineparts = partsList.ToArray();//{ "One", "Two", "OneB, TwoB", "Five" };
Here's a better example of a tighter integration with the code in your question, not considering the specific intended values of your Entity properties or the need to trim for quotations:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
public List<Entity> ParseEntityExclusionFile(List<string> entries, string urlFile)
{
entries.RemoveAt(0);
const char l = '(';
const char r = ')';
const char c = ',';
const char a = '"';
List<Entity> entities = new List<Entity>();
foreach (string line in entries)
{
var splitL = line.Split(l); //(
var partsList = new List<string>();
foreach (var value in splitL)
{
if (value.Contains(r))//)
{
var splitR = value.Split(r);//)
partsList.Add(splitR[0]);
if (!line.EndsWith(r))
{
partsList.AddRange(splitR[1].Remove(0, 1).Split(c));//,
}
}
else
{
if (!line.StartsWith(l))
{
partsList.AddRange(value.Remove(value.Length - 1).Split(c));//,
}
}
}
var lineParts = partsList.ToArray();//{ "One", "Two", "OneB, TwoB", "Five" };
entities.Add(new Entity
{
Id = 1000,
Names = lineParts[3],
Identifier = $"{lineParts[25]} | Classification: {lineParts[0]}";
});
}
return entities;
}
This solution may get hairy if your groups contain other groups, i.e...
"OneValue,TwoValue,(OneB,(TwoB, ThreeB)),SixValue"

Regular expression for pipe delimited and double quoted string

I have a string something like this:
"2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112"
I would like to split by pipe apart from anything wrapped in double quotes so I have something like (similar to how csv is done):
[0] => 2014-01-23 09:13:45
[1] => 10002112|TR0859657|25-DEC-2013>0000000000000001
[2] => 10002112
I would like to know if there is a regular expression that can do this?
I think you may need to write your own parser.
Yo will need:
custom collection to keep results
boolean flag to decide whether pipe is inside quotation or outside quotation marks
string (or StringBuilder) to keep current word
The idea is that you read string char by char. Each char is appended to the word. If there is a pipe outside quotation marks you add the word to your result collection. If there is a quote you switch a flag so you don't treat the pipe as a divider anymore but you append it as a part of the word. Then if there is another quotation you switch the flag back again. So next pipe will result in adding the whole word (with pipes within quotation marks) to the collection. I tested the code below on your example and it worked.
private static List<string> ParseLine(string yourString)
{
bool ignorePipe = false;
string word = string.Empty;
List<string> divided = new List<string>();
foreach (char c in yourString)
{
if (c == '|' &&
!ignorePipe)
{
divided.Add(word);
word = string.Empty;
}
else if (c == '"')
{
ignorePipe = !ignorePipe;
}
else
{
word += c;
}
}
divided.Add(word);
return divided;
}
How about this Regular Expression:
/((["|]).*\2)/g
Online Demo
It looks like it could be used as valid split expression.
I'm going to blatantly ignore the fact that you want a RegEx, because I think that making your own IEnumerable will be easier. Plus, you get instant access to Linq.
var line = "2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112";
var data = GetPartsFromLine(line).ToList();
private static IEnumerable<string> GetPartsFromLine(string line)
{
int position = -1;
while (position < line.Length)
{
position++;
if (line[position] == '"')
{
//go find the next "
int endQuote = line.IndexOf('"', position + 1);
yield return line.Substring(position + 1, endQuote - position - 1);
position = endQuote;
if (position < line.Length && line[position + 1] == '|')
{
position++;
}
}
else
{
//go find the next |
int pipe = line.IndexOf('|', position + 1);
if (pipe == -1)
{
//hit the end of the line
yield return line.Substring(position);
position = line.Length;
}
else
{
yield return line.Substring(position, pipe - position);
position = pipe;
}
}
}
}
This hasn't been fully tested, but it works with your example.

How to make these 2 methods more Efficient [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
Hi guys this is my first question ever on SO so please go easy on me.
I am playing with Lambda/LINQ while building myself few utility methods.
First method takes string like,
"AdnanRazaBhatti"
and breaks it up like ,
"Adnan Raza Bhatti"
Second Methods takes string like first method and also takes,
out String[] brokenResults
and returns broken string like the first method as well as fill up brokenResults array as follows.
"Adnan" "Raza" "Bhatti"
Questions:
A. Can you please suggest how to make these methods more efficient?
B. When I try to use StringBuilder it tells me extension methods like, Where, Select does not exist for StringBuilder class, why is it so? Although indexer works on StringBuilder to get the characters like StringBuilder s = new StrinBuilder("Dang"); char c = s[0]; Here char will be D;
Code
Method 1:
public static string SplitCapital( string source )
{
string result = "";
int i = 0;
//Separate all the Capital Letter
var charUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
//If there is only one Capital letter then it is already atomic.
if ( charUpper.Count( ) > 1 ) {
var strLower = source.Split( charUpper );
foreach ( string s in strLower )
if ( i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s ) )
result += charUpper.ElementAt( i++ ) + s + " ";
return result;
}
return source;
}
Method 2:
public static string SplitCapital( string source, out string[] brokenResults )
{
string result = "";
int i = 0;
var strUpper = source.Where( x => char.IsUpper( x ) ).ToArray<char>( );
if ( strUpper.Count( ) > 1 ) {
var strLower = source.Split( strUpper );
brokenResults = (
from s in strLower
where i < strLower.Count( ) - 1 && !String.IsNullOrEmpty( s )
select result = strUpper.ElementAt( i++ ) + s + " " ).ToArray( );
result = "";
foreach ( string s in brokenResults )
result += s;
return result;
}
else { brokenResults = new string[] { source }; }
return source;
}
Note:
I am planning to use these utility methods to break up the table column names I get from my database.
For Example if column name is "BooksId" I will break it up using one of these methods as "Books Id" programmatically, I know there are other ways or renaming the column names like in design window or [dataset].[tableName].HeadersRow.Cells[0].Text = "Books Id" but I am also planning to use this method somewhere else in the future.
Thanks
you can use the following extension methods to split your string based on Capital letters:
public static string Wordify(this string camelCaseWord)
{
/* CamelCaseWord will become Camel Case Word,
if the word is all upper, just return it*/
if (!Regex.IsMatch(camelCaseWord, "[a-z]"))
return camelCaseWord;
return string.Join(" ", Regex.Split(camelCaseWord, #"(?<!^)(?=[A-Z])"));
}
To split a string in a string array, you can use this:
public static string[] SplitOnVal(this string text,string value)
{
return text.Split(new[] { value }, StringSplitOptions.None);
}
If we take your example for consideration, the code will be as follows:
string strTest = "AdnanRazaBhatti";
var capitalCase = strTest.Wordify(); //Adnan Raza Bhatti
var brokenResults = capitalCase.SplitOnVal(" "); //seperate by a blank value in an array
Check this code
public static string SeperateCamelCase(this string value)
{
return Regex.Replace(value, "((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))", " $1");
}
Hope this answer helps you. If you find solution kindly mark my answer and point it up.
Looks to me like regular expressions is the way to go.
I think [A-Z][a-z]+ might be a good one to start with.
Updated version. String builder was used to reduce memory utilization.
string SplitCapital(string str)
{
//Search all capital letters and store indexes
var indexes = str
.Select((c, i) => new { c = c, i = i }) // Select information about char and position
.Where(c => Char.IsUpper(c.c)) // Get only capital chars
.Select(cl => cl.i); // Get indexes of capital chars
// If no indexes found or if indicies count equal to the source string length then return source string
if (!indexes.Any() || indexes.Count() == str.Length)
{
return str;
}
// Create string builder from the source string
var sb = new StringBuilder(str);
// Reverse indexes and remove 0 if necessary
foreach (var index in indexes.Reverse().Where(i => i != 0))
{
// Insert spaces before capital letter
sb.Insert(index, ' ');
}
return sb.ToString();
}
string SplitCapital(string str, out string[] parts)
{
var splitted = SplitCapital(str);
parts = splitted.Split(new[] { ' ' }, StringSplitOptions.None);
return splitted;
}

C# split string but keep split chars / separators [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

Categories

Resources