Related
The following function splits a string at any uppercase character found within the string.
public static string ToSentence(this string input)
{
var list = new List<char>();
for (var i = 0; i < input.ToCharArray().Length; i++)
{
var c = input.ToCharArray()[i];
foreach (char c1 in i > 0 && char.IsUpper(c) ? new[] {' ', c} : new[] {c})
list.Add(c1);
}
return new string(list.ToArray());
}
In my code, this function is being used in conjunction with another function that retrieves the name of the current method in code. I'm finding this function breaks when a method name contains multiple capital letters sequentially.
For example, if I have a method called GetDatabaseIDE() it would return as "Get Database I D E"
How can I change my ToSentence function so that it accepts a list of keywords that won't be split (For example, I D E becomes IDE)?
Why not try Regex ? Demo # https://dotnetfiddle.net/FsPZ9O
1. ([A-Z]+) - match all leading uppercase char.
2. ([^A-Z])* - followed with zero-or-more of any that isn't an uppercase char.
Regex.Matches("GetDatabaseIDE", #"([A-Z]+)([^A-Z])*").Cast<Match>().Select(m => m.Value);
TakeWhile method can be useful here, once you find an upper case character, you can take the following upper case characters:
for (var i = 0; i < input.Length; i++)
{
var c = input[i];
if(char.IsUpper(c))
{
var charsToAdd = input.Skip(i).TakeWhile(char.IsUpper).ToArray();
if(charsToAdd.Length > 1)
{
i += charsToAdd.Length - 1;
}
if(i > 0) list.Add(' ');
list.Add(charsToAdd);
}
else
{
list.Add(c);
}
}
you can add keywords you want to skip:
public static string ToSentence(string input)
{
var list = new List<char>();
for (var i = 0; i < input.ToCharArray().Length; i++)
{
if(input.IndexOf("IDE",i,input.Length-i)==i){
list.AddRange(" IDE");
i+=2;
}
else{
var c = input.ToCharArray()[i];
foreach (char c1 in i > 0 && char.IsUpper(c) ? new[] {' ', c} : new[] {c})
list.Add(c1);
}
}
return new string(list.ToArray());
}
watch it here
This will work for your example, you will have to tweak it if your method names have numbers in them
using System.Text.RegularExpressions;
public static string GetNiceName(string testString)
{
var pattern = "([A-Z][a-z]+)|([A-Z]+)";
var result = new List<string>();
foreach (Match m in Regex.Matches(testString, pattern))
{
result.Add(m.Groups[0].Value);
}
return string.Join(" ", result);
}
How can I split string (from a textbox) by commas excluding those in double quotation marks (without getting rid of the quotation marks), along with other possible punctuation marks (e.g. ' . ' ' ; ' ' - ')?
E.g. If someone entered the following into the textbox:
apple, orange, "baboons, cows", rainbow, "unicorns, gummy bears"
How can I split the above string into the following (say, into a List)?
apple
orange
"baboons, cows"
rainbow
"Unicorns, gummy bears..."
Thank you for your help!
You could try the below regex which uses positive lookahead,
string value = #"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";
string[] lines = Regex.Split(value, #", (?=(?:""[^""]*?(?: [^""]*)*))|, (?=[^"",]+(?:,|$))");
foreach (string line in lines) {
Console.WriteLine(line);
}
Output:
apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"
IDEONE
Try this:
Regex str = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match m in str.Matches(input))
{
Console.WriteLine(m.Value.TrimStart(','));
}
You may also try to look at FileHelpers
Much like a CSV parser, instead of Regex, you can loop through each character, like so:
public List<string> ItemStringToList(string inputString)
{
var itemList = new List<string>();
var currentIem = "";
var quotesOpen = false;
for (int i = 0; i < inputString.Length; i++)
{
if (inputString[i] == '"')
{
quotesOpen = !quotesOpen;
continue;
}
if (inputString[i] == ',' && !quotesOpen)
{
itemList.Add(currentIem);
currentIem = "";
continue;
}
if (currentIem == "" && inputString[i] == ' ') continue;
currentIem += inputString[i];
}
if (currentIem != "") itemList.Add(currentIem);
return itemList;
}
Example test usage:
var test1 = ItemStringToList("one, two, three");
var test2 = ItemStringToList("one, \"two\", three");
var test3 = ItemStringToList("one, \"two, three\"");
var test4 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test5 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test6 = ItemStringToList("one, \"two, three\", four, \"five six, seven\"");
var test7 = ItemStringToList("\"one, two, three\", four, \"five six, seven\"");
You could change it to use StringBuilder if you want faster character joining.
Try with this it will work u c an split array string in many waysif you want to split by white space just put a space in (' ') .
namespace LINQExperiment1
{
class Program
{
static void Main(string[] args)
{
string[] sentence = new string[] { "apple", "orange", "baboons cows", " rainbow", "unicorns gummy bears" };
Console.WriteLine("option 1:"); Console.WriteLine("————-");
// option 1: Select returns three string[]’s with
// three strings in each.
IEnumerable<string[]> words1 =
sentence.Select(w => w.Split(' '));
// to get each word, we have to use two foreach loops
foreach (string[] segment in words1)
foreach (string word in segment)
Console.WriteLine(word);
Console.WriteLine();
Console.WriteLine("option 2:"); Console.WriteLine("————-");
// option 2: SelectMany returns nine strings
// (sub-iterates the Select result)
IEnumerable<string> words2 =
sentence.SelectMany(segment => segment.Split(','));
// with SelectMany we have every string individually
foreach (var word in words2)
Console.WriteLine(word);
// option 3: identical to Opt 2 above written using
// the Query Expression syntax (multiple froms)
IEnumerable<string> words3 =from segment in sentence
from word in segment.Split(' ')
select word;
}
}
}
This was trickier than I thought, a good practical problem I think.
Below is the solution I came up with for this. One thing I don't like about my solution is having to add double quotations back and the other one being names of the variables :p:
internal class Program
{
private static void Main(string[] args)
{
string searchString =
#"apple, orange, ""baboons, cows. dogs- hounds"", rainbow, ""unicorns, gummy bears"", abc, defghj";
char delimeter = ',';
char excludeSplittingWithin = '"';
string[] splittedByExcludeSplittingWithin = searchString.Split(excludeSplittingWithin);
List<string> splittedSearchString = new List<string>();
for (int i = 0; i < splittedByExcludeSplittingWithin.Length; i++)
{
if (i == 0 || splittedByExcludeSplittingWithin[i].StartsWith(delimeter.ToString()))
{
string[] splitttedByDelimeter = splittedByExcludeSplittingWithin[i].Split(delimeter);
for (int j = 0; j < splitttedByDelimeter.Length; j++)
{
splittedSearchString.Add(splitttedByDelimeter[j].Trim());
}
}
else
{
splittedSearchString.Add(excludeSplittingWithin + splittedByExcludeSplittingWithin[i] +
excludeSplittingWithin);
}
}
foreach (string s in splittedSearchString)
{
if (s.Trim() != string.Empty)
{
Console.WriteLine(s);
}
}
Console.ReadKey();
}
}
Another Regex solution:
private static IEnumerable<string> Parse(string input)
{
// if used frequently, should be instantiated with Compiled option
Regex regex = new Regex(#"(?<=^|,\s)(\""(?:[^\""]|\""\"")*\""|[^,\s]*)");
return regex.Matches(inputData).Where(m => m.Success);
}
Consider the following english phrase
FRIEND AND COLLEAGUE AND (FRIEND OR COLLEAGUE AND (COLLEAGUE AND FRIEND AND FRIEND))
I want to be able to programmatically change arbitrary phrases, such as above, to something like:
SELECT * FROM RelationTable R1 JOIN RelationTable R2 ON R2.RelationName etc etc WHERE
R2.RelationName = FRIEND AND R2.RelationName = Colleague AND (R3.RelationName = FRIENd,
etc. etc.
My question is. How do I take the initial string, strip it of the following words and symbols : AND, OR, (, ),
Then change each word, and create a new string.
I can do most of it, but my main problem is that if I do a string.split and only get the words I care for, I can't really replace them in the original string because I lack their original index. Let me explain in a smaller example:
string input = "A AND (B AND C)"
Split the string for space, parenthesies, etc, gives: A,B,C
input.Replace("A", "MyRandomPhrase")
But there is an A in AND.
So I moved into trying to create a regular expression that matches exact words, post split, and replaces. It started to look like this:
"(\(|\s|\))*" + itemOfInterest + "(\(|\s|\))+"
Am I on the right track or am I overcomplicating things..Thanks !
You can try using Regex.Replace, with \b word boundary regex
string input = "A AND B AND (A OR B AND (B AND A AND A))";
string pattern = "\\bA\\b";
string replacement = "MyRandomPhrase";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
class Program
{
static void Main(string[] args)
{
string text = "A AND (B AND C)";
List<object> result = ParseBlock(text);
Console.ReadLine();
}
private static List<object> ParseBlock(string text)
{
List<object> result = new List<object>();
int bracketsCount = 0;
int lastIndex = 0;
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (c == '(')
bracketsCount++;
else if (c == ')')
bracketsCount--;
if (bracketsCount == 0)
if (c == ' ' || i == text.Length - 1)
{
string substring = text.Substring(lastIndex, i + 1 - lastIndex).Trim();
object itm = substring;
if (substring[0] == '(')
itm = ParseBlock(substring.Substring(1, substring.Length - 2));
result.Add(itm);
lastIndex = i;
}
}
return result;
}
}
Rather than describing what I want (it's difficult to explain), Let me provide an example of what I need to accomplish in C# using a regular expression:
"HelloWorld" should be transformed to "Hello World"
"HelloWORld" should be transformed to "Hello WO Rld" //Two consecutive letters in capital should be treatead as one word
"helloworld" should be transformed to "helloworld"
EDIT:
"HellOWORLd" should be transformed to "Hell OW OR Ld"
Every 2-consecutive capital letters should be considered one word.
Is this possible?
This is fully working C# code, not just the regex:
Console.WriteLine(
Regex.Replace(
"HelloWORld",
"(?<!^)(?<wordstart>[A-Z]{1,2})",
" ${wordstart}", RegexOptions.Compiled));
And it prints:
Hello WO Rld
Update
To make this more UNICODE/international aware, consider replacing [A-Z] by \p{Lt} (meaning a UNICODE code point that represents a Letter in uppercase). The result for the current input would the same. So here is a slightly more compelling example:
Console.WriteLine(Regex.Replace(
#"ÉclaireürfØÑJßå",
#"(?<!^)(?<wordstart>\p{Lu}{1,2})",
#" ${wordstart}",
RegexOptions.Compiled));
The regular expression engine is not a transformative thing by nature, but rather a pattern matching (and replacing) engine. People often mistake the replace part of Regex, thinking that it can do more than it's designed to.
Back to your question, though... Regex cannot do what you want, instead, you should write your own parser to do this. With C#, if you're familiar with the language, this task is somewhat trivial.
It's a case of "You're using the wrong tool for the job".
Here are regular expressions that detect what you are looking for:
([A-Z]\w*?)[A-Z]
this matches any uppercase letter from A to Z once followed by aphanumerics up to the next uppercase.
([A-Z]{2}\w*?)[A-Z]
this matches any uppercase letter from A to Z exactly 2 times.
Regex is a matching engine, you can parse the input string and use regex.isMatch to find candidate matches to then insert spaces into the output string
string f(string input)
{
//'lowerUPPER' -> 'lower UPPER'
var x = Regex.Replace(input, "([a-z])([A-Z])","$1 $2");
//'UPPER' -> 'UP PE R'
return Regex.Replace(x, "([A-Z]{2})","$1 ");
}
class Program
{
static void Main(string[] args)
{
Print(Parse("HelloWorld"));
Print(Parse("HelloWORld"));
Print(Parse("helloworld"));
Print(Parse("HellOWORLd"));
Console.ReadLine();
}
static void Print(IEnumerable<string> input)
{
foreach (var s in input)
{
Console.Write(s);
Console.Write(' ');
}
Console.WriteLine();
}
static IEnumerable<string> Parse(string input)
{
var sb = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
if (!char.IsUpper(input[i]))
{
sb.Append(input[i]);
continue;
}
if (sb.Length > 0)
{
yield return sb.ToString();
sb.Clear();
}
sb.Append(input[i]);
if (char.IsUpper(input[i + 1]))
{
sb.Append(input[++i]);
yield return sb.ToString();
sb.Clear();
}
}
if (sb.Length > 0)
{
yield return sb.ToString();
}
}
}
I think does not need regular expression in this case.
Try this:
static void Main(string[] args)
{
var input = "HellOWORLd";
var i = 0;
var x = 4;
var len = input.Length;
var output = new List<string>();
while (x <= len)
{
output.Add(SubStr(input, i, x));
i = x;
x += 2;
}
var ret = output.ToArray(); //["Hell","OW", "OR", "Ld"]
Console.ReadLine();
}
static string SubStr(string str, int start, int end)
{
var len = str.Length;
if (start >= 0 && end <= len)
{
var ret = new StringBuilder();
for (int i = 0; i < len; i++)
{
if (i == start)
{
do
{
ret.Append(str[i]);
i++;
} while (i != end);
}
}
return ret.ToString();
}
return null;
}
I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}