I have a variable name, say "WARD_VS_VITAL_SIGNS", and I want to convert it to Pascal case format: "WardVsVitalSigns"
WARD_VS_VITAL_SIGNS -> WardVsVitalSigns
How can I make this conversion?
You do not need a regular expression for that.
var yourString = "WARD_VS_VITAL_SIGNS".ToLower().Replace("_", " ");
TextInfo info = CultureInfo.CurrentCulture.TextInfo;
yourString = info.ToTitleCase(yourString).Replace(" ", string.Empty);
Console.WriteLine(yourString);
Here is my quick LINQ & regex solution to save someone's time:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public string ToPascalCase(string original)
{
Regex invalidCharsRgx = new Regex("[^_a-zA-Z0-9]");
Regex whiteSpace = new Regex(#"(?<=\s)");
Regex startsWithLowerCaseChar = new Regex("^[a-z]");
Regex firstCharFollowedByUpperCasesOnly = new Regex("(?<=[A-Z])[A-Z0-9]+$");
Regex lowerCaseNextToNumber = new Regex("(?<=[0-9])[a-z]");
Regex upperCaseInside = new Regex("(?<=[A-Z])[A-Z]+?((?=[A-Z][a-z])|(?=[0-9]))");
// replace white spaces with undescore, then replace all invalid chars with empty string
var pascalCase = invalidCharsRgx.Replace(whiteSpace.Replace(original, "_"), string.Empty)
// split by underscores
.Split(new char[] { '_' }, StringSplitOptions.RemoveEmptyEntries)
// set first letter to uppercase
.Select(w => startsWithLowerCaseChar.Replace(w, m => m.Value.ToUpper()))
// replace second and all following upper case letters to lower if there is no next lower (ABC -> Abc)
.Select(w => firstCharFollowedByUpperCasesOnly.Replace(w, m => m.Value.ToLower()))
// set upper case the first lower case following a number (Ab9cd -> Ab9Cd)
.Select(w => lowerCaseNextToNumber.Replace(w, m => m.Value.ToUpper()))
// lower second and next upper case letters except the last if it follows by any lower (ABcDEf -> AbcDef)
.Select(w => upperCaseInside.Replace(w, m => m.Value.ToLower()));
return string.Concat(pascalCase);
}
Example output:
"WARD_VS_VITAL_SIGNS" "WardVsVitalSigns"
"Who am I?" "WhoAmI"
"I ate before you got here" "IAteBeforeYouGotHere"
"Hello|Who|Am|I?" "HelloWhoAmI"
"Live long and prosper" "LiveLongAndProsper"
"Lorem ipsum dolor..." "LoremIpsumDolor"
"CoolSP" "CoolSp"
"AB9CD" "Ab9Cd"
"CCCTrigger" "CccTrigger"
"CIRC" "Circ"
"ID_SOME" "IdSome"
"ID_SomeOther" "IdSomeOther"
"ID_SOMEOther" "IdSomeOther"
"CCC_SOME_2Phases" "CccSome2Phases"
"AlreadyGoodPascalCase" "AlreadyGoodPascalCase"
"999 999 99 9 " "999999999"
"1 2 3 " "123"
"1 AB cd EFDDD 8" "1AbCdEfddd8"
"INVALID VALUE AND _2THINGS" "InvalidValueAnd2Things"
First off, you are asking for title case and not camel-case, because in camel-case the first letter of the word is lowercase and your example shows you want the first letter to be uppercase.
At any rate, here is how you could achieve your desired result:
string textToChange = "WARD_VS_VITAL_SIGNS";
System.Text.StringBuilder resultBuilder = new System.Text.StringBuilder();
foreach(char c in textToChange)
{
// Replace anything, but letters and digits, with space
if(!Char.IsLetterOrDigit(c))
{
resultBuilder.Append(" ");
}
else
{
resultBuilder.Append(c);
}
}
string result = resultBuilder.ToString();
// Make result string all lowercase, because ToTitleCase does not change all uppercase correctly
result = result.ToLower();
// Creates a TextInfo based on the "en-US" culture.
TextInfo myTI = new CultureInfo("en-US",false).TextInfo;
result = myTI.ToTitleCase(result).Replace(" ", String.Empty);
Note: result is now WardVsVitalSigns.
If you did, in fact, want camel-case, then after all of the above, just use this helper function:
public string LowercaseFirst(string s)
{
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
char[] a = s.ToCharArray();
a[0] = char.ToLower(a[0]);
return new string(a);
}
So you could call it, like this:
result = LowercaseFirst(result);
Single semicolon solution:
public static string PascalCase(this string word)
{
return string.Join("" , word.Split('_')
.Select(w => w.Trim())
.Where(w => w.Length > 0)
.Select(w => w.Substring(0,1).ToUpper() + w.Substring(1).ToLower()));
}
Extension method for System.String with .NET Core compatible code by using System and System.Linq.
Does not modify the original string.
.NET Fiddle for the code below
using System;
using System.Linq;
public static class StringExtensions
{
/// <summary>
/// Converts a string to PascalCase
/// </summary>
/// <param name="str">String to convert</param>
public static string ToPascalCase(this string str){
// Replace all non-letter and non-digits with an underscore and lowercase the rest.
string sample = string.Join("", str?.Select(c => Char.IsLetterOrDigit(c) ? c.ToString().ToLower() : "_").ToArray());
// Split the resulting string by underscore
// Select first character, uppercase it and concatenate with the rest of the string
var arr = sample?
.Split(new []{'_'}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => $"{s.Substring(0, 1).ToUpper()}{s.Substring(1)}");
// Join the resulting collection
sample = string.Join("", arr);
return sample;
}
}
public class Program
{
public static void Main()
{
Console.WriteLine("WARD_VS_VITAL_SIGNS".ToPascalCase()); // WardVsVitalSigns
Console.WriteLine("Who am I?".ToPascalCase()); // WhoAmI
Console.WriteLine("I ate before you got here".ToPascalCase()); // IAteBeforeYouGotHere
Console.WriteLine("Hello|Who|Am|I?".ToPascalCase()); // HelloWhoAmI
Console.WriteLine("Live long and prosper".ToPascalCase()); // LiveLongAndProsper
Console.WriteLine("Lorem ipsum dolor sit amet, consectetur adipiscing elit.".ToPascalCase()); // LoremIpsumDolorSitAmetConsecteturAdipiscingElit
}
}
var xs = "WARD_VS_VITAL_SIGNS".Split('_');
var q =
from x in xs
let first_char = char.ToUpper(x[0])
let rest_chars = new string(x.Skip(1).Select(c => char.ToLower(c)).ToArray())
select first_char + rest_chars;
Some answers are correct but I really don't understand why they set the text to LowerCase first, because the ToTitleCase will handle that automatically:
var text = "WARD_VS_VITAL_SIGNS".Replace("_", " ");
TextInfo textInfo = CultureInfo.CurrentCulture.TextInfo;
text = textInfo.ToTitleCase(text).Replace(" ", string.Empty);
Console.WriteLine(text);
You can use this:
public static string ConvertToPascal(string underScoreString)
{
string[] words = underScoreString.Split('_');
StringBuilder returnStr = new StringBuilder();
foreach (string wrd in words)
{
returnStr.Append(wrd.Substring(0, 1).ToUpper());
returnStr.Append(wrd.Substring(1).ToLower());
}
return returnStr.ToString();
}
This answer understands that there are Unicode categories which can be tapped while processing the text to ignore the connecting characters such as - or _. In regex parlance it is \p (for category) then the type which is {Pc} for punctuation and connector type character; \p{Pc} using our MatchEvaluator which is kicked off for each match within a session.
So during the match phase, we get words and ignore the punctuations, so the replace operation handles the removal of the connector character. Once we have the match word, we can push it down to lowercase and then only up case the first character as the return for the replace:
public static class StringExtensions
{
public static string ToPascalCase(this string initial)
=> Regex.Replace(initial,
// (Match any non punctuation) & then ignore any punctuation
#"([^\p{Pc}]+)[\p{Pc}]*",
new MatchEvaluator(mtch =>
{
var word = mtch.Groups[1].Value.ToLower();
return $"{Char.ToUpper(word[0])}{word.Substring(1)}";
}));
}
Usage:
"TOO_MUCH_BABY".ToPascalCase(); // TooMuchBaby
"HELLO|ITS|ME".ToPascalCase(); // HelloItsMe
See Word Character in Character Classes in Regular Expressions
Pc Punctuation, Connector. This category includes ten characters, the
most commonly used of which is the LOWLINE character (_), u+005F.
If you did want to replace any formatted string into a pascal case then you can do
public static string ToPascalCase(this string original)
{
string newString = string.Empty;
bool makeNextCharacterUpper = false;
for (int index = 0; index < original.Length; index++)
{
char c = original[index];
if(index == 0)
newString += $"{char.ToUpper(c)}";
else if (makeNextCharacterUpper)
{
newString += $"{char.ToUpper(c)}";
makeNextCharacterUpper = false;
}
else if (char.IsUpper(c))
newString += $" {c}";
else if (char.IsLower(c) || char.IsNumber(c))
newString += c;
else if (char.IsNumber(c))
newString += $"{c}";
else
{
makeNextCharacterUpper = true;
newString += ' ';
}
}
return newString.TrimStart().Replace(" ", "");
}
Tested with strings
I|Can|Get|A|String
ICan_GetAString
i-can-get-a-string
i_can_get_a_string
I Can Get A String
ICanGetAString
I found this gist useful after adding a ToLower() to it.
"WARD_VS_VITAL_SIGNS"
.ToLower()
.Split(new [] {"_"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => char.ToUpperInvariant(s[0]) + s.Substring(1, s.Length - 1))
.Aggregate(string.Empty, (s1, s2) => s1 + s2)
Related
I am processing files in order to replace a list of pre-defined keywords with a pre- and a post-string (say "#" and ".") like this :
"Word Word2 anotherWord and some other stuff" should become "#Word. #Word2. #anotherWord. and some other stuff"
My keys are unique and processed the keys from longest key to smallest, so I know inclusion can only be on already
However, if I have key inclusion (e.g. Word2 contains Word), and if I do
"Word Word2 anotherWord and some other stuff"
.Replace("anotherWord", "#anotherWord.")
.Replace("Word2", "#Word2.")
.Replace("Word", "#Word.")
I get the following result:
"#Word. ##Word.2. #another#Word.. and some other stuff"
For sure, my approach isn't wokring. So what is the way to make sure I only replace a key in the string, if it is NOT contained in another key? I tried RegExp but didn't find the correct way. Or there is another solution?
Just use Regular expressions with word boundary if performance is not a key requirement:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace Subst
{
public class Program
{
public static void Main(string[] args)
{
var map = new Dictionary<string, string>{
{"Word", "#Word."},
{"anotherWord", "#anotherWord."},
{"Word2", "#Word2."}
};
var input = "Word Word2 anotherWord and some other stuff";
foreach(var mapping in map) {
input = Regex.Replace(input, String.Format("\\b{0}\\b", mapping.Key), Regex.Escape(mapping.Value));
}
Console.WriteLine(input);
}
}
}
One way is to use
string myString = String.Format("ORIGINAL TEXT {1} {2}", "TEXT TO PUT INSIDE CURLY BRACKET 1", "TEXT TO PUT IN CURLY BRACKET 2");
//Result: "ORIGINAL TEXT TEXT TO PUT INSIDE CURLY BRACKET 1 TEXT TO PUT IN CURLY BRACKET 2"
However, this requires your original text to have the curly brackets inside in the first place.
Quite messy, but you could always replace the words you are looking for with the Replace and then change the curly backets at the same time. There is probably a far better way of doing this but I cant think of it right now.
I suggest direct implementation, e.g.
private static String MyReplace(string value, params Tuple<string, string>[] substitutes) {
if (string.IsNullOrEmpty(value))
return value;
else if (null == substitutes || !substitutes.Any())
return value;
int start = 0;
StringBuilder sb = new StringBuilder();
while (true) {
int at = -1;
Tuple<string, string> best = null;
foreach (var pair in substitutes) {
int index = value.IndexOf(pair.Item1, start);
if (index >= 0)
if (best == null ||
index < at ||
index == at && best.Item1.Length < pair.Item1.Length) {
at = index;
best = pair;
}
}
if (best == null) {
sb.Append(value.Substring(start));
break;
}
sb.Append(value.Substring(start, at - start));
sb.Append(best.Item2);
start = best.Item1.Length + at;
}
return sb.ToString();
}
Test
string source = "Word Word2 anotherWord and some other stuff";
var result = MyReplace(source,
new Tuple<string, string>("anotherWord", "#anotherWord."),
new Tuple<string, string>("Word2", "#Word2."),
new Tuple<string, string>("Word", "#Word."));
Console.WriteLine(result);
Outcome:
#Word. #Word2. #anotherWord. and some other stuff
Regex alternative (order doesn't matter):
var result = Regex.Replace("Word Word2 anotherWord and some other stuff", #"\b\S+\b", m =>
m.Value == "anotherWord" ? "#anotherWord." :
m.Value == "Word2" ? "#Word2." :
m.Value == "Word" ? "#Word." : m.Value)
Or separate:
string s = "Word Word2 anotherWord and some other stuff";
s = Regex.Replace(s, #"\b" + Regex.Escape("anotherWord") + #"\b", "#anotherWord.");
s = Regex.Replace(s, #"\b" + Regex.Escape("Word2") + #"\b", "#Word2.");
s = Regex.Replace(s, #"\b" + Regex.Escape("Word") + #"\b", "#Word.");
Solved the problem using a two-loop-through approach as follows...
List<string> keys = new List<string>();
keys.Add("Word1"); // ... and so on
// IMPORTANT: algorithm works only when we are sure that one key cannot be
// included in another key with higher index. Also, uniqueness is
// guaranteed by construction, although the routine would work
// duplicate key...!
keys = keys.OrderByDescending(x => x.Length).ThenBy(x => x).ToList<string>();
// first loop: replace with some UNIQUE key hash in text
foreach(string key in keys) {
txt.Replace(key, string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)));
}
// second loop: replace UNIQUE key hash with corresponding values...
foreach(string key in keys) {
txt.Replace(string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)), string.Format("{0}{1}{2}", preStr, key, postStr));
}
You can split your string by ' ' and cycle through the string array. Compare each index of the array to your replacement strings and then concatenate them when finished.
string newString = "Word Word2 anotherWord and some other stuff";
string[] split = newString.Split(' ');
foreach (var s in split){
if(s == "Word"){
s = "#Word";
} else if(s == "Word2"){
s = "#Word2";
} else if(s == "anotherWord"){
s = "#anotherWord";
}
}
string finalString = string.Concat(split);
This question already has answers here:
How can I Split(',') a string while ignore commas in between quotes?
(3 answers)
C# Regex Split - commas outside quotes
(7 answers)
Closed 5 years ago.
I need to split a csv file by comma apart from where the columns is between quote marks. However, what I have here does not seem to be achieving what I need and comma's in columns are being split into separate array items.
public List<string> GetData(string dataFile, int row)
{
try
{
var lines = File.ReadAllLines(dataFile).Select(a => a.Split(';'));
var csv = from line in lines select (from piece in line select piece.Split(',')).ToList();
var foo = csv.ToList();
var result = foo[row][0].ToList();
return result;
}
catch
{
return null;
}
}
private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
public static string Escape(string s)
{
if (s.Contains(QUOTE))
s = s.Replace(QUOTE, ESCAPED_QUOTE);
if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
s = QUOTE + s + QUOTE;
return s;
}
I am not sure where I can use my escape function in this case.
Example:
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths"
The string Advanced, Maths are being split into two different array items which I don't want
You could use regex, linq or just loop through each character and use Booleans to figure out what the current behaviour should be. This question actually got me thinking, as I'd previously just looped through and acted on each character. Here is Linq way of breaking an entire csv document up, assuming the end of line can be found with ';':
private static void Main(string[] args)
{
string example = "\"Hello World, My name is Gumpy!\",20,male;My sister's name is Amy,29,female";
var result1 = example.Split(';')
.Select(s => s.Split('"')) // This will leave anything in abbreviation marks at odd numbers
.Select(sl => sl.Select((ss, index) => index % 2 == 0 ? ss.Split(',') : new string[] { ss })) // if it's an even number split by a comma
.Select(sl => sl.SelectMany(sc => sc));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}
Not sure how this performes - but you can solve that with Linq.Aggregate like this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static IEnumerable<string> SplitIt(
char[] splitters,
string text,
StringSplitOptions opt = StringSplitOptions.None)
{
bool inside = false;
var result = text.Aggregate(new List<string>(), (acc, c) =>
{
// this will check each char of your given text
// and accumulate it in the (empty starting) string list
// your splitting chars will lead to a new item put into
// the list if they are not inside. inside starst as false
// and is flipped anytime it hits a "
// at end we either return all that was parsed or only those
// that are neither null nor "" depending on given opt's
if (!acc.Any()) // nothing in yet
{
if (c != '"' && (!splitters.Contains(c) || inside))
acc.Add("" + c);
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
if (c != '"' && (!splitters.Contains(c) || inside))
acc[acc.Count - 1] = (acc[acc.Count - 1] ?? "") + c;
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
);
if (opt == StringSplitOptions.RemoveEmptyEntries)
return result.Where(r => !string.IsNullOrEmpty(r));
return result;
}
public static void Main()
{
var s = ",,Degree,Graduate,08-Dec-17,Level 1,\"Advanced, Maths\",,";
var spl = SplitIt(new[]{','}, s);
var spl2 = SplitIt(new[]{','}, s, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", spl));
Console.WriteLine(string.Join("|", spl2));
}
}
Output:
|Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths||
Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths
The function gets comma separated fields within a string, excluding commas embedded in a quoted field
The assumptions
It should return empty fields ,,
There are no quotes within a quote field (as per the example)
The method
I uses a for loop with i as a place holder of the current field
It scans for the next comma or quote and if it finds a quote it scans for the next comma to create the field
It needed to be efficient otherwise we would use regex or Linq
The OP didn't want to use a CSV library
Note : There is no error checking, and scanning each character would be faster this was just easy to understand
Code
public List<string> GetFields(string line)
{
var list = new List<string>();
for (var i = 0; i < line.Length; i++)
{
var firstQuote = line.IndexOf('"', i);
var firstComma = line.IndexOf(',', i);
if (firstComma >= 0)
{
// first comma is before the first quote, then its just a standard field
if (firstComma < firstQuote || firstQuote == -1)
{
list.Add(line.Substring(i, firstComma - i));
i = firstComma;
continue;
}
// We have found quote so look for the next comma afterwards
var nextQuote = line.IndexOf('"', firstQuote + 1);
var nextComma = line.IndexOf(',', nextQuote + 1);
// if we found a comma, then we have found the end of this field
if (nextComma >= 0)
{
list.Add(line.Substring(i, nextComma - i));
i = nextComma;
continue;
}
}
list.Add(line.Substring(i)); // if were are here there are no more fields
break;
}
return list;
}
Tests 1
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths",another
Degree
Graduate
08-Dec-17
Level 1
"Advanced, Maths"
another
Tests 2
,Degree,Graduate,08-Dec-17,\"asdasd\",Level 1,\"Advanced, Maths\",another
<Empty Line>
Degree
Graduate
08-Dec-17
"asdasd"
Level 1
"Advanced, Maths"
another
How to remove leading zeros in strings using C#?
For example in the following numbers, I would like to remove all the leading zeros.
0001234
0000001234
00001234
This is the code you need:
string strInput = "0001234";
strInput = strInput.TrimStart('0');
It really depends on how long the NVARCHAR is, as a few of the above (especially the ones that convert through IntXX) methods will not work for:
String s = "005780327584329067506780657065786378061754654532164953264952469215462934562914562194562149516249516294563219437859043758430587066748932647329814687194673219673294677438907385032758065763278963247982360675680570678407806473296472036454612945621946";
Something like this would
String s ="0000058757843950000120465875468465874567456745674000004000".TrimStart(new Char[] { '0' } );
// s = "58757843950000120465875468465874567456745674000004000"
Code to avoid returning an empty string ( when input is like "00000").
string myStr = "00012345";
myStr = myStr.TrimStart('0');
myStr = myStr.Length > 0 ? myStr : "0";
return numberString.TrimStart('0');
Using the following will return a single 0 when input is all 0.
string s = "0000000"
s = int.Parse(s).ToString();
TryParse works if your number is less than Int32.MaxValue. This also gives you the opportunity to handle badly formatted strings. Works the same for Int64.MaxValue and Int64.TryParse.
int number;
if(Int32.TryParse(nvarchar, out number))
{
// etc...
number.ToString();
}
This Regex let you avoid wrong result with digits which consits only from zeroes "0000" and work on digits of any length:
using System.Text.RegularExpressions;
/*
00123 => 123
00000 => 0
00000a => 0a
00001a => 1a
00001a => 1a
0000132423423424565443546546356546454654633333a => 132423423424565443546546356546454654633333a
*/
Regex removeLeadingZeroesReg = new Regex(#"^0+(?=\d)");
var strs = new string[]
{
"00123",
"00000",
"00000a",
"00001a",
"00001a",
"0000132423423424565443546546356546454654633333a",
};
foreach (string str in strs)
{
Debug.Print(string.Format("{0} => {1}", str, removeLeadingZeroesReg.Replace(str, "")));
}
And this regex will remove leading zeroes anywhere inside string:
new Regex(#"(?<!\d)0+(?=\d)");
// "0000123432 d=0 p=002 3?0574 m=600"
// => "123432 d=0 p=2 3?574 m=600"
Regex rx = new Regex(#"^0+(\d+)$");
rx.Replace("0001234", #"$1"); // => "1234"
rx.Replace("0001234000", #"$1"); // => "1234000"
rx.Replace("000", #"$1"); // => "0" (TrimStart will convert this to "")
// usage
var outString = rx.Replace(inputString, #"$1");
I just crafted this as I needed a good, simple way.
If it gets to the final digit, and if it is a zero, it will stay.
You could also use a foreach loop instead for super long strings.
I just replace each leading oldChar with the newChar.
This is great for a problem I just solved, after formatting an int into a string.
/* Like this: */
int counterMax = 1000;
int counter = ...;
string counterString = counter.ToString($"D{counterMax.ToString().Length}");
counterString = RemoveLeadingChars('0', ' ', counterString);
string fullCounter = $"({counterString}/{counterMax})";
// = ( 1/1000) ... ( 430/1000) ... (1000/1000)
static string RemoveLeadingChars(char oldChar, char newChar, char[] chars)
{
string result = "";
bool stop = false;
for (int i = 0; i < chars.Length; i++)
{
if (i == (chars.Length - 1)) stop = true;
if (!stop && chars[i] == oldChar) chars[i] = newChar;
else stop = true;
result += chars[i];
}
return result;
}
static string RemoveLeadingChars(char oldChar, char newChar, string text)
{
return RemoveLeadingChars(oldChar, newChar, text.ToCharArray());
}
I always tend to make my functions suitable for my own library, so there are options.
I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
How can I replace multiple spaces in a string with only one space in C#?
Example:
1 2 3 4 5
would be:
1 2 3 4 5
I like to use:
myString = Regex.Replace(myString, #"\s+", " ");
Since it will catch runs of any kind of whitespace (e.g. tabs, newlines, etc.) and replace them with a single space.
string sentence = "This is a sentence with multiple spaces";
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");
string xyz = "1 2 3 4 5";
xyz = string.Join( " ", xyz.Split( new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ));
I think Matt's answer is the best, but I don't believe it's quite right. If you want to replace newlines, you must use:
myString = Regex.Replace(myString, #"\s+", " ", RegexOptions.Multiline);
Another approach which uses LINQ:
var list = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join(" ", list);
It's much simpler than all that:
while(str.Contains(" ")) str = str.Replace(" ", " ");
Regex can be rather slow even with simple tasks. This creates an extension method that can be used off of any string.
public static class StringExtension
{
public static String ReduceWhitespace(this String value)
{
var newString = new StringBuilder();
bool previousIsWhitespace = false;
for (int i = 0; i < value.Length; i++)
{
if (Char.IsWhiteSpace(value[i]))
{
if (previousIsWhitespace)
{
continue;
}
previousIsWhitespace = true;
}
else
{
previousIsWhitespace = false;
}
newString.Append(value[i]);
}
return newString.ToString();
}
}
It would be used as such:
string testValue = "This contains too much whitespace."
testValue = testValue.ReduceWhitespace();
// testValue = "This contains too much whitespace."
myString = Regex.Replace(myString, " {2,}", " ");
For those, who don't like Regex, here is a method that uses the StringBuilder:
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
StringBuilder stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || c != ' ' || (c == ' ' && input[i - 1] != ' '))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
In my tests, this method was 16 times faster on average with a very large set of small-to-medium sized strings, compared to a static compiled Regex. Compared to a non-compiled or non-static Regex, this should be even faster.
Keep in mind, that it does not remove leading or trailing spaces, only multiple occurrences of such.
This is a shorter version, which should only be used if you are only doing this once, as it creates a new instance of the Regex class every time it is called.
temp = new Regex(" {2,}").Replace(temp, " ");
If you are not too acquainted with regular expressions, here's a short explanation:
The {2,} makes the regex search for the character preceding it, and finds substrings between 2 and unlimited times.
The .Replace(temp, " ") replaces all matches in the string temp with a space.
If you want to use this multiple times, here is a better option, as it creates the regex IL at compile time:
Regex singleSpacify = new Regex(" {2,}", RegexOptions.Compiled);
temp = singleSpacify.Replace(temp, " ");
You can simply do this in one line solution!
string s = "welcome to london";
s.Replace(" ", "()").Replace(")(", "").Replace("()", " ");
You can choose other brackets (or even other characters) if you like.
no Regex, no Linq... removes leading and trailing spaces as well as reducing any embedded multiple space segments to one space
string myString = " 0 1 2 3 4 5 ";
myString = string.Join(" ", myString.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries));
result:"0 1 2 3 4 5"
// Mysample string
string str ="hi you are a demo";
//Split the words based on white sapce
var demo= str .Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
//Join the values back and add a single space in between
str = string.Join(" ", demo);
// output: string str ="hi you are a demo";
Consolodating other answers, per Joel, and hopefully improving slightly as I go:
You can do this with Regex.Replace():
string s = Regex.Replace (
" 1 2 4 5",
#"[ ]{2,}",
" "
);
Or with String.Split():
static class StringExtensions
{
public static string Join(this IList<string> value, string separator)
{
return string.Join(separator, value.ToArray());
}
}
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
I just wrote a new Join that I like, so I thought I'd re-answer, with it:
public static string Join<T>(this IEnumerable<T> source, string separator)
{
return string.Join(separator, source.Select(e => e.ToString()).ToArray());
}
One of the cool things about this is that it work with collections that aren't strings, by calling ToString() on the elements. Usage is still the same:
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
Many answers are providing the right output but for those looking for the best performances, I did improve Nolanar's answer (which was the best answer for performance) by about 10%.
public static string MergeSpaces(this string str)
{
if (str == null)
{
return null;
}
else
{
StringBuilder stringBuilder = new StringBuilder(str.Length);
int i = 0;
foreach (char c in str)
{
if (c != ' ' || i == 0 || str[i - 1] != ' ')
stringBuilder.Append(c);
i++;
}
return stringBuilder.ToString();
}
}
Use the regex pattern
[ ]+ #only space
var text = Regex.Replace(inputString, #"[ ]+", " ");
I know this is pretty old, but ran across this while trying to accomplish almost the same thing. Found this solution in RegEx Buddy. This pattern will replace all double spaces with single spaces and also trim leading and trailing spaces.
pattern: (?m:^ +| +$|( ){2,})
replacement: $1
Its a little difficult to read since we're dealing with empty space, so here it is again with the "spaces" replaced with a "_".
pattern: (?m:^_+|_+$|(_){2,}) <-- don't use this, just for illustration.
The "(?m:" construct enables the "multi-line" option. I generally like to include whatever options I can within the pattern itself so it is more self contained.
I can remove whitespaces with this
while word.contains(" ") //double space
word = word.Replace(" "," "); //replace double space by single space.
word = word.trim(); //to remove single whitespces from start & end.
Without using regular expressions:
while (myString.IndexOf(" ", StringComparison.CurrentCulture) != -1)
{
myString = myString.Replace(" ", " ");
}
OK to use on short strings, but will perform badly on long strings with lots of spaces.
try this method
private string removeNestedWhitespaces(char[] st)
{
StringBuilder sb = new StringBuilder();
int indx = 0, length = st.Length;
while (indx < length)
{
sb.Append(st[indx]);
indx++;
while (indx < length && st[indx] == ' ')
indx++;
if(sb.Length > 1 && sb[0] != ' ')
sb.Append(' ');
}
return sb.ToString();
}
use it like this:
string test = removeNestedWhitespaces("1 2 3 4 5".toCharArray());
Here is a slight modification on Nolonar original answer.
Checking if the character is not just a space, but any whitespace, use this:
It will replace any multiple whitespace character with a single space.
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
var stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || !char.IsWhiteSpace(c) || (char.IsWhiteSpace(c) &&
!char.IsWhiteSpace(strValue[i - 1])))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
How about going rogue?
public static string MinimizeWhiteSpace(
this string _this)
{
if (_this != null)
{
var returned = new StringBuilder();
var inWhiteSpace = false;
var length = _this.Length;
for (int i = 0; i < length; i++)
{
var character = _this[i];
if (char.IsWhiteSpace(character))
{
if (!inWhiteSpace)
{
inWhiteSpace = true;
returned.Append(' ');
}
}
else
{
inWhiteSpace = false;
returned.Append(character);
}
}
return returned.ToString();
}
else
{
return null;
}
}
Mix of StringBuilder and Enumerable.Aggregate() as extension method for strings:
using System;
using System.Linq;
using System.Text;
public static class StringExtension
{
public static string CondenseSpaces(this string s)
{
return s.Aggregate(new StringBuilder(), (acc, c) =>
{
if (c != ' ' || acc.Length == 0 || acc[acc.Length - 1] != ' ')
acc.Append(c);
return acc;
}).ToString();
}
public static void Main()
{
const string input = " (five leading spaces) (five internal spaces) (five trailing spaces) ";
Console.WriteLine(" Input: \"{0}\"", input);
Console.WriteLine("Output: \"{0}\"", StringExtension.CondenseSpaces(input));
}
}
Executing this program produces the following output:
Input: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Output: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Old skool:
string oldText = " 1 2 3 4 5 ";
string newText = oldText
.Replace(" ", " " + (char)22 )
.Replace( (char)22 + " ", "" )
.Replace( (char)22 + "", "" );
Assert.That( newText, Is.EqualTo( " 1 2 3 4 5 " ) );
You can create a StringsExtensions file with a method like RemoveDoubleSpaces().
StringsExtensions.cs
public static string RemoveDoubleSpaces(this string value)
{
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);
value = regex.Replace(value, " ");
// this removes space at the end of the value (like "demo ")
// and space at the start of the value (like " hi")
value = value.Trim(' ');
return value;
}
And then you can use it like this:
string stringInput =" hi here is a demo ";
string stringCleaned = stringInput.RemoveDoubleSpaces();
I looked over proposed solutions, could not find the one that would handle mix of white space characters acceptable for my case, for example:
Regex.Replace(input, #"\s+", " ") - it will eat your line breaks, if they are mixed with spaces, for example \n \n sequence will be replaced with
Regex.Replace(source, #"(\s)\s+", "$1") - it will depend on whitespace first character, meaning that it again might eat your line breaks
Regex.Replace(source, #"[ ]{2,}", " ") - it won't work correctly when there's mix of whitespace characters - for example "\t \t "
Probably not perfect, but quick solution for me was:
Regex.Replace(input, #"\s+",
(match) => match.Value.IndexOf('\n') > -1 ? "\n" : " ", RegexOptions.Multiline)
Idea is - line break wins over the spaces and tabs.
This won't handle windows line breaks correctly, but it would be easy to adjust to work with that too, don't know regex that well - may be it is possible to fit into single pattern.
The following code remove all the multiple spaces into a single space
public string RemoveMultipleSpacesToSingle(string str)
{
string text = str;
do
{
//text = text.Replace(" ", " ");
text = Regex.Replace(text, #"\s+", " ");
} while (text.Contains(" "));
return text;
}