c# replace string if it is not a substring - c#

I am processing files in order to replace a list of pre-defined keywords with a pre- and a post-string (say "#" and ".") like this :
"Word Word2 anotherWord and some other stuff" should become "#Word. #Word2. #anotherWord. and some other stuff"
My keys are unique and processed the keys from longest key to smallest, so I know inclusion can only be on already
However, if I have key inclusion (e.g. Word2 contains Word), and if I do
"Word Word2 anotherWord and some other stuff"
.Replace("anotherWord", "#anotherWord.")
.Replace("Word2", "#Word2.")
.Replace("Word", "#Word.")
I get the following result:
"#Word. ##Word.2. #another#Word.. and some other stuff"
For sure, my approach isn't wokring. So what is the way to make sure I only replace a key in the string, if it is NOT contained in another key? I tried RegExp but didn't find the correct way. Or there is another solution?

Just use Regular expressions with word boundary if performance is not a key requirement:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace Subst
{
public class Program
{
public static void Main(string[] args)
{
var map = new Dictionary<string, string>{
{"Word", "#Word."},
{"anotherWord", "#anotherWord."},
{"Word2", "#Word2."}
};
var input = "Word Word2 anotherWord and some other stuff";
foreach(var mapping in map) {
input = Regex.Replace(input, String.Format("\\b{0}\\b", mapping.Key), Regex.Escape(mapping.Value));
}
Console.WriteLine(input);
}
}
}

One way is to use
string myString = String.Format("ORIGINAL TEXT {1} {2}", "TEXT TO PUT INSIDE CURLY BRACKET 1", "TEXT TO PUT IN CURLY BRACKET 2");
//Result: "ORIGINAL TEXT TEXT TO PUT INSIDE CURLY BRACKET 1 TEXT TO PUT IN CURLY BRACKET 2"
However, this requires your original text to have the curly brackets inside in the first place.
Quite messy, but you could always replace the words you are looking for with the Replace and then change the curly backets at the same time. There is probably a far better way of doing this but I cant think of it right now.

I suggest direct implementation, e.g.
private static String MyReplace(string value, params Tuple<string, string>[] substitutes) {
if (string.IsNullOrEmpty(value))
return value;
else if (null == substitutes || !substitutes.Any())
return value;
int start = 0;
StringBuilder sb = new StringBuilder();
while (true) {
int at = -1;
Tuple<string, string> best = null;
foreach (var pair in substitutes) {
int index = value.IndexOf(pair.Item1, start);
if (index >= 0)
if (best == null ||
index < at ||
index == at && best.Item1.Length < pair.Item1.Length) {
at = index;
best = pair;
}
}
if (best == null) {
sb.Append(value.Substring(start));
break;
}
sb.Append(value.Substring(start, at - start));
sb.Append(best.Item2);
start = best.Item1.Length + at;
}
return sb.ToString();
}
Test
string source = "Word Word2 anotherWord and some other stuff";
var result = MyReplace(source,
new Tuple<string, string>("anotherWord", "#anotherWord."),
new Tuple<string, string>("Word2", "#Word2."),
new Tuple<string, string>("Word", "#Word."));
Console.WriteLine(result);
Outcome:
#Word. #Word2. #anotherWord. and some other stuff

Regex alternative (order doesn't matter):
var result = Regex.Replace("Word Word2 anotherWord and some other stuff", #"\b\S+\b", m =>
m.Value == "anotherWord" ? "#anotherWord." :
m.Value == "Word2" ? "#Word2." :
m.Value == "Word" ? "#Word." : m.Value)
Or separate:
string s = "Word Word2 anotherWord and some other stuff";
s = Regex.Replace(s, #"\b" + Regex.Escape("anotherWord") + #"\b", "#anotherWord.");
s = Regex.Replace(s, #"\b" + Regex.Escape("Word2") + #"\b", "#Word2.");
s = Regex.Replace(s, #"\b" + Regex.Escape("Word") + #"\b", "#Word.");

Solved the problem using a two-loop-through approach as follows...
List<string> keys = new List<string>();
keys.Add("Word1"); // ... and so on
// IMPORTANT: algorithm works only when we are sure that one key cannot be
// included in another key with higher index. Also, uniqueness is
// guaranteed by construction, although the routine would work
// duplicate key...!
keys = keys.OrderByDescending(x => x.Length).ThenBy(x => x).ToList<string>();
// first loop: replace with some UNIQUE key hash in text
foreach(string key in keys) {
txt.Replace(key, string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)));
}
// second loop: replace UNIQUE key hash with corresponding values...
foreach(string key in keys) {
txt.Replace(string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)), string.Format("{0}{1}{2}", preStr, key, postStr));
}

You can split your string by ' ' and cycle through the string array. Compare each index of the array to your replacement strings and then concatenate them when finished.
string newString = "Word Word2 anotherWord and some other stuff";
string[] split = newString.Split(' ');
foreach (var s in split){
if(s == "Word"){
s = "#Word";
} else if(s == "Word2"){
s = "#Word2";
} else if(s == "anotherWord"){
s = "#anotherWord";
}
}
string finalString = string.Concat(split);

Related

c# get the first ';' after parentheses

i feel dumb for asking a most likely silly question.
I am helping someone getting the results he wishes for his custom compiler that reads all lines of an xml file in one string so it will look like below, and since he wants it to "Support" to call variables inside the array worst case scenario would look like below:
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"
What i need is to find the first ";" after "[" and "]" and split it, so i stand with this:
"Var1 = [5,4,3,2];
It will also have to support multiple "[", "]" for example:
"Var2 = [5,Var1,[4],2];"
EDIT: There may also be Data in between the last "]" and ";"
For example:
"Var2 = [5,[4],2]Var1;
What can i do here? Im kind of stuck.
You can try regular expressions, e.g.
string source = "Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];";
// 1. final (or the only) chunk doesn't necessary contain '];':
// "abc" -> "abc"
// 2. chunk has at least one symbol except '];'
string pattern = ".+?(][a-zA-Z0-9]*;|$)";
var items = Regex
.Matches(source, pattern)
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
Console.Write(string.Join(Environment.NewLine, items));
Outcome:
Var1 = [5,4,3,2]abc123;
Var2 = [2,8,6,Var1;4];
^([^;]+);
This regex should work for all.
You can use it like here:
string[] lines =
{
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];",
"Var2 = [5,[4],2]Var1; Var2 = [2,8,6,Var1;4];"
};
Regex pattern = new Regex(#"^([^;]+);");
foreach (string s in lines){
Match match = pattern.Match(s);
if (match.Success)
{
Console.WriteLine(match.Value);
}
}
The explanation is:
^ means starts with and is [^;] anything but a semicolon
+ means repeated one or more times and is ; followed by a semicolon
This will find Var1 = [5,4,3,2]; as well as Var1 = [5,4,3,2];
You can see the output HERE
public static string Extract(string str, char splitOn)
{
var split = false;
var count = 0;
var bracketCount = 0;
foreach (char c in str)
{
count++;
if (split && c == splitOn)
return str.SubString(0, count);
if (c == '[')
{
bracketCount++;
split = false;
}
else if (c == ']')
{
bracketCount--;
if (bracketCount == 0)
{
split = true;
}
else if (bracketCount < 0)
throw new FormatException(); //?
}
}
return str;
}

Splitting a line of text that has key value pairs where value can be empty

I need to split a line of text
The general syntax for a delivery instruction is |||name|value||name|value||…..|||
Each delivery instruction starts and ends with 3 pipe characters - |||
A delivery instruction is a set of name/value pairs separated by a single pipe eg name|value
Each name value pair is separated by 2 pipe characters ||
Names and Values may not contain the pipe character
The value of any pair may be a blank string.
I need a regex that will help me resolve the above problem.
My latest attempt with my limited Regex skills:
string SampleData = "|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
string[] Split = Regex.Matches(SampleData, "(\|\|\|(?:\w+\|\w*\|\|)*\|)").Cast<Match>().Select(m => m.Value).ToArray();
The expected output should be as follows(based on the sample data string provided):
env|af245g
mail_idx|39
gen_date|2016/01/03 11:40:06
docm_name|Client Statement (01.03.2015−31.03.2015)
docm_cat_name|Client Statement
docm_type_id|9100
docm_type_name|Client Statement
addr_type_id|1
addr_type_name|Postal address
addr_street_nr|
addr_street_name|Robinson Road
addr_po_box|
addr_po_box_type|
addr_postcode|903334
addr_city|Singapore
addr_state|
addr_country_id|29955
addr_country_name|Singapore
obj_nr|10000023
bp_custr_type|Customer
access_portal|Y
access_library|Y
avsr_team_id|13056
pri_avsr_id|
pri_avsr_name|
ctact_phone|
dlv_type_id|5001
dlv_type_name|Channel to standard mail
ao_id|14387
ao_name|Corp Limited
ao_title|
ao_mob_nr|
ao_email_addr|
You can also do it without using Regex. Its just simple splitting.
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string sub = nameValues.Substring(3, nameValues.Length - 6);
Dictionary<string, string> dic = new Dictionary<string, string>();
string[] subsub = sub.Split(new string[] {"||"}, StringSplitOptions.None);
foreach (string item in subsub)
{
string[] nameVal = item.Split('|');
dic.Add(nameVal[0], nameVal[1]);
}
foreach (var item in dic)
{
// Retrieve key and value here i.e:
// item.Key
// item.Value
}
Hope this helps.
I think you're making this more difficult than it needs to be. This regex yields the desired result:
#"[^|]+\|([^|]*)"
Assuming you're dealing with a single, well-formed delivery instruction, there's no need to match the starting and ending triple-pipes. You don't need to worry about the double-pipe separators either, because the "name" part of the "name|value" pair is always present. Just look for the first thing that looks like a name with a pipe following it, and everything up to the next pipe character is the value.
(?<=\|\|\|).*?(?=\|\|\|)
You can use this to get all the key value pairs between |||.See demo.
https://regex101.com/r/fM9lY3/59
string strRegex = #"(?<=\|\|\|).*?(?=\|\|\|)";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = #"|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
Here's a variation of #Syed Muhammad Zeeshan code that runs faster:
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string[] nameArray = nameValues.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dic = new Dictionary<string, string>();
int i = 0;
foreach (string item in nameArray)
{
if (i < nameArray.Length - 1)
dic.Add(nameArray[i], nameArray[i + 1]);
i = i + 2;
}
Interesting, I will like to try:
class Program
{
static void Main(string[] args)
{
string nameValueList = "|||zeeshan|1||ali|2||ahsan|3|||";
while (nameValueList != "|||")
{
nameValueList = nameValueList.TrimStart('|');
string nameValue = GetNameValue(ref nameValueList);
Console.WriteLine(nameValue);
}
Console.ReadLine();
}
private static string GetNameValue(ref string nameValues)
{
string retVal = string.Empty;
while(nameValues[0] != '|') // for name
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
while (nameValues[0] != '|') // for value
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
return retVal;
}
}
https://dotnetfiddle.net/WRbsRu

Alternatively upper- and lowercase words in a string

I use Visual Studio 2010 ver.
I have array strings [] = { "eat and go"};
I display it with foreach
I wanna convert strings like this : EAT and GO
Here my code:
Console.Write( myString.First().ToString().ToUpper() + String.Join("",myString].Skip(1)).ToLower()+ "\n");
But the output is : Eat and go . :D lol
Could you help me? I would appreciate it. Thanks
While .ToUpper() will convert a string to its upper case equivalent, calling .First() on a string object actually returns the first element of the string (since it's effectively a char[] under the hood). First() is actually exposed as a LINQ extension method and works on any collection type.
As with many string handling functions, there are a number of ways to handle it, and this is my approach. Obviously you'll need to validate value to ensure it's being given a long enough string.
using System.Text;
public string CapitalizeFirstAndLast(string value)
{
string[] words = value.Split(' '); // break into individual words
StringBuilder result = new StringBuilder();
// Add the first word capitalized
result.Append(words[0].ToUpper());
// Add everything else
for (int i = 1; i < words.Length - 1; i++)
result.Append(words[i]);
// Add the last word capitalized
result.Append(words[words.Length - 1].ToUpper());
return result.ToString();
}
If it's always gonna be a 3 words string, the you can simply do it like this:
string[] mystring = {"eat and go", "fast and slow"};
foreach (var s in mystring)
{
string[] toUpperLower = s.Split(' ');
Console.Write(toUpperLower.First().ToUpper() + " " + toUpperLower[1].ToLower() +" " + toUpperLower.Last().ToUpper());
}
If you want to continuously alternate, you can do the following:
private static string alternateCase( string phrase )
{
String[] words = phrase.split(" ");
StringBuilder builder = new StringBuilder();
//create a flag that keeps track of the case change
book upperToggle = true;
//loops through the words
for(into i = 0; i < words.length; i++)
{
if(upperToggle)
//converts to upper if flag is true
words[i] = words[i].ToUpper();
else
//converts to lower if flag is false
words[i] = words[i].ToLower();
upperToggle = !upperToggle;
//adds the words to the string builder
builder.append(words[i]);
}
//returns the new string
return builder.ToString();
}
Quickie using ScriptCS:
scriptcs (ctrl-c to exit)
> var input = "Eat and go";
> var words = input.Split(' ');
> var result = string.Join(" ", words.Select((s, i) => i % 2 == 0 ? s.ToUpperInvariant() : s.ToLowerInvariant()));
> result
"EAT and GO"

How can I convert text to Pascal case?

I have a variable name, say "WARD_VS_VITAL_SIGNS", and I want to convert it to Pascal case format: "WardVsVitalSigns"
WARD_VS_VITAL_SIGNS -> WardVsVitalSigns
How can I make this conversion?
You do not need a regular expression for that.
var yourString = "WARD_VS_VITAL_SIGNS".ToLower().Replace("_", " ");
TextInfo info = CultureInfo.CurrentCulture.TextInfo;
yourString = info.ToTitleCase(yourString).Replace(" ", string.Empty);
Console.WriteLine(yourString);
Here is my quick LINQ & regex solution to save someone's time:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public string ToPascalCase(string original)
{
Regex invalidCharsRgx = new Regex("[^_a-zA-Z0-9]");
Regex whiteSpace = new Regex(#"(?<=\s)");
Regex startsWithLowerCaseChar = new Regex("^[a-z]");
Regex firstCharFollowedByUpperCasesOnly = new Regex("(?<=[A-Z])[A-Z0-9]+$");
Regex lowerCaseNextToNumber = new Regex("(?<=[0-9])[a-z]");
Regex upperCaseInside = new Regex("(?<=[A-Z])[A-Z]+?((?=[A-Z][a-z])|(?=[0-9]))");
// replace white spaces with undescore, then replace all invalid chars with empty string
var pascalCase = invalidCharsRgx.Replace(whiteSpace.Replace(original, "_"), string.Empty)
// split by underscores
.Split(new char[] { '_' }, StringSplitOptions.RemoveEmptyEntries)
// set first letter to uppercase
.Select(w => startsWithLowerCaseChar.Replace(w, m => m.Value.ToUpper()))
// replace second and all following upper case letters to lower if there is no next lower (ABC -> Abc)
.Select(w => firstCharFollowedByUpperCasesOnly.Replace(w, m => m.Value.ToLower()))
// set upper case the first lower case following a number (Ab9cd -> Ab9Cd)
.Select(w => lowerCaseNextToNumber.Replace(w, m => m.Value.ToUpper()))
// lower second and next upper case letters except the last if it follows by any lower (ABcDEf -> AbcDef)
.Select(w => upperCaseInside.Replace(w, m => m.Value.ToLower()));
return string.Concat(pascalCase);
}
Example output:
"WARD_VS_VITAL_SIGNS" "WardVsVitalSigns"
"Who am I?" "WhoAmI"
"I ate before you got here" "IAteBeforeYouGotHere"
"Hello|Who|Am|I?" "HelloWhoAmI"
"Live long and prosper" "LiveLongAndProsper"
"Lorem ipsum dolor..." "LoremIpsumDolor"
"CoolSP" "CoolSp"
"AB9CD" "Ab9Cd"
"CCCTrigger" "CccTrigger"
"CIRC" "Circ"
"ID_SOME" "IdSome"
"ID_SomeOther" "IdSomeOther"
"ID_SOMEOther" "IdSomeOther"
"CCC_SOME_2Phases" "CccSome2Phases"
"AlreadyGoodPascalCase" "AlreadyGoodPascalCase"
"999 999 99 9 " "999999999"
"1 2 3 " "123"
"1 AB cd EFDDD 8" "1AbCdEfddd8"
"INVALID VALUE AND _2THINGS" "InvalidValueAnd2Things"
First off, you are asking for title case and not camel-case, because in camel-case the first letter of the word is lowercase and your example shows you want the first letter to be uppercase.
At any rate, here is how you could achieve your desired result:
string textToChange = "WARD_VS_VITAL_SIGNS";
System.Text.StringBuilder resultBuilder = new System.Text.StringBuilder();
foreach(char c in textToChange)
{
// Replace anything, but letters and digits, with space
if(!Char.IsLetterOrDigit(c))
{
resultBuilder.Append(" ");
}
else
{
resultBuilder.Append(c);
}
}
string result = resultBuilder.ToString();
// Make result string all lowercase, because ToTitleCase does not change all uppercase correctly
result = result.ToLower();
// Creates a TextInfo based on the "en-US" culture.
TextInfo myTI = new CultureInfo("en-US",false).TextInfo;
result = myTI.ToTitleCase(result).Replace(" ", String.Empty);
Note: result is now WardVsVitalSigns.
If you did, in fact, want camel-case, then after all of the above, just use this helper function:
public string LowercaseFirst(string s)
{
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
char[] a = s.ToCharArray();
a[0] = char.ToLower(a[0]);
return new string(a);
}
So you could call it, like this:
result = LowercaseFirst(result);
Single semicolon solution:
public static string PascalCase(this string word)
{
return string.Join("" , word.Split('_')
.Select(w => w.Trim())
.Where(w => w.Length > 0)
.Select(w => w.Substring(0,1).ToUpper() + w.Substring(1).ToLower()));
}
Extension method for System.String with .NET Core compatible code by using System and System.Linq.
Does not modify the original string.
.NET Fiddle for the code below
using System;
using System.Linq;
public static class StringExtensions
{
/// <summary>
/// Converts a string to PascalCase
/// </summary>
/// <param name="str">String to convert</param>
public static string ToPascalCase(this string str){
// Replace all non-letter and non-digits with an underscore and lowercase the rest.
string sample = string.Join("", str?.Select(c => Char.IsLetterOrDigit(c) ? c.ToString().ToLower() : "_").ToArray());
// Split the resulting string by underscore
// Select first character, uppercase it and concatenate with the rest of the string
var arr = sample?
.Split(new []{'_'}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => $"{s.Substring(0, 1).ToUpper()}{s.Substring(1)}");
// Join the resulting collection
sample = string.Join("", arr);
return sample;
}
}
public class Program
{
public static void Main()
{
Console.WriteLine("WARD_VS_VITAL_SIGNS".ToPascalCase()); // WardVsVitalSigns
Console.WriteLine("Who am I?".ToPascalCase()); // WhoAmI
Console.WriteLine("I ate before you got here".ToPascalCase()); // IAteBeforeYouGotHere
Console.WriteLine("Hello|Who|Am|I?".ToPascalCase()); // HelloWhoAmI
Console.WriteLine("Live long and prosper".ToPascalCase()); // LiveLongAndProsper
Console.WriteLine("Lorem ipsum dolor sit amet, consectetur adipiscing elit.".ToPascalCase()); // LoremIpsumDolorSitAmetConsecteturAdipiscingElit
}
}
var xs = "WARD_VS_VITAL_SIGNS".Split('_');
var q =
from x in xs
let first_char = char.ToUpper(x[0])
let rest_chars = new string(x.Skip(1).Select(c => char.ToLower(c)).ToArray())
select first_char + rest_chars;
Some answers are correct but I really don't understand why they set the text to LowerCase first, because the ToTitleCase will handle that automatically:
var text = "WARD_VS_VITAL_SIGNS".Replace("_", " ");
TextInfo textInfo = CultureInfo.CurrentCulture.TextInfo;
text = textInfo.ToTitleCase(text).Replace(" ", string.Empty);
Console.WriteLine(text);
You can use this:
public static string ConvertToPascal(string underScoreString)
{
string[] words = underScoreString.Split('_');
StringBuilder returnStr = new StringBuilder();
foreach (string wrd in words)
{
returnStr.Append(wrd.Substring(0, 1).ToUpper());
returnStr.Append(wrd.Substring(1).ToLower());
}
return returnStr.ToString();
}
This answer understands that there are Unicode categories which can be tapped while processing the text to ignore the connecting characters such as - or _. In regex parlance it is \p (for category) then the type which is {Pc} for punctuation and connector type character; \p{Pc} using our MatchEvaluator which is kicked off for each match within a session.
So during the match phase, we get words and ignore the punctuations, so the replace operation handles the removal of the connector character. Once we have the match word, we can push it down to lowercase and then only up case the first character as the return for the replace:
public static class StringExtensions
{
public static string ToPascalCase(this string initial)
=> Regex.Replace(initial,
// (Match any non punctuation) & then ignore any punctuation
#"([^\p{Pc}]+)[\p{Pc}]*",
new MatchEvaluator(mtch =>
{
var word = mtch.Groups[1].Value.ToLower();
return $"{Char.ToUpper(word[0])}{word.Substring(1)}";
}));
}
Usage:
"TOO_MUCH_BABY".ToPascalCase(); // TooMuchBaby
"HELLO|ITS|ME".ToPascalCase(); // HelloItsMe
See Word Character in Character Classes in Regular Expressions
Pc Punctuation, Connector. This category includes ten characters, the
most commonly used of which is the LOWLINE character (_), u+005F.
If you did want to replace any formatted string into a pascal case then you can do
public static string ToPascalCase(this string original)
{
string newString = string.Empty;
bool makeNextCharacterUpper = false;
for (int index = 0; index < original.Length; index++)
{
char c = original[index];
if(index == 0)
newString += $"{char.ToUpper(c)}";
else if (makeNextCharacterUpper)
{
newString += $"{char.ToUpper(c)}";
makeNextCharacterUpper = false;
}
else if (char.IsUpper(c))
newString += $" {c}";
else if (char.IsLower(c) || char.IsNumber(c))
newString += c;
else if (char.IsNumber(c))
newString += $"{c}";
else
{
makeNextCharacterUpper = true;
newString += ' ';
}
}
return newString.TrimStart().Replace(" ", "");
}
Tested with strings
I|Can|Get|A|String
ICan_GetAString
i-can-get-a-string
i_can_get_a_string
I Can Get A String
ICanGetAString
I found this gist useful after adding a ToLower() to it.
"WARD_VS_VITAL_SIGNS"
.ToLower()
.Split(new [] {"_"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => char.ToUpperInvariant(s[0]) + s.Substring(1, s.Length - 1))
.Aggregate(string.Empty, (s1, s2) => s1 + s2)

How do I replace multiple spaces with a single space in C#?

How can I replace multiple spaces in a string with only one space in C#?
Example:
1 2 3 4 5
would be:
1 2 3 4 5
I like to use:
myString = Regex.Replace(myString, #"\s+", " ");
Since it will catch runs of any kind of whitespace (e.g. tabs, newlines, etc.) and replace them with a single space.
string sentence = "This is a sentence with multiple spaces";
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");
string xyz = "1 2 3 4 5";
xyz = string.Join( " ", xyz.Split( new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ));
I think Matt's answer is the best, but I don't believe it's quite right. If you want to replace newlines, you must use:
myString = Regex.Replace(myString, #"\s+", " ", RegexOptions.Multiline);
Another approach which uses LINQ:
var list = str.Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join(" ", list);
It's much simpler than all that:
while(str.Contains(" ")) str = str.Replace(" ", " ");
Regex can be rather slow even with simple tasks. This creates an extension method that can be used off of any string.
public static class StringExtension
{
public static String ReduceWhitespace(this String value)
{
var newString = new StringBuilder();
bool previousIsWhitespace = false;
for (int i = 0; i < value.Length; i++)
{
if (Char.IsWhiteSpace(value[i]))
{
if (previousIsWhitespace)
{
continue;
}
previousIsWhitespace = true;
}
else
{
previousIsWhitespace = false;
}
newString.Append(value[i]);
}
return newString.ToString();
}
}
It would be used as such:
string testValue = "This contains too much whitespace."
testValue = testValue.ReduceWhitespace();
// testValue = "This contains too much whitespace."
myString = Regex.Replace(myString, " {2,}", " ");
For those, who don't like Regex, here is a method that uses the StringBuilder:
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
StringBuilder stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || c != ' ' || (c == ' ' && input[i - 1] != ' '))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
In my tests, this method was 16 times faster on average with a very large set of small-to-medium sized strings, compared to a static compiled Regex. Compared to a non-compiled or non-static Regex, this should be even faster.
Keep in mind, that it does not remove leading or trailing spaces, only multiple occurrences of such.
This is a shorter version, which should only be used if you are only doing this once, as it creates a new instance of the Regex class every time it is called.
temp = new Regex(" {2,}").Replace(temp, " ");
If you are not too acquainted with regular expressions, here's a short explanation:
The {2,} makes the regex search for the character preceding it, and finds substrings between 2 and unlimited times.
The .Replace(temp, " ") replaces all matches in the string temp with a space.
If you want to use this multiple times, here is a better option, as it creates the regex IL at compile time:
Regex singleSpacify = new Regex(" {2,}", RegexOptions.Compiled);
temp = singleSpacify.Replace(temp, " ");
You can simply do this in one line solution!
string s = "welcome to london";
s.Replace(" ", "()").Replace(")(", "").Replace("()", " ");
You can choose other brackets (or even other characters) if you like.
no Regex, no Linq... removes leading and trailing spaces as well as reducing any embedded multiple space segments to one space
string myString = " 0 1 2 3 4 5 ";
myString = string.Join(" ", myString.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries));
result:"0 1 2 3 4 5"
// Mysample string
string str ="hi you are a demo";
//Split the words based on white sapce
var demo= str .Split(' ').Where(s => !string.IsNullOrWhiteSpace(s));
//Join the values back and add a single space in between
str = string.Join(" ", demo);
// output: string str ="hi you are a demo";
Consolodating other answers, per Joel, and hopefully improving slightly as I go:
You can do this with Regex.Replace():
string s = Regex.Replace (
" 1 2 4 5",
#"[ ]{2,}",
" "
);
Or with String.Split():
static class StringExtensions
{
public static string Join(this IList<string> value, string separator)
{
return string.Join(separator, value.ToArray());
}
}
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
I just wrote a new Join that I like, so I thought I'd re-answer, with it:
public static string Join<T>(this IEnumerable<T> source, string separator)
{
return string.Join(separator, source.Select(e => e.ToString()).ToArray());
}
One of the cool things about this is that it work with collections that aren't strings, by calling ToString() on the elements. Usage is still the same:
//...
string s = " 1 2 4 5".Split (
" ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries
).Join (" ");
Many answers are providing the right output but for those looking for the best performances, I did improve Nolanar's answer (which was the best answer for performance) by about 10%.
public static string MergeSpaces(this string str)
{
if (str == null)
{
return null;
}
else
{
StringBuilder stringBuilder = new StringBuilder(str.Length);
int i = 0;
foreach (char c in str)
{
if (c != ' ' || i == 0 || str[i - 1] != ' ')
stringBuilder.Append(c);
i++;
}
return stringBuilder.ToString();
}
}
Use the regex pattern
[ ]+ #only space
var text = Regex.Replace(inputString, #"[ ]+", " ");
I know this is pretty old, but ran across this while trying to accomplish almost the same thing. Found this solution in RegEx Buddy. This pattern will replace all double spaces with single spaces and also trim leading and trailing spaces.
pattern: (?m:^ +| +$|( ){2,})
replacement: $1
Its a little difficult to read since we're dealing with empty space, so here it is again with the "spaces" replaced with a "_".
pattern: (?m:^_+|_+$|(_){2,}) <-- don't use this, just for illustration.
The "(?m:" construct enables the "multi-line" option. I generally like to include whatever options I can within the pattern itself so it is more self contained.
I can remove whitespaces with this
while word.contains(" ") //double space
word = word.Replace(" "," "); //replace double space by single space.
word = word.trim(); //to remove single whitespces from start & end.
Without using regular expressions:
while (myString.IndexOf(" ", StringComparison.CurrentCulture) != -1)
{
myString = myString.Replace(" ", " ");
}
OK to use on short strings, but will perform badly on long strings with lots of spaces.
try this method
private string removeNestedWhitespaces(char[] st)
{
StringBuilder sb = new StringBuilder();
int indx = 0, length = st.Length;
while (indx < length)
{
sb.Append(st[indx]);
indx++;
while (indx < length && st[indx] == ' ')
indx++;
if(sb.Length > 1 && sb[0] != ' ')
sb.Append(' ');
}
return sb.ToString();
}
use it like this:
string test = removeNestedWhitespaces("1 2 3 4 5".toCharArray());
Here is a slight modification on Nolonar original answer.
Checking if the character is not just a space, but any whitespace, use this:
It will replace any multiple whitespace character with a single space.
public static string FilterWhiteSpaces(string input)
{
if (input == null)
return string.Empty;
var stringBuilder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
char c = input[i];
if (i == 0 || !char.IsWhiteSpace(c) || (char.IsWhiteSpace(c) &&
!char.IsWhiteSpace(strValue[i - 1])))
stringBuilder.Append(c);
}
return stringBuilder.ToString();
}
How about going rogue?
public static string MinimizeWhiteSpace(
this string _this)
{
if (_this != null)
{
var returned = new StringBuilder();
var inWhiteSpace = false;
var length = _this.Length;
for (int i = 0; i < length; i++)
{
var character = _this[i];
if (char.IsWhiteSpace(character))
{
if (!inWhiteSpace)
{
inWhiteSpace = true;
returned.Append(' ');
}
}
else
{
inWhiteSpace = false;
returned.Append(character);
}
}
return returned.ToString();
}
else
{
return null;
}
}
Mix of StringBuilder and Enumerable.Aggregate() as extension method for strings:
using System;
using System.Linq;
using System.Text;
public static class StringExtension
{
public static string CondenseSpaces(this string s)
{
return s.Aggregate(new StringBuilder(), (acc, c) =>
{
if (c != ' ' || acc.Length == 0 || acc[acc.Length - 1] != ' ')
acc.Append(c);
return acc;
}).ToString();
}
public static void Main()
{
const string input = " (five leading spaces) (five internal spaces) (five trailing spaces) ";
Console.WriteLine(" Input: \"{0}\"", input);
Console.WriteLine("Output: \"{0}\"", StringExtension.CondenseSpaces(input));
}
}
Executing this program produces the following output:
Input: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Output: " (five leading spaces) (five internal spaces) (five trailing spaces) "
Old skool:
string oldText = " 1 2 3 4 5 ";
string newText = oldText
.Replace(" ", " " + (char)22 )
.Replace( (char)22 + " ", "" )
.Replace( (char)22 + "", "" );
Assert.That( newText, Is.EqualTo( " 1 2 3 4 5 " ) );
You can create a StringsExtensions file with a method like RemoveDoubleSpaces().
StringsExtensions.cs
public static string RemoveDoubleSpaces(this string value)
{
Regex regex = new Regex("[ ]{2,}", RegexOptions.None);
value = regex.Replace(value, " ");
// this removes space at the end of the value (like "demo ")
// and space at the start of the value (like " hi")
value = value.Trim(' ');
return value;
}
And then you can use it like this:
string stringInput =" hi here is a demo ";
string stringCleaned = stringInput.RemoveDoubleSpaces();
I looked over proposed solutions, could not find the one that would handle mix of white space characters acceptable for my case, for example:
Regex.Replace(input, #"\s+", " ") - it will eat your line breaks, if they are mixed with spaces, for example \n \n sequence will be replaced with
Regex.Replace(source, #"(\s)\s+", "$1") - it will depend on whitespace first character, meaning that it again might eat your line breaks
Regex.Replace(source, #"[ ]{2,}", " ") - it won't work correctly when there's mix of whitespace characters - for example "\t \t "
Probably not perfect, but quick solution for me was:
Regex.Replace(input, #"\s+",
(match) => match.Value.IndexOf('\n') > -1 ? "\n" : " ", RegexOptions.Multiline)
Idea is - line break wins over the spaces and tabs.
This won't handle windows line breaks correctly, but it would be easy to adjust to work with that too, don't know regex that well - may be it is possible to fit into single pattern.
The following code remove all the multiple spaces into a single space
public string RemoveMultipleSpacesToSingle(string str)
{
string text = str;
do
{
//text = text.Replace(" ", " ");
text = Regex.Replace(text, #"\s+", " ");
} while (text.Contains(" "));
return text;
}

Categories

Resources