Remove all instances of a character from string - c#

I want to remove all instances of a character from a string except where that character is followed by any form or whitespace. I haven't written unit tests yet but it seems like the code below achieves what I want (possibly forgetting an edge case or two which is ok for now). It feels pretty clunky though. Can anyone suggest an improvement?
public string Strip(string text, char c)
{
if (!string.IsNullOrEmpty(text))
{
bool characterIsInString = true;
int currentIndex = 0;
while (characterIsInString)
{
currentIndex = text.IndexOf(c, currentIndex + 1);
if (currentIndex != -1)
{
var charAfter = text.Substring(currentIndex + 1, 1);
if (charAfter != " ")
{
text = text.Remove(currentIndex, 1);
}
}
else
{
characterIsInString = false;
}
}
}
return text;
}

You can use Regular Expression( here i have assumed that character is x):
string result = Regex.Replace( input , "x(?=\\S)" , "");
Live Demo. Please check that it uses very few steps finding it.

I suggest, you use the following code:
public string Strip(string text, char c)
{
Regex regex = new Regex(c.ToString() + #"[^\s]");
return regex.Replace(text, "");
}
This will remove the char in text if it's not followed by a White Space.
This is a very simple and fast regex.

if you want replace character
you can replace char with Replace method
Follow code:
Console.ReadLine().Replace("e","h");

Related

How to replace a particular character from string in c#?

I have a string like AX_1234X_12345_X_CXY, I want to remove X from after the first underscore _ i.e. from 1234X to 1234. So final output will be like AX_1234_12345_X_CXY. How to do it?? If I use .Replace("X", "") it will replace all X which I don't want
You can iterate trough the string from the first occurrence of '_' .
you can find the first occurrence of '_' using IndexOf().
when loop will get to 'X' it will not append it to the "fixed string".
private static void Func()
{
string Original = "AX_1234X_12345_X_CXY";
string Fixed = Original.Substring(0, Original.IndexOf("_", 0));
// in case you want to remove all 'X`s' after first occurrence of `'_'`
// just dont use that variable
bool found = false;
for (int i = Original.IndexOf("_", 0); i < Original.Length; i++)
{
if (Original[i].ToString()=="X" && found == false)
{
found = true;
}
else
{
Fixed += Original[i];
}
}
Console.WriteLine(Fixed);
Console.ReadLine();
}
Why not good old IndexOf and Substring?
string s = "AX_1234X_12345_X_CXY";
int pUnder = s.IndexOf('_');
if (pUnder >= 0) { // we have underscope...
int pX = s.IndexOf('X', pUnder + 1); // we should search for X after the underscope
if (pX >= 0) // ...as well as X after the underscope
s = s.Substring(0, pX) + s.Substring(pX + 1);
}
Console.Write(s);
Outcome:
AX_1234_12345_X_CXY
string original = #"AX_1234X_12345_X_CXY";
original = #"AX_1234_12345_X_CXY";
One way is String.Remove, because you can tell exactly where to remove from. If the offending "X" is always in the same place, you can use:
string newString = old.Remove(7,1);
This will remove 1 character starting as position 7 (counting from zero as the beginning of the string).
If not always in the same character position, you might try:
int xPos = old.IndexOf("X");
string newString = old.Remove(xPos,1);
EDIT:
Based on OP comment, the "X" we're targeting occurs just after the first underscore character, so let's index off of the first underscore:
int iPosUnderscore = old.IndexOf("_");
string newString = old.Remove(iPosUnderscore + 1 ,1); // start after the underscore
Try looking at string.IndexOf or string.IndexOfAny
string s = "AX_1234X_12345_X_CXY";
string ns = HappyChap(s);
public string HappyChap(string value)
{
int start = value.IndexOf("X_");
int next = start;
next = value.IndexOf("X_", start + 1);
if (next > 0)
{
value = value.Remove(next, 1);
}
return value;
}
If and only if this is always the format then it should be a simple matter of combining substrings of the original text without including the x in that position. But the op hasn't stated that this is always the case. So if this is always the format and the same character position is always removed then you could simply just
string s = "AX_1234X_12345_X_CXY";
string newstring = s.Substring(0, 7) + s.Substring(8);
OK, based on only the second set of numbers being variable in length, you could then do something like:
int startpos = s.IndexOf('_', 4);
string newstring = s.Substring(0, startpos - 1) + s.Substring(startpos);
with this code, the following tests resulted in:
"AX_1234X_12345_X_CXY" became "AX_1234_12345_X_CXY"
"AX_123X_12345_X_CXY" became "AX_123_12345_X_CXY"
"AX_234X_12345_X_CXY" became "AX_234_12345_X_CXY"
"AX_1X_12345_X_CXY" became "AX_1_12345_X_CXY"
Something like this could work. I'm sure there's a more elegant solution.
string input1 = "AX_1234X_12345_X_CXY";
string pattern1 = "^[A-Z]{1,2}_[0-9]{1,4}(X)";
string newInput = string.Empty;
Match match = Regex.Match(input1, pattern1);
if(match.Success){
newInput = input1.Remove(match.Groups[1].Index, 1);
}
Console.WriteLine(newInput);

How to remove all but the first occurences of a character from a string?

I have a string of text and want to ensure that it contains at most one single occurrence of a specific character (,). Therefore I want to keep the first one, but simply remove all further occurrences of that character.
How could I do this the most elegant way using C#?
This works, but not the most elegant for sure :-)
string a = "12,34,56,789";
int pos = 1 + a.IndexOf(',');
return a.Substring(0, pos) + a.Substring(pos).Replace(",", string.Empty);
You could use a counter variable and a StringBuilder to create the new string efficiently:
var sb = new StringBuilder(text.Length);
int maxCount = 1;
int currentCount = 0;
char specialChar = ',';
foreach(char c in text)
if(c != specialChar || ++currentCount <= maxCount)
sb.Append(c);
text = sb.ToString();
This approach is not the shortest but it's efficient and you can specify the char-count to keep.
Here's a more "elegant" way using LINQ:
int commasFound = 0; int maxCommas = 1;
text = new string(text.Where(c => c != ',' || ++commasFound <= maxCommas).ToArray());
I don't like it because it requires to modify a variable from a query, so it's causing a side-effect.
Regular expressions are elegant, right?
Regex.Replace("Eats, shoots, and leaves.", #"(?<=,.*),", "");
This replaces every comma, as long as there is a comma before it, with nothing.
(Actually, it's probably not elegant - it may only be one line of code, but it may also be O(n^2)...)
If you don't deal with large strings and you reaaaaaaly like Linq oneliners:
public static string KeepFirstOccurence (this string #string, char #char)
{
var index = #string.IndexOf(#char);
return String.Concat(String.Concat(#string.TakeWhile(x => #string.IndexOf(x) < index + 1)), String.Concat(#string.SkipWhile(x=>#string.IndexOf(x) < index)).Replace(#char.ToString(), ""));
}
You could write a function like the following one that would split the string into two sections based on the location of what you were searching (via the String.Split() method) for and it would only remove matches from the second section (using String.Replace()) :
public static string RemoveAllButFirst(string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
You could simply call it by using :
var output = RemoveAllButFirst(input,",");
A prettier approach might actually involve building an extension method that handled this a bit more cleanly :
public static class StringExtensions
{
public static string RemoveAllButFirst(this string s, string stuffToRemove)
{
// Check if the stuff to replace exists and if not, return the
// original string
var locationOfStuff = s.IndexOf(stuffToRemove);
if (locationOfStuff < 0)
{
return s;
}
// Calculate where to pull the first string from and then replace the rest of the string
var splitLocation = locationOfStuff + stuffToRemove.Length;
return s.Substring(0, splitLocation) + (s.Substring(splitLocation)).Replace(stuffToRemove,"");
}
}
which would be called via :
var output = input.RemoveAllButFirst(",");
You can see a working example of it here.
static string KeepFirstOccurance(this string str, char c)
{
int charposition = str.IndexOf(c);
return str.Substring(0, charposition + 1) +
str.Substring(charposition, str.Length - charposition)
.Replace(c, ' ').Trim();
}
Pretty short with Linq; split string into chars, keep distinct set and join back to a string.
text = string.Join("", text.Select(c => c).Distinct());

How to remove certain characters

I'm trying to remove single vowels from a string, but not if a vowel is double same.
For example string
"I am keeping a foobar"
should print out as
"m keepng foobr"
I have tried everything but didn't come up with a solution so far.
Try:
Regex.Replace(input, #"([aeiou])\1", "");
Though for I am keeping a foobar, it will give you m keepng foobr, which is different to your required m keepng foobr, as you're stripped spaces out of your required result, too.
If you want to remove the extraneous spaces, then it's a three step operation: remove vowels; remove proceeding/trailing spaces; remove double spaces.
var raw = Regex.Replace(input, #"([aeiou])\1", "");
var trimmed = raw.Trim();
var final = trimmed.Replace(" ", " ");
You could try this logic:
loop trough string and check two by two characters
if (isBothVowelsAndEqual()) do nothing; else removeFirstChar();
EDIT:
public List<char> vowels = "AEIOUaeiou".ToList();
public bool isBothVowelsAndEqual(char first, char second)
{
return (first == second && vowels.Contains(first));
}
const string s = "I am keeeping a foobar";
string output=String.Empty;
for (int i = 0; i < s.Length-1; i++)
{
if (isBothVowelsAndEqual(s[i], s[i + 1]))
{
output = output + s[i] + s[i+1];
i++;
}
else
{
if (!vowels.Contains(s[i])) {
output += s[i];
}
}
}
Console.WriteLine(output.Trim());

How to remove all characters from a string before a specific character

Suppose I have a string A, for example:
string A = "Hello_World";
I want to remove all characters up to (and including) the _. The exact number of characters before the _ may vary. In the above example, A == "World" after removal.
string A = "Hello_World";
string str = A.Substring(A.IndexOf('_') + 1);
You have already received a perfectly fine answer. If you are willing to go one step further, you could wrap up the a.SubString(a.IndexOf('_') + 1) in a robust and flexible extension method:
public static string TrimStartUpToAndIncluding(this string str, char ch)
{
if (str == null) throw new ArgumentNullException("str");
int pos = str.IndexOf(ch);
if (pos >= 0)
{
return str.Substring(pos + 1);
}
else // the given character does not occur in the string
{
return str; // there is nothing to trim; alternatively, return `string.Empty`
}
}
which you would use like this:
"Hello_World".TrimStartUpToAndIncluding('_') == "World"
string a = "Hello_World";
a = a.Substring(a.IndexOf("_")+1);
try this? or is the A= part in your A=Hello_World included?
var foo = str.Substring(str.IndexOf('_') + 1);
string orgStr = "Hello_World";
string newStr = orgStr.Substring(orgStr.IndexOf('_') + 1);
you can do this by creating a substring.
simple exampe is here:
public static String removeTillWord(String input, String word) {
return input.substring(input.indexOf(word));
}
removeTillWord("I need this words removed taken please", "taken");

Remove additional spacing in string [Fastest Way]

I need to remove all additional spaces in a string.
I use regex for matching strings and matched strings i replace with some others.
For better understanding please see examples below:
3 input strings:
Hello, how are you?
Hello , how are you?
Hello , how are you ?
This are 3 strings that should match by one pattern-regex.
It looks something like this:
Hello\s*,\s+how\s+are\s+you\s*?
It works fine but there is a perfomance problem.
If I have a lot of patterns (~20k) and try to execute each pattern it runs very slow (3-5 minutes).
Maybe there is better way for doing this?
for example use some 3d-party libs?
UPD: Folks, this question is not about how to do this. It's about how to do this with best perfomance. :)
Let me explain more detailed. The main goal is tokenize text. (replace some token with special symbols)
For example I have a token "nice try".
Then I input text "this is nice try".
result: "this is #tokenizedtext#" where #tokenizedtext# some special symbols. It doesen't matter in this case.
Next I have string "Mike said it was a nice try".
result should be "Mike said it was a #tokenizedtext#".
I think the main idea is clear.
So I can have a lot of tokens. When I process it I convert my token from "nice try" to pattern "nice\s+try". and try to replace with this pattern input text.
It works fine. But if in tokens there is more spaces and there is also punctuation then my regexes became bigger and works very slow.
Do you have some suggestions (technical or logic) for solving this problem?
I can suggest a few solutions.
First of all, avoid the static Regex method. Create an instance of it (and store it, don't call the constructor for each replacement!) and, if possible, use RegexOptions.Compiled. It should improve your performance.
Second, you can try to review your pattern. I'll do some profiling, but I'm currently undecisive between:
#"(?<=\s)\s+"
With replacement being an empty string or:
#"\s+"
With a space as a replacement. You can try this code, in the meanwhile:
var s = "Hello , how are you?";
var pattern = #"\s+";
var regex = new Regex(pattern, RegexOptions.Compiled);
var replaced = regex.Replace(s, " ");
EDIT: After having done some measurement, the second pattern seems to be faster. I'm editing my sample to adapt it.
EDIT 2: I've written an unsafe method. It's much faster than the other ones presented here, including the Regex ones, but, as the word itself says, it's unsafe. I don't think that there's any problem with the code I've written but I may be wrong -- So please, check it again and again in case there's a bug in the method.
static unsafe string TrimInternal(string input)
{
var length = input.Length;
var array = stackalloc char[length];
fixed (char* fix = input)
{
var ptr = fix;
var counter = 0;
var lastWasSpace = false;
while (*ptr != '\x0')
{
//Current char is a space?
var isSpace = *ptr == ' ';
//If it's a space but the last one wasn't
//Or if it's not a space
if (isSpace && !lastWasSpace || !isSpace)
//Write into the result array
array[counter++] = *ptr;
//The last character (before the next loop) was a space
lastWasSpace = isSpace;
//Increase the pointer
ptr++;
}
return new string(array, 0, counter);
}
}
Usage (compile with /unsafe):
var s = TrimInternal("Hello , how are you?");
Profiling made in Release build, optimizations on, 1000000 iterations:
My above solution with Regex: 00:00:03.2130121
The unsafe solution: 00:00:00.2063467
This might work for you. It should be pretty fast. Note that it also removes spaces at the end of the string; that might not be what you want...
using System;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello, how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you ?"));
}
public static string RemoveExtraSpaces(string text)
{
var buffer = new char[text.Length];
bool isSpaced = false;
int n = 0;
foreach (char c in text)
{
if (c == ' ')
{
isSpaced = true;
}
else
{
if (isSpaced)
{
if ((c != ',') && (c != '?'))
{
buffer[n++] = ' ';
}
isSpaced = false;
}
buffer[n++] = c;
}
}
return new string(buffer, 0, n);
}
}
}
Something of my own :
find all the position of WhiteSpacechar in string;
private static IEnumerable<int> GetWhiteSpacePos(string input)
{
int iPos = -1;
while ((iPos = input.IndexOf(" ", iPos + 1, StringComparison.Ordinal)) > -1)
{
yield return iPos;
}
}
Remove all whitespace that are in in sequence Returned from GetWhiteSpacePos
string original_string = "Hello , how are you ?";
var poss = GetWhiteSpacePos(original_string).ToList();
int startPos;
int endPos;
StringBuilder builder = new StringBuilder(original_string);
for (int i = poss.Count -1; i > 1; i--)
{
endPos = poss[i];
while ((poss[i] == poss[i - 1] + 1) && i > 1)
{
i--;
}
startPos = poss[i];
if (endPos - startPos > 1)
{
builder.Remove(startPos, endPos - startPos);
}
}
string new_string = builder.ToString();
You are using a very complex regex..simplify the regex and that would definitely increasre the performance
Use \s+ and replace it with a single space
Well, these kind of problems really trouble us. Use this code, and I'm sure you're getting the result for what you've asked. This command removes any extra white space between any string.
cleanString= Regex.Replace(originalString, #"\s", " ");
Hope thar works for you. Thanks.
And since this is a single Instruction. It will utilize less CPU resource and hence less CPU time, which ultimately increases your performance. Therefore A/C to me this method works the best when compared in terms of performance.
if its just a matter of SPACE;
try this
Source : http://www.codeproject.com/Articles/10890/Fastest-C-Case-Insenstive-String-Replace
private static string ReplaceEx(string original,
string pattern, string replacement)
{
int count, position0, position1;
count = position0 = position1 = 0;
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
int inc = (original.Length / pattern.Length) *
(replacement.Length - pattern.Length);
char[] chars = new char[original.Length + Math.Max(0, inc)];
while ((position1 = upperString.IndexOf(upperPattern,
position0)) != -1)
{
for (int i = position0; i < position1; ++i)
chars[count++] = original[i];
for (int i = 0; i < replacement.Length; ++i)
chars[count++] = replacement[i];
position0 = position1 + pattern.Length;
}
if (position0 == 0) return original;
for (int i = position0; i < original.Length; ++i)
chars[count++] = original[i];
return new string(chars, 0, count);
}
Usage:
string original_string = "Hello , how are you ?";
while (original_string.Contains(" "))
{
original_string = ReplaceEx(original_string, " ", " ");
}
Replacing the regex way:
string resultString = null;
try {
resultString = Regex.Replace(subjectString, #"\s+", " ", RegexOption.Compiled);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

Categories

Resources