Remove character and space from a string - c#

I am trying to remove all the characters appearing in one string from another. Ideally the resulting string will not contain two spaces next to each other; at the very least, removed characters must not be replaced with spaces (or any other invisible characters).
I came up with the following code, but when I run it some sort of space is left behind in place of each removed character, and where " a " was there are now several consecutive spaces. There is a Remove method as well, but it requires an index and would complicate the solution.
String s1="aeiou";
String s2="This is a test string which could be any text";
Console.WriteLine(s2);
for (int i=0; i<s1.Length; i++)
{
if(s2.Contains(s1[i]))
{
s2= s2.Replace(s1[i],'\0');
}
}
Console.WriteLine(s2);
Output:
Expected Output:
Ths s tst strng whch cld b ny txt
I used '\0' because the string.Replace(char, char) overload accepts only characters, and the overload that accepts string.Empty as the second argument requires the first argument to be a string too (which requires a conversion, shown as "Variant 1" below).
I already looked at these related posts suggested as duplicates (Remove characters from C# string, Remove '\' char from string c#) and did not find any approach that completely satisfies me.
Variant 1 (based on the most-voted answer). This version requires converting each character I want to replace to a string, which I don't like:
String s1="aeiou";
String s2="This is a test string which could be any text";
Console.WriteLine(s2);
foreach(var c in s1)
{
s2 = s2.Replace(c.ToString(), string.Empty);
}
Console.WriteLine(s2);
Variant 2 - String.Join with String.Split (answer). Requires converting my source replace string into array when I'd prefer to avoid that.
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = String.Join("", s2.Split(s1.ToCharArray()));
Variant 3 - Regex.Replace (answer) - this is even more complicated than variant 2 as I need to convert my replace string into proper regular expression, potentially being totally broken for something like "^!" as string to replace (also not needed in this particular case):
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = Regex.Replace(s2, "["+s1+"]", String.Empty);
Console.WriteLine(s2);
Variant 4 - using LINQ and constructing a string from the resulting char array (answer). It requires converting the resulting sequence into an array before constructing the string, which ideally should be avoided:
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = new string(s2.Where(c => !s1.Contains(c)).ToArray());
Console.WriteLine(s2);
Variant 5 - using String.Concat (answer), which so far looks the best, but it uses LINQ (I prefer not to... although maybe there is no good reason to be concerned about using LINQ here):
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = string.Concat(s2.Where(c => !s1.Contains(c)));
Console.WriteLine(s2);
None of the solutions I came up with removes duplicate spaces. All the variants above remove characters just fine, but each has some issue for my case. The ideal answer will not create too many extra strings, uses no LINQ, and needs no extra conversions to arrays.

Assuming you want to exclude chars in a string, and replace multiple white spaces with a single space afterwards, you can use regex easily in 2 steps
string input = "This is a test string which could be any text";
string exclude = "aeiou";
var stripped = Regex.Replace(input, $"[{exclude}]", ""); // exclude chars
var cleaned = Regex.Replace(stripped, "[ ]{2,}", " "); // replace multiple spaces
Console.WriteLine(stripped);
Console.WriteLine(cleaned);
Output
Ths s  tst strng whch cld b ny txt
Ths s tst strng whch cld b ny txt
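Note: if the characters to exclude can include characters that are special in a regex, escape them with Regex.Escape as shown in the following answer: $"[{Regex.Escape(exclude)}]". Be aware that Regex.Escape does not escape ] or -, which are still special inside [...].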
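A minimal sketch of a helper that escapes exactly the characters that can be special inside a [...] class (\, ], [, ^ and -), so an exclude list such as "a-z" is not read as a range. CharClass is a hypothetical name, the sketch is written as a C# 9+ top-level program, and it assumes no RegexOptions that change class syntax:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a [...] class that matches each char of `chars` literally.
static string CharClass(string chars)
{
    // With no chars there is nothing to match; return a pattern that never matches.
    if (chars.Length == 0)
        return "(?!)";

    var sb = new StringBuilder("[");
    foreach (char c in chars)
    {
        // Inside a class these chars can mean escape, end of class,
        // class subtraction, negation or range, so prefix them with a backslash.
        if (c == '\\' || c == ']' || c == '[' || c == '^' || c == '-')
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.Append(']').ToString();
}

Console.WriteLine(Regex.Replace("This is a test", CharClass("aeiou"), "")); // Ths s  tst
Console.WriteLine(Regex.Replace("a-b]c^d\\e", CharClass("-]^\\"), ""));     // abcde
```

The empty-input guard is there because a bare [] would not form a usable class.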

In your situation, use a StringBuilder to build your result from s2:
String s1 = "aeiou";
String s2 = "This is a test string which could be any text";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s2.Length; i++)
{
// Check if current char is not contained in s1,
// then add it to sb
if (!s1.Contains(s2[i]))
{
sb.Append(s2[i]);
}
}
string result = sb.ToString();
Edit:
To collapse repeated spaces into a single space, you can do:
string result = string.Join(" ", sb.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
Output:
Ths s tst strng whch cld b ny txt
Also, here is LINQ solution for that:
var result = string.Concat(s2.Where(c => !s1.Contains(c)));
For this one too, if you want to collapse repeated spaces between words (you could wrap this in an extension method):
var raw = string.Concat(s2.Where(c => !s1.Contains(c)));
var result = string.Join(" ", raw.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
References: Enumerable.Where Method, String.Contains Method, String.Concat Method
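For the requirements in the question (no LINQ, no intermediate arrays, few extra strings), a single pass over a StringBuilder can both drop the excluded characters and skip a space whenever the last kept character is already a space. A minimal sketch as a C# 9+ top-level program; RemoveAndCollapse is a hypothetical name, and it collapses only ' ' (like the [ ]{2,} regex), not tabs or newlines:

```csharp
using System;
using System.Text;

// Removes every char that appears in `remove` and collapses runs of ' ' into one,
// in a single pass with no LINQ and no intermediate arrays.
static string RemoveAndCollapse(string input, string remove)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (remove.IndexOf(c) >= 0)
            continue; // drop excluded characters entirely
        if (c == ' ' && sb.Length > 0 && sb[sb.Length - 1] == ' ')
            continue; // skip a space if the last kept char is already a space
        sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(RemoveAndCollapse("This is a test string which could be any text", "aeiou"));
// Ths s tst strng whch cld b ny txt
```

string.IndexOf(char) is used instead of Contains(char) because it is available on every .NET version; a single leading or trailing space in the input is kept.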

Related

Regex replace special characters defined by client

I need a C# function that replaces all the special characters specified by the client in a string. Example:
string value1 = @"‹¥ó׬¶ÝÆ";
string input1 = @"Thi¥s is\123a strÆing";
string output1 = Regex.Replace(input1, value1, "");
I want a result like this: output1 = "Thi s is\123a str ing"
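Why do you need regex? A LINQ filter is concise and readable:
string result = string.Concat(input1.Where(c => !value1.Contains(c)));
(Enumerable.Except is tempting here, but it is a set operation and would also drop repeated characters such as the second 'i' and 's'.)
If you don't want to remove the characters but replace them with a different string, you can still use a similar (but less efficient) approach: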
string replacement = "[foo]";
var newChars = input1.SelectMany(c => value1.Contains(c) ? replacement : c.ToString());
string result = string.Concat( newChars ); // Thi[foo]s is\123a str[foo]ing
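The same replacement can be done without LINQ by looping over the input and appending either the replacement or the original character to a StringBuilder. A minimal sketch as a C# 9+ top-level program; ReplaceChars is a hypothetical name, and passing " " as the replacement gives the output the question asks for:

```csharp
using System;
using System.Text;

// Replaces each char of `input` that occurs in `chars` with `replacement`.
static string ReplaceChars(string input, string chars, string replacement)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (chars.IndexOf(c) >= 0)
            sb.Append(replacement); // forbidden char: emit the replacement string
        else
            sb.Append(c);           // allowed char: copy it through unchanged
    }
    return sb.ToString();
}

Console.WriteLine(ReplaceChars(@"Thi¥s is\123a strÆing", "‹¥ó׬¶ÝÆ", "[foo]"));
// Thi[foo]s is\123a str[foo]ing
```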
Someone asked for a regex?
string value1 = @"^\-[]‹¥ó׬¶ÝÆ";
string input1 = @"T-^\hi¥s is\123a strÆing";
// Handles ]^-\ by escaping them
string value1b = Regex.Replace(value1, @"([\]\^\-\\])", @"\$1");
// Creates a [...] regex and uses it
string input1b = Regex.Replace(input1, "[" + value1b + "]", " ");
The basic idea is to use a [...] regex, but first you have to escape the characters that have a special meaning inside [...]: ], ^, - and \. Note that you don't need to escape [.
Note that this solution isn't compatible with non-BMP Unicode characters (characters that take up two chars, i.e. a surrogate pair).
A solution that is compatible with them is more complex, but for normal use it shouldn't be a problem.
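For the non-BMP case, a minimal sketch (assuming .NET Core 3.0 or later, where System.Text.Rune and string.EnumerateRunes exist, written as a C# 9+ top-level program) compares whole Unicode scalar values instead of chars. This matters because, for example, U+1F600 and U+1F605 share the same high surrogate, so removing the two chars of one emoji char by char would also break the other. The sketch removes rather than replaces; RemoveRunes is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Removes every Unicode scalar value (Rune) of `remove` from `input`.
// Iterating by Rune keeps surrogate pairs together, so characters outside
// the BMP are compared as whole characters, not as two separate chars.
static string RemoveRunes(string input, string remove)
{
    var banned = new HashSet<Rune>();
    foreach (Rune r in remove.EnumerateRunes())
        banned.Add(r);

    var sb = new StringBuilder(input.Length);
    foreach (Rune r in input.EnumerateRunes())
    {
        if (!banned.Contains(r))
            sb.Append(r.ToString()); // a Rune may be one or two chars
    }
    return sb.ToString();
}

Console.WriteLine(RemoveRunes("a\U0001F600b\U0001F605c", "\U0001F600").Length); // 5
```

EnumerateRunes reports ill-formed surrogates as U+FFFD, so input containing lone surrogates comes back with replacement characters.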

How to remove only certain substrings from a string?

Using C#, I have a string that is a SQL script containing multiple queries. I want to remove sections of the string that are enclosed in single quotes. I can do this using Regex.Replace, in this manner:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, "'[^']*'", string.Empty);
Results in: "Only can we turn him to the of the Force"
What I want to do is remove the substrings between quotes EXCEPT for substrings containing a specific substring. For example, using the string above, I want to remove the quoted substrings except for those that contain "dark," such that the resulting string is:
Results in: "Only can we turn him to the 'dark side' of the Force"
How can this be accomplished using Regex.Replace, or perhaps by some other technique? I'm currently trying a solution that involves using Substring(), IndexOf(), and Contains().
Note: I don't care if the single quotes around "dark side" are removed or not, so the result could also be: "Only can we turn him to the dark side of the Force." I say this because a solution using Split() would remove all the single quotes.
Edit: I don't have a solution yet using Substring(), IndexOf(), etc. By "working on," I mean I'm thinking in my head how this can be done. I have no code, which is why I haven't posted any yet. Thanks.
Edit: vks's solution below works. My first attempt failed because I wasn't escaping the \b in the C# string. Also, it didn't work unless I included the single quotes around the whole string as well.
test = Regex.Replace(test, "'(?![^']*\\bdark\\b)[^']*'", string.Empty);
'(?![^']*\bdark\b)[^']*'
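Try this, replacing each match with an empty string. The negative lookahead (?![^']*\bdark\b) makes the match fail for any quoted section that contains the word dark, so those sections are kept. See the demo: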
https://www.regex101.com/r/rG7gX4/12
While vks's solution works, I'd like to demonstrate a different approach:
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
test = Regex.Replace(test, @"'[^']*'", match => {
if (match.Value.Contains("dark"))
return match.Value;
// You can add more cases here
return string.Empty;
});
Or, if your condition is simple enough:
test = Regex.Replace(test, @"'[^']*'", match => match.Value.Contains("dark")
? match.Value
: string.Empty
);
That is, use a lambda to provide a callback for the replacement. This way, you can run arbitrary logic to replace the string.
Something like this would work. You can add all the strings you want to keep to the excludedString array (Contains on an array needs using System.Linq;).
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
var excludedString = new string[] { "dark side" };
int startIndex = 0;
while ((startIndex = test.IndexOf('\'', startIndex)) >= 0)
{
var endIndex = test.IndexOf('\'', startIndex + 1);
if (endIndex < 0)
break; // unmatched opening quote: nothing more to remove
var subString = test.Substring(startIndex, (endIndex - startIndex) + 1);
if (!excludedString.Contains(subString.Replace("'", "")))
{
test = test.Remove(startIndex, (endIndex - startIndex) + 1);
}
else
{
startIndex = endIndex + 1;
}
}
Another method uses the regex alternation operator |:
@"('[^']*\bdark\b[^']*')|'[^']*'"
Then replace each match with $1:
string str = "Only 'together' can we turn him to the 'dark side' of the Force";
string result = Regex.Replace(str, @"('[^']*\bdark\b[^']*')|'[^']*'", "$1");
Console.WriteLine(result);
Explanation:
(...) is called a capturing group.
'[^']*\bdark\b[^']*' matches every single-quoted string that contains the word dark. [^']* matches any character except ', zero or more times.
('[^']*\bdark\b[^']*'): because this regex is inside a capturing group, all the matched characters are stored in group 1.
| Next comes the regex alternation operator.
'[^']*' This matches all the remaining single-quoted strings (those that don't contain dark). It won't match the quoted strings containing dark, because those were already matched by the alternative before the |.
Finally, replacing every match with the contents of group 1 gives the desired output: a section containing dark is replaced by itself (group 1 holds the whole match), while any other quoted section is replaced by an empty string (group 1 did not take part in the match).
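The same alternation can be wrapped in a helper that takes the keyword to keep as a parameter. This is a sketch as a C# 9+ top-level program: RemoveQuotedExcept is a hypothetical name, the keyword goes through Regex.Escape so characters such as . or + are matched literally, and \b assumes the keyword starts and ends with a word character. Because every quoted section is consumed by one branch or the other, the regex stays aligned on quote pairs even when a kept section is followed by another quoted section (a lookahead-only pattern can pair the kept section's closing quote with the next opening quote):

```csharp
using System;
using System.Text.RegularExpressions;

// Removes every '...' section unless it contains `keyword` as a whole word.
static string RemoveQuotedExcept(string input, string keyword)
{
    // Branch 1 captures sections to keep; branch 2 matches all other quoted sections.
    string pattern = @"('[^']*\b" + Regex.Escape(keyword) + @"\b[^']*')|'[^']*'";
    // $1 is empty when branch 2 matched, so those sections disappear.
    return Regex.Replace(input, pattern, "$1");
}

Console.WriteLine(RemoveQuotedExcept("Only 'together' can we turn him to the 'dark side' of the Force", "dark"));
// Only  can we turn him to the 'dark side' of the Force
```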
Here is an attempt along the lines I think you had in mind (a solution using Split, Contains, ... without regex):
string test = "Only 'together' can we turn him to the 'dark side' of the Force";
string[] separated = test.Split('\'');
string result = "";
for (int i = 0; i < separated.Length; i++)
{
string str = separated[i];
str = str.Trim(); // trim the leading and trailing spaces
if (i % 2 == 0 || str.Contains("dark")) // you can expand your condition
{
result += str+" "; // add space after each added string
}
}
result = result.Trim(); // trim the trailing space again

replacing characters in a single field of a comma-separated list

I have a string in my C# code:
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f", i.e. a comma inside double quotes should be replaced by a space.
I tried using string.split but it is not working for me.
OK, I can't be bothered to think of a regex approach so I am going to offer an old fashioned loop approach which will work:
string DoReplace(string input)
{
bool isInner = false;//flag to detect if we are in the inner string or not
string result = "";//result to return
foreach(char c in input)//loop each character in the input string
{
if(isInner && c == ',')//if we are in an inner string and it is a comma, append space
result += " ";
else//otherwise append the character
result += c;
if(c == '"')//if we have hit an inner quote, toggle the flag
isInner = !isInner;
}
return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
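Building result with += allocates a new string for every character. A sketch of the same loop with a StringBuilder, written as a C# 9+ top-level program (same single-level-quote assumption; the logic is unchanged):

```csharp
using System;
using System.Text;

// Same logic as DoReplace, but builds the result in a StringBuilder
// instead of allocating a new string for every character.
static string DoReplace(string input)
{
    bool isInner = false; // true while we are between a pair of double quotes
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        sb.Append(isInner && c == ',' ? ' ' : c); // commas inside quotes become spaces
        if (c == '"')
            isInner = !isInner; // toggle on every quote
    }
    return sb.ToString();
}

Console.WriteLine(DoReplace("a,b,c,d,\"e,f\",g,h")); // a,b,c,d,"e f",g,h
```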
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern: it matches any "X,X" sequence where X is a single word character (a letter, digit or underscore) and replaces it with the same sequence, with a space between the characters instead of a comma.
You could easily extend this to match more than one character, etc., as needed.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This works like the first one, but each pass replaces one comma inside a quoted section (made of word characters, spaces and commas) with a space, and the loop repeats until none remain.
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("\"", str3); // rejoin with the quotes that Split removed
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for ( int i = 1; i < pieces.Length; i += 2 )
{
pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h

String "de-concatenation"

I have two strings like this
string s = "abcdef";
string t = "def";
I would like to remove t from s. Can I do this like this?
s = s - t?
EDIT
I will have two strings s and t, t will be an ending substring of s. I want to remove t from s.
No, but you can do this:
var newStr = "abcdef".Replace("def", "");
Per your comments, if you want to only remove the trailing pattern you can use a Regex:
var newStr = Regex.Replace("defdefdef", "(def)$", "");
The '$' will anchor to the end of the string, so it will only remove the final 'def'
Turning this into an extension method:
public static String ReplaceEnd(this string input, string subStr, string replace = "")
{
//Per Alexei Levenkov's comments, the string should
// be escaped in order to avoid accidental injection
// of special characters into the Regex pattern
var escaped = Regex.Escape(subStr);
var pattern = String.Format("({0})$", escaped);
return Regex.Replace(input, pattern, replace);
}
Using this method with your code above would become:
string s = "abcdef";
string t = "def";
s = s.ReplaceEnd(t); // Ta Da!
Like this:
if (s.EndsWith(t))
{
s = s.Substring(0, s.LastIndexOf(t));
}
s = s.Substring(0, s.Length - t.Length);
Substring takes two arguments: start and length. You want to take things from the start of abcdef, that's index 0, and you want to take all the characters minus the characters from t, which is the difference of length of the two strings.
This assumes the OP's contract of "t will be an ending substring of s". If in fact this precondition is not guaranteed, it needs if (s.EndsWith(t)) around it.
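If the precondition is not guaranteed, a small helper can combine the check and the Substring call. This sketch (a C# 9+ top-level program) uses StringComparison.Ordinal because the single-argument EndsWith is culture-sensitive and may not agree char for char with the length arithmetic; TrimSuffix is a hypothetical name:

```csharp
using System;

// Removes `suffix` from the end of `s` if present; otherwise returns `s` unchanged.
// Ordinal comparison checks the exact chars, which matches the length arithmetic below.
static string TrimSuffix(string s, string suffix)
{
    return s.EndsWith(suffix, StringComparison.Ordinal)
        ? s.Substring(0, s.Length - suffix.Length)
        : s;
}

Console.WriteLine(TrimSuffix("abcdef", "def"));    // abc
Console.WriteLine(TrimSuffix("defdefdef", "def")); // defdef
Console.WriteLine(TrimSuffix("abcdef", "xyz"));    // abcdef
```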

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and end of the string.
The result should be: "i am a string"
How can I do this in c#?
String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it's highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
Source
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
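Calling Trim() and then Trim(list) can still leave characters behind when the two kinds are interleaved: for "\u200B i am a string \u200B" the first call stops at the zero-width space (which Char.IsWhiteSpace does not treat as white space), and the second call then stops at the ordinary space. A sketch (C# 9+ top-level program) that checks both conditions in a single scan from each end; TrimWhiteSpaceAnd and its extra parameter are hypothetical:

```csharp
using System;

// Trims, from both ends, any char that is white space (char.IsWhiteSpace)
// or that appears in `extra`.
static string TrimWhiteSpaceAnd(string s, string extra)
{
    int start = 0, end = s.Length - 1;
    while (start <= end && (char.IsWhiteSpace(s[start]) || extra.IndexOf(s[start]) >= 0))
        start++;
    while (end >= start && (char.IsWhiteSpace(s[end]) || extra.IndexOf(s[end]) >= 0))
        end--;
    return s.Substring(start, end - start + 1); // empty when everything was trimmed
}

Console.WriteLine(TrimWhiteSpaceAnd("\u200B i am a string \u200B", "\u200B")); // i am a string
```

The extra list can still come from a database or configuration file, as suggested above.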
You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"
I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.
txt = txt.Trim();
Or you can split your string into a string array, splitting by space, and then append every item of the array to an empty string.
This may not be the best or fastest method, but you can try it if the other answers aren't what you want.
Use txt.Trim():
string txt = " i am a string ";
txt = txt.Trim();
Use the Trim method.
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex: \s+ matches any run of white space (spaces, tabs, newlines).
Regex r = new Regex(@"\s+");
// C.
// Replace each run of white space with a single space.
string s3 = r.Replace(s1, " ");
Console.WriteLine(s3);
// D.
// Same for the second string.
string s4 = r.Replace(s2, " ");
Console.WriteLine(s4);
Console.ReadLine();
}
}
OUTPUT:
He saw a cute dog.
There was another sentence.
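A regex-free sketch of the same clean-up, which also trims both ends, so it answers the original question and collapses internal runs in one go: string.Split with a null separator splits on white-space characters, and RemoveEmptyEntries drops the empty pieces between consecutive separators. NormalizeSpaces is a hypothetical name, written as a C# 9+ top-level program:

```csharp
using System;

// Trims both ends and collapses every internal run of white space to a single space.
// The (char[])null cast selects the Split overload whose null separator means "white space".
static string NormalizeSpaces(string s) =>
    string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

Console.WriteLine(NormalizeSpaces(" i am a string "));               // i am a string
Console.WriteLine(NormalizeSpaces("There\n\twas another sentence.")); // There was another sentence.
```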
You can use:
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"
