Fastest way to remove white spaces in string - c#

I'm trying to fetch multiple email addresses seperated by "," within string from database table, but it's also returning me whitespaces, and I want to remove the whitespace quickly.
The following code does remove whitespace, but it also becomes slow whenever I try to fetch large number email addresses in a string like to 30000, and then try to remove whitespace between them. It takes more than four to five minutes to remove those spaces.
Regex Spaces =
new Regex(#"\s+", RegexOptions.Compiled);
txtEmailID.Text = MultipleSpaces.Replace(emailaddress),"");
Could anyone please tell me how can I remove the whitespace within a second even for large number of email address?

I would build a custom extension method using StringBuilder, like:
public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if (!toExclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}
Usage:
var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r' });
or to be even faster:
var str = s.ExceptChars(new HashSet<char>(new[] { ' ', '\t', '\n', '\r' }));
With the hashset version, a string of 11 millions of chars takes less than 700 ms (and I'm in debug mode)
EDIT :
Previous code is generic and allows to exclude any char, but if you want to remove just blanks in the fastest possible way you can use:
public static string ExceptBlanks(this string str)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
switch (c)
{
case '\r':
case '\n':
case '\t':
case ' ':
continue;
default:
sb.Append(c);
break;
}
}
return sb.ToString();
}
EDIT 2 :
as correctly pointed out in the comments, the correct way to remove all the blanks is using char.IsWhiteSpace method :
public static string ExceptBlanks(this string str)
{
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if(!char.IsWhiteSpace(c))
sb.Append(c);
}
return sb.ToString();
}

Given the implementation of string.Replaceis written in C++ and part of the CLR runtime I'm willing to bet
email.Replace(" ","").Replace("\t","").Replace("\n","").Replace("\r","");
will be the fastest implementation. If you need every type of whitespace, you can supply the hex value the of unicode equivalent.

With linq you can do it simply:
emailaddress = new String(emailaddress
.Where(x=>x!=' ' && x!='\r' && x!='\n')
.ToArray());
I didn't compare it with stringbuilder approaches, but is much more faster than string based approaches.
Because it does not create many copy of strings (string is immutable and using it directly causes to dramatically memory and speed problems), so it's not going to use very big memory and not going to slow down the speed (except one extra pass through the string at first).

You should try String.Trim(). It will trim all spaces from start to end of a string
Or you can try this method from linked topic: [link]
public static unsafe string StripTabsAndNewlines(string s)
{
int len = s.Length;
char* newChars = stackalloc char[len];
char* currentChar = newChars;
for (int i = 0; i < len; ++i)
{
char c = s[i];
switch (c)
{
case '\r':
case '\n':
case '\t':
continue;
default:
*currentChar++ = c;
break;
}
}
return new string(newChars, 0, (int)(currentChar - newChars));
}

emailaddress.Replace(" ", string.Empty);

There are many diffrent ways, some faster then others:
public static string StripTabsAndNewlines(this string str) {
//string builder (fast)
StringBuilder sb = new StringBuilder(str.Length);
for (int i = 0; i < str.Length; i++) {
if ( ! Char.IsWhiteSpace(s[i])) {
sb.Append();
}
}
return sb.tostring();
//linq (faster ?)
return new string(str.ToCharArray().Where(c => !Char.IsWhiteSpace(c)).ToArray());
//regex (slow)
return Regex.Replace(str, #"\s+", "")
}

Please use the TrimEnd() method of the String class. You can find a great example here.

You should consider replacing spaces on the record-set within your stored procedure or query using the REPLACE( ) function if possible & even better fix your DB records since a space in an email address is invalid anyways.
As mentioned by others you would need to profile the different approaches. If you are using Regex you should minimally make it a class-level static variable:
public static Regex MultipleSpaces = new Regex(#"\s+", RegexOptions.Compiled);
emailAddress.Where(x=>{ return x != ' ';}).ToString( ) is likely to have function overhead although it could be optimized to inline by Microsoft -- again profiling will give you the answer.
The most efficient method would be to allocate a buffer and copy character by character to a new buffer and skip the spaces as you do that. C# does support pointers so you could use unsafe code, allocate a raw buffer and use pointer arithmetic to copy just like in C and that is as fast as this can possibly be done. The REPLACE( ) in SQL will handle it like that for you.

string str = "Hi!! this is a bunch of text with spaces";
MessageBox.Show(new String(str.Where(c => c != ' ').ToArray()));

I haven't done performance testing on this, but it's simpler than most of the other answers.
var s1 = "\tstring \r with \t\t \nwhitespace\r\n";
var s2 = string.Join("", s1.Split());
The result is
stringwithwhitespace

string input =Yourinputstring;
string[] strings = input.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
foreach (string value in strings)
{
string newv= value.Trim();
if (newv.Length > 0)
newline += value + "\r\n";
}

string s = " Your Text ";
string new = s.Replace(" ", string.empty);
// Output:
// "YourText"

Fastest and general way to do this (line terminators, tabs will be processed as well). Regex powerful facilities don't really needed to solve this problem, but Regex can decrease performance.
new string
(stringToRemoveWhiteSpaces
.Where
(
c => !char.IsWhiteSpace(c)
)
.ToArray<char>()
)

Related

Alternatively upper- and lowercase words in a string

I use Visual Studio 2010 ver.
I have array strings [] = { "eat and go"};
I display it with foreach
I wanna convert strings like this : EAT and GO
Here my code:
Console.Write( myString.First().ToString().ToUpper() + String.Join("",myString].Skip(1)).ToLower()+ "\n");
But the output is : Eat and go . :D lol
Could you help me? I would appreciate it. Thanks
While .ToUpper() will convert a string to its upper case equivalent, calling .First() on a string object actually returns the first element of the string (since it's effectively a char[] under the hood). First() is actually exposed as a LINQ extension method and works on any collection type.
As with many string handling functions, there are a number of ways to handle it, and this is my approach. Obviously you'll need to validate value to ensure it's being given a long enough string.
using System.Text;
public string CapitalizeFirstAndLast(string value)
{
string[] words = value.Split(' '); // break into individual words
StringBuilder result = new StringBuilder();
// Add the first word capitalized
result.Append(words[0].ToUpper());
// Add everything else
for (int i = 1; i < words.Length - 1; i++)
result.Append(words[i]);
// Add the last word capitalized
result.Append(words[words.Length - 1].ToUpper());
return result.ToString();
}
If it's always gonna be a 3 words string, the you can simply do it like this:
string[] mystring = {"eat and go", "fast and slow"};
foreach (var s in mystring)
{
string[] toUpperLower = s.Split(' ');
Console.Write(toUpperLower.First().ToUpper() + " " + toUpperLower[1].ToLower() +" " + toUpperLower.Last().ToUpper());
}
If you want to continuously alternate, you can do the following:
private static string alternateCase( string phrase )
{
String[] words = phrase.split(" ");
StringBuilder builder = new StringBuilder();
//create a flag that keeps track of the case change
book upperToggle = true;
//loops through the words
for(into i = 0; i < words.length; i++)
{
if(upperToggle)
//converts to upper if flag is true
words[i] = words[i].ToUpper();
else
//converts to lower if flag is false
words[i] = words[i].ToLower();
upperToggle = !upperToggle;
//adds the words to the string builder
builder.append(words[i]);
}
//returns the new string
return builder.ToString();
}
Quickie using ScriptCS:
scriptcs (ctrl-c to exit)
> var input = "Eat and go";
> var words = input.Split(' ');
> var result = string.Join(" ", words.Select((s, i) => i % 2 == 0 ? s.ToUpperInvariant() : s.ToLowerInvariant()));
> result
"EAT and GO"

Remove additional spacing in string [Fastest Way]

I need to remove all additional spaces in a string.
I use regex for matching strings and matched strings i replace with some others.
For better understanding please see examples below:
3 input strings:
Hello, how are you?
Hello , how are you?
Hello , how are you ?
This are 3 strings that should match by one pattern-regex.
It looks something like this:
Hello\s*,\s+how\s+are\s+you\s*?
It works fine but there is a perfomance problem.
If I have a lot of patterns (~20k) and try to execute each pattern it runs very slow (3-5 minutes).
Maybe there is better way for doing this?
for example use some 3d-party libs?
UPD: Folks, this question is not about how to do this. It's about how to do this with best perfomance. :)
Let me explain more detailed. The main goal is tokenize text. (replace some token with special symbols)
For example I have a token "nice try".
Then I input text "this is nice try".
result: "this is #tokenizedtext#" where #tokenizedtext# some special symbols. It doesen't matter in this case.
Next I have string "Mike said it was a nice try".
result should be "Mike said it was a #tokenizedtext#".
I think the main idea is clear.
So I can have a lot of tokens. When I process it I convert my token from "nice try" to pattern "nice\s+try". and try to replace with this pattern input text.
It works fine. But if in tokens there is more spaces and there is also punctuation then my regexes became bigger and works very slow.
Do you have some suggestions (technical or logic) for solving this problem?
I can suggest a few solutions.
First of all, avoid the static Regex method. Create an instance of it (and store it, don't call the constructor for each replacement!) and, if possible, use RegexOptions.Compiled. It should improve your performance.
Second, you can try to review your pattern. I'll do some profiling, but I'm currently undecisive between:
#"(?<=\s)\s+"
With replacement being an empty string or:
#"\s+"
With a space as a replacement. You can try this code, in the meanwhile:
var s = "Hello , how are you?";
var pattern = #"\s+";
var regex = new Regex(pattern, RegexOptions.Compiled);
var replaced = regex.Replace(s, " ");
EDIT: After having done some measurement, the second pattern seems to be faster. I'm editing my sample to adapt it.
EDIT 2: I've written an unsafe method. It's much faster than the other ones presented here, including the Regex ones, but, as the word itself says, it's unsafe. I don't think that there's any problem with the code I've written but I may be wrong -- So please, check it again and again in case there's a bug in the method.
static unsafe string TrimInternal(string input)
{
var length = input.Length;
var array = stackalloc char[length];
fixed (char* fix = input)
{
var ptr = fix;
var counter = 0;
var lastWasSpace = false;
while (*ptr != '\x0')
{
//Current char is a space?
var isSpace = *ptr == ' ';
//If it's a space but the last one wasn't
//Or if it's not a space
if (isSpace && !lastWasSpace || !isSpace)
//Write into the result array
array[counter++] = *ptr;
//The last character (before the next loop) was a space
lastWasSpace = isSpace;
//Increase the pointer
ptr++;
}
return new string(array, 0, counter);
}
}
Usage (compile with /unsafe):
var s = TrimInternal("Hello , how are you?");
Profiling made in Release build, optimizations on, 1000000 iterations:
My above solution with Regex: 00:00:03.2130121
The unsafe solution: 00:00:00.2063467
This might work for you. It should be pretty fast. Note that it also removes spaces at the end of the string; that might not be what you want...
using System;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello, how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you ?"));
}
public static string RemoveExtraSpaces(string text)
{
var buffer = new char[text.Length];
bool isSpaced = false;
int n = 0;
foreach (char c in text)
{
if (c == ' ')
{
isSpaced = true;
}
else
{
if (isSpaced)
{
if ((c != ',') && (c != '?'))
{
buffer[n++] = ' ';
}
isSpaced = false;
}
buffer[n++] = c;
}
}
return new string(buffer, 0, n);
}
}
}
Something of my own :
find all the position of WhiteSpacechar in string;
private static IEnumerable<int> GetWhiteSpacePos(string input)
{
int iPos = -1;
while ((iPos = input.IndexOf(" ", iPos + 1, StringComparison.Ordinal)) > -1)
{
yield return iPos;
}
}
Remove all whitespace that are in in sequence Returned from GetWhiteSpacePos
string original_string = "Hello , how are you ?";
var poss = GetWhiteSpacePos(original_string).ToList();
int startPos;
int endPos;
StringBuilder builder = new StringBuilder(original_string);
for (int i = poss.Count -1; i > 1; i--)
{
endPos = poss[i];
while ((poss[i] == poss[i - 1] + 1) && i > 1)
{
i--;
}
startPos = poss[i];
if (endPos - startPos > 1)
{
builder.Remove(startPos, endPos - startPos);
}
}
string new_string = builder.ToString();
You are using a very complex regex..simplify the regex and that would definitely increasre the performance
Use \s+ and replace it with a single space
Well, these kind of problems really trouble us. Use this code, and I'm sure you're getting the result for what you've asked. This command removes any extra white space between any string.
cleanString= Regex.Replace(originalString, #"\s", " ");
Hope thar works for you. Thanks.
And since this is a single Instruction. It will utilize less CPU resource and hence less CPU time, which ultimately increases your performance. Therefore A/C to me this method works the best when compared in terms of performance.
if its just a matter of SPACE;
try this
Source : http://www.codeproject.com/Articles/10890/Fastest-C-Case-Insenstive-String-Replace
private static string ReplaceEx(string original,
string pattern, string replacement)
{
int count, position0, position1;
count = position0 = position1 = 0;
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
int inc = (original.Length / pattern.Length) *
(replacement.Length - pattern.Length);
char[] chars = new char[original.Length + Math.Max(0, inc)];
while ((position1 = upperString.IndexOf(upperPattern,
position0)) != -1)
{
for (int i = position0; i < position1; ++i)
chars[count++] = original[i];
for (int i = 0; i < replacement.Length; ++i)
chars[count++] = replacement[i];
position0 = position1 + pattern.Length;
}
if (position0 == 0) return original;
for (int i = position0; i < original.Length; ++i)
chars[count++] = original[i];
return new string(chars, 0, count);
}
Usage:
string original_string = "Hello , how are you ?";
while (original_string.Contains(" "))
{
original_string = ReplaceEx(original_string, " ", " ");
}
Replacing the regex way:
string resultString = null;
try {
resultString = Regex.Replace(subjectString, #"\s+", " ", RegexOption.Compiled);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

To fetch the unique characters from a string?

I want to extract unique characters from a string. For example:- 'AAABBBBBCCCCFFFFGGGGGDDDDJJJJJJ' will return 'ABCFGDJ'
I have tried below piece of code but now I want to optimize it.
Please suggest if anyone knows.
static string extract(string original)
{
List<char> characters = new List<char>();
string unique = string.Empty;
foreach (char letter in original.ToCharArray())
{
if (!characters.Contains(letter))
{
characters.Add(letter);
}
}
foreach (char letter in characters)
{
unique += letter;
}
return unique;
}
I don't know if this is faster, but surely shorter
string s = "AAABBBBBCCCCFFFFGGGGGDDDDJJJJJJ";
var newstr = String.Join("", s.Distinct());
Another LINQ approach, but not using string.Join:
var result = new string(original.Distinct().ToArray());
I honestly don't know which approach to string creation would be faster. It probably depends on whether string.Join ends up internally converting each element to a string before appending to a StringBuilder, or whether it has custom support for some well-known types to avoid that.
How about
var result = string.Join("", "AAABBBBBCCCCFFFFGGGGGDDDDJJJJJJ".Distinct());
Make sure that you include System.Linq namespace.
Try this
string str = "AAABBBBBCCCCFFFFGGGGGDDDDJJJJJJ";
string answer = new String(str.Distinct().ToArray());
I hope this helps.
if "AAABBBAAA" should return "ABA", then the following does it. Albeit not very fast.
List<char> no_repeats = new List<char>();
no_repeats.Add(s[0]);
for (int i = 1; i < s.Length; i++)
{
if (s[i] != no_repeats.Last()) no_repeats.Add(s[i]);
}
string result = string.Join("", no_repeats);

Using String Split

I have a text
Category2,"Something with ,comma"
when I split this by ',' it should give me two string
Category2
"Something with ,comma"
but in actual it split string from every comma.
how can I achieve my expected result.
Thanx
Just call variable.Split(new char[] { ',' }, 2). Complete documentation in MSDN.
There are a number of things that you could be wanting to do here so I will address a few:
Split on the first comma
String text = text.Split(new char[] { ',' }, 2);
Split on every comma
String text = text.Split(new char[] {','});
Split on a comma not in "
var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
Last one taken from C# Regex Split
Specify the maximum number of strings you want in the array:
string[] parts = text.Split(new char[] { ',' }, 2);
String.Split works at the simplest, fastest level - so it splits the text on all of the delimiters you pass into it, and it has no concept of special rules like double-quotes.
If you need a CSV parser which understands double-quotes, then you can write your own or there are some excellent open source parsers available - e.g. http://www.codeproject.com/KB/database/CsvReader.aspx - this is one I've used in several projects and recommend.
Try this:
public static class StringExtensions
{
public static IEnumerable<string> SplitToSubstrings(this string str)
{
int startIndex = 0;
bool isInQuotes = false;
for (int index = 0; index < str.Length; index++ )
{
if (str[index] == '\"')
isInQuotes = !isInQuotes;
bool isStartOfNewSubstring = (!isInQuotes && str[index] == ',');
if (isStartOfNewSubstring)
{
yield return str.Substring(startIndex, index - startIndex).Trim();
startIndex = index + 1;
}
}
yield return str.Substring(startIndex).Trim();
}
}
Usage is pretty simple:
foreach(var str in text.SplitToSubstrings())
Console.WriteLine(str);

How to insert spaces between the characters of a string

Is there an easy method to insert spaces between the characters of a string? I'm using the below code which takes a string (for example ( UI$.EmployeeHours * UI.DailySalary ) / ( Month ) ) . As this information is getting from an excel sheet, i need to insert [] for each columnname. The issue occurs if user avoids giving spaces after each paranthesis as well as an operator. AnyOne to help?
text = e.Expression.Split(Splitter);
string expressionString = null;
for (int temp = 0; temp < text.Length; temp++)
{
string str = null;
str = text[temp];
if (str.Length != 1 && str != "")
{
expressionString = expressionString + "[" + text[temp].TrimEnd() + "]";
}
else
expressionString = expressionString + str;
}
User might be inputing something like (UI$.SlNo-UI+UI$.Task)-(UI$.Responsible_Person*UI$.StartDate) while my desired output is ( [UI$.SlNo-UI] + [UI$.Task] ) - ([UI$.Responsible_Person] * [UI$.StartDate] )
Here is a short way to insert spaces after every single character in a string (which I know isn't exactly what you were asking for):
var withSpaces = withoutSpaces.Aggregate(string.Empty, (c, i) => c + i + ' ');
This generates a string the same as the first, except with a space after each character (including the last character).
You can do that with regular expressions:
using System.Text.RegularExpressions;
class Program {
static void Main() {
string expression = "(UI$.SlNo-UI+UI$.Task)-(UI$.Responsible_Person*UI$.StartDate) ";
string replaced = Regex.Replace(expression, #"([\w\$\.]+)", " [ $1 ] ");
}
}
If you are not familiar with regular expressions this might look rather cryptic, but they are a powerful tool, and worth learning. In case, you may check how regular expressions work, and use a tool like Expresso to test your regular expressions.
Hope this helps...
Here is an algorithm that does not use regular expressions.
//Applies dobule spacing between characters
public static string DoubleSpace(string s)
{
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
char[] a = s.ToCharArray();
char[] b = new char[ (a.Length * 2) - 1];
int bIndex = 0;
for(int i = 0; i < a.Length; i++)
{
b[bIndex++] = a[i];
//Insert a white space after the char
if(i < (a.Length - 1))
{
b[bIndex++] = ' ';
}
}
return new string(b);
}
Well, you can do this by using Regular expressions, search for specific paterns and add brackets where needed. You could also simply Replace every Paranthesis with the same Paranthesis but with spaces on each end.
I would also advice you to use StringBuilder aswell instead of appending to an existing string (this creates a new string for each manipulation, StringBuilder has a smaller memory footprint when doing this kind of manipulation)

Categories

Resources