Performance issue when removing numbers from large string

I have a function containing the following code:
Text = Text.Where(c => !Char.IsDigit(c)).Aggregate<char, string>(null, (current, c) => current + c);
but it is rather slow. Is there any way I can speed it up?

Try this regex:
Text = Regex.Replace(Text, @"\d+", "");
\d+ is more efficient than just \d because it removes multiple consecutive digits at once.

Yes, you can use Regex.Replace:
Text = Regex.Replace(Text, "\\d", "");
The regular expression matches a single digit. Regex.Replace replaces each occurrence of it in the Text string with an empty string "".

All those concatenations are probably killing you. The easiest/best is probably a regex:
Text = Regex.Replace(Text, "\\d", "");
Or you can try making only one new string instance:
Text = new string(Text.Where(c => !Char.IsDigit(c)).ToArray());
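To illustrate the "only one new string instance" idea without LINQ, here is a minimal sketch that uses a pre-sized StringBuilder. The helper name RemoveDigits is hypothetical and not part of the original answers; the point is only that characters are appended to a single buffer instead of creating a new string per character.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original answers): copies every
// non-digit character into one pre-sized buffer, so the only string
// allocated is the final result.
static string RemoveDigits(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        if (!char.IsDigit(c))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Console.WriteLine(RemoveDigits("abc123def456")); // abcdef
```

Whether this beats the regex in practice depends on the input, so it is worth measuring on real data before choosing.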

Try Regex.Replace. From the documentation:
In a specified input string, replaces strings that match a regular
expression pattern with a specified replacement string.
Text = Regex.Replace(Text, "\\d+", "");

Related

How can I remove all special chars from UTF8 text in c#?

I would like to remove all special characters from my UTF8 text, but I can't find any matching regular expression.
My text looks like this:
ASDÉÁPŐÓÖŰ_->,.!"%=%!HMHF
I would like to remove only these chars: _->,.!"%=%!
I tried this regex:
result = Regex.Replace(text, @"([^a-zA-Z0-9_]|^\s)", "");
But it also removes my UTF-8 characters.
I don't want to remove the accented characters, but I do want to remove all the other symbols.

Regex.Replace(text, @"([^\w]|_)", "")

Do you want only digits and letters?
Then this is a solution:
result = Regex.Replace(text, "[^0-9a-zA-Z]+", "");
You could also specify a range in the ASCII table if you want custom control over what stays in your string:
result = Regex.Replace(text, "[^\x00-\x80]+", "");
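Note that ASCII-only character classes such as [^0-9a-zA-Z] also strip accented letters, which the question wants to keep. One possible alternative, sketched here under that assumption, is to use Unicode categories. The helper name KeepLettersAndDigits is hypothetical; \p{L}, \p{M} and \p{Nd} are the standard .NET regex categories for letters, combining marks, and decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every character that is not a Unicode
// letter (\p{L}), a combining mark (\p{M}), or a decimal digit (\p{Nd}).
static string KeepLettersAndDigits(string text)
{
    return Regex.Replace(text, @"[^\p{L}\p{M}\p{Nd}]+", "");
}

// The sample text from the question; the symbols _->,.!"%=%! are dropped
// and the accented letters stay.
Console.WriteLine(KeepLettersAndDigits("ASDÉÁPŐÓÖŰ_->,.!\"%=%!HMHF"));
```

Including \p{M} keeps accents that are stored as separate combining characters; drop it if only precomposed letters are expected.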

Regex to match only numbers, no apostrophes

I want to match only numbers in the following string
String : "40’000"
Match : "40000"
Basically, I am trying to ignore the apostrophe.
I am using C#, in case it matters.
I can't use any other C# methods; I need to use only Regex.

Replace like this; it removes every character except digits:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d]", "");

Since you said you just want to pick up the numbers only, how about doing it without regex?
var s = "40’000";
var result = new string(s.Where(char.IsDigit).ToArray());
Console.WriteLine(result); // 40000

I suggest using a regex to find the non-digit characters, rather than the digits, and replacing them with an empty string.
A simple (?=\S)\D should be enough; the (?=\S) lookahead skips whitespace, such as the whitespace at the end of the number.

Replace like this; it removes every character except digits and dots:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d.]", "");

Don't complicate your life, use Regex.Replace
string s = "40'000";
string replaced = Regex.Replace(s, @"\D", "");

RegEx.Replace to Replace Whole Words and Skip when Part of the Word

I am using regex to replace certain keywords from a string (or Stringbuilder) with the ones that I choose. However, I fail to build a valid regex pattern to replace only whole words.
For example, if I have InputString = "fox foxy" and want to replace "fox" with "dog", the output is "dog dogy".
What is the valid RegEx pattern to take only "fox" and leave "foxy"?
public string Replace(string KeywordToReplace, string Replacement)
{
    this.Replacement = Replacement;
    this.KeywordToReplace = KeywordToReplace;
    Regex RegExHelper = new Regex(KeywordToReplace, RegexOptions.IgnoreCase);
    string Output = RegExHelper.Replace(InputString, Replacement);
    return Output;
}
Thanks!

Regexes support a special escape sequence that represents a word boundary. Word characters are everything matched by \w, roughly [a-zA-Z0-9_] (in .NET, Unicode letters and digits as well). So a word boundary is the position between a character that belongs to this group and one that doesn't. The escape sequence is \b:
\bfox\b

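Do not forget to put the '@' symbol before your "\bword\b" string literal.
For example:
address = Regex.Replace(address, @"\bNE\b", "Northeast");
The @ symbol makes the string a verbatim literal, so the backslash (\) is not treated as a C# escape character. Without it, "\b" would be a backspace character.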

You need to use a word boundary:
KeywordToReplace = @"\byourWord\b";
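Putting the word-boundary advice together, a minimal sketch could look like the following. The helper name ReplaceWholeWord and its parameters are hypothetical; Regex.Escape and the MatchEvaluator overload of Regex.Replace are standard .NET APIs. It assumes the keyword starts and ends with a word character, since \b only matches next to one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces a keyword only where it stands as a whole word.
static string ReplaceWholeWord(string input, string keyword, string replacement)
{
    // Regex.Escape keeps characters such as '.' or '+' in the keyword literal.
    string pattern = @"\b" + Regex.Escape(keyword) + @"\b";

    // The lambda inserts the replacement verbatim, so a '$' in it is not
    // interpreted as a substitution token.
    return Regex.Replace(input, pattern, m => replacement, RegexOptions.IgnoreCase);
}

Console.WriteLine(ReplaceWholeWord("fox foxy", "fox", "dog")); // dog foxy
```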

Regular Expression to get all characters before "-"

How can I get the string before the character "-" using regular expressions?
For example, I have "text-1" and I want to return "text".

I see several ways to achieve this.
string text = "Foobar-test";
1. Regex: match everything up to the first "-"
Match result = Regex.Match(text, @"^.*?(?=-)");
^ matches from the start of the string
.*? matches any character (.), zero or more times (*), but as few as possible (?)
(?=-) until the next character is a "-" (this is a positive lookahead)
2. Regex: match anything that is not a "-" from the start of the string
Match result2 = Regex.Match(text, @"^[^-]*");
[^-]* matches any character that is not a "-", zero or more times
2.1 Regex: match anything that is not a "-" from the start of the string up to a "-"
Match result21 = Regex.Match(text, @"^([^-]*)-");
This will only match if there is a dash in the string; the result is then found in capture group 1.
3. Split on "-"
string[] result3 = text.Split('-');
The result is an array; the part before the first "-" is the first item in the array.
4. Substring up to the first "-"
string result4 = text.Substring(0, text.IndexOf("-"));
This gets the substring of text from the start up to the first occurrence of "-" (text.IndexOf("-")).
You can then print all the results (all the same) with this:
Console.WriteLine(result);
Console.WriteLine(result2);
Console.WriteLine(result21.Groups[1]);
Console.WriteLine(result3[0]);
Console.WriteLine(result4);
I would prefer the first method.
You also need to think about the behavior when there is no dash in the string. The fourth method will throw an exception in that case, because text.IndexOf("-") returns -1. Methods 1 and 2.1 will return nothing, and methods 2 and 3 will return the complete string.
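To make the no-dash case explicit for the Substring approach, here is a small sketch that checks the index first. The helper name BeforeFirstDash is hypothetical, and returning the whole string when there is no dash is an assumption; returning an empty string or null may suit other callers better.

```csharp
using System;

// Hypothetical helper: returns the part of the text before the first '-'.
// Assumption: when there is no dash, the whole string is returned.
static string BeforeFirstDash(string text)
{
    int index = text.IndexOf('-');
    return index < 0 ? text : text.Substring(0, index);
}

Console.WriteLine(BeforeFirstDash("text-1"));  // text
Console.WriteLine(BeforeFirstDash("no dash")); // no dash
```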

Here is my suggestion - it's quite simple as that:
[^-]*

This is something like the regular expression you need:
([^-]*)-
Quick tests in JavaScript:
/([^-]*)-/.exec('text-1')[1] // 'text'
/([^-]*)-/.exec('foo-bar-1')[1] // 'foo'
/([^-]*)-/.exec('-1')[1] // ''
/([^-]*)-/.exec('quux')[1] // explodes

I don't think you need regex to achieve this. I would look at the Substring method along with the IndexOf method. If you need more help, add a comment showing what you have attempted and I will offer more help.

You could just use another non-regex based method. Someone gave the suggestion of using Substring, but you could also use Split:
string testString = "my-string";
string[] splitString = testString.Split('-');
string resultingString = splitString[0]; // "my"
See http://msdn.microsoft.com/en-US/library/ms228388%28v=VS.80%29.aspx for another good example.

If you want to use RegEx in .NET:
Regex rx = new Regex(@"^([\w]+)(\-)*");
var match = rx.Match("thisis-thefirst");
var text = match.Groups[1].Value;
Assert.AreEqual("thisis", text);

Find all word and space characters up to and including a -
^[\w ]+-

Remove alphabets from a string

I want to remove the letters from a string. What is the best way to do it? To be more precise, I have the MAC address of a system, and I want to extract only the digits from it. I found a related article on Stack Overflow.
I want to know, if using the regex is the best way or there are other ways to do it (maybe using LINQ).

To get the digits, you can use this regex:
var digits = Regex.Replace(text, @"\D", "");
\D matches anything that is not a digit, so removing those will give you the remaining digits.

The LINQ approach would be as follows:
string input = "12-34-56-78-9A-BC";
string result = new String(input.Where(Char.IsDigit).ToArray());
Non-LINQ / 2.0 approach:
string result = new String(Array.FindAll(input.ToCharArray(),
    delegate(char c) { return Char.IsDigit(c); }));

This will replace anything that's not a number and leave you with just numbers:
string text = "abc123abc:13sdf2";
string numbers = Regex.Replace(text, @"[^\d]+", "");
Console.WriteLine(numbers);
