Replace non-numeric with empty string - c#

Quick add on requirement in our project. A field in our DB to hold a phone number is set to only allow 10 characters. So, if I get passed "(913)-444-5555" or anything else, is there a quick way to run a string through some kind of special replace function that I can pass it a set of characters to allow?
Regex?

Definitely regex:
string CleanPhone(string phone)
{
Regex digitsOnly = new Regex(#"[^\d]");
return digitsOnly.Replace(phone, "");
}
or within a class to avoid re-creating the regex all the time:
private static Regex digitsOnly = new Regex(#"[^\d]");
public static string CleanPhone(string phone)
{
return digitsOnly.Replace(phone, "");
}
Depending on your real-world inputs, you may want some additional logic there to do things like strip out leading 1's (for long distance) or anything trailing an x or X (for extensions).

You can do it easily with regex:
string subject = "(913)-444-5555";
string result = Regex.Replace(subject, "[^0-9]", ""); // result = "9134445555"

You don't need to use Regex.
phone = new String(phone.Where(c => char.IsDigit(c)).ToArray())

Here's the extension method way of doing it.
public static class Extensions
{
public static string ToDigitsOnly(this string input)
{
Regex digitsOnly = new Regex(#"[^\d]");
return digitsOnly.Replace(input, "");
}
}

Using the Regex methods in .NET you should be able to match any non-numeric digit using \D, like so:
phoneNumber = Regex.Replace(phoneNumber, "\\D", String.Empty);

How about an extension method that doesn't use regex.
If you do stick to one of the Regex options at least use RegexOptions.Compiled in the static variable.
public static string ToDigitsOnly(this string input)
{
return new String(input.Where(char.IsDigit).ToArray());
}
This builds on Usman Zafar's answer converted to a method group.

for the best performance and lower memory consumption , try this:
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;
public class Program
{
private static Regex digitsOnly = new Regex(#"[^\d]");
public static void Main()
{
Console.WriteLine("Init...");
string phone = "001-12-34-56-78-90";
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
DigitsOnly(phone);
}
sw.Stop();
Console.WriteLine("Time: " + sw.ElapsedMilliseconds);
var sw2 = new Stopwatch();
sw2.Start();
for (int i = 0; i < 1000000; i++)
{
DigitsOnlyRegex(phone);
}
sw2.Stop();
Console.WriteLine("Time: " + sw2.ElapsedMilliseconds);
Console.ReadLine();
}
public static string DigitsOnly(string phone, string replace = null)
{
if (replace == null) replace = "";
if (phone == null) return null;
var result = new StringBuilder(phone.Length);
foreach (char c in phone)
if (c >= '0' && c <= '9')
result.Append(c);
else
{
result.Append(replace);
}
return result.ToString();
}
public static string DigitsOnlyRegex(string phone)
{
return digitsOnly.Replace(phone, "");
}
}
The result in my computer is:
Init...
Time: 307
Time: 2178

I'm sure there's a more efficient way to do it, but I would probably do this:
string getTenDigitNumber(string input)
{
StringBuilder sb = new StringBuilder();
for(int i - 0; i < input.Length; i++)
{
int junk;
if(int.TryParse(input[i], ref junk))
sb.Append(input[i]);
}
return sb.ToString();
}

try this
public static string cleanPhone(string inVal)
{
char[] newPhon = new char[inVal.Length];
int i = 0;
foreach (char c in inVal)
if (c.CompareTo('0') > 0 && c.CompareTo('9') < 0)
newPhon[i++] = c;
return newPhon.ToString();
}

Related

Method that takes a message and index, creates a substring using the index

Problem: I want to write a method that takes a message/index pair like this:
("Hello, I am *Name1, how are you doing *Name2?", 2)
The index refers to the asterisk delimited name in the message. So if the index is 1, it should refer to *Name1, if it's 2 it should refer to *Name2.
The method should return just the name with the asterisk (*Name2).
I have attempted to play around with substrings, taking the first delimited * and ending when we reach a character that isn't a letter, number, underscore or hyphen, but the logic just isn't setting in.
I know this is similar to a few problems on SO but I can't find anything this specific. Any help is appreciated.
This is what's left of my very vague attempt so far. Based on this thread:
public string GetIndexedNames(string message, int index)
{
int strStart = message.IndexOf("#") + "#".Length;
int strEnd = message.LastIndexOf(" ");
String result = message.Substring(strStart, strEnd - strStart);
}
If you want to do it the old school way, then something like:
public static void Main(string[] args)
{
string message = "Hello, I am *Name1, how are you doing *Name2?";
string name1 = GetIndexedNames(message, "*", 1);
string name2 = GetIndexedNames(message, "*", 2);
Console.WriteLine(message);
Console.WriteLine(name1);
Console.WriteLine(name2);
Console.ReadLine();
}
public static string GetIndexedNames(string message, string singleCharDelimiter, int index)
{
string valid = "abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";
string[] parts = message.Split(singleCharDelimiter.ToArray());
if (parts.Length >= index)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < parts[index].Length; i++)
{
string character = parts[index].Substring(i, 1);
if (valid.Contains(character))
{
sb.Append(character);
}
else
{
return sb.ToString();
}
}
return sb.ToString();
}
return "";
}
You can try using regular expressions to match the names. Assuming that name is a sequence of word characters (letters or digits):
using System.Linq;
using System.Text.RegularExpressions;
...
// Either name with asterisk *Name or null
// index is 1-based
private static ObtainName(string source, int index) => Regex
.Matches(source, #"\*\w+")
.Cast<Match>()
.Select(match => match.Value)
.Distinct() // in case the same name repeats several times
.ElementAtOrDefault(index - 1);
Demo:
string name = ObtainName(
"Hello, I am *Name1, how are you doing *Name2?", 2);
Console.Write(name);
Outcome:
*Name2
Perhaps not the most elegant solution, but if you want to use IndexOf, use a loop:
public static string GetIndexedNames(string message, int index, char marker='*')
{
int lastFound = 0;
for (int i = 0; i < index; i++) {
lastFound = message.IndexOf(marker, lastFound+1);
if (lastFound == -1) return null;
}
var space = message.IndexOf(' ', lastFound);
return space == -1 ? message.Substring(lastFound) : message.Substring(lastFound, space - lastFound);
}

How can i remove none alphabet chars from a string[]? [duplicate]

This question already has answers here:
How do I remove all non alphanumeric characters from a string except dash?
(13 answers)
Closed 9 years ago.
This is the code:
StringBuilder sb = new StringBuilder();
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
var words = Regex.Split(textBox1.Text, #"(?=(?<=[^\s])\s+\w)");
for (int i = 0; i < words.Length; i++)
{
words[i] = rgx.Replace(words[i], "");
}
When im doing the Regex.Split() the words contain also strings with chars inside for exmaple:
Daniel>
or
Hello:
or
\r\nNew
or
hello---------------------------
And i need to get only the words without all the signs
So i tried to use this loop but i end that in words there are many places with ""
And some places with only ------------------------
And i cant use this as strings later in my code.
You don't need a regex to clear non-letters. This will remove all non-unicode letters.
public string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach(char c in input)
{
if(Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
Alternatively, if you only want to allow Latin letters, you can use this
public string RemoveNonLatinLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach(char c in input)
{
if(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
sb.Append(c);
}
return sb.ToString();
}
Benchmark vs Regex
public static string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
static readonly Regex nonUnicodeRx = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters2(string input)
{
return nonUnicodeRx.Replace(input, "");
}
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
//generate guids as input
for (int j = 0; j < 1000; j++)
{
sb.Append(Guid.NewGuid().ToString());
}
string input = sb.ToString();
sw.Start();
for (int i = 0; i < 1000; i++)
{
RemoveNonUnicodeLetters(input);
}
sw.Stop();
Console.WriteLine("SM: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 0; i < 1000; i++)
{
RemoveNonUnicodeLetters2(input);
}
sw.Stop();
Console.WriteLine("RX: " + sw.ElapsedMilliseconds);
}
Output (SM = String Manipulation, RX = Regex)
SM: 581
RX: 9882
SM: 545
RX: 9557
SM: 664
RX: 10196
keyboardP’s solution is decent – do consider it. But as I’ve argued in the comments, regular expressions are actually the correct tool for the job, you’re just making it unnecessarily complicated. The actual solution is a one-liner:
var result = Regex.Replace(input, "\\P{L}", "");
\P{…} specifies a Unicode character class we do not want to match (the opposite of \p{…}). L is the Unicode character class for letters.
Of course it makes sense to encapsulate this into a method, as keyboardP did. To avoid recompiling the regular expression over again, you should also consider pulling the regex creation out of the actual code (although this probably won’t give a big impact on performance):
static readonly Regex nonUnicodeRx = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters(string input) {
return nonUnicodeRx.Replace(input, "");
}
To help Konrad and keyboardP resolve their differences, I ran a benchmark test, using their code. It turns out that keyboardP's code is 10x faster than Konrad's code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input = "asdf234!##*advfk234098awfdasdfq9823fna943";
DateTime start = DateTime.Now;
for (int i = 0; i < 100000; i++)
{
RemoveNonUnicodeLetters(input);
}
Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
start = DateTime.Now;
for (int i = 0; i < 100000; i++)
{
RemoveNonUnicodeLetters2(input);
}
Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
}
public static string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
public static string RemoveNonUnicodeLetters2(string input)
{
var result = Regex.Replace(input, "\\P{L}", "");
return result;
}
}
}
I got
0.12
1.2
as output
UPDATE:
To see if it is the Regex compilation that is slowing down the Regex method, I put the regex in a static variable that is only constructed once.
static Regex rex = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters2(string input)
{
var result = rex.Replace(input,m => "");
return result;
}
But this had no effect on the runtime.

C# StringBuilder: Check if it ends with a new line

I have a StringBuilder that accumulates code. In some cases, it has 2 empty lines between code blocks, and I'd like to make that 1 empty line.
How can I check if the current code already has an empty line at the end? (I prefer not to use its ToString() method because of performance issues.)
You can access any character of your StringBuilder with its index, like you would with a String.
var sb = new StringBuilder();
sb.Append("Hello world!\n");
Console.WriteLine(sb[sb.Length - 1] == '\n'); // True
You can normalize the newlines, using a regex:
var test = #"hello
moop
hello";
var regex = new Regex(#"(?:\r\n|[\r\n])+");
var newLinesNormalized = regex.Replace(test, Environment.NewLine);
output:
hello
moop
hello
Single line check. Uses a string type, not StringBuilder, but you should get the basic idea.
if (theString.Substring(theString.Length - Environment.NewLine.Length, Environment.NewLine.Length).Contains(Environment.NewLine))
{
//theString does end with a NewLine
}
else
{
//theString does NOT end with a NewLine
}
Here is the complete example.
class string_builder
{
string previousvalue = null;
StringBuilder sB;
public string_builder()
{
sB = new StringBuilder();
}
public void AppendToStringBuilder(string new_val)
{
if (previousvalue.EndsWith("\n") && !String.IsNullOrEmpty(previousvalue) )
{
sB.Append(new_val);
}
else
{
sB.AppendLine(new_val);
}
previousvalue = new_val;
}
}
class Program
{
public static void Main(string[] args)
{
string_builder sb = new string_builder();
sb.AppendToStringBuilder("this is line1\n");
sb.AppendToStringBuilder("this is line2");
sb.AppendToStringBuilder("\nthis is line3\n");
}
}
I've got 'funny' answer:
var sb = new StringBuilder();
sb.AppendLine("test");
sb.AppendLine("test2");
Console.WriteLine(sb.ToString().TrimEnd('\n').Length != sb.ToString().Length); //true
Since I don't care about 2 empty lines in the middle of the code, the simplest way is to use
myCode.Replace(string.Format("{0}{0}", Environment.NewLine),Environment.NewLine);
This option doesn't require any changes to classes that use the code accumulator.
In-case anyone ends up here like I did here is a general method to check the end of a StringBuilder for an arbitrary string with having to use ToString on it.
public static bool EndsWith(this StringBuilder haystack, string needle)
{
var needleLength = needle.Length - 1;
var haystackLength = haystack.Length - 1;
if (haystackLength < needleLength)
{
return false;
}
for (int i = 0; i < needleLength; i++)
{
if (haystack[haystackLength - i] != needle[needleLength - i])
{
return false;
}
}
return true;
}

How to insert/remove hyphen to/from a plain string in c#?

I have a string like this;
string text = "6A7FEBFCCC51268FBFF";
And I have one method for which I want to insert the logic for appending the hyphen after 4 characters to 'text' variable. So, the output should be like this;
6A7F-EBFC-CC51-268F-BFF
Appending hyphen to above 'text' variable logic should be inside this method;
public void GetResultsWithHyphen
{
// append hyphen after 4 characters logic goes here
}
And I want also remove the hyphen from a given string such as 6A7F-EBFC-CC51-268F-BFF. So, removing hyphen from a string logic should be inside this method;
public void GetResultsWithOutHyphen
{
// Removing hyphen after 4 characters logic goes here
}
How can I do this in C# (for desktop app)?
What is the best way to do this?
Appreciate everyone's answer in advance.
GetResultsWithOutHyphen is easy (and should return a string instead of void
public string GetResultsWithOutHyphen(string input)
{
// Removing hyphen after 4 characters logic goes here
return input.Replace("-", "");
}
for GetResultsWithHyphen, there may be slicker ways to do it, but here's one way:
public string GetResultsWithHyphen(string input)
{
// append hyphen after 4 characters logic goes here
string output = "";
int start = 0;
while (start < input.Length)
{
output += input.Substring(start, Math.Min(4,input.Length - start)) + "-";
start += 4;
}
// remove the trailing dash
return output.Trim('-');
}
Use regex:
public String GetResultsWithHyphen(String inputString)
{
return Regex.Replace(inputString, #"(\w{4})(\w{4})(\w{4})(\w{4})(\w{3})",
#"$1-$2-$3-$4-$5");
}
and for removal:
public String GetResultsWithOutHyphen(String inputString)
{
return inputString.Replace("-", "");
}
Here's the shortest regex I could come up with. It will work on strings of any length. Note that the \B token will prevent it from matching at the end of a string, so you don't have to trim off an extra hyphen as with some answers above.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string text = "6A7FEBFCCC51268FBFF";
for (int i = 0; i <= text.Length;i++ )
Console.WriteLine(hyphenate(text.Substring(0, i)));
}
static string hyphenate(string s)
{
var re = new Regex(#"(\w{4}\B)");
return re.Replace (s, "$1-");
}
static string dehyphenate (string s)
{
return s.Replace("-", "");
}
}
}
var hyphenText = new string(
text
.SelectMany((i, ch) => i%4 == 3 && i != text.Length-1 ? new[]{ch, '-'} : new[]{ch})
.ToArray()
)
something along the lines of:
public string GetResultsWithHyphen(string inText)
{
var counter = 0;
var outString = string.Empty;
while (counter < inText.Length)
{
if (counter % 4 == 0)
outString = string.Format("{0}-{1}", outString, inText.Substring(counter, 1));
else
outString += inText.Substring(counter, 1);
counter++;
}
return outString;
}
This is rough code and may not be perfectly, syntactically correct
public static string GetResultsWithHyphen(string str) {
return Regex.Replace(str, "(.{4})", "$1-");
//if you don't want trailing -
//return Regex.Replace(str, "(.{4})(?!$)", "$1-");
}
public static string GetResultsWithOutHyphen(string str) {
//if you just want to remove the hyphens:
//return input.Replace("-", "");
//if you REALLY want to remove hyphens only if they occur after 4 places:
return Regex.Replace(str, "(.{4})-", "$1");
}
For removing:
String textHyphenRemoved=text.Replace('-',''); should remove all of the hyphens
for adding
StringBuilder strBuilder = new StringBuilder();
int startPos = 0;
for (int i = 0; i < text.Length / 4; i++)
{
startPos = i * 4;
strBuilder.Append(text.Substring(startPos,4));
//if it isn't the end of the string add a hyphen
if(text.Length-startPos!=4)
strBuilder.Append("-");
}
//add what is left
strBuilder.Append(text.Substring(startPos, 4));
string textWithHyphens = strBuilder.ToString();
Do note that my adding code is untested.
GetResultsWithOutHyphen method
public string GetResultsWithOutHyphen(string input)
{
return input.Replace("-", "");
}
GetResultsWithOutHyphen method
You could pass a variable instead of four for flexibility.
public string GetResultsWithHyphen(string input)
{
string output = "";
int start = 0;
while (start < input.Length)
{
char bla = input[start];
output += bla;
start += 1;
if (start % 4 == 0)
{
output += "-";
}
}
return output;
}
This worked for me when I had a value for a social security number (123456789) and needed it to display as (123-45-6789) in a listbox.
ListBox1.Items.Add("SS Number : " & vbTab & Format(SSNArray(i), "###-##-####"))
In this case I had an array of Social Security Numbers. This line of code alters the formatting to put a hyphen in.
Callee
public static void Main()
{
var text = new Text("THISisJUSTanEXAMPLEtext");
var convertText = text.Convert();
Console.WriteLine(convertText);
}
Caller
public class Text
{
private string _text;
private int _jumpNo = 4;
public Text(string text)
{
_text = text;
}
public Text(string text, int jumpNo)
{
_text = text;
_jumpNo = jumpNo < 1 ? _jumpNo : jumpNo;
}
public string Convert()
{
if (string.IsNullOrEmpty(_text))
{
return string.Empty;
}
if (_text.Length < _jumpNo)
{
return _text;
}
var convertText = _text.Substring(0, _jumpNo);
int start = _jumpNo;
while (start < _text.Length)
{
convertText += "-" + _text.Substring(start, Math.Min(_jumpNo, _text.Length - start));
start += _jumpNo;
}
return convertText;
}
}

Regular Expression - Is this possible?

Rather than describing what I want (it's difficult to explain), Let me provide an example of what I need to accomplish in C# using a regular expression:
"HelloWorld" should be transformed to "Hello World"
"HelloWORld" should be transformed to "Hello WO Rld" //Two consecutive letters in capital should be treatead as one word
"helloworld" should be transformed to "helloworld"
EDIT:
"HellOWORLd" should be transformed to "Hell OW OR Ld"
Every 2-consecutive capital letters should be considered one word.
Is this possible?
This is fully working C# code, not just the regex:
Console.WriteLine(
Regex.Replace(
"HelloWORld",
"(?<!^)(?<wordstart>[A-Z]{1,2})",
" ${wordstart}", RegexOptions.Compiled));
And it prints:
Hello WO Rld
Update
To make this more UNICODE/international aware, consider replacing [A-Z] by \p{Lt} (meaning a UNICODE code point that represents a Letter in uppercase). The result for the current input would the same. So here is a slightly more compelling example:
Console.WriteLine(Regex.Replace(
#"ÉclaireürfØÑJßå",
#"(?<!^)(?<wordstart>\p{Lu}{1,2})",
#" ${wordstart}",
RegexOptions.Compiled));
The regular expression engine is not a transformative thing by nature, but rather a pattern matching (and replacing) engine. People often mistake the replace part of Regex, thinking that it can do more than it's designed to.
Back to your question, though... Regex cannot do what you want, instead, you should write your own parser to do this. With C#, if you're familiar with the language, this task is somewhat trivial.
It's a case of "You're using the wrong tool for the job".
Here are regular expressions that detect what you are looking for:
([A-Z]\w*?)[A-Z]
this matches any uppercase letter from A to Z once followed by aphanumerics up to the next uppercase.
([A-Z]{2}\w*?)[A-Z]
this matches any uppercase letter from A to Z exactly 2 times.
Regex is a matching engine, you can parse the input string and use regex.isMatch to find candidate matches to then insert spaces into the output string
string f(string input)
{
//'lowerUPPER' -> 'lower UPPER'
var x = Regex.Replace(input, "([a-z])([A-Z])","$1 $2");
//'UPPER' -> 'UP PE R'
return Regex.Replace(x, "([A-Z]{2})","$1 ");
}
class Program
{
static void Main(string[] args)
{
Print(Parse("HelloWorld"));
Print(Parse("HelloWORld"));
Print(Parse("helloworld"));
Print(Parse("HellOWORLd"));
Console.ReadLine();
}
static void Print(IEnumerable<string> input)
{
foreach (var s in input)
{
Console.Write(s);
Console.Write(' ');
}
Console.WriteLine();
}
static IEnumerable<string> Parse(string input)
{
var sb = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
if (!char.IsUpper(input[i]))
{
sb.Append(input[i]);
continue;
}
if (sb.Length > 0)
{
yield return sb.ToString();
sb.Clear();
}
sb.Append(input[i]);
if (char.IsUpper(input[i + 1]))
{
sb.Append(input[++i]);
yield return sb.ToString();
sb.Clear();
}
}
if (sb.Length > 0)
{
yield return sb.ToString();
}
}
}
I think does not need regular expression in this case.
Try this:
static void Main(string[] args)
{
var input = "HellOWORLd";
var i = 0;
var x = 4;
var len = input.Length;
var output = new List<string>();
while (x <= len)
{
output.Add(SubStr(input, i, x));
i = x;
x += 2;
}
var ret = output.ToArray(); //["Hell","OW", "OR", "Ld"]
Console.ReadLine();
}
static string SubStr(string str, int start, int end)
{
var len = str.Length;
if (start >= 0 && end <= len)
{
var ret = new StringBuilder();
for (int i = 0; i < len; i++)
{
if (i == start)
{
do
{
ret.Append(str[i]);
i++;
} while (i != end);
}
}
return ret.ToString();
}
return null;
}

Categories

Resources