I have some strings like this:
string phoneNumber = "(914) 395-1430";
I would like to strip out the parentheses and the dash; in other words, just keep the numeric values.
So the output could look like this:
9143951430
How do I get the desired output?
You can do any of the following:
Use regular expressions. You can use a regular expression with either
a negative character class that defines the characters you don't want (anything other than a decimal digit):
private static readonly Regex rxNonDigits = new Regex( @"[^\d]+");
In that case, you can take either of these approaches:
// simply replace the offending substrings with an empty string
private string CleanStringOfNonDigits_V1( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
string cleaned = rxNonDigits.Replace(s, "") ;
return cleaned ;
}
// split the string into an array of good substrings
// using the bad substrings as the delimiter. Then use
// String.Join() to splice things back together.
private string CleanStringOfNonDigits_V2( string s )
{
if (string.IsNullOrEmpty(s)) return s;
string cleaned = String.Join( string.Empty, rxNonDigits.Split(s) );
return cleaned ;
}
or a positive character class that defines what you do want (decimal digits):
private static readonly Regex rxDigits = new Regex( @"[\d]+") ;
In which case you can do something like this:
private string CleanStringOfNonDigits_V3( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
StringBuilder sb = new StringBuilder() ;
for ( Match m = rxDigits.Match(s) ; m.Success ; m = m.NextMatch() )
{
sb.Append(m.Value) ;
}
string cleaned = sb.ToString() ;
return cleaned ;
}
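About V2: `Regex.Split` hands back the digit runs as separate strings, which is why they have to be joined with an empty separator (or passed to `string.Concat`). Here is a small top-level sketch of what it returns for the sample number. Because the input starts with a delimiter, the first element is an empty string, which the join simply ignores:

```csharp
using System;
using System.Text.RegularExpressions;

var rxNonDigits = new Regex(@"[^\d]+");

// Each run of non-digits acts as a delimiter. Because the input starts
// with one, Split returns an empty string as the first element.
string[] parts = rxNonDigits.Split("(914) 395-1430");
Console.WriteLine(string.Join("|", parts));          // |914|395|1430

// Joining with an empty separator splices the digit runs back together.
Console.WriteLine(string.Join(string.Empty, parts)); // 9143951430
```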
You're not required to use a regular expression, either.
You could use LINQ directly, since a string is an IEnumerable<char>:
private string CleanStringOfNonDigits_V4( string s )
{
if ( string.IsNullOrEmpty(s) ) return s;
string cleaned = new string( s.Where( char.IsDigit ).ToArray() ) ;
return cleaned;
}
If you're only dealing with western alphabets where the only decimal digits you'll see are ASCII, skipping char.IsDigit will likely buy you a little performance:
private string CleanStringOfNonDigits_V5( string s )
{
if (string.IsNullOrEmpty(s)) return s;
string cleaned = new string(s.Where( c => c >= '0' && c <= '9' ).ToArray() ) ;
return cleaned;
}
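A word of caution on ASCII-only range checks: `c - '0'` is an `int`, so a test written as just `c - '0' < 10` is also true for every character below `'0'`, including the parentheses, the space and the dash. A common trick is to cast the difference to `uint`, so that anything below `'0'` wraps around to a huge value and one comparison covers both ends of the range. A minimal sketch (the `KeepAsciiDigits` name is made up for illustration):

```csharp
using System;
using System.Linq;

// Keeps only the ASCII digits '0'..'9'.
// (uint)(c - '0') turns any char below '0' into a huge unsigned value,
// so the single "<= 9" comparison rejects chars on both sides of the range.
static string KeepAsciiDigits(string s)
{
    if (string.IsNullOrEmpty(s)) return s;
    return new string(s.Where(c => (uint)(c - '0') <= 9).ToArray());
}

// The naive signed test keeps '(', ')', ' ' and '-' as well:
// '(' - '0' is -8, which is < 10.
string naive = new string("(914) 395-1430".Where(c => c - '0' < 10).ToArray());
Console.WriteLine(naive);                             // (914) 395-1430
Console.WriteLine(KeepAsciiDigits("(914) 395-1430")); // 9143951430
```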
Finally, you can simply iterate over the string, chucking the characters you don't want, like this:
private string CleanStringOfNonDigits_V6( string s )
{
if (string.IsNullOrEmpty(s)) return s;
StringBuilder sb = new StringBuilder(s.Length) ;
for (int i = 0; i < s.Length; ++i)
{
char c = s[i];
if ( c < '0' ) continue ;
if ( c > '9' ) continue ;
sb.Append(s[i]);
}
string cleaned = sb.ToString();
return cleaned;
}
Or this:
private string CleanStringOfNonDigits_V7(string s)
{
if (string.IsNullOrEmpty(s)) return s;
StringBuilder sb = new StringBuilder(s);
int j = 0 ;
int i = 0 ;
while ( i < sb.Length )
{
bool isDigit = char.IsDigit( sb[i] ) ;
if ( isDigit )
{
sb[j++] = sb[i++];
}
else
{
++i ;
}
}
sb.Length = j;
string cleaned = sb.ToString();
return cleaned;
}
From the standpoint of clarity and clean code, version 1 is what you want. It's hard to beat a one-liner.
If performance matters, my suspicion is that version 7, the last one, is the winner. It creates a single temporary object, a StringBuilder, and does the transformation in place within that StringBuilder's buffer.
The other options all do more work.
Use a regular expression:
string result = Regex.Replace(phoneNumber, @"[^\d]", "");
Try something like this:
return new String(input.Where(Char.IsDigit).ToArray());
string phoneNumber = "(914) 395-1430";
var numbers = String.Join("", phoneNumber.Where(char.IsDigit));
Regex rgx = new Regex(@"\D");
str = rgx.Replace(str, "");
Instead of a regular expression, you can use a LINQ method:
phoneNumber = String.Concat(phoneNumber.Where(c => c >= '0' && c <= '9'));
or:
phoneNumber = String.Concat(phoneNumber.Where(Char.IsDigit));
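The two predicates are not interchangeable: `char.IsDigit` accepts any Unicode decimal digit, while the `'0'`..`'9'` range accepts only ASCII digits. A quick sketch, using U+0663 (ARABIC-INDIC DIGIT THREE) as an example of a non-ASCII digit:

```csharp
using System;
using System.Linq;

// "1" + ARABIC-INDIC DIGIT THREE (U+0663) + "-2"
string mixed = "1\u0663-2";

// char.IsDigit is true for every Unicode decimal digit.
string viaIsDigit = string.Concat(mixed.Where(char.IsDigit));

// The range check keeps only the ASCII digits.
string viaRange = string.Concat(mixed.Where(c => c >= '0' && c <= '9'));

Console.WriteLine(viaIsDigit.Length); // 3
Console.WriteLine(viaRange);          // 12
```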
I have a text string and I want to replace the dots with underscores, except for the last dot in the string.
Example:
input = "video.coffee.example.mp4"
result = "video_coffee_example.mp4"
I have code, but it replaces every dot, including the last one.
First option (fails):
static string replaceForUnderScore(string file)
{
return file = file.Replace(".", "_");
}
I implemented a second option that works for me, but I find it verbose and not very efficient:
static string replaceForUnderScore(string file)
{
string result = "";
var splits = file.Split(".");
var extension = splits.LastOrDefault();
splits = splits.Take(splits.Count() - 1).ToArray();
foreach (var strItem in splits)
{
result = result + "_" + strItem;
}
result = result.Substring(1, result.Length-1);
string finalResult = result + "."+extension;
return finalResult;
}
Is there a better way to do it?
Since you are working with file names, I suggest using the Path class: all
we want is to change the file name while keeping the extension intact:
static string replaceForUnderScore(string file) =>
Path.GetFileNameWithoutExtension(file).Replace('.', '_') + Path.GetExtension(file);
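A runnable version of that one-liner, assuming a bare file name is passed in (the method name follows the question). Keep in mind that `GetFileNameWithoutExtension` also drops any directory part, so a full path would have to be recombined afterwards, for example with `Path.Combine(Path.GetDirectoryName(file), ...)`:

```csharp
using System;
using System.IO;

// Replace every '.' in the file name, then re-attach the extension.
// Path.GetExtension returns the text from the last '.' onward (including it),
// or an empty string when there is no extension.
static string ReplaceForUnderScore(string file) =>
    Path.GetFileNameWithoutExtension(file).Replace('.', '_') + Path.GetExtension(file);

Console.WriteLine(ReplaceForUnderScore("video.coffee.example.mp4")); // video_coffee_example.mp4
```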
You can replace all the dots with an underscore except for the last dot by asserting that there is still a dot present to the right when matching one.
string result = Regex.Replace(input, @"\.(?=[^.]*\.)", "_");
The result will be:
video_coffee_example.mp4
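A self-contained sketch of the lookahead approach; `ReplaceAllButLastDot` is a hypothetical name. The lookahead `(?=[^.]*\.)` checks that another dot follows without consuming it, so only the final dot survives:

```csharp
using System;
using System.Text.RegularExpressions;

// Match a '.' only if another '.' appears somewhere to its right.
// [^.]* skips over non-dots; the lookahead itself consumes nothing.
static string ReplaceAllButLastDot(string input) =>
    Regex.Replace(input, @"\.(?=[^.]*\.)", "_");

Console.WriteLine(ReplaceAllButLastDot("video.coffee.example.mp4")); // video_coffee_example.mp4
Console.WriteLine(ReplaceAllButLastDot("archive.tar.gz"));           // archive_tar.gz
```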
Regex can help you do this.
Add using System.Text.RegularExpressions; to your file
and use this code:
var regex = new Regex(Regex.Escape("."));
var newText = regex.Replace("video.coffee.example.mp4", "_", 2);
Here we specify the maximum number of times to replace the dot (2).
The output would be the following:
video_coffee_example.mp4
Additionally, you can update the code to replace any number of dots excluding the last one.
var replaceChar = '.';
var regex = new Regex(Regex.Escape(replaceChar.ToString()));
var replaceWith = "_";
// The text to process
var text = "video.coffee.example.mp4";
// Count how many chars to replace excluding extension
var replaceCount = text.Count(s => s == replaceChar) - 1;
var newText = regex.Replace(text, replaceWith, replaceCount);
Off the top of my head, but this might work (note that it is hard-coded to the .mp4 extension):
return $"{file.Replace(".mp4","").Replace(".","_")}.mp4";
The simplest (and probably fastest) way is just to iterate over the string:
static string replaceForUnderScore(string file)
{
StringBuilder sb = new StringBuilder( file.Length ) ;
int lastDot = -1 ;
for ( int i = 0 ; i < file.Length ; ++i )
{
char c = file[i] ;
// if we found a '.', replace it with '_' and save its position
if ( c == '.' )
{
c = '_' ;
lastDot = i ;
}
sb.Append( c ) ;
}
// if we changed any '.' to '_', convert the last such replacement back to '.'
if ( lastDot >= 0 )
{
sb[lastDot] = '.' ;
}
return sb.ToString();
}
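If you'd rather not walk the string by hand, `string.LastIndexOf` can find the last dot directly. This is a different technique from the loop above, sketched here with the hypothetical name `ReplaceDotsBeforeLast`: only the part before the last dot is passed to `Replace`, and the rest is appended unchanged.

```csharp
using System;

// Replaces every '.' that comes before the last one; the last '.' and the
// extension after it are kept as-is.
static string ReplaceDotsBeforeLast(string file)
{
    int lastDot = file.LastIndexOf('.');
    if (lastDot <= 0) return file; // no dot, or only a leading one: nothing to replace
    return file.Substring(0, lastDot).Replace('.', '_') + file.Substring(lastDot);
}

Console.WriteLine(ReplaceDotsBeforeLast("video.coffee.example.mp4")); // video_coffee_example.mp4
```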
Another approach would be to use System.IO.Path. It's certainly the most succinct:
static string replaceForUnderScore( string file )
{
string ext = Path.GetExtension( file ) ;
string name = Path
.GetFileNameWithoutExtension( file )
.Replace( '.' , '_' )
;
return Path.ChangeExtension( name , ext ) ;
}
Hi, I am a beginner in C# and I am trying to remove the whitespace from a string.
I use the following code:
public String RemoveSpace(string str1)
{
char[] source = str1.ToCharArray();
int oldIndex = 0;
int newIndex = 0;
while (oldIndex < source.Length)
{
if (source[oldIndex] != ' ' && source[oldIndex] != '\t')
{
source[newIndex] = source[oldIndex];
newIndex++;
}
oldIndex++;
}
source[oldIndex] = '\0';
return new String(source);
}
But the problem I'm facing is that when I give the input string "H e l", the output is "Hel l".
This is because at the last iteration arr[2] is replaced by arr[4], and the final 'l' is left over at the end of the array. Can someone point out the mistake I'm making?
Note: There should not be any use of Regex, trim or replace functions.
Thanks.
There's a String constructor that lets you control the length, so just change the last line to:
return new String(source, 0, newIndex);
Note that .NET doesn't care about NUL characters (strings can contain them just fine), so you can remove source[oldIndex] = '\0'; entirely. As written, that line would actually throw an IndexOutOfRangeException, because oldIndex equals source.Length when the loop ends.
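Putting the answer's fix together, here is the asker's method with only the last line changed (and the `'\0'` write dropped), as a runnable sketch:

```csharp
using System;

// The asker's compaction loop: copy every non-space, non-tab char forward.
static string RemoveSpace(string str1)
{
    char[] source = str1.ToCharArray();
    int oldIndex = 0;
    int newIndex = 0;
    while (oldIndex < source.Length)
    {
        if (source[oldIndex] != ' ' && source[oldIndex] != '\t')
        {
            source[newIndex] = source[oldIndex];
            newIndex++;
        }
        oldIndex++;
    }
    // new string(char[], startIndex, length) copies only the compacted prefix,
    // so the leftover characters after newIndex are ignored.
    return new string(source, 0, newIndex);
}

Console.WriteLine(RemoveSpace("H e l")); // Hel
```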
Some key learning points:
Incrementally concatenating strings is relatively slow. Since you're going to perform an indeterminate (potentially large) number of character-by-character operations, use a char array for the working string.
The fastest way to iterate through the characters of a string in C# is to use the built-in string indexer.
If you need to check additional characters besides space, tab, carriage return, and line feed, then add additional conditions in the if statement:
public static string RemoveWhiteSpace(string input) {
int len = input.Length;
int ixOut = 0;
char[] outBuffer = new char[len];
for(int i = 0; i < len; i++) {
char c = input[i];
if(!(c == ' ' || c == '\t' || c == '\r' || c == '\n'))
outBuffer[ixOut++] = c;
}
return new string(outBuffer, 0, ixOut);
}
You can use LINQ for that:
var output = new string(input.Where(x => !char.IsWhiteSpace(x)).ToArray());
Your mistake is that you are shifting the non-space characters forward, but your source array still contains the leftover characters at the end. With that logic you will never get the correct result, because you are not removing anything; you are just overwriting characters. After your while loop you can try this:
return new String(source.Take(newIndex).ToArray());
The Take method (from System.Linq) gets the first newIndex characters of your source array, which is the compacted part, and ignores the rest.
Here is another alternative way of doing this:
var output = string.Concat(input.Split());
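`string.Split()` with no arguments splits on every character for which `char.IsWhiteSpace` returns true, so this one-liner also removes tabs and line breaks. A quick check (the sample input is made up):

```csharp
using System;

string input = "H e\tl\r\nlo";

// Split() with no separators splits on all Unicode whitespace.
// "\r\n" produces an empty piece between the two chars; Concat ignores it.
string output = string.Concat(input.Split());
Console.WriteLine(output); // Hello
```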
You should note that much depends on how you define "whitespace". Unicode and the CLR define whitespace as a rather exhaustive list of characters: char.IsWhiteSpace() returns true for quite a few characters.
The "classic" whitespace characters are HT, LF, VT, FF, CR and SP (and some might include BS as well).
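To make the difference concrete, here is a small sketch comparing `char.IsWhiteSpace` with the classic set; `IsClassicWs` is a hypothetical helper, and U+00A0 (NO-BREAK SPACE) and U+2003 (EM SPACE) are examples of Unicode whitespace outside that set:

```csharp
using System;

// The classic set: HT, LF, VT, FF, CR, SP.
static bool IsClassicWs(char c) =>
    c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r' || c == ' ';

// U+00A0 and U+2003 are Unicode space separators, so char.IsWhiteSpace
// reports true for them, but they are not in the classic ASCII set.
foreach (char c in new[] { ' ', '\t', '\u00A0', '\u2003' })
{
    Console.WriteLine($"U+{(int)c:X4}: IsWhiteSpace={char.IsWhiteSpace(c)}, classic={IsClassicWs(c)}");
}
```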
Myself, I'd probably do something like this:
public static class StringHelpers
{
public static string StripWhitespace( this string s )
{
StringBuilder sb = new StringBuilder() ;
foreach ( char c in s )
{
switch ( c )
{
//case '\b' : continue ; // U+0008, BS uncomment if you want this
case '\t' : continue ; // U+0009, HT
case '\n' : continue ; // U+000A, LF
case '\v' : continue ; // U+000B, VT
case '\f' : continue ; // U+000C, FF
case '\r' : continue ; // U+000D, CR
case ' ' : continue ; // U+0020, SP
}
sb.Append(c) ;
}
string stripped = sb.ToString() ;
return stripped ;
}
}
You could use your approach like this. However, it's important to READ THE DOCUMENTATION: you'll note the use of a string constructor overload that lets you specify a range within an array as the source for the string:
public static string StripWhitespace( string s )
{
char[] buf = s.ToCharArray() ;
int j = 0 ; // target pointer
for ( int i = 0 ; i < buf.Length ; ++i )
{
char c = buf[i] ;
if ( !IsWs(c) )
{
buf[j++] = c ;
}
}
string stripped = new string(buf,0,j) ;
return stripped ;
}
private static bool IsWs( char c )
{
bool ws = false ;
switch ( c )
{
//case '\b' : // U+0008, BS uncomment if you want BS as whitespace
case '\t' : // U+0009, HT
case '\n' : // U+000A, LF
case '\v' : // U+000B, VT
case '\f' : // U+000C, FF
case '\r' : // U+000D, CR
case ' ' : // U+0020, SP
ws = true ;
break ;
}
return ws ;
}
You could also use Linq, something like:
public static string StripWhitespace( this string s )
{
return new string( s.Where( c => !char.IsWhiteSpace(c) ).ToArray() ) ;
}
Though I'm willing to bet that the LINQ approach will be significantly slower than the other two. It's elegant, though.
Here's my working code:
string Input;
string Output;
Input = data;
Output = Input.Replace(@")", "");
Here, I am simply removing the closing parenthesis ")" from my string, if it exists. Now how do I expand the list of offending characters to include "(" and "-" as well?
I realize I can write 2 more Output-like statements, but I'm wondering if there is a better way...
If you're just doing a couple replacements (I see you're only doing three), the easiest way without worrying about Regex or StringBuilders is to chain three Replace calls into one statement:
Output = Input.Replace("(", "").Replace(")", "").Replace("-", "");
... which is marginally better than storing the result in Output every time.
Output = Regex.Replace(Input, "[()-]", "");
The [] in the expression creates a character class, so the brackets themselves are not matched; any single one of the characters inside them is.
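LINQ solution (use Where rather than Except, because Except is a set operation that would also drop repeated characters):
Output = new String(Input.Where(c => !"()-".Contains(c)).ToArray());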
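To see what goes wrong with `Except`: as a set operation it removes the listed characters but also collapses every repeated character to its first occurrence. A quick sketch with a made-up phone number shows the difference from a `Where` filter:

```csharp
using System;
using System.Linq;

string input = "(555) 123-4567";

// Except yields distinct elements of input that are not in "()-",
// in order of first occurrence, so repeated '5's disappear.
string viaExcept = new string(input.Except("()-").ToArray());

// Where keeps every character that is not in the bad list.
string viaWhere = new string(input.Where(c => !"()-".Contains(c)).ToArray());

Console.WriteLine(viaExcept); // 5 123467
Console.WriteLine(viaWhere);  // 555 1234567
```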
As an alternative to Regex, it may be easier to manage the unwanted strings as a collection and do the replacements with a StringBuilder:
var replacements = new[] { "(", ")", "-" };
var output = new StringBuilder(Input);
foreach (var r in replacements)
output.Replace(r, string.Empty);
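A fuller sketch of the same idea, wrapped in a hypothetical `RemoveAll` helper that takes the unwanted strings as parameters. `StringBuilder.Replace` works on the builder's buffer instead of creating a new string per call, and it throws on an empty search string, hence the guard:

```csharp
using System;
using System.Text;

// Removes every occurrence of each unwanted string from input.
static string RemoveAll(string input, params string[] unwanted)
{
    var sb = new StringBuilder(input);
    foreach (string r in unwanted)
    {
        // StringBuilder.Replace throws ArgumentException for an empty oldValue.
        if (!string.IsNullOrEmpty(r)) sb.Replace(r, string.Empty);
    }
    return sb.ToString();
}

Console.WriteLine(RemoveAll("(914) 395-1430", "(", ")", "-")); // 914 3951430
```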
You can use Regex.Replace().
You can use a List<string> that contains your bad words, then iterate over it and replace every bad string:
StringBuilder output = new StringBuilder("(Hello) W,o.r;ld");
List<string> badwords = new List<string>();
badwords.Add("(");
badwords.Add(")");
badwords.Add(",");
badwords.Add(".");
badwords.Add(";");
badwords.ForEach(bad => output = output.Replace(bad, String.Empty));
//Result "Hello World"
This will also let you do the same thing:
private static string ReplaceBadWords(string[] BadStrings, string input)
{
StringBuilder sb = new StringBuilder(input);
BadStrings.ToList().ForEach(b =>
{
if(b != "")
{
sb = sb.Replace(b, string.Empty);
}
});
return sb.ToString();
}
Sample usage:
string[] BadStrings = new string[]
{
")",
"(",
"random",
""
};
string input = "Some random text()";
string output = ReplaceBadWords(BadStrings, input);
I'd probably use a regular expression, as it's terse and to the point. If you're scared of regular expressions, you can teach the computer to write them for you. Here's a simple class for cleaning strings; you just provide it with a list of invalid characters:
class StringCleaner
{
private Regex regex ;
public StringCleaner( string invalidChars ) : this ( (IEnumerable<char>) invalidChars )
{
return ;
}
public StringCleaner ( params char[] invalidChars ) : this( (IEnumerable<char>) invalidChars )
{
return ;
}
public StringCleaner( IEnumerable<char> invalidChars )
{
const string HEX = "0123456789ABCDEF" ;
SortedSet<char> charSet = new SortedSet<char>( invalidChars ) ;
StringBuilder sb = new StringBuilder( 2 + 6*charSet.Count ) ;
sb.Append('[') ;
foreach ( ushort c in charSet )
{
sb.Append(@"\u" )
.Append( HEX[ ( c >> 12 ) & 0x000F ] )
.Append( HEX[ ( c >> 8 ) & 0x000F ] )
.Append( HEX[ ( c >> 4 ) & 0x000F ] )
.Append( HEX[ ( c >> 0 ) & 0x000F ] )
;
}
sb.Append(']') ;
this.regex = new Regex( sb.ToString() ) ;
}
public string Clean( string s )
{
if ( string.IsNullOrEmpty(s) ) return s ;
string value = this.regex.Replace(s,"") ;
return value ;
}
}
Once you have that, it's easy:
static void Main(string[] args)
{
StringCleaner cleaner = new StringCleaner( "aeiou" ) ;
string dirty = "The quick brown fox jumped over the lazy dog." ;
string clean = cleaner.Clean(dirty) ;
Console.WriteLine( clean ) ;
return;
}
At the end of which, clean is "Th qck brwn fx jmpd vr th lzy dg."
Easy!
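The core of `StringCleaner` is building a character class out of `\uXXXX` escapes. A more compact sketch of that step, as a hypothetical `BuildCleaner` function, can produce the escapes with the `X4` format specifier. Writing every character as an escape means `]`, `\`, `^` and `-` cannot change the meaning of the class:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Builds a regex like [\u0061\u0065...] from the invalid characters.
static Regex BuildCleaner(string invalidChars)
{
    // "[]" is not a valid .NET pattern, so refuse an empty list.
    if (string.IsNullOrEmpty(invalidChars))
        throw new ArgumentException("Provide at least one character.", nameof(invalidChars));

    // Every char becomes a \uXXXX escape, so special chars are always literal.
    string body = string.Concat(invalidChars.Distinct().Select(c => $@"\u{(int)c:X4}"));
    return new Regex("[" + body + "]");
}

Regex cleaner = BuildCleaner("aeiou");
Console.WriteLine(cleaner.Replace("The quick brown fox jumped over the lazy dog.", ""));
// prints: Th qck brwn fx jmpd vr th lzy dg.
```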
I have the string
DoubleGeneric<DoubleGeneric<int,string>,string>
I am trying to grab the 2 type arguments:
DoubleGeneric<int,string> and string
Initially I was using a split on ','. This worked, but only if the generic args are not themselves generic.
My code:
string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string[] innerTypes = m.Groups[2].Value.Split( ',' );
foreach( string strInnerType in innerTypes ) {
Console.WriteLine( strInnerType );
}
Question:
How do I do a regex split on commas that are not encapsulated in angle brackets?
Both commas are between angle brackets! Regex does a bad job of parsing complex nested syntax. The real question is how to find a comma that is between angle brackets which are not themselves inside other angle brackets. I don't think that this can be done with a regex.
If possible, try to work with Reflection. You might also use CS-Script to compile your code snippet and then use Reflection to retrieve the information you need.
To split the example you have given you can use the following. However, this is not generic; it could be made generic based on the other strings that you expect. Depending on the variation of the strings you have, this method could get complex; but I would suggest that the use of Roslyn here is overkill...
string fullName = "DoubleGeneric<DoubleGeneric<int,string>,string>";
Regex Reg =
new Regex(@"(?i)<\s*\p{L}+\s*<\s*\p{L}+\s*,\s*\p{L}+\s*>\s*,\s*\p{L}+\s*>");
Match m = Reg.Match(fullName);
string str = m.ToString().Trim(new char[] { '<', '>' });
Regex rr = new Regex(@"(?i),(?!.*>\s*)");
string[] strArr = rr.Split(str);
I hope this helps.
The answers are correct: using Regex is the wrong approach.
I ended up doing a linear pass, replacing the commas enclosed in angle brackets with ~s, and then doing a split.
static void Main( string[] args ) {
string fullName = "Outer<blah<int,string>,int,blah<int,int>>";
Regex regex = new Regex( @"([a-zA-Z\._]+)\<(.+)\>$" );
Match m = regex.Match( fullName );
string frontName = m.Groups[1].Value;
string inner = m.Groups[2].Value;
var genArgs = ParseInnerGenericArgs( inner );
foreach( string s in genArgs ) {
Console.WriteLine(s);
}
Console.ReadKey();
}
private static IEnumerable<string> ParseInnerGenericArgs( string inner ) {
List<string> pieces = new List<string>();
int angleCount = 0;
StringBuilder sb = new StringBuilder();
for( int i = 0; i < inner.Length; i++ ) {
string currChar = inner[i].ToString();
if( currChar == ">" ) {
angleCount--;
}
if( currChar == "<" ) {
angleCount++;
}
if( currChar == "," && angleCount > 0 ) {
sb.Append( "~" );
} else {
sb.Append( currChar );
}
}
foreach( string item in sb.ToString().Split( ',' ) ) {
pieces.Add(item.Replace('~',','));
}
return pieces;
}
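The same linear pass can also collect the arguments directly instead of substituting `~` and splitting afterwards, which avoids trouble if a name ever contains `~`. A sketch with the hypothetical name `SplitTopLevel`; note that it does not trim whitespace around the pieces:

```csharp
using System;
using System.Collections.Generic;

// Splits "A<B,C>,D,E<F>" at commas that are not nested inside <...>.
// depth counts how many '<' are currently open.
static List<string> SplitTopLevel(string inner)
{
    var pieces = new List<string>();
    int depth = 0, start = 0;
    for (int i = 0; i < inner.Length; i++)
    {
        char c = inner[i];
        if (c == '<') depth++;
        else if (c == '>') depth--;
        else if (c == ',' && depth == 0)
        {
            pieces.Add(inner.Substring(start, i - start));
            start = i + 1;
        }
    }
    pieces.Add(inner.Substring(start)); // the last argument
    return pieces;
}

foreach (string arg in SplitTopLevel("blah<int,string>,int,blah<int,int>"))
    Console.WriteLine(arg);
```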
Here is the regex I would use:
\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$
([\w\.]+) matches "DoubleGeneric".
(\<.+\>)? matches the possible generic args, like the <OtherGeneric<int, ...>> part of DoubleGeneric<OtherGeneric<int, ...>>.
The key point is that no matter how many nested generic args you have, you will have only one ">," in the whole expression.
You can use m.Groups[1] and m.Groups[4] to get the first and second type.
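Note that the pattern needs a `\>` before `$` to consume the outer type's closing bracket. A sketch with that included, run on the question's input:

```csharp
using System;
using System.Text.RegularExpressions;

// Groups 1 and 4 capture the two top-level type arguments.
// The \> before $ consumes the outer type's closing bracket.
var rx = new Regex(@"\<(([\w\.]+)(\<.+\>)?)\,(([\w\.]+)(\<.+\>)?)\>$");

Match m = rx.Match("DoubleGeneric<DoubleGeneric<int,string>,string>");
Console.WriteLine(m.Groups[1].Value); // DoubleGeneric<int,string>
Console.WriteLine(m.Groups[4].Value); // string
```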
How to remove leading zeros in strings using C#?
For example in the following numbers, I would like to remove all the leading zeros.
0001234
0000001234
00001234
This is the code you need:
string strInput = "0001234";
strInput = strInput.TrimStart('0');
It really depends on how long the NVARCHAR is, as a few of the above methods (especially the ones that convert through IntXX) will not work for:
String s = "005780327584329067506780657065786378061754654532164953264952469215462934562914562194562149516249516294563219437859043758430587066748932647329814687194673219673294677438907385032758065763278963247982360675680570678407806473296472036454612945621946";
Something like this would work:
String s ="0000058757843950000120465875468465874567456745674000004000".TrimStart(new Char[] { '0' } );
// s = "58757843950000120465875468465874567456745674000004000"
Code to avoid returning an empty string (when the input is like "00000"):
string myStr = "00012345";
myStr = myStr.TrimStart('0');
myStr = myStr.Length > 0 ? myStr : "0";
return numberString.TrimStart('0');
Using the following will return a single 0 when input is all 0.
string s = "0000000";
s = int.Parse(s).ToString();
TryParse works if your number is no larger than Int32.MaxValue. It also gives you the opportunity to handle badly formatted strings. The same applies to Int64.MaxValue and Int64.TryParse.
int number;
if(Int32.TryParse(nvarchar, out number))
{
// etc...
number.ToString();
}
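If you prefer the parse-based approach but your strings can be longer than `Int64` allows, `System.Numerics.BigInteger` (a different type from the IntXX ones above) parses digit strings of any length. A sketch with the hypothetical name `StripZerosViaBigInteger`, which falls back to the original string when it doesn't parse; note that `BigInteger.TryParse` also accepts a leading sign and surrounding whitespace:

```csharp
using System;
using System.Numerics;

// BigInteger has no fixed size, so very long digit strings still parse;
// ToString() then prints the value without leading zeros.
static string StripZerosViaBigInteger(string digits) =>
    BigInteger.TryParse(digits, out BigInteger value) ? value.ToString() : digits;

Console.WriteLine(StripZerosViaBigInteger("0001234")); // 1234
Console.WriteLine(StripZerosViaBigInteger("0000000")); // 0
```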
This regex lets you avoid a wrong result for numbers that consist only of zeroes ("0000"), and it works on numbers of any length:
using System.Text.RegularExpressions;
/*
00123 => 123
00000 => 0
00000a => 0a
00001a => 1a
0000132423423424565443546546356546454654633333a => 132423423424565443546546356546454654633333a
*/
Regex removeLeadingZeroesReg = new Regex(@"^0+(?=\d)");
var strs = new string[]
{
"00123",
"00000",
"00000a",
"00001a",
"0000132423423424565443546546356546454654633333a",
};
foreach (string str in strs)
{
Debug.Print(string.Format("{0} => {1}", str, removeLeadingZeroesReg.Replace(str, "")));
}
And this regex will remove leading zeroes from every number inside a string:
new Regex(@"(?<!\d)0+(?=\d)");
// "0000123432 d=0 p=002 3?0574 m=600"
// => "123432 d=0 p=2 3?574 m=600"
Regex rx = new Regex(@"^0+(\d+)$");
rx.Replace("0001234", @"$1"); // => "1234"
rx.Replace("0001234000", @"$1"); // => "1234000"
rx.Replace("000", @"$1"); // => "0" (TrimStart will convert this to "")
// usage
var outString = rx.Replace(inputString, @"$1");
I just crafted this because I needed a good, simple way.
When it reaches the final digit, that digit stays even if it is a zero.
You could also use a foreach loop instead for super long strings.
I just replace each leading oldChar with the newChar.
This is great for a problem I just solved, after formatting an int into a string.
/* Like this: */
int counterMax = 1000;
int counter = ...;
string counterString = counter.ToString($"D{counterMax.ToString().Length}");
counterString = RemoveLeadingChars('0', ' ', counterString);
string fullCounter = $"({counterString}/{counterMax})";
// = (   1/1000) ... ( 430/1000) ... (1000/1000)
static string RemoveLeadingChars(char oldChar, char newChar, char[] chars)
{
string result = "";
bool stop = false;
for (int i = 0; i < chars.Length; i++)
{
if (i == (chars.Length - 1)) stop = true;
if (!stop && chars[i] == oldChar) chars[i] = newChar;
else stop = true;
result += chars[i];
}
return result;
}
static string RemoveLeadingChars(char oldChar, char newChar, string text)
{
return RemoveLeadingChars(oldChar, newChar, text.ToCharArray());
}
I always tend to make my functions suitable for my own library, so there are options.
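Because `result += chars[i]` creates a new string on every iteration, the cost grows quadratically with the length. A sketch of the same behaviour with a `StringBuilder` (same name and parameters as above, but written as a single method that works on a copy of the text):

```csharp
using System;
using System.Text;

// Replace each leading oldChar with newChar, but never touch the final
// character, so "0000" becomes "   0" rather than all blanks.
static string RemoveLeadingChars(char oldChar, char newChar, string text)
{
    var sb = new StringBuilder(text);
    for (int i = 0; i < sb.Length - 1 && sb[i] == oldChar; i++)
        sb[i] = newChar;
    return sb.ToString();
}

string padded = RemoveLeadingChars('0', ' ', "0001");
Console.WriteLine($"({padded}/1000)"); // (   1/1000)
```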