I'm looking for a C# snippet to remove and store any punctuation from the end of a string only.
Example:
Test! would return !
Test;; would return ;;
Test?:? would return ?:?
!!Test!?! would return !?!
I have a rather clunky solution at the moment but wondered if anybody could suggest a more succinct way to do this.
My punctuation list is
new char[] { '.', ':', '-', '!', '?', ',', ';' }
You could use the following regular expression:
\p{P}*$
This breaks down to:
\p{P} - Unicode punctuation
* - Any number of times
$ - End-of-string anchor (also matches just before a final newline)
If you know that there will always be some punctuation at the end of the string, use + for efficiency.
And use it like this in order to get the punctuation:
string punctuation = Regex.Match(myString, @"\p{P}*$").Value;
To actually remove it:
string noPunctuation = Regex.Replace(myString, @"\p{P}*$", string.Empty);
Use a regex:
resultString = Regex.Replace(subjectString, @"[.:!?,;-]+$", "");
Explanation:
[.:!?,;-] # Match a character that's one of the enclosed characters
+ # Do this once or more (as many times as possible)
$ # Assert position at the end of the string
As Oded suggested, use \p{P} instead of [.:!?,;-] if you want to remove all punctuation characters, not just the ones from your list.
To also "store" the punctuation, you could split the string:
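splitArray = Regex.Split(subjectString, @"(?<!\p{P})(?=\p{P}+$)");
Then splitArray[0] contains the part before the punctuation and splitArray[1] the punctuation characters, if there are any. The negative lookbehind (?<!\p{P}) keeps a run of several trailing punctuation characters together instead of splitting it into single characters.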
Using Linq:
var punctuationMap = new HashSet<char>(new char[] { '.', ':', '-', '!', '?', ',', ';' });
var endPunctuationChars = aString.Reverse().
TakeWhile(ch => punctuationMap.Contains(ch));
var result = new string(endPunctuationChars.Reverse().ToArray());
The HashSet is not mandatory; you can use LINQ's Contains on the array directly.
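For comparison, here is a minimal sketch that removes and stores the trailing punctuation in one backward pass, without Regex or LINQ. The names TrailingPunctuation and SplitTrailing are made up for illustration; the character list is the one from the question.

```csharp
using System;

public static class TrailingPunctuation
{
    // Punctuation list taken from the question.
    private static readonly char[] Punctuation = { '.', ':', '-', '!', '?', ',', ';' };

    // Hypothetical helper: returns the text without its trailing punctuation,
    // plus the punctuation that was removed (empty if there was none).
    public static (string Body, string Trailing) SplitTrailing(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        int end = input.Length;
        // Walk backwards while the character is in the punctuation list.
        while (end > 0 && Array.IndexOf(Punctuation, input[end - 1]) >= 0)
        {
            end--;
        }
        return (input.Substring(0, end), input.Substring(end));
    }
}
```

Calling TrailingPunctuation.SplitTrailing("!!Test!?!") yields the body "!!Test" and the trailing part "!?!"; leading punctuation is left alone because the loop stops at the first non-punctuation character from the end.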
Related
I'm trying to split a string in CRM using different characters (like whitespace, comma, period, colon, semicolon, slash, pipe). But I also need to split on a new line as well.
The below function is working to split using different characters:
string[] values = propertylist.Split(new Char[] { ' ', ',', '.', ':','\t', '/', ';', '|', '\\', '\r', '\n'});
I read that a new line is represented as "\r\n", but if I change the call from Split(new Char[] to Split(new String[] and switch every separator to double quotation marks, I keep getting the error "Cannot convert from string[] to char[]...".
Any suggestions for this is appreciated very much. Thanks!
-elisabeth
Most likely you changed the code in your question to this:
string[] values =
propertylist.Split(new string[] { " ", ... "\r", "\n"});
The problem is that the method overload that accepts a string[] requires an additional StringSplitOptions argument. Without it, the call does not match that overload and you get the compile error from your question.
This is the closest match for your original code:
string[] values =
propertylist.Split(new string[] { " ", ... "\r\n"}, StringSplitOptions.None);
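As a rough sketch of that overload in use, the example below writes out the separators from the question's char[] call as strings and adds "\r\n". The names PropertySplitter and SplitProperties are hypothetical, and RemoveEmptyEntries is an assumption on my part (the answer above uses None) so that consecutive separators do not produce empty entries.

```csharp
using System;

public static class PropertySplitter
{
    // Separators from the question, as strings. "\r\n" is listed first so a
    // Windows line break is consumed as one separator; with RemoveEmptyEntries
    // the empty piece between "\r" and "\n" would be dropped anyway.
    private static readonly string[] Separators =
    {
        "\r\n", " ", ",", ".", ":", "\t", "/", ";", "|", "\\", "\r", "\n"
    };

    // Hypothetical helper wrapping the string[] overload of Split.
    public static string[] SplitProperties(string propertyList)
    {
        return propertyList.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, PropertySplitter.SplitProperties("a,b\r\nc|d e") returns the five entries "a", "b", "c", "d" and "e".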
I'm trying to find the last operator (+, -, * or /) in a string.
I was trying to use the method string.indexof('operator', i);, but in this case I could only get the single type of operator. Is there any better solution for this?
The value of string could, for example, be:
1+1/2*3-4
or
1/2-3+4*7
It means the last operator could be any of them.
http://msdn.microsoft.com/en-us/library/system.string.lastindexofany.aspx
The LastIndexOfAny method is what you're after. It will take an array of characters, and find the last occurrence of any of the characters.
var myString = "1/2-3+4*7";
var lastOperatorIndex = myString.LastIndexOfAny(new char[] { '+', '-', '/', '*' });
In this scenario, lastOperatorIndex == 7
If you're wanting to store the char itself to a variable you could have:
var myString = "1/2-3+4*7";
var operatorChar = myString[myString.LastIndexOfAny(new char[] { '+', '-', '/', '*' })];
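One thing to watch: LastIndexOfAny returns -1 when none of the characters occur, and indexing the string with -1 throws. A small sketch of a guarded version follows; OperatorFinder and LastOperator are illustrative names, not part of the framework.

```csharp
using System;

public static class OperatorFinder
{
    private static readonly char[] Operators = { '+', '-', '*', '/' };

    // Returns the last operator in the expression, or null if it has none.
    public static char? LastOperator(string expression)
    {
        int index = expression.LastIndexOfAny(Operators);
        return index >= 0 ? expression[index] : (char?)null;
    }
}
```

With the strings from the question, OperatorFinder.LastOperator("1+1/2*3-4") gives '-' and OperatorFinder.LastOperator("1/2-3+4*7") gives '*'.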
I need to split a string in C# using a set of delimiter characters. This set should include the default whitespace characters (i.e. what you effectively get when you call String.Split(null, StringSplitOptions.RemoveEmptyEntries)) plus some additional characters that I specify, like '.', ',', ';', etc. So if I have a char array of those additional characters, how do I add all the default whitespace characters to it, in order to then feed that expanded array to String.Split? Or is there a better way of splitting using my custom delimiter set + whitespace? Thx
Just use the appropriate overload of string.Split if you're at least on .NET 2.0:
char[] separator = new[] { ' ', '.', ',', ';' };
string[] parts = text.Split(separator, StringSplitOptions.RemoveEmptyEntries);
I guess I was downvoted because of the incomplete answer. The OP asked for a way to split by all whitespace characters (of which there are 25 on my PC) but also by other delimiters:
public static class StringExtensions
{
static StringExtensions()
{
var whiteSpaceList = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
char c = Convert.ToChar(i);
if (char.IsWhiteSpace(c))
{
whiteSpaceList.Add(c);
}
}
WhiteSpaces = whiteSpaceList.ToArray();
}
public static readonly char[] WhiteSpaces;
public static string[] SplitWhiteSpacesAndMore(this string str, IEnumerable<char> otherDelimiters, StringSplitOptions options = StringSplitOptions.None)
{
var separatorList = new List<char>(WhiteSpaces);
separatorList.AddRange(otherDelimiters);
return str.Split(separatorList.ToArray(), options);
}
}
Now you can use this extension method in this way:
string str = "word1 word2\tword3.word4,word5;word6";
char[] separator = { '.', ',', ';' };
string[] split = str.SplitWhiteSpacesAndMore(separator, StringSplitOptions.RemoveEmptyEntries);
The answers above do not use all whitespace characters as delimiters, as you state in your request, only the ones specified by the program. In the solution examples above, this is only SPACE, but not TAB, CR, LF, and all the other Unicode-defined whitespace chars.
I have not found a way to retrieve the default whitespace chars from String. However, they are defined in Regex, and you can use that instead of String. In your case, adding period and comma to the Regex whitespace set:
Regex regex = new Regex(@"[\s\.,]+"); // The "+" will remove blank entries
string input = @"1.2 3, 4";
string[] tokens = regex.Split(input);
will produce
tokens[0] "1"
tokens[1] "2"
tokens[2] "3"
tokens[3] "4"
str.Split(" .,;".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
I use something like the following to ensure I'm always splitting on Split's default whitespace characters:
public static string[] SplitOnWhitespaceAnd(this string value,
char[] separator, StringSplitOptions options = StringSplitOptions.RemoveEmptyEntries)
=> value.Split().SelectMany(s => s.Split(separator, options)).ToArray();
Note that to be consistent with Microsoft's naming conventions, you'd want to use WhiteSpace rather than Whitespace.
Refer to Microsoft's Char.IsWhiteSpace documentation to see the whitespace characters split on by default.
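Another way to get "all whitespace plus my own delimiters" without building a separator array is to test each character with char.IsWhiteSpace directly. The following is only a sketch; WhiteSpaceSplitter and SplitOnWhiteSpaceAnd are hypothetical names, and empty entries are always dropped here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WhiteSpaceSplitter
{
    // Hypothetical helper: splits on any char.IsWhiteSpace character and on
    // the extra delimiters, never returning empty entries.
    public static string[] SplitOnWhiteSpaceAnd(string text, params char[] extraDelimiters)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        foreach (char c in text)
        {
            bool isDelimiter = char.IsWhiteSpace(c) || Array.IndexOf(extraDelimiters, c) >= 0;
            if (!isDelimiter)
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A delimiter ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }

        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens.ToArray();
    }
}
```

Calling WhiteSpaceSplitter.SplitOnWhiteSpaceAnd("word1 word2\tword3.word4,word5;word6", '.', ',', ';') returns the six words word1 through word6.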
string[] splitSentence(string sentence)
{
return sentence
.Replace(",", " , ")
.Replace(".", " . ")
.Split(' ', StringSplitOptions.RemoveEmptyEntries);
}
or
string[] result = test.Split(new string[] {"\n", "\r\n"},
StringSplitOptions.RemoveEmptyEntries);
I have a string "mystring theEnd", but I want to do a string.Split on whitespace, not just on a single space, because I want to get a string[] that contains "mystring" and "theEnd". Between "mystring" and "theEnd" there is an unknown number of spaces, which is why I need to split on whitespace. Is there a way to do this?
How about:
string[] bits = text.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
(Or text.Split specifying the exact whitespace characters you want to split on, or using null as Henk suggested.)
Or you could use a regex to handle all whitespace characters:
Regex regex = new Regex(@"\s+");
string[] bits = regex.Split(text);
Simplest is to do:
a.Split(new [] {' ', '\t'},StringSplitOptions.RemoveEmptyEntries)
Thanks Jon :)
I want to make sure a string has only characters in this range
[a-z] && [A-Z] && [0-9] && [-]
so all letters and numbers plus the hyphen.
I tried this...
C# App:
char[] filteredChars = { ',', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '{', '}', '[', ']', ':', ';', '"', '\'', '?', '/', '.', '<', '>', '\\', '|' };
string s = str.TrimStart(filteredChars);
This TrimStart() only seems to work with letters, not other characters like $ % etc.
Did I implement it wrong?
Is there a better way to do it?
I just want to avoid looping through each string's index checking because there will be a lot of strings to do...
Thoughts?
Thanks!
This seems like a perfectly valid reason to use a regular expression.
bool stringIsValid = Regex.IsMatch(inputString, @"^[a-zA-Z0-9\-]*?$");
In response to miguel's comment, you could do this to remove all unwanted characters:
string cleanString = Regex.Replace(inputString, @"[^a-zA-Z0-9\-]", "");
Note that the caret (^) is now placed inside the character class, thus negating it (matching any non-allowed character).
Here's a fun way to do it with LINQ - no ugly loops, no complicated RegEx:
private string GetGoodString(string input)
{
var allowedChars =
Enumerable.Range('0', 10).Concat(
Enumerable.Range('A', 26)).Concat(
Enumerable.Range('a', 26)).Concat(
Enumerable.Range('-', 1));
var goodChars = input.Where(c => allowedChars.Contains(c));
return new string(goodChars.ToArray());
}
Feed it "Hello, world? 123!" and it will return "Helloworld123".
Why not just use Replace instead? TrimStart will only remove the leading characters in your list...
Try the following
public bool isStringValid(string input) {
if ( null == input ) {
throw new ArgumentNullException("input");
}
return System.Text.RegularExpressions.Regex.IsMatch(input, @"^[A-Za-z0-9\-]*$");
}
I'm sure that with a bit more time you can come up with something better, but this will give you a good idea:
public string NumberOrLetterOnly(string s)
{
string rtn = s;
for (int i = 0; i < s.Length; i++)
{
if (!char.IsLetterOrDigit(rtn[i]) && rtn[i] != '-')
{
rtn = rtn.Replace(rtn[i].ToString(), " ");
}
}
return rtn.Replace(" ", "");
}
I have tested these two solutions in LINQPad 5. Their benefit is that they work not only for integers but also for decimals/floats, whose decimal separator is culture dependent. For example, in Norway we use the comma as the decimal separator, whereas in the US the dot is used and the comma serves as a thousands separator. Anyway, first the LINQ version and then the Regex version. The most verbose bit is accessing the current thread's culture to get the number decimal separator; you can compress this a bit with a using static directive at the top of the code or, better, put such functionality into C# extension methods, preferably with overloads that take arbitrary Regex patterns.
string crappyNumber = @"40430dfkZZZdfldslkggh430FDFLDEFllll340-DIALNOWFORCHRISTSAKE.,CAKE-FORFIRSTDIAL920932903209032093294faøj##R#KKL##K";
string.Join("", crappyNumber.Where(c => char.IsDigit(c)|| c.ToString() == Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator)).Dump();
new String(crappyNumber.Where(c => Regex.IsMatch(c.ToString(), $"[\\d{Regex.Escape(Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator)}]")).ToArray()).Dump();
A note on the code above: the Dump() method writes the results to LINQPad's output, so your own code will of course skip that last part. Also note that we got it down to a one-liner, but it is still a bit verbose and can be put into C# extension methods as suggested.
Also, instead of string.Join, newing up a String object is more compact syntax and less error prone.
We got a crappy number as input, but we managed to get our number in the end! And it is Culture aware in C#!
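As a sketch of the extension-method idea mentioned above, the culture can be passed in explicitly instead of being read from the current thread. NumberTextExtensions and KeepDigitsAndDecimalSeparator are hypothetical names; when no culture is given, CultureInfo.CurrentCulture is assumed.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NumberTextExtensions
{
    // Hypothetical extension: keeps only digits and the decimal separator of
    // the given culture (CultureInfo.CurrentCulture when none is passed).
    public static string KeepDigitsAndDecimalSeparator(this string input, CultureInfo culture = null)
    {
        string separator = (culture ?? CultureInfo.CurrentCulture).NumberFormat.NumberDecimalSeparator;
        return new string(input
            .Where(c => char.IsDigit(c) || c.ToString() == separator)
            .ToArray());
    }
}
```

With the invariant culture, whose decimal separator is the dot, "a1.2b,3".KeepDigitsAndDecimalSeparator(CultureInfo.InvariantCulture) returns "1.23"; a culture that uses the comma would keep the comma and drop the dot instead. Like the one-liners above, this keeps every occurrence of the separator, so the result is not guaranteed to parse as a number.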