Let's say I have a foreach-loop with strings like this:
string newStr = "";
string str = "a b c d e";
foreach (string strChar in str.Split(' ')) {
    newStr += strChar + ",";
}
The result would be something like a,b,c,d,e, but what I want is a,b,c,d,e without the trailing comma. I normally strip the last comma afterwards, but this seems ugly and heavyweight. Is there a lightweight way to do this?
A follow-up question: is there an easy way to add an "and" so that the result is something like a, b, c, d and e for user output?
P.S.: I know that I can use the Replace method in this example, but that is not what I'm looking for, because in most cases you can't use it (for example when you build a SQL string).
I would use string.Join:
string newStr = string.Join(",", str.Split(' '));
Alternatively, you could add the separator at the start of the body of the loop, but not on the first time round.
I'd suggest using StringBuilder if you want to keep doing this by hand though. In fact, with a StringBuilder you could just unconditionally append the separator, and then decrement the length at the end to trim that end.
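A minimal sketch of that StringBuilder approach, assuming the same space-separated input as in the question. The class and method names (JoinSketch, JoinWithBuilder) are made up for illustration.

```csharp
using System;
using System.Text;

static class JoinSketch
{
    // Hypothetical helper: joins the space-separated parts of input with commas.
    public static string JoinWithBuilder(string input)
    {
        var sb = new StringBuilder();
        foreach (string part in input.Split(' '))
        {
            // Unconditionally append the item followed by the separator.
            sb.Append(part).Append(',');
        }
        if (sb.Length > 0)
        {
            // Shrink the builder by one character to drop the trailing separator.
            sb.Length--;
        }
        return sb.ToString();
    }
}
```

Calling JoinSketch.JoinWithBuilder("a b c d e") should give a,b,c,d,e. For a plain join like this, string.Join remains the shorter option.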
You also wrote:
for example when you build a sql string
It's very rarely a good idea to build a SQL string like this. In particular, you should absolutely not use strings from user input here - use parameterized SQL instead. Building SQL is typically the domain of ORM code... in which case it's usually better to use an existing ORM than to roll your own :)
You're characterizing the problem as appending a comma after every string except the last. Consider characterizing it as prepending a comma before every string but the first. It's an easier problem.
As for your harder version there are several dozen solutions on my blog and in this question.
Eric Lippert's challenge "comma-quibbling", best answer?
string.Join may be your friend:
string str = "a b c d e";
var newStr = string.Join(",", str.Split(' '));
Here's how you can do it where you have "and" before the last value.
var vals = str.Split(' ');
var ans = vals.Length == 1 ?
    str :
    string.Join(", ", vals.Take(vals.Length - 1)) + ", and " + vals.Last();
newStr = String.Join(",", str.Split(' '));
You can use Regex and replace whitespaces with commas
string newst = Regex.Replace(input, " ", ",");
First, you should be using a StringBuilder for string manipulations of this sort. Second, it's just an if conditional on the insert.
System.Text.StringBuilder newStr = new System.Text.StringBuilder("");
string oldStr = "a b c d e";
foreach(string c in oldStr.Split(' ')) {
if (newStr.Length > 0) newStr.Append(",");
newStr.Append(c);
}
I'm parsing some text and want to split it and add the pieces one by one.
But first things first: I think the best way is to replace multiple spaces with one unique delimiter.
Below is the sample target text:
Total fare 619,999.0d-
12 11 82139 09/13/2013 D 103,500.00 2/025189 PARK LA000137
09/13/2013 D 50.00 File Ticket - PS1309121018882/
Does anybody know how to handle this in C#?
the best way is to replace multiple spaces with one unique delimiter
I'm not really sure it's the best way, but the following works without a regex:
string newStr = string.Join(":",
str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
try
var strings = text.Split(' ').Where(str => str.Length > 0);
You can use a regular expression:
string delimiter = ":";
var whiteSpaceNormalised = Regex.Replace(input, @"\s+", delimiter);
Use regular expressions instead: replace each run of more than one space with a single space.
string parsedText = System.Text.RegularExpressions.Regex.Replace(inputString,"[ ]+"," ");
INPUT: A string containing numbers, letters, dots and spaces. Note that e acts as the separator between the numbers.
e.27.3.90.. .e 3.50 2.30..e2.0.1.2. .50..
OUTPUT: I want to remove all the spaces and the extra dots, keeping only the dot that acts as the decimal point in the output below, and add a , before each e:
,e273.90,e3502.30,e2012.50
Best catch was this How to remove extra decimal points?. But it's based on Javascript parseFloat().
I also saw this post : Convert to valid decimal data type. But that's in terms of SQL and pretty much using multiple replace().
PS: There are so many posts regarding regex in various kind. I tried to build one, but seems like no success so far.
Please propose any efficient one shot regex or ideas.
Would like to hear the performance gain/loss of this regex vs multiple replace()
Here is the code I have been gasping ;)..:
List<string> myList;
string s = "";
string s2 = "";
string str = "e.27.3.90..bl% .e 3.50 2.30. #rp.e2.0.1.2..50..y*x";
s = Regex.Replace(str, @"\b[a-df-z',\s]+", "");
myList = new List<string>(Regex.Split(s, @"[e]"));
Last str is your result
string str = "e.27.3.90..bl% .e 3.50 2.30. #rp.e2.0.1.2..50..y*x";
str = Regex.Replace(str, "[^e^0-9]", "");
str = Regex.Replace(str, "([0-9]{2}?)(e|$)", ".$1,$2");
//str = "," + str.Substring(0, str.Length - 1);
Remove all dots from the string.
Split the string into separate items at each "e".
For each item, add a dot before the last 2 digits.
Recombine the items back into one string, placing a comma between items.
These steps are easily performed with the standard String methods, but you could use regexes if you want.
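The steps above can be sketched with plain string methods and LINQ. This is only an illustration under a few assumptions: the names DotCleanup and Normalize are made up, every character other than a digit or e is treated as noise (not just dots), and each item has more than two digits.

```csharp
using System;
using System.Linq;

static class DotCleanup
{
    // Hypothetical helper implementing the four steps listed above.
    public static string Normalize(string input)
    {
        // Step 1: drop dots, spaces and anything else that is not a digit or 'e'.
        var cleaned = new string(input.Where(c => c == 'e' || char.IsDigit(c)).ToArray());

        // Step 2: split into separate items at each 'e'.
        var items = cleaned.Split(new[] { 'e' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 3: add a dot before the last two digits of each item.
        var formatted = items.Select(d =>
            "e" + (d.Length > 2 ? d.Insert(d.Length - 2, ".") : d));

        // Step 4: recombine, placing a comma before every item.
        return "," + string.Join(",", formatted);
    }
}
```

For the sample input e.27.3.90.. .e 3.50 2.30..e2.0.1.2. .50.. this should produce ,e273.90,e3502.30,e2012.50.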
string str = "Student_123_";
I need to replace the last character "_" with ",". I did it like this.
str.Remove(str.Length -1, 1);
str = str + ",";
However, is it possible to achieve this more efficiently, maybe in one line of code?
BTW, the last character can be any character, so Replace won't work here.
No.
In C#, strings are immutable, so you cannot change a string "in place". You must first remove part of the string and then create a new string. In fact, this also means your original code is wrong, since str.Remove(str.Length -1, 1); doesn't change str at all; it returns a new string! This should do:
str = str.Remove(str.Length -1, 1) + ",";
C# .NET makes it almost too easy.
str = str.TrimEnd('_');
Elegant but not very efficient.
Replaces any character at the end of str with a comma.
str = Regex.Replace(str, ".$", ",");
That's a limitation of working with string. You can use StringBuilder if you need to do a lot of changes like this. But it's not worth it for the simple task you need.
str = str.Substring(0, str.Length - 1) + ",";
Use the StringBuilder class
StringBuilder mbuilder = new StringBuilder("Student_123_");
mbuilder[mbuilder.Length-1] = ',';
Console.WriteLine(mbuilder.ToString());
str = str.Substring(0, str.Length-1) + ",";
Well, what you have won't work because str.Remove(...) doesn't manipulate str, it returns a new string with the removal operation completed on it.
So - you need:
str = str.Remove(str.Length-1,1);
str = str + ",";
In terms of efficiency, there are several other choices you could make (substring, trim ...) but ultimately you're going to get the same time/space complexity.
EDIT:
Also, don't try to squash everything into one line, the programmers who come after you will appreciate the greater readability. (Although in this case a single line is just as easy to read.) One line != more efficient.
With one line of code you could write:
str = str.Remove(str.Length - 1, 1) + ",";
str.Remove doesn't modify str, it returns a new string. Your first line should read str = str.Remove...
One line? OK: str = str.Remove(str.Length - 1) + ",";
I think that's as efficient as you're going to get. Technically, you are creating two new strings here, not one (The result of the Remove, and the result of the Concatenation). However, everything I can think of to not create two strings, ends up creating more than 1 other object to do so. You could use a StringBuilder, but that's heavier weight than an extra string, or perhaps a char[], but it's still an extra object, no better than what I have listed above.
//You mean like this? :D
string str = "Student_123_";
str = $"{str.Remove(str.Length -1)},";
For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#
But in the general case, what's the best way to strip punctuation in any language?
I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.
Related: Strip Punctuation in Python
new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());
Why not simply:
string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();
foreach (char c in s)
{
if (!char.IsPunctuation(c))
sb.Append(c);
}
s = sb.ToString();
The usage of RegEx is normally slower than simple char operations. And those LINQ operations look like overkill to me. And you can't use such code in .NET 2.0...
Describes intent, easiest to read (IMHO) and best performing:
s = s.StripPunctuation();
to implement:
public static class StringExtension
{
public static string StripPunctuation(this string s)
{
var sb = new StringBuilder();
foreach (char c in s)
{
if (!char.IsPunctuation(c))
sb.Append(c);
}
return sb.ToString();
}
}
This is using Hades32's algorithm which was the best performing of the bunch posted.
Assuming "best" means "simplest" I suggest using something like this:
String stripped = input.replaceAll("\\p{Punct}+", "");
This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).
Edit: the Unicode-Aware version would be this:
String stripped = input.replaceAll("\\p{P}+", "");
The first version only looks at punctuation characters contained in ASCII.
You can use the regex.replace method:
replace(YourString, RegularExpressionWithPunctuationMarks, Empty String)
Since this returns a string, your method will look something like this:
string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");
You can replace "[?!]" with something more sophisticated if you want:
(\p{P})
This should find any punctuation.
This thread is so old, but I'd be remiss not to post a more elegant (IMO) solution.
string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c);
It's LINQ sans WTF.
Based off GWLlosa's idea, I was able to come up with the supremely ugly, but working:
string s = "cat!";
s = s.ToCharArray().ToList<char>()
.Where<char>(x => !char.IsPunctuation(x))
.Aggregate<char, string>(string.Empty, new Func<string, char, string>(
delegate(string acc, char c) { return acc + c; }));
The most braindead simple way of doing it would be using string.replace
The other way I would imagine is a regex.replace and have your regular expression with all the appropriate punctuation marks in it.
Here's a slightly different approach using linq. I like AviewAnew's but this avoids the Aggregate
string myStr = "Hello there..';,]';';., Get rid of Punction";
var s = from ch in myStr
where !Char.IsPunctuation(ch)
select ch;
var bytes = UnicodeEncoding.ASCII.GetBytes(s.ToArray());
var stringResult = UnicodeEncoding.ASCII.GetString(bytes);
If you want to use this for tokenizing text you can use:
new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray())
For anyone who would like to do this via RegEx:
This code shows the full RegEx replace process and gives a sample Regex that only keeps letters, numbers, and spaces in a string - replacing ALL other characters with an empty string:
//Regex to remove all non-alphanumeric characters
System.Text.RegularExpressions.Regex TitleRegex = new
System.Text.RegularExpressions.Regex("[^a-z0-9 ]+",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);
return ParsedString;
I faced the same issue and was concerned about the performance impact of calling the IsPunctuation for every single check.
I found this post: http://www.dotnetperls.com/char-ispunctuation.
Reading between the lines: char.IsPunctuation also handles Unicode on top of ASCII.
The method matches a bunch of characters including control characters. By definition, this method is heavy and expensive.
The bottom line is that I finally didn't go for it because of its performance impact on my ETL process.
I went for the custom implementation from dotnetperls.
And just FYI, here is some code deduced from the previous answers to get the list of all punctuation characters (excluding the control ones):
var punctuationCharacters = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
var character = Convert.ToChar(i);
if (char.IsPunctuation(character) && !char.IsControl(character))
{
punctuationCharacters.Add(character);
}
}
var commaSeparatedValueOfPunctuationCharacters = string.Join("", punctuationCharacters);
Console.WriteLine(commaSeparatedValueOfPunctuationCharacters);
Cheers,
Andrew
$newstr=ereg_replace("[[:punct:]]",'',$oldstr);
For long strings I use this:
var normalized = input
.Where(c => !char.IsPunctuation(c))
.Aggregate(new StringBuilder(),
(current, next) => current.Append(next), sb => sb.ToString());
This performs much better than using string concatenation (though I agree it's less intuitive).
This is simple code for removing punctuation from a string given by the user:
# Import the required library
import string
# Ask the user for input as a string
strs = str(input('Enter your string:'))
for c in string.punctuation:
    strs = strs.replace(c, "")
print(f"\n Your String without punctuation:{strs}")
#include<iostream>
#include<string>
#include<cctype>
using namespace std;
int main(int a, char* b[]){
string strOne = "H,e.l/l!o W#o#r^l&d!!!";
int punct_count = 0;
cout<<"before : "<<strOne<<endl;
for(string::size_type ix = 0 ;ix < strOne.size();++ix)
{
if(ispunct(strOne[ix]))
{
++punct_count;
strOne.erase(ix,1);
ix--;
}//if
}
cout<<"after : "<<strOne<<endl;
return 0;
}//main
I'm doing simple string input parsing and I am in need of a string tokenizer. I am new to C# but have programmed Java, and it seems natural that C# should have a string tokenizer. Does it? Where is it? How do I use it?
You could use String.Split method.
class ExampleClass
{
public ExampleClass()
{
string exampleString = "there is a cat";
// Split string on spaces. This will separate all the words in a string
string[] words = exampleString.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
// there
// is
// a
// cat
}
}
}
For more information see Sam Allen's article about splitting strings in c# (Performance, Regex)
I just want to highlight the power of C#'s Split method and give a more detailed comparison, particularly from someone who comes from a Java background.
Whereas StringTokenizer in Java only allows a single delimiter, we can actually split on multiple delimiters making regular expressions less necessary (although if one needs regex, use regex by all means!) Take for example this:
str.Split(new char[] { ' ', '.', '?' })
This splits on three different delimiters, returning an array of tokens. We can also remove empty entries by passing a second parameter to the call above:
str.Split(new char[] { ' ', '.', '?' }, StringSplitOptions.RemoveEmptyEntries)
One thing Java's StringTokenizer does have that I believe C#'s Split is lacking (at least Java 7 has this feature) is the ability to keep the delimiter(s) as tokens. C#'s Split will discard the delimiters. This could be important in, say, some NLP applications, but for more general-purpose applications it might not be a problem.
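If the delimiters do need to be kept, one possible workaround is Regex.Split: when the pattern contains a capturing group, .NET includes the captured text in the result. The sketch below assumes the same three delimiters as above; the names DelimiterSketch and SplitKeepingDelimiters are made up for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class DelimiterSketch
{
    // Hypothetical helper: splits on space, dot and question mark,
    // returning the delimiters as tokens alongside the words.
    public static string[] SplitKeepingDelimiters(string input)
    {
        // The parentheses capture each delimiter, so Regex.Split returns it too.
        return Regex.Split(input, @"([ .?])")
            .Where(token => token.Length > 0)   // drop empty entries at the edges
            .ToArray();
    }
}
```

For example, SplitKeepingDelimiters("a.b?") should return the four tokens a, ., b and ?.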
The Split method of a string is what you need. In fact, the StringTokenizer class in Java is a legacy class whose use is discouraged in favor of Java's String.split method.
I think the nearest in the .NET Framework is
string.Split()
For complex splitting you could use a regex creating a match collection.
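A rough sketch of the match-collection idea, assuming that a "token" is simply a run of word characters (\w+); a real tokenizer would likely need a more specific pattern. The names MatchTokenizerSketch and Tokenize are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchTokenizerSketch
{
    // Hypothetical helper: describes what a token looks like
    // instead of describing what separates tokens.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        // Regex.Matches returns a MatchCollection; each Match is one token.
        foreach (Match match in Regex.Matches(input, @"\w+"))
        {
            tokens.Add(match.Value);
        }
        return tokens;
    }
}
```

With this pattern, Tokenize("there is a cat.") should yield there, is, a and cat, with the trailing dot ignored.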
_words = new List<string>(YourText.ToLower().Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetter).ToArray())));
Or
_words = new List<string>(YourText.Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetterOrDigit).ToArray())));
The method most similar to Java's is:
Regex.Split(string, pattern);
where
string - the text you need to split
pattern - string type pattern, what is splitting the text
use Regex.Split(string,"#|#");
Read this: the Split function has an overload that takes an array of separators.
http://msdn.microsoft.com/en-us/library/system.stringsplitoptions.aspx
If you're trying to do something like splitting command line arguments in a .NET Console app, you're going to have issues because .NET is either broken or is trying to be clever (which means it's as good as broken). I needed to be able to split arguments by the space character, preserving any literals that were quoted so they didn't get split in the middle. This is the code I wrote to do the job:
private static List<String> Tokenise(string value, char seperator)
{
List<string> result = new List<string>();
value = value.Replace("  ", " ").Replace("  ", " ").Trim();
StringBuilder sb = new StringBuilder();
bool insideQuote = false;
foreach(char c in value.ToCharArray())
{
if(c == '"')
{
insideQuote = !insideQuote;
}
if((c == seperator) && !insideQuote)
{
if (sb.ToString().Trim().Length > 0)
{
result.Add(sb.ToString().Trim());
sb.Clear();
}
}
else
{
sb.Append(c);
}
}
if (sb.ToString().Trim().Length > 0)
{
result.Add(sb.ToString().Trim());
}
return result;
}
If you are using C# 3.0 (.NET 3.5) or later, you could write an extension method on System.String that does the splitting you need. You can then use the syntax:
string.SplitByMyTokens();
More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx
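A minimal sketch of such an extension method. The method name SplitByMyTokens comes from the answer above; the class name and the particular delimiter set (space, comma, semicolon) are assumptions chosen for illustration.

```csharp
using System;

public static class StringSplitExtensions
{
    // Assumed delimiter set; replace with whatever your input actually uses.
    private static readonly char[] Tokens = { ' ', ',', ';' };

    // Extension method: the 'this' modifier lets it be called as text.SplitByMyTokens().
    public static string[] SplitByMyTokens(this string text)
    {
        return text.Split(Tokens, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this in scope, "a, b;c".SplitByMyTokens() should return a, b and c.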