Is there a better way of doing this?
MyString.Trim().Replace("&", "and").Replace(",", "").Replace("  ", " ")
.Replace(" ", "-").Replace("'", "").Replace("/", "").ToLower();
I've wrapped this in a string extension method so it's a single call, but is there a quicker way?
public static class StringExtension
{
    public static string clean(this string s)
    {
        return s.Replace("&", "and").Replace(",", "").Replace("  ", " ")
                .Replace(" ", "-").Replace("'", "").Replace(".", "")
                .Replace("eacute;", "é").ToLower();
    }
}
Just for fun (and to stop the arguments in the comments)
I've shoved a gist up benchmarking the various examples below.
https://gist.github.com/ChrisMcKee/5937656
The regex option scores terribly; the dictionary option comes out fastest; the long-winded version of the StringBuilder replace is slightly faster than the shorthand.
Quicker: no. More efficient: yes, if you use the StringBuilder class. In your implementation each Replace call creates a new copy of the string, which in some circumstances can hurt performance. Strings are immutable, so every operation returns a modified copy.
If you expect this method to be called frequently on many strings of significant length, it might be better to "migrate" its implementation to the StringBuilder class. StringBuilder modifies its own buffer in place, so you avoid the intermediate string copies.
using System.Text;
public static class StringExtension
{
    public static string clean(this string s)
    {
        StringBuilder sb = new StringBuilder(s);
        sb.Replace("&", "and");
        sb.Replace(",", "");
        sb.Replace("  ", " ");
        sb.Replace(" ", "-");
        sb.Replace("'", "");
        sb.Replace(".", "");
        sb.Replace("eacute;", "é");
        return sb.ToString().ToLower();
    }
}
If you are simply after a pretty solution and don't need to save a few nanoseconds, how about some LINQ sugar?
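// needs: using System.Collections.Generic; using System.Linq;
var input = "test1test2test3";
var replacements = new Dictionary<string, string> { { "1", "*" }, { "2", "_" }, { "3", "&" } };
// Aggregate threads the string through each Replace call in turn; output == "test*test_test&"
var output = replacements.Aggregate(input, (current, replacement) => current.Replace(replacement.Key, replacement.Value));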
This will be more efficient:
using System.Text;
public static class StringExtension
{
    public static string clean(this string s)
    {
        return new StringBuilder(s)
            .Replace("&", "and")
            .Replace(",", "")
            .Replace("  ", " ")
            .Replace(" ", "-")
            .Replace("'", "")
            .Replace(".", "")
            .Replace("eacute;", "é")
            .ToString()
            .ToLower();
    }
}
Maybe a little more readable?
using System.Collections.Generic;
public static class StringExtension {
    private static readonly Dictionary<string, string> _replacements = new Dictionary<string, string>();
    static StringExtension() {
        _replacements["&"] = "and";
        _replacements[","] = "";
        _replacements["  "] = " ";
        // etc...
    }
    public static string clean(this string s) {
        // Dictionary enumeration order is not guaranteed, so don't rely on one
        // replacement running before another; use an ordered list if order matters.
        foreach (string to_replace in _replacements.Keys) {
            s = s.Replace(to_replace, _replacements[to_replace]);
        }
        return s;
    }
}
You could also combine this with New In Town's StringBuilder suggestion.
There is one thing that may be optimized in the suggested solutions. Having many calls to Replace() makes the code do multiple passes over the same string. With very long strings these solutions may be slow because of CPU cache capacity misses. Maybe one should consider replacing multiple strings in a single pass.
The essential content from that link:
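// needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
static string MultipleReplace(string text, Dictionary<string, string> replacements) {
    // Escape each key so characters such as "." or "(" are matched literally
    string pattern = "(" + string.Join("|", replacements.Keys.Select(Regex.Escape)) + ")";
    // One pass over the text: each match is looked up in the dictionary and replaced
    return Regex.Replace(text, pattern, m => replacements[m.Value]);
}
// somewhere else in code
var adict = new Dictionary<string, string>();
string temp = "Jonathan Smith is a developer";
adict.Add("Jonathan", "David");
adict.Add("Smith", "Seruyange");
string rep = MultipleReplace(temp, adict);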
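A variation on the single-pass idea, offered as a sketch only. .NET regex alternation tries alternatives left to right, so ordering the keys longest-first means a longer key such as "ab" wins over a shorter one such as "a" at the same position. `MultiReplace` and `ReplaceAll` are hypothetical names, and the sketch assumes every key is non-empty.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiReplace
{
    // Hypothetical helper: replaces every key of `replacements` in a single pass over `text`.
    // Assumes all keys are non-empty strings.
    public static string ReplaceAll(string text, IDictionary<string, string> replacements)
    {
        if (string.IsNullOrEmpty(text) || replacements.Count == 0)
            return text;

        // Longest keys first, so the alternation prefers "ab" over "a" at the same position;
        // Regex.Escape makes characters such as "." or "(" match literally.
        string pattern = string.Join("|",
            replacements.Keys
                .OrderByDescending(k => k.Length)
                .Select(Regex.Escape));

        // Each match is looked up in the table; text produced by one replacement
        // is never re-scanned by another, unlike chained Replace calls.
        return Regex.Replace(text, pattern, m => replacements[m.Value]);
    }
}
```

Note that a single pass is not always a drop-in substitute for chained Replace calls. In the question, "  " → " " is followed by " " → "-", and a single pass would not chain those two rules.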
Another option using LINQ:
[TestMethod]
public void Test()
{
var input = "it's worth a lot of money, if you can find a buyer.";
var expected = "its worth a lot of money if you can find a buyer";
var removeList = new string[] { ".", ",", "'" };
var result = input;
removeList.ToList().ForEach(o => result = result.Replace(o, string.Empty));
Assert.AreEqual(expected, result);
}
I'm doing something similar, but in my case I'm doing serialization/deserialization, so I need to be able to go in both directions. I find that a string[][] works nearly identically to the dictionary, including initialization, but you can also go in the other direction and return the substitutes to their original values, which a dictionary really isn't set up to do.
Edit: You can use Dictionary<Key, List<Values>> to obtain the same result as a string[][].
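The answer describes this approach in prose only. Here is a minimal sketch of the idea, using a hypothetical escape table: the `%25`/`%2C`/`%3A` codes and the `ReversibleReplace` name are illustrative and do not come from the answer. Each row is { original, substitute }. Encoding walks the rows forwards and decoding walks them backwards, so the escape character itself is encoded first and decoded last.

```csharp
public static class ReversibleReplace
{
    // Hypothetical table: each row is { original, substitute }.
    // "%" must come first so that substitutes introduced later are not re-escaped.
    private static readonly string[][] Pairs =
    {
        new[] { "%", "%25" },
        new[] { ",", "%2C" },
        new[] { ":", "%3A" },
    };

    // Forward direction: original -> substitute, in table order
    public static string Encode(string s)
    {
        foreach (var pair in Pairs)
            s = s.Replace(pair[0], pair[1]);
        return s;
    }

    // Reverse direction: substitute -> original, in reverse table order
    public static string Decode(string s)
    {
        for (int i = Pairs.Length - 1; i >= 0; i--)
            s = s.Replace(Pairs[i][1], Pairs[i][0]);
        return s;
    }
}
```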
A regular expression with a MatchEvaluator could also be used:
var pattern = new Regex(@"These|words|are|placed|in|parentheses");
var input = "The matching words in this text are being placed inside parentheses.";
var result = pattern.Replace(input, match => $"({match.Value})");
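Without anchors, the alternation above also matches "in" inside words such as "matching", "being" and "inside". The sketch below wraps the alternatives in word boundaries (\b) so that only whole words are parenthesized. `WholeWordDemo` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // \b on both sides restricts matches to whole words; (?: ... ) groups the alternatives
    private static readonly Regex Words =
        new Regex(@"\b(?:These|words|are|placed|in|parentheses)\b");

    // The MatchEvaluator lambda receives each Match and returns its replacement text
    public static string Wrap(string input) =>
        Words.Replace(input, match => $"({match.Value})");
}
```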
Note:
A different expression (such as \b(\w*test\w*)\b) could of course be used to match whole words.
I was hoping this would be more efficient, since finding the matches and doing the replacements happen together.
The advantage is the ability to process each matched element while doing the replacement.
This is essentially Paolo Tedesco's answer, but I wanted to make it reusable.
public class StringMultipleReplaceHelper
{
private readonly Dictionary<string, string> _replacements;
public StringMultipleReplaceHelper(Dictionary<string, string> replacements)
{
_replacements = replacements;
}
public string clean(string s)
{
foreach (string to_replace in _replacements.Keys)
{
s = s.Replace(to_replace, _replacements[to_replace]);
}
return s;
}
}
One thing to note: I had to stop it being an extension, remove the static modifiers, and remove this from clean(this string s). I'm open to suggestions on how to implement this better.
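One possible answer to the closing question, as a sketch: the method can stay a static extension if the replacement table is passed in as a parameter instead of being held in an instance field. Taking an ordered sequence of pairs, rather than relying on Dictionary enumeration order, also makes the order in which the replacements run explicit. `StringReplaceExtensions` and `ReplaceAll` are hypothetical names.

```csharp
using System.Collections.Generic;

public static class StringReplaceExtensions
{
    // Applies each (find, replace) pair in the order given;
    // later pairs see the output of earlier ones.
    public static string ReplaceAll(this string s, IEnumerable<KeyValuePair<string, string>> replacements)
    {
        foreach (var pair in replacements)
            s = s.Replace(pair.Key, pair.Value);
        return s;
    }
}
```

The caller can keep the rules in one array built with KeyValuePair.Create(...), which is available in .NET Core 2.0 and later, and reuse it wherever it is needed.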
string input = "it's worth a lot of money, if you can find a buyer.";
for (dynamic i = 0, repl = new string[,] { { "'", "''" }, { "money", "$" }, { "find", "locate" } }; i < repl.Length / 2; i++) {
input = input.Replace(repl[i, 0], repl[i, 1]);
}
Can somebody please tell me why a space comes up with 2 matches for the below pattern?
((?<key>(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)):(?<value>([^ "]+)|("[^"]*?")+))*
Trying to match the following cases:
var body = "Key:Hello";
var body = "Key:\"Hello\"";
var body = "Key1:Hello Key2:\"Goodbye\"";
This may provide more context:
pattern = @"((?<key>" + StringExtensions.REGEX_IDENTIFIER_MIDSTRING + "):(?<value>([^ \"]+)|(\"[^\"]*?\")+))*";
My goal is to pull the keys and values out of a command-line-like string in the form [key]:[value], with optional repeats. Values can either be a single token with no spaces or a quoted string that may contain spaces.
Probably right there in front of me but I'm not seeing it.
Probably because of the “.”: in a regex, a period matches every character except line breaks.
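Another explanation worth checking, offered as a sketch rather than a definitive diagnosis: the whole pattern is wrapped in an outer (...)*, so it can match zero repetitions, that is, the empty string. Against " " it cannot match a key:value pair, so Regex.Matches reports one empty match at index 0 and another at index 1 (the end of the string). Dropping the outer group and its quantifier, then iterating over the matches, avoids those empty results. `KeyValueDemo`, `StarPattern`, `PairPattern` and `Parse` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // The pattern from the question: the outer (...)* allows zero repetitions,
    // so it also matches the empty string wherever no key:value pair starts.
    public const string StarPattern =
        @"((?<key>(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)):(?<value>([^ ""]+)|(""[^""]*?"")+))*";

    // Same pair pattern without the outer quantifier: every match consumes one key:value pair.
    public const string PairPattern =
        @"(?<key>(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)):(?<value>([^ ""]+)|(""[^""]*?"")+)";

    // Collects each pair; quoted values keep their surrounding quotes.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, PairPattern))
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        return result;
    }
}
```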
I took a different approach:
public static Dictionary<string, string> GetCommandLineKeyValues(this string commandLine)
{
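var keyValues = new Dictionary<string, string>();
// RegexGet, RegexGetMatch, RegexIsMatch and match.Replace are not part of the BCL; presumably the author's own extensions (not shown)
var pattern = @"(?<command>(" + StringExtensions.REGEX_IDENTIFIER + " )?)(?<args>.*)";
var args = commandLine.RegexGet(pattern, "args");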
Match match;
if (args.Length > 0)
{
string key;
string value;
pattern = @" ?(?<key>" + StringExtensions.REGEX_IDENTIFIER_MIDSTRING + ")*?:(?<value>([^ \"]+)|(\"[^\"]*?\")+)";
do
{
match = args.RegexGetMatch(pattern);
if (match == null)
{
break;
}
key = match.Groups["key"].Value;
value = match.Groups["value"].Value;
keyValues.Add(key, value);
args = match.Replace(args, string.Empty);
}
while (args.RegexIsMatch(pattern));
}
return keyValues;
}
I took what I call the "pac-man" approach to regex: match, eat (hence the Match.Replace), and continue matching.
For convenience:
public const string REGEX_IDENTIFIER = @"^(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)$";
I'm trying to find a cleaner way of performing multiple sequential replacements on a single string where each replacement has a unique pattern and string replacement.
For example, if I have 3 pattern/substitution pairs:
1. /(?<!\\)\\n/, "\n"
2. /(\\)(?=[\;\:\,])/, ""
3. /(\\{2})/, "\\"
I want to apply regex replacement 1 on the original string, then apply 2 on the output of 1, and so on and so forth.
The following console program example does exactly what I want, but it has a lot of repetition, I am looking for a cleaner way to do the same thing.
SanitizeString
public static string SanitizeString(string param)
{
    string retval = param;
    //first replacement
    Regex SanitizePattern = new Regex(@"([\\\;\:\,])");
    retval = SanitizePattern.Replace(retval, @"\$1");
    //second replacement
    SanitizePattern = new Regex(@"\r\n?|\n");
    retval = SanitizePattern.Replace(retval, @"\n");
    return retval;
}
ParseCommands
public static string ParseCommands(string param)
{
    string retval = param;
    //first replacement
    Regex SanitizePattern = new Regex(@"(?<!\\)\\n");
    retval = SanitizePattern.Replace(retval, System.Environment.NewLine);
    //second replacement
    SanitizePattern = new Regex(@"(\\)(?=[\;\:\,])");
    retval = SanitizePattern.Replace(retval, "");
    //third replacement
    SanitizePattern = new Regex(@"(\\{2})");
    retval = SanitizePattern.Replace(retval, @"\");
    return retval;
}
Main
using System;
using System.IO;
using System.Text.RegularExpressions;
...
static void Main(string[] args)
{
    //read text that contains user input
    string sampleText = File.ReadAllText(@"c:\sample.txt");
    //sanitize input with certain rules
    sampleText = SanitizeString(sampleText);
    File.WriteAllText(@"c:\sanitized.txt", sampleText);
    //parses escaped characters back into the original text
    sampleText = ParseCommands(sampleText);
    File.WriteAllText(@"c:\parsed_back.txt", sampleText);
}
Don't mind the file operations. I just used that as a quick way to visualize the actual output. In my program I'm going to use something different.
Here's one way:
var replacements = new List<(Regex regex, string replacement)>()
{
    (new Regex(@"(?<!\\)\\n"), System.Environment.NewLine),
    (new Regex(@"(\\)(?=[\;\:\,])"), ""),
    (new Regex(@"(\\{2})"), @"\"),
};
(Ideally, cache that in a static readonly field.)
Then:
string retval = param;
foreach (var (regex, replacement) in replacements)
{
retval = regex.Replace(retval, replacement);
}
Or you could go down the LINQ route:
string retval = replacements
.Aggregate(param, (str, x) => x.regex.Replace(str, x.replacement));
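Following the "cache it in a static readonly field" suggestion, here is a sketch of what that could look like. The class name `CommandParser` and the use of RegexOptions.Compiled are assumptions and are not part of the answer. Regex instances are immutable and thread-safe, so sharing one static array across calls is fine.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommandParser
{
    // Built once when the class is first used, then reused by every call.
    private static readonly (Regex Pattern, string Replacement)[] ParseRules =
    {
        (new Regex(@"(?<!\\)\\n", RegexOptions.Compiled), Environment.NewLine),
        (new Regex(@"(\\)(?=[\;\:\,])", RegexOptions.Compiled), ""),
        (new Regex(@"(\\{2})", RegexOptions.Compiled), @"\"),
    };

    // Applies the rules in order; each rule sees the output of the previous one.
    public static string ParseCommands(string param) =>
        ParseRules.Aggregate(param, (str, rule) => rule.Pattern.Replace(str, rule.Replacement));
}
```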
I have an object array:
object[] keys
I need to transform this array into a string which is comma separated and I did it by doing this:
var newKeys = string.Join(",", keys);
My problem is that I want these values to be double-quoted.
For example:
"value1","value2","value3"
var quoted = "\"" + string.Join("\",\"", keys) + "\"";
To include a double quote in a string, you escape it with a backslash, so "\"" is a string consisting of a single double quote character, and "\",\"" is a string containing a double quote, a comma, and another double quote.
Give this a try:
var keys = new object[] { "test1", "hello", "world", null, "", "oops"};
var csv = string.Join(",", keys.Select(k => string.Format("\"{0}\"", k)));
Because you have an object[] array, string.Format can handle null as well as types other than string. On .NET 3.5, string.Join only accepts a string[], so add .ToArray() after the Select there.
When the object[] array is empty, an empty string is returned.
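To make the empty-array difference between the two answers concrete, here is a small sketch (the class and method names are hypothetical). Wrapping the joined string in quotes still produces "" (two quote characters) for an empty array, while quoting each element produces an empty string.

```csharp
using System.Linq;

public static class QuotedJoin
{
    // First answer: join with "," between quotes, then wrap the whole result in quotes.
    public static string WrapJoined(object[] keys) =>
        "\"" + string.Join("\",\"", keys) + "\"";

    // Second answer: quote each element, then join; a null element becomes "" (two quotes).
    public static string QuoteEach(object[] keys) =>
        string.Join(",", keys.Select(k => string.Format("\"{0}\"", k)));
}
```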
If performance is key, you can always use a StringBuilder to concatenate everything.
Here's a fiddle to see it in action, but the main part can be summarized as:
// these look like snails, but they are actually pretty fast
using @_____ = System.Collections.Generic.IEnumerable<object>;
using @______ = System.Func<object, object>;
using @_______ = System.Text.StringBuilder;
public static string GetCsv(object[] input)
{
// use a string builder to make things faster
var @__ = new StringBuilder();
// the rest should be self-explanatory
Func<@_____, @______, @_____>
@____ = (_6,
_2) => _6.Select(_2);
Func<@_____, object> @_3 = _6
=> _6.FirstOrDefault();
Func<@_____, @_____> @_4 = _8
=> _8.Skip(input.Length - 1);
Action<@_______, object> @_ = (_9,
_2) => _9.Append(_2);
Action<@_______>
@___ = _7 =>
{ if (_7.Length > 0) @_(
@__, ",");
}; var @snail =
@____(input, (@_0 =>
{ @___(@__); @_(@__, @"""");
@_(@__, @_0); @_(@__, @"""");
return @__; }));
var @linq = @_4(@snail);
var @void = @_3(@linq);
// get the result
return #__.ToString();
}
I have a string
"[\"1,1\",\"2,2\"]"
and I want to turn this string into this
1,1,2,2
I'm using the Replace function for that, like this:
obj.str.Replace("[","").Replace("]","").Replace("\\","");
But it does not return the expected result.
Please help.
You haven't removed the double quotes. Use the following:
obj.str = obj.str.Replace("[","").Replace("]","").Replace("\\","").Replace("\"", "");
Here is an optimized approach in case the string or the list of exclude-characters is long:
public static class StringExtensions
{
public static String RemoveAll(this string input, params Char[] charactersToRemove)
{
if(string.IsNullOrEmpty(input) || (charactersToRemove==null || charactersToRemove.Length==0))
return input;
var exclude = new HashSet<Char>(charactersToRemove); // removes duplicates and has constant lookup time
var sb = new StringBuilder(input.Length);
foreach (Char c in input)
{
if (!exclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}
}
Use it in this way:
str = str.RemoveAll('"', '[', ']', '\\');
// or use a string as "remove-array":
string removeChars = "\"{[]\\";
str = str.RemoveAll(removeChars.ToCharArray());
You should do the following:
obj.str = obj.str.Replace("[","").Replace("]","").Replace("\"","");
The string.Replace method does not replace string content in place. This means that if you have
string test = "12345" and call
test.Replace("2", "1");
test will still be "12345". Replace doesn't change the string itself; it creates a new string with the replaced content. So you need to assign this new string to a new or the same variable:
changedTest = test.Replace("2", "1");
Now changedTest will contain "11345".
Another note on your code: your string doesn't actually contain a \ character. The backslash is only displayed to escape the quote character. If you want to know more about this, read the MSDN article on string literals.
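The Replace behaviour described in that answer can be checked with a compact sketch (`ImmutabilityDemo` is a hypothetical name). It returns both strings so that the difference is visible:

```csharp
public static class ImmutabilityDemo
{
    // Returns the original string and the result of Replace after both calls
    public static (string Test, string ChangedTest) Run()
    {
        string test = "12345";
        test.Replace("2", "1");                      // new string is created and discarded; test is unchanged
        string changedTest = test.Replace("2", "1"); // keep the returned copy instead
        return (test, changedTest);
    }
}
```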
How about:
var exclusions = new HashSet<char>(new[] { '"', '[', ']', '\\' });
return new string(obj.str.Where(c => !exclusions.Contains(c)).ToArray());
This does it all in a single pass.
As Tim Schmelter writes, if you wanted to do it often, especially with large exclusion sets over long strings, you could make an extension like this.
public static string Strip(
this string source,
params char[] exclusions)
{
if (!exclusions.Any())
{
return source;
}
var mask = new HashSet<char>(exclusions);
var result = new StringBuilder(source.Length);
foreach (var c in source.Where(c => !mask.Contains(c)))
{
result.Append(c);
}
return result.ToString();
}
so you could do:
var result = "[\"1,1\",\"2,2\"]".Strip('"', '[', ']', '\\');
Capture the numbers only with this regular expression [0-9]+ and then concatenate the matches:
var input = "[\"1,1\",\"2,2\"]";
var regex = new Regex("[0-9]+");
var matches = regex.Matches(input).Cast<Match>().Select(m => m.Value);
var result = string.Join(",", matches);
In many places in our code we have collections of objects, from which we need to create a comma-separated list. The type of collection varies: it may be a DataTable from which we need a certain column, or a List<Customer>, etc.
Currently we loop through the collection and use string concatenation, for example:
string text = "";
string separator = "";
foreach (DataRow row in table.Rows)
{
text += separator + row["title"];
separator = ", ";
}
Is there a better pattern for this? Ideally I would like an approach we could reuse by just sending in a function to get the right field/property/column from each object.
string.Join(", ", Array.ConvertAll(somelist.ToArray(), i => i.ToString()))
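static string ToCsv<T>(IEnumerable<T> things, Func<T, string> toStringMethod)
{
    StringBuilder sb = new StringBuilder();
    foreach (T thing in things)
        sb.Append(toStringMethod(thing)).Append(',');
    // remove the trailing comma; an empty collection yields an empty string
    return sb.Length == 0 ? string.Empty : sb.ToString(0, sb.Length - 1);
}
Use like this:
DataTable dt = ...; //datatable with some data
// Rows is a non-generic DataRowCollection and each cell is an object, hence Cast<DataRow>() and ToString()
Console.WriteLine(ToCsv(dt.Rows.Cast<DataRow>(), row => row["ColName"].ToString()));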
or:
List<Customer> customers = ...; //assume Customer has a Name property
Console.WriteLine(ToCsv(customers, c => c.Name));
I don't have a compiler to hand but in theory it should work. And as everyone knows, in theory, practice and theory are the same. In practice, they're not.
I found that string.Join combined with a LINQ Select lambda lets you write minimal code.
List<string> fruits = new List<string>();
fruits.Add("Mango");
fruits.Add("Banana");
fruits.Add("Papaya");
string commaSepFruits = string.Join(",", fruits.Select(f => "'" + f + "'"));
Console.WriteLine(commaSepFruits);
List<int> ids = new List<int>();
ids.Add(1001);
ids.Add(1002);
ids.Add(1003);
string commaSepIds = string.Join(",", ids);
Console.WriteLine(commaSepIds);
List<Customer> customers = new List<Customer>();
customers.Add(new Customer { Id = 10001, Name = "John" });
customers.Add(new Customer { Id = 10002, Name = "Robert" });
customers.Add(new Customer { Id = 10002, Name = "Ryan" });
string commaSepCustIds = string.Join(", ", customers.Select(cust => cust.Id));
string commaSepCustNames = string.Join(", ", customers.Select(cust => "'" + cust.Name + "'"));
Console.WriteLine(commaSepCustIds);
Console.WriteLine(commaSepCustNames);
Console.ReadLine();
// using System.Collections;
// using System.Collections.Generic;
// using System.Linq;
// using System.Text;
public delegate string Indexer<T>(T obj);
public static string concatenate<T>(IEnumerable<T> collection, Indexer<T> indexer, char separator)
{
    StringBuilder sb = new StringBuilder();
    foreach (T t in collection) sb.Append(indexer(t)).Append(separator);
    // drop the trailing separator; an empty collection yields an empty string
    return sb.Length == 0 ? string.Empty : sb.Remove(sb.Length - 1, 1).ToString();
}
// version for non-generic collections
public static string concatenate<T>(IEnumerable collection, Indexer<T> indexer, char separator)
{
    StringBuilder sb = new StringBuilder();
    foreach (object t in collection) sb.Append(indexer((T)t)).Append(separator);
    return sb.Length == 0 ? string.Empty : sb.Remove(sb.Length - 1, 1).ToString();
}
// example 1: simple int list
string getAllInts(IEnumerable<int> listOfInts)
{
return concatenate<int>(listOfInts, Convert.ToString, ',');
}
// example 2: DataTable.Rows
string getTitle(DataRow row) { return row["title"].ToString(); }
string getAllTitles(DataTable table)
{
return concatenate<DataRow>(table.Rows, getTitle, '\n');
}
// example 3: DataTable.Rows without Indexer function
string getAllTitles(DataTable table)
{
return concatenate<DataRow>(table.Rows, r => r["title"].ToString(), '\n');
}
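In .NET 4 and later you can just do string.Join(", ", table.Rows.Cast<DataRow>().Select(r => r["title"])). The Cast<DataRow>() is needed because DataRowCollection only implements the non-generic IEnumerable.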
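Here is a slightly fuller sketch of that one-liner. It assumes a DataTable with a "title" column, and the helper name `JoinColumn` is hypothetical. string.Join calls ToString() on each cell value, so the column type does not have to be string.

```csharp
using System.Data;
using System.Linq;

public static class DataTableCsv
{
    // Cast<DataRow>() turns the non-generic Rows collection into IEnumerable<DataRow>,
    // which makes LINQ operators such as Select available.
    public static string JoinColumn(DataTable table, string column, string separator = ", ") =>
        string.Join(separator, table.Rows.Cast<DataRow>().Select(r => r[column]));
}
```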
You could write a function that transforms an IEnumerable<string> into a comma-separated string:
public string Concat(IEnumerable<string> stringList)
{
StringBuilder textBuilder = new StringBuilder();
string separator = String.Empty;
foreach(string item in stringList)
{
textBuilder.Append(separator);
textBuilder.Append(item);
separator = ", ";
}
return textBuilder.ToString();
}
You can then use LINQ to query your collection/dataset/etc to provide the stringList.
As an aside: The first modification I would make is to use the StringBuilder Class instead of just a String - it'll save resources for you.
I love Matt Howells' answer in this post:
I had to make it into an extension:
public static string ToCsv<T>(this IEnumerable<T> things, Func<T, string> toStringMethod)
Usage (I am getting all the emails and turning them into a CSV string for emails):
var list = Session.Find("from User u where u.IsActive = true").Cast<User>();
return list.ToCsv(i => i.Email);
For collections you can use this method as well, for example:
string.Join(", ", contactsCollection.Select(i => i.FirstName));
You can select any property that you want to separate.
string strTest = "1,2,4,6";
string[] Nums = strTest.Split(',');
Console.Write(Nums.Aggregate<string>((first, second) => first + "," + second));
//OUTPUT:
//1,2,4,6
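For comparison, here is a sketch (hypothetical names) showing that string.Join produces the same output. Unlike the seedless Aggregate, which throws InvalidOperationException on an empty sequence, string.Join returns an empty string. It also avoids building a new intermediate string at every step.

```csharp
using System.Linq;

public static class JoinDemo
{
    // Seedless Aggregate: concatenates pairwise; throws InvalidOperationException if parts is empty.
    public static string ViaAggregate(string[] parts) =>
        parts.Aggregate((first, second) => first + "," + second);

    // string.Join: same result for non-empty input, "" for empty input.
    public static string ViaJoin(string[] parts) => string.Join(",", parts);
}
```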
Here's my favorite answer adapted to the question, with Convert corrected to ConvertAll (and Cast<DataRow>() added, since DataRowCollection has no ToArray()):
string text = string.Join(", ", Array.ConvertAll(table.Rows.Cast<DataRow>().ToArray(), i => i["title"]));