Join with intelligent separators - c#

It's easy enough to write, of course, but in C# 2010, is there a built-in Join (or similar) method that will only add a separator if both the previous and next elements are non-null and non-empty?
In other words SmartJoin(", ","Hood","Robin") would produce "Hood, Robin" but SmartJoin(", ", "Robin Hood", string.Empty) would produce simply "Robin Hood".

How about this:
public void SmartJoin(string separator, params string[] Items)
{
String.Join(separator, Items.Where(x=>!String.IsNullOrEmpty(x)).ToArray());
}

There is no built-in Join that does what you need.

Here's another way, using LINQ's Aggregate method:
string result = new List<string>() { "Hood", "Robin" }.Aggregate(SmartJoin());
string result2 = new List<string>() { "Robin Hood", "" }.Aggregate(SmartJoin());
private static Func<string, string, string> SmartJoin()
{
    // Only add the separator when both sides are non-empty.
    return (x, y) => string.IsNullOrEmpty(x) ? y
        : string.IsNullOrEmpty(y) ? x
        : x + ", " + y;
}

NotherDev was right: strictly speaking, there is no such method built in. Still, @CodingGorila's solution helped me, and in my opinion it should be added to the next .NET version. I did have to turn it into a static function and have it return a string to make it work in my situation:
public static string SmartJoin(string separator, params string[] Items) {
return String.Join(separator, Items.Where(x=>!String.IsNullOrEmpty(x)).ToArray());
}

Related

Finding duplicates in List<string>

In a list with some hundred thousand entries, how does one go about comparing each entry with the rest of the list for duplicates?
For example, List<string> fileNames contains both "00012345.pdf" and "12345.pdf", and these are considered duplicates. What is the best strategy for flagging this kind of duplicate?
Thanks
Update: The naming of files is restricted to numbers. They are padded with zeros. Duplicates are where the padding is missing. Thus, "123.pdf" & "000123.pdf" are duplicates.
You probably want to implement your own substring comparer to test equality based on whether a substring is contained within another string.
This isn't necessarily optimised, but it will work. You could also possibly consider using Parallel Linq if you are using .NET 4.0.
EDIT: Answer updated to reflect refined question after it was edited
void Main()
{
List<string> stringList = new List<string> { "00012345.pdf","12345.pdf","notaduplicate.jpg","3453456363234.jpg"};
IEqualityComparer<string> comparer = new NumericFilenameEqualityComparer ();
var duplicates = stringList.GroupBy (s => s, comparer).Where(grp => grp.Count() > 1);
// do something with grouped duplicates...
}
// Not safe for nulls!
// NB: do your own parameter / null checks / string-case options etc.!
public class NumericFilenameEqualityComparer : IEqualityComparer<string> {
    private static readonly Regex digitFilenameRegex = new Regex(@"\d+", RegexOptions.Compiled);
    public bool Equals(string left, string right) {
        Match leftDigitsMatch = digitFilenameRegex.Match(left);
        Match rightDigitsMatch = digitFilenameRegex.Match(right);
        long leftValue = leftDigitsMatch.Success ? long.Parse(leftDigitsMatch.Value) : long.MaxValue;
        long rightValue = rightDigitsMatch.Success ? long.Parse(rightDigitsMatch.Value) : long.MaxValue;
        return leftValue == rightValue;
    }
    // Hash the same numeric value that Equals compares, so equal names share
    // a bucket and unequal names are spread out instead of all colliding.
    public int GetHashCode(string value) {
        Match digitsMatch = digitFilenameRegex.Match(value);
        long parsed = digitsMatch.Success ? long.Parse(digitsMatch.Value) : long.MaxValue;
        return parsed.GetHashCode();
    }
}
I understand you are looking for duplicates in order to remove them?
One way to go about it could be the following:
Create a class MyString which takes care of duplication rules. That is, overrides Equals and GetHashCode to recreate exactly the duplication rules you are considering. (I'm understanding from your question that 00012345.pdf and 12345.pdf should be considered duplicates?)
Make this class explicitly or implicitly convertible to string (or override ToString() for that matter).
Create a HashSet<MyString> and fill it up by iterating through your original List<String>, checking for duplicates as you go.
Might be dirty but it will do the trick. The only "hard" part here is correctly implementing your duplication rules.
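The wrapper-class idea described above could be sketched roughly as follows. The names PaddedFileName and DuplicateFinder are hypothetical, and the rule "strip leading zeros, compare case-insensitively" is an assumption taken from the question's update rather than something the answer spelled out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: two names are equal when they match after
// leading zeros are stripped (assumed rule, from the question's update).
public sealed class PaddedFileName : IEquatable<PaddedFileName>
{
    private readonly string original;
    private readonly string normalized;

    public PaddedFileName(string fileName)
    {
        if (fileName == null) throw new ArgumentNullException("fileName");
        original = fileName;
        string trimmed = fileName.TrimStart('0');
        // Keep one zero for names such as "000.pdf" so they do not become ".pdf".
        normalized = (trimmed.Length == 0 || trimmed[0] == '.') ? "0" + trimmed : trimmed;
    }

    public bool Equals(PaddedFileName other)
    {
        return other != null &&
               string.Equals(normalized, other.normalized, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj) { return Equals(obj as PaddedFileName); }

    // Must agree with Equals, so it hashes the normalized name.
    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(normalized);
    }

    public override string ToString() { return original; }
}

public static class DuplicateFinder
{
    // Returns every name whose normalized form was already seen earlier in the list.
    public static List<string> FindDuplicates(IEnumerable<string> fileNames)
    {
        var seen = new HashSet<PaddedFileName>();
        var duplicates = new List<string>();
        foreach (string name in fileNames)
        {
            // HashSet<T>.Add returns false when an equal element is already present.
            if (!seen.Add(new PaddedFileName(name))) duplicates.Add(name);
        }
        return duplicates;
    }
}
```

Because the set does hashed lookups, one pass over the list is enough, which matters with a few hundred thousand entries.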
I have a simple solution for finding duplicate words and characters in a string.
For words (Java):
public class Test {
public static void main(String[] args) {
findDuplicateWords("i am am a a learner learner learner");
}
private static void findDuplicateWords(String string) {
HashMap<String,Integer> hm=new HashMap<>();
String[] s=string.split(" ");
for(String tempString:s){
if(hm.get(tempString)!=null){
hm.put(tempString, hm.get(tempString)+1);
}
else{
hm.put(tempString,1);
}
}
System.out.println(hm);
}
}
For characters, use a for loop over the string's length and read each character with charAt().
Maybe something like this:
List<string> theList = new List<string>() { "00012345.pdf", "00012345.pdf", "12345.pdf", "1234567.pdf", "12.pdf" };
theList.GroupBy(txt => txt)
.Where(grouping => grouping.Count() > 1)
.ToList()
.ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
groupItem.Key,
groupItem.Count(),
string.Join(" ", groupItem.ToArray())));

String Parsing in C#

What is the most efficient way to parse a C# string in the form of
"(params (abc 1.3)(sdc 2.0)(www 3.05)....)"
into a struct in the form
struct Params
{
double abc,sdc,www....;
}
Thanks
EDIT
The structure always has the same parameters (same names, only doubles, known at compile time), but the order is not guaranteed. Only one struct is parsed at a time.
using System;
namespace ConsoleApplication1
{
class Program
{
struct Params
{
public double abc, sdc;
};
static void Main(string[] args)
{
string s = "(params (abc 1.3)(sdc 2.0))";
Params p = new Params();
object pbox = (object)p; // structs must be boxed for SetValue() to work
string[] arr = s.Substring(8).Replace(")", "").Split(new char[] { ' ', '(', }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < arr.Length; i+=2)
typeof(Params).GetField(arr[i]).SetValue(pbox, double.Parse(arr[i + 1]));
p = (Params)pbox;
Console.WriteLine("p.abc={0} p.sdc={1}", p.abc, p.sdc);
}
}
}
Note: if you used a class instead of a struct the boxing/unboxing would not be necessary.
Depending on your complete grammar, you have a few options:
If it's a very simple grammar and you don't have to test for errors in it, you could simply go with the below (which will be fast):
var input = "(params (abc 1.3)(sdc 2.0)(www 3.05))";
// RemoveEmptyEntries drops the empty token in front of the first '('
var tokens = input.Split(new[] { '(' }, StringSplitOptions.RemoveEmptyEntries);
var typeName = tokens[0].Trim();
//you'll need more than the type name (assembly/namespace) so I'll leave that to you
Type t = getStructFromType(typeName);
var obj = TypeDescriptor.CreateInstance(null, t, null, null);
for(var i = 1;i<tokens.Length;i++)
{
var innerTokens = tokens[i].Trim(' ', ')').Split(' ');
var fieldName = innerTokens[0];
var value = Convert.ToDouble(innerTokens[1]);
var field = t.GetField(fieldName);
field.SetValue(obj, value);
}
That simple approach, however, requires a well-formed string or it will misbehave.
If the grammar is a bit more complicated, e.g. nested ( ), then that simple approach won't work.
You could try a regex, but that still requires a rather simple grammar, so if you end up with a complex grammar your best choice is a real parser. Irony is easy to use since you can write it all in plain C# (some knowledge of BNF is a plus, though).
Do you need to support multiple structs ? In other words, does this need to be dynamic; or do you know the struct definition at compile time ?
Parsing the string with a regex would be the obvious choice.
Here is a regex, that will parse your string format:
private static readonly Regex regParser = new Regex(@"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$", RegexOptions.Compiled);
Running that regex on a string will give you two groups named "name" and "value". The Captures property of each group will contain the names and values.
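As a rough illustration of reading those captures, the sketch below pairs the i-th "name" capture with the i-th "value" capture and collects them in a dictionary. The class name ParamsRegexReader, the dictionary return type, and the use of the invariant culture for parsing are assumptions made for the example, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ParamsRegexReader
{
    // Same shape as the pattern in the answer: "(params (name value)(name value)...)".
    private static readonly Regex Parser = new Regex(
        @"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$",
        RegexOptions.Compiled);

    public static Dictionary<string, double> Read(string input)
    {
        Match match = Parser.Match(input);
        if (!match.Success) throw new FormatException("Input does not match the expected format.");

        // Each repetition of the outer group adds one capture to both named groups.
        CaptureCollection names = match.Groups["name"].Captures;
        CaptureCollection values = match.Groups["value"].Captures;

        var result = new Dictionary<string, double>();
        for (int i = 0; i < names.Count; i++)
        {
            // Assumption: values always use '.' as the decimal separator.
            result[names[i].Value] = double.Parse(values[i].Value, CultureInfo.InvariantCulture);
        }
        return result;
    }
}
```

From the dictionary, the known fields of the struct can be assigned by name, which also copes with the pairs arriving in any order.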
If the struct type is unknown at compile time, then you will need to use reflection to fill in the fields.
If you mean to generate the struct definition at runtime, you will need to use Reflection to emit the type; or you will need to generate the source code.
Which part are you having trouble with ?
A regex can do the job for you:
public Dictionary<string, double> ParseString(string input){
var dict = new Dictionary<string, double>();
try
{
var re = new Regex(@"(?:\(params\s)?(?:\((?<n>[^\s]+)\s(?<v>[^\)]+)\))");
foreach (Match m in re.Matches(input))
dict.Add(m.Groups["n"].Value, double.Parse(m.Groups["v"].Value));
}
catch
{
throw new Exception("Invalid format!");
}
return dict;
}
use it like:
string str = "(params (abc 1.3)(sdc 2.0)(www 3.05))";
var parsed = ParseString(str);
// parsed["abc"] would now return 1.3
That might fit better than creating a lot of different structs for every possible input string and using reflection to fill them. I don't think that is worth the effort.
Furthermore I assumed the input string is always in exactly the format you posted.
You might consider performing just enough string manipulation to make the input look like standard command line arguments then use an off-the-shelf command line argument parser like NDesk.Options to populate the Params object. You give up some efficiency but you make it up in maintainability.
public Params Parse(string input)
{
    var @params = new Params();
    var argv = ConvertToArgv(input);
    new NDesk.Options.OptionSet
    {
        {"abc=", v => Double.TryParse(v, out @params.abc)},
        {"sdc=", v => Double.TryParse(v, out @params.sdc)},
        {"www=", v => Double.TryParse(v, out @params.www)}
    }
    .Parse(argv);
    return @params;
}
private string[] ConvertToArgv(string input)
{
return input
.Replace('(', '-')
.Split(new[] {')', ' '});
}
Do you want to build a data representation of your defined syntax?
If you are looking for easy maintainability without having to write long regex statements, you could build your own lexer/parser. Here is a prior discussion on SO, with good links in the answers as well, to help you:
Poor man's "lexer" for C#
I would just do a basic recursive-descent parser. It may be more general than you want, but nothing else will be much faster.
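A minimal sketch of such a recursive-descent parser, assuming the grammar is exactly params := '(' "params" pair* ')' and pair := '(' name number ')'. The class name ParamsParser, the dictionary result, and the invariant-culture number parsing are choices made for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class ParamsParser
{
    private readonly string text;
    private int pos;

    private ParamsParser(string text) { this.text = text; }

    public static Dictionary<string, double> Parse(string input)
    {
        if (input == null) throw new ArgumentNullException("input");
        var parser = new ParamsParser(input);
        Dictionary<string, double> result = parser.ParseParams();
        parser.SkipWhitespace();
        if (parser.pos != input.Length) throw parser.Error("unexpected trailing text");
        return result;
    }

    // params := '(' "params" pair* ')'
    private Dictionary<string, double> ParseParams()
    {
        Expect('(');
        if (ReadToken() != "params") throw Error("expected 'params'");
        var result = new Dictionary<string, double>();
        SkipWhitespace();
        while (pos < text.Length && text[pos] == '(')
        {
            ParsePair(result);
            SkipWhitespace();
        }
        Expect(')');
        return result;
    }

    // pair := '(' name number ')'
    private void ParsePair(Dictionary<string, double> result)
    {
        Expect('(');
        string name = ReadToken();
        string number = ReadToken();
        double value;
        if (name.Length == 0 ||
            !double.TryParse(number, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
        {
            throw Error("expected '(name number)'");
        }
        result[name] = value;
        Expect(')');
    }

    // Reads characters up to the next whitespace or parenthesis.
    private string ReadToken()
    {
        SkipWhitespace();
        int start = pos;
        while (pos < text.Length && !char.IsWhiteSpace(text[pos]) &&
               text[pos] != '(' && text[pos] != ')')
        {
            pos++;
        }
        return text.Substring(start, pos - start);
    }

    private void SkipWhitespace()
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
    }

    private void Expect(char c)
    {
        SkipWhitespace();
        if (pos >= text.Length || text[pos] != c) throw Error("expected '" + c + "'");
        pos++;
    }

    private FormatException Error(string message)
    {
        return new FormatException(message + " at position " + pos);
    }
}
```

Each grammar rule maps to one method, so supporting nested lists later would mean adding a rule rather than rewriting a pattern.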
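Here's an out-of-the-box approach:
turn the pairs into a JSON object ("(" and ")" become braces and commas, [SPACE] becomes ":"), then use System.Web.Script.Serialization.JavaScriptSerializer.Deserialize
string s = "(params (abc 1.3)(sdc 2.0))";
// Strip "(params (" and the closing "))", then quote the names:
// abc 1.3)(sdc 2.0  ->  {"abc":1.3,"sdc":2.0}
string json = "{\"" + s.Substring(9, s.Length - 11)
    .Replace(")(", ",\"")
    .Replace(" ", "\":") + "}";
return new System.Web.Script.Serialization.JavaScriptSerializer()
    .Deserialize<Dictionary<string, object>>(json);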

Good way to concatenate string representations of objects?

Ok,
We have a lot of where clauses in our code. We have just as many ways to generate a string to represent the IN condition. I am trying to come up with a clean way, as follows:
public static string Join<T>(this IEnumerable<T> items, string separator)
{
var strings = from item in items select item.ToString();
return string.Join(separator, strings.ToArray());
}
it can be used as follows:
var values = new []{1, 2, 3, 4, 5, 6};
values.Join(",");
// result should be:
// "1,2,3,4,5,6"
So this is a nice extension method that does a very basic job. I know that simple code does not always turn into fast or efficient execution, but I am just curious as to what could I have missed with this simple code. Other members of our team are arguing that:
it is not flexible enough (no control of the string representation)
may not be memory efficient
may not be fast
Any expert to chime in?
Regards,
Eric.
Regarding the first issue, you could add another 'formatter' parameter to control the conversion of each item into a string:
public static string Join<T>(this IEnumerable<T> items, string separator)
{
return items.Join(separator, i => i.ToString());
}
public static string Join<T>(this IEnumerable<T> items, string separator, Func<T, string> formatter)
{
return String.Join(separator, items.Select(i => formatter(i)).ToArray());
}
Regarding the other two issues, I wouldn't worry about them unless you later run into performance problems and find this code to be the cause. It's unlikely to be much of a bottleneck, however.
For some reason, I thought that String.Join is implemented in terms of a StringBuilder class. But if it isn't, then the following is likely to perform better for large inputs since it doesn't recreate a String object for each join in the iteration.
public static string Join<T>(this IEnumerable<T> items, string separator)
{
    // TODO: check for null arguments.
    StringBuilder builder = new StringBuilder();
    foreach(T t in items)
    {
        builder.Append(t.ToString()).Append(separator);
    }
    // Remove the trailing separator; an empty sequence has none to remove.
    if (builder.Length > 0)
        builder.Length -= separator.Length;
    return builder.ToString();
}
EDIT: Here is an analysis of when it is appropriate to use StringBuilder and String.Join.
Why don't you use StringBuilder and iterate through the collection yourself, appending as you go?
Otherwise you are creating an array of strings (var strings) and then doing the Join.
You are missing null checks for the sequence and the items of the sequence. And yes, it is not the fastest and most memory efficient way. One would probably just enumerate the sequence and render the string representations of the items into a StringBuilder. But does this really matter? Are you experiencing performance problems? Do you need to optimize?
This would also work:
public static string Test<T>(IEnumerable<T> items, string separator)
{
    var builder = new StringBuilder();
    bool appendSeparator = false;
    if(null != items)
    {
        foreach(var item in items)
        {
            // Write the separator before every item except the first.
            if(appendSeparator)
            {
                builder.Append(separator);
            }
            builder.Append(item.ToString());
            appendSeparator = true;
        }
    }
    return builder.ToString();
}

Can I rewrite this more elegantly using LINQ?

I have a double[][] that I want to convert to a CSV string format (i.e. each row in a line, and row elements separated by commas). I wrote it like this:
public static string ToCSV(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                Array.ConvertAll(row, x => x.ToString()))));
}
Is there a more elegant way to write this using LINQ?
(I know, one could use temporary variables to make this look better, but this code format better conveys what I am looking for.)
You can, but I wouldn't personally do all the lines at once - I'd use an iterator block:
public static IEnumerable<string> ToCSV(IEnumerable<double[]> source)
{
return source.Select(row => string.Join(",",
Array.ConvertAll(row, x=>x.ToString())));
}
This returns each line (the caller can then WriteLine etc efficiently, without buffering everything). It is also now callable from any source of double[] rows (including but not limited to a jagged array).
Also - with a local variable you could use StringBuilder to make each line slightly cheaper.
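One way to read that last remark is sketched below: a real iterator block (yield return) that reuses a single local StringBuilder for every line. The names CsvLines and ToCsvLines are hypothetical, and formatting with the invariant culture is an assumption made here so that the decimal separator cannot collide with the comma.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class CsvLines
{
    // Yields one CSV line per row; the StringBuilder is a local that is
    // cleared and reused for every row.
    public static IEnumerable<string> ToCsvLines(IEnumerable<double[]> source)
    {
        var sb = new StringBuilder();
        foreach (double[] row in source)
        {
            sb.Length = 0;
            for (int i = 0; i < row.Length; i++)
            {
                if (i > 0) sb.Append(',');
                // Assumption: invariant culture keeps '.' as the decimal separator.
                sb.Append(row[i].ToString(CultureInfo.InvariantCulture));
            }
            yield return sb.ToString();
        }
    }
}
```

The caller can still write each line as it is produced, for example with a foreach over ToCsvLines(rows) and Console.WriteLine.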
To return the entire string at once, I'd optimize it to use a single StringBuilder for all the string work; a bit more long-winded, but much more efficient (far fewer intermediate strings):
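public static string ToCSV(IEnumerable<double[]> source) {
    StringBuilder sb = new StringBuilder();
    bool firstRow = true;
    foreach(var row in source) {
        // Separate rows with a newline, as the String.Join version in the question does.
        if (!firstRow) sb.AppendLine();
        firstRow = false;
        if (row.Length > 0) {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++) {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}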
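You could also use Aggregate:
public static string ToCSV(double[][] array)
{
    // Aggregate without a seed starts from the first element, so no leading
    // separator is produced; DefaultIfEmpty guards against empty input.
    return array
        .Select(row => row.Select(dbl => dbl.ToString())
            .DefaultIfEmpty(string.Empty)
            .Aggregate((str, s) => str + "," + s))
        .DefaultIfEmpty(string.Empty)
        .Aggregate((multiLineStr, line) => multiLineStr + System.Environment.NewLine + line);
}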
This is compatible with any nested sequences of double. It also defers the ToString implementation to the caller, allowing formatting while avoiding messy IFormatProvider overloads:
public static string Join(this IEnumerable<string> source, string separator)
{
return String.Join(separator, source.ToArray());
}
public static string ToCsv<TRow>(this IEnumerable<TRow> rows, Func<double, string> valueToString)
where TRow : IEnumerable<double>
{
return rows
.Select(row => row.Select(valueToString).Join(", "))
.Join(Environment.NewLine);
}
You can do it with LINQ, but I'm not sure if you like this one better than yours. I'm afraid you don't. :)
var q = String.Join(Environment.NewLine, (from a in d
select String.Join(", ", (from b in a
select b.ToString()).ToArray())).ToArray());
Cheers,
Matthias

Is it possible to explode an array so that its elements can be passed to a method with the params keyword?

Take this non-compiling code for instance:
public string GetPath(string basefolder, string[] extraFolders)
{
string version = Versioner.GetBuildAndDotNetVersions();
string callingModule = StackCrawler.GetCallingModuleName();
return AppendFolders(basefolder, version, callingModule, extraFolders);
}
private string AppendFolders(params string[] folders)
{
string outstring = folders[0];
for (int i = 1; i < folders.Length; i++)
{
string fixedPath = folders[i][0] == '\\' ? folders[i].Substring(1) : folders[i];
outstring = Path.Combine(outstring, fixedPath);
}
return outstring;
}
This example is a somewhat simplified version of testing code I am using. Please, I am only interested in solutions having directly to do with the params keyword. I know how lists and other similar things work.
Is there a way to "explode" the extraFolders array so that its contents can be passed into AppendFolders along with other parameters?
Just pass it. The folders parameter is an array first. The "params" functionality is a little bit of compiler magic, but it's not required.
AppendFolders(extraFolders);
Now, in this particular instance, you'll have to add some things to that array first.
List<string> lstFolders = new List<string>(extraFolders);
lstFolders.Insert(0, callingModule);
lstFolders.Insert(0, version);
lstFolders.Insert(0, basefolder);
return AppendFolders(lstFolders.ToArray());
I'll quibble with the term "collapse", since it seems you really want to "expand". And I'm not sure what you mean by solutions "having directly to do with params keyword" and that "you're not interested in workarounds". In the end, you either have to pass a number of strings - which the compiler will magically package into an array - or an array of strings directly. That being said, my solution (without changing the interface) would go something like:
return AppendFolders(new string[] { basefolder, version, callingModule }.Concat(extraFolders).ToArray());
Edit:
While you can't add an operator via extension methods, you could do:
return AppendFolders(new string[] { baseFolder, callingModuleName, version }.Concat(extraFolders));
public static T[] Concat<T>(this T[] a, T[] b) {
return ((IEnumerable<T>)a).Concat(b).ToArray();
}
But, if we're going to go that far - might as well just extend List<T> to handle this elegantly:
return AppendFolders(new Params<string>() { baseFolder, callingModuleName, version, extraFolders });
class Params<T> : List<T> {
public void Add(IEnumerable<T> collection) {
base.AddRange(collection);
}
public static implicit operator T[](Params<T> a) {
return a.ToArray();
}
}
One option is to make the params parameter an object[]:
static string appendFolders(params object[] folders)
{ return (string) folders.Aggregate("",(output, f) =>
Path.Combine( (string)output
,(f is string[])
? appendFolders((object[])f)
: ((string)f).TrimStart('\\')));
}
If you want something more strongly-typed, another option is to create a custom union type with implicit conversion operators:
static string appendFolders(params StringOrArray[] folders)
{ return folders.SelectMany(x=>x.AsEnumerable())
.Aggregate("",
(output, f)=>Path.Combine(output,f.TrimStart('\\')));
}
class StringOrArray
{ string[] array;
public IEnumerable<string> AsEnumerable()
{ return array;}
public static implicit operator StringOrArray(string s)
{ return new StringOrArray{array=new[]{s}};}
public static implicit operator StringOrArray(string[] s)
{ return new StringOrArray{array=s};}
}
In either case, this will compile:
appendFolders("base", "v1", "module", new[]{"debug","bin"});
A quick and dirty solution would be to build a List<string> from the items and then pass that (with ToArray()).
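Note that the backslash test does matter. Path.Combine treats an argument that begins with a directory separator as rooted and discards everything before it, so strip the leading backslash (as the question's code does) before combining.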
I think OregonGhost's answer is probably the way you want to go. Just to elaborate on it, he's suggesting doing something like this:
public string GetPath(string basefolder, string[] extraFolders)
{
string version = Versioner.GetBuildAndDotNetVersions();
string callingModule = StackCrawler.GetCallingModuleName();
List<string> parameters = new List<string>(extraFolders.Length + 3);
parameters.Add(basefolder);
parameters.Add(version);
parameters.Add(callingModule);
parameters.AddRange(extraFolders);
return AppendFolders(parameters.ToArray());
}
And I don't mean that as a lesson on how to use Lists, just as a little clarification for anybody who may come along looking for the solution in the future.
