How to iterate only distinct string values by custom substring equality - c#

Similar to this question, I'm trying to iterate only distinct values of sub-string of given strings, for example:
List<string> keys = new List<string>()
{
"foo_boo_1",
"foo_boo_2,
"foo_boo_3,
"boo_boo_1"
}
The output for the selected distinct values should be (select arbitrary the first sub-string's distinct value):
foo_boo_1 (the first one)
boo_boo_1
I've tried to implement this solution using the IEqualityComparer with:
public class MyEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
int xIndex = x.LastIndexOf("_");
int yIndex = y.LastIndexOf("_");
if (xIndex > 0 && yIndex > 0)
return x.Substring(0, xIndex) == y.Substring(0, yIndex);
else
return false;
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
foreach (var key in myList.Distinct(new MyEqualityComparer()))
{
Console.WriteLine(key)
}
But the resulted output is:
foo_boo_1
foo_boo_2
foo_boo_3
boo_boo_1
Using the IEqualityComparer How do I remove the sub-string distinct values (foo_boo_2 and foo_boo_3)?
*Please note that the "real" keys are a lot longer, something like "1_0_8-B153_GF_6_2", therefore I must use the LastIndexOf.

Your current implementation has some flaws:
Both Equals and GetHashCode must never throw exception (you have to check for null)
If Equals returns true for x and y then GetHashCode(x) == GetHashCode(y). Counter example is "abc_1" and "abc_2".
The 2nd error can well cause Distinct return incorrect results (Distinct first compute hash).
Correct code can be something like this
public class MyEqualityComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
if (ReferenceEquals(x, y))
return true;
else if ((null == x) || (null == y))
return false;
int xIndex = x.LastIndexOf('_');
int yIndex = y.LastIndexOf('_');
if (xIndex >= 0)
return (yIndex >= 0)
? x.Substring(0, xIndex) == y.Substring(0, yIndex)
: false;
else if (yIndex >= 0)
return false;
else
return x == y;
}
public int GetHashCode(string obj) {
if (null == obj)
return 0;
int index = obj.LastIndexOf('_');
return index < 0
? obj.GetHashCode()
: obj.Substring(0, index).GetHashCode();
}
}
Now you are ready to use it with Distinct:
foreach (var key in myList.Distinct(new MyEqualityComparer())) {
Console.WriteLine(key)
}

Your GetHashCode method in your equality comparer is returning the hash code for the entire string, just make it hash the substring, for example:
public int GetHashCode(string obj)
{
var index = obj.LastIndexOf("_");
return obj.Substring(0, index).GetHashCode();
}

For a more succinct solution that avoids using a custom IEqualityComparer<>, you could utilise GroupBy. For example:
var keys = new List<string>()
{
"foo_boo_1",
"foo_boo_2",
"foo_boo_3",
"boo_boo_1"
};
var distinct = keys
.Select(k => new
{
original = k,
truncated = k.Contains("_") ? k.Substring(0, k.LastIndexOf("_")) : k
})
.GroupBy(k => k.truncated)
.Select(g => g.First().original);
This outputs:
foo_boo_1
boo_boo_1

Related

Check if characters in String are in alphabetical order

I'm supposed to take in a string and return true if it is in alphabetical order. So far in my solution I am able to get true from "abc" but words like "aPple" throw a false. I am assuming this is because some of the characters are capitalized but I don't know where I am going wrong. This is what I have.
public bool IsAlphabetical(string s)
{
char[] c = s.ToCharArray();
Array.Sort(c);
return (c.SequenceEqual(s));
}
if you need not case-sensitive comparison, you can try this
public static bool IsAlphabetical(string s,bool checkWithoutCase=true)
{
var c = s.ToCharArray();
Array.Sort(c);
var equal = c.SequenceEqual(s);
// check without case
if (!equal && checkWithoutCase)
{
s = s.ToLower();
Array.Sort(c = s.ToCharArray());
equal = c.SequenceEqual(s);
}
return equal;
}
or linq variant ( I am not sure what is faster)
public static bool IsAlphabetical(string s, bool checkWithoutCase=true, bool checkWithCase=true)
{
var equal=false;
if (checkWithCase) equal= s.OrderBy(x => x).SequenceEqual(s);
// check without case
if (!equal && checkWithoutCase)
{
s=s.ToLower();
equal = s.OrderBy(x => x).SequenceEqual(s);
}
return equal;
}
I think that LINQ and array sorting is too much. A quick performance check proves that
https://dotnetfiddle.net/eoKP7Z
public static bool IsAlphabetical(string s, bool checkWithoutCase = true)
{
var comparisonType = checkWithoutCase ?
System.StringComparison.OrdinalIgnoreCase :
System.StringComparison.Ordinal;
for (var idx = 1; idx < s.Length; idx++)
{
if (string.Compare(s, idx - 1, s, idx, 1, comparisonType) > 0)
{
return false;
}
}
return true;
}

Sort order of files [duplicate]

Anyone have a good resource or provide a sample of a natural order sort in C# for an FileInfo array? I am implementing the IComparer interface in my sorts.
The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer:
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);
Michael Kaplan has some examples of how this function works here, and the changes that were made for Vista to make it work more intuitively. The plus side of this function is that it will have the same behaviour as the version of Windows it runs on, however this does mean that it differs between versions of Windows so you need to consider whether this is a problem for you.
So a complete implementation would be something like:
[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
}
public sealed class NaturalStringComparer : IComparer<string>
{
public int Compare(string a, string b)
{
return SafeNativeMethods.StrCmpLogicalW(a, b);
}
}
public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
public int Compare(FileInfo a, FileInfo b)
{
return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
}
}
Just thought I'd add to this (with the most concise solution I could find):
public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
int max = source
.SelectMany(i => Regex.Matches(selector(i), #"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
.Max() ?? 0;
return source.OrderBy(i => Regex.Replace(selector(i), #"\d+", m => m.Value.PadLeft(max, '0')));
}
The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.
The cast to (int?) is to allow for collections of strings without any numbers (.Max() on an empty enumerable throws an InvalidOperationException).
None of the existing implementations looked great so I wrote my own. The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8). The only differences I've seen are 1) although Windows used to (e.g. XP) handle numbers of any length, it's now limited to 19 digits - mine is unlimited, 2) Windows gives inconsistent results with certain sets of Unicode digits - mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows), and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (e.g. "e-1é" vs "é1e-" - the sections before and after the number have diacritic and punctuation weight differences).
public static int CompareNatural(string strA, string strB) {
return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}
public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
CompareInfo cmp = culture.CompareInfo;
int iA = 0;
int iB = 0;
int softResult = 0;
int softResultWeight = 0;
while (iA < strA.Length && iB < strB.Length) {
bool isDigitA = Char.IsDigit(strA[iA]);
bool isDigitB = Char.IsDigit(strB[iB]);
if (isDigitA != isDigitB) {
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (!isDigitA && !isDigitB) {
int jA = iA + 1;
int jB = iB + 1;
while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
if (cmpResult != 0) {
// Certain strings may be considered different due to "soft" differences that are
// ignored if more significant differences follow, e.g. a hyphen only affects the
// comparison if no other differences follow
string sectionA = strA.Substring(iA, jA - iA);
string sectionB = strB.Substring(iB, jB - iB);
if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
cmp.Compare(sectionA + "2", sectionB + "1", options))
{
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (softResultWeight < 1) {
softResult = cmpResult;
softResultWeight = 1;
}
}
iA = jA;
iB = jB;
}
else {
char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
int jA = iA;
int jB = iB;
while (jA < strA.Length && strA[jA] == zeroA) jA++;
while (jB < strB.Length && strB[jB] == zeroB) jB++;
int resultIfSameLength = 0;
do {
isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
if (isDigitA && isDigitB) {
if (numA != numB && resultIfSameLength == 0) {
resultIfSameLength = numA < numB ? -1 : 1;
}
jA++;
jB++;
}
}
while (isDigitA && isDigitB);
if (isDigitA != isDigitB) {
// One number has more digits than the other (ignoring leading zeros) - the longer
// number must be larger
return isDigitA ? 1 : -1;
}
else if (resultIfSameLength != 0) {
// Both numbers are the same length (ignoring leading zeros) and at least one of
// the digits differed - the first difference determines the result
return resultIfSameLength;
}
int lA = jA - iA;
int lB = jB - iB;
if (lA != lB) {
// Both numbers are equivalent but one has more leading zeros
return lA > lB ? -1 : 1;
}
else if (zeroA != zeroB && softResultWeight < 2) {
softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
softResultWeight = 2;
}
iA = jA;
iB = jB;
}
}
if (iA < strA.Length || iB < strB.Length) {
return iA < strA.Length ? 1 : -1;
}
else if (softResult != 0) {
return softResult;
}
return 0;
}
The signature matches the Comparison<string> delegate:
string[] files = Directory.GetFiles(#"C:\");
Array.Sort(files, CompareNatural);
Here's a wrapper class for use as IComparer<string>:
public class CustomComparer<T> : IComparer<T> {
private Comparison<T> _comparison;
public CustomComparer(Comparison<T> comparison) {
_comparison = comparison;
}
public int Compare(T x, T y) {
return _comparison(x, y);
}
}
Example:
string[] files = Directory.EnumerateFiles(#"C:\")
.OrderBy(f => f, new CustomComparer<string>(CompareNatural))
.ToArray();
Here's a good set of filenames I use for testing:
Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
"KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
"LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
"NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
"Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
"MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
"bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
"KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
"bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
"b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
"KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
"NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
"ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
"NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
"rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
"KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
"cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
"lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
"KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
"cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
"hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
"KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
"cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
"YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
"KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
"McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
"KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
"Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
"bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
.Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
.Select(n => expand(n)).ToArray();
Matthews Horsleys answer is the fastest method which doesn't change behaviour depending on which version of windows your program is running on. However, it can be even faster by creating the regex once, and using RegexOptions.Compiled. I also added the option of inserting a string comparer so you can ignore case if needed, and improved readability a bit.
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
{
var regex = new Regex(#"\d+", RegexOptions.Compiled);
int maxDigits = items
.SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
.Max() ?? 0;
return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
}
Use by
var sortedEmployees = employees.OrderByNatural(emp => emp.Name);
This takes 450ms to sort 100,000 strings compared to 300ms for the default .net string comparison - pretty fast!
Pure C# solution for linq orderby:
http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html
public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
private bool isAscending;
public NaturalSortComparer(bool inAscendingOrder = true)
{
this.isAscending = inAscendingOrder;
}
#region IComparer<string> Members
public int Compare(string x, string y)
{
throw new NotImplementedException();
}
#endregion
#region IComparer<string> Members
int IComparer<string>.Compare(string x, string y)
{
if (x == y)
return 0;
string[] x1, y1;
if (!table.TryGetValue(x, out x1))
{
x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
table.Add(x, x1);
}
if (!table.TryGetValue(y, out y1))
{
y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
table.Add(y, y1);
}
int returnVal;
for (int i = 0; i < x1.Length && i < y1.Length; i++)
{
if (x1[i] != y1[i])
{
returnVal = PartCompare(x1[i], y1[i]);
return isAscending ? returnVal : -returnVal;
}
}
if (y1.Length > x1.Length)
{
returnVal = 1;
}
else if (x1.Length > y1.Length)
{
returnVal = -1;
}
else
{
returnVal = 0;
}
return isAscending ? returnVal : -returnVal;
}
private static int PartCompare(string left, string right)
{
int x, y;
if (!int.TryParse(left, out x))
return left.CompareTo(right);
if (!int.TryParse(right, out y))
return left.CompareTo(right);
return x.CompareTo(y);
}
#endregion
private Dictionary<string, string[]> table = new Dictionary<string, string[]>();
public void Dispose()
{
table.Clear();
table = null;
}
}
My solution:
void Main()
{
new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}
public class NaturalStringComparer : IComparer<string>
{
private static readonly Regex _re = new Regex(#"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);
public int Compare(string x, string y)
{
x = x.ToLower();
y = y.ToLower();
if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
{
if(x.Length == y.Length) return 0;
return x.Length < y.Length ? -1 : 1;
}
var a = _re.Split(x);
var b = _re.Split(y);
int i = 0;
while(true)
{
int r = PartCompare(a[i], b[i]);
if(r != 0) return r;
++i;
}
}
private static int PartCompare(string x, string y)
{
int a, b;
if(int.TryParse(x, out a) && int.TryParse(y, out b))
return a.CompareTo(b);
return x.CompareTo(y);
}
}
Results:
1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2
You do need to be careful -- I vaguely recall reading that StrCmpLogicalW, or something like it, was not strictly transitive, and I have observed .NET's sort methods to sometimes get stuck in infinite loops if the comparison function breaks that rule.
A transitive comparison will always report that a < c if a < b and b < c. There exists a function that does a natural sort order comparison that does not always meet that criterion, but I can't recall whether it is StrCmpLogicalW or something else.
This is my code to sort a string having both alpha and numeric characters.
First, this extension method:
public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
return me.OrderBy(x => Regex.Replace(x, #"\d+", m => m.Value.PadLeft(50, '0')));
}
Then, simply use it anywhere in your code like this:
List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort();
How does it works ? By replaceing with zeros:
Original | Regex Replace | The | Returned
List | Apply PadLeft | Sorting | List
| | |
"The 1st" | "The 001st" | "The 001st" | "The 1st"
"The 12th" | "The 012th" | "The 002nd" | "The 2nd"
"The 2nd" | "The 002nd" | "The 012th" | "The 12th"
Works with multiples numbers:
Alphabetical Sorting | Alphanumeric Sorting
|
"Page 21, Line 42" | "Page 3, Line 7"
"Page 21, Line 5" | "Page 3, Line 32"
"Page 3, Line 32" | "Page 21, Line 5"
"Page 3, Line 7" | "Page 21, Line 42"
Hope that's will help.
Here's a version for .NET Core 2.1+ / .NET 5.0+, using spans to avoid allocations
public class NaturalSortStringComparer : IComparer<string>
{
public static NaturalSortStringComparer Ordinal { get; } = new NaturalSortStringComparer(StringComparison.Ordinal);
public static NaturalSortStringComparer OrdinalIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.OrdinalIgnoreCase);
public static NaturalSortStringComparer CurrentCulture { get; } = new NaturalSortStringComparer(StringComparison.CurrentCulture);
public static NaturalSortStringComparer CurrentCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.CurrentCultureIgnoreCase);
public static NaturalSortStringComparer InvariantCulture { get; } = new NaturalSortStringComparer(StringComparison.InvariantCulture);
public static NaturalSortStringComparer InvariantCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.InvariantCultureIgnoreCase);
private readonly StringComparison _comparison;
public NaturalSortStringComparer(StringComparison comparison)
{
_comparison = comparison;
}
public int Compare(string x, string y)
{
// Let string.Compare handle the case where x or y is null
if (x is null || y is null)
return string.Compare(x, y, _comparison);
var xSegments = GetSegments(x);
var ySegments = GetSegments(y);
while (xSegments.MoveNext() && ySegments.MoveNext())
{
int cmp;
// If they're both numbers, compare the value
if (xSegments.CurrentIsNumber && ySegments.CurrentIsNumber)
{
var xValue = long.Parse(xSegments.Current);
var yValue = long.Parse(ySegments.Current);
cmp = xValue.CompareTo(yValue);
if (cmp != 0)
return cmp;
}
// If x is a number and y is not, x is "lesser than" y
else if (xSegments.CurrentIsNumber)
{
return -1;
}
// If y is a number and x is not, x is "greater than" y
else if (ySegments.CurrentIsNumber)
{
return 1;
}
// OK, neither are number, compare the segments as text
cmp = xSegments.Current.CompareTo(ySegments.Current, _comparison);
if (cmp != 0)
return cmp;
}
// At this point, either all segments are equal, or one string is shorter than the other
// If x is shorter, it's "lesser than" y
if (x.Length < y.Length)
return -1;
// If x is longer, it's "greater than" y
if (x.Length > y.Length)
return 1;
// If they have the same length, they're equal
return 0;
}
private static StringSegmentEnumerator GetSegments(string s) => new StringSegmentEnumerator(s);
private struct StringSegmentEnumerator
{
private readonly string _s;
private int _start;
private int _length;
public StringSegmentEnumerator(string s)
{
_s = s;
_start = -1;
_length = 0;
CurrentIsNumber = false;
}
public ReadOnlySpan<char> Current => _s.AsSpan(_start, _length);
public bool CurrentIsNumber { get; private set; }
public bool MoveNext()
{
var currentPosition = _start >= 0
? _start + _length
: 0;
if (currentPosition >= _s.Length)
return false;
int start = currentPosition;
bool isFirstCharDigit = Char.IsDigit(_s[currentPosition]);
while (++currentPosition < _s.Length && Char.IsDigit(_s[currentPosition]) == isFirstCharDigit)
{
}
_start = start;
_length = currentPosition - start;
CurrentIsNumber = isFirstCharDigit;
return true;
}
}
}
Adding to Greg Beech's answer (because I've just been searching for that), if you want to use this from Linq you can use the OrderBy that takes an IComparer. E.g.:
var items = new List<MyItem>();
// fill items
var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());
Here's a relatively simple example that doesn't use P/Invoke and avoids any allocation during execution.
Feel free to use the code from here, or if it's easier there's a NuGet package:
https://www.nuget.org/packages/NaturalSort
https://github.com/drewnoakes/natural-sort
internal sealed class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y)
{
// sort nulls to the start
if (x == null)
return y == null ? 0 : -1;
if (y == null)
return 1;
var ix = 0;
var iy = 0;
while (true)
{
// sort shorter strings to the start
if (ix >= x.Length)
return iy >= y.Length ? 0 : -1;
if (iy >= y.Length)
return 1;
var cx = x[ix];
var cy = y[iy];
int result;
if (char.IsDigit(cx) && char.IsDigit(cy))
result = CompareInteger(x, y, ref ix, ref iy);
else
result = cx.CompareTo(y[iy]);
if (result != 0)
return result;
ix++;
iy++;
}
}
private static int CompareInteger(string x, string y, ref int ix, ref int iy)
{
var lx = GetNumLength(x, ix);
var ly = GetNumLength(y, iy);
// shorter number first (note, doesn't handle leading zeroes)
if (lx != ly)
return lx.CompareTo(ly);
for (var i = 0; i < lx; i++)
{
var result = x[ix++].CompareTo(y[iy++]);
if (result != 0)
return result;
}
return 0;
}
private static int GetNumLength(string s, int i)
{
var length = 0;
while (i < s.Length && char.IsDigit(s[i++]))
length++;
return length;
}
}
It doesn't ignore leading zeroes, so 01 comes after 2.
Corresponding unit test:
public class NumericStringComparerTests
{
[Fact]
public void OrdersCorrectly()
{
AssertEqual("", "");
AssertEqual(null, null);
AssertEqual("Hello", "Hello");
AssertEqual("Hello123", "Hello123");
AssertEqual("123", "123");
AssertEqual("123Hello", "123Hello");
AssertOrdered("", "Hello");
AssertOrdered(null, "Hello");
AssertOrdered("Hello", "Hello1");
AssertOrdered("Hello123", "Hello124");
AssertOrdered("Hello123", "Hello133");
AssertOrdered("Hello123", "Hello223");
AssertOrdered("123", "124");
AssertOrdered("123", "133");
AssertOrdered("123", "223");
AssertOrdered("123", "1234");
AssertOrdered("123", "2345");
AssertOrdered("0", "1");
AssertOrdered("123Hello", "124Hello");
AssertOrdered("123Hello", "133Hello");
AssertOrdered("123Hello", "223Hello");
AssertOrdered("123Hello", "1234Hello");
}
private static void AssertEqual(string x, string y)
{
Assert.Equal(0, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal(0, NaturalStringComparer.Instance.Compare(y, x));
}
private static void AssertOrdered(string x, string y)
{
Assert.Equal(-1, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal( 1, NaturalStringComparer.Instance.Compare(y, x));
}
}
I've actually implemented it as an extension method on the StringComparer so that you could do for example:
StringComparer.CurrentCulture.WithNaturalSort() or
StringComparer.OrdinalIgnoreCase.WithNaturalSort().
The resulting IComparer<string> can be used in all places like OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc. And you can still easily tweak case sensitivity, culture, etc.
The implementation is fairly trivial and it should perform quite well even on large sequences.
I've also published it as a tiny NuGet package, so you can just do:
Install-Package NaturalSort.Extension
The code including XML documentation comments and suite of tests is available in the NaturalSort.Extension GitHub repository.
The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):
public static class StringComparerNaturalSortExtension
{
public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);
private class NaturalSortComparer : IComparer<string>
{
public NaturalSortComparer(StringComparer stringComparer)
{
_stringComparer = stringComparer;
}
private readonly StringComparer _stringComparer;
private static readonly Regex NumberSequenceRegex = new Regex(#"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;
public int Compare(string s1, string s2)
{
var tokens1 = Tokenize(s1);
var tokens2 = Tokenize(s2);
var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
if (zipCompare != 0)
return zipCompare;
var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
return lengthCompare;
}
private int TokenCompare(string token1, string token2)
{
var number1 = ParseNumberOrZero(token1);
var number2 = ParseNumberOrZero(token2);
var numberCompare = number1.CompareTo(number2);
if (numberCompare != 0)
return numberCompare;
var stringCompare = _stringComparer.Compare(token1, token2);
return stringCompare;
}
}
}
Inspired by Michael Parker's solution, here is an IComparer implementation that you can drop in to any of the linq ordering methods:
private class NaturalStringComparer : IComparer<string>
{
public int Compare(string left, string right)
{
int max = new[] { left, right }
.SelectMany(x => Regex.Matches(x, #"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
.Max() ?? 0;
var leftPadded = Regex.Replace(left, #"\d+", m => m.Value.PadLeft(max, '0'));
var rightPadded = Regex.Replace(right, #"\d+", m => m.Value.PadLeft(max, '0'));
return string.Compare(leftPadded, rightPadded);
}
}
Here is a naive one-line regex-less LINQ way (borrowed from python):
var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]
Expanding on a couple of the previous answers and making use of extension methods, I came up with the following that doesn't have the caveats of potential multiple enumerable enumeration, or performance issues concerned with using multiple regex objects, or calling regex needlessly, that being said, it does use ToList(), which can negate the benefits in larger collections.
The selector supports generic typing to allow any delegate to be assigned, the elements in the source collection are mutated by the selector, then converted to strings with ToString().
private static readonly Regex _NaturalOrderExpr = new Regex(#"\d+", RegexOptions.Compiled);
public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderBy(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderByDescending(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
A version that's easier to read/maintain.
public class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y) {
const int LeftIsSmaller = -1;
const int RightIsSmaller = 1;
const int Equal = 0;
var leftString = x;
var rightString = y;
var stringComparer = CultureInfo.CurrentCulture.CompareInfo;
int rightIndex;
int leftIndex;
for (leftIndex = 0, rightIndex = 0;
leftIndex < leftString.Length && rightIndex < rightString.Length;
leftIndex++, rightIndex++) {
var leftChar = leftString[leftIndex];
var rightChar = rightString[leftIndex];
var leftIsNumber = char.IsNumber(leftChar);
var rightIsNumber = char.IsNumber(rightChar);
if (!leftIsNumber && !rightIsNumber) {
var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, leftIndex, 1);
if (result != 0) return result;
} else if (leftIsNumber && !rightIsNumber) {
return LeftIsSmaller;
} else if (!leftIsNumber && rightIsNumber) {
return RightIsSmaller;
} else {
var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);
if (leftNumberLength < rightNumberLength) {
return LeftIsSmaller;
} else if (leftNumberLength > rightNumberLength) {
return RightIsSmaller;
} else {
if(leftNumber < rightNumber) {
return LeftIsSmaller;
} else if(leftNumber > rightNumber) {
return RightIsSmaller;
}
}
}
}
if (leftString.Length < rightString.Length) {
return LeftIsSmaller;
} else if(leftString.Length > rightString.Length) {
return RightIsSmaller;
}
return Equal;
}
public int NumberLength(string str, int offset, out int number) {
if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");
var currentOffset = offset;
var curChar = str[currentOffset];
if (!char.IsNumber(curChar))
throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));
int length = 1;
var numberString = string.Empty;
for (currentOffset = offset + 1;
currentOffset < str.Length;
currentOffset++, length++) {
curChar = str[currentOffset];
numberString += curChar;
if (!char.IsNumber(curChar)) {
number = int.Parse(numberString);
return length;
}
}
number = int.Parse(numberString);
return length;
}
}
We had a need for a natural sort to deal with text with the following pattern:
"Test 1-1-1 something"
"Test 1-2-3 something"
...
For some reason when I first looked on SO, I didn't find this post and implemented our own. Compared to some of the solutions presented here, while similar in concept, it could have the benefit of maybe being simpler and easier to understand. However, while I did try to look at performance bottlenecks, It is still a much slower implementation than the default OrderBy().
Here is the extension method I implement:
public static class EnumerableExtensions
{
// set up the regex parser once and for all
private static readonly Regex Regex = new Regex(#"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);
// stateless comparer can be built once
private static readonly AggregateComparer Comparer = new AggregateComparer();
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
{
// first extract string from object using selector
// then extract digit and non-digit groups
Func<T, IEnumerable<IComparable>> splitter =
s => Regex.Matches(selector(s))
.Cast<Match>()
.Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value);
return source.OrderBy(splitter, Comparer);
}
/// <summary>
/// This comparer will compare two lists of objects against each other
/// </summary>
/// <remarks>Objects in each list are compare to their corresponding elements in the other
/// list until a difference is found.</remarks>
private class AggregateComparer : IComparer<IEnumerable<IComparable>>
{
public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
{
return
x.Zip(y, (a, b) => new {a, b}) // walk both lists
.Select(pair => pair.a.CompareTo(pair.b)) // compare each object
.FirstOrDefault(result => result != 0); // until a difference is found
}
}
}
The idea is to split the original strings into blocks of digits and non-digits ("\d+|\D+"). Since this is a potentially expensive task, it is done only once per entry. We then use a comparer of comparable objects (sorry, I can't find a more proper way to say it). It compares each block to its corresponding block in the other string.
I would like feedback on how this could be improved and what the major flaws are. Note that maintainability is important to us at this point and we are not currently using this in extremely large data sets.
Let me explain my problem and how i was able to solve it.
Problem:- Sort files based on FileName from FileInfo objects which are retrieved from a Directory.
Solution:- I selected the file names from FileInfo and trimed the ".png" part of the file name. Now, just do List.Sort(), which sorts the filenames in Natural sorting order. Based on my testing i found that having .png messes up sorting order. Have a look at the below code
var imageNameList = new DirectoryInfo(#"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

List.orderBy is getting stuck in an infinite loop

I have a List<Expense> myList where expense contains 2 fields: decimal Amount and a Status ItemStatus. Status is an enum {Paid, DueSoon, DueToday, Overdue, Unpaid}.
I was trying to sort the list in ascending or descending order however Status.Unpaid needs to always appear last in either ascending or descending order.
Using myList.Sort((x, y) => comparer.Compare(x.ItemStatus, y.ItemStatus)) along with my comparer worked well.
However, after sorting the list by ItemStatus I also wanted to sort the list by Amount. So I decided to use myList = myList.OrderBy(x => x.ItemStatus, comparer).ThenBy(x => x.Amount).ToList() this resulted in an infinite loop somewhere.
The infinite loop was still there when i removed the .ThenBy() method entirely.
I added a static counter to my comparer to try and debug and the OrderBy() method used the comparer 90 times on a list of 30 expenses before entering the infinite loop.
This is my comparer:
class StatusComparer : IComparer<Status>
{
public bool IsAscending { get; private set; } = true;
public StatusComparer(bool isAscending)
{
IsAscending = isAscending;
}
public int Compare(Status x, Status y)
{
if (IsUnpaid(x)) { return IsAscending? 1 : -1; }
if (IsUnpaid(y)) { return IsAscending ? -1 : 1; }
return x.CompareTo(y);
}
private static bool IsUnpaid(Status status)
{
return status.CompareTo(Status.Unpaid) == 0;
}
}
What am I doing wrong or how can I achieve what I'm trying to do?
Thanks in advance.
Your implementation of Compare is incorrect
public int Compare(Status x, Status y)
{
if (IsUnpaid(x)) { return IsAscending? 1 : -1; }
if (IsUnpaid(y)) { return IsAscending ? -1 : 1; }
return x.CompareTo(y);
}
Imagine, that we have IsAscending == true, IsUnpaid(x) == true and IsUnpaid(y) == true. In this case
x.Compare(y) == 1 // so x > y
y.Compare(x) == 1 // so y > x
That's why OrderBy may well enter into infinite loop (what is the right order for {x, y} collection if x > y and y > x?). You, probably, want
public int Compare(Status x, Status y) {
if (IsUnpaid(x)) {
if (!IsUnpaid(y))
return IsAscending ? -1 : 1; // x is UnPaid, y is Paid
}
else if (IsUnpaid(y)) {
return IsAscending ? 1 : -1; // x is Paid, y is UnPaid
}
// x and y either both Paid or unPaid
// If IsAscending should be taken into account, use it as below:
// return IsAscending ? x.CompareTo(y) : y.CompareTo(x);
return x.CompareTo(y);
}

Perform Linq Orderby generic list contain integer and string value [duplicate]

Anyone have a good resource or provide a sample of a natural order sort in C# for an FileInfo array? I am implementing the IComparer interface in my sorts.
The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer:
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);
Michael Kaplan has some examples of how this function works here, and the changes that were made for Vista to make it work more intuitively. The plus side of this function is that it will have the same behaviour as the version of Windows it runs on, however this does mean that it differs between versions of Windows so you need to consider whether this is a problem for you.
So a complete implementation would be something like:
[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
}
public sealed class NaturalStringComparer : IComparer<string>
{
public int Compare(string a, string b)
{
return SafeNativeMethods.StrCmpLogicalW(a, b);
}
}
public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
public int Compare(FileInfo a, FileInfo b)
{
return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
}
}
Just thought I'd add to this (with the most concise solution I could find):
public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
int max = source
.SelectMany(i => Regex.Matches(selector(i), #"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
.Max() ?? 0;
return source.OrderBy(i => Regex.Replace(selector(i), #"\d+", m => m.Value.PadLeft(max, '0')));
}
The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.
The cast to (int?) is to allow for collections of strings without any numbers (.Max() on an empty enumerable throws an InvalidOperationException).
None of the existing implementations looked great so I wrote my own. The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8). The only differences I've seen are 1) although Windows used to (e.g. XP) handle numbers of any length, it's now limited to 19 digits - mine is unlimited, 2) Windows gives inconsistent results with certain sets of Unicode digits - mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows), and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (e.g. "e-1é" vs "é1e-" - the sections before and after the number have diacritic and punctuation weight differences).
public static int CompareNatural(string strA, string strB) {
return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}
public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
CompareInfo cmp = culture.CompareInfo;
int iA = 0;
int iB = 0;
int softResult = 0;
int softResultWeight = 0;
while (iA < strA.Length && iB < strB.Length) {
bool isDigitA = Char.IsDigit(strA[iA]);
bool isDigitB = Char.IsDigit(strB[iB]);
if (isDigitA != isDigitB) {
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (!isDigitA && !isDigitB) {
int jA = iA + 1;
int jB = iB + 1;
while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
if (cmpResult != 0) {
// Certain strings may be considered different due to "soft" differences that are
// ignored if more significant differences follow, e.g. a hyphen only affects the
// comparison if no other differences follow
string sectionA = strA.Substring(iA, jA - iA);
string sectionB = strB.Substring(iB, jB - iB);
if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
cmp.Compare(sectionA + "2", sectionB + "1", options))
{
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (softResultWeight < 1) {
softResult = cmpResult;
softResultWeight = 1;
}
}
iA = jA;
iB = jB;
}
else {
char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
int jA = iA;
int jB = iB;
while (jA < strA.Length && strA[jA] == zeroA) jA++;
while (jB < strB.Length && strB[jB] == zeroB) jB++;
int resultIfSameLength = 0;
do {
isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
if (isDigitA && isDigitB) {
if (numA != numB && resultIfSameLength == 0) {
resultIfSameLength = numA < numB ? -1 : 1;
}
jA++;
jB++;
}
}
while (isDigitA && isDigitB);
if (isDigitA != isDigitB) {
// One number has more digits than the other (ignoring leading zeros) - the longer
// number must be larger
return isDigitA ? 1 : -1;
}
else if (resultIfSameLength != 0) {
// Both numbers are the same length (ignoring leading zeros) and at least one of
// the digits differed - the first difference determines the result
return resultIfSameLength;
}
int lA = jA - iA;
int lB = jB - iB;
if (lA != lB) {
// Both numbers are equivalent but one has more leading zeros
return lA > lB ? -1 : 1;
}
else if (zeroA != zeroB && softResultWeight < 2) {
softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
softResultWeight = 2;
}
iA = jA;
iB = jB;
}
}
if (iA < strA.Length || iB < strB.Length) {
return iA < strA.Length ? 1 : -1;
}
else if (softResult != 0) {
return softResult;
}
return 0;
}
The signature matches the Comparison<string> delegate:
string[] files = Directory.GetFiles(#"C:\");
Array.Sort(files, CompareNatural);
Here's a wrapper class for use as IComparer<string>:
public class CustomComparer<T> : IComparer<T> {
private Comparison<T> _comparison;
public CustomComparer(Comparison<T> comparison) {
_comparison = comparison;
}
public int Compare(T x, T y) {
return _comparison(x, y);
}
}
Example:
string[] files = Directory.EnumerateFiles(#"C:\")
.OrderBy(f => f, new CustomComparer<string>(CompareNatural))
.ToArray();
Here's a good set of filenames I use for testing:
Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
"KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
"LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
"NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
"Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
"MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
"bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
"KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
"bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
"b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
"KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
"NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
"ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
"NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
"rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
"KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
"cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
"lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
"KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
"cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
"hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
"KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
"cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
"YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
"KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
"McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
"KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
"Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
"bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
.Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
.Select(n => expand(n)).ToArray();
Matthews Horsleys answer is the fastest method which doesn't change behaviour depending on which version of windows your program is running on. However, it can be even faster by creating the regex once, and using RegexOptions.Compiled. I also added the option of inserting a string comparer so you can ignore case if needed, and improved readability a bit.
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
{
var regex = new Regex(#"\d+", RegexOptions.Compiled);
int maxDigits = items
.SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
.Max() ?? 0;
return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
}
Use by
var sortedEmployees = employees.OrderByNatural(emp => emp.Name);
This takes 450ms to sort 100,000 strings compared to 300ms for the default .net string comparison - pretty fast!
Pure C# solution for linq orderby:
http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html
public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
private bool isAscending;
public NaturalSortComparer(bool inAscendingOrder = true)
{
this.isAscending = inAscendingOrder;
}
#region IComparer<string> Members
public int Compare(string x, string y)
{
throw new NotImplementedException();
}
#endregion
#region IComparer<string> Members
int IComparer<string>.Compare(string x, string y)
{
if (x == y)
return 0;
string[] x1, y1;
if (!table.TryGetValue(x, out x1))
{
x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
table.Add(x, x1);
}
if (!table.TryGetValue(y, out y1))
{
y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
table.Add(y, y1);
}
int returnVal;
for (int i = 0; i < x1.Length && i < y1.Length; i++)
{
if (x1[i] != y1[i])
{
returnVal = PartCompare(x1[i], y1[i]);
return isAscending ? returnVal : -returnVal;
}
}
if (y1.Length > x1.Length)
{
returnVal = 1;
}
else if (x1.Length > y1.Length)
{
returnVal = -1;
}
else
{
returnVal = 0;
}
return isAscending ? returnVal : -returnVal;
}
private static int PartCompare(string left, string right)
{
int x, y;
if (!int.TryParse(left, out x))
return left.CompareTo(right);
if (!int.TryParse(right, out y))
return left.CompareTo(right);
return x.CompareTo(y);
}
#endregion
private Dictionary<string, string[]> table = new Dictionary<string, string[]>();
public void Dispose()
{
table.Clear();
table = null;
}
}
My solution:
void Main()
{
new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}
public class NaturalStringComparer : IComparer<string>
{
private static readonly Regex _re = new Regex(#"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);
public int Compare(string x, string y)
{
x = x.ToLower();
y = y.ToLower();
if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
{
if(x.Length == y.Length) return 0;
return x.Length < y.Length ? -1 : 1;
}
var a = _re.Split(x);
var b = _re.Split(y);
int i = 0;
while(true)
{
int r = PartCompare(a[i], b[i]);
if(r != 0) return r;
++i;
}
}
private static int PartCompare(string x, string y)
{
int a, b;
if(int.TryParse(x, out a) && int.TryParse(y, out b))
return a.CompareTo(b);
return x.CompareTo(y);
}
}
Results:
1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2
You do need to be careful -- I vaguely recall reading that StrCmpLogicalW, or something like it, was not strictly transitive, and I have observed .NET's sort methods to sometimes get stuck in infinite loops if the comparison function breaks that rule.
A transitive comparison will always report that a < c if a < b and b < c. There exists a function that does a natural sort order comparison that does not always meet that criterion, but I can't recall whether it is StrCmpLogicalW or something else.
This is my code to sort a string having both alpha and numeric characters.
First, this extension method:
public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
return me.OrderBy(x => Regex.Replace(x, #"\d+", m => m.Value.PadLeft(50, '0')));
}
Then, simply use it anywhere in your code like this:
List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort();
How does it works ? By replaceing with zeros:
Original | Regex Replace | The | Returned
List | Apply PadLeft | Sorting | List
| | |
"The 1st" | "The 001st" | "The 001st" | "The 1st"
"The 12th" | "The 012th" | "The 002nd" | "The 2nd"
"The 2nd" | "The 002nd" | "The 012th" | "The 12th"
Works with multiples numbers:
Alphabetical Sorting | Alphanumeric Sorting
|
"Page 21, Line 42" | "Page 3, Line 7"
"Page 21, Line 5" | "Page 3, Line 32"
"Page 3, Line 32" | "Page 21, Line 5"
"Page 3, Line 7" | "Page 21, Line 42"
Hope that's will help.
Here's a version for .NET Core 2.1+ / .NET 5.0+, using spans to avoid allocations
public class NaturalSortStringComparer : IComparer<string>
{
public static NaturalSortStringComparer Ordinal { get; } = new NaturalSortStringComparer(StringComparison.Ordinal);
public static NaturalSortStringComparer OrdinalIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.OrdinalIgnoreCase);
public static NaturalSortStringComparer CurrentCulture { get; } = new NaturalSortStringComparer(StringComparison.CurrentCulture);
public static NaturalSortStringComparer CurrentCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.CurrentCultureIgnoreCase);
public static NaturalSortStringComparer InvariantCulture { get; } = new NaturalSortStringComparer(StringComparison.InvariantCulture);
public static NaturalSortStringComparer InvariantCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.InvariantCultureIgnoreCase);
private readonly StringComparison _comparison;
public NaturalSortStringComparer(StringComparison comparison)
{
_comparison = comparison;
}
public int Compare(string x, string y)
{
// Let string.Compare handle the case where x or y is null
if (x is null || y is null)
return string.Compare(x, y, _comparison);
var xSegments = GetSegments(x);
var ySegments = GetSegments(y);
while (xSegments.MoveNext() && ySegments.MoveNext())
{
int cmp;
// If they're both numbers, compare the value
if (xSegments.CurrentIsNumber && ySegments.CurrentIsNumber)
{
var xValue = long.Parse(xSegments.Current);
var yValue = long.Parse(ySegments.Current);
cmp = xValue.CompareTo(yValue);
if (cmp != 0)
return cmp;
}
// If x is a number and y is not, x is "lesser than" y
else if (xSegments.CurrentIsNumber)
{
return -1;
}
// If y is a number and x is not, x is "greater than" y
else if (ySegments.CurrentIsNumber)
{
return 1;
}
// OK, neither are number, compare the segments as text
cmp = xSegments.Current.CompareTo(ySegments.Current, _comparison);
if (cmp != 0)
return cmp;
}
// At this point, either all segments are equal, or one string is shorter than the other
// If x is shorter, it's "lesser than" y
if (x.Length < y.Length)
return -1;
// If x is longer, it's "greater than" y
if (x.Length > y.Length)
return 1;
// If they have the same length, they're equal
return 0;
}
private static StringSegmentEnumerator GetSegments(string s) => new StringSegmentEnumerator(s);
private struct StringSegmentEnumerator
{
private readonly string _s;
private int _start;
private int _length;
public StringSegmentEnumerator(string s)
{
_s = s;
_start = -1;
_length = 0;
CurrentIsNumber = false;
}
public ReadOnlySpan<char> Current => _s.AsSpan(_start, _length);
public bool CurrentIsNumber { get; private set; }
public bool MoveNext()
{
var currentPosition = _start >= 0
? _start + _length
: 0;
if (currentPosition >= _s.Length)
return false;
int start = currentPosition;
bool isFirstCharDigit = Char.IsDigit(_s[currentPosition]);
while (++currentPosition < _s.Length && Char.IsDigit(_s[currentPosition]) == isFirstCharDigit)
{
}
_start = start;
_length = currentPosition - start;
CurrentIsNumber = isFirstCharDigit;
return true;
}
}
}
Adding to Greg Beech's answer (because I've just been searching for that), if you want to use this from Linq you can use the OrderBy that takes an IComparer. E.g.:
var items = new List<MyItem>();
// fill items
var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());
Here's a relatively simple example that doesn't use P/Invoke and avoids any allocation during execution.
Feel free to use the code from here, or if it's easier there's a NuGet package:
https://www.nuget.org/packages/NaturalSort
https://github.com/drewnoakes/natural-sort
internal sealed class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y)
{
// sort nulls to the start
if (x == null)
return y == null ? 0 : -1;
if (y == null)
return 1;
var ix = 0;
var iy = 0;
while (true)
{
// sort shorter strings to the start
if (ix >= x.Length)
return iy >= y.Length ? 0 : -1;
if (iy >= y.Length)
return 1;
var cx = x[ix];
var cy = y[iy];
int result;
if (char.IsDigit(cx) && char.IsDigit(cy))
result = CompareInteger(x, y, ref ix, ref iy);
else
result = cx.CompareTo(y[iy]);
if (result != 0)
return result;
ix++;
iy++;
}
}
private static int CompareInteger(string x, string y, ref int ix, ref int iy)
{
var lx = GetNumLength(x, ix);
var ly = GetNumLength(y, iy);
// shorter number first (note, doesn't handle leading zeroes)
if (lx != ly)
return lx.CompareTo(ly);
for (var i = 0; i < lx; i++)
{
var result = x[ix++].CompareTo(y[iy++]);
if (result != 0)
return result;
}
return 0;
}
private static int GetNumLength(string s, int i)
{
var length = 0;
while (i < s.Length && char.IsDigit(s[i++]))
length++;
return length;
}
}
It doesn't ignore leading zeroes, so 01 comes after 2.
Corresponding unit test:
public class NumericStringComparerTests
{
[Fact]
public void OrdersCorrectly()
{
AssertEqual("", "");
AssertEqual(null, null);
AssertEqual("Hello", "Hello");
AssertEqual("Hello123", "Hello123");
AssertEqual("123", "123");
AssertEqual("123Hello", "123Hello");
AssertOrdered("", "Hello");
AssertOrdered(null, "Hello");
AssertOrdered("Hello", "Hello1");
AssertOrdered("Hello123", "Hello124");
AssertOrdered("Hello123", "Hello133");
AssertOrdered("Hello123", "Hello223");
AssertOrdered("123", "124");
AssertOrdered("123", "133");
AssertOrdered("123", "223");
AssertOrdered("123", "1234");
AssertOrdered("123", "2345");
AssertOrdered("0", "1");
AssertOrdered("123Hello", "124Hello");
AssertOrdered("123Hello", "133Hello");
AssertOrdered("123Hello", "223Hello");
AssertOrdered("123Hello", "1234Hello");
}
private static void AssertEqual(string x, string y)
{
Assert.Equal(0, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal(0, NaturalStringComparer.Instance.Compare(y, x));
}
private static void AssertOrdered(string x, string y)
{
Assert.Equal(-1, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal( 1, NaturalStringComparer.Instance.Compare(y, x));
}
}
I've actually implemented it as an extension method on the StringComparer so that you could do for example:
StringComparer.CurrentCulture.WithNaturalSort() or
StringComparer.OrdinalIgnoreCase.WithNaturalSort().
The resulting IComparer<string> can be used in all places like OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc. And you can still easily tweak case sensitivity, culture, etc.
The implementation is fairly trivial and it should perform quite well even on large sequences.
I've also published it as a tiny NuGet package, so you can just do:
Install-Package NaturalSort.Extension
The code including XML documentation comments and suite of tests is available in the NaturalSort.Extension GitHub repository.
The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):
public static class StringComparerNaturalSortExtension
{
public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);
private class NaturalSortComparer : IComparer<string>
{
public NaturalSortComparer(StringComparer stringComparer)
{
_stringComparer = stringComparer;
}
private readonly StringComparer _stringComparer;
private static readonly Regex NumberSequenceRegex = new Regex(#"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;
public int Compare(string s1, string s2)
{
var tokens1 = Tokenize(s1);
var tokens2 = Tokenize(s2);
var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
if (zipCompare != 0)
return zipCompare;
var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
return lengthCompare;
}
private int TokenCompare(string token1, string token2)
{
var number1 = ParseNumberOrZero(token1);
var number2 = ParseNumberOrZero(token2);
var numberCompare = number1.CompareTo(number2);
if (numberCompare != 0)
return numberCompare;
var stringCompare = _stringComparer.Compare(token1, token2);
return stringCompare;
}
}
}
Inspired by Michael Parker's solution, here is an IComparer implementation that you can drop in to any of the linq ordering methods:
private class NaturalStringComparer : IComparer<string>
{
public int Compare(string left, string right)
{
int max = new[] { left, right }
.SelectMany(x => Regex.Matches(x, #"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
.Max() ?? 0;
var leftPadded = Regex.Replace(left, #"\d+", m => m.Value.PadLeft(max, '0'));
var rightPadded = Regex.Replace(right, #"\d+", m => m.Value.PadLeft(max, '0'));
return string.Compare(leftPadded, rightPadded);
}
}
Here is a naive one-line regex-less LINQ way (borrowed from python):
var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]
Expanding on a couple of the previous answers and making use of extension methods, I came up with the following that doesn't have the caveats of potential multiple enumerable enumeration, or performance issues concerned with using multiple regex objects, or calling regex needlessly, that being said, it does use ToList(), which can negate the benefits in larger collections.
The selector supports generic typing to allow any delegate to be assigned, the elements in the source collection are mutated by the selector, then converted to strings with ToString().
private static readonly Regex _NaturalOrderExpr = new Regex(#"\d+", RegexOptions.Compiled);
public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderBy(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderByDescending(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
A version that's easier to read/maintain.
public class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y) {
const int LeftIsSmaller = -1;
const int RightIsSmaller = 1;
const int Equal = 0;
var leftString = x;
var rightString = y;
var stringComparer = CultureInfo.CurrentCulture.CompareInfo;
int rightIndex;
int leftIndex;
for (leftIndex = 0, rightIndex = 0;
leftIndex < leftString.Length && rightIndex < rightString.Length;
leftIndex++, rightIndex++) {
var leftChar = leftString[leftIndex];
var rightChar = rightString[leftIndex];
var leftIsNumber = char.IsNumber(leftChar);
var rightIsNumber = char.IsNumber(rightChar);
if (!leftIsNumber && !rightIsNumber) {
var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, leftIndex, 1);
if (result != 0) return result;
} else if (leftIsNumber && !rightIsNumber) {
return LeftIsSmaller;
} else if (!leftIsNumber && rightIsNumber) {
return RightIsSmaller;
} else {
var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);
if (leftNumberLength < rightNumberLength) {
return LeftIsSmaller;
} else if (leftNumberLength > rightNumberLength) {
return RightIsSmaller;
} else {
if(leftNumber < rightNumber) {
return LeftIsSmaller;
} else if(leftNumber > rightNumber) {
return RightIsSmaller;
}
}
}
}
if (leftString.Length < rightString.Length) {
return LeftIsSmaller;
} else if(leftString.Length > rightString.Length) {
return RightIsSmaller;
}
return Equal;
}
public int NumberLength(string str, int offset, out int number) {
if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");
var currentOffset = offset;
var curChar = str[currentOffset];
if (!char.IsNumber(curChar))
throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));
int length = 1;
var numberString = string.Empty;
for (currentOffset = offset + 1;
currentOffset < str.Length;
currentOffset++, length++) {
curChar = str[currentOffset];
numberString += curChar;
if (!char.IsNumber(curChar)) {
number = int.Parse(numberString);
return length;
}
}
number = int.Parse(numberString);
return length;
}
}
We had a need for a natural sort to deal with text with the following pattern:
"Test 1-1-1 something"
"Test 1-2-3 something"
...
For some reason when I first looked on SO, I didn't find this post and implemented our own. Compared to some of the solutions presented here, while similar in concept, it could have the benefit of maybe being simpler and easier to understand. However, while I did try to look at performance bottlenecks, It is still a much slower implementation than the default OrderBy().
Here is the extension method I implement:
public static class EnumerableExtensions
{
// set up the regex parser once and for all
private static readonly Regex Regex = new Regex(#"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);
// stateless comparer can be built once
private static readonly AggregateComparer Comparer = new AggregateComparer();
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
{
// first extract string from object using selector
// then extract digit and non-digit groups
Func<T, IEnumerable<IComparable>> splitter =
s => Regex.Matches(selector(s))
.Cast<Match>()
.Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value);
return source.OrderBy(splitter, Comparer);
}
/// <summary>
/// This comparer will compare two lists of objects against each other
/// </summary>
/// <remarks>Objects in each list are compare to their corresponding elements in the other
/// list until a difference is found.</remarks>
private class AggregateComparer : IComparer<IEnumerable<IComparable>>
{
public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
{
return
x.Zip(y, (a, b) => new {a, b}) // walk both lists
.Select(pair => pair.a.CompareTo(pair.b)) // compare each object
.FirstOrDefault(result => result != 0); // until a difference is found
}
}
}
The idea is to split the original strings into blocks of digits and non-digits ("\d+|\D+"). Since this is a potentially expensive task, it is done only once per entry. We then use a comparer of comparable objects (sorry, I can't find a more proper way to say it). It compares each block to its corresponding block in the other string.
I would like feedback on how this could be improved and what the major flaws are. Note that maintainability is important to us at this point and we are not currently using this in extremely large data sets.
Let me explain my problem and how i was able to solve it.
Problem:- Sort files based on FileName from FileInfo objects which are retrieved from a Directory.
Solution:- I selected the file names from FileInfo and trimed the ".png" part of the file name. Now, just do List.Sort(), which sorts the filenames in Natural sorting order. Based on my testing i found that having .png messes up sorting order. Have a look at the below code
var imageNameList = new DirectoryInfo(#"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

fastest way for accessing double array as key in dictionary

I have a double[] array, i want to use it as key (not literally, but in the way that the key is matched when all the doubles in the double array need to be matched)
What is the fastest way to use the double[] array as key to dictionary?
Is it using
Dictionary<string, string> (convert double[] to a string)
or
anything else like converting it
Given that all key arrays will have the same length, either consider using a Tuple<,,, ... ,>, or use a structural equality comparer on the arrays.
With tuple:
var yourDidt = new Dictionary<Tuple<double, double, double>, string>();
yourDict.Add(Tuple.Create(3.14, 2.718, double.NaN), "da value");
string read = yourDict[Tuple.Create(3.14, 2.718, double.NaN)];
With (strongly typed version of) StructuralEqualityComparer:
class DoubleArrayStructuralEqualityComparer : EqualityComparer<double[]>
{
public override bool Equals(double[] x, double[] y)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.Equals(x, y);
}
public override int GetHashCode(double[] obj)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.GetHashCode(obj);
}
}
...
var yourDict = new Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer());
yourDict.Add(new[] { 3.14, 2.718, double.NaN, }, "da value");
string read = yourDict[new[] { 3.14, 2.718, double.NaN, }];
Also consider the suggestion by Sergey Berezovskiy to create a custom class or (immutable!) struct to hold your set of doubles. In that way you can name your type and its members in a natural way that makes it more clear what you do. And your class/struct can easily be extended later on, if needed.
Thus all arrays have same length and each item in array have specific meaning, then create class which holds all items as properties with descriptive names. E.g. instead of double array with two items you can have class Point with properties X and Y. Then override Equals and GetHashCode of this class and use it as key (see What is the best algorithm for an overriding GetHashCode):
Dictionary<Point, string>
Benefits - instead of having array, you have data structure which makes its purpose clear. Instead of referencing items by indexes, you have nice named property names, which also make their purpose clear. And also speed - calculating hash code is fast. Compare:
double[] a = new [] { 12.5, 42 };
// getting first coordinate a[0];
Point a = new Point { X = 12.5, Y = 42 };
// getting first coordinate a.X
[Do not consider this a separate answer; this is an extension of #JeppeStigNielsen's answer]
I'd just like to point out that you make Jeppe's approach generic as follows:
public class StructuralEqualityComparer<T>: IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
}
public int GetHashCode(T obj)
{
return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
public static StructuralEqualityComparer<T> Default
{
get
{
StructuralEqualityComparer<T> comparer = _defaultComparer;
if (comparer == null)
{
comparer = new StructuralEqualityComparer<T>();
_defaultComparer = comparer;
}
return comparer;
}
}
private static StructuralEqualityComparer<T> _defaultComparer;
}
(From an original answer here: https://stackoverflow.com/a/5601068/106159)
Then you would declare the dictionary like this:
var yourDict = new Dictionary<double[], string>(new StructuralEqualityComparer<double[]>());
Note: It might be better to initialise _defaultComparer using Lazy<T>.
[EDIT]
It's possible that this might be faster; worth a try:
class DoubleArrayComparer: IEqualityComparer<double[]>
{
public bool Equals(double[] x, double[] y)
{
if (x == y)
return true;
if (x == null || y == null)
return false;
if (x.Length != y.Length)
return false;
for (int i = 0; i < x.Length; ++i)
if (x[i] != y[i])
return false;
return true;
}
public int GetHashCode(double[] data)
{
if (data == null)
return 0;
int result = 17;
foreach (var value in data)
result += result*23 + value.GetHashCode();
return result;
}
}
...
var yourDict = new Dictionary<double[], string>(new DoubleArrayComparer());
Ok this is what I found so far:
I input an entry (length 4 arrray) to the dictionary, and access it for 999999 times on my machine:
Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer()); takes 1.75 seconds
Dictionary<Tuple<double...>,string> takes 0.85 seconds
The code below takes 0.1755285 seconds, which is the fastest now! (in line with the comment with Sergey.)
The fastest - The code of DoubleArrayComparer by Matthew Watson takes 0.15 seconds!
public class DoubleArray
{
private double[] d = null;
public DoubleArray(double[] d)
{
this.d = d;
}
public override bool Equals(object obj)
{
if (!(obj is DoubleArray)) return false;
DoubleArray dobj = (DoubleArray)obj;
if (dobj.d.Length != d.Length) return false;
for (int i = 0; i < d.Length; i++)
{
if (dobj.d[i] != d[i]) return false;
}
return true;
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
for (int i = 0; i < d.Length;i++ )
{
hash = hash*23 + d[i].GetHashCode();
}
return hash;
}
}
}

Categories

Resources