A list of unique sequences with EqualityComparer

A list of unique sequences with EqualityComparer - c#

I want to make a List<List<int>> structure so that only distinct sequences of numbers can be added. For this I create an EqualityComparer this way:
public class ListEqualityComparer<TElement> : EqualityComparer<List<TElement>>
{
Object hashHelp1 = new Object();
Object hashHelp2 = new Object();
public override bool Equals(List<TElement> x, List<TElement> y)
{
if (x == null && y == null)
return true;
else if (x == null || y == null)
return false;
else if (x.SequenceEqual(y))
return true;
else
return false;
}
public override int GetHashCode(List<TElement> obj)
{
return (hashHelp1, hashHelp2).GetHashCode();
}
}
so I want two lists to be considered equal, if their sequences of elements are equal.
However, if I try to create
ListEqualityComparer<List<int>> LEC = new ListEqualityComparer<List<int>>();
List<List<int>> list = new List<List<int>>(LEC);
I get an error:
Argument 1: cannot convert from 'MyNamespace.MyFolder.ListEqualityComparer<System.Collections.Generic.List<int>>' to 'System.Collections.Generic.IEnumerable<System.Collections.Generic.List<int>>'.
What does it actually mean?
On my belief the program should just replace TElement with List<int>and go, but obviously something different happens here. I am somehow confused, because I have implemented abother EqualityComparer recently, and the lines
MyStructureEqualityComparer<int, ulong> MSEC= new MyStructureEqualityComparer<int, ulong>();
HashSet<MyStructure<int, ulong>> test = new HashSet<MyStructure<int,ulong>>(MSEC);
were fine. What I am doing wrong?

Related

Math Comparison on 2 Numbers with Multiple Decimals

I need to compare section numbers in a document, at first I was just going to convert to decimal and check to see if one number is greater than another.
The issue there is that some sections have multiple decimals.
Example: I need to perform a math comparison on 1.1 with 1.1.2.3 to determine which one is farther along in a document.
They are strings to begin with and I need to do some math comparisons on them essentially.
I though about removing the decimals then converting to int but this throws off certain sections, like section 2 would be considered less than section 1.1 since 1.1 would be changed to a 11, which is no good.
string section1 = "1.1";
string section2 = "2";
int convertedSection1 = Convert.ToInt32(section1.Replace(".",""));
int convertedSection2 = Convert.ToInt32(section2.Replace(".",""));
if(convertedSection1 < convertedSection2)
//This will incorrectly say 1.1 is greater than 2
string section1 = "1.1.2.4";
string section2 = "2.4";
decimal convertedSection1 = Convert.ToDecimal(section1);
decimal convertedSection2 = Convert.ToDecimal(section2);
if(convertedSection1 < convertedSection2)
//This will convert 1.1.2.4 to 1.1 which is no good

You can create a class similar to the Version class of the .NET framework. If you implement some operators and IComparable, it's really nice.
How does it work? It will convert the given string into a list of integer numbers. Upon comparison, it will start at the beginning of each list and compare each individual part.
public class Section: IComparable<Section>
{
// Stores all individual components of the section
private List<int> parts = new List<int>();
// Construct a section from a string
public Section(string section)
{
var strings = section.Split('.');
foreach (var s in strings)
{
parts.Add(int.Parse(s));
}
}
// Make it nice for display
public override string ToString()
{
return string.Join(".", parts);
}
// Implement comparison operators for convenience
public static bool operator ==(Section a, Section b)
{
// Comparing the identical object
if (ReferenceEquals(a, b)) return true;
// One object is null and the other isn't
if ((object)a == null) return false;
if ((object)b == null) return false;
// Different amount of items
if (a.parts.Count != b.parts.Count) return false;
// Check all individual items
for (int i=0; i<a.parts.Count;i++)
{
if (a.parts[i] != b.parts[i]) return false;
}
return true;
}
public static bool operator !=(Section a, Section b)
{
return !(a == b);
}
public static bool operator <(Section a, Section b)
{
// Use minimum, otherwise we exceed the index
for (int i=0; i< Math.Min(a.parts.Count, b.parts.Count); i++)
{
if (a.parts[i] < b.parts[i]) return true;
}
if (b.parts.Count > a.parts.Count) return true;
return false;
}
public static bool operator >(Section a, Section b)
{
// Use minimum, otherwise we exceed the index
for (int i = 0; i < Math.Min(a.parts.Count, b.parts.Count); i++)
{
if (a.parts[i] > b.parts[i]) return true;
}
if (a.parts.Count > b.parts.Count) return true;
return false;
}
// Implement the IComparable interface for sorting
public int CompareTo(Section other)
{
if (this == other) return 0;
if (this < other) return -1;
return 1;
}
}
Tests for 96% coverage:
Assert.IsTrue(new Section("1.2.3.4") > new Section("1.2.3"));
Assert.IsFalse(new Section("1.2.3.4") < new Section("1.2.3"));
Assert.IsFalse(new Section("1.2.3.4") == new Section("1.2.3"));
Assert.IsTrue(new Section("1.2.3.4") == new Section("1.2.3.4"));
Assert.IsFalse(new Section("1.2.3.4") == new Section("1.2.3.5"));
Assert.IsTrue(new Section("1.2.3.4") != new Section("1.2.3.5"));
var sec = new Section("1");
Assert.IsTrue(sec == sec);
Assert.AreEqual("1.2.3.4", new Section("1.2.3.4").ToString());
var sortTest = new List<Section> { new Section("2"), new Section("1.2"), new Section("1"), new Section("3.1") };
sortTest.Sort();
var expected = new List<Section> { new Section("1"), new Section("1.2"), new Section("2"), new Section("3.1") };
CollectionAssert.AreEqual(expected, sortTest, new SectionComparer());

If you know that your section strings are always well formed, and you know that they don't go deeper than 6 levels, and that no level has more than 999 items, then this works nicely:
string zero = ".0.0.0.0.0.0";
long Section2Long(string section) =>
(section + zero)
.Split('.')
.Take(6)
.Select(t => long.Parse(t))
.Aggregate((x, y) => x * 1000 + y);
Now, if I have this:
string[] sections = new []
{
"1.2.4", "2.3", "1", "1.2", "1.1.1.1", "1.0.0.1.0.1", "2.2.9"
};
I can easily sort it like this:
string[] sorted = sections.OrderBy(x => Section2Long(x)).ToArray();
I get this output:
1
1.0.0.1.0.1
1.1.1.1
1.2
1.2.4
2.2.9
2.3

How to iterate only distinct string values by custom substring equality

Similar to this question, I'm trying to iterate only distinct values of sub-string of given strings, for example:
List<string> keys = new List<string>()
{
"foo_boo_1",
"foo_boo_2,
"foo_boo_3,
"boo_boo_1"
}
The output for the selected distinct values should be (select arbitrary the first sub-string's distinct value):
foo_boo_1 (the first one)
boo_boo_1
I've tried to implement this solution using the IEqualityComparer with:
public class MyEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
int xIndex = x.LastIndexOf("_");
int yIndex = y.LastIndexOf("_");
if (xIndex > 0 && yIndex > 0)
return x.Substring(0, xIndex) == y.Substring(0, yIndex);
else
return false;
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
foreach (var key in myList.Distinct(new MyEqualityComparer()))
{
Console.WriteLine(key)
}
But the resulted output is:
foo_boo_1
foo_boo_2
foo_boo_3
boo_boo_1
Using the IEqualityComparer How do I remove the sub-string distinct values (foo_boo_2 and foo_boo_3)?
*Please note that the "real" keys are a lot longer, something like "1_0_8-B153_GF_6_2", therefore I must use the LastIndexOf.

Your current implementation has some flaws:
Both Equals and GetHashCode must never throw exception (you have to check for null)
If Equals returns true for x and y then GetHashCode(x) == GetHashCode(y). Counter example is "abc_1" and "abc_2".
The 2nd error can well cause Distinct return incorrect results (Distinct first compute hash).
Correct code can be something like this
public class MyEqualityComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
if (ReferenceEquals(x, y))
return true;
else if ((null == x) || (null == y))
return false;
int xIndex = x.LastIndexOf('_');
int yIndex = y.LastIndexOf('_');
if (xIndex >= 0)
return (yIndex >= 0)
? x.Substring(0, xIndex) == y.Substring(0, yIndex)
: false;
else if (yIndex >= 0)
return false;
else
return x == y;
}
public int GetHashCode(string obj) {
if (null == obj)
return 0;
int index = obj.LastIndexOf('_');
return index < 0
? obj.GetHashCode()
: obj.Substring(0, index).GetHashCode();
}
}
Now you are ready to use it with Distinct:
foreach (var key in myList.Distinct(new MyEqualityComparer())) {
Console.WriteLine(key)
}

Your GetHashCode method in your equality comparer is returning the hash code for the entire string, just make it hash the substring, for example:
public int GetHashCode(string obj)
{
var index = obj.LastIndexOf("_");
return obj.Substring(0, index).GetHashCode();
}

For a more succinct solution that avoids using a custom IEqualityComparer<>, you could utilise GroupBy. For example:
var keys = new List<string>()
{
"foo_boo_1",
"foo_boo_2",
"foo_boo_3",
"boo_boo_1"
};
var distinct = keys
.Select(k => new
{
original = k,
truncated = k.Contains("_") ? k.Substring(0, k.LastIndexOf("_")) : k
})
.GroupBy(k => k.truncated)
.Select(g => g.First().original);
This outputs:
foo_boo_1
boo_boo_1

Huge two list<string> data should be compared using c# and must ensure performance improves

I have a two list contains huge data and had the following code which I used till now.
But have some doubts regarding this, due to lot of confusion about the data is compared or not in the list items.
Here I am using sequence Equal to compare the data
I have two questions, somewhere I found that sequenceEqual will compare the data in the lists. So used it.
1. Will sequenceEqual compares the data in the lists
2. better way of code to improve performance. As per understanding I kept only three items in both the lists but our requirement has huge data itmes in the lists. so need to improve performance
bool value = false;
List<string> list1 = new List<string>();
list1.Add("one");
list1.Add("two");
list1.Add( "three" );
List<string> list2 = new List<string>();
list2.Add("one");
list2.Add("two");
list2.Add("three");
list1.Sort();
list2.Sort();
if (list1.SequenceEqual(list2))
{
value = true;
}
else
{
value = false;
}
return value;

Here is the implementation of SequenceEqual:
public static bool SequenceEqual<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (comparer == null)
{
comparer = EqualityComparer<TSource>.Default;
}
if (first == null)
{
throw Error.ArgumentNull("first");
}
if (second == null)
{
throw Error.ArgumentNull("second");
}
using (IEnumerator<TSource> enumerator = first.GetEnumerator())
{
using (IEnumerator<TSource> enumerator2 = second.GetEnumerator())
{
while (enumerator.MoveNext())
{
if (!enumerator2.MoveNext() || !comparer.Equals(enumerator.Current, enumerator2.Current))
{
return false;
}
}
if (enumerator2.MoveNext())
{
return false;
}
}
}
return true;
}
It does not check length and simply traverses both lists to confirm if they are equal.
If you check two lists that are 1_000_000 and 1_000_001 in length then it is terribly inefficient.
So check the lengths first, then call SequenceEqual only if they are the same.
If you're checking for set equality rather than position equality then make a HashSet<string> out of each and then check against those.

As I understand it, using the SequenceEqual method from Linq will be very efficient. It effectively returns true if, quote, "the two source sequences are of equal length and their corresponding elements are equal according to the default equality comparer for their type", otherwise false. There won't really be a much faster way of doing this comparision.

Your method of comparing the two lists is essentially this:
private static bool SequenceUnorderedEqual<T>(IEnumerable<T> source1,
IEnumerable<T> source2)
{
var list1 = source1.ToList();
var list2 = source2.ToList();
list1.Sort();
list2.Sort();
return list1.SequenceEqual(list2);
}
Using a dictionary is at least ten times faster, at the cost of being a bit more complex.
private static bool SequenceUnorderedEqual<T>(IEnumerable<T> source1,
IEnumerable<T> source2)
{
var occurences = new Dictionary<T, int>();
// Populating the dictionary using source1
foreach (var item in source1)
{
int value;
if (!occurences.TryGetValue(item, out value))
{
value = 0;
}
occurences[item] = value + 1;
}
// Depopulating the dictionary using source2
foreach (var item in source2)
{
int value;
if (!occurences.TryGetValue(item, out value))
{
value = 0;
}
if (value <= 0) return false;
occurences[item] = value - 1;
}
// Now all counters should be reduced to zero
return occurences.Values.All(c => c == 0);
}
Update: bellow is an even faster version that uses parallelism.
private static bool SequenceUnorderedEqual<T>(IEnumerable<T> source1,
IEnumerable<T> source2)
{
var task1 = Task.Run(() => GetOccurences(source1));
var task2 = Task.Run(() => GetOccurences(source2));
Task.WaitAll(task1, task2);
var occurences1 = task1.Result;
var occurences2 = task2.Result;
if (occurences1.Count != occurences2.Count) return false;
var result = Parallel.ForEach(occurences1, (entry1, loopState) =>
{
var found2 = occurences2.TryGetValue(entry1.Key, out var value2);
if (!found2 || entry1.Value != value2) loopState.Stop();
});
return result.IsCompleted;
Dictionary<T, int> GetOccurences(IEnumerable<T> source)
{
var dic = new Dictionary<T, int>();
foreach (var item in source)
{
dic[item] = dic.TryGetValue(item, out int c) ? c + 1 : 1;
}
return dic;
}
}

fastest way for accessing double array as key in dictionary

I have a double[] array, i want to use it as key （not literally, but in the way that the key is matched when all the doubles in the double array need to be matched)
What is the fastest way to use the double[] array as key to dictionary?
Is it using
Dictionary<string, string> (convert double[] to a string)
or
anything else like converting it

Given that all key arrays will have the same length, either consider using a Tuple<,,, ... ,>, or use a structural equality comparer on the arrays.
With tuple:
var yourDidt = new Dictionary<Tuple<double, double, double>, string>();
yourDict.Add(Tuple.Create(3.14, 2.718, double.NaN), "da value");
string read = yourDict[Tuple.Create(3.14, 2.718, double.NaN)];
With (strongly typed version of) StructuralEqualityComparer:
class DoubleArrayStructuralEqualityComparer : EqualityComparer<double[]>
{
public override bool Equals(double[] x, double[] y)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.Equals(x, y);
}
public override int GetHashCode(double[] obj)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.GetHashCode(obj);
}
}
...
var yourDict = new Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer());
yourDict.Add(new[] { 3.14, 2.718, double.NaN, }, "da value");
string read = yourDict[new[] { 3.14, 2.718, double.NaN, }];
Also consider the suggestion by Sergey Berezovskiy to create a custom class or (immutable!) struct to hold your set of doubles. In that way you can name your type and its members in a natural way that makes it more clear what you do. And your class/struct can easily be extended later on, if needed.

Thus all arrays have same length and each item in array have specific meaning, then create class which holds all items as properties with descriptive names. E.g. instead of double array with two items you can have class Point with properties X and Y. Then override Equals and GetHashCode of this class and use it as key (see What is the best algorithm for an overriding GetHashCode):
Dictionary<Point, string>
Benefits - instead of having array, you have data structure which makes its purpose clear. Instead of referencing items by indexes, you have nice named property names, which also make their purpose clear. And also speed - calculating hash code is fast. Compare:
double[] a = new [] { 12.5, 42 };
// getting first coordinate a[0];
Point a = new Point { X = 12.5, Y = 42 };
// getting first coordinate a.X

[Do not consider this a separate answer; this is an extension of #JeppeStigNielsen's answer]
I'd just like to point out that you make Jeppe's approach generic as follows:
public class StructuralEqualityComparer<T>: IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
}
public int GetHashCode(T obj)
{
return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
public static StructuralEqualityComparer<T> Default
{
get
{
StructuralEqualityComparer<T> comparer = _defaultComparer;
if (comparer == null)
{
comparer = new StructuralEqualityComparer<T>();
_defaultComparer = comparer;
}
return comparer;
}
}
private static StructuralEqualityComparer<T> _defaultComparer;
}
(From an original answer here: https://stackoverflow.com/a/5601068/106159)
Then you would declare the dictionary like this:
var yourDict = new Dictionary<double[], string>(new StructuralEqualityComparer<double[]>());
Note: It might be better to initialise _defaultComparer using Lazy<T>.
[EDIT]
It's possible that this might be faster; worth a try:
class DoubleArrayComparer: IEqualityComparer<double[]>
{
public bool Equals(double[] x, double[] y)
{
if (x == y)
return true;
if (x == null || y == null)
return false;
if (x.Length != y.Length)
return false;
for (int i = 0; i < x.Length; ++i)
if (x[i] != y[i])
return false;
return true;
}
public int GetHashCode(double[] data)
{
if (data == null)
return 0;
int result = 17;
foreach (var value in data)
result += result*23 + value.GetHashCode();
return result;
}
}
...
var yourDict = new Dictionary<double[], string>(new DoubleArrayComparer());

Ok this is what I found so far:
I input an entry (length 4 arrray) to the dictionary, and access it for 999999 times on my machine:
Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer()); takes 1.75 seconds
Dictionary<Tuple<double...>,string> takes 0.85 seconds
The code below takes 0.1755285 seconds, which is the fastest now! (in line with the comment with Sergey.)
The fastest - The code of DoubleArrayComparer by Matthew Watson takes 0.15 seconds!
public class DoubleArray
{
private double[] d = null;
public DoubleArray(double[] d)
{
this.d = d;
}
public override bool Equals(object obj)
{
if (!(obj is DoubleArray)) return false;
DoubleArray dobj = (DoubleArray)obj;
if (dobj.d.Length != d.Length) return false;
for (int i = 0; i < d.Length; i++)
{
if (dobj.d[i] != d[i]) return false;
}
return true;
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
for (int i = 0; i < d.Length;i++ )
{
hash = hash*23 + d[i].GetHashCode();
}
return hash;
}
}
}

Removing duplicates from string array

I'm new to C#, have looked at numerous posts but am still confused.
I have a array list:
List<Array> moves = new List<Array>();
I'm adding moves to it using the following:
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
And now I wish to remove duplicates using the following:
moves = moves.Distinct();
However it's not letting me do it. I get this error:
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable' to 'System.Collections.Generic.List'. An explicit conversion exists (are you missing a cast?)
Help please? I'd be so grateful.
Steve

You need to call .ToList() after the .Distinct method as it returns IEnumerable<T>. I would also recommend you using a strongly typed List<string[]> instead of List<Array>:
List<string[]> moves = new List<string[]>();
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
moves.Add(newmove);
moves = moves.Distinct().ToList();
// At this stage moves.Count = 1

Your code has two errors. The first is the missing call to ToList, as already pointed out. The second is subtle. Unique compares objects by identity, but your duplicate list items have are different array instances.
There are multiple solutions for that problem.
Use a custom equality comparer in moves.Distinct().ToList(). No further changes necessary.
Sample implementation:
class ArrayEqualityComparer<T> : EqualityComparer<T> {
public override bool Equals(T[] x, T[] y) {
if ( x == null ) return y == null;
else if ( y == null ) return false;
return x.SequenceEquals(y);
}
public override int GetHashCode(T[] obj) {
if ( obj == null) return 0;
return obj.Aggregate(0, (hash, x) => hash ^ x.GetHashCode());
}
}
Filtering for unique items:
moves = moves.Distinct(new ArrayEqualityComparer<string>()).ToList();
Use Tuple<string,string,string> instead of string[]. Tuple offers built-in structural equality and comparison. This variant might make your code cluttered because of the long type name.
Instantiation:
List<Tuple<string, string, string>> moves =
new List<Tuple<string, string, string>>();
Adding new moves:
Tuple<string, string, string> newmove =
Tuple.Create(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
Use a custom class to hold your three values. I'd actually recommend this variant, because it makes all your code dealing with moves much more readable.
Sample implementation:
class Move {
public Move(string piece, string axis, string direction) {
Piece = piece;
Axis = axis;
Direction = direction;
}
string Piece { get; private set; }
string Axis { get; private set; }
string Direction { get; private set; }
public override Equals(object obj) {
Move other = obj as Move;
if ( other != null )
return Piece == other.Piece &&
Axis == other.Axis &&
Direction == other.Direction;
return false;
}
public override GetHashCode() {
return Piece.GetHashCode() ^
Axis.GetHashCode() ^
Direction.GetHashCode();
}
// TODO: override ToString() as well
}
Instantiation:
List<Move> moves = new List<Move>();
Adding new moves:
Move newmove = new Move(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();

The compiler error is because you need to convert the result to a list:
moves = moves.Distinct().ToList();
However it probably won't work as you want, because arrays don't have Equals defined in the way that you are hoping (it compares the references of the array objects, not the values inside the array). Instead of using an array, create a class to hold your data and define Equals and GetHashCode to compare the values.

Old question, but this is an O(n) solution using O(1) additional space:
public static void RemoveDuplicates(string[] array)
{
int c = 0;
int i = -1;
for (int n = 1; n < array.Length; n++)
{
if (array[c] == array[n])
{
if (i == -1)
{
i = n;
}
}
else
{
if (i == -1)
{
c++;
}
else
{
array[i] = array[n];
c++;
i++;
}
}
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

A list of unique sequences with EqualityComparer - c#

Related

Math Comparison on 2 Numbers with Multiple Decimals

How to iterate only distinct string values by custom substring equality

Huge two list<string> data should be compared using c# and must ensure performance improves

fastest way for accessing double array as key in dictionary

Removing duplicates from string array

Categories

Resources