Removing duplicate byte[]s from a collection

This is probably an extremely simple question. I'm trying to remove duplicate byte[]s from a collection.
Since the default behaviour is to compare references, I thought that creating an IEqualityComparer would work, but it doesn't.
I've tried using a HashSet and LINQ's Distinct().
Sample code:
using System;
using System.Collections.Generic;
using System.Linq;
namespace cstest
{
class Program
{
static void Main(string[] args)
{
var l = new List<byte[]>();
l.Add(new byte[] { 5, 6, 7 });
l.Add(new byte[] { 5, 6, 7 });
Console.WriteLine(l.Distinct(new ByteArrayEqualityComparer()).Count());
Console.ReadKey();
}
}
class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
{
public bool Equals(byte[] x, byte[] y)
{
return x.SequenceEqual(y);
}
public int GetHashCode(byte[] obj)
{
return obj.GetHashCode();
}
}
}
Output:
2

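Distinct calls GetHashCode before it ever calls Equals, and the version above won't work "as is": obj.GetHashCode() on an array is reference-based, so two arrays with equal contents almost always get different hash codes and Equals is never consulted. Try something like: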
int result = 13 * obj.Length;
for(int i = 0 ; i < obj.Length ; i++) {
result = (17 * result) + obj[i];
}
return result;
which should provide the necessary equality conditions for hash-codes.
Personally, I would also unroll the equality test for performance:
if(ReferenceEquals(x,y)) return true;
if(x == null || y == null) return false;
if(x.Length != y.Length) return false;
for(int i = 0 ; i < x.Length; i++) {
if(x[i] != y[i]) return false;
}
return true;
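Putting the two fragments above together, a complete comparer might look like the following sketch. The names ByteArrayComparer and ByteArrayDemo are assumptions chosen for illustration, and the unchecked block and null check are additions so that the hash arithmetic wraps instead of throwing if the project is compiled with overflow checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name; combines the hash and equality fragments from the answer above.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;     // same instance, or both null
        if (x == null || y == null) return false;   // exactly one is null
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        if (obj == null) return 0;
        unchecked // let the multiplication wrap on overflow
        {
            int result = 13 * obj.Length;
            for (int i = 0; i < obj.Length; i++)
            {
                result = (17 * result) + obj[i];
            }
            return result;
        }
    }
}

public static class ByteArrayDemo
{
    // Counts the arrays that differ by content rather than by reference.
    public static int CountDistinct(IEnumerable<byte[]> items)
    {
        return items.Distinct(new ByteArrayComparer()).Count();
    }
}
```

With the two { 5, 6, 7 } arrays from the question, CountDistinct returns 1. The same comparer can be passed to the HashSet<byte[]> constructor.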

Related

How to use ExceptWith with the type of HashSet<ReadOnlyCollection<string>> in C#?

HashSet<ReadOnlyCollection<int>> test1 = new HashSet<ReadOnlyCollection<int>> ();
for (int i = 0; i < 10; i++) {
List<int> temp = new List<int> ();
for (int j = 1; j < 2; j++) {
temp.Add (i);
temp.Add (j);
}
test1.Add (temp.AsReadOnly ());
}
Here test1 is {[0,1], [1,1], [2,1], [3,1], [4,1], [5,1], [6,1], [7,1], [8,1], [9,1]}
HashSet<ReadOnlyCollection<int>> test2 = new HashSet<ReadOnlyCollection<int>> ();
for (int i = 5; i < 10; i++) {
List<int> temp = new List<int> ();
for (int j = 1; j < 2; j++) {
temp.Add (i);
temp.Add (j);
}
test2.Add (temp.AsReadOnly ());
}
Here test2 is {[5,1], [6,1], [7,1], [8,1], [9,1]}
test1.ExceptWith(test2);
After doing this, I want test1 to be {[0,1], [1,1], [2,1], [3,1], [4,1]}, but it gives me the original test1.
How can I fix this problem? Or is there another way to do the same thing? Thank you!

Objects in C# are usually compared by reference, not by value.
This means that new object() != new object(). In the same way, new List<int>() { 1 } != new List<int>() { 1 }. Structs and primitives, on the other hand, are compared by value, not by reference.
Some objects override their equality method to compare values instead. For example strings: new string(new[] { 'a', 'b', 'c'}) == "abc", even if object.ReferenceEquals(new string(new[] { 'a', 'b', 'c'}), "abc") == false.
But collections, lists, arrays etc. do not. For good reason - when comparing two lists of ints, what do you want to compare? The exact elements, regardless of order? The exact elements in order? The sum of elements? There's not one answer that fits everything. And often you might actually want to check if you have the same object.
When working with collections or LINQ, you can often specify a custom 'comparer' that handles comparisons the way you want. The collection methods then use this 'comparer' whenever they need to compare two elements.
A very simple comparer that works on a ReadOnlyCollection<T> might look like this:
class ROCollectionComparer<T> : IEqualityComparer<IReadOnlyCollection<T>>
{
private readonly IEqualityComparer<T> elementComparer;
public ROCollectionComparer() : this(EqualityComparer<T>.Default) {}
public ROCollectionComparer(IEqualityComparer<T> elementComparer) {
this.elementComparer = elementComparer;
}
public bool Equals(IReadOnlyCollection<T> x, IReadOnlyCollection<T> y)
{
if(x== null && y == null) return true;
if(x == null || y == null) return false;
if(object.ReferenceEquals(x, y)) return true;
return x.Count == y.Count &&
x.SequenceEqual(y, elementComparer);
}
public int GetHashCode(IReadOnlyCollection<T> obj)
{
// simplistic implementation - but should be OK-ish when just looking for equality
return (obj.Count, obj.Count == 0 ? 0 : elementComparer.GetHashCode(obj.First())).GetHashCode();
}
}
And then you can compare the behavior of the default equality check, and your custom one:
var std = new HashSet<int[]>(new[] { new[] { 1, 2 }, new[] { 2, 2}});
std.ExceptWith(new[] { new[] { 2, 2}});
std.Dump();
var custom = new HashSet<int[]>(new[] { new[] { 1, 2 }, new[] { 2, 2 } }, new ROCollectionComparer<int>());
custom.ExceptWith(new[] { new[] { 2, 2 }});
custom.ExceptWith(new[] { new int[] { }});
custom.Dump();
You can test the whole thing in this fiddle.
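Applied to the HashSet<ReadOnlyCollection<int>> from the question, the same idea can be sketched as follows. SequenceComparer and ExceptWithDemo are hypothetical names, and the hash here mixes every element rather than only the first, which is one possible design choice and not the answer's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical comparer: two collections are equal when they hold equal elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<ReadOnlyCollection<T>>
{
    public bool Equals(ReadOnlyCollection<T> x, ReadOnlyCollection<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Count == y.Count && x.SequenceEqual(y);
    }

    public int GetHashCode(ReadOnlyCollection<T> obj)
    {
        if (obj == null) return 0;
        unchecked // overflow simply wraps
        {
            int hash = 17;
            foreach (T item in obj)
            {
                hash = hash * 23 + EqualityComparer<T>.Default.GetHashCode(item);
            }
            return hash;
        }
    }
}

public static class ExceptWithDemo
{
    // Builds the set {[from,1], [from+1,1], ..., [to-1,1]} using the content-based comparer.
    public static HashSet<ReadOnlyCollection<int>> Build(int from, int to)
    {
        var set = new HashSet<ReadOnlyCollection<int>>(new SequenceComparer<int>());
        for (int i = from; i < to; i++)
        {
            set.Add(new List<int> { i, 1 }.AsReadOnly());
        }
        return set;
    }
}
```

Calling ExceptWithDemo.Build(0, 10) and then ExceptWith(ExceptWithDemo.Build(5, 10)) on the result leaves the five collections [0,1] through [4,1], because ExceptWith removes elements using the comparer of the set it is called on.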

Here you have the implementation of ExceptWith:
https://github.com/microsoft/referencesource/blob/3b1eaf5203992df69de44c783a3eda37d3d4cd10/System.Core/System/Collections/Generic/HashSet.cs#L532
What it actually does is:
// remove every element in other from this
foreach (T element in other) {
Remove(element);
}
And Remove implementation:
https://github.com/microsoft/referencesource/blob/3b1eaf5203992df69de44c783a3eda37d3d4cd10/System.Core/System/Collections/Generic/HashSet.cs#L287
if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
So if the hashcode is not the same, Remove will do nothing.
A small test to prove that hashcode is not the same:
List<int> temp = new List<int> ();
temp.Add(1);
temp.Add(2);
HashSet<ReadOnlyCollection<int>> test1 = new HashSet<ReadOnlyCollection<int>> ();
HashSet<ReadOnlyCollection<int>> test2 = new HashSet<ReadOnlyCollection<int>> ();
test1.Add (temp.AsReadOnly ());
test2.Add (temp.AsReadOnly ());
Console.WriteLine(test1.First().GetHashCode() == test2.First().GetHashCode());

How to iterate lists with different lengths to find all permutations?

This one should not be too hard but my mind seems to be having a stack overflow (huehue). I have a series of Lists and I want to find all permutations they can be ordered in. All of the lists have different lengths.
For example:
List 1: 1
List 2: 1, 2
All permutations would be:
1, 1
1, 2
In my case I don't switch the numbers around. (For example 2, 1)
What is the easiest way to write this?

I can't say if the following is the easiest way, but IMO it's the most efficient way. It's basically a generalized version of my answer to Looking at each combination in jagged array:
public static class Algorithms
{
public static IEnumerable<T[]> GenerateCombinations<T>(this IReadOnlyList<IReadOnlyList<T>> input)
{
var result = new T[input.Count];
var indices = new int[input.Count];
for (int pos = 0, index = 0; ;)
{
for (; pos < result.Length; pos++, index = 0)
{
indices[pos] = index;
result[pos] = input[pos][index];
}
yield return result;
do
{
if (pos == 0) yield break;
index = indices[--pos] + 1;
}
while (index >= input[pos].Count);
}
}
}
You can see the explanation in the linked answer (in short, it emulates nested loops). Also, since for performance reasons it yields the internal buffer without cloning it, you need to clone it if you want to store the result for later processing.
Sample usage:
var list1 = new List<int> { 1 };
var list2 = new List<int> { 1, 2 };
var lists = new[] { list1, list2 };
// Non caching usage
foreach (var combination in lists.GenerateCombinations())
{
// do something with the combination
}
// Caching usage
var combinations = lists.GenerateCombinations().Select(c => c.ToList()).ToList();
UPDATE: The GenerateCombinations is a standard C# iterator method, and the implementation basically emulates N nested loops (where N is the input.Count) like this (in pseudo code):
for (int i0 = 0; i0 < input[0].Count; i0++)
for (int i1 = 0; i1 < input[1].Count; i1++)
for (int i2 = 0; i2 < input[2].Count; i2++)
...
for (int iN-1 = 0; iN-1 < input[N-1].Count; iN-1++)
yield { input[0][i0], input[1][i1], input[2][i2], ..., input[N-1][iN-1] }
or showing it differently:
for (indices[0] = 0; indices[0] < input[0].Count; indices[0]++)
{
result[0] = input[0][indices[0]];
for (indices[1] = 0; indices[1] < input[1].Count; indices[1]++)
{
result[1] = input[1][indices[1]];
// ...
for (indices[N-1] = 0; indices[N-1] < input[N-1].Count; indices[N-1]++)
{
result[N-1] = input[N-1][indices[N-1]];
yield return result;
}
}
}
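For comparison, the same nested-loop expansion can be written lazily with LINQ's Aggregate and SelectMany. This is a different technique from the iterator above, not the answer's method: it allocates a new list for every combination, so it is likely slower, but each result is safe to store without cloning. CartesianSketch and CartesianProduct are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    // Yields one list per combination, taking one element from each input list in order.
    public static IEnumerable<List<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> lists)
    {
        // Start with a single empty combination and extend it once per input list.
        IEnumerable<List<T>> seed = new[] { new List<T>() };
        return lists.Aggregate(seed, (acc, list) =>
            acc.SelectMany(prefix => list, (prefix, item) => new List<T>(prefix) { item }));
    }
}
```

For the lists { 1 } and { 1, 2 } from the question this yields [1, 1] and then [1, 2]. An empty outer sequence yields a single empty combination.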

Nested loops:
// any two lists will do; these are the ones from the question
var listA = new List<int> { 1 };
var listB = new List<int> { 1, 2 };
var answers = new List<Tuple<int, int>>();
foreach (int a in listA)
foreach (int b in listB)
answers.Add(Tuple.Create(a, b));
// do whatever with answers

Try this:
Func<IEnumerable<string>, IEnumerable<string>> combine = null;
combine = xs =>
xs.Skip(1).Any()
? xs.First().SelectMany(x => combine(xs.Skip(1)), (x, y) => String.Format("{0}{1}", x, y))
: xs.First().Select(x => x.ToString());
var strings = new [] { "AB", "12", "$%" };
foreach (var x in combine(strings))
{
Console.WriteLine(x);
}
That gives me:
A1$
A1%
A2$
A2%
B1$
B1%
B2$
B2%

I made the following IEnumerable<IEnumerable<TValue>> class to solve this problem. It accepts generic IEnumerables, and its enumerator returns all permutations of the values, one from each inner list. It can be conveniently used directly in a foreach loop.
It's a variant of Michael Liu's answer to IEnumerable and Recursion using yield return.
I've modified it to return lists with the permutations instead of the single values.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace Permutation
{
public class ListOfListsPermuter<TValue> : IEnumerable<IEnumerable<TValue>>
{
private int count;
private IEnumerable<TValue>[] listOfLists;
public ListOfListsPermuter(IEnumerable<IEnumerable<TValue>> listOfLists_)
{
if (object.ReferenceEquals(listOfLists_, null))
{
throw new ArgumentNullException(nameof(listOfLists_));
}
listOfLists =listOfLists_.ToArray();
count = listOfLists.Count();
for (int i = 0; i < count; i++)
{
if (object.ReferenceEquals(listOfLists[i], null))
{
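// ArgumentException fits invalid input better than NullReferenceException, which is meant for the runtime.
throw new ArgumentException(string.Format("{0}[{1}] is null.", nameof(listOfLists_), i), nameof(listOfLists_));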
}
}
}
// A variant of Michael Liu's answer in StackOverflow
// https://stackoverflow.com/questions/2055927/ienumerable-and-recursion-using-yield-return
public IEnumerator<IEnumerable<TValue>> GetEnumerator()
{
TValue[] currentList = new TValue[count];
int level = 0;
var enumerators = new Stack<IEnumerator<TValue>>();
IEnumerator<TValue> enumerator = listOfLists[level].GetEnumerator();
try
{
while (true)
{
if (enumerator.MoveNext())
{
currentList[level] = enumerator.Current;
level++;
if (level >= count)
{
level--;
yield return currentList;
}
else
{
enumerators.Push(enumerator);
enumerator = listOfLists[level].GetEnumerator();
}
}
else
{
if (level == 0)
{
yield break;
}
else
{
enumerator.Dispose();
enumerator = enumerators.Pop();
level--;
}
}
}
}
finally
{
// Clean up in case of an exception.
enumerator?.Dispose();
while (enumerators.Count > 0)
{
enumerator = enumerators.Pop();
enumerator.Dispose();
}
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
You can use it directly in a foreach like this:
public static void Main(string[] args)
{
var listOfLists = new List<List<string>>()
{
{ new List<string>() { "A", "B" } },
{ new List<string>() { "C", "D" } }
};
var permuter = new ListOfListsPermuter<string>(listOfLists);
foreach (IEnumerable<string> item in permuter)
{
Console.WriteLine("{ \"" + string.Join("\", \"", item) + "\" }");
}
}
The output:
{ "A", "C" }
{ "A", "D" }
{ "B", "C" }
{ "B", "D" }

fastest way for accessing double array as key in dictionary

I have a double[] array that I want to use as a key (not literally: two keys should match when all the doubles in the two arrays match).
What is the fastest way to use a double[] array as a dictionary key?
Is it using
Dictionary<string, string> (converting the double[] to a string)
or
anything else like converting it

Given that all key arrays will have the same length, either consider using a Tuple<,,, ... ,>, or use a structural equality comparer on the arrays.
With tuple:
var yourDict = new Dictionary<Tuple<double, double, double>, string>();
yourDict.Add(Tuple.Create(3.14, 2.718, double.NaN), "da value");
string read = yourDict[Tuple.Create(3.14, 2.718, double.NaN)];
With (strongly typed version of) StructuralEqualityComparer:
class DoubleArrayStructuralEqualityComparer : EqualityComparer<double[]>
{
public override bool Equals(double[] x, double[] y)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.Equals(x, y);
}
public override int GetHashCode(double[] obj)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.GetHashCode(obj);
}
}
...
var yourDict = new Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer());
yourDict.Add(new[] { 3.14, 2.718, double.NaN, }, "da value");
string read = yourDict[new[] { 3.14, 2.718, double.NaN, }];
Also consider the suggestion by Sergey Berezovskiy to create a custom class or (immutable!) struct to hold your set of doubles. In that way you can name your type and its members in a natural way that makes it more clear what you do. And your class/struct can easily be extended later on, if needed.

Since all the arrays have the same length and each item in an array has a specific meaning, create a class which holds all the items as properties with descriptive names. E.g. instead of a double array with two items you can have a class Point with properties X and Y. Then override Equals and GetHashCode of this class and use it as the key (see What is the best algorithm for an overriding GetHashCode):
Dictionary<Point, string>
Benefits: instead of an array, you have a data structure which makes its purpose clear. Instead of referencing items by index, you have named properties, which also make their purpose clear. And also speed: calculating the hash code is fast. Compare:
double[] a = new [] { 12.5, 42 };
// getting first coordinate a[0];
Point a = new Point { X = 12.5, Y = 42 };
// getting first coordinate a.X
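A minimal sketch of such a key type, assuming two coordinates named X and Y as in the example above. It is written as an immutable struct with a constructor (an assumption; the snippet above uses an object initializer), since keys should not change while they are stored in a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: two doubles compared by value.
public readonly struct Point : IEquatable<Point>
{
    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    public double X { get; }
    public double Y { get; }

    // double.Equals treats NaN as equal to NaN, unlike the == operator.
    public bool Equals(Point other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        return obj is Point other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked // overflow simply wraps
        {
            int hash = 17;
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            return hash;
        }
    }
}
```

A Dictionary<Point, string> then finds an entry added under new Point(12.5, 42) when it is looked up with a second, separately constructed new Point(12.5, 42).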

[Do not consider this a separate answer; this is an extension of @JeppeStigNielsen's answer]
I'd just like to point out that you can make Jeppe's approach generic as follows:
public class StructuralEqualityComparer<T>: IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
}
public int GetHashCode(T obj)
{
return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
public static StructuralEqualityComparer<T> Default
{
get
{
StructuralEqualityComparer<T> comparer = _defaultComparer;
if (comparer == null)
{
comparer = new StructuralEqualityComparer<T>();
_defaultComparer = comparer;
}
return comparer;
}
}
private static StructuralEqualityComparer<T> _defaultComparer;
}
(From an original answer here: https://stackoverflow.com/a/5601068/106159)
Then you would declare the dictionary like this:
var yourDict = new Dictionary<double[], string>(new StructuralEqualityComparer<double[]>());
Note: It might be better to initialise _defaultComparer using Lazy<T>.
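The Lazy<T> variant mentioned in the note could be sketched like this. LazyStructuralComparer is a hypothetical name used to keep it separate from the class above; Lazy<T> with its default settings guarantees that only one instance is created even when several threads read Default at once.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical variant of the generic comparer with a thread-safe, lazily created Default.
public sealed class LazyStructuralComparer<T> : IEqualityComparer<T>
{
    private static readonly Lazy<LazyStructuralComparer<T>> _default =
        new Lazy<LazyStructuralComparer<T>>(() => new LazyStructuralComparer<T>());

    public static LazyStructuralComparer<T> Default
    {
        get { return _default.Value; }
    }

    public bool Equals(T x, T y)
    {
        return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
    }

    public int GetHashCode(T obj)
    {
        return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
    }
}
```

The dictionary would then be created with new Dictionary<double[], string>(LazyStructuralComparer<double[]>.Default).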
[EDIT]
It's possible that this might be faster; worth a try:
class DoubleArrayComparer: IEqualityComparer<double[]>
{
public bool Equals(double[] x, double[] y)
{
if (x == y)
return true;
if (x == null || y == null)
return false;
if (x.Length != y.Length)
return false;
for (int i = 0; i < x.Length; ++i)
if (x[i] != y[i])
return false;
return true;
}
public int GetHashCode(double[] data)
{
if (data == null)
return 0;
unchecked // overflow is fine, just wrap
{
int result = 17;
foreach (var value in data)
result = result*23 + value.GetHashCode();
return result;
}
}
}
...
var yourDict = new Dictionary<double[], string>(new DoubleArrayComparer());

OK, this is what I have found so far.
I added one entry (a length-4 array) to the dictionary and looked it up 999999 times on my machine:
Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer()); takes 1.75 seconds
Dictionary<Tuple<double...>,string> takes 0.85 seconds
The DoubleArray wrapper class below takes 0.1755285 seconds (in line with Sergey's comment).
The fastest is the DoubleArrayComparer by Matthew Watson, which takes 0.15 seconds!
public class DoubleArray
{
private double[] d = null;
public DoubleArray(double[] d)
{
this.d = d;
}
public override bool Equals(object obj)
{
if (!(obj is DoubleArray)) return false;
DoubleArray dobj = (DoubleArray)obj;
if (dobj.d.Length != d.Length) return false;
for (int i = 0; i < d.Length; i++)
{
if (dobj.d[i] != d[i]) return false;
}
return true;
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
for (int i = 0; i < d.Length;i++ )
{
hash = hash*23 + d[i].GetHashCode();
}
return hash;
}
}
}

Is there a way to optimize a foreach with LINQ?

How can I check whether an array of integers contains an integer value?
How can I do it in LINQ? I have to do it in a LINQ query.
Like:
Int test = 10;
var a = from test in Test
where test.Contains(1,2,3,4,5,6,7,8,9,10)
select test.id
Currently I'm doing it through an extension method, but the method is slow.
public static bool ContainsAnyInt(this int int_, bool checkForNotContain_, params int[] values_)
{
try
{
if (values_.Length > 0)
{
foreach (int value in values_)
{
if (value == int_)
{
if (checkForNotContain_)
return false;
else
return true;
}
}
}
}
catch (Exception ex)
{
ApplicationLog.Log("Exception: ExtensionsMerhod - ContainsAnyInt() Method ---> " + ex);
}
}
I have to do it in an optimized way because the data set is huge.

In most cases LINQ is slower than a foreach.
You can just call the LINQ extension method:
int[] values = new[]{3,3};
bool hasValue = values.Contains(3);
It accomplishes the same thing as your extension method.

Would the following not work faster (untested):
public static bool ContainsAnyInt(this int int_, bool checkForNotContain_, params int[] values_)
{
if(values_ != null && values_.Contains(int_))
{
return !checkForNotContain_;
}
else
return false;
}

Working within your constraints, I would sort the arrays of values in each of the test classes so you could do something like:
int[] values = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var results = from test in tests
where test.BinaryContains(values)
select test.id;
And the test class would look something like:
class Test
{
public int id;
public int[] vals; //A SORTED list of integers
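public bool BinaryContains(int[] values)
{
if (vals.Length == 0) return false;
for (int i = 0; i < values.Length; i++)
{
// Skip values outside the sorted range before searching.
if (values[i] >= vals[0] && values[i] <= vals[vals.Length - 1])
{
// Array.BinarySearch returns a non-negative index when the value is found.
if (Array.BinarySearch(vals, values[i]) >= 0) return true;
}
}
return false;
}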
}
Of course there are tons of ways you could optimize this further. If memory is not a concern, a Dictionary could give you all of the Test classes that contain a given integer.
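The Dictionary idea mentioned above could be sketched as an inverted index that maps each integer to the ids of the records containing it. TestIndex, Build and IdsContainingAny are hypothetical names, and the records are passed as (Id, Vals) tuples here so the example does not depend on the Test class.

```csharp
using System;
using System.Collections.Generic;

public static class TestIndex
{
    // Builds a map from each integer to the ids of the records whose arrays contain it.
    public static Dictionary<int, List<int>> Build(IEnumerable<(int Id, int[] Vals)> tests)
    {
        var index = new Dictionary<int, List<int>>();
        foreach (var (id, vals) in tests)
        {
            foreach (int v in vals)
            {
                if (!index.TryGetValue(v, out List<int> ids))
                {
                    ids = new List<int>();
                    index[v] = ids;
                }
                ids.Add(id);
            }
        }
        return index;
    }

    // Returns the ids of all records containing at least one of the given values.
    public static HashSet<int> IdsContainingAny(Dictionary<int, List<int>> index, IEnumerable<int> values)
    {
        var result = new HashSet<int>();
        foreach (int v in values)
        {
            if (index.TryGetValue(v, out List<int> ids))
            {
                result.UnionWith(ids);
            }
        }
        return result;
    }
}
```

Building the index costs one pass over all the data and extra memory, after which each lookup is a single dictionary access per queried value instead of a scan of every record.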

Equals method on Binary object

The Microsoft documentation for
public bool Binary.Equals(Binary other)
gives no indication as to whether this tests equality of reference as with objects in general or equality of value as with strings.
Can anyone clarify?
Jon Skeet's answer inspired me to expand it to this:
using System;
using System.Data.Linq;
public class Program
{
static void Main(string[] args)
{
Binary a = new Binary(new byte[] { 1, 2, 3 });
Binary b = new Binary(new byte[] { 1, 2, 3 });
Console.WriteLine("a.Equals(b) >>> {0}", a.Equals(b));
Console.WriteLine("a {0} == b {1} >>> {2}", a, b, a == b);
b = new Binary(new byte[] { 1, 2, 3, 4 });
Console.WriteLine("a {0} == b {1} >>> {2}",a,b, a == b);
/* a < b is not supported */
}
}

Well, a simple test suggests it is value equality:
using System;
using System.Data.Linq;
class Program {
static void Main(string[] args)
{
Binary a = new Binary(new byte[] { 1, 2, 3 });
Binary b = new Binary(new byte[] { 1, 2, 3 });
Console.WriteLine(a.Equals(b)); // Prints True
}
}
The fact that they've bothered to implement IEquatable<Binary> and override Equals(object) to start with suggests value equality semantics too... but I agree that the docs should make this clear.

It's a value comparison per Reflector...
private bool EqualsTo(Binary binary)
{
if (this != binary)
{
if (binary == null)
{
return false;
}
if (this.bytes.Length != binary.bytes.Length)
{
return false;
}
if (this.hashCode != binary.hashCode)
{
return false;
}
int index = 0;
int length = this.bytes.Length;
while (index < length)
{
if (this.bytes[index] != binary.bytes[index])
{
return false;
}
index++;
}
}
return true;
}

Reflector shows that Binary.Equals compares by real binary value, not by the reference.
