Fastest way to use a double array as a key in a dictionary - c#

I have a double[] array and I want to use it as a key (not literally: the key should match only when all the doubles in the array match).
What is the fastest way to use the double[] array as a key to a dictionary?
Is it using
Dictionary<string, string> (converting the double[] to a string)
or
something else, such as converting it some other way?

Given that all key arrays will have the same length, either consider using a Tuple<,,, ... ,>, or use a structural equality comparer on the arrays.
With tuple:
var yourDict = new Dictionary<Tuple<double, double, double>, string>();
yourDict.Add(Tuple.Create(3.14, 2.718, double.NaN), "da value");
string read = yourDict[Tuple.Create(3.14, 2.718, double.NaN)];
With (strongly typed version of) StructuralEqualityComparer:
class DoubleArrayStructuralEqualityComparer : EqualityComparer<double[]>
{
public override bool Equals(double[] x, double[] y)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.Equals(x, y);
}
public override int GetHashCode(double[] obj)
{
return System.Collections.StructuralComparisons.StructuralEqualityComparer
.GetHashCode(obj);
}
}
...
var yourDict = new Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer());
yourDict.Add(new[] { 3.14, 2.718, double.NaN, }, "da value");
string read = yourDict[new[] { 3.14, 2.718, double.NaN, }];
Also consider the suggestion by Sergey Berezovskiy to create a custom class or (immutable!) struct to hold your set of doubles. In that way you can name your type and its members in a natural way that makes it more clear what you do. And your class/struct can easily be extended later on, if needed.

If all arrays have the same length and each item in the array has a specific meaning, then create a class which holds all items as properties with descriptive names. E.g. instead of a double array with two items you can have a class Point with properties X and Y. Then override Equals and GetHashCode of this class and use it as the key (see What is the best algorithm for an overridden System.Object.GetHashCode?):
Dictionary<Point, string>
Benefits: instead of an array, you have a data structure which makes its purpose clear. Instead of referencing items by index, you have named properties, which also make their purpose clear. And there is also speed: calculating the hash code is fast. Compare:
double[] a = new [] { 12.5, 42 };
// getting first coordinate a[0];
Point a = new Point { X = 12.5, Y = 42 };
// getting first coordinate a.X
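A minimal sketch of such a key type, assuming two coordinates. The names Point2D and PointKeyDemo are hypothetical, and the hash combination (17 and 23) simply mirrors the pattern used elsewhere in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable key type: get-only properties, value-based equality.
public struct Point2D : IEquatable<Point2D>
{
    public double X { get; }
    public double Y { get; }

    public Point2D(double x, double y) { X = x; Y = y; }

    // double.Equals treats NaN as equal to NaN, unlike the == operator.
    public bool Equals(Point2D other) => X.Equals(other.X) && Y.Equals(other.Y);

    public override bool Equals(object obj) => obj is Point2D other && Equals(other);

    public override int GetHashCode()
    {
        unchecked // overflow is fine, it just wraps
        {
            int hash = 17;
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            return hash;
        }
    }
}

public static class PointKeyDemo
{
    public static string Lookup()
    {
        var dict = new Dictionary<Point2D, string>();
        dict.Add(new Point2D(12.5, 42), "da value");
        // A separately constructed point with the same coordinates finds the entry.
        return dict[new Point2D(12.5, 42)];
    }
}
```

Implementing IEquatable<Point2D> on a struct lets the dictionary compare keys without boxing, which is one reason this approach tends to be fast.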

[Do not consider this a separate answer; this is an extension of @JeppeStigNielsen's answer]
I'd just like to point out that you can make Jeppe's approach generic as follows:
public class StructuralEqualityComparer<T>: IEqualityComparer<T>
{
public bool Equals(T x, T y)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
}
public int GetHashCode(T obj)
{
return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
public static StructuralEqualityComparer<T> Default
{
get
{
StructuralEqualityComparer<T> comparer = _defaultComparer;
if (comparer == null)
{
comparer = new StructuralEqualityComparer<T>();
_defaultComparer = comparer;
}
return comparer;
}
}
private static StructuralEqualityComparer<T> _defaultComparer;
}
(From an original answer here: https://stackoverflow.com/a/5601068/106159)
Then you would declare the dictionary like this:
var yourDict = new Dictionary<double[], string>(new StructuralEqualityComparer<double[]>());
Note: It might be better to initialise _defaultComparer using Lazy<T>.
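One way the Lazy<T> suggestion could look is sketched below. It is the same generic comparer, with the hand-written null check replaced by a Lazy<T> field; this is an illustration of the idea, not the code from the linked answer.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class StructuralEqualityComparer<T> : IEqualityComparer<T>
{
    // Lazy<T> is thread-safe by default, so the shared instance is created at most once.
    private static readonly Lazy<StructuralEqualityComparer<T>> _defaultComparer =
        new Lazy<StructuralEqualityComparer<T>>(() => new StructuralEqualityComparer<T>());

    public static StructuralEqualityComparer<T> Default
    {
        get { return _defaultComparer.Value; }
    }

    public bool Equals(T x, T y)
    {
        return StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
    }

    public int GetHashCode(T obj)
    {
        return StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
    }
}
```

The dictionary can then share the single instance: new Dictionary<double[], string>(StructuralEqualityComparer<double[]>.Default).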
[EDIT]
It's possible that this might be faster; worth a try:
class DoubleArrayComparer: IEqualityComparer<double[]>
{
public bool Equals(double[] x, double[] y)
{
if (x == y)
return true;
if (x == null || y == null)
return false;
if (x.Length != y.Length)
return false;
for (int i = 0; i < x.Length; ++i)
if (x[i] != y[i])
return false;
return true;
}
public int GetHashCode(double[] data)
{
if (data == null)
return 0;
int result = 17;
foreach (var value in data)
result = unchecked(result * 23 + value.GetHashCode());
return result;
}
}
...
var yourDict = new Dictionary<double[], string>(new DoubleArrayComparer());
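One behavioural difference is worth knowing: the != operator treats double.NaN as unequal to itself, so with the comparer above a key that contains NaN can never be found again, whereas the Tuple and structural comparer versions do match NaN against NaN. If NaN can appear in your keys, a variant along these lines may be closer to what you want. The class name is hypothetical and this is only a sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the hand-written comparer that matches NaN with NaN.
public class NaNAwareDoubleArrayComparer : IEqualityComparer<double[]>
{
    public bool Equals(double[] x, double[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; ++i)
        {
            // double.Equals returns true for NaN vs NaN; the != operator does not.
            if (!x[i].Equals(y[i])) return false;
        }
        return true;
    }

    public int GetHashCode(double[] data)
    {
        if (data == null) return 0;
        unchecked // overflow is fine, it just wraps
        {
            int result = 17;
            foreach (double value in data)
                result = result * 23 + value.GetHashCode();
            return result;
        }
    }
}
```

The extra method call per element may cost a little speed compared with !=, so it is worth measuring if that matters to you.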

Ok this is what I found so far:
I added one entry (a length-4 array) to the dictionary and accessed it 999999 times on my machine:
Dictionary<double[], string>(
new DoubleArrayStructuralEqualityComparer()); takes 1.75 seconds
Dictionary<Tuple<double...>,string> takes 0.85 seconds
The fastest so far: the DoubleArrayComparer by Matthew Watson takes 0.15 seconds!
The code below (a DoubleArray wrapper class that overrides Equals and GetHashCode, in line with Sergey's comment) is a close second at 0.1755285 seconds:
public class DoubleArray
{
private double[] d = null;
public DoubleArray(double[] d)
{
this.d = d;
}
public override bool Equals(object obj)
{
if (!(obj is DoubleArray)) return false;
DoubleArray dobj = (DoubleArray)obj;
if (dobj.d.Length != d.Length) return false;
for (int i = 0; i < d.Length; i++)
{
if (dobj.d[i] != d[i]) return false;
}
return true;
}
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
for (int i = 0; i < d.Length;i++ )
{
hash = hash*23 + d[i].GetHashCode();
}
return hash;
}
}
}

Related

Math Comparison on 2 Numbers with Multiple Decimals

I need to compare section numbers in a document, at first I was just going to convert to decimal and check to see if one number is greater than another.
The issue there is that some sections have multiple decimals.
Example: I need to perform a math comparison on 1.1 with 1.1.2.3 to determine which one is farther along in a document.
They are strings to begin with and I need to do some math comparisons on them essentially.
I thought about removing the decimal points and then converting to int, but this throws off certain sections: section 2 would be considered less than section 1.1, since 1.1 would be changed to 11, which is no good.
string section1 = "1.1";
string section2 = "2";
int convertedSection1 = Convert.ToInt32(section1.Replace(".",""));
int convertedSection2 = Convert.ToInt32(section2.Replace(".",""));
if(convertedSection1 < convertedSection2)
//This will incorrectly say 1.1 is greater than 2
string section1 = "1.1.2.4";
string section2 = "2.4";
decimal convertedSection1 = Convert.ToDecimal(section1);
decimal convertedSection2 = Convert.ToDecimal(section2);
if(convertedSection1 < convertedSection2)
//This will convert 1.1.2.4 to 1.1 which is no good
You can create a class similar to the Version class of the .NET framework. If you implement some operators and IComparable, it's really nice.
How does it work? It will convert the given string into a list of integer numbers. Upon comparison, it will start at the beginning of each list and compare each individual part.
public class Section: IComparable<Section>
{
// Stores all individual components of the section
private List<int> parts = new List<int>();
// Construct a section from a string
public Section(string section)
{
var strings = section.Split('.');
foreach (var s in strings)
{
parts.Add(int.Parse(s));
}
}
// Make it nice for display
public override string ToString()
{
return string.Join(".", parts);
}
// Implement comparison operators for convenience
public static bool operator ==(Section a, Section b)
{
// Comparing the identical object
if (ReferenceEquals(a, b)) return true;
// One object is null and the other isn't
if ((object)a == null) return false;
if ((object)b == null) return false;
// Different amount of items
if (a.parts.Count != b.parts.Count) return false;
// Check all individual items
for (int i=0; i<a.parts.Count;i++)
{
if (a.parts[i] != b.parts[i]) return false;
}
return true;
}
public static bool operator !=(Section a, Section b)
{
return !(a == b);
}
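public static bool operator <(Section a, Section b)
{
// Use minimum, otherwise we exceed the index
for (int i=0; i< Math.Min(a.parts.Count, b.parts.Count); i++)
{
// The first differing part decides the result
if (a.parts[i] < b.parts[i]) return true;
if (a.parts[i] > b.parts[i]) return false;
}
// All shared parts are equal, so the shorter section comes first
if (b.parts.Count > a.parts.Count) return true;
return false;
}
public static bool operator >(Section a, Section b)
{
// Use minimum, otherwise we exceed the index
for (int i = 0; i < Math.Min(a.parts.Count, b.parts.Count); i++)
{
// The first differing part decides the result
if (a.parts[i] > b.parts[i]) return true;
if (a.parts[i] < b.parts[i]) return false;
}
// All shared parts are equal, so the longer section comes later
if (a.parts.Count > b.parts.Count) return true;
return false;
}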
// Implement the IComparable interface for sorting
public int CompareTo(Section other)
{
if (this == other) return 0;
if (this < other) return -1;
return 1;
}
}
Tests for 96% coverage:
Assert.IsTrue(new Section("1.2.3.4") > new Section("1.2.3"));
Assert.IsFalse(new Section("1.2.3.4") < new Section("1.2.3"));
Assert.IsFalse(new Section("1.2.3.4") == new Section("1.2.3"));
Assert.IsTrue(new Section("1.2.3.4") == new Section("1.2.3.4"));
Assert.IsFalse(new Section("1.2.3.4") == new Section("1.2.3.5"));
Assert.IsTrue(new Section("1.2.3.4") != new Section("1.2.3.5"));
var sec = new Section("1");
Assert.IsTrue(sec == sec);
Assert.AreEqual("1.2.3.4", new Section("1.2.3.4").ToString());
var sortTest = new List<Section> { new Section("2"), new Section("1.2"), new Section("1"), new Section("3.1") };
sortTest.Sort();
var expected = new List<Section> { new Section("1"), new Section("1.2"), new Section("2"), new Section("3.1") };
CollectionAssert.AreEqual(expected, sortTest, new SectionComparer());
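The part-by-part comparison this answer describes can also be sketched as a standalone function on the raw strings, which may be enough if you only need ordering. SectionNumbers.Compare is a hypothetical name, and the sketch assumes every part is a well-formed integer.

```csharp
using System;
using System.Linq;

public static class SectionNumbers
{
    // Returns a negative number, zero, or a positive number, like CompareTo.
    public static int Compare(string a, string b)
    {
        int[] x = a.Split('.').Select(s => int.Parse(s)).ToArray();
        int[] y = b.Split('.').Select(s => int.Parse(s)).ToArray();

        int common = Math.Min(x.Length, y.Length);
        for (int i = 0; i < common; i++)
        {
            // The first differing part decides the order.
            if (x[i] != y[i]) return x[i].CompareTo(y[i]);
        }
        // All shared parts are equal: the shorter section ("1.1") precedes "1.1.2.3".
        return x.Length.CompareTo(y.Length);
    }
}
```

Because it has the shape of a Comparison<string>, it can be passed directly to a sort, for example sections.Sort(SectionNumbers.Compare) on a List<string>.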
If you know that your section strings are always well formed, and you know that they don't go deeper than 6 levels, and that no level has more than 999 items, then this works nicely:
string zero = ".0.0.0.0.0.0";
long Section2Long(string section) =>
(section + zero)
.Split('.')
.Take(6)
.Select(t => long.Parse(t))
.Aggregate((x, y) => x * 1000 + y);
Now, if I have this:
string[] sections = new []
{
"1.2.4", "2.3", "1", "1.2", "1.1.1.1", "1.0.0.1.0.1", "2.2.9"
};
I can easily sort it like this:
string[] sorted = sections.OrderBy(x => Section2Long(x)).ToArray();
I get this output:
1
1.0.0.1.0.1
1.1.1.1
1.2
1.2.4
2.2.9
2.3

How to override Equals and GetHash of HashSet

I have a HashSet<int[]> foo where the int[] represents the coordinates of a point in a plane. The value at position 0 represents the x and the value at position 1 represents the y. I want to override the Equals and GetHashCode methods to be able to remove an element (the point represented as an array of size two) if its internal values are equal to those of a given one.
Already tried:
public override int GetHashCode(){
return this.GetHashCode();
}
public override bool Equals(object obj){
if (obj == null || ! (obj is int[]))
return false;
HashSet<int[]> item = obj as HashSet<int[]>;
return item == this;
}
In my class Maze.
Thanks in advance.
EDIT
I found a way to do that
class SameHash : EqualityComparer<int[]>
{
public override bool Equals(int[] i1, int[] i2)
{
return i1[0] == i2[0] && i1[1] == i2[1];
}
public override int GetHashCode(int[] i)
{
return base.GetHashCode();
}
}
It may seem like you solved what you asked for, but there is something important that should be pointed out. When you implemented the EqualityComparer<int[]> you coded GetHashCode(int[] i) as return base.GetHashCode();, which is not correct even though it appears to work. I took the time to provide the code below so you can see the results of your implementation, and I also give a possible solution.
Copy this code and run it in a Console Project. Then comment out your line of code, uncomment the line right below it and run it again. You will see the difference!
To summarize: when you return base.GetHashCode() you return the same hash code for every item, because it is the hash code of the comparer object itself rather than of the array. Every insertion therefore collides inside the hash set, and it ends up as slow as a List<int[]> on which you call Contains before each insert. That is why, with the function I provided and the range of numbers I am generating, you can insert up to one million items in less than 1 second, whereas with yours, no matter the range, around ten thousand insertions take about 1 second. With a collision on every one of the n insertions the total time is O(n^2), while the expected time for a HashSet with an evenly distributed hash function is O(n).
Check this out:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace hashExample
{
class Program
{
static void Main(string[] args)
{
List<int[]> points = new List<int[]>();
Random random = new Random();
int toInsert = 20000;
for (int i = 0; i < toInsert; i++)
{
int x = random.Next(1000);
int y = random.Next(1000);
points.Add(new int[]{ x,y });
}
HashSet<int[]> set = new HashSet<int[]>(new SameHash());
Stopwatch clock = new Stopwatch();
clock.Start();
foreach (var item in points)
{
set.Add(item);
}
clock.Stop();
Console.WriteLine("Elements inserted: " + set.Count + "/" + toInsert);
Console.WriteLine("Time taken: " + clock.ElapsedMilliseconds);
}
public class SameHash : EqualityComparer<int[]>
{
public override bool Equals(int[] p1, int[] p2)
{
return p1[0] == p2[0] && p1[1] == p2[1];
}
public override int GetHashCode(int[] i)
{
return base.GetHashCode();
//return i[0] * 10000 + i[1];
//Notice that this is a very basic implementation of a HashCode function
}
}
}
}
The only way I found it possible was by creating a class MyPair instead of using an array (int[]) like you did. Notice that I used X*10000 + Y in the GetHashCode() function, but you can change the constant to get a better hash code for every item, or you can create your own. I just provided this one as a simple example, and because it is an easy way of getting different hash codes when the bounds for X and Y are relatively small (less than the square root of int.MaxValue).
Here you have the working code:
using System;
using System.Collections.Generic;
using System.Linq;
namespace hash
{
public class MyPair
{
public int X { get; set; }
public int Y { get; set; }
public override int GetHashCode()
{
return X * 10000 + Y;
}
public override bool Equals(object obj)
{
MyPair other = obj as MyPair;
// obj may be null or of another type; treat that as "not equal"
return other != null && X == other.X && Y == other.Y;
}
}
class Program
{
static void Main(string[] args)
{
HashSet<MyPair> hash = new HashSet<MyPair>();
MyPair one = new MyPair { X = 10, Y = 2 };
MyPair two = new MyPair { X = 1, Y = 24 };
MyPair three = new MyPair { X = 111, Y = 266 };
MyPair copyOfOne = new MyPair { X = 10, Y = 2 };
Console.WriteLine(hash.Add(one));
Console.WriteLine(hash.Add(two));
Console.WriteLine(hash.Add(three));
Console.WriteLine(hash.Add(copyOfOne));
}
}
}
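If you would rather keep the int[] representation from the question, a comparer that hashes the array contents (the idea behind the commented-out line in the first listing) should also let you remove a point by value. The sketch below assumes every array has exactly two elements; PointArrayComparer and MazeDemo are hypothetical names, and 397 is just an arbitrary odd multiplier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two arrays are equal when both coordinates match.
public class PointArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] p1, int[] p2)
    {
        if (ReferenceEquals(p1, p2)) return true;
        if (p1 == null || p2 == null) return false;
        return p1[0] == p2[0] && p1[1] == p2[1];
    }

    public int GetHashCode(int[] p)
    {
        // Hash the contents of the array, not the comparer object itself.
        unchecked { return (p[0] * 397) ^ p[1]; }
    }
}

public static class MazeDemo
{
    public static bool RemovePoint()
    {
        var foo = new HashSet<int[]>(new PointArrayComparer());
        foo.Add(new[] { 3, 4 });
        // A different array instance with the same coordinates removes the stored one.
        return foo.Remove(new[] { 3, 4 });
    }
}
```

Note that the arrays stay mutable: changing a coordinate after the array has been added would leave it stored under a stale hash code, which is one argument for the immutable class approach.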

An integer array as a key for Dictionary

I wish to have the dictionary which uses an array of integers as keys, and if the integer array has the same value (even different object instance), they will be treated as the same key. How should I do it?
The following code does not work because b is a different object instance.
int[] a = new int[] { 1, 2, 3 };
int[] b = new int[] { 1, 2, 3 };
Dictionary<int[], string> dic = new Dictionary<int[], string>();
dic.Add(a, "haha");
string output = dic[b];
You can create an IEqualityComparer to define how the dictionary should compare items. If the ordering of items is relevant, then something like this should work:
public class MyEqualityComparer : IEqualityComparer<int[]>
{
public bool Equals(int[] x, int[] y)
{
if (x.Length != y.Length)
{
return false;
}
for (int i = 0; i < x.Length; i++)
{
if (x[i] != y[i])
{
return false;
}
}
return true;
}
public int GetHashCode(int[] obj)
{
int result = 17;
for (int i = 0; i < obj.Length; i++)
{
unchecked
{
result = result * 23 + obj[i];
}
}
return result;
}
}
Then pass it in as you create the dictionary:
Dictionary<int[], string> dic
= new Dictionary<int[], string>(new MyEqualityComparer());
Note: calculation of hash code obtained here:
What is the best algorithm for an overridden System.Object.GetHashCode?
Maybe you should consider using a Tuple
var myDictionary = new Dictionary<Tuple<int,int>, string>();
myDictionary.Add(new Tuple<int,int>(3, 3), "haha1");
myDictionary.Add(new Tuple<int,int>(5, 5), "haha2");
According to MSDN, the Equals method of Tuple objects compares the values of the two Tuple objects.
If you don't care about the actual hashing, the easiest way may be to convert the array into a string, adding a space as a separator so that adjacent numbers do not run together.
dic.Add(String.Join(" ",a), "haha");
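A short sketch of the full round trip with string keys, assuming the dictionary is declared as Dictionary<string, string>. StringKeyDemo and its method names are hypothetical. The lookup must build the key with the same separator as the Add call.

```csharp
using System;
using System.Collections.Generic;

public static class StringKeyDemo
{
    // Builds the key; the space keeps { 1, 23 } ("1 23") distinct from { 12, 3 } ("12 3").
    public static string ToKey(int[] values)
    {
        return string.Join(" ", values);
    }

    public static string Lookup()
    {
        int[] a = new int[] { 1, 2, 3 };
        int[] b = new int[] { 1, 2, 3 };

        var dic = new Dictionary<string, string>();
        dic.Add(ToKey(a), "haha");
        // b is a different instance, but it produces the same key string "1 2 3".
        return dic[ToKey(b)];
    }
}
```

This allocates a new string for every Add and every lookup, so it trades speed for simplicity.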

Removing duplicates from string array

I'm new to C#, have looked at numerous posts but am still confused.
I have a list of arrays:
List<Array> moves = new List<Array>();
I'm adding moves to it using the following:
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
And now I wish to remove duplicates using the following:
moves = moves.Distinct();
However it's not letting me do it. I get this error:
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable<System.Array>' to 'System.Collections.Generic.List<System.Array>'. An explicit conversion exists (are you missing a cast?)
Help please? I'd be so grateful.
Steve
You need to call .ToList() after the .Distinct method as it returns IEnumerable<T>. I would also recommend you using a strongly typed List<string[]> instead of List<Array>:
List<string[]> moves = new List<string[]>();
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
moves.Add(newmove);
moves = moves.Distinct().ToList();
// At this stage moves.Count = 1
Your code has two errors. The first is the missing call to ToList, as already pointed out. The second is subtle: Distinct compares arrays by identity, but your duplicate list items are different array instances.
There are multiple solutions for that problem.
Use a custom equality comparer in moves.Distinct().ToList(). No further changes necessary.
Sample implementation:
class ArrayEqualityComparer<T> : EqualityComparer<T[]> {
public override bool Equals(T[] x, T[] y) {
if ( x == null ) return y == null;
else if ( y == null ) return false;
return x.SequenceEqual(y); // requires using System.Linq
}
public override int GetHashCode(T[] obj) {
if ( obj == null) return 0;
// XOR the element hashes; null elements contribute 0
return obj.Aggregate(0, (hash, x) => hash ^ (x == null ? 0 : x.GetHashCode()));
}
}
Filtering for unique items:
moves = moves.Distinct(new ArrayEqualityComparer<string>()).ToList();
Use Tuple<string,string,string> instead of string[]. Tuple offers built-in structural equality and comparison. This variant might make your code cluttered because of the long type name.
Instantiation:
List<Tuple<string, string, string>> moves =
new List<Tuple<string, string, string>>();
Adding new moves:
Tuple<string, string, string> newmove =
Tuple.Create(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
Use a custom class to hold your three values. I'd actually recommend this variant, because it makes all your code dealing with moves much more readable.
Sample implementation:
class Move {
public Move(string piece, string axis, string direction) {
Piece = piece;
Axis = axis;
Direction = direction;
}
public string Piece { get; private set; }
public string Axis { get; private set; }
public string Direction { get; private set; }
public override bool Equals(object obj) {
Move other = obj as Move;
if ( other != null )
return Piece == other.Piece &&
Axis == other.Axis &&
Direction == other.Direction;
return false;
}
public override int GetHashCode() {
return Piece.GetHashCode() ^
Axis.GetHashCode() ^
Direction.GetHashCode();
}
// TODO: override ToString() as well
}
Instantiation:
List<Move> moves = new List<Move>();
Adding new moves:
Move newmove = new Move(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
The compiler error is because you need to convert the result to a list:
moves = moves.Distinct().ToList();
However it probably won't work as you want, because arrays don't have Equals defined in the way that you are hoping (it compares the references of the array objects, not the values inside the array). Instead of using an array, create a class to hold your data and define Equals and GetHashCode to compare the values.
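Old question, but this is an O(n) solution using O(1) additional space. Note that it only collapses adjacent duplicates, so it assumes the array is sorted; afterwards the unique items occupy the first c + 1 slots, a count the method as written does not return: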
public static void RemoveDuplicates(string[] array)
{
int c = 0;
int i = -1;
for (int n = 1; n < array.Length; n++)
{
if (array[c] == array[n])
{
if (i == -1)
{
i = n;
}
}
else
{
if (i == -1)
{
c++;
}
else
{
array[i] = array[n];
c++;
i++;
}
}
}
}

How to get the Point with minimal X from an array of Points without using OrderBy?

Imagine I have
var points = new Point[]
{
new Point(1, 2),
new Point(2, 3)
};
To get the point with the minimum X I could:
var result = points.OrderBy(point => point.X).First();
But for large arrays, I don't think this is the fastest option. Is there a faster alternative?
It is better to use
int x = points.Min(p => p.X);
var result = points.First(p => p.X == x);
as this eliminates the necessity of sorting this list (i.e., it is O(n) as opposed to, say, O(n log n)). Further, it's clearer than using OrderBy and First.
You could even write an extension method as follows:
static class IEnumerableExtensions {
public static T SelectMin<T>(this IEnumerable<T> source, Func<T, int> selector) {
if (source == null) {
throw new ArgumentNullException("source");
}
int min = 0;
T returnValue = default(T);
bool flag = false;
foreach (T t in source) {
int value = selector(t);
if (flag) {
if (value < min) {
returnValue = t;
min = value;
}
}
else {
min = value;
returnValue = t;
flag = true;
}
}
if (!flag) {
throw new InvalidOperationException("source is empty");
}
return returnValue;
}
}
Usage:
IEnumerable<Point> points;
Point minPoint = points.SelectMin(p => p.X);
You can generalize to your needs. The advantage of this is that it avoids potentially walking the list twice.
The following should be the quickest, but not the prettiest way to do it:
public static T MinValue<T>(this IEnumerable<T> e, Func<T, int> f)
{
if (e == null) throw new ArgumentException();
var en = e.GetEnumerator();
if (!en.MoveNext()) throw new ArgumentException();
int min = f(en.Current);
T minValue = en.Current;
int possible = int.MinValue;
while (en.MoveNext())
{
possible = f(en.Current);
if (min > possible)
{
min = possible;
minValue = en.Current;
}
}
return minValue;
}
I only included the int extension, but it is trivial to do others.
Edit: modified per Jason.
For anyone looking to do this today, MoreLinq is a library available via NuGet which includes the operator provided by the other answers, as well as several other useful operations not present in the framework.
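If you would rather not add a dependency or write an extension method, a single pass is also possible with the standard Aggregate operator. This is only a sketch: the nested Point struct is a minimal stand-in for the Point type in the question, and MinPointDemo is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class MinPointDemo
{
    // Minimal stand-in for the Point type used in the question.
    public struct Point
    {
        public int X;
        public int Y;
        public Point(int x, int y) { X = x; Y = y; }
    }

    public static Point MinByX(Point[] points)
    {
        // Keeps the best candidate seen so far; one pass, no sorting.
        // Aggregate without a seed throws InvalidOperationException on an empty array.
        return points.Aggregate((best, p) => p.X < best.X ? p : best);
    }
}
```

When several points share the minimal X, the strict < comparison keeps the first one encountered.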
