C# Compare two Arrays and perform Action on Matches

My question is: how can I compare two arrays and perform an action on the elements that are in both?
I use C# / LINQ.
What I'm trying to do: loop through an array of users. Another array contains rules for some specific users. For each user that has a rule in the rules array, increment a field on the user object.
I already tried using LINQ:
var array1 = context.SomeSecret.ToArray();
var array2 = anotherContext.AnotherSecret.ToArray();
(from rule in array2
from user in array1
where user.ID == rule.ID
select user).ToObservable().Subscribe<User>(x => x.MaxRules++);
This was the original Code:
var userDic = context.SomeSecret.ToDictionary(u => u.ID);
var rules = anotherContext.AnotherSecret.ToList();
foreach(var rule in rules)
{
if(userDic.ContainsKey(rule.UserID))
{
userDic[rule.UserID].MaxRules++;
}
}
user.ID and rule.UserID are the Same.
Note:
This is "meaningless" Code
Is there any "elegant" way to solve that ?
Thanks in advance.

You are trying to do too much in a few statements. This makes your code difficult to read, reuse, change and unit test. Consider making it a habit to write small reusable methods.
IEnumerable<Secret> GetSecrets() {...}
IEnumerable<Secret> GetOtherSecrets() {...}
How can i compare two arrays, and perform an action on the Elements that are in both?
LINQ can only extract data from your source data. LINQ cannot change the source data. To change the source data, you should enumerate the data that you extracted using LINQ. This is usually done using foreach.
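Applied to the users and rules from the question, that split could look like the sketch below. The User and Rule classes and the RuleCounter name are hypothetical stand-ins, since the real entity types are not shown; the only assumption taken from the question is that user.ID matches rule.UserID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
public class User { public int ID { get; set; } public int MaxRules { get; set; } }
public class Rule { public int UserID { get; set; } }

public static class RuleCounter
{
    // LINQ selects the users to touch; foreach performs the mutation.
    public static void CountRules(IEnumerable<User> users, IEnumerable<Rule> rules)
    {
        // Join yields one user reference per matching rule,
        // so a user with two rules is visited twice.
        IEnumerable<User> matches =
            users.Join(rules, u => u.ID, r => r.UserID, (u, r) => u);

        foreach (User user in matches)
        {
            user.MaxRules++;
        }
    }

    public static void Main()
    {
        var users = new[] { new User { ID = 1 }, new User { ID = 2 }, new User { ID = 3 } };
        var rules = new[] { new Rule { UserID = 1 }, new Rule { UserID = 3 }, new Rule { UserID = 3 } };

        CountRules(users, rules);

        foreach (User u in users)
            Console.WriteLine($"{u.ID}: {u.MaxRules}"); // 1: 1, 2: 0, 3: 2
    }
}
```

Join builds a lookup over the rules internally, so this keeps roughly the same cost profile as the dictionary loop in the question.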
So you have two sequences of Secrets, and you want to extract all Secrets that are in both sequences.
Define equality
First of all, you need to specify: when is a Secret in both sequences:
Secret a = new Secret();
Secret b = a;
Secret c = (Secret)a.Clone();
It is clear that a and b refer to the same object. Although the values of all properties and fields in Secret a and Secret c are equal, they are different instances.
The effect is that if you change the value of one of the properties of Secret a, the value is also changed in Secret b. However, Secret c remains unchanged.
Secret d = new Secret();
Secret e = new Secret();
IEnumerable<Secret> array1 = new Secret[] {a, d};
IEnumerable<Secret> array2 = new Secret[] {a, b, c, e};
It is clear that you want a in your end result. You also want b, because a and b refer to the same object. It is also clear that you don't want d, nor e in your end result. But are in your opinion a and c equal?
Another ambiguity in your requirements:
IEnumerable<Secret> array1 = new Secret[] {a};
IEnumerable<Secret> array2 = new Secret[] {a, a, a, a, a};
How many times do you want a in your end result?
Equality comparers
By default a and c are different objects, a == c yields false.
However if you want to define them equal, you need to say in your LINQ: do not use the standard definition for equality, use my definition of equality.
For this we need to write an Equality Comparer. Or to be more precise: create an object of a class that implement IEqualityComparer<Secret>.
Luckily this is usually quite straightforward.
Definition: Two objects of type Secret are equal if all properties return the same value.
class SecretComparer : EqualityComparer<Secret>
{
public static IEqualityComparer<Secret> ByValue {get;} = new SecretComparer();
public override bool Equals (Secret x, Secret y)
{
... // TODO: implement
}
public override int GetHashCode (Secret x)
{
... // TODO: implement
}
}
The implementation is below.
The reason that I derive from class EqualityComparer<Secret>, and not just implement IEqualityComparer<Secret>, is that class EqualityComparer also give me property Default, which might be useful if you want to use the default definition when comparing two Secrets.
LINQ: get objects that are in two sequences
Once you have the equality comparer, LINQ will be straightforward.
To extract the Secrets that are in both x and y, I use the overload of Enumerable.Intersect that uses an equality comparer:
IEnumerable<Secret> ExtractDuplicateSecrets(IEnumerable<Secret> x, IEnumerable<Secret> y)
{
return x.Intersect(y, SecretComparer.ByValue);
}
That's all. To perform an action on every remaining Secret, use foreach:
void PerformSecretAction(IEnumerable<Secret> secrets)
{
foreach (Secret secret in secrets)
{
secret.Process();
}
}
So your complete code:
IEnumerable<Secret> x = GetSecrets();
IEnumerable<Secret> y = GetOtherSecrets();
IEnumerable<Secret> secretsInXandY = ExtractDuplicateSecrets(x, y);
PerformSecretAction(secretsInXandY);
Or if you want to do this in one statement. Not sure if this improves readability:
PerformSecretAction(ExtractDuplicateSecrets(GetSecrets(), GetOtherSecrets()));
The nice thing about making small methods (one to create x, one to create y, a SecretComparer, one to extract the common Secrets, and one to perform the action on them) is that each procedure is quite small and hence easy to read. The procedures can also be reused for other purposes, they are easy to change (a different definition of equality only needs a different comparer), and they are easy to unit test.
Implement Secret Equality
public override bool Equals (Secret x, Secret y)
{
// almost all equality comparers start with the following lines:
if (x == null) return y == null; // True if x and y both null
if (y == null) return false; // because x not null
if (Object.ReferenceEquals(x, y)) return true; // same object
Most of the time we don't want objects of different derived classes to be equal: a TopSecret (derived from Secret) is not equal to a Secret.
if (x.GetType() != y.GetType()) return false;
The rest depends on your definition of when two Secrets are equal. Most of the time you check all properties. Sometimes you only check a subsection.
return x.Id == y.Id
&& x.Description == y.Description
&& x.Date == y.Date
&& ...
Here you can see that the code depends on your definition of equality. Maybe the Description check is case insensitive:
private static IEqualityComparer<string> descriptionComparer {get;}
= StringComparer.CurrentCultureIgnoreCase;
return x.Id == y.Id
&& descriptionComparer.Equals(x.Description, y.Description)
&& ...
Implement GetHashCode
This method is mainly used to have a fast method to determine that two objects are not equal. A good GetHashCode is fast, and throws away most unequal objects.
There is only one requirement: if x and y are considered equal, they should return the same HashCode. Not the other way round: different objects might have the same Hashcode, although it would be better if they have different HashCodes.
How about this:
public override int GetHashCode (Secret x)
{
if (x == null)
return 8744523; // just a number;
else
return x.Id.GetHashCode(); // only check Id
}
In the code above, I assume that the Id of a Secret is fairly unique. Probably only while updating a Secret you will find two non-equal Secrets with same Id:
Secret existingSecret = this.FindSecretById(42);
Secret secretToEdit = (Secret)existingSecret.Clone();
secretToEdit.Description = this.ReadNewDescription();
Now existingSecret and secretToEdit have the same value for Id, but a different Description. Hence they are not equal. Yet they have the same HashCode.
Still, by far, most Secrets will have a unique Id, GetHashCode will be a very fast method to detect that two Secrets are different.
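Putting the pieces together, a complete comparer might look like the following sketch. The Secret class here is a hypothetical one with only Id and Description, and SecretDemo is an illustrative name; adapt the property checks to your own definition of equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Secret with just two properties.
public class Secret
{
    public int Id { get; set; }
    public string Description { get; set; }
}

public class SecretComparer : EqualityComparer<Secret>
{
    public static IEqualityComparer<Secret> ByValue { get; } = new SecretComparer();

    public override bool Equals(Secret x, Secret y)
    {
        if (ReferenceEquals(x, y)) return true;        // same object, or both null
        if (x is null || y is null) return false;      // exactly one is null
        if (x.GetType() != y.GetType()) return false;  // different derived classes
        return x.Id == y.Id && x.Description == y.Description;
    }

    public override int GetHashCode(Secret x)
    {
        // Equal Secrets have equal Ids, so they get equal hash codes.
        return x is null ? 8744523 : x.Id.GetHashCode();
    }
}

public static class SecretDemo
{
    public static List<Secret> InBoth(IEnumerable<Secret> x, IEnumerable<Secret> y)
        => x.Intersect(y, SecretComparer.ByValue).ToList();

    public static void Main()
    {
        var a = new Secret { Id = 1, Description = "alpha" };
        var c = new Secret { Id = 1, Description = "alpha" }; // equal by value, different instance
        var d = new Secret { Id = 2, Description = "beta" };
        var e = new Secret { Id = 3, Description = "gamma" };

        List<Secret> common = InBoth(new[] { a, d }, new[] { a, a, c, e });

        Console.WriteLine(common.Count);                  // 1
        Console.WriteLine(ReferenceEquals(common[0], a)); // True
    }
}
```

This also settles the "how many times" question for this approach: Intersect is a set operation, so each common element appears once in the result, taken from the first sequence.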

Related

Dictionary hash function for fuzzy lookups

When an approximated comparison between strings is required, the basic Levenshtein Distance can help. It measures the amount of modifications of the string needed to equal another string:
"aaaa" vs "aaab" => 1
"abba" vs "aabb" => 2
"aaaa" vs "a" => 3
When using a Dictionary<T, U> one can provide a custom IEqualityComparer<T>. One can implement the Levenshtein Distance as an IEqualityComparer<string>:
public class LevenshteinStringComparer : IEqualityComparer<string>
{
private readonly int _maximumDistance;
public LevenshteinStringComparer(int maximumDistance)
=> _maximumDistance = maximumDistance;
public bool Equals(string x, string y)
=> ComputeLevenshteinDistance(x, y) <= _maximumDistance;
public int GetHashCode(string obj)
=> 0;
private static int ComputeLevenshteinDistance(string s, string t)
{
// Omitted for simplicity
// Example can be found here: https://www.dotnetperls.com/levenshtein
}
}
So we can use a fuzzy dictionary:
var dict = new Dictionary<string, int>(new LevenshteinStringComparer(2));
dict["aaa"] = 1;
dict["aab"] = 2; // Modify existing value under "aaa" key
// Only one key was created:
dict.Keys => { "aaa" }
Having all this set up, you may have noticed that we haven't implemented a proper GetHashCode in the LevenshteinStringComparer, which the dictionary would greatly appreciate. As rules of thumb regarding hash codes, I'd use:
Unequal objects should not have the same hash code
Equal objects must have the same hash code
The only possible hash function following these rules I can imagine is a constant number, just as implemented in the given code. This isn't optimal though, but when we start for example to take the default hash of the string, then aaa and aab would end up with different hashes, even though they are handled as equal. Thinking further this means all possible strings have to have the same hash.
Am I correct? And why does the performance of the dictionary get better when I use the default string hash function, hash collisions and all, for our comparer? Shouldn't this make the hash buckets inside the dictionary invalid?
public int GetHashCode(string obj)
=> obj.GetHashCode();
I don't think there is a hashing function that could work in your case.
The problem is that you have to assign the bucket based on a single value only, while you can't know what was added before. But the Levenshtein distance of the item being hashed can be anything from 0 to "infinity"; the only thing that matters is what it is compared with. Hence you cannot satisfy the second condition of the hashing function (that equal objects have the same hash code).
Another argument "pseudo-proof" would be the situation when you want maximum distance of 2 and you already have two items in the dictionary, which have mutual distance of 3. If you then add a string which is of distance 2 from the first item and distance 1 from the second item, how would you decide which item should it match to? It satisfies your maximum for both items, but it should probably match with the second one rather than the first one. But not knowing anything about the contents of the dictionary you cannot know how to hash it correctly.
For the second question - using the default string.GetHashCode() method does improve performance, but it destroys the functionality of your equality comparer. If you test this solution on your sample code, you can see that the dict will contain two keys now. This is because GetHashCode returned two different hash codes, so there was no conflict and dict now has two buckets and your Equals method was not even executed.
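The difference is easy to reproduce. The sketch below is a variant of the comparer from the question that takes the hash function as a constructor argument; FuzzyComparer, FuzzyDemo and ContentHash are hypothetical names. ContentHash is a simple deterministic hash used as a stand-in for string.GetHashCode(), whose values can differ between runs.

```csharp
using System;
using System.Collections.Generic;

public class FuzzyComparer : IEqualityComparer<string>
{
    private readonly int _maximumDistance;
    private readonly Func<string, int> _hash;

    public FuzzyComparer(int maximumDistance, Func<string, int> hash)
    {
        _maximumDistance = maximumDistance;
        _hash = hash;
    }

    public bool Equals(string x, string y) => Distance(x, y) <= _maximumDistance;

    public int GetHashCode(string obj) => _hash(obj);

    // Classic dynamic-programming Levenshtein distance.
    public static int Distance(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= t.Length; j++) d[0, j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[s.Length, t.Length];
    }
}

public static class FuzzyDemo
{
    // Deterministic content-based hash: "aaa" and "aab" get different values.
    public static int ContentHash(string s)
    {
        unchecked
        {
            int h = 17;
            foreach (char ch in s) h = h * 31 + ch;
            return h;
        }
    }

    public static int CountKeys(Func<string, int> hash)
    {
        var dict = new Dictionary<string, int>(new FuzzyComparer(2, hash));
        dict["aaa"] = 1;
        dict["aab"] = 2;
        return dict.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountKeys(_ => 0));      // 1: Equals runs and matches "aaa"
        Console.WriteLine(CountKeys(ContentHash)); // 2: hashes differ, Equals never runs
    }
}
```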
I can understand fuzzy lookup. But not fuzzy storage. Why would you want to overwrite "aaa" when assigning a value for "aab"? If all you want is fuzzy lookup wouldn't it be better to have a normal dictionary which has an extension to do a fuzzy lookup like...
public static class DictionaryExtensions
{
public static IEnumerable<T> FuzzyMatch<T>(this IDictionary<string, T> dictionary, string key, int distance = 2)
{
IEqualityComparer<string> comparer = new LevenshteinStringComparer(distance);
return dictionary
.Keys
.Where(k => comparer.Equals(k, key))
.Select(k => dictionary[k]);
}
}
This is more of a comment than an answer. To answer your question, if you consider the following example...
"abba" vs "cbbc" => 2
"cddc" vs "cbbc" => 2
"abba" vs "cddc" => 4
You get the gist here? Clearly it's not possible for all of the following to be true:
abba == cbbc &&
cddc == cbbc &&
abba != cddc

Using Linq Except with two lists of int arrays

Is it possible to use except with two lists of int arrays, like so:
List<int[]> a = new List<int[]>(){ new int[]{3,4,5}, new int[]{7,8,9}, new int[]{10,11,12} };
List<int[]> b = new List<int[]>(){ new int[]{6,7,9}, new int[]{3,4,5}, new int[]{10,41,12} };
var c = a.Except(b);
and expect {3,4,5} to be absent from the enumerable c? Of course I tried, and this does not work. Is there a solution as efficient as Except? Or even better, faster?
In .NET, arrays are only equal to another if they are the exact same array object. So two distinct arrays which have the same contents are not considered equal:
int[] x = new int[] { 1, 2 };
int[] y = new int[] { 1, 2 };
Console.WriteLine(x == y); // false
In order to check the equality based on the contents, you can use Enumerable.SequenceEqual:
Console.WriteLine(x.SequenceEqual(y)); // true
Of course that doesn't help you directly when trying to use Enumerable.Except, since by default it uses the default equality comparer, which for arrays only checks reference equality (and every array is unequal to every other array except itself).
So the solution would be to use the other overload, and provide a custom IEqualityComparer which compares the arrays based on their content.
public class IntArrayEqualityComparer : IEqualityComparer<int[]>
{
public bool Equals(int[] a, int[] b)
{
return a.SequenceEqual(b);
}
public int GetHashCode(int[] a)
{
return a.Sum();
}
}
Unfortunately, just delegating to SequenceEqual is not enough. We also have to provide a GetHashCode implementation for this to work. As a simple solution, we can use the sum of the numbers in the array here. Usually, we would want to provide a strong hash function, which tells a lot about the contents, but since we are only using this hash function for the Except call, we can use something simple here. (In general, we would also want to avoid creating a hash value from a mutable object)
And when using that equality comparer, we correctly filter out the duplicate arrays:
var c = a.Except(b, new IntArrayEqualityComparer());
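If the comparer is going to be reused outside this one Except call, a hash that depends on element order and a few null checks may be worth the extra lines. The following is a sketch of one such variant; IntArrayComparer and ExceptDemo are hypothetical names, and the multiply-and-add hash is just one common pattern, not the only reasonable choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class IntArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] a, int[] b)
    {
        if (ReferenceEquals(a, b)) return true;    // same array, or both null
        if (a == null || b == null) return false;  // exactly one is null
        return a.SequenceEqual(b);
    }

    public int GetHashCode(int[] a)
    {
        if (a == null) return 0;
        unchecked
        {
            // Order-sensitive: {1, 2} and {2, 1} usually hash differently,
            // unlike a plain sum.
            int hash = 17;
            foreach (int value in a) hash = hash * 31 + value;
            return hash;
        }
    }
}

public static class ExceptDemo
{
    public static List<int[]> Difference(List<int[]> first, List<int[]> second)
        => first.Except(second, new IntArrayComparer()).ToList();

    public static void Main()
    {
        var a = new List<int[]> { new[] { 3, 4, 5 }, new[] { 7, 8, 9 }, new[] { 10, 11, 12 } };
        var b = new List<int[]> { new[] { 6, 7, 9 }, new[] { 3, 4, 5 }, new[] { 10, 41, 12 } };

        foreach (int[] item in Difference(a, b))
            Console.WriteLine(string.Join(",", item)); // 7,8,9 then 10,11,12
    }
}
```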
That's because the default EqualityComparer for int arrays returns false for two arrays with the same values:
int[] a1 = { 1, 2, 3 };
int[] a2 = { 1, 2, 3 };
var ec = EqualityComparer<int[]>.Default;
Console.WriteLine(ec.Equals(a1, a2));//result is false
You can fix it by implementing your own EqualityComparer and passing its instance to Except method (see documentation).
You can also read about arrays comparison in C# here.

Implementing a content-hashable HashSet in C# (like python's `frozenset`)

Brief summary
I want to build a set of sets of items in C#. The inner sets of items have a GetHashCode and Equals method defined by their contents. In mathematical notation:
x = { }
x.Add( { A, B, C } )
x.Add( { A, D } )
x.Add( { B, C, A } )
now x should be{ { A, B, C }, { A, D } }
In python, this could be accomplished with frozenset:
x = set()
x.add( frozenset(['A','B','C']) )
x.add( frozenset(['A','D']) )
x.add( frozenset(['B','C','A']) )
/BriefSummary
I would like to have a hashable HashSet in C#. This would allow me to do:
HashSet<ContentHashableHashSet<int>> setOfSets;
Although there are more sophisticated ways to accomplish this, it can be trivially achieved in practice (although not in the most efficient manner) by overriding ContentHashableHashSet.ToString() (outputting the strings of the contained elements in sorted order) and then using ContentHashableHashSet.ToString().GetHashCode() as the hash code.
However, if an object is modified after placement in setOfSets, it could result in multiple copies:
var setA = new ContentHashableHashSet<int>();
setA.Add(1);
setA.Add(2);
var setB = new ContentHashableHashSet<int>();
setB.Add(1);
setOfSets.Add(setA);
setOfSets.Add(setB);
setB.Add(2); // now there are duplicate members!
As far as I can see, I have two options: I can derive ContentHashableHashSet from HashSet, but then I will need to make it so that all modifiers throw an exception. Missing one modifier could cause an insidious bug.
Alternatively, I can use encapsulation and class ContentHashableHashSet can contain a readonly HashSet. But then I would need to reimplement all set methods (except modifiers) so that the ContentHashableHashSet can behave like a HashSet. As far as I know, extensions would not apply.
Lastly, I could encapsulate as above and then all set-like functionality will occur by returning the const (or readonly?) HashSet member.
In hindsight, this is reminiscent of python's frozenset. Does anyone know of a well-designed way to implement this in C#?
If I was able to lose ISet functionality, then I would simply create a sorted ImmutableList, but then I would lose functionality like fast union, fast intersection, and sub-linear ( roughly O(log(n)) ) set membership with Contains.
EDIT: The base class HashSet does not have virtual Add and Remove methods, so overriding them will work within the derived class, but will not work if you perform HashSet<int> set = new ContentHashableHashSet<int>();. Casting to the base class will allow editing.
EDIT 2: Thanks to @xanatos for recommending a simple GetHashCode implementation:
The easiest way to calculate the GetHashCode is to simply xor (^) all the gethashcodes of the elements. The xor operator is commutative, so the ordering is irrelevant. For the comparison you can use the SetEquals
EDIT 3: Someone recently shared information about ImmutableHashSet, but because this class is sealed, it is not possible to derive from it and override GetHashCode.
I was also told that HashSet takes an IEqualityComparer as an argument, and so this can be used to provide an immutable, content-hashable set without deriving from ImmutableHashSet; however, this is not a very object oriented solution: every time I want to use a ContentHashableHashSet, I will need to pass the same (non-trivial) argument. As I'm sure you know, this can really wreak havoc with your coding zen, and where I would be flying along in python with myDictionary[ frozenset(mySet) ] = myValue, I will be stuck doing the same thing again and again and again.
Thanks for any help you can provide. I have a temporary workaround (whose problems are mentioned in EDIT 1 above), but I'd mostly like to learn about the best way to design something like this.
Hide the elements of your set of sets so that they can't be changed. That means copying when you add/retrieve sets, but maybe that's acceptable?
// Better make sure T is immutable too, else set hashes could change
public class SetofSets<T>
{
private class HashSetComparer : IEqualityComparer<HashSet<T>>
{
public int GetHashCode(HashSet<T> x)
{
return x.Aggregate(1, (code,elt) => code ^ elt.GetHashCode());
}
public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (x == null || y == null)
return x == null && y == null; // SetEquals(null) would throw
return x.SetEquals(y);
}
}
private HashSet<HashSet<T>> setOfSets;
public SetofSets()
{
setOfSets = new HashSet<HashSet<T>>(new HashSetComparer());
}
public void Add(HashSet<T> set)
{
setOfSets.Add(new HashSet<T>(set));
}
public bool Contains(HashSet<T> set)
{
return setOfSets.Contains(set);
}
}
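As a side note, the framework already ships a content-based comparer for sets: the static HashSet<T>.CreateSetComparer() method returns an IEqualityComparer<HashSet<T>> that treats two sets as equal when they contain the same elements. A minimal sketch of the example from the question using it (SetOfSetsDemo and Build are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class SetOfSetsDemo
{
    public static HashSet<HashSet<string>> Build()
    {
        // The outer set compares its members by content, not by reference.
        var setOfSets = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());

        setOfSets.Add(new HashSet<string> { "A", "B", "C" });
        setOfSets.Add(new HashSet<string> { "A", "D" });
        // Same elements as the first set, so this Add returns false.
        setOfSets.Add(new HashSet<string> { "B", "C", "A" });

        return setOfSets;
    }

    public static void Main()
    {
        HashSet<HashSet<string>> setOfSets = Build();

        Console.WriteLine(setOfSets.Count); // 2
        Console.WriteLine(setOfSets.Contains(new HashSet<string> { "D", "A" })); // True
    }
}
```

The mutation problem from the question still applies: changing an inner set after it has been added changes its hash, so the copying wrapper above (or immutable inner sets) remains necessary.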

Using Linq Except not Working as I Thought

List1 contains items { A, B } and List2 contains items { A, B, C }.
What I need is to be returned { C } when I use Except Linq extension. Instead I get returned { A, B } and if I flip the lists around in my expression the result is { A, B, C }.
Am I misunderstanding the point of Except? Is there another extension I am not seeing to use?
I have looked through and tried a number of different posts on this matter with no success thus far.
var except = List1.Except(List2); //This is the line I have thus far
EDIT: Yes I was comparing simple objects. I have never used IEqualityComparer, it was interesting to learn about.
Thanks all for the help. The problem was not implementing the comparer. The linked blog post and example below were helpful.
If you are storing reference types in your list, you have to make sure there is a way to compare the objects for equality. Otherwise they will be checked by comparing if they refer to same address.
You can implement IEqualityComparer<T> and send it as a parameter to Except() function. Here's a blog post you may find helpful.
edit: the original blog post link was broken and has been replaced above
So just for completeness...
// Except gives you the items in the first set but not the second
var InList1ButNotList2 = List1.Except(List2);
var InList2ButNotList1 = List2.Except(List1);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
Edit: Since your lists contain objects you need to pass in an IEqualityComparer for your class... Here is what your except will look like with a sample IEqualityComparer based on made up objects... :)
// Except gives you the items in the first set but not the second
var equalityComparer = new MyClassEqualityComparer();
var InList1ButNotList2 = List1.Except(List2, equalityComparer);
var InList2ButNotList1 = List2.Except(List1, equalityComparer);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2, equalityComparer);
public class MyClass
{
public int i;
public int j;
}
class MyClassEqualityComparer : IEqualityComparer<MyClass>
{
public bool Equals(MyClass x, MyClass y)
{
return x.i == y.i &&
x.j == y.j;
}
public int GetHashCode(MyClass obj)
{
unchecked
{
if (obj == null)
return 0;
int hashCode = obj.i.GetHashCode();
hashCode = (hashCode * 397) ^ obj.j.GetHashCode();
return hashCode;
}
}
}
You simply confused the order of arguments. I can see where this confusion arose, because the official documentation isn't as helpful as it could be:
Produces the set difference of two sequences by using the default equality comparer to compare values.
Unless you're versed in set theory, it may not be clear what a set difference actually is—it's not simply what's different between the sets. In reality, Except returns the list of elements in the first set that are not in the second set.
Try this:
var except = List2.Except(List1); // { C }
Writing a custom comparer does seem to solve the problem, but I think https://stackoverflow.com/a/12988312/10042740 is a much more simple and elegant solution.
It overrides the GetHashCode() and Equals() methods in the class that defines your object, so the default comparer does its magic without extra code cluttering up the place.
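A minimal sketch of that approach, assuming a hypothetical Item class identified by its Name (the linked answer's own class is not reproduced here):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type that defines its own value equality.
public class Item
{
    public string Name { get; }

    public Item(string name) { Name = name; }

    public override bool Equals(object obj)
        => obj is Item other && Name == other.Name;

    // Must agree with Equals: equal names give equal hash codes.
    public override int GetHashCode()
        => Name == null ? 0 : Name.GetHashCode();
}

public static class OverrideDemo
{
    public static List<string> NamesOnlyIn(IEnumerable<Item> first, IEnumerable<Item> second)
        => first.Except(second).Select(item => item.Name).ToList();

    public static void Main()
    {
        var list1 = new List<Item> { new Item("A"), new Item("B") };
        var list2 = new List<Item> { new Item("A"), new Item("B"), new Item("C") };

        // No comparer argument: the default comparer calls the overrides.
        Console.WriteLine(string.Join(",", NamesOnlyIn(list2, list1))); // C
        Console.WriteLine(NamesOnlyIn(list1, list2).Count);             // 0
    }
}
```

The trade-off is that the class then has exactly one notion of equality everywhere; a separate IEqualityComparer<T> lets different callers pick different ones.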
Just for Ref:
I wanted to compare USB Drives connected and available to the system.
So this is the class which implements interface IEqualityComparer
public class DriveInfoEqualityComparer : IEqualityComparer<DriveInfo>
{
public bool Equals(DriveInfo x, DriveInfo y)
{
if (object.ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
// compare with Drive Level
return x.VolumeLabel.Equals(y.VolumeLabel);
}
public int GetHashCode(DriveInfo obj)
{
return obj.VolumeLabel.GetHashCode();
}
}
and you can use it like this
var newDeviceLst = DriveInfo.GetDrives()
.ToList()
.Except(inMemoryDrives, new DriveInfoEqualityComparer())
.ToList();

What's the role of GetHashCode in the IEqualityComparer<T> in .NET?

I'm trying to understand the role of the GetHashCode method of the interface IEqualityComparer.
The following example is taken from MSDN:
using System;
using System.Collections.Generic;
class Example {
static void Main() {
try {
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box,
string>(boxEqC);
Box redBox = new Box(4, 3, 4);
Box blueBox = new Box(4, 3, 4);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Console.WriteLine(redBox.GetHashCode());
Console.WriteLine(blueBox.GetHashCode());
}
catch (ArgumentException argEx) {
Console.WriteLine(argEx.Message);
}
}
}
public class Box {
public Box(int h, int l, int w) {
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
}
class BoxEqualityComparer : IEqualityComparer<Box> {
public bool Equals(Box b1, Box b2) {
if (b1.Height == b2.Height & b1.Length == b2.Length
& b1.Width == b2.Width) {
return true;
}
else {
return false;
}
}
public int GetHashCode(Box bx) {
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
Shouldn't the Equals method implementation be enough to compare two Box objects? That is where we tell the framework the rule used to compare the objects. Why is the GetHashCode needed?
Thanks.
Lucian
A bit of background first...
Every object in .NET has an Equals method and a GetHashCode method.
The Equals method is used to compare one object with another object - to see if the two objects are equivalent.
The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.
A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.
When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.
When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.
Now to explain why both methods are necessary, consider this example:
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC);
Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. They share a hashcode because you are using the ^ (bitwise exclusive-OR) operator, so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same hashcode, we call that a collision.
When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.
In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.
TLDR
So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.
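The collision case described above can be run directly. This sketch repeats the Box and BoxEqualityComparer from the question in condensed form so that it compiles on its own; BoxDemo and its helper methods are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Box
{
    public Box(int h, int l, int w) { Height = h; Length = l; Width = w; }
    public int Height { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
}

public class BoxEqualityComparer : IEqualityComparer<Box>
{
    public bool Equals(Box b1, Box b2)
        => b1.Height == b2.Height && b1.Length == b2.Length && b1.Width == b2.Width;

    public int GetHashCode(Box bx)
        => (bx.Height ^ bx.Length ^ bx.Width).GetHashCode();
}

public static class BoxDemo
{
    public static Dictionary<Box, string> Build()
    {
        var boxes = new Dictionary<Box, string>(new BoxEqualityComparer());
        boxes.Add(new Box(100, 100, 25), "red");
        boxes.Add(new Box(1000, 1000, 25), "blue"); // same hash, different key: allowed
        return boxes;
    }

    public static bool TryAddDuplicate(Dictionary<Box, string> boxes)
    {
        try
        {
            boxes.Add(new Box(100, 100, 25), "green"); // Equals says this key exists
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        var comparer = new BoxEqualityComparer();
        Console.WriteLine(comparer.GetHashCode(new Box(100, 100, 25)));   // 25
        Console.WriteLine(comparer.GetHashCode(new Box(1000, 1000, 25))); // 25

        Dictionary<Box, string> boxes = Build();
        // The hash locates the candidates; Equals picks the right one.
        Console.WriteLine(boxes[new Box(1000, 1000, 25)]); // blue
        Console.WriteLine(TryAddDuplicate(boxes));         // False
    }
}
```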
GetHashCode is used in Dictionary collections to create the hash under which objects are stored. Here is a nice article on why and how to use IEqualityComparer and GetHashCode: http://dotnetperls.com/iequalitycomparer
While it would be possible for a Dictionary<TKey,TValue> to have TryGetValue and similar methods call Equals on every single stored key to see whether it matches the one being sought, that would be very slow. Instead, like many hash-based collections, it relies upon GetHashCode to quickly exclude most non-matching values from consideration. If calling GetHashCode on an item being sought yields 42, and a collection has 53,917 items, but calling GetHashCode on 53,914 of the items yielded a value other than 42, then only 3 items will have to be compared to the one being sought. The other 53,914 may safely be ignored.
The reason a GetHashCode is included in an IEqualityComparer<T> is to allow for the possibility that a dictionary's consumer might want to regard as equal objects that would normally not regard each other as equal. The most common example would be a caller that wants to use strings as keys but use case-insensitive comparisons. In order to make that work efficiently, the dictionary will need to have some form of hash function that will yield the same value for "Fox" and "FOX", but hopefully yield something else for "box" or "zebra". Since the GetHashCode method built into String doesn't work that way, the dictionary will need to get such a method from somewhere else, and IEqualityComparer<T> is the most logical place since the need for such a hash code would be very strongly associated with an Equals method that considers "Fox" and "FOX" identical to each other, but not to "box" or "zebra".
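The built-in StringComparer.OrdinalIgnoreCase is exactly such a comparer: it pairs a case-insensitive Equals with a matching GetHashCode. A short sketch (CaseDemo and Build are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class CaseDemo
{
    public static Dictionary<string, int> Build()
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        counts["Fox"] = 1;
        counts["FOX"] = 2; // same key under this comparer: overwrites the value
        counts["box"] = 3;
        return counts;
    }

    public static void Main()
    {
        StringComparer comparer = StringComparer.OrdinalIgnoreCase;

        // Equal under the comparer, so the hash codes must match too.
        Console.WriteLine(comparer.Equals("Fox", "FOX"));                           // True
        Console.WriteLine(comparer.GetHashCode("Fox") == comparer.GetHashCode("FOX")); // True

        Dictionary<string, int> counts = Build();
        Console.WriteLine(counts.Count);  // 2
        Console.WriteLine(counts["fox"]); // 2
    }
}
```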
