C# HashSet Generic allows duplicate - c#

Reading about HashSet<T> on MSDN: if T implements IEquatable<T>, the HashSet uses that implementation through EqualityComparer<T>.Default.
So, take the class Person:
public class Person : IEquatable<Person>
{
private string pName;
public Person(string name){ pName=name; }
public string Name
{
get { return pName; }
set
{
if (pName.Equals(value, StringComparison.InvariantCultureIgnoreCase))
{
return;
}
pName = value;
}
}
public bool Equals(Person other)
{
if(other==null){return false;}
return pName.Equals(other.pName, StringComparison.InvariantCultureIgnoreCase);
}
public override bool Equals(object obj)
{
Person other = obj as Person;
if(other==null){return false;}
return Equals(other);
}
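// Hash with the same case-insensitive comparer that Equals uses, so equal names hash alike.
public override int GetHashCode(){return StringComparer.InvariantCultureIgnoreCase.GetHashCode(pName);}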
public override string ToString(){return pName;}
}
Now, in another class or in Main:
HashSet<Person> set = new HashSet<Person>();
set.Add(new Person("Smith")); // returns true
Person p = new Person("Smi");
set.Add(p); // returns true
p.Name = "Smith"; // no error occurs
And now you've got 2 Person objects in the HashSet with the same name (so they are equal according to Equals).
HashSet lets us put duplicate objects.

HashSet lets us put duplicate objects.
It isn't letting you put in duplicate objects. The issue is that you're mutating the object after it's been added.
Mutating an object while it is used as a key in a dictionary or stored in a hash set is always problematic, because the collection files the object under the hash code it had when it was added. It is something I would recommend avoiding.
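One way to follow that advice is to make the key immutable. The following is only a minimal sketch: ImmutablePerson and ImmutableKeyDemo are hypothetical names, and it assumes that an ordinal, case-insensitive comparison is acceptable in place of the InvariantCultureIgnoreCase comparison used in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable variant of Person: the name cannot change after
// construction, so the hash code recorded by the HashSet stays valid.
public sealed class ImmutablePerson : IEquatable<ImmutablePerson>
{
    public ImmutablePerson(string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        Name = name;
    }

    // Get-only property: there is no setter to mutate the key with.
    public string Name { get; }

    public bool Equals(ImmutablePerson other)
    {
        return other != null && StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ImmutablePerson);
    }

    // Uses the same comparer as Equals, so equal names always hash alike.
    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Name);
    }
}

public static class ImmutableKeyDemo
{
    // Returns the number of elements after adding "Smith" and then "SMITH".
    public static int CountAfterAddingTwice()
    {
        var set = new HashSet<ImmutablePerson>();
        set.Add(new ImmutablePerson("Smith"));
        set.Add(new ImmutablePerson("SMITH")); // rejected: equal to the existing element
        return set.Count;
    }
}
```

To "rename" such a person you would remove the old instance from the set and add a new one, which lets the set re-check for duplicates.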

Related

Compare two objects based on criteria

I need to compare two List<object> instances, but string properties should be compared case-insensitively.
I have a class:
class User
{
public int Id { get;set; }
public string name { get;set; }
}
I have 2 lists, List<User> olduser and List<User> newuser. I need to compare them while ignoring the case of the "name" field, and get the values in olduser that are not in newuser.
List<User> obsoleteUsers = olduser.Except(newuser).ToList();
I need to add a condition so that the comparison ignores case for the "name" field.
You can use a custom IEqualityComparer<T>:
class UserNameComparer : IEqualityComparer<User>
{
public UserNameComparer(StringComparer comparer)
{
if (comparer == null) throw new ArgumentNullException(nameof(comparer));
this.Comparer = comparer;
}
public StringComparer Comparer { get; }
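public bool Equals(User x, User y)
{
// Same reference (including both null) is equal; exactly one null is not.
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return Comparer.Equals(x.name, y.name);
}
public int GetHashCode(User obj)
{
// StringComparer.GetHashCode throws on null, so map a missing name to 0.
if (obj == null || obj.name == null) return 0;
return Comparer.GetHashCode(obj.name);
}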
}
You use it in Except (or other LINQ methods):
List<User> obsoleteUsers = olduser
.Except(newuser, new UserNameComparer(StringComparer.InvariantCultureIgnoreCase))
.ToList();
This way you can implement multiple comparers for different requirements without changing the original class and the way it identifies duplicates (for example, by the Id property).
Note that Except (and other set-based methods like Distinct) uses GetHashCode as a fast first check of whether one object can equal another. That's why your class should override Equals and GetHashCode (always together) to support being used in a hash-based collection (like HashSet<T> or Dictionary<TKey, TValue>). Otherwise you will get the versions from System.Object, which compare references and not properties.
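To make the difference concrete, here is a small sketch with made-up data. NameOnlyComparer and ExceptDemo are hypothetical names, and the comparer is hard-wired to StringComparer.OrdinalIgnoreCase rather than taking the comparer as a constructor argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id { get; set; }
    public string name { get; set; }
}

// Hypothetical comparer: two users are equal when their names match, ignoring case.
public sealed class NameOnlyComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.name, y.name);
    }

    public int GetHashCode(User obj)
    {
        if (obj == null || obj.name == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.name);
    }
}

public static class ExceptDemo
{
    private static List<User> OldUsers()
    {
        return new List<User>
        {
            new User { Id = 1, name = "Alice" },
            new User { Id = 2, name = "Bob" }
        };
    }

    private static List<User> NewUsers()
    {
        return new List<User> { new User { Id = 1, name = "ALICE" } };
    }

    // With the comparer, "Alice" matches "ALICE", so only Bob is obsolete.
    public static List<string> ObsoleteNames()
    {
        return OldUsers().Except(NewUsers(), new NameOnlyComparer())
                         .Select(u => u.name)
                         .ToList();
    }

    // Without a comparer, this User class falls back to reference equality,
    // so no old user matches a new one and both are reported.
    public static int ObsoleteCountWithoutComparer()
    {
        return OldUsers().Except(NewUsers()).Count();
    }
}
```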
If you want User itself to define equality by your own rules, implement the Equals and GetHashCode methods:
class User : IEquatable<User> {
// Dangerous practice: Id (and name) usually should be readonly:
// we can put an instance into, say, a dictionary and then change Id, losing the instance
public int Id { get; set; }
public string name { get; set; }
public bool Equals(User other) {
if (null == other)
return false;
return
Id == other.Id &&
string.Equals(name, other.name, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj) => Equals(obj as User);
public override int GetHashCode() => Id;
}
Then you can call Except as usual, without passing a comparer.

Can I use LINQ to check if objects in a list have a unique ID?

Say I have a list containing objects like this one:
public class Person
{
private string _name;
private string _id;
private int _age;
public Person()
{
}
// Accessors
}
public class ManipulatePerson
{
Person person = new Person();
List<Person> personList = new List<Person>();
// Assign values
private void PopulateList()
{
// Loop
personList.Add(person);
// Check if every Person has a unique ID
}
}
and I wanted to check that each Person had a unique ID. I would like to return a boolean true/false depending on whether or not the IDs are unique. Is this something I can achieve with LINQ?
Note that you can even use a HashSet<> directly:
var hs = new HashSet<string>();
bool areAllPeopleUnique = personList.All(x => hs.Add(x.Id));
(this is the code that I normally use)
It has the advantage that in the best case (a duplicate near the start of the list) it stops before examining the whole personList collection.
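As a self-contained sketch of that approach (the Person class here is a cut-down stand-in with only a public Id, and UniqueIdCheck is a hypothetical helper name):

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Person class in the question.
public class Person
{
    public Person(string id) { Id = id; }
    public string Id { get; }
}

public static class UniqueIdCheck
{
    public static bool AllIdsUnique(IEnumerable<Person> people)
    {
        var seen = new HashSet<string>();
        // HashSet<T>.Add returns false when the value is already present,
        // and All stops enumerating at the first false.
        return people.All(p => seen.Add(p.Id));
    }
}
```

Keeping the HashSet local to the method matters: reusing one set across calls would make a second call report duplicates that are not there.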
I would use Distinct and then compare the counts, for example:
bool bAreAllPeopleUnique = (personList.Select(p => p.ID).Distinct().Count() == personList.Count);
However, as @Ian commented, you will need to add a property to the Person class so that you can access the ID, like so:
public string ID
{
get { return _id; }
}
A 'nicer' way to implement this would be to add a method like so:
private bool AreAllPeopleUnique(IEnumerable<Person> people)
{
var ids = people.Select(p => p.ID).ToList(); // enumerate the source only once
return ids.Distinct().Count() == ids.Count;
}
NOTE: The method takes an IEnumerable<Person> rather than a List<Person>, so any collection implementing that interface can be passed in.
One of the best ways to do so is to override Equals and GetHashCode and implement IEquatable<T>:
public class Person : IEquatable<Person>
{
public string Id { get; set; }
public override bool Equals(object some) => Equals(some as Person);
public override int GetHashCode() => Id != null ? Id.GetHashCode() : 0;
public bool Equals(Person person) => person != null && person.Id == Id;
}
Now you can use HashSet<T> to store unique objects, and it will reject duplicates as long as Id is not changed after an object has been added. In addition, if you try to add a duplicate item, Add will return false.
NOTE: My IEquatable<T>, and Equals/GetHashCode overrides are very basic, but this sample implementation should give you a good hint on how to elegantly handle your scenario.
You can check this Q&A to get an idea of how to implement GetHashCode: What is the best algorithm for an overridden System.Object.GetHashCode?
This other Q&A might also be interesting for you: Why is it important to override GetHashCode when Equals method is overridden?
You can use GroupBy for getting unique items:
var result = personList.GroupBy(p=> p.Id)
.Select(grp => grp.First())
.ToList();
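The grouping above yields one Person per ID rather than a true/false answer. If a boolean (or the list of offending IDs) is what you need, the same GroupBy idea can be turned around, as in this sketch. DuplicateFinder is a hypothetical helper, written generically so it does not depend on the Person class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFinder
{
    // Returns every key that occurs more than once in the sequence.
    public static List<TKey> DuplicateKeys<T, TKey>(IEnumerable<T> items, Func<T, TKey> keySelector)
    {
        return items.GroupBy(keySelector)
                    .Where(grp => grp.Count() > 1)
                    .Select(grp => grp.Key)
                    .ToList();
    }
}
```

With the question's list this could be called as DuplicateFinder.DuplicateKeys(personList, p => p.Id).Count == 0, assuming Person exposes a public Id. Unlike the HashSet approach it always walks the whole list, but it tells you which IDs are duplicated.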

Mono implementation of Dictionary<T,T> using .Equals(obj o) instead of .GetHashCode()

By searching through the MSDN C# documentation and Stack Overflow, I get the clear impression that Dictionary<T,T> is supposed to use GetHashCode() for checking key uniqueness and for look-ups.
The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
...
The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
I use Mono (in Unity3D), and after getting some weird results in my work, I conducted this experiment:
public class DictionaryTest
{
public static void TestKeyUniqueness()
{
//Test a dictionary of type1
Dictionary<KeyType1, string> dictionaryType1 = new Dictionary<KeyType1, string>();
dictionaryType1[new KeyType1(1)] = "Val1";
if(dictionaryType1.ContainsKey(new KeyType1(1)))
{
Debug.Log ("Key in dicType1 was already present"); //This line does NOT print
}
//Test a dictionary of type2
Dictionary<KeyType2, string> dictionaryType2 = new Dictionary<KeyType2, string>();
dictionaryType2[new KeyType2(1)] = "Val1";
if(dictionaryType2.ContainsKey(new KeyType2(1)))
{
Debug.Log ("Key in dicType2 was already present"); // Only this line prints
}
}
}
//This type implements only GetHashCode()
public class KeyType1
{
private int var1;
public KeyType1(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
}
//This type implements both GetHashCode() and Equals(obj), where Equals uses the hashcode.
public class KeyType2
{
private int var1;
public KeyType2(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
}
Only when using type KeyType2 are the keys considered equal. To me this demonstrates that Dictionary uses Equals(obj) and not GetHashCode().
Can someone reproduce this and help me interpret what it means? Is it an incorrect implementation in Mono? Or have I misunderstood something?
I get the clear impression that Dictionary is supposed to use .GetHashCode() for checking key-uniqueness
What made you think that? GetHashCode doesn't return unique values.
And MSDN clearly says:
Dictionary<TKey, TValue> requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer<T> generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer<T>.Default is used. If type TKey implements the System.IEquatable<T> generic interface, the default equality comparer uses that implementation.
Doing this:
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
is wrong in the general case, because KeyType2 instances could end up being equal to instances of StringBuilder, SomeOtherClass, AnythingYouCanImagine or any other type that happens to return the same hash code. It also throws a NullReferenceException when obj is null.
You should do it like so:
public override bool Equals (object obj)
{
// "as" yields null for null or for any other type, so one check covers both cases.
KeyType2 other = obj as KeyType2;
return other != null && other.var1 == this.var1;
}
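To see how the two methods work together, consider this sketch. CollidingKey and DictionaryLookupDemo are hypothetical names, and the constant hash code is deliberately bad: it forces every key into the same bucket, so the outcome of each lookup is decided by Equals alone.

```csharp
using System.Collections.Generic;

public sealed class CollidingKey
{
    private readonly int value;

    public CollidingKey(int value) { this.value = value; }

    // Deliberately poor hash: legal, but every instance collides.
    public override int GetHashCode() { return 0; }

    public override bool Equals(object obj)
    {
        var other = obj as CollidingKey;
        return other != null && other.value == value;
    }
}

public static class DictionaryLookupDemo
{
    // A different instance with the same value is found: hash matches, Equals says yes.
    public static bool FindsEqualKey()
    {
        var dictionary = new Dictionary<CollidingKey, string>();
        dictionary[new CollidingKey(1)] = "Val1";
        return dictionary.ContainsKey(new CollidingKey(1));
    }

    // Same hash code but a different value: Equals says no, so the key is not found.
    public static bool ConfusesCollidingKeys()
    {
        var dictionary = new Dictionary<CollidingKey, string>();
        dictionary[new CollidingKey(1)] = "Val1";
        return dictionary.ContainsKey(new CollidingKey(2));
    }
}
```

In other words, the hash code only narrows down where to look, and Equals makes the final decision. That is consistent with the experiment in the question: KeyType1 has matching hash codes but inherits reference equality from System.Object, so the lookup fails.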
When you are trying to override Equals and inherently GetHashCode you must ensure the following points (given the class MyObject) in this order (you were doing it the other way around):
1) When are 2 instances of MyObject equal ? Say you have:
public class MyObject {
public string Name { get; set; }
public string Address { get; set; }
public int Age { get; set; }
public DateTime TimeWhenIBroughtThisInstanceFromTheDatabase { get; set; }
}
And you have 1 record in some database that you need to be mapped to an instance of this class.
And you make the convention that the time you read the record from the database will be stored
in the TimeWhenIBroughtThisInstanceFromTheDatabase:
MyObject obj1 = DbHelper.ReadFromDatabase( ...some params...);
// you do that at 14:05 and thusly the TimeWhenIBroughtThisInstanceFromTheDatabase
// will be assigned accordingly
// later.. at 14:07 you read the same record into a different instance of MyObject
MyObject obj2 = DbHelper.ReadFromDatabase( ...some params...);
// (the same)
// At 14:09 you ask yourself if the 2 instances are the same
bool theyAre = obj1.Equals(obj2);
Do you want the result to be true? I would say you do.
Therefore the override of Equals should look like so:
public class MyObject {
...
public override bool Equals(object obj) {
if (obj is MyObject) {
var that = obj as MyObject;
return (this.Name == that.Name) &&
(this.Address == that.Address) &&
(this.Age == that.Age);
// without the syntactically possible but logically challenged:
// && (this.TimeWhenIBroughtThisInstanceFromTheDatabase ==
// that.TimeWhenIBroughtThisInstanceFromTheDatabase)
} else
return false;
}
...
}
2) ENSURE THAT whenever 2 instances are equal (as indicated by the Equals method you implement), their GetHashCode results will be identical.
int hash1 = obj1.GetHashCode();
int hash2 = obj2.GetHashCode();
bool theseMustBeAlso = hash1 == hash2;
The easiest way to do that is (in the sample scenario):
public class MyObject {
...
public override int GetHashCode() {
int result;
result = ((this.Name != null) ? this.Name.GetHashCode() : 0) ^
((this.Address != null) ? this.Address.GetHashCode() : 0) ^
this.Age.GetHashCode();
// without the syntactically possible but logically challenged:
// ^ this.TimeWhenIBroughtThisInstanceFromTheDatabase.GetHashCode()
return result;
}
...
}
Note that:
- Strings can be null, and calling .GetHashCode() on a null string fails with a NullReferenceException.
- I used ^ (XOR). You can use whatever you want as long as the golden rule (number 2) is respected.
- x ^ 0 == x (for whatever x), so a null string simply contributes nothing to the hash.
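Putting both steps together gives roughly the following. This is a sketch that assembles the fragments above into one compilable class; MyObjectDemo and its sample values are made up for illustration.

```csharp
using System;

public class MyObject
{
    public string Name { get; set; }
    public string Address { get; set; }
    public int Age { get; set; }
    public DateTime TimeWhenIBroughtThisInstanceFromTheDatabase { get; set; }

    // Step 1: equality ignores the load timestamp.
    public override bool Equals(object obj)
    {
        var that = obj as MyObject;
        return that != null &&
               this.Name == that.Name &&
               this.Address == that.Address &&
               this.Age == that.Age;
    }

    // Step 2: the hash is built from exactly the fields Equals compares.
    public override int GetHashCode()
    {
        return ((Name != null) ? Name.GetHashCode() : 0) ^
               ((Address != null) ? Address.GetHashCode() : 0) ^
               Age.GetHashCode();
    }
}

public static class MyObjectDemo
{
    // Two reads of the same record at different times: equal, with equal hash codes.
    public static bool EqualAndSameHash()
    {
        var obj1 = new MyObject
        {
            Name = "Ann", Address = "1 Main St", Age = 30,
            TimeWhenIBroughtThisInstanceFromTheDatabase = new DateTime(2020, 1, 1, 14, 5, 0)
        };
        var obj2 = new MyObject
        {
            Name = "Ann", Address = "1 Main St", Age = 30,
            TimeWhenIBroughtThisInstanceFromTheDatabase = new DateTime(2020, 1, 1, 14, 7, 0)
        };
        return obj1.Equals(obj2) && obj1.GetHashCode() == obj2.GetHashCode();
    }
}
```

XOR is kept here because the answer uses it. It is order-insensitive, so swapping Name and Address gives the same hash; that is allowed by rule 2 but produces more collisions than a combining scheme that mixes in a multiplier.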

DDD Entity class

I'm trying to get my head around the DDD approach.
So far I know that an entity is unique and identified by a combination of its attributes.
Entity is an abstract class which will be extended by the other entity classes.
I know so far that the Version property is used to manage concurrency.
I need help with the rest of this class.
I'm in the process of learning DDD, so please describe your thoughts on this or share useful links on the topic.
public abstract class Entity<T>
{
#region Properties
protected T _Id;
private int _Version;
#endregion
private static bool IsTransient(Entity<T> obj)
{
return (obj != null) &&
Equals(obj.Id, default(T));
}
private Type GetUnproxiedType()
{
return GetType();
}
public virtual T Id
{
get
{
return _Id;
}
protected set
{
_Id = value;
}
}
public virtual int Version
{
get
{
return _Version;
}
set
{
_Version = value;
}
}
public override bool Equals(object obj)
{
return Equals(obj as Entity<T>);
}
public virtual bool Equals(Entity<T> other)
{
if (other == null) return false;
if (ReferenceEquals(this, other)) return true;
if (!IsTransient(this) &&
!IsTransient(other) &&
Equals(Id, other.Id))
{
var otherType = other.GetUnproxiedType();
var thisType = GetUnproxiedType();
return (thisType.IsAssignableFrom(otherType)
|| otherType.IsAssignableFrom(thisType));
}
return false;
}
public override int GetHashCode()
{
if (Equals(Id, default(T)))
{
return base.GetHashCode();
}
return Id.GetHashCode();
}
public static bool operator ==(Entity<T> e1, Entity<T> e2)
{
return Equals(e1, e2);
}
public static bool operator !=(Entity<T> e1, Entity<T> e2)
{
return !Equals(e1, e2);
}
}
Updated
After some research I've come to this conclusion:
The Equals method comes in handy when we need to compare two objects. For example, imagine a place in the application where we have the same object in more than one instance: one from the database and one from the client. We need to find out whether those objects are identical even though they are not the same instance.
For this we will use the Equals method.
An entity is not uniquely identified by the value of its attributes. An entity has an identity, but that does not mean that the identity is made up of the values of its attributes.
That would mean that if you change one of the attributes of an entity, its identity will change, and you do not want that behaviour.
(For instance, if your address changes, your identity doesn't change; you're still the same person, living at another address.)
(I see that you've implemented identity/equality by using an Id property, which is a good thing.)
Next to entities, you have value objects. The identity of a value object is made up of the values of its attributes, and therefore it is better that a value object is immutable, since changing the value object would also mean that you change its identity.
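The distinction can be sketched in a few lines. Address, Customer and EntityVsValueDemo are hypothetical examples that follow the address analogy above; they are not part of the Entity<T> base class from the question.

```csharp
using System;

// Value object: immutable, and equal when all of its attributes are equal.
public sealed class Address : IEquatable<Address>
{
    public Address(string street, string city)
    {
        Street = street;
        City = city;
    }

    public string Street { get; }
    public string City { get; }

    public bool Equals(Address other)
    {
        return other != null && Street == other.Street && City == other.City;
    }

    public override bool Equals(object obj) { return Equals(obj as Address); }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Street != null ? Street.GetHashCode() : 0) * 397)
                   ^ (City != null ? City.GetHashCode() : 0);
        }
    }
}

// Entity: equal when the Id is equal, whatever the other attributes hold.
public sealed class Customer
{
    public Customer(int id, Address address)
    {
        Id = id;
        Address = address;
    }

    public int Id { get; }
    public Address Address { get; private set; }

    // Moving replaces the value object; the customer's identity is untouched.
    public void MoveTo(Address newAddress) { Address = newAddress; }

    public override bool Equals(object obj)
    {
        var other = obj as Customer;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id; }
}

public static class EntityVsValueDemo
{
    public static bool IdentitySurvivesMove()
    {
        var original = new Customer(42, new Address("1 Main St", "Springfield"));
        var moved = new Customer(42, new Address("1 Main St", "Springfield"));
        moved.MoveTo(new Address("9 Elm St", "Shelbyville"));
        return original.Equals(moved) && original.GetHashCode() == moved.GetHashCode();
    }

    public static bool EqualValuesAreEqual()
    {
        return new Address("1 Main St", "Springfield").Equals(new Address("1 Main St", "Springfield"));
    }
}
```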
But what is your question?

Merging two IEnumerable<T>s

I have two IEnumerable<T>s.
One gets filled with the fallback elements. This one will always contain the most elements.
The other one gets filled depending on some parameters and will possibly contain fewer elements.
If an element doesn't exist in the second one, I need to fill it in with the equivalent one from the first.
This code does the job, but it feels inefficient to me and requires me to cast the IEnumerables to ILists or to use a temporary list.
Person implements IEquatable<Person>.
IEnumerable<Person> fallBack = Repository.GetPersons();
IList<Person> translated = Repository.GetPersons(language).ToList();
foreach (Person person in fallBack)
{
if (!translated.Any(p=>p.Equals(person)))
translated.Add(person);
}
Any suggestions?
translated.Union(fallback)
or (if Person doesn't implement IEquatable<Person> by ID)
translated.Union(fallback, PersonComparer.Instance)
where PersonComparer is:
public class PersonComparer : IEqualityComparer<Person>
{
public static readonly PersonComparer Instance = new PersonComparer();
// We don't need any more instances
private PersonComparer() {}
public int GetHashCode(Person p)
{
return p.id;
}
public bool Equals(Person p1, Person p2)
{
if (Object.ReferenceEquals(p1, p2))
{
return true;
}
if (Object.ReferenceEquals(p1, null) ||
Object.ReferenceEquals(p2, null))
{
return false;
}
return p1.id == p2.id;
}
}
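A small end-to-end sketch of the Union approach with made-up data follows. PersonRecord, PersonRecordComparer and MergeDemo are hypothetical stand-ins, and the int id and string name members are assumptions about what Person looks like.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Person, with an int id and a display name.
public class PersonRecord
{
    public PersonRecord(int id, string name) { this.id = id; this.name = name; }
    public readonly int id;
    public readonly string name;
}

public sealed class PersonRecordComparer : IEqualityComparer<PersonRecord>
{
    public bool Equals(PersonRecord p1, PersonRecord p2)
    {
        if (ReferenceEquals(p1, p2)) return true;
        if (ReferenceEquals(p1, null) || ReferenceEquals(p2, null)) return false;
        return p1.id == p2.id;
    }

    public int GetHashCode(PersonRecord p)
    {
        return ReferenceEquals(p, null) ? 0 : p.id;
    }
}

public static class MergeDemo
{
    // Union yields the elements of the first sequence, then those elements of
    // the second sequence that the comparer has not already seen.
    public static List<string> MergedNames()
    {
        var translated = new List<PersonRecord> { new PersonRecord(1, "Uno") };
        var fallback = new List<PersonRecord>
        {
            new PersonRecord(1, "One"),
            new PersonRecord(2, "Two")
        };
        return translated.Union(fallback, new PersonRecordComparer())
                         .Select(p => p.name)
                         .ToList();
    }
}
```

Because translated is the first sequence, its entries win whenever both lists contain the same id. Union also builds its set of seen elements once, so it avoids the repeated Any scan over translated for every fallback element.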
Try this.
public static IEnumerable<Person> SmartCombine(IEnumerable<Person> fallback, IEnumerable<Person> translated) {
return translated.Concat(fallback.Where(p => !translated.Any(x => x.id.Equals(p.id))));
}
Use Concat. Union does not work when the lists are of type List<dynamic>.
