Selecting DataRows into new structures using LINQ. Calling Distinct() fails - c#

Consider these two structures:
struct Task
{
public Int32 Id;
public String Name;
public List<Registration> Registrations;
}
struct Registration
{
public Int32 Id;
public Int32 TaskId;
public String Comment;
public Double Hours;
}
I am selecting a bunch of entries in a DataTable into new structures, like so:
var tasks = data.AsEnumerable().Select(t => new Task
{
Id = Convert.ToInt32(t["ProjectTaskId"]),
Name = Convert.ToString(t["ProjectTaskName"]),
Registrations = new List<Registration>()
});
But when I call Distinct() on the collection, it doesn't recognize objects with the same values (Id, Name, Registrations) as being equal.
But if I use an equality comparer; comparing the Id property on the objects, it's all fine and dandy...:
class TaskIdComparer : IEqualityComparer<Task>
{
public bool Equals(Task x, Task y)
{
return x.Id == y.Id;
}
public Int32 GetHashCode(Task t)
{
return t.Id.GetHashCode();
}
}
What am I missing here? Is Distinct() checking something other than the values of the properties?

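LINQ's Distinct method compares objects using their Equals and GetHashCode implementations.
For a struct that does not override these methods, the default ValueType.Equals compares the fields one by one, and a reference-type field such as List<Registration> is compared by reference, not by content. Since every Task gets its own new List<Registration>(), no two tasks are ever considered equal.
You need to use an IEqualityComparer<Task>. (Or override Equals and GetHashCode on the Task struct.)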

My guess is that it's the list in there. Almost certainly, the two list objects are different, even if they contain the same info.
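The behaviour described in the answers above can be reproduced in a small sketch. The names TaskItem, TaskItemIdComparer and DistinctDemo are hypothetical (TaskItem stands in for the question's Task struct, with a simplified list field), and the sketch assumes that two tasks with the same Id should count as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Task struct.
public struct TaskItem
{
    public int Id;
    public string Name;
    public List<double> Registrations;
}

// Compares tasks by Id only, like the comparer in the question.
public sealed class TaskItemIdComparer : IEqualityComparer<TaskItem>
{
    public bool Equals(TaskItem x, TaskItem y) => x.Id == y.Id;
    public int GetHashCode(TaskItem t) => t.Id.GetHashCode();
}

public static class DistinctDemo
{
    // Two tasks with identical Id and Name, each holding its own empty list.
    public static List<TaskItem> MakeTasks() => new List<TaskItem>
    {
        new TaskItem { Id = 1, Name = "Design", Registrations = new List<double>() },
        new TaskItem { Id = 1, Name = "Design", Registrations = new List<double>() },
    };

    // Default struct equality compares field by field; the two list
    // references differ, so both items are kept.
    public static int CountDefault() => MakeTasks().Distinct().Count();

    // With the Id-based comparer the duplicate is removed.
    public static int CountById() =>
        MakeTasks().Distinct(new TaskItemIdComparer()).Count();
}
```

Here DistinctDemo.CountDefault() yields 2 and DistinctDemo.CountById() yields 1, which matches what the question observed.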

Related

How do I make structural equality to work on collection properties in C#?

One of the great advantages of records is supposed to be value-based/structural equality, but how do I get that to work with collection properties?
Concrete simple example:
public record Something(string Id);
public record Sample(List<Something> something);
With the above records I would expect the following test to pass:
[Fact]
public void Test()
{
var x = new Sample(new List<Something>() {
new Something("x1")
});
var y = new Sample(new List<Something>() {
new Something("x1")
});
Assert.Equal(x, y);
}
I understand that this is because List is a reference type, but is there a collection that implements value-based comparison? Basically I would like to do a "deep" value-based comparison.
Records don't do this automatically, but you can implement the Equals method yourself:
public record Sample(List<Something> something) : IEquatable<Sample>
{
public virtual bool Equals(Sample? other) =>
other != null &&
Enumerable.SequenceEqual(something, other.something);
}
But note that GetHashCode should be overridden to be consistent with Equals. See also implement GetHashCode() for objects that contain collections
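As a minimal sketch of keeping GetHashCode consistent with a SequenceEqual-based Equals, the record below combines the element hashes in order using System.HashCode. The property name Items is an assumption made for this example, and the custom Equals deliberately skips the EqualityContract check that the compiler-generated version performs, so it is not suited to record inheritance as written.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public record Something(string Id);

public record Sample(List<Something> Items)
{
    // Element-wise equality instead of comparing the list references.
    public virtual bool Equals(Sample? other) =>
        other is not null && Items.SequenceEqual(other.Items);

    // Combine the element hashes in order, so that two samples that are
    // equal according to Equals also produce the same hash code.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (var item in Items)
            hash.Add(item);
        return hash.ToHashCode();
    }
}
```

With this in place, two Sample instances built from separate lists holding equal Something values compare equal and hash equally. Note that mutating the list after the record has been added to a HashSet or used as a dictionary key changes its hash code, which is a general hazard of hashing mutable collections.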

How to implement multiple GetHashCode methods?

I have an interface which defines a composite key:
public interface IKey : IEquatable<IKey>
{
public bool KeyPart1 { get; }
public uint KeyPart2 { get; }
int GetHashCode(); // never gets called
}
I have an object (with an ID) to which I want to add the composite key interface:
public class MyObject: IEquatable<MyObject>, IKey
{
public MyObject(int i, (bool keyPart1, uint keyPart2) key)
{
Id=i;
KeyPart1 = key.keyPart1;
KeyPart2 = key.keyPart2;
}
public int Id { get; }
public bool KeyPart1 { get; }
public uint KeyPart2 { get; }
public bool Equals(MyObject other) => this.Id == other.Id;
public override bool Equals(object other) => other is MyObject o && Equals(o);
public override int GetHashCode() => Id.GetHashCode();
bool IEquatable<IKey>.Equals(IKey other) => this.KeyPart1 == other.KeyPart1
&& this.KeyPart2 == other.KeyPart2;
int IKey.GetHashCode() => (KeyPart1, KeyPart2).GetHashCode(); // never gets called
}
However, when I have a list of these objects and try to group them using the interface, the grouping fails:
var one = new MyObject(1, (true, 1));
var two = new MyObject(2, (true, 1));
var three = new MyObject(1, (false, 0));
var items = new[] { one, two, three };
var byId = items.GroupBy(i => i);
// result: { [one, three] }, { [two] } -- as expected
var byKey = items.GroupBy<MyObject, IKey>(i => i as IKey);
// result: { [one, two, three] } // not grouped (by 'id' or 'key')
// expected: { [one, two] }, { [three] }
I'd expected that byId would have the items grouped by the Id property, and byKey would have the items grouped by the Key property.
However, byKey is not grouped at all. It appears that the override GetHashCode() method is always used rather than the explicitly implemented interface method.
Is it possible to implement something like this, where the type of the item being grouped determines the hash method to use (avoiding an EqualityComparer)?
I noticed this problem when passing the cast objects to another method expecting an IEnumerable<IKey>. I have a few different types implementing IKey and those with an existing GetHashCode() method did not work, while the others did.
Please note the objects have been simplified here and that I cannot easily change the interfaces (e.g. to use ValueTuple instead).
The GetHashCode() used in equality is either:
the one defined via object.GetHashCode(), if no equality comparer is provided
IEqualityComparer<T>.GetHashCode(T), if an equality comparer is provided
Adding your own GetHashCode() method on your own interface does nothing, and it will never be used, as it is not part of an API that the framework/library code knows about.
So, I'd forget about IKey.GetHashCode(), and either (or both):
make MyObject.GetHashCode() provide the functionality you need, or
provide a custom equality comparer separately to the MyObject instance
There are overloads of GroupBy that accept an IEqualityComparer<TKey>, for the second option.
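The second option might look like the following sketch. KeyComparer and GroupingDemo are hypothetical names, and MyObject is reduced here to the members needed for the demonstration (the constructor takes the key parts as separate arguments for brevity).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public interface IKey
{
    bool KeyPart1 { get; }
    uint KeyPart2 { get; }
}

public class MyObject : IKey
{
    public MyObject(int id, bool keyPart1, uint keyPart2)
    {
        Id = id;
        KeyPart1 = keyPart1;
        KeyPart2 = keyPart2;
    }

    public int Id { get; }
    public bool KeyPart1 { get; }
    public uint KeyPart2 { get; }

    // Object identity is still based on Id.
    public override bool Equals(object? obj) => obj is MyObject o && o.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();
}

// Hypothetical comparer: equality and hashing use only the composite key.
public sealed class KeyComparer : IEqualityComparer<IKey>
{
    public bool Equals(IKey? x, IKey? y) =>
        ReferenceEquals(x, y) ||
        (x is not null && y is not null &&
         x.KeyPart1 == y.KeyPart1 && x.KeyPart2 == y.KeyPart2);

    public int GetHashCode(IKey key) => (key.KeyPart1, key.KeyPart2).GetHashCode();
}

public static class GroupingDemo
{
    // Returns the Ids in each group, in order of first key occurrence.
    public static List<int[]> GroupIdsByKey()
    {
        var items = new[]
        {
            new MyObject(1, true, 1),
            new MyObject(2, true, 1),
            new MyObject(1, false, 0),
        };
        return items
            .GroupBy(i => (IKey)i, new KeyComparer())
            .Select(g => g.Select(o => o.Id).ToArray())
            .ToList();
    }
}
```

GroupingDemo.GroupIdsByKey() produces two groups, the first containing Ids 1 and 2 and the second containing Id 1, which is the grouping the question expected for byKey. MyObject.GetHashCode() keeps its Id-based meaning because the comparer is supplied explicitly.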

How come SortedSet gives unique output without using IEqualityComparer

At a high level, getting unique values for reference types with HashSet requires implementing IEqualityComparer, but with SortedSet, which is also a set, that does not seem to be required.
Here is an example. Let's say we have the Employee and EmployeeComparer classes below:
public class Employee
{
public string Name { get; set; }
public int Age { get; set; }
}
public class EmployeeComparer : IEqualityComparer<Employee>, IComparer<Employee>
{
public bool Equals(Employee x, Employee y)
{
return string.Equals(x.Name, y.Name);
}
public int GetHashCode(Employee obj)
{
return obj.Name.GetHashCode();
}
public int Compare(Employee x, Employee y)
{
return string.Compare(x.Name, y.Name);
}
}
If I have to use HashSet to get unique Employees based on name it works only if I have EmployeeComparer implementing IEqualityComparer but if I use SortedSet it gives me unique values even if the class EmployeeComparer does not implement IEqualityComparer and just the IComparer. What happens to the requirement of providing GetHashCode() and Equals() method for uniqueness here?
An IComparer<T> is fully sufficient for determining whether two objects are semantically equal.
If IComparer<T>.Compare() returns 0, the objects are considered equal. If it returns a nonzero value, they are considered not equal. Since SortedSet<T> has to keep its values in sorted order, it needs a comparison function, but it doesn't need an equality function on top of that.
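A short sketch of this, assuming a comparer that implements only IComparer<Employee> (EmployeeNameComparer and SortedSetDemo are hypothetical names, and ordinal string comparison is an arbitrary choice for the example):

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public class Employee
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

// Implements IComparer<Employee> only; no Equals or GetHashCode involved.
public sealed class EmployeeNameComparer : IComparer<Employee>
{
    public int Compare(Employee? x, Employee? y) =>
        string.Compare(x?.Name, y?.Name, StringComparison.Ordinal);
}

public static class SortedSetDemo
{
    public static (bool FirstAdded, bool DuplicateAdded, int Count) Run()
    {
        var set = new SortedSet<Employee>(new EmployeeNameComparer());

        bool first = set.Add(new Employee { Name = "Ann", Age = 30 });
        // Compare returns 0 against the existing "Ann", so Add rejects it.
        bool duplicate = set.Add(new Employee { Name = "Ann", Age = 45 });
        set.Add(new Employee { Name = "Bob", Age = 25 });

        return (first, duplicate, set.Count);
    }
}
```

SortedSetDemo.Run() returns (true, false, 2): the second "Ann" is treated as a duplicate purely because Compare returned 0.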

Can I use LINQ to check if objects in a list have a unique ID?

Say I have a list containing objects like this one:
public class Person
{
private string _name;
private string _id;
private int _age;
public Person()
{
}
// Accessors
}
public class ManipulatePerson
{
Person person = new Person();
List<Person> personList = new List<Person>();
// Assign values
private void PopulateList()
{
// Loop
personList.Add(person);
// Check if every Person has a unique ID
}
}
and I wanted to check that each Person had a unique ID. I would like to return a boolean true/false depending on whether or not the IDs are unique. Is this something I can achieve with LINQ?
Note that you can even use a HashSet<> directly:
var hs = new HashSet<string>();
bool areAllPeopleUnique = personList.All(x => hs.Add(x.Id));
(This is the code that I normally use.)
It has the advantage that in the best case (when there are duplicates) it stops before it has gone through the whole personList collection.
I would use Distinct and then compare the counts, for example:
bool bAreAllPeopleUnique = (personList.Select(p => p.ID).Distinct().Count() == personList.Count);
However, as @Ian commented, you will need to add a property to the Person class so that you can access the ID, like so:
public string ID
{
get { return _id; }
}
A 'nicer' way to implement this would be to add a method like so:
private bool AreAllPeopleUnique(IEnumerable<Person> people)
{
return people.Select(p => p.ID).Distinct().Count() == people.Count();
}
NOTE: The method takes in an IEnumerable not a list so that any class implementing that interface can use the method.
One of the best ways to do so is overriding Equals and GetHashCode, and implementing IEquatable<T>:
public class Person : IEquatable<Person>
{
public string Id { get; set; }
public override bool Equals(object some) => Equals(some as Person);
public override int GetHashCode() => Id != null ? Id.GetHashCode() : 0;
public bool Equals(Person person) => person != null && person.Id == Id;
}
Now you can use HashSet<T> to store unique objects and it will be impossible that you store duplicates. And, in addition, if you try to add a duplicated item, Add will return false.
NOTE: My IEquatable<T>, and Equals/GetHashCode overrides are very basic, but this sample implementation should give you a good hint on how to elegantly handle your scenario.
You can check this Q&A to get an idea of how to implement GetHashCode: What is the best algorithm for an overridden System.Object.GetHashCode?
This other Q&A might also be interesting for you: Why is it important to override GetHashCode when Equals method is overridden?
You can use GroupBy for getting unique items:
var result = personList.GroupBy(p=> p.Id)
.Select(grp => grp.First())
.ToList();
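The GroupBy approach can also answer the original true/false question. The following is a sketch only: UniqueIdCheck and its two methods are hypothetical helpers, and Person is reduced to public Id and Name properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public static class UniqueIdCheck
{
    // True when every Id occurs exactly once.
    public static bool AllIdsUnique(IEnumerable<Person> people) =>
        people.GroupBy(p => p.Id).All(g => g.Count() == 1);

    // The Ids that occur more than once, useful for error messages.
    public static List<string> DuplicateIds(IEnumerable<Person> people) =>
        people.GroupBy(p => p.Id)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();
}
```

Unlike the HashSet variant, GroupBy has to read the whole sequence before the first group is available, so it cannot stop at the first duplicate. In exchange, it can report which Ids are duplicated.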

Sorting C# List based on its element

I have the C# class as follows :
public class ClassInfo {
public string ClassName;
public int BlocksCovered;
public int BlocksNotCovered;
public ClassInfo() {}
public ClassInfo(string ClassName, int BlocksCovered, int BlocksNotCovered)
{
this.ClassName = ClassName;
this.BlocksCovered = BlocksCovered;
this.BlocksNotCovered = BlocksNotCovered;
}
}
And I have C# List of ClassInfo() as follows
List<ClassInfo> ClassInfoList;
How can I sort ClassInfoList based on BlocksCovered?
myList.Sort((x, y) => x.BlocksCovered.CompareTo(y.BlocksCovered));
This returns a List<ClassInfo> ordered by BlocksCovered:
var results = ClassInfoList.OrderBy( x=>x.BlocksCovered).ToList();
Note that you should really make BlocksCovered a property, right now you have public fields.
If you have a reference to the List<T> object, use the Sort() method provided by List<T> as follows.
ClassInfoList.Sort((x, y) => x.BlocksCovered.CompareTo(y.BlocksCovered));
If you use the OrderBy() LINQ extension method, your list is treated as an IEnumerable<T>: its elements are copied into a buffer, sorted, and returned as a new sequence, which then has to be converted to a List<T> again. The original list is left unchanged.
I'd use Linq, for example:
var sorted = ClassInfoList.OrderBy(c => c.BlocksCovered);
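The difference between the two approaches in the answers above can be seen in a small sketch. SortDemo and its methods are hypothetical helpers, and ClassInfo is reduced to two properties for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ClassInfo
{
    public string ClassName { get; set; } = "";
    public int BlocksCovered { get; set; }
}

public static class SortDemo
{
    public static List<ClassInfo> MakeList() => new List<ClassInfo>
    {
        new ClassInfo { ClassName = "A", BlocksCovered = 30 },
        new ClassInfo { ClassName = "B", BlocksCovered = 10 },
        new ClassInfo { ClassName = "C", BlocksCovered = 20 },
    };

    // OrderBy leaves the source untouched and produces a new sorted list.
    public static List<ClassInfo> SortedCopy(List<ClassInfo> source) =>
        source.OrderBy(c => c.BlocksCovered).ToList();

    // List<T>.Sort reorders the existing list in place.
    public static void SortInPlace(List<ClassInfo> list) =>
        list.Sort((x, y) => x.BlocksCovered.CompareTo(y.BlocksCovered));
}
```

After SortedCopy the original list still starts with "A", while the copy is ordered B, C, A; after SortInPlace the original list itself is ordered B, C, A. One further difference worth knowing: OrderBy is a stable sort, whereas List<T>.Sort is not, so elements with equal BlocksCovered may change their relative order with Sort.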
