LINQ to Objects Group By multiple properties comparer - c#

In LINQ to Objects (enumerables), how does the default comparer resolve the following?
//The following is essentially a select distinct
var x = from student in @class // "class" is a C# keyword, so the identifier needs the @ prefix
group student by new {student.MajorId, student.GradeId} into performanceStudentGroup
select new { performanceStudentGroup.Key.MajorId, performanceStudentGroup.Key.GradeId};
Obviously in order for the above to work, the framework has to compare 2 anonymous types to check if they belong in the same group.
How does it do that? How is it able to check for something more than the reference pointers?
How is that comparer different than the following code:
var y = (from student in @class // "class" is a C# keyword, so the identifier needs the @ prefix
select new { student.MajorId, student.GradeId}).Distinct();

Following the MSDN documentation on Anonymous Types:
Because the Equals and GetHashCode methods on anonymous types are defined in terms of the Equals and GetHashcode methods of the properties, two instances of the same anonymous type are equal only if all their properties are equal.
The anonymous type that gets created has Equals() and GetHashCode() implementations based on its properties. If the property values are the same, these two methods return the same results. I imagine there's some collection behind the scenes that holds the different instances of the anonymous type.

When the compiler generates an anonymous type, it also generates Equals() and GetHashCode() methods to compare it by value. You can see this in a decompiler.
The default EqualityComparer doesn't know anything about anonymous types; it simply calls these methods.
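A minimal sketch of the behaviour described above. The class name AnonymousKeyDemo and the sample data are made up for illustration; only the GroupBy, Distinct and EqualityComparer<T>.Default calls come from the framework. It suggests that GroupBy and Distinct end up relying on the same compiler-generated Equals() and GetHashCode().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; names and data are illustrative only.
public static class AnonymousKeyDemo
{
    // Generic helper so EqualityComparer<T>.Default can be named for an anonymous T.
    private static bool AreEqual<T>(T x, T y) => EqualityComparer<T>.Default.Equals(x, y);

    // Two separately allocated anonymous objects with the same property values.
    public static bool DefaultComparerSeesThemAsEqual()
    {
        var a = new { MajorId = 1, GradeId = 2 };
        var b = new { MajorId = 1, GradeId = 2 };
        return AreEqual(a, b)                       // value comparison via generated Equals()
            && a.GetHashCode() == b.GetHashCode()   // generated GetHashCode() agrees
            && !ReferenceEquals(a, b);              // yet they are distinct objects
    }

    private static readonly (int MajorId, int GradeId)[] Students =
    {
        (1, 2), (1, 2), (3, 2),
    };

    // Number of groups produced by an anonymous composite key.
    public static int CountGroups() =>
        Students.GroupBy(s => new { s.MajorId, s.GradeId }).Count();

    // Number of distinct anonymous projections; uses the same default comparer.
    public static int CountDistinct() =>
        Students.Select(s => new { s.MajorId, s.GradeId }).Distinct().Count();
}
```

With the three sample rows above, both CountGroups() and CountDistinct() return 2.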

Related

LINQ Group by different output

I am having a hard time grouping a dbset (EntityFramework) by two fields and sending that output to a strongly typed view.
When I use an anonymous type for the composite key I get the right output. A list containing one item and that item in turn has two or more grouping items.
Now if I use a class instead I get a list of two items and in turn each item has one grouping item.
var output = context.Transfers.GroupBy(t=> new { t.TNumber, t.Type}).ToList();
var output2 = context.Transfers.AsEnumerable()
.GroupBy(t => new OTSpecs(t.TNumber, t.Type)).ToList();
OTSpecs is just a simple class, with those public fields and a parameter constructor.
I need to add the AsEnumerable(), otherwise I get a System.NotSupportedException: Only parameterless constructors and initializers are supported in LINQ to Entities.
Also because I need to define the model in the view like this
@model IEnumerable<IGrouping<OTSpecs, Transfer>>
unless of course it is possible to replace OTSpecs in that line with the anonymous type. But I don't know how.
My question is: why do those lines of code produce different output?
Is it possible to define the model in the view, replacing OTSpecs with an anonymous type?
Anonymous types implement an equality comparison that compares all their properties. So when you use an anonymous type as a key, LINQ is able to identify that two key objects are the same and should be grouped together.
Your custom object, I suspect, does not implement any of that, so the general object comparison is used, which just compares references. Two key objects have different references, and thus end up in different groups.
To fix this, either pass an equality comparer to GroupBy or override Equals and GetHashCode in your OTSpecs class.
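A sketch of the second option. The question does not show OTSpecs, so the field types (int and string) and the demo data below are assumptions; the point is only that once Equals and GetHashCode compare values, the class key groups the same way the anonymous key does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of OTSpecs; the field types are assumed, not taken from the question.
public sealed class OTSpecs : IEquatable<OTSpecs>
{
    public readonly int TNumber;
    public readonly string Type;

    public OTSpecs(int tNumber, string type)
    {
        TNumber = tNumber;
        Type = type;
    }

    // Value equality: both fields must match.
    public bool Equals(OTSpecs other) =>
        other != null && TNumber == other.TNumber && Type == other.Type;

    public override bool Equals(object obj) => Equals(obj as OTSpecs);

    // Must be consistent with Equals: equal objects produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (TNumber * 397) ^ (Type == null ? 0 : Type.GetHashCode());
        }
    }
}

public static class OTSpecsDemo
{
    // Two transfers with the same TNumber and Type should land in one group.
    public static int CountGroups()
    {
        var transfers = new[]
        {
            new { TNumber = 7, Type = "A" },
            new { TNumber = 7, Type = "A" },
        };
        return transfers.GroupBy(t => new OTSpecs(t.TNumber, t.Type)).Count();
    }
}
```

With the overrides in place CountGroups() returns 1; without them each OTSpecs instance would form its own group.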

Build linq groupby expression dynamically with nested property from string

I use NHibernate mapping by code, and I want to build this expression dynamically (with a nested object).
I have a class Event that has a many-to-one relation with EventState, and I want to group by the Code column of the EventState table.
var grouping = query.GroupBy(x => x.EventState.Code)
It works for me with a simple property, here is my code:
var arg = Expression.Parameter(type, categoryColumnName);
var bodyy = Expression.Convert(Expression.Property(arg, categoryColumnName), typeof (object));
var lambdaGroupBy = Expression.Lambda<Func<Operation, object>>(bodyy, arg);
var keySelector = lambdaGroupBy.Compile();
var grouping = query.GroupBy(keySelector);
return grouping.Select(a => new PieChartObject { Category = a.Key.ToString(), Value = a.Count().ToString() }).ToList();
But I can't do it with a nested object.
GroupBy partitions your query by whatever you provide as the key selector. To determine whether two items have the same key, it uses the default comparer of the key type. For object, this calls the Equals and GetHashCode methods, which for strings means that the contents of the strings are compared. If you use a class, reference identity is used by default, so I think GroupBy isn't doing anything in your case because the keys you provided are not identical, even though they may have the same values.
So there are two valid solutions: you can either override Equals and GetHashCode in your nested object class, or you can provide a custom key comparer to GroupBy if you want this behavior only for this particular query. But I guess, as you want to be generic, implementing Equals and GetHashCode would be the better option. The only exception is of course when you cannot do this, e.g. because it is a compiler-generated class. In that case, there is little you can do about it.
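For the "nested property from a string" part of the question, one possible approach is to walk a dotted path such as "EventState.Code" and chain Expression.Property calls. This is a sketch only: the Operation and EventState classes below are hypothetical stand-ins for the mapped entities, and NestedKey.Build is a made-up helper name.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the mapped entities in the question.
public class EventState { public string Code { get; set; } }
public class Operation { public EventState EventState { get; set; } }

public static class NestedKey
{
    // Builds x => (object)x.A.B.C from a dotted path such as "EventState.Code".
    public static Expression<Func<T, object>> Build<T>(string path)
    {
        ParameterExpression arg = Expression.Parameter(typeof(T), "x");
        // Each path segment wraps the previous expression in a property access.
        Expression body = path.Split('.')
            .Aggregate((Expression)arg, (current, name) => Expression.Property(current, name));
        return Expression.Lambda<Func<T, object>>(Expression.Convert(body, typeof(object)), arg);
    }
}

public static class NestedKeyDemo
{
    // Groups in-memory operations by the nested Code value and reports "key:count".
    public static string[] GroupKeys()
    {
        var operations = new[]
        {
            new Operation { EventState = new EventState { Code = "OPEN" } },
            new Operation { EventState = new EventState { Code = "OPEN" } },
            new Operation { EventState = new EventState { Code = "CLOSED" } },
        };
        Func<Operation, object> keySelector = NestedKey.Build<Operation>("EventState.Code").Compile();
        return operations.GroupBy(keySelector)
                         .Select(g => g.Key + ":" + g.Count())
                         .ToArray();
    }
}
```

Because the final key here is a string, the default comparer groups by content even though the key is typed as object. As in the question's own code, calling Compile() means the grouping runs in memory rather than being translated by the query provider.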

Why does the Equals implementation for anonymous types compare fields?

I'm just wondering why designers of the language decided to implement Equals on anonymous types similarly to Equals on value types. Isn't it misleading?
public class Person
{
public string Name { get; set; }
public int Age { get; set; }
}
public static void ProofThatAnonymousTypesEqualsComparesBackingFields()
{
var personOne = new { Name = "Paweł", Age = 18 };
var personTwo = new { Name = "Paweł", Age = 18 };
Console.WriteLine(personOne == personTwo); // false
Console.WriteLine(personOne.Equals(personTwo)); // true
Console.WriteLine(Object.ReferenceEquals(personOne, personTwo)); // false
var personaOne = new Person { Name = "Paweł", Age = 11 };
var personaTwo = new Person { Name = "Paweł", Age = 11 };
Console.WriteLine(personaOne == personaTwo); // false
Console.WriteLine(personaOne.Equals(personaTwo)); // false
Console.WriteLine(Object.ReferenceEquals(personaOne, personaTwo)); // false
}
At first glance, all printed boolean values should be false. But the lines with Equals calls return different values depending on whether the Person type or an anonymous type is used.
Anonymous type instances are immutable data values without behavior or identity. It doesn't make much sense to reference-compare them. In that context I think it is entirely reasonable to generate structural equality comparisons for them.
If you want to switch the comparison behavior to something custom (reference comparison or case-insensitivity), you can use ReSharper to convert the anonymous type to a named class. ReSharper can also generate equality members.
There is also a very practical reason to do this: Anonymous types are convenient to use as hash keys in LINQ joins and groupings. For that reason they require semantically correct Equals and GetHashCode implementations.
For the why part you should ask the language designers...
But I found this in Eric Lippert’s article about Anonymous Types Unify Within An Assembly, Part Two
An anonymous type gives you a convenient place to store a small
immutable set of name/value pairs, but it gives you more than that. It
also gives you an implementation of Equals, GetHashCode and, most
germane to this discussion, ToString. (*)
Where the why part comes in the note:
(*) We give you Equals and GetHashCode so that you can use instances
of anonymous types in LINQ queries as keys upon which to perform
joins. LINQ to Objects implements joins using a hash table for
performance reasons, and therefore we need correct implementations of
Equals and GetHashCode.
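The join scenario from the note can be sketched like this. The data and the JoinKeyDemo name are invented for illustration; the sketch relies on both key selectors producing the same anonymous type (same property names, types and order), so their generated Equals and GetHashCode can match the keys.

```csharp
using System;
using System.Linq;

// Hypothetical demo; the people/cities data is illustrative only.
public static class JoinKeyDemo
{
    public static string[] JoinOnCompositeKey()
    {
        var people = new[]
        {
            new { First = "Ada", Last = "Lovelace", Age = 36 },
            new { First = "Alan", Last = "Turing", Age = 41 },
        };
        var cities = new[]
        {
            new { First = "Ada", Last = "Lovelace", City = "London" },
        };

        // Both selectors create the same anonymous key type { First, Last },
        // which Join compares by value through a hash-based lookup.
        return people.Join(
                cities,
                p => new { p.First, p.Last },
                c => new { c.First, c.Last },
                (p, c) => p.First + " " + p.Last + " -> " + c.City)
            .ToArray();
    }
}
```

Only the Ada Lovelace row has a matching key, so the result contains the single string "Ada Lovelace -> London".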
The official answer from the C# Language Specification (obtainable here):
The Equals and GetHashcode methods on anonymous types override the methods inherited from object, and are defined in terms of the Equals and GetHashcode of the properties, so that two instances of the same anonymous type are equal if and only if all their properties are equal.
(My emphasis)
The other answers explain why this is done.
It's worth noting that in VB.Net the implementation is different:
An instance of an anonymous types that has no key properties is equal only to itself.
The key properties must be indicated explicitly when creating an anonymous type object. The default is: no key, which can be very confusing for C# users!
These objects aren't equal in VB, but would be in C#-equivalent code:
Dim prod1 = New With {.Name = "paperclips", .Price = 1.29}
Dim prod2 = New With {.Name = "paperclips", .Price = 1.29}
These objects evaluate to "equal":
Dim prod3 = New With {Key .Name = "paperclips", .Price = 1.29}
Dim prod4 = New With {Key .Name = "paperclips", .Price = 2.00}
Because it gives us something that's useful. Consider the following:
var countSameName = from p in PersonInfoStore
group p.Id by new {p.FirstName, p.SecondName} into grp
select new { grp.Key.FirstName, grp.Key.SecondName, Count = grp.Count() };
This works because the implementation of Equals() and GetHashCode() for anonymous types works on the basis of field-by-field equality.
This means the above will be closer to the same query when run against a PersonInfoStore that isn't LINQ to Objects. (Still not the same: it'll match what an XML source will do, but not what most databases' collations would result in.)
It means we don't have to define an IEqualityComparer for every call to GroupBy which would make group by really hard with anonymous objects - it's possible but not easy to define an IEqualityComparer for anonymous objects - and far from the most natural meaning.
Above all, it doesn't cause problems with most cases.
The third point is worth examining.
When we define a value type, we naturally want a value-based concept of equality. While we may have a different idea of that value-based equality than the default, such as matching a given field case-insensitively, the default is naturally sensible (if poor in performance and buggy in one case*). (Also, reference equality is meaningless in this case).
When we define a reference type, we may or may not want a value-based concept of equality. The default gives us reference equality, but we can easily change that. If we do change it, we can change it for just Equals and GetHashCode or for them and also ==.
When we define an anonymous type, oh wait, we didn't define it, that's what anonymous means! Most of the scenarios in which we care about reference equality aren't there any more. If we're going to be holding an object around for long enough to later wonder if it's the same as another one, we're probably not dealing with an anonymous object. The cases where we care about value-based equality come up a lot. Very often with Linq (GroupBy as we saw above, but also Distinct, Union, GroupJoin, Intersect, SequenceEqual, ToDictionary and ToLookup) and often with other uses (it's not like we weren't doing the things Linq does for us with enumerables in 2.0 and to some extent before then, anyone coding in 2.0 would have written half the methods in Enumerable themselves).
In all, we gain a lot from the way equality works with anonymous classes.
In the off-chance that someone really wants reference equality, == using reference equality means they still have that, so we don't lose anything. It's the way to go.
*The default implementation of Equals() and GetHashCode() for value types has an optimisation that lets it use a binary match in cases where it's safe to do so. Unfortunately there's a bug that makes it sometimes misidentify some cases as safe for this faster approach when they aren't (or at least it used to; maybe it was fixed). A common case is a struct with a decimal field: some instances with equivalent fields will then be considered unequal.

IStructuralEquatable vs Equals?

According to MSDN:
IStructuralEquatable
Defines methods to support the comparison of objects for structural
equality. Structural equality means that two objects are equal because
they have equal values. It differs from reference equality, which
indicates that two object references are equal because they reference
the same physical object.
Isn't that what Equals should do (when implementing IEquatable)?
The reason you need IStructuralEquatable is to define a new way of comparison that applies to all the objects.
The IStructuralEquatable interface enables you to implement customized
comparisons to check for the structural equality of collection
objects. That is, you can create your own definition of structural
equality and specify that this definition be used with a collection
type that accepts the IStructuralEquatable interface.
For example, say you want a list that compares all its elements by a specific definition.
In this case you don't want to change your class implementation, so you don't want to override the Equals method.
This will define a general way to compare objects in your application.
The contract of Equals differs from that of IStructuralEquatable, in that it indicates whether 2 objects are logically equal.
By default, Equals on a reference type indicates whether two object references reference the same object instance. However, you are able to override Equals according to the logic of your application.
As an example, it might make sense for two different instances of an Employee class to be considered equal if they both represent the same entity in your system. To achieve this, employee objects with matching SSN properties would be treated as logically equal, even if they were not structurally equal.
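To make the difference concrete, here is a small sketch using arrays, which implement IStructuralEquatable explicitly. The StructuralDemo class is hypothetical; StructuralComparisons.StructuralEqualityComparer and StringComparer.OrdinalIgnoreCase are framework types. The caller supplies the element comparer, so the definition of "equal" can change without touching the element types.

```csharp
using System;
using System.Collections;

// Hypothetical demo class; the arrays are illustrative only.
public static class StructuralDemo
{
    // Plain Equals on arrays is inherited from Object: reference equality.
    public static bool PlainEquals()
    {
        int[] a = { 1, 2, 3 };
        int[] b = { 1, 2, 3 };
        return a.Equals(b);
    }

    // Structural equality: same length and pairwise-equal elements.
    public static bool StructuralEquals()
    {
        int[] a = { 1, 2, 3 };
        int[] b = { 1, 2, 3 };
        return ((IStructuralEquatable)a).Equals(b, StructuralComparisons.StructuralEqualityComparer);
    }

    // The element comparer is pluggable, e.g. case-insensitive string matching.
    public static bool CaseInsensitiveEquals()
    {
        string[] a = { "x", "Y" };
        string[] b = { "X", "y" };
        return ((IStructuralEquatable)a).Equals(b, StringComparer.OrdinalIgnoreCase);
    }
}
```

PlainEquals() returns false, while StructuralEquals() and CaseInsensitiveEquals() both return true.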

Why are two empty Lists not Equal?

I thought calling Equals() on two empty Lists would return true, but that's not the case. Could someone explain why?
var lst = new List<Whatever>();
var lst2 = new List<Whatever>();
if(!lst.Equals(lst2))
throw new Exception("seriously?"); // always thrown
Because Equals is checking for references - lst and lst2 are different objects. (note that Equals is inherited from Object and not implemented in List<T>)
You're looking for LINQ's SequenceEqual.
Even when using SequenceEqual, don't expect it to work with your Whatever class on non-empty lists (unless it is a struct). You may want to implement a comparer, and use the right overload.
Equals here is comparing reference of two lists which would be different because they are separate lists and that's why it will always be false in this case.
Object documentation (MSDN documentation):
The default implementation of Equals supports reference equality for reference types, and bitwise equality for value types. Reference equality means the object references that are compared refer to the same object. Bitwise equality means the objects that are compared have the same binary representation.
List documentation (MSDN documentation):
Determines whether the specified Object is equal to the current Object. (Inherited from Object.)
You have two different objects (two times new ...), so they're not the same.
Because it compares on object identity, not the contents of the list. They are two separate objects.
See this answer from the C# FAQ.
The Equals implementation of List<T> is the inherited one from Object:
The default implementation of Equals supports reference equality for reference types
In other words, since these are two different lists, they have different references, so Equals returns false.
List<T>.Equals() will compare the references of the two lists and return true if they are equal. If you want to compare the elements of two lists, use the LINQ extension method SequenceEqual() (Enumerable.SequenceEqual).
When you compare two lists with each other, the Equals method will NOT compare the items that are in the lists. It will just compare one List object with the other List object. These each have their own 'identity'.
They are two different lists allocated somewhere in memory (with the new keyword). Therefore they cannot be equal. If you want such functionality, you should build your own class inheriting from List and overriding the Equals method.
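A short sketch of the options mentioned in these answers. The Whatever class with an Id property and the WhateverByIdComparer are assumptions made for illustration; Enumerable.SequenceEqual and its comparer overload are the framework methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the original Whatever class is not shown.
public class Whatever { public int Id { get; set; } }

// Compares Whatever instances by Id instead of by reference.
public sealed class WhateverByIdComparer : IEqualityComparer<Whatever>
{
    public bool Equals(Whatever x, Whatever y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Id == y.Id);

    public int GetHashCode(Whatever obj) => obj == null ? 0 : obj.Id;
}

public static class ListEqualityDemo
{
    // Inherited Object.Equals: two separate lists are never equal.
    public static bool EmptyListsEquals() =>
        new List<Whatever>().Equals(new List<Whatever>());

    // SequenceEqual compares element by element; two empty sequences match.
    public static bool EmptyListsSequenceEqual() =>
        new List<Whatever>().SequenceEqual(new List<Whatever>());

    // Without a comparer, the elements themselves are compared by reference.
    public static bool NonEmptyDefault() =>
        new List<Whatever> { new Whatever { Id = 1 } }
            .SequenceEqual(new List<Whatever> { new Whatever { Id = 1 } });

    // With the comparer overload, elements are compared by Id.
    public static bool NonEmptyWithComparer() =>
        new List<Whatever> { new Whatever { Id = 1 } }
            .SequenceEqual(new List<Whatever> { new Whatever { Id = 1 } }, new WhateverByIdComparer());
}
```

Here EmptyListsEquals() and NonEmptyDefault() return false, while EmptyListsSequenceEqual() and NonEmptyWithComparer() return true.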
In C# and .Net you have reference types and value types.
Value types represent, well, values. integer, double, DateTime and so on.
When you compare value types you compare their actual value, so:
int a = 10;
int b = 10;
if( a == b )
{
// this will fire
}
Note that each variable refers to a new copy, so:
int c = a;
c = c+5;
if( a == c )
{
// this won't, because a==10 and c==15
}
Reference types are objects that you pass around and do things with. You can have more than one variable referring to the same object, so:
var a = new List<Whatever>();
var b = new List<Whatever>();
if( a == b )
{
// this won't fire, a and b are separate objects
}
var c = a;
c.Add(new Whatever());
if( a == c )
{
// this will, a and c are the same object.
var first = a[0]; // holds the value added to c
}
Finally some special cases of reference types behave like value types, for instance string.
As far as I can see from the documentation, .Equals on List is the method inherited from Object, which means it checks whether the lists are the same object. Since you have made two objects, they will not be the same.
Two different things can't be the same, even if these things contain the same items (or are both empty).
You don't need to be good at programming to understand this ;) Let's say you have a this and a that; it's not important what's inside this and that. It's just important that a this is not a that, and a that is not a this. That's what you check there with Equals.
