Using Linq Except not Working as I Thought - c#

List1 contains items { A, B } and List2 contains items { A, B, C }.
What I need is to be returned { C } when I use Except Linq extension. Instead I get returned { A, B } and if I flip the lists around in my expression the result is { A, B, C }.
Am I misunderstanding the point of Except? Is there another extension I am not seeing to use?
I have looked through and tried a number of different posts on this matter with no success thus far.
var except = List1.Except(List2); //This is the line I have thus far
EDIT: Yes I was comparing simple objects. I have never used IEqualityComparer, it was interesting to learn about.
Thanks all for the help. The problem was not implementing the comparer. The linked blog post and example below where helpful.

If you are storing reference types in your list, you have to make sure there is a way to compare the objects for equality. Otherwise they will be checked by comparing if they refer to same address.
You can implement IEqualityComparer<T> and send it as a parameter to Except() function. Here's a blog post you may find helpful.
edit: the original blog post link was broken and has been replaced above

So just for completeness...
// Except gives you the items in the first set but not the second
var InList1ButNotList2 = List1.Except(List2);
var InList2ButNotList1 = List2.Except(List1);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
Edit: Since your lists contain objects you need to pass in an IEqualityComparer for your class... Here is what your except will look like with a sample IEqualityComparer based on made up objects... :)
// Except gives you the items in the first set but not the second
var equalityComparer = new MyClassEqualityComparer();
var InList1ButNotList2 = List1.Except(List2, equalityComparer);
var InList2ButNotList1 = List2.Except(List1, equalityComparer);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
public class MyClass
{
public int i;
public int j;
}
class MyClassEqualityComparer : IEqualityComparer<MyClass>
{
public bool Equals(MyClass x, MyClass y)
{
return x.i == y.i &&
x.j == y.j;
}
public int GetHashCode(MyClass obj)
{
unchecked
{
if (obj == null)
return 0;
int hashCode = obj.i.GetHashCode();
hashCode = (hashCode * 397) ^ obj.i.GetHashCode();
return hashCode;
}
}
}

You simply confused the order of arguments. I can see where this confusion arose, because the official documentation isn't as helpful as it could be:
Produces the set difference of two sequences by using the default equality comparer to compare values.
Unless you're versed in set theory, it may not be clear what a set difference actually is—it's not simply what's different between the sets. In reality, Except returns the list of elements in the first set that are not in the second set.
Try this:
var except = List2.Except(List1); // { C }

Writing a custom comparer does seem to solve the problem, but I think https://stackoverflow.com/a/12988312/10042740 is a much more simple and elegant solution.
It overwrites the GetHashCode() and Equals() methods in your object defining class, then the default comparer does its magic without extra code cluttering up the place.

Just for Ref:
I wanted to compare USB Drives connected and available to the system.
So this is the class which implements interface IEqualityComparer
public class DriveInfoEqualityComparer : IEqualityComparer<DriveInfo>
{
public bool Equals(DriveInfo x, DriveInfo y)
{
if (object.ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
// compare with Drive Level
return x.VolumeLabel.Equals(y.VolumeLabel);
}
public int GetHashCode(DriveInfo obj)
{
return obj.VolumeLabel.GetHashCode();
}
}
and you can use it like this
var newDeviceLst = DriveInfo.GetDrives()
.ToList()
.Except(inMemoryDrives, new DriveInfoEqualityComparer())
.ToList();

Related

Linq Except ignoring Custom comparer?

In this toy code:
void Main()
{
var x = new string[] {"abc", "DEF"};
var y = new string[] {"ABC", "def"};
var c = new CompareCI();
var z = x.Except(y, c);
foreach (var s in z) Console.WriteLine(s);
}
private class CompareCI : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
It seems like the Except method is ignoring my customer comparer. I get these results:
abc
DEF
Which looks like the case is not being ignored. Also, when I ran it under debug and put a breakpoint at the call to string.Equals in the Customer Comparer, the breakpoint never hit, although the code ran and I got the result I posted. i expected no results, since the sequences are equal if case is ignored.
Guess I'm doing something wrong, but I need a second pair of eyes to spot it.
Debugging your code shows that GetHashCode() is called but not Equals().
I think this is because two equal object must have equal hashcodes AND return true from Equals(). If the hashcodes are different then they cannot be equal, so there is no need to run the Equals() function.
Your code would work if the hashing function was case-insensitive, obj.ToUpper().GetHashCode().
Rui Jarimba's suggestion to use StringComparer.OriginalIgnoreCase works.
Modify your comparer:
public int GetHashCode(string obj)
{
return 0;
}
Now all items will have the same hash: 0 - which means item x and y may be the same so calling Equals is required.
However, this is not recommended because returning just 0 in GetHashCode will cause performance problems.
Best option is to use built-in StringComparer.OrdinalIgnoreCase equality comparer.
.NET Framework already provides a StringComparer Class, that uses specific case and culture-based or ordinal comparison rules - so in this case there is no need to create a custom comparer.
This will work:
var x = new string[] { "abc", "DEF" };
var y = new string[] { "ABC", "def" };
var z = x.Except(y, StringComparer.OrdinalIgnoreCase);

C# Comparing lists using Except

I've tried searching for the answer on the many other questions but they either don't seem to be relevant or I'm just not knowledgeable enough to know that they are.
My problems is comparing two lists (one of twitter followers the of friends(or who you follow))
This is the code I'm using to gather the lists
var friends = user.GetFriends(500).ToList();
var following = user.GetFollowers(500).ToList();
var result = compareFollowingtoFollowers(friends, following);
foreach(var res in result)
{
lstFollowerChecker.Items.Add(res.ScreenName);
}
And this is my compareFollowingtoFollowers function
private List<T> compareFollowingtoFollowers<T>(List<T> friends, List<T> followers)
{
var results = followers.Except(friends).ToList();
return results;
}
My Problem is it doesn't return what I expect, for example if I ran this against my own account where I say 100 friends and I'm following 112 people, It should return the 12 people that are not following but instead it just seems to return them all.
Am I using the correct function? The other questions I've read lead me to believe so.
Thank you for reading
Bryan
UPDATE
The answers given have been enough to get the cogs in my head ticking again, while the answers are still slightly over my head, I think they were just what I needed to better understand why it was returning what it did, Thank you all.
You need to have a custom comparer to compare your objects and pass that in the .Except method as a second parameter. For example
public class User
{
public int Id { get; set; }
}
public class UserComparer : IEqualityComparer<User>
{
public bool Equals(User x, User y)
{
return x.Id == y.Id;
}
public int GetHashCode(User obj)
{
return obj.Id.GetHashCode();
}
}
You are using the correct function, but your user class does not implement IEquatable<UserClass>. This means that Except uses reference semantics to compare two users, and since they seem to be different objects (even if "equal") they compare unequal. So it thinks that there is no overlap at all between the two lists.
The solution is to properly implement IEquatable<T> to give the class your desired equality semantics (how to do this exactly depends on the properties of that class).
You can do something like:
var result = myFriends.Select(x => yourFriends.All(y => y != x));
It is likely you need to specify how to make comparisons of your types.
According to MSDN:
http://msdn.microsoft.com/en-us/library/bb300779(v=vs.110).aspx
Custom types need to implement equals and gethashcode for comparison to be made.

C# group by to eliminate duplicates

Im working with a large dataset of point of interest (POI) which all have Lat/Long values.
I want to filter out POIs that are in close proximity to each other. I think that to achieve this I can round the Lat/Long down to X decimal places and group by the results (or call Distinct() or whatever)...
I have written a little LINQ statement which doesnt seem to do what I want,
var l1 = (from p in PointsOfInterest where p.IsVisibleOnMap select p).Distinct(new EqualityComparer()).ToList();
where EqualityComparer is
public class EqualityComparer : IEqualityComparer<PointOfInterest>
{
public bool Equals(PointOfInterest x, PointOfInterest y)
{
return Math.Round(x.Latitude.Value, 4) == Math.Round(y.Latitude.Value, 4) &&
Math.Round(x.Longitude.Value, 4) == Math.Round(y.Latitude.Value, 4);
}
public int GetHashCode(PointOfInterest obj)
{
return obj.GetHashCode();
}
}
but the Equals method never seems to get called?!?
Any thoughts on the best way to do this?
Equals() never gets called because GetHashCode() returns different values for any two objects because you are using GetHashCode() defined in System.Object class .
You'll need to implement GetHashCode() a little differently.
try something like
public int GetHashCode(PointOfInterest obj)
{
return obj.Longitude.Value.GetHashCode() ^ obj.Latitude.Value.GetHashCode();
}
This is the problem:
public int GetHashCode(PointOfInterest obj)
{
return obj.GetHashCode();
}
You'll have to override GetHashCode() appropriately as well, probably currently all items are considered different because the hash code doesn't match.
From MSDN for IEqualityComparer.GetHashCode():
Implement this method to provide
customized hash codes for
objects,corresponding to the
customized equality comparison
provided by the Equals method.
Any thoughts on the best way to do this?
Simply:
IEnumerable<PointOfInterest> result =
from p in PointsOfInterest
where p.IsVisibleOnMap
group p by new
{
Latitude = Math.Round(p.Latitude.Value, 4),
Longitude = Math.Round(p.Longitude.Value, 4)
} into g
let winner = g.First()
select winner;

Issue with custom EqualityComparer

I have a class with fields ColDescriptionOne(string), ColDescriptionTwo(string) and ColCodelist(int). I want to get the Intersect of two lists of this class where the desc are equal but the codelist is different.
I can use the Where clause and get what I need. However I can't seem to make it work using a custom Comparer like this:
internal class CodeListComparer: EqualityComparer<SheetRow>
{
public override bool Equals(SheetRow x, SheetRow y)
{
return Equals(x.ColDescriptionOne, y.ColDescriptionOne) &&
Equals(x.ColDescriptionSecond, y.ColDescriptionOne)
&& !Equals(x.ColCodelist, y.ColCodelist);
}
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397) + (obj.ColDescriptionSecond.GetHashCode()*397)
+ obj.ColCodelist.GetHashCode());
}
}
And then use it like this:
var onylByCodeList = firstSheet.Entries.Intersect(otherSheet.Entries, new CodeListComparer());
Any ideas what I'm doing wrong here ?
thanks
Sunit
You have a typo in the Equals method. The second line is comparing ColDescriptionOne to ColDescriptionSecond. They should both be ColDescriptionSecond.
return Equals(x.ColDescriptionOne, y.ColDescriptionOne)
&& Equals(x.ColDescriptionSecond, y.ColDescriptionSecond)
&& !Equals(x.ColCodelist, y.ColCodelist);
The second problem you have is you are including the ColCodeList in the GetHashCode method. The GetHashCode method must return the same value for objects that are equal. In this case though ColCodeList is supposed to be different when values are equal. This means that in cases where you want 2 objects to be considered equal they are more likely to have different hash codes which is incorrect.
Take that out of the GetHashCode method and everything should work.
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397)
+ (obj.ColDescriptionSecond.GetHashCode()*397));
}

How does Assert.AreEqual determine equality between two generic IEnumerables?

I have a unit test to check whether a method returns the correct IEnumerable. The method builds the enumerable using yield return. The class that it is an enumerable of is below:
enum TokenType
{
NUMBER,
COMMAND,
ARITHMETIC,
}
internal class Token
{
public TokenType type { get; set; }
public string text { get; set; }
public static bool operator == (Token lh, Token rh) { return (lh.type == rh.type) && (lh.text == rh.text); }
public static bool operator != (Token lh, Token rh) { return !(lh == rh); }
public override int GetHashCode()
{
return text.GetHashCode() % type.GetHashCode();
}
public override bool Equals(object obj)
{
return this == (Token)obj;
}
}
This is the relevant part of the method:
foreach (var lookup in REGEX_MAPPING)
{
if (lookup.re.IsMatch(s))
{
yield return new Token { type = lookup.type, text = s };
break;
}
}
If I store the result of this method in actual, make another enumerable expected, and compare them like this...
Assert.AreEqual(expected, actual);
..., the assertion fails.
I wrote an extension method for IEnumerable that is similar to Python's zip function (it combines two IEnumerables into a set of pairs) and tried this:
foreach(Token[] t in expected.zip(actual))
{
Assert.AreEqual(t[0], t[1]);
}
It worked! So what is the difference between these two Assert.AreEquals?
Found it:
Assert.IsTrue(expected.SequenceEqual(actual));
Have you considered using the CollectionAssert class instead...considering that it is intended to perform equality checks on collections?
Addendum:
If the 'collections' being compared are enumerations, then simply wrapping them with 'new List<T>(enumeration)' is the easiest way to perform the comparison. Constructing a new list causes some overhead of course, but in the context of a unit test this should not matter too much I hope?
Assert.AreEqual is going to compare the two objects at hand. IEnumerables are types in and of themselves, and provide a mechanism to iterate over some collection...but they are not actually that collection. Your original comparison compared two IEnumerables, which is a valid comparison...but not what you needed. You needed to compare what the two IEnumerables were intended to enumerate.
Here is how I compare two enumerables:
Assert.AreEqual(t1.Count(), t2.Count());
IEnumerator<Token> e1 = t1.GetEnumerator();
IEnumerator<Token> e2 = t2.GetEnumerator();
while (e1.MoveNext() && e2.MoveNext())
{
Assert.AreEqual(e1.Current, e2.Current);
}
I am not sure whether the above is less code than your .Zip method, but it is about as simple as it gets.
I think the simplest and clearest way to assert the equality you want is a combination of the answer by jerryjvl and comment on his post by MEMark - combine CollectionAssert.AreEqual with extension methods:
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
This gives richer error information than the SequenceEqual answer suggested by the OP (it will tell you which element was found that was unexpected). For example:
IEnumerable<string> expected = new List<string> { "a", "b" };
IEnumerable<string> actual = new List<string> { "a", "c" }; // mismatching second element
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
// Helpful failure message!
// CollectionAssert.AreEqual failed. (Element at index 1 do not match.)
Assert.IsTrue(expected.SequenceEqual(actual));
// Mediocre failure message:
// Assert.IsTrue failed.
You'll be really pleased you did it this way if/when your test fails - sometimes you can even know what's wrong without having to break out the debugger - and hey you're doing TDD right, so you write a failing test first, right? ;-)
The error messages get even more helpful if you're using AreEquivalent to test for equivalence (order doesn't matter):
CollectionAssert.AreEquivalent(expected.ToList(), actual.ToList());
// really helpful error message!
// CollectionAssert.AreEquivalent failed. The expected collection contains 1
// occurrence(s) of <b>. The actual collection contains 0 occurrence(s).

Categories

Resources