C# Comparing lists using Except

I've tried searching for the answer in the many other questions, but they either don't seem to be relevant or I'm just not knowledgeable enough to tell that they are.
My problem is comparing two lists: one of Twitter followers, the other of friends (the people you follow).
This is the code I'm using to gather the lists:
var friends = user.GetFriends(500).ToList();
var following = user.GetFollowers(500).ToList();
var result = compareFollowingtoFollowers(friends, following);
foreach (var res in result)
{
    lstFollowerChecker.Items.Add(res.ScreenName);
}
And this is my compareFollowingtoFollowers function:
private List<T> compareFollowingtoFollowers<T>(List<T> friends, List<T> followers)
{
    var results = followers.Except(friends).ToList();
    return results;
}
My problem is that it doesn't return what I expect. For example, if I run this against my own account, where I have, say, 100 friends and I'm following 112 people, it should return the 12 people who are not following back, but instead it just seems to return them all.
Am I using the correct function? The other questions I've read lead me to believe so.
Thank you for reading
Bryan
UPDATE
The answers given have been enough to get the cogs in my head turning again. While they are still slightly over my head, I think they were just what I needed to better understand why it was returning what it did. Thank you all.

You need a custom comparer to compare your objects, and you pass it to the .Except method as the second parameter. For example:
public class User
{
    public int Id { get; set; }
}
public class UserComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        return x.Id == y.Id;
    }
    public int GetHashCode(User obj)
    {
        return obj.Id.GetHashCode();
    }
}
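A minimal sketch of how such a comparer might be wired into Except. The ScreenName property, the FollowerDiff helper, and the null checks are assumptions added for illustration; the user type of the actual Twitter library will differ.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's user type.
public class User
{
    public int Id { get; set; }
    public string ScreenName { get; set; }
}

// Two users are considered equal when their Ids match.
public class UserComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(User obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class FollowerDiff
{
    // People in 'friends' (those you follow) who do not appear in 'followers'.
    public static List<User> NotFollowingBack(List<User> friends, List<User> followers)
    {
        return friends.Except(followers, new UserComparer()).ToList();
    }
}
```

Note that the order of the two lists matters: Except returns the items of the list it is called on that are missing from the list passed as the argument.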

You are using the correct function, but your user class does not implement IEquatable<UserClass>. This means that Except uses reference semantics to compare two users, and since they seem to be different objects (even if "equal") they compare unequal. So it thinks that there is no overlap at all between the two lists.
The solution is to properly implement IEquatable<T> to give the class your desired equality semantics (how to do this exactly depends on the properties of that class).
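To make the IEquatable<T> suggestion concrete, here is a minimal sketch. TwitterUser and its properties are hypothetical stand-ins; the real class from the Twitter library may look quite different, and if you cannot modify it, a separate IEqualityComparer<T> is the alternative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type; equality is defined by Id only.
public class TwitterUser : IEquatable<TwitterUser>
{
    public long Id { get; set; }
    public string ScreenName { get; set; }

    public bool Equals(TwitterUser other)
    {
        return other != null && Id == other.Id;
    }

    // Keep object.Equals and GetHashCode consistent with IEquatable<T>.
    public override bool Equals(object obj)
    {
        return Equals(obj as TwitterUser);
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
```

With this in place, a plain friends.Except(followers) compares users by Id through the default equality comparer, so no second argument is needed.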

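You can do something like:
var result = myFriends.Where(x => yourFriends.All(y => y != x));
Note that != still compares references unless the user type overloads the operator, so this only helps once equality is defined for the type.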

It is likely that you need to specify how your types are compared.
According to MSDN:
http://msdn.microsoft.com/en-us/library/bb300779(v=vs.110).aspx
Custom types need to implement Equals and GetHashCode for the comparison to work.

Related

Elegant way to query a dictionary in C#

I am trying to create an elegant and extensible way of querying a dictionary which maps an enum to a set of strings.
So I have this class SearchFragments that has the dictionary in it. I then want a method wherein consumers of this class can simply ask "HasAny" and, this is the bit where I am struggling, simply pass in some query like expression and get the boolean answer back.
public class SearchFragments
{
    private readonly IDictionary<SearchFragmentEnum, IEnumerable<string>> _fragments;
    public SearchFragments()
    {
        _fragments = new Dictionary<SearchFragmentEnum, IEnumerable<string>>();
    }
    public bool HasAny(IEnumerable<SearchFragmentEnum> of)
    {
        int has = 0;
        _fragments.ForEach(x => of.ForEach(y => has += x.Key == y ? 1 : 0));
        return has >= 1;
    }
}
The problem with the current approach is that consumers of this class now have to construct an IEnumerable<SearchFragmentEnum>, which can be quite messy.
What I am looking for is that the consuming code will be able to write something along the lines of:
searchFragments.HasAny(SearchFragmentEnum.Name, SearchFragmentEnum.PhoneNumber)
But the argument list should be able to vary in size, without me having to write method overloads in the SearchFragments class for every possible combination, so that if new values are added to SearchFragmentEnum at a future date I won't have to update the class.
You can use params:
public bool HasAny(params SearchFragmentEnum[] of)
{ ...
Sidenote: you know that LIN(Q) queries should just query a source and never cause any side-effects? But your query does unnecessarily increment the integer:
_fragments.ForEach(x => of.ForEach(y => has += x.Key == y ? 1 : 0));
Instead use this (which is also more efficient and more readable):
return _fragments.Keys.Intersect(of).Any();
An even more efficient alternative to this is Sergey's idea:
return of?.Any(_fragments.ContainsKey) == true;
For variable-sized argument lists in C# you use the params keyword:
public bool HasAny(params SearchFragmentEnum[] of)
The .NET API usually offers a couple of overloads alongside this for performance reasons: the arguments passed to a params parameter are copied into a new array. Explicitly providing overloads for the most common cases avoids this.
public bool HasAny(SearchFragmentEnum of1)
public bool HasAny(SearchFragmentEnum of1, SearchFragmentEnum of2)
etc.
Instead of using params you could also consider marking your enum with the [Flags] attribute. Arguments could then be passed like HasAny(SearchFragmentEnum.Name | SearchFragmentEnum.PhoneNumber). Examples are abundant on Stack Overflow (e.g. Using a bitmask in C#).
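A rough sketch of the [Flags] idea, under several assumptions: the enum values are powers of two, the Email member and the Add method are made up for illustration, and the class only tracks which kinds of fragment are present rather than the strings themselves.

```csharp
using System;

[Flags]
public enum SearchFragmentEnum
{
    None = 0,
    Name = 1,
    PhoneNumber = 2,
    Email = 4   // hypothetical extra member
}

public class SearchFragments
{
    // Bitmask of the fragment kinds currently present.
    private SearchFragmentEnum _present = SearchFragmentEnum.None;

    // Hypothetical helper that records a fragment kind as present.
    public void Add(SearchFragmentEnum kind)
    {
        _present |= kind;
    }

    // True when at least one of the requested flags is present.
    public bool HasAny(SearchFragmentEnum of)
    {
        return (_present & of) != SearchFragmentEnum.None;
    }
}
```

The trade-off is that every new enum member needs a distinct bit, whereas the params version works with any enum values.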
Use the params keyword to allow a varying number of arguments. Further, you can simplify your code by looping only over the (usually smaller) argument array. Since a dictionary has an O(1) key check, the inner loop is unnecessary:
public bool HasAny(params SearchFragmentEnum[] of)
{
    foreach (var o in of)
    {
        if (this._fragments.ContainsKey(o))
            return true;
    }
    return false;
}
or shorter with LINQ:
public bool HasAny(params SearchFragmentEnum[] of)
{
    return of?.Any(_fragments.ContainsKey) ?? false;
}

Implementing a content-hashable HashSet in C# (like python's `frozenset`)

Brief summary
I want to build a set of sets of items in C#. The inner sets of items have a GetHashCode and Equals method defined by their contents. In mathematical notation:
x = { }
x.Add( { A, B, C } )
x.Add( { A, D } )
x.Add( { B, C, A } )
now x should be { { A, B, C }, { A, D } }
In python, this could be accomplished with frozenset:
x = set()
x.add( frozenset(['A','B','C']) )
x.add( frozenset(['A','D']) )
x.add( frozenset(['B','C','A']) )
/BriefSummary
I would like to have a hashable HashSet in C#. This would allow me to do:
HashSet<ContentHashableHashSet<int>> setOfSets;
Although there are more sophisticated ways to accomplish this, it can be trivially achieved in practice (although not in the most efficient manner) by overriding ContentHashableHashSet.ToString() (outputting the strings of the contained elements in sorted order) and then using ContentHashableHashSet.ToString().GetHashCode() as the hash code.
However, if an object is modified after placement in setOfSets, it could result in multiple copies:
var setA = new ContentHashableHashSet<int>();
setA.Add(1);
setA.Add(2);
var setB = new ContentHashableHashSet<int>();
setB.Add(1);
setOfSets.Add(setA);
setOfSets.Add(setB);
setB.Add(2); // now there are duplicate members!
As far as I can see, I have two options: I can derive ContentHashableHashSet from HashSet, but then I will need to make it so that all modifiers throw an exception. Missing one modifier could cause an insidious bug.
Alternatively, I can use encapsulation and class ContentHashableHashSet can contain a readonly HashSet. But then I would need to reimplement all set methods (except modifiers) so that the ContentHashableHashSet can behave like a HashSet. As far as I know, extensions would not apply.
Lastly, I could encapsulate as above and then all set-like functionality will occur by returning the const (or readonly?) HashSet member.
In hindsight, this is reminiscent of python's frozenset. Does anyone know of a well-designed way to implement this in C#?
If I was able to lose ISet functionality, then I would simply create a sorted ImmutableList, but then I would lose functionality like fast union, fast intersection, and sub-linear ( roughly O(log(n)) ) set membership with Contains.
EDIT: The base class HashSet does not have virtual Add and Remove methods, so overriding them will work within the derived class, but will not work if you perform HashSet<int> set = new ContentHashableHashSet<int>();. Casting to the base class will allow editing.
EDIT 2: Thanks to @xanatos for recommending a simple GetHashCode implementation:
The easiest way to calculate the GetHashCode is to simply xor (^) the hash codes of all the elements. The xor operator is commutative, so the ordering is irrelevant. For the comparison you can use SetEquals.
EDIT 3: Someone recently shared information about ImmutableHashSet, but because this class is sealed, it is not possible to derive from it and override GetHashCode.
I was also told that HashSet takes an IEqualityComparer as an argument, and so this can be used to provide an immutable, content-hashable set without deriving from ImmutableHashSet; however, this is not a very object oriented solution: every time I want to use a ContentHashableHashSet, I will need to pass the same (non-trivial) argument. As I'm sure you know, this can really wreak havoc with your coding zen, and where I would be flying along in python with myDictionary[ frozenset(mySet) ] = myValue, I will be stuck doing the same thing again and again and again.
Thanks for any help you can provide. I have a temporary workaround (whose problems are mentioned in EDIT 1 above), but I'd mostly like to learn about the best way to design something like this.
Hide the elements of your set of sets so that they can't be changed. That means copying when you add/retrieve sets, but maybe that's acceptable?
// Better make sure T is immutable too, else set hashes could change
public class SetofSets<T>
{
    private class HashSetComparer : IEqualityComparer<HashSet<T>>
    {
        public int GetHashCode(HashSet<T> x)
        {
            return x.Aggregate(1, (code, elt) => code ^ elt.GetHashCode());
        }
        public bool Equals(HashSet<T> x, HashSet<T> y)
        {
            // Two nulls are equal; a null never equals a non-null set
            if (x == null || y == null)
                return x == null && y == null;
            return x.SetEquals(y);
        }
    }
    private HashSet<HashSet<T>> setOfSets;
    public SetofSets()
    {
        setOfSets = new HashSet<HashSet<T>>(new HashSetComparer());
    }
    public void Add(HashSet<T> set)
    {
        setOfSets.Add(new HashSet<T>(set));
    }
    public bool Contains(HashSet<T> set)
    {
        return setOfSets.Contains(set);
    }
}
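For comparison, the framework also ships a ready-made content comparer for sets, HashSet<T>.CreateSetComparer(), which could stand in for a hand-written one. The sketch below (SetOfSetsDemo is a made-up name) reproduces the A/B/C example from the question. It does not address the mutation problem: a set changed after insertion can still end up in the wrong bucket.

```csharp
using System.Collections.Generic;

public static class SetOfSetsDemo
{
    // Builds a set of sets whose members are compared by content.
    public static HashSet<HashSet<string>> Build()
    {
        var setOfSets = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
        setOfSets.Add(new HashSet<string> { "A", "B", "C" });
        setOfSets.Add(new HashSet<string> { "A", "D" });
        // Same content as the first set, so it is not added again.
        setOfSets.Add(new HashSet<string> { "B", "C", "A" });
        return setOfSets;
    }
}
```

Build() ends up holding two inner sets, { A, B, C } and { A, D }.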

Using Linq Except not Working as I Thought

List1 contains items { A, B } and List2 contains items { A, B, C }.
What I need is to be returned { C } when I use Except Linq extension. Instead I get returned { A, B } and if I flip the lists around in my expression the result is { A, B, C }.
Am I misunderstanding the point of Except? Is there another extension I am not seeing to use?
I have looked through and tried a number of different posts on this matter with no success thus far.
var except = List1.Except(List2); //This is the line I have thus far
EDIT: Yes I was comparing simple objects. I have never used IEqualityComparer, it was interesting to learn about.
Thanks all for the help. The problem was not implementing the comparer. The linked blog post and the example below were helpful.
If you are storing reference types in your list, you have to make sure there is a way to compare the objects for equality. Otherwise they are compared by reference, that is, by whether they refer to the same object.
You can implement IEqualityComparer<T> and pass it as a parameter to the Except() function. Here's a blog post you may find helpful.
edit: the original blog post link was broken and has been replaced above
So just for completeness...
// Except gives you the items in the first set but not the second
var InList1ButNotList2 = List1.Except(List2);
var InList2ButNotList1 = List2.Except(List1);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
Edit: Since your lists contain objects, you need to pass in an IEqualityComparer for your class. Here is what your Except will look like with a sample IEqualityComparer based on made-up objects... :)
// Except gives you the items in the first set but not the second
var equalityComparer = new MyClassEqualityComparer();
var InList1ButNotList2 = List1.Except(List2, equalityComparer);
var InList2ButNotList1 = List2.Except(List1, equalityComparer);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2, equalityComparer);
public class MyClass
{
    public int i;
    public int j;
}
class MyClassEqualityComparer : IEqualityComparer<MyClass>
{
    public bool Equals(MyClass x, MyClass y)
    {
        return x.i == y.i &&
               x.j == y.j;
    }
    public int GetHashCode(MyClass obj)
    {
        unchecked
        {
            if (obj == null)
                return 0;
            int hashCode = obj.i.GetHashCode();
            // Combine both fields so the hash agrees with Equals
            hashCode = (hashCode * 397) ^ obj.j.GetHashCode();
            return hashCode;
        }
    }
}
You simply confused the order of arguments. I can see where this confusion arose, because the official documentation isn't as helpful as it could be:
Produces the set difference of two sequences by using the default equality comparer to compare values.
Unless you're versed in set theory, it may not be clear what a set difference actually is—it's not simply what's different between the sets. In reality, Except returns the list of elements in the first set that are not in the second set.
Try this:
var except = List2.Except(List1); // { C }
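A small sketch of the two directions, using strings so that value equality applies and no comparer is needed (ExceptOrderDemo and its members are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptOrderDemo
{
    private static readonly List<string> List1 = new List<string> { "A", "B" };
    private static readonly List<string> List2 = new List<string> { "A", "B", "C" };

    // Items of List1 that are missing from List2: empty here.
    public static List<string> OnlyInList1()
    {
        return List1.Except(List2).ToList();
    }

    // Items of List2 that are missing from List1: just "C".
    public static List<string> OnlyInList2()
    {
        return List2.Except(List1).ToList();
    }
}
```

With custom reference types the same ordering rule applies, but equality has to be defined first, as the other answers describe.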
Writing a custom comparer does seem to solve the problem, but I think https://stackoverflow.com/a/12988312/10042740 is a much simpler and more elegant solution.
It overrides the GetHashCode() and Equals() methods in the class that defines your object; the default comparer then does its magic without extra code cluttering up the place.
Just for reference:
I wanted to compare the USB drives connected and available to the system.
This is the class that implements the IEqualityComparer interface:
public class DriveInfoEqualityComparer : IEqualityComparer<DriveInfo>
{
    public bool Equals(DriveInfo x, DriveInfo y)
    {
        if (object.ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        // compare by volume label
        return x.VolumeLabel.Equals(y.VolumeLabel);
    }
    public int GetHashCode(DriveInfo obj)
    {
        return obj.VolumeLabel.GetHashCode();
    }
}
and you can use it like this:
var newDeviceLst = DriveInfo.GetDrives()
    .ToList()
    .Except(inMemoryDrives, new DriveInfoEqualityComparer())
    .ToList();

C#: Returning 'this' for method nesting?

I have a class on which I have to call one or two methods many times in a row. The methods currently return void. I was thinking, would it be better to have them return this, so that the method calls could be chained? Or is that considered very, very bad? Or, if bad, would it be better if they returned a new object of the same type? What do you think? As an example I have created three versions of an adder class:
// Regular
class Adder
{
    public Adder() { Number = 0; }
    public int Number { get; private set; }
    public void Add(int i) { Number += i; }
    public void Remove(int i) { Number -= i; }
}
// Returning this
class Adder
{
    public Adder() { Number = 0; }
    public int Number { get; private set; }
    public Adder Add(int i) { Number += i; return this; }
    public Adder Remove(int i) { Number -= i; return this; }
}
// Returning new
class Adder
{
    public Adder() : this(0) { }
    private Adder(int i) { Number = i; }
    public int Number { get; private set; }
    public Adder Add(int i) { return new Adder(Number + i); }
    public Adder Remove(int i) { return new Adder(Number - i); }
}
The first one can be used this way:
var a = new Adder();
a.Add(4);
a.Remove(1);
a.Add(7);
a.Remove(3);
The other two can be used this way:
var a = new Adder()
    .Add(4)
    .Remove(1)
    .Add(7)
    .Remove(3);
The only difference is that a in the first case is the new Adder(), while in the latter it is the result of the last method call.
The first I find quickly becomes... annoying to write over and over again. So I would like to use one of the other versions.
The third works like many other methods, such as many String methods and the IEnumerable extension methods. I guess that has its positive side, in that you can do things like var a = new Adder(); var b = a.Add(5); and then have one that is 0 and one that is 5. But at the same time, isn't it a bit expensive to create new objects all the time? And when will the first object die? When the first method returns, or at some other point?
Anyway, I like the one that returns this and think I will use that, but I am very curious to know what others think about this case, and what is considered best practice.
The 'return this' style is sometimes called a fluent interface and is a common practice.
I like "fluent syntax" and would take the second one. After all, you could still use it as the first, for people who feel uncomfortable with fluent syntax.
Another idea to make an interface like the Adder's easier to use:
public Adder Add(params int[] i) { /* ... */ }
public Adder Remove(params int[] i) { /* ... */ }
Adder adder = new Adder()
    .Add(1, 2, 3)
    .Remove(3, 4);
I always try to make short and easy-to-read interfaces, but many people like to write the code as complicated as possible.
Chaining is a nice thing to have and is core in some frameworks (for instance Linq extensions and jQuery both use it heavily).
Whether you create a new object or return this depends on how you expect your initial object to behave:
var a = new Adder();
var b = a.Add(4)
    .Remove(1)
    .Add(7)
    .Remove(3);
// now - should a == b ?
Chaining in jQuery will have changed your original object - it has returned this.
That's expected behaviour - to do otherwise would basically clone UI elements.
Chaining in Linq will have left your original collection unchanged. That too is expected behaviour - each chained function is a filter or transformation, and the original collection is often immutable.
Which pattern better suits what you're doing?
I think that for simple interfaces, the "fluent" interface is very useful, particularly because it is very simple to implement. The value of the fluent interface is that it eliminates a lot of the extraneous fluff that gets in the way of understanding. Developing such an interface can take a lot of time, especially when the interface starts to be involved. You should worry about how the usage of the interface "reads"; In my mind, the most compelling use for such an interface is how it communicates the intent of the programmer, not the amount of characters that it saves.
To answer your specific question, I like the "return this" style. My typical use of the fluent interface is to define a set of options. That is, I create an instance of the class and then use the fluent methods on the instance to define the desired behavior of the object. If I have a yes/no option (say for logging), I try not to have a "setLogging(bool state)" method but rather two methods "WithLogging" and "WithoutLogging". This is somewhat more work but the clarity of the final result is very useful.
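As an illustration of the WithLogging / WithoutLogging style described above, a minimal sketch might look like this. ImportOptions, RetryCount, and WithRetries are hypothetical names invented for the example.

```csharp
// Hypothetical options class; all names are illustrative only.
public class ImportOptions
{
    public bool Logging { get; private set; }
    public int RetryCount { get; private set; }

    // Each method sets one option and returns this so calls can be chained.
    public ImportOptions WithLogging() { Logging = true; return this; }
    public ImportOptions WithoutLogging() { Logging = false; return this; }
    public ImportOptions WithRetries(int count) { RetryCount = count; return this; }
}
```

A caller could then write new ImportOptions().WithLogging().WithRetries(3), which reads closer to the intent than a chain of boolean setters.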
Consider this: if you come back to this code in 5 years, is this going to make sense to you? If so, then I suppose you can go ahead.
For this specific example, though, it would seem that overloading the + and - operators would make things clearer and accomplish the same thing.
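A sketch of what the operator-overloading variant could look like, based on the immutable "returning new" Adder from the question (making the class sealed is an added assumption):

```csharp
// Immutable variant of the Adder from the question, with + and - overloaded.
public sealed class Adder
{
    public Adder() : this(0) { }
    private Adder(int number) { Number = number; }

    public int Number { get; private set; }

    // Each operator leaves its operand untouched and returns a new Adder.
    public static Adder operator +(Adder a, int i) { return new Adder(a.Number + i); }
    public static Adder operator -(Adder a, int i) { return new Adder(a.Number - i); }
}
```

The chained example from the question then becomes var a = new Adder() + 4 - 1 + 7 - 3; which leaves a.Number at 7.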
For your specific case, overloading the arithmetic operators would be probably the best solution.
Returning this (a fluent interface) is common practice for building expressions; unit testing and mocking frameworks use it a lot. Fluent NHibernate is another example.
Returning a new instance might be a good choice, too. It allows you to make your class immutable - in general a good thing and very handy in the case of multithreading. But think about the object creation overhead if immutability is of no use for you.
If you call it Adder, I'd go with returning this. However, it's kind of strange for an Adder class to contain an answer.
You might consider making it something like MyNumber and create an Add()-method.
Ideally (IMHO), that would not change the number that is stored inside your instance, but create a new instance with the new value, which you return:
class MyNumber
{
    ...
    MyNumber Add( int i )
    {
        return new MyNumber( this.Value + i );
    }
}
The main difference between the second and third solution is that by returning a new instance instead of this you are able to "catch" the object in a certain state and continue from that.
var a = new Adder()
    .Add(4);
var b = a.Remove(1);
var c = a.Add(7)
    .Remove(3);
In this case both b and c have the state captured in a as a starting point.
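To make the branching concrete, here is the "returning new" Adder from the question together with a small helper (BranchingDemo is a made-up name) that reports the three resulting values:

```csharp
public class Adder
{
    public Adder() : this(0) { }
    private Adder(int i) { Number = i; }
    public int Number { get; private set; }
    public Adder Add(int i) { return new Adder(Number + i); }
    public Adder Remove(int i) { return new Adder(Number - i); }
}

public static class BranchingDemo
{
    // Returns the Number values of a, b and c, in that order.
    public static int[] Run()
    {
        var a = new Adder().Add(4);     // 4
        var b = a.Remove(1);            // 3, a is untouched
        var c = a.Add(7).Remove(3);     // 8, also starts from a
        return new[] { a.Number, b.Number, c.Number };
    }
}
```

With the "returning this" version, the same calls would all mutate one shared instance, so a, b and c would refer to the same object and report the same final number.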
I came across this idiom while reading about a pattern for building test domain objects in Growing Object-Oriented Software, Guided by Tests by Steve Freeman; Nat Pryce.
On your question regarding the lifetime of your instances: I would expect them to be eligible for garbage collection as soon as the invocation of Remove or Add returns.

How does Assert.AreEqual determine equality between two generic IEnumerables?

I have a unit test to check whether a method returns the correct IEnumerable. The method builds the enumerable using yield return. The class that it is an enumerable of is below:
enum TokenType
{
    NUMBER,
    COMMAND,
    ARITHMETIC,
}
internal class Token
{
    public TokenType type { get; set; }
    public string text { get; set; }
    public static bool operator == (Token lh, Token rh) { return (lh.type == rh.type) && (lh.text == rh.text); }
    public static bool operator != (Token lh, Token rh) { return !(lh == rh); }
    public override int GetHashCode()
    {
        // XOR instead of %: the hash of NUMBER (value 0) is 0, so % would divide by zero
        return text.GetHashCode() ^ type.GetHashCode();
    }
    public override bool Equals(object obj)
    {
        return this == (Token)obj;
    }
}
This is the relevant part of the method:
foreach (var lookup in REGEX_MAPPING)
{
    if (lookup.re.IsMatch(s))
    {
        yield return new Token { type = lookup.type, text = s };
        break;
    }
}
If I store the result of this method in actual, make another enumerable expected, and compare them like this...
Assert.AreEqual(expected, actual);
..., the assertion fails.
I wrote an extension method for IEnumerable that is similar to Python's zip function (it combines two IEnumerables into a set of pairs) and tried this:
foreach (Token[] t in expected.zip(actual))
{
    Assert.AreEqual(t[0], t[1]);
}
It worked! So what is the difference between these two Assert.AreEquals?
Found it:
Assert.IsTrue(expected.SequenceEqual(actual));
Have you considered using the CollectionAssert class instead...considering that it is intended to perform equality checks on collections?
Addendum:
If the 'collections' being compared are enumerations, then simply wrapping them with 'new List<T>(enumeration)' is the easiest way to perform the comparison. Constructing a new list causes some overhead of course, but in the context of a unit test this should not matter too much I hope?
Assert.AreEqual is going to compare the two objects at hand. IEnumerables are types in and of themselves, and provide a mechanism to iterate over some collection...but they are not actually that collection. Your original comparison compared two IEnumerables, which is a valid comparison...but not what you needed. You needed to compare what the two IEnumerables were intended to enumerate.
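The distinction can be seen without any test framework. The sketch below assumes that Assert.AreEqual on two objects ultimately falls back to object.Equals, and uses strings in place of Token to stay self-contained; EnumerableEqualityDemo and its members are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableEqualityDemo
{
    // An iterator method, like the yield-return method in the question.
    private static IEnumerable<string> Tokens()
    {
        yield return "1";
        yield return "+";
        yield return "2";
    }

    // Compares the two enumerable objects themselves (reference equality here).
    public static bool ObjectsEqual()
    {
        IEnumerable<string> expected = new List<string> { "1", "+", "2" };
        return object.Equals(expected, Tokens());
    }

    // Compares the elements the two enumerables produce, in order.
    public static bool ElementsEqual()
    {
        IEnumerable<string> expected = new List<string> { "1", "+", "2" };
        return expected.SequenceEqual(Tokens());
    }
}
```

ObjectsEqual() returns false because a List<string> and an iterator object are different instances, while ElementsEqual() returns true because the produced elements match one by one.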
Here is how I compare two enumerables:
Assert.AreEqual(t1.Count(), t2.Count());
IEnumerator<Token> e1 = t1.GetEnumerator();
IEnumerator<Token> e2 = t2.GetEnumerator();
while (e1.MoveNext() && e2.MoveNext())
{
    Assert.AreEqual(e1.Current, e2.Current);
}
I am not sure whether the above is less code than your .Zip method, but it is about as simple as it gets.
I think the simplest and clearest way to assert the equality you want is a combination of the answer by jerryjvl and comment on his post by MEMark - combine CollectionAssert.AreEqual with extension methods:
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
This gives richer error information than the SequenceEqual answer suggested by the OP (it will tell you which element was found that was unexpected). For example:
IEnumerable<string> expected = new List<string> { "a", "b" };
IEnumerable<string> actual = new List<string> { "a", "c" }; // mismatching second element
CollectionAssert.AreEqual(expected.ToArray(), actual.ToArray());
// Helpful failure message!
// CollectionAssert.AreEqual failed. (Element at index 1 do not match.)
Assert.IsTrue(expected.SequenceEqual(actual));
// Mediocre failure message:
// Assert.IsTrue failed.
You'll be really pleased you did it this way if/when your test fails - sometimes you can even know what's wrong without having to break out the debugger - and hey you're doing TDD right, so you write a failing test first, right? ;-)
The error messages get even more helpful if you're using AreEquivalent to test for equivalence (order doesn't matter):
CollectionAssert.AreEquivalent(expected.ToList(), actual.ToList());
// really helpful error message!
// CollectionAssert.AreEquivalent failed. The expected collection contains 1
// occurrence(s) of <b>. The actual collection contains 0 occurrence(s).
