List<T>.Distinct() in C# - multiple criteria for EqualityComparer?

I have a collection of objects, each with several properties. I often need to get a list of distinct values for many of these properties. If I implement IEqualityComparer on this type, it gives me a single criterion for getting the distinct objects in the collection. How can I call Distinct on this collection with multiple criteria?
For example,
class Product {
string name ;
string code ;
string supplier ;
//etc
}
Imagine a list of such Product objects.
Sometimes I want a list of the distinct names in the list, and at other times a list of the distinct suppliers, etc.
If I call Distinct on a list of these products, it will always use the same criterion, based on the way IEqualityComparer is implemented, which does not serve my purpose.

Simply provide different IEqualityComparer implementations for different calls to Distinct. Note the difference between IEquatable and IEqualityComparer - usually a type shouldn't implement IEqualityComparer for itself (so Product wouldn't implement IEqualityComparer<Product>). You'd have different implementations, such as ProductNameComparer, ProductCodeComparer etc.
However, another alternative is to use DistinctBy in MoreLINQ:
var distinctProducts = products.DistinctBy(p => p.name);
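As a rough sketch of the "one comparer per criterion" idea, the following assumes that Product exposes public Name and Supplier properties (the question's fields are private), and the names ProductNameComparer and ProductQueries are only illustrative. When only the values themselves are needed rather than whole products, projecting with Select before Distinct avoids a custom comparer entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the question's fields exposed as public properties.
public class Product
{
    public string Name { get; set; }
    public string Code { get; set; }
    public string Supplier { get; set; }
}

// Treats two products as equal when their names match (ordinal comparison).
public sealed class ProductNameComparer : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    // Must be derived from the same data that Equals looks at.
    public int GetHashCode(Product obj) =>
        obj?.Name is null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Name);
}

public static class ProductQueries
{
    // One product per distinct name.
    public static List<Product> DistinctByName(IEnumerable<Product> products) =>
        products.Distinct(new ProductNameComparer()).ToList();

    // Only the distinct supplier values; strings already compare by value.
    public static List<string> DistinctSuppliers(IEnumerable<Product> products) =>
        products.Select(p => p.Supplier).Distinct().ToList();
}
```

A ProductSupplierComparer or ProductCodeComparer would follow the same pattern with a different property.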

You can use the Distinct() overload that accepts an IEqualityComparer argument.

You could also create a comparer that accepts function arguments for the Equals and GetHashCode methods. Something like
class Foo
{
public string Name { get; set; }
public int Id { get; set; }
}
class FooComparer : IEqualityComparer<Foo>
{
public FooComparer(Func<Foo, Foo, bool> equalityComparer, Func<Foo, int> getHashCode)
{
EqualityComparer = equalityComparer;
HashCodeGenerator = getHashCode;
}
Func<Foo, Foo, bool> EqualityComparer;
Func<Foo, int> HashCodeGenerator;
public bool Equals(Foo x, Foo y)
{
return EqualityComparer(x, y);
}
public int GetHashCode(Foo obj)
{
return HashCodeGenerator(obj);
}
}
...
List<Foo> foos = new List<Foo>() { new Foo() { Name = "A", Id = 4 }, new Foo() { Name = "B", Id = 4 } };
var list1 = foos.Distinct(new FooComparer((x, y) => x.Id == y.Id, f => f.Id.GetHashCode()));
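A possible variation on the delegate-based comparer above takes a single key selector instead of two separate functions, so that Equals and GetHashCode cannot drift apart. This is only a sketch: KeyEqualityComparer and KeyComparerDemo are hypothetical names, and Foo is repeated here so the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Name { get; set; }
    public int Id { get; set; }
}

// Hypothetical helper: both Equals and GetHashCode are derived from one key.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector,
                               IEqualityComparer<TKey> keyComparer = null)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    // Two items are equal when their keys are equal.
    public bool Equals(T x, T y) =>
        _keyComparer.Equals(_keySelector(x), _keySelector(y));

    // Hash only the key, so equal items always hash alike.
    public int GetHashCode(T obj)
    {
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

public static class KeyComparerDemo
{
    public static List<Foo> DistinctById(IEnumerable<Foo> foos) =>
        foos.Distinct(new KeyEqualityComparer<Foo, int>(f => f.Id)).ToList();
}
```

The key selector is called on every element, so this sketch assumes the sequence contains no null items.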

Related

check if algorithm "saw" the class before [duplicate]

I am populating an array with instances of a class:
BankAccount[] a;
. . .
a = new BankAccount[]
{
new BankAccount("George Smith", 500m),
new BankAccount("Sid Zimmerman", 300m)
};
Once I populate this array, I would like to sort it by balance amounts. In order to do that, I would like to be able to check whether each element is sortable using IComparable.
I need to do this using interfaces. So far I have the following code:
public interface IComparable
{
decimal CompareTo(BankAccount obj);
}
But I'm not sure if this is the right solution. Any advice?
You should not define IComparable yourself. It is already defined. Rather, you need to implement IComparable on your BankAccount class.
Where you defined the class BankAccount, make sure it implements the IComparable interface. Then write BankAccount.CompareTo to compare the balance amounts of the two objects.
public class BankAccount : IComparable<BankAccount>
{
[...]
public int CompareTo(BankAccount that)
{
if (this.Balance < that.Balance) return -1;
if (this.Balance == that.Balance) return 0;
return 1;
}
}
Edit to show Jeffrey L Whitledge's solution from comments:
public class BankAccount : IComparable<BankAccount>
{
[...]
public int CompareTo(BankAccount that)
{
return this.Balance.CompareTo(that.Balance);
}
}
IComparable already exists in .NET with this definition of CompareTo
int CompareTo(Object obj)
You are not supposed to create the interface -- you are supposed to implement it.
public class BankAccount : IComparable {
public int CompareTo(Object obj) {
// return Less than zero if this object
// is less than the object specified by the CompareTo method.
// return Zero if this object is equal to the object
// specified by the CompareTo method.
// return Greater than zero if this object is greater than
// the object specified by the CompareTo method.
}
}
Do you want to destructively sort the array? That is, do you want to actually change the order of the items in the array? Or do you just want a list of the items in a particular order, without destroying the original order?
I would suggest that it is almost always better to do the latter. Consider using LINQ for a non-destructive ordering. (And consider using a more meaningful variable name than "a".)
BankAccount[] bankAccounts = { whatever };
var sortedByBalance = from bankAccount in bankAccounts
orderby bankAccount.Balance
select bankAccount;
Display(sortedByBalance);
An alternative is to use LINQ and skip implementing IComparable altogether:
BankAccount[] sorted = a.OrderBy(ba => ba.Balance).ToArray();
There is already IComparable<T>, but you should ideally support both IComparable<T> and IComparable. Using the inbuilt Comparer<T>.Default is generally an easier option. Array.Sort, for example, will accept such a comparer.
If you only need to sort these BankAccounts, use LINQ as follows:
BankAccount[] a = new BankAccount[]
{
new BankAccount("George Smith", 500m),
new BankAccount("Sid Zimmerman", 300m)
};
a = a.OrderBy(bank => bank.Balance).ToArray();
If you need to compare multiple fields, you can get some help from the compiler by using the new tuple syntax:
public int CompareTo(BankAccount other) =>
(Name, Balance).CompareTo(
(other.Name, other.Balance));
This scales to any number of properties, and it will compare them one-by-one as you would expect, saving you from having to implement many if-statements.
Note that you can use this tuple syntax to implement other members as well, for example GetHashCode. Just construct the tuple and call GetHashCode on it.
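To illustrate the note about other members, here is a minimal sketch that uses the same tuple for CompareTo, Equals and GetHashCode. It assumes a BankAccount with settable Name and Balance properties; the null checks are an addition that the original snippet does not have.

```csharp
using System;

public sealed class BankAccount : IComparable<BankAccount>, IEquatable<BankAccount>
{
    public string Name { get; set; }
    public decimal Balance { get; set; }

    // Compares Name first, then Balance; a null argument sorts before any instance.
    public int CompareTo(BankAccount other) =>
        other is null ? 1 : (Name, Balance).CompareTo((other.Name, other.Balance));

    // Equal when both members are equal.
    public bool Equals(BankAccount other) =>
        !(other is null) && (Name, Balance).Equals((other.Name, other.Balance));

    public override bool Equals(object obj) => Equals(obj as BankAccount);

    // Built from the same members as Equals, so equal accounts hash alike.
    public override int GetHashCode() => (Name, Balance).GetHashCode();
}
```

Note that the tuple compares strings with the default string comparer, which is culture-sensitive; if ordinal ordering matters, the string member would need to be compared separately.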
This is an example of the multiple-fields solution provided by @Daniel Lidström, using a tuple:
public class Program
{
public static void Main1()
{
BankAccount[] accounts = new BankAccount[]
{
new BankAccount()
{
Name = "Jack", Balance =150.08M
}, new BankAccount()
{
Name = "James",Balance =70.45M
}, new BankAccount()
{
Name = "Mary",Balance =200.01M
}, new BankAccount()
{
Name = "John",Balance =200.01M
}};
Array.Sort(accounts);
Array.ForEach(accounts, x => Console.WriteLine($"{x.Name} {x.Balance}"));
}
}
public class BankAccount : IComparable<BankAccount>
{
public string Name { get; set; }
public decimal Balance { get; set; }
public int CompareTo(BankAccount other) =>
(Balance,Name).CompareTo(
(other.Balance,other.Name ));
}

How to update the property to true or false based on comparing two list

I have two lists:
class obj1
{
public string country{ get; set; }
public string region{ get; set; }
}
class obj2
{
public string country{ get; set; }
public string region { get; set; }
public string XYZ { get; set; }
public bool ToBeChanged{ get; set; }
}
The first list looks like:
List<obj1> alist = new List<obj1>();
alist.Add(new obj1 { country = "US", region = "NC" });
alist.Add(new obj1 { country = "US", region = "SC" });
alist.Add(new obj1 { country = "US", region = "NY" });
The second list (List<obj2> alist2) may contain thousands of entries with many combinations of country and region.
I need to update the property "ToBeChanged" to "True" if second (alist2) list properties (country and region) matches to first(alist1) and false in otherwise.
Please help.
Thanks,
Vaibhav
Two points from the comments, and my thoughts:
Some commenters weren't sure exactly what your matching criteria are. To me it seems fairly clear that you're matching on 'country' and 'region'. Nevertheless, in the future, state this explicitly.
You got one comment criticizing your choice of variable names. That criticism is fully justified. Code is far easier to maintain when you have little hints as to what it's doing, and variable names are crucial for that.
Now, regarding my particular solution:
In the code below, I've renamed some of your objects to make them clear in their purpose. I'd like to rename 'obj2', but I'll leave that to you because I'm not exactly sure what you're intending to do with it, and I definitely don't know what 'XYZ' is for. Here are the renamed classes, with some added constructors to aid in list construction.
class RegionInfo {
public RegionInfo(string country, string region) {
this.country = country;
this.region = region;
}
public string country{ get; set; }
public string region{ get; set; }
}
class obj2 {
public obj2 (string country, string region, string XYZ) {
this.country = country;
this.region = region;
this.XYZ = XYZ;
}
public string country{ get; set; }
public string region { get; set; }
public string XYZ { get; set; }
public bool ToBeChanged{ get; set; }
}
I'm using a LINQ Join to match the two lists, outputting only the 'obj2' side of the join, and then looping over the result to set 'ToBeChanged' to true.
var regionInfos = new List<RegionInfo>() {
new RegionInfo("US", "NC"),
new RegionInfo("US", "SC"),
new RegionInfo("US", "NY")
};
var obj2s = new List<obj2> {
new obj2("US", "NC", "What am I for?"),
new obj2("US", "SC", "Like, am I supposed to be the new value?"),
new obj2("CA", "OT", "XYZ doesn't have a stated purpose")
};
var obj2sToChange = obj2s
.Join(
regionInfos,
o2 => new { o2.country, o2.region },
reg => new { reg.country, reg.region },
(o2,reg) => o2
);
foreach (var obj2 in obj2sToChange)
obj2.ToBeChanged = true;
obj2s.Dump(); // using Linqpad, but you do what works to display
This results in:
country | region | XYZ | ToBeChanged
US | NC | What am I for? | True
US | SC | Like, am I supposed to be the new value? | True
CA | OT | XYZ doesn't have a stated purpose | False
First of all, with LINQ you can never change the source. You can only extract data from the source. After that you can use the extracted data to update the source.
I need to update the property "ToBeChanged" to "True" if second (alist2) list properties (country and region) matches to first(alist1) and false in otherwise.
This is not a precise requirement. alist1 is a sequence of obj1 objects. I think that you want the property ToBeChanged of a certain obj2 to be true if any of the obj1 items in alist1 has a [country, region] combination that matches the [country, region] combination of that obj2.
Requirement: get all obj2 in alist2 that have a [country, region] combination that matches any of the [country, region] combinations of the obj1 objects in alist1.
You probably thought about using Where for this, something like "Where [country, region] combination is in the other list". Whenever you need to find out if an item is in another list, consider using one of the overloads of Enumerable.Contains.
The problem is that the [Country, Region] combination in every obj2 can be converted to an object of class obj1, but if you then check whether two obj1 objects are equal, you get a comparison by reference, while you want a comparison by value.
There are two solutions for this:
create an EqualityComparer that compares obj1 by Value
create [Country, Region] as anonymous type. Anonymous types always compare by value.
The latter is the easiest, so we'll do that one first.
Use anonymous types for comparison
First convert alist into a sequence of anonymous-type objects containing the [Country, Region] combinations:
var eligibleCountryRegionCombinations = alist.Select(obj1 => new
{
Country = obj1.Country,
Region = obj1.Region,
});
Note that I don't use ToList at the end: the enumerable is created, but the sequence has not been enumerated yet. In LINQ terms this is called lazy or deferred execution.
IEnumerable<obj2> obj2sThatNeedToBeChanged = alist2.Select(obj2 => new
{
CountryRegionCombination = new
{
Country = obj2.Country,
Region = obj2.Region,
},
Original = obj2,
})
.Where(item => eligibleCountryRegionCombinations.Contains(
item.CountryRegionCombination))
.Select(item => item.Original);
CountryRegionCombination is an anonymous type of the same type as the anonymous items in eligibleCountryRegionCombinations. Therefore you can use Contains. Because the items are anonymous type, the equality comparison is comparison by value.
The final select will remove the anonymous type, and keep only the Original.
Note that the query is still not enumerated.
foreach (var obj2 in obj2sThatNeedToBeChanged.ToList())
{
obj2.ToBeChanged = true;
}
It can be dangerous to change the source that you are enumerating. In this case it is not a problem, because the field that you change is not used to create the enumeration. Still I think it is safer, because of possible future changes, to do a ToList before you start changing the source.
Create an equality comparer
One of the overloads of Enumerable.Contains has a comparer parameter. It expects an IEqualityComparer<obj1>:
class Obj1Comparer : EqualityComparer<obj1>
{
public static IEqualityComparer<obj1> ByValue {get;} = new Obj1Comparer();
private static IEqualityComparer<string> CountryComparer => StringComparer.OrdinalIgnoreCase;
private static IEqualityComparer<string> RegionComparer => StringComparer.OrdinalIgnoreCase;
public override bool Equals (obj1 x, obj1 y)
{
if (x == null) return y == null; // true if both null, false if x null, but y not null
if (y == null) return false; // because x not null
// optimization:
if (Object.ReferenceEquals(x, y)) return true;
if (x.GetType() != y.GetType()) return false;
return CountryComparer.Equals(x.Country, y.Country)
&& RegionComparer.Equals(x.Region, y.Region);
}
To make it easy to change equality of countries, I created a separate comparer for countries and for regions. So if later you want to compare case sensitive, or if you change Country from string to a foreign key to a table of countries, then changes will be minimal.
You also need to override GetHashCode. If x equals y, then GetHashCode must return the same value for both. The converse is not required: if x and y are different, they may still return the same hash code. However, code will be more efficient if different objects usually produce different hash codes.
public override int GetHashCode (obj1 x)
{
if (x == null) return 87966354; // just a number
return CountryComparer.GetHashCode(x.Country)
^ RegionComparer.GetHashCode(x.Region);
}
}
Which HashCode you return depends on how often this will be called, for instance in dictionaries, comparers like Contains, etc.
How "different" are the Countries and Regions? A different Country will probably also mean a different Region, so maybe it is efficient enough to calculate the hash code from the Country only. If a Country has many, many Regions, then it will probably be better to calculate the hash code from the Region as well. If a Region is only in one Country (OberAmmerGau is probably only in Germany), or in only a few Countries (how many regions named "New Amsterdam" will there be?), then you won't have to include the Country in the hash code at all.
Because we have an equality comparer, we don't need to convert alist to an anonymous type, we can specify that Contains should compare by value.
IEqualityComparer<obj1> comparer = Obj1Comparer.ByValue;
IEnumerable<obj2> obj2sThatNeedToBeChanged = alist2.Select(obj2 => new
{
Obj1 = new obj1
{
Country = obj2.Country,
Region = obj2.Region,
},
Original = obj2,
})
.Where(item => alist.Contains(item.Obj1, comparer))
.Select(item => item.Original);
Fast method: Extension method
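Perhaps the simplest method is to create an extension method. It avoids creating intermediate objects, although it scans obj1Items once for every item in source, so it is fast only while obj1Items stays small.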
private static IEqualityComparer<string> CountryComparer => StringComparer.OrdinalIgnoreCase;
private static IEqualityComparer<string> RegionComparer => StringComparer.OrdinalIgnoreCase;
public static IEnumerable<Obj2> WhereSameLocation(
this IEnumerable<Obj2> source,
IEnumerable<Obj1> obj1Items)
{
// TODO: what to do if source == null?
foreach (Obj2 obj2 in source)
{
// check if there is any obj1 with same [Country, Region]
if (obj1Items
.Where(obj1 => CountryComparer.Equals(obj2.Country, obj1.Country)
&& RegionComparer.Equals(obj2.Region, obj1.Region))
.Any())
{
yield return obj2;
}
}
}
Usage:
IEnumerable<Obj1> alist = ...
IEnumerable<Obj2> alist2 = ...
IEnumerable<Obj2> obj2sThatNeedToBeChanged = alist2.WhereSameLocation(alist);
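The extension method above rescans obj1Items for every element of source. For larger lists, one possible alternative is to build a HashSet of [Country, Region] pairs once and then look each obj2 up in it. The sketch below assumes Obj1 and Obj2 classes with Country and Region properties; LocationMatching, LocationComparer and MarkMatches are hypothetical names. It sets ToBeChanged to true or false for every item, as the question asks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Obj1
{
    public string Country { get; set; }
    public string Region { get; set; }
}

public class Obj2
{
    public string Country { get; set; }
    public string Region { get; set; }
    public string XYZ { get; set; }
    public bool ToBeChanged { get; set; }
}

public static class LocationMatching
{
    // Case-insensitive value comparison of (Country, Region) pairs.
    private sealed class LocationComparer : IEqualityComparer<(string Country, string Region)>
    {
        public bool Equals((string Country, string Region) x, (string Country, string Region) y) =>
            StringComparer.OrdinalIgnoreCase.Equals(x.Country, y.Country)
            && StringComparer.OrdinalIgnoreCase.Equals(x.Region, y.Region);

        // Null strings are hashed as empty strings; Equals still tells them apart.
        public int GetHashCode((string Country, string Region) obj) =>
            (StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Country ?? ""),
             StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Region ?? "")).GetHashCode();
    }

    public static void MarkMatches(IEnumerable<Obj2> targets, IEnumerable<Obj1> allowed)
    {
        // Built once; each lookup below is then a hash probe instead of a scan.
        var lookup = new HashSet<(string Country, string Region)>(
            allowed.Select(a => (a.Country, a.Region)),
            new LocationComparer());

        foreach (Obj2 target in targets)
        {
            target.ToBeChanged = lookup.Contains((target.Country, target.Region));
        }
    }
}
```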

Using .Select and .Where in a single LINQ statement

I need to gather Distinct Id's from a particular table using LINQ. The catch is I also need a WHERE statement that should filter the results based only from the requirements I've set. Relatively new to having to use LINQ so much, but I'm using the following code more or less:
private void WriteStuff(SqlHelper db, EmployeeHelper emp)
{
String checkFieldChange;
AnIList tableClass = new AnIList(db, (int)emp.PersonId);
var linq = tableClass.Items
.Where(
x => x.UserId == emp.UserId
&& x.Date > DateBeforeChanges
&& x.Date < DateAfterEffective
&& (
(x.Field == Inserted)
|| (x.Field == Deleted))
)
.OrderByDescending(x => x.Id);
if (linq != null)
{
foreach (TableClassChanges item in linq)
{
AnotherIList payTxn = new AnotherIList(db, item.Id);
checkFieldChange = GetChangeType(item.FieldName);
// Other codes that will retrieve data from each item
// and write it into a text file
}
}
}
I tried to add .Distinct for var linq but it's still returning duplicate items (meaning having the same Id's). I've read through a lot of sites and have tried adding a .Select into the query but the .Where clause breaks instead. There are other articles where the query is somehow different with the way it retrieves the values and place it in a var. I also tried to use .GroupBy but I get an "At least one object must implement IComparable" when using Id as a key.
The query actually works and I'm able to output the data from the columns with the specifications I require, but I just can't seem to make .Distinct work (which is the only thing really missing). I tried to create two vars, with one triggering a distinct call, and then use a nested foreach to ensure the values are unique, but with thousands of records to gather, the performance impact is just too much.
I'm unsure as well if I'd have to override or use IEnumerable for my requirement, and thought I'd ask the question around just in case there's an easier way, or if it's possible to have both .Select and .Where working in just one statement?
Did you add the Select() after the Where() or before?
You should add it after, because of the order in which the operators are applied:
1. Take the entire table
2. Filter it accordingly
3. Select only the IDs
4. Make them distinct
If you do a Select first, the Where clause can only use the ID, because all other attributes have already been projected away.
Update: For clarity, this order of operators should work:
db.Items.Where(x=> x.userid == user_ID).Select(x=>x.Id).Distinct();
You probably want to add a .ToList() at the end, but that's optional :)
In order for Enumerable.Distinct to work for your type, you can implement IEquatable<T> and provide suitable definitions for Equals and GetHashCode, otherwise it will use the default implementation: comparing for reference equality (assuming that you are using a reference type).
From the manual:
The Distinct<TSource>(IEnumerable<TSource>) method returns an unordered sequence that contains no duplicate values. It uses the default equality comparer, Default, to compare values.
The default equality comparer, Default, is used to compare values of the types that implement the IEquatable<T> generic interface. To compare a custom data type, you need to implement this interface and provide your own GetHashCode and Equals methods for the type.
In your case it looks like you might just need to compare the IDs, but you may also want to compare other fields too depending on what it means for you that two objects are "the same".
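A minimal sketch of that approach, assuming equality is decided by Id alone. ChangeRecord is a hypothetical stand-in for the question's TableClassChanges, whose definition is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's item type.
public sealed class ChangeRecord : IEquatable<ChangeRecord>
{
    public int Id { get; set; }
    public string FieldName { get; set; }

    // Two records are "the same" when their Ids match.
    public bool Equals(ChangeRecord other) => !(other is null) && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as ChangeRecord);

    // Must agree with Equals: equal records return equal hash codes.
    public override int GetHashCode() => Id.GetHashCode();
}

public static class ChangeRecordDemo
{
    // Distinct() now picks up the IEquatable<ChangeRecord> implementation
    // through the default equality comparer.
    public static int CountDistinct(IEnumerable<ChangeRecord> records) =>
        records.Distinct().Count();
}
```

If other fields should also take part in the comparison, they would need to be added to both Equals and GetHashCode.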
You can also consider using DistinctBy from morelinq.
Note that this is LINQ to Objects only, but I assume that's what you are using.
Yet another option is to combine GroupBy and First:
var query = // your query here...
.GroupBy(x => x.Id)
.Select(g => g.First());
This would also work in LINQ to SQL, for example.
Since you are trying to compare two different objects, you will first need to implement the IEqualityComparer interface. Here is example code for a simple console application that uses Distinct with a simple implementation of IEqualityComparer:
class Program
{
static void Main(string[] args)
{
List<Test> testData = new List<Test>()
{
new Test(1,"Test"),
new Test(2, "Test"),
new Test(2, "Test")
};
var result = testData.Where(x => x.Id > 1).Distinct(new MyComparer());
}
}
public class MyComparer : IEqualityComparer<Test>
{
public bool Equals(Test x, Test y)
{
return x.Id == y.Id;
}
public int GetHashCode(Test obj)
{
// Hash only the Id, so that objects considered equal by Equals share a hash code.
return obj.Id.GetHashCode();
}
}
public class Test
{
public Test(int id, string name)
{
this.id = id;
this.name = name;
}
private int id;
public int Id
{
get { return id; }
set { id = value; }
}
private string name;
public string Name
{
get { return name; }
set { name = value; }
}
}
I hope that helps.
Did you pass an IEqualityComparer<T> to .Distinct()?
Something like this:
internal abstract class BaseComparer<T> : IEqualityComparer<T> {
public bool Equals(T x, T y) {
return GetHashCode(x) == GetHashCode(y);
}
public abstract int GetHashCode(T obj);
}
internal class DetailComparer : BaseComparer<MyClass> {
public override int GetHashCode(MyClass obj) {
return obj.ID.GetHashCode();
}
}
Usage:
list.Distinct(new DetailComparer())
You can easily query with LINQ like this
considering this JSON
{
"items": [
{
"id": "10",
"name": "one"
},
{
"id": "12",
"name": "two"
}
]
}
putting it in a variable called json like this,
JObject json = JObject.Parse("{'items':[{'id':'10','name':'one'},{'id':'12','name':'two'}]}");
you can select all ids from the items where name is "one" using the following LINQ query
var Ids =
from item in json["items"]
where (string)item["name"] == "one"
select item["id"];
Then you will have the result as an IEnumerable<JToken>.

What is the best way to compute intersection and difference of 2 sets?

I have 2 lists List<Class1> and List<Class2> that are compared by same property Class1.Key and Class2.Key (string) and I want to write a function that will produce 3 lists out of them
List<Class1> Elements that are present in both lists
List<Class1> Elements that are present only in first list
List<Class2> Elements that are present only in second list
Is there a quick way to do that?
var requirement1 = list1.Intersect(list2);
var requirement2 = list1.Except(list2);
var requirement3 = list2.Except(list1);
For your List<string>, this will be all you need. If you were doing this for a custom class and wanted something other than reference comparisons, you'd need to ensure that the class properly overrides Equals and GetHashCode. Alternatively, you could provide an IEqualityComparer<YourType> to overloads of the above methods.
Edit:
OK, now you've indicated in the comments that it isn't a list of string, it's a List<MyObject>. In that case, override Equals/GetHashCode (if your key should uniquely identify these classes all the time and you have access to the source code) or provide an IEqualityComparer implementation (this still involves Equals/GetHashCode; use it if the comparison is unique to these requirements or if you do not have access to the MyObject source).
For example:
class MyObjectComparer : IEqualityComparer<MyObject>
{
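public bool Equals(MyObject x, MyObject y)
{
// same instance, or both null
if (ReferenceEquals(x, y)) return true;
// exactly one of them is null
if (x is null || y is null) return false;
// compare on the same property that GetHashCode uses
return x.KeyProperty == y.KeyProperty;
}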
public int GetHashCode(MyObject obj)
{
// validate if necessary
return obj.KeyProperty.GetHashCode();
}
}
If you used a custom equality comparer such as this, the call to the above methods would be
list1.Intersect(list2, customComparerInstance);
Edit: And now you've moved the bar yet again, this time the problem deals with two distinct classes. For this, you would consider utilizing join operations, one being an inner, the other being an outer.
In the case of
class Class1
{
public string Foo { get; set; }
}
class Class2
{
public string Bar { get; set; }
}
You could write
var intersect = from item1 in list1
join item2 in list2
on item1.Foo equals item2.Bar
select item1;
var except1 = from item1 in list1
join item2 in list2
on item1.Foo equals item2.Bar into gj
from item2 in gj.DefaultIfEmpty()
where item2 == null
select item1;
To get the items in list2 without matching objects in list1, simply reverse the order of the lists/items in the except1 query.
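Spelled out, the reversed query might look like the sketch below. Class1 and Class2 are repeated so the block stands alone; SetQueries and OnlyInSecond are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Class1
{
    public string Foo { get; set; }
}

public class Class2
{
    public string Bar { get; set; }
}

public static class SetQueries
{
    // Items of list2 whose Bar has no matching Foo in list1.
    public static List<Class2> OnlyInSecond(List<Class1> list1, List<Class2> list2)
    {
        var except2 = from item2 in list2
                      join item1 in list1
                      on item2.Bar equals item1.Foo into gj
                      from item1 in gj.DefaultIfEmpty()
                      where item1 == null   // the left outer join found no partner
                      select item2;
        return except2.ToList();
    }
}
```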

Selecting DataRows into new structures using LINQ. Calling Distinct() fails

Consider these two structures:
struct Task
{
public Int32 Id;
public String Name;
public List<Registration> Registrations;
}
struct Registration
{
public Int32 Id;
public Int32 TaskId;
public String Comment;
public Double Hours;
}
I am selecting a bunch of entries in a DataTable into new structures, like so:
var tasks = data.AsEnumerable().Select(t => new Task
{
Id = Convert.ToInt32(t["ProjectTaskId"]),
Name = Convert.ToString(t["ProjectTaskName"]),
Registrations = new List<Registration>()
});
But when I call Distinct() on the collection, it doesn't recognize objects with the same values (Id, Name, Registrations) as being equal.
But if I use an equality comparer; comparing the Id property on the objects, it's all fine and dandy...:
class TaskIdComparer : IEqualityComparer<Task>
{
public bool Equals(Task x, Task y)
{
return x.Id == y.Id;
}
public Int32 GetHashCode(Task t)
{
return t.Id.GetHashCode();
}
}
What am I missing here? Is Distinct() checking something else than the value of properties?
LINQ's Distinct method compares objects using the objects' Equals and GetHashCode implementations.
Therefore, if these methods are not overridden, a class is compared by reference, not by value. A struct such as Task is compared field by field by default, but a reference-type field such as the List<Registration> is itself compared by reference.
You need to use an EqualityComparer (or implement Equals and GetHashCode for the Task struct).
My guess is that it's the list in there. Almost certainly, the two list objects are different, even if they contain the same info.
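The effect of the list field can be seen in a small experiment. This is a sketch with simplified, hypothetical structs (TaskWithList and TaskNoList) rather than the question's exact types; it relies on the default struct equality comparing each field with that field's own Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the question's Task, with the list simplified to List<int>.
public struct TaskWithList
{
    public int Id;
    public string Name;
    public List<int> Registrations;
}

// Identical, but without the reference-type list field.
public struct TaskNoList
{
    public int Id;
    public string Name;
}

public static class StructDistinctDemo
{
    public static int DistinctWithList()
    {
        var items = new[]
        {
            new TaskWithList { Id = 1, Name = "A", Registrations = new List<int>() },
            new TaskWithList { Id = 1, Name = "A", Registrations = new List<int>() }
        };
        // Each element holds a different List instance, and List<T> does not
        // override Equals, so the two structs are not considered equal.
        return items.Distinct().Count();
    }

    public static int DistinctNoList()
    {
        var items = new[]
        {
            new TaskNoList { Id = 1, Name = "A" },
            new TaskNoList { Id = 1, Name = "A" }
        };
        // Only value-comparable fields remain, so the duplicate is removed.
        return items.Distinct().Count();
    }
}
```

With the list field present both elements survive Distinct; without it only one does, which is why comparing on Id alone in a custom comparer gives the expected result.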
