C#: Compare two ArrayList of custom class and find duplicates

C#: Compare two ArrayList of custom class and find duplicates - c#

I have two arrays of ArrayList.
public class ProductDetails
{
public string id;
public string description;
public float rate;
}
ArrayList products1 = new ArrayList();
ArrayList products2 = new ArrayList();
ArrayList duplicateProducts = new ArrayList();
Now what I want is to get all the products (with all the fields of ProductDetails class) having duplicate description in both products1 and products2.
I can run two for/while loops as traditional way, but that would be very slow specially if I will be having over 10k elements in both arrays.
So probably something can be done with LINQ.

If you want to use linQ, you need write your own EqualityComparer where you override both methods Equals and GetHashCode()
public class ProductDetails
{
public string id {get; set;}
public string description {get; set;}
public float rate {get; set;}
}
public class ProductComparer : IEqualityComparer<ProductDetails>
{
public bool Equals(ProductDetails x, ProductDetails y)
{
//Check whether the objects are the same object.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether the products' properties are equal.
return x != null && y != null && x.id.Equals(y.id) && x.description.Equals(y.description);
}
public int GetHashCode(ProductDetails obj)
{
//Get hash code for the description field if it is not null.
int hashProductDesc = obj.description == null ? 0 : obj.description.GetHashCode();
//Get hash code for the idfield.
int hashProductId = obj.id.GetHashCode();
//Calculate the hash code for the product.
return hashProductDesc ^ hashProductId ;
}
}
Now, supposing you have this objects:
ProductDetails [] items1= { new ProductDetails { description= "aa", id= 9, rating=2.0f },
new ProductDetails { description= "b", id= 4, rating=2.0f} };
ProductDetails [] items= { new ProductDetails { description= "aa", id= 9, rating=1.0f },
new ProductDetails { description= "c", id= 12, rating=2.0f } };
IEnumerable<ProductDetails> duplicates =
items1.Intersect(items2, new ProductComparer());

Consider overriding the System.Object.Equals method.
public class ProductDetails
{
public string id;
public string description;
public float rate;
public override bool Equals(object obj)
{
if(obj is ProductDetails == null)
return false;
if(ReferenceEquals(obj,this))
return true;
ProductDetails p = (ProductDetails)obj;
return description == p.description;
}
}
Filtering would then be as simple as:
var result = products1.Where(product=>products2.Contains(product));
EDIT:
Do consider that this implementation is not optimal..
Moreover- it has been proposed in the comments to your question that you use a data base.
This way performance will be optimized - as per the database implementation In any case- the overhead will not be yours.
However, you can optimize this code by using a Dictionary or a HashSet:
Overload the System.Object.GetHashCode method:
public override int GetHashCode()
{
return description.GetHashCode();
}
You can now do this:
var hashSet = new HashSet<ProductDetails>(products1);
var result = products2.Where(product=>hashSet.Contains(product));
Which will boost your performance to an extent since lookup will be less costly.

10k elements is nothing, however make sure you use proper collection types. ArrayList is long deprecated, use List<ProductDetails>.
Next step is implementing proper Equals and GetHashCode overrides for your class. The assumption here is that description is the key since that's what you care about from a duplication point of view:
public class ProductDetails
{
public string id;
public string description;
public float rate;
public override bool Equals(object obj)
{
var p = obj as ProductDetails;
return ReferenceEquals(p, null) ? false : description == obj.description;
}
public override int GetHashCode() => description.GetHashCode();
}
Now we have options. One easy and efficient way of doing this is using a hash set:
var set = new HashSet<ProductDetails>();
var products1 = new List<ProductDetails>(); // fill it
var products2 = new List<ProductDetails>(); // fill it
// shove everything in the first list in the set
foreach(var item in products1)
set.Add(item);
// and simply test the elements in the second set
foreach(var item in products2)
if(set.Contains(item))
{
// item.description was already used in products1, handle it here
}
This gives you linear (O(n)) time-complexity, best you can get.

Related

C# How to find index of a object in listbox datasource (binded with List<T>) by Linq

I have a viewmodel class like this:
public class ViewItem
{
private No as int;
private Name as string;
...Getter-Setter go here...
}
I have a listbox name LbxItemBox
I binded it with datasource is:
List<ViewItem> DataList;
LbxItemBox.Datasource = DataList;
Data final are:
Item1: 1, "Frank"
Item2: 2, "Bob"
Item3: 3, "Johh"
Item4: 4, "Lucis"
How I can find index of a object model like this in LbxItemBox:
ViewItem ViewX = new ViewItem();
ViewX.No = 3;
ViewX.Name = "John";
I tried simple way, not Linq:
int IndexMatched = LbxItemBox.Items.IndexOf(ViewX);
but return -1;
How can I use Linq to find index of what I need?
Thanks for your helps

There are a couple ways you can achieve this. First, you can override Equals for your ViewItem class. This will cause the IndexOf call with a new instance of ViewItem to say that any object you already have in the list is equivalent to a newly created object with the same information. For your class you have listed here is how you could do that:
public class ViewItem
{
private int No { get; set; }
private string Name { get; set; }
public ViewItem(int no, string name)
{
this.No = no;
this.Name = name;
}
public override bool Equals(Object obj)
{
// Check for null values and compare run-time types.
if (obj == null || GetType() != obj.GetType())
return false;
ViewItem other = (ViewItem) obj;
return (No == other.No);
}
}
Your example above should return the right index, given that Equals has been overridden. For more information on correctly implementing Equals and GetHashCode see MSDN or http://www.loganfranken.com/blog/687/overriding-equals-in-c-part-1/.
An alternative way to handle this using linq if you are not looking for true equality would be to match on certain properties of your ViewItem class. Here is an example of doing so:
List<ViewItem> items = new List<ViewItem>() { new ViewItem(1, "John"), new ViewItem(2, "Jake"), new ViewItem(2, "Frank")};
var john = new ViewItem(1, "John");
var frankInd = items.FindIndex(i => i.Name == "Frank");
Console.WriteLine(items.IndexOf(john));
Console.WriteLine(frankInd);

Using LINQ (Enumerable.Cast<TResult> Method):
int IndexMatched = LbxItemBox.Items.Cast<ViewItem>().ToList().FindIndex(x => x.No == ViewX.No);

On string interning and alternatives

I have a large file which, in essence contains data like:
Netherlands,Noord-holland,Amsterdam,FooStreet,1,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,2,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,3,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,4,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,5,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,1,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,2,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,3,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,4,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,1,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,2,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,3,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,1,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,2,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,3,...,...
...
This is a multi-gigabyte file. I have a class that reads this file and exposes these lines (records) as an IEnumerable<MyObject>. This MyObject has several properties (Country,Province,City, ...) etc.
As you can see there is a LOT of duplication of data. I want to keep exposing the underlying data as an IEnumerable<MyObject>. However, some other class might (and probably will) make some hierarchical view/structure of this data like:
Netherlands
Noord-holland
Amsterdam
FooStreet [1, 2, 3, 4, 5]
BarRoad [1, 2, 3, 4]
...
Amstelveen
BazDrive [1, 2, 3]
...
...
Zuid-holland
Rotterdam
LoremAve [1, 2, 3]
...
...
...
...
When reading this file, I do, essentially, this:
foreach (line in myfile) {
fields = line.split(",");
yield return new MyObject {
Country = fields[0],
Province = fields[1],
City = fields[2],
Street = fields[3],
//...other fields
};
}
Now, to the actual question at hand: I could use string.Intern() to intern the Country, Province, City, and Street strings (those are the main 'vilains', the MyObject has several other properties not relevant to the question).
foreach (line in myfile) {
fields = line.split(",");
yield return new MyObject {
Country = string.Intern(fields[0]),
Province = string.Intern(fields[1]),
City = string.Intern(fields[2]),
Street = string.Intern(fields[3]),
//...other fields
};
}
This will save about 42% of memory (tested and measured) when holding the entire dataset in memory since all duplicate strings will be a reference to the same string. Also, when creating the hierarchical structure with a lot of LINQ's .ToDictionary() method the keys (Country, Province etc.) of the resp. dictionaries will be much more efficient.
However, one of the drawbacks (aside a slight loss of performance, which is not problem) of using string.Intern() is that the strings won't be garbage collected anymore. But when I'm done with my data I do want all that stuff garbage collected (eventually).
I could use a Dictionary<string, string> to 'intern' this data but I don't like the "overhead" of having a key and value where I am, actually, only interested in the key. I could set the value to null or the use the same string as value (which will result in the same reference in key and value). It's only a small price of a few bytes to pay, but it's still a price.
Something like a HashSet<string> makes more sense to me. However, I cannot get a reference to a string in the HashSet; I can see if the HashSet contains a specific string, but not get a reference to that specific instance of the located string in the HashSet. I could implement my own HashSet for this, but I am wondering what other solutions you kind StackOverflowers may come up with.
Requirements:
My "FileReader" class needs to keep exposing an IEnumerable<MyObject>
My "FileReader" class may do stuff (like string.Intern()) to optimize memory usage
The MyObject class cannot change; I won't make a City class, Country class etc. and have MyObject expose those as properties instead of simple string properties
Goal is to be (more) memory efficient by de-duplicating most of the duplicate strings in Country, Province, City etc.; how this is achieved (e.g. string interning, internal hashset / collection / structure of something) is not important. However:
I know I can stuff the data in a database or use other solutions in such direction; I am not interested in these kind of solutions.
Speed is only of secondary concern; the quicker the better ofcourse but a (slight) loss in performance while reading/iterating the objects is no problem
Since this is a long-running process (as in: windows service running 24/7/365) that, occasionally, processes a bulk of this data I want the data to be garbage-collected when I'm done with it; string interning works great but will, in the long run, result in a huge string pool with lots of unused data
I would like any solutions to be "simple"; adding 15 classes with P/Invokes and inline assembly (exaggerated) is not worth the effort. Code maintainability is high on my list.
This is more of a 'theoretical' question; it's purely out of curiosity / interest that I'm asking. There is no "real" problem, but I can see that in similar situations this might be a problem to someone.
For example: I could do something like this:
public class StringInterningObject
{
private HashSet<string> _items;
public StringInterningObject()
{
_items = new HashSet<string>();
}
public string Add(string value)
{
if (_items.Add(value))
return value; //New item added; return value since it wasn't in the HashSet
//MEH... this will quickly go O(n)
return _items.First(i => i.Equals(value)); //Find (and return) actual item from the HashSet and return it
}
}
But with a large set of (to be de-duplicated) strings this will quickly bog down. I could have a peek at the reference source for HashSet or Dictionary or... and build a similar class that doesn't return bool for the Add() method but the actual string found in the internals/bucket.
The best I could come up with until now is something like:
public class StringInterningObject
{
private ConcurrentDictionary<string, string> _items;
public StringInterningObject()
{
_items = new ConcurrentDictionary<string, string>();
}
public string Add(string value)
{
return _items.AddOrUpdate(value, value, (v, i) => i);
}
}
Which has the "penalty" of having a Key and a Value where I'm actually only interested in the Key. Just a few bytes though, small price to pay. Coincidally this also yields 42% less memory usage; the same result as when using string.Intern() yields.
tolanj came up with System.Xml.NameTable:
public class StringInterningObject
{
private System.Xml.NameTable nt = new System.Xml.NameTable();
public string Add(string value)
{
return nt.Add(value);
}
}
(I removed the lock and string.Empty check (the latter since the NameTable already does that))
xanatos came up with a CachingEqualityComparer:
public class StringInterningObject
{
private class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public System.WeakReference X { get; private set; }
public System.WeakReference Y { get; private set; }
private readonly IEqualityComparer<T> Comparer;
public CachingEqualityComparer()
{
Comparer = EqualityComparer<T>.Default;
}
public CachingEqualityComparer(IEqualityComparer<T> comparer)
{
Comparer = comparer;
}
public bool Equals(T x, T y)
{
bool result = Comparer.Equals(x, y);
if (result)
{
X = new System.WeakReference(x);
Y = new System.WeakReference(y);
}
return result;
}
public int GetHashCode(T obj)
{
return Comparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, null))
{
return null;
}
object x = X.Target;
object y = Y.Target;
if (x != null && y != null)
{
if (object.ReferenceEquals(one, x))
{
return (T)y;
}
else if (object.ReferenceEquals(one, y))
{
return (T)x;
}
}
return one;
}
}
private CachingEqualityComparer<string> _cmp;
private HashSet<string> _hs;
public StringInterningObject()
{
_cmp = new CachingEqualityComparer<string>();
_hs = new HashSet<string>(_cmp);
}
public string Add(string item)
{
if (!_hs.Add(item))
item = _cmp.Other(item);
return item;
}
}
(Modified slightly to "fit" my "Add() interface")
As per Henk Holterman's request:
public class StringInterningObject
{
private Dictionary<string, string> _items;
public StringInterningObject()
{
_items = new Dictionary<string, string>();
}
public string Add(string value)
{
string result;
if (!_items.TryGetValue(value, out result))
{
_items.Add(value, value);
return value;
}
return result;
}
}
I'm just wondering if there's maybe a neater/better/cooler way to 'solve' my (not so much of an actual) problem. By now I have enough options I guess
Here are some numbers I came up with for some simple, short, preliminary tests:
Non optimizedMemory: ~4,5GbLoad time: ~52s
StringInterningObject (see above, the ConcurrentDictionary variant)Memory: ~2,6GbLoad time: ~49s
string.Intern()Memory: ~2,3GbLoad time: ~45s
System.Xml.NameTableMemory: ~2,3GbLoad time: ~41s
CachingEqualityComparerMemory: ~2,3GbLoad time: ~58s
StringInterningObject (see above, the (non-concurrent) Dictionary variant) as per Henk Holterman's request:Memory: ~2,3GbLoad time: ~39s
Although the numbers aren't very definitive, it seems that the many memory-allocations for the non-optimized version actually slow down more than using either string.Intern() or the above StringInterningObjects which results in (slightly) longer load times. Also, string.Intern() seems to 'win' from StringInterningObject but not by a large margin; << See updates.

I've had exactly this requirement and indeed asked on SO, but with nothing like the detail of your question, no useful responses. One option that is built in is a (System.Xml).NameTable, which is basically a string atomization object, which is what you are looking for, we had (we've actually move to Intern because we do keep these strings for App-life).
if (name == null) return null;
if (name == "") return string.Empty;
lock (m_nameTable)
{
return m_nameTable.Add(name);
}
on a private NameTable
http://referencesource.microsoft.com/#System.Xml/System/Xml/NameTable.cs,c71b9d3a7bc2d2af shows its implemented as a Simple hashtable, ie only storing one reference per string.
Downside? is its completely string specific. If you do cross-test for memory / speed I'd be interested to see the results. We were already using System.Xml heavily, might of course not seem so natural if you where not.

When in doubt, cheat! :-)
public class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public T X { get; private set; }
public T Y { get; private set; }
public IEqualityComparer<T> DefaultComparer = EqualityComparer<T>.Default;
public bool Equals(T x, T y)
{
bool result = DefaultComparer.Equals(x, y);
if (result)
{
X = x;
Y = y;
}
return result;
}
public int GetHashCode(T obj)
{
return DefaultComparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, X))
{
return Y;
}
if (object.ReferenceEquals(one, Y))
{
return X;
}
throw new ArgumentException("one");
}
public void Reset()
{
X = default(T);
Y = default(T);
}
}
Example of use:
var comparer = new CachingEqualityComparer<string>();
var hs = new HashSet<string>(comparer);
string str = "Hello";
string st1 = str.Substring(2);
hs.Add(st1);
string st2 = str.Substring(2);
// st1 and st2 are distinct strings!
if (object.ReferenceEquals(st1, st2))
{
throw new Exception();
}
comparer.Reset();
if (hs.Contains(st2))
{
string cached = comparer.Other(st2);
Console.WriteLine("Found!");
// cached is st1
if (!object.ReferenceEquals(cached, st1))
{
throw new Exception();
}
}
I've created an equality comparer that "caches" the last Equal terms it analyzed :-)
Everything could then be encapsulated in a subclass of HashSet<T>
/// <summary>
/// An HashSet<T;gt; that, thorough a clever use of an internal
/// comparer, can have a AddOrGet and a TryGet
/// </summary>
/// <typeparam name="T"></typeparam>
public class HashSetEx<T> : HashSet<T> where T : class
{
public HashSetEx()
: base(new CachingEqualityComparer<T>())
{
}
public HashSetEx(IEqualityComparer<T> comparer)
: base(new CachingEqualityComparer<T>(comparer))
{
}
public T AddOrGet(T item)
{
if (!Add(item))
{
var comparer = (CachingEqualityComparer<T>)Comparer;
item = comparer.Other(item);
}
return item;
}
public bool TryGet(T item, out T item2)
{
if (Contains(item))
{
var comparer = (CachingEqualityComparer<T>)Comparer;
item2 = comparer.Other(item);
return true;
}
item2 = default(T);
return false;
}
private class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public WeakReference X { get; private set; }
public WeakReference Y { get; private set; }
private readonly IEqualityComparer<T> Comparer;
public CachingEqualityComparer()
{
Comparer = EqualityComparer<T>.Default;
}
public CachingEqualityComparer(IEqualityComparer<T> comparer)
{
Comparer = comparer;
}
public bool Equals(T x, T y)
{
bool result = Comparer.Equals(x, y);
if (result)
{
X = new WeakReference(x);
Y = new WeakReference(y);
}
return result;
}
public int GetHashCode(T obj)
{
return Comparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, null))
{
return null;
}
object x = X.Target;
object y = Y.Target;
if (x != null && y != null)
{
if (object.ReferenceEquals(one, x))
{
return (T)y;
}
else if (object.ReferenceEquals(one, y))
{
return (T)x;
}
}
return one;
}
}
}
Note the use of WeakReference so that there aren't useless references to objects that could prevent garbage collection.
Example of use:
var hs = new HashSetEx<string>();
string str = "Hello";
string st1 = str.Substring(2);
hs.Add(st1);
string st2 = str.Substring(2);
// st1 and st2 are distinct strings!
if (object.ReferenceEquals(st1, st2))
{
throw new Exception();
}
string stFinal = hs.AddOrGet(st2);
if (!object.ReferenceEquals(stFinal, st1))
{
throw new Exception();
}
string stFinal2;
bool result = hs.TryGet(st1, out stFinal2);
if (!object.ReferenceEquals(stFinal2, st1))
{
throw new Exception();
}
if (!result)
{
throw new Exception();
}

edit3:
instead of indexing strings, putting them in non-duplicate lists will save much more ram.
we have int indexes in class MyObjectOptimized. access is instant.
if list is short(like 1000 item) speed of setting values wont be noticable.
i assumed every string will have 5 character .
this will reduce memory usage
percentage : 110 byte /16byte = 9x gain
total : 5gb/9 = 0.7 gb + sizeof(Country_li , Province_li etc )
with int16 index (will further halve ram usage )
*note:* int16 capacity is -32768 to +32767 ,
make sure your list is not bigger than 32 767
usage is same but will use the class MyObjectOptimized
main()
{
// you can use same code
foreach (line in myfile) {
fields = line.split(",");
yield
return
new MyObjectOptimized {
Country = fields[0],
Province = fields[1],
City = fields[2],
Street = fields[3],
//...other fields
};
}
}
required classes
// single string size : 18 bytes (empty string size) + 2 bytes per char allocated
//1 class instance ram cost : 4 * (18 + 2* charCount )
// ie charcounts are at least 5
// cost: 4*(18+2*5) = 110 byte
class MyObject
{
string Country ;
string Province ;
string City ;
string Street ;
}
public static class Exts
{
public static int AddDistinct_and_GetIndex(this List<string> list ,string value)
{
if( !list.Contains(value) ) {
list.Add(value);
}
return list.IndexOf(value);
}
}
// 1 class instance ram cost : 4*4 byte = 16 byte
class MyObjectOptimized
{
//those int's could be int16 depends on your distinct item counts
int Country_index ;
int Province_index ;
int City_index ;
int Street_index ;
// manuallly implemented properties will not increase memory size
// whereas field WILL increase
public string Country{
get {return Country_li[Country_index]; }
set { Country_index = Country_li.AddDistinct_and_GetIndex(value); }
}
public string Province{
get {return Province_li[Province_index]; }
set { Province_index = Province_li.AddDistinct_and_GetIndex(value); }
}
public string City{
get {return City_li[City_index]; }
set { City_index = City_li.AddDistinct_and_GetIndex(value); }
}
public string Street{
get {return Street_li[Street_index]; }
set { Street_index = Street_li.AddDistinct_and_GetIndex(value); }
}
//beware they are static.
static List<string> Country_li ;
static List<string> Province_li ;
static List<string> City_li ;
static List<string> Street_li ;
}

How do I get a distinct value of items in a list of object or structure?

I have a List. This collection holds an object containing the properties of a class.
I want a distinct value of list with respect to any specific property of a class. I have attached some sample code; please check & let me know if you guys have any solutions:
class Test
{
public string firstname{get;set;}
public string lastname{get;set;}
}
class Usetheaboveclass
{
Test objTest=new Test();
List<Test> lstTest=new List<Test>();
objTest.firstname="test";
objTest.lastname="testing";
//Now i want a distinct value with respect to lastname.if i use
lstTest=lstTest.Distinct().Tolist();
//It will process according to all properties.
}
Can you suggest me a way to do this?

Try this approach.
var distinct = lstTest.GroupBy(item => item.lastname).Select(item => item.First()).ToList();

If you only need to do this for one property, override the Equals and GetHashCode methods in Test. These are what Distinct() uses to define duplicates.
If you need to do this for multiple properties, define an IEqualityComparer (the usage is documented in this MSDN article).

Or , you can implement a custom comparer
public class LastNameComparer : IEqualityComparer<Test>
{
public bool Equals(Test x, Test y)
{
if (x == null)
return y == null;
return x.lastname == y.lastname;
}
public int GetHashCode(Test obj)
{
if (obj == null)
return 0;
return obj.lastname.GetHashCode();
}
}
Then , use it like
lstTest = lstTest.Distinct(new LastNameComparer()).ToList();

You can use overloaded version of Distinct. Please see sample code below:
internal class Test
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
internal class LastNameComparer : IEqualityComparer<Test>
{
bool IEqualityComparer<Test>.Equals(Test x, Test y)
{
if (x.LastName == y.LastName)
return true;
return false;
}
int IEqualityComparer<Test>.GetHashCode(Test obj)
{
return 0; // hashcode...
}
}
private static void Main(string[] args)
{
Test objTest = new Test {FirstName = "Perry", LastName = "Joe"};
Test objTest1 = new Test {FirstName = "Prince", LastName = "Joe"};
Test objTest2 = new Test { FirstName = "Prince", LastName = "Jim" };
List<Test> lstTest = new List<Test> {objTest, objTest1, objTest2};
var distinct = lstTest.Distinct(new LastNameComparer()).ToList();
foreach (var test in distinct)
{
Console.WriteLine(test.LastName);
}
Console.Read();
}
Output of this will be:
Joe
Jim

MSTest: CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of

Question:
Can anyone tell me why my unit test is failing with this error message?
CollectionAssert.AreEquivalent failed. The expected collection contains 1
occurrence(s) of . The actual
collection contains 0 occurrence(s).
Goal:
I'd like to check if two lists are identical. They are identical if both contain the same elements with the same property values. The order is irrelevant.
Code example:
This is the code which produces the error. list1 and list2 are identical, i.e. a copy-paste of each other.
[TestMethod]
public void TestListOfT()
{
var list1 = new List<MyPerson>()
{
new MyPerson()
{
Name = "A",
Age = 20
},
new MyPerson()
{
Name = "B",
Age = 30
}
};
var list2 = new List<MyPerson>()
{
new MyPerson()
{
Name = "A",
Age = 20
},
new MyPerson()
{
Name = "B",
Age = 30
}
};
CollectionAssert.AreEquivalent(list1.ToList(), list2.ToList());
}
public class MyPerson
{
public string Name { get; set; }
public int Age { get; set; }
}
I've also tried this line (source)
CollectionAssert.AreEquivalent(list1.ToList(), list2.ToList());
and this line (source)
CollectionAssert.AreEquivalent(list1.ToArray(), list2.ToArray());
P.S.
Related Stack Overflow questions:
I've seen both these questions, but the answers didn't help.
CollectionAssert use with generics?
Unit-testing IList with CollectionAssert

You are absolutely right. Unless you provide something like an IEqualityComparer<MyPerson> or implement MyPerson.Equals(), the two MyPerson objects will be compared with object.Equals, just like any other object. Since the objects are different, the Assert will fail.

It works if I add an IEqualityComparer<T> as described on MSDN and if I use Enumerable.SequenceEqual. Note however, that now the order of the elements is relevant.
In the unit test
//CollectionAssert.AreEquivalent(list1, list2); // Does not work
Assert.IsTrue(list1.SequenceEqual(list2, new MyPersonEqualityComparer())); // Works
IEqualityComparer
public class MyPersonEqualityComparer : IEqualityComparer<MyPerson>
{
public bool Equals(MyPerson x, MyPerson y)
{
if (object.ReferenceEquals(x, y)) return true;
if (object.ReferenceEquals(x, null) || object.ReferenceEquals(y, null)) return false;
return x.Name == y.Name && x.Age == y.Age;
}
public int GetHashCode(MyPerson obj)
{
if (object.ReferenceEquals(obj, null)) return 0;
int hashCodeName = obj.Name == null ? 0 : obj.Name.GetHashCode();
int hasCodeAge = obj.Age.GetHashCode();
return hashCodeName ^ hasCodeAge;
}
}

I was getting this same error when testing a collection persisted by nHibernate. I was able to get this to work by overriding both the Equals and GetHashCode methods. If I didn't override both I still got the same error you mentioned:
CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of .
The actual collection contains 0 occurrence(s).
I had the following object:
public class EVProjectLedger
{
public virtual long Id { get; protected set; }
public virtual string ProjId { get; set; }
public virtual string Ledger { get; set; }
public virtual AccountRule AccountRule { get; set; }
public virtual int AccountLength { get; set; }
public virtual string AccountSubstrMethod { get; set; }
private Iesi.Collections.Generic.ISet<Contract> myContracts = new HashedSet<Contract>();
public virtual Iesi.Collections.Generic.ISet<Contract> Contracts
{
get { return myContracts; }
set { myContracts = value; }
}
public override bool Equals(object obj)
{
EVProjectLedger evProjectLedger = (EVProjectLedger)obj;
return ProjId == evProjectLedger.ProjId && Ledger == evProjectLedger.Ledger;
}
public override int GetHashCode()
{
return new { ProjId, Ledger }.GetHashCode();
}
}
Which I tested using the following:
using (ITransaction tx = session.BeginTransaction())
{
var evProject = session.Get<EVProject>("C0G");
CollectionAssert.AreEquivalent(TestData._evProjectLedgers.ToList(), evProject.EVProjectLedgers.ToList());
tx.Commit();
}
I'm using nHibernate which encourages overriding these methods anyways. The one drawback I can see is that my Equals method is based on the business key of the object and therefore tests equality using the business key and no other fields. You could override Equals however you want but beware of equality pollution mentioned in this post:
CollectionAssert.AreEquivalent failing... can't figure out why

If you would like to achieve this without having to write an equality comaparer, there is a unit testing library that you can use, called FluentAssertions,
https://fluentassertions.com/documentation/
It has many built in equality extension functions including ones for the Collections. You can install it through Nuget and its really easy to use.
Taking the example in the question above all you have to write in the end is
list1.Should().BeEquivalentTo(list2);
By default, the order matters in the two collections, however it can be changed as well.

I wrote this to test collections where the order is not important:
public static bool AreCollectionsEquivalent<T>(ICollection<T> collectionA, ICollection<T> collectionB, IEqualityComparer<T> comparer)
{
if (collectionA.Count != collectionB.Count)
return false;
foreach (var a in collectionA)
{
if (!collectionB.Any(b => comparer.Equals(a, b)))
return false;
}
return true;
}
Not as elegant as using SequenceEquals, but it works.
Of course to use it you simply do:
Assert.IsTrue(AreCollectionsEquivalent<MyType>(collectionA, collectionB, comparer));

How to compare two distinctly different objects with similar properties

This is all in C#, using .NET 2.0.
I have two lists of objects. They are not related objects, but they do have certain things in common that can be compared, such as a GUID-based unique identifier. These two lists need to be filtered by another list which just contains GUIDs which may or may not match up with the IDs contained in the first two lists.
I have thought about the idea of casting each object list to just object and sorting by that, but I'm not sure that I'll be able to access the ID property once it's cast, and I'm thinking that the method to sort the two lists should be somewhat dumb in knowing what the list to be sorted is.
What would be the best way to bring in each object list so that it can be sorted against the list with only the IDs?

You should make each of your different objects implement a common interface. Then create an IComparer<T> for that interface and use it in your sort.

Okay, if you have access to modify your original classes only to add the interface there, Matthew had it spot on. I went a little crazy here and defined out a full solution using 2.0 anonymous delegates. (I think I'm way addicted to 3.0 Lambda; otherwise, I probably would've written this out in foreach loops if I was using 2005 still).
Basically, create an interface with the common properties. Make yoru two classes implement the interface. Create a common list casted as the interface, cast and rip the values into the new list; remove any unmatched items.
//Program Output:
List1:
206aa77c-8259-428b-a4a0-0e005d8b016c
64f71cc9-596d-4cb8-9eb3-35da3b96f583
List2:
10382452-a7fe-4307-ae4c-41580dc69146
97f3f3f6-6e64-4109-9737-cb72280bc112
64f71cc9-596d-4cb8-9eb3-35da3b96f583
Matches:
64f71cc9-596d-4cb8-9eb3-35da3b96f583
Press any key to continue . . .
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication8
{
class Program
{
static void Main(string[] args)
{
//test initialization
List<ClassTypeA> list1 = new List<ClassTypeA>();
List<ClassTypeB> list2 = new List<ClassTypeB>();
ClassTypeA citem = new ClassTypeA();
ClassTypeB citem2 = new ClassTypeB();
citem2.ID = citem.ID;
list1.Add(new ClassTypeA());
list1.Add(citem);
list2.Add(new ClassTypeB());
list2.Add(new ClassTypeB());
list2.Add(citem2);
//new common list.
List<ICommonTypeMakeUpYourOwnName> common_list =
new List<ICommonTypeMakeUpYourOwnName>();
//in english, give me everything in list 1
//and cast it to the interface
common_list.AddRange(
list1.ConvertAll<ICommonTypeMakeUpYourOwnName>(delegate(
ClassTypeA x) { return (ICommonTypeMakeUpYourOwnName)x; }));
//in english, give me all the items in the
//common list that don't exist in list2 and remove them.
common_list.RemoveAll(delegate(ICommonTypeMakeUpYourOwnName x)
{ return list2.Find(delegate(ClassTypeB y)
{return y.ID == x.ID;}) == null; });
//show list1
Console.WriteLine("List1:");
foreach (ClassTypeA item in list1)
{
Console.WriteLine(item.ID);
}
//show list2
Console.WriteLine("\nList2:");
foreach (ClassTypeB item in list2)
{
Console.WriteLine(item.ID);
}
//show the common items
Console.WriteLine("\nMatches:");
foreach (ICommonTypeMakeUpYourOwnName item in common_list)
{
Console.WriteLine(item.ID);
}
}
}
interface ICommonTypeMakeUpYourOwnName
{
Guid ID { get; set; }
}
class ClassTypeA : ICommonTypeMakeUpYourOwnName
{
Guid _ID;
public Guid ID {get { return _ID; } set { _ID = value;}}
int _Stuff1;
public int Stuff1 {get { return _Stuff1; } set { _Stuff1 = value;}}
string _Stuff2;
public string Stuff2 {get { return _Stuff2; } set { _Stuff2 = value;}}
public ClassTypeA()
{
this.ID = Guid.NewGuid();
}
}
class ClassTypeB : ICommonTypeMakeUpYourOwnName
{
Guid _ID;
public Guid ID {get { return _ID; } set { _ID = value;}}
int _Stuff3;
public int Stuff3 {get { return _Stuff3; } set { _Stuff3 = value;}}
string _Stuff4;
public string Stuff4 {get { return _Stuff4; } set { _Stuff4 = value;}}
public ClassTypeB()
{
this.ID = Guid.NewGuid();
}
}
}

Using only .NET 2.0 methods:
class Foo
{
public Guid Guid { get; }
}
List<Foo> GetFooSubset(List<Foo> foos, List<Guid> guids)
{
return foos.FindAll(foo => guids.Contains(foo.Guid));
}
If your classes don't implement a common interface, you'll have to implement GetFooSubset for each type individually.

I'm not sure that I fully understand what you want, but you can use linq to select out the matching items from the lists as well as sorting them. Here is a simple example where the values from one list are filtered on another and sorted.
List<int> itemList = new List<int>() { 9,6,3,4,5,2,7,8,1 };
List<int> filterList = new List<int>() { 2, 6, 9 };
IEnumerable<int> filtered = itemList.SelectMany(item => filterList.Where(filter => filter == item)).OrderBy(p => p);

I haven't had a chance to use AutoMapper yet, but from what you describe you wish to check it out. From Jimmy Bogard's post:
AutoMapper conventions
Since AutoMapper flattens, it will
look for:
Matching property names
Nested property names (Product.Name
maps to ProductName, by assuming a
PascalCase naming convention)
Methods starting with the word “Get”,
so GetTotal() maps to Total
Any existing type map already
configured
Basically, if you removed all the
“dots” and “Gets”, AutoMapper will
match property names. Right now,
AutoMapper does not fail on mismatched
types, but for some other reasons.

I am not totally sure what you want as your end results, however....
If you are comparing the properties on two different types you could project the property names and corresponding values into two dictionaries. And with that information do some sort of sorting/difference of the property values.
Guid newGuid = Guid.NewGuid();
var classA = new ClassA{Id = newGuid};
var classB = new ClassB{Id = newGuid};
PropertyInfo[] classAProperties = classA.GetType().GetProperties();
Dictionary<string, object> classAPropertyValue = classAProperties.ToDictionary(pName => pName.Name,
pValue =>
pValue.GetValue(classA, null));
PropertyInfo[] classBProperties = classB.GetType().GetProperties();
Dictionary<string, object> classBPropetyValue = classBProperties.ToDictionary(pName => pName.Name,
pValue =>
pValue.GetValue(classB, null));
internal class ClassB
{
public Guid Id { get; set; }
}
internal class ClassA
{
public Guid Id { get; set; }
}
classAPropertyValue
Count = 1
[0]: {[Id, d0093d33-a59b-4537-bde9-67db324cf7f6]}
classBPropetyValue
Count = 1
[0]: {[Id, d0093d33-a59b-4537-bde9-67db324cf7f6]}

Thist should essentially get you what you want - but you may be better of using linq
class T1
{
public T1(Guid g, string n) { Guid = g; MyName = n; }
public Guid Guid { get; set; }
public string MyName { get; set; }
}
class T2
{
public T2(Guid g, string n) { ID = g; Name = n; }
public Guid ID { get; set; }
public string Name { get; set; }
}
class Test
{
public void Run()
{
Guid G1 = Guid.NewGuid();
Guid G2 = Guid.NewGuid();
Guid G3 = Guid.NewGuid();
List<T1> t1s = new List<T1>() {
new T1(G1, "one"),
new T1(G2, "two"),
new T1(G3, "three")
};
List<Guid> filter = new List<Guid>() { G2, G3};
List<T1> filteredValues1 = t1s.FindAll(delegate(T1 item)
{
return filter.Contains(item.Guid);
});
List<T1> filteredValues2 = t1s.FindAll(o1 => filter.Contains(o1.Guid));
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C#: Compare two ArrayList of custom class and find duplicates - c#

Related

C# How to find index of a object in listbox datasource (binded with List<T>) by Linq

On string interning and alternatives

How do I get a distinct value of items in a list of object or structure?

MSTest: CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of

How to compare two distinctly different objects with similar properties

Categories

Resources